Initialize project; model provided by the ModelHub XC community

Model: W-61/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851
Source: Original Platform
ModelHub XC
2026-04-26 08:53:18 +08:00
commit 4c5ec42aaa
682 changed files with 17438 additions and 0 deletions

36
.gitattributes vendored Normal file

@@ -0,0 +1,36 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
tokenizer.json filter=lfs diff=lfs merge=lfs -text

81
README.md Normal file

@@ -0,0 +1,81 @@
---
library_name: transformers
base_model: W-61/llama-3-8b-base-sft-hh-harmless-4xh200
tags:
- alignment-handbook
- new-dpo
- generated_from_trainer
datasets:
- Anthropic/hh-rlhf
model-index:
- name: llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851
results: []
---
<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->
# llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851
This model is a fine-tuned version of [W-61/llama-3-8b-base-sft-hh-harmless-4xh200](https://huggingface.co/W-61/llama-3-8b-base-sft-hh-harmless-4xh200) on the Anthropic/hh-rlhf dataset.
It achieves the following results on the evaluation set:
- Loss: 0.5467
- Fcm Dpo/beta: 0.2268
- Fcm Dpo/q T: 0.3412
- Fcm Dpo/delta: -0.0017
- Fcm Dpo/margin: 4.4089
- Margin Dpo/margin Mean: 4.4089
- Margin Dpo/margin Std: 7.2504
- Logps/chosen: -82.5570
- Logps/rejected: -91.6554
- Logps/ref Chosen: -74.8595
- Logps/ref Rejected: -79.5490
- Logits/chosen: 0.2724
- Logits/rejected: 0.2290
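
The card does not define these metrics. As a minimal consistency check, and assuming (this is not stated anywhere in the card) that the reported margin is the policy-versus-reference log-probability gain on the chosen response minus the same gain on the rejected response, the four `Logps` values above reproduce the reported margin:

```python
# Values copied from the evaluation results listed above.
logps_chosen, logps_rejected = -82.5570, -91.6554
ref_chosen, ref_rejected = -74.8595, -79.5490


def implied_margin(pi_c, pi_r, ref_c, ref_r):
    """Assumed definition, not confirmed by the card:
    (policy - reference) on chosen minus (policy - reference) on rejected."""
    return (pi_c - ref_c) - (pi_r - ref_r)


margin = implied_margin(logps_chosen, logps_rejected, ref_chosen, ref_rejected)
print(round(margin, 4))  # 4.4089, the reported Fcm Dpo/margin
```

The same arithmetic gives 2.2173 and 3.5687 for the step 200 and step 400 rows of the training results table, so the assumed definition is at least consistent with the logged numbers.
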
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 2
- total_train_batch_size: 64
- total_eval_batch_size: 32
- optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
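
The two total batch sizes listed above follow from the per-device sizes, the device count, and the gradient accumulation steps. A small sketch of that arithmetic (the helper name `effective_batch_size` is illustrative, not part of any library):

```python
def effective_batch_size(per_device, num_devices, grad_accum_steps=1):
    """Samples contributing to one optimizer step (or one eval step when
    grad_accum_steps is left at 1)."""
    return per_device * num_devices * grad_accum_steps


total_train = effective_batch_size(per_device=8, num_devices=4, grad_accum_steps=2)
total_eval = effective_batch_size(per_device=8, num_devices=4)
print(total_train, total_eval)  # 64 32
```

Gradient accumulation applies only to training, which is why the evaluation total is 32 rather than 64.
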
### Training results
| Training Loss | Epoch | Step | Validation Loss | Fcm Dpo/beta | Fcm Dpo/q T | Fcm Dpo/delta | Fcm Dpo/margin | Margin Dpo/margin Mean | Margin Dpo/margin Std | Logps/chosen | Logps/rejected | Logps/ref Chosen | Logps/ref Rejected | Logits/chosen | Logits/rejected |
|:-------------:|:------:|:----:|:---------------:|:------------:|:-----------:|:-------------:|:--------------:|:----------------------:|:---------------------:|:------------:|:--------------:|:----------------:|:------------------:|:-------------:|:---------------:|
| 0.9878 | 0.3023 | 200 | 0.5709 | 0.4418 | 0.3475 | 0.0198 | 2.2173 | 2.2173 | 3.8804 | -79.5152 | -86.4220 | -74.8595 | -79.5490 | 0.2313 | 0.1912 |
| 0.9660 | 0.6047 | 400 | 0.5573 | 0.2747 | 0.3451 | 0.0194 | 3.5687 | 3.5687 | 6.0461 | -81.6850 | -89.9432 | -74.8595 | -79.5490 | 0.2557 | 0.2135 |
| 1.122 | 0.9070 | 600 | 0.5467 | 0.2268 | 0.3412 | -0.0017 | 4.4089 | 4.4089 | 7.2504 | -82.5570 | -91.6554 | -74.8595 | -79.5490 | 0.2724 | 0.2290 |
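
The Epoch column can be related to the Step column with a short calculation. This sketch assumes (the card does not say so) that the epoch fraction is simply the number of samples consumed divided by the training set size; `train_samples` is taken from `all_results.json` in this commit and the batch size from the hyperparameters above:

```python
TRAIN_SAMPLES = 42336  # "train_samples" in all_results.json
TOTAL_BATCH = 64       # total_train_batch_size from the hyperparameters


def epoch_at(step):
    """Assumed relation: fraction of the training set seen after `step` optimizer steps."""
    return step * TOTAL_BATCH / TRAIN_SAMPLES


epochs = {step: round(epoch_at(step), 4) for step in (200, 400, 600)}
print(epochs)
```

Under this assumption the three evaluation steps land at roughly 0.3023, 0.6047, and 0.9070 of an epoch, which agrees with the table.
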
### Framework versions
- Transformers 4.51.0
- Pytorch 2.3.1+cu121
- Datasets 2.21.0
- Tokenizers 0.21.4

26
all_results.json Normal file

@@ -0,0 +1,26 @@
{
"epoch": 0.999244142101285,
"eval_fcm_dpo/beta": 0.22239074110984802,
"eval_fcm_dpo/delta": 0.017137322574853897,
"eval_fcm_dpo/margin": 4.419626235961914,
"eval_fcm_dpo/q_t": 0.3427804410457611,
"eval_logits/chosen": 0.3007291853427887,
"eval_logits/rejected": 0.25601860880851746,
"eval_logps/chosen": -82.5825424194336,
"eval_logps/ref_chosen": -74.85946655273438,
"eval_logps/ref_rejected": -79.54898834228516,
"eval_logps/rejected": -91.69168853759766,
"eval_loss": 0.5452697277069092,
"eval_margin_dpo/margin_mean": 4.419626235961914,
"eval_margin_dpo/margin_std": 7.256680488586426,
"eval_runtime": 38.1623,
"eval_samples": 2303,
"eval_samples_per_second": 60.347,
"eval_steps_per_second": 1.887,
"total_flos": 0.0,
"train_loss": 1.0460260132617922,
"train_runtime": 1747.5645,
"train_samples": 42336,
"train_samples_per_second": 24.226,
"train_steps_per_second": 0.378
}

29
config.json Normal file

@@ -0,0 +1,29 @@
{
"architectures": [
"LlamaForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 128000,
"eos_token_id": 128001,
"head_dim": 128,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 14336,
"max_position_embeddings": 8192,
"mlp_bias": false,
"model_type": "llama",
"num_attention_heads": 32,
"num_hidden_layers": 32,
"num_key_value_heads": 8,
"pretraining_tp": 1,
"rms_norm_eps": 1e-05,
"rope_scaling": null,
"rope_theta": 500000.0,
"tie_word_embeddings": false,
"torch_dtype": "float32",
"transformers_version": "4.51.0",
"use_cache": true,
"vocab_size": 128256
}

20
eval_results.json Normal file

@@ -0,0 +1,20 @@
{
"epoch": 0.999244142101285,
"eval_fcm_dpo/beta": 0.22239074110984802,
"eval_fcm_dpo/delta": 0.017137322574853897,
"eval_fcm_dpo/margin": 4.419626235961914,
"eval_fcm_dpo/q_t": 0.3427804410457611,
"eval_logits/chosen": 0.3007291853427887,
"eval_logits/rejected": 0.25601860880851746,
"eval_logps/chosen": -82.5825424194336,
"eval_logps/ref_chosen": -74.85946655273438,
"eval_logps/ref_rejected": -79.54898834228516,
"eval_logps/rejected": -91.69168853759766,
"eval_loss": 0.5452697277069092,
"eval_margin_dpo/margin_mean": 4.419626235961914,
"eval_margin_dpo/margin_std": 7.256680488586426,
"eval_runtime": 38.1623,
"eval_samples": 2303,
"eval_samples_per_second": 60.347,
"eval_steps_per_second": 1.887
}

9
generation_config.json Normal file

@@ -0,0 +1,9 @@
{
"bos_token_id": 128000,
"do_sample": true,
"eos_token_id": 128001,
"max_length": 4096,
"temperature": 0.6,
"top_p": 0.9,
"transformers_version": "4.51.0"
}

661
margin_logs/margins.jsonl Normal file

@@ -0,0 +1,661 @@
{"epoch": 0.0, "step": 1, "batch_size": 64, "mean": -0.0013527870178222656, "std": 0.2564818859100342, "min": -0.736083984375, "p10": -0.3432229995727539, "median": 0.038166046142578125, "p90": 0.29227676391601565, "max": 0.645111083984375, "pos_frac": 0.578125, "sample": [0.1120758056640625, 0.12518310546875, 0.31621551513671875, 0.13765716552734375, -0.12592506408691406, 0.23141098022460938, -0.21887779235839844, 0.21950721740722656, 0.04480743408203125, 0.020877838134765625, 0.0570220947265625, 0.058269500732421875, -0.4338226318359375, -0.030628204345703125, 0.645111083984375, -0.395477294921875, 0.09050941467285156, 0.0007190704345703125, -0.34615325927734375, 0.016077041625976562, -0.33638572692871094, 0.293853759765625, 0.17610931396484375, 0.22386932373046875, 0.21470260620117188, -0.08536529541015625, 0.0907745361328125, -0.03816986083984375, 0.39190101623535156, 0.16336441040039062, 0.08024787902832031, -0.031158447265625, 0.08477020263671875, 0.002460479736328125, -0.242034912109375, 0.07232666015625, -0.60186767578125, 0.20531463623046875, 0.155731201171875, -0.14299774169921875, -0.25698089599609375, 0.12331962585449219, -0.26497650146484375, 0.15140533447265625, -0.0920257568359375, -0.18599319458007812, 0.19028091430664062, 0.2496490478515625, 0.42162322998046875, 0.17873382568359375, -0.1525421142578125, -0.4972076416015625, 0.32010650634765625, -0.10365867614746094, -0.233795166015625, -0.19828224182128906, -0.4018898010253906, -0.13407135009765625, -0.09596633911132812, 0.031524658203125, 0.28859710693359375, -0.192962646484375, -0.736083984375, 0.3026123046875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000001.npy"}
{"epoch": 0.0015117157974300832, "step": 2, "batch_size": 64, "mean": 0.03744968771934509, "std": 0.2875921130180359, "min": -0.7604827880859375, "p10": -0.2812448501586914, "median": 0.03963661193847656, "p90": 0.3654294967651367, "max": 0.8134727478027344, "pos_frac": 0.5625, "sample": [0.30594635009765625, -0.24289894104003906, -0.11509323120117188, -0.13417816162109375, 0.06942558288574219, 0.36568641662597656, -0.14640045166015625, 0.1497650146484375, 0.30261993408203125, 0.10124588012695312, 0.13028717041015625, -0.0031890869140625, 0.0361480712890625, 0.5662612915039062, 0.09694290161132812, -0.01091766357421875, 0.1128997802734375, 0.0411834716796875, -0.21860504150390625, -0.1236419677734375, -0.08812713623046875, 0.10360527038574219, 0.1790008544921875, -0.5114288330078125, 0.3056755065917969, -0.14553451538085938, 0.28168487548828125, 0.26990509033203125, 0.1686878204345703, 0.038089752197265625, 0.19541168212890625, -0.10783576965332031, -0.2644004821777344, -0.19707489013671875, -0.140472412109375, 0.1349811553955078, 0.19672012329101562, -0.0714111328125, 0.53369140625, 0.1271820068359375, 0.8134727478027344, 0.2990264892578125, -0.7604827880859375, -0.08274078369140625, 0.05890846252441406, 0.029361724853515625, 0.4510040283203125, -0.1599273681640625, -0.29346656799316406, 0.10005569458007812, -0.27509117126464844, -0.1937713623046875, 0.19167327880859375, 0.28173065185546875, -0.09406471252441406, -0.3380699157714844, -0.29186248779296875, 0.36483001708984375, 0.009979248046875, 0.44391632080078125, -0.126708984375, -0.6550216674804688, 0.6160736083984375, -0.28388214111328125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000002.npy"}
{"epoch": 0.0030234315948601664, "step": 3, "batch_size": 64, "mean": 0.03993678092956543, "std": 0.2944541573524475, "min": -0.6945343017578125, "p10": -0.2851646423339843, "median": 0.030259132385253906, "p90": 0.4060111999511719, "max": 0.6619415283203125, "pos_frac": 0.578125, "sample": [0.07639312744140625, 0.030282974243164062, 0.0016326904296875, -0.5877304077148438, 0.2992401123046875, 0.59893798828125, -0.1027984619140625, 0.0534515380859375, 0.60992431640625, -0.19660186767578125, 0.06414794921875, -0.657135009765625, -0.1485443115234375, 0.09978675842285156, 0.015472412109375, 0.24958038330078125, 0.04119873046875, 0.41283416748046875, 0.17412567138671875, 0.06246185302734375, -0.3940887451171875, -0.057361602783203125, -0.07501220703125, 0.3238716125488281, -0.6945343017578125, 0.34920310974121094, 0.650848388671875, -0.00933837890625, 0.011508941650390625, 0.2433929443359375, 0.055606842041015625, -0.176300048828125, -0.11013031005859375, 0.03023529052734375, -0.079437255859375, -0.002017974853515625, -0.42412567138671875, -0.0469970703125, 0.030300140380859375, 0.5537261962890625, 0.6619415283203125, -0.2111053466796875, -0.11006927490234375, 0.410614013671875, 0.19634246826171875, 0.194671630859375, -0.03992462158203125, 0.09444808959960938, 0.17004776000976562, -0.0316619873046875, -0.5266265869140625, 0.20313262939453125, 0.39527130126953125, 0.31229591369628906, 0.07370376586914062, 0.31048583984375, 0.0140533447265625, -0.22106170654296875, -0.21651458740234375, -0.0494384765625, -0.3126373291015625, 0.291168212890625, -0.150054931640625, -0.17913818359375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000003.npy"}
{"epoch": 0.0045351473922902496, "step": 4, "batch_size": 64, "mean": 0.0005339086055755615, "std": 0.3667702376842499, "min": -0.609161376953125, "p10": -0.4043434143066406, "median": -0.0712890625, "p90": 0.37760734558105474, "max": 1.6048583984375, "pos_frac": 0.4375, "sample": [-0.246368408203125, 0.7781829833984375, -0.13834381103515625, 0.33487701416015625, 0.062530517578125, -0.30374908447265625, 0.06683731079101562, 0.7350387573242188, -0.1242218017578125, 0.3258056640625, 0.20132064819335938, 0.19298362731933594, 0.0226898193359375, -0.1905670166015625, -0.17315292358398438, -0.4734039306640625, 0.2257537841796875, -0.19969940185546875, 0.0992279052734375, -0.46793365478515625, -0.19141387939453125, 0.017459869384765625, 0.31662750244140625, -0.1072235107421875, -0.3006172180175781, -0.3187255859375, -0.07154083251953125, 0.2966499328613281, 0.024112701416015625, 0.21506500244140625, 1.6048583984375, -0.09209442138671875, 0.07324981689453125, -0.24561309814453125, -0.38344573974609375, 0.544677734375, 0.6014404296875, -0.2508888244628906, -0.609161376953125, -0.05493927001953125, -0.023639678955078125, -0.07103729248046875, 0.3816986083984375, -0.28769493103027344, -0.3297271728515625, 0.1739959716796875, -0.5849819183349609, -0.05998802185058594, -0.413299560546875, -0.13056564331054688, 0.117706298828125, -0.0944366455078125, -0.19222640991210938, -0.16402053833007812, -0.198211669921875, 0.3399505615234375, -0.14891815185546875, 0.4174461364746094, 0.3680610656738281, -0.4725017547607422, -0.15869903564453125, 0.0086517333984375, 0.21982574462890625, -0.459503173828125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000004.npy"}
{"epoch": 0.006046863189720333, "step": 5, "batch_size": 64, "mean": -0.006652712821960449, "std": 0.2899929881095886, "min": -0.7787094116210938, "p10": -0.28804569244384765, "median": 0.0009975433349609375, "p90": 0.28809967041015627, "max": 0.7808380126953125, "pos_frac": 0.5, "sample": [-0.11611175537109375, -0.5383987426757812, 0.0043468475341796875, -0.015716552734375, 0.2375335693359375, -0.265167236328125, -0.14346694946289062, 0.20703887939453125, 0.258392333984375, 0.57940673828125, 0.085723876953125, -0.70849609375, -0.17682266235351562, 0.26236724853515625, -0.17504119873046875, -0.27101898193359375, 0.179779052734375, -0.09952735900878906, -0.7195358276367188, 0.027851104736328125, -0.09554862976074219, 0.017181396484375, 0.14794921875, -0.014125823974609375, -0.1702880859375, -0.1416778564453125, -0.455474853515625, -0.0023517608642578125, 0.35333251953125, 0.3196601867675781, 0.19947052001953125, -0.21715545654296875, 0.03108978271484375, 0.23685646057128906, 0.7808380126953125, -0.11661529541015625, -0.04241180419921875, -0.09322357177734375, 0.1274738311767578, 0.07691574096679688, 0.08426666259765625, -0.145599365234375, 0.29181671142578125, 0.08888626098632812, 0.251434326171875, -0.1291961669921875, -0.2158203125, -0.2339935302734375, 0.5867767333984375, -0.2751922607421875, 0.29383277893066406, 0.011474609375, 0.14488983154296875, 0.1612548828125, 0.208648681640625, -0.31592559814453125, -0.7787094116210938, -0.29355430603027344, -0.1632061004638672, 0.08612632751464844, -0.05777168273925781, 0.27942657470703125, -0.13308143615722656, 0.2724113464355469], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000005.npy"}
{"epoch": 0.007558578987150416, "step": 6, "batch_size": 64, "mean": -0.0765211284160614, "std": 0.3343644440174103, "min": -1.024139404296875, "p10": -0.5026498794555663, "median": -0.0458526611328125, "p90": 0.300423049926758, "max": 0.703521728515625, "pos_frac": 0.4375, "sample": [0.090850830078125, -0.005130767822265625, -0.2366790771484375, -0.159332275390625, 0.003192901611328125, 0.09245491027832031, -0.06145477294921875, 0.07430267333984375, -0.7787246704101562, -0.1262683868408203, 0.15942764282226562, -0.36865234375, -0.2829627990722656, 0.3451080322265625, 0.3524169921875, -0.4005584716796875, -0.042999267578125, 0.5825614929199219, -0.19013214111328125, -0.17931175231933594, -0.0487060546875, -0.598602294921875, -0.2674674987792969, 0.11294937133789062, -0.2067413330078125, -0.7875900268554688, -0.16432952880859375, 0.0228271484375, -0.1838703155517578, 0.2559013366699219, 0.21952247619628906, 0.025363922119140625, 0.3195037841796875, 0.168670654296875, -0.7930145263671875, -0.1435546875, 0.07561111450195312, -0.42780303955078125, 0.19189834594726562, -0.2738494873046875, -0.37343597412109375, -0.028514862060546875, -0.15891647338867188, 0.06707000732421875, 0.14501190185546875, -0.081329345703125, 0.703521728515625, -0.03281402587890625, -0.2498321533203125, -0.16085052490234375, 0.079803466796875, 0.6070022583007812, -0.5347270965576172, -0.2386455535888672, 0.09570693969726562, 0.22719192504882812, -0.20503807067871094, -0.35652923583984375, -1.024139404296875, -0.62493896484375, 0.3252716064453125, 0.2167835235595703, 0.1969623565673828, 0.1432056427001953], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000006.npy"}
{"epoch": 0.009070294784580499, "step": 7, "batch_size": 64, "mean": 0.001888364553451538, "std": 0.24290354549884796, "min": -0.7939491271972656, "p10": -0.30234394073486326, "median": 0.0028963088989257812, "p90": 0.3067249298095704, "max": 0.5625457763671875, "pos_frac": 0.5, "sample": [0.05937957763671875, -0.24109268188476562, -0.0214691162109375, 0.0066013336181640625, 0.07084083557128906, 0.5570068359375, 0.3590106964111328, 0.0547027587890625, 0.37343597412109375, 0.32227325439453125, -0.07001113891601562, -0.11045646667480469, 0.047222137451171875, 0.0981292724609375, 0.10247230529785156, -0.29552268981933594, -0.027347564697265625, 0.13301849365234375, 0.17098617553710938, 0.01389312744140625, 0.1564464569091797, -0.09877777099609375, -0.2408447265625, -0.0008087158203125, -0.02435302734375, -0.17652511596679688, -0.22074508666992188, -0.305267333984375, 0.3166046142578125, -0.010705947875976562, -0.10748291015625, -0.02032470703125, 0.0787811279296875, 0.2836723327636719, -0.048370361328125, 0.5300979614257812, 0.26493072509765625, 0.2508544921875, -0.13443756103515625, 0.04185295104980469, 0.0240936279296875, -0.191192626953125, 0.0483551025390625, -0.0257415771484375, -0.31647491455078125, 0.10138702392578125, -0.3438529968261719, 0.2307872772216797, 0.1331329345703125, 0.0554046630859375, -0.23279380798339844, -0.7939491271972656, 0.20828819274902344, -0.0008697509765625, 0.0451812744140625, -0.42430877685546875, -0.02611541748046875, -0.20970916748046875, -0.3968048095703125, 0.5625457763671875, 0.09575653076171875, -0.14589691162109375, -0.0334320068359375, -0.38060569763183594], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000007.npy"}
{"epoch": 0.010582010582010581, "step": 8, "batch_size": 64, "mean": 0.06306475400924683, "std": 0.29175060987472534, "min": -0.5202560424804688, "p10": -0.28011417388916016, "median": 0.029870986938476562, "p90": 0.4441596984863282, "max": 0.830657958984375, "pos_frac": 0.5625, "sample": [-0.022825241088867188, 0.6728668212890625, 0.18280029296875, 0.029876708984375, -0.2757415771484375, -0.18683624267578125, 0.19657135009765625, 0.32558441162109375, -0.1420917510986328, -0.194580078125, 0.830657958984375, 0.1099090576171875, -0.0151214599609375, -0.24798202514648438, 0.375701904296875, 0.709716796875, 0.3835296630859375, -0.010784149169921875, 0.23396682739257812, 0.30322265625, -0.0038166046142578125, -0.158538818359375, -0.5202560424804688, 0.2985382080078125, -0.1564769744873047, -0.4049549102783203, -0.07499504089355469, 0.0111846923828125, 0.17040634155273438, 0.09755516052246094, 0.029865264892578125, 0.19501495361328125, -0.28198814392089844, -0.21097564697265625, 0.24552154541015625, -0.1166839599609375, 0.2390899658203125, -0.01721954345703125, 0.046875, -0.046955108642578125, 0.484619140625, -0.11401557922363281, 0.0284881591796875, 0.11972808837890625, -0.107879638671875, -0.05499267578125, 0.4491081237792969, -0.4864540100097656, -0.41477012634277344, 0.4326133728027344, 0.12791061401367188, 0.6081829071044922, -0.34630584716796875, -0.40048980712890625, 0.1787567138671875, 0.08293914794921875, -0.10100746154785156, -0.2118377685546875, 0.0518341064453125, 0.39775848388671875, 0.00731658935546875, 0.12885284423828125, 0.5147171020507812, 0.06143951416015625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000008.npy"}
{"epoch": 0.012093726379440665, "step": 9, "batch_size": 64, "mean": 0.020789533853530884, "std": 0.3175339698791504, "min": -0.7542724609375, "p10": -0.4309680938720703, "median": 0.043204307556152344, "p90": 0.4450424194335938, "max": 0.6871337890625, "pos_frac": 0.5625, "sample": [0.6871337890625, 0.28559112548828125, -0.09312820434570312, 0.386016845703125, 0.09407806396484375, -0.13114547729492188, 0.6191596984863281, -0.4905548095703125, -0.3966522216796875, -0.679656982421875, 0.14859771728515625, -0.008749008178710938, 0.49027252197265625, 0.005100250244140625, -0.181488037109375, -0.48131561279296875, -0.1793060302734375, 0.5039825439453125, 0.007122039794921875, 0.08590888977050781, 0.4399261474609375, 0.13897323608398438, -0.087860107421875, 0.1983184814453125, -0.12292289733886719, 0.489959716796875, 0.2701301574707031, 0.27381134033203125, -0.23126602172851562, -0.4456748962402344, 0.02696990966796875, -0.3699932098388672, 0.06568145751953125, 0.459686279296875, 0.05943870544433594, -0.37064170837402344, -0.04444122314453125, -0.6100234985351562, -0.459930419921875, 0.23514556884765625, 0.20630645751953125, -0.3297863006591797, 0.114532470703125, -0.10150146484375, -0.022439956665039062, 0.2692718505859375, 0.2755107879638672, 0.2077007293701172, 0.15522193908691406, -0.1125640869140625, 0.16059112548828125, -0.047821044921875, 0.14040756225585938, 0.2818145751953125, 0.38584327697753906, 0.447235107421875, 0.08924484252929688, -0.02079010009765625, 0.01215362548828125, 0.06450462341308594, -0.3251953125, -0.7542724609375, -0.18236541748046875, -0.1693267822265625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000009.npy"}
{"epoch": 0.013605442176870748, "step": 10, "batch_size": 64, "mean": 0.01924341917037964, "std": 0.3252469599246979, "min": -0.8141632080078125, "p10": -0.3524349212646484, "median": 0.04585552215576172, "p90": 0.46739444732666025, "max": 0.6575927734375, "pos_frac": 0.5625, "sample": [0.0787353515625, -0.3270149230957031, -0.05677032470703125, -2.86102294921875e-05, 0.027690887451171875, -0.8141632080078125, -0.19189453125, 0.15760040283203125, -0.104248046875, 0.36325836181640625, -0.13671112060546875, 0.17238998413085938, -0.24616241455078125, 0.47625732421875, -0.051025390625, 0.19091796875, 0.07597732543945312, -0.3544158935546875, 0.121429443359375, -0.018985748291015625, -0.03415679931640625, 0.12363243103027344, 0.520233154296875, 0.389190673828125, -0.23454856872558594, -0.3478126525878906, 0.15727615356445312, 0.1707916259765625, -0.0689544677734375, -0.37775421142578125, 0.10498046875, 0.5667037963867188, 0.06573486328125, 0.40251922607421875, 0.09346199035644531, 0.26032257080078125, -0.15295028686523438, 0.5367755889892578, 0.15922927856445312, 0.4467144012451172, -0.2386455535888672, -0.34255218505859375, -0.315216064453125, 0.0552978515625, -0.2954254150390625, 0.046703338623046875, -0.3356742858886719, -0.3766937255859375, 0.540618896484375, -0.597412109375, -0.60791015625, 0.40837860107421875, 0.027553558349609375, 0.013370513916015625, 0.12247085571289062, -0.202056884765625, 0.6575927734375, 0.04500770568847656, 0.35002899169921875, 0.05133819580078125, 0.2870826721191406, -0.0702362060546875, 0.5805282592773438, -0.716796875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000010.npy"}
{"epoch": 0.015117157974300832, "step": 11, "batch_size": 64, "mean": 0.07758983969688416, "std": 0.3178706169128418, "min": -0.8515777587890625, "p10": -0.2960853576660156, "median": 0.07575702667236328, "p90": 0.5262016296386719, "max": 0.8899459838867188, "pos_frac": 0.640625, "sample": [-0.0879364013671875, 0.1795196533203125, -0.3942680358886719, 0.1360931396484375, -0.37725830078125, 0.0961456298828125, -0.12307357788085938, 0.06306838989257812, 0.5181427001953125, 0.1487884521484375, 0.3356781005859375, 0.13617706298828125, 0.2176513671875, 0.20941734313964844, 0.011749267578125, 0.0096282958984375, 0.7785682678222656, 0.6526031494140625, 0.14501953125, -0.225250244140625, 0.6157379150390625, 0.29656982421875, 0.5296554565429688, 0.10291290283203125, -0.25635719299316406, -0.20304107666015625, 0.21579360961914062, 0.047389984130859375, -0.2244853973388672, 0.5651931762695312, 0.0008563995361328125, 0.035587310791015625, -0.12828445434570312, 0.8899459838867188, -0.16138076782226562, -0.21584320068359375, -0.025714874267578125, 0.7418289184570312, 0.19244384765625, 0.4199676513671875, -0.3544921875, 0.12757110595703125, -0.3067779541015625, 0.11976242065429688, 0.08758544921875, -0.8515777587890625, -0.3056526184082031, 0.06881141662597656, 0.2153778076171875, 0.383544921875, 0.2601432800292969, 0.063323974609375, -0.12062644958496094, -0.024749755859375, -0.11992645263671875, -0.23647308349609375, -0.19099998474121094, 0.2788505554199219, -0.3755989074707031, 0.08270263671875, -0.2737617492675781, 0.3177032470703125, 0.0312957763671875, 0.2204742431640625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000011.npy"}
{"epoch": 0.016628873771730914, "step": 12, "batch_size": 64, "mean": 0.04613432288169861, "std": 0.3242299258708954, "min": -0.7829437255859375, "p10": -0.29956741333007814, "median": 0.03281879425048828, "p90": 0.42695541381835944, "max": 0.770904541015625, "pos_frac": 0.515625, "sample": [0.03436088562011719, -0.6370697021484375, -0.024972915649414062, 0.2264270782470703, -0.000179290771484375, 0.20998382568359375, 0.43801307678222656, -0.24318313598632812, 0.19778060913085938, -0.0794677734375, 0.28752899169921875, -0.7191429138183594, -0.2693634033203125, 0.770904541015625, 0.3572845458984375, -0.21399307250976562, -0.25885009765625, -0.35533905029296875, 0.36936187744140625, 0.21085548400878906, -0.21804428100585938, 0.436492919921875, -0.7829437255859375, 0.508575439453125, -0.05323600769042969, -0.3978271484375, -0.28836822509765625, -0.3043670654296875, -0.119903564453125, 0.1825714111328125, -0.26636505126953125, 0.5175628662109375, -0.13351058959960938, -0.0115966796875, -0.14484786987304688, 0.08420753479003906, -0.3649330139160156, 0.367462158203125, 0.2305450439453125, 0.07435035705566406, -0.0237884521484375, -0.18620872497558594, -0.118255615234375, -0.06826019287109375, 0.18994140625, 0.3632678985595703, 0.3139305114746094, 0.35396575927734375, 0.7383346557617188, 0.17392730712890625, -0.038417816162109375, 0.310791015625, 0.27448272705078125, 0.1912841796875, 0.560516357421875, 0.3701324462890625, -0.17844581604003906, 0.40470123291015625, -0.23816680908203125, 0.06298637390136719, -0.06259536743164062, 0.031276702880859375, 0.17528343200683594, -0.2648506164550781], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000012.npy"}
{"epoch": 0.018140589569160998, "step": 13, "batch_size": 64, "mean": -0.004494607448577881, "std": 0.2818032503128052, "min": -0.8103523254394531, "p10": -0.31782989501953124, "median": 0.04474925994873047, "p90": 0.29543018341064453, "max": 0.55194091796875, "pos_frac": 0.546875, "sample": [-0.322540283203125, 0.21533203125, 0.1115264892578125, 0.1400146484375, 0.55194091796875, -0.8103523254394531, 0.17261123657226562, 0.3225574493408203, 0.05724334716796875, 0.1522979736328125, 0.5203113555908203, 0.15331649780273438, 0.19316482543945312, -0.1890106201171875, 0.15944671630859375, -0.00139617919921875, 0.211090087890625, 0.07915496826171875, 0.102142333984375, 0.2956390380859375, -0.0789031982421875, 0.11243438720703125, -0.041961669921875, -0.3428192138671875, -0.6894950866699219, -0.0009918212890625, -0.2043304443359375, -0.1707134246826172, -0.24170684814453125, -0.21070098876953125, 0.1675243377685547, 0.153350830078125, 0.16022109985351562, -0.2905082702636719, -0.3068389892578125, 0.5316352844238281, -0.10147476196289062, 0.016544342041015625, 0.1229400634765625, -0.16207122802734375, 0.16811370849609375, 0.29309844970703125, 0.25315093994140625, -0.2686958312988281, -0.21222686767578125, 0.0026607513427734375, 0.0426483154296875, -0.027099609375, -0.45632171630859375, 0.09774017333984375, -0.1224212646484375, 0.32891082763671875, -0.11238288879394531, -0.1282329559326172, 0.1281585693359375, -0.7450637817382812, -0.6300506591796875, 0.29494285583496094, 0.11182403564453125, 0.31603050231933594, -0.0288848876953125, -0.033721923828125, -0.14330673217773438, 0.04685020446777344], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000013.npy"}
{"epoch": 0.019652305366591082, "step": 14, "batch_size": 64, "mean": 0.03439044952392578, "std": 0.30847546458244324, "min": -0.5689544677734375, "p10": -0.28922882080078127, "median": 0.012643814086914062, "p90": 0.3579042434692384, "max": 1.039642333984375, "pos_frac": 0.515625, "sample": [-0.0201568603515625, -0.28594970703125, -0.5689544677734375, 0.19348907470703125, 0.0997314453125, 0.0644989013671875, 0.04553985595703125, 0.3319091796875, -0.13073158264160156, -0.36530113220214844, -0.10821533203125, 0.3061866760253906, -0.19951248168945312, -0.5509490966796875, 0.033935546875, 0.1283740997314453, 0.25090980529785156, -0.20641326904296875, 0.02130126953125, -0.205810546875, -0.11481857299804688, -0.299957275390625, 0.016330718994140625, 0.22013092041015625, -0.1072845458984375, -0.1414642333984375, 0.0177001953125, 0.41156768798828125, 0.44944000244140625, 1.0177536010742188, -0.131134033203125, 0.7400054931640625, 0.25981903076171875, 1.039642333984375, 0.1072235107421875, 0.1285400390625, -0.00244903564453125, -0.14357757568359375, 0.2282848358154297, 0.159637451171875, 0.0115966796875, 0.33724021911621094, -0.10923385620117188, 0.36676025390625, -0.13309288024902344, -0.023197174072265625, -0.15117645263671875, -0.09722137451171875, 0.6566009521484375, 0.16774368286132812, -0.13111114501953125, -0.4504966735839844, 0.013690948486328125, -0.38512420654296875, 0.10283851623535156, -0.053516387939453125, -0.2829132080078125, -0.1791553497314453, 0.0518646240234375, 0.160491943359375, 0.09311485290527344, -0.0579986572265625, -0.2906341552734375, -0.10535430908203125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000014.npy"}
{"epoch": 0.021164021164021163, "step": 15, "batch_size": 64, "mean": 0.037355005741119385, "std": 0.33673563599586487, "min": -1.3348236083984375, "p10": -0.2765209197998047, "median": 0.0007753372192382812, "p90": 0.40157012939453135, "max": 1.1241531372070312, "pos_frac": 0.5, "sample": [0.33975982666015625, -0.1893310546875, 0.033382415771484375, -0.07475471496582031, -0.048427581787109375, 0.325286865234375, -0.115325927734375, 0.00868988037109375, 0.081085205078125, -0.509124755859375, 0.21335983276367188, -0.08021354675292969, -0.2816009521484375, -0.05560874938964844, -0.0306854248046875, -0.03986930847167969, 1.1241531372070312, -0.0212249755859375, 0.3442649841308594, -0.055759429931640625, -0.03228759765625, -0.2678718566894531, 0.34433746337890625, -0.2279510498046875, -0.07691192626953125, 0.124481201171875, 0.10003662109375, 0.4102630615234375, -0.02813720703125, -0.2802276611328125, 0.17931747436523438, -0.22119712829589844, -0.25984954833984375, -0.5589447021484375, 0.4181251525878906, 0.15584564208984375, 0.38128662109375, 0.26788330078125, -0.08617782592773438, 0.29091644287109375, 0.69488525390625, 0.4299468994140625, 0.021270751953125, 0.46712493896484375, -0.1259899139404297, -1.3348236083984375, -0.3874969482421875, 0.13096237182617188, 0.3699684143066406, 0.0707244873046875, 0.08510589599609375, 0.13727378845214844, -0.26360321044921875, -0.15380096435546875, -0.0071392059326171875, -0.20985031127929688, -0.29705810546875, -0.24268722534179688, 0.2697124481201172, 0.20219802856445312, 0.1786956787109375, 0.46897125244140625, 0.338592529296875, -0.05325508117675781], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000015.npy"}
{"epoch": 0.022675736961451247, "step": 16, "batch_size": 64, "mean": -0.030480757355690002, "std": 0.2903221845626831, "min": -0.9575042724609375, "p10": -0.34693222045898436, "median": -0.03944540023803711, "p90": 0.31504631042480474, "max": 0.6150054931640625, "pos_frac": 0.4375, "sample": [-0.27138328552246094, 0.05599021911621094, -0.3785438537597656, -0.1549968719482422, 0.5089454650878906, -0.19731903076171875, -0.071258544921875, 0.1376953125, 0.3348846435546875, 0.026689529418945312, -0.3987388610839844, -0.13433074951171875, -0.0477142333984375, -0.32149505615234375, 0.24880027770996094, -0.11631202697753906, 0.04844093322753906, 0.002513885498046875, -0.051387786865234375, -0.694671630859375, -0.12052154541015625, -0.017642974853515625, -0.1890869140625, 0.5703811645507812, 0.11620140075683594, -0.9575042724609375, -0.060962677001953125, 0.1898651123046875, -0.17726516723632812, 0.19495391845703125, -0.07633209228515625, 0.0627288818359375, 0.165069580078125, 0.16221237182617188, 0.30301666259765625, -0.5638656616210938, 0.6150054931640625, -0.1059112548828125, -0.2062225341796875, -0.17952537536621094, -0.24444961547851562, 0.19261741638183594, -0.017246246337890625, 0.099273681640625, 0.47304534912109375, 0.3202018737792969, 0.10462188720703125, -0.11946868896484375, 0.17255401611328125, 0.000164031982421875, -0.03656196594238281, 0.12193679809570312, -0.042328834533691406, -0.19055938720703125, -0.24626922607421875, 0.113433837890625, -0.01297760009765625, -0.7476959228515625, -0.3578338623046875, 0.2850170135498047, -0.10399627685546875, -0.20392990112304688, 0.3550262451171875, -0.11574554443359375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000016.npy"}
{"epoch": 0.02418745275888133, "step": 17, "batch_size": 64, "mean": 0.015365689992904663, "std": 0.326249361038208, "min": -0.91668701171875, "p10": -0.325762939453125, "median": -0.008798599243164062, "p90": 0.3948453903198243, "max": 1.046142578125, "pos_frac": 0.453125, "sample": [0.18558502197265625, 0.1495819091796875, 0.06920814514160156, -0.10853767395019531, -0.075164794921875, 0.007061004638671875, -0.30318450927734375, -0.004505157470703125, 0.20938873291015625, -0.290435791015625, -0.05914306640625, -0.105133056640625, -0.31612586975097656, 0.04950714111328125, -0.17342185974121094, -0.232513427734375, -0.00731658935546875, -0.06419754028320312, 0.023588180541992188, -0.070648193359375, -0.1867828369140625, 0.17410659790039062, -0.91668701171875, -0.071441650390625, -0.40522003173828125, -0.06282806396484375, -0.04413604736328125, -0.16402435302734375, 0.2735271453857422, 0.0152587890625, -0.5201759338378906, -0.4101715087890625, 0.451568603515625, -0.09326553344726562, 0.6287307739257812, 1.046142578125, -0.00626373291015625, 0.1182861328125, 0.16110992431640625, 0.413055419921875, 0.30975341796875, -0.2662086486816406, 0.14750289916992188, 0.3058319091796875, 0.2332611083984375, -0.0846710205078125, -0.2717742919921875, 0.888092041015625, 0.40542030334472656, 0.69287109375, -0.16964340209960938, -0.32830810546875, -0.4236297607421875, -0.16175079345703125, 0.37017059326171875, -0.14464187622070312, -0.010280609130859375, 0.3392601013183594, 0.1263580322265625, -0.31982421875, 0.1451873779296875, 0.2271881103515625, -0.35223388671875, 0.0410919189453125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000017.npy"}
{"epoch": 0.025699168556311415, "step": 18, "batch_size": 64, "mean": 0.018225044012069702, "std": 0.28745800256729126, "min": -0.85693359375, "p10": -0.3030841827392578, "median": 0.020994186401367188, "p90": 0.39208450317382815, "max": 0.5591278076171875, "pos_frac": 0.53125, "sample": [0.07282257080078125, 0.002285003662109375, -0.14760780334472656, -0.023958206176757812, 0.2765846252441406, 0.4109649658203125, 0.080230712890625, 0.32166290283203125, 0.39249420166015625, 0.37517547607421875, 0.24429893493652344, 0.0012359619140625, 0.1782398223876953, -0.16186141967773438, -0.312652587890625, 0.181793212890625, 0.3348503112792969, -0.395294189453125, -0.21439743041992188, 0.18609619140625, 0.3391876220703125, 0.3911285400390625, -0.20822906494140625, 0.410736083984375, 0.10995864868164062, -0.2807579040527344, -0.101776123046875, 0.4223670959472656, 0.3741607666015625, 0.12689971923828125, -0.15092849731445312, 0.11885833740234375, -0.10535430908203125, -0.14794921875, -0.2476348876953125, -0.15186309814453125, -0.3433418273925781, 0.14276123046875, -0.25025177001953125, 0.039703369140625, -0.17470550537109375, 0.5591278076171875, -0.5284347534179688, 0.133331298828125, -0.280609130859375, 0.5061626434326172, -0.0637359619140625, -0.005420684814453125, -0.5827827453613281, -0.85693359375, -0.20012664794921875, 0.23354339599609375, -0.4745216369628906, -0.13147735595703125, -0.10229873657226562, 0.17054176330566406, -0.08240318298339844, -0.042499542236328125, 0.26568603515625, 0.07409286499023438, 0.1015472412109375, 0.0982208251953125, 0.4410438537597656, -0.18158340454101562], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000018.npy"}
{"epoch": 0.027210884353741496, "step": 19, "batch_size": 64, "mean": 0.09455005824565887, "std": 0.349026083946228, "min": -0.986328125, "p10": -0.282951545715332, "median": 0.09041881561279297, "p90": 0.3993577957153321, "max": 1.3810577392578125, "pos_frac": 0.671875, "sample": [0.404327392578125, 0.208343505859375, -0.23219871520996094, 0.091400146484375, 0.3657112121582031, 0.38776206970214844, -0.04085540771484375, -0.3715667724609375, -0.3047027587890625, 0.009769439697265625, 0.1682891845703125, 0.182861328125, 0.3704071044921875, 0.1331920623779297, 0.109100341796875, -0.0023097991943359375, 0.08844757080078125, -0.031703948974609375, 0.36002349853515625, 0.28762054443359375, -0.18195343017578125, -0.7739028930664062, 0.0226898193359375, 0.304168701171875, 0.9110565185546875, 0.4896392822265625, 0.1776885986328125, -0.033477783203125, -0.5819854736328125, -0.1540241241455078, 0.6937789916992188, 0.2193145751953125, 0.04850006103515625, 0.3842010498046875, -0.986328125, -0.47100830078125, -0.3377532958984375, 0.23383522033691406, 0.015838623046875, 0.11639595031738281, -0.17786598205566406, -0.059520721435546875, 0.38543701171875, 0.06818389892578125, 0.13266754150390625, 0.2303791046142578, 0.1441497802734375, 1.3810577392578125, -0.00826263427734375, 0.43422698974609375, 0.1788177490234375, 0.08943748474121094, 0.01319122314453125, -0.019278526306152344, 0.15979766845703125, -0.05498504638671875, 0.016357421875, -0.028186798095703125, 0.08288383483886719, -0.11434173583984375, 0.608856201171875, 0.00815582275390625, 0.19861602783203125, 0.10083770751953125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000019.npy"}
{"epoch": 0.02872260015117158, "step": 20, "batch_size": 64, "mean": 0.00376950204372406, "std": 0.30668583512306213, "min": -0.647491455078125, "p10": -0.4087802886962891, "median": -0.00689697265625, "p90": 0.3079948425292969, "max": 0.925567626953125, "pos_frac": 0.46875, "sample": [-0.080291748046875, 0.23828506469726562, -0.5882492065429688, -0.32500457763671875, -0.009286880493164062, -0.31314849853515625, 0.16851806640625, 0.925567626953125, 0.23264312744140625, 0.43112945556640625, -0.446380615234375, 0.11925125122070312, -0.47955322265625, 0.2275409698486328, -0.3385467529296875, 0.07752227783203125, 0.18854522705078125, -0.06262588500976562, 0.24652862548828125, -0.4137001037597656, -0.03222084045410156, -0.08239173889160156, -0.24058914184570312, -0.39730072021484375, -0.1811389923095703, -0.07299041748046875, 0.26193809509277344, -0.018301010131835938, 0.30915069580078125, 0.02511119842529297, -0.647491455078125, -0.5417900085449219, -0.0079193115234375, -0.018617630004882812, -0.0783233642578125, 0.12410354614257812, 0.2617988586425781, -0.31026458740234375, -0.0058746337890625, 0.17967987060546875, 0.30254173278808594, -0.1189422607421875, 0.2699432373046875, 0.0475311279296875, 0.27228546142578125, 0.2003173828125, 0.03290557861328125, -0.1313323974609375, 0.6506423950195312, 0.035980224609375, -0.5484390258789062, 0.33025360107421875, 0.0538330078125, 0.6276168823242188, -0.06207275390625, 0.08160209655761719, 0.3052978515625, 0.4218101501464844, -0.02700042724609375, -0.3324146270751953, -0.12158584594726562, -0.09878921508789062, -0.2726631164550781, -0.0033855438232421875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000020.npy"}
{"epoch": 0.030234315948601664, "step": 21, "batch_size": 64, "mean": 0.0008340179920196533, "std": 0.33952510356903076, "min": -0.9742889404296875, "p10": -0.38591327667236325, "median": -0.025707244873046875, "p90": 0.4188003540039063, "max": 0.9599227905273438, "pos_frac": 0.46875, "sample": [0.06628799438476562, -0.39316368103027344, -0.03363800048828125, -0.0257415771484375, 0.5261306762695312, 0.103363037109375, 0.146453857421875, -0.07550811767578125, -0.1310100555419922, -0.2017192840576172, 0.18441390991210938, -0.060699462890625, 0.0030975341796875, 0.4261932373046875, 0.4725456237792969, 0.3944091796875, 0.132354736328125, 0.46648406982421875, -0.448699951171875, -0.003955841064453125, -0.12767791748046875, 0.20970726013183594, 0.023525238037109375, 0.12389373779296875, 0.1260833740234375, -0.87542724609375, -0.1497211456298828, 0.9599227905273438, -0.306060791015625, -0.2465801239013672, -0.232177734375, -0.030902862548828125, 0.27154541015625, -0.02567291259765625, -0.9742889404296875, -0.09117507934570312, 0.327728271484375, 0.13826370239257812, 0.31671142578125, -0.5115394592285156, 0.399566650390625, -0.29544830322265625, 0.70697021484375, 0.17454910278320312, 0.40155029296875, -0.2093219757080078, 0.07256317138671875, -0.1909332275390625, -0.16486740112304688, -0.04075431823730469, 0.05831146240234375, 0.0064697265625, -0.14090347290039062, 0.30643463134765625, -0.07565879821777344, 0.24440383911132812, -0.1981067657470703, -0.23916244506835938, -0.11727333068847656, 0.4556121826171875, -0.45909881591796875, -0.15057754516601562, -0.36899566650390625, -0.5957069396972656], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000021.npy"}
{"epoch": 0.031746031746031744, "step": 22, "batch_size": 64, "mean": 0.05243924260139465, "std": 0.2646307647228241, "min": -0.7320938110351562, "p10": -0.23623199462890623, "median": 0.061614990234375, "p90": 0.40937767028808597, "max": 0.7900848388671875, "pos_frac": 0.59375, "sample": [-0.4976348876953125, -0.3986015319824219, -0.229278564453125, 0.4169044494628906, 0.20191001892089844, 0.27325439453125, 0.0814361572265625, 0.111846923828125, -0.0020999908447265625, 0.07309913635253906, 0.09821701049804688, 0.0429840087890625, -0.08901596069335938, 0.0049610137939453125, 0.2618141174316406, 0.1132354736328125, -0.026763916015625, 0.18873977661132812, -0.02283477783203125, -0.13230133056640625, 0.41428375244140625, -0.0299072265625, 0.47383880615234375, 0.08801841735839844, -0.09606361389160156, -0.2392120361328125, 0.13833045959472656, -0.12089729309082031, -0.16543006896972656, -0.00542449951171875, -0.3585357666015625, -0.2237091064453125, -0.11530303955078125, -0.010448455810546875, 0.048183441162109375, 0.0527801513671875, 0.09982109069824219, 0.04109954833984375, -0.7320938110351562, 0.138397216796875, 0.5510406494140625, 0.024749755859375, -0.06537628173828125, 0.7900848388671875, 0.0704498291015625, -0.34055328369140625, 0.09746551513671875, -0.1304931640625, 0.24529266357421875, 0.1619873046875, 0.2862129211425781, 0.66949462890625, 0.2155914306640625, 0.3979301452636719, 0.428741455078125, 0.07302284240722656, -0.02840423583984375, 0.08205032348632812, 0.3753509521484375, 0.10352325439453125, -0.10089111328125, -0.3744621276855469, 0.1049652099609375, -0.149261474609375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000022.npy"}
{"epoch": 0.03325774754346183, "step": 23, "batch_size": 64, "mean": 0.02134445309638977, "std": 0.3245933949947357, "min": -0.7376556396484375, "p10": -0.3513080596923828, "median": -0.028860092163085938, "p90": 0.47755661010742195, "max": 0.84259033203125, "pos_frac": 0.484375, "sample": [-0.20831298828125, -0.449615478515625, 0.09247589111328125, 0.84259033203125, -0.1291065216064453, 0.12290573120117188, -0.6472625732421875, -0.172760009765625, 0.21704864501953125, -0.3573150634765625, 0.228302001953125, -0.1594562530517578, 0.03310394287109375, 0.0054168701171875, -0.3018627166748047, 0.510009765625, -0.299285888671875, 0.18038177490234375, -0.2679252624511719, 0.5180435180664062, 0.0936279296875, 0.14981460571289062, 0.11444282531738281, -0.41965484619140625, 0.021556854248046875, 0.66351318359375, -0.16080474853515625, -0.05503654479980469, 0.48490142822265625, -0.03734588623046875, -0.7376556396484375, 0.35052490234375, 0.423553466796875, -0.04943275451660156, 0.8142547607421875, -0.15727996826171875, -0.082183837890625, 0.460418701171875, 0.23914718627929688, -0.478424072265625, 0.09980010986328125, -0.1703948974609375, -0.058620452880859375, 0.1024322509765625, 0.6224365234375, 0.3597564697265625, -0.0945892333984375, 0.3039588928222656, -0.18199920654296875, -0.468658447265625, 0.242462158203125, -0.048915863037109375, -0.11049079895019531, -0.06553459167480469, -0.0236358642578125, -0.034084320068359375, 0.156158447265625, -0.3372917175292969, -0.3146514892578125, 0.04046630859375, -0.048828125, -0.17007064819335938, 0.03699302673339844, 0.134033203125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000023.npy"}
{"epoch": 0.03476946334089191, "step": 24, "batch_size": 64, "mean": 0.016127735376358032, "std": 0.27078190445899963, "min": -0.63623046875, "p10": -0.28309955596923825, "median": -0.025330543518066406, "p90": 0.3474014282226563, "max": 0.7409439086914062, "pos_frac": 0.484375, "sample": [0.6331710815429688, 0.15087127685546875, -0.11548614501953125, 0.010625839233398438, 0.389190673828125, -0.06217193603515625, 0.5497379302978516, 0.7409439086914062, 0.008983612060546875, -0.15541839599609375, 0.12175941467285156, -0.2859783172607422, -0.09685897827148438, -0.445556640625, 0.4642181396484375, 0.09234619140625, 0.13055038452148438, -0.29355430603027344, -0.23514556884765625, -0.045375823974609375, -0.17098236083984375, 0.14483261108398438, -0.63623046875, -0.2179718017578125, 0.3522491455078125, -0.11751556396484375, 0.2244873046875, -0.10252761840820312, 0.014495849609375, 0.30548095703125, -0.03576087951660156, -0.370330810546875, -0.2697410583496094, -0.0371246337890625, 0.3270721435546875, 0.0176544189453125, 0.2042388916015625, -0.04894828796386719, 0.167877197265625, -0.12066650390625, -0.08771514892578125, 0.2432861328125, 0.09754180908203125, 0.11176681518554688, -0.095977783203125, 0.3192596435546875, 0.18415069580078125, -0.25316619873046875, -0.28690338134765625, 0.28619384765625, -0.25350379943847656, -0.1848297119140625, -0.01490020751953125, -0.11984634399414062, 0.2702178955078125, 0.1296539306640625, -0.08394813537597656, -0.537261962890625, -0.260986328125, 0.02472686767578125, 0.358062744140625, -0.2763824462890625, 0.336090087890625, -0.060794830322265625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000024.npy"}
{"epoch": 0.036281179138321996, "step": 25, "batch_size": 64, "mean": 0.03995510935783386, "std": 0.31749221682548523, "min": -0.71661376953125, "p10": -0.32217826843261715, "median": 0.020921707153320312, "p90": 0.41413879394531283, "max": 1.00994873046875, "pos_frac": 0.53125, "sample": [0.08932113647460938, -0.5194244384765625, -0.04989051818847656, -0.0379486083984375, 0.03314018249511719, 0.0522308349609375, 0.5074920654296875, -0.05885887145996094, 0.0382232666015625, -0.23851394653320312, -0.10308837890625, -0.07180404663085938, -0.38188934326171875, 0.1232757568359375, -0.0331878662109375, 0.18299484252929688, -0.339324951171875, -0.020366668701171875, -0.011341094970703125, -0.18613433837890625, -0.459503173828125, 0.05672645568847656, -0.5357666015625, 1.00994873046875, 0.03093719482421875, 0.057224273681640625, -0.18671035766601562, 0.735382080078125, 0.3326911926269531, 0.02471923828125, -0.0081939697265625, 0.1384258270263672, -0.095245361328125, 0.14830398559570312, 0.4486541748046875, -0.4465751647949219, 0.7633056640625, -0.0004405975341796875, -0.1214599609375, 0.3336029052734375, -0.71661376953125, 0.017124176025390625, 0.03508186340332031, -0.2363910675048828, 0.2929229736328125, -0.1505584716796875, -0.022983551025390625, 0.10279083251953125, 0.00799560546875, 0.04402923583984375, -0.2821693420410156, 0.08200645446777344, 0.8084640502929688, -0.09644317626953125, 0.10469818115234375, -0.022945404052734375, 0.7181549072265625, 0.10994338989257812, 0.32823944091796875, -0.1597900390625, 0.04730224609375, 0.077850341796875, -0.03375244140625, 0.301239013671875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000025.npy"}
{"epoch": 0.03779289493575208, "step": 26, "batch_size": 64, "mean": 0.014275223016738892, "std": 0.30752044916152954, "min": -0.557159423828125, "p10": -0.4186378479003906, "median": 0.0348358154296875, "p90": 0.31678047180175783, "max": 0.9458465576171875, "pos_frac": 0.59375, "sample": [0.21667861938476562, -0.557159423828125, 0.0042572021484375, -0.5012664794921875, 0.1657867431640625, 0.11223411560058594, -0.28118896484375, 0.033050537109375, -0.4220619201660156, 0.5740928649902344, -0.05828094482421875, 0.61962890625, 0.0045013427734375, -0.4106483459472656, -0.3395233154296875, 0.03662109375, 0.3150634765625, 0.16447067260742188, 0.9458465576171875, 0.42815399169921875, -0.1659393310546875, -0.03396415710449219, -0.0945281982421875, -0.4290637969970703, -0.08661270141601562, -0.46610260009765625, 0.5377693176269531, -0.21196746826171875, 0.04435920715332031, 0.077178955078125, 0.039119720458984375, 0.8434429168701172, -0.18192291259765625, 0.15917587280273438, -0.02896881103515625, -0.361724853515625, 0.19054412841796875, 0.09094810485839844, -0.096435546875, 0.2735443115234375, 0.12657546997070312, -0.23044967651367188, 0.032878875732421875, -0.3271751403808594, 0.10528564453125, -0.5002365112304688, 0.0191497802734375, 0.2939643859863281, -0.5015716552734375, 0.020936965942382812, 0.13245391845703125, -0.2555389404296875, 0.083038330078125, 0.20105743408203125, 0.08177757263183594, -0.10526275634765625, 0.07007980346679688, -0.024480819702148438, 0.190704345703125, 0.06451797485351562, -0.2088470458984375, 0.09398078918457031, 0.08415031433105469, 0.3175163269042969], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000026.npy"}
{"epoch": 0.039304610733182165, "step": 27, "batch_size": 64, "mean": -0.004676610231399536, "std": 0.31378409266471863, "min": -1.08709716796875, "p10": -0.310589599609375, "median": 0.022478103637695312, "p90": 0.351641845703125, "max": 0.70208740234375, "pos_frac": 0.53125, "sample": [0.003692626953125, 0.220367431640625, -0.1210174560546875, -0.09243583679199219, 0.23650550842285156, -0.10717010498046875, -0.12656402587890625, -0.26419830322265625, -0.0370635986328125, -0.0614013671875, -0.25225830078125, -0.6738433837890625, 0.1083221435546875, 0.1875286102294922, -0.1902294158935547, -0.025543212890625, 0.11787033081054688, 0.3018798828125, -0.26674652099609375, -0.4740638732910156, 0.1487884521484375, 0.1041259765625, -0.0008544921875, -0.21620941162109375, 0.1665496826171875, -0.1399688720703125, -0.03929901123046875, -0.16981124877929688, 0.28182220458984375, -0.08431053161621094, -0.0947113037109375, 0.24982261657714844, 0.0151824951171875, -0.385009765625, 0.23800086975097656, -0.9142684936523438, 0.28700828552246094, 0.49786376953125, 0.35109710693359375, 0.3556480407714844, 0.162322998046875, -0.316009521484375, 0.24511337280273438, 0.3586158752441406, -0.13074111938476562, 0.055999755859375, 0.35187530517578125, 0.029773712158203125, 0.03168487548828125, 0.1743907928466797, -0.297943115234375, 0.08095169067382812, 0.11924362182617188, 0.3923492431640625, 0.3829479217529297, -0.2917518615722656, 0.70208740234375, -0.1780681610107422, -1.08709716796875, -0.21734237670898438, 0.1427764892578125, -0.5427780151367188, 0.19759750366210938, 0.1996002197265625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000027.npy"}
{"epoch": 0.04081632653061224, "step": 28, "batch_size": 64, "mean": -0.029662340879440308, "std": 0.3243613541126251, "min": -0.592620849609375, "p10": -0.43805694580078125, "median": 0.007140159606933594, "p90": 0.30887107849121104, "max": 1.05487060546875, "pos_frac": 0.515625, "sample": [0.17116546630859375, -0.547698974609375, 1.05487060546875, -0.4928131103515625, -0.2877960205078125, -0.09651947021484375, -0.592620849609375, -0.2221527099609375, 0.3272724151611328, 0.68157958984375, -0.10721206665039062, 0.31925201416015625, 0.0786285400390625, -0.48128509521484375, -0.254730224609375, -0.29601287841796875, -0.18269729614257812, -0.4424285888671875, 0.18993759155273438, 0.129791259765625, 0.36153411865234375, 0.0266265869140625, 0.074920654296875, -0.4278564453125, 0.2573070526123047, 0.11877822875976562, -0.1429271697998047, 0.2846488952636719, -0.5374298095703125, -0.274444580078125, -0.5491561889648438, -0.2663002014160156, 0.047092437744140625, 0.045806884765625, -0.04608917236328125, -0.00467681884765625, 0.004497528076171875, 0.019550323486328125, -0.204345703125, -0.4264373779296875, 0.18344688415527344, -0.191986083984375, -0.36373138427734375, 0.23979759216308594, 0.35404205322265625, -0.010114669799804688, 0.06843757629394531, 0.07840728759765625, -0.06336593627929688, -0.18474960327148438, 0.13067245483398438, 0.8305435180664062, 0.21855545043945312, 0.2175140380859375, 0.12158203125, 0.14947509765625, 0.03050994873046875, -0.27332305908203125, -0.3979949951171875, 0.009782791137695312, -0.31134796142578125, 0.1242828369140625, 0.1907672882080078, -0.359222412109375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000028.npy"}
{"epoch": 0.042328042328042326, "step": 29, "batch_size": 64, "mean": -0.01944562792778015, "std": 0.3498800992965698, "min": -0.932769775390625, "p10": -0.3974639892578125, "median": -0.045719146728515625, "p90": 0.3868770599365235, "max": 1.0959930419921875, "pos_frac": 0.453125, "sample": [-0.12075042724609375, -0.13010406494140625, 0.23452186584472656, -0.310821533203125, 0.6399383544921875, -0.15624046325683594, -0.3861846923828125, 0.022197723388671875, -0.6965408325195312, 0.1551361083984375, -0.09159278869628906, 0.1190338134765625, 0.125457763671875, -0.05332183837890625, 0.3926963806152344, 0.326934814453125, -0.932769775390625, -0.31429100036621094, -0.9300994873046875, 0.4655914306640625, -0.014774322509765625, -0.1682281494140625, 0.217498779296875, -0.1348590850830078, 0.09682464599609375, 0.018646240234375, -0.15507125854492188, -0.2564239501953125, 0.17437744140625, 0.006961822509765625, 0.046600341796875, -0.038116455078125, -0.2399139404296875, 0.07294082641601562, -0.464080810546875, -0.11362838745117188, -0.0236053466796875, 0.12749481201171875, -0.10205268859863281, -0.082275390625, -0.10894584655761719, -0.07470321655273438, 0.10778427124023438, 0.5977058410644531, -0.09668922424316406, -0.2579345703125, 0.48555755615234375, -0.4022979736328125, 0.03301239013671875, -0.11098289489746094, -0.177581787109375, 0.37329864501953125, 0.13015365600585938, 0.10985755920410156, 0.21134185791015625, -0.12493515014648438, -0.6230392456054688, 0.21897125244140625, -0.2835845947265625, 1.0959930419921875, -0.5463409423828125, -0.07111740112304688, 0.27388954162597656, 0.6689605712890625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000029.npy"}
{"epoch": 0.04383975812547241, "step": 30, "batch_size": 64, "mean": 0.011273384094238281, "std": 0.30534279346466064, "min": -0.84112548828125, "p10": -0.3669757843017578, "median": 0.025846481323242188, "p90": 0.3758007049560549, "max": 0.8270263671875, "pos_frac": 0.53125, "sample": [-0.04134559631347656, -0.03302192687988281, -0.42192840576171875, -0.06633758544921875, 0.17375946044921875, 0.12784576416015625, 0.1367340087890625, 0.1264190673828125, 0.1296234130859375, -0.6291351318359375, 0.23949050903320312, -0.303070068359375, 0.1347808837890625, 0.5702896118164062, -0.31531524658203125, -0.43011474609375, -0.20754241943359375, 0.6837158203125, -0.4951629638671875, -0.37796783447265625, -0.2478790283203125, 0.0072784423828125, -0.0239105224609375, 0.00406646728515625, 0.31313323974609375, -0.3227996826171875, -0.3607139587402344, -0.3326377868652344, 0.21321868896484375, 0.4297943115234375, 0.2430267333984375, 0.052066802978515625, -0.06563949584960938, 0.3213615417480469, 0.40081787109375, 0.20481109619140625, 0.8270263671875, 0.06558418273925781, 0.1731109619140625, 0.06293487548828125, 0.044414520263671875, 0.39913177490234375, 0.402618408203125, 0.04811859130859375, -0.22049331665039062, 0.27954864501953125, -0.031341552734375, -0.062046051025390625, 0.21918869018554688, -0.135284423828125, -0.12448501586914062, -0.127288818359375, 0.280975341796875, 0.07359695434570312, -0.07609176635742188, 0.24953651428222656, -0.19940567016601562, -0.03900146484375, -0.84112548828125, 0.06724357604980469, -0.0990447998046875, -0.369659423828125, -0.2947845458984375, 0.3108081817626953], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000030.npy"}
{"epoch": 0.045351473922902494, "step": 31, "batch_size": 64, "mean": 0.050272136926651, "std": 0.2845270037651062, "min": -0.44770050048828125, "p10": -0.3710367202758789, "median": 0.0392913818359375, "p90": 0.37747650146484374, "max": 0.6119537353515625, "pos_frac": 0.5625, "sample": [0.32572174072265625, 0.27936553955078125, 0.328643798828125, 0.30190277099609375, -0.26990318298339844, -0.395111083984375, -0.21378707885742188, 0.21036529541015625, -0.05742645263671875, -0.29758453369140625, -0.009366989135742188, 0.03900146484375, 0.5397262573242188, -0.1614990234375, 0.28154754638671875, 0.334716796875, 0.2288837432861328, 0.3579254150390625, 0.46044921875, -0.40915679931640625, 0.6119537353515625, 0.039581298828125, -0.1508331298828125, -0.35529136657714844, 0.069610595703125, 0.0170745849609375, 0.6021499633789062, -0.409942626953125, -0.3388671875, -0.37778472900390625, 0.3787078857421875, -0.44770050048828125, 0.01374053955078125, -0.0487518310546875, 0.04391288757324219, -0.15107345581054688, 0.200347900390625, -0.1571502685546875, -0.0023479461669921875, 0.013950347900390625, 0.37331390380859375, -0.40766143798828125, 0.11393356323242188, 0.4912872314453125, 0.374603271484375, 0.42543792724609375, -0.17829513549804688, -0.3291206359863281, 0.04300689697265625, 0.079864501953125, -0.1845245361328125, -0.40502166748046875, 0.3074493408203125, -0.10270500183105469, 0.237030029296875, -0.02365875244140625, 0.23148345947265625, -0.1056365966796875, 0.1417865753173828, -0.050079345703125, 0.32794189453125, 0.3679523468017578, 0.06547927856445312, -0.0021514892578125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000031.npy"}
{"epoch": 0.04686318972033258, "step": 32, "batch_size": 64, "mean": 0.044005244970321655, "std": 0.40330129861831665, "min": -0.7698135375976562, "p10": -0.40988998413085936, "median": 0.00513458251953125, "p90": 0.49845275878906264, "max": 1.2181854248046875, "pos_frac": 0.5, "sample": [-0.326507568359375, 0.4088134765625, -0.7698135375976562, 0.3560638427734375, -0.73602294921875, -0.3090629577636719, 0.54901123046875, -0.2293548583984375, 1.2181854248046875, 0.01329803466796875, 0.6926765441894531, 1.126129150390625, 0.40918731689453125, -0.0438385009765625, 0.21570396423339844, -0.263824462890625, -0.5630340576171875, 0.07888031005859375, -0.3807716369628906, -0.14337539672851562, 0.0773162841796875, 0.07501411437988281, 0.03272056579589844, 0.5154953002929688, 0.36992645263671875, -0.1250476837158203, 0.4188041687011719, 0.45868682861328125, -0.3323211669921875, 0.3930511474609375, 0.06653213500976562, 0.36341094970703125, 0.25078582763671875, -0.04604339599609375, 0.45436859130859375, 0.1437225341796875, -0.40335845947265625, -0.4434852600097656, 0.6877288818359375, 0.38495635986328125, -0.10849571228027344, 0.3039741516113281, -0.00302886962890625, -0.22417068481445312, -0.16223907470703125, 0.4212074279785156, -0.11864089965820312, -0.4910736083984375, -0.19651031494140625, 0.16860580444335938, -0.217926025390625, 0.5177879333496094, -0.3975067138671875, -0.06434822082519531, -0.20856857299804688, -0.09779739379882812, -0.47426605224609375, 0.35894012451171875, -0.37994956970214844, 0.20599365234375, -0.013927459716796875, 0.10955810546875, -0.34320068359375, -0.412689208984375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000032.npy"}
{"epoch": 0.04837490551776266, "step": 33, "batch_size": 64, "mean": 0.005504101514816284, "std": 0.40451520681381226, "min": -1.345306396484375, "p10": -0.4526691436767578, "median": 0.02974700927734375, "p90": 0.44549560546875, "max": 1.1068496704101562, "pos_frac": 0.515625, "sample": [-0.463470458984375, -0.782928466796875, 0.5365066528320312, -0.742706298828125, -0.18751144409179688, -0.3623199462890625, 0.7028427124023438, 0.18283462524414062, 0.20397186279296875, 0.44780731201171875, 0.1338348388671875, 0.4492149353027344, -0.2610359191894531, -0.39527320861816406, 1.1068496704101562, -0.4432563781738281, -0.0423126220703125, 0.06412506103515625, -0.02236175537109375, -0.0335693359375, 0.24872589111328125, 0.983154296875, 0.36095428466796875, -0.18135452270507812, -0.45670318603515625, 0.3301239013671875, -0.0099334716796875, 0.22721481323242188, 0.3995819091796875, -0.086029052734375, 0.27852630615234375, 0.1732959747314453, 0.46695709228515625, -0.029491424560546875, -0.053318023681640625, 0.05926513671875, 0.3413867950439453, -0.142364501953125, -0.17409706115722656, 0.12110137939453125, -0.3094024658203125, -0.08090972900390625, -0.0639190673828125, -0.31751251220703125, 0.03479766845703125, -0.676116943359375, 0.27285003662109375, 0.13532447814941406, 0.08033180236816406, 0.11050033569335938, -0.029356002807617188, -0.8202896118164062, 0.1266803741455078, -0.36974334716796875, -0.18075180053710938, 0.09437942504882812, 0.02469635009765625, 0.28519439697265625, -1.345306396484375, 0.0447845458984375, -0.07493209838867188, 0.44010162353515625, 0.10338783264160156, -0.0807647705078125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000033.npy"}
{"epoch": 0.049886621315192746, "step": 34, "batch_size": 64, "mean": 0.020601540803909302, "std": 0.32879605889320374, "min": -0.9585418701171875, "p10": -0.36212749481201173, "median": 0.06018543243408203, "p90": 0.38851451873779297, "max": 0.811981201171875, "pos_frac": 0.578125, "sample": [0.10190963745117188, 0.6015777587890625, 0.2820892333984375, 0.2563819885253906, -0.0750274658203125, -0.14899444580078125, -0.16675186157226562, 0.17897415161132812, 0.153228759765625, 0.03609657287597656, -0.02138519287109375, -0.663421630859375, 0.027587890625, 0.1280517578125, -0.367095947265625, -0.39662933349609375, 0.5328521728515625, 0.190093994140625, 0.13838958740234375, 0.39133262634277344, 0.06540679931640625, 0.0512237548828125, 0.5281906127929688, 0.0823211669921875, 0.038738250732421875, 0.293670654296875, 0.07729339599609375, 0.24324417114257812, -0.00270843505859375, -0.35053443908691406, -0.2719612121582031, -0.09547805786132812, -0.27294921875, 0.2159862518310547, 0.0878753662109375, 0.811981201171875, -0.009609222412109375, 0.30495452880859375, 0.21753692626953125, 0.07236862182617188, -0.16689300537109375, -0.23341751098632812, 0.16483688354492188, 0.2839374542236328, 0.170928955078125, 0.53851318359375, -0.6436386108398438, -0.2238006591796875, 0.12569427490234375, 0.18161773681640625, 0.05496406555175781, -0.8387069702148438, 0.24822235107421875, -0.9585418701171875, 0.3819389343261719, -0.5549354553222656, -0.06279754638671875, -0.11149215698242188, -0.15845298767089844, -0.07124710083007812, 0.4099273681640625, -0.019805908203125, -0.26483917236328125, -0.20032501220703125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000034.npy"}
{"epoch": 0.05139833711262283, "step": 35, "batch_size": 64, "mean": -0.002966582775115967, "std": 0.39351344108581543, "min": -0.7843017578125, "p10": -0.4485576629638672, "median": -0.036528587341308594, "p90": 0.5091278076171877, "max": 1.3257217407226562, "pos_frac": 0.46875, "sample": [0.6297683715820312, -0.5296173095703125, 0.5961952209472656, 0.3266448974609375, -0.3173656463623047, 1.3257217407226562, -0.4311370849609375, 0.08303070068359375, -0.0641021728515625, -0.2127704620361328, -0.224609375, -0.01612091064453125, -0.22112274169921875, 0.13353729248046875, -0.0026702880859375, 0.03370857238769531, 0.8871917724609375, -0.0921630859375, -0.400421142578125, -0.1200714111328125, 0.2009410858154297, 0.13078689575195312, 0.40312957763671875, -0.22595977783203125, -0.05693626403808594, -0.25437164306640625, -0.5488128662109375, -0.09194564819335938, -0.450286865234375, 0.39896392822265625, -0.513885498046875, 0.0667572021484375, -0.3460540771484375, -0.1604156494140625, -0.4445228576660156, -0.3339691162109375, -0.12634849548339844, -0.7843017578125, 0.16736984252929688, -0.7571754455566406, 0.207855224609375, 0.12627410888671875, 0.5634078979492188, -0.1483306884765625, 0.0658111572265625, 0.5287246704101562, 0.4264106750488281, 0.17378997802734375, -0.13485336303710938, 0.10224723815917969, -0.2286396026611328, -0.17246246337890625, 0.3362312316894531, -0.409515380859375, -0.467254638671875, 0.0898284912109375, 0.18914794921875, -0.39581298828125, 0.07349777221679688, 0.766082763671875, 0.19170570373535156, 0.46340179443359375, -0.25360107421875, 0.05960273742675781], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000035.npy"}
{"epoch": 0.05291005291005291, "step": 36, "batch_size": 64, "mean": -0.0010439753532409668, "std": 0.3825839161872864, "min": -0.830963134765625, "p10": -0.4177061080932617, "median": -0.0159912109375, "p90": 0.3989753723144532, "max": 1.494873046875, "pos_frac": 0.5, "sample": [-0.1202239990234375, 1.494873046875, -0.4758148193359375, -0.3007392883300781, -0.31281089782714844, -0.16551589965820312, 0.09654998779296875, 0.004489898681640625, 0.23559951782226562, 0.0468902587890625, -0.830963134765625, 0.23128890991210938, 0.24554443359375, -0.3010101318359375, 0.04470062255859375, 0.061527252197265625, 0.33982086181640625, -0.17133331298828125, 0.08551406860351562, 0.48311614990234375, -0.07269096374511719, 0.5157546997070312, 0.079925537109375, -0.04020500183105469, -0.392578125, -0.7811660766601562, -0.1266937255859375, -0.10584259033203125, 0.17202377319335938, -0.754608154296875, 0.21975326538085938, 0.202606201171875, 0.22027587890625, 0.0235137939453125, -0.647918701171875, -0.5928878784179688, -0.2582893371582031, 0.1322479248046875, -0.41412925720214844, -0.0589447021484375, -0.036472320556640625, 0.29705810546875, 0.5902252197265625, -0.40163421630859375, 0.004852294921875, 0.04444694519042969, 0.752166748046875, 0.2467193603515625, -0.08133697509765625, 0.5651130676269531, 0.40605926513671875, -0.2969169616699219, -0.1348705291748047, 0.1653900146484375, -0.0417022705078125, -0.08258056640625, 0.26615142822265625, 0.30702972412109375, -0.3082122802734375, -0.047992706298828125, -0.1847057342529297, 0.3824462890625, -0.4192390441894531, -0.07045936584472656], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000036.npy"}
{"epoch": 0.05442176870748299, "step": 37, "batch_size": 64, "mean": 0.1075165867805481, "std": 0.38432008028030396, "min": -0.6298599243164062, "p10": -0.3467241287231445, "median": 0.0882415771484375, "p90": 0.5327117919921878, "max": 1.2620391845703125, "pos_frac": 0.609375, "sample": [0.008758544921875, 0.6020660400390625, -0.001922607421875, -0.60296630859375, 0.3899993896484375, 0.47624969482421875, 0.05167198181152344, 0.08084487915039062, 0.035797119140625, 0.106536865234375, 1.25030517578125, 0.3934516906738281, 0.3247108459472656, 0.090576171875, -0.22593307495117188, -0.591552734375, 0.2616729736328125, 0.37381744384765625, 0.2746620178222656, 0.085906982421875, -0.10670280456542969, 0.13144874572753906, -0.3521881103515625, -0.04448699951171875, -0.33397483825683594, -0.49828338623046875, 0.11782073974609375, -0.18402862548828125, -0.19145965576171875, -0.15625, -0.3819732666015625, 0.16399383544921875, 0.7451095581054688, 0.55511474609375, 1.2620391845703125, 0.0958251953125, -0.19766616821289062, 0.010297775268554688, 0.12025833129882812, 0.18665313720703125, 0.9212646484375, -0.02291107177734375, 0.3711891174316406, 0.38971710205078125, -0.6298599243164062, -0.09140396118164062, -0.4957122802734375, 0.177581787109375, -0.0157470703125, 0.283660888671875, 0.12239456176757812, -0.06113433837890625, 0.038974761962890625, -0.254364013671875, -0.01042938232421875, 0.42462921142578125, -0.245330810546875, 0.2446746826171875, -0.23895263671875, 0.13259124755859375, 0.480438232421875, 0.45783233642578125, -0.04681587219238281, 0.6225738525390625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000037.npy"}
{"epoch": 0.055933484504913075, "step": 38, "batch_size": 64, "mean": 0.041731446981430054, "std": 0.38410642743110657, "min": -0.8486404418945312, "p10": -0.409146499633789, "median": 0.04220390319824219, "p90": 0.4673442840576174, "max": 1.0817604064941406, "pos_frac": 0.53125, "sample": [-0.12638473510742188, 0.33647727966308594, 0.34615325927734375, 1.0817604064941406, -0.19506072998046875, 0.4891395568847656, 0.1721668243408203, 0.21805191040039062, -0.26351165771484375, -0.4279937744140625, -0.03864097595214844, 0.2681159973144531, 0.12258338928222656, 0.006946563720703125, -0.01894378662109375, -0.16956329345703125, -0.33473968505859375, -0.1671123504638672, -0.49896240234375, 0.05271148681640625, -0.06432342529296875, -0.2912750244140625, -0.5861282348632812, -0.8486404418945312, 0.1082916259765625, 0.076873779296875, 0.0416259765625, -0.44512176513671875, -0.13639068603515625, -0.21703338623046875, -0.22402191162109375, -0.3042449951171875, 0.12732696533203125, 0.15147781372070312, 0.885223388671875, 0.4164886474609375, 0.220001220703125, -0.4599571228027344, 0.0618743896484375, 0.08675575256347656, 0.6864051818847656, -0.35442352294921875, 0.3633766174316406, 0.3784523010253906, 0.15836334228515625, -0.247650146484375, -0.3651695251464844, -0.45696258544921875, 0.7216072082519531, 0.9591064453125, 0.1370086669921875, 0.3606414794921875, 0.9991607666015625, -0.3242340087890625, -0.10387039184570312, 0.06602287292480469, -0.09064483642578125, -0.07930755615234375, 0.13437652587890625, -0.14572906494140625, -0.013256072998046875, 0.31976318359375, 0.042781829833984375, 0.072998046875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000038.npy"}
{"epoch": 0.05744520030234316, "step": 39, "batch_size": 64, "mean": 0.056430280208587646, "std": 0.37973517179489136, "min": -1.196075439453125, "p10": -0.4471668243408202, "median": 0.041899681091308594, "p90": 0.5097763061523438, "max": 0.8560638427734375, "pos_frac": 0.578125, "sample": [0.6197242736816406, 0.04772186279296875, 0.6865234375, 0.35633277893066406, -0.5619354248046875, 0.0027923583984375, 0.48040008544921875, 0.3780536651611328, -0.21678543090820312, 0.15139389038085938, 0.21360015869140625, -0.03604888916015625, 0.4441699981689453, -0.035564422607421875, 0.119598388671875, 0.3125896453857422, -0.020486831665039062, 0.17297744750976562, 0.3414802551269531, 0.5120697021484375, -0.3433647155761719, -0.4916534423828125, -0.15117263793945312, -0.09035110473632812, 0.367706298828125, 0.055515289306640625, -1.196075439453125, 0.2754535675048828, 0.00275421142578125, 0.22111892700195312, 0.20701217651367188, 0.4799957275390625, -0.3229217529296875, -0.0032196044921875, 0.12431716918945312, 0.5181159973144531, 0.171539306640625, -0.03025054931640625, 0.032756805419921875, -0.24622726440429688, 0.8560638427734375, -0.914581298828125, -0.14005279541015625, 0.55438232421875, 0.504425048828125, 0.00479888916015625, -0.5737152099609375, -0.610260009765625, -0.09415054321289062, 0.16817665100097656, 0.37567901611328125, -0.5508575439453125, -0.11830520629882812, -0.11460494995117188, -0.03274726867675781, 0.648193359375, 0.03607749938964844, -0.2127532958984375, 0.07437896728515625, -0.2856101989746094, 0.3350982666015625, 0.225189208984375, -0.0022029876708984375, -0.07073974609375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000039.npy"}
{"epoch": 0.05895691609977324, "step": 40, "batch_size": 64, "mean": 0.04981803894042969, "std": 0.2986147701740265, "min": -0.878753662109375, "p10": -0.3250648498535156, "median": 0.060563087463378906, "p90": 0.3924787521362308, "max": 0.6322021484375, "pos_frac": 0.578125, "sample": [-0.01355743408203125, 0.6322021484375, 0.2035980224609375, 0.15600967407226562, -0.0744476318359375, 0.06140327453613281, 0.3046760559082031, -0.14598846435546875, 0.2786712646484375, 0.5707130432128906, 0.253021240234375, 0.13135528564453125, 0.1920013427734375, -0.14245986938476562, 0.2232227325439453, -0.03446006774902344, 0.0185546875, 0.29155731201171875, 0.49538421630859375, 0.05904579162597656, 0.22500991821289062, 0.428314208984375, 0.1914825439453125, 0.3086814880371094, -0.21346282958984375, -0.10422515869140625, 0.20848846435546875, -0.003734588623046875, 0.43671417236328125, -0.1576385498046875, 0.08439254760742188, -0.5433197021484375, 0.615509033203125, -0.3825492858886719, 0.24433517456054688, 0.08412551879882812, 0.23676300048828125, 0.4587841033935547, 0.29926300048828125, 0.12420654296875, 0.021955490112304688, -0.18695068359375, 0.30886268615722656, -0.3779144287109375, -0.1752471923828125, -0.013286590576171875, 0.015478134155273438, -0.17130661010742188, -0.28716278076171875, -0.082061767578125, 0.282073974609375, -0.34130859375, 0.26412200927734375, -0.0131683349609375, -0.0792999267578125, -0.6889610290527344, -0.878753662109375, -0.555908203125, -0.1229400634765625, 0.059722900390625, 0.23981666564941406, -0.05962181091308594, 0.202178955078125, -0.17360687255859375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000040.npy"}
{"epoch": 0.06046863189720333, "step": 41, "batch_size": 64, "mean": 0.03554290533065796, "std": 0.3952210247516632, "min": -1.2379913330078125, "p10": -0.386163330078125, "median": 0.03942298889160156, "p90": 0.4590553283691407, "max": 1.0165252685546875, "pos_frac": 0.546875, "sample": [0.329315185546875, 0.06671142578125, -0.0625, -0.23381805419921875, 0.2142181396484375, -0.5375823974609375, -0.1643238067626953, 0.5160026550292969, 0.28772735595703125, -0.33739471435546875, 0.40007781982421875, 3.814697265625e-06, -0.1952667236328125, 0.6399688720703125, -0.1221466064453125, 0.3899993896484375, -0.368133544921875, 1.0165252685546875, 0.10565948486328125, -0.06253623962402344, 0.36632537841796875, -1.2379913330078125, -0.5568885803222656, -0.24214935302734375, 0.2165660858154297, -0.1883544921875, -0.2174224853515625, 0.43204307556152344, 0.1976776123046875, -0.6515960693359375, 0.008214950561523438, 0.7225799560546875, 0.4677276611328125, 0.39781951904296875, -0.21327781677246094, -0.050567626953125, 0.20140457153320312, 0.16767120361328125, 0.20433807373046875, -0.13448715209960938, 0.036136627197265625, -0.7588729858398438, -0.156646728515625, -0.05682945251464844, 0.376220703125, -0.2344684600830078, 0.2994804382324219, -0.1079559326171875, 0.564666748046875, 0.0427093505859375, -0.20590591430664062, -0.768524169921875, 0.22196197509765625, -0.33895111083984375, 0.2751960754394531, 0.2783355712890625, 0.41051483154296875, 0.158782958984375, 0.6245517730712891, 0.16998291015625, -0.06882476806640625, 0.43881988525390625, -0.30388450622558594, -0.393890380859375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000041.npy"}
{"epoch": 0.06198034769463341, "step": 42, "batch_size": 64, "mean": 0.0012210756540298462, "std": 0.41268596053123474, "min": -0.9726181030273438, "p10": -0.4393409729003906, "median": -0.006907463073730469, "p90": 0.5349000930786133, "max": 0.8854293823242188, "pos_frac": 0.484375, "sample": [0.7736053466796875, 0.88140869140625, 0.10776519775390625, -0.05220794677734375, -0.42626190185546875, -0.39096832275390625, 0.16345596313476562, -0.0677328109741211, -0.5858917236328125, 0.044952392578125, -0.22011566162109375, 0.3217315673828125, 0.015186309814453125, -0.056549072265625, -0.41271209716796875, 0.5900154113769531, 0.7628936767578125, 0.195709228515625, 0.03996849060058594, 0.06591415405273438, 0.17092132568359375, 0.8854293823242188, -0.02986907958984375, -0.2675971984863281, -0.880035400390625, -0.36066436767578125, 0.2807655334472656, -0.25421142578125, -0.23259544372558594, -0.3189525604248047, 0.5408058166503906, 0.5211200714111328, -0.15349578857421875, -0.6775131225585938, 0.3753662109375, 0.2564697265625, -0.02893829345703125, 0.0706787109375, -0.40518951416015625, -0.35411834716796875, -0.22365570068359375, -0.8568649291992188, -0.9726181030273438, -0.1517181396484375, 0.08663177490234375, -0.4449462890625, 0.4904632568359375, -0.336669921875, 0.09964370727539062, -0.0818023681640625, 0.39981842041015625, 0.5150642395019531, -0.0040435791015625, 0.09490203857421875, 0.32155418395996094, 0.05518341064453125, -0.5762100219726562, 0.05973243713378906, 0.6453018188476562, 0.43505859375, -0.05731010437011719, -0.009771347045898438, -0.11910247802734375, -0.17903518676757812], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000042.npy"}
{"epoch": 0.06349206349206349, "step": 43, "batch_size": 64, "mean": 0.04852169752120972, "std": 0.36693528294563293, "min": -0.7471237182617188, "p10": -0.48790454864501953, "median": 0.05782318115234375, "p90": 0.4485282897949221, "max": 0.871978759765625, "pos_frac": 0.5625, "sample": [0.49562835693359375, 0.21321487426757812, -0.49090576171875, -0.11913681030273438, -0.3109130859375, -0.26639556884765625, -0.11859130859375, 0.0369720458984375, 0.0134429931640625, -0.6931533813476562, 0.3268413543701172, 0.28939056396484375, 0.0019683837890625, -0.2421417236328125, 0.38763427734375, 0.5752105712890625, 0.32431602478027344, -0.028751373291015625, 0.23487091064453125, -0.5158462524414062, 0.2601470947265625, 0.1626739501953125, 0.297454833984375, -0.10042572021484375, 0.35044097900390625, 0.173858642578125, -0.052623748779296875, -0.34902191162109375, 0.04156494140625, 0.1543426513671875, 0.3486175537109375, -0.1981201171875, 0.21821212768554688, -0.5841445922851562, -0.19174766540527344, 0.73394775390625, -0.111053466796875, -0.14165306091308594, 0.39942169189453125, 0.8164749145507812, -0.1529693603515625, -0.2857208251953125, -0.0503997802734375, -0.7471237182617188, 0.2642536163330078, 0.871978759765625, -0.04498291015625, 0.469573974609375, -0.25821495056152344, 0.3863792419433594, 0.32917213439941406, 0.0740814208984375, 0.21637725830078125, -0.0806121826171875, 0.0980682373046875, 0.151947021484375, -0.48090171813964844, 0.813507080078125, 0.3120269775390625, 0.10272216796875, -0.651641845703125, -0.0630950927734375, 0.08072662353515625, -0.5917854309082031], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000043.npy"}
{"epoch": 0.06500377928949358, "step": 44, "batch_size": 64, "mean": 0.05146560072898865, "std": 0.43708229064941406, "min": -1.0123214721679688, "p10": -0.5158199310302735, "median": 0.04706287384033203, "p90": 0.5406135559082033, "max": 1.50128173828125, "pos_frac": 0.546875, "sample": [-0.5128746032714844, -0.199676513671875, 0.16888427734375, 0.06252098083496094, 0.7807998657226562, -0.11798095703125, 0.0950469970703125, -0.4217529296875, -0.7431259155273438, -0.6009979248046875, -0.3633766174316406, 0.511138916015625, -0.1974945068359375, 0.1405181884765625, 0.24951934814453125, -0.23122406005859375, -0.2330760955810547, 0.25090980529785156, 0.22929000854492188, -0.045146942138671875, -0.09656143188476562, 0.3819541931152344, 0.14891433715820312, -0.33965301513671875, -0.10949325561523438, -0.0405731201171875, 0.640106201171875, -0.5170822143554688, 0.4436798095703125, -0.03044891357421875, 0.381134033203125, -0.1891021728515625, 0.32906341552734375, 0.5800018310546875, 0.014598846435546875, 0.3490142822265625, 1.50128173828125, 0.2877922058105469, -0.1855621337890625, -0.555450439453125, -0.049686431884765625, 0.15972900390625, 0.3621368408203125, -0.310791015625, 0.40891265869140625, 0.38199615478515625, -0.111114501953125, 0.6306915283203125, -0.32735443115234375, 0.04545021057128906, -1.0123214721679688, 0.31659698486328125, 0.11006927490234375, 0.0554962158203125, -0.555755615234375, -0.14658355712890625, 0.5532455444335938, 1.1945953369140625, 0.4147624969482422, 0.1971588134765625, 0.048675537109375, 0.0257568359375, -0.27558135986328125, -0.6378021240234375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000044.npy"}
{"epoch": 0.06651549508692366, "step": 45, "batch_size": 64, "mean": 0.06958654522895813, "std": 0.38374876976013184, "min": -1.034912109375, "p10": -0.41257781982421876, "median": 0.07933807373046875, "p90": 0.4695953369140627, "max": 1.1434783935546875, "pos_frac": 0.625, "sample": [-0.010105133056640625, -0.30970001220703125, 0.3147430419921875, 0.4286651611328125, -0.043605804443359375, -0.517791748046875, -0.20432472229003906, -0.21710205078125, -0.5966949462890625, -0.5734024047851562, 0.645904541015625, 0.4871368408203125, 0.18030166625976562, 0.1812896728515625, 0.01564788818359375, 0.024181365966796875, 0.27689552307128906, -0.419036865234375, 0.03226470947265625, 0.080352783203125, 0.6128692626953125, -0.1622161865234375, 0.2214202880859375, 0.20902442932128906, 0.14782333374023438, 1.1434783935546875, 0.3707275390625, 0.16040611267089844, 0.3470001220703125, 0.261474609375, 0.3570098876953125, 0.492431640625, -0.972686767578125, 0.18638992309570312, 0.3685264587402344, 0.97467041015625, -0.05533409118652344, 0.12058258056640625, 0.0783233642578125, 0.3571014404296875, -1.034912109375, -0.4329071044921875, -0.12510299682617188, 0.1114044189453125, 0.05535888671875, 0.013275146484375, -0.05078125, -0.24534034729003906, -0.0308837890625, 0.2947998046875, -0.3975067138671875, 0.09649658203125, 0.38385772705078125, -0.0341339111328125, 0.11336898803710938, 0.07086944580078125, 0.8386688232421875, -0.057231903076171875, 0.11109542846679688, -0.1288928985595703, 0.058483123779296875, -0.1500244140625, 0.1264801025390625, -0.12754440307617188], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000045.npy"}
{"epoch": 0.06802721088435375, "step": 46, "batch_size": 64, "mean": 0.12216731905937195, "std": 0.40575912594795227, "min": -0.8578643798828125, "p10": -0.380401611328125, "median": 0.08113288879394531, "p90": 0.6506282806396485, "max": 1.146240234375, "pos_frac": 0.65625, "sample": [-0.26983642578125, -0.6029129028320312, -0.35308837890625, -0.0935821533203125, 0.4688720703125, 0.0501861572265625, 0.151031494140625, -0.6477203369140625, -0.10587310791015625, -0.13946533203125, 0.5516109466552734, 0.3630180358886719, 0.223602294921875, 0.6820144653320312, -0.1341400146484375, -0.18065261840820312, -0.41844940185546875, 0.15155029296875, 0.17755508422851562, 0.3324737548828125, 0.3214244842529297, 1.07757568359375, 0.04075050354003906, 0.5342178344726562, 0.33957672119140625, 0.5732421875, 0.2147216796875, 0.5562210083007812, 0.06237030029296875, 0.09423828125, -0.35417938232421875, 0.29268646240234375, 0.0058746337890625, 0.03802490234375, -0.22333908081054688, 0.6453704833984375, 0.01433563232421875, 0.3064556121826172, -0.3286094665527344, 0.6528816223144531, 0.02265167236328125, -0.39163970947265625, 0.4784088134765625, 0.06802749633789062, 0.8074378967285156, 0.03778839111328125, -0.12425994873046875, -0.4019775390625, -0.19503402709960938, 0.20411300659179688, 0.14415359497070312, -0.02309417724609375, -0.12073898315429688, 0.16039657592773438, 0.66400146484375, 0.927978515625, -0.8578643798828125, 0.04187774658203125, -0.3940105438232422, 1.146240234375, 0.1378021240234375, 0.10146141052246094, -0.20804786682128906, 0.5230026245117188], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000046.npy"}
{"epoch": 0.06953892668178382, "step": 47, "batch_size": 64, "mean": 0.17277857661247253, "std": 0.38665497303009033, "min": -0.7733078002929688, "p10": -0.3947093963623047, "median": 0.2532844543457031, "p90": 0.6305286407470704, "max": 0.806304931640625, "pos_frac": 0.640625, "sample": [0.31296730041503906, 0.6226272583007812, -0.01421356201171875, -0.6701278686523438, -0.01503753662109375, 0.33767127990722656, -0.10773086547851562, 0.708251953125, -0.008192062377929688, 0.100128173828125, -0.1443309783935547, -0.1986255645751953, 0.2797698974609375, -0.16440582275390625, 0.03659820556640625, 0.24657440185546875, 0.5335369110107422, -0.26522064208984375, 0.224517822265625, 0.07928466796875, -0.029552459716796875, -0.47149658203125, -0.2263031005859375, 0.21149444580078125, -0.08795547485351562, 0.7667999267578125, 0.27997398376464844, 0.3887290954589844, -0.2548484802246094, 0.5002784729003906, -0.27210426330566406, 0.7843780517578125, 0.23069381713867188, 0.4031486511230469, -0.549530029296875, 0.5506362915039062, 0.7938613891601562, 0.4326629638671875, 0.5191688537597656, 0.69488525390625, 0.211090087890625, 0.5230026245117188, -0.085174560546875, 0.45642852783203125, 0.5668067932128906, 0.4160919189453125, 0.806304931640625, 0.27826690673828125, 0.4459877014160156, 0.4671630859375, -0.7733078002929688, 0.10138702392578125, -0.41040802001953125, -0.40167999267578125, -0.4915142059326172, 0.3411865234375, -0.01491546630859375, 0.6339149475097656, 0.2914314270019531, 0.3377571105957031, 0.3456764221191406, 0.5718193054199219, 0.2599945068359375, -0.3784446716308594], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000047.npy"}
{"epoch": 0.0710506424792139, "step": 48, "batch_size": 64, "mean": 0.022846579551696777, "std": 0.47910425066947937, "min": -0.8838882446289062, "p10": -0.5133781433105469, "median": -0.04696083068847656, "p90": 0.7184341430664072, "max": 1.3555984497070312, "pos_frac": 0.484375, "sample": [0.027318954467773438, -0.093902587890625, 0.3332252502441406, 0.1089019775390625, 0.098388671875, 0.9492225646972656, -0.1614990234375, 0.4000701904296875, -0.4906005859375, -0.10712814331054688, 0.4227294921875, 0.23248863220214844, -0.3692779541015625, -0.6463470458984375, -0.07416534423828125, 0.875701904296875, 0.30816650390625, 0.07299995422363281, -0.7348403930664062, 0.8130264282226562, -0.024261474609375, -0.2946434020996094, 0.46213531494140625, 0.44101715087890625, 0.46988677978515625, 0.33710479736328125, 0.0922393798828125, 0.3055763244628906, -0.430450439453125, 1.007781982421875, -0.367523193359375, -0.182403564453125, -0.29064178466796875, -0.06966018676757812, 0.052581787109375, 0.49771881103515625, -0.5231399536132812, -0.09694290161132812, 1.0025386810302734, -0.4571552276611328, -0.2180500030517578, -0.3160858154296875, -0.146484375, -0.08231353759765625, -0.8838882446289062, -0.401092529296875, 0.025045394897460938, 0.2913475036621094, -0.565826416015625, -0.2350788116455078, 0.065185546875, -0.08460235595703125, -0.3839302062988281, 0.07270240783691406, -0.6182346343994141, 0.08600044250488281, 1.3555984497070312, -0.14827537536621094, -0.6045684814453125, -0.4546852111816406, 0.012941360473632812, 1.2032318115234375, 0.07285881042480469, -0.47585296630859375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000048.npy"}
{"epoch": 0.07256235827664399, "step": 49, "batch_size": 64, "mean": 0.06817954778671265, "std": 0.4877556264400482, "min": -1.274658203125, "p10": -0.5827156066894531, "median": 0.1327524185180664, "p90": 0.5201179504394531, "max": 2.0048828125, "pos_frac": 0.703125, "sample": [0.332550048828125, -0.39557647705078125, 0.1466808319091797, -0.0185394287109375, 0.24956512451171875, -0.11777496337890625, -0.12044906616210938, 0.06204795837402344, 0.2719287872314453, 0.04730987548828125, 0.1677074432373047, 0.047443389892578125, 0.607879638671875, 0.5077285766601562, 0.1511821746826172, -0.3175811767578125, -0.58660888671875, -0.07892608642578125, 0.07514190673828125, -1.274658203125, 0.14950942993164062, 0.0121917724609375, 0.05842399597167969, 0.321075439453125, 0.16314315795898438, -0.11418914794921875, -0.182403564453125, 0.05410194396972656, 0.23607826232910156, -0.5736312866210938, 0.1945476531982422, 0.18616104125976562, -0.3988800048828125, 0.4141368865966797, 0.021917343139648438, 0.62506103515625, 0.09879302978515625, 0.35581207275390625, 0.5174102783203125, 1.1158065795898438, 0.1754150390625, 0.03045654296875, 0.2598991394042969, 0.248809814453125, -0.6518783569335938, 0.20279693603515625, 0.2570838928222656, 0.5376815795898438, 0.22730255126953125, 0.0763092041015625, 0.247833251953125, 0.5212783813476562, -0.8814239501953125, -0.1474609375, -0.4739494323730469, -0.6644439697265625, 0.11882400512695312, 2.0048828125, 0.6107330322265625, 0.15217018127441406, -0.8224411010742188, 0.05028533935546875, 0.3624114990234375, -1.0912017822265625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000049.npy"}
{"epoch": 0.07407407407407407, "step": 50, "batch_size": 64, "mean": 0.1104932427406311, "std": 0.4204116761684418, "min": -0.9023590087890625, "p10": -0.4098323822021484, "median": 0.10402679443359375, "p90": 0.6616226196289063, "max": 1.0756301879882812, "pos_frac": 0.59375, "sample": [0.6677932739257812, 0.3626861572265625, -0.0140228271484375, 0.3957099914550781, 0.01334381103515625, 0.22088623046875, 0.3234519958496094, -0.5159683227539062, 1.0756301879882812, 0.3676910400390625, 0.3039360046386719, -0.19899749755859375, -0.47894287109375, -0.03330230712890625, -0.4761390686035156, -0.31536865234375, 0.2881927490234375, 0.6008567810058594, 0.5676727294921875, 0.8986129760742188, 0.489410400390625, 0.05530548095703125, -0.007970809936523438, -0.23586273193359375, -0.1049652099609375, 0.677093505859375, -0.3872337341308594, 0.3681449890136719, 0.029476165771484375, -0.1995849609375, 0.6472244262695312, 0.16072463989257812, 0.28165435791015625, 0.6012744903564453, -0.7156982421875, 0.13123321533203125, 0.42653656005859375, 0.9010391235351562, 0.1330394744873047, -0.092987060546875, 0.8326950073242188, 0.9188079833984375, -0.19849014282226562, -0.3833770751953125, 0.0167388916015625, 0.06487274169921875, 0.27291107177734375, -0.13616180419921875, -0.3402099609375, 0.3958740234375, 0.2088756561279297, -0.26839637756347656, -0.17177772521972656, 0.09597015380859375, 0.4148597717285156, -0.41951751708984375, -0.9023590087890625, 0.11208343505859375, -0.14263153076171875, -0.4395885467529297, 0.2596549987792969, -0.10527801513671875, 0.1264629364013672, -0.35202789306640625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000050.npy"}
{"epoch": 0.07558578987150416, "step": 51, "batch_size": 64, "mean": 0.12510016560554504, "std": 0.5251401662826538, "min": -1.0513076782226562, "p10": -0.4841999053955078, "median": 0.09212875366210938, "p90": 0.7964683532714848, "max": 1.6484375, "pos_frac": 0.578125, "sample": [0.906768798828125, 0.04475975036621094, -0.20807647705078125, -0.10601806640625, -0.14162445068359375, 0.28830718994140625, 0.059112548828125, 0.25955963134765625, -0.5521259307861328, -0.30812835693359375, 0.2711372375488281, 0.3822898864746094, -0.4605064392089844, 0.8781661987304688, -0.10640716552734375, -0.494354248046875, 0.33263397216796875, 0.4800567626953125, 0.551177978515625, 0.6927490234375, 0.5097808837890625, 0.3171520233154297, 0.4217796325683594, -0.1260833740234375, 0.04093170166015625, -0.652008056640625, -0.5218429565429688, -0.5997352600097656, 0.12247276306152344, 0.2758331298828125, 0.591796875, 0.5367431640625, -0.010223388671875, 0.2151947021484375, -1.0513076782226562, -0.18486785888671875, 0.2531394958496094, -0.35245513916015625, -0.3359527587890625, -0.0687713623046875, -0.132843017578125, -0.24786376953125, 0.3524208068847656, 1.6484375, 0.15962982177734375, 0.11870574951171875, -0.909088134765625, 1.57293701171875, 0.0255279541015625, 0.8409194946289062, 0.5004634857177734, 1.1627349853515625, 0.4323921203613281, -0.3706207275390625, 0.28165435791015625, -0.3124275207519531, -0.17608642578125, 1.1961212158203125, -0.10525131225585938, -0.416717529296875, -0.12299346923828125, 0.0655517578125, 0.15450668334960938, 0.13724517822265625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000051.npy"}
{"epoch": 0.07709750566893424, "step": 52, "batch_size": 64, "mean": 0.20921196043491364, "std": 0.5971322059631348, "min": -1.0050048828125, "p10": -0.6165618896484373, "median": 0.14728641510009766, "p90": 0.9068817138671879, "max": 2.471771240234375, "pos_frac": 0.65625, "sample": [1.5595703125, -0.7123565673828125, 0.5904083251953125, 0.741241455078125, 0.6008834838867188, -0.3017578125, 0.46539306640625, -0.01056671142578125, -0.110748291015625, 0.6553726196289062, 1.292083740234375, 0.7932968139648438, -0.0230255126953125, 0.980621337890625, 0.8055572509765625, -0.7138900756835938, 0.5532455444335938, -0.04638671875, 0.07916641235351562, -0.16136932373046875, -0.8620719909667969, -0.22896575927734375, 0.4478759765625, 0.3017082214355469, 0.2403564453125, 0.6152801513671875, -0.1851348876953125, 0.07041740417480469, -0.2652740478515625, 0.14911651611328125, 0.14545631408691406, -0.783782958984375, 0.1762847900390625, -0.2254467010498047, 0.122894287109375, 1.1025314331054688, 0.12282562255859375, 0.3342399597167969, 0.130859375, -0.343475341796875, 0.17411231994628906, 0.6202316284179688, 0.0259857177734375, 0.2477855682373047, -0.03150749206542969, 0.825897216796875, 0.014937400817871094, 0.08463287353515625, 2.471771240234375, -0.7194976806640625, 1.0084724426269531, -0.70623779296875, -0.407318115234375, 0.6326103210449219, 0.22509002685546875, -1.0050048828125, 0.94158935546875, -0.028476715087890625, 0.1434326171875, 0.3086357116699219, -0.08652114868164062, 0.18286705017089844, 0.17689895629882812, 0.18674468994140625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000052.npy"}
{"epoch": 0.07860922146636433, "step": 53, "batch_size": 64, "mean": 0.19459694623947144, "std": 0.4596414864063263, "min": -1.1073379516601562, "p10": -0.32976264953613277, "median": 0.21790313720703125, "p90": 0.7867462158203126, "max": 1.29119873046875, "pos_frac": 0.6875, "sample": [0.2718849182128906, 0.9736175537109375, 0.660888671875, 0.12404632568359375, 0.1563243865966797, 0.2952384948730469, 0.6451797485351562, 0.18453216552734375, 0.5223846435546875, -0.49393463134765625, 0.235870361328125, -0.003997802734375, -0.29425048828125, 0.41245079040527344, 0.07738494873046875, -0.09608078002929688, 0.15684127807617188, 1.29119873046875, 0.07487297058105469, -0.2313385009765625, 0.06219482421875, 0.1377105712890625, 0.3731842041015625, 0.5545921325683594, -1.1073379516601562, -0.02324676513671875, -0.5018844604492188, 0.3600006103515625, 0.40544891357421875, 0.24868392944335938, 0.3845176696777344, -0.946685791015625, -0.15658187866210938, -0.0690155029296875, 0.1999359130859375, 0.743560791015625, 0.24065780639648438, 1.2729301452636719, -0.10040473937988281, -0.04807281494140625, -0.38899803161621094, -0.2177886962890625, 0.8075065612792969, -0.6418609619140625, 0.8086929321289062, 0.39373779296875, 0.4258575439453125, 0.239959716796875, 0.7299385070800781, 0.884521484375, 0.29518890380859375, 0.3999595642089844, 0.024580001831054688, 0.794464111328125, 0.33251190185546875, -0.22838973999023438, 0.76873779296875, -0.2644996643066406, 0.34954833984375, -0.08044815063476562, 0.07039642333984375, 0.26473236083984375, -0.3449821472167969, 0.03753662109375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000053.npy"}
{"epoch": 0.0801209372637944, "step": 54, "batch_size": 64, "mean": 0.19555151462554932, "std": 0.541435718536377, "min": -0.9209098815917969, "p10": -0.4622291564941406, "median": 0.17998313903808594, "p90": 0.8175607681274414, "max": 2.243560791015625, "pos_frac": 0.609375, "sample": [0.3922691345214844, 1.34478759765625, 0.329559326171875, 0.2724456787109375, -0.18758392333984375, -0.25208282470703125, 0.6814746856689453, 0.46855926513671875, 0.61090087890625, -0.48172760009765625, -0.22810745239257812, 0.001575469970703125, -0.1610260009765625, -0.778472900390625, -0.0027637481689453125, 0.06594467163085938, 0.84771728515625, 0.3944854736328125, 0.17404937744140625, -0.6395797729492188, 0.3173370361328125, 2.243560791015625, 0.179962158203125, 0.7474040985107422, -0.09527587890625, 0.7314548492431641, 0.3344764709472656, -0.0177154541015625, -0.0134735107421875, -0.2013397216796875, 0.0903778076171875, 0.026020050048828125, -0.4167327880859375, -0.3414878845214844, 0.5455970764160156, 0.6369171142578125, 0.33997344970703125, 0.6405868530273438, 1.0729866027832031, 0.8282585144042969, 0.13383865356445312, -0.10181808471679688, 0.18000411987304688, 0.8159866333007812, -0.3435554504394531, 0.41449546813964844, 0.2626304626464844, 0.1958770751953125, -0.07447052001953125, 0.3338165283203125, 0.649169921875, -0.38994598388671875, 0.33567047119140625, 0.34552001953125, 0.8182353973388672, -0.06516647338867188, -0.29889678955078125, -0.56695556640625, 1.0306587219238281, -0.5442695617675781, -0.586944580078125, -0.0573272705078125, 0.44834136962890625, -0.9209098815917969], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000054.npy"}
{"epoch": 0.08163265306122448, "step": 55, "batch_size": 64, "mean": 0.31402069330215454, "std": 0.6525632739067078, "min": -0.7774505615234375, "p10": -0.47425975799560544, "median": 0.2214031219482422, "p90": 1.2537948608398446, "max": 2.04254150390625, "pos_frac": 0.6875, "sample": [1.0637664794921875, 0.278472900390625, 0.0349273681640625, -0.06612396240234375, 0.01285552978515625, 0.105438232421875, 0.5197219848632812, -0.29332733154296875, -0.5081844329833984, -0.3398323059082031, 0.40045928955078125, 1.01995849609375, -0.7774505615234375, 0.39571380615234375, 0.08139801025390625, 2.0376358032226562, 0.890777587890625, 0.8266067504882812, 0.4268035888671875, -0.30503082275390625, 0.4353599548339844, -0.49310302734375, 0.4950065612792969, 0.16608810424804688, 1.86260986328125, 0.5518550872802734, 0.5931472778320312, 1.335235595703125, 0.015350341796875, 0.841094970703125, 0.1392345428466797, 0.7241973876953125, 0.424224853515625, -0.4497852325439453, -0.3278045654296875, -0.168304443359375, 0.15682315826416016, 0.3873138427734375, 0.2767181396484375, 0.14567184448242188, 0.40906524658203125, 0.8803977966308594, -0.18103790283203125, 1.3991546630859375, 0.47894287109375, 0.09807586669921875, 1.4875640869140625, -0.4430999755859375, -0.3749237060546875, -0.5354766845703125, 0.1547393798828125, 0.3884162902832031, -0.5234451293945312, -0.16754913330078125, -0.411834716796875, 0.9195823669433594, 0.526214599609375, 0.10837554931640625, 0.8191852569580078, 2.04254150390625, -0.2089691162109375, -0.48474884033203125, 1.3619232177734375, -0.5612907409667969], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000055.npy"}
{"epoch": 0.08314436885865457, "step": 56, "batch_size": 64, "mean": 0.17931008338928223, "std": 0.6052380204200745, "min": -2.099578857421875, "p10": -0.4180051803588867, "median": 0.15262508392333984, "p90": 0.8193058013916016, "max": 1.835784912109375, "pos_frac": 0.640625, "sample": [-0.22127532958984375, 0.4376945495605469, -0.04387474060058594, -0.524444580078125, 0.041595458984375, 0.31903839111328125, 0.3154296875, -0.8506011962890625, -0.32010650634765625, 0.1992626190185547, 0.00135040283203125, -0.602447509765625, 0.43701171875, 0.22212982177734375, 0.0360870361328125, 1.2647476196289062, -0.0384674072265625, -0.3964519500732422, 0.7715377807617188, 1.1413841247558594, -0.30419921875, 1.835784912109375, 0.7362346649169922, 0.417236328125, -0.005374908447265625, 0.8708343505859375, -0.1941680908203125, 0.16445159912109375, 0.14079856872558594, 0.774810791015625, -0.3647880554199219, 0.2537689208984375, 0.803436279296875, 0.8261070251464844, 0.0910186767578125, -0.6400604248046875, -2.099578857421875, 0.3476390838623047, -0.07623100280761719, 0.5770492553710938, 0.5094928741455078, 0.4841156005859375, -1.038299560546875, -0.3238945007324219, 0.27271080017089844, -0.11156463623046875, -0.16728973388671875, 0.6790542602539062, 0.03345680236816406, 0.2855491638183594, 0.7829933166503906, 0.27516937255859375, 1.0814476013183594, 0.4762687683105469, 0.03900146484375, 0.08504676818847656, -0.058467864990234375, 0.32022857666015625, 0.5278511047363281, -0.0256805419921875, -0.12157630920410156, 1.4848403930664062, 0.06826400756835938, -0.4272422790527344], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000056.npy"}
{"epoch": 0.08465608465608465, "step": 57, "batch_size": 64, "mean": 0.31764987111091614, "std": 0.6559314131736755, "min": -0.88885498046875, "p10": -0.5456054687499999, "median": 0.28947925567626953, "p90": 0.8602485656738281, "max": 2.256195068359375, "pos_frac": 0.671875, "sample": [-0.5505218505859375, -0.30812644958496094, 0.23191070556640625, 2.256195068359375, -0.1269359588623047, 0.8024749755859375, 0.2894477844238281, -0.0628204345703125, 0.6474380493164062, 0.7937774658203125, 1.758544921875, 0.8407573699951172, -0.8644256591796875, 0.16738510131835938, 0.504119873046875, -0.5794887542724609, 0.19292449951171875, 0.6243972778320312, 1.9619674682617188, 0.28951072692871094, 0.653076171875, -0.2564544677734375, 0.8051776885986328, 1.4542236328125, -0.2052288055419922, -0.6401309967041016, -0.031646728515625, 0.34473419189453125, -0.0948944091796875, -0.6350479125976562, 0.5719146728515625, 0.6421089172363281, 0.7048416137695312, 0.2799835205078125, -0.0197601318359375, 0.7588729858398438, -0.5677490234375, 0.33618736267089844, -0.5074081420898438, 0.8489532470703125, 0.2839202880859375, -0.5341339111328125, -0.88885498046875, 0.6856040954589844, -0.4934539794921875, 0.3880462646484375, 0.198089599609375, 0.7787246704101562, 0.9192352294921875, 0.31905174255371094, 0.196624755859375, 0.232818603515625, -0.3154144287109375, 1.6924591064453125, 0.3712139129638672, 0.8650894165039062, 0.7820587158203125, 0.4349327087402344, 0.1912841796875, 0.779815673828125, 0.13204574584960938, -0.09868240356445312, 0.575042724609375, -0.4762115478515625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000057.npy"}
{"epoch": 0.08616780045351474, "step": 58, "batch_size": 64, "mean": 0.2457965612411499, "std": 0.9496217370033264, "min": -1.2542457580566406, "p10": -0.6488033294677734, "median": 0.09746551513671875, "p90": 1.1857986450195312, "max": 3.988525390625, "pos_frac": 0.59375, "sample": [-1.2542457580566406, 0.7665214538574219, 0.9917221069335938, -0.190765380859375, 0.9786453247070312, -0.261260986328125, 0.32779693603515625, 0.3367919921875, -0.20967864990234375, -0.493011474609375, 1.0261039733886719, -0.5241470336914062, 1.1875152587890625, -0.9918594360351562, -0.9481678009033203, 0.8796195983886719, 1.4754867553710938, -0.37183380126953125, -0.654693603515625, 1.1619720458984375, 0.10195541381835938, -0.21260833740234375, 0.0699310302734375, 0.500640869140625, -0.6350593566894531, 0.06483078002929688, -0.9447021484375, -0.35718536376953125, -0.8903770446777344, 0.6331787109375, 0.2767791748046875, -0.8563232421875, 0.09036636352539062, 0.0939178466796875, 3.988525390625, -0.5135650634765625, -0.082000732421875, 3.34698486328125, 1.3382110595703125, 0.318511962890625, -0.216217041015625, 0.1295642852783203, 0.44783782958984375, -0.5740509033203125, 0.2641448974609375, 2.5778045654296875, -0.31181907653808594, -0.6171035766601562, -0.59002685546875, 0.713623046875, 0.6733856201171875, 1.4059600830078125, 0.10101318359375, 0.056758880615234375, 0.242218017578125, 1.181793212890625, 0.08925056457519531, -0.493621826171875, -0.4723052978515625, 1.0767822265625, 0.2591094970703125, -0.15731048583984375, 0.14453125, 0.23513412475585938], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000058.npy"}
{"epoch": 0.08767951625094482, "step": 59, "batch_size": 64, "mean": 0.25578179955482483, "std": 0.8319905400276184, "min": -1.5390777587890625, "p10": -0.5641242980957031, "median": 0.1388397216796875, "p90": 1.168563461303711, "max": 3.196014404296875, "pos_frac": 0.578125, "sample": [-0.19799423217773438, 0.3053741455078125, 0.0328826904296875, -0.44483184814453125, -0.08921051025390625, -0.4101295471191406, 0.6442031860351562, -1.3245162963867188, 0.6799659729003906, 0.51495361328125, 2.15264892578125, 0.3197288513183594, 0.8902130126953125, 1.1366615295410156, 0.7277793884277344, -0.273681640625, 1.2515869140625, -0.9730701446533203, 0.028533935546875, 0.08289337158203125, 0.1579914093017578, -0.13095855712890625, -0.44319915771484375, 0.13924407958984375, -0.141510009765625, 0.3327293395996094, -0.16739654541015625, 1.1822357177734375, 3.196014404296875, 0.7250442504882812, -0.16727066040039062, 0.188690185546875, 0.4383106231689453, -0.2402496337890625, -0.597869873046875, -0.08776283264160156, -0.3326301574707031, 0.7011623382568359, -0.608856201171875, -0.1484813690185547, 2.51824951171875, 0.9328689575195312, 0.917999267578125, 0.3946037292480469, -0.48125267028808594, 0.6663303375244141, -0.4850921630859375, 1.1172008514404297, -0.017675399780273438, 1.2604522705078125, -0.5833778381347656, 0.7383575439453125, -0.5191993713378906, -1.5390777587890625, 0.6356124877929688, 0.0993499755859375, -1.1002655029296875, 0.21622467041015625, -0.171356201171875, 0.13843536376953125, 0.494476318359375, 1.6240386962890625, 0.8311805725097656, -0.3672771453857422], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000059.npy"}
{"epoch": 0.08919123204837491, "step": 60, "batch_size": 64, "mean": 0.0870862603187561, "std": 0.8006487488746643, "min": -1.9941864013671875, "p10": -1.0227552413940428, "median": 0.17855262756347656, "p90": 1.0126752853393555, "max": 1.7910308837890625, "pos_frac": 0.640625, "sample": [0.14890289306640625, -0.02074432373046875, 0.5037422180175781, 0.0767669677734375, 0.05474853515625, 1.4258346557617188, -1.7461700439453125, -0.7735366821289062, -0.7558860778808594, -1.782684326171875, 0.4257621765136719, 0.41036224365234375, -0.22442054748535156, 0.0703582763671875, -1.2324447631835938, 0.3284645080566406, -0.3695487976074219, 1.007822036743164, 1.2828044891357422, 0.43881988525390625, -0.3865203857421875, 0.42156982421875, 0.1247711181640625, -0.404388427734375, 0.6580829620361328, 0.490509033203125, 0.4591827392578125, 0.31887054443359375, 1.1701507568359375, -1.1020851135253906, 0.2911834716796875, -0.21583938598632812, 1.15667724609375, 0.6764678955078125, -0.36287689208984375, 0.39178466796875, 0.927001953125, -0.5437507629394531, 0.22971725463867188, -1.0367450714111328, -0.48777008056640625, 0.6981048583984375, 0.27339935302734375, -1.4906883239746094, 0.3235130310058594, 1.4734954833984375, 0.007457733154296875, -0.8529815673828125, 0.6710357666015625, 0.053310394287109375, 1.0147552490234375, 0.19576644897460938, 0.6614780426025391, 0.8097991943359375, 1.7910308837890625, -0.5547733306884766, 0.16133880615234375, 0.0081024169921875, 0.72259521484375, -0.08105659484863281, -1.9941864013671875, -0.9901123046875, 0.7254905700683594, -0.09830093383789062], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000060.npy"}
{"epoch": 0.09070294784580499, "step": 61, "batch_size": 64, "mean": 0.03422611951828003, "std": 0.997624933719635, "min": -2.4702987670898438, "p10": -1.373103332519531, "median": 0.11194229125976562, "p90": 1.2850921630859382, "max": 1.9685211181640625, "pos_frac": 0.546875, "sample": [0.6982421875, 0.9741668701171875, 1.966064453125, 1.9380874633789062, 0.4495048522949219, 0.2361125946044922, 0.5320663452148438, 1.4131240844726562, 0.716766357421875, -0.2909736633300781, -0.6532058715820312, -0.11203575134277344, -0.7879104614257812, 0.6191558837890625, 0.7336502075195312, -0.7399940490722656, 0.6468124389648438, -0.30852317810058594, -0.10277175903320312, 0.37000083923339844, -0.199249267578125, 0.36307716369628906, 0.59869384765625, 0.023488998413085938, 0.05882072448730469, -0.43868064880371094, -1.18328857421875, -1.4544525146484375, -0.228118896484375, -0.5290603637695312, 0.7175769805908203, -0.12572860717773438, -1.1709785461425781, 1.1154327392578125, -0.898468017578125, 0.32622528076171875, 0.11594772338867188, -0.7316417694091797, -0.40377044677734375, 1.5300140380859375, 1.9685211181640625, -2.1065902709960938, -0.01882171630859375, -1.9856910705566406, 0.5164718627929688, 1.526336669921875, 0.2909412384033203, 0.3517436981201172, -0.855682373046875, -0.11853218078613281, 0.8404045104980469, -1.5629806518554688, 0.10793685913085938, -0.141815185546875, 0.870849609375, -2.4702987670898438, 1.3578033447265625, 0.5762615203857422, 0.2941436767578125, -0.308868408203125, 0.8623199462890625, 0.2642669677734375, -2.286571502685547, -1.56585693359375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000061.npy"}
{"epoch": 0.09221466364323508, "step": 62, "batch_size": 64, "mean": 0.23229673504829407, "std": 0.8599525094032288, "min": -1.983652114868164, "p10": -0.7147701263427734, "median": 0.25634765625, "p90": 1.130083465576172, "max": 2.383514404296875, "pos_frac": 0.65625, "sample": [0.6962966918945312, -1.983652114868164, -0.6724205017089844, 0.1974010467529297, 0.840423583984375, 0.4071998596191406, 1.0168838500976562, -0.2623119354248047, -1.3032379150390625, 0.36090850830078125, 1.8410110473632812, 0.08772087097167969, 1.590301513671875, -1.077880859375, 1.1389617919921875, 0.30912017822265625, 0.9241714477539062, 0.249725341796875, 0.08699226379394531, 0.7995872497558594, -0.5148086547851562, -0.6380844116210938, -0.688812255859375, 0.27051544189453125, 0.2484455108642578, -0.137725830078125, -0.6446952819824219, 0.3796844482421875, 0.4636650085449219, 0.953369140625, 0.410919189453125, -0.07462882995605469, 0.18951988220214844, 0.2437896728515625, 1.4233245849609375, -1.9105377197265625, -0.7954864501953125, 1.7165050506591797, -0.299163818359375, 0.262969970703125, -0.06603622436523438, 1.1093673706054688, 0.2156982421875, 0.7657470703125, 1.0806846618652344, 0.3650321960449219, 0.6077728271484375, -0.7258949279785156, 0.2988243103027344, 0.9475860595703125, -0.32016754150390625, 0.6616859436035156, 0.6786251068115234, 0.35637474060058594, -0.5741004943847656, -0.19960784912109375, 2.142791748046875, 2.383514404296875, -0.14903640747070312, 0.0377197265625, 0.15523529052734375, -0.12877655029296875, -1.4609222412109375, 0.5789070129394531], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000062.npy"}
{"epoch": 0.09372637944066516, "step": 63, "batch_size": 64, "mean": 0.6112611293792725, "std": 0.9562734365463257, "min": -2.579345703125, "p10": -0.34592514038085936, "median": 0.4499845504760742, "p90": 1.820903015136719, "max": 4.05804443359375, "pos_frac": 0.78125, "sample": [0.37511444091796875, 0.44330596923828125, 0.3097686767578125, -0.19477081298828125, 4.05804443359375, -0.064910888671875, -0.34984588623046875, 3.1275634765625, 1.0763015747070312, 0.5735397338867188, -2.579345703125, -0.5623245239257812, 0.4417896270751953, -0.4748077392578125, 1.36285400390625, 0.8226318359375, 1.9066314697265625, 0.3056182861328125, 0.2515392303466797, 0.5239067077636719, -0.7792625427246094, 0.3184547424316406, 1.0146255493164062, 1.8723602294921875, 0.5434951782226562, 0.9577922821044922, 0.4065093994140625, 0.38175201416015625, 0.5403861999511719, 1.2110977172851562, 0.3384666442871094, 0.75347900390625, 1.8384742736816406, 1.02435302734375, 0.23339080810546875, -0.1309814453125, 0.5092926025390625, -0.09790802001953125, 0.6430683135986328, 0.34379005432128906, 0.5899105072021484, 2.7750244140625, 0.2321491241455078, 1.493856430053711, 0.4566631317138672, 1.2015552520751953, 0.55596923828125, 1.7799034118652344, 0.8454437255859375, -0.16707229614257812, 0.34496307373046875, 0.4160346984863281, -0.3640594482421875, -0.3367767333984375, 0.7738189697265625, 0.31036376953125, 1.1965179443359375, 0.4158954620361328, -0.12821006774902344, 0.052005767822265625, 0.5443992614746094, 2.153106689453125, 1.1765899658203125, -0.4725799560546875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000063.npy"}
{"epoch": 0.09523809523809523, "step": 64, "batch_size": 64, "mean": 0.09731447696685791, "std": 0.852840006351471, "min": -2.9611663818359375, "p10": -0.731803512573242, "median": 0.06452560424804688, "p90": 0.9641616821289065, "max": 3.1358642578125, "pos_frac": 0.53125, "sample": [1.1364669799804688, 3.1358642578125, -1.0569725036621094, -0.5894851684570312, -0.5305023193359375, -0.234375, 0.9967269897460938, -0.03397941589355469, 0.4017200469970703, 1.9008941650390625, 0.5065937042236328, 0.38922882080078125, 1.2997379302978516, -0.34156036376953125, 0.6520271301269531, -0.08776283264160156, 0.3754730224609375, 0.1495513916015625, 0.09286880493164062, 0.4660682678222656, 0.721343994140625, -0.03517913818359375, -0.25229644775390625, -0.014926910400390625, -1.2979736328125, -0.2415332794189453, -0.16234207153320312, -2.9611663818359375, -0.3412208557128906, 0.5654373168945312, 0.2983665466308594, 0.1451873779296875, -0.1273193359375, -0.7927970886230469, 0.03509330749511719, 0.42302703857421875, 0.7226409912109375, -2.06182861328125, -0.8425712585449219, 0.6074142456054688, 0.6371231079101562, 0.04314422607421875, 0.8881759643554688, -0.11759567260742188, 0.24902915954589844, -0.2351226806640625, -0.4231224060058594, 0.427825927734375, -0.5216751098632812, 1.5441131591796875, -0.13997650146484375, -0.37082862854003906, 1.311431884765625, 0.17842483520507812, 0.26506805419921875, 0.2737140655517578, 0.45215606689453125, -0.3146247863769531, -0.8441276550292969, -0.3389320373535156, 0.085906982421875, -0.248321533203125, -0.23788833618164062, 0.6482887268066406], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000064.npy"}
{"epoch": 0.09674981103552532, "step": 65, "batch_size": 64, "mean": 0.5047406554222107, "std": 1.1758701801300049, "min": -2.3018112182617188, "p10": -0.6936077117919921, "median": 0.47106170654296875, "p90": 1.985490608215332, "max": 4.507904052734375, "pos_frac": 0.65625, "sample": [0.3951597213745117, 1.0186614990234375, 1.0041427612304688, 0.047515869140625, -0.08227729797363281, 0.7911529541015625, -1.5196533203125, -0.62542724609375, -0.47509193420410156, 2.0057296752929688, -0.35260009765625, 0.8563880920410156, 1.0931968688964844, 1.6732177734375, 0.4820556640625, 0.7664833068847656, 0.7111454010009766, 0.36942100524902344, -0.5420074462890625, 2.4835662841796875, 0.4178619384765625, 0.5609588623046875, -1.5248870849609375, 1.9757614135742188, 1.6512641906738281, -0.17755508422851562, -2.3018112182617188, 0.33245086669921875, -0.560028076171875, 0.37296295166015625, 0.024990081787109375, 0.654052734375, -0.49806976318359375, -1.2159500122070312, 0.7469558715820312, 2.3241806030273438, 0.5990028381347656, 2.8774871826171875, 1.1432647705078125, 1.3090934753417969, 1.9896602630615234, 0.06888008117675781, 1.7665023803710938, 1.028533935546875, -0.7228279113769531, -0.08935546875, 0.6390628814697266, -0.0282135009765625, -1.372812271118164, -0.49347686767578125, 0.4600677490234375, 0.6696262359619141, -0.436920166015625, 0.9668159484863281, 0.30536651611328125, 0.5098533630371094, 2.9404678344726562, -0.33429908752441406, 1.0122528076171875, 1.5192108154296875, -0.21802520751953125, -1.1445159912109375, 4.507904052734375, -0.05312347412109375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000065.npy"}
{"epoch": 0.0982615268329554, "step": 66, "batch_size": 64, "mean": 0.36604076623916626, "std": 1.1676099300384521, "min": -2.84326171875, "p10": -1.029404067993164, "median": 0.20158767700195312, "p90": 1.7143882751464845, "max": 4.876739501953125, "pos_frac": 0.640625, "sample": [0.2576007843017578, 4.876739501953125, 1.0281562805175781, 0.6937541961669922, 0.3233299255371094, 0.865814208984375, 0.195709228515625, 0.24784088134765625, -0.09531021118164062, 0.6277141571044922, -0.4566802978515625, 0.6689643859863281, 1.765777587890625, 0.08916091918945312, -0.49896240234375, -1.0832748413085938, -0.10784912109375, 0.12047195434570312, 1.5316314697265625, 1.7014694213867188, 0.44564056396484375, 1.8692436218261719, -1.7938308715820312, -0.10394287109375, 0.3479881286621094, 1.0405216217041016, 0.19411087036132812, 1.2252159118652344, 1.2313232421875, 0.6203460693359375, 1.1294631958007812, 0.6379222869873047, -0.03564453125, -0.21094703674316406, -0.022960662841796875, 0.20746612548828125, 0.005615234375, 0.6364860534667969, -0.9276313781738281, 0.169525146484375, -2.84326171875, -2.0457000732421875, -0.2220306396484375, 0.17031097412109375, -0.473052978515625, -1.0730209350585938, 2.0934486389160156, -0.10579299926757812, 1.539093017578125, 0.9530506134033203, -0.1975841522216797, -0.2103118896484375, 0.006824493408203125, 2.9556427001953125, 1.7199249267578125, -1.0909652709960938, 0.8524246215820312, -0.7407150268554688, -1.1944580078125, 2.389129638671875, 0.4361534118652344, -0.0453643798828125, 0.13559722900390625, 0.999298095703125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000066.npy"}
{"epoch": 0.09977324263038549, "step": 67, "batch_size": 64, "mean": 0.28152012825012207, "std": 0.934364914894104, "min": -1.9339599609375, "p10": -0.8863252639770506, "median": 0.24392414093017578, "p90": 1.6157215118408215, "max": 2.7671051025390625, "pos_frac": 0.671875, "sample": [0.24446868896484375, -1.2769775390625, -0.5563335418701172, -0.5501060485839844, -1.3175582885742188, 0.30495452880859375, 1.7287330627441406, 1.130157470703125, 0.47058868408203125, -0.6552276611328125, 1.7821044921875, 0.11844444274902344, -0.722381591796875, -1.582275390625, 1.029937744140625, 0.8384056091308594, -0.3451042175292969, 0.6233749389648438, -0.2859344482421875, 1.1508941650390625, 0.11199951171875, 1.3067398071289062, -1.566864013671875, 2.7671051025390625, 0.2433795928955078, 0.6622161865234375, 0.5129318237304688, 0.484100341796875, 1.3520278930664062, -0.9565868377685547, 0.3490333557128906, 0.4005165100097656, 0.7949562072753906, 1.7887191772460938, 0.235687255859375, 0.19991111755371094, 0.6673393249511719, -0.02709197998046875, -0.306396484375, 0.4348602294921875, 2.2771759033203125, 0.1759185791015625, 0.6183624267578125, 1.0583953857421875, 0.179595947265625, -0.057037353515625, -0.371185302734375, 0.2974987030029297, 0.03441619873046875, -0.6680221557617188, -0.07049942016601562, 2.236724853515625, 0.3533649444580078, 0.10588836669921875, -0.28646087646484375, -0.33588409423828125, 0.47068023681640625, 0.2172088623046875, -0.9702358245849609, -1.9339599609375, 1.7477149963378906, 0.1854095458984375, 0.516143798828125, 0.6513252258300781], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000067.npy"}
{"epoch": 0.10128495842781557, "step": 68, "batch_size": 64, "mean": 0.4388529658317566, "std": 1.2396509647369385, "min": -2.959197998046875, "p10": -1.0493301391601562, "median": 0.4263343811035156, "p90": 1.6844734191894537, "max": 4.46148681640625, "pos_frac": 0.6875, "sample": [4.46148681640625, 1.4341049194335938, 0.3403491973876953, 1.51617431640625, -0.5269699096679688, 2.5447845458984375, 0.6939926147460938, -0.15862274169921875, 0.9143905639648438, 0.4195404052734375, 1.7812423706054688, -2.959197998046875, 1.0615921020507812, 1.2614364624023438, 0.81304931640625, 0.9348907470703125, -0.572967529296875, -0.6165008544921875, 0.45394134521484375, 1.3673171997070312, 1.3139190673828125, 0.2474384307861328, -0.020965576171875, 1.3664932250976562, 0.4068107604980469, 0.5207557678222656, 0.5289535522460938, -0.4432106018066406, 1.9409103393554688, -1.2752227783203125, 0.016180038452148438, 0.5255947113037109, 0.6202335357666016, 0.24132537841796875, -0.8206520080566406, 0.25008392333984375, 4.07080078125, 0.33364105224609375, 0.2230072021484375, 0.2597503662109375, -1.5422210693359375, -1.1447181701660156, 0.445831298828125, 0.932861328125, -1.38836669921875, -2.8538360595703125, 0.41175079345703125, -0.001708984375, -1.0727996826171875, -0.10501861572265625, 1.7501373291015625, 0.492431640625, -0.16374969482421875, 1.7390823364257812, 0.9542083740234375, -0.10247802734375, 0.21016693115234375, 1.5570526123046875, 1.545318603515625, 0.5633106231689453, 1.0243492126464844, -0.073455810546875, 0.43312835693359375, -0.99456787109375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000068.npy"}
{"epoch": 0.10279667422524566, "step": 69, "batch_size": 64, "mean": 0.6424198150634766, "std": 1.2560410499572754, "min": -2.1302719116210938, "p10": -0.6795406341552733, "median": 0.5075340270996094, "p90": 2.1367576599121096, "max": 5.233795166015625, "pos_frac": 0.734375, "sample": [0.008405685424804688, 0.248565673828125, 2.5247573852539062, -0.0266571044921875, 0.40903472900390625, -0.2684783935546875, -0.4092559814453125, 0.018121719360351562, 1.2812347412109375, 1.0885238647460938, 0.6856403350830078, 0.37352752685546875, 0.5337142944335938, 1.8693008422851562, 1.1898956298828125, 1.7823028564453125, 0.6186885833740234, 1.5467529296875, 2.5706024169921875, -1.4785614013671875, 3.5999603271484375, -2.1302719116210938, 2.1028289794921875, 0.06562423706054688, 2.310577392578125, 0.10316085815429688, 0.5628204345703125, 1.08782958984375, 0.180267333984375, -0.0098724365234375, 0.9920024871826172, -1.072296142578125, 0.5816631317138672, 0.5166549682617188, 0.13266754150390625, -0.7230224609375, 1.8416023254394531, 0.752349853515625, -0.0932464599609375, 0.4984130859375, -0.148468017578125, 0.6092281341552734, 0.4625701904296875, 0.41443634033203125, 1.8706626892089844, -0.3127899169921875, 2.1512985229492188, 0.8187522888183594, -0.11859703063964844, -1.5607681274414062, 0.425384521484375, 0.8187961578369141, -2.0130653381347656, -0.5780830383300781, 1.3744354248046875, -0.43499755859375, 1.5183868408203125, 0.3853111267089844, 1.9443626403808594, -0.8448677062988281, 5.233795166015625, 0.5423069000244141, 2.6032257080078125, 0.08772468566894531], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000069.npy"}
{"epoch": 0.10430839002267574, "step": 70, "batch_size": 64, "mean": 0.3567911982536316, "std": 1.218503475189209, "min": -3.290802001953125, "p10": -1.0795143127441407, "median": 0.3805351257324219, "p90": 1.7140087127685548, "max": 3.254241943359375, "pos_frac": 0.640625, "sample": [0.24051666259765625, -1.3169670104980469, 1.134002685546875, -2.208831787109375, 0.2731895446777344, -0.9427070617675781, -0.39121246337890625, 1.2681198120117188, 1.7238998413085938, -0.12190628051757812, 0.6399688720703125, 1.6909294128417969, -3.290802001953125, 0.9765815734863281, 0.05130767822265625, -0.7015895843505859, 2.246257781982422, -1.7624893188476562, -1.0026931762695312, -0.33167457580566406, 3.2441329956054688, 0.6413459777832031, 1.3309917449951172, 0.5870380401611328, -1.4896888732910156, 0.6525650024414062, 0.5591964721679688, 2.5802078247070312, -1.1003494262695312, 3.254241943359375, 0.4516448974609375, 0.5238914489746094, -0.6221542358398438, -0.3584747314453125, 0.33017730712890625, 1.5052642822265625, 1.8696823120117188, 0.5477752685546875, 0.406402587890625, 1.467874526977539, 0.8340358734130859, -0.7083263397216797, -0.2976245880126953, 0.5758590698242188, 1.4517364501953125, 1.0738334655761719, -0.1267547607421875, 2.30584716796875, 0.1079864501953125, 0.9512786865234375, 0.5326461791992188, 1.1383590698242188, 0.35466766357421875, -1.0308990478515625, 0.20135498046875, -0.13534927368164062, -0.0248260498046875, -1.67730712890625, 1.4797592163085938, 0.3110389709472656, -0.0399017333984375, 1.2467327117919922, -0.3407268524169922, 0.095550537109375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000070.npy"}
{"epoch": 0.10582010582010581, "step": 71, "batch_size": 64, "mean": 0.22931434214115143, "std": 1.514479637145996, "min": -5.023284912109375, "p10": -1.4854743957519532, "median": 0.20254135131835938, "p90": 1.8951194763183596, "max": 3.1529083251953125, "pos_frac": 0.578125, "sample": [-0.11144447326660156, 1.1338729858398438, 1.0183143615722656, 0.17876815795898438, -0.215728759765625, -0.24701690673828125, 1.202545166015625, -1.4480209350585938, -0.35771942138671875, 1.745513916015625, 0.3766822814941406, -2.323139190673828, -0.427398681640625, 0.8467254638671875, -2.5793609619140625, -0.0023956298828125, 0.0731964111328125, 2.85943603515625, 0.5939064025878906, 1.6227798461914062, -3.4825439453125, 3.1004257202148438, 0.196197509765625, 0.06287765502929688, 0.9027271270751953, 3.1529083251953125, 0.6380691528320312, -0.364532470703125, 0.9314613342285156, 0.20888519287109375, -5.023284912109375, -0.1792755126953125, -1.50152587890625, -0.90631103515625, -0.104339599609375, 0.26207733154296875, 0.34805870056152344, 1.367095947265625, 1.4363288879394531, -0.7791595458984375, -0.37198638916015625, -3.1157989501953125, 0.42444610595703125, 1.8105621337890625, -0.19306564331054688, -0.3872671127319336, -1.85400390625, 1.9313583374023438, -0.33575439453125, -0.08562469482421875, 1.351654052734375, 2.5153579711914062, -1.03936767578125, 1.4259414672851562, 1.2033843994140625, 1.0310325622558594, 0.9187965393066406, 0.12439727783203125, 0.2297515869140625, 0.6589012145996094, 2.0625991821289062, -0.1790904998779297, 3.0019607543945312, -0.6577224731445312], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000071.npy"}
{"epoch": 0.1073318216175359, "step": 72, "batch_size": 64, "mean": 0.6940467357635498, "std": 1.2795592546463013, "min": -3.2796783447265625, "p10": -0.8415603637695312, "median": 0.6131153106689453, "p90": 2.022164916992188, "max": 4.295890808105469, "pos_frac": 0.75, "sample": [1.8876800537109375, 1.05084228515625, 0.3289794921875, 1.650299072265625, -3.2796783447265625, -0.11585235595703125, 0.19259262084960938, 4.295890808105469, -1.4103145599365234, -0.4008617401123047, 1.3666763305664062, 1.540985107421875, -0.548309326171875, 1.942718505859375, 0.21315383911132812, -0.8187160491943359, -0.4268302917480469, 0.8406581878662109, 1.7057876586914062, 0.069671630859375, 3.506378173828125, 0.5534515380859375, 1.9414787292480469, 0.4981956481933594, 0.6246566772460938, 0.1977691650390625, 0.34498023986816406, -0.7550086975097656, -1.206634521484375, -0.198883056640625, 2.490692138671875, 1.4493846893310547, 1.4943695068359375, 2.05621337890625, 0.6516494750976562, 0.171142578125, 0.6786575317382812, 1.1368885040283203, 0.6015739440917969, -0.0922393798828125, 1.7812957763671875, 1.157144546508789, 2.1492080688476562, 0.1334857940673828, 0.5407562255859375, -0.4012184143066406, 1.5489044189453125, 1.7871780395507812, -1.0553245544433594, 1.3010787963867188, -1.246612548828125, 1.4017333984375, 2.7750778198242188, -0.8513507843017578, 2.7096786499023438, 0.07790756225585938, 1.6697044372558594, 0.773895263671875, -1.7732696533203125, 0.3973541259765625, 0.42261505126953125, 1.4093093872070312, 0.8939628601074219, 0.5863876342773438], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000072.npy"}
{"epoch": 0.10884353741496598, "step": 73, "batch_size": 64, "mean": 0.12942326068878174, "std": 1.6903053522109985, "min": -3.732666015625, "p10": -1.4799085617065428, "median": -0.06747817993164062, "p90": 1.5126436233520513, "max": 8.21331787109375, "pos_frac": 0.46875, "sample": [1.4024314880371094, 0.7466621398925781, -0.494140625, -0.6263084411621094, -1.3409748077392578, -1.267333984375, -1.5394515991210938, -0.8890266418457031, -3.732666015625, 1.2586193084716797, -1.266754150390625, 0.7735061645507812, -0.16259384155273438, -0.9645614624023438, -1.2343254089355469, 0.6671085357666016, -1.8146514892578125, 0.110137939453125, -0.466461181640625, 0.9275588989257812, -0.899139404296875, 0.28934478759765625, -0.8892936706542969, 0.34088134765625, 8.21331787109375, 2.3781280517578125, -0.7549514770507812, -0.8447513580322266, -0.04864501953125, 0.9543609619140625, 0.5266208648681641, -0.2589683532714844, 0.6467399597167969, -0.12699127197265625, -2.4933013916015625, 0.42879295349121094, 1.4154491424560547, 2.81622314453125, -1.739980697631836, -0.9874191284179688, -0.300445556640625, 2.7441864013671875, 0.5157318115234375, -1.87762451171875, -0.43506622314453125, 1.5542984008789062, 0.32155609130859375, -0.9307422637939453, -0.08631134033203125, -0.7302703857421875, -2.151397705078125, 0.5019874572753906, 0.7867355346679688, 0.45113372802734375, 1.1093406677246094, 0.92974853515625, 1.3469772338867188, 0.14371490478515625, -0.7343673706054688, -0.20932769775390625, -0.0064563751220703125, 2.717395782470703, 4.177337646484375, -0.6082382202148438], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000073.npy"}
{"epoch": 0.11035525321239607, "step": 74, "batch_size": 64, "mean": 0.7178570628166199, "std": 1.3336726427078247, "min": -2.3871307373046875, "p10": -0.7588415145874023, "median": 0.7360324859619141, "p90": 2.1014648437500005, "max": 5.57373046875, "pos_frac": 0.703125, "sample": [-0.5030288696289062, 1.3860244750976562, -0.3822021484375, 0.35738372802734375, 1.78558349609375, -0.4478912353515625, 2.8723831176757812, 5.57373046875, 0.96746826171875, 2.4393787384033203, 1.469980239868164, 0.2590446472167969, 2.1353759765625, 1.2775650024414062, -0.5177860260009766, 1.0206069946289062, 1.007232666015625, 0.6849365234375, 0.23057174682617188, -2.3871307373046875, 0.5323638916015625, 2.0023040771484375, 1.0236549377441406, 0.9542388916015625, 0.595062255859375, -0.056262969970703125, -0.6852340698242188, 1.1012496948242188, 2.3051605224609375, -1.1653060913085938, 0.6672611236572266, 0.9319305419921875, 1.2824840545654297, -0.9491157531738281, -0.7428398132324219, 1.1767501831054688, -0.7656993865966797, 0.871307373046875, -1.365936279296875, -0.19151878356933594, 0.6340503692626953, -0.9292335510253906, 1.5572738647460938, 0.7469024658203125, 0.7251625061035156, 3.1556549072265625, 1.3286056518554688, -0.7266788482666016, 1.9091796875, 2.0223388671875, -2.26922607421875, -0.015140533447265625, 0.45638275146484375, -0.00655364990234375, -0.7162437438964844, 0.4460430145263672, 0.20546722412109375, 3.710418701171875, 1.921173095703125, 1.3264389038085938, 1.326589584350586, 0.9606895446777344, 0.6115016937255859, 0.81097412109375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000074.npy"}
{"epoch": 0.11186696900982615, "step": 75, "batch_size": 64, "mean": 0.7734754085540771, "std": 1.3499062061309814, "min": -2.426982879638672, "p10": -0.7200963973999019, "median": 0.5268831253051758, "p90": 2.765701293945314, "max": 4.6822357177734375, "pos_frac": 0.734375, "sample": [-0.3497581481933594, -0.30467987060546875, -0.04942131042480469, 1.0962677001953125, 3.1336212158203125, -0.020853042602539062, 0.32341766357421875, 1.2853164672851562, 1.45318603515625, 3.4888839721679688, -1.2476425170898438, 0.3948822021484375, -0.8788127899169922, 0.43653106689453125, 0.5724868774414062, 1.4340171813964844, 2.0449752807617188, -1.3689956665039062, 0.4118938446044922, 0.07745933532714844, 4.003509521484375, 1.4050827026367188, 2.1154022216796875, 1.9111328125, 1.035074234008789, 0.3360939025878906, 0.70703125, -1.0173492431640625, 1.8081817626953125, 0.7114105224609375, 0.5480899810791016, 3.3277816772460938, -0.2335968017578125, 1.9383907318115234, 0.8339157104492188, 0.23093032836914062, 4.6822357177734375, 0.799407958984375, 1.0359039306640625, 0.5043964385986328, 0.41794586181640625, 1.1187057495117188, -2.426982879638672, 0.50567626953125, 1.2176475524902344, -0.16803359985351562, -0.12912368774414062, 0.261749267578125, 0.2586193084716797, 0.18587684631347656, 0.6129131317138672, 2.39410400390625, -1.5089454650878906, -0.1517181396484375, 0.8717689514160156, -0.210662841796875, 2.924957275390625, -0.2405071258544922, -1.0269317626953125, 0.2731437683105469, 0.6732654571533203, 0.11663818359375, 3.70623779296875, 1.2102813720703125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000075.npy"}
{"epoch": 0.11337868480725624, "step": 76, "batch_size": 64, "mean": 0.2703562378883362, "std": 1.573948860168457, "min": -4.776611328125, "p10": -1.458339309692383, "median": 0.11649703979492188, "p90": 2.2903276443481446, "max": 4.186614990234375, "pos_frac": 0.5625, "sample": [-4.776611328125, 1.5303115844726562, -2.4977493286132812, -0.461517333984375, 0.14954376220703125, -0.8258590698242188, -0.49226951599121094, 0.9538040161132812, 0.3940601348876953, 3.4333572387695312, 2.2539196014404297, 4.186614990234375, -0.5767917633056641, -3.4811630249023438, 0.884674072265625, -1.4375724792480469, 0.6898193359375, 0.8742046356201172, 0.5653839111328125, 0.2788887023925781, 0.6619415283203125, 2.4350051879882812, -1.9486923217773438, 2.5930709838867188, -1.41082763671875, 1.7791213989257812, 0.44881439208984375, 1.4567184448242188, 0.06575202941894531, -0.48458099365234375, -0.3591461181640625, -0.7043514251708984, -1.4672393798828125, -0.5782814025878906, -0.4291839599609375, 0.05565643310546875, 2.3059310913085938, 0.761566162109375, -0.5059528350830078, 1.5380172729492188, -0.2666282653808594, 0.227325439453125, 0.021350860595703125, -0.061367034912109375, 1.9625091552734375, 0.49917030334472656, 0.4569549560546875, -1.5440711975097656, -0.1097564697265625, 0.0834503173828125, -0.07744598388671875, 0.870208740234375, -0.4703712463378906, 2.3872222900390625, 3.5937042236328125, 1.2523517608642578, -0.5020103454589844, -1.0481014251708984, -2.2828292846679688, 1.8161773681640625, 1.3727836608886719, -0.1476573944091797, 1.4393997192382812, -0.027957916259765625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000076.npy"}
{"epoch": 0.11489040060468632, "step": 77, "batch_size": 64, "mean": 0.8061983585357666, "std": 1.5472729206085205, "min": -3.1985321044921875, "p10": -1.055364990234375, "median": 0.5516948699951172, "p90": 2.771761703491211, "max": 5.25897216796875, "pos_frac": 0.734375, "sample": [-0.39711952209472656, 1.460174560546875, -1.047760009765625, -1.2931251525878906, 1.7736759185791016, 1.7738418579101562, -1.1801033020019531, 0.4820537567138672, -3.1985321044921875, 0.5143814086914062, 1.6919403076171875, 1.0056686401367188, 0.094818115234375, -1.5413894653320312, 0.5890083312988281, 3.3737716674804688, 0.23626708984375, 1.6265716552734375, 0.5095596313476562, 3.5671310424804688, -0.25031280517578125, 0.33817481994628906, 2.3628616333007812, 2.443042755126953, 0.1508941650390625, 0.20217132568359375, 0.15935134887695312, 2.7848548889160156, 0.3999671936035156, 0.6261444091796875, -0.6762313842773438, -0.83154296875, -0.19148635864257812, 0.20201873779296875, 1.3739814758300781, 2.7412109375, 1.20538330078125, 1.4942779541015625, 1.1336383819580078, 1.6752853393554688, 1.6240234375, 0.28662109375, 0.98236083984375, -0.4675140380859375, 0.1232147216796875, 5.25897216796875, 0.1486377716064453, -0.29354286193847656, 3.6834564208984375, 1.236907958984375, 4.498115539550781, -0.11135101318359375, 0.2824115753173828, 1.9842395782470703, 1.5840682983398438, 0.6947746276855469, -2.3452835083007812, -1.058624267578125, 2.2076644897460938, -0.14326858520507812, 1.2626953125, 3.1453933715820312, 0.8219356536865234, -1.1937332153320312], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000077.npy"}
{"epoch": 0.1164021164021164, "step": 78, "batch_size": 64, "mean": 0.44995251297950745, "std": 1.4063997268676758, "min": -3.4477310180664062, "p10": -1.1119468688964844, "median": 0.40949440002441406, "p90": 2.179683876037598, "max": 3.786876678466797, "pos_frac": 0.640625, "sample": [2.2705764770507812, -3.108734130859375, 0.8098106384277344, 2.177276611328125, 0.6058120727539062, 2.180715560913086, 0.7996559143066406, 0.04822540283203125, -0.07041549682617188, 0.26123809814453125, 1.6286468505859375, -1.3467559814453125, -0.20517349243164062, 1.9954681396484375, -0.05084228515625, -0.6639175415039062, -1.5231971740722656, 3.4829788208007812, -1.4583587646484375, 0.6068572998046875, -0.9324722290039062, -0.49069976806640625, 0.4359130859375, -0.9360160827636719, 1.2774810791015625, -0.1790771484375, 0.07997894287109375, -3.4477310180664062, 2.364105224609375, -2.3068084716796875, 1.7115287780761719, 1.6343765258789062, 0.07149505615234375, 2.1695709228515625, 0.23089599609375, -0.40582275390625, -1.0707550048828125, 0.2707366943359375, 1.711294174194336, 1.7551631927490234, 0.6054420471191406, 0.5666923522949219, 0.521484375, 1.2429389953613281, -0.3505401611328125, 1.5624847412109375, -1.1296005249023438, 0.425537109375, 0.39751434326171875, 1.2563934326171875, -0.6126308441162109, 2.5650405883789062, -0.0319061279296875, -0.8442535400390625, 0.376556396484375, -0.5175590515136719, 0.2716407775878906, -0.46327972412109375, 1.274515151977539, 2.6721954345703125, 0.7744903564453125, 0.4214744567871094, 3.786876678466797, 1.6424293518066406], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000078.npy"}
{"epoch": 0.11791383219954649, "step": 79, "batch_size": 64, "mean": 1.1419804096221924, "std": 2.061448574066162, "min": -2.9225616455078125, "p10": -0.7230665206909178, "median": 0.8328018188476562, "p90": 3.309349060058596, "max": 8.213493347167969, "pos_frac": 0.71875, "sample": [0.04465484619140625, 0.3863525390625, -1.1293182373046875, 1.5913887023925781, -0.7691421508789062, 1.225738525390625, -2.9225616455078125, -0.288360595703125, -0.3291015625, -0.0645294189453125, -0.6155567169189453, 0.40319061279296875, 1.4020881652832031, 8.213493347167969, -1.9076614379882812, 0.7563247680664062, 2.3102569580078125, 0.5294113159179688, -0.18259429931640625, 0.2714424133300781, 1.0800132751464844, 6.73150634765625, 1.7805023193359375, 3.5228271484375, 1.1387062072753906, -1.4719734191894531, 4.569171905517578, 2.0114822387695312, 1.6488494873046875, 2.3068466186523438, -1.192047119140625, -0.7964649200439453, 2.4215774536132812, 1.82244873046875, 1.6806526184082031, 0.38849830627441406, 0.9092788696289062, 1.8697795867919922, -0.3311920166015625, 2.4460525512695312, -0.2831268310546875, 2.2955322265625, 0.42852783203125, -0.15805816650390625, 0.39342308044433594, 1.0327262878417969, 2.8112335205078125, 0.27298545837402344, 0.6438503265380859, 0.04186248779296875, 1.0277271270751953, -0.44858551025390625, 5.4828033447265625, 0.13104248046875, 1.2284202575683594, 8.155731201171875, 0.9600448608398438, 1.1902389526367188, 0.12502288818359375, 1.9982490539550781, -0.41896820068359375, 0.9877891540527344, 3.942485809326172, -0.21623992919921875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000079.npy"}
{"epoch": 0.11942554799697656, "step": 80, "batch_size": 64, "mean": 0.5696668028831482, "std": 2.0159716606140137, "min": -4.4531402587890625, "p10": -1.5977794647216796, "median": 0.2602081298828125, "p90": 3.3180305480957033, "max": 5.693206787109375, "pos_frac": 0.546875, "sample": [-0.6950016021728516, 3.221649169921875, 0.3399658203125, 1.8620223999023438, 1.1291961669921875, -0.6936798095703125, -0.37030029296875, -0.29390716552734375, -1.7565536499023438, -2.40948486328125, 2.0294322967529297, 3.3593368530273438, -0.8892059326171875, -1.0150146484375, -0.1573638916015625, 4.21893310546875, -0.9816741943359375, -0.2718963623046875, 0.71685791015625, 0.24637222290039062, 0.3812522888183594, -1.3739013671875, -0.30051422119140625, 2.0565948486328125, 4.10723876953125, 2.895404815673828, 2.90087890625, -0.22406005859375, 3.8471221923828125, 4.4892120361328125, -1.6192760467529297, 1.6943435668945312, 0.2740440368652344, 0.5533103942871094, 1.1448440551757812, 1.8983535766601562, -0.7418136596679688, 0.16756057739257812, 4.4719085693359375, -4.4531402587890625, 2.944061279296875, -0.23725128173828125, -3.4559707641601562, 1.8853797912597656, -0.11567878723144531, 0.5802078247070312, 1.9290771484375, 0.31344032287597656, -1.7929611206054688, -0.8904571533203125, 2.72723388671875, -0.1543407440185547, 0.106903076171875, -0.3638801574707031, -0.4044075012207031, -0.06855392456054688, 0.663848876953125, -1.5476207733154297, 1.7906131744384766, -3.3023147583007812, 0.4887847900390625, -1.249420166015625, 5.693206787109375, 1.15972900390625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000080.npy"}
{"epoch": 0.12093726379440665, "step": 81, "batch_size": 64, "mean": 0.8183580636978149, "std": 1.79688560962677, "min": -2.9832534790039062, "p10": -1.2488800048828124, "median": 0.6460199356079102, "p90": 2.6525512695312505, "max": 7.5319671630859375, "pos_frac": 0.671875, "sample": [1.0423603057861328, 2.0003890991210938, 1.2525138854980469, -1.04510498046875, 1.1849937438964844, -2.0644073486328125, 0.6176242828369141, 0.2868156433105469, -0.5459575653076172, -0.6723403930664062, 5.0323333740234375, -0.04421234130859375, 0.5456314086914062, 2.5343856811523438, 0.7293930053710938, -0.5815658569335938, 0.5697536468505859, -0.1267719268798828, 3.5379867553710938, 2.1210994720458984, 0.8087425231933594, -0.26889801025390625, 1.623565673828125, 1.988250732421875, 2.1564788818359375, 0.8865928649902344, 0.3425750732421875, -0.5379486083984375, -0.3366222381591797, -1.581939697265625, 0.596282958984375, -2.103229522705078, 1.06610107421875, 0.1473865509033203, -0.26961517333984375, 4.06219482421875, 0.047271728515625, 7.5319671630859375, 1.429473876953125, 4.9984893798828125, -0.5276031494140625, 1.3675060272216797, 0.9797897338867188, 0.0167694091796875, 1.2612075805664062, 0.11159133911132812, 2.004301071166992, 0.6744155883789062, 2.3210601806640625, 2.0282745361328125, 3.254741668701172, -1.336212158203125, 2.4026412963867188, -0.7501907348632812, -0.2590618133544922, 0.80816650390625, -1.6444473266601562, 0.5271511077880859, 1.2529678344726562, 2.7031936645507812, -0.4305438995361328, 1.1231880187988281, -2.9832534790039062, -1.4927749633789062], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000081.npy"}
{"epoch": 0.12244897959183673, "step": 82, "batch_size": 64, "mean": 1.0782179832458496, "std": 1.5589746236801147, "min": -4.4263458251953125, "p10": -0.5100061416625976, "median": 1.088693618774414, "p90": 3.2792472839355487, "max": 5.850135803222656, "pos_frac": 0.796875, "sample": [0.8566455841064453, 1.6482162475585938, 0.4191551208496094, -0.4502677917480469, 2.0955657958984375, 0.1992816925048828, 0.9983406066894531, 2.40570068359375, 3.44439697265625, 1.2018146514892578, 1.179046630859375, 0.42008018493652344, 1.341766357421875, 2.6028976440429688, 0.286346435546875, 1.4593048095703125, 1.4196014404296875, 0.11391448974609375, 1.4736595153808594, 0.7050704956054688, 0.5401611328125, -0.17537307739257812, 0.9973011016845703, -0.40972900390625, -0.5964508056640625, 3.7842483520507812, 1.790414810180664, 0.9700927734375, 1.6410064697265625, 1.4171905517578125, 0.027620315551757812, -0.06944084167480469, 5.850135803222656, 0.5703811645507812, 1.8531646728515625, 3.7804412841796875, 2.8938980102539062, 1.5753936767578125, 1.3726272583007812, 3.5485477447509766, 0.3493919372558594, -1.69879150390625, 2.7438278198242188, -1.2946548461914062, 1.6382789611816406, 0.2986602783203125, 1.9793510437011719, 0.0896759033203125, 3.5091171264648438, 1.3134307861328125, 0.0120697021484375, -0.03436851501464844, 1.9610595703125, 3.6906967163085938, -0.2551231384277344, -0.8285179138183594, 2.0179824829101562, 1.5223007202148438, 1.6464061737060547, 0.7365798950195312, -1.0280609130859375, -4.4263458251953125, -0.5356082916259766, 0.41641998291015625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000082.npy"}
{"epoch": 0.12396069538926682, "step": 83, "batch_size": 64, "mean": 1.1423637866973877, "std": 1.8866578340530396, "min": -3.7048492431640625, "p10": -1.0385147094726563, "median": 1.0522651672363281, "p90": 3.020955276489258, "max": 8.191333770751953, "pos_frac": 0.8125, "sample": [2.6965179443359375, 1.476806640625, -1.0300445556640625, 0.20943450927734375, 0.8170223236083984, 0.13379669189453125, 0.9460906982421875, 1.3125457763671875, 2.675994873046875, 4.380645751953125, 0.10700225830078125, 1.5347175598144531, -1.7920265197753906, 1.3194923400878906, 1.7487602233886719, 0.8818378448486328, 1.8706130981445312, 2.5282821655273438, 2.1314449310302734, 1.1584396362304688, 1.9663314819335938, -1.0728492736816406, 4.223182678222656, 0.29380226135253906, -1.7156906127929688, 1.49359130859375, 3.8540496826171875, 0.9318771362304688, 0.4065380096435547, -0.08020210266113281, -1.042144775390625, 0.4236564636230469, 2.9703025817871094, -3.7048492431640625, -0.8698024749755859, 2.268421173095703, 1.5119190216064453, 0.0040740966796875, 1.4666633605957031, -2.2148284912109375, 0.1070098876953125, 0.30865478515625, 0.3746795654296875, 0.34114646911621094, -0.4842796325683594, -0.29815673828125, 0.3887443542480469, 2.0344619750976562, 3.466156005859375, 1.814849853515625, 8.191333770751953, 1.929840087890625, 3.04266357421875, 0.7205467224121094, 0.3773956298828125, 1.9714851379394531, 1.2858467102050781, 1.8582115173339844, 2.2721633911132812, 6.481658935546875, 1.6471405029296875, 0.02436065673828125, 0.126312255859375, -1.0923652648925781], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000083.npy"}
{"epoch": 0.1254724111866969, "step": 84, "batch_size": 64, "mean": 0.839773416519165, "std": 1.957689642906189, "min": -3.519287109375, "p10": -1.6680192947387693, "median": 0.671142578125, "p90": 3.813704681396486, "max": 5.6296844482421875, "pos_frac": 0.71875, "sample": [1.0972824096679688, 2.010955810546875, 0.7313003540039062, 1.7652015686035156, 4.581062316894531, 2.0396060943603516, 3.9829330444335938, -0.3236236572265625, -0.4777946472167969, -3.519287109375, -0.6097869873046875, 2.0962352752685547, 3.207275390625, 0.128997802734375, 0.15367507934570312, -0.003997802734375, 0.34108734130859375, -1.036376953125, 3.9864883422851562, -1.4707832336425781, 1.4412155151367188, 1.8938217163085938, 1.0640125274658203, 2.2614288330078125, 0.31438255310058594, 0.27069091796875, -1.4989700317382812, 3.4188385009765625, 0.03262138366699219, 0.8912582397460938, 1.0402069091796875, 0.395721435546875, 0.06258773803710938, 5.6296844482421875, -2.1011810302734375, 0.9339942932128906, 1.8104743957519531, 2.123899459838867, 1.1642341613769531, -2.3999786376953125, -3.049957275390625, 4.4206085205078125, 1.979766845703125, 0.09303092956542969, 4.2217864990234375, 0.7879161834716797, 0.22315406799316406, 2.865978240966797, 0.599761962890625, -2.5513076782226562, 0.6109848022460938, -1.4210052490234375, 2.216733932495117, 1.7539749145507812, 1.2791175842285156, -1.3177108764648438, 4.22406005859375, 0.45269012451171875, -0.6023788452148438, -1.7415542602539062, 2.9693260192871094, 0.2623138427734375, -1.740468978881836, -0.2207183837890625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000084.npy"}
{"epoch": 0.12698412698412698, "step": 85, "batch_size": 64, "mean": 1.304450273513794, "std": 1.9892730712890625, "min": -3.4084548950195312, "p10": -1.3872287750244139, "median": 1.156402587890625, "p90": 3.75656852722168, "max": 6.781463623046875, "pos_frac": 0.8125, "sample": [2.597625732421875, -1.4268798828125, 1.3466262817382812, 2.31427001953125, 0.9832305908203125, 1.2938232421875, 1.9353866577148438, 0.7798633575439453, 0.2542533874511719, 3.6421546936035156, 3.578369140625, 1.054656982421875, 2.0800342559814453, 0.474578857421875, 1.2566509246826172, 1.8743324279785156, 1.7636222839355469, 3.1421127319335938, 2.8137435913085938, 2.50250244140625, -1.4159202575683594, 2.2035179138183594, -2.621673583984375, 2.6589431762695312, 0.5103034973144531, 3.80560302734375, 0.412109375, 1.1453323364257812, 4.883441925048828, -0.11087226867675781, -1.5123786926269531, 1.8420982360839844, 2.0579833984375, 0.307830810546875, 0.6249942779541016, 0.276885986328125, 4.17218017578125, 2.3938331604003906, 0.6473121643066406, -0.2826499938964844, -3.4084548950195312, 1.93536376953125, -0.3613739013671875, 1.7306709289550781, 6.781463623046875, 3.215496063232422, 5.1235504150390625, 0.4635581970214844, 1.1674728393554688, 1.1046142578125, 1.2696914672851562, -2.3906326293945312, 0.6918601989746094, 0.04534149169921875, 0.8397026062011719, 1.034515380859375, -2.3267059326171875, -1.2928390502929688, 4.420722961425781, 0.9613761901855469, 5.9719390869140625, -1.320281982421875, 0.2733001708984375, 1.2946357727050781], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000085.npy"}
{"epoch": 0.12849584278155707, "step": 86, "batch_size": 64, "mean": 0.9814530611038208, "std": 2.1054396629333496, "min": -3.6268539428710938, "p10": -1.5589244842529297, "median": 0.8800640106201172, "p90": 3.8050209045410157, "max": 6.436958312988281, "pos_frac": 0.671875, "sample": [6.436958312988281, -0.2303905487060547, -2.1921615600585938, 0.1626720428466797, 2.1318130493164062, -0.3302116394042969, 0.383148193359375, 0.5919113159179688, -3.6268539428710938, 1.3323478698730469, -0.4290351867675781, -2.0752029418945312, 4.0233001708984375, 5.512001037597656, -0.9512405395507812, 3.5852813720703125, 1.2477664947509766, -0.6555385589599609, -3.1237106323242188, 0.7819442749023438, 4.73046875, 0.159698486328125, -0.18726348876953125, 2.6146278381347656, 3.7962570190429688, 0.24393081665039062, 1.441558837890625, 1.1108016967773438, 3.0672683715820312, 0.41278839111328125, 2.957103729248047, 3.80877685546875, 2.3889923095703125, -2.37957763671875, 2.9960174560546875, 2.1829071044921875, -1.5795822143554688, 0.8665084838867188, -1.5107231140136719, 0.60443115234375, -1.3927841186523438, 2.6287899017333984, 1.8063545227050781, -0.6017112731933594, 1.056304931640625, -0.017635345458984375, 3.591287612915039, -1.3215408325195312, 2.3239059448242188, -0.7663021087646484, 1.7161102294921875, 4.367332458496094, 0.45428466796875, -0.206085205078125, 2.224224090576172, 3.8272857666015625, -2.189626693725586, 1.859039306640625, 1.1943321228027344, -1.3651351928710938, 0.8936195373535156, 0.2389087677001953, 1.1259498596191406, 1.0662994384765625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000086.npy"}
{"epoch": 0.13000755857898716, "step": 87, "batch_size": 64, "mean": 1.061636209487915, "std": 2.458143711090088, "min": -6.7685546875, "p10": -1.840717315673828, "median": 0.973297119140625, "p90": 4.225119781494141, "max": 6.605583190917969, "pos_frac": 0.71875, "sample": [1.7307701110839844, 5.523193359375, 6.605583190917969, -0.4212818145751953, 1.5437698364257812, 0.6374664306640625, 2.771514892578125, -1.2315216064453125, 3.4109268188476562, 2.676088333129883, -2.403217315673828, -1.3964595794677734, 1.6133384704589844, -2.5744400024414062, -6.7685546875, 5.439727783203125, -2.2693710327148438, 4.079872131347656, 0.9669456481933594, 0.8448524475097656, -1.6937713623046875, 0.10541534423828125, 1.4968509674072266, -1.2143325805664062, 3.753082275390625, -3.5548629760742188, -0.03232574462890625, -0.2533435821533203, 2.2530059814453125, -1.4418754577636719, -0.8272552490234375, 0.2801971435546875, 4.2873687744140625, -2.2180404663085938, 1.5255203247070312, -1.6606483459472656, 0.7627792358398438, 1.6280555725097656, 2.77655029296875, 0.46756553649902344, 0.9796485900878906, 2.500579833984375, 0.5942001342773438, -1.9036941528320312, 1.3011932373046875, 1.4254150390625, 0.8542747497558594, 3.2639236450195312, -0.1363525390625, 3.0218353271484375, 3.6844482421875, 1.275054931640625, 5.524078369140625, 0.11339569091796875, 0.17548370361328125, 1.3893814086914062, 0.5809841156005859, 0.055973052978515625, 6.053802490234375, 5.137626647949219, 1.3099365234375, 1.2506370544433594, 0.210693359375, 2.0630569458007812], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000087.npy"}
{"epoch": 0.13151927437641722, "step": 88, "batch_size": 64, "mean": 0.8105129599571228, "std": 2.926940441131592, "min": -7.8688507080078125, "p10": -2.2023216247558595, "median": 0.8543252944946289, "p90": 4.030390167236329, "max": 10.03387451171875, "pos_frac": 0.671875, "sample": [5.8756866455078125, 0.8259735107421875, -1.9752769470214844, -1.2709274291992188, 7.29766845703125, 3.2346572875976562, 5.28265380859375, 0.16803741455078125, 1.0143814086914062, -0.09890556335449219, 3.424396514892578, 3.7867584228515625, 1.587820053100586, 1.6220932006835938, -0.8135986328125, 1.9763107299804688, 4.134803771972656, -1.8033866882324219, -2.2227630615234375, 0.9030990600585938, 2.8659019470214844, 0.8826770782470703, 0.2735748291015625, -0.48188018798828125, -2.077251434326172, 1.897735595703125, 0.275390625, 10.03387451171875, -1.6071357727050781, -2.7938804626464844, -7.334053039550781, -2.7962417602539062, 3.7771453857421875, 0.914947509765625, 0.9771614074707031, -0.04738616943359375, -0.9710311889648438, 2.1004199981689453, 1.7406768798828125, 1.0482921600341797, 0.25067710876464844, 2.5320892333984375, 2.34033203125, 0.8978538513183594, -2.643096923828125, 0.210845947265625, 0.5066719055175781, 0.3948993682861328, 0.2591819763183594, 0.044162750244140625, 2.0667858123779297, 1.0238571166992188, 6.3211822509765625, 4.137298583984375, 2.6185150146484375, -2.1546249389648438, 1.8604660034179688, 0.14229774475097656, -0.2946014404296875, 2.9943408966064453, -1.089773178100586, -7.8688507080078125, -1.6853446960449219, -2.6207542419433594], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000088.npy"}
{"epoch": 0.1330309901738473, "step": 89, "batch_size": 64, "mean": 1.3490331172943115, "std": 2.5324809551239014, "min": -3.8837509155273438, "p10": -0.8496692657470701, "median": 1.1078758239746094, "p90": 4.0967346191406255, "max": 11.08917236328125, "pos_frac": 0.71875, "sample": [0.6781959533691406, 2.0680694580078125, -0.1877460479736328, 1.7728652954101562, 3.611083984375, 4.021961212158203, -0.5583114624023438, 0.20661163330078125, 1.4899730682373047, 3.7480316162109375, 2.00347900390625, 3.6243209838867188, 2.1063232421875, -0.5456390380859375, 0.2955780029296875, 0.5601577758789062, -0.31504058837890625, 1.15618896484375, -0.20513153076171875, -2.741455078125, 2.3471832275390625, -0.9507217407226562, -0.02098846435546875, 3.0633087158203125, -0.5770149230957031, 11.08917236328125, 2.0681190490722656, 1.0595626831054688, -0.5167045593261719, 4.15997314453125, -0.45113372802734375, 1.8365631103515625, 1.0083847045898438, 0.645904541015625, 4.506591796875, 0.04930686950683594, 3.485076904296875, 1.2313308715820312, -1.26318359375, 2.1987762451171875, 2.852842330932617, 2.6485252380371094, 6.616874694824219, 7.384328842163086, 1.184814453125, 0.07634162902832031, 0.21878433227539062, 3.1685333251953125, 4.128780364990234, -3.4099960327148438, -0.610107421875, 2.631654739379883, 0.36385345458984375, 0.36663818359375, 0.5771274566650391, -3.4431915283203125, 1.8184261322021484, -0.6138801574707031, -3.0877304077148438, -3.8837509155273438, 5.0156097412109375, 0.9819259643554688, 1.5224132537841797, 2.070272445678711], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000089.npy"}
{"epoch": 0.1345427059712774, "step": 90, "batch_size": 64, "mean": 1.3000154495239258, "std": 2.5150344371795654, "min": -5.110992431640625, "p10": -1.6013954162597654, "median": 1.1500835418701172, "p90": 4.284128189086915, "max": 7.338348388671875, "pos_frac": 0.734375, "sample": [-0.9113636016845703, 2.768087387084961, 0.7517719268798828, 2.6013946533203125, 3.597515106201172, 0.40064239501953125, 3.024290084838867, 6.7138519287109375, -3.4210205078125, -5.110992431640625, 3.9715309143066406, 3.5333213806152344, 3.251668930053711, 0.9500885009765625, 2.9453887939453125, -0.9635963439941406, 1.3359222412109375, 6.507781982421875, -1.6292343139648438, 1.1748161315917969, 3.3704833984375, -1.401275634765625, 6.41705322265625, 4.418098449707031, 5.321868896484375, 1.1253509521484375, 1.3586883544921875, 4.48846435546875, 0.44171905517578125, 0.686767578125, 0.4564781188964844, -0.62890625, 1.8760318756103516, 7.338348388671875, 1.5260734558105469, 3.8312530517578125, -2.1097259521484375, -1.2607803344726562, 3.6901473999023438, 1.9273452758789062, -1.247283935546875, 1.3403167724609375, 2.110443115234375, -1.53643798828125, 3.7863845825195312, 0.272003173828125, 1.0048484802246094, 0.4380226135253906, -0.3293304443359375, -0.5802040100097656, -2.970867156982422, 1.3877334594726562, 1.7164382934570312, 0.07240676879882812, 0.6955013275146484, 1.0017986297607422, 0.5268783569335938, -3.1780147552490234, -1.7123031616210938, 2.732105255126953, 1.9342880249023438, 1.7969818115234375, -0.7744998931884766, 0.34842872619628906], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000090.npy"}
{"epoch": 0.1360544217687075, "step": 91, "batch_size": 64, "mean": 1.1158554553985596, "std": 2.8653030395507812, "min": -6.279445648193359, "p10": -1.921957778930664, "median": 0.6590843200683594, "p90": 4.60748634338379, "max": 12.102783203125, "pos_frac": 0.65625, "sample": [3.3077392578125, 3.5221939086914062, 1.1847248077392578, -1.689596176147461, 0.5164566040039062, 6.571781158447266, 0.1390361785888672, 1.3903045654296875, 3.1527099609375, 6.9623565673828125, 12.102783203125, -0.954132080078125, 1.016998291015625, -1.5940513610839844, 1.626901626586914, -1.8119640350341797, -1.1162567138671875, -0.5135498046875, 3.2540550231933594, -2.8406906127929688, 1.9814453125, 3.4928207397460938, 0.4049530029296875, 0.6782913208007812, 1.8076934814453125, 1.9593219757080078, 3.3184261322021484, -2.0556468963623047, -2.8387298583984375, 3.2692031860351562, 0.6398773193359375, -1.9131431579589844, -1.9257354736328125, 1.4572334289550781, 4.916013717651367, -0.8824996948242188, -1.1944732666015625, 2.8237762451171875, -0.6994724273681641, -6.279445648193359, 2.0715484619140625, 1.1689910888671875, -2.074615478515625, -1.4142990112304688, 0.180908203125, 0.2852287292480469, 0.0589141845703125, 5.047220230102539, 2.211925506591797, 1.5568923950195312, -0.25455474853515625, -1.7663192749023438, -0.13559532165527344, -0.15621376037597656, 4.646839141845703, 3.386871337890625, 6.1168975830078125, 0.08325004577636719, 4.515663146972656, 2.5203323364257812, -2.2884521484375, 0.021329879760742188, 1.9554977416992188, 0.48877525329589844], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000091.npy"}
{"epoch": 0.13756613756613756, "step": 92, "batch_size": 64, "mean": 1.5725436210632324, "std": 3.183176040649414, "min": -4.849884033203125, "p10": -1.6498603820800781, "median": 0.9076480865478516, "p90": 5.673493003845215, "max": 11.51458740234375, "pos_frac": 0.671875, "sample": [2.7115554809570312, -0.4689769744873047, -3.314849853515625, -0.05339241027832031, -0.6724472045898438, -1.003143310546875, 0.23620223999023438, 0.6927413940429688, 1.8008499145507812, -0.9928779602050781, -1.6505126953125, 1.4590721130371094, 3.5894203186035156, -1.6483383178710938, 5.231529235839844, 2.457334518432617, 2.1408843994140625, 0.17850685119628906, 5.584199905395508, 0.44539642333984375, 3.168304443359375, -0.935028076171875, 1.6868743896484375, 0.7742462158203125, 4.618896484375, -0.12671470642089844, -3.389801025390625, 1.4784088134765625, -4.343254089355469, 3.6631393432617188, -2.0510005950927734, 2.696880340576172, 6.2738037109375, 0.2308826446533203, 3.7637786865234375, 1.022918701171875, 0.37530517578125, 3.847137451171875, -3.2210617065429688, 1.835601806640625, -0.12117767333984375, 0.13160324096679688, 5.955696105957031, 2.6934890747070312, 1.957632064819336, -1.4954185485839844, 4.412139892578125, 5.7846221923828125, 3.472869873046875, -1.1898117065429688, -0.5489120483398438, 10.420623779296875, 1.1988677978515625, -0.30382728576660156, 4.42559814453125, -4.849884033203125, 11.51458740234375, 4.136253356933594, 0.5842609405517578, 7.973724365234375, 5.711761474609375, 0.7923774719238281, 0.03672027587890625, -0.14347457885742188], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000092.npy"}
{"epoch": 0.13907785336356765, "step": 93, "batch_size": 64, "mean": 1.0267179012298584, "std": 3.223741054534912, "min": -5.793613433837891, "p10": -2.7202384948730467, "median": 0.5600242614746094, "p90": 4.9454063415527365, "max": 14.074699401855469, "pos_frac": 0.609375, "sample": [0.68560791015625, 3.155628204345703, 4.228271484375, 3.7576904296875, -2.8440628051757812, 3.8520278930664062, -2.0507278442382812, -1.33453369140625, -0.07584571838378906, 2.7440109252929688, 1.3600711822509766, 2.3535327911376953, 14.074699401855469, 0.3035850524902344, -0.2430419921875, 0.43444061279296875, 0.4083404541015625, 3.1087379455566406, -0.8538761138916016, -5.793613433837891, 2.7592010498046875, 7.378631591796875, -1.6716384887695312, -0.2533111572265625, 2.5747756958007812, -0.3124275207519531, 4.464866638183594, 0.012393951416015625, -1.6418437957763672, 0.9786605834960938, 0.016632080078125, 0.010435104370117188, -0.4733753204345703, 5.394752502441406, -1.22662353515625, -0.8479843139648438, -0.25630950927734375, -0.09232711791992188, -5.2112274169921875, -0.0057525634765625, 1.2131462097167969, 0.09113693237304688, 2.3244171142578125, 1.7278518676757812, 1.6489639282226562, -2.6883926391601562, 0.8056926727294922, 0.6905803680419922, 5.5136871337890625, 1.3324470520019531, 4.30793571472168, 1.631765365600586, 5.1513519287109375, 1.93414306640625, -2.3694000244140625, -2.73388671875, 5.163188934326172, -5.06036376953125, -3.317462921142578, 5.749931335449219, 3.9595413208007812, -3.2739028930664062, -0.0123138427734375, 3.051422119140625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000093.npy"}
{"epoch": 0.14058956916099774, "step": 94, "batch_size": 64, "mean": 1.8935097455978394, "std": 4.03842306137085, "min": -10.15130615234375, "p10": -1.3360969543457029, "median": 1.8004398345947266, "p90": 5.82388153076172, "max": 14.358169555664062, "pos_frac": 0.765625, "sample": [4.101104736328125, 0.5267181396484375, 1.319793701171875, 3.261058807373047, 1.4879074096679688, -8.504371643066406, -0.07053947448730469, 0.6240367889404297, 1.581024169921875, 2.7188262939453125, 0.6901206970214844, 5.5178985595703125, 1.828277587890625, -8.668342590332031, 2.545196533203125, 5.259498596191406, -1.6178703308105469, -0.5877895355224609, 5.95501708984375, 2.27301025390625, 2.4860668182373047, 2.0882492065429688, 3.224332809448242, 2.129894256591797, 2.7901153564453125, 1.1574668884277344, 0.41800498962402344, 0.5141677856445312, -3.194355010986328, 3.7767791748046875, 3.4568653106689453, -1.1101741790771484, 0.2548179626464844, 0.1809539794921875, 2.9130821228027344, 3.5221290588378906, 4.018226623535156, 2.12896728515625, 3.9328460693359375, 0.402496337890625, 6.336315155029297, 6.414848327636719, -0.0804290771484375, 12.34039306640625, 3.8860435485839844, 4.985137939453125, -1.412994384765625, 11.25433349609375, 4.0553436279296875, 2.1341285705566406, 14.358169555664062, 1.0894756317138672, -0.24503326416015625, -0.7103290557861328, 1.7726020812988281, 0.6049900054931641, -10.15130615234375, 2.0361461639404297, 9.517410278320312, 1.6085796356201172, 0.11206626892089844, -1.1566696166992188, -2.2497825622558594, -0.6463241577148438], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000094.npy"}
{"epoch": 0.1421012849584278, "step": 95, "batch_size": 64, "mean": 1.585265874862671, "std": 3.48620343208313, "min": -6.3185272216796875, "p10": -2.6255027770996096, "median": 1.1804447174072266, "p90": 6.513040924072266, "max": 8.790374755859375, "pos_frac": 0.640625, "sample": [6.026580810546875, -2.9495620727539062, 1.3492279052734375, -0.6595382690429688, -2.6341934204101562, 0.18040084838867188, 8.790374755859375, 0.8755702972412109, 2.8079452514648438, 1.3828506469726562, -1.14715576171875, 1.6764144897460938, 1.13165283203125, -0.18799972534179688, -0.7450485229492188, 1.008941650390625, -3.4552001953125, -2.2045249938964844, -0.3784160614013672, 2.0189476013183594, 7.378387451171875, 6.346527099609375, -0.7021656036376953, 0.9716987609863281, -5.70953369140625, -0.4911956787109375, 7.88543701171875, 6.73004150390625, -3.1448898315429688, -6.3185272216796875, 2.0619354248046875, 6.8686370849609375, 1.2292366027832031, 2.3681564331054688, -0.7900180816650391, 0.9091110229492188, 5.7612152099609375, -1.1628875732421875, -3.9380950927734375, 2.36529541015625, 3.3682708740234375, 0.2386474609375, 5.721809387207031, 2.735443115234375, -2.605224609375, 5.011283874511719, 5.392822265625, 0.28539466857910156, 1.6633338928222656, 6.448585510253906, -0.6422271728515625, 4.7101287841796875, -0.5573215484619141, 3.8539466857910156, 5.562469482421875, 8.380424499511719, -1.2738723754882812, -0.4645233154296875, 1.5781688690185547, 3.8757858276367188, 0.5716304779052734, 1.7895698547363281, 6.5406646728515625, -2.233827590942383], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000095.npy"}
{"epoch": 0.1436130007558579, "step": 96, "batch_size": 64, "mean": 1.8237941265106201, "std": 4.562654495239258, "min": -8.272048950195312, "p10": -2.4108444213867184, "median": 1.3568229675292969, "p90": 7.884473419189456, "max": 15.6845703125, "pos_frac": 0.6875, "sample": [3.2365074157714844, -2.12615966796875, 7.132781982421875, -0.46283721923828125, 12.362442016601562, -8.272048950195312, 3.4856719970703125, 0.20294952392578125, 2.5936050415039062, 0.9110755920410156, 5.159759521484375, -0.59405517578125, 1.4933395385742188, 10.564483642578125, 2.7674179077148438, 1.8539695739746094, -5.041603088378906, 0.24479293823242188, 1.220306396484375, -1.0234107971191406, 1.9790458679199219, 2.4555130004882812, 8.206626892089844, -5.440315246582031, 0.6161041259765625, 0.6157493591308594, 5.181079864501953, -0.8295745849609375, -1.8853683471679688, 0.1459503173828125, 2.200061798095703, 8.472869873046875, 15.6845703125, 4.995613098144531, -1.3001670837402344, -2.5328521728515625, -7.73651123046875, 0.3359184265136719, -2.7766952514648438, 1.8487510681152344, 2.9053497314453125, 1.1863937377929688, 1.9854660034179688, 0.035755157470703125, 12.710372924804688, -1.8907012939453125, 3.8152523040771484, -0.3893585205078125, 1.8438873291015625, -0.5363388061523438, 4.294765472412109, -1.8096504211425781, 0.8087482452392578, 6.619529724121094, 10.742111206054688, 2.5505218505859375, -1.3398914337158203, 0.5060272216796875, 1.5992259979248047, 3.8739471435546875, 6.89251708984375, 1.9187908172607422, -1.7755012512207031, -5.769752502441406], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000096.npy"}
{"epoch": 0.14512471655328799, "step": 97, "batch_size": 64, "mean": 1.0247814655303955, "std": 5.034452438354492, "min": -17.65313720703125, "p10": -3.063512420654297, "median": 0.7616186141967773, "p90": 6.242437362670903, "max": 17.42718505859375, "pos_frac": 0.609375, "sample": [-9.7669677734375, -2.0098533630371094, -0.10608291625976562, 7.625741958618164, 0.6551494598388672, 13.230323791503906, -3.0706100463867188, -3.0469512939453125, -2.5631752014160156, -17.65313720703125, -2.6443920135498047, -0.8143272399902344, 0.246612548828125, 3.304269790649414, -3.9021072387695312, -9.045356750488281, 0.27994346618652344, 1.1683158874511719, 3.0554885864257812, 0.41437721252441406, 1.1834297180175781, 3.7857589721679688, 1.918039321899414, -2.39093017578125, 3.949951171875, -1.4836578369140625, -1.1507797241210938, 5.134361267089844, -1.4406967163085938, 11.251571655273438, 1.8758392333984375, -3.4501495361328125, 3.106842041015625, 0.4927253723144531, 0.8754749298095703, 4.812767028808594, 7.659879684448242, 6.717327117919922, -1.1009578704833984, 4.122802734375, 2.83837890625, -4.992359161376953, 17.42718505859375, 2.5326194763183594, 2.5435104370117188, -1.1657333374023438, 3.494150161743164, 3.5181961059570312, 8.88470458984375, -3.0308914184570312, -0.14240646362304688, 2.1795501708984375, 0.8680877685546875, -1.3539085388183594, -1.3454132080078125, 3.0875167846679688, 3.964822769165039, -2.0048446655273438, 2.7145824432373047, 2.5913238525390625, 0.5406913757324219, 0.01148223876953125, 2.181049346923828, -0.9831409454345703], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000097.npy"}
{"epoch": 0.14663643235071808, "step": 98, "batch_size": 64, "mean": 0.6753795146942139, "std": 3.2884814739227295, "min": -9.170562744140625, "p10": -2.964135932922363, "median": 1.05023193359375, "p90": 4.709123992919922, "max": 6.928985595703125, "pos_frac": 0.640625, "sample": [0.16079330444335938, 1.3961524963378906, 5.29157829284668, 0.9055213928222656, 2.470428466796875, 2.4085617065429688, 2.2788543701171875, 1.9911956787109375, -1.2950897216796875, 5.863800048828125, -9.170562744140625, -2.9181041717529297, -1.21783447265625, -1.4831314086914062, -2.9838638305664062, -0.752960205078125, -0.1463623046875, 1.5886077880859375, 1.089559555053711, -0.5045166015625, 3.290813446044922, 6.238136291503906, -3.714691162109375, 4.73809814453125, 0.7137470245361328, -1.8095455169677734, 3.2377700805664062, 0.9076690673828125, 1.6487884521484375, 0.9834022521972656, 1.9189682006835938, 1.39385986328125, 6.384498596191406, 3.01385498046875, -2.8574371337890625, 1.6854324340820312, 1.3509559631347656, 2.5724105834960938, 1.010904312133789, -0.6743087768554688, 2.6180267333984375, 0.5983200073242188, -8.057952880859375, 1.7301712036132812, -1.2554702758789062, -3.758615493774414, 1.7348403930664062, 1.7295036315917969, -1.476409912109375, 5.662906646728516, -2.1332168579101562, 4.641517639160156, -1.49908447265625, -2.0341548919677734, -8.94403076171875, 3.5270233154296875, 2.1803359985351562, 0.19782066345214844, 0.6791572570800781, 6.928985595703125, 2.3901405334472656, 4.196479797363281, -0.31252288818359375, -3.125438690185547], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000098.npy"}
{"epoch": 0.14814814814814814, "step": 99, "batch_size": 64, "mean": 0.9959125518798828, "std": 4.641501426696777, "min": -13.119476318359375, "p10": -3.5128204345703122, "median": 0.9193649291992188, "p90": 6.57787322998047, "max": 14.655181884765625, "pos_frac": 0.625, "sample": [8.124977111816406, 7.611644744873047, 6.721473693847656, 11.855613708496094, 0.623077392578125, -3.567047119140625, -0.49254608154296875, -1.7555809020996094, 0.9151611328125, 6.296764373779297, 2.230438232421875, 0.9256057739257812, 1.8494377136230469, 4.4849395751953125, -0.6918754577636719, 1.2478790283203125, 1.0921897888183594, -0.2383880615234375, -1.5108985900878906, 1.798187255859375, 6.436088562011719, 0.1849517822265625, -5.5436859130859375, -2.7108497619628906, -5.5794525146484375, -3.38629150390625, 14.655181884765625, -9.479385375976562, -2.4085006713867188, 6.626007080078125, 1.4353294372558594, -3.00262451171875, -0.6588592529296875, 0.21530914306640625, 0.7082862854003906, 1.9300079345703125, 1.2161827087402344, -3.231374740600586, 0.15105056762695312, 5.519985198974609, 0.9235687255859375, -13.119476318359375, 5.820701599121094, 10.58612060546875, -0.3970451354980469, 1.8726577758789062, 6.4655609130859375, 0.6977157592773438, -7.8934783935546875, 1.4507122039794922, -0.3018074035644531, 3.653902053833008, 1.9272422790527344, 0.9835472106933594, 2.411346435546875, -0.13436126708984375, 0.08170700073242188, 4.321556091308594, 1.4937191009521484, -0.8344345092773438, 1.3335494995117188, -2.2951087951660156, -1.6377716064453125, -4.270130157470703], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000099.npy"}
{"epoch": 0.14965986394557823, "step": 100, "batch_size": 64, "mean": 1.477339267730713, "std": 6.1843461990356445, "min": -16.36968994140625, "p10": -3.5797843933105464, "median": 1.0745601654052734, "p90": 7.525942993164066, "max": 25.605133056640625, "pos_frac": 0.671875, "sample": [1.6627197265625, -1.55126953125, -4.138153076171875, 2.5274505615234375, 7.8858642578125, -1.2348976135253906, 0.6616630554199219, 3.761676788330078, 5.626976013183594, 4.041259765625, 2.0941619873046875, 1.1517410278320312, 4.118629455566406, 12.716804504394531, 1.5888900756835938, 4.2058868408203125, -1.0338821411132812, 1.3301734924316406, 0.6354331970214844, 1.0024871826171875, 0.6318244934082031, 0.79864501953125, -5.5449371337890625, 0.554473876953125, -4.169380187988281, 4.6244659423828125, -1.4718494415283203, 8.361740112304688, 0.9892997741699219, -14.793487548828125, -1.0048637390136719, -0.2725677490234375, -2.8511962890625, 1.1466331481933594, 0.712646484375, -0.9577865600585938, 3.6180267333984375, 0.5096588134765625, -13.203865051269531, 5.5713958740234375, 1.619720458984375, 18.647079467773438, -3.7242889404296875, 0.6861038208007812, -1.87396240234375, -0.8419990539550781, 2.8879432678222656, 1.5133705139160156, -1.217477798461914, -0.9310798645019531, 6.686126708984375, 9.022048950195312, -0.5708465576171875, 2.5686893463134766, 1.43780517578125, 1.5463981628417969, -3.2426071166992188, 12.043930053710938, 5.090156555175781, 1.5059356689453125, 25.605133056640625, 0.3324127197265625, -16.36968994140625, 1.826324462890625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000100.npy"}
{"epoch": 0.15117157974300832, "step": 101, "batch_size": 64, "mean": 2.0707483291625977, "std": 4.241043567657471, "min": -11.549530029296875, "p10": -2.2558044433593745, "median": 1.7790508270263672, "p90": 6.691229248046876, "max": 13.512153625488281, "pos_frac": 0.71875, "sample": [-1.3135948181152344, -0.5788955688476562, 2.9315643310546875, 0.4061279296875, 6.371086120605469, 6.3338470458984375, 4.196449279785156, 0.8519058227539062, -0.185211181640625, -0.17036056518554688, 0.5506381988525391, 4.3004608154296875, -11.549530029296875, 7.921974182128906, 4.388916015625, -0.058902740478515625, 4.560234069824219, 2.5934181213378906, -1.0045166015625, -2.896068572998047, 4.416149139404297, 1.5323028564453125, 5.4325714111328125, -1.9303817749023438, 3.6223716735839844, -1.7341499328613281, 4.166179656982422, 4.2882080078125, 0.8840847015380859, 6.458892822265625, 1.2547721862792969, 10.721160888671875, 5.59132194519043, 0.6955928802490234, -0.719635009765625, -2.4105453491210938, 1.0616836547851562, -6.575920104980469, -2.3952713012695312, 4.376861572265625, -4.5757598876953125, 3.7177658081054688, 1.256134033203125, 1.8501129150390625, 6.790802001953125, 2.066986083984375, 0.8239517211914062, -9.076065063476562, 0.05159187316894531, 1.7079887390136719, 7.0338592529296875, 0.5452728271484375, 7.212297439575195, 2.927318572998047, 3.7408447265625, -0.1295928955078125, 13.512153625488281, 2.6128768920898438, -0.7595901489257812, 2.8206710815429688, 6.0104827880859375, 3.179941177368164, 11.118305206298828, 1.7037467956542969], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000101.npy"}
{"epoch": 0.15268329554043839, "step": 102, "batch_size": 64, "mean": 0.8522800207138062, "std": 5.460872650146484, "min": -9.717391967773438, "p10": -4.530429840087891, "median": 0.37404537200927734, "p90": 7.16289405822754, "max": 20.6900634765625, "pos_frac": 0.546875, "sample": [7.279876708984375, -3.5727996826171875, -2.115327835083008, 1.3520355224609375, -7.192047119140625, -1.1912479400634766, -4.473045349121094, 2.6856307983398438, -7.929618835449219, -0.17313385009765625, 2.3486785888671875, 3.8578643798828125, 0.7456512451171875, 1.7173309326171875, 7.45513916015625, -2.7066574096679688, -2.4121036529541016, 17.10430908203125, -3.5725936889648438, 9.845748901367188, -2.9311084747314453, 0.9990215301513672, -2.8182830810546875, -0.7661972045898438, 2.8164424896240234, 2.4388504028320312, 0.4662742614746094, 6.889934539794922, -9.666465759277344, 1.0565757751464844, 6.703422546386719, 0.7376823425292969, -4.555023193359375, 0.4927864074707031, -3.7321014404296875, -0.7969284057617188, 4.27703857421875, 7.8080902099609375, 11.564468383789062, -2.5364856719970703, -0.41153907775878906, 1.971527099609375, -9.717391967773438, -0.6170730590820312, -1.4070167541503906, 0.140899658203125, 0.057281494140625, 5.2889404296875, 0.2818164825439453, -1.9050331115722656, 1.928436279296875, 4.697837829589844, 1.7916488647460938, -4.76591682434082, -1.4742088317871094, -3.7650604248046875, -1.6452598571777344, 20.6900634765625, -2.841552734375, -7.1662445068359375, 3.7843704223632812, 5.39459228515625, 5.934535980224609, 0.7985858917236328], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000102.npy"}
{"epoch": 0.15419501133786848, "step": 103, "batch_size": 64, "mean": 1.5969194173812866, "std": 3.722949743270874, "min": -6.484672546386719, "p10": -2.6689689636230467, "median": 1.0410003662109375, "p90": 6.934427642822266, "max": 10.347648620605469, "pos_frac": 0.65625, "sample": [-3.1431884765625, -0.4028816223144531, 7.6903076171875, -6.484672546386719, 0.1576557159423828, 3.711597442626953, -0.8014907836914062, -1.0181388854980469, 1.0035057067871094, 6.879425048828125, -2.2496795654296875, 2.0062255859375, -2.526092529296875, 1.1844253540039062, -2.3146133422851562, 5.307685852050781, 2.7337989807128906, -0.8594741821289062, -3.41986083984375, 6.198642730712891, 1.3941497802734375, 4.7642669677734375, -0.010402679443359375, 2.7699317932128906, 2.0378494262695312, -0.8517837524414062, -1.0072517395019531, 8.934097290039062, -2.7302017211914062, 2.818429946899414, 4.005401611328125, 6.958000183105469, 8.51202392578125, -1.2542781829833984, 0.2678070068359375, 4.51416015625, -6.173095703125, 3.768768310546875, 0.6561813354492188, 0.6261062622070312, 5.271247863769531, 1.0775146484375, 0.8388957977294922, 2.6423492431640625, 3.2401294708251953, 1.004486083984375, 6.776500701904297, -2.116046905517578, 7.040916442871094, 0.07835769653320312, 9.574142456054688, 2.6416854858398438, -4.573493957519531, 0.7994651794433594, -3.0875091552734375, 0.29456329345703125, -1.6028366088867188, -1.0860023498535156, -1.1615657806396484, 1.1933517456054688, 3.4545516967773438, 3.2012939453125, 10.347648620605469, 2.699859619140625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000103.npy"}
{"epoch": 0.15570672713529857, "step": 104, "batch_size": 64, "mean": 2.348520278930664, "std": 6.509428024291992, "min": -12.042640686035156, "p10": -4.177348327636718, "median": 1.0256805419921875, "p90": 9.256547546386722, "max": 29.02874755859375, "pos_frac": 0.6875, "sample": [9.509521484375, 4.8010711669921875, 3.788074493408203, -2.105264663696289, -4.489662170410156, 2.6742019653320312, -1.656881332397461, 0.9583892822265625, 4.985908508300781, -0.433685302734375, 0.94921875, -12.042640686035156, 29.02874755859375, 7.7437591552734375, -6.4932403564453125, -1.853485107421875, 12.8800048828125, 1.9536018371582031, -0.3276481628417969, 7.611961364746094, -5.0973052978515625, -10.044769287109375, 6.00030517578125, 8.666275024414062, 0.35198974609375, -0.4650611877441406, 0.8923072814941406, -7.315940856933594, 5.409303665161133, -0.1999187469482422, 0.42765235900878906, 0.8291873931884766, -0.8792629241943359, 2.613431930541992, 8.235958099365234, 3.3666000366210938, 0.979644775390625, -0.9212436676025391, 3.5981292724609375, 2.7298736572265625, 1.07171630859375, -2.632413864135742, 18.07293701171875, 0.9541511535644531, 3.9696121215820312, 11.492378234863281, 3.4663963317871094, 0.9103641510009766, -9.077110290527344, 0.04015350341796875, 1.8285160064697266, 3.081491470336914, 6.185169219970703, 4.687328338623047, 1.3112144470214844, -3.4486160278320312, 0.07305145263671875, -1.6558380126953125, 0.7263088226318359, 1.3145751953125, -3.263896942138672, 6.439888000488281, 12.379074096679688, 15.7197265625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000104.npy"}
{"epoch": 0.15721844293272866, "step": 105, "batch_size": 64, "mean": 1.7908658981323242, "std": 4.9530768394470215, "min": -8.1932373046875, "p10": -4.027791595458983, "median": 1.3142194747924805, "p90": 7.751492881774904, "max": 18.76129150390625, "pos_frac": 0.65625, "sample": [1.7184638977050781, 5.471229553222656, 2.9722900390625, 3.4253387451171875, 1.3224315643310547, 4.924646377563477, -7.7764434814453125, 10.27557373046875, -3.1418914794921875, 2.363687515258789, -0.4499664306640625, 4.028057098388672, -1.5523605346679688, 7.901542663574219, 0.015869140625, 8.259788513183594, -1.2066841125488281, -5.9932861328125, -0.7742595672607422, 4.3856658935546875, 2.9680252075195312, -2.1592063903808594, 1.3060073852539062, -4.407463073730469, 0.8315582275390625, 3.9370574951171875, 1.0118522644042969, -5.8020477294921875, 2.8174209594726562, -6.52362060546875, -1.906005859375, 1.0945377349853516, -1.491912841796875, 1.66717529296875, -8.1932373046875, 18.76129150390625, 1.5999603271484375, -0.4232635498046875, 2.07098388671875, 0.2705707550048828, 0.021625518798828125, -0.8678855895996094, 0.3552742004394531, 1.6195316314697266, -5.5285491943359375, 4.855831146240234, 5.6432037353515625, -0.9926605224609375, 6.164794921875, -2.5798873901367188, -2.8980560302734375, 2.473857879638672, 0.9370269775390625, 10.935371398925781, 13.633621215820312, 0.19188690185546875, 7.401376724243164, 5.993919372558594, -0.4431495666503906, 2.7689952850341797, -0.796600341796875, 10.40399169921875, 4.5435333251953125, 7.178993225097656], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000105.npy"}
{"epoch": 0.15873015873015872, "step": 106, "batch_size": 64, "mean": 1.541475534439087, "std": 5.294121265411377, "min": -16.755966186523438, "p10": -4.969756317138671, "median": 1.351980209350586, "p90": 5.998196029663086, "max": 15.727142333984375, "pos_frac": 0.6875, "sample": [-5.365211486816406, 4.912055969238281, 3.674917221069336, 5.203182220458984, -6.734870910644531, -0.8591995239257812, 8.720451354980469, 2.8078975677490234, 0.6790618896484375, 0.8389167785644531, 4.616401672363281, 6.076580047607422, 10.85546875, 3.6782150268554688, 0.4768199920654297, -3.890819549560547, 1.6486396789550781, 5.762977600097656, 3.5919113159179688, 0.73248291015625, -0.4496612548828125, 0.06794357299804688, -7.7640533447265625, 2.5786285400390625, -0.9586954116821289, -0.15301513671875, -0.4284343719482422, -4.047027587890625, 1.4748611450195312, 15.727142333984375, -0.9277839660644531, -0.43212127685546875, -1.7883071899414062, -2.4571800231933594, 5.4385528564453125, 5.279365539550781, 0.6615085601806641, -1.3209686279296875, 5.259544372558594, 15.346221923828125, 4.243328094482422, 2.630474090576172, 4.681512832641602, 0.8060569763183594, -6.502693176269531, 1.0950698852539062, 1.7958621978759766, 1.2290992736816406, -16.755966186523438, 1.217803955078125, -9.496307373046875, 1.93817138671875, 5.34375, 4.682277679443359, 5.815299987792969, 7.933052062988281, 0.9441871643066406, 2.739215850830078, -0.9536323547363281, 9.925872802734375, -7.9333343505859375, 1.5976715087890625, 0.5259475708007812, 2.61932373046875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000106.npy"}
{"epoch": 0.1602418745275888, "step": 107, "batch_size": 64, "mean": 0.8103493452072144, "std": 6.1209716796875, "min": -17.62261962890625, "p10": -5.271922302246093, "median": 1.3040409088134766, "p90": 7.1670860290527365, "max": 17.779953002929688, "pos_frac": 0.65625, "sample": [1.2874717712402344, -5.392799377441406, 2.008575439453125, 0.20940017700195312, 6.650489807128906, 3.8669281005859375, -16.239288330078125, 6.6895904541015625, -0.19008255004882812, 1.8958740234375, -17.62261962890625, 0.107269287109375, 2.126230239868164, 1.19415283203125, 3.7208099365234375, -3.766206741333008, -3.370452880859375, 4.9353790283203125, 3.421895980834961, 1.0964202880859375, -1.39910888671875, 10.349212646484375, 8.751472473144531, 0.4708137512207031, 4.346759796142578, -9.446426391601562, 3.9126853942871094, -2.5769309997558594, -1.4246597290039062, 1.0649948120117188, 3.1913070678710938, -6.971412658691406, 1.829681396484375, 5.725471496582031, 0.7008590698242188, 4.3831787109375, 6.2447052001953125, 1.3206100463867188, 2.9603271484375, 9.599388122558594, 11.82794189453125, 7.371726989746094, -4.8283843994140625, -2.1854782104492188, 2.9675426483154297, 0.6932296752929688, -0.3273887634277344, 3.2287826538085938, 2.4800682067871094, -10.71368408203125, 8.880874633789062, 2.8182945251464844, -2.2792625427246094, -3.531553268432617, -0.4342231750488281, 0.6826248168945312, 2.260528564453125, 17.779953002929688, 2.6347599029541016, -2.1015853881835938, -4.989875793457031, -14.976516723632812, 2.1770401000976562, -3.2350234985351562], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000107.npy"}
{"epoch": 0.1617535903250189, "step": 108, "batch_size": 64, "mean": 1.7884013652801514, "std": 5.279046058654785, "min": -11.37994384765625, "p10": -4.7596496582031245, "median": 1.4553642272949219, "p90": 7.981703948974613, "max": 18.427169799804688, "pos_frac": 0.625, "sample": [2.38494873046875, 4.490791320800781, 10.369781494140625, -0.7267227172851562, -0.2830085754394531, 5.352195739746094, 4.00128173828125, 4.345424652099609, 5.8592071533203125, -4.8332672119140625, 1.1904449462890625, 6.208408355712891, 3.49639892578125, -1.2702045440673828, 5.451080322265625, 2.0478973388671875, -2.8572311401367188, -0.19457244873046875, 0.5993156433105469, 0.9191112518310547, 7.134918212890625, -0.037639617919921875, -0.5201129913330078, 3.556121826171875, 0.6797561645507812, -6.892387390136719, 3.3500213623046875, -11.37994384765625, 12.97637939453125, 0.6448593139648438, -8.4208984375, -1.941619873046875, 5.983848571777344, -0.6886329650878906, 8.344612121582031, 5.059320449829102, 0.06170463562011719, -0.1707611083984375, -6.172462463378906, 5.304573059082031, -3.6935501098632812, 8.788856506347656, 2.463043212890625, -1.3785476684570312, -0.29449462890625, 1.9817276000976562, -1.5565547943115234, 2.8296051025390625, 1.8230209350585938, -4.5878753662109375, -7.495738983154297, 2.730438232421875, -5.0867156982421875, 2.6342239379882812, 11.065933227539062, 0.09444046020507812, 2.310680389404297, 18.427169799804688, 1.7202835083007812, 0.1307525634765625, -0.3844795227050781, 5.4863739013671875, -1.3162117004394531, 14.342361450195312], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000108.npy"}
{"epoch": 0.16326530612244897, "step": 109, "batch_size": 64, "mean": 3.001702308654785, "std": 5.188776016235352, "min": -5.535255432128906, "p10": -1.3072458267211915, "median": 1.8216361999511719, "p90": 7.930307388305664, "max": 23.65997314453125, "pos_frac": 0.734375, "sample": [0.19487762451171875, 13.274894714355469, 15.3619384765625, 0.7777290344238281, 3.9376049041748047, 3.6876888275146484, 6.797756195068359, 0.3204345703125, 2.0068435668945312, 4.888797760009766, -0.05283927917480469, 1.2131118774414062, 2.0453872680664062, 5.606464385986328, 2.65838623046875, -4.128448486328125, 2.7328720092773438, 0.4585590362548828, -1.0842971801757812, 1.2671775817871094, 1.131204605102539, 4.914325714111328, -3.6954498291015625, -0.2868804931640625, -0.4366302490234375, 23.65997314453125, 0.23206329345703125, -0.6679458618164062, 3.8849868774414062, 18.10400390625, 3.523773193359375, 1.7969436645507812, 1.6433334350585938, 0.08991241455078125, 0.2488574981689453, 2.4590377807617188, -1.2947368621826172, 5.104522705078125, -0.5105438232421875, 4.717647552490234, -1.4344596862792969, 4.1412811279296875, 1.8463287353515625, 6.9324493408203125, -0.5529880523681641, -1.3126068115234375, 0.30861663818359375, 1.5384235382080078, -5.535255432128906, -3.5460739135742188, 1.3223228454589844, 3.3231964111328125, 4.732093811035156, 7.876167297363281, 7.953510284423828, 13.824241638183594, 11.21121597290039, 2.3115234375, 5.771156311035156, 3.3724212646484375, -2.7718124389648438, -1.0820674896240234, -0.96575927734375, 6.2616729736328125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000109.npy"}
{"epoch": 0.16477702191987906, "step": 110, "batch_size": 64, "mean": 0.2836933434009552, "std": 5.067322731018066, "min": -17.777191162109375, "p10": -5.110834503173828, "median": 0.6910171508789062, "p90": 6.205155944824221, "max": 13.1171875, "pos_frac": 0.578125, "sample": [1.3483085632324219, -0.22670364379882812, -3.260223388671875, 0.7375030517578125, -0.41948699951171875, 6.427162170410156, 3.7209625244140625, -1.0869865417480469, -9.431671142578125, 1.5766220092773438, 5.402809143066406, 2.937671661376953, 1.2591934204101562, 6.7877960205078125, 0.202484130859375, -2.9635467529296875, 0.45246124267578125, 9.774673461914062, -0.8695449829101562, -4.063348770141602, 2.377859115600586, -2.1588363647460938, 1.9042110443115234, 5.687141418457031, 0.2774639129638672, 1.8974838256835938, 0.917510986328125, 1.1279296875, 1.0645980834960938, -5.5300750732421875, -2.877410888671875, 2.2966079711914062, 2.1949501037597656, -1.4629287719726562, -5.151985168457031, 0.64453125, -5.0148162841796875, 4.323211669921875, -1.4918899536132812, 1.864248275756836, 7.656341552734375, 4.420356750488281, -0.6854400634765625, -3.0155029296875, -1.1242485046386719, 3.2393951416015625, 4.718410491943359, -6.609947204589844, -7.98822021484375, -15.345535278320312, -0.4833221435546875, 1.6609573364257812, -0.8513641357421875, -0.9816513061523438, -0.21021270751953125, 13.1171875, 8.178913116455078, 3.262929916381836, 1.7478713989257812, 6.7356109619140625, -3.872568130493164, 0.9169158935546875, 0.2507476806640625, -17.777191162109375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000110.npy"}
{"epoch": 0.16628873771730915, "step": 111, "batch_size": 64, "mean": 1.8971482515335083, "std": 4.761153221130371, "min": -6.242866516113281, "p10": -3.117704582214355, "median": 1.2384767532348633, "p90": 9.072660827636719, "max": 15.87060546875, "pos_frac": 0.59375, "sample": [-1.0264167785644531, -2.972970962524414, -0.2614898681640625, 13.40008544921875, 1.8799362182617188, 6.358024597167969, -1.4930992126464844, 3.2008819580078125, 2.2274742126464844, 0.5978755950927734, 5.184410095214844, -0.3943023681640625, -3.3180389404296875, 1.765167236328125, -6.0091552734375, 0.1963214874267578, -0.1615314483642578, 1.0857868194580078, -2.4148941040039062, 2.3351669311523438, 8.4061279296875, 4.9400634765625, 10.746879577636719, 9.100616455078125, 10.560531616210938, 3.758028030395508, 3.92889404296875, 5.774879455566406, 2.0077171325683594, -6.242866516113281, -0.1665515899658203, -2.6787281036376953, -4.586456298828125, -5.297771453857422, -0.18526840209960938, -1.9253387451171875, 0.3276958465576172, 4.5047607421875, -1.8982620239257812, 5.654441833496094, 9.007431030273438, 7.6055755615234375, 9.466415405273438, -1.2638988494873047, -1.8172035217285156, -0.31993865966796875, 3.1559104919433594, 9.139698028564453, -2.8596649169921875, 1.3911666870117188, 5.057319641113281, 3.715442657470703, 3.0368709564208984, -2.176034927368164, -5.4482269287109375, 1.0741119384765625, -3.1797332763671875, 3.0599822998046875, -2.956707000732422, 1.6930923461914062, 15.87060546875, -1.6791305541992188, 2.4047813415527344, 0.5309982299804688], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000111.npy"}
{"epoch": 0.16780045351473924, "step": 112, "batch_size": 64, "mean": 2.2222859859466553, "std": 4.324650764465332, "min": -8.848968505859375, "p10": -2.3277210235595702, "median": 2.184194564819336, "p90": 7.80662307739258, "max": 11.41851806640625, "pos_frac": 0.71875, "sample": [10.70513916015625, 6.8447723388671875, 8.959854125976562, 3.5177879333496094, -0.07941055297851562, 1.1849136352539062, 11.41851806640625, 3.0650863647460938, 5.357398986816406, 2.9816017150878906, 0.05252838134765625, 5.110382080078125, -4.695831298828125, 2.2508468627929688, 1.0391387939453125, -6.517913818359375, -1.635833740234375, 3.2071456909179688, -4.663246154785156, -0.4553260803222656, 3.1833953857421875, -1.757659912109375, 3.268688201904297, 6.080619812011719, 2.1627655029296875, 3.350025177001953, 5.107994079589844, 1.92333984375, 11.4122314453125, 3.2508182525634766, 7.24462890625, 2.0326995849609375, -2.3632583618164062, 0.8195362091064453, 5.970802307128906, -0.0110626220703125, 1.9176063537597656, -5.023193359375, 8.047477722167969, 2.7811050415039062, -1.650186538696289, 2.838470458984375, 4.268589019775391, 2.2056236267089844, 1.5493316650390625, 9.596267700195312, 6.24346923828125, 5.611354827880859, -1.1884918212890625, 1.0346946716308594, -8.062347412109375, 2.2294654846191406, -1.0253067016601562, 1.846405029296875, 6.4475250244140625, 1.0797805786132812, 1.2670555114746094, -2.244800567626953, -1.9353904724121094, 4.134674072265625, -0.639984130859375, -8.848968505859375, 1.8848190307617188, 8.538139343261719], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000112.npy"}
{"epoch": 0.1693121693121693, "step": 113, "batch_size": 64, "mean": 1.2924858331680298, "std": 4.160187244415283, "min": -14.006973266601562, "p10": -3.30301513671875, "median": 1.1308870315551758, "p90": 6.3239585876464846, "max": 11.633026123046875, "pos_frac": 0.640625, "sample": [-2.22369384765625, -4.871559143066406, -2.0721588134765625, 0.9683589935302734, -1.2260169982910156, 5.550502777099609, 2.3517379760742188, -6.502967834472656, 0.2618865966796875, 3.800771713256836, 1.2934150695800781, -1.1104660034179688, -2.523193359375, 2.36083984375, -3.1649932861328125, 0.351806640625, 1.5095787048339844, 7.849475860595703, -1.2271347045898438, -3.7310333251953125, -3.3621673583984375, 2.9361572265625, 6.380405426025391, -4.4300384521484375, -0.912506103515625, 2.8820266723632812, 4.198974609375, 2.8907032012939453, 6.463600158691406, -0.6039237976074219, 3.839550018310547, 11.633026123046875, 0.6924915313720703, -1.1612815856933594, 3.2390518188476562, 6.816652297973633, -14.006973266601562, 2.8952274322509766, 5.43243408203125, -0.6608810424804688, 1.3843345642089844, 5.438940048217773, -3.544891357421875, 5.498924255371094, -2.6632652282714844, 7.6539154052734375, 0.698089599609375, 0.29018592834472656, -0.7196102142333984, 4.642353057861328, 3.6509246826171875, -2.270050048828125, 1.5205440521240234, 10.03533935546875, 4.8151702880859375, -0.06867408752441406, 2.8752822875976562, 0.4766712188720703, 4.250274658203125, 0.3453216552734375, 1.5967369079589844, 0.2337799072265625, -2.4211387634277344, 6.192249298095703], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000113.npy"}
{"epoch": 0.1708238851095994, "step": 114, "batch_size": 64, "mean": 2.164108991622925, "std": 4.07156229019165, "min": -5.088050842285156, "p10": -2.869025039672851, "median": 1.363144874572754, "p90": 7.645086479187015, "max": 13.7447509765625, "pos_frac": 0.671875, "sample": [-5.088050842285156, -3.5941085815429688, 1.0300331115722656, 5.602088928222656, 4.054731369018555, 1.0770492553710938, 3.5043258666992188, 5.773567199707031, 4.273414611816406, -2.367877960205078, -3.277280807495117, 1.5619621276855469, 3.056751251220703, -2.42449951171875, -0.3756980895996094, 1.0891265869140625, 3.5220184326171875, -0.0863800048828125, 0.42472267150878906, -0.9945526123046875, 2.8222274780273438, -1.3154067993164062, 13.7447509765625, -2.2480239868164062, 0.39829254150390625, 6.2718505859375, 3.992137908935547, 4.743101119995117, 0.22251129150390625, 6.097911834716797, -1.3002204895019531, 1.3575611114501953, 7.910923004150391, 4.0482330322265625, 2.8839492797851562, 1.0005912780761719, 6.0495147705078125, 2.6984710693359375, 5.371746063232422, -4.455116271972656, 13.633056640625, 7.024801254272461, 3.772796630859375, -0.07715606689453125, 4.0019989013671875, 0.3020801544189453, 9.334564208984375, -1.3030548095703125, -1.884695053100586, -3.0595359802246094, -0.1210174560546875, 0.12823486328125, 3.7916393280029297, 8.12994384765625, 8.174308776855469, 8.4058837890625, 1.7284603118896484, -0.05191230773925781, -4.161216735839844, -3.366456985473633, 5.889614105224609, -1.1837310791015625, 1.3687286376953125, 0.9692955017089844], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000114.npy"}
{"epoch": 0.17233560090702948, "step": 115, "batch_size": 64, "mean": 1.7653319835662842, "std": 3.626727819442749, "min": -5.431640625, "p10": -2.061806488037109, "median": 1.5453357696533203, "p90": 6.286008071899415, "max": 12.138931274414062, "pos_frac": 0.6875, "sample": [-1.3430099487304688, -0.8863563537597656, 3.7563705444335938, -4.902191162109375, 12.138931274414062, 1.945596694946289, 3.2226638793945312, 4.131984710693359, 0.9990768432617188, 11.765365600585938, 1.6399688720703125, -1.152740478515625, 9.395309448242188, -4.482521057128906, -1.8268051147460938, -2.1800003051757812, 0.3682060241699219, 4.51129150390625, 0.8943328857421875, 2.1385498046875, 4.446807861328125, 2.655426025390625, 1.2016754150390625, 0.08158111572265625, -2.791454315185547, 1.8509063720703125, 2.8872909545898438, 1.8218002319335938, -5.431640625, 6.387035369873047, -2.1625213623046875, -0.1878814697265625, 0.5773067474365234, 6.0502777099609375, 1.4507026672363281, 4.010475158691406, -0.40512847900390625, 3.153261184692383, -1.2527618408203125, 1.9994735717773438, 3.294219970703125, -5.430538177490234, 5.618072509765625, 2.34832763671875, 7.9029541015625, 2.648204803466797, 2.8546981811523438, 1.7879352569580078, -0.5833206176757812, 5.456298828125, 0.6974449157714844, 0.3992805480957031, 6.614124298095703, -0.5823612213134766, 0.46729469299316406, -0.45379638671875, -1.5979080200195312, -1.0825672149658203, -0.3712921142578125, 0.8451004028320312, 3.7807674407958984, 0.4357128143310547, 2.4858264923095703, 8.970115661621094], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000115.npy"}
{"epoch": 0.17384731670445955, "step": 116, "batch_size": 64, "mean": 2.411888599395752, "std": 3.305551052093506, "min": -4.471530914306641, "p10": -1.564462661743164, "median": 2.393850326538086, "p90": 6.119873046875, "max": 10.1458740234375, "pos_frac": 0.71875, "sample": [5.401115417480469, -1.5489349365234375, 0.8891677856445312, 7.698997497558594, 4.22662353515625, 3.4590835571289062, 2.4149017333984375, 1.1343307495117188, 4.467493057250977, 5.721778869628906, 5.00445556640625, -2.2850685119628906, -3.9788436889648438, 8.029632568359375, 0.7267341613769531, -1.0944175720214844, 0.6406707763671875, -0.09057998657226562, 4.040008544921875, 3.9197616577148438, -4.471530914306641, 2.3785324096679688, 4.753582000732422, 2.409168243408203, -0.043689727783203125, 4.908935546875, 4.103515625, 6.083370208740234, 4.916107177734375, 1.6151046752929688, 1.0688362121582031, -0.32161903381347656, -1.8917770385742188, 1.1607398986816406, 8.512557983398438, 0.29756927490234375, 4.129905700683594, 5.7588653564453125, -0.17567062377929688, -1.3689918518066406, 3.772655487060547, 3.1437149047851562, 3.7225570678710938, 1.97845458984375, 10.1458740234375, 4.149169921875, 8.36492919921875, -3.4474868774414062, 0.8450851440429688, 2.2746658325195312, -0.2645549774169922, 5.460182189941406, 0.5912551879882812, -4.264198303222656, -0.5109176635742188, 4.480690002441406, -0.12977981567382812, 3.5920867919921875, 0.8324317932128906, 7.4383697509765625, 5.347412109375, -0.32655906677246094, 6.135517120361328, -1.5711174011230469], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000116.npy"}
{"epoch": 0.17535903250188964, "step": 117, "batch_size": 64, "mean": 1.7048879861831665, "std": 3.5822792053222656, "min": -8.751533508300781, "p10": -2.572180557250976, "median": 1.4574689865112305, "p90": 5.740165710449219, "max": 11.439254760742188, "pos_frac": 0.765625, "sample": [1.5033111572265625, 4.962137222290039, 3.5646209716796875, 0.3896217346191406, 5.6804046630859375, 2.8687191009521484, 4.279563903808594, 1.562856674194336, -1.0870132446289062, 3.1851673126220703, 0.5540885925292969, -1.7561073303222656, 5.148956298828125, -3.157470703125, 0.33123016357421875, 4.01629638671875, -1.9914703369140625, 2.0388946533203125, 3.0919265747070312, 4.929817199707031, 1.0821762084960938, 3.2558021545410156, 3.652252197265625, 3.0305862426757812, -1.2772102355957031, -3.07318115234375, 1.0814208984375, 0.4076080322265625, 2.6222896575927734, 9.248298645019531, -1.5357513427734375, 7.991020202636719, 5.798938751220703, -0.15380096435546875, 6.6807708740234375, 1.9757156372070312, 3.32806396484375, 7.453662872314453, 1.2650833129882812, 5.765777587890625, 2.0275344848632812, -0.5003547668457031, 0.891510009765625, 3.554840087890625, -1.634368896484375, -8.751533508300781, 1.4116268157958984, 0.7970809936523438, 3.121673583984375, 3.7241439819335938, 1.0875701904296875, 1.105377197265625, 4.725364685058594, -2.821056365966797, 11.439254760742188, -5.0633697509765625, -2.942829132080078, 0.9381866455078125, -8.241012573242188, 0.45256805419921875, 4.170997619628906, 0.5317611694335938, 0.2713050842285156, 0.10148811340332031], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000117.npy"}
{"epoch": 0.17687074829931973, "step": 118, "batch_size": 64, "mean": 2.1893467903137207, "std": 3.1141438484191895, "min": -5.609075546264648, "p10": -1.7140655517578123, "median": 2.1208629608154297, "p90": 6.339834403991699, "max": 8.945487976074219, "pos_frac": 0.703125, "sample": [2.6626358032226562, 1.8127365112304688, 4.376495361328125, -2.4851837158203125, -5.609075546264648, -0.3176078796386719, 4.1561431884765625, 1.7812957763671875, 4.072965621948242, -1.1136932373046875, 2.3211097717285156, 2.7886962890625, 0.7208614349365234, 7.068656921386719, 1.076650619506836, -0.34853363037109375, -2.2915725708007812, 4.092746734619141, 0.08517074584960938, 1.1430950164794922, -0.8963356018066406, 6.2453155517578125, 3.3142852783203125, 4.611419677734375, 2.6906356811523438, -1.4383430480957031, -0.1995849609375, -2.0474472045898438, 4.270566940307617, -0.05406761169433594, 3.3676834106445312, 2.442228317260742, 6.79046630859375, -1.3191070556640625, 0.5853424072265625, 3.2754669189453125, 1.8020248413085938, 8.945487976074219, 6.401458740234375, -3.117767333984375, -1.9596481323242188, 6.1634368896484375, 0.1342926025390625, 4.8175506591796875, -0.494415283203125, 5.7393341064453125, -1.1894302368164062, 6.366455078125, 4.2333526611328125, 6.4369659423828125, 3.5177345275878906, 1.4043464660644531, 1.9206161499023438, 7.923053741455078, 6.05120849609375, 1.67742919921875, -1.824981689453125, 3.942026138305664, 5.410106658935547, 6.277719497680664, 2.95068359375, -1.45526123046875, 1.0328960418701172, -0.6205978393554688], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000118.npy"}
{"epoch": 0.17838246409674982, "step": 119, "batch_size": 64, "mean": 1.539390206336975, "std": 3.9495739936828613, "min": -8.886917114257812, "p10": -3.0609941482543945, "median": 1.3000240325927734, "p90": 6.581449127197267, "max": 14.390838623046875, "pos_frac": 0.6875, "sample": [-0.5192794799804688, 4.20733642578125, -2.4769420623779297, 1.0715866088867188, -3.140716552734375, 0.218719482421875, 4.439764022827148, 2.67889404296875, -1.6303863525390625, 0.6655540466308594, -2.8749752044677734, 0.041568756103515625, 1.7120590209960938, 6.153022766113281, 5.671482086181641, -8.886917114257812, 0.054119110107421875, -3.727630615234375, 0.6252555847167969, 2.5413055419921875, -1.9073257446289062, 5.5266265869140625, -0.5259552001953125, 2.5550479888916016, 7.467018127441406, 3.0859222412109375, -3.3092079162597656, 3.1934547424316406, 7.1292266845703125, 7.915252685546875, 5.4475860595703125, -2.3763351440429688, 2.7081241607666016, 3.3813114166259766, 0.1753997802734375, 14.390838623046875, 1.4651508331298828, -0.03980255126953125, 0.6902942657470703, 6.7650604248046875, 3.532001495361328, 5.441497802734375, -2.557100296020508, 0.7291259765625, -4.3349151611328125, -3.526792526245117, 7.3572235107421875, 4.166042327880859, 3.094024658203125, 2.6962890625, 1.94000244140625, -7.94024658203125, -2.309112548828125, -1.452615737915039, 1.134897232055664, 1.5717201232910156, 3.995542526245117, 7.660194396972656, 5.013378143310547, 0.667327880859375, 1.8516731262207031, 1.0726547241210938, -1.2198982238769531, -0.6234512329101562], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000119.npy"}
{"epoch": 0.17989417989417988, "step": 120, "batch_size": 64, "mean": 2.2099881172180176, "std": 2.8830983638763428, "min": -3.794475555419922, "p10": -0.6813880920410156, "median": 1.6012954711914062, "p90": 5.403333854675293, "max": 10.457595825195312, "pos_frac": 0.78125, "sample": [0.12860107421875, 1.30035400390625, -2.102336883544922, 5.380205154418945, 3.741455078125, 0.631072998046875, 4.526760101318359, 0.8774948120117188, 0.8244590759277344, -3.3931808471679688, 4.3344573974609375, 1.3597526550292969, 1.3506107330322266, 2.506519317626953, 3.09283447265625, 9.281448364257812, 3.80572509765625, 0.3278179168701172, 1.5927581787109375, 3.072784423828125, 4.433990478515625, -0.4654083251953125, 3.498798370361328, 2.230436325073242, 1.1791839599609375, 1.3816814422607422, 4.641654968261719, 2.3189468383789062, 3.76141357421875, 4.8580169677734375, 10.457595825195312, -1.1740036010742188, 6.773826599121094, -0.7099113464355469, 6.230815887451172, 0.2817344665527344, 2.3179359436035156, -0.22446823120117188, 5.683200836181641, -0.291259765625, -0.45136070251464844, 2.035541534423828, 0.2879180908203125, 9.540252685546875, -0.7557468414306641, -0.5569000244140625, -3.794475555419922, 5.1491546630859375, 3.1536407470703125, 0.36153411865234375, 0.8636245727539062, -0.3890533447265625, 5.413246154785156, 0.54315185546875, 3.610136032104492, 0.0430450439453125, 1.609832763671875, 4.813591003417969, -0.6148338317871094, -2.0807037353515625, 4.046173095703125, 1.4243927001953125, 5.244846343994141, 2.1184654235839844], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000120.npy"}
{"epoch": 0.18140589569160998, "step": 121, "batch_size": 64, "mean": 2.17172908782959, "std": 3.374420166015625, "min": -2.8598785400390625, "p10": -1.3640235900878905, "median": 1.417144775390625, "p90": 5.990299797058108, "max": 15.244873046875, "pos_frac": 0.78125, "sample": [2.8831939697265625, 7.788482666015625, 0.07207489013671875, -1.5253887176513672, 5.381261825561523, 8.336334228515625, 6.251316070556641, -1.27569580078125, 2.6238441467285156, 1.9188003540039062, -0.30165863037109375, -0.906463623046875, 10.783401489257812, 2.1186180114746094, 0.6045074462890625, 0.3683757781982422, 4.225736618041992, -1.4018783569335938, 3.4292831420898438, -2.8598785400390625, 4.9017791748046875, 0.7186393737792969, 4.7187042236328125, 1.7299118041992188, 2.5919723510742188, 2.7451858520507812, 1.2000885009765625, -1.6987438201904297, 1.9826812744140625, 2.102313995361328, -0.5838279724121094, 15.244873046875, 2.552806854248047, 1.4228363037109375, -0.16090774536132812, 0.1380157470703125, 0.497589111328125, 1.6626663208007812, 2.1495208740234375, 1.0150680541992188, 0.47884559631347656, 0.33890724182128906, 1.9503097534179688, 3.6971397399902344, 1.4114532470703125, 1.1814498901367188, -2.3714599609375, 11.297874450683594, 0.4302978515625, 2.371633529663086, 0.4577465057373047, 0.80181884765625, 1.6515655517578125, 3.565753936767578, 1.1458320617675781, 10.00445556640625, 1.2997570037841797, -1.4559974670410156, 0.8009929656982422, -0.3539543151855469, -0.009029388427734375, 3.4219970703125, 4.935394287109375, -1.5075626373291016], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000121.npy"}
{"epoch": 0.18291761148904007, "step": 122, "batch_size": 64, "mean": 2.198540449142456, "std": 3.7733025550842285, "min": -3.906402587890625, "p10": -1.882708740234375, "median": 1.4292564392089844, "p90": 6.594112396240235, "max": 16.37091064453125, "pos_frac": 0.765625, "sample": [16.37091064453125, -1.9023971557617188, -0.224822998046875, 6.046390533447266, -3.906402587890625, -1.261077880859375, 6.71856689453125, 6.303718566894531, 0.6116275787353516, 3.9482765197753906, 3.9646682739257812, 1.5062408447265625, 0.6181812286376953, 1.2511520385742188, 0.5161266326904297, 7.926971435546875, -0.566253662109375, 2.46771240234375, -2.909473419189453, 5.10826301574707, 0.8375587463378906, 10.160211563110352, 3.086078643798828, 4.987436294555664, 7.221752166748047, 2.6016845703125, -0.4572315216064453, -2.3236122131347656, 2.264923095703125, 3.6437606811523438, -3.02435302734375, 4.7568206787109375, 0.04618263244628906, 0.0949554443359375, -3.3860549926757812, 3.57464599609375, 1.9301013946533203, -1.8367691040039062, 2.798555374145508, 0.010589599609375, -0.5956878662109375, 1.1394577026367188, -3.2147865295410156, 9.008377075195312, 0.0650177001953125, 4.650003433227539, 2.0522117614746094, 1.690155029296875, 0.47902679443359375, 1.2122802734375, 1.7955894470214844, 2.1191062927246094, 0.6205730438232422, 1.3522720336914062, -1.3084335327148438, 1.9337291717529297, -0.055202484130859375, 0.23556137084960938, 0.0693206787109375, 4.664237976074219, 4.679542541503906, 4.353599548339844, 0.919219970703125, 13.26580810546875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000122.npy"}
{"epoch": 0.18442932728647016, "step": 123, "batch_size": 64, "mean": 2.0282022953033447, "std": 3.063011646270752, "min": -6.9947052001953125, "p10": -1.1912887573242186, "median": 1.937032699584961, "p90": 5.8734375000000005, "max": 9.02947998046875, "pos_frac": 0.765625, "sample": [5.894809722900391, 3.4371719360351562, 0.6846466064453125, 7.6243896484375, -3.239542007446289, -0.7338352203369141, 2.1189918518066406, 0.459320068359375, 9.02947998046875, 5.131738662719727, 0.5181732177734375, 5.893341064453125, 0.3309211730957031, -2.170196533203125, 4.389366149902344, 3.9403305053710938, 2.2032604217529297, 3.2743186950683594, -1.8960800170898438, 5.3466033935546875, 1.9312515258789062, 5.701789855957031, 1.97015380859375, 5.35786247253418, 3.9455413818359375, 7.330535888671875, 2.4040088653564453, -1.3107833862304688, 4.6015472412109375, 1.9428138732910156, 0.6261444091796875, 8.596641540527344, -3.0117759704589844, 4.24824333190918, -0.6654548645019531, -3.3065223693847656, 1.9525299072265625, -0.9124679565429688, 0.017040252685546875, 1.3108367919921875, 1.340667724609375, 0.16640090942382812, 2.2671051025390625, 1.1319122314453125, 1.5849571228027344, 4.0725555419921875, -0.5701103210449219, -0.4313850402832031, 0.8395252227783203, 5.480777740478516, 0.5786895751953125, 2.282867431640625, 1.1254043579101562, 3.578754425048828, -6.9947052001953125, 5.826995849609375, -0.7098598480224609, 2.0649948120117188, -0.8765182495117188, 3.2048110961914062, 0.9561901092529297, 1.5330543518066406, 6.53192138671875, -0.14720916748046875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000123.npy"}
{"epoch": 0.18594104308390022, "step": 124, "batch_size": 64, "mean": 2.656966209411621, "std": 3.030804395675659, "min": -4.499378204345703, "p10": -0.8597339630126952, "median": 2.5366249084472656, "p90": 6.550251007080078, "max": 11.2562255859375, "pos_frac": 0.84375, "sample": [-0.7221260070800781, 10.894668579101562, 3.5212326049804688, -4.499378204345703, 2.4598617553710938, 2.4281845092773438, 11.2562255859375, 4.437244415283203, 7.218879699707031, 7.3320159912109375, 3.7694473266601562, 3.9128074645996094, 1.0124053955078125, 8.62017822265625, 1.095907211303711, 1.1093406677246094, 1.7831878662109375, 4.758920669555664, 1.3168563842773438, -1.3961772918701172, 4.2845458984375, 2.687458038330078, 3.502735137939453, 0.12789154052734375, 1.4071159362792969, 3.1766433715820312, 1.280364990234375, 0.7817840576171875, 1.2865371704101562, -1.4707221984863281, 3.461597442626953, 3.307525634765625, 1.459930419921875, 5.92010498046875, 3.4233741760253906, 3.80426025390625, 7.691001892089844, -2.322265625, 6.581825256347656, 2.669769287109375, 0.47496795654296875, 6.4765777587890625, -0.9187088012695312, 5.042472839355469, 2.7692909240722656, 3.2359085083007812, 2.1844024658203125, 2.6133880615234375, -1.669015884399414, 4.527647018432617, 6.301300048828125, 0.7214927673339844, 0.07960891723632812, 2.972686767578125, 1.503082275390625, 1.7128219604492188, 1.382110595703125, -0.38034820556640625, -2.813945770263672, 2.43499755859375, -0.6331214904785156, 3.2833328247070312, 0.2080230712890625, 5.1656951904296875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000124.npy"}
{"epoch": 0.1874527588813303, "step": 125, "batch_size": 64, "mean": 2.1459460258483887, "std": 3.12943434715271, "min": -3.0765762329101562, "p10": -1.6770866394042967, "median": 1.8384780883789062, "p90": 6.691307067871094, "max": 10.40875244140625, "pos_frac": 0.734375, "sample": [3.43060302734375, 5.1957855224609375, 2.7867431640625, 1.4960174560546875, 1.4527130126953125, -1.1839618682861328, 10.40875244140625, 0.24304580688476562, 2.440032958984375, 2.7839393615722656, 10.202789306640625, -0.5391159057617188, -0.680328369140625, 3.917633056640625, 1.5585002899169922, 6.558658599853516, 5.588783264160156, 2.8995208740234375, 1.6489715576171875, -0.5142230987548828, 7.24030876159668, -0.024616241455078125, 2.8525543212890625, 3.7248687744140625, -1.0768203735351562, 2.0325870513916016, 5.894783020019531, -1.9996814727783203, -1.4718856811523438, -0.0451202392578125, 4.867584228515625, 1.9930953979492188, -0.927337646484375, 6.706207275390625, 4.731655120849609, -1.7650299072265625, 4.270057678222656, 1.495269775390625, 7.204933166503906, 7.7470855712890625, 6.6565399169921875, 1.0521774291992188, 1.8552398681640625, 3.898601531982422, 2.4150047302246094, -3.0608444213867188, 0.22660064697265625, 6.864994049072266, 0.14274215698242188, 0.12858963012695312, 0.248138427734375, 2.69024658203125, 2.375732421875, 1.2661361694335938, -2.8925933837890625, 2.2312545776367188, 0.6181449890136719, -2.4360275268554688, 2.861766815185547, -0.37459754943847656, 1.82171630859375, -2.0973587036132812, -3.0765762329101562, 0.7795562744140625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000125.npy"}
{"epoch": 0.1889644746787604, "step": 126, "batch_size": 64, "mean": 1.4770548343658447, "std": 3.4547524452209473, "min": -6.491004943847656, "p10": -2.070420837402344, "median": 1.3500375747680664, "p90": 5.039773559570313, "max": 14.780044555664062, "pos_frac": 0.640625, "sample": [2.3241806030273438, -0.05079078674316406, 4.989219665527344, 1.6589431762695312, -0.08518791198730469, 5.692081451416016, 5.9766998291015625, 0.14250564575195312, 1.3107452392578125, -2.0823822021484375, 4.647846221923828, 4.1641998291015625, -5.9238433837890625, 1.5795059204101562, 5.061439514160156, 1.3893299102783203, -1.1028289794921875, -1.0746955871582031, -0.3548240661621094, -2.042510986328125, -0.4837799072265625, 3.9997787475585938, 1.7381019592285156, 2.0049943923950195, -0.1953754425048828, 0.06565475463867188, 2.0314102172851562, -3.1175308227539062, 0.5304946899414062, -1.4378013610839844, -0.133453369140625, -0.697235107421875, -2.4266357421875, 2.9543590545654297, -2.000925064086914, -1.1343345642089844, -1.4437198638916016, 14.780044555664062, 3.3130836486816406, -6.491004943847656, 1.1053695678710938, 2.9014129638671875, 2.432279586791992, 3.49774169921875, 2.6528587341308594, 2.8202743530273438, 3.9786758422851562, 9.16278076171875, 0.9803237915039062, 2.5019073486328125, 1.5845527648925781, 6.8401947021484375, 3.1792068481445312, 2.7871742248535156, 1.9119720458984375, -0.170867919921875, -2.5005035400390625, 4.8846893310546875, 0.1039581298828125, 1.175323486328125, -0.1456756591796875, 8.619895935058594, -4.927633285522461, 1.079833984375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000126.npy"}
{"epoch": 0.19047619047619047, "step": 127, "batch_size": 64, "mean": 2.3593273162841797, "std": 2.808340549468994, "min": -5.3937835693359375, "p10": -0.5373237609863281, "median": 1.7770938873291016, "p90": 6.33441162109375, "max": 9.93890380859375, "pos_frac": 0.796875, "sample": [-0.6333541870117188, 7.91668701171875, 3.1994476318359375, 3.743175506591797, 0.6243972778320312, 3.7982330322265625, 3.0793838500976562, 6.324531555175781, 0.7665023803710938, 6.4530792236328125, 4.838420867919922, 1.9057693481445312, 2.2874069213867188, 5.908594131469727, 3.686779022216797, 1.5611763000488281, 3.9402999877929688, 3.4905242919921875, -0.46665191650390625, 1.6920509338378906, 4.475135803222656, -5.3937835693359375, 3.234283447265625, 6.040996551513672, 2.6679458618164062, -2.8738250732421875, 1.395843505859375, 1.3568267822265625, -0.4117889404296875, 1.0617523193359375, -0.18541336059570312, 1.8621368408203125, 0.13762474060058594, 5.3895263671875, -0.16233444213867188, 5.411069869995117, 0.5436038970947266, 6.338645935058594, 5.988376617431641, 9.93890380859375, -0.5676116943359375, 4.593284606933594, 0.9410057067871094, -1.6537113189697266, 3.0247802734375, 1.1624679565429688, 0.17792510986328125, -0.32843780517578125, 6.4859466552734375, -0.984405517578125, 1.4462738037109375, 6.4823455810546875, 2.2802162170410156, 0.5277023315429688, 0.6664695739746094, 0.7150535583496094, 1.3205413818359375, 1.1908493041992188, 0.6846504211425781, 2.468252182006836, 3.2468700408935547, -0.23651123046875, 7.045932769775391, -0.6249313354492188], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000127.npy"}
{"epoch": 0.19198790627362056, "step": 128, "batch_size": 64, "mean": 2.0811781883239746, "std": 3.7920241355895996, "min": -6.644035339355469, "p10": -2.730030822753906, "median": 1.9933319091796875, "p90": 7.585766983032227, "max": 10.160003662109375, "pos_frac": 0.703125, "sample": [-0.289093017578125, 7.9389495849609375, 1.7368507385253906, 3.6845474243164062, 2.7257118225097656, 1.4886627197265625, 2.4313812255859375, 3.3240966796875, 1.6496238708496094, 8.204864501953125, 1.9042816162109375, -6.558502197265625, 7.088962554931641, -1.1588058471679688, 2.9253082275390625, -3.2232933044433594, 0.6424694061279297, -2.8006820678710938, -4.110084533691406, 6.3468017578125, 7.03094482421875, -4.0499114990234375, -2.5651779174804688, 10.160003662109375, 9.813278198242188, 3.848602294921875, 0.20209503173828125, 7.628978729248047, -0.39849281311035156, 5.06512451171875, 0.7436676025390625, 3.6794281005859375, 1.8507957458496094, 5.366119384765625, 3.9466590881347656, 3.535858154296875, 0.889801025390625, 7.740882873535156, -3.466411590576172, -0.5529327392578125, -2.312835693359375, 1.8323020935058594, 2.1578750610351562, 0.16494369506835938, -1.6837272644042969, -0.5012893676757812, 2.1623096466064453, 3.4921913146972656, 4.8455352783203125, -1.541921615600586, 3.0987091064453125, 2.0823822021484375, 7.4849395751953125, -0.6372528076171875, 4.450784683227539, 1.5215396881103516, 0.273529052734375, 2.9753799438476562, 9.423721313476562, -0.7379379272460938, -6.644035339355469, -0.21240234375, 4.269031524658203, 2.810272216796875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000128.npy"}
{"epoch": 0.19349962207105065, "step": 129, "batch_size": 64, "mean": 1.5748928785324097, "std": 3.242370843887329, "min": -6.831367492675781, "p10": -1.9952682495117187, "median": 0.838648796081543, "p90": 5.560966110229493, "max": 11.438045501708984, "pos_frac": 0.640625, "sample": [2.9764175415039062, -0.6854953765869141, 3.6804275512695312, -2.1911087036132812, -0.26854896545410156, 5.586894989013672, -2.35614013671875, -1.5690593719482422, -0.01580810546875, 5.855808258056641, 4.2144622802734375, 1.3957633972167969, 3.2163925170898438, -1.228546142578125, -1.958770751953125, -2.0109100341796875, 5.048675537109375, 0.35498809814453125, 6.8582000732421875, 5.792289733886719, 0.13427734375, -3.117401123046875, 1.61004638671875, 2.290283203125, -1.8833675384521484, 8.686492919921875, 3.586977005004883, 0.33673858642578125, 2.936321258544922, 3.9349365234375, 3.593830108642578, -1.2063827514648438, -3.0093536376953125, 5.414070129394531, 0.07014846801757812, 3.0337677001953125, -3.7082138061523438, 0.00684356689453125, -0.28098106384277344, 6.864728927612305, 3.600433349609375, 0.022356033325195312, -0.71337890625, 0.8896713256835938, 2.968738555908203, 2.420927047729492, 0.42028045654296875, 11.438045501708984, 4.2589111328125, 5.192718505859375, 2.109729766845703, 4.442970275878906, 5.500465393066406, -0.7173614501953125, -0.33031463623046875, -1.8278331756591797, 3.5228424072265625, -6.831367492675781, -0.641937255859375, -0.000713348388671875, 0.7876262664794922, -0.105865478515625, 0.2918205261230469, 2.1046829223632812], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000129.npy"}
{"epoch": 0.19501133786848074, "step": 130, "batch_size": 64, "mean": 1.9433494806289673, "std": 2.7514312267303467, "min": -4.096702575683594, "p10": -1.7778629302978515, "median": 1.9755744934082031, "p90": 5.118476104736328, "max": 10.160144805908203, "pos_frac": 0.796875, "sample": [2.5105247497558594, 1.4531288146972656, 6.914115905761719, 3.5029678344726562, 4.420246124267578, 3.052032470703125, -2.4033203125, 5.123023986816406, 1.5597114562988281, 0.5910186767578125, -2.1580963134765625, 4.1954803466796875, 3.9844818115234375, 0.9569644927978516, -1.7930526733398438, 2.7964935302734375, 2.6276473999023438, 0.09081268310546875, 2.3535003662109375, 5.312835693359375, 3.5736770629882812, 3.9342117309570312, 5.893745422363281, 4.971305847167969, -0.8270378112792969, -1.7424201965332031, -1.283111572265625, 0.829071044921875, -2.277933120727539, 0.23984146118164062, 3.4597930908203125, 5.1078643798828125, 3.8651351928710938, 2.502290725708008, 0.08733367919921875, 10.160144805908203, 3.7758941650390625, 1.6723785400390625, 7.4278564453125, 0.11458206176757812, -0.171142578125, 5.053028106689453, -1.2896194458007812, -3.489299774169922, 2.0848731994628906, 2.65997314453125, 1.9846458435058594, 0.5019798278808594, 3.0619659423828125, -0.4843406677246094, -2.721254348754883, 1.7415447235107422, 0.7923431396484375, 3.4702892303466797, -4.096702575683594, 1.7733116149902344, 0.27686309814453125, 1.9665031433105469, 1.1248626708984375, 2.354522705078125, 6.467887878417969, 2.628673553466797, 1.3529300689697266, 0.7553863525390625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000130.npy"}
{"epoch": 0.1965230536659108, "step": 131, "batch_size": 64, "mean": 2.3442277908325195, "std": 3.528257131576538, "min": -6.8982391357421875, "p10": -2.1461605072021483, "median": 2.226506233215332, "p90": 6.475154113769531, "max": 9.621185302734375, "pos_frac": 0.765625, "sample": [0.7457046508789062, 3.681882858276367, 0.14261245727539062, -0.464080810546875, -1.7816352844238281, 9.621185302734375, 1.9299774169921875, 3.0116958618164062, -1.292165756225586, 5.949195861816406, 2.6547622680664062, 1.2065467834472656, -2.442211151123047, -2.18719482421875, 1.1614265441894531, 3.9883079528808594, -3.9903106689453125, 2.1063156127929688, 5.635185241699219, -0.245361328125, -2.151325225830078, -2.1341094970703125, 5.703765869140625, 4.113273620605469, 2.761444091796875, 0.6936492919921875, 2.3466968536376953, -4.6086273193359375, 0.4895439147949219, -0.4020538330078125, 3.5416717529296875, 6.3783721923828125, 7.0987701416015625, 4.986297607421875, 6.0418853759765625, 7.305999755859375, 1.6132545471191406, 7.8999176025390625, 2.6930160522460938, 5.9788970947265625, 2.4532241821289062, 8.681846618652344, -1.433736801147461, 5.437232971191406, 2.4896278381347656, 0.5112380981445312, 0.9333610534667969, 5.273750305175781, -3.5320205688476562, 4.662109375, 0.024913787841796875, -0.20603179931640625, 4.097484588623047, 0.3494071960449219, -6.8982391357421875, 5.383842468261719, 1.2993049621582031, 1.1207218170166016, 6.516632080078125, 0.3750419616699219, 1.6550750732421875, 6.296995162963867, 9.223474502563477, 5.533138275146484], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000131.npy"}
{"epoch": 0.1980347694633409, "step": 132, "batch_size": 64, "mean": 2.0535614490509033, "std": 3.052639961242676, "min": -6.743799209594727, "p10": -1.139122772216797, "median": 2.3334474563598633, "p90": 5.741617584228516, "max": 8.895797729492188, "pos_frac": 0.75, "sample": [6.591621398925781, 1.2934684753417969, 3.067626953125, 3.888988494873047, -0.8147068023681641, -1.6868057250976562, 3.6165771484375, 0.326995849609375, -0.9903411865234375, 5.408134460449219, 4.9232940673828125, 5.372636795043945, 2.8432960510253906, 2.5256500244140625, -2.6580123901367188, 1.371124267578125, -0.4895668029785156, 6.13037109375, 2.5373001098632812, -1.1281585693359375, -0.2551841735839844, -6.1729736328125, 3.5595474243164062, 1.1810226440429688, -0.7877540588378906, -3.1147003173828125, 2.9311752319335938, 5.237148284912109, 6.267967224121094, -0.7175502777099609, -0.0315399169921875, 0.3754844665527344, -6.743799209594727, 1.8607177734375, 2.20355224609375, 1.5699005126953125, 8.042686462402344, 4.243001937866211, 4.972568511962891, 3.8416748046875, -0.3963661193847656, 8.895797729492188, 0.7508087158203125, 5.308769226074219, 5.09947395324707, 2.9398574829101562, 1.065277099609375, 0.7606773376464844, 1.596078872680664, 2.4633426666259766, 3.510242462158203, 1.210968017578125, 5.802589416503906, 2.9562911987304688, 6.057788848876953, 2.5186920166015625, 2.5345611572265625, -1.1438217163085938, 0.5084381103515625, 3.0487060546875, -2.515350341796875, 1.8116302490234375, 5.5993499755859375, 0.451690673828125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000132.npy"}
{"epoch": 0.19954648526077098, "step": 133, "batch_size": 64, "mean": 1.8627970218658447, "std": 3.050084114074707, "min": -7.0967559814453125, "p10": -1.235038375854492, "median": 1.7790889739990234, "p90": 5.2559186935424815, "max": 11.124114990234375, "pos_frac": 0.75, "sample": [3.6840972900390625, -1.455902099609375, 1.6915016174316406, 5.962818145751953, 0.19170570373535156, 0.7452468872070312, 1.2486610412597656, 6.0836944580078125, 0.2237224578857422, -0.4941558837890625, 2.9824485778808594, 6.354736328125, 4.671545028686523, 0.6620368957519531, -2.561431884765625, 4.1163482666015625, 4.0491485595703125, -7.0967559814453125, 3.517833709716797, 0.6369705200195312, -0.6970481872558594, 2.821949005126953, 0.2658958435058594, 4.443145751953125, 3.5481319427490234, 0.33087921142578125, 3.2416534423828125, 0.133880615234375, 0.5475387573242188, -0.5379657745361328, 2.54071044921875, 1.4351673126220703, -4.1283721923828125, 4.949432373046875, -1.2604866027832031, 3.0891189575195312, -0.4873619079589844, 0.74505615234375, 4.5915069580078125, 4.1580352783203125, 1.5123519897460938, 2.1154708862304688, 3.3349609375, -4.507350921630859, 1.9743881225585938, 8.066665649414062, 3.8341026306152344, 4.3833770751953125, 1.8666763305664062, 11.124114990234375, -0.1370849609375, -0.0837860107421875, -1.1103973388671875, 6.421142578125, 3.1629791259765625, 0.12984848022460938, 3.706298828125, 5.387269973754883, 3.0238380432128906, -1.1756591796875, 3.3514175415039062, -0.9835357666015625, -2.7927398681640625, 1.669525146484375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000133.npy"}
{"epoch": 0.20105820105820105, "step": 134, "batch_size": 64, "mean": 2.63435959815979, "std": 3.213198184967041, "min": -4.07220458984375, "p10": -0.5131366729736326, "median": 2.4292993545532227, "p90": 6.1049648284912115, "max": 11.904739379882812, "pos_frac": 0.796875, "sample": [1.016693115234375, -0.1799774169921875, 2.8030853271484375, 2.5166664123535156, 1.4114532470703125, 2.3419322967529297, 0.6391792297363281, 3.8205642700195312, 6.641208648681641, 4.380645751953125, 2.7415847778320312, 4.1786041259765625, 3.485565185546875, 3.0177345275878906, 2.0722122192382812, -0.06773567199707031, 4.348972320556641, 2.9307098388671875, -0.6067390441894531, 2.605316162109375, 4.602764129638672, 2.3142166137695312, 5.440668106079102, 0.7911567687988281, 1.2635459899902344, 5.980136871337891, 0.3074951171875, 1.2119827270507812, -1.7407417297363281, 4.191335678100586, -2.2368946075439453, -0.19982528686523438, 6.1584625244140625, 0.7973175048828125, 0.5541286468505859, -3.570526123046875, 2.1742706298828125, 2.2566184997558594, 3.7690277099609375, 3.3972530364990234, 11.535972595214844, 5.159177780151367, -4.07220458984375, -2.7565765380859375, 3.9048080444335938, -0.13763427734375, 7.0634002685546875, 0.354278564453125, 9.612884521484375, 2.226022720336914, -0.22686004638671875, 4.222751617431641, 4.440406799316406, 4.34735107421875, 10.038406372070312, 0.24806976318359375, -2.081035614013672, 3.4981231689453125, 1.8448734283447266, 3.775135040283203, 5.408660888671875, 11.904739379882812, 1.0229225158691406, -0.29473114013671875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000134.npy"}
{"epoch": 0.20256991685563114, "step": 135, "batch_size": 64, "mean": 1.7294639348983765, "std": 3.098318338394165, "min": -6.674522399902344, "p10": -1.517701148986816, "median": 1.4146900177001953, "p90": 5.358306503295901, "max": 9.584358215332031, "pos_frac": 0.71875, "sample": [3.8122177124023438, -1.1171607971191406, 5.827232360839844, 1.1784534454345703, 1.2040462493896484, -1.6753177642822266, 1.6061248779296875, -2.3641357421875, -0.6108226776123047, 9.584358215332031, 2.95635986328125, 0.33852386474609375, 3.8576278686523438, -0.391204833984375, -0.9078216552734375, -2.4268035888671875, 4.857635498046875, 2.253082275390625, -1.0351581573486328, 1.3400211334228516, 1.5719413757324219, -0.3055267333984375, -6.674522399902344, 0.7793045043945312, 1.4837570190429688, 5.572879791259766, 0.6391487121582031, 0.49167633056640625, 0.39228057861328125, 1.4345970153808594, 2.1038589477539062, 1.6533203125, 3.8183212280273438, 1.0723495483398438, 3.5183029174804688, 1.8371238708496094, 0.8790130615234375, 2.6739044189453125, 3.97882080078125, -4.31370735168457, -0.03784751892089844, 7.9232635498046875, 0.2444000244140625, 1.3885421752929688, 2.686044692993164, 1.3947830200195312, -0.02484130859375, -3.5869979858398438, 4.193510055541992, 4.161102294921875, 3.843280792236328, -3.2987518310546875, 0.9733505249023438, 4.364452362060547, 7.847679138183594, 9.445690155029297, -1.1499290466308594, 3.607999801635742, -0.5941715240478516, 2.904918670654297, 1.7592926025390625, 4.694208145141602, 7.510219573974609, -0.45861053466796875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000135.npy"}
{"epoch": 0.20408163265306123, "step": 136, "batch_size": 64, "mean": 2.10935115814209, "std": 4.2255706787109375, "min": -7.099895477294922, "p10": -2.6634048461914057, "median": 2.132863998413086, "p90": 8.391702842712403, "max": 12.783187866210938, "pos_frac": 0.671875, "sample": [12.152801513671875, 5.784740447998047, -4.586385726928711, -1.6451797485351562, -2.0971221923828125, 3.32281494140625, 0.33829498291015625, 1.62591552734375, -0.24666595458984375, 2.2629241943359375, 2.9980087280273438, -5.7323150634765625, -0.4140758514404297, 3.542387008666992, -1.1724090576171875, 3.1193008422851562, 8.461519241333008, 1.2142753601074219, 1.3508892059326172, -1.9585342407226562, 4.3615570068359375, 2.1888999938964844, 3.6937789916992188, 8.228797912597656, -0.992095947265625, -0.5449600219726562, 9.166389465332031, 2.259471893310547, 1.842376708984375, 12.00653076171875, 4.5207672119140625, 0.4342308044433594, 4.513275146484375, 5.720306396484375, -1.89404296875, 2.974395751953125, -0.49906158447265625, 2.9649715423583984, 3.13214111328125, 1.7266769409179688, -7.099895477294922, 3.7691802978515625, 1.933868408203125, 5.110893249511719, 0.5106277465820312, 4.6437530517578125, -0.4458732604980469, 2.0768280029296875, -3.538665771484375, 12.783187866210938, 2.4153594970703125, -3.5104751586914062, -2.906097412109375, 2.224700927734375, -0.1439971923828125, 3.570587158203125, 2.4996490478515625, -0.6594161987304688, 9.324213027954102, 10.849781036376953, 3.901702880859375, -0.7124862670898438, 0.23516464233398438, -5.959709167480469], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000136.npy"}
{"epoch": 0.20559334845049132, "step": 137, "batch_size": 64, "mean": 2.0380120277404785, "std": 3.324810743331909, "min": -5.692291259765625, "p10": -2.100225257873535, "median": 1.867833137512207, "p90": 6.002580261230469, "max": 12.376762390136719, "pos_frac": 0.703125, "sample": [-0.3997344970703125, -5.692291259765625, 3.850170135498047, 4.528205871582031, 3.795412063598633, 7.1249542236328125, 4.503559112548828, 5.949462890625, -0.5462379455566406, 6.0253448486328125, 4.546701431274414, 2.561981201171875, 6.065528869628906, 3.2191314697265625, 6.600391387939453, 0.67108154296875, 2.211700439453125, 5.879302978515625, -0.2559967041015625, 1.0292034149169922, -2.2649574279785156, 3.0305023193359375, 4.067840576171875, -3.3122177124023438, 1.2167949676513672, -0.220184326171875, 4.3480987548828125, 12.376762390136719, -2.118070602416992, -0.6922149658203125, -3.7842483520507812, 0.7212066650390625, 0.16252517700195312, 3.494342803955078, 10.444419860839844, 0.6116828918457031, -0.03002166748046875, -2.5667076110839844, 2.4650955200195312, -0.2982292175292969, -1.4441871643066406, 0.6498832702636719, 2.7291507720947266, 3.775177001953125, 3.9365158081054688, -0.8452949523925781, 1.523965835571289, 4.833505630493164, 1.46484375, 6.24853515625, 0.9654026031494141, -2.0585861206054688, 0.20877838134765625, -3.0761451721191406, 4.479192733764648, 2.464435577392578, 1.267568588256836, -0.7566604614257812, -1.2296142578125, 2.8403854370117188, 0.2168426513671875, 5.228237152099609, 4.950927734375, 2.7396240234375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000137.npy"}
{"epoch": 0.20710506424792138, "step": 138, "batch_size": 64, "mean": 1.6650199890136719, "std": 3.1300408840179443, "min": -7.185371398925781, "p10": -2.233860206604004, "median": 1.7609748840332031, "p90": 5.246660232543947, "max": 10.209609985351562, "pos_frac": 0.703125, "sample": [-0.4415130615234375, 4.238166809082031, 3.1951141357421875, 5.989353179931641, 2.5746841430664062, 2.0855579376220703, 1.8449859619140625, 3.3550033569335938, -2.0841827392578125, 1.890655517578125, 1.485788345336914, -3.86334228515625, 4.865997314453125, -0.481353759765625, 10.209609985351562, -2.523773193359375, -2.2234420776367188, 4.600982666015625, 3.7880630493164062, 4.493259429931641, 1.0823974609375, 2.4310836791992188, 6.571502685546875, 3.8630905151367188, 0.20154571533203125, 3.357545852661133, 3.9157447814941406, 2.814685821533203, 0.9357376098632812, 0.8662071228027344, 5.409801483154297, -0.27294921875, 5.575695037841797, -0.3119354248046875, -2.2383251190185547, -0.001956939697265625, 0.3466949462890625, -5.2212677001953125, -0.19980430603027344, 3.1187782287597656, 1.5538711547851562, 4.517555236816406, -0.45365142822265625, 1.7040328979492188, -0.9024925231933594, -7.185371398925781, 9.020729064941406, 1.4253559112548828, 1.8236770629882812, 0.552093505859375, 0.8243789672851562, 1.9577407836914062, 3.469390869140625, -2.4948883056640625, -1.453125, -0.24183273315429688, -2.4468135833740234, 2.4300498962402344, 0.2006683349609375, 1.8179168701171875, 7.62060546875, 2.3056678771972656, 4.559623718261719, 0.7122116088867188], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000138.npy"}
{"epoch": 0.20861678004535147, "step": 139, "batch_size": 64, "mean": 2.1236162185668945, "std": 2.8272316455841064, "min": -3.527313232421875, "p10": -1.0974998474121092, "median": 1.7187728881835938, "p90": 6.119976806640627, "max": 9.480813980102539, "pos_frac": 0.78125, "sample": [2.4401168823242188, 5.502445220947266, 0.2837982177734375, 0.8209075927734375, -3.2487106323242188, 3.1944313049316406, 1.0053482055664062, 5.465869903564453, 2.882549285888672, 0.087249755859375, 0.6619415283203125, 9.480813980102539, 4.250827789306641, -0.8261699676513672, 2.476959228515625, 7.110511779785156, 8.453216552734375, 1.2060203552246094, -1.1964797973632812, 4.113384246826172, -0.5953369140625, -3.527313232421875, 0.4198150634765625, 1.7216300964355469, 2.4249801635742188, 1.2727394104003906, 2.4612579345703125, 1.4051036834716797, 5.241363525390625, 2.2507476806640625, 2.4908981323242188, 4.301551818847656, 1.58111572265625, -0.866546630859375, -0.31754302978515625, 3.1704559326171875, 3.2141265869140625, 2.359149932861328, 1.1309814453125, -0.3191070556640625, 5.694313049316406, 1.1461677551269531, 7.007266998291016, -0.33555030822753906, -1.8589153289794922, 0.1633148193359375, 1.2710189819335938, 2.3540191650390625, 6.9658966064453125, -0.5896263122558594, 7.494140625, 1.7159156799316406, 2.4611053466796875, -2.09393310546875, -1.6023483276367188, 2.4415626525878906, 0.49338531494140625, 2.5159072875976562, -2.5600528717041016, 3.863525390625, 6.2664337158203125, 5.7782440185546875, 1.7139034271240234, 1.6206512451171875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000139.npy"}
{"epoch": 0.21012849584278157, "step": 140, "batch_size": 64, "mean": 2.5468788146972656, "std": 3.97773814201355, "min": -11.907089233398438, "p10": -2.133621025085449, "median": 2.773275375366211, "p90": 7.6257164001464846, "max": 10.336397171020508, "pos_frac": 0.765625, "sample": [7.9452362060546875, 2.2342529296875, 0.034526824951171875, 0.4785308837890625, 1.8253021240234375, 3.8478927612304688, -3.5643463134765625, -2.6801223754882812, 5.2483673095703125, 4.628791809082031, -11.907089233398438, 2.3060150146484375, 4.2086639404296875, 0.5567626953125, 7.619697570800781, 3.09088134765625, -2.1160144805908203, 1.5000991821289062, -2.410858154296875, 6.564109802246094, 2.945585250854492, 6.198760986328125, 3.73052978515625, 3.7464752197265625, 0.587310791015625, 3.923980712890625, 1.3754901885986328, 10.336397171020508, 0.40765380859375, -2.0182723999023438, -1.4411468505859375, -1.6648292541503906, 4.240652084350586, -1.5046615600585938, -0.4639129638671875, -0.857574462890625, 6.255649566650391, 8.929794311523438, 7.4203948974609375, 6.4521026611328125, 3.4231414794921875, 5.845489501953125, 8.510589599609375, 6.5716705322265625, 2.4964218139648438, -2.9420013427734375, 7.6282958984375, 1.5428695678710938, 8.063560485839844, -1.1153430938720703, -4.1808013916015625, 4.297697067260742, 1.7343597412109375, 2.6432456970214844, 7.97320556640625, 1.603200912475586, 7.130653381347656, 0.0192718505859375, 4.130401611328125, 2.9033050537109375, 0.10609054565429688, 5.822269439697266, -2.1411666870117188, 2.9227294921875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000140.npy"}
{"epoch": 0.21164021164021163, "step": 141, "batch_size": 64, "mean": 2.4597620964050293, "std": 3.703202247619629, "min": -4.5916748046875, "p10": -1.6053632736206054, "median": 1.5110397338867188, "p90": 7.4067108154296895, "max": 11.240657806396484, "pos_frac": 0.734375, "sample": [-4.5916748046875, 0.77252197265625, 5.8878326416015625, 0.4414100646972656, 3.4023895263671875, 0.5948562622070312, 0.9949226379394531, 4.9017181396484375, 0.6711311340332031, -3.872997283935547, 3.2754669189453125, 2.901611328125, 5.743534088134766, -2.1139469146728516, 6.6773529052734375, 2.5901641845703125, 4.476409912109375, 0.5533714294433594, -0.12482261657714844, -2.9969024658203125, 1.8234710693359375, 0.9434700012207031, 1.0772171020507812, 3.5092811584472656, 4.639793395996094, -0.080902099609375, 1.5737075805664062, 5.857757568359375, -1.6111602783203125, -1.18865966796875, -1.3319168090820312, -2.0547714233398438, 2.2641448974609375, 3.4041976928710938, 5.468849182128906, -1.591836929321289, 5.775798797607422, 0.5358657836914062, 11.240657806396484, 7.9137725830078125, 7.0288543701171875, 5.527027130126953, 4.372287750244141, 3.7114410400390625, 7.5686492919921875, 0.40242767333984375, -0.5635585784912109, 1.4483718872070312, 9.042587280273438, 6.742095947265625, -3.1586570739746094, 0.7833175659179688, 1.1487236022949219, 10.81060791015625, 3.9068603515625, 11.131622314453125, -1.0290908813476562, 9.034196853637695, -1.1546096801757812, 0.1488189697265625, 2.343017578125, 1.4305763244628906, -1.10125732421875, -0.5026283264160156], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000141.npy"}
{"epoch": 0.21315192743764172, "step": 142, "batch_size": 64, "mean": 1.7730889320373535, "std": 2.961139678955078, "min": -4.772954940795898, "p10": -1.923664093017578, "median": 1.488987922668457, "p90": 5.73900604248047, "max": 8.625717163085938, "pos_frac": 0.75, "sample": [1.4811553955078125, -3.1695327758789062, 1.3085346221923828, 2.9234352111816406, 5.818084716796875, 8.544021606445312, 3.1188201904296875, -1.7181587219238281, 1.8381156921386719, 4.241844177246094, 1.4602584838867188, -0.22546005249023438, 4.498601913452148, -4.435798645019531, 0.6411781311035156, 3.1851882934570312, 2.539398193359375, -4.772954940795898, 1.4968204498291016, 1.2684154510498047, 2.0176849365234375, 4.8782196044921875, 2.1448497772216797, 3.1793174743652344, 2.2398757934570312, 3.1543731689453125, -0.13970947265625, 0.7501373291015625, 8.451560974121094, -2.011737823486328, 2.0923690795898438, -0.6071929931640625, 5.9392242431640625, 8.625717163085938, 2.3346633911132812, -3.414348602294922, 1.0690803527832031, -2.0944442749023438, 0.41632843017578125, 1.1274986267089844, 8.097648620605469, 5.5544891357421875, 0.8379287719726562, 1.1032752990722656, 0.46768951416015625, 4.467092514038086, -2.9176082611083984, 0.8084640502929688, -0.8124847412109375, 3.1083145141601562, -1.2041587829589844, -1.2197189331054688, 0.5733795166015625, 2.8830413818359375, 1.823211669921875, 1.0088653564453125, 4.757225036621094, 6.030139923095703, 2.5253143310546875, -0.7333087921142578, 0.9412078857421875, -0.5814189910888672, 4.199306488037109, 1.5643653869628906], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000142.npy"}
{"epoch": 0.2146636432350718, "step": 143, "batch_size": 64, "mean": 1.7093937397003174, "std": 3.5505189895629883, "min": -9.22711181640625, "p10": -2.199939727783203, "median": 1.8567352294921875, "p90": 6.234836006164551, "max": 9.413169860839844, "pos_frac": 0.71875, "sample": [1.26153564453125, 3.927825927734375, 3.0136871337890625, 2.8391342163085938, 1.8773956298828125, -1.1541900634765625, 2.0667953491210938, -0.2266845703125, 6.944726943969727, -3.788402557373047, 4.765924453735352, 1.553741455078125, 7.4817962646484375, 1.8360748291015625, 2.7504501342773438, 2.835927963256836, -6.897373199462891, 3.0250701904296875, 7.308021545410156, 0.5017757415771484, 6.08135986328125, -2.047760009765625, 9.413169860839844, 5.260894775390625, 3.5284271240234375, 1.5307044982910156, 2.8181610107421875, -5.3204345703125, 0.8520259857177734, 0.8729133605957031, 2.21307373046875, 1.1118965148925781, -2.6466827392578125, 4.435935974121094, 7.616722106933594, -2.2651596069335938, -0.6432418823242188, -0.08116531372070312, -0.07164764404296875, 2.144775390625, 0.8945198059082031, -9.22711181640625, 5.1250152587890625, 1.2357254028320312, -0.42011260986328125, 2.591400146484375, 2.2342376708984375, 4.8680419921875, 0.548095703125, -0.2131805419921875, -1.35675048828125, 2.4901695251464844, 4.242958068847656, 8.781623840332031, -5.979179382324219, 6.30061149597168, 2.3999862670898438, 0.35400390625, 4.170219421386719, 3.401926040649414, -0.7945327758789062, -0.7864799499511719, 1.3671989440917969, 0.4456157684326172], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000143.npy"}
{"epoch": 0.2161753590325019, "step": 144, "batch_size": 64, "mean": 2.036937713623047, "std": 3.42403244972229, "min": -8.349609375, "p10": -2.0002468109130858, "median": 1.8706092834472656, "p90": 6.267353820800783, "max": 10.448028564453125, "pos_frac": 0.734375, "sample": [7.669425964355469, 3.8223609924316406, -1.5252838134765625, 0.2420482635498047, -1.5512638092041016, 1.2843437194824219, -2.184446334838867, 3.5430221557617188, 0.8917427062988281, 0.22487258911132812, -1.2244186401367188, 0.9331932067871094, 3.054859161376953, 2.898723602294922, 5.266792297363281, 1.7578353881835938, 5.9147796630859375, 0.2216339111328125, -1.232574462890625, 2.8287277221679688, 3.810699462890625, 3.604145050048828, 2.2679595947265625, -0.5715484619140625, 0.138214111328125, 1.6670455932617188, 4.4476776123046875, 5.009708404541016, -1.6609039306640625, -0.2671356201171875, 3.8657684326171875, 1.3976898193359375, -2.8883705139160156, -0.5680007934570312, 4.4250335693359375, 0.2328643798828125, -1.3817768096923828, 5.624567031860352, 6.41845703125, 7.6910858154296875, 3.795745849609375, 2.200531005859375, -2.3839187622070312, 3.8820343017578125, -2.2135658264160156, 2.276580810546875, 3.1163864135742188, -2.145679473876953, 5.798576354980469, 0.5614700317382812, 0.3948478698730469, -0.0959320068359375, 1.9833831787109375, 8.324621200561523, -3.47607421875, 7.863344192504883, 10.448028564453125, 9.692642211914062, 0.8087120056152344, -8.349609375, 2.3795928955078125, 1.2191238403320312, 4.098136901855469, 4.085487365722656], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000144.npy"}
{"epoch": 0.21768707482993196, "step": 145, "batch_size": 64, "mean": 2.1144752502441406, "std": 3.4435296058654785, "min": -7.934288024902344, "p10": -1.5642127990722656, "median": 2.2648468017578125, "p90": 6.380429077148437, "max": 10.944656372070312, "pos_frac": 0.75, "sample": [2.3805694580078125, 3.227670669555664, -1.802927017211914, 0.50897216796875, 0.022251129150390625, -7.934288024902344, 3.8463592529296875, -0.154632568359375, 10.884586334228516, 3.16778564453125, 2.4583702087402344, 3.730731964111328, 0.8338260650634766, 3.9059295654296875, 0.2581672668457031, 2.41949462890625, -0.8127899169921875, 5.627677917480469, -1.544586181640625, 3.0974273681640625, 0.37166595458984375, 0.9957847595214844, -1.5726242065429688, 3.52435302734375, 7.8796234130859375, 0.32332611083984375, -0.7187576293945312, 4.1043701171875, 6.394775390625, 0.037570953369140625, 10.944656372070312, 4.314796447753906, 2.1491241455078125, 8.405441284179688, 2.7579345703125, 6.930152893066406, 2.7064075469970703, 5.177913665771484, 1.6797256469726562, 5.072509765625, -4.015403747558594, 2.986492156982422, 7.217292785644531, 4.488929748535156, 0.2037372589111328, -1.601226806640625, 1.2778949737548828, 1.99737548828125, -1.1215591430664062, 3.0216217041015625, 3.7022132873535156, 3.0819053649902344, 4.977777481079102, -0.4020881652832031, 0.92669677734375, 6.346954345703125, 1.5771484375, -1.0903205871582031, -4.4391937255859375, 1.0044937133789062, 4.7738800048828125, -3.1585159301757812, -1.296844482421875, -0.7322006225585938], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000145.npy"}
{"epoch": 0.21919879062736206, "step": 146, "batch_size": 64, "mean": 2.123279333114624, "std": 2.987227201461792, "min": -3.1505889892578125, "p10": -1.9493656158447266, "median": 2.2250471115112305, "p90": 5.565428352355958, "max": 11.943653106689453, "pos_frac": 0.75, "sample": [3.7500457763671875, 4.428478240966797, -1.3936996459960938, 2.2291107177734375, 0.26678466796875, -1.95989990234375, -1.40460205078125, 3.6945114135742188, 3.8740501403808594, 2.6489181518554688, -2.5110702514648438, -1.8168411254882812, 4.5838775634765625, 3.4320316314697266, 1.4865570068359375, -2.6716156005859375, 4.108436584472656, -0.05118560791015625, 4.7051849365234375, -3.1505889892578125, 3.221099853515625, 7.5161590576171875, 2.9803009033203125, 0.6347026824951172, -3.03863525390625, -0.0487213134765625, 1.5305747985839844, 5.9850006103515625, -1.9247856140136719, 2.1483230590820312, 1.8690719604492188, -2.9138450622558594, 8.404037475585938, 2.2922821044921875, 0.37546730041503906, -0.365966796875, 1.27520751953125, 0.99176025390625, 3.14031982421875, -0.21198463439941406, 5.818210601806641, 2.3131656646728516, 1.2671833038330078, 4.0558319091796875, 5.480888366699219, 6.907707214355469, 4.459785461425781, 2.2209835052490234, 5.004493713378906, 1.5084648132324219, 3.1446971893310547, 1.498199462890625, 5.187896728515625, 2.2595748901367188, -2.110147476196289, -1.168548583984375, 11.943653106689453, 0.2023773193359375, 1.2756404876708984, 1.971883773803711, 3.7554702758789062, 2.7067222595214844, 2.4752349853515625, 5.601659774780273], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000146.npy"}
{"epoch": 0.22071050642479215, "step": 147, "batch_size": 64, "mean": 1.6847352981567383, "std": 3.152956247329712, "min": -8.9166259765625, "p10": -1.3638282775878905, "median": 1.6340560913085938, "p90": 5.529904174804687, "max": 8.689949035644531, "pos_frac": 0.71875, "sample": [2.122722625732422, 1.9083099365234375, 3.8671913146972656, -0.3233833312988281, 2.5429458618164062, 3.5597610473632812, -2.5885696411132812, 1.4228973388671875, -0.9591522216796875, -0.44281768798828125, 3.68707275390625, 2.403911590576172, 2.583770751953125, 4.531585693359375, -0.634674072265625, 5.54827880859375, 7.348690032958984, -8.9166259765625, 4.47093391418457, -0.4984588623046875, 1.5527896881103516, 0.8727645874023438, 0.18146324157714844, 1.893707275390625, 0.2698516845703125, 7.823080062866211, -1.2343673706054688, 4.392608642578125, 2.2450218200683594, 1.6266975402832031, 0.6213359832763672, 2.0114212036132812, 0.622100830078125, -2.8343429565429688, -1.4193115234375, 3.0400161743164062, 4.200353622436523, 8.006301879882812, 6.6130523681640625, -0.9876422882080078, -0.602386474609375, 0.364532470703125, 1.0635833740234375, 3.6377620697021484, 3.9230308532714844, -1.8643417358398438, 8.689949035644531, -0.2245330810546875, 4.698249816894531, 0.7388877868652344, 1.6414146423339844, 2.5420074462890625, 1.662994384765625, -0.6309108734130859, -0.7645416259765625, 1.0881576538085938, 1.5551471710205078, 2.0282115936279297, 5.487030029296875, -4.05181884765625, -4.499813079833984, 2.2750396728515625, 7.910062789916992, 0.024051666259765625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000147.npy"}
{"epoch": 0.2222222222222222, "step": 148, "batch_size": 64, "mean": 2.0376498699188232, "std": 3.31152606010437, "min": -5.280027389526367, "p10": -2.063665771484375, "median": 1.9827728271484375, "p90": 6.204834938049319, "max": 9.219741821289062, "pos_frac": 0.6875, "sample": [7.2080535888671875, -4.042034149169922, 2.6456432342529297, -2.743907928466797, 4.910911560058594, 9.134452819824219, 3.194732666015625, -5.280027389526367, 4.657684326171875, 5.5794219970703125, -0.7200336456298828, 5.486259460449219, 1.3146095275878906, 1.4961738586425781, 4.641120910644531, -0.52191162109375, -1.6264190673828125, 5.5657196044921875, 2.5504150390625, -0.8596649169921875, 3.238819122314453, -2.3809967041015625, 6.462318420410156, 0.7101001739501953, 5.604040145874023, 9.219741821289062, 3.647411346435547, -0.1683807373046875, 8.00335693359375, 2.421001434326172, 2.0322265625, -2.0999298095703125, 5.544809341430664, -1.9790496826171875, 2.4726943969726562, 4.015411376953125, 0.4619579315185547, 5.359834671020508, -4.961860656738281, -0.07623672485351562, 1.0894546508789062, -0.3265800476074219, 6.53179931640625, 8.884727478027344, -3.0580177307128906, 2.993185043334961, 2.6618499755859375, 1.6996421813964844, 2.8375625610351562, 1.7950897216796875, 3.2943058013916016, -0.08078765869140625, 0.9556503295898438, 0.17800140380859375, -0.08941841125488281, 4.506690979003906, 1.933319091796875, 2.1630783081054688, 1.512441635131836, -0.4217529296875, 0.3520030975341797, 2.30255126953125, -0.8369903564453125, -0.5866851806640625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000148.npy"}
{"epoch": 0.2237339380196523, "step": 149, "batch_size": 64, "mean": 2.492323875427246, "std": 3.213804244995117, "min": -6.335216522216797, "p10": -1.0937065124511716, "median": 2.0282649993896484, "p90": 6.640311050415039, "max": 10.460458755493164, "pos_frac": 0.765625, "sample": [3.1136093139648438, 4.8896331787109375, 6.659206390380859, 0.7258644104003906, 2.498138427734375, -0.33669281005859375, -0.3365974426269531, 3.6540489196777344, -0.30942344665527344, 4.851240158081055, -2.9594879150390625, 1.96356201171875, 0.6873588562011719, 2.1400508880615234, 4.682708740234375, 0.577178955078125, 1.7588768005371094, 0.8376846313476562, 0.7084159851074219, 1.4095916748046875, 1.1971511840820312, 8.600822448730469, 5.66444206237793, -1.2058906555175781, 10.460458755493164, 2.5326595306396484, 1.2512874603271484, -0.8319435119628906, 2.5394821166992188, -6.335216522216797, -0.7765007019042969, 4.895744323730469, -1.8142318725585938, 9.795183181762695, -0.05791473388671875, 7.64240837097168, 5.676904678344727, -0.36609649658203125, 6.596221923828125, -1.5837974548339844, -1.5660858154296875, 2.5892791748046875, 0.45066261291503906, 3.7448272705078125, 8.678573608398438, 5.913373947143555, 3.1095657348632812, 4.398723602294922, 1.4028587341308594, 1.4059906005859375, 2.73516845703125, 1.5164718627929688, 2.065704345703125, -1.7762451171875, -0.08160972595214844, 5.3360595703125, 2.77130126953125, 8.378021240234375, 4.7269439697265625, 1.9908256530761719, 1.6820106506347656, 4.934309005737305, 0.4994964599609375, 3.5063552856445312], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000149.npy"}
{"epoch": 0.2252456538170824, "step": 150, "batch_size": 64, "mean": 2.887101411819458, "std": 2.988075017929077, "min": -3.1267852783203125, "p10": -0.9248798370361325, "median": 2.952495574951172, "p90": 6.424122619628906, "max": 10.495010375976562, "pos_frac": 0.875, "sample": [3.283000946044922, 0.2722606658935547, 7.555084228515625, 1.4369964599609375, -1.7907791137695312, 0.08939361572265625, 1.7027931213378906, 5.690361022949219, 2.7744598388671875, 5.573160171508789, 0.1472320556640625, 3.4106597900390625, 1.97662353515625, 4.46527099609375, 0.07655715942382812, 2.482147216796875, 2.9033355712890625, 9.136749267578125, 2.9413604736328125, 4.3911590576171875, 3.85699462890625, 0.663116455078125, 3.5816116333007812, -3.0598907470703125, 0.5807037353515625, 5.274568557739258, 1.2039813995361328, 4.545993804931641, 0.5469970703125, 7.376106262207031, 3.1718063354492188, 3.5653839111328125, 6.098930358886719, 0.7083511352539062, 9.42340087890625, -3.1267852783203125, 1.7696285247802734, 3.1894702911376953, 3.1476211547851562, 7.689605712890625, 3.2451629638671875, 6.404052734375, -1.7891082763671875, 1.7328643798828125, 6.4327239990234375, 2.8728790283203125, 4.769216537475586, 5.412054061889648, -1.8528785705566406, 10.495010375976562, 0.2928314208984375, 0.30561065673828125, 3.4426651000976562, -0.6490898132324219, 1.3714752197265625, 5.352691650390625, -1.2386856079101562, -1.0430755615234375, 6.3897247314453125, 0.48708343505859375, 1.6551895141601562, 3.9539718627929688, 2.9636306762695312, 5.043060302734375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000150.npy"}
{"epoch": 0.22675736961451248, "step": 151, "batch_size": 64, "mean": 2.9433765411376953, "std": 3.2631454467773438, "min": -3.7646446228027344, "p10": -1.0661184310913083, "median": 2.923858642578125, "p90": 6.820898056030275, "max": 13.42901611328125, "pos_frac": 0.796875, "sample": [4.243968963623047, 2.826448440551758, 5.30999755859375, -1.27874755859375, 0.5333919525146484, 0.6390972137451172, 13.42901611328125, 0.6558418273925781, 7.3818511962890625, 0.6911144256591797, 2.2587738037109375, 4.557403564453125, 2.3405609130859375, -0.1904144287109375, 3.2787857055664062, 3.6068363189697266, 5.7295684814453125, -0.4091663360595703, 3.1074142456054688, 0.41759681701660156, -1.720458984375, -1.2091426849365234, 3.9617843627929688, -1.2700080871582031, 10.163909912109375, -1.32147216796875, 4.15351676940918, 3.109905242919922, -0.7323951721191406, -0.5325393676757812, 3.2834701538085938, -2.0670166015625, 6.2017974853515625, 2.8740234375, 1.441741943359375, -0.7169151306152344, 3.6446609497070312, 3.31378173828125, 1.2830352783203125, 9.331939697265625, 0.6491737365722656, 3.1648941040039062, 6.974094390869141, 1.7167854309082031, 8.194316864013672, 6.327606201171875, 5.772113800048828, 5.4197845458984375, 6.46343994140625, 2.3545455932617188, 2.761280059814453, 2.97369384765625, 1.855539321899414, 8.665435791015625, -3.7646446228027344, 5.031515121459961, 4.390323638916016, 2.5027008056640625, -0.4657783508300781, 0.585968017578125, 3.1269760131835938, 6.219646453857422, 0.07990264892578125, 5.053825378417969], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000151.npy"}
{"epoch": 0.22826908541194255, "step": 152, "batch_size": 64, "mean": 2.2182211875915527, "std": 3.4405314922332764, "min": -4.565422058105469, "p10": -2.073738098144531, "median": 1.6996831893920898, "p90": 7.57829818725586, "max": 9.855018615722656, "pos_frac": 0.734375, "sample": [-1.2055511474609375, 0.67474365234375, 4.559015274047852, 1.1667861938476562, -4.329704284667969, -0.5223388671875, 2.741668701171875, 4.024078369140625, 1.8445281982421875, -2.2276153564453125, 1.810720443725586, -1.714691162109375, 2.6316871643066406, 0.8446426391601562, 5.552574157714844, 7.388328552246094, 0.6529407501220703, 1.4284248352050781, 0.9600162506103516, -0.6336517333984375, -4.003684997558594, 7.1437225341796875, 3.4761962890625, 3.9986190795898438, 0.7589569091796875, 1.0981101989746094, 4.040870666503906, 2.566802978515625, 0.8574390411376953, 1.482574462890625, -0.03888511657714844, 9.325677871704102, 0.9270992279052734, 1.5886459350585938, -0.31803131103515625, -0.595733642578125, 1.9569091796875, 0.3759307861328125, 4.252906799316406, 2.920013427734375, -3.205474853515625, 7.690824508666992, 4.667415618896484, -2.3703231811523438, 5.938972473144531, 7.797172546386719, 5.06878662109375, -0.5760574340820312, 2.7959442138671875, 2.53173828125, -4.565422058105469, 7.7175750732421875, 0.06900215148925781, 8.83083724975586, 5.246709823608398, 2.5306015014648438, 7.6597137451171875, 6.500755310058594, -1.554351806640625, 9.855018615722656, -0.5510368347167969, -2.3894309997558594, 0.5957260131835938, 4.2207183837890625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000152.npy"}
{"epoch": 0.22978080120937264, "step": 153, "batch_size": 64, "mean": 2.3372559547424316, "std": 3.0446579456329346, "min": -3.9931640625, "p10": -1.6810241699218749, "median": 2.2150802612304688, "p90": 6.466363525390627, "max": 9.309059143066406, "pos_frac": 0.8125, "sample": [3.250699996948242, 1.7212677001953125, 3.09765625, 7.1383056640625, 4.772958755493164, 6.667289733886719, 4.585136413574219, 1.5857048034667969, 1.63433837890625, 2.4143142700195312, 1.7270374298095703, 7.051042556762695, 1.0445938110351562, 5.7921600341796875, 1.9179534912109375, -2.8212051391601562, 4.557594299316406, -0.5734100341796875, 1.1170120239257812, 7.291206359863281, 9.309059143066406, 5.926006317138672, -1.6939468383789062, 2.2833099365234375, 9.290691375732422, -1.8896923065185547, 7.39788818359375, 0.5178604125976562, 0.9491462707519531, 5.997535705566406, -3.9931640625, 0.9733829498291016, -3.4826297760009766, 1.0681838989257812, 2.7003250122070312, 0.84796142578125, 0.20867156982421875, 5.768867492675781, 2.54364013671875, 2.2695083618164062, 2.793670654296875, -0.6952972412109375, 5.006072998046875, 3.686878204345703, 2.2650375366210938, 0.028415679931640625, 3.281829833984375, 2.1937103271484375, 1.581512451171875, -0.43354225158691406, 0.632659912109375, 3.5666770935058594, 0.6588859558105469, 0.8855209350585938, 4.922332763671875, -1.5872421264648438, -1.73138427734375, 5.364133834838867, -3.219085693359375, 4.95458984375, 3.8042221069335938, 0.07495307922363281, -1.6508712768554688, 2.2364501953125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000153.npy"}
{"epoch": 0.23129251700680273, "step": 154, "batch_size": 64, "mean": 1.6750484704971313, "std": 3.3519387245178223, "min": -6.913599014282227, "p10": -2.6367317199707028, "median": 2.1162309646606445, "p90": 6.042160415649415, "max": 9.64889907836914, "pos_frac": 0.703125, "sample": [-3.2452468872070312, 3.339569091796875, -6.1960906982421875, -6.913599014282227, 6.2929840087890625, 3.3122310638427734, -3.127798080444336, 1.4767227172851562, 1.9332122802734375, -2.3709259033203125, 1.8565406799316406, -0.5454616546630859, 3.8060379028320312, -1.3955459594726562, -1.8805694580078125, -2.7506484985351562, -1.2117424011230469, 2.3176040649414062, 0.5511684417724609, 1.4782028198242188, 5.6653594970703125, 8.388994216918945, 6.914987564086914, 1.7186431884765625, -3.6193771362304688, 2.642850875854492, 6.61566162109375, 0.6070175170898438, 7.697296142578125, -5.4261932373046875, 3.1419849395751953, 2.580526351928711, 6.147392272949219, 2.970165252685547, 2.2633590698242188, 0.2512702941894531, 1.914306640625, 2.921487808227539, 3.5627803802490234, 2.1608524322509766, -1.1506538391113281, -0.6237945556640625, 3.312297821044922, -0.6320972442626953, 1.394927978515625, 9.64889907836914, 2.7093048095703125, 3.6265335083007812, 2.5615463256835938, -1.7515106201171875, 0.595611572265625, -0.39250946044921875, 2.0716094970703125, -1.6386947631835938, 3.746337890625, 2.6217384338378906, 5.796619415283203, -0.13861656188964844, 4.975664138793945, 3.1303939819335938, 4.569786071777344, 2.167154312133789, 1.2595329284667969, 3.4970149993896484], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000154.npy"}
{"epoch": 0.2328042328042328, "step": 155, "batch_size": 64, "mean": 2.155862808227539, "std": 3.769047737121582, "min": -6.0039825439453125, "p10": -3.141200256347656, "median": 1.8103675842285156, "p90": 7.825474166870118, "max": 11.295907974243164, "pos_frac": 0.71875, "sample": [1.4490585327148438, 1.0677032470703125, 8.150665283203125, 1.8415069580078125, 0.1535491943359375, 0.9766921997070312, -0.33000946044921875, 4.7890472412109375, -1.0235519409179688, 8.061809539794922, 4.3684844970703125, 5.175559997558594, -1.003692626953125, -6.0039825439453125, 0.2992820739746094, -2.0899276733398438, 4.012237548828125, 8.502212524414062, 3.402557373046875, 7.512081146240234, 8.02508544921875, 0.38752174377441406, 4.634788513183594, -3.391185760498047, 3.8231201171875, 8.7010498046875, 1.210662841796875, 3.6272735595703125, -1.2769508361816406, 5.365325927734375, -4.567138671875, 1.729379653930664, 11.295907974243164, 4.492401123046875, 6.503974914550781, 5.2693023681640625, 0.11678695678710938, -2.23968505859375, -4.5624847412109375, 3.3370285034179688, -3.1608428955078125, 0.7345924377441406, 4.695892333984375, 0.8334922790527344, 5.446434020996094, 3.2666702270507812, 1.8377532958984375, 5.622962951660156, -3.095367431640625, 2.0340309143066406, 3.6297454833984375, 3.676849365234375, -0.06924819946289062, 0.480712890625, 1.6075820922851562, -0.00151824951171875, 3.138385772705078, -3.181222915649414, -4.939594268798828, -0.8907241821289062, 1.7829818725585938, 7.959785461425781, 4.95673942565918, -0.18431854248046875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000155.npy"}
{"epoch": 0.23431594860166288, "step": 156, "batch_size": 64, "mean": 2.54197359085083, "std": 3.1154823303222656, "min": -3.554698944091797, "p10": -0.898708152770996, "median": 2.2249603271484375, "p90": 6.469859313964844, "max": 11.869911193847656, "pos_frac": 0.765625, "sample": [2.58453369140625, -0.5375690460205078, -1.130706787109375, -3.554698944091797, 5.573699951171875, 6.3823699951171875, 11.869911193847656, 2.2059707641601562, 2.9805450439453125, 6.640781402587891, 2.2464981079101562, 5.420146942138672, -0.8530235290527344, 3.086395263671875, 6.507354736328125, 4.4844207763671875, 5.7587127685546875, 3.897674560546875, -0.9182872772216797, 3.502227783203125, 1.208749771118164, 0.9138832092285156, 2.786041259765625, 3.3408050537109375, -0.6784591674804688, 0.7952194213867188, -0.29598045349121094, 4.339977264404297, 4.5632171630859375, 4.996299743652344, 2.2189407348632812, 1.86029052734375, -1.8142547607421875, 4.088142395019531, 1.7963104248046875, 11.003795623779297, -3.2498626708984375, 2.3055343627929688, -2.187774658203125, 7.0069732666015625, 4.022392272949219, 9.67333984375, 1.0631484985351562, 1.6612396240234375, -0.11525154113769531, 0.1829681396484375, -0.0100860595703125, 0.4435272216796875, 3.0062637329101562, 2.021270751953125, 1.364206314086914, -0.437835693359375, 2.9815006256103516, 1.5434188842773438, 4.454925537109375, 1.994049072265625, -0.5103797912597656, 2.2309799194335938, 7.31951904296875, 0.6013603210449219, 2.7164764404296875, 5.49530029296875, 1.3882484436035156, -1.5490646362304688], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000156.npy"}
{"epoch": 0.23582766439909297, "step": 157, "batch_size": 64, "mean": 2.0342559814453125, "std": 3.15667724609375, "min": -6.240074157714844, "p10": -1.6504177093505854, "median": 1.6023807525634766, "p90": 6.1026252746582035, "max": 11.890838623046875, "pos_frac": 0.75, "sample": [4.849857330322266, 7.79969596862793, -0.39875030517578125, 3.351856231689453, 4.515312194824219, -3.6767196655273438, 0.801788330078125, -1.910247802734375, 6.077262878417969, -0.7388648986816406, 0.5584945678710938, 2.4585189819335938, 7.729644775390625, 1.0399360656738281, -6.240074157714844, 0.12519073486328125, 1.3067512512207031, 1.3856124877929688, 4.899515151977539, 6.757928848266602, 3.39202880859375, 3.834369659423828, 11.890838623046875, 2.8651390075683594, 5.481542587280273, 2.10595703125, -1.0441474914550781, -0.38088035583496094, 2.0026016235351562, -0.9841423034667969, 4.630146026611328, 0.44347381591796875, 3.1042747497558594, 1.0305099487304688, 2.9828643798828125, 6.4298095703125, -2.148956298828125, 6.821954727172852, 4.617950439453125, 0.4137306213378906, -0.7338790893554688, 1.596282958984375, -0.33211517333984375, 4.924190521240234, 6.113494873046875, 1.086721420288086, -0.6618728637695312, 2.3950119018554688, 1.6084785461425781, 0.2210102081298828, 3.5017471313476562, -2.1421737670898438, 1.1086692810058594, 1.1015815734863281, 3.0485401153564453, 4.974126815795898, 0.8500289916992188, 2.4764137268066406, -3.0511093139648438, 3.7134552001953125, -0.31043243408203125, 0.68328857421875, -3.3444061279296875, 3.1835479736328125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000157.npy"}
{"epoch": 0.23733938019652306, "step": 158, "batch_size": 64, "mean": 1.9084839820861816, "std": 2.9777884483337402, "min": -4.402519226074219, "p10": -1.5881752014160155, "median": 1.9932518005371094, "p90": 5.797108840942384, "max": 10.144134521484375, "pos_frac": 0.6875, "sample": [2.330718994140625, 3.2660369873046875, 2.788818359375, 1.8673763275146484, -4.402519226074219, 3.665830612182617, 6.602020263671875, 2.9425811767578125, -1.26812744140625, -0.14611053466796875, 6.081241607666016, 2.4863357543945312, 2.8736419677734375, 3.1652488708496094, -4.003807067871094, 3.5504379272460938, 5.4962158203125, 1.3214664459228516, 1.3291282653808594, 4.303462982177734, -1.4252815246582031, 0.6802520751953125, 4.182003021240234, 0.3258514404296875, 5.257997512817383, 1.5877227783203125, -0.5755157470703125, 8.156082153320312, 2.636932373046875, 2.9368057250976562, -1.499847412109375, 3.9487533569335938, 4.576606750488281, 2.043670654296875, 5.592220306396484, 4.228376388549805, 3.640949249267578, -1.2136688232421875, 1.9428329467773438, -2.7443695068359375, 6.0514373779296875, 0.8376655578613281, 2.8568801879882812, 5.884918212890625, 3.1877212524414062, 2.4977951049804688, -0.24028587341308594, -0.06108856201171875, 10.144134521484375, -1.1523056030273438, 1.3005027770996094, -0.545501708984375, -1.6260299682617188, 0.9874420166015625, -0.06710624694824219, -0.4106788635253906, -1.6721134185791016, 7.036472320556641, 3.0764312744140625, 1.4769535064697266, -2.9954261779785156, 0.8863029479980469, -0.22843360900878906, -3.611083984375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000158.npy"}
{"epoch": 0.23885109599395313, "step": 159, "batch_size": 64, "mean": 2.244302749633789, "std": 3.421088933944702, "min": -10.519073486328125, "p10": -1.1790046691894531, "median": 1.8967971801757812, "p90": 6.7843526840209964, "max": 10.393356323242188, "pos_frac": 0.78125, "sample": [2.734508514404297, 3.8595638275146484, 0.8596572875976562, 5.350105285644531, 3.6024703979492188, -1.2120132446289062, -2.96954345703125, 7.016801834106445, -0.39166259765625, 10.393356323242188, 8.03900146484375, 1.0398025512695312, 6.790842056274414, 2.5044021606445312, -0.5496711730957031, 1.5519599914550781, 2.51409912109375, 9.30584716796875, 2.777721405029297, 0.2398090362548828, 0.49005126953125, -3.7715835571289062, 1.458770751953125, 2.8683433532714844, 1.8278656005859375, 1.9673137664794922, 1.577199935913086, -2.0277786254882812, 6.7692108154296875, 9.614139556884766, -1.2067794799804688, 2.7628631591796875, 1.707366943359375, -0.11872100830078125, 1.3193130493164062, 2.6683406829833984, 6.6549224853515625, 2.9141845703125, 0.654815673828125, 0.8603363037109375, 1.535287857055664, -0.9683380126953125, 6.3858642578125, 0.7559337615966797, 2.5641021728515625, 5.869604110717773, -1.3949260711669922, 3.0309696197509766, -1.11419677734375, 2.0591354370117188, -0.3155479431152344, 4.938028335571289, 1.9559860229492188, 0.8671035766601562, 0.19541549682617188, -10.519073486328125, 4.555107116699219, 3.1022491455078125, -0.50054931640625, 1.699981689453125, 2.99456787109375, 1.8376083374023438, 8.061103820800781, 3.5927200317382812], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000159.npy"}
{"epoch": 0.24036281179138322, "step": 160, "batch_size": 64, "mean": 2.0097408294677734, "std": 3.461805582046509, "min": -5.374488830566406, "p10": -1.7700008392333983, "median": 1.604085922241211, "p90": 6.7544715881347654, "max": 12.19384765625, "pos_frac": 0.71875, "sample": [0.5965557098388672, -0.2136993408203125, 3.0703277587890625, 8.528488159179688, 1.3797245025634766, 4.158668518066406, 0.4054107666015625, -0.40081024169921875, 2.7793445587158203, 4.438972473144531, 10.258193969726562, -0.9661331176757812, 6.7686004638671875, 3.6494903564453125, 3.6230907440185547, 6.721504211425781, -0.1691265106201172, -3.7333984375, 3.670551300048828, 4.060016632080078, -0.5655136108398438, -2.0191192626953125, 0.26625823974609375, -1.8060073852539062, 1.9093647003173828, 2.6568527221679688, 3.1945114135742188, 0.4612865447998047, 3.2291221618652344, 0.551605224609375, 1.8284473419189453, 2.2933425903320312, 12.19384765625, 9.41678237915039, 0.8240375518798828, -1.6652240753173828, 4.690406799316406, 1.2512245178222656, 0.0843963623046875, 5.4368743896484375, 0.6193351745605469, 3.072612762451172, -0.041301727294921875, -1.4147605895996094, 0.06100654602050781, 3.1646499633789062, -4.324974060058594, -1.6859855651855469, 3.2158355712890625, 2.7264633178710938, -3.4504013061523438, -1.0790519714355469, -2.195343017578125, 4.326091766357422, 5.0609893798828125, 0.30878448486328125, 0.275848388671875, 6.9007720947265625, -5.374488830566406, 3.279033660888672, 4.339714050292969, 7.4465789794921875, 0.995208740234375, -0.4614715576171875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000160.npy"}
{"epoch": 0.2418745275888133, "step": 161, "batch_size": 64, "mean": 2.1461808681488037, "std": 3.515488862991333, "min": -6.98956298828125, "p10": -1.4248310089111327, "median": 1.5193824768066406, "p90": 6.908300399780273, "max": 11.434333801269531, "pos_frac": 0.75, "sample": [0.6705474853515625, -5.543155670166016, 1.2818832397460938, 2.0230560302734375, 0.905059814453125, -0.9608993530273438, 3.621368408203125, 8.948867797851562, -2.0281848907470703, -1.3677968978881836, -1.9077529907226562, 3.3857574462890625, 2.2514266967773438, 4.3296966552734375, 6.0694732666015625, 2.1971778869628906, 2.7367782592773438, -6.98956298828125, 2.769073486328125, -0.6381607055664062, -1.3982429504394531, 1.018035888671875, 1.1351165771484375, 0.6284408569335938, 2.4692840576171875, -2.2713661193847656, 0.13291549682617188, 4.128076553344727, 3.5227813720703125, -0.993072509765625, 6.38677978515625, 6.674472808837891, 5.735126495361328, 8.623077392578125, 7.050693511962891, -1.1352882385253906, -1.682088851928711, 10.845779418945312, 5.4751739501953125, 4.019481658935547, 1.3818130493164062, 7.515846252441406, 0.010011672973632812, 1.1878814697265625, 1.1935806274414062, 0.009037017822265625, 1.2428550720214844, 2.7971038818359375, 0.7596282958984375, 2.4908828735351562, -1.0952911376953125, 1.9481353759765625, 6.928585052490234, 0.93133544921875, -0.2080078125, -0.0296173095703125, 1.656951904296875, 1.7346630096435547, 6.860969543457031, 2.6148834228515625, 11.434333801269531, -1.4362258911132812, 0.5928821563720703, 4.713504791259766], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000161.npy"}
{"epoch": 0.24338624338624337, "step": 162, "batch_size": 64, "mean": 2.8900184631347656, "std": 3.1929850578308105, "min": -4.557060241699219, "p10": -1.2217758178710936, "median": 2.661810874938965, "p90": 6.777505111694336, "max": 11.008056640625, "pos_frac": 0.8125, "sample": [5.5240936279296875, 7.932716369628906, 0.10205078125, 2.173625946044922, 0.5940475463867188, 4.8945770263671875, 2.587675094604492, -0.07696533203125, 6.3182373046875, 5.793466567993164, 2.8789291381835938, 4.095827102661133, 3.191049575805664, 11.008056640625, 3.7957229614257812, 2.248228073120117, 3.538330078125, 9.072132110595703, 2.29608154296875, 8.723628997802734, 4.500740051269531, 4.103122711181641, 1.1291160583496094, -1.9015350341796875, 2.7359466552734375, -2.4744491577148438, 4.87786865234375, 1.4213409423828125, -1.25390625, 5.58880615234375, 0.6621437072753906, 3.447540283203125, -0.6676139831542969, 6.078395843505859, 7.4051666259765625, 2.4246063232421875, 6.629266738891602, 4.042325973510742, 0.04963874816894531, -0.9761238098144531, -2.8542938232421875, -1.1468048095703125, 6.213264465332031, 6.307125091552734, 3.6604557037353516, 1.8960647583007812, -1.6729583740234375, 2.4985122680664062, -4.557060241699219, -0.368316650390625, 1.7603797912597656, 0.2630615234375, 3.1950759887695312, 3.5487194061279297, 8.20779800415039, 5.051689147949219, 0.4973869323730469, -1.4451522827148438, 1.8398895263671875, 1.3459434509277344, 1.4902820587158203, 6.841035842895508, 5.547201156616211, 2.3279800415039062], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000162.npy"}
{"epoch": 0.24489795918367346, "step": 163, "batch_size": 64, "mean": 2.7819252014160156, "std": 3.606288433074951, "min": -5.0781707763671875, "p10": -1.7911788940429687, "median": 2.4678726196289062, "p90": 7.062766265869142, "max": 12.106277465820312, "pos_frac": 0.78125, "sample": [-0.6441459655761719, 5.194953918457031, -5.0781707763671875, 6.8080902099609375, 3.4027175903320312, 0.48606109619140625, -3.8874073028564453, 2.1082763671875, 8.55279541015625, -2.3220062255859375, 3.4818191528320312, 5.302803039550781, 9.202980041503906, 1.7755126953125, 1.9052047729492188, 4.132843017578125, 3.4568824768066406, -0.12622833251953125, 9.210784912109375, 2.651123046875, 0.45702362060546875, 4.199089050292969, 4.860116958618164, -1.6282119750976562, -2.084749221801758, 7.171913146972656, 2.367298126220703, 6.0680999755859375, 2.554483413696289, 2.5068893432617188, 4.371572494506836, 2.2021427154541016, 4.153617858886719, 0.04305267333984375, 5.8871002197265625, 5.3339080810546875, -3.2966079711914062, 6.724372863769531, -1.7884445190429688, 0.9488983154296875, 3.539703369140625, 12.106277465820312, 2.1938247680664062, 5.405170440673828, -0.686492919921875, -1.092824935913086, -0.6594390869140625, 0.9243316650390625, 0.9533004760742188, 1.4594078063964844, -4.217247009277344, 5.790485382080078, 5.124515533447266, 0.5376014709472656, 1.6335945129394531, -1.7923507690429688, 8.628402709960938, 9.202484130859375, 6.614997863769531, 5.59941291809082, 4.024797439575195, 1.7213897705078125, 1.9365615844726562, 2.4288558959960938], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000163.npy"}
{"epoch": 0.24640967498110355, "step": 164, "batch_size": 64, "mean": 2.7905771732330322, "std": 3.7510087490081787, "min": -6.030372619628906, "p10": -1.3383367538452147, "median": 2.0564937591552734, "p90": 7.558120727539063, "max": 12.680259704589844, "pos_frac": 0.765625, "sample": [-2.6910171508789062, 0.9706935882568359, -0.14588165283203125, 1.8097915649414062, 2.2858352661132812, 1.093658447265625, 10.915206909179688, 7.662971496582031, -0.26717376708984375, 1.1795005798339844, 6.5085296630859375, -0.9095535278320312, 4.535726547241211, -2.5217247009277344, 4.166309356689453, 5.3603973388671875, 6.437267303466797, 2.0446434020996094, 8.26824951171875, -1.196481704711914, 1.5691051483154297, -2.2076034545898438, 3.2006378173828125, 2.0683441162109375, 0.32592201232910156, 1.9596977233886719, 8.189788818359375, 2.118396759033203, 2.9472808837890625, 0.3186683654785156, 4.674160003662109, 7.313468933105469, -0.8338947296142578, 2.291667938232422, 6.6578369140625, -1.3991317749023438, 4.520698547363281, 12.680259704589844, 11.224809646606445, -4.786064147949219, 1.1987228393554688, 2.9298324584960938, 0.9682273864746094, 5.34300422668457, 1.6972732543945312, 2.771116256713867, 4.799690246582031, 0.9029808044433594, 6.643646240234375, -0.013538360595703125, 1.890838623046875, 1.9503021240234375, 8.786109924316406, -1.7708816528320312, -0.8361988067626953, -6.030372619628906, 1.4873428344726562, 6.946577072143555, 6.8983306884765625, -0.07053756713867188, 6.939857482910156, 1.265045166015625, 2.5918922424316406, 2.9666824340820312], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000164.npy"}
{"epoch": 0.24792139077853365, "step": 165, "batch_size": 64, "mean": 2.4577760696411133, "std": 2.9506170749664307, "min": -5.151214599609375, "p10": -1.1642686843872068, "median": 2.481656074523926, "p90": 6.087551689147949, "max": 9.741546630859375, "pos_frac": 0.828125, "sample": [6.426521301269531, 3.302478790283203, 3.4197540283203125, 4.168449401855469, 1.0394935607910156, 0.7452468872070312, 1.0465202331542969, -0.2083892822265625, 3.6346511840820312, 2.467294692993164, 0.6691360473632812, 3.3204307556152344, 5.02703857421875, 1.1648845672607422, -0.2313385009765625, 3.7116165161132812, 1.768402099609375, 1.9262580871582031, -2.431478500366211, 0.5429229736328125, -5.151214599609375, -0.2230377197265625, 9.741546630859375, -0.837554931640625, 5.922607421875, 7.930732727050781, 2.158327102661133, 1.9555015563964844, 7.487569808959961, 6.075042724609375, 2.4415435791015625, 0.045711517333984375, 9.360626220703125, 5.80511474609375, 3.0554122924804688, -3.4760665893554688, 0.717254638671875, 6.0621795654296875, 3.9899826049804688, 4.347686767578125, -1.3042888641357422, 1.0332145690917969, 0.62969970703125, 0.11565780639648438, 6.092912673950195, -1.6639938354492188, 2.2589111328125, 1.3662605285644531, 2.8314132690429688, 0.14459228515625, 2.4960174560546875, 3.9742355346679688, -2.8055801391601562, 4.408107757568359, 3.3063888549804688, -2.0480995178222656, 2.6062393188476562, 0.9057044982910156, 3.211141586303711, 2.942108154296875, 3.5420608520507812, 6.148223876953125, 4.306755065917969, 3.881134033203125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000165.npy"}
{"epoch": 0.2494331065759637, "step": 166, "batch_size": 64, "mean": 1.6755683422088623, "std": 2.9443438053131104, "min": -3.3504409790039062, "p10": -1.4461700439453125, "median": 1.5465278625488281, "p90": 5.667397689819336, "max": 9.606002807617188, "pos_frac": 0.625, "sample": [4.729164123535156, -0.9136867523193359, -1.4859771728515625, 5.079994201660156, 4.048431396484375, 0.9429550170898438, -1.5255355834960938, 6.301929473876953, 3.4631500244140625, 2.921600341796875, -1.178802490234375, 1.995462417602539, 6.685604095458984, -1.0498886108398438, 4.8100128173828125, -0.9514350891113281, -0.276885986328125, 5.491191864013672, 1.6847381591796875, -1.1465835571289062, -0.6624431610107422, 5.725337982177734, -3.255329132080078, 2.443927764892578, 5.532203674316406, 2.0742015838623047, -0.156646728515625, 3.669036865234375, 4.01568603515625, 1.875335693359375, -1.1557769775390625, -2.3256797790527344, 1.4083175659179688, 6.2572174072265625, 0.8141021728515625, -0.9312095642089844, 4.162498474121094, 1.703125, 2.9792709350585938, 2.0083847045898438, 0.3916168212890625, 8.712661743164062, -1.6450881958007812, -3.3504409790039062, 3.030731201171875, -0.178253173828125, 0.247711181640625, -0.4659385681152344, 1.215322494506836, 9.606002807617188, 1.9221038818359375, -2.7333831787109375, -1.3532867431640625, 3.1022262573242188, -1.1655349731445312, -0.9317169189453125, 3.4376754760742188, -0.9568939208984375, 6.2975616455078125, -1.3388824462890625, 2.576171875, 3.6885452270507812, 0.8682537078857422, 0.4522132873535156], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000166.npy"}
{"epoch": 0.2509448223733938, "step": 167, "batch_size": 64, "mean": 2.6759848594665527, "std": 3.1451375484466553, "min": -4.832798004150391, "p10": -0.793182373046875, "median": 2.2144412994384766, "p90": 6.7014867782592775, "max": 10.0263671875, "pos_frac": 0.84375, "sample": [0.935272216796875, 0.7165451049804688, 1.3391971588134766, 2.861133575439453, -3.8107757568359375, 5.3463134765625, 6.108436584472656, 0.832000732421875, 2.287313461303711, 1.7400856018066406, 4.450769424438477, 2.6811447143554688, 0.8079509735107422, 1.4200820922851562, 0.3825836181640625, 1.032958984375, 4.634925842285156, 3.7857208251953125, 7.2742156982421875, 0.0324859619140625, -4.832798004150391, 1.8890342712402344, 2.141569137573242, 1.4415359497070312, 1.213460922241211, 2.348419189453125, 3.2044906616210938, -0.15435028076171875, 2.0121078491210938, 4.0084228515625, 0.9193191528320312, 6.73051643371582, 0.1474781036376953, 2.6217689514160156, 3.2650184631347656, 4.804168701171875, 7.327754974365234, 9.884468078613281, 6.083953857421875, 4.421043395996094, 0.6787376403808594, 2.68975830078125, -0.7370529174804688, 1.7592506408691406, 5.297695159912109, 0.20423126220703125, -2.9105224609375, 8.550224304199219, -1.176025390625, 3.5141754150390625, 5.934074401855469, -0.8172378540039062, -0.40947914123535156, 9.788829803466797, 4.198469161987305, 6.633750915527344, -1.0955886840820312, 10.0263671875, 5.025962829589844, 6.1590576171875, 3.2054367065429688, 0.9401321411132812, 0.37686920166015625, -0.9098281860351562], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000167.npy"}
{"epoch": 0.25245653817082386, "step": 168, "batch_size": 64, "mean": 1.4807050228118896, "std": 2.7265443801879883, "min": -5.7198486328125, "p10": -1.5608543395996093, "median": 1.4583501815795898, "p90": 5.185702514648438, "max": 8.312530517578125, "pos_frac": 0.703125, "sample": [2.8199234008789062, -1.6369094848632812, 4.011386871337891, -1.383392333984375, 1.522481918334961, 3.9738006591796875, 3.970458984375, -1.2007675170898438, -1.9015045166015625, -2.7648162841796875, 3.0168304443359375, -0.18982696533203125, 0.9921035766601562, 5.71485710144043, 1.6365966796875, 1.4482269287109375, 2.7851905822753906, 1.8182258605957031, -0.7526626586914062, 3.24139404296875, 0.7187576293945312, -0.9015254974365234, 0.9139595031738281, 2.221811294555664, 0.09326553344726562, 1.6882972717285156, 7.14324951171875, 1.8487167358398438, -3.2517623901367188, 1.4786453247070312, 1.3091506958007812, 0.6385040283203125, -5.7198486328125, 3.340911865234375, 3.8609275817871094, 0.2893829345703125, 5.207061767578125, 1.7687568664550781, -0.9473800659179688, 0.9321746826171875, 0.35198402404785156, 8.312530517578125, -1.0483627319335938, 5.654212951660156, 5.1358642578125, -0.4710235595703125, 2.8772735595703125, -1.1297454833984375, 1.4500579833984375, -2.055805206298828, 2.681884765625, 0.6908493041992188, 3.442401885986328, 1.4666423797607422, -1.2181777954101562, 5.405893325805664, -0.943939208984375, -3.6230392456054688, 0.33892059326171875, -0.4321422576904297, 4.370307922363281, 3.9137954711914062, 6.753997802734375, 3.086090087890625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000168.npy"}
{"epoch": 0.25396825396825395, "step": 169, "batch_size": 64, "mean": 2.387477397918701, "std": 3.5723865032196045, "min": -3.6911392211914062, "p10": -1.4587772369384764, "median": 1.7586135864257812, "p90": 6.843977355957032, "max": 15.69818115234375, "pos_frac": 0.734375, "sample": [1.6559829711914062, 4.993949890136719, 2.7058181762695312, -2.361705780029297, 4.192346572875977, 1.5279159545898438, -3.6911392211914062, 0.0514984130859375, 3.1482505798339844, 3.7162322998046875, 1.0313949584960938, -0.030628204345703125, 5.531219482421875, 7.482757568359375, 1.2918777465820312, -0.7195091247558594, 1.5659523010253906, 1.3423614501953125, 3.2499618530273438, -1.1225738525390625, -0.126251220703125, 5.0897674560546875, 7.050788879394531, 1.746307373046875, -1.1912117004394531, 11.722633361816406, 2.322906494140625, 1.1453704833984375, 1.2826118469238281, 4.611713409423828, 0.9571800231933594, 3.0377273559570312, 4.7972412109375, -0.8623180389404297, 3.334413528442383, 1.0361328125, 5.847137451171875, -2.8021087646484375, 2.2370357513427734, 1.7709197998046875, 0.29547119140625, 2.4429473876953125, -0.3681297302246094, -1.5734481811523438, 2.8092098236083984, 1.1431655883789062, 3.963031768798828, 1.60980224609375, 6.695648193359375, 3.2987518310546875, -1.1397018432617188, 4.145036697387695, 2.0810775756835938, -2.3629150390625, -0.5700531005859375, 15.69818115234375, 6.93048095703125, 6.9075469970703125, -3.6670169830322266, 11.346553802490234, 3.3630828857421875, 3.7905235290527344, -1.7196044921875, -0.8910446166992188], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000169.npy"}
{"epoch": 0.25547996976568405, "step": 170, "batch_size": 64, "mean": 2.1373472213745117, "std": 3.168792963027954, "min": -6.278501510620117, "p10": -1.7403587341308593, "median": 1.876601219177246, "p90": 6.301238250732424, "max": 9.872299194335938, "pos_frac": 0.78125, "sample": [4.254478454589844, 0.8870201110839844, 1.7039852142333984, 3.2965545654296875, 1.5573844909667969, 7.951301574707031, 1.6120166778564453, 2.6910018920898438, 0.585418701171875, 6.503208160400391, -2.0850830078125, 0.9644756317138672, 4.358772277832031, 7.3957672119140625, 5.160747528076172, -0.4570140838623047, 1.9152240753173828, 1.9039287567138672, 4.726860046386719, 0.21134185791015625, -0.7083015441894531, 8.149261474609375, 2.2508411407470703, 1.6865501403808594, -1.23419189453125, 2.584197998046875, -3.4741592407226562, -1.755096435546875, -6.278501510620117, 1.849273681640625, 5.829975128173828, 1.04638671875, 4.1716156005859375, 2.899812698364258, -4.043733596801758, 1.4558334350585938, -3.8654327392578125, 5.814323425292969, 5.325401306152344, 0.5108642578125, 3.5537109375, -0.28804588317871094, 1.0499725341796875, 3.9320602416992188, 2.25048828125, 9.872299194335938, 1.7871551513671875, 6.6336669921875, 2.6264686584472656, 0.3462677001953125, -0.3671417236328125, -0.9922351837158203, 1.1110153198242188, 7.770774841308594, -1.7059707641601562, 0.3862476348876953, 2.84173583984375, 2.8474807739257812, 1.8094253540039062, 2.010080337524414, 4.569908142089844, 4.551784515380859, 5.29510498046875, -2.4543380737304688], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000170.npy"}
{"epoch": 0.25699168556311414, "step": 171, "batch_size": 64, "mean": 2.385944366455078, "std": 3.7700421810150146, "min": -8.561744689941406, "p10": -1.1221771240234373, "median": 1.9815540313720703, "p90": 7.396414947509766, "max": 9.928329467773438, "pos_frac": 0.75, "sample": [2.501800537109375, 1.4365119934082031, 5.927520751953125, 3.1191329956054688, 2.9850006103515625, 8.722488403320312, 2.221935272216797, 5.502067565917969, 0.015321731567382812, 2.038806915283203, 0.2095012664794922, 0.9727020263671875, 7.063850402832031, 3.2473678588867188, 7.723888397216797, -0.3712501525878906, 3.3016490936279297, 1.4649200439453125, 4.081504821777344, -0.827484130859375, -0.14505767822265625, 1.3990192413330078, 2.7527694702148438, -3.108001708984375, 5.024847030639648, 0.2002544403076172, 1.9243011474609375, 0.549224853515625, 6.7985076904296875, -5.002378463745117, 1.2718276977539062, 2.466827392578125, 7.4730682373046875, -0.6668605804443359, 2.415802001953125, 7.362358093261719, 0.2704315185546875, -3.9488449096679688, 6.8032684326171875, 7.0227203369140625, -8.561744689941406, 1.2872371673583984, 3.019420623779297, -0.015289306640625, 1.099700927734375, 8.507156372070312, 6.7612762451171875, -0.1293182373046875, -1.4525070190429688, 6.7613983154296875, 2.7204971313476562, 9.3994140625, 6.159614562988281, 2.2361679077148438, -0.10573577880859375, 0.729705810546875, 7.4110107421875, -0.29997825622558594, -1.24847412109375, -6.174102783203125, 9.928329467773438, 1.4441299438476562, 1.8213653564453125, -0.8001556396484375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000171.npy"}
{"epoch": 0.2585034013605442, "step": 172, "batch_size": 64, "mean": 2.6429290771484375, "std": 3.449373245239258, "min": -7.655616760253906, "p10": -1.3594062805175777, "median": 2.3902511596679688, "p90": 7.333338928222659, "max": 12.554666519165039, "pos_frac": 0.828125, "sample": [-2.9411087036132812, 3.3311996459960938, 1.7429962158203125, 2.0550689697265625, 0.42877960205078125, 1.1620674133300781, -0.09031867980957031, 3.6349754333496094, 2.4301986694335938, 7.821218490600586, 10.386566162109375, 4.19512939453125, 3.5138473510742188, -0.9514312744140625, 3.897125244140625, 12.554666519165039, 4.10723876953125, -2.2207794189453125, -0.9664306640625, 2.2539749145507812, 5.9279327392578125, 1.9533843994140625, -3.268442153930664, -1.5278244018554688, 3.514751434326172, 2.703754425048828, 2.0563735961914062, 4.242462158203125, 3.123992919921875, 1.8649826049804688, 2.879728317260742, 1.9778823852539062, 2.163299560546875, 10.25177001953125, 0.2899017333984375, -0.42797088623046875, 4.412771224975586, 3.2045135498046875, 8.389930725097656, 2.3503036499023438, 0.6214714050292969, 2.809772491455078, 2.9960556030273438, 4.871858596801758, 5.3596038818359375, -1.8161582946777344, 1.7115516662597656, 0.10581207275390625, 3.4296951293945312, 3.5449905395507812, -2.0718154907226562, 6.5550994873046875, 7.6668701171875, -7.655616760253906, 0.6846637725830078, 5.77655029296875, 0.5746536254882812, 1.3308486938476562, 1.1809158325195312, 5.792514801025391, 0.2185821533203125, 0.3126811981201172, 8.417709350585938, 4.3006744384765625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000172.npy"}
{"epoch": 0.2600151171579743, "step": 173, "batch_size": 64, "mean": 2.465545654296875, "std": 4.0189924240112305, "min": -5.742424011230469, "p10": -2.1154293060302733, "median": 2.540119171142578, "p90": 7.424947357177734, "max": 11.172958374023438, "pos_frac": 0.703125, "sample": [-3.062967300415039, 2.8817520141601562, -0.20511245727539062, 5.027076721191406, 2.57965087890625, 4.3199920654296875, 2.6231212615966797, -0.2338714599609375, -0.3269214630126953, 2.638935089111328, -0.4671669006347656, 3.1268157958984375, 5.727851867675781, -5.084989547729492, -2.059711456298828, 7.069404602050781, 9.503005981445312, 0.3610038757324219, 2.5797977447509766, 1.113260269165039, 3.17694091796875, 8.286056518554688, -0.9099082946777344, 11.165924072265625, -4.914083480834961, -4.0071868896484375, 6.000312805175781, 5.611305236816406, 2.3353538513183594, 0.880035400390625, -2.107330322265625, 7.4122314453125, -1.6531982421875, 7.430397033691406, 4.358360290527344, 1.4190788269042969, 5.233360290527344, -2.0158424377441406, 3.9031753540039062, 8.275039672851562, 2.5522232055664062, 0.6493682861328125, 1.5619335174560547, 6.563102722167969, 4.970298767089844, -5.742424011230469, 0.14186477661132812, -0.782257080078125, 11.172958374023438, 6.428226470947266, 6.327844619750977, -1.7114448547363281, 1.8179168701171875, 5.795326232910156, 7.202415466308594, -2.79400634765625, -1.7020111083984375, -2.1189002990722656, 2.1804275512695312, 9.946014404296875, 0.19408416748046875, 3.5247974395751953, 1.0981941223144531, 2.52801513671875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000173.npy"}
{"epoch": 0.2615268329554044, "step": 174, "batch_size": 64, "mean": 2.620671272277832, "std": 3.8917815685272217, "min": -7.197731018066406, "p10": -1.9499780654907226, "median": 2.0744142532348633, "p90": 8.01572036743164, "max": 11.140525817871094, "pos_frac": 0.6875, "sample": [1.7037353515625, 1.4602813720703125, 4.885063171386719, 1.9406700134277344, 2.722177505493164, -2.527679443359375, 1.4256210327148438, 6.4063568115234375, 2.665863037109375, 3.0376338958740234, 7.871070861816406, 7.09161376953125, 1.346405029296875, -0.7664604187011719, 6.360626220703125, -2.0220584869384766, 3.4729080200195312, 5.901397705078125, 3.4812355041503906, -6.2003631591796875, 8.077713012695312, 4.4818878173828125, -1.8533763885498047, 4.607414245605469, 11.140525817871094, -0.9658622741699219, 1.21527099609375, -0.21743011474609375, 9.835075378417969, 1.5271453857421875, 1.466278076171875, 10.741386413574219, 8.733293533325195, -1.729654312133789, 4.297588348388672, 6.313201904296875, -0.0780029296875, 9.251937866210938, -0.5493125915527344, 5.9720611572265625, -1.5334243774414062, 5.65380859375, 3.554515838623047, -0.15720558166503906, -0.1824493408203125, 4.330768585205078, 1.0291595458984375, -2.6195831298828125, 8.574150085449219, -0.010525703430175781, -7.197731018066406, 5.284248352050781, -0.20801544189453125, 1.5062103271484375, 6.5613861083984375, -2.8702239990234375, 3.4144668579101562, -0.5331039428710938, 4.553134918212891, -1.9913787841796875, 2.208158493041992, 2.55511474609375, 1.86798095703125, 1.4102516174316406], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000174.npy"}
{"epoch": 0.26303854875283444, "step": 175, "batch_size": 64, "mean": 2.9583497047424316, "std": 4.198704719543457, "min": -6.85009765625, "p10": -1.8627395629882812, "median": 2.2733993530273438, "p90": 8.61830940246582, "max": 12.513561248779297, "pos_frac": 0.71875, "sample": [-2.697835922241211, 2.558134078979492, -3.2907257080078125, 1.3627033233642578, -0.6356887817382812, 5.323738098144531, 0.3839836120605469, 8.792724609375, 2.21630859375, -1.3963794708251953, 1.6922988891601562, 7.328575134277344, 1.527008056640625, 4.305995941162109, -1.4484100341796875, 7.275823593139648, 5.2223052978515625, 7.113746643066406, 7.064250946044922, 1.3695602416992188, 2.3304901123046875, 2.465761184692383, 1.0951690673828125, 7.336700439453125, 5.5362396240234375, 3.8572845458984375, 0.17080307006835938, -0.434112548828125, 8.344436645507812, -6.85009765625, 6.976387023925781, 3.0590286254882812, 2.1811141967773438, 5.244712829589844, -1.8709564208984375, -0.26099205017089844, 0.4138832092285156, 12.513561248779297, -1.375579833984375, -2.82763671875, 3.3271408081054688, -0.261962890625, -0.216064453125, 1.896636962890625, -3.2340545654296875, -1.84356689453125, 5.452781677246094, -1.72552490234375, 9.096168518066406, 6.953332901000977, 3.323617935180664, 7.482269287109375, 0.8271636962890625, -3.4060211181640625, 1.3987045288085938, 6.813346862792969, -1.4648761749267578, 8.73568344116211, 5.2064361572265625, 10.532302856445312, 1.6896209716796875, 10.798202514648438, 5.383888244628906, 10.594844818115234], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000175.npy"}
{"epoch": 0.26455026455026454, "step": 176, "batch_size": 64, "mean": 1.9553676843643188, "std": 3.988619089126587, "min": -7.9859619140625, "p10": -1.9192520141601561, "median": 1.8261146545410156, "p90": 7.03240089416504, "max": 15.3349609375, "pos_frac": 0.640625, "sample": [7.9755096435546875, -5.970268249511719, 4.823677062988281, 4.658885955810547, -0.19925689697265625, 6.774635314941406, 0.04578399658203125, 2.4072036743164062, 2.849184036254883, 0.479278564453125, -5.7676544189453125, -1.8395919799804688, -2.241741180419922, 1.9121551513671875, -2.3691787719726562, 7.142871856689453, -0.4446144104003906, 4.297183990478516, -1.2406463623046875, -0.7199325561523438, -0.6585311889648438, -1.4215888977050781, 2.0136260986328125, 8.450878143310547, 4.7962646484375, 2.7495346069335938, -1.9533920288085938, 3.5689849853515625, -1.77264404296875, 3.0256500244140625, 6.026947021484375, -0.7836151123046875, -1.051666259765625, 3.132843017578125, 1.2849254608154297, 7.714698791503906, -0.28759002685546875, 2.2181396484375, 0.190765380859375, -1.8225631713867188, 1.9805068969726562, -0.360748291015625, 3.8061752319335938, 10.1824951171875, 1.6653175354003906, 5.717071533203125, -0.8129348754882812, -1.1649017333984375, 5.5606536865234375, 9.360580444335938, -0.2423095703125, 3.671741485595703, 0.6121635437011719, 4.190040588378906, 1.7400741577148438, 2.322542190551758, 15.3349609375, 2.223773956298828, 0.53948974609375, 5.54046630859375, 4.233894348144531, -7.9859619140625, 1.7264175415039062, -2.6931304931640625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000176.npy"}
{"epoch": 0.2660619803476946, "step": 177, "batch_size": 64, "mean": 3.6052703857421875, "std": 3.887321949005127, "min": -6.4684906005859375, "p10": -0.4278491973876953, "median": 3.5703163146972656, "p90": 7.625187873840333, "max": 14.350051879882812, "pos_frac": 0.828125, "sample": [2.9306106567382812, 1.8847770690917969, 3.664722442626953, 14.350051879882812, 5.540824890136719, -0.2354259490966797, -0.13100433349609375, 4.9273681640625, 7.5021209716796875, 4.6610107421875, 14.163322448730469, 7.427150726318359, 7.417308807373047, 0.17920684814453125, 2.0129432678222656, -2.028318405151367, 7.390495300292969, 0.4891357421875, -2.0038223266601562, 4.827449798583984, 4.825592041015625, 4.564002990722656, 1.5823516845703125, 10.30927848815918, 4.808807373046875, 2.783843994140625, 1.5679950714111328, 5.682891845703125, -0.7015438079833984, 0.1358642578125, 1.2010040283203125, 1.4899978637695312, 7.777374267578125, 6.722198486328125, 0.07901573181152344, 7.67793083190918, 6.477319717407227, 1.5970916748046875, 4.1104736328125, 3.2925453186035156, 0.45986175537109375, 3.475910186767578, 7.071994781494141, -1.6016998291015625, 7.1918182373046875, -3.779327392578125, 4.10980224609375, 2.322357177734375, 5.593662261962891, -0.37578392028808594, 4.593780517578125, -0.4501628875732422, -6.4684906005859375, 11.207714080810547, 2.2599945068359375, 4.411521911621094, 4.802541732788086, 6.7322845458984375, 3.9573974609375, 1.2646026611328125, 1.1020545959472656, 8.819713592529297, -0.245819091796875, 3.3276138305664062], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000177.npy"}
{"epoch": 0.2675736961451247, "step": 178, "batch_size": 64, "mean": 2.0391430854797363, "std": 3.1476197242736816, "min": -5.2362823486328125, "p10": -1.8996162414550781, "median": 1.8080501556396484, "p90": 6.3664268493652365, "max": 9.859535217285156, "pos_frac": 0.765625, "sample": [0.5870494842529297, 5.8137664794921875, 3.7977561950683594, -0.6984329223632812, 4.747901916503906, 1.1446475982666016, -0.27254486083984375, 5.847221374511719, 2.3587646484375, 2.193920135498047, -2.3724365234375, 3.7547454833984375, -0.091461181640625, -2.885089874267578, 4.23724365234375, 2.1053314208984375, 6.7647552490234375, 0.7136821746826172, -1.9038848876953125, 1.2393646240234375, 6.936519622802734, 0.38648033142089844, 1.70318603515625, -2.995574951171875, 3.736705780029297, 2.90093994140625, 2.9441604614257812, 1.5815887451171875, 2.3408966064453125, 1.2339630126953125, 4.135772705078125, -0.11152267456054688, 1.0049209594726562, 8.423355102539062, 0.765167236328125, 4.028495788574219, 1.58074951171875, 2.30157470703125, 9.859535217285156, 0.35941314697265625, -5.2362823486328125, 4.80731201171875, 9.090805053710938, 1.2688751220703125, -4.371803283691406, 5.0495758056640625, 0.20264816284179688, -0.9655952453613281, 7.064666748046875, 2.457193374633789, 2.2435760498046875, 6.5889434814453125, 1.5637168884277344, 5.555877685546875, 1.0697917938232422, -0.7787399291992188, 3.4329071044921875, 0.8450241088867188, -3.2755584716796875, -1.8896560668945312, 1.9129142761230469, 2.3500213623046875, 3.1477203369140625, -1.8274097442626953], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000178.npy"}
{"epoch": 0.2690854119425548, "step": 179, "batch_size": 64, "mean": 2.399564743041992, "std": 4.095296382904053, "min": -8.165000915527344, "p10": -1.9587520599365233, "median": 2.2608118057250977, "p90": 7.662645721435548, "max": 12.699661254882812, "pos_frac": 0.703125, "sample": [2.1041412353515625, -1.640045166015625, 0.33774566650390625, 5.8424835205078125, -0.6105995178222656, 3.851642608642578, 2.417482376098633, 3.4451904296875, 11.9765625, 3.9141464233398438, 6.526643753051758, 9.596115112304688, 4.4480133056640625, -3.4566612243652344, 3.7945175170898438, -1.9939117431640625, -8.165000915527344, 0.27091407775878906, 1.3957443237304688, 0.9767608642578125, -0.36075592041015625, 4.576545715332031, -1.724029541015625, 2.4599990844726562, 9.043689727783203, -1.4402084350585938, -0.8065872192382812, 4.4907073974609375, 1.0283050537109375, 0.9986686706542969, 4.292028427124023, 6.022087097167969, 2.0349502563476562, -3.8378524780273438, 0.5608253479003906, -3.4073028564453125, -1.4307861328125, 11.992034912109375, 6.0937042236328125, -0.36273956298828125, 3.028444290161133, 3.644805908203125, 3.3661441802978516, 3.8448753356933594, 1.942596435546875, 5.0243682861328125, 8.392974853515625, 2.7042999267578125, 7.405555725097656, 12.699661254882812, 5.989978790283203, 2.7296829223632812, 7.7728271484375, 1.3919143676757812, -1.0818557739257812, 3.0281829833984375, 1.7385444641113281, -0.3002166748046875, 1.9709892272949219, -2.9615859985351562, -5.442113876342773, -0.23347854614257812, 3.5370941162109375, -1.8767127990722656], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000179.npy"}
{"epoch": 0.2705971277399849, "step": 180, "batch_size": 64, "mean": 1.969395637512207, "std": 3.603458881378174, "min": -4.790153503417969, "p10": -2.8691986083984373, "median": 1.8255538940429688, "p90": 6.745124435424805, "max": 11.512260437011719, "pos_frac": 0.6875, "sample": [5.3005828857421875, 1.1481208801269531, 3.7827281951904297, 0.7117843627929688, 2.299957275390625, 3.4395523071289062, -3.8681182861328125, 1.4560508728027344, 1.513254165649414, -1.1455116271972656, 6.770252227783203, 4.826984405517578, -3.59173583984375, -0.38899803161621094, 2.24432373046875, 0.03728675842285156, -4.431976318359375, 6.686492919921875, -0.12015914916992188, -4.628501892089844, 7.6101226806640625, 1.8073272705078125, 4.607669830322266, 1.43963623046875, 2.8666324615478516, 5.036521911621094, -0.5274543762207031, 1.843780517578125, 2.063457489013672, 2.112010955810547, 3.6580886840820312, 6.407133102416992, -4.790153503417969, -0.7042236328125, 3.0519142150878906, 2.7586822509765625, 0.8596897125244141, 6.467628479003906, -0.009534835815429688, -0.5916938781738281, 1.5714969635009766, 4.078376770019531, 8.38943099975586, 0.0335540771484375, -1.6129493713378906, 1.1081504821777344, 7.377655029296875, -0.0810699462890625, 11.512260437011719, -0.1657276153564453, 5.401386260986328, -2.7053070068359375, 2.1669082641601562, -1.6996688842773438, -2.1453170776367188, 2.663654327392578, 9.09225082397461, -2.9394378662109375, 4.781414031982422, -4.1155853271484375, 5.149993896484375, 2.245250701904297, 7.5586090087890625, 0.36638641357421875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000180.npy"}
{"epoch": 0.272108843537415, "step": 181, "batch_size": 64, "mean": 3.2346842288970947, "std": 3.2071831226348877, "min": -2.4967288970947266, "p10": -0.3411907196044922, "median": 3.043243408203125, "p90": 7.234043884277344, "max": 12.408721923828125, "pos_frac": 0.84375, "sample": [1.9074935913085938, 4.572475433349609, 0.8914260864257812, -0.3384590148925781, 1.1295814514160156, 2.003591537475586, 2.6885910034179688, 0.1466522216796875, 1.4952125549316406, 1.3437423706054688, 6.0725860595703125, 1.5694770812988281, 4.207233428955078, 0.112823486328125, 3.4823646545410156, 4.334617614746094, 4.497400283813477, 4.736747741699219, 4.4388885498046875, -2.316558837890625, 3.3143310546875, 7.931556701660156, 4.615131378173828, 3.7159576416015625, 8.718635559082031, 7.2546844482421875, 2.77215576171875, 12.408721923828125, 1.2144546508789062, -1.6111679077148438, 8.061538696289062, 4.1963348388671875, 7.185882568359375, -2.4967288970947266, 5.7707366943359375, -0.24648284912109375, -0.5084457397460938, 9.076065063476562, 1.1492385864257812, 0.8971023559570312, 5.025157928466797, 1.3454360961914062, 4.879951477050781, 6.157051086425781, 6.707733154296875, 0.1598377227783203, 1.0184764862060547, 2.75396728515625, 4.962306976318359, -0.3423614501953125, 1.3155288696289062, 6.010921478271484, 11.787002563476562, 1.6356887817382812, 2.5833740234375, 5.760929107666016, 1.6017303466796875, 4.5343170166015625, 4.029874801635742, -1.3053131103515625, 3.4661483764648438, -0.1345062255859375, 4.876075744628906, -2.2051239013671875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000181.npy"}
{"epoch": 0.273620559334845, "step": 182, "batch_size": 64, "mean": 2.8214097023010254, "std": 4.137951374053955, "min": -8.225273132324219, "p10": -1.8442867279052733, "median": 2.061971664428711, "p90": 8.55853443145752, "max": 11.842544555664062, "pos_frac": 0.75, "sample": [11.842544555664062, -1.3879451751708984, -2.1120681762695312, 1.19573974609375, 4.896171569824219, 5.660118103027344, 3.401418685913086, -1.5519142150878906, -2.7693710327148438, 9.43450927734375, 5.16926383972168, 1.2839126586914062, 10.603424072265625, -1.9925003051757812, 4.953708648681641, 5.62994384765625, 1.2928009033203125, 2.015270233154297, 8.939628601074219, 1.4391059875488281, 3.6071910858154297, 0.19775390625, 0.532470703125, -1.4564666748046875, 3.6284255981445312, -1.9695892333984375, 8.578977584838867, 3.6762771606445312, 6.262176513671875, 7.357357025146484, 9.126708984375, 2.3737621307373047, 2.8162841796875, 2.108673095703125, 5.305244445800781, -0.630035400390625, 1.0156097412109375, 1.3088607788085938, 7.28179931640625, 0.04094505310058594, 1.8471603393554688, -3.6679306030273438, 8.16558837890625, -0.3184661865234375, 7.788372039794922, 5.75640869140625, 1.6062164306640625, -0.290130615234375, 4.96929931640625, 1.7802581787109375, 4.191410064697266, 8.869209289550781, -1.2840957641601562, 7.953081130981445, 8.510833740234375, 1.519317626953125, -1.3646011352539062, 3.283201217651367, 6.282360076904297, 0.8474807739257812, 0.0448455810546875, -6.212890625, -0.5876235961914062, -8.225273132324219], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000182.npy"}
{"epoch": 0.2751322751322751, "step": 183, "batch_size": 64, "mean": 1.421842098236084, "std": 3.3376352787017822, "min": -9.759902954101562, "p10": -2.287810134887695, "median": 1.1596622467041016, "p90": 5.366211700439453, "max": 7.765281677246094, "pos_frac": 0.65625, "sample": [2.259979248046875, 5.980491638183594, 0.765594482421875, -0.307830810546875, 1.5537300109863281, -1.3472137451171875, -0.038665771484375, 5.164520263671875, -2.8500595092773438, -0.20035552978515625, -9.759902954101562, 0.14971923828125, 2.7303943634033203, -1.1846771240234375, 2.3724021911621094, 5.321128845214844, -0.8073444366455078, 2.086151123046875, 7.765281677246094, 3.013263702392578, 5.11749267578125, -1.9839019775390625, 4.767127990722656, -1.4572906494140625, 0.5801315307617188, -0.8810577392578125, 2.894805908203125, 1.6308174133300781, -1.8824615478515625, -4.072265625, 7.765190124511719, 4.56903076171875, 1.6582107543945312, 0.643341064453125, -3.7657623291015625, 6.121955871582031, 5.078910827636719, 5.335060119628906, -5.252708435058594, 6.778598785400391, -2.4180564880371094, -0.29679107666015625, 5.236305236816406, -0.1895160675048828, 0.33063316345214844, 2.5119895935058594, 3.636737823486328, 0.6817855834960938, -1.1661758422851562, -0.7509098052978516, 2.6315155029296875, 3.2236785888671875, 0.3603973388671875, 0.5451507568359375, 5.5295562744140625, 0.15694618225097656, 3.3930282592773438, -3.7835693359375, 0.18364906311035156, -0.9594039916992188, 3.2930374145507812, 5.3795623779296875, 4.6408843994140625, 2.515625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000183.npy"}
{"epoch": 0.2766439909297052, "step": 184, "batch_size": 64, "mean": 2.226041793823242, "std": 2.8987724781036377, "min": -2.017162322998047, "p10": -0.9911209106445309, "median": 1.1882991790771484, "p90": 6.679843139648438, "max": 9.962179183959961, "pos_frac": 0.78125, "sample": [-0.1116485595703125, 6.59027099609375, 4.989435195922852, 1.0972213745117188, 5.008369445800781, 0.842071533203125, -0.1861133575439453, 0.2267913818359375, 0.2005615234375, 2.7872467041015625, 0.31476783752441406, 5.333671569824219, 0.20895957946777344, 1.783285140991211, 3.7840042114257812, -1.9439315795898438, 7.1165771484375, 5.233406066894531, 0.62139892578125, 8.434539794921875, 0.7057418823242188, 9.962179183959961, 1.9680404663085938, 9.051094055175781, -1.3831977844238281, 4.21234130859375, 0.11478042602539062, 3.2217559814453125, 7.811878204345703, 1.2216224670410156, 0.5856704711914062, -0.46091461181640625, 0.7072601318359375, -1.2200584411621094, 0.19849014282226562, 2.579742431640625, 8.093803405761719, 3.496694564819336, 1.1549758911132812, 5.034309387207031, 1.1460704803466797, -1.1316680908203125, 6.718231201171875, -0.00391387939453125, 4.580471038818359, 2.8464221954345703, 3.1707763671875, 1.4510040283203125, 0.5811176300048828, 5.532466888427734, -1.4857044219970703, -0.04405975341796875, 0.9376296997070312, 1.5664024353027344, 1.9225311279296875, 3.6346435546875, -2.017162322998047, 2.7477035522460938, -0.3053550720214844, 0.9354019165039062, -0.663177490234375, 0.5221099853515625, -1.320657730102539, 1.7582855224609375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000184.npy"}
{"epoch": 0.2781557067271353, "step": 185, "batch_size": 64, "mean": 2.337040901184082, "std": 4.194239616394043, "min": -7.306755065917969, "p10": -2.7723485946655266, "median": 2.5064687728881836, "p90": 8.055098724365235, "max": 12.480926513671875, "pos_frac": 0.6875, "sample": [2.0164546966552734, 11.351776123046875, -1.5000076293945312, 4.301355361938477, 9.687637329101562, -5.742881774902344, 3.507415771484375, 1.8616561889648438, -1.0504074096679688, -1.3965530395507812, 8.100227355957031, 10.428909301757812, 1.0313491821289062, -0.665252685546875, 12.480926513671875, 3.4602737426757812, 9.246231079101562, 1.7253036499023438, 5.182086944580078, 5.183635711669922, -1.6877670288085938, 5.623588562011719, 4.423244476318359, 2.8718490600585938, 1.6294918060302734, 5.152034759521484, 2.352537155151367, 3.499591827392578, 2.660400390625, 3.4450225830078125, -2.1945343017578125, 0.33957672119140625, 3.7446346282958984, 7.6208038330078125, -4.511587142944336, 4.8303375244140625, 3.62359619140625, 5.444606781005859, 2.192882537841797, -0.36445045471191406, -7.306755065917969, 1.4828109741210938, -2.1696090698242188, 2.7613525390625, 3.5476303100585938, -4.1749267578125, 5.495012283325195, -0.2254486083984375, 7.949798583984375, -0.8348522186279297, 0.3372955322265625, -4.122978210449219, 4.867118835449219, -1.8165245056152344, -0.2481842041015625, 2.7291488647460938, -4.130962371826172, -3.0199832916259766, 2.7329177856445312, -0.79150390625, 0.11743545532226562, 1.5183525085449219, 8.295076370239258, 6.672391891479492], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000185.npy"}
{"epoch": 0.2796674225245654, "step": 186, "batch_size": 64, "mean": 2.448810338973999, "std": 3.614980697631836, "min": -6.947059631347656, "p10": -1.9123971939086912, "median": 2.4103775024414062, "p90": 6.799041748046875, "max": 10.980865478515625, "pos_frac": 0.75, "sample": [5.747322082519531, 9.478866577148438, -3.6606826782226562, 2.5521373748779297, 2.2485122680664062, 1.3571929931640625, 3.69610595703125, -2.8922500610351562, -2.671844482421875, 4.708915710449219, 1.9963531494140625, -0.8373603820800781, 2.480987548828125, 6.336406707763672, 9.92431640625, 6.823171615600586, 2.0204391479492188, 2.1108970642089844, 1.3731765747070312, -3.9190521240234375, 1.6709918975830078, 3.1243038177490234, 5.368273735046387, 7.5176849365234375, -6.947059631347656, -1.5734786987304688, -1.3616065979003906, -1.7728252410888672, 0.43006134033203125, 0.8934211730957031, 1.930419921875, 6.742738723754883, 0.4936981201171875, -1.013671875, -0.35803985595703125, -0.16555213928222656, 2.8091697692871094, 6.564720153808594, 5.252559661865234, 2.611652374267578, -0.8061141967773438, -3.2344436645507812, 0.8345985412597656, 1.7220611572265625, 2.597076416015625, 2.059093475341797, 6.718950271606445, 3.1690826416015625, -1.9722137451171875, 3.14703369140625, 8.780303955078125, 3.2838001251220703, 3.9460678100585938, 9.64837646484375, 0.40282440185546875, 2.3570632934570312, 10.980865478515625, 4.908830642700195, 3.9230804443359375, 3.972291946411133, 3.2893600463867188, 4.0677032470703125, -0.6265869140625, 2.4636917114257812], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000186.npy"}
{"epoch": 0.2811791383219955, "step": 187, "batch_size": 64, "mean": 2.576138734817505, "std": 3.9301562309265137, "min": -7.240928649902344, "p10": -1.975027084350586, "median": 2.52166748046875, "p90": 7.045487213134767, "max": 14.07757568359375, "pos_frac": 0.765625, "sample": [3.2327117919921875, 0.039398193359375, 7.202552795410156, 7.82208251953125, 14.07757568359375, 4.124198913574219, 5.2334442138671875, 1.2142982482910156, 1.10577392578125, 4.53350830078125, -2.5494003295898438, 4.508453369140625, 1.0625495910644531, 5.6642303466796875, 4.62274169921875, 1.4260139465332031, 1.3392581939697266, 5.856742858886719, 2.4325485229492188, 3.5724620819091797, -7.03619384765625, -7.240928649902344, 8.406661987304688, 0.351165771484375, 2.1306190490722656, -1.2751693725585938, 0.3696098327636719, 2.12701416015625, 2.8811798095703125, 5.618537902832031, 5.290504455566406, 6.6790008544921875, -3.226268768310547, 0.2964515686035156, 3.5602569580078125, 7.293304443359375, 1.3087730407714844, -0.19440841674804688, -0.6520843505859375, 3.15960693359375, -0.8687953948974609, -1.4266853332519531, 6.301422119140625, -0.14354515075683594, 2.2718124389648438, 3.3891448974609375, 7.2803497314453125, -2.0166664123535156, 6.2281341552734375, 5.7715911865234375, 6.170722961425781, 1.6671142578125, 3.8839492797851562, 10.391357421875, 1.2399520874023438, -1.87786865234375, 6.162689208984375, 5.793209075927734, 1.213775634765625, 2.6107864379882812, -0.0702362060546875, -3.252452850341797, -6.157012939453125, 3.9413528442382812], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000187.npy"}
{"epoch": 0.28269085411942557, "step": 188, "batch_size": 64, "mean": 2.3482742309570312, "std": 3.795362949371338, "min": -12.275672912597656, "p10": -1.1295883178710935, "median": 1.924276351928711, "p90": 7.540480041503907, "max": 9.215072631835938, "pos_frac": 0.78125, "sample": [0.813323974609375, -0.7327785491943359, 7.168308258056641, -2.5987586975097656, 0.323394775390625, 4.139701843261719, 5.4494171142578125, 1.4979267120361328, 1.6619110107421875, 0.23486328125, 1.2644996643066406, 7.3467559814453125, 5.948612213134766, -0.619049072265625, 0.3361835479736328, 0.5634002685546875, 7.8552703857421875, 6.555931091308594, 0.6066017150878906, -1.214508056640625, 1.9521446228027344, 0.4149627685546875, 2.083251953125, -0.9314422607421875, 3.501453399658203, 1.4731979370117188, -1.3355464935302734, 4.5518341064453125, 1.5349922180175781, 7.664821624755859, 6.280975341796875, -0.1963043212890625, 4.2663116455078125, 8.713096618652344, 5.473075866699219, 0.209381103515625, 7.4315643310546875, 2.4040374755859375, 0.08306884765625, 1.8964080810546875, 8.550518035888672, 0.988677978515625, 2.4998016357421875, 3.5985565185546875, 0.6041946411132812, -2.81842041015625, 2.6522369384765625, 8.130531311035156, 2.1118812561035156, -0.130279541015625, 9.215072631835938, -5.711761474609375, -0.4260749816894531, 3.2373504638671875, 0.8397674560546875, 3.532958984375, 2.087736129760742, -3.338085174560547, 6.944980621337891, 7.587158203125, 4.766965866088867, 3.7912826538085938, -0.2221088409423828, -12.275672912597656], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000188.npy"}
{"epoch": 0.2842025699168556, "step": 189, "batch_size": 64, "mean": 2.7277445793151855, "std": 3.52020263671875, "min": -4.164264678955078, "p10": -1.9842838287353513, "median": 2.1434898376464844, "p90": 8.178064727783203, "max": 9.902259826660156, "pos_frac": 0.859375, "sample": [0.9191970825195312, 3.0418357849121094, 2.645465850830078, 2.0991287231445312, 5.1105499267578125, 8.073074340820312, 7.460609436035156, 5.29632568359375, 0.7477283477783203, 3.5921783447265625, -3.2894134521484375, 4.643680572509766, -3.4569320678710938, 1.430877685546875, 0.48467254638671875, -0.5407695770263672, 9.34735107421875, 5.294712066650391, -4.150447845458984, 2.1792068481445312, 1.4560928344726562, 5.759393692016602, 1.9025726318359375, 3.01202392578125, 0.6147918701171875, 1.2238941192626953, 3.0170822143554688, 1.9567680358886719, 1.9951725006103516, 0.6240768432617188, 8.694671630859375, 0.2911529541015625, 2.0545272827148438, 9.543180465698242, 9.902259826660156, 2.9166641235351562, 0.3915596008300781, 1.2458953857421875, 0.7156391143798828, 3.7040023803710938, 7.919464111328125, 0.80810546875, 2.569173812866211, 0.5427398681640625, 3.1718597412109375, 3.279071807861328, -2.111888885498047, -3.2992210388183594, 3.2098388671875, 2.2670669555664062, 1.6997604370117188, 7.629938125610352, -1.6865386962890625, 8.223060607910156, 4.9744110107421875, 1.1845169067382812, -3.753650665283203, -4.164264678955078, 2.6896514892578125, 5.802001953125, 2.0470504760742188, 2.1077728271484375, 9.227653503417969, 8.287635803222656], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000189.npy"}
{"epoch": 0.2857142857142857, "step": 190, "batch_size": 64, "mean": 2.855081796646118, "std": 3.4913086891174316, "min": -4.88037109375, "p10": -1.1030319213867186, "median": 2.414205551147461, "p90": 7.770275115966798, "max": 9.344921112060547, "pos_frac": 0.796875, "sample": [2.646099090576172, 3.147430419921875, 8.877456665039062, -3.655517578125, 3.545032501220703, 6.1961517333984375, -0.90362548828125, 1.906158447265625, -3.1636276245117188, 9.344921112060547, 1.4425086975097656, 4.133903503417969, 3.367259979248047, 0.19796371459960938, 1.5425033569335938, -0.5776290893554688, -0.07325363159179688, 6.748809814453125, 1.7904415130615234, 3.8094654083251953, 4.167564392089844, 7.869449615478516, 0.3175048828125, 1.5624275207519531, -2.1586856842041016, 2.7258987426757812, -4.88037109375, 1.0355758666992188, 2.5449676513671875, 2.355670928955078, 5.103507995605469, -0.06623458862304688, 2.4727401733398438, 1.828460693359375, 4.690895080566406, 0.5374183654785156, 4.870574951171875, 0.18171310424804688, -0.8806858062744141, 2.8453330993652344, 0.102142333984375, 8.78369140625, 7.538867950439453, 8.945587158203125, 1.0767498016357422, 0.6975326538085938, -1.212728500366211, 6.5579986572265625, -0.8473968505859375, -2.3653335571289062, 0.5806732177734375, 2.1577587127685547, 1.6055793762207031, 7.297538757324219, 4.03717041015625, 9.22772216796875, 7.471160888671875, 2.3140716552734375, 7.176948547363281, 8.926780700683594, -1.1884918212890625, 7.45123291015625, 3.03228759765625, 5.911518096923828], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000190.npy"}
{"epoch": 0.2872260015117158, "step": 191, "batch_size": 64, "mean": 2.576185703277588, "std": 3.403745174407959, "min": -5.1070556640625, "p10": -1.2131418228149409, "median": 2.3449363708496094, "p90": 6.46296615600586, "max": 10.57608413696289, "pos_frac": 0.78125, "sample": [5.820365905761719, -1.3967571258544922, -0.7368621826171875, 2.0364303588867188, 10.57608413696289, 1.36279296875, 2.4835968017578125, 5.711212158203125, 1.880270004272461, 1.39019775390625, 3.4701995849609375, 9.672798156738281, 2.5060653686523438, 2.347942352294922, 2.6978073120117188, 1.470834732055664, -3.518707275390625, -0.17179107666015625, 9.3055419921875, 2.5523929595947266, 2.5609283447265625, 8.583099365234375, 4.039966583251953, 6.185111999511719, -3.663707733154297, 5.757469177246094, 1.7016372680664062, 5.843231201171875, -0.7847061157226562, 0.6984176635742188, 3.3250579833984375, 0.00305938720703125, 0.4364356994628906, 2.341930389404297, 9.55535888671875, -5.1070556640625, 0.9883613586425781, -2.2657012939453125, 5.703098297119141, 3.2412185668945312, 3.6017494201660156, 3.224609375, 0.762481689453125, 4.0023651123046875, 4.532129287719727, 1.2043914794921875, 6.5820465087890625, 5.666912078857422, -0.7160263061523438, -0.126190185546875, -0.5242691040039062, 2.3047218322753906, 4.97021484375, 1.7547760009765625, -2.3301124572753906, 3.82037353515625, 8.620765686035156, -3.7881927490234375, 2.9260692596435547, -0.0358734130859375, 0.7657012939453125, 6.170074462890625, 0.6553421020507812, 2.2282028198242188], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000191.npy"}
{"epoch": 0.2887377173091459, "step": 192, "batch_size": 64, "mean": 1.788528323173523, "std": 3.250593900680542, "min": -7.710075378417969, "p10": -1.7699817657470702, "median": 1.9131278991699219, "p90": 6.114638137817383, "max": 11.099939346313477, "pos_frac": 0.703125, "sample": [-7.710075378417969, -3.3735885620117188, 0.5987586975097656, 4.589210510253906, 3.0811595916748047, 0.1726226806640625, -1.7857780456542969, 5.5193634033203125, 8.010879516601562, 1.8818588256835938, -1.7763481140136719, -0.23842620849609375, -0.3114967346191406, 0.07265472412109375, 0.787933349609375, 4.5726318359375, 6.118495941162109, 2.1374053955078125, 1.737997055053711, -0.8637313842773438, 4.766843795776367, 6.262001037597656, -0.4528846740722656, 0.2375621795654297, 6.744789123535156, 0.20324134826660156, -1.755126953125, 2.5838470458984375, 2.764995574951172, 1.2513847351074219, 3.1210174560546875, 2.4351806640625, 2.042745590209961, 3.170137405395508, -0.988861083984375, -0.17130279541015625, 2.3681793212890625, 0.21872901916503906, -0.7543792724609375, 3.034942626953125, 0.9264888763427734, -0.26708221435546875, -5.170726776123047, 11.099939346313477, 6.1056365966796875, 2.6881847381591797, -1.9022064208984375, 2.2458953857421875, 1.94439697265625, -2.0341873168945312, 4.9471282958984375, 6.307525634765625, 0.5212936401367188, 8.421722412109375, -0.18354034423828125, 2.3230209350585938, 4.822154998779297, -1.4116554260253906, 2.6689376831054688, 5.678565979003906, 2.888212203979492, -1.397176742553711, 0.4421195983886719, 2.4965953826904297], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000192.npy"}
{"epoch": 0.29024943310657597, "step": 193, "batch_size": 64, "mean": 2.658322811126709, "std": 3.5353200435638428, "min": -5.509685516357422, "p10": -1.187436294555664, "median": 1.5786237716674805, "p90": 7.195554351806641, "max": 14.559310913085938, "pos_frac": 0.8125, "sample": [1.0951385498046875, 1.359395980834961, 3.3054733276367188, 3.084644317626953, 0.9761886596679688, -1.7681999206542969, -1.5629653930664062, 5.620368957519531, 1.231719970703125, 0.29235076904296875, 5.198871612548828, 0.5398426055908203, 5.675025939941406, 1.6122817993164062, 4.237579345703125, -1.2331771850585938, 4.725067138671875, -0.06681442260742188, 2.6903820037841797, 4.818244934082031, -1.4584083557128906, 8.328706741333008, 14.559310913085938, 0.07159805297851562, 3.6167144775390625, 5.6887359619140625, 0.41371726989746094, 2.1452713012695312, -0.131378173828125, 1.9430408477783203, 8.730453491210938, 4.273200988769531, 0.9689979553222656, 4.423316955566406, 3.8579025268554688, -2.1702117919921875, -1.9406661987304688, 0.96234130859375, 4.541999816894531, 3.7768630981445312, -0.5536346435546875, 1.2695655822753906, 4.520298004150391, 12.450302124023438, -1.0807075500488281, 0.6989593505859375, 1.5449657440185547, 0.1868438720703125, 0.5209236145019531, 7.234992980957031, 5.972709655761719, -0.16070175170898438, 7.1035308837890625, 9.346092224121094, 8.546878814697266, 1.8518829345703125, 2.451416015625, -5.509685516357422, 1.4840621948242188, 4.023092269897461, 1.4689712524414062, 1.1124267578125, 0.6386928558349609, 0.5778598785400391], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000193.npy"}
{"epoch": 0.29176114890400606, "step": 194, "batch_size": 64, "mean": 2.281986713409424, "std": 3.766561985015869, "min": -6.141998291015625, "p10": -2.3141010284423826, "median": 2.3493576049804688, "p90": 7.324410438537598, "max": 10.965606689453125, "pos_frac": 0.71875, "sample": [5.5867767333984375, 3.0552215576171875, 2.4332962036132812, -2.37237548828125, 5.184814453125, 0.8212814331054688, -2.1781272888183594, 1.2732868194580078, -1.0787010192871094, 3.0145263671875, 3.1093826293945312, 2.6416397094726562, 1.3088455200195312, 3.945098876953125, -4.239013671875, -3.1777420043945312, 7.263908386230469, -0.1157684326171875, 2.5129776000976562, -1.1346664428710938, 2.90509033203125, 0.8927078247070312, 5.048576354980469, 4.4265899658203125, -0.7812213897705078, 4.068031311035156, -0.7220458984375, 3.1355743408203125, -1.0888290405273438, 1.698394775390625, 0.22359848022460938, 5.1228485107421875, -3.7213668823242188, -1.82318115234375, -4.932868957519531, -0.1821765899658203, 7.350339889526367, 6.7247314453125, 4.771648406982422, 8.869752883911133, -6.141998291015625, 2.2654190063476562, 2.220752716064453, 1.1821098327636719, 7.469017028808594, 10.625276565551758, 4.208261489868164, -4.704582214355469, 10.432403564453125, 8.13665771484375, 4.4856109619140625, -0.6079940795898438, 2.8342056274414062, 5.617311477661133, 3.01123046875, 10.965606689453125, 3.333883285522461, 0.4054985046386719, -0.02069854736328125, 4.977272033691406, 1.7273406982421875, 1.8293571472167969, 0.04250526428222656, 1.9118537902832031], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000194.npy"}
{"epoch": 0.29327286470143615, "step": 195, "batch_size": 64, "mean": 2.4718117713928223, "std": 3.4465885162353516, "min": -7.0722198486328125, "p10": -1.0962303161621092, "median": 2.0895423889160156, "p90": 6.488844871520997, "max": 9.88848876953125, "pos_frac": 0.765625, "sample": [1.9475555419921875, 3.611072540283203, -1.4487152099609375, 2.5712356567382812, 9.157392501831055, 3.8889694213867188, 2.2556076049804688, 1.68408203125, 3.3898468017578125, 6.9354095458984375, 5.391853332519531, 5.479560852050781, -4.790946960449219, -0.0731964111328125, -2.418304443359375, -7.0722198486328125, -0.6907138824462891, 1.828857421875, 1.9809341430664062, 6.932050704956055, -1.16119384765625, 4.930885314941406, 4.482046127319336, 4.355621337890625, 1.1479644775390625, 1.3602943420410156, 4.401460647583008, 9.46945571899414, 0.962188720703125, 4.1442413330078125, 0.9831695556640625, 0.8720855712890625, 9.854572296142578, 2.4006805419921875, 4.552215576171875, -0.32450103759765625, -0.3173046112060547, 4.313728332519531, 5.30963134765625, 5.897796630859375, 1.7125473022460938, 0.5126419067382812, -0.9446487426757812, 0.25164794921875, 0.3378410339355469, 1.4302692413330078, -0.712799072265625, 4.302482604980469, -4.1091766357421875, -0.2422809600830078, -0.700439453125, 0.2195281982421875, 9.88848876953125, 5.756187438964844, 4.334228515625, 4.736288070678711, 6.5474700927734375, 3.57855224609375, -3.303619384765625, 5.731620788574219, 1.9425735473632812, 0.18097686767578125, 2.198150634765625, 6.352052688598633], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000195.npy"}
{"epoch": 0.2947845804988662, "step": 196, "batch_size": 64, "mean": 2.2476589679718018, "std": 3.3513097763061523, "min": -9.487262725830078, "p10": -1.3692874908447261, "median": 1.8203544616699219, "p90": 6.362846565246582, "max": 8.561843872070312, "pos_frac": 0.796875, "sample": [0.4520111083984375, -9.487262725830078, -0.7696495056152344, 2.9679412841796875, 0.5581684112548828, -1.7167701721191406, 1.8498992919921875, 1.7640380859375, 2.4425888061523438, -1.5581092834472656, 1.073486328125, 5.6790771484375, 5.433197021484375, 1.359649658203125, 2.6640396118164062, 1.0168685913085938, -5.971149444580078, 5.431114196777344, -2.6472854614257812, 7.463371276855469, -0.9252109527587891, 0.7515068054199219, 6.280706405639648, 0.172882080078125, -0.2696647644042969, 2.340087890625, 4.6996002197265625, 4.629966735839844, 4.832794189453125, 6.394989013671875, 1.2353267669677734, 4.811037063598633, 1.42144775390625, -0.9287033081054688, -0.16909027099609375, 2.2476806640625, 0.22386932373046875, 1.9474048614501953, 1.6361770629882812, 7.7259063720703125, 1.6776046752929688, 4.596710205078125, 2.42333984375, 2.6312904357910156, 4.560600280761719, -1.5711212158203125, 0.039699554443359375, 3.423126220703125, 1.7908096313476562, 8.1951904296875, 4.90643310546875, 6.287847518920898, -0.9038734436035156, 0.8376007080078125, 3.300281524658203, 8.561843872070312, 8.017402648925781, 8.074684143066406, 1.1000518798828125, 1.623931884765625, 1.954925537109375, -1.9173469543457031, 1.08673095703125, 6.088470458984375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000196.npy"}
{"epoch": 0.2962962962962963, "step": 197, "batch_size": 64, "mean": 2.6723194122314453, "std": 4.447953701019287, "min": -10.65679931640625, "p10": -2.609155082702636, "median": 2.044727325439453, "p90": 8.047512435913086, "max": 11.968376159667969, "pos_frac": 0.75, "sample": [-1.121084213256836, 10.076316833496094, 1.6908245086669922, 0.1747589111328125, 0.47662353515625, 2.0030288696289062, -3.6884689331054688, -10.65679931640625, 1.4719467163085938, -0.4665069580078125, 8.53411865234375, 6.235410690307617, -2.8985137939453125, -3.412456512451172, 0.5650005340576172, 9.847061157226562, 1.2783737182617188, 1.3389549255371094, 11.767082214355469, 8.091506958007812, 1.0690460205078125, 0.7918701171875, 7.944858551025391, 2.4433460235595703, 3.5244598388671875, 1.7212066650390625, 0.929290771484375, -1.4509239196777344, 5.957134246826172, 5.058036804199219, 6.8359222412109375, 7.63653564453125, -4.294708251953125, 7.244863510131836, -2.76361083984375, -7.850494384765625, 2.5094642639160156, 0.553955078125, 4.6379852294921875, 4.457344055175781, -0.09023284912109375, 6.058807373046875, 7.864795684814453, 0.4706878662109375, -0.4642791748046875, 8.172378540039062, 2.08642578125, -0.7667312622070312, 2.153768539428711, 7.230112075805664, -2.248758316040039, -0.469329833984375, 4.57830810546875, 7.017305374145508, 1.2321805953979492, -1.7682952880859375, 11.968376159667969, 3.9540023803710938, 0.8542366027832031, 2.762096405029297, 5.955329895019531, 6.70770263671875, 3.4920520782470703, 6.014739990234375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000197.npy"}
{"epoch": 0.29780801209372637, "step": 198, "batch_size": 64, "mean": 3.004481792449951, "std": 3.513232946395874, "min": -5.2377471923828125, "p10": -1.1033004760742187, "median": 3.138026237487793, "p90": 6.894127273559572, "max": 13.483695983886719, "pos_frac": 0.796875, "sample": [-0.910430908203125, 4.652259826660156, 1.217355728149414, 1.942657470703125, 1.7905731201171875, 4.436073303222656, -2.4729671478271484, 1.6422958374023438, 1.0796527862548828, 2.225860595703125, 6.281822204589844, 1.8874664306640625, 0.5614204406738281, -2.789827346801758, 12.360300064086914, 9.164512634277344, 5.147430419921875, -0.573577880859375, 5.406097412109375, -0.05479621887207031, 3.9402618408203125, 4.707187652587891, 6.510478973388672, -2.403463363647461, 7.342796325683594, -3.3961944580078125, -1.883514404296875, -0.1727294921875, 5.688789367675781, 2.567049026489258, 4.041313171386719, 4.278167724609375, 1.4040069580078125, -5.2377471923828125, 0.6449127197265625, -0.20556640625, 0.9118213653564453, 2.2513904571533203, -1.1859588623046875, 3.4200477600097656, 3.249908447265625, 4.906288146972656, 8.128456115722656, 3.05023193359375, 4.36695671081543, 4.2243194580078125, 0.8288784027099609, 7.0585479736328125, 13.483695983886719, 3.6766510009765625, 2.87896728515625, 1.4143829345703125, 3.3782958984375, 5.801746368408203, 4.611656188964844, 3.9322662353515625, 3.7880859375, 2.7929649353027344, -0.35169029235839844, 2.1595516204833984, 3.3777389526367188, 5.054546356201172, 11.031341552734375, 3.225820541381836], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000198.npy"}
{"epoch": 0.29931972789115646, "step": 199, "batch_size": 64, "mean": 2.285083293914795, "std": 3.5821666717529297, "min": -6.172069549560547, "p10": -3.0616237640380857, "median": 2.533940315246582, "p90": 6.730189514160156, "max": 8.860671997070312, "pos_frac": 0.765625, "sample": [4.032535552978516, 4.5813446044921875, 1.1750335693359375, 2.6705322265625, -1.1596736907958984, -3.2408485412597656, 8.860671997070312, 6.858184814453125, 2.393646240234375, 4.038055419921875, 4.2457427978515625, -0.5217514038085938, 7.925212860107422, 5.136501312255859, 6.673088073730469, 7.395965576171875, 5.827877044677734, 4.873882293701172, 4.924774169921875, 6.502696990966797, 0.1700439453125, -5.020929336547852, 3.31463623046875, 4.0871124267578125, -0.37558746337890625, 1.1950225830078125, 2.745838165283203, -1.2509994506835938, 2.6504478454589844, 1.6785354614257812, 2.1792144775390625, 1.8036479949951172, 7.216533660888672, 0.8785400390625, -0.9453420639038086, 1.9890594482421875, 3.09124755859375, -5.343971252441406, -2.6434326171875, 2.0644912719726562, -6.172069549560547, 3.5068359375, 0.8739089965820312, 5.390342712402344, -5.586433410644531, 6.5101318359375, 4.430265426635742, 2.813201904296875, -0.5207672119140625, 0.7203903198242188, 0.04259490966796875, -4.192329406738281, -0.7768478393554688, 6.115756988525391, -3.4146270751953125, 0.80731201171875, 6.754661560058594, 1.14044189453125, 4.6232757568359375, 3.213581085205078, 6.084331512451172, 1.1820068359375, 7.604347229003906, 2.4174327850341797], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000199.npy"}
{"epoch": 0.30083144368858655, "step": 200, "batch_size": 64, "mean": 3.3162336349487305, "std": 4.1814470291137695, "min": -7.378387451171875, "p10": -1.5818897247314452, "median": 2.9389686584472656, "p90": 10.010131072998048, "max": 12.354888916015625, "pos_frac": 0.8125, "sample": [3.3209762573242188, 8.91690444946289, 10.897979736328125, 0.41100311279296875, 2.9631195068359375, 1.1744327545166016, 5.222938537597656, 1.0992279052734375, 4.24322509765625, 0.2385406494140625, -0.97576904296875, 5.03570556640625, -1.4528160095214844, 11.407089233398438, 1.1392478942871094, 0.03154182434082031, 2.9148178100585938, 0.25162696838378906, 7.06854248046875, -2.3588714599609375, -1.63720703125, 0.7297286987304688, 4.496253967285156, 10.608657836914062, 0.8776149749755859, 4.717020034790039, 2.294828414916992, 2.3832244873046875, 3.926280975341797, -0.09187698364257812, 8.173820495605469, 3.8162918090820312, 4.240425109863281, 10.1949462890625, 7.422554016113281, 0.09752273559570312, -7.378387451171875, 5.804929733276367, 10.418563842773438, 6.142621994018555, 9.578895568847656, 2.711395263671875, 12.354888916015625, 1.8764724731445312, 3.4925537109375, -3.9880218505859375, 1.2077713012695312, 3.97308349609375, 2.773771286010742, 11.573959350585938, 0.5882110595703125, 4.782032012939453, 4.183025360107422, -1.9250106811523438, 2.9645538330078125, 3.8086700439453125, -1.6790618896484375, -1.4425010681152344, 2.45574951171875, 1.8874053955078125, -0.2003803253173828, -2.900498390197754, 8.997901916503906, 6.376808166503906], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000200.npy"}
{"epoch": 0.30234315948601664, "step": 201, "batch_size": 64, "mean": 2.4518747329711914, "std": 3.6231648921966553, "min": -3.7703094482421875, "p10": -1.4231678009033202, "median": 1.7824440002441406, "p90": 7.459311676025392, "max": 12.133329391479492, "pos_frac": 0.765625, "sample": [2.2578887939453125, 5.1844940185546875, 0.8949508666992188, 1.2747879028320312, 0.035724639892578125, 2.6109619140625, 10.590274810791016, 0.7626113891601562, 2.0396556854248047, 0.4080352783203125, 1.3087158203125, 5.418128967285156, 2.6340560913085938, 7.221405029296875, 0.6997280120849609, -0.331024169921875, 8.286811828613281, 4.25445556640625, -0.3603515625, 12.133329391479492, -3.1073036193847656, 1.5101776123046875, 0.322723388671875, -1.2166557312011719, 0.9764289855957031, 8.910282135009766, 3.3046035766601562, 2.6318740844726562, -2.671335220336914, 3.033782958984375, 0.7102241516113281, 4.981414794921875, 1.9220733642578125, -0.318023681640625, 5.737985610961914, 9.366519927978516, -1.1099853515625, -3.69287109375, -1.5116729736328125, 1.3574371337890625, 5.260406494140625, 4.492328643798828, -1.0148468017578125, 4.0918731689453125, -0.7838497161865234, 5.724761962890625, 1.8929023742675781, 0.530914306640625, 2.3259353637695312, 6.633636474609375, -0.9822998046875, 4.4726715087890625, 1.6719856262207031, 6.344356536865234, 2.0006866455078125, 7.561271667480469, 5.858875274658203, -2.7051315307617188, 0.7248306274414062, -3.7703094482421875, 9.947708129882812, 1.2913360595703125, 0.1140594482421875, -3.2264328002929688], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000201.npy"}
{"epoch": 0.30385487528344673, "step": 202, "batch_size": 64, "mean": 2.856332302093506, "std": 3.822866201400757, "min": -6.751373291015625, "p10": -1.8270877838134763, "median": 2.4148120880126953, "p90": 7.490810775756842, "max": 13.536151885986328, "pos_frac": 0.78125, "sample": [-1.2868595123291016, 1.8617401123046875, 10.362506866455078, 10.19873046875, 5.169036865234375, -0.735870361328125, 3.8877029418945312, 2.717010498046875, 2.4091835021972656, 2.7874221801757812, 0.7229194641113281, 3.903339385986328, 1.4052314758300781, 2.2362823486328125, -1.1204452514648438, 4.5467681884765625, 5.845619201660156, -0.0265960693359375, 2.339508056640625, -2.34967041015625, 1.1469383239746094, 2.420440673828125, 5.4420013427734375, 2.3328323364257812, -1.9779987335205078, 6.088554382324219, 1.8276996612548828, 13.536151885986328, 0.6784286499023438, 8.091777801513672, -0.09413909912109375, 4.699626922607422, 2.406522750854492, 5.6403045654296875, 2.2453765869140625, -6.751373291015625, 3.693328857421875, 4.137216567993164, 2.8417587280273438, -2.6038589477539062, 4.852264404296875, 4.42095947265625, 2.236797332763672, 2.58612060546875, 3.8399581909179688, 3.5891952514648438, 5.7470550537109375, 2.8492298126220703, 6.0446624755859375, -1.596954345703125, 0.5214424133300781, -4.766475677490234, -0.4521636962890625, -1.9257164001464844, 2.06378173828125, 2.163604736328125, 9.923599243164062, 1.0555400848388672, 2.101713180541992, 4.870506286621094, -2.9613189697265625, 12.593971252441406, 2.5804805755615234, 9.791873931884766], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000202.npy"}
{"epoch": 0.30536659108087677, "step": 203, "batch_size": 64, "mean": 2.1485180854797363, "std": 3.565376043319702, "min": -5.842521667480469, "p10": -1.9661552429199216, "median": 2.0315628051757812, "p90": 6.242327117919922, "max": 12.92471694946289, "pos_frac": 0.734375, "sample": [3.4570236206054688, 1.4633865356445312, -5.842521667480469, -0.958160400390625, 1.4793853759765625, 1.0740089416503906, -0.13381195068359375, 5.217041015625, -0.32637786865234375, 0.8772354125976562, 3.867523193359375, 1.7804718017578125, 1.4765796661376953, 8.759506225585938, 0.35344886779785156, 9.934677124023438, 4.264076232910156, 1.032196044921875, 6.441249847412109, 2.66473388671875, -5.7848968505859375, -0.7280693054199219, 5.486152648925781, -0.8064022064208984, -1.7579689025878906, -2.6017303466796875, 0.7932891845703125, 9.288818359375, 2.8034820556640625, 5.979804992675781, 3.9835891723632812, 2.476287841796875, 6.319023132324219, 1.6296710968017578, 0.9720611572265625, 2.4174728393554688, 4.687568664550781, 5.416011810302734, 6.0633697509765625, 0.9063701629638672, 3.8242263793945312, -2.659698486328125, 2.3573150634765625, 2.4024887084960938, 3.5964202880859375, -1.0441741943359375, 5.198371887207031, 5.9549102783203125, 0.021484375, 12.92471694946289, 2.148406982421875, -0.42331695556640625, -0.40099334716796875, 3.131561279296875, 3.616973876953125, -4.485126495361328, -1.6926040649414062, 1.9147186279296875, 6.759674072265625, 2.357931137084961, 2.2793655395507812, -2.7453765869140625, -2.055377960205078, 0.09768867492675781], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000203.npy"}
{"epoch": 0.30687830687830686, "step": 204, "batch_size": 64, "mean": 2.647813320159912, "std": 4.101550102233887, "min": -6.800846099853516, "p10": -2.0651407241821285, "median": 2.4367713928222656, "p90": 7.627550315856934, "max": 14.737518310546875, "pos_frac": 0.71875, "sample": [14.737518310546875, 4.939727783203125, -0.9524612426757812, 3.446918487548828, 1.6215267181396484, 12.950881958007812, 1.2582550048828125, 2.39129638671875, -3.1649169921875, -1.130645751953125, 6.5916290283203125, -3.3678207397460938, -1.5911521911621094, 6.008441925048828, 0.8196487426757812, 7.045799255371094, 5.114711761474609, -0.01319122314453125, -3.8156261444091797, -1.2429275512695312, 3.301332473754883, 0.9087142944335938, 2.4822463989257812, 7.348045349121094, -0.06384086608886719, -0.32483673095703125, 2.54168701171875, 4.947483062744141, 0.9869308471679688, 3.6647377014160156, 7.653730392456055, -1.59149169921875, 2.9016857147216797, 7.121574401855469, 3.823057174682617, -0.7988204956054688, 1.9832763671875, 10.97860336303711, -2.473630905151367, 8.211135864257812, 0.16262435913085938, 1.6817626953125, 5.129173278808594, 5.052803039550781, -3.211681365966797, 3.4465484619140625, 3.0582427978515625, -0.7555427551269531, 2.6263427734375, 0.20468521118164062, 9.551368713378906, 9.098583221435547, 0.7047843933105469, 4.580360412597656, 2.668670654296875, 7.566463470458984, 3.1888885498046875, 0.6095314025878906, -0.35341644287109375, 1.9731292724609375, -6.800846099853516, -2.2681331634521484, 4.8486480712890625, 1.4478302001953125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000204.npy"}
{"epoch": 0.30839002267573695, "step": 205, "batch_size": 64, "mean": 2.1588058471679688, "std": 4.207107067108154, "min": -11.583343505859375, "p10": -2.4196802139282223, "median": 1.7737846374511719, "p90": 7.4071891784667985, "max": 11.856887817382812, "pos_frac": 0.734375, "sample": [11.27532958984375, 0.28171730041503906, -0.4226665496826172, 2.3142051696777344, 1.0912322998046875, 11.856887817382812, 2.5147323608398438, 2.2564010620117188, -11.583343505859375, 2.2724456787109375, 1.339742660522461, 0.7071113586425781, 10.710639953613281, 3.120197296142578, 3.8201026916503906, -0.639434814453125, -0.71209716796875, -1.1220703125, -0.452178955078125, 1.0624237060546875, 1.368814468383789, 6.476287841796875, 0.4419841766357422, 5.725379943847656, 7.023704528808594, 0.5546035766601562, -1.9868011474609375, -1.3505287170410156, -0.5633583068847656, 7.5405426025390625, 1.178558349609375, -2.6051998138427734, 4.39117431640625, 2.0836868286132812, -1.87518310546875, 3.966156005859375, 3.003683090209961, 1.837554931640625, 0.26811981201171875, 6.419776916503906, 0.27301025390625, 7.096031188964844, 4.89984130859375, 1.766204833984375, 9.520759582519531, 3.6750106811523438, 0.4523468017578125, 1.7813644409179688, 8.297561645507812, -4.503173828125, 0.5779953002929688, 6.798309326171875, 5.747379302978516, 3.5539398193359375, -3.068389892578125, 4.810462951660156, 4.1691436767578125, -2.8476314544677734, 8.557914733886719, -2.965404510498047, 0.4493865966796875, -7.221931457519531, -0.06620025634765625, 2.819303512573242], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000205.npy"}
{"epoch": 0.30990173847316704, "step": 206, "batch_size": 64, "mean": 2.079155683517456, "std": 3.842315673828125, "min": -7.080780029296875, "p10": -2.1802673339843746, "median": 1.5209197998046875, "p90": 7.3757484436035154, "max": 10.220977783203125, "pos_frac": 0.671875, "sample": [-1.2821731567382812, -2.9582366943359375, 0.8554153442382812, -4.698158264160156, -3.231698989868164, -0.5681991577148438, 1.9952621459960938, 1.6249160766601562, 4.760978698730469, 3.911813735961914, 2.5479068756103516, 10.220977783203125, 6.440219879150391, -1.3402557373046875, -0.24983978271484375, 7.982940673828125, -1.0429534912109375, 3.90716552734375, -0.694366455078125, 3.1757678985595703, 7.410102844238281, 0.4713478088378906, 1.1371116638183594, 0.9930572509765625, -1.8464279174804688, 7.396812438964844, 2.884889602661133, -6.918619155883789, 2.8850536346435547, 4.31060791015625, 6.2578582763671875, 6.581586837768555, 0.8798751831054688, 3.5597782135009766, 4.98223876953125, 1.18853759765625, -7.080780029296875, -2.3233413696289062, -1.2725200653076172, 5.166114807128906, 5.14543342590332, 0.9306831359863281, 7.750457763671875, -0.9121360778808594, -0.00408935546875, -0.4211616516113281, 9.21774673461914, 2.8331298828125, 1.4169235229492188, 4.5342559814453125, 8.212371826171875, 6.1133880615234375, 4.894832611083984, -0.6778125762939453, 4.363426208496094, 4.262992858886719, -0.4465904235839844, 0.35805320739746094, 1.139434814453125, 7.32659912109375, -5.672399520874023, -1.1037139892578125, 0.9503135681152344, 4.833045959472656], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000206.npy"}
{"epoch": 0.31141345427059713, "step": 207, "batch_size": 64, "mean": 2.569598913192749, "std": 3.8378896713256836, "min": -5.055488586425781, "p10": -1.9724222183227536, "median": 2.139129638671875, "p90": 7.6206298828125005, "max": 15.527778625488281, "pos_frac": 0.71875, "sample": [15.527778625488281, -0.4875144958496094, -1.5411567687988281, 0.7666091918945312, 1.4807815551757812, 2.5487632751464844, 1.325714111328125, 2.4438629150390625, -1.560638427734375, 4.521430969238281, 3.4528980255126953, 1.5503997802734375, -1.3793601989746094, 7.2418212890625, -0.9120063781738281, -0.8652687072753906, 1.7465057373046875, 7.398590087890625, 0.9505863189697266, 5.872562408447266, 6.888545989990234, 3.5329437255859375, 2.878326416015625, 2.12139892578125, -1.1849288940429688, -2.6804580688476562, -2.472745895385742, 5.9976043701171875, -0.39670562744140625, 1.9575614929199219, 5.571285247802734, 1.3650627136230469, 9.068008422851562, 3.3456497192382812, 4.2070770263671875, -1.2178535461425781, -2.1489009857177734, 3.7043304443359375, 6.82843017578125, 1.2809295654296875, 8.231719970703125, 6.738101959228516, 0.8938064575195312, 2.4505767822265625, -0.364105224609375, 2.1568603515625, 3.0593185424804688, -2.7934837341308594, 3.8376026153564453, 7.9411163330078125, -5.055488586425781, 6.241254806518555, -2.931884765625, 4.103973388671875, 1.31121826171875, 9.032150268554688, 7.715789794921875, 1.1871414184570312, 2.667510986328125, 2.1629886627197266, -3.9702377319335938, -0.619873046875, 2.1178970336914062, 9.61246109008789], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000207.npy"}
{"epoch": 0.3129251700680272, "step": 208, "batch_size": 64, "mean": 1.6647076606750488, "std": 3.6005330085754395, "min": -4.811332702636719, "p10": -3.0319631576538084, "median": 1.143686294555664, "p90": 7.047345542907716, "max": 9.654685974121094, "pos_frac": 0.6875, "sample": [0.4408740997314453, 0.6621723175048828, 6.5673370361328125, -3.0734806060791016, 8.133895874023438, 7.690910339355469, 0.6771755218505859, -2.252349853515625, 1.048095703125, -4.4634857177734375, -0.5403900146484375, -3.32672119140625, 4.7885589599609375, 2.2651596069335938, 6.845203399658203, 2.414854049682617, 1.8535957336425781, 4.2712860107421875, -0.6755962371826172, -2.935089111328125, 3.70355224609375, -1.96783447265625, 0.6728038787841797, 0.5283279418945312, 1.1526908874511719, 6.560543060302734, 4.559051513671875, -1.1712665557861328, 3.5401153564453125, 7.133977890014648, -1.1606388092041016, 1.7042770385742188, -0.2625274658203125, 2.612213134765625, -4.811332702636719, 2.898895263671875, 7.182857513427734, 1.9172592163085938, 0.23345184326171875, 1.3617362976074219, 8.955123901367188, 0.8463592529296875, 1.1346817016601562, -0.6013946533203125, 5.591072082519531, 0.33512115478515625, 5.27642822265625, -1.2788352966308594, 2.7890052795410156, 2.1206493377685547, -4.5337982177734375, 3.5890045166015625, -1.3938827514648438, 5.178779602050781, -3.4215965270996094, 9.654685974121094, 0.8165283203125, -2.2670326232910156, -2.219930648803711, 1.5388259887695312, -4.5137176513671875, 4.010921478271484, 0.5165252685546875, 7.637603759765625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000208.npy"}
{"epoch": 0.3144368858654573, "step": 209, "batch_size": 64, "mean": 2.748065948486328, "std": 4.124401092529297, "min": -9.174217224121094, "p10": -1.3825668334960934, "median": 1.9637203216552734, "p90": 8.517654418945312, "max": 11.800468444824219, "pos_frac": 0.71875, "sample": [4.836864471435547, 7.088726043701172, 1.8837203979492188, 1.1770362854003906, -1.5156707763671875, 6.4720458984375, 3.2426490783691406, 4.650970458984375, 1.1168842315673828, 11.641517639160156, -1.071990966796875, 2.4794769287109375, -0.56494140625, 5.428657531738281, 1.37567138671875, 1.1024856567382812, -0.7938232421875, -0.209442138671875, -1.5794410705566406, 10.848419189453125, 0.3464927673339844, -1.9313240051269531, -0.18085479736328125, 3.2388916015625, 1.27947998046875, 5.418525695800781, 8.531822204589844, 3.5673980712890625, 8.9278564453125, 1.922210693359375, 3.3346920013427734, 9.5869140625, 2.005229949951172, 6.171806335449219, 3.9806671142578125, -0.48451995849609375, 5.6926422119140625, -0.6819381713867188, -9.174217224121094, 2.203460693359375, -0.43737030029296875, 1.8101749420166016, -0.94879150390625, -0.8068389892578125, 5.5771636962890625, 3.3377132415771484, 11.800468444824219, 6.340253829956055, 1.492462158203125, 5.947566986083984, 1.850545883178711, -2.8892974853515625, -0.22785568237304688, 0.28680419921875, -1.598297119140625, 2.024637222290039, 10.993106842041016, -6.867218017578125, 7.267486572265625, 3.0958614349365234, 5.859006881713867, 8.484596252441406, 0.9602088928222656, 1.1587791442871094], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000209.npy"}
{"epoch": 0.31594860166288735, "step": 210, "batch_size": 64, "mean": 2.81829571723938, "std": 4.008813381195068, "min": -4.890834808349609, "p10": -2.1898525238037108, "median": 2.4549903869628906, "p90": 8.786885070800782, "max": 11.953392028808594, "pos_frac": 0.71875, "sample": [9.061119079589844, 0.45428466796875, 2.615081787109375, 3.8459625244140625, -1.5184097290039062, 4.5075531005859375, 11.953392028808594, 4.425506591796875, 11.302139282226562, 8.453643798828125, 5.158260345458984, 7.2482757568359375, 6.055091857910156, -2.0126495361328125, 9.829383850097656, 0.07549285888671875, -3.043272018432617, -2.8103466033935547, 7.6164703369140625, -2.265796661376953, 10.116355895996094, 2.4887237548828125, 1.4505596160888672, -0.205169677734375, 6.0031890869140625, -0.6737213134765625, -0.5658512115478516, 1.4647445678710938, 8.444297790527344, -0.0855865478515625, 9.015525817871094, 0.2750377655029297, 3.9943618774414062, 1.414306640625, 3.1170501708984375, -1.5321922302246094, 0.24036788940429688, -0.3311958312988281, -0.15248489379882812, 8.128732681274414, -2.4980335235595703, 3.94415283203125, 2.3352813720703125, 2.47216796875, 4.549156188964844, 0.19025421142578125, 6.249593734741211, 0.1761341094970703, 4.650474548339844, 2.2001724243164062, -0.3784217834472656, 3.6513423919677734, 4.857322692871094, -4.052556991577148, 1.5623092651367188, 8.929702758789062, 4.454132080078125, 2.4378128051757812, 4.647741317749023, -4.890834808349609, -0.028345108032226562, -3.692901611328125, 2.8738555908203125, 2.172182083129883], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000210.npy"}
{"epoch": 0.31746031746031744, "step": 211, "batch_size": 64, "mean": 2.341519832611084, "std": 3.209725856781006, "min": -4.732975006103516, "p10": -1.2750223159790033, "median": 2.3159141540527344, "p90": 6.238642883300781, "max": 13.74896240234375, "pos_frac": 0.75, "sample": [1.5652351379394531, -4.732975006103516, 4.090267181396484, 0.8052253723144531, -0.44020843505859375, 1.089803695678711, -2.394073486328125, 2.6360626220703125, 2.57781982421875, 1.5820636749267578, 5.7613525390625, -0.1695880889892578, 0.6463699340820312, 6.5601654052734375, 0.3664283752441406, 5.308595657348633, 0.6861896514892578, -0.28619384765625, 4.484111785888672, 0.1228179931640625, 2.402240753173828, -1.59539794921875, 2.3534584045410156, -2.456329345703125, 2.6630287170410156, -1.7856616973876953, 0.5924606323242188, 4.1884002685546875, 8.837581634521484, 2.278369903564453, 3.5616226196289062, 2.4694061279296875, 3.0897884368896484, -0.12699127197265625, 5.8975067138671875, 2.656909942626953, 4.823009490966797, 1.2382965087890625, -0.4552726745605469, 3.3411426544189453, 2.953807830810547, 13.74896240234375, 8.915557861328125, -0.05422401428222656, -0.235565185546875, 3.5848121643066406, 6.606880187988281, 0.524749755859375, 3.4442901611328125, 6.205413818359375, -2.536376953125, 1.9781455993652344, 6.2839813232421875, 5.457963943481445, 6.2528839111328125, 5.174734115600586, 0.9815826416015625, 4.241296768188477, -0.5274791717529297, 4.4544219970703125, -0.13379287719726562, -3.3237228393554688, 0.9105339050292969, 0.7153701782226562], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000211.npy"}
{"epoch": 0.31897203325774753, "step": 212, "batch_size": 64, "mean": 2.5763843059539795, "std": 3.728151559829712, "min": -6.5319061279296875, "p10": -2.165858459472656, "median": 2.622152328491211, "p90": 6.878247833251954, "max": 15.636566162109375, "pos_frac": 0.75, "sample": [6.904857635498047, 4.368564605712891, 5.266332626342773, 6.816158294677734, 0.2888622283935547, 2.6365890502929688, 9.762920379638672, 2.770404815673828, -2.0218124389648438, 4.528282165527344, -0.6241111755371094, -6.5319061279296875, 7.099376678466797, 15.636566162109375, 2.6108551025390625, 6.0655670166015625, 5.108020782470703, 2.9187583923339844, -2.2275924682617188, 3.8475914001464844, 1.659637451171875, 0.5643310546875, -0.08738136291503906, 2.5463733673095703, -2.9622802734375, 0.3134307861328125, 4.149932861328125, -2.4710044860839844, 7.860841751098633, 8.2998046875, 2.6334495544433594, 4.105907440185547, 4.736663818359375, 2.4827041625976562, 5.186737060546875, 1.093505859375, -0.17056655883789062, 1.5079116821289062, 0.5944576263427734, 2.0491161346435547, -1.3812637329101562, 0.3007659912109375, 4.107906341552734, -3.3042831420898438, 3.8255043029785156, 3.016204833984375, 3.2936935424804688, 10.860946655273438, -2.673503875732422, -0.6711578369140625, -0.08330917358398438, 1.2432479858398438, 3.4383087158203125, -1.2614822387695312, 2.5909576416015625, 6.5338592529296875, 3.5778274536132812, 4.901725769042969, 6.031635284423828, 0.27724456787109375, 2.91778564453125, 1.0611724853515625, -2.2994537353515625, -0.7335968017578125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000212.npy"}
{"epoch": 0.3204837490551776, "step": 213, "batch_size": 64, "mean": 2.8676233291625977, "std": 3.9669899940490723, "min": -8.612102508544922, "p10": -2.333682632446289, "median": 2.31298828125, "p90": 8.41369972229004, "max": 11.36224365234375, "pos_frac": 0.8125, "sample": [4.736522674560547, 0.14760589599609375, 1.0758209228515625, 5.420677185058594, 2.2523193359375, 1.8609428405761719, -2.3544387817382812, 0.24007797241210938, -3.85546875, 6.746421813964844, 6.2418060302734375, 1.4050216674804688, -0.05841064453125, -2.4932918548583984, -8.612102508544922, -4.1205902099609375, 4.71965217590332, 11.174102783203125, -0.07577896118164062, 3.213287353515625, 6.849138259887695, -3.680448532104492, 8.622230529785156, 9.532989501953125, 6.9649505615234375, 2.9568634033203125, -0.42009925842285156, -2.2852516174316406, 2.171102523803711, 0.6362438201904297, -3.793088912963867, 1.75189208984375, 4.921684265136719, 2.866209030151367, 3.1061248779296875, 3.8154296875, 1.4709415435791016, 4.544807434082031, 6.5834197998046875, 1.96392822265625, 1.4130401611328125, 3.3272247314453125, 10.747112274169922, 0.9004364013671875, 5.950965881347656, 4.407806396484375, 2.5446014404296875, 4.918785095214844, 8.972892761230469, -0.187652587890625, 0.37407684326171875, 2.3669891357421875, 1.3995819091796875, 1.7649459838867188, 2.2589874267578125, 2.9773178100585938, 0.81512451171875, 11.36224365234375, 1.9015579223632812, 0.8069610595703125, 4.682952880859375, 10.181640625, 5.469917297363281, 7.927127838134766], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000213.npy"}
{"epoch": 0.3219954648526077, "step": 214, "batch_size": 64, "mean": 2.6293630599975586, "std": 3.7779908180236816, "min": -6.421478271484375, "p10": -1.972992897033691, "median": 2.3644790649414062, "p90": 8.202059555053712, "max": 14.423980712890625, "pos_frac": 0.734375, "sample": [4.622943878173828, 0.7713775634765625, -1.6572113037109375, 2.66522216796875, 3.2691612243652344, 5.232795715332031, 0.6699256896972656, 0.9144935607910156, 4.46624755859375, 5.952568054199219, 8.673633575439453, 3.952667236328125, 3.229705810546875, 3.2220306396484375, 2.7412185668945312, 8.243270874023438, 3.156463623046875, -0.5390625, 4.878578186035156, 2.268941879272461, -1.2724494934082031, 4.987804412841797, 0.272705078125, 0.9909515380859375, -0.7387676239013672, 5.7652130126953125, -2.728912353515625, 2.2534122467041016, 8.105899810791016, 4.214164733886719, 2.4481658935546875, 0.068023681640625, -3.3864383697509766, 0.9904098510742188, 4.564582824707031, 5.7346954345703125, -6.421478271484375, -0.0755157470703125, 8.46931266784668, 14.423980712890625, -0.0886993408203125, -2.108327865600586, 6.468654632568359, 3.44537353515625, 9.46237564086914, 9.360763549804688, 0.07786178588867188, 0.5115509033203125, 0.7708473205566406, -2.209674835205078, -2.158660888671875, 8.431915283203125, 2.0809860229492188, -1.1499271392822266, 2.009519577026367, -0.4904022216796875, 7.518096923828125, 2.280792236328125, -3.0502891540527344, 2.8045654296875, 4.051689147949219, 5.496425628662109, -0.497100830078125, -0.13982582092285156], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000214.npy"}
{"epoch": 0.3235071806500378, "step": 215, "batch_size": 64, "mean": 2.9273152351379395, "std": 3.7066454887390137, "min": -4.1664886474609375, "p10": -1.5293510437011715, "median": 2.5353622436523438, "p90": 8.201057434082033, "max": 12.041065216064453, "pos_frac": 0.765625, "sample": [2.0372352600097656, 1.4504203796386719, -4.1664886474609375, 2.0980072021484375, 5.036674499511719, -0.6923141479492188, 2.567901611328125, -1.6934890747070312, 7.517965316772461, 7.353107452392578, 0.9626083374023438, -0.550445556640625, -2.9678573608398438, 0.7958908081054688, 3.469757080078125, 9.253204345703125, -0.9234390258789062, 4.683280944824219, 0.19844818115234375, 0.01580810546875, 4.485260009765625, 4.7747039794921875, 1.0367164611816406, 3.8639373779296875, 6.40550422668457, 0.9026069641113281, -0.0647430419921875, 2.54803466796875, 2.5226898193359375, -2.5022754669189453, -0.043399810791015625, 8.458297729492188, 0.5163803100585938, 1.291534423828125, 0.169219970703125, -3.02685546875, 3.9480361938476562, 2.9071502685546875, 5.453559875488281, 9.4033203125, 2.5788326263427734, 1.8124618530273438, -2.726165771484375, -1.1463623046875, 5.2845916748046875, 9.840377807617188, 7.401786804199219, 12.041065216064453, 7.104241371154785, 7.600830078125, 4.9642333984375, 8.882652282714844, 4.09283447265625, -2.0311431884765625, 1.5926990509033203, 0.6358528137207031, 4.4837646484375, 9.357955932617188, 5.406078338623047, -0.46550559997558594, 7.11785888671875, 1.6202564239501953, 2.9032974243164062, -0.5002689361572266], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000215.npy"}
{"epoch": 0.3250188964474679, "step": 216, "batch_size": 64, "mean": 3.177128553390503, "std": 5.066586494445801, "min": -4.591545104980469, "p10": -1.7511276245117187, "median": 2.1963138580322266, "p90": 7.3359016418457035, "max": 29.167510986328125, "pos_frac": 0.75, "sample": [0.20816802978515625, 6.252105712890625, 6.19378662109375, 10.482452392578125, 5.689149856567383, 1.2782135009765625, 2.9450836181640625, 1.8827362060546875, 6.009796142578125, 8.579652786254883, 2.3786544799804688, 3.5933609008789062, 0.6534805297851562, -1.810394287109375, 4.186180114746094, -2.4192123413085938, 1.1137504577636719, 3.7260284423828125, -1.6128387451171875, 4.9865264892578125, 5.09912109375, 7.1807098388671875, 0.5234603881835938, 4.995933532714844, 5.795001983642578, 1.7247505187988281, 7.265357971191406, 1.202585220336914, 1.5135021209716797, 9.977996826171875, 2.8221054077148438, -4.591545104980469, 0.915374755859375, 2.0472679138183594, -2.0854034423828125, 4.7381591796875, -3.7168121337890625, 0.5018730163574219, 5.123241424560547, -0.9456348419189453, -0.4049339294433594, 29.167510986328125, 14.161918640136719, -0.6053390502929688, 1.3427581787109375, -2.5487594604492188, 4.6263885498046875, 0.37567710876464844, -0.6619720458984375, 7.3661346435546875, 9.928043365478516, -3.422344207763672, -1.6060867309570312, 0.2422332763671875, 6.758522033691406, -1.4039554595947266, 1.2155838012695312, 3.469207763671875, -1.2155303955078125, 7.118927001953125, -0.807159423828125, 6.543251037597656, 2.3453598022460938, 6.9470672607421875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000216.npy"}
{"epoch": 0.32653061224489793, "step": 217, "batch_size": 64, "mean": 3.7433767318725586, "std": 4.164006233215332, "min": -3.73809814453125, "p10": -0.32035980224609373, "median": 2.7087173461914062, "p90": 8.941438484191895, "max": 19.72113037109375, "pos_frac": 0.859375, "sample": [3.8743515014648438, 10.895584106445312, -0.2845458984375, 2.675872802734375, 0.80279541015625, -0.5765571594238281, 1.423532485961914, 19.72113037109375, 0.419219970703125, 0.9227294921875, 8.986663818359375, 9.626502990722656, 2.9524002075195312, 2.366943359375, 3.375244140625, 1.9934616088867188, 3.5754852294921875, 1.7121162414550781, 1.0361785888671875, 4.993309020996094, 0.8598308563232422, 5.014654159545898, 1.430145263671875, 5.9082794189453125, 4.9317169189453125, 8.065055847167969, -2.4939193725585938, 1.6018905639648438, 0.9955215454101562, 6.237529754638672, 8.480745315551758, 3.6444854736328125, 4.2305908203125, -1.8198776245117188, 4.505683898925781, -1.5823917388916016, 1.082672119140625, 8.835912704467773, -0.3357086181640625, 1.2055816650390625, -3.73809814453125, 7.971715927124023, 4.432014465332031, -1.63232421875, 1.8332748413085938, 2.7956161499023438, 1.5680923461914062, 1.3218994140625, 2.43341064453125, 15.1087646484375, 1.8614006042480469, 7.784284591674805, 2.1111621856689453, 0.5575809478759766, 10.746376037597656, 4.3595123291015625, 10.300117492675781, 5.5038604736328125, 2.2591400146484375, 7.22456169128418, 2.7415618896484375, 6.6791839599609375, 4.272926330566406, -0.21075439453125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000217.npy"}
{"epoch": 0.328042328042328, "step": 218, "batch_size": 64, "mean": 1.9406603574752808, "std": 4.02776575088501, "min": -10.351036071777344, "p10": -3.0124765396118156, "median": 2.010099411010742, "p90": 6.027716827392579, "max": 13.589736938476562, "pos_frac": 0.71875, "sample": [3.31243896484375, 1.0340461730957031, 0.308074951171875, 2.381805419921875, 4.4281005859375, -0.6007232666015625, 0.3961982727050781, 1.156005859375, 0.1382598876953125, 5.666862487792969, 1.7999343872070312, 2.438558578491211, 6.382911682128906, -5.102020263671875, -2.23406982421875, -3.342374801635742, 0.32271766662597656, 6.140251159667969, -5.295013427734375, -0.34284210205078125, 3.1272964477539062, 2.8559188842773438, 3.009185791015625, 1.2869148254394531, 4.584949493408203, 13.589736938476562, 5.092777252197266, 8.445358276367188, 4.426883697509766, 5.4093017578125, -0.7251491546630859, 4.97705078125, 3.0915374755859375, -3.9992599487304688, -0.6877250671386719, -0.35980224609375, 2.185638427734375, 1.5020809173583984, 1.49951171875, 5.76513671875, 9.865299224853516, 4.618499755859375, 0.36107826232910156, -1.2157058715820312, 1.8345603942871094, -2.0630130767822266, 4.379280090332031, 0.7416782379150391, -3.983154296875, -2.2427139282226562, 2.8242263793945312, -5.248451232910156, 4.988166809082031, -0.08653831481933594, 3.7957534790039062, 4.915863037109375, 3.7442893981933594, 9.759696960449219, -10.351036071777344, -1.985137939453125, 2.3613739013671875, 0.9345855712890625, 4.2480621337890625, 7.9391326904296875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000218.npy"}
{"epoch": 0.3295540438397581, "step": 219, "batch_size": 64, "mean": 1.7401175498962402, "std": 3.091704845428467, "min": -4.421421051025391, "p10": -2.354212188720703, "median": 1.5556488037109375, "p90": 5.571509552001954, "max": 10.39617919921875, "pos_frac": 0.71875, "sample": [4.681404113769531, 5.306499481201172, 4.7122802734375, 1.5187911987304688, 2.1069869995117188, 0.2399749755859375, -0.8403587341308594, 0.45247840881347656, 5.087310791015625, -4.150440216064453, 0.0063571929931640625, 8.555648803710938, 2.649383544921875, -1.0568466186523438, -2.406158447265625, 3.4274749755859375, 1.6853256225585938, 1.9826812744140625, 1.3963813781738281, -0.056613922119140625, 0.6377449035644531, -1.31878662109375, -2.0561065673828125, 1.2245903015136719, -0.5771636962890625, -1.2517509460449219, 1.310516357421875, -0.9474525451660156, 2.7000579833984375, 3.7329864501953125, 2.4530410766601562, 5.884986877441406, 6.932197570800781, 7.021978378295898, 2.4371376037597656, 5.0268402099609375, 0.1334686279296875, 2.7314605712890625, -2.5361480712890625, 1.1256446838378906, 0.5428142547607422, 0.5027847290039062, -3.0861282348632812, 1.5925064086914062, 2.4161834716796875, -2.3923416137695312, 3.472484588623047, -2.2652435302734375, 2.88189697265625, 10.39617919921875, 2.9838485717773438, 2.9824485778808594, 3.568836212158203, -0.499237060546875, -4.421421051025391, 0.8119010925292969, 4.383941650390625, 3.90069580078125, 4.7849884033203125, -2.1183624267578125, 5.685085296630859, -2.67303466796875, 6.732215881347656, 1.220672607421875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000219.npy"}
{"epoch": 0.3310657596371882, "step": 220, "batch_size": 64, "mean": 2.4130749702453613, "std": 3.7387242317199707, "min": -5.878147125244141, "p10": -2.1905836105346674, "median": 1.9381961822509766, "p90": 7.051694488525395, "max": 11.98495101928711, "pos_frac": 0.71875, "sample": [-0.3768329620361328, -0.175872802734375, 1.9024696350097656, 2.2088775634765625, 0.8060798645019531, 6.10955810546875, 9.5755615234375, 1.7742156982421875, 3.1228904724121094, 2.5380172729492188, -0.913604736328125, 11.98495101928711, 0.8635082244873047, -3.3519439697265625, -1.3792972564697266, 0.7903575897216797, 7.455467224121094, -0.76141357421875, -1.7812271118164062, 0.27982330322265625, 5.064569473266602, 3.5621299743652344, -1.2131881713867188, -2.7540340423583984, 1.5954742431640625, -2.3660221099853516, -0.00466156005859375, -4.214836120605469, 4.740753173828125, 5.363883972167969, 3.8345718383789062, 1.3394088745117188, 3.2520904541015625, 5.251953125, 9.206548690795898, 1.7206535339355469, 5.027313232421875, 5.718803405761719, 7.966650009155273, -0.5284652709960938, 1.0195121765136719, 3.6249771118164062, 1.3544998168945312, 3.553691864013672, 9.386566162109375, -0.8893051147460938, 4.531463623046875, 0.2162151336669922, 3.7542972564697266, 4.641506195068359, 4.618207931518555, -4.106409072875977, 5.726552963256836, 0.5094623565673828, 1.7935791015625, 1.9739227294921875, -5.878147125244141, 5.660144805908203, 3.7350730895996094, -0.3533763885498047, 11.5596923828125, -2.61187744140625, 4.705142974853516, 2.676219940185547], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000220.npy"}
{"epoch": 0.3325774754346183, "step": 221, "batch_size": 64, "mean": 2.503993034362793, "std": 3.8708651065826416, "min": -5.779155731201172, "p10": -1.445709228515625, "median": 2.156147003173828, "p90": 7.0751491546630865, "max": 12.3526611328125, "pos_frac": 0.71875, "sample": [0.5744762420654297, -0.4024543762207031, -0.20793914794921875, 1.5382080078125, -0.4141731262207031, 6.939933776855469, 0.6590156555175781, -0.03399658203125, 3.6608543395996094, -1.4600601196289062, 0.5295448303222656, 2.5392227172851562, 2.981292724609375, 3.2063369750976562, -2.2236709594726562, 0.8567295074462891, -0.7427215576171875, 0.2559185028076172, 2.824361801147461, 2.2134857177734375, 4.965732574462891, 4.441062927246094, 5.185520172119141, 4.934654235839844, 8.13543701171875, 3.770130157470703, 4.767692565917969, 12.292949676513672, -1.4122238159179688, -0.06496047973632812, 5.4922943115234375, 7.133098602294922, -0.05687141418457031, 3.4019508361816406, 12.3526611328125, 1.9632644653320312, 4.54925537109375, -4.416820526123047, 2.759929656982422, 4.59136962890625, 3.6334762573242188, -3.204193115234375, 2.0988082885742188, -0.20647048950195312, -0.7206306457519531, -0.920501708984375, 1.0385761260986328, 11.731216430664062, 0.980194091796875, 0.07634353637695312, -2.1546630859375, -5.779155731201172, 0.5625724792480469, 11.411666870117188, 4.2166900634765625, -2.0525264739990234, 2.304901123046875, 3.069793701171875, 11.964309692382812, 4.134407043457031, 0.34270477294921875, 0.9714889526367188, 5.9066009521484375, 2.7694625854492188], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000221.npy"}
{"epoch": 0.3340891912320484, "step": 222, "batch_size": 64, "mean": 3.2043919563293457, "std": 3.738464593887329, "min": -4.798515319824219, "p10": -1.3576515197753907, "median": 2.8837318420410156, "p90": 7.455670738220215, "max": 13.522552490234375, "pos_frac": 0.828125, "sample": [-1.2159309387207031, -4.798515319824219, 4.089851379394531, 7.332061767578125, 5.231639862060547, 7.508646011352539, 5.408611297607422, -0.023983001708984375, 6.1424560546875, 10.026458740234375, 2.79583740234375, 2.5416908264160156, -1.3182754516601562, 0.474639892578125, 3.8002452850341797, 0.6755905151367188, 2.2425689697265625, 2.6827392578125, 0.22853469848632812, 4.662059783935547, 1.5719375610351562, 0.07737541198730469, 4.06378173828125, 4.612884521484375, -2.4038848876953125, 4.965080261230469, 1.1591567993164062, 13.522552490234375, 0.08482170104980469, 4.8980560302734375, 2.602518081665039, 2.8294830322265625, 5.511474609375, 6.259117126464844, 4.407135009765625, 6.9668426513671875, 2.9379806518554688, -0.9576187133789062, -2.597463607788086, 0.2896728515625, -3.5569305419921875, 6.957817077636719, 0.32114410400390625, 4.575843811035156, 1.0950889587402344, 9.0211181640625, 2.0422744750976562, -2.28875732421875, -1.3745269775390625, 4.988761901855469, 3.7732162475585938, 0.00902557373046875, 7.30584716796875, 10.395353317260742, 4.327857971191406, 2.3856163024902344, 6.127513885498047, 1.543548583984375, 4.4491119384765625, 8.49127197265625, 4.0880279541015625, 2.8199539184570312, 11.246551513671875, -2.95147705078125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000222.npy"}
{"epoch": 0.3356009070294785, "step": 223, "batch_size": 64, "mean": 3.0298619270324707, "std": 3.4150021076202393, "min": -2.9639053344726562, "p10": -1.1427906036376954, "median": 3.109236717224121, "p90": 7.2420082092285165, "max": 14.063404083251953, "pos_frac": 0.75, "sample": [4.442485809326172, 4.120941162109375, 4.630760192871094, 2.85369873046875, -1.0794219970703125, 6.330970764160156, 0.2122802734375, 3.0175628662109375, 2.6257171630859375, 3.654582977294922, -0.1283092498779297, 8.126480102539062, 4.212551116943359, 3.8836746215820312, -2.9639053344726562, -1.5120811462402344, -0.17938804626464844, 4.358131408691406, 5.701324462890625, -1.44244384765625, 6.154033660888672, 2.1677169799804688, -0.682891845703125, 3.1077632904052734, 0.10787200927734375, 3.1334457397460938, 0.9583034515380859, 4.6512298583984375, -1.2575454711914062, -2.152759552001953, 3.1670303344726562, 4.359651565551758, -1.4871826171875, 3.9183006286621094, 2.626556396484375, 11.301193237304688, 6.506401062011719, 8.758190155029297, 0.2530517578125, -1.1699485778808594, -0.37198638916015625, 0.013498306274414062, 4.0763702392578125, 2.002838134765625, 6.91864013671875, 1.3174629211425781, 7.354095458984375, 6.26689338684082, 2.697021484375, 6.788654327392578, 14.063404083251953, -0.8383331298828125, 7.6165618896484375, 7.321556091308594, 4.0179595947265625, 3.1107101440429688, 4.245246887207031, 3.7078323364257812, 1.9561882019042969, -0.9530239105224609, -0.23297119140625, 7.056396484375, 0.9255523681640625, -0.4354133605957031], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000223.npy"}
{"epoch": 0.3371126228269085, "step": 224, "batch_size": 64, "mean": 1.824683427810669, "std": 2.9444823265075684, "min": -3.3330841064453125, "p10": -1.7620582580566402, "median": 1.3852863311767578, "p90": 5.462033081054688, "max": 12.826705932617188, "pos_frac": 0.734375, "sample": [1.8130455017089844, 8.476287841796875, 5.515987396240234, 2.9126739501953125, -0.0012969970703125, 5.811740875244141, 4.44549560546875, 1.0512161254882812, -0.1619091033935547, 1.3419113159179688, 5.336139678955078, 2.1724910736083984, 5.21844482421875, 6.058803558349609, -0.9038619995117188, 0.5616588592529297, 1.1512794494628906, 1.8011283874511719, 5.093513488769531, 0.2188873291015625, 0.17525482177734375, 4.3954620361328125, 2.6828155517578125, -2.614145278930664, -1.222900390625, -0.8231239318847656, -1.1414260864257812, -1.19207763671875, 2.864227294921875, 2.19586181640625, 2.6240463256835938, -2.4415435791015625, -2.1908321380615234, 0.20204925537109375, -2.12860107421875, 0.11528968811035156, 0.5723361968994141, -2.434478759765625, 2.4906158447265625, 2.9857864379882812, -0.09188079833984375, 1.9311866760253906, 2.6786251068115234, -0.3088417053222656, 0.7874755859375, 0.4003143310546875, 4.959064483642578, 1.4286613464355469, -1.9931259155273438, 0.9558639526367188, -1.1753120422363281, 2.886157989501953, 4.451478958129883, 3.2621917724609375, 12.826705932617188, 2.1322402954101562, 1.1469497680664062, 5.8684539794921875, 0.2546520233154297, 6.542957305908203, -3.3330841064453125, 5.19976806640625, 1.6475677490234375, 1.2934188842773438], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000224.npy"}
{"epoch": 0.3386243386243386, "step": 225, "batch_size": 64, "mean": 2.916795492172241, "std": 3.991753101348877, "min": -5.916961669921875, "p10": -1.5707393646240233, "median": 2.0191478729248047, "p90": 8.624144744873048, "max": 13.594924926757812, "pos_frac": 0.765625, "sample": [0.4121437072753906, 1.7724227905273438, 0.1461029052734375, 1.774627685546875, 4.21917724609375, 7.00469970703125, -1.0032768249511719, -0.46585845947265625, 2.157367706298828, 3.9028396606445312, 1.7361183166503906, 6.1638641357421875, 1.2401962280273438, 6.109344482421875, -1.6186866760253906, 1.2051353454589844, 2.4024658203125, 2.7334136962890625, 3.550447463989258, 7.884834289550781, 8.48525619506836, 5.010173797607422, 10.391349792480469, 1.823089599609375, 13.594924926757812, 1.3285751342773438, 0.764068603515625, 2.4617652893066406, 8.124382019042969, -1.663949966430664, 0.7108154296875, -4.27374267578125, 2.4752731323242188, -2.473491668701172, 8.534759521484375, 4.5672454833984375, 0.6686801910400391, 10.270050048828125, 0.115631103515625, 0.39971160888671875, -0.22612762451171875, 1.5799789428710938, 8.662452697753906, -0.030025482177734375, 10.650505065917969, -1.724700927734375, -1.4588623046875, -0.32172393798828125, 8.987701416015625, 1.0945587158203125, 4.948326110839844, -0.08441352844238281, 2.9835968017578125, 5.972503662109375, 7.352203369140625, -0.27878570556640625, 1.8809280395507812, 4.628593444824219, 9.105228424072266, 2.828968048095703, -5.916961669921875, 2.4214324951171875, 4.848197937011719, -3.8705978393554688], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000225.npy"}
{"epoch": 0.3401360544217687, "step": 226, "batch_size": 64, "mean": 2.5553388595581055, "std": 3.616837501525879, "min": -5.223258972167969, "p10": -1.038224220275879, "median": 2.0959558486938477, "p90": 8.055851554870605, "max": 10.329933166503906, "pos_frac": 0.765625, "sample": [2.46337890625, -0.8059234619140625, 2.473602294921875, 5.682683944702148, 2.316436767578125, 0.6548614501953125, 10.080947875976562, 2.118906021118164, 0.8673858642578125, -2.673473358154297, 3.9832725524902344, 1.1818523406982422, 2.9677505493164062, 6.520952224731445, -1.4460372924804688, 0.027187347412109375, 3.0065174102783203, -0.17736053466796875, 8.081972122192383, 7.207389831542969, 8.74896240234375, 0.8627700805664062, 4.163841247558594, 0.5782928466796875, 1.8006134033203125, 0.8231124877929688, 0.7774200439453125, 8.180305480957031, 6.68377685546875, 6.1378326416015625, -0.6719417572021484, 5.67828369140625, 1.92449951171875, 1.8579368591308594, 8.85784912109375, 0.7570419311523438, 3.7804641723632812, -3.7106781005859375, -0.5839595794677734, 0.2452983856201172, 2.0730056762695312, 4.95489501953125, -0.9700469970703125, -5.035167694091797, 9.345375061035156, 7.994903564453125, -5.223258972167969, 5.5941009521484375, -3.3390426635742188, 2.3694915771484375, 1.3889732360839844, 1.6343555450439453, 0.7132759094238281, 4.176567077636719, 2.6099700927734375, 10.329933166503906, 3.303009033203125, 4.198173522949219, -1.0510101318359375, -0.1561412811279297, 6.351387023925781, -1.0083904266357422, -0.8712501525878906, 2.7345504760742188], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000226.npy"}
{"epoch": 0.3416477702191988, "step": 227, "batch_size": 64, "mean": 2.345842123031616, "std": 3.764446973800659, "min": -4.853462219238281, "p10": -2.246027374267578, "median": 1.8425979614257812, "p90": 7.622931098937989, "max": 10.297258377075195, "pos_frac": 0.703125, "sample": [6.05694580078125, 2.393260955810547, 0.5974044799804688, 4.847038269042969, 0.094482421875, -2.0541725158691406, 4.563999176025391, 10.297258377075195, -4.4259796142578125, -2.3022384643554688, 9.306999206542969, 0.1101837158203125, -0.9682140350341797, 7.71619987487793, 5.72552490234375, 1.2074813842773438, -0.37207603454589844, 0.42180633544921875, 7.405303955078125, 1.274200439453125, 1.8045196533203125, 4.447662353515625, 2.9207611083984375, 1.9385261535644531, 9.574951171875, 3.4099349975585938, -0.13104248046875, 5.132621765136719, -0.9147167205810547, 0.07917213439941406, 4.3423614501953125, 9.878263473510742, 4.2456817626953125, -2.1148681640625, -1.5233917236328125, 5.219520568847656, 9.995182037353516, -0.0439300537109375, 3.84613037109375, 7.139411926269531, 1.3113784790039062, 1.7785186767578125, -0.9538803100585938, 1.88067626953125, 3.2813720703125, 0.27767181396484375, -0.07462310791015625, 2.48675537109375, -2.4148941040039062, -0.1708354949951172, -4.069705963134766, 4.424060821533203, 4.06793212890625, 5.454399108886719, 1.062582015991211, -2.439453125, 1.9191131591796875, 1.3274955749511719, 4.62895393371582, -4.853462219238281, -0.0103302001953125, -4.3936920166015625, 8.369722366333008, 6.101982116699219], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000227.npy"}
{"epoch": 0.3431594860166289, "step": 228, "batch_size": 64, "mean": 2.548862934112549, "std": 4.132721900939941, "min": -6.654407501220703, "p10": -2.809539413452148, "median": 2.017580032348633, "p90": 7.785293960571289, "max": 13.676948547363281, "pos_frac": 0.75, "sample": [0.10147476196289062, 6.624752044677734, 1.7643814086914062, 11.203451156616211, 4.828399658203125, 2.7333984375, 1.7493228912353516, 9.354032516479492, 1.2880859375, -3.1203155517578125, 2.719728469848633, -0.8404083251953125, 2.00958251953125, -6.654407501220703, 0.48687744140625, 1.081857681274414, 3.5075111389160156, -0.49396514892578125, 3.649913787841797, 2.75390625, -0.5040493011474609, 8.170013427734375, -1.523834228515625, 5.364860534667969, -3.4709625244140625, 3.7543601989746094, 3.9704227447509766, 13.676948547363281, 1.9828433990478516, 5.29266357421875, 0.5653171539306641, -2.4869232177734375, 0.4096183776855469, 0.07588958740234375, 4.097881317138672, -2.404470443725586, 2.0255775451660156, -0.6106967926025391, -5.43475341796875, -2.947803497314453, 2.1188507080078125, 4.2053985595703125, 12.389404296875, -3.028261184692383, 1.9022979736328125, 0.7668113708496094, 6.957977294921875, 7.822902679443359, -4.104866027832031, 2.7180328369140625, -0.023374557495117188, 1.9377326965332031, 7.697540283203125, 7.1688385009765625, -1.1129379272460938, 6.731996536254883, 7.6385955810546875, 8.787544250488281, 5.2170867919921875, 2.3705902099609375, 1.9860382080078125, 2.8080596923828125, 3.8429908752441406, 1.5774917602539062], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000228.npy"}
{"epoch": 0.34467120181405897, "step": 229, "batch_size": 64, "mean": 2.1660614013671875, "std": 3.456918478012085, "min": -7.985332489013672, "p10": -2.5769664764404294, "median": 2.0654678344726562, "p90": 6.624015808105471, "max": 8.573211669921875, "pos_frac": 0.765625, "sample": [4.2237091064453125, 5.501066207885742, 4.588356018066406, 2.4848251342773438, 1.796457290649414, 5.977104187011719, 2.9237136840820312, 1.862752914428711, 1.6985797882080078, -2.86474609375, 0.9217109680175781, 1.1383056640625, 4.056602478027344, -2.9297027587890625, 1.9568023681640625, 1.5444011688232422, -3.4154205322265625, -3.076488494873047, 5.415565490722656, 4.723552703857422, 2.946859359741211, 0.68572998046875, -0.27510643005371094, 4.535392761230469, -2.220245361328125, 4.785083770751953, -1.9426345825195312, 0.2342243194580078, 1.1375274658203125, -2.2060699462890625, 3.99407958984375, 8.558349609375, -7.985332489013672, 8.042182922363281, -2.0642013549804688, 0.3827228546142578, 4.09759521484375, -2.729846954345703, 3.139984130859375, 3.6207427978515625, 6.867179870605469, 4.513486862182617, 0.6399211883544922, -0.7466964721679688, 5.006587982177734, -0.9039306640625, 7.2959747314453125, 3.7415847778320312, 4.826173782348633, 1.9765625, 2.9089431762695312, -0.19344520568847656, 2.1543731689453125, 1.4637451171875, 7.0107421875, 3.8797683715820312, 0.8634681701660156, 1.8646106719970703, 8.224367141723633, 8.573211669921875, 6.056632995605469, 2.354351043701172, 0.3396568298339844, -5.353513717651367], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000229.npy"}
{"epoch": 0.34618291761148906, "step": 230, "batch_size": 64, "mean": 2.495007038116455, "std": 3.6780214309692383, "min": -6.0762481689453125, "p10": -1.4152538299560546, "median": 2.1038036346435547, "p90": 7.078112030029297, "max": 12.948226928710938, "pos_frac": 0.796875, "sample": [-1.2392387390136719, 1.9819488525390625, 2.9158401489257812, 1.1220855712890625, 3.1293563842773438, -4.703369140625, -0.21076011657714844, -0.03061676025390625, 4.110633850097656, 5.370941162109375, 4.754692077636719, 8.515922546386719, 3.9409866333007812, 7.00372314453125, 3.6027603149414062, 2.6810760498046875, 4.9657745361328125, 6.010143280029297, -1.936767578125, 0.721282958984375, -6.0762481689453125, 1.0356826782226562, 0.187713623046875, 4.311313629150391, 4.7002716064453125, -1.9896240234375, 4.521148681640625, 1.4116897583007812, 7.109992980957031, 4.891090393066406, 4.235761642456055, 2.225658416748047, -1.3378181457519531, 1.4204559326171875, 1.2027130126953125, -0.6817054748535156, 0.9814453125, 0.8321952819824219, 12.248687744140625, 1.5524139404296875, 2.598062515258789, 2.2551345825195312, 0.21760177612304688, 1.5673294067382812, 8.958442687988281, 6.1053314208984375, -4.295246124267578, 12.948226928710938, 0.8032760620117188, -0.24938011169433594, 9.923385620117188, -5.148193359375, 2.9709529876708984, 7.312070846557617, 2.2952003479003906, 3.9135894775390625, 1.662435531616211, 0.42098045349121094, 1.2390117645263672, 1.2582778930664062, 2.8713855743408203, 4.23681640625, -1.4484405517578125, 1.7749443054199219], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000230.npy"}
{"epoch": 0.3476946334089191, "step": 231, "batch_size": 64, "mean": 3.3989481925964355, "std": 3.440877676010132, "min": -4.199501037597656, "p10": -0.924144172668457, "median": 3.341341972351074, "p90": 8.094099426269535, "max": 11.003288269042969, "pos_frac": 0.84375, "sample": [4.8728485107421875, 0.06904220581054688, 5.598171234130859, 4.777919769287109, 0.34454345703125, 4.151771545410156, 1.5313606262207031, 6.747222900390625, -0.29383087158203125, 1.3622474670410156, 2.629720687866211, 1.3788604736328125, 1.1128387451171875, 1.5054283142089844, -2.0845489501953125, 2.9215621948242188, 4.063468933105469, 5.873130798339844, 2.14373779296875, -1.3256492614746094, 10.130081176757812, -2.071727752685547, 2.4985580444335938, 4.982540130615234, 0.6096229553222656, 8.457046508789062, -4.199501037597656, -0.4823646545410156, 11.003288269042969, 5.227636337280273, 5.95379638671875, -0.9137687683105469, 7.247222900390625, 9.187469482421875, 6.093776702880859, 3.379547119140625, 1.0660171508789062, 3.6108169555664062, 4.254974365234375, 0.34732818603515625, 9.961898803710938, 2.4375381469726562, 3.3031368255615234, 4.7299652099609375, 6.698585510253906, 2.3226890563964844, 5.246009826660156, -0.9285907745361328, 1.5518798828125, -1.0664863586425781, 5.144235610961914, 5.19171142578125, 2.3470611572265625, 4.959136962890625, 4.201206207275391, 10.6815185546875, 3.882781982421875, 2.9450225830078125, 2.6211700439453125, 4.0309600830078125, -3.954681396484375, 1.276357650756836, 6.08287239074707, 10.104503631591797], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000231.npy"}
{"epoch": 0.3492063492063492, "step": 232, "batch_size": 64, "mean": 2.641950845718384, "std": 4.283224105834961, "min": -9.308197021484375, "p10": -2.3510070800781246, "median": 1.8436050415039062, "p90": 8.303096771240234, "max": 13.544891357421875, "pos_frac": 0.75, "sample": [-0.0219573974609375, 8.693931579589844, 5.050472259521484, 10.981117248535156, 3.1593074798583984, 10.36572265625, 1.1843490600585938, 8.2626953125, -0.0921173095703125, -3.6939620971679688, -2.466754913330078, 1.6220664978027344, -0.7080535888671875, 1.0517826080322266, 1.7535591125488281, 2.9155616760253906, 0.7871112823486328, 2.402872085571289, 3.3361740112304688, 0.3190879821777344, 1.1899490356445312, 8.320411682128906, 4.8367919921875, 2.7938079833984375, 4.137855529785156, 3.4320755004882812, -3.437154769897461, 8.171600341796875, -2.0809288024902344, 5.350608825683594, 0.848236083984375, 6.6900482177734375, -3.196237564086914, -0.2682952880859375, 3.69390869140625, 0.5038909912109375, 0.959197998046875, -3.1119766235351562, 7.0967559814453125, 0.429351806640625, 1.018035888671875, 8.196624755859375, -0.4277687072753906, 12.6724853515625, 10.33526611328125, 13.544891357421875, -2.8992080688476562, -1.5140609741210938, -1.31060791015625, 1.4367332458496094, 1.9096298217773438, 6.2562408447265625, 6.142333984375, 4.430156707763672, 1.9488639831542969, -1.1987152099609375, 0.36009979248046875, 2.1402320861816406, 1.7775802612304688, 2.1821556091308594, 2.9394378662109375, -9.308197021484375, 6.133918762207031, 1.055868148803711], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000232.npy"}
{"epoch": 0.3507180650037793, "step": 233, "batch_size": 64, "mean": 2.199291706085205, "std": 3.979929208755493, "min": -5.302864074707031, "p10": -2.262759780883789, "median": 1.6190595626831055, "p90": 7.909310913085938, "max": 15.763694763183594, "pos_frac": 0.671875, "sample": [15.763694763183594, 0.6508102416992188, 10.419883728027344, -0.7877445220947266, 2.41717529296875, 5.041019439697266, 4.106292724609375, 3.6375732421875, 1.3057212829589844, -0.8061370849609375, 3.3112258911132812, 10.995574951171875, 0.2708091735839844, -0.6079864501953125, -2.5520477294921875, 6.108772277832031, 4.234945297241211, 1.6377506256103516, -0.5188140869140625, 0.8175868988037109, 5.0258331298828125, 2.956554412841797, 4.204257965087891, 2.2896881103515625, -3.265045166015625, 2.804351806640625, -5.010261535644531, 3.6391067504882812, -1.8691558837890625, -2.4506759643554688, 4.679817199707031, 0.7017097473144531, 4.608489990234375, 1.6003684997558594, 7.8688507080078125, -5.302864074707031, 4.441505432128906, 0.6503429412841797, -0.7754058837890625, 0.654815673828125, 2.7317657470703125, -2.8982086181640625, 10.091438293457031, -0.6690597534179688, 2.4312667846679688, -0.96551513671875, -0.5215377807617188, 1.9212188720703125, 4.17144775390625, 4.483612060546875, -1.1188316345214844, -0.03814125061035156, 0.9325294494628906, 7.9266510009765625, 9.721153259277344, 8.474777221679688, 1.209970474243164, -2.2839622497558594, 2.647705078125, 0.0911712646484375, -0.1780853271484375, -2.213287353515625, -0.6136054992675781, 2.5218124389648438], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000233.npy"}
{"epoch": 0.35222978080120937, "step": 234, "batch_size": 64, "mean": 1.7568473815917969, "std": 3.556968927383423, "min": -6.696754455566406, "p10": -2.1622097015380857, "median": 1.1774864196777344, "p90": 6.441807174682619, "max": 11.272880554199219, "pos_frac": 0.65625, "sample": [1.1542892456054688, -0.00067138671875, 0.19195556640625, -1.843994140625, 1.7526073455810547, -1.4456596374511719, 1.5744705200195312, -2.0385780334472656, 3.258453369140625, -2.7020263671875, 3.654815673828125, 1.675100326538086, 3.7701644897460938, 4.286106109619141, 7.3732147216796875, 0.5845375061035156, -2.2151947021484375, -2.463237762451172, 5.055938720703125, -0.9529857635498047, 1.1755256652832031, 1.3354816436767578, 8.777299880981445, 11.272880554199219, -0.019269943237304688, -3.2391738891601562, 5.846324920654297, -0.48303794860839844, -6.696754455566406, -0.6752052307128906, 3.197418212890625, 5.311702728271484, 8.02309799194336, 6.697013854980469, 0.1907215118408203, -4.036531448364258, 0.30377197265625, 0.3489990234375, 2.2828006744384766, -1.8365554809570312, 5.0872955322265625, 3.6513824462890625, 1.1794471740722656, 7.45977783203125, 5.763969421386719, 0.9530487060546875, 1.0866622924804688, 5.370872497558594, 5.690887451171875, -1.129434585571289, -0.4973297119140625, 1.352773666381836, -4.585033416748047, 7.805545806884766, -1.7100181579589844, 0.9478416442871094, 4.9779052734375, 1.8288345336914062, -1.9938278198242188, 2.7888717651367188, -0.2028961181640625, 4.983011245727539, 3.8127059936523438, -0.6298751831054688], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000234.npy"}
{"epoch": 0.35374149659863946, "step": 235, "batch_size": 64, "mean": 3.1893510818481445, "std": 3.0239381790161133, "min": -3.69122314453125, "p10": -0.11522693634033182, "median": 2.7191028594970703, "p90": 6.651708221435547, "max": 11.110618591308594, "pos_frac": 0.890625, "sample": [1.669454574584961, 6.66802978515625, 2.7888565063476562, 5.1011505126953125, 5.637054443359375, 1.1321372985839844, 1.7439956665039062, 10.579757690429688, 4.9774627685546875, 4.83905029296875, 0.09629058837890625, 3.07623291015625, 1.1899871826171875, 3.2270030975341797, 0.94927978515625, 6.010478973388672, 4.506746292114258, 0.9534416198730469, 3.749544143676758, 2.2313804626464844, 2.79327392578125, 2.571136474609375, -1.8525543212890625, 4.047523498535156, 6.613624572753906, 3.7855224609375, -0.9440841674804688, -0.4444160461425781, 11.110618591308594, -0.20587730407714844, 6.105903625488281, 1.6715965270996094, 3.575969696044922, 7.76983642578125, 5.757040023803711, 1.2006607055664062, 6.330265045166016, 0.8675346374511719, 7.90850830078125, -3.69122314453125, 0.9323596954345703, 7.328227996826172, 2.56903076171875, 2.6493492126464844, 5.3890533447265625, 1.2954483032226562, 0.5817108154296875, 4.761390686035156, 8.529205322265625, 2.3713455200195312, 2.5807876586914062, 0.4023284912109375, 5.973388671875, 0.824371337890625, 1.5873184204101562, -1.3176651000976562, 5.327278137207031, 4.963527679443359, 6.3391571044921875, 0.1907501220703125, 5.7801055908203125, 0.22684860229492188, 0.6823844909667969, -1.9474201202392578], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000235.npy"}
{"epoch": 0.35525321239606955, "step": 236, "batch_size": 64, "mean": 2.667544364929199, "std": 4.288147926330566, "min": -7.29302978515625, "p10": -1.5619041442871093, "median": 2.0092525482177734, "p90": 8.696699142456055, "max": 17.775192260742188, "pos_frac": 0.71875, "sample": [3.516185760498047, -1.508941650390625, -0.7321395874023438, 5.986377716064453, 3.7453365325927734, -0.9097442626953125, 2.3675193786621094, 8.858352661132812, -0.08140182495117188, 1.0559730529785156, 3.269195556640625, 2.982757568359375, -1.344808578491211, 3.4889068603515625, -1.5846023559570312, 2.1290435791015625, 1.8113460540771484, 4.18511962890625, 4.54620361328125, 9.037126541137695, 0.45474815368652344, 1.5022430419921875, 0.5032196044921875, 0.2514228820800781, 5.570343017578125, 3.5605850219726562, 4.4022979736328125, 5.502410888671875, 9.823600769042969, 4.033618927001953, 8.773792266845703, -1.6783599853515625, -1.2880859375, 3.0675697326660156, 12.961380004882812, -0.6010284423828125, -1.2412567138671875, 17.775192260742188, 1.2129325866699219, 1.8294715881347656, -2.724142074584961, 1.9449729919433594, 4.6998291015625, 5.833587646484375, 0.6372222900390625, 1.8693618774414062, 0.15423202514648438, -2.0767669677734375, 6.247371673583984, 1.5049114227294922, -0.5099296569824219, -6.805999755859375, -7.29302978515625, 9.638862609863281, 8.516815185546875, 6.264556884765625, 7.2552337646484375, -2.2447643280029297, -0.22864151000976562, -0.994873046875, 5.725963592529297, 1.0335159301757812, 2.9671154022216797, 2.0735321044921875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000236.npy"}
{"epoch": 0.35676492819349964, "step": 237, "batch_size": 64, "mean": 2.6089158058166504, "std": 4.0183305740356445, "min": -7.654205322265625, "p10": -2.121277618408203, "median": 2.1001129150390625, "p90": 7.592416763305666, "max": 16.556686401367188, "pos_frac": 0.828125, "sample": [3.3667430877685547, 5.2371978759765625, 4.295196533203125, 7.124034881591797, 1.3340377807617188, -2.1668548583984375, 2.0170059204101562, 5.938117980957031, 1.8653717041015625, 4.108554840087891, 2.5983810424804688, 1.868560791015625, -2.6681137084960938, 0.9812088012695312, 3.5973892211914062, 5.722679138183594, -0.4784889221191406, 7.79315185546875, -4.17108154296875, 4.391105651855469, -4.405303955078125, 5.284784317016602, 4.9235992431640625, 4.358875274658203, 9.4031982421875, -0.7246379852294922, 0.5479278564453125, 4.873573303222656, 0.8101806640625, 1.7987499237060547, 4.112848281860352, 1.5228271484375, 3.3601226806640625, -5.3788604736328125, 0.8333168029785156, 1.1894969940185547, 3.00732421875, -5.837104797363281, 10.028640747070312, 0.11681938171386719, 1.5911102294921875, 1.8947830200195312, 0.70208740234375, 0.39818572998046875, 2.1685638427734375, 0.3304004669189453, 4.724283218383789, 2.0316619873046875, 9.95281982421875, 7.9134979248046875, 4.0983428955078125, 3.348236083984375, 8.713817596435547, 16.556686401367188, -7.654205322265625, 3.492717742919922, -0.6440944671630859, 5.151910781860352, 2.395111083984375, 1.7342071533203125, 0.5504875183105469, 1.51611328125, -2.0149307250976562, 5.438240051269531], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000237.npy"}
{"epoch": 0.35827664399092973, "step": 238, "batch_size": 64, "mean": 2.537043809890747, "std": 3.620142936706543, "min": -3.818389892578125, "p10": -1.6618066787719725, "median": 2.1912612915039062, "p90": 6.708494567871094, "max": 11.605436325073242, "pos_frac": 0.734375, "sample": [-0.8550300598144531, 1.0344009399414062, 6.297792434692383, -3.2789535522460938, 5.3816986083984375, 11.605436325073242, 1.1418952941894531, -0.19175338745117188, 5.8724822998046875, 0.29400062561035156, 0.06250762939453125, -1.0417633056640625, 5.969779968261719, 2.7705230712890625, 3.5258560180664062, 4.337047576904297, -0.3017921447753906, -1.7530269622802734, 3.8013763427734375, 5.583271026611328, 6.380104064941406, 9.5850830078125, 5.0367584228515625, 3.8677120208740234, 0.10342979431152344, 3.372608184814453, 3.2272491455078125, 1.9390716552734375, -2.7614974975585938, 6.6723175048828125, 5.86414909362793, 5.633760452270508, 1.0047378540039062, 2.5230178833007812, -0.126922607421875, 0.7042522430419922, 2.847930908203125, -0.6233329772949219, 6.0334625244140625, 8.863372802734375, 2.443450927734375, -1.4489593505859375, 1.2871513366699219, -0.851043701171875, 6.342063903808594, 1.744598388671875, 9.692764282226562, 8.257404327392578, 1.5182857513427734, -0.922210693359375, 1.05963134765625, 4.622978210449219, 6.7239990234375, -3.689725875854492, -2.8612804412841797, 0.40454673767089844, -3.818389892578125, 2.511749267578125, 8.12057876586914, 0.3193511962890625, 4.185302734375, -0.6660919189453125, -3.5404014587402344, 0.5320377349853516], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000238.npy"}
{"epoch": 0.35978835978835977, "step": 239, "batch_size": 64, "mean": 2.9347760677337646, "std": 3.999671697616577, "min": -6.660745620727539, "p10": -2.0618408203125, "median": 2.886423110961914, "p90": 7.909360504150391, "max": 13.7401123046875, "pos_frac": 0.796875, "sample": [3.548797607421875, 6.244544982910156, 1.17047119140625, 1.6744346618652344, 6.8862457275390625, 7.626365661621094, -1.4277687072753906, 1.8425827026367188, 1.4410934448242188, 0.7670135498046875, -3.2375717163085938, 2.4110260009765625, 0.68707275390625, 7.8651885986328125, 6.0499114990234375, 0.1898345947265625, -1.9499130249023438, -4.397251129150391, 6.557220458984375, 5.84088134765625, 1.3626251220703125, 1.6206245422363281, 6.216587066650391, 7.977317810058594, 10.679744720458984, 1.899932861328125, 3.642759323120117, 2.899738311767578, 2.65777587890625, 3.8036460876464844, 5.104034423828125, 2.3153762817382812, -2.612396240234375, -1.5411300659179688, 3.222932815551758, 9.759750366210938, -0.6605911254882812, 0.8834362030029297, -4.478271484375, 3.500274658203125, -1.5273399353027344, 3.6745223999023438, 3.0713729858398438, 3.2613983154296875, -3.6729583740234375, 3.9341888427734375, 9.536964416503906, 13.7401123046875, 9.720321655273438, -2.1098098754882812, -0.5912322998046875, 2.87310791015625, -6.660745620727539, 7.3785858154296875, 3.73834228515625, 1.519866943359375, 6.077564239501953, 0.4810829162597656, 3.0876312255859375, 6.36767578125, 2.0517311096191406, 4.310997009277344, 7.928291320800781, 1.5896530151367188], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000239.npy"}
{"epoch": 0.36130007558578986, "step": 240, "batch_size": 64, "mean": 3.4503772258758545, "std": 4.331057071685791, "min": -7.912422180175781, "p10": -1.6911483764648434, "median": 3.1106834411621094, "p90": 9.202485466003418, "max": 16.56552505493164, "pos_frac": 0.78125, "sample": [3.013111114501953, -2.1710433959960938, 3.8953857421875, 1.7256546020507812, 1.7997665405273438, 10.77252197265625, 4.914154052734375, 7.0772857666015625, -2.032621383666992, -0.0040760040283203125, 4.653327941894531, -0.5672283172607422, 3.81005859375, 9.207361221313477, 11.709789276123047, 12.289054870605469, 5.883289337158203, -4.154226303100586, 7.005096435546875, 7.8178558349609375, -1.3664894104003906, 3.0811386108398438, 2.304323196411133, -3.3894119262695312, 3.8696136474609375, 16.56552505493164, 3.30419921875, 4.0044097900390625, 9.191108703613281, 6.5849761962890625, 7.158775329589844, 6.007781982421875, 1.3964080810546875, 1.99591064453125, 2.0559425354003906, 3.8063812255859375, 3.140228271484375, 2.014190673828125, 6.039947509765625, 0.6714553833007812, 4.8288726806640625, 1.5198326110839844, -1.8302879333496094, 4.8963165283203125, 9.818634033203125, -7.912422180175781, 0.7610454559326172, 1.4249114990234375, 1.893585205078125, 0.6203155517578125, 4.061229705810547, 6.106967926025391, -0.9639511108398438, 2.3921890258789062, -0.15797805786132812, 1.5133476257324219, 6.519062042236328, 9.402656555175781, 0.08444976806640625, -0.2400646209716797, 7.418479919433594, -0.05106353759765625, -3.2937545776367188, 6.930824279785156], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000240.npy"}
{"epoch": 0.36281179138321995, "step": 241, "batch_size": 64, "mean": 1.545569658279419, "std": 3.2214291095733643, "min": -3.983154296875, "p10": -2.411760711669922, "median": 1.2105426788330078, "p90": 6.120724105834961, "max": 9.441802978515625, "pos_frac": 0.65625, "sample": [-3.983154296875, -2.4606552124023438, 6.176471710205078, -1.389657974243164, 2.5523223876953125, -2.0808563232421875, 1.3603897094726562, 4.618610382080078, 3.357736587524414, 8.000869750976562, -1.5119857788085938, 6.6120452880859375, -2.661907196044922, 1.3425064086914062, 0.33737945556640625, 6.2713775634765625, 0.2689628601074219, 0.7216129302978516, -2.4267044067382812, 0.3989410400390625, 3.5131607055664062, 4.424163818359375, -2.00286865234375, 2.3160476684570312, 1.9864425659179688, -1.8933124542236328, 4.240501403808594, 0.45960235595703125, 0.9376907348632812, -2.37689208984375, 5.129617691040039, 5.9906463623046875, 9.441802978515625, 1.0785789489746094, -0.8409271240234375, -3.831268310546875, -1.8988265991210938, 4.169219970703125, -0.9751968383789062, 2.8494110107421875, -0.5160369873046875, -0.746337890625, -0.15118789672851562, 6.909858703613281, 2.2509689331054688, 3.6632614135742188, -3.2760009765625, 9.035552978515625, 3.9645233154296875, -0.8523483276367188, 2.097900390625, 1.8501663208007812, 3.6893310546875, 0.064483642578125, 5.7685089111328125, 0.4431800842285156, -3.0998916625976562, 3.5520572662353516, 1.4569320678710938, 2.2785682678222656, 0.11454010009765625, 3.41778564453125, -1.094034194946289, -0.12722396850585938], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000241.npy"}
{"epoch": 0.36432350718065004, "step": 242, "batch_size": 64, "mean": 3.1865997314453125, "std": 4.11544942855835, "min": -4.773950576782227, "p10": -1.4526714324951164, "median": 2.834784507751465, "p90": 8.871808242797853, "max": 15.88275146484375, "pos_frac": 0.8125, "sample": [-4.773950576782227, 3.0760574340820312, 1.9590377807617188, 2.7079086303710938, 4.8646240234375, -3.1001853942871094, 6.7059173583984375, 6.528850555419922, 3.7534523010253906, 7.690155029296875, 0.808013916015625, 3.4882736206054688, -0.1226806640625, 10.173294067382812, 8.46365737915039, -0.1140899658203125, 9.942207336425781, -0.39206695556640625, 2.3769302368164062, 1.2981910705566406, 1.9097824096679688, 7.414695739746094, 9.705142974853516, 1.1742877960205078, 0.31475830078125, 5.22686767578125, -1.790802001953125, 0.18704986572265625, 4.964683532714844, 5.476654052734375, 4.0392303466796875, 0.6805210113525391, 13.675025939941406, 2.961660385131836, 4.757453918457031, 4.557838439941406, 0.8354854583740234, -0.6637001037597656, 15.88275146484375, 1.2209320068359375, -2.9418182373046875, 3.8556289672851562, 2.1218414306640625, 0.3108673095703125, 4.8136138916015625, -0.5450897216796875, 5.3450164794921875, 5.052803039550781, 4.9330902099609375, 5.79486083984375, 11.102745056152344, 3.7192325592041016, 9.046730041503906, -2.145221710205078, 0.17358970642089844, 3.1061935424804688, 1.635162353515625, 1.3539562225341797, 3.2442169189453125, -4.768714904785156, 1.0615921020507812, 1.0413169860839844, -3.2702693939208984, 2.0371265411376953], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000242.npy"}
{"epoch": 0.36583522297808013, "step": 243, "batch_size": 64, "mean": 2.7344346046447754, "std": 3.6671249866485596, "min": -4.978324890136719, "p10": -1.4097078323364256, "median": 2.6647281646728516, "p90": 8.285323715209962, "max": 10.581497192382812, "pos_frac": 0.78125, "sample": [0.04767608642578125, 7.814598083496094, 3.9653854370117188, -4.321189880371094, 9.945806503295898, 1.2879257202148438, -3.510650634765625, 0.7696533203125, 3.9433822631835938, -0.7389068603515625, 10.581497192382812, 1.7079086303710938, 1.302154541015625, 2.8903732299804688, 0.4538688659667969, 8.118919372558594, -0.5882892608642578, 0.8996658325195312, 4.201128005981445, -0.597900390625, 1.7092819213867188, 10.362503051757812, 2.7714996337890625, -2.2394943237304688, 0.269622802734375, 5.158449172973633, 0.14544677734375, 3.230731964111328, 5.1103363037109375, 3.2932815551757812, 1.62152099609375, -2.9356651306152344, 3.780506134033203, -1.5329303741455078, 4.882478713989258, 5.198755264282227, 4.22979736328125, 5.756782531738281, -0.941192626953125, 1.5030364990234375, 9.439811706542969, 3.2277603149414062, 2.5579566955566406, 6.2579193115234375, 4.4739227294921875, -0.010852813720703125, 0.4159564971923828, 3.469442367553711, 8.356639862060547, 5.658363342285156, 1.9977264404296875, 0.1422576904296875, 1.2101478576660156, 9.35028076171875, -1.1221885681152344, 8.467071533203125, 3.463390350341797, 1.5234909057617188, -1.0164432525634766, -2.060565948486328, -4.978324890136719, 7.817876815795898, 3.449798583984375, 3.3646392822265625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000243.npy"}
{"epoch": 0.3673469387755102, "step": 244, "batch_size": 64, "mean": 3.4561331272125244, "std": 3.211548328399658, "min": -2.8306007385253906, "p10": -0.12122592926025381, "median": 2.8916492462158203, "p90": 7.436705780029297, "max": 13.843597412109375, "pos_frac": 0.875, "sample": [4.322303771972656, -0.18657684326171875, 4.9877471923828125, 1.07281494140625, 4.334259033203125, 2.040994644165039, -0.4331169128417969, 7.387628555297852, 3.3367958068847656, -1.3614654541015625, 1.2903213500976562, 3.1587562561035156, -0.817626953125, 6.797607421875, 2.8988304138183594, 4.88787841796875, 2.397308349609375, 12.173469543457031, -2.2523345947265625, 4.378576278686523, 4.7528533935546875, 2.390533447265625, -2.8306007385253906, 4.7640228271484375, 3.0198516845703125, 5.5364837646484375, 2.3787059783935547, 7.8173980712890625, 7.6725616455078125, 1.7914619445800781, 9.401092529296875, 1.9144821166992188, -0.03394889831542969, 1.6657638549804688, 0.7678375244140625, 1.1776580810546875, 0.06034088134765625, 4.149806976318359, 7.0825653076171875, 13.843597412109375, 0.2627239227294922, 5.360504150390625, 1.7362689971923828, 2.4815521240234375, 0.43438720703125, 8.705562591552734, 5.703788757324219, 7.457738876342773, 0.0702972412109375, 2.1104965209960938, 0.6578330993652344, 6.6984100341796875, 5.464838027954102, 4.3349609375, 5.3517608642578125, 2.8844680786132812, -0.15863037109375, 4.418251037597656, 1.7412223815917969, 2.8355026245117188, 2.6507339477539062, 1.8354816436767578, 6.605094909667969, 3.8126373291015625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000244.npy"}
{"epoch": 0.3688586545729403, "step": 245, "batch_size": 64, "mean": 3.6819822788238525, "std": 4.066456317901611, "min": -7.083953857421875, "p10": -0.3916179656982421, "median": 3.5193252563476562, "p90": 8.66933765411377, "max": 14.065391540527344, "pos_frac": 0.859375, "sample": [3.7877349853515625, 0.29901123046875, 2.333324432373047, 12.588241577148438, 4.529674530029297, -0.07394981384277344, 0.7467613220214844, 5.000505447387695, 2.774700164794922, 3.1802902221679688, 1.14581298828125, 4.12890625, 5.46821403503418, 4.1912994384765625, 7.481594085693359, 10.923540115356445, 1.092306137084961, 6.223596572875977, -2.6245574951171875, 14.065391540527344, 11.468833923339844, 3.2593154907226562, 0.7996826171875, 4.305248260498047, 4.582611083984375, 6.6985015869140625, 1.7926483154296875, 0.2843170166015625, 3.175048828125, -0.2752361297607422, 8.57049560546875, -2.1370277404785156, 0.454833984375, 4.053108215332031, 2.2976531982421875, 1.7547454833984375, -2.9059906005859375, 2.482217788696289, 5.484340667724609, 10.441001892089844, 2.656444549560547, 1.012481689453125, 12.745792388916016, 2.1720199584960938, -0.4414958953857422, 2.9468955993652344, 8.711698532104492, 8.305282592773438, -1.7131805419921875, 4.6306915283203125, 4.96209716796875, 4.346172332763672, -7.083953857421875, 6.971122741699219, 7.4946136474609375, 3.7793350219726562, -5.0408172607421875, 6.36982536315918, 3.0461463928222656, 2.186124801635742, 3.7899436950683594, 4.7133331298828125, 1.1860504150390625, 4.051494598388672], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000245.npy"}
{"epoch": 0.37037037037037035, "step": 246, "batch_size": 64, "mean": 3.4722418785095215, "std": 4.428107738494873, "min": -4.673259735107422, "p10": -2.3997671127319333, "median": 3.2677040100097656, "p90": 9.00420036315918, "max": 15.963401794433594, "pos_frac": 0.765625, "sample": [6.014202117919922, 5.80766487121582, 8.518234252929688, 10.782764434814453, 2.8565216064453125, 1.1696853637695312, -4.673259735107422, 9.108779907226562, 3.3214073181152344, -2.0735397338867188, 6.9400634765625, 6.0758514404296875, 1.9011688232421875, -1.5665016174316406, 8.561553955078125, 3.214000701904297, 2.8533782958984375, -2.258424758911133, -3.700408935546875, -0.4737701416015625, 2.121063232421875, 6.568778991699219, 2.2349777221679688, 8.760181427001953, 12.996017456054688, -3.76214599609375, 2.9247283935546875, 12.956573486328125, -0.6683731079101562, 5.395805358886719, 0.9597911834716797, 6.167610168457031, 3.033111572265625, 0.9037876129150391, 6.405792236328125, 3.1082687377929688, -2.4603424072265625, 5.840965270996094, 10.707099914550781, -0.5535964965820312, 4.5040435791015625, -4.4320068359375, -2.6150360107421875, 5.607513427734375, 4.559474945068359, -1.0265655517578125, 3.975006103515625, 5.921783447265625, 5.6324005126953125, -3.1966400146484375, 3.0413818359375, 1.8435325622558594, 3.6435012817382812, 15.963401794433594, 3.3428001403808594, 1.7749671936035156, 5.03741455078125, 9.887489318847656, -1.112884521484375, 3.9871826171875, 3.3481616973876953, 1.696075439453125, 4.5389251708984375, 0.2820892333984375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000246.npy"}
{"epoch": 0.37188208616780044, "step": 247, "batch_size": 64, "mean": 1.707242727279663, "std": 3.742208242416382, "min": -11.201648712158203, "p10": -2.3523046493530275, "median": 1.6723747253417969, "p90": 5.758129119873048, "max": 12.737640380859375, "pos_frac": 0.734375, "sample": [-1.5681915283203125, 1.4174613952636719, -0.5116424560546875, 2.1514129638671875, 2.1697463989257812, -4.558746337890625, 0.6539840698242188, 7.9769287109375, 8.116508483886719, 0.2191619873046875, 7.419952392578125, 2.5860366821289062, 5.549018859863281, 4.063852310180664, 1.4532012939453125, 1.21173095703125, 2.8872203826904297, 4.321445465087891, -0.23138999938964844, -0.67462158203125, 12.737640380859375, 4.91802978515625, -6.368682861328125, -2.340951919555664, 1.6288528442382812, 0.47875213623046875, 4.530536651611328, 0.10565567016601562, -3.156890869140625, 2.2538299560546875, 5.217353820800781, 2.5228958129882812, 1.1207351684570312, 2.212890625, 2.0519561767578125, 1.3943729400634766, 3.030567169189453, -11.201648712158203, -2.3571701049804688, -1.1394824981689453, -1.0901222229003906, 1.2573223114013672, -1.881500244140625, 7.453704833984375, 8.84637451171875, 1.3739356994628906, 2.7432823181152344, -0.5757560729980469, 1.1109580993652344, 2.63714599609375, 1.7158966064453125, 3.1973438262939453, -0.8721046447753906, 3.3021926879882812, -3.4413280487060547, 3.7899627685546875, 3.1951522827148438, 5.498456954956055, 1.614532470703125, 0.0037994384765625, -4.185546875, 3.3737869262695312, 5.847747802734375, 2.055988311767578], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000247.npy"}
{"epoch": 0.37339380196523053, "step": 248, "batch_size": 64, "mean": 3.0551724433898926, "std": 4.302940368652344, "min": -4.294334411621094, "p10": -2.3822624206542966, "median": 2.329418182373047, "p90": 8.76436996459961, "max": 15.279617309570312, "pos_frac": 0.8125, "sample": [5.020183563232422, 3.458587646484375, -4.27374267578125, -3.60919189453125, 12.078575134277344, 5.5617218017578125, 8.980819702148438, 4.038461685180664, 1.4564189910888672, -1.6622505187988281, 0.03524017333984375, 6.837984085083008, 5.493324279785156, -1.9015960693359375, 5.938018798828125, 1.5286712646484375, 9.214096069335938, 0.003265380859375, -4.16417121887207, 2.3438568115234375, -3.1386871337890625, 0.6046371459960938, 2.3548812866210938, 2.00140380859375, 8.788482666015625, 3.21844482421875, 2.936014175415039, 1.8204231262207031, 3.7664833068847656, 7.8203125, -2.027557373046875, 8.708106994628906, 0.2203998565673828, -1.5178089141845703, 8.435016632080078, 4.276100158691406, 2.0868759155273438, 2.4683685302734375, 1.1271228790283203, 2.525287628173828, 4.855140686035156, 8.381752014160156, 1.5782737731933594, 12.276840209960938, 8.568855285644531, 2.0534324645996094, 1.3057441711425781, 1.1077384948730469, 2.1525115966796875, 5.8642425537109375, 15.279617309570312, 1.6524810791015625, 4.1102294921875, 2.3149795532226562, -4.294334411621094, 2.1010684967041016, 2.792083740234375, 7.683502197265625, 9.485626220703125, -2.5342788696289062, 0.4170551300048828, 0.3951263427734375, -3.308074951171875, -1.5611648559570312], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000248.npy"}
{"epoch": 0.3749055177626606, "step": 249, "batch_size": 64, "mean": 1.2201197147369385, "std": 3.8902363777160645, "min": -7.8466949462890625, "p10": -4.389717102050781, "median": 2.396883010864258, "p90": 5.3778518676757825, "max": 10.343582153320312, "pos_frac": 0.6875, "sample": [5.551628112792969, -4.03558349609375, 2.9409027099609375, -0.211517333984375, 0.06154632568359375, -3.4187164306640625, -4.5414886474609375, 3.9734420776367188, 0.6492919921875, 4.972373962402344, 2.0901927947998047, 4.199764251708984, 0.6435050964355469, 2.7646942138671875, -4.77001953125, -5.7175445556640625, 1.2839469909667969, -5.570671081542969, -2.3129501342773438, -2.6118087768554688, -0.2406482696533203, -1.881134033203125, 0.04801177978515625, 2.8787364959716797, 2.4910659790039062, 2.6378173828125, 0.1530437469482422, -0.8926124572753906, 10.343582153320312, -0.80810546875, -2.8345108032226562, 3.0264720916748047, 0.7879257202148438, 1.9575328826904297, 5.892791748046875, -7.240753173828125, 2.7511367797851562, -0.6358184814453125, 3.3123130798339844, -6.9401092529296875, 2.5239486694335938, 6.3781280517578125, 3.2504653930664062, 10.232681274414062, 4.913848876953125, 2.339527130126953, -3.8037796020507812, 4.827396392822266, 6.179180145263672, 2.5247268676757812, 2.7664661407470703, 3.3520240783691406, 2.9136810302734375, 1.1606559753417969, 4.417055130004883, 3.0640411376953125, 0.7644634246826172, 3.255615234375, 7.060596466064453, -7.8466949462890625, 2.6966171264648438, -2.604991912841797, 2.4542388916015625, 4.520046234130859], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000249.npy"}
{"epoch": 0.3764172335600907, "step": 250, "batch_size": 64, "mean": 3.2904372215270996, "std": 4.290297031402588, "min": -8.820869445800781, "p10": -0.9408737182617187, "median": 2.938352584838867, "p90": 9.676303100585939, "max": 15.49514389038086, "pos_frac": 0.796875, "sample": [3.6985645294189453, 1.1633052825927734, -1.0538806915283203, 2.9691104888916016, 0.19635009765625, 8.378990173339844, 4.918394088745117, 10.420059204101562, 8.408950805664062, 1.6873855590820312, 4.326877593994141, 0.3524932861328125, 9.503311157226562, 1.3359489440917969, 3.8304977416992188, 0.5778274536132812, 1.0543212890625, -8.820869445800781, 4.861186981201172, 2.1120376586914062, 2.5175914764404297, -3.734283447265625, 2.729156494140625, -2.698366165161133, 3.1300182342529297, -0.318756103515625, 9.911445617675781, 5.5034332275390625, 0.109375, 8.675582885742188, -0.9398002624511719, 9.94244384765625, 9.850564956665039, 0.3330249786376953, 5.716716766357422, 0.2635955810546875, 0.37943077087402344, 4.635349273681641, 15.49514389038086, 3.989654541015625, -2.311664581298828, -3.7418212890625, 2.427295684814453, -0.18013381958007812, 9.750442504882812, 2.907594680786133, 12.03973388671875, 6.536458969116211, 5.677440643310547, 5.968761444091797, 1.5893707275390625, -0.3607330322265625, 4.190742492675781, 0.6267547607421875, -0.735595703125, 3.2433624267578125, 8.10675048828125, 6.41754150390625, 4.018894195556641, -0.05983161926269531, -0.9413337707519531, 3.4808120727539062, 1.3765487670898438, 5.148406982421875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000250.npy"}
{"epoch": 0.3779289493575208, "step": 251, "batch_size": 64, "mean": 2.3853681087493896, "std": 4.262994766235352, "min": -6.178070068359375, "p10": -1.9425193786621093, "median": 1.7905559539794922, "p90": 8.88943061828614, "max": 13.990570068359375, "pos_frac": 0.703125, "sample": [0.6861572265625, 12.374862670898438, 11.310447692871094, -0.3525524139404297, 1.4566574096679688, 0.7855453491210938, -5.119056701660156, 3.3112945556640625, 2.5544967651367188, 13.990570068359375, -0.43852996826171875, -3.312591552734375, 3.903472900390625, -1.437286376953125, 1.2448654174804688, -1.8119964599609375, 1.6072196960449219, 9.715667724609375, 2.3498096466064453, -4.4055633544921875, 0.6551170349121094, 12.430252075195312, 2.720224380493164, -6.178070068359375, 1.4806060791015625, -0.4628715515136719, 7.2347412109375, 4.37908935546875, -1.7585983276367188, -0.01491546630859375, 2.8160629272460938, 4.7687835693359375, 0.001735687255859375, 2.1582717895507812, 4.630329132080078, 3.7476119995117188, 5.3274688720703125, 0.155914306640625, -2.0748424530029297, -0.6874160766601562, 7.183206558227539, 1.9738922119140625, 2.1203231811523438, 10.201091766357422, 1.1219711303710938, 1.2403564453125, 3.917510986328125, 1.2098731994628906, 3.6539764404296875, 4.673309326171875, 4.78955078125, 3.550811767578125, 9.598583221435547, -1.9274215698242188, -0.6546173095703125, -1.125417709350586, 4.3968963623046875, -4.374412536621094, 2.7382259368896484, -1.9489898681640625, -1.7264633178710938, 5.7010345458984375, 5.260551452636719, 1.3467273712158203], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000251.npy"}
{"epoch": 0.3794406651549509, "step": 252, "batch_size": 64, "mean": 2.258627414703369, "std": 4.5768303871154785, "min": -13.495323181152344, "p10": -1.891042709350586, "median": 2.569934844970703, "p90": 7.327511596679688, "max": 12.878997802734375, "pos_frac": 0.765625, "sample": [2.9250831604003906, -12.313766479492188, -1.4133358001708984, -8.0157470703125, 5.83984375, 0.8934173583984375, 2.910449981689453, 7.3814544677734375, 0.5512466430664062, 5.8098602294921875, 10.708892822265625, 3.2029285430908203, 0.6875152587890625, 6.646995544433594, 9.542835235595703, 1.1027069091796875, 8.229225158691406, -1.9963970184326172, 2.4797134399414062, 0.3127117156982422, -3.4424667358398438, 0.6997604370117188, 7.2016448974609375, 3.874330520629883, 4.47088623046875, 2.862224578857422, 1.1783905029296875, -2.7896957397460938, 5.6580810546875, -1.4749031066894531, 2.815042495727539, 8.17236328125, 6.841970443725586, 5.9603729248046875, 2.643707275390625, 1.3740234375, 4.1151275634765625, 3.395132064819336, 2.482606887817383, -0.7986259460449219, 2.4961624145507812, -0.37256622314453125, -1.0713462829589844, 1.8420848846435547, 3.2371368408203125, 12.878997802734375, -1.90509033203125, 3.0430450439453125, -13.495323181152344, -1.8582649230957031, -1.183614730834961, 1.47265625, 1.6913318634033203, 1.4011917114257812, 2.6886215209960938, 1.5867767333984375, 3.3633575439453125, 6.486900329589844, 2.94677734375, 3.3478431701660156, 9.40045166015625, 6.9705352783203125, -1.6286563873291016, 0.4875335693359375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000252.npy"}
{"epoch": 0.38095238095238093, "step": 253, "batch_size": 64, "mean": 2.417257308959961, "std": 3.5663108825683594, "min": -3.865865707397461, "p10": -2.097083282470703, "median": 1.8774566650390625, "p90": 6.634255409240725, "max": 10.881256103515625, "pos_frac": 0.71875, "sample": [3.0436248779296875, -2.35357666015625, 8.764341354370117, 0.07828521728515625, -0.7586002349853516, 3.4297828674316406, 1.8353271484375, 5.982204437255859, 3.44281005859375, 0.8709678649902344, 4.633033752441406, -2.1443862915039062, 6.913705825805664, 1.825185775756836, 0.9950962066650391, -0.9125747680664062, -2.8802928924560547, 3.4750137329101562, -2.666168212890625, 5.2396392822265625, 10.654296875, 3.0739059448242188, -2.6685791015625, 3.398162841796875, 1.915679931640625, 1.5057220458984375, 5.483158111572266, 4.149681091308594, -0.03960418701171875, 4.939727783203125, 2.21728515625, 1.5409660339355469, 4.071784973144531, 1.1600379943847656, 1.8392333984375, 9.600234985351562, 1.1654510498046875, -2.603118896484375, 5.883995056152344, 3.536104202270508, 1.516754150390625, -1.9582901000976562, 2.0257110595703125, -3.865865707397461, 4.445304870605469, -1.0453739166259766, -0.33809471130371094, 4.4051513671875, 10.881256103515625, 0.5604171752929688, 2.5960845947265625, 10.471694946289062, 10.327629089355469, 0.9021072387695312, 5.450244903564453, -0.05051422119140625, 2.7765235900878906, -0.950897216796875, 4.980155944824219, -1.9867095947265625, -1.6523971557617188, -0.5375003814697266, 1.314565658569336, 4.798976898193359], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000253.npy"}
{"epoch": 0.382464096749811, "step": 254, "batch_size": 64, "mean": 2.7207343578338623, "std": 4.3068108558654785, "min": -6.3726806640625, "p10": -2.8272422790527343, "median": 2.800273895263672, "p90": 7.928693580627442, "max": 14.76165771484375, "pos_frac": 0.703125, "sample": [-1.1928024291992188, 5.8849639892578125, 10.157440185546875, 3.89288330078125, -0.47670745849609375, 3.345111846923828, 7.406955718994141, 4.5698699951171875, 0.6243381500244141, 0.12929344177246094, -2.8307647705078125, 4.106845855712891, 7.752525329589844, 2.982137680053711, 4.177970886230469, 5.0089111328125, 6.61456298828125, 4.580955505371094, -2.8190231323242188, 5.23028564453125, 4.298439025878906, 14.76165771484375, -3.113189697265625, 10.093463897705078, 2.7755661010742188, -3.3225860595703125, 5.610626220703125, 4.46473503112793, 8.004194259643555, -0.8233222961425781, 10.713064193725586, -4.1280364990234375, 2.8575286865234375, -0.7830295562744141, -2.0078163146972656, 2.29583740234375, -1.2107658386230469, 3.196033477783203, -0.2284698486328125, 2.824981689453125, 1.9578399658203125, 0.3413963317871094, 4.517051696777344, -3.6223678588867188, 0.9566669464111328, 9.298072814941406, -1.8049240112304688, 6.995929718017578, 5.438568115234375, 0.8856372833251953, 2.88531494140625, -2.0237960815429688, -1.1835174560546875, 1.517852783203125, 10.458541870117188, 2.4390945434570312, 1.916351318359375, 1.4513511657714844, 7.689544677734375, 7.7216644287109375, 2.6710948944091797, -6.3726806640625, -1.285003662109375, -4.147346496582031], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000254.npy"}
{"epoch": 0.3839758125472411, "step": 255, "batch_size": 64, "mean": 2.979689598083496, "std": 4.240329742431641, "min": -7.0545501708984375, "p10": -1.756422424316406, "median": 2.65802001953125, "p90": 9.471342849731446, "max": 12.80355453491211, "pos_frac": 0.734375, "sample": [9.247356414794922, -0.34856414794921875, 10.794204711914062, 1.716400146484375, 6.5330963134765625, -0.657196044921875, -0.8750572204589844, 0.9927444458007812, 0.6133613586425781, 3.9052276611328125, -2.8211746215820312, -0.16623878479003906, 1.1314811706542969, 5.626148223876953, -1.3680477142333984, 2.6168479919433594, 5.007606506347656, 4.636817932128906, 0.18392181396484375, -7.0545501708984375, 8.201473236083984, -0.49402618408203125, 2.7087326049804688, 1.7853813171386719, 8.059005737304688, 3.0471954345703125, -1.0422286987304688, 0.9400291442871094, 9.184799194335938, 1.5958175659179688, 11.511306762695312, 6.055259704589844, -1.8715133666992188, 2.6991920471191406, 3.137176513671875, 2.964406967163086, 5.705938339233398, 9.8350830078125, 9.567337036132812, 3.181619644165039, 1.0177173614501953, 3.29290771484375, 6.1061248779296875, -5.2967071533203125, 1.424844741821289, -1.4878768920898438, 1.1583480834960938, 10.4786376953125, -0.49843597412109375, 3.0894317626953125, 1.9493522644042969, 5.586906433105469, 3.617279052734375, 12.80355453491211, 1.2150192260742188, 4.392148971557617, 0.7081565856933594, 6.604732513427734, 11.151077270507812, -2.5310096740722656, -1.9816970825195312, -2.1195297241210938, 4.21136474609375, -0.6785964965820312], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000255.npy"}
{"epoch": 0.3854875283446712, "step": 256, "batch_size": 64, "mean": 2.671656608581543, "std": 4.174123764038086, "min": -9.071174621582031, "p10": -1.7801734924316404, "median": 2.146364212036133, "p90": 8.367211151123048, "max": 13.7733154296875, "pos_frac": 0.734375, "sample": [-0.16872406005859375, 2.157257080078125, 4.760702133178711, 2.3136253356933594, 8.422470092773438, -1.359832763671875, -0.8980598449707031, 8.238273620605469, 2.027780532836914, 0.9500064849853516, 0.012939453125, 0.8743495941162109, -1.88275146484375, 8.6162109375, 5.7748870849609375, 12.986717224121094, 0.3225555419921875, 2.5867538452148438, -9.071174621582031, 4.494102478027344, 0.9364128112792969, 2.55926513671875, -2.103595733642578, 2.1026992797851562, -3.9099960327148438, -2.1610260009765625, 2.4571666717529297, -0.4341888427734375, 3.4948272705078125, 2.421306610107422, 1.304534912109375, 6.33270263671875, 2.242828369140625, 4.747550964355469, 5.530912399291992, 5.819358825683594, 7.217475891113281, 7.859748840332031, -2.7472381591796875, 7.749547958374023, -1.5408248901367188, 2.4695892333984375, 9.832122802734375, -0.7027130126953125, 8.81489372253418, 1.362396240234375, 3.3924713134765625, 1.2434539794921875, 0.9849567413330078, 13.7733154296875, 1.115966796875, 6.4295196533203125, -1.462005615234375, 1.5100593566894531, 4.7386932373046875, -1.1207809448242188, 8.65765380859375, -0.9027862548828125, 0.8070487976074219, 7.7082672119140625, -1.2749977111816406, 3.7751007080078125, -3.3392410278320312, 2.1354713439941406], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000256.npy"}
{"epoch": 0.3869992441421013, "step": 257, "batch_size": 64, "mean": 3.8598098754882812, "std": 5.015024662017822, "min": -8.876901626586914, "p10": -1.0339149475097653, "median": 3.3014907836914062, "p90": 11.38736953735352, "max": 15.496692657470703, "pos_frac": 0.78125, "sample": [-0.8116989135742188, 1.1578369140625, 2.30914306640625, 15.393646240234375, 11.753807067871094, 6.592197418212891, -0.06751251220703125, 6.772821426391602, -0.5492706298828125, 5.6723480224609375, 6.200164794921875, 4.2316741943359375, 2.5229110717773438, -1.5488052368164062, 7.043914794921875, 10.094261169433594, 10.5323486328125, -1.2645950317382812, -7.815216064453125, -0.1693267822265625, 0.21351242065429688, 2.5263137817382812, 5.90704345703125, 0.8594436645507812, 3.2522506713867188, 0.3007354736328125, 3.5550575256347656, 12.800712585449219, 5.1067657470703125, -0.5904312133789062, 5.297542572021484, 10.358173370361328, 3.0400848388671875, 1.0898361206054688, 6.292015075683594, 4.972648620605469, 2.532012939453125, 6.4002227783203125, 15.496692657470703, 0.3152961730957031, -8.876901626586914, 14.83868408203125, 3.7207183837890625, 3.7578353881835938, 1.763092041015625, -0.07923507690429688, 1.2981529235839844, 0.7108840942382812, 3.3507308959960938, 4.802818298339844, 5.57501220703125, 2.0160140991210938, 6.355476379394531, 0.6871986389160156, -1.129150390625, 14.786842346191406, -0.7697257995605469, -2.6837387084960938, 11.842880249023438, 5.537864685058594, 5.016880035400391, 5.81719970703125, 3.1569061279296875, -2.2451858520507812], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000257.npy"}
{"epoch": 0.3885109599395314, "step": 258, "batch_size": 64, "mean": 2.665008068084717, "std": 3.215672731399536, "min": -4.322971343994141, "p10": -1.1578613281249999, "median": 2.404277801513672, "p90": 7.148544311523438, "max": 10.666219711303711, "pos_frac": 0.8125, "sample": [3.8453216552734375, 1.2520256042480469, -0.9184646606445312, -2.696229934692383, 2.0352325439453125, 10.666219711303711, 7.22991943359375, 4.243988037109375, 7.971210479736328, 0.2924060821533203, 2.1759490966796875, 1.0265121459960938, -1.208160400390625, 4.719377517700195, 1.1257553100585938, 5.8856048583984375, 1.2107772827148438, -1.6184844970703125, 2.6326065063476562, -4.322971343994141, 1.4188957214355469, 0.16644287109375, -2.5936203002929688, 2.8406448364257812, 3.0924072265625, 3.665271759033203, 3.8567886352539062, 2.9651451110839844, 1.5813331604003906, 0.8300590515136719, 2.8548583984375, 3.4379234313964844, 3.3171768188476562, 4.811979293823242, 1.0934028625488281, -1.040496826171875, 3.802225112915039, 0.287109375, -0.376708984375, 9.684356689453125, 4.963279724121094, -2.143402099609375, 6.4078216552734375, 4.0425262451171875, 6.977325439453125, 4.680192947387695, 7.221923828125, 2.770742416381836, 1.3322067260742188, 5.109617233276367, -0.00441741943359375, 1.6172637939453125, 2.1001968383789062, 1.1002731323242188, 1.8751106262207031, 1.1452503204345703, 8.456787109375, -3.775787353515625, 4.253658294677734, -0.2703380584716797, 4.50554084777832, 4.875598907470703, 10.187284469604492, 1.8880748748779297], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000258.npy"}
{"epoch": 0.3900226757369615, "step": 259, "batch_size": 64, "mean": 4.3964128494262695, "std": 4.396032333374023, "min": -4.270538330078125, "p10": -1.2107406616210936, "median": 4.953386306762695, "p90": 9.720844650268555, "max": 16.289886474609375, "pos_frac": 0.828125, "sample": [-0.26596832275390625, 1.969390869140625, -4.270538330078125, 5.391014099121094, 5.269317626953125, 6.878498077392578, 9.612205505371094, 9.518447875976562, 6.063861846923828, -3.2860946655273438, 8.898933410644531, -1.0318756103515625, 1.6551380157470703, 8.63604736328125, 6.362010955810547, 1.6445770263671875, -3.5220394134521484, 0.8249645233154297, 11.179061889648438, 9.311965942382812, 2.2615585327148438, 6.8101348876953125, 9.685962677001953, 7.8295745849609375, 7.798225402832031, -1.9405860900878906, -1.2560958862304688, 1.0819664001464844, 1.9330368041992188, 4.152124404907227, 5.576984405517578, 3.7909774780273438, -2.8579063415527344, 5.23626708984375, 1.3734664916992188, 2.3670196533203125, 10.055259704589844, 2.420074462890625, 7.897430419921875, 6.23486328125, 9.198772430419922, 5.098407745361328, 0.25543212890625, 16.289886474609375, 2.5831832885742188, 5.858367919921875, -2.910858154296875, -0.3823375701904297, -1.1049118041992188, 0.3334465026855469, 9.735794067382812, 10.919944763183594, 4.38470458984375, 2.3485107421875, 2.5539169311523438, 10.950382232666016, 2.118743896484375, 5.327751159667969, 1.1217765808105469, 6.189781188964844, 4.8083648681640625, 11.132743835449219, 6.583442687988281, 6.685918807983398], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000259.npy"}
{"epoch": 0.3915343915343915, "step": 260, "batch_size": 64, "mean": 3.252629041671753, "std": 4.479521751403809, "min": -5.499479293823242, "p10": -2.444050979614258, "median": 3.162107467651367, "p90": 8.999559020996093, "max": 14.190711975097656, "pos_frac": 0.703125, "sample": [7.234378814697266, -2.5111961364746094, 5.702423095703125, -2.456340789794922, -2.3798141479492188, 14.190711975097656, 2.9415817260742188, 4.742767333984375, 8.658447265625, -2.415374755859375, 8.238533020019531, 3.1445865631103516, 6.4870758056640625, 6.5246429443359375, 7.504219055175781, -2.7611045837402344, 10.648605346679688, 2.3501129150390625, 5.141057968139648, 13.720535278320312, 10.900146484375, -0.6004562377929688, 5.233802795410156, 8.306095123291016, -1.291595458984375, 0.05130767822265625, 8.07241439819336, 9.036865234375, -0.9305534362792969, 3.636322021484375, 5.599773406982422, 4.72760009765625, 6.054374694824219, 0.25136566162109375, 3.179628372192383, 2.067668914794922, -0.4488105773925781, -0.23263168334960938, -5.499479293823242, 4.754268646240234, 0.48073577880859375, 4.474720001220703, 0.3363189697265625, -1.1885223388671875, 5.221099853515625, 5.170055389404297, 9.034271240234375, -1.3650741577148438, 9.047449111938477, -2.169921875, -1.6263084411621094, 2.8648643493652344, 2.4549942016601562, 2.861663818359375, 4.288702011108398, 3.5419540405273438, -1.0394287109375, 4.978557586669922, 8.918563842773438, -3.8175582885742188, 2.4012222290039062, -3.008586883544922, -3.052356719970703, 1.7868881225585938], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000260.npy"}
{"epoch": 0.3930461073318216, "step": 261, "batch_size": 64, "mean": 2.7603230476379395, "std": 4.29247522354126, "min": -5.4632415771484375, "p10": -2.901860427856445, "median": 2.2729454040527344, "p90": 8.37584800720215, "max": 12.968456268310547, "pos_frac": 0.75, "sample": [-0.056285858154296875, 3.92327880859375, 3.712635040283203, -0.3349037170410156, -3.8059921264648438, 7.845897674560547, -0.6667137145996094, 7.4174041748046875, 4.827062606811523, -0.5227146148681641, -5.366508483886719, -1.0654296875, -1.3702163696289062, 8.470523834228516, 2.5650291442871094, 0.1688079833984375, 1.5305900573730469, 9.226509094238281, 0.1989269256591797, 10.70077896118164, 1.3592910766601562, 4.4721832275390625, 0.7707748413085938, -5.3299560546875, 8.154937744140625, 2.6089935302734375, 12.618240356445312, 1.9674911499023438, 1.0313644409179688, 2.453989028930664, 4.487827301025391, 4.199880599975586, 7.532920837402344, 6.1082763671875, 1.8286056518554688, -1.1022300720214844, 12.968456268310547, -0.4537506103515625, -3.424205780029297, 3.7972946166992188, 2.1408920288085938, 0.9001674652099609, 3.9030027389526367, 4.524162292480469, 2.7441158294677734, -4.36236572265625, 1.1761207580566406, -5.4632415771484375, -3.0771141052246094, 2.093128204345703, 4.747352600097656, 0.1875152587890625, 9.568130493164062, 10.480234146118164, 2.404998779296875, -2.4929351806640625, 4.777240753173828, 7.224693298339844, 0.14708709716796875, 8.003158569335938, 1.5503921508789062, 2.1283531188964844, 5.745887756347656, 4.16064453125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000261.npy"}
{"epoch": 0.3945578231292517, "step": 262, "batch_size": 64, "mean": 3.237910270690918, "std": 3.9321908950805664, "min": -2.9169082641601562, "p10": -1.0111778259277342, "median": 1.7544422149658203, "p90": 8.813964843750002, "max": 13.134765625, "pos_frac": 0.84375, "sample": [-2.151885986328125, 5.097053527832031, 1.5379524230957031, 1.633432388305664, -1.5153045654296875, 5.66624641418457, 5.0565643310546875, 1.5289077758789062, 1.2406253814697266, 9.943359375, -0.9342575073242188, 8.930648803710938, 0.13946533203125, 0.2153301239013672, 11.518726348876953, -2.9169082641601562, 0.8671321868896484, 12.664360046386719, 7.829620361328125, 3.145477294921875, 0.5922393798828125, 0.08015060424804688, 0.5148544311523438, 0.0098419189453125, 4.73211669921875, 10.630081176757812, 0.360809326171875, 7.8902740478515625, -0.06536865234375, 0.19089508056640625, 8.541702270507812, 0.29293060302734375, 2.7747650146484375, 4.360889434814453, -1.9270553588867188, 7.7548980712890625, 2.670381546020508, 7.658931732177734, 5.259918212890625, 1.8432655334472656, -1.0441436767578125, 3.489990234375, -0.9089126586914062, 6.322967529296875, 13.134765625, 2.7452030181884766, 1.4669303894042969, 0.9540882110595703, -2.8637924194335938, 2.264575958251953, 3.6372833251953125, 0.003894805908203125, 8.950119018554688, 0.947540283203125, 7.17057991027832, 1.0281524658203125, 6.9596405029296875, 7.6115875244140625, 1.6148681640625, 4.4419403076171875, 1.4070701599121094, 1.665618896484375, 4.0274810791015625, -1.464263916015625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000262.npy"}
{"epoch": 0.3960695389266818, "step": 263, "batch_size": 64, "mean": 2.659841299057007, "std": 4.202951431274414, "min": -6.9544525146484375, "p10": -2.530517196655273, "median": 2.4016857147216797, "p90": 7.9219667434692385, "max": 12.672157287597656, "pos_frac": 0.703125, "sample": [1.7783546447753906, 5.795970916748047, -4.239524841308594, -0.7673797607421875, -2.5955162048339844, -0.6176891326904297, -0.072052001953125, 1.37823486328125, 4.879495620727539, 12.300844192504883, 4.520782470703125, -0.6300888061523438, 1.3519134521484375, 5.280147552490234, 1.7817001342773438, -4.979652404785156, 7.97454833984375, 1.725006103515625, 4.424186706542969, -2.3788528442382812, 4.759162902832031, 4.0226898193359375, 7.4663238525390625, 2.6864089965820312, -0.11408615112304688, 0.9108009338378906, 8.324508666992188, 4.3170318603515625, 6.2178497314453125, 3.923370361328125, 10.429460525512695, 3.983062744140625, -2.92279052734375, 0.02896881103515625, -1.6494941711425781, -0.9792098999023438, -4.645744323730469, 8.274730682373047, 4.530204772949219, 5.722053527832031, 5.1561431884765625, -0.9629058837890625, -0.1304473876953125, 12.672157287597656, -2.2574996948242188, 0.32700347900390625, 5.7037506103515625, 4.881477355957031, 4.982181549072266, 1.3842544555664062, -0.28668212890625, 10.278076171875, 0.525543212890625, 0.5941658020019531, 5.11590576171875, 0.8159828186035156, 7.670310974121094, 5.836725234985352, 2.610828399658203, -3.525411605834961, 2.1925430297851562, -6.9544525146484375, 7.799276351928711, 3.6051902770996094], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000263.npy"}
{"epoch": 0.3975812547241119, "step": 264, "batch_size": 64, "mean": 3.1300601959228516, "std": 4.9980854988098145, "min": -8.100555419921875, "p10": -2.4380104064941404, "median": 2.787872314453125, "p90": 9.959605407714845, "max": 15.574150085449219, "pos_frac": 0.71875, "sample": [9.100936889648438, -0.8239898681640625, 2.33099365234375, 0.8722724914550781, 1.5162887573242188, -2.1426563262939453, 2.0728073120117188, -0.22017669677734375, 5.053318023681641, 5.855072021484375, -1.5542678833007812, 2.365192413330078, -8.100555419921875, 7.998466491699219, -0.8321914672851562, 7.2100677490234375, 7.503021240234375, 0.745025634765625, -0.6370429992675781, 6.791351318359375, 2.7196693420410156, 3.5735721588134766, 6.212333679199219, 9.799858093261719, 3.4800262451171875, 15.311824798583984, -0.4141082763671875, 3.16644287109375, -1.600830078125, 2.5999221801757812, 4.6577606201171875, 0.00638580322265625, 8.769447326660156, 1.0776481628417969, 0.28350067138671875, 1.4249191284179688, 10.46786880493164, -2.3457412719726562, 2.9558639526367188, 10.320816040039062, -3.3364715576171875, -4.775764465332031, 3.1741714477539062, 6.0251312255859375, 10.028068542480469, 6.308956146240234, -3.221893310546875, 6.00701904296875, -7.130683898925781, 2.8560752868652344, -2.4775543212890625, 3.961578369140625, 15.213582992553711, 4.039508819580078, 0.017124176025390625, -2.8568878173828125, 0.7093582153320312, 10.05096435546875, 3.9762725830078125, -0.2469005584716797, -1.05859375, 15.574150085449219, 3.5164871215820312, 6.3990325927734375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000264.npy"}
{"epoch": 0.39909297052154197, "step": 265, "batch_size": 64, "mean": 3.319258451461792, "std": 4.87885046005249, "min": -9.407432556152344, "p10": -2.3185577392578116, "median": 3.106867790222168, "p90": 10.224722290039063, "max": 12.652576446533203, "pos_frac": 0.78125, "sample": [2.7078628540039062, 1.2650375366210938, -0.05364227294921875, -7.130241394042969, 3.4075069427490234, 1.6296768188476562, 1.3208389282226562, 6.818092346191406, -4.6345672607421875, 10.283943176269531, 0.8060111999511719, 1.9708595275878906, 4.984596252441406, 4.886566162109375, -9.407432556152344, 0.6190109252929688, 2.6294078826904297, 4.033512115478516, 5.601232528686523, -5.858264923095703, 1.3629226684570312, 7.489776611328125, 7.655773162841797, 4.315832138061523, 0.5306854248046875, 0.5607643127441406, 10.37989616394043, 2.3300857543945312, 4.541561126708984, -1.1807689666748047, 10.086540222167969, 9.7113037109375, -0.2421417236328125, 10.418317794799805, 4.227912902832031, 10.654937744140625, 1.695159912109375, 0.4541969299316406, 0.31742286682128906, -2.7786331176757812, 2.9705581665039062, 4.205305099487305, 4.130903244018555, 8.218757629394531, 12.652576446533203, 10.878433227539062, 10.288335800170898, 5.415718078613281, 2.3873825073242188, -3.7746658325195312, 4.7818756103515625, 1.84710693359375, -1.0344886779785156, 9.681877136230469, 9.106101989746094, 9.287590026855469, 9.363334655761719, -6.524272918701172, 3.2431774139404297, -1.0758705139160156, 4.810279846191406, -1.2450485229492188, 5.269523620605469, -0.863494873046875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000265.npy"}
{"epoch": 0.40060468631897206, "step": 266, "batch_size": 64, "mean": 3.107522487640381, "std": 4.820417881011963, "min": -7.4736175537109375, "p10": -2.483747100830078, "median": 3.1425981521606445, "p90": 8.40233383178711, "max": 17.34691619873047, "pos_frac": 0.71875, "sample": [0.09432601928710938, 8.516868591308594, 13.305931091308594, 2.141460418701172, 4.8809051513671875, -0.5163097381591797, -0.6816635131835938, -3.804656982421875, 3.3072986602783203, -0.5296630859375, 5.013404846191406, 5.0619049072265625, 0.6497974395751953, 3.9470653533935547, 3.9145679473876953, 4.707324981689453, 0.0518798828125, 17.34691619873047, -0.07750701904296875, 2.0890045166015625, 12.915420532226562, 4.3101348876953125, -0.29465484619140625, 11.279539108276367, 5.974176406860352, 7.60809326171875, -2.4951248168945312, 8.135086059570312, 4.631237030029297, -0.24127960205078125, 6.410924911499023, 2.6083221435546875, 7.457122802734375, -1.2825088500976562, 3.283601760864258, -2.4571990966796875, 4.171073913574219, 6.77886962890625, 3.3779563903808594, 1.3674354553222656, -0.9805679321289062, -4.2820587158203125, 0.1260986328125, 3.648975372314453, -7.4736175537109375, 1.761688232421875, 3.8434791564941406, 3.0015945434570312, 3.57171630859375, 3.9252700805664062, -6.4007568359375, 0.8518581390380859, -4.0786590576171875, -2.726409912109375, 6.887359619140625, 2.9073753356933594, 14.967132568359375, -1.311452865600586, 8.962135314941406, 2.8985137939453125, 7.74662971496582, 2.7139034271240234, 6.619270324707031, -1.2551231384277344], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000266.npy"}
{"epoch": 0.4021164021164021, "step": 267, "batch_size": 64, "mean": 2.443159580230713, "std": 4.479851722717285, "min": -11.376598358154297, "p10": -2.2297860145568844, "median": 2.2111358642578125, "p90": 7.138975143432617, "max": 15.018302917480469, "pos_frac": 0.6875, "sample": [1.419921875, 1.2007026672363281, -0.10317230224609375, -6.131898880004883, 3.0742645263671875, -2.2890357971191406, 7.163013458251953, 6.740325927734375, 4.5175628662109375, 1.6476325988769531, -0.5913314819335938, 3.6258392333984375, -0.7276382446289062, 15.018302917480469, 0.3095893859863281, -4.420219421386719, 0.33139801025390625, -2.091536521911621, 0.896148681640625, -4.722412109375, -1.8298053741455078, -3.08502197265625, 0.9035568237304688, 4.779117584228516, 10.199569702148438, 9.595453262329102, -1.0532455444335938, 0.3222637176513672, 7.0828857421875, -3.3701171875, 3.4603500366210938, 1.9222564697265625, 7.668968200683594, 4.383333206176758, -0.13700103759765625, 4.291820526123047, 6.197578430175781, 2.73638916015625, 2.5000152587890625, 14.263008117675781, 3.3106002807617188, -0.2017650604248047, 9.102920532226562, -0.5939254760742188, 5.713768005371094, 3.864706039428711, 6.4491119384765625, 4.6627960205078125, -11.376598358154297, 5.3013916015625, 5.105583190917969, 1.2354736328125, -1.2430343627929688, 1.3872032165527344, -0.9188003540039062, -0.7155570983886719, 5.118907928466797, 5.979917526245117, 4.687324523925781, 1.1621246337890625, 3.4279747009277344, 5.926959991455078, 3.6781158447265625, -0.4018268585205078], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000267.npy"}
{"epoch": 0.4036281179138322, "step": 268, "batch_size": 64, "mean": 2.9329018592834473, "std": 4.029765605926514, "min": -4.69622802734375, "p10": -1.4889247894287105, "median": 2.4859466552734375, "p90": 8.605474853515627, "max": 12.536392211914062, "pos_frac": 0.765625, "sample": [2.4760894775390625, 11.520721435546875, 0.6958999633789062, -2.31207275390625, 2.4958038330078125, 6.695426940917969, 3.4866180419921875, 0.17734527587890625, -2.1128807067871094, 8.803375244140625, 4.395011901855469, 0.7363510131835938, -0.6958847045898438, 0.8573150634765625, 0.7525405883789062, 1.2676162719726562, 4.283060073852539, 0.1467132568359375, -0.22449493408203125, 2.1699790954589844, 5.607698440551758, 8.099937438964844, -0.2178192138671875, 0.5301284790039062, -4.168567657470703, 2.303253173828125, 3.7725601196289062, 2.25750732421875, 4.4330596923828125, 4.405540466308594, 6.282112121582031, 2.602008819580078, 0.225311279296875, 4.807315826416016, -0.41501617431640625, 9.524772644042969, 0.38823699951171875, -3.511302947998047, -4.69622802734375, 2.8297996520996094, 3.2905960083007812, 0.6944694519042969, -0.47332000732421875, 12.536392211914062, 8.143707275390625, 4.58221435546875, 8.134223937988281, -0.43967628479003906, 2.812938690185547, -1.6225471496582031, -3.4740219116210938, -1.1771392822265625, 7.555095672607422, 0.37551116943359375, 11.501937866210938, 4.110004425048828, 4.5561981201171875, 6.8485565185546875, 10.500728607177734, 2.8204574584960938, 0.4032917022705078, 9.860687255859375, 5.995616912841797, -0.5050411224365234], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000268.npy"}
{"epoch": 0.4051398337112623, "step": 269, "batch_size": 64, "mean": 3.2884888648986816, "std": 4.139364719390869, "min": -5.780204772949219, "p10": -1.6518590927124024, "median": 2.8981475830078125, "p90": 9.316984558105469, "max": 11.608566284179688, "pos_frac": 0.796875, "sample": [2.3326492309570312, -3.1066017150878906, 2.1019287109375, 5.964664459228516, 4.064666748046875, -0.9303665161132812, 1.117431640625, 9.298751831054688, 5.063591003417969, 1.2002029418945312, -3.1653099060058594, 6.190032958984375, 3.9455909729003906, 3.8860321044921875, 9.727989196777344, 9.649650573730469, 0.9806938171386719, 3.8248367309570312, 9.324798583984375, -4.970920562744141, 1.3380126953125, -3.990020751953125, 8.724723815917969, 0.7711410522460938, -1.78643798828125, 10.510292053222656, 0.03333091735839844, 11.608566284179688, 0.9464302062988281, 7.281150817871094, 3.3099098205566406, 2.560260772705078, -0.5117301940917969, 4.297344207763672, 0.816650390625, 1.4179878234863281, 0.7949562072753906, 3.0232696533203125, -1.6064949035644531, 5.680095672607422, 11.553058624267578, 6.420680999755859, 2.3831710815429688, 4.046043395996094, 6.689537048339844, -1.6713008880615234, 4.404153823852539, 1.5498504638671875, -1.5078353881835938, 0.718170166015625, 10.310516357421875, 1.00634765625, 8.925445556640625, -0.5644073486328125, 2.2488555908203125, -5.780204772949219, 7.4075775146484375, 7.167976379394531, -0.36994171142578125, 5.981048583984375, 4.072418212890625, 3.7896652221679688, 7.189680099487305, 2.7730255126953125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000269.npy"}
{"epoch": 0.40665154950869237, "step": 270, "batch_size": 64, "mean": 2.442448616027832, "std": 3.878445625305176, "min": -8.87127685546875, "p10": -1.8113536834716795, "median": 2.95596981048584, "p90": 6.6215126037597685, "max": 14.415092468261719, "pos_frac": 0.796875, "sample": [5.873987197875977, 7.48457145690918, -2.6898555755615234, 11.638137817382812, 1.7755584716796875, 10.236328125, 2.6410064697265625, 0.056468963623046875, 3.8654632568359375, 0.5743408203125, 14.415092468261719, 4.05035400390625, 3.1034011840820312, 3.6284561157226562, 1.6571998596191406, -2.9216079711914062, 2.970733642578125, 3.3806495666503906, 0.13832473754882812, 6.017082214355469, 0.8143768310546875, 3.031585693359375, 0.833740234375, -1.9038276672363281, 1.0292434692382812, 3.945049285888672, 4.725151062011719, -8.87127685546875, 3.380870819091797, 1.469228744506836, 3.8937530517578125, 0.598388671875, -0.15116119384765625, 7.009696960449219, -0.08355712890625, 4.391448974609375, 3.2413558959960938, -1.0991783142089844, 3.342243194580078, -7.495510101318359, 6.88055419921875, 7.929634094238281, 5.0897369384765625, -2.8676013946533203, 2.9412059783935547, 2.5102615356445312, -1.4781646728515625, 3.6068954467773438, 3.6869049072265625, 3.1320247650146484, 5.746969223022461, 0.07874870300292969, -0.96343994140625, 4.4002838134765625, 2.4745864868164062, -1.5955810546875, 0.8962631225585938, -4.718406677246094, 5.4485931396484375, 2.7950401306152344, 2.1974945068359375, 0.2757720947265625, 2.9981918334960938, 4.853424072265625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000270.npy"}
{"epoch": 0.40816326530612246, "step": 271, "batch_size": 64, "mean": 2.243441104888916, "std": 5.223219871520996, "min": -6.871133804321289, "p10": -3.6437145233154293, "median": 1.0469255447387695, "p90": 9.307927703857425, "max": 18.3602294921875, "pos_frac": 0.59375, "sample": [4.5731353759765625, 3.7595367431640625, -2.9034271240234375, 2.8451385498046875, 10.072502136230469, 5.4504241943359375, 2.4271068572998047, -0.2282257080078125, 8.195770263671875, -3.7250099182128906, 8.155105590820312, -4.940763473510742, 10.65411376953125, -0.5929603576660156, 5.371421813964844, 10.073356628417969, 1.0629215240478516, 1.0309295654296875, 0.3885059356689453, 12.105064392089844, 2.236328125, 4.144649505615234, 5.828130722045898, -2.5472259521484375, -0.339111328125, -0.4132652282714844, -2.5931549072265625, 4.7631683349609375, 0.6655120849609375, 0.7740116119384766, 0.47016143798828125, 4.606719970703125, -1.4699783325195312, -3.846658706665039, -1.4191131591796875, 9.580207824707031, 2.5655174255371094, 0.09510040283203125, 4.0280609130859375, 7.950372695922852, -0.4840545654296875, -4.889339447021484, 18.3602294921875, -3.4540252685546875, 1.9832191467285156, -1.50994873046875, -0.4054603576660156, -6.871133804321289, -6.242347717285156, -2.8297653198242188, -0.9277839660644531, -0.9272003173828125, 6.517795562744141, 8.672607421875, 5.937198638916016, 6.541961669921875, -6.0718994140625, 4.95257568359375, -1.4640731811523438, 1.0918121337890625, -0.8048572540283203, -1.873687744140625, 15.800128936767578, 3.62420654296875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000271.npy"}
{"epoch": 0.40967498110355255, "step": 272, "batch_size": 64, "mean": 3.835308313369751, "std": 4.23404598236084, "min": -3.43603515625, "p10": -0.5874589920043944, "median": 2.8604774475097656, "p90": 10.504680633544922, "max": 15.306900024414062, "pos_frac": 0.828125, "sample": [0.46121978759765625, -1.0635757446289062, -1.9101905822753906, 4.387884140014648, 2.5979385375976562, 10.978160858154297, 3.827117919921875, 8.320320129394531, 10.050437927246094, 3.043975830078125, 5.504783630371094, 4.640235900878906, -0.2932262420654297, -0.8329849243164062, 3.4530868530273438, 1.4943962097167969, 4.7070465087890625, -0.09889984130859375, 8.510772705078125, 10.876731872558594, 5.481073379516602, 1.7948741912841797, 4.519374847412109, 3.746511459350586, 12.391416549682617, 1.6824188232421875, 2.6769790649414062, -3.0432052612304688, 3.2247161865234375, 10.334503173828125, 0.8498001098632812, 14.104320526123047, 2.4906768798828125, 0.4405632019042969, 6.172811508178711, 0.7152976989746094, -0.5005664825439453, 1.2909507751464844, 5.2262725830078125, -0.7173347473144531, 7.956413269042969, 12.055717468261719, 10.577613830566406, 1.8329429626464844, -0.0389404296875, 0.46750831604003906, 15.306900024414062, 0.41156768798828125, 4.506076812744141, 6.453643798828125, 3.388467788696289, 6.375820159912109, -3.43603515625, 1.0102386474609375, 2.1254348754882812, 1.760223388671875, 5.687950134277344, 5.7397003173828125, -0.6246986389160156, 0.3059253692626953, 0.5827255249023438, 0.9079456329345703, 8.055309295654297, 2.514589309692383], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000272.npy"}
{"epoch": 0.41118669690098264, "step": 273, "batch_size": 64, "mean": 3.344717025756836, "std": 4.702366352081299, "min": -7.313495635986328, "p10": -2.1933753967285154, "median": 3.2905845642089844, "p90": 8.162949371337891, "max": 17.84393310546875, "pos_frac": 0.78125, "sample": [-1.9450759887695312, 4.781414031982422, 5.17083740234375, -3.810791015625, 9.098968505859375, 1.8379592895507812, -1.2998046875, 5.071353912353516, -4.293327331542969, 3.0065765380859375, -0.33280181884765625, -2.2997894287109375, 4.045005798339844, 0.6616916656494141, 4.847782135009766, 5.823860168457031, 8.156257629394531, 0.8032627105712891, 8.165817260742188, 3.294727325439453, 5.929100036621094, 3.347850799560547, -1.31060791015625, -1.1181297302246094, -4.5805816650390625, 6.426025390625, 5.3539276123046875, 3.2864418029785156, -0.3134727478027344, 1.6189079284667969, 4.1293792724609375, 17.84393310546875, 5.5525665283203125, -7.313495635986328, -0.39379119873046875, 4.522491455078125, 4.780666351318359, 6.706752777099609, 1.4665145874023438, 3.3087921142578125, -3.5368270874023438, 2.60955810546875, 5.874565124511719, 0.9420547485351562, 1.5270156860351562, 15.653457641601562, 13.918302536010742, 5.7607879638671875, 3.4582977294921875, 5.770374298095703, 0.5923423767089844, 1.1763191223144531, 1.0065269470214844, 0.910400390625, 1.4146347045898438, 8.034187316894531, 3.2539825439453125, 8.037483215332031, 1.523294448852539, 0.7049789428710938, 9.621650695800781, 12.739309310913086, -3.0014991760253906, 6.043487548828125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000273.npy"}
{"epoch": 0.4126984126984127, "step": 274, "batch_size": 64, "mean": 3.1323938369750977, "std": 4.395838737487793, "min": -7.802009582519531, "p10": -1.7618515014648437, "median": 2.794745445251465, "p90": 8.33728485107422, "max": 13.833564758300781, "pos_frac": 0.78125, "sample": [5.52215576171875, -0.7993850708007812, -1.4102306365966797, 4.462318420410156, 0.3914794921875, -7.802009582519531, 6.673011779785156, 6.896903991699219, 1.7432136535644531, 2.5814266204833984, 7.089450836181641, 4.7764434814453125, 1.061553955078125, -2.550079345703125, 3.113800048828125, 9.903968811035156, 12.078596115112305, -5.029010772705078, 2.554891586303711, 2.7760066986083984, 8.430160522460938, 6.731815338134766, 6.903606414794922, 5.948633193969727, 4.6351776123046875, -0.2155933380126953, 9.124565124511719, 13.833564758300781, 1.3551387786865234, -1.5224227905273438, 0.1280517578125, 1.19146728515625, 4.695747375488281, 0.22613525390625, -1.6978607177734375, 7.419246673583984, -4.8178558349609375, -2.5403900146484375, 5.075023651123047, 3.1948013305664062, -0.8217582702636719, 4.423919677734375, 0.9193382263183594, 1.7895965576171875, 0.02655792236328125, 3.937286376953125, 2.444305419921875, -5.995208740234375, 3.9515151977539062, 1.4108009338378906, 10.189521789550781, 8.120574951171875, 3.7179622650146484, -0.2223663330078125, 5.019096374511719, -1.789276123046875, 1.5746803283691406, 2.638631820678711, 11.877822875976562, 7.268228530883789, 0.5927314758300781, 2.8134841918945312, 7.797401428222656, 6.654836654663086], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000274.npy"}
{"epoch": 0.41421012849584277, "step": 275, "batch_size": 64, "mean": 3.5030534267425537, "std": 4.758714199066162, "min": -9.545841217041016, "p10": -1.5040081024169922, "median": 2.5397281646728516, "p90": 10.137905883789065, "max": 13.91314697265625, "pos_frac": 0.765625, "sample": [7.1815338134765625, 3.4010448455810547, 0.5755615234375, 8.117429733276367, 9.456085205078125, 2.5104713439941406, 2.4470977783203125, -9.545841217041016, 13.91314697265625, 8.186676025390625, 0.221099853515625, 3.420806884765625, 1.8972091674804688, 13.27731704711914, -0.21387863159179688, 6.108772277832031, 6.547119140625, 4.905969619750977, 13.77056884765625, -1.9686965942382812, -5.559844970703125, 13.655506134033203, 2.1732864379882812, 1.5045394897460938, 5.515419006347656, 0.46436309814453125, -1.5262107849121094, 2.2523422241210938, -0.3139152526855469, 0.9927902221679688, 2.5689849853515625, 13.480087280273438, -0.44712066650390625, 5.419792175292969, 7.215908050537109, 5.887535095214844, 1.6025657653808594, 2.4498424530029297, -1.4522018432617188, 0.8134994506835938, 5.4111175537109375, 0.4594268798828125, -1.5357284545898438, -0.6075210571289062, 3.6771926879882812, 6.914512634277344, 6.7074737548828125, 4.375511169433594, -3.8293304443359375, 1.6614532470703125, 1.4393196105957031, -0.5787391662597656, -1.7362556457519531, 4.8654632568359375, 10.48503303527832, 10.43011474609375, -0.9345779418945312, 4.319877624511719, 3.2913742065429688, 8.172468185424805, 6.756229400634766, 0.07521820068359375, -0.6352443695068359, 4.104377746582031], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000275.npy"}
{"epoch": 0.41572184429327286, "step": 276, "batch_size": 64, "mean": 3.65468168258667, "std": 3.8442740440368652, "min": -6.0770416259765625, "p10": -1.1302900314331055, "median": 3.2585268020629883, "p90": 8.229478454589845, "max": 13.440048217773438, "pos_frac": 0.84375, "sample": [6.830352783203125, 1.1888275146484375, -1.2042217254638672, 6.636817932128906, 3.2143115997314453, 5.31707763671875, 13.440048217773438, -1.5696792602539062, 2.1457576751708984, 2.527658462524414, 4.3428955078125, 0.52886962890625, 6.509273529052734, 4.1615447998046875, 8.42352294921875, 5.849647521972656, 1.7101936340332031, -0.001392364501953125, 4.23883056640625, 2.619180679321289, 1.4930191040039062, 7.8214263916015625, 6.883571624755859, 7.658252716064453, 0.6320343017578125, -1.177774429321289, 4.699455261230469, -1.8746795654296875, 6.930418014526367, 7.290674209594727, 1.6615371704101562, 5.3044281005859375, 1.3559951782226562, -2.8579158782958984, 8.40435791015625, -0.750640869140625, 13.211063385009766, 0.1729888916015625, 2.3907318115234375, 6.471355438232422, -6.0770416259765625, 1.6228561401367188, 9.388519287109375, 0.6596298217773438, 1.3984451293945312, -3.1849899291992188, 7.43775749206543, 4.88671875, 7.090057373046875, 5.102325439453125, -1.0194931030273438, 1.1125946044921875, 7.26171875, 2.160236358642578, 4.094078063964844, 8.706687927246094, 5.487819671630859, 2.1113204956054688, 2.6191558837890625, 9.913856506347656, 1.4387760162353516, 3.3027420043945312, 5.438907623291016, 0.31713104248046875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000276.npy"}
{"epoch": 0.41723356009070295, "step": 277, "batch_size": 64, "mean": 2.3820600509643555, "std": 4.523863315582275, "min": -6.867029190063477, "p10": -2.840956115722656, "median": 1.807337760925293, "p90": 7.947003746032717, "max": 15.91363525390625, "pos_frac": 0.734375, "sample": [3.9354400634765625, -2.976043701171875, 4.539340972900391, 1.0055789947509766, 4.090965270996094, -0.6319236755371094, 2.237152099609375, 1.84979248046875, 5.606035232543945, 6.37628173828125, -2.0336074829101562, -2.0650177001953125, 0.33355712890625, 1.2783584594726562, -0.6025505065917969, 2.5950088500976562, 1.5679206848144531, -2.1254425048828125, 1.04254150390625, 0.9122276306152344, 3.193483352661133, 4.4733123779296875, 1.6067638397216797, 2.648406982421875, 0.5883941650390625, 8.712677001953125, 0.21692657470703125, -0.4418487548828125, -3.0500030517578125, 5.878576278686523, -0.3541431427001953, 3.247905731201172, -6.867029190063477, 7.3838653564453125, 1.764883041381836, 4.7316741943359375, -2.5549468994140625, -4.031532287597656, 3.2588577270507812, 1.6518630981445312, 3.02288818359375, -0.9685745239257812, 15.91363525390625, 10.171051025390625, 3.9655227661132812, 2.0094985961914062, 4.4389801025390625, 2.2023468017578125, -6.432243347167969, 13.298072814941406, 3.723224639892578, -1.0144786834716797, 10.127532958984375, 5.171852111816406, -2.963531494140625, 0.8912620544433594, 8.188348770141602, -6.134052276611328, 5.36016845703125, 14.745830535888672, 0.7231826782226562, 0.8686904907226562, 0.34284019470214844, 5.8061065673828125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000277.npy"}
{"epoch": 0.41874527588813304, "step": 278, "batch_size": 64, "mean": 3.2141804695129395, "std": 4.462850093841553, "min": -10.768402099609375, "p10": -1.5666248321533196, "median": 2.9101200103759766, "p90": 8.602692794799806, "max": 12.470626831054688, "pos_frac": 0.796875, "sample": [-1.8447837829589844, -0.9175872802734375, 6.135833740234375, 3.9115142822265625, 3.4372177124023438, -0.8253021240234375, -6.28289794921875, 0.5646934509277344, 8.746910095214844, 11.582679748535156, 2.4915542602539062, -0.1949005126953125, 7.9343414306640625, 12.156309127807617, 8.910453796386719, -0.8390369415283203, 1.3881454467773438, 1.3850288391113281, 2.965679168701172, 8.266185760498047, 2.293682098388672, 4.783905029296875, 3.1024093627929688, 2.8545608520507812, -10.768402099609375, 12.470626831054688, 2.064208984375, 6.585535049438477, 6.853302001953125, 2.6213531494140625, 2.3579483032226562, 4.194892883300781, -3.828704833984375, -0.5676765441894531, 7.510078430175781, 2.743255615234375, 3.432657241821289, 2.7978973388671875, 5.013986587524414, 5.6818084716796875, 11.910560607910156, 11.173118591308594, -2.0274620056152344, 3.4988555908203125, 2.345560073852539, 7.011077880859375, 3.6386184692382812, 0.5465717315673828, -4.000587463378906, 0.7527656555175781, 3.8216629028320312, 7.077667236328125, -4.3794097900390625, -0.7006816864013672, 0.047149658203125, 8.070480346679688, 6.486541748046875, 6.6905364990234375, 3.6635704040527344, 1.9788742065429688, 1.6161537170410156, 0.1699085235595703, 3.6521072387695312, 1.494537353515625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000278.npy"}
{"epoch": 0.42025699168556313, "step": 279, "batch_size": 64, "mean": 4.115296363830566, "std": 4.209471702575684, "min": -7.633312225341797, "p10": -0.5278905868530273, "median": 3.4664812088012695, "p90": 9.776377868652347, "max": 14.79855728149414, "pos_frac": 0.859375, "sample": [-0.5778560638427734, 3.247608184814453, 14.79855728149414, 2.6465530395507812, -7.633312225341797, 8.85736083984375, 1.3685379028320312, -0.8007049560546875, 0.3904571533203125, 4.274074554443359, 5.841682434082031, 8.338054656982422, 2.0403594970703125, 8.793182373046875, 5.568248748779297, 12.409255981445312, 3.768096923828125, 7.673397064208984, -1.9728622436523438, -0.4113044738769531, -1.7094364166259766, 6.30316162109375, 1.4303970336914062, 1.5299263000488281, 3.6792068481445312, 2.8855438232421875, 3.542543411254883, -0.11646842956542969, 2.8752975463867188, 9.237503051757812, 1.6659507751464844, 5.94708251953125, 6.839332580566406, 2.57659912109375, -1.50030517578125, 6.541309356689453, 7.337070465087891, 7.12921142578125, 0.00019073486328125, 11.285877227783203, 10.405353546142578, 5.3751220703125, 12.156673431396484, 2.183502197265625, 0.19329833984375, 4.474903106689453, 1.9426231384277344, 2.3739013671875, 10.00732421875, 8.432540893554688, 4.760982513427734, -1.5737762451171875, 3.3904190063476562, 7.404926300048828, 0.9893646240234375, 1.1203575134277344, 2.1666717529296875, 0.8267135620117188, 4.757347106933594, 4.181156158447266, 5.665168762207031, 0.5106201171875, 12.756771087646484, 2.777618408203125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000279.npy"}
{"epoch": 0.4217687074829932, "step": 280, "batch_size": 64, "mean": 4.276982307434082, "std": 3.9530766010284424, "min": -3.817859649658203, "p10": 0.16128845214843762, "median": 3.279895782470703, "p90": 9.823120117187502, "max": 14.768058776855469, "pos_frac": 0.90625, "sample": [4.2135162353515625, 14.09408187866211, 3.044649124145508, 4.6924591064453125, 2.5214805603027344, 11.054725646972656, 0.40690040588378906, -1.2210540771484375, 1.1287498474121094, 5.332038879394531, 5.2871856689453125, 4.9775543212890625, 2.8018798828125, 4.4468536376953125, 8.623367309570312, 8.69329833984375, 1.90838623046875, 2.870563507080078, 8.604522705078125, -0.00872039794921875, 0.2857666015625, 2.5139198303222656, 0.3646278381347656, 1.3054275512695312, 8.485603332519531, 2.930938720703125, 3.9057846069335938, 4.209434509277344, 9.405059814453125, 13.122802734375, 4.533683776855469, -1.980255126953125, 4.4647674560546875, 0.9114837646484375, 1.824127197265625, 6.4681396484375, 14.768058776855469, -0.47385406494140625, 10.79806137084961, 8.163612365722656, 0.45529937744140625, -3.817859649658203, 0.324249267578125, 0.107940673828125, 2.1956825256347656, 7.211116790771484, 3.301748275756836, -0.3982429504394531, 0.5833892822265625, 3.2580432891845703, 10.558074951171875, 2.201171875, 3.140514373779297, 7.711236953735352, 5.971342086791992, 2.9634246826171875, 4.0270233154296875, 2.7637939453125, 1.3149566650390625, 4.3872528076171875, 2.2155685424804688, 5.439689636230469, 10.002288818359375, 8.329544067382812], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000280.npy"}
{"epoch": 0.42328042328042326, "step": 281, "batch_size": 64, "mean": 3.773850202560425, "std": 4.4934916496276855, "min": -6.260103225708008, "p10": -2.4635808944702138, "median": 3.7140731811523438, "p90": 8.818034172058105, "max": 13.780181884765625, "pos_frac": 0.78125, "sample": [-2.851350784301758, 8.08509635925293, 1.480997085571289, -0.270477294921875, 11.535429000854492, 8.218231201171875, 2.8712024688720703, 8.259292602539062, 3.6989898681640625, 13.780181884765625, 2.5426177978515625, 6.351663589477539, 4.114997863769531, 3.1803970336914062, 3.2048873901367188, 2.097759246826172, 2.991954803466797, 8.841629028320312, -1.445638656616211, 4.65692138671875, 8.449661254882812, 2.610929489135742, 7.8224639892578125, 6.655418395996094, 5.275421142578125, 3.886920928955078, 4.105632781982422, 0.8653755187988281, 2.3716354370117188, -6.260103225708008, 4.901908874511719, -0.0697174072265625, -1.5587844848632812, 3.729156494140625, 9.851730346679688, -4.464788436889648, 8.447158813476562, 11.357887268066406, -0.5522041320800781, 0.7031154632568359, 11.133583068847656, 5.425750732421875, 12.396812438964844, 6.345285415649414, 5.243345260620117, 5.680656433105469, 3.045928955078125, 2.8936843872070312, 6.11097526550293, 2.9989356994628906, -0.1177520751953125, 0.7257537841796875, 8.701221466064453, -5.662162780761719, 4.422847747802734, 8.762979507446289, 6.679004669189453, -3.7664337158203125, 2.7205772399902344, 5.860649108886719, -0.7091636657714844, -3.365633010864258, 0.018754959106445312, -3.492788314819336], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000281.npy"}
{"epoch": 0.42479213907785335, "step": 282, "batch_size": 64, "mean": 3.401188611984253, "std": 4.680123805999756, "min": -5.626667022705078, "p10": -2.3997411727905273, "median": 2.818047523498535, "p90": 10.670629501342777, "max": 15.275613784790039, "pos_frac": 0.765625, "sample": [1.8876495361328125, -2.5448837280273438, -2.1498146057128906, 2.2487106323242188, 0.1938934326171875, 5.2033538818359375, 4.310390472412109, 2.583677291870117, -1.935384750366211, 12.030929565429688, 8.6290283203125, 10.964656829833984, 2.9895191192626953, 1.27008056640625, 8.23052978515625, 4.170623779296875, 11.501550674438477, 0.5879096984863281, -1.9367523193359375, 4.130626678466797, -0.8615303039550781, -0.14825439453125, 4.829397201538086, 7.42242431640625, 0.5254707336425781, 1.5008926391601562, 1.9158859252929688, 6.727638244628906, 15.275613784790039, 3.8969802856445312, -2.435924530029297, -2.3153133392333984, -5.626667022705078, 7.487434387207031, 0.9729480743408203, -2.8560867309570312, 7.76043701171875, 1.23651123046875, -2.4609909057617188, 3.2353897094726562, 3.1856765747070312, 7.9805908203125, 2.646575927734375, 4.099834442138672, 11.430366516113281, 13.142379760742188, 5.446460723876953, 1.410787582397461, 0.39505767822265625, 3.292144775390625, -0.7952117919921875, -1.5396881103515625, 0.6714324951171875, 0.772430419921875, -2.701416015625, 13.15351676940918, 9.984565734863281, 3.5245189666748047, -3.5604934692382812, 5.393890380859375, 4.724367141723633, 5.676872253417969, 8.312744140625, 2.580108642578125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000282.npy"}
{"epoch": 0.42630385487528344, "step": 283, "batch_size": 64, "mean": 3.0049023628234863, "std": 4.873682022094727, "min": -6.422454833984375, "p10": -2.0956459045410147, "median": 1.5419130325317383, "p90": 9.257214546203613, "max": 15.60019302368164, "pos_frac": 0.703125, "sample": [8.193130493164062, -3.389863967895508, -4.81982421875, -1.4450302124023438, 1.7201766967773438, 2.8749656677246094, 0.8456993103027344, -0.0993804931640625, 15.60019302368164, -1.3894271850585938, -1.4217910766601562, 0.7036285400390625, 2.72235107421875, -0.022678375244140625, 6.1885223388671875, 3.3072357177734375, -0.093994140625, 8.499290466308594, -6.422454833984375, 0.92950439453125, 9.870155334472656, 5.497894287109375, 9.138446807861328, 1.3636493682861328, 4.8470916748046875, 11.279800415039062, 13.630126953125, 4.744842529296875, 12.121833801269531, 1.8956985473632812, 2.7806396484375, 8.087852478027344, -0.96685791015625, 2.767345428466797, -2.374481201171875, 6.2680511474609375, -1.2611846923828125, -0.400726318359375, 4.214935302734375, 7.846759796142578, 9.308115005493164, -3.4351978302001953, 8.972923278808594, 12.275283813476562, 1.3568038940429688, 0.027141571044921875, 0.447418212890625, -5.47662353515625, 0.8900165557861328, -0.8345947265625, 5.035688400268555, 0.9560508728027344, 6.6265869140625, 1.1309089660644531, 5.6082763671875, 5.8925628662109375, 0.5475234985351562, -0.32379913330078125, 1.3169517517089844, 0.78192138671875, -5.3519744873046875, 8.852336883544922, -0.79595947265625, 4.67326545715332], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000283.npy"}
{"epoch": 0.42781557067271353, "step": 284, "batch_size": 64, "mean": 3.582159996032715, "std": 4.405559062957764, "min": -6.322715759277344, "p10": -0.4168874740600584, "median": 3.2856340408325195, "p90": 8.024816131591798, "max": 19.427398681640625, "pos_frac": 0.875, "sample": [3.9467391967773438, 1.6807403564453125, 3.514070510864258, 4.2972564697265625, 8.904106140136719, 4.811431884765625, -0.5060901641845703, -2.3533782958984375, 7.4052581787109375, 2.070037841796875, 1.6968650817871094, 6.569633483886719, -0.20874786376953125, 4.2156829833984375, 3.9233531951904297, -1.7645721435546875, 1.445566177368164, 5.977508544921875, 0.892333984375, 19.427398681640625, 0.728118896484375, 5.714935302734375, 3.7485504150390625, -2.449993133544922, 0.3726654052734375, 5.8437042236328125, 1.5226058959960938, 2.5755996704101562, 3.3190784454345703, 1.3256340026855469, 5.076728820800781, 4.113929748535156, 4.607872009277344, -0.5102920532226562, 3.180898666381836, 0.7704029083251953, 5.4137115478515625, 15.92793083190918, 6.12066650390625, 0.8641891479492188, 3.8492507934570312, 13.403038024902344, 4.4964447021484375, 4.824680328369141, 0.6105766296386719, 0.3545684814453125, 5.113677978515625, 6.828165054321289, 3.2521896362304688, 10.827606201171875, 0.3810882568359375, 1.7042350769042969, 8.091217041015625, 0.38925933837890625, 1.8836669921875, 2.1472320556640625, 13.373809814453125, 0.794647216796875, 4.904928207397461, -4.6199798583984375, 7.869880676269531, -6.322715759277344, 0.45371246337890625, 0.4349365234375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000284.npy"}
{"epoch": 0.4293272864701436, "step": 285, "batch_size": 64, "mean": 2.7972030639648438, "std": 4.089816570281982, "min": -4.5361175537109375, "p10": -2.601489448547363, "median": 1.8994693756103516, "p90": 8.828554153442385, "max": 12.086738586425781, "pos_frac": 0.765625, "sample": [4.538970947265625, 1.4507713317871094, -0.46673583984375, 8.593929290771484, 0.42327880859375, -2.663860321044922, -3.2663040161132812, 2.15521240234375, 1.5533103942871094, -4.007978439331055, 3.723970413208008, 10.436290740966797, 6.383443832397461, 4.6302032470703125, 9.516349792480469, -3.703519821166992, 4.982368469238281, 12.086738586425781, -0.5556869506835938, 1.9029121398925781, -0.09927749633789062, 0.7623405456542969, -2.4559574127197266, -3.1002197265625, 1.0496673583984375, -0.23514556884765625, -2.9107513427734375, 1.6164093017578125, 7.66729736328125, 0.0569610595703125, -0.34745025634765625, 6.2410430908203125, -4.5361175537109375, 8.929107666015625, 9.330902099609375, 2.4238510131835938, 2.3865585327148438, 3.5883331298828125, 5.786802291870117, 4.793901443481445, 4.27069091796875, 2.79461669921875, -1.7068214416503906, 11.442070007324219, 7.963783264160156, 1.8881912231445312, 1.0618743896484375, 11.369720458984375, 1.9304046630859375, 1.382040023803711, 0.7140159606933594, 5.095184326171875, 1.896026611328125, 4.756107330322266, 1.5907135009765625, 0.6379165649414062, 8.207672119140625, 7.300743103027344, -2.3330841064453125, 1.6973915100097656, 0.7256011962890625, 2.7521820068359375, 0.594940185546875, 4.3271026611328125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000285.npy"}
{"epoch": 0.4308390022675737, "step": 286, "batch_size": 64, "mean": 4.635047912597656, "std": 4.793935775756836, "min": -11.60211181640625, "p10": -0.15613784790039048, "median": 4.7818603515625, "p90": 10.794741821289064, "max": 13.891998291015625, "pos_frac": 0.875, "sample": [6.888618469238281, 0.4032001495361328, 1.498046875, 5.398643493652344, 4.755851745605469, 5.296241760253906, 1.9823226928710938, 5.1428680419921875, 6.3424224853515625, -0.7668609619140625, 7.410129547119141, 4.829627990722656, 3.5818328857421875, 8.186288833618164, 5.376258850097656, 13.891998291015625, 7.614101409912109, 2.340688705444336, -6.037754058837891, 2.5067977905273438, 4.454799652099609, -0.2139568328857422, 1.6727638244628906, 13.527992248535156, 0.6085014343261719, -5.834209442138672, 8.136934280395508, 6.030784606933594, 0.329681396484375, 7.0848846435546875, 5.549196243286133, 7.042034149169922, 9.597484588623047, 9.311767578125, 5.297676086425781, 9.868682861328125, 2.3836212158203125, 8.500463485717773, -0.8504524230957031, 11.065017700195312, 10.518890380859375, 12.60113525390625, 1.4851608276367188, 3.0013275146484375, 3.7890396118164062, 4.369579315185547, 0.41252899169921875, 2.088348388671875, 13.482696533203125, 9.070323944091797, 10.9129638671875, 2.5218429565429688, 1.3196220397949219, -0.021226882934570312, -0.4273567199707031, 6.7265472412109375, 12.938758850097656, 2.796184539794922, -11.60211181640625, 7.9679107666015625, 4.453105926513672, 0.3408775329589844, 4.807868957519531, 2.8840789794921875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000286.npy"}
{"epoch": 0.4323507180650038, "step": 287, "batch_size": 64, "mean": 4.1888251304626465, "std": 5.007585048675537, "min": -7.364494323730469, "p10": -0.8463268280029296, "median": 3.0231266021728516, "p90": 11.774482727050783, "max": 18.405593872070312, "pos_frac": 0.8125, "sample": [6.552825927734375, 0.8950958251953125, 1.5507736206054688, 3.9065933227539062, 13.285537719726562, 11.902816772460938, 0.15029144287109375, 2.8358078002929688, 2.6038818359375, -0.9060821533203125, 2.9995498657226562, 4.922538757324219, 2.2897796630859375, 5.07244873046875, 12.383354187011719, 14.4322509765625, 13.328125, 1.3943843841552734, 5.924631118774414, 4.485088348388672, 4.4618072509765625, 3.313262939453125, 8.91689682006836, 1.9367313385009766, 3.2423744201660156, 6.900421142578125, -0.5201644897460938, -3.0782546997070312, 4.93365478515625, 2.3405227661132812, 6.494529724121094, 0.44556427001953125, -7.364494323730469, -1.4825515747070312, -1.2829875946044922, 7.66107177734375, 2.289447784423828, 0.6162300109863281, 3.046703338623047, 7.382328033447266, 9.627174377441406, -0.047931671142578125, 9.453804016113281, 8.215612411499023, -5.64154052734375, 0.47882843017578125, 18.405593872070312, 2.5574493408203125, 7.116401672363281, 7.089818954467773, -0.36997032165527344, 1.4389877319335938, 3.5322723388671875, 1.9076557159423828, -1.6407585144042969, 2.710784912109375, 14.315383911132812, 4.0671844482421875, 2.2484893798828125, 1.5768966674804688, -0.6523818969726562, 11.47503662109375, 10.664129257202148, -0.7068977355957031], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000287.npy"}
{"epoch": 0.43386243386243384, "step": 288, "batch_size": 64, "mean": 2.9255194664001465, "std": 4.8843584060668945, "min": -6.912620544433594, "p10": -2.930451965332031, "median": 2.887974739074707, "p90": 8.925611114501955, "max": 15.986915588378906, "pos_frac": 0.6875, "sample": [4.659597396850586, 2.0144271850585938, -2.990570068359375, -6.912620544433594, 11.406044006347656, 1.2495079040527344, 3.7461166381835938, -1.3451194763183594, 7.377471923828125, 1.8850765228271484, 3.453693389892578, 2.875528335571289, -0.43210601806640625, 8.2061767578125, -3.32232666015625, 8.06396484375, -4.0302276611328125, 3.2791786193847656, -0.897674560546875, 3.4267196655273438, -0.658843994140625, 11.607841491699219, 15.986915588378906, 14.80352783203125, -2.7901763916015625, 10.2642822265625, 2.293886184692383, 5.344821929931641, -2.2567596435546875, 0.8766269683837891, 12.843307495117188, -0.7195587158203125, 3.5001373291015625, 2.726543426513672, 8.998855590820312, 2.900421142578125, 3.5533294677734375, -1.7877655029296875, 3.9091949462890625, -2.2816085815429688, -3.482372283935547, -5.75921630859375, 8.754707336425781, 2.7749290466308594, 1.9093170166015625, -4.778118133544922, 5.456977844238281, -1.5139617919921875, 4.037393569946289, 6.706329345703125, 4.6576690673828125, 0.35187530517578125, 4.008880615234375, 6.816375732421875, -1.9589195251464844, -0.8168964385986328, 4.717620849609375, 6.735870361328125, -0.19939231872558594, 6.16497802734375, 0.2417449951171875, 3.07073974609375, 7.4179534912109375, 1.0909233093261719], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000288.npy"}
{"epoch": 0.43537414965986393, "step": 289, "batch_size": 64, "mean": 3.2641940116882324, "std": 5.137382507324219, "min": -6.42913818359375, "p10": -2.8903841018676752, "median": 2.30426025390625, "p90": 9.685256576538086, "max": 17.17450714111328, "pos_frac": 0.734375, "sample": [2.9037227630615234, 2.0335617065429688, 16.80937957763672, -3.126893997192383, -1.1260452270507812, -5.1701507568359375, -0.3931751251220703, 2.125457763671875, 7.225837707519531, 4.786323547363281, 1.4112548828125, 3.3263397216796875, 2.9490718841552734, 5.454959869384766, 6.456085205078125, 0.9647693634033203, 5.48468017578125, -6.42913818359375, -0.4890022277832031, 9.638107299804688, 1.7227325439453125, 13.41779899597168, -0.8974533081054688, 0.10576629638671875, 4.610816955566406, 1.7602081298828125, 9.705463409423828, 0.36518096923828125, -0.19199752807617188, 13.328790664672852, 1.9144439697265625, 1.96380615234375, -2.3385276794433594, -4.28570556640625, 17.17450714111328, -1.2795333862304688, 0.8044242858886719, 16.096969604492188, 3.6604232788085938, 4.335414886474609, 6.984819412231445, 6.3250579833984375, -1.2752227783203125, 1.3019428253173828, 3.509052276611328, -3.9326343536376953, 2.932962417602539, 0.6662063598632812, -0.8059768676757812, -3.494853973388672, 6.983940124511719, 8.620758056640625, 2.483062744140625, 2.9210662841796875, 8.432441711425781, 10.538217544555664, 5.945976257324219, 6.018653869628906, 5.392326354980469, 1.9299068450927734, 1.585561752319336, -3.472562789916992, 4.057901382446289, -1.5488548278808594], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000289.npy"}
{"epoch": 0.436885865457294, "step": 290, "batch_size": 64, "mean": 4.022075653076172, "std": 5.1389851570129395, "min": -10.83053970336914, "p10": -1.9302021026611327, "median": 2.8028182983398438, "p90": 11.651578903198242, "max": 14.069427490234375, "pos_frac": 0.84375, "sample": [9.345867156982422, 8.639396667480469, 5.052921295166016, 14.069427490234375, 6.052764892578125, 3.5753440856933594, -2.4119014739990234, 0.8145275115966797, 1.569671630859375, 8.007102966308594, 3.191852569580078, 0.5051822662353516, 0.014862060546875, -2.9398956298828125, 8.567184448242188, 2.3338623046875, -2.8873214721679688, -1.805572509765625, 1.2111091613769531, -0.0863800048828125, 3.889455795288086, 12.997230529785156, 12.144096374511719, 7.2316741943359375, 0.8617515563964844, 1.0188064575195312, 2.2201480865478516, -3.368682861328125, 3.8852901458740234, -1.8812637329101562, 10.072166442871094, 8.530410766601562, 5.266637802124023, 0.5493583679199219, 1.3402328491210938, 8.286537170410156, 13.389762878417969, 0.8336582183837891, 2.2012557983398438, 1.12664794921875, 10.531143188476562, 13.976255416870117, 0.06268692016601562, -10.83053970336914, 1.3207778930664062, 0.18890380859375, 0.1430492401123047, 12.30059814453125, 6.180534362792969, 7.802648544311523, 0.6393852233886719, 2.596588134765625, 0.559967041015625, 3.0090484619140625, -1.9511756896972656, 11.679428100585938, 7.156272888183594, 11.586597442626953, 10.475433349609375, 2.4209365844726562, 6.0524749755859375, 5.720855712890625, 4.4691162109375, -2.09332275390625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000290.npy"}
{"epoch": 0.4383975812547241, "step": 291, "batch_size": 64, "mean": 3.7549681663513184, "std": 5.2389607429504395, "min": -8.819343566894531, "p10": -2.749824714660644, "median": 3.2480154037475586, "p90": 10.837806701660156, "max": 14.947479248046875, "pos_frac": 0.8125, "sample": [10.73712158203125, 10.843551635742188, 1.1638336181640625, -3.6300048828125, 4.622314453125, 9.643142700195312, -1.4011383056640625, 0.9111900329589844, 13.018295288085938, -1.1982421875, 3.7237625122070312, 9.691972732543945, -3.2814903259277344, -1.9912052154541016, 10.115772247314453, 6.934404373168945, 0.3475151062011719, 0.7710800170898438, 6.0634613037109375, 3.541656494140625, 0.1046905517578125, 0.4143028259277344, 5.5595703125, 5.484642028808594, 6.198789596557617, 5.360382080078125, 14.263275146484375, 0.42864990234375, 11.027759552001953, 2.6408729553222656, 2.686969757080078, 14.947479248046875, 0.14398765563964844, 0.6189804077148438, -4.1624908447265625, 0.4353904724121094, 9.714794158935547, 1.8949356079101562, 5.673957824707031, 2.063722610473633, 0.8574447631835938, 13.658790588378906, -8.819343566894531, 3.3134841918945312, 8.475894927978516, 3.86492919921875, 6.014495849609375, 10.024646759033203, 3.182546615600586, 12.50189208984375, -3.0749473571777344, 1.7193603515625, -5.275783538818359, 5.2339935302734375, -4.977346420288086, 1.9006004333496094, 0.7225475311279297, 10.82440185546875, 4.148994445800781, 4.755941390991211, 5.21746826171875, 2.526639938354492, -1.8185043334960938, -0.7878341674804688], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000291.npy"}
{"epoch": 0.4399092970521542, "step": 292, "batch_size": 64, "mean": 3.4032821655273438, "std": 4.709022521972656, "min": -10.142623901367188, "p10": -2.1008880615234373, "median": 3.5593948364257812, "p90": 7.832390022277832, "max": 17.613250732421875, "pos_frac": 0.8125, "sample": [-10.142623901367188, -2.1359481811523438, 4.810724258422852, -0.3298759460449219, 10.575960159301758, 7.240129470825195, 1.5253829956054688, 17.613250732421875, 4.746248245239258, 3.633636474609375, 4.454029083251953, 1.8725299835205078, 7.441747665405273, 5.799430847167969, -1.7686233520507812, 0.42572784423828125, 6.2881927490234375, 4.85443115234375, 5.098411560058594, -1.0827484130859375, 4.7378997802734375, 0.5705070495605469, 3.832019805908203, -4.147075653076172, 4.02099609375, 1.6429595947265625, 0.9843101501464844, 7.1465606689453125, 0.89154052734375, 9.716533660888672, 1.6035614013671875, 4.900665283203125, 1.258169174194336, 3.4851531982421875, 0.05557823181152344, -2.0190811157226562, 5.9149169921875, -2.9697113037109375, 1.1851692199707031, 4.78668212890625, 1.365814208984375, 7.800849914550781, 5.813638687133789, -3.4249744415283203, 15.911491394042969, 7.845907211303711, 3.845001220703125, 2.754640579223633, 2.2964839935302734, 3.073833465576172, 4.92718505859375, 5.7679443359375, 1.1124248504638672, 6.152500152587891, 5.193218231201172, 0.6997261047363281, 5.59278678894043, 2.9057884216308594, 14.31478500366211, 1.9148635864257812, -1.47052001953125, -2.4515380859375, -3.7517623901367188, 11.102584838867188], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000292.npy"}
{"epoch": 0.4414210128495843, "step": 293, "batch_size": 64, "mean": 4.360095024108887, "std": 5.2026753425598145, "min": -10.812789916992188, "p10": -2.433646392822266, "median": 4.839485168457031, "p90": 9.91421184539795, "max": 18.13134765625, "pos_frac": 0.796875, "sample": [0.19525909423828125, 2.760448455810547, 4.786411285400391, 9.028717041015625, 5.513847351074219, 5.355663299560547, -3.6601028442382812, 3.4702281951904297, 2.330047607421875, 8.38763427734375, 7.32427978515625, -0.8731517791748047, 7.992729187011719, 3.8953208923339844, 7.202323913574219, 1.99560546875, 9.24661636352539, 6.874290466308594, 7.820659637451172, 5.308052062988281, 10.183910369873047, -2.2001686096191406, -1.6705474853515625, -2.5996246337890625, 9.041513442993164, 2.1646690368652344, 5.483150482177734, -1.4143695831298828, 2.6029281616210938, 15.298660278320312, 1.4493999481201172, 13.030105590820312, 12.912322998046875, -4.138729095458984, 5.0025634765625, -2.4430465698242188, 9.643739700317383, -10.812789916992188, 1.9058494567871094, 0.6100273132324219, 6.756103515625, 13.988090515136719, -4.261226654052734, -0.5660247802734375, 0.5633773803710938, -3.3553085327148438, 4.009544372558594, 4.892559051513672, 18.13134765625, 9.025672912597656, 7.81092643737793, 5.03521728515625, 5.751335144042969, 3.4997215270996094, 5.783515930175781, -2.411712646484375, 4.718231201171875, 2.45556640625, 4.49995231628418, 10.030128479003906, 7.2646484375, 6.262042999267578, 4.11700439453125, 6.040924072265625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000293.npy"}
{"epoch": 0.4429327286470144, "step": 294, "batch_size": 64, "mean": 4.160007476806641, "std": 5.202470779418945, "min": -3.891437530517578, "p10": -1.4239816665649414, "median": 3.2292871475219727, "p90": 11.116288757324218, "max": 18.26006317138672, "pos_frac": 0.78125, "sample": [11.03070068359375, 4.957462310791016, 7.8061065673828125, 2.07745361328125, 10.08465576171875, 3.2911300659179688, 13.604999542236328, 3.393829345703125, -2.531463623046875, 2.9538192749023438, 4.603340148925781, -2.4698867797851562, -2.0455169677734375, 11.152969360351562, 4.118818283081055, 3.1674442291259766, 11.285270690917969, 4.397796630859375, 8.801761627197266, 6.648204803466797, 4.292324066162109, 0.60345458984375, 2.6235599517822266, 5.239814758300781, -1.4514503479003906, 17.919586181640625, -3.891437530517578, 3.305042266845703, -1.5208587646484375, 18.26006317138672, 15.278106689453125, 0.05950927734375, -1.230478286743164, 17.243011474609375, -0.20794296264648438, 1.0914154052734375, 2.542510986328125, -0.7243061065673828, 0.9229888916015625, 7.109649658203125, 2.31414794921875, -0.8686065673828125, 7.166559219360352, 0.7440166473388672, 5.344587326049805, 8.120361328125, 2.5477352142333984, 7.516529083251953, 1.0706844329833984, 10.483837127685547, -2.3803977966308594, 9.7242431640625, 6.6783599853515625, 1.0199661254882812, 1.503631591796875, 3.350078582763672, 0.116729736328125, -1.3598880767822266, 1.985870361328125, -0.1060333251953125, 4.130016326904297, -1.0815067291259766, 3.4281578063964844, 0.997955322265625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000294.npy"}
{"epoch": 0.4444444444444444, "step": 295, "batch_size": 64, "mean": 3.395203113555908, "std": 5.17494010925293, "min": -5.134376525878906, "p10": -2.170604705810547, "median": 2.4061708450317383, "p90": 9.93492298126221, "max": 17.719749450683594, "pos_frac": 0.703125, "sample": [8.132583618164062, -2.21710205078125, -1.0751953125, 2.0651397705078125, 1.9823837280273438, 9.025680541992188, 7.147258758544922, 0.9431266784667969, 3.5866928100585938, 2.16778564453125, 17.719749450683594, -0.8399581909179688, 17.225494384765625, -1.2285137176513672, 7.036994934082031, 6.8717803955078125, 1.8317985534667969, 2.4870834350585938, -0.948822021484375, 4.128726959228516, 8.80599594116211, -2.820110321044922, 1.0507736206054688, 3.1833019256591797, 8.8839111328125, -1.477783203125, 2.325258255004883, -3.4922752380371094, 10.1998291015625, 2.9905319213867188, -5.134376525878906, -1.3517723083496094, 15.193283081054688, 4.574226379394531, 4.543205261230469, -1.8747825622558594, 5.640970230102539, 4.269783020019531, 7.073451995849609, 9.316808700561523, 5.002532958984375, -4.779151916503906, 2.6768341064453125, -0.10089111328125, 8.4007568359375, 11.518836975097656, -2.0621109008789062, -1.40423583984375, -4.703584671020508, 2.623147964477539, -0.4982891082763672, 2.0980682373046875, 1.5867080688476562, 1.0905380249023438, 2.2777442932128906, 2.488748550415039, 7.1923980712890625, 0.4849700927734375, 1.2890090942382812, -3.0571975708007812, 4.895893096923828, 10.45928955078125, 13.133842468261719, -1.2637805938720703], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000295.npy"}
{"epoch": 0.4459561602418745, "step": 296, "batch_size": 64, "mean": 4.561258316040039, "std": 5.0827531814575195, "min": -6.65765380859375, "p10": -1.2277969360351562, "median": 3.8899707794189453, "p90": 11.399497222900393, "max": 15.899620056152344, "pos_frac": 0.78125, "sample": [9.640693664550781, 8.291786193847656, 7.938690185546875, 3.564493179321289, -0.34807586669921875, 0.8737640380859375, 1.4566078186035156, 13.589996337890625, 1.782958984375, 4.169147491455078, 7.874786376953125, 9.984539031982422, -2.436067581176758, 13.891986846923828, 1.4054832458496094, 8.84185791015625, 9.845779418945312, 6.222576141357422, 3.3837890625, 9.089851379394531, 3.6107940673828125, 3.0484352111816406, 8.963897705078125, -6.65765380859375, -1.426483154296875, -0.2920074462890625, -0.141937255859375, 4.488346099853516, 2.916168212890625, 4.491249084472656, 0.5105056762695312, -1.5630855560302734, 2.3341064453125, 1.2589263916015625, 15.894439697265625, -1.13800048828125, -1.0017776489257812, -1.0861186981201172, 7.746223449707031, -1.2248382568359375, 5.028038024902344, 1.7676467895507812, 15.899620056152344, -1.22906494140625, 10.750045776367188, 4.3191986083984375, 5.456050872802734, 6.337156295776367, 4.475364685058594, 9.423137664794922, 5.5002593994140625, 1.9393234252929688, 9.42573356628418, 2.6577510833740234, 14.129539489746094, 2.4100189208984375, 0.9288330078125, 11.677833557128906, 0.8574981689453125, -3.2266159057617188, -2.0185394287109375, 13.485176086425781, 7.5602264404296875, 4.570465087890625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000296.npy"}
{"epoch": 0.4474678760393046, "step": 297, "batch_size": 64, "mean": 4.235692977905273, "std": 4.2500176429748535, "min": -4.946771621704102, "p10": -0.9324743270874023, "median": 3.30133056640625, "p90": 9.482161521911621, "max": 13.389923095703125, "pos_frac": 0.8125, "sample": [4.481777191162109, -1.9667816162109375, 2.9650497436523438, 8.550315856933594, 5.4744873046875, 8.41286849975586, 3.2080459594726562, 9.432851791381836, -2.0118484497070312, -4.946771621704102, 3.123828887939453, 2.8594818115234375, 4.692041397094727, 5.527156829833984, 1.7092742919921875, 10.853204727172852, 5.9160919189453125, 7.634555816650391, -0.9704532623291016, 0.0994873046875, 6.088157653808594, 1.3362503051757812, -0.6777801513671875, -2.0116119384765625, -0.8438568115234375, 8.453018188476562, 4.047828674316406, 1.1935653686523438, 8.894119262695312, 0.7947921752929688, 7.399080276489258, 9.486793518066406, 1.3931121826171875, -1.8643856048583984, -0.7048301696777344, 4.103391647338867, 1.668548583984375, 1.6787109375, 11.975440979003906, 3.3946151733398438, -0.2493896484375, 0.7344532012939453, 8.708297729492188, 9.471353530883789, 2.608905792236328, 10.3533935546875, 1.7095565795898438, -0.03228759765625, 0.5051651000976562, 9.382863998413086, 2.952381134033203, 6.872674942016602, 4.390346527099609, 8.703498840332031, -1.0011787414550781, 2.1798744201660156, 8.774620056152344, 13.389923095703125, 1.2635421752929688, 10.472412109375, 2.3111038208007812, 7.394847869873047, 10.236618041992188, 9.10174560546875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000297.npy"}
{"epoch": 0.4489795918367347, "step": 298, "batch_size": 64, "mean": 3.8227360248565674, "std": 5.484193325042725, "min": -7.805011749267578, "p10": -3.21268310546875, "median": 3.4849014282226562, "p90": 11.372016525268554, "max": 19.371471405029297, "pos_frac": 0.75, "sample": [6.8262786865234375, -7.805011749267578, 19.371471405029297, 2.1699066162109375, 10.221923828125, 1.1488800048828125, 6.694164276123047, -0.7060470581054688, -0.22298812866210938, 5.350698471069336, 13.689350128173828, 11.399250030517578, -3.7156639099121094, 6.521697998046875, -2.3437652587890625, 5.649814605712891, 6.625091552734375, -1.6036605834960938, 7.185373306274414, 4.2153778076171875, 1.1211051940917969, -3.0465030670166016, 8.585472106933594, 8.172897338867188, 2.77899169921875, 2.978302001953125, -4.414249420166016, 6.590177536010742, 5.940986633300781, 11.51003646850586, 2.1106109619140625, 11.3084716796875, 1.828887939453125, 9.535263061523438, 2.0899829864501953, 4.649082183837891, 1.7745113372802734, 12.497329711914062, 3.616037368774414, 2.7759933471679688, -0.4591331481933594, -1.7979812622070312, -3.7393665313720703, 7.543632507324219, 2.2958621978759766, 8.154205322265625, 11.767433166503906, 3.8110198974609375, 0.3886680603027344, 4.9903717041015625, 3.3537654876708984, -2.3124008178710938, 0.88543701171875, -3.283903121948242, 0.458465576171875, 6.516944885253906, -2.5988731384277344, 3.70965576171875, -5.0966796875, -6.548801422119141, 9.699146270751953, 7.400844573974609, 2.4540443420410156, 13.987215042114258], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000298.npy"}
{"epoch": 0.4504913076341648, "step": 299, "batch_size": 64, "mean": 3.332130193710327, "std": 5.467432022094727, "min": -6.887947082519531, "p10": -2.855971527099609, "median": 2.2548351287841797, "p90": 10.355829429626466, "max": 17.3070068359375, "pos_frac": 0.703125, "sample": [5.173179626464844, 5.609256744384766, 0.8068885803222656, -5.5863037109375, -1.84442138671875, 7.632410049438477, 5.19183349609375, 2.649517059326172, 17.3070068359375, 0.8239574432373047, -3.9730453491210938, 2.7299728393554688, -6.887947082519531, 0.6733932495117188, -5.14518928527832, 0.4200439453125, 1.834564208984375, 9.977045059204102, 9.752342224121094, -2.4402236938476562, 10.518165588378906, -0.5835609436035156, 4.555820465087891, 4.976863861083984, -1.4338836669921875, -1.050048828125, 1.1132965087890625, 1.0347938537597656, 16.68170166015625, -0.18631744384765625, 8.906204223632812, 2.0572891235351562, 6.370582580566406, -6.373920440673828, -0.8566055297851562, 1.5223388671875, 6.0072479248046875, 11.296327590942383, 0.7057037353515625, 3.1114578247070312, 7.024017333984375, -0.30306243896484375, 7.675865173339844, 7.838376998901367, 15.048843383789062, -3.1257171630859375, 6.673246383666992, 12.652069091796875, 4.92620849609375, 8.016830444335938, 0.823089599609375, 0.1332683563232422, 5.099264144897461, -1.8224334716796875, 4.35064697265625, 6.139652252197266, -3.034149169921875, 9.459716796875, -0.7994556427001953, 1.465850830078125, -0.5732574462890625, 2.452381134033203, -2.3834609985351562, 12.440807342529297], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000299.npy"}
{"epoch": 0.4520030234315949, "step": 300, "batch_size": 64, "mean": 3.682488441467285, "std": 5.215744972229004, "min": -10.097381591796875, "p10": -2.6222917556762693, "median": 3.932239532470703, "p90": 10.000929260253907, "max": 15.198616027832031, "pos_frac": 0.765625, "sample": [1.988016128540039, 11.619117736816406, -1.9212493896484375, 10.961467742919922, 3.9126815795898438, 1.0010452270507812, 6.2431793212890625, 8.286659240722656, -0.33805084228515625, -5.934431076049805, 12.972564697265625, 10.169967651367188, 6.350118637084961, 6.0113983154296875, 14.977294921875, 5.700868606567383, 9.448150634765625, 0.994476318359375, -0.9151535034179688, 3.868114471435547, 1.6087608337402344, 7.826728820800781, 4.986297607421875, -1.8213653564453125, -4.4315948486328125, 2.913818359375, 4.413646697998047, -1.8310661315917969, 3.9517974853515625, 5.725069046020508, 1.4583492279052734, -5.926422119140625, 5.6393585205078125, 1.7236919403076172, -10.097381591796875, 4.239910125732422, 1.7040882110595703, 15.198616027832031, 2.2044219970703125, 10.01409912109375, 7.56707763671875, 6.670330047607422, -2.486236572265625, 9.970199584960938, 5.703784942626953, -3.0408477783203125, 1.2896766662597656, 3.662647247314453, -0.782470703125, 8.663654327392578, 5.7989501953125, 2.5527496337890625, 3.9518890380859375, 0.5485954284667969, 2.0566787719726562, -2.680601119995117, 6.554607391357422, 7.815948486328125, 7.576007843017578, -7.012674331665039, 1.4440460205078125, 9.09531021118164, 7.020927429199219, -1.1580429077148438], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000300.npy"}
{"epoch": 0.45351473922902497, "step": 301, "batch_size": 64, "mean": 3.6360573768615723, "std": 5.811470031738281, "min": -8.353126525878906, "p10": -3.6524572372436523, "median": 2.562554359436035, "p90": 11.33793830871582, "max": 20.962661743164062, "pos_frac": 0.75, "sample": [5.822956085205078, 6.3790740966796875, 6.166412353515625, 3.0701770782470703, 9.045852661132812, -1.936065673828125, 11.37701416015625, 11.246761322021484, 9.055295944213867, 1.0921554565429688, 2.4923954010009766, 1.0184059143066406, 4.343971252441406, -2.4747180938720703, -3.5901260375976562, 1.32183837890625, 1.5037384033203125, 4.5442657470703125, 4.5099639892578125, 1.0939369201660156, -2.953622817993164, 2.649059295654297, 15.44955062866211, -4.3511810302734375, 8.523359298706055, 0.36716461181640625, 1.0984516143798828, 11.22113037109375, 7.9609222412109375, -3.720184326171875, -1.5638580322265625, -1.0522613525390625, 10.047737121582031, 12.4251708984375, -3.87725830078125, 0.1082000732421875, -4.7756805419921875, 20.962661743164062, -3.679170608520508, 1.5268516540527344, -0.7987461090087891, 0.5415668487548828, 4.205574035644531, -4.695026397705078, 1.492757797241211, 9.89084243774414, 7.048322677612305, 4.3055572509765625, 1.82525634765625, 12.677597045898438, 2.454153060913086, 13.347000122070312, 3.9861297607421875, 1.2581062316894531, -2.637481689453125, 6.144176483154297, 7.144287109375, 13.625099182128906, 2.7786102294921875, 2.6327133178710938, -1.260894775390625, -8.353126525878906, 11.068084716796875, 1.5767593383789062], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000301.npy"}
{"epoch": 0.455026455026455, "step": 302, "batch_size": 64, "mean": 2.5001766681671143, "std": 5.005168914794922, "min": -8.875411987304688, "p10": -3.733311080932617, "median": 2.5255661010742188, "p90": 9.288398742675781, "max": 15.420135498046875, "pos_frac": 0.734375, "sample": [2.652568817138672, 13.59090805053711, -2.399566650390625, 15.420135498046875, 4.717887878417969, 2.1837215423583984, 11.793128967285156, -3.3355331420898438, 8.929710388183594, 4.399070739746094, 4.239509582519531, 4.1479949951171875, 0.9580841064453125, 0.7742843627929688, 9.219905853271484, 9.317752838134766, 1.2243499755859375, 1.2587051391601562, 0.9935531616210938, -3.7801666259765625, 3.2043380737304688, -4.5851593017578125, -5.5778656005859375, 4.9134063720703125, 7.8801422119140625, -3.623981475830078, 1.8036270141601562, 4.8816986083984375, -1.20745849609375, 0.6241683959960938, -0.8038406372070312, 5.9032440185546875, 6.6839599609375, 1.2239570617675781, -6.389898300170898, 5.983020782470703, 9.357009887695312, 1.6714859008789062, 3.046173095703125, 3.4951858520507812, 4.3781890869140625, 9.335073471069336, -0.017240524291992188, 2.475311279296875, 0.125701904296875, 3.5806121826171875, -8.875411987304688, -0.18354034423828125, 4.628570556640625, 3.3268890380859375, 6.357645034790039, -2.2110137939453125, 2.5758209228515625, 0.10919952392578125, 1.6516647338867188, -3.8054122924804688, -2.0985565185546875, 13.653656005859375, -0.38628387451171875, 0.15856552124023438, -8.839851379394531, 3.54638671875, 2.6266098022460938, 3.1095027923583984], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000302.npy"}
{"epoch": 0.4565381708238851, "step": 303, "batch_size": 64, "mean": 5.00660514831543, "std": 6.132766246795654, "min": -6.265357971191406, "p10": -3.0735605239868162, "median": 5.151649475097656, "p90": 12.495458221435548, "max": 21.07050323486328, "pos_frac": 0.734375, "sample": [-0.654022216796875, 2.9403209686279297, 5.5047607421875, 5.102622985839844, 18.829925537109375, 11.951942443847656, 7.7462158203125, -2.717588424682617, 8.114328384399414, 4.3427581787109375, 11.47418212890625, -5.370208740234375, 4.202045440673828, 1.2734355926513672, -3.2261199951171875, -3.6019058227539062, -4.4065093994140625, 7.89434814453125, -5.236812591552734, -1.0322113037109375, -0.6222286224365234, 5.635759353637695, 12.7283935546875, 8.800739288330078, -3.8098373413085938, 5.584159851074219, 8.77531623840332, -6.265357971191406, -1.345306396484375, 11.161712646484375, 4.771663665771484, 5.200675964355469, 4.373729705810547, -1.1119003295898438, 2.610198974609375, 2.3934574127197266, 11.687015533447266, 2.404407501220703, 6.626468658447266, 9.4007568359375, 0.568023681640625, 6.79107666015625, 5.566015243530273, 10.255096435546875, 2.7551193237304688, 10.284820556640625, 16.782379150390625, 3.7757339477539062, 6.29522705078125, 11.214561462402344, -2.5316696166992188, -0.3285198211669922, -2.1695098876953125, 7.121856689453125, 12.906761169433594, 7.2183990478515625, 2.9842300415039062, 4.338470458984375, 8.058258056640625, 14.033658981323242, 21.07050323486328, -2.272829055786133, 9.421783447265625, 14.151931762695312], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000303.npy"}
{"epoch": 0.4580498866213152, "step": 304, "batch_size": 64, "mean": 4.613096237182617, "std": 4.586028099060059, "min": -6.451202392578125, "p10": -0.8380752563476561, "median": 4.775160789489746, "p90": 10.14125919342041, "max": 15.293304443359375, "pos_frac": 0.84375, "sample": [9.52975082397461, -1.721099853515625, 6.203704833984375, 7.887176513671875, 1.3275527954101562, 4.682342529296875, 5.8177490234375, 7.969053268432617, 0.7795677185058594, 3.4942893981933594, 1.9821739196777344, 10.108901977539062, 8.271957397460938, 2.975383758544922, 7.502784729003906, 4.867979049682617, -0.6809234619140625, 3.3193397521972656, 13.484512329101562, -6.451202392578125, 1.3962249755859375, 6.931571960449219, 3.371826171875, 3.732280731201172, 3.196056365966797, 2.6317977905273438, 0.2671852111816406, 7.93023681640625, 6.146537780761719, 10.155126571655273, 7.254302978515625, 0.43768310546875, 12.190460205078125, 5.917350769042969, -0.284088134765625, -4.1685028076171875, 4.9595947265625, 5.1895294189453125, 1.6607627868652344, 5.85772705078125, -0.25386810302734375, -2.2559585571289062, 15.293304443359375, 5.6611328125, 2.7166175842285156, 1.8454742431640625, 4.378379821777344, 14.311790466308594, 9.158363342285156, 10.037002563476562, -0.905426025390625, 0.7615585327148438, 11.053756713867188, 0.21650123596191406, 5.656028747558594, -2.7345199584960938, 1.9380226135253906, 14.646163940429688, 7.7255706787109375, 4.465606689453125, 5.2544097900390625, -1.0015907287597656, 5.1584625244140625, 5.986713409423828], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000304.npy"}
{"epoch": 0.4595616024187453, "step": 305, "batch_size": 64, "mean": 3.627037763595581, "std": 5.75944709777832, "min": -11.799667358398438, "p10": -3.240489578247069, "median": 3.1408309936523438, "p90": 11.236740112304691, "max": 16.718429565429688, "pos_frac": 0.765625, "sample": [9.122859954833984, 0.8945846557617188, -2.087146759033203, -6.784660339355469, -1.2126235961914062, 0.3242759704589844, 16.718429565429688, -3.851625442504883, 6.3092193603515625, 9.335731506347656, 1.8464241027832031, 12.165325164794922, 1.051544189453125, 5.575531005859375, -0.3376617431640625, 0.08688735961914062, 1.3387107849121094, 0.9176788330078125, -4.04681396484375, 9.720817565917969, 0.4764976501464844, 6.817386627197266, 3.1217880249023438, 3.568389892578125, 2.9065780639648438, 3.282379150390625, 10.506549835205078, -0.6474456787109375, 7.6832275390625, -11.799667358398438, 3.1598739624023438, 5.643579483032227, 11.549678802490234, 6.984031677246094, -1.166229248046875, 2.157571792602539, 15.537002563476562, 4.029327392578125, 1.717721939086914, -0.42474365234375, 1.2039260864257812, 0.15700531005859375, 8.545913696289062, 14.384389877319336, 12.25927734375, 0.11465072631835938, 2.136260986328125, -5.4720001220703125, 9.915321350097656, 4.132843017578125, 5.2762298583984375, 3.3393478393554688, 3.5963783264160156, 15.066650390625, 7.830558776855469, 10.4892578125, -3.7347793579101562, 3.6483840942382812, -6.199485778808594, 7.998022079467773, -1.4792404174804688, 0.37621498107910156, 7.5391998291015625, -1.1848907470703125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000305.npy"}
{"epoch": 0.46107331821617537, "step": 306, "batch_size": 64, "mean": 3.21763277053833, "std": 5.122567653656006, "min": -7.504791259765625, "p10": -2.1552589416503904, "median": 2.265645980834961, "p90": 10.043663024902346, "max": 21.638824462890625, "pos_frac": 0.75, "sample": [14.736572265625, 10.632034301757812, 0.6850757598876953, 10.654556274414062, 2.9846038818359375, 6.083465576171875, 0.6487312316894531, 4.191699981689453, 2.1817588806152344, 6.6086273193359375, 7.97412109375, 2.8468170166015625, -5.903434753417969, 1.3616142272949219, -1.9675750732421875, -7.504791259765625, 7.0867156982421875, -0.578369140625, 3.4090538024902344, 1.6673088073730469, 4.473686218261719, 14.421287536621094, 5.1030120849609375, -0.34241294860839844, 4.419828414916992, 0.9101333618164062, -0.7204170227050781, 7.026641845703125, -0.0018310546875, 1.6125030517578125, -5.466094970703125, 9.546989440917969, -0.5056705474853516, 0.4681587219238281, 2.9563636779785156, 1.6136627197265625, 8.813716888427734, 10.7640380859375, -2.215301513671875, 1.2031478881835938, 0.7813568115234375, 1.1942100524902344, 2.1155128479003906, 4.561668395996094, -2.6471710205078125, -0.14073944091796875, -0.8036441802978516, 10.256523132324219, 0.90643310546875, 8.560096740722656, 6.7753753662109375, 3.9599342346191406, 2.3495330810546875, 1.6749496459960938, 5.34820556640625, 1.2025337219238281, 4.650726318359375, -2.8146705627441406, 3.1908950805664062, 21.638824462890625, -5.1864471435546875, 3.0279541015625, 5.4615631103515625, -2.0151596069335938], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000306.npy"}
{"epoch": 0.46258503401360546, "step": 307, "batch_size": 64, "mean": 2.7784557342529297, "std": 5.863908290863037, "min": -9.301849365234375, "p10": -4.267910194396972, "median": 1.8596954345703125, "p90": 10.414121246337892, "max": 18.364097595214844, "pos_frac": 0.578125, "sample": [9.905052185058594, -2.925506591796875, 2.9353370666503906, -1.0404701232910156, -9.301849365234375, 5.8220367431640625, 18.364097595214844, 1.1982192993164062, 9.262191772460938, 9.3284912109375, -0.6723709106445312, 6.370460510253906, 3.0357589721679688, 13.317192077636719, -5.309837341308594, 9.283447265625, -0.9453582763671875, -2.000335693359375, 5.723361968994141, -2.669727325439453, 3.9302520751953125, -6.578432083129883, -3.9067764282226562, -2.9638519287109375, 2.9000930786132812, -5.3123321533203125, -3.2021713256835938, 4.273750305175781, 9.400314331054688, -2.1016902923583984, 0.7473678588867188, 7.899354934692383, -1.3486976623535156, -1.5470390319824219, -4.986724853515625, 4.2188720703125, -0.476593017578125, 2.006072998046875, 6.198129653930664, -0.4926910400390625, 1.71331787109375, -1.6660385131835938, 0.2256317138671875, 11.793815612792969, 10.636375427246094, -1.5508880615234375, 2.4928016662597656, 7.66265869140625, 12.76898193359375, 6.890693664550781, -4.618896484375, 10.632293701171875, 9.775627136230469, -0.12384414672851562, 6.049560546875, 8.040964126586914, 1.4645500183105469, 11.46023941040039, -2.3929977416992188, -1.4969558715820312, -4.42268180847168, 7.45042610168457, 8.837284088134766, -2.1391448974609375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000307.npy"}
{"epoch": 0.46409674981103555, "step": 308, "batch_size": 64, "mean": 3.9357619285583496, "std": 5.063067436218262, "min": -8.581634521484375, "p10": -2.3304893493652337, "median": 3.8874330520629883, "p90": 10.359017372131348, "max": 16.89276123046875, "pos_frac": 0.75, "sample": [7.946847915649414, 4.867877960205078, 4.8839874267578125, -2.6626815795898438, 8.565383911132812, 8.35089111328125, -1.2145118713378906, 5.1912841796875, -0.8492889404296875, 0.206878662109375, 16.89276123046875, 4.8263397216796875, 15.145576477050781, 5.028499603271484, 3.4485931396484375, -2.563812255859375, 4.173316955566406, 5.250190734863281, 8.397638320922852, 3.295154571533203, 6.4612884521484375, -3.359050750732422, 11.840797424316406, 6.578733444213867, 4.357147216796875, 10.000473022460938, -0.21665382385253906, 4.467308044433594, 4.925689697265625, -2.5404815673828125, 10.135894775390625, -0.15001678466796875, 2.990131378173828, -1.8405075073242188, 8.071640014648438, 1.7292251586914062, 7.78179931640625, 2.25677490234375, 2.2223129272460938, -8.581634521484375, 2.0191497802734375, 3.1562061309814453, 11.593002319335938, 10.463756561279297, 0.903533935546875, 2.5691452026367188, 4.69683837890625, 0.5201416015625, 8.28323745727539, -0.7391853332519531, 10.17457389831543, 1.1656951904296875, -0.0730133056640625, 10.438064575195312, 12.449752807617188, -0.9009017944335938, 9.602249145507812, -5.807275772094727, -1.1797714233398438, 3.7059268951416016, 0.9847297668457031, -3.6909027099609375, 4.068939208984375, 1.173074722290039], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000308.npy"}
{"epoch": 0.4656084656084656, "step": 309, "batch_size": 64, "mean": 3.8618478775024414, "std": 5.270479202270508, "min": -8.705535888671875, "p10": -2.5283124923706053, "median": 3.8004150390625, "p90": 9.624753761291505, "max": 17.905357360839844, "pos_frac": 0.78125, "sample": [5.0670318603515625, 17.234394073486328, 0.4051017761230469, 10.890867233276367, 5.953081130981445, 0.6685104370117188, 7.830070495605469, -3.650390625, 2.7600936889648438, 2.341114044189453, 8.197372436523438, 0.2509040832519531, -3.5887908935546875, -0.9159297943115234, 3.6983642578125, -4.921764373779297, 5.429021835327148, 8.992443084716797, 0.8731765747070312, 1.0184593200683594, 1.972665786743164, 8.816959381103516, 9.540937423706055, -2.5755863189697266, 9.168869018554688, 7.600555419921875, 0.034099578857421875, 5.365354537963867, -2.4180068969726562, 0.8681411743164062, 1.1077022552490234, -3.8085174560546875, 0.7061767578125, 4.423133850097656, 2.9090499877929688, 4.178859710693359, 5.3946685791015625, 5.7342529296875, -0.4056282043457031, 10.335941314697266, -0.090850830078125, 9.2835693359375, 9.526111602783203, 6.267795562744141, 11.38311767578125, -4.298614501953125, 9.660675048828125, 5.88629150390625, 1.2510623931884766, 6.198432922363281, -1.3427276611328125, 2.216644287109375, 2.6575164794921875, 7.239095687866211, 17.905357360839844, 1.3857803344726562, 4.363580703735352, 6.2255401611328125, -1.2799835205078125, -8.705535888671875, 3.9024658203125, -0.42842864990234375, 4.4084625244140625, 16.060134887695312], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000309.npy"}
{"epoch": 0.4671201814058957, "step": 310, "batch_size": 64, "mean": 2.361100196838379, "std": 5.289212226867676, "min": -7.181100845336914, "p10": -3.8823318481445312, "median": 1.8423757553100586, "p90": 8.944829750061036, "max": 15.714508056640625, "pos_frac": 0.640625, "sample": [4.6994781494140625, 0.5658798217773438, 5.629016876220703, 2.068777084350586, 7.00177001953125, -5.252834320068359, -1.36688232421875, 5.2517242431640625, 7.6761322021484375, 0.5748481750488281, -1.0791149139404297, -7.181100845336914, 5.988986968994141, 14.945968627929688, 3.349884033203125, -0.9517784118652344, -0.29145050048828125, 3.565185546875, 12.713478088378906, -3.90020751953125, 0.04906463623046875, 3.352449417114258, -4.291893005371094, 15.714508056640625, -2.3490371704101562, -2.868133544921875, 6.481109619140625, -1.2949352264404297, 1.5269317626953125, 9.380159378051758, -3.8406219482421875, 6.479957580566406, 6.02783203125, 5.5106048583984375, -2.80596923828125, 2.19293212890625, 0.0018157958984375, 3.536346435546875, 6.368141174316406, 3.960540771484375, -0.634063720703125, 5.818031311035156, -5.043052673339844, -6.906436920166016, 6.9156646728515625, 2.5285568237304688, 1.6159744262695312, 6.456844329833984, -2.60797119140625, -2.936248779296875, -1.6418075561523438, 12.396656036376953, 5.046163558959961, -3.3544464111328125, 0.428131103515625, 0.023397445678710938, -1.4116973876953125, 8.572555541992188, 9.104375839233398, -4.866302490234375, -3.50347900390625, 12.348197937011719, 1.5134315490722656, 4.108367919921875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000310.npy"}
{"epoch": 0.46863189720332576, "step": 311, "batch_size": 64, "mean": 3.265782356262207, "std": 5.613485813140869, "min": -13.542922973632812, "p10": -2.674713134765625, "median": 2.80621337890625, "p90": 9.725485229492188, "max": 20.656890869140625, "pos_frac": 0.75, "sample": [7.462291717529297, 0.0283203125, -0.5170059204101562, -4.308891296386719, -2.7319107055664062, -1.1900482177734375, 2.2825546264648438, 10.159904479980469, -0.28488731384277344, 10.509078979492188, 2.3912429809570312, -13.542922973632812, 4.331951141357422, 5.7325592041015625, 6.9947662353515625, -1.1372756958007812, 5.678070068359375, 1.2091751098632812, 9.772369384765625, 4.1331787109375, 1.4075469970703125, 1.18475341796875, 2.1049766540527344, 1.0772590637207031, 6.3097686767578125, 3.0228118896484375, -3.5451583862304688, -1.2804641723632812, 1.466094970703125, 7.494171142578125, -4.57598876953125, 9.6160888671875, 19.869461059570312, 1.9146690368652344, 4.470428466796875, 7.081197738647461, 4.377250671386719, 2.064952850341797, 2.750091552734375, 2.862335205078125, 20.656890869140625, 1.3225555419921875, -2.3264007568359375, 10.92095947265625, -4.246988296508789, 3.6042327880859375, 1.462066650390625, -2.5412521362304688, 6.061866760253906, 4.25872802734375, 14.144664764404297, 7.3211669921875, 3.1262969970703125, -6.701423645019531, 4.198577880859375, 5.863069534301758, 1.3318729400634766, -1.209686279296875, -2.18463134765625, 5.4423675537109375, 1.2194900512695312, 7.811492919921875, 9.38330078125, 3.446075439453125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000311.npy"}
{"epoch": 0.47014361300075586, "step": 312, "batch_size": 64, "mean": 4.089212417602539, "std": 5.063168525695801, "min": -7.8762359619140625, "p10": -1.535786056518554, "median": 3.93505859375, "p90": 10.371256828308105, "max": 17.97869110107422, "pos_frac": 0.828125, "sample": [2.0522804260253906, 2.2464065551757812, 0.587127685546875, 5.0105438232421875, 6.777599334716797, 5.693092346191406, 3.2624034881591797, 0.2198486328125, 4.026237487792969, -0.6800537109375, 14.301063537597656, 9.045700073242188, 7.2332000732421875, 7.223817825317383, -0.113861083984375, 6.6283721923828125, 1.4105377197265625, 0.1684398651123047, 2.1562633514404297, 9.8646240234375, 12.48260498046875, 2.3186607360839844, -2.028881072998047, 2.828094482421875, 3.0667724609375, 12.573600769042969, 1.8284912109375, -2.8383636474609375, 0.7319107055664062, 10.335000991821289, 15.013742446899414, -7.8762359619140625, 7.778800964355469, 7.143993377685547, -6.7452850341796875, 4.191913604736328, 9.32736587524414, 3.2671737670898438, -3.1810531616210938, 10.386795043945312, 17.97869110107422, 2.0548362731933594, 1.1438827514648438, 4.7225189208984375, 6.805328369140625, 2.4638214111328125, 4.225914001464844, 8.35860824584961, 5.579280853271484, -0.7417411804199219, -0.48647308349609375, 5.166511535644531, 11.858592987060547, 4.33831787109375, 0.9443359375, 4.4002532958984375, 4.356777191162109, 7.596099853515625, -6.512441635131836, 4.796173095703125, 1.685791015625, -1.8760910034179688, 1.2879829406738281, 3.8438796997070312], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000312.npy"}
{"epoch": 0.47165532879818595, "step": 313, "batch_size": 64, "mean": 3.245781421661377, "std": 5.776256084442139, "min": -7.8359222412109375, "p10": -3.8733207702636716, "median": 2.9690170288085938, "p90": 11.116100311279299, "max": 19.633827209472656, "pos_frac": 0.703125, "sample": [0.3405036926269531, 6.571418762207031, 7.432147979736328, 2.1618003845214844, 5.9672088623046875, 5.68779182434082, 3.9352798461914062, 1.0677738189697266, 10.1102294921875, -3.9284744262695312, 0.2587127685546875, 8.399879455566406, 4.3640289306640625, 3.0672760009765625, 2.8772201538085938, -0.06011199951171875, 1.8430328369140625, -0.7451744079589844, 3.3419342041015625, -1.5154533386230469, 7.18304443359375, 6.007045745849609, -0.4757556915283203, 17.021926879882812, -7.8359222412109375, -5.786657333374023, -0.65545654296875, -0.448699951171875, 8.35223388671875, 3.2709579467773438, 15.065391540527344, 11.32080078125, -6.1035614013671875, -2.2641239166259766, 11.466995239257812, 10.638465881347656, 1.3783760070800781, -2.444873809814453, 0.0250091552734375, 15.0257568359375, 1.9529838562011719, 1.3609428405761719, 4.7672119140625, -1.8227462768554688, 19.633827209472656, 2.3077926635742188, -1.194427490234375, 0.6562957763671875, 5.424873352050781, 3.0608139038085938, 9.150505065917969, 5.5932159423828125, 3.584697723388672, 0.03852272033691406, -1.5526466369628906, -5.7399749755859375, 4.388206481933594, -4.74110221862793, 8.691307067871094, -3.74462890625, -5.494396209716797, 4.255523681640625, 3.8841094970703125, 11.351133346557617], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000313.npy"}
{"epoch": 0.47316704459561604, "step": 314, "batch_size": 64, "mean": 4.635438919067383, "std": 5.0715227127075195, "min": -4.8448944091796875, "p10": -1.8286537170410155, "median": 3.7130050659179688, "p90": 11.438550567626955, "max": 17.684858322143555, "pos_frac": 0.84375, "sample": [8.149787902832031, 1.4705162048339844, 3.427642822265625, 15.4378662109375, 5.187175750732422, 6.387443542480469, 3.3254470825195312, 4.379936218261719, -2.701629638671875, 5.280527114868164, 0.263519287109375, -1.924020767211914, 15.73681640625, 2.2843780517578125, -0.9204845428466797, 15.109970092773438, 4.733776092529297, 3.4735469818115234, 9.453353881835938, 0.25933837890625, 0.4927024841308594, 0.0788726806640625, 8.705366134643555, 10.838973999023438, 10.712821960449219, -4.8448944091796875, 3.283599853515625, 9.720504760742188, 6.3617095947265625, 4.7373504638671875, 8.10284423828125, 2.9862823486328125, 1.9976348876953125, 2.455127716064453, 0.5706329345703125, 2.9086990356445312, 10.675537109375, 0.4933013916015625, 1.7479419708251953, 4.976186752319336, 0.5066204071044922, -1.8314208984375, 12.114593505859375, 3.053936004638672, 5.98443603515625, 7.076873779296875, 7.896869659423828, 17.684858322143555, -0.0973052978515625, 12.21209716796875, 2.7137832641601562, 11.532119750976562, 6.93096923828125, 2.1882476806640625, 6.6683807373046875, 4.49090576171875, -2.8563385009765625, 4.407402038574219, 2.6829833984375, -1.8221969604492188, 3.952463150024414, -2.908445358276367, -2.9500694274902344, 11.220222473144531], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000314.npy"}
{"epoch": 0.47467876039304613, "step": 315, "batch_size": 64, "mean": 3.7043237686157227, "std": 4.728330612182617, "min": -8.339153289794922, "p10": -1.9169639587402343, "median": 3.857367515563965, "p90": 9.525964927673341, "max": 15.159727096557617, "pos_frac": 0.765625, "sample": [-1.9903030395507812, 5.117828369140625, 11.164682388305664, 3.8786449432373047, -0.6523704528808594, 5.527933120727539, -8.339153289794922, -1.3804206848144531, -1.9176712036132812, 1.8380813598632812, -4.0549468994140625, 8.544952392578125, -0.1061248779296875, 9.617118835449219, 0.11841201782226562, 5.116111755371094, 4.3952789306640625, 0.05991363525390625, 0.19240188598632812, -0.5508098602294922, 4.534099578857422, 8.176620483398438, 1.0213642120361328, 2.793304443359375, 2.1734466552734375, 5.559722900390625, 13.389801025390625, -1.915313720703125, 1.6849174499511719, 6.550321578979492, 4.9966278076171875, 3.3396034240722656, 10.215019226074219, 4.2898712158203125, 3.3474044799804688, 8.539306640625, 5.4762115478515625, 3.8022918701171875, -6.4770355224609375, 2.5240325927734375, 7.355377197265625, -1.108734130859375, 7.0962677001953125, 7.218372344970703, -1.0822906494140625, 5.881538391113281, 7.8587188720703125, 15.159727096557617, -4.496166229248047, 3.1575050354003906, 7.377067565917969, 1.3811721801757812, 6.03363037109375, -0.69140625, 5.129032135009766, 3.34576416015625, 9.313272476196289, 5.4990234375, 11.196483612060547, -3.1320743560791016, 8.282875061035156, 0.7514324188232422, 11.11285400390625, 3.836090087890625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000315.npy"}
{"epoch": 0.47619047619047616, "step": 316, "batch_size": 64, "mean": 4.514555931091309, "std": 5.21467399597168, "min": -4.638938903808594, "p10": -1.3579973220825192, "median": 3.9095382690429688, "p90": 11.888262939453126, "max": 19.40655517578125, "pos_frac": 0.734375, "sample": [-1.1373367309570312, 3.0914764404296875, 4.948020935058594, -0.7207107543945312, -4.638938903808594, -0.44530487060546875, 3.233489990234375, 6.633819580078125, 1.1848526000976562, 7.771446228027344, 6.94219970703125, 11.92523193359375, 4.769710540771484, 10.563468933105469, 6.217681884765625, -0.6566619873046875, -3.875173568725586, 2.5109825134277344, -2.474151611328125, 9.222806930541992, 6.606935501098633, 13.950462341308594, 3.354511260986328, 2.8698196411132812, 5.591251373291016, 4.638614654541016, 5.662712097167969, -0.43309783935546875, -2.2309722900390625, 5.930583953857422, 9.26658821105957, 12.908493041992188, 3.4472618103027344, 19.40655517578125, -1.452566146850586, 3.6397323608398438, -2.14447021484375, 7.184181213378906, 7.798149108886719, 2.0602874755859375, 9.288482666015625, 3.789764404296875, 5.6730804443359375, 0.08538436889648438, 10.415206909179688, -0.046295166015625, -0.597686767578125, 11.802001953125, -0.84588623046875, 16.30596923828125, -0.8705825805664062, -3.81390380859375, 1.150115966796875, -0.6853828430175781, 0.915771484375, 4.0293121337890625, 2.5239830017089844, 12.312847137451172, 6.1402587890625, 7.371604919433594, 10.633262634277344, 2.3370742797851562, 12.384075164794922, 5.5111846923828125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000316.npy"}
{"epoch": 0.47770219198790626, "step": 317, "batch_size": 64, "mean": 5.105019569396973, "std": 5.044238090515137, "min": -4.2430572509765625, "p10": -1.3761512756347654, "median": 4.907321929931641, "p90": 12.044970703125001, "max": 17.203027725219727, "pos_frac": 0.828125, "sample": [6.233406066894531, 8.989818572998047, 6.227809906005859, 8.835281372070312, 3.7393798828125, 16.601425170898438, -0.8396453857421875, 4.102081298828125, 1.0765819549560547, 6.878021240234375, 0.6320724487304688, 3.326618194580078, -1.4684982299804688, -0.7592983245849609, 6.472240447998047, 1.480682373046875, 11.409500122070312, 1.0797271728515625, 10.672813415527344, 11.966667175292969, -2.0023536682128906, 0.47597503662109375, 6.132972717285156, -3.2686614990234375, 8.14671516418457, 4.8180084228515625, 6.185516357421875, 8.706710815429688, 9.316520690917969, -1.160675048828125, 17.203027725219727, 3.0377044677734375, 5.142038345336914, 13.652669906616211, 13.152053833007812, -4.2430572509765625, 1.6269683837890625, -1.6076240539550781, 4.996635437011719, 12.078529357910156, 2.6708412170410156, 5.860565185546875, 14.402214050292969, -2.8372344970703125, 2.4972381591796875, 3.5525474548339844, 5.363410949707031, 7.567535400390625, 3.2640838623046875, 1.1255569458007812, 10.090103149414062, 6.49310302734375, 4.1233978271484375, 8.348281860351562, -0.5340347290039062, 3.5198287963867188, -1.5392913818359375, 7.6446990966796875, 9.663385391235352, 13.734046936035156, 1.0633697509765625, 7.629035949707031, 1.2914886474609375, 2.6807098388671875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000317.npy"}
{"epoch": 0.47921390778533635, "step": 318, "batch_size": 64, "mean": 3.096670150756836, "std": 6.321573257446289, "min": -8.838554382324219, "p10": -6.24200267791748, "median": 2.736644744873047, "p90": 11.609192276000979, "max": 16.567276000976562, "pos_frac": 0.640625, "sample": [4.385990142822266, 9.32244873046875, 1.504058837890625, -3.958526611328125, 10.408882141113281, 8.584781646728516, 4.921173095703125, -6.869140625, 3.4575023651123047, 15.037609100341797, 0.8033256530761719, -5.311653137207031, 6.605892181396484, 6.424781799316406, 11.019878387451172, 2.0654029846191406, 3.1819534301757812, -3.921560287475586, 4.971364974975586, -0.6529521942138672, -1.213479995727539, 6.149909973144531, 12.107547760009766, -6.568901062011719, 0.6662712097167969, 6.31121826171875, 11.86175537109375, -8.838554382324219, 4.57861328125, 7.6485137939453125, -1.4648666381835938, 6.365516662597656, -3.3940658569335938, 0.7399253845214844, -0.6716384887695312, 15.554840087890625, 16.567276000976562, -1.9603652954101562, 10.936588287353516, -2.1997528076171875, -0.5291748046875, 2.2913360595703125, 8.652427673339844, -7.52435302734375, -0.3533477783203125, 7.2117767333984375, -0.25032806396484375, 12.896560668945312, 1.1751384735107422, 7.184825897216797, 0.7858562469482422, 14.455062866210938, 3.250307083129883, 6.597930908203125, -6.853534698486328, -0.5968170166015625, 7.446647644042969, -5.684001922607422, 1.54998779296875, -0.054779052734375, 7.121913909912109, -6.481145858764648, -6.537614822387695, 7.274660110473633], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000318.npy"}
{"epoch": 0.48072562358276644, "step": 319, "batch_size": 64, "mean": 4.249757289886475, "std": 5.85236930847168, "min": -8.158676147460938, "p10": -2.2656669616699214, "median": 3.569948196411133, "p90": 11.865819549560548, "max": 19.50847625732422, "pos_frac": 0.78125, "sample": [3.5017833709716797, 14.246856689453125, 2.032684326171875, 7.071441650390625, 1.8909435272216797, 9.651123046875, 3.6133995056152344, 3.0016098022460938, 3.8738250732421875, 5.85472297668457, 9.793853759765625, 8.984443664550781, -0.8312034606933594, 0.6084213256835938, 4.3953857421875, 10.209075927734375, -1.2688980102539062, 19.50847625732422, 2.335512161254883, 8.19167709350586, 2.8907108306884766, -2.4374771118164062, 5.7693328857421875, 6.939460754394531, 14.845245361328125, -1.1176834106445312, 12.56602668762207, 4.2532958984375, 11.705623626708984, 8.604263305664062, -1.4163055419921875, -3.0543899536132812, 2.9874114990234375, 19.415130615234375, -4.885749816894531, 4.786857604980469, -1.333953857421875, 8.062919616699219, 6.5113067626953125, -5.7183837890625, 7.590038299560547, 2.4332752227783203, 0.8374252319335938, 1.292642593383789, 1.5322341918945312, 1.0489330291748047, -8.158676147460938, -0.03887939453125, -6.822696685791016, 11.93447494506836, 0.8267593383789062, 2.2315750122070312, 2.3922653198242188, -1.864776611328125, 4.282661437988281, 10.61163330078125, 7.199619293212891, 4.433931350708008, 6.833314895629883, 4.873476028442383, 2.169219970703125, -6.889915466308594, 13.670654296875, 3.5264968872070312], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000319.npy"}
{"epoch": 0.48223733938019653, "step": 320, "batch_size": 64, "mean": 3.3326382637023926, "std": 6.017539024353027, "min": -9.238662719726562, "p10": -3.399678039550781, "median": 1.6270904541015625, "p90": 10.865658187866213, "max": 22.20611572265625, "pos_frac": 0.6875, "sample": [7.269838333129883, -4.140605926513672, 0.904571533203125, 2.1242427825927734, 3.0726470947265625, 1.1056556701660156, 11.941337585449219, -0.01885986328125, 8.345863342285156, -0.45707130432128906, -0.34711456298828125, 8.268211364746094, 0.3453235626220703, 0.06894493103027344, -2.7028961181640625, 4.913330078125, 1.8264999389648438, -2.4033432006835938, -0.2782440185546875, 9.412040710449219, -5.3487091064453125, 3.293834686279297, -9.238662719726562, 1.3402462005615234, -1.723876953125, 22.20611572265625, 19.729949951171875, 8.44970703125, 4.721397399902344, -3.9385032653808594, 1.3557815551757812, -4.107635498046875, 6.458984375, 4.026214599609375, 0.2508506774902344, 14.644100189208984, 2.5409584045410156, -0.3195953369140625, -4.221027374267578, -0.0116424560546875, 5.607791900634766, 13.038593292236328, 1.3201904296875, 1.7914657592773438, 0.4328765869140625, 1.2362613677978516, 16.66451644897461, 3.9648094177246094, 1.4627151489257812, -1.3405647277832031, -0.10087013244628906, -3.0654220581054688, 10.077796936035156, -2.331390380859375, 3.969959259033203, 6.5518951416015625, 6.352783203125, 11.182445526123047, 10.126487731933594, 1.4192733764648438, 4.079629898071289, -3.5429306030273438, 8.661285400390625, 6.370389938354492], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000320.npy"}
{"epoch": 0.4837490551776266, "step": 321, "batch_size": 64, "mean": 3.652297258377075, "std": 5.461522579193115, "min": -9.385154724121094, "p10": -3.6539993286132795, "median": 3.675839424133301, "p90": 10.618072509765627, "max": 14.132320404052734, "pos_frac": 0.78125, "sample": [3.6812591552734375, 6.382236480712891, 0.32483863830566406, 5.0292205810546875, 4.569854736328125, 0.19196319580078125, 14.013729095458984, 5.443506240844727, 13.110904693603516, 0.9665660858154297, 12.983718872070312, 2.4101905822753906, 3.7941246032714844, 8.301216125488281, -5.158226013183594, 0.42726898193359375, -1.5383758544921875, 3.2998428344726562, 4.39300537109375, 8.521568298339844, 7.2601470947265625, 1.934234619140625, -8.555206298828125, -1.771881103515625, 3.5785789489746094, -4.454296112060547, -4.547828674316406, -5.705545425415039, 6.101167678833008, 4.2814483642578125, 2.7428131103515625, 8.417667388916016, 9.313934326171875, 4.115865707397461, 0.6359825134277344, 8.127403259277344, 5.865936279296875, -1.7866401672363281, 1.7535552978515625, 7.938911437988281, 1.5883464813232422, -0.6496620178222656, 1.2451744079589844, 12.723243713378906, -0.6809844970703125, -9.385154724121094, 9.435440063476562, 10.236434936523438, 12.499828338623047, 7.056133270263672, 2.7459030151367188, 3.670419692993164, 0.13763427734375, 8.447132110595703, 10.781631469726562, 6.669277191162109, 4.606842041015625, 3.5383377075195312, 14.132320404052734, -7.594200134277344, 5.737129211425781, -1.583587646484375, 3.3148117065429688, -1.3200836181640625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000321.npy"}
{"epoch": 0.4852607709750567, "step": 322, "batch_size": 64, "mean": 4.863307952880859, "std": 4.571203708648682, "min": -7.724651336669922, "p10": -0.10219192504882803, "median": 4.209711074829102, "p90": 11.004305648803712, "max": 16.451148986816406, "pos_frac": 0.875, "sample": [-0.6607666015625, 8.233871459960938, 11.025901794433594, 2.238433837890625, 8.45450210571289, 2.8558502197265625, 7.404571533203125, 6.195808410644531, 7.836399078369141, 3.4924774169921875, 2.0678939819335938, 4.573509216308594, 11.573104858398438, 12.492340087890625, 4.675445556640625, 3.9499282836914062, 4.837577819824219, 3.1251678466796875, 1.1275157928466797, -0.3941192626953125, 3.6485443115234375, 1.7237777709960938, -2.9178695678710938, 11.325284957885742, 10.953914642333984, 8.547367095947266, 2.739839553833008, 5.554925918579102, -0.00507354736328125, 6.798652648925781, 1.7827606201171875, 3.7541656494140625, 5.4835357666015625, 10.213565826416016, 8.42465591430664, 4.007131576538086, 2.1913528442382812, 13.374725341796875, 1.8984413146972656, -0.1438140869140625, 7.7894134521484375, 8.209238052368164, 0.4972248077392578, 9.76321029663086, 4.761165618896484, 12.528236389160156, 10.72354507446289, 4.412290573120117, -3.1191940307617188, -7.724651336669922, 2.899332046508789, 2.1912765502929688, 2.896381378173828, 5.723054885864258, 10.186637878417969, 2.8232269287109375, 2.7609100341796875, 1.0876312255859375, 4.510284423828125, -4.709850311279297, 16.451148986816406, 1.284149169921875, 7.680938720703125, 3.1648197174072266], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000322.npy"}
{"epoch": 0.48677248677248675, "step": 323, "batch_size": 64, "mean": 3.4498794078826904, "std": 6.767147541046143, "min": -10.660369873046875, "p10": -3.5377441406249996, "median": 2.6794052124023438, "p90": 13.492502593994141, "max": 22.871257781982422, "pos_frac": 0.6875, "sample": [1.8912849426269531, -1.5924835205078125, 3.1524429321289062, 0.032196044921875, 0.5656204223632812, 5.135765075683594, 5.847625732421875, 4.336511611938477, 0.46666717529296875, 13.930038452148438, 6.079673767089844, -2.150674819946289, 11.007926940917969, 4.49432373046875, 13.421211242675781, -3.2477989196777344, -7.468231201171875, 8.389678955078125, 7.927669525146484, 2.72052001953125, -4.9463958740234375, 7.7972259521484375, 1.24420166015625, 1.8567047119140625, 18.51036834716797, -0.15676307678222656, -6.434913635253906, 4.240470886230469, 0.6398773193359375, 5.05316162109375, 21.011672973632812, -3.04296875, -1.0926666259765625, -3.662006378173828, 9.545585632324219, 1.0428695678710938, -1.217864990234375, 7.134193420410156, 2.7482757568359375, 0.9924087524414062, 6.369377136230469, 3.5287094116210938, -0.9822158813476562, 0.122650146484375, 14.439804077148438, 22.871257781982422, 16.997604370117188, -9.575019836425781, 13.523056030273438, -1.0049972534179688, -10.660369873046875, -0.041149139404296875, 4.438602447509766, 2.6382904052734375, 5.63372802734375, 8.401742935180664, 6.402130126953125, 5.7783355712890625, -3.7471771240234375, -2.2184886932373047, 2.830768585205078, -0.30313873291015625, 0.354217529296875, -1.2088470458984375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000323.npy"}
{"epoch": 0.48828420256991684, "step": 324, "batch_size": 64, "mean": 3.9843506813049316, "std": 4.398356914520264, "min": -5.5849151611328125, "p10": -1.3540222167968747, "median": 3.563884735107422, "p90": 9.561030197143557, "max": 15.228973388671875, "pos_frac": 0.84375, "sample": [7.444007873535156, 2.137176513671875, 4.717018127441406, 1.1851654052734375, -2.535350799560547, 11.618335723876953, 5.373086929321289, 1.4985733032226562, -1.4169464111328125, 1.2580108642578125, 9.769645690917969, 8.849496841430664, 2.06536865234375, 4.979549407958984, 4.815025329589844, -0.06629562377929688, 9.074260711669922, -5.5849151611328125, 4.9434661865234375, 0.4981689453125, 4.738563537597656, 3.79052734375, 1.1465110778808594, 6.802070617675781, 6.173473358154297, 2.3891944885253906, -1.2071990966796875, 1.0809173583984375, 15.228973388671875, 2.1075363159179688, 1.824819564819336, -3.474803924560547, 7.354959487915039, 6.7079925537109375, 4.5792388916015625, 2.455280303955078, 2.373920440673828, -2.3953628540039062, 5.513397216796875, 5.526405334472656, 7.246604919433594, -3.0152854919433594, 12.93218994140625, 0.5057373046875, 13.028648376464844, 1.0703887939453125, 0.7273025512695312, 5.8605499267578125, 4.493564605712891, 11.4249267578125, 14.062164306640625, 1.5627517700195312, 1.094573974609375, 5.009086608886719, 8.101051330566406, 4.245170593261719, 0.6606597900390625, -1.5436248779296875, 2.9011993408203125, 2.507984161376953, 7.899772644042969, 7.8874664306640625, 3.3372421264648438, -0.34094810485839844], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000324.npy"}
{"epoch": 0.4897959183673469, "step": 325, "batch_size": 64, "mean": 3.764554262161255, "std": 5.269677639007568, "min": -13.706893920898438, "p10": -2.4331996917724608, "median": 3.1532554626464844, "p90": 11.122118759155274, "max": 18.83655548095703, "pos_frac": 0.78125, "sample": [3.0718231201171875, 12.336845397949219, 1.7969551086425781, 5.751070022583008, 2.2066612243652344, 3.1921844482421875, 7.350437164306641, 5.641162872314453, 2.9516143798828125, 2.6486587524414062, 5.9577178955078125, 5.685102462768555, 1.2819232940673828, -3.53118896484375, 4.3522491455078125, -0.8872337341308594, 11.058296203613281, 2.7116241455078125, 3.1143264770507812, -2.688924789428711, 18.83655548095703, -3.713470458984375, 6.318796157836914, 1.3462200164794922, 0.47580528259277344, 14.4329833984375, -0.013607025146484375, 6.704923629760742, -2.078662872314453, 3.921802520751953, -0.2477874755859375, 3.4745140075683594, 6.527534484863281, 5.075531005859375, -3.3698654174804688, 11.4473876953125, -2.854198455810547, 7.4305572509765625, 4.420717239379883, 9.24871826171875, 3.7163543701171875, 0.6334514617919922, 2.2433319091796875, 12.159454345703125, -13.706893920898438, 2.0620193481445312, 1.795745849609375, 1.7831802368164062, 11.149471282958984, 4.785545349121094, 17.039569854736328, -0.86590576171875, 4.171518325805664, -2.58514404296875, 2.5806655883789062, -0.6228485107421875, 4.937751770019531, 6.480091094970703, 9.589019775390625, 2.5091705322265625, 5.8197784423828125, 0.5810317993164062, 3.5491256713867188, -0.25975799560546875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000325.npy"}
{"epoch": 0.491307634164777, "step": 326, "batch_size": 64, "mean": 2.1960787773132324, "std": 5.1725263595581055, "min": -8.85308837890625, "p10": -4.55428295135498, "median": 1.9090919494628906, "p90": 8.895280075073245, "max": 12.091758728027344, "pos_frac": 0.65625, "sample": [-4.010612487792969, 0.1426544189453125, 3.6751365661621094, 1.2913780212402344, 8.1739501953125, -8.85308837890625, 11.171249389648438, 5.4432830810546875, 1.6345291137695312, 11.653297424316406, 0.5085258483886719, 5.663364410400391, 1.8318824768066406, 0.10547637939453125, 2.2989768981933594, 5.785285949707031, 0.6284770965576172, 5.49615478515625, -1.9445152282714844, 7.107856750488281, 9.126066207885742, 2.3040313720703125, 4.4049530029296875, 9.546066284179688, -2.9659347534179688, -0.947601318359375, -7.9498748779296875, -2.568115234375, -0.607452392578125, -1.5016708374023438, 10.105768203735352, -6.848121643066406, 3.5596389770507812, -1.5628471374511719, 6.692405700683594, 1.9162826538085938, 2.7012062072753906, 1.700164794921875, -2.1755409240722656, -5.0651702880859375, 8.356779098510742, 11.955230712890625, -4.572174072265625, -0.49919891357421875, 7.793125152587891, 1.9019012451171875, 6.534824371337891, -4.512537002563477, 4.236322402954102, 12.091758728027344, -0.4693717956542969, -2.196565628051758, -5.74884033203125, 5.5032501220703125, -1.4781112670898438, 4.9151611328125, -7.183187484741211, 4.0151519775390625, 6.421302795410156, 5.54339599609375, 5.824228286743164, -3.237081527709961, 0.37873077392578125, 7.307422637939453], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000326.npy"}
{"epoch": 0.4928193499622071, "step": 327, "batch_size": 64, "mean": 3.5499043464660645, "std": 5.820561408996582, "min": -12.964401245117188, "p10": -2.808633232116699, "median": 3.2461071014404297, "p90": 11.524661254882814, "max": 18.670249938964844, "pos_frac": 0.6875, "sample": [6.872039794921875, 0.29065704345703125, -1.0541534423828125, 15.19720458984375, 1.247161865234375, 6.6375885009765625, -7.832265853881836, 4.60791015625, -4.028095245361328, 8.053340911865234, 13.694202423095703, -0.5329475402832031, -2.816610336303711, 8.709728240966797, 6.1353759765625, 9.146852493286133, 9.410980224609375, 0.16925621032714844, -2.238037109375, -2.790019989013672, 0.5661182403564453, 3.178722381591797, 5.892208099365234, 11.240371704101562, 11.905441284179688, -0.2187957763671875, 4.341423034667969, 3.9650192260742188, 7.266632080078125, -1.9041748046875, 5.942693710327148, 2.6390037536621094, -2.81689453125, 9.409881591796875, 6.126617431640625, 0.5428009033203125, 6.426971435546875, 1.5704269409179688, -0.9580268859863281, -0.7102069854736328, 12.718048095703125, 0.9042434692382812, -12.964401245117188, 3.3134918212890625, 7.46502685546875, 18.670249938964844, -0.26711273193359375, 6.1128692626953125, 13.774330139160156, 4.023508071899414, 11.646499633789062, -0.4325752258300781, 3.0660362243652344, 2.792816162109375, -4.330020904541016, -0.8991413116455078, 9.879631042480469, 4.900840759277344, 5.971893310546875, -0.14628219604492188, 1.0496292114257812, -5.477071762084961, 3.6361083984375, -1.5011253356933594], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000327.npy"}
{"epoch": 0.4943310657596372, "step": 328, "batch_size": 64, "mean": 3.664557695388794, "std": 4.727904319763184, "min": -5.932281494140625, "p10": -2.1495502471923826, "median": 3.610661506652832, "p90": 10.561863136291505, "max": 15.898040771484375, "pos_frac": 0.765625, "sample": [13.105422973632812, 1.4018783569335938, 3.4933414459228516, 1.608978271484375, 6.879764556884766, 0.7668361663818359, -1.7053813934326172, 7.704469680786133, 4.292320251464844, 4.896049499511719, -1.088623046875, 7.8846435546875, 13.124153137207031, 4.539043426513672, 3.7279815673828125, -3.1173858642578125, 10.147350311279297, 5.461462020874023, 4.7007904052734375, -4.297687530517578, 3.8263092041015625, -2.7968292236328125, 0.27431488037109375, 5.271141052246094, -0.7435760498046875, 0.6562271118164062, 12.421722412109375, 1.6589279174804688, -5.932281494140625, 6.198936462402344, -2.8161468505859375, 5.98292350769043, 9.314863204956055, 4.4117279052734375, 7.606536865234375, -0.015106201171875, -0.39804840087890625, -0.5209026336669922, 15.898040771484375, 0.4151153564453125, 8.585205078125, 5.4604034423828125, -1.7996063232421875, -2.2995262145996094, 4.63897705078125, 5.389152526855469, 0.8336410522460938, 1.7049713134765625, 0.910186767578125, 3.923870086669922, 3.4622859954833984, -1.1515426635742188, 5.055212020874023, 6.631500244140625, -2.3069076538085938, 2.139484405517578, 5.639823913574219, 2.606813430786133, 11.765527725219727, 2.101287841796875, 3.20526123046875, 0.0031280517578125, 13.053733825683594, 10.739511489868164], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000328.npy"}
{"epoch": 0.4958427815570673, "step": 329, "batch_size": 64, "mean": 3.6760973930358887, "std": 5.667335510253906, "min": -6.983375549316406, "p10": -2.92387580871582, "median": 3.180478096008301, "p90": 10.77886238098145, "max": 19.55081558227539, "pos_frac": 0.71875, "sample": [3.183633804321289, 1.8879890441894531, 4.118621826171875, 5.064994812011719, -3.05560302734375, 3.8418807983398438, 11.411407470703125, 1.9516220092773438, 6.83428955078125, 1.9156341552734375, 9.061225891113281, 4.987251281738281, -2.6165122985839844, 6.655063629150391, -0.9508304595947266, -0.43845367431640625, 1.3591079711914062, 5.52935791015625, -4.3723602294921875, 5.005699157714844, 1.6207656860351562, 0.16040992736816406, 16.329896926879883, 6.2659149169921875, -6.983375549316406, 12.082542419433594, 18.9764404296875, 6.401725769042969, -1.833709716796875, -0.21087646484375, 7.748252868652344, 3.832672119140625, -4.227008819580078, 6.748748779296875, 6.315832138061523, -0.5484161376953125, 15.175079345703125, -1.3037166595458984, 8.777688980102539, 6.656982421875, 9.263824462890625, 7.494178771972656, 0.5050773620605469, -3.95623779296875, 2.3643035888671875, 4.3315887451171875, 0.06288719177246094, 2.6155929565429688, 1.4813003540039062, 5.215787887573242, 0.9752388000488281, 7.83648681640625, 3.4663848876953125, -1.3283119201660156, 19.55081558227539, 9.695892333984375, -2.002878189086914, 11.242992401123047, 1.4914093017578125, -6.827880859375, 3.1773223876953125, -0.12935638427734375, -0.1349964141845703, -4.481048583984375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000329.npy"}
{"epoch": 0.4973544973544973, "step": 330, "batch_size": 64, "mean": 3.9373912811279297, "std": 5.965422630310059, "min": -12.862041473388672, "p10": -3.479503631591797, "median": 3.493006706237793, "p90": 11.302259826660158, "max": 16.81964874267578, "pos_frac": 0.765625, "sample": [-0.16839599609375, -1.9759330749511719, -1.8549232482910156, -7.563920974731445, -0.3785552978515625, -12.862041473388672, 10.536727905273438, 2.74371337890625, -3.4963607788085938, 11.795280456542969, 9.904556274414062, 3.6261043548583984, -7.338338851928711, 2.5439682006835938, 8.340038299560547, 10.595306396484375, 13.068008422851562, -3.7204208374023438, 16.81964874267578, 10.19955062866211, 9.594985961914062, -1.3937320709228516, 5.3109130859375, 0.6270236968994141, 8.873600006103516, 8.694229125976562, 3.6229934692382812, 1.2081756591796875, -0.06916046142578125, 1.6593399047851562, 2.429067611694336, 7.015052795410156, 1.3376388549804688, 2.673450469970703, -6.199668884277344, 10.259193420410156, 8.348594665527344, 6.41595458984375, 7.7247161865234375, 3.3630199432373047, 1.2116012573242188, 4.2282562255859375, 0.7266998291015625, 5.8899383544921875, 11.605239868164062, 10.021936416625977, 2.351726531982422, 2.8905487060546875, 2.8348922729492188, -1.054168701171875, 0.7098731994628906, 4.1102294921875, 14.222526550292969, 5.563478469848633, 3.9183731079101562, 4.496490478515625, 1.0275726318359375, -3.4401702880859375, 13.671005249023438, -5.804206848144531, 7.785591125488281, 4.552970886230469, 2.6721649169921875, 15.491077423095703], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000330.npy"}
{"epoch": 0.4988662131519274, "step": 331, "batch_size": 64, "mean": 5.116671562194824, "std": 6.038074016571045, "min": -10.586105346679688, "p10": -1.9199790954589842, "median": 5.3330230712890625, "p90": 12.858785629272463, "max": 21.23352813720703, "pos_frac": 0.796875, "sample": [10.87631607055664, 3.0800857543945312, 2.803272247314453, 5.550228118896484, 6.603939056396484, 9.207435607910156, 5.8028717041015625, 10.752098083496094, 5.7509002685546875, 1.407196044921875, -0.378814697265625, 14.119735717773438, 12.51401138305664, 14.358123779296875, 6.1284027099609375, -1.3663864135742188, 21.23352813720703, 5.3013916015625, 4.335792541503906, 16.893465042114258, 7.121519088745117, 1.7844696044921875, 2.4511260986328125, 3.5580673217773438, 0.2941436767578125, 10.970836639404297, 6.85130500793457, 18.588973999023438, -0.580596923828125, 6.9493865966796875, 7.3003692626953125, 4.029634475708008, 9.612434387207031, 10.502273559570312, 1.833221435546875, 0.5569915771484375, 8.421285629272461, -3.4351348876953125, -2.8515243530273438, 8.493915557861328, -2.0349578857421875, 14.433616638183594, 4.749261856079102, -0.4684295654296875, 3.0669078826904297, -10.586105346679688, 3.1456451416015625, -3.1301422119140625, -4.706573486328125, 0.4572315216064453, -1.6516952514648438, 4.043365478515625, 10.611724853515625, 1.1548080444335938, 6.365570068359375, 0.7580795288085938, -0.9384670257568359, 8.376081466674805, 5.364654541015625, -5.332679748535156, 11.401277542114258, 5.794567108154297, 6.160400390625, 13.006546020507812], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000331.npy"}
{"epoch": 0.5003779289493575, "step": 332, "batch_size": 64, "mean": 4.245758533477783, "std": 5.32445764541626, "min": -6.345424652099609, "p10": -0.773029327392578, "median": 3.3121557235717773, "p90": 11.525017547607423, "max": 19.998178482055664, "pos_frac": 0.828125, "sample": [-0.6508636474609375, 2.3238296508789062, 3.2253494262695312, -0.17315673828125, 4.070409774780273, 5.443534851074219, 3.2707672119140625, 3.9469470977783203, 2.366424560546875, 1.6835136413574219, 3.1132431030273438, 15.762153625488281, 10.167129516601562, 7.929588317871094, 11.585517883300781, 1.663909912109375, 3.135639190673828, 0.24040603637695312, 5.872707366943359, 0.77679443359375, 10.95199966430664, 2.6653518676757812, -6.345424652099609, 14.74020004272461, 3.4811477661132812, 5.2368316650390625, -2.5878257751464844, 1.5951080322265625, 3.5700302124023438, 4.9311370849609375, -0.8253860473632812, 10.53775405883789, 2.4436874389648438, 0.46967315673828125, 4.2913360595703125, 3.353544235229492, 0.046421051025390625, 15.710525512695312, -5.0673828125, -0.0458221435546875, 4.1830596923828125, 10.967559814453125, 5.79966926574707, 4.656131744384766, 1.788970947265625, -0.100341796875, 6.4403228759765625, 8.252323150634766, 4.757904052734375, 3.401721954345703, 11.7623291015625, 2.9040679931640625, 3.430217742919922, 0.8508949279785156, 2.6426773071289062, -1.444091796875, 19.998178482055664, 2.3660106658935547, 1.7064361572265625, 11.38385009765625, 16.383087158203125, 3.5951690673828125, -6.267353057861328, -2.6370086669921875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000332.npy"}
{"epoch": 0.5018896447467877, "step": 333, "batch_size": 64, "mean": 5.13956356048584, "std": 5.502954959869385, "min": -4.6061248779296875, "p10": -1.4387344360351562, "median": 3.6345787048339844, "p90": 12.388198852539062, "max": 21.060104370117188, "pos_frac": 0.828125, "sample": [2.362396240234375, -0.7521934509277344, 12.308517456054688, 9.570549011230469, 5.651481628417969, 0.11091041564941406, 7.586795806884766, 2.76910400390625, 2.907623291015625, 2.80712890625, 12.084342956542969, 9.928062438964844, 4.995903015136719, 5.391574859619141, 10.295221328735352, 6.352783203125, 4.544525146484375, 16.197357177734375, 12.422348022460938, 14.511672973632812, 1.4086761474609375, 2.9716796875, 2.9824447631835938, 0.070465087890625, 3.406930923461914, -2.3185882568359375, -1.8668346405029297, 4.215522766113281, 3.2332916259765625, 3.7517013549804688, 8.595279693603516, 14.37640380859375, -0.90435791015625, 2.5219497680664062, 6.969505310058594, 7.715873718261719, 21.060104370117188, -1.7172412872314453, 0.2311229705810547, 0.4409942626953125, -0.3867664337158203, 3.5174560546875, -4.6061248779296875, 1.7317314147949219, 2.8503494262695312, -1.5256729125976562, 11.3231201171875, 1.2051467895507812, -1.2358779907226562, 6.4652557373046875, 2.8171920776367188, -3.147022247314453, 5.032648086547852, 14.211418151855469, 0.7865467071533203, -3.2979660034179688, 5.448081970214844, 2.547637939453125, 13.296754837036133, 11.030641555786133, 10.648681640625, 9.422821044921875, 11.481010437011719, 10.123970031738281], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000333.npy"}
{"epoch": 0.5034013605442177, "step": 334, "batch_size": 64, "mean": 3.458221197128296, "std": 5.9718427658081055, "min": -10.953025817871094, "p10": -3.2356868743896485, "median": 2.8460540771484375, "p90": 11.754915618896487, "max": 15.808120727539062, "pos_frac": 0.765625, "sample": [-3.2056808471679688, -10.953025817871094, 4.4150238037109375, 2.8057785034179688, 5.4229736328125, 3.1826210021972656, -8.228744506835938, 11.449867248535156, 1.6240425109863281, -9.758304595947266, 0.5729293823242188, 15.808120727539062, -3.411529541015625, 2.010162353515625, 1.567718505859375, 5.640834808349609, -0.563751220703125, -1.1077861785888672, 3.47882080078125, 12.520442962646484, 9.663642883300781, 14.45166015625, 2.7229461669921875, 5.640615463256836, 8.656230926513672, -5.870632171630859, 2.6250381469726562, 0.6747455596923828, 14.172992706298828, 1.9183635711669922, 1.478841781616211, -2.5953330993652344, 11.172676086425781, -2.573944091796875, 7.740938186645508, 13.622802734375, 8.174560546875, 4.268096923828125, 3.68377685546875, 1.2718276977539062, 2.8863296508789062, -2.479473114013672, -0.4331645965576172, 4.4030303955078125, 12.569869995117188, 0.5265541076660156, 0.20067596435546875, 1.3405723571777344, -8.540191650390625, 7.777557373046875, 7.014461517333984, 2.6865005493164062, 5.150238037109375, 2.488910675048828, -3.248546600341797, 5.07684326171875, 6.515495300292969, 11.885650634765625, -2.9372329711914062, 1.4698104858398438, 6.927154541015625, 5.3975677490234375, 9.927093505859375, 10.550094604492188], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000334.npy"}
{"epoch": 0.5049130763416477, "step": 335, "batch_size": 64, "mean": 4.165672302246094, "std": 5.840446472167969, "min": -8.592903137207031, "p10": -2.645248031616211, "median": 3.0770692825317383, "p90": 11.89737377166748, "max": 16.585372924804688, "pos_frac": 0.75, "sample": [6.844276428222656, 10.137557983398438, -2.6950206756591797, 14.005996704101562, 1.9326629638671875, 11.046112060546875, 9.864524841308594, 0.1618213653564453, 4.3302001953125, 3.093669891357422, 11.82016372680664, 10.05038070678711, 2.831939697265625, 3.4285659790039062, 15.0098876953125, 11.930463790893555, 4.176116943359375, 0.6552505493164062, 1.1237258911132812, 15.272130966186523, -3.8517837524414062, 4.88128662109375, 4.500295639038086, 12.762657165527344, -1.46630859375, 6.93023681640625, 9.886581420898438, -6.1313629150390625, -2.692331314086914, 7.162353515625, 7.229316711425781, -0.8518905639648438, 1.6081161499023438, 3.7664108276367188, 8.813793182373047, -2.5353870391845703, 11.164213180541992, -0.5016975402832031, 1.8551712036132812, 16.585372924804688, 2.7471237182617188, 3.0604686737060547, 0.25710296630859375, 7.1268463134765625, 3.3399009704589844, 14.872947692871094, -0.23256683349609375, -2.0062942504882812, -0.21405792236328125, 0.2869586944580078, 2.3203125, -2.0188865661621094, 0.4895782470703125, -5.430938720703125, -8.592903137207031, -4.529430389404297, 9.818161010742188, 6.775413513183594, 2.265657424926758, 8.405658721923828, -0.1442718505859375, 0.46469879150390625, 2.2592086791992188, 11.1468505859375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000335.npy"}
{"epoch": 0.5064247921390779, "step": 336, "batch_size": 64, "mean": 3.7194204330444336, "std": 6.141866683959961, "min": -6.38592529296875, "p10": -2.646804046630859, "median": 2.3637256622314453, "p90": 11.96867370605469, "max": 24.029525756835938, "pos_frac": 0.6875, "sample": [-1.7416839599609375, 6.6898345947265625, 9.433303833007812, -1.18402099609375, 19.462982177734375, 6.79827880859375, 6.12310791015625, -4.507896423339844, 17.104598999023438, 0.9088897705078125, -0.7757625579833984, -0.8354244232177734, 15.15216064453125, -0.5711746215820312, 8.560470581054688, 2.2772865295410156, -0.048553466796875, 4.848106384277344, 14.147125244140625, -3.613384246826172, -0.3956451416015625, 2.90142822265625, 11.294929504394531, 1.0093002319335938, -2.5238800048828125, 9.48525619506836, 1.7753562927246094, -6.38592529296875, -1.58099365234375, -5.4326019287109375, 13.121002197265625, -4.184898376464844, 1.2770195007324219, 6.4540252685546875, 0.4415321350097656, -0.7199993133544922, 2.5912322998046875, -2.6994857788085938, 9.07133674621582, 5.6851043701171875, 0.6840744018554688, -0.7396163940429688, 2.2377853393554688, 0.9999847412109375, 7.301300048828125, 3.566375732421875, 4.8608856201171875, -1.2830810546875, 2.5646591186523438, 12.110397338867188, 1.6627693176269531, 0.712371826171875, 5.564125061035156, 11.637985229492188, 24.029525756835938, 5.7970733642578125, 0.3833160400390625, 2.450164794921875, 7.391181945800781, -3.0684967041015625, 3.2992706298828125, 6.376064300537109, 2.604583740234375, -2.512115478515625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000336.npy"}
{"epoch": 0.5079365079365079, "step": 337, "batch_size": 64, "mean": 3.395236015319824, "std": 5.306595325469971, "min": -7.450040817260742, "p10": -3.370429992675781, "median": 3.0293378829956055, "p90": 8.752456665039062, "max": 20.19171142578125, "pos_frac": 0.734375, "sample": [2.1570091247558594, 0.5610809326171875, -4.5133819580078125, -3.498363494873047, 5.330257415771484, 6.859123229980469, -0.4361896514892578, 10.70738410949707, -1.13397216796875, -0.1811676025390625, 15.30010986328125, 2.7959213256835938, 3.2018203735351562, -7.450040817260742, 3.043426513671875, 1.0429153442382812, 5.2431182861328125, 8.418609619140625, 2.368927001953125, -1.302642822265625, 3.8315658569335938, -0.3012676239013672, -4.77739143371582, -6.875999450683594, 8.282012939453125, 5.62550163269043, -3.071918487548828, 4.218559265136719, 3.015249252319336, 14.33770751953125, 3.261138916015625, -0.6755599975585938, 8.687562942504883, -1.5356521606445312, 8.721267700195312, 8.002754211425781, 6.587715148925781, 1.3106231689453125, 1.3033294677734375, 6.042728424072266, 8.765823364257812, 4.149562835693359, 7.7489776611328125, 3.6468734741210938, 1.920013427734375, -1.2342376708984375, 3.742565155029297, 0.620361328125, 6.333198547363281, 1.1691703796386719, 7.248077392578125, 20.19171142578125, -0.605621337890625, 1.2599639892578125, 2.3371658325195312, 12.578460693359375, 0.8688735961914062, -3.7241439819335938, 8.039308547973633, 6.019227981567383, -5.739891052246094, 2.5377044677734375, 11.150764465332031, 3.767303466796875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000337.npy"}
{"epoch": 0.509448223733938, "step": 338, "batch_size": 64, "mean": 3.481980562210083, "std": 5.133074760437012, "min": -8.818336486816406, "p10": -2.017831230163574, "median": 2.9517383575439453, "p90": 9.201964378356934, "max": 15.086894989013672, "pos_frac": 0.765625, "sample": [-2.911956787109375, 0.6128883361816406, 6.122283935546875, 7.64088249206543, 8.216434478759766, 5.936054229736328, -8.818336486816406, 6.001518249511719, 0.5112152099609375, 14.005983352661133, 15.086894989013672, 2.3140106201171875, -1.3150711059570312, 4.9797515869140625, 1.619415283203125, -2.5064353942871094, -0.388916015625, -5.942203521728516, 0.1573486328125, 1.6012725830078125, 3.17431640625, 7.857688903808594, 3.1794052124023438, -0.04529380798339844, 5.843406677246094, 4.11541748046875, -7.708427429199219, 7.631904602050781, 5.484375, 9.253946304321289, 5.024223327636719, 11.312255859375, 0.24257659912109375, 2.7291603088378906, 2.184419631958008, 5.2154998779296875, 5.899728775024414, 3.9613571166992188, 9.080673217773438, -0.203216552734375, 0.312713623046875, 7.011457443237305, -2.7028732299804688, 5.153179168701172, 1.8826942443847656, 3.8787784576416016, -1.7925643920898438, -2.109212875366211, 1.985687255859375, 0.180816650390625, 2.0671615600585938, 14.722909927368164, 7.4093780517578125, 0.2278594970703125, 3.6669044494628906, 9.003320693969727, 1.2255401611328125, 12.697452545166016, 8.089019775390625, -1.8046073913574219, -1.3858051300048828, -0.4858207702636719, 15.05537223815918, 1.4009475708007812], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000338.npy"}
{"epoch": 0.5109599395313681, "step": 339, "batch_size": 64, "mean": 3.9608728885650635, "std": 5.046861171722412, "min": -5.207012176513672, "p10": -2.306228446960449, "median": 3.1543006896972656, "p90": 10.54089469909668, "max": 16.9937686920166, "pos_frac": 0.796875, "sample": [6.269010543823242, 3.5425643920898438, 0.624542236328125, -0.08603668212890625, -2.237049102783203, 9.604879379272461, 4.0757598876953125, 3.2042999267578125, 1.4384117126464844, -2.2555103302001953, 8.989044189453125, -0.8721160888671875, 1.6478633880615234, 1.6182594299316406, 1.7642478942871094, 1.4651718139648438, -4.932201385498047, 5.20001220703125, 12.715450286865234, -0.29708099365234375, 9.729804992675781, 2.702983856201172, 6.292915344238281, 4.837615966796875, 0.6528701782226562, 8.12545394897461, 2.639657974243164, 2.2889556884765625, 2.8931884765625, 5.839973449707031, 10.343402862548828, 11.125778198242188, 14.923149108886719, 16.9937686920166, 7.176795959472656, 2.3385353088378906, 5.867439270019531, -2.3279647827148438, 7.712982177734375, 5.741363525390625, 12.452400207519531, 2.2439727783203125, -3.3501243591308594, -2.496784210205078, 4.6873779296875, -1.4405364990234375, 0.310577392578125, 5.4381103515625, -5.207012176513672, -4.7506866455078125, 0.9312191009521484, 1.122720718383789, 6.894317626953125, 7.418336868286133, -4.1302032470703125, 15.561813354492188, 0.9233856201171875, 3.1043014526367188, 3.5705642700195312, 7.493354797363281, 5.102468490600586, 7.832706451416016, 1.7798690795898438, 10.625534057617188], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000339.npy"}
{"epoch": 0.5124716553287982, "step": 340, "batch_size": 64, "mean": 4.320586681365967, "std": 6.507540702819824, "min": -10.600448608398438, "p10": -3.1760450363159176, "median": 3.3371429443359375, "p90": 13.82598648071289, "max": 16.433366775512695, "pos_frac": 0.703125, "sample": [6.5677947998046875, 11.355842590332031, 6.101512908935547, 5.060455322265625, -1.9257965087890625, 16.433366775512695, 0.413421630859375, -0.22459983825683594, -2.3137874603271484, 7.793914794921875, 3.3431549072265625, 8.823089599609375, 5.18536376953125, 6.712339401245117, 15.566513061523438, 9.280696868896484, 0.20971298217773438, -7.210487365722656, 13.459060668945312, 10.787445068359375, -2.0992908477783203, -1.1307830810546875, -6.416683197021484, 4.485532760620117, 4.196308135986328, 15.080986022949219, 0.14171600341796875, 8.970436096191406, 4.636863708496094, 11.949838638305664, -4.705055236816406, 7.291290283203125, 1.34832763671875, 4.4872283935546875, 10.435516357421875, 3.0018482208251953, 13.794525146484375, 0.1565399169921875, -0.25616455078125, 12.791425704956055, 16.201887130737305, -2.016603469848633, -0.15752029418945312, 13.839469909667969, 10.143024444580078, -3.0376052856445312, -2.1903839111328125, 2.3583641052246094, 2.7271480560302734, 9.291534423828125, 2.2317352294921875, 3.1318740844726562, 2.4662322998046875, -3.2353763580322266, -10.600448608398438, -3.304473876953125, 15.3424072265625, -0.0652008056640625, 4.188226699829102, 15.341812133789062, 2.5326194763183594, -0.083709716796875, -5.498003005981445, 3.3311309814453125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000340.npy"}
{"epoch": 0.5139833711262283, "step": 341, "batch_size": 64, "mean": 4.315929889678955, "std": 6.261607646942139, "min": -8.654899597167969, "p10": -4.0239208221435545, "median": 3.792379379272461, "p90": 11.96708793640137, "max": 16.659164428710938, "pos_frac": 0.71875, "sample": [-4.180957794189453, 8.869857788085938, 7.524757385253906, 8.501893997192383, 11.32408332824707, 7.652502059936523, 11.386943817138672, 16.659164428710938, 9.036575317382812, -4.7723541259765625, 13.162029266357422, 2.0648365020751953, -3.657501220703125, -1.0761184692382812, -0.15557479858398438, 1.3532028198242188, 9.177223205566406, 3.5082015991210938, 14.709257125854492, 14.811141967773438, 8.342742919921875, 11.019073486328125, 12.197994232177734, 4.924995422363281, -7.87530517578125, 14.73805046081543, 7.9497833251953125, -1.1832771301269531, -0.4887809753417969, -2.841766357421875, 4.703472137451172, 4.184539794921875, -3.261138916015625, -1.0578079223632812, 5.829010009765625, 11.428306579589844, 2.6691818237304688, 2.9300537109375, 1.6646957397460938, -6.547340393066406, -4.7959747314453125, -0.9722671508789062, 3.175334930419922, 7.24072265625, 0.9252357482910156, 14.863704681396484, 9.867263793945312, 10.184326171875, 4.246158599853516, 11.224861145019531, 3.8900909423828125, 3.6946678161621094, 11.18271255493164, -4.54052734375, 1.0270538330078125, 11.09783935546875, 4.2802276611328125, 0.34283447265625, 1.1645870208740234, -1.9163436889648438, 1.4135589599609375, -0.34668731689453125, 2.399381637573242, -8.654899597167969], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000341.npy"}
{"epoch": 0.5154950869236583, "step": 342, "batch_size": 64, "mean": 4.975249290466309, "std": 6.052150726318359, "min": -8.150690078735352, "p10": -1.5888853073120113, "median": 3.245511054992676, "p90": 13.51208610534668, "max": 19.460769653320312, "pos_frac": 0.765625, "sample": [5.359397888183594, 2.9822998046875, -2.4180526733398438, -0.9952316284179688, 0.36882781982421875, 13.417774200439453, 2.072193145751953, 11.172203063964844, -1.7816162109375, 10.373130798339844, 1.2000179290771484, 2.8034896850585938, 0.8423252105712891, 14.965354919433594, 2.9129180908203125, -3.4020919799804688, 1.0162353515625, -8.150690078735352, -0.7640190124511719, -1.012136459350586, 2.829242706298828, 11.674907684326172, 8.189056396484375, 12.256582260131836, -1.7745513916015625, 7.142108917236328, 4.9084930419921875, 5.754554748535156, 6.902259826660156, 9.062145233154297, 3.5087223052978516, 3.936185836791992, 1.5777587890625, 5.1150970458984375, 9.064353942871094, -2.1956729888916016, 9.958751678466797, -0.6133880615234375, 7.3895263671875, 12.044303894042969, -0.4946441650390625, 14.95039176940918, 13.552505493164062, 4.1758575439453125, -2.388345718383789, 19.460769653320312, 1.6821022033691406, -1.1556644439697266, 15.166641235351562, 15.0916748046875, 3.95233154296875, 2.4222145080566406, 2.0816268920898438, 19.391122817993164, -0.6224746704101562, 12.818592071533203, 5.049531936645508, 0.3347320556640625, 0.896392822265625, 2.8859519958496094, 7.007875442504883, 2.435628890991211, -0.8961639404296875, 12.924560546875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000342.npy"}
{"epoch": 0.5170068027210885, "step": 343, "batch_size": 64, "mean": 2.679318904876709, "std": 5.305469036102295, "min": -10.735572814941406, "p10": -3.6123918533325194, "median": 2.5159034729003906, "p90": 8.71628646850586, "max": 15.71337890625, "pos_frac": 0.765625, "sample": [5.1344451904296875, 0.3590850830078125, 0.3074531555175781, 0.0645599365234375, -1.0394515991210938, 9.08759880065918, 4.0447540283203125, -1.2870941162109375, 0.4570770263671875, -0.588226318359375, -7.126384735107422, -0.55780029296875, 3.8857574462890625, 7.332206726074219, 11.343048095703125, -0.5616722106933594, 6.487144470214844, 15.71337890625, 13.7376708984375, 6.739006042480469, 2.4857177734375, 4.843631744384766, 5.045440673828125, 4.227516174316406, 0.7944793701171875, 4.64491081237793, 2.8697948455810547, -5.780200958251953, 6.615383148193359, 0.026123046875, -7.9523162841796875, 0.919525146484375, 5.5306854248046875, 0.9519309997558594, 0.140716552734375, 1.5489463806152344, 7.387470245361328, 5.112739562988281, -10.735572814941406, 12.900894165039062, -1.777374267578125, 1.4797286987304688, 6.183624267578125, 8.3592529296875, 3.873992919921875, 2.10723876953125, 13.396682739257812, 3.78216552734375, 0.04228973388671875, 7.401969909667969, 8.869300842285156, -1.9116134643554688, 3.5890426635742188, 6.857391357421875, 2.5460891723632812, 6.287689208984375, -3.7083740234375, 0.4221649169921875, 4.2273406982421875, 0.0306396484375, -7.912445068359375, -6.3057861328125, -3.3884334564208984, 1.9134578704833984], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000343.npy"}
{"epoch": 0.5185185185185185, "step": 344, "batch_size": 64, "mean": 4.045633316040039, "std": 5.412993431091309, "min": -8.822263717651367, "p10": -2.8427745819091794, "median": 3.7505102157592773, "p90": 10.85270233154297, "max": 16.580657958984375, "pos_frac": 0.796875, "sample": [-3.6491546630859375, 8.149999618530273, 3.7899398803710938, 3.0418148040771484, 11.290916442871094, 0.1507110595703125, 1.3906288146972656, 5.448883056640625, 3.4825286865234375, 9.003021240234375, 1.6007232666015625, 0.6679534912109375, 3.711080551147461, 8.588634490966797, 3.1897239685058594, 8.33498764038086, 4.5505828857421875, 16.580657958984375, 4.9012908935546875, 9.893539428710938, -2.7126808166503906, 1.3759002685546875, -6.59796142578125, 11.677719116210938, 0.1882781982421875, -2.2382240295410156, -2.898529052734375, 1.1541767120361328, 11.086105346679688, 8.32614517211914, -1.493927001953125, 14.629722595214844, 0.9301986694335938, 2.3627357482910156, 4.382442474365234, 10.726573944091797, 10.906757354736328, 2.9438247680664062, 2.906869888305664, -6.734508514404297, 1.3792076110839844, 1.6574897766113281, -0.4740486145019531, -1.083963394165039, 1.658538818359375, -3.3087100982666016, 13.803001403808594, 9.715538024902344, 4.676567077636719, 5.9308013916015625, 7.225767135620117, 4.520931243896484, 10.269588470458984, 3.9721717834472656, 7.595710754394531, 0.3854503631591797, -4.700408935546875, 9.424810409545898, -0.230377197265625, 4.8599853515625, -8.822263717651367, 9.982841491699219, 7.603965759277344, 7.8378448486328125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000344.npy"}
{"epoch": 0.5200302343159486, "step": 345, "batch_size": 64, "mean": 3.8822784423828125, "std": 5.636628150939941, "min": -8.643014907836914, "p10": -2.807233810424804, "median": 3.947115898132324, "p90": 10.814275360107423, "max": 19.856338500976562, "pos_frac": 0.75, "sample": [2.8641815185546875, 1.2383537292480469, 17.16509246826172, -7.235939025878906, 1.9704055786132812, -0.7354888916015625, 1.2470855712890625, -4.167236328125, 1.8660507202148438, 1.6684207916259766, 5.325969696044922, 8.581584930419922, 8.639841079711914, 5.7529754638671875, -3.5099830627441406, 5.238525390625, -3.9914703369140625, 4.111572265625, 1.945526123046875, 9.539405822753906, 19.856338500976562, 1.1750526428222656, 5.5815582275390625, 6.205776214599609, 7.95606803894043, 0.0637359619140625, -1.1344680786132812, 3.9211254119873047, 1.6469268798828125, 5.092735290527344, 6.577301025390625, 12.592071533203125, 6.257175445556641, 5.39177131652832, 7.099456787109375, -5.421476364135742, 14.829721450805664, 11.149948120117188, 9.987911224365234, 0.1576995849609375, 11.040847778320312, -1.393463134765625, -3.284320831298828, 13.441070556640625, 6.2882080078125, -1.69403076171875, 0.0219573974609375, -1.1565170288085938, 3.9731063842773438, 1.0737838745117188, 7.633491516113281, 1.9925537109375, -1.223876953125, 5.5421600341796875, -0.11229705810546875, 8.84503173828125, -0.7967529296875, -8.643014907836914, 0.8763313293457031, -0.24842071533203125, 7.783334732055664, 4.01385498046875, 7.70587158203125, 10.285606384277344], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000345.npy"}
{"epoch": 0.5215419501133787, "step": 346, "batch_size": 64, "mean": 4.847224235534668, "std": 5.933534145355225, "min": -9.767402648925781, "p10": -2.1145402908325193, "median": 3.492156982421875, "p90": 13.780052375793458, "max": 16.627487182617188, "pos_frac": 0.8125, "sample": [11.608022689819336, 13.967290878295898, 7.878997802734375, 10.842948913574219, 15.666023254394531, 5.2314605712890625, 0.37459564208984375, 7.036888122558594, 15.152410507202148, 10.953907012939453, 14.475128173828125, 2.87286376953125, 4.00933837890625, 11.331605911254883, -0.8165473937988281, 16.627487182617188, -2.9396820068359375, 1.72760009765625, -1.9962158203125, 2.613523483276367, 5.353771209716797, 5.357421875, 4.015289306640625, 2.770383834838867, 6.377847671508789, 15.238189697265625, -9.767402648925781, 4.71966552734375, 1.8787269592285156, -7.4254608154296875, 2.431640625, -3.8279647827148438, 7.392490386962891, -2.165250778198242, 1.8252830505371094, 2.066650390625, 2.0919876098632812, -2.6659164428710938, 13.238317489624023, 11.381301879882812, 11.374298095703125, 12.315423965454102, 1.2044219970703125, 8.230270385742188, 2.2883872985839844, 2.8292465209960938, 14.047086715698242, 5.936553955078125, -1.038064956665039, 7.1995391845703125, 3.7975997924804688, 0.25548553466796875, 2.928659439086914, 0.18058013916015625, -0.3800697326660156, 1.3104019165039062, 3.062572479248047, 5.3128814697265625, -2.4539642333984375, 13.343162536621094, 0.22148895263671875, -0.4074745178222656, 3.1867141723632812, 8.572555541992188], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000346.npy"}
{"epoch": 0.5230536659108088, "step": 347, "batch_size": 64, "mean": 3.6475653648376465, "std": 5.844892501831055, "min": -8.683685302734375, "p10": -3.250778198242187, "median": 2.9671640396118164, "p90": 11.599357986450197, "max": 21.180526733398438, "pos_frac": 0.71875, "sample": [2.7089767456054688, 2.7481689453125, 6.270271301269531, 5.5220184326171875, -8.683685302734375, 1.0146770477294922, 4.5403594970703125, 6.039995193481445, 4.698953628540039, 11.182758331298828, 0.57330322265625, 12.137283325195312, 1.8818206787109375, -5.169944763183594, -5.509040832519531, 18.195518493652344, 3.31768798828125, 5.466754913330078, -4.3409423828125, 3.577648162841797, 1.447998046875, 2.290494918823242, 5.810028076171875, 1.4672470092773438, -4.077861785888672, -1.1058273315429688, 8.8201904296875, 10.31022834777832, 12.25677490234375, -3.4735031127929688, -0.3411598205566406, -0.7343692779541016, -2.235687255859375, 4.287528991699219, -2.7310867309570312, 2.9454383850097656, -2.6911659240722656, 5.2529296875, 3.25067138671875, 1.575775146484375, 5.97187614440918, -0.9186935424804688, 0.1469268798828125, 7.124504089355469, 4.643760681152344, 14.575088500976562, -0.7158374786376953, -4.157054901123047, 21.180526733398438, 15.8096923828125, 11.777900695800781, 7.461009979248047, 6.18109130859375, 4.5075531005859375, -1.8065834045410156, 7.953556060791016, 2.988889694213867, 2.5850582122802734, 1.9546928405761719, -0.6004924774169922, -2.3564376831054688, 1.7373580932617188, 10.170257568359375, 8.732301712036133], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000347.npy"}
{"epoch": 0.5245653817082389, "step": 348, "batch_size": 64, "mean": 5.337552070617676, "std": 7.645017623901367, "min": -5.32086181640625, "p10": -2.4541069030761715, "median": 3.0003509521484375, "p90": 15.914426803588869, "max": 30.530834197998047, "pos_frac": 0.703125, "sample": [15.421157836914062, 12.82071304321289, 1.2480926513671875, -0.15462112426757812, 13.586563110351562, 4.2289581298828125, 3.7105712890625, -2.2705459594726562, -5.32086181640625, -0.8693618774414062, 19.184410095214844, 1.7714996337890625, 5.1629638671875, 7.941102981567383, 14.785789489746094, 11.411859512329102, 2.992328643798828, 19.28498077392578, 3.008373260498047, 16.12582778930664, 0.45789337158203125, 9.80826187133789, 11.489032745361328, 3.3562965393066406, 3.62420654296875, 9.830293655395508, 0.6723480224609375, 11.370986938476562, -2.53277587890625, -2.8381195068359375, 7.2792816162109375, -0.29137420654296875, 2.1833133697509766, 22.458160400390625, 1.8021392822265625, -3.1178340911865234, 4.6673583984375, -2.0234813690185547, 0.3996753692626953, 2.5763721466064453, 17.222503662109375, 30.530834197998047, 2.3126068115234375, -1.9233779907226562, 10.686965942382812, -0.4446697235107422, -2.7386131286621094, -3.187793731689453, -0.5384521484375, 5.993610382080078, 0.15938186645507812, 24.204957962036133, -3.69091796875, 7.013437271118164, 10.514884948730469, 1.04644775390625, 7.393293380737305, 1.61932373046875, -1.9923858642578125, -0.2082500457763672, -1.0774040222167969, 3.69830322265625, 10.275474548339844, -0.5086860656738281], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000348.npy"}
{"epoch": 0.5260770975056689, "step": 349, "batch_size": 64, "mean": 4.532926559448242, "std": 5.7730607986450195, "min": -12.675148010253906, "p10": -1.7047353744506835, "median": 4.013546943664551, "p90": 13.1572416305542, "max": 17.898597717285156, "pos_frac": 0.75, "sample": [-0.5467796325683594, 6.548065185546875, 8.803840637207031, 3.9517669677734375, 3.2653732299804688, 4.075326919555664, 6.26812744140625, 8.559810638427734, 2.668611526489258, 5.320518493652344, -1.6382999420166016, 1.0949535369873047, 7.869113922119141, 3.1199684143066406, 5.102058410644531, 6.785303115844727, 12.861124038696289, -5.615257263183594, -12.675148010253906, 10.541093826293945, 4.8018646240234375, 13.325325012207031, 1.6253280639648438, 2.6370468139648438, 9.01217269897461, 1.7304916381835938, -1.7332077026367188, 7.2853851318359375, 1.55731201171875, 6.013885498046875, 9.027435302734375, 8.379070281982422, -2.9930572509765625, 13.855321884155273, -0.3905792236328125, 11.416648864746094, 2.3003597259521484, 17.268936157226562, -2.73211669921875, 3.1892757415771484, 13.290153503417969, -1.398345947265625, 2.21490478515625, 3.0252838134765625, 6.16688346862793, 13.284149169921875, 4.922210693359375, 1.3810043334960938, -1.6061248779296875, -1.8397789001464844, 8.773063659667969, 5.2984466552734375, -1.3586769104003906, 1.0789871215820312, 3.614889144897461, 17.898597717285156, -1.4113006591796875, -1.1099777221679688, -2.4324417114257812, 12.170326232910156, -0.5149974822998047, 5.780494689941406, 6.239356994628906, 14.7037353515625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000349.npy"}
{"epoch": 0.527588813303099, "step": 350, "batch_size": 64, "mean": 5.352407455444336, "std": 6.2919182777404785, "min": -8.799337387084961, "p10": -1.749296760559082, "median": 4.431883811950684, "p90": 14.138033294677737, "max": 19.468456268310547, "pos_frac": 0.8125, "sample": [16.095054626464844, 3.6981048583984375, 6.204704284667969, -1.6300830841064453, 5.829322814941406, 2.8220767974853516, 13.361495971679688, 15.450927734375, 4.242654800415039, 7.056894302368164, 0.970611572265625, -2.1837615966796875, -0.38623046875, 7.994659423828125, 3.5002880096435547, 7.801006317138672, 11.135940551757812, 1.6435089111328125, -1.5080547332763672, -2.1911697387695312, 7.469146728515625, 10.139419555664062, -7.136924743652344, 2.96380615234375, 3.090129852294922, -1.3322906494140625, -8.799337387084961, 10.06036376953125, 1.188943862915039, -0.7502555847167969, 9.698646545410156, -4.234375, 10.293609619140625, 1.3897552490234375, 10.964584350585938, 14.318916320800781, 5.793266296386719, 10.669876098632812, 13.715972900390625, 15.909629821777344, 4.90648078918457, 18.81847381591797, 19.468456268310547, 1.0550994873046875, 1.9312477111816406, 2.895040512084961, 10.151130676269531, 1.0961780548095703, -1.8003883361816406, 10.928237915039062, 0.1591949462890625, 6.404933929443359, 0.8225555419921875, 4.621112823486328, 0.5580062866210938, -3.3622589111328125, 9.81640625, 1.9260940551757812, 3.2190628051757812, 1.0009307861328125, 17.430023193359375, 6.489574432373047, 8.538433074951172, 10.15924072265625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000350.npy"}
{"epoch": 0.5291005291005291, "step": 351, "batch_size": 64, "mean": 4.2928667068481445, "std": 5.940453052520752, "min": -13.914785385131836, "p10": -2.9028701782226562, "median": 4.84968376159668, "p90": 11.504835891723635, "max": 14.753890991210938, "pos_frac": 0.765625, "sample": [1.3407306671142578, 6.358085632324219, 13.540653228759766, 6.605567932128906, -0.5309791564941406, 9.37216567993164, -1.567352294921875, 5.712882995605469, 11.742042541503906, 3.2325897216796875, 4.9854736328125, 12.319717407226562, 2.86981201171875, 3.165130615234375, 4.681097030639648, -2.7375259399414062, 1.3536529541015625, -0.39687347412109375, 10.88076400756836, 3.7874107360839844, 5.718208312988281, -3.6616287231445312, 10.98733139038086, 4.383392333984375, 10.264785766601562, 8.843429565429688, 7.6632537841796875, 6.537300109863281, -10.959014892578125, 5.435394287109375, 10.86362075805664, -3.886829376220703, 10.025001525878906, -0.30077362060546875, 6.4628448486328125, -5.7295989990234375, 1.1387863159179688, 11.72662353515625, 5.778160095214844, 4.713893890380859, 8.949920654296875, 12.401603698730469, 4.5049591064453125, -2.3745498657226562, 9.688812255859375, 3.9524765014648438, 6.831401824951172, 6.3295440673828125, 0.228851318359375, 5.766487121582031, -2.9737319946289062, -0.20037460327148438, 8.363161087036133, 6.908233642578125, 12.185258865356445, 3.452056884765625, 4.709980010986328, 14.753890991210938, -13.914785385131836, 4.187501907348633, -11.27886962890625, 4.440521240234375, 6.256988525390625, -1.14508056640625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000351.npy"}
{"epoch": 0.5306122448979592, "step": 352, "batch_size": 64, "mean": 3.96950101852417, "std": 5.388165473937988, "min": -6.6256256103515625, "p10": -1.8618366241455073, "median": 2.469066619873047, "p90": 11.027827453613282, "max": 18.910995483398438, "pos_frac": 0.8125, "sample": [8.4461669921875, 18.910995483398438, 18.887950897216797, 1.5662918090820312, 8.849929809570312, -6.6256256103515625, -2.7610950469970703, 2.1313915252685547, 1.9396934509277344, 4.455905914306641, 5.272653579711914, 8.238662719726562, -1.417877197265625, -0.4830589294433594, 5.29351806640625, 1.3000946044921875, -3.3292236328125, 0.401824951171875, -2.2809600830078125, 2.3009567260742188, 10.23162841796875, 3.7749557495117188, 6.56781005859375, 12.114330291748047, 12.145376205444336, 6.1918487548828125, 4.15997314453125, -4.357969284057617, 0.6780853271484375, 0.6016521453857422, 1.0356273651123047, 1.4974746704101562, 0.59991455078125, 2.7458229064941406, 1.2793350219726562, 4.470432281494141, 1.7913970947265625, 8.406219482421875, 10.650077819824219, 4.789787292480469, 7.5028839111328125, 2.637176513671875, 16.486968994140625, 0.6164875030517578, 2.123973846435547, 10.357080459594727, -1.0036392211914062, 0.5791816711425781, -0.9918994903564453, 6.959110260009766, 1.522705078125, 1.82147216796875, 4.4281463623046875, 14.036293029785156, 7.384666442871094, 1.32806396484375, -2.052104949951172, 11.189720153808594, -5.061611175537109, -0.0048828125, 0.2314929962158203, 5.896379470825195, 3.1263866424560547, 4.462030410766602], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000352.npy"}
{"epoch": 0.5321239606953893, "step": 353, "batch_size": 64, "mean": 4.291474342346191, "std": 6.048905849456787, "min": -16.209678649902344, "p10": -1.3262577056884766, "median": 3.063981056213379, "p90": 12.882649612426759, "max": 19.908157348632812, "pos_frac": 0.796875, "sample": [3.0505924224853516, 9.655807495117188, -6.050518035888672, 5.149635314941406, -3.857664108276367, 2.364917755126953, 2.7790908813476562, -4.421083450317383, 14.720550537109375, 3.8136444091796875, 0.23770523071289062, 6.246925354003906, 7.88764762878418, 1.64556884765625, 13.198478698730469, -0.8485927581787109, 7.9488525390625, 6.8843231201171875, 9.779876708984375, 11.446945190429688, 12.117080688476562, 15.083038330078125, 0.5416088104248047, 8.554449081420898, 0.21332550048828125, -1.2653083801269531, 15.307159423828125, 5.662290573120117, 12.955101013183594, 3.2963333129882812, 5.812461853027344, 2.3911666870117188, 5.290184020996094, -0.5822792053222656, -16.209678649902344, 9.470613479614258, -3.165538787841797, -1.218221664428711, 7.902061462402344, 12.71359634399414, 3.0773696899414062, 13.44818115234375, 19.908157348632812, 3.0360870361328125, 3.1064453125, 2.9383506774902344, 2.315582275390625, 2.5786170959472656, 0.9105739593505859, 2.6558303833007812, 4.092744827270508, 1.3947868347167969, -0.9140777587890625, 1.8232345581054688, 0.49209022521972656, 1.9942092895507812, 1.7124748229980469, 3.5523223876953125, -2.2214622497558594, -0.49262237548828125, 5.7281494140625, -1.3523788452148438, 12.648235321044922, 5.719291687011719], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000353.npy"}
{"epoch": 0.5336356764928194, "step": 354, "batch_size": 64, "mean": 4.799306392669678, "std": 6.015079021453857, "min": -9.342763900756836, "p10": -3.3225318908691404, "median": 4.403608322143555, "p90": 12.872228050231934, "max": 17.896984100341797, "pos_frac": 0.765625, "sample": [15.544525146484375, 8.467828750610352, -4.818328857421875, 7.89111328125, -3.5081710815429688, 2.110523223876953, 6.571281433105469, 6.092369079589844, 9.882293701171875, -0.08808517456054688, 8.000486373901367, 12.658803939819336, 6.164427757263184, 0.9003868103027344, 0.5620040893554688, 0.8280525207519531, 9.840438842773438, -2.1849098205566406, 13.675626754760742, 17.896984100341797, 11.485246658325195, 8.302497863769531, 2.4426116943359375, 5.584739685058594, -9.342763900756836, -0.2325592041015625, -4.5671234130859375, 11.490890502929688, 4.293487548828125, 4.218925476074219, 6.629432678222656, 12.975040435791016, 11.563804626464844, 8.834320068359375, 1.197723388671875, 0.14898300170898438, 4.513729095458984, -4.840551376342773, 12.963695526123047, -0.200439453125, 5.119926452636719, 13.977798461914062, -4.726715087890625, -3.666259765625, 12.469619750976562, -1.5984878540039062, 7.8383331298828125, 15.094825744628906, 3.0175724029541016, 1.4129257202148438, 3.571624755859375, 3.8267860412597656, 9.716398239135742, 12.220813751220703, 6.025379180908203, 3.5092849731445312, 2.6302261352539062, 1.6029586791992188, -1.8197097778320312, 5.746990203857422, 2.6562042236328125, 8.55959701538086, -2.889373779296875, -1.0904312133789062], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000354.npy"}
{"epoch": 0.5351473922902494, "step": 355, "batch_size": 64, "mean": 4.645839691162109, "std": 6.196963310241699, "min": -9.051473617553711, "p10": -2.7883930206298824, "median": 5.021683692932129, "p90": 13.128558349609376, "max": 18.949913024902344, "pos_frac": 0.734375, "sample": [6.111064910888672, 18.949913024902344, 0.7104606628417969, 5.68756103515625, 10.0782470703125, 6.188030242919922, 3.4568328857421875, -4.198856353759766, -2.2476348876953125, 0.5985565185546875, 7.350902557373047, -0.4984893798828125, 6.584251403808594, -5.2538604736328125, 6.357154846191406, 6.664817810058594, -2.8983078002929688, 5.6494293212890625, -5.780757904052734, 4.184379577636719, -6.778038024902344, 4.354679107666016, 9.541046142578125, 12.981147766113281, 10.27899169921875, 3.3045711517333984, 4.594980239868164, 6.65361213684082, 11.485029220581055, 14.949310302734375, -2.471221923828125, 2.9646377563476562, -0.8337078094482422, 13.563385009765625, 2.7126312255859375, -5.3167572021484375, 6.782958984375, -1.7308006286621094, 4.402980804443359, 9.576242446899414, 5.448387145996094, -0.4912567138671875, 15.109428405761719, -0.9822616577148438, 3.7054996490478516, 13.191734313964844, 8.8701171875, 1.1597862243652344, 15.616966247558594, -9.051473617553711, 12.183963775634766, 8.53460693359375, 9.762542724609375, -1.8599128723144531, 14.603569030761719, -1.4086284637451172, -2.5319252014160156, 0.8820877075195312, 11.913284301757812, 1.3893585205078125, 7.379730224609375, 3.136137008666992, 5.586509704589844, 6.476127624511719], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000355.npy"}
{"epoch": 0.5366591080876795, "step": 356, "batch_size": 64, "mean": 3.84525203704834, "std": 7.082036972045898, "min": -13.89889144897461, "p10": -5.045273971557617, "median": 3.3675289154052734, "p90": 11.871570587158203, "max": 20.205490112304688, "pos_frac": 0.78125, "sample": [15.610815048217773, 11.047147750854492, 2.767852783203125, -8.683219909667969, 11.888084411621094, 2.294158935546875, 1.5397720336914062, 9.660110473632812, 4.994663238525391, 0.7223472595214844, -8.634588241577148, -11.748054504394531, 0.5691680908203125, -4.732139587402344, 11.239925384521484, 15.392387390136719, 3.3738937377929688, 2.1584930419921875, 14.251916885375977, 0.4367656707763672, 9.597320556640625, -13.89889144897461, 8.0091552734375, 2.9091873168945312, 4.110511779785156, 11.674339294433594, -5.151691436767578, -4.796966552734375, 4.743022918701172, 4.4845123291015625, 2.1200294494628906, 2.0833816528320312, -2.075838088989258, 0.0398406982421875, -7.710472106933594, 7.4997406005859375, 8.332473754882812, -0.8094749450683594, 5.782642364501953, 10.879669189453125, 3.5726985931396484, 1.4753265380859375, 13.183731079101562, -1.1726837158203125, 4.823692321777344, 3.361164093017578, 11.53683090209961, 0.87103271484375, -1.6281051635742188, 5.1530609130859375, 0.5722999572753906, 6.494169235229492, 15.512765884399414, 10.864791870117188, 0.50970458984375, 5.967435836791992, 2.9285945892333984, 0.24160003662109375, 11.721206665039062, -2.1091766357421875, -9.342296600341797, 20.205490112304688, 7.547760009765625, 11.833038330078125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000356.npy"}
{"epoch": 0.5381708238851096, "step": 357, "batch_size": 64, "mean": 3.672792911529541, "std": 6.1011810302734375, "min": -11.144073486328125, "p10": -3.616480636596679, "median": 2.7050399780273438, "p90": 11.452881813049318, "max": 16.521854400634766, "pos_frac": 0.78125, "sample": [8.675888061523438, -0.4749565124511719, 10.397163391113281, 0.5372772216796875, -11.144073486328125, 16.42656707763672, -0.483123779296875, 3.987316131591797, 2.8133544921875, 1.7258529663085938, -4.3509521484375, 13.106101989746094, -1.6025619506835938, 4.062618255615234, 16.521854400634766, 4.089630126953125, 11.664754867553711, 2.4334869384765625, 9.748586654663086, 2.0827903747558594, 5.738006591796875, 2.7613258361816406, 2.581707000732422, 2.930727005004883, 1.6702804565429688, 5.5095062255859375, 0.7376232147216797, -0.14334487915039062, -3.94171142578125, 4.490177154541016, -5.904212951660156, 1.2154178619384766, -2.8576087951660156, 4.1656951904296875, 6.946712493896484, 15.35137939453125, 9.450325012207031, 10.164840698242188, 10.446624755859375, 14.053537368774414, 0.5193405151367188, 5.296043395996094, 1.5530471801757812, 1.8434524536132812, 10.18511962890625, 3.0650787353515625, 0.7988700866699219, -1.5659332275390625, -7.703670501708984, 2.4460296630859375, 1.9767799377441406, 7.20819091796875, 9.012092590332031, 2.648754119873047, 10.958511352539062, 0.32159423828125, 1.104654312133789, 5.325061798095703, 12.53873062133789, -6.711284637451172, -2.4434967041015625, -10.268959045410156, 10.2567138671875, 1.1094532012939453], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000357.npy"}
{"epoch": 0.5396825396825397, "step": 358, "batch_size": 64, "mean": 5.18164587020874, "std": 5.983022689819336, "min": -6.616199493408203, "p10": -2.4525144577026365, "median": 5.157594680786133, "p90": 12.43187942504883, "max": 19.140655517578125, "pos_frac": 0.78125, "sample": [9.886993408203125, 6.3059844970703125, 9.750911712646484, 11.616867065429688, -2.1867847442626953, 11.015983581542969, 9.346672058105469, -2.5663986206054688, 2.0569820404052734, 4.005218505859375, -3.0183792114257812, 7.0409393310546875, 2.6322250366210938, 7.2189178466796875, 12.686943054199219, 1.6995697021484375, -5.2182464599609375, 8.274703979492188, 11.158172607421875, 5.179531097412109, 9.8984375, 1.0446014404296875, -0.41510009765625, 10.790006637573242, 10.401086807250977, -0.70361328125, 2.7043914794921875, -0.6450119018554688, -0.98516845703125, 1.8166465759277344, -4.858314514160156, 6.380527496337891, -0.956390380859375, 0.8991165161132812, 0.29099273681640625, 8.6397705078125, 7.358680725097656, 8.789955139160156, 2.4383468627929688, -0.5455493927001953, 14.70001220703125, 11.83673095703125, 19.140655517578125, 1.3415031433105469, 16.10308074951172, 18.130868911743164, 6.559642791748047, -2.855518341064453, 2.6401519775390625, 9.188873291015625, 1.4781494140625, -4.909551620483398, -6.616199493408203, 12.90003776550293, 2.2980499267578125, 6.831993103027344, 3.3375587463378906, 3.6083831787109375, 1.5548095703125, 7.655609130859375, 5.359945297241211, 11.191810607910156, 5.135658264160156, 15.782859802246094], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000358.npy"}
{"epoch": 0.5411942554799698, "step": 359, "batch_size": 64, "mean": 3.1616458892822266, "std": 5.8025360107421875, "min": -8.95494270324707, "p10": -3.636865615844726, "median": 3.0886287689208984, "p90": 10.48596820831299, "max": 16.815841674804688, "pos_frac": 0.671875, "sample": [-2.09002685546875, 10.05877685546875, -2.573474884033203, 2.1635589599609375, 4.834972381591797, -0.028743743896484375, -3.2872543334960938, 2.360170364379883, -3.3760719299316406, 12.900115966796875, 16.815841674804688, 2.778322219848633, 0.6148529052734375, 6.891090393066406, -2.036924362182617, 5.9246063232421875, 8.702760696411133, -4.850151062011719, 4.1145477294921875, -8.95494270324707, 1.0498085021972656, 3.7977638244628906, 1.5446624755859375, -3.8646392822265625, 2.2534561157226562, 9.676734924316406, 4.2585601806640625, 4.343975067138672, 4.4365234375, -8.895801544189453, 12.549163818359375, 16.171783447265625, -4.537567138671875, 10.004585266113281, 1.134185791015625, 3.0562210083007812, 5.417320251464844, 5.049411773681641, 1.7526664733886719, 5.369728088378906, -3.7486343383789062, 7.934534072875977, 10.669050216674805, 7.957651138305664, 3.3613052368164062, -3.0831298828125, -2.1963119506835938, -7.2456207275390625, -1.6122169494628906, 4.524139404296875, 3.1210365295410156, -2.353404998779297, -0.0291595458984375, -0.7489986419677734, 7.709327697753906, -0.9092445373535156, 8.833982467651367, 3.700204849243164, 4.6313323974609375, 13.020278930664062, 6.429908752441406, 14.891181945800781, -0.15189743041992188, 2.1094512939453125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000359.npy"}
{"epoch": 0.5427059712773998, "step": 360, "batch_size": 64, "mean": 4.204775810241699, "std": 6.116269588470459, "min": -7.050201416015625, "p10": -2.4650527954101555, "median": 3.089527130126953, "p90": 12.508595275878907, "max": 22.39361572265625, "pos_frac": 0.75, "sample": [1.936065673828125, 20.130401611328125, -5.4054412841796875, 0.4414215087890625, 3.658184051513672, 9.564231872558594, -2.7154083251953125, 1.5732154846191406, -0.09667587280273438, 4.277008056640625, -1.3674545288085938, -1.1003494262695312, 1.2461318969726562, -3.5521278381347656, 10.62716293334961, -3.86181640625, 13.755523681640625, -3.0019989013671875, 4.242807388305664, 7.9716339111328125, 3.238525390625, 1.9115524291992188, -7.050201416015625, -1.2310791015625, 1.4735870361328125, 1.9557647705078125, 8.895881652832031, 4.159183502197266, 6.175300598144531, 4.331207275390625, 4.291435241699219, 2.9405288696289062, 4.767887115478516, -1.880889892578125, 2.0126113891601562, 4.4939117431640625, -4.682807922363281, 5.878574371337891, 10.389862060546875, 2.2141799926757812, 22.39361572265625, 1.7329864501953125, 8.135753631591797, -1.8065834045410156, 15.331939697265625, 5.642189025878906, 12.220535278320312, 3.7537612915039062, 2.937948226928711, 19.37552261352539, 12.632049560546875, 7.905120849609375, 13.891040802001953, -0.09509658813476562, -0.485137939453125, 5.984371185302734, 0.6894607543945312, -1.4448165893554688, 1.7410087585449219, 11.685924530029297, 2.32470703125, 0.04753875732421875, 5.939399719238281, 5.964874267578125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000360.npy"}
{"epoch": 0.54421768707483, "step": 361, "batch_size": 64, "mean": 4.77806282043457, "std": 6.16286563873291, "min": -11.632827758789062, "p10": -2.402326965332031, "median": 4.249375343322754, "p90": 10.843788528442385, "max": 23.90351104736328, "pos_frac": 0.796875, "sample": [3.9111862182617188, 8.354087829589844, 0.48298072814941406, -6.26300048828125, 14.532356262207031, 3.0540008544921875, -2.8300514221191406, -4.536872863769531, 2.9219627380371094, 4.610958099365234, 1.9698638916015625, -6.026561737060547, -1.1874847412109375, 9.9866943359375, 9.524063110351562, 6.787519454956055, -2.5236892700195312, -0.005878448486328125, 1.6735000610351562, 2.46893310546875, 3.712127685546875, 8.003440856933594, 0.23090362548828125, 13.575332641601562, 8.41537857055664, 5.34619140625, 7.166141510009766, -11.632827758789062, 13.743972778320312, 4.131336212158203, 7.480617523193359, 1.0659561157226562, 8.82138442993164, 23.90351104736328, 10.163284301757812, 9.023162841796875, 8.818035125732422, 6.602823257446289, -0.1796875, -2.1191482543945312, -0.278778076171875, 0.2809333801269531, 11.914993286132812, 8.923686981201172, 9.342113494873047, 9.91256332397461, 4.857372283935547, 2.1217041015625, -3.5812149047851562, 8.954093933105469, 8.602653503417969, 3.019256591796875, 5.90936279296875, 2.315624237060547, 22.698341369628906, 4.23430061340332, 11.135433197021484, 4.2644500732421875, -0.0218048095703125, 7.463478088378906, 2.4805374145507812, 5.662975311279297, 1.2703399658203125, 1.1370697021484375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000361.npy"}
{"epoch": 0.54572940287226, "step": 362, "batch_size": 64, "mean": 4.654008865356445, "std": 5.722803115844727, "min": -7.263999938964844, "p10": -1.8162418365478517, "median": 3.4537506103515625, "p90": 11.540377044677735, "max": 20.75616455078125, "pos_frac": 0.8125, "sample": [-1.8031120300292969, 4.125633239746094, 1.0078277587890625, 0.45217323303222656, 3.811511993408203, -7.263999938964844, 7.367393493652344, 0.1534423828125, 20.75616455078125, 8.470016479492188, 1.4884719848632812, -2.8733482360839844, 5.192161560058594, 2.2605438232421875, -0.3438835144042969, 0.3491668701171875, 3.6257858276367188, 10.153854370117188, 3.019664764404297, 8.94171142578125, -0.07670783996582031, 2.4371337890625, 2.4281673431396484, 1.2193603515625, 11.403877258300781, 3.2817153930664062, 11.125762939453125, 2.5941314697265625, -0.49318885803222656, 8.40689468383789, 14.90997314453125, 1.526376724243164, 1.0024833679199219, 7.342197418212891, 1.1826705932617188, 11.598876953125, 4.8604278564453125, 2.3785247802734375, 4.2170562744140625, -1.2400436401367188, 1.2813396453857422, 14.7642822265625, 10.854591369628906, 12.217979431152344, 7.089385986328125, 0.91693115234375, 9.75400161743164, 11.238616943359375, 8.145605087280273, 5.722320556640625, -5.092466354370117, 6.171478271484375, 7.178779602050781, -3.5417251586914062, 3.2758255004882812, 2.106586456298828, 13.219558715820312, -1.821868896484375, -3.3229026794433594, 8.165390014648438, 19.760719299316406, -3.429859161376953, 9.136726379394531, 5.0683746337890625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000362.npy"}
{"epoch": 0.54724111866969, "step": 363, "batch_size": 64, "mean": 4.79734992980957, "std": 6.033456325531006, "min": -9.046966552734375, "p10": -2.3449394226074216, "median": 4.184385299682617, "p90": 13.43543853759766, "max": 18.264739990234375, "pos_frac": 0.796875, "sample": [7.306983947753906, -6.240730285644531, 14.689247131347656, 4.318805694580078, 3.9129638671875, 3.4148788452148438, 3.0207061767578125, 17.446578979492188, 3.1450042724609375, -2.3981704711914062, -1.87286376953125, 10.0205078125, 18.264739990234375, 8.9273681640625, 1.3846664428710938, 17.943984985351562, 2.592632293701172, 1.8637218475341797, 4.818367004394531, 7.621330261230469, 0.7695655822753906, 5.871282577514648, 6.391653060913086, 15.886459350585938, 11.984909057617188, 4.703744888305664, 1.108367919921875, 2.8553085327148438, -2.220733642578125, 1.818399429321289, -3.4323768615722656, -9.046966552734375, 4.961570739746094, -3.7917861938476562, 11.350940704345703, 8.559568405151367, 1.2257671356201172, 7.162742614746094, 3.8351192474365234, -0.2016887664794922, 4.581867218017578, 2.0388717651367188, 8.838821411132812, 11.250335693359375, 5.443611145019531, 13.774810791015625, 6.850128173828125, 3.9461669921875, 7.948873519897461, -0.777923583984375, 14.329263687133789, 2.010173797607422, 4.237703323364258, 8.126325607299805, -3.322019577026367, -8.442611694335938, 10.275558471679688, -0.752685546875, 12.643569946289062, 2.3063316345214844, -0.3244495391845703, 2.9772567749023438, 4.966773986816406, 4.131067276000977], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000363.npy"}
{"epoch": 0.5487528344671202, "step": 364, "batch_size": 64, "mean": 4.943852424621582, "std": 7.05634069442749, "min": -12.011550903320312, "p10": -2.798266983032226, "median": 4.908283233642578, "p90": 14.962668228149417, "max": 20.811843872070312, "pos_frac": 0.734375, "sample": [15.30767822265625, -0.3247642517089844, -0.5142593383789062, 18.78515625, 14.167106628417969, 2.0319366455078125, 6.782096862792969, 5.831878662109375, -2.224803924560547, 12.1455078125, 20.811843872070312, 5.894695281982422, -2.1785240173339844, -3.660472869873047, 12.368408203125, 17.145599365234375, 3.2879486083984375, 1.0973949432373047, 18.787857055664062, -1.9993667602539062, 12.780838012695312, 2.8773117065429688, 5.623321533203125, 9.571342468261719, -9.33648681640625, -5.7559967041015625, 16.575592041015625, -1.3297157287597656, 0.29433250427246094, 7.269309997558594, 9.07809066772461, 7.399627685546875, -0.6310958862304688, 3.677490234375, 3.449462890625, 0.5986099243164062, 9.734249114990234, 8.060638427734375, -0.7090225219726562, 1.2581157684326172, 5.609477996826172, 6.1907501220703125, -1.519866943359375, 5.4947967529296875, 9.792022705078125, 1.0943031311035156, 0.13175392150878906, 7.5867462158203125, 0.6949310302734375, 15.30362319946289, -3.200897216796875, 10.40595817565918, -8.167667388916016, 9.176101684570312, -12.011550903320312, -0.288848876953125, 8.886245727539062, 7.763458251953125, 1.6547966003417969, 4.321769714355469, 3.837207794189453, -3.044036865234375, 13.251625061035156, 9.414936065673828], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000364.npy"}
{"epoch": 0.5502645502645502, "step": 365, "batch_size": 64, "mean": 4.593672752380371, "std": 5.302447319030762, "min": -8.21017074584961, "p10": -1.3247785568237305, "median": 4.1720380783081055, "p90": 10.297148132324219, "max": 18.354415893554688, "pos_frac": 0.828125, "sample": [0.5543479919433594, 5.16423225402832, 3.1923675537109375, 3.971120834350586, 4.683725357055664, 3.1549205780029297, 9.482955932617188, 5.888095855712891, 11.03155517578125, 6.893524169921875, 5.8248748779296875, 2.950042724609375, -8.21017074584961, 4.655214309692383, 3.3959274291992188, 7.025276184082031, -1.2751712799072266, 4.604133605957031, 1.2628021240234375, 0.32534217834472656, 8.172698974609375, -1.346038818359375, 8.522621154785156, 2.0226917266845703, 0.8639984130859375, -6.872020721435547, 3.4123382568359375, 18.235130310058594, 3.4038772583007812, 10.071338653564453, 3.67034912109375, 3.7102394104003906, 6.796482086181641, 3.780994415283203, 12.477005004882812, 4.495594024658203, 10.30889892578125, -0.351531982421875, 0.7358589172363281, 6.015769958496094, 8.326089859008789, 8.107940673828125, 6.532920837402344, -1.65924072265625, 6.20989990234375, 18.354415893554688, 2.583322525024414, -2.827463150024414, 16.6312255859375, 3.2395858764648438, 10.269729614257812, -3.0359230041503906, 9.112167358398438, 4.372955322265625, 2.3487186431884766, 5.038299560546875, -1.0952262878417969, -4.692176818847656, 2.4382781982421875, 1.7067489624023438, 15.69186019897461, 8.338127136230469, -1.1979789733886719, 6.499359130859375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000365.npy"}
{"epoch": 0.5517762660619804, "step": 366, "batch_size": 64, "mean": 3.793876886367798, "std": 6.284309387207031, "min": -9.153839111328125, "p10": -4.15950584411621, "median": 2.967876434326172, "p90": 11.093863487243654, "max": 18.416030883789062, "pos_frac": 0.71875, "sample": [-0.7013263702392578, 9.656036376953125, 8.284164428710938, 2.000293731689453, -2.3035125732421875, 11.328149795532227, -6.472686767578125, -3.3241748809814453, 1.0721893310546875, -4.442741394042969, 0.5687179565429688, 8.086860656738281, 5.423057556152344, 7.452327728271484, 7.4499969482421875, 2.3862457275390625, -9.153839111328125, 6.2682342529296875, 2.307809829711914, 7.1622161865234375, 1.10113525390625, 3.454315185546875, 1.1500778198242188, -1.5764217376708984, -0.6156368255615234, 5.146013259887695, 6.223316192626953, 11.651273727416992, -6.39410400390625, 0.06800270080566406, 18.416030883789062, -2.4331283569335938, -3.4986228942871094, 10.547195434570312, -8.07481575012207, 1.906454086303711, 0.016326904296875, -5.604270935058594, 4.801641464233398, 3.4850006103515625, 1.4832210540771484, -1.1635189056396484, 1.253702163696289, 6.80828857421875, -0.5921745300292969, -0.5000381469726562, 1.2598705291748047, 10.391845703125, 16.167640686035156, 2.4814376831054688, 5.349250793457031, 10.116233825683594, 9.455886840820312, -7.091907501220703, 8.427892684936523, -0.29407310485839844, 18.235084533691406, 10.48272705078125, 12.951801300048828, 10.155281066894531, 12.667320251464844, 8.76865005493164, 9.007026672363281, 4.168874740600586], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000366.npy"}
{"epoch": 0.5532879818594104, "step": 367, "batch_size": 64, "mean": 3.646374464035034, "std": 4.941457748413086, "min": -11.041839599609375, "p10": -1.7573760986328124, "median": 3.7792816162109375, "p90": 9.4653076171875, "max": 14.541893005371094, "pos_frac": 0.765625, "sample": [1.7336616516113281, 13.485015869140625, 4.913272857666016, -0.4981651306152344, 6.448371887207031, 0.85736083984375, 12.89419174194336, 8.977363586425781, 3.6574020385742188, 1.0683536529541016, -0.721099853515625, 2.2761917114257812, 2.9991607666015625, 2.11328125, 14.541893005371094, 4.914974212646484, 5.807525634765625, 1.01177978515625, -2.048116683959961, 6.774986267089844, 5.171417236328125, 6.39276123046875, -0.8771114349365234, -4.30217170715332, 3.455535888671875, -11.041839599609375, -1.7791404724121094, 4.3673248291015625, 3.7989635467529297, 7.666900634765625, 8.78424072265625, 5.830072402954102, 2.143810272216797, 7.6548919677734375, 4.070468902587891, 2.5334033966064453, 6.2396392822265625, -1.7065925598144531, 3.7595996856689453, 10.167404174804688, 0.5397720336914062, -8.869590759277344, 1.8424854278564453, 13.682723999023438, 5.639198303222656, -2.3348388671875, 5.157649993896484, 2.575733184814453, 5.32476806640625, -0.10146713256835938, 4.08929443359375, 2.3666858673095703, -3.416900634765625, 5.228172302246094, 9.482025146484375, 9.426300048828125, -0.07995033264160156, -1.4844188690185547, 6.102500915527344, 11.767738342285156, -1.0986175537109375, 4.0888824462890625, 0.7778472900390625, 9.124982833862305], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000367.npy"}
{"epoch": 0.5547996976568406, "step": 368, "batch_size": 64, "mean": 3.478600025177002, "std": 5.661892890930176, "min": -7.4769287109375, "p10": -4.222705841064453, "median": 3.513758659362793, "p90": 10.635896492004395, "max": 14.630386352539062, "pos_frac": 0.703125, "sample": [-7.0155792236328125, 5.977447509765625, -3.7244720458984375, 1.520303726196289, 0.3647918701171875, 4.465934753417969, 10.496030807495117, 7.185417175292969, -0.1309814453125, -1.648651123046875, 9.387779235839844, 1.0426597595214844, 1.8489456176757812, 6.1481781005859375, 12.94232177734375, 2.51556396484375, -1.1525955200195312, -4.0398712158203125, -3.373016357421875, 9.43896484375, 2.7032546997070312, 5.809041976928711, 6.544425964355469, -4.301063537597656, 10.664453506469727, 3.4536170959472656, -4.497489929199219, 5.5735626220703125, 14.630386352539062, 9.563884735107422, -0.5341033935546875, 0.4643096923828125, 10.569263458251953, -0.7598609924316406, 1.3256645202636719, 6.774745941162109, 0.7036209106445312, 4.2508544921875, 5.443672180175781, 6.590675354003906, 1.6996994018554688, -0.10557174682617188, 13.802970886230469, 10.043739318847656, -5.620830535888672, -5.671882629394531, 4.841442108154297, -2.0911102294921875, 13.090660095214844, -5.943382263183594, 11.289375305175781, 11.952564239501953, 8.158843994140625, 7.7106475830078125, 1.9077930450439453, 6.393768310546875, 5.999931335449219, -3.197601318359375, -1.8547592163085938, 5.377471923828125, 9.881927490234375, 3.5739002227783203, -7.4769287109375, 1.6456298828125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000368.npy"}
{"epoch": 0.5563114134542706, "step": 369, "batch_size": 64, "mean": 3.5810842514038086, "std": 5.490725994110107, "min": -9.070676803588867, "p10": -3.1127668380737297, "median": 2.9716033935546875, "p90": 10.927528381347656, "max": 14.383516311645508, "pos_frac": 0.78125, "sample": [7.751487731933594, -6.7984161376953125, 4.79057502746582, 8.9320068359375, -9.070676803588867, 12.345779418945312, 9.822837829589844, 1.6846160888671875, 4.7071075439453125, 4.122196197509766, 2.985321044921875, 10.959869384765625, 8.775688171386719, 11.907150268554688, 9.899152755737305, 2.0374794006347656, 3.941192626953125, 2.9178009033203125, 4.082366943359375, 2.845996856689453, 2.6544952392578125, -7.953386306762695, 1.3237380981445312, -7.621000289916992, 2.6507568359375, 9.049739837646484, -1.170755386352539, 5.620355606079102, 6.507564544677734, -2.138326644897461, 12.417022705078125, 1.14117431640625, 3.4064197540283203, 9.487602233886719, 6.550800323486328, 11.11264419555664, 14.383516311645508, -0.47370147705078125, 0.872650146484375, -5.850067138671875, 2.9578857421875, 2.5972366333007812, -1.2115440368652344, 3.155820846557617, 0.6346549987792969, -0.44324493408203125, 10.852066040039062, 4.838584899902344, 8.1767578125, 0.17583465576171875, 2.3557815551757812, 2.3565216064453125, -0.7483406066894531, 10.245742797851562, 0.104278564453125, -2.036317825317383, 0.6318321228027344, 13.386655807495117, 1.7452011108398438, 5.704185485839844, -3.530384063720703, 7.115409851074219, -6.155479431152344, 5.669483184814453], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000369.npy"}
{"epoch": 0.5578231292517006, "step": 370, "batch_size": 64, "mean": 4.600583076477051, "std": 6.891026973724365, "min": -13.780635833740234, "p10": -3.536147689819336, "median": 4.195796966552734, "p90": 13.457968902587893, "max": 19.66834259033203, "pos_frac": 0.796875, "sample": [3.3142337799072266, 13.806617736816406, 6.8254852294921875, 2.6451988220214844, 13.73046875, 2.0010986328125, 18.02912139892578, 3.9663009643554688, 19.66834259033203, 1.6099891662597656, 0.5689544677734375, -0.43119049072265625, -0.260986328125, 11.374320983886719, 18.303634643554688, 5.098590850830078, 2.4268798828125, -5.095039367675781, -7.0329742431640625, 10.127197265625, 6.568756103515625, 6.994384765625, 2.989421844482422, 8.75040054321289, 8.765422821044922, 5.201263427734375, 8.21413803100586, 0.4809722900390625, 2.8719329833984375, -1.9955177307128906, -1.8741798400878906, 0.40789794921875, 0.28600311279296875, -7.233062744140625, 0.0623016357421875, 9.930183410644531, 16.413803100585938, 5.445075988769531, 14.283367156982422, 7.2960968017578125, 12.221885681152344, -12.569976806640625, 7.261407852172852, 6.4590911865234375, 0.9393768310546875, 11.852584838867188, 6.144439697265625, 8.966073989868164, 9.450538635253906, 9.393203735351562, -13.780635833740234, 3.2334346771240234, 12.822135925292969, -3.4187164306640625, -1.6510772705078125, 3.2698822021484375, 4.42529296875, 0.75640869140625, -3.996034622192383, 0.5147476196289062, 6.515586853027344, 11.829940795898438, -3.586475372314453, 2.849302291870117], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000370.npy"}
{"epoch": 0.5593348450491308, "step": 371, "batch_size": 64, "mean": 4.200061798095703, "std": 6.360535621643066, "min": -10.656461715698242, "p10": -4.1681365966796875, "median": 4.102449417114258, "p90": 11.530434036254883, "max": 18.356534957885742, "pos_frac": 0.71875, "sample": [9.938079833984375, -10.656461715698242, 16.04187774658203, 1.7228012084960938, 11.4031982421875, -4.3006591796875, 10.039947509765625, -1.445657730102539, -0.599578857421875, 1.9242725372314453, 0.4396820068359375, 8.211174011230469, -4.085277557373047, 18.356534957885742, 0.8432273864746094, 9.480827331542969, -0.1429309844970703, 11.452226638793945, 5.897287368774414, 16.994850158691406, 2.4110260009765625, -2.0858306884765625, -0.6980972290039062, 1.4727439880371094, 7.235382080078125, 11.56395149230957, 7.037105560302734, 17.022567749023438, 11.319026947021484, 8.566415786743164, 9.081649780273438, 7.6446380615234375, -6.291267395019531, -0.3787193298339844, 4.754739761352539, 3.7063140869140625, -1.289459228515625, -4.4816131591796875, -1.9126091003417969, 13.73748779296875, -4.203647613525391, 7.533306121826172, 0.37854957580566406, 3.060871124267578, 5.861503601074219, -3.905986785888672, 4.326641082763672, 3.3492660522460938, 0.3750419616699219, 3.8782577514648438, 4.491558074951172, -4.943059921264648, 0.32260704040527344, 7.387603759765625, 13.97686767578125, 0.0765533447265625, 7.157951354980469, 5.094841003417969, 9.519798278808594, 5.666679382324219, 8.888580322265625, 8.227100372314453, -5.140235900878906, -2.5075645446777344], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000371.npy"}
{"epoch": 0.5608465608465608, "step": 372, "batch_size": 64, "mean": 3.016690731048584, "std": 7.0870256423950195, "min": -11.507164001464844, "p10": -6.2447412490844725, "median": 3.6977901458740234, "p90": 11.329949951171875, "max": 20.986955642700195, "pos_frac": 0.71875, "sample": [-4.7534027099609375, 4.140514373779297, 7.127708435058594, 2.5333251953125, 4.750823974609375, 4.389892578125, 14.336624145507812, -2.4112625122070312, 11.335479736328125, 6.335350036621094, 13.39837646484375, 0.148712158203125, 4.739959716796875, 18.76317596435547, 5.445137023925781, 2.4652633666992188, 8.343063354492188, -11.104686737060547, 1.3843002319335938, 7.8160400390625, 1.0164756774902344, 1.534402847290039, 5.6705169677734375, 12.312984466552734, 0.3572502136230469, -2.4298324584960938, -11.185493469238281, 5.0650787353515625, 19.27130126953125, -3.962066650390625, 7.058441162109375, 9.675323486328125, 1.880615234375, -2.020862579345703, -6.430671691894531, 6.520256042480469, -8.518932342529297, 3.4482154846191406, 9.251365661621094, 4.652570724487305, 11.317047119140625, 4.985553741455078, -1.5760955810546875, 9.099853515625, -3.501171112060547, 4.71061897277832, -11.507164001464844, 0.6453380584716797, 1.2431755065917969, 4.105567932128906, 1.0504913330078125, -10.675918579101562, 6.8359832763671875, 1.8827152252197266, -1.9567546844482422, -5.810903549194336, -7.737033843994141, 5.175727844238281, 9.512540817260742, 20.986955642700195, -2.137388229370117, -2.190460205078125, 2.3108291625976562, 3.9473648071289062], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000372.npy"}
{"epoch": 0.562358276643991, "step": 373, "batch_size": 64, "mean": 3.8584046363830566, "std": 6.567584991455078, "min": -8.069633483886719, "p10": -5.48632583618164, "median": 3.393402099609375, "p90": 11.421071624755859, "max": 20.56842803955078, "pos_frac": 0.75, "sample": [-6.302532196044922, 10.393085479736328, 13.8160400390625, 3.11083984375, 9.762077331542969, -5.5847015380859375, -0.008380889892578125, 10.491619110107422, 4.155841827392578, 3.037660598754883, 7.978313446044922, -8.069633483886719, 6.3162384033203125, -2.8989715576171875, -7.047990798950195, -1.9107780456542969, 3.085023880004883, 3.3895950317382812, -7.593391418457031, -1.6642875671386719, 4.954364776611328, 0.3887615203857422, 9.303291320800781, -3.263824462890625, 4.581390380859375, 11.422538757324219, -0.1531829833984375, 0.8149757385253906, 9.646011352539062, 5.229265213012695, 15.544904708862305, 3.4295730590820312, 1.3760261535644531, 11.417648315429688, -5.256782531738281, 0.5752315521240234, -2.6993865966796875, 5.5209808349609375, 0.005218505859375, 11.131427764892578, 7.631252288818359, -7.265625, -4.003303527832031, 2.1752662658691406, 0.38895416259765625, 5.873281478881836, -6.593196868896484, 12.96566390991211, 3.3972091674804688, 1.9316482543945312, 1.8742828369140625, 15.020645141601562, 2.6426239013671875, 20.56842803955078, 3.5774173736572266, 9.579875946044922, 6.854167938232422, 9.460739135742188, 2.0616912841796875, 19.471702575683594, 4.640626907348633, 2.681499481201172, 3.4323577880859375, 10.146575927734375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000373.npy"}
{"epoch": 0.563869992441421, "step": 374, "batch_size": 64, "mean": 5.353536605834961, "std": 5.1977410316467285, "min": -5.7719573974609375, "p10": -1.1711807250976558, "median": 5.733442306518555, "p90": 12.06467056274414, "max": 16.2677001953125, "pos_frac": 0.84375, "sample": [12.568878173828125, 6.476890563964844, 7.861747741699219, -0.5676231384277344, 2.365428924560547, 10.306526184082031, 12.083793640136719, 9.203163146972656, 14.607032775878906, 3.628622055053711, 12.020050048828125, -1.899810791015625, 5.770103454589844, 12.510612487792969, 7.153717041015625, 0.02606201171875, 9.959968566894531, 2.4846839904785156, 0.3211669921875, 3.5715103149414062, 6.025886535644531, 7.082490921020508, 1.5141258239746094, -0.7216644287109375, 4.4932098388671875, 6.778770446777344, 0.9590530395507812, 11.925010681152344, 0.4961395263671875, 6.009246826171875, 3.213287353515625, 8.357833862304688, 2.357484817504883, 0.03947257995605469, 5.4244232177734375, 9.628292083740234, 7.60986328125, -3.3839874267578125, 9.69268798828125, 7.606204986572266, 5.731220245361328, -2.942850112915039, -0.23107147216796875, 15.52093505859375, 11.732269287109375, 1.5751419067382812, -2.5008544921875, 5.3445281982421875, 3.6154727935791016, 7.892547607421875, 16.2677001953125, 11.780452728271484, 13.091827392578125, 2.6341476440429688, 5.61083984375, 1.1797237396240234, 9.650962829589844, 7.130096435546875, 5.735664367675781, 9.023239135742188, -4.413671493530273, -1.36383056640625, 0.7734603881835938, -5.7719573974609375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000374.npy"}
{"epoch": 0.5653817082388511, "step": 375, "batch_size": 64, "mean": 2.856553554534912, "std": 5.52036190032959, "min": -10.268363952636719, "p10": -3.3214191436767577, "median": 2.677191734313965, "p90": 9.89471817016602, "max": 17.326068878173828, "pos_frac": 0.6875, "sample": [-2.40380859375, -6.527801513671875, 0.2269744873046875, 3.9738235473632812, 6.171125411987305, 2.189359664916992, 11.673919677734375, 2.9510726928710938, -1.1201248168945312, 3.844511032104492, -2.217815399169922, 9.125328063964844, -1.9221343994140625, 13.750146865844727, 3.4771175384521484, -3.1255455017089844, 4.020986557006836, -0.5974197387695312, 4.331844329833984, -1.8890953063964844, 1.9676322937011719, 8.648712158203125, 8.899459838867188, -3.405364990234375, 3.9012222290039062, 10.360893249511719, -0.230224609375, -2.1056976318359375, 2.4711532592773438, 6.792819976806641, 1.0245628356933594, -0.4834480285644531, -0.3524742126464844, -6.452125549316406, 0.351104736328125, 2.883230209350586, 3.620513916015625, 3.1569366455078125, 17.326068878173828, 0.218231201171875, 8.145217895507812, 7.016935348510742, 10.493762969970703, -10.268363952636719, 1.9689559936523438, 0.3817291259765625, 14.127655029296875, 2.3503036499023438, 7.730499267578125, 4.979042053222656, 6.701423645019531, 10.224456787109375, -4.293392181396484, 6.948284149169922, 2.3826541900634766, 2.1272125244140625, 4.193946838378906, -8.773406982421875, -2.8867645263671875, 8.341865539550781, -2.1824569702148438, 9.04616928100586, 3.5673141479492188, -4.029289245605469], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000375.npy"}
{"epoch": 0.5668934240362812, "step": 376, "batch_size": 64, "mean": 5.62941312789917, "std": 6.77410888671875, "min": -8.175682067871094, "p10": -2.759593963623047, "median": 4.693593978881836, "p90": 15.201396942138672, "max": 23.194122314453125, "pos_frac": 0.796875, "sample": [4.752105712890625, 3.1806564331054688, 2.3969345092773438, 5.3480987548828125, 3.9247817993164062, -3.5174407958984375, -2.6232757568359375, 12.57623291015625, 2.8251800537109375, 5.1501922607421875, 9.52093505859375, 8.982650756835938, -8.175682067871094, -0.8605098724365234, -3.953125, 1.4799270629882812, 16.459617614746094, 9.0169677734375, 6.336696624755859, 23.194122314453125, 5.6504058837890625, 10.517013549804688, 0.506195068359375, 10.756576538085938, 15.050765991210938, -1.8345565795898438, 15.265953063964844, 12.926652908325195, 8.546417236328125, 10.979759216308594, 8.752986907958984, 4.470331192016602, 18.70745849609375, 10.072509765625, 4.635082244873047, 11.041030883789062, 6.77947998046875, -2.364023208618164, 15.895904541015625, -6.491497039794922, 3.4331130981445312, -2.8180160522460938, 10.957283020019531, 3.1804656982421875, 3.9371871948242188, 15.005706787109375, -1.3443126678466797, 6.007326126098633, -1.4606781005859375, 10.775472640991211, -7.841468811035156, 7.738737106323242, 2.90985107421875, 3.765186309814453, 16.617734909057617, 1.4671192169189453, 2.8319091796875, 2.829143524169922, 15.895170211791992, 9.27294921875, 1.3859710693359375, 3.624032974243164, 1.1019001007080078, -4.868858337402344], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000376.npy"}
{"epoch": 0.5684051398337112, "step": 377, "batch_size": 64, "mean": 3.804288387298584, "std": 5.368813514709473, "min": -8.283889770507812, "p10": -2.214376449584961, "median": 4.3307037353515625, "p90": 10.39005126953125, "max": 15.60595703125, "pos_frac": 0.734375, "sample": [6.421867370605469, -4.4413299560546875, 8.210762023925781, 7.397693634033203, 3.7572555541992188, 0.67535400390625, -7.151268005371094, -1.9526824951171875, 3.1783294677734375, 7.869091033935547, 12.938972473144531, -0.8377609252929688, 13.039016723632812, 0.8768539428710938, 7.678462982177734, 5.64349365234375, -2.173614501953125, 6.0114288330078125, -1.9556293487548828, 1.6865863800048828, 10.087886810302734, 0.0074596405029296875, 7.558887481689453, 7.831241607666016, 4.422847747802734, -7.459079742431641, 5.3639373779296875, 13.065261840820312, 2.2398643493652344, 2.285776138305664, 7.379985809326172, -2.2318458557128906, 6.104156494140625, 0.7627048492431641, 3.6849746704101562, -2.0150203704833984, 7.3586883544921875, 10.031946182250977, 6.06903076171875, 6.182586669921875, -1.6724433898925781, 13.818283081054688, 2.6833934783935547, 4.238559722900391, 5.661102294921875, 13.390960693359375, 4.72840690612793, -2.0292816162109375, -2.77899169921875, 5.186113357543945, 6.654022216796875, -1.003814697265625, 1.0915393829345703, 15.60595703125, -0.8377685546875, -3.2212162017822266, 10.519550323486328, 4.122465133666992, -8.283889770507812, 1.6348686218261719, 7.00242805480957, -1.0841293334960938, 6.4619598388671875, 5.982219696044922], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000377.npy"}
{"epoch": 0.5699168556311414, "step": 378, "batch_size": 64, "mean": 3.6004490852355957, "std": 6.080357551574707, "min": -10.938289642333984, "p10": -1.881396484375, "median": 2.3654327392578125, "p90": 12.838851547241212, "max": 22.205337524414062, "pos_frac": 0.71875, "sample": [5.132904052734375, 2.35296630859375, 3.7452049255371094, 1.7945728302001953, 9.455802917480469, 6.9580535888671875, -4.136962890625, -0.9649467468261719, 3.90582275390625, 14.703170776367188, -10.938289642333984, 2.427389144897461, 15.78704833984375, 1.9136714935302734, 7.960777282714844, 7.521690368652344, 1.277456283569336, 4.026552200317383, 15.520172119140625, -0.3122711181640625, 8.507869720458984, 4.758586883544922, -0.8937339782714844, 4.2953338623046875, 12.425418853759766, -0.18532562255859375, 1.7559814453125, 9.736328125, -5.6306610107421875, 1.1880836486816406, 8.83331298828125, 0.9088935852050781, -3.6644439697265625, 3.584474563598633, 8.546295166015625, 0.0938262939453125, -0.0650482177734375, 0.0971527099609375, 8.18914794921875, -1.8832321166992188, 22.205337524414062, 2.377899169921875, 2.7224655151367188, 13.19717025756836, 2.632232666015625, 6.350616455078125, -1.1585578918457031, 13.961055755615234, 4.387115478515625, 0.6037845611572266, 1.6136093139648438, -1.8771133422851562, 1.7062873840332031, -6.204307556152344, -1.3805427551269531, 10.442977905273438, -1.5076675415039062, 0.865966796875, 1.9088783264160156, 4.892311096191406, -1.0196380615234375, -7.0480804443359375, -0.9881381988525391, 13.016036987304688], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000378.npy"}
{"epoch": 0.5714285714285714, "step": 379, "batch_size": 64, "mean": 4.641477584838867, "std": 6.0337233543396, "min": -8.03485107421875, "p10": -1.6208080291748046, "median": 4.387420654296875, "p90": 11.84545478820801, "max": 19.88555145263672, "pos_frac": 0.828125, "sample": [5.197973251342773, -0.628814697265625, 11.964641571044922, -4.58563232421875, 4.029449462890625, 4.745391845703125, 6.446290969848633, -5.56666374206543, 2.499286651611328, 4.760345458984375, 0.29381561279296875, 0.0731201171875, -3.13897705078125, 6.7610015869140625, 0.050201416015625, 19.659942626953125, 8.2734375, -1.6108131408691406, 8.994136810302734, 8.809356689453125, 16.438720703125, 11.567352294921875, 11.29897689819336, 9.730079650878906, 0.716461181640625, -3.52984619140625, 7.714611053466797, -0.223052978515625, 9.760124206542969, -8.03485107421875, 16.485633850097656, 5.769121170043945, 3.280641555786133, 2.4869155883789062, 2.49407958984375, 8.469512939453125, 10.026636123657227, 1.1817245483398438, 1.5597915649414062, -7.987682342529297, -1.3514633178710938, 6.7624053955078125, 1.9588546752929688, 14.674505233764648, 8.768241882324219, 0.5176734924316406, 19.88555145263672, 4.8560638427734375, 1.4769096374511719, 3.6485443115234375, 13.152130126953125, -1.625091552734375, 0.2560272216796875, 5.785894393920898, 8.486127853393555, 0.7436141967773438, 3.0691375732421875, 7.931308746337891, 0.3037757873535156, 0.17258262634277344, 4.8486175537109375, 8.89459228515625, 6.472328186035156, 1.1337814331054688], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000379.npy"}
{"epoch": 0.5729402872260015, "step": 380, "batch_size": 64, "mean": 5.180295944213867, "std": 6.244778156280518, "min": -10.975936889648438, "p10": -3.150909423828125, "median": 5.263523101806641, "p90": 11.930780601501466, "max": 19.545188903808594, "pos_frac": 0.78125, "sample": [4.349491119384766, 11.694643020629883, 3.18212890625, 4.374786376953125, -4.387718200683594, -4.6771392822265625, 10.331670761108398, 10.770378112792969, -3.4718780517578125, 9.666030883789062, 5.926372528076172, -0.3199424743652344, -1.0715065002441406, 10.598468780517578, 3.9954872131347656, -3.0894927978515625, 0.8820610046386719, 3.4561691284179688, 10.694965362548828, 11.434249877929688, 6.631202697753906, 4.878562927246094, -2.7618408203125, -10.975936889648438, 15.316455841064453, 3.8081817626953125, 8.360546112060547, 10.739570617675781, -0.3693218231201172, 3.5881710052490234, 7.9910888671875, 10.24615478515625, 5.346092224121094, 3.3813934326171875, 16.206069946289062, 2.4024009704589844, 12.031982421875, 6.638214111328125, -1.5431671142578125, 2.5875625610351562, 8.414327621459961, -10.547344207763672, 14.689659118652344, 16.069610595703125, 6.1208953857421875, 7.115377426147461, 13.348793029785156, 5.1809539794921875, -0.2646026611328125, 9.077579498291016, 1.3833694458007812, 1.665313720703125, -3.1772308349609375, 19.545188903808594, 11.18597412109375, 7.743675231933594, 1.6836185455322266, 6.3826141357421875, 9.271038055419922, 8.219253540039062, 4.671787261962891, 7.768276214599609, 0.9896793365478516, -3.841461181640625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000380.npy"}
{"epoch": 0.5744520030234316, "step": 381, "batch_size": 64, "mean": 2.150967836380005, "std": 5.430963039398193, "min": -15.24542236328125, "p10": -3.350636291503906, "median": 1.7008304595947266, "p90": 8.705277824401856, "max": 16.280357360839844, "pos_frac": 0.65625, "sample": [-3.035552978515625, 2.577117919921875, 1.9471797943115234, -5.280965805053711, -15.24542236328125, 1.6180152893066406, 0.7215385437011719, 3.3552398681640625, 16.280357360839844, 4.855751037597656, 7.962242126464844, -2.2181930541992188, 0.7763442993164062, 5.216114044189453, -2.8280582427978516, 15.000457763671875, 6.0322113037109375, 0.6422557830810547, 1.7836456298828125, -0.9152851104736328, -0.3196125030517578, -2.0934371948242188, 3.701345443725586, 3.3019866943359375, -5.524742126464844, 4.713588714599609, -3.4856719970703125, -4.5727386474609375, 7.135894775390625, -0.48987770080566406, 4.682407379150391, 3.465545654296875, 4.331809997558594, 0.2195587158203125, -1.4214248657226562, 8.60495376586914, 2.916423797607422, -1.9034156799316406, -4.187286376953125, 2.5207977294921875, 8.345928192138672, 0.935699462890625, 8.242355346679688, 1.8862056732177734, 3.8143539428710938, 14.250720977783203, 8.748273849487305, 5.695850372314453, -0.9286785125732422, -1.2982521057128906, 0.8091049194335938, 9.066282272338867, 6.6211700439453125, -2.4322509765625, 1.148345947265625, -1.5466766357421875, 10.324932098388672, -6.4027557373046875, -2.4863052368164062, 2.294301986694336, 0.35268402099609375, -2.2013778686523438, 0.24903106689453125, 11.331901550292969], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000381.npy"}
{"epoch": 0.5759637188208617, "step": 382, "batch_size": 64, "mean": 3.5834178924560547, "std": 4.961226940155029, "min": -6.961832046508789, "p10": -2.6908634185791014, "median": 3.287628173828125, "p90": 8.78075351715088, "max": 18.935771942138672, "pos_frac": 0.734375, "sample": [-0.771240234375, 0.03473091125488281, 10.093704223632812, 7.40681266784668, 8.897758483886719, -3.3752899169921875, 0.4610137939453125, 2.0422630310058594, 7.4592437744140625, 5.686014175415039, 2.13153076171875, 2.6864166259765625, -0.48818206787109375, 7.9635772705078125, 5.9398040771484375, -0.00830841064453125, -3.1695804595947266, 5.0931396484375, -6.961832046508789, 4.354240417480469, -1.0637969970703125, 4.184196472167969, 6.072845458984375, -1.4314689636230469, -0.507080078125, 4.052001953125, 5.570241928100586, 1.7551116943359375, 0.169097900390625, 18.935771942138672, 5.439971923828125, 4.683135986328125, -4.293830871582031, -3.6154232025146484, 8.289451599121094, -0.2693328857421875, -2.746540069580078, -0.5372238159179688, 2.5267181396484375, 8.507741928100586, 1.5870895385742188, 18.126312255859375, 2.2684326171875, 8.25396728515625, 1.3585319519042969, 7.319801330566406, -2.5811920166015625, -0.2570953369140625, 0.6418533325195312, 6.737052917480469, 0.8734588623046875, 8.505622863769531, -2.737865447998047, 2.959087371826172, 5.656669616699219, 1.3487586975097656, 9.229019165039062, 3.875680923461914, 9.771549224853516, 13.312911987304688, 3.616168975830078, 7.0291900634765625, 4.350116729736328, 6.896221160888672], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000382.npy"}
{"epoch": 0.5774754346182918, "step": 383, "batch_size": 64, "mean": 4.541975021362305, "std": 6.309675693511963, "min": -8.774383544921875, "p10": -2.003325271606445, "median": 4.030651092529297, "p90": 13.126781845092774, "max": 23.180267333984375, "pos_frac": 0.71875, "sample": [0.005252838134765625, 8.105987548828125, 5.058525085449219, 2.9127655029296875, 4.2402191162109375, -1.2433738708496094, 7.550205230712891, 4.62713623046875, -1.7837066650390625, -5.05656623840332, 4.120044708251953, -0.5001220703125, 3.9412574768066406, 5.738183975219727, 10.13555908203125, 1.0965194702148438, 8.065086364746094, -2.070812225341797, 14.609268188476562, 5.239784240722656, 3.9051284790039062, -1.845855712890625, 11.327400207519531, 7.00653076171875, 13.358844757080078, -1.386566162109375, 5.3279266357421875, 0.0176239013671875, 5.1804351806640625, 13.09796142578125, 13.053884506225586, -2.343839645385742, -0.9048233032226562, 23.180267333984375, 0.6690521240234375, 4.973110198974609, 13.13913345336914, -8.774383544921875, 3.3746337890625, 4.423835754394531, -2.6451797485351562, 2.6277542114257812, 3.9229888916015625, 3.5834217071533203, 7.442146301269531, -1.7603912353515625, 1.0682754516601562, 4.286506652832031, -3.8728065490722656, 15.223762512207031, -0.13687705993652344, 18.878509521484375, 9.587425231933594, 1.4916534423828125, -0.5463943481445312, 12.541595458984375, 12.565361022949219, 2.97503662109375, -0.3274993896484375, -6.9906158447265625, 11.225793838500977, -1.692657470703125, 13.430988311767578, 6.236080169677734], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000383.npy"}
{"epoch": 0.5789871504157218, "step": 384, "batch_size": 64, "mean": 4.0027689933776855, "std": 5.165865898132324, "min": -10.833648681640625, "p10": -1.8368408203124997, "median": 3.829151153564453, "p90": 10.316211318969726, "max": 16.97417449951172, "pos_frac": 0.8125, "sample": [-1.30377197265625, 9.459396362304688, 3.1314163208007812, 1.8402786254882812, -0.4492301940917969, 9.675918579101562, 3.7242202758789062, 5.7714691162109375, 7.543426513671875, 0.3521537780761719, 7.148162841796875, 3.93408203125, 1.86285400390625, 3.3340682983398438, -1.9362030029296875, 0.5065155029296875, 0.6284332275390625, 14.584129333496094, -1.91015625, 10.623180389404297, -1.6261138916015625, 5.781890869140625, 3.1842498779296875, -6.067741394042969, -2.257020950317383, 5.7651519775390625, 16.150657653808594, 4.775367736816406, -3.002992630004883, 5.2508392333984375, 13.87401008605957, 6.525566101074219, 0.4197654724121094, 2.3529491424560547, 4.938179016113281, 0.5100955963134766, 16.97417449951172, 1.9773712158203125, -10.833648681640625, 10.352108001708984, 3.4066009521484375, 5.5054779052734375, 10.906822204589844, -4.735263824462891, 10.185745239257812, 10.232452392578125, 5.336784362792969, 1.1674327850341797, 2.3170204162597656, 4.56938362121582, 4.80865478515625, 1.4929046630859375, 4.409677505493164, 4.7720489501953125, 1.5905303955078125, -0.5445747375488281, 1.2578468322753906, 8.127578735351562, -1.665771484375, 1.8313217163085938, 9.351409912109375, 6.1742706298828125, 6.959283828735352, 5.154388427734375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000384.npy"}
{"epoch": 0.5804988662131519, "step": 385, "batch_size": 64, "mean": 4.467778205871582, "std": 5.067529201507568, "min": -7.48895263671875, "p10": -1.2501583099365228, "median": 4.392965316772461, "p90": 10.263583755493165, "max": 20.394184112548828, "pos_frac": 0.765625, "sample": [-0.5936698913574219, 0.9872207641601562, 8.536226272583008, 6.9225311279296875, 4.810371398925781, 6.381401062011719, 4.3049468994140625, 5.795680999755859, 2.2147750854492188, 3.863872528076172, -4.128353118896484, 4.4635772705078125, 3.746387481689453, 8.08892822265625, 6.312255859375, 8.430702209472656, 12.63363265991211, 11.391639709472656, -0.4059467315673828, 3.868532180786133, -0.6546173095703125, -7.48895263671875, 3.0093460083007812, 9.659568786621094, 10.083026885986328, 7.787811279296875, 5.212926864624023, 4.9391021728515625, 2.0784683227539062, 5.015110015869141, 3.9593353271484375, -6.665576934814453, 8.335556030273438, -0.33929443359375, 4.785869598388672, 8.25787353515625, 6.148418426513672, 4.9212646484375, 4.322353363037109, 5.313831329345703, -2.564289093017578, -0.5758285522460938, 7.897308349609375, 8.81732177734375, -0.17573165893554688, 15.070602416992188, 8.718330383300781, -2.3939895629882812, -1.5053901672363281, -2.49041748046875, 2.1070709228515625, 3.847421646118164, -0.2908782958984375, 2.8507537841796875, 4.283246994018555, 10.340965270996094, 0.07575607299804688, 7.865932464599609, -0.16762542724609375, 20.394184112548828, 0.0806427001953125, 3.1972694396972656, 12.297504425048828, 11.951507568359375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000385.npy"}
{"epoch": 0.582010582010582, "step": 386, "batch_size": 64, "mean": 3.6487817764282227, "std": 5.23617696762085, "min": -8.956787109375, "p10": -2.3981834411621095, "median": 3.3147830963134766, "p90": 10.576557159423828, "max": 16.57110595703125, "pos_frac": 0.75, "sample": [16.57110595703125, -1.3965702056884766, -1.403097152709961, 5.917671203613281, 9.87356185913086, 3.35357666015625, -0.27919769287109375, 8.711807250976562, 2.3855323791503906, 4.833892822265625, 2.0837249755859375, -3.39697265625, -6.544879913330078, 1.5375595092773438, -2.895111083984375, 11.176658630371094, 6.6925506591796875, 13.287208557128906, 7.320671081542969, 2.7245025634765625, 4.77509880065918, 7.98785400390625, 4.449333190917969, -2.0202980041503906, -1.3040847778320312, 5.282436370849609, 12.114435195922852, -0.6620712280273438, 5.91717529296875, 3.275989532470703, 3.48065185546875, 8.625091552734375, -1.5087051391601562, 3.544097900390625, -5.091510772705078, 0.44152069091796875, 2.1246490478515625, 2.241973876953125, 4.379758834838867, 9.197174072265625, 3.9650115966796875, -2.407215118408203, 4.684995651245117, 10.376632690429688, 10.662239074707031, -2.3771095275878906, 0.658172607421875, 4.936302185058594, -0.6319179534912109, 14.736801147460938, 1.4307594299316406, 0.270477294921875, -5.38360595703125, 5.4292144775390625, 11.72857666015625, 2.5139617919921875, 1.6667022705078125, 2.925567626953125, 8.931991577148438, 9.154525756835938, 6.7339019775390625, 2.690826416015625, 1.9772415161132812, -8.956787109375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000386.npy"}
{"epoch": 0.5835222978080121, "step": 387, "batch_size": 64, "mean": 4.922030448913574, "std": 4.782616138458252, "min": -7.1878204345703125, "p10": -0.5393192291259763, "median": 4.012458801269531, "p90": 10.832954406738281, "max": 19.880508422851562, "pos_frac": 0.875, "sample": [8.536331176757812, 1.27069091796875, 9.793292999267578, -1.0099201202392578, 2.1146583557128906, 10.860946655273438, 5.302581787109375, 1.1512336730957031, 5.110660552978516, 19.880508422851562, 3.8193740844726562, -1.1903400421142578, 0.7101211547851562, 3.8211212158203125, 11.248065948486328, 1.939300537109375, 7.1481781005859375, 10.590866088867188, 6.475795745849609, 4.147003173828125, 9.854522705078125, 9.881351470947266, 8.291328430175781, -0.23376083374023438, -1.6794910430908203, 6.6804962158203125, 2.074432373046875, 0.5119285583496094, 10.76763916015625, 4.693403244018555, 2.0942344665527344, 2.377593994140625, 1.87939453125, 12.956304550170898, 1.0606689453125, 7.640968322753906, 3.7154083251953125, 0.9248275756835938, 3.4341506958007812, -0.6702728271484375, 0.7409458160400391, 6.1741180419921875, 8.23488998413086, -7.1878204345703125, -1.1180000305175781, 6.421674728393555, 6.25555419921875, 14.493019104003906, 1.3675918579101562, 9.946203231811523, -2.188701629638672, 8.035118103027344, 4.93157958984375, 4.3257904052734375, 0.246826171875, 3.434629440307617, 3.769927978515625, 3.0314579010009766, 3.8779144287109375, 11.441268920898438, 15.103836059570312, 3.239309310913086, 7.671731948852539, 4.785478591918945], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000387.npy"}
{"epoch": 0.5850340136054422, "step": 388, "batch_size": 64, "mean": 3.698843240737915, "std": 6.1343841552734375, "min": -8.330886840820312, "p10": -3.195923233032226, "median": 3.111239433288574, "p90": 12.81257591247559, "max": 20.955169677734375, "pos_frac": 0.65625, "sample": [8.639434814453125, 14.417320251464844, 9.462844848632812, 16.98879623413086, 2.0823516845703125, 6.5978240966796875, 6.347381591796875, 3.7653045654296875, -3.8948516845703125, -1.8897361755371094, 5.416191101074219, 7.3226165771484375, -1.5771293640136719, 4.930707931518555, -5.006660461425781, 20.955169677734375, -2.4388389587402344, 13.978378295898438, 1.2374725341796875, 5.580495834350586, 8.5826416015625, 1.0042877197265625, 8.12445068359375, -0.100006103515625, 10.18576431274414, -8.330886840820312, 1.15380859375, 5.551246643066406, 1.7536544799804688, 1.4887142181396484, -1.6767826080322266, -2.4790992736816406, -0.825347900390625, 13.21982192993164, -1.2547130584716797, 4.598052978515625, 3.1228179931640625, -4.506103515625, 7.260150909423828, -2.8476295471191406, 8.955207824707031, 13.68023681640625, -3.3451919555664062, 11.250228881835938, 8.982244491577148, 4.9821014404296875, -1.223602294921875, 3.3795700073242188, 6.08258056640625, 13.326057434082031, 2.6255416870117188, 11.862335205078125, -2.5972213745117188, 1.7127838134765625, -7.878047943115234, -4.2736358642578125, 5.856658935546875, 4.175445556640625, -0.21692848205566406, -0.14531707763671875, 2.096912384033203, -1.5159072875976562, -1.0856704711914062, 3.099660873413086], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000388.npy"}
{"epoch": 0.5865457294028723, "step": 389, "batch_size": 64, "mean": 4.432642936706543, "std": 5.752743244171143, "min": -10.537837982177734, "p10": -2.2395629882812496, "median": 4.82686710357666, "p90": 11.464521026611328, "max": 15.948905944824219, "pos_frac": 0.75, "sample": [-4.9920654296875, -3.045511245727539, 6.195695877075195, 3.62762451171875, 3.2055206298828125, 2.8985214233398438, 8.06064224243164, 11.458427429199219, 14.363082885742188, 11.467132568359375, 0.198883056640625, -4.152458190917969, 8.212051391601562, 6.016080856323242, 6.4192352294921875, 10.13493537902832, 5.936500549316406, 9.130550384521484, 6.09246826171875, 4.815608978271484, 10.12225341796875, 6.87286376953125, 1.1145496368408203, 6.3568115234375, 10.762664794921875, -0.8484649658203125, 6.81671142578125, -1.9400291442871094, 3.558258056640625, 6.083534240722656, -0.6214370727539062, 5.894096374511719, -1.9572601318359375, 14.222789764404297, 12.40999984741211, 4.2696380615234375, 15.1326904296875, 5.270393371582031, -2.509174346923828, -10.255678176879883, 11.225936889648438, -0.985595703125, 1.9754161834716797, -0.5235042572021484, 6.79332160949707, 0.33697509765625, 8.127700805664062, -1.3983497619628906, -10.537837982177734, 4.741455078125, 1.4707183837890625, 3.1574344635009766, 9.188751220703125, -0.4219837188720703, -2.3605499267578125, 15.948905944824219, 13.867328643798828, 5.1894073486328125, 2.842803955078125, 10.479934692382812, -0.16020584106445312, 4.838125228881836, 2.7126312255859375, 0.3821907043457031], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000389.npy"}
{"epoch": 0.5880574452003023, "step": 390, "batch_size": 64, "mean": 4.06973123550415, "std": 5.214151382446289, "min": -7.364818572998047, "p10": -2.1696653366088863, "median": 3.3783411979675293, "p90": 11.557283020019536, "max": 18.617347717285156, "pos_frac": 0.78125, "sample": [-5.4965362548828125, 2.256317138671875, 4.065620422363281, -0.9197998046875, 12.667465209960938, 0.6445236206054688, 5.7859039306640625, 7.519458770751953, -1.4794921875, 5.549736022949219, -0.3614044189453125, -2.792694091796875, 7.502616882324219, 0.5672283172607422, -2.5183029174804688, 18.617347717285156, 7.321582794189453, -2.5649642944335938, 4.4473114013671875, -1.9194793701171875, 7.0595855712890625, -0.2301177978515625, 5.711923599243164, 12.117782592773438, 8.67572021484375, 9.214344024658203, 2.5949935913085938, -2.9005889892578125, 0.708221435546875, 14.804153442382812, 0.03395843505859375, 3.170140266418457, 7.682313919067383, 7.250526428222656, -2.276887893676758, -0.26978302001953125, -7.364818572998047, 8.752410888671875, 13.264198303222656, 1.1328392028808594, 12.179893493652344, 2.3412532806396484, 7.717962265014648, 3.5865421295166016, 1.1313896179199219, 6.662817001342773, 2.1606082916259766, 3.611907958984375, 10.24945068359375, 3.588348388671875, 8.03305435180664, 4.447795867919922, 2.682525634765625, 7.735286712646484, 0.9745998382568359, 2.3042449951171875, 1.5330123901367188, 0.48070526123046875, 1.627950668334961, -0.682861328125, 7.167091369628906, 6.046897888183594, 1.28533935546875, 15.573627471923828], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000390.npy"}
{"epoch": 0.5895691609977324, "step": 391, "batch_size": 64, "mean": 4.828234672546387, "std": 6.907925128936768, "min": -11.39447021484375, "p10": -4.231708526611328, "median": 4.583560943603516, "p90": 13.377582931518555, "max": 23.214553833007812, "pos_frac": 0.75, "sample": [3.3887176513671875, -11.39447021484375, 13.495597839355469, 3.8038787841796875, 6.36370849609375, 9.978218078613281, -1.0785484313964844, 10.400840759277344, 13.977737426757812, 17.763671875, 4.701683044433594, -9.560684204101562, 9.6605224609375, -1.1590728759765625, 13.102214813232422, 14.535842895507812, -3.4087963104248047, 4.13348388671875, -2.048614501953125, 0.28572845458984375, 1.8541603088378906, 6.1607666015625, 6.135345458984375, 9.9228515625, 2.6709365844726562, -0.4613819122314453, -5.7246856689453125, -2.9083194732666016, 8.407508850097656, 4.213920593261719, 23.214553833007812, 7.616851806640625, 4.413965225219727, -3.9477767944335938, 5.154541015625, -1.4985294342041016, 0.32265281677246094, 8.484687805175781, 12.030197143554688, 7.942588806152344, 4.4654388427734375, 12.665275573730469, -6.1007232666015625, -4.886993408203125, 4.975822448730469, 5.088619232177734, 12.743507385253906, 4.324272155761719, 12.126487731933594, 13.661415100097656, -5.4984588623046875, -4.3533935546875, 12.089374542236328, 12.888671875, 13.603797912597656, 10.568805694580078, 6.749605178833008, 3.7500858306884766, -2.1750717163085938, 1.5019912719726562, 0.04674530029296875, 4.990406036376953, 2.6270389556884766, 2.2078170776367188], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000391.npy"}
{"epoch": 0.5910808767951625, "step": 392, "batch_size": 64, "mean": 5.359006881713867, "std": 7.208142280578613, "min": -9.187549591064453, "p10": -2.257625389099121, "median": 4.781564712524414, "p90": 14.626948547363282, "max": 25.53167724609375, "pos_frac": 0.765625, "sample": [-6.265592575073242, 0.9292640686035156, 6.236671447753906, 4.7604522705078125, 1.0827484130859375, 0.7713146209716797, 15.479581832885742, -1.4239578247070312, 13.250625610351562, 11.205472946166992, 2.6378326416015625, -3.419769287109375, 6.451591491699219, 10.937137603759766, 12.431884765625, 6.1312255859375, 0.38826751708984375, 8.403297424316406, 12.77508544921875, 6.5001068115234375, 4.7181854248046875, 5.527801513671875, 5.2113494873046875, -9.187549591064453, 13.777816772460938, 5.970672607421875, 15.02011489868164, 4.802677154541016, 6.904449462890625, -8.3519287109375, 2.3197021484375, 14.754829406738281, 8.577590942382812, 2.1206436157226562, -5.464332580566406, 7.939990997314453, 2.9698429107666016, 7.7843170166015625, 4.8217315673828125, 12.100223541259766, 20.323410034179688, 17.278884887695312, -2.1954517364501953, 23.62896728515625, 0.45027923583984375, 2.3537254333496094, -0.43486785888671875, -1.1231842041015625, 14.328559875488281, 4.6082763671875, -1.65350341796875, 25.53167724609375, -0.6317176818847656, -1.3465576171875, -4.0048370361328125, 0.5404891967773438, 11.722625732421875, -2.284271240234375, 3.35858154296875, 9.343978881835938, 3.018718719482422, 5.296422958374023, -1.0285873413085938, 4.313453674316406], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000392.npy"}
{"epoch": 0.5925925925925926, "step": 393, "batch_size": 64, "mean": 5.010361671447754, "std": 5.298646926879883, "min": -4.334564208984375, "p10": -0.984415817260742, "median": 3.4176063537597656, "p90": 12.922500610351562, "max": 16.46179962158203, "pos_frac": 0.796875, "sample": [4.7611236572265625, -0.1863536834716797, 11.36220932006836, 9.213348388671875, 13.376771926879883, 7.8143463134765625, 3.255525588989258, 3.5752944946289062, 0.7175674438476562, 2.734600067138672, -0.1403217315673828, 2.66558837890625, 15.573699951171875, 8.34332275390625, 5.692495346069336, -0.7556228637695312, 3.4623794555664062, 8.09200668334961, -1.0824699401855469, 8.50503921508789, 0.6234893798828125, 2.3501739501953125, -0.03271484375, 16.46179962158203, 3.372833251953125, 10.867462158203125, 0.45469093322753906, 14.54278564453125, 8.615196228027344, 9.710189819335938, 8.206275939941406, -1.8627376556396484, 11.731903076171875, 12.872993469238281, 0.7679042816162109, 6.207374572753906, -2.0740718841552734, 12.943717956542969, 1.9200210571289062, -0.6021194458007812, 4.037618637084961, 2.7829132080078125, 3.367593765258789, -2.447612762451172, 3.36395263671875, 5.895557403564453, 8.369834899902344, 1.9081382751464844, 0.7540264129638672, 14.343109130859375, 10.927001953125, 15.103385925292969, 6.807470321655273, -4.334564208984375, 3.1106929779052734, 7.9222869873046875, 1.9573822021484375, -0.526123046875, -2.3459014892578125, -3.7493896484375, 4.549999237060547, 10.452695846557617, 2.8041229248046875, 1.551239013671875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000393.npy"}
{"epoch": 0.5941043083900227, "step": 394, "batch_size": 64, "mean": 4.238980770111084, "std": 5.225926399230957, "min": -4.9536895751953125, "p10": -3.273646545410154, "median": 4.276451110839844, "p90": 11.774509429931642, "max": 16.74004364013672, "pos_frac": 0.8125, "sample": [-0.37140655517578125, 14.139862060546875, 11.885002136230469, 1.8269424438476562, 1.4304656982421875, 5.460113525390625, -0.2184600830078125, 2.478912353515625, 14.047065734863281, 4.935951232910156, 14.025848388671875, 5.06566047668457, 0.3726158142089844, -1.3999748229980469, -4.9536895751953125, 4.494670867919922, 2.5659751892089844, -4.842031478881836, 2.474639892578125, 4.359310150146484, 3.8620529174804688, -4.301713943481445, 5.254495620727539, 8.403106689453125, 8.135299682617188, 15.567028045654297, 7.661821365356445, 8.633674621582031, 3.326078414916992, 0.4506378173828125, -4.225067138671875, 0.813507080078125, 15.776866912841797, 5.630645751953125, 1.4195117950439453, 4.147186279296875, 0.14250946044921875, 0.42972564697265625, 11.516693115234375, 1.96533203125, 9.253440856933594, -4.413993835449219, -0.472137451171875, 4.635337829589844, 5.964633941650391, 16.74004364013672, -4.076648712158203, 8.580650329589844, -0.880523681640625, 4.3664703369140625, -4.880104064941406, 4.2148895263671875, 4.1593475341796875, 5.746299743652344, 4.93115234375, 2.352642059326172, 6.0987701416015625, 6.529689788818359, 2.9119510650634766, 4.3380126953125, 2.25146484375, 7.931434631347656, 6.07470703125, 6.550376892089844], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000394.npy"}
{"epoch": 0.5956160241874527, "step": 395, "batch_size": 64, "mean": 3.6631171703338623, "std": 5.308507442474365, "min": -6.151580810546875, "p10": -3.2388095855712886, "median": 3.2049312591552734, "p90": 11.346350097656256, "max": 15.166891098022461, "pos_frac": 0.734375, "sample": [6.150136947631836, 9.304550170898438, 7.295379638671875, 0.9110107421875, 8.802299499511719, 7.9860076904296875, -0.8677959442138672, 9.755584716796875, 12.513137817382812, -2.7337913513183594, 2.522857666015625, 8.276870727539062, 12.470169067382812, 12.955223083496094, 1.4824790954589844, 15.166891098022461, 1.461883544921875, 5.539775848388672, -4.36773681640625, 3.936553955078125, -0.114410400390625, 2.346141815185547, 1.4987449645996094, 3.799938201904297, 6.715980529785156, -4.253992080688477, 6.8092193603515625, 9.248588562011719, 3.790132522583008, -3.4552459716796875, 4.829433441162109, 12.028106689453125, 4.7045440673828125, -0.0737152099609375, -5.0425262451171875, 3.9475173950195312, -5.707965850830078, 0.16816329956054688, 9.211845397949219, -2.138011932373047, -0.24738311767578125, 0.876617431640625, 13.424095153808594, 0.3713035583496094, 8.419502258300781, 9.142913818359375, 2.166963577270508, 6.3283233642578125, -1.4468154907226562, -1.1260318756103516, 1.6089801788330078, 15.045272827148438, -1.2342529296875, 1.2379722595214844, 3.0754470825195312, 7.371910095214844, -3.7699317932128906, 1.012969970703125, 4.250020980834961, -6.151580810546875, 4.180734634399414, -2.23077392578125, 3.3344154357910156, 1.9248580932617188], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000395.npy"}
{"epoch": 0.5971277399848829, "step": 396, "batch_size": 64, "mean": 4.709628582000732, "std": 5.528806686401367, "min": -7.7972564697265625, "p10": -1.8137302398681634, "median": 4.025266647338867, "p90": 12.80703353881836, "max": 18.21807861328125, "pos_frac": 0.796875, "sample": [15.162109375, 18.21807861328125, 8.675430297851562, 15.6435546875, 12.506479263305664, 13.682596206665039, 3.095134735107422, 6.719675064086914, 1.899444580078125, 2.144510269165039, 5.2599029541015625, 1.2656974792480469, 8.389190673828125, -2.093475341796875, 2.273162841796875, 0.3786754608154297, 10.330810546875, 9.390933990478516, -0.5925140380859375, 6.010334014892578, 0.7238311767578125, -0.6807441711425781, 16.957847595214844, 5.42254638671875, 3.9163970947265625, 8.541587829589844, 3.949665069580078, -7.7972564697265625, 6.781242370605469, 3.5837974548339844, 2.960062026977539, -3.3718338012695312, 1.7510128021240234, 13.194168090820312, 5.294342041015625, 6.301658630371094, 4.395097732543945, 4.424110412597656, 12.818992614746094, 11.90679931640625, -2.078815460205078, 5.639072418212891, -0.843780517578125, 4.100868225097656, 5.501899719238281, 1.7879180908203125, 3.65057373046875, -0.8716964721679688, 6.3809661865234375, -2.929576873779297, 3.216583251953125, -3.3840065002441406, 3.1769332885742188, -0.03271484375, -1.1951980590820312, 9.552928924560547, 4.574605941772461, 9.691686630249023, 1.522857666015625, 1.6944770812988281, -4.4524383544921875, 4.113290786743164, 0.387603759765625, 12.779129028320312], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000396.npy"}
{"epoch": 0.5986394557823129, "step": 397, "batch_size": 64, "mean": 2.719043254852295, "std": 6.357906341552734, "min": -19.365829467773438, "p10": -4.412015724182128, "median": 3.0244789123535156, "p90": 9.593829345703126, "max": 16.318592071533203, "pos_frac": 0.71875, "sample": [0.052799224853515625, 2.3833389282226562, -6.787839889526367, 2.771514892578125, -3.371488571166992, 3.9133453369140625, 8.028980255126953, 1.8959884643554688, 4.5275421142578125, 6.2636871337890625, 1.8140735626220703, 5.972461700439453, 2.2875747680664062, 13.029106140136719, 9.474189758300781, 9.645103454589844, -11.6529541015625, -5.3025665283203125, 2.1579551696777344, -0.9451522827148438, 8.549917221069336, -4.8579559326171875, 3.5599288940429688, 4.6004638671875, 3.7417068481445312, -10.272323608398438, -2.306427001953125, 1.7888221740722656, 9.250587463378906, 8.445854187011719, 3.1028213500976562, 1.8742198944091797, 11.735031127929688, 6.054840087890625, 13.118642807006836, 5.996978759765625, 6.054851531982422, 11.284774780273438, 9.060932159423828, 1.4941425323486328, 1.1044540405273438, 9.252922058105469, 7.280364990234375, 2.9618911743164062, -19.365829467773438, 4.357002258300781, 13.82159423828125, -1.519683837890625, -2.5236473083496094, -2.3364715576171875, 4.884452819824219, 3.087066650390625, 1.9107742309570312, 0.9068164825439453, -0.6652259826660156, -0.5326004028320312, 4.497280120849609, 3.2351303100585938, -2.403656005859375, 5.0685882568359375, -9.780208587646484, -3.06451416015625, -0.9117832183837891, 16.318592071533203], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000397.npy"}
{"epoch": 0.600151171579743, "step": 398, "batch_size": 64, "mean": 4.784501075744629, "std": 5.408127307891846, "min": -4.540927886962891, "p10": -1.080573272705078, "median": 4.152196884155273, "p90": 12.837336158752443, "max": 18.03314208984375, "pos_frac": 0.796875, "sample": [3.349874496459961, 5.166851043701172, 6.3577423095703125, 1.5185089111328125, 9.280895233154297, 1.9913978576660156, 15.033317565917969, 10.854461669921875, -2.954303741455078, 16.026039123535156, 8.314613342285156, 5.02862548828125, -1.0221099853515625, -0.5956649780273438, 7.845283508300781, -0.13999176025390625, 8.280487060546875, -3.8053665161132812, 4.69287109375, -2.917888641357422, 5.0426177978515625, 0.7645587921142578, 17.509597778320312, 14.038162231445312, -0.6008872985839844, 12.323253631591797, 14.05632209777832, 5.693172454833984, 5.568187713623047, 2.5099105834960938, -0.4353370666503906, 3.4888038635253906, 0.9831695556640625, 9.907981872558594, 5.6181488037109375, -2.871339797973633, -0.1131439208984375, 11.414878845214844, 0.405914306640625, 4.272327423095703, -1.4001693725585938, 2.1452102661132812, 7.294891357421875, 1.6134605407714844, 5.509757995605469, 8.216087341308594, 2.133878707885742, -1.1056289672851562, 1.6724472045898438, 7.846670150756836, 0.09220123291015625, 18.03314208984375, 4.059032440185547, 3.725818634033203, 3.9777984619140625, 0.34442901611328125, 10.579925537109375, -4.540927886962891, 4.245361328125, 0.8391513824462891, 4.665927886962891, 9.339570999145508, 13.057657241821289, 1.980428695678711], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000398.npy"}
{"epoch": 0.6016628873771731, "step": 399, "batch_size": 64, "mean": 5.440232753753662, "std": 6.687586307525635, "min": -5.880687713623047, "p10": -1.8785308837890624, "median": 3.753053665161133, "p90": 14.465604782104492, "max": 27.4808349609375, "pos_frac": 0.828125, "sample": [2.88690185546875, 4.199531555175781, 6.846717834472656, 10.826934814453125, -1.9011993408203125, 10.98760986328125, 9.807212829589844, 3.5436038970947266, -3.0576515197753906, -4.824775695800781, -3.754486083984375, 20.07562255859375, 4.365812301635742, -5.880687713623047, 4.0630645751953125, 3.029998779296875, 3.8019866943359375, 4.4715576171875, 3.432209014892578, 14.329299926757812, -3.840972900390625, 2.39990234375, -3.935150146484375, 7.906623840332031, 20.15478515625, 1.3427276611328125, 4.0144805908203125, 27.4808349609375, -0.43917083740234375, 1.7171611785888672, 2.4967880249023438, -0.4091453552246094, 5.6500396728515625, 1.2443580627441406, 1.9964752197265625, -1.8256378173828125, 2.5371475219726562, 9.545928955078125, 3.2013015747070312, 8.376380920410156, 16.613292694091797, -0.5810184478759766, 12.83160400390625, 1.8358154296875, 11.734668731689453, 4.359806060791016, 5.821086883544922, 3.704120635986328, 0.123565673828125, 1.469085693359375, 12.340999603271484, 14.52402114868164, 14.699615478515625, 3.3152236938476562, 8.123525619506836, 10.378402709960938, 4.403099060058594, 18.63592529296875, 0.5226287841796875, 0.7647895812988281, 14.05218505859375, 1.0048484802246094, 3.1826400756835938, 7.450859069824219], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000399.npy"}
{"epoch": 0.6031746031746031, "step": 400, "batch_size": 64, "mean": 3.521681547164917, "std": 5.58358097076416, "min": -7.1828765869140625, "p10": -2.0527500152587885, "median": 2.288257598876953, "p90": 11.419918060302738, "max": 18.801361083984375, "pos_frac": 0.671875, "sample": [-0.025524139404296875, 2.7378616333007812, 1.356943130493164, 5.6291046142578125, 8.262458801269531, 13.277164459228516, -0.30371856689453125, 7.495880126953125, 6.00762939453125, 14.203866958618164, 3.0466651916503906, -1.6974430084228516, 1.6514701843261719, 2.081541061401367, -0.20243072509765625, 2.684112548828125, 4.22564697265625, 11.78786849975586, 6.329551696777344, 9.495393753051758, -0.0125732421875, 2.1005325317382812, 9.934280395507812, -2.2006797790527344, 0.49261474609375, 7.964099884033203, -0.07913398742675781, 4.993629455566406, 1.6447868347167969, -0.3247241973876953, -7.1828765869140625, -0.9326267242431641, 5.4869537353515625, 1.4073410034179688, 18.801361083984375, 0.17471694946289062, -3.67071533203125, -1.0534934997558594, 9.564117431640625, 3.4331092834472656, 10.56136703491211, 9.023078918457031, -6.4453582763671875, 3.438323974609375, 3.6951217651367188, 1.6896247863769531, 2.0253143310546875, -1.70758056640625, 5.1232757568359375, 16.288108825683594, -0.07162284851074219, 2.6298141479492188, -2.2205753326416016, 2.475982666015625, -3.708250045776367, 16.194801330566406, 14.378772735595703, -0.7806854248046875, -0.46837615966796875, 3.29498291015625, -4.78619384765625, 0.6605033874511719, -0.9966506958007812, 6.509086608886719], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000400.npy"}
{"epoch": 0.6046863189720333, "step": 401, "batch_size": 64, "mean": 5.8337602615356445, "std": 6.222314357757568, "min": -4.887847900390625, "p10": -1.4247287750244138, "median": 4.89085578918457, "p90": 13.813796997070312, "max": 22.94500732421875, "pos_frac": 0.796875, "sample": [-2.2047882080078125, 9.175071716308594, 9.681304931640625, 4.7997283935546875, -1.8177204132080078, 0.31813621520996094, 3.5022735595703125, 5.091730117797852, 0.6542816162109375, -1.1909103393554688, 13.043373107910156, 0.30016326904296875, -4.0789794921875, -0.01265716552734375, 5.718727111816406, -0.45514678955078125, 13.828948974609375, 13.7784423828125, 16.742910385131836, 13.556655883789062, 2.6531448364257812, 4.697502136230469, 0.8969345092773438, 7.025726318359375, -2.9578704833984375, -4.887847900390625, 4.653564453125, 6.923252105712891, 3.7023391723632812, 6.080078125, 11.744644165039062, 5.519859313964844, 3.7724742889404297, 12.939949035644531, -0.20188522338867188, 12.494590759277344, 0.5380077362060547, 7.100078582763672, 0.7091999053955078, 9.090667724609375, -0.07978439331054688, 15.684621810913086, 9.013473510742188, -1.5249366760253906, 6.0443267822265625, 3.404590606689453, 1.74859619140625, 18.64466094970703, 10.432159423828125, 9.002035140991211, 4.62730598449707, 0.6073112487792969, 4.600269317626953, 2.7866363525390625, -1.6719627380371094, 4.981983184814453, 12.774105072021484, 6.530603408813477, 15.945892333984375, 19.17047882080078, 6.122957229614258, -0.3693733215332031, 9.013744354248047, 22.94500732421875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000401.npy"}
{"epoch": 0.6061980347694633, "step": 402, "batch_size": 64, "mean": 4.412362098693848, "std": 5.899099349975586, "min": -8.3387451171875, "p10": -2.3283670425415033, "median": 4.312613487243652, "p90": 12.321200561523442, "max": 17.95720672607422, "pos_frac": 0.84375, "sample": [9.689050674438477, 12.729026794433594, 6.507289886474609, 8.22150993347168, 6.0221099853515625, 6.940469741821289, 17.95720672607422, -0.435302734375, 5.4290008544921875, 14.122642517089844, 2.0821170806884766, 15.105178833007812, 4.203123092651367, -6.956748962402344, 8.845108032226562, 3.647663116455078, 1.4393768310546875, 4.923162460327148, 1.5838050842285156, 4.4221038818359375, 16.07787322998047, 0.2723579406738281, 1.2097129821777344, -6.5995941162109375, 9.62481689453125, 0.10591697692871094, 4.858371734619141, 11.062820434570312, 0.5478363037109375, 1.2534027099609375, 15.120834350585938, 16.729915618896484, 3.4649429321289062, 0.02550506591796875, 4.828521728515625, 0.09444427490234375, 9.094799041748047, 11.369606018066406, 5.366943359375, 2.557720184326172, -2.540464401245117, -4.317440032958984, -8.3387451171875, 8.999353408813477, 6.772552490234375, 0.10512351989746094, 2.1812496185302734, 0.05141448974609375, 0.6429061889648438, -0.07219314575195312, 7.907419204711914, 7.2135009765625, 7.354503631591797, 9.050987243652344, 1.4109344482421875, 0.07199478149414062, 2.0971832275390625, -5.536201477050781, -5.4801483154296875, 6.1105499267578125, 4.4504241943359375, 8.921470642089844, -1.8334732055664062, 3.6236190795898438], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000402.npy"}
{"epoch": 0.6077097505668935, "step": 403, "batch_size": 64, "mean": 6.0127716064453125, "std": 6.598952293395996, "min": -8.99435806274414, "p10": -2.1552721023559567, "median": 5.8628082275390625, "p90": 14.91170825958252, "max": 22.676300048828125, "pos_frac": 0.828125, "sample": [-1.60284423828125, 5.522552490234375, 11.359222412109375, 6.511466979980469, 13.93095588684082, 2.3409271240234375, 20.881088256835938, 7.5127716064453125, 2.3883533477783203, 7.881343841552734, 12.540618896484375, -3.66510009765625, 14.923238754272461, 1.5279884338378906, 17.15550422668457, 3.9498252868652344, 3.8349533081054688, 2.345897674560547, 3.9152679443359375, 1.6507072448730469, -1.9515972137451172, -3.8937034606933594, 22.676300048828125, 16.64533233642578, 0.0679931640625, 10.31906509399414, 11.1517333984375, 1.2733039855957031, 9.405620574951172, 5.4004058837890625, 0.7810554504394531, 3.85260009765625, 7.08587646484375, 6.4611053466796875, 6.20306396484375, -3.0678749084472656, 3.053619384765625, 18.228256225585938, 2.5468921661376953, 6.331817626953125, 2.88446044921875, 1.0504112243652344, 14.884803771972656, 4.802967071533203, 8.973512649536133, -0.03125, 7.039058685302734, 10.402992248535156, 8.952590942382812, 3.449615478515625, 11.800079345703125, 11.238410949707031, -8.99435806274414, -3.8770294189453125, 9.392730712890625, -6.308357238769531, 16.239418029785156, 9.160369873046875, -0.2715873718261719, 11.723663330078125, -2.2425613403320312, 1.4359130859375, 8.229591369628906, 7.406318664550781], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000403.npy"}
{"epoch": 0.6092214663643235, "step": 404, "batch_size": 64, "mean": 4.619820594787598, "std": 6.072515487670898, "min": -8.604904174804688, "p10": -2.7668439865112298, "median": 4.001808166503906, "p90": 12.263872528076178, "max": 17.960697174072266, "pos_frac": 0.75, "sample": [2.173736572265625, 1.227020263671875, 8.613624572753906, 0.8917999267578125, 2.058706283569336, -0.6002655029296875, 11.061500549316406, 0.9700393676757812, -1.962656021118164, -3.6168060302734375, 10.405174255371094, 1.86505126953125, -7.439617156982422, -6.542705535888672, -0.09387588500976562, 8.138290405273438, -1.9097061157226562, -4.915924072265625, 12.7791748046875, 11.00677490234375, 8.393009185791016, 14.774162292480469, 8.0523681640625, 16.45977020263672, -0.69891357421875, 5.9561309814453125, -1.1287841796875, 6.927001953125, 3.463672637939453, 9.744800567626953, 1.7143287658691406, 9.27175521850586, -1.0070838928222656, 10.340227127075195, -0.2136077880859375, 0.7158889770507812, 1.9424362182617188, 7.959449768066406, 3.5632171630859375, 1.5393714904785156, 3.3828887939453125, 4.440399169921875, 7.728214263916016, 5.4345855712890625, 14.380290985107422, 7.783321380615234, 6.6950531005859375, 10.178001403808594, 17.960697174072266, 3.4923019409179688, -0.15337371826171875, 17.936798095703125, 14.5765380859375, 6.3551177978515625, 7.067523956298828, 6.0909271240234375, 8.036590576171875, -4.582420349121094, -3.1114959716796875, -8.604904174804688, 4.50054931640625, 8.857013702392578, 3.0566577911376953, 2.288726806640625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000404.npy"}
{"epoch": 0.6107331821617535, "step": 405, "batch_size": 64, "mean": 4.47020149230957, "std": 6.075931549072266, "min": -9.149261474609375, "p10": -1.9421052932739258, "median": 4.0150251388549805, "p90": 12.587374877929687, "max": 22.673330307006836, "pos_frac": 0.71875, "sample": [10.318367004394531, -0.8983497619628906, -1.9078655242919922, 21.36353302001953, 4.900642395019531, 3.2323532104492188, 12.60498046875, 4.7283935546875, 1.0547904968261719, -9.149261474609375, -0.742584228515625, 1.6179485321044922, -0.9502582550048828, -2.806211471557617, 4.850425720214844, 1.5508918762207031, 15.802337646484375, -0.3280353546142578, -1.7499771118164062, 7.3800201416015625, 10.17007827758789, 11.709564208984375, 3.9326553344726562, 10.723731994628906, -2.126873016357422, 2.9893760681152344, 7.09259033203125, 12.841287612915039, -1.5259666442871094, 7.648580551147461, 1.6870689392089844, -0.19414901733398438, 5.640846252441406, 6.243133544921875, -0.08416748046875, 10.523292541503906, 3.2986412048339844, -3.027050018310547, 5.147254943847656, 10.248756408691406, 2.081186294555664, -1.539102554321289, 4.102210998535156, 4.097394943237305, 1.3759536743164062, 1.3039722442626953, 1.709503173828125, 4.720298767089844, 12.546295166015625, 6.941686630249023, -1.9567794799804688, 6.26007080078125, 8.668891906738281, 2.2734146118164062, 13.587186813354492, 22.673330307006836, -4.5610198974609375, 5.921924591064453, 7.001731872558594, -1.1834640502929688, 0.05631256103515625, 13.678733825683594, 5.658607482910156, -3.1362228393554688], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000405.npy"}
{"epoch": 0.6122448979591837, "step": 406, "batch_size": 64, "mean": 5.058940887451172, "std": 6.1067795753479, "min": -8.71868896484375, "p10": -2.428604125976562, "median": 4.980438232421875, "p90": 13.337067794799808, "max": 18.08575439453125, "pos_frac": 0.75, "sample": [9.917648315429688, 1.6364974975585938, -2.1367149353027344, -3.5047607421875, 5.937736511230469, 11.347557067871094, 13.646411895751953, 12.320266723632812, -5.107212066650391, 14.373336791992188, 7.779146194458008, 7.859870910644531, 3.022052764892578, -0.41500091552734375, 10.121604919433594, 5.75048828125, 12.615264892578125, 2.4700679779052734, 4.21881103515625, 15.367897033691406, 9.716896057128906, 3.4021835327148438, 6.677555084228516, 10.93109130859375, 3.4729843139648438, 9.843597412109375, -2.1008434295654297, 5.7420654296875, -0.06160736083984375, 0.9230079650878906, -0.5333442687988281, -0.5473785400390625, -8.71868896484375, -1.6669960021972656, 8.965835571289062, 9.523826599121094, 1.6490955352783203, 13.84951400756836, -2.553699493408203, 16.873886108398438, 7.32391357421875, -0.18902206420898438, 0.49346160888671875, 18.08575439453125, 0.5709075927734375, 11.482818603515625, -4.126775741577148, 1.9842300415039062, 10.548957824707031, 13.816253662109375, 6.216550827026367, -2.5692291259765625, 11.310806274414062, 2.0424880981445312, 0.7710304260253906, -5.264177322387695, 3.849853515625, 7.9435577392578125, -1.5083808898925781, 2.119720458984375, 7.6756439208984375, 1.973785400390625, 9.853607177734375, 6.7564849853515625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000406.npy"}
{"epoch": 0.6137566137566137, "step": 407, "batch_size": 64, "mean": 3.640390634536743, "std": 6.020927906036377, "min": -13.12591552734375, "p10": -2.103367805480957, "median": 2.5868682861328125, "p90": 10.67284736633301, "max": 24.626235961914062, "pos_frac": 0.71875, "sample": [1.4107437133789062, 1.2657356262207031, 2.9270172119140625, 9.150016784667969, -0.45029449462890625, 10.737285614013672, 12.729263305664062, -0.2190990447998047, -6.969196319580078, 7.640169143676758, 7.447639465332031, 1.7351951599121094, -1.2904510498046875, -4.646003723144531, -3.37139892578125, 2.2467193603515625, 3.5305023193359375, 6.350135803222656, 15.07662582397461, 8.900421142578125, 0.731231689453125, 0.02367401123046875, 6.275806427001953, 4.9147796630859375, 1.903900146484375, 2.1632766723632812, 1.9281234741210938, -0.22414779663085938, 3.7366943359375, 24.626235961914062, 4.329597473144531, -0.519195556640625, 10.408538818359375, 7.278778076171875, 10.293060302734375, 2.0484886169433594, 0.04998779296875, 11.485015869140625, -4.698274612426758, 1.8514175415039062, -1.5486907958984375, 13.856948852539062, 6.028327941894531, 4.336036682128906, 9.711227416992188, 10.522491455078125, 10.445886611938477, 0.1586151123046875, 6.248931884765625, -0.5366668701171875, 1.1818695068359375, -13.12591552734375, -5.842010498046875, 3.460601806640625, -2.1612777709960938, 4.322544097900391, -0.037647247314453125, 12.259830474853516, -1.9682445526123047, -1.1228675842285156, -1.0230865478515625, 3.593994140625, 7.316886901855469, 4.0991973876953125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000407.npy"}
{"epoch": 0.6152683295540439, "step": 408, "batch_size": 64, "mean": 5.4966349601745605, "std": 5.892077445983887, "min": -5.959541320800781, "p10": -1.364710617065429, "median": 4.98210334777832, "p90": 13.560218811035158, "max": 20.50621795654297, "pos_frac": 0.796875, "sample": [2.582174301147461, 12.963945388793945, 8.943283081054688, 6.133670806884766, -0.3886127471923828, -2.9515323638916016, -5.959541320800781, 2.0059738159179688, 8.678268432617188, 2.751619338989258, 6.254642486572266, 11.38848876953125, -0.715972900390625, 0.01806640625, 13.76055908203125, 9.596832275390625, 1.3987350463867188, -0.8224945068359375, 7.938638687133789, 2.07684326171875, 5.495948791503906, -0.8728408813476562, -2.6973953247070312, 7.317567825317383, 16.8245849609375, -3.9320220947265625, 4.646278381347656, 20.50621795654297, 11.013896942138672, 8.758781433105469, 14.044769287109375, -0.4735126495361328, 9.677881240844727, 0.3519420623779297, 5.45556640625, 12.88620376586914, 4.652500152587891, -3.5025787353515625, 16.7821044921875, 3.6481170654296875, 2.2582473754882812, 7.263542175292969, 11.019950866699219, 14.904335021972656, 15.013992309570312, 3.7659549713134766, 10.742290496826172, -1.5755119323730469, 0.8185863494873047, 7.0554962158203125, 9.163436889648438, 5.400127410888672, 13.092758178710938, -3.730998992919922, 4.432373046875, 9.542312622070312, 3.072368621826172, -0.7003040313720703, 5.31170654296875, 1.7012557983398438, 3.8687515258789062, 2.773265838623047, 1.3744888305664062, 8.978609085083008], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000408.npy"}
{"epoch": 0.6167800453514739, "step": 409, "batch_size": 64, "mean": 3.1290056705474854, "std": 6.120492458343506, "min": -10.456497192382812, "p10": -3.0060501098632812, "median": 2.1230850219726562, "p90": 12.924734497070315, "max": 18.702163696289062, "pos_frac": 0.671875, "sample": [-0.13207244873046875, 0.7351474761962891, -1.5073394775390625, 0.8478546142578125, 0.8543624877929688, -0.3865242004394531, 5.488977432250977, -1.9648017883300781, 1.3828239440917969, -4.5415191650390625, -0.0172271728515625, 13.242134094238281, 0.5511016845703125, 18.702163696289062, 6.338893890380859, 2.23504638671875, 8.665565490722656, -2.3246726989746094, -2.4755611419677734, 14.215814590454102, 13.945438385009766, -2.8458175659179688, 5.5511322021484375, 6.74713134765625, 3.906951904296875, -2.730010986328125, 3.17791748046875, 6.6078338623046875, 6.44038200378418, -7.469024658203125, -2.987091064453125, 3.5961036682128906, 6.6864471435546875, -3.0141754150390625, 6.547615051269531, 0.35954856872558594, 12.184135437011719, 13.53973388671875, 1.12640380859375, 2.1565284729003906, -0.5218238830566406, -1.4301834106445312, 0.9558563232421875, -4.247163772583008, 2.6927642822265625, 1.5496826171875, 2.48760986328125, 1.9702739715576172, -10.456497192382812, 13.417243957519531, 2.089641571044922, 6.789527893066406, -0.6750278472900391, 11.669410705566406, -9.62455940246582, 2.5595321655273438, 5.8021240234375, -0.0135345458984375, 5.6266632080078125, 11.73800277709961, 16.504493713378906, -3.7656307220458984, 9.014572143554688, 2.6860294342041016], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000409.npy"}
{"epoch": 0.618291761148904, "step": 410, "batch_size": 64, "mean": 4.1284589767456055, "std": 4.81343412399292, "min": -4.949914932250977, "p10": -1.6196258544921869, "median": 3.05660343170166, "p90": 10.798712539672856, "max": 17.633468627929688, "pos_frac": 0.8125, "sample": [2.528165817260742, 3.7926158905029297, 4.084705352783203, -0.07459449768066406, 14.335624694824219, -4.468994140625, 9.123565673828125, 2.2356910705566406, -1.900848388671875, -4.949914932250977, 7.861976623535156, 2.772899627685547, 0.8356990814208984, -3.1086387634277344, 8.154083251953125, 6.349096298217773, 11.16189193725586, 2.3494415283203125, 7.231929779052734, 12.218360900878906, 12.16265869140625, 2.9030818939208984, -0.96343994140625, 6.981563568115234, 1.7376861572265625, 3.6718063354492188, -2.191253662109375, 3.5103073120117188, 4.2262725830078125, -0.8528366088867188, 6.212989807128906, -0.3379669189453125, 2.1972713470458984, 9.395484924316406, 2.4109649658203125, 17.633468627929688, 8.152816772460938, 0.4001579284667969, 1.9831829071044922, 9.9512939453125, 0.8927001953125, -3.262115478515625, 2.3820552825927734, 0.5451812744140625, 2.7709732055664062, 6.520973205566406, 8.718786239624023, 1.308807373046875, 2.734607696533203, 3.86328125, 4.029144287109375, 3.810821533203125, 14.861434936523438, 7.999870300292969, -0.9233856201171875, 2.79949951171875, 3.210124969482422, 1.0545806884765625, 1.3641128540039062, 9.346431732177734, 11.539176940917969, 5.753318786621094, -2.6268997192382812, 5.809654235839844], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000410.npy"}
{"epoch": 0.6198034769463341, "step": 411, "batch_size": 64, "mean": 4.597599029541016, "std": 7.20441198348999, "min": -10.503021240234375, "p10": -4.414614105224609, "median": 4.398158073425293, "p90": 13.585925292968753, "max": 22.746551513671875, "pos_frac": 0.71875, "sample": [-4.8826904296875, 12.760875701904297, 13.939517974853516, 14.848735809326172, 4.086524963378906, 1.2239913940429688, -6.1415252685546875, -8.905624389648438, 4.9671478271484375, -0.147735595703125, 6.090091705322266, 12.417266845703125, 7.816213607788086, 6.052211761474609, 15.031723022460938, 2.4235687255859375, 6.400995254516602, -2.6098403930664062, 7.6380615234375, 3.3655471801757812, 8.836830139160156, 12.261894226074219, 12.6295166015625, 2.378498077392578, -4.03228759765625, 2.9683074951171875, 0.41050148010253906, 12.368282318115234, -1.1687545776367188, -10.503021240234375, -3.34161376953125, -4.663415908813477, 1.5918502807617188, 8.595710754394531, 3.984783172607422, 6.466571807861328, -0.9875278472900391, 21.35799789428711, -2.2819595336914062, 22.746551513671875, 7.036041259765625, 5.272592544555664, -3.3560638427734375, 10.57232666015625, 9.834991455078125, 15.048650741577148, -4.578468322753906, 5.731590270996094, -0.5673942565917969, -1.5909347534179688, 0.9627799987792969, 0.11530303955078125, 9.664121627807617, 10.228385925292969, 3.339479446411133, 7.51295280456543, 4.70979118347168, 2.121124267578125, 21.818485260009766, 5.90093994140625, -1.513031005859375, 0.9669876098632812, -5.813453674316406, 4.835382461547852], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000411.npy"}
{"epoch": 0.6213151927437641, "step": 412, "batch_size": 64, "mean": 4.628368377685547, "std": 6.041375637054443, "min": -4.604545593261719, "p10": -2.552938652038574, "median": 3.6740989685058594, "p90": 12.696851348876953, "max": 19.088464736938477, "pos_frac": 0.71875, "sample": [-2.642749786376953, 12.722396850585938, 4.8420867919921875, 18.570966720581055, 4.422863006591797, 4.855594635009766, 6.1018829345703125, 7.714773178100586, -4.604545593261719, 1.8590869903564453, 5.828573226928711, 2.2573471069335938, 4.265892028808594, 19.088464736938477, -0.12945556640625, -2.4898910522460938, 7.674762725830078, -2.469900131225586, -1.2846755981445312, -0.5131721496582031, 9.979877471923828, 6.6986541748046875, 5.955078125, 3.0022506713867188, -1.101287841796875, 1.2702484130859375, 6.5469207763671875, 3.2685775756835938, 8.401100158691406, 1.27801513671875, 2.909893035888672, 13.347915649414062, 3.1260910034179688, 18.66351318359375, -2.574249267578125, -0.9064254760742188, 6.8608856201171875, -0.7799034118652344, 8.51530647277832, -0.15636444091796875, -2.503213882446289, -4.05645751953125, 11.235946655273438, -3.728229522705078, 0.9825916290283203, 11.758094787597656, 11.615005493164062, 11.589103698730469, 16.56226348876953, -1.0034637451171875, 4.079620361328125, 7.5332794189453125, 12.90188217163086, -3.723682403564453, 6.745582580566406, 11.111164093017578, 1.7761058807373047, 1.812326431274414, -3.752391815185547, 2.8675537109375, 7.832704544067383, 0.5060329437255859, 12.637245178222656, 1.06011962890625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000412.npy"}
{"epoch": 0.6228269085411943, "step": 413, "batch_size": 64, "mean": 4.234124183654785, "std": 6.867523670196533, "min": -9.736396789550781, "p10": -4.462366485595703, "median": 3.5295047760009766, "p90": 12.329950714111328, "max": 23.432205200195312, "pos_frac": 0.765625, "sample": [0.35271263122558594, -4.2236480712890625, 9.412742614746094, 2.5784645080566406, 0.5554962158203125, 0.49676513671875, 3.35687255859375, 17.788646697998047, -5.875692367553711, 1.701263427734375, -9.736396789550781, 9.102439880371094, 5.625877380371094, -2.230335235595703, -2.5355987548828125, 2.9099044799804688, -0.4292488098144531, 18.396366119384766, 0.305938720703125, 4.561500549316406, 19.784957885742188, 4.648101806640625, 6.2303314208984375, 1.8210639953613281, 7.40087890625, 4.086393356323242, 23.432205200195312, 14.517486572265625, 1.7747268676757812, 5.0273590087890625, 3.5519485473632812, 6.762935638427734, 4.895561218261719, 5.0301361083984375, 16.188194274902344, -6.578521728515625, 3.3842239379882812, 1.802093505859375, 8.923789978027344, 1.2727203369140625, -7.5748138427734375, 10.423126220703125, 8.679954528808594, -1.5326156616210938, 7.3652801513671875, -2.1237716674804688, -8.700431823730469, -0.45665740966796875, 3.0692100524902344, 3.7890357971191406, 1.55279541015625, -5.7601776123046875, 12.297805786132812, -1.0190620422363281, 1.1608657836914062, 5.339103698730469, 3.507061004638672, 12.343727111816406, 5.3037109375, 10.504081726074219, -4.564674377441406, 9.16969108581543, 10.232688903808594, 11.9073486328125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000413.npy"}
{"epoch": 0.6243386243386243, "step": 414, "batch_size": 64, "mean": 4.865261077880859, "std": 6.436190128326416, "min": -10.275070190429688, "p10": -1.767473983764648, "median": 4.221588134765625, "p90": 13.888067626953125, "max": 22.418067932128906, "pos_frac": 0.78125, "sample": [10.133077621459961, -2.059955596923828, 7.666114807128906, 12.491943359375, -0.9530563354492188, 6.414388656616211, -0.37249755859375, 13.489482879638672, 4.269828796386719, 1.212890625, 5.88067626953125, 5.232597351074219, 6.929237365722656, -4.810935974121094, 13.728469848632812, 6.203273773193359, -7.076713562011719, -5.828216552734375, 2.052490234375, 3.7474517822265625, 7.730836868286133, 1.7668533325195312, 19.97112274169922, 6.830894470214844, 14.878812789916992, -0.6824493408203125, 7.630657196044922, 8.157127380371094, 3.9490814208984375, 7.674957275390625, 14.77151107788086, 1.6147308349609375, 7.588253021240234, -0.302825927734375, 11.256851196289062, 3.4945621490478516, 11.225940704345703, 3.5436859130859375, 5.98968505859375, 11.199516296386719, 4.173347473144531, -3.251861572265625, 0.4708080291748047, -0.023014068603515625, 14.231468200683594, -1.2812652587890625, 0.6868820190429688, 6.98524284362793, -10.275070190429688, 0.27278900146484375, 7.06536865234375, 0.2854499816894531, -1.9758491516113281, 0.9625320434570312, 16.502029418945312, 22.418067932128906, 1.2430496215820312, 0.08172607421875, 7.462703704833984, 4.347906112670898, 1.2800140380859375, -1.2600154876708984, 0.3775787353515625, 13.956466674804688], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000414.npy"}
{"epoch": 0.6258503401360545, "step": 415, "batch_size": 64, "mean": 4.514373779296875, "std": 6.699206352233887, "min": -11.771240234375, "p10": -3.435853767395019, "median": 3.505978584289551, "p90": 14.004625129699708, "max": 20.905548095703125, "pos_frac": 0.796875, "sample": [-0.5706233978271484, 10.576149940490723, 8.053098678588867, 6.420026779174805, 2.2107276916503906, 3.448760986328125, 8.496337890625, 14.99819564819336, 0.1147003173828125, 10.708274841308594, 4.9324951171875, -8.53253173828125, 8.584877014160156, 10.971441268920898, -0.8907318115234375, 16.478260040283203, 20.905548095703125, 3.335357666015625, 13.000381469726562, -0.6524314880371094, 1.1944046020507812, 4.3817291259765625, 2.8458404541015625, 0.18372535705566406, 5.857292175292969, 17.73290252685547, -7.368980407714844, 2.7705535888671875, 7.7286529541015625, 7.98760986328125, -4.5446929931640625, 7.077121734619141, 3.5631961822509766, -1.897705078125, -8.484725952148438, 18.66522216796875, 8.394172668457031, 13.84553337097168, 9.578353881835938, -3.8630828857421875, 3.0575332641601562, -3.6907806396484375, -2.841024398803711, -11.771240234375, 4.788150787353516, 8.634475708007812, 1.1633415222167969, 1.6256179809570312, 2.550128936767578, 0.7110881805419922, 0.23190879821777344, 3.2421951293945312, 1.9213104248046875, 1.4601173400878906, 3.6941909790039062, 5.072471618652344, -0.24752235412597656, 6.386222839355469, 5.179656982421875, 0.04210662841796875, 1.597076416015625, 8.44875717163086, 15.355875015258789, 14.072807312011719], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000415.npy"}
{"epoch": 0.6273620559334845, "step": 416, "batch_size": 64, "mean": 3.758429527282715, "std": 6.401620864868164, "min": -9.774234771728516, "p10": -3.4034515380859363, "median": 3.333589553833008, "p90": 11.697997093200685, "max": 22.457544326782227, "pos_frac": 0.71875, "sample": [1.2812614440917969, -9.774234771728516, -1.7689552307128906, 9.870346069335938, 0.9608535766601562, 13.570846557617188, -2.3158111572265625, -2.304046630859375, 8.689250946044922, 13.497459411621094, 2.41436767578125, 18.359458923339844, 18.5341796875, 1.8228836059570312, 1.1045341491699219, 5.138767242431641, 2.6791934967041016, 5.042070388793945, -7.9893646240234375, -6.239959716796875, 5.616016387939453, 6.395830154418945, 4.445337295532227, 6.790210723876953, -0.9523506164550781, 9.243408203125, 2.6357421875, 22.457544326782227, 13.30026626586914, 6.891899108886719, 3.4702796936035156, 3.662487030029297, 2.30133056640625, -8.437524795532227, 2.672880172729492, -0.14214324951171875, -0.34918212890625, -6.853734970092773, 4.23649787902832, 2.4441146850585938, -3.8695831298828125, 6.8343963623046875, -1.3862762451171875, 1.436126708984375, -0.078125, -0.3326454162597656, 11.051286697387695, -7.638946533203125, 1.1274795532226562, 8.296051025390625, 3.3035240173339844, 3.4902172088623047, -1.8922920227050781, 8.179672241210938, -0.9669456481933594, 3.332660675048828, 3.3345184326171875, 8.29388427734375, 5.466808319091797, 6.294467926025391, 7.096160888671875, 9.383056640625, 11.97515869140625, 5.406839370727539], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000416.npy"}
{"epoch": 0.6288737717309146, "step": 417, "batch_size": 64, "mean": 4.888284683227539, "std": 6.2287750244140625, "min": -11.219200134277344, "p10": -1.6297645568847656, "median": 4.616708755493164, "p90": 12.028039550781251, "max": 21.295143127441406, "pos_frac": 0.78125, "sample": [2.5379085540771484, -1.581003189086914, -0.8943328857421875, -11.219200134277344, 12.43231201171875, 9.187995910644531, -9.600936889648438, 12.140342712402344, 4.455440521240234, 6.205631256103516, 16.982269287109375, 2.1950225830078125, 5.811302185058594, 7.085180282592773, 7.51025390625, -6.064825057983398, 13.171669006347656, 4.387290954589844, 0.6349334716796875, 11.509668350219727, 8.039932250976562, 6.750244140625, 4.29774284362793, 1.4069747924804688, 3.9746551513671875, -1.543731689453125, 10.342216491699219, 5.7805328369140625, 1.4139289855957031, 5.880516052246094, 1.7769603729248047, 16.996604919433594, 15.448169708251953, -1.6247711181640625, -0.14190673828125, -1.6319046020507812, -2.2628021240234375, 1.6909751892089844, 11.765998840332031, 4.777976989746094, 11.601802825927734, 3.1467514038085938, 10.716426849365234, 5.064872741699219, 0.15407562255859375, 9.398712158203125, 8.055948257446289, 4.275321960449219, -1.3814125061035156, 6.792503356933594, 1.790435791015625, 4.152057647705078, 1.7431106567382812, -6.4349822998046875, 8.58847427368164, -1.7708740234375, 7.095310211181641, 10.033084869384766, 2.810657501220703, -1.1106243133544922, 10.971073150634766, 6.044626235961914, 9.792488098144531, 21.295143127441406], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000417.npy"}
{"epoch": 0.6303854875283447, "step": 418, "batch_size": 64, "mean": 3.1133694648742676, "std": 7.052056312561035, "min": -10.422821044921875, "p10": -4.924216461181641, "median": 2.6274185180664062, "p90": 10.638269042968751, "max": 29.354476928710938, "pos_frac": 0.734375, "sample": [2.0479736328125, 9.86338996887207, 6.283515930175781, 4.520416259765625, 1.259847640991211, 5.806427001953125, 1.8734970092773438, 8.270912170410156, 17.59096908569336, -4.290367126464844, 10.813480377197266, 18.230499267578125, -2.3399429321289062, 5.18084716796875, 5.4879913330078125, -5.0306854248046875, 6.4161224365234375, 0.9635658264160156, 13.761425018310547, 0.29914093017578125, 0.4609050750732422, 29.354476928710938, 3.4776229858398438, 2.7365341186523438, 9.106086730957031, 2.202117919921875, -2.7088623046875, 2.5183029174804688, 5.171764373779297, 3.802459716796875, -9.22027587890625, -10.422821044921875, 4.8687744140625, 5.266155242919922, 18.07439422607422, 0.08675384521484375, 0.2647895812988281, -3.0950241088867188, 5.295982360839844, 1.3759880065917969, -10.183547973632812, -7.306571960449219, -8.306547164916992, -0.5946006774902344, 1.606231689453125, -2.88372802734375, 5.526649475097656, 3.7462310791015625, -4.042314529418945, 1.1102371215820312, 0.9554824829101562, -2.8950653076171875, 10.968826293945312, 6.1047821044921875, 6.568458557128906, -5.47357177734375, -4.675788879394531, 6.975242614746094, 4.90565299987793, 10.229442596435547, 5.806221008300781, -0.5314903259277344, 5.439453125, 0.580810546875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000418.npy"}
{"epoch": 0.6318972033257747, "step": 419, "batch_size": 64, "mean": 5.145925998687744, "std": 6.328725814819336, "min": -7.275608062744141, "p10": -1.9712837219238277, "median": 3.854949951171875, "p90": 13.771473693847657, "max": 24.68333625793457, "pos_frac": 0.796875, "sample": [0.7478713989257812, 13.888381958007812, 8.953353881835938, 0.8690071105957031, -5.217683792114258, 3.9883460998535156, 13.7381591796875, 3.2580337524414062, -2.1040878295898438, 1.7109794616699219, -5.058494567871094, 11.892902374267578, 9.919570922851562, 4.7284393310546875, -1.3196334838867188, -0.678314208984375, 8.02734375, 2.495269775390625, 11.71173095703125, -1.661407470703125, 2.677398681640625, 5.7659912109375, -2.7710227966308594, 8.04754638671875, 3.217864990234375, 0.46428680419921875, 10.280952453613281, -4.5093994140625, 5.8931427001953125, 14.568412780761719, -0.6693687438964844, -1.2086715698242188, -3.392608642578125, 13.195625305175781, 18.216995239257812, 7.737865447998047, 9.888225555419922, 3.1047019958496094, 3.8586692810058594, 6.07928466796875, 4.828697204589844, 14.509077072143555, 2.2871017456054688, 1.9241256713867188, 13.474361419677734, -0.6759490966796875, 10.922103881835938, 3.8512306213378906, 1.7097892761230469, 24.68333625793457, 13.785751342773438, 0.6392002105712891, 4.396087646484375, 3.5680160522460938, 7.8227996826171875, 0.9722995758056641, 7.4951934814453125, 6.183353424072266, 15.944168090820312, 0.459320068359375, 3.354990005493164, 3.2261199951171875, -7.275608062744141, 10.918037414550781], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000419.npy"}
{"epoch": 0.6334089191232048, "step": 420, "batch_size": 64, "mean": 3.1102042198181152, "std": 7.267636299133301, "min": -15.336563110351562, "p10": -4.603804779052734, "median": 2.4683055877685547, "p90": 12.244610786437988, "max": 24.385826110839844, "pos_frac": 0.6875, "sample": [0.5354232788085938, -9.059551239013672, 15.954622268676758, -0.2285003662109375, 16.865150451660156, -5.881095886230469, 6.4526824951171875, 2.0335960388183594, 2.7558212280273438, -11.851163864135742, -0.8254318237304688, 6.143501281738281, 0.6486740112304688, 7.125526428222656, -3.1768035888671875, 7.996498107910156, 16.57269287109375, 5.893772125244141, 18.944690704345703, 8.700141906738281, 2.2567691802978516, -2.0475502014160156, 0.49320030212402344, 6.019737243652344, 2.481426239013672, 13.748149871826172, -3.782909393310547, -7.109472274780273, 2.486543655395508, 1.987234115600586, -6.2970428466796875, 3.3310394287109375, 0.9269638061523438, 1.2533988952636719, -3.1090316772460938, -3.087596893310547, -1.8408203125, -4.0852508544921875, 7.34320068359375, 5.238380432128906, -1.198568344116211, -15.336563110351562, -3.694578170776367, 2.4551849365234375, 12.092544555664062, 3.384462356567383, 4.160636901855469, 12.309782028198242, 3.3655853271484375, 6.331085205078125, 8.899726867675781, 5.044002532958984, 7.765041351318359, 0.27149009704589844, 1.9002037048339844, 6.096345901489258, 11.929779052734375, -4.826042175292969, 8.212615966796875, 7.912147521972656, 1.838531494140625, 24.385826110839844, -2.694080352783203, -3.358692169189453], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000420.npy"}
{"epoch": 0.6349206349206349, "step": 421, "batch_size": 64, "mean": 3.682920455932617, "std": 7.70320987701416, "min": -15.998140335083008, "p10": -5.6991634368896475, "median": 3.2135725021362305, "p90": 12.709754943847656, "max": 25.182891845703125, "pos_frac": 0.671875, "sample": [-7.343803405761719, 10.0322265625, -1.0568046569824219, 7.92201042175293, 6.684814453125, 8.946338653564453, 10.239707946777344, -1.5669631958007812, -0.3990058898925781, 25.182891845703125, 18.135452270507812, -6.103973388671875, 12.799476623535156, 5.667823791503906, -3.934497833251953, 2.4685802459716797, 3.01422119140625, -3.7338180541992188, 13.357841491699219, 4.9432525634765625, 12.036392211914062, 1.42816162109375, 1.6950931549072266, -0.717010498046875, 4.308526992797852, -3.461996078491211, 9.433868408203125, -15.998140335083008, -8.106056213378906, -5.225574493408203, 3.412923812866211, 18.990619659423828, 12.383377075195312, 0.5020923614501953, 3.5746307373046875, 11.7989501953125, 14.33270263671875, 9.099189758300781, -0.4203033447265625, 0.6130943298339844, 15.758621215820312, 0.8326129913330078, 1.8356742858886719, -2.2111053466796875, -6.680824279785156, 6.543985366821289, -0.06276702880859375, 2.3106460571289062, -12.4114990234375, 1.9228439331054688, 10.963348388671875, 10.704086303710938, -0.05503082275390625, 6.138145446777344, 1.0570220947265625, -5.902130126953125, 5.6806793212890625, 3.600177764892578, 5.179145812988281, -5.115827560424805, 8.199859619140625, 12.500404357910156, 4.890241622924805, -4.907718658447266], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000421.npy"}
{"epoch": 0.636432350718065, "step": 422, "batch_size": 64, "mean": 4.617974758148193, "std": 6.379191875457764, "min": -7.522979736328125, "p10": -2.4678241729736325, "median": 4.3279571533203125, "p90": 12.79168930053711, "max": 23.912063598632812, "pos_frac": 0.765625, "sample": [12.852096557617188, 17.16485595703125, -5.83662223815918, 0.176605224609375, 9.140914916992188, -2.13372802734375, 4.61798095703125, -6.7753753662109375, 4.204959869384766, 23.912063598632812, 6.12030029296875, 4.450954437255859, -1.3349380493164062, 4.993255615234375, 15.971046447753906, 0.7300148010253906, 8.18014144897461, 10.410289764404297, 2.9158477783203125, 4.669364929199219, 5.1970367431640625, -0.1104888916015625, -2.2247238159179688, 3.914562225341797, 2.9403839111328125, 9.73785400390625, 7.618671417236328, 7.871492385864258, 18.8016357421875, 4.693656921386719, 3.7339630126953125, 9.618366241455078, 6.183338165283203, -2.855203628540039, 2.051023483276367, 3.818164825439453, 12.707901000976562, 1.3691425323486328, 12.827598571777344, 0.7876129150390625, -1.78448486328125, -2.561870574951172, 8.25921630859375, 10.00339126586914, 11.473251342773438, -5.931083679199219, -2.248382568359375, 6.561622619628906, 0.22161102294921875, 6.152448654174805, -1.3274917602539062, 17.740829467773438, 1.1226654052734375, -0.8011856079101562, 1.0659523010253906, 4.698516845703125, 0.9765663146972656, -7.522979736328125, 2.9866085052490234, 9.190528869628906, 7.473335266113281, 4.176321029663086, -3.162261962890625, 5.675233840942383], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000422.npy"}
{"epoch": 0.6379440665154951, "step": 423, "batch_size": 64, "mean": 5.024381637573242, "std": 6.860838413238525, "min": -9.497261047363281, "p10": -2.4427516937255858, "median": 4.380426406860352, "p90": 13.528973388671877, "max": 22.29385757446289, "pos_frac": 0.765625, "sample": [2.8022537231445312, -2.9043502807617188, 4.9665374755859375, -0.9411640167236328, 22.29385757446289, 0.4354209899902344, 2.6658782958984375, -9.497261047363281, 20.535030364990234, 0.790313720703125, -2.55633544921875, 10.794109344482422, 1.7758941650390625, -2.177722930908203, 3.2128372192382812, -0.13286590576171875, 0.03823089599609375, -3.6028594970703125, 13.708114624023438, -3.7425079345703125, 4.980791091918945, 5.238670349121094, -5.853111267089844, -0.5595226287841797, 0.33026885986328125, 13.110977172851562, 12.343856811523438, 0.514404296875, -8.514564514160156, 2.6337966918945312, 12.082794189453125, 18.517349243164062, 6.81671142578125, 7.2642364501953125, 8.120561599731445, 6.81390380859375, 10.9266357421875, 10.425033569335938, 20.693748474121094, 12.891265869140625, 4.236118316650391, 2.908275604248047, 0.54595947265625, 5.6601715087890625, 8.133415222167969, 10.603981018066406, 11.875801086425781, 2.684661865234375, 0.4725799560546875, 5.446651458740234, 13.74932861328125, 5.266456604003906, 2.8645877838134766, 5.122028350830078, 5.364097595214844, 16.771469116210938, 4.5247344970703125, -1.3897171020507812, -1.9391899108886719, 9.029190063476562, 10.471519470214844, -0.35659027099609375, 2.2904605865478516, -0.01674652099609375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000423.npy"}
{"epoch": 0.6394557823129252, "step": 424, "batch_size": 64, "mean": 5.203647136688232, "std": 6.602239608764648, "min": -6.5256195068359375, "p10": -2.2983375549316403, "median": 3.907154083251953, "p90": 14.85458087921143, "max": 21.849884033203125, "pos_frac": 0.75, "sample": [-2.1351394653320312, 12.735885620117188, 2.0707626342773438, 10.363138198852539, 1.5071220397949219, 10.134790420532227, 21.849884033203125, 7.282318115234375, -5.558963775634766, -6.5256195068359375, -2.4586029052734375, 2.735044479370117, 2.318918228149414, -5.325736999511719, 15.314584732055664, 3.28399658203125, 11.713211059570312, -2.1684494018554688, 1.8118972778320312, 1.2378387451171875, -0.14982032775878906, 7.904869079589844, 9.081703186035156, 1.1632232666015625, -2.35400390625, 4.714630126953125, 16.293975830078125, -0.1328277587890625, 19.63390350341797, 3.416807174682617, 8.433067321777344, 6.7884674072265625, -1.4605712890625, 2.612062454223633, 10.662853240966797, -0.9086151123046875, 5.077415466308594, 17.898101806640625, 8.286514282226562, 6.129341125488281, 1.5384254455566406, 7.695156097412109, 2.964691162109375, 15.771583557128906, 3.4728927612304688, 18.765222549438477, 9.671455383300781, 0.5895175933837891, -2.4125747680664062, -0.012979507446289062, 9.942752838134766, 4.3414154052734375, 6.093719482421875, 12.217222213745117, -0.6831588745117188, -0.767669677734375, 5.6708984375, -3.6409988403320312, 13.735702514648438, 13.781238555908203, 0.24927520751953125, 5.796966552734375, 0.4857940673828125, 4.488899230957031], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000424.npy"}
{"epoch": 0.6409674981103552, "step": 425, "batch_size": 64, "mean": 4.442198753356934, "std": 6.081453800201416, "min": -7.496726989746094, "p10": -3.432643127441406, "median": 3.444345474243164, "p90": 12.62596130371094, "max": 22.73067855834961, "pos_frac": 0.796875, "sample": [11.222034454345703, 6.3984527587890625, 2.3815078735351562, 3.2893524169921875, 13.967109680175781, -0.6503829956054688, 16.83867645263672, 1.5082588195800781, 5.147335052490234, 3.003101348876953, 4.59161376953125, -3.8200416564941406, 5.933738708496094, 1.18511962890625, -3.8487586975097656, 6.391944885253906, -1.4215240478515625, 12.00247573852539, 2.0175704956054688, 3.5993385314941406, -3.6776485443115234, 5.3873138427734375, 1.3125419616699219, 1.8427314758300781, 7.9651947021484375, 0.08272552490234375, 3.2011051177978516, 15.295745849609375, -0.06624031066894531, 22.73067855834961, 10.35040283203125, 0.18152999877929688, 5.40594482421875, 1.3772087097167969, 6.482635498046875, 2.5504684448242188, 0.676116943359375, 17.578367233276367, 6.985065460205078, 1.9783897399902344, 10.022445678710938, -7.496726989746094, 4.427764892578125, 3.7432861328125, 4.904632568359375, -0.6960849761962891, 3.74664306640625, 1.9160499572753906, 8.884292602539062, 3.8679847717285156, 10.032341003417969, 2.081493377685547, -5.04095458984375, 11.108169555664062, -3.6038284301757812, 1.7317581176757812, 12.893169403076172, 9.115631103515625, -2.982250213623047, 10.613174438476562, -3.0332107543945312, 2.9870681762695312, 13.920989990234375, -6.2203369140625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000425.npy"}
{"epoch": 0.6424792139077853, "step": 426, "batch_size": 64, "mean": 2.4346063137054443, "std": 6.189519882202148, "min": -17.822906494140625, "p10": -3.675598907470703, "median": 2.3598880767822266, "p90": 10.070692443847657, "max": 19.783485412597656, "pos_frac": 0.671875, "sample": [0.6345481872558594, 0.053569793701171875, 5.861616134643555, 10.831855773925781, 5.0786895751953125, 9.611766815185547, -0.3075103759765625, 19.783485412597656, 2.427154541015625, 4.0293426513671875, -5.342708587646484, -0.9194259643554688, 3.7764835357666016, -4.6383209228515625, -0.868804931640625, 3.613931655883789, 5.514118194580078, -3.3349609375, 2.644197463989258, -3.0416793823242188, -3.8215866088867188, 2.258941650390625, -0.40374755859375, 0.335540771484375, -3.1206893920898438, 1.4255523681640625, 3.428171157836914, -17.822906494140625, 6.865007400512695, 4.548133850097656, 6.793182373046875, 10.194320678710938, 5.680229187011719, -0.9120578765869141, -9.466476440429688, 2.9249534606933594, -2.6511688232421875, 2.292621612548828, 0.0903778076171875, 9.7822265625, 16.19307518005371, 13.053817749023438, -7.752437591552734, -6.273529052734375, -2.021739959716797, 15.580249786376953, 1.5515289306640625, 1.3965473175048828, -1.5088310241699219, -0.010711669921875, 3.433870315551758, 7.778205871582031, 0.7166595458984375, 9.037033081054688, 2.5734710693359375, -2.4833602905273438, 4.9093780517578125, 4.674934387207031, 3.0088043212890625, 0.8183135986328125, 6.339996337890625, 3.0303573608398438, 11.142520904541016, -3.201326370239258], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000426.npy"}
{"epoch": 0.6439909297052154, "step": 427, "batch_size": 64, "mean": 4.304543972015381, "std": 6.597926616668701, "min": -10.030353546142578, "p10": -2.9115821838378904, "median": 4.209512710571289, "p90": 12.584520149230958, "max": 26.115631103515625, "pos_frac": 0.734375, "sample": [6.759967803955078, 0.27721595764160156, -1.4867610931396484, -5.732429504394531, -2.01531982421875, 8.30622673034668, 10.443880081176758, -2.5286216735839844, 10.737464904785156, -3.3186683654785156, 12.545738220214844, -2.2607994079589844, -6.132081985473633, -2.6763458251953125, 5.8633880615234375, 7.74017333984375, 3.2982101440429688, 5.872802734375, 0.5373916625976562, 1.917449951171875, -0.6703395843505859, 4.290256500244141, 9.90692138671875, 6.697998046875, 3.56756591796875, 8.160932540893555, 4.080730438232422, 9.358596801757812, 1.5054187774658203, 4.1287689208984375, 12.601140975952148, 5.0703582763671875, 6.708648681640625, 2.3067855834960938, 5.499626159667969, 14.053699493408203, -0.616485595703125, -10.030353546142578, 17.23657989501953, 9.662059783935547, 1.4198684692382812, 0.0239715576171875, 6.642486572265625, -3.0123977661132812, -1.7154006958007812, -4.225149154663086, 7.550777435302734, 9.086151123046875, 2.767791748046875, -7.853595733642578, 13.66413688659668, 14.580963134765625, 1.115753173828125, 5.804603576660156, 5.616912841796875, 2.675506591796875, 4.7163238525390625, 5.263298034667969, 6.763458251953125, -0.5288848876953125, 0.3440284729003906, -2.126941680908203, 26.115631103515625, 19.13372802734375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000427.npy"}
{"epoch": 0.6455026455026455, "step": 428, "batch_size": 64, "mean": 5.411052703857422, "std": 6.55393648147583, "min": -7.4010162353515625, "p10": -1.4718910217285157, "median": 3.6081371307373047, "p90": 14.874644279479982, "max": 21.06317138671875, "pos_frac": 0.765625, "sample": [-7.4010162353515625, 11.595970153808594, 11.099411010742188, -1.4495887756347656, 0.16105079650878906, 2.2705154418945312, -2.3392601013183594, -2.91668701171875, -6.501747131347656, 2.2913131713867188, 15.489330291748047, -1.6766281127929688, 2.2789306640625, -0.6071014404296875, 11.790031433105469, 7.0145263671875, 17.4090576171875, 1.8803176879882812, -0.7563400268554688, 2.4629364013671875, 9.061958312988281, 3.7521324157714844, 0.16159820556640625, 6.913417816162109, 11.103080749511719, 4.499839782714844, 3.464141845703125, -4.278045654296875, 9.080245971679688, -1.3855438232421875, 9.802947998046875, -0.5559005737304688, 1.1307735443115234, 5.475673675537109, 6.151954650878906, 17.618911743164062, 13.555740356445312, 15.136734008789062, 2.964641571044922, -0.8008632659912109, 11.162460327148438, 21.06317138671875, 8.386333465576172, 11.098655700683594, 3.3458404541015625, 0.27043914794921875, 10.460479736328125, 2.6564788818359375, 1.170328140258789, -0.265289306640625, 1.9421501159667969, 4.418373107910156, 0.34459686279296875, 0.6246814727783203, 14.263101577758789, 12.299205780029297, 16.16107177734375, 11.542081832885742, 8.691307067871094, 7.506690979003906, 18.20562744140625, 7.872100830078125, -0.37952423095703125, -1.4814491271972656], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000428.npy"}
{"epoch": 0.6470143613000756, "step": 429, "batch_size": 64, "mean": 4.633237838745117, "std": 6.435501575469971, "min": -11.465652465820312, "p10": -3.5226341247558595, "median": 4.540027618408203, "p90": 12.704315185546877, "max": 23.030624389648438, "pos_frac": 0.796875, "sample": [8.949440002441406, -3.2780914306640625, 0.5054798126220703, 3.8039321899414062, 5.848945617675781, 3.588979721069336, 4.756736755371094, 11.830154418945312, -2.931354522705078, 3.0744781494140625, 3.5678634643554688, -3.53094482421875, 2.579845428466797, 6.325050354003906, 1.3597412109375, 5.119361877441406, -11.465652465820312, 2.9187088012695312, 5.1840667724609375, 5.3128204345703125, 10.5577392578125, 0.22611236572265625, 6.050689697265625, 8.575422286987305, 12.813079833984375, 13.702919006347656, 15.425939559936523, 5.919887542724609, 9.58026123046875, -6.331695556640625, -1.5754013061523438, -5.519481658935547, 0.1689453125, -5.963768005371094, 12.42576789855957, 2.904775619506836, -0.2502899169921875, 12.450531005859375, -1.573333740234375, 0.5470962524414062, 9.880340576171875, 13.28033447265625, 1.4785270690917969, 8.011892318725586, 15.199028015136719, -5.2853851318359375, -4.753074645996094, 2.3036842346191406, 2.7442970275878906, 6.696765899658203, 1.5204524993896484, 16.102989196777344, 8.808448791503906, -3.5032424926757812, 4.39995002746582, 6.0731353759765625, 11.605945587158203, 2.246408462524414, 23.030624389648438, 9.112770080566406, 1.9985389709472656, 4.680105209350586, 6.023902893066406, 11.216011047363281], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000429.npy"}
{"epoch": 0.6485260770975056, "step": 430, "batch_size": 64, "mean": 4.866377830505371, "std": 6.436357498168945, "min": -10.857284545898438, "p10": -1.7803951263427724, "median": 4.237068176269531, "p90": 12.455667877197268, "max": 20.955177307128906, "pos_frac": 0.84375, "sample": [-0.0182647705078125, 0.25168800354003906, 0.40821075439453125, 2.270721435546875, -3.9591751098632812, 11.02313232421875, 0.08882522583007812, 20.578563690185547, 2.123668670654297, 4.3710479736328125, -4.698966979980469, 12.59814453125, 12.123222351074219, 2.583515167236328, 16.213096618652344, 5.124839782714844, 8.814262390136719, 1.4671478271484375, 15.742706298828125, 0.4419269561767578, 1.4390029907226562, 20.955177307128906, 3.8823318481445312, 2.3582115173339844, -0.7998466491699219, 16.41019630432129, 2.1563549041748047, 10.844703674316406, 9.370376586914062, 9.224594116210938, 5.887065887451172, 4.10308837890625, 5.500701904296875, 7.5500030517578125, -2.848966598510742, 20.64419937133789, 11.658454895019531, 6.6913604736328125, -5.636997222900391, 9.878185272216797, 5.0812530517578125, 6.424713134765625, 1.623443603515625, 1.69940185546875, 1.34661865234375, 7.697063446044922, -0.6636428833007812, 0.0070323944091796875, -10.857284545898438, -2.2006301879882812, 0.4268798828125, 4.960508346557617, 1.1467056274414062, 9.822158813476562, 0.7486572265625, 8.939559936523438, 9.279830932617188, 4.66954231262207, 5.361812591552734, 8.2137451171875, 4.400108337402344, -5.446908950805664, 0.3265838623046875, 1.62451171875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000430.npy"}
{"epoch": 0.6500377928949358, "step": 431, "batch_size": 64, "mean": 4.228203296661377, "std": 5.953577518463135, "min": -6.254840850830078, "p10": -1.7079921722412108, "median": 3.343194007873535, "p90": 12.310240554809571, "max": 24.15787124633789, "pos_frac": 0.78125, "sample": [-6.254840850830078, 6.832736968994141, 4.093482971191406, 0.8823280334472656, 0.49818992614746094, 7.901466369628906, 12.0865478515625, 4.59051513671875, 6.2951507568359375, -1.4022464752197266, -0.34240150451660156, 19.164066314697266, 5.956687927246094, 0.10738372802734375, 3.3288955688476562, 4.594594955444336, 17.126564025878906, -1.7815284729003906, 9.835540771484375, 3.564666748046875, 3.8480300903320312, -4.522926330566406, 1.6521759033203125, 14.963785171508789, -3.3817291259765625, 15.856185913085938, 12.406108856201172, -2.6838321685791016, 4.105865478515625, -1.3068199157714844, 2.3687705993652344, 2.607656478881836, 5.167289733886719, 10.933486938476562, -1.536407470703125, 1.8362274169921875, 12.436744689941406, 0.15180206298828125, -6.1058349609375, -0.18097686767578125, 0.7465057373046875, 1.0892791748046875, -3.0983657836914062, 8.017860412597656, 2.761871337890625, 2.4585189819335938, 6.961334228515625, 4.45697021484375, 3.1731529235839844, 24.15787124633789, 2.0730209350585938, 3.538005828857422, 1.6068115234375, 6.319629669189453, 5.487342834472656, 4.438701629638672, 1.6580047607421875, -0.7329578399658203, 0.69500732421875, 10.056434631347656, 11.407419204711914, 3.357492446899414, 5.312896728515625, -1.03118896484375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000431.npy"}
{"epoch": 0.6515495086923658, "step": 432, "batch_size": 64, "mean": 4.6349287033081055, "std": 6.274265766143799, "min": -10.152587890625, "p10": -1.8882675170898438, "median": 3.5696563720703125, "p90": 13.016725921630862, "max": 18.18421173095703, "pos_frac": 0.78125, "sample": [0.42888641357421875, -3.9470272064208984, 0.8558578491210938, -10.152587890625, 1.4454402923583984, 2.692361831665039, -1.6972846984863281, 3.6775436401367188, -2.4472885131835938, 14.002714157104492, 3.638629913330078, -8.624717712402344, -0.90985107421875, 3.500682830810547, 10.212600708007812, -4.383453369140625, 7.273139953613281, -1.74560546875, 14.877548217773438, 16.41192626953125, -0.83160400390625, 7.951057434082031, 13.214569091796875, 3.9941329956054688, 0.43416404724121094, 11.384788513183594, 3.2389144897460938, 13.412704467773438, 8.684944152832031, -1.0979061126708984, -1.7788124084472656, 0.7645854949951172, 1.3607044219970703, -3.1832122802734375, 9.165863037109375, 4.085521697998047, 12.555091857910156, 7.5176239013671875, 3.329803466796875, 9.268699645996094, -1.9351768493652344, 6.155368804931641, 8.011222839355469, 0.21502685546875, 9.14630126953125, 0.885406494140625, 0.12338829040527344, 12.376337051391602, 0.6363182067871094, 17.98809051513672, 10.486478805541992, 5.937534332275391, 9.481712341308594, 9.098464965820312, 7.09088134765625, 18.18421173095703, 0.46728515625, 8.89584732055664, 8.850418090820312, -1.4666862487792969, 0.0136566162109375, 12.535720825195312, 1.43389892578125, 3.442584991455078], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000432.npy"}
{"epoch": 0.6530612244897959, "step": 433, "batch_size": 64, "mean": 5.798556327819824, "std": 7.344533920288086, "min": -9.88616943359375, "p10": -2.2432485580444337, "median": 4.016263008117676, "p90": 16.479643630981446, "max": 24.865249633789062, "pos_frac": 0.78125, "sample": [3.498889923095703, 9.346899032592773, -0.18697738647460938, 3.7587127685546875, 5.866294860839844, 2.77398681640625, 0.6423683166503906, 12.528961181640625, -1.29638671875, 4.2708740234375, 2.1457366943359375, 3.7435073852539062, 2.5103302001953125, -3.5858535766601562, 19.383750915527344, -5.207359313964844, 4.129831314086914, 12.395423889160156, 9.372913360595703, -2.120563507080078, 11.476264953613281, 1.0342864990234375, 2.2470054626464844, 9.028257369995117, -9.88616943359375, 1.8447837829589844, 13.237716674804688, 0.043209075927734375, 10.77705192565918, 11.553791046142578, 0.7859306335449219, 7.1185760498046875, -2.295827865600586, 6.710258483886719, 7.713630676269531, 0.6267967224121094, -2.8890914916992188, 3.9026947021484375, 5.3454742431640625, 24.865249633789062, 17.519241333007812, 14.040912628173828, 18.203819274902344, 1.8856658935546875, 10.905769348144531, -7.888130187988281, -0.5340538024902344, 11.466995239257812, -0.5318374633789062, 16.495494842529297, 0.19002532958984375, 7.703193664550781, 6.4506683349609375, -0.7840728759765625, 0.033447265625, 14.181087493896484, 7.900505065917969, -0.00569915771484375, 12.126869201660156, 16.442657470703125, 18.66010284423828, 1.8725967407226562, 20.555330276489258, -2.9942092895507812], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000433.npy"}
{"epoch": 0.654572940287226, "step": 434, "batch_size": 64, "mean": 5.430469036102295, "std": 7.299190998077393, "min": -14.141983032226562, "p10": -4.07616958618164, "median": 5.927631378173828, "p90": 15.530664253234866, "max": 25.214786529541016, "pos_frac": 0.796875, "sample": [-10.906646728515625, 6.222267150878906, 9.810785293579102, 2.0604248046875, 7.2874298095703125, 14.738628387451172, -4.587421417236328, 9.353736877441406, -4.3519134521484375, 4.3396453857421875, 1.3033523559570312, -4.198474884033203, -14.141983032226562, 7.5280609130859375, 7.332256317138672, 1.2660369873046875, 9.629894256591797, 6.417026519775391, 15.91748046875, 16.900936126708984, 4.085777282714844, -0.8460960388183594, 16.155975341796875, 1.6430511474609375, -7.217477798461914, 9.003082275390625, 15.820741653442383, 7.217124938964844, 1.9165802001953125, 6.5159454345703125, -1.0589103698730469, 11.986865997314453, 5.63299560546875, 25.214786529541016, 19.142250061035156, 9.305416107177734, -3.790790557861328, 7.9055023193359375, 10.1978759765625, -0.232574462890625, 14.853816986083984, -3.5201797485351562, 4.479455947875977, 6.785102844238281, 4.722404479980469, 2.225006103515625, 1.7892341613769531, 13.344657897949219, 10.595321655273438, 6.442899703979492, 0.4899101257324219, 15.877532958984375, -5.870983123779297, 1.0333099365234375, 1.7748565673828125, 4.82470703125, 10.018810272216797, 11.933609008789062, 4.020296096801758, -3.1141128540039062, 5.615947723388672, 6.604461669921875, 8.90110969543457, 3.2032012939453125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000434.npy"}
{"epoch": 0.656084656084656, "step": 435, "batch_size": 64, "mean": 4.466201305389404, "std": 6.320147514343262, "min": -12.032554626464844, "p10": -2.775394058227538, "median": 3.710247039794922, "p90": 12.94753036499024, "max": 21.07665252685547, "pos_frac": 0.78125, "sample": [15.350305557250977, 2.2499923706054688, -0.16373443603515625, 2.391267776489258, -5.42664909362793, 1.8139877319335938, 5.022857666015625, 1.2258892059326172, 11.22625732421875, -0.7266883850097656, 5.961093902587891, 10.869987487792969, -3.2034912109375, 8.019607543945312, 11.346151351928711, 11.425033569335938, 3.6468353271484375, -6.246337890625, -0.080322265625, 13.794059753417969, 6.665802001953125, 1.6712760925292969, 8.133522033691406, 15.076934814453125, 1.090545654296875, 4.897167205810547, -1.420694351196289, 2.665771484375, 13.481307983398438, 11.164573669433594, 1.5560760498046875, 9.4588623046875, -7.911664962768555, 5.167734146118164, 3.8823013305664062, 8.951908111572266, -1.1585922241210938, 2.24322509765625, 6.422868728637695, 6.5395050048828125, 7.6712799072265625, 15.1654052734375, 21.07665252685547, 9.554252624511719, 0.1834430694580078, 3.6918716430664062, 3.2942962646484375, -1.2116050720214844, -1.7765007019042969, 14.973968505859375, -4.531745910644531, 4.200714111328125, 0.19370651245117188, 3.7286224365234375, 5.581089019775391, 4.697376251220703, 11.702049255371094, -4.985813140869141, -12.032554626464844, 2.76226806640625, 0.14039993286132812, 3.412628173828125, 10.626800537109375, 0.6437492370605469], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000435.npy"}
{"epoch": 0.6575963718820862, "step": 436, "batch_size": 64, "mean": 4.684969902038574, "std": 6.4650373458862305, "min": -8.941219329833984, "p10": -3.0732372283935545, "median": 4.462419509887695, "p90": 13.987956619262699, "max": 18.86932373046875, "pos_frac": 0.71875, "sample": [3.5766448974609375, 1.6726360321044922, 16.122177124023438, 0.8136672973632812, 4.203716278076172, 1.3387298583984375, 4.485618591308594, 11.433212280273438, -3.05950927734375, 4.439220428466797, 6.63916015625, 12.940977096557617, 3.5688934326171875, 9.3521728515625, 2.1449050903320312, 13.313396453857422, 16.086441040039062, 14.947845458984375, -6.5964508056640625, 9.841854095458984, 8.054405212402344, 5.471063613891602, -2.6063995361328125, 2.8023147583007812, 5.471643447875977, -1.5537185668945312, 18.86932373046875, 0.4741172790527344, 8.694007873535156, 9.612220764160156, 6.399875640869141, -0.38519287109375, -8.941219329833984, -1.16705322265625, -0.18259811401367188, 4.108940124511719, 8.248214721679688, -0.215667724609375, -0.7010116577148438, -2.050506591796875, 8.879974365234375, 6.3930511474609375, 5.716045379638672, -2.6862716674804688, -3.079120635986328, 4.429267883300781, -4.141162872314453, 17.791458129882812, 6.585906982421875, 14.83331298828125, 8.41054916381836, 1.203165054321289, 7.96868896484375, 6.8310699462890625, -1.5121612548828125, 6.674629211425781, 11.526069641113281, -4.344032287597656, 6.71514892578125, 0.4282112121582031, -7.8146820068359375, 14.277053833007812, -3.8013553619384766, 10.885208129882812], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000436.npy"}
{"epoch": 0.6591080876795162, "step": 437, "batch_size": 64, "mean": 4.458591938018799, "std": 7.9100728034973145, "min": -15.99578857421875, "p10": -4.231438636779785, "median": 3.6901416778564453, "p90": 14.993705368041994, "max": 27.1358642578125, "pos_frac": 0.703125, "sample": [17.194442749023438, 14.823936462402344, 8.27151870727539, 5.468994140625, 1.8588485717773438, -3.8574981689453125, 14.761627197265625, 8.19012451171875, 27.1358642578125, 2.692249298095703, 9.347747802734375, 2.534698486328125, -15.99578857421875, 6.4434814453125, -3.6523208618164062, -2.5637664794921875, 4.975151062011719, 3.8985328674316406, 4.935436248779297, -2.9161834716796875, 20.105533599853516, 20.323829650878906, 15.066463470458984, -3.4716453552246094, -4.493747711181641, 5.405555725097656, 10.6319580078125, -4.391698837280273, -7.195554733276367, -5.384132385253906, -3.430755615234375, -2.3808937072753906, -1.6082077026367188, -0.8582687377929688, 4.1284332275390625, 16.447153091430664, 7.487724304199219, -2.6841278076171875, 2.6142044067382812, 2.489788055419922, 3.455780029296875, 1.001810073852539, -2.0714244842529297, 9.730453491210938, 10.508895874023438, 12.235023498535156, 2.2882003784179688, 4.105276107788086, 15.241859436035156, -5.3681182861328125, 3.48175048828125, 13.69973373413086, 4.9534759521484375, 13.0296630859375, 6.802879333496094, 2.8457908630371094, 4.333381652832031, 7.119232177734375, -9.855758666992188, 0.2929954528808594, -0.5462303161621094, 1.9265213012695312, 1.4578323364257812, 12.3321533203125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000437.npy"}
{"epoch": 0.6606198034769464, "step": 438, "batch_size": 64, "mean": 5.2789506912231445, "std": 6.704009532928467, "min": -9.006139755249023, "p10": -2.1738811492919923, "median": 4.8844757080078125, "p90": 13.967855072021484, "max": 21.944061279296875, "pos_frac": 0.765625, "sample": [17.724227905273438, -2.1865005493164062, 21.944061279296875, -6.826713562011719, 0.5683441162109375, -3.0444259643554688, -0.19295310974121094, 6.479000091552734, 0.9797801971435547, 11.70037841796875, 5.677272796630859, 17.013381958007812, -1.8343505859375, -0.7480621337890625, 9.074651718139648, 0.4455547332763672, 10.732301712036133, -9.006139755249023, 7.408073425292969, 6.025575637817383, 13.51141357421875, 9.005237579345703, -2.1444358825683594, 8.494720458984375, 7.156415939331055, 8.516927719116211, 5.753425598144531, 0.5023155212402344, 13.989303588867188, 13.917808532714844, 7.291107177734375, 1.754058837890625, 15.065666198730469, 2.6453800201416016, -0.021314620971679688, 10.962760925292969, 7.096763610839844, -4.427045822143555, 5.022640228271484, 8.38935661315918, 13.189321517944336, 0.6946563720703125, 3.505657196044922, 13.583770751953125, 11.147422790527344, 3.2672271728515625, 10.268592834472656, 1.4968528747558594, 1.5737686157226562, 2.945995330810547, -1.72711181640625, -0.15133285522460938, 4.111148834228516, 15.256668090820312, 7.0015869140625, -6.3680267333984375, 15.986549377441406, 1.2692947387695312, -5.442649841308594, 13.227025985717773, 2.6286392211914062, -1.0635299682617188, 4.746311187744141, 2.289003372192383], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000438.npy"}
{"epoch": 0.6621315192743764, "step": 439, "batch_size": 64, "mean": 4.7440595626831055, "std": 6.45287561416626, "min": -11.713653564453125, "p10": -3.1375074386596675, "median": 4.030641555786133, "p90": 13.815016937255864, "max": 19.729721069335938, "pos_frac": 0.8125, "sample": [4.712955474853516, -1.1422920227050781, 1.710744857788086, -6.8007354736328125, 7.205482482910156, 5.9190826416015625, 3.878744125366211, 1.8153228759765625, 4.168941497802734, 10.806472778320312, 18.27466583251953, -0.9867591857910156, 11.17161750793457, 3.6080169677734375, 3.7726287841796875, 12.114456176757812, 17.096359252929688, 0.01044464111328125, 10.332199096679688, -3.6711273193359375, 1.7552757263183594, 7.983203887939453, 6.780670166015625, -3.3061389923095703, 9.978862762451172, 5.787788391113281, 16.814117431640625, -3.3714866638183594, 1.1931209564208984, 14.520626068115234, -2.293100357055664, 12.736824035644531, -2.7440338134765625, -1.17510986328125, 7.6322021484375, 9.76727294921875, 0.521240234375, 3.7477645874023438, -11.713653564453125, 0.6153106689453125, -7.36334228515625, 5.8234710693359375, 12.274139404296875, 19.729721069335938, 4.995708465576172, 11.679512023925781, 3.863525390625, 14.972183227539062, 0.9808349609375, 3.7699966430664062, 4.405254364013672, 3.744415283203125, 4.194252014160156, 6.342735290527344, 5.609832763671875, 14.277099609375, 1.3226318359375, 3.371612548828125, -6.379066467285156, 0.270721435546875, 4.784961700439453, 6.0346221923828125, 3.8923416137695312, 1.7946701049804688], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000439.npy"}
{"epoch": 0.6636432350718064, "step": 440, "batch_size": 64, "mean": 5.663836479187012, "std": 6.75380277633667, "min": -8.288324356079102, "p10": -0.7990261077880859, "median": 4.547966003417969, "p90": 16.253149032592777, "max": 21.696533203125, "pos_frac": 0.796875, "sample": [-2.067890167236328, 8.174667358398438, 12.198196411132812, 16.905258178710938, 16.668941497802734, 21.696533203125, 18.863365173339844, -0.1953105926513672, 13.894279479980469, -0.460479736328125, 1.7291412353515625, 18.6020565032959, 0.189239501953125, 12.480461120605469, 3.219776153564453, 2.3978805541992188, 11.269796371459961, 0.34944915771484375, -4.089740753173828, 2.889667510986328, 6.6661376953125, -0.28371620178222656, 1.0453872680664062, 9.591567993164062, 6.414466857910156, 8.062509536743164, 4.153564453125, 8.433517456054688, 0.7948017120361328, 3.110107421875, 2.7081069946289062, 4.9423675537109375, 6.5925750732421875, 0.6989936828613281, 0.5637531280517578, 2.7394866943359375, 5.098043441772461, -8.288324356079102, 5.350133895874023, 14.394033432006836, 7.434917449951172, 12.632244110107422, 1.94488525390625, 11.95595645904541, -7.768291473388672, 1.3818283081054688, 16.707008361816406, 4.012989044189453, 8.24041748046875, -0.04964447021484375, -0.8159828186035156, -3.0958995819091797, 9.02862548828125, 0.6302146911621094, 5.51325798034668, 1.2054367065429688, -0.7385673522949219, -2.316049575805664, 10.71445083618164, -0.75946044921875, 5.780097961425781, 18.844921112060547, 15.282966613769531, 9.216400146484375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000440.npy"}
{"epoch": 0.6651549508692366, "step": 441, "batch_size": 64, "mean": 5.577340126037598, "std": 5.577171802520752, "min": -6.65478515625, "p10": -0.8840320587158199, "median": 4.549709320068359, "p90": 14.749160766601564, "max": 18.067947387695312, "pos_frac": 0.859375, "sample": [7.745595932006836, 9.270811080932617, 15.934497833251953, 9.976234436035156, 0.77142333984375, 0.5779514312744141, 11.044620513916016, 0.7774276733398438, 6.6568603515625, 0.443511962890625, -2.878498077392578, 8.471431732177734, 3.0009841918945312, 15.90753173828125, 1.777994155883789, 4.10748291015625, 5.560546875, 2.5763015747070312, -6.65478515625, 15.965843200683594, 2.74237060546875, 7.372013092041016, 7.591552734375, 14.903839111328125, 3.6273193359375, 17.7237491607666, 2.8349952697753906, 4.350227355957031, 18.067947387695312, 0.68316650390625, -1.0529708862304688, 4.5414581298828125, -0.4898414611816406, 8.210586547851562, 3.7842369079589844, -1.700714111328125, -0.0859527587890625, 12.303121566772461, 13.265811920166016, 2.9392547607421875, 8.602432250976562, 11.435935974121094, 1.3943405151367188, 4.7094879150390625, 6.2707672119140625, 5.3340911865234375, 6.966480255126953, 2.427764892578125, 4.117095947265625, -1.4629745483398438, 10.812332153320312, 1.0365715026855469, 14.891044616699219, 14.418098449707031, -3.1327381134033203, 6.2494049072265625, 2.8859100341796875, 4.736446380615234, 4.033210754394531, 4.557960510253906, -2.6259288787841797, 8.235069274902344, 5.833221435546875, 2.5778121948242188], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000441.npy"}
{"epoch": 0.6666666666666666, "step": 442, "batch_size": 64, "mean": 4.7306718826293945, "std": 6.480957984924316, "min": -8.403617858886719, "p10": -2.481182098388672, "median": 4.400547027587891, "p90": 11.853509140014648, "max": 23.491838455200195, "pos_frac": 0.78125, "sample": [0.8810806274414062, 10.591720581054688, 6.555877685546875, -2.5019302368164062, 2.1618270874023438, 22.070755004882812, 15.898178100585938, 11.68453598022461, 5.996944427490234, 9.90576171875, 5.856685638427734, 9.824790954589844, 14.3297119140625, -1.02764892578125, 6.460857391357422, 5.673057556152344, -0.22156906127929688, 1.4394378662109375, 1.8147430419921875, -2.432769775390625, 0.19821929931640625, 10.027297973632812, 7.187450408935547, 2.7899017333984375, 8.221542358398438, 10.09906005859375, -8.403617858886719, 5.884037017822266, -1.1727371215820312, 11.86822509765625, 4.408271789550781, 4.296760559082031, 5.4608612060546875, 8.734779357910156, 6.9728240966796875, 1.2657432556152344, -3.4754486083984375, 2.3847923278808594, 23.491838455200195, 12.255157470703125, 2.6024551391601562, -1.8892555236816406, 2.018838882446289, 4.776966094970703, -1.9143085479736328, -5.213226318359375, -5.688179016113281, 15.953323364257812, -6.1308746337890625, 0.553802490234375, 11.017868041992188, 7.4050140380859375, -7.8975677490234375, 3.3939361572265625, 1.032318115234375, -1.151153564453125, 4.392822265625, 2.3677520751953125, 0.780914306640625, 10.263477325439453, 8.867874145507812, 11.819171905517578, 2.6714210510253906, 5.272581100463867], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000442.npy"}
{"epoch": 0.6681783824640968, "step": 443, "batch_size": 64, "mean": 6.226693153381348, "std": 7.557669639587402, "min": -10.724348068237305, "p10": -1.1763950347900392, "median": 4.761666297912598, "p90": 16.15639400482178, "max": 22.25396728515625, "pos_frac": 0.84375, "sample": [1.3890419006347656, 12.140785217285156, 8.620643615722656, 13.684173583984375, 1.2536468505859375, 15.009490966796875, 6.650596618652344, -8.077251434326172, 3.2304306030273438, 2.4773635864257812, 0.5259475708007812, -2.249715805053711, 2.9733924865722656, 5.6404266357421875, 1.6683483123779297, 9.609840393066406, 0.04342460632324219, 7.611259460449219, 4.9830169677734375, 11.253799438476562, 22.25396728515625, -10.724348068237305, -3.7731399536132812, 11.889663696289062, 0.9390296936035156, 19.302581787109375, 20.29334259033203, 2.952310562133789, 20.98371124267578, 15.410322189331055, 9.320518493652344, 3.1602134704589844, 5.822824478149414, 9.177936553955078, 10.872512817382812, 14.339118957519531, 12.492158889770508, 14.132675170898438, 13.437049865722656, 10.972686767578125, 2.23577880859375, 10.161340713500977, 0.4855690002441406, 8.177314758300781, 16.476139068603516, 13.75014877319336, -1.1861190795898438, 3.4024200439453125, 2.3597869873046875, -0.01849365234375, -0.3536243438720703, -6.185752868652344, 3.30059814453125, 4.653800964355469, 1.3592262268066406, 1.8131294250488281, -10.572227478027344, 4.098167419433594, 16.78868865966797, 4.869531631469727, 0.0578765869140625, -1.1537055969238281, 1.7962570190429688, 20.498733520507812], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000443.npy"}
{"epoch": 0.6696900982615268, "step": 444, "batch_size": 64, "mean": 2.382774591445923, "std": 6.640721321105957, "min": -11.773567199707031, "p10": -5.813668823242187, "median": 1.4477195739746094, "p90": 11.901855087280275, "max": 16.885452270507812, "pos_frac": 0.671875, "sample": [12.45458984375, 0.7579078674316406, -2.5228118896484375, 0.1365070343017578, 3.6390380859375, 6.122528076171875, 6.473472595214844, -7.1722869873046875, -3.013286590576172, 12.130321502685547, -0.9946708679199219, 11.368766784667969, -2.95513916015625, -8.727790832519531, 2.2054901123046875, 8.19317626953125, 10.472257614135742, 4.234453201293945, 1.8389759063720703, -8.988632202148438, 12.554244995117188, 9.757049560546875, 0.7570209503173828, 1.123748779296875, -4.265472412109375, 0.2961578369140625, 1.10009765625, -1.7838268280029297, -3.7643051147460938, -4.042354583740234, 10.663467407226562, 1.4882888793945312, 6.982078552246094, 2.9382858276367188, 3.7168197631835938, 9.12601089477539, 0.4768791198730469, 7.743019104003906, -1.4809951782226562, -11.773567199707031, 2.1980438232421875, 16.885452270507812, -9.299530029296875, -5.315681457519531, 0.017988204956054688, 10.30061149597168, 5.211151123046875, 15.667083740234375, 12.786754608154297, 1.4071502685546875, 4.298315048217773, 2.9438323974609375, 0.2961616516113281, -1.2136306762695312, 8.957008361816406, -1.5195846557617188, -2.2523536682128906, 1.9439868927001953, 5.843353271484375, -6.027091979980469, -10.085586547851562, 14.129287719726562, -2.091066360473633, 0.15041160583496094], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000444.npy"}
{"epoch": 0.671201814058957, "step": 445, "batch_size": 64, "mean": 5.074286460876465, "std": 6.551628589630127, "min": -8.200614929199219, "p10": -3.5553527832031246, "median": 3.837648391723633, "p90": 14.189237976074219, "max": 19.738807678222656, "pos_frac": 0.78125, "sample": [9.18240737915039, 15.972541809082031, 5.260017395019531, -0.693939208984375, -4.0948944091796875, 4.920623779296875, 13.338361740112305, 2.9290771484375, 6.11181640625, 3.0959396362304688, -0.22026824951171875, -4.395845413208008, -0.7507152557373047, 2.564727783203125, -6.7387847900390625, 13.83116340637207, -3.648040771484375, 1.15264892578125, 0.201324462890625, 3.1693382263183594, 0.21436119079589844, 13.344581604003906, -7.6597137451171875, 2.0985183715820312, -8.200614929199219, 11.620765686035156, 4.461887359619141, 13.47027587890625, 3.2251052856445312, -0.01947784423828125, -6.560375213623047, -3.279315948486328, 3.619476318359375, 14.613845825195312, 19.738807678222656, 6.32110595703125, 9.125579833984375, 15.214448928833008, 2.7572879791259766, 0.08333587646484375, 9.31353759765625, 8.15390396118164, 6.739387512207031, 14.054794311523438, 9.18797492980957, 10.098075866699219, 14.246856689453125, 1.0697059631347656, 7.93272590637207, -1.3248577117919922, 3.7963905334472656, 2.7299041748046875, -3.339080810546875, 3.4858627319335938, 8.45782470703125, 8.397567749023438, 14.963630676269531, 10.565181732177734, 1.6453113555908203, 3.87890625, 7.785575866699219, 8.779623031616211, 15.37548828125, 3.3826751708984375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000445.npy"}
{"epoch": 0.672713529856387, "step": 446, "batch_size": 64, "mean": 6.717385292053223, "std": 6.160783290863037, "min": -3.865549087524414, "p10": -0.6556335449218749, "median": 5.846120834350586, "p90": 14.901636123657227, "max": 25.7147216796875, "pos_frac": 0.84375, "sample": [0.6134586334228516, -0.3598213195800781, -3.865549087524414, 8.0263671875, 4.38330078125, 12.026718139648438, 2.9586219787597656, -3.3433761596679688, 25.7147216796875, 15.273494720458984, 6.835422515869141, 5.7701568603515625, 8.816650390625, 10.383743286132812, 11.02396011352539, 17.34833526611328, 5.902015686035156, 7.56068229675293, 12.23968505859375, 7.561515808105469, 0.7555580139160156, 9.531173706054688, 0.275177001953125, 9.20083236694336, 14.951972961425781, 7.108924865722656, 1.8237686157226562, 10.992603302001953, 14.667465209960938, -1.1960372924804688, 10.16972541809082, 1.6327457427978516, -0.7093887329101562, 1.5910263061523438, -0.9894638061523438, -0.1454334259033203, 1.5661296844482422, -2.673358917236328, 9.093971252441406, 4.287101745605469, 5.204368591308594, 19.816978454589844, 14.784183502197266, 5.790225982666016, 16.473922729492188, 1.6461124420166016, 4.6128082275390625, 13.504680633544922, -0.5302047729492188, 5.338140487670898, 9.508209228515625, 2.1007118225097656, 7.4875946044921875, 11.898025512695312, 12.291566848754883, 16.845447540283203, 2.7175521850585938, 4.807710647583008, 2.345775604248047, 11.400131225585938, -0.8614597320556641, 3.918912887573242, 6.272804260253906, 5.733848571777344], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000446.npy"}
{"epoch": 0.674225245653817, "step": 447, "batch_size": 64, "mean": 6.6432905197143555, "std": 7.258760452270508, "min": -18.767864227294922, "p10": -2.2476364135742184, "median": 6.7209930419921875, "p90": 15.379793930053712, "max": 26.339569091796875, "pos_frac": 0.828125, "sample": [15.890472412109375, 11.452316284179688, 8.746528625488281, 1.3312149047851562, 1.2533950805664062, -1.7611312866210938, 8.297676086425781, -3.1371231079101562, -5.445827484130859, 2.0475616455078125, 4.588077545166016, 4.976509094238281, 4.738273620605469, 12.852020263671875, 9.133796691894531, 20.07207489013672, 4.10536003112793, 8.662712097167969, 5.463962554931641, 6.10101318359375, 4.394527435302734, -4.607872009277344, 6.1400146484375, -5.1536712646484375, 1.62451171875, 13.583526611328125, 14.979217529296875, 11.271270751953125, 10.015426635742188, 6.6932220458984375, 9.5731201171875, 11.31187629699707, 5.796482086181641, 11.71490478515625, 9.588314056396484, 7.868640899658203, 18.76697540283203, 6.6804351806640625, 16.646947860717773, -0.1941204071044922, 7.026985168457031, 4.793914794921875, 13.146499633789062, -2.4561386108398438, 1.007883071899414, -0.385711669921875, 1.8404541015625, 18.52313995361328, 11.048324584960938, 7.506307601928711, -0.0159149169921875, 13.297513961791992, -3.2173004150390625, 15.306705474853516, 7.4351043701171875, 7.111480712890625, 6.478431701660156, -18.767864227294922, 3.053333282470703, 26.339569091796875, 15.411117553710938, 1.104827880859375, 6.7487640380859375, 6.770544052124023], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000447.npy"}
{"epoch": 0.6757369614512472, "step": 448, "batch_size": 64, "mean": 5.357995986938477, "std": 6.641022682189941, "min": -13.185989379882812, "p10": -1.8416463851928702, "median": 4.9532470703125, "p90": 13.001782798767092, "max": 23.390392303466797, "pos_frac": 0.84375, "sample": [10.077964782714844, -8.507272720336914, 13.159036636352539, 1.2866668701171875, 8.701919555664062, 4.972320556640625, -0.20873260498046875, 6.6990814208984375, 4.4247589111328125, 0.42157745361328125, 12.634857177734375, 10.166015625, 1.9188766479492188, 2.552581787109375, 4.934173583984375, 12.017303466796875, -0.9517688751220703, 15.379669189453125, 8.334159851074219, 1.6237049102783203, 1.815093994140625, 23.390392303466797, 5.612464904785156, 6.4915008544921875, 4.614627838134766, -0.4100303649902344, 10.247726440429688, 4.010105133056641, 7.6927947998046875, 5.9178619384765625, -4.074443817138672, 0.3620452880859375, 4.8415374755859375, 2.61260986328125, 2.1354141235351562, 8.475135803222656, -2.2230224609375, 6.176858901977539, 17.238082885742188, -3.543539047241211, 7.041891098022461, -13.185989379882812, 6.474517822265625, 1.0129241943359375, 5.7930450439453125, 9.326393127441406, 6.9063720703125, 12.299156188964844, 7.20225715637207, 1.3528556823730469, -2.6436729431152344, 3.2169189453125, 3.0579147338867188, 11.167068481445312, 0.1915111541748047, 17.129913330078125, 0.1551971435546875, -6.260520935058594, 16.179920196533203, 8.312187194824219, 11.892105102539062, 2.4716567993164062, 1.510040283203125, 21.287994384765625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000448.npy"}
{"epoch": 0.6772486772486772, "step": 449, "batch_size": 64, "mean": 3.257981777191162, "std": 5.509819030761719, "min": -7.429803848266602, "p10": -2.657843589782715, "median": 3.369464874267578, "p90": 9.477563476562501, "max": 19.88888931274414, "pos_frac": 0.703125, "sample": [-0.30538368225097656, 2.9534759521484375, -6.969322204589844, 9.066268920898438, -1.9333877563476562, -2.1648101806640625, 2.54180908203125, 3.2970046997070312, -2.4717979431152344, 9.807083129882812, 5.619132995605469, 3.4538955688476562, 3.7759017944335938, -1.1618576049804688, 1.49078369140625, 9.634521484375, 19.88888931274414, 3.647062301635742, 0.5563125610351562, 4.574542999267578, 0.8572120666503906, 4.691839218139648, 15.06085205078125, 16.918678283691406, -5.770404815673828, 3.840160369873047, 6.289833068847656, 3.441925048828125, 5.814453125, 2.4101181030273438, 5.7295684814453125, -0.040863037109375, 4.373939514160156, -7.429803848266602, 10.741477966308594, 4.2398529052734375, 14.420333862304688, -1.0952072143554688, 3.61468505859375, 5.6885833740234375, 2.063425064086914, -2.2646636962890625, 0.3972129821777344, 8.368026733398438, -1.6171989440917969, 6.977359771728516, -2.737577438354492, 8.099151611328125, 2.043426513671875, 3.9211196899414062, 9.111328125, -2.3247146606445312, -0.16716766357421875, 2.0531654357910156, -4.352691650390625, 6.668487548828125, 7.997982025146484, -6.794807434082031, 6.953357696533203, 1.3323822021484375, 6.538959503173828, -3.8813323974609375, 1.556325912475586, -0.5280838012695312], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000449.npy"}
{"epoch": 0.6787603930461074, "step": 450, "batch_size": 64, "mean": 4.098100662231445, "std": 6.59647798538208, "min": -9.06682014465332, "p10": -3.4233665466308594, "median": 2.844114303588867, "p90": 13.40026206970215, "max": 22.732376098632812, "pos_frac": 0.71875, "sample": [0.6172122955322266, -0.4686431884765625, -3.4596214294433594, -0.5422515869140625, 14.780914306640625, 13.484088897705078, 4.2825927734375, 2.7456512451171875, 11.015548706054688, 19.500808715820312, 7.1567230224609375, 3.0784664154052734, 9.959272384643555, 1.5380401611328125, 0.176483154296875, 15.960334777832031, -0.7433090209960938, 2.717620849609375, 7.145545959472656, 22.732376098632812, -0.8046302795410156, 4.322109222412109, 2.9983367919921875, -2.196216583251953, 0.05560302734375, -3.3387718200683594, 13.204666137695312, 13.001022338867188, 4.285728454589844, -5.093477249145508, 2.5438499450683594, 11.87152099609375, 2.607776641845703, 6.744916915893555, -9.06682014465332, 12.311378479003906, 7.996795654296875, -0.0056133270263671875, 10.3150634765625, 4.225336074829102, 2.942577362060547, 8.87799072265625, 2.7207412719726562, -8.403940200805664, 4.638359069824219, 2.2809276580810547, 5.616374969482422, 2.2194747924804688, 3.859405517578125, -1.4289283752441406, 0.9732303619384766, -5.932682037353516, 1.687307357788086, 3.916971206665039, 14.8228759765625, 4.532844543457031, -3.64703369140625, 9.260540008544922, -0.04376983642578125, 13.742023468017578, -2.3763275146484375, 1.0791816711425781, -6.106611251831055, -0.6095352172851562], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000450.npy"}
{"epoch": 0.6802721088435374, "step": 451, "batch_size": 64, "mean": 3.965038537979126, "std": 6.617405414581299, "min": -9.424591064453125, "p10": -3.3293697357177727, "median": 3.3574066162109375, "p90": 11.88691444396973, "max": 20.67236328125, "pos_frac": 0.734375, "sample": [11.354381561279297, 1.8782958984375, 6.515754699707031, 3.7482833862304688, 10.436111450195312, 0.96307373046875, -8.811187744140625, 4.7617645263671875, 2.695344924926758, 11.09893798828125, -2.3215789794921875, 5.702659606933594, 10.798477172851562, 1.9142608642578125, 6.936397552490234, 2.98779296875, 5.858940124511719, 7.265434265136719, 1.7374401092529297, 8.847574234008789, 1.3382415771484375, -9.420166015625, -2.7283973693847656, 20.67236328125, 9.563549041748047, -5.728370666503906, 4.5617828369140625, -9.424591064453125, -5.29901123046875, 0.6339035034179688, 5.754507064819336, 12.589324951171875, -1.2988166809082031, 1.5795326232910156, 5.338420867919922, 4.2254486083984375, 15.680940628051758, 10.739273071289062, 0.7700157165527344, 4.7132415771484375, -0.13036346435546875, 12.115142822265625, 0.4101085662841797, -0.8485584259033203, 4.719175338745117, -0.722076416015625, 4.559032440185547, 17.737136840820312, 16.572547912597656, 2.9580078125, 3.727020263671875, 1.7250213623046875, 5.3785552978515625, -2.5058670043945312, -2.063884735107422, -0.5969924926757812, 11.164693832397461, 0.8849754333496094, -1.8385047912597656, 8.846065521240234, 19.949100494384766, -4.140190124511719, -3.5869293212890625, 0.8199005126953125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000451.npy"}
{"epoch": 0.6817838246409675, "step": 452, "batch_size": 64, "mean": 4.886285781860352, "std": 6.727230548858643, "min": -12.497970581054688, "p10": -2.7075601577758786, "median": 4.465192794799805, "p90": 13.935597991943359, "max": 19.352012634277344, "pos_frac": 0.765625, "sample": [6.568534851074219, -0.81005859375, 4.020635604858398, 11.499275207519531, 0.9342880249023438, 9.220495223999023, 4.954261779785156, 13.905685424804688, 10.343944549560547, 15.317085266113281, 2.286884307861328, 16.375900268554688, -10.030426025390625, 10.293418884277344, 13.948417663574219, 3.1690444946289062, 17.948623657226562, 7.06207275390625, 7.483190536499023, 12.829345703125, 3.3934402465820312, -0.8538894653320312, 7.892791748046875, 4.916969299316406, 1.0715656280517578, -0.9348506927490234, 6.191497802734375, 4.19305419921875, 2.4636154174804688, 0.7806472778320312, 0.03005218505859375, 19.352012634277344, -4.821586608886719, 12.161487579345703, -2.2515716552734375, 11.834175109863281, 4.881675720214844, 5.862148284912109, -5.384269714355469, 2.0743465423583984, 3.7933998107910156, -2.256683349609375, 13.525413513183594, 4.737331390380859, 8.607208251953125, -4.400665283203125, 16.08749771118164, 15.431869506835938, 5.9778289794921875, -0.8485984802246094, 3.512655258178711, -2.9007930755615234, 4.066349029541016, 7.315547943115234, -0.6280937194824219, -1.9161605834960938, 0.5150680541992188, 6.704692840576172, 9.415201187133789, -3.5603256225585938, 0.13655853271484375, 10.654502868652344, -12.497970581054688, 1.076507568359375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000452.npy"}
{"epoch": 0.6832955404383976, "step": 453, "batch_size": 64, "mean": 4.907158851623535, "std": 7.174266338348389, "min": -10.214136123657227, "p10": -3.4514545440673827, "median": 3.291494369506836, "p90": 14.829991722106934, "max": 20.827987670898438, "pos_frac": 0.734375, "sample": [6.351959228515625, 8.256555557250977, -2.3868026733398438, -1.1171913146972656, 19.276649475097656, 1.0756301879882812, 0.8914527893066406, 6.937095642089844, 10.925003051757812, -3.9649429321289062, -10.214136123657227, 2.4247665405273438, 10.138053894042969, -2.7695178985595703, 12.304443359375, 9.861785888671875, 14.93896484375, 0.914703369140625, 14.268165588378906, 7.950981140136719, 1.0061378479003906, -2.6656265258789062, 2.0965423583984375, 1.017852783203125, 0.5704841613769531, -3.5641250610351562, -5.1805572509765625, 14.901529312133789, -3.7647132873535156, 8.103612899780273, 19.68084716796875, 2.7651729583740234, 8.80484390258789, -6.989158630371094, -0.34346771240234375, 3.3220787048339844, 14.19938850402832, 6.9654541015625, -2.8717422485351562, 20.827987670898438, 3.514934539794922, 14.663070678710938, -3.188556671142578, 0.4969139099121094, -0.2876434326171875, 6.118888854980469, -2.048419952392578, 1.5385856628417969, 5.984703063964844, 8.797927856445312, 5.920108795166016, 0.24200820922851562, 3.2609100341796875, 8.437032699584961, 6.166343688964844, -4.574237823486328, -1.375823974609375, 16.45989227294922, 11.380191802978516, 1.54144287109375, 13.887916564941406, 12.769582748413086, 16.87895965576172, 2.5272750854492188], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000453.npy"}
{"epoch": 0.6848072562358276, "step": 454, "batch_size": 64, "mean": 5.496814727783203, "std": 7.4250288009643555, "min": -18.137367248535156, "p10": -2.722002220153808, "median": 4.600458145141602, "p90": 15.30749969482422, "max": 21.708084106445312, "pos_frac": 0.8125, "sample": [14.217473983764648, -5.663145065307617, -1.8605804443359375, 10.470069885253906, -18.137367248535156, -2.1152820587158203, 9.804977416992188, 0.12009429931640625, 3.065532684326172, 3.3359375, 6.805500030517578, 14.876325607299805, 6.350364685058594, -3.8196182250976562, 11.291366577148438, 4.628597259521484, 15.938955307006836, 10.20378303527832, 8.626106262207031, 9.088645935058594, -1.6065635681152344, -7.154937744140625, -1.3105926513671875, 11.548439025878906, 18.87122344970703, 17.419189453125, -4.0319366455078125, 2.8659133911132812, 5.321189880371094, 1.5049057006835938, 0.40944671630859375, 6.135284423828125, 15.133888244628906, -1.0595245361328125, 4.572319030761719, 1.322866439819336, 21.01373291015625, -2.982025146484375, 21.708084106445312, 2.7826271057128906, 1.9323158264160156, 2.2833709716796875, 1.182891845703125, 0.8618640899658203, 3.2567901611328125, 2.785430908203125, 15.381904602050781, 8.969741821289062, 2.4588565826416016, 6.859424591064453, 6.95562744140625, 6.691749572753906, 8.75930404663086, 17.88112449645996, 9.139318466186523, 0.1816253662109375, 3.217416763305664, 13.7303466796875, 5.5752716064453125, 14.818328857421875, 10.877128601074219, 0.8525428771972656, 0.9566135406494141, -3.5041122436523438], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000454.npy"}
{"epoch": 0.6863189720332578, "step": 455, "batch_size": 64, "mean": 5.39186954498291, "std": 6.647305011749268, "min": -10.248210906982422, "p10": -1.6941913604736325, "median": 4.518733978271484, "p90": 13.86427001953125, "max": 24.383758544921875, "pos_frac": 0.78125, "sample": [10.381072998046875, 3.6939620971679688, 7.798255920410156, 12.740119934082031, 7.041999816894531, -0.9890956878662109, 18.448318481445312, 18.44034194946289, -0.276519775390625, 3.0920944213867188, 2.364381790161133, -1.8830184936523438, 5.570556640625, 4.635601043701172, 6.00567626953125, 24.383758544921875, 0.8705673217773438, -2.03741455078125, 4.2109222412109375, -1.3393630981445312, 15.695304870605469, 1.5004920959472656, 8.193733215332031, -0.6183547973632812, -0.7452468872070312, 12.370574951171875, 3.75347900390625, 11.489494323730469, 6.577964782714844, 0.9146652221679688, 5.987369537353516, 2.0821170806884766, 13.537544250488281, 1.3232498168945312, 5.278587341308594, 1.203155517578125, -10.248210906982422, 9.184860229492188, 12.015422821044922, 14.004295349121094, -4.306377410888672, -1.8163604736328125, 5.573173522949219, -5.29376220703125, 9.61081314086914, 11.778961181640625, 4.9963531494140625, 1.0412216186523438, 4.401866912841797, 0.20345306396484375, 21.55938720703125, 7.854164123535156, -1.4091300964355469, 9.225112915039062, -1.0831146240234375, 1.9901351928710938, 8.432323455810547, 3.217010498046875, -2.0903244018554688, 2.866670608520508, 1.0565567016601562, 6.925327301025391, 7.242725372314453, 16.45074462890625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000455.npy"}
{"epoch": 0.6878306878306878, "step": 456, "batch_size": 64, "mean": 4.807732582092285, "std": 6.48820686340332, "min": -11.438125610351562, "p10": -2.2069400787353515, "median": 4.554189682006836, "p90": 13.10100193023682, "max": 19.775970458984375, "pos_frac": 0.75, "sample": [12.43310546875, -4.504753112792969, -0.5579051971435547, 7.327217102050781, 19.775970458984375, -0.9834575653076172, -1.5855255126953125, 5.984996795654297, -1.1625251770019531, 13.387243270874023, 8.810466766357422, 3.831256866455078, 12.284332275390625, 18.699966430664062, 6.77813720703125, 1.3828353881835938, -4.27496337890625, 8.680747985839844, 3.0929832458496094, 8.244331359863281, -2.2526168823242188, 2.3819503784179688, 5.167022705078125, 1.5876388549804688, 8.452077865600586, 0.3314838409423828, -8.779529571533203, -3.1808319091796875, 6.272668838500977, -0.8624038696289062, 3.3827438354492188, 3.378570556640625, 2.8819847106933594, 5.1405487060546875, 6.0482025146484375, 8.811830520629883, 2.384265899658203, -1.536651611328125, 8.03204345703125, 1.0920848846435547, -2.6359081268310547, 3.9678306579589844, 1.5178489685058594, 6.9547576904296875, -2.100360870361328, -11.438125610351562, 14.067276000976562, 5.428291320800781, -1.3429412841796875, -1.1446876525878906, 12.335197448730469, 19.03384017944336, 1.5522880554199219, 12.369728088378906, 5.291839599609375, 12.138740539550781, 10.828643798828125, 2.9637298583984375, 5.953346252441406, 17.58039093017578, 6.530229568481445, 1.0855941772460938, 6.249755859375, 14.130043029785156], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000456.npy"}
{"epoch": 0.6893424036281179, "step": 457, "batch_size": 64, "mean": 4.321967601776123, "std": 5.734006881713867, "min": -7.82342529296875, "p10": -2.0856410980224607, "median": 3.248727798461914, "p90": 12.113709259033204, "max": 17.3826904296875, "pos_frac": 0.75, "sample": [2.3192520141601562, -4.5003662109375, 0.5727615356445312, 6.301357269287109, 6.50300407409668, -2.1089859008789062, 6.078275680541992, 0.8159332275390625, -1.3479461669921875, 1.3510208129882812, 2.795797348022461, 17.3826904296875, 6.881223678588867, 12.022689819335938, -3.2892208099365234, -1.8809738159179688, 11.53989028930664, 5.34881591796875, 0.4279956817626953, 9.955154418945312, 14.194709777832031, 5.631937026977539, 16.146453857421875, -1.720733642578125, -0.11036300659179688, 7.084753036499023, -2.183563232421875, 9.475448608398438, -7.82342529296875, 1.3386497497558594, -2.3466796875, -5.5418701171875, 1.8007354736328125, 8.440673828125, 5.204250335693359, 0.2493133544921875, -0.601959228515625, 10.2406005859375, 0.200836181640625, 3.339385986328125, 10.157859802246094, -2.031169891357422, 9.097635269165039, 15.532234191894531, -1.4541893005371094, 0.8021926879882812, 9.738433837890625, -0.7846412658691406, 6.615081787109375, 13.790990829467773, 12.152717590332031, 3.158069610595703, 13.25823974609375, 1.8757591247558594, 5.884449005126953, 11.656961441040039, 7.919776916503906, 2.705169677734375, 4.163122177124023, -0.2577934265136719, 4.7072296142578125, 2.7788925170898438, 4.635274887084961, 0.3161277770996094], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000457.npy"}
{"epoch": 0.690854119425548, "step": 458, "batch_size": 64, "mean": 5.3481597900390625, "std": 6.848968982696533, "min": -14.865524291992188, "p10": -2.798225593566894, "median": 4.627769470214844, "p90": 14.325132369995117, "max": 20.347919464111328, "pos_frac": 0.78125, "sample": [4.266510009765625, 8.372764587402344, 1.473470687866211, 9.715316772460938, 5.278434753417969, 13.508281707763672, 3.9049758911132812, -14.865524291992188, -0.6047420501708984, 1.3632278442382812, -1.7452983856201172, -4.0024566650390625, -5.7739105224609375, 14.353805541992188, 14.258228302001953, 2.3793506622314453, 9.710868835449219, 10.960672378540039, 4.257640838623047, 7.064697265625, 4.083105087280273, 6.3795166015625, 6.084438323974609, -1.0427932739257812, 10.728591918945312, 4.9890289306640625, 9.668289184570312, 7.1505126953125, 2.9535675048828125, 10.600250244140625, 13.1436767578125, 2.2999706268310547, 2.3287391662597656, -5.051155090332031, 3.7882843017578125, -0.8483314514160156, 4.027275085449219, 4.0868682861328125, 9.124923706054688, 20.347919464111328, 6.076498031616211, 3.992992401123047, -9.998062133789062, -0.016326904296875, 13.42364501953125, 2.0986557006835938, 5.055837631225586, -0.5731487274169922, 17.789358139038086, 15.46490478515625, -3.607999801635742, 13.266555786132812, 10.235794067382812, 7.214906692504883, -3.015104293823242, 18.99980354309082, 1.627960205078125, 14.611137390136719, -2.29217529296875, 0.651031494140625, 4.038721084594727, 6.97052001953125, 15.246612548828125, 6.3011322021484375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000458.npy"}
{"epoch": 0.6923658352229781, "step": 459, "batch_size": 64, "mean": 5.993643760681152, "std": 6.7308197021484375, "min": -6.361808776855469, "p10": -2.937005043029785, "median": 5.349233627319336, "p90": 16.847121238708496, "max": 21.427997589111328, "pos_frac": 0.796875, "sample": [5.801483154296875, 10.223148345947266, -5.736896514892578, 6.242408752441406, -3.7063827514648438, -0.090972900390625, 3.790679931640625, 9.81781005859375, 4.552513122558594, 6.975996017456055, 14.58721923828125, -2.068145751953125, 10.953155517578125, -3.024921417236328, 16.69245147705078, 17.96240234375, 4.561119079589844, 7.989532470703125, 4.818534851074219, 6.59382438659668, 8.40240478515625, 8.116409301757812, 3.342632293701172, 2.7730178833007812, 2.269102096557617, 12.222297668457031, 2.699859619140625, 8.437301635742188, 7.080854415893555, 13.364849090576172, 17.480392456054688, 0.34218597412109375, -2.7318668365478516, 2.534770965576172, -0.01476287841796875, 4.627439498901367, 5.374835968017578, -4.825874328613281, 12.861433029174805, -3.3846511840820312, 1.5204830169677734, 11.418472290039062, 7.890726089477539, 2.5990123748779297, 17.794281005859375, -0.7210235595703125, 10.149948120117188, -0.5258026123046875, -6.361808776855469, 16.913408279418945, 0.23603439331054688, 4.06695556640625, -5.229887008666992, 13.027816772460938, 6.14801025390625, 0.128265380859375, 7.164691925048828, 4.7659912109375, 17.270652770996094, 12.339393615722656, 17.17742919921875, 1.1609344482421875, 21.427997589111328, 5.323631286621094], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000459.npy"}
{"epoch": 0.6938775510204082, "step": 460, "batch_size": 64, "mean": 6.508405685424805, "std": 7.194154739379883, "min": -4.963798522949219, "p10": -1.0558277130126954, "median": 4.886220932006836, "p90": 17.44034194946289, "max": 22.49969482421875, "pos_frac": 0.78125, "sample": [15.366622924804688, 11.056900024414062, 22.49969482421875, -0.9311370849609375, 2.7090530395507812, 17.476699829101562, 18.529586791992188, 3.10345458984375, 16.204727172851562, -1.0604705810546875, -0.28876495361328125, 3.8465499877929688, 1.8477401733398438, 3.8041839599609375, 0.0089111328125, 1.479949951171875, 4.916219711303711, 10.33110237121582, -1.0449943542480469, 20.455202102661133, 1.8869342803955078, 18.7374267578125, 7.1904754638671875, 10.255302429199219, 18.13589859008789, 10.310653686523438, 5.712791442871094, 4.1276092529296875, 10.500221252441406, 3.3834762573242188, 5.58013916015625, 16.535568237304688, 11.506242752075195, 4.0687103271484375, 0.8696117401123047, 5.48504638671875, 4.888164520263672, 2.46356201171875, 2.412628173828125, 2.518707275390625, 17.25728416442871, 11.049057006835938, -0.7816238403320312, -0.04566192626953125, 12.091083526611328, 17.355506896972656, -4.840492248535156, -4.963798522949219, 4.88427734375, -4.6810760498046875, -1.533487319946289, 5.0396270751953125, 13.954078674316406, 1.1102409362792969, -1.6086196899414062, 9.147865295410156, 1.0097732543945312, 16.036956787109375, -3.71466064453125, -0.6378021240234375, 10.1815185546875, 6.12237548828125, -0.465240478515625, 17.690387725830078], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000460.npy"}
{"epoch": 0.6953892668178382, "step": 461, "batch_size": 64, "mean": 5.748613357543945, "std": 6.928276538848877, "min": -9.688385009765625, "p10": -1.9583436965942382, "median": 5.01622200012207, "p90": 15.323514556884769, "max": 21.771759033203125, "pos_frac": 0.765625, "sample": [9.260566711425781, 6.893375396728516, 3.7840194702148438, 4.280691146850586, 17.004013061523438, 11.145538330078125, 7.008026123046875, 9.662178039550781, 11.689117431640625, -1.633941650390625, -9.688385009765625, 8.81549072265625, 7.3303985595703125, 2.522695541381836, 8.844165802001953, 0.13867950439453125, -0.95892333984375, 14.699800491333008, 1.8043498992919922, 3.706563949584961, 5.8668060302734375, 13.006006240844727, 11.283805847167969, 5.188148498535156, 2.7733707427978516, 12.251871109008789, -4.237548828125, 1.4350299835205078, -2.4863433837890625, 3.6142730712890625, 7.42950439453125, -0.999908447265625, -7.821306228637695, 4.844295501708984, 5.593101501464844, 10.7283935546875, 17.285064697265625, 3.5673980712890625, 2.0943374633789062, 21.771759033203125, 8.825485229492188, 15.829450607299805, 11.098522186279297, 3.5873641967773438, 1.5528507232666016, -1.6456890106201172, 3.4093093872070312, -2.0096607208251953, 10.111438751220703, 6.529670715332031, 11.011383056640625, 8.001306533813477, 1.1445236206054688, 1.8208847045898438, 14.70843505859375, -1.3187789916992188, 19.693893432617188, -3.393810272216797, -1.8386039733886719, -3.4890480041503906, 15.587120056152344, -1.051849365234375, 21.25891876220703, -1.0083351135253906], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000461.npy"}
{"epoch": 0.6969009826152683, "step": 462, "batch_size": 64, "mean": 6.363646507263184, "std": 6.627534866333008, "min": -9.928791046142578, "p10": -1.2419013977050775, "median": 5.943666458129883, "p90": 14.87487411499024, "max": 26.033172607421875, "pos_frac": 0.859375, "sample": [15.764223098754883, 15.928531646728516, 15.789260864257812, 10.323234558105469, 0.561279296875, 5.274749755859375, 6.566741943359375, 3.8476715087890625, -1.5010299682617188, 1.2185821533203125, -2.1043472290039062, 18.65662384033203, 12.804489135742188, 3.7514572143554688, 7.8906707763671875, 15.652030944824219, 0.8220596313476562, 10.919998168945312, 2.4615631103515625, 8.067337036132812, 5.659046173095703, 4.1395721435546875, 13.7227783203125, 5.6319732666015625, 7.1029510498046875, 2.4335708618164062, 9.344001770019531, 10.472335815429688, 0.9517745971679688, 10.028339385986328, 1.849578857421875, -0.63726806640625, 7.654121398925781, 9.869342803955078, 15.368629455566406, -6.243682861328125, -1.504486083984375, 8.789962768554688, 7.787384033203125, 3.7783889770507812, 11.252471923828125, 12.388191223144531, 8.295524597167969, 2.187908172607422, 6.2282867431640625, 11.222732543945312, 4.4286651611328125, 5.206516265869141, 13.215999603271484, 9.304597854614258, 3.4117050170898438, -9.8216552734375, 12.251487731933594, 12.345182418823242, 12.768455505371094, 0.6974258422851562, 2.8270435333251953, 26.033172607421875, 4.700355529785156, 0.7851657867431641, -9.928791046142578, -1.9179267883300781, -0.5770053863525391, 1.0744400024414062], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000462.npy"}
{"epoch": 0.6984126984126984, "step": 463, "batch_size": 64, "mean": 4.794564723968506, "std": 5.741898059844971, "min": -6.6280517578125, "p10": -1.7856433868408204, "median": 3.398958206176758, "p90": 12.255147552490236, "max": 20.027084350585938, "pos_frac": 0.75, "sample": [-0.354766845703125, 9.727447509765625, 10.037887573242188, -6.6280517578125, 10.502201080322266, 7.734010696411133, -1.1790332794189453, 16.139263153076172, 7.65869140625, 6.672735214233398, -1.7876358032226562, 1.9826812744140625, 2.0565185546875, 12.348304748535156, 10.610107421875, -2.17108154296875, -0.6996269226074219, 1.8206939697265625, -0.5688037872314453, 1.6775894165039062, -1.0491085052490234, 0.2034912109375, -0.1303558349609375, 14.780426025390625, 7.774356842041016, 10.207260131835938, 3.44866943359375, 7.814247131347656, -4.849143981933594, -1.7809944152832031, 3.500030517578125, 2.3785171508789062, 13.59417724609375, 12.921863555908203, 2.7097301483154297, 10.6400146484375, 6.153709411621094, 3.169036865234375, 3.338165283203125, 12.386489868164062, 3.3492469787597656, 2.2722702026367188, 1.412689208984375, 2.0170135498046875, -1.6739730834960938, -2.600433349609375, 20.027084350585938, 2.8114166259765625, 12.03778076171875, 9.738128662109375, 7.256416320800781, -4.385215759277344, 4.088951110839844, -0.8550796508789062, 1.941314697265625, 10.521774291992188, 11.455066680908203, 8.532875061035156, 8.4549560546875, 5.0817413330078125, -3.0691986083984375, 5.119041442871094, 0.017614364624023438, 10.510944366455078], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000463.npy"}
{"epoch": 0.6999244142101285, "step": 464, "batch_size": 64, "mean": 4.808414459228516, "std": 7.215304374694824, "min": -12.519256591796875, "p10": -3.860200691223144, "median": 3.7323379516601562, "p90": 14.351169967651368, "max": 19.476722717285156, "pos_frac": 0.765625, "sample": [-2.407867431640625, 2.471588134765625, 7.935413360595703, -6.700279235839844, 12.824180603027344, -0.6264572143554688, -4.436042785644531, 13.78704833984375, 9.484939575195312, 4.067047119140625, 11.701316833496094, -3.0980377197265625, 15.358474731445312, 5.884225845336914, 3.40130615234375, -3.087677001953125, 9.332366943359375, 2.653034210205078, 3.9526138305664062, 8.338386535644531, 3.06756591796875, 17.5325927734375, -4.18684196472168, 7.191337585449219, 1.3004226684570312, 11.002105712890625, 3.092060089111328, 14.12070083618164, 9.18215560913086, 7.0573272705078125, 2.00518798828125, 0.16984939575195312, 14.42502212524414, 15.288772583007812, -12.519256591796875, 4.1568145751953125, 13.438440322875977, 6.219841003417969, -6.423828125, -4.5999603271484375, 15.969959259033203, 2.3687782287597656, 3.4560489654541016, 13.888984680175781, 0.8341751098632812, 2.4067840576171875, 1.736541748046875, 1.919774055480957, -2.6199302673339844, 14.178848266601562, 6.117473602294922, 13.546279907226562, -12.286468505859375, -2.3168888092041016, 1.8912353515625, -2.6789588928222656, 3.5120620727539062, 14.929807662963867, 4.884204864501953, 19.476722717285156, -2.215974807739258, 6.720247268676758, 8.70944595336914, 0.9534759521484375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000464.npy"}
{"epoch": 0.7014361300075586, "step": 465, "batch_size": 64, "mean": 4.909489631652832, "std": 6.993549823760986, "min": -7.577808380126953, "p10": -4.950488281249999, "median": 3.9485902786254883, "p90": 14.301530838012695, "max": 18.374736785888672, "pos_frac": 0.71875, "sample": [-6.036399841308594, 10.214645385742188, 6.799507141113281, -1.8229827880859375, 8.15185546875, 8.347297668457031, -0.9315109252929688, 18.374736785888672, -2.329742431640625, 9.397621154785156, 2.3082199096679688, 13.421974182128906, 12.34857177734375, 17.65960693359375, -5.492425918579102, 14.297378540039062, -5.8850250244140625, 14.30331039428711, 10.412078857421875, 3.2377243041992188, 0.71490478515625, -6.88592529296875, 9.000638961791992, 15.01348876953125, 10.221817016601562, 14.131240844726562, -1.226470947265625, -5.171478271484375, -1.39630126953125, 3.0793113708496094, 2.822601318359375, -4.434844970703125, 1.135467529296875, 8.964149475097656, 7.1543121337890625, 4.712806701660156, -1.9870014190673828, -0.9412040710449219, -2.098348617553711, 6.317584991455078, -7.175952911376953, 2.4165725708007812, 3.565034866333008, 6.3284759521484375, -1.8095626831054688, -7.577808380126953, 0.8340797424316406, 3.952699661254883, 6.687137603759766, 17.800853729248047, 1.2520065307617188, 13.390777587890625, 3.9444808959960938, 15.571428298950195, 3.9556732177734375, -1.1360301971435547, 3.5541534423828125, 3.3370914459228516, 6.190864562988281, 8.011322021484375, 18.00079345703125, 3.9169063568115234, 11.431816101074219, 11.861343383789062], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000465.npy"}
{"epoch": 0.7029478458049887, "step": 466, "batch_size": 64, "mean": 3.300107479095459, "std": 5.650628566741943, "min": -9.23193359375, "p10": -3.3329845428466798, "median": 2.9616384506225586, "p90": 10.492823600769047, "max": 17.379257202148438, "pos_frac": 0.71875, "sample": [-0.8705215454101562, 8.71133041381836, -5.1131134033203125, -5.3252105712890625, 4.255701065063477, 0.770050048828125, -0.9848480224609375, 1.9260215759277344, 3.0105438232421875, -9.23193359375, 5.331153869628906, 5.038116455078125, 12.367729187011719, -3.2985191345214844, 3.8161849975585938, -4.164703369140625, 5.216041564941406, 15.952392578125, 6.034759521484375, 14.9500732421875, 8.007678985595703, 5.587123870849609, 12.362823486328125, -1.0108966827392578, 3.413909912109375, 1.8548583984375, 7.698661804199219, -2.8030357360839844, 17.379257202148438, 3.9267959594726562, 0.6699447631835938, -1.21160888671875, -1.8279533386230469, 3.2367706298828125, 6.784824371337891, 4.841949462890625, 9.795257568359375, 1.734161376953125, -3.3477554321289062, 3.120868682861328, 0.5173263549804688, -0.029296875, -1.447000503540039, -7.574466705322266, 9.29412841796875, 1.1461181640625, -0.9323577880859375, 6.061614990234375, 17.05309295654297, 1.352874755859375, -5.901039123535156, 7.3000640869140625, -0.8056106567382812, 7.3731536865234375, 2.9127330780029297, 1.609201431274414, 3.873626708984375, 2.3257884979248047, 2.241384506225586, 6.300285339355469, 10.791780471801758, 4.8341827392578125, 2.6073989868164062, 1.6970138549804688], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000466.npy"}
{"epoch": 0.7044595616024187, "step": 467, "batch_size": 64, "mean": 5.451963424682617, "std": 6.389036178588867, "min": -8.026071548461914, "p10": -1.6945920944213866, "median": 5.042819023132324, "p90": 14.278576469421386, "max": 23.065967559814453, "pos_frac": 0.796875, "sample": [2.3896827697753906, 23.065967559814453, 4.364616394042969, 7.88348388671875, -0.0491943359375, 1.8835391998291016, -1.5790634155273438, -1.5427989959716797, 6.725055694580078, 7.928447723388672, 7.32818603515625, -4.113666534423828, 12.181610107421875, -2.2859935760498047, 10.336502075195312, 14.950216293334961, 1.0583992004394531, 3.4686355590820312, 1.6706924438476562, 2.4416255950927734, 7.3306121826171875, -4.0348968505859375, 0.8494529724121094, 17.170883178710938, 14.2606201171875, 0.67852783203125, 9.359237670898438, 8.554828643798828, -8.026071548461914, 10.066009521484375, -1.7441043853759766, 12.0740966796875, 2.378080368041992, 7.1281585693359375, -2.310699462890625, 4.063873291015625, 7.7222442626953125, 11.600051879882812, 5.3246307373046875, 5.97216796875, 0.032306671142578125, 0.29424476623535156, -0.88629150390625, 5.232969284057617, 2.1664581298828125, 1.9258575439453125, -0.8937835693359375, -0.011474609375, 8.605138778686523, 14.286272048950195, 22.496826171875, 4.142425537109375, 7.423095703125, 0.7488746643066406, 5.973665237426758, 4.852668762207031, -3.8279647827148438, 16.133590698242188, 3.858631134033203, 16.51605224609375, 6.18841552734375, 8.220733642578125, 12.643360137939453, 6.279937744140625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000467.npy"}
{"epoch": 0.7059712773998488, "step": 468, "batch_size": 64, "mean": 7.052037239074707, "std": 7.337891101837158, "min": -5.467643737792969, "p10": -1.7734695434570311, "median": 5.460711479187012, "p90": 17.17345142364502, "max": 27.316516876220703, "pos_frac": 0.859375, "sample": [6.558652877807617, 0.8006725311279297, 6.691186904907227, 19.363235473632812, 5.470342636108398, -0.9222011566162109, 0.27523040771484375, 12.800090789794922, 6.8978271484375, 3.5159149169921875, 10.453203201293945, 18.49120330810547, 12.812576293945312, -4.764055252075195, 5.451080322265625, 11.502220153808594, 0.8104476928710938, -3.372894287109375, -1.783782958984375, 8.289337158203125, 18.280845642089844, 0.61541748046875, 12.301483154296875, 17.322250366210938, 2.878316879272461, 4.727289199829102, 19.25429344177246, 2.383943557739258, -2.603759765625, -5.467643737792969, 16.163455963134766, 1.40185546875, -1.7494049072265625, -4.732431411743164, 3.476114273071289, -2.900167465209961, 8.22076416015625, 8.283859252929688, 0.3623809814453125, 13.905200958251953, 3.7554855346679688, 2.58221435546875, 10.401191711425781, 16.75384521484375, 11.07528305053711, 3.188343048095703, 4.372594833374023, 7.965583801269531, 4.09797477722168, 4.537557601928711, 16.82625389099121, 13.464088439941406, 9.945323944091797, 5.3948211669921875, 23.245468139648438, 2.235015869140625, 1.5478363037109375, 27.316516876220703, 9.652732849121094, 13.01424789428711, 15.259565353393555, 10.461090087890625, 2.044506072998047, 0.7284927368164062], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000468.npy"}
{"epoch": 0.7074829931972789, "step": 469, "batch_size": 64, "mean": 5.594452857971191, "std": 6.799135208129883, "min": -6.821588516235352, "p10": -1.220736503601074, "median": 3.756319046020508, "p90": 14.35146179199219, "max": 23.385713577270508, "pos_frac": 0.78125, "sample": [4.641487121582031, 10.673938751220703, 10.043258666992188, 5.9425506591796875, 2.0902023315429688, 1.7530269622802734, 2.1021270751953125, 2.642885208129883, 9.724369049072266, -4.392852783203125, 13.603385925292969, 9.366283416748047, 8.410110473632812, -2.5821189880371094, 0.38702392578125, 11.471986770629883, 3.9565048217773438, 3.1126232147216797, 8.512763977050781, 15.517284393310547, 14.672065734863281, 2.4927406311035156, 3.4292373657226562, 0.8056259155273438, 9.848556518554688, -4.714597702026367, 3.556133270263672, 1.8516044616699219, -1.050699234008789, 2.8835296630859375, -3.6266403198242188, -0.022676467895507812, 13.468780517578125, 22.47835922241211, 12.853759765625, -1.0037994384765625, -0.03837394714355469, -0.17209625244140625, 12.755743026733398, 20.2414493560791, 13.073638916015625, 7.882698059082031, 3.510843276977539, 7.2195281982421875, 0.7713165283203125, 0.7169647216796875, 4.5706329345703125, -0.911376953125, 23.385713577270508, 6.1099700927734375, 1.1775970458984375, 0.7844276428222656, 21.37802505493164, 0.5576190948486328, 15.30999755859375, -2.6838417053222656, 8.33741569519043, 5.286785125732422, 7.674343109130859, -6.821588516235352, 5.821098327636719, -0.19780921936035156, -1.293609619140625, 8.699058532714844], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000469.npy"}
{"epoch": 0.708994708994709, "step": 470, "batch_size": 64, "mean": 5.169079780578613, "std": 6.4647417068481445, "min": -16.86449432373047, "p10": -2.2997768402099608, "median": 5.4086809158325195, "p90": 12.667165374755859, "max": 19.10610580444336, "pos_frac": 0.734375, "sample": [9.693992614746094, 8.843330383300781, 19.10610580444336, -16.86449432373047, 3.0785903930664062, 14.24267578125, 11.533088684082031, 3.5005722045898438, 12.673049926757812, 10.647445678710938, 8.213226318359375, -4.400459289550781, 11.568946838378906, 5.419395446777344, 10.278861999511719, 2.793834686279297, 12.653434753417969, -0.7825336456298828, -0.16181182861328125, 7.953516006469727, -1.1440925598144531, 8.744232177734375, 3.541088104248047, -3.3819580078125, 12.522102355957031, 4.042091369628906, 17.894989013671875, 8.260711669921875, 1.5571937561035156, -2.369009017944336, 6.461006164550781, -0.21565818786621094, 11.237983703613281, 2.269287109375, -2.138235092163086, -0.19986343383789062, 2.4073448181152344, 0.7716770172119141, 5.745277404785156, -4.038860321044922, 5.8107147216796875, 8.828109741210938, 12.425453186035156, 1.149383544921875, -1.7021560668945312, -2.9619369506835938, 5.397966384887695, 9.251060485839844, 7.9829864501953125, 14.131912231445312, 2.3886985778808594, 11.182472229003906, -3.0904998779296875, 6.369564056396484, 1.6856460571289062, 7.61077880859375, 2.6132030487060547, 14.277107238769531, -1.1775283813476562, 4.502595901489258, 15.40164566040039, -0.5175762176513672, 8.329998016357422, -1.0265617370605469], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000470.npy"}
{"epoch": 0.7105064247921391, "step": 471, "batch_size": 64, "mean": 6.202991485595703, "std": 7.400732517242432, "min": -18.707626342773438, "p10": -2.0559085845947265, "median": 6.645721435546875, "p90": 16.734378814697266, "max": 19.78070068359375, "pos_frac": 0.828125, "sample": [1.2875823974609375, 14.452877044677734, 12.411258697509766, 9.932598114013672, 3.7313079833984375, -2.12860107421875, 11.595603942871094, 7.268779754638672, 17.707542419433594, 13.155776977539062, 10.415611267089844, 3.371612548828125, 17.94293975830078, 0.20599365234375, -5.782978057861328, -0.4151573181152344, 3.6024837493896484, 17.36243438720703, 16.562625885009766, -4.0374908447265625, 2.698741912841797, 10.244319915771484, 9.813835144042969, 7.642570495605469, 10.490028381347656, -18.707626342773438, 8.97944450378418, 8.396270751953125, 18.635154724121094, 0.7377853393554688, 9.973430633544922, 4.319700241088867, 1.2748069763183594, 2.3903884887695312, 4.207569122314453, 3.7023468017578125, 13.759109497070312, 0.4774932861328125, 1.0257492065429688, 16.807987213134766, 6.34539794921875, -2.061370849609375, 15.81155014038086, 13.290771484375, 0.3736419677734375, 1.0131263732910156, 8.582138061523438, 9.229602813720703, -6.284454345703125, 10.441654205322266, -4.634132385253906, 13.212150573730469, 0.46089935302734375, 2.1988143920898438, 7.377204895019531, -0.8429546356201172, 6.946044921875, 9.092321395874023, 2.5520172119140625, 19.78070068359375, 3.5359420776367188, -0.06097412109375, -2.043163299560547, 17.16259765625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000471.npy"}
{"epoch": 0.7120181405895691, "step": 472, "batch_size": 64, "mean": 5.7624406814575195, "std": 7.562279224395752, "min": -10.422210693359375, "p10": -4.573089981079101, "median": 4.723865509033203, "p90": 15.796063804626469, "max": 23.816757202148438, "pos_frac": 0.828125, "sample": [4.1635589599609375, 5.645668029785156, 4.610435485839844, 6.4983978271484375, 13.504310607910156, 10.291610717773438, 1.5888595581054688, 13.441390991210938, 13.867660522460938, 18.413299560546875, 4.483020782470703, 23.816757202148438, 5.321569442749023, -4.3245849609375, 1.4657230377197266, 16.137178421020508, 3.459075927734375, 7.234504699707031, 21.929332733154297, 2.6658477783203125, -5.483192443847656, 5.258766174316406, -4.679592132568359, 11.648597717285156, -1.8535919189453125, 6.9787750244140625, 3.1608123779296875, 16.576194763183594, 18.134178161621094, -2.3517208099365234, 3.8252105712890625, 7.397247314453125, 2.462312698364258, -7.739234924316406, -3.7361488342285156, 9.332901000976562, 10.97320556640625, 13.37176513671875, 1.9041709899902344, 3.6708450317382812, 10.739015579223633, 19.16223907470703, -10.422210693359375, 15.000129699707031, -8.747024536132812, -6.112300872802734, 10.264190673828125, 1.5062637329101562, -5.744804382324219, 6.749931335449219, 2.7874488830566406, 3.1650943756103516, 9.28839111328125, 0.555938720703125, 4.8372955322265625, 0.23624420166015625, 7.775215148925781, 4.362701416015625, 14.122365951538086, 0.17539215087890625, 6.274089813232422, 4.226722717285156, 1.4246349334716797, 14.104135513305664], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000472.npy"}
{"epoch": 0.7135298563869993, "step": 473, "batch_size": 64, "mean": 5.75660514831543, "std": 8.477140426635742, "min": -10.0816650390625, "p10": -4.788209533691406, "median": 4.422090530395508, "p90": 17.47600326538086, "max": 25.70098876953125, "pos_frac": 0.765625, "sample": [17.524734497070312, 17.36229705810547, 0.6720142364501953, -6.340606689453125, -6.219705581665039, 18.215316772460938, 2.1988525390625, 8.547019958496094, 2.4573516845703125, 16.361778259277344, 16.306739807128906, 0.7424488067626953, -1.7467613220214844, 4.6691741943359375, 2.563549041748047, -5.245979309082031, 20.500686645507812, 14.805519104003906, 7.068458557128906, 13.305160522460938, -10.0816650390625, -2.5687637329101562, 11.3546142578125, -3.829608917236328, -4.4640045166015625, 9.961944580078125, 6.488609313964844, 8.93011474609375, 2.636625289916992, 2.6319808959960938, 6.2081298828125, 0.3167133331298828, -7.293708801269531, -4.927154541015625, -4.402751922607422, 12.273740768432617, 25.70098876953125, 20.089942932128906, 4.246784210205078, 1.1958141326904297, 13.697784423828125, 7.630298614501953, 14.98980712890625, 22.656280517578125, 3.168170928955078, 2.242992401123047, 3.106426239013672, 0.734222412109375, 4.5973968505859375, 11.356456756591797, 7.2022552490234375, 2.165019989013672, 11.554388046264648, 4.84100341796875, -0.6678619384765625, 6.346355438232422, 16.730361938476562, -4.3015289306640625, -1.4850616455078125, 1.7028350830078125, 1.6106109619140625, 20.236129760742188, 8.743122100830078, -8.65114974975586], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000473.npy"}
{"epoch": 0.7150415721844293, "step": 474, "batch_size": 64, "mean": 4.675611972808838, "std": 7.490244388580322, "min": -10.928054809570312, "p10": -3.9584730148315423, "median": 3.3116378784179688, "p90": 15.21924457550049, "max": 20.465530395507812, "pos_frac": 0.71875, "sample": [16.12079620361328, 8.684211730957031, 7.280475616455078, -6.433103561401367, -3.288127899169922, -0.9542808532714844, 11.155197143554688, 17.810287475585938, 9.006631851196289, -1.7823410034179688, 1.0957870483398438, 8.416816711425781, -8.04526138305664, 9.691474914550781, 5.999298095703125, 16.10470199584961, 1.110382080078125, 0.30814552307128906, -2.888965606689453, 8.318008422851562, -4.245763778686523, -2.031829833984375, 13.335582733154297, 6.315301895141602, 9.8758544921875, 1.722970962524414, 0.09185409545898438, -10.928054809570312, 1.167144775390625, -0.994049072265625, 13.017318725585938, 3.210590362548828, -7.601858139038086, 2.1696243286132812, -3.280750274658203, 14.6566162109375, 0.19007110595703125, 12.932296752929688, 9.09228515625, -2.4610748291015625, 1.7775897979736328, 3.1052322387695312, 13.259071350097656, 8.261871337890625, -1.380035400390625, 20.465530395507812, 14.161590576171875, 2.3084716796875, 13.880821228027344, -1.7295780181884766, 15.460371017456055, 16.28864288330078, 1.1701736450195312, 4.6413116455078125, 3.4717674255371094, -4.462665557861328, 15.850521087646484, -2.2797164916992188, 3.963258743286133, 7.826873779296875, 12.386837005615234, -9.323646545410156, 2.777923583984375, 3.4126853942871094], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000474.npy"}
{"epoch": 0.7165532879818595, "step": 475, "batch_size": 64, "mean": 7.490861415863037, "std": 6.961335182189941, "min": -4.27056884765625, "p10": -0.5461696624755858, "median": 6.677543640136719, "p90": 17.898830795288085, "max": 22.91773223876953, "pos_frac": 0.84375, "sample": [0.8161334991455078, 4.668331146240234, 7.327293395996094, 5.480388641357422, 10.766227722167969, 5.497875213623047, 12.989198684692383, 17.900447845458984, 6.097114562988281, -1.7566261291503906, 6.785346984863281, 0.660400390625, 14.363998413085938, 17.233856201171875, 6.238653182983398, 9.098907470703125, 9.585298538208008, 6.346363067626953, 4.111228942871094, 9.999626159667969, 3.6084213256835938, 3.8429718017578125, 7.441120147705078, 4.1605224609375, 6.569740295410156, 2.705150604248047, 1.6212921142578125, 17.807022094726562, 6.8316192626953125, -4.27056884765625, 20.03655242919922, 17.895057678222656, 15.396957397460938, 22.91773223876953, 4.262529373168945, -0.039333343505859375, 7.171098709106445, -0.4828758239746094, 7.7421722412109375, -0.32155609130859375, 17.9296875, 10.901336669921875, 0.6903514862060547, 2.0601654052734375, 7.669940948486328, -4.032506942749023, -2.6791343688964844, 5.271238327026367, 21.75726318359375, 1.1828422546386719, 19.758808135986328, 21.349838256835938, 12.658599853515625, 8.936756134033203, 11.031499862670898, -0.5732955932617188, -3.9008255004882812, 12.22381591796875, 13.588394165039062, 11.787113189697266, 3.433788299560547, 4.743461608886719, -3.6563568115234375, 8.176643371582031], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000475.npy"}
{"epoch": 0.7180650037792895, "step": 476, "batch_size": 64, "mean": 4.575405597686768, "std": 7.678650379180908, "min": -10.57415771484375, "p10": -3.710699462890625, "median": 4.568756103515625, "p90": 14.769959831237795, "max": 22.744964599609375, "pos_frac": 0.703125, "sample": [14.919357299804688, 7.760326385498047, -3.0914745330810547, 5.60760498046875, -6.905677795410156, 7.691436767578125, -3.7537612915039062, 16.204910278320312, 22.744964599609375, 2.1310577392578125, 16.62615966796875, -0.42684364318847656, 20.72060203552246, 9.59490966796875, 10.408878326416016, 5.3605804443359375, -0.624908447265625, 2.738800048828125, 6.681312561035156, 5.568267822265625, -3.6102218627929688, 0.4405345916748047, 10.452333450317383, 8.564899444580078, 5.7410125732421875, -9.340309143066406, -2.4563827514648438, 2.284412384033203, 0.7162017822265625, 2.4352970123291016, 20.615619659423828, -9.168777465820312, 5.8909759521484375, -0.5499839782714844, 5.745216369628906, -6.384490966796875, -2.8797149658203125, -10.10675048828125, 10.047523498535156, 1.4826927185058594, -2.498199462890625, 8.203781127929688, 4.944366455078125, 2.2505950927734375, 8.940940856933594, 3.3545989990234375, 4.193145751953125, 0.4557037353515625, -0.13675689697265625, 14.095756530761719, 6.6607818603515625, -10.57415771484375, 11.097747802734375, 1.52410888671875, 10.300106048583984, 19.660423278808594, -1.2957801818847656, 3.1317901611328125, 12.071056365966797, 7.802459716796875, -2.357381820678711, 14.421365737915039, 8.883636474609375, -2.1806983947753906], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000476.npy"}
{"epoch": 0.7195767195767195, "step": 477, "batch_size": 64, "mean": 4.849900245666504, "std": 8.009820938110352, "min": -12.855335235595703, "p10": -6.611124420166015, "median": 5.039166450500488, "p90": 15.066094207763674, "max": 20.079883575439453, "pos_frac": 0.75, "sample": [-7.7265625, -3.7765579223632812, 11.386962890625, 1.1007156372070312, 12.446273803710938, 8.265850067138672, 1.4007720947265625, 0.9044570922851562, 0.1563262939453125, -1.5585708618164062, 13.362358093261719, 5.3917388916015625, -12.025981903076172, 3.0966644287109375, 6.280817031860352, 2.2615833282470703, -2.145263671875, 16.834442138671875, 8.023086547851562, -2.4363327026367188, 8.814437866210938, 16.65943717956543, 6.851593017578125, 7.0867156982421875, 12.124200820922852, -1.0328598022460938, -6.219398498535156, 3.4813919067382812, 11.81494140625, 2.6772727966308594, -9.704025268554688, 2.0717239379882812, 4.686594009399414, 0.10696029663085938, -12.855335235595703, -0.6667022705078125, -7.7376556396484375, 4.1611175537109375, 7.2844390869140625, 4.173809051513672, 20.079883575439453, 0.3457794189453125, 12.769912719726562, 19.880508422851562, 6.699552536010742, 16.49627685546875, -6.7790069580078125, 9.615264892578125, 11.968040466308594, -3.976654052734375, 15.347076416015625, 3.176393508911133, 7.336816787719727, 11.342777252197266, 15.970367431640625, 12.283035278320312, -11.608306884765625, 7.02069091796875, 9.864639282226562, 14.410469055175781, 1.0370674133300781, 12.75677490234375, -1.45928955078125, 10.794122695922852], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000477.npy"}
{"epoch": 0.7210884353741497, "step": 478, "batch_size": 64, "mean": 4.55825138092041, "std": 7.5273661613464355, "min": -8.361709594726562, "p10": -3.4752689361572267, "median": 4.197360992431641, "p90": 16.690741539001465, "max": 21.32185173034668, "pos_frac": 0.65625, "sample": [-0.8902854919433594, 21.32185173034668, 0.7405548095703125, 11.117225646972656, 6.082937240600586, -0.6688461303710938, 16.7742919921875, 11.729400634765625, -1.35870361328125, 10.386154174804688, -1.1254730224609375, 0.00433349609375, 5.795379638671875, 17.363677978515625, -3.440563201904297, 2.5947608947753906, -0.5170745849609375, 4.12738037109375, -2.03778076171875, 6.978599548339844, -2.6239452362060547, -3.0763587951660156, 8.806488037109375, 6.642723083496094, 11.034950256347656, -8.16546630859375, 6.967227935791016, 8.124715805053711, 10.032234191894531, -7.935249328613281, 8.273330688476562, -0.06715202331542969, -8.361709594726562, -8.211380004882812, 1.4544105529785156, 17.49445343017578, 15.259567260742188, -1.235382080078125, 9.704193115234375, 4.274078369140625, 16.495790481567383, -0.3965644836425781, -5.4217529296875, 4.116384506225586, 20.3380126953125, 4.5256500244140625, 0.17867279052734375, 12.895797729492188, 4.267341613769531, -2.2102394104003906, 6.264308929443359, 12.084001541137695, -3.490142822265625, 7.156248092651367, -1.1775436401367188, 17.26494598388672, 17.754589080810547, -1.8779296875, 2.1475753784179688, 5.149085998535156, 0.4853668212890625, -6.533205032348633, 5.18035888671875, 3.1617870330810547], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000478.npy"}
{"epoch": 0.7226001511715797, "step": 479, "batch_size": 64, "mean": 4.54490852355957, "std": 7.770591735839844, "min": -21.44831085205078, "p10": -3.4731983184814452, "median": 4.1741228103637695, "p90": 14.89191513061524, "max": 22.811676025390625, "pos_frac": 0.703125, "sample": [6.088779449462891, 15.999370574951172, -1.5535888671875, -3.384510040283203, -1.6565914154052734, 4.264049530029297, 20.517013549804688, 6.469121932983398, 12.528934478759766, 0.26349639892578125, 4.56370735168457, -5.163841247558594, -1.0234527587890625, -0.8832874298095703, -0.6369476318359375, 8.628860473632812, 13.258270263671875, 11.713973999023438, -0.415435791015625, 1.4251956939697266, -2.147642135620117, 2.4652671813964844, 17.03192138671875, 9.408367156982422, 3.205869674682617, -6.344585418701172, 5.440155029296875, 1.5804595947265625, -5.516448974609375, 0.8924331665039062, 6.19183349609375, 7.735776901245117, 1.0142478942871094, 18.365211486816406, 15.592048645019531, 1.6445941925048828, -9.648323059082031, -2.6370925903320312, 4.609642028808594, 10.834747314453125, 22.811676025390625, -21.44831085205078, -3.5112075805664062, 3.351530075073242, 11.043163299560547, 20.61672592163086, 7.826408386230469, -4.85595703125, 2.2356300354003906, 2.8392715454101562, 7.0919952392578125, 11.66522216796875, -0.33941650390625, 13.223968505859375, 1.2892436981201172, 7.400966644287109, 9.139272689819336, 4.084196090698242, 5.023990631103516, 7.585536956787109, -0.42029380798339844, -1.9597244262695312, 6.706733703613281, 8.75190544128418], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000479.npy"}
{"epoch": 0.7241118669690099, "step": 480, "batch_size": 64, "mean": 5.2290520668029785, "std": 8.067790985107422, "min": -18.532833099365234, "p10": -4.384766387939453, "median": 5.934398651123047, "p90": 14.952583122253419, "max": 22.48931884765625, "pos_frac": 0.765625, "sample": [5.9081878662109375, -7.522369384765625, 8.193439483642578, -5.4014129638671875, 7.72650146484375, 3.299856185913086, 7.941972732543945, -18.532833099365234, 2.5801162719726562, 22.48931884765625, 0.26909446716308594, 3.9278182983398438, 10.999824523925781, -2.046173095703125, 5.129570007324219, -0.94134521484375, -0.904571533203125, 5.960609436035156, -3.4844894409179688, 2.003734588623047, -8.965499877929688, 13.747352600097656, 7.956207275390625, -5.861061096191406, 15.664688110351562, 15.02464485168457, 20.732925415039062, 8.704582214355469, 19.373390197753906, 7.015922546386719, 7.733264923095703, 0.146392822265625, 3.3447723388671875, 6.700328826904297, 7.865652084350586, 6.649431228637695, 11.936485290527344, 6.17523193359375, -16.377212524414062, 5.619926452636719, 18.30011749267578, 11.666206359863281, 7.916584014892578, 6.849956512451172, 3.723480224609375, 14.425823211669922, 3.5700225830078125, -4.296684265136719, 5.581600189208984, 5.452642440795898, 4.637067794799805, -4.422515869140625, 14.784439086914062, 10.130485534667969, 6.792182922363281, 9.984790802001953, 14.0943603515625, 9.25286865234375, 1.9136543273925781, -1.4827289581298828, 1.4350814819335938, 18.269943237304688, -0.42705535888671875, -4.277252197265625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000480.npy"}
{"epoch": 0.7256235827664399, "step": 481, "batch_size": 64, "mean": 5.9973039627075195, "std": 8.107568740844727, "min": -10.241064071655273, "p10": -3.798338317871093, "median": 4.678478240966797, "p90": 16.567068481445315, "max": 29.07049560546875, "pos_frac": 0.765625, "sample": [7.295648574829102, 2.9738922119140625, 22.659889221191406, 2.5802536010742188, -3.9776763916015625, 7.126840591430664, 14.065757751464844, 6.67230224609375, -4.682395935058594, -4.3637847900390625, 16.336761474609375, 4.014640808105469, 8.844009399414062, 11.683979034423828, -1.4214706420898438, 16.665771484375, 8.109306335449219, 5.927989959716797, 8.01055908203125, 11.134902954101562, -9.535968780517578, -0.8782768249511719, 16.948158264160156, 3.452993392944336, 0.5538520812988281, -2.362316131591797, 0.6216888427734375, 11.581405639648438, 2.477832794189453, 9.605506896972656, -0.6809234619140625, 13.028818130493164, 2.45037841796875, 0.98602294921875, 3.2085189819335938, 5.817352294921875, 10.392654418945312, 4.909381866455078, 3.4299278259277344, 4.517578125, -3.9795379638671875, 3.719482421875, 13.430837631225586, 23.642669677734375, -0.3920860290527344, 18.378559112548828, 1.5024566650390625, 2.2130355834960938, 29.07049560546875, 9.978361129760742, 10.723907470703125, 4.839378356933594, 1.1836929321289062, -3.3798828125, -3.1506595611572266, 16.2611083984375, -10.241064071655273, 10.7940673828125, 10.469709396362305, 23.003143310546875, 10.200647354125977, 2.713226318359375, -6.115264892578125, -1.2206153869628906], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000481.npy"}
{"epoch": 0.72713529856387, "step": 482, "batch_size": 64, "mean": 4.719492435455322, "std": 8.176856994628906, "min": -11.98635482788086, "p10": -4.80166015625, "median": 4.950435638427734, "p90": 17.312490844726568, "max": 26.0404052734375, "pos_frac": 0.75, "sample": [-3.1308441162109375, -7.5106048583984375, -0.13266754150390625, 7.014198303222656, 2.6770553588867188, 2.2246932983398438, 6.961158752441406, -10.305503845214844, 0.7134437561035156, 19.57269287109375, 5.4124603271484375, 15.492904663085938, 7.570240020751953, 3.3313751220703125, 7.806774139404297, 5.776023864746094, 10.559967041015625, 6.129615783691406, -11.98635482788086, 7.8955535888671875, 17.989303588867188, 0.0763397216796875, 26.0404052734375, -4.70159912109375, -1.2650165557861328, 9.082113265991211, 18.389244079589844, 9.004878997802734, -1.54364013671875, 1.406219482421875, -2.2482376098632812, 7.517972946166992, -4.619260787963867, 3.867279052734375, 18.87187957763672, 10.586067199707031, 11.527580261230469, 1.1769046783447266, 3.7415504455566406, -4.84454345703125, 18.085479736328125, 4.913238525390625, 6.9515838623046875, 1.79998779296875, 3.5290355682373047, 9.574377059936523, 6.7619781494140625, 5.33906364440918, 23.199050903320312, -11.975067138671875, 7.765373229980469, 1.30303955078125, 8.584178924560547, 4.987632751464844, 10.162830352783203, 0.4327049255371094, -9.486053466796875, 15.733261108398438, -5.656639099121094, -3.8201637268066406, 8.183319091796875, 0.9686012268066406, -1.833648681640625, 0.4167346954345703], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000482.npy"}
{"epoch": 0.7286470143613001, "step": 483, "batch_size": 64, "mean": 6.200436592102051, "std": 8.01875114440918, "min": -17.19330596923828, "p10": -2.0340652465820312, "median": 5.6502180099487305, "p90": 16.306092834472658, "max": 24.073898315429688, "pos_frac": 0.75, "sample": [-1.6250381469726562, 17.06916046142578, -1.7761878967285156, 2.218576431274414, 13.05517578125, 22.15155029296875, 10.518667221069336, 3.25927734375, -1.9163055419921875, -2.439716339111328, 3.320598602294922, 7.598926544189453, 12.8026123046875, 20.356430053710938, -2.502185821533203, 12.21628189086914, 3.4887351989746094, -3.7511062622070312, -0.08777618408203125, 14.673236846923828, 13.008682250976562, 16.574722290039062, 14.467269897460938, 0.6211776733398438, 13.469802856445312, 6.349292755126953, 11.159576416015625, -0.1162261962890625, 3.0324783325195312, 2.331573486328125, 7.296730041503906, -11.818748474121094, 21.982337951660156, 0.6220550537109375, 5.825042724609375, 15.679290771484375, 6.083709716796875, -17.19330596923828, 8.133052825927734, -3.3259201049804688, 2.9363479614257812, 5.475393295288086, 11.586074829101562, 0.621063232421875, -2.08453369140625, 7.847827911376953, 24.073898315429688, 11.574356079101562, 7.9973297119140625, 12.785417556762695, 3.5189590454101562, 9.745376586914062, 3.2368545532226562, 18.178817749023438, -0.5802955627441406, 0.5008983612060547, -1.0708370208740234, 5.471340179443359, 11.553592681884766, 9.266399383544922, 1.0316696166992188, -1.84552001953125, -0.9237251281738281, 13.117721557617188], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000483.npy"}
{"epoch": 0.7301587301587301, "step": 484, "batch_size": 64, "mean": 4.6300859451293945, "std": 6.079267978668213, "min": -7.7806243896484375, "p10": -3.113003921508789, "median": 5.153352737426758, "p90": 11.535152435302738, "max": 21.33948516845703, "pos_frac": 0.75, "sample": [3.2109451293945312, 7.1499481201171875, -7.7806243896484375, 5.083049774169922, 21.33948516845703, 2.2939529418945312, 1.9861297607421875, -4.8744659423828125, 12.686027526855469, -3.453765869140625, 3.5470657348632812, -3.80975341796875, 5.27360725402832, 2.856586456298828, 5.956785202026367, 8.808048248291016, -1.4736900329589844, -2.8218650817871094, 5.030128479003906, 10.688346862792969, -1.3590927124023438, 8.623252868652344, 10.884803771972656, 6.027093887329102, 6.454835891723633, 8.450872421264648, 11.813873291015625, 0.44524383544921875, 8.794429779052734, 1.3639469146728516, 0.055904388427734375, 8.766426086425781, 0.73065185546875, -3.50732421875, 16.421974182128906, -0.4894905090332031, 6.408712387084961, 5.223655700683594, 5.402099609375, 0.7845230102539062, 5.427165985107422, -0.7094268798828125, 19.29391098022461, -0.017852783203125, 13.28012466430664, 3.9866161346435547, 8.146358489990234, 1.3019905090332031, 6.232227325439453, 6.182182312011719, 19.188430786132812, 5.055765151977539, 7.459236145019531, -1.1255035400390625, -2.9401321411132812, -3.187091827392578, 9.154312133789062, -2.232757568359375, 4.442071914672852, 7.56292724609375, 5.42962646484375, 8.697731018066406, 8.35561752319336, -5.650352478027344], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000484.npy"}
{"epoch": 0.7316704459561603, "step": 485, "batch_size": 64, "mean": 5.490420341491699, "std": 7.4919939041137695, "min": -12.823951721191406, "p10": -2.254449081420898, "median": 4.1959686279296875, "p90": 18.06372871398926, "max": 22.241989135742188, "pos_frac": 0.765625, "sample": [5.8724212646484375, 5.645326614379883, 9.97479248046875, 9.004928588867188, 5.007041931152344, 6.352569580078125, 4.022590637207031, 0.6669578552246094, 9.074783325195312, -2.5339794158935547, -12.823951721191406, 3.14697265625, 20.441513061523438, 9.227386474609375, 0.34584808349609375, 9.661087036132812, 4.272886276245117, 17.342479705810547, 22.241989135742188, 4.119050979614258, 1.8984489440917969, 5.8233642578125, -1.035247802734375, 20.140625, -0.7860889434814453, -0.27408599853515625, -10.306938171386719, 3.7880325317382812, 4.0681304931640625, 6.6101837158203125, -3.8796615600585938, -0.7368278503417969, 12.276412963867188, -0.6542758941650391, 9.097053527832031, -4.9548187255859375, 18.230438232421875, 17.674739837646484, 2.2509613037109375, 8.16830062866211, 4.572223663330078, -3.9296131134033203, 6.082466125488281, 2.216339111328125, 3.769489288330078, -2.369861602783203, -1.2022628784179688, 13.604866027832031, 19.392227172851562, 3.0317535400390625, 1.1492576599121094, 4.87908935546875, 8.243568420410156, 19.390554428100586, 5.698757171630859, 16.208751678466797, 5.014812469482422, 19.57388687133789, -1.9851531982421875, -1.8128528594970703, 1.6323089599609375, 2.183910369873047, 4.07257080078125, 3.5083770751953125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000485.npy"}
{"epoch": 0.7331821617535903, "step": 486, "batch_size": 64, "mean": 5.1180195808410645, "std": 8.554729461669922, "min": -12.751968383789062, "p10": -4.148353767395019, "median": 3.438990592956543, "p90": 17.452358436584472, "max": 34.030670166015625, "pos_frac": 0.765625, "sample": [0.8790969848632812, 4.722007751464844, 13.510284423828125, 5.825578689575195, 0.5908203125, -0.7168312072753906, 6.683013916015625, 34.030670166015625, 6.603179931640625, -1.26116943359375, -4.567604064941406, 10.681892395019531, 5.729927062988281, 3.4840145111083984, 10.279945373535156, 3.8947525024414062, 2.4401397705078125, 17.471540451049805, -1.2166786193847656, 7.7307586669921875, 6.308706283569336, 3.4914932250976562, -5.977664947509766, 3.3939666748046875, 22.054229736328125, -6.952384948730469, 5.534637451171875, -6.880130767822266, 10.1895751953125, -6.909446716308594, 14.285919189453125, 0.59033203125, 1.3979949951171875, -11.045417785644531, 6.024843215942383, 1.2286224365234375, 5.319845199584961, 12.547359466552734, -3.170103073120117, 2.608592987060547, 0.9632129669189453, 17.40760040283203, 6.7440338134765625, 22.268325805664062, 17.233753204345703, -1.430816650390625, 2.8532562255859375, 8.615522384643555, 6.2986297607421875, 21.255048751831055, 2.2893829345703125, 2.328845977783203, 3.13800048828125, 17.69500732421875, -2.233386993408203, -0.54473876953125, 2.139892578125, 2.846384048461914, 21.691329956054688, 0.03501701354980469, 7.7248382568359375, -12.751968383789062, 1.4364852905273438, -1.286712646484375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000486.npy"}
{"epoch": 0.7346938775510204, "step": 487, "batch_size": 64, "mean": 5.155494689941406, "std": 6.960051536560059, "min": -10.031009674072266, "p10": -3.298585510253906, "median": 4.022547721862793, "p90": 13.842522430419924, "max": 26.519256591796875, "pos_frac": 0.796875, "sample": [11.207427978515625, -4.304649353027344, 0.8441009521484375, 6.477592468261719, 9.53863525390625, 2.2279815673828125, 1.0149383544921875, 7.913143157958984, 15.15786361694336, 8.12969970703125, 7.1903076171875, -3.0553436279296875, 2.2135848999023438, -2.5514373779296875, -2.2872467041015625, 21.293926239013672, 14.671897888183594, 11.53719711303711, 15.4342041015625, 8.877052307128906, -4.1694488525390625, 3.399871826171875, 2.5057125091552734, 13.110090255737305, 5.574193954467773, -0.7153205871582031, 0.316131591796875, -5.2027587890625, 0.7270774841308594, 1.0533447265625, 20.102394104003906, 6.552276611328125, 7.072385787963867, -3.40283203125, 14.095191955566406, -0.13851165771484375, 4.451959609985352, -0.52410888671875, 2.8470230102539062, 5.301309585571289, 13.252960205078125, 1.2313270568847656, 0.5182342529296875, 7.409427642822266, -3.6151771545410156, 4.044208526611328, 4.000886917114258, 0.7617340087890625, 0.26404380798339844, -10.031009674072266, 0.06692123413085938, 12.104915618896484, 5.591236114501953, 3.2810440063476562, 2.9424514770507812, 13.079299926757812, 3.5888710021972656, 4.84857177734375, -3.589996337890625, 12.775772094726562, 12.988349914550781, 8.896780014038086, 26.519256591796875, 4.534692764282227], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000487.npy"}
{"epoch": 0.7362055933484505, "step": 488, "batch_size": 64, "mean": 6.276533603668213, "std": 7.017754554748535, "min": -6.987348556518555, "p10": -2.0108335494995115, "median": 5.577484130859375, "p90": 15.748757553100587, "max": 27.303482055664062, "pos_frac": 0.8125, "sample": [0.7564468383789062, -1.774810791015625, -1.0218677520751953, -3.7050724029541016, 4.493354797363281, -2.1119861602783203, -3.6539535522460938, 9.425943374633789, 14.782333374023438, 11.41054916381836, 3.592378616333008, 4.491912841796875, 19.960098266601562, 3.0054244995117188, 2.9058761596679688, 27.303482055664062, 6.8720855712890625, 16.011520385742188, 0.05268096923828125, 0.4373435974121094, 7.8461456298828125, 6.167396545410156, 2.9163360595703125, 0.3933258056640625, 3.4527359008789062, 7.0164642333984375, -5.1964263916015625, 15.861927032470703, 2.0820541381835938, 13.036552429199219, 3.4443359375, -1.6122589111328125, -0.10163116455078125, 19.261932373046875, 9.199142456054688, 3.257822036743164, 14.23558235168457, 0.2742195129394531, 15.484695434570312, 12.715911865234375, 12.210906982421875, 6.109775543212891, 10.527755737304688, 6.194915771484375, 11.83489990234375, 4.969261169433594, -0.29961395263671875, 5.665191650390625, -5.42486572265625, 4.28973388671875, 5.489776611328125, 6.774196624755859, -6.987348556518555, 10.100418090820312, 6.766593933105469, 11.223472595214844, 13.981353759765625, 11.447010040283203, 15.865802764892578, 9.737186431884766, 3.356109619140625, 1.8113250732421875, -3.9802017211914062, 17.064483642578125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000488.npy"}
{"epoch": 0.7377173091458806, "step": 489, "batch_size": 64, "mean": 3.3096535205841064, "std": 6.726823329925537, "min": -10.262125015258789, "p10": -5.336106872558593, "median": 2.070720672607422, "p90": 11.859014129638673, "max": 18.135650634765625, "pos_frac": 0.71875, "sample": [-10.262125015258789, 8.032661437988281, 6.927490234375, 8.536460876464844, 12.117725372314453, 11.580764770507812, -0.2748870849609375, 6.5212249755859375, 9.4615478515625, 16.06439208984375, 9.006782531738281, 0.9480361938476562, 10.599761962890625, -7.882266998291016, -1.8891525268554688, 2.5028457641601562, -5.649436950683594, 0.21999740600585938, 10.9727783203125, 5.140378952026367, 18.135650634765625, 10.146095275878906, 2.3347549438476562, 0.3073921203613281, 8.023536682128906, 11.978263854980469, -3.2407608032226562, -3.2574081420898438, 12.553985595703125, 2.01947021484375, -4.605003356933594, -7.078346252441406, 16.472808837890625, 3.023235321044922, 1.3063812255859375, -1.778676986694336, 0.0300750732421875, 1.1469917297363281, 9.595436096191406, 2.1219711303710938, 2.504209518432617, 3.7071380615234375, 0.28156280517578125, 11.02386474609375, 6.2468109130859375, -5.830715179443359, 9.127433776855469, 1.03851318359375, 6.1019134521484375, 7.4298248291015625, -7.894798278808594, 0.16521453857421875, 1.1591720581054688, 0.5151023864746094, -3.2245616912841797, -8.9476318359375, -1.804534912109375, 1.4397201538085938, -3.2228927612304688, 0.8408203125, -2.8025779724121094, 17.615997314453125, -0.7096996307373047, 5.1471099853515625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000489.npy"}
{"epoch": 0.7392290249433107, "step": 490, "batch_size": 64, "mean": 4.216080188751221, "std": 8.430861473083496, "min": -14.4393310546875, "p10": -5.672647857666015, "median": 4.010250091552734, "p90": 15.027788162231447, "max": 22.45784568786621, "pos_frac": 0.65625, "sample": [-0.6457252502441406, -5.8159027099609375, -12.775543212890625, 9.333061218261719, -2.4620742797851562, 5.386701583862305, -3.9224395751953125, 12.795272827148438, 19.97998046875, 14.56231689453125, -13.87261962890625, 15.227275848388672, -14.4393310546875, 3.255647659301758, 8.315654754638672, 9.665281295776367, 13.380470275878906, 5.7659759521484375, 17.08990478515625, -0.6753921508789062, 1.2527084350585938, 9.887451171875, 9.859903335571289, 12.027565002441406, -5.338386535644531, 2.745849609375, 22.45784568786621, 1.0903358459472656, 1.6353721618652344, 4.2025604248046875, -10.147216796875, 7.4855194091796875, 5.8764801025390625, 9.902519226074219, -1.8091907501220703, -1.867645263671875, -0.25887298583984375, 18.378271102905273, -5.886102676391602, -0.6686439514160156, 7.150138854980469, -0.37506103515625, 2.7724533081054688, 10.945100784301758, -3.215940475463867, 6.816402435302734, -1.4709548950195312, -1.1700458526611328, -7.1609344482421875, 0.41663360595703125, 0.342071533203125, 5.119501113891602, 0.6040916442871094, 7.192607879638672, 21.236663818359375, 3.8179397583007812, 13.892511367797852, 5.366004943847656, 5.039802551269531, 16.781288146972656, -4.313030242919922, -4.407566070556641, 14.44012451171875, 9.034481048583984], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000490.npy"}
{"epoch": 0.7407407407407407, "step": 491, "batch_size": 64, "mean": 4.9876275062561035, "std": 6.460248947143555, "min": -10.75927734375, "p10": -2.753256225585937, "median": 4.273133277893066, "p90": 13.905841064453128, "max": 16.764801025390625, "pos_frac": 0.765625, "sample": [14.607168197631836, -1.1459197998046875, 15.092412948608398, 7.564849853515625, 9.249359130859375, 6.4075469970703125, 0.3761100769042969, 10.202590942382812, 11.130561828613281, -0.014087677001953125, 1.7559661865234375, -5.368827819824219, 8.664989471435547, -0.2706146240234375, 16.356050491333008, -1.2087478637695312, 1.0599899291992188, 9.93896484375, 2.604595184326172, 7.8115234375, 0.0641326904296875, 6.94451904296875, 11.17121696472168, 4.637134552001953, 3.1054248809814453, 14.290092468261719, 2.8648681640625, 1.3132247924804688, 2.1145553588867188, 2.706134796142578, 14.846298217773438, 10.79620361328125, -2.8975067138671875, 5.113273620605469, 16.764801025390625, 6.485767364501953, -2.4166717529296875, 11.812118530273438, -0.653228759765625, -3.068450927734375, 15.319454193115234, -0.2174835205078125, 1.2672977447509766, 2.7136402130126953, 1.9277687072753906, 12.547412872314453, -4.588043212890625, -8.4512939453125, 13.009254455566406, 11.0697021484375, 6.033897399902344, 10.229450225830078, 12.322452545166016, 11.772041320800781, 3.187835693359375, 1.3896636962890625, 5.179229736328125, 1.5593376159667969, 9.21440315246582, -10.75927734375, -6.157800674438477, 3.9091320037841797, -2.121448516845703, 8.04315185546875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000491.npy"}
{"epoch": 0.7422524565381708, "step": 492, "batch_size": 64, "mean": 4.251875877380371, "std": 7.826610565185547, "min": -15.178218841552734, "p10": -5.669305229187011, "median": 3.7709827423095703, "p90": 13.856035804748537, "max": 23.145309448242188, "pos_frac": 0.671875, "sample": [16.707504272460938, 3.7381591796875, -4.338680267333984, 2.5836830139160156, 13.914680480957031, -5.125406265258789, 11.77215576171875, 3.4457530975341797, 6.373685836791992, -0.8933982849121094, 10.018661499023438, 8.3218994140625, 5.323982238769531, 0.0264129638671875, 6.462654113769531, 4.950225830078125, -12.575065612792969, 4.1145477294921875, 1.3089179992675781, 11.940338134765625, 8.988086700439453, -0.9268112182617188, 2.4468307495117188, -1.06732177734375, 4.690452575683594, 3.8038063049316406, 3.8312034606933594, 23.145309448242188, -0.499755859375, -6.028804779052734, 15.275184631347656, 9.403501510620117, -0.8808364868164062, 6.8780364990234375, -0.9017295837402344, -0.34259033203125, -5.90240478515625, 21.319664001464844, -7.20404052734375, -3.3027572631835938, 13.079010009765625, 6.381965637207031, 11.647071838378906, -2.1409149169921875, -0.764129638671875, 11.029190063476562, 13.634563446044922, 3.1055030822753906, 0.31409454345703125, 13.719198226928711, -15.178218841552734, -6.2427520751953125, 0.773651123046875, 10.852622985839844, 10.995529174804688, 1.7837715148925781, 16.498249053955078, 6.6427154541015625, 10.182788848876953, -1.200408935546875, 2.2347335815429688, -6.976448059082031, -5.0887603759765625, 16.041316986083984], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000492.npy"}
{"epoch": 0.7437641723356009, "step": 493, "batch_size": 64, "mean": 5.958076000213623, "std": 7.001844882965088, "min": -15.489736557006836, "p10": -1.9115310668945311, "median": 5.45722770690918, "p90": 14.34027099609375, "max": 20.746566772460938, "pos_frac": 0.859375, "sample": [3.795013427734375, 2.4510765075683594, -1.9820404052734375, 10.511005401611328, 5.635101318359375, 1.3126983642578125, -12.304519653320312, 5.389636993408203, 2.031646728515625, 5.666667938232422, 3.7500381469726562, 12.725936889648438, 14.688674926757812, 17.659912109375, 1.9832115173339844, -1.74700927734375, 13.203147888183594, 1.7581062316894531, 13.772003173828125, 14.4166259765625, 2.5964183807373047, 3.661834716796875, -1.298776626586914, 6.786659240722656, 4.6396026611328125, -3.20947265625, 12.566314697265625, 2.8939247131347656, 19.3433837890625, 7.607728958129883, 4.1752166748046875, 10.671356201171875, -2.855743408203125, 7.152561187744141, -2.5660400390625, 5.524818420410156, 19.761159896850586, 0.08526611328125, 14.162109375, 1.3859233856201172, 11.616973876953125, 9.3251953125, 7.1572113037109375, -1.987030029296875, 3.1635360717773438, 20.746566772460938, 1.6193466186523438, 11.155838012695312, 11.24061393737793, 2.0150718688964844, 0.2659759521484375, 11.195745468139648, 2.3606796264648438, 1.9068603515625, 19.475955963134766, 8.041839599609375, 6.01312255859375, 5.528011322021484, 9.129898071289062, 12.817314147949219, 3.4123001098632812, 3.341684341430664, 9.462715148925781, -15.489736557006836], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000493.npy"}
{"epoch": 0.745275888133031, "step": 494, "batch_size": 64, "mean": 4.280810832977295, "std": 7.846949100494385, "min": -15.134086608886719, "p10": -3.498711776733398, "median": 3.666013717651367, "p90": 13.827726554870607, "max": 23.296018600463867, "pos_frac": 0.703125, "sample": [13.424068450927734, 1.2541580200195312, 3.4773178100585938, 5.3889007568359375, 8.494026184082031, 0.28094482421875, -1.8725605010986328, 13.311056137084961, 5.893768310546875, 0.0052032470703125, 0.02480316162109375, -3.327869415283203, 1.3859634399414062, 10.602825164794922, 2.0484085083007812, 5.7582855224609375, 10.227867126464844, 0.5592269897460938, 8.642932891845703, -2.622802734375, 1.6355934143066406, 16.7623291015625, 8.486988067626953, 3.120046615600586, 7.5844879150390625, 23.296018600463867, 7.851875305175781, -0.8185005187988281, 7.796417236328125, 14.000722885131836, 6.676567077636719, -13.953842163085938, -12.137069702148438, 12.960268020629883, -2.65802001953125, -5.7521209716796875, -10.009315490722656, 10.561714172363281, 4.465614318847656, 5.738719940185547, 2.530426025390625, -0.14999961853027344, -15.134086608886719, -0.6225852966308594, 7.126415252685547, 2.57647705078125, 17.847702026367188, 3.8547096252441406, 2.1856231689453125, 4.817192077636719, 18.6031494140625, 6.453704833984375, 9.44879150390625, 17.533584594726562, -1.1856842041015625, -0.10590362548828125, -1.0726165771484375, -4.539363861083984, 9.798019409179688, -3.571929931640625, 12.003997802734375, -0.144866943359375, -2.544750213623047, 19.698867797851562], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000494.npy"}
{"epoch": 0.7467876039304611, "step": 495, "batch_size": 64, "mean": 3.8267667293548584, "std": 6.500558853149414, "min": -12.228567123413086, "p10": -2.5582323074340816, "median": 2.9293556213378906, "p90": 12.423320007324218, "max": 20.683408737182617, "pos_frac": 0.703125, "sample": [-3.9721145629882812, -1.2712211608886719, -1.7486114501953125, -12.228567123413086, 14.474960327148438, -2.8863258361816406, -3.5549583435058594, 3.502838134765625, 5.880809783935547, 0.3651580810546875, -1.7852249145507812, -9.558433532714844, 0.10601043701171875, 3.492694854736328, 3.2847747802734375, 18.181087493896484, -3.1350021362304688, 8.396026611328125, 6.013519287109375, 1.206918716430664, 4.166162490844727, 5.640207290649414, 8.650932312011719, -0.7206554412841797, 10.964286804199219, 12.634897232055664, -2.1675891876220703, 7.7653045654296875, 0.8338050842285156, 1.5272483825683594, 18.3065185546875, 6.940156936645508, 1.8473701477050781, -1.402902603149414, 8.920866012573242, 1.2234954833984375, 0.04412841796875, 5.04852294921875, 12.421600341796875, 9.866531372070312, -1.103057861328125, 1.8839225769042969, 9.99282455444336, 4.866912841796875, 16.488933563232422, 2.7253265380859375, 1.5220470428466797, -0.41355133056640625, 0.351409912109375, 20.683408737182617, -1.8949661254882812, 11.71898078918457, 5.192909240722656, -2.7256507873535156, -1.9611358642578125, 12.377593994140625, 6.1654052734375, 0.5078887939453125, 3.2158355712890625, -1.4070892333984375, 4.786125183105469, 3.1333847045898438, 12.424057006835938, -0.8936653137207031], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000495.npy"}
{"epoch": 0.7482993197278912, "step": 496, "batch_size": 64, "mean": 5.174496650695801, "std": 8.590091705322266, "min": -10.8470458984375, "p10": -3.6423980712890627, "median": 3.939253807067871, "p90": 16.981764984130866, "max": 34.9013671875, "pos_frac": 0.6875, "sample": [8.374034881591797, 4.89947509765625, 23.891830444335938, 7.256111145019531, -2.6100997924804688, 5.163352966308594, 6.353412628173828, 8.969879150390625, 12.831720352172852, 17.765357971191406, -1.749298095703125, -0.5073204040527344, 10.490522384643555, 2.1470718383789062, 0.5962886810302734, -0.16245269775390625, -0.4081859588623047, 5.325897216796875, 8.231679916381836, 23.16161346435547, -8.871421813964844, 34.9013671875, -0.4413127899169922, 15.04193115234375, 4.015007019042969, -10.8470458984375, 20.1187744140625, 10.215408325195312, -0.45436668395996094, 15.15338134765625, -4.655220031738281, 1.4685325622558594, 18.694046020507812, 12.399642944335938, -2.8207321166992188, 1.1729145050048828, -5.0603485107421875, -2.522623062133789, 1.094024658203125, 4.334270477294922, 12.072860717773438, 3.5911331176757812, 9.625679016113281, 9.633298873901367, 6.226947784423828, 4.220066070556641, -3.6520156860351562, 3.8635005950927734, -2.4167652130126953, 2.902759552001953, 23.886024475097656, -6.249973297119141, -4.923362731933594, 0.4493904113769531, 8.150833129882812, 6.7083740234375, 2.064159393310547, 10.287590026855469, -2.584615707397461, 3.28692626953125, -0.6048126220703125, 4.197572708129883, -3.6199569702148438, 1.0950355529785156], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000496.npy"}
{"epoch": 0.7498110355253212, "step": 497, "batch_size": 64, "mean": 6.839195251464844, "std": 7.940847396850586, "min": -10.906730651855469, "p10": -1.4910003662109372, "median": 6.334623336791992, "p90": 15.708695220947266, "max": 26.855453491210938, "pos_frac": 0.765625, "sample": [24.468923568725586, 4.358558654785156, 26.855453491210938, 13.998466491699219, 4.039337158203125, -0.88494873046875, 7.1685333251953125, 2.4142532348632812, 4.456707000732422, 9.029220581054688, 11.099454879760742, 3.2017059326171875, 13.214151382446289, 5.545845031738281, 4.2677459716796875, -0.14246368408203125, 11.537345886230469, -10.906730651855469, 15.508350372314453, 9.492050170898438, -0.0570526123046875, 8.123329162597656, -0.3615570068359375, -0.9923629760742188, 2.4562454223632812, -1.3278350830078125, -1.5609283447265625, 8.754581451416016, 11.043533325195312, 15.758140563964844, 0.25743675231933594, 5.2696380615234375, 5.76837158203125, 9.196098327636719, -7.4870758056640625, 9.70681381225586, 15.59332275390625, 10.83418083190918, 15.471866607666016, -3.304384231567383, 17.106056213378906, 5.222169876098633, 0.3549652099609375, 8.064170837402344, 14.712570190429688, -4.61895751953125, 21.216964721679688, 22.05158233642578, -0.3900794982910156, 1.3896903991699219, 7.663238525390625, 10.9959716796875, 11.488443374633789, 6.900875091552734, 3.7409515380859375, 24.748920440673828, -7.245330810546875, 1.479238510131836, -1.2044410705566406, 9.232410430908203, 9.963897705078125, 2.8045883178710938, -2.0074920654296875, 12.17376708984375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000497.npy"}
{"epoch": 0.7513227513227513, "step": 498, "batch_size": 64, "mean": 5.718539237976074, "std": 7.106716156005859, "min": -8.769287109375, "p10": -1.7636611938476563, "median": 3.9170408248901367, "p90": 15.820114326477052, "max": 23.203079223632812, "pos_frac": 0.828125, "sample": [-2.3500213623046875, 4.738075256347656, 11.794677734375, 7.196155548095703, -1.714111328125, -4.217426300048828, 9.179107666015625, 14.9571533203125, -2.1199073791503906, -2.1579132080078125, 7.3130340576171875, 3.869457244873047, 3.4514541625976562, 3.501953125, 19.440650939941406, 2.004364013671875, 1.4092464447021484, 0.5183544158935547, 1.286712646484375, 7.9485321044921875, 17.022415161132812, 2.3407440185546875, 12.549676895141602, 1.1065292358398438, 3.8276901245117188, 2.7651214599609375, -1.5647125244140625, 15.59162712097168, 14.604549407958984, 0.1651611328125, 11.10711669921875, 17.90359115600586, 1.443227767944336, 5.9410247802734375, 4.0962371826171875, -2.42205810546875, -1.036407470703125, 7.280355453491211, 0.06973648071289062, 0.9323863983154297, 13.861961364746094, 15.918037414550781, -8.769287109375, 8.438798904418945, 3.9646244049072266, 2.5353546142578125, 0.2726783752441406, 4.06982421875, 5.120761871337891, 4.382633209228516, 15.059112548828125, 23.159589767456055, 9.523445129394531, 0.818145751953125, 15.568367004394531, 0.15716552734375, 17.877334594726562, 6.398712158203125, 0.04886627197265625, 0.0762939453125, -1.7848968505859375, -1.0886993408203125, 7.401031494140625, 23.203079223632812], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000498.npy"}
{"epoch": 0.7528344671201814, "step": 499, "batch_size": 64, "mean": 5.195621967315674, "std": 8.444607734680176, "min": -11.475921630859375, "p10": -5.938932800292968, "median": 4.802371978759766, "p90": 17.023526000976567, "max": 24.519744873046875, "pos_frac": 0.6875, "sample": [11.822990417480469, 17.72833251953125, -11.475921630859375, 14.52593994140625, 7.4503021240234375, 2.6997604370117188, 1.9847030639648438, -9.891921997070312, 3.4672088623046875, 3.8160552978515625, 9.802875518798828, 6.48077392578125, 22.445648193359375, -5.28485107421875, 10.157485961914062, 0.8351974487304688, 24.519744873046875, -9.930818557739258, 18.538494110107422, 15.912919998168945, 9.847869873046875, -1.663604736328125, -3.667509078979492, -3.8166141510009766, -6.926582336425781, 9.47406005859375, 6.7522125244140625, 9.499412536621094, -3.434286117553711, -1.9669876098632812, -0.7960472106933594, -0.9668998718261719, 7.4061737060546875, 5.0972747802734375, 6.663921356201172, -0.7774810791015625, 2.8077468872070312, 7.549263000488281, -2.065889358520508, 1.9511909484863281, 15.85205078125, -0.19408035278320312, 13.950027465820312, 5.131856918334961, -8.379928588867188, 4.507469177246094, 2.8138961791992188, 10.098077774047852, 13.102264404296875, 21.043533325195312, 8.846878051757812, 2.530132293701172, 10.194089889526367, -7.592216491699219, -6.2192535400390625, 16.168243408203125, 9.020200729370117, 8.448822021484375, -1.2935333251953125, 17.5404052734375, 1.7688980102539062, 17.39007568359375, 3.250152587890625, -2.030414581298828], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000499.npy"}
{"epoch": 0.7543461829176115, "step": 500, "batch_size": 64, "mean": 4.860734939575195, "std": 7.669670104980469, "min": -13.523681640625, "p10": -5.01274299621582, "median": 3.8990936279296875, "p90": 15.524257659912111, "max": 21.94894790649414, "pos_frac": 0.796875, "sample": [-3.672149658203125, 3.6744613647460938, 2.7462615966796875, 1.3949546813964844, 8.01495361328125, 5.761878967285156, 7.906917572021484, 17.376983642578125, -0.10153961181640625, 15.137519836425781, 4.6157989501953125, 17.36800765991211, 12.419418334960938, 0.001678466796875, 14.862815856933594, 10.986015319824219, 21.94894790649414, -5.39971923828125, 4.28741455078125, -0.9158554077148438, 12.428077697753906, 8.5997314453125, 10.515800476074219, 12.75130844116211, 0.26268768310546875, 19.128986358642578, 7.522491455078125, 0.2921257019042969, -1.1256694793701172, -9.365043640136719, -4.998897552490234, 1.2481269836425781, 0.11516952514648438, 3.2804718017578125, -7.591608047485352, -2.8953189849853516, 4.123725891113281, 2.2049407958984375, 11.130767822265625, 0.5868911743164062, 7.3417816162109375, 0.6376094818115234, 8.302928924560547, 1.1565093994140625, -5.0186767578125, 13.092704772949219, 2.1991806030273438, -13.523681640625, 12.963336944580078, -10.351654052734375, 16.528648376464844, 3.107513427734375, 0.46518898010253906, 4.397560119628906, 10.49462890625, 2.9843921661376953, 1.9260177612304688, 4.9834136962890625, 9.694259643554688, 1.0342636108398438, -6.82342529296875, 6.726430892944336, 16.44857406616211, 15.69000244140625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000500.npy"}
{"epoch": 0.7558578987150416, "step": 501, "batch_size": 64, "mean": 4.770712852478027, "std": 5.6712822914123535, "min": -8.239189147949219, "p10": -1.1217546463012693, "median": 3.7679290771484375, "p90": 12.848210906982423, "max": 21.341400146484375, "pos_frac": 0.796875, "sample": [4.447715759277344, 12.350784301757812, 5.8297576904296875, 10.259502410888672, 7.504631042480469, 8.38224983215332, 3.906463623046875, 4.977378845214844, 6.139652252197266, 16.21338653564453, 15.322319030761719, 1.5937843322753906, 8.648208618164062, 1.7135391235351562, -1.210042953491211, -0.7486553192138672, 8.443527221679688, 13.346328735351562, -4.666069030761719, -2.067535400390625, 1.7878265380859375, -0.6739845275878906, -0.1889495849609375, 8.672012329101562, 21.341400146484375, 14.43402099609375, 3.0908203125, 17.648757934570312, 2.4032669067382812, -0.3301658630371094, -0.9157485961914062, 2.30413818359375, -1.3633460998535156, 1.4085731506347656, 7.526763916015625, 1.3491668701171875, 7.319456100463867, -2.00823974609375, 2.435657501220703, 0.27176475524902344, 2.076223373413086, 12.170669555664062, 0.3338127136230469, 3.3415184020996094, 3.62939453125, -8.239189147949219, 1.1958389282226562, 6.5990447998046875, -3.2682418823242188, 3.4060440063476562, 5.571983337402344, 8.107719421386719, 2.621379852294922, 0.38782501220703125, 3.9363555908203125, -0.3009796142578125, 4.303009033203125, 2.577983856201172, 10.935249328613281, 10.810405731201172, 13.061393737792969, 6.01984977722168, 5.02601432800293, 4.122222900390625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000501.npy"}
{"epoch": 0.7573696145124716, "step": 502, "batch_size": 64, "mean": 4.486284255981445, "std": 6.266890048980713, "min": -8.134857177734375, "p10": -3.612070083618164, "median": 3.5199899673461914, "p90": 11.747674560546878, "max": 26.476226806640625, "pos_frac": 0.84375, "sample": [-1.1365203857421875, 1.008474349975586, 2.6726150512695312, -3.9612655639648438, 10.923004150390625, 8.397846221923828, 2.6458816528320312, 2.280181884765625, -5.165069580078125, 4.652069091796875, 5.487514495849609, 14.751312255859375, 0.5672187805175781, -4.794471740722656, 5.1181640625, 18.54449462890625, 5.7897796630859375, -2.0987396240234375, 3.99639892578125, 1.2296600341796875, 5.740501403808594, 0.7351455688476562, 6.164361953735352, 6.3478546142578125, 2.8386917114257812, 1.8159828186035156, 9.0584716796875, -3.699748992919922, 0.5053291320800781, 9.082427978515625, 0.683441162109375, 1.1486873626708984, 7.2904205322265625, 2.350065231323242, 4.756809234619141, 2.3081283569335938, 17.736068725585938, 26.476226806640625, -8.092254638671875, 1.7095832824707031, -3.8829116821289062, 2.9224395751953125, 6.2154083251953125, 0.9507369995117188, 3.284090042114258, 0.8744125366210938, 3.755889892578125, 9.534255981445312, 9.657421112060547, -3.4074859619140625, 7.359792709350586, 4.9471893310546875, 7.608486175537109, 2.045166015625, 7.975433349609375, -8.134857177734375, 9.169830322265625, 12.101104736328125, 0.34378814697265625, 13.245399475097656, 2.151531219482422, 9.372261047363281, 15.68212890625, 7.48594856262207], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000502.npy"}
{"epoch": 0.7588813303099018, "step": 503, "batch_size": 64, "mean": 4.70809268951416, "std": 7.283265590667725, "min": -12.700580596923828, "p10": -3.2143043518066396, "median": 3.214649200439453, "p90": 14.166237449645998, "max": 18.898345947265625, "pos_frac": 0.828125, "sample": [6.371421813964844, -3.5738754272460938, -2.166717529296875, 1.0024642944335938, 11.166648864746094, 2.5920333862304688, 13.854415893554688, -12.532318115234375, 2.2424545288085938, 9.468515396118164, 9.47357177734375, 1.8406524658203125, 3.7246856689453125, 18.898345947265625, 1.2849140167236328, 9.41019058227539, -2.37530517578125, 2.015228271484375, 16.102584838867188, -4.143516540527344, 1.99237060546875, 3.2849349975585938, 18.387521743774414, 4.3299560546875, 3.6222305297851562, 3.1443634033203125, 9.464523315429688, 16.014699935913086, 1.8046379089355469, 8.149839401245117, -1.5657310485839844, 1.3483467102050781, 17.754283905029297, 10.293964385986328, 14.368824005126953, 0.23139572143554688, 6.970485687255859, -0.01279449462890625, -12.487533569335938, 0.7528915405273438, 2.8259429931640625, 3.0627365112304688, 0.8171825408935547, 6.999183654785156, 2.018056869506836, 2.144622802734375, 3.0659027099609375, 12.930900573730469, 14.299875259399414, 5.793628692626953, 12.623397827148438, 12.125129699707031, -12.700580596923828, 0.7639236450195312, 13.734207153320312, 12.570133209228516, 0.9018211364746094, -5.824464797973633, 4.1734619140625, 7.646095275878906, 2.1001625061035156, 4.7715606689453125, 12.086454391479492, -10.117006301879883], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000503.npy"}
{"epoch": 0.7603930461073318, "step": 504, "batch_size": 64, "mean": 6.4376020431518555, "std": 6.433803081512451, "min": -4.295890808105469, "p10": -0.7705673217773437, "median": 5.758073806762695, "p90": 15.255352592468263, "max": 23.23792266845703, "pos_frac": 0.828125, "sample": [0.269561767578125, 15.47962760925293, -0.7985687255859375, 12.47022819519043, 7.867706298828125, -0.30812835693359375, 9.687347412109375, 6.438070297241211, 22.270755767822266, -0.5568294525146484, 2.53936767578125, 7.9786376953125, 3.5801925659179688, 15.673343658447266, 2.9015159606933594, 13.978252410888672, 10.519237518310547, -1.5942764282226562, 8.02288818359375, 20.045303344726562, 7.308979034423828, 3.777881622314453, 1.9107913970947266, 13.634408950805664, 14.583534240722656, 0.442291259765625, 2.5605411529541016, 17.26544189453125, 1.646993637084961, 23.23792266845703, 8.854400634765625, 11.28216552734375, -0.8508186340332031, 9.049997329711914, 10.278778076171875, -1.0527725219726562, 2.6748580932617188, -4.0521392822265625, 9.598648071289062, 3.362640380859375, 11.245567321777344, 2.2010574340820312, 1.0291671752929688, 7.8939361572265625, -2.3228836059570312, 8.039318084716797, 6.762561798095703, 5.006656646728516, 6.484954833984375, 1.8168106079101562, 14.732044219970703, 1.6447124481201172, -4.295890808105469, -0.705230712890625, 5.541187286376953, 4.964351654052734, -0.03795051574707031, 5.9749603271484375, 7.1548614501953125, 0.8211441040039062, 5.17314338684082, 20.215431213378906, 7.042076110839844, 3.645742416381836], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000504.npy"}
{"epoch": 0.7619047619047619, "step": 505, "batch_size": 64, "mean": 4.496703147888184, "std": 7.833970546722412, "min": -12.900154113769531, "p10": -3.2545938491821285, "median": 3.1754589080810547, "p90": 15.606708526611328, "max": 25.452720642089844, "pos_frac": 0.71875, "sample": [6.46343994140625, -0.6149520874023438, 0.4687652587890625, 3.352367401123047, -0.9694728851318359, 15.405181884765625, 2.224365234375, 22.362197875976562, 10.036224365234375, 11.234058380126953, 17.120223999023438, -3.4381484985351562, 1.5944442749023438, 0.1880035400390625, 4.294189453125, -1.81280517578125, 3.6397705078125, 1.0874977111816406, 6.43290901184082, 3.937000274658203, -2.3183135986328125, 7.534320831298828, 10.661849975585938, 7.089939117431641, 10.492568969726562, 25.452720642089844, 7.275184631347656, -0.1976470947265625, 5.118232727050781, 0.0159912109375, 5.93426513671875, -2.7845077514648438, 15.693077087402344, -9.308540344238281, 20.287036895751953, 11.469764709472656, 4.525672912597656, 11.225431442260742, -1.031667709350586, -2.5374755859375, 4.5962066650390625, 2.6593017578125, -0.8476448059082031, -12.900154113769531, 2.789520263671875, 1.5908699035644531, 5.834144592285156, -2.8262996673583984, 16.596893310546875, -11.81606674194336, -1.7189407348632812, 10.868576049804688, 14.890399932861328, 2.9985504150390625, 13.751083374023438, 2.437896728515625, 6.9423065185546875, -3.70263671875, 2.29888916015625, 16.4988956451416, 2.680940628051758, 0.3163013458251953, -4.588905334472656, -9.164276123046875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000505.npy"}
{"epoch": 0.763416477702192, "step": 506, "batch_size": 64, "mean": 4.415276527404785, "std": 7.5492143630981445, "min": -13.437740325927734, "p10": -4.167510986328124, "median": 4.004081726074219, "p90": 12.060032653808596, "max": 26.29558563232422, "pos_frac": 0.765625, "sample": [5.985776901245117, 9.737201690673828, -1.3722152709960938, 2.441375732421875, 3.3195648193359375, 4.0182037353515625, 6.543312072753906, -4.363605499267578, 5.40620231628418, -1.568878173828125, 0.8544979095458984, 9.569915771484375, 7.432228088378906, 3.989959716796875, 2.294757843017578, 1.8774261474609375, -8.74148941040039, 23.561416625976562, 26.29558563232422, 1.6172714233398438, 11.261550903320312, -8.919715881347656, 3.354583740234375, -12.788177490234375, -5.965721130371094, -13.437740325927734, 2.180023193359375, 20.779525756835938, 13.83746337890625, 11.622169494628906, 18.786014556884766, -6.128448486328125, 12.247688293457031, -3.49273681640625, 11.573652267456055, 1.6553726196289062, 10.973995208740234, 1.1834564208984375, 15.509346008300781, -1.5948333740234375, 1.1984405517578125, -0.2855262756347656, 1.6419525146484375, 6.146125793457031, 7.0936431884765625, 3.3498001098632812, -0.14869308471679688, 3.8777542114257812, 5.262880325317383, 9.193771362304688, 8.922754287719727, 7.4577178955078125, 1.4880142211914062, 7.518829345703125, 8.328010559082031, 6.6143341064453125, 0.9796524047851562, 5.725730895996094, -2.277008056640625, -3.7099571228027344, 4.149562835693359, 5.5725250244140625, 7.047149658203125, 5.8942413330078125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000506.npy"}
{"epoch": 0.764928193499622, "step": 507, "batch_size": 64, "mean": 5.173243999481201, "std": 7.874217987060547, "min": -8.659233093261719, "p10": -3.3983373641967773, "median": 4.527017593383789, "p90": 15.819252777099614, "max": 28.157562255859375, "pos_frac": 0.71875, "sample": [10.174610137939453, 3.9328765869140625, -2.6805801391601562, 13.019378662109375, 28.157562255859375, 3.39501953125, 14.70855712890625, -5.993072509765625, -6.861156463623047, 4.1562042236328125, 1.3803691864013672, 4.544010162353516, -2.9479217529296875, 16.37066650390625, -1.22210693359375, 8.97479248046875, 1.3248748779296875, -2.0892791748046875, 9.595375061035156, 2.4592742919921875, 0.73358154296875, 8.551223754882812, 8.628192901611328, -3.439310073852539, 16.295265197753906, 0.6662273406982422, 2.906726837158203, 7.254150390625, 5.507259368896484, 5.526103973388672, 4.5100250244140625, -2.239288330078125, 24.054458618164062, -2.80633544921875, 7.233165740966797, 6.634443283081055, -3.302734375, 12.14288330078125, -5.319965362548828, -2.3890323638916016, -1.2616119384765625, 8.293197631835938, 0.48288917541503906, 11.335018157958984, 6.391838073730469, 1.0993499755859375, 9.091133117675781, 3.1582794189453125, 18.79071044921875, 21.54974365234375, -8.659233093261719, 10.8447265625, -1.9115982055664062, 1.7577896118164062, 11.249443054199219, -0.7684555053710938, 11.635120391845703, -5.4257354736328125, 8.692684173583984, -8.41824722290039, 5.615398406982422, 13.30072021484375, 5.208721160888672, 17.489242553710938], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000507.npy"}
{"epoch": 0.7664399092970522, "step": 508, "batch_size": 64, "mean": 5.647343158721924, "std": 7.226112365722656, "min": -7.97332763671875, "p10": -2.953024673461914, "median": 5.522491455078125, "p90": 15.036322402954102, "max": 23.674713134765625, "pos_frac": 0.8125, "sample": [9.205585479736328, 2.13250732421875, 13.878936767578125, 7.994209289550781, 2.2142257690429688, -4.6513824462890625, 23.64623260498047, -1.5782432556152344, 9.74635124206543, 14.90283203125, 23.674713134765625, 8.845451354980469, 1.201385498046875, 3.6541595458984375, 0.455230712890625, 9.389434814453125, -7.637434005737305, 5.746673583984375, 3.999908447265625, -5.391660690307617, 4.406646728515625, -2.435871124267578, 8.799976348876953, 0.5779800415039062, 0.6958541870117188, -7.97332763671875, 5.914541244506836, 2.123859405517578, 8.44830322265625, 8.096996307373047, 2.062835693359375, 13.170318603515625, 5.05816650390625, 5.683784484863281, 15.09353256225586, 9.0780029296875, 3.325847625732422, 5.8499603271484375, 17.948211669921875, 1.1617584228515625, -1.5294551849365234, -3.075408935546875, 13.625839233398438, 11.39206314086914, -1.2422809600830078, 0.038787841796875, 6.800224304199219, 18.139617919921875, 17.58746337890625, 1.6164493560791016, 8.930156707763672, 1.3477115631103516, 2.9371089935302734, -3.2523651123046875, 2.2095184326171875, -7.377666473388672, -2.667461395263672, 20.590309143066406, 7.161323547363281, 9.26983642578125, 7.774261474609375, 6.1301116943359375, 5.361198425292969, 11.146141052246094], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000508.npy"}
{"epoch": 0.7679516250944822, "step": 509, "batch_size": 64, "mean": 4.861347198486328, "std": 6.8452959060668945, "min": -12.495346069335938, "p10": -3.7056472778320306, "median": 4.538387298583984, "p90": 12.931330490112305, "max": 22.851593017578125, "pos_frac": 0.734375, "sample": [9.598752975463867, -2.25958251953125, 0.8597602844238281, 9.337078094482422, 2.7433853149414062, -0.1343994140625, 2.224058151245117, -2.0118942260742188, -7.265129089355469, 0.0823974609375, 6.0832672119140625, 13.679542541503906, 3.578582763671875, 2.967405319213867, 12.423233032226562, -12.495346069335938, 22.851593017578125, 9.002975463867188, -1.7826385498046875, 3.010894775390625, 9.026390075683594, -3.11328125, 7.95111083984375, -5.205768585205078, 12.134300231933594, 3.164581298828125, -2.1279220581054688, 1.8918266296386719, 13.139892578125, 3.160430908203125, 10.487358093261719, 12.740055084228516, 6.0760498046875, 5.957893371582031, 2.0522689819335938, -0.34414100646972656, 11.449222564697266, 2.960693359375, 8.752685546875, 13.0133056640625, 6.164878845214844, -0.640411376953125, 13.763648986816406, -5.690784454345703, 11.571868896484375, 9.294631958007812, 4.4477081298828125, 8.688278198242188, 20.182601928710938, 11.034156799316406, 11.682144165039062, -0.6283950805664062, -2.5774459838867188, -8.366897583007812, 2.3687667846679688, 7.5621490478515625, 8.230762481689453, -4.124143600463867, -3.9595184326171875, 2.5544281005859375, 6.9613037109375, 4.629066467285156, 13.785194396972656, 8.531318664550781], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000509.npy"}
{"epoch": 0.7694633408919124, "step": 510, "batch_size": 64, "mean": 5.268549919128418, "std": 7.403101921081543, "min": -9.202560424804688, "p10": -4.306829833984374, "median": 5.0083465576171875, "p90": 15.128653335571295, "max": 22.834754943847656, "pos_frac": 0.8125, "sample": [2.399200439453125, 15.736003875732422, 12.123710632324219, 3.3748016357421875, -9.202560424804688, 0.0894622802734375, 7.449623107910156, 20.715431213378906, 2.5545196533203125, 3.651641845703125, 1.0037155151367188, 0.48842620849609375, 2.1421356201171875, 3.1834793090820312, -7.45698356628418, 0.02379608154296875, 1.1078720092773438, 6.1900787353515625, 0.340606689453125, 4.054563522338867, 6.4184722900390625, 11.478788375854492, 13.416976928710938, -2.964305877685547, 3.551513671875, 10.786815643310547, 9.886444091796875, 0.1740570068359375, -0.24961090087890625, 11.613468170166016, 9.871849060058594, 6.467536926269531, 17.825210571289062, 7.229349136352539, 4.103506088256836, 8.219573974609375, 5.4184112548828125, 21.17230987548828, -2.1963729858398438, -4.790557861328125, 1.8659896850585938, -7.154029846191406, 7.18377685546875, 11.58558464050293, 6.728851318359375, 1.4458847045898438, 5.588523864746094, 22.834754943847656, 7.071807861328125, -3.178131103515625, 7.502006530761719, 4.662954330444336, 17.40088653564453, 5.353738784790039, -7.575309753417969, 5.838573455810547, 20.069580078125, 13.711502075195312, -1.0582408905029297, 7.565349578857422, 12.65972900390625, 3.7392635345458984, -4.892753601074219, -9.166053771972656], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000510.npy"}
{"epoch": 0.7709750566893424, "step": 511, "batch_size": 64, "mean": 6.860415935516357, "std": 7.506218433380127, "min": -8.382232666015625, "p10": -3.1985267639160146, "median": 5.821720123291016, "p90": 17.772171783447266, "max": 22.24530029296875, "pos_frac": 0.828125, "sample": [22.24530029296875, 14.792999267578125, 6.444402694702148, 18.6732234954834, 4.744922637939453, 2.1948699951171875, 1.446502685546875, 0.143218994140625, 16.305572509765625, 16.256832122802734, 6.220123291015625, -3.5958328247070312, 7.96258544921875, 5.423316955566406, 15.618492126464844, 7.2057342529296875, 9.371583938598633, 2.3077735900878906, 17.3541259765625, 10.706336975097656, 15.032814025878906, -2.37554931640625, 3.7727737426757812, -8.382232666015625, -3.6494827270507812, -2.35345458984375, 18.874717712402344, 10.550827026367188, 4.366447448730469, 4.275146484375, -0.8658103942871094, 4.1986236572265625, 8.251312255859375, 3.2061767578125, 7.41526985168457, 9.533767700195312, 13.181556701660156, 3.6828346252441406, -3.5512313842773438, 2.7714614868164062, 1.8270263671875, 11.022228240966797, 13.32635498046875, 19.265968322753906, 17.778656005859375, 9.937156677246094, -5.224418640136719, 2.3065185546875, 0.6734409332275391, 19.699892044067383, 3.4767608642578125, 2.8124656677246094, 8.381019592285156, 7.829338073730469, 1.1705036163330078, 2.3362045288085938, -3.7352848052978516, 20.713340759277344, 7.736091613769531, 14.151226043701172, 17.757041931152344, -0.7945632934570312, -4.987190246582031, 1.8487739562988281], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000511.npy"}
{"epoch": 0.7724867724867724, "step": 512, "batch_size": 64, "mean": 5.79302978515625, "std": 13.173238754272461, "min": -11.953521728515625, "p10": -4.042386245727539, "median": 4.433933258056641, "p90": 15.244940185546877, "max": 91.9073486328125, "pos_frac": 0.734375, "sample": [11.345333099365234, 5.3673095703125, 11.859485626220703, -8.71676254272461, -3.9849510192871094, 18.788253784179688, 0.4251384735107422, 1.1759300231933594, 1.0255813598632812, 5.6469573974609375, 9.10544204711914, 11.876777648925781, -11.953521728515625, -0.9942970275878906, 5.968294143676758, -0.20697784423828125, 1.5871658325195312, -1.3294410705566406, 14.845123291015625, 0.5036830902099609, 15.416290283203125, 10.622367858886719, 1.7519378662109375, 5.6655120849609375, 4.8022918701171875, 9.190536499023438, -9.878692626953125, 5.202262878417969, 7.544654846191406, 16.839584350585938, -0.4784812927246094, 8.425369262695312, 16.344871520996094, 0.3423194885253906, 7.0540008544921875, 2.28570556640625, -0.019947052001953125, 7.790985107421875, -0.3581371307373047, 14.663925170898438, 2.2263565063476562, -8.068391799926758, 0.3928050994873047, -4.0670013427734375, 4.574775695800781, 12.422805786132812, 0.841705322265625, 4.2930908203125, 3.2744598388671875, -0.4055442810058594, 14.242835998535156, -1.484588623046875, 1.1922607421875, -7.203460693359375, 1.6479148864746094, 4.999671936035156, 91.9073486328125, 17.263809204101562, 23.823402404785156, 13.840377807617188, 5.172372817993164, -7.966884613037109, 5.903713226318359, -3.61181640625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000512.npy"}
{"epoch": 0.7739984882842026, "step": 513, "batch_size": 64, "mean": 5.684092998504639, "std": 6.590614318847656, "min": -6.4374237060546875, "p10": -1.6803634643554684, "median": 4.598090171813965, "p90": 17.57877998352051, "max": 24.042186737060547, "pos_frac": 0.828125, "sample": [18.37311553955078, 9.44708251953125, 19.738361358642578, 7.007907867431641, 7.0721282958984375, 4.837604522705078, 4.9334716796875, 5.226871490478516, 2.1466922760009766, -1.3471221923828125, 4.119071960449219, -5.2416839599609375, 8.601081848144531, 2.128389358520508, 2.794984817504883, 5.850950241088867, -1.1620674133300781, 4.08636474609375, 24.042186737060547, 9.400436401367188, 9.120904922485352, 19.189876556396484, 7.2662506103515625, 3.4874534606933594, 17.129287719726562, 0.7095718383789062, 5.4449920654296875, 8.653438568115234, 2.8869247436523438, 6.178651809692383, 3.656658172607422, 0.31516075134277344, 9.226280212402344, -6.4374237060546875, 3.8623199462890625, 4.404851913452148, -3.5973052978515625, 3.096466064453125, 19.529769897460938, 1.7508659362792969, -2.314699172973633, -3.2545127868652344, 19.05699920654297, -0.1986083984375, 11.818756103515625, 12.89529800415039, 17.771419525146484, 4.297126770019531, 9.3671875, 4.791328430175781, 4.865501403808594, 4.819007873535156, 9.393779754638672, 3.3917236328125, 4.0337677001953125, -4.577476501464844, 2.2348861694335938, 2.279651641845703, 0.9242820739746094, 1.434427261352539, 10.204330444335938, -1.82318115234375, -0.434539794921875, 4.874668121337891], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000513.npy"}
{"epoch": 0.7755102040816326, "step": 514, "batch_size": 64, "mean": 5.463407516479492, "std": 7.12099552154541, "min": -13.613998413085938, "p10": -3.4556755065917963, "median": 6.1523332595825195, "p90": 15.13342742919922, "max": 19.996402740478516, "pos_frac": 0.78125, "sample": [0.2409229278564453, 3.045948028564453, 8.335639953613281, 10.61770248413086, 7.306591033935547, 4.04572868347168, 15.065780639648438, -2.8969497680664062, 3.561126708984375, 1.3898887634277344, 6.098766326904297, 12.257999420166016, 6.603364944458008, 9.67431640625, -1.4478302001953125, 0.7723884582519531, -10.28076171875, 6.907125473022461, -0.2370147705078125, 6.362113952636719, -13.613998413085938, 1.3014678955078125, 8.963811874389648, 9.742826461791992, 19.996402740478516, -2.8740997314453125, -0.5334320068359375, 7.666202545166016, 10.31329345703125, 11.042930603027344, -2.3019256591796875, -4.79527473449707, 9.967704772949219, 7.323463439941406, 8.899955749511719, 7.194755554199219, 5.3377685546875, 16.178688049316406, 1.5222091674804688, 4.944135665893555, 2.2696075439453125, -4.537834167480469, -2.5781211853027344, 2.613269805908203, 9.344024658203125, 9.761390686035156, 5.168052673339844, 2.3693389892578125, 6.205900192260742, 9.809852600097656, -3.69512939453125, 1.7307624816894531, 4.148445129394531, 18.55396270751953, -5.4904327392578125, 15.138938903808594, 15.508621215820312, 7.6197357177734375, 15.120567321777344, 0.7627353668212891, 19.8233642578125, 17.69061851501465, -4.0220794677734375, 12.642723083496094], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000514.npy"}
{"epoch": 0.7770219198790628, "step": 515, "batch_size": 64, "mean": 5.677316188812256, "std": 6.9267401695251465, "min": -9.571037292480469, "p10": -3.0281324386596675, "median": 5.8194169998168945, "p90": 15.064975357055667, "max": 21.189903259277344, "pos_frac": 0.765625, "sample": [1.8192481994628906, 6.619604110717773, 2.277210235595703, 8.93841552734375, 18.444488525390625, 14.14614486694336, 7.849857330322266, 13.149147033691406, 2.063018798828125, 8.941673278808594, 11.73733139038086, 8.621932983398438, 10.898185729980469, -2.408285140991211, 5.852197647094727, 7.135650634765625, -3.9471817016601562, 11.702301025390625, -4.224250793457031, 2.7999267578125, 7.534492492675781, -0.3698158264160156, 1.853118896484375, 3.565704345703125, -9.571037292480469, 5.693180084228516, 19.62290382385254, 7.985740661621094, 5.8535919189453125, -0.127838134765625, 4.258995056152344, 7.518474578857422, -0.17202377319335938, 0.03323554992675781, 7.0233917236328125, 14.359386444091797, -8.3314208984375, 8.783699035644531, 0.9629459381103516, 5.7866363525390625, -3.5073089599609375, 3.541555404663086, 15.36737060546875, -5.19659423828125, 5.4475250244140625, 21.189903259277344, 5.871023178100586, -0.7253513336181641, 12.798484802246094, 12.729089736938477, 6.8534088134765625, 5.755836486816406, 1.56744384765625, 18.870513916015625, 15.986648559570312, -3.293781280517578, 10.592794418334961, 0.1195220947265625, 15.470344543457031, -1.796356201171875, 2.0926742553710938, 12.376556396484375, -1.6464157104492188, -1.7966384887695312], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000515.npy"}
{"epoch": 0.7785336356764928, "step": 516, "batch_size": 64, "mean": 4.814523696899414, "std": 6.803935527801514, "min": -14.691474914550781, "p10": -1.306400108337402, "median": 4.817867279052734, "p90": 13.267310714721681, "max": 21.409072875976562, "pos_frac": 0.8125, "sample": [17.355655670166016, 21.409072875976562, 10.239879608154297, -3.2742080688476562, 0.48868560791015625, 16.099109649658203, -2.677001953125, 5.085975646972656, 5.087169647216797, 7.419525146484375, 1.0443038940429688, 7.028568267822266, 11.67608642578125, 18.405746459960938, 3.0616836547851562, 0.7867393493652344, 7.4617767333984375, 9.688026428222656, 13.324287414550781, 8.2855224609375, -0.3540763854980469, -14.691474914550781, 3.8683395385742188, 7.1288909912109375, 19.521217346191406, 7.080425262451172, 6.9249420166015625, 0.7635879516601562, 13.13436508178711, 7.332073211669922, -8.540351867675781, -0.4675788879394531, 0.46798133850097656, 0.06301116943359375, 3.8907623291015625, 7.9468841552734375, 6.941204071044922, -0.012464523315429688, 1.4751739501953125, 1.1324539184570312, 4.429412841796875, -13.306228637695312, 5.125152587890625, 7.890501022338867, 8.137100219726562, 4.561614990234375, 10.33566665649414, 5.083171844482422, 1.681243896484375, 4.39825439453125, 2.1166458129882812, 5.074119567871094, 0.8293228149414062, 0.23954391479492188, -2.936767578125, 10.839347839355469, -0.09948348999023438, 6.009450912475586, -0.8872222900390625, -1.4860477447509766, 10.052474975585938, 1.7729854583740234, 0.950897216796875, 15.716388702392578], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000516.npy"}
{"epoch": 0.780045351473923, "step": 517, "batch_size": 64, "mean": 4.934661865234375, "std": 7.3802947998046875, "min": -7.526035308837891, "p10": -3.710567474365234, "median": 3.658492088317871, "p90": 13.392079162597657, "max": 27.367172241210938, "pos_frac": 0.75, "sample": [6.665760040283203, -2.372772216796875, -1.7332572937011719, 7.358110427856445, -4.941322326660156, 4.1450347900390625, 3.3218841552734375, 3.5586376190185547, 17.874099731445312, 6.437625885009766, 6.529075622558594, -4.38593864440918, -1.5777664184570312, 2.5619163513183594, 0.598602294921875, -3.829986572265625, 0.5208225250244141, 6.7225494384765625, 9.825769424438477, -1.5651931762695312, 8.658153533935547, -5.716728210449219, 11.212005615234375, 20.777095794677734, -2.46630859375, 1.4004268646240234, 2.6052169799804688, 4.385463714599609, 10.450408935546875, 10.564342498779297, 10.001571655273438, 13.512191772460938, 9.720216751098633, 3.129547119140625, -2.7050628662109375, -7.473814010620117, 3.7583465576171875, 13.11181640625, -1.3805866241455078, -2.831512451171875, 7.9013671875, 11.919565200805664, 22.503684997558594, 1.5642356872558594, 3.1103973388671875, 16.51637077331543, 4.723094940185547, 6.980342864990234, -3.4319229125976562, 5.180992126464844, 1.295572280883789, 7.5210418701171875, 2.243335723876953, 27.367172241210938, 22.18682861328125, 10.681474685668945, 1.1843338012695312, -4.1722869873046875, 2.7750072479248047, 3.531360626220703, 6.256416320800781, -7.526035308837891, 7.242198944091797, 1.8373641967773438], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000517.npy"}
{"epoch": 0.781557067271353, "step": 518, "batch_size": 64, "mean": 4.144077301025391, "std": 7.125534534454346, "min": -7.6700897216796875, "p10": -3.76072654724121, "median": 3.5465126037597656, "p90": 14.037728118896492, "max": 24.538864135742188, "pos_frac": 0.671875, "sample": [4.213329315185547, -1.1331100463867188, 0.6882171630859375, 10.145614624023438, -0.3095550537109375, -1.9692230224609375, 1.97882080078125, 9.499649047851562, -7.251522064208984, 2.515960693359375, 2.1753273010253906, -1.7052383422851562, 3.6832504272460938, 12.207496643066406, 5.847663879394531, -7.6700897216796875, 5.34307861328125, 16.559518814086914, -6.273275375366211, 5.414676666259766, 14.822113037109375, -7.263450622558594, 0.8256378173828125, 4.972949981689453, 7.140129089355469, 5.633235931396484, -2.4110260009765625, 5.4267730712890625, -6.7529296875, 15.411323547363281, -1.034658432006836, -3.0386619567871094, -0.6243553161621094, 10.561086654663086, 10.373069763183594, -2.3945541381835938, 5.9898681640625, -4.070182800292969, 6.138458251953125, 24.538864135742188, 9.314865112304688, 0.21956634521484375, 1.8277530670166016, 22.64191436767578, 11.101226806640625, 12.119049072265625, -0.5595207214355469, 16.660518646240234, 4.807891845703125, -5.502296447753906, -0.9303245544433594, -1.5040912628173828, 1.660430908203125, -0.13374710083007812, 3.4097747802734375, 7.818214416503906, 7.3614959716796875, 6.4130706787109375, -0.5282688140869141, 1.0593700408935547, 3.8243560791015625, 0.4467926025390625, 19.39141845703125, 6.097190856933594], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000518.npy"}
{"epoch": 0.783068783068783, "step": 519, "batch_size": 64, "mean": 3.211030960083008, "std": 8.332837104797363, "min": -20.14459228515625, "p10": -5.986062431335449, "median": 2.299093246459961, "p90": 13.820445251464845, "max": 20.436874389648438, "pos_frac": 0.671875, "sample": [-10.701562881469727, 10.036483764648438, -1.501739501953125, -15.119319915771484, -1.9188041687011719, 7.381744384765625, -1.4890480041503906, 17.84851837158203, 5.984710693359375, 4.60223388671875, 1.895355224609375, 4.098077774047852, 11.428749084472656, 0.304595947265625, 2.4418106079101562, 0.26697540283203125, 10.102920532226562, 1.2374553680419922, -5.178609848022461, -9.560928344726562, 13.596559524536133, 17.90003204345703, -2.81890869140625, 4.8469696044921875, 9.719276428222656, 17.32239532470703, -4.055881500244141, 4.572380065917969, 9.223350524902344, 1.4251708984375, -3.4694557189941406, 12.270927429199219, 1.8083648681640625, -3.28436279296875, 5.822242736816406, -6.210624694824219, 4.123291015625, 2.1563758850097656, 13.908271789550781, -4.118255615234375, 1.4247055053710938, 4.457309722900391, 10.355209350585938, 3.957307815551758, 2.9766006469726562, -20.14459228515625, 1.1072921752929688, -2.869384765625, 9.830459594726562, -5.46208381652832, 16.949382781982422, -3.938304901123047, -7.160905838012695, 20.436874389648438, 0.1385040283203125, 5.955589294433594, 12.501449584960938, 13.615516662597656, 7.642063140869141, -4.301654815673828, 1.249704360961914, -1.4993038177490234, 18.264389038085938, -6.877889633178711], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000519.npy"}
{"epoch": 0.7845804988662132, "step": 520, "batch_size": 64, "mean": 4.767726421356201, "std": 6.121098041534424, "min": -6.6074676513671875, "p10": -3.8312072753906237, "median": 4.447805404663086, "p90": 14.492258071899416, "max": 17.629867553710938, "pos_frac": 0.796875, "sample": [5.205905914306641, -1.7699317932128906, -0.5117645263671875, 4.005916595458984, 11.0982666015625, 13.408882141113281, 6.032381057739258, 0.0732574462890625, 3.0230445861816406, 7.8934326171875, 9.08782958984375, 4.356039047241211, 1.1294403076171875, 4.902072906494141, 10.293014526367188, 3.3835372924804688, -0.5480728149414062, -4.324562072753906, 0.5945472717285156, 16.077943801879883, 11.804718017578125, 4.278980255126953, 5.238960266113281, 14.118778228759766, 3.571929931640625, 6.971473693847656, 16.7176513671875, -4.9941864013671875, -4.6961669921875, 15.456253051757812, 7.49090576171875, 4.412727355957031, -0.2814006805419922, 7.496681213378906, 4.247306823730469, 0.8657703399658203, 15.280515670776367, 5.741508483886719, -2.6800460815429688, -6.3577423095703125, -6.145416259765625, 3.1402359008789062, 0.7090225219726562, -5.24835205078125, 0.63824462890625, 1.07562255859375, 5.697898864746094, 5.349185943603516, 4.976127624511719, 1.9200515747070312, 3.7456817626953125, 14.652320861816406, 15.36834716796875, 5.743175506591797, 17.629867553710938, 8.360441207885742, 9.4981689453125, 6.903175354003906, 0.07271194458007812, -2.010049819946289, 4.482883453369141, 8.123262405395508, 8.963550567626953, -6.6074676513671875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000520.npy"}
{"epoch": 0.7860922146636432, "step": 521, "batch_size": 64, "mean": 4.97168493270874, "std": 6.649054050445557, "min": -13.442367553710938, "p10": -1.6153881072998046, "median": 4.413644790649414, "p90": 13.980754852294924, "max": 21.632278442382812, "pos_frac": 0.8125, "sample": [8.351188659667969, 2.9326171875, 11.207542419433594, 3.3967742919921875, 13.638931274414062, 3.3833694458007812, 14.504104614257812, 0.1112823486328125, -1.6436080932617188, 8.226974487304688, 7.434062957763672, 0.09074783325195312, 8.880966186523438, 3.8378219604492188, 10.310142517089844, 6.528404235839844, 8.169843673706055, 4.333015441894531, 2.8517837524414062, 19.09165382385254, 1.0268173217773438, 11.392318725585938, 2.0723876953125, -0.22180938720703125, -0.3418560028076172, 7.550132751464844, -5.005836486816406, 11.22275161743164, 0.7817306518554688, 6.082725524902344, 21.632278442382812, 18.74609375, -6.705142974853516, 2.230274200439453, -0.167327880859375, -2.7665863037109375, 0.8817062377929688, 6.777046203613281, 1.4122467041015625, 14.871978759765625, 9.938369750976562, -0.19129562377929688, -3.753267288208008, 5.774505615234375, -1.5495414733886719, 3.0852413177490234, 8.052467346191406, 9.9334716796875, 5.383220672607422, 14.127250671386719, 4.494274139404297, 5.696678161621094, -13.442367553710938, 17.298553466796875, -9.69232177734375, 0.6892147064208984, 2.5814952850341797, 0.21573638916015625, 10.976852416992188, 0.465057373046875, 5.8118896484375, 1.8533973693847656, 7.9317626953125, 5.39764404296875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000521.npy"}
{"epoch": 0.7876039304610734, "step": 522, "batch_size": 64, "mean": 4.70469856262207, "std": 7.97706413269043, "min": -19.363510131835938, "p10": -5.488496780395508, "median": 4.207602500915527, "p90": 14.19647216796875, "max": 22.39025115966797, "pos_frac": 0.75, "sample": [-1.2855682373046875, 6.916339874267578, 1.9637022018432617, 4.909641265869141, 5.89239501953125, 14.63470458984375, 6.022010803222656, 14.000579833984375, 10.320953369140625, 5.071813583374023, -7.663105010986328, -5.169090270996094, 3.90399169921875, 21.22962188720703, 2.988140106201172, 11.579025268554688, -6.808982849121094, 6.057912826538086, 8.567462921142578, 3.466890335083008, 8.451484680175781, 3.020092010498047, 4.186492919921875, -19.363510131835938, 16.245391845703125, -0.34236907958984375, 6.196567535400391, -2.8201217651367188, -3.536346435546875, 4.024675369262695, -2.0383377075195312, 18.179834365844727, 0.807891845703125, 11.935749053955078, 12.679903030395508, 2.2579498291015625, -1.8381500244140625, 6.141044616699219, -1.101776123046875, 2.3404159545898438, 4.22871208190918, 4.837547302246094, 3.775684356689453, 2.84991455078125, 7.063091278076172, -0.8738250732421875, 12.174575805664062, -7.735355377197266, 22.39025115966797, 0.3701496124267578, 7.116783142089844, -10.244277954101562, 8.344474792480469, 21.762283325195312, 2.62371826171875, 2.7443008422851562, -9.857589721679688, 14.280426025390625, 10.887615203857422, 13.390464782714844, 1.8788833618164062, -5.625385284423828, 12.60861587524414, 10.084297180175781], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000522.npy"}
{"epoch": 0.7891156462585034, "step": 523, "batch_size": 64, "mean": 5.652378082275391, "std": 7.53450345993042, "min": -14.758060455322266, "p10": -4.9731498718261715, "median": 6.271335601806641, "p90": 14.681848907470703, "max": 23.151500701904297, "pos_frac": 0.796875, "sample": [-2.929729461669922, 15.403793334960938, 3.574909210205078, 4.282783508300781, 14.786376953125, 10.777814865112305, 0.4926910400390625, 13.305931091308594, 9.988960266113281, 4.092643737792969, 2.1360034942626953, 12.573356628417969, 15.289527893066406, -1.4744300842285156, 14.437950134277344, 11.030105590820312, 6.404487609863281, 1.424896240234375, 7.97357177734375, -7.861331939697266, 3.824310302734375, 11.555465698242188, -2.3689651489257812, 3.1206512451171875, -8.055145263671875, 7.561756134033203, -4.4409637451171875, -1.5205192565917969, 6.13818359375, 9.939689636230469, 5.496061325073242, 9.269248962402344, 11.50433349609375, 7.001363754272461, 8.379974365234375, 16.080684661865234, 14.02969741821289, 5.2611846923828125, 4.746299743652344, -9.61932373046875, 5.743896484375, -5.022499084472656, 23.151500701904297, 7.003448486328125, 10.836570739746094, 5.952598571777344, 1.2339591979980469, 1.3503570556640625, 15.442489624023438, -8.524581909179688, 16.512435913085938, 3.3206520080566406, 8.384620666503906, -6.8885498046875, 13.770950317382812, 9.250083923339844, 4.496494293212891, 9.719482421875, 13.508529663085938, -14.758060455322266, -4.858001708984375, 7.766035079956055, 2.101715087890625, 8.643783569335938], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000523.npy"}
{"epoch": 0.7906273620559335, "step": 524, "batch_size": 64, "mean": 4.04063606262207, "std": 6.406954765319824, "min": -15.955909729003906, "p10": -2.9193691253662104, "median": 4.262580871582031, "p90": 11.633225250244145, "max": 22.90300750732422, "pos_frac": 0.75, "sample": [10.43798828125, 5.033409118652344, 20.45459747314453, 0.663360595703125, 4.316925048828125, -3.0713653564453125, -2.035858154296875, -0.7939605712890625, 10.722000122070312, 2.488048553466797, 10.632938385009766, -3.0608787536621094, -6.209314346313477, 4.534538269042969, 0.5601959228515625, 9.572196960449219, -7.3549957275390625, -4.778472900390625, 2.58795166015625, 12.763175964355469, 8.31475830078125, -15.955909729003906, 5.685676574707031, 0.35129737854003906, 1.2422637939453125, 3.955240249633789, 22.90300750732422, 4.551918029785156, 5.6612396240234375, -1.703643798828125, -1.325439453125, 5.477439880371094, -4.759613037109375, 3.9593162536621094, 8.998504638671875, 8.088447570800781, 5.697935104370117, 7.5635986328125, 4.2082366943359375, -2.217082977294922, 1.13458251953125, 1.5275421142578125, 1.20391845703125, 15.355194091796875, 7.097352981567383, 0.95611572265625, 13.363531112670898, -2.5891799926757812, 7.924049377441406, 7.002655029296875, 4.7156829833984375, 3.5939979553222656, 5.483299255371094, 12.023750305175781, 3.994556427001953, 7.419099807739258, 12.44436264038086, 4.532039642333984, -0.0131988525390625, -0.28006744384765625, 8.972021102905273, -1.5567703247070312, 5.832588195800781, 0.3039073944091797], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000524.npy"}
{"epoch": 0.7921390778533636, "step": 525, "batch_size": 64, "mean": 3.813899278640747, "std": 5.872398376464844, "min": -8.151473999023438, "p10": -2.4742370605468746, "median": 3.392232894897461, "p90": 11.73142776489258, "max": 19.095733642578125, "pos_frac": 0.75, "sample": [7.654571533203125, -6.439582824707031, -4.266937255859375, -2.1217994689941406, -5.8516845703125, 3.8746109008789062, 3.761362075805664, 0.46837615966796875, 11.267471313476562, -1.4359207153320312, 3.9786605834960938, 2.0952529907226562, 3.948822021484375, 2.8916778564453125, 2.2737884521484375, 11.838226318359375, 0.11929512023925781, -7.301612854003906, 5.8157958984375, 4.730552673339844, 6.6554107666015625, -0.4962425231933594, 13.756355285644531, 13.595085144042969, 1.2289352416992188, 5.506486892700195, 1.3775787353515625, 1.2280693054199219, 3.4046173095703125, 8.182914733886719, -1.0255126953125, 11.40521240234375, 2.2444610595703125, 6.947345733642578, 9.281723022460938, -3.2966156005859375, 2.350496292114258, 3.4977569580078125, 19.095733642578125, -1.791360855102539, -0.8080291748046875, 3.3798484802246094, 5.533823013305664, -2.5928115844726562, 3.0917396545410156, 15.887535095214844, 1.1320343017578125, 1.7057418823242188, 4.46429443359375, 12.656848907470703, -2.1975631713867188, 11.482231140136719, 5.712066650390625, 0.9652481079101562, 6.2873382568359375, 11.293975830078125, -0.5397186279296875, 6.997657775878906, 5.796596527099609, 6.154022216796875, -1.9549026489257812, 17.322330474853516, 0.021343231201171875, -8.151473999023438], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000525.npy"}
{"epoch": 0.7936507936507936, "step": 526, "batch_size": 64, "mean": 3.873396396636963, "std": 6.040987968444824, "min": -10.723575592041016, "p10": -3.209337043762207, "median": 3.726787567138672, "p90": 11.970274353027346, "max": 19.151647567749023, "pos_frac": 0.734375, "sample": [15.282928466796875, 6.721233367919922, -3.259397506713867, 1.8424739837646484, -3.857006072998047, 5.84355354309082, 2.7060489654541016, 11.659591674804688, -7.325286865234375, -3.092529296875, 3.566680908203125, 6.102933883666992, -6.585258483886719, 2.7366790771484375, 6.56402587890625, -1.208984375, 5.927019119262695, 19.151647567749023, -0.41786956787109375, 9.996101379394531, 6.2249603271484375, 8.500999450683594, 1.883819580078125, 13.616546630859375, 10.651382446289062, 2.2861404418945312, 4.913230895996094, 4.671974182128906, 3.182199478149414, 3.448089599609375, 0.4059772491455078, 0.4694995880126953, 12.103424072265625, -1.9736576080322266, 8.555130004882812, -0.39074134826660156, 7.109088897705078, -0.646209716796875, 8.310245513916016, -0.11222076416015625, 15.081302642822266, 2.5998573303222656, 4.960906982421875, 4.304372787475586, 5.907787322998047, -9.738931655883789, -2.2356109619140625, -0.09388351440429688, 8.825920104980469, 0.4506072998046875, 4.4548187255859375, -0.7474021911621094, 14.611541748046875, 4.568492889404297, 14.799150466918945, 0.42842864990234375, -3.8734512329101562, 7.371036529541016, 4.659706115722656, 3.2790603637695312, 3.8868942260742188, 1.0618896484375, 8.493988037109375, -10.723575592041016], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000526.npy"}
{"epoch": 0.7951625094482238, "step": 527, "batch_size": 64, "mean": 3.3972949981689453, "std": 6.8744988441467285, "min": -13.345958709716797, "p10": -3.567557144165039, "median": 2.90283203125, "p90": 12.708312225341803, "max": 23.229999542236328, "pos_frac": 0.6875, "sample": [-2.8239822387695312, 5.777069091796875, 2.9415054321289062, 9.868118286132812, 2.9594650268554688, 1.8834972381591797, 18.310195922851562, 2.8641586303710938, 10.65755844116211, 8.601837158203125, 14.909420013427734, -4.095947265625, 5.167236328125, -3.2503280639648438, 6.8744049072265625, -3.1487808227539062, 15.111221313476562, 0.7029037475585938, 3.0730018615722656, -3.5933990478515625, -0.137237548828125, -6.011390686035156, -6.6811065673828125, 0.392333984375, 4.765800476074219, 2.7287216186523438, 15.738319396972656, -2.6714630126953125, 6.3207855224609375, -1.326019287109375, 5.953392028808594, 1.8100814819335938, -0.23989105224609375, 6.059078216552734, 5.460289001464844, 23.229999542236328, 8.00341796875, 1.6210479736328125, 0.6598224639892578, 8.67226791381836, 6.762605667114258, 0.5242118835449219, 11.110321044921875, -1.4789924621582031, 4.023895263671875, -1.5300636291503906, -3.5072593688964844, 14.596206665039062, 5.1220855712890625, -3.1877479553222656, -6.31561279296875, 2.187335968017578, 6.22747802734375, 7.627529144287109, -1.7369918823242188, -2.153461456298828, 0.672332763671875, 6.061927795410156, 9.781951904296875, 13.393165588378906, -13.345958709716797, -12.513412475585938, 6.530605316162109, 1.4373321533203125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000527.npy"}
{"epoch": 0.7966742252456538, "step": 528, "batch_size": 64, "mean": 3.4871726036071777, "std": 7.001922130584717, "min": -14.237213134765625, "p10": -3.3579231262207023, "median": 3.6091957092285156, "p90": 12.753842163085942, "max": 19.577651977539062, "pos_frac": 0.71875, "sample": [13.835830688476562, -13.110519409179688, 16.312896728515625, -1.200775146484375, 8.54364013671875, 3.5968093872070312, 6.8189849853515625, 0.7276535034179688, 1.0456581115722656, 3.62158203125, -2.2659568786621094, -3.6501617431640625, 8.67034912109375, -11.772224426269531, -2.1208572387695312, -2.619121551513672, 13.230152130126953, 2.7643871307373047, 3.2703990936279297, 0.5410614013671875, -0.22821426391601562, 14.071914672851562, 0.9102363586425781, -14.237213134765625, 0.66064453125, 4.80548095703125, -1.2318191528320312, -2.6760330200195312, 4.217260360717773, 9.349281311035156, 7.299890518188477, 0.9315567016601562, 5.807659149169922, 16.876144409179688, 3.6715927124023438, -0.7100467681884766, 19.577651977539062, 9.253562927246094, 9.916122436523438, 2.8546085357666016, 4.779640197753906, 8.960014343261719, -1.1440563201904297, 4.44847297668457, -5.906862258911133, 7.9076995849609375, 5.2032470703125, 0.723846435546875, 11.76700210571289, 2.785390853881836, 4.85223388671875, -11.060091018676758, -8.05157470703125, 11.7840576171875, 8.296142578125, 5.521724700927734, -1.5777053833007812, 2.7521820068359375, 13.169464111328125, -2.2328052520751953, 5.999626159667969, 4.688056945800781, 10.259391784667969, 1.893890380859375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000528.npy"}
{"epoch": 0.7981859410430839, "step": 529, "batch_size": 64, "mean": 7.4513702392578125, "std": 7.53562068939209, "min": -13.226638793945312, "p10": -1.801815032958984, "median": 7.397148132324219, "p90": 15.734031105041504, "max": 23.23394775390625, "pos_frac": 0.765625, "sample": [10.193412780761719, -1.3088798522949219, 8.441207885742188, 2.6697235107421875, 13.695735931396484, 2.6130752563476562, 23.23394775390625, 17.41874885559082, -2.1979637145996094, -0.17466354370117188, -1.1802940368652344, -2.272745132446289, 6.5941009521484375, 4.752044677734375, 5.7210693359375, 0.9486122131347656, -2.46246337890625, 13.248001098632812, -0.5172805786132812, 9.837654113769531, 15.27713394165039, 0.9179534912109375, -4.0990142822265625, 5.548393249511719, 8.951507568359375, 15.100799560546875, -3.9708099365234375, -0.03830718994140625, 4.828981399536133, 3.873910903930664, 15.027366638183594, 9.18527603149414, 15.209121704101562, 19.950048446655273, 7.0460968017578125, 15.448249816894531, 14.07718276977539, 15.592344284057617, -0.9255294799804688, 15.794754028320312, 7.748199462890625, 12.052862167358398, 22.657318115234375, 4.6349945068359375, 4.371894836425781, 12.905794143676758, 9.302162170410156, -0.6261463165283203, 11.324653625488281, 12.007820129394531, 4.246513366699219, 13.904869079589844, 2.467172622680664, 14.715950012207031, -2.013072967529297, 22.08521270751953, 9.433906555175781, 11.839412689208984, 4.935993194580078, 8.83740234375, -13.226638793945312, 4.967414855957031, -0.1837158203125, 16.44921875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000529.npy"}
{"epoch": 0.799697656840514, "step": 530, "batch_size": 64, "mean": 4.754424571990967, "std": 6.014433860778809, "min": -4.903806686401367, "p10": -2.296334075927734, "median": 3.797182083129883, "p90": 12.461758613586428, "max": 19.912200927734375, "pos_frac": 0.734375, "sample": [17.24871826171875, 8.359756469726562, 4.787410736083984, -1.9874458312988281, 6.384330749511719, -0.6110992431640625, 19.912200927734375, -2.7006492614746094, 8.733871459960938, 8.44271469116211, 3.4226036071777344, 9.585578918457031, -1.6259117126464844, 4.80616569519043, 8.976373672485352, 9.34013557434082, 3.2168807983398438, 3.9296493530273438, 14.616645812988281, -4.818199157714844, 0.17528343200683594, -4.801795959472656, 1.28375244140625, 7.673057556152344, 9.834148406982422, -0.5884227752685547, 9.774436950683594, 5.847574234008789, 2.6882171630859375, 17.305938720703125, 5.2155303955078125, 2.403656005859375, 13.073127746582031, 15.4188232421875, -0.8377914428710938, 0.6063365936279297, 7.00457763671875, 11.916372299194336, -1.5731315612792969, 1.7135238647460938, 2.003021240234375, 8.978256225585938, 1.1208724975585938, 5.037223815917969, -2.4287147521972656, 2.264476776123047, 11.526603698730469, -0.7613258361816406, -4.903806686401367, 11.731536865234375, -3.5970802307128906, -3.5113258361816406, 8.633001327514648, -0.9645576477050781, 11.46152114868164, 1.3672256469726562, 9.606849670410156, -0.3844718933105469, 2.0261154174804688, -1.7482452392578125, 12.69549560546875, 0.5639190673828125, 3.664714813232422, 5.748931884765625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000530.npy"}
{"epoch": 0.8012093726379441, "step": 531, "batch_size": 64, "mean": 3.63983416557312, "std": 6.428878307342529, "min": -11.5277099609375, "p10": -3.567897796630859, "median": 2.282472610473633, "p90": 12.35207576751709, "max": 20.183303833007812, "pos_frac": 0.71875, "sample": [-6.603809356689453, 12.324474334716797, 8.856170654296875, 2.163379669189453, 3.30828857421875, 0.439239501953125, 0.09012603759765625, 6.324127197265625, 2.9825897216796875, 1.3550834655761719, 2.4015655517578125, 1.5041255950927734, 9.375396728515625, -3.2508544921875, 12.36390495300293, 10.160377502441406, -4.4237518310546875, -3.763153076171875, 1.5926437377929688, 19.305492401123047, 4.387969970703125, 12.033416748046875, -1.5270309448242188, -4.1723785400390625, -1.3425636291503906, 3.89691162109375, 3.5513763427734375, 2.6756057739257812, 4.535888671875, 0.12488555908203125, 3.2234344482421875, -1.6548080444335938, -0.41607093811035156, 1.1715011596679688, -11.5277099609375, -0.8534946441650391, -0.0640869140625, 7.158271789550781, 7.970802307128906, -9.475234985351562, 5.794029235839844, 3.1708450317382812, 12.84280776977539, 11.554252624511719, -3.212677001953125, -3.7037734985351562, 15.753009796142578, 13.031787872314453, 4.8443450927734375, 0.9185791015625, 0.5787200927734375, 2.049478530883789, 5.0684814453125, 10.281280517578125, -0.883026123046875, 1.8831939697265625, 9.238006591796875, -0.9640541076660156, 11.576045989990234, 20.183303833007812, 1.2520370483398438, 1.619802474975586, 14.362106323242188, -0.49129486083984375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000531.npy"}
{"epoch": 0.8027210884353742, "step": 532, "batch_size": 64, "mean": 5.41311502456665, "std": 7.33374547958374, "min": -8.286918640136719, "p10": -3.7292163848876947, "median": 4.65069580078125, "p90": 16.04519119262695, "max": 22.721580505371094, "pos_frac": 0.765625, "sample": [10.852340698242188, 9.530891418457031, 15.042366027832031, 16.012351989746094, -5.261138916015625, -0.19591903686523438, 6.637947082519531, 20.75463104248047, 0.9695243835449219, 4.7920074462890625, -3.905559539794922, 8.596660614013672, 14.717277526855469, 2.8519210815429688, 14.732452392578125, 3.98883056640625, 5.055438995361328, 6.156898498535156, 1.612722396850586, -8.286918640136719, 0.3854560852050781, 2.0732803344726562, 10.959671020507812, 2.447854995727539, 5.248199462890625, 7.4681243896484375, 3.530763626098633, 16.26031494140625, 8.473064422607422, 12.412551879882812, 10.844985961914062, 0.7704582214355469, 3.946868896484375, 22.721580505371094, 1.6817169189453125, 8.318878173828125, 6.706291198730469, -3.3177490234375, -0.563720703125, 1.6377277374267578, -2.52252197265625, -5.076438903808594, 1.900533676147461, -2.008464813232422, 17.498252868652344, 4.016838073730469, 16.05926513671875, -2.8095169067382812, 5.899665832519531, 6.502586364746094, 3.8724441528320312, 4.7225494384765625, -1.5733909606933594, 17.011947631835938, -5.740821838378906, 12.856800079345703, -4.328144073486328, 4.5788421630859375, 18.35124969482422, -8.016510009765625, 0.12548446655273438, -2.7670841217041016, 13.299713134765625, 7.925045013427734], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000532.npy"}
{"epoch": 0.8042328042328042, "step": 533, "batch_size": 64, "mean": 4.841955184936523, "std": 6.7857666015625, "min": -15.016826629638672, "p10": -2.388016128540039, "median": 3.7715225219726562, "p90": 14.31616439819336, "max": 21.020187377929688, "pos_frac": 0.765625, "sample": [7.282066345214844, 10.737188339233398, -2.2699127197265625, 8.713268280029297, 6.051074981689453, 9.026872634887695, 1.4090118408203125, -0.6790580749511719, 9.961540222167969, 9.88783073425293, 2.090484619140625, 1.8679561614990234, 4.408470153808594, -1.491018295288086, 10.071334838867188, 11.80877685546875, 1.699056625366211, 9.705406188964844, 3.02349853515625, 0.12738800048828125, 1.5803070068359375, 2.4799861907958984, 3.845989227294922, 3.114715576171875, -3.354461669921875, 6.01521110534668, 10.607254028320312, -1.500396728515625, 3.2404556274414062, -8.751358032226562, -0.822418212890625, 0.08759689331054688, 16.090375900268555, 16.13513946533203, -2.3992347717285156, 14.4603271484375, -3.5616607666015625, 17.996337890625, 3.3953609466552734, -1.2517204284667969, 3.0647506713867188, 7.682228088378906, 5.421470642089844, 6.666561126708984, 14.003768920898438, 21.020187377929688, 3.395965576171875, -3.2675094604492188, 3.8940887451171875, -2.3618392944335938, -2.0160465240478516, 14.116111755371094, 6.461204528808594, 1.1089096069335938, 1.1162052154541016, -4.334861755371094, 3.6970558166503906, 7.854736328125, 4.7587127685546875, 17.97753143310547, 14.401901245117188, 11.308883666992188, -15.016826629638672, 8.092910766601562], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000533.npy"}
{"epoch": 0.8057445200302343, "step": 534, "batch_size": 64, "mean": 4.603996276855469, "std": 6.568345069885254, "min": -11.195960998535156, "p10": -3.5339553833007806, "median": 4.757715225219727, "p90": 12.196589088439943, "max": 21.683135986328125, "pos_frac": 0.765625, "sample": [3.104644775390625, -5.21368408203125, 5.63037109375, 3.9073028564453125, 1.1784858703613281, 7.785560607910156, 2.632537841796875, -9.707294464111328, 4.569927215576172, -7.072509765625, 0.2376422882080078, 3.8776779174804688, 8.66177749633789, 7.0211639404296875, -1.854888916015625, 21.683135986328125, 6.744804382324219, 4.206670761108398, 10.370677947998047, -5.90435791015625, 11.503097534179688, 11.879608154296875, 3.012187957763672, 5.9496002197265625, -0.8468093872070312, 3.19769287109375, -11.195960998535156, 5.704172134399414, 14.309654235839844, 6.419057846069336, 6.322723388671875, -2.4935302734375, 16.25519561767578, 14.818939208984375, 13.753728866577148, 12.298812866210938, 2.3239269256591797, 6.2632293701171875, 19.112777709960938, 5.585500717163086, 9.091156005859375, 11.581642150878906, -3.8261184692382812, 0.0377655029296875, 8.795236587524414, 6.094280242919922, -2.8117828369140625, 4.978126525878906, 1.9384880065917969, -0.21734619140625, 3.5676307678222656, 10.952751159667969, 4.945503234863281, -2.6089439392089844, 2.3547592163085938, 2.5746917724609375, 9.041149139404297, -4.206321716308594, 5.387189865112305, -0.980804443359375, 9.208358764648438, 3.6192855834960938, 11.958066940307617, -2.8522415161132812], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000534.npy"}
{"epoch": 0.8072562358276644, "step": 535, "batch_size": 64, "mean": 4.690255165100098, "std": 7.49647331237793, "min": -15.487512588500977, "p10": -3.5723827362060545, "median": 4.844270706176758, "p90": 15.294014739990235, "max": 24.417701721191406, "pos_frac": 0.75, "sample": [1.7422752380371094, -4.8535614013671875, 15.580070495605469, 7.519565582275391, 3.4777984619140625, 9.299808502197266, -9.284482955932617, -2.664623260498047, -3.6972618103027344, 7.691520690917969, 9.62408447265625, -1.136770248413086, 5.23579216003418, 7.312644958496094, -0.9084873199462891, 1.1176986694335938, -2.6594085693359375, 4.370147705078125, 0.26937294006347656, -1.3592605590820312, 5.8267822265625, 7.012840270996094, 8.307046890258789, 3.52044677734375, 7.922794342041016, -1.9271926879882812, 17.028717041015625, -3.9832420349121094, -7.885734558105469, 2.256439208984375, 14.897674560546875, 21.784637451171875, 8.6248779296875, -3.2809982299804688, 4.257299423217773, 6.805704116821289, 0.9265594482421875, 18.572357177734375, -6.515647888183594, 5.017131805419922, 5.882392883300781, 0.87078857421875, 24.417701721191406, 6.504638671875, 0.365966796875, -15.487512588500977, -2.511749267578125, 8.419670104980469, 1.102752685546875, 9.895011901855469, 9.008102416992188, -3.209014892578125, 1.9171524047851562, 15.463874816894531, 9.642017364501953, 6.576885223388672, 5.598777770996094, 10.30902099609375, 13.660152435302734, 4.671409606933594, 1.3871994018554688, 0.9226913452148438, 17.421142578125, 11.501861572265625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000535.npy"}
{"epoch": 0.8087679516250945, "step": 536, "batch_size": 64, "mean": 5.52219820022583, "std": 5.955845832824707, "min": -4.140449523925781, "p10": -1.4579330444335934, "median": 4.049476623535156, "p90": 14.044841766357424, "max": 20.557785034179688, "pos_frac": 0.8125, "sample": [6.279552459716797, 2.2013168334960938, -4.140449523925781, 19.413755416870117, 14.9649658203125, 3.415740966796875, 2.1957244873046875, -3.3962173461914062, 4.0213165283203125, 1.7855510711669922, -2.3078765869140625, 2.9763412475585938, 11.025291442871094, 9.334892272949219, 3.184995651245117, 6.269813537597656, 6.343292236328125, 15.396631240844727, 20.557785034179688, 7.618587493896484, 4.685968399047852, -1.5742816925048828, 11.721132278442383, 8.018882751464844, -0.6575393676757812, 17.128028869628906, 6.844623565673828, 3.0262279510498047, 4.80499267578125, 5.057407379150391, 12.390499114990234, -3.6289939880371094, 2.155132293701172, 7.060901641845703, -0.8961639404296875, 0.832763671875, 11.493274688720703, 0.95330810546875, 4.8811187744140625, 0.9940280914306641, 14.255401611328125, 12.370271682739258, 6.7894439697265625, -2.190704345703125, 1.138458251953125, 3.87884521484375, 3.2858200073242188, 9.504419326782227, 3.9715328216552734, 10.106353759765625, -0.8218593597412109, 13.327835083007812, 1.705352783203125, 13.553535461425781, 11.201873779296875, 4.07763671875, 2.299528121948242, 16.48760986328125, -1.186452865600586, -2.4416656494140625, 2.5743370056152344, 5.257627487182617, -0.92413330078125, 2.7672882080078125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000536.npy"}
{"epoch": 0.8102796674225246, "step": 537, "batch_size": 64, "mean": 5.068394660949707, "std": 7.330803871154785, "min": -13.646690368652344, "p10": -4.3390148162841795, "median": 4.952662467956543, "p90": 14.726646041870119, "max": 19.821136474609375, "pos_frac": 0.734375, "sample": [4.005657196044922, 11.706565856933594, 4.8479461669921875, 10.991424560546875, 13.67645263671875, 11.810029983520508, 10.594703674316406, -0.4546070098876953, 8.004608154296875, 7.045398712158203, 1.434713363647461, 4.921421051025391, 14.811134338378906, 10.88873291015625, 14.52950668334961, 10.47552490234375, 12.93992805480957, 4.983903884887695, 0.38390350341796875, -4.159889221191406, -2.1313247680664062, 4.84173583984375, 5.956878662109375, 2.3211288452148438, -2.084524154663086, 15.374465942382812, 5.9661865234375, 0.7974700927734375, -13.646690368652344, 0.06300926208496094, 5.94268798828125, 2.631267547607422, 3.6365127563476562, -2.7471771240234375, -4.839630126953125, -0.50897216796875, 15.811279296875, 6.607761383056641, 9.16986083984375, -1.3520088195800781, 13.232872009277344, 12.836952209472656, 4.3929443359375, 3.8546600341796875, -2.0167293548583984, -0.9504051208496094, 7.577972412109375, -6.2816009521484375, 4.265377044677734, 11.619476318359375, 4.638294219970703, 9.198896408081055, -7.983091354370117, 17.432220458984375, 15.071144104003906, 19.821136474609375, -1.8119316101074219, 6.7928314208984375, -6.344381332397461, 7.02545166015625, 15.800483703613281, 7.649726867675781, -12.27621078491211, -4.415782928466797], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000537.npy"}
{"epoch": 0.8117913832199547, "step": 538, "batch_size": 64, "mean": 6.019152641296387, "std": 6.648603439331055, "min": -10.509651184082031, "p10": -2.024769973754882, "median": 5.93939208984375, "p90": 14.028425025939942, "max": 25.540435791015625, "pos_frac": 0.8125, "sample": [-0.14858245849609375, 9.78468132019043, 1.199066162109375, 2.7368431091308594, 7.12982177734375, -10.509651184082031, 7.609409332275391, -1.2467803955078125, 8.778095245361328, 10.149776458740234, 5.019453048706055, 13.76480484008789, 0.5549468994140625, 10.857650756835938, 8.537588119506836, 7.538227081298828, 13.105247497558594, 16.690704345703125, 6.3305206298828125, 7.398683547973633, 3.6139984130859375, -0.7111606597900391, 14.72650146484375, 5.343414306640625, 4.279970169067383, -4.114103317260742, 4.42181396484375, 4.209312438964844, 22.06866455078125, 8.705375671386719, -2.45849609375, 7.535377502441406, -3.9872589111328125, 1.19732666015625, -3.3099708557128906, 2.9834213256835938, 5.5482635498046875, 6.7401123046875, 1.432159423828125, 3.1236572265625, 13.552986145019531, -3.669097900390625, 16.939926147460938, 8.754226684570312, -1.4354133605957031, 8.205129623413086, -2.2773513793945312, 14.14140510559082, 7.582023620605469, 25.540435791015625, 7.9748992919921875, 9.888046264648438, 9.776870727539062, -0.7041473388671875, 4.165985107421875, 2.3112106323242188, 21.845653533935547, 1.5887222290039062, 8.743488311767578, 2.6752777099609375, 2.3464889526367188, 8.152841567993164, 8.849933624267578, 3.6473655700683594], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000538.npy"}
{"epoch": 0.8133030990173847, "step": 539, "batch_size": 64, "mean": 5.081345081329346, "std": 6.325012683868408, "min": -7.05194091796875, "p10": -1.7905324935913085, "median": 3.439333915710449, "p90": 13.41413860321045, "max": 20.2664794921875, "pos_frac": 0.796875, "sample": [4.540428161621094, 11.677696228027344, 0.6991310119628906, 18.15264892578125, 9.057470321655273, 2.3931198120117188, 12.49945068359375, 8.49090576171875, 12.64654541015625, 1.3416595458984375, 14.141189575195312, 4.5865325927734375, -0.1540069580078125, 4.3698272705078125, -1.48052978515625, -0.2374134063720703, 11.43951416015625, -2.5249404907226562, 16.34280776977539, 4.271213531494141, 4.3443145751953125, -1.7607440948486328, 6.661865234375, -7.05194091796875, 1.7113571166992188, -1.8032989501953125, 0.7730121612548828, 4.091358184814453, 20.2664794921875, 2.8333473205566406, 2.0109024047851562, 3.009410858154297, 2.250030517578125, 2.3072128295898438, 11.859725952148438, 5.564939498901367, 1.5928459167480469, -0.19696044921875, 13.053268432617188, 2.5653228759765625, -2.5698928833007812, 2.39453125, 3.8692569732666016, 0.6616325378417969, 10.977981567382812, 4.2608489990234375, -3.779956817626953, 16.292678833007812, 4.163421630859375, -2.9102096557617188, 10.409244537353516, 11.174980163574219, 1.1414566040039062, 1.5200347900390625, 2.6373138427734375, 11.145599365234375, -0.62841796875, 1.9588470458984375, -6.347499847412109, 13.474824905395508, 4.182401657104492, 13.272537231445312, 19.665130615234375, 1.9036407470703125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000539.npy"}
{"epoch": 0.8148148148148148, "step": 540, "batch_size": 64, "mean": 6.36396598815918, "std": 7.112360000610352, "min": -9.682079315185547, "p10": -1.6299329757690424, "median": 6.103218078613281, "p90": 16.37178192138672, "max": 21.682281494140625, "pos_frac": 0.828125, "sample": [-6.388542175292969, 2.4705238342285156, 21.682281494140625, 9.938470840454102, 2.141042709350586, -9.682079315185547, 10.321609497070312, 1.4694366455078125, 2.330291748046875, 10.600215911865234, 0.778045654296875, 10.576568603515625, 14.700098037719727, 6.255226135253906, -3.8091602325439453, 6.327068328857422, 18.205032348632812, 6.666313171386719, -1.8789100646972656, 8.132080078125, 9.181724548339844, -3.1067047119140625, 9.83197021484375, 3.276254653930664, 16.421783447265625, 16.52227020263672, -0.9279994964599609, 5.951210021972656, -0.61407470703125, 7.578788757324219, -3.46466064453125, 14.764633178710938, 15.938461303710938, 6.6013641357421875, 15.309860229492188, 0.89471435546875, 0.17675018310546875, 10.875282287597656, -3.3998870849609375, 1.0547924041748047, 0.3963508605957031, 11.69427490234375, 7.727745056152344, 19.593002319335938, 18.790103912353516, 4.260986328125, 16.255111694335938, 5.5624542236328125, 1.4245147705078125, -1.0489864349365234, 4.7003021240234375, 12.198410034179688, -0.5330963134765625, 0.00470733642578125, 1.7305755615234375, 11.732147216796875, 0.7610549926757812, 18.379486083984375, 3.2000274658203125, 10.058082580566406, 7.877098083496094, 1.6136207580566406, 2.916513442993164, 14.29719352722168], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000540.npy"}
{"epoch": 0.8163265306122449, "step": 541, "batch_size": 64, "mean": 5.945107460021973, "std": 7.421595573425293, "min": -10.85418701171875, "p10": -2.933929824829101, "median": 5.763389587402344, "p90": 16.0051513671875, "max": 22.06237030029297, "pos_frac": 0.796875, "sample": [22.06237030029297, 9.367416381835938, 10.300968170166016, 17.081239700317383, -1.095916748046875, 16.256614685058594, 0.23332977294921875, 12.95528793334961, 15.018083572387695, -1.4095535278320312, -1.0856704711914062, 11.565948486328125, 5.386985778808594, 6.028501510620117, 2.91619873046875, -7.123439788818359, 5.8515777587890625, 17.712203979492188, 2.7575454711914062, 13.375659942626953, -2.5787200927734375, 7.347875595092773, 18.822132110595703, 12.179733276367188, 15.418403625488281, 6.744197845458984, 9.352569580078125, -0.3955230712890625, -7.93250846862793, -1.3508243560791016, 5.844291687011719, 4.82501220703125, 15.157270431518555, 10.747100830078125, 0.8286056518554688, 10.738075256347656, 4.794164657592773, -7.287635803222656, 10.393978118896484, 0.46512413024902344, 0.6448345184326172, -3.086162567138672, 17.492713928222656, 12.026294708251953, 1.813638687133789, 7.016681671142578, 11.724117279052734, 5.682487487792969, -10.85418701171875, 1.2800369262695312, 0.36431312561035156, 11.392406463623047, 6.2578887939453125, 3.5980606079101562, 0.2659111022949219, -6.142242431640625, 5.189552307128906, 5.2223968505859375, 1.0333938598632812, 8.530109405517578, 2.8438339233398438, -4.191913604736328, 18.848346710205078, 11.265663146972656], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000541.npy"}
{"epoch": 0.817838246409675, "step": 542, "batch_size": 64, "mean": 6.0159406661987305, "std": 7.337803363800049, "min": -7.802642822265625, "p10": -3.0301280975341793, "median": 4.873044967651367, "p90": 16.90193748474121, "max": 23.104595184326172, "pos_frac": 0.796875, "sample": [11.66305160522461, 11.495563507080078, 1.5085487365722656, 12.690109252929688, 3.9408111572265625, 5.821189880371094, 4.113433837890625, 8.127479553222656, 3.0935592651367188, 3.7476654052734375, 6.807880401611328, -0.14915084838867188, -3.1367263793945312, 6.408599853515625, 2.733644485473633, 8.338159561157227, 0.24327850341796875, 14.2822265625, -3.618194580078125, 12.800939559936523, 17.41954803466797, 15.452690124511719, 2.4308109283447266, -0.6138534545898438, -6.666473388671875, -7.802642822265625, 6.527130126953125, 1.1745376586914062, 1.7991561889648438, -3.8899154663085938, 18.381114959716797, 15.023616790771484, 1.390838623046875, 11.624732971191406, -2.7813987731933594, -2.198474884033203, -4.2945404052734375, 6.8348846435546875, -2.716949462890625, 11.891860961914062, 7.733741760253906, 1.2720298767089844, 18.471263885498047, -0.7165679931640625, 7.056888580322266, 3.039459228515625, 11.649120330810547, 3.2426795959472656, 0.22768402099609375, 18.08979034423828, 16.596763610839844, 19.959949493408203, -6.698392868041992, 6.184577941894531, 3.9534435272216797, 6.07537841796875, 0.05859375, 23.104595184326172, 11.779441833496094, 5.262351989746094, 4.483737945556641, 14.607635498046875, 2.6545791625976562, 17.032726287841797], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000542.npy"}
{"epoch": 0.8193499622071051, "step": 543, "batch_size": 64, "mean": 5.263012886047363, "std": 6.901883125305176, "min": -17.93122100830078, "p10": -1.5788030624389644, "median": 4.483904838562012, "p90": 14.583052062988283, "max": 22.33570098876953, "pos_frac": 0.796875, "sample": [1.8594970703125, 2.7368412017822266, -1.1381187438964844, -0.7433204650878906, 6.365852355957031, 3.225666046142578, 10.979339599609375, 13.982109069824219, -9.339347839355469, 13.63287353515625, -0.34393310546875, 8.286035537719727, 6.200847625732422, 11.725467681884766, 16.565216064453125, 6.308219909667969, 22.33570098876953, -6.323268890380859, 3.038227081298828, 10.309940338134766, 2.477212905883789, 3.4923629760742188, 0.17652511596679688, 1.8290557861328125, 1.8885459899902344, 11.834806442260742, -17.93122100830078, 0.3793830871582031, 2.966289520263672, -1.778116226196289, 5.8753662109375, 13.731063842773438, 17.264732360839844, 6.006744384765625, 9.317996978759766, 5.1755523681640625, 17.861661911010742, -0.2995128631591797, -2.4769325256347656, -0.9817237854003906, 8.947925567626953, 4.807889938354492, 6.945404052734375, 12.238739013671875, 9.532127380371094, 7.346382141113281, 17.088058471679688, 15.948112487792969, -0.6947822570800781, 1.3149681091308594, 2.371112823486328, 2.1713085174560547, 1.7089767456054688, 4.159919738769531, 7.312835693359375, 1.8400650024414062, 7.523490905761719, 3.211376190185547, 3.7339096069335938, -1.7676677703857422, 5.638336181640625, 14.840599060058594, -1.9717674255371094, 6.111850738525391], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000543.npy"}
{"epoch": 0.8208616780045351, "step": 544, "batch_size": 64, "mean": 6.795982360839844, "std": 8.291118621826172, "min": -10.447244644165039, "p10": -4.882183074951172, "median": 5.674022674560547, "p90": 18.83520431518555, "max": 25.05474853515625, "pos_frac": 0.796875, "sample": [12.007026672363281, -6.743110656738281, 5.6445465087890625, 3.5469131469726562, 10.668342590332031, 9.809326171875, -6.604719161987305, 2.716035842895508, 11.844169616699219, 2.2151947021484375, 3.28265380859375, 2.688627243041992, 4.681640625, 17.734153747558594, 5.228660583496094, -0.2637519836425781, -10.447244644165039, 6.488471984863281, 15.835853576660156, 8.8743896484375, 5.703498840332031, 3.208770751953125, 7.959495544433594, 12.360347747802734, 13.125411987304688, 4.337091445922852, 3.2258682250976562, -1.8167877197265625, -0.04894065856933594, -4.9627838134765625, 25.05474853515625, -4.694114685058594, 6.86323356628418, -0.6864395141601562, 19.187942504882812, 6.026451110839844, 17.218780517578125, -8.102447509765625, 5.344305038452148, 6.247184753417969, 6.5341033935546875, 3.3314285278320312, 18.54613494873047, 1.6343326568603516, 3.3820114135742188, 4.298835754394531, 14.310428619384766, 4.91990852355957, 19.700973510742188, -7.5184326171875, 0.1272106170654297, 20.257339477539062, 15.557907104492188, 24.538726806640625, -5.3565216064453125, 7.795631408691406, 17.755340576171875, 1.8142852783203125, 19.633533477783203, 18.959091186523438, -0.9542942047119141, 9.298553466796875, 10.820503234863281, 10.79705810546875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000544.npy"}
{"epoch": 0.8223733938019653, "step": 545, "batch_size": 64, "mean": 4.799767971038818, "std": 7.146633625030518, "min": -11.85310173034668, "p10": -3.354701614379883, "median": 4.334157943725586, "p90": 14.570146942138674, "max": 23.53186798095703, "pos_frac": 0.734375, "sample": [9.334726333618164, 6.480491638183594, -2.933074951171875, 16.117870330810547, 4.295696258544922, 2.3831024169921875, -3.4277114868164062, -1.6634025573730469, -1.3769207000732422, 5.681301116943359, 6.3424072265625, -3.292144775390625, 18.748306274414062, 3.0341243743896484, 0.2738170623779297, 10.03104019165039, 3.8602828979492188, 4.57403564453125, -3.2766265869140625, 4.37261962890625, 5.551731109619141, 13.518707275390625, -0.8454246520996094, 9.113052368164062, 13.982635498046875, 8.389778137207031, 3.85162353515625, 0.9263706207275391, 8.742904663085938, 1.597311019897461, 0.44957733154296875, -6.991477966308594, -6.791778564453125, 4.904193878173828, -0.32379150390625, -1.9770946502685547, 17.09563446044922, 18.50543975830078, 14.821937561035156, -0.4427642822265625, -0.23577499389648438, 8.441848754882812, 8.674678802490234, 4.989528656005859, 1.2770729064941406, 9.427696228027344, -3.381511688232422, 4.4071044921875, 4.07489013671875, -11.85310173034668, 0.884124755859375, 10.49224853515625, 10.567459106445312, 23.53186798095703, 11.695510864257812, -4.315101623535156, 1.6935005187988281, 5.633060455322266, -6.723945617675781, 0.7906551361083984, 10.985618591308594, 17.567440032958984, 12.761573791503906, 2.160205841064453], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000545.npy"}
{"epoch": 0.8238851095993953, "step": 546, "batch_size": 64, "mean": 4.658918380737305, "std": 6.6222453117370605, "min": -11.496711730957031, "p10": -3.342230987548828, "median": 4.3800153732299805, "p90": 12.094890975952149, "max": 19.002304077148438, "pos_frac": 0.78125, "sample": [14.146835327148438, -2.89434814453125, 12.023933410644531, 6.13720703125, 6.124530792236328, 11.932771682739258, 1.509368896484375, -3.2628326416015625, -1.2589492797851562, -2.2507286071777344, -9.610382080078125, 8.183319091796875, 3.362201690673828, 2.6353912353515625, -11.496711730957031, 9.193817138671875, 2.349699020385742, 2.0746498107910156, -4.259769439697266, 3.4866104125976562, -3.3762588500976562, 7.137094497680664, 6.373847961425781, 5.2199554443359375, 6.600044250488281, 4.352699279785156, 9.32952880859375, 0.130096435546875, 16.387142181396484, 19.002304077148438, 4.407331466674805, 5.725990295410156, 12.635011672973633, 1.5470123291015625, 7.312067031860352, -6.96002197265625, 3.75006103515625, 11.3341064453125, 2.975362777709961, 5.442634582519531, 4.974985122680664, 3.7096080780029297, -5.989501953125, 3.72601318359375, 8.350282669067383, 10.886054992675781, -2.9651031494140625, 3.6368846893310547, 8.682710647583008, 10.12893295288086, 10.262985229492188, 1.2101058959960938, 16.71405029296875, -1.146636962890625, -1.5824928283691406, 3.472379684448242, 10.863508224487305, 8.705825805664062, 17.658260345458984, 4.010290145874023, 11.538124084472656, -9.135845184326172, 12.125301361083984, 0.8814563751220703], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000546.npy"}
{"epoch": 0.8253968253968254, "step": 547, "batch_size": 64, "mean": 5.906020164489746, "std": 6.842987060546875, "min": -12.807952880859375, "p10": -1.5885675430297848, "median": 6.137086868286133, "p90": 13.938745880126953, "max": 23.89684295654297, "pos_frac": 0.859375, "sample": [1.3865623474121094, 0.939910888671875, 5.223810195922852, 4.283935546875, 14.603179931640625, 2.5830020904541016, -4.331296920776367, 23.89684295654297, 9.502513885498047, 8.702880859375, 6.2909393310546875, 8.350929260253906, 5.427743911743164, 4.56427001953125, 1.9442596435546875, 0.0613861083984375, 11.5704345703125, 20.608497619628906, 1.6649055480957031, 5.5627288818359375, 11.215316772460938, 9.309295654296875, 18.218215942382812, 1.31982421875, 13.823287963867188, 6.142597198486328, 1.500823974609375, 8.908622741699219, 13.988227844238281, 1.070526123046875, 7.7752685546875, -1.7332820892333984, -12.807952880859375, 6.371833801269531, -3.8010482788085938, 8.189128875732422, 11.783557891845703, 1.4205093383789062, 8.363380432128906, 7.660715103149414, 6.3970184326171875, 16.746131896972656, 7.207542419433594, 9.13473129272461, -6.1115875244140625, 8.902095794677734, 1.762542724609375, -11.825241088867188, -1.2509002685546875, 5.907627105712891, 12.565757751464844, 6.627161026000977, 18.353195190429688, -2.3901710510253906, 4.358972549438477, 1.015045166015625, 6.1315765380859375, 12.775150299072266, 0.7790470123291016, 6.0479736328125, 9.276588439941406, 3.070587158203125, 11.2340087890625, -0.28585243225097656], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000547.npy"}
{"epoch": 0.8269085411942555, "step": 548, "batch_size": 64, "mean": 4.508130073547363, "std": 6.734511852264404, "min": -6.951316833496094, "p10": -3.111934471130371, "median": 3.665393829345703, "p90": 13.551522827148442, "max": 23.920822143554688, "pos_frac": 0.765625, "sample": [2.4281463623046875, 8.824981689453125, 4.625999450683594, -4.3248443603515625, -2.457193374633789, 10.451019287109375, -3.5005645751953125, 23.36407470703125, 11.698186874389648, 3.7761611938476562, -2.4887466430664062, 15.806655883789062, 0.5572376251220703, 0.2909393310546875, 0.398895263671875, 3.1287078857421875, -3.1592483520507812, 2.7704849243164062, 3.9121856689453125, 4.933013916015625, 12.328178405761719, 5.2310028076171875, -6.951316833496094, 3.1601104736328125, 5.1539459228515625, 3.0688858032226562, 19.377479553222656, 9.841060638427734, 9.834831237792969, 14.075813293457031, 0.5320091247558594, 3.55462646484375, 0.39498138427734375, 5.305015563964844, 0.9510650634765625, 2.9397735595703125, 16.34429931640625, 4.132205963134766, -3.6891326904296875, 5.796623229980469, 7.782249450683594, 3.9385509490966797, 6.8795166015625, -0.339996337890625, 18.82073974609375, 0.3225669860839844, -1.5059432983398438, 8.521881103515625, -3.7855682373046875, 4.228736877441406, 5.2806243896484375, -6.9209136962890625, 3.8714466094970703, 8.922069549560547, 2.7831268310546875, -0.096893310546875, 9.691192626953125, -2.769195556640625, 3.0877685546875, -2.3512420654296875, 23.920822143554688, 6.458827972412109, -3.001535415649414, 2.36395263671875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000548.npy"}
{"epoch": 0.8284202569916855, "step": 549, "batch_size": 64, "mean": 4.411401748657227, "std": 6.8581109046936035, "min": -12.919864654541016, "p10": -3.42620849609375, "median": 3.833192825317383, "p90": 13.55303287506104, "max": 17.87090301513672, "pos_frac": 0.671875, "sample": [-0.18548583984375, 11.501861572265625, -2.4442825317382812, 3.4537124633789062, -0.5403022766113281, -0.09356689453125, 7.830116271972656, 1.9677162170410156, 4.468914031982422, 15.431995391845703, 16.522293090820312, 2.784454345703125, 3.0572261810302734, 14.108896255493164, -0.53192138671875, 3.0029678344726562, -5.736667633056641, 8.392425537109375, 7.095376014709473, 9.705951690673828, 12.335922241210938, 9.902320861816406, 8.14605712890625, -3.4963035583496094, 8.762359619140625, -0.022914886474609375, 2.1234397888183594, 11.338043212890625, 3.8081893920898438, -9.58929443359375, 11.887100219726562, 16.79688262939453, 9.199462890625, -0.5347919464111328, 6.43317985534668, 14.025779724121094, -1.1035785675048828, -6.819671630859375, -2.325469970703125, 6.227325439453125, -8.179092407226562, 4.291229248046875, 1.4486064910888672, 2.5239734649658203, 9.436702728271484, 8.958009719848633, -3.262653350830078, 7.1497344970703125, 8.182769775390625, -1.3948974609375, -0.96636962890625, 12.449956893920898, 11.08441162109375, 5.1483612060546875, 3.210540771484375, -1.64959716796875, 5.893758773803711, 3.858196258544922, 0.47681236267089844, 17.87090301513672, 16.931652069091797, -12.919864654541016, -3.9755191802978516, -1.1236419677734375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000549.npy"}
{"epoch": 0.8299319727891157, "step": 550, "batch_size": 64, "mean": 3.995119571685791, "std": 6.474963188171387, "min": -11.2198486328125, "p10": -3.0837650299072257, "median": 2.4418811798095703, "p90": 11.848506927490236, "max": 19.323291778564453, "pos_frac": 0.796875, "sample": [3.085662841796875, 19.323291778564453, 17.869182586669922, 4.937492370605469, -7.421236038208008, 1.622100830078125, 0.051677703857421875, 0.6151351928710938, 1.1326789855957031, 2.4320106506347656, 9.044733047485352, 5.225837707519531, 6.471067428588867, 4.201333999633789, 0.2590351104736328, 14.250373840332031, 18.80957794189453, 8.678367614746094, 6.4600372314453125, 5.032346725463867, 11.282325744628906, 1.6582984924316406, 6.533222198486328, 10.570556640625, 10.351104736328125, -5.338722229003906, 9.756423950195312, 0.3572559356689453, 1.674041748046875, -4.8586883544921875, -5.68388557434082, 0.5371551513671875, 2.6395721435546875, 1.4771881103515625, 1.4505386352539062, 2.451751708984375, 2.9556808471679688, -0.42797088623046875, -0.9559478759765625, 0.7199859619140625, -1.9479866027832031, 7.023735046386719, 6.292430877685547, 1.8332366943359375, 5.477447509765625, 5.283191680908203, 12.091156005859375, 7.558435440063477, 1.960693359375, 10.659900665283203, -3.4137725830078125, 7.970338821411133, 9.519378662109375, 1.6788196563720703, 16.528270721435547, 0.9580440521240234, -11.2198486328125, -0.1294097900390625, 17.392440795898438, -9.087074279785156, 2.2143478393554688, 1.4908294677734375, -2.3137474060058594, -1.3638038635253906], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000550.npy"}
{"epoch": 0.8314436885865457, "step": 551, "batch_size": 64, "mean": 4.548789024353027, "std": 6.3072991371154785, "min": -7.91253662109375, "p10": -2.186036491394043, "median": 2.8377113342285156, "p90": 12.203640747070313, "max": 23.001800537109375, "pos_frac": 0.75, "sample": [4.887428283691406, 4.7452545166015625, 9.420093536376953, 2.9240875244140625, 9.423097610473633, 9.349504470825195, 23.001800537109375, -0.0251007080078125, 1.0652580261230469, 11.717803955078125, -0.6808319091796875, 2.5727462768554688, 16.015724182128906, 0.5860919952392578, 10.639190673828125, 3.8374290466308594, 2.2434844970703125, 5.90576171875, -7.91253662109375, -1.544870376586914, 3.717123031616211, 4.4860076904296875, 7.564216613769531, -0.301116943359375, 2.7513351440429688, 18.567594528198242, 1.713958740234375, 5.6051788330078125, 8.600837707519531, 7.041561126708984, 4.905296325683594, 0.6836090087890625, 0.9156379699707031, 7.70513916015625, -1.10443115234375, 14.5169677734375, -2.795379638671875, 16.584388732910156, 0.34417724609375, 13.815963745117188, -0.9542617797851562, 1.8932342529296875, 0.6120376586914062, 0.9081649780273438, 12.090747833251953, 9.530792236328125, 11.554794311523438, 7.564472198486328, -2.128023147583008, -2.9395389556884766, 2.20068359375, 7.1253662109375, -1.638641357421875, -5.024980545043945, -2.7547225952148438, 1.7474517822265625, -1.5171089172363281, 12.142501831054688, -2.2108993530273438, 1.2659454345703125, -5.9626312255859375, 0.4200420379638672, 12.229843139648438, 11.477737426757812], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000551.npy"}
{"epoch": 0.8329554043839759, "step": 552, "batch_size": 64, "mean": 5.240490913391113, "std": 6.330565452575684, "min": -12.27920913696289, "p10": -1.8558637619018554, "median": 5.457606315612793, "p90": 12.234622192382814, "max": 20.805133819580078, "pos_frac": 0.84375, "sample": [-1.850332260131836, 4.0222625732421875, 5.206001281738281, 0.24466705322265625, 11.930719375610352, 3.8747291564941406, 1.4468650817871094, 2.7499847412109375, 2.3973159790039062, 12.719444274902344, 11.111114501953125, 18.882080078125, 8.014556884765625, 7.317626953125, 2.2388687133789062, -3.1019363403320312, -1.8582344055175781, 11.456111907958984, 4.033470153808594, -0.29366302490234375, 6.394615173339844, 7.8868560791015625, 4.076934814453125, 6.200761795043945, 11.147903442382812, 1.8698997497558594, 8.493598937988281, 12.364866256713867, 8.224296569824219, -9.7567138671875, 5.188772201538086, 10.538110733032227, 11.452163696289062, -12.27920913696289, 7.6895599365234375, -0.5505313873291016, 13.037811279296875, 8.31842041015625, 1.9907760620117188, 4.030769348144531, 8.952194213867188, 5.661308288574219, -5.792808532714844, 14.82501220703125, 6.482696533203125, 7.155998229980469, 9.396835327148438, 3.9165267944335938, -8.44342041015625, 1.4022064208984375, 20.805133819580078, 7.129508972167969, 4.528873443603516, -11.034370422363281, 9.365047454833984, 3.6673812866210938, 2.6843643188476562, 5.37449836730957, 6.9979705810546875, 7.171039581298828, 13.957748413085938, 5.540714263916016, 5.3483123779296875, 3.437349319458008], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000552.npy"}
{"epoch": 0.8344671201814059, "step": 553, "batch_size": 64, "mean": 4.953568935394287, "std": 8.625454902648926, "min": -15.948753356933594, "p10": -3.7315021514892575, "median": 2.8716049194335938, "p90": 18.085712051391607, "max": 25.372467041015625, "pos_frac": 0.640625, "sample": [20.81097412109375, -3.5993309020996094, -0.9549140930175781, 8.786243438720703, 1.1573333740234375, -1.145620346069336, 14.965288162231445, 5.75244140625, -5.5054473876953125, -0.27532386779785156, 0.41644287109375, 1.40966796875, 11.83993911743164, 11.9781494140625, -2.1060562133789062, 13.363304138183594, -1.3777008056640625, 10.363037109375, 2.8720474243164062, -15.948753356933594, -3.0968856811523438, 16.98553466796875, 2.2708053588867188, 8.719001770019531, 19.684547424316406, 11.983268737792969, -0.8969039916992188, -0.15017318725585938, -5.42120361328125, -5.601795196533203, 22.101715087890625, 15.39471435546875, 13.061973571777344, 25.372467041015625, 4.3470001220703125, 8.369226455688477, 18.55721664428711, 6.1570892333984375, 5.4890899658203125, 19.033897399902344, 2.8711624145507812, 3.5398826599121094, -0.9080066680908203, 0.9354305267333984, -3.78814697265625, 1.9816513061523438, -3.424074172973633, 23.34917449951172, -8.92645263671875, -5.401523590087891, 2.5332260131835938, 7.302433013916016, 9.562362670898438, 3.0280838012695312, 0.1920928955078125, 7.2978057861328125, -0.06544113159179688, -2.8955078125, 7.716339111328125, 7.314872741699219, -1.4461631774902344, -2.5634593963623047, 15.716094970703125, -2.0557403564453125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000553.npy"}
{"epoch": 0.8359788359788359, "step": 554, "batch_size": 64, "mean": 4.846909523010254, "std": 7.740989685058594, "min": -17.227615356445312, "p10": -2.9843187332153316, "median": 3.379509925842285, "p90": 15.9768913269043, "max": 23.62986946105957, "pos_frac": 0.78125, "sample": [1.995025634765625, 16.20938491821289, 22.47597312927246, 14.552658081054688, 2.699676513671875, 3.41741943359375, 3.3416004180908203, 3.115478515625, 4.567390441894531, 7.676055908203125, -1.5833282470703125, -0.5973358154296875, -2.7373504638671875, 11.185821533203125, -10.735469818115234, 3.2390174865722656, 7.148048400878906, -6.5599517822265625, 4.2083892822265625, 1.7770576477050781, 0.8686733245849609, 7.008327484130859, 1.3508358001708984, -4.4634552001953125, 8.436614990234375, 8.900276184082031, 17.916175842285156, 1.7153434753417969, -3.4803829193115234, 6.3340911865234375, 23.62986946105957, 0.2222900390625, 6.178218841552734, 0.4500083923339844, -0.6252937316894531, -3.0901622772216797, 10.847370147705078, 0.7043476104736328, 5.035675048828125, -0.10595703125, -17.227615356445312, 14.16650390625, -0.758056640625, 4.379673004150391, 6.8944091796875, 2.0305442810058594, 1.113372802734375, 3.834512710571289, 1.0981330871582031, 0.5441131591796875, 9.11199951171875, 4.177101135253906, -1.3611869812011719, 15.434406280517578, 23.392868041992188, 1.8673477172851562, 6.673240661621094, 6.606838226318359, 2.496734619140625, 14.652240753173828, 17.52875518798828, -3.5122222900390625, 19.673969268798828, 4.156105041503906], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000554.npy"}
{"epoch": 0.8374905517762661, "step": 555, "batch_size": 64, "mean": 4.502638816833496, "std": 7.037137508392334, "min": -16.811752319335938, "p10": -3.848073577880859, "median": 4.5053300857543945, "p90": 13.312849807739262, "max": 20.08380126953125, "pos_frac": 0.796875, "sample": [0.9143657684326172, -13.812118530273438, 2.2776355743408203, 2.5846633911132812, 8.328315734863281, 1.5752182006835938, -6.4841766357421875, 0.49615478515625, -3.205974578857422, -0.9270248413085938, 9.6904296875, -5.030448913574219, 7.11322021484375, 2.041423797607422, 12.121990203857422, 14.199172973632812, 3.116680145263672, -0.6626930236816406, 3.001220703125, 10.77685546875, 3.53204345703125, 0.191986083984375, 20.08380126953125, 2.20916748046875, 10.304891586303711, 3.2508163452148438, 10.752155303955078, 13.726417541503906, 6.4402923583984375, 11.605857849121094, 8.911138534545898, 5.5387115478515625, 7.264808654785156, 14.409149169921875, -6.4607086181640625, 8.820602416992188, -16.811752319335938, 5.785942077636719, 1.7882156372070312, 4.738534927368164, 4.272125244140625, 6.687267303466797, 2.1174163818359375, 3.1062545776367188, -2.4526214599609375, 1.8244094848632812, 15.7747802734375, 6.460834503173828, 13.93722915649414, -3.295318603515625, 12.347858428955078, -1.450042724609375, -7.1182708740234375, 1.1527023315429688, 8.718746185302734, 5.5660552978515625, 11.133403778076172, 5.4332733154296875, 5.986429214477539, -4.084968566894531, 18.290523529052734, 6.582122802734375, 12.2772216796875, 0.7044639587402344], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000555.npy"}
{"epoch": 0.8390022675736961, "step": 556, "batch_size": 64, "mean": 4.736089706420898, "std": 7.569568634033203, "min": -8.51300048828125, "p10": -3.4088188171386715, "median": 2.40432071685791, "p90": 14.88124351501465, "max": 27.91455078125, "pos_frac": 0.703125, "sample": [6.336544036865234, 2.1392364501953125, 3.39788818359375, -0.6049385070800781, -2.3186721801757812, 11.402641296386719, 17.609508514404297, 2.306171417236328, -0.7520561218261719, -4.8996429443359375, 0.24851036071777344, 2.277362823486328, 12.857250213623047, 14.893112182617188, 0.8777275085449219, -2.6120147705078125, 4.9805755615234375, -0.3711585998535156, 5.761600494384766, 14.85354995727539, 0.10881805419921875, 10.96795654296875, 13.372156143188477, 2.0765914916992188, -1.0435237884521484, 3.2783660888671875, -6.776023864746094, 6.535589218139648, 9.59829330444336, -1.4467544555664062, 0.5770320892333984, -8.51300048828125, 16.54051971435547, -0.894927978515625, -1.6528701782226562, 2.502470016479492, 8.01470947265625, 13.241111755371094, 0.9958953857421875, 1.1081123352050781, 13.964815139770508, 27.91455078125, 1.1982269287109375, 8.703285217285156, 12.401542663574219, 10.575172424316406, -0.18648338317871094, 5.570220947265625, -7.483367919921875, -2.9082412719726562, 17.283615112304688, -5.2195281982421875, -0.8143463134765625, 16.528419494628906, 1.218963623046875, -3.62335205078125, 2.5504150390625, -7.488208770751953, 6.361873626708984, 12.953983306884766, 15.716697692871094, 14.628475189208984, 6.074365615844727, 0.2149200439453125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000556.npy"}
{"epoch": 0.8405139833711263, "step": 557, "batch_size": 64, "mean": 6.339790344238281, "std": 7.174754619598389, "min": -5.830474853515625, "p10": -1.2596523284912107, "median": 5.643896102905273, "p90": 17.11936874389649, "max": 25.526344299316406, "pos_frac": 0.796875, "sample": [15.191186904907227, 7.058444976806641, 6.956798553466797, 1.0508804321289062, 0.6733055114746094, -4.142038345336914, 22.4327335357666, 2.1821136474609375, 9.405643463134766, 4.789276123046875, 1.2498550415039062, -4.6265869140625, -1.4142265319824219, 9.005165100097656, 2.5401573181152344, 2.3167362213134766, 8.004444122314453, 6.203273773193359, 22.3712100982666, 9.79052734375, -0.22853851318359375, -0.41767120361328125, 15.260383605957031, -0.79779052734375, -0.8989791870117188, -3.5303955078125, 18.41339874267578, 19.425811767578125, 1.6379203796386719, -0.13482666015625, 9.02667236328125, 4.93494987487793, 8.899009704589844, -5.830474853515625, 3.960235595703125, 18.398054122924805, 4.353485107421875, 14.800765991210938, 14.447498321533203, 25.526344299316406, 1.4761276245117188, 2.5733203887939453, -4.474372863769531, 0.13708114624023438, -0.0113525390625, 5.0845184326171875, 9.07366943359375, 10.439178466796875, 10.148012161254883, 4.413795471191406, 0.30222320556640625, 4.175743103027344, 7.633308410644531, 7.735685348510742, 8.915321350097656, 7.11566162109375, 13.333488464355469, 6.762519836425781, 8.93697738647461, 8.854232788085938, 0.094451904296875, -3.1089859008789062, 17.91607666015625, 9.935169219970703], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000557.npy"}
{"epoch": 0.8420256991685563, "step": 558, "batch_size": 64, "mean": 5.552489280700684, "std": 7.099879741668701, "min": -9.82309341430664, "p10": -1.975558471679687, "median": 4.152046203613281, "p90": 15.266456985473633, "max": 24.22625732421875, "pos_frac": 0.71875, "sample": [4.023323059082031, 13.003082275390625, 14.512832641601562, -1.1332321166992188, 16.387062072753906, 1.4861602783203125, 14.756814956665039, -0.09383010864257812, 0.5922031402587891, 5.436767578125, 1.0040817260742188, 10.690574645996094, 5.3284149169921875, 4.011970520019531, -0.427337646484375, 9.494083404541016, 9.100196838378906, 2.5131607055664062, 5.818992614746094, 5.854095458984375, 15.79034423828125, 10.874969482421875, 4.28753662109375, 13.28371810913086, 6.049686431884766, -2.2176589965820312, 19.700119018554688, 2.058929443359375, 8.872459411621094, 4.285469055175781, 15.246074676513672, 1.7333602905273438, 0.989654541015625, 24.22625732421875, -1.4744300842285156, -1.5734272003173828, 3.2760257720947266, -0.039276123046875, 4.280769348144531, 21.926406860351562, 9.9530029296875, -1.5850982666015625, 15.275192260742188, -2.1428985595703125, 14.140619277954102, 18.978809356689453, -2.845020294189453, -2.2895965576171875, 11.0408935546875, 2.0675201416015625, 3.2040786743164062, -9.82309341430664, 4.9030609130859375, -3.0890655517578125, -0.34124755859375, -1.0653114318847656, 7.576625823974609, -0.8625564575195312, -4.031795501708984, 0.9949188232421875, 10.129867553710938, 2.2860889434814453, 9.006072998046875, -0.05816650390625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000558.npy"}
{"epoch": 0.8435374149659864, "step": 559, "batch_size": 64, "mean": 5.632546424865723, "std": 7.924187660217285, "min": -9.659969329833984, "p10": -3.579076385498047, "median": 4.032352447509766, "p90": 15.38737735748291, "max": 27.28228759765625, "pos_frac": 0.71875, "sample": [-1.2443084716796875, -4.6301422119140625, 2.566020965576172, -0.9764862060546875, 9.522010803222656, 27.28228759765625, 14.327659606933594, 14.4088134765625, 3.513763427734375, 1.9111900329589844, -9.55535888671875, 15.14263916015625, 0.7719039916992188, 9.051254272460938, -0.0065517425537109375, 17.117095947265625, 10.900390625, 2.5422821044921875, 11.263755798339844, 7.474517822265625, 14.604156494140625, 2.3944778442382812, -1.976531982421875, 8.055419921875, -0.77685546875, 0.8368854522705078, -3.5966567993164062, 7.87640380859375, 3.6621932983398438, 20.67577362060547, 1.1266021728515625, 4.4025115966796875, 6.6627044677734375, 10.835174560546875, 15.434234619140625, -1.7078304290771484, 8.805421829223633, -1.1671924591064453, 2.612884521484375, -5.78009033203125, 0.589263916015625, -3.538055419921875, -2.0157470703125, 4.701446533203125, 7.739463806152344, 15.278043746948242, 11.835376739501953, -3.6690292358398438, 11.815765380859375, 8.804397583007812, 8.898859024047852, 20.099105834960938, -9.659969329833984, -2.2092247009277344, 9.955863952636719, 19.839820861816406, 3.6607742309570312, 15.948516845703125, 3.501504898071289, 2.6269073486328125, 13.25131607055664, 8.656295776367188, -0.6100616455078125, -9.38010025024414], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000559.npy"}
{"epoch": 0.8450491307634165, "step": 560, "batch_size": 64, "mean": 4.887587547302246, "std": 7.152764797210693, "min": -8.143035888671875, "p10": -3.6756454467773434, "median": 3.4182310104370117, "p90": 15.893726921081543, "max": 19.732786178588867, "pos_frac": 0.703125, "sample": [4.500633239746094, 4.425506591796875, 16.926406860351562, 6.880699157714844, -8.143035888671875, -3.7337493896484375, 14.045234680175781, 4.8033294677734375, 16.96209716796875, 0.75238037109375, 13.053237915039062, -0.41602325439453125, 1.7408370971679688, 7.2332305908203125, -1.6130905151367188, -4.986785888671875, -2.8914031982421875, -1.2988300323486328, 4.69122314453125, 10.66778564453125, -5.334541320800781, 10.095794677734375, 2.9261856079101562, 0.7213382720947266, 10.6942138671875, 9.509475708007812, 1.32080078125, 12.626684188842773, 17.325347900390625, 2.8858108520507812, 6.962593078613281, -3.5337677001953125, -4.5253448486328125, 15.945144653320312, 15.773752212524414, -2.9667129516601562, 1.7184219360351562, -3.540069580078125, 19.732786178588867, -7.526336669921875, 17.496543884277344, 12.111061096191406, 5.927940368652344, 7.57261848449707, 1.9260177612304688, 6.29644775390625, -2.0300827026367188, 15.357954025268555, 2.5336380004882812, -1.3629608154296875, -0.332244873046875, 9.28106689453125, -1.15985107421875, -1.3869552612304688, 2.690216064453125, 9.070472717285156, -4.2429351806640625, 3.1725196838378906, 13.789329528808594, 2.8709716796875, 16.197433471679688, 7.163539886474609, 1.787649154663086, 3.663942337036133], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000560.npy"}
{"epoch": 0.8465608465608465, "step": 561, "batch_size": 64, "mean": 5.273043632507324, "std": 7.364389419555664, "min": -7.8154144287109375, "p10": -2.9095296859741206, "median": 3.5787620544433594, "p90": 16.657612609863282, "max": 24.506088256835938, "pos_frac": 0.75, "sample": [0.25095367431640625, 0.7857437133789062, 3.736358642578125, 2.1260929107666016, 1.326456069946289, 8.741645812988281, 2.7396774291992188, -2.246185302734375, 9.82612419128418, 4.1441802978515625, 3.3883285522460938, 6.5844573974609375, 0.4677543640136719, 11.691261291503906, 0.6748771667480469, -4.451745986938477, 0.6041412353515625, 15.599813461303711, 24.506088256835938, -1.169464111328125, 0.9312324523925781, 10.98956298828125, 8.629013061523438, -2.603740692138672, 8.168235778808594, -2.656900405883789, 6.673194885253906, -3.8922386169433594, -7.8154144287109375, 18.033111572265625, -5.337085723876953, 4.558815002441406, 11.07623291015625, 11.800193786621094, 4.091697692871094, 3.03778076171875, 3.5032730102539062, 5.2054595947265625, 1.4072036743164062, 20.685028076171875, 21.414093017578125, 16.8153076171875, 14.292129516601562, 20.744461059570312, 9.70296859741211, -3.1757888793945312, -3.0177993774414062, 7.3820037841796875, 16.289657592773438, 5.580854415893555, -1.4223175048828125, 2.298473358154297, -0.5601806640625, -0.36823081970214844, 3.6542510986328125, 8.312271118164062, 1.7234363555908203, 7.4105987548828125, 9.205268859863281, -3.7223892211914062, -0.2387371063232422, 2.4571266174316406, 18.558712005615234, -1.6725997924804688], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000561.npy"}
{"epoch": 0.8480725623582767, "step": 562, "batch_size": 64, "mean": 6.048947334289551, "std": 6.957977771759033, "min": -7.816162109375, "p10": -1.6655630111694335, "median": 6.261955261230469, "p90": 16.01244583129883, "max": 20.943859100341797, "pos_frac": 0.734375, "sample": [0.6072463989257812, 14.582391738891602, -5.561653137207031, 10.602996826171875, 7.5242767333984375, 8.084358215332031, -1.2679977416992188, 16.780197143554688, 19.323467254638672, -0.9102134704589844, 7.286827087402344, 9.5050048828125, -5.44511604309082, 2.268993377685547, 16.322052001953125, 0.6323776245117188, -0.6544876098632812, 3.373260498046875, 8.030479431152344, -0.3126678466796875, 13.505943298339844, 13.970375061035156, -0.8288726806640625, 1.0033950805664062, -2.1784286499023438, 16.090538024902344, -1.684457778930664, -2.6520843505859375, 11.339920043945312, 4.994087219238281, 11.842411041259766, -1.6214752197265625, 20.943859100341797, 9.550094604492188, 4.105316162109375, 15.830230712890625, -1.2059059143066406, -0.21773529052734375, -1.0605659484863281, 14.724647521972656, 0.92547607421875, 1.1523971557617188, 8.761030197143555, 7.1288299560546875, 8.556503295898438, 17.21868133544922, 9.272064208984375, 6.3687896728515625, 12.200241088867188, 6.155120849609375, 14.048843383789062, 8.779754638671875, -0.023801803588867188, 4.9740142822265625, 7.3819427490234375, 7.995412826538086, 10.744089126586914, 18.742385864257812, 5.745155334472656, -7.816162109375, 0.858612060546875, -3.6216659545898438, 1.0510406494140625, 3.310791015625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000562.npy"}
{"epoch": 0.8495842781557067, "step": 563, "batch_size": 64, "mean": 5.28586483001709, "std": 7.9002790451049805, "min": -11.345855712890625, "p10": -3.6859046936035154, "median": 5.429177284240723, "p90": 14.09688186645508, "max": 27.47807502746582, "pos_frac": 0.75, "sample": [-0.13576507568359375, 9.117687225341797, 6.307670593261719, 0.6444435119628906, 11.166557312011719, 12.101577758789062, -10.427244186401367, 15.164703369140625, 5.6092681884765625, -0.41594696044921875, -0.20279312133789062, 6.047523498535156, 0.4166107177734375, -3.21942138671875, 12.975406646728516, 4.055397033691406, -5.622589111328125, 11.557792663574219, -5.484180450439453, 14.50347900390625, 7.470710754394531, 20.928421020507812, -3.4158077239990234, 7.325492858886719, 11.899177551269531, 3.785655975341797, 12.440765380859375, 2.163410186767578, 5.77166748046875, -9.408744812011719, 0.5735855102539062, -3.2314224243164062, -3.4017295837402344, 2.206714630126953, 0.07751274108886719, 17.687570571899414, 3.8267173767089844, -11.157390594482422, 8.307628631591797, 4.198574066162109, 17.300670623779297, 14.310394287109375, 11.486225128173828, -3.7764663696289062, -3.4745941162109375, 4.045066833496094, 13.598686218261719, 0.21484375, 12.7982177734375, 6.9432373046875, -11.345855712890625, 4.271451950073242, 9.000701904296875, 13.25088882446289, 8.64583969116211, 5.249086380004883, 27.47807502746582, 12.655624389648438, -0.9552001953125, 12.547735214233398, 8.342948913574219, 1.9730453491210938, 9.15644645690918, 2.3695850372314453], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000563.npy"}
{"epoch": 0.8510959939531368, "step": 564, "batch_size": 64, "mean": 5.805528163909912, "std": 7.080882549285889, "min": -9.43271255493164, "p10": -2.626820373535156, "median": 5.238564491271973, "p90": 14.99775905609131, "max": 22.330726623535156, "pos_frac": 0.828125, "sample": [11.159198760986328, -2.6660003662109375, 14.075531005859375, 9.876266479492188, 13.044517517089844, 16.609012603759766, 8.935867309570312, -9.43271255493164, 0.052947998046875, -0.7083511352539062, 16.00702667236328, 14.365058898925781, 6.097377777099609, -2.535400390625, 1.795135498046875, 0.2556915283203125, -2.008909225463867, 5.15507698059082, 6.677467346191406, 0.8325576782226562, 0.5869636535644531, 0.9433727264404297, 10.579484939575195, 7.236907958984375, 12.945571899414062, 3.85479736328125, 14.609214782714844, -4.763313293457031, 4.072261810302734, 6.57867431640625, 13.788749694824219, 15.889015197753906, 20.156784057617188, 3.5513877868652344, 1.1413154602050781, -6.835487365722656, 5.143426895141602, 3.9215965270996094, 1.4887161254882812, 2.5020828247070312, 15.164278030395508, 6.827186584472656, 8.251632690429688, 2.15948486328125, -4.28373908996582, -0.02001953125, 5.9179534912109375, 5.322052001953125, 3.1117630004882812, -8.6134033203125, 10.858173370361328, 6.275857925415039, 15.1981201171875, 0.7428989410400391, 13.664825439453125, 3.5298538208007812, 12.030567169189453, 22.330726623535156, -7.0730438232421875, 14.36428451538086, 1.855621337890625, 8.15500259399414, 4.142793655395508, 6.66204833984375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000564.npy"}
{"epoch": 0.8526077097505669, "step": 565, "batch_size": 64, "mean": 4.313503265380859, "std": 6.811407089233398, "min": -11.201290130615234, "p10": -4.45505313873291, "median": 4.174156188964844, "p90": 11.904421806335451, "max": 21.290393829345703, "pos_frac": 0.71875, "sample": [1.7545890808105469, -4.098720550537109, -7.635017395019531, 9.998710632324219, -1.0132675170898438, 1.8906402587890625, 4.192138671875, 6.308479309082031, -0.19197845458984375, 2.4550323486328125, 5.5240631103515625, 7.87164306640625, 15.399124145507812, -11.201290130615234, -0.514923095703125, 7.340950012207031, 4.868900299072266, 14.104278564453125, 20.03472900390625, 19.475799560546875, 9.955482482910156, -7.740936279296875, -1.5575103759765625, -1.9536857604980469, 7.358997344970703, 6.227210998535156, 3.951152801513672, 4.1561737060546875, 11.237335205078125, 7.525970458984375, 8.870807647705078, 0.8608283996582031, -2.1829605102539062, -0.6938915252685547, 8.605688095092773, -4.607767105102539, 5.106012344360352, 7.998805999755859, -5.390413284301758, 5.150047302246094, 5.130615234375, -5.957302093505859, -0.9161834716796875, 9.595794677734375, 2.7859649658203125, 8.76620864868164, 1.3473434448242188, 7.352542877197266, 13.969806671142578, 2.1411285400390625, 11.439704895019531, -3.3608970642089844, -7.830322265625, -1.287506103515625, 4.026602745056152, 2.504119873046875, 0.4058837890625, 10.318309783935547, 2.3429794311523438, 5.4315032958984375, 21.290393829345703, 11.041435241699219, 3.98126220703125, 12.103586196899414], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000565.npy"}
{"epoch": 0.854119425547997, "step": 566, "batch_size": 64, "mean": 4.747633934020996, "std": 8.118224143981934, "min": -11.498401641845703, "p10": -5.945330047607421, "median": 4.518070220947266, "p90": 13.242103195190431, "max": 25.78680419921875, "pos_frac": 0.6875, "sample": [8.04656982421875, 3.5243301391601562, -1.7750396728515625, 4.483970642089844, 0.5420989990234375, 14.156883239746094, 17.749267578125, 6.7987518310546875, 3.942615509033203, 8.55572509765625, 5.272500991821289, 4.211334228515625, 3.482481002807617, 2.236492156982422, 12.973182678222656, 20.14678955078125, -11.447265625, 9.193603515625, 9.103492736816406, 8.502799987792969, 11.93759536743164, -1.5303497314453125, -5.194461822509766, 13.304573059082031, -0.013334274291992188, -5.150604248046875, 12.441524505615234, 12.060001373291016, 11.350616455078125, -5.277553558349609, 13.09634017944336, 12.81866455078125, 12.18886947631836, 3.43157958984375, -5.726959228515625, -11.498401641845703, -3.4375534057617188, 5.897159576416016, 11.330368041992188, -7.976318359375, 0.6278457641601562, -6.5270233154296875, 25.78680419921875, -6.038917541503906, 11.988113403320312, 2.2532119750976562, -3.8418636322021484, -6.9431610107421875, 12.230262756347656, 1.6273345947265625, -0.6001434326171875, -1.1493301391601562, 6.5107879638671875, 8.927709579467773, -1.6581039428710938, 5.1369781494140625, -6.340126037597656, 14.085487365722656, 5.210853576660156, -1.2867698669433594, 10.350238800048828, 2.7034530639648438, 4.5521697998046875, 22.490402221679688], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000566.npy"}
{"epoch": 0.8556311413454271, "step": 567, "batch_size": 64, "mean": 5.59492301940918, "std": 7.039702892303467, "min": -5.934761047363281, "p10": -2.5735544204711913, "median": 4.2507781982421875, "p90": 15.505675506591798, "max": 28.7860107421875, "pos_frac": 0.796875, "sample": [2.3157730102539062, 1.6719169616699219, -2.3211441040039062, 17.19020652770996, 4.80645751953125, 5.677421569824219, 13.92190170288086, 10.825939178466797, 18.727800369262695, 0.6347694396972656, 11.13836669921875, 6.969583511352539, 4.190914154052734, -5.479957580566406, 4.310642242431641, 2.06756591796875, -2.681730270385742, 5.808219909667969, 4.0997314453125, 6.199626922607422, 9.433135986328125, -0.24681854248046875, -0.6138362884521484, 6.116905212402344, 0.81439208984375, 0.46503257751464844, -2.7841949462890625, -3.291536331176758, 6.432579040527344, -5.934761047363281, 21.873062133789062, 5.491432189941406, 18.25384521484375, 28.7860107421875, 12.815437316894531, 0.7813301086425781, 10.557369232177734, 0.06982421875, 3.2342376708984375, 15.690353393554688, -5.09150505065918, 3.1918716430664062, 2.9254379272460938, 3.1869354248046875, 11.023178100585938, 12.189422607421875, -1.257568359375, 3.621671676635742, 13.06414794921875, 5.598106384277344, 8.912689208984375, 1.9109840393066406, 15.074760437011719, 1.318115234375, 0.9710311889648438, -1.6717910766601562, -4.097309112548828, 6.159246444702148, 16.771644592285156, -0.063507080078125, 5.812236785888672, 3.6144790649414062, 8.380779266357422, 8.512199401855469], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000567.npy"}
{"epoch": 0.8571428571428571, "step": 568, "batch_size": 64, "mean": 6.142929553985596, "std": 6.5240044593811035, "min": -5.1899566650390625, "p10": -1.2747070312499997, "median": 5.295980453491211, "p90": 15.173567199707032, "max": 26.040924072265625, "pos_frac": 0.859375, "sample": [2.3665084838867188, 0.09568977355957031, 5.362194061279297, 0.93170166015625, 5.681209564208984, 9.514236450195312, 7.39788818359375, -5.1899566650390625, 2.1412696838378906, 1.4181137084960938, -3.992288589477539, 14.434564590454102, 2.9420223236083984, 14.466339111328125, -1.5209102630615234, -2.911865234375, 0.86407470703125, 2.0792007446289062, 9.104217529296875, 7.065252304077148, 0.5110702514648438, 4.0429229736328125, 1.3973684310913086, 7.8878021240234375, 14.328056335449219, 18.055912017822266, 9.64425277709961, 15.192184448242188, -4.746883392333984, 5.229766845703125, 6.4060516357421875, 3.6025123596191406, 7.288694381713867, -1.0319671630859375, 8.576395034790039, 4.731283187866211, -1.3787384033203125, 9.399696350097656, -0.1629791259765625, 7.605785369873047, 6.957775115966797, 15.620513916015625, 1.109954833984375, 8.472480773925781, 13.086227416992188, 2.2448959350585938, 8.316680908203125, -2.3346099853515625, 4.0403289794921875, 3.2388839721679688, 5.807403564453125, 8.69256591796875, 2.56878662109375, 12.087064743041992, 17.128761291503906, 2.4933605194091797, 0.303619384765625, 15.130126953125, 17.460472106933594, 11.804473876953125, 3.356475830078125, 0.7956466674804688, 19.896018981933594, 26.040924072265625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000568.npy"}
{"epoch": 0.8586545729402872, "step": 569, "batch_size": 64, "mean": 4.768857479095459, "std": 6.8481340408325195, "min": -12.551872253417969, "p10": -2.4175773620605465, "median": 4.2123308181762695, "p90": 13.834501647949223, "max": 20.94830322265625, "pos_frac": 0.796875, "sample": [8.936912536621094, 2.4091949462890625, 2.2334842681884766, 15.383203506469727, 5.952766418457031, 16.66852569580078, -7.343048095703125, 9.250732421875, 10.322891235351562, 18.757843017578125, 9.129478454589844, 1.7029647827148438, 0.3489189147949219, 4.108255386352539, -11.483627319335938, 8.195987701416016, 9.877300262451172, 14.26446533203125, -0.06720924377441406, 8.093145370483398, 12.69295883178711, 10.89215087890625, 2.827953338623047, 16.828880310058594, 1.2627792358398438, 9.086254119873047, 8.697723388671875, 3.582132339477539, 5.0527801513671875, -7.809986114501953, -8.875221252441406, 3.4030838012695312, 0.15181350708007812, 2.380664825439453, 1.8473358154296875, 2.753021240234375, 20.94830322265625, 3.4673500061035156, 8.525428771972656, 1.0004253387451172, 6.131155014038086, -2.2663803100585938, 4.92071533203125, -1.0467662811279297, 2.579082489013672, -12.551872253417969, 5.9747314453125, -1.0897369384765625, 10.172832489013672, 4.949739456176758, 3.680624008178711, 4.31640625, 16.312299728393555, -2.4823760986328125, -3.3416919708251953, 3.4509754180908203, 5.253303527832031, 4.5803375244140625, 10.569358825683594, -0.1305999755859375, -1.4084014892578125, 5.042045593261719, 12.831253051757812, 3.3018341064453125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000569.npy"}
{"epoch": 0.8601662887377173, "step": 570, "batch_size": 64, "mean": 7.153788089752197, "std": 6.973662376403809, "min": -6.744026184082031, "p10": -1.4517692565917968, "median": 6.686017036437988, "p90": 16.973634338378908, "max": 22.62763214111328, "pos_frac": 0.859375, "sample": [13.327194213867188, -6.744026184082031, 8.622507095336914, 6.834003448486328, 9.202629089355469, 8.019828796386719, 0.732818603515625, -1.3841629028320312, 5.629890441894531, 1.4442901611328125, 1.1510658264160156, 1.0860137939453125, 1.5109195709228516, 8.171041488647461, 2.7572402954101562, 10.581829071044922, -1.0004768371582031, -1.498199462890625, 1.4601211547851562, 9.528606414794922, 7.0494537353515625, -1.8457984924316406, 1.6200103759765625, 16.434289932250977, 20.861316680908203, 0.5329265594482422, 13.922863006591797, 8.243217468261719, 13.81170654296875, 5.638954162597656, 7.946723937988281, 17.198387145996094, 10.699562072753906, -3.3059234619140625, 5.757537841796875, 7.284538269042969, 11.449699401855469, 5.777549743652344, 17.321903228759766, 5.53264045715332, 15.548736572265625, 20.789031982421875, 13.805923461914062, -2.325969696044922, 4.772930145263672, 7.1825103759765625, 15.91455078125, 11.429611206054688, 6.186187744140625, -6.237846374511719, 10.81556510925293, 0.0965576171875, 1.6732864379882812, 6.538030624389648, 22.62763214111328, 4.547782897949219, 2.795347213745117, 2.750894546508789, 10.600578308105469, 18.094879150390625, 20.47955322265625, 3.4234962463378906, 16.44921112060547, -1.480743408203125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000570.npy"}
{"epoch": 0.8616780045351474, "step": 571, "batch_size": 64, "mean": 4.104123115539551, "std": 6.773094654083252, "min": -10.215324401855469, "p10": -5.261121559143066, "median": 4.342540740966797, "p90": 10.800432205200197, "max": 20.566532135009766, "pos_frac": 0.75, "sample": [6.725372314453125, 3.4360580444335938, 4.177360534667969, 1.6824913024902344, -1.0973892211914062, 20.566532135009766, 6.5357513427734375, 5.4766845703125, 9.425016403198242, 4.37200927734375, 5.156827926635742, 9.883529663085938, -1.2463760375976562, 1.3658618927001953, -1.6244430541992188, 8.629562377929688, 3.9022979736328125, 6.477201461791992, -0.682159423828125, 11.181602478027344, 13.032001495361328, 0.4701995849609375, 10.473655700683594, 4.181282043457031, 9.885269165039062, 8.914928436279297, 8.438865661621094, 6.361570358276367, -8.127693176269531, 8.977371215820312, 0.6287498474121094, -7.867757797241211, 3.6727848052978516, -5.120054244995117, 8.719879150390625, 1.7313518524169922, -5.3215789794921875, -1.5498676300048828, -3.2706146240234375, 4.1174468994140625, 6.758613586425781, 4.8647613525390625, 5.792793273925781, 2.2672576904296875, 6.299968719482422, 16.229747772216797, 9.915924072265625, -7.685855865478516, 2.0457305908203125, 2.4957809448242188, -9.951507568359375, -10.048583984375, 9.351173400878906, 9.918373107910156, 10.940479278564453, -1.9830245971679688, 5.4710693359375, 15.889896392822266, 4.313072204589844, 9.335067749023438, -2.508516311645508, 0.8742961883544922, -10.215324401855469, 19.60110092163086], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000571.npy"}
{"epoch": 0.8631897203325775, "step": 572, "batch_size": 64, "mean": 4.735124588012695, "std": 6.5809431076049805, "min": -8.8133544921875, "p10": -3.7114654541015626, "median": 4.809930801391602, "p90": 13.680130767822268, "max": 24.038002014160156, "pos_frac": 0.78125, "sample": [0.6355819702148438, -5.804801940917969, -0.13776397705078125, 7.655498504638672, 10.992668151855469, 3.7271156311035156, 16.545307159423828, 4.5113525390625, -2.876443862915039, 24.038002014160156, 5.220062255859375, 7.16969108581543, 11.304351806640625, 11.711883544921875, -1.3257637023925781, 8.866497039794922, -2.6617889404296875, 1.9306049346923828, 1.3701229095458984, 15.535743713378906, 9.366958618164062, 6.951995849609375, 6.67742919921875, 14.73244857788086, 6.243949890136719, 4.558708190917969, 1.136709213256836, 3.1173324584960938, -4.136260986328125, -2.874664306640625, 12.199195861816406, 13.862167358398438, 15.447147369384766, -8.8133544921875, 7.528097152709961, 4.7296600341796875, 2.39874267578125, -3.941661834716797, 1.8771991729736328, 1.7192764282226562, 8.042800903320312, 5.958696365356445, 7.313812255859375, 7.8441314697265625, 5.139505386352539, 13.255378723144531, 1.0539627075195312, -3.75750732421875, 17.39483642578125, 10.158348083496094, 6.227256774902344, 7.782646179199219, 0.7022628784179688, 4.890201568603516, -6.71575927734375, -5.645965576171875, 0.30779266357421875, 5.5512542724609375, -3.604034423828125, -3.5929107666015625, 6.42230224609375, 0.13432884216308594, 4.363470077514648, 2.632162094116211], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000572.npy"}
{"epoch": 0.8647014361300076, "step": 573, "batch_size": 64, "mean": 5.335370063781738, "std": 8.706788063049316, "min": -13.905147552490234, "p10": -4.185890960693359, "median": 3.932750701904297, "p90": 17.487719345092774, "max": 25.145408630371094, "pos_frac": 0.75, "sample": [2.32305908203125, 7.403980255126953, -13.905147552490234, 7.502849578857422, -3.1777267456054688, 4.004905700683594, -4.5194091796875, 9.73675537109375, 4.771270751953125, -3.6002044677734375, -12.121500015258789, -6.035438537597656, -1.736490249633789, 2.02337646484375, 2.885711669921875, 6.074127197265625, 0.8082427978515625, 6.489082336425781, 17.48672866821289, -11.383888244628906, 16.405372619628906, 5.496391296386719, 14.847404479980469, 3.860595703125, -2.50244140625, -1.1296195983886719, 24.951583862304688, 3.613157272338867, 13.886871337890625, 2.3528289794921875, 9.31167221069336, 2.6239013671875, 4.076084136962891, 6.4461822509765625, 24.674560546875, 4.656452178955078, 0.7540779113769531, 2.9173107147216797, 3.0673484802246094, 25.145408630371094, 13.570266723632812, 8.420267105102539, 8.43808364868164, 1.0615673065185547, -2.8707427978515625, 2.787689208984375, 21.356307983398438, 2.2439117431640625, 3.75433349609375, 9.114105224609375, 19.050209045410156, 15.997360229492188, 2.177642822265625, 9.814064025878906, -0.41597747802734375, -4.3984375, 21.17059326171875, -3.6899490356445312, 8.736024856567383, 6.954936981201172, 8.683029174804688, -1.1649551391601562, -7.300220489501953, 17.488143920898438], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000573.npy"}
{"epoch": 0.8662131519274376, "step": 574, "batch_size": 64, "mean": 5.938455581665039, "std": 6.058755397796631, "min": -3.9599227905273438, "p10": -0.9900363922119141, "median": 5.065950393676758, "p90": 15.090402221679692, "max": 22.551048278808594, "pos_frac": 0.8125, "sample": [5.166934967041016, -1.2770004272460938, -0.9683990478515625, -3.110137939453125, 1.6518096923828125, 7.77168083190918, 1.4091434478759766, -1.4029541015625, 2.5203208923339844, -0.7406768798828125, -1.1801528930664062, 6.1934814453125, 4.536376953125, 1.808572769165039, -0.9929008483886719, -0.9833526611328125, -0.7744159698486328, 2.690216064453125, 5.881095886230469, 4.445911407470703, 12.680320739746094, 5.957069396972656, 4.9649658203125, 8.836257934570312, -1.7642326354980469, 22.551048278808594, 9.948280334472656, 16.950424194335938, 0.2874774932861328, 11.177276611328125, 9.159637451171875, 0.4611034393310547, 6.153285980224609, 6.297515869140625, 4.93670654296875, 8.118515014648438, -3.9599227905273438, 2.8636207580566406, 13.724998474121094, 1.9588603973388672, 5.706512451171875, 7.059364318847656, 18.472408294677734, 5.2684326171875, 10.21849250793457, 15.675575256347656, 6.510200500488281, 8.955108642578125, 4.623497009277344, 4.943485260009766, 2.4000701904296875, 19.815250396728516, 4.7151641845703125, 7.349786758422852, 10.805267333984375, -0.758392333984375, 2.4305419921875, 12.560199737548828, 13.346481323242188, 17.090553283691406, 1.1506423950195312, 10.487930297851562, 16.36212158203125, 0.9236831665039062], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000574.npy"}
{"epoch": 0.8677248677248677, "step": 575, "batch_size": 64, "mean": 4.131783962249756, "std": 7.541484832763672, "min": -12.716045379638672, "p10": -4.248023605346679, "median": 3.083531379699707, "p90": 14.77573509216309, "max": 23.404266357421875, "pos_frac": 0.703125, "sample": [2.4400711059570312, -6.121978759765625, -10.123531341552734, -1.5841865539550781, 10.070953369140625, 2.186614990234375, 1.0222702026367188, 6.607856750488281, -1.08392333984375, -3.494354248046875, 9.405120849609375, -1.0852127075195312, 18.5484619140625, -4.410755157470703, -5.667976379394531, 9.889419555664062, -7.728904724121094, 6.71826171875, 6.243564605712891, 2.6383705139160156, 6.927482604980469, 17.127227783203125, 4.571281433105469, 5.477104187011719, 0.8248825073242188, 1.470733642578125, -3.868316650390625, 6.49671745300293, 13.7901611328125, 0.8446578979492188, 0.136871337890625, 1.5798721313476562, 15.198123931884766, 16.517410278320312, 12.54599380493164, -7.732995986938477, 21.113990783691406, -3.4225616455078125, 17.112632751464844, 12.25037956237793, -1.013336181640625, 7.115745544433594, 8.766006469726562, 2.089508056640625, 2.3916778564453125, 2.4071578979492188, 1.3565959930419922, -12.716045379638672, -0.47259521484375, 23.404266357421875, 10.657867431640625, 4.106475830078125, 5.6079559326171875, 5.890600204467773, -2.9476585388183594, 6.1688995361328125, 3.5286922454833984, -3.3728599548339844, 5.073129653930664, 9.106803894042969, -2.6636734008789062, 11.912521362304688, 6.009605407714844, -1.4049568176269531], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000575.npy"}
{"epoch": 0.8692365835222978, "step": 576, "batch_size": 64, "mean": 5.235239028930664, "std": 7.145557880401611, "min": -8.195877075195312, "p10": -3.1844457626342773, "median": 4.340606689453125, "p90": 15.214325714111329, "max": 19.929962158203125, "pos_frac": 0.703125, "sample": [15.34783935546875, 14.548482894897461, -1.1619606018066406, 4.908740997314453, 0.1209716796875, 8.577873229980469, 17.23434066772461, 3.642549514770508, 7.689888000488281, 3.77191162109375, -3.2791576385498047, -4.304990768432617, 10.799381256103516, -7.299591064453125, 5.875923156738281, -5.159660339355469, -1.8173904418945312, 10.463560104370117, 19.929962158203125, 0.5535888671875, 9.307687759399414, 18.160280227661133, 7.091091156005859, -0.41121864318847656, 10.293235778808594, 7.58192253112793, 4.927707672119141, -3.640178680419922, 15.551612854003906, -2.2236270904541016, 1.1297988891601562, -0.28498077392578125, 1.337860107421875, 14.075902938842773, 4.3954010009765625, 2.238433837890625, 17.538681030273438, -1.0401840209960938, 0.9608535766601562, 0.955596923828125, 9.477535247802734, -0.9827842712402344, 14.902793884277344, -2.4534549713134766, 10.999542236328125, 3.3757591247558594, 18.64578628540039, 12.4537353515625, -1.1151351928710938, 7.015735626220703, 10.42718505859375, -8.195877075195312, 4.2858123779296875, -4.0950469970703125, 10.871086120605469, 5.9626922607421875, -2.963451385498047, -2.5093936920166016, 12.53656005859375, 12.09866714477539, 2.1347484588623047, 1.2625370025634766, -0.7323379516601562, 13.264450073242188], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000576.npy"}
{"epoch": 0.8707482993197279, "step": 577, "batch_size": 64, "mean": 4.056430816650391, "std": 7.237849712371826, "min": -12.185928344726562, "p10": -4.7545885086059565, "median": 2.9533119201660156, "p90": 13.174563980102544, "max": 24.14252471923828, "pos_frac": 0.765625, "sample": [8.740461349487305, 2.3197250366210938, -4.067375183105469, 5.2412261962890625, -0.4990959167480469, 5.16522216796875, 0.9494209289550781, -1.5806922912597656, 11.6727294921875, 0.5473403930664062, -2.0742416381835938, 6.3903350830078125, 9.339599609375, -0.52886962890625, -3.691722869873047, 15.050819396972656, 9.233154296875, 16.285701751708984, -12.185928344726562, 8.07499885559082, 11.85379409790039, 5.476905822753906, 2.102733612060547, 17.630996704101562, 0.6865692138671875, -8.097640991210938, -7.3295440673828125, 9.32073974609375, 6.892923355102539, 1.1081695556640625, 1.6240005493164062, 4.366598129272461, 0.9285354614257812, 2.1832427978515625, 2.386932373046875, 0.12432479858398438, 2.601032257080078, 3.86151123046875, 0.6242752075195312, -1.7268753051757812, 4.97503662109375, 7.09942626953125, 15.871719360351562, 1.271942138671875, 20.716211318969727, 13.740608215332031, -9.903778076171875, 7.308540344238281, 2.71929931640625, 11.107002258300781, -5.948234558105469, 6.4161224365234375, 24.14252471923828, 11.83785629272461, 3.4708404541015625, 1.5258598327636719, -9.227005004882812, -5.049108505249023, -2.3848419189453125, 10.447052001953125, 3.1873245239257812, 7.491056442260742, 6.344940185546875, 1.4491500854492188], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000577.npy"}
{"epoch": 0.872260015117158, "step": 578, "batch_size": 64, "mean": 3.685648202896118, "std": 6.9516682624816895, "min": -10.118789672851562, "p10": -3.9928955078125, "median": 2.9202003479003906, "p90": 14.593817329406738, "max": 23.043243408203125, "pos_frac": 0.578125, "sample": [3.2541847229003906, -5.220090866088867, -0.1172637939453125, -0.4497108459472656, 3.176788330078125, 0.09168815612792969, 4.056194305419922, 5.1348419189453125, 6.19921875, -0.3219013214111328, -0.2046051025390625, 23.043243408203125, -1.9172744750976562, 6.107303619384766, 12.155193328857422, -9.252120971679688, 16.73779296875, 6.858715057373047, 6.514163970947266, 12.339996337890625, 14.623176574707031, 4.2994842529296875, 1.1840457916259766, 4.581258773803711, -1.0047416687011719, 6.163225173950195, 9.999542236328125, -0.3161773681640625, 1.393911361694336, -0.19836807250976562, 16.928497314453125, 7.242698669433594, 20.149173736572266, 6.537353515625, -0.199249267578125, 3.547576904296875, 4.146661758422852, 6.917823791503906, 3.2947158813476562, 0.8000431060791016, 12.152236938476562, -10.118789672851562, 3.5845413208007812, -1.5952987670898438, -2.8618621826171875, -4.064430236816406, -2.2023239135742188, 14.525312423706055, -1.3043498992919922, 16.1335506439209, 6.595726013183594, -4.704071044921875, 14.944595336914062, -0.01612091064453125, 2.6636123657226562, -4.457374572753906, -0.1665496826171875, -0.8688278198242188, -1.801727294921875, -0.021697998046875, -3.8259811401367188, -0.8346366882324219, -4.570045471191406, 10.418975830078125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000578.npy"}
{"epoch": 0.873771730914588, "step": 579, "batch_size": 64, "mean": 6.828756332397461, "std": 7.165412902832031, "min": -8.744705200195312, "p10": -1.119995880126953, "median": 6.706972122192383, "p90": 18.390143203735356, "max": 24.85980987548828, "pos_frac": 0.828125, "sample": [8.862777709960938, 6.94940185546875, 5.031530380249023, 2.0726051330566406, 3.32220458984375, -3.276836395263672, 8.434417724609375, 3.7318878173828125, 7.5739898681640625, 20.297775268554688, 16.15245819091797, 9.805578231811523, 7.387851715087891, 19.445377349853516, 7.720085144042969, 5.221778869628906, 17.16509246826172, -0.2698974609375, -0.8993339538574219, 9.421249389648438, 24.85980987548828, -3.223358154296875, 22.23196792602539, 0.08658027648925781, 4.276802062988281, 3.3355140686035156, 20.64777183532715, 10.483543395996094, 15.733892440795898, 8.35268783569336, -1.2009353637695312, 5.536737442016602, 14.330039978027344, 5.482723236083984, 9.431694030761719, 7.771690368652344, 9.539863586425781, 7.372785568237305, -0.03460693359375, 18.915164947509766, 0.30804443359375, 7.231304168701172, 0.6275711059570312, -4.847618103027344, 6.464542388916016, 2.808076858520508, -2.3509674072265625, -8.744705200195312, 19.766319274902344, 0.9502830505371094, 7.332183837890625, 2.5170135498046875, -0.9311370849609375, 12.366964340209961, 9.181587219238281, 3.632253646850586, 13.008903503417969, 10.220108032226562, 2.954923629760742, 3.477825164794922, -1.9691009521484375, 3.2094383239746094, 11.465255737304688, 0.28096771240234375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000579.npy"}
{"epoch": 0.8752834467120182, "step": 580, "batch_size": 64, "mean": 4.6459503173828125, "std": 7.572780132293701, "min": -12.505365371704102, "p10": -4.7120521545410154, "median": 3.880000114440918, "p90": 16.396609497070315, "max": 20.925819396972656, "pos_frac": 0.6875, "sample": [-6.1587371826171875, 0.247222900390625, -3.701171875, 2.400167465209961, -2.66131591796875, 8.442176818847656, 19.92643928527832, 7.7095489501953125, 4.153556823730469, -0.2865791320800781, 8.558944702148438, 4.630760192871094, 2.9967498779296875, 2.3147850036621094, -5.195960998535156, 4.076665878295898, 10.449302673339844, 6.9311981201171875, -2.601114273071289, 6.079185485839844, -1.2746009826660156, 8.243398666381836, 17.098602294921875, 0.4501304626464844, 11.607954025268555, -0.838592529296875, 15.555618286132812, -0.20676422119140625, -0.03415489196777344, 17.70672607421875, 9.073921203613281, 10.717048645019531, 3.6833343505859375, 7.077348709106445, -4.7786407470703125, 20.925819396972656, -8.202682495117188, -12.505365371704102, 0.8049125671386719, 18.4727783203125, -8.630910873413086, 10.840667724609375, -4.556678771972656, 4.94818115234375, 16.757034301757812, -0.9562206268310547, 2.4951419830322266, 1.0751609802246094, 9.955863952636719, 3.0555801391601562, 14.181018829345703, 13.706754684448242, -0.5759429931640625, -8.02679443359375, -0.5400047302246094, -2.384584426879883, 7.0756072998046875, 10.744691848754883, 7.8895416259765625, 16.813201904296875, 2.916473388671875, 6.20068359375, 3.1943206787109375, 9.273429870605469], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000580.npy"}
{"epoch": 0.8767951625094482, "step": 581, "batch_size": 64, "mean": 4.5316572189331055, "std": 7.0951995849609375, "min": -9.768686294555664, "p10": -3.265161132812499, "median": 3.5885772705078125, "p90": 11.528143692016602, "max": 24.362388610839844, "pos_frac": 0.765625, "sample": [11.122745513916016, 10.312370300292969, 1.0644073486328125, 5.475536346435547, 2.7642822265625, 2.284210205078125, 2.2443695068359375, -5.207359313964844, 7.1935272216796875, 7.438026428222656, -9.451690673828125, 1.5939064025878906, 0.8283329010009766, 5.018922805786133, 10.071090698242188, -4.388444900512695, 8.80035400390625, 2.5169105529785156, 7.1104736328125, 0.7399997711181641, 18.281877517700195, 2.1652984619140625, 18.442012786865234, -2.30157470703125, 11.513961791992188, 23.63031768798828, 10.11978530883789, 2.1456298828125, 9.538166046142578, 8.730308532714844, -2.1959228515625, 9.866607666015625, 1.4662971496582031, 9.104835510253906, 5.523674011230469, 4.412872314453125, 2.4240493774414062, -3.588287353515625, 24.362388610839844, 15.383514404296875, 1.775054931640625, -8.174240112304688, 6.05718994140625, -2.2298126220703125, 6.357303619384766, -9.768686294555664, 15.00439453125, 1.1930389404296875, -2.511199951171875, -6.128749847412109, 9.064453125, -1.8608016967773438, 5.58270263671875, 11.534221649169922, 0.8140754699707031, 7.3494110107421875, 0.4095954895019531, -1.898172378540039, -1.210540771484375, 6.3353118896484375, 0.34204864501953125, 8.770610809326172, 8.176420211791992, -1.5153732299804688], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000581.npy"}
{"epoch": 0.8783068783068783, "step": 582, "batch_size": 64, "mean": 5.11611270904541, "std": 6.441960334777832, "min": -7.63017463684082, "p10": -3.3005249023437497, "median": 4.77894401550293, "p90": 12.573267555236816, "max": 22.206817626953125, "pos_frac": 0.8125, "sample": [10.332481384277344, 21.281883239746094, 4.625946044921875, 5.30572509765625, 2.558807373046875, 0.4215812683105469, 8.888519287109375, 4.691654205322266, 8.719032287597656, 12.916175842285156, 4.866233825683594, 5.453742980957031, 6.4487457275390625, 22.206817626953125, 0.502960205078125, 13.631494522094727, 3.8104705810546875, 0.6370391845703125, 5.552276611328125, 10.018280029296875, 2.4026737213134766, 6.847267150878906, 6.468732833862305, 13.596565246582031, 10.395065307617188, 8.211746215820312, 5.330223083496094, 3.931964874267578, -6.25117301940918, -7.63017463684082, 0.08810615539550781, -0.60882568359375, 11.149978637695312, -3.341522216796875, -0.547698974609375, 11.81273078918457, 18.544464111328125, 9.348638534545898, 11.04864501953125, 1.825469970703125, 12.606048583984375, 9.613616943359375, 3.5222721099853516, 4.543428421020508, -1.919656753540039, 0.1742706298828125, -6.098854064941406, 3.6111373901367188, 5.163337707519531, -6.558078765869141, 2.011262893676758, 1.8240966796875, -0.35194969177246094, 0.5469264984130859, -3.542745590209961, 5.540763854980469, 10.930837631225586, -5.258110046386719, 12.49677848815918, -3.204864501953125, 2.933013916015625, 1.7272567749023438, 9.327392578125, 12.300296783447266], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000582.npy"}
{"epoch": 0.8798185941043084, "step": 583, "batch_size": 64, "mean": 4.781617164611816, "std": 7.713698387145996, "min": -14.627227783203125, "p10": -3.5540584564208983, "median": 4.989505767822266, "p90": 14.283691787719729, "max": 22.437152862548828, "pos_frac": 0.734375, "sample": [7.125926971435547, 2.6205406188964844, -2.4394912719726562, -3.321308135986328, 22.264209747314453, 4.512168884277344, -2.385833740234375, -3.0895767211914062, -6.7610931396484375, 0.25757598876953125, 6.7592315673828125, -6.784305572509766, 0.08057022094726562, 8.931465148925781, 5.028480529785156, -3.2389984130859375, 7.817386627197266, -1.5671024322509766, 22.437152862548828, 17.169647216796875, 8.060224533081055, -11.255851745605469, 4.950531005859375, -0.8210220336914062, 6.0069732666015625, 6.826505661010742, 9.389129638671875, -7.803886413574219, 4.130525588989258, 10.847190856933594, 7.200099945068359, 3.132753372192383, 7.0811767578125, -0.04746246337890625, 7.680755615234375, 14.462554931640625, -4.870887756347656, -3.65380859375, 6.774253845214844, 20.379844665527344, 3.6489696502685547, 16.2943115234375, -14.627227783203125, 13.33880615234375, 2.5524730682373047, 10.699432373046875, 9.476919174194336, 0.4571876525878906, 5.3364410400390625, -2.022052764892578, 9.863800048828125, -0.319183349609375, 10.950922012329102, 2.346738815307617, 6.4546661376953125, 3.6942596435546875, 1.0475788116455078, 6.855068206787109, 0.45606231689453125, 13.866344451904297, 21.77794647216797, 12.048820495605469, 5.5623931884765625, 2.3765811920166016], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000583.npy"}
{"epoch": 0.8813303099017384, "step": 584, "batch_size": 64, "mean": 5.151208877563477, "std": 8.04017162322998, "min": -11.317726135253906, "p10": -4.036684417724609, "median": 5.227621078491211, "p90": 14.42826442718506, "max": 27.602981567382812, "pos_frac": 0.6875, "sample": [18.701705932617188, 5.44317626953125, 9.881584167480469, -1.6435279846191406, 5.429420471191406, -0.8656177520751953, 19.8502197265625, 8.503746032714844, 8.573921203613281, 9.497512817382812, 5.919612884521484, -2.1815452575683594, -2.732942581176758, -2.3675060272216797, -2.1414642333984375, 11.855140686035156, 11.051450729370117, 14.142160415649414, 13.894094467163086, -4.368167877197266, 14.932540893554688, 11.852951049804688, -2.0603160858154297, 14.550880432128906, 11.664459228515625, 7.729759216308594, 0.9288616180419922, -0.6694717407226562, 3.330169677734375, 4.282371520996094, 11.986434936523438, 2.1894378662109375, 7.553455352783203, 3.6552810668945312, -2.867307662963867, 12.793731689453125, -4.282615661621094, -3.2518081665039062, 1.2618789672851562, 0.2463550567626953, 6.554691314697266, 16.078651428222656, -7.0361175537109375, 6.453378677368164, 12.0206298828125, 9.867431640625, -0.18465805053710938, 6.651298522949219, 3.043975830078125, -6.070732116699219, -11.317726135253906, 5.025821685791016, -3.4628448486328125, 10.312568664550781, 2.108736038208008, 27.602981567382812, 25.21269989013672, -0.14270401000976562, 9.680841445922852, 2.3712921142578125, 2.0184783935546875, 8.500005722045898, -9.848777770996094, -8.032583236694336], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000584.npy"}
{"epoch": 0.8828420256991686, "step": 585, "batch_size": 64, "mean": 4.290743827819824, "std": 6.719827651977539, "min": -16.907447814941406, "p10": -2.3644485473632812, "median": 3.9751148223876953, "p90": 13.172695541381836, "max": 20.45677375793457, "pos_frac": 0.796875, "sample": [3.9254112243652344, 9.610855102539062, 3.3788375854492188, 6.909996032714844, 8.047367095947266, 0.5898284912109375, 2.9647865295410156, 14.646247863769531, 2.658863067626953, 1.462921142578125, -13.891315460205078, 0.8639945983886719, 16.033355712890625, -0.48012542724609375, 0.5906505584716797, 8.031097412109375, -6.294549942016602, 2.0596084594726562, 8.349319458007812, 6.356842041015625, 13.211315155029297, 1.4681148529052734, 0.6300525665283203, -7.065925598144531, 4.27879524230957, 5.059175491333008, 0.3889312744140625, -2.2392120361328125, 6.28814697265625, 3.6856613159179688, -0.5295372009277344, 1.5710601806640625, 11.35391616821289, -0.2514152526855469, -4.7138671875, 4.577415466308594, 1.6349105834960938, 10.167377471923828, 13.13033676147461, 9.182014465332031, 5.018634796142578, 7.954357147216797, 2.324493408203125, -5.810819625854492, -16.907447814941406, 16.802631378173828, 15.647640228271484, 11.525920867919922, -2.418121337890625, -1.1078414916992188, 4.741239547729492, 6.7997589111328125, 9.745086669921875, 4.614315032958984, 2.783966064453125, -0.6486282348632812, 3.4910888671875, 13.190849304199219, 20.45677375793457, 7.383644104003906, 2.475963592529297, 4.024818420410156, 6.7126922607421875, 8.165351867675781], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000585.npy"}
{"epoch": 0.8843537414965986, "step": 586, "batch_size": 64, "mean": 5.529616832733154, "std": 6.264185428619385, "min": -9.821615219116211, "p10": -1.1790035247802733, "median": 5.074286460876465, "p90": 14.696556091308597, "max": 21.832199096679688, "pos_frac": 0.84375, "sample": [16.662628173828125, 3.7804718017578125, 8.24631118774414, 5.594268798828125, 2.83782958984375, 15.50067138671875, 6.316587448120117, 0.8861560821533203, 5.276023864746094, 4.872549057006836, 9.357213973999023, -3.4761123657226562, -2.242706298828125, 15.319683074951172, 15.236774444580078, -0.8628101348876953, 4.513212203979492, 10.073722839355469, 3.3556766510009766, 7.655866622924805, 0.7042083740234375, 11.370513916015625, 5.856067657470703, 11.149200439453125, 2.21844482421875, 0.04386711120605469, 0.5263519287109375, 1.896087646484375, 9.721187591552734, 7.532032012939453, 12.140884399414062, 9.028728485107422, -1.2889938354492188, 10.359310150146484, 4.7842254638671875, 2.4453964233398438, -9.821615219116211, 21.832199096679688, -0.9223594665527344, 1.8586139678955078, 1.9213695526123047, 4.3475189208984375, 1.8458709716796875, 8.409461975097656, 6.2659149169921875, -0.16003036499023438, -1.672994613647461, 6.9658203125, 0.7480316162109375, 7.078025817871094, 14.021240234375, 3.1816253662109375, 8.876846313476562, 1.6274967193603516, 13.107738494873047, 3.9413223266601562, -7.769462585449219, 18.011398315429688, -8.625701904296875, 14.985977172851562, 8.62728500366211, 6.693660736083984, 4.7113189697265625, 6.4173736572265625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000586.npy"}
{"epoch": 0.8858654572940288, "step": 587, "batch_size": 64, "mean": 5.087654113769531, "std": 8.037989616394043, "min": -20.205169677734375, "p10": -3.6639202117919916, "median": 5.482292175292969, "p90": 16.2357234954834, "max": 22.083995819091797, "pos_frac": 0.75, "sample": [1.2863578796386719, -13.6881103515625, 3.587024688720703, 5.899450302124023, 7.391807556152344, -4.958564758300781, 3.026641845703125, -3.9655303955078125, 9.000785827636719, 7.997039794921875, 2.0977020263671875, 6.206279754638672, -2.960163116455078, -1.6806716918945312, 2.396026611328125, 7.657033920288086, 10.773885726928711, -0.40789222717285156, 2.6964874267578125, 21.29721450805664, 11.176322937011719, 6.261749267578125, -1.0501022338867188, 17.723846435546875, 5.597496032714844, 6.549541473388672, 22.083995819091797, -0.13153457641601562, -20.205169677734375, -1.0682373046875, 12.662544250488281, 4.466243743896484, 11.888763427734375, 18.48765754699707, 3.3751220703125, 3.7255020141601562, 8.248214721679688, -12.478073120117188, -0.6818695068359375, 12.356536865234375, 16.02593231201172, -5.939342498779297, 9.993263244628906, 7.00676155090332, -0.3235282897949219, 8.270309448242188, 5.371711730957031, 9.899463653564453, 10.761373519897461, 2.9623451232910156, 16.325634002685547, -7.026294708251953, 9.40018081665039, 1.1890792846679688, 3.921772003173828, 16.328895568847656, 0.9728927612304688, -0.21196746826171875, 1.3559379577636719, 5.855964660644531, 5.592872619628906, 18.824234008789062, 14.818859100341797, 1.5921554565429688], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000587.npy"}
{"epoch": 0.8873771730914588, "step": 588, "batch_size": 64, "mean": 5.0742058753967285, "std": 6.926985740661621, "min": -10.858917236328125, "p10": -3.4670158386230465, "median": 5.036930084228516, "p90": 12.79710350036621, "max": 22.40380859375, "pos_frac": 0.78125, "sample": [5.941734313964844, -5.565483093261719, -2.611907958984375, 6.0082550048828125, -2.9591751098632812, 7.6707916259765625, 15.486503601074219, 8.171653747558594, 8.491081237792969, 11.607330322265625, 22.40380859375, 4.358415603637695, 4.887790679931641, 4.689947128295898, 4.262260437011719, 7.562723159790039, 21.327884674072266, 22.321243286132812, 5.42896842956543, 14.708999633789062, 4.783576965332031, 6.276968002319336, -5.25880241394043, 15.288219451904297, 6.204334259033203, 11.404563903808594, 4.071891784667969, -1.6226425170898438, -1.3724994659423828, -0.1922607421875, 1.033761978149414, 8.744729995727539, -6.102893829345703, 1.27203369140625, 8.749713897705078, 12.089706420898438, -1.1117496490478516, 5.5349578857421875, 4.311738967895508, 0.8968048095703125, -6.448272705078125, -3.684661865234375, 6.277767181396484, 12.701862335205078, 7.188571929931641, 5.0400543212890625, 10.61625862121582, 0.9625568389892578, 1.37127685546875, 11.185735702514648, 3.9296722412109375, 5.120697021484375, 1.2529335021972656, 5.033805847167969, 0.8067703247070312, 3.68597412109375, -1.5485992431640625, 7.710046768188477, 2.9482879638671875, -10.858917236328125, 8.164152145385742, 12.837921142578125, 10.792041778564453, -9.531723022460938], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000588.npy"}
{"epoch": 0.8888888888888888, "step": 589, "batch_size": 64, "mean": 6.2913055419921875, "std": 7.1040802001953125, "min": -11.572174072265625, "p10": -0.18621826171874975, "median": 4.963102340698242, "p90": 16.323761940002445, "max": 25.675010681152344, "pos_frac": 0.890625, "sample": [0.9319629669189453, 11.095672607421875, -5.005073547363281, 16.526634216308594, 4.852001190185547, 2.908367156982422, 1.4368820190429688, 2.7607688903808594, 1.26934814453125, 0.597686767578125, 1.3392715454101562, 0.8255157470703125, 20.996505737304688, 4.918605804443359, 13.9647216796875, 6.043399810791016, 3.9448165893554688, 11.50701904296875, 19.42251968383789, 14.724800109863281, 1.6454696655273438, 10.015731811523438, 9.650331497192383, 5.201324462890625, 21.08306884765625, 2.9092636108398438, -2.90692138671875, 8.302322387695312, 13.388893127441406, 5.617471694946289, 5.825347900390625, -0.29095458984375, 14.230154037475586, 5.71343994140625, -0.43758392333984375, 5.723308563232422, 4.902130126953125, 8.332389831542969, 4.874664306640625, 18.181671142578125, 8.136138916015625, 5.3150634765625, 12.254592895507812, 4.8937835693359375, 6.698322296142578, 5.007598876953125, 1.1633453369140625, -4.181468963623047, 0.32867431640625, 3.1994781494140625, -11.572174072265625, 7.293983459472656, 22.29199981689453, -5.612571716308594, 2.7843246459960938, 8.596540451049805, 15.850393295288086, 3.7591400146484375, 3.2837905883789062, 2.1405105590820312, 0.05816650390625, 2.900737762451172, 25.675010681152344, 5.355226516723633], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000589.npy"}
{"epoch": 0.890400604686319, "step": 590, "batch_size": 64, "mean": 6.443607807159424, "std": 7.5259199142456055, "min": -10.974746704101562, "p10": -2.5843917846679685, "median": 6.122842788696289, "p90": 17.145849990844727, "max": 21.156389236450195, "pos_frac": 0.8125, "sample": [13.613670349121094, 4.0464324951171875, 8.302108764648438, 0.26909637451171875, 10.846542358398438, 18.46484375, 4.817596435546875, -10.974746704101562, 1.0000648498535156, 1.5570220947265625, 4.9590606689453125, 1.1608524322509766, -0.17775726318359375, 1.6675338745117188, -7.5000457763671875, 4.788360595703125, 9.17789077758789, 17.264251708984375, -8.102840423583984, -2.6358489990234375, -1.1225051879882812, 2.10980224609375, 1.1638870239257812, 14.971672058105469, 19.25815200805664, 3.7129592895507812, 1.9478836059570312, 8.8472900390625, 17.384033203125, 10.015243530273438, -0.6136264801025391, 15.1497802734375, 3.0259037017822266, 12.80224609375, 7.0536041259765625, 10.675628662109375, 9.614477157592773, 6.0609283447265625, 0.44098663330078125, 7.321285247802734, 16.869579315185547, 7.053400039672852, 18.6456298828125, 10.9678955078125, 16.234130859375, 5.214363098144531, 12.441650390625, 8.089797973632812, -3.6832504272460938, 7.993858337402344, -3.9022598266601562, 14.476760864257812, 12.209400177001953, 0.5075225830078125, 2.3214874267578125, 16.13848114013672, 6.184757232666016, 3.938629150390625, 18.439067840576172, -1.608510971069336, 8.758914947509766, -2.464324951171875, 21.156389236450195, -5.956180572509766], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000590.npy"}
{"epoch": 0.891912320483749, "step": 591, "batch_size": 64, "mean": 5.801677703857422, "std": 7.835216522216797, "min": -9.249984741210938, "p10": -4.0993501663208, "median": 6.058187484741211, "p90": 15.654833984375001, "max": 23.771224975585938, "pos_frac": 0.765625, "sample": [0.10061836242675781, 18.720714569091797, 9.296411514282227, 12.218515396118164, 6.463348388671875, 4.52850341796875, 12.98874282836914, 5.054328918457031, 5.868404388427734, 10.588260650634766, 5.7195892333984375, -3.12774658203125, 7.873420715332031, -6.2814483642578125, 6.311405181884766, -1.7120819091796875, 6.830316543579102, 2.022430419921875, 6.633056640625, 1.7332649230957031, 9.063323974609375, 4.954811096191406, 18.977294921875, -5.140571594238281, 14.21539306640625, -9.249984741210938, 9.374526977539062, 2.9608840942382812, 3.8361167907714844, 6.2479705810546875, -6.352836608886719, -4.258459091186523, 15.843414306640625, 23.771224975585938, 1.4379653930664062, 13.257064819335938, -8.142414093017578, 4.26300048828125, 14.820419311523438, 0.5848808288574219, -2.1287460327148438, -1.2809200286865234, 4.4239044189453125, 11.1114501953125, -1.706268310546875, 20.156021118164062, -3.218637466430664, -7.6909942626953125, -3.7280960083007812, 15.214813232421875, -2.5377197265625, 0.11281967163085938, 20.75714111328125, 7.243328094482422, 1.5955696105957031, 12.474075317382812, 14.930961608886719, 7.425016403198242, 6.4785003662109375, 8.790199279785156, 18.72216033935547, 7.345611572265625, 0.44716453552246094, 14.075920104980469], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000591.npy"}
{"epoch": 0.8934240362811792, "step": 592, "batch_size": 64, "mean": 5.218626022338867, "std": 7.31174373626709, "min": -6.153141021728516, "p10": -4.175887680053711, "median": 3.189549446105957, "p90": 17.52765502929688, "max": 24.308258056640625, "pos_frac": 0.796875, "sample": [6.079963684082031, 18.436309814453125, 14.53432846069336, 5.099920272827148, 8.44412612915039, 19.73601531982422, 12.826766967773438, -0.21294021606445312, 2.1295394897460938, 4.858314514160156, 1.0070419311523438, 7.202850341796875, 2.7649688720703125, 0.02156829833984375, 8.375518798828125, 9.545074462890625, 20.75516128540039, 5.646823883056641, 0.8925094604492188, 1.5740966796875, -4.516563415527344, 12.15926742553711, 3.226133346557617, 16.04241943359375, 1.4443168640136719, 8.389198303222656, 2.3590774536132812, 18.998672485351562, 12.047340393066406, 5.1208648681640625, -4.711883544921875, 6.556098937988281, 1.459014892578125, 5.0831451416015625, 5.261238098144531, 2.749889373779297, -4.531494140625, 12.220115661621094, -6.153141021728516, -1.2103099822998047, 0.39800453186035156, 11.83769416809082, -0.7980194091796875, -3.745769500732422, -4.4983062744140625, 7.773284912109375, -3.7773284912109375, 18.1641845703125, 21.468971252441406, -5.413612365722656, 24.308258056640625, -0.4959850311279297, 0.7918834686279297, 1.0571174621582031, 2.7654037475585938, 1.525604248046875, 1.57598876953125, 4.0332794189453125, -4.346698760986328, 3.4283714294433594, 3.152965545654297, 0.9505386352539062, 3.0168914794921875, 9.107955932617188], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000592.npy"}
{"epoch": 0.8949357520786092, "step": 593, "batch_size": 64, "mean": 4.855713844299316, "std": 8.12277889251709, "min": -14.476646423339844, "p10": -4.9523067474365225, "median": 4.147899627685547, "p90": 16.45102119445801, "max": 22.41649627685547, "pos_frac": 0.734375, "sample": [9.619888305664062, 1.7776031494140625, 11.355987548828125, -4.469024658203125, 15.801753997802734, -2.5876083374023438, 2.8133544921875, -6.54705810546875, 18.771793365478516, 2.591766357421875, 17.145362854003906, 1.601633071899414, 15.427993774414062, 12.903217315673828, 3.7261810302734375, -5.159427642822266, 4.569618225097656, 7.687652587890625, 15.397491455078125, 5.962652206420898, 3.380949020385742, 7.919300079345703, 17.184120178222656, 12.400943756103516, 7.856239318847656, 5.857429504394531, -1.812326431274414, 9.516683578491211, 3.6894149780273438, -11.1688232421875, 11.347434997558594, -8.304489135742188, -1.4335098266601562, 9.282196044921875, 6.663419723510742, 8.317623138427734, -0.7427253723144531, -4.121303558349609, 11.516227722167969, -2.4543304443359375, -7.816398620605469, -2.654935836791992, 16.729278564453125, -11.694679260253906, 19.383460998535156, 10.074905395507812, 18.417068481445312, 3.4818592071533203, -3.2780227661132812, 22.41649627685547, 2.54913330078125, 0.1398143768310547, 6.033416748046875, 5.297821044921875, 5.853851318359375, -14.476646423339844, 1.1542797088623047, 9.440109252929688, 1.1474933624267578, -1.1414260864257812, 2.4294891357421875, 3.0976104736328125, 3.1980819702148438, 7.698333740234375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000593.npy"}
{"epoch": 0.8964474678760394, "step": 594, "batch_size": 64, "mean": 4.2019805908203125, "std": 6.791548252105713, "min": -12.453826904296875, "p10": -3.3929176330566406, "median": 3.835559844970703, "p90": 14.31217918395997, "max": 21.921531677246094, "pos_frac": 0.71875, "sample": [2.069276809692383, 9.523666381835938, -3.4807281494140625, 8.982048034667969, 9.339366912841797, 11.619316101074219, 0.417144775390625, 1.6863327026367188, 7.7040863037109375, -4.351341247558594, -1.290252685546875, 0.7273292541503906, -3.1880264282226562, 3.8078765869140625, 3.634347915649414, -0.6524829864501953, -2.34002685546875, -0.290313720703125, 10.748779296875, 0.7065582275390625, 8.630889892578125, 3.867807388305664, 1.2511787414550781, 5.810516357421875, 18.731706619262695, 5.121551513671875, 4.062984466552734, -4.50933837890625, 9.02164077758789, -1.5382328033447266, 21.921531677246094, 4.158905029296875, 17.63189697265625, 15.118022918701172, 1.6909408569335938, -2.610525131225586, -5.091346740722656, 7.04779052734375, 12.431877136230469, -7.369682312011719, 7.06390380859375, 0.9611225128173828, 10.153650283813477, 15.882186889648438, -1.7835884094238281, 4.966253280639648, 1.9244804382324219, 9.786689758300781, -2.2766170501708984, 8.481277465820312, 7.23486328125, -12.453826904296875, 15.285888671875, 16.08403778076172, 5.96209716796875, 5.552288055419922, 0.611724853515625, 3.0942535400390625, 3.8632431030273438, -1.8805999755859375, 4.8210296630859375, -6.488433837890625, 1.9523162841796875, -0.6245765686035156], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000594.npy"}
{"epoch": 0.8979591836734694, "step": 595, "batch_size": 64, "mean": 6.816158294677734, "std": 8.192422866821289, "min": -18.07268714904785, "p10": -2.110100936889648, "median": 6.2865447998046875, "p90": 16.349353790283203, "max": 25.873428344726562, "pos_frac": 0.78125, "sample": [-0.948760986328125, 11.565788269042969, 15.974456787109375, 6.499298095703125, 7.172004699707031, 5.976961135864258, 9.524742126464844, 4.447164535522461, -0.8180046081542969, 16.423625946044922, 12.160820007324219, 2.3237648010253906, 3.200347900390625, -6.286369323730469, 15.280447006225586, 2.3344573974609375, 3.3343868255615234, -2.0013694763183594, 13.790767669677734, 1.5802745819091797, 15.882957458496094, 5.333667755126953, 15.97403335571289, 4.667945861816406, 19.58837890625, 2.852294921875, 4.238945007324219, 4.704925537109375, 7.5478973388671875, 11.451957702636719, 2.9709739685058594, -5.067100524902344, 4.332000732421875, -4.856109619140625, 13.003223419189453, 14.855941772460938, 19.547607421875, -6.1826171875, -18.07268714904785, 23.09095001220703, 25.873428344726562, 16.17605209350586, 8.662567138671875, 7.482551574707031, -2.1567001342773438, 15.322212219238281, 7.433837890625, 10.886852264404297, 8.453239440917969, 8.16073226928711, -1.1972999572753906, 10.749214172363281, 1.3350353240966797, 2.1595840454101562, 2.8451690673828125, 6.07379150390625, 20.698501586914062, 9.093734741210938, -1.0391006469726562, -0.6192474365234375, -0.06578636169433594, -6.8723602294921875, 11.43154525756836, 17.946584701538086], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000595.npy"}
{"epoch": 0.8994708994708994, "step": 596, "batch_size": 64, "mean": 3.38840389251709, "std": 7.309378147125244, "min": -11.219091415405273, "p10": -4.50071964263916, "median": 2.2060670852661133, "p90": 16.80945510864258, "max": 21.587112426757812, "pos_frac": 0.65625, "sample": [-5.052101135253906, 4.035369873046875, 17.734466552734375, 3.9469070434570312, 6.326496124267578, -3.189727783203125, -1.5085983276367188, 0.07199859619140625, -0.139373779296875, 16.138214111328125, -6.483783721923828, 4.981403350830078, -5.9385986328125, -0.6680221557617188, 17.715438842773438, 18.453134536743164, 9.326179504394531, 2.164896011352539, -6.755134582519531, 2.724458694458008, -2.6388702392578125, 7.7460174560546875, 4.969898223876953, 5.851690292358398, 20.20094108581543, 10.71725082397461, 6.643058776855469, -3.784128189086914, 1.4965591430664062, -0.959381103515625, 0.9963493347167969, 2.1642913818359375, 2.2472381591796875, 0.24598312377929688, 3.7530288696289062, -3.3024559020996094, -4.704750061035156, 21.587112426757812, 0.14278793334960938, -8.826148986816406, 3.8777999877929688, 5.732505798339844, -0.856414794921875, -11.219091415405273, -1.7132949829101562, 4.785068511962891, -0.6736564636230469, -3.6862869262695312, 17.097129821777344, 9.805747985839844, 12.637041091918945, 2.2989654541015625, -1.0228958129882812, 1.630279541015625, 0.18131256103515625, 17.55426025390625, 0.047008514404296875, 7.063484191894531, -4.024648666381836, 2.8694019317626953, 6.383335113525391, 6.52435302734375, 4.126556396484375, -0.9902172088623047], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000596.npy"}
{"epoch": 0.9009826152683296, "step": 597, "batch_size": 64, "mean": 4.968687057495117, "std": 6.893178939819336, "min": -8.557880401611328, "p10": -2.2006603240966798, "median": 3.443619728088379, "p90": 14.823419952392578, "max": 22.82733917236328, "pos_frac": 0.734375, "sample": [-2.2308921813964844, -8.557880401611328, -1.3984146118164062, 1.0714874267578125, 14.927627563476562, 3.355609893798828, 4.625268936157227, -4.046379089355469, -2.1301193237304688, 17.61433219909668, 0.6276702880859375, 13.151611328125, 1.8155345916748047, 4.584434509277344, -1.5974960327148438, 3.1429290771484375, 21.52861785888672, 15.237995147705078, 8.216140747070312, -3.617338180541992, 8.21337890625, 2.617645263671875, -5.0383453369140625, -1.5794181823730469, 11.590919494628906, 9.889617919921875, 2.8311939239501953, -0.22601318359375, 10.367210388183594, -1.8948345184326172, 4.776470184326172, 4.661460876464844, 14.490318298339844, 7.350982666015625, 9.556344985961914, 4.510356903076172, 0.7055511474609375, 3.5316295623779297, 5.550046920776367, 15.366905212402344, -0.8171119689941406, -2.3182449340820312, 2.2158203125, -0.12705039978027344, -0.34208106994628906, 2.1252059936523438, 1.0713253021240234, 0.8293285369873047, 13.884735107421875, 2.531768798828125, 22.82733917236328, 3.5543880462646484, 9.699394226074219, -1.98443603515625, -6.6444091796875, 7.495124816894531, 1.0590133666992188, 3.5638580322265625, 15.916290283203125, 14.580268859863281, 1.9140987396240234, 13.490907669067383, 10.777450561523438, 9.100805282592773], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000597.npy"}
{"epoch": 0.9024943310657596, "step": 598, "batch_size": 64, "mean": 3.4464516639709473, "std": 7.2302565574646, "min": -11.588691711425781, "p10": -6.077601432800293, "median": 2.7339086532592773, "p90": 12.934156036376962, "max": 20.723541259765625, "pos_frac": 0.65625, "sample": [-0.2931404113769531, -4.7735443115234375, 8.025001525878906, 4.762001037597656, 8.647575378417969, -0.28012847900390625, 3.4555511474609375, 14.408390045166016, 6.71577262878418, -11.588691711425781, -0.91815185546875, 3.204526901245117, 0.6970062255859375, -5.75581169128418, 19.65807342529297, -8.83642578125, 20.238391876220703, -6.2983856201171875, 10.526273727416992, 6.997509002685547, -2.647096633911133, -8.588247299194336, -6.22271728515625, 6.795318603515625, 1.0024261474609375, 2.465944290161133, 3.001873016357422, 10.42999267578125, -3.9283218383789062, -1.2953834533691406, -2.009349822998047, 1.6940727233886719, 5.907915115356445, 5.2688751220703125, 14.38677978515625, -1.0640811920166016, 10.85552978515625, -1.1771392822265625, 10.77392578125, 0.6122283935546875, 9.129302978515625, 5.841033935546875, 0.08615875244140625, 20.723541259765625, -1.8182754516601562, 3.7269935607910156, 4.601409912109375, 0.3729438781738281, 8.919618606567383, 6.772468566894531, -1.1567535400390625, -6.318809509277344, 0.7696990966796875, 3.822601318359375, 1.9263973236083984, -6.215511322021484, 17.670486450195312, 7.3704681396484375, -0.6713333129882812, 0.9113616943359375, 10.97100830078125, 13.775505065917969, -4.265995025634766, 8.774253845214844], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000598.npy"}
{"epoch": 0.9040060468631897, "step": 599, "batch_size": 64, "mean": 4.206566333770752, "std": 7.3346452713012695, "min": -11.546745300292969, "p10": -3.4748935699462886, "median": 2.9035730361938477, "p90": 13.979364013671878, "max": 23.964080810546875, "pos_frac": 0.71875, "sample": [-0.9864120483398438, 1.3817729949951172, 3.61407470703125, 4.539787292480469, -3.6575241088867188, 8.64291763305664, -2.951335906982422, 15.597412109375, -2.961376190185547, -0.10064697265625, -5.804420471191406, -3.048755645751953, 3.1624298095703125, 6.787420272827148, 8.249946594238281, -1.2068710327148438, -10.469070434570312, -0.2662315368652344, 2.7830066680908203, 0.36334228515625, 14.375490188598633, 6.634317398071289, 0.9473533630371094, -11.546745300292969, 0.4878559112548828, -1.7387237548828125, 0.105743408203125, 16.70359992980957, -4.804420471191406, 2.153717041015625, 18.494918823242188, 7.42919921875, -2.9831161499023438, -6.0229034423828125, 2.5750465393066406, 1.5169143676757812, 1.0455856323242188, 3.276294708251953, 1.0451431274414062, 14.345382690429688, 13.125320434570312, 0.9564704895019531, 0.441192626953125, 9.184297561645508, 12.121700286865234, 12.289241790771484, 12.246170043945312, -1.1282920837402344, -2.4289894104003906, 4.854438781738281, 23.964080810546875, 8.5924072265625, 7.991966247558594, 5.518413543701172, 12.644744873046875, 3.291292190551758, 22.3902587890625, 1.6052932739257812, 12.096206665039062, 3.024139404296875, -4.8349609375, 9.970941543579102, 9.526077270507812, 4.067722320556641], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000599.npy"}
{"epoch": 0.9055177626606198, "step": 600, "batch_size": 64, "mean": 3.981468677520752, "std": 7.18269157409668, "min": -13.221736907958984, "p10": -3.8064741134643554, "median": 3.4523849487304688, "p90": 13.221693420410157, "max": 22.399185180664062, "pos_frac": 0.734375, "sample": [10.490318298339844, 2.4743003845214844, -3.801706314086914, 1.7079925537109375, 11.808181762695312, 3.5180015563964844, 4.8465576171875, -4.022241592407227, 15.757949829101562, 15.352767944335938, -2.7369041442871094, 6.349922180175781, 9.75252914428711, 15.76873779296875, -3.4421005249023438, -1.4786739349365234, -3.8085174560546875, -2.999378204345703, -4.2534332275390625, 12.9691162109375, 0.14664840698242188, 4.1610107421875, 13.329940795898438, 4.10089111328125, -3.577423095703125, 10.520721435546875, 3.128326416015625, 14.292377471923828, -0.06207275390625, 9.272712707519531, 10.432098388671875, 3.386768341064453, -13.221736907958984, 1.0515899658203125, 4.123359680175781, -11.351951599121094, 0.787567138671875, 5.696739196777344, 1.1807308197021484, 12.42620849609375, 0.46746826171875, 5.68365478515625, 5.802249908447266, 6.390350341796875, -0.8831920623779297, 6.570062637329102, -8.618316650390625, -2.756834030151367, -9.253005981445312, 11.976306915283203, 0.6508464813232422, 2.8798751831054688, 1.6566181182861328, -0.86822509765625, 4.630138397216797, 5.502166748046875, 8.5350341796875, 3.289691925048828, 5.351535797119141, 20.63434600830078, 2.2638015747070312, 1.1113662719726562, 7.320953369140625, 22.399185180664062], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000600.npy"}
{"epoch": 0.9070294784580499, "step": 601, "batch_size": 64, "mean": 5.62648868560791, "std": 7.58783483505249, "min": -11.6029052734375, "p10": -2.7886470794677725, "median": 4.218879699707031, "p90": 16.063171386718754, "max": 24.96515655517578, "pos_frac": 0.765625, "sample": [10.599868774414062, 2.8852081298828125, 9.394157409667969, -5.459897994995117, 12.573097229003906, 13.354976654052734, 22.045684814453125, 2.1822967529296875, 2.6627120971679688, -4.443000793457031, 24.96515655517578, -11.6029052734375, 16.353424072265625, 4.319612503051758, 0.7509441375732422, 3.8500518798828125, 13.284801483154297, 9.20037841796875, -1.9190597534179688, 7.7776947021484375, 16.47449493408203, 2.031829833984375, 0.34270477294921875, 16.314666748046875, 21.82312774658203, 2.1501808166503906, -1.1371917724609375, -0.7237186431884766, 8.246681213378906, -1.6753921508789062, 0.7384872436523438, 10.219890594482422, 4.709991455078125, -0.6180305480957031, -0.009029388427734375, 12.483692169189453, 4.025634765625, 7.602203369140625, 4.338691711425781, 4.118146896362305, -4.46624755859375, 6.452239990234375, 0.9866104125976562, 13.432365417480469, 0.888916015625, 3.756866455078125, 4.743314743041992, -1.98651123046875, -3.1324195861816406, 4.0970306396484375, 6.473030090332031, 15.283416748046875, 0.06133842468261719, 0.5669803619384766, 7.850561141967773, 9.911643981933594, 16.61474609375, 9.312664031982422, 11.073158264160156, -0.7959651947021484, -6.055534362792969, 13.193824768066406, 15.476348876953125, -7.8753662109375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000601.npy"}
{"epoch": 0.90854119425548, "step": 602, "batch_size": 64, "mean": 5.502286434173584, "std": 6.595191955566406, "min": -8.18707275390625, "p10": -2.317366027832031, "median": 3.903106689453125, "p90": 15.623613357543945, "max": 18.723316192626953, "pos_frac": 0.78125, "sample": [-2.4805145263671875, 13.034038543701172, 10.92340087890625, 18.237640380859375, 7.4143524169921875, 6.091896057128906, 2.9854660034179688, -0.47316551208496094, 8.731674194335938, 3.7905311584472656, 18.029098510742188, -2.385364532470703, 11.78753662109375, 17.078147888183594, -4.666530609130859, 17.895286560058594, -6.5484771728515625, 12.597137451171875, 3.8785247802734375, 2.190460205078125, 3.2446746826171875, -1.3360881805419922, 0.51861572265625, -1.9791755676269531, 0.9051704406738281, 11.511283874511719, 1.134817123413086, 10.199615478515625, 4.636119842529297, 5.63154411315918, -2.158702850341797, 2.5095138549804688, 7.916069030761719, 2.3441295623779297, 3.588298797607422, 0.17157363891601562, 2.7774581909179688, 3.0723114013671875, 12.616872787475586, 1.3052177429199219, -0.13516998291015625, 5.026458740234375, 8.794870376586914, 15.700145721435547, -3.7419891357421875, -8.18707275390625, 10.186504364013672, 3.8278274536132812, 6.472040176391602, 1.1429901123046875, -0.27312469482421875, 4.844474792480469, 18.723316192626953, 10.304756164550781, 2.3668365478515625, 12.96285629272461, 9.800743103027344, 4.328857421875, 16.595077514648438, 15.445037841796875, -2.4944381713867188, 10.10986328125, 3.9276885986328125, -0.30267333984375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000602.npy"}
{"epoch": 0.91005291005291, "step": 603, "batch_size": 64, "mean": 5.357183456420898, "std": 6.599529266357422, "min": -9.161338806152344, "p10": -1.448147583007812, "median": 3.855937957763672, "p90": 14.540781402587893, "max": 26.711090087890625, "pos_frac": 0.796875, "sample": [7.123542785644531, 14.834640502929688, 8.3377685546875, -0.42498779296875, -0.3646240234375, 9.517181396484375, 3.3012542724609375, 8.594146728515625, -2.3442115783691406, -1.0749053955078125, 15.004081726074219, 2.1693572998046875, 10.553470611572266, -0.8782253265380859, 1.5787277221679688, 13.781776428222656, -3.4758033752441406, 1.8099136352539062, 1.7342376708984375, 26.711090087890625, 6.672813415527344, 4.0686798095703125, 15.150957107543945, 3.5248336791992188, -0.3128547668457031, 2.9682159423828125, 8.548877716064453, -1.6081085205078125, 7.560081481933594, -7.687187194824219, 3.5793495178222656, 3.6131134033203125, 4.746549606323242, 2.098003387451172, -4.867734909057617, 9.49655532836914, 7.269308090209961, 3.6431961059570312, 0.7200717926025391, 17.796409606933594, 4.143863677978516, 7.2983245849609375, 2.8492355346679688, 5.345853805541992, 8.030876159667969, 12.541770935058594, 4.924217224121094, 0.06767654418945312, 2.9389495849609375, 1.4071903228759766, -2.7960128784179688, 1.6321640014648438, 8.569875717163086, 11.143291473388672, 12.946037292480469, 0.3664207458496094, 6.109308242797852, -0.33692169189453125, 13.855110168457031, 13.621299743652344, 17.148082733154297, 1.0012550354003906, 15.743637084960938, -9.161338806152344], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000603.npy"}
{"epoch": 0.9115646258503401, "step": 604, "batch_size": 64, "mean": 5.1371169090271, "std": 8.054845809936523, "min": -15.305694580078125, "p10": -3.5821090698242184, "median": 3.8480911254882812, "p90": 16.295999908447268, "max": 23.121475219726562, "pos_frac": 0.765625, "sample": [6.44818115234375, 10.700569152832031, -1.5336074829101562, -5.720829010009766, 7.014347076416016, 5.818300247192383, 0.7452812194824219, 2.800619125366211, 19.300189971923828, -3.3148345947265625, -1.8521728515625, -0.07532501220703125, 1.4224796295166016, 10.539375305175781, 0.23870849609375, 3.2206954956054688, 13.27947998046875, 2.061910629272461, 6.282661437988281, 9.564872741699219, -14.429924011230469, -0.5984039306640625, 1.4867897033691406, -3.6966552734375, 5.68217658996582, 0.8177375793457031, 1.4127483367919922, 4.151649475097656, 5.480133056640625, -15.305694580078125, 5.68280029296875, -0.7039108276367188, 3.5445327758789062, -5.012969970703125, 20.73807144165039, -4.089958190917969, 0.76983642578125, 14.519287109375, 21.65428924560547, 9.917427062988281, 8.204113006591797, 20.430767059326172, 23.121475219726562, 2.37115478515625, 6.1441192626953125, 12.309844970703125, 3.3997421264648438, 11.441658020019531, 22.787567138671875, -1.5604743957519531, 16.397552490234375, 4.840812683105469, 1.5198745727539062, 1.1759567260742188, 7.6629180908203125, 8.480964660644531, 8.694408416748047, 0.30283355712890625, 1.919534683227539, 16.059043884277344, 10.821367263793945, -0.3199005126953125, -6.373481750488281, 9.982765197753906], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000604.npy"}
{"epoch": 0.9130763416477702, "step": 605, "batch_size": 64, "mean": 4.79755973815918, "std": 6.984202861785889, "min": -10.825592041015625, "p10": -4.081386947631836, "median": 4.713497161865234, "p90": 14.756168365478516, "max": 23.4930419921875, "pos_frac": 0.75, "sample": [4.578193664550781, 0.1533966064453125, -4.081871032714844, 23.4930419921875, 9.741119384765625, 12.183059692382812, -0.37139892578125, 5.053730010986328, 9.192985534667969, -5.81593132019043, 14.28076171875, -3.7188491821289062, 5.408573150634766, -1.5300369262695312, 1.6919937133789062, -2.9191246032714844, 2.9091033935546875, 10.511878967285156, 6.312747955322266, 3.9437522888183594, -1.2688446044921875, 0.18280029296875, 5.289340972900391, -7.414878845214844, -0.6792678833007812, 5.951141357421875, 16.75423240661621, 5.646636962890625, 2.3776073455810547, 4.8488006591796875, 6.0767364501953125, 3.1535606384277344, -7.807960510253906, 16.044784545898438, -1.588470458984375, 10.148590087890625, 14.813114166259766, 10.443656921386719, 2.8675308227539062, 16.263870239257812, 3.1388473510742188, 2.7505340576171875, 6.114845275878906, 9.198949813842773, 15.790924072265625, 3.6872940063476562, 9.666778564453125, 14.623294830322266, 7.5131683349609375, -5.06591796875, 18.292335510253906, 8.528549194335938, 6.601787567138672, 1.2741775512695312, -5.577606201171875, 2.741546630859375, 2.4396228790283203, -10.825592041015625, 5.730079650878906, -0.21234703063964844, 2.2287063598632812, 10.219619750976562, -4.080257415771484, 9.14436149597168], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000605.npy"}
{"epoch": 0.9145880574452003, "step": 606, "batch_size": 64, "mean": 4.654353141784668, "std": 6.634653091430664, "min": -6.180854797363281, "p10": -3.034385299682617, "median": 4.37774658203125, "p90": 12.851221466064457, "max": 20.640777587890625, "pos_frac": 0.6875, "sample": [-2.5788955688476562, 11.932968139648438, 0.7365226745605469, 13.227203369140625, -0.95709228515625, 0.18419647216796875, 13.646469116210938, -4.1675567626953125, -0.4298095703125, 11.730125427246094, 16.505109786987305, -6.180854797363281, 9.381759643554688, 7.6220855712890625, 9.264045715332031, -5.606971740722656, 6.824853897094727, 2.2892913818359375, -0.5481586456298828, 5.899711608886719, 20.63128662109375, -4.2879791259765625, 0.14593124389648438, 7.227142333984375, -3.229595184326172, -2.4009475708007812, 8.65203857421875, 1.8622093200683594, 5.257932662963867, 7.503143310546875, 2.6241378784179688, 1.0319099426269531, -4.060207366943359, 11.400249481201172, 1.3376998901367188, 1.1497058868408203, 4.7299957275390625, -2.388425827026367, 4.6809844970703125, 10.18963623046875, 10.968502044677734, 17.798988342285156, 4.0745086669921875, 11.040111541748047, 3.4515838623046875, 20.640777587890625, -0.1286773681640625, 7.2475738525390625, -3.4091014862060547, -2.114971160888672, 11.07028579711914, -1.9413909912109375, -2.3303604125976562, 7.695991516113281, -0.8660049438476562, 8.268569946289062, 3.3861427307128906, -2.307342529296875, 5.04576301574707, 17.09046173095703, 6.162872314453125, 11.973930358886719, 5.49249267578125, -1.263967514038086], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000606.npy"}
{"epoch": 0.9160997732426304, "step": 607, "batch_size": 64, "mean": 5.377292633056641, "std": 7.485074043273926, "min": -11.442474365234375, "p10": -3.674897003173828, "median": 5.269229888916016, "p90": 14.507992935180665, "max": 28.254638671875, "pos_frac": 0.71875, "sample": [5.106418609619141, 2.3065185546875, -3.939332962036133, 3.9038467407226562, 8.476852416992188, 6.669712066650391, 11.108482360839844, 8.292930603027344, -11.442474365234375, -1.935638427734375, 1.4545021057128906, 6.007530212402344, 23.360382080078125, 14.956193923950195, 3.5329818725585938, 1.20379638671875, -3.0721969604492188, 3.3299198150634766, 10.155281066894531, 20.10527801513672, 2.753875732421875, 9.594371795654297, 8.905384063720703, 10.450469970703125, 12.866775512695312, 5.726280212402344, 3.3335189819335938, 3.7029571533203125, -2.5771026611328125, -1.8742446899414062, 8.133659362792969, 10.719047546386719, 13.036609649658203, 4.215312957763672, -3.743133544921875, 17.321144104003906, 15.061721801757812, -0.6666908264160156, 12.35357666015625, -2.587982177734375, -3.5156784057617188, 2.0600032806396484, 2.624513626098633, 7.8206024169921875, 5.432041168212891, -5.085681915283203, -1.1895179748535156, 11.446901321411133, 28.254638671875, 14.04056167602539, -0.9986457824707031, -4.361427307128906, 7.467079162597656, 14.708320617675781, 0.3901710510253906, 9.087699890136719, -0.4521522521972656, -0.6115264892578125, 5.929130554199219, 12.948299407958984, 6.630157470703125, -6.880195617675781, -3.9997711181640625, 6.094669342041016], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000607.npy"}
{"epoch": 0.9176114890400605, "step": 608, "batch_size": 64, "mean": 4.0554890632629395, "std": 6.475800037384033, "min": -11.871871948242188, "p10": -2.892200851440429, "median": 3.476593017578125, "p90": 13.073913574218754, "max": 21.402545928955078, "pos_frac": 0.734375, "sample": [7.649011611938477, -1.2940750122070312, 1.5431632995605469, 8.6236572265625, 0.08648681640625, 5.8579864501953125, 10.647109985351562, -3.0455780029296875, 3.6962966918945312, -4.175537109375, 5.893058776855469, 1.7808837890625, 9.2288818359375, 8.343730926513672, 1.665964126586914, 8.762470245361328, 17.160537719726562, -9.139511108398438, -2.022981643676758, 0.07483673095703125, 3.47998046875, -2.534320831298828, -0.21755218505859375, 21.402545928955078, 7.477058410644531, 0.5055961608886719, -3.4434814453125, 3.6946544647216797, 1.9844093322753906, 16.99898910522461, 3.47320556640625, -6.133674621582031, -1.151336669921875, 6.591224670410156, -2.0427169799804688, 11.591278076171875, 1.73150634765625, 0.45671844482421875, 5.629638671875, 2.3405303955078125, 2.901226043701172, 3.8742103576660156, 2.4266891479492188, -0.3373546600341797, -3.3000946044921875, 7.8382415771484375, -0.5959091186523438, 13.554931640625, 17.138641357421875, 5.584535598754883, 6.22845458984375, 4.178009033203125, 3.5409164428710938, 4.329010009765625, 1.245096206665039, 10.817024230957031, -11.871871948242188, 17.602602005004883, 4.5555877685546875, -1.9716033935546875, -0.2858314514160156, 12.17242431640625, 3.2954559326171875, 13.46026611328125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000608.npy"}
{"epoch": 0.9191232048374905, "step": 609, "batch_size": 64, "mean": 6.137574672698975, "std": 6.878267288208008, "min": -10.381362915039062, "p10": -1.5414501190185543, "median": 5.235029220581055, "p90": 14.984880065917972, "max": 24.977279663085938, "pos_frac": 0.84375, "sample": [15.9588623046875, 5.246063232421875, 7.7104949951171875, 3.5839080810546875, 7.23577880859375, 1.183053970336914, 8.862022399902344, 7.149452209472656, -0.4243049621582031, -1.7437667846679688, -2.508909225463867, 5.121599197387695, -0.6980743408203125, 3.1518173217773438, 24.977279663085938, 13.019355773925781, 15.320846557617188, 8.851764678955078, 4.145929336547852, 21.211715698242188, -7.164344787597656, 12.59564208984375, 5.452301025390625, 11.712081909179688, 0.00107574462890625, 11.863945007324219, 6.138095855712891, 8.597366333007812, -4.026832580566406, 4.5834197998046875, 17.66754150390625, 0.01517486572265625, 20.690959930419922, 11.010292053222656, 8.468826293945312, 10.41566276550293, 3.3652725219726562, 0.23259925842285156, 20.43994140625, 9.499359130859375, 2.9945850372314453, -4.350505828857422, 1.9456520080566406, 14.200958251953125, 9.496280670166016, -10.381362915039062, 4.462884902954102, 5.223995208740234, 9.608711242675781, -1.0693778991699219, 4.415493011474609, 3.1025962829589844, 4.670379638671875, 3.152101516723633, 7.927736282348633, 0.9183197021484375, 8.53170394897461, 9.28115463256836, 7.771596908569336, -4.3416595458984375, 9.520156860351562, 5.1781463623046875, 0.6566848754882812, 0.9752883911132812], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000609.npy"}
{"epoch": 0.9206349206349206, "step": 610, "batch_size": 64, "mean": 5.368198394775391, "std": 8.477282524108887, "min": -15.111343383789062, "p10": -4.750569152832031, "median": 4.5269927978515625, "p90": 17.543841934204107, "max": 26.343276977539062, "pos_frac": 0.71875, "sample": [4.785106658935547, 18.090225219726562, 13.027923583984375, 11.711776733398438, 1.3207778930664062, 13.237682342529297, 2.2310409545898438, 3.3144187927246094, -4.3563385009765625, 3.2131423950195312, 4.1598052978515625, -7.542715072631836, 4.419490814208984, 19.391250610351562, -0.3394775390625, 4.5070037841796875, 16.26894760131836, -2.725025177001953, -15.111343383789062, -0.5453567504882812, 0.9464492797851562, -1.405426025390625, 5.60888671875, 5.390026092529297, -5.410430908203125, -2.2861709594726562, -4.919525146484375, 25.28563690185547, 8.549690246582031, 15.103683471679688, 1.4984245300292969, 24.485931396484375, 7.828460693359375, 12.683387756347656, -2.332763671875, 5.958745956420898, 13.296295166015625, 0.6527900695800781, 8.572738647460938, 5.10540771484375, 6.7994842529296875, 4.827070236206055, 4.979499816894531, -5.975921630859375, 19.125778198242188, -1.9506721496582031, -0.0909576416015625, -0.41962432861328125, 11.389181137084961, 15.398635864257812, 0.6524734497070312, 5.287567138671875, 4.5469818115234375, 1.0730552673339844, -0.40328407287597656, 0.07789230346679688, -5.91400146484375, 9.529396057128906, 26.343276977539062, 1.4202461242675781, 11.896484375, -5.9980316162109375, 19.979293823242188, 7.3203277587890625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000610.npy"}
{"epoch": 0.9221466364323507, "step": 611, "batch_size": 64, "mean": 3.7607359886169434, "std": 6.374274253845215, "min": -10.215644836425781, "p10": -3.758777618408203, "median": 3.5005569458007812, "p90": 11.447650146484376, "max": 24.74591827392578, "pos_frac": 0.71875, "sample": [9.790245056152344, 7.339136123657227, -6.87506103515625, 4.608020782470703, 15.734533309936523, 2.098552703857422, 0.6130828857421875, 8.403255462646484, -1.5769939422607422, 2.798309326171875, -5.121511459350586, 11.6431884765625, 3.893037796020508, 3.7980728149414062, -2.279388427734375, -0.741607666015625, -3.6819801330566406, 10.320266723632812, -6.683742523193359, 24.74591827392578, 5.484100341796875, 8.972589492797852, 3.0591583251953125, -3.2234668731689453, -3.7916908264160156, -0.41588592529296875, 10.99139404296875, 1.649728775024414, 1.1927566528320312, 9.354415893554688, 3.5679397583007812, -4.11981201171875, -2.1241073608398438, 15.493522644042969, 3.2350311279296875, 2.0392913818359375, -6.421453475952148, 17.978591918945312, 5.392585754394531, 5.076728820800781, 6.1634063720703125, 2.6332340240478516, -0.9986648559570312, 8.076751708984375, 0.164459228515625, 5.373653411865234, 5.251501083374023, 6.067047119140625, 5.407623291015625, 7.745758056640625, 5.3841094970703125, 13.615882873535156, 2.65899658203125, 7.912788391113281, 3.2378902435302734, 5.1591796875, 11.890987396240234, -10.215644836425781, 3.4331741333007812, 2.2763824462890625, -3.2178897857666016, -2.4766273498535156, -1.3618736267089844, 4.2882232666015625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000611.npy"}
{"epoch": 0.9236583522297808, "step": 612, "batch_size": 64, "mean": 5.858378887176514, "std": 7.928894996643066, "min": -15.574508666992188, "p10": -2.0126163482666017, "median": 5.127322196960449, "p90": 16.634332084655764, "max": 24.882171630859375, "pos_frac": 0.796875, "sample": [-0.24667739868164062, -7.387451171875, 8.519256591796875, -0.41080474853515625, -1.9951019287109375, 1.3607025146484375, 10.904476165771484, 10.090591430664062, 0.17711639404296875, 0.4138336181640625, 13.122570037841797, 14.645126342773438, 10.148910522460938, 1.6793975830078125, 6.215461730957031, 7.5436553955078125, 5.225679397583008, 3.1031036376953125, 3.7666854858398438, 5.923038482666016, 7.561717987060547, -10.670150756835938, 4.945320129394531, 2.5231094360351562, -2.0233917236328125, 3.1550865173339844, 14.811206817626953, -2.7447242736816406, -2.020122528076172, 2.3065872192382812, 4.70440673828125, 16.46379280090332, 0.385528564453125, 0.23109054565429688, -0.9944076538085938, 16.707420349121094, 3.8538436889648438, 10.49905014038086, 24.47174072265625, -9.869548797607422, 19.487564086914062, 2.874004364013672, 16.326889038085938, -0.3085479736328125, 17.14508056640625, 19.508651733398438, 6.7100982666015625, -1.1892681121826172, 11.847259521484375, 24.882171630859375, 3.2572898864746094, 4.5066375732421875, 19.11053466796875, 8.259185791015625, 6.753490447998047, 12.095703125, 5.178012847900391, 1.1181583404541016, 5.486854553222656, -15.574508666992188, 7.07518196105957, 11.423625946044922, 5.076631546020508, 6.788444519042969], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000612.npy"}
{"epoch": 0.9251700680272109, "step": 613, "batch_size": 64, "mean": 6.773859024047852, "std": 7.523411750793457, "min": -10.355941772460938, "p10": -1.17911262512207, "median": 6.064569473266602, "p90": 17.681590270996093, "max": 23.611000061035156, "pos_frac": 0.859375, "sample": [-1.2680549621582031, 5.167816162109375, 0.8310317993164062, 10.672943115234375, 8.034200668334961, 8.442611694335938, 9.32489013671875, 4.457817077636719, -4.032356262207031, -0.9715805053710938, 6.874631881713867, 5.743579864501953, -10.164310455322266, 6.38555908203125, 7.373722076416016, 9.633987426757812, 2.793773651123047, 2.9734745025634766, 17.61573028564453, 4.135719299316406, 0.3590126037597656, -1.7912101745605469, -5.101226806640625, 2.2115440368652344, 19.480037689208984, 6.721599578857422, 17.283042907714844, 3.1641616821289062, 2.6944808959960938, 4.51953125, 17.244430541992188, 7.431940078735352, 7.8710479736328125, 0.5043239593505859, 1.6822662353515625, 4.409122467041016, -0.26114654541015625, 21.107004165649414, 8.746410369873047, 22.904190063476562, 15.857261657714844, 12.39222526550293, 2.165485382080078, 19.24249267578125, 19.500823974609375, 1.6783981323242188, 16.67943572998047, 7.828069686889648, -1.7646865844726562, 10.855083465576172, 2.1951141357421875, 1.4307861328125, 7.4225616455078125, 17.709815979003906, 10.46221923828125, 4.810749053955078, 8.180458068847656, 15.064115524291992, 4.041215896606445, 23.611000061035156, 2.0638885498046875, 7.130939483642578, 0.119720458984375, -10.355941772460938], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000613.npy"}
{"epoch": 0.926681783824641, "step": 614, "batch_size": 64, "mean": 5.172794342041016, "std": 5.685115337371826, "min": -11.594173431396484, "p10": -1.3382022857666014, "median": 4.774533271789551, "p90": 12.181122589111329, "max": 21.95904541015625, "pos_frac": 0.84375, "sample": [10.609210968017578, 12.230155944824219, 2.5284042358398438, 5.2126922607421875, 0.9100418090820312, 8.931327819824219, 3.406167984008789, 2.2368030548095703, 3.7835731506347656, 4.129264831542969, 10.159347534179688, 6.654470443725586, 13.236221313476562, 9.289665222167969, 7.763458251953125, 4.020109176635742, 8.877666473388672, 8.716560363769531, 17.27556610107422, 12.06671142578125, 2.0917186737060547, 2.766620635986328, -2.0314407348632812, -5.3470916748046875, 21.95904541015625, -0.7619705200195312, 0.14972496032714844, -4.9767913818359375, 6.805742263793945, 6.55633544921875, 1.8440322875976562, 2.2676658630371094, 0.3113975524902344, 12.646955490112305, 2.249422073364258, 4.273784637451172, 11.977184295654297, 6.7789306640625, -11.594173431396484, 6.0864715576171875, -0.8996467590332031, 14.855072021484375, 7.41790771484375, 10.066631317138672, 4.5510406494140625, 7.345037460327148, 2.85888671875, 6.12445068359375, 5.315296173095703, -1.08294677734375, 4.998025894165039, 3.1668357849121094, -3.863525390625, -1.6471519470214844, 1.6164054870605469, 12.469108581542969, 8.667861938476562, 3.1646347045898438, 9.155689239501953, 3.03485107421875, 1.5519695281982422, 7.540004730224609, -1.4475975036621094, 10.009017944335938], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000614.npy"}
{"epoch": 0.9281934996220711, "step": 615, "batch_size": 64, "mean": 5.831323623657227, "std": 7.6891560554504395, "min": -11.293380737304688, "p10": -2.982577896118164, "median": 5.241455078125, "p90": 17.007868957519534, "max": 22.28247833251953, "pos_frac": 0.78125, "sample": [3.060333251953125, 5.3580780029296875, 9.338191986083984, 3.763935089111328, 7.078010559082031, 13.220457077026367, 4.54986572265625, -11.293380737304688, 17.714481353759766, 3.9953460693359375, -4.050056457519531, 4.844383239746094, -2.1702880859375, 7.766754150390625, 3.2972774505615234, -5.259391784667969, 4.436393737792969, 17.164039611816406, 10.78082275390625, -1.4554901123046875, 11.48358154296875, 1.8429641723632812, 16.643470764160156, -1.6497631072998047, 9.549064636230469, 7.5779876708984375, -2.8661346435546875, 6.1475372314453125, 5.6476593017578125, 16.15255355834961, 11.996585845947266, 1.7672920227050781, -2.3763580322265625, -4.658790588378906, 7.719245910644531, 13.423582077026367, 12.552398681640625, 5.731109619140625, 0.4022369384765625, 17.304885864257812, 15.463462829589844, 22.28247833251953, 11.410385131835938, 5.1248321533203125, 7.931190490722656, 2.1319656372070312, 2.9140625, 1.0042724609375, 0.63702392578125, 15.825454711914062, 6.06402587890625, 22.0087890625, -3.032482147216797, 5.527820587158203, -0.8660812377929688, 19.33343505859375, 2.235919952392578, 7.457513809204102, 17.63494110107422, 2.1846847534179688, 3.385021209716797, -0.62371826171875, -11.122848510742188, -8.23828125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000615.npy"}
{"epoch": 0.9297052154195011, "step": 616, "batch_size": 64, "mean": 4.5265655517578125, "std": 6.812039852142334, "min": -9.934417724609375, "p10": -2.25836067199707, "median": 3.1961593627929688, "p90": 13.652102661132814, "max": 21.330917358398438, "pos_frac": 0.71875, "sample": [13.728302001953125, 1.4967422485351562, 0.14886474609375, -2.06585693359375, 9.040557861328125, -2.2659664154052734, 21.330917358398438, 6.5821990966796875, -0.22285842895507812, 4.6082763671875, 5.109025955200195, 2.053617477416992, 3.6372451782226562, -9.934417724609375, 2.334850311279297, -7.381832122802734, 2.3646240234375, -2.2406139373779297, 0.053070068359375, 19.705657958984375, 1.0626296997070312, 20.314912796020508, -2.602672576904297, -1.760904312133789, 20.14575958251953, -1.6231231689453125, 1.225931167602539, -4.869956970214844, -0.42278480529785156, 13.47430419921875, 7.385246276855469, 3.7690963745117188, 4.243053436279297, 6.46466064453125, 14.996356964111328, 9.718875885009766, -3.902820587158203, 4.878871917724609, 0.4913787841796875, 12.299467086791992, 9.666561126708984, 8.596641540527344, 3.0730152130126953, 8.667068481445312, 16.357261657714844, 3.01007080078125, 1.9742107391357422, -0.4481086730957031, -3.25518798828125, 2.1248550415039062, -1.3451423645019531, -1.6130828857421875, 6.6172943115234375, 3.1448974609375, 5.0509185791015625, 11.933631896972656, 4.7713623046875, 6.349010467529297, 11.267955780029297, -1.7782554626464844, 9.729133605957031, 11.176185607910156, 3.2474212646484375, -1.9881763458251953], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000616.npy"}
{"epoch": 0.9312169312169312, "step": 617, "batch_size": 64, "mean": 4.721736907958984, "std": 6.846593856811523, "min": -10.048088073730469, "p10": -2.527614021301269, "median": 3.5999107360839844, "p90": 14.680566406250003, "max": 22.512039184570312, "pos_frac": 0.71875, "sample": [3.4743118286132812, 2.5845985412597656, -3.818084716796875, 8.593833923339844, 15.553680419921875, 3.8176631927490234, 9.370147705078125, 9.735502243041992, -0.19742965698242188, -2.7221527099609375, 2.6479721069335938, 5.706756591796875, 0.5257415771484375, 4.531841278076172, 0.9256839752197266, 8.248455047607422, 5.6788787841796875, 8.17303466796875, -0.34925079345703125, 3.002431869506836, 20.104095458984375, 2.0177345275878906, -0.11724472045898438, 12.650157928466797, 4.3171844482421875, -1.5393667221069336, 0.19054794311523438, 10.446517944335938, -0.7410736083984375, 6.758853912353516, 4.786985397338867, -1.4369697570800781, 2.8571128845214844, 15.034202575683594, 8.966903686523438, 22.512039184570312, 2.341705322265625, -2.035003662109375, -0.9593276977539062, 18.027626037597656, 10.815025329589844, 4.396636962890625, -2.073690414428711, -0.7645912170410156, 16.78243637084961, 7.61524772644043, -10.048088073730469, -6.7318878173828125, -6.5888824462890625, 1.9453887939453125, 6.30072021484375, 13.855415344238281, 8.219482421875, -4.0378265380859375, 22.095016479492188, 7.478019714355469, 3.7255096435546875, 2.5836868286132812, 8.244487762451172, 2.5865478515625, 2.434846878051758, -0.06671905517578125, 7.666450500488281, -3.908355712890625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000617.npy"}
{"epoch": 0.9327286470143613, "step": 618, "batch_size": 64, "mean": 4.994216442108154, "std": 6.572030067443848, "min": -10.442855834960938, "p10": -1.6537261962890621, "median": 4.007562637329102, "p90": 14.455053329467777, "max": 22.931655883789062, "pos_frac": 0.8125, "sample": [17.731201171875, 13.7587890625, 6.537097930908203, 6.9966278076171875, 0.018951416015625, 0.3995037078857422, -1.0550956726074219, -1.2208938598632812, 5.89996337890625, -0.11508560180664062, 2.372028350830078, 15.324127197265625, 7.273700714111328, 7.4308929443359375, 22.931655883789062, 1.7739334106445312, 3.1799774169921875, 1.8391036987304688, 18.457550048828125, 10.53759765625, 4.0980987548828125, 21.9404296875, 1.6153926849365234, 8.077171325683594, -1.8392257690429688, 14.75345230102539, -0.10741615295410156, 6.438804626464844, 6.527294158935547, 1.8731861114501953, 0.9115142822265625, -0.27678680419921875, 9.012306213378906, 3.15234375, 7.697700500488281, 8.253143310546875, 8.425987243652344, -1.9533538818359375, 7.817958831787109, 6.548362731933594, -4.623006820678711, 0.027599334716796875, 0.5941085815429688, -5.057060241699219, 1.4842185974121094, -5.6712493896484375, 0.8763046264648438, -10.442855834960938, 7.181877136230469, 6.405170440673828, 0.23348236083984375, 12.25738525390625, 3.9170265197753906, 3.4306259155273438, 2.4301376342773438, 4.1536712646484375, 1.4647140502929688, 5.590278625488281, 4.96000862121582, 12.274036407470703, 9.805694580078125, 17.551918029785156, -3.555816650390625, 1.3035964965820312], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000618.npy"}
{"epoch": 0.9342403628117913, "step": 619, "batch_size": 64, "mean": 5.008410930633545, "std": 6.950492858886719, "min": -13.065994262695312, "p10": -2.066933250427246, "median": 3.4807395935058594, "p90": 14.570907974243164, "max": 20.58502197265625, "pos_frac": 0.75, "sample": [7.189159393310547, -1.8226394653320312, 1.4019393920898438, 1.9731826782226562, 20.58502197265625, 14.211273193359375, 14.104568481445312, -3.5079498291015625, -6.209983825683594, 16.58294677734375, 7.9369049072265625, 2.184276580810547, 11.109619140625, 0.8307819366455078, 15.947128295898438, -1.5686359405517578, 7.572456359863281, 6.182769775390625, -0.04607200622558594, -0.8287200927734375, -3.0953750610351562, 14.625728607177734, 0.6395282745361328, -2.0932769775390625, -2.005464553833008, 6.748199462890625, 7.62773323059082, 3.8641510009765625, 0.4916267395019531, -0.0009098052978515625, 3.9645767211914062, -13.065994262695312, -2.556610107421875, 4.678314208984375, 10.994190216064453, 14.306869506835938, 2.6143436431884766, 18.594722747802734, 18.38896942138672, -4.4087066650390625, 2.7645645141601562, 14.4429931640625, 6.601875305175781, -1.7754039764404297, 19.01654052734375, 9.247627258300781, 1.6719188690185547, 7.470863342285156, 6.483407974243164, 1.2761344909667969, 3.0973281860351562, 5.420871734619141, 9.737117767333984, -1.6858901977539062, 12.633148193359375, 1.1955337524414062, 1.9988784790039062, 10.566204071044922, 0.26319313049316406, 0.6582431793212891, 0.6982307434082031, 6.579853057861328, -0.27458953857421875, 8.309017181396484], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000619.npy"}
{"epoch": 0.9357520786092215, "step": 620, "batch_size": 64, "mean": 4.48681116104126, "std": 6.719709873199463, "min": -7.100067138671875, "p10": -2.8600030899047852, "median": 3.10772705078125, "p90": 15.393631744384773, "max": 20.579917907714844, "pos_frac": 0.6875, "sample": [1.7166900634765625, -4.01202392578125, -3.073486328125, 19.549880981445312, -4.930084228515625, 1.0714893341064453, 5.86260986328125, 16.208465576171875, 2.4554595947265625, 8.114795684814453, 13.689170837402344, 16.9510498046875, 4.5827178955078125, 3.101776123046875, 1.6060256958007812, 13.316726684570312, 4.0830230712890625, 4.541496276855469, -1.6234512329101562, 1.2421302795410156, -7.100067138671875, 1.1401634216308594, -4.12554931640625, 19.46949005126953, -6.3071441650390625, 11.057060241699219, 0.9207267761230469, -1.5745697021484375, 20.579917907714844, -2.6618804931640625, 9.96091079711914, 7.786827087402344, 0.293212890625, -2.865449905395508, -1.0850334167480469, 2.1907882690429688, -0.19314956665039062, 5.6905364990234375, 9.801225662231445, -0.12602996826171875, 9.013795852661133, 1.7936630249023438, 6.419795989990234, 3.5496482849121094, -0.29369354248046875, -0.8437347412109375, 16.124114990234375, 13.005645751953125, -1.1634368896484375, 7.462882995605469, 0.8721828460693359, 16.958206176757812, 3.113677978515625, -0.8350353240966797, -2.8472938537597656, -0.3833122253417969, 5.922918319702148, -0.36159515380859375, 6.46905517578125, 5.00146484375, 7.844326019287109, 10.212181091308594, 9.39079475402832, 3.4231929779052734], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000620.npy"}
{"epoch": 0.9372637944066515, "step": 621, "batch_size": 64, "mean": 4.156601428985596, "std": 6.819648265838623, "min": -10.999237060546875, "p10": -3.1858764648437496, "median": 3.684581756591797, "p90": 13.690658569335941, "max": 19.282285690307617, "pos_frac": 0.703125, "sample": [2.2008743286132812, 0.465362548828125, 3.2664794921875, -1.2543296813964844, -5.9976959228515625, -10.999237060546875, 6.344482421875, 17.932571411132812, -3.4140796661376953, 0.21270179748535156, 4.498394012451172, 15.499557495117188, 12.90655517578125, 1.6034259796142578, 19.282285690307617, 6.547882080078125, 6.160161972045898, 1.7577362060546875, 6.625988006591797, 10.57647705078125, -1.1459426879882812, 2.214874267578125, 10.945114135742188, -8.194095611572266, 12.875244140625, 16.00540542602539, 12.575687408447266, -8.78729248046875, 3.944122314453125, -1.891845703125, 15.065505981445312, 8.090072631835938, 7.2326812744140625, 7.061065673828125, 4.323604583740234, 1.3657798767089844, 4.647268295288086, 1.6808223724365234, 7.348651885986328, -0.5117340087890625, -2.653402328491211, 1.0095710754394531, 10.582550048828125, 5.829986572265625, -0.215728759765625, -0.4463233947753906, 3.4250411987304688, -0.6513290405273438, -4.447532653808594, 5.088911056518555, -0.8181667327880859, 10.065292358398438, 6.976341247558594, 0.9185237884521484, 10.158584594726562, -2.0533809661865234, -1.312479019165039, 17.882720947265625, -7.552240371704102, 7.026813507080078, -1.630645751953125, 0.13720703125, 5.614879608154297, 14.026702880859375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000621.npy"}
{"epoch": 0.9387755102040817, "step": 622, "batch_size": 64, "mean": 4.578090667724609, "std": 6.19249963760376, "min": -6.199131011962891, "p10": -2.287985610961913, "median": 3.3613691329956055, "p90": 11.956713867187501, "max": 22.296173095703125, "pos_frac": 0.828125, "sample": [9.815238952636719, 2.207916259765625, 2.0394668579101562, -6.199131011962891, 6.7432403564453125, 0.21285247802734375, 0.9698944091796875, 7.358726501464844, -2.906930923461914, 1.3769378662109375, 21.86176109313965, 5.632560729980469, 0.5483188629150391, -4.650688171386719, 10.542793273925781, 3.3945560455322266, 7.796573638916016, 5.450199127197266, 8.383438110351562, 17.80225372314453, 3.6376113891601562, 8.415699005126953, -2.7039451599121094, -0.2183704376220703, 15.809127807617188, 2.0730209350585938, 4.225257873535156, 8.9801025390625, 1.5702800750732422, 3.6914634704589844, 18.044403076171875, -3.51776123046875, 7.0099029541015625, 2.117706298828125, 1.3036041259765625, 1.2858295440673828, 14.62677001953125, 11.658638000488281, 7.976047515869141, 1.5205001831054688, 11.994117736816406, 4.158378601074219, 3.3281822204589844, -0.6712970733642578, 1.5735130310058594, 7.758884429931641, 1.054412841796875, -5.196342468261719, 0.5537528991699219, 4.63031005859375, 11.869438171386719, 0.3956146240234375, 0.17098236083984375, -1.317413330078125, 2.1721458435058594, 0.1480255126953125, 22.296173095703125, 0.8962936401367188, -4.606573104858398, -0.5903129577636719, 8.07659912109375, 3.9263172149658203, 7.266517639160156, 7.2242279052734375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000622.npy"}
{"epoch": 0.9402872260015117, "step": 623, "batch_size": 64, "mean": 5.403250217437744, "std": 6.261257171630859, "min": -7.57574462890625, "p10": -2.257660675048828, "median": 5.815269470214844, "p90": 12.626889038085938, "max": 20.242233276367188, "pos_frac": 0.765625, "sample": [5.998252868652344, 2.6710968017578125, -0.9376411437988281, -4.406232833862305, 2.5995712280273438, 14.72845458984375, 6.29364013671875, 9.384872436523438, 8.967498779296875, 11.716531753540039, -0.634429931640625, 1.8436431884765625, 12.536964416503906, 3.8410720825195312, 1.8657150268554688, 20.242233276367188, -2.3083534240722656, 14.275093078613281, 12.665428161621094, 6.365760803222656, 5.632286071777344, 2.353473663330078, 8.848615646362305, -3.99554443359375, -4.546318054199219, 7.801067352294922, 12.50909423828125, 9.284431457519531, -7.57574462890625, 15.192581176757812, 8.669822692871094, 18.914310455322266, 8.420455932617188, 11.568450927734375, 0.253692626953125, -4.969562530517578, 5.999214172363281, -0.34934234619140625, 10.860820770263672, 10.406608581542969, 11.766204833984375, 17.381690979003906, 3.354320526123047, -0.0014057159423828125, -0.48419952392578125, 0.5137290954589844, 0.40185546875, 3.8975582122802734, 1.36163330078125, 6.53700065612793, 6.9186859130859375, 9.4210205078125, 8.8984375, 10.143936157226562, -0.6299648284912109, 4.99493408203125, 10.402999877929688, 0.39009857177734375, 2.6951522827148438, 0.7967987060546875, 9.665672302246094, -2.1393775939941406, -2.334716796875, -1.1316299438476562], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000623.npy"}
{"epoch": 0.9417989417989417, "step": 624, "batch_size": 64, "mean": 5.346616744995117, "std": 7.113637924194336, "min": -9.305225372314453, "p10": -2.432736206054687, "median": 4.964956283569336, "p90": 14.82772445678711, "max": 23.1134033203125, "pos_frac": 0.71875, "sample": [-1.6929550170898438, -4.541469573974609, 6.848165512084961, 8.435165405273438, 20.743183135986328, 18.0361385345459, -0.05521392822265625, 3.47760009765625, 0.9349822998046875, -1.3443012237548828, 2.574005126953125, 4.986110687255859, 7.635446548461914, 6.098089218139648, 9.624626159667969, 6.4580078125, 3.5299930572509766, 10.327789306640625, 14.908187866210938, 0.3815345764160156, -0.4107704162597656, 0.9386711120605469, -4.600896835327148, 12.71209716796875, -0.16410446166992188, 7.484098434448242, 0.8842849731445312, -4.0666961669921875, 1.9296875, 2.0343894958496094, 13.73040771484375, 14.639976501464844, 10.350845336914062, 13.670867919921875, 5.249340057373047, -3.7339515686035156, 17.729734420776367, 10.822521209716797, 23.1134033203125, 11.448234558105469, 7.555257797241211, 7.159095764160156, 9.21091079711914, 3.0483932495117188, -1.8787689208984375, 9.149986267089844, 20.423782348632812, -0.058803558349609375, -1.1416664123535156, 4.9438018798828125, 6.787757873535156, 0.5364665985107422, -2.6701507568359375, 0.11093902587890625, -9.305225372314453, 12.647298812866211, -0.947235107421875, -0.5921478271484375, 8.098480224609375, -0.2776374816894531, -6.617288589477539, 1.9823970794677734, 6.755516052246094, 16.13507843017578], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000624.npy"}
{"epoch": 0.9433106575963719, "step": 625, "batch_size": 64, "mean": 4.292520046234131, "std": 6.891178607940674, "min": -9.698417663574219, "p10": -3.132456016540527, "median": 3.5182676315307617, "p90": 14.409183883666993, "max": 21.48322296142578, "pos_frac": 0.765625, "sample": [3.441041946411133, 13.4066162109375, 2.553546905517578, 0.112335205078125, -2.921234130859375, 8.155338287353516, 4.235111236572266, 2.784515380859375, 4.798154830932617, 15.777374267578125, 1.8510932922363281, -1.0276870727539062, 8.99995231628418, 0.9755287170410156, 15.1617431640625, 0.5001373291015625, 19.1363525390625, 3.7530670166015625, -4.059803009033203, 0.9686927795410156, 7.586219787597656, 14.313419342041016, 0.4571685791015625, -8.126678466796875, 0.5604095458984375, 4.133272171020508, 4.979759216308594, 10.060569763183594, -0.6607303619384766, 3.5954933166503906, -2.371967315673828, 14.450225830078125, 13.718902587890625, 0.3709564208984375, 6.661968231201172, -2.323251724243164, -3.210958480834961, 17.26116943359375, 2.1661148071289062, 2.0076675415039062, 8.979141235351562, 21.48322296142578, 12.594490051269531, 5.510087966918945, 4.234596252441406, -0.013671875, -8.864120483398438, -2.3090667724609375, -2.9492835998535156, 5.596199035644531, 1.6716156005859375, -5.690574645996094, -9.698417663574219, 16.666339874267578, 5.410165786743164, 0.46877479553222656, 6.50115966796875, 1.1917228698730469, 11.290739059448242, -4.3262481689453125, 2.2408981323242188, 11.523263931274414, 5.185186386108398, 3.7934494018554688], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000625.npy"}
{"epoch": 0.9448223733938019, "step": 626, "batch_size": 64, "mean": 5.978490352630615, "std": 8.258874893188477, "min": -9.884674072265625, "p10": -4.611162757873535, "median": 4.7940874099731445, "p90": 17.185759735107425, "max": 28.267791748046875, "pos_frac": 0.78125, "sample": [12.400405883789062, 6.9124298095703125, 11.595458984375, -5.816211700439453, 22.63336181640625, 14.147363662719727, 17.644729614257812, -6.047466278076172, 4.251157760620117, 15.135700225830078, 0.8494911193847656, 1.9183769226074219, 2.5343017578125, 0.5519046783447266, -3.818756103515625, 21.687557220458984, 9.605392456054688, -7.979404449462891, 15.541999816894531, 1.9164905548095703, -4.119951248168945, 3.98797607421875, 5.345706939697266, 20.397212982177734, 3.8355655670166016, 1.3451728820800781, 5.745048522949219, 9.6075439453125, -9.884674072265625, 28.267791748046875, 5.406055450439453, -2.1423587799072266, 11.982490539550781, 9.1307373046875, 4.095401763916016, -6.3284912109375, -1.444366455078125, 9.023277282714844, 0.5260419845581055, 5.540428161621094, 5.2588043212890625, 2.739055633544922, 16.114830017089844, 4.3167877197265625, -3.4115142822265625, 4.329370498657227, 2.4739742279052734, 14.232437133789062, 10.730976104736328, 7.147716522216797, 12.444801330566406, 1.5638618469238281, -6.754589080810547, 9.09735107421875, -4.821681976318359, 11.319656372070312, 3.764862060546875, 18.607986450195312, 13.002519607543945, 7.757568359375, 18.87603759765625, -3.8259029388427734, 3.3152694702148438, -1.6376876831054688], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000626.npy"}
{"epoch": 0.9463340891912321, "step": 627, "batch_size": 64, "mean": 6.653484344482422, "std": 7.863141059875488, "min": -8.841316223144531, "p10": -2.3313125610351557, "median": 5.04095458984375, "p90": 18.450967407226564, "max": 25.160659790039062, "pos_frac": 0.84375, "sample": [6.7764434814453125, 4.233898162841797, 11.549522399902344, 14.271728515625, 4.305122375488281, 10.305057525634766, 21.457382202148438, 18.777847290039062, 15.13873291015625, 2.8662261962890625, -4.738471984863281, 12.745925903320312, 23.301345825195312, 8.063003540039062, 6.278800964355469, -1.3783721923828125, 1.6889877319335938, 4.9757080078125, 20.96112060546875, 2.348724365234375, 11.670936584472656, 3.9854068756103516, 2.464313507080078, 5.106201171875, 0.07131195068359375, -6.8007354736328125, 14.673864364624023, -1.813201904296875, 12.044473648071289, 0.024444580078125, 2.188985824584961, 21.92371368408203, 2.5610809326171875, 0.1349334716796875, 5.258270263671875, 8.713973999023438, -0.86260986328125, 0.99609375, 0.9029903411865234, 10.360298156738281, 25.160659790039062, 7.757026672363281, -2.5533599853515625, -3.473642349243164, -2.8645896911621094, 3.5751953125, 1.7994537353515625, 9.333208084106445, 3.4357528686523438, 17.688247680664062, 5.2967071533203125, -8.841316223144531, 2.9186344146728516, 10.545114517211914, 16.438899993896484, 4.428741455078125, 2.4165573120117188, -8.743202209472656, 10.210441589355469, 1.0779571533203125, 9.31143569946289, 14.084609985351562, 10.030097961425781, 19.256919860839844], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000627.npy"}
{"epoch": 0.9478458049886621, "step": 628, "batch_size": 64, "mean": 6.520160675048828, "std": 8.282068252563477, "min": -16.630813598632812, "p10": -2.9507358551025384, "median": 7.4844465255737305, "p90": 16.982763671875002, "max": 24.733022689819336, "pos_frac": 0.828125, "sample": [24.733022689819336, 4.470184326171875, 9.243988037109375, 7.483846664428711, -3.4847259521484375, -3.184356689453125, 9.452392578125, 2.0402374267578125, 15.413795471191406, 10.265893936157227, 23.184768676757812, 21.948043823242188, 10.425872802734375, 3.887409210205078, 7.502717971801758, 1.2240219116210938, -16.630813598632812, -11.355566024780273, 12.794082641601562, 2.4826393127441406, 12.017194747924805, 7.48504638671875, 1.1169872283935547, 1.127349853515625, 10.207939147949219, 12.302139282226562, 5.373802185058594, 8.263641357421875, 6.361785888671875, 18.810054779052734, 7.652467727661133, 9.756088256835938, -0.9265670776367188, 18.437543869018555, 4.001136779785156, 16.735153198242188, -1.7448921203613281, 1.1523380279541016, 8.851081848144531, 0.00905609130859375, 9.56436538696289, 4.726139068603516, 5.297367095947266, 11.461530685424805, 8.660896301269531, 11.8031005859375, 20.859020233154297, 13.096216201782227, 14.052202224731445, 10.769315719604492, 2.9230785369873047, 1.0437469482421875, 1.5202960968017578, -8.139871597290039, 3.9491024017333984, 17.088882446289062, 1.6202316284179688, 8.503868103027344, -2.037384033203125, 3.4801483154296875, -4.007196426391602, 15.746559143066406, -2.405620574951172, -11.172500610351562], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000628.npy"}
{"epoch": 0.9493575207860923, "step": 629, "batch_size": 64, "mean": 5.196226119995117, "std": 6.064513206481934, "min": -5.030708312988281, "p10": -1.2303937911987302, "median": 3.2338180541992188, "p90": 14.5026611328125, "max": 23.02759552001953, "pos_frac": 0.828125, "sample": [0.5681686401367188, 1.3404769897460938, -5.030708312988281, 5.322959899902344, 4.580814361572266, 2.4789505004882812, -0.28223419189453125, 8.571365356445312, 3.8933029174804688, 2.8208084106445312, 3.0233230590820312, 5.2516021728515625, 2.6768569946289062, 23.02759552001953, 2.002584457397461, -0.9398117065429688, 4.532670974731445, 4.619472503662109, 2.3153076171875, 14.569229125976562, -2.1469573974609375, -0.5483932495117188, -1.354928970336914, 0.4293842315673828, -2.902027130126953, 2.0019912719726562, 14.152359008789062, 8.249176025390625, 1.163583755493164, 13.239013671875, 6.235198974609375, -3.1594390869140625, 6.7293701171875, 6.681352615356445, 8.170761108398438, 4.6786651611328125, 7.95850944519043, 1.2800140380859375, 1.4942646026611328, -0.3575000762939453, 0.3419189453125, 1.9822502136230469, 13.067916870117188, 2.1376819610595703, 2.2126312255859375, 2.9708709716796875, 3.4443130493164062, 9.123664855957031, 6.960992813110352, 7.7078704833984375, 13.932014465332031, 1.5754814147949219, 15.692764282226562, 2.1120128631591797, 10.596221923828125, 14.347335815429688, -3.1159744262695312, 17.162399291992188, 18.28738021850586, 15.826187133789062, 2.034912109375, 14.9888916015625, -2.5123138427734375, 4.343910217285156], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000629.npy"}
{"epoch": 0.9508692365835223, "step": 630, "batch_size": 64, "mean": 5.667186737060547, "std": 6.922513008117676, "min": -4.021354675292969, "p10": -2.6234115600585928, "median": 4.357387542724609, "p90": 14.529614257812502, "max": 24.682586669921875, "pos_frac": 0.734375, "sample": [-4.011137008666992, 5.8616943359375, 2.2403297424316406, -1.0603866577148438, 10.530851364135742, 16.570831298828125, 6.118328094482422, -0.13893508911132812, 22.496299743652344, 13.968597412109375, -0.9138450622558594, 10.837642669677734, -3.1166610717773438, 14.852523803710938, -0.9904365539550781, 12.95306396484375, -4.021354675292969, 6.2529296875, 9.612312316894531, 10.856109619140625, 1.9487762451171875, 1.6929664611816406, 8.810493469238281, 10.845489501953125, 14.770050048828125, 9.605392456054688, -3.3661422729492188, 4.154243469238281, -3.6529693603515625, 3.43817138671875, -0.9943771362304688, 12.390693664550781, -0.6999664306640625, 15.5516357421875, 8.56158447265625, 2.22686767578125, 10.83120346069336, 1.1920318603515625, 4.9025115966796875, 3.3718719482421875, 24.682586669921875, 8.965936660766602, 13.871444702148438, -3.572856903076172, 22.25238800048828, 9.031486511230469, 2.6176605224609375, 4.675540924072266, 6.959878921508789, -0.68634033203125, 1.3970794677734375, -1.4724960327148438, -1.0297889709472656, 12.201728820800781, 6.520576477050781, -3.1186695098876953, 1.8435897827148438, 0.7704582214355469, 4.354301452636719, 12.625846862792969, -1.3509368896484375, 4.3604736328125, 2.186737060546875, 0.1340484619140625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000630.npy"}
{"epoch": 0.9523809523809523, "step": 631, "batch_size": 64, "mean": 6.559107780456543, "std": 8.426466941833496, "min": -12.162567138671875, "p10": -2.755682373046875, "median": 5.8977556228637695, "p90": 17.511853790283205, "max": 28.89312744140625, "pos_frac": 0.78125, "sample": [11.353727340698242, 1.5117950439453125, 4.4302825927734375, 11.07159423828125, 7.941982269287109, 15.852066040039062, 13.931442260742188, 7.709835052490234, 0.7057609558105469, 2.9744796752929688, -12.162567138671875, -4.696599960327148, 6.704273223876953, -0.7095489501953125, -1.4881324768066406, 2.24517822265625, -1.3458538055419922, 4.370746612548828, 7.939765930175781, 18.133941650390625, 20.822921752929688, -0.486541748046875, 28.340354919433594, 14.354644775390625, 13.766845703125, 9.497787475585938, 2.1125030517578125, 18.89080810546875, -0.9256725311279297, 4.021379470825195, 3.425121307373047, 2.518646240234375, -6.2144012451171875, 5.811735153198242, 8.249622344970703, 11.330596923828125, 9.719488143920898, 7.8046112060546875, 28.89312744140625, 14.601142883300781, 1.5638504028320312, -7.739482879638672, -2.8049468994140625, 7.519033432006836, 0.15997314453125, -8.775863647460938, 3.5729236602783203, 5.983776092529297, -5.467979431152344, 4.44927978515625, -2.6407318115234375, 10.344161987304688, 1.744659423828125, 16.803421020507812, -2.0675601959228516, 11.582725524902344, 4.908870697021484, 17.815467834472656, 6.517047882080078, 16.488571166992188, 6.30743408203125, 20.233482360839844, 4.2326812744140625, 16.043209075927734], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000631.npy"}
{"epoch": 0.9538926681783825, "step": 632, "batch_size": 64, "mean": 6.659795761108398, "std": 6.8344902992248535, "min": -8.579559326171875, "p10": -2.5508962631225582, "median": 6.20658016204834, "p90": 14.696174240112306, "max": 21.173721313476562, "pos_frac": 0.828125, "sample": [8.682388305664062, 10.48089599609375, 3.71978759765625, 1.7997283935546875, 11.84500503540039, 15.387168884277344, -0.9105968475341797, 14.839542388916016, 1.3918647766113281, 12.311058044433594, 2.6490020751953125, -2.8929443359375, 5.993377685546875, -0.7033920288085938, -2.7661304473876953, 13.491130828857422, 12.244739532470703, 2.827402114868164, 13.877716064453125, -7.184713363647461, 5.434947967529297, 3.806703567504883, -7.120817184448242, 8.090530395507812, 2.6452598571777344, 13.671512603759766, 4.561431884765625, 10.93157958984375, 6.406709671020508, 5.295932769775391, 15.153213500976562, 20.636138916015625, 3.8084869384765625, -2.0486831665039062, 9.073356628417969, 7.744560241699219, 14.361648559570312, 6.164552688598633, 12.985420227050781, 6.17518424987793, 10.701713562011719, 12.219017028808594, 6.23797607421875, 2.3677978515625, 2.0636024475097656, 13.352863311767578, 9.059982299804688, 2.9643211364746094, 7.504447937011719, 11.257413864135742, 21.173721313476562, -8.579559326171875, 10.8370361328125, 20.235286712646484, -2.7764415740966797, 1.9479904174804688, 16.380104064941406, 13.143119812011719, 6.444183349609375, 3.9608707427978516, 4.994316101074219, 3.080005645751953, -6.405376434326172, -0.798187255859375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000632.npy"}
{"epoch": 0.9554043839758125, "step": 633, "batch_size": 64, "mean": 5.077201843261719, "std": 7.346278667449951, "min": -11.018974304199219, "p10": -2.4946746826171875, "median": 3.9895734786987305, "p90": 15.783480834960942, "max": 25.557830810546875, "pos_frac": 0.75, "sample": [0.23063087463378906, 4.28358268737793, 6.0568389892578125, 7.9120635986328125, 10.276527404785156, 5.377265930175781, 4.4707183837890625, 0.8036956787109375, 1.7249259948730469, 4.9222412109375, 0.27114105224609375, -1.5785026550292969, 19.450363159179688, 2.657705307006836, -1.2142105102539062, 9.510986328125, 1.189483642578125, 1.4514007568359375, 3.6955642700195312, -2.2948837280273438, 2.8606300354003906, -0.9798202514648438, 12.77587890625, 14.1163330078125, 8.77804946899414, 6.039268493652344, -0.4085102081298828, -0.6509246826171875, 5.426809310913086, 6.632740020751953, 8.141433715820312, 0.24142837524414062, 8.83011245727539, 0.40233612060546875, -1.4910736083984375, 6.571575164794922, -2.5802993774414062, -3.8370590209960938, -1.6449298858642578, -5.348075866699219, -0.8193321228027344, 0.1678314208984375, 1.0696640014648438, 25.557830810546875, 9.67071533203125, 0.2013568878173828, 4.9593048095703125, -5.649507522583008, 11.887680053710938, 16.342208862304688, 16.444957733154297, 8.34134292602539, 13.941635131835938, 13.535064697265625, 2.5984153747558594, -6.130010604858398, 2.6823463439941406, -2.700714111328125, 14.479782104492188, 18.360408782958984, 21.662803649902344, -11.018974304199219, 16.804107666015625, 9.478546142578125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000633.npy"}
{"epoch": 0.9569160997732427, "step": 634, "batch_size": 64, "mean": 3.22459077835083, "std": 7.136390686035156, "min": -7.949901580810547, "p10": -4.724715805053711, "median": 1.086233139038086, "p90": 12.898325538635254, "max": 20.50487518310547, "pos_frac": 0.59375, "sample": [-0.4226417541503906, -2.3088836669921875, 2.8466148376464844, 20.50487518310547, 12.184478759765625, -2.411417007446289, -2.9856300354003906, -0.5939350128173828, -0.7063751220703125, -4.000141143798828, 1.2587089538574219, 1.9354400634765625, -1.6529712677001953, -0.39661598205566406, 0.272125244140625, 3.034160614013672, 6.8057403564453125, 3.1932029724121094, -0.7219696044921875, -1.6188945770263672, -6.145332336425781, -0.6217613220214844, 9.834953308105469, -6.7167205810546875, 19.501426696777344, -1.2894096374511719, 12.910232543945312, 8.558374404907227, 16.550811767578125, -3.2200851440429688, -5.848609924316406, 7.911088943481445, 2.722393035888672, -2.8855133056640625, 12.870542526245117, 0.202880859375, -6.725128173828125, -6.474189758300781, 17.586219787597656, 12.126832962036133, 9.445945739746094, -3.786346435546875, -5.035247802734375, 4.828350067138672, 0.91375732421875, -7.949901580810547, 0.47609710693359375, 2.845111846923828, 19.519100189208984, 3.036205291748047, -2.6935043334960938, -1.8841590881347656, 4.2676544189453125, 0.8123397827148438, 5.618082046508789, 1.8982582092285156, 11.73477554321289, 9.568809509277344, 8.84588623046875, 11.179168701171875, -0.13722991943359375, 3.0368480682373047, 0.48191070556640625, 14.287017822265625], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000634.npy"}
{"epoch": 0.9584278155706727, "step": 635, "batch_size": 64, "mean": 4.275641441345215, "std": 8.202296257019043, "min": -8.889991760253906, "p10": -5.768129348754883, "median": 3.08197021484375, "p90": 14.815631103515626, "max": 24.008590698242188, "pos_frac": 0.625, "sample": [-2.25299072265625, 10.342130661010742, -7.675981521606445, -1.9497947692871094, 13.487327575683594, 2.7683944702148438, -5.802181243896484, -7.4094696044921875, -4.932914733886719, 17.750255584716797, 0.7875766754150391, 10.533863067626953, -0.747894287109375, -2.7322769165039062, 9.897857666015625, 15.00527572631836, 14.373126983642578, 3.3955459594726562, 16.336212158203125, 8.273788452148438, -8.889991760253906, -1.3642578125, -1.1197090148925781, 21.29003143310547, -0.7864837646484375, -2.251708984375, 7.861610412597656, -0.9986515045166016, -1.4967975616455078, 3.927051544189453, 2.314544677734375, -5.501319885253906, 13.43771743774414, 3.7589569091796875, -2.2047119140625, 3.7951087951660156, 13.87551498413086, 0.001880645751953125, 11.029182434082031, -7.149044036865234, 10.918220520019531, 18.829002380371094, -1.288543701171875, 22.59371566772461, 3.95294189453125, -5.6886749267578125, 7.906951904296875, 8.339561462402344, 13.684059143066406, -3.887847900390625, 6.228919982910156, 24.008590698242188, 3.61572265625, 2.5171356201171875, -8.051445007324219, 0.7894515991210938, 12.559783935546875, 1.8268051147460938, 1.94989013671875, 7.125114440917969, -6.9436798095703125, 8.019697189331055, 7.20458984375, -1.5456733703613281], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000635.npy"}
{"epoch": 0.9599395313681028, "step": 636, "batch_size": 64, "mean": 5.956270217895508, "std": 6.395050525665283, "min": -7.493858337402344, "p10": -0.9865816116333005, "median": 4.930999755859375, "p90": 15.363099288940433, "max": 22.880447387695312, "pos_frac": 0.84375, "sample": [10.781429290771484, -1.109283447265625, 5.5597381591796875, 1.4899673461914062, 8.023223876953125, 6.698272705078125, 8.943866729736328, 0.2798614501953125, 0.23099517822265625, 14.540679931640625, 7.433197021484375, 0.36037254333496094, 14.087173461914062, 7.231815338134766, 7.703216552734375, 3.982452392578125, 4.907203674316406, 4.954795837402344, 4.574947357177734, 5.808746337890625, 5.462860107421875, 4.4265899658203125, 0.7314033508300781, 9.673818588256836, 3.038393020629883, 4.224123001098633, 22.880447387695312, -0.11718559265136719, -0.7002773284912109, 6.664604187011719, 2.4312705993652344, 5.382780075073242, 20.0440673828125, 4.213897705078125, -1.3068161010742188, 10.985710144042969, 4.895111083984375, 9.994806289672852, -1.9365768432617188, 15.89190673828125, 8.057270050048828, 15.715564727783203, -0.0007915496826171875, 22.1541748046875, 0.8835792541503906, -6.7660980224609375, 17.8074951171875, 5.517578125, 4.387823104858398, 9.429031372070312, 7.68707275390625, 3.5169639587402344, 10.632598876953125, 18.583267211914062, 8.069107055664062, -5.6663970947265625, -7.493858337402344, 1.9373531341552734, 3.203969955444336, 10.474088668823242, 2.693572998046875, -1.2655868530273438, 4.774547576904297, 3.505340576171875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000636.npy"}
{"epoch": 0.9614512471655329, "step": 637, "batch_size": 64, "mean": 4.683398723602295, "std": 8.03256607055664, "min": -16.0390625, "p10": -3.015416717529297, "median": 3.640087127685547, "p90": 18.762394714355477, "max": 23.89752197265625, "pos_frac": 0.71875, "sample": [3.292755126953125, 10.042945861816406, 5.631660461425781, 23.89752197265625, 0.6299705505371094, 5.5357513427734375, -3.0257644653320312, 6.531450271606445, 13.938232421875, 1.6835670471191406, -1.0772476196289062, 2.4957656860351562, 7.190711975097656, -3.58062744140625, 3.4605636596679688, 8.739643096923828, -0.5228919982910156, 5.664196014404297, 20.429935455322266, 1.6100711822509766, -0.0660552978515625, 3.6856155395507812, 1.0898361206054688, 0.7100601196289062, -0.4388389587402344, 6.608940124511719, -1.3388214111328125, 21.82532501220703, 13.770095825195312, 22.431732177734375, 20.957393646240234, 1.4906463623046875, -7.82196044921875, 8.0732421875, -2.4937591552734375, 4.174873352050781, -9.153465270996094, 4.457080841064453, 3.4434356689453125, 11.807777404785156, -0.5847625732421875, 10.60333251953125, -0.8145313262939453, 4.062862396240234, 8.604461669921875, 4.787689208984375, -3.8134384155273438, 4.45379638671875, -2.149463653564453, 0.5637893676757812, 1.485443115234375, 21.329299926757812, 4.446262359619141, 6.386932373046875, 3.2732391357421875, -8.149200439453125, 17.180526733398438, 3.7881717681884766, -0.1942901611328125, -16.0390625, 4.691474914550781, 19.440338134765625, -2.99127197265625, 3.5945587158203125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000637.npy"}
{"epoch": 0.9629629629629629, "step": 638, "batch_size": 64, "mean": 6.197139739990234, "std": 7.290056228637695, "min": -7.563423156738281, "p10": -1.7272785186767576, "median": 4.922690391540527, "p90": 16.128404808044436, "max": 22.246170043945312, "pos_frac": 0.75, "sample": [-1.1510467529296875, 13.251983642578125, 13.657478332519531, 0.321258544921875, -7.563423156738281, 0.7024612426757812, 17.69304084777832, 4.695953369140625, 1.5067176818847656, -2.961902618408203, 2.8302230834960938, -2.5599632263183594, 0.08869171142578125, 4.8505401611328125, 1.6247406005859375, 13.975692749023438, -5.9669647216796875, -1.81329345703125, 11.616241455078125, 3.238037109375, 7.5360565185546875, 9.02813720703125, 14.054664611816406, 12.40985107421875, -6.114997863769531, 22.246170043945312, 7.707069396972656, 15.178565979003906, 0.8781700134277344, 3.517345428466797, 4.994840621948242, 16.535478591918945, 8.455841064453125, 19.02585220336914, 6.909454345703125, -0.9683208465576172, 1.2293777465820312, 10.6820068359375, 8.634384155273438, 12.847393035888672, 11.324806213378906, 9.687873840332031, 10.59002685546875, -1.5265769958496094, 17.05400848388672, -0.874755859375, -0.04351234436035156, 14.518196105957031, 3.6131515502929688, 12.49639892578125, -0.2169933319091797, 6.8675384521484375, 17.789474487304688, 4.75634765625, -4.233619689941406, 7.2815399169921875, -0.22674560546875, 22.220787048339844, 14.617780685424805, 1.6654472351074219, 0.7214202880859375, -0.120208740234375, -0.029510498046875, 5.86027717590332], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000638.npy"}
{"epoch": 0.9644746787603931, "step": 639, "batch_size": 64, "mean": 5.566571235656738, "std": 7.248122215270996, "min": -7.472627639770508, "p10": -2.880776214599609, "median": 4.6573486328125, "p90": 15.285376739501954, "max": 25.08039093017578, "pos_frac": 0.75, "sample": [8.001066207885742, 9.34722900390625, 2.7417144775390625, 9.86981201171875, 7.3108978271484375, 7.494285583496094, 4.327686309814453, -4.172435760498047, -0.9773235321044922, 10.9554443359375, 0.07102203369140625, 5.056575775146484, 23.933792114257812, 4.66351318359375, 7.863018035888672, 0.9211044311523438, 6.558942794799805, -1.5339603424072266, -5.728111267089844, 5.916282653808594, 3.0811080932617188, 15.462635040283203, 1.746328353881836, 11.6373291015625, -3.295581817626953, 13.160720825195312, 2.9202842712402344, 9.321754455566406, 0.3721466064453125, 15.345245361328125, 18.99939727783203, -3.636943817138672, 1.5586700439453125, 7.200531005859375, -1.0400142669677734, 0.982452392578125, 5.903682708740234, 2.8103103637695312, 24.53433609008789, 11.324775695800781, 12.212095260620117, 4.571502685546875, 15.145683288574219, 6.0451202392578125, -2.886962890625, 2.431415557861328, -1.362396240234375, -2.2057228088378906, 6.778388977050781, -0.001117706298828125, 16.555824279785156, 11.221824645996094, -0.7508621215820312, 4.65118408203125, 2.18463134765625, -7.472627639770508, 1.6007118225097656, -0.45660400390625, -2.8663406372070312, 25.08039093017578, -3.2328834533691406, 6.175270080566406, 8.906381607055664, 12.9259033203125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000639.npy"}
{"epoch": 0.9659863945578231, "step": 640, "batch_size": 64, "mean": 4.297284126281738, "std": 7.089978218078613, "min": -18.445110321044922, "p10": -3.9785495758056633, "median": 4.9007110595703125, "p90": 13.170020294189456, "max": 17.547475814819336, "pos_frac": 0.75, "sample": [8.277118682861328, 9.623907089233398, 12.005447387695312, -9.610174179077148, 1.1271476745605469, 4.826896667480469, 11.118419647216797, 2.2694778442382812, 3.671661376953125, 14.926399230957031, 0.8243026733398438, 1.0431900024414062, 11.305463790893555, 5.320606231689453, -6.845245361328125, 4.974525451660156, 6.722684860229492, 5.408176422119141, 1.2058238983154297, 15.935544967651367, -18.445110321044922, 1.76751708984375, 10.40460205078125, 1.5426788330078125, -3.205455780029297, -2.849061965942383, 7.8479461669921875, -0.36603546142578125, -4.900394439697266, 2.123292922973633, 14.096120834350586, -1.4761199951171875, 2.5097198486328125, 11.027740478515625, 10.662841796875, -0.0715484619140625, -1.15728759765625, 3.573577880859375, 5.567832946777344, 6.863258361816406, 0.8053436279296875, 13.812644958496094, 1.4547958374023438, 6.6183929443359375, 13.488067626953125, 6.6187744140625, 5.001914978027344, -14.909881591796875, 3.7847633361816406, 8.419349670410156, 7.754367828369141, 15.945518493652344, 3.7275123596191406, 11.412841796875, -0.6593170166015625, -1.9225597381591797, 17.547475814819336, -4.891426086425781, -4.30987548828125, 8.07615852355957, -0.10443496704101562, 12.427909851074219, 5.407529830932617, 9.872825622558594], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000640.npy"}
{"epoch": 0.9674981103552532, "step": 641, "batch_size": 64, "mean": 4.501331329345703, "std": 6.488428592681885, "min": -13.816047668457031, "p10": -1.1825763702392575, "median": 3.865720748901367, "p90": 12.495214080810547, "max": 24.263259887695312, "pos_frac": 0.796875, "sample": [-3.697093963623047, -7.502052307128906, 8.387374877929688, 4.901468276977539, 7.5892791748046875, 0.47863006591796875, 9.10220718383789, 4.235893249511719, -1.2990188598632812, 1.5134086608886719, 1.310333251953125, 2.7733020782470703, 6.514686584472656, 16.417068481445312, 1.849639892578125, 2.995880126953125, -0.6638412475585938, 11.753005981445312, 15.979551315307617, 5.125709533691406, 8.246383666992188, -0.8914108276367188, -0.02152252197265625, -0.528472900390625, 7.813499450683594, 12.509956359863281, 5.71295166015625, 3.716156005859375, 6.865478515625, 7.096185684204102, -0.627655029296875, 12.4608154296875, 2.8068618774414062, 4.677412033081055, -0.9108772277832031, -10.332244873046875, 3.8272438049316406, 7.149566650390625, -3.287628173828125, 0.2686614990234375, -13.816047668457031, 19.38086700439453, 4.732490539550781, 1.4460983276367188, 3.9041976928710938, 17.727375030517578, 0.5087165832519531, 0.02811431884765625, 2.6986541748046875, 6.2682952880859375, 24.263259887695312, 8.12946891784668, 14.435415267944336, 5.661676406860352, -2.635040283203125, 4.375499725341797, 6.964073181152344, 2.8276824951171875, 3.1053504943847656, 3.046234130859375, 3.5479736328125, 1.41082763671875, 5.972158432006836, 9.785078048706055], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000641.npy"}
{"epoch": 0.9690098261526833, "step": 642, "batch_size": 64, "mean": 3.8901000022888184, "std": 7.285355567932129, "min": -17.25585174560547, "p10": -3.962574577331543, "median": 2.8171730041503906, "p90": 14.270179748535158, "max": 18.17314910888672, "pos_frac": 0.65625, "sample": [13.512428283691406, 5.2864837646484375, -0.34741973876953125, -3.886606216430664, 2.7103424072265625, -3.2204761505126953, 6.7969818115234375, 15.518569946289062, 1.029388427734375, -17.25585174560547, 13.878768920898438, 1.806936264038086, 2.2143402099609375, -2.75250244140625, 8.026634216308594, -0.3292694091796875, 10.672904968261719, 18.17314910888672, -1.8179702758789062, -6.082908630371094, 3.58917236328125, 7.04095458984375, 6.858085632324219, -2.124217987060547, 15.370231628417969, 8.370719909667969, 18.12483787536621, 4.097076416015625, 13.587615966796875, 2.2081146240234375, 4.7757110595703125, 6.1612701416015625, 7.440544128417969, 1.6119575500488281, -2.5064048767089844, 13.65234375, -6.313804626464844, 2.6179733276367188, 0.7268447875976562, -2.8607940673828125, 0.3717689514160156, 2.9240036010742188, -0.15161895751953125, -0.7557601928710938, 8.322481155395508, 4.318584442138672, -7.352996826171875, 10.050451278686523, 14.43792724609375, 17.277137756347656, 1.58612060546875, -3.0187835693359375, 9.949024200439453, 11.130035400390625, 17.012876510620117, 4.450016021728516, -3.9951324462890625, 5.065101623535156, -3.7601490020751953, -4.308837890625, -1.513031005859375, -0.8695602416992188, -5.4225921630859375, 6.857181549072266], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000642.npy"}
{"epoch": 0.9705215419501134, "step": 643, "batch_size": 64, "mean": 3.9034934043884277, "std": 8.251629829406738, "min": -35.27435302734375, "p10": -4.773406600952148, "median": 4.0526580810546875, "p90": 13.665461730957032, "max": 19.868186950683594, "pos_frac": 0.765625, "sample": [2.177978515625, 5.609531402587891, 4.190011978149414, -5.1650543212890625, 1.819366455078125, 16.98590087890625, 3.915304183959961, 7.891380310058594, 10.943031311035156, 1.079559326171875, -1.514923095703125, 3.5267181396484375, 4.742218017578125, 0.8634033203125, 7.497596740722656, -35.27435302734375, -2.1242294311523438, 0.09874534606933594, 13.205778121948242, 19.868186950683594, 3.656219482421875, 5.126514434814453, 10.149860382080078, 2.571033477783203, -1.2416839599609375, 16.630630493164062, 5.412483215332031, -6.1163787841796875, 18.883174896240234, 11.084884643554688, 5.9431304931640625, 6.539726257324219, 1.0217437744140625, 7.191043853759766, 3.2393150329589844, -0.39519500732421875, 2.5498390197753906, 4.544429779052734, 15.150871276855469, 11.615434646606445, 11.95677375793457, -7.822380065917969, 8.593276977539062, -4.830944061279297, 7.719230651855469, -3.4627227783203125, -5.813697814941406, -10.251106262207031, 3.496084213256836, 5.066362380981445, 0.21368408203125, 5.180515289306641, 15.061981201171875, 7.024158477783203, 3.0683135986328125, -4.639152526855469, -2.919647216796875, 13.546249389648438, 4.204990386962891, 2.3458118438720703, -2.738536834716797, 13.716552734375, 6.978981018066406, 0.235565185546875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000643.npy"}
{"epoch": 0.9720332577475435, "step": 644, "batch_size": 64, "mean": 4.259905815124512, "std": 7.684263229370117, "min": -14.6075439453125, "p10": -3.555403900146484, "median": 3.284576416015625, "p90": 14.348961067199708, "max": 20.80207061767578, "pos_frac": 0.703125, "sample": [8.051895141601562, -0.6935939788818359, -14.250286102294922, 6.053642272949219, -1.7737350463867188, -4.09368896484375, 4.250690460205078, 14.187604904174805, 16.269893646240234, 13.565055847167969, 5.609821319580078, -0.749267578125, 11.561599731445312, 12.853496551513672, 6.1729736328125, 1.2762069702148438, 3.0894622802734375, 2.0369491577148438, 0.26930999755859375, 17.32697868347168, -0.5835609436035156, 10.04058837890625, -10.758018493652344, -0.8379783630371094, -1.8768539428710938, 7.986259460449219, -0.7992324829101562, 5.981639862060547, 0.48001861572265625, -3.666473388671875, 11.68075942993164, -7.5299072265625, 4.579753875732422, 9.026298522949219, 12.913772583007812, -3.2962417602539062, 2.9200592041015625, 19.475353240966797, 14.614997863769531, 2.075254440307617, 5.5158843994140625, 10.333980560302734, 2.1079254150390625, 3.0664825439453125, -0.52972412109375, -14.6075439453125, -0.31056785583496094, 13.407039642333984, 2.6792831420898438, 14.418113708496094, 3.4019317626953125, 3.1672210693359375, -0.2491588592529297, -2.8717422485351562, 4.478080749511719, 5.626621246337891, -9.845458984375, 4.155464172363281, 7.6997222900390625, 1.2526397705078125, 1.59124755859375, 20.80207061767578, 4.4802398681640625, 19.422714233398438], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000644.npy"}
{"epoch": 0.9735449735449735, "step": 645, "batch_size": 64, "mean": 4.9632649421691895, "std": 6.506046295166016, "min": -10.19561767578125, "p10": -2.34079475402832, "median": 4.239788055419922, "p90": 12.983089828491211, "max": 23.235755920410156, "pos_frac": 0.796875, "sample": [4.474491119384766, 14.0906982421875, 8.894004821777344, -3.5527725219726562, 10.167686462402344, 23.235755920410156, 9.215400695800781, 3.597726821899414, 2.7386245727539062, -0.471435546875, -2.114063262939453, 2.1865692138671875, 4.005084991455078, 0.8792762756347656, 13.261215209960938, 1.7108993530273438, 9.27723503112793, 1.7128868103027344, 0.96466064453125, 0.21266555786132812, 17.70309066772461, -3.3347854614257812, 2.8661575317382812, 4.959320068359375, 5.757057189941406, -2.4379653930664062, 0.5010147094726562, 4.519096374511719, 11.73463249206543, 9.235305786132812, 3.1323623657226562, 1.1469688415527344, 6.939899444580078, 0.3085289001464844, 0.107757568359375, 7.467445373535156, -6.04473876953125, -0.4957122802734375, -3.9705352783203125, 8.822647094726562, 12.939476013183594, 6.282920837402344, 21.382156372070312, 0.8447456359863281, 0.22187423706054688, 7.0715179443359375, 8.812156677246094, 3.5192089080810547, 5.7970733642578125, 5.192863464355469, -2.7692031860351562, 12.283226013183594, 5.160648345947266, 19.373565673828125, 7.249481201171875, -0.635162353515625, 12.758125305175781, -0.8791160583496094, 7.177734375, -10.19561767578125, 1.0308761596679688, 13.001781463623047, 8.824291229248047, -0.19983673095703125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000645.npy"}
{"epoch": 0.9750566893424036, "step": 646, "batch_size": 64, "mean": 5.40712833404541, "std": 7.127107620239258, "min": -12.844930648803711, "p10": -2.217995452880859, "median": 4.297357559204102, "p90": 14.459144210815431, "max": 25.113555908203125, "pos_frac": 0.8125, "sample": [0.6849517822265625, 0.2411975860595703, 4.059806823730469, 6.0971527099609375, 13.930755615234375, 9.717147827148438, -4.243202209472656, 12.468231201171875, 9.213987350463867, 9.466316223144531, -1.190338134765625, 2.90509033203125, 9.003997802734375, 17.798080444335938, 7.125335693359375, 0.4104881286621094, 6.638603210449219, 0.370208740234375, 17.826374053955078, 10.83534049987793, 12.663116455078125, 6.5699462890625, -7.790067672729492, -1.9478530883789062, 1.5610313415527344, 10.26190185546875, 3.3309364318847656, -1.2406158447265625, 1.2511215209960938, 2.7008285522460938, 5.0419769287109375, 3.297962188720703, 9.034088134765625, 14.685596466064453, 9.844268798828125, 0.6737098693847656, 2.8660850524902344, -2.333770751953125, 3.620279312133789, 9.239500045776367, 0.3040599822998047, -2.956920623779297, 7.141357421875, 1.6347503662109375, 9.845474243164062, 25.113555908203125, 8.250274658203125, 16.16448974609375, 2.550802230834961, 22.1773681640625, 12.176130294799805, -0.914398193359375, 6.399261474609375, 6.2321929931640625, -12.844930648803711, 1.712677001953125, 12.08465576171875, -5.754602432250977, 0.2585258483886719, 17.96015167236328, -3.739917755126953, -0.3713836669921875, 4.534908294677734, 1.4081611633300781], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000646.npy"}
{"epoch": 0.9765684051398337, "step": 647, "batch_size": 64, "mean": 6.402267932891846, "std": 7.592617988586426, "min": -10.160423278808594, "p10": -1.5046566009521476, "median": 5.929035186767578, "p90": 17.26319065093995, "max": 24.010650634765625, "pos_frac": 0.828125, "sample": [0.5396461486816406, 6.501808166503906, 10.720571517944336, 5.427238464355469, 8.937271118164062, 5.89801025390625, 0.202789306640625, 22.615013122558594, 1.128173828125, 18.3751220703125, 5.960060119628906, 5.385650634765625, 7.20025634765625, -0.2532005310058594, 2.2510910034179688, 6.45672607421875, 20.103469848632812, 0.8366107940673828, -4.291538238525391, -8.92083740234375, 12.367471694946289, 8.189682006835938, 12.848541259765625, 1.7508621215820312, 23.484901428222656, 13.461563110351562, 12.671703338623047, -1.8341102600097656, 12.603374481201172, 2.1914520263671875, 6.286144256591797, -0.735931396484375, 11.261825561523438, 0.693603515625, 6.630649566650391, -3.0433425903320312, 2.906759262084961, 14.668684005737305, 1.3544464111328125, -2.456329345703125, 8.945032119750977, 6.269615173339844, 3.2842369079589844, 10.059249877929688, -10.160423278808594, 2.870281219482422, 3.3150177001953125, 8.311840057373047, 2.3733749389648438, 4.187540054321289, -0.0880889892578125, 20.267372131347656, 5.571502685546875, -6.255340576171875, -0.3106231689453125, 13.745094299316406, 1.32965087890625, 24.010650634765625, 6.490455627441406, 11.166748046875, 22.863327026367188, 0.7786331176757812, 8.999135971069336, 11.344970703125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000647.npy"}
{"epoch": 0.9780801209372638, "step": 648, "batch_size": 64, "mean": 5.271755218505859, "std": 7.294980049133301, "min": -14.79880142211914, "p10": -2.4037025451660154, "median": 4.680132865905762, "p90": 15.333749389648439, "max": 21.727005004882812, "pos_frac": 0.75, "sample": [10.953346252441406, 10.759666442871094, 6.510826110839844, -14.79880142211914, 17.598922729492188, -0.7816314697265625, -10.3961181640625, 5.915924072265625, 6.6291351318359375, 5.349296569824219, 0.9897956848144531, 6.604011535644531, 3.7384262084960938, -0.6228408813476562, -2.8495254516601562, -6.58894157409668, 9.172019958496094, -0.01761627197265625, 21.727005004882812, 10.928436279296875, 19.975109100341797, 2.00079345703125, 10.477386474609375, 15.091522216796875, 12.191116333007812, 2.116809844970703, 2.0483245849609375, 15.966606140136719, 4.220458984375, 17.507408142089844, 2.8524322509765625, -0.4701728820800781, 15.43756103515625, 13.96600341796875, 1.1923847198486328, -4.583656311035156, 2.8090667724609375, 4.547519683837891, 6.008750915527344, 13.429672241210938, 6.608177185058594, 9.225509643554688, 17.189300537109375, 0.020917892456054688, -2.4827194213867188, -0.822845458984375, 8.370712280273438, 10.169660568237305, 3.108062744140625, 7.789318084716797, 7.192268371582031, 8.81393051147461, 10.089256286621094, -9.158233642578125, -2.219329833984375, 3.6144866943359375, 11.136260986328125, -0.6343841552734375, 1.8455963134765625, -0.9095573425292969, 2.673004150390625, -1.1937618255615234, 4.6473236083984375, 4.712942123413086], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000648.npy"}
{"epoch": 0.9795918367346939, "step": 649, "batch_size": 64, "mean": 5.26740837097168, "std": 7.266210079193115, "min": -10.491653442382812, "p10": -4.732758712768555, "median": 5.863924980163574, "p90": 15.372026252746585, "max": 20.923751831054688, "pos_frac": 0.765625, "sample": [16.124343872070312, 10.108522415161133, -1.5420074462890625, 7.277435302734375, 9.584489822387695, 10.963081359863281, 8.796745300292969, 5.775716781616211, 1.8482837677001953, 14.716915130615234, 2.1917457580566406, 16.220840454101562, 5.9521331787109375, -6.810596466064453, 9.539684295654297, 5.243034362792969, 4.386013031005859, 9.838153839111328, 6.796266555786133, 6.9405975341796875, 18.3708553314209, -1.6904144287109375, -2.3821868896484375, -7.458930969238281, -10.491653442382812, 11.356067657470703, 9.93603515625, 1.2773113250732422, -4.663761138916016, -4.7623291015625, 10.21298599243164, 1.0839614868164062, 12.815654754638672, 1.5236186981201172, 8.232372283935547, 5.27777099609375, 6.508510589599609, 9.397159576416016, 14.209789276123047, 6.498844146728516, 2.8971786499023438, 17.869495391845703, 7.5879974365234375, -4.267993927001953, 0.6808319091796875, 15.652788162231445, 1.0699100494384766, 1.9435653686523438, 4.375144958496094, 1.0059642791748047, -6.245689392089844, 16.724533081054688, -5.426578521728516, 3.9702606201171875, 20.923751831054688, -3.6421127319335938, -6.068454742431641, -2.0527801513671875, 8.890853881835938, 11.019466400146484, 12.848709106445312, 9.325065612792969, 0.9282264709472656, -2.09906005859375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000649.npy"}
{"epoch": 0.981103552532124, "step": 650, "batch_size": 64, "mean": 4.830893516540527, "std": 7.800980567932129, "min": -11.277572631835938, "p10": -5.212482833862305, "median": 4.901346206665039, "p90": 15.46465301513672, "max": 21.308183670043945, "pos_frac": 0.75, "sample": [-5.233463287353516, 14.43072509765625, -6.393402099609375, 16.00023651123047, 0.5971450805664062, 15.0501708984375, 5.230152130126953, 0.1504364013671875, 3.2499847412109375, 1.2077159881591797, 7.873680114746094, 10.93008804321289, -5.1635284423828125, 5.307121276855469, 0.22153282165527344, 14.772727966308594, 15.642288208007812, 5.465951919555664, 10.213058471679688, 2.1678714752197266, 5.902240753173828, 5.530181884765625, -2.231809616088867, -3.7501602172851562, 18.126617431640625, 11.661449432373047, 21.308183670043945, 13.194820404052734, 7.816947937011719, -6.705467224121094, 0.5255584716796875, -11.277572631835938, 0.7864532470703125, 12.450489044189453, -10.35760498046875, 1.6599349975585938, 19.088842391967773, 9.023197174072266, 0.2997093200683594, -8.066978454589844, 5.425933837890625, 13.879266738891602, -6.885528564453125, 13.254478454589844, -1.97808837890625, -1.5157012939453125, 16.529708862304688, 10.642353057861328, 4.572540283203125, 0.7166957855224609, 5.927331924438477, 0.468170166015625, 6.626613616943359, 1.3151359558105469, -0.2858104705810547, 11.597879409790039, -3.539154052734375, 7.2255401611328125, 8.313316345214844, 17.908599853515625, -2.327056884765625, 2.4769859313964844, 4.289920806884766, -2.167449951171875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000650.npy"}
{"epoch": 0.982615268329554, "step": 651, "batch_size": 64, "mean": 4.82521915435791, "std": 6.858511924743652, "min": -10.47265625, "p10": -2.221345520019531, "median": 3.258840560913086, "p90": 15.08385581970215, "max": 21.972152709960938, "pos_frac": 0.71875, "sample": [1.523630142211914, 0.6717147827148438, 4.031837463378906, 1.8010711669921875, 6.802268981933594, -10.47265625, -1.0351638793945312, 4.884748458862305, -2.0844879150390625, 10.588005065917969, 1.7434921264648438, 1.485931396484375, -1.2892532348632812, -2.7774810791015625, 8.838577270507812, -7.985450744628906, -0.5915317535400391, 19.85107421875, 1.8228378295898438, 7.187084197998047, 6.449520111083984, -0.18409347534179688, -0.7481460571289062, 2.353240966796875, 2.4700241088867188, 1.000284194946289, 9.10858154296875, 10.869552612304688, -1.2079010009765625, 15.207427978515625, 6.218482971191406, 11.850387573242188, 5.717437744140625, -0.2233123779296875, 7.3907318115234375, 1.0261955261230469, 9.535015106201172, 6.965888977050781, -2.5183944702148438, 5.161785125732422, 16.8968505859375, 6.16131591796875, 14.795520782470703, 2.2504615783691406, -1.04229736328125, 21.42681884765625, 21.972152709960938, 16.34059715270996, -1.7725143432617188, 12.925540924072266, 2.4858436584472656, -2.279998779296875, -0.13854408264160156, 8.696243286132812, 6.9045562744140625, 5.553413391113281, 10.423870086669922, 4.857746124267578, 9.038230895996094, -5.696807861328125, 2.1655349731445312, -2.397186279296875, 17.208229064941406, 0.5994720458984375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000651.npy"}
{"epoch": 0.9841269841269841, "step": 652, "batch_size": 64, "mean": 4.821639060974121, "std": 6.815805435180664, "min": -7.7347259521484375, "p10": -3.30331916809082, "median": 4.224153518676758, "p90": 14.276937103271484, "max": 18.665679931640625, "pos_frac": 0.71875, "sample": [-0.7069244384765625, 13.124685287475586, 15.75101089477539, 6.274757385253906, 3.2314453125, 14.299552917480469, 11.386449813842773, 7.05487060546875, 18.563106536865234, 2.2289047241210938, 10.291915893554688, -7.604579925537109, 0.16270065307617188, 5.922021865844727, 5.853609085083008, 6.7979278564453125, 2.2530899047851562, 11.729736328125, 6.4793853759765625, -1.287139892578125, -3.0922088623046875, 7.67724609375, 6.367462158203125, -0.6717700958251953, 2.2905540466308594, 0.3820037841796875, -6.832183837890625, -1.8385772705078125, 14.224166870117188, 6.802757263183594, 5.336887359619141, -1.5367202758789062, 3.8510093688964844, 4.597297668457031, 11.586051940917969, -0.17420196533203125, 18.665679931640625, 1.8022270202636719, -2.608530044555664, -7.7347259521484375, 0.7094535827636719, 17.264209747314453, 2.8019676208496094, 5.759639739990234, -0.6093597412109375, 0.2880668640136719, 13.453475952148438, 8.291305541992188, -0.08859443664550781, 12.092231750488281, 16.818359375, 0.48551177978515625, 9.628829956054688, 3.013660430908203, -3.3937950134277344, 8.857208251953125, -4.373291015625, 2.6509933471679688, -2.1270370483398438, 18.131486892700195, 4.8721771240234375, -3.4749679565429688, -4.753791809082031, 11.386199951171875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000652.npy"}
{"epoch": 0.9856386999244142, "step": 653, "batch_size": 64, "mean": 5.046598434448242, "std": 7.4187211990356445, "min": -13.261566162109375, "p10": -2.0655433654785154, "median": 3.940244674682617, "p90": 16.621514892578126, "max": 23.961944580078125, "pos_frac": 0.75, "sample": [10.70448112487793, 16.95825958251953, -1.108673095703125, 1.1256637573242188, 4.17620849609375, 3.918498992919922, 7.531440734863281, 2.8801651000976562, 21.06298828125, 0.6129302978515625, -2.1734542846679688, 1.0293350219726562, 15.140708923339844, 1.4342193603515625, 18.938690185546875, 11.929664611816406, 7.105991363525391, 17.292190551757812, 2.310504913330078, 15.835777282714844, 12.666603088378906, 17.057449340820312, 1.2499637603759766, -0.1242828369140625, 1.8026561737060547, -0.9762477874755859, 9.193092346191406, 0.4503822326660156, -0.1974468231201172, 0.7124862670898438, 23.961944580078125, -3.7837677001953125, -1.6157989501953125, 2.494251251220703, 7.724992752075195, -5.136116027832031, 6.799663543701172, 4.871944427490234, 4.467742919921875, 0.6953544616699219, 1.340301513671875, -13.261566162109375, 5.51666259765625, 4.6502227783203125, 10.731582641601562, -10.405078887939453, 15.273805618286133, -1.813751220703125, 18.049057006835938, 3.9619903564453125, 2.0862884521484375, 11.72540283203125, -5.8643341064453125, -0.159210205078125, -0.9925212860107422, -0.09372138977050781, 9.237129211425781, 6.436332702636719, 2.0452651977539062, 8.068923950195312, 5.169811248779297, 6.402986526489258, -2.38958740234375, 8.245880126953125], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000653.npy"}
{"epoch": 0.9871504157218443, "step": 654, "batch_size": 64, "mean": 5.069292068481445, "std": 7.4383344650268555, "min": -11.393264770507812, "p10": -2.9344470977783197, "median": 3.3801965713500977, "p90": 15.553734016418462, "max": 27.0166015625, "pos_frac": 0.796875, "sample": [2.7374801635742188, 13.3048095703125, -7.226554870605469, 5.741632461547852, 1.9468536376953125, -5.9329681396484375, 11.719001770019531, 3.4495105743408203, 5.399562835693359, 0.27365875244140625, 2.8191680908203125, -3.9533462524414062, 15.955451965332031, 2.8667984008789062, -0.647369384765625, 3.9939327239990234, 5.071941375732422, -3.8322830200195312, 14.616392135620117, 0.8001785278320312, 3.310882568359375, 3.836437225341797, 12.453414916992188, 24.158287048339844, 0.6605949401855469, 2.7890853881835938, -0.47702980041503906, -0.8621292114257812, 0.051849365234375, 4.217494964599609, 1.401742935180664, 7.79522705078125, 27.0166015625, 2.8569908142089844, 6.834770202636719, 10.608573913574219, 6.908241271972656, -3.204986572265625, 1.4013214111328125, -0.9398651123046875, 10.72011947631836, -11.393264770507812, 6.8231964111328125, -3.6346492767333984, 2.95819091796875, -0.8088531494140625, 20.057159423828125, 18.268512725830078, 3.063549041748047, 4.289894104003906, 2.2806243896484375, 21.133041381835938, 2.361246109008789, 5.1686248779296875, 20.623931884765625, -2.3031883239746094, 5.790069580078125, 3.833171844482422, 0.3306732177734375, 11.640769958496094, 7.322765350341797, 2.8408355712890625, 5.0830841064453125, 8.063812255859375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000654.npy"}
{"epoch": 0.9886621315192744, "step": 655, "batch_size": 64, "mean": 5.493218421936035, "std": 6.022883415222168, "min": -5.612110137939453, "p10": -1.9300743103027336, "median": 4.483430862426758, "p90": 12.31313819885254, "max": 28.449859619140625, "pos_frac": 0.84375, "sample": [9.098005294799805, 2.0953292846679688, -0.0105743408203125, 3.4639511108398438, 2.317626953125, 13.871620178222656, 10.397964477539062, 7.998386383056641, 2.546173095703125, 2.0824737548828125, 0.7872390747070312, 4.819053649902344, 3.8823318481445312, 0.49956512451171875, -3.622161865234375, -2.404327392578125, 5.017473220825195, 9.559249877929688, 18.28351593017578, 1.6298866271972656, 14.65786361694336, -2.234375, 11.913158416748047, 7.677001953125, 4.222858428955078, -0.5819549560546875, 3.4748306274414062, -5.612110137939453, 13.66107177734375, 1.7464828491210938, 9.150699615478516, 3.9540252685546875, 10.081031799316406, 1.8816757202148438, 9.404510498046875, 1.663543701171875, 3.232156753540039, 10.323539733886719, 6.696502685546875, 28.449859619140625, 1.4827423095703125, -2.8574676513671875, -1.2200393676757812, 12.48455810546875, 4.966320037841797, -4.9117431640625, 4.7440032958984375, 9.344985961914062, 14.844057083129883, 10.67852783203125, 1.1302261352539062, 3.193817138671875, 8.338623046875, 3.313840866088867, 3.0817108154296875, 8.139850616455078, 11.626567840576172, 4.9621124267578125, 9.731277465820312, 0.93798828125, -4.144421577453613, 9.034856796264648, 7.2841949462890625, 9.304206848144531], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000655.npy"}
{"epoch": 0.9901738473167044, "step": 656, "batch_size": 64, "mean": 6.14048957824707, "std": 7.17099142074585, "min": -8.750980377197266, "p10": -2.518041610717773, "median": 4.339771270751953, "p90": 16.625474548339845, "max": 24.468177795410156, "pos_frac": 0.84375, "sample": [17.093379974365234, 2.2097854614257812, 16.421279907226562, 7.481498718261719, 1.6970710754394531, 0.18849945068359375, 3.1733055114746094, 1.1874542236328125, 5.7027587890625, -1.4080581665039062, 2.3995742797851562, 16.66020965576172, 2.101236343383789, -1.2618179321289062, 1.9488029479980469, 13.304519653320312, -2.6396636962890625, -2.6566085815429688, 6.325595855712891, 11.50489616394043, 24.468177795410156, 6.228740692138672, -8.750980377197266, 8.364124298095703, 13.351318359375, -2.2342567443847656, 3.6753158569335938, 11.732406616210938, -3.2441864013671875, 1.9162788391113281, 22.148895263671875, 0.29457855224609375, 3.2581348419189453, 4.402000427246094, 4.2775421142578125, 15.044570922851562, 8.093109130859375, 4.190290451049805, 16.54442596435547, 8.851409912109375, 1.6944580078125, -3.0677413940429688, 7.054176330566406, 15.358543395996094, 4.989814758300781, 10.025012969970703, 2.4403743743896484, 12.772497177124023, 1.4448814392089844, 9.311988830566406, 16.9913330078125, -2.823040008544922, 6.545989990234375, 21.265338897705078, 8.662338256835938, 3.379150390625, 0.11352920532226562, -4.0280914306640625, 0.7371063232421875, 2.178997039794922, 0.38362884521484375, 4.8416748046875, 10.633171081542969, 18.04058074951172], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000656.npy"}
{"epoch": 0.9916855631141346, "step": 657, "batch_size": 64, "mean": 6.249157428741455, "std": 7.651082992553711, "min": -14.215686798095703, "p10": -1.5979019165039057, "median": 6.86884880065918, "p90": 16.501566123962405, "max": 22.427413940429688, "pos_frac": 0.796875, "sample": [16.68585205078125, 9.073158264160156, 0.04869842529296875, 12.77874755859375, 7.589485168457031, 4.238386154174805, 19.074005126953125, 18.364654541015625, 8.519142150878906, -14.215686798095703, -1.8019638061523438, -1.1217575073242188, 8.569698333740234, 0.9327888488769531, 15.59454345703125, 3.7583274841308594, 10.079116821289062, 20.90165138244629, -4.887092590332031, 11.809097290039062, -0.596527099609375, 8.73187255859375, -6.469367980957031, 1.8914566040039062, 7.723285675048828, 16.071565628051758, 3.156951904296875, 3.3555374145507812, -0.7540473937988281, 4.05938720703125, 9.060821533203125, 10.87164306640625, 9.651588439941406, 13.89892578125, 9.403072357177734, 8.030021667480469, 6.971691131591797, 19.726581573486328, 14.491043090820312, 2.2421646118164062, 10.882171630859375, 2.75384521484375, 0.0003223419189453125, 13.330978393554688, -1.0784530639648438, -0.2373046875, 22.427413940429688, 0.5755290985107422, -2.5889892578125, 4.488182067871094, 2.7515640258789062, -9.774887084960938, 11.089385986328125, 0.24834251403808594, 1.8543891906738281, 1.0092620849609375, -4.128715515136719, 6.7660064697265625, 9.556209564208984, 19.635986328125, -0.5716419219970703, 10.703384399414062, 0.8733673095703125, 11.871212005615234], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000657.npy"}
{"epoch": 0.9931972789115646, "step": 658, "batch_size": 64, "mean": 4.898590564727783, "std": 8.249150276184082, "min": -12.829475402832031, "p10": -3.966641235351561, "median": 2.7432479858398438, "p90": 16.012803649902345, "max": 22.063491821289062, "pos_frac": 0.71875, "sample": [6.464092254638672, 6.785053253173828, 10.508552551269531, -7.780914306640625, 2.1893310546875, 1.374898910522461, -9.679779052734375, 0.676422119140625, 7.136358261108398, 11.769073486328125, 1.4575576782226562, 3.0382232666015625, -11.540285110473633, 14.967830657958984, 8.650375366210938, 1.7518692016601562, -0.2329254150390625, 6.298309326171875, 21.00927734375, -0.5157012939453125, -1.571615219116211, 2.448272705078125, -1.9629745483398438, 10.4530029296875, 17.072784423828125, 13.9715576171875, -5.1448974609375, 8.735239028930664, 10.144378662109375, 2.0009803771972656, 0.26090049743652344, -0.7254467010498047, 4.710807800292969, 13.479934692382812, -0.2832298278808594, 10.650833129882812, 14.664703369140625, -4.558502197265625, 1.3193016052246094, 5.262237548828125, 21.828468322753906, 1.6143169403076172, 6.376220703125, 15.264450073242188, -0.9589195251464844, 9.354391098022461, 0.0578460693359375, 16.333526611328125, 18.95606231689453, 7.626644134521484, 2.017578125, -1.8483753204345703, -2.1219749450683594, -10.08867073059082, 4.3601226806640625, 8.233551025390625, -1.9464607238769531, -12.829475402832031, 22.063491821289062, -2.58563232421875, 20.583648681640625, 1.3750972747802734, 14.106773376464844, 0.4812469482421875], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000658.npy"}
{"epoch": 0.9947089947089947, "step": 659, "batch_size": 64, "mean": 4.264475345611572, "std": 7.417530536651611, "min": -13.96307373046875, "p10": -4.222239685058593, "median": 3.947751045227051, "p90": 14.243280029296878, "max": 21.774993896484375, "pos_frac": 0.703125, "sample": [2.8553924560546875, 10.234317779541016, 16.44123077392578, 8.42109489440918, 0.5128898620605469, 15.04315185546875, 3.5228538513183594, 9.485044479370117, 3.928955078125, 20.599445343017578, 12.561248779296875, -3.8840103149414062, 1.8679656982421875, -0.8528404235839844, 1.2465629577636719, 4.34136962890625, 8.190284729003906, -8.850860595703125, 3.9665470123291016, -5.516654968261719, 13.622406005859375, -1.1959762573242188, -0.16686248779296875, 6.269233703613281, 1.2589912414550781, -1.4898490905761719, 2.788860321044922, 4.484516143798828, -0.679840087890625, 6.545074462890625, 4.2752838134765625, 2.3601303100585938, 11.116447448730469, 4.54815673828125, 3.80816650390625, 0.7919235229492188, 17.225372314453125, 9.516212463378906, 20.39411163330078, 4.873630523681641, 14.509368896484375, -3.1098785400390625, -3.4140052795410156, 9.738086700439453, -1.5558910369873047, 0.5614204406738281, 7.260440826416016, 9.949302673339844, 6.4659423828125, -4.367195129394531, 2.6871795654296875, 7.2077531814575195, -11.997318267822266, -5.679283142089844, 12.755905151367188, 4.0180206298828125, -0.31522369384765625, -13.96307373046875, 21.774993896484375, 4.69744873046875, -3.3466224670410156, 9.144460678100586, -0.1517333984375, -4.403656005859375], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000659.npy"}
{"epoch": 0.9962207105064248, "step": 660, "batch_size": 64, "mean": 5.726513862609863, "std": 6.551272869110107, "min": -7.2996826171875, "p10": -1.6459732055664062, "median": 4.655551910400391, "p90": 15.127202987670902, "max": 22.300559997558594, "pos_frac": 0.8125, "sample": [17.067398071289062, 2.7233657836914062, 2.947111129760742, -0.4963951110839844, 7.609312057495117, -2.2577133178710938, 2.21099853515625, 9.959640502929688, -4.725425720214844, -7.2996826171875, 14.22946548461914, 22.300559997558594, 4.223133087158203, 5.965717315673828, 5.031396865844727, 17.831298828125, 13.25714111328125, 6.328439712524414, 3.501638412475586, -3.0305709838867188, 4.181053161621094, -5.762298583984375, 15.511947631835938, 0.16953659057617188, 6.5710601806640625, 17.054332733154297, -5.447536468505859, 6.8181304931640625, 6.118669509887695, -1.544342041015625, 2.4965953826904297, 10.7144775390625, 20.66845703125, 2.2926025390625, 1.2110767364501953, 0.199127197265625, 2.9418792724609375, 4.886039733886719, 9.811723709106445, -0.02642822265625, 3.9614791870117188, 6.909984588623047, 7.879432678222656, 7.651327133178711, 12.404525756835938, 9.233104705810547, -1.6895294189453125, 3.451995849609375, 9.076879501342773, 12.282485961914062, 4.434947967529297, 19.264297485351562, 2.7817764282226562, 0.08793449401855469, 13.048511505126953, 3.413299560546875, 6.38116455078125, 10.287254333496094, 3.7517471313476562, -0.15391921997070312, 2.5808982849121094, 4.876155853271484, -1.2563705444335938, 9.594589233398438], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000660.npy"}
{"epoch": 0.9977324263038548, "step": 661, "batch_size": 64, "mean": 5.0092973709106445, "std": 7.698227405548096, "min": -15.141010284423828, "p10": -4.2287246704101555, "median": 4.586338043212891, "p90": 15.424906539916993, "max": 19.286008834838867, "pos_frac": 0.765625, "sample": [2.7534103393554688, -3.177021026611328, -7.281585693359375, -4.830049514770508, 11.697406768798828, 3.5751419067382812, 15.26083755493164, -1.3871612548828125, -14.295944213867188, 6.3747711181640625, -5.100257873535156, 2.0906448364257812, 0.6631622314453125, 16.525100708007812, 9.755783081054688, 14.916107177734375, 19.286008834838867, 5.349979400634766, 17.967147827148438, 1.6108551025390625, -15.141010284423828, 4.483833312988281, 16.199871063232422, 0.4289093017578125, -1.9818611145019531, 3.1881256103515625, 2.6487979888916016, -3.8097190856933594, 1.110666275024414, 3.3950729370117188, 8.246334075927734, 7.902198791503906, -7.312335968017578, 15.828559875488281, 15.4830322265625, 1.6680984497070312, 6.864524841308594, 15.28927993774414, 1.5975532531738281, 12.183246612548828, 11.753494262695312, 2.469982147216797, 0.6992645263671875, 14.339157104492188, 11.991851806640625, 5.567230224609375, 13.4241943359375, 4.707998275756836, 2.158018112182617, 2.7787952423095703, -4.408298492431641, 10.112838745117188, -3.373199462890625, -1.44683837890625, 7.222324371337891, 12.666976928710938, 9.592498779296875, 7.5318603515625, 4.975799560546875, 16.856582641601562, 4.6888427734375, -1.6244964599609375, -1.4221000671386719, 9.304733276367188], "npy": "/workspace/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851/margin_logs/step_0000661.npy"}

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:22dddd9b4bf59a58ac9754704862dc0b60abca7f1f9941029f73d8f470387557
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9c4a0d8c26a315903fc2506660d8ac2eb82c1e4d9a761e6a7de89830e1a119f6
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:bb10fba5205f98ffdc6f4cb4581d1a09592740d723d6951913a884dc24c06a40
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f4dd5a99e209e57527be4bf7c0e97e8f9808b921eaf683750b55146d08fef381
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:37f91720bb3f1a2391978fd4acf0d9b4844326360233f9d001127ff33fc51bd8
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1501c3310a580187e8ad53d36f115cdae7a1b8aad03417201926dbb34e4b1055
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:893dca05b7210f5b6a2721e52b3cf8afd756a4617a0929adc4b1049e1fced544
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ed1c9b8ff4a807ca700c494b93fd9c6f3021124780bd0fc13e3aa7cbefe71574
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a993b388847d565b9b02a3a8c36c7390f3043dc41aee061e31fa1f6375bbd342
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6ccf6fc3e827cc5bcbc467e7d35639966bbf71a8f8b3e007089c81d8e3954fc4
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8b2a829fa2b1710a73a4d7009dfbba2d7a682ad86d70d13c0c6cd5927982500f
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c51424340f6d52f7323cf9ee68d60cc13e9fb8c961b961cae520cf978a3b5648
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:80349720aec09311ff08daaab4707bad676243cf29910e3c09e954b437258343
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:18f75a70e4084ce2549c4628daaca5733635c18a766855f4d8f8a26166f65e3c
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a363c80b3b2ba0508ef1498a513d45ca042d2402dd23c8678420b3240b291345
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:51a589c84c2da8c70d84b321cbafc460d72453000e961dea1e722a5c8266e4eb
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0447227bacba5594d26b5a996e85d2986e1d4819a59a8ad4a53ffb2173aebff3
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:959e9b1a2524e22a8f58869d23b48719343ae29b5d03f0192405ecb5465d2890
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:59df341e755c4da83299bc74c23ec3d9879480642278c3ec03a5c28621f3fa54
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:605563de47bdf55deb66bc72729c8bc78908e9709d000df7f78c217d0f0c6fb2
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6718a2668317d3a309e1af136190cd33d9aed2fdbfa3c676367e90543b0c1b8d
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0b60f0a87fd88ca40915f5b060d738c41d89c9a72f5162bd70dc1b14d45cf45c
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9c1ab0d42f9247443eba518df2a652940e03b4b9816fe6a93a0b0dcaad57f925
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:259e229facc89c586741272d3e0e2817c14a2a3b4d1d28dffcfa04755de8ace7
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a23a86ae0684eefde491e28f37c63481c4403d13007678112a844f33ac34bd1f
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:94ad2e440f69481639085d944f2bcfc0dea5c04fe507cd5487f5ea0fa0528ad7
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d89a2eb615c332a8fa0590c5bbe7440eb3f1ce34ad7e0916602865fb5c24fe84
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0ca46a6527d505cfc77dc191ae2a6bf5e923d46509ee9b2a17cd7d483d260d08
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0eee1bcd8893691ba24e2e4847452763221c036fa43ad4025defe74210408bd2
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8adfddae72bc21a8615ab4d370d954a653b28370112887e482ccf16b40894ef8
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:78aa881b1bc5878a3cdb4faeb156f19ea8de305e26a1494131ef3d175c57e8c1
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:63c4e15a571d09947bc9f1978741468ad36fceb62cb243759196d203c5eca6e9
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:52ed5a36745b1d892c62d00f579f46c6aea439d5a03b3cf9cb3fe77b50cb7004
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6a4e2c9c237d936ce626a308535daee1028974bf4be465c6fa478e914a26018b
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:745f96068276f4cfefb3b20a69d62097749e974c2dfb5e334d83bd5909300ff5
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1d752d8c637e2d75da6559f01086f50373b91811e19c700819c28551fe824331
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0ffbc6639e4a9d0e2d37613d6fa66e87d62c53b3005d382cc488cb4c1f3c8f4e
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6c92995f1ff765afcc14c981a6072575ebbf0a26f3925b67a9da272b1164e8ed
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e12be23eaaf21db95ebd5855b58d2eceda5b05ec52994a0342ff5764e05549ae
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:234ada16d647a084e742ba2570ceaac23880773e4821f8adf65b57261d2ec114
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:54ad7d5968f4d09d31dacbdbbdb7a60f8f44f4b66fe02917531dbe99bb0f7681
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:31dde69c427997a802949d09f651ef50866e565d28815c09f6fba19449e53503
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:471c824562929a59a3f8d109cf5064cb6a142f0d8f4817baf82bce5e1d13f4b3
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:287facb2672c5ea55e91afa273ccc31ce2f2fe7fcaff04d7e7b46ec0c6caa2b1
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:379385db61d7816b434e40dd774e63eef644b78eb3be8d18b00d32d0f0fda9f3
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4c8c49884bb167a348abdbdc08eca90460ae5f93a519b49713ad2e29d023f384
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b008ddaee14be08996942bb83bbf8f6102502e659d9c7be5eb670f7cce243f15
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a0d5aee066326131a10cccf35ce1f26b12a5bddc07c5dfc349591926724c8a11
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8e5dad2b9dce972885f389b265f78ef3283eba21c779a59af02cdb4815947418
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:dd54b7f76da0e18c1c0029aa21515f98934da6ab65798bbb33b8954d12851c4b
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:58f7e0e41e55e32ae533003c2473413d73e876c69a76e62e265ab6d406db5a8e
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d23fdb4c2099db7b7db7135ab9e2a82b9a2f67425ef2ce57c3a3889d0ca9994c
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:3a7618ceb092ceb03f2bd6d3e420409cec2c0092f43cd980747c845eda336fe7
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ca0a3a189f4360a23a9e4c18178a9429371f0ac4be6ffac32bad89ff2bf2dda5
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:dcc26e7c2e7e7d67ccc1ea7947be2dab32e7d019895865462acdd69e12948194
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:353ab894da73ec8f5c882fa61a18379249ec9cfee0808673db7eeb2147cb8563
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:70563ff1a059f3b4d0abc492c372a21e3e051117cb037589a7a986d569d9d8f8
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5ef3c3ef8dcff14928bef86ffa154e72a4bd0ca4c25b47d6f2f2dd5164ca694c
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:eec62f9d65655dae212580dff244077aa8368ddcd9e00934a0e38f1fd0009e20
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f987d6c8fb89d0273880ab25cb46cdd4475c194f2593c05e0f01d516a0b2ebf5
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f6c6c42986b4659464cd178f929969e90b877300f18226b77411b304cdc05238
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:80799817e6b7fdd8d48fd792e2431108a04fb7819e4c6e935d85eb9c04604b61
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:22920cbd113853f6154ee0065e90b1b8dde1872842b1c13f08e08d4105c6bf2e
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c8405f802f8e6e8b40842d83f7040304cdde4586f7ef949575d41bf8f53b215f
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8153311448f3ed711796816352d750dbd97c6e5768d218cfcabe7ccbc27881b7
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ae7711a51c20d2f5d19d02be59a1228277c1fb1d45bdfbfe8f7b3a268706b77d
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:bd783c85ca0cd98954767e95ddf442f22a56334368dc6597ab396c66d48c9335
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ec39a11830beb705851b60b52a356fd6be42e1941ec987f0d2053149e4c717fa
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0a3e7a1f3c0331b672ae2088b0ee4c5af388306071e5664ff58b7271be9fafb8
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6b26f7e96e14c443ac91219fcd6be9bd475bbaf8217c3f84d9a5f9ed7679c07f
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0870794c5678701a7497743b2d826df6fe8230a486f71b576e6e9eebf10ce9d6
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:500a4df2c8dbe8321c0fbca84afb2ff5af77c2b3d4fad9376cc7d0caa025d893
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9160b05a9de3061749d5d4b4596ff12b61dd5f646644093e4fa40b7c02b34e2b
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:42e6e6c0b308a0420464ade56b53aed3c0bb1eb8ed07728bc5c06fd01ebeede6
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f81bf296a11411830b00d89fde71ec32a93a609fda51cb503fb66d359bb6e055
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:05d51008610877c0b135d7389cd5b9a029d51e3f58122f2e383da67352561da8
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:65f2858808c42c33904991f7ef73eafaa73a613cb8a89e1976c086ff615d03a8
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d85736bbe3370f7dc5d3b5920bfad91b8432ffdc64dabdd8679aec5e8a7e360f
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:41f82fa3b96b3c0ffaa18cf4e760bd566b77b321d5cb912eb9ec3ae8797fb563
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1dc6da215da84a256ce2310517aeb2d83faec594befa7c9c72da16f75bde9509
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d34819a39c3b0377ae3c0a611c7299665e9bedb0766532dffee046ffe12358bc
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:aedc486d8ef826772c3542ce8f8ee5428c0212aaeefd2bc6cd0fa3375dfe200f
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8e4b4883f9f94946022d229809504fbdfa3281b807c29933666fefe4e4144807
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8b5aaf559adaaa201d345b994a410a41b0403559ee1beae47b12fd704351fe91
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d7ad74cb0a30b8ea40ce53d5baa8970b16e526e3b51beae1514209897e32b1bd
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5cce0c2efdbfc646d6bdb6572e9b005725c16cbd2d974ac8f4df79bdc5d23f31
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:730993d67fd3a4d6306cf338689406d791cb789458979d6872d82b6b007409f1
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:411b57061f8ba8ad4bf83e03c0149b494e56becd8172fc36f1f6c12485a3e9b3
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d3aeda3ec066c9c77aa7559aaf091d2648c5ba446745e8e019c3b056b78c0a63
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1110fa80880f98c2bcedf0d77b5140e6e002dd2b78c24f440bb8e71bf1e14aa3
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1b152b82d51119afea00424c30d40206940bc5983fc7784935a73a0c54155613
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e200390ba34225af519c6639872cc8aefa31108823110d18c54f125a3665cd32
size 384

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f47ba6574ea257be45bc733dba81fc7c2615b9834fe93f541a130715a2cefe19
size 384

Some files were not shown because too many files have changed in this diff
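
Each three-line entry above is a Git LFS pointer (a `version` line, an `oid sha256:` line, and a `size` line in bytes) rather than the binary file itself. As a minimal sketch, assuming the pointer text has already been read into a string, the fields could be extracted as follows. The helper name `parse_lfs_pointer` is hypothetical and not part of this repository or of any Git LFS tooling.

```python
import re


def parse_lfs_pointer(text: str) -> dict:
    """Parse the key/value lines of a Git LFS pointer file (hypothetical helper)."""
    fields = {}
    for line in text.strip().splitlines():
        # Each pointer line is "<key> <value>", separated by the first space.
        key, _, value = line.partition(" ")
        fields[key] = value

    # The oid line has the form "sha256:<64 lowercase hex digits>".
    algo, _, digest = fields["oid"].partition(":")
    if algo != "sha256" or not re.fullmatch(r"[0-9a-f]{64}", digest):
        raise ValueError("unexpected oid format")

    return {
        "version": fields["version"],
        "oid": digest,
        "size": int(fields["size"]),  # size of the real object in bytes
    }


# The first pointer shown in this diff, copied verbatim.
pointer_text = """version https://git-lfs.github.com/spec/v1
oid sha256:22dddd9b4bf59a58ac9754704862dc0b60abca7f1f9941029f73d8f470387557
size 384
"""

info = parse_lfs_pointer(pointer_text)
print(info["oid"][:12], info["size"])
```

Fetching the actual objects still requires a Git LFS client; the sketch only reads the metadata that the diff displays.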