Initialize project; model provided by the ModelHub XC community

Model: jackf857/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948
Source: Original Platform
ModelHub XC
2026-05-09 21:17:36 +08:00
commit ed60227d3e
24 changed files with 165072 additions and 0 deletions

36
.gitattributes vendored Normal file

@@ -0,0 +1,36 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
tokenizer.json filter=lfs diff=lfs merge=lfs -text

80
README.md Normal file

@@ -0,0 +1,80 @@
---
library_name: transformers
base_model: jackf857/qwen3-8b-base-sft-hh-helpful-4xh200-batch-64-20260417-214452
tags:
- alignment-handbook
- margin-dpo
- generated_from_trainer
datasets:
- Anthropic/hh-rlhf
model-index:
- name: qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948
results: []
---
<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->
# qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948
This model is a fine-tuned version of [jackf857/qwen3-8b-base-sft-hh-helpful-4xh200-batch-64-20260417-214452](https://huggingface.co/jackf857/qwen3-8b-base-sft-hh-helpful-4xh200-batch-64-20260417-214452) on the Anthropic/hh-rlhf dataset.
It achieves the following results on the evaluation set (these match the last logged evaluation, at step 600, epoch 0.8811):
- Loss: 0.4195
- Margin Dpo/margin Mean: 15.8715
- Margin Dpo/margin Std: 17.0771
- Logps/chosen: -132.6461
- Logps/rejected: -139.3175
- Logps/ref Chosen: -101.8862
- Logps/ref Rejected: -92.6861
- Logits/chosen: -1.4538
- Logits/rejected: -1.1606
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 2
- total_train_batch_size: 64
- total_eval_batch_size: 32
- optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
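The total batch sizes in the list above follow from the per-device values. A minimal sketch of that arithmetic, assuming the usual convention that the effective training batch is per-device batch × devices × accumulation steps (the variable names simply mirror the list entries):

```python
# Values copied from the hyperparameter list above.
train_batch_size = 8             # per device
eval_batch_size = 8              # per device
num_devices = 4
gradient_accumulation_steps = 2

# Assumed convention: effective train batch = per-device x devices x accumulation.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps

# Evaluation does not accumulate gradients, so only the device count multiplies.
total_eval_batch_size = eval_batch_size * num_devices

print(total_train_batch_size, total_eval_batch_size)
```

Both results agree with the `total_train_batch_size` and `total_eval_batch_size` entries reported in the list.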
### Training results
| Training Loss | Epoch | Step | Validation Loss | Margin Dpo/margin Mean | Margin Dpo/margin Std | Logps/chosen | Logps/rejected | Logps/ref Chosen | Logps/ref Rejected | Logits/chosen | Logits/rejected |
|:-------------:|:------:|:----:|:---------------:|:----------------------:|:---------------------:|:------------:|:--------------:|:----------------:|:------------------:|:-------------:|:---------------:|
| 1.0426 | 0.1468 | 100 | 0.6031 | 2.7546 | 5.8844 | -105.6357 | -99.1902 | -101.8862 | -92.6861 | -0.0566 | 0.1974 |
| 0.7622 | 0.2937 | 200 | 0.4644 | 10.5567 | 12.6651 | -113.3752 | -114.7319 | -101.8862 | -92.6861 | -1.1210 | -0.8454 |
| 0.7508 | 0.4405 | 300 | 0.4348 | 13.3686 | 14.8396 | -121.8530 | -126.0214 | -101.8862 | -92.6861 | -1.2687 | -0.9791 |
| 0.4743 | 0.5874 | 400 | 0.4292 | 15.3820 | 16.7624 | -128.4094 | -134.5913 | -101.8862 | -92.6861 | -1.2117 | -0.8992 |
| 0.7107 | 0.7342 | 500 | 0.4213 | 15.8606 | 17.0950 | -131.5918 | -138.2523 | -101.8862 | -92.6861 | -1.3378 | -1.0359 |
| 0.5423 | 0.8811 | 600 | 0.4195 | 15.8715 | 17.0771 | -132.6461 | -139.3175 | -101.8862 | -92.6861 | -1.4538 | -1.1606 |
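The margin column in the table appears to be the gap between how far the policy has moved from the reference model on chosen versus rejected responses. The sketch below checks that reading against two rows of the table. It is an inference from the reported numbers, not taken from the training code, and `implied_margin` is a hypothetical helper name.

```python
def implied_margin(logp_chosen, logp_rejected, ref_chosen, ref_rejected):
    """(policy - reference) log-prob shift of chosen minus that of rejected."""
    return (logp_chosen - ref_chosen) - (logp_rejected - ref_rejected)

# Reference log-probs are constant across all rows of the table.
REF_CHOSEN, REF_REJECTED = -101.8862, -92.6861

# (step, Logps/chosen, Logps/rejected, reported margin mean)
rows = [
    (100, -105.6357, -99.1902, 2.7546),
    (600, -132.6461, -139.3175, 15.8715),
]

for step, lp_chosen, lp_rejected, reported in rows:
    margin = implied_margin(lp_chosen, lp_rejected, REF_CHOSEN, REF_REJECTED)
    print(step, round(margin, 4), reported)
```

Under this reading the margin grows mainly because the rejected log-probability falls faster than the chosen one; both drop below their reference values during training.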
### Framework versions
- Transformers 4.51.0
- Pytorch 2.3.1+cu121
- Datasets 2.21.0
- Tokenizers 0.21.4

28
added_tokens.json Normal file

@@ -0,0 +1,28 @@
{
"</think>": 151668,
"</tool_call>": 151658,
"</tool_response>": 151666,
"<think>": 151667,
"<tool_call>": 151657,
"<tool_response>": 151665,
"<|box_end|>": 151649,
"<|box_start|>": 151648,
"<|endoftext|>": 151643,
"<|file_sep|>": 151664,
"<|fim_middle|>": 151660,
"<|fim_pad|>": 151662,
"<|fim_prefix|>": 151659,
"<|fim_suffix|>": 151661,
"<|im_end|>": 151645,
"<|im_start|>": 151644,
"<|image_pad|>": 151655,
"<|object_ref_end|>": 151647,
"<|object_ref_start|>": 151646,
"<|quad_end|>": 151651,
"<|quad_start|>": 151650,
"<|repo_name|>": 151663,
"<|video_pad|>": 151656,
"<|vision_end|>": 151653,
"<|vision_pad|>": 151654,
"<|vision_start|>": 151652
}
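A quick sanity check on the mapping above, written as a sketch: it verifies that the added token IDs form one contiguous block and stay below the `vocab_size` of 151936 given in this commit's config.json. The names `added_tokens`, `is_contiguous`, and `fits_vocab` are illustrative only.

```python
# Token-to-ID mapping copied from added_tokens.json above.
added_tokens = {
    "</think>": 151668, "</tool_call>": 151658, "</tool_response>": 151666,
    "<think>": 151667, "<tool_call>": 151657, "<tool_response>": 151665,
    "<|box_end|>": 151649, "<|box_start|>": 151648, "<|endoftext|>": 151643,
    "<|file_sep|>": 151664, "<|fim_middle|>": 151660, "<|fim_pad|>": 151662,
    "<|fim_prefix|>": 151659, "<|fim_suffix|>": 151661, "<|im_end|>": 151645,
    "<|im_start|>": 151644, "<|image_pad|>": 151655, "<|object_ref_end|>": 151647,
    "<|object_ref_start|>": 151646, "<|quad_end|>": 151651, "<|quad_start|>": 151650,
    "<|repo_name|>": 151663, "<|video_pad|>": 151656, "<|vision_end|>": 151653,
    "<|vision_pad|>": 151654, "<|vision_start|>": 151652,
}
VOCAB_SIZE = 151936  # "vocab_size" from config.json in this commit

ids = sorted(added_tokens.values())
# Contiguous means every integer between the smallest and largest ID is used.
is_contiguous = ids == list(range(ids[0], ids[-1] + 1))
# Every added ID must index a valid row of the embedding matrix.
fits_vocab = ids[-1] < VOCAB_SIZE

print(len(ids), ids[0], ids[-1], is_contiguous, fits_vocab)
```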

22
all_results.json Normal file

@@ -0,0 +1,22 @@
{
"epoch": 1.0,
"eval_logits/chosen": -1.4371687173843384,
"eval_logits/rejected": -1.1416503190994263,
"eval_logps/chosen": -132.72055053710938,
"eval_logps/ref_chosen": -101.88616943359375,
"eval_logps/ref_rejected": -92.68607330322266,
"eval_logps/rejected": -139.40850830078125,
"eval_loss": 0.42009782791137695,
"eval_margin_dpo/margin_mean": 15.88807487487793,
"eval_margin_dpo/margin_std": 17.02425765991211,
"eval_runtime": 44.0516,
"eval_samples": 2339,
"eval_samples_per_second": 53.097,
"eval_steps_per_second": 1.68,
"total_flos": 0.0,
"train_loss": 0.7553340482816823,
"train_runtime": 3298.7616,
"train_samples": 43598,
"train_samples_per_second": 13.216,
"train_steps_per_second": 0.206
}
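The throughput fields above can be reproduced from the sample counts and runtimes. This is a sketch under two labeled assumptions: one optimizer step per effective batch of 64 with the partial last batch dropped, and evaluation batches of 32 with the partial last batch kept. The batch sizes come from README.md in this commit.

```python
import math

# Values copied from all_results.json above.
train_samples, train_runtime = 43598, 3298.7616
eval_samples, eval_runtime = 2339, 44.0516

total_train_batch_size = 64  # from README.md
total_eval_batch_size = 32   # from README.md

# Assumption: the incomplete final training batch does not yield a step.
train_steps = train_samples // total_train_batch_size
# Assumption: evaluation keeps the incomplete final batch.
eval_steps = math.ceil(eval_samples / total_eval_batch_size)

train_samples_per_second = train_samples / train_runtime
train_steps_per_second = train_steps / train_runtime
eval_samples_per_second = eval_samples / eval_runtime
eval_steps_per_second = eval_steps / eval_runtime

print(train_steps, eval_steps)
print(round(train_samples_per_second, 3), round(train_steps_per_second, 3))
print(round(eval_samples_per_second, 3), round(eval_steps_per_second, 2))
```

The derived step count also matches the 681 lines added to margin_logs/margins.jsonl in this commit.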

30
config.json Normal file

@@ -0,0 +1,30 @@
{
"architectures": [
"Qwen3ForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 151643,
"eos_token_id": 151643,
"head_dim": 128,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 12288,
"max_position_embeddings": 32768,
"max_window_layers": 36,
"model_type": "qwen3",
"num_attention_heads": 32,
"num_hidden_layers": 36,
"num_key_value_heads": 8,
"rms_norm_eps": 1e-06,
"rope_scaling": null,
"rope_theta": 1000000,
"sliding_window": null,
"tie_word_embeddings": false,
"torch_dtype": "float32",
"transformers_version": "4.51.0",
"use_cache": true,
"use_sliding_window": false,
"vocab_size": 151936
}
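A few quantities follow directly from the attention settings above. The sketch below derives the grouped-query attention group size and a per-token KV cache footprint. The cache formula is the standard one for decoder-only transformers (keys plus values, per layer, per KV head), and the 4 bytes per element assumes the cache is held in the `float32` dtype listed in the config; a different runtime dtype would change that figure.

```python
# Subset of config.json above that determines attention shapes.
config = {
    "head_dim": 128,
    "hidden_size": 4096,
    "num_attention_heads": 32,
    "num_hidden_layers": 36,
    "num_key_value_heads": 8,
}

# Number of query heads that share each key/value head.
gqa_group_size = config["num_attention_heads"] // config["num_key_value_heads"]

# Width of the concatenated query heads.
q_width = config["num_attention_heads"] * config["head_dim"]

# Cached elements per token: 2 (K and V) x layers x KV heads x head_dim.
kv_elems_per_token = (
    2 * config["num_hidden_layers"] * config["num_key_value_heads"] * config["head_dim"]
)

# Assumption: cache stored as float32, i.e. 4 bytes per element.
kv_bytes_per_token = kv_elems_per_token * 4

print(gqa_group_size, q_width, kv_elems_per_token, kv_bytes_per_token)
```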

16
eval_results.json Normal file

@@ -0,0 +1,16 @@
{
"epoch": 1.0,
"eval_logits/chosen": -1.4371687173843384,
"eval_logits/rejected": -1.1416503190994263,
"eval_logps/chosen": -132.72055053710938,
"eval_logps/ref_chosen": -101.88616943359375,
"eval_logps/ref_rejected": -92.68607330322266,
"eval_logps/rejected": -139.40850830078125,
"eval_loss": 0.42009782791137695,
"eval_margin_dpo/margin_mean": 15.88807487487793,
"eval_margin_dpo/margin_std": 17.02425765991211,
"eval_runtime": 44.0516,
"eval_samples": 2339,
"eval_samples_per_second": 53.097,
"eval_steps_per_second": 1.68
}

6
generation_config.json Normal file

@@ -0,0 +1,6 @@
{
"bos_token_id": 151643,
"eos_token_id": 151643,
"max_new_tokens": 2048,
"transformers_version": "4.51.0"
}

681
margin_logs/margins.jsonl Normal file

@@ -0,0 +1,681 @@
{"epoch": 0.0, "step": 1, "batch_size": 64, "mean": 0.061235010623931885, "std": 0.44581517577171326, "min": -0.9558563232421875, "p10": -0.49806652069091795, "median": 0.0538330078125, "p90": 0.6431861877441406, "max": 1.349212646484375, "pos_frac": 0.5625, "sample": [0.2952117919921875, -0.9558563232421875, 0.002197265625, 0.5723953247070312, 0.45829010009765625, 0.045989990234375, 0.07258987426757812, 0.16957473754882812, -0.0712890625, 0.067230224609375, 0.061676025390625, 0.9847030639648438, 0.25745391845703125, 0.223114013671875, 0.8358078002929688, -0.205474853515625, -0.4477996826171875, -0.6458778381347656, 1.101287841796875, 0.0405731201171875, -0.05789947509765625, 0.8459625244140625, -0.16561508178710938, -0.1964569091796875, -0.08588409423828125, 0.07699203491210938, -0.15012359619140625, 0.08970260620117188, -0.6135711669921875, -0.3250083923339844, -0.5196094512939453, -0.055461883544921875, 0.06568145751953125, -0.1916046142578125, 0.07837677001953125, -0.5969200134277344, -0.6232337951660156, -0.23907470703125, 0.10578536987304688, 1.349212646484375, 0.6396026611328125, -0.1342315673828125, -0.33837318420410156, -0.6055755615234375, 0.28924560546875, -0.053333282470703125, -0.13497543334960938, 0.040065765380859375, 0.156463623046875, 0.30047607421875, 0.18242645263671875, -0.4043922424316406, 0.10161590576171875, -0.2782421112060547, 0.6447219848632812, 0.592559814453125, 0.2892589569091797, 0.2594795227050781, 0.278594970703125, 0.7102890014648438, -0.0288848876953125, 0.5050926208496094, -0.35527801513671875, -0.3906135559082031], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000001.npy"}
{"epoch": 0.0014684287812041115, "step": 2, "batch_size": 64, "mean": -0.12442246079444885, "std": 0.41391414403915405, "min": -1.25341796875, "p10": -0.5899612426757812, "median": -0.16697025299072266, "p90": 0.40547485351562507, "max": 0.74664306640625, "pos_frac": 0.375, "sample": [-0.4356536865234375, 0.0176544189453125, -0.20431900024414062, 0.74664306640625, -0.2661247253417969, -0.1260967254638672, -0.8647918701171875, -0.17073822021484375, -0.477691650390625, -0.18256378173828125, 0.3944091796875, -0.16320228576660156, -0.3320465087890625, -0.39223480224609375, -0.4348640441894531, -0.602691650390625, -1.201324462890625, -0.4142303466796875, -0.037799835205078125, 0.09072113037109375, -0.5269927978515625, 0.05698394775390625, 0.5052413940429688, 0.13204193115234375, -0.4286689758300781, 0.03350067138671875, 0.65325927734375, 0.6349716186523438, -0.19322967529296875, 0.34609222412109375, 0.65667724609375, -0.2496337890625, -0.6125564575195312, -0.11834716796875, -0.10693359375, -0.062713623046875, -0.3054351806640625, 0.3363838195800781, 0.35395050048828125, -0.008523941040039062, -0.6783828735351562, 0.23185348510742188, -0.02996826171875, -0.4708709716796875, -1.25341796875, -0.22631072998046875, 0.0096588134765625, 0.437835693359375, 0.345306396484375, 0.17419815063476562, -0.5602569580078125, 0.17741012573242188, 0.04550933837890625, -0.2523040771484375, -0.6806221008300781, -0.28516387939453125, -0.21526336669921875, -0.24713897705078125, 0.18416213989257812, 0.41021728515625, -0.4717559814453125, -0.22457504272460938, 0.0484161376953125, -0.47069549560546875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000002.npy"}
{"epoch": 0.002936857562408223, "step": 3, "batch_size": 64, "mean": 0.03174795210361481, "std": 0.4643594026565552, "min": -1.1258697509765625, "p10": -0.595195198059082, "median": 0.040493011474609375, "p90": 0.653947448730469, "max": 1.12786865234375, "pos_frac": 0.53125, "sample": [-0.6029033660888672, -0.911865234375, 0.1763153076171875, 0.0651092529296875, 0.3923759460449219, -0.18950462341308594, 0.14054107666015625, 0.09049606323242188, 0.2709808349609375, -0.14229774475097656, 0.6973876953125, 0.5523185729980469, -1.1258697509765625, -0.20796966552734375, -0.078369140625, 0.055694580078125, -0.57720947265625, -0.6348648071289062, -0.7830657958984375, 0.2159576416015625, 0.6094284057617188, 0.8430023193359375, 0.256744384765625, -0.25201416015625, 0.06450653076171875, 0.03399658203125, -0.22730255126953125, -0.17305755615234375, 0.04698944091796875, -0.8885498046875, 0.44861602783203125, 1.0282745361328125, 0.013353347778320312, -0.178070068359375, -0.00365447998046875, 0.1596832275390625, -0.01377105712890625, -0.2791481018066406, 0.30573272705078125, 0.4403038024902344, -0.29293060302734375, 0.134063720703125, 0.3950958251953125, 0.4018821716308594, -0.258209228515625, -0.19649124145507812, 0.7665863037109375, -0.2421875, 0.27696990966796875, -0.15114593505859375, -0.3887939453125, -0.2666740417480469, 0.17680740356445312, 0.7966537475585938, -0.45803070068359375, -0.7217254638671875, -0.21836090087890625, 0.6730270385742188, -0.06058502197265625, 0.4389190673828125, -0.02663707733154297, 1.12786865234375, 0.4224395751953125, 0.06500625610351562], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000003.npy"}
{"epoch": 0.004405286343612335, "step": 4, "batch_size": 64, "mean": -0.006301596760749817, "std": 0.41835400462150574, "min": -1.1286163330078125, "p10": -0.6194938659667969, "median": 0.026378631591796875, "p90": 0.49124336242675787, "max": 0.838165283203125, "pos_frac": 0.53125, "sample": [-0.7179412841796875, 0.3267974853515625, -0.06884384155273438, -0.0682830810546875, 0.0425262451171875, 0.115203857421875, -0.5585479736328125, -1.059478759765625, 0.3492927551269531, -0.0864410400390625, 0.11034774780273438, -0.46399688720703125, 0.34256744384765625, 0.2814903259277344, -0.167999267578125, 0.345703125, 0.5978164672851562, 0.4986076354980469, 0.4197120666503906, 0.11420822143554688, -0.23513031005859375, -0.6312103271484375, 0.23905181884765625, -0.1127777099609375, 0.838165283203125, -0.78778076171875, 0.7340087890625, -0.5921554565429688, -0.4002265930175781, -0.686309814453125, 0.2817668914794922, 0.26856040954589844, -0.30342864990234375, -0.20601654052734375, -0.10233116149902344, 0.155059814453125, -1.1286163330078125, 0.52593994140625, 0.16728973388671875, -0.345458984375, 0.47406005859375, 0.04554462432861328, -0.1041412353515625, 0.01023101806640625, 0.14017486572265625, -0.10758209228515625, 0.697784423828125, -0.6721343994140625, -0.029521942138671875, 0.37738800048828125, 0.19993019104003906, 0.6581916809082031, -0.03799629211425781, -0.32936859130859375, -0.14397048950195312, -0.12257003784179688, 0.1221771240234375, 0.31018829345703125, -0.15686798095703125, 0.083038330078125, 0.000701904296875, 0.23792648315429688, -0.227783203125, 0.14015579223632812], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000004.npy"}
{"epoch": 0.005873715124816446, "step": 5, "batch_size": 64, "mean": 0.02907317876815796, "std": 0.46151700615882874, "min": -0.920074462890625, "p10": -0.4865249633789062, "median": 0.019622802734375, "p90": 0.5275279998779299, "max": 1.874603271484375, "pos_frac": 0.515625, "sample": [-0.3662109375, -0.07076263427734375, 0.4440422058105469, -0.3920021057128906, 0.1690673828125, -0.87158203125, -0.226470947265625, -0.042118072509765625, 0.1916046142578125, 0.22330474853515625, 0.1880035400390625, 0.479278564453125, -0.030033111572265625, -0.14899826049804688, 1.0869979858398438, 0.013652801513671875, 0.1419219970703125, -0.36943817138671875, -0.920074462890625, 0.5616302490234375, 0.2798500061035156, 0.380218505859375, 0.33712005615234375, -0.20590591430664062, 0.025592803955078125, -0.4581451416015625, -0.01917266845703125, 0.15151214599609375, 0.2228240966796875, 0.17777252197265625, -0.0841522216796875, 0.3103141784667969, 0.15856170654296875, -0.12004280090332031, 0.7521934509277344, -0.8556976318359375, -0.290557861328125, 0.6452789306640625, -0.076080322265625, -0.08795166015625, -0.12297439575195312, 0.05914306640625, -0.391326904296875, -0.508087158203125, -0.0390777587890625, -0.4341278076171875, 0.07831764221191406, 0.8653945922851562, -0.5413818359375, 0.3864402770996094, 0.5482063293457031, -0.190582275390625, -0.5164947509765625, 0.05670928955078125, -0.4333229064941406, 0.43349456787109375, 1.874603271484375, 0.1480560302734375, 0.09416961669921875, 0.22420501708984375, -0.37172698974609375, 0.04876708984375, -0.498687744140625, -0.21437835693359375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000005.npy"}
{"epoch": 0.007342143906020558, "step": 6, "batch_size": 64, "mean": -0.008213222026824951, "std": 0.4243967533111572, "min": -1.1857757568359375, "p10": -0.5290302276611328, "median": 0.06088829040527344, "p90": 0.4377971649169922, "max": 0.9782867431640625, "pos_frac": 0.53125, "sample": [0.2510528564453125, 0.1860198974609375, -0.031116485595703125, -0.37410736083984375, -0.3921165466308594, -0.5013580322265625, 0.3042144775390625, 0.4391441345214844, 0.18747711181640625, 0.05322265625, -1.1857757568359375, 0.14395904541015625, 0.03336334228515625, -0.12063217163085938, 0.6276931762695312, -0.355712890625, 0.1814117431640625, 0.5567436218261719, -0.7007904052734375, 0.12203216552734375, -0.20355224609375, 0.46076202392578125, 0.15771484375, 0.12047958374023438, -0.22602462768554688, -0.2010498046875, 0.1109619140625, -0.1612091064453125, 0.2970733642578125, -0.098907470703125, 0.6921234130859375, -1.1168975830078125, -0.3266143798828125, 0.4319725036621094, -0.04251861572265625, 0.58367919921875, 0.2021636962890625, 0.43465423583984375, 0.24622344970703125, 0.3650970458984375, 0.258056640625, -0.0213165283203125, 0.9782867431640625, -0.0445404052734375, 0.3499183654785156, 0.12223052978515625, 0.10764312744140625, 0.2894935607910156, -0.9111404418945312, 0.3773040771484375, -0.4263916015625, -0.06914138793945312, 0.06855392456054688, 0.34001922607421875, 0.24610137939453125, -0.5408897399902344, -0.751922607421875, -0.282257080078125, -0.4444580078125, -0.0311126708984375, -0.20697021484375, -0.004119873046875, -0.7458343505859375, -0.334014892578125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000006.npy"}
{"epoch": 0.00881057268722467, "step": 7, "batch_size": 64, "mean": 0.006833851337432861, "std": 0.4285798668861389, "min": -0.99481201171875, "p10": -0.5364656448364258, "median": 0.025867462158203125, "p90": 0.4765869140625001, "max": 1.3897857666015625, "pos_frac": 0.546875, "sample": [0.13875198364257812, -0.4645538330078125, 0.37833213806152344, 0.647308349609375, -0.44659423828125, -0.02259063720703125, 0.20149612426757812, 0.328338623046875, 0.2766227722167969, 0.49710845947265625, 0.3763427734375, 0.1641387939453125, -0.0731353759765625, -0.7234649658203125, 0.03688812255859375, 0.127777099609375, -0.4828834533691406, -0.025569915771484375, -0.21386337280273438, -0.32151031494140625, 0.06337738037109375, 1.0927886962890625, 0.45592498779296875, -0.030048370361328125, -0.078582763671875, -0.5460186004638672, 0.13687896728515625, -0.201171875, -0.99481201171875, 0.03528594970703125, -0.31356048583984375, -0.057891845703125, -0.17003631591796875, -0.671295166015625, 0.5531539916992188, 0.058414459228515625, -0.2945098876953125, -0.5141754150390625, 1.3897857666015625, -0.28155517578125, 0.36431884765625, 0.48392486572265625, 0.0747528076171875, 0.0163726806640625, 0.018707275390625, -0.45123291015625, -0.20929718017578125, -0.8036117553710938, 0.1033172607421875, 0.1500396728515625, 0.41607666015625, 0.636962890625, -0.6068191528320312, 0.37177276611328125, -0.5816192626953125, 0.032848358154296875, 0.108795166015625, -0.25904083251953125, 0.026397705078125, -0.2377471923828125, 0.02533721923828125, 0.45946502685546875, -0.05352020263671875, 0.32027435302734375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000007.npy"}
{"epoch": 0.010279001468428781, "step": 8, "batch_size": 64, "mean": 0.029244408011436462, "std": 0.4103383719921112, "min": -1.198455810546875, "p10": -0.41649627685546875, "median": -0.001964569091796875, "p90": 0.47425384521484387, "max": 1.41522216796875, "pos_frac": 0.5, "sample": [0.5579986572265625, 0.056732177734375, 0.142852783203125, -0.04881858825683594, -0.0262451171875, 0.04876518249511719, -0.0224151611328125, -0.07376861572265625, -0.328125, -0.6969451904296875, -0.21799468994140625, 0.2532157897949219, 0.120208740234375, 0.15353775024414062, 0.814422607421875, 0.0555572509765625, -0.07276153564453125, -0.0294342041015625, 0.01772308349609375, 0.442413330078125, 0.0579071044921875, 1.41522216796875, -0.147613525390625, -0.8093414306640625, -0.004364013671875, -0.14069747924804688, 0.363128662109375, -0.2644157409667969, -0.4501190185546875, 0.00043487548828125, -0.02165985107421875, 0.336395263671875, -0.4301414489746094, 0.743896484375, -0.17098617553710938, -0.159698486328125, -0.0176544189453125, 0.1762371063232422, 0.14262866973876953, 0.413116455078125, 0.12649154663085938, -0.184967041015625, -0.07909393310546875, -0.7998504638671875, -0.011568069458007812, 0.08502197265625, -1.198455810546875, 0.035491943359375, 0.3195781707763672, 0.11194610595703125, -0.304290771484375, 0.5970001220703125, -0.4271392822265625, -0.2842903137207031, 0.3325347900390625, 0.1624298095703125, 0.948944091796875, 0.3598175048828125, -0.39166259765625, -0.243560791015625, -0.06847381591796875, -0.08031082153320312, 0.19895553588867188, 0.4878997802734375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000008.npy"}
{"epoch": 0.011747430249632892, "step": 9, "batch_size": 64, "mean": 0.043021202087402344, "std": 0.3984578251838684, "min": -0.9730072021484375, "p10": -0.4643203735351562, "median": 0.03820991516113281, "p90": 0.5252098083496095, "max": 1.0045623779296875, "pos_frac": 0.546875, "sample": [-0.2072296142578125, 0.28713226318359375, -0.213653564453125, -0.3916778564453125, -0.3201484680175781, 0.230377197265625, -0.12854766845703125, 0.2897186279296875, 0.34127044677734375, -0.13409042358398438, 0.3513031005859375, 0.5030517578125, 0.20087432861328125, 1.0045623779296875, 0.6880416870117188, -0.2227325439453125, 0.14044189453125, 0.029628753662109375, -0.2816925048828125, -0.34972381591796875, -0.38889312744140625, -0.12508773803710938, 0.2080841064453125, -0.332366943359375, -0.526153564453125, 0.342742919921875, -0.5619430541992188, 0.3594970703125, 0.026668548583984375, 0.0535125732421875, -0.005275726318359375, 0.013814926147460938, -0.4969024658203125, 0.5347061157226562, 0.04679107666015625, -0.1949920654296875, 0.2718791961669922, 0.15596580505371094, 0.3950042724609375, -0.33000946044921875, -0.20991897583007812, -0.07460784912109375, 0.764495849609375, -0.5012702941894531, -0.2681121826171875, 0.45284271240234375, 0.0685272216796875, -0.376129150390625, 0.06974029541015625, -0.17664718627929688, 0.408935546875, 0.3478126525878906, -0.236663818359375, -0.07153701782226562, 0.2864799499511719, -0.9730072021484375, 0.7865791320800781, 0.854034423828125, 0.6583805084228516, -0.495452880859375, -0.5660552978515625, 0.39449310302734375, 0.2353515625, 0.11113739013671875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000009.npy"}
{"epoch": 0.013215859030837005, "step": 10, "batch_size": 64, "mean": -0.06389984488487244, "std": 0.4449545443058014, "min": -1.3020477294921875, "p10": -0.6542152404785155, "median": -0.019428253173828125, "p90": 0.41669921875000016, "max": 1.2683181762695312, "pos_frac": 0.46875, "sample": [0.2119140625, -0.52667236328125, 0.30840492248535156, 0.06448173522949219, 0.21901702880859375, 0.00447845458984375, -0.136871337890625, -0.7428817749023438, 0.06915283203125, -0.18158721923828125, -0.0993194580078125, 0.7774505615234375, -0.447998046875, -0.5972366333007812, 0.956939697265625, -0.326446533203125, -0.20809173583984375, -0.19511032104492188, 0.12239837646484375, -0.2953033447265625, 1.2683181762695312, -0.2999153137207031, 0.20050048828125, 0.18017578125, 0.113311767578125, 0.29022216796875, -0.71923828125, -0.7445945739746094, -0.1786651611328125, -0.3052024841308594, 0.1998138427734375, 0.18151092529296875, 0.026805877685546875, -0.7007980346679688, -0.2152862548828125, -0.45949554443359375, 0.34307861328125, -0.1565399169921875, 0.21297454833984375, -0.009351730346679688, 0.2659263610839844, 0.046722412109375, -0.15918350219726562, 0.699005126953125, -0.8619842529296875, -0.6786346435546875, -0.32025146484375, -0.4416351318359375, 0.13402938842773438, -0.5312538146972656, -0.02538299560546875, 0.07103157043457031, -1.3020477294921875, -0.10795974731445312, 0.508544921875, 0.1378192901611328, -0.4116935729980469, 0.3830108642578125, -0.4813995361328125, -0.15997314453125, -0.0134735107421875, 0.5135574340820312, 0.4311370849609375, 0.01015472412109375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000010.npy"}
{"epoch": 0.014684287812041116, "step": 11, "batch_size": 64, "mean": 0.07255060970783234, "std": 0.4450220763683319, "min": -0.7209625244140625, "p10": -0.5425159454345703, "median": 0.077056884765625, "p90": 0.6855148315429691, "max": 1.178436279296875, "pos_frac": 0.53125, "sample": [0.11156082153320312, 0.5699462890625, -0.12450790405273438, -0.16173553466796875, -0.1116943359375, 0.1995697021484375, -0.35742950439453125, 0.055809974670410156, 0.12424659729003906, 0.32355499267578125, -0.13821029663085938, 0.21933746337890625, 0.2289276123046875, -0.18318939208984375, 1.110382080078125, 0.29935455322265625, -0.0660552978515625, 0.23752975463867188, 1.138519287109375, 0.1166229248046875, 0.13463211059570312, 0.5002670288085938, 0.07531356811523438, -0.19698143005371094, 0.6084823608398438, 0.18734359741210938, 0.3278045654296875, 0.5234146118164062, -0.566314697265625, 0.152252197265625, 0.7314300537109375, -0.7209625244140625, 0.30373382568359375, -0.5697021484375, 0.28040313720703125, 0.78240966796875, -0.6566390991210938, -0.02092742919921875, 0.3675994873046875, 0.0811309814453125, -0.07838058471679688, -0.35611724853515625, -0.06256103515625, -0.534881591796875, -0.02099609375, 0.07880020141601562, 0.9942474365234375, -0.08781242370605469, 1.178436279296875, -0.5457878112792969, -0.3288459777832031, 0.7185287475585938, -0.45690155029296875, -0.1136016845703125, -0.6250267028808594, -0.3338890075683594, -0.2144622802734375, 0.3426971435546875, -0.6691131591796875, 0.19223403930664062, -0.24492645263671875, -0.17267417907714844, -0.19347763061523438, 0.26052093505859375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000011.npy"}
{"epoch": 0.016152716593245228, "step": 12, "batch_size": 64, "mean": 0.11385723948478699, "std": 0.536648690700531, "min": -1.9069442749023438, "p10": -0.34834365844726556, "median": 0.11944770812988281, "p90": 0.8023155212402349, "max": 1.27716064453125, "pos_frac": 0.625, "sample": [0.3134918212890625, -0.18682861328125, 0.251983642578125, -1.399627685546875, -0.21155548095703125, -0.4985504150390625, -0.0077972412109375, 0.635009765625, 0.2742919921875, 0.29364013671875, -0.3733673095703125, 0.40509033203125, 0.472076416015625, 0.30332183837890625, -0.1119842529296875, -0.077178955078125, 0.054534912109375, 1.27716064453125, -0.13226890563964844, 0.015842437744140625, 0.9506683349609375, -0.28995513916015625, 0.10797119140625, 0.0295257568359375, 1.009246826171875, 0.6528778076171875, 0.5429840087890625, -1.0009918212890625, 0.07305908203125, 0.8572769165039062, 0.1733551025390625, -1.9069442749023438, -0.5919418334960938, 0.13098526000976562, 0.296905517578125, 0.11376953125, -0.2584800720214844, -0.1127777099609375, 0.38376617431640625, -0.2451648712158203, 0.2712860107421875, 0.24395370483398438, 0.12512588500976562, 0.13568878173828125, 0.67303466796875, -0.5606765747070312, -0.106170654296875, -0.16387939453125, -0.07385063171386719, 1.0428695678710938, -0.13657379150390625, 0.0550537109375, -0.24395751953125, 0.2298431396484375, 0.97357177734375, 0.940704345703125, 0.3621063232421875, -0.23298263549804688, 0.48606109619140625, 0.029361724853515625, -0.14385223388671875, 0.674072265625, 0.2076873779296875, 0.28496551513671875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000012.npy"}
{"epoch": 0.01762114537444934, "step": 13, "batch_size": 64, "mean": 0.05036202073097229, "std": 0.5334006547927856, "min": -1.326507568359375, "p10": -0.5186630249023437, "median": 0.03661346435546875, "p90": 0.671520233154297, "max": 1.7962188720703125, "pos_frac": 0.5625, "sample": [-0.418304443359375, 0.2079181671142578, -0.06696319580078125, -0.5292282104492188, 0.04897308349609375, 0.3022918701171875, 0.1107940673828125, 0.27628326416015625, -0.06829833984375, -0.49401092529296875, 0.6199798583984375, -0.85784912109375, -0.4495201110839844, 0.5440826416015625, -0.0065765380859375, 1.4448928833007812, -0.5867080688476562, 0.90972900390625, -0.4385414123535156, 0.05477142333984375, 0.08434677124023438, 0.1439208984375, 1.1734085083007812, 0.19590187072753906, 0.003631591796875, 0.2012176513671875, -0.35819244384765625, 0.023921966552734375, -0.30489158630371094, 0.047943115234375, -0.1774444580078125, 0.362518310546875, -0.25634765625, 0.8629188537597656, -1.326507568359375, -0.6375274658203125, 0.0605010986328125, 0.29586029052734375, 0.06981658935546875, -0.565582275390625, -0.196624755859375, -0.14844131469726562, 0.4548187255859375, -0.4834747314453125, 0.438201904296875, 0.0252838134765625, -0.488006591796875, -0.4174766540527344, 0.3046989440917969, -0.29925537109375, 0.6557388305664062, 0.1647796630859375, -0.7192764282226562, 0.06669998168945312, 0.48261260986328125, 1.7962188720703125, -0.01245880126953125, 0.26709747314453125, 0.67828369140625, -0.2381439208984375, -0.2598419189453125, 0.0020294189453125, 0.8279266357421875, -0.1813507080078125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000013.npy"}
{"epoch": 0.01908957415565345, "step": 14, "batch_size": 64, "mean": -0.07616353034973145, "std": 0.458617240190506, "min": -1.478729248046875, "p10": -0.6013679504394531, "median": -0.05608367919921875, "p90": 0.49374732971191415, "max": 0.89166259765625, "pos_frac": 0.421875, "sample": [-0.1197967529296875, -0.03388786315917969, 0.008447647094726562, 0.43871307373046875, -1.36651611328125, -0.392608642578125, -0.371124267578125, -0.36660003662109375, -0.8685531616210938, 0.1216278076171875, -1.12274169921875, 0.23111724853515625, 0.0459136962890625, -0.5541305541992188, -0.1691131591796875, -0.6272048950195312, -0.24015045166015625, 0.4536590576171875, -0.272613525390625, 0.21979522705078125, -0.06517410278320312, -0.0221099853515625, -0.1767597198486328, 0.05107879638671875, 0.04628753662109375, 0.5056686401367188, 0.1791229248046875, 0.6755409240722656, -0.17609405517578125, 0.0602264404296875, 0.18047332763671875, -0.3519306182861328, 0.291900634765625, 0.2418670654296875, -0.046993255615234375, -0.621612548828125, -0.14968490600585938, -1.478729248046875, 0.19585418701171875, -0.12451171875, -0.835174560546875, 0.7151031494140625, -0.33295440673828125, -0.2714996337890625, -0.4533958435058594, -0.118865966796875, -0.2946281433105469, -0.2315826416015625, 0.32134246826171875, 0.59210205078125, -0.1990509033203125, 0.89166259765625, -0.113433837890625, -0.4592247009277344, 0.214508056640625, 0.574554443359375, 0.24866485595703125, -0.1075592041015625, 0.4659309387207031, -0.03216552734375, -0.02014923095703125, 0.5826148986816406, -0.288726806640625, 0.048809051513671875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000014.npy"}
{"epoch": 0.020558002936857563, "step": 15, "batch_size": 64, "mean": 0.01562337577342987, "std": 0.5057007670402527, "min": -1.0444793701171875, "p10": -0.5003461837768555, "median": -0.00041103363037109375, "p90": 0.4858776092529299, "max": 2.402923583984375, "pos_frac": 0.5, "sample": [0.049538612365722656, 0.816009521484375, -0.26567840576171875, -0.24251937866210938, -0.4398040771484375, -0.39418888092041016, -0.1563854217529297, 0.2902374267578125, -0.04611968994140625, 0.6221084594726562, 0.0349884033203125, 0.06580543518066406, -0.34990692138671875, -0.6861114501953125, -0.0627288818359375, -0.260223388671875, 0.06778717041015625, 0.9094085693359375, -0.11667633056640625, -0.32318878173828125, -0.504486083984375, 0.3232879638671875, -0.4881591796875, -0.1478424072265625, 0.22783279418945312, -0.2594757080078125, 0.3158988952636719, -0.5002059936523438, 0.0534515380859375, 0.19071197509765625, 0.14185714721679688, -0.0214080810546875, -0.0263214111328125, -0.012460708618164062, -1.0444793701171875, -1.0238494873046875, 0.7030563354492188, 0.43573760986328125, 0.1361236572265625, -0.2065448760986328, 0.07904815673828125, 0.011638641357421875, 0.028321266174316406, 0.29396820068359375, 0.5073661804199219, 0.16107749938964844, -0.2596702575683594, -0.1494426727294922, 0.72406005859375, -0.30718994140625, 0.4074554443359375, 0.31573486328125, -0.20952224731445312, 0.22278594970703125, -0.65570068359375, 0.19300460815429688, 2.402923583984375, 0.38916778564453125, 0.28353118896484375, -0.0271148681640625, -0.5004062652587891, 0.3825950622558594, -0.2616691589355469, -0.8371429443359375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000015.npy"}
{"epoch": 0.022026431718061675, "step": 16, "batch_size": 64, "mean": 0.10098287463188171, "std": 0.5034706592559814, "min": -1.647125244140625, "p10": -0.49740867614746087, "median": 0.10939407348632812, "p90": 0.7118560791015626, "max": 1.0838699340820312, "pos_frac": 0.578125, "sample": [-0.19163131713867188, 1.0838699340820312, 0.03326416015625, 0.0910491943359375, 0.7187957763671875, -0.17529296875, 0.6956634521484375, -0.10001373291015625, -0.004276275634765625, 0.9151840209960938, -1.647125244140625, 0.6461601257324219, 0.8873443603515625, -0.018768310546875, 0.3154888153076172, 0.7329864501953125, 0.16342544555664062, 0.16255950927734375, 0.007843017578125, -0.6534347534179688, 0.218505859375, -0.01222991943359375, -0.09638214111328125, 0.91754150390625, 0.036853790283203125, 0.40819549560546875, -0.5145645141601562, 0.2135639190673828, -0.35553741455078125, 0.01235198974609375, -0.07571792602539062, 0.40375518798828125, 0.482269287109375, -0.22263336181640625, 0.4008827209472656, -0.09458160400390625, 0.48706817626953125, -0.6317596435546875, 0.15146636962890625, 0.1659698486328125, -0.16231155395507812, 0.4564018249511719, -0.028347015380859375, 0.39849853515625, 0.4824371337890625, 0.6005897521972656, -0.38300323486328125, 0.18152427673339844, 0.2387237548828125, 0.12773895263671875, -0.20227813720703125, -0.38748931884765625, 0.6123199462890625, -0.4573783874511719, -0.2412109375, 0.6224441528320312, -0.03026580810546875, 0.98553466796875, -1.1905593872070312, -0.642364501953125, 0.531219482421875, -0.5155105590820312, 0.191925048828125, -0.283843994140625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000016.npy"}
{"epoch": 0.023494860499265784, "step": 17, "batch_size": 64, "mean": 0.02503436803817749, "std": 0.49855920672416687, "min": -1.132080078125, "p10": -0.6859764099121094, "median": 0.021656036376953125, "p90": 0.5491249084472657, "max": 1.295135498046875, "pos_frac": 0.546875, "sample": [0.4929962158203125, -0.18125343322753906, -0.2915802001953125, 0.40155792236328125, 0.8040618896484375, 0.039737701416015625, -0.69073486328125, 0.4097137451171875, -0.11396408081054688, 0.17631912231445312, -0.29813385009765625, 0.1480560302734375, -0.038482666015625, 0.5565338134765625, 0.02988433837890625, 0.7261123657226562, -0.01078033447265625, -0.07395553588867188, -0.2744140625, -0.04701995849609375, 0.5318374633789062, -0.9622955322265625, -0.2722206115722656, 0.1325531005859375, 0.36785125732421875, 0.26326751708984375, 0.5202255249023438, -0.42413330078125, -0.1611328125, -0.15978622436523438, -0.10591888427734375, 0.417205810546875, 0.5250167846679688, -0.9488067626953125, -0.5022563934326172, 0.12430572509765625, 0.4188385009765625, -0.147796630859375, -0.3174476623535156, -1.132080078125, 0.1526031494140625, 0.4395408630371094, -0.8450698852539062, 0.0042324066162109375, 0.680938720703125, 0.013427734375, 0.392578125, -0.7685966491699219, 0.6968612670898438, 0.221893310546875, 0.33515167236328125, 0.2576789855957031, 1.295135498046875, -0.3458404541015625, 0.19373703002929688, 1.172454833984375, -0.161712646484375, -0.21942138671875, 0.08221435546875, -0.57080078125, 0.0057544708251953125, -0.926422119140625, -0.6748733520507812, 0.23885345458984375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000017.npy"}
{"epoch": 0.024963289280469897, "step": 18, "batch_size": 64, "mean": -0.007831811904907227, "std": 0.45526260137557983, "min": -1.208404541015625, "p10": -0.50157470703125, "median": -0.046794891357421875, "p90": 0.6129184722900392, "max": 1.04937744140625, "pos_frac": 0.453125, "sample": [0.5835533142089844, -0.0749053955078125, 0.01760101318359375, 1.04937744140625, -0.07326507568359375, 0.000499725341796875, -0.12203598022460938, -1.185516357421875, 0.03035736083984375, -0.5990886688232422, 0.09197998046875, 0.057758331298828125, -0.260894775390625, 0.258636474609375, 0.0658721923828125, 0.7501602172851562, -0.1484508514404297, 0.09542655944824219, -0.6571502685546875, 0.6255035400390625, -0.037914276123046875, 0.18767547607421875, -0.1710205078125, -0.35072898864746094, 0.6473236083984375, -0.2974433898925781, -0.8780517578125, 0.04491424560546875, -0.07898902893066406, 0.3861351013183594, -0.005462646484375, 0.13858413696289062, 0.3268890380859375, -0.3578643798828125, 0.5774383544921875, -0.26074981689453125, -0.3535575866699219, -0.06312751770019531, 0.47479248046875, 0.232391357421875, -0.0719146728515625, -0.2473907470703125, 1.01873779296875, -0.5022201538085938, -0.1792144775390625, -0.0733184814453125, -0.04860687255859375, 0.2717781066894531, 0.9910430908203125, -0.04498291015625, -0.621856689453125, -0.08512115478515625, 0.7114410400390625, -0.460601806640625, 0.14148330688476562, -0.5000686645507812, -0.1897125244140625, -0.3256187438964844, 0.08380126953125, 0.4718208312988281, 0.12919235229492188, -0.24474334716796875, -1.208404541015625, -0.18341064453125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000018.npy"}
{"epoch": 0.02643171806167401, "step": 19, "batch_size": 64, "mean": 0.021921008825302124, "std": 0.379949688911438, "min": -0.9033050537109375, "p10": -0.37892074584960933, "median": 0.04242992401123047, "p90": 0.41266708374023436, "max": 1.372314453125, "pos_frac": 0.546875, "sample": [-0.22922706604003906, -0.0011653900146484375, -0.023859024047851562, -0.261199951171875, 0.07810211181640625, 0.4112091064453125, 0.6156005859375, -0.328460693359375, -0.3530387878417969, -0.115814208984375, -0.01065826416015625, 0.12883758544921875, -0.38761138916015625, -0.4715423583984375, 0.12056732177734375, 0.29290771484375, 1.372314453125, 0.30855560302734375, 0.46453094482421875, 0.9087677001953125, -0.8767242431640625, -0.19197845458984375, 0.21535682678222656, -0.9033050537109375, 0.4105091094970703, 0.023363113403320312, -0.164459228515625, 0.0667724609375, 0.04500579833984375, -0.03395843505859375, -0.1443634033203125, 0.12407684326171875, 0.06476020812988281, -0.14434814453125, -0.358642578125, -0.2740821838378906, -0.0529327392578125, 0.03985404968261719, -0.6592254638671875, 0.286651611328125, 0.1654510498046875, 0.180633544921875, 0.12839508056640625, 0.1275634765625, -0.22251129150390625, 0.19228363037109375, -0.3027992248535156, 0.34783172607421875, -0.78399658203125, -0.14858245849609375, -0.03436279296875, 0.001739501953125, 0.10467529296875, 0.15744400024414062, 0.5006179809570312, -0.019777297973632812, 0.41329193115234375, 0.14173126220703125, 0.071258544921875, -0.47979736328125, 0.630645751953125, 0.16337966918945312, -0.03542327880859375, 0.1121063232421875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000019.npy"}
{"epoch": 0.027900146842878122, "step": 20, "batch_size": 64, "mean": 0.026839792728424072, "std": 0.4463766813278198, "min": -1.2085037231445312, "p10": -0.5859840393066406, "median": 0.048168182373046875, "p90": 0.5943557739257814, "max": 1.2464599609375, "pos_frac": 0.578125, "sample": [-0.0938568115234375, 0.04593658447265625, 0.17235946655273438, -0.0641326904296875, 0.03594970703125, 0.4331398010253906, -0.3788604736328125, 0.7234077453613281, 0.6459579467773438, -1.2085037231445312, -0.6835174560546875, 0.6208038330078125, -0.3231620788574219, 0.2962474822998047, -0.017059326171875, -0.2203216552734375, 0.6316566467285156, -0.16381072998046875, 0.4518394470214844, 0.0270843505859375, 0.02399444580078125, 0.6936111450195312, 0.4058723449707031, 0.027660369873046875, -0.32464599609375, 0.050930023193359375, 0.303131103515625, -0.05516815185546875, 0.41593170166015625, -0.4849872589111328, 0.16413497924804688, -0.0947723388671875, 0.55023193359375, -0.3128318786621094, -0.140960693359375, -0.16716766357421875, -0.694854736328125, -1.131744384765625, -0.20705413818359375, 0.2947998046875, 0.11585235595703125, -0.220306396484375, 0.0530548095703125, 0.48123931884765625, 0.336334228515625, -0.05804443359375, -0.0569000244140625, -0.694305419921875, 0.4235687255859375, 0.17547607421875, 0.18608856201171875, -0.5687179565429688, 0.3789253234863281, 0.199371337890625, 0.6132659912109375, 0.0503997802734375, -0.5933837890625, 0.06622314453125, 0.19011306762695312, -0.22771835327148438, 0.10213470458984375, 1.2464599609375, -0.7868881225585938, 0.05823516845703125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000020.npy"}
{"epoch": 0.02936857562408223, "step": 21, "batch_size": 64, "mean": 0.08599665760993958, "std": 0.37314027547836304, "min": -0.7222900390625, "p10": -0.36446075439453124, "median": 0.08053970336914062, "p90": 0.48460235595703127, "max": 1.061920166015625, "pos_frac": 0.59375, "sample": [-0.7222900390625, 0.304351806640625, 0.4408416748046875, -0.0303497314453125, -0.3179779052734375, 0.17163848876953125, 0.06961822509765625, 0.03264617919921875, 0.38530731201171875, -0.0344696044921875, 0.18798828125, 0.5892562866210938, -0.030355453491210938, -0.12468719482421875, 0.1568164825439453, 0.9367599487304688, -0.1682891845703125, 0.3563232421875, 0.1521625518798828, -0.35031890869140625, -0.3111572265625, 0.169342041015625, 0.225433349609375, -0.7144660949707031, -0.2054290771484375, -0.17523193359375, 0.6583404541015625, 0.6378898620605469, -0.37686920166015625, 0.146942138671875, -0.00710296630859375, 0.42896270751953125, -0.12060165405273438, 0.2819786071777344, 0.480682373046875, 0.0439453125, 0.37732696533203125, -0.34296417236328125, -0.1417388916015625, 0.064422607421875, 0.41120147705078125, 0.2603607177734375, -0.3933258056640625, 0.07219314575195312, 0.3750267028808594, -0.37052154541015625, 0.1333160400390625, 0.4862823486328125, -0.09490203857421875, -0.6103668212890625, -0.3037452697753906, -0.00699615478515625, 0.13238525390625, 0.1603851318359375, 0.0840301513671875, -0.4197998046875, 1.061920166015625, 0.42391204833984375, 0.07704925537109375, 0.15316390991210938, 0.37035369873046875, -0.26380157470703125, 0.9324302673339844, -0.29144287109375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000021.npy"}
{"epoch": 0.030837004405286344, "step": 22, "batch_size": 64, "mean": 0.05312791466712952, "std": 0.4397926330566406, "min": -1.1528816223144531, "p10": -0.4682186126708983, "median": 0.030635833740234375, "p90": 0.6682197570800783, "max": 0.9744110107421875, "pos_frac": 0.546875, "sample": [0.9609031677246094, -0.9813232421875, 0.0059661865234375, -0.047637939453125, 0.231353759765625, -0.0005950927734375, -0.183013916015625, 0.3500213623046875, 0.043487548828125, 0.4256439208984375, -0.0058746337890625, -0.12896728515625, -0.009967803955078125, 0.3466148376464844, -0.5277061462402344, 0.17291259765625, 0.11705780029296875, -0.18569183349609375, 0.0491180419921875, 0.811798095703125, -1.1528816223144531, 0.9285812377929688, -0.26685333251953125, 0.9744110107421875, 0.12023353576660156, -0.01009368896484375, 0.0007171630859375, 0.1510467529296875, 0.2018585205078125, -0.5759429931640625, -0.27828216552734375, -0.32941436767578125, 0.4676055908203125, 0.33362579345703125, -0.3153038024902344, -0.25171661376953125, -0.1342620849609375, -0.534637451171875, -0.053730010986328125, -0.6463546752929688, 0.1691741943359375, 0.34078216552734375, -0.12420654296875, 0.17937088012695312, 0.0411529541015625, 0.45169830322265625, -0.24948883056640625, 0.6141891479492188, -0.23230743408203125, 0.11146354675292969, -0.027973175048828125, 0.691375732421875, 0.05771636962890625, -0.18175888061523438, -0.203826904296875, -0.949127197265625, 0.02011871337890625, 0.49129295349121094, 0.7292938232421875, 0.11627197265625, 0.865081787109375, -0.15948486328125, 0.356842041015625, 0.21982955932617188], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000022.npy"}
{"epoch": 0.032305433186490456, "step": 23, "batch_size": 64, "mean": 0.009142071008682251, "std": 0.41245266795158386, "min": -1.0657196044921875, "p10": -0.44484405517578124, "median": 0.0053577423095703125, "p90": 0.4920696258544922, "max": 1.1036300659179688, "pos_frac": 0.53125, "sample": [-0.13592529296875, 0.05211639404296875, -0.1574554443359375, -0.10146331787109375, 0.166473388671875, -0.11063766479492188, 0.30998992919921875, 0.000762939453125, -0.5411624908447266, -0.427764892578125, 0.429046630859375, 0.18947601318359375, 0.9585418701171875, 0.14803314208984375, 0.037647247314453125, 0.05895233154296875, 0.49619293212890625, -0.5623703002929688, 0.6410408020019531, -0.3816719055175781, -0.12812042236328125, 1.1036300659179688, 0.14806365966796875, -0.2275543212890625, -0.4157066345214844, -0.214111328125, 0.5238265991210938, 0.5546722412109375, -0.1096954345703125, 0.24981689453125, 0.21958160400390625, 0.0158233642578125, 0.035991668701171875, 0.08058929443359375, -0.5121574401855469, -0.30423736572265625, -0.08795928955078125, -0.22015762329101562, 0.4824485778808594, -0.4521636962890625, 1.0858802795410156, 0.18590736389160156, 0.009555816650390625, -0.3584785461425781, 0.3083610534667969, -0.8457794189453125, -0.04857635498046875, 0.355743408203125, -0.0908050537109375, 0.00115966796875, 0.3955059051513672, 0.15972137451171875, -0.22028732299804688, -0.2115020751953125, 0.36631202697753906, -0.4060821533203125, 0.25119781494140625, -0.84527587890625, 0.09327507019042969, -1.0657196044921875, -0.11383819580078125, -0.10950469970703125, 0.06707763671875, -0.19115829467773438], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000023.npy"}
{"epoch": 0.033773861967694566, "step": 24, "batch_size": 64, "mean": 0.034656256437301636, "std": 0.3773519694805145, "min": -0.7795867919921875, "p10": -0.4486785888671875, "median": 0.087890625, "p90": 0.4316150665283204, "max": 1.226531982421875, "pos_frac": 0.53125, "sample": [-0.46198272705078125, -0.37993621826171875, -0.7795867919921875, -0.1723804473876953, -0.116729736328125, 0.4369926452636719, -0.120361328125, -0.4155845642089844, -0.02843475341796875, 0.33803558349609375, 0.12667083740234375, 0.09778594970703125, 0.18323898315429688, 0.3973236083984375, 0.17642974853515625, 0.24056243896484375, 0.15299224853515625, -0.30370330810546875, -0.2864532470703125, 0.3890037536621094, -0.37985992431640625, -0.10271835327148438, -0.11043548583984375, -0.2119598388671875, 1.226531982421875, 0.3998565673828125, 0.27577972412109375, 0.8273468017578125, -0.4586944580078125, -0.01444244384765625, 0.32916259765625, -0.48339080810546875, -0.12613296508789062, 0.575408935546875, 0.2754478454589844, -0.34820556640625, 0.235992431640625, 0.3874664306640625, 0.5607833862304688, 0.07928466796875, -0.547027587890625, -0.7130050659179688, 0.2734832763671875, 0.4709930419921875, 0.0312652587890625, 0.09649658203125, -0.3627471923828125, 0.17315292358398438, -0.142974853515625, 0.1556396484375, 0.22210311889648438, -0.09390449523925781, -0.41843414306640625, -0.10693931579589844, 0.517974853515625, 0.1821136474609375, -0.13922119140625, 0.32431793212890625, 0.2612953186035156, 0.4190673828125, -0.5592880249023438, -0.4253082275390625, -0.020355224609375, 0.20819854736328125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000024.npy"}
{"epoch": 0.03524229074889868, "step": 25, "batch_size": 64, "mean": 0.07580611109733582, "std": 0.47967109084129333, "min": -1.307373046875, "p10": -0.553537368774414, "median": 0.08641624450683594, "p90": 0.7231239318847659, "max": 1.2715911865234375, "pos_frac": 0.578125, "sample": [-0.6735000610351562, 0.14194297790527344, -0.6566162109375, -0.03795623779296875, -0.1582794189453125, -0.05972480773925781, 0.989105224609375, 0.4389190673828125, -0.214874267578125, -0.13335609436035156, 0.402618408203125, -1.307373046875, 0.35666656494140625, 0.4931793212890625, -0.0203704833984375, 0.3909568786621094, -0.5808944702148438, -0.263336181640625, -0.41916656494140625, 0.7612075805664062, -0.331024169921875, 0.3431358337402344, 0.7499237060546875, 0.0757293701171875, -0.7657928466796875, 0.34439849853515625, 0.759765625, 0.08013153076171875, 0.4053802490234375, -0.0063114166259765625, -0.8844985961914062, -0.4897041320800781, 0.20240402221679688, -0.42856597900390625, -0.08509063720703125, -0.2743492126464844, -0.025728225708007812, 0.002899169921875, 0.6046142578125, -0.5885772705078125, -0.4653778076171875, 0.2770538330078125, -0.212371826171875, 0.9179534912109375, 0.045246124267578125, 0.132598876953125, 0.6381072998046875, 0.4536285400390625, 0.6605911254882812, -0.084442138671875, 1.2715911865234375, 0.09270095825195312, -0.4372100830078125, 0.1035919189453125, 0.29782867431640625, 0.09605598449707031, 0.2404937744140625, 0.20282745361328125, 0.152130126953125, 0.02538299560546875, 0.2060680389404297, -0.00888824462890625, 0.7928695678710938, 0.3152732849121094], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000025.npy"}
{"epoch": 0.03671071953010279, "step": 26, "batch_size": 64, "mean": -0.09747996926307678, "std": 0.3947950601577759, "min": -1.115966796875, "p10": -0.6545516967773437, "median": -0.08047103881835938, "p90": 0.42462158203125, "max": 0.8478775024414062, "pos_frac": 0.359375, "sample": [0.1894378662109375, 0.10794639587402344, 0.3188819885253906, -0.3219146728515625, 0.546783447265625, -0.05544281005859375, -0.37006378173828125, 0.0023193359375, -0.29805755615234375, -0.5769462585449219, 0.4251708984375, -0.1608734130859375, -0.80596923828125, 0.14919281005859375, -0.136962890625, -0.30462646484375, -0.3505401611328125, 0.18158721923828125, -0.032806396484375, -0.7945365905761719, -0.1294403076171875, -0.2173614501953125, 0.17108917236328125, 0.42333984375, -0.08957672119140625, -0.8878097534179688, -0.051303863525390625, -0.027008056640625, 0.12508583068847656, -1.115966796875, 0.141693115234375, -0.04062652587890625, -0.30692291259765625, -0.3188209533691406, -0.2836475372314453, 0.1258392333984375, -0.1111907958984375, -0.170623779296875, 0.8478775024414062, 0.6765289306640625, -0.056545257568359375, 0.592041015625, -0.845367431640625, -0.26287078857421875, 0.39598846435546875, 0.5706558227539062, -0.29981231689453125, -0.12650299072265625, -0.24956512451171875, 0.13623046875, -0.6524581909179688, -0.06758880615234375, -0.08203887939453125, 0.17284393310546875, -0.0789031982421875, -0.0159912109375, -0.42409515380859375, -0.7169418334960938, -0.38909149169921875, -0.6554489135742188, 0.12387847900390625, 0.047229766845703125, 0.4318580627441406, -0.25995635986328125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000026.npy"}
{"epoch": 0.0381791483113069, "step": 27, "batch_size": 64, "mean": 0.1542966067790985, "std": 0.4503997266292572, "min": -0.889129638671875, "p10": -0.33721008300781247, "median": 0.088836669921875, "p90": 0.7845909118652346, "max": 1.3876800537109375, "pos_frac": 0.609375, "sample": [0.3783111572265625, 0.36865997314453125, -0.35683441162109375, -0.6059036254882812, 0.9352645874023438, 0.2630462646484375, -0.3105926513671875, 0.943634033203125, 0.05153656005859375, -0.05452156066894531, -0.025318145751953125, -0.10688018798828125, 0.5978469848632812, -0.09818649291992188, 0.7542724609375, 0.360809326171875, 0.18182373046875, -0.384521484375, 0.031688690185546875, -0.23972320556640625, -0.008632659912109375, 0.869171142578125, 0.2991180419921875, 0.2302570343017578, 0.1381683349609375, 0.27545166015625, -0.221405029296875, 0.09743118286132812, 0.1588897705078125, 1.2565994262695312, 0.019397735595703125, 0.5759696960449219, -0.17304420471191406, 0.0882568359375, 0.49427032470703125, 0.7975845336914062, 0.4008331298828125, 0.4866943359375, -0.3597068786621094, 0.19542312622070312, -0.30843353271484375, 0.415191650390625, -0.2910614013671875, 0.49443817138671875, 0.038227081298828125, -0.889129638671875, 0.1389007568359375, -0.082733154296875, -0.27851104736328125, -0.12030792236328125, 0.8915328979492188, 0.7220916748046875, 0.5699386596679688, 1.3876800537109375, -0.0077972412109375, 0.0573272705078125, -0.7002716064453125, -0.091644287109375, -0.28580474853515625, 0.402740478515625, -0.24228668212890625, 0.0089569091796875, -0.3486175537109375, 0.08941650390625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000027.npy"}
{"epoch": 0.039647577092511016, "step": 28, "batch_size": 64, "mean": 0.11996951699256897, "std": 0.38977569341659546, "min": -0.91973876953125, "p10": -0.3096908569335937, "median": 0.11608314514160156, "p90": 0.5218215942382812, "max": 1.3572845458984375, "pos_frac": 0.640625, "sample": [0.780181884765625, 0.540924072265625, -0.2525787353515625, -0.000675201416015625, 0.11682891845703125, 0.3694610595703125, -0.1248931884765625, 0.14862823486328125, 0.21663665771484375, 0.5248565673828125, 0.37000274658203125, -0.417816162109375, 0.5303955078125, -0.04425048828125, 0.1131439208984375, 0.3020343780517578, 0.11601638793945312, -0.67047119140625, -0.5132102966308594, 0.0753936767578125, 0.30466461181640625, 0.20807647705078125, 1.3572845458984375, 1.0428924560546875, 0.2328948974609375, 0.45063018798828125, 0.39629364013671875, 0.47431182861328125, -0.2384490966796875, 0.09992218017578125, -0.040374755859375, -0.25243377685546875, 0.11614990234375, -0.0760498046875, 0.4782829284667969, -0.4436187744140625, -0.14705276489257812, 0.20583724975585938, 0.10557174682617188, 0.4043121337890625, 0.48496246337890625, -0.09392547607421875, 0.14081573486328125, 0.077880859375, 0.1722412109375, -0.91973876953125, 0.06313323974609375, -0.16003036499023438, 0.01953125, 0.212432861328125, -0.6348228454589844, 0.514739990234375, 0.3150444030761719, -0.33416748046875, -0.19890213012695312, 0.09875106811523438, 0.4131355285644531, 0.6904296875, 0.5054931640625, -0.11329269409179688, -0.23557662963867188, -0.2119140625, -0.155609130859375, 0.16768264770507812], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000028.npy"}
{"epoch": 0.041116005873715125, "step": 29, "batch_size": 64, "mean": 0.08065186440944672, "std": 0.4189628064632416, "min": -0.8525238037109375, "p10": -0.4314414978027344, "median": 0.07001781463623047, "p90": 0.5893627166748048, "max": 1.126953125, "pos_frac": 0.5625, "sample": [0.056087493896484375, 0.3458118438720703, -0.1502227783203125, 0.5804252624511719, -0.0634918212890625, -0.023342132568359375, 0.35729217529296875, 0.2220001220703125, 0.28679656982421875, 0.698089599609375, -0.2862968444824219, -0.3040132522583008, -0.02179718017578125, 0.7571563720703125, -0.203399658203125, 0.116912841796875, -0.49239349365234375, 0.3483428955078125, 0.06824874877929688, -0.3010406494140625, 0.2570304870605469, 0.10699462890625, -0.16841888427734375, 0.301666259765625, 0.7361297607421875, 0.442901611328125, 0.5931930541992188, 0.29923248291015625, -0.0263519287109375, -0.443084716796875, 0.06789398193359375, 0.09248733520507812, -0.20455169677734375, 0.5080490112304688, 0.30127716064453125, -0.20414352416992188, 0.4243927001953125, 0.004180908203125, 0.7134552001953125, 1.126953125, -0.833282470703125, 0.508392333984375, 0.2801704406738281, -0.29553985595703125, 0.44361305236816406, 0.07178688049316406, 0.1766204833984375, -0.1100616455078125, 0.31136322021484375, -0.16312217712402344, -0.22550582885742188, -0.181976318359375, -0.7955703735351562, -0.332855224609375, 0.5616607666015625, -0.4382781982421875, -0.8525238037109375, -0.21490478515625, 0.8237152099609375, 0.5748939514160156, 0.19443511962890625, -0.463897705078125, -0.41548919677734375, -0.38237762451171875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000029.npy"}
{"epoch": 0.042584434654919234, "step": 30, "batch_size": 64, "mean": 0.05558106303215027, "std": 0.4330596327781677, "min": -0.89031982421875, "p10": -0.4527809143066406, "median": 0.058887481689453125, "p90": 0.5904491424560552, "max": 1.14752197265625, "pos_frac": 0.546875, "sample": [-0.011627197265625, 0.9432640075683594, 0.3084678649902344, 0.2769012451171875, 0.16705322265625, 0.1536712646484375, 0.36279296875, -0.3804473876953125, 0.17926979064941406, 0.05542755126953125, -0.46231842041015625, 1.0237655639648438, 0.40399932861328125, 0.43413543701171875, 0.062347412109375, -0.6709518432617188, -0.33736419677734375, -0.4305267333984375, -0.05796051025390625, -0.6791763305664062, -0.05559539794921875, 0.744384765625, -0.822662353515625, -0.2005462646484375, -0.2848358154296875, -0.2503795623779297, -0.664886474609375, -0.0550384521484375, 0.1671276092529297, 0.03178215026855469, 0.1482086181640625, 0.2475566864013672, -0.10549163818359375, -0.08423614501953125, 0.04145050048828125, 0.36865997314453125, -0.89031982421875, 1.14752197265625, 0.472259521484375, 0.6362152099609375, -0.08353996276855469, 0.4582061767578125, 0.356536865234375, -0.14730453491210938, -0.5617599487304688, 0.9485626220703125, -0.2889556884765625, -0.22027587890625, 0.1342906951904297, 0.12786865234375, 0.4836616516113281, -0.42494964599609375, -0.07325363159179688, 0.12893295288085938, -0.0274200439453125, -0.030529022216796875, 0.23368453979492188, 0.15808868408203125, -0.41298675537109375, 0.193695068359375, 0.111785888671875, 0.706298828125, -0.27318572998046875, 0.127838134765625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000030.npy"}
{"epoch": 0.04405286343612335, "step": 31, "batch_size": 64, "mean": 0.10736304521560669, "std": 0.4195689558982849, "min": -0.6175193786621094, "p10": -0.34134750366210936, "median": 0.07667922973632812, "p90": 0.5670181274414063, "max": 2.0493316650390625, "pos_frac": 0.59375, "sample": [0.1116180419921875, -0.08206939697265625, 0.063873291015625, -0.1262969970703125, 0.10085296630859375, -0.028934478759765625, 0.1301116943359375, -0.6175193786621094, -0.10628509521484375, -0.254669189453125, 0.141021728515625, 0.106201171875, 0.46321868896484375, -0.06943511962890625, -0.3400230407714844, 0.042110443115234375, -0.44454193115234375, -0.05976676940917969, 0.03662109375, -0.09125709533691406, -0.08161163330078125, -0.28122711181640625, 0.20856475830078125, 0.7679290771484375, -0.011444091796875, -0.2626190185546875, -0.3592071533203125, -0.2725868225097656, -0.23225784301757812, 0.2971649169921875, -0.57000732421875, -0.3419151306152344, 0.27881622314453125, 2.0493316650390625, 0.001087188720703125, 0.643585205078125, 0.09665679931640625, 0.1827545166015625, 0.08948516845703125, -0.3925323486328125, 0.5558242797851562, 0.32598876953125, 0.12921524047851562, 0.1228790283203125, -0.15081024169921875, 1.1221466064453125, 0.336029052734375, 0.5718154907226562, -0.4286079406738281, 0.05213165283203125, 0.24526214599609375, 0.1351776123046875, 0.00201416015625, 0.9149169921875, 0.2576751708984375, 0.4038543701171875, 0.3491935729980469, 0.2892303466796875, 0.6432952880859375, -0.01828765869140625, 0.46903228759765625, 0.1340179443359375, -0.177490234375, -0.19806671142578125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000031.npy"}
{"epoch": 0.04552129221732746, "step": 32, "batch_size": 64, "mean": 0.1167445182800293, "std": 0.5180900692939758, "min": -1.1079864501953125, "p10": -0.42131576538085935, "median": 0.06767845153808594, "p90": 0.8428337097167969, "max": 1.5986785888671875, "pos_frac": 0.5625, "sample": [0.06749725341796875, 0.16551971435546875, -0.44244384765625, -0.01642608642578125, -0.2194042205810547, 1.3202056884765625, -0.5384597778320312, 0.4240264892578125, 0.10783004760742188, 0.8930435180664062, 0.2891387939453125, -0.19506454467773438, 0.00389862060546875, 0.06785964965820312, -0.6862030029296875, 0.8298721313476562, 0.7401351928710938, 0.15781784057617188, 0.292510986328125, -0.236053466796875, 0.37322235107421875, -0.3581695556640625, 1.5986785888671875, -1.1079864501953125, -0.8174476623535156, 0.6605567932128906, 0.8836746215820312, -0.116607666015625, 0.4550132751464844, 0.32366943359375, -0.47566986083984375, -0.23859405517578125, -0.2989158630371094, 1.1394500732421875, -0.05384063720703125, 0.1633148193359375, 0.888214111328125, 0.000457763671875, -0.0605316162109375, -0.3621940612792969, 0.1458740234375, -0.2899284362792969, -0.03997802734375, 0.15486526489257812, 0.848388671875, -0.16412353515625, 0.28861236572265625, -0.10189056396484375, 0.47528076171875, 0.142608642578125, -0.1579132080078125, 0.0908355712890625, -1.0384368896484375, 0.7464408874511719, -0.37201690673828125, -0.0295257568359375, 0.172149658203125, 0.7514801025390625, -0.16187286376953125, 0.425689697265625, -0.1626415252685547, 0.04340362548828125, 0.105072021484375, -0.022319793701171875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000032.npy"}
{"epoch": 0.04698972099853157, "step": 33, "batch_size": 64, "mean": 0.1387576460838318, "std": 0.44316932559013367, "min": -0.6660842895507812, "p10": -0.36255722045898436, "median": 0.07840919494628906, "p90": 0.8277282714843751, "max": 1.1306838989257812, "pos_frac": 0.609375, "sample": [0.07747268676757812, 0.0684967041015625, 0.6417694091796875, -0.16128921508789062, 0.11431884765625, -0.28217315673828125, 0.119964599609375, -0.3223876953125, 0.3499488830566406, 0.08433151245117188, 0.03021240234375, 0.5418319702148438, 0.83734130859375, 0.7124214172363281, -0.2169647216796875, 0.116729736328125, -0.0198516845703125, -0.5478935241699219, -0.2840576171875, -0.5654449462890625, 0.565582275390625, -0.06722259521484375, 0.5518341064453125, 0.959136962890625, 0.6768722534179688, 0.1748504638671875, -0.1285400390625, 0.0162200927734375, 0.4862213134765625, 0.4100341796875, -0.09129714965820312, 0.4747467041015625, -0.5008621215820312, 0.13604736328125, -0.12609100341796875, 0.141815185546875, 0.056087493896484375, 0.08238983154296875, 0.007232666015625, -0.092559814453125, 1.0680999755859375, -0.19042205810546875, -0.6660842895507812, 0.23648834228515625, -0.0643768310546875, 0.8052978515625, 0.079345703125, 0.9046401977539062, 0.21323394775390625, -0.15877532958984375, 0.116607666015625, 0.2668876647949219, 1.1306838989257812, -0.6469955444335938, 0.04583740234375, 0.9396018981933594, -0.355438232421875, -0.147918701171875, -0.340667724609375, 0.9526443481445312, -0.36560821533203125, -0.37068939208984375, 0.5981712341308594, -0.19734954833984375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000033.npy"}
{"epoch": 0.048458149779735685, "step": 34, "batch_size": 64, "mean": 0.13282954692840576, "std": 0.545570969581604, "min": -0.854095458984375, "p10": -0.39227409362792964, "median": 0.06119728088378906, "p90": 0.6347229003906251, "max": 3.0831298828125, "pos_frac": 0.625, "sample": [0.9553070068359375, 0.476226806640625, 0.20929718017578125, -0.22374725341796875, 0.7730560302734375, 0.18372726440429688, -0.1667633056640625, 0.0587158203125, -0.12827301025390625, -0.25213623046875, -0.0690765380859375, -0.2776069641113281, -0.081787109375, 0.02050018310546875, 0.281280517578125, 0.13763427734375, -0.3749198913574219, -0.11962127685546875, 0.224029541015625, 0.43526458740234375, 0.01326751708984375, -0.49982452392578125, 0.614044189453125, 0.3702964782714844, 3.0831298828125, -0.1378326416015625, 0.041107177734375, -0.28173828125, 0.643585205078125, 0.0677642822265625, 0.9529571533203125, 0.40467071533203125, -0.15572357177734375, 0.08650588989257812, 0.585723876953125, -0.6779861450195312, 0.016313552856445312, -0.255706787109375, -0.43536376953125, 0.1287078857421875, 0.0059356689453125, 0.23786163330078125, 0.11178207397460938, 0.9030609130859375, 0.6116561889648438, -0.0211944580078125, 0.2370128631591797, 0.4197998046875, 1.1429214477539062, -0.497314453125, 0.06367874145507812, 0.3657989501953125, 0.29796600341796875, -0.39971160888671875, -0.550506591796875, -0.17845535278320312, 0.178955078125, -0.2881202697753906, 0.07659912109375, 0.0517425537109375, -0.162200927734375, -0.854095458984375, 0.0829010009765625, 0.040012359619140625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000034.npy"}
{"epoch": 0.049926578560939794, "step": 35, "batch_size": 64, "mean": 0.12453588843345642, "std": 0.40164801478385925, "min": -1.129547119140625, "p10": -0.31658363342285156, "median": 0.09693145751953125, "p90": 0.6453079223632815, "max": 1.1167526245117188, "pos_frac": 0.609375, "sample": [0.19109344482421875, 0.21141433715820312, 0.44598388671875, 0.5521774291992188, -0.038066864013671875, -0.04327583312988281, -1.129547119140625, 0.15303802490234375, -0.2912750244140625, 0.11809539794921875, -0.32794189453125, -0.18587303161621094, 0.24593353271484375, 0.1453704833984375, -0.5080528259277344, -0.3256645202636719, 0.05194854736328125, 0.1007537841796875, 0.34161376953125, 0.022058486938476562, 0.3061676025390625, 0.5077590942382812, 0.6694793701171875, 0.37836456298828125, 0.1356658935546875, -0.3399505615234375, -0.05184173583984375, -0.437164306640625, 0.03543853759765625, 0.533538818359375, -0.081787109375, 0.38761138916015625, -0.08905029296875, 0.6656875610351562, -0.1282196044921875, -0.21512603759765625, 0.12000274658203125, 0.9707756042480469, 0.1991424560546875, 0.5977554321289062, 0.093109130859375, 0.4738807678222656, -0.1993408203125, 0.21407318115234375, -0.21511077880859375, -0.09801101684570312, -0.1071014404296875, 1.1167526245117188, 0.0241241455078125, 0.000904083251953125, -0.08406448364257812, -0.05389404296875, 0.34450531005859375, -0.214019775390625, 0.53167724609375, 0.1963043212890625, 0.944366455078125, -0.2565155029296875, -0.2953948974609375, -0.464019775390625, 0.006290435791015625, 0.761932373046875, 0.41121673583984375, 0.9445991516113281], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000035.npy"}
{"epoch": 0.0513950073421439, "step": 36, "batch_size": 64, "mean": 0.17598721385002136, "std": 0.5479477047920227, "min": -1.2998504638671875, "p10": -0.3351959228515625, "median": 0.12308883666992188, "p90": 0.8873622894287111, "max": 2.23284912109375, "pos_frac": 0.609375, "sample": [0.3049468994140625, -0.20982933044433594, -0.41889381408691406, -0.11403656005859375, -0.10251617431640625, 0.3834552764892578, -0.0837249755859375, -0.0604095458984375, 0.3304023742675781, 2.23284912109375, 0.12139892578125, 0.06280517578125, -0.2751350402832031, 0.1456623077392578, 0.022029876708984375, 0.01766204833984375, 0.6979541778564453, 1.00830078125, 0.43531036376953125, -0.33728790283203125, 0.0847625732421875, -0.32309722900390625, -0.112884521484375, 0.16907882690429688, 0.9420242309570312, 0.851898193359375, 0.6313629150390625, -0.0123291015625, -1.2998504638671875, 0.4323577880859375, 0.12477874755859375, 0.06893157958984375, 0.3100433349609375, 1.0281524658203125, 0.3389854431152344, -0.24787521362304688, -0.881103515625, 0.47866058349609375, 0.64837646484375, 1.2421722412109375, 0.3130950927734375, 0.14327239990234375, 0.9025611877441406, -0.15798568725585938, -0.38330078125, 0.16104507446289062, 0.26219940185546875, 1.555999755859375, 0.25988006591796875, 0.35076141357421875, -0.33031463623046875, 0.5332260131835938, 0.0658111572265625, 0.2680206298828125, -0.22377777099609375, -0.0423126220703125, -0.2730865478515625, -0.023406982421875, -0.33917236328125, 0.4363594055175781, -0.17575836181640625, 0.2029266357421875, -0.7172393798828125, -0.1610107421875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000036.npy"}
{"epoch": 0.05286343612334802, "step": 37, "batch_size": 64, "mean": 0.27413442730903625, "std": 0.6604204773902893, "min": -1.960235595703125, "p10": -0.2858772277832031, "median": 0.19058609008789062, "p90": 1.008169555664063, "max": 2.65985107421875, "pos_frac": 0.6875, "sample": [-0.5087738037109375, -0.05309772491455078, 0.1936492919921875, 0.8335914611816406, 0.2774658203125, 0.026569366455078125, 1.1222381591796875, 0.05536651611328125, -0.0511016845703125, 0.48670196533203125, 0.33289337158203125, 1.0706787109375, -0.1974029541015625, -0.3858642578125, -0.3394012451171875, 0.6411056518554688, 0.36547088623046875, -0.04578113555908203, 0.1837921142578125, 2.65985107421875, -0.2824859619140625, -0.248443603515625, 0.0230560302734375, 0.5962753295898438, -0.610015869140625, -0.19337844848632812, 0.1216583251953125, 0.519012451171875, -0.41062164306640625, 0.14227294921875, 0.010594367980957031, 1.06219482421875, 0.084259033203125, 0.869842529296875, 0.18752288818359375, 0.6304512023925781, 0.27987003326416016, 0.237548828125, 0.107086181640625, 2.58837890625, 0.08352279663085938, -1.960235595703125, 0.0710906982421875, 0.42319488525390625, 0.1980438232421875, -0.28733062744140625, -0.12082099914550781, 0.38091278076171875, -0.10279083251953125, 1.2139396667480469, 0.3638916015625, -0.1663055419921875, 0.8402976989746094, -0.0251312255859375, 0.21997642517089844, -0.1635589599609375, 0.31093597412109375, 1.3427276611328125, 0.2357616424560547, 0.882110595703125, 0.5091171264648438, 0.715423583984375, -0.14504241943359375, 0.3418426513671875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000037.npy"}
{"epoch": 0.05433186490455213, "step": 38, "batch_size": 64, "mean": 0.21863074600696564, "std": 0.626781165599823, "min": -0.6341934204101562, "p10": -0.38673324584960933, "median": 0.07575607299804688, "p90": 1.0064529418945314, "max": 2.5083770751953125, "pos_frac": 0.5625, "sample": [-0.5862274169921875, 1.1908111572265625, 2.5083770751953125, -0.26355743408203125, 0.3711433410644531, -0.399017333984375, -0.6341934204101562, 0.34598541259765625, 0.02539825439453125, -0.081878662109375, 0.855926513671875, 0.5825653076171875, -0.01715087890625, -0.35807037353515625, -0.4768829345703125, -0.0595855712890625, 0.189544677734375, -0.3103485107421875, 0.13974761962890625, 0.3376617431640625, 0.7270965576171875, 0.2557220458984375, 0.15623092651367188, 0.052516937255859375, -0.21636962890625, 0.4242286682128906, 1.6363372802734375, -0.548980712890625, 0.26554107666015625, -0.5568275451660156, 0.027963638305664062, -0.2776679992675781, 0.14449310302734375, -0.2477874755859375, -0.00479888916015625, -0.13279342651367188, -0.26461029052734375, 2.318634033203125, 0.139495849609375, 0.6064605712890625, 0.961944580078125, 1.086761474609375, 0.727813720703125, -0.055133819580078125, -0.0854644775390625, 0.153106689453125, -0.18872451782226562, 0.5258560180664062, -0.2054901123046875, -0.3096771240234375, 0.305511474609375, -0.22133636474609375, 0.3599681854248047, -0.130889892578125, -0.24980926513671875, 1.0255279541015625, 0.6047897338867188, 1.2572174072265625, -0.45282745361328125, -0.14190673828125, 0.09899520874023438, 0.01792621612548828, 0.312286376953125, 0.7307891845703125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000038.npy"}
{"epoch": 0.055800293685756244, "step": 39, "batch_size": 64, "mean": 0.3002166748046875, "std": 0.596712589263916, "min": -1.16522216796875, "p10": -0.3478485107421875, "median": 0.28340911865234375, "p90": 0.9181053161621096, "max": 2.20770263671875, "pos_frac": 0.65625, "sample": [0.5297012329101562, -0.0346221923828125, 0.86956787109375, 0.150390625, 0.4986114501953125, 0.7720565795898438, 0.5258560180664062, -0.007198333740234375, -0.6319503784179688, 0.3558349609375, 0.6514625549316406, 0.8823928833007812, 0.6017913818359375, -0.5912933349609375, 1.1506462097167969, 0.6418228149414062, 0.42342376708984375, -0.6377639770507812, 1.7742156982421875, 0.223602294921875, 0.1039581298828125, 0.10253143310546875, 0.5289802551269531, 0.8036918640136719, -0.12012672424316406, -0.3487091064453125, -0.1910247802734375, 0.7100143432617188, -1.16522216796875, -0.076446533203125, 0.7426109313964844, -0.06558990478515625, 1.0197410583496094, -0.14812850952148438, -0.318572998046875, 0.4254493713378906, -0.050083160400390625, 0.22610092163085938, 0.18449783325195312, 1.6510848999023438, 0.64971923828125, -0.3458404541015625, -0.290069580078125, 0.5798873901367188, 0.5370979309082031, 0.28984832763671875, 0.93341064453125, -0.372650146484375, -0.802734375, 0.10197067260742188, -0.05493927001953125, 0.6427459716796875, 0.3090667724609375, 2.20770263671875, -0.11035346984863281, 0.7565155029296875, 1.1928863525390625, 0.287689208984375, 0.2791290283203125, 0.15866661071777344, 0.019369125366210938, -0.02008342742919922, -0.2028369903564453, 0.3043632507324219], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000039.npy"}
{"epoch": 0.05726872246696035, "step": 40, "batch_size": 64, "mean": 0.26183804869651794, "std": 0.5151414275169373, "min": -0.4571380615234375, "p10": -0.24042835235595703, "median": 0.09141921997070312, "p90": 0.9105445861816407, "max": 2.33477783203125, "pos_frac": 0.578125, "sample": [-0.27571868896484375, 0.7792892456054688, -0.223876953125, -0.18307876586914062, -0.130462646484375, -0.01507568359375, -0.18018341064453125, -0.185546875, -0.020648956298828125, 0.20864486694335938, 0.7982635498046875, 0.87396240234375, -0.04986381530761719, 0.384765625, -0.023036956787109375, 0.13648223876953125, -0.16538238525390625, 0.809356689453125, -0.2421398162841797, 0.10772705078125, -0.358123779296875, 0.1379108428955078, -0.2587127685546875, 0.3990631103515625, 0.9205856323242188, 0.887115478515625, -0.056476593017578125, -0.033405303955078125, 0.6230010986328125, -0.014495849609375, 0.8743133544921875, 0.04935264587402344, -0.2364349365234375, 0.6849212646484375, 0.5263710021972656, 0.9675064086914062, 0.13813400268554688, 1.0345306396484375, -0.048297882080078125, 0.5477867126464844, 1.2577972412109375, 0.2735710144042969, -0.4571380615234375, 2.33477783203125, -0.3184375762939453, 0.7659454345703125, 0.5870132446289062, 0.048828125, 0.0710296630859375, 0.369903564453125, -0.1651611328125, -0.1589202880859375, 0.5703582763671875, -0.15436553955078125, 1.1418380737304688, -0.19879150390625, -0.2634124755859375, 0.07511138916015625, 0.6310501098632812, -0.22991180419921875, 0.18805694580078125, 0.15647125244140625, 1.0149002075195312, 0.02899932861328125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000040.npy"}
{"epoch": 0.05873715124816446, "step": 41, "batch_size": 64, "mean": 0.3038408160209656, "std": 0.6161208152770996, "min": -1.2952880859375, "p10": -0.3745155334472656, "median": 0.23688316345214844, "p90": 0.9865093231201173, "max": 2.3796768188476562, "pos_frac": 0.703125, "sample": [0.5467071533203125, 0.07865142822265625, -0.38649749755859375, 0.24251937866210938, 0.8466033935546875, 0.3715972900390625, 0.2384033203125, 0.17812347412109375, -0.84893798828125, 0.18909454345703125, 1.9400787353515625, 1.3687362670898438, 0.5975341796875, 0.863922119140625, -0.10450363159179688, 0.0883941650390625, 0.6038589477539062, -0.2702178955078125, 0.0388641357421875, -0.3710174560546875, 0.2786445617675781, 0.7836647033691406, 0.925323486328125, -0.23284149169921875, 2.3796768188476562, -0.37601470947265625, 0.31621551513671875, 0.36334228515625, 0.16628646850585938, -0.41776275634765625, 0.6274795532226562, -0.1798095703125, 0.653411865234375, -0.036495208740234375, 0.23536300659179688, 0.69476318359375, 0.9714622497558594, 0.21429443359375, -0.240081787109375, 0.4874458312988281, 0.7016143798828125, -0.01381683349609375, -0.4659423828125, 1.3025054931640625, 1.1407241821289062, -0.25299072265625, 0.9929580688476562, 0.32938385009765625, 0.22988510131835938, 0.09998321533203125, 0.05179023742675781, 0.5022430419921875, -1.2952880859375, 0.8563232421875, -0.18758010864257812, 0.2766227722167969, -0.2758941650390625, 0.336090087890625, -0.44576263427734375, -0.28092193603515625, 0.13764190673828125, 0.08954429626464844, 1.3032188415527344, 0.4871978759765625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000041.npy"}
{"epoch": 0.06020558002936858, "step": 42, "batch_size": 64, "mean": 0.4196072816848755, "std": 0.5776397585868835, "min": -1.1312026977539062, "p10": -0.2594074249267577, "median": 0.3565940856933594, "p90": 1.0775001525878907, "max": 2.054595947265625, "pos_frac": 0.796875, "sample": [0.4439239501953125, 0.1265411376953125, 0.08539390563964844, 0.9339447021484375, 2.054595947265625, 0.05802154541015625, -0.2891998291015625, 0.7365570068359375, 0.24884414672851562, 0.812896728515625, -0.84625244140625, 0.42734527587890625, 0.20482635498046875, 1.2763671875, -0.19939804077148438, 0.7901611328125, 1.0813522338867188, 0.3407135009765625, -0.11510086059570312, 0.8126678466796875, 0.5097885131835938, 0.4797992706298828, 0.8494720458984375, -0.07744979858398438, -0.59881591796875, 0.7560043334960938, 0.1851043701171875, 0.64862060546875, 0.2884407043457031, 1.65216064453125, 0.37247467041015625, 0.7066574096679688, -0.09450435638427734, 0.8050537109375, 0.31365966796875, 0.21942901611328125, -0.285125732421875, -0.011522293090820312, 0.5375213623046875, 0.09848785400390625, 0.4863128662109375, -1.1312026977539062, 0.12543487548828125, 1.000762939453125, 1.1966476440429688, 1.0360107421875, 0.4582862854003906, 0.12394332885742188, 0.2101898193359375, 0.3195343017578125, 0.270263671875, 0.7921905517578125, 0.25455474853515625, -0.35016822814941406, 1.068511962890625, 1.706756591796875, -0.4827461242675781, 1.0817108154296875, 0.13147735595703125, 0.833251953125, 0.6730422973632812, 0.07916259765625, -0.13154983520507812, 0.763031005859375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000042.npy"}
{"epoch": 0.06167400881057269, "step": 43, "batch_size": 64, "mean": 0.34349310398101807, "std": 0.6137816309928894, "min": -0.5793609619140625, "p10": -0.37971630096435544, "median": 0.238922119140625, "p90": 1.1867383956909185, "max": 2.606475830078125, "pos_frac": 0.65625, "sample": [0.32509613037109375, -0.00656890869140625, -0.26709938049316406, -0.12414932250976562, 0.8037109375, 0.6603546142578125, -0.45238494873046875, 1.00799560546875, -0.3901824951171875, 1.445465087890625, 0.28631591796875, 0.13254547119140625, 0.48464202880859375, 0.5943603515625, 0.6090202331542969, -0.1287994384765625, -0.0375213623046875, 0.8289947509765625, -0.118408203125, -0.35529518127441406, 1.0438232421875, 0.060894012451171875, 0.16640472412109375, 0.89508056640625, 0.28005218505859375, 0.6969184875488281, -0.5793609619140625, -0.10056114196777344, 0.06449127197265625, 0.55145263671875, -0.4014129638671875, 0.013545989990234375, 0.881134033203125, 0.5081787109375, -0.26116943359375, -0.044330596923828125, -0.15271759033203125, 0.3671302795410156, 0.6887626647949219, 0.43694305419921875, 0.36835479736328125, -0.5775489807128906, 0.7791748046875, 1.3555679321289062, 0.9040603637695312, 0.14083099365234375, 0.17957305908203125, 0.24785614013671875, 1.5319366455078125, -0.530670166015625, 1.3047904968261719, 0.6876220703125, 1.43304443359375, 0.4231128692626953, -0.31646728515625, 2.606475830078125, -0.011260986328125, 1.2479877471923828, 0.181671142578125, -0.08920097351074219, 0.1959686279296875, -0.5416717529296875, 0.22998809814453125, -0.18098831176757812], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000043.npy"}
{"epoch": 0.0631424375917768, "step": 44, "batch_size": 64, "mean": 0.3009817898273468, "std": 0.5567229390144348, "min": -1.438507080078125, "p10": -0.28886718749999984, "median": 0.19157791137695312, "p90": 0.9512199401855469, "max": 2.0252227783203125, "pos_frac": 0.796875, "sample": [0.08694839477539062, -0.14215087890625, 0.3255195617675781, 0.11665916442871094, 2.0252227783203125, 0.84210205078125, 0.37613677978515625, -0.0985107421875, 0.2951698303222656, 0.16866683959960938, 0.5821456909179688, 0.29210662841796875, -0.35174560546875, 0.243927001953125, 1.813018798828125, -1.438507080078125, 0.1326141357421875, 0.9503173828125, 0.033985137939453125, 0.2323150634765625, 0.013788223266601562, 0.876129150390625, 0.45708465576171875, 0.9365615844726562, 0.09320068359375, 0.99365234375, 1.1490097045898438, 0.06902313232421875, 0.4140777587890625, 0.06423568725585938, 0.09871673583984375, 0.46607208251953125, 0.24464797973632812, -0.44101715087890625, 0.5799942016601562, 0.1510467529296875, 1.25164794921875, 0.12370681762695312, 0.193115234375, 1.126922607421875, -0.5970001220703125, 0.682830810546875, 0.43597412109375, 0.058383941650390625, 0.04434967041015625, 0.797149658203125, 0.60858154296875, -0.498443603515625, 0.34395599365234375, 0.1490154266357422, 0.065338134765625, 0.01508331298828125, -0.647247314453125, 0.9516067504882812, -0.14185714721679688, 0.63043212890625, -0.0737152099609375, 0.19004058837890625, -0.41765594482421875, 0.35015869140625, -0.040187835693359375, -0.00836181640625, 0.0767364501953125, 0.9401092529296875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000044.npy"}
{"epoch": 0.06461086637298091, "step": 45, "batch_size": 64, "mean": 0.30300837755203247, "std": 0.58711177110672, "min": -0.88812255859375, "p10": -0.24270286560058593, "median": 0.25726318359375, "p90": 0.9778804779052738, "max": 2.573089599609375, "pos_frac": 0.703125, "sample": [0.3727264404296875, 0.099273681640625, -0.88812255859375, -0.07525253295898438, 0.26389312744140625, 0.6066665649414062, -0.14137649536132812, 1.9661865234375, 0.375396728515625, 1.2865447998046875, 0.6327056884765625, 0.21706390380859375, 0.31203460693359375, 0.12126922607421875, -0.0064029693603515625, -0.24454879760742188, 0.30511474609375, -0.11693954467773438, -0.21031761169433594, 0.8236923217773438, -0.09229278564453125, 0.3041982650756836, 0.89385986328125, 0.25063323974609375, -0.45386314392089844, -0.050075531005859375, -0.09329986572265625, 0.17679977416992188, 0.033458709716796875, 0.20083999633789062, 0.3066558837890625, 0.4400787353515625, 1.0316162109375, 0.5421905517578125, -0.6373443603515625, 0.3817291259765625, 1.0529708862304688, 0.42528533935546875, 0.2842292785644531, 0.01862335205078125, 0.21062469482421875, -0.08962631225585938, -0.17629241943359375, 0.42565155029296875, 0.178680419921875, 2.573089599609375, 1.9336700439453125, 0.41131591796875, 0.32354736328125, -0.31894683837890625, 0.6501617431640625, -0.07779693603515625, 1.0138893127441406, 0.4349861145019531, -0.23839569091796875, 0.057525634765625, -0.3950042724609375, 0.101318359375, 0.37761497497558594, 0.829620361328125, 0.4115447998046875, 0.6026458740234375, 0.016246795654296875, -0.579437255859375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000045.npy"}
{"epoch": 0.06607929515418502, "step": 46, "batch_size": 64, "mean": 0.4510946273803711, "std": 0.5304697751998901, "min": -0.9152679443359375, "p10": -0.10133972167968744, "median": 0.35236167907714844, "p90": 1.2520904541015634, "max": 1.824188232421875, "pos_frac": 0.859375, "sample": [0.666259765625, -0.9152679443359375, 0.72222900390625, 1.804931640625, 0.6940536499023438, 0.46492767333984375, 0.06246185302734375, -0.2558479309082031, -0.43894195556640625, 0.860870361328125, 0.5926361083984375, 0.13197708129882812, 0.22124481201171875, 1.4041290283203125, 1.351287841796875, 0.13791656494140625, 0.32440948486328125, -0.12427139282226562, 0.5645370483398438, 0.2003173828125, 0.12650299072265625, -0.364105224609375, 0.33660888671875, 0.48419952392578125, 0.3689117431640625, 1.0206298828125, 1.364166259765625, 0.5479888916015625, 1.0122299194335938, 0.2156524658203125, 0.295257568359375, 0.124481201171875, 0.952850341796875, 0.1198577880859375, 0.735748291015625, 0.04814910888671875, -0.047832489013671875, 0.3494071960449219, 0.12418365478515625, 0.7918319702148438, -0.0033969879150390625, 0.44061279296875, 0.5868949890136719, 0.0329437255859375, 1.824188232421875, 0.5142555236816406, 0.3620452880859375, 0.308197021484375, 0.00079345703125, 0.703857421875, 0.355316162109375, 1.600372314453125, 0.26084136962890625, 0.612762451171875, 0.2806396484375, 0.8352508544921875, 0.7330207824707031, 0.6630325317382812, -0.3741455078125, 1.4753570556640625, 0.133087158203125, 0.3440971374511719, -0.1925678253173828, 0.2960205078125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000046.npy"}
{"epoch": 0.06754772393538913, "step": 47, "batch_size": 64, "mean": 0.4063158631324768, "std": 0.7473422884941101, "min": -1.2518997192382812, "p10": -0.3665565490722656, "median": 0.3009910583496094, "p90": 1.3478893280029298, "max": 2.8862838745117188, "pos_frac": 0.6875, "sample": [0.4188385009765625, 1.3861083984375, -0.2703857421875, 0.16666412353515625, 2.8862838745117188, -0.49176788330078125, -0.24362945556640625, 1.1006546020507812, -0.37044525146484375, 0.20206451416015625, 0.29488372802734375, -1.2332611083984375, 1.1314010620117188, 0.6582107543945312, 0.26480865478515625, 1.0027122497558594, 0.6136016845703125, -0.22188186645507812, -0.076080322265625, 0.47503662109375, 0.7889785766601562, 0.236328125, 0.2367877960205078, 0.20331954956054688, 0.5524673461914062, 0.18046188354492188, 1.0667037963867188, 0.554718017578125, 0.307098388671875, 0.3845100402832031, -0.23786449432373047, -0.049713134765625, 0.632537841796875, 1.2131423950195312, 0.7193069458007812, -0.12291145324707031, -0.7609710693359375, 0.0930328369140625, 0.8136138916015625, 0.4404296875, 2.1353759765625, 0.8994598388671875, 0.6832847595214844, -0.35748291015625, 1.4023284912109375, 0.1546173095703125, -0.0003814697265625, 1.2984809875488281, 0.05181884765625, 1.7420501708984375, -1.2518997192382812, 1.3690643310546875, -0.579681396484375, 0.4670257568359375, -0.5054473876953125, -0.239990234375, 1.2541351318359375, 0.6840133666992188, -0.01177978515625, -0.08348274230957031, 1.7660675048828125, 0.0314483642578125, 0.3617668151855469, -0.212371826171875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000047.npy"}
{"epoch": 0.06901615271659324, "step": 48, "batch_size": 64, "mean": 0.4999656081199646, "std": 0.8587316870689392, "min": -2.2722015380859375, "p10": -0.2348825454711914, "median": 0.3773460388183594, "p90": 1.5409725189208985, "max": 2.97930908203125, "pos_frac": 0.75, "sample": [-0.1558074951171875, 0.4761238098144531, 1.87384033203125, 0.9808502197265625, 1.5456771850585938, 1.93017578125, -0.501434326171875, -0.820281982421875, 0.2620086669921875, 0.6248970031738281, 0.4778900146484375, 1.1428680419921875, -2.2722015380859375, 2.0273284912109375, -0.01415252685546875, -0.2609214782714844, 1.231658935546875, 0.02565288543701172, 0.2436065673828125, 0.7212104797363281, 0.42700958251953125, 0.2607688903808594, -0.21550559997558594, 0.85736083984375, 0.180084228515625, 2.396820068359375, 0.1701812744140625, 1.3791732788085938, 0.07069015502929688, 0.7936496734619141, 0.17859649658203125, 0.5505218505859375, 0.9839096069335938, -0.13742446899414062, 0.1125640869140625, -1.54595947265625, 0.5610733032226562, 1.2456512451171875, 0.064849853515625, 0.5424957275390625, -0.15230941772460938, -0.2660980224609375, 0.3276824951171875, -0.09619903564453125, 1.419189453125, -0.16930770874023438, 0.2870063781738281, 0.3170013427734375, 0.135284423828125, 0.16359710693359375, -0.24318695068359375, 1.0256614685058594, 1.474090576171875, 0.6358413696289062, 1.902618408203125, 2.97930908203125, 0.8773994445800781, -0.20897293090820312, -0.16257095336914062, 0.438140869140625, 0.25543212890625, 0.6263465881347656, 0.4863471984863281, 1.5299949645996094], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000048.npy"}
{"epoch": 0.07048458149779736, "step": 49, "batch_size": 64, "mean": 0.7768339514732361, "std": 1.063268780708313, "min": -1.1688079833984375, "p10": -0.21381454467773436, "median": 0.4071922302246094, "p90": 2.69670639038086, "max": 3.76458740234375, "pos_frac": 0.796875, "sample": [0.2688102722167969, 0.5628395080566406, -0.212890625, 1.0614509582519531, 0.1368255615234375, 1.07000732421875, -0.438323974609375, 0.020648956298828125, 0.0303497314453125, -0.454742431640625, 0.3225822448730469, 0.08312225341796875, 1.636077880859375, 2.7573623657226562, 0.310882568359375, 1.3652267456054688, -0.15438079833984375, 0.1065521240234375, 3.0069808959960938, 1.0867080688476562, -0.22288131713867188, -0.7023849487304688, 0.40201568603515625, -0.31255531311035156, 2.55517578125, 1.42474365234375, 0.9443817138671875, -1.1688079833984375, 0.27567291259765625, 0.3732948303222656, 1.6573104858398438, 2.9514007568359375, 0.4872283935546875, 0.7776641845703125, 0.46271514892578125, 0.639801025390625, -0.15201187133789062, 0.2658805847167969, -0.1150970458984375, 3.76458740234375, 0.2638092041015625, 0.06727027893066406, 0.62310791015625, -0.21421051025390625, 2.3311080932617188, 2.96954345703125, 0.321868896484375, 0.7963790893554688, 0.1888427734375, 0.9584503173828125, 0.6682109832763672, 0.31363677978515625, 3.2956771850585938, 2.8222732543945312, 0.6523590087890625, 0.24124908447265625, 0.4123687744140625, 0.6367568969726562, 0.3872642517089844, 1.5347061157226562, 1.979736328125, -0.1029815673828125, -0.0034027099609375, 1.699127197265625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000049.npy"}
{"epoch": 0.07195301027900147, "step": 50, "batch_size": 64, "mean": 0.6025470495223999, "std": 0.9507008790969849, "min": -1.2895050048828125, "p10": -0.5529907226562498, "median": 0.41659069061279297, "p90": 1.8927749633789064, "max": 2.98785400390625, "pos_frac": 0.75, "sample": [1.82281494140625, 0.4524383544921875, 0.4627838134765625, 0.1294097900390625, 0.20206451416015625, 0.7218170166015625, -0.00695037841796875, -1.1811561584472656, 0.5106430053710938, 0.38074302673339844, -0.1694793701171875, 1.6868667602539062, 0.22876739501953125, 1.569061279296875, -0.3687591552734375, -0.607208251953125, 1.9634628295898438, 0.3805084228515625, 1.021087646484375, 2.4179229736328125, -0.6207847595214844, 1.90521240234375, 0.9640121459960938, 0.75628662109375, -0.4119071960449219, 1.2416343688964844, -0.7323493957519531, 0.19000625610351562, 0.26806640625, 1.4297027587890625, 0.2866973876953125, 1.8161354064941406, 1.9191970825195312, 2.98785400390625, -0.102630615234375, 1.372323989868164, -1.2895050048828125, 1.8637542724609375, 0.09455108642578125, 1.1623992919921875, -0.9256820678710938, 1.4695053100585938, 0.5370979309082031, -0.10662078857421875, 1.14581298828125, 0.49143218994140625, -0.426483154296875, 0.27556610107421875, 1.2016525268554688, -0.851287841796875, 0.78265380859375, 0.24881744384765625, 1.1618919372558594, 0.26778411865234375, 0.223907470703125, 0.2451171875, 0.00092315673828125, 0.8362045288085938, 0.4532318115234375, -0.3457183837890625, 2.3245773315429688, 2.8191757202148438, -0.0179290771484375, 0.03388786315917969], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000050.npy"}
{"epoch": 0.07342143906020558, "step": 51, "batch_size": 64, "mean": 0.6005125641822815, "std": 1.067212700843811, "min": -2.0585403442382812, "p10": -0.4406101226806641, "median": 0.47777366638183594, "p90": 1.7722061157226563, "max": 4.2667694091796875, "pos_frac": 0.71875, "sample": [0.8357162475585938, -0.6811981201171875, 0.05875396728515625, -0.44258880615234375, 0.2712593078613281, 0.8012542724609375, 0.9820404052734375, -0.6752548217773438, -2.0585403442382812, -1.52911376953125, 2.7887420654296875, 0.974822998046875, 0.7046642303466797, 1.4196319580078125, 0.3778724670410156, 1.784881591796875, 1.0501861572265625, 1.690603256225586, 1.2255401611328125, 0.8279380798339844, 0.10405349731445312, 2.0130767822265625, 0.2875633239746094, 0.2176361083984375, 0.446807861328125, 1.521728515625, 0.5855751037597656, 0.8883895874023438, 0.4349021911621094, -0.28238677978515625, 0.455474853515625, 0.03739166259765625, 0.08008193969726562, 0.5421905517578125, -0.4359931945800781, 4.2667694091796875, 0.5000724792480469, 1.7426300048828125, 1.4340057373046875, 0.51361083984375, -0.209228515625, 0.3330116271972656, 1.7365951538085938, 2.6993560791015625, 0.423431396484375, 1.0481224060058594, 0.6759490966796875, -0.22524261474609375, -0.23847198486328125, 2.1231536865234375, 3.06866455078125, -0.984588623046875, -0.31877899169921875, -0.7729644775390625, -0.05525779724121094, 0.19219207763671875, -0.12140083312988281, -0.3720550537109375, 0.9856719970703125, 1.3275604248046875, 0.8521347045898438, -0.0296173095703125, 0.9270553588867188, -0.393280029296875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000051.npy"}
{"epoch": 0.07488986784140969, "step": 52, "batch_size": 64, "mean": 0.8083518743515015, "std": 0.8565996885299683, "min": -0.5957908630371094, "p10": -0.14503173828124996, "median": 0.7307224273681641, "p90": 1.6487930297851563, "max": 4.4656524658203125, "pos_frac": 0.859375, "sample": [0.08744430541992188, 0.9160003662109375, 0.5154647827148438, -0.48885345458984375, -0.3353118896484375, -0.19696044921875, 1.2451248168945312, 0.9334640502929688, 0.2513427734375, 3.0214996337890625, 0.29016876220703125, 0.9947547912597656, 0.733856201171875, -0.2250213623046875, 0.8370208740234375, 1.0353851318359375, 1.5181655883789062, 1.035797119140625, 0.905364990234375, 1.6476287841796875, 1.1363964080810547, 0.6624374389648438, 0.899658203125, 0.6868934631347656, 0.39208984375, 0.74298095703125, -0.0565185546875, 1.1656532287597656, 0.7275886535644531, 1.7631072998046875, 4.4656524658203125, 0.06912994384765625, -0.119140625, 0.2923431396484375, 0.4865455627441406, 1.1715087890625, 2.1190109252929688, 0.05328369140625, 0.1285247802734375, 2.1267433166503906, -0.1561279296875, 1.5111770629882812, 0.7582473754882812, 0.10269927978515625, 0.5142898559570312, 1.6492919921875, 1.3128318786621094, 1.404296875, 0.1209259033203125, 0.4674568176269531, 1.1663551330566406, 1.3206329345703125, 0.5467605590820312, 0.45777130126953125, -0.5957908630371094, 2.620208740234375, 0.06457901000976562, 1.106170654296875, 0.6790313720703125, 1.4577102661132812, 1.0304641723632812, 0.3778076171875, 0.43390846252441406, -0.25240325927734375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000052.npy"}
{"epoch": 0.0763582966226138, "step": 53, "batch_size": 64, "mean": 1.0885956287384033, "std": 1.4301378726959229, "min": -1.14373779296875, "p10": -0.4161579132080077, "median": 0.8546781539916992, "p90": 2.8429740905761722, "max": 7.790496826171875, "pos_frac": 0.859375, "sample": [0.356201171875, 0.5484695434570312, 0.16840362548828125, 4.1263427734375, 0.24317169189453125, -1.14373779296875, 1.7633438110351562, -0.49266815185546875, 1.4750938415527344, 0.4422454833984375, 3.5395660400390625, 7.790496826171875, 0.9709014892578125, 1.0248947143554688, 0.4266510009765625, 0.6506996154785156, -0.84002685546875, 1.0201644897460938, 0.0011749267578125, 1.7695541381835938, 1.0890045166015625, 0.448944091796875, 2.9001235961914062, 1.1154632568359375, 0.9874153137207031, 0.7421035766601562, 1.8843994140625, 0.33807373046875, 2.1889877319335938, 0.09939956665039062, 3.7453765869140625, -0.6916580200195312, 1.2769889831542969, 0.7979202270507812, 1.3178253173828125, 1.0638046264648438, -0.4507408142089844, 0.9427833557128906, 0.12649154663085938, 0.45473480224609375, 0.8638153076171875, -0.7786521911621094, 1.534994125366211, 3.1638031005859375, 0.7455596923828125, 2.0628318786621094, 2.6717300415039062, -1.0894851684570312, -0.263519287109375, 0.5725898742675781, 0.581390380859375, 0.16392898559570312, 0.8804435729980469, -0.3354644775390625, 1.1638336181640625, 0.3224945068359375, 0.8455410003662109, 2.5601272583007812, 0.6290168762207031, 2.760040283203125, 0.9371490478515625, 2.2229461669921875, 2.8785171508789062, 0.3581085205078125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000053.npy"}
{"epoch": 0.07782672540381791, "step": 54, "batch_size": 64, "mean": 1.0387556552886963, "std": 1.2162748575210571, "min": -1.5294647216796875, "p10": -0.16211891174316406, "median": 0.8818397521972656, "p90": 2.549137115478516, "max": 4.765533447265625, "pos_frac": 0.8125, "sample": [0.4282073974609375, 2.6611862182617188, -0.13382720947265625, -0.36324310302734375, 1.10943603515625, -0.12041473388671875, 1.9663009643554688, -0.4665985107421875, 2.244537353515625, 1.2025070190429688, 0.9530086517333984, 0.945037841796875, 1.527557373046875, 0.8073692321777344, -1.1006546020507812, 0.96197509765625, 0.5491981506347656, 0.44190025329589844, 1.7420387268066406, 3.4225692749023438, -0.15294647216796875, 2.1012954711914062, 2.4700164794921875, -0.05210113525390625, 2.5830459594726562, -0.16604995727539062, 1.0891799926757812, 1.9992904663085938, 1.1912345886230469, 1.8642845153808594, 0.1748371124267578, 0.8501777648925781, 2.4023513793945312, 0.191192626953125, -0.5238113403320312, 1.5639266967773438, 3.43536376953125, 0.3249664306640625, 0.5738525390625, 1.8374404907226562, 0.481201171875, 1.7491912841796875, 2.30487060546875, -1.5294647216796875, -0.3228797912597656, 1.3252487182617188, 0.1076812744140625, 0.901702880859375, 1.2710456848144531, 0.5469207763671875, 0.8619766235351562, 0.25547027587890625, 1.0242843627929688, 0.2294769287109375, 0.14413070678710938, 4.162078857421875, 0.17214584350585938, 1.5067214965820312, -0.08359527587890625, 0.7357330322265625, 0.19460487365722656, 4.765533447265625, 0.021640777587890625, 3.1190032958984375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000054.npy"}
{"epoch": 0.07929515418502203, "step": 55, "batch_size": 64, "mean": 1.1621229648590088, "std": 1.6585365533828735, "min": -1.87359619140625, "p10": -0.3004417419433593, "median": 0.535400390625, "p90": 3.197885513305664, "max": 7.049659729003906, "pos_frac": 0.765625, "sample": [2.6071929931640625, 1.4181938171386719, -0.1017303466796875, 2.5553741455078125, 3.5423583984375, -0.5302352905273438, 0.8877182006835938, 1.6352500915527344, 3.6379241943359375, 2.424285888671875, 1.336029052734375, 0.6108036041259766, 0.5997314453125, 1.2231597900390625, 7.049659729003906, 2.5483245849609375, 1.2099609375, 0.37616729736328125, 2.769611358642578, -1.4144210815429688, 2.4397735595703125, -0.21619415283203125, 0.45514678955078125, -0.09726715087890625, 4.982635498046875, 0.1352081298828125, 2.0643157958984375, 2.4987411499023438, 0.2730903625488281, 0.13775253295898438, -0.961181640625, 1.88763427734375, 1.3364200592041016, 1.1395854949951172, -0.0441741943359375, -0.31935882568359375, 0.28076171875, -0.574554443359375, 2.5686912536621094, 1.2572555541992188, 0.3250885009765625, 0.08920669555664062, 0.0397186279296875, 5.949638366699219, 3.193065643310547, -0.2563018798828125, 0.16028213500976562, 0.35704803466796875, -0.12786483764648438, -1.87359619140625, 0.28704071044921875, 0.4710693359375, 3.199951171875, 0.3364715576171875, 1.1484375, 0.3070068359375, 0.2528228759765625, 3.3875961303710938, -0.09874725341796875, 2.4512786865234375, -0.634124755859375, 0.1633148193359375, -0.0580596923828125, 1.6758880615234375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000055.npy"}
{"epoch": 0.08076358296622614, "step": 56, "batch_size": 64, "mean": 1.0214989185333252, "std": 1.6304402351379395, "min": -3.907470703125, "p10": -0.4282203674316406, "median": 0.798896312713623, "p90": 2.634797668457032, "max": 7.0074462890625, "pos_frac": 0.765625, "sample": [1.2334976196289062, 1.145355224609375, -0.0777587890625, 1.71063232421875, -0.42543792724609375, 0.2550811767578125, 4.1481475830078125, 0.39311981201171875, -0.8531303405761719, -0.3341712951660156, 0.731414794921875, 1.1594085693359375, -0.429412841796875, 2.684295654296875, 0.932098388671875, 0.1852874755859375, 0.8333740234375, -0.07288360595703125, 6.219673156738281, 2.5193023681640625, -0.6001243591308594, 0.7476806640625, 1.501800537109375, 1.4921417236328125, 2.0897216796875, 0.888214111328125, 0.010608673095703125, -0.1922779083251953, 1.1789817810058594, 0.6133193969726562, 1.7136154174804688, 3.599884033203125, 1.4050674438476562, 0.7888326644897461, 0.6278305053710938, 2.4530410766601562, -0.229583740234375, 0.8089599609375, 2.1728363037109375, 4.482460021972656, 1.67987060546875, 0.5979080200195312, 1.302093505859375, 7.0074462890625, 1.7519378662109375, 1.167694091796875, -0.5865364074707031, -0.655792236328125, 0.6603355407714844, 1.7150802612304688, 0.32171630859375, -0.1526041030883789, 0.9128379821777344, 1.7084808349609375, 0.305572509765625, 0.36675262451171875, 0.35079193115234375, -3.907470703125, 0.20283126831054688, 1.5682449340820312, -0.9785308837890625, -0.395050048828125, 0.034030914306640625, 2.8873825073242188], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000056.npy"}
{"epoch": 0.08223201174743025, "step": 57, "batch_size": 64, "mean": 1.1166605949401855, "std": 1.4715468883514404, "min": -1.4610595703125, "p10": -0.4488388061523437, "median": 0.7616539001464844, "p90": 3.226908111572266, "max": 4.8101806640625, "pos_frac": 0.765625, "sample": [0.843048095703125, 1.1038856506347656, 0.78106689453125, 2.8385238647460938, 2.8666763305664062, -0.216552734375, 3.3551788330078125, -0.2031230926513672, 1.5506591796875, 0.6371612548828125, 1.6296234130859375, 0.38440704345703125, 0.9102706909179688, 0.2010498046875, 1.77764892578125, 4.253204345703125, 0.10845565795898438, 1.989837646484375, -1.4610595703125, 4.8101806640625, 2.4210586547851562, 0.8830718994140625, 0.6723785400390625, 0.7386131286621094, -0.012762069702148438, 0.16119766235351562, 2.27984619140625, -0.3877105712890625, 0.9263534545898438, 2.9234542846679688, 2.0009689331054688, 0.5097808837890625, 0.23961639404296875, 0.05136680603027344, -1.31634521484375, 4.5557861328125, 0.3996124267578125, 0.7422409057617188, 0.5414657592773438, 1.4575958251953125, 4.0375823974609375, 0.18016433715820312, 1.53143310546875, 0.5631790161132812, 2.1227874755859375, 0.8785476684570312, 0.6544723510742188, -0.5100479125976562, 3.245880126953125, -0.35234642028808594, -0.7184219360351562, 3.5575714111328125, -0.9650993347167969, -0.257843017578125, 1.99078369140625, 0.5620651245117188, -0.2587432861328125, 1.8034858703613281, -0.47503662109375, -0.9668502807617188, -0.2896614074707031, 3.03155517578125, 0.9704437255859375, 3.1826400756835938], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000057.npy"}
{"epoch": 0.08370044052863436, "step": 58, "batch_size": 64, "mean": 1.503377914428711, "std": 1.9578640460968018, "min": -1.80633544921875, "p10": -0.11366119384765624, "median": 0.7111606597900391, "p90": 4.365526580810547, "max": 8.486129760742188, "pos_frac": 0.8125, "sample": [2.5300674438476562, 0.38763427734375, -0.08350563049316406, 2.040679931640625, 0.7310104370117188, -0.02979278564453125, 0.04840850830078125, 1.3733367919921875, 2.3997421264648438, 0.7073478698730469, 0.58831787109375, 5.8651275634765625, 0.26454925537109375, 1.8136138916015625, 0.3033638000488281, 0.9544525146484375, 0.9565162658691406, 0.08396148681640625, 2.3270721435546875, 3.2316246032714844, 0.792694091796875, 0.6400222778320312, 4.403861999511719, -0.6993408203125, 3.1947784423828125, 0.9138565063476562, 2.5823898315429688, 4.131917953491211, 3.9325332641601562, 0.5588035583496094, 8.486129760742188, 0.6965255737304688, 0.26793670654296875, -0.11490631103515625, 0.5370903015136719, -0.06564140319824219, 0.45821380615234375, 0.9203529357910156, 4.2760772705078125, -0.744903564453125, 0.15789031982421875, 0.41926002502441406, 5.9542083740234375, 2.307403564453125, 3.2010345458984375, 4.523754119873047, 0.7149734497070312, 0.394134521484375, 2.1333465576171875, 5.759796142578125, -0.12776756286621094, -0.2261810302734375, 0.20412063598632812, 2.2469253540039062, 0.4301719665527344, 0.33179473876953125, 5.028778076171875, -1.80633544921875, -0.0623626708984375, 0.34809112548828125, -0.11075592041015625, -0.4203376770019531, 0.8488082885742188, 2.3035125732421875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000058.npy"}
{"epoch": 0.08516886930983847, "step": 59, "batch_size": 64, "mean": 1.272177815437317, "std": 1.5715222358703613, "min": -1.92584228515625, "p10": -0.3638580322265625, "median": 0.8322877883911133, "p90": 2.87466049194336, "max": 6.539459228515625, "pos_frac": 0.8125, "sample": [0.76995849609375, -0.38768768310546875, 2.649871826171875, 0.7421875, 0.7157859802246094, 2.9141998291015625, 0.0808563232421875, 5.3847808837890625, 0.22411346435546875, 3.8289337158203125, -0.3528938293457031, 1.8191680908203125, 2.5265655517578125, -0.029199600219726562, 2.2341156005859375, 0.9008674621582031, -0.18850326538085938, 2.6752777099609375, 0.8832855224609375, 0.9971694946289062, -0.5617294311523438, -1.92584228515625, 0.02257537841796875, 1.3420562744140625, 1.7433929443359375, 2.6909027099609375, 0.8495502471923828, 0.11554527282714844, 1.688018798828125, -0.6934051513671875, 1.0577621459960938, -0.40981292724609375, 0.5888595581054688, 0.56298828125, 0.774871826171875, 0.5384750366210938, 1.9394569396972656, 0.6368255615234375, 2.2837066650390625, 1.563385009765625, 1.4885520935058594, 3.7066802978515625, 0.19171142578125, -0.80535888671875, 0.5962753295898438, 0.8150253295898438, 1.9982147216796875, 0.8147048950195312, 2.5355148315429688, 2.0859107971191406, 1.4702682495117188, -0.13304901123046875, 1.9745864868164062, 2.7824020385742188, 0.6610374450683594, -0.2450103759765625, 1.0968208312988281, 0.36348724365234375, -0.3685569763183594, 0.39879608154296875, 5.766448974609375, 3.9825973510742188, 0.5064239501953125, 6.539459228515625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000059.npy"}
{"epoch": 0.08663729809104258, "step": 60, "batch_size": 64, "mean": 1.2155587673187256, "std": 1.4714477062225342, "min": -1.95367431640625, "p10": -0.1440301895141601, "median": 0.9920597076416016, "p90": 3.462558746337891, "max": 5.5161895751953125, "pos_frac": 0.8125, "sample": [0.4229927062988281, 3.845245361328125, 0.09818458557128906, -0.1631011962890625, 0.5272979736328125, 1.0667228698730469, 2.5074539184570312, 3.5149383544921875, 1.20245361328125, 2.1514434814453125, 1.3157539367675781, 1.2459564208984375, -0.09953117370605469, -0.041961669921875, 2.8675918579101562, 1.8973846435546875, 1.90118408203125, 2.382537841796875, 1.14263916015625, 1.358062744140625, 4.3880462646484375, 1.0918121337890625, 5.5161895751953125, 2.6868743896484375, 1.1100845336914062, 1.1300697326660156, 0.2638359069824219, -0.0789794921875, 0.9173965454101562, -0.23724746704101562, 0.24670791625976562, 4.777000427246094, 0.7528076171875, 3.5571517944335938, 0.020771026611328125, 0.10745429992675781, -0.80810546875, 1.484649658203125, -0.074615478515625, -0.0039825439453125, 1.193389892578125, 2.2007904052734375, 2.5288009643554688, 2.7640304565429688, 4.168037414550781, -0.2288188934326172, 0.4676513671875, 3.3403396606445312, 0.866485595703125, 0.1648101806640625, 1.6667938232421875, -0.565887451171875, 0.6236724853515625, -1.0449371337890625, 1.4007759094238281, 0.262939453125, -1.95367431640625, 0.0642242431640625, 1.4610366821289062, 0.8010215759277344, 0.433502197265625, 0.65545654296875, 0.4073657989501953, 0.1267852783203125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000060.npy"}
{"epoch": 0.0881057268722467, "step": 61, "batch_size": 64, "mean": 1.110102653503418, "std": 1.715085744857788, "min": -2.6710205078125, "p10": -0.3156900405883789, "median": 0.6351375579833984, "p90": 3.2669895172119143, "max": 9.109710693359375, "pos_frac": 0.8125, "sample": [-0.29915428161621094, 0.5807266235351562, 0.545196533203125, 0.305450439453125, 1.7495269775390625, 0.6050262451171875, 0.4442024230957031, 3.3142433166503906, -0.0026092529296875, 2.4419631958007812, 0.35186004638671875, 5.1290435791015625, 2.375335693359375, -0.20250320434570312, 1.4199600219726562, 1.5010948181152344, -0.32277679443359375, 0.6375694274902344, 0.6899566650390625, -0.41297149658203125, 3.1567306518554688, -2.052825927734375, 3.43853759765625, 1.0283889770507812, 1.8669052124023438, 0.49449920654296875, 1.0364532470703125, 0.6245765686035156, 0.2795143127441406, 2.369781494140625, 1.1103897094726562, 0.948486328125, 1.4006576538085938, 0.6327056884765625, 9.109710693359375, 2.6631622314453125, 0.654388427734375, -0.22670745849609375, -2.6710205078125, -0.3894805908203125, 1.0784378051757812, -0.21600723266601562, 3.93994140625, 0.9742851257324219, 0.09356307983398438, 4.180908203125, 0.3119964599609375, 0.3459320068359375, 0.4224853515625, 1.1652984619140625, 3.9805030822753906, 0.19179534912109375, -0.8146514892578125, 0.3145732879638672, 0.398162841796875, 0.23073959350585938, 0.25830841064453125, 0.5124664306640625, 2.4440383911132812, 1.8968467712402344, 1.4150772094726562, 0.9315948486328125, -0.42181396484375, 1.0860919952392578], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000061.npy"}
{"epoch": 0.08957415565345081, "step": 62, "batch_size": 64, "mean": 1.1418296098709106, "std": 1.9877238273620605, "min": -7.294677734375, "p10": -0.15011444091796874, "median": 0.7781257629394531, "p90": 3.5032310485839857, "max": 8.869888305664062, "pos_frac": 0.859375, "sample": [0.5472030639648438, 0.5133323669433594, -1.2625732421875, 3.8255767822265625, 0.789031982421875, 0.45867919921875, 1.0788612365722656, 0.015300750732421875, 2.4687271118164062, -0.637969970703125, 4.35302734375, 0.24680328369140625, 0.7186069488525391, 0.15418243408203125, 0.48744964599609375, -0.06901741027832031, 1.43939208984375, 1.60919189453125, 0.016081809997558594, 1.3939895629882812, 0.8135719299316406, 3.6395645141601562, 1.6844329833984375, -0.14018630981445312, 0.7672195434570312, 5.1416778564453125, -0.23418045043945312, 3.18511962890625, 1.153371810913086, 6.56695556640625, 1.1159706115722656, 3.7182769775390625, 2.0972137451171875, 0.9256744384765625, 0.7334136962890625, 2.1580581665039062, 0.3930511474609375, 0.5592193603515625, 0.7576904296875, 0.45953369140625, 0.318756103515625, -0.15436935424804688, 0.2629051208496094, 1.1608314514160156, 0.0920562744140625, 0.635162353515625, 1.366302490234375, 0.9676589965820312, 2.23968505859375, 0.9119720458984375, 1.3051071166992188, 0.019186019897460938, -0.4537506103515625, 1.6287078857421875, 2.551668167114258, 0.4644737243652344, 0.2007884979248047, 1.1341552734375, 8.869888305664062, -0.2309417724609375, 1.3444671630859375, 1.3917312622070312, 0.7038040161132812, -7.294677734375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000062.npy"}
{"epoch": 0.09104258443465492, "step": 63, "batch_size": 64, "mean": 1.5261400938034058, "std": 1.6241228580474854, "min": -2.0800094604492188, "p10": -0.20030689239501953, "median": 1.2479476928710938, "p90": 3.4574050903320312, "max": 6.566444396972656, "pos_frac": 0.8125, "sample": [0.37247467041015625, 3.4475250244140625, 2.565380096435547, 3.3121185302734375, -0.10418701171875, 0.07857513427734375, -0.2037200927734375, 3.2669525146484375, 3.0970916748046875, 1.1405525207519531, 1.204132080078125, 1.4693374633789062, 0.535186767578125, 0.4690589904785156, -0.19234275817871094, -0.104278564453125, 4.46160888671875, 0.4423084259033203, 0.03983306884765625, 1.86944580078125, 6.566444396972656, 1.2917633056640625, 1.6697769165039062, -0.13055419921875, 0.50994873046875, 0.5368118286132812, 1.1468734741210938, 0.13684844970703125, 2.7679367065429688, 0.8180294036865234, 3.394439697265625, 1.918182373046875, 3.0024337768554688, 0.5992546081542969, 2.33197021484375, 2.4387054443359375, 3.192157745361328, 2.8747100830078125, 1.4838848114013672, 0.5665283203125, 0.17254638671875, 1.9578704833984375, -0.21622467041015625, -0.75299072265625, -2.0800094604492188, 2.6596946716308594, 0.30539703369140625, 0.73907470703125, 1.982452392578125, 0.12994384765625, -0.126495361328125, 4.294364929199219, 4.136512756347656, -0.360504150390625, 2.298473358154297, -0.2657279968261719, 1.664093017578125, 4.9539794921875, 1.408843994140625, -0.3857383728027344, 3.4720306396484375, 2.8112945556640625, 3.461639404296875, 1.1292495727539062], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000063.npy"}
{"epoch": 0.09251101321585903, "step": 64, "batch_size": 64, "mean": 1.7069969177246094, "std": 2.185443639755249, "min": -1.5686416625976562, "p10": -0.1212432861328125, "median": 1.0959205627441406, "p90": 4.675096130371094, "max": 9.208160400390625, "pos_frac": 0.828125, "sample": [-1.533721923828125, 0.8887557983398438, 4.838470458984375, 1.1125869750976562, 0.4147224426269531, 1.0956649780273438, 0.5847816467285156, 0.06302642822265625, 1.1002960205078125, 0.27478790283203125, 0.02384185791015625, 1.8784446716308594, 0.39658355712890625, 1.1606292724609375, -0.5979690551757812, 0.02685546875, 1.3592376708984375, 0.35523223876953125, 5.426471710205078, 0.9381370544433594, 2.857593536376953, -1.5686416625976562, 4.7506866455078125, -0.12137603759765625, -1.3798294067382812, 0.8477096557617188, -0.0234832763671875, 0.5574874877929688, -0.068603515625, 2.469390869140625, 0.00489044189453125, -0.12093353271484375, 9.208160400390625, 1.3580093383789062, 0.05013275146484375, 2.435028076171875, -0.7732467651367188, 2.1402435302734375, 5.2590789794921875, 1.0961761474609375, 0.6694736480712891, 1.7584228515625, 4.49871826171875, 4.183998107910156, -0.1869659423828125, 1.1149749755859375, 3.696258544921875, 2.789703369140625, 6.729217529296875, 2.6558303833007812, 3.73492431640625, 1.5133514404296875, 3.4205093383789062, 0.3216209411621094, 7.810615539550781, -0.005680084228515625, 0.09531021118164062, 1.079132080078125, 3.5978317260742188, 3.540802001953125, 1.8657188415527344, 0.5984039306640625, 4.295509338378906, 0.684814453125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000064.npy"}
{"epoch": 0.09397944199706314, "step": 65, "batch_size": 64, "mean": 1.7465707063674927, "std": 2.0688929557800293, "min": -2.65216064453125, "p10": -0.1328948974609375, "median": 1.316812515258789, "p90": 4.719104003906251, "max": 9.03912353515625, "pos_frac": 0.84375, "sample": [4.54205322265625, 5.8557281494140625, 1.61932373046875, 0.9145050048828125, 1.034515380859375, -0.11492919921875, -0.6440200805664062, 2.1521835327148438, 1.1889152526855469, 0.5875244140625, 9.03912353515625, -0.6332473754882812, 0.16916656494140625, 0.9824676513671875, 4.79498291015625, 0.7950363159179688, 2.8913116455078125, 0.583094596862793, 1.8749847412109375, -2.65216064453125, 2.004274368286133, 4.050628662109375, 1.85589599609375, 1.3366355895996094, 2.9700775146484375, 5.5206146240234375, 4.23077392578125, 2.741985321044922, 5.041816711425781, 0.08341598510742188, 1.4781379699707031, 0.61669921875, -0.5544967651367188, 0.7786788940429688, 2.178680419921875, 1.7804222106933594, 2.15625, -0.091644287109375, 5.899101257324219, 0.40973472595214844, -0.140594482421875, -2.02398681640625, 0.9493408203125, 0.29941749572753906, 6.870697021484375, 0.92010498046875, 2.8167266845703125, 0.20684814453125, 2.85302734375, 3.6644973754882812, 0.9453811645507812, -0.14862060546875, 1.60369873046875, 1.9364128112792969, 2.601287841796875, 0.167694091796875, 1.2969894409179688, -0.0174560546875, 1.409332275390625, 1.817840576171875, 0.28006744384765625, 2.0988311767578125, 0.7242546081542969, 1.1804924011230469], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000065.npy"}
{"epoch": 0.09544787077826726, "step": 66, "batch_size": 64, "mean": 1.36179780960083, "std": 2.001673698425293, "min": -3.494640350341797, "p10": -0.5732147216796875, "median": 0.8548641204833984, "p90": 4.482350158691407, "max": 7.349555969238281, "pos_frac": 0.84375, "sample": [2.663043975830078, 5.6063079833984375, 4.2261199951171875, 0.43096923828125, 1.2661895751953125, 0.3910713195800781, 0.5406379699707031, 0.5816841125488281, 0.16702842712402344, 1.3895950317382812, 2.4488372802734375, 0.4881439208984375, 1.1837158203125, 0.2422332763671875, -1.4321784973144531, -3.494640350341797, 1.2821426391601562, 0.22348594665527344, 0.8950614929199219, 7.349555969238281, 0.8942947387695312, 2.6321983337402344, -0.06060218811035156, 1.1320037841796875, -0.651031494140625, 0.78387451171875, 0.2708320617675781, 1.7244110107421875, 1.3079261779785156, 1.731201171875, 0.531982421875, 5.723320007324219, 3.029529571533203, 2.382843017578125, 1.6961326599121094, 1.5153732299804688, 1.7054328918457031, 0.6065006256103516, 0.6565933227539062, 4.5921630859375, 0.43621826171875, -0.6160430908203125, 0.38800811767578125, 0.0149688720703125, 1.2687339782714844, -1.0865478515625, 1.1955413818359375, -0.4732818603515625, -1.307403564453125, 0.13706207275390625, 2.737701416015625, 1.9655075073242188, 6.645881652832031, 0.244964599609375, 0.13566207885742188, 5.531707763671875, -0.6696243286132812, 0.4834938049316406, 0.1829071044921875, 0.8154335021972656, 5.744873046875, -0.2712211608886719, 2.61138916015625, 2.3851394653320312], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000066.npy"}
{"epoch": 0.09691629955947137, "step": 67, "batch_size": 64, "mean": 1.068185567855835, "std": 2.7776286602020264, "min": -5.129554748535156, "p10": -2.0255218505859376, "median": 0.6895828247070312, "p90": 5.190534591674807, "max": 10.409027099609375, "pos_frac": 0.703125, "sample": [5.512599945068359, 1.19171142578125, 8.142662048339844, 0.9094314575195312, 0.5200481414794922, 0.01751708984375, 5.7246246337890625, 0.8739700317382812, 1.3913803100585938, 0.03841972351074219, 0.4609527587890625, 2.4500350952148438, -1.0728683471679688, 0.5564918518066406, 0.2546844482421875, -1.3046150207519531, 1.4293060302734375, 1.9214019775390625, -0.09094619750976562, -2.416778564453125, 1.93365478515625, 0.312591552734375, 2.3706817626953125, -0.8171310424804688, -0.2456512451171875, -1.125, 3.429046630859375, -0.09460830688476562, 0.7209396362304688, -2.4835739135742188, -1.33349609375, 2.3372421264648438, 1.305755615234375, 0.6582260131835938, -1.1265029907226562, 1.0344581604003906, 0.501617431640625, 0.263458251953125, -2.0628433227539062, -0.7889862060546875, -2.7663421630859375, 3.4673118591308594, 0.3031005859375, 1.5932388305664062, 1.512603759765625, 2.7366676330566406, 0.44860267639160156, 4.439048767089844, 1.0721702575683594, 5.628334045410156, -1.9384384155273438, 1.3168067932128906, 2.738861083984375, 1.252899169921875, 1.6989898681640625, -3.1433486938476562, 7.703338623046875, 0.7748832702636719, -5.129554748535156, 6.798248291015625, 10.409027099609375, 0.33824920654296875, -3.1925506591796875, -0.9981765747070312], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000067.npy"}
{"epoch": 0.09838472834067548, "step": 68, "batch_size": 64, "mean": 2.134885311126709, "std": 2.9077298641204834, "min": -2.0548934936523438, "p10": -0.2069355010986328, "median": 1.3730545043945312, "p90": 6.770529937744151, "max": 13.302963256835938, "pos_frac": 0.828125, "sample": [2.2813644409179688, 1.8837890625, 3.9599761962890625, 1.8619766235351562, 0.6087808609008789, 3.6917877197265625, 4.365470886230469, 3.4202117919921875, -1.2764816284179688, 0.11990737915039062, 1.4042129516601562, 13.302963256835938, 3.995025634765625, 2.733856201171875, 1.9626922607421875, 0.5592498779296875, 1.2863845825195312, 3.4615402221679688, 0.9648056030273438, 8.753631591796875, 0.44903564453125, 1.6289234161376953, 0.7327461242675781, -0.584197998046875, 0.48328208923339844, 3.297271728515625, 7.85015869140625, 10.346977233886719, 2.3863754272460938, -2.0548934936523438, 2.41552734375, 1.0527877807617188, -0.661407470703125, 1.7505645751953125, 0.4180145263671875, 0.6157016754150391, -0.22027587890625, 0.2275238037109375, 0.832122802734375, 2.8579483032226562, -1.1219253540039062, -0.16121673583984375, -0.10883712768554688, 0.72137451171875, 0.54998779296875, 1.95513916015625, 1.7126026153564453, 7.80126953125, -0.1132049560546875, 0.044859886169433594, 1.5928421020507812, 8.5679931640625, 8.18115234375, 1.8397369384765625, -0.5824661254882812, 3.2981948852539062, 0.8600845336914062, 1.3015327453613281, 3.6116180419921875, 0.7552490234375, 0.09371185302734375, -0.17580795288085938, 1.3418960571289062, 1.5015373229980469], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000068.npy"}
{"epoch": 0.09985315712187959, "step": 69, "batch_size": 64, "mean": 2.1411383152008057, "std": 2.976512908935547, "min": -6.667816162109375, "p10": -0.3128568649291991, "median": 1.5989608764648438, "p90": 5.083832550048829, "max": 14.055206298828125, "pos_frac": 0.84375, "sample": [0.6478118896484375, -0.9274826049804688, 6.90399169921875, 0.5047683715820312, 2.3455429077148438, 4.971771240234375, -6.667816162109375, 0.0211181640625, -0.18347930908203125, 0.6861648559570312, -0.1848907470703125, 0.6981468200683594, 4.8230743408203125, -0.23625564575195312, 2.0918502807617188, 3.8786697387695312, 4.1320037841796875, 2.7443161010742188, 4.962043762207031, 5.196868896484375, 2.4829177856445312, 0.42835235595703125, 1.704071044921875, 2.846435546875, 2.87255859375, 2.94439697265625, 0.9193191528320312, 12.519981384277344, 3.0447959899902344, 1.520132064819336, 0.2740631103515625, -0.9658546447753906, 1.594024658203125, 7.2554473876953125, 0.7818717956542969, 5.131858825683594, 2.4817733764648438, 2.571136474609375, 1.6038970947265625, 0.49642181396484375, -0.4389533996582031, 2.9930877685546875, 0.505615234375, -0.6400299072265625, 0.14548110961914062, 1.5701446533203125, 2.66156005859375, 4.678009033203125, 0.0798797607421875, 0.21020126342773438, 2.5207557678222656, 2.5345802307128906, 0.27323150634765625, 0.3620758056640625, 2.43267822265625, 1.0649261474609375, 6.037689208984375, 14.055206298828125, 1.35479736328125, 3.3812332153320312, -0.3456859588623047, 1.0324649810791016, -0.4397125244140625, 2.087799072265625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000069.npy"}
{"epoch": 0.1013215859030837, "step": 70, "batch_size": 64, "mean": 2.5402894020080566, "std": 3.4875900745391846, "min": -2.5561065673828125, "p10": -0.9349903106689452, "median": 1.7452278137207031, "p90": 6.887731933593751, "max": 16.63848876953125, "pos_frac": 0.78125, "sample": [-0.45989227294921875, 9.447265625, 2.0988235473632812, 0.34842681884765625, -1.6014556884765625, 0.5863189697265625, -2.5561065673828125, 6.6102142333984375, -0.9702072143554688, 0.5670242309570312, 2.87078857421875, -1.0777359008789062, -1.0890693664550781, 3.512939453125, 16.63848876953125, 1.6009025573730469, 7.0066680908203125, 4.203369140625, 3.6182899475097656, 3.5723876953125, 4.511894226074219, 5.117015838623047, 2.027130126953125, 5.279815673828125, -1.1352500915527344, 3.9179630279541016, -0.8115997314453125, 1.0715560913085938, 1.3311882019042969, 0.6025543212890625, -0.7513427734375, 8.459335327148438, 1.7816429138183594, 2.4189987182617188, 3.4327011108398438, 1.4657020568847656, 1.7088127136230469, 2.047637939453125, 1.5766143798828125, 0.18244171142578125, 0.5186996459960938, -0.8528175354003906, 0.9896240234375, -0.1640167236328125, 1.0761985778808594, -1.992889404296875, 8.104713439941406, 5.377391815185547, 3.10589599609375, -0.22255706787109375, 2.1855010986328125, 3.916778564453125, 0.3161468505859375, 2.9429397583007812, 0.759857177734375, 1.914031982421875, 11.585861206054688, 4.88507080078125, 10.760848999023438, 0.8999862670898438, -0.6367034912109375, 1.1062698364257812, 3.3265037536621094, 3.5129318237304688], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000070.npy"}
{"epoch": 0.1027900146842878, "step": 71, "batch_size": 64, "mean": 3.5774307250976562, "std": 3.962826728820801, "min": -2.101593017578125, "p10": 0.031185150146484507, "median": 2.760875701904297, "p90": 8.919518280029298, "max": 18.631103515625, "pos_frac": 0.890625, "sample": [0.924957275390625, 12.603652954101562, 0.23701095581054688, -0.32415008544921875, 7.9367218017578125, -0.041168212890625, 1.9141197204589844, 2.497577667236328, 1.7920074462890625, 2.9935054779052734, 3.0748748779296875, 2.068449020385742, 2.6684188842773438, -0.13945579528808594, 2.93780517578125, 3.3126068115234375, 5.049034118652344, 6.7703094482421875, 1.2207107543945312, 1.3872432708740234, 8.977020263671875, 8.785346984863281, 2.93634033203125, 3.55487060546875, 1.111297607421875, 3.3876380920410156, 1.8061676025390625, 0.7583541870117188, 3.223104476928711, 0.5323257446289062, -2.101593017578125, 4.58673095703125, 4.8035125732421875, -0.02538299560546875, 4.011745452880859, 1.1490020751953125, 1.497894287109375, 5.5191802978515625, 2.85333251953125, 9.223869323730469, 0.6029052734375, 5.0968780517578125, 4.5478363037109375, 1.00140380859375, 5.9337615966796875, 3.9683799743652344, 1.0306358337402344, 16.744140625, 18.631103515625, 6.6566619873046875, 0.40378570556640625, 6.326469421386719, 1.9234447479248047, 0.4923057556152344, 2.915372848510742, 1.302947998046875, -0.5399398803710938, 1.5455169677734375, 3.6438980102539062, 10.911880493164062, 10.17254638671875, 0.7605743408203125, -0.7551078796386719, 0.163177490234375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000071.npy"}
{"epoch": 0.10425844346549193, "step": 72, "batch_size": 64, "mean": 2.8712196350097656, "std": 3.432685375213623, "min": -2.7816619873046875, "p10": -1.0512945175170896, "median": 2.0907669067382812, "p90": 8.412989807128907, "max": 13.623146057128906, "pos_frac": 0.828125, "sample": [3.448009490966797, 4.273811340332031, -2.1305465698242188, 0.6486282348632812, 3.5723953247070312, -1.6042518615722656, 2.0674057006835938, 6.894599914550781, -2.340545654296875, -2.17926025390625, 0.4241447448730469, 0.25111961364746094, 2.6812782287597656, 8.789886474609375, 5.1038055419921875, 1.0016326904296875, 2.565673828125, 5.763145446777344, 0.41127777099609375, 3.4752655029296875, 4.595314025878906, 1.44293212890625, -0.3022003173828125, 2.9818458557128906, 2.034637451171875, 3.4448623657226562, 1.1224937438964844, 2.2160263061523438, 2.1141281127929688, 13.623146057128906, 1.3452606201171875, 1.6255264282226562, 1.2196044921875, 3.7440643310546875, 1.03289794921875, 11.161834716796875, 9.686027526855469, 1.9132099151611328, 0.565765380859375, 9.063606262207031, 1.9344291687011719, 7.99908447265625, 0.3312225341796875, -1.18389892578125, -0.7418842315673828, 5.8972015380859375, 8.739837646484375, 2.8073654174804688, -0.4835052490234375, 8.590377807617188, 2.2154541015625, 3.064891815185547, 0.6303749084472656, -1.2180633544921875, 1.40887451171875, 7.016761779785156, 4.1353912353515625, 4.913715362548828, 6.4666748046875, 1.995025634765625, 2.4886932373046875, 2.0447540283203125, -2.7816619873046875, -0.26152801513671875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000072.npy"}
{"epoch": 0.10572687224669604, "step": 73, "batch_size": 64, "mean": 3.095101833343506, "std": 4.9663004875183105, "min": -4.266666412353516, "p10": -1.8667251586914062, "median": 1.9172515869140625, "p90": 10.831417083740238, "max": 19.493270874023438, "pos_frac": 0.765625, "sample": [2.1505889892578125, 0.20029449462890625, 0.9539222717285156, 0.889129638671875, 1.5693035125732422, 5.25341796875, -4.266666412353516, 0.0164337158203125, 3.52227783203125, 1.0792884826660156, 19.493270874023438, 7.450187683105469, 3.7779083251953125, 0.6090469360351562, -2.242034912109375, -1.7132949829101562, 16.582489013671875, 3.32159423828125, 1.5057907104492188, 0.19042205810546875, 6.454021453857422, 3.5097808837890625, 2.8732948303222656, 0.5489215850830078, -0.4628276824951172, 2.306671142578125, 2.9116477966308594, 4.871067047119141, 5.116424560546875, 1.9888916015625, 1.0966625213623047, 0.22455596923828125, 12.103515625, 3.8455657958984375, -3.1015472412109375, 4.552375793457031, 0.73944091796875, -0.1783123016357422, -0.5622692108154297, -0.037689208984375, -3.3938446044921875, 2.0664825439453125, -0.2527313232421875, -0.1714935302734375, 10.125312805175781, 3.38800048828125, 1.5655288696289062, 0.324127197265625, 7.361751556396484, 11.134033203125, 1.845611572265625, 5.521797180175781, 14.353607177734375, 0.007831573486328125, 2.0203628540039062, -1.8788604736328125, 12.852340698242188, 7.699066162109375, -1.838409423828125, 14.81207275390625, 2.0799217224121094, 5.1188507080078125, -3.118682861328125, -2.679718017578125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000073.npy"}
{"epoch": 0.10719530102790015, "step": 74, "batch_size": 64, "mean": 3.501643180847168, "std": 4.409668922424316, "min": -2.6977691650390625, "p10": -0.8918792724609375, "median": 2.4664840698242188, "p90": 10.680897521972657, "max": 16.261016845703125, "pos_frac": 0.71875, "sample": [2.9324417114257812, 2.5593109130859375, 2.769512176513672, 0.5066261291503906, 9.658035278320312, 2.0407180786132812, 2.3736572265625, 12.559356689453125, -0.9111328125, -0.42711639404296875, 12.086776733398438, -0.846954345703125, 12.147979736328125, 5.698455810546875, -0.3671875, 12.593231201171875, 3.0590667724609375, 3.390655517578125, 2.169586181640625, -0.01386260986328125, 1.6776981353759766, 16.261016845703125, -2.6977691650390625, 5.267860412597656, 1.2338485717773438, 3.217254638671875, -0.46661949157714844, 7.3404693603515625, -0.362701416015625, 2.9243850708007812, 3.7230300903320312, -1.1474533081054688, 0.33278465270996094, 11.657012939453125, -1.4227294921875, 5.847480773925781, -1.4160919189453125, 1.1600341796875, 4.277645111083984, 0.8502655029296875, -0.7020301818847656, 1.6903305053710938, 4.600242614746094, 9.986038208007812, 6.666961669921875, 4.82073974609375, -0.0654144287109375, -0.2995452880859375, 8.613235473632812, 0.5451927185058594, 4.482944488525391, 10.810394287109375, -1.55755615234375, 6.6041412353515625, 6.672996520996094, 1.1248550415039062, 3.8077774047851562, 4.222175598144531, -0.5247344970703125, 10.378738403320312, -0.5822296142578125, 1.205902099609375, -2.4248809814453125, 1.7923049926757812], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000074.npy"}
{"epoch": 0.10866372980910426, "step": 75, "batch_size": 64, "mean": 4.590297698974609, "std": 5.4551215171813965, "min": -3.216064453125, "p10": -0.1726703643798828, "median": 3.281888961791992, "p90": 13.343907165527344, "max": 23.2156982421875, "pos_frac": 0.84375, "sample": [-0.014240264892578125, 13.360153198242188, 6.631626129150391, 1.4916305541992188, -3.216064453125, 4.1018829345703125, 3.3490142822265625, 1.22381591796875, 1.6411361694335938, 13.938507080078125, 4.350250244140625, 1.4740924835205078, 5.299522399902344, 2.2665786743164062, 7.280982971191406, 3.214763641357422, 4.450958251953125, 5.3912811279296875, 0.24441146850585938, 1.31781005859375, 10.148529052734375, 3.0401992797851562, 2.114459991455078, -0.6793670654296875, 8.1826171875, 3.805419921875, -1.1705322265625, 0.16898727416992188, 2.93670654296875, 10.570724487304688, 6.622016906738281, 0.21845626831054688, 6.2525482177734375, 16.769424438476562, 0.9083175659179688, 3.5214767456054688, 18.2572021484375, -2.997833251953125, 2.0787086486816406, 1.2664642333984375, 0.3355255126953125, 7.503448486328125, 3.755077362060547, 5.453086853027344, -0.574188232421875, 4.7112579345703125, 0.6903934478759766, 2.5328369140625, 1.6111297607421875, 10.986160278320312, -0.044330596923828125, 4.91864013671875, -0.5786666870117188, 5.031349182128906, -0.16218948364257812, 0.2849407196044922, 4.4669647216796875, 13.305999755859375, -0.17716217041015625, 1.1202774047851562, 23.2156982421875, 13.913604736328125, 18.20404052734375, 3.4625091552734375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000075.npy"}
{"epoch": 0.11013215859030837, "step": 76, "batch_size": 64, "mean": 3.0265936851501465, "std": 4.668426990509033, "min": -6.828575134277344, "p10": -1.3559585571289061, "median": 2.152054786682129, "p90": 6.544948577880859, "max": 20.249916076660156, "pos_frac": 0.8125, "sample": [-3.9184188842773438, 1.7839622497558594, 0.5794429779052734, 19.4390869140625, 1.8658599853515625, 1.1278839111328125, 5.1718597412109375, 6.5603179931640625, 3.07452392578125, 2.9550704956054688, 4.167743682861328, -0.6762886047363281, -3.65069580078125, 3.7816619873046875, 2.8936004638671875, 1.2316513061523438, 4.045539855957031, 20.249916076660156, 2.0621490478515625, 1.9741668701171875, 5.805351257324219, 3.726642608642578, -0.21012496948242188, 1.5302810668945312, -0.1724090576171875, 6.204597473144531, 0.38501644134521484, 3.3705368041992188, -1.2479400634765625, -6.828575134277344, 5.8780517578125, 4.649517059326172, 2.5184707641601562, 1.4184379577636719, -4.9949493408203125, 0.5022354125976562, 5.7825469970703125, 3.0754165649414062, -1.038961410522461, 0.42142486572265625, 2.2419605255126953, 4.645397186279297, -2.463359832763672, 0.7859954833984375, 13.05181884765625, 6.509086608886719, 10.269393920898438, 3.1969833374023438, 6.099918365478516, 1.6701431274414062, -1.402252197265625, -1.5170745849609375, 0.4510498046875, 0.3778266906738281, 11.817535400390625, 0.5289154052734375, 0.1667327880859375, 7.5517120361328125, 4.858814239501953, 5.8191070556640625, 1.6926460266113281, 5.40142822265625, 4.9565887451171875, 1.4970245361328125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000076.npy"}
{"epoch": 0.11160058737151249, "step": 77, "batch_size": 64, "mean": 3.4725894927978516, "std": 5.009212970733643, "min": -8.21783447265625, "p10": -0.9232202529907225, "median": 2.5361557006835938, "p90": 9.79236526489258, "max": 19.313690185546875, "pos_frac": 0.765625, "sample": [2.4884033203125, 12.749862670898438, 1.596099853515625, 5.564430236816406, 2.9259109497070312, -3.4007911682128906, 13.810806274414062, 5.603118896484375, 1.554483413696289, 5.566398620605469, -0.7572097778320312, 4.596893310546875, 6.357776641845703, 1.0970420837402344, 1.3665924072265625, 2.7684249877929688, -8.21783447265625, 8.660186767578125, -0.6477508544921875, 9.575553894042969, 3.5374374389648438, -0.6010971069335938, 0.25681304931640625, 0.38520050048828125, 3.7606887817382812, 6.4076385498046875, -0.5282325744628906, 9.885284423828125, 0.12773895263671875, 1.6952342987060547, 11.255714416503906, 10.83502197265625, 1.1108283996582031, 1.0967864990234375, 6.877532958984375, 1.7460498809814453, 6.327972412109375, 19.313690185546875, -0.6420364379882812, 5.6200714111328125, -3.283355712890625, 1.6307258605957031, -5.995513916015625, -0.0656890869140625, 4.799327850341797, 0.4862518310546875, 8.904632568359375, 9.137107849121094, 2.5839080810546875, 13.106132507324219, 0.5709800720214844, 0.168853759765625, -0.9943675994873047, 1.88629150390625, 6.015380859375, 4.989227294921875, -0.07783699035644531, -5.7564239501953125, 8.235610961914062, 8.01205062866211, -0.7296333312988281, 2.7163467407226562, 5.209026336669922, -1.0300445556640625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000077.npy"}
{"epoch": 0.1130690161527166, "step": 78, "batch_size": 64, "mean": 3.673032283782959, "std": 4.33587121963501, "min": -6.038358688354492, "p10": -0.8400871276855468, "median": 3.2116565704345703, "p90": 8.574334716796876, "max": 18.160232543945312, "pos_frac": 0.828125, "sample": [6.90582275390625, -1.1809768676757812, 8.15399169921875, -0.7673454284667969, 2.434986114501953, 10.62481689453125, 3.6840171813964844, 15.35009765625, 8.442977905273438, 3.314044952392578, 4.524580001831055, 7.6749267578125, -0.7863235473632812, -6.038358688354492, 3.8939590454101562, 3.57940673828125, 1.4862060546875, 4.145530700683594, 4.428825378417969, 5.367820739746094, 0.5862903594970703, 1.6244087219238281, 5.0759429931640625, 14.598968505859375, 3.1092681884765625, 0.8508453369140625, -0.2879486083984375, 6.079315185546875, 11.606353759765625, 2.7235755920410156, 0.23900604248046875, 1.2228355407714844, -2.428497314453125, 3.7301483154296875, 4.7257080078125, 11.532127380371094, 3.41986083984375, 1.7048797607421875, 1.4697341918945312, 6.572288513183594, 6.220745086669922, 4.362312316894531, 8.630630493164062, 4.334163665771484, 4.8640594482421875, 2.6262855529785156, -1.1161231994628906, 1.003427505493164, -0.9229145050048828, 2.7973251342773438, 0.7694282531738281, 0.7894859313964844, 1.8111000061035156, -0.37546539306640625, 3.7361602783203125, -1.9729843139648438, 18.160232543945312, 2.2496414184570312, 1.891510009765625, 5.268238067626953, -0.863128662109375, 6.083915710449219, 0.8060150146484375, 0.52587890625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000078.npy"}
{"epoch": 0.1145374449339207, "step": 79, "batch_size": 64, "mean": 4.751272201538086, "std": 5.9252495765686035, "min": -8.2327880859375, "p10": -1.1659999847412108, "median": 3.833782196044922, "p90": 11.377622985839844, "max": 26.045440673828125, "pos_frac": 0.78125, "sample": [-5.94403076171875, 10.533283233642578, 0.8436508178710938, 9.949951171875, -0.8196945190429688, 2.633258819580078, -1.0263786315917969, 1.5386734008789062, 9.190170288085938, 13.645355224609375, 11.43890380859375, -8.2327880859375, 2.744659423828125, 26.045440673828125, 0.22877883911132812, 12.71795654296875, 5.045463562011719, -1.4442901611328125, 3.216400146484375, 12.533279418945312, 16.99633026123047, 6.049678802490234, 6.2418212890625, -1.650787353515625, 9.946617126464844, -1.8916358947753906, 3.8368911743164062, 6.527273178100586, 4.9579925537109375, 7.1746063232421875, 9.244918823242188, 3.066082000732422, 8.421424865722656, 5.715015411376953, -0.4195404052734375, 3.8306732177734375, 3.9759254455566406, -1.0174713134765625, 21.337203979492188, 11.226837158203125, 5.225131988525391, 2.181262969970703, 4.994392395019531, 3.154592514038086, 8.335533142089844, -0.21450424194335938, 6.628089904785156, 2.7239036560058594, 1.8861045837402344, -0.44916534423828125, 7.797615051269531, 0.16499900817871094, 3.0428543090820312, 6.0033416748046875, 1.1359214782714844, 5.305877685546875, 3.7982406616210938, -0.1636199951171875, -3.6527481079101562, 1.155120849609375, 4.392101287841797, 11.234634399414062, -1.2258377075195312, 2.2196884155273438], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000079.npy"}
{"epoch": 0.11600587371512482, "step": 80, "batch_size": 64, "mean": 3.7452752590179443, "std": 5.205329895019531, "min": -5.6841888427734375, "p10": -1.1568359374999995, "median": 2.293560028076172, "p90": 10.865390014648439, "max": 23.35308837890625, "pos_frac": 0.828125, "sample": [1.470306396484375, 2.9274826049804688, 3.1656036376953125, -5.6841888427734375, 2.064167022705078, -0.13607406616210938, 0.9114303588867188, 0.9065017700195312, 8.702301025390625, -3.527618408203125, 0.9943027496337891, 4.087451934814453, -2.0855541229248047, 1.2555618286132812, 0.7672653198242188, 0.25261688232421875, 23.35308837890625, 3.9304275512695312, 11.130332946777344, 6.223354339599609, 19.564712524414062, 2.0490875244140625, 5.929344177246094, -2.7958908081054688, 4.989250183105469, 1.69073486328125, 0.2368488311767578, 12.086334228515625, 2.944232940673828, 4.80279541015625, 0.8600997924804688, 0.1642913818359375, 1.9744186401367188, 9.394561767578125, 4.314483642578125, -1.5563812255859375, 2.4075927734375, 5.556129455566406, -1.4240875244140625, 14.463470458984375, 0.8147754669189453, 7.2022247314453125, 2.1795272827148438, 10.956832885742188, 10.652023315429688, 2.9821548461914062, 3.5483474731445312, 5.739471435546875, 1.7008209228515625, 1.2971954345703125, 1.2554054260253906, -0.5332489013671875, -0.27524566650390625, 3.0140228271484375, 1.1439476013183594, 3.7980804443359375, 0.8846588134765625, -2.9515609741210938, 4.118217468261719, 3.6572723388671875, 7.433052062988281, 10.018600463867188, -0.24115371704101562, 12.941413879394531], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000080.npy"}
{"epoch": 0.11747430249632893, "step": 81, "batch_size": 64, "mean": 5.342126846313477, "std": 7.290727138519287, "min": -9.48455810546875, "p10": -0.8988349914550777, "median": 4.411769866943359, "p90": 12.910920715332034, "max": 36.60667419433594, "pos_frac": 0.8125, "sample": [6.566474914550781, 15.68218994140625, 11.735809326171875, 5.29412841796875, 2.0067138671875, 12.066116333007812, 0.7122535705566406, 0.6742706298828125, -1.1752510070800781, 4.208641052246094, 5.4176177978515625, -9.48455810546875, 1.8576240539550781, 5.963844299316406, 8.98834228515625, 1.1362991333007812, 3.1649856567382812, 5.9486846923828125, 13.272979736328125, -2.2215957641601562, 0.636322021484375, 6.8700714111328125, -0.072296142578125, 4.7738189697265625, -0.4514732360839844, 1.7960491180419922, 8.66168212890625, 9.16946792602539, 2.832307815551758, 4.6535491943359375, -1.781768798828125, -0.055755615234375, -0.26739501953125, -4.53857421875, 7.6761627197265625, 24.892181396484375, 1.5741767883300781, -0.5190505981445312, -3.9670372009277344, 36.60667419433594, 6.0197296142578125, 13.690643310546875, 4.877494812011719, 5.925605773925781, 1.6573944091796875, 2.180206298828125, -1.0615997314453125, 3.4498062133789062, 6.5947265625, 22.565078735351562, 4.3043212890625, 0.0414581298828125, 3.0868663787841797, 1.678253173828125, 6.91156005859375, 10.35477066040039, 23.45465087890625, 1.9284744262695312, 4.519218444824219, 5.067008972167969, 5.1376800537109375, 5.4238128662109375, 3.6497573852539062, 10.134490966796875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000081.npy"}
{"epoch": 0.11894273127753303, "step": 82, "batch_size": 64, "mean": 3.6950860023498535, "std": 5.213653564453125, "min": -8.394668579101562, "p10": -1.128623580932617, "median": 3.0354843139648438, "p90": 9.68737030029297, "max": 19.4197998046875, "pos_frac": 0.796875, "sample": [1.8916244506835938, -0.4755096435546875, 5.624603271484375, 3.3980979919433594, 1.5859413146972656, 7.627399444580078, 1.0176315307617188, 0.8642616271972656, 5.4386749267578125, 17.19677734375, 9.872879028320312, -0.042903900146484375, 1.221405029296875, 4.702239990234375, 1.5630607604980469, 16.462600708007812, 4.3984375, -3.6481781005859375, 4.878776550292969, 4.28564453125, -0.6452713012695312, 7.2327728271484375, 9.2545166015625, 7.299823760986328, -1.1412544250488281, 1.1050796508789062, 2.0498123168945312, 3.5494613647460938, -4.6669921875, 3.5170135498046875, 7.4446868896484375, 2.136749267578125, 0.1858978271484375, 0.03818321228027344, 14.4722900390625, 2.9825973510742188, 2.3910140991210938, 4.6287078857421875, 3.6511611938476562, -0.8564987182617188, 1.5932273864746094, 5.572105407714844, 14.339744567871094, 1.2491874694824219, 0.052570343017578125, 1.9553985595703125, 4.8707122802734375, 6.335685729980469, -1.099151611328125, -5.924041748046875, -1.2030715942382812, -3.0882415771484375, -8.394668579101562, 7.828643798828125, 5.6360321044921875, 6.060188293457031, 0.943695068359375, 10.424484252929688, 2.255094528198242, 19.4197998046875, 7.9957733154296875, 3.0883712768554688, -0.7050094604492188, 4.785743713378906], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000082.npy"}
{"epoch": 0.12041116005873716, "step": 83, "batch_size": 64, "mean": 4.222027778625488, "std": 6.714541912078857, "min": -9.219879150390625, "p10": -2.7900909423828124, "median": 3.2276926040649414, "p90": 13.6236572265625, "max": 24.753677368164062, "pos_frac": 0.703125, "sample": [13.534149169921875, -0.2684478759765625, -1.4464111328125, 12.737701416015625, 6.334449768066406, 1.8559370040893555, 1.6000556945800781, 3.069000244140625, 13.662017822265625, 2.20196533203125, 9.180221557617188, 4.743162155151367, -1.9217071533203125, -3.9141464233398438, -0.5702590942382812, 14.005363464355469, 0.1044769287109375, -0.6788330078125, 21.7879638671875, 9.128257751464844, 11.479286193847656, 3.2728805541992188, -2.821044921875, -2.2377700805664062, -0.8600997924804688, 4.2199554443359375, 10.94140625, -2.717864990234375, 3.182504653930664, 7.893341064453125, -6.3872222900390625, 4.6535491943359375, 0.5900707244873047, -1.6708450317382812, 4.6019439697265625, 7.856071472167969, 9.153247833251953, -1.381917953491211, 4.225921630859375, 2.115558624267578, 5.278255462646484, 0.1134033203125, 6.506813049316406, 24.753677368164062, 2.4841880798339844, 12.281669616699219, -5.636848449707031, -1.8597488403320312, -4.595329284667969, -2.701080322265625, -9.219879150390625, 5.054840087890625, 3.3318138122558594, 14.401611328125, 11.023632049560547, 15.006134033203125, 3.1815109252929688, -2.852764129638672, 7.0145111083984375, 15.157699584960938, 0.30141448974609375, 5.527198791503906, 3.5952911376953125, 0.807861328125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000083.npy"}
{"epoch": 0.12187958883994127, "step": 84, "batch_size": 64, "mean": 2.6200926303863525, "std": 5.423254013061523, "min": -12.114593505859375, "p10": -2.5615053176879883, "median": 1.8165168762207031, "p90": 9.598812866210942, "max": 21.174697875976562, "pos_frac": 0.71875, "sample": [1.865966796875, 5.7608795166015625, 2.79962158203125, -0.6063995361328125, 17.9423828125, -1.0025177001953125, 4.235805511474609, 2.642608642578125, 0.2201709747314453, 3.8751983642578125, -0.19669723510742188, 2.1982345581054688, -2.111419677734375, -1.5379486083984375, -2.8505859375, -4.4563140869140625, 4.240932464599609, 4.886283874511719, 3.6192550659179688, 4.064971923828125, 2.320859909057617, 5.926612854003906, 21.174697875976562, 0.17258644104003906, 0.18583106994628906, -0.8705482482910156, 2.39007568359375, -1.26788330078125, -12.114593505859375, 2.8453731536865234, 2.3885841369628906, 0.85003662109375, 15.272216796875, -2.5716018676757812, 0.9685211181640625, 7.951255798339844, 12.331829071044922, -4.589752197265625, 0.16320037841796875, -4.54766845703125, -0.5120010375976562, 3.6924285888671875, 0.8361377716064453, -2.5379467010498047, 12.907470703125, 4.616096496582031, 3.138782501220703, 2.2077560424804688, 5.8861846923828125, 8.351318359375, -0.7785491943359375, 1.7670669555664062, 10.133453369140625, 6.867008209228516, 1.4649810791015625, -1.7479705810546875, 4.916522979736328, 1.3953742980957031, 0.03497314453125, 11.063339233398438, 0.7624778747558594, -2.9111976623535156, 1.464263916015625, 0.09789276123046875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000084.npy"}
{"epoch": 0.12334801762114538, "step": 85, "batch_size": 64, "mean": 3.4285178184509277, "std": 6.344602584838867, "min": -10.377525329589844, "p10": -4.59955596923828, "median": 2.724517822265625, "p90": 10.086124420166016, "max": 23.851295471191406, "pos_frac": 0.734375, "sample": [13.610149383544922, 9.09259033203125, 14.879768371582031, 1.3114776611328125, -10.377525329589844, -2.9967727661132812, 1.6932601928710938, -2.6381759643554688, 23.851295471191406, -3.9146804809570312, 0.28568267822265625, 0.1098480224609375, 9.947250366210938, 9.679679870605469, -1.3467655181884766, 3.0548362731933594, -7.1069183349609375, 6.70587158203125, 2.5064239501953125, -2.7092132568359375, 0.4156684875488281, 8.609573364257812, 0.6046257019042969, -0.4394416809082031, -0.449249267578125, -6.78497314453125, 6.059455871582031, 1.192352294921875, 13.545604705810547, 3.2349853515625, 9.87689208984375, -5.780303955078125, -7.4960784912109375, 2.349092483520508, -4.893074035644531, -0.5766334533691406, 12.075881958007812, 8.904930114746094, 2.39715576171875, 0.5389404296875, 3.53851318359375, 6.932563781738281, 3.2457237243652344, 4.775535583496094, 9.975494384765625, 4.466228485107422, 1.9676856994628906, 15.028182983398438, 9.73345947265625, 3.804311752319336, 8.331413269042969, 9.236194610595703, 0.6331787109375, 2.9426116943359375, 0.18719482421875, 4.456134796142578, -1.21746826171875, 0.01502227783203125, 9.607894897460938, 10.133537292480469, -5.977134704589844, 3.3043899536132812, -0.4804496765136719, 5.7614288330078125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000085.npy"}
{"epoch": 0.12481644640234948, "step": 86, "batch_size": 64, "mean": 4.484457969665527, "std": 7.41502571105957, "min": -8.758567810058594, "p10": -1.482398986816406, "median": 1.974008560180664, "p90": 16.37922058105469, "max": 35.17781066894531, "pos_frac": 0.78125, "sample": [-2.066387176513672, 8.69256591796875, 16.751136779785156, 4.891208648681641, -0.9697151184082031, -0.47074127197265625, 5.1215362548828125, -0.8252449035644531, 4.600799560546875, 2.6634292602539062, -0.2691230773925781, 3.3946456909179688, -8.758567810058594, 5.535436630249023, 9.646308898925781, 5.2428436279296875, 18.734375, 3.7130279541015625, 1.44195556640625, 13.465194702148438, 0.5502777099609375, 4.228542327880859, 1.683258056640625, 1.6710624694824219, 8.002410888671875, 0.2875194549560547, 1.4284439086914062, 16.609447479248047, 0.6195411682128906, -0.5421218872070312, 6.492340087890625, -2.726428985595703, 1.9246635437011719, -4.9204254150390625, 8.798637390136719, 15.112533569335938, 1.682037353515625, -8.066055297851562, 19.138290405273438, 0.6576957702636719, 16.295501708984375, -2.9414024353027344, 1.6868438720703125, 3.278533935546875, -1.615264892578125, 21.795822143554688, 1.5021114349365234, 35.17781066894531, 16.41510009765625, 3.9885406494140625, 5.0410003662109375, -0.23333740234375, 6.251811981201172, 1.7334861755371094, 2.6287879943847656, 1.4644050598144531, 1.649688720703125, 2.6535511016845703, -1.1723785400390625, 1.762777328491211, 1.3531875610351562, 0.8223953247070312, 2.2766265869140625, 2.0233535766601562], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000086.npy"}
{"epoch": 0.1262848751835536, "step": 87, "batch_size": 64, "mean": 4.877290725708008, "std": 8.039790153503418, "min": -26.339202880859375, "p10": -1.4050865173339842, "median": 4.091320037841797, "p90": 15.694841384887695, "max": 30.085662841796875, "pos_frac": 0.828125, "sample": [-0.12176132202148438, 9.236351013183594, 1.8073310852050781, 5.0314178466796875, 4.8463592529296875, 3.6862411499023438, 6.830005645751953, -0.050884246826171875, -1.2116775512695312, -3.9152374267578125, 2.973602294921875, 4.934906005859375, -3.603424072265625, 16.77532958984375, 8.149604797363281, 7.655830383300781, 7.280784606933594, 0.898956298828125, 22.396728515625, 4.9913330078125, 1.8318099975585938, 2.159320831298828, 3.0132808685302734, 10.038433074951172, 5.260143280029297, 0.14296531677246094, 9.712615966796875, 30.085662841796875, 5.2133026123046875, 6.6961517333984375, -7.189601898193359, 13.850296020507812, 18.18303680419922, 15.78531265258789, 7.539039611816406, -0.8294486999511719, 1.2553253173828125, 4.3096160888671875, 15.483741760253906, 7.5968017578125, 0.40326690673828125, 17.620155334472656, 0.87225341796875, 10.943321228027344, -26.339202880859375, 1.3955154418945312, 14.011505126953125, -5.021167755126953, 4.016349792480469, 4.8905487060546875, 1.0391387939453125, 2.0262832641601562, 3.2008304595947266, -9.614433288574219, -1.48797607421875, 18.964340209960938, 4.166290283203125, 7.488731384277344, 6.839805603027344, 1.7829360961914062, 0.70111083984375, 1.1174125671386719, 3.50177001953125, 0.8981971740722656], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000087.npy"}
{"epoch": 0.1277533039647577, "step": 88, "batch_size": 64, "mean": 3.8435635566711426, "std": 6.733339786529541, "min": -13.539520263671875, "p10": -3.1743610382080076, "median": 3.118194580078125, "p90": 12.841436004638677, "max": 24.272323608398438, "pos_frac": 0.734375, "sample": [0.20558929443359375, -3.3078460693359375, 4.769004821777344, 5.3312835693359375, 7.883636474609375, 2.0986480712890625, 7.548259735107422, 24.272323608398438, -4.0551300048828125, 0.9815349578857422, 16.2503662109375, 5.089488983154297, 3.1287078857421875, -0.6241035461425781, -13.539520263671875, 4.979583740234375, 5.085395812988281, 3.1868209838867188, -3.886260986328125, 2.329925537109375, -0.4410667419433594, 11.391448974609375, 9.632194519042969, 1.2298126220703125, 3.2521438598632812, -7.517261505126953, 15.26654052734375, 3.1076812744140625, 4.24683952331543, 8.671646118164062, 5.783447265625, 16.728103637695312, 0.4597206115722656, 7.542213439941406, -2.8943252563476562, 23.032928466796875, -2.8576202392578125, 10.198699951171875, 0.46965789794921875, 14.530181884765625, 3.48895263671875, 2.3619918823242188, 2.492036819458008, 1.4679126739501953, -1.2023086547851562, -0.7644882202148438, -1.8008098602294922, 0.6404209136962891, 13.291923522949219, 11.790298461914062, 6.0370025634765625, 2.2771835327148438, -1.7784767150878906, -0.4225425720214844, -2.8611297607421875, 11.283378601074219, -3.2943763732910156, 0.5249004364013672, 3.9975547790527344, 2.5289230346679688, 4.0579071044921875, -5.007850646972656, 3.9413299560546875, 3.3776321411132812], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000088.npy"}
{"epoch": 0.12922173274596183, "step": 89, "batch_size": 64, "mean": 4.274923324584961, "std": 7.1252546310424805, "min": -9.442459106445312, "p10": -1.1532726287841795, "median": 2.788480758666992, "p90": 10.85906219482422, "max": 35.258941650390625, "pos_frac": 0.75, "sample": [5.700653076171875, 1.7622261047363281, 1.5398979187011719, 2.0047683715820312, -3.5205535888671875, 4.076972961425781, 3.2455902099609375, 8.496635437011719, 3.2825546264648438, 10.508026123046875, 7.539234161376953, 15.236480712890625, 9.452545166015625, -0.6106796264648438, 1.2700347900390625, 4.0893096923828125, -0.182220458984375, 0.610015869140625, -0.5653533935546875, 7.747768402099609, 3.4492321014404297, -0.9271659851074219, 5.72027587890625, 2.6584091186523438, 7.7884368896484375, -0.3276195526123047, -0.8439140319824219, 1.0292510986328125, 9.300956726074219, 3.8499069213867188, 3.7538223266601562, 28.290313720703125, 2.9185523986816406, 7.522167205810547, 6.145587921142578, 7.271873474121094, 1.1828556060791016, 35.258941650390625, 0.6658058166503906, -1.6142120361328125, 15.568092346191406, -0.7247772216796875, 0.062465667724609375, 3.359149932861328, -9.442459106445312, -9.145606994628906, 0.897064208984375, 0.3737831115722656, 5.685577392578125, 11.009506225585938, 8.751419067382812, 7.8788604736328125, -1.2501754760742188, 2.2083778381347656, 18.429061889648438, -0.140045166015625, -0.7209625244140625, 1.3319549560546875, -4.287559509277344, 2.468791961669922, 11.16790771484375, -1.8417205810546875, 5.16943359375, 2.00958251953125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000089.npy"}
{"epoch": 0.13069016152716592, "step": 90, "batch_size": 64, "mean": 5.5095744132995605, "std": 7.6072845458984375, "min": -11.968307495117188, "p10": -2.4814323425292963, "median": 3.7482070922851562, "p90": 16.131028747558595, "max": 35.26165771484375, "pos_frac": 0.828125, "sample": [4.215370178222656, -4.160163879394531, 1.0555419921875, 17.384185791015625, 6.159751892089844, 4.103767395019531, 0.7790641784667969, 1.3958320617675781, 15.708251953125, 0.5746822357177734, 17.281463623046875, -3.1414108276367188, 3.6932373046875, 5.5577850341796875, 3.8031768798828125, 7.634605407714844, 1.778167724609375, 18.94647216796875, 2.7101287841796875, 0.8598175048828125, 4.653257369995117, 2.5416946411132812, 12.229804992675781, 2.8935699462890625, 35.26165771484375, 2.0355300903320312, 12.210578918457031, 10.549896240234375, -1.8320980072021484, -2.6851348876953125, 18.7298583984375, 7.134063720703125, -4.396148681640625, 3.4697580337524414, 6.9794769287109375, -11.968307495117188, 1.8460273742675781, 4.481788635253906, 15.242080688476562, 0.6445159912109375, 9.347648620605469, -0.47014617919921875, -2.0061264038085938, 0.6580619812011719, -2.9315948486328125, 3.63330078125, 1.7744903564453125, -5.8163604736328125, -0.3070831298828125, 3.2076644897460938, 8.587657928466797, 3.0254344940185547, 4.7897796630859375, 15.266342163085938, 16.19384765625, 4.562461853027344, 1.6391143798828125, 10.46347427368164, 18.7943115234375, 6.641395568847656, 7.256935119628906, 1.700958251953125, 15.984451293945312, 4.255130767822266], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000090.npy"}
{"epoch": 0.13215859030837004, "step": 91, "batch_size": 64, "mean": 4.61190128326416, "std": 7.207120418548584, "min": -7.049957275390625, "p10": -2.8654346466064453, "median": 2.595804214477539, "p90": 12.96967239379883, "max": 28.893783569335938, "pos_frac": 0.71875, "sample": [1.07635498046875, 12.8570556640625, 1.0520172119140625, 2.5025177001953125, 28.893783569335938, 0.8349056243896484, 0.374481201171875, -3.3732147216796875, 5.583942413330078, 7.469482421875, 6.382926940917969, 8.694602966308594, 5.733467102050781, 7.51495361328125, 1.9118499755859375, 22.126617431640625, 2.8490638732910156, 6.4309539794921875, -2.5209197998046875, 16.839859008789062, 7.09967041015625, 4.002742767333984, 10.80816650390625, 2.6122817993164062, 9.304924011230469, 2.0431442260742188, 1.1193923950195312, -0.15445327758789062, -0.7798233032226562, -3.3036842346191406, 3.684112548828125, 2.579326629638672, -1.79632568359375, 2.730499267578125, 3.1547775268554688, 5.400749206542969, 4.617919921875, -4.388763427734375, -0.005107879638671875, 2.1852798461914062, -5.1458587646484375, -4.509857177734375, 1.89105224609375, -2.8735084533691406, -1.0297622680664062, 18.286209106445312, 2.4502182006835938, 12.099906921386719, -7.049957275390625, 9.123687744140625, 2.3227462768554688, -1.52880859375, -0.0966949462890625, 22.781211853027344, -0.6147251129150391, 12.147296905517578, -2.8465957641601562, 13.017936706542969, 8.615631103515625, 3.5502548217773438, 18.564491271972656, -2.584503173828125, 12.523727416992188, 1.9180450439453125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000091.npy"}
{"epoch": 0.13362701908957417, "step": 92, "batch_size": 64, "mean": 4.024504661560059, "std": 6.661458492279053, "min": -11.225372314453125, "p10": -1.5822254180908202, "median": 2.8074722290039062, "p90": 13.279668426513672, "max": 30.94573974609375, "pos_frac": 0.703125, "sample": [-4.7943115234375, -0.02097320556640625, 0.953948974609375, 12.845497131347656, 1.5033493041992188, 3.8212356567382812, 5.211585998535156, -0.5159759521484375, 1.7788009643554688, -1.532379150390625, 11.750019073486328, -1.3614425659179688, 4.7178955078125, 0.4647102355957031, -11.225372314453125, 13.795516967773438, 7.802436828613281, 30.94573974609375, -6.964263916015625, -0.1222076416015625, 4.006103515625, -0.7623138427734375, 6.897674560546875, -1.2463417053222656, -0.1848297119140625, 3.56414794921875, 4.513721466064453, 4.3639678955078125, 7.8825531005859375, 3.431985855102539, 17.085575103759766, -1.6269712448120117, 0.7953109741210938, -1.6035881042480469, 4.797401428222656, 0.025768280029296875, 5.3309783935546875, 14.87005615234375, -1.3094310760498047, 18.474952697753906, 13.301666259765625, 11.156784057617188, 1.73687744140625, 2.666107177734375, 1.1711692810058594, 3.985321044921875, 3.6859893798828125, 13.687103271484375, 2.3968772888183594, 4.627408981323242, -0.24051666259765625, -3.0478477478027344, 13.228340148925781, 0.9610748291015625, -0.9053001403808594, -1.2651214599609375, 2.150409698486328, -2.6634674072265625, 10.100814819335938, 3.986726760864258, 4.608570098876953, 2.0671424865722656, 8.862796783447266, 2.9488372802734375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000092.npy"}
{"epoch": 0.13509544787077826, "step": 93, "batch_size": 64, "mean": 3.842654228210449, "std": 7.092199325561523, "min": -10.861953735351562, "p10": -4.140321350097656, "median": 2.8549156188964844, "p90": 12.131130981445315, "max": 25.18072509765625, "pos_frac": 0.71875, "sample": [-10.861953735351562, 0.9225234985351562, 10.411460876464844, 9.728218078613281, 5.227283477783203, -1.7468948364257812, 2.9325103759765625, 0.5567893981933594, 12.334693908691406, -2.62188720703125, 11.128036499023438, 3.4460296630859375, 1.3449478149414062, -0.9540481567382812, 16.642608642578125, 11.529613494873047, 2.1327362060546875, -3.6887969970703125, -10.668739318847656, 15.070625305175781, -2.1268653869628906, 4.656566619873047, 11.080001831054688, -5.2505035400390625, 0.9593048095703125, 3.2629165649414062, 17.064132690429688, 17.409217834472656, -0.37560462951660156, 4.544094085693359, 4.702754974365234, -2.0138702392578125, 0.8476104736328125, 6.022834777832031, 9.393272399902344, 25.18072509765625, -6.6390228271484375, 5.1425628662109375, -0.7797126770019531, -5.3417510986328125, 6.524299621582031, 0.66070556640625, 7.4156036376953125, 1.6130294799804688, 4.290287017822266, 5.816303253173828, -0.9499092102050781, 0.8687753677368164, 11.656150817871094, 10.694114685058594, -4.333831787109375, 16.984695434570312, 2.583038330078125, 4.405876159667969, 2.7773208618164062, 6.432685852050781, -2.64312744140625, 7.394256591796875, 2.4116363525390625, 9.368843078613281, 2.6641693115234375, -2.7023468017578125, -9.51983642578125, 0.9127063751220703], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000093.npy"}
{"epoch": 0.13656387665198239, "step": 94, "batch_size": 64, "mean": 5.79172945022583, "std": 8.04280948638916, "min": -15.782394409179688, "p10": -2.960993766784667, "median": 5.147871017456055, "p90": 15.783106994628909, "max": 27.29266357421875, "pos_frac": 0.8125, "sample": [4.5240020751953125, 3.207172393798828, 7.536033630371094, -1.2481670379638672, -12.0201416015625, 1.1351318359375, 0.7076301574707031, 19.328094482421875, 1.718048095703125, 3.6438140869140625, 1.355682373046875, -2.171356201171875, 6.442958831787109, 10.086158752441406, -0.6758346557617188, 0.09362220764160156, 16.167221069335938, 10.959392547607422, 5.329509735107422, 12.122817993164062, -3.299409866333008, 4.9662322998046875, 7.116424560546875, 5.644874572753906, 12.40997314453125, 8.9481201171875, -3.7852821350097656, 5.436443328857422, 7.125217437744141, 0.7956771850585938, -5.233514785766602, 4.727794647216797, 3.765399932861328, 24.130020141601562, 5.344264984130859, 2.4213027954101562, 27.29266357421875, 17.752777099609375, 5.5256805419921875, 5.930870056152344, 1.3250770568847656, 1.6952171325683594, -15.782394409179688, 1.9232673645019531, 10.383064270019531, 11.926074981689453, -8.517658233642578, 10.9278564453125, 11.86712646484375, 8.520278930664062, 14.8868408203125, 3.4198455810546875, -0.4069976806640625, 7.4861907958984375, 3.2502059936523438, 14.323211669921875, -0.08123779296875, -5.083808898925781, 21.89678192138672, 2.8285903930664062, 9.881046295166016, 12.056808471679688, 22.568206787109375, 4.1197509765625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000094.npy"}
{"epoch": 0.13803230543318648, "step": 95, "batch_size": 64, "mean": 4.601568698883057, "std": 7.26455020904541, "min": -7.913909912109375, "p10": -2.518924617767334, "median": 2.396909713745117, "p90": 13.47412567138672, "max": 30.691131591796875, "pos_frac": 0.703125, "sample": [2.092864990234375, 4.71929931640625, 4.768119812011719, -2.6409568786621094, 6.2457427978515625, 19.256378173828125, 1.9894561767578125, 1.9381599426269531, -0.8587665557861328, 15.983856201171875, -0.052494049072265625, 3.8884506225585938, 8.014026641845703, 23.99951171875, 18.417774200439453, -2.306610107421875, 4.895969390869141, -0.24954605102539062, -1.1375503540039062, 4.04522705078125, 11.3358154296875, 5.266506195068359, 3.80902099609375, -0.056884765625, -2.5954904556274414, -3.618896484375, 0.24668502807617188, 17.875892639160156, -0.5201492309570312, 8.057365417480469, -6.052925109863281, 6.2804412841796875, 11.717445373535156, 10.930618286132812, -5.5950164794921875, 13.451385498046875, 10.838752746582031, 0.6501903533935547, 1.4080123901367188, 11.402923583984375, 7.402778625488281, 30.691131591796875, 6.80426025390625, 6.210113525390625, 1.6982097625732422, 0.7023429870605469, 0.2397918701171875, 4.741844177246094, -2.34027099609375, -7.913909912109375, -1.9795513153076172, 1.1979198455810547, -0.4649658203125, 1.6789627075195312, 9.328361511230469, -1.0566062927246094, 13.483871459960938, 5.903755187988281, 1.99847412109375, 8.412986755371094, -0.8873062133789062, 2.7009544372558594, 1.9811649322509766, -3.87451171875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000095.npy"}
{"epoch": 0.1395007342143906, "step": 96, "batch_size": 64, "mean": 5.786574363708496, "std": 7.675961017608643, "min": -13.42205810546875, "p10": -0.9728126525878902, "median": 4.025215148925781, "p90": 15.70906982421875, "max": 27.469741821289062, "pos_frac": 0.84375, "sample": [15.065536499023438, 4.184574127197266, 14.659942626953125, 2.4390296936035156, 8.887130737304688, 25.669723510742188, -13.42205810546875, 0.5533599853515625, -1.6485977172851562, 4.8385467529296875, 27.469741821289062, 6.391773223876953, 6.472412109375, -0.5346450805664062, 3.718982696533203, 4.328945159912109, 2.2696971893310547, 15.62139892578125, 0.15962982177734375, 16.36121368408203, -1.951690673828125, 9.986454010009766, 12.268951416015625, 24.798202514648438, 15.865142822265625, 6.098598480224609, 7.013389587402344, -3.47210693359375, 15.74664306640625, 4.303478240966797, -1.1605987548828125, 14.6353759765625, 1.8209152221679688, 2.5736083984375, 3.6332931518554688, 2.183624267578125, 0.13132476806640625, 2.317901611328125, -8.102897644042969, 1.6459722518920898, 4.126197814941406, 3.6506004333496094, 9.129718780517578, 2.9595565795898438, 0.5688762664794922, -0.4486846923828125, 3.9242324829101562, 1.6045074462890625, 0.06269454956054688, 13.208831787109375, 4.315582275390625, 9.63739013671875, -1.8052597045898438, -0.0230865478515625, 24.574111938476562, 5.271640777587891, 3.3372421264648438, 12.30706787109375, 6.6099700927734375, 4.3303680419921875, 1.8262977600097656, 4.410072326660156, 1.2395477294921875, 1.7013397216796875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000096.npy"}
{"epoch": 0.14096916299559473, "step": 97, "batch_size": 64, "mean": 5.202969551086426, "std": 6.539410591125488, "min": -8.256805419921875, "p10": -0.9535882949829102, "median": 3.2203245162963867, "p90": 13.80872039794922, "max": 23.513092041015625, "pos_frac": 0.828125, "sample": [1.1593780517578125, 10.135971069335938, 2.35089111328125, -4.5128936767578125, 0.45026397705078125, -0.056819915771484375, 11.187763214111328, 1.2069358825683594, 3.361898422241211, 3.6360092163085938, 19.89763641357422, 13.893150329589844, -0.9524440765380859, 2.5042076110839844, 17.99298095703125, 20.79096221923828, -5.4598846435546875, 7.2051849365234375, 1.49920654296875, 3.249906539916992, -0.9540786743164062, -2.6269683837890625, 1.7246112823486328, 0.20676422119140625, 9.713027954101562, 4.554908752441406, 3.1107025146484375, 1.8321380615234375, 13.611717224121094, 8.431694030761719, 0.9727439880371094, -1.8043403625488281, 13.007041931152344, 1.4137344360351562, 10.883583068847656, 8.7255859375, 12.063568115234375, 5.725452423095703, -8.256805419921875, 0.9389762878417969, 4.8006439208984375, 0.9318275451660156, -5.599952697753906, 0.5452766418457031, 16.669227600097656, 8.509414672851562, 7.1464080810546875, 2.5840797424316406, 5.289447784423828, -0.44309234619140625, 3.10943603515625, 2.2786407470703125, 2.988889694213867, 4.823875427246094, 3.1907424926757812, 23.513092041015625, -0.41622161865234375, 8.099609375, 6.436309814453125, 9.656150817871094, 9.325908660888672, 6.905666351318359, 3.016876220703125, 16.813461303710938], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000097.npy"}
{"epoch": 0.14243759177679882, "step": 98, "batch_size": 64, "mean": 5.8181281089782715, "std": 6.620409965515137, "min": -8.26385498046875, "p10": -0.8174884796142573, "median": 4.163612365722656, "p90": 13.414273071289065, "max": 33.92689514160156, "pos_frac": 0.875, "sample": [0.1003570556640625, 4.2093658447265625, 13.711105346679688, 17.96038055419922, -2.566770553588867, 3.9509201049804688, 10.922882080078125, 12.49917984008789, 2.145050048828125, 2.3900604248046875, 5.158779144287109, 4.11785888671875, 3.1441383361816406, 16.615875244140625, 7.4932098388671875, 10.697189331054688, 7.997833251953125, 6.213714599609375, -0.3375282287597656, 4.025733947753906, 10.256782531738281, 1.9657173156738281, -1.6451339721679688, 3.94073486328125, 16.250041961669922, 2.911144256591797, 7.302490234375, 12.444114685058594, 9.970809936523438, 3.5659866333007812, 2.4481048583984375, 6.129150390625, 9.921306610107422, 8.37261962890625, 5.7044830322265625, 20.3260498046875, 3.3768386840820312, -8.26385498046875, -1.0769500732421875, 8.514228820800781, -4.843772888183594, 8.165630340576172, -2.15643310546875, 8.50079345703125, 4.3451080322265625, 1.6970863342285156, 2.3295135498046875, 3.3394775390625, 2.2134246826171875, 15.049781799316406, 8.552642822265625, 12.721664428710938, 4.741058349609375, 33.92689514160156, -1.0231857299804688, 4.811775207519531, 0.1054840087890625, 0.10687637329101562, 0.5792160034179688, 0.9121322631835938, 10.219642639160156, 1.1827774047851562, 1.6182975769042969, 2.4003067016601562], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000098.npy"}
{"epoch": 0.14390602055800295, "step": 99, "batch_size": 64, "mean": 6.420859336853027, "std": 8.826066970825195, "min": -5.5465545654296875, "p10": -0.535635757446289, "median": 3.4112424850463867, "p90": 17.549004364013676, "max": 36.58561706542969, "pos_frac": 0.8125, "sample": [0.96875, -1.8541030883789062, 0.3869647979736328, -0.48633575439453125, -0.19705581665039062, 6.498104095458984, 11.165332794189453, 8.455574035644531, 19.189041137695312, 0.3888702392578125, 4.031745910644531, 13.906707763671875, 10.578285217285156, -5.5465545654296875, 19.266387939453125, 5.3246307373046875, 7.990058898925781, 3.9389572143554688, 1.807159423828125, -0.8097000122070312, 16.86181640625, -3.2783889770507812, 1.2478523254394531, 31.393157958984375, 1.2045135498046875, 12.4581298828125, 3.429962158203125, -0.5441741943359375, 0.2602195739746094, -3.756916046142578, 1.6121826171875, 4.312599182128906, 6.5411376953125, 5.312496185302734, 9.719474792480469, 26.818267822265625, 9.692367553710938, 1.8437919616699219, 12.443103790283203, 14.894088745117188, 3.9679489135742188, 8.60662841796875, 2.054615020751953, 0.29045867919921875, 36.58561706542969, 2.528350830078125, 1.8385696411132812, 9.244354248046875, -0.07413482666015625, 3.096466064453125, 17.84351348876953, 3.7023696899414062, -0.5157127380371094, -1.1903076171875, 12.309127807617188, 2.823394775390625, 3.3925228118896484, 0.22406578063964844, 35.40406799316406, 3.2056732177734375, 2.1412792205810547, 4.225811004638672, 1.9911994934082031, -0.2293853759765625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000099.npy"}
{"epoch": 0.14537444933920704, "step": 100, "batch_size": 64, "mean": 4.661480903625488, "std": 5.618067741394043, "min": -5.755218505859375, "p10": -1.1418537139892575, "median": 3.198444366455078, "p90": 11.732985305786134, "max": 24.736846923828125, "pos_frac": 0.828125, "sample": [1.4255619049072266, 1.5716972351074219, -1.8638801574707031, 6.618045806884766, 0.7725753784179688, -0.5631484985351562, 3.2026901245117188, 7.19488525390625, -4.411571502685547, 13.41436767578125, 3.1732635498046875, 0.29523658752441406, 11.782745361328125, 0.953887939453125, 0.2622489929199219, 2.630035400390625, 3.6609954833984375, 6.805938720703125, -3.5004425048828125, 7.5524139404296875, 5.6584930419921875, 6.038902282714844, 2.1843185424804688, 1.3242225646972656, 8.903335571289062, 12.085723876953125, 1.8962631225585938, 2.118865966796875, -0.3218412399291992, -1.2546043395996094, 6.691936492919922, 9.382701873779297, 1.7828369140625, 5.681434631347656, 20.983238220214844, 9.35036849975586, 6.841375350952148, 6.544410705566406, 8.35595703125, 7.8129425048828125, 15.012306213378906, -0.18878936767578125, 5.980926513671875, 7.8055877685546875, 7.209682464599609, 6.109172821044922, 3.1941986083984375, -1.6307830810546875, 3.0434341430664062, 3.4359779357910156, 7.330726623535156, 15.908546447753906, 4.054962158203125, 2.125213623046875, 24.736846923828125, -0.8787689208984375, -3.21685791015625, 0.914947509765625, 1.843658447265625, -5.755218505859375, 11.616878509521484, 1.7155113220214844, 2.974079132080078, 1.9541130065917969], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000100.npy"}
{"epoch": 0.14684287812041116, "step": 101, "batch_size": 64, "mean": 4.04945182800293, "std": 7.209812641143799, "min": -18.761322021484375, "p10": -3.2558990478515617, "median": 3.1738834381103516, "p90": 14.783232879638676, "max": 22.525665283203125, "pos_frac": 0.75, "sample": [18.719467163085938, 16.36279296875, 2.062896728515625, 2.502887725830078, 5.673969268798828, 13.627494812011719, 17.487205505371094, 3.606433868408203, 15.278549194335938, 16.974227905273438, -18.761322021484375, 5.8621826171875, 1.4303131103515625, 10.845298767089844, -0.5411834716796875, 0.07342529296875, -1.0108642578125, 4.31158447265625, 8.334091186523438, 4.566398620605469, -12.051559448242188, 11.055618286132812, 10.958908081054688, 2.998077392578125, 7.468776702880859, 1.6958866119384766, -0.7835960388183594, 22.525665283203125, 12.581954956054688, 5.715202331542969, -3.532928466796875, -1.3011665344238281, -2.054281234741211, 7.864692687988281, 6.077430725097656, 4.2906646728515625, 4.52972412109375, 15.544349670410156, 2.7734527587890625, -2.6094970703125, -2.5032787322998047, 3.3187599182128906, 0.18974685668945312, 6.196037292480469, 1.2132797241210938, -0.5685577392578125, 12.866683959960938, 7.095611572265625, 4.529814720153809, 2.523052215576172, 3.0290069580078125, -1.8430557250976562, 5.986907958984375, -4.692668914794922, 2.2661056518554688, 5.074121475219727, 0.9783782958984375, -7.3459014892578125, -3.5755462646484375, 0.6465930938720703, 2.6197738647460938, 0.31589698791503906, 3.7133102416992188, -4.022369384765625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000101.npy"}
{"epoch": 0.14831130690161526, "step": 102, "batch_size": 64, "mean": 3.5364413261413574, "std": 7.519670486450195, "min": -21.542816162109375, "p10": -3.7090736389160153, "median": 3.2174015045166016, "p90": 12.760597229003906, "max": 31.68463134765625, "pos_frac": 0.78125, "sample": [-1.1418647766113281, -1.9965667724609375, 16.120529174804688, 6.847389221191406, -21.542816162109375, 2.373645782470703, 0.1370849609375, 6.4923553466796875, 3.9067955017089844, -2.9585418701171875, -5.3889312744140625, 14.790473937988281, 5.226869583129883, 0.8345718383789062, 3.2715682983398438, -0.3716468811035156, 2.0465316772460938, 8.498294830322266, -12.008392333984375, 1.2098159790039062, 4.225250244140625, 0.3895530700683594, 10.139541625976562, 0.4165153503417969, 6.777610778808594, 3.6623153686523438, 2.537752151489258, 15.017898559570312, 12.841278076171875, 2.5073623657226562, 10.738540649414062, 0.49901580810546875, 6.22662353515625, 5.28460693359375, 3.465961456298828, -3.7818145751953125, 3.1632347106933594, -6.2866363525390625, 7.8066864013671875, -3.2459335327148438, 1.3392105102539062, 1.2841644287109375, 3.451274871826172, -2.1901779174804688, 14.353363037109375, 31.68463134765625, 4.214851379394531, 17.511444091796875, 0.9317359924316406, -9.905250549316406, 12.572341918945312, 0.01941680908203125, 6.9560699462890625, 1.3689651489257812, 6.2210693359375, -3.5393447875976562, 5.253261566162109, 6.29693603515625, 4.800174713134766, 2.6710739135742188, -4.22723388671875, 9.716011047363281, 4.74609375, 2.0696334838867188], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000102.npy"}
{"epoch": 0.14977973568281938, "step": 103, "batch_size": 64, "mean": 6.019868850708008, "std": 7.639141082763672, "min": -15.986968994140625, "p10": -1.5961814880371092, "median": 4.900346755981445, "p90": 17.32846908569336, "max": 29.108489990234375, "pos_frac": 0.8125, "sample": [3.33648681640625, 1.76678466796875, 0.27098846435546875, -0.48673248291015625, -7.40386962890625, 4.729347229003906, 1.214385986328125, 11.97116470336914, 2.921417236328125, 4.810298919677734, -0.47076416015625, 4.437063217163086, -0.23676681518554688, 9.62054443359375, -4.84417724609375, 7.896434783935547, 17.87939453125, 13.754745483398438, 9.211318969726562, 8.834053039550781, 4.53631591796875, 21.947219848632812, 0.8385467529296875, -15.986968994140625, 1.3315887451171875, 12.253921508789062, 10.931098937988281, -1.5218887329101562, 5.401355743408203, 4.990394592285156, 2.95953369140625, 17.54083251953125, 8.120574951171875, -1.628021240234375, 10.493843078613281, 5.575439453125, 6.682685852050781, 2.562591552734375, -2.7391929626464844, 1.2106304168701172, -5.0342559814453125, 8.842391967773438, 10.628791809082031, -5.23681640625, 12.015487670898438, 18.105682373046875, 1.2248649597167969, 2.5407257080078125, 29.108489990234375, 16.83295440673828, 18.011276245117188, 10.686416625976562, 7.893577575683594, 15.798904418945312, 2.1673812866210938, -0.4763450622558594, 5.729642868041992, 8.432476043701172, 2.1726226806640625, 1.17333984375, 18.236740112304688, 8.02657699584961, 2.823028564453125, 10.855003356933594], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000103.npy"}
{"epoch": 0.1512481644640235, "step": 104, "batch_size": 64, "mean": 6.509515762329102, "std": 8.712037086486816, "min": -10.8536376953125, "p10": -0.7406959533691405, "median": 4.066728591918945, "p90": 16.682057189941407, "max": 42.15167236328125, "pos_frac": 0.84375, "sample": [2.2816619873046875, 3.3795394897460938, 0.32064056396484375, 7.063175201416016, 16.383209228515625, -0.37149810791015625, 0.22064971923828125, 0.3866539001464844, 1.7192001342773438, 10.547340393066406, 0.49675559997558594, -4.4426116943359375, 19.13848876953125, 6.04443359375, 16.79278564453125, -1.2200927734375, 14.42529296875, 12.659984588623047, 3.0223236083984375, 11.95770263671875, 1.32122802734375, 3.6344528198242188, 7.997051239013672, 3.3639755249023438, -1.6485023498535156, 2.0191287994384766, 31.0643310546875, 3.537994384765625, 9.54364013671875, 10.498634338378906, 17.99706268310547, 17.845352172851562, 1.3449745178222656, 4.952117919921875, 0.6989707946777344, 6.731208801269531, 4.465618133544922, 10.645034790039062, 42.15167236328125, -0.7817459106445312, 9.143966674804688, -3.44012451171875, 16.423690795898438, -2.9529800415039062, 5.807285308837891, 6.558923721313477, 15.382583618164062, 4.844463348388672, 4.145294189453125, 0.8875579833984375, 0.8762741088867188, 1.7457122802734375, 2.2288131713867188, 10.713462829589844, 9.936691284179688, 28.736160278320312, 2.149078369140625, 7.159816741943359, -0.4182929992675781, -0.6449127197265625, 3.9881629943847656, 1.0994815826416016, -10.8536376953125, 4.90369987487793], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000104.npy"}
{"epoch": 0.1527165932452276, "step": 105, "batch_size": 64, "mean": 6.392341613769531, "std": 8.305362701416016, "min": -18.465370178222656, "p10": -2.643664550781249, "median": 5.144571304321289, "p90": 17.52802658081055, "max": 32.05206298828125, "pos_frac": 0.84375, "sample": [-4.1716156005859375, 1.9873809814453125, 5.981624603271484, 1.9498443603515625, 6.595832824707031, 6.871849060058594, 3.8343887329101562, -18.465370178222656, -4.259063720703125, -7.5702056884765625, 3.5423583984375, 9.870841979980469, 8.634231567382812, 6.663341522216797, -1.2698516845703125, 8.227134704589844, 0.0990753173828125, 4.663211822509766, -3.1821136474609375, -0.8859176635742188, 6.579681396484375, 19.847274780273438, 1.31622314453125, 1.0332527160644531, 3.57904052734375, 13.717277526855469, 11.995193481445312, 4.298511505126953, 1.527557373046875, 8.461662292480469, 14.195899963378906, 16.158164978027344, 0.6484603881835938, 17.988250732421875, 8.592239379882812, 15.339424133300781, 20.328170776367188, 16.121139526367188, 2.3095550537109375, 6.882987976074219, 9.769668579101562, 13.33612060546875, 4.29302978515625, 1.794301986694336, 5.6259307861328125, 2.188526153564453, 2.2019004821777344, 2.6715469360351562, -1.3872833251953125, 8.730712890625, 8.109718322753906, 0.2066650390625, -3.5269393920898438, 3.0597915649414062, 21.30011749267578, 5.915283203125, 2.133115768432617, 32.05206298828125, 16.45417022705078, 23.71551513671875, 11.105789184570312, -3.3708133697509766, 2.340667724609375, 20.353317260742188], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000105.npy"}
{"epoch": 0.15418502202643172, "step": 106, "batch_size": 64, "mean": 6.071486473083496, "std": 8.278144836425781, "min": -8.396270751953125, "p10": -1.4784854888916013, "median": 4.0711517333984375, "p90": 17.06074676513672, "max": 35.77153015136719, "pos_frac": 0.75, "sample": [8.513725280761719, 1.7588214874267578, -0.8815765380859375, -2.6222190856933594, 7.060115814208984, 5.255992889404297, 2.467315673828125, -8.396270751953125, 4.557769775390625, 11.466156005859375, 7.967998504638672, 2.5723323822021484, 1.1235980987548828, 12.38277816772461, 5.9542388916015625, 24.995574951171875, -1.2545738220214844, 2.7084007263183594, 16.757476806640625, 0.34700870513916016, -1.1762466430664062, 12.186805725097656, -1.1793441772460938, 25.8892822265625, 4.381320953369141, 3.753833770751953, -0.06463623046875, 2.974853515625, 19.556015014648438, 2.1410064697265625, -1.1076431274414062, -1.5744476318359375, 9.976997375488281, 8.89565658569336, 3.7341442108154297, 4.090461730957031, -0.8022918701171875, 1.518707275390625, 8.679149627685547, -5.97039794921875, 20.040908813476562, 7.242240905761719, -0.7319412231445312, 1.06939697265625, 35.77153015136719, 14.879936218261719, 6.084678649902344, -5.762786865234375, 21.018508911132812, 10.405006408691406, -2.3109664916992188, 11.390777587890625, 0.9108810424804688, 4.051841735839844, 15.423492431640625, 10.183303833007812, 5.673896789550781, 17.190719604492188, 4.0428314208984375, 11.970176696777344, 1.1384429931640625, -0.18939208984375, -4.428459167480469, 4.8721923828125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000106.npy"}
{"epoch": 0.15565345080763582, "step": 107, "batch_size": 64, "mean": 7.033601760864258, "std": 11.456780433654785, "min": -11.413314819335938, "p10": -3.14106979370117, "median": 5.48469352722168, "p90": 16.517349243164062, "max": 73.4091796875, "pos_frac": 0.8125, "sample": [4.3004150390625, 20.36871337890625, -0.008331298828125, 3.9507179260253906, 4.2147216796875, -4.524391174316406, 6.4588470458984375, 2.0461578369140625, -5.3930206298828125, -7.4975128173828125, 1.9652462005615234, 5.827568054199219, 3.8484153747558594, 4.515064239501953, 2.7668380737304688, -1.1363372802734375, 5.539161682128906, 6.911750793457031, -0.3484649658203125, 2.4998016357421875, 18.257339477539062, 16.63629913330078, 10.2274169921875, 3.974597930908203, -7.51995849609375, 5.8236236572265625, -11.413314819335938, 5.815727233886719, 73.4091796875, 9.685470581054688, 2.3392562866210938, 7.637176513671875, 7.778388977050781, 6.5050201416015625, 14.359596252441406, 16.23979949951172, 1.1135368347167969, 6.163421630859375, 5.430225372314453, 5.0650787353515625, 11.22418212890625, 0.723876953125, 7.63336181640625, 15.406639099121094, 3.857452392578125, 5.2594451904296875, -1.3793258666992188, 26.063323974609375, 8.708663940429688, 7.030242919921875, 4.545097351074219, 2.6130752563476562, 23.844924926757812, 8.7021484375, -6.1506500244140625, 8.844085693359375, 1.6402950286865234, 7.2802276611328125, 10.921821594238281, -0.348480224609375, 10.196235656738281, 33.61405944824219, -3.8961029052734375, 9.982666015625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000107.npy"}
{"epoch": 0.15712187958883994, "step": 108, "batch_size": 64, "mean": 6.542989730834961, "std": 8.796480178833008, "min": -15.215377807617188, "p10": -1.6822540283203122, "median": 4.803069114685059, "p90": 17.851284790039063, "max": 38.931915283203125, "pos_frac": 0.78125, "sample": [3.104156494140625, 2.560760498046875, 5.917476654052734, -1.89324951171875, -15.215377807617188, 8.8193359375, -2.31976318359375, -0.6192855834960938, 21.685089111328125, -5.141349792480469, -1.3590850830078125, 16.435775756835938, 1.4847412109375, 0.6099853515625, -0.43279266357421875, 5.060760498046875, 0.7059402465820312, 27.798904418945312, 38.931915283203125, 11.608692169189453, 20.656814575195312, 11.59942626953125, 4.354421615600586, 1.6686897277832031, 2.9031028747558594, 2.3693580627441406, -1.2124595642089844, 2.5702056884765625, 18.09552001953125, 5.5424346923828125, 7.852268218994141, -1.8207550048828125, 17.281402587890625, 8.881439208984375, 25.139617919921875, -3.2175445556640625, 8.579032897949219, 4.388275146484375, 16.392860412597656, 10.118743896484375, 7.644126892089844, 8.228202819824219, 1.6035842895507812, 9.834823608398438, 6.79718017578125, -0.1345672607421875, 10.466217041015625, 22.242660522460938, 4.421537399291992, 11.611923217773438, -5.318382263183594, 14.210491180419922, 6.0325164794921875, 7.6666412353515625, -0.9791488647460938, 1.8122081756591797, 9.95065689086914, 1.5766830444335938, 8.486488342285156, 0.074737548828125, 3.0218124389648438, -0.037891387939453125, 5.107967376708984, 4.545377731323242], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000108.npy"}
{"epoch": 0.15859030837004406, "step": 109, "batch_size": 64, "mean": 8.21860122680664, "std": 9.753891944885254, "min": -15.4525146484375, "p10": -1.4070623397827149, "median": 6.778223037719727, "p90": 20.93879394531251, "max": 39.5089111328125, "pos_frac": 0.78125, "sample": [8.340591430664062, 6.036994934082031, 10.3560791015625, 4.4440765380859375, 6.530059814453125, 11.092166900634766, 6.461402893066406, -0.01329803466796875, -8.913642883300781, 11.7947998046875, 29.813812255859375, 12.336421966552734, 1.4036865234375, 4.5980377197265625, 12.02017593383789, 3.902740478515625, 2.8077430725097656, -0.2940559387207031, -2.3660507202148438, -2.0754594802856445, -0.7575302124023438, 39.5089111328125, -1.3545551300048828, 5.0780181884765625, 7.7797393798828125, 0.8217506408691406, 8.338607788085938, -0.4091167449951172, 7.026386260986328, -3.1271133422851562, 14.897377014160156, 25.91839599609375, 0.6643238067626953, 8.13153076171875, 22.071456909179688, 9.314079284667969, 10.400917053222656, 16.013885498046875, 14.263877868652344, 18.295913696289062, -1.4295654296875, -1.7616043090820312, -15.4525146484375, 6.026493072509766, 3.6978759765625, 30.598995208740234, 9.847442626953125, 3.314971923828125, 2.3492507934570312, 25.788681030273438, 6.1693115234375, 11.621749877929688, 3.3835525512695312, 9.253776550292969, 30.0579833984375, -1.0050506591796875, 3.5459365844726562, 16.894397735595703, 13.714096069335938, 15.781539916992188, 12.774124145507812, 7.890480041503906, -0.49486732482910156, 12.270248413085938], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000109.npy"}
{"epoch": 0.16005873715124816, "step": 110, "batch_size": 64, "mean": 4.912660121917725, "std": 7.3809123039245605, "min": -13.267013549804688, "p10": -2.983184814453124, "median": 4.447227478027344, "p90": 15.294825744628907, "max": 27.8638916015625, "pos_frac": 0.796875, "sample": [16.605377197265625, 0.6360931396484375, 5.713188171386719, 5.7674560546875, 8.66595458984375, 4.5015716552734375, 6.845878601074219, 6.156881332397461, -1.514190673828125, 16.707809448242188, 2.8189239501953125, 0.467254638671875, -13.267013549804688, 0.20539474487304688, 10.453117370605469, 0.8533058166503906, 2.698963165283203, 26.08935546875, 1.7745208740234375, 6.9298248291015625, -0.7497119903564453, 4.39288330078125, 4.95330810546875, 18.52513885498047, 6.7320556640625, 15.031204223632812, 9.558219909667969, 4.6119537353515625, 0.5062828063964844, 7.862861633300781, -3.949676513671875, -0.34375762939453125, 2.2692108154296875, 0.140869140625, -4.8483123779296875, 15.407806396484375, 14.740730285644531, 15.669921875, 7.4475555419921875, 0.89813232421875, 4.344974517822266, 6.165872573852539, 8.609504699707031, -4.41546630859375, 1.5845413208007812, 7.25927734375, -2.1207351684570312, -8.531646728515625, 7.915214538574219, -1.2160720825195312, 6.7062530517578125, 0.8329429626464844, 6.835243225097656, 3.449817657470703, -3.3528060913085938, 8.080169677734375, 5.955171585083008, -0.03318023681640625, 0.8402137756347656, 12.049240112304688, -7.399864196777344, 0.6807060241699219, 4.340648651123047, 27.8638916015625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000110.npy"}
{"epoch": 0.16152716593245228, "step": 111, "batch_size": 64, "mean": 9.290351867675781, "std": 10.13336181640625, "min": -14.815032958984375, "p10": -1.63819351196289, "median": 8.608308792114258, "p90": 22.1308219909668, "max": 42.0491943359375, "pos_frac": 0.828125, "sample": [32.66717529296875, 14.313873291015625, -14.815032958984375, 23.558074951171875, 6.418796539306641, 11.132743835449219, -3.821258544921875, 6.137992858886719, -0.48923492431640625, 15.758773803710938, 13.476432800292969, 10.39837646484375, 13.364265441894531, 21.47039031982422, 18.235809326171875, 7.46514892578125, 1.6991806030273438, 9.692367553710938, 22.413864135742188, 2.624279022216797, 2.9062271118164062, -1.0167083740234375, 19.179855346679688, -3.715057373046875, 9.568771362304688, 17.286834716796875, 0.662750244140625, 8.820426940917969, -2.439788818359375, 8.432647705078125, 8.410247802734375, 4.063993453979492, 16.611541748046875, 11.951671600341797, 19.895347595214844, 4.154121398925781, 0.9263191223144531, 9.253028869628906, 42.0491943359375, 4.547027587890625, 4.263397216796875, -1.180938720703125, 2.66458797454834, 10.813385009765625, 14.812454223632812, 25.578292846679688, 6.90380859375, 4.746635437011719, 18.817955017089844, 2.9716453552246094, 9.384956359863281, 8.421602249145508, 16.884292602539062, 14.308441162109375, 31.1417236328125, -0.4571399688720703, -7.479118347167969, 8.78396987915039, 5.569583892822266, 9.150981903076172, 0.3476905822753906, 23.655685424804688, -6.9376373291015625, -1.8341598510742188], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000111.npy"}
{"epoch": 0.16299559471365638, "step": 112, "batch_size": 64, "mean": 4.750532150268555, "std": 6.954592227935791, "min": -10.809555053710938, "p10": -2.771734619140625, "median": 2.9848861694335938, "p90": 15.183465576171876, "max": 21.995254516601562, "pos_frac": 0.71875, "sample": [2.8252487182617188, 14.943695068359375, 15.286224365234375, 1.6022262573242188, -1.7774009704589844, 1.6559333801269531, -1.8076019287109375, -2.1251220703125, 8.160179138183594, 2.338184356689453, 11.398033142089844, 5.821754455566406, 0.0153350830078125, -5.141761779785156, 17.04503631591797, 10.186355590820312, -1.2615184783935547, 9.782333374023438, 5.72015380859375, 0.0100860595703125, 1.9571533203125, 9.925931930541992, 11.261085510253906, -2.7798118591308594, 4.3887786865234375, 13.925453186035156, 2.899658203125, 5.201984405517578, -1.4802703857421875, -1.6348991394042969, 9.264423370361328, 2.5019607543945312, 4.987396240234375, 16.60342788696289, 5.4888763427734375, -5.023590087890625, 18.561767578125, -0.13771820068359375, 10.591629028320312, 0.6202049255371094, -10.809555053710938, 8.123847961425781, 10.33203125, -3.0122604370117188, -5.4184417724609375, 13.074398040771484, 21.995254516601562, 0.7934207916259766, 18.09576416015625, 15.568702697753906, 0.7499923706054688, 2.33868408203125, 7.456268310546875, -1.0858612060546875, -0.4436473846435547, 3.0701141357421875, -3.10333251953125, -2.752887725830078, 4.877830505371094, 4.054811477661133, 1.6471900939941406, 5.442996978759766, -1.3750038146972656, 12.612903594970703], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000112.npy"}
{"epoch": 0.1644640234948605, "step": 113, "batch_size": 64, "mean": 5.153696060180664, "std": 7.075318336486816, "min": -12.040191650390625, "p10": -2.128285980224609, "median": 4.3888397216796875, "p90": 12.191255950927735, "max": 32.63128662109375, "pos_frac": 0.796875, "sample": [2.2401351928710938, 7.310028076171875, 0.6873512268066406, 32.63128662109375, 11.48565673828125, -1.670074462890625, 7.344152450561523, 9.766250610351562, -1.9184494018554688, 0.7604713439941406, 4.303031921386719, 9.52691650390625, 6.067024230957031, 2.788848876953125, 8.13021469116211, -0.09226226806640625, 2.0607337951660156, -4.489360809326172, 3.5578536987304688, 4.382476806640625, -6.6095123291015625, -3.2068710327148438, 12.151153564453125, -12.040191650390625, 5.854530334472656, 7.5872802734375, 6.5032806396484375, 6.793174743652344, 11.650749206542969, 9.162879943847656, 7.547916412353516, 8.569320678710938, 4.700557708740234, 9.836280822753906, 12.208442687988281, -8.579750061035156, 1.8691635131835938, -0.404296875, 13.510650634765625, 7.434013366699219, 1.7397079467773438, 2.957244873046875, -0.9721870422363281, 4.012676239013672, 4.39520263671875, 6.281333923339844, 6.540046691894531, 12.920677185058594, -4.7743377685546875, 2.838479995727539, 27.8922119140625, 2.6014747619628906, 4.033918380737305, 2.3434295654296875, 4.8184661865234375, 12.704631805419922, 9.037086486816406, 15.727436065673828, -1.6177444458007812, 4.268455505371094, -2.2182159423828125, 7.1656036376953125, 3.5144920349121094, 4.21539306640625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000113.npy"}
{"epoch": 0.16593245227606462, "step": 114, "batch_size": 64, "mean": 7.202906131744385, "std": 10.425104141235352, "min": -15.63677978515625, "p10": -2.9388389587402326, "median": 4.6546478271484375, "p90": 21.679404449462893, "max": 45.0244140625, "pos_frac": 0.828125, "sample": [45.0244140625, 22.04833221435547, 19.469200134277344, -9.995712280273438, 2.7319068908691406, 14.6217041015625, 2.523792266845703, 8.457252502441406, 0.0335235595703125, 4.087303161621094, 1.0922317504882812, 5.256591796875, 17.168121337890625, 9.804039001464844, 8.271064758300781, 9.472305297851562, 4.6664581298828125, -3.798828125, 0.016269683837890625, 23.703353881835938, 0.4237213134765625, 2.4417667388916016, 24.0694580078125, 10.441215515136719, -0.9321975708007812, 20.818572998046875, 0.8456192016601562, 0.5398712158203125, -0.658599853515625, 10.352142333984375, 5.193971633911133, 3.8367233276367188, -5.7247772216796875, 6.145351409912109, -15.63677978515625, 0.6315650939941406, 3.8501338958740234, 1.09588623046875, 15.529647827148438, 24.52220916748047, 10.54399299621582, 15.613945007324219, 1.8204383850097656, 16.899581909179688, 7.1757049560546875, -7.74700927734375, 30.472572326660156, 3.9390182495117188, 4.48968505859375, 1.6240386962890625, -0.00933074951171875, 4.6428375244140625, 5.913116455078125, 3.4599037170410156, 14.927642822265625, 23.848522186279297, -0.7998447418212891, 20.701217651367188, 7.4457244873046875, -7.410255432128906, 8.414962768554688, 1.8095626831054688, -8.063514709472656, 8.834663391113281], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000114.npy"}
{"epoch": 0.16740088105726872, "step": 115, "batch_size": 64, "mean": 7.2280988693237305, "std": 9.10738468170166, "min": -11.524612426757812, "p10": -1.7330711364746088, "median": 5.359796524047852, "p90": 16.302021408081057, "max": 36.0546875, "pos_frac": 0.8125, "sample": [-2.7724685668945312, -1.1818275451660156, 14.553237915039062, 13.012222290039062, 8.498046875, 6.942657470703125, 3.2155609130859375, 3.396883010864258, 0.07030296325683594, 14.122085571289062, 5.463489532470703, 1.6566886901855469, 2.432180404663086, 14.528778076171875, 4.911403656005859, 25.463775634765625, 1.1023082733154297, 5.256103515625, 12.843032836914062, 1.8107643127441406, 11.266616821289062, -0.9106063842773438, 15.753471374511719, 22.673648834228516, 14.112258911132812, 7.973716735839844, 0.5424957275390625, 16.503726959228516, 1.8709125518798828, 35.83453369140625, 14.15576171875, 0.31134033203125, 2.65399169921875, -3.0020599365234375, 5.499225616455078, -8.794265747070312, 7.381378173828125, 9.046436309814453, 14.290325164794922, 15.831375122070312, 8.709320068359375, 10.720474243164062, 9.937599182128906, -2.2510757446289062, 36.0546875, 2.0604095458984375, 1.2793807983398438, -1.9693183898925781, 4.334014892578125, 8.044731140136719, 3.9872703552246094, 15.470321655273438, 4.605812072753906, 19.911544799804688, -0.040676116943359375, 7.22882080078125, 15.556167602539062, -0.36839771270751953, 2.279022216796875, -8.802642822265625, 17.28216552734375, 1.8479766845703125, -0.07419204711914062, -11.524612426757812], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000115.npy"}
{"epoch": 0.16886930983847284, "step": 116, "batch_size": 64, "mean": 6.000702857971191, "std": 7.361717700958252, "min": -6.8576202392578125, "p10": -1.2559799194335937, "median": 4.463157653808594, "p90": 15.690821838378907, "max": 26.346405029296875, "pos_frac": 0.78125, "sample": [3.3067054748535156, 7.890186309814453, 0.09961318969726562, 12.433204650878906, 2.6927413940429688, 6.113182067871094, -2.4127883911132812, 1.5069351196289062, 3.435028076171875, -1.2667236328125, 0.766571044921875, -3.427305221557617, 13.328437805175781, -2.1361732482910156, 4.858615875244141, -1.2067279815673828, 6.4135589599609375, -5.337165832519531, 15.114730834960938, 24.836952209472656, 8.163078308105469, 6.10205078125, 1.2537078857421875, 8.418212890625, 15.802589416503906, 2.2452011108398438, 6.8519287109375, 19.74026107788086, 3.8709793090820312, -5.765655517578125, 12.514389038085938, 7.439842224121094, 4.067699432373047, 6.680023193359375, -0.6156806945800781, 3.1257286071777344, -6.8576202392578125, 5.975929260253906, 3.709491729736328, 3.0945587158203125, -1.0857315063476562, 22.931747436523438, -1.2309112548828125, 26.346405029296875, 5.106941223144531, 0.7384185791015625, 18.45209503173828, 9.505722045898438, 2.288681983947754, 9.653480529785156, 5.246212005615234, -0.2157144546508789, 15.430030822753906, 1.6595230102539062, 14.505290985107422, 0.3634033203125, 4.933750152587891, 11.176136016845703, -0.12455368041992188, 3.398883819580078, -0.7931251525878906, 15.830780029296875, 12.444076538085938, 14.657157897949219], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000116.npy"}
{"epoch": 0.17033773861967694, "step": 117, "batch_size": 64, "mean": 5.051124572753906, "std": 7.9182634353637695, "min": -19.49847412109375, "p10": -3.2515609741210927, "median": 4.237829208374023, "p90": 14.674231719970704, "max": 28.621307373046875, "pos_frac": 0.75, "sample": [-0.004131317138671875, 0.3273773193359375, 23.952102661132812, 1.8723983764648438, 7.271049499511719, 2.9506072998046875, 9.269935607910156, -0.07719802856445312, 6.3974151611328125, 3.2268829345703125, 5.135692596435547, 13.038368225097656, 14.254585266113281, -19.49847412109375, 2.8511505126953125, 2.3951759338378906, 6.10137939453125, 10.24346923828125, -2.2674102783203125, -8.543487548828125, 2.719146728515625, 13.765182495117188, 4.167213439941406, -0.2620067596435547, 15.985595703125, -1.0097427368164062, -6.579246520996094, 0.5629510879516602, -0.36241912841796875, -4.946998596191406, 6.661628723144531, 12.923095703125, 9.014762878417969, 4.68402099609375, 14.953475952148438, 4.013999938964844, 28.621307373046875, 10.778793334960938, 0.112762451171875, 13.000507354736328, 4.7412872314453125, 0.40409088134765625, 3.090106964111328, -4.7684173583984375, -7.1534271240234375, 1.2349815368652344, 11.552631378173828, 7.3228759765625, 6.6285552978515625, 1.2401847839355469, -3.67333984375, 4.308444976806641, 7.866615295410156, 7.249961853027344, 18.28534698486328, 14.854080200195312, -0.624176025390625, 8.229007720947266, 20.994598388671875, 2.597991943359375, 6.968067169189453, -0.8450660705566406, -1.84527587890625, 6.911891937255859], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000117.npy"}
{"epoch": 0.17180616740088106, "step": 118, "batch_size": 64, "mean": 6.8765411376953125, "std": 7.398930549621582, "min": -12.019737243652344, "p10": -0.41395587921142574, "median": 6.5714616775512695, "p90": 15.573406982421876, "max": 28.858901977539062, "pos_frac": 0.875, "sample": [1.6198883056640625, 3.3188438415527344, 13.26150131225586, 0.737823486328125, 7.696956634521484, 11.089698791503906, 2.0323333740234375, 6.790569305419922, 7.314992904663086, 10.00503158569336, 6.2896728515625, 3.9017181396484375, 26.15301513671875, 0.3738899230957031, 0.7630996704101562, 9.930530548095703, 1.2208175659179688, 7.169990539550781, 11.238540649414062, 18.4072265625, 0.20967864990234375, 1.0039634704589844, -1.5349884033203125, 4.0211334228515625, -4.098289489746094, 3.1954498291015625, 0.4910392761230469, 6.599966049194336, 5.224906921386719, 10.64166259765625, 11.940855026245117, 7.502410888671875, -2.3165817260742188, 6.542957305908203, 22.162445068359375, 15.886650085449219, 1.9796504974365234, 9.314773559570312, 21.248580932617188, 6.796318054199219, -0.42765045166015625, -0.3820018768310547, -12.019737243652344, 6.049781799316406, 28.858901977539062, 14.56646728515625, 8.823013305664062, 15.248817443847656, 8.817306518554688, -4.226480484008789, 4.230377197265625, 15.712516784667969, 12.44256591796875, 10.939937591552734, 8.871017456054688, 5.746589660644531, 6.743253707885742, 4.302257537841797, 3.450448989868164, 12.757987976074219, 4.88653564453125, -7.519401550292969, 13.329727172851562, 2.767637252807617], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000118.npy"}
{"epoch": 0.17327459618208516, "step": 119, "batch_size": 64, "mean": 7.111355304718018, "std": 10.072500228881836, "min": -24.905258178710938, "p10": -3.0482734680175776, "median": 6.292522430419922, "p90": 17.130027008056643, "max": 32.84254455566406, "pos_frac": 0.828125, "sample": [6.415426254272461, 13.316085815429688, 4.167812347412109, 25.246353149414062, 19.069732666015625, 12.24493408203125, -5.506370544433594, 0.315582275390625, 0.9852638244628906, 8.649971008300781, -6.235572814941406, 5.845573425292969, 26.092575073242188, 8.946800231933594, 1.48760986328125, 14.213462829589844, 4.563453674316406, 7.047172546386719, 7.158203125, -0.8249359130859375, 11.518142700195312, 14.58663558959961, 16.39795684814453, 13.574058532714844, 6.3572235107421875, 5.5834503173828125, 16.6488037109375, 6.271446228027344, 15.214431762695312, 12.244033813476562, -3.2186508178710938, -2.0411911010742188, 1.2151641845703125, 3.7999038696289062, 28.691757202148438, -4.058929443359375, 14.153099060058594, 32.79058837890625, 3.981029510498047, 1.8856430053710938, -14.516876220703125, 5.678077697753906, 0.0150909423828125, 32.84254455566406, 6.3135986328125, 9.482379913330078, 3.965301513671875, 11.140159606933594, 7.194908142089844, 7.467437744140625, -1.9520339965820312, -24.905258178710938, 3.8805160522460938, 5.0915069580078125, 12.166519165039062, 4.519622802734375, -2.650726318359375, 2.2027320861816406, 9.278594970703125, 0.8663558959960938, -10.884933471679688, 5.6424102783203125, 16.158836364746094, 17.336265563964844], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000119.npy"}
{"epoch": 0.17474302496328928, "step": 120, "batch_size": 64, "mean": 9.431661605834961, "std": 12.3392915725708, "min": -20.384979248046875, "p10": -3.2934724807739255, "median": 6.824032783508301, "p90": 28.39235992431641, "max": 47.23870849609375, "pos_frac": 0.8125, "sample": [28.04425048828125, 17.71978759765625, 12.685165405273438, 12.70947265625, 6.815362930297852, 6.465614318847656, -9.07861328125, 28.762596130371094, 8.688446044921875, 3.9961090087890625, 18.755104064941406, 15.056808471679688, 10.343067169189453, 7.0566253662109375, 5.379949569702148, -20.384979248046875, 11.679351806640625, 2.492084503173828, 14.690231323242188, -1.334259033203125, -16.8106689453125, 25.162269592285156, 16.22528076171875, 13.342391967773438, 3.7803955078125, -6.600128173828125, 1.9819488525390625, 9.198040008544922, 24.57526397705078, -2.9091873168945312, -4.4066925048828125, 33.53240966796875, 29.54468536376953, 11.483108520507812, 28.541549682617188, 4.2049407958984375, 17.293807983398438, 4.72508430480957, 5.567283630371094, -3.4581661224365234, 4.182624816894531, 7.4807586669921875, 1.3768882751464844, 47.23870849609375, 30.953529357910156, 16.172821044921875, 32.27348327636719, 5.7292327880859375, 12.134811401367188, 12.772193908691406, 1.4663925170898438, 3.783966064453125, -6.459075927734375, 10.19207763671875, -0.6398849487304688, -1.757110595703125, -0.9281158447265625, 3.8255157470703125, 26.904891967773438, 3.3735008239746094, 6.079071044921875, 2.47784423828125, 6.83270263671875, 2.6436691284179688], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000120.npy"}
{"epoch": 0.1762114537444934, "step": 121, "batch_size": 64, "mean": 8.387594223022461, "std": 8.516066551208496, "min": -7.200286865234375, "p10": -0.5776996612548827, "median": 6.748355865478516, "p90": 21.081601715087892, "max": 36.729835510253906, "pos_frac": 0.84375, "sample": [2.7893333435058594, 6.900054931640625, 3.749755859375, 21.410507202148438, -2.395954132080078, 7.175987243652344, 14.076698303222656, 2.4257431030273438, 2.177865982055664, 13.637062072753906, 0.7814483642578125, -2.5238609313964844, 3.1271018981933594, 9.324302673339844, 12.870880126953125, 5.105445861816406, -0.32964324951171875, 5.884510040283203, 8.453231811523438, 5.912147521972656, 7.313663482666016, 3.3832244873046875, 3.2040939331054688, 10.161148071289062, 15.216896057128906, 2.3590774536132812, 8.728462219238281, 10.589973449707031, 15.770927429199219, 3.387907028198242, 9.180717468261719, -1.4776611328125, 6.596656799316406, 28.70452880859375, 0.4831199645996094, 6.3702392578125, 9.434089660644531, 23.8917236328125, -0.877288818359375, 36.729835510253906, 3.305419921875, 15.746078491210938, 7.058723449707031, -0.487457275390625, 3.2804412841796875, 8.154701232910156, 23.09520721435547, -0.0687713623046875, 10.686553955078125, 16.74981689453125, 3.3970184326171875, 5.972587585449219, 20.31415557861328, 9.632217407226562, 17.3411865234375, -0.6163749694824219, -7.200286865234375, 21.822113037109375, 13.799713134765625, 18.379531860351562, 23.63775634765625, 6.473268508911133, -5.451082229614258, 2.079498291015625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000121.npy"}
{"epoch": 0.1776798825256975, "step": 122, "batch_size": 64, "mean": 9.313936233520508, "std": 9.542064666748047, "min": -16.702774047851562, "p10": -1.341054344177246, "median": 8.643514633178711, "p90": 21.546685028076173, "max": 33.15669250488281, "pos_frac": 0.796875, "sample": [18.16778564453125, -0.1562652587890625, 8.523200988769531, 7.199684143066406, 11.9244384765625, -6.257602691650391, -1.24420166015625, 6.3151092529296875, 9.789142608642578, 11.348640441894531, 8.76382827758789, -16.702774047851562, 17.052291870117188, 18.677978515625, 10.506011962890625, 14.172782897949219, 8.005561828613281, -0.6548309326171875, 30.580001831054688, 7.0413055419921875, -1.0614089965820312, 8.931129455566406, 7.538444519042969, 6.3973541259765625, 10.728157043457031, 3.5299758911132812, 11.178104400634766, 20.24530792236328, 15.917526245117188, 13.487312316894531, 21.64641571044922, -7.033103942871094, 6.980690002441406, 0.9762229919433594, 0.5128555297851562, 14.365478515625, 11.107757568359375, 10.451087951660156, 29.183013916015625, 23.939319610595703, -0.13771438598632812, 33.15669250488281, 1.4893875122070312, 23.174209594726562, -1.4942626953125, 24.724403381347656, -1.2192916870117188, 15.549415588378906, 4.411369323730469, 7.0774383544921875, 3.4684219360351562, -2.516693115234375, 15.096504211425781, -1.3825626373291016, 14.662948608398438, 7.4558868408203125, 19.261260986328125, 21.313980102539062, -1.778076171875, 1.619415283203125, 12.974067687988281, 6.8412933349609375, 3.8920631408691406, 16.377975463867188], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000122.npy"}
{"epoch": 0.17914831130690162, "step": 123, "batch_size": 64, "mean": 6.8726606369018555, "std": 8.64792537689209, "min": -10.469963073730469, "p10": -3.087017822265625, "median": 7.286346435546875, "p90": 17.43204116821289, "max": 35.001708984375, "pos_frac": 0.75, "sample": [7.2105255126953125, 7.815399169921875, 5.872322082519531, 6.263023376464844, 8.170822143554688, 9.027206420898438, 15.0035400390625, 18.897918701171875, 7.399269104003906, 3.6932144165039062, -0.9608345031738281, 3.2842254638671875, -3.3955230712890625, 12.682258605957031, 2.4411773681640625, 8.855670928955078, 2.4365921020507812, 19.028671264648438, -4.926544189453125, 12.235710144042969, 2.75445556640625, 35.001708984375, 9.816619873046875, -1.963104248046875, 1.7723846435546875, -0.0536346435546875, 26.58605194091797, 3.822206497192383, 22.039688110351562, 9.3814697265625, 10.350826263427734, 8.87432861328125, -0.19585418701171875, -1.8680496215820312, 12.613670349121094, 12.973159790039062, 12.160285949707031, 12.095706939697266, 6.2910919189453125, 1.8496627807617188, 7.3621673583984375, 17.101165771484375, -0.44681739807128906, -6.732017517089844, 14.021766662597656, -2.8045654296875, 0.6154098510742188, -5.626014709472656, -3.20806884765625, 24.914337158203125, 13.828353881835938, 12.8017578125, -0.599609375, 8.266143798828125, 17.57384490966797, -1.8642845153808594, 9.433490753173828, 1.2327003479003906, -10.469963073730469, 4.052055358886719, 13.51629638671875, -7.934600830078125, 1.2073345184326172, 8.272087097167969], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000123.npy"}
{"epoch": 0.18061674008810572, "step": 124, "batch_size": 64, "mean": 8.370440483093262, "std": 12.116426467895508, "min": -9.825437545776367, "p10": -3.272485351562499, "median": 4.706523895263672, "p90": 24.74584121704102, "max": 51.381431579589844, "pos_frac": 0.71875, "sample": [10.039031982421875, -3.60980224609375, 19.63996124267578, -1.7536544799804688, -0.797882080078125, 14.014312744140625, -1.786285400390625, 2.253582000732422, -2.019805908203125, 51.381431579589844, 1.3118057250976562, 4.040317535400391, -0.09070205688476562, 12.102325439453125, 4.230278015136719, 30.307571411132812, 33.89588165283203, 4.3354644775390625, 1.0823593139648438, 1.5416107177734375, -2.48541259765625, -4.4433441162109375, 29.576034545898438, -8.669509887695312, 25.011474609375, 17.751564025878906, 15.9005126953125, 0.17531585693359375, 10.018310546875, 13.04315185546875, -9.825437545776367, 1.7718029022216797, 0.8519058227539062, 2.7708740234375, 13.876968383789062, 39.84912109375, 12.671676635742188, 17.700759887695312, 11.48651123046875, 6.31536865234375, 11.488311767578125, 8.184913635253906, 5.115638732910156, -4.114234924316406, 8.597799301147461, -0.254058837890625, 10.67633056640625, 4.3048553466796875, -0.6895732879638672, -6.042144775390625, -7.140907287597656, 5.077583312988281, 6.9077301025390625, 11.892120361328125, 15.014480590820312, 35.019256591796875, 10.37054443359375, 24.12602996826172, -0.4984703063964844, -0.11022186279296875, -0.20306396484375, 3.5478057861328125, 19.308483123779297, 1.6635417938232422], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000124.npy"}
{"epoch": 0.18208516886930984, "step": 125, "batch_size": 64, "mean": 6.136148452758789, "std": 9.025079727172852, "min": -16.600753784179688, "p10": -3.2471256256103516, "median": 4.586750030517578, "p90": 18.26108169555664, "max": 31.715484619140625, "pos_frac": 0.765625, "sample": [5.796802520751953, 8.8966064453125, 7.623710632324219, 0.71209716796875, -1.9798812866210938, -4.9834442138671875, 15.63470458984375, 0.21300506591796875, 18.3619384765625, 3.1355438232421875, 1.0781097412109375, -0.7153205871582031, -1.5358390808105469, -16.600753784179688, 13.515933990478516, 1.2845497131347656, 7.6309356689453125, 8.927146911621094, 11.43814468383789, 0.24013519287109375, 7.711982727050781, 0.6644554138183594, -5.8726806640625, -4.280426025390625, 4.880531311035156, 31.715484619140625, -2.0083847045898438, 3.5496883392333984, 13.300468444824219, 10.693885803222656, 5.102626800537109, -1.8583984375, -1.2379035949707031, 0.4899635314941406, 20.007061004638672, 16.729583740234375, -3.9236297607421875, 24.332054138183594, 18.67376708984375, 10.667617797851562, 3.6007843017578125, 27.305286407470703, 4.53802490234375, 14.620262145996094, 13.707687377929688, -3.2188873291015625, 4.238746643066406, 6.455558776855469, 4.635475158691406, 12.807720184326172, 2.2818756103515625, 3.6365127563476562, 14.472785949707031, -3.2243614196777344, 18.02574920654297, 4.803562164306641, -3.2568817138671875, 17.10485076904297, 4.76226806640625, 3.67034912109375, -8.26043701171875, 20.863571166992188, 1.068634033203125, 0.06250381469726562], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000125.npy"}
{"epoch": 0.18355359765051396, "step": 126, "batch_size": 64, "mean": 8.823543548583984, "std": 8.069236755371094, "min": -10.50177001953125, "p10": -0.5528572082519525, "median": 8.024059295654297, "p90": 19.935886383056644, "max": 30.4239501953125, "pos_frac": 0.890625, "sample": [30.4239501953125, -3.2427978515625, 2.571979522705078, 7.183677673339844, 18.028533935546875, 21.206031799316406, 12.934440612792969, -0.8769073486328125, 6.527740478515625, 2.418426513671875, 2.27252197265625, 12.36541748046875, 15.224334716796875, 13.280868530273438, 2.7376251220703125, -1.5348739624023438, 3.789031982421875, 7.754341125488281, 8.778450012207031, 0.01822662353515625, 4.8285369873046875, 4.961952209472656, 0.46860313415527344, -1.2055892944335938, 6.28369140625, 15.49237060546875, 19.40361785888672, 23.155303955078125, 14.32796859741211, 10.479496002197266, 3.8148536682128906, 0.7658233642578125, 0.6248855590820312, 5.3106842041015625, 20.16400146484375, 2.695159912109375, 10.042068481445312, -5.76116943359375, 17.172927856445312, 8.689888000488281, -0.797607421875, 15.853363037109375, 10.69805908203125, 12.443611145019531, 4.7690582275390625, 8.293777465820312, 9.980472564697266, 9.047161102294922, 9.833892822265625, 7.120819091796875, 4.107177734375, 15.598556518554688, 21.332565307617188, 25.639068603515625, 9.592117309570312, 2.3627243041992188, 15.700820922851562, 3.731861114501953, 6.196563720703125, 16.467185974121094, 5.696754455566406, 12.96206283569336, -10.50177001953125, 25.002365112304688], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000126.npy"}
{"epoch": 0.18502202643171806, "step": 127, "batch_size": 64, "mean": 6.779728412628174, "std": 9.359068870544434, "min": -14.3831787109375, "p10": -3.0276676177978508, "median": 5.250177383422852, "p90": 16.977753829956058, "max": 36.57920837402344, "pos_frac": 0.8125, "sample": [0.20391464233398438, 23.25609588623047, -2.386432647705078, 1.007598876953125, 5.071479797363281, -3.3024826049804688, 5.2223968505859375, 0.6251678466796875, 6.8160858154296875, 10.947174072265625, 1.9706859588623047, 9.99056625366211, 20.286514282226562, -10.100048065185547, 6.706758499145508, 2.627389907836914, 11.48421859741211, 2.0917816162109375, 3.002777099609375, 12.113677978515625, 9.469711303710938, 31.22144317626953, -1.004913330078125, 8.741058349609375, 8.756359100341797, 13.852645874023438, 5.085929870605469, 15.228302001953125, -9.522842407226562, 8.686882019042969, 27.349430084228516, 22.72161102294922, 11.189586639404297, 1.349583625793457, -14.3831787109375, 0.473663330078125, -0.6065883636474609, 5.277957916259766, 1.180551528930664, -0.08829116821289062, 4.8162994384765625, 15.935997009277344, 0.7247352600097656, 7.3647003173828125, 3.06561279296875, 13.550338745117188, 6.379722595214844, 11.66998291015625, -4.65399169921875, 11.005203247070312, 1.9994277954101562, 2.696033477783203, 7.965341567993164, 36.57920837402344, 0.5199470520019531, 15.296432495117188, -1.2092742919921875, 15.483528137207031, -4.582618713378906, 1.4220085144042969, 17.42422103881836, 14.833145141601562, -4.0655517578125, 7.06793212890625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000127.npy"}
{"epoch": 0.18649045521292218, "step": 128, "batch_size": 64, "mean": 6.783669471740723, "std": 10.266966819763184, "min": -35.77569580078125, "p10": -0.920114517211914, "median": 4.623449325561523, "p90": 19.385073471069337, "max": 44.071441650390625, "pos_frac": 0.8125, "sample": [3.449993133544922, -0.9501724243164062, 9.733169555664062, 14.525337219238281, 17.3074951171875, 0.9007339477539062, 19.643047332763672, 22.195541381835938, 1.5164947509765625, 2.7165603637695312, 8.608406066894531, 5.363437652587891, -0.223297119140625, 4.444480895996094, 0.27713775634765625, 18.78313446044922, 2.6534576416015625, 4.1632080078125, -0.6273384094238281, 0.86083984375, -6.291618347167969, 3.896808624267578, 15.735527038574219, 20.212265014648438, -0.189056396484375, 1.7531242370605469, 44.071441650390625, 21.772048950195312, 24.208663940429688, 8.490425109863281, 9.21502685546875, 0.33840179443359375, 10.849773406982422, -2.3926239013671875, 1.7740516662597656, 11.904884338378906, -35.77569580078125, 11.14227294921875, 7.063106536865234, -0.21584320068359375, 11.491172790527344, 2.3678760528564453, 0.1874847412109375, -0.8499794006347656, 1.5180511474609375, 13.336441040039062, -3.7844715118408203, 0.4653472900390625, 7.119716644287109, 3.6269798278808594, 9.896781921386719, 5.757568359375, 9.847915649414062, -1.0634212493896484, 4.802417755126953, 9.476409912109375, 1.7524948120117188, 14.747749328613281, 17.9925537109375, -2.986949920654297, 9.3555908203125, 11.023067474365234, 24.192153930664062, 0.9772434234619141], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000128.npy"}
{"epoch": 0.18795888399412627, "step": 129, "batch_size": 64, "mean": 6.976961135864258, "std": 9.583343505859375, "min": -9.814910888671875, "p10": -3.322932434082031, "median": 4.567933082580566, "p90": 20.385048294067385, "max": 32.513153076171875, "pos_frac": 0.671875, "sample": [-0.0240631103515625, -0.8506965637207031, -1.7881813049316406, 20.466854095458984, -3.4766159057617188, 18.2774658203125, -0.24181365966796875, 1.9833297729492188, 20.194168090820312, -1.4900131225585938, 19.615737915039062, 11.91925048828125, -0.11248588562011719, 24.20868682861328, 0.2330322265625, 1.8417282104492188, 6.8973388671875, 0.820220947265625, 4.38031005859375, 4.755556106567383, 12.467437744140625, 17.61878204345703, 12.598403930664062, 6.962837219238281, -3.4658279418945312, -9.814910888671875, -3.4172592163085938, 4.915740966796875, 29.11480712890625, 32.513153076171875, 2.092020034790039, 30.64788818359375, 23.972122192382812, 6.102653503417969, 11.887832641601562, -0.8397560119628906, -0.5093841552734375, 3.8682098388671875, 12.192359924316406, 3.6288681030273438, -4.311897277832031, -0.7852706909179688, 11.144363403320312, 11.120368957519531, 21.32367706298828, -0.457794189453125, 14.426475524902344, 3.928678512573242, -4.989408493041992, -3.1028366088867188, 9.288780212402344, 10.666816711425781, 2.0605926513671875, 9.220382690429688, 5.26434326171875, -0.135894775390625, 13.616199493408203, 1.397613525390625, -2.3397140502929688, -3.958221435546875, 19.789886474609375, 7.984638214111328, 6.641845703125, -1.4139080047607422], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000129.npy"}
{"epoch": 0.1894273127753304, "step": 130, "batch_size": 64, "mean": 9.080093383789062, "std": 10.614298820495605, "min": -4.8106231689453125, "p10": -2.978909683227539, "median": 7.938911437988281, "p90": 23.649007034301764, "max": 46.88679504394531, "pos_frac": 0.8125, "sample": [6.516456604003906, 5.439949035644531, 0.7205009460449219, 30.203956604003906, -3.89361572265625, 11.168075561523438, 11.875534057617188, 12.774761199951172, 28.32219696044922, -1.1538639068603516, 8.783554077148438, 26.820655822753906, 6.852165222167969, 15.243316650390625, 10.19451904296875, -4.529273986816406, 8.619552612304688, -3.020793914794922, 15.495674133300781, 10.607269287109375, 1.3061141967773438, -1.862152099609375, 1.7954864501953125, 21.78548812866211, 24.403926849365234, -4.5092620849609375, -2.8811798095703125, 10.472431182861328, 2.787425994873047, 19.98174285888672, -3.7981300354003906, 8.68939208984375, -3.9598007202148438, 4.8710784912109375, 9.117019653320312, 15.08096694946289, 9.325088500976562, 3.1627960205078125, 7.502593994140625, -0.9244155883789062, 9.430145263671875, 0.41765403747558594, 21.887527465820312, 14.298515319824219, 5.089210510253906, 0.0948028564453125, 46.88679504394531, 4.7105712890625, 2.105741500854492, 3.6742095947265625, 8.375228881835938, -0.7551345825195312, 16.243133544921875, -4.8106231689453125, 10.522315979003906, 1.9698638916015625, 17.142929077148438, 2.248866081237793, 29.834060668945312, 13.543014526367188, 35.89811706542969, 5.452144622802734, 3.8624267578125, 13.6173095703125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000130.npy"}
{"epoch": 0.19089574155653452, "step": 131, "batch_size": 64, "mean": 7.820746421813965, "std": 9.677148818969727, "min": -13.02252197265625, "p10": -2.288307094573974, "median": 6.821376800537109, "p90": 22.269977569580078, "max": 37.868682861328125, "pos_frac": 0.765625, "sample": [-1.6301727294921875, -0.0672760009765625, 6.77203369140625, -1.8924942016601562, 29.882308959960938, -13.02252197265625, -1.9257431030273438, 4.43389892578125, 11.082923889160156, 6.870719909667969, 37.868682861328125, 13.284492492675781, -2.7965240478515625, 1.5963058471679688, 2.4489593505859375, 0.9511947631835938, 3.976308822631836, -2.4244279861450195, 1.6444320678710938, 8.768939971923828, -1.9706916809082031, 11.238815307617188, -3.893768310546875, 1.0298843383789062, 16.78044891357422, 3.097991943359375, 2.6240997314453125, 4.266304016113281, 21.942733764648438, 7.3110809326171875, -1.6556205749511719, 13.191970825195312, 0.154327392578125, 2.924816131591797, 15.199264526367188, -0.011236190795898438, 16.080223083496094, 9.206047058105469, 28.062843322753906, 4.493946075439453, 7.303569793701172, 11.074274063110352, 17.20026397705078, 8.394287109375, -4.0949249267578125, 15.116058349609375, 8.361763000488281, 7.247104644775391, -2.5733165740966797, 8.810955047607422, 7.334144592285156, -1.17718505859375, -3.15997314453125, 23.310562133789062, 19.37616729736328, 22.41022491455078, 3.5098800659179688, 5.289791107177734, 0.3692331314086914, 22.971832275390625, 17.880020141601562, 10.956584930419922, 25.1820068359375, 13.538932800292969], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000131.npy"}
{"epoch": 0.19236417033773862, "step": 132, "batch_size": 64, "mean": 8.275461196899414, "std": 9.346583366394043, "min": -24.83209228515625, "p10": 0.17722854614257882, "median": 5.708625793457031, "p90": 21.62172546386719, "max": 30.16948699951172, "pos_frac": 0.890625, "sample": [7.316188812255859, 1.1977996826171875, 10.322402954101562, 3.0223922729492188, 3.5020980834960938, 20.500289916992188, 3.7817420959472656, 28.300750732421875, 24.52154541015625, 4.878292083740234, 5.1563873291015625, 7.536041259765625, 4.460411071777344, 3.5068511962890625, -1.519378662109375, 9.215103149414062, 22.29669189453125, 1.6437454223632812, 11.313423156738281, 4.049152374267578, -4.4193572998046875, 5.761161804199219, 21.15039825439453, 0.88970947265625, 3.1098098754882812, 4.9716339111328125, 3.2222766876220703, 26.67186737060547, 6.053733825683594, 6.98388671875, -5.109046936035156, -24.83209228515625, 1.0332145690917969, 18.39165496826172, 6.033843994140625, 30.16948699951172, 14.290058135986328, 6.739727020263672, 5.656089782714844, 14.424636840820312, 21.82372283935547, 4.790306091308594, 15.86935043334961, 14.366104125976562, -2.9587860107421875, 5.326694488525391, 16.682838439941406, 4.6544647216796875, -0.12812042236328125, 5.4462432861328125, -0.4604377746582031, 18.878990173339844, 4.977851867675781, 5.469596862792969, 14.246734619140625, 15.4014892578125, 4.3133697509765625, 1.3867950439453125, 7.935447692871094, 8.48841381072998, 6.414085388183594, 0.9857330322265625, 10.594673156738281, 28.929306030273438], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000132.npy"}
{"epoch": 0.19383259911894274, "step": 133, "batch_size": 64, "mean": 8.378573417663574, "std": 9.179451942443848, "min": -5.8385162353515625, "p10": -0.602605628967285, "median": 6.166007995605469, "p90": 19.88926010131836, "max": 38.20375061035156, "pos_frac": 0.859375, "sample": [5.342674255371094, 12.776687622070312, -5.8385162353515625, 4.7872772216796875, 5.097908020019531, -3.8278274536132812, 17.534515380859375, -3.6496658325195312, 0.293914794921875, 14.926483154296875, 8.739364624023438, 6.355037689208984, 0.14780426025390625, 33.75025177001953, 1.98199462890625, 18.546737670898438, 2.9346923828125, 1.143310546875, 20.118423461914062, 5.533420562744141, 9.967453002929688, 17.202392578125, 6.314659118652344, 7.8054351806640625, 7.858451843261719, 19.35454559326172, 0.057373046875, 8.908775329589844, 17.059341430664062, -2.22479248046875, 13.359642028808594, 6.017356872558594, 4.1894683837890625, 4.5931243896484375, 2.0989608764648438, 7.207275390625, 3.2006301879882812, 7.860252380371094, 5.309539794921875, 8.879871368408203, 3.544677734375, -1.506744384765625, 2.679698944091797, 38.20375061035156, -0.17308425903320312, 8.799354553222656, 19.236282348632812, -0.6577072143554688, 21.175262451171875, 2.8686599731445312, 6.546623229980469, 16.014400482177734, 0.9383926391601562, 6.584625244140625, 28.23711395263672, 23.318008422851562, 10.736137390136719, 24.50267791748047, 18.804611206054688, 5.387828826904297, 2.1560287475585938, -0.47403526306152344, 2.6793994903564453, -5.087493896484375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000133.npy"}
{"epoch": 0.19530102790014683, "step": 134, "batch_size": 64, "mean": 9.169682502746582, "std": 9.1242094039917, "min": -6.365043640136719, "p10": -0.4676452636718746, "median": 7.328680992126465, "p90": 21.51821746826172, "max": 32.68046569824219, "pos_frac": 0.875, "sample": [1.6233978271484375, -0.6450653076171875, -4.376800537109375, 13.091270446777344, 32.68046569824219, 6.8856353759765625, 0.3838958740234375, 7.376079559326172, -4.419456481933594, 11.92498779296875, 7.0706024169921875, 8.744194030761719, 9.209441184997559, 2.1175460815429688, 0.1868438720703125, 2.955718994140625, 0.5733566284179688, 9.49822998046875, 13.674674987792969, 0.8850326538085938, 3.911663055419922, 7.007366180419922, -3.2627792358398438, 5.258167266845703, 7.578498840332031, 6.8811798095703125, 18.15899658203125, 11.663284301757812, 15.822990417480469, 20.675762176513672, 7.281282424926758, 21.644432067871094, 3.6088123321533203, 12.952079772949219, 14.772720336914062, 17.985267639160156, 13.139030456542969, 1.2360687255859375, -3.2186355590820312, 6.782924652099609, 19.69573974609375, 7.797576904296875, 9.646713256835938, 1.7403984069824219, 29.489608764648438, 15.245414733886719, 23.000228881835938, -0.0536651611328125, 22.3026123046875, 3.691433906555176, -5.847404479980469, 3.1532516479492188, -6.365043640136719, 30.16448211669922, 13.538581848144531, 6.16229248046875, 6.577339172363281, 2.396697998046875, 10.214981079101562, 14.472129821777344, 2.1020278930664062, 26.890182495117188, 21.223716735839844, 20.30120849609375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000134.npy"}
{"epoch": 0.19676945668135096, "step": 135, "batch_size": 64, "mean": 8.215873718261719, "std": 10.619638442993164, "min": -14.852340698242188, "p10": -2.1787574768066404, "median": 5.4277496337890625, "p90": 19.238882446289068, "max": 58.8043212890625, "pos_frac": 0.859375, "sample": [1.1009674072265625, 11.494384765625, 7.848640441894531, 16.39111328125, -4.752838134765625, 17.709659576416016, 19.949951171875, 1.2948760986328125, 10.487197875976562, 18.0645751953125, 12.113883972167969, 1.9091300964355469, 21.7010498046875, -4.251518249511719, 10.61895751953125, 2.474945068359375, 17.380840301513672, 13.628562927246094, 6.71356201171875, 1.6491050720214844, 3.985990524291992, 17.146995544433594, 3.74652099609375, -2.870086669921875, 4.459016799926758, 24.882568359375, 1.4281768798828125, 58.8043212890625, -5.421726226806641, 30.54174041748047, 3.7879257202148438, 5.609649658203125, -2.9861984252929688, 1.1098518371582031, 4.715145111083984, 12.749862670898438, 4.4823455810546875, 12.507253646850586, -2.0442657470703125, -1.8007583618164062, 0.1035919189453125, 0.702728271484375, 1.2848243713378906, 3.927764892578125, -14.852340698242188, 9.332355499267578, 5.794334411621094, 4.0492095947265625, 4.333927154541016, 6.297874450683594, 4.526655197143555, 5.245849609375, 15.538558959960938, 3.5468482971191406, 13.799171447753906, 16.7412109375, 10.635738372802734, 19.742156982421875, 8.227775573730469, -2.2363967895507812, 3.2105636596679688, 28.62005615234375, 9.812713623046875, 9.069404602050781], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000135.npy"}
{"epoch": 0.19823788546255505, "step": 136, "batch_size": 64, "mean": 8.894369125366211, "std": 8.368104934692383, "min": -6.589324951171875, "p10": 0.05446434020996112, "median": 7.790353775024414, "p90": 21.627578735351562, "max": 29.39666748046875, "pos_frac": 0.890625, "sample": [3.7030258178710938, 16.97271728515625, 8.59124755859375, 9.652450561523438, 4.610908508300781, -0.617156982421875, 24.81378173828125, 9.491832733154297, 4.775947570800781, 4.664556503295898, 10.709022521972656, 3.1267242431640625, 6.613922119140625, 1.1408557891845703, 8.080554962158203, 9.199569702148438, 0.23381423950195312, 3.059316635131836, 11.232315063476562, 28.84326171875, 7.500152587890625, 21.655364990234375, 28.97119140625, 6.643798828125, 4.7033843994140625, 19.140960693359375, 23.432174682617188, 1.5113353729248047, 29.39666748046875, 2.2632102966308594, 10.161659240722656, 6.230842590332031, 4.955982208251953, -6.589324951171875, 4.1945953369140625, -0.02239990234375, 8.244163513183594, 2.102813720703125, 12.40740966796875, 9.237476348876953, 2.9146270751953125, 9.459861755371094, 0.44544219970703125, 21.93499755859375, -1.8403739929199219, 10.296926498413086, 1.8522529602050781, 15.947635650634766, -3.9619979858398438, -4.4651641845703125, 20.462623596191406, 15.066780090332031, 14.766376495361328, 12.159263610839844, 10.512435913085938, 7.196552276611328, 15.09661865234375, 2.7128868103027344, 6.0829925537109375, -2.7511672973632812, 12.28314208984375, 14.253341674804688, 21.562744140625, 2.2107086181640625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000136.npy"}
{"epoch": 0.19970631424375918, "step": 137, "batch_size": 64, "mean": 12.916418075561523, "std": 11.870491981506348, "min": -11.240928649902344, "p10": 1.4889087677001958, "median": 9.408896446228027, "p90": 30.245177459716803, "max": 47.679107666015625, "pos_frac": 0.921875, "sample": [3.140350341796875, 13.705984115600586, 12.079475402832031, 7.400840759277344, 16.92462921142578, 19.444229125976562, 9.0772705078125, 37.845489501953125, 6.3434295654296875, 20.095367431640625, 1.9984664916992188, 7.137199401855469, 42.615203857421875, 8.513729095458984, 7.672233581542969, 4.519756317138672, 11.631256103515625, 15.042068481445312, 6.465259552001953, 11.611778259277344, 28.640884399414062, 7.247276306152344, 9.364400863647461, 47.679107666015625, 30.93273162841797, 12.258056640625, 18.742691040039062, 12.877670288085938, -1.9338092803955078, 11.201622009277344, 8.78609848022461, 11.80364990234375, 35.45939636230469, 5.210136413574219, 32.112213134765625, 7.038261413574219, 12.606487274169922, 11.319198608398438, 27.904541015625, 9.453392028808594, 7.235248565673828, 8.662635803222656, -1.9027481079101562, -11.240928649902344, 3.8162994384765625, -0.48018646240234375, 41.71588134765625, -1.6362838745117188, 20.625213623046875, 20.969879150390625, 8.525936126708984, 1.3028564453125, 3.9729366302490234, 13.616893768310547, 1.9230308532714844, 0.13034629821777344, 3.860076904296875, 6.0202484130859375, 3.539398193359375, 24.49034881591797, 25.192825317382812, 2.496826171875, 15.952495574951172, 25.893478393554688], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000137.npy"}
{"epoch": 0.2011747430249633, "step": 138, "batch_size": 64, "mean": 8.444417953491211, "std": 9.162915229797363, "min": -6.13018798828125, "p10": -1.037647056579589, "median": 6.0747528076171875, "p90": 21.182179260253907, "max": 35.697113037109375, "pos_frac": 0.875, "sample": [7.0747528076171875, 2.889007568359375, 15.84731674194336, 19.994232177734375, 6.7584228515625, 20.674598693847656, 2.3096771240234375, 2.808736801147461, 7.370567321777344, 26.724891662597656, 1.3971023559570312, -4.0668182373046875, -1.8129119873046875, 1.4371404647827148, 3.018451690673828, 3.7982177734375, 8.904745101928711, 2.8767852783203125, 0.7631149291992188, -3.7723541259765625, 15.101966857910156, 25.701171875, -0.1777515411376953, 4.603118896484375, 24.783470153808594, 16.8961181640625, -1.4061737060546875, 6.148284912109375, 7.943077087402344, 1.9139404296875, 2.271514892578125, 2.2668113708496094, 5.423774719238281, 19.873794555664062, 20.800941467285156, 26.852432250976562, 16.295654296875, 12.076385498046875, -2.0807037353515625, 21.315383911132812, 6.174232482910156, 10.03204345703125, 0.9510345458984375, 1.7054929733276367, 4.169319152832031, -6.13018798828125, 18.569610595703125, 7.6343231201171875, -2.6317672729492188, 9.763107299804688, 9.962257385253906, 3.5290279388427734, 0.8552627563476562, 4.762603759765625, 0.24712562561035156, 8.258243560791016, 12.247303009033203, 24.13623809814453, 7.195991516113281, 35.697113037109375, 2.840343475341797, 20.871368408203125, 2.0025787353515625, 6.001220703125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000138.npy"}
{"epoch": 0.2026431718061674, "step": 139, "batch_size": 64, "mean": 9.706657409667969, "std": 10.668333053588867, "min": -32.958160400390625, "p10": -0.7955841064453123, "median": 9.585823059082031, "p90": 23.302023315429693, "max": 34.14613723754883, "pos_frac": 0.859375, "sample": [-2.7584495544433594, 9.212974548339844, 13.836814880371094, 19.617446899414062, 15.844501495361328, 10.245071411132812, 34.14613723754883, 3.3595657348632812, 14.455322265625, 5.393280029296875, -32.958160400390625, 11.19976806640625, 10.620216369628906, 23.747482299804688, 24.187301635742188, 8.439231872558594, 10.65325927734375, 26.252235412597656, 16.40251922607422, -1.7532157897949219, 22.262619018554688, 3.963733673095703, 3.3373260498046875, 5.044532775878906, 6.1592254638671875, 1.8871040344238281, 12.774436950683594, 17.18384552001953, -4.361631393432617, 3.8276748657226562, 20.567962646484375, 12.048357009887695, -0.597747802734375, -0.88037109375, 12.721511840820312, 8.429431915283203, 8.464183807373047, 3.917694091796875, 1.0186805725097656, 21.225135803222656, 2.9458045959472656, 8.369979858398438, 10.410694122314453, 6.692230224609375, 9.147789001464844, 10.4676513671875, 0.08020782470703125, 12.58255386352539, 0.9443092346191406, -4.054847717285156, 32.748695373535156, 10.674240112304688, 5.316822052001953, 6.623302459716797, 32.12213134765625, -2.22039794921875, 31.461517333984375, 14.604660034179688, 20.138351440429688, 2.0697479248046875, 10.631271362304688, 9.958671569824219, 10.531368255615234, -0.157684326171875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000139.npy"}
{"epoch": 0.20411160058737152, "step": 140, "batch_size": 64, "mean": 8.94013786315918, "std": 9.2737398147583, "min": -7.6414794921875, "p10": -1.6880004882812498, "median": 7.032390594482422, "p90": 22.43387260437012, "max": 30.65484619140625, "pos_frac": 0.84375, "sample": [5.403755187988281, 5.556903839111328, 26.22125244140625, 17.425697326660156, -1.7911529541015625, 0.9226303100585938, 4.001533508300781, 13.405059814453125, 19.475175857543945, 22.607528686523438, 19.17778778076172, -4.1723480224609375, 10.266105651855469, 3.0545501708984375, 10.197021484375, 7.10418701171875, 16.54497528076172, 2.9859390258789062, 8.604190826416016, 22.028675079345703, -1.4473114013671875, 2.1163330078125, 19.354713439941406, 21.68663787841797, -0.1320648193359375, 7.148468017578125, 4.0976104736328125, 9.794296264648438, 5.8179779052734375, 2.8952674865722656, 2.7011489868164062, 3.971343994140625, 7.661338806152344, 23.725196838378906, -0.6522216796875, 22.819664001464844, -2.2985305786132812, 15.363761901855469, 14.1290283203125, 9.947242736816406, 3.6435775756835938, 0.7573928833007812, 30.65484619140625, -7.6414794921875, 6.925483703613281, 6.821605682373047, 0.63067626953125, 29.462451934814453, 12.300102233886719, -4.303802490234375, -6.830043792724609, 1.0201034545898438, 1.0987663269042969, 4.939018249511719, 6.960594177246094, 11.301162719726562, 7.391441345214844, -4.66766357421875, 25.342193603515625, 20.647735595703125, 8.688785552978516, 12.803176879882812, 6.5952606201171875, 19.908050537109375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000140.npy"}
{"epoch": 0.2055800293685756, "step": 141, "batch_size": 64, "mean": 10.515582084655762, "std": 10.635018348693848, "min": -16.562744140625, "p10": -0.5571094512939452, "median": 9.47307014465332, "p90": 27.544334411621097, "max": 43.30632019042969, "pos_frac": 0.84375, "sample": [4.260223388671875, 6.708892822265625, 9.556961059570312, 9.223564147949219, 14.739471435546875, 2.020862579345703, 13.168159484863281, 3.1394920349121094, 2.1867218017578125, 2.287567138671875, -0.12276077270507812, 31.03497314453125, -16.562744140625, 7.480766296386719, -0.6080093383789062, 26.532455444335938, 0.21665382385253906, 4.77752685546875, 28.884613037109375, 6.038124084472656, 27.977996826171875, 29.95294189453125, 16.25603675842285, 10.842575073242188, 21.06488037109375, 2.2705078125, 16.247100830078125, 9.486618041992188, 5.6474609375, -0.2030487060546875, 13.139785766601562, 11.52093505859375, 6.618078231811523, -1.1781082153320312, 31.013015747070312, 9.459522247314453, 11.350852966308594, 4.754451751708984, 4.206932067871094, 7.34684944152832, 10.106315612792969, 13.774761199951172, -0.612030029296875, 43.30632019042969, 5.309976577758789, 11.491138458251953, 14.614051818847656, -1.9771957397460938, -0.4383430480957031, -6.47698974609375, 19.33025360107422, 15.827781677246094, 7.548248291015625, 0.8993988037109375, 15.947998046875, 13.386375427246094, 33.355133056640625, 14.174957275390625, 5.383277893066406, 15.991447448730469, 16.660499572753906, -3.474517822265625, 19.6212158203125, 16.50829315185547], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000141.npy"}
{"epoch": 0.20704845814977973, "step": 142, "batch_size": 64, "mean": 9.713190078735352, "std": 10.590311050415039, "min": -5.661033630371094, "p10": -1.9565204620361327, "median": 7.468488693237305, "p90": 20.761311721801768, "max": 49.44927978515625, "pos_frac": 0.84375, "sample": [8.645835876464844, 49.44927978515625, 5.248785018920898, 7.638805389404297, 14.708301544189453, 6.438079833984375, 11.142738342285156, 11.580513000488281, 17.049301147460938, 6.793235778808594, 6.2018280029296875, 13.04058837890625, 7.968025207519531, -2.4212074279785156, 25.274192810058594, 34.721160888671875, 2.227876663208008, 6.1328582763671875, 0.384979248046875, 1.562042236328125, 11.613746643066406, 21.73748016357422, 6.3977203369140625, 5.4882965087890625, 16.700485229492188, 8.647560119628906, 2.398345947265625, 8.90119743347168, -1.5605621337890625, -1.4238567352294922, 37.603729248046875, 9.979904174804688, 18.338668823242188, 18.483585357666016, -4.590482711791992, 27.432762145996094, 14.763008117675781, 5.652944564819336, -3.4867210388183594, 1.5852279663085938, 7.2981719970703125, 8.355735778808594, -2.0190773010253906, 5.800048828125, 2.4681625366210938, 17.017578125, 7.191789627075195, 6.5367584228515625, 17.764816284179688, 6.35577392578125, 12.37801742553711, -5.661033630371094, -1.8105545043945312, 10.75030517578125, 34.724761962890625, 17.096099853515625, -3.001920700073242, 10.948280334472656, 14.399436950683594, 0.4534492492675781, 4.772979736328125, 12.712127685546875, 1.295196533203125, -2.6329879760742188], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000142.npy"}
{"epoch": 0.20851688693098386, "step": 143, "batch_size": 64, "mean": 8.390382766723633, "std": 10.29248046875, "min": -18.835845947265625, "p10": -2.4872978210449217, "median": 7.2354278564453125, "p90": 21.013642883300786, "max": 44.44075012207031, "pos_frac": 0.8125, "sample": [10.114734649658203, 10.515457153320312, 1.8064422607421875, -11.778228759765625, 17.010467529296875, 11.513397216796875, -2.5740737915039062, 21.543289184570312, 14.801345825195312, 6.2610015869140625, 23.783348083496094, -1.6781539916992188, 7.6226043701171875, 10.79833984375, -0.4287109375, -4.172027587890625, 1.2513809204101562, -1.1638603210449219, -6.046913146972656, 0.6225433349609375, 6.27056884765625, 11.697006225585938, -2.6161155700683594, 23.8341064453125, 6.409614562988281, 3.434436798095703, 2.7402191162109375, 15.666893005371094, 8.357162475585938, 15.03762435913086, 9.071342468261719, 4.44239616394043, 28.49445343017578, 3.407085418701172, 11.876960754394531, 4.126762390136719, 6.5792694091796875, 12.121757507324219, 19.777801513671875, 16.129837036132812, 13.155616760253906, -2.284820556640625, 1.2418670654296875, 12.121170043945312, -0.6024866104125977, 19.53314971923828, 3.2082061767578125, 13.459144592285156, 0.45206451416015625, 10.540023803710938, 14.590286254882812, 44.44075012207031, 8.61822509765625, 4.7826690673828125, 30.763381958007812, 6.8482513427734375, 11.427986145019531, 3.9984817504882812, 27.601089477539062, -18.835845947265625, 3.6399612426757812, 9.909332275390625, -3.328521728515625, 5.042938232421875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000143.npy"}
{"epoch": 0.20998531571218795, "step": 144, "batch_size": 64, "mean": 9.0801362991333, "std": 11.914495468139648, "min": -18.289093017578125, "p10": -3.2223129272460938, "median": 7.8013458251953125, "p90": 22.360574340820314, "max": 46.14177703857422, "pos_frac": 0.828125, "sample": [-3.1121368408203125, 9.073326110839844, 2.721485137939453, 4.899101257324219, -6.092308044433594, 5.197296142578125, 32.76441192626953, 7.829948425292969, 9.407638549804688, -8.619346618652344, 46.14177703857422, 10.530311584472656, 16.989639282226562, 26.0877685546875, 7.332977294921875, 21.685653686523438, 15.08737564086914, 4.586067199707031, 3.9779739379882812, 34.66780471801758, 5.015625, 12.386127471923828, -2.261272430419922, -3.84716796875, 12.615737915039062, 3.8867645263671875, 4.95770263671875, 10.340171813964844, 7.772743225097656, 0.0702362060546875, 13.679500579833984, 11.046211242675781, 16.130279541015625, 8.59454345703125, -18.289093017578125, 9.757888793945312, 0.7783927917480469, 3.7616958618164062, -3.26953125, 8.977958679199219, 6.724700927734375, 7.533302307128906, 34.555511474609375, 18.606178283691406, 7.161888122558594, 14.434555053710938, -2.57940673828125, -12.102859497070312, 22.649826049804688, 13.486129760742188, 4.2586517333984375, 17.13672637939453, 7.3050384521484375, 9.491790771484375, -12.510330200195312, 3.82122802734375, 42.834686279296875, 9.492366790771484, 4.266845703125, 12.977447509765625, 18.0289306640625, -2.2263336181640625, 10.579978942871094, 1.9406051635742188], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000144.npy"}
{"epoch": 0.21145374449339208, "step": 145, "batch_size": 64, "mean": 9.981130599975586, "std": 10.526957511901855, "min": -15.710494995117188, "p10": -1.9122611999511718, "median": 9.59291934967041, "p90": 23.520112609863283, "max": 44.739906311035156, "pos_frac": 0.84375, "sample": [8.493392944335938, 12.959098815917969, 23.3179931640625, 36.243408203125, 12.300605773925781, 20.554290771484375, 14.246505737304688, -0.97210693359375, 11.745094299316406, 18.962600708007812, -3.7948455810546875, 18.727310180664062, 10.640052795410156, 4.3327178955078125, 6.284601211547852, 4.68218994140625, 24.930946350097656, 9.512142181396484, -8.286659240722656, 4.296693801879883, 1.461334228515625, -4.810546875, 1.1921920776367188, 3.8194808959960938, 12.710853576660156, 5.272275924682617, 13.434326171875, -6.113677978515625, 1.5204315185546875, 23.606735229492188, 4.71934700012207, 14.54266357421875, 16.870880126953125, -2.7209701538085938, 15.558578491210938, 19.700210571289062, 10.994804382324219, 4.09694766998291, 0.485076904296875, 6.2371368408203125, -15.710494995117188, 4.2050018310546875, 28.098857879638672, 3.3964309692382812, 10.86685562133789, 44.739906311035156, 24.94580841064453, -1.8399658203125, 9.527700424194336, 28.039352416992188, 13.991416931152344, 15.563911437988281, -1.9432449340820312, 11.573501586914062, 9.658138275146484, 4.561862945556641, 5.00341796875, 11.323646545410156, 10.012802124023438, 9.110538482666016, 13.22760009765625, 20.848541259765625, -0.68280029296875, 8.519439697265625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000145.npy"}
{"epoch": 0.21292217327459617, "step": 146, "batch_size": 64, "mean": 11.222307205200195, "std": 10.310596466064453, "min": -11.531341552734375, "p10": 1.2772165298461917, "median": 9.79473876953125, "p90": 21.95705795288086, "max": 39.21051025390625, "pos_frac": 0.90625, "sample": [22.015525817871094, 18.94890594482422, 21.184669494628906, 1.1087646484375, 3.499757766723633, -3.8591766357421875, 20.381744384765625, 12.059677124023438, 3.6564788818359375, 14.48629379272461, 17.025360107421875, 20.88886260986328, 1.6702709197998047, 3.9205093383789062, 3.3097991943359375, 7.242759704589844, 6.561180114746094, 17.2835693359375, 38.67036437988281, 6.755870819091797, 2.600627899169922, 2.193359375, 9.005828857421875, 17.017257690429688, -11.531341552734375, 19.95421600341797, -2.99090576171875, 13.619758605957031, 17.804058074951172, 24.66064453125, 1.8900260925292969, 3.3248119354248047, 8.235076904296875, 20.849700927734375, 33.92485046386719, -6.130470275878906, 2.4255523681640625, 4.554948806762695, 10.715202331542969, 7.5947418212890625, 23.22186279296875, 12.88330078125, 17.950950622558594, 8.458480834960938, 3.0594406127929688, 16.85064697265625, 19.244953155517578, -8.939361572265625, 13.6376953125, 13.912979125976562, 31.537384033203125, 39.21051025390625, 12.291755676269531, 7.7727508544921875, 5.1486053466796875, 21.820632934570312, 5.552806854248047, 9.482566833496094, 13.6888427734375, 6.9564361572265625, 9.678131103515625, 10.459598541259766, -2.093780517578125, 9.911346435546875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000146.npy"}
{"epoch": 0.2143906020558003, "step": 147, "batch_size": 64, "mean": 11.62929916381836, "std": 10.40943717956543, "min": -7.554283142089844, "p10": -1.8922897338867186, "median": 11.0128173828125, "p90": 23.780369567871094, "max": 43.14593505859375, "pos_frac": 0.796875, "sample": [29.395042419433594, 6.024288177490234, 19.897499084472656, 19.133865356445312, 8.783378601074219, 2.50494384765625, 7.8234710693359375, 17.492046356201172, -3.48486328125, 5.581512451171875, 10.184701919555664, 21.977455139160156, 24.821969985961914, 10.120948791503906, 19.39944076538086, 31.87169647216797, 17.210365295410156, 15.140838623046875, -2.6974945068359375, -0.00656890869140625, 19.633926391601562, -7.554283142089844, -7.372295379638672, 10.911483764648438, 33.22705841064453, -0.8260116577148438, 17.363380432128906, 19.634815216064453, 14.276174545288086, -4.651222229003906, 6.172393798828125, 11.151287078857422, 5.732160568237305, 2.323200225830078, 9.914276123046875, 19.8355712890625, 4.941549301147461, 43.14593505859375, 7.289911270141602, 23.402587890625, 12.550949096679688, 9.057212829589844, -1.3780288696289062, -0.2080535888671875, -2.4525985717773438, 16.575756072998047, 8.521743774414062, 20.520803451538086, 23.942276000976562, -0.17690277099609375, -1.8251800537109375, 17.60512924194336, 18.436565399169922, 18.204269409179688, 17.153247833251953, -1.921051025390625, 25.878448486328125, 11.114151000976562, 9.554435729980469, 13.879608154296875, 14.133636474609375, 4.8036956787109375, 8.304256439208984, 12.274322509765625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000147.npy"}
{"epoch": 0.21585903083700442, "step": 148, "batch_size": 64, "mean": 9.56178092956543, "std": 11.316573143005371, "min": -10.497512817382812, "p10": -1.5233608245849608, "median": 7.4624786376953125, "p90": 24.672293090820318, "max": 43.21495056152344, "pos_frac": 0.8125, "sample": [-3.4764556884765625, 4.526832580566406, 4.68182373046875, 25.358444213867188, 27.719863891601562, -4.079925537109375, 2.7853927612304688, 6.034400939941406, 3.7069549560546875, 43.21495056152344, 11.847122192382812, 23.071273803710938, 4.645298004150391, 0.1011962890625, 18.890159606933594, 27.543121337890625, 4.960014343261719, -0.1352996826171875, 11.60123062133789, 8.008926391601562, 6.815818786621094, 14.357017517089844, 5.250160217285156, 8.654815673828125, 41.443946838378906, 4.324562072753906, 7.085700988769531, -10.497512817382812, 19.919021606445312, -2.2136192321777344, 34.325050354003906, 10.744415283203125, 10.261650085449219, -0.6362686157226562, -1.4412841796875, -1.1411933898925781, 7.867523193359375, 0.4515533447265625, 16.050270080566406, -10.354679107666016, 9.745880126953125, 14.750396728515625, -1.8166770935058594, 9.570716857910156, 6.612022399902344, 10.59619140625, 16.33111572265625, 0.7843437194824219, 8.517326354980469, 16.51776123046875, 12.539962768554688, 12.294815063476562, 16.264820098876953, 5.898059844970703, -1.5585365295410156, 1.9598379135131836, 20.447463989257812, 19.196624755859375, 38.417999267578125, 1.7662334442138672, -0.23750686645507812, 1.0129013061523438, 7.839256286621094, 2.2306575775146484], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000148.npy"}
{"epoch": 0.2173274596182085, "step": 149, "batch_size": 64, "mean": 9.372669219970703, "std": 11.790626525878906, "min": -8.289886474609375, "p10": -2.2389331817626945, "median": 7.205255508422852, "p90": 25.895137023925788, "max": 47.8741455078125, "pos_frac": 0.8125, "sample": [35.890281677246094, -1.4306182861328125, 41.00848388671875, 6.3574066162109375, -1.3078384399414062, 8.511199951171875, 0.030658721923828125, 8.783095359802246, 2.8448638916015625, 3.515533447265625, 3.7150306701660156, 15.274826049804688, 10.25252914428711, -7.2459869384765625, 15.726593017578125, 6.662017822265625, 15.792999267578125, 7.46588134765625, 9.042816162109375, 2.3222274780273438, 10.815284729003906, -2.5853538513183594, 0.2643318176269531, 0.43729305267333984, 7.84773063659668, 7.435863494873047, 5.543754577636719, 6.974647521972656, 11.25143051147461, 26.442306518554688, -7.6713104248046875, 0.4800148010253906, 23.881011962890625, -2.8577194213867188, 0.23114013671875, 15.731369018554688, 47.8741455078125, 8.611026763916016, 2.3949737548828125, 24.618408203125, 6.8899688720703125, 23.089866638183594, 8.743797302246094, 8.719539642333984, -0.2010345458984375, 18.685447692871094, 9.167259216308594, 26.987518310546875, -3.445465087890625, 6.7784423828125, 26.66961669921875, 22.402671813964844, 2.7096405029296875, 6.0879974365234375, 0.5046463012695312, -0.2169952392578125, -0.6214456558227539, 14.941062927246094, 0.52001953125, 37.19691467285156, 15.173515319824219, -8.289886474609375, 10.293220520019531, -3.867828369140625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000149.npy"}
{"epoch": 0.21879588839941264, "step": 150, "batch_size": 64, "mean": 10.34152889251709, "std": 11.742480278015137, "min": -9.577583312988281, "p10": -0.8582168579101558, "median": 8.064875602722168, "p90": 22.113199234008793, "max": 55.89002990722656, "pos_frac": 0.875, "sample": [14.170501708984375, 42.44468688964844, 10.724769592285156, 8.366031646728516, 19.832740783691406, -9.577583312988281, 5.448036193847656, 18.096263885498047, 10.921890258789062, -1.068939208984375, 55.89002990722656, -1.6493644714355469, 20.792869567871094, 3.1302871704101562, 6.7810821533203125, 33.771514892578125, 7.249229431152344, 4.7685699462890625, 12.401611328125, 16.012771606445312, 7.76371955871582, 29.923355102539062, 9.481521606445312, 11.811264038085938, 20.218250274658203, 8.900375366210938, 14.251510620117188, 21.243179321289062, 2.0440292358398438, 20.06517791748047, 2.66046142578125, 6.578102111816406, 0.27719879150390625, 29.092819213867188, 2.147430419921875, 1.277313232421875, 6.798900604248047, 13.244670867919922, 27.251876831054688, -8.543601989746094, 19.825401306152344, 19.814727783203125, 11.561723709106445, -0.3665313720703125, -5.4198455810546875, 1.5392284393310547, -8.955429077148438, 8.511978149414062, 10.523666381835938, 5.27490234375, -3.8501739501953125, 16.847179412841797, 6.0401611328125, 2.861248016357422, 22.486064910888672, 12.992225646972656, 5.228118896484375, 0.47864532470703125, 2.5949859619140625, 3.4216156005859375, 4.232429504394531, 3.355083465576172, 1.544891357421875, 16.32099151611328], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000150.npy"}
{"epoch": 0.22026431718061673, "step": 151, "batch_size": 64, "mean": 9.63852310180664, "std": 11.66657543182373, "min": -12.54541015625, "p10": -2.841400909423828, "median": 7.329460144042969, "p90": 24.980418395996097, "max": 52.646636962890625, "pos_frac": 0.796875, "sample": [-4.64544677734375, 8.436931610107422, 17.284683227539062, 6.1693115234375, -2.976032257080078, 23.72858428955078, -2.3668365478515625, 3.1779708862304688, -9.475875854492188, 2.907867431640625, 13.160125732421875, -3.9751052856445312, -2.6676559448242188, 2.5386390686035156, 8.380722045898438, -1.739715576171875, 1.985809326171875, 11.53680419921875, 12.380447387695312, 25.434768676757812, 52.646636962890625, 7.0023040771484375, 6.388885498046875, 2.12921142578125, -2.915863037109375, 21.8604736328125, 8.277542114257812, 3.86083984375, -0.01854705810546875, 29.426475524902344, 26.17896270751953, 16.726150512695312, 7.071220397949219, 19.705156326293945, -0.6807098388671875, 9.047775268554688, 7.0973052978515625, 1.503021240234375, 7.561614990234375, 25.37835693359375, 10.169822692871094, 18.221881866455078, 28.535842895507812, -2.373626708984375, 6.4373016357421875, 21.557662963867188, 28.454238891601562, -8.940471649169922, 20.791515350341797, -12.54541015625, 2.2462310791015625, 2.8935508728027344, 11.299131393432617, 8.112693786621094, 24.027374267578125, 0.2654685974121094, 7.0073699951171875, 11.585693359375, 21.1832275390625, 4.876678466796875, 10.803215026855469, 24.051895141601562, 0.4215888977050781, 20.259803771972656], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000151.npy"}
{"epoch": 0.22173274596182085, "step": 152, "batch_size": 64, "mean": 11.462337493896484, "std": 11.114155769348145, "min": -15.386260986328125, "p10": -0.6729705810546873, "median": 10.53547477722168, "p90": 23.87456207275391, "max": 46.207305908203125, "pos_frac": 0.875, "sample": [3.6293182373046875, 9.160255432128906, 43.12757873535156, 22.020172119140625, 4.858833312988281, 10.6798095703125, 3.4509506225585938, 12.375579833984375, 46.207305908203125, -5.400848388671875, 12.544570922851562, 8.067138671875, 10.39113998413086, 11.5123291015625, 11.925140380859375, 8.650291442871094, 5.893486022949219, 20.95706558227539, 12.709480285644531, -2.1468124389648438, 17.18578338623047, 5.222358703613281, 4.138830184936523, 11.591552734375, 5.5033416748046875, 19.405845642089844, 11.634841918945312, 25.601943969726562, 8.22186279296875, 6.353057861328125, 0.1743927001953125, 13.781585693359375, 24.18487548828125, -4.504402160644531, 9.011289596557617, 10.895736694335938, 8.305526733398438, 17.400970458984375, 2.8448143005371094, -3.3081512451171875, 25.372833251953125, 13.10809326171875, 6.360937118530273, 23.150497436523438, -0.5501747131347656, -15.386260986328125, 14.024124145507812, 3.3206710815429688, 15.419414520263672, 44.94450378417969, 9.884025573730469, 5.200050354003906, 17.35833740234375, 5.036346435546875, 29.493247985839844, -0.7255973815917969, 10.138931274414062, 1.939971923828125, 14.496353149414062, 21.920852661132812, 16.443355560302734, -1.05841064453125, 13.961807250976562, 15.476837158203125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000152.npy"}
{"epoch": 0.22320117474302498, "step": 153, "batch_size": 64, "mean": 8.251411437988281, "std": 10.639138221740723, "min": -13.150157928466797, "p10": -3.375931167602539, "median": 6.22723388671875, "p90": 22.37026481628418, "max": 39.39759826660156, "pos_frac": 0.765625, "sample": [6.49969482421875, -2.1880645751953125, 8.853515625, 5.392845153808594, -9.767410278320312, 22.27044677734375, -0.5558757781982422, 28.734527587890625, 11.235937118530273, 12.350967407226562, 5.0075531005859375, -3.4501953125, 9.986000061035156, -4.719020843505859, 9.311477661132812, 2.574878692626953, 1.8412017822265625, 5.207698822021484, 2.781282424926758, 1.9540328979492188, 8.7164306640625, 25.67144775390625, 20.142303466796875, 12.715871810913086, 9.912574768066406, 6.2399749755859375, 5.0618743896484375, 5.741523742675781, 39.39759826660156, 21.73138427734375, 15.723453521728516, 2.0118026733398438, 26.653564453125, -2.0068283081054688, -0.05142402648925781, -13.150157928466797, 19.661788940429688, 21.115921020507812, 12.542884826660156, -3.202648162841797, 16.98560333251953, 23.308883666992188, 6.2144927978515625, 13.237762451171875, -4.423561096191406, 2.1773681640625, -4.747169494628906, 5.678474426269531, 31.05126953125, -11.39569091796875, 6.7918701171875, 16.195709228515625, 18.827877044677734, 3.5619964599609375, 6.298482894897461, 3.6814403533935547, 22.413043975830078, 5.1242828369140625, -3.067291259765625, 1.376190185546875, -2.5738258361816406, -0.846527099609375, 10.396926879882812, 13.871841430664062], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000153.npy"}
{"epoch": 0.22466960352422907, "step": 154, "batch_size": 64, "mean": 13.361062049865723, "std": 11.287209510803223, "min": -5.987880706787109, "p10": 0.8491531372070316, "median": 11.969873428344727, "p90": 29.702498245239266, "max": 46.12995910644531, "pos_frac": 0.921875, "sample": [1.5792732238769531, 4.654424667358398, 4.196685791015625, 13.742151260375977, 27.61376953125, 18.163516998291016, 0.1836395263671875, 22.8575439453125, 34.734039306640625, 7.902294158935547, 23.865196228027344, 32.33652114868164, 16.35284423828125, 13.253684997558594, 11.56906509399414, 11.134674072265625, -0.6579265594482422, 10.7181396484375, 3.249177932739258, -5.758819580078125, 24.46977996826172, 26.048892974853516, -0.2033233642578125, 3.0313186645507812, 3.0740890502929688, 7.9111328125, 25.601634979248047, 36.048919677734375, 0.7258453369140625, 13.3880615234375, 9.131080627441406, 21.254058837890625, -2.2160491943359375, 15.205089569091797, 3.4644222259521484, 10.881332397460938, 33.06504821777344, 10.455154418945312, 6.8053741455078125, 26.71680450439453, 17.30492401123047, 10.59759521484375, 12.980903625488281, 18.7198486328125, 46.12995910644531, 14.55450439453125, 12.908271789550781, 20.655773162841797, 8.557003021240234, 13.059154510498047, 6.015106201171875, 19.92487335205078, 3.134632110595703, 1.9368896484375, 7.510856628417969, -5.987880706787109, 30.597667694091797, 37.12788391113281, 16.685745239257812, 3.0127182006835938, 12.370681762695312, 14.050071716308594, 1.136871337890625, 5.575347900390625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000154.npy"}
{"epoch": 0.2261380323054332, "step": 155, "batch_size": 64, "mean": 10.932403564453125, "std": 13.444591522216797, "min": -35.6314697265625, "p10": -3.5443049430847164, "median": 8.921531677246094, "p90": 29.694139480590838, "max": 43.000267028808594, "pos_frac": 0.8125, "sample": [3.0091171264648438, -10.3907470703125, -8.032173156738281, 20.847457885742188, 8.259750366210938, 0.03083038330078125, -1.3578605651855469, -5.446807861328125, 10.820117950439453, 4.098581314086914, 6.013916015625, 4.962532043457031, 4.118438720703125, -35.6314697265625, 13.142196655273438, 8.438705444335938, 31.656932830810547, 19.572017669677734, -4.23699951171875, 18.122222900390625, 34.90106201171875, 12.322540283203125, 4.173957824707031, 24.129730224609375, 42.695770263671875, 5.241815567016602, 8.44085693359375, 20.60004425048828, 9.939178466796875, 16.624282836914062, 5.044748306274414, 18.300071716308594, 14.499313354492188, 33.07789611816406, 25.114288330078125, -3.090972900390625, 43.000267028808594, 32.41767883300781, -11.88311767578125, 21.55087661743164, -2.8018150329589844, 6.9650726318359375, 12.247688293457031, 7.9460906982421875, 20.87554931640625, 11.974258422851562, 24.306617736816406, 6.614814758300781, 15.888702392578125, 7.06658935546875, 18.266098022460938, 11.401741027832031, 32.124351501464844, 8.644184112548828, 18.646852493286133, -0.3143463134765625, 21.70614242553711, 4.321968078613281, 9.19887924194336, 6.864070892333984, 8.387725830078125, 11.135929107666016, -3.6929445266723633, -3.197479248046875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000155.npy"}
{"epoch": 0.2276064610866373, "step": 156, "batch_size": 64, "mean": 9.428732872009277, "std": 11.33333969116211, "min": -12.370849609375, "p10": -4.188565826416015, "median": 8.705368041992188, "p90": 22.93018493652344, "max": 43.6942138671875, "pos_frac": 0.796875, "sample": [-3.671966552734375, 27.81511688232422, 43.6942138671875, 19.168846130371094, -2.0205154418945312, 2.6643524169921875, 14.114334106445312, 0.3876495361328125, -8.062652587890625, 15.518692016601562, 6.7679290771484375, 13.80881118774414, -6.414520263671875, 7.772705078125, 2.5372695922851562, 1.7779922485351562, 13.690681457519531, 17.292156219482422, 31.609115600585938, 9.611251831054688, -4.8699493408203125, 29.203659057617188, 22.605712890625, 12.006980895996094, 6.151472091674805, 0.3335723876953125, 5.733922958374023, 16.58789825439453, 36.259124755859375, 3.0934295654296875, 16.500762939453125, 20.29247283935547, 17.98968505859375, 15.226200103759766, 12.513185501098633, 1.5123615264892578, 3.5590667724609375, -12.370849609375, 12.71893310546875, -4.614677429199219, 15.194438934326172, -1.0015144348144531, 16.004409790039062, 21.737213134765625, 2.5035400390625, 23.069244384765625, 24.151718139648438, 10.700952529907227, 9.105331420898438, 8.898780822753906, 0.7863731384277344, -1.82415771484375, -2.5373077392578125, 19.8066463470459, -4.409965515136719, 16.950942993164062, 4.43994140625, 1.6472854614257812, 6.524375915527344, 5.4108123779296875, 8.511955261230469, -9.939437866210938, -1.670989990234375, 10.883907318115234], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000156.npy"}
{"epoch": 0.2290748898678414, "step": 157, "batch_size": 64, "mean": 11.95935344696045, "std": 10.994964599609375, "min": -10.341720581054688, "p10": -0.9476982116699216, "median": 9.911191940307617, "p90": 26.44399566650391, "max": 41.048492431640625, "pos_frac": 0.875, "sample": [12.441322326660156, 19.970794677734375, -9.460609436035156, 15.2281494140625, 10.497474670410156, 5.1449737548828125, 9.956344604492188, 2.4070892333984375, 6.951103210449219, 41.048492431640625, 22.616111755371094, 17.682785034179688, 9.101987838745117, 25.884078979492188, 27.510345458984375, 10.302696228027344, 20.182716369628906, 4.977008819580078, 16.695587158203125, 24.68401336669922, 9.634170532226562, -2.9987869262695312, 27.822715759277344, 3.5223236083984375, 14.87982177734375, 3.3419189453125, 5.820777893066406, 20.37555694580078, 7.041534423828125, 26.70954132080078, 1.9224853515625, 19.00952911376953, -0.7017059326171875, -1.3777942657470703, 7.746246337890625, 25.783832550048828, 17.432952880859375, -1.0531234741210938, 9.866039276123047, 12.814804077148438, 6.042644500732422, 22.251235961914062, 5.612583160400391, 3.7924652099609375, 3.0465049743652344, 12.56253433227539, 11.986740112304688, -3.7138633728027344, 8.037101745605469, 13.725845336914062, 40.06092834472656, 8.293968200683594, 2.640392303466797, 5.741447448730469, 19.618019104003906, 18.592498779296875, 38.87623596191406, 0.9513626098632812, 9.08761978149414, 8.676918029785156, 26.6839599609375, 13.981887817382812, -2.2239904403686523, -10.341720581054688], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000157.npy"}
{"epoch": 0.2305433186490455, "step": 158, "batch_size": 64, "mean": 12.468442916870117, "std": 10.90707778930664, "min": -11.826744079589844, "p10": 0.7630420684814466, "median": 10.806013107299805, "p90": 28.954676437377938, "max": 38.202911376953125, "pos_frac": 0.90625, "sample": [5.2028961181640625, 2.3190383911132812, 10.874683380126953, 10.737342834472656, 7.287487030029297, 4.4645843505859375, 5.653329849243164, 11.230697631835938, 11.259719848632812, 26.652912139892578, 4.15374755859375, 36.40769958496094, 33.75978088378906, 5.411670684814453, 10.318244934082031, -11.826744079589844, -6.10316276550293, 17.371124267578125, -4.840858459472656, 14.027488708496094, 20.38763427734375, -0.1667327880859375, 5.497222900390625, 17.77887725830078, 20.058631896972656, 12.435379028320312, 18.2738037109375, 10.385211944580078, 9.256050109863281, 2.086711883544922, 18.788394927978516, 7.894615173339844, 12.810527801513672, 4.5929412841796875, 11.05936050415039, 20.678855895996094, 22.37169647216797, 10.045978546142578, 18.015625, 38.202911376953125, 16.0169677734375, 4.69207763671875, 4.23736572265625, 34.07806396484375, 11.762763977050781, 31.8228759765625, 6.753782272338867, -0.916168212890625, 17.137969970703125, 7.88031005859375, 16.19366455078125, 25.167953491210938, 0.1957550048828125, 29.941146850585938, 4.49591064453125, 2.479248046875, 23.9403076171875, -2.3959522247314453, 18.100982666015625, 6.5932159423828125, 19.086029052734375, 2.604705810546875, 36.42938232421875, 6.866619110107422], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000158.npy"}
{"epoch": 0.23201174743024963, "step": 159, "batch_size": 64, "mean": 12.106963157653809, "std": 12.955107688903809, "min": -20.734703063964844, "p10": -2.193710327148437, "median": 10.730186462402344, "p90": 28.627163696289074, "max": 52.02299499511719, "pos_frac": 0.859375, "sample": [22.64250946044922, 36.864410400390625, 11.10723876953125, 0.047847747802734375, 0.298583984375, 21.8433837890625, -4.866584777832031, 8.001514434814453, 21.99737548828125, 14.1341552734375, 10.751235961914062, -2.4841232299804688, 0.3491096496582031, -2.3432159423828125, 5.424335479736328, 2.1603927612304688, 37.376434326171875, 18.45526885986328, 10.240955352783203, 23.758609771728516, 52.02299499511719, 13.62100601196289, -6.97076416015625, 5.4813385009765625, 2.339845657348633, 20.034225463867188, 7.427459716796875, -20.734703063964844, 6.674415588378906, 11.3504638671875, 10.709136962890625, 12.630073547363281, -7.3583984375, 13.570266723632812, 17.517799377441406, 4.778663635253906, 17.091949462890625, 10.610061645507812, -0.1823253631591797, 18.042190551757812, 24.482208251953125, 36.31312561035156, 8.96780776977539, 12.915206909179688, 3.9868392944335938, 29.938766479492188, 17.535694122314453, 0.3189239501953125, -5.747425079345703, 25.110809326171875, 17.262571334838867, 0.44944000244140625, 9.296348571777344, -1.8448638916015625, 8.141677856445312, 37.3878059387207, 5.858348846435547, 25.566757202148438, 6.641384124755859, 37.12181091308594, 7.741432189941406, 16.386028289794922, 11.187606811523438, 15.412174224853516], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000159.npy"}
{"epoch": 0.23348017621145375, "step": 160, "batch_size": 64, "mean": 11.26180648803711, "std": 11.120935440063477, "min": -15.762149810791016, "p10": -2.3277782440185546, "median": 11.459487915039062, "p90": 24.594832611083987, "max": 45.5609130859375, "pos_frac": 0.84375, "sample": [16.92711639404297, 18.8538818359375, 13.433143615722656, 14.17572021484375, -2.4112205505371094, 8.314620971679688, 11.814674377441406, 16.06170654296875, 4.2471923828125, -1.0841588973999023, -5.433156967163086, 12.49444580078125, 14.109615325927734, 13.9796142578125, 16.591964721679688, 17.087326049804688, -0.4547691345214844, 23.291259765625, 5.0012359619140625, 2.878082275390625, 30.164276123046875, 11.104301452636719, 16.160110473632812, 15.550483703613281, 10.446956634521484, -7.637763977050781, 5.47576904296875, 19.62720489501953, 45.5609130859375, 23.866661071777344, 2.3498058319091797, 6.932193756103516, 14.195709228515625, 19.213104248046875, -15.762149810791016, 8.767417907714844, -9.291984558105469, 8.583106994628906, 27.849639892578125, 9.23382568359375, 7.542274475097656, 17.012184143066406, 14.705039978027344, 16.575790405273438, 10.237922668457031, 11.023956298828125, 4.233612060546875, 2.0219173431396484, -3.6581573486328125, 24.906906127929688, 23.12723159790039, 31.8115234375, 17.265182495117188, 2.239837646484375, 12.10662841796875, 5.070304870605469, 2.6818695068359375, 0.08112716674804688, -7.34857177734375, 30.66851806640625, -2.1330795288085938, 29.543319702148438, 17.87689208984375, 10.895553588867188], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000160.npy"}
{"epoch": 0.23494860499265785, "step": 161, "batch_size": 64, "mean": 10.827180862426758, "std": 12.127645492553711, "min": -12.979217529296875, "p10": -3.6815876007080073, "median": 11.04346752166748, "p90": 27.175789642333985, "max": 42.521759033203125, "pos_frac": 0.765625, "sample": [-1.9437828063964844, 14.335769653320312, 10.998220443725586, 36.70050811767578, 28.635284423828125, 11.231155395507812, 8.041061401367188, -1.0501708984375, 28.966644287109375, -6.343357086181641, 13.911712646484375, 28.372779846191406, 2.202972412109375, 11.514518737792969, 16.880409240722656, 1.476318359375, 25.000106811523438, 20.65265655517578, 12.618888854980469, 42.521759033203125, -7.582874298095703, -2.69268798828125, 22.14679718017578, -0.5588340759277344, 5.770439147949219, -1.3557262420654297, 15.604156494140625, 13.359230041503906, 26.455596923828125, 24.584266662597656, 15.727157592773438, -10.567146301269531, 5.479936599731445, -3.9864730834960938, 18.189056396484375, 25.849472045898438, 6.912239074707031, -4.5496063232421875, 23.692039489746094, 8.227588653564453, -9.215805053710938, 4.15765380859375, -12.979217529296875, 13.83493423461914, 5.993743896484375, 2.1723403930664062, 6.88922119140625, 18.49945068359375, -1.62469482421875, 11.088714599609375, -0.46795654296875, 11.862953186035156, 9.001235961914062, 27.43482208251953, 15.19137191772461, 26.571380615234375, 4.447807312011719, 32.915122985839844, 11.251150131225586, -2.9701881408691406, 7.923259735107422, 12.178810119628906, 9.727684020996094, 3.6277694702148438], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000161.npy"}
{"epoch": 0.23641703377386197, "step": 162, "batch_size": 64, "mean": 11.17841911315918, "std": 11.531044960021973, "min": -27.27313232421875, "p10": -1.3279754638671872, "median": 10.513011932373047, "p90": 26.336342620849614, "max": 38.58888244628906, "pos_frac": 0.859375, "sample": [-1.1545028686523438, 6.838191986083984, 19.8526611328125, 13.28021240234375, 0.810699462890625, 21.595901489257812, 13.62321662902832, 18.531944274902344, 14.077213287353516, 4.682552337646484, -8.04245376586914, 3.744384765625, 14.881603240966797, 6.293861389160156, 27.104736328125, 31.68047523498535, 12.042037963867188, -0.80029296875, 1.1376800537109375, 21.22576141357422, 21.969818115234375, 3.6445960998535156, 20.787521362304688, 2.5997085571289062, 7.932559967041016, 12.872032165527344, 19.2978515625, 1.915802001953125, -27.27313232421875, 30.06891632080078, 8.410499572753906, 18.76355743408203, 4.248958587646484, 0.260040283203125, 30.816242218017578, 9.550338745117188, 23.232177734375, 11.7274169921875, 12.792205810546875, -2.9720611572265625, 16.529037475585938, 17.714157104492188, 26.82750701904297, 10.147315979003906, 5.639030456542969, 9.017486572265625, 15.119205474853516, 10.732337951660156, -1.4023208618164062, 7.3703155517578125, 10.02337646484375, 25.190292358398438, 8.615760803222656, -2.4209060668945312, 38.58888244628906, 23.28537368774414, 13.153732299804688, -5.024200439453125, 3.5150375366210938, 10.293685913085938, -8.93398666381836, 0.5944328308105469, 16.608051300048828, 32.184226989746094], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000162.npy"}
{"epoch": 0.23788546255506607, "step": 163, "batch_size": 64, "mean": 14.11191463470459, "std": 14.131208419799805, "min": -10.66790771484375, "p10": -1.27852611541748, "median": 12.278617858886719, "p90": 32.77423858642579, "max": 66.58677673339844, "pos_frac": 0.875, "sample": [5.449653625488281, 8.72006607055664, 25.04228973388672, 31.71331024169922, 3.1387672424316406, 4.381443023681641, -3.4120330810546875, 18.96947479248047, 5.171043395996094, 37.90486145019531, 13.182449340820312, 11.994903564453125, 12.562332153320312, 33.18748474121094, 24.910602569580078, -1.481353759765625, 17.797632217407227, 38.25677490234375, 4.291130065917969, 19.22393035888672, 2.157085418701172, 13.817146301269531, 7.3434906005859375, 21.70024871826172, -10.66790771484375, 1.8524227142333984, 13.154884338378906, 4.356266021728516, 5.2698516845703125, 9.088340759277344, 44.180999755859375, -3.9547576904296875, 16.464542388916016, 5.562835693359375, -6.889617919921875, -3.429096221923828, 1.5199203491210938, 5.807670593261719, 15.08782958984375, 14.874664306640625, 25.09349822998047, 3.1503067016601562, 17.98736572265625, 13.072723388671875, 31.80999755859375, 13.02630615234375, 3.740325927734375, 41.39246368408203, 66.58677673339844, -0.8052616119384766, 10.394767761230469, 5.163612365722656, 27.21966552734375, 37.96947479248047, 26.35938262939453, 17.37420654296875, 7.8272857666015625, 12.79071044921875, -3.3841323852539062, 11.644294738769531, 24.173202514648438, 9.822586059570312, 26.962379455566406, 5.489036560058594], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000163.npy"}
{"epoch": 0.2393538913362702, "step": 164, "batch_size": 64, "mean": 9.396201133728027, "std": 10.930228233337402, "min": -21.780136108398438, "p10": -3.2133884429931636, "median": 9.887200355529785, "p90": 23.85827713012696, "max": 36.7999267578125, "pos_frac": 0.78125, "sample": [3.8936080932617188, 13.052276611328125, 17.62860107421875, 10.497734069824219, 1.1852188110351562, -2.87091064453125, 19.080707550048828, 26.347549438476562, -7.089336395263672, 10.361053466796875, 1.213226318359375, -7.943489074707031, 3.0265426635742188, 10.141435623168945, 18.409393310546875, 5.820960998535156, 13.609458923339844, 21.646865844726562, 7.826526641845703, 7.164422988891602, -3.3601646423339844, 3.96075439453125, 10.279045104980469, 19.063156127929688, 5.560405731201172, -8.178115844726562, 4.568019866943359, 21.581375122070312, -1.63507080078125, -0.34922027587890625, 3.9882659912109375, -3.8527374267578125, 10.811790466308594, -0.794189453125, 4.659748077392578, 15.468093872070312, 12.520294189453125, 29.463333129882812, -0.5629196166992188, 36.7999267578125, -1.6686630249023438, -21.780136108398438, -7.395500183105469, 14.014968872070312, 1.5248966217041016, 10.324722290039062, 24.75769805908203, 9.343299865722656, 9.632965087890625, 9.205829620361328, 15.97686767578125, 22.112159729003906, 7.65478515625, 15.609954833984375, 35.69563293457031, 13.272506713867188, 12.065475463867188, 14.512130737304688, 25.1376953125, 5.841835021972656, -1.54730224609375, 24.606613159179688, 14.062301635742188, 15.402503967285156], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000164.npy"}
{"epoch": 0.24082232011747431, "step": 165, "batch_size": 64, "mean": 13.537657737731934, "std": 15.075767517089844, "min": -11.80569839477539, "p10": -0.44575309753417935, "median": 11.646995544433594, "p90": 24.046022033691408, "max": 84.07485961914062, "pos_frac": 0.875, "sample": [7.364192962646484, 6.5538330078125, 1.9937973022460938, 9.452747344970703, 16.256378173828125, 3.352963447570801, 19.086029052734375, -4.2184600830078125, 17.488616943359375, 7.459163665771484, 14.251152038574219, 3.2215652465820312, 31.08563232421875, 21.347381591796875, 15.73904800415039, 15.919479370117188, 1.3910903930664062, 24.17755126953125, 4.526702880859375, 46.3155403137207, 13.130725860595703, 1.717498779296875, 4.182743072509766, 84.07485961914062, 18.27393341064453, 42.526519775390625, -0.14800262451171875, -11.80569839477539, 18.10250473022461, 23.37469482421875, 38.35011291503906, 5.485515594482422, 23.739120483398438, 20.708961486816406, -3.8839111328125, 4.2989349365234375, 11.597789764404297, -0.793060302734375, 2.5957489013671875, -0.5733604431152344, 11.69620132446289, 2.3544158935546875, 7.431297302246094, -7.606208801269531, 21.932144165039062, 16.600059509277344, 21.24590301513672, 8.716255187988281, 12.267257690429688, 4.876026153564453, 15.182937622070312, 17.307052612304688, 9.049285888671875, -5.030815124511719, 13.73067855834961, 10.726158142089844, 23.18482208251953, 7.1313629150390625, 49.01026153564453, 22.980632781982422, 1.9699211120605469, 22.049896240234375, 1.2523956298828125, 20.632095336914062], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000165.npy"}
{"epoch": 0.2422907488986784, "step": 166, "batch_size": 64, "mean": 12.7144775390625, "std": 14.705838203430176, "min": -8.949256896972656, "p10": -2.854871368408202, "median": 11.347644805908203, "p90": 29.502590942382813, "max": 60.242645263671875, "pos_frac": 0.828125, "sample": [29.759140014648438, 34.3011474609375, -7.366779327392578, 57.32745361328125, 10.205726623535156, 34.22787094116211, 3.8176307678222656, 12.48956298828125, 0.9887619018554688, 7.6087646484375, 20.63207244873047, 9.36111831665039, 4.437599182128906, 18.506362915039062, 12.793701171875, 4.980165481567383, 2.9285888671875, 14.294479370117188, 13.857307434082031, 13.4674072265625, 13.791400909423828, 45.1395263671875, 2.1390762329101562, 3.4327430725097656, -3.8180294036865234, -4.759115219116211, 16.806655883789062, -0.915374755859375, 4.9460601806640625, 54.58905029296875, 16.233413696289062, 60.242645263671875, 13.843299865722656, 22.296173095703125, -0.7504501342773438, 28.903976440429688, -6.5974884033203125, 2.38250732421875, 12.673614501953125, -0.29097747802734375, 15.558113098144531, 3.596019744873047, 3.6081314086914062, 15.674148559570312, 12.747600555419922, 8.454544067382812, 15.447921752929688, 24.685001373291016, 8.940696716308594, 19.370513916015625, 16.158580780029297, 6.942111968994141, -4.585269927978516, 0.6890335083007812, 3.8642807006835938, -8.949256896972656, -1.4873046875, 24.3466796875, 25.80859375, 15.2767333984375, 19.017311096191406, 5.603858947753906, -3.4409713745117188, 7.492733001708984], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000166.npy"}
{"epoch": 0.24375917767988253, "step": 167, "batch_size": 64, "mean": 13.920673370361328, "std": 12.069576263427734, "min": -4.791748046875, "p10": 0.7479705810546876, "median": 11.747119903564453, "p90": 31.061311340332036, "max": 49.22406005859375, "pos_frac": 0.90625, "sample": [18.40643310546875, 3.486959457397461, 13.27764892578125, 17.17279052734375, 28.97906494140625, 26.213764190673828, 5.6568603515625, 12.019706726074219, 26.42192840576172, 11.52590560913086, 31.44415283203125, 39.25957489013672, 13.374507904052734, -4.791748046875, 5.0518798828125, 14.915824890136719, 27.22476577758789, 11.118690490722656, 21.4456787109375, 15.123924255371094, 0.8348617553710938, 3.347320556640625, 19.323593139648438, 5.092430114746094, 9.134387969970703, 21.808563232421875, 41.23259735107422, 10.717037200927734, 6.32073974609375, 10.503700256347656, -1.460672378540039, -0.8702774047851562, 39.49732971191406, 5.722568511962891, 5.132568359375, 14.722061157226562, 49.22406005859375, 10.403118133544922, 43.778419494628906, 7.114738464355469, 9.875160217285156, 5.445587158203125, 7.803611755371094, 3.7171707153320312, 30.168014526367188, 17.561119079589844, 13.764537811279297, 17.190284729003906, 32.16191101074219, 11.82501220703125, 11.669227600097656, 12.478240966796875, 16.176475524902344, -2.6004981994628906, 2.371440887451172, 21.203773498535156, 6.1277008056640625, 4.640357971191406, -1.00482177734375, 8.286941528320312, 13.466079711914062, -3.1531753540039062, 12.130767822265625, 0.7107315063476562], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000167.npy"}
{"epoch": 0.24522760646108663, "step": 168, "batch_size": 64, "mean": 11.710172653198242, "std": 12.751720428466797, "min": -14.01926040649414, "p10": -1.0377138137817377, "median": 10.331167221069336, "p90": 27.4486785888672, "max": 61.21382141113281, "pos_frac": 0.859375, "sample": [11.991657257080078, 11.824493408203125, -14.01926040649414, 8.706546783447266, -1.3180980682373047, 9.9873046875, 24.666366577148438, 10.675029754638672, -12.624923706054688, 6.652442932128906, 1.9984970092773438, -11.307342529296875, 14.777061462402344, 10.746833801269531, 8.542526245117188, 16.38055419921875, 17.507789611816406, 18.32640838623047, 8.783897399902344, 28.641098022460938, 21.15512466430664, 15.37933349609375, 8.770919799804688, 9.636283874511719, 46.103858947753906, 31.612525939941406, -0.19730186462402344, 34.31732177734375, 6.196891784667969, 12.168266296386719, -10.900093078613281, 12.678871154785156, -7.879600524902344, 12.059867858886719, 7.475105285644531, 5.119102478027344, 7.7661285400390625, 3.297170639038086, 9.282882690429688, 5.679779052734375, 4.800361633300781, 29.171905517578125, 23.735557556152344, -0.38348388671875, -1.755645751953125, 3.7574901580810547, 9.644588470458984, 15.954849243164062, 13.174652099609375, 14.419708251953125, 5.573211669921875, 10.750072479248047, 7.498470306396484, 6.7932586669921875, 33.026466369628906, 12.09844970703125, 15.66656494140625, 12.950996398925781, 19.268451690673828, 2.5978240966796875, 61.21382141113281, 6.436561584472656, 22.321632385253906, 20.073997497558594], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000168.npy"}
{"epoch": 0.24669603524229075, "step": 169, "batch_size": 64, "mean": 13.047781944274902, "std": 13.044960975646973, "min": -8.189765930175781, "p10": -3.0041885375976562, "median": 12.418210983276367, "p90": 29.085379791259765, "max": 53.890533447265625, "pos_frac": 0.859375, "sample": [-2.810832977294922, 15.78116226196289, 1.4044952392578125, 5.4705657958984375, 2.36358642578125, 17.836410522460938, 31.06249237060547, 20.286895751953125, 6.952266693115234, -4.027080535888672, 4.7919464111328125, 32.82417297363281, -3.9673690795898438, -1.0566024780273438, 11.326648712158203, 18.142295837402344, 7.9081573486328125, 13.533523559570312, 0.2518310546875, 2.4282989501953125, 15.123859405517578, 2.3213958740234375, 10.50701904296875, -5.1277618408203125, 29.18303680419922, 22.870277404785156, 7.287662506103516, 18.7357177734375, 8.349411010742188, 2.346721649169922, 24.086624145507812, 14.769218444824219, 1.8143119812011719, 1.2245979309082031, 20.625919342041016, -3.087055206298828, 17.36761474609375, -8.189765930175781, 16.71648406982422, 29.471885681152344, 6.51292610168457, 53.890533447265625, 13.561634063720703, 9.856712341308594, 27.62664031982422, 8.013896942138672, 11.390983581542969, 12.771881103515625, 28.857513427734375, 14.87274169921875, 43.6849365234375, 7.5039520263671875, 15.967300415039062, -3.767578125, 53.40643310546875, 7.466590881347656, 12.06454086303711, -7.8064422607421875, 16.664817810058594, 22.420379638671875, 16.658370971679688, 14.3851318359375, 24.501907348632812, 17.652198791503906], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000169.npy"}
{"epoch": 0.24816446402349487, "step": 170, "batch_size": 64, "mean": 12.180381774902344, "std": 11.538603782653809, "min": -13.603130340576172, "p10": -1.5312187194824216, "median": 11.745738983154297, "p90": 28.185571289062505, "max": 39.3023681640625, "pos_frac": 0.875, "sample": [9.129989624023438, 17.28192138671875, 5.16259765625, 15.905433654785156, 10.781501770019531, 2.267597198486328, 0.7038726806640625, -1.9295463562011719, 16.483966827392578, 2.715850830078125, 26.589027404785156, 4.436454772949219, 4.71174430847168, 5.045131683349609, 0.5101814270019531, -3.23358154296875, 18.808868408203125, 38.48460006713867, 38.34318542480469, 11.857437133789062, 7.984174728393555, 14.25494384765625, 19.55426788330078, 22.106660842895508, -1.3230056762695312, 3.828887939453125, 0.4868927001953125, 11.897140502929688, -6.2838134765625, 15.305286407470703, 21.128700256347656, 39.3023681640625, 7.179351806640625, 29.183258056640625, 23.935256958007812, 1.9467620849609375, 7.8499908447265625, -1.620452880859375, 7.7218170166015625, 17.2474365234375, 28.86980438232422, 12.250587463378906, 14.246978759765625, 16.835304260253906, 35.20722961425781, 15.83331298828125, 20.266559600830078, 8.331859588623047, 17.733489990234375, 10.750669479370117, 12.836395263671875, 19.163414001464844, -4.2657318115234375, 16.528976440429688, 35.77422332763672, 11.634040832519531, 11.062118530273438, -8.966087341308594, 3.318359375, -13.603130340576172, 16.118972778320312, 4.552158355712891, 19.050491333007812, 10.272275924682617], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000170.npy"}
{"epoch": 0.24963289280469897, "step": 171, "batch_size": 64, "mean": 13.571109771728516, "std": 15.890766143798828, "min": -23.28509521484375, "p10": 0.07001476287841854, "median": 10.025957107543945, "p90": 30.852384185791017, "max": 72.80929565429688, "pos_frac": 0.890625, "sample": [2.3839340209960938, 17.14636993408203, 36.52732849121094, 2.0043792724609375, 10.079177856445312, -5.902126312255859, 7.246448516845703, 4.4715423583984375, -7.456573486328125, 20.772171020507812, 19.24300765991211, 18.037174224853516, 5.941551208496094, 4.370857238769531, 16.452545166015625, -10.757499694824219, 8.688720703125, 15.605880737304688, 10.029083251953125, -0.16800689697265625, 21.51104736328125, 29.489044189453125, 28.067527770996094, 3.1992225646972656, 29.923233032226562, 11.144447326660156, 10.537757873535156, 52.805694580078125, 4.098094940185547, 31.67672348022461, 9.014701843261719, -23.28509521484375, 57.015350341796875, 2.8278350830078125, 23.172779083251953, 2.68341064453125, 18.494766235351562, 25.048412322998047, 6.733282089233398, 26.468994140625, 4.903327941894531, 3.0477752685546875, 22.841339111328125, 1.8022994995117188, 0.699127197265625, 14.041595458984375, 17.76502227783203, 20.3248291015625, 1.41229248046875, 5.034523010253906, -4.782463073730469, -1.5842514038085938, 36.20127868652344, 31.083236694335938, 3.6297607421875, 6.9909515380859375, 72.80929565429688, 10.022830963134766, 12.12466812133789, 5.928680419921875, 7.811256408691406, 0.6253986358642578, 20.161300659179688, 30.31372833251953], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000171.npy"}
{"epoch": 0.2511013215859031, "step": 172, "batch_size": 64, "mean": 12.833066940307617, "std": 9.60873031616211, "min": -13.389907836914062, "p10": 1.8570056915283204, "median": 12.461141586303711, "p90": 24.615641975402834, "max": 35.67406463623047, "pos_frac": 0.921875, "sample": [12.249046325683594, 14.451103210449219, -0.10073280334472656, 1.7904777526855469, 20.270889282226562, 9.007575988769531, 13.220565795898438, 2.84808349609375, 12.11761474609375, 2.690837860107422, 10.404342651367188, 11.80612564086914, -13.389907836914062, 20.511940002441406, 11.006202697753906, 20.75836181640625, 29.56523895263672, 18.35821533203125, 6.709373474121094, 11.418586730957031, 5.160617828369141, 16.124420166015625, 18.83580780029297, 35.67406463623047, 7.001213073730469, 3.6745777130126953, 21.96814727783203, 12.862770080566406, 2.012237548828125, 32.78985595703125, 11.808780670166016, -0.8316116333007812, 9.500370025634766, 8.7030029296875, 6.580146789550781, 21.335189819335938, 17.085403442382812, 24.086095809936523, 2.782245635986328, 12.300914764404297, -5.483985900878906, 18.31319808959961, 33.15533447265625, 18.848953247070312, -2.3033981323242188, 13.61431884765625, 1.4432792663574219, 6.44873046875, 12.621368408203125, 17.7763671875, 14.752220153808594, 20.0667724609375, 13.155593872070312, 18.597427368164062, 20.956459045410156, 24.84259033203125, 25.546897888183594, 15.081642150878906, 4.7647705078125, 8.579437255859375, 7.019126892089844, 2.6177024841308594, 30.86566162109375, 14.887641906738281], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000172.npy"}
{"epoch": 0.2525697503671072, "step": 173, "batch_size": 64, "mean": 10.60384750366211, "std": 11.331790924072266, "min": -8.736209869384766, "p10": -4.62908935546875, "median": 9.906597137451172, "p90": 24.777006149291996, "max": 38.543731689453125, "pos_frac": 0.8125, "sample": [-3.2342071533203125, -0.45784759521484375, 4.790637969970703, 16.161819458007812, 2.8144683837890625, 38.543731689453125, 22.926597595214844, 8.824615478515625, -0.8584976196289062, -0.30876922607421875, 17.87847900390625, 9.212516784667969, 0.9691963195800781, 1.47113037109375, 16.41907501220703, 13.749916076660156, 14.0489501953125, 15.258691787719727, 5.925579071044922, 37.61302185058594, -7.4232940673828125, -4.662322998046875, 5.813789367675781, 15.021598815917969, 21.818832397460938, 16.53128433227539, 1.011444091796875, 1.9153461456298828, 18.140445709228516, 12.965469360351562, 2.3710765838623047, 4.247348785400391, 4.9980316162109375, 23.896678924560547, 15.455810546875, 4.71636962890625, 28.781585693359375, 21.8353271484375, 6.87595272064209, -4.551544189453125, 25.15428924560547, 22.295738220214844, 12.12887954711914, 25.57433319091797, 12.954010009765625, -5.173595428466797, 36.48894500732422, 11.462600708007812, -8.363914489746094, 13.616226196289062, 14.94407844543457, 5.876216888427734, 9.624443054199219, 21.90118408203125, -8.736209869384766, 19.760772705078125, 1.3904743194580078, -5.073356628417969, 10.188751220703125, 6.759178161621094, -6.7745361328125, 6.853668212890625, 26.75853729248047, 17.52721405029297], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000173.npy"}
{"epoch": 0.2540381791483113, "step": 174, "batch_size": 64, "mean": 13.14596939086914, "std": 12.646212577819824, "min": -22.588973999023438, "p10": -0.8566982269287104, "median": 11.99203872680664, "p90": 26.99906806945801, "max": 65.72637939453125, "pos_frac": 0.875, "sample": [22.38530731201172, 2.4032135009765625, 12.218746185302734, 12.544708251953125, -4.189117431640625, 8.331878662109375, 6.357170104980469, 20.16326141357422, 24.694438934326172, 26.10919189453125, 7.646797180175781, -4.298561096191406, 10.791252136230469, 7.077981948852539, 25.617645263671875, 24.107982635498047, 27.099468231201172, 11.31988525390625, 14.364479064941406, -5.767578125, 18.84817123413086, 24.356651306152344, 24.73809814453125, 7.988777160644531, 32.216552734375, -22.588973999023438, 13.701248168945312, 15.858169555664062, 1.6752090454101562, 16.967958450317383, 12.126274108886719, 6.719775199890137, 11.096626281738281, -2.821533203125, 26.764801025390625, 11.402557373046875, 12.371734619140625, 6.744632720947266, 1.302093505859375, 31.734054565429688, 19.943862915039062, 27.957473754882812, 16.69318199157715, -5.805023193359375, -1.0598182678222656, 6.97979736328125, 6.167762756347656, 30.033401489257812, 5.7133026123046875, 8.567764282226562, 11.857803344726562, 11.743658065795898, 13.364830017089844, 29.86864471435547, 13.554122924804688, 13.43841552734375, 8.904834747314453, 3.486236572265625, 1.1447525024414062, -0.38275146484375, 65.72637939453125, 26.12924575805664, 9.746883392333984, 17.386281967163086], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000174.npy"}
{"epoch": 0.2555066079295154, "step": 175, "batch_size": 64, "mean": 13.553534507751465, "std": 12.051722526550293, "min": -11.246551513671875, "p10": -0.42940044403076116, "median": 11.477392196655273, "p90": 29.839660644531254, "max": 45.01972961425781, "pos_frac": 0.890625, "sample": [11.455974578857422, 18.775291442871094, 1.891958236694336, 19.6953125, 7.554924011230469, -1.444244384765625, 10.040115356445312, 8.631813049316406, 3.743671417236328, 28.49481201171875, 23.933258056640625, 0.113983154296875, 3.7378692626953125, 45.01972961425781, 43.83173370361328, 26.35369110107422, -0.6622791290283203, 1.6607742309570312, 6.945655822753906, 5.024627685546875, 22.310081481933594, 11.785964965820312, 34.10759735107422, 22.05206298828125, 9.029895782470703, 11.508499145507812, 23.326858520507812, 25.254661560058594, 16.905899047851562, 12.065139770507812, 11.838233947753906, 3.2016983032226562, 3.3487701416015625, 20.285076141357422, 11.18753433227539, 13.45356559753418, -6.168460845947266, 30.281082153320312, 5.35467529296875, -2.3390464782714844, 22.84765625, 11.735366821289062, 13.799627304077148, 10.110898971557617, 4.74168586730957, 34.795982360839844, 21.317581176757812, -1.5521697998046875, 4.772880554199219, 37.48033142089844, 11.388771057128906, 24.008193969726562, 9.75204849243164, 2.7226715087890625, 6.934133529663086, 8.200027465820312, 17.81231689453125, -3.4443931579589844, -11.246551513671875, 28.809677124023438, 32.195709228515625, 18.229217529296875, 11.498809814453125, 6.9272918701171875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000175.npy"}
{"epoch": 0.25697503671071953, "step": 176, "batch_size": 64, "mean": 13.665754318237305, "std": 15.091054916381836, "min": -13.624847412109375, "p10": -4.367766571044921, "median": 11.48703384399414, "p90": 36.29244918823244, "max": 49.263580322265625, "pos_frac": 0.828125, "sample": [18.21261215209961, 11.571617126464844, 12.500625610351562, 42.592254638671875, 29.211692810058594, 12.880435943603516, 6.643161773681641, 2.4419708251953125, -2.3928794860839844, 1.7780914306640625, 7.278713226318359, 32.9615478515625, 3.0773048400878906, 15.285102844238281, 28.837997436523438, 9.841827392578125, 14.266555786132812, 22.902076721191406, 6.108619689941406, -6.236225128173828, 37.71997833251953, 9.299407958984375, 28.83118438720703, -0.018207550048828125, -13.624847412109375, -5.883298873901367, 0.0835723876953125, 42.838829040527344, 11.96990966796875, 0.20656585693359375, 4.1498565673828125, 27.94036865234375, 29.27606201171875, 0.9834861755371094, 15.608951568603516, -1.9683990478515625, 13.665863037109375, 2.9117279052734375, -4.772918701171875, 12.817779541015625, 27.815643310546875, 5.7372589111328125, 24.845813751220703, 2.0521163940429688, 6.376651763916016, 39.21739959716797, 27.72296142578125, 11.402450561523438, 28.89789581298828, 44.93023681640625, 27.74994659423828, -8.346755981445312, 20.569351196289062, -3.83831787109375, -4.594673156738281, 1.68231201171875, -8.374523162841797, 8.014720916748047, 24.414432525634766, 7.762855529785156, 49.263580322265625, 39.87391662597656, 7.360403060913086, 12.273651123046875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000176.npy"}
{"epoch": 0.25844346549192365, "step": 177, "batch_size": 64, "mean": 15.959339141845703, "std": 16.8228759765625, "min": -5.736000061035156, "p10": -0.44184074401855467, "median": 12.96092700958252, "p90": 34.83762435913087, "max": 79.61607360839844, "pos_frac": 0.875, "sample": [1.938018798828125, 22.400970458984375, 21.276092529296875, 5.464439392089844, -5.736000061035156, 30.48186492919922, 14.55584716796875, -1.255645751953125, 11.709453582763672, 32.237083435058594, 73.19146728515625, 5.767887115478516, 20.60905647277832, 17.782546997070312, 13.593280792236328, 14.691753387451172, 5.917839050292969, 60.249847412109375, 3.82958984375, 29.614013671875, 3.49468994140625, 15.131900787353516, 13.320472717285156, 12.837104797363281, -3.7865686416625977, 8.772611618041992, 46.092613220214844, -2.5492515563964844, 3.177093505859375, 36.85372543334961, 15.138565063476562, 7.815349578857422, 19.408958435058594, 12.402557373046875, 23.002792358398438, -0.40008544921875, -2.2639312744140625, 16.497777938842773, 13.084749221801758, 4.964115142822266, 9.548751831054688, 41.40142822265625, 12.003684997558594, 7.424049377441406, 13.892669677734375, 15.003875732421875, 21.78185272216797, 0.5159645080566406, 3.199920654296875, 7.8772735595703125, 19.634624481201172, 8.300704956054688, 13.899253845214844, 0.07718467712402344, 32.774810791015625, 12.069381713867188, 7.9004058837890625, 5.998970031738281, 25.65582275390625, -0.4597358703613281, 79.61607360839844, 35.72168731689453, 27.863800048828125, -1.6194076538085938], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000177.npy"}
{"epoch": 0.2599118942731278, "step": 178, "batch_size": 64, "mean": 18.396028518676758, "std": 15.802523612976074, "min": -6.662284851074219, "p10": 1.6502357482910168, "median": 15.380290985107422, "p90": 42.63868408203126, "max": 67.2333984375, "pos_frac": 0.921875, "sample": [19.879295349121094, 16.808677673339844, 10.822235107421875, 0.525787353515625, 25.645309448242188, 5.525035858154297, 3.8976287841796875, 3.9122657775878906, 2.823505401611328, 14.038536071777344, 2.9278602600097656, 19.244495391845703, 28.81958770751953, 19.641357421875, 20.93404769897461, 19.082496643066406, 61.93556213378906, 21.428550720214844, 39.5126953125, 2.80950927734375, 67.2333984375, 6.6910552978515625, 3.323272705078125, 21.6229248046875, 27.268558502197266, 44.79955291748047, 43.9783935546875, 12.499343872070312, -2.2554931640625, 16.9249267578125, 20.5557861328125, 50.37226867675781, -1.2523193359375, 11.065201759338379, 20.092575073242188, 34.460968017578125, 15.111251831054688, 1.1534042358398438, 13.75178337097168, 32.98725891113281, 5.8734588623046875, -1.1812744140625, 32.991668701171875, 12.056510925292969, 15.842525482177734, 24.310813903808594, 16.880176544189453, 26.923370361328125, 13.448223114013672, 45.145965576171875, 6.808292388916016, 34.199951171875, 15.122329711914062, 8.997856140136719, 10.035614013671875, 15.638252258300781, -1.6933517456054688, 11.905586242675781, -6.662284851074219, 13.714374542236328, 55.477203369140625, 7.1514892578125, 19.539974212646484, 14.220550537109375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000178.npy"}
{"epoch": 0.26138032305433184, "step": 179, "batch_size": 64, "mean": 12.888062477111816, "std": 15.708548545837402, "min": -21.083328247070312, "p10": -4.193777847290039, "median": 11.093236923217773, "p90": 37.664366149902364, "max": 48.21641540527344, "pos_frac": 0.78125, "sample": [-3.87176513671875, 26.283145904541016, 19.675575256347656, 22.7503662109375, 6.647317886352539, 20.532432556152344, 23.734813690185547, 0.004520416259765625, -4.066829681396484, -3.096895217895508, 41.228759765625, 14.48471450805664, 3.710969924926758, -0.8015327453613281, 2.64971923828125, 21.445999145507812, -9.787399291992188, 8.323867797851562, 5.330291748046875, 24.226287841796875, 22.58380126953125, -0.38763427734375, 20.229400634765625, 1.2939224243164062, 26.49700927734375, 16.97270965576172, 8.37109375, 33.293724060058594, 10.964622497558594, 21.997238159179688, 1.5888442993164062, 39.553802490234375, 48.21641540527344, 10.255691528320312, -5.515094757080078, 17.36998748779297, -6.790559768676758, 25.787391662597656, -3.4238739013671875, 1.37982177734375, 4.190986633300781, -21.083328247070312, 3.8457908630371094, 1.79632568359375, 23.17925262451172, 43.450958251953125, 21.28449249267578, 14.571990966796875, 43.54389953613281, -2.3771743774414062, 16.23889923095703, 46.259674072265625, 1.186309814453125, 26.508636474609375, 39.537498474121094, 25.072978973388672, -4.2481842041015625, 13.74777603149414, -11.780609130859375, 5.047931671142578, -10.273666381835938, 21.321044921875, 2.949991226196289, 11.221851348876953], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000179.npy"}
{"epoch": 0.26284875183553597, "step": 180, "batch_size": 64, "mean": 15.391548156738281, "std": 14.540177345275879, "min": -10.443611145019531, "p10": -1.4379573822021476, "median": 12.918933868408203, "p90": 33.172374725341804, "max": 78.47505187988281, "pos_frac": 0.875, "sample": [14.079093933105469, 25.96588897705078, 19.97271728515625, -10.443611145019531, 12.307220458984375, 78.47505187988281, 28.80227279663086, 12.756233215332031, 30.303508758544922, 28.409408569335938, -3.1596851348876953, 8.196487426757812, -6.6069793701171875, 6.365886688232422, 4.722930908203125, 34.421875, 8.884994506835938, 15.511409759521484, 11.897186279296875, 7.7753448486328125, 8.429481506347656, 16.2821044921875, 15.24355697631836, 25.592803955078125, 34.20532989501953, 30.76214599609375, -6.516166687011719, 17.856746673583984, 14.164443969726562, -1.7416915893554688, 36.99897766113281, 2.9924230575561523, 16.322158813476562, 24.540260314941406, 25.194557189941406, 8.412921905517578, 9.509040832519531, 25.364112854003906, 18.744056701660156, 35.19598388671875, 15.640453338623047, 20.542556762695312, 30.279205322265625, 7.733818054199219, -2.2356033325195312, 1.9533767700195312, 16.437271118164062, 20.962356567382812, 5.530914306640625, 2.7906036376953125, -1.94354248046875, 26.182723999023438, 9.738761901855469, 44.6893310546875, 6.132637023925781, 0.7988739013671875, 10.012001037597656, 5.272743225097656, 13.081634521484375, 6.5340576171875, 37.54911804199219, 10.095809936523438, -0.7292442321777344, 11.818710327148438], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000180.npy"}
{"epoch": 0.2643171806167401, "step": 181, "batch_size": 64, "mean": 11.940736770629883, "std": 12.900199890136719, "min": -10.993721008300781, "p10": -2.225965881347656, "median": 8.897977828979492, "p90": 28.121186065673836, "max": 49.053253173828125, "pos_frac": 0.8125, "sample": [7.665618896484375, 1.9421310424804688, 12.164411544799805, 28.863525390625, 16.349136352539062, -4.036876678466797, 1.0988578796386719, -0.19530868530273438, 0.2175750732421875, 18.259429931640625, 21.321746826171875, 11.763101577758789, -2.416865348815918, 20.768417358398438, 8.541046142578125, 6.937206268310547, 6.919952392578125, 13.407150268554688, 17.134124755859375, -0.12128448486328125, 9.25490951538086, -10.993721008300781, 32.87474060058594, 22.05938720703125, 7.705230712890625, 7.243263244628906, -4.2151947021484375, 47.250816345214844, 31.122364044189453, 14.17685317993164, 4.943519592285156, 17.051223754882812, -1.9367523193359375, 22.8004150390625, 12.774978637695312, 22.69717025756836, 23.573204040527344, 4.169258117675781, 20.637237548828125, 3.3383331298828125, 4.102449417114258, -5.393218994140625, 36.04222869873047, 12.964897155761719, 47.20579528808594, 15.934473037719727, -2.34991455078125, 8.405128479003906, 17.830001831054688, 6.9971771240234375, 26.389060974121094, 49.053253173828125, -4.644889831542969, 9.789085388183594, 5.7265625, -0.7142467498779297, 6.198991775512695, 20.716629028320312, 13.768753051757812, 15.284416198730469, 3.6836929321289062, 2.0656967163085938, -0.7144088745117188, 2.7552566528320312], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000181.npy"}
{"epoch": 0.2657856093979442, "step": 182, "batch_size": 64, "mean": 15.589113235473633, "std": 16.499013900756836, "min": -17.628997802734375, "p10": -1.3736437797546386, "median": 15.22087049484253, "p90": 36.063906097412115, "max": 64.44932556152344, "pos_frac": 0.84375, "sample": [22.876724243164062, 15.303863525390625, 36.343894958496094, 22.344512939453125, 25.24810791015625, 32.93909454345703, 60.44154357910156, 18.1427001953125, 64.44932556152344, 15.266539573669434, 1.8462677001953125, 18.422168731689453, -9.90799331665039, -1.2445306777954102, 25.779022216796875, -17.628997802734375, 23.292102813720703, 25.22748565673828, 2.9334630966186523, 11.389907836914062, 14.25201416015625, -9.55695915222168, 30.38796615600586, 19.0989990234375, 8.749080657958984, -1.1709556579589844, 11.823516845703125, 0.992462158203125, -1.4289779663085938, 15.847503662109375, 14.157649993896484, 15.175201416015625, 35.41059875488281, 17.809803009033203, 16.244613647460938, 21.56963348388672, 8.5765380859375, -17.243072509765625, 4.92034912109375, 45.63897705078125, 2.7685699462890625, 14.353199005126953, 5.997592926025391, 8.292015075683594, 43.16294860839844, 11.798942565917969, 16.085330963134766, 34.104042053222656, 16.272903442382812, -2.2574234008789062, 43.10284423828125, 18.8700008392334, 6.456550598144531, 4.4128265380859375, 9.042282104492188, 4.469575881958008, -10.55434799194336, -0.40979766845703125, 42.05999755859375, 1.3777656555175781, 26.743804931640625, 6.199756622314453, 25.192867279052734, 25.44092559814453], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000182.npy"}
{"epoch": 0.26725403817914833, "step": 183, "batch_size": 64, "mean": 15.86647891998291, "std": 14.529356956481934, "min": -10.334175109863281, "p10": 0.11017112731933613, "median": 14.130409240722656, "p90": 33.37358436584473, "max": 51.597412109375, "pos_frac": 0.90625, "sample": [32.64109802246094, 26.31218719482422, 3.4740982055664062, 15.005722045898438, 2.2786102294921875, 29.853317260742188, 40.26567077636719, -5.483306884765625, 14.127517700195312, -4.898834228515625, 19.955764770507812, 51.597412109375, 14.950019836425781, 2.765625, -2.9599609375, 10.901809692382812, 50.936553955078125, 30.636192321777344, 13.768348693847656, 3.999357223510742, 6.172271728515625, 16.538665771484375, 14.466140747070312, 15.088607788085938, 2.6282196044921875, 0.7095518112182617, 33.13628387451172, 31.88506317138672, 33.475284576416016, 6.390289306640625, 49.73792266845703, 12.886985778808594, 12.073980331420898, 2.25860595703125, 23.95269775390625, -1.1021804809570312, 14.13330078125, 25.495529174804688, 15.255901336669922, 33.02716064453125, -2.676727294921875, 1.0520172119140625, 23.641510009765625, 22.70606231689453, 9.826614379882812, 46.816612243652344, 9.911796569824219, 0.29788970947265625, 13.726520538330078, 17.74017333984375, 22.22649383544922, 3.6482696533203125, 21.054061889648438, 12.888839721679688, 7.030067443847656, 31.611839294433594, 6.385429382324219, 15.58575439453125, 3.087738037109375, 0.029720306396484375, 34.35204315185547, -10.334175109863281, 18.604873657226562, 7.9037322998046875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000183.npy"}
{"epoch": 0.2687224669603524, "step": 184, "batch_size": 64, "mean": 10.338337898254395, "std": 14.900415420532227, "min": -41.36517333984375, "p10": -3.8251914978027335, "median": 8.810525894165039, "p90": 31.249519348144535, "max": 40.67799377441406, "pos_frac": 0.71875, "sample": [21.621124267578125, 17.634803771972656, 0.19403839111328125, 29.825958251953125, 39.291893005371094, -1.2953338623046875, 11.608116149902344, 1.3365249633789062, 40.67799377441406, -1.6802635192871094, 10.624313354492188, -0.6526260375976562, 10.44247817993164, 11.28515625, -0.9193115234375, 28.532699584960938, 19.866363525390625, 32.47308349609375, 30.654579162597656, 38.16078186035156, 3.5426292419433594, 4.417449951171875, -41.36517333984375, 15.878192901611328, -0.989959716796875, 25.355876922607422, 11.686298370361328, -0.491058349609375, -1.5719451904296875, 12.888294219970703, 1.8086700439453125, -17.159332275390625, 15.113395690917969, 31.504493713378906, -2.827106475830078, 18.67657470703125, 15.710533142089844, 7.201499938964844, 17.47503662109375, 6.3377685546875, 20.998435974121094, -3.040740966796875, 4.072307586669922, -4.161384582519531, 10.185394287109375, 7.435657501220703, -1.6333465576171875, 4.678016662597656, 21.114898681640625, -1.0382614135742188, -4.602455139160156, 6.4936981201171875, 3.0007247924804688, 13.078502655029297, -4.641380310058594, -10.281303405761719, 37.185508728027344, 33.86542510986328, 6.771884918212891, 18.381309509277344, 3.43780517578125, -7.257587432861328, 15.14332389831543, 29.592666625976562], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000184.npy"}
{"epoch": 0.2701908957415565, "step": 185, "batch_size": 64, "mean": 12.732060432434082, "std": 15.859108924865723, "min": -19.194459915161133, "p10": -6.8398855209350575, "median": 12.420846939086914, "p90": 33.57360038757326, "max": 62.7518310546875, "pos_frac": 0.8125, "sample": [-19.194459915161133, -10.596076965332031, 5.916919708251953, -4.751556396484375, 7.798492431640625, 3.9588661193847656, -2.7613449096679688, 0.5402603149414062, 24.406692504882812, 15.949737548828125, 15.421550750732422, 46.28971862792969, 15.299545288085938, 16.131431579589844, 11.343795776367188, 22.28973388671875, 11.946596145629883, -0.5366668701171875, 19.62470245361328, 39.66416931152344, -6.075159072875977, 20.144489288330078, 4.867740631103516, 1.7391510009765625, -7.167625427246094, -11.048477172851562, 20.689151763916016, 27.290340423583984, 11.400283813476562, 15.861724853515625, -12.805145263671875, 0.8442726135253906, 24.04901885986328, 19.173797607421875, -16.32757568359375, 11.141143798828125, 12.796165466308594, 18.855087280273438, 41.144248962402344, 8.350357055664062, 15.233203887939453, 35.36698913574219, 1.0510177612304688, 21.41980743408203, 8.492996215820312, 5.476776123046875, 22.28052520751953, 16.2486572265625, 21.12842559814453, 11.586549758911133, 62.7518310546875, 15.39352798461914, 15.256954193115234, -2.256561279296875, 1.779052734375, 29.389026641845703, 12.045528411865234, 8.683891296386719, -14.88848876953125, 41.426544189453125, 16.767120361328125, 43.654273986816406, 19.923385620117188, 2.9757308959960938], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000185.npy"}
{"epoch": 0.27165932452276065, "step": 186, "batch_size": 64, "mean": 14.809966087341309, "std": 14.64017391204834, "min": -16.302413940429688, "p10": -1.226972961425781, "median": 14.837108612060547, "p90": 36.496376800537114, "max": 58.89128112792969, "pos_frac": 0.84375, "sample": [28.985260009765625, 17.35992431640625, 16.073135375976562, 19.246423721313477, 13.565040588378906, 40.61152648925781, 11.577880859375, 18.93012237548828, 58.89128112792969, -6.182716369628906, 21.384674072265625, 18.805221557617188, 2.7680435180664062, -8.210594177246094, 19.22577667236328, 26.396881103515625, 3.9512557983398438, -1.3848876953125, -2.0914535522460938, 4.567558288574219, 25.542755126953125, 21.299232482910156, -16.302413940429688, 24.86121368408203, 8.987823486328125, 12.331298828125, 6.876029968261719, 40.75110626220703, -0.3026123046875, 1.1083602905273438, 2.101734161376953, -8.41552734375, 35.284584045410156, 0.90362548828125, 5.567447662353516, 15.643470764160156, 40.277191162109375, 28.419509887695312, 42.79444885253906, 6.383476257324219, 17.560012817382812, 8.39508056640625, 32.75599670410156, 37.015716552734375, 4.046440124511719, -0.8585052490234375, 3.2738037109375, 23.95086669921875, 6.183990478515625, 10.500633239746094, 29.679733276367188, 17.026622772216797, 18.766693115234375, 18.336124420166016, -5.121826171875, 38.589134216308594, 2.719921112060547, -0.828277587890625, 14.518836975097656, 21.804588317871094, 15.835868835449219, 15.155380249023438, 8.212352752685547, 11.735504150390625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000186.npy"}
{"epoch": 0.27312775330396477, "step": 187, "batch_size": 64, "mean": 12.314312934875488, "std": 15.001114845275879, "min": -18.521484375, "p10": -6.532796859741208, "median": 9.956188201904297, "p90": 32.02977752685547, "max": 58.96461486816406, "pos_frac": 0.84375, "sample": [-3.9864501953125, 32.69654846191406, 1.633469581604004, 4.101833343505859, 1.1925277709960938, 9.048637390136719, 35.29096984863281, 37.121246337890625, 16.984527587890625, 18.831405639648438, 2.9277801513671875, 1.6621646881103516, 35.15440368652344, 4.902061462402344, 16.949661254882812, -0.5013961791992188, 20.881759643554688, 4.792869567871094, 2.2479476928710938, -10.874385833740234, 7.978614807128906, -1.9544601440429688, 24.44012451171875, 6.444770812988281, 3.5052947998046875, 19.361583709716797, 58.96461486816406, -10.65194320678711, 0.8935127258300781, 10.136154174804688, 14.474479675292969, 31.571266174316406, 9.776222229003906, -7.624088287353516, 3.1119117736816406, 26.760936737060547, 41.02662658691406, 10.429683685302734, 1.5763702392578125, 10.256248474121094, 31.810760498046875, 19.075294494628906, 25.723548889160156, 28.60924530029297, 3.932281494140625, -15.020923614501953, 0.5062103271484375, 2.2709007263183594, -18.521484375, -8.438339233398438, 6.413890838623047, 25.13971710205078, 32.12364196777344, 15.685104370117188, 23.538990020751953, 21.638519287109375, 27.115062713623047, -9.429428100585938, 20.51885223388672, 23.314607620239258, 13.033645629882812, 13.867462158203125, 5.316566467285156, 8.356401443481445], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000187.npy"}
{"epoch": 0.2745961820851689, "step": 188, "batch_size": 64, "mean": 14.495676040649414, "std": 14.492321968078613, "min": -33.39897155761719, "p10": -2.704994964599609, "median": 15.073654174804688, "p90": 29.548048019409183, "max": 57.63685607910156, "pos_frac": 0.859375, "sample": [21.051902770996094, 18.37059783935547, 46.109169006347656, 6.076290130615234, 19.19915008544922, 20.756881713867188, 15.370559692382812, 16.377777099609375, 13.699348449707031, 24.290870666503906, 4.714756011962891, 22.615921020507812, 10.130195617675781, 6.456855773925781, 6.9657440185546875, 11.475692749023438, 11.37575912475586, 29.807525634765625, 7.357982635498047, 2.0538511276245117, 34.11022186279297, 14.723705291748047, 18.469131469726562, 20.54866600036621, 18.15753173828125, -7.6651458740234375, 25.631454467773438, 5.845981597900391, 28.94260025024414, -2.591278076171875, 57.63685607910156, -33.39897155761719, 19.775009155273438, 14.97125244140625, 40.89019012451172, 18.462936401367188, -12.81671142578125, 26.79790496826172, 5.553062438964844, 20.407997131347656, 26.003883361816406, -11.820297241210938, 15.947433471679688, 14.36159896850586, -3.5349273681640625, -1.2345046997070312, 5.609367370605469, 7.182525634765625, 13.399673461914062, 17.443023681640625, 16.306678771972656, 21.793724060058594, 17.002120971679688, -2.7537307739257812, 13.3702392578125, 13.304237365722656, 20.295909881591797, 43.700836181640625, 37.032432556152344, 13.06634521484375, 15.176055908203125, -3.5381765365600586, 10.464103698730469, 0.43549346923828125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000188.npy"}
{"epoch": 0.27606461086637296, "step": 189, "batch_size": 64, "mean": 15.27053451538086, "std": 17.854066848754883, "min": -15.677227020263672, "p10": -3.123505401611327, "median": 12.215897560119629, "p90": 36.82639465332032, "max": 76.365234375, "pos_frac": 0.84375, "sample": [-1.419403076171875, 5.675266265869141, 20.93426513671875, 35.33436584472656, 6.665431976318359, -10.121929168701172, 23.16538429260254, -3.4874343872070312, 6.569244384765625, 14.921249389648438, 18.669654846191406, 66.64657592773438, 9.307723999023438, 30.608535766601562, 22.742958068847656, 19.739974975585938, -3.55731201171875, 8.584854125976562, 3.6880645751953125, 5.624015808105469, -8.721630096435547, 39.68791961669922, -1.4420051574707031, 32.19784927368164, 18.005306243896484, 19.922348022460938, -2.2743377685546875, 12.364473342895508, 12.06732177734375, 31.530258178710938, 13.404647827148438, 15.636774063110352, 30.465381622314453, 57.397682189941406, 11.800437927246094, 19.969234466552734, 1.4244041442871094, 5.272666931152344, -15.677227020263672, 20.123825073242188, 1.0975341796875, 0.4313507080078125, 24.821701049804688, 5.066619873046875, 7.872095108032227, 27.550395965576172, 53.698974609375, 76.365234375, 1.7835502624511719, 17.087398529052734, 39.81467819213867, 2.8665103912353516, 10.657499313354492, 12.616439819335938, -3.6385955810546875, 5.247352600097656, 20.489501953125, 0.33582305908203125, 37.46583557128906, -6.032932281494141, 0.272491455078125, 25.604515075683594, 17.035865783691406, 5.357513427734375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000189.npy"}
{"epoch": 0.2775330396475771, "step": 190, "batch_size": 64, "mean": 16.90254020690918, "std": 13.775455474853516, "min": -9.274215698242188, "p10": 0.6841220855712893, "median": 15.168590545654297, "p90": 35.52760696411133, "max": 53.66520690917969, "pos_frac": 0.921875, "sample": [2.9863204956054688, 11.542617797851562, 0.40325164794921875, 6.397514343261719, 10.824005126953125, 24.511558532714844, 28.790489196777344, -5.606437683105469, 3.8334007263183594, 25.294403076171875, 20.511703491210938, 12.826358795166016, 26.619369506835938, 16.317893981933594, 31.06566619873047, 16.540748596191406, -7.8228912353515625, 16.762771606445312, 26.708831787109375, 31.110366821289062, 4.419155120849609, 42.01298522949219, 12.100227355957031, 27.2921142578125, 37.04472351074219, 12.307449340820312, 0.93701171875, 35.60786437988281, 12.31109619140625, 15.011695861816406, -9.274215698242188, 35.34033966064453, -5.564121246337891, 23.181934356689453, 36.26512145996094, 30.847061157226562, 15.325485229492188, 26.50232696533203, 9.550491333007812, -8.28472900390625, 43.44617462158203, 2.0861473083496094, 13.671142578125, 30.641845703125, 4.952140808105469, 11.240142822265625, 26.0068359375, 5.3494873046875, 37.968292236328125, 25.268280029296875, 25.544174194335938, 53.66520690917969, 14.292243957519531, 18.5938720703125, 11.340354919433594, 10.396371841430664, 0.5757408142089844, 17.879165649414062, 5.612354278564453, 23.042510986328125, 4.636360168457031, 27.436721801757812, 11.523426055908203, 4.041595458984375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000190.npy"}
{"epoch": 0.2790014684287812, "step": 191, "batch_size": 64, "mean": 18.517475128173828, "std": 16.931194305419922, "min": -13.2288818359375, "p10": 0.5915617942810061, "median": 17.49883270263672, "p90": 43.765870666503915, "max": 67.22334289550781, "pos_frac": 0.921875, "sample": [31.668243408203125, -4.1174774169921875, 64.09698486328125, 45.947967529296875, 20.911048889160156, 15.2686767578125, 0.5037841796875, 19.559097290039062, 2.7709999084472656, 48.81932067871094, 13.663627624511719, 14.596992492675781, 38.27351379394531, 42.45964050292969, 6.1694793701171875, 7.6360931396484375, 4.730907440185547, 19.615394592285156, 24.30084991455078, 2.1190032958984375, 17.78839111328125, 23.041488647460938, -11.405792236328125, 27.826812744140625, 50.018714904785156, 22.496337890625, 7.612386703491211, 44.32568359375, 1.1562347412109375, 10.450332641601562, 9.267166137695312, 4.647218704223633, 6.6119842529296875, 0.3354034423828125, 17.209274291992188, 19.253509521484375, 13.062675476074219, 9.402496337890625, 45.754310607910156, 2.9469594955444336, 0.7963762283325195, 20.397048950195312, 2.362232208251953, 6.87274169921875, 3.0783233642578125, 39.2529296875, -0.9366245269775391, 9.42313003540039, 8.190101623535156, 31.79975128173828, -13.2288818359375, 25.457687377929688, 30.272628784179688, -0.3665008544921875, 27.1004638671875, 20.22357177734375, 33.898223876953125, 17.044273376464844, 20.991744995117188, 28.251861572265625, 67.22334289550781, 26.926708221435547, 18.096752166748047, 21.19476318359375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000191.npy"}
{"epoch": 0.28046989720998533, "step": 192, "batch_size": 64, "mean": 13.294942855834961, "std": 15.85792350769043, "min": -24.153076171875, "p10": -6.094364929199218, "median": 12.943140029907227, "p90": 32.19769935607911, "max": 56.3572998046875, "pos_frac": 0.78125, "sample": [15.472579956054688, 13.123340606689453, -12.307357788085938, 9.77212905883789, 47.310306549072266, 28.100215911865234, 4.7133331298828125, -3.5175704956054688, -9.371391296386719, 8.407073974609375, 6.23138427734375, 21.823394775390625, 12.699241638183594, 5.0515899658203125, -5.4031982421875, 1.9245452880859375, 20.871292114257812, 16.859180450439453, 1.1158447265625, 28.963443756103516, -7.7122039794921875, 24.026046752929688, -9.777694702148438, 11.740463256835938, -6.3905792236328125, 41.38768005371094, 0.3907470703125, 56.3572998046875, -0.8590545654296875, 42.749359130859375, -0.47556304931640625, 21.26245880126953, 49.480072021484375, 25.713516235351562, 10.162742614746094, 5.893989562988281, 33.58380889892578, 27.63457489013672, 25.324378967285156, 11.470489501953125, 17.934059143066406, 17.98882293701172, 9.112739562988281, 10.594085693359375, 21.45777130126953, 15.786209106445312, 20.076156616210938, 36.46466064453125, -7.159149169921875, 22.353477478027344, 14.051265716552734, 12.762939453125, 18.51861572265625, -4.4347076416015625, 2.5428686141967773, 19.08948516845703, -4.86309814453125, 19.79027557373047, -3.761993408203125, 28.637649536132812, 14.680557250976562, 15.776355743408203, -24.153076171875, 3.82843017578125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000192.npy"}
{"epoch": 0.28193832599118945, "step": 193, "batch_size": 64, "mean": 11.581165313720703, "std": 15.705705642700195, "min": -24.79339599609375, "p10": -6.728816986083984, "median": 12.35641098022461, "p90": 32.754171752929686, "max": 44.289024353027344, "pos_frac": 0.703125, "sample": [16.366371154785156, -5.5527191162109375, 34.68291473388672, -6.667854309082031, -24.79339599609375, 6.9937896728515625, 16.832275390625, 1.0702056884765625, 0.9298858642578125, 20.99871826171875, 23.673561096191406, 11.199169158935547, 37.639854431152344, 15.871788024902344, 33.66608428955078, -2.99725341796875, 17.520397186279297, 8.988895416259766, 12.5477294921875, 32.80828857421875, -2.8219661712646484, 27.844375610351562, 22.05613136291504, 23.772354125976562, 14.189292907714844, 13.565990447998047, 28.810150146484375, 17.482810974121094, -2.4004135131835938, 33.97734832763672, -4.195059776306152, 30.279937744140625, 40.24397277832031, 11.206687927246094, -4.7414398193359375, -16.668041229248047, 5.9668121337890625, 7.28131103515625, -8.518142700195312, -1.2629776000976562, -6.75494384765625, 5.525886535644531, 12.165092468261719, 20.392059326171875, 18.480026245117188, -4.532337188720703, 21.304794311523438, -12.601837158203125, 24.384502410888672, 4.285205841064453, 0.7829704284667969, 32.627899169921875, -15.497116088867188, 16.111541748046875, 30.294376373291016, 1.6233444213867188, 22.07604217529297, 44.289024353027344, -9.015213012695312, -0.19427871704101562, 24.62432098388672, -4.5187530517578125, 28.055099487304688, -0.5609588623046875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000193.npy"}
{"epoch": 0.2834067547723935, "step": 194, "batch_size": 64, "mean": 13.251032829284668, "std": 15.142853736877441, "min": -16.958446502685547, "p10": -3.807234954833984, "median": 12.779402732849121, "p90": 33.22033615112305, "max": 52.09184265136719, "pos_frac": 0.8125, "sample": [26.391101837158203, -3.4364547729492188, 20.558643341064453, 10.014495849609375, 17.310298919677734, 27.337890625, -0.5069236755371094, 12.407482147216797, 21.595054626464844, -6.886392593383789, -0.3833732604980469, 16.239395141601562, 15.892227172851562, 19.790313720703125, 7.651031494140625, 6.600364685058594, 15.274337768554688, 17.384052276611328, 17.51163101196289, 31.06764793395996, 1.6563682556152344, 5.77972412109375, -12.322593688964844, -1.6127548217773438, 10.296119689941406, 26.127033233642578, -0.5699443817138672, 33.015567779541016, 2.2430686950683594, 47.7533073425293, 52.09184265136719, 35.45037841796875, 36.963645935058594, -16.958446502685547, 1.7379112243652344, -14.237174987792969, 8.018257141113281, 4.664216995239258, 10.632171630859375, 3.5703773498535156, 17.14092254638672, 16.031394958496094, -15.192863464355469, 12.720512390136719, 12.838293075561523, 19.537254333496094, 33.3080940246582, 17.151824951171875, 1.0856704711914062, 0.977630615234375, 2.9706687927246094, 20.337722778320312, 15.908355712890625, 31.302894592285156, 3.2289066314697266, 41.3756103515625, -3.9661407470703125, 25.472118377685547, 45.48486328125, 5.420326232910156, 19.908065795898438, -4.513420104980469, 15.279510498046875, 8.145988464355469], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000194.npy"}
{"epoch": 0.28487518355359764, "step": 195, "batch_size": 64, "mean": 15.680245399475098, "std": 16.32789421081543, "min": -24.004043579101562, "p10": -2.4850254058837873, "median": 15.725557327270508, "p90": 38.46228485107422, "max": 61.300048828125, "pos_frac": 0.859375, "sample": [23.698596954345703, 2.506134033203125, 16.56888198852539, 19.14566421508789, 30.086746215820312, 38.85453796386719, 8.672714233398438, 37.547027587890625, 5.8623046875, 1.6848716735839844, 13.865966796875, 21.819007873535156, 12.282844543457031, 22.469886779785156, 27.986732482910156, 13.568550109863281, 15.398826599121094, 0.27972412109375, -23.860553741455078, 20.628509521484375, 20.577110290527344, 11.461997985839844, -0.3509712219238281, 21.00555419921875, 18.923603057861328, 32.28961181640625, 43.73442077636719, 19.9178466796875, 8.533760070800781, 10.095306396484375, -12.346515655517578, 25.992111206054688, 18.117210388183594, 26.976341247558594, -6.591667175292969, -7.682380676269531, 4.905834197998047, 19.800888061523438, 5.3471832275390625, 26.314010620117188, -24.004043579101562, 23.800155639648438, 19.71487045288086, 1.1777114868164062, 44.282325744628906, 44.80607604980469, 4.8486175537109375, 44.37913513183594, -0.8162155151367188, 19.132972717285156, 7.3636322021484375, 2.57696533203125, 61.300048828125, 36.38890838623047, 11.3858642578125, -6.630584716796875, 14.219097137451172, 11.106330871582031, 26.800233840942383, 16.052288055419922, 6.7424468994140625, 6.724891662597656, 39.29600524902344, -3.2002296447753906], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000195.npy"}
{"epoch": 0.28634361233480177, "step": 196, "batch_size": 64, "mean": 16.26885414123535, "std": 14.377398490905762, "min": -7.543972015380859, "p10": -1.5096229553222649, "median": 15.094884872436523, "p90": 36.32523040771485, "max": 50.58268737792969, "pos_frac": 0.875, "sample": [29.89459228515625, 30.390731811523438, 41.54835510253906, 18.836212158203125, 15.959373474121094, 9.638710021972656, -0.7305984497070312, 31.308807373046875, 28.717254638671875, 24.41779327392578, 1.4755821228027344, 10.929279327392578, 14.845451354980469, 23.15373992919922, 13.761871337890625, 19.586462020874023, 8.477190017700195, 21.933609008789062, -3.8729171752929688, 7.409885406494141, 5.435249328613281, -5.690532684326172, -7.3587493896484375, 6.9146575927734375, 15.600324630737305, 15.543365478515625, 5.421661376953125, -1.8434906005859375, 19.66094970703125, 2.8574085235595703, 36.45616149902344, 11.669334411621094, 40.90631103515625, 34.35443115234375, 3.0267791748046875, 27.302879333496094, 8.117721557617188, 28.04559326171875, 2.426403045654297, 33.4658203125, 10.168159484863281, 31.38428497314453, 36.662803649902344, -2.9393768310546875, 36.41265869140625, 5.838478088378906, 9.288986206054688, 11.32394027709961, 7.94956111907959, 16.519210815429688, -3.9560394287109375, -7.543972015380859, 18.049358367919922, 33.033287048339844, 44.112754821777344, 16.413009643554688, 32.972076416015625, 0.5819931030273438, 3.52056884765625, 36.12123107910156, 50.58268737792969, 15.344318389892578, 7.830230712890625, 1.5427932739257812], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000196.npy"}
{"epoch": 0.2878120411160059, "step": 197, "batch_size": 64, "mean": 10.15202808380127, "std": 12.539072036743164, "min": -30.630084991455078, "p10": -2.1630565643310526, "median": 8.546890258789062, "p90": 24.630636596679693, "max": 43.30652618408203, "pos_frac": 0.875, "sample": [4.895305633544922, 29.710655212402344, 16.85897445678711, 7.481819152832031, -0.07206344604492188, 7.127796173095703, 9.702930450439453, 8.261177062988281, 5.401336669921875, 0.1342620849609375, 32.66862487792969, 43.30652618408203, 0.8173370361328125, 17.637939453125, 9.45367431640625, 19.93120765686035, -12.764801025390625, 22.021759033203125, 8.93829345703125, 11.173797607421875, 11.327762603759766, 20.00495147705078, 7.734474182128906, 8.8272705078125, 8.089981079101562, 3.7011489868164062, 14.295295715332031, 15.221267700195312, 20.749771118164062, 12.367828369140625, 1.9942855834960938, -8.067512512207031, -5.985683441162109, 15.415985107421875, 5.99908447265625, 0.16710662841796875, 30.9942626953125, 29.881240844726562, 7.0602264404296875, -30.630084991455078, 4.914875030517578, -18.2318115234375, 1.0828075408935547, 21.135940551757812, 2.0873336791992188, 11.4410400390625, 9.227188110351562, 12.691307067871094, 21.17951202392578, 25.21825408935547, -3.0591964721679688, 23.25952911376953, 3.1781158447265625, 8.266510009765625, 19.38751220703125, 38.60130310058594, 8.086742401123047, 5.840934753417969, 4.614250183105469, 7.546119689941406, -6.671867370605469, 20.50519561767578, 5.061737060546875, 12.531257629394531], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000197.npy"}
{"epoch": 0.28928046989721, "step": 198, "batch_size": 64, "mean": 14.79887580871582, "std": 14.658514022827148, "min": -15.885734558105469, "p10": -2.2560974121093746, "median": 14.949975967407227, "p90": 33.08132705688477, "max": 68.78848266601562, "pos_frac": 0.859375, "sample": [-2.0401992797851562, -9.534286499023438, 32.08338165283203, 41.33396911621094, 17.177940368652344, 9.93536376953125, 21.3760986328125, 4.414283752441406, 22.773452758789062, 15.959041595458984, 29.541458129882812, -9.265731811523438, -4.111053466796875, 28.99396514892578, 19.36743927001953, -9.508956909179688, 25.419601440429688, 17.583389282226562, 19.359954833984375, 6.098918914794922, 21.684749603271484, 30.91455841064453, 0.9441757202148438, 19.622802734375, 9.26675796508789, 1.09906005859375, 16.842952728271484, 8.754444122314453, 4.071651458740234, 33.50901794433594, 14.233646392822266, 8.7601318359375, 5.416542053222656, 11.252120971679688, 8.189926147460938, 1.6829299926757812, -15.885734558105469, 0.39612579345703125, 41.27653503417969, 35.18341827392578, 24.165435791015625, 35.66905975341797, 19.198829650878906, -0.6396026611328125, 7.758491516113281, 10.990631103515625, 40.64305114746094, 10.953617095947266, 6.244447708129883, 10.080204010009766, -2.3486251831054688, 22.789047241210938, 20.04950714111328, 17.418292999267578, 11.847930908203125, 26.542579650878906, 5.854644775390625, 18.030731201171875, -3.540212631225586, 19.066665649414062, 19.05419158935547, 68.78848266601562, 8.670539855957031, 15.666305541992188], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000198.npy"}
{"epoch": 0.2907488986784141, "step": 199, "batch_size": 64, "mean": 13.53719711303711, "std": 14.110115051269531, "min": -10.756103515625, "p10": -4.078464889526367, "median": 13.843354225158691, "p90": 31.12042465209961, "max": 48.74229431152344, "pos_frac": 0.78125, "sample": [5.338531494140625, 21.91644287109375, -10.756103515625, 23.804229736328125, 15.622695922851562, 17.996856689453125, -0.6501731872558594, 17.807296752929688, 17.193328857421875, 13.467689514160156, 48.74229431152344, 21.947113037109375, -1.1841583251953125, -0.6585197448730469, 5.112968444824219, -5.253093719482422, -4.319892883300781, 23.95587158203125, 38.96131896972656, 25.792095184326172, -9.626983642578125, 9.930946350097656, 27.955978393554688, 11.087539672851562, 3.8175582885742188, 15.749008178710938, 3.2598800659179688, 14.949825286865234, 30.844757080078125, 35.839439392089844, 7.268409729003906, 18.132850646972656, 22.594100952148438, 36.40414810180664, -3.5151329040527344, 30.88843536376953, 5.1663665771484375, 3.0494384765625, -1.7885017395019531, -0.25377655029296875, -6.202339172363281, 0.194610595703125, 20.92394256591797, 6.085578918457031, 45.65630340576172, 11.024879455566406, 4.918708801269531, 4.568471908569336, 3.0199356079101562, -2.263195037841797, -6.5747833251953125, 14.219018936157227, 17.90884017944336, 24.118141174316406, 30.055023193359375, 17.599472045898438, 5.778564453125, 23.40699005126953, 14.219871520996094, -4.779014587402344, 27.37142562866211, 37.57423400878906, 31.2198486328125, 9.745025634765625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000199.npy"}
{"epoch": 0.2922173274596182, "step": 200, "batch_size": 64, "mean": 15.671920776367188, "std": 16.422321319580078, "min": -19.98328399658203, "p10": -7.4075927734375, "median": 15.317070960998535, "p90": 38.179795074462895, "max": 52.02430725097656, "pos_frac": 0.828125, "sample": [22.349571228027344, 44.66510772705078, 16.593259811401367, 33.231201171875, 12.263473510742188, -3.2794952392578125, -10.968338012695312, 16.042348861694336, 44.941680908203125, -7.590301513671875, 13.0733642578125, 22.2655029296875, 4.325035095214844, 0.5616989135742188, 2.709135055541992, 26.98632049560547, 33.875267028808594, 44.258697509765625, 26.74030303955078, 35.712982177734375, 1.5917854309082031, 7.82826042175293, 14.318666458129883, 7.333644866943359, -12.684757232666016, 28.054298400878906, 16.419937133789062, 9.47991943359375, 28.339599609375, -1.165863037109375, 33.54958724975586, -10.81336784362793, 5.5863494873046875, 13.174392700195312, 0.47419166564941406, 38.63005065917969, 11.70269775390625, 28.665512084960938, 26.409568786621094, 52.02430725097656, 5.997577667236328, 39.02344512939453, 19.16461181640625, 20.2022705078125, 0.9197063446044922, 8.490982055664062, 23.922496795654297, 17.077817916870117, 38.4378662109375, -8.078575134277344, -3.481351852416992, 24.398406982421875, -7.49365234375, 24.331565856933594, 14.08782958984375, 3.8997039794921875, 37.57762908935547, 29.10625457763672, 7.041130065917969, 26.14251708984375, -7.206787109375, 14.591793060302734, -19.98328399658203, 17.157424926757812], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000200.npy"}
{"epoch": 0.2936857562408223, "step": 201, "batch_size": 64, "mean": 14.584449768066406, "std": 17.321977615356445, "min": -26.87774658203125, "p10": -8.047577667236327, "median": 13.955497741699219, "p90": 34.010589599609375, "max": 55.882354736328125, "pos_frac": 0.8125, "sample": [-0.9559249877929688, 13.15958023071289, 14.737655639648438, -26.87774658203125, 32.68771743774414, 31.716476440429688, 4.380731582641602, 2.3024330139160156, 5.264606475830078, 5.478759765625, -11.61004638671875, 29.40948486328125, -9.416015625, 20.008670806884766, 20.9493408203125, 7.024045944213867, 25.04937744140625, 1.8641281127929688, 8.844070434570312, 18.215736389160156, 23.422683715820312, 33.07066345214844, 21.46971893310547, -1.553558349609375, 13.17333984375, 41.81090545654297, -19.395540237426758, 48.88641357421875, -1.4617195129394531, 6.53485107421875, 14.902008056640625, 18.668296813964844, 20.57530975341797, 18.188018798828125, -7.7153778076171875, 17.053466796875, 33.62792205810547, 7.374000549316406, 4.795812606811523, 27.38739013671875, 50.28326416015625, 12.670585632324219, 35.52783203125, 24.059471130371094, 6.192359924316406, 33.448204040527344, -8.189949035644531, -12.344856262207031, 33.656402587890625, 9.613609313964844, 41.477508544921875, 0.31626129150390625, -5.176704406738281, 34.162384033203125, 22.620010375976562, 10.796829223632812, 10.615402221679688, 23.895904541015625, 0.7097053527832031, -17.342254638671875, 55.882354736328125, 11.488906860351562, 29.504180908203125, 16.48968505859375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000201.npy"}
{"epoch": 0.29515418502202645, "step": 202, "batch_size": 64, "mean": 13.013971328735352, "std": 15.355274200439453, "min": -20.426116943359375, "p10": -2.584856033325195, "median": 12.139404296875, "p90": 31.92527008056642, "max": 61.8309326171875, "pos_frac": 0.8125, "sample": [44.78033447265625, -0.93426513671875, -0.031314849853515625, -5.719520568847656, 0.6784152984619141, 28.01642608642578, 39.076629638671875, 0.5404682159423828, 40.101043701171875, 25.9794921875, 15.040691375732422, 14.894248962402344, 9.040863037109375, 15.174057006835938, 1.0519866943359375, 26.628768920898438, 3.2931060791015625, 14.92608642578125, 9.294857025146484, -5.95062255859375, 0.483489990234375, 12.182670593261719, -2.6920700073242188, 27.85888671875, 2.397113800048828, -8.446006774902344, 19.534744262695312, 0.32122230529785156, 10.610689163208008, 18.787567138671875, 3.4783859252929688, 25.70966339111328, 46.446624755859375, -4.401058197021484, 14.932357788085938, -20.426116943359375, 5.714630126953125, -0.0479583740234375, 47.41807556152344, 16.0181884765625, 3.048797607421875, -2.3346900939941406, -5.677560806274414, 28.46564483642578, 6.506767272949219, 21.180957794189453, 15.345954895019531, 17.457290649414062, 12.096138000488281, 61.8309326171875, 2.426605224609375, 18.723722457885742, 5.964515686035156, -0.63970947265625, 7.2810211181640625, 17.21923828125, 18.591236114501953, 1.6656417846679688, 24.636322021484375, 13.08327865600586, 21.423614501953125, 2.8200607299804688, 33.40796661376953, 16.607646942138672], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000202.npy"}
{"epoch": 0.2966226138032305, "step": 203, "batch_size": 64, "mean": 15.303019523620605, "std": 14.342397689819336, "min": -13.800811767578125, "p10": -2.3643199920654276, "median": 17.231014251708984, "p90": 33.29453887939454, "max": 51.07102966308594, "pos_frac": 0.828125, "sample": [-5.507221221923828, -0.42871856689453125, 42.231910705566406, -5.3908843994140625, -11.588508605957031, 33.73841857910156, 1.7085456848144531, 19.855484008789062, 10.357343673706055, 18.780715942382812, 5.345460891723633, 10.099088668823242, 18.868072509765625, 45.46026611328125, 4.288454055786133, 18.379974365234375, 17.27245330810547, 15.97479248046875, 17.762439727783203, -0.4934120178222656, 10.771827697753906, 32.258819580078125, 5.1800384521484375, 9.71261978149414, 35.10546875, 4.652566909790039, 51.07102966308594, 27.8525390625, -12.44146728515625, 13.105384826660156, 30.913963317871094, -3.1661376953125, 27.757125854492188, 31.467918395996094, 34.07768249511719, 23.336196899414062, 45.701148986816406, 19.552688598632812, 12.679107666015625, 5.694145202636719, 10.479682922363281, 13.24456787109375, 18.958267211914062, 17.38287353515625, 7.066596984863281, 17.1895751953125, -0.48114013671875, -9.892261505126953, 10.966838836669922, 8.72457504272461, 21.067214965820312, 18.713096618652344, 18.03635025024414, 10.916658401489258, 24.3128662109375, 1.85736083984375, -0.07633590698242188, 21.594879150390625, -13.800811767578125, 20.552947998046875, 24.462356567382812, 20.992965698242188, 29.121845245361328, 26.005016326904297], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000203.npy"}
{"epoch": 0.29809104258443464, "step": 204, "batch_size": 64, "mean": 12.790034294128418, "std": 15.232439041137695, "min": -19.963150024414062, "p10": -6.295363235473633, "median": 14.986830711364746, "p90": 30.760982513427738, "max": 56.03765869140625, "pos_frac": 0.78125, "sample": [4.795021057128906, 3.0476226806640625, -7.700775146484375, 2.9378089904785156, 19.97555923461914, 17.288034439086914, 33.385955810546875, 31.244140625, 16.785568237304688, 1.2995338439941406, -3.2330093383789062, 14.488037109375, 8.003921508789062, -2.69073486328125, 22.249557495117188, -3.3509674072265625, 3.90338134765625, 29.63361358642578, 33.57110595703125, 15.214040756225586, 56.03765869140625, -10.309806823730469, -6.4643707275390625, -5.901012420654297, 26.047645568847656, 10.860605239868164, 23.899078369140625, 46.94264602661133, 21.153274536132812, 16.629234313964844, 17.336273193359375, -4.4793548583984375, 3.889801025390625, 5.495368957519531, -5.726348876953125, 37.98680877685547, 19.51202392578125, -17.91297149658203, 27.013267517089844, -3.2891845703125, -13.187301635742188, 17.164932250976562, 5.839744567871094, 24.534759521484375, 9.228988647460938, 16.69815444946289, 15.324996948242188, 19.48291778564453, 22.342567443847656, 16.54867935180664, -10.375244140625, 12.889312744140625, 9.900970458984375, 6.572147369384766, 15.516242980957031, 19.6949462890625, -19.963150024414062, 22.36376190185547, 13.782333374023438, 29.34332275390625, 18.70721435546875, 40.65581512451172, 11.168424606323242, 14.759620666503906], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000204.npy"}
{"epoch": 0.29955947136563876, "step": 205, "batch_size": 64, "mean": 16.78203010559082, "std": 13.284283638000488, "min": -8.200874328613281, "p10": -2.3272529602050773, "median": 17.416484832763672, "p90": 32.21205062866211, "max": 56.381500244140625, "pos_frac": 0.84375, "sample": [7.6991424560546875, 22.385543823242188, 16.002639770507812, 14.789764404296875, -0.48299407958984375, 8.705101013183594, -0.9560909271240234, 8.5726318359375, 37.47178649902344, 27.6678466796875, -1.4606857299804688, 49.913818359375, 10.234607696533203, 18.671316146850586, 10.112422943115234, 32.9939079284668, 13.223884582519531, 26.87506103515625, 37.24070358276367, 20.524394989013672, 9.85750961303711, 11.069511413574219, 9.728378295898438, 15.630218505859375, 22.515830993652344, 12.620887756347656, 25.757976531982422, -3.0947914123535156, 31.037139892578125, 26.25251007080078, -6.46049690246582, 18.72991180419922, -3.6196746826171875, 16.759811401367188, 8.624267578125, 4.9891815185546875, 47.7091064453125, 17.25334930419922, 15.885536193847656, 23.835731506347656, -3.272979736328125, 20.131622314453125, 20.325775146484375, 11.23944091796875, 6.8837738037109375, 13.028373718261719, 27.118606567382812, -2.698638916015625, 32.71558380126953, 28.88121795654297, 4.402378082275391, 22.271759033203125, 19.603614807128906, 19.05072784423828, 24.941314697265625, 56.381500244140625, 20.4752197265625, 23.076065063476562, -8.200874328613281, 17.997539520263672, -4.063011169433594, 17.579620361328125, 21.27215576171875, 19.642410278320312], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000205.npy"}
{"epoch": 0.3010279001468429, "step": 206, "batch_size": 64, "mean": 15.318704605102539, "std": 12.675344467163086, "min": -9.417457580566406, "p10": 1.3642032623291018, "median": 14.199518203735352, "p90": 31.890071105957034, "max": 54.66400146484375, "pos_frac": 0.90625, "sample": [13.75653076171875, 2.360931396484375, 8.790916442871094, 29.618240356445312, 27.27630615234375, 27.60723114013672, 19.191787719726562, 5.672599792480469, 6.5358428955078125, 12.314178466796875, 18.601394653320312, -4.099884033203125, 16.58814239501953, 11.095706939697266, -6.3869171142578125, 8.381240844726562, 17.90789031982422, 4.3712921142578125, 1.2320823669433594, 12.304464340209961, 3.4044570922851562, 13.145370483398438, 22.010360717773438, 31.095489501953125, 9.767959594726562, 7.270294189453125, 6.795173645019531, 22.756683349609375, 34.197105407714844, 19.775554656982422, 19.471847534179688, 1.6724853515625, 12.516517639160156, 15.71044921875, 21.934795379638672, 19.094051361083984, 31.5870361328125, 15.276405334472656, 33.39948272705078, 19.882675170898438, -2.6298980712890625, 17.159042358398438, -9.417457580566406, 5.204673767089844, 22.18891143798828, 12.047477722167969, -0.6310348510742188, 19.674753189086914, 2.7416839599609375, 4.508655548095703, -0.4806480407714844, 23.24919891357422, 14.642505645751953, 5.3836822509765625, 47.85270690917969, 34.98478698730469, 8.087696075439453, 22.104061126708984, 54.66400146484375, 32.777801513671875, 4.25390625, 5.3426008224487305, 32.01994323730469, 30.78388214111328], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000206.npy"}
{"epoch": 0.302496328928047, "step": 207, "batch_size": 64, "mean": 15.534026145935059, "std": 18.231779098510742, "min": -17.77447509765625, "p10": -4.9801193237304675, "median": 12.706674575805664, "p90": 34.08270339965821, "max": 74.83480834960938, "pos_frac": 0.84375, "sample": [5.033546447753906, -2.1558303833007812, 21.464988708496094, 14.473289489746094, 19.638755798339844, -12.207473754882812, 49.81462097167969, -3.1203536987304688, -7.5450592041015625, 20.84508514404297, 1.239349365234375, 7.812446594238281, 23.250656127929688, 16.74755096435547, 0.0108184814453125, 28.48800277709961, 29.73784637451172, 36.353492736816406, 3.6204967498779297, 5.592336654663086, 8.866283416748047, 6.923362731933594, -6.394317626953125, 10.289329528808594, 9.06801986694336, -5.578887939453125, 28.011810302734375, 13.713455200195312, -3.5829925537109375, 8.984092712402344, 11.699893951416016, 51.54304504394531, 24.059608459472656, 10.868247985839844, 32.1815185546875, 33.175296783447266, 33.17889404296875, 33.210411071777344, 7.610908508300781, -14.351791381835938, 19.001426696777344, 30.33902359008789, 28.259506225585938, 24.420745849609375, 29.52220916748047, 4.292137145996094, 3.5149383544921875, 17.085311889648438, 34.45654296875, 24.167678833007812, 2.5102996826171875, 23.1463623046875, -11.909912109375, 20.576217651367188, 65.748291015625, -17.77447509765625, 0.6596279144287109, 1.6063079833984375, 0.16850662231445312, 74.83480834960938, 5.59295654296875, 40.22430419921875, 2.158843994140625, 19.005203247070312], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000207.npy"}
{"epoch": 0.3039647577092511, "step": 208, "batch_size": 64, "mean": 16.657718658447266, "std": 15.011775970458984, "min": -33.33015441894531, "p10": 2.0076469421386722, "median": 14.469583511352539, "p90": 34.51478805541993, "max": 55.182891845703125, "pos_frac": 0.921875, "sample": [14.50510025024414, 9.926856994628906, 11.951423645019531, 4.803070068359375, 9.123180389404297, -6.5369720458984375, 32.21586608886719, 3.7243728637695312, 1.8058967590332031, 34.89555358886719, 19.182334899902344, 28.62554168701172, 11.566337585449219, 21.093324661254883, 24.703712463378906, 17.279823303222656, 11.572967529296875, 4.274425506591797, 21.372909545898438, 30.145217895507812, 25.240402221679688, 23.466449737548828, 30.811172485351562, 14.251182556152344, 5.420326232910156, 14.0863037109375, 41.223175048828125, 17.064483642578125, 33.09178161621094, 36.86476135253906, -2.3943939208984375, 5.229217529296875, 7.232231140136719, 11.758384704589844, 31.270801544189453, 40.440208435058594, 3.9468612670898438, 11.188796997070312, 17.00830078125, 10.093727111816406, 55.182891845703125, 0.9519538879394531, 30.17316436767578, 5.9115142822265625, 2.4783973693847656, 33.62633514404297, 21.55216407775879, 41.75421905517578, -33.33015441894531, 33.403533935546875, -9.457122802734375, 2.7093353271484375, 26.034164428710938, 11.353042602539062, 24.492530822753906, 23.530075073242188, 8.643028259277344, 14.434066772460938, 40.76173400878906, 8.996429443359375, 24.833698272705078, 7.107948303222656, 17.738876342773438, -10.312965393066406], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000208.npy"}
{"epoch": 0.3054331864904552, "step": 209, "batch_size": 64, "mean": 14.547401428222656, "std": 15.791078567504883, "min": -20.193321228027344, "p10": -4.596049880981444, "median": 11.686164855957031, "p90": 37.60775146484376, "max": 52.574371337890625, "pos_frac": 0.8125, "sample": [23.063217163085938, 8.028339385986328, 40.26483917236328, 6.091644287109375, 42.973297119140625, 52.574371337890625, 12.967300415039062, -5.49211311340332, 33.7662353515625, 9.808494567871094, 22.077659606933594, 12.731781005859375, 28.695560455322266, 5.8615570068359375, 38.562198638916016, 10.782440185546875, 39.84283447265625, 14.573463439941406, 11.09783935546875, 27.64649200439453, 13.818038940429688, 7.513832092285156, 3.231814384460449, 3.4450302124023438, 8.977699279785156, -20.193321228027344, -8.587240219116211, 25.339797973632812, 16.99140167236328, 7.945587158203125, 32.25401306152344, 30.35051727294922, 7.8843231201171875, -0.4116935729980469, 3.0042495727539062, 38.49951171875, -10.593870162963867, 8.621078491210938, 20.329078674316406, 45.58111572265625, -0.36827850341796875, 3.7122268676757812, 6.84388542175293, 8.711845397949219, -16.188629150390625, 13.267196655273438, 29.332870483398438, -5.077007293701172, 25.243423461914062, 9.015228271484375, 12.274490356445312, 11.014518737792969, 18.027236938476562, -10.990791320800781, 35.5269775390625, 28.753921508789062, 25.675296783447266, 8.528823852539062, 18.402999877929688, -3.47381591796875, 13.145027160644531, 33.72020721435547, -1.9007110595703125, -2.0816726684570312], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000209.npy"}
{"epoch": 0.3069016152716593, "step": 210, "batch_size": 64, "mean": 14.190086364746094, "std": 14.906085014343262, "min": -8.602020263671875, "p10": -1.8846202850341796, "median": 10.474705696105957, "p90": 35.18120613098145, "max": 58.19584655761719, "pos_frac": 0.828125, "sample": [10.880905151367188, 8.411697387695312, 34.31383514404297, 26.11358642578125, 8.56634521484375, 8.356063842773438, 14.063812255859375, 2.00161075592041, 5.474372863769531, -1.9161796569824219, 39.64813995361328, 35.44208908081055, 6.143781661987305, 0.6370010375976562, 31.083599090576172, -8.234874725341797, 6.482666015625, 26.182579040527344, 40.788429260253906, 7.0014801025390625, 34.572479248046875, 10.564931869506836, 3.1552162170410156, 28.26123046875, -5.591239929199219, 2.7934646606445312, 4.735551834106445, -8.602020263671875, 17.967971801757812, 25.85015869140625, 10.384479522705078, 30.286842346191406, -8.138809204101562, 27.946441650390625, 14.063507080078125, 2.7832565307617188, 26.66375732421875, 6.5019378662109375, 10.693920135498047, -1.246255874633789, 14.462261199951172, 8.67095947265625, -1.8109817504882812, -3.7488861083984375, 5.818704605102539, 35.84169006347656, 44.54380798339844, -1.799072265625, 12.587921142578125, -3.461496353149414, 39.249061584472656, 58.19584655761719, 19.449554443359375, 28.504039764404297, 19.854904174804688, 6.1380615234375, 17.355392456054688, -0.8280906677246094, 1.3395414352416992, 9.647247314453125, 13.395751953125, 3.8626022338867188, 13.361984252929688, 32.44697570800781], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000210.npy"}
{"epoch": 0.30837004405286345, "step": 211, "batch_size": 64, "mean": 15.147308349609375, "std": 16.033203125, "min": -18.599578857421875, "p10": -3.7430450439453127, "median": 14.352580070495605, "p90": 35.79385223388672, "max": 72.66098022460938, "pos_frac": 0.8125, "sample": [17.415210723876953, 10.339273452758789, 26.481689453125, -2.8778762817382812, 23.878517150878906, 13.423728942871094, 35.608673095703125, 35.87321472167969, 49.2314453125, 22.224029541015625, 22.420684814453125, -8.246335983276367, 6.645927429199219, 28.880386352539062, 3.7133941650390625, 39.03233337402344, 42.94451904296875, 7.57177734375, 33.12688446044922, -11.637153625488281, 6.292758941650391, -18.599578857421875, 3.24627685546875, 24.034915924072266, 19.808837890625, -7.4652252197265625, -1.3421173095703125, 23.702072143554688, -2.894805908203125, 10.189300537109375, 72.66098022460938, 15.236846923828125, 17.707992553710938, 21.595870971679688, 38.263710021972656, 22.543701171875, 13.73638916015625, 18.505569458007812, 12.173431396484375, 22.183258056640625, 13.967670440673828, 12.012039184570312, 17.71735382080078, 3.7446556091308594, 2.364410400390625, 36.03228759765625, 11.53370475769043, 17.170074462890625, 3.448453903198242, 20.63260269165039, -3.75048828125, -3.725677490234375, -7.6704864501953125, 10.008468627929688, 32.510711669921875, 18.18158721923828, 16.34563446044922, 10.278339385986328, -6.354928970336914, 14.737489700317383, 6.238983154296875, 31.954986572265625, -1.5664138793945312, 5.9858245849609375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000211.npy"}
{"epoch": 0.30983847283406757, "step": 212, "batch_size": 64, "mean": 19.92238998413086, "std": 16.048542022705078, "min": -17.374603271484375, "p10": 1.543497467041016, "median": 19.68808937072754, "p90": 43.90265197753907, "max": 68.24821472167969, "pos_frac": 0.921875, "sample": [16.204513549804688, 5.692115783691406, 68.24821472167969, 48.149078369140625, 19.387561798095703, 21.835487365722656, -2.5869140625, 4.07720947265625, 22.520217895507812, 8.01129150390625, 41.60234069824219, -17.374603271484375, 4.796607971191406, 4.5823822021484375, 12.407196044921875, 23.339561462402344, 30.105194091796875, 47.420677185058594, 48.39990234375, 30.487762451171875, 1.379913330078125, -2.6632461547851562, 24.813446044921875, 24.736328125, 23.347946166992188, 35.439186096191406, 48.152984619140625, 1.9251937866210938, 26.597496032714844, 4.783607482910156, 12.819610595703125, 7.263587951660156, 22.121383666992188, 37.246429443359375, 5.332643508911133, 14.490150451660156, 20.720932006835938, 13.674957275390625, 18.746997833251953, 17.489715576171875, 23.038307189941406, 41.6322021484375, 18.887319564819336, 45.644866943359375, 28.016311645507812, 18.194259643554688, 5.903678894042969, -3.6144065856933594, 37.05023193359375, -5.488094329833984, 19.988616943359375, 25.565948486328125, 22.94488525390625, 10.389892578125, 13.691287994384766, 7.4994659423828125, 27.658370971679688, 30.61626434326172, 20.472381591796875, 27.253963470458984, 1.1054458618164062, 12.529979705810547, 5.4510955810546875, 44.875701904296875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000212.npy"}
{"epoch": 0.31130690161527164, "step": 213, "batch_size": 64, "mean": 16.29867935180664, "std": 14.35468864440918, "min": -17.54852294921875, "p10": -0.8153339385986321, "median": 14.88740348815918, "p90": 33.718256378173834, "max": 55.52528381347656, "pos_frac": 0.859375, "sample": [28.561241149902344, 14.230148315429688, 15.199913024902344, 26.444480895996094, 18.564376831054688, 12.602241516113281, 1.374420166015625, 34.523170471191406, -0.13348388671875, 25.449310302734375, -1.1075553894042969, 22.024169921875, 28.03631591796875, 3.2409229278564453, 25.432086944580078, 28.059917449951172, 0.6916427612304688, -5.055294036865234, 31.430816650390625, 55.52528381347656, 44.902618408203125, 15.369880676269531, 15.167240142822266, 37.22028350830078, 34.10536193847656, 14.607566833496094, 4.993621826171875, -4.906366348266602, 9.515121459960938, -1.1740493774414062, 10.283470153808594, 11.975875854492188, 18.76049041748047, 8.646377563476562, 26.033065795898438, 18.798805236816406, 10.650192260742188, 28.951919555664062, 14.077808380126953, 24.33838653564453, 19.17084503173828, -0.055828094482421875, 40.908409118652344, -3.0682373046875, 1.8687515258789062, 19.478458404541016, 14.388134002685547, 52.639862060546875, -17.54852294921875, 26.862510681152344, 12.945938110351562, 32.81501007080078, 9.080978393554688, 3.759319305419922, 8.370063781738281, -6.623353958129883, 29.280433654785156, 9.336807250976562, 6.181436538696289, 10.260322570800781, 20.694805145263672, 4.902790069580078, 17.932697296142578, 22.122024536132812], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000213.npy"}
{"epoch": 0.31277533039647576, "step": 214, "batch_size": 64, "mean": 14.064952850341797, "std": 18.036895751953125, "min": -19.911602020263672, "p10": -5.948512840270996, "median": 12.947061538696289, "p90": 36.45271301269531, "max": 77.33705139160156, "pos_frac": 0.75, "sample": [18.443275451660156, 44.254066467285156, -1.8511734008789062, -1.5094108581542969, 16.494426727294922, -10.636764526367188, 3.2009429931640625, -12.770133972167969, 22.239547729492188, 14.502616882324219, -6.0657501220703125, 10.428947448730469, 4.865329742431641, -1.385812759399414, 27.340431213378906, 10.5943603515625, -2.1517105102539062, 10.724651336669922, 13.248710632324219, 31.721969604492188, 10.993732452392578, 14.00604248046875, 26.39398956298828, 3.3797149658203125, 23.009319305419922, 23.56217384338379, 31.979347229003906, 6.443689346313477, 10.81512451171875, -14.253665924072266, 24.189762115478516, 29.32628631591797, 14.025161743164062, 45.69517517089844, 9.11126708984375, 42.397674560546875, 52.355350494384766, 13.445526123046875, 17.539894104003906, 14.99749755859375, 12.64541244506836, 0.8330726623535156, 31.929283142089844, 77.33705139160156, -19.911602020263672, 35.612640380859375, 9.887687683105469, 8.628265380859375, 23.337112426757812, -1.796630859375, 19.888641357421875, 20.66234588623047, 7.460752487182617, 2.3061370849609375, -1.4633560180664062, -18.916946411132812, 16.69027328491211, 41.06218719482422, 27.1932373046875, -1.454010009765625, -2.4513397216796875, 36.812744140625, -5.674959182739258, -11.562576293945312], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000214.npy"}
{"epoch": 0.3142437591776799, "step": 215, "batch_size": 64, "mean": 18.109539031982422, "std": 15.855560302734375, "min": -6.473911285400391, "p10": -0.19671192169189453, "median": 16.424800872802734, "p90": 36.95844039916993, "max": 65.51692199707031, "pos_frac": 0.875, "sample": [27.833343505859375, 32.85145568847656, 9.069755554199219, 18.495399475097656, 9.143058776855469, 21.586647033691406, -6.473911285400391, 22.290855407714844, 25.593109130859375, 21.267881393432617, 64.2308349609375, 22.30854034423828, 14.310201644897461, 20.77069091796875, 33.81028747558594, 25.095672607421875, 7.863746643066406, 10.767013549804688, 54.03520202636719, 28.039222717285156, 2.062042236328125, 42.35772705078125, 65.51692199707031, 21.620361328125, 13.48870849609375, -4.9957427978515625, -1.3768043518066406, 21.892044067382812, 28.780174255371094, 18.438575744628906, 4.385643005371094, 11.39764404296875, -6.359199523925781, 2.0667152404785156, 15.089790344238281, 16.694908142089844, -1.1287155151367188, 17.193710327148438, 27.285934448242188, 15.558303833007812, 1.0320281982421875, 19.256088256835938, 46.52716827392578, 37.516571044921875, 34.49259567260742, 7.6812896728515625, 35.65613555908203, 10.816326141357422, 16.154693603515625, 38.370201110839844, 20.482200622558594, 14.202690124511719, 30.84100341796875, 2.2877273559570312, -0.8405303955078125, 5.243896484375, 6.4803314208984375, 32.801414489746094, -0.20281219482421875, 10.560358047485352, -0.1824779510498047, 10.083343505859375, 0.7216873168945312, 6.1708221435546875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000215.npy"}
{"epoch": 0.315712187958884, "step": 216, "batch_size": 64, "mean": 14.697672843933105, "std": 13.772549629211426, "min": -12.582794189453125, "p10": 0.5555122375488283, "median": 12.098572731018066, "p90": 33.312373733520516, "max": 61.32411193847656, "pos_frac": 0.90625, "sample": [16.54261016845703, 26.27459716796875, 2.2013015747070312, 18.9476318359375, 6.1688690185546875, 7.538139343261719, 21.497604370117188, 10.684925079345703, 22.14276885986328, 1.0909271240234375, 21.956573486328125, 0.833160400390625, 25.929031372070312, 3.7990036010742188, 15.315658569335938, 25.050430297851562, -2.800457000732422, 36.735870361328125, 11.396015167236328, -12.582794189453125, 3.205780029296875, 43.10302734375, 7.411487579345703, 23.417755126953125, 35.44068908691406, 15.925811767578125, 25.952133178710938, 31.78350830078125, 42.85626220703125, 5.518775939941406, 24.045433044433594, 18.144336700439453, -0.037878990173339844, 6.751091957092285, 18.438201904296875, 10.753860473632812, 13.317756652832031, 20.765159606933594, 6.596893310546875, 14.125629425048828, 7.501636505126953, 3.526012420654297, -3.5931854248046875, 5.860767364501953, 30.826812744140625, 12.801130294799805, 33.96760177612305, 14.078819274902344, 0.47097015380859375, 22.093368530273438, 6.781585693359375, 8.460350036621094, -3.238391876220703, 6.45672607421875, 5.040271759033203, 61.32411193847656, 17.49483871459961, 9.084922790527344, 41.01258850097656, 29.950164794921875, 1.8503875732421875, 0.752777099609375, 2.6064300537109375, -0.697235107421875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000216.npy"}
{"epoch": 0.31718061674008813, "step": 217, "batch_size": 64, "mean": 13.677059173583984, "std": 15.640470504760742, "min": -13.912189483642578, "p10": -5.65495319366455, "median": 11.691972732543945, "p90": 32.08452606201172, "max": 56.36478042602539, "pos_frac": 0.828125, "sample": [31.947113037109375, 29.382251739501953, 9.607391357421875, -9.876523971557617, 8.45855712890625, 10.833770751953125, 21.493099212646484, 12.434677124023438, 16.051345825195312, 14.47406005859375, 30.926101684570312, 48.23631286621094, 21.13629150390625, 44.14405822753906, 17.969234466552734, -8.643157958984375, 12.243526458740234, -5.063453674316406, 26.205154418945312, 53.40193176269531, -4.230537414550781, 7.104175567626953, 8.669342041015625, 1.6922454833984375, 8.52862548828125, 20.38585662841797, 3.871856689453125, -13.912189483642578, 2.4893798828125, 56.36478042602539, 47.97993469238281, 10.392784118652344, -11.758087158203125, 2.735687255859375, 32.14341735839844, 18.762958526611328, 20.008651733398438, 11.46841049194336, -10.422439575195312, 18.902816772460938, 24.42230224609375, 9.504379272460938, 42.8458251953125, 8.890106201171875, 16.737533569335938, 13.006263732910156, 21.763381958007812, 8.642204284667969, 14.522834777832031, 19.839265823364258, -10.054428100585938, 8.190467834472656, 10.198822021484375, 12.382266998291016, 16.815887451171875, 3.1380844116210938, 3.824981689453125, 8.623952865600586, 4.3988494873046875, -0.2709197998046875, 11.915534973144531, 17.700881958007812, -2.3375930786132812, -5.908452987670898], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000217.npy"}
{"epoch": 0.3186490455212922, "step": 218, "batch_size": 64, "mean": 15.674627304077148, "std": 16.214881896972656, "min": -23.129913330078125, "p10": -4.5466468811035154, "median": 15.932694435119629, "p90": 37.56399612426758, "max": 63.18585205078125, "pos_frac": 0.859375, "sample": [19.04507827758789, 28.259437561035156, 30.762489318847656, 7.645732879638672, 36.298423767089844, 7.898387908935547, 39.86643981933594, 21.938568115234375, 6.611064910888672, -9.011817932128906, 16.817363739013672, 17.418479919433594, -8.459014892578125, 27.5887451171875, 30.194366455078125, 7.6561126708984375, -6.6588287353515625, 7.806598663330078, 16.085708618164062, 16.008522033691406, 16.349884033203125, 23.211286544799805, 16.721519470214844, -23.129913330078125, 6.666839599609375, -4.325370788574219, 23.245986938476562, 29.436439514160156, 38.10638427734375, 40.818870544433594, 20.3049373626709, 16.928070068359375, 35.2642822265625, 13.429489135742188, 23.025146484375, 29.71514129638672, 15.856866836547852, -1.4733352661132812, 10.193180084228516, 11.309249877929688, 18.782855987548828, 6.060070037841797, 23.59084701538086, 9.110633850097656, 7.171173095703125, 0.47939300537109375, 63.18585205078125, 40.01133728027344, -5.033668518066406, 13.946578979492188, 6.053863525390625, 39.020294189453125, 16.262557983398438, 10.092353820800781, 7.193412780761719, 31.109201431274414, 53.290279388427734, 4.6377410888671875, 11.641532897949219, -22.711196899414062, 1.8509140014648438, 3.0884552001953125, -4.6414794921875, 13.556312561035156], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000218.npy"}
{"epoch": 0.3201174743024963, "step": 219, "batch_size": 64, "mean": 14.859960556030273, "std": 16.39613151550293, "min": -29.94342041015625, "p10": -1.3353693962097164, "median": 12.171496391296387, "p90": 38.52925605773926, "max": 58.388248443603516, "pos_frac": 0.875, "sample": [34.746490478515625, 21.61471939086914, 21.58600425720215, 39.196990966796875, 12.366697311401367, 21.3253173828125, -29.94342041015625, 3.1181640625, 4.13330078125, 41.5780029296875, 23.863082885742188, 0.06383895874023438, 13.611900329589844, 42.98785400390625, 51.2618408203125, 8.219646453857422, -0.8841180801391602, 30.533897399902344, 11.976295471191406, 34.03733444213867, 21.585250854492188, 5.52099609375, -2.7497596740722656, 13.642143249511719, 28.269500732421875, 2.1803512573242188, 37.03694152832031, 9.056068420410156, 11.574516296386719, 11.127403259277344, 14.94681167602539, 58.388248443603516, -6.8673095703125, 5.809215545654297, -1.5287628173828125, 18.62127685546875, 23.948699951171875, 9.886428833007812, 8.625381469726562, 47.388702392578125, 21.33538055419922, 0.4716644287109375, 1.7523422241210938, 14.931640625, 39.168819427490234, 14.979961395263672, 6.664764404296875, 0.16424942016601562, 5.428749084472656, 21.45520782470703, -20.460723876953125, 23.66309356689453, 6.27813720703125, -6.057563781738281, -8.032196044921875, 19.590213775634766, 8.59743881225586, 18.086294174194336, 19.77686309814453, 9.773544311523438, 9.373931884765625, 33.0318603515625, 1.2818222045898438, 7.926006317138672], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000219.npy"}
{"epoch": 0.32158590308370044, "step": 220, "batch_size": 64, "mean": 12.971321105957031, "std": 13.557524681091309, "min": -12.963783264160156, "p10": -4.174382400512694, "median": 12.19329833984375, "p90": 30.786670684814453, "max": 56.13685607910156, "pos_frac": 0.84375, "sample": [-12.963783264160156, 7.187702178955078, 9.723377227783203, -4.774944305419922, 31.407054901123047, 30.472900390625, 15.638877868652344, 10.789989471435547, -11.61126708984375, 30.815658569335938, 23.25109100341797, 3.1797943115234375, 22.937355041503906, 13.512081146240234, 24.559688568115234, 20.405654907226562, 12.4959716796875, -7.402189254760742, 30.719032287597656, 30.339641571044922, 56.13685607910156, 4.105869293212891, 37.383087158203125, 22.26275634765625, 0.44788360595703125, 8.951766967773438, 24.799205780029297, 0.7528533935546875, -0.7239837646484375, 6.6975555419921875, 34.45098876953125, 0.016571044921875, 13.649471282958984, 0.267425537109375, 34.646270751953125, 17.814178466796875, -5.0016937255859375, 4.143035888671875, 3.5207977294921875, 14.912857055664062, 2.0653305053710938, 14.519763946533203, 9.485906600952148, -2.7730712890625, 9.7401123046875, -6.643861770629883, 31.768707275390625, 18.732650756835938, 6.304103851318359, 3.0605239868164062, 2.231689453125, 25.157974243164062, 13.527168273925781, 16.38048553466797, -5.658729553222656, 20.207744598388672, 21.64319610595703, 25.618453979492188, 11.890625, 9.682952880859375, 16.23064422607422, -1.2171974182128906, 21.008331298828125, 7.283573150634766], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000220.npy"}
{"epoch": 0.32305433186490456, "step": 221, "batch_size": 64, "mean": 19.033649444580078, "std": 15.764942169189453, "min": -11.384635925292969, "p10": 0.10459899902343905, "median": 16.971359252929688, "p90": 42.0051986694336, "max": 63.563133239746094, "pos_frac": 0.890625, "sample": [2.376972198486328, -7.3425750732421875, -0.551239013671875, 6.39471435546875, 15.179924011230469, 48.125091552734375, 13.820816040039062, 12.4486083984375, 23.28386688232422, 63.563133239746094, 41.96929931640625, -3.6141891479492188, 25.070178985595703, 25.475372314453125, 26.377582550048828, 7.311912536621094, 26.93304443359375, 29.520309448242188, 11.633659362792969, 32.734771728515625, 30.257553100585938, 9.909317016601562, -1.4331436157226562, 20.121788024902344, 22.08460235595703, 47.46448516845703, 31.985862731933594, 16.677581787109375, 7.718578338623047, -3.0339508056640625, 17.957046508789062, 42.02058410644531, 23.011146545410156, 28.26081085205078, 8.37615966796875, 26.044845581054688, 4.725788116455078, 15.638473510742188, 12.508602142333984, 11.161178588867188, 10.466239929199219, 52.991455078125, 43.387657165527344, 7.9237213134765625, 18.453109741210938, 5.683143615722656, 30.334739685058594, 27.158958435058594, 38.020042419433594, 19.4388427734375, 12.902961730957031, -4.612022399902344, 16.514846801757812, 10.916763305664062, 17.26513671875, 5.712005615234375, -11.384635925292969, 26.461837768554688, 6.930225372314453, 1.6348876953125, 22.02001953125, 2.607006072998047, 38.00189208984375, 47.15631103515625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000221.npy"}
{"epoch": 0.3245227606461087, "step": 222, "batch_size": 64, "mean": 16.127126693725586, "std": 18.903045654296875, "min": -29.70037841796875, "p10": -3.7647300720214836, "median": 16.554706573486328, "p90": 35.02169570922852, "max": 76.88426208496094, "pos_frac": 0.84375, "sample": [8.843704223632812, 32.51165771484375, 7.396759033203125, 11.184162139892578, -6.8598785400390625, 20.700023651123047, 2.4122314453125, 31.721923828125, -21.831756591796875, 30.729812622070312, 13.816368103027344, 3.708751678466797, 67.72134399414062, 23.678104400634766, 16.495864868164062, 38.42772674560547, 9.0635986328125, 17.68994140625, 26.8873291015625, 24.71322250366211, 10.543319702148438, 6.226325988769531, -29.70037841796875, 21.709869384765625, 3.7501144409179688, 14.731956481933594, 17.50164031982422, -3.0520782470703125, 20.799636840820312, 10.702484130859375, 19.555389404296875, 6.537929534912109, 22.88166046142578, 4.285427093505859, -18.844772338867188, 5.3882598876953125, -1.182647705078125, 19.047874450683594, 0.3529624938964844, 22.18280029296875, 24.7388916015625, 16.613548278808594, 16.795269012451172, -15.7579345703125, 34.85533142089844, 15.391395568847656, 22.339397430419922, 35.092994689941406, 12.072822570800781, 4.858203887939453, 11.454170227050781, 20.261831283569336, 24.534927368164062, 34.74885940551758, 55.30731201171875, -5.1875762939453125, 41.73262023925781, -4.070152282714844, -2.2567214965820312, 3.4244384765625, 76.88426208496094, 25.52081298828125, 53.0264892578125, 17.326217651367188], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000222.npy"}
{"epoch": 0.32599118942731276, "step": 223, "batch_size": 64, "mean": 13.751843452453613, "std": 15.782407760620117, "min": -18.066991806030273, "p10": -5.201155853271481, "median": 11.909862518310547, "p90": 35.99612884521485, "max": 60.44573974609375, "pos_frac": 0.828125, "sample": [1.4087982177734375, -8.560462951660156, 20.720840454101562, -1.011749267578125, 5.053436279296875, 5.11126708984375, 24.328758239746094, 12.44087028503418, 14.69997787475586, -11.299644470214844, -8.625091552734375, 20.114044189453125, 11.691749572753906, 19.061492919921875, 12.127975463867188, 10.0040283203125, 14.329708099365234, 15.035070419311523, 35.21673583984375, 6.746868133544922, 17.906211853027344, -6.7213592529296875, 14.24094009399414, -18.066991806030273, -8.173561096191406, 37.65723419189453, 5.2405853271484375, 31.184371948242188, 26.857025146484375, 3.2585906982421875, 18.358299255371094, 9.945911407470703, 6.740150451660156, 27.57281494140625, 31.991073608398438, 37.162628173828125, -0.2444305419921875, 10.523147583007812, 11.177253723144531, 60.44573974609375, 39.48912048339844, -16.591800689697266, 16.35781478881836, 36.33015441894531, 35.12138366699219, -1.6540145874023438, 5.749603271484375, 14.526969909667969, 16.82036590576172, 28.975772857666016, 3.442981719970703, -0.6984367370605469, 9.555755615234375, 39.099761962890625, 1.5988883972167969, 43.843231201171875, 1.3271369934082031, 1.673593521118164, 8.530166625976562, 1.19879150390625, 15.135419845581055, 31.421852111816406, 2.870746612548828, 30.3424072265625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000223.npy"}
{"epoch": 0.3274596182085169, "step": 224, "batch_size": 64, "mean": 15.357132911682129, "std": 15.210442543029785, "min": -13.133621215820312, "p10": -3.363623809814453, "median": 13.637889862060547, "p90": 37.03127593994142, "max": 67.13956451416016, "pos_frac": 0.84375, "sample": [9.803886413574219, -13.133621215820312, 15.614334106445312, 8.270431518554688, 1.9729347229003906, 47.796302795410156, 4.121116638183594, 15.140510559082031, 16.988327026367188, 12.599098205566406, 13.631494522094727, 38.64247131347656, 18.801239013671875, 16.07962417602539, 33.271820068359375, -3.8214187622070312, 13.610183715820312, 5.671514511108398, 20.49059295654297, 30.463531494140625, 31.78624725341797, 8.128047943115234, -3.0921783447265625, 16.425777435302734, -4.96142578125, 18.48419952392578, 12.90911865234375, 20.195270538330078, 28.608726501464844, -4.186145782470703, -0.2041015625, -3.4799575805664062, 10.894573211669922, 54.17657470703125, 11.228271484375, -3.9384002685546875, 11.601341247558594, 0.323394775390625, 14.334152221679688, 3.420806884765625, 26.978588104248047, 40.68275451660156, 2.8227996826171875, 3.1551952362060547, 12.146095275878906, 13.644285202026367, -5.579204559326172, 6.473899841308594, 22.290637969970703, 11.872440338134766, 14.456062316894531, 14.0517578125, 42.090057373046875, 22.350326538085938, -1.40692138671875, 7.701896667480469, 15.896888732910156, 17.287689208984375, 13.422813415527344, 26.909011840820312, 22.708049774169922, 42.23487854003906, 67.13956451416016, 14.858318328857422], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000224.npy"}
{"epoch": 0.328928046989721, "step": 225, "batch_size": 64, "mean": 17.880281448364258, "std": 17.803144454956055, "min": -24.49039077758789, "p10": -1.5320655822753901, "median": 16.999595642089844, "p90": 40.98373832702637, "max": 72.11788940429688, "pos_frac": 0.875, "sample": [21.401893615722656, 70.35418701171875, 22.947914123535156, 16.022693634033203, 23.99401092529297, 13.991046905517578, 3.8370361328125, 20.024932861328125, 2.461395263671875, 24.728256225585938, 19.078170776367188, 17.69757843017578, -15.653594970703125, 9.333694458007812, 33.05977249145508, 44.281700134277344, -24.49039077758789, -2.4819374084472656, -1.7575302124023438, 2.6457443237304688, 10.425552368164062, 47.369346618652344, 41.427703857421875, 39.947818756103516, 0.10360527038574219, 2.438629150390625, 53.98915100097656, 25.14948272705078, 28.508319854736328, 19.04751205444336, 14.580131530761719, 8.383956909179688, 5.31927490234375, 12.693550109863281, -5.6209564208984375, 29.4600830078125, 14.710311889648438, 4.9939117431640625, 6.502277374267578, 19.10761260986328, 28.923599243164062, 27.126441955566406, 16.301612854003906, 22.42845916748047, 46.549949645996094, 8.12704086303711, 72.11788940429688, 29.027667999267578, 26.736282348632812, 25.981430053710938, 15.065994262695312, 16.249771118164062, 9.239212036132812, -1.0059814453125, -11.831829071044922, 20.30059814453125, 20.874374389648438, 11.315086364746094, 15.974693298339844, 23.62950897216797, 2.4560890197753906, 22.862422943115234, -7.6823883056640625, 23.556243896484375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000225.npy"}
{"epoch": 0.3303964757709251, "step": 226, "batch_size": 64, "mean": 15.729891777038574, "std": 14.933470726013184, "min": -13.034940719604492, "p10": -2.0168689727783193, "median": 15.515892028808594, "p90": 35.77992935180664, "max": 56.77130126953125, "pos_frac": 0.859375, "sample": [56.77130126953125, 11.768386840820312, -11.023574829101562, 44.66236114501953, 6.698102951049805, -0.58123779296875, 4.055633544921875, 39.30281448364258, 5.5080413818359375, 14.805416107177734, 22.08002471923828, 16.594207763671875, 0.05760765075683594, 15.971334457397461, 10.506385803222656, 4.784517288208008, 17.625953674316406, 3.4279098510742188, 23.257633209228516, 16.916236877441406, 26.426986694335938, 35.03520202636719, 10.049718856811523, 18.376636505126953, 7.2040557861328125, 4.860198974609375, 19.843055725097656, 22.474868774414062, -7.669921875, -12.73077392578125, -2.37213134765625, 20.387901306152344, 15.160406112670898, 22.432662963867188, 15.503219604492188, 14.119354248046875, 34.88533020019531, 14.102828979492188, 19.238739013671875, 3.8654861450195312, 17.019943237304688, 2.5009231567382812, 18.681072235107422, 12.57666015625, 33.7713508605957, -5.7662353515625, 24.430099487304688, 22.976524353027344, -6.366792678833008, 13.381240844726562, 31.83606719970703, 8.608238220214844, 19.48388671875, 15.528564453125, -1.1879234313964844, 36.33610534667969, 19.607208251953125, 4.0939178466796875, 47.10365295410156, 7.7119140625, 33.217735290527344, -13.034940719604492, 43.72186279296875, 36.099098205566406], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000226.npy"}
{"epoch": 0.33186490455212925, "step": 227, "batch_size": 64, "mean": 15.833307266235352, "std": 17.482177734375, "min": -16.937973022460938, "p10": -5.200950622558594, "median": 15.1929931640625, "p90": 42.11606063842773, "max": 60.2352294921875, "pos_frac": 0.796875, "sample": [17.194671630859375, 4.904876708984375, 3.5359344482421875, 31.39655303955078, 21.317398071289062, 12.365421295166016, 22.74826431274414, 19.325298309326172, 24.946212768554688, 10.591819763183594, 9.230133056640625, 40.80815887451172, 2.32196044921875, 2.540224075317383, -5.28277587890625, 21.560928344726562, -6.5237274169921875, 21.692832946777344, 39.602996826171875, 60.2352294921875, 15.41094970703125, 16.147720336914062, 14.97503662109375, 22.085845947265625, 8.814338684082031, -10.954513549804688, 42.14098358154297, -0.5848541259765625, 19.717926025390625, 14.718887329101562, 28.473777770996094, 33.093353271484375, 17.192230224609375, -4.933277130126953, -13.663787841796875, -16.937973022460938, 10.666378021240234, -1.970916748046875, 42.80641174316406, -14.530303955078125, 22.578704833984375, 45.83757019042969, -2.9482421875, 43.93499755859375, 5.059574127197266, 19.224273681640625, 23.951141357421875, 5.312919616699219, 51.28889465332031, 10.692852020263672, 12.586654663085938, 42.05790710449219, 45.33331298828125, -11.504486083984375, -2.6162185668945312, 19.77092933654785, 32.963233947753906, -5.0100250244140625, 3.9978408813476562, 4.957244873046875, 15.885929107666016, 12.775308609008789, 8.485469818115234, 31.5352783203125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000227.npy"}
{"epoch": 0.3333333333333333, "step": 228, "batch_size": 64, "mean": 12.747953414916992, "std": 12.836167335510254, "min": -19.25469970703125, "p10": -1.19861602783203, "median": 14.03472900390625, "p90": 27.578981781005865, "max": 49.7489013671875, "pos_frac": 0.890625, "sample": [1.5999603271484375, -13.620269775390625, 25.197723388671875, 31.403564453125, -12.304122924804688, 30.968551635742188, -5.953910827636719, 6.3853607177734375, 28.056175231933594, 1.522918701171875, 18.121994018554688, 18.654449462890625, 10.443572998046875, 32.539886474609375, 18.07025909423828, 13.788406372070312, 29.706314086914062, -19.25469970703125, 4.558141708374023, 6.304798126220703, -1.7588729858398438, 1.233978271484375, 5.3826141357421875, 25.715763092041016, 15.823501586914062, 8.409957885742188, 9.754619598388672, 0.10865020751953125, 8.13232421875, 2.2347869873046875, 49.7489013671875, 9.523185729980469, 16.290369033813477, 36.65169143676758, 20.668113708496094, 8.442375183105469, 2.8483352661132812, 15.506156921386719, 22.38031768798828, 21.821533203125, 0.4728355407714844, 22.4390869140625, 26.465530395507812, 23.312355041503906, 8.58181381225586, 15.030860900878906, 4.843925476074219, -10.262153625488281, 11.476737976074219, 8.315826416015625, 18.238037109375, 15.627159118652344, -6.699443817138672, 20.057586669921875, 15.888069152832031, 5.161478042602539, 23.088340759277344, 3.058502197265625, 21.03265380859375, 26.328445434570312, 24.454917907714844, 14.963874816894531, 4.6042022705078125, 14.281051635742188], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000228.npy"}
{"epoch": 0.33480176211453744, "step": 229, "batch_size": 64, "mean": 19.76543426513672, "std": 17.677209854125977, "min": -16.257286071777344, "p10": -1.0964111328125, "median": 18.180606842041016, "p90": 44.1922492980957, "max": 69.12523651123047, "pos_frac": 0.859375, "sample": [44.30608367919922, 11.859979629516602, 8.24212646484375, 34.38404846191406, 14.694343566894531, 4.946113586425781, 14.75225830078125, 28.992786407470703, 1.3520050048828125, 23.687896728515625, 41.66497802734375, 41.68205261230469, 23.442302703857422, 51.676788330078125, 49.94929504394531, 33.600563049316406, 23.49761199951172, 14.102859497070312, 19.580429077148438, 43.9266357421875, -5.5665130615234375, 9.379447937011719, 9.546192169189453, 30.044898986816406, 8.398719787597656, 23.264137268066406, 48.13710021972656, 5.063995361328125, 33.792015075683594, 33.27239227294922, 20.72899627685547, 29.723831176757812, 8.409996032714844, 24.582962036132812, 69.12523651123047, 6.177410125732422, 53.20347595214844, 39.97373580932617, -4.8612060546875, 2.21441650390625, 33.925994873046875, -16.257286071777344, -3.0000534057617188, -2.1684341430664062, 17.210548400878906, -1.057830810546875, 30.566970825195312, 34.98140335083008, 13.363739013671875, -0.9055652618408203, -1.2186336517333984, 28.652790069580078, 6.02726936340332, 3.5414161682128906, 14.613758087158203, 19.150665283203125, 5.9397125244140625, 9.089363098144531, 22.10993194580078, 25.45581817626953, -1.112945556640625, 2.202564239501953, 3.7958526611328125, 45.12828063964844], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000229.npy"}
{"epoch": 0.33627019089574156, "step": 230, "batch_size": 64, "mean": 19.17786407470703, "std": 15.140778541564941, "min": -11.289134979248047, "p10": 1.5773853302001954, "median": 16.331168174743652, "p90": 40.76442222595215, "max": 57.280128479003906, "pos_frac": 0.921875, "sample": [9.236900329589844, 47.16804504394531, 8.392681121826172, 26.116348266601562, 0.6165122985839844, 14.907175064086914, 10.687747955322266, 29.256423950195312, 20.57442855834961, 15.628593444824219, 11.136245727539062, 10.7528076171875, -11.015571594238281, 19.462203979492188, 28.005645751953125, 54.584136962890625, -0.4357757568359375, 29.556854248046875, 1.6918792724609375, 15.759723663330078, 48.55174255371094, -11.289134979248047, 3.8613357543945312, 12.136993408203125, 16.41310691833496, 19.018280029296875, 26.343475341796875, 11.737411499023438, 9.99774169921875, 15.894538879394531, 7.724967956542969, -0.09507369995117188, 36.74269104003906, 44.02473449707031, 38.9821891784668, 38.21222686767578, 16.249229431152344, 20.18212127685547, 16.773784637451172, 29.11389923095703, 1.5283164978027344, 20.26299285888672, 28.053848266601562, 20.582847595214844, 13.786727905273438, 7.257270812988281, 41.528236389160156, 29.14006805419922, 43.5765266418457, 14.270469665527344, 7.564212799072266, 31.840225219726562, 11.845710754394531, 2.35302734375, 36.6156005859375, -9.053565979003906, 57.280128479003906, 16.9141845703125, 12.746040344238281, 23.49083709716797, 11.918632507324219, 11.099632263183594, 31.516708374023438, 18.605209350585938], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000230.npy"}
{"epoch": 0.3377386196769457, "step": 231, "batch_size": 64, "mean": 14.49700927734375, "std": 14.05528450012207, "min": -12.4080810546875, "p10": -2.0097810745239255, "median": 12.342323303222656, "p90": 32.77192001342774, "max": 58.41736602783203, "pos_frac": 0.828125, "sample": [58.41736602783203, 11.186935424804688, 31.776981353759766, 38.568145751953125, -3.798412322998047, 21.31097412109375, 26.105018615722656, 15.160385131835938, 1.3007869720458984, 19.06011962890625, 0.7918930053710938, 11.839469909667969, -1.7522411346435547, 19.112281799316406, 53.66986846923828, 18.853958129882812, 21.635883331298828, 25.38275909423828, 12.951457977294922, -3.54034423828125, 16.250091552734375, 22.322633743286133, 9.863113403320312, 9.41025161743164, 8.750167846679688, 17.458614349365234, 23.836578369140625, -6.013023376464844, 17.609346389770508, 26.833724975585938, 39.4381103515625, 3.86676025390625, 10.985794067382812, 11.026912689208984, -2.1292648315429688, 4.986882209777832, -0.25337982177734375, 17.779285430908203, 5.610137939453125, 2.8121337890625, 11.0257568359375, 11.157329559326172, 7.7769775390625, -0.5753822326660156, 25.617156982421875, 12.673511505126953, 14.880683898925781, -1.0996208190917969, -12.4080810546875, 12.135223388671875, 33.19832229614258, 15.942089080810547, 8.523502349853516, 18.024322509765625, 17.74224853515625, 44.208251953125, -5.894508361816406, 10.874122619628906, -2.1201553344726562, 41.176116943359375, 12.549423217773438, 12.045257568359375, 5.985252380371094, 15.892669677734375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000231.npy"}
{"epoch": 0.3392070484581498, "step": 232, "batch_size": 64, "mean": 14.463016510009766, "std": 16.30129051208496, "min": -19.514068603515625, "p10": -6.0167308807373026, "median": 14.03055191040039, "p90": 34.867645263671875, "max": 64.19261169433594, "pos_frac": 0.8125, "sample": [-11.274368286132812, 10.02081298828125, 0.27320098876953125, 13.985855102539062, 10.063980102539062, 23.569854736328125, 45.553680419921875, 16.642364501953125, -16.201690673828125, 40.21381378173828, -0.9920654296875, 9.063392639160156, 41.79998779296875, 8.7659912109375, 1.4673004150390625, -2.2205581665039062, -8.02593994140625, 2.9855880737304688, -1.69512939453125, 14.075248718261719, 20.518829345703125, 34.00830078125, 34.932716369628906, 21.4339599609375, 24.724143981933594, -4.050205230712891, 25.38885498046875, 9.575347900390625, 10.894775390625, 21.355850219726562, 23.42823028564453, 4.609710693359375, -19.514068603515625, 20.487762451171875, -2.031818389892578, 18.985748291015625, 19.775049209594727, 2.9819602966308594, 20.110198974609375, 43.138519287109375, 36.286376953125, 15.005706787109375, 32.37518310546875, 6.796485900878906, 9.906597137451172, 10.565958023071289, 32.029056549072266, 1.7676544189453125, 18.566200256347656, -9.59086799621582, 64.19261169433594, 4.224342346191406, 15.670211791992188, 22.06592559814453, 33.29199981689453, -6.859527587890625, 34.71581268310547, 3.7740516662597656, 10.207328796386719, -7.156646728515625, 0.8030319213867188, 18.316078186035156, 26.592247009277344, 23.262001037597656], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000232.npy"}
{"epoch": 0.3406754772393539, "step": 233, "batch_size": 64, "mean": 19.149642944335938, "std": 18.02028465270996, "min": -15.912971496582031, "p10": -3.1538734436035147, "median": 18.765716552734375, "p90": 41.453620910644545, "max": 64.62330627441406, "pos_frac": 0.828125, "sample": [2.478759765625, 19.513782501220703, 31.4722900390625, -5.01171875, 15.568206787109375, 30.782909393310547, -13.890235900878906, 10.584590911865234, 2.1216278076171875, -0.89459228515625, 15.626434326171875, 9.41402816772461, 19.659080505371094, 17.128036499023438, 64.62330627441406, 23.649269104003906, 14.4791259765625, 2.62408447265625, 34.34104537963867, 28.45307159423828, 34.50513458251953, 32.416709899902344, -6.688652038574219, 55.84288787841797, 22.228248596191406, 12.815521240234375, 18.05453872680664, -1.1768798828125, -7.718437194824219, 23.428573608398438, -0.5267333984375, 35.538352966308594, 8.436111450195312, 51.097381591796875, 19.411643981933594, 58.1121826171875, 3.3189563751220703, 36.764984130859375, -2.1021270751953125, 19.890037536621094, 3.09478759765625, 42.856117248535156, 18.119789123535156, 15.706024169921875, -15.912971496582031, 5.186279296875, -3.6046218872070312, -4.318183898925781, 7.184120178222656, 12.063217163085938, 6.8848114013671875, 27.029457092285156, 27.330764770507812, 31.892974853515625, 36.83039855957031, 20.85748291015625, 14.224376678466797, 29.28382110595703, 21.13106918334961, 56.59321594238281, 43.274566650390625, 38.181129455566406, 34.78948974609375, 20.527572631835938], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000233.npy"}
{"epoch": 0.342143906020558, "step": 234, "batch_size": 64, "mean": 19.561351776123047, "std": 19.38007926940918, "min": -18.428466796875, "p10": -2.4239433288574217, "median": 19.223844528198242, "p90": 50.479721069335945, "max": 65.25373840332031, "pos_frac": 0.859375, "sample": [13.639476776123047, 24.948837280273438, 54.05830383300781, 28.2119140625, 21.995712280273438, 49.11360168457031, 19.66241455078125, 20.870040893554688, 25.522857666015625, 18.708839416503906, 3.055339813232422, 3.5439300537109375, -15.23529052734375, -8.749271392822266, 11.273696899414062, 20.017539978027344, 4.453697204589844, 5.030300140380859, -8.426620483398438, 21.30402374267578, 65.25373840332031, 36.360984802246094, 7.726263046264648, -2.492523193359375, 29.796005249023438, 31.378097534179688, 38.68000030517578, 18.33892822265625, 9.810165405273438, 59.86213684082031, 2.389894485473633, 23.143569946289062, 35.61119079589844, 29.105064392089844, 51.06520080566406, -8.999053955078125, 18.785274505615234, 24.619075775146484, 14.224838256835938, 58.24969482421875, -12.2430419921875, 8.092470169067383, 12.735466003417969, 5.992006301879883, 0.6752433776855469, -0.11114501953125, 22.581809997558594, 7.398551940917969, 24.415557861328125, 21.011703491210938, 33.45829772949219, 53.95048522949219, 53.548980712890625, -2.2639236450195312, 33.244720458984375, 13.993019104003906, 5.297935485839844, 33.00606918334961, 18.45415496826172, -18.428466796875, 41.64182662963867, 0.7053985595703125, 33.83319091796875, 5.028350830078125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000234.npy"}
{"epoch": 0.3436123348017621, "step": 235, "batch_size": 64, "mean": 15.239870071411133, "std": 14.483437538146973, "min": -12.070709228515625, "p10": -1.0915676116943358, "median": 13.843965530395508, "p90": 36.186135864257814, "max": 49.4447021484375, "pos_frac": 0.84375, "sample": [14.836174011230469, 9.305622100830078, -2.9748687744140625, -0.4884681701660156, 1.4558334350585938, 47.46784973144531, 24.99382781982422, 4.966640472412109, 8.196182250976562, 31.980987548828125, 10.25262451171875, 24.340309143066406, 36.48161315917969, 9.609390258789062, 14.546524047851562, -11.857589721679688, 22.24481201171875, -1.1166954040527344, 18.26163101196289, 28.527206420898438, -2.625809669494629, 18.597152709960938, -1.0329360961914062, 14.596900939941406, 23.088027954101562, 41.616180419921875, 18.364097595214844, 19.127792358398438, 12.167098999023438, 25.339157104492188, -12.070709228515625, 13.585731506347656, 8.169952392578125, 17.957916259765625, 0.15012359619140625, 38.472862243652344, 22.57624053955078, -0.3621826171875, 22.1143798828125, 49.4447021484375, 1.0187797546386719, 4.474523544311523, 17.30194091796875, 12.360206604003906, 46.44732666015625, 17.206436157226562, 37.61241149902344, -6.0460357666015625, 9.485389709472656, 34.33496856689453, 32.94386291503906, 32.39482116699219, 7.1557769775390625, 6.508594512939453, 5.891965866088867, 1.4282855987548828, -1.9476547241210938, 15.890975952148438, 4.7656097412109375, 9.399131774902344, 11.131927490234375, 14.10219955444336, 35.49668884277344, 5.687294006347656], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000235.npy"}
{"epoch": 0.34508076358296624, "step": 236, "batch_size": 64, "mean": 18.328739166259766, "std": 21.269718170166016, "min": -15.53475570678711, "p10": -2.503033447265623, "median": 14.415497779846191, "p90": 43.50128479003907, "max": 100.60427856445312, "pos_frac": 0.875, "sample": [22.13800811767578, 3.6088409423828125, 36.36705017089844, 7.824615478515625, 9.778860092163086, 13.823158264160156, 8.61767578125, 8.01361083984375, -3.5544776916503906, -13.87158203125, 69.7000732421875, 84.99298095703125, 16.407535552978516, -3.3395233154296875, 12.908897399902344, 24.434814453125, 5.461734771728516, 0.7902565002441406, 11.290504455566406, 11.020492553710938, 16.45647430419922, 16.112762451171875, 6.907190322875977, 45.92005920410156, 41.239219665527344, 15.154315948486328, 11.990966796875, 100.60427856445312, 32.905517578125, 6.748876571655273, 40.212005615234375, 15.007837295532227, 44.470741271972656, 19.90439224243164, 22.410064697265625, 17.777603149414062, 23.005687713623047, 61.071983337402344, 21.53485870361328, 6.555719375610352, 20.808059692382812, 2.1353302001953125, 2.8498916625976562, -6.015220642089844, -0.5512237548828125, 25.5732421875, 4.883670806884766, 12.768775939941406, 2.532299041748047, -4.162801742553711, 10.75537109375, 40.84700012207031, 6.670799255371094, -15.53475570678711, 21.03179931640625, 21.617523193359375, 51.291351318359375, 15.692302703857422, 3.2805557250976562, -6.3016204833984375, 16.88311767578125, 22.281028747558594, 5.756782531738281, 25.541946411132812], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000236.npy"}
{"epoch": 0.3465491923641703, "step": 237, "batch_size": 64, "mean": 15.88388442993164, "std": 20.03357696533203, "min": -31.42877960205078, "p10": -6.693202972412109, "median": 11.912469863891602, "p90": 41.083866119384766, "max": 66.86714172363281, "pos_frac": 0.796875, "sample": [2.510425567626953, 20.604721069335938, 54.175743103027344, 30.797874450683594, 2.170024871826172, 31.42321014404297, 37.91564178466797, 21.30047607421875, 0.5421981811523438, -6.809028625488281, 8.01211166381836, 23.26892852783203, 37.31047058105469, -2.5820770263671875, 7.776531219482422, 24.58238983154297, 11.593090057373047, -8.15644645690918, 14.312942504882812, 40.65082550048828, 8.49462890625, -2.8683319091796875, 59.547149658203125, 66.5367431640625, 25.832000732421875, 27.068572998046875, -2.194488525390625, 41.26945495605469, 18.84621810913086, 15.760025024414062, 66.86714172363281, 7.573509216308594, 36.94488525390625, 37.60152053833008, 19.192398071289062, 6.53118896484375, 17.342041015625, -6.0042572021484375, 18.354827880859375, 19.363460540771484, 53.597412109375, 2.5320816040039062, 10.697725296020508, 14.4879150390625, 4.363067626953125, -9.692474365234375, -2.020599365234375, 10.173698425292969, -31.42877960205078, 37.110740661621094, 8.553041458129883, 4.972297668457031, -15.381416320800781, 6.9688720703125, 19.810218811035156, 3.4204177856445312, 6.854198455810547, -9.205459594726562, -9.164291381835938, 8.828865051269531, -6.422943115234375, 44.281532287597656, 17.54193115234375, 12.231849670410156], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000237.npy"}
{"epoch": 0.34801762114537443, "step": 238, "batch_size": 64, "mean": 16.033048629760742, "std": 16.922779083251953, "min": -11.703033447265625, "p10": -4.519968795776367, "median": 14.437093734741211, "p90": 38.62709274291993, "max": 72.05708312988281, "pos_frac": 0.859375, "sample": [44.26594543457031, 6.874732971191406, 4.315864562988281, 14.554641723632812, 2.5746994018554688, 18.59526824951172, 5.285980224609375, 48.756805419921875, 0.4589996337890625, 38.91786193847656, 21.40987777709961, 0.4098320007324219, 4.0750885009765625, 13.190071105957031, 18.749191284179688, 2.307201385498047, 36.151458740234375, -7.6175689697265625, -7.531421661376953, 6.099714279174805, 9.453023910522461, 49.0911865234375, 21.67181396484375, 15.281227111816406, -3.086200714111328, 72.05708312988281, 7.205619812011719, 19.547378540039062, 30.576812744140625, 23.338455200195312, 5.8659515380859375, -11.703033447265625, 14.0224609375, 19.961898803710938, 28.478675842285156, 5.756340026855469, -8.693763732910156, 19.084178924560547, 36.63383483886719, 18.24462890625, 7.332759857177734, 43.353759765625, 37.948631286621094, -8.814186096191406, -4.225246429443359, 37.26831817626953, 6.557685852050781, 10.2489013671875, 7.373348236083984, 44.919342041015625, 14.950981140136719, 14.31954574584961, 0.05450439453125, 22.993743896484375, -4.646278381347656, 23.849956512451172, 5.816265106201172, 15.531135559082031, 27.929309844970703, -8.12872314453125, 12.240188598632812, 27.40386199951172, 24.11602783203125, 23.08950424194336], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000238.npy"}
{"epoch": 0.34948604992657856, "step": 239, "batch_size": 64, "mean": 15.163556098937988, "std": 15.636310577392578, "min": -19.035255432128906, "p10": -3.3310569763183593, "median": 15.297775268554688, "p90": 35.59111633300782, "max": 46.30491638183594, "pos_frac": 0.796875, "sample": [44.81835174560547, 16.382476806640625, 7.1117401123046875, 15.32415771484375, 18.587936401367188, 32.2408447265625, 33.454071044921875, 4.533023834228516, -4.912242889404297, 17.58022689819336, 32.699615478515625, 36.79771423339844, -7.524299621582031, 28.252391815185547, 15.271392822265625, -0.3661918640136719, 46.30491638183594, 17.778076171875, -1.8691329956054688, 45.22889709472656, 30.062774658203125, 21.453964233398438, -4.1343994140625, -2.236642837524414, -1.487396240234375, 0.15358352661132812, 8.387016296386719, 4.642646789550781, 29.179946899414062, 5.409454345703125, 32.18476104736328, -2.423004150390625, 21.580596923828125, 9.892852783203125, 36.23396301269531, 9.852397918701172, 2.085174560546875, 23.833511352539062, 28.35485076904297, 3.6853790283203125, -3.2881507873535156, 30.841026306152344, 19.89795684814453, 9.286293029785156, 2.9582366943359375, 9.231746673583984, -19.035255432128906, 7.7185211181640625, -3.349445343017578, 16.979827880859375, 38.573570251464844, 1.5039901733398438, 19.150558471679688, 16.405466079711914, 10.174957275390625, 17.87338638305664, 34.09114074707031, -7.359477996826172, 15.007869720458984, 23.34653091430664, 33.53544616699219, 9.361427307128906, 44.85063171386719, -11.694086074829102], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000239.npy"}
{"epoch": 0.3509544787077827, "step": 240, "batch_size": 64, "mean": 14.376178741455078, "std": 14.550166130065918, "min": -17.71826171875, "p10": -4.104077148437499, "median": 17.155244827270508, "p90": 31.233640289306642, "max": 50.13744354248047, "pos_frac": 0.796875, "sample": [26.84819221496582, 35.7813835144043, 11.060195922851562, 3.5048294067382812, 22.558670043945312, 9.149383544921875, 11.629142761230469, 22.114471435546875, 17.370834350585938, -3.1921424865722656, -3.2541351318359375, 3.6147994995117188, 22.766990661621094, -11.321891784667969, 30.849815368652344, -9.945671081542969, 5.610837936401367, 10.865562438964844, 17.359371185302734, 30.91661834716797, 23.217342376708984, 34.1417236328125, 31.846473693847656, 27.036422729492188, 13.367530822753906, -1.24530029296875, 4.234733581542969, 4.04107666015625, 28.802154541015625, 25.555931091308594, 18.205245971679688, 28.736190795898438, 7.6370391845703125, -6.501251220703125, -17.71826171875, 50.13744354248047, 18.3555908203125, 17.462261199951172, -1.4057464599609375, 16.95111846923828, -0.0389862060546875, 27.13481903076172, 23.54149627685547, 6.860755920410156, 29.07225799560547, 25.00701904296875, 36.50788879394531, -7.204975128173828, 31.3695068359375, -4.4683380126953125, 18.970260620117188, 44.864227294921875, -1.6527786254882812, 5.0251922607421875, 18.285011291503906, 18.572738647460938, 21.873294830322266, 21.920913696289062, 11.005126953125, 3.2644500732421875, 0.8203048706054688, 7.405555725097656, 16.291053771972656, -11.496322631835938], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000240.npy"}
{"epoch": 0.3524229074889868, "step": 241, "batch_size": 64, "mean": 23.0627498626709, "std": 19.968881607055664, "min": -9.356111526489258, "p10": 1.7946578979492194, "median": 23.475460052490234, "p90": 40.55640182495117, "max": 101.82414245605469, "pos_frac": 0.921875, "sample": [44.15406799316406, 13.075920104980469, 4.680023193359375, 12.677627563476562, 1.268310546875, 22.927040100097656, 26.427520751953125, 25.7447509765625, 34.847991943359375, 49.1372184753418, 63.54591369628906, 24.746688842773438, 11.692962646484375, 31.753387451171875, -3.5588645935058594, 24.521366119384766, -5.319889068603516, 17.098663330078125, 34.92013168334961, 26.622230529785156, 1.483245849609375, 26.034271240234375, 10.759674072265625, -4.8938751220703125, 83.90554809570312, 24.023880004882812, 2.899688720703125, 26.620529174804688, 39.660125732421875, 13.089103698730469, 101.82414245605469, 13.156845092773438, 21.085227966308594, 13.822456359863281, 38.42259979248047, 7.607337951660156, 24.755264282226562, 30.928512573242188, 7.073238372802734, 32.278045654296875, 40.764617919921875, 26.643531799316406, 4.000301361083984, 15.538658142089844, 25.486373901367188, 2.5212860107421875, 33.02471160888672, 26.78670883178711, 33.14349365234375, 5.4685211181640625, 13.61663818359375, 32.85652160644531, -1.1911392211914062, 40.07056427001953, 5.636760711669922, 11.267398834228516, 22.348556518554688, 36.914581298828125, 18.25177001953125, 27.999359130859375, 3.5108642578125, 63.22083282470703, -9.356111526489258, 21.992340087890625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000241.npy"}
{"epoch": 0.35389133627019087, "step": 242, "batch_size": 64, "mean": 17.334701538085938, "std": 15.170355796813965, "min": -13.17230224609375, "p10": -3.958333587646484, "median": 15.322498321533203, "p90": 42.36447906494141, "max": 48.2799072265625, "pos_frac": 0.875, "sample": [5.1966094970703125, 14.458118438720703, 0.994049072265625, 33.52998352050781, 10.431404113769531, 28.275299072265625, 23.054611206054688, -7.620513916015625, 22.940773010253906, 21.43683624267578, 2.4487857818603516, 45.11735534667969, 44.043922424316406, -4.200355529785156, 18.388206481933594, 6.938507080078125, 25.51110076904297, 33.75959014892578, 40.67616271972656, 1.7401695251464844, 47.36667251586914, 19.43598175048828, 32.07076644897461, 7.7213897705078125, -5.159954071044922, 28.81116485595703, 22.66547393798828, 8.98590087890625, 14.886241912841797, 21.7576904296875, 16.032371520996094, 10.357200622558594, 11.747245788574219, 12.792922973632812, 10.291030883789062, -7.594047546386719, 16.28753662109375, 11.905410766601562, -6.167930603027344, 6.451873779296875, 11.007396697998047, 9.173660278320312, 32.959739685058594, 48.2799072265625, -3.39361572265625, 14.796371459960938, 43.62908935546875, 43.09130859375, -7.65167236328125, 33.721275329589844, 13.879379272460938, 16.48615264892578, 12.758502960205078, 2.9499359130859375, -13.17230224609375, 6.2003631591796875, 15.75875473022461, 25.784420013427734, 30.077651977539062, 17.919410705566406, 14.428077697753906, 26.555465698242188, 23.328048706054688, 43.088043212890625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000242.npy"}
{"epoch": 0.355359765051395, "step": 243, "batch_size": 64, "mean": 17.512855529785156, "std": 17.505395889282227, "min": -19.09378433227539, "p10": -5.004259490966795, "median": 20.189624786376953, "p90": 35.30860061645508, "max": 55.82733154296875, "pos_frac": 0.8125, "sample": [20.1551513671875, -15.958419799804688, -2.2884063720703125, 29.393112182617188, 16.456199645996094, 29.57819366455078, -3.0681991577148438, -2.079984664916992, 18.08778190612793, 15.349365234375, 49.62811279296875, 9.710929870605469, 50.939849853515625, 20.888587951660156, 8.134811401367188, 35.121055603027344, -7.705841064453125, 22.511825561523438, 0.5210037231445312, 6.694221496582031, -17.049118041992188, 7.8003692626953125, 30.534496307373047, 8.512893676757812, -1.9866161346435547, 52.24562454223633, 23.50530242919922, -18.7188720703125, 28.783016204833984, 15.594833374023438, 44.9813232421875, 21.39190673828125, 15.668357849121094, 27.213729858398438, 11.481147766113281, 55.82733154296875, 33.151397705078125, 3.629180908203125, -5.8339996337890625, 30.64215087890625, 24.977325439453125, 12.950271606445312, 29.866744995117188, 45.50383758544922, 32.394439697265625, 25.330596923828125, -12.341506958007812, 20.224098205566406, 22.82465362548828, -2.1019344329833984, 35.38897705078125, 25.514129638671875, 15.05572509765625, 7.903778076171875, 6.886451721191406, 4.96405029296875, 30.585914611816406, 22.04706573486328, 27.980640411376953, 25.31981658935547, 24.411666870117188, 19.34076690673828, 25.445262908935547, -19.09378433227539], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000243.npy"}
{"epoch": 0.3568281938325991, "step": 244, "batch_size": 64, "mean": 16.991851806640625, "std": 17.610031127929688, "min": -14.384256362915039, "p10": -1.4273300170898429, "median": 15.202524185180664, "p90": 40.87757110595704, "max": 65.46891784667969, "pos_frac": 0.84375, "sample": [23.22442626953125, 12.194435119628906, 8.387733459472656, 19.754241943359375, 0.303497314453125, 12.52115249633789, 29.076148986816406, 28.22101593017578, 39.78269958496094, 39.422698974609375, 17.148174285888672, 1.68988037109375, 19.214187622070312, 16.175010681152344, 2.576995849609375, 27.16387176513672, 1.6300163269042969, 47.892921447753906, 15.752799987792969, 0.6754302978515625, -0.27704620361328125, 7.3597564697265625, 24.402305603027344, 2.2267704010009766, 22.99737548828125, 9.477811813354492, -8.773517608642578, 15.72152328491211, -0.6666183471679688, -5.5471038818359375, 19.388320922851562, -5.671070098876953, 57.20458984375, 14.075569152832031, 41.3468017578125, 8.90020751953125, 5.340049743652344, 16.909385681152344, -0.16677284240722656, 34.61598205566406, 30.259979248046875, -14.384256362915039, -3.9234237670898438, 7.0987701416015625, 6.7961578369140625, -7.3955841064453125, 24.681751251220703, 18.435100555419922, 18.201278686523438, -1.7533493041992188, 42.914817810058594, 53.687129974365234, 14.683525085449219, 8.154190063476562, 1.0701980590820312, 10.48025131225586, 11.68563461303711, 63.32635498046875, 11.601936340332031, 65.46891784667969, 24.58660125732422, 19.70563507080078, 21.858306884765625, 38.56684112548828], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000244.npy"}
{"epoch": 0.35829662261380324, "step": 245, "batch_size": 64, "mean": 19.31477928161621, "std": 15.774532318115234, "min": -8.260696411132812, "p10": -0.3970492362976074, "median": 17.343774795532227, "p90": 41.927273559570324, "max": 52.04893493652344, "pos_frac": 0.875, "sample": [-2.7396926879882812, 31.13311767578125, 30.380264282226562, 10.183967590332031, 16.794357299804688, 3.112934112548828, 5.973735809326172, 22.85498046875, 37.04438781738281, 39.89695739746094, 39.029876708984375, -0.5421257019042969, 23.418785095214844, 46.57173156738281, 9.451499938964844, 46.041587829589844, 24.233787536621094, 22.995437622070312, 27.566200256347656, 1.9660606384277344, 52.04893493652344, 26.349029541015625, 0.7366094589233398, 36.74729919433594, 1.0255041122436523, -0.4073944091796875, 25.232017517089844, -3.6822967529296875, 7.483760833740234, 8.992263793945312, 26.072479248046875, 14.673652648925781, 46.81689453125, 37.75761413574219, 30.383167266845703, 27.772811889648438, 0.909637451171875, 17.893192291259766, 16.154693603515625, 8.291938781738281, 11.702573776245117, -1.3906631469726562, 21.475311279296875, 28.92967987060547, 0.3968658447265625, 11.6343994140625, 31.57172393798828, 45.32139205932617, 10.944557189941406, 44.91184997558594, 34.37913513183594, 4.445747375488281, -1.6833648681640625, 3.044239044189453, -8.260696411132812, 42.79740905761719, 11.271434783935547, 31.54192352294922, 35.44009780883789, 16.120559692382812, 16.508163452148438, 19.637405395507812, 9.15945816040039, -0.3729104995727539], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000245.npy"}
{"epoch": 0.35976505139500736, "step": 246, "batch_size": 64, "mean": 14.936592102050781, "std": 15.910637855529785, "min": -21.392333984375, "p10": -2.73893814086914, "median": 13.005517959594727, "p90": 36.17243270874024, "max": 66.02347564697266, "pos_frac": 0.84375, "sample": [36.86712646484375, 36.266624450683594, 11.173080444335938, 28.062210083007812, 40.6966552734375, 25.450183868408203, 53.73969268798828, 11.17447280883789, -2.9401092529296875, 1.3341026306152344, 35.95265197753906, 19.285015106201172, 17.834062576293945, 10.804759979248047, 3.6617431640625, 26.72193145751953, -21.392333984375, 6.047885894775391, 14.836563110351562, 8.601058959960938, 10.335945129394531, 20.441585540771484, 10.14321517944336, 0.2321319580078125, 24.71063995361328, 18.778839111328125, -2.2695388793945312, -3.000823974609375, 66.02347564697266, -7.400064468383789, -0.8834342956542969, 0.11859130859375, 6.599205017089844, 18.813858032226562, -3.6872634887695312, 25.693817138671875, 6.599336624145508, 8.756084442138672, 38.95977783203125, 3.5241737365722656, 36.487335205078125, 23.372879028320312, 24.87476348876953, -4.126220703125, 9.954601287841797, 21.332443237304688, 0.311614990234375, 31.2177734375, -1.1099977493286133, 24.744049072265625, 33.428138732910156, 18.82561492919922, 16.978469848632812, 3.5920028686523438, 1.9736709594726562, 17.29883575439453, 1.643381118774414, 22.866050720214844, 27.72466278076172, 5.565093994140625, 22.2188720703125, 16.220500946044922, -16.78802490234375, 10.66843032836914], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000246.npy"}
{"epoch": 0.36123348017621143, "step": 247, "batch_size": 64, "mean": 19.322500228881836, "std": 18.510311126708984, "min": -17.608795166015625, "p10": 0.8037725448608404, "median": 16.093809127807617, "p90": 46.21750030517579, "max": 86.92901611328125, "pos_frac": 0.9375, "sample": [0.4757728576660156, 10.4923095703125, 3.780853271484375, 26.684982299804688, 17.36968994140625, 28.36737060546875, 37.19071960449219, 27.149574279785156, 4.064338684082031, 33.26647186279297, 24.82823944091797, 47.57817077636719, 19.5072021484375, 19.653175354003906, 5.743934631347656, 27.24329376220703, 20.7864990234375, 7.0418853759765625, 43.08208465576172, 43.48704528808594, 1.470916748046875, -2.726001739501953, 25.816390991210938, 4.054592132568359, 22.151294708251953, 14.997417449951172, -4.063682556152344, 1.3255081176757812, 24.013782501220703, 5.3561248779296875, 49.43125915527344, 41.42938232421875, 3.01922607421875, 7.05572509765625, 8.446975708007812, 10.096237182617188, 86.92901611328125, 48.095245361328125, 12.261878967285156, -3.7524185180664062, 0.5801715850830078, 35.448516845703125, -17.608795166015625, 18.956817626953125, 2.4563255310058594, 6.595817565917969, 14.694007873535156, 12.703765869140625, 60.81378173828125, 18.780479431152344, 17.190200805664062, 7.917488098144531, 28.104354858398438, 57.619789123535156, 0.3281402587890625, 6.193973541259766, 13.436485290527344, 27.68938446044922, 24.840314865112305, 7.422027587890625, 47.3876953125, 5.386676788330078, 23.589141845703125, 12.9110107421875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000247.npy"}
{"epoch": 0.36270190895741555, "step": 248, "batch_size": 64, "mean": 14.593929290771484, "std": 16.044591903686523, "min": -15.846435546875, "p10": -3.545543670654296, "median": 12.70712661743164, "p90": 36.52227706909181, "max": 55.038612365722656, "pos_frac": 0.828125, "sample": [17.78485870361328, 49.92523193359375, -4.857200622558594, -8.543251037597656, 3.032867431640625, 4.080894470214844, -9.356163024902344, 24.085540771484375, 9.762855529785156, -11.984130859375, -1.6614608764648438, 15.298675537109375, 19.60161590576172, 8.230815887451172, 55.038612365722656, 6.612234115600586, 3.190389633178711, 6.26397705078125, 19.883522033691406, 4.903411865234375, -0.5687713623046875, 29.76901626586914, 7.127838134765625, 46.413909912109375, 12.126937866210938, 0.6939811706542969, 25.59841537475586, 43.23041534423828, 14.743728637695312, 30.919021606445312, -5.5142059326171875, 37.33775329589844, 31.794357299804688, 21.28369140625, 27.13172721862793, 19.150123596191406, -15.846435546875, 27.381683349609375, 5.052421569824219, -2.607677459716797, 12.083160400390625, 15.119842529296875, 14.300819396972656, 39.244346618652344, 13.287315368652344, 7.11668586730957, 32.019187927246094, 52.65934753417969, -3.9029312133789062, 34.61949920654297, 9.361778259277344, 6.863986968994141, 26.867206573486328, 23.35163116455078, 21.429580688476562, 14.743980407714844, 1.1377143859863281, 18.232688903808594, -2.711639404296875, 13.547298431396484, 2.083984375, 7.1491546630859375, 1.2614326477050781, 7.634197235107422], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000248.npy"}
{"epoch": 0.3641703377386197, "step": 249, "batch_size": 64, "mean": 14.382675170898438, "std": 16.93221664428711, "min": -19.531539916992188, "p10": -6.429325103759766, "median": 14.13223648071289, "p90": 38.583096313476574, "max": 48.328857421875, "pos_frac": 0.796875, "sample": [7.136325836181641, 31.11809539794922, -9.218475341796875, -14.63547134399414, 26.183570861816406, 7.642101287841797, 0.187042236328125, 15.781448364257812, -6.58868408203125, 3.9043731689453125, 24.972320556640625, 22.167152404785156, 2.7798709869384766, 29.97888946533203, -6.057487487792969, -19.531539916992188, 7.311969757080078, 48.328857421875, 13.858749389648438, -3.7666664123535156, 48.221832275390625, 19.73987579345703, -13.513336181640625, 19.50238037109375, 6.527168273925781, 14.393814086914062, 10.832721710205078, -1.1700820922851562, 25.791160583496094, 21.25109100341797, -4.730926513671875, 17.67932891845703, 4.100410461425781, 18.112030029296875, 12.357501983642578, 11.0543212890625, 22.681610107421875, 18.7864990234375, 9.111518859863281, 19.831222534179688, 41.666900634765625, 14.763778686523438, 46.31671142578125, -15.043140411376953, -1.80108642578125, 13.603523254394531, 34.36280059814453, 2.9842357635498047, 13.870658874511719, 3.7112693786621094, -1.1019248962402344, 33.2872314453125, 15.019554138183594, 44.326690673828125, 39.93827819824219, 25.591773986816406, 0.3539314270019531, 8.117851257324219, 41.306278228759766, -18.758262634277344, 34.03309631347656, 35.42100524902344, 28.089324951171875, 18.318115234375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000249.npy"}
{"epoch": 0.3656387665198238, "step": 250, "batch_size": 64, "mean": 14.561920166015625, "std": 15.646280288696289, "min": -30.64997100830078, "p10": -4.690089607238769, "median": 15.988601684570312, "p90": 34.32710494995118, "max": 53.47660827636719, "pos_frac": 0.796875, "sample": [32.846595764160156, 26.344310760498047, -4.5586700439453125, -10.295781135559082, -8.701751708984375, -1.597198486328125, 37.293739318847656, 40.786376953125, 10.481613159179688, 22.75664520263672, 20.73196029663086, 16.14935302734375, 23.11980438232422, 5.605682373046875, -1.351165771484375, 5.606439590454102, 32.18373107910156, 23.16986083984375, 34.96160888671875, 26.228515625, 24.573974609375, 10.134445190429688, 18.54461669921875, 24.213367462158203, 18.855358123779297, 25.018245697021484, 19.041988372802734, -1.4270553588867188, 20.452869415283203, -6.6768798828125, 7.57293701171875, 8.261619567871094, -4.420982360839844, 17.160430908203125, 0.9450263977050781, 53.47660827636719, 45.91770935058594, 1.4367752075195312, -1.3708038330078125, 41.879085540771484, 4.0902252197265625, 9.409324645996094, 24.29583740234375, 17.23084259033203, 15.827850341796875, 23.066078186035156, 12.636268615722656, 0.5469474792480469, 4.454902648925781, 14.97894287109375, 23.500030517578125, 8.234405517578125, 21.674291610717773, 42.600860595703125, -30.64997100830078, 4.010059356689453, 16.71839141845703, 28.394317626953125, 12.675399780273438, -5.199394226074219, 14.011016845703125, -4.74641227722168, -6.136810302734375, 24.988426208496094], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000250.npy"}
{"epoch": 0.3671071953010279, "step": 251, "batch_size": 64, "mean": 14.380847930908203, "std": 16.96839714050293, "min": -21.1082763671875, "p10": -8.272480010986328, "median": 14.156705856323242, "p90": 33.15830879211426, "max": 61.173431396484375, "pos_frac": 0.796875, "sample": [10.247711181640625, -2.2973384857177734, 6.0807342529296875, 25.232345581054688, 27.771530151367188, 9.289176940917969, 25.490814208984375, 48.614898681640625, 10.101310729980469, -4.1699371337890625, 61.173431396484375, 7.1899871826171875, 23.84917449951172, -2.8232803344726562, 18.618125915527344, 6.773826599121094, -9.373336791992188, 35.41969299316406, 16.559452056884766, 20.104713439941406, 33.76247787475586, 28.47686767578125, 0.07720184326171875, 8.807754516601562, 31.748580932617188, -8.132102966308594, -8.3326416015625, -18.937679290771484, 41.09959030151367, 21.01787567138672, 13.627243041992188, 2.096893310546875, 26.679443359375, 18.341148376464844, -1.5577583312988281, 26.119224548339844, 22.025299072265625, 8.327190399169922, 4.963863372802734, 41.03437423706055, 26.597900390625, -21.1082763671875, 14.285579681396484, 21.133712768554688, 16.52300262451172, 18.35938262939453, 28.812423706054688, 52.436431884765625, 21.79126739501953, 27.852767944335938, -9.80422592163086, 22.301345825195312, 28.934295654296875, 9.019157409667969, 14.02783203125, 6.307401657104492, 3.3670597076416016, 27.3262939453125, 8.016460418701172, 1.9984664916992188, -16.347503662109375, 6.5888671875, -2.0154590606689453, -11.127830505371094], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000251.npy"}
{"epoch": 0.368575624082232, "step": 252, "batch_size": 64, "mean": 14.227885246276855, "std": 14.598040580749512, "min": -13.587425231933594, "p10": -5.500228118896484, "median": 12.445703506469727, "p90": 34.98490257263184, "max": 52.095184326171875, "pos_frac": 0.828125, "sample": [48.484901428222656, 34.52227783203125, -4.630104064941406, 17.912586212158203, 19.681015014648438, -11.362800598144531, 4.282352447509766, 35.34979248046875, 17.37591552734375, 27.087509155273438, 22.17718505859375, 3.0390968322753906, 35.183170318603516, -6.0525360107421875, 21.681015014648438, 22.730148315429688, 17.46389389038086, 8.883766174316406, 12.856643676757812, 28.702102661132812, 18.173690795898438, -0.165283203125, 15.910320281982422, 11.113288879394531, 7.8345184326171875, 2.7391357421875, 3.089548110961914, 17.42401123046875, 42.50819396972656, 12.03476333618164, 15.629840850830078, 18.954513549804688, 19.38603973388672, 6.670753479003906, 52.095184326171875, -12.15447998046875, 25.37543487548828, -9.754417419433594, 21.477684020996094, 26.12579345703125, -2.849679946899414, -13.587425231933594, 11.1571044921875, 6.272087097167969, -5.873138427734375, 23.665416717529297, 21.89043426513672, 10.273788452148438, 8.057754516601562, 32.79343795776367, 19.781173706054688, -6.155128479003906, 10.853271484375, 25.1070556640625, 7.0843963623046875, 10.048454284667969, 9.535839080810547, 8.09466552734375, 35.23333740234375, 3.815410614013672, -4.377498626708984, 6.764049530029297, 37.880836486816406, 5.282508850097656], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000252.npy"}
{"epoch": 0.3700440528634361, "step": 253, "batch_size": 64, "mean": 14.266160011291504, "std": 14.657346725463867, "min": -25.850555419921875, "p10": -3.317631530761718, "median": 14.367191314697266, "p90": 31.86056823730469, "max": 49.0316162109375, "pos_frac": 0.828125, "sample": [6.833597183227539, -7.0682525634765625, 15.849639892578125, 13.150527954101562, 20.211227416992188, 4.751960754394531, 24.460174560546875, 16.695785522460938, 17.073837280273438, 11.616188049316406, 32.18194580078125, 23.898101806640625, 28.65411376953125, 9.041145324707031, 25.998886108398438, 23.703445434570312, 32.733123779296875, 4.5206146240234375, -4.168487548828125, 4.977996826171875, -17.43612289428711, -12.450897216796875, 24.547714233398438, 23.14764404296875, 5.9412841796875, 6.861001968383789, 41.19200134277344, -0.2678565979003906, 11.3082275390625, 6.401679992675781, -2.87298583984375, 27.787437438964844, 13.680259704589844, 29.318710327148438, 27.953079223632812, 31.110687255859375, 7.207115173339844, 3.9749488830566406, 7.94732666015625, 15.054122924804688, 15.255973815917969, 45.49433898925781, 36.21040344238281, 49.0316162109375, 22.079925537109375, -3.5081939697265625, 18.1170654296875, 6.7773284912109375, -4.316688537597656, 5.153144836425781, 21.969879150390625, 7.378662109375, -25.850555419921875, -1.968099594116211, 17.969940185546875, 7.070041656494141, 17.897125244140625, -2.0160675048828125, 30.49410629272461, 32.5782470703125, 2.2210121154785156, 11.11260986328125, 20.554283142089844, 27.80718994140625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000253.npy"}
{"epoch": 0.37151248164464024, "step": 254, "batch_size": 64, "mean": 13.471290588378906, "std": 14.295844078063965, "min": -21.049970626831055, "p10": -2.5473865509033202, "median": 13.414466857910156, "p90": 31.196575927734376, "max": 48.83466339111328, "pos_frac": 0.8125, "sample": [41.706451416015625, -21.049970626831055, 24.782630920410156, 0.4524097442626953, 17.867708206176758, 30.445693969726562, 2.4033355712890625, -0.19339942932128906, 7.0025787353515625, 9.960929870605469, -2.573780059814453, 14.751121520996094, 7.08111572265625, 20.417510986328125, 5.677665710449219, -8.671890258789062, -2.4858016967773438, 14.267105102539062, 20.58599853515625, 33.623329162597656, -3.8240890502929688, 33.838340759277344, 43.556427001953125, 48.83466339111328, 31.270614624023438, -13.5482177734375, 8.247001647949219, 8.288345336914062, 2.867158889770508, 19.310428619384766, -5.79315185546875, 21.576614379882812, -1.1056671142578125, 11.725830078125, -1.1511421203613281, 29.226058959960938, 21.311660766601562, 31.023818969726562, 19.694740295410156, 14.48220443725586, 0.05490875244140625, 12.56182861328125, 4.641424179077148, 29.403594970703125, -1.48345947265625, 21.701873779296875, 0.2880096435546875, 9.934822082519531, 9.422332763671875, 16.99722671508789, 18.102272033691406, -4.458648681640625, 25.603973388671875, 23.838272094726562, 4.618816375732422, 25.676971435546875, 3.193389892578125, 39.16094207763672, 21.91539764404297, 14.953353881835938, 21.206741333007812, 4.61737060546875, 7.390491485595703, 16.938262939453125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000254.npy"}
{"epoch": 0.37298091042584436, "step": 255, "batch_size": 64, "mean": 16.123796463012695, "std": 15.002100944519043, "min": -11.68580436706543, "p10": -1.4479362487792964, "median": 18.146081924438477, "p90": 34.83368453979493, "max": 62.748050689697266, "pos_frac": 0.859375, "sample": [33.543182373046875, -3.919586181640625, 1.5405960083007812, 24.079132080078125, 19.190414428710938, 7.085693359375, 3.6941757202148438, -7.2342681884765625, 7.4344024658203125, 26.61907196044922, 2.02630615234375, 20.059993743896484, 20.025619506835938, 1.6596641540527344, 23.75537109375, 22.804054260253906, 7.511760711669922, 20.340560913085938, 11.00396728515625, 38.522178649902344, 20.478057861328125, 8.506084442138672, 4.6686859130859375, 18.66463851928711, 17.451507568359375, -1.0380630493164062, -11.363727569580078, 30.32453155517578, 6.979438781738281, -1.62359619140625, 4.9673309326171875, 62.748050689697266, 19.049774169921875, 20.727012634277344, 6.34185791015625, 22.939163208007812, 45.17103576660156, 17.627525329589844, 21.20782470703125, 25.271610260009766, 35.386756896972656, -11.68580436706543, 23.781429290771484, 15.546539306640625, 2.3144683837890625, 19.875614166259766, 28.366546630859375, 24.644332885742188, 14.825874328613281, 23.084156036376953, 38.88319396972656, 8.22021484375, -6.277259826660156, 8.75387954711914, -0.3848381042480469, 29.544540405273438, 38.967254638671875, 32.762001037597656, 5.476066589355469, 16.48595428466797, -10.463325500488281, 25.911659240722656, 3.723175048828125, 45.33954620361328], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000255.npy"}
{"epoch": 0.3744493392070485, "step": 256, "batch_size": 64, "mean": 18.059707641601562, "std": 17.70880126953125, "min": -24.343589782714844, "p10": -0.17880935668945272, "median": 17.159061431884766, "p90": 39.9193775177002, "max": 65.62922668457031, "pos_frac": 0.890625, "sample": [17.070030212402344, 24.58892059326172, 13.47955322265625, 6.1456146240234375, 1.0384445190429688, 31.64032745361328, 26.263721466064453, 40.32024002075195, 3.9529571533203125, 24.848678588867188, 17.248092651367188, 13.090530395507812, 50.49871826171875, 14.20880126953125, 46.91856384277344, -24.343589782714844, 11.407825469970703, 32.925437927246094, 11.665229797363281, 5.233628273010254, 11.69390869140625, -12.242744445800781, 15.715938568115234, 33.97508239746094, 21.036548614501953, 30.949073791503906, 31.151363372802734, 14.05126953125, 11.038238525390625, 17.491348266601562, 26.1492919921875, 15.770584106445312, 10.625831604003906, -14.454849243164062, 4.001434326171875, 35.640384674072266, 6.5012054443359375, 29.103927612304688, -2.3941650390625, 0.21263885498046875, 17.571746826171875, -15.216049194335938, 38.984031677246094, -0.3465728759765625, 26.44439697265625, 3.0571975708007812, 30.68743133544922, 45.91767501831055, 17.88426971435547, -8.20754623413086, 9.186325073242188, 24.74860382080078, 34.64483642578125, 42.76361083984375, 0.558929443359375, 36.067115783691406, 30.933876037597656, 0.4225730895996094, 54.25038146972656, 0.25336456298828125, 21.1011962890625, 2.501131057739258, 21.76555633544922, 65.62922668457031], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000256.npy"}
{"epoch": 0.37591776798825255, "step": 257, "batch_size": 64, "mean": 16.88214874267578, "std": 16.617778778076172, "min": -18.54558563232422, "p10": 0.018435668945313233, "median": 14.376745223999023, "p90": 35.88106918334961, "max": 71.31863403320312, "pos_frac": 0.890625, "sample": [71.31863403320312, 10.497909545898438, 8.9951171875, 14.545421600341797, 22.22785186767578, 9.303321838378906, 33.899452209472656, 48.51863098144531, 23.818954467773438, -18.54558563232422, 6.1005401611328125, 1.0389480590820312, 27.10857391357422, 28.30425262451172, 28.509620666503906, 15.363555908203125, 24.22753143310547, 16.981433868408203, 5.317661285400391, 11.745948791503906, 21.51068878173828, 57.13232421875, 35.79804992675781, -18.163787841796875, 12.500858306884766, 13.925361633300781, -0.2967529296875, 6.0434417724609375, 17.205211639404297, 12.9442138671875, 22.55999755859375, 8.319412231445312, 34.76361846923828, 28.263397216796875, 6.427703857421875, 16.6883544921875, 21.483400344848633, -1.5735054016113281, 31.70313262939453, 0.753875732421875, 24.72365951538086, 5.776374816894531, 13.747184753417969, 10.19271469116211, 17.627334594726562, 12.770401000976562, 4.922569274902344, 8.425382614135742, 40.257293701171875, 35.916648864746094, 16.88936996459961, 4.060874938964844, 6.4321746826171875, 3.3491973876953125, 21.821754455566406, 41.64683532714844, 19.931365966796875, -0.6629390716552734, 14.20806884765625, -11.478727340698242, -12.59649658203125, 26.18767547607422, 8.300132751464844, 50.741920471191406], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000257.npy"}
{"epoch": 0.37738619676945667, "step": 258, "batch_size": 64, "mean": 16.540096282958984, "std": 14.066015243530273, "min": -14.757331848144531, "p10": -0.1721284866333001, "median": 15.812002182006836, "p90": 37.08871421813965, "max": 50.70774841308594, "pos_frac": 0.890625, "sample": [11.169219970703125, 0.47381019592285156, 19.3236083984375, 24.7578125, -14.757331848144531, 23.31988525390625, 29.225357055664062, 28.126373291015625, 10.430572509765625, 4.630180358886719, -1.0025405883789062, 12.268272399902344, -0.5127029418945312, 23.643638610839844, 22.475168228149414, 12.024978637695312, 2.9175453186035156, 4.616798400878906, 40.85894775390625, 28.354324340820312, 16.023277282714844, 22.04124641418457, 12.134063720703125, 21.942298889160156, 50.70774841308594, 26.813522338867188, 19.66412353515625, 9.880622863769531, 8.866889953613281, 25.052631378173828, 37.55168151855469, 9.808151245117188, 42.96607971191406, 18.584548950195312, 3.5954933166503906, 2.501842498779297, 25.629451751708984, -10.726493835449219, 9.426681518554688, 7.1293182373046875, 12.297744750976562, 3.3854942321777344, 33.18565368652344, 26.857879638671875, 4.445636749267578, 4.692573547363281, 42.048370361328125, 21.28533935546875, -2.706388473510742, -0.4489593505859375, 15.745498657226562, 15.87850570678711, 13.528358459472656, 36.00845718383789, 8.809432983398438, 38.41486358642578, -14.6973876953125, 11.860687255859375, 40.744850158691406, 22.542530059814453, 23.712249755859375, 11.60040283203125, 19.092071533203125, 28.34514617919922], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000258.npy"}
{"epoch": 0.3788546255506608, "step": 259, "batch_size": 64, "mean": 14.237401962280273, "std": 17.110197067260742, "min": -30.644004821777344, "p10": -1.7224201202392573, "median": 10.127135276794434, "p90": 32.70484390258789, "max": 73.31681823730469, "pos_frac": 0.875, "sample": [25.006103515625, 3.812084197998047, 31.78887939453125, 3.174896240234375, 15.00936508178711, -12.013118743896484, 2.1563720703125, 73.31681823730469, 37.68182373046875, 30.66834259033203, 17.962942123413086, -24.006622314453125, 9.477485656738281, 6.100494384765625, 9.7664794921875, 3.6761932373046875, 8.823005676269531, 9.748565673828125, 32.808982849121094, 9.136642456054688, 25.49561309814453, 7.79296875, -7.47991943359375, 14.535259246826172, 30.502426147460938, 4.457527160644531, 36.786808013916016, 14.730491638183594, 4.981414794921875, -30.644004821777344, 30.275054931640625, 4.172794342041016, 45.9320068359375, 12.925567626953125, 5.12274169921875, 29.803359985351562, 21.907913208007812, 57.51799011230469, 5.477893829345703, 18.096954345703125, 3.9992904663085938, 12.418510437011719, 14.369338989257812, 26.95733642578125, -6.944023132324219, 4.999969482421875, 23.17498779296875, 25.143051147460938, -6.8208465576171875, 35.00079345703125, 6.522792816162109, 30.87688446044922, 32.46185302734375, 6.606517791748047, 13.10479736328125, -1.9460411071777344, 10.487791061401367, 9.644813537597656, 9.276226043701172, -1.2006378173828125, 3.126678466796875, 1.1960525512695312, 12.263275146484375, 19.987709045410156], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000259.npy"}
{"epoch": 0.3803230543318649, "step": 260, "batch_size": 64, "mean": 13.400418281555176, "std": 17.477108001708984, "min": -23.614856719970703, "p10": -8.903579711914062, "median": 13.334672927856445, "p90": 34.17601776123047, "max": 52.28247833251953, "pos_frac": 0.75, "sample": [16.89240264892578, -5.53900146484375, 17.682334899902344, -15.25058364868164, 7.178256988525391, 34.8314208984375, 27.254642486572266, 14.178314208984375, 5.673065185546875, 7.4916229248046875, 0.3308753967285156, 30.726985931396484, 10.273792266845703, 20.907882690429688, 28.398788452148438, 46.968353271484375, 21.404186248779297, -13.659446716308594, -3.2367401123046875, 27.82935333251953, -23.614856719970703, -6.782890319824219, 17.239376068115234, 32.38096237182617, -5.735450744628906, 5.905853271484375, 16.67548370361328, 27.45800018310547, 6.9661407470703125, 12.402952194213867, 9.86123275756836, 30.845184326171875, 31.456405639648438, 41.704010009765625, 18.363628387451172, -8.318279266357422, 48.129638671875, 52.28247833251953, 4.175556182861328, 12.291763305664062, -11.013206481933594, -3.403453826904297, 9.033740997314453, -13.088706970214844, 17.88229751586914, -7.0020294189453125, -5.74737548828125, -9.154422760009766, 26.677074432373047, 6.250213623046875, 22.48291015625, -0.8330001831054688, -18.616432189941406, 45.283714294433594, 28.3123779296875, 23.505020141601562, 2.9189910888671875, 11.0181884765625, 13.440929412841797, 13.817657470703125, 13.228416442871094, 34.2388916015625, 20.3419132232666, 34.02931213378906], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000260.npy"}
{"epoch": 0.38179148311306904, "step": 261, "batch_size": 64, "mean": 16.377025604248047, "std": 16.034015655517578, "min": -13.446308135986328, "p10": 0.3528686523437502, "median": 13.201143264770508, "p90": 36.56497611999512, "max": 70.6685791015625, "pos_frac": 0.90625, "sample": [20.518310546875, 31.766677856445312, 20.29644775390625, 1.0240478515625, -5.3005828857421875, 70.6685791015625, 4.713260650634766, 19.6614990234375, 2.012857437133789, 3.2132225036621094, 29.48297119140625, 3.990032196044922, 20.11408233642578, 1.821685791015625, 18.23241424560547, 25.48822021484375, 17.929317474365234, 6.9150848388671875, 20.05611801147461, 3.330486297607422, 27.99224090576172, 0.5673675537109375, 13.687885284423828, 7.107177734375, 23.048751831054688, 26.60468292236328, 12.714401245117188, 33.74122619628906, 50.56814193725586, 2.718505859375, 2.8470420837402344, 16.159461975097656, 4.078914642333984, 21.543811798095703, 9.583663940429688, 3.2923011779785156, 34.3405647277832, 11.620025634765625, 37.51829528808594, 7.44757080078125, 59.69459533691406, 10.382110595703125, 8.967208862304688, 22.812240600585938, -13.446308135986328, 39.514305114746094, 12.147705078125, 21.687374114990234, 11.103401184082031, 26.86370849609375, 40.116737365722656, 22.828163146972656, 0.2609405517578125, -1.0352935791015625, -6.3451995849609375, 31.607559204101562, 26.7147216796875, -2.2152976989746094, 10.404716491699219, -7.887351989746094, 5.7802581787109375, 22.45538330078125, 37.65790557861328, 4.94329833984375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000261.npy"}
{"epoch": 0.3832599118942731, "step": 262, "batch_size": 64, "mean": 15.51146125793457, "std": 15.35657024383545, "min": -19.76999282836914, "p10": -1.470408821105956, "median": 14.362196922302246, "p90": 35.685446166992186, "max": 61.795745849609375, "pos_frac": 0.875, "sample": [4.106967926025391, 12.431808471679688, -19.76999282836914, 20.343673706054688, 16.690261840820312, 6.871063232421875, 35.369537353515625, 39.683677673339844, 14.282346725463867, 14.238594055175781, -3.829803466796875, 16.82080841064453, -0.5795726776123047, 24.2021484375, -6.3255462646484375, -5.693115234375, 17.723255157470703, 27.103199005126953, 0.9031200408935547, 37.89117431640625, 15.886035919189453, 17.81221580505371, 6.917095184326172, 16.6866455078125, 13.591552734375, 15.262420654296875, 9.734289169311523, 40.39942932128906, 18.910430908203125, 0.46659088134765625, 3.5382652282714844, 8.13637924194336, 14.442047119140625, 1.744964599609375, 13.92431640625, 3.0091552734375, 13.144371032714844, 23.819427490234375, 35.380157470703125, 5.4270782470703125, 15.640586853027344, 9.258750915527344, 30.531951904296875, 39.578033447265625, 29.85158920288086, 46.4188232421875, 18.155563354492188, -19.206558227539062, 18.86205291748047, 25.954936981201172, -1.8521957397460938, 28.964637756347656, 3.2500534057617188, 35.299320220947266, -4.946254730224609, 35.8162841796875, 30.64258575439453, 22.54479217529297, 4.2132568359375, 8.172622680664062, 12.037734985351562, 9.8443603515625, 1.2083740234375, 61.795745849609375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000262.npy"}
{"epoch": 0.38472834067547723, "step": 263, "batch_size": 64, "mean": 20.975749969482422, "std": 18.008493423461914, "min": -21.0137939453125, "p10": -1.2938583374023427, "median": 21.23251724243164, "p90": 44.31124649047852, "max": 76.52386474609375, "pos_frac": 0.875, "sample": [21.058975219726562, -4.444358825683594, 10.534317016601562, 53.28546142578125, 23.527313232421875, 24.58831024169922, 29.9580078125, 30.144058227539062, 7.670257568359375, 27.816139221191406, 14.578033447265625, 44.7425537109375, 10.740236282348633, 12.529792785644531, 8.48193359375, 21.40605926513672, 34.54014587402344, 33.95599365234375, 9.194091796875, 43.30486297607422, -9.529632568359375, 10.180099487304688, 37.162681579589844, -5.374534606933594, -0.09490966796875, 15.847602844238281, 15.2744140625, 28.117679595947266, -9.247596740722656, 25.484107971191406, 19.322052001953125, 3.2081222534179688, 35.834869384765625, 8.600614547729492, 12.978713989257812, 36.28553771972656, 34.090538024902344, 20.775348663330078, 42.34600067138672, 6.112548828125, 6.195812225341797, 35.22471618652344, 24.566307067871094, 26.07526397705078, 18.994308471679688, 13.552330017089844, 22.689842224121094, 47.32733154296875, 16.855180740356445, 2.5113143920898438, 9.711395263671875, 30.91374969482422, 57.75019836425781, -21.0137939453125, 24.777999877929688, 24.017776489257812, 44.91094970703125, 24.288578033447266, 1.9256439208984375, -1.8076934814453125, 49.07050323486328, 76.52386474609375, 31.41727066040039, -9.017288208007812], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000263.npy"}
{"epoch": 0.38619676945668135, "step": 264, "batch_size": 64, "mean": 17.059368133544922, "std": 17.06623649597168, "min": -19.220417022705078, "p10": -4.716495132446289, "median": 14.796234130859375, "p90": 40.67317047119141, "max": 63.93267059326172, "pos_frac": 0.84375, "sample": [18.215648651123047, 10.960090637207031, 18.79126739501953, 14.3580322265625, 9.405624389648438, 28.402442932128906, 33.409423828125, 63.93267059326172, 13.010993957519531, 25.451675415039062, 28.522350311279297, 10.352981567382812, 7.880090713500977, -10.979759216308594, 16.790363311767578, 5.842838287353516, 56.523773193359375, 8.927337646484375, 32.695560455322266, -12.751754760742188, -8.942626953125, 5.9061126708984375, 17.721420288085938, 18.844118118286133, 12.059219360351562, 27.967247009277344, 43.57154083251953, 7.147026062011719, 9.762359619140625, 12.666519165039062, 4.417793273925781, 6.96923828125, -5.616472244262695, 12.681106567382812, 2.7573394775390625, 22.15869903564453, 8.655982971191406, 15.23443603515625, -3.9208106994628906, 41.22167205810547, 28.020919799804688, -1.6496734619140625, 24.541748046875, -5.794166564941406, 17.710205078125, 8.042938232421875, 37.364112854003906, -2.4727935791015625, 23.627609252929688, 12.558929443359375, 25.863136291503906, -5.057502746582031, 29.374229431152344, 51.323020935058594, 4.840932846069336, 24.49674415588379, 16.280197143554688, 43.586124420166016, 19.42749786376953, 49.29707336425781, 32.20574951171875, 39.393333435058594, 7.0360260009765625, -19.220417022705078], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000264.npy"}
{"epoch": 0.3876651982378855, "step": 265, "batch_size": 64, "mean": 16.98664093017578, "std": 18.551450729370117, "min": -21.803192138671875, "p10": -5.675100708007811, "median": 14.301769256591797, "p90": 43.38597564697266, "max": 61.911651611328125, "pos_frac": 0.84375, "sample": [19.12261199951172, 60.5693359375, 29.793067932128906, 29.454097747802734, 0.4645538330078125, 7.672451019287109, -9.529129028320312, 9.0228271484375, 49.73609924316406, 27.240707397460938, 19.491683959960938, 25.46149444580078, -1.3204765319824219, 1.2860870361328125, 2.0429763793945312, 2.0833969116210938, 11.574882507324219, -6.595951080322266, 12.045543670654297, 22.494606018066406, 6.803749084472656, 53.050262451171875, 21.51709747314453, 61.911651611328125, 15.620101928710938, 56.845947265625, 22.20484161376953, 43.058876037597656, 8.773834228515625, 36.436866760253906, 16.230039596557617, -6.32952880859375, 6.009737014770508, 6.900506973266602, 11.167556762695312, 29.187835693359375, 5.919338226318359, 28.117164611816406, 48.31427001953125, 27.30260467529297, 43.526161193847656, 9.059585571289062, -3.447978973388672, -21.803192138671875, 13.952171325683594, -7.519844055175781, -4.148101806640625, 12.474151611328125, 4.3368682861328125, 22.240097045898438, 32.241607666015625, 8.992446899414062, 38.95170593261719, 29.640274047851562, 4.615531921386719, 14.975515365600586, 33.52388000488281, 21.072593688964844, 13.581344604492188, -14.616081237792969, 17.444141387939453, 1.4444427490234375, -9.197303771972656, 14.6513671875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000265.npy"}
{"epoch": 0.3891336270190896, "step": 266, "batch_size": 64, "mean": 15.071517944335938, "std": 15.398823738098145, "min": -19.077774047851562, "p10": -5.267890548706054, "median": 16.045547485351562, "p90": 33.43430099487305, "max": 60.22380065917969, "pos_frac": 0.78125, "sample": [3.7272186279296875, 23.271041870117188, 16.773277282714844, 25.28917694091797, 23.397926330566406, 12.22808837890625, -0.728515625, 16.12066650390625, 9.874832153320312, 28.04889678955078, 19.828838348388672, 39.66423034667969, 9.551773071289062, 33.861167907714844, -3.7631454467773438, 14.930747985839844, 28.673980712890625, 27.86138916015625, 34.652889251708984, -9.121192932128906, -0.36176300048828125, -6.678127288818359, 17.080612182617188, 32.43827819824219, 18.04113006591797, 15.970428466796875, -12.341812133789062, 9.486400604248047, -3.91595458984375, 38.66865539550781, 20.586013793945312, 23.314491271972656, 60.22380065917969, 24.46112060546875, 28.26065444946289, 8.691421508789062, 31.861968994140625, 44.56309509277344, 9.589286804199219, -11.461441040039062, 21.71649169921875, 9.481689453125, -5.176307678222656, -5.307140350341797, 24.74342155456543, 11.961811065673828, 15.482276916503906, 20.399124145507812, -19.077774047851562, 25.85992431640625, 3.273723602294922, 17.660781860351562, 23.542442321777344, 24.762420654296875, 0.4315185546875, 9.902130126953125, 6.499139785766602, 9.102691650390625, -2.799224853515625, 12.763816833496094, 28.39666748046875, 35.62174987792969, -6.109958648681641, -1.17578125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000266.npy"}
{"epoch": 0.39060205580029367, "step": 267, "batch_size": 64, "mean": 18.470882415771484, "std": 15.915902137756348, "min": -10.719470977783203, "p10": 1.7938144683837896, "median": 18.542333602905273, "p90": 35.70496978759766, "max": 86.09176635742188, "pos_frac": 0.90625, "sample": [24.490631103515625, 34.477149963378906, 2.62994384765625, -10.719470977783203, 17.856552124023438, 22.678836822509766, 25.96613311767578, 20.27783966064453, 28.154281616210938, 2.9640274047851562, 22.550399780273438, 23.75213623046875, -6.6270599365234375, 3.6552391052246094, 31.412452697753906, 5.464931488037109, 24.76116943359375, 19.899749755859375, 36.231178283691406, 86.09176635742188, 4.229269027709961, -6.34405517578125, 28.49987030029297, 1.5865859985351562, 17.311233520507812, 15.169988632202148, 19.692779541015625, -2.744894027709961, 23.084136962890625, 33.93022918701172, 4.87738037109375, 12.0289306640625, 33.698516845703125, 29.890243530273438, 25.814136505126953, 6.349601745605469, 2.2773475646972656, 10.024971008300781, 14.011154174804688, 23.99413299560547, 46.914451599121094, 3.8487701416015625, 9.664085388183594, 22.8306884765625, 14.851470947265625, 3.1139373779296875, 16.140975952148438, 37.424285888671875, 48.960060119628906, 24.06756591796875, 19.22811508178711, 24.470481872558594, -0.539337158203125, 22.397109985351562, 43.503143310546875, 7.930126190185547, 15.09930419921875, -9.2674560546875, 7.828426361083984, 24.766971588134766, 16.64311981201172, 14.38083267211914, 37.001861572265625, 17.52800750732422], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000267.npy"}
{"epoch": 0.3920704845814978, "step": 268, "batch_size": 64, "mean": 16.947551727294922, "std": 15.381264686584473, "min": -15.086334228515625, "p10": -1.4661697387695303, "median": 15.60953140258789, "p90": 35.80980606079102, "max": 59.08636474609375, "pos_frac": 0.859375, "sample": [17.147167205810547, 33.533721923828125, 13.381736755371094, 11.04612922668457, -2.763202667236328, 6.402276992797852, 1.6190643310546875, -0.5699996948242188, 19.7659912109375, 7.7871551513671875, 9.653213500976562, -9.95245361328125, 26.553451538085938, -1.8502426147460938, 33.825313568115234, 26.30487060546875, 11.156661987304688, 9.438274383544922, 13.969375610351562, 20.538944244384766, 9.1812744140625, 24.48560333251953, 19.026382446289062, 37.25726318359375, 38.51786422729492, 35.8692626953125, 31.110443115234375, 59.08636474609375, 1.1127090454101562, 35.67107391357422, 4.357421875, 8.190299987792969, 24.463783264160156, 13.344837188720703, 22.372177124023438, 9.9617919921875, 27.73134994506836, 24.372852325439453, 26.374374389648438, 36.00550842285156, 43.401206970214844, 39.268157958984375, 14.071895599365234, -2.974546432495117, 23.463401794433594, 3.4207916259765625, 8.724384307861328, 28.756515502929688, -4.5774993896484375, 32.72752380371094, 4.2023468017578125, -0.2376708984375, 26.785079956054688, 7.038825988769531, 4.258064270019531, 33.91766357421875, 20.57461929321289, 31.20642852783203, 28.09722900390625, 34.71549606323242, -14.638076782226562, -15.086334228515625, 0.6456108093261719, 1.4021797180175781], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000268.npy"}
{"epoch": 0.3935389133627019, "step": 269, "batch_size": 64, "mean": 18.490314483642578, "std": 16.24837303161621, "min": -11.38882064819336, "p10": -1.4318321228027333, "median": 19.357772827148438, "p90": 37.565606689453126, "max": 69.31044006347656, "pos_frac": 0.875, "sample": [22.635208129882812, 32.432647705078125, 21.060325622558594, 23.20355224609375, -0.3735809326171875, 11.539543151855469, 14.376289367675781, 13.360305786132812, 33.58229064941406, 6.201484680175781, 43.61088562011719, -1.8853683471679688, 24.432388305664062, 35.048828125, 26.96636199951172, 23.256591796875, 38.03507995605469, -2.765869140625, 18.190414428710938, -9.383918762207031, 29.307235717773438, 12.039291381835938, 3.67767333984375, 13.523834228515625, 11.06036376953125, 7.18475341796875, 20.56624984741211, 4.495574951171875, 50.973846435546875, 21.665191650390625, 26.493732452392578, -5.754325866699219, -3.6949920654296875, 28.614227294921875, 3.6598129272460938, 19.23218536376953, 20.054481506347656, 9.471580505371094, 20.758682250976562, 32.5352783203125, -11.38882064819336, 26.21808624267578, 24.82830810546875, 2.3937225341796875, 13.195056915283203, 46.998992919921875, 3.8003997802734375, 1.42608642578125, 40.13328552246094, 14.95904541015625, 8.606964111328125, 7.8726654052734375, 33.52748107910156, 69.31044006347656, 0.468658447265625, 19.483360290527344, 49.94355773925781, 8.317756652832031, 2.2738418579101562, 35.72570037841797, 31.227027893066406, 23.66832733154297, -5.468128204345703, 36.47016906738281], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000269.npy"}
{"epoch": 0.39500734214390604, "step": 270, "batch_size": 64, "mean": 17.19367218017578, "std": 18.864795684814453, "min": -13.512008666992188, "p10": -8.091945648193358, "median": 15.477813720703125, "p90": 39.228212738037115, "max": 65.8818588256836, "pos_frac": 0.828125, "sample": [5.9278106689453125, 37.5902099609375, 45.491607666015625, 4.990009307861328, 7.3195648193359375, -12.43267822265625, 20.3472900390625, 40.582984924316406, 37.137786865234375, 56.276031494140625, 21.24639129638672, -9.812484741210938, 15.632293701171875, 12.911338806152344, 16.122230529785156, 15.323333740234375, 14.269134521484375, 1.6748313903808594, -1.437652587890625, 1.818115234375, 22.72525405883789, 24.80229949951172, 33.376983642578125, 28.486549377441406, 17.55345916748047, -0.5226593017578125, 0.803192138671875, 13.847908020019531, 18.347442626953125, 27.384788513183594, 11.576919555664062, 9.562423706054688, 62.73095703125, -5.773414611816406, 39.930213928222656, 18.135169982910156, 30.194503784179688, 2.2665977478027344, 28.10540771484375, 7.780149459838867, 35.261505126953125, 9.360038757324219, 36.85832214355469, -9.756629943847656, -5.508155822753906, 25.53423309326172, 32.090492248535156, 9.1602783203125, 23.481231689453125, 4.119941711425781, 24.182022094726562, -9.837562561035156, 1.0157928466796875, 0.28260040283203125, 25.15777587890625, 33.88848876953125, -9.085601806640625, -9.098514556884766, -13.512008666992188, 10.700920104980469, 31.085052490234375, 65.8818588256836, 5.780414581298828, 61.0601806640625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000270.npy"}
{"epoch": 0.3964757709251101, "step": 271, "batch_size": 64, "mean": 18.77385139465332, "std": 17.234874725341797, "min": -23.540740966796875, "p10": -1.346133613586424, "median": 19.523598670959473, "p90": 38.15220947265626, "max": 60.9630126953125, "pos_frac": 0.890625, "sample": [19.64459228515625, 29.359329223632812, 4.584709167480469, 19.320133209228516, 6.32818603515625, -2.534076690673828, 26.737045288085938, 60.9630126953125, 23.90491485595703, 22.877479553222656, 10.701133728027344, -23.319671630859375, 55.42115783691406, 22.703086853027344, 16.23885726928711, 26.82209014892578, -9.62176513671875, 47.06890106201172, 11.793609619140625, 6.6620588302612305, -9.423343658447266, 30.88721466064453, 44.6229248046875, 33.62782287597656, 17.953521728515625, 55.923431396484375, 9.969547271728516, 14.919837951660156, 24.090335845947266, 22.26959991455078, -18.22488021850586, 14.722881317138672, 27.134078979492188, 23.514480590820312, 25.8238525390625, 34.92291259765625, 13.811447143554688, 0.32840538024902344, 15.796592712402344, 19.402605056762695, 32.0003662109375, 30.054275512695312, 12.717227935791016, 0.7176380157470703, 6.656040191650391, 8.75269889831543, 2.40936279296875, -2.063793182373047, 34.72209930419922, 30.295135498046875, 39.53619384765625, 41.82538604736328, 21.643508911132812, 6.5857391357421875, 11.984970092773438, -23.540740966796875, 8.614830017089844, 25.98944091796875, 30.278114318847656, 19.969947814941406, 14.516944885253906, 9.319801330566406, 31.067550659179688, 29.7457275390625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000271.npy"}
{"epoch": 0.39794419970631423, "step": 272, "batch_size": 64, "mean": 19.341678619384766, "std": 19.071928024291992, "min": -14.154800415039062, "p10": -2.8329338073730455, "median": 16.240694046020508, "p90": 43.49707107543946, "max": 64.69672393798828, "pos_frac": 0.84375, "sample": [11.267341613769531, -6.381824493408203, 25.255077362060547, -4.771110534667969, 11.419136047363281, 43.659446716308594, 41.354248046875, 64.69672393798828, 48.650108337402344, 25.861923217773438, 5.580574035644531, 20.865341186523438, 5.020160675048828, 1.4493064880371094, -7.925958633422852, 32.082977294921875, 20.714466094970703, 17.772735595703125, 42.30870056152344, 40.84319305419922, -5.8011932373046875, 16.89026641845703, 4.471229553222656, 55.684326171875, -14.154800415039062, -1.3480987548828125, 36.759437561035156, 62.22059631347656, 35.95166015625, 52.355712890625, 30.17099380493164, 12.508621215820312, 3.067455291748047, 4.158721923828125, 16.392868041992188, 13.642814636230469, 40.9520263671875, -13.51361083984375, 28.253211975097656, 14.684879302978516, 40.00432586669922, 27.1197509765625, -3.4692916870117188, 16.088520050048828, 13.925933837890625, 3.0358047485351562, 49.7442626953125, 11.947807312011719, 25.454917907714844, 43.118194580078125, 4.8172454833984375, 41.254730224609375, -1.108428955078125, -0.38214874267578125, 5.384002685546875, 13.563796997070312, 19.470436096191406, 2.5627899169921875, 23.64789581298828, 4.156515121459961, 6.6703643798828125, 17.099117279052734, 32.90740203857422, 7.783843994140625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000272.npy"}
{"epoch": 0.39941262848751835, "step": 273, "batch_size": 64, "mean": 19.769119262695312, "std": 20.53986358642578, "min": -32.96421813964844, "p10": -5.638054275512694, "median": 20.720638275146484, "p90": 46.54782104492189, "max": 67.56566619873047, "pos_frac": 0.859375, "sample": [-3.2348403930664062, 28.365249633789062, -9.514076232910156, 1.9498443603515625, 59.61531066894531, 67.56566619873047, 3.977874755859375, 26.959327697753906, 30.602127075195312, 10.931900024414062, 6.839607238769531, 15.795822143554688, 30.513320922851562, 20.924819946289062, -8.36651611328125, 41.1732063293457, -8.968425750732422, 10.351638793945312, 25.27642059326172, 48.43304443359375, 30.293106079101562, 12.124778747558594, 6.235992431640625, 14.079212188720703, -12.64010238647461, 37.00799560546875, 22.95844268798828, 14.548404693603516, 21.612960815429688, 49.11357116699219, -19.965858459472656, 3.54638671875, 55.18144989013672, 35.60032653808594, 21.654922485351562, 47.812530517578125, 25.775115966796875, 35.35618591308594, 15.935955047607422, 43.596832275390625, 0.6470260620117188, 38.851524353027344, 15.663135528564453, 12.928855895996094, 14.063453674316406, 34.18682861328125, 3.0327301025390625, -4.478580474853516, 5.7014617919921875, -6.134971618652344, 34.883140563964844, 7.30950927734375, 5.779304504394531, 66.4354476928711, 29.6143798828125, 42.92289733886719, 4.201774597167969, 0.7739944458007812, 23.956451416015625, 20.516456604003906, 35.6072998046875, -32.96421813964844, 27.922882080078125, 24.783447265625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000273.npy"}
{"epoch": 0.4008810572687225, "step": 274, "batch_size": 64, "mean": 16.445659637451172, "std": 15.444137573242188, "min": -13.521408081054688, "p10": 0.04262886047363321, "median": 15.162788391113281, "p90": 38.61848297119142, "max": 54.04729461669922, "pos_frac": 0.890625, "sample": [6.554290771484375, 7.731330871582031, 13.119888305664062, -6.671379089355469, 29.632186889648438, 14.884002685546875, 17.233360290527344, 3.9147186279296875, 6.924283981323242, 39.61753845214844, 18.05105972290039, 42.74241638183594, 15.8267822265625, -0.1235198974609375, 10.760009765625, 35.0338134765625, 19.3896484375, 19.22834014892578, -12.083061218261719, 2.007476806640625, 49.690895080566406, 11.989631652832031, 17.425697326660156, 7.2526397705078125, -13.521408081054688, 32.17063903808594, 22.000267028808594, -1.4028472900390625, 10.786018371582031, 27.511566162109375, 7.488248825073242, 36.287353515625, 0.4303092956542969, 24.661056518554688, 4.737407684326172, 31.07440185546875, 48.60237121582031, 22.38530731201172, 53.42108917236328, 16.293865203857422, 54.04729461669922, 16.432437896728516, 3.9129791259765625, 5.265842437744141, -9.989387512207031, 17.434104919433594, 27.200790405273438, 6.351158142089844, 10.174613952636719, 45.747581481933594, 17.19256591796875, -3.2437896728515625, 10.091547012329102, 15.441574096679688, 29.49148178100586, 11.647674560546875, 20.545612335205078, 3.24440860748291, 9.664253234863281, 9.590385437011719, 20.50792694091797, 5.667976379394531, 6.452110290527344, 26.59335708618164], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000274.npy"}
{"epoch": 0.4023494860499266, "step": 275, "batch_size": 64, "mean": 16.654417037963867, "std": 15.381179809570312, "min": -19.212783813476562, "p10": -0.8181869506835936, "median": 16.53258228302002, "p90": 40.66789474487305, "max": 49.64006805419922, "pos_frac": 0.875, "sample": [4.1318817138671875, -4.398792266845703, 6.158973693847656, 5.950817108154297, 13.278533935546875, 25.626541137695312, 32.09675598144531, -0.6682662963867188, 41.43677520751953, 31.48571014404297, 25.396644592285156, 23.98291015625, 42.04278564453125, 11.147478103637695, 2.3399887084960938, 21.419342041015625, 9.641170501708984, 17.695083618164062, -17.056964874267578, -4.6992645263671875, 29.64654541015625, 25.647567749023438, 12.34759521484375, 16.3052978515625, 2.46209716796875, 6.143882751464844, 13.361915588378906, 1.248626708984375, 21.836471557617188, -9.177444458007812, 16.673330307006836, 16.391834259033203, 42.09724426269531, 18.757461547851562, 18.32898712158203, 3.355865478515625, 5.616607666015625, 22.342201232910156, 0.2636985778808594, 11.645111083984375, -19.212783813476562, 5.383148193359375, -0.8824386596679688, 15.605842590332031, 28.60601806640625, 42.701812744140625, 27.285385131835938, 38.87384033203125, 9.022932052612305, 11.710128784179688, 20.90338134765625, -16.079856872558594, 15.947547912597656, 43.0638542175293, 28.069984436035156, 15.220100402832031, 30.540359497070312, 49.64006805419922, 19.41777801513672, 21.573040008544922, 31.56633758544922, 21.44769287109375, 42.0206298828125, 21.154815673828125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000275.npy"}
{"epoch": 0.40381791483113066, "step": 276, "batch_size": 64, "mean": 16.045581817626953, "std": 16.006404876708984, "min": -18.561595916748047, "p10": -2.573135375976562, "median": 15.520875930786133, "p90": 38.69178009033204, "max": 59.559349060058594, "pos_frac": 0.8125, "sample": [17.73961639404297, 39.84527587890625, 29.361644744873047, 21.5556640625, -2.1347808837890625, 9.047914505004883, 22.10065460205078, 36.56524658203125, -0.15002059936523438, 26.473617553710938, 10.142265319824219, -2.7610015869140625, 10.666358947753906, 15.041976928710938, 17.11292266845703, 23.987262725830078, -0.10564041137695312, 3.9766464233398438, 31.43511962890625, 22.79991912841797, 17.56649398803711, -10.605903625488281, 15.29788589477539, 13.8970947265625, 3.0865478515625, 16.202354431152344, 3.1762428283691406, 4.906959533691406, 13.376188278198242, 16.51473617553711, -5.0138092041015625, -7.582244873046875, 21.449684143066406, 11.345108032226562, 39.72941589355469, 32.438758850097656, 14.710769653320312, -0.34039306640625, 1.1761798858642578, 21.555011749267578, 20.496826171875, 2.2492141723632812, -18.561595916748047, 17.044158935546875, 12.501640319824219, -1.343414306640625, 2.2908096313476562, 20.584091186523438, 39.15538024902344, 48.45598602294922, 52.06748962402344, 9.376750946044922, 19.51675033569336, 43.626739501953125, -6.338020324707031, 37.61004638671875, 2.8531036376953125, -3.9907188415527344, 34.11802673339844, 28.58289337158203, 15.743865966796875, 59.559349060058594, 10.987205505371094, 24.743026733398438], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000276.npy"}
{"epoch": 0.4052863436123348, "step": 277, "batch_size": 64, "mean": 16.167617797851562, "std": 12.85084056854248, "min": -9.014450073242188, "p10": -2.956939697265625, "median": 18.75562286376953, "p90": 30.621627426147462, "max": 41.689292907714844, "pos_frac": 0.859375, "sample": [-3.7903213500976562, 25.90093231201172, 27.307525634765625, 30.7982177734375, 30.209583282470703, 9.637821197509766, 4.9959716796875, 17.98321533203125, -2.821075439453125, 15.2392578125, 18.563888549804688, 23.06295394897461, 3.3434219360351562, -6.949378967285156, 38.69886779785156, 28.18670654296875, 7.163578033447266, 3.473724365234375, 27.382036209106445, -3.0617523193359375, 19.848461151123047, -7.118072509765625, 19.01013946533203, 28.023452758789062, 5.657901763916016, 18.947357177734375, 9.645118713378906, 27.225738525390625, 5.647727966308594, 25.17816925048828, 23.03577423095703, 39.724937438964844, 0.6147079467773438, 32.345481872558594, 15.904930114746094, 6.916614532470703, 2.4574356079101562, 27.56387710571289, 12.915302276611328, 21.944244384765625, 14.217208862304688, 24.709148406982422, 33.46989440917969, -3.015167236328125, -0.9319305419921875, 21.571815490722656, 4.835960388183594, 41.689292907714844, 39.24397277832031, 20.882080078125, 19.76031494140625, 25.212356567382812, 25.627079010009766, -9.014450073242188, 21.65648651123047, 12.99578857421875, 2.3439712524414062, 13.804397583007812, 26.399093627929688, 20.033660888671875, 10.559539794921875, 16.772274017333984, -4.6884918212890625, 25.778778076171875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000277.npy"}
{"epoch": 0.4067547723935389, "step": 278, "batch_size": 64, "mean": 16.45389175415039, "std": 17.18553924560547, "min": -29.94835662841797, "p10": -3.5727569580078122, "median": 17.210840225219727, "p90": 39.607706069946296, "max": 54.189056396484375, "pos_frac": 0.796875, "sample": [16.45452880859375, 38.65835189819336, 8.796737670898438, 17.967151641845703, 25.313232421875, 27.6348876953125, 36.78894805908203, -8.117076873779297, 52.57156753540039, 22.957046508789062, -4.976890563964844, 20.052627563476562, 25.727787017822266, 1.350616455078125, 9.64078140258789, 16.214813232421875, -0.9232025146484375, 36.068206787109375, 42.54779052734375, -1.2568359375, 2.0933914184570312, 49.92240905761719, 44.551727294921875, 4.647216796875, -3.6711349487304688, 21.472869873046875, 23.18682098388672, 1.0266742706298828, 7.735191345214844, 40.01457214355469, 54.189056396484375, 27.399532318115234, 27.27191162109375, -29.94835662841797, 29.306610107421875, 2.1588897705078125, 21.481117248535156, 30.5291748046875, -3.3432083129882812, 22.227378845214844, 23.573440551757812, 13.352046966552734, 6.380376815795898, 11.603012084960938, -0.7109832763671875, 26.9285888671875, 1.2789688110351562, -4.585502624511719, 5.9439697265625, 24.301498413085938, 23.7095947265625, 26.693607330322266, 7.420703887939453, 4.625556945800781, 44.73210906982422, 34.750144958496094, -0.3587188720703125, 10.269638061523438, -3.8122100830078125, 5.566780090332031, 24.749847412109375, -7.625213623046875, 21.191513061523438, -2.6525497436523438], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000278.npy"}
{"epoch": 0.40822320117474303, "step": 279, "batch_size": 64, "mean": 15.111471176147461, "std": 16.85564422607422, "min": -22.44707489013672, "p10": -3.2769935607910154, "median": 13.468935012817383, "p90": 40.30500793457031, "max": 66.28756713867188, "pos_frac": 0.8125, "sample": [6.082569122314453, 13.514392852783203, -3.4314041137695312, -7.43023681640625, -2.6055908203125, 7.442901611328125, 13.646438598632812, 6.4952392578125, 2.3466567993164062, -0.33991241455078125, 27.403160095214844, 29.821434020996094, 32.973392486572266, 19.325687408447266, 45.33411407470703, 1.4482269287109375, -1.7615852355957031, 10.248176574707031, 30.886253356933594, -22.44707489013672, 20.030364990234375, 42.27234649658203, 16.364582061767578, 16.177711486816406, 15.441296577453613, 17.092350006103516, 36.84419250488281, 24.901222229003906, 13.539192199707031, 41.03459930419922, -6.857940673828125, 13.695831298828125, 26.284194946289062, 47.89747619628906, 15.619049072265625, 40.50761413574219, 2.882659912109375, 66.28756713867188, 12.4434814453125, 28.987621307373047, 5.006797790527344, -4.5079193115234375, 12.619060516357422, 4.0724945068359375, -2.294445037841797, 27.1302490234375, 2.7693023681640625, 9.02426528930664, 33.88018798828125, 10.769638061523438, 13.423477172851562, 19.130516052246094, 48.681854248046875, -2.9167022705078125, 39.83226013183594, 3.6156692504882812, 5.734588623046875, 4.780662536621094, -9.677927017211914, 1.9589004516601562, 21.129226684570312, 6.37652587890625, -3.834125518798828, 20.0313720703125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000279.npy"}
{"epoch": 0.40969162995594716, "step": 280, "batch_size": 64, "mean": 19.829395294189453, "std": 16.68490982055664, "min": -13.605266571044922, "p10": -1.224846267700195, "median": 18.638484954833984, "p90": 40.56291046142578, "max": 55.49562072753906, "pos_frac": 0.859375, "sample": [14.027069091796875, 36.7764892578125, -13.605266571044922, 42.82575988769531, -5.552726745605469, -8.067817687988281, 23.89289093017578, 14.353958129882812, 40.78276062011719, 47.363922119140625, 5.070667266845703, 1.4576187133789062, 35.02009582519531, 23.831871032714844, 19.72667694091797, 33.22410583496094, 16.920034408569336, 54.690345764160156, 29.8564453125, 0.8426055908203125, -0.3782329559326172, 26.48636245727539, 39.480438232421875, 6.742343902587891, 34.69436264038086, 29.50943374633789, 21.761207580566406, 31.873703002929688, -1.3555755615234375, 31.558815002441406, 55.40040588378906, 33.60975646972656, 17.4434814453125, 44.72290802001953, 40.0499267578125, -2.437711715698242, 12.985504150390625, 55.49562072753906, 11.643447875976562, 35.079444885253906, 25.36029052734375, 14.815479278564453, -6.3300018310546875, -6.522247314453125, -0.9198112487792969, 13.242828369140625, 10.587398529052734, 17.55029296875, 3.762115478515625, 37.16790771484375, 13.110343933105469, 24.264312744140625, 13.651657104492188, 2.3427047729492188, 25.033740997314453, 11.213432312011719, 15.133544921875, 30.203880310058594, 2.3430023193359375, 27.33319091796875, 20.49367332458496, 8.324234008789062, 21.800796508789062, 7.3153839111328125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000280.npy"}
{"epoch": 0.4111600587371512, "step": 281, "batch_size": 64, "mean": 20.291027069091797, "std": 19.107872009277344, "min": -23.071685791015625, "p10": -0.9095291137695297, "median": 16.23461151123047, "p90": 46.862694549560565, "max": 74.15115356445312, "pos_frac": 0.890625, "sample": [-6.657310485839844, 63.96763610839844, 10.108341217041016, 31.434009552001953, 15.55331039428711, 8.409591674804688, 41.99266052246094, 13.270668029785156, 13.594398498535156, 11.96158218383789, 30.1612548828125, 35.3809814453125, 15.196060180664062, 4.552482604980469, 14.319122314453125, -23.071685791015625, 31.76504135131836, 5.7042083740234375, 13.736076354980469, 16.329360961914062, 25.13066864013672, 37.469970703125, 19.647377014160156, 16.828659057617188, 6.518585205078125, 53.71942138671875, 27.670745849609375, 33.458866119384766, 16.139862060546875, 51.281471252441406, 33.77189636230469, 13.196624755859375, -2.3147010803222656, 37.43763732910156, 18.828659057617188, 20.208755493164062, 14.429153442382812, 24.91718292236328, 36.168373107910156, 1.185699462890625, 48.949851989746094, 16.997329711914062, 13.25396728515625, 39.15453338623047, 53.93175506591797, 15.5350341796875, 14.924301147460938, 27.43035125732422, 74.15115356445312, 22.375211715698242, -1.5726776123046875, 4.8888702392578125, -9.30609130859375, 7.09417724609375, 0.92242431640625, 17.18018341064453, 6.937551498413086, 61.22196960449219, -7.974571228027344, 29.832210540771484, 12.811664581298828, -16.7235107421875, 0.6378173828125, 32.569488525390625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000281.npy"}
{"epoch": 0.41262848751835535, "step": 282, "batch_size": 64, "mean": 20.987438201904297, "std": 16.466598510742188, "min": -0.49179649353027344, "p10": 5.431135368347168, "median": 18.358449935913086, "p90": 43.39593353271484, "max": 81.28326416015625, "pos_frac": 0.984375, "sample": [42.18296813964844, 5.343881607055664, 15.52773666381836, 57.310699462890625, 8.480270385742188, 20.771949768066406, 45.58443069458008, 33.07894515991211, 0.10797882080078125, 35.36151123046875, 44.113433837890625, 42.141632080078125, 31.344770431518555, -0.49179649353027344, 21.61651611328125, 2.9734725952148438, 81.28326416015625, 52.751502990722656, 2.357269287109375, 0.921661376953125, 23.382537841796875, 42.247894287109375, 12.212604522705078, 6.409427642822266, 32.60963439941406, 44.268402099609375, 24.96350860595703, 33.794944763183594, 10.877975463867188, 20.273231506347656, 23.16143798828125, 10.194198608398438, 6.046241760253906, 5.674478530883789, 9.257598876953125, 43.028961181640625, 32.069488525390625, 13.286758422851562, 16.4459228515625, 18.51865005493164, 9.97723388671875, 8.260505676269531, 9.171966552734375, 43.55320739746094, 22.300674438476562, 18.19824981689453, 7.313690185546875, 8.649574279785156, 5.9988555908203125, 19.56396484375, 6.283660888671875, 9.324493408203125, 24.81536865234375, 22.943376541137695, 16.890548706054688, 21.692337036132812, 23.41259765625, 20.466289520263672, 9.416007995605469, 40.799591064453125, 5.634727478027344, 6.178718566894531, 3.846599578857422, 7.0178070068359375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000282.npy"}
{"epoch": 0.41409691629955947, "step": 283, "batch_size": 64, "mean": 16.220184326171875, "std": 19.65955352783203, "min": -31.443511962890625, "p10": -5.80026626586914, "median": 13.927932739257812, "p90": 45.388995361328135, "max": 61.706695556640625, "pos_frac": 0.78125, "sample": [1.6179580688476562, 5.513114929199219, 31.40375518798828, 5.131570816040039, 11.185653686523438, 52.481056213378906, 19.252197265625, 9.831840515136719, 43.18946838378906, -4.105129241943359, -4.2181854248046875, 20.234268188476562, 0.797821044921875, 6.0867767333984375, -8.525352478027344, 12.109939575195312, 14.889999389648438, 22.603469848632812, -14.00885009765625, 3.499380111694336, 12.965866088867188, 27.847320556640625, -15.710044860839844, 49.719757080078125, 8.356781005859375, 26.795562744140625, 5.175773620605469, 51.96559143066406, 19.339279174804688, 61.706695556640625, 3.89166259765625, 12.227767944335938, 41.0526008605957, 28.065704345703125, 5.586265563964844, 27.688743591308594, -6.109405517578125, 23.087158203125, 4.5888519287109375, -0.3705291748046875, -5.078941345214844, 46.63105773925781, 30.19184112548828, -20.11370086669922, -1.2332344055175781, 34.68439483642578, 31.141159057617188, 30.373374938964844, 1.568634033203125, -10.108367919921875, 46.33164978027344, 18.439788818359375, 46.91725158691406, 11.366127014160156, 22.541351318359375, 20.07000732421875, 39.972896575927734, -1.6662445068359375, 32.78289794921875, 27.525474548339844, 16.641448974609375, 33.81422424316406, -31.443511962890625, -0.09997177124023438], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000283.npy"}
{"epoch": 0.4155653450807636, "step": 284, "batch_size": 64, "mean": 21.845355987548828, "std": 16.362932205200195, "min": -7.196220397949219, "p10": 6.475268173217773, "median": 19.39855194091797, "p90": 43.65651855468752, "max": 81.92535400390625, "pos_frac": 0.953125, "sample": [33.820953369140625, 19.27416229248047, 20.360443115234375, 22.158729553222656, 10.486366271972656, 6.455493927001953, 18.92719268798828, 9.106006622314453, 27.22156524658203, 33.76927185058594, 14.505424499511719, 17.881301879882812, 25.857574462890625, 6.888664245605469, 17.468666076660156, 1.9528865814208984, -7.196220397949219, 47.49689483642578, 31.974048614501953, 20.954551696777344, 13.411735534667969, 60.99623107910156, 6.5214080810546875, 49.121864318847656, 27.054370880126953, 12.805671691894531, 26.874046325683594, 17.411514282226562, 24.16455078125, -3.667572021484375, 8.763824462890625, 81.92535400390625, 16.986282348632812, 14.805072784423828, 15.854278564453125, 28.552932739257812, 24.379898071289062, 4.734039306640625, 22.757692337036133, 70.2437744140625, 47.14287567138672, 20.262237548828125, 6.605987548828125, 19.109527587890625, 22.438697814941406, 7.0768280029296875, 15.39581298828125, 19.52294158935547, 0.0052490234375, 33.333045959472656, 19.16705322265625, 23.847244262695312, 20.009536743164062, 46.025550842285156, 38.12877655029297, 8.94015884399414, 10.499183654785156, -2.6859054565429688, 7.849090576171875, 29.466548919677734, 16.219303131103516, 28.254135131835938, 32.61542510986328, 25.812591552734375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000284.npy"}
{"epoch": 0.4170337738619677, "step": 285, "batch_size": 64, "mean": 17.451358795166016, "std": 14.521804809570312, "min": -21.815227508544922, "p10": -0.2986755371093749, "median": 18.690792083740234, "p90": 34.02354393005371, "max": 54.75038146972656, "pos_frac": 0.875, "sample": [36.94718933105469, 0.9756927490234375, 15.07391357421875, 19.394851684570312, -7.605472564697266, 15.778491973876953, 18.769615173339844, 3.9802322387695312, 25.314552307128906, 1.0096969604492188, 9.141448974609375, 29.017372131347656, 34.072452545166016, 9.748323440551758, 32.24322509765625, 22.693405151367188, 21.503990173339844, 6.5289459228515625, 54.75038146972656, 28.667938232421875, 12.727027893066406, -0.15901947021484375, 10.335250854492188, 16.213394165039062, 21.430946350097656, 14.8902587890625, 9.456336975097656, 45.765716552734375, 21.84765625, 28.10809326171875, 23.901611328125, 25.657188415527344, -20.762710571289062, -3.3296966552734375, 21.985870361328125, 37.790313720703125, -0.8421630859375, -2.2101593017578125, 25.444473266601562, 27.41741943359375, 9.537586212158203, 23.947509765625, 33.909423828125, 11.511795043945312, 10.603939056396484, 2.989177703857422, 8.373367309570312, 32.01380157470703, 24.185871124267578, 16.288162231445312, -0.35852813720703125, 17.362163543701172, 30.508018493652344, 21.804168701171875, 12.240367889404297, 34.42196273803711, 46.71369171142578, -21.815227508544922, 3.403564453125, 22.479339599609375, 24.9686279296875, 17.815444946289062, 18.611968994140625, 21.696653366088867], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000285.npy"}
{"epoch": 0.4185022026431718, "step": 286, "batch_size": 64, "mean": 18.329389572143555, "std": 16.521963119506836, "min": -15.982330322265625, "p10": -1.8977323532104482, "median": 17.50442886352539, "p90": 32.17158508300781, "max": 78.24290466308594, "pos_frac": 0.875, "sample": [5.323368072509766, 17.536956787109375, 26.651840209960938, 17.471900939941406, 19.63327407836914, 22.933734893798828, 27.58159637451172, 6.366966247558594, 10.47344970703125, 7.981414794921875, 13.037101745605469, 23.92315673828125, 26.627212524414062, 0.3612518310546875, 4.004669189453125, 32.40216064453125, 65.2464599609375, 14.590194702148438, 18.143051147460938, 11.013959884643555, 46.23508834838867, -6.542732238769531, -3.5064468383789062, 12.822071075439453, 16.867019653320312, 11.687328338623047, 14.50030517578125, 28.650146484375, 12.646194458007812, 21.50043487548828, 56.335662841796875, 32.125762939453125, 17.747638702392578, 22.726783752441406, 31.458526611328125, 26.98938751220703, 32.19122314453125, 23.51982879638672, 13.485549926757812, -15.982330322265625, 30.30689239501953, 8.931060791015625, 18.913307189941406, -2.5646896362304688, -12.491111755371094, 15.578670501708984, 28.527732849121094, 18.658584594726562, -15.078641891479492, 78.24290466308594, 15.745292663574219, 20.850265502929688, -0.9566440582275391, 13.61090087890625, 4.5282135009765625, 30.1160888671875, -2.301055908203125, 24.60742950439453, 13.266775131225586, 14.164749145507812, 24.072837829589844, 31.83629608154297, 35.993743896484375, 11.760223388671875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000286.npy"}
{"epoch": 0.4199706314243759, "step": 287, "batch_size": 64, "mean": 14.114128112792969, "std": 18.383861541748047, "min": -19.88109588623047, "p10": -3.5900274276733386, "median": 10.873744010925293, "p90": 38.233391952514665, "max": 74.19686889648438, "pos_frac": 0.796875, "sample": [3.9385986328125, -0.950439453125, 22.142730712890625, -0.47199249267578125, 6.241661071777344, -15.674888610839844, 22.046966552734375, 17.208572387695312, 39.8457145690918, -4.217838287353516, 65.76871490478516, 43.81977844238281, 14.33535385131836, 17.037742614746094, 15.025215148925781, -0.22475051879882812, 5.653347015380859, 15.266410827636719, 2.712677001953125, 1.322662353515625, 17.34791374206543, 6.452037811279297, 30.18172836303711, 22.962478637695312, 74.19686889648438, 10.970075607299805, 4.419620513916016, -14.88287353515625, 0.5819778442382812, 21.454299926757812, 7.278408050537109, 12.254318237304688, 4.452312469482422, 2.9999542236328125, 7.983369827270508, 34.47130584716797, 6.1783905029296875, 41.213348388671875, 10.597404479980469, -1.1690826416015625, 19.81328582763672, 30.97144317626953, -19.88109588623047, 12.110462188720703, 3.89215087890625, 7.771266937255859, 33.70470428466797, 6.2865142822265625, 19.51203155517578, 51.6773681640625, -6.42399787902832, 2.0002212524414062, 18.534645080566406, 30.048580169677734, -1.073699951171875, -2.1251354217529297, -15.350048065185547, 17.92505645751953, -6.253889083862305, 12.113975524902344, 22.746498107910156, 33.5467643737793, 10.777412414550781, 50.2095947265625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000287.npy"}
{"epoch": 0.42143906020558003, "step": 288, "batch_size": 64, "mean": 19.528255462646484, "std": 17.679471969604492, "min": -10.889923095703125, "p10": -2.2374286651611315, "median": 17.990217208862305, "p90": 40.91802597045899, "max": 63.10326385498047, "pos_frac": 0.84375, "sample": [2.885967254638672, 9.996383666992188, -10.578826904296875, 17.377593994140625, -10.889923095703125, 32.77127456665039, -10.044551849365234, 27.692779541015625, -6.400249481201172, 27.46825408935547, 45.3218994140625, 26.343536376953125, 0.6155738830566406, 33.606422424316406, 6.866554260253906, 4.1296539306640625, 14.291236877441406, 30.468101501464844, 35.24952697753906, 31.544052124023438, 6.878868103027344, 39.351715087890625, 19.126022338867188, -2.7333526611328125, 33.67273712158203, 16.177696228027344, 11.895759582519531, 5.689075469970703, 25.28290557861328, 10.694984436035156, -6.821746826171875, -0.43321990966796875, 37.185707092285156, 41.58930206298828, 54.893280029296875, 2.66998291015625, 10.672164916992188, 47.17546081542969, 12.189048767089844, 10.86777114868164, 9.753700256347656, 16.685401916503906, -1.0802726745605469, 45.31024169921875, 37.03580856323242, 28.27082061767578, 8.883453369140625, -0.7217063903808594, 32.7537841796875, 32.56262969970703, 22.858802795410156, 39.33222961425781, 12.757492065429688, 32.338340759277344, 22.746932983398438, 7.745475769042969, 63.10326385498047, 18.602840423583984, 10.743537902832031, 37.390960693359375, 20.50684356689453, 56.91168212890625, -9.186279296875, 19.76291275024414], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000288.npy"}
{"epoch": 0.42290748898678415, "step": 289, "batch_size": 64, "mean": 20.74490737915039, "std": 22.805891036987305, "min": -18.852493286132812, "p10": -0.19559001922607386, "median": 11.981121063232422, "p90": 49.13873291015625, "max": 99.53974151611328, "pos_frac": 0.890625, "sample": [12.496194839477539, 24.37213134765625, 66.16656494140625, 62.11389923095703, -5.237293243408203, 10.495071411132812, 8.010009765625, 36.458038330078125, 5.382255554199219, 14.322349548339844, 9.035797119140625, 26.19005584716797, 30.951705932617188, 99.53974151611328, 46.293914794921875, 29.65740966796875, 48.90289306640625, 69.35707092285156, 10.753837585449219, 47.50639343261719, 6.838634490966797, 8.181194305419922, -18.852493286132812, 3.40045166015625, 0.8162031173706055, 0.86328125, 64.67052459716797, 7.1602630615234375, 17.16309356689453, 0.28064727783203125, 51.97998046875, 40.1781120300293, 2.1496353149414062, 1.1487445831298828, -1.3005447387695312, 3.121002197265625, 48.170166015625, 9.837303161621094, 10.641914367675781, 1.7606563568115234, 7.603004455566406, 49.23980712890625, -0.34354209899902344, 16.073509216308594, -1.1643428802490234, 33.004432678222656, 47.20268630981445, 7.735931396484375, 0.40448760986328125, 17.378053665161133, 27.359722137451172, 37.36981201171875, -8.688541412353516, 27.79790496826172, 4.484687805175781, 12.053207397460938, 3.5796146392822266, 0.14963150024414062, -0.6279678344726562, 29.48748779296875, 16.271926879882812, 35.62016296386719, 42.796661376953125, 11.909034729003906], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000289.npy"}
{"epoch": 0.4243759177679883, "step": 290, "batch_size": 64, "mean": 21.441116333007812, "std": 22.93538475036621, "min": -19.283903121948242, "p10": -3.5466083526611323, "median": 15.890379905700684, "p90": 51.28179321289065, "max": 113.44332885742188, "pos_frac": 0.84375, "sample": [36.69970703125, 46.40309143066406, 33.05242156982422, 21.488113403320312, 9.62213134765625, 9.254417419433594, -3.1646957397460938, 1.7676849365234375, 18.808731079101562, -3.710285186767578, 30.676544189453125, 28.536529541015625, 9.230743408203125, 27.177200317382812, 39.85511016845703, 25.082534790039062, -15.605224609375, 1.7146530151367188, -8.66391372680664, 11.546257019042969, 13.151153564453125, 42.216583251953125, 15.005817413330078, 26.000652313232422, 24.42839813232422, -19.283903121948242, 43.01667785644531, 12.94866943359375, 113.44332885742188, 72.29434204101562, 13.09091567993164, 17.67272186279297, 16.058982849121094, 53.37266540527344, -7.556373596191406, -5.550331115722656, 23.503631591796875, 18.979997634887695, 11.104255676269531, 23.046531677246094, 14.236785888671875, 27.00259017944336, 26.763153076171875, 13.570533752441406, 6.2024078369140625, 6.854942321777344, 32.22590637207031, 10.929790496826172, 8.239852905273438, 12.830127716064453, 64.31065368652344, 32.398895263671875, 15.721776962280273, 57.763702392578125, -0.8099727630615234, -4.89923095703125, 67.8929443359375, 12.268295288085938, -0.30451011657714844, 1.2156314849853516, 12.636775970458984, 57.642578125, 45.034568786621094, 25.786745071411133], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000290.npy"}
{"epoch": 0.42584434654919234, "step": 291, "batch_size": 64, "mean": 15.815654754638672, "std": 16.842182159423828, "min": -34.495704650878906, "p10": -2.438893127441405, "median": 15.023094177246094, "p90": 36.48410148620606, "max": 49.86097717285156, "pos_frac": 0.875, "sample": [15.799858093261719, 4.448848724365234, -1.4576263427734375, 14.985748291015625, 18.260337829589844, 31.706497192382812, 32.80596923828125, 28.89629364013672, 43.3157958984375, 9.93133544921875, 2.44415283203125, 19.32752227783203, 35.38302993774414, 1.8135337829589844, 10.68960189819336, 9.139663696289062, 36.6480598449707, 28.484054565429688, 18.2789306640625, 0.2357940673828125, 49.86097717285156, -27.176971435546875, 12.44821548461914, 43.317291259765625, 26.319229125976562, 43.33489990234375, 15.107173919677734, 17.771041870117188, 16.807510375976562, -6.6735382080078125, 21.10247802734375, 27.517532348632812, 9.681884765625, 3.882354736328125, 44.30113983154297, 35.13250732421875, -34.495704650878906, 26.619140625, 15.060440063476562, 15.625823974609375, 13.000951766967773, 37.502197265625, 27.92389678955078, 4.903175354003906, 3.4708118438720703, 7.28227424621582, -4.378240585327148, 35.582275390625, 7.786264419555664, 28.73095703125, 10.143829345703125, -21.452423095703125, 13.840827941894531, -5.1333160400390625, 5.218231201171875, -2.85943603515625, 29.833213806152344, 10.522090911865234, 19.783111572265625, 10.378067016601562, 36.101531982421875, 14.714767456054688, 8.933670043945312, 3.6923599243164062], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000291.npy"}
{"epoch": 0.42731277533039647, "step": 292, "batch_size": 64, "mean": 20.906238555908203, "std": 15.373456001281738, "min": -9.686378479003906, "p10": -0.02152061462402302, "median": 20.583955764770508, "p90": 41.39471740722658, "max": 57.21131896972656, "pos_frac": 0.890625, "sample": [31.981155395507812, 29.346607208251953, 29.94614601135254, 22.081024169921875, 24.4154052734375, 18.810325622558594, -3.938720703125, 10.564453125, 30.5028076171875, 3.599578857421875, 28.611862182617188, 8.608661651611328, 44.362266540527344, 11.421031951904297, 6.631797790527344, 19.91860580444336, 24.744094848632812, 46.35414123535156, 22.911300659179688, 14.110084533691406, 17.674285888671875, 57.21131896972656, 34.22831726074219, 33.73212814331055, 30.861663818359375, 55.88874816894531, 34.06791687011719, 32.50975036621094, 14.78902816772461, 18.736408233642578, -9.686378479003906, -3.95550537109375, 0.6222114562988281, 7.824974060058594, 4.176666259765625, 27.807403564453125, 17.15848159790039, 18.563236236572266, 37.423614501953125, -2.3862152099609375, 35.88768005371094, 19.243331909179688, 43.09661865234375, -3.4011268615722656, 6.8478546142578125, 32.21791076660156, 43.3880615234375, 26.678070068359375, 34.40985107421875, 9.275516510009766, -0.19469833374023438, 20.238571166992188, 20.929340362548828, 2.8094558715820312, 26.7996826171875, 10.76934814453125, 25.465927124023438, 15.864124298095703, 29.831100463867188, 13.993904113769531, 45.64856719970703, -4.0031280517578125, 29.589988708496094, 0.38256072998046875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000292.npy"}
{"epoch": 0.4287812041116006, "step": 293, "batch_size": 64, "mean": 13.989400863647461, "std": 18.09434700012207, "min": -17.31414794921875, "p10": -7.301374435424804, "median": 8.198427200317383, "p90": 40.547738647460946, "max": 57.332427978515625, "pos_frac": 0.8125, "sample": [9.151981353759766, 6.178901672363281, -17.31414794921875, 5.256980895996094, 3.8039817810058594, 7.676155090332031, 47.2056884765625, 1.6555023193359375, 26.173233032226562, 1.9286231994628906, 13.686576843261719, 21.0064697265625, 6.953956604003906, -8.325183868408203, 57.332427978515625, 3.8069076538085938, -5.528564453125, 17.577972412109375, 9.145828247070312, 23.88660430908203, 5.1452178955078125, -10.45208740234375, 38.09199523925781, -1.0725860595703125, 19.083728790283203, 2.246549606323242, 6.739402770996094, 51.72300720214844, -1.1247177124023438, 47.21095275878906, 26.703323364257812, 38.859230041503906, 24.695602416992188, 7.0912628173828125, 41.271385192871094, 3.1811141967773438, 29.039466857910156, -17.008365631103516, 8.452136993408203, 9.335990905761719, 33.28180694580078, -8.342498779296875, 1.9901504516601562, 2.638153076171875, 35.623077392578125, 35.103084564208984, -8.227968215942383, 47.469268798828125, 7.276649475097656, -7.770015716552734, 4.217926025390625, -6.207878112792969, 21.907752990722656, 7.9447174072265625, 55.98980712890625, 22.602306365966797, 12.56878662109375, 9.090263366699219, 1.7764778137207031, 18.260223388671875, -3.1212005615234375, 15.04571533203125, 31.620651245117188, 4.111881256103516], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000293.npy"}
{"epoch": 0.4302496328928047, "step": 294, "batch_size": 64, "mean": 13.479206085205078, "std": 17.28700065612793, "min": -17.648422241210938, "p10": -5.734793281555175, "median": 9.949567794799805, "p90": 31.93691864013672, "max": 90.47562408447266, "pos_frac": 0.796875, "sample": [-8.833488464355469, 30.01691436767578, 7.907703399658203, 37.4869499206543, 22.74322509765625, 28.049346923828125, 17.967445373535156, 13.927021026611328, 25.49291229248047, 20.853424072265625, 11.653121948242188, 8.226760864257812, 0.4520149230957031, 19.341705322265625, -17.648422241210938, -0.17462158203125, 5.519676208496094, 22.804595947265625, -0.3451042175292969, 11.235408782958984, -15.6116943359375, 6.32659912109375, 18.141258239746094, 14.633857727050781, 14.512313842773438, -1.481414794921875, 17.953445434570312, 5.035591125488281, 24.41944122314453, 25.76721954345703, 5.658374786376953, 49.64313507080078, 14.634864807128906, 31.84039306640625, 14.25916862487793, -11.646459579467773, 90.47562408447266, 22.730932235717773, 12.230888366699219, 34.40480041503906, -13.14617919921875, 27.423480987548828, 2.5050697326660156, 7.138759613037109, 37.111549377441406, 24.97530746459961, 8.426017761230469, 8.663726806640625, 4.519279479980469, 1.5377979278564453, -7.948333740234375, 8.372283935546875, -5.681415557861328, 8.641014099121094, 26.33722686767578, 6.02093505859375, 7.225749969482422, 8.288124084472656, -5.757669448852539, 7.466571807861328, 41.5528564453125, -1.1801300048828125, -2.406005859375, 31.978286743164062], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000294.npy"}
{"epoch": 0.43171806167400884, "step": 295, "batch_size": 64, "mean": 18.12828826904297, "std": 18.49224281311035, "min": -28.615272521972656, "p10": 0.3678619384765632, "median": 16.3787784576416, "p90": 36.10127105712892, "max": 79.45846557617188, "pos_frac": 0.90625, "sample": [24.441661834716797, 21.62062644958496, 6.8208770751953125, 2.2054290771484375, 21.138595581054688, 11.600311279296875, 15.716407775878906, 9.671051025390625, 32.414947509765625, 29.989280700683594, 21.41412353515625, 2.2204208374023438, 15.50399398803711, 24.999183654785156, 24.294937133789062, 21.139366149902344, -28.615272521972656, 26.98560333251953, 12.008407592773438, 31.743202209472656, 18.56695556640625, 5.841064453125, 1.0873565673828125, 17.353073120117188, 7.67315673828125, -0.6820602416992188, 37.54087829589844, 9.457958221435547, 54.920867919921875, 41.85577392578125, 13.902999877929688, 22.81031036376953, 0.09352874755859375, 9.3134765625, 13.037647247314453, 30.14710235595703, 25.178024291992188, 2.903881072998047, 3.1968231201171875, 51.93572998046875, 1.6506233215332031, 31.193740844726562, 1.0079727172851562, -1.9553756713867188, 1.4433403015136719, -3.6657791137695312, 30.9576416015625, 11.080989837646484, 24.157344818115234, 32.7421875, 16.18114471435547, 27.47174072265625, 12.717369079589844, 79.45846557617188, 31.080825805664062, 55.55310821533203, 17.13424301147461, 18.002777099609375, 16.576412200927734, -19.205982208251953, -6.052419662475586, 12.085491180419922, 69.65206909179688, 7.4948272705078125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000295.npy"}
{"epoch": 0.4331864904552129, "step": 296, "batch_size": 64, "mean": 19.57727813720703, "std": 21.31549072265625, "min": -34.17974853515625, "p10": -2.5344596862792947, "median": 18.561809539794922, "p90": 43.19766616821289, "max": 92.82467651367188, "pos_frac": 0.84375, "sample": [-0.11756706237792969, 15.231292724609375, 42.795654296875, 92.82467651367188, 25.367721557617188, 86.14651489257812, 7.888023376464844, 6.0079498291015625, 26.455307006835938, -34.17974853515625, 39.762939453125, -3.4069595336914062, -12.561088562011719, 24.327064514160156, -5.038970947265625, -0.07765960693359375, 16.868295669555664, 23.143165588378906, 43.369956970214844, 19.895156860351562, -7.476165771484375, 21.549301147460938, -0.498626708984375, 19.885421752929688, 8.083251953125, 39.56694030761719, 46.547088623046875, 25.4110107421875, 34.69481658935547, 2.754608154296875, 31.093704223632812, 28.296630859375, 32.77076721191406, 34.75621032714844, 12.170684814453125, 43.38548278808594, 30.126510620117188, 7.4416656494140625, -8.727973937988281, 19.88507080078125, 7.653816223144531, 21.812828063964844, 0.8038482666015625, 11.821517944335938, 65.057861328125, 1.8046493530273438, 18.73194122314453, 48.626670837402344, 5.9385528564453125, 5.425149917602539, 33.95098876953125, 3.0049476623535156, 13.418624877929688, 40.52882385253906, 7.0778045654296875, 18.391677856445312, 19.441482543945312, 25.493545532226562, 17.306072235107422, -5.183017730712891, 15.84954833984375, 16.94013214111328, 3.0941314697265625, 19.536117553710938], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000296.npy"}
{"epoch": 0.434654919236417, "step": 297, "batch_size": 64, "mean": 18.1894474029541, "std": 17.558549880981445, "min": -12.094451904296875, "p10": -1.4868513107299803, "median": 13.477800369262695, "p90": 38.99282836914063, "max": 64.74871826171875, "pos_frac": 0.875, "sample": [21.330684661865234, 24.36760711669922, 32.751312255859375, 29.514450073242188, 23.86243438720703, 3.1491050720214844, 37.982879638671875, 28.350051879882812, 42.531471252441406, -8.435379028320312, 19.908145904541016, 32.77667236328125, 8.074981689453125, 2.474761962890625, 32.19963073730469, 4.4249725341796875, 0.8831329345703125, 30.028919219970703, 27.85279083251953, -10.683420181274414, -9.682342529296875, 0.8648548126220703, 7.812709808349609, 1.8760566711425781, 36.44940185546875, 17.714447021484375, 19.26367950439453, 12.584510803222656, 8.666797637939453, 27.172927856445312, 5.213401794433594, 10.449302673339844, 5.677330017089844, 13.64132308959961, 13.314277648925781, -2.6922149658203125, 48.498558044433594, 39.37556457519531, 61.549041748046875, 38.09977722167969, 8.724029541015625, 29.23812484741211, -12.094451904296875, 11.744598388671875, -1.5471668243408203, 4.237201690673828, 8.936546325683594, 3.672943115234375, 51.305816650390625, 9.526443481445312, 6.327077865600586, 21.112098693847656, 25.092330932617188, 4.559600830078125, -1.3461151123046875, -2.080413818359375, 7.973358154296875, 37.24347686767578, 44.90705871582031, 64.74871826171875, 7.433073997497559, 32.70753479003906, 36.65814208984375, 25.850013732910156], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000297.npy"}
{"epoch": 0.43612334801762115, "step": 298, "batch_size": 64, "mean": 19.88420867919922, "std": 20.926685333251953, "min": -12.358688354492188, "p10": -1.3941375732421868, "median": 16.770977020263672, "p90": 42.06797142028809, "max": 113.2967529296875, "pos_frac": 0.859375, "sample": [34.1527099609375, 34.89801788330078, 5.173788070678711, 6.654937744140625, -9.632034301757812, 28.08306884765625, -4.003147125244141, 35.67121887207031, 37.851715087890625, -2.0073089599609375, 3.89556884765625, 22.499237060546875, 25.151329040527344, 15.193634033203125, -0.71881103515625, 21.455551147460938, 0.9377059936523438, 13.667327880859375, 11.771217346191406, 8.452762603759766, 27.74755859375, -12.358688354492188, 19.860275268554688, -3.302459716796875, 39.64942169189453, 59.8599739074707, 19.675918579101562, 16.106491088867188, 17.435462951660156, 46.75074768066406, 4.703804016113281, 5.732170104980469, 6.284748077392578, 1.5148124694824219, -1.683563232421875, 41.428131103515625, -4.2991943359375, 56.39628601074219, 30.836196899414062, 27.331268310546875, 11.81979751586914, 7.035396575927734, 113.2967529296875, 4.352436065673828, 1.3739089965820312, 7.211517333984375, 42.13434600830078, 41.9130973815918, 33.908470153808594, 9.06985092163086, 28.834716796875, 19.771949768066406, 32.07695007324219, 22.813095092773438, 29.5604248046875, 25.537994384765625, 13.045459747314453, 55.7857666015625, -0.6189422607421875, 25.819602966308594, 1.5211334228515625, 5.1264190673828125, 1.8567695617675781, 50.52459716796875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000298.npy"}
{"epoch": 0.43759177679882527, "step": 299, "batch_size": 64, "mean": 20.033157348632812, "std": 17.0035400390625, "min": -15.415313720703125, "p10": -1.0592079162597643, "median": 22.256282806396484, "p90": 45.95589294433594, "max": 63.29002380371094, "pos_frac": 0.890625, "sample": [13.237495422363281, 13.477561950683594, 31.443984985351562, 21.339332580566406, 7.215461730957031, -1.9085006713867188, 30.481887817382812, 22.2425537109375, 3.481006622314453, 9.314292907714844, 46.4871826171875, 7.889198303222656, 47.416046142578125, 23.671329498291016, 25.075733184814453, 32.93059539794922, 7.5486602783203125, 51.273109436035156, 11.278635025024414, 28.633899688720703, 3.8514022827148438, 20.632492065429688, -1.6676902770996094, -3.6045188903808594, 3.1124496459960938, 6.310874938964844, 25.562713623046875, 1.6450042724609375, -5.525260925292969, 6.192424774169922, 15.533561706542969, 30.22136688232422, 22.543304443359375, 32.029502868652344, 36.84403991699219, 29.126422882080078, 6.84808349609375, 23.158702850341797, 44.716217041015625, 24.802764892578125, 32.80390548706055, 15.226551055908203, 0.3605842590332031, -7.88818359375, 51.44035339355469, 22.27001190185547, 3.2354888916015625, 63.29002380371094, 30.474029541015625, 47.76616668701172, 25.699981689453125, 8.277164459228516, 14.997306823730469, 7.3620452880859375, -4.8869171142578125, 34.81382751464844, 27.593250274658203, 54.478614807128906, 26.688400268554688, 36.089202880859375, -15.415313720703125, 24.196311950683594, 5.388704299926758, 22.997196197509766], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000299.npy"}
{"epoch": 0.4390602055800294, "step": 300, "batch_size": 64, "mean": 17.913686752319336, "std": 19.892627716064453, "min": -18.40728759765625, "p10": -2.967597198486328, "median": 13.372915267944336, "p90": 42.822519302368164, "max": 72.67890930175781, "pos_frac": 0.828125, "sample": [18.816265106201172, 7.677494049072266, 23.696495056152344, 48.22395324707031, 10.375953674316406, 4.938060760498047, 9.386703491210938, 42.14326477050781, 22.86862564086914, -16.49264144897461, -17.168989181518555, -3.9017333984375, 32.41398239135742, 4.2196044921875, -7.611553192138672, 13.565216064453125, 8.048229217529297, 6.8392486572265625, 2.454038619995117, 1.3714218139648438, 38.53308868408203, 43.11362838745117, 0.4669189453125, -2.424957275390625, -3.1668319702148438, 46.38591003417969, 12.010665893554688, 10.30267333984375, 32.53727722167969, 10.254806518554688, 23.283859252929688, 72.35882568359375, 26.32953643798828, -11.097480773925781, 22.862335205078125, 3.2319679260253906, 33.27555847167969, 6.979286193847656, 26.955718994140625, 33.683441162109375, 32.0162353515625, 15.517864227294922, 0.72589111328125, 36.51866149902344, -18.40728759765625, 22.540672302246094, 25.62485122680664, 13.180614471435547, 36.12138366699219, 34.91999816894531, 45.686275482177734, 8.802371978759766, -1.3792800903320312, 7.198848724365234, -0.7021169662475586, 5.6341705322265625, 26.369125366210938, 6.79345703125, -2.502716064453125, 72.67890930175781, 18.98968505859375, 56.33715057373047, 42.05961227416992, 24.011768341064453], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000300.npy"}
{"epoch": 0.44052863436123346, "step": 301, "batch_size": 64, "mean": 20.18459701538086, "std": 18.501291275024414, "min": -15.819488525390625, "p10": 0.40234298706054733, "median": 18.299280166625977, "p90": 44.676815032958984, "max": 74.26001739501953, "pos_frac": 0.921875, "sample": [16.482330322265625, 23.949058532714844, 22.15127944946289, 41.20294189453125, 62.537994384765625, 17.297943115234375, 9.307861328125, 25.43451690673828, 35.41521453857422, -5.641319274902344, 34.132598876953125, 36.09775924682617, 10.706367492675781, 10.456363677978516, -8.338787078857422, 31.311241149902344, 0.2102813720703125, 74.26001739501953, 5.0713348388671875, 19.117889404296875, 23.95550537109375, 18.021686553955078, 2.2212295532226562, 6.975494384765625, 23.055206298828125, 33.932220458984375, 18.791481018066406, 15.128646850585938, 3.7968482971191406, 26.75615692138672, 62.11736297607422, 3.33709716796875, 0.07431793212890625, 25.99164581298828, 6.6616668701171875, 6.329822540283203, 17.208053588867188, 4.29180908203125, 7.075260162353516, 9.492874145507812, 8.83914566040039, 11.282581329345703, 6.8710174560546875, -6.579216003417969, -15.819488525390625, 48.888343811035156, 45.6435661315918, 9.620719909667969, 21.570175170898438, 26.724822998046875, 44.90641784667969, 34.755859375, 20.129150390625, 31.71204376220703, 60.286712646484375, 44.141075134277344, -2.12908935546875, 35.82206344604492, 31.495986938476562, 18.576873779296875, 2.2928390502929688, 0.8504867553710938, 6.248165130615234, 29.30663299560547], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000301.npy"}
{"epoch": 0.4419970631424376, "step": 302, "batch_size": 64, "mean": 17.414783477783203, "std": 18.99659538269043, "min": -20.07965087890625, "p10": -2.0856142044067374, "median": 14.165725708007812, "p90": 44.933363342285155, "max": 63.759681701660156, "pos_frac": 0.859375, "sample": [16.52996063232422, 58.04759216308594, -0.082305908203125, 18.24118423461914, 26.968582153320312, 25.13719940185547, 24.52246856689453, 14.354934692382812, 2.699951171875, -15.318405151367188, 13.230300903320312, 2.0586109161376953, 1.7839469909667969, 13.196182250976562, -20.07965087890625, -10.399477005004883, 6.005859375, 1.1939773559570312, 49.18293762207031, 24.181442260742188, 0.29119873046875, 39.476806640625, 13.976516723632812, 14.552299499511719, 12.269145965576172, -11.315797805786133, 46.64656066894531, 27.574844360351562, 22.898666381835938, 0.18108367919921875, 21.663963317871094, 30.334495544433594, 11.365867614746094, 7.464683532714844, 5.302968978881836, 45.140716552734375, 63.759681701660156, 16.154830932617188, 16.759288787841797, -12.73721694946289, 7.5187530517578125, 54.567909240722656, 10.731986999511719, 28.421558380126953, -2.419178009033203, 33.377197265625, 5.1200714111328125, 62.343841552734375, 21.396528244018555, -6.15509033203125, 2.923006057739258, 44.44953918457031, 10.744392395019531, 10.013824462890625, 24.22809600830078, 10.836681365966797, 31.60626983642578, 32.89275360107422, 7.171661376953125, 39.1185302734375, 21.449325561523438, 2.5962066650390625, 39.70375061035156, -1.3072986602783203], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000302.npy"}
{"epoch": 0.4434654919236417, "step": 303, "batch_size": 64, "mean": 19.779584884643555, "std": 17.321170806884766, "min": -16.031898498535156, "p10": -2.185220336914062, "median": 17.41312026977539, "p90": 40.809930419921876, "max": 61.32383728027344, "pos_frac": 0.875, "sample": [40.161712646484375, 5.8632659912109375, 22.59307098388672, 23.841941833496094, 4.176624298095703, 27.654006958007812, 20.155303955078125, 16.703086853027344, 4.037067413330078, -4.8596038818359375, 6.230926513671875, 9.681808471679688, 39.13513946533203, 4.990154266357422, 28.08244514465332, 6.4796600341796875, 4.4075927734375, 13.84688949584961, 5.197544097900391, 59.38421630859375, 28.799835205078125, 33.16240692138672, -6.345458984375, -1.6745758056640625, 53.69829559326172, 23.245254516601562, -4.115730285644531, 28.42218017578125, 23.086570739746094, 35.02904510498047, 14.901771545410156, 34.45262908935547, 42.81901550292969, 7.950305938720703, 56.19938659667969, -2.4040679931640625, -16.031898498535156, 8.831756591796875, 41.087738037109375, 15.634536743164062, 55.7493896484375, 11.136749267578125, 19.874488830566406, 22.283462524414062, 35.57157897949219, 32.402740478515625, 32.292884826660156, 25.416118621826172, 22.335647583007812, -8.92070198059082, 61.32383728027344, 10.971725463867188, 33.3587646484375, 18.123153686523438, 14.854715347290039, 36.14771270751953, -4.8641357421875, 14.732707977294922, 15.20135498046875, 7.407527923583984, 8.31307601928711, 11.867538452148438, 11.946828842163086, 23.854393005371094], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000303.npy"}
{"epoch": 0.44493392070484583, "step": 304, "batch_size": 64, "mean": 18.5361270904541, "std": 16.288076400756836, "min": -18.33048439025879, "p10": -6.3653562545776365, "median": 18.81121063232422, "p90": 37.95373039245606, "max": 49.597869873046875, "pos_frac": 0.84375, "sample": [-6.614229202270508, 17.270889282226562, 16.0693359375, 18.871841430664062, 10.589874267578125, 24.492862701416016, -15.992584228515625, 44.677406311035156, 20.31658172607422, 45.910980224609375, 33.562477111816406, 13.787544250488281, 20.340126037597656, 14.335899353027344, 11.929519653320312, 30.193496704101562, 12.722396850585938, 15.649307250976562, 10.814468383789062, 21.536151885986328, 36.636436462402344, 23.781402587890625, 42.680633544921875, 18.750579833984375, 34.790260314941406, 36.96178436279297, 18.200092315673828, 14.762474060058594, 49.597869873046875, 5.978404998779297, 21.363616943359375, 30.149688720703125, -7.7237701416015625, 38.28033447265625, -18.33048439025879, -11.318344116210938, 9.468215942382812, 19.713932037353516, 16.402313232421875, 26.592506408691406, -5.7846527099609375, 45.27366638183594, -12.158012390136719, -9.981407165527344, 22.892227172851562, 1.8953857421875, 14.495948791503906, 4.808709144592285, 29.092819213867188, 20.009990692138672, -0.8120498657226562, -0.4057655334472656, 35.78096008300781, 33.97492980957031, 43.91699981689453, 17.537193298339844, 13.119731903076172, 22.847976684570312, 10.84478759765625, 32.575653076171875, 29.007301330566406, 37.191654205322266, 30.41742706298828, 2.5683975219726562], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000304.npy"}
{"epoch": 0.44640234948604995, "step": 305, "batch_size": 64, "mean": 19.564817428588867, "std": 17.208406448364258, "min": -13.157562255859375, "p10": 0.33774566650390725, "median": 18.453411102294922, "p90": 45.44252700805664, "max": 52.26435089111328, "pos_frac": 0.890625, "sample": [45.06800842285156, 26.13304901123047, 16.869491577148438, 32.969261169433594, -0.9222297668457031, 27.62567138671875, 46.37518310546875, 4.12849235534668, 24.881881713867188, 45.60303497314453, 16.22229766845703, 6.644443511962891, 17.067550659179688, 31.725494384765625, 11.163742065429688, 1.8148040771484375, 22.885955810546875, 52.115264892578125, 34.3307991027832, 27.259742736816406, 52.26435089111328, 18.749366760253906, 25.854812622070312, 31.886627197265625, 14.531612396240234, 6.6367645263671875, 11.558364868164062, 6.709075927734375, 29.651885986328125, 1.312347412109375, 11.46417236328125, 31.340164184570312, 8.163566589355469, -0.0799407958984375, 44.33946228027344, 10.931045532226562, 46.640533447265625, 2.9964332580566406, 7.978240966796875, -7.806602478027344, 1.557291030883789, 1.4585723876953125, 38.988868713378906, 18.926074981689453, 15.095252990722656, 20.310012817382812, 27.997406005859375, 7.256172180175781, 21.725296020507812, 44.23778533935547, 29.562652587890625, 51.606544494628906, 31.867431640625, 21.783050537109375, -13.157562255859375, -12.699005126953125, 18.157455444335938, -11.08636474609375, 29.8602294921875, -12.592544555664062, 16.25475311279297, 46.604400634765625, 8.000656127929688, 5.349672317504883], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000305.npy"}
{"epoch": 0.447870778267254, "step": 306, "batch_size": 64, "mean": 22.919784545898438, "std": 14.62917423248291, "min": -4.360260009765625, "p10": 9.176833152770996, "median": 18.863059997558594, "p90": 45.28605651855469, "max": 61.30230712890625, "pos_frac": 0.953125, "sample": [22.589336395263672, 10.606342315673828, 24.360183715820312, 13.184118270874023, 22.849777221679688, 18.72705078125, 58.73506164550781, 41.279273986816406, 51.57380676269531, 19.53852081298828, 42.124000549316406, 15.632781982421875, -0.3641014099121094, 18.074005126953125, 15.630447387695312, 23.637203216552734, 31.40584945678711, 46.22755432128906, 15.845909118652344, 33.03727722167969, 10.848846435546875, 14.660263061523438, 30.523101806640625, 13.516357421875, 9.7354736328125, 9.439996719360352, 12.94168472290039, 9.064048767089844, 17.12258529663086, 61.30230712890625, 12.142333984375, -4.360260009765625, 52.17925262451172, 11.236625671386719, 17.66466522216797, 21.56463623046875, 10.957914352416992, 27.52277374267578, 14.736400604248047, 15.080825805664062, 26.867408752441406, 18.999069213867188, 23.64417266845703, 39.11482238769531, 4.60157585144043, -1.093963623046875, 16.384170532226562, 44.836219787597656, 16.64238739013672, 7.768524169921875, 46.49073791503906, 35.745216369628906, 17.86139678955078, 45.478843688964844, 19.580787658691406, 20.273197174072266, 21.08177947998047, 10.260589599609375, 33.62410354614258, 13.542144775390625, 25.63165283203125, 37.90351867675781, 7.1137542724609375, 41.93986511230469], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000306.npy"}
{"epoch": 0.44933920704845814, "step": 307, "batch_size": 64, "mean": 17.387653350830078, "std": 17.720918655395508, "min": -16.474700927734375, "p10": -5.4517997741699205, "median": 18.687175750732422, "p90": 36.31173477172852, "max": 72.2132568359375, "pos_frac": 0.8125, "sample": [26.062847137451172, -2.396820068359375, 6.1096649169921875, 21.033721923828125, 20.047428131103516, 6.990306854248047, -11.048431396484375, -3.3815231323242188, 7.9515838623046875, 29.056243896484375, 12.635913848876953, -9.654609680175781, 36.316429138183594, -13.332427978515625, 35.24552917480469, -4.3271331787109375, 33.06960678100586, -1.1591033935546875, 38.3170166015625, -5.933799743652344, 17.057891845703125, 18.038230895996094, 10.372337341308594, 27.549457550048828, 60.822471618652344, 20.50780487060547, 17.230701446533203, -2.4516830444335938, 16.28912353515625, 27.769615173339844, 6.072820663452148, 23.849136352539062, 16.38728904724121, -16.474700927734375, 19.33612060546875, 24.215408325195312, 33.49479675292969, 29.94775390625, 72.2132568359375, 35.17444610595703, 20.663681030273438, 28.598167419433594, -15.3892822265625, 19.733718872070312, 36.63815689086914, 21.66211700439453, 20.363250732421875, 31.15001678466797, 14.429485321044922, 10.589736938476562, 6.748161315917969, 10.837896347045898, 29.846405029296875, 20.071754455566406, 32.18257141113281, 44.84790802001953, 5.591033935546875, 7.616649627685547, 47.29450988769531, 36.30078125, 5.204559326171875, 6.22650146484375, 0.4251251220703125, -7.827932357788086], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000307.npy"}
{"epoch": 0.45080763582966227, "step": 308, "batch_size": 64, "mean": 16.023202896118164, "std": 18.435483932495117, "min": -16.713539123535156, "p10": -5.329388809204101, "median": 10.134113311767578, "p90": 44.322783660888675, "max": 62.61522674560547, "pos_frac": 0.84375, "sample": [22.132850646972656, 41.77854919433594, -6.071014404296875, -11.681697845458984, 14.999565124511719, 44.66944122314453, -11.794471740722656, 4.706974029541016, 22.814926147460938, 7.965028762817383, 22.08624267578125, 0.034198760986328125, -5.511051177978516, 22.492401123046875, 8.952041625976562, 45.510467529296875, -8.807315826416016, 17.381576538085938, 16.276748657226562, 16.40802001953125, 2.207550048828125, 62.61522674560547, 39.96558380126953, 40.86833953857422, 7.339111328125, 11.404991149902344, -16.713539123535156, 7.88292121887207, 36.71269989013672, 14.35128402709961, 43.513916015625, 8.83245849609375, 9.267951965332031, 61.88340759277344, 33.60845947265625, 51.51380157470703, 45.21099090576172, 5.401824951171875, 7.31793212890625, 9.377948760986328, 7.728961944580078, -4.905509948730469, 16.724891662597656, 20.28143310546875, 26.332740783691406, 50.902381896972656, 6.5399169921875, 28.045013427734375, 6.624542236328125, 10.531768798828125, 31.814342498779297, 22.842803955078125, 3.0507888793945312, 6.184150695800781, -2.3619518280029297, 3.3876380920410156, 20.56756591796875, 15.438751220703125, 1.068145751953125, 0.4861183166503906, -7.802360534667969, -3.9440536499023438, 9.304092407226562, 9.736457824707031], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000308.npy"}
{"epoch": 0.4522760646108664, "step": 309, "batch_size": 64, "mean": 18.65892791748047, "std": 17.668581008911133, "min": -11.161663055419922, "p10": -5.462860488891601, "median": 19.81082534790039, "p90": 42.58665313720704, "max": 65.55005645751953, "pos_frac": 0.796875, "sample": [12.589950561523438, 2.751321792602539, 17.706836700439453, 43.22100830078125, 17.50304412841797, 31.793258666992188, 7.820926666259766, -7.417568206787109, 36.41508483886719, 11.89239501953125, 30.11656951904297, 26.462326049804688, -0.10296249389648438, 50.68278503417969, -6.80914306640625, -11.161663055419922, -5.704746246337891, 8.680961608886719, 21.449310302734375, 24.560348510742188, 22.68500518798828, -2.967620849609375, 24.30340576171875, 31.783035278320312, 3.89117431640625, -8.36581802368164, 26.145278930664062, 30.9212646484375, -1.6803474426269531, 12.593265533447266, 6.646450042724609, 51.0941162109375, 30.561241149902344, -1.6276016235351562, 30.599014282226562, 6.383056640625, 56.1143798828125, 22.95855712890625, 21.86852264404297, -7.56341552734375, -4.898460388183594, 18.144908905029297, 2.2126541137695312, 65.55005645751953, 11.01373291015625, 31.2437744140625, 26.288116455078125, 20.166183471679688, 26.06812286376953, -6.370887756347656, 18.354034423828125, -2.443286895751953, 41.10649108886719, 20.119789123535156, 18.204017639160156, 19.635215759277344, 46.129188537597656, 51.85694885253906, 4.116207122802734, 28.876811981201172, 13.279281616210938, 20.415977478027344, 19.986434936523438, 36.323089599609375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000309.npy"}
{"epoch": 0.45374449339207046, "step": 310, "batch_size": 64, "mean": 15.934802055358887, "std": 16.956804275512695, "min": -20.00273895263672, "p10": -2.6361850738525385, "median": 17.32898712158203, "p90": 36.78972473144532, "max": 53.94274139404297, "pos_frac": 0.8125, "sample": [0.8641529083251953, 3.5406341552734375, 53.94274139404297, -1.8606643676757812, 19.87506103515625, 53.44227600097656, 28.975860595703125, 12.013162612915039, 8.212112426757812, -13.895065307617188, -6.205833435058594, 2.001483917236328, 7.2711029052734375, 31.485015869140625, -1.6349601745605469, -2.94610595703125, 11.365291595458984, 11.812812805175781, 27.402313232421875, 11.668594360351562, 19.685577392578125, 1.9003791809082031, 6.6411590576171875, -20.00273895263672, 43.58892822265625, -1.2106246948242188, 37.235595703125, 9.274555206298828, 23.4205379486084, 26.03516387939453, -0.298126220703125, 23.426834106445312, 32.37375259399414, 2.5889663696289062, 28.709579467773438, 20.780517578125, 17.840835571289062, 28.74059295654297, 4.4032440185546875, 10.862613677978516, -6.862152099609375, 16.817138671875, 50.881507873535156, 34.31610107421875, 37.78211212158203, 38.925559997558594, 27.92444610595703, 35.749359130859375, 29.510581970214844, 11.994972229003906, -13.71030044555664, 0.5487957000732422, 28.658172607421875, 18.381378173828125, -18.78814697265625, 1.2080497741699219, 20.792144775390625, 18.40185546875, 27.85076141357422, 24.406707763671875, -1.9130363464355469, 28.950130462646484, 16.58802032470703, 18.085803985595703], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000310.npy"}
{"epoch": 0.4552129221732746, "step": 311, "batch_size": 64, "mean": 19.584360122680664, "std": 18.303251266479492, "min": -17.944168090820312, "p10": -1.8874106407165505, "median": 18.40692138671875, "p90": 45.89295883178711, "max": 67.26620483398438, "pos_frac": 0.890625, "sample": [2.3043594360351562, 45.053565979003906, 24.018096923828125, 25.99420166015625, -2.872124671936035, 59.45442199707031, -4.988670349121094, 33.594329833984375, 31.471450805664062, 40.809906005859375, 29.610397338867188, 6.250335693359375, 28.84033966064453, 26.133712768554688, -11.470867156982422, 6.297271728515625, 6.8555908203125, 8.246339797973633, 17.586654663085938, 23.78369140625, 8.901496887207031, 45.21562957763672, 42.351318359375, 35.018062591552734, 31.893966674804688, 21.344280242919922, 19.227188110351562, 20.866920471191406, 22.691272735595703, 23.588157653808594, 20.235637664794922, 5.278339385986328, 25.568981170654297, 33.97198486328125, 30.682418823242188, 1.9925384521484375, 7.089668273925781, 47.54130554199219, 9.33294677734375, 53.067352294921875, -6.8005218505859375, 0.41025543212890625, 46.18324279785156, 13.388967514038086, 51.10368347167969, 10.44500732421875, -12.624679565429688, -5.787815093994141, 5.140472412109375, 16.371679306030273, 3.433135986328125, -17.944168090820312, 12.577579498291016, 10.854686737060547, 2.498882293701172, 28.591537475585938, 46.8331298828125, 15.402130126953125, 9.521827697753906, 11.696067810058594, 20.89527130126953, 7.701358795166016, 13.408638000488281, 67.26620483398438], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000311.npy"}
{"epoch": 0.4566813509544787, "step": 312, "batch_size": 64, "mean": 20.303661346435547, "std": 18.56291389465332, "min": -28.04931640625, "p10": -0.1472778320312494, "median": 19.59014892578125, "p90": 44.28424606323244, "max": 71.45013427734375, "pos_frac": 0.890625, "sample": [20.260366439819336, 71.45013427734375, 45.889122009277344, 19.22259521484375, 4.392372131347656, -28.04931640625, 20.96442413330078, 1.4400825500488281, 25.963520050048828, 30.723190307617188, -20.015640258789062, 54.41276550292969, 37.466575622558594, 34.89878845214844, 29.613361358642578, 9.572074890136719, -9.027690887451172, 33.375335693359375, 14.406768798828125, 51.85832977294922, 32.514225006103516, -14.807123184204102, 26.792930603027344, 23.73925018310547, 16.938438415527344, -0.4107093811035156, 50.850494384765625, 40.02130126953125, 37.62535095214844, 19.95770263671875, 15.175559997558594, 36.993927001953125, 21.648147583007812, 14.709037780761719, 19.04034996032715, 23.433929443359375, 10.060565948486328, 24.053936004638672, 10.356880187988281, 30.481666564941406, 59.590118408203125, 13.837833404541016, -4.30572509765625, 14.949958801269531, 21.922054290771484, 12.2637939453125, 0.4673957824707031, 40.53953552246094, 12.945648193359375, 45.905555725097656, 11.427444458007812, 1.7767829895019531, 5.156293869018555, 9.082866668701172, 9.716045379638672, 30.975807189941406, 14.671396255493164, -5.506706237792969, 27.098594665527344, 28.072364807128906, 23.19918441772461, 2.0347747802734375, 18.914321899414062, 16.705936431884766], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000312.npy"}
{"epoch": 0.4581497797356828, "step": 313, "batch_size": 64, "mean": 17.38157081604004, "std": 18.66155242919922, "min": -16.356250762939453, "p10": -5.2365966796875, "median": 17.920848846435547, "p90": 43.88169403076172, "max": 81.89702606201172, "pos_frac": 0.8125, "sample": [38.951507568359375, -2.8844032287597656, 28.79236602783203, 81.89702606201172, 18.155654907226562, 32.13123321533203, 28.886268615722656, 7.605865478515625, 46.974632263183594, -0.5357284545898438, 17.80035400390625, 18.227890014648438, 46.29443359375, 17.09465789794922, 6.601558685302734, -9.080184936523438, 30.832000732421875, 0.727325439453125, 46.357322692871094, -12.874214172363281, 8.66566276550293, 23.813278198242188, -16.356250762939453, 3.3589630126953125, 10.533313751220703, 30.372047424316406, 31.192218780517578, 3.1232070922851562, 18.338348388671875, 43.61305236816406, 22.565414428710938, 22.3565673828125, -9.25906753540039, 45.415069580078125, 18.041343688964844, 11.825538635253906, -5.501483917236328, 6.051849365234375, 3.36248779296875, 26.597793579101562, 13.183235168457031, -9.774009704589844, 14.49652099609375, 4.2106170654296875, 33.808692932128906, 43.480743408203125, 1.0938949584960938, -6.79351806640625, 23.480201721191406, 6.30841064453125, 46.58351135253906, 19.820362091064453, 32.28382110595703, 2.6945295333862305, -4.618526458740234, 21.765289306640625, 13.690093994140625, 19.364089965820312, -2.179986000061035, 14.54534912109375, -4.348152160644531, 20.848342895507812, 24.415267944335938, 43.996826171875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000313.npy"}
{"epoch": 0.45961820851688695, "step": 314, "batch_size": 64, "mean": 19.829458236694336, "std": 19.724374771118164, "min": -19.445892333984375, "p10": -4.366532135009765, "median": 16.259241104125977, "p90": 47.913949584960946, "max": 85.39837646484375, "pos_frac": 0.84375, "sample": [45.91477966308594, -1.0190048217773438, -9.35208511352539, -3.07965087890625, -17.75396728515625, 48.788856506347656, 14.472312927246094, 7.69049072265625, 11.482402801513672, -8.601974487304688, 20.466615676879883, 9.188491821289062, 1.3757781982421875, 53.424293518066406, 27.268348693847656, -4.918052673339844, 4.78399658203125, 17.174388885498047, 48.770965576171875, 85.39837646484375, 33.13218688964844, 4.618659973144531, 13.198307037353516, 29.209823608398438, 2.982513427734375, 11.773849487304688, 23.38990020751953, -0.9868106842041016, -5.0575103759765625, 15.344093322753906, 38.73816680908203, 28.09893798828125, 44.130958557128906, 40.571929931640625, 28.768169403076172, 8.27475357055664, 22.49966049194336, 25.015140533447266, 33.99867248535156, 54.049522399902344, 31.765037536621094, 32.10082244873047, 34.80314636230469, 48.77073669433594, -5.677501678466797, 28.260299682617188, 14.422100067138672, 38.450050354003906, -19.445892333984375, 10.648384094238281, 17.916763305664062, 14.669168472290039, 2.1331634521484375, 20.240760803222656, 25.31854248046875, 36.998870849609375, 13.229583740234375, 14.721817016601562, 12.123199462890625, 25.963394165039062, 3.142608642578125, 3.6768932342529297, 11.499893188476562, 50.12723159790039], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000314.npy"}
{"epoch": 0.461086637298091, "step": 315, "batch_size": 64, "mean": 16.92934799194336, "std": 17.953319549560547, "min": -27.598434448242188, "p10": -6.643329811096189, "median": 15.972198486328125, "p90": 40.13272399902344, "max": 62.269256591796875, "pos_frac": 0.84375, "sample": [14.496795654296875, 36.002323150634766, 25.32970428466797, 3.9101638793945312, -0.8120975494384766, 6.108116149902344, -4.45808219909668, -9.172439575195312, -7.579864501953125, -2.1877098083496094, 32.33409881591797, 1.3012657165527344, -11.293869018554688, 19.660701751708984, 37.02284240722656, 31.12713623046875, 15.816642761230469, 16.988693237304688, 34.995540618896484, 15.241889953613281, 19.7447509765625, 12.579002380371094, 20.087181091308594, 40.33430480957031, 46.63130187988281, 10.831775665283203, 25.696212768554688, 17.363452911376953, 24.33642578125, 14.583259582519531, 37.26933288574219, -11.623565673828125, 32.27215576171875, 7.920093536376953, -12.797821044921875, 47.15008544921875, 32.24778747558594, 7.733455657958984, 22.441635131835938, 1.6595535278320312, 4.379518508911133, 0.847900390625, 62.269256591796875, 8.559587478637695, 21.028274536132812, 20.186233520507812, -14.569931030273438, 2.5319862365722656, 19.483036041259766, 39.66236877441406, 5.231433868408203, 13.680965423583984, 52.741729736328125, 15.319786071777344, 46.741302490234375, 16.12775421142578, 28.73535919189453, 13.071914672851562, 41.63307189941406, 28.34746551513672, -27.598434448242188, 19.653690338134766, 7.359413146972656, 6.762474060058594], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000315.npy"}
{"epoch": 0.46255506607929514, "step": 316, "batch_size": 64, "mean": 17.51339340209961, "std": 18.295547485351562, "min": -19.987403869628906, "p10": -2.495595932006836, "median": 17.8071346282959, "p90": 41.133650970459, "max": 67.61388397216797, "pos_frac": 0.859375, "sample": [2.9650192260742188, 8.930952072143555, 28.039222717285156, 24.352392196655273, 17.825939178466797, 20.239837646484375, 17.31622314453125, 11.178497314453125, 58.09918212890625, 67.61388397216797, 20.508346557617188, 2.2811279296875, 2.6509857177734375, -0.8337478637695312, -2.3620948791503906, 32.119293212890625, 7.370601654052734, 19.837051391601562, 8.29189682006836, 29.593765258789062, 8.626609802246094, 23.213790893554688, 32.8737678527832, -13.925811767578125, 30.245101928710938, 4.7628936767578125, 9.775238037109375, 58.45858383178711, 27.779251098632812, 34.00844955444336, 3.639150619506836, 14.66632080078125, 9.08001708984375, 17.788330078125, 19.822425842285156, 25.348678588867188, 19.58905029296875, -19.987403869628906, -12.5555419921875, 12.924217224121094, 43.575035095214844, 27.347396850585938, 6.4651947021484375, 2.32708740234375, 20.58049774169922, -16.115360260009766, 37.50811004638672, -11.662391662597656, 43.6455078125, 26.912647247314453, 53.9989013671875, 7.02911376953125, -14.569091796875, 10.574441909790039, 29.381622314453125, 11.761112213134766, 21.627761840820312, 23.746490478515625, 1.5924797058105469, 32.4102783203125, 17.567211151123047, 22.86697769165039, -2.5528106689453125, 42.68745422363281], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000316.npy"}
{"epoch": 0.46402349486049926, "step": 317, "batch_size": 64, "mean": 25.990633010864258, "std": 20.00446128845215, "min": -12.653076171875, "p10": 2.593517303466798, "median": 21.747196197509766, "p90": 50.027712249755865, "max": 88.873046875, "pos_frac": 0.921875, "sample": [30.690353393554688, 17.350982666015625, -12.653076171875, 5.987144470214844, 47.14271545410156, 29.183609008789062, 16.0926513671875, 24.919769287109375, 10.287521362304688, 22.231124877929688, 52.266563415527344, 1.6916084289550781, 17.107460021972656, 16.174728393554688, 33.968048095703125, 50.58952331542969, 25.178604125976562, 16.451919555664062, 48.716819763183594, 15.728408813476562, 8.217453002929688, 21.42084503173828, 19.425437927246094, 41.77685546875, -4.620384216308594, 20.965652465820312, 1.9928741455078125, 31.07781982421875, 88.873046875, 45.818511962890625, 25.849197387695312, 15.170791625976562, -0.9294891357421875, 43.76800537109375, 32.33210754394531, 12.393875122070312, 78.9172134399414, 35.159088134765625, 51.47663116455078, 31.72812271118164, 71.87876892089844, 29.30914306640625, -0.32338905334472656, 19.104122161865234, 32.7576904296875, 18.97925567626953, 68.45852661132812, 13.410675048828125, 42.85801696777344, 3.9950180053710938, 13.583209991455078, 22.07354736328125, 9.416728973388672, 27.33873748779297, 12.934700012207031, 15.672473907470703, 13.46246337890625, 47.41084289550781, 36.79465866088867, 38.723854064941406, -1.6714324951171875, 7.3538665771484375, 19.466575622558594, 30.49231719970703], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000317.npy"}
{"epoch": 0.4654919236417034, "step": 318, "batch_size": 64, "mean": 20.04991340637207, "std": 17.953554153442383, "min": -15.488861083984375, "p10": 0.3110050201416016, "median": 18.95071792602539, "p90": 44.39333648681641, "max": 72.8692855834961, "pos_frac": 0.90625, "sample": [8.261676788330078, 33.818145751953125, 34.2940673828125, 40.716033935546875, 7.835849761962891, 24.013904571533203, 13.822114944458008, 30.01500701904297, 6.264015197753906, 42.61604309082031, 0.7640647888183594, 6.730400085449219, 0.30170440673828125, 23.966339111328125, 47.77466583251953, 21.181285858154297, 0.3327064514160156, 26.514467239379883, 27.463607788085938, 10.80888557434082, 32.99174499511719, 9.26409912109375, 4.30340576171875, 44.09852600097656, 9.98883056640625, 36.716365814208984, -6.326918601989746, -15.488861083984375, 30.42095184326172, 22.117050170898438, 23.855674743652344, 48.604736328125, 32.91481018066406, 1.0677413940429688, 14.302597045898438, 16.79633331298828, 72.8692855834961, -1.3888893127441406, 9.852859497070312, 45.286338806152344, 56.101043701171875, 9.72012710571289, 2.47454833984375, 20.02863311767578, 35.577964782714844, 11.533443450927734, 11.993965148925781, 19.56878662109375, -13.100715637207031, -1.185708999633789, 5.460689544677734, 14.954689025878906, 44.519683837890625, 30.697235107421875, 45.513832092285156, 23.851573944091797, 0.89080810546875, 17.452251434326172, 24.802398681640625, 43.3656005859375, -13.699043273925781, 18.33264923095703, 11.444137573242188, 23.15423583984375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000318.npy"}
{"epoch": 0.4669603524229075, "step": 319, "batch_size": 64, "mean": 21.176036834716797, "std": 20.204618453979492, "min": -12.786972045898438, "p10": 0.6861371994018558, "median": 16.50252628326416, "p90": 45.03572845458985, "max": 105.02470397949219, "pos_frac": 0.90625, "sample": [16.061538696289062, 27.708965301513672, 2.714385986328125, 12.454605102539062, 59.66218566894531, 14.39569091796875, 105.02470397949219, 25.397506713867188, 39.39781188964844, 10.899457931518555, 4.707082748413086, 27.713764190673828, 24.868362426757812, 20.353347778320312, 22.522003173828125, 10.197052001953125, 6.714969635009766, 36.637508392333984, 5.619781494140625, 6.979316711425781, 53.883636474609375, -3.0138893127441406, 21.84170150756836, 3.293079376220703, 16.457901000976562, 0.5536975860595703, 14.712451934814453, 40.649200439453125, 4.6656494140625, 47.495365142822266, 16.547151565551758, 18.8228759765625, 8.299560546875, 40.48944854736328, 3.3697128295898438, -10.501548767089844, 24.69770050048828, 9.19349479675293, 2.4394607543945312, 33.16577911376953, 55.97947692871094, 48.51065444946289, 39.34124755859375, -1.4636192321777344, 28.470664978027344, 3.2475814819335938, 39.27008819580078, 35.53926086425781, 42.98925018310547, 45.83372497558594, 30.04461669921875, 15.686321258544922, 10.270835876464844, 31.253135681152344, -12.786972045898438, 18.555686950683594, -0.7403564453125, -1.2518253326416016, 5.55084228515625, 3.2111663818359375, 43.173736572265625, 13.088478088378906, 33.404624938964844, 0.9951629638671875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000319.npy"}
{"epoch": 0.4684287812041116, "step": 320, "batch_size": 64, "mean": 19.236787796020508, "std": 18.908756256103516, "min": -16.240489959716797, "p10": 0.33887844085693375, "median": 16.014575958251953, "p90": 43.19076232910157, "max": 88.96377563476562, "pos_frac": 0.90625, "sample": [2.96185302734375, 35.860069274902344, 68.30976867675781, 3.8038558959960938, 17.28388214111328, 23.727325439453125, 41.2349853515625, 16.422889709472656, 88.96377563476562, 17.472286224365234, 19.788803100585938, -1.71722412109375, -10.054634094238281, 5.500396728515625, 45.57318878173828, 31.02477264404297, 29.436859130859375, -7.0931549072265625, 4.402660369873047, 12.370468139648438, 14.346473693847656, 37.72557830810547, 0.2782459259033203, 27.465545654296875, 5.851043701171875, 14.672813415527344, 26.354400634765625, 16.58917236328125, -11.034210205078125, 13.708614349365234, 5.2823638916015625, 13.421173095703125, 15.60626220703125, 11.276260375976562, 43.67756652832031, 10.573680877685547, 35.036468505859375, 15.45132064819336, 42.05488586425781, 17.9053955078125, 5.8423004150390625, -16.240489959716797, 3.9316062927246094, 25.225807189941406, 31.822967529296875, 16.557174682617188, 44.88383483886719, 4.129997253417969, 3.3030757904052734, 12.424827575683594, 14.227851867675781, 10.1065673828125, 26.91283416748047, 46.82280731201172, 28.277732849121094, 56.71714782714844, 31.00000762939453, 1.8951950073242188, 30.63514518737793, 7.9267120361328125, 19.014877319335938, 28.264942169189453, 0.48035430908203125, -4.5247650146484375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000320.npy"}
{"epoch": 0.4698972099853157, "step": 321, "batch_size": 64, "mean": 21.689363479614258, "std": 20.327064514160156, "min": -17.16840362548828, "p10": -2.3237636566162108, "median": 22.304275512695312, "p90": 47.14752044677737, "max": 73.20794677734375, "pos_frac": 0.8125, "sample": [31.34801483154297, 20.185962677001953, 38.716209411621094, 31.72382354736328, 1.5805244445800781, 12.039073944091797, 39.24342346191406, -0.6310462951660156, 12.105255126953125, 40.55534362792969, 32.866146087646484, 22.2464599609375, -0.28009033203125, -2.476612091064453, -17.16840362548828, 17.415557861328125, 25.69396209716797, 54.80009460449219, 59.40203857421875, 6.7765655517578125, 40.662078857421875, 7.695159912109375, 7.4223480224609375, 24.171844482421875, 52.3209228515625, 10.421150207519531, 25.871437072753906, -6.851238250732422, -5.246253967285156, 15.11199951171875, -1.9671173095703125, 37.87504577636719, -4.8427886962890625, 37.386573791503906, 26.61597442626953, 36.15705108642578, -0.4383544921875, -15.014480590820312, 64.99822998046875, 40.970611572265625, 0.6656494140625, 1.4264144897460938, 6.4731903076171875, 16.61867332458496, 5.568061828613281, 51.23954772949219, 33.577598571777344, -5.239738464355469, 49.76316833496094, 28.793228149414062, 35.739471435546875, 38.40611267089844, 11.567802429199219, 28.104942321777344, 22.362091064453125, 37.293701171875, 34.11467742919922, 14.409133911132812, 5.281719207763672, 7.202552795410156, -1.8601531982421875, 32.896610260009766, 73.20794677734375, 41.044342041015625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000321.npy"}
{"epoch": 0.4713656387665198, "step": 322, "batch_size": 64, "mean": 23.44620132446289, "std": 20.483488082885742, "min": -11.209060668945312, "p10": 0.26068344116211056, "median": 20.2756290435791, "p90": 53.639516448974625, "max": 68.28768920898438, "pos_frac": 0.890625, "sample": [18.80893325805664, 46.21990966796875, 42.88938903808594, 55.156524658203125, 14.653852462768555, -7.081535339355469, -11.209060668945312, 44.31156921386719, 32.86455535888672, 10.107889175415039, -11.091728210449219, 5.2437896728515625, 59.13874816894531, -4.960853576660156, 50.099830627441406, 39.77293395996094, 32.385414123535156, 21.81072998046875, 11.47308349609375, 22.53339385986328, 60.53411865234375, 66.84323120117188, 3.261871337890625, 46.06788635253906, 4.276802062988281, 22.765029907226562, 9.270416259765625, 30.61640167236328, -5.4467010498046875, 31.0311279296875, 18.538558959960938, 11.108783721923828, 48.846378326416016, 16.103416442871094, 29.793956756591797, 13.85995101928711, 8.226455688476562, 2.9719619750976562, 13.472869873046875, 68.28768920898438, 3.63153076171875, 64.4742431640625, 26.64071273803711, 6.754993438720703, 40.72461700439453, 22.531543731689453, 21.742324829101562, -4.050691604614258, 15.930252075195312, -0.235260009765625, 26.200660705566406, 48.68145751953125, 56.72894287109375, 24.198623657226562, 16.924911499023438, 7.7459716796875, 17.553977966308594, 25.033905029296875, 5.280794143676758, 28.727340698242188, 17.636199951171875, 1.4178848266601562, 45.53791809082031, 7.186513900756836], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000322.npy"}
{"epoch": 0.47283406754772395, "step": 323, "batch_size": 64, "mean": 20.465576171875, "std": 16.686498641967773, "min": -9.36846923828125, "p10": -0.7933694839477536, "median": 22.250064849853516, "p90": 42.35022048950196, "max": 69.9289779663086, "pos_frac": 0.875, "sample": [-2.9662017822265625, 27.8759822845459, 30.194732666015625, 6.437965393066406, 22.33342742919922, 15.981132507324219, 35.368675231933594, 57.80279541015625, 7.142446517944336, -3.0998611450195312, 46.24498748779297, 4.2237701416015625, -2.1570816040039062, 27.3697509765625, 9.642387390136719, 40.672035217285156, 4.1583251953125, 23.086149215698242, 23.370887756347656, 40.173309326171875, 7.6781768798828125, -1.9194488525390625, 24.045433044433594, 23.07045555114746, 6.208366394042969, 45.13536071777344, 10.51138687133789, 16.314109802246094, 43.06944274902344, 22.500701904296875, 12.7481689453125, -9.36846923828125, 14.232070922851562, 9.845239639282227, -1.62628173828125, 10.632659912109375, 22.166702270507812, 14.075881958007812, 69.9289779663086, 26.39514923095703, 3.604480743408203, 6.7596282958984375, 23.133380889892578, 38.4383544921875, 24.073043823242188, 33.84558868408203, 14.62674331665039, 25.61545181274414, -0.9211502075195312, 29.72241973876953, 31.737079620361328, 60.66277313232422, 8.910797119140625, 30.91004180908203, 5.027740478515625, 22.852767944335938, 45.47645950317383, 34.07549285888672, 30.197372436523438, 28.476547241210938, 7.460170745849609, -0.49521446228027344, 10.343683242797852, 15.763605117797852], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000323.npy"}
{"epoch": 0.47430249632892807, "step": 324, "batch_size": 64, "mean": 21.645790100097656, "std": 17.04680061340332, "min": -15.365005493164062, "p10": 0.9022457122802738, "median": 20.698474884033203, "p90": 45.86210098266603, "max": 68.06451416015625, "pos_frac": 0.90625, "sample": [22.42536163330078, 21.41779327392578, 24.879512786865234, 19.20602798461914, 4.388589859008789, 68.06451416015625, 18.22869873046875, 20.817718505859375, 8.156387329101562, 14.030193328857422, 58.521575927734375, -15.365005493164062, 56.974143981933594, 11.640338897705078, 18.199691772460938, 38.31719970703125, 26.41333770751953, 38.05853271484375, 16.6632080078125, 36.393001556396484, 14.660476684570312, 9.607746124267578, -11.95096206665039, 20.188018798828125, 35.72564697265625, 23.800682067871094, 28.679550170898438, 0.7602691650390625, 13.275611877441406, 47.31025695800781, 53.58837890625, 28.362281799316406, 24.900650024414062, 14.8704833984375, 29.017044067382812, -9.421005249023438, 16.457504272460938, 33.390193939208984, 4.199972152709961, 23.689498901367188, 24.596664428710938, 5.160186767578125, 31.36090087890625, 25.631874084472656, 6.1912689208984375, -0.5054550170898438, 1.2335243225097656, 17.536468505859375, -0.07875823974609375, 42.483070373535156, 28.420440673828125, 50.567283630371094, 20.57923126220703, -0.776580810546875, 24.47540283203125, 11.801513671875, 15.558250427246094, 2.5982627868652344, 27.917890548706055, 55.980987548828125, 16.928333282470703, 25.485443115234375, 24.1715087890625, 19.469722747802734], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000324.npy"}
{"epoch": 0.47577092511013214, "step": 325, "batch_size": 64, "mean": 19.893634796142578, "std": 17.9092960357666, "min": -24.0391845703125, "p10": -1.2135105133056638, "median": 17.429277420043945, "p90": 43.773314285278325, "max": 56.41266632080078, "pos_frac": 0.859375, "sample": [-4.115663528442383, -24.0391845703125, 27.006694793701172, 3.2594337463378906, 23.38717269897461, 37.07194519042969, 32.04887771606445, -10.737541198730469, -1.3451004028320312, 36.50022888183594, 16.866073608398438, 29.545394897460938, 22.972900390625, 12.187061309814453, 10.276802062988281, 7.003053665161133, 46.549285888671875, 22.942604064941406, -3.8590087890625, 42.76185989379883, 37.3841552734375, 46.908294677734375, 35.508460998535156, 55.90808868408203, 8.597923278808594, 1.1546592712402344, 34.35020446777344, 23.777751922607422, 11.87530517578125, 4.764346122741699, 48.38710021972656, 13.852386474609375, 29.994476318359375, 3.319915771484375, 13.160072326660156, 8.91793441772461, 56.41266632080078, 16.033945083618164, 8.023651123046875, 42.25166702270508, -0.79150390625, -0.9064674377441406, 24.640365600585938, 2.819507598876953, 33.8662109375, 10.29583740234375, 9.435958862304688, 15.8031005859375, -5.4532318115234375, 18.11603546142578, 27.73290252685547, 3.9329376220703125, 7.425022125244141, 44.20679473876953, 55.23072814941406, 13.445869445800781, 39.70233154296875, 26.06676483154297, 34.2587890625, 17.526443481445312, 23.74615478515625, -9.10308837890625, 17.332111358642578, 36.997318267822266], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000325.npy"}
{"epoch": 0.47723935389133626, "step": 326, "batch_size": 64, "mean": 19.341712951660156, "std": 17.360013961791992, "min": -15.971153259277344, "p10": -0.9039653778076168, "median": 15.727771759033203, "p90": 41.72540969848633, "max": 70.80043029785156, "pos_frac": 0.875, "sample": [22.95001220703125, 36.813636779785156, -15.971153259277344, -1.0839729309082031, 11.265296936035156, -10.9739990234375, 4.572460174560547, 8.682380676269531, 45.03710174560547, 3.133808135986328, 40.51092529296875, 15.171432495117188, 70.80043029785156, 14.563583374023438, -7.087795257568359, -1.9903430938720703, 5.670871734619141, 12.02528190612793, 15.150836944580078, 54.379615783691406, 17.929473876953125, 26.12171173095703, 34.37098693847656, 29.577133178710938, 30.884536743164062, 9.795364379882812, 25.027381896972656, 10.209365844726562, 6.73065185546875, 4.950965881347656, 27.528854370117188, 36.48053741455078, 36.81846618652344, 30.304553985595703, 21.41912078857422, 29.542396545410156, 15.337936401367188, 11.524482727050781, 2.4368858337402344, 32.45556640625, -6.216339111328125, 8.872184753417969, 9.01974105834961, 29.978912353515625, 45.365989685058594, 16.966888427734375, 23.790992736816406, 23.66168212890625, 16.11760711669922, 15.087139129638672, -3.5355606079101562, 42.67755126953125, 13.360595703125, 7.4941253662109375, 42.24590301513672, 59.49043273925781, 21.719253540039062, 39.412353515625, -0.48394775390625, 2.6257858276367188, 24.095458984375, 31.285598754882812, 11.121139526367188, 0.6493682861328125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000326.npy"}
{"epoch": 0.4787077826725404, "step": 327, "batch_size": 64, "mean": 23.18556785583496, "std": 20.871238708496094, "min": -15.094657897949219, "p10": -1.3773019790649412, "median": 22.671682357788086, "p90": 48.42455444335939, "max": 87.29850006103516, "pos_frac": 0.859375, "sample": [35.386962890625, -0.9371681213378906, 44.39954376220703, 40.08949279785156, 11.756057739257812, 35.35395050048828, -1.499837875366211, 29.455650329589844, 40.114715576171875, 37.33784484863281, 44.84724426269531, 12.308837890625, -4.0019989013671875, 73.67739868164062, 14.609004974365234, -15.094657897949219, -7.838172912597656, 40.99773406982422, 26.41220474243164, 35.22697448730469, -6.815330505371094, 31.388458251953125, 87.29850006103516, 30.778797149658203, 6.1108245849609375, 32.35430908203125, 34.2219352722168, 3.8293685913085938, 29.55205535888672, 15.8292236328125, 8.175430297851562, 36.15342712402344, 20.796188354492188, 22.45766830444336, 25.809730529785156, 30.738807678222656, 0.7163276672363281, 22.885696411132812, 5.94708251953125, 25.31026840209961, -9.447402954101562, 5.787742614746094, 9.618392944335938, 69.91230773925781, 8.969207763671875, 26.01024627685547, 6.940376281738281, 51.679039001464844, 36.61376953125, 27.7730712890625, -2.1235580444335938, -1.0913848876953125, 55.60108184814453, 6.206123352050781, 50.72715759277344, 6.598512649536133, 22.245742797851562, 49.95768737792969, 8.113113403320312, 4.280738830566406, 22.080608367919922, 43.66719055175781, 8.723037719726562, 18.892929077148438], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000327.npy"}
{"epoch": 0.4801762114537445, "step": 328, "batch_size": 64, "mean": 22.190967559814453, "std": 20.07547950744629, "min": -12.905364990234375, "p10": -1.8973997116088865, "median": 19.957218170166016, "p90": 47.84672470092774, "max": 77.4508285522461, "pos_frac": 0.875, "sample": [33.239288330078125, 9.926864624023438, 25.881057739257812, 6.77386474609375, 26.107398986816406, 33.7320556640625, 13.093746185302734, 32.21784973144531, 27.831802368164062, 0.15020751953125, 22.501205444335938, 2.6595916748046875, 36.62838363647461, 17.797271728515625, 11.909889221191406, 27.90118408203125, -12.581869125366211, -6.029548645019531, 15.003982543945312, 38.34344482421875, 8.2034912109375, 26.7879638671875, 54.09406280517578, 77.4508285522461, 9.639972686767578, 32.96110534667969, -2.02178955078125, -4.9628448486328125, 32.768760681152344, 58.01183319091797, 20.56175994873047, 47.806419372558594, 41.835533142089844, 14.631683349609375, 3.0998268127441406, 11.886703491210938, 7.0635986328125, 40.683815002441406, 13.377532958984375, 3.1485061645507812, -1.607156753540039, 41.80773162841797, -3.9431724548339844, 28.85820770263672, 9.188217163085938, 59.31935501098633, -12.506200790405273, 19.352676391601562, 45.564788818359375, 18.593978881835938, 51.52723693847656, 70.24638366699219, 27.181594848632812, -12.905364990234375, 6.290554046630859, 45.923431396484375, 47.86399841308594, 21.78204345703125, 13.731636047363281, 24.062088012695312, 16.74871826171875, 28.291641235351562, 9.588056564331055, 5.145027160644531], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000328.npy"}
{"epoch": 0.48164464023494863, "step": 329, "batch_size": 64, "mean": 22.457298278808594, "std": 19.58085060119629, "min": -23.36724853515625, "p10": -2.3776542663574216, "median": 21.0609130859375, "p90": 50.704312896728524, "max": 67.48202514648438, "pos_frac": 0.875, "sample": [51.41602325439453, 32.29087829589844, 62.875762939453125, 28.447341918945312, 19.803451538085938, 6.296211242675781, 31.76470184326172, -1.94049072265625, 67.48202514648438, 24.56684112548828, 60.172393798828125, 23.556846618652344, 1.9291267395019531, 30.166244506835938, 8.317398071289062, 4.737918853759766, 7.0982513427734375, 30.102813720703125, 15.547897338867188, 49.04365539550781, 18.28081512451172, 14.4710693359375, 26.952659606933594, 10.919597625732422, 28.425735473632812, 3.110137939453125, -10.550106048583984, 35.200042724609375, 23.183876037597656, 52.61161804199219, 14.483154296875, -4.72906494140625, 46.19519805908203, 6.879299163818359, 31.044265747070312, 22.318374633789062, 46.020469665527344, 60.284393310546875, 28.894622802734375, 26.676921844482422, -2.5650100708007812, -3.4690170288085938, 24.27032470703125, 34.14482116699219, 11.9129638671875, 12.257591247558594, 55.281185150146484, -5.505584716796875, 17.77535629272461, 15.747238159179688, 13.017833709716797, 9.94891357421875, -7.78521728515625, 19.60137176513672, 30.87757110595703, 18.339508056640625, 28.112350463867188, 15.197433471679688, 48.53485107421875, 35.084808349609375, 3.58392333984375, 7.126802444458008, 44.7960205078125, -23.36724853515625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000329.npy"}
{"epoch": 0.4831130690161527, "step": 330, "batch_size": 64, "mean": 20.760862350463867, "std": 15.60470199584961, "min": -18.9423828125, "p10": 1.1366756439208994, "median": 19.291427612304688, "p90": 39.76109237670899, "max": 67.68428039550781, "pos_frac": 0.921875, "sample": [9.5731201171875, 17.450942993164062, -0.036468505859375, 2.4210052490234375, 38.459632873535156, 4.403129577636719, 12.427104949951172, 31.93134307861328, 18.901657104492188, 10.07308578491211, 5.684803009033203, 29.251266479492188, 41.946781158447266, 25.603416442871094, 38.622406005859375, 22.83081817626953, 23.027000427246094, 19.681198120117188, 16.46062469482422, 26.51732635498047, 20.43878173828125, 0.763336181640625, 14.535858154296875, -7.447731018066406, 18.2794189453125, 25.131866455078125, 25.233402252197266, -3.122478485107422, 0.17865371704101562, 17.048160552978516, 28.305519104003906, 15.987113952636719, 30.265640258789062, 9.709587097167969, 18.083419799804688, 39.42835235595703, 20.67790985107422, 39.74957275390625, 27.845901489257812, 24.425186157226562, 2.007801055908203, 39.766029357910156, 15.544795989990234, 45.54252624511719, 3.9264984130859375, 67.68428039550781, 37.4957275390625, 17.932331085205078, 24.510772705078125, 11.038345336914062, 26.450729370117188, 50.685691833496094, -4.669883728027344, 17.71337890625, 26.20880889892578, 45.83095169067383, 9.091976165771484, 8.90557861328125, -18.9423828125, 20.695602416992188, 44.638877868652344, 18.25426483154297, 39.696807861328125, 17.938018798828125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000330.npy"}
{"epoch": 0.4845814977973568, "step": 331, "batch_size": 64, "mean": 22.217872619628906, "std": 18.041772842407227, "min": -21.483016967773438, "p10": 2.315803527832032, "median": 21.256881713867188, "p90": 45.73589019775391, "max": 64.48709106445312, "pos_frac": 0.90625, "sample": [6.2992095947265625, 58.33172607421875, 33.354095458984375, 18.539108276367188, 20.663917541503906, -18.11663818359375, 8.505558013916016, 3.25238037109375, 10.198234558105469, 44.13201904296875, 33.367530822753906, 6.756290435791016, 42.925506591796875, 24.758056640625, 41.718963623046875, -5.422454833984375, 46.42326354980469, 4.789348602294922, 22.626312255859375, 49.073089599609375, 33.08671569824219, 31.237239837646484, 64.48709106445312, 13.297187805175781, 10.9019775390625, 7.4704742431640625, 26.112438201904297, -15.098247528076172, 53.214935302734375, 31.44970703125, 15.17585563659668, 17.092992782592773, 25.589630126953125, 1.9144134521484375, 34.975341796875, 8.39178466796875, 5.127891540527344, -1.5857715606689453, 8.824954986572266, 18.141437530517578, 20.366092681884766, 28.92443084716797, -0.8780078887939453, 60.09052276611328, 21.848678588867188, 29.030189514160156, 35.70979690551758, 33.92286682128906, 50.83768844604492, 30.266387939453125, 16.546905517578125, 27.877098083496094, 20.665084838867188, 17.359329223632812, -21.483016967773438, 9.390892028808594, 20.171871185302734, 18.015426635742188, 11.364395141601562, 31.190147399902344, 23.816673278808594, 35.20442199707031, 23.936264038085938, 35.78609085083008], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000331.npy"}
{"epoch": 0.48604992657856094, "step": 332, "batch_size": 64, "mean": 16.549270629882812, "std": 16.93157386779785, "min": -15.973541259765625, "p10": -5.798826217651367, "median": 15.807489395141602, "p90": 36.98094940185547, "max": 64.96856689453125, "pos_frac": 0.875, "sample": [46.63923645019531, 1.2159423828125, 24.809906005859375, 17.223602294921875, 6.760261535644531, -5.71453857421875, 18.06719207763672, 34.89110565185547, 18.78155517578125, 10.658256530761719, 36.23143768310547, -6.922477722167969, -6.280181884765625, 53.82672119140625, 8.868450164794922, 10.542747497558594, 16.905685424804688, 64.96856689453125, 11.084968566894531, 0.4361562728881836, 10.447998046875, 26.211334228515625, 25.999122619628906, 2.0137176513671875, 47.56730651855469, 24.207462310791016, 14.078781127929688, -15.973541259765625, 7.86822509765625, 23.048397064208984, 30.236507415771484, 15.362394332885742, -9.77999496459961, 13.487358093261719, 25.209854125976562, 8.710441589355469, 33.61530303955078, 37.071929931640625, 6.367204666137695, 27.686080932617188, 18.477874755859375, -15.716178894042969, 36.76866149902344, 11.410087585449219, 16.735607147216797, 32.75077819824219, 16.396141052246094, 48.75305938720703, 23.270095825195312, 3.3519287109375, 29.77278709411621, 4.067192077636719, 16.011463165283203, 0.348541259765625, 38.87938690185547, 3.9307498931884766, 9.296432495117188, 1.1625709533691406, 4.349392890930176, 15.603515625, 18.301467895507812, 24.99365997314453, -5.834949493408203, -10.35748291015625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000332.npy"}
{"epoch": 0.48751835535976507, "step": 333, "batch_size": 64, "mean": 22.598552703857422, "std": 24.571687698364258, "min": -69.85794067382812, "p10": -1.216615295410156, "median": 22.626304626464844, "p90": 53.956854248046874, "max": 84.97463989257812, "pos_frac": 0.8125, "sample": [14.925613403320312, -3.6539764404296875, 51.24365234375, -10.620628356933594, 19.187164306640625, 49.418670654296875, -13.578811645507812, 42.688377380371094, -0.7970809936523438, -0.06614494323730469, 32.50628662109375, -0.7178716659545898, 27.76019287109375, 9.814910888671875, 55.177276611328125, 32.14227294921875, 9.51650619506836, 17.860576629638672, 27.114578247070312, 8.529518127441406, -0.10345840454101562, 32.514286041259766, 23.297225952148438, 34.2784423828125, 26.194656372070312, 31.193763732910156, 24.219058990478516, 12.366409301757812, 12.283782958984375, 45.86393737792969, 10.237710952758789, 4.538400650024414, 43.37596130371094, 59.9095458984375, 65.74693298339844, -15.016983032226562, 2.032289505004883, -4.02061653137207, 25.000646591186523, 21.95538330078125, 20.126968383789062, 55.345184326171875, -1.3964157104492188, 50.95378112792969, 25.7152099609375, 16.953319549560547, 3.460845947265625, 35.71734619140625, 84.97463989257812, 45.24658966064453, 39.362327575683594, 46.07998275756836, 61.71149444580078, -0.17359161376953125, 25.625579833984375, 15.194744110107422, 25.7371826171875, 8.409862518310547, 3.9445571899414062, 13.387779235839844, 7.69769287109375, -69.85794067382812, 53.70838928222656, 54.06333923339844], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000333.npy"}
{"epoch": 0.4889867841409692, "step": 334, "batch_size": 64, "mean": 19.693979263305664, "std": 18.398378372192383, "min": -25.59930419921875, "p10": -1.6514755249023436, "median": 18.95949363708496, "p90": 43.95934524536133, "max": 65.65348815917969, "pos_frac": 0.84375, "sample": [39.38353729248047, 13.8106689453125, 10.924556732177734, 30.218433380126953, 40.2393798828125, -1.6168365478515625, 9.277381896972656, 5.813350677490234, 9.464765548706055, 19.551666259765625, 4.1615753173828125, 2.1789093017578125, -2.8049240112304688, 16.562042236328125, 18.367321014404297, 7.138130187988281, 16.53396224975586, 32.202613830566406, -25.59930419921875, 23.458999633789062, 6.9936065673828125, 22.159950256347656, 44.10839080810547, -2.397817611694336, 27.5927734375, 44.28181457519531, 26.386600494384766, 1.4545135498046875, 20.987548828125, 9.635116577148438, -1.87384033203125, 10.426126480102539, 43.611572265625, 17.36300277709961, 6.4658660888671875, 17.042694091796875, 23.3046875, 37.777252197265625, 20.86947250366211, 0.6644973754882812, 48.563819885253906, -11.323562622070312, 36.17222595214844, -2.865692138671875, 35.726951599121094, 36.67823028564453, 1.7198638916015625, 8.11846923828125, 52.7906494140625, 39.035587310791016, 24.002220153808594, 39.674163818359375, 30.483638763427734, 20.389991760253906, 8.147354125976562, 48.08498001098633, 25.69208526611328, 27.933975219726562, -1.66632080078125, 62.62107849121094, -0.8110160827636719, 65.65348815917969, 20.859840393066406, -1.3874053955078125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000334.npy"}
{"epoch": 0.49045521292217326, "step": 335, "batch_size": 64, "mean": 17.365917205810547, "std": 17.437580108642578, "min": -24.264596939086914, "p10": -4.168155670166016, "median": 18.015140533447266, "p90": 40.00481414794922, "max": 62.2706184387207, "pos_frac": 0.796875, "sample": [-12.82940673828125, 41.170166015625, 7.912384033203125, 18.893978118896484, 24.184738159179688, 21.03421401977539, 18.654022216796875, 19.5120849609375, 36.1414794921875, -24.264596939086914, 6.154327392578125, -4.2356414794921875, -5.6812744140625, -2.7951736450195312, 39.3826904296875, 31.05584716796875, 26.462615966796875, 33.16101837158203, 6.726909637451172, -4.010688781738281, 41.56199645996094, 13.640281677246094, 16.396141052246094, -3.1262588500976562, 20.203964233398438, 18.555557250976562, 31.774154663085938, -16.041305541992188, -4.536531448364258, 25.693771362304688, 46.50663757324219, 15.386272430419922, 29.239303588867188, 9.023788452148438, 59.133079528808594, 21.180744171142578, 62.2706184387207, 17.47472381591797, 25.023582458496094, -7.2176513671875, 6.3190765380859375, 20.625125885009766, 11.913497924804688, 9.20074462890625, 32.139190673828125, 2.7508392333984375, 15.856670379638672, 25.522720336914062, -3.5160675048828125, -1.387298583984375, 16.473922729492188, 8.978466033935547, 40.27143859863281, 14.494827270507812, -1.9867134094238281, 39.0031623840332, 37.08940124511719, 41.579872131347656, 10.053359985351562, 23.25737762451172, 4.495414733886719, 19.514053344726562, 17.031587600708008, 22.965450286865234], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000335.npy"}
{"epoch": 0.4919236417033774, "step": 336, "batch_size": 64, "mean": 21.134733200073242, "std": 21.54682731628418, "min": -34.122962951660156, "p10": 0.9192016601562509, "median": 20.166473388671875, "p90": 42.874871826171876, "max": 118.0811767578125, "pos_frac": 0.90625, "sample": [23.682022094726562, 4.546718597412109, 29.977569580078125, 19.106388092041016, 13.963455200195312, -34.122962951660156, 1.8551483154296875, 6.061561584472656, 5.5897979736328125, 54.701873779296875, -5.144351959228516, 9.433425903320312, 10.038948059082031, -0.8093948364257812, 118.0811767578125, 4.301544189453125, 61.45991516113281, 22.61978530883789, 21.846481323242188, 2.2024688720703125, 5.209438323974609, 44.69157409667969, 32.91730499267578, 40.303062438964844, 20.363128662109375, 29.982147216796875, 24.737764358520508, 23.391937255859375, 3.1048736572265625, 12.274703979492188, 7.039653778076172, 23.377962112426758, 19.969818115234375, 0.5180816650390625, 27.84166717529297, 4.7846832275390625, -5.4370574951171875, 12.937065124511719, 29.376880645751953, 36.48956298828125, 42.23274230957031, 41.665557861328125, 28.54041290283203, 7.483924865722656, 66.62602233886719, 18.62451171875, -0.09856414794921875, 25.925003051757812, 19.039199829101562, 20.849594116210938, 35.867088317871094, 26.599443435668945, 36.3685302734375, 13.317615509033203, 28.50440216064453, -13.9239501953125, 13.726631164550781, 39.13909912109375, 6.576877593994141, 7.4062347412109375, 24.705612182617188, 43.15007019042969, 43.83905029296875, 13.191986083984375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000336.npy"}
{"epoch": 0.4933920704845815, "step": 337, "batch_size": 64, "mean": 19.975955963134766, "std": 17.926420211791992, "min": -12.99578857421875, "p10": -2.2264442443847647, "median": 16.113021850585938, "p90": 46.61615066528321, "max": 71.17708587646484, "pos_frac": 0.859375, "sample": [32.02099609375, 53.29219436645508, -2.5580368041992188, 4.364719390869141, 32.57661437988281, 37.30763244628906, 33.40223693847656, 24.655269622802734, 36.24591827392578, 36.94633483886719, 13.322280883789062, 29.191078186035156, 15.804931640625, 3.2395172119140625, 49.200469970703125, 14.526031494140625, 23.08102798461914, 22.91234588623047, 16.44994354248047, 45.93318176269531, 15.937515258789062, 8.908157348632812, 5.862506866455078, -1.0717849731445312, 36.51108169555664, -7.07708740234375, 23.057952880859375, 42.489593505859375, 3.4102325439453125, 11.390899658203125, 11.12982177734375, 16.10327911376953, -8.79315185546875, -11.34384536743164, 15.024093627929688, -4.281440734863281, 14.834602355957031, 15.29888916015625, 5.1275787353515625, 22.47284698486328, 17.7879638671875, 71.17708587646484, 5.067268371582031, 11.846954345703125, -6.629478454589844, 20.569583892822266, 59.224761962890625, 15.246788024902344, 33.579864501953125, 2.1080398559570312, 25.79499053955078, 15.064529418945312, 15.375457763671875, 47.81977844238281, -12.99578857421875, 25.769317626953125, 19.666427612304688, 48.85791015625, 16.122764587402344, 46.908851623535156, 28.440399169921875, 10.578857421875, -1.452728271484375, 25.625099182128906], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000337.npy"}
{"epoch": 0.4948604992657856, "step": 338, "batch_size": 64, "mean": 17.551193237304688, "std": 16.123653411865234, "min": -31.370574951171875, "p10": 0.8558792114257818, "median": 19.400233268737793, "p90": 34.41931228637695, "max": 59.9412841796875, "pos_frac": 0.921875, "sample": [2.7785568237304688, 34.36484909057617, 11.842201232910156, 0.6122093200683594, 23.1407470703125, 22.456832885742188, 16.228729248046875, 20.608444213867188, 55.730682373046875, 22.773284912109375, 32.27555847167969, 9.59063720703125, 24.199073791503906, 30.412017822265625, -31.370574951171875, 22.622879028320312, 27.020286560058594, 30.171493530273438, 36.55648422241211, 21.065269470214844, 9.298103332519531, 15.063827514648438, 19.594703674316406, -22.63201904296875, 5.86175537109375, 32.090110778808594, 11.624385833740234, -3.1565895080566406, 14.718124389648438, 6.174701690673828, 24.33434295654297, 18.601356506347656, 30.122562408447266, -8.702312469482422, 4.5933837890625, 14.736839294433594, 6.5351409912109375, 12.649538040161133, 34.44265365600586, 8.496719360351562, 25.98163604736328, 1.4244422912597656, 5.8708343505859375, -17.910781860351562, 41.44036865234375, 20.71611785888672, 40.48997497558594, 11.786140441894531, 25.10300064086914, 19.20576286315918, 24.96539306640625, 0.2678394317626953, 43.43836212158203, 16.976478576660156, 21.307968139648438, 29.7607421875, 59.9412841796875, 15.688858032226562, 22.869369506835938, 27.63582420349121, 21.2509765625, 11.14303207397461, 3.393134117126465, 3.0025405883789062], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000338.npy"}
{"epoch": 0.49632892804698975, "step": 339, "batch_size": 64, "mean": 20.968585968017578, "std": 21.402843475341797, "min": -22.995346069335938, "p10": -3.6732536315917943, "median": 18.482357025146484, "p90": 51.109869384765624, "max": 79.810791015625, "pos_frac": 0.84375, "sample": [-0.34804534912109375, 29.531158447265625, 52.16967010498047, 16.202007293701172, 17.95880889892578, 11.679962158203125, 8.305221557617188, -0.3269004821777344, 35.839195251464844, -13.777629852294922, 79.810791015625, 19.27349090576172, 30.91192626953125, -11.068222045898438, 5.651123046875, 24.151241302490234, -5.285243988037109, -1.1383438110351562, 3.360321044921875, 31.066665649414062, 41.753597259521484, 15.462268829345703, -22.995346069335938, 15.524028778076172, 17.652236938476562, 25.717817306518555, 27.626693725585938, 1.4841156005859375, 54.18211364746094, 54.32825469970703, 2.789905548095703, 38.32879638671875, -21.63238525390625, 19.544204711914062, 34.390380859375, 15.552230834960938, 13.698226928710938, 12.820457458496094, 38.3173828125, 11.116279602050781, 22.421371459960938, 11.442466735839844, 10.56524658203125, 19.2607421875, 22.791709899902344, 67.09846496582031, -6.436836242675781, 48.213958740234375, 25.92572021484375, 11.475631713867188, 28.083866119384766, 70.74263000488281, 41.02438735961914, 1.826141357421875, 38.836551666259766, 50.91651916503906, 1.475341796875, 24.597671508789062, 19.005905151367188, -4.7596435546875, 51.19273376464844, 37.39723205566406, 10.306877136230469, 8.956329345703125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000339.npy"}
{"epoch": 0.4977973568281938, "step": 340, "batch_size": 64, "mean": 18.20838737487793, "std": 17.331825256347656, "min": -32.26866149902344, "p10": -3.4873321533203123, "median": 19.416549682617188, "p90": 36.25122985839844, "max": 79.22940063476562, "pos_frac": 0.875, "sample": [31.93952178955078, 36.41162109375, 24.944690704345703, 19.34405517578125, -17.57415771484375, 12.517982482910156, 13.640857696533203, 30.597862243652344, 24.12298583984375, 9.633186340332031, 32.30513381958008, 79.22940063476562, 22.251861572265625, 4.403200149536133, 35.876983642578125, 11.109626770019531, 25.79265594482422, -14.246543884277344, 45.56212615966797, 7.457492828369141, 26.754920959472656, 25.980140686035156, -32.26866149902344, 45.99003601074219, 22.539947509765625, 21.90117645263672, -5.6271820068359375, 7.402374267578125, 23.677692413330078, 18.956985473632812, 27.669055938720703, 45.54034423828125, 21.11456298828125, 20.606979370117188, 8.166938781738281, 7.031032562255859, 30.594139099121094, -3.4894790649414062, 2.432727813720703, 24.170042037963867, 11.4483642578125, 19.489044189453125, 22.31012725830078, 13.556068420410156, 8.171188354492188, 39.40878677368164, -5.9552001953125, 18.611339569091797, 16.073421478271484, 26.987442016601562, 13.343231201171875, 51.257301330566406, 15.754608154296875, 12.981109619140625, 12.423404693603516, 25.430084228515625, -9.326427459716797, 14.606128692626953, 29.772430419921875, 26.755001068115234, 1.3063278198242188, 22.683258056640625, -3.4823226928710938, 7.2677154541015625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000340.npy"}
{"epoch": 0.49926578560939794, "step": 341, "batch_size": 64, "mean": 21.244434356689453, "std": 17.815170288085938, "min": -16.740814208984375, "p10": -0.38436508178710693, "median": 22.418479919433594, "p90": 41.66487045288086, "max": 67.63848876953125, "pos_frac": 0.890625, "sample": [48.18754577636719, 3.98687744140625, -3.8541297912597656, 8.063041687011719, 15.664802551269531, 28.939254760742188, -3.680206298828125, 26.053802490234375, 46.863861083984375, 23.875713348388672, 19.768299102783203, 22.274768829345703, 18.765209197998047, -7.5583038330078125, 11.146717071533203, 2.665149688720703, 2.0096817016601562, 61.746131896972656, 6.837902069091797, -1.4103851318359375, 4.983407974243164, 41.01216125488281, 24.47894287109375, 24.056190490722656, 31.455360412597656, 33.850128173828125, 10.147956848144531, 10.454322814941406, 4.6499176025390625, 22.753982543945312, 25.137924194335938, 67.63848876953125, 59.68426513671875, 38.19306945800781, 34.40855026245117, 40.425384521484375, 39.147499084472656, -6.493049621582031, 4.668281555175781, 16.573589324951172, 24.623489379882812, -16.740814208984375, 26.9381160736084, 9.554275512695312, 7.938720703125, 27.504150390625, 20.572998046875, 35.38299560546875, -4.946807861328125, 23.180564880371094, 22.562191009521484, 9.474163055419922, 7.8334808349609375, 13.2569580078125, 19.389892578125, 16.314285278320312, 41.944602966308594, 4.782135009765625, 58.33509826660156, 27.058837890625, 31.71112060546875, 38.18858337402344, 28.859703063964844, 28.35303497314453], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000341.npy"}
{"epoch": 0.5007342143906021, "step": 342, "batch_size": 64, "mean": 25.108253479003906, "std": 17.259349822998047, "min": -12.153549194335938, "p10": 2.4043842315673833, "median": 25.054880142211914, "p90": 46.09967193603516, "max": 71.2644271850586, "pos_frac": 0.921875, "sample": [34.11083984375, 21.674591064453125, 30.243621826171875, 33.71361541748047, 14.290807723999023, 26.494503021240234, 35.431243896484375, 16.29730224609375, 22.349365234375, 2.185832977294922, 30.30577850341797, 16.045852661132812, 47.28410339355469, 35.26055145263672, 37.87507629394531, 39.213653564453125, 23.167613983154297, 26.434646606445312, 45.647239685058594, -4.02154541015625, 22.528656005859375, 2.914337158203125, 21.645263671875, 8.640045166015625, 29.239154815673828, -3.2731475830078125, 23.44847869873047, 16.46636962890625, 26.876480102539062, -12.153549194335938, 4.327995300292969, 54.53388214111328, 41.46162414550781, 5.961845397949219, 9.852203369140625, 13.427604675292969, 22.79000473022461, 65.6340560913086, 71.2644271850586, 23.71202850341797, 1.4684257507324219, 30.388351440429688, 15.740734100341797, 6.33038330078125, 41.659423828125, 53.91487121582031, 29.57525634765625, 25.948097229003906, 38.67436218261719, 15.124740600585938, 46.29357147216797, 43.291160583496094, 52.60878372192383, 28.691665649414062, 24.161663055419922, 33.689659118652344, 23.748092651367188, 8.302696228027344, 12.227676391601562, -6.555908203125, -2.085845947265625, 30.64199447631836, 40.623680114746094, 29.1622314453125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000342.npy"}
{"epoch": 0.5022026431718062, "step": 343, "batch_size": 64, "mean": 22.402755737304688, "std": 17.28380584716797, "min": -11.694999694824219, "p10": 6.482577514648438, "median": 18.95611572265625, "p90": 40.09604034423828, "max": 81.40474700927734, "pos_frac": 0.9375, "sample": [21.771377563476562, 2.5152435302734375, 20.191650390625, 6.7297821044921875, 16.486080169677734, -0.0267791748046875, 12.69614028930664, 8.848777770996094, 22.691192626953125, 35.956939697265625, 16.845596313476562, 64.7626953125, 9.785049438476562, 35.858829498291016, 19.729087829589844, 11.268646240234375, 7.327251434326172, 13.618705749511719, 35.001312255859375, 16.985183715820312, 23.13613510131836, 26.88300323486328, -11.694999694824219, 56.99397277832031, 51.57253646850586, 8.221389770507812, 14.767017364501953, 10.567794799804688, 58.97100830078125, 28.013084411621094, 33.19141387939453, 15.806657791137695, 6.3766326904296875, 18.784278869628906, 32.59556579589844, 20.441192626953125, 8.562332153320312, -2.5015830993652344, 6.818456649780273, 13.211761474609375, 15.61004638671875, 9.36181640625, 14.290428161621094, 15.462760925292969, 32.126033782958984, 33.86552429199219, 40.4375, 37.15331268310547, 30.364337921142578, 24.11492919921875, 38.88015365600586, 17.045074462890625, 37.182743072509766, 7.091335296630859, 32.64118957519531, 42.0595703125, 19.127952575683594, 23.081680297851562, 39.29930114746094, 4.669258117675781, 12.098457336425781, 38.17059326171875, 81.40474700927734, -11.5228271484375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000343.npy"}
{"epoch": 0.5036710719530103, "step": 344, "batch_size": 64, "mean": 20.87055015563965, "std": 19.24639320373535, "min": -15.248504638671875, "p10": -2.6181179046630842, "median": 18.688480377197266, "p90": 46.468514633178714, "max": 76.62260437011719, "pos_frac": 0.875, "sample": [11.905885696411133, -6.041728973388672, -15.248504638671875, -6.748359680175781, 26.944305419921875, 19.26617431640625, 13.376091003417969, 17.23779296875, 15.806320190429688, 59.44294738769531, 9.033611297607422, 12.965965270996094, -3.3992080688476562, 34.724998474121094, 28.085494995117188, 21.235435485839844, 58.622802734375, 39.21917724609375, 21.290916442871094, 21.174095153808594, 19.51943588256836, 11.293062210083008, 2.11923885345459, 7.3287811279296875, 8.533439636230469, 33.73333740234375, 1.0401611328125, 32.79597473144531, 19.927024841308594, 26.8128662109375, 10.412445068359375, 45.895355224609375, 16.722930908203125, 17.22936248779297, 43.44353485107422, 20.429271697998047, 52.145477294921875, 11.823982238769531, 31.915725708007812, 9.710023880004883, 46.6026725769043, 25.81761932373047, -13.957313537597656, 24.07018280029297, -6.725543975830078, 51.20738220214844, 46.155479431152344, 41.12884521484375, 10.66756820678711, 76.62260437011719, -0.7955741882324219, 4.79400634765625, 11.276193618774414, 18.11078643798828, 19.703372955322266, 62.27848815917969, -11.469833374023438, 12.867210388183594, 19.32879638671875, 26.220535278320312, 15.940532684326172, 7.425914764404297, 40.07687759399414, 6.642749786376953], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000344.npy"}
{"epoch": 0.5051395007342144, "step": 345, "batch_size": 64, "mean": 21.320972442626953, "std": 21.83964729309082, "min": -20.659324645996094, "p10": -1.6019084930419916, "median": 17.5824031829834, "p90": 55.89483337402344, "max": 76.99649047851562, "pos_frac": 0.859375, "sample": [12.910964965820312, 65.99114990234375, -20.659324645996094, -5.485069274902344, 4.435222625732422, 10.881378173828125, 76.99649047851562, 40.57433319091797, 53.600914001464844, 19.168563842773438, 3.135835647583008, 29.265838623046875, 52.449951171875, 25.235702514648438, 24.112625122070312, 21.561981201171875, 0.6394805908203125, 29.893638610839844, 19.351314544677734, 60.81727600097656, 12.158248901367188, 36.028968811035156, 17.515644073486328, 8.753643035888672, 57.483238220214844, 34.370849609375, 5.688323974609375, 5.80072021484375, 3.4665565490722656, 17.378173828125, -5.220603942871094, 41.32035827636719, 17.76067352294922, 18.085803985595703, 66.471923828125, 30.52568817138672, -1.1048583984375, -0.8463211059570312, 8.614173889160156, 32.784767150878906, -10.479179382324219, 76.24037170410156, 5.208625793457031, 17.64916229248047, 15.613021850585938, 26.86206817626953, -7.0498046875, 19.22039794921875, 10.296318054199219, 41.40857696533203, 25.10120391845703, 20.907135009765625, 14.116289138793945, 54.5687255859375, -1.8149299621582031, 8.156082153320312, 2.6421051025390625, -2.74176025390625, 18.967079162597656, 6.274547576904297, 12.207195281982422, 56.463165283203125, 16.66803741455078, 6.1436767578125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000345.npy"}
{"epoch": 0.5066079295154186, "step": 346, "batch_size": 64, "mean": 18.848159790039062, "std": 21.74367904663086, "min": -46.35955810546875, "p10": -5.804480743408202, "median": 19.669979095458984, "p90": 46.700425720214845, "max": 86.81477355957031, "pos_frac": 0.84375, "sample": [61.900787353515625, 53.965789794921875, -10.950557708740234, 51.504554748535156, 54.699729919433594, 18.134353637695312, 23.018585205078125, 21.31288719177246, 22.75177001953125, 46.47032165527344, 30.741714477539062, 33.490875244140625, 23.946144104003906, 7.359443664550781, 18.294822692871094, 10.902481079101562, 32.687652587890625, -7.723228454589844, 5.614524841308594, 0.09723472595214844, 11.695762634277344, -10.941963195800781, -46.35955810546875, 45.15392303466797, -7.515083312988281, 22.53792953491211, 18.9637451171875, -5.007469177246094, 21.908294677734375, 28.044815063476562, 23.845184326171875, 22.69792938232422, -21.184158325195312, 3.1027374267578125, 42.24176025390625, 30.68109893798828, 18.313419342041016, 22.60248565673828, 6.576194763183594, 12.849533081054688, 25.440635681152344, 5.70343017578125, 15.416593551635742, 65.203857421875, 3.6378173828125, 20.37621307373047, 18.80365753173828, 16.274871826171875, 29.020893096923828, 23.257095336914062, 27.55767822265625, -4.188117980957031, 5.305206298828125, 0.8163871765136719, 3.9589881896972656, 46.799041748046875, 11.948131561279297, 22.762245178222656, 86.81477355957031, -6.14605712890625, 26.208740234375, -4.2716827392578125, 1.624725341796875, 25.530588150024414], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000346.npy"}
{"epoch": 0.5080763582966226, "step": 347, "batch_size": 64, "mean": 17.112594604492188, "std": 19.5159912109375, "min": -26.665603637695312, "p10": -7.55622329711914, "median": 17.27747344970703, "p90": 40.36369781494141, "max": 72.20820617675781, "pos_frac": 0.8125, "sample": [-26.665603637695312, 0.21114730834960938, 25.518539428710938, 32.49812316894531, 15.99135971069336, 12.30279541015625, 12.44753646850586, 17.930923461914062, 13.620269775390625, 19.93718719482422, 30.2177734375, -0.24224853515625, -7.654937744140625, 10.951873779296875, 10.530838012695312, 2.6348342895507812, -19.26569366455078, 4.6207427978515625, 64.25324249267578, 13.263496398925781, 35.82209396362305, -4.1394500732421875, 16.810195922851562, 20.47771453857422, 9.632568359375, 19.22753143310547, 53.03683853149414, 4.5921173095703125, 18.373493194580078, 45.641944885253906, 9.898139953613281, 6.6594390869140625, 60.34765625, 32.85449981689453, -0.49587059020996094, 25.08983612060547, 40.79133605957031, 17.997875213623047, 18.38580322265625, -11.997241973876953, 25.35419464111328, -7.325889587402344, 22.72083282470703, 19.06473159790039, 30.173851013183594, 17.7447509765625, 8.20831298828125, 25.82492446899414, 25.56915283203125, 7.295866012573242, 39.365875244140625, 36.62208557128906, 30.955047607421875, 72.20820617675781, -11.380817413330078, 23.109115600585938, -10.82501220703125, -9.841527938842773, 52.54490661621094, 20.159934997558594, 15.303466796875, 6.015167236328125, -0.47110748291015625, 4.701263427734375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000347.npy"}
{"epoch": 0.5095447870778267, "step": 348, "batch_size": 64, "mean": 17.527830123901367, "std": 18.840862274169922, "min": -21.28070068359375, "p10": -5.398260307312012, "median": 15.947219848632812, "p90": 43.709484863281254, "max": 61.046630859375, "pos_frac": 0.8125, "sample": [-0.8494338989257812, -21.28070068359375, 32.66472625732422, 14.994400024414062, 41.80579376220703, 44.980804443359375, -1.23858642578125, 16.122581481933594, 25.86956024169922, 33.24070739746094, 8.180503845214844, 0.677642822265625, 20.620162963867188, 17.347801208496094, 5.4297027587890625, 15.196075439453125, 1.299285888671875, 41.84606170654297, 16.624691009521484, 13.433439254760742, 15.89654541015625, 6.351043701171875, 31.636184692382812, 32.489105224609375, 33.827781677246094, 17.395111083984375, 12.511398315429688, 61.046630859375, 1.5295543670654297, 25.637939453125, 15.997894287109375, 14.304080963134766, -5.56622314453125, 56.79014587402344, 50.4732666015625, 31.135009765625, 7.3501434326171875, -4.935157775878906, 23.773422241210938, 19.808874130249023, -8.628349304199219, 22.881851196289062, 15.644264221191406, 35.772212982177734, -0.5569229125976562, -18.537818908691406, 44.508094787597656, 2.8534469604492188, 11.338348388671875, 16.0589599609375, 15.437141418457031, 6.929821014404297, 56.307334899902344, 26.91497802734375, 33.690338134765625, 10.03885269165039, -5.4181976318359375, 34.42413330078125, 11.058334350585938, 50.56260681152344, 19.134017944335938, -17.457672119140625, -10.240894317626953, -5.351739883422852], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000348.npy"}
{"epoch": 0.5110132158590308, "step": 349, "batch_size": 64, "mean": 18.896404266357422, "std": 19.53678321838379, "min": -26.479652404785156, "p10": -3.909394073486326, "median": 18.148399353027344, "p90": 45.284705352783206, "max": 68.3628158569336, "pos_frac": 0.84375, "sample": [-0.3592529296875, 13.800018310546875, -0.3358154296875, 5.014408111572266, 30.322093963623047, 39.05047607421875, 42.12498474121094, 1.7496528625488281, 30.809894561767578, 39.75575256347656, 12.191120147705078, 13.852691650390625, 10.131561279296875, 5.5787811279296875, 21.714082717895508, 1.4207820892333984, 0.3875999450683594, 23.378280639648438, 8.853195190429688, 22.015174865722656, 51.58357238769531, 24.764812469482422, 4.23046875, 9.11612319946289, 26.653369903564453, -9.042564392089844, 26.880325317382812, 18.768455505371094, 22.071842193603516, -4.8430328369140625, 64.58854675292969, -6.986412048339844, -13.994300842285156, 7.543537139892578, 23.60346221923828, -8.929557800292969, -26.479652404785156, 1.0862617492675781, 45.209312438964844, 45.58587646484375, 38.51924133300781, -9.595672607421875, 11.244216918945312, 21.324180603027344, 68.3628158569336, 45.71088409423828, 19.742233276367188, 23.53882598876953, 29.83393096923828, 17.528343200683594, 6.3232879638671875, 50.41551971435547, 14.924247741699219, -1.7309036254882812, 41.750030517578125, 7.747245788574219, 3.8161277770996094, 36.923309326171875, 27.29010772705078, 30.392044067382812, 11.70892333984375, 4.983983993530273, 40.43408203125, 45.3170166015625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000349.npy"}
{"epoch": 0.5124816446402349, "step": 350, "batch_size": 64, "mean": 16.28253173828125, "std": 14.764904975891113, "min": -8.938777923583984, "p10": -1.0609039306640622, "median": 14.811325073242188, "p90": 37.84320259094238, "max": 53.95365905761719, "pos_frac": 0.859375, "sample": [22.214065551757812, -6.521327972412109, 2.5496139526367188, 18.267406463623047, 13.244834899902344, 9.63833236694336, 39.580177307128906, 11.50128173828125, -1.2005767822265625, 7.471611022949219, 8.255329132080078, 37.631248474121094, 18.052490234375, 21.192760467529297, 27.39784812927246, 2.955352783203125, 14.451858520507812, 7.3597259521484375, 15.004837036132812, 26.368053436279297, 40.38423538208008, 37.263648986816406, 8.658466339111328, 15.800796508789062, -3.6110992431640625, 39.50495910644531, 24.665771484375, -4.0430755615234375, -8.770797729492188, 49.93580627441406, 9.941856384277344, 37.93404006958008, 23.332420349121094, 22.278274536132812, 3.4140243530273438, 13.723091125488281, -8.938777923583984, 48.85865020751953, 17.108001708984375, 6.15045166015625, 8.37701416015625, 30.51416015625, 13.684768676757812, 26.238845825195312, 21.194808959960938, 30.926265716552734, 0.676177978515625, 10.677162170410156, 14.617813110351562, 27.524749755859375, 23.886672973632812, 53.95365905761719, -8.377838134765625, 25.63140869140625, 1.4999370574951172, -0.7350006103515625, 10.432418823242188, 17.736846923828125, 16.97637939453125, -0.6892166137695312, 6.326622009277344, 2.2789688110351562, 20.78540802001953, 18.93834686279297], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000350.npy"}
{"epoch": 0.5139500734214391, "step": 351, "batch_size": 64, "mean": 23.223834991455078, "std": 19.206491470336914, "min": -18.425674438476562, "p10": -0.7004844665527334, "median": 22.1810302734375, "p90": 48.98135757446289, "max": 66.57213592529297, "pos_frac": 0.890625, "sample": [23.75360870361328, 19.525381088256836, 45.9891357421875, 63.3404541015625, 48.800071716308594, 5.423259735107422, 25.739151000976562, 19.789871215820312, 40.966739654541016, 33.17875289916992, 29.59222412109375, 16.254169464111328, 19.170547485351562, 20.626708984375, 49.059051513671875, 16.244049072265625, -8.111007690429688, 29.830108642578125, 29.343795776367188, 50.32832336425781, -8.698410034179688, 58.389251708984375, 29.074905395507812, 24.904212951660156, 1.2629928588867188, 36.25072479248047, 15.110366821289062, 18.62750244140625, 0.19954681396484375, 22.171096801757812, 21.5848388671875, 39.83143615722656, -9.836414337158203, 29.57345199584961, 19.71184539794922, 34.32124328613281, 5.524940490722656, 33.279090881347656, 28.70343017578125, -1.086212158203125, 17.55408477783203, 3.56622314453125, 35.6093635559082, 54.94417190551758, -12.7105712890625, 1.447732925415039, 38.83765411376953, 14.96661376953125, 42.19732666015625, -18.425674438476562, 7.038408279418945, 28.22955322265625, -7.345226287841797, 1.1175765991210938, 52.51588439941406, 4.8377838134765625, 34.979209899902344, 66.57213592529297, 22.190963745117188, 19.985748291015625, 32.78528594970703, 41.02061462402344, 11.996089935302734, 14.670188903808594], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000351.npy"}
{"epoch": 0.5154185022026432, "step": 352, "batch_size": 64, "mean": 18.596710205078125, "std": 16.966463088989258, "min": -11.763191223144531, "p10": -3.430459022521972, "median": 18.01453971862793, "p90": 38.77568588256837, "max": 66.58587646484375, "pos_frac": 0.828125, "sample": [58.80127716064453, 39.683319091796875, 6.496570587158203, 34.084136962890625, -4.4830169677734375, 7.546417236328125, 19.059852600097656, -2.8104248046875, 31.76439666748047, 32.262115478515625, 20.169662475585938, 18.494552612304688, 25.316604614257812, -11.763191223144531, 18.027606964111328, -3.696187973022461, 4.607257843017578, 66.58587646484375, 10.441009521484375, -0.8903236389160156, 45.92207336425781, 34.97625732421875, 15.516342163085938, 18.00147247314453, 24.105648040771484, 32.19605255126953, -7.120826721191406, 12.18743896484375, 19.271820068359375, 17.310195922851562, 12.354888916015625, 24.349853515625, 14.366477966308594, -10.3265380859375, 26.79574203491211, 22.021011352539062, -0.56719970703125, 17.34671974182129, -10.354454040527344, 13.994987487792969, 9.29411506652832, 15.808700561523438, 19.037799835205078, 13.34805679321289, 4.457977294921875, 30.392120361328125, 48.69284439086914, 32.332725524902344, 55.52598571777344, 36.657875061035156, 22.452316284179688, 2.2387008666992188, 9.89981460571289, 34.06730651855469, 19.368263244628906, 26.113990783691406, 12.467872619628906, 5.6828460693359375, 40.75578308105469, 14.162254333496094, -7.4232177734375, -1.3928909301757812, 20.931400299072266, 33.271358489990234], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000352.npy"}
{"epoch": 0.5168869309838473, "step": 353, "batch_size": 64, "mean": 13.157454490661621, "std": 15.032512664794922, "min": -23.54000473022461, "p10": -5.672432708740233, "median": 10.93331241607666, "p90": 33.36452102661133, "max": 49.133750915527344, "pos_frac": 0.765625, "sample": [34.482669830322266, 30.29095458984375, 10.689592361450195, 46.50724792480469, 16.844093322753906, 11.177032470703125, -1.592193603515625, -11.445466995239258, 26.27252197265625, 14.553665161132812, -3.6773223876953125, -4.2280426025390625, 25.416210174560547, 29.038185119628906, 33.64173126220703, 39.086395263671875, -3.663097381591797, 7.640594482421875, 20.181427001953125, 12.575607299804688, 6.894187927246094, 8.626296997070312, 29.331077575683594, -7.137424468994141, -9.73602294921875, 18.396591186523438, -0.5752143859863281, 8.260032653808594, 10.1119384765625, 9.659980773925781, 20.025184631347656, 3.966693878173828, 5.442058563232422, 10.118003845214844, 0.20159530639648438, -23.54000473022461, 49.133750915527344, 28.106307983398438, -1.2726593017578125, 4.736742973327637, 36.0892219543457, 9.497734069824219, -0.360137939453125, 9.945587158203125, 28.59772491455078, 19.850791931152344, 33.884681701660156, -6.564117431640625, -4.8143463134765625, 32.211456298828125, 17.3568115234375, 23.089935302734375, 12.287841796875, 24.445350646972656, 15.013153076171875, 7.4547576904296875, 14.431877136230469, 32.71769714355469, -6.858001708984375, -6.040184020996094, 8.34157943725586, 18.530967712402344, 3.2708206176757812, 15.15496826171875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000353.npy"}
{"epoch": 0.5183553597650514, "step": 354, "batch_size": 64, "mean": 21.11812973022461, "std": 16.549596786499023, "min": -9.923851013183594, "p10": 0.8043258666992195, "median": 20.37138271331787, "p90": 44.17042198181153, "max": 70.67024230957031, "pos_frac": 0.90625, "sample": [39.881072998046875, 29.42352294921875, 5.056434631347656, 21.80303955078125, 63.220367431640625, 22.183151245117188, 50.80116271972656, -0.5827713012695312, 23.565597534179688, 7.7943115234375, 2.71466064453125, 8.781694412231445, 23.21424102783203, 57.311553955078125, 34.95038986206055, 0.4832801818847656, 20.714874267578125, 28.577484130859375, 25.968795776367188, 4.65618896484375, 16.04852294921875, 14.586044311523438, 8.278518676757812, 12.100929260253906, 28.12169647216797, 20.128644943237305, 17.340984344482422, 30.366310119628906, 20.0982666015625, 30.274688720703125, 23.370361328125, 12.68148422241211, 42.39376449584961, 15.791851043701172, 14.481758117675781, 31.987548828125, -9.923851013183594, 8.703165054321289, 20.614120483398438, -1.6608123779296875, 6.7536163330078125, 25.2379150390625, 28.617919921875, -0.34963226318359375, 27.44994354248047, 44.931846618652344, -1.7830219268798828, 47.154396057128906, 70.67024230957031, 27.214828491210938, 1.5534324645996094, 12.343639373779297, 13.279373168945312, 14.761199951171875, 2.3275833129882812, 15.541915893554688, 51.428253173828125, 15.674072265625, 20.94103240966797, -0.4777679443359375, 41.328941345214844, 24.60973358154297, 14.941986083984375, 21.105772018432617], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000354.npy"}
{"epoch": 0.5198237885462555, "step": 355, "batch_size": 64, "mean": 19.532241821289062, "std": 19.108638763427734, "min": -32.34002685546875, "p10": -0.018783187866209583, "median": 16.94678497314453, "p90": 40.50212860107422, "max": 84.31559753417969, "pos_frac": 0.890625, "sample": [25.638198852539062, 36.03832244873047, 14.786994934082031, 38.52104949951172, -16.98719024658203, 40.52503967285156, -11.974323272705078, 48.97710418701172, -5.117103576660156, 45.74320983886719, 10.978515625, 40.44866943359375, -18.90117645263672, 3.989419937133789, 10.633289337158203, 30.610614776611328, 60.89685821533203, 64.41873931884766, 1.3446311950683594, 15.769126892089844, 39.07854461669922, 22.08654022216797, -0.6031036376953125, 3.9632644653320312, 13.158012390136719, 16.976200103759766, 9.37015151977539, 13.23162841796875, 16.917369842529297, 30.734085083007812, 10.812980651855469, 16.2408447265625, 23.6986083984375, 25.43903350830078, 35.13115692138672, 3.8068771362304688, 16.00585174560547, 44.36988830566406, 11.80990219116211, 84.31559753417969, 22.58319091796875, 24.961265563964844, 16.156341552734375, 16.87224578857422, 19.199691772460938, 10.260330200195312, 8.933974266052246, 18.861602783203125, 10.620208740234375, 13.541006088256836, 22.418682098388672, 28.50481414794922, -32.34002685546875, 27.858505249023438, 2.6097259521484375, 32.84958267211914, 27.610389709472656, 14.978584289550781, 20.98880386352539, 8.318252563476562, 20.401275634765625, -2.0204925537109375, 21.435832977294922, 21.576278686523438], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000355.npy"}
{"epoch": 0.5212922173274597, "step": 356, "batch_size": 64, "mean": 15.670114517211914, "std": 18.73194122314453, "min": -22.859703063964844, "p10": -8.27687644958496, "median": 15.445137023925781, "p90": 35.24840049743652, "max": 72.91885375976562, "pos_frac": 0.796875, "sample": [32.09325408935547, 22.17252540588379, 10.390811920166016, 17.26068115234375, 11.509761810302734, -1.9974079132080078, 18.839216232299805, 34.468658447265625, 13.161544799804688, 12.808273315429688, 14.782974243164062, 9.057687759399414, 35.2674560546875, 50.87272644042969, 35.20393753051758, 14.639511108398438, -3.8283538818359375, 20.887725830078125, -4.981048583984375, 30.471237182617188, -22.859703063964844, 18.46758270263672, 1.337493896484375, 21.1070556640625, 22.64502716064453, 17.792129516601562, 33.314491271972656, 42.27455139160156, -12.421516418457031, 14.055488586425781, -17.715423583984375, 62.384307861328125, 31.142189025878906, 21.95238494873047, 26.223190307617188, -15.486892700195312, -5.342033386230469, 37.126625061035156, -13.013229370117188, 12.752685546875, -0.595916748046875, 21.004623413085938, -19.806415557861328, 23.449634552001953, 3.998617172241211, 4.750518798828125, 16.1072998046875, 6.542407989501953, 72.91885375976562, 0.943359375, 34.824913024902344, 28.378860473632812, 24.128875732421875, 28.70647430419922, 3.79547119140625, 4.1998443603515625, -5.996356964111328, 25.456459045410156, 3.99407958984375, 35.40362548828125, 13.085418701171875, 12.231269836425781, -9.254241943359375, 25.802051544189453], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000356.npy"}
{"epoch": 0.5227606461086637, "step": 357, "batch_size": 64, "mean": 22.50282096862793, "std": 16.66988182067871, "min": -9.938350677490234, "p10": 0.9496349334716805, "median": 21.850915908813477, "p90": 43.76978607177735, "max": 65.49740600585938, "pos_frac": 0.90625, "sample": [35.71143341064453, 42.988555908203125, 24.986770629882812, 31.98400115966797, 40.45148468017578, 31.227813720703125, 19.516708374023438, 19.677276611328125, 24.408531188964844, 27.664779663085938, 20.01744842529297, -1.6958694458007812, -9.938350677490234, 38.755950927734375, 19.174957275390625, 19.313980102539062, 32.605133056640625, 22.74102020263672, 40.04988098144531, 10.449634552001953, 1.725250244140625, 2.3688392639160156, 13.627883911132812, 49.50636291503906, -7.148124694824219, -8.767332077026367, 34.613189697265625, 44.91123580932617, -2.132221221923828, 65.49740600585938, 30.919898986816406, 12.408592224121094, 18.844924926757812, 3.981353759765625, 23.5523681640625, -1.5233650207519531, 40.470115661621094, 58.74394989013672, 31.07769775390625, 39.744842529296875, 15.619972229003906, 5.606849670410156, 47.35874557495117, 21.237960815429688, 21.990623474121094, 1.7177886962890625, 14.897964477539062, 21.607437133789062, 47.18939208984375, 44.10459899902344, 21.71120834350586, 21.557476043701172, 22.16411590576172, 12.77081298828125, 28.144866943359375, 8.283416748046875, 0.6204261779785156, 29.818275451660156, 30.101455688476562, 24.250450134277344, 11.674388885498047, 36.87483215332031, 2.5443572998046875, 5.819099426269531], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000357.npy"}
{"epoch": 0.5242290748898678, "step": 358, "batch_size": 64, "mean": 17.26789093017578, "std": 18.61229133605957, "min": -10.95751953125, "p10": -3.585952377319336, "median": 11.31118392944336, "p90": 43.476846313476564, "max": 70.83810424804688, "pos_frac": 0.84375, "sample": [7.261085510253906, 30.509803771972656, 34.1563720703125, 37.80582046508789, 20.08441162109375, 20.570816040039062, 9.372459411621094, 33.8760986328125, 5.143314361572266, 18.546384811401367, -3.4313278198242188, 41.79157257080078, 13.742263793945312, -3.5321807861328125, -3.608997344970703, 45.152374267578125, -6.056194305419922, 31.232421875, 43.01618957519531, 8.958698272705078, -3.998699188232422, 7.7187652587890625, 1.8949356079101562, 2.4218082427978516, 6.5861968994140625, 5.3346099853515625, 16.66478729248047, 14.95718765258789, 70.83810424804688, 43.67427062988281, 28.40001678466797, 1.0745086669921875, 50.08008575439453, -1.3981781005859375, 38.638694763183594, 22.043869018554688, 17.556304931640625, -3.84857177734375, 10.048206329345703, 14.12158203125, 7.087371826171875, 9.084457397460938, 4.294380187988281, 5.3648834228515625, 16.642467498779297, -10.95751953125, 59.343505859375, 17.211959838867188, 6.471996307373047, 12.32540512084961, -10.896514892578125, 23.737747192382812, 7.0686798095703125, 7.474465370178223, 34.13970184326172, 67.2934341430664, 7.12921142578125, 25.671348571777344, 27.225250244140625, 3.9701080322265625, 45.61814880371094, 6.43621826171875, 10.29696273803711, -4.288536071777344], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000358.npy"}
{"epoch": 0.5256975036710719, "step": 359, "batch_size": 64, "mean": 17.11254119873047, "std": 21.121313095092773, "min": -14.003604888916016, "p10": -8.280194091796874, "median": 15.200223922729492, "p90": 48.309217071533226, "max": 85.8597412109375, "pos_frac": 0.796875, "sample": [4.5198974609375, 9.480911254882812, 52.187957763671875, 33.042659759521484, -10.734649658203125, 68.13943481445312, 12.321197509765625, 85.8597412109375, 16.289363861083984, -3.692838668823242, 0.7950592041015625, 19.486801147460938, 20.44881820678711, 50.580848693847656, -10.075759887695312, 55.31443786621094, 29.28537940979004, 28.098464965820312, 26.848228454589844, 16.503997802734375, 13.535598754882812, 19.13323974609375, 40.63889694213867, 17.02722930908203, 1.4987525939941406, 16.231002807617188, 22.146835327148438, 6.7152099609375, -12.702423095703125, 3.3976821899414062, 69.448974609375, 11.66269302368164, 43.00874328613281, 0.6397438049316406, -9.64141845703125, 26.089752197265625, 11.765655517578125, 60.15715789794922, 10.500129699707031, 16.510101318359375, -8.755451202392578, 19.10491180419922, 7.332771301269531, 9.243070602416992, -8.519119262695312, -1.0472545623779297, 38.075225830078125, 2.3927001953125, 29.841604232788086, 17.911956787109375, -6.042945861816406, -7.7227020263671875, 20.329238891601562, 11.057182312011719, 16.94921112060547, 9.054901123046875, 15.820369720458984, -0.5271873474121094, 30.386856079101562, 14.580078125, 23.033061981201172, -0.6286468505859375, 4.872890472412109, -14.003604888916016], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000359.npy"}
{"epoch": 0.527165932452276, "step": 360, "batch_size": 64, "mean": 17.802963256835938, "std": 16.150840759277344, "min": -22.051443099975586, "p10": -1.6852012634277322, "median": 17.407127380371094, "p90": 36.691649246215825, "max": 69.82926940917969, "pos_frac": 0.890625, "sample": [13.917007446289062, 3.250518798828125, 27.865943908691406, 36.513671875, 21.83081817626953, 29.658615112304688, 54.43463134765625, 27.913421630859375, 36.771724700927734, 10.4173583984375, 17.289260864257812, 17.524993896484375, 9.025840759277344, 7.673236846923828, 23.760047912597656, 3.0973682403564453, 4.206380844116211, 20.97473907470703, 23.38333511352539, -3.029998779296875, 35.29359817504883, 5.214702606201172, -6.879417419433594, 21.206192016601562, 3.1948318481445312, 9.210929870605469, 19.16217803955078, 31.232437133789062, 29.30559539794922, 31.445053100585938, 69.82926940917969, 25.775394439697266, 15.502510070800781, -22.051443099975586, 6.213047027587891, 16.81769561767578, 22.57341766357422, -15.010871887207031, 36.10560607910156, 0.6149444580078125, 9.805553436279297, 45.852073669433594, -6.329952239990234, 26.984649658203125, 31.060638427734375, 37.37739562988281, 14.966312408447266, 2.4456825256347656, 14.251880645751953, 21.70567512512207, -5.7188720703125, 12.730560302734375, 0.5611572265625, 24.404083251953125, 37.679168701171875, 36.76792526245117, 22.197128295898438, 24.029006958007812, 11.999576568603516, -2.6479263305664062, 9.816390991210938, 27.654735565185547, 9.937538146972656, 10.624595642089844], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000360.npy"}
{"epoch": 0.5286343612334802, "step": 361, "batch_size": 64, "mean": 18.384605407714844, "std": 18.615556716918945, "min": -22.273727416992188, "p10": -4.128029632568358, "median": 17.30086326599121, "p90": 41.9796371459961, "max": 68.3084716796875, "pos_frac": 0.84375, "sample": [17.444896697998047, 18.474292755126953, 0.7449569702148438, 24.874343872070312, -4.589599609375, 10.847003936767578, 35.20391845703125, 48.74329376220703, 40.084999084472656, 7.275299072265625, -0.0657806396484375, 17.156829833984375, 11.063186645507812, 46.224761962890625, 2.45703125, 12.073759078979492, 6.330322265625, 31.69886016845703, -6.2350311279296875, 20.924522399902344, 47.19971466064453, 11.672332763671875, 0.804168701171875, 32.073265075683594, 19.84124755859375, 18.95513153076172, 30.485885620117188, 22.22064208984375, 22.232437133789062, 38.59466552734375, 19.023361206054688, 6.864856719970703, 29.345062255859375, 3.6254501342773438, 42.75904846191406, 33.6090087890625, -6.833538055419922, 40.1610107421875, 29.201744079589844, 20.83282470703125, 26.987506866455078, 0.8430099487304688, 49.22220993041992, 32.01579284667969, -2.239593505859375, 29.767274856567383, 65.85598754882812, 10.139995574951172, 16.17108154296875, -4.732887268066406, 4.322911262512207, 15.661323547363281, 36.01373291015625, 8.554656982421875, -16.73487091064453, 15.2606201171875, 30.147872924804688, 5.277933120727539, -5.308738708496094, 7.905975341796875, 68.3084716796875, -3.0510330200195312, -22.273727416992188, 5.099021911621094], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000361.npy"}
{"epoch": 0.5301027900146843, "step": 362, "batch_size": 64, "mean": 22.256715774536133, "std": 20.39779281616211, "min": -26.02077865600586, "p10": 0.18722476959228534, "median": 20.728944778442383, "p90": 44.05804977416992, "max": 104.66232299804688, "pos_frac": 0.90625, "sample": [15.161226272583008, 57.66014862060547, 13.615333557128906, 34.57685852050781, 10.90522575378418, -26.02077865600586, 104.66232299804688, 36.31409454345703, 18.09893035888672, 44.38468933105469, 35.77655029296875, 7.165676116943359, 16.37310791015625, 29.486351013183594, 17.476478576660156, 16.654129028320312, 0.11013221740722656, -19.048606872558594, 36.09297180175781, 56.33348083496094, 29.593231201171875, 28.37884521484375, 0.3682060241699219, 15.866989135742188, 31.062156677246094, 0.42218017578125, 8.061325073242188, 15.604324340820312, 1.5320968627929688, 31.425216674804688, 29.102397918701172, 20.66931915283203, 29.56769561767578, 4.624519348144531, 35.862361907958984, 15.696510314941406, 44.450286865234375, -1.8512496948242188, 31.948333740234375, 15.64410400390625, 18.279651641845703, 0.3671073913574219, 26.7081298828125, 43.29589080810547, 11.398590087890625, -10.608238220214844, 37.06357192993164, 20.788570404052734, 32.47505187988281, 28.963943481445312, 26.442222595214844, 50.74555969238281, -13.008895874023438, 40.955322265625, -3.41619873046875, 18.921714782714844, 41.01750183105469, 14.454910278320312, 14.622867584228516, 49.13714599609375, 24.43466567993164, 22.470046997070312, 3.36370849609375, 31.749725341796875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000362.npy"}
{"epoch": 0.5315712187958884, "step": 363, "batch_size": 64, "mean": 24.451162338256836, "std": 19.82657241821289, "min": -9.889595031738281, "p10": 0.9674278259277345, "median": 25.209667205810547, "p90": 48.99616775512696, "max": 80.26861572265625, "pos_frac": 0.90625, "sample": [-4.416595458984375, 17.962539672851562, 58.113304138183594, 35.648460388183594, -4.0439300537109375, 7.513988494873047, 40.14970397949219, 31.98168182373047, 2.1260452270507812, 29.07524871826172, 31.307579040527344, 35.76405334472656, 63.37824249267578, 25.638294219970703, 35.37419128417969, 28.958084106445312, 0.9417495727539062, 46.58595275878906, 20.54305648803711, 47.081565856933594, 10.055023193359375, 6.338813781738281, 25.0921630859375, 64.07254028320312, 20.72722625732422, 16.55194091796875, 80.26861572265625, 28.696456909179688, 3.706207275390625, 41.99285125732422, 14.523941040039062, 40.22361755371094, 5.084014892578125, 5.618730545043945, 56.82263946533203, 30.55849838256836, 10.968387603759766, 8.160781860351562, 18.673851013183594, 31.32935333251953, 38.916038513183594, 25.327171325683594, -7.049785614013672, 37.61272430419922, -3.7532882690429688, 14.301322937011719, 19.10112762451172, -9.889595031738281, 35.65142822265625, 49.81671142578125, 12.2264404296875, -0.6984977722167969, 8.693649291992188, 1.2421951293945312, 32.23456573486328, 1.02734375, 24.424118041992188, 6.92120361328125, 38.73775100708008, 10.703197479248047, 64.7410659790039, 26.882415771484375, 32.62702941894531, 35.929168701171875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000363.npy"}
{"epoch": 0.5330396475770925, "step": 364, "batch_size": 64, "mean": 16.92709732055664, "std": 15.816713333129883, "min": -8.872947692871094, "p10": -4.329115676879881, "median": 15.842662811279297, "p90": 38.575800323486334, "max": 51.88893127441406, "pos_frac": 0.84375, "sample": [1.8716278076171875, 16.500560760498047, -8.872947692871094, 7.264192581176758, -6.657737731933594, 12.983543395996094, -6.277191162109375, 20.941062927246094, -6.134132385253906, 36.69818115234375, 11.894645690917969, 26.53034210205078, 17.804840087890625, 23.584579467773438, 2.6988449096679688, 0.70953369140625, 12.83135986328125, 14.128074645996094, 6.519012451171875, -7.119846343994141, 10.7718505859375, 6.165809631347656, 35.20399475097656, 35.27891540527344, 9.530319213867188, 0.4022026062011719, 18.947242736816406, 48.020301818847656, 24.098255157470703, 31.96286392211914, -0.037200927734375, 39.475608825683594, 21.184921264648438, 39.984222412109375, 33.829010009765625, 0.601043701171875, 12.998218536376953, 15.850196838378906, 10.862960815429688, 6.250541687011719, -2.2327232360839844, 45.48920440673828, 36.133148193359375, 16.739274978637695, 25.766517639160156, 35.31928253173828, 15.835128784179688, 27.461196899414062, 24.761512756347656, 5.7280426025390625, 16.062164306640625, 23.40114402770996, -0.5131473541259766, -5.227569580078125, 39.087921142578125, 21.34479522705078, 39.568145751953125, 51.88893127441406, -6.861469268798828, 36.963600158691406, 15.445289611816406, 3.1480255126953125, 1.3651161193847656, 37.38085174560547], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000364.npy"}
{"epoch": 0.5345080763582967, "step": 365, "batch_size": 64, "mean": 20.91604995727539, "std": 20.199970245361328, "min": -21.177093505859375, "p10": 0.020352172851563988, "median": 20.04845905303955, "p90": 42.707884597778325, "max": 98.60382080078125, "pos_frac": 0.890625, "sample": [30.95824432373047, 12.657821655273438, 38.11500549316406, 4.3376922607421875, -0.6137237548828125, 3.8254432678222656, 28.923912048339844, 17.09575653076172, 10.830734252929688, -12.247703552246094, 7.027870178222656, 22.033309936523438, 3.244884490966797, 41.9250602722168, 10.29931640625, 5.735576629638672, 25.669174194335938, 44.376495361328125, 28.424270629882812, 14.826705932617188, 43.04338073730469, 13.152204513549805, 8.296035766601562, 20.982940673828125, 16.723941802978516, 7.251197814941406, 29.09954071044922, 31.56494140625, 37.26463317871094, -10.805713653564453, -21.177093505859375, 10.963302612304688, 7.299381256103516, 8.537883758544922, -4.0141754150390625, 45.16947937011719, 41.055702209472656, 20.118547439575195, 37.876182556152344, -14.016448974609375, 38.255096435546875, 32.654502868652344, 32.15800857543945, 1.4998626708984375, 3.469512939453125, 58.67991638183594, 29.54303741455078, 18.0069580078125, 12.793071746826172, 4.108619689941406, 17.942672729492188, 20.739334106445312, -8.62055778503418, 28.12897491455078, 23.067489624023438, 21.06110382080078, 98.60382080078125, 60.91651916503906, 8.236663818359375, 19.978370666503906, 34.00477600097656, 57.88580322265625, 21.337017059326172, 38.34492492675781], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000365.npy"}
{"epoch": 0.5359765051395007, "step": 366, "batch_size": 64, "mean": 20.499000549316406, "std": 20.604028701782227, "min": -12.533744812011719, "p10": 0.6124023437500004, "median": 14.831855773925781, "p90": 41.877990341186525, "max": 102.38302612304688, "pos_frac": 0.90625, "sample": [10.039794921875, 8.312524795532227, 41.48108673095703, -2.2902584075927734, 34.76249694824219, -1.4975967407226562, 8.767234802246094, 12.819721221923828, 1.1922988891601562, 87.4754409790039, 3.210172653198242, 8.384162902832031, 17.277420043945312, 58.90692901611328, 36.41933059692383, 17.2633056640625, 4.241973876953125, 5.326625823974609, 1.74847412109375, 5.1641693115234375, 42.048091888427734, 19.1376953125, 35.299556732177734, -1.2246894836425781, 45.231170654296875, 23.01026153564453, 30.758399963378906, 8.724681854248047, 19.99761199951172, 13.58697509765625, 29.499919891357422, 10.180068969726562, 23.031253814697266, 14.690093994140625, 32.25779724121094, 3.3415451049804688, 50.46241760253906, 40.96937942504883, 11.8355712890625, 9.306394577026367, 30.732921600341797, 4.5185394287109375, 2.5617847442626953, 17.376296997070312, 39.233154296875, -0.7520713806152344, 12.04541015625, -2.3931236267089844, 13.92767333984375, 102.38302612304688, 29.447021484375, 0.921173095703125, 38.80121612548828, 22.968994140625, 51.73762512207031, 14.973617553710938, 6.527626037597656, 17.747955322265625, 21.307811737060547, 0.480072021484375, 30.52674102783203, 34.56055450439453, 13.686214447021484, -12.533744812011719], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000366.npy"}
{"epoch": 0.5374449339207048, "step": 367, "batch_size": 64, "mean": 20.089473724365234, "std": 16.88616943359375, "min": -16.644302368164062, "p10": 0.7859970092773445, "median": 17.521751403808594, "p90": 41.75571975708009, "max": 69.31298828125, "pos_frac": 0.921875, "sample": [3.5714664459228516, 12.63925552368164, 39.92613983154297, 4.285846710205078, 61.39042663574219, 17.22705078125, 26.695655822753906, 6.50816535949707, 36.5965576171875, 8.805961608886719, 42.539825439453125, 0.45438385009765625, 5.711769104003906, 69.31298828125, 33.747501373291016, 47.425323486328125, 23.930145263671875, 16.02729034423828, 50.749114990234375, 0.2561988830566406, 5.820590972900391, 46.26054763793945, 2.66766357421875, 36.50092315673828, 27.22777557373047, 37.69578552246094, 25.68981170654297, 9.586906433105469, 49.743324279785156, 22.27288818359375, 19.222923278808594, 27.194232940673828, 19.230873107910156, 25.145645141601562, 15.892593383789062, 22.33608627319336, 27.875259399414062, 13.753976821899414, 13.221040725708008, 9.87135124206543, 14.738651275634766, 21.82404327392578, 15.960712432861328, -2.2402515411376953, 17.558853149414062, -16.644302368164062, -3.691844940185547, -6.6135711669921875, 39.286258697509766, 12.708183288574219, 16.53771209716797, -8.582855224609375, 18.49372100830078, 7.196949005126953, 1.5597610473632812, 21.074356079101562, 17.484649658203125, 6.32965087890625, 27.2064208984375, 9.874603271484375, 20.49713134765625, 15.564628601074219, 36.57170104980469, 38.019866943359375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000367.npy"}
{"epoch": 0.5389133627019089, "step": 368, "batch_size": 64, "mean": 20.336618423461914, "std": 17.969846725463867, "min": -25.80229949951172, "p10": -2.7206479072570793, "median": 21.165237426757812, "p90": 40.86208534240723, "max": 65.773193359375, "pos_frac": 0.84375, "sample": [14.924095153808594, 21.22020721435547, 21.774520874023438, 45.717926025390625, 30.782989501953125, 3.6760482788085938, -0.45687103271484375, -25.80229949951172, 6.85089111328125, 21.110267639160156, -4.866485595703125, 27.701461791992188, 24.343910217285156, 13.211040496826172, 22.242881774902344, 32.58530044555664, 23.41297149658203, 37.35258483886719, 20.29944610595703, 34.368255615234375, 37.03399658203125, 13.505844116210938, 49.89344787597656, 14.042068481445312, 65.773193359375, 41.0467529296875, 5.6726531982421875, -1.258270263671875, 10.560992240905762, -3.1158246994018555, 52.692928314208984, 30.69512176513672, 32.394493103027344, 3.39886474609375, 24.47412872314453, 16.426734924316406, 11.58984375, 39.06389236450195, 9.0054931640625, 24.933837890625, 14.68212890625, 28.729694366455078, 25.13971710205078, -4.261871337890625, 8.02035903930664, 10.211082458496094, 39.517578125, 26.257476806640625, -4.142024993896484, -9.496490478515625, 21.76166534423828, 6.8262786865234375, 1.3207931518554688, -4.404106140136719, 40.43119430541992, 12.48876953125, 13.304813385009766, 61.473907470703125, 34.279327392578125, 28.516403198242188, 38.79328155517578, 49.45147705078125, 16.161334991455078, -1.7985687255859375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000368.npy"}
{"epoch": 0.540381791483113, "step": 369, "batch_size": 64, "mean": 19.09603500366211, "std": 14.100530624389648, "min": -8.3057861328125, "p10": 0.2296901702880887, "median": 17.036544799804688, "p90": 40.50803146362305, "max": 51.501953125, "pos_frac": 0.890625, "sample": [24.551223754882812, 5.084177017211914, 28.829513549804688, 22.747426986694336, -2.3235855102539062, 2.985462188720703, -6.829139709472656, 27.998565673828125, 5.018218994140625, 51.501953125, 21.81388282775879, 15.173896789550781, 11.750835418701172, 7.477485656738281, 13.32501220703125, 13.588127136230469, 21.894546508789062, 44.89204406738281, -4.622383117675781, 25.127090454101562, 32.84351348876953, 13.642379760742188, 39.77391052246094, -0.95135498046875, 19.00299835205078, -8.3057861328125, 8.368461608886719, 14.786636352539062, 17.509307861328125, 9.11163330078125, 21.146160125732422, 28.4110107421875, 22.085250854492188, -3.497344970703125, 21.08892822265625, 13.945940017700195, 25.44373321533203, 40.822654724121094, 29.27373504638672, 6.9422607421875, 30.383182525634766, 15.224296569824219, 42.537811279296875, 14.245040893554688, 49.397037506103516, 28.87066650390625, 14.348869323730469, 12.006675720214844, -1.4325542449951172, 28.49798583984375, 8.466285705566406, 16.56378173828125, 13.5552978515625, 8.277816772460938, 43.504119873046875, 21.873153686523438, 23.018882751464844, 11.24660873413086, 21.391937255859375, 42.16270446777344, 37.18696594238281, 8.256149291992188, 38.99041748046875, 12.14462661743164], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000369.npy"}
{"epoch": 0.5418502202643172, "step": 370, "batch_size": 64, "mean": 16.98387908935547, "std": 17.41647720336914, "min": -24.127182006835938, "p10": -7.155684661865234, "median": 18.359786987304688, "p90": 41.270601272583015, "max": 59.153953552246094, "pos_frac": 0.8125, "sample": [18.070363998413086, 19.89379119873047, -24.127182006835938, 35.10560607910156, 37.31817626953125, 20.850250244140625, 13.153305053710938, 9.743759155273438, 1.4467735290527344, 28.885581970214844, 43.01715850830078, 44.17901611328125, 3.2159423828125, 20.91602325439453, 32.8855094909668, 9.677436828613281, -7.576499938964844, -8.794647216796875, -6.1603240966796875, 40.19550323486328, 37.279449462890625, 22.567031860351562, 32.07245635986328, 8.750137329101562, 20.18846893310547, 11.773820877075195, 9.680389404296875, -9.655532836914062, 13.97149658203125, 19.8640079498291, 19.370269775390625, 12.79410171508789, 16.02062225341797, 8.359527587890625, 6.336742401123047, 25.844558715820312, 36.49421691894531, 26.216949462890625, -6.1737823486328125, -0.697361946105957, 46.934364318847656, 11.86444091796875, -9.024154663085938, 18.607074737548828, 18.440673828125, 20.08206558227539, -0.811553955078125, 14.7576904296875, 59.153953552246094, -5.256561279296875, 5.505645751953125, 2.53662109375, -10.458534240722656, 28.5982608795166, 18.278900146484375, 22.448654174804688, 5.269783020019531, 24.49709701538086, -13.414459228515625, 46.114959716796875, 23.78167724609375, 28.126419067382812, 41.73135757446289, 46.250885009765625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000370.npy"}
{"epoch": 0.5433186490455213, "step": 371, "batch_size": 64, "mean": 14.912303924560547, "std": 19.488142013549805, "min": -31.98162841796875, "p10": -8.455202293395995, "median": 11.494686126708984, "p90": 42.18496017456055, "max": 60.57080078125, "pos_frac": 0.84375, "sample": [22.351890563964844, 27.328948974609375, 8.961780548095703, -7.005388259887695, -19.520645141601562, 55.69200134277344, 48.024932861328125, 27.132610321044922, 28.33795166015625, 13.215057373046875, 8.708930969238281, -8.388641357421875, 29.625717163085938, 47.14836120605469, 31.317031860351562, 6.79461669921875, 22.063201904296875, 8.397655487060547, 11.20709228515625, 34.07799530029297, 6.92510986328125, 26.90713882446289, 2.6757888793945312, 3.2967987060546875, 2.8192138671875, 32.140869140625, 60.57080078125, 8.071609497070312, 48.81134033203125, 4.82806396484375, 58.53601837158203, -31.98162841796875, -9.06976318359375, -3.0634765625, -17.456344604492188, 12.896244049072266, 16.92633056640625, 16.064640045166016, 6.474006652832031, 11.782279968261719, -8.483728408813477, 3.0741844177246094, 29.594482421875, 3.489400863647461, 13.195446014404297, 2.3044509887695312, 15.814048767089844, 10.804193496704102, 28.195236206054688, -18.476470947265625, -10.463668823242188, 0.1541900634765625, 23.129474639892578, 38.312164306640625, 23.686279296875, 10.814048767089844, 17.548675537109375, 0.2726936340332031, 5.1616363525390625, 4.153022766113281, 42.316551208496094, 41.87791442871094, 22.144927978515625, 2.1421241760253906], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000371.npy"}
{"epoch": 0.5447870778267254, "step": 372, "batch_size": 64, "mean": 24.429153442382812, "std": 18.631084442138672, "min": -10.522876739501953, "p10": 2.7795669555664064, "median": 26.177410125732422, "p90": 40.543058776855474, "max": 101.4552001953125, "pos_frac": 0.921875, "sample": [27.010780334472656, 34.112327575683594, 31.358184814453125, 18.120132446289062, 13.371444702148438, 28.814651489257812, 24.105789184570312, 33.1109619140625, 11.762039184570312, 5.1191864013671875, 101.4552001953125, 34.10929870605469, 19.378982543945312, 32.680397033691406, 16.187637329101562, -2.5931015014648438, 4.461771011352539, 9.322914123535156, 32.46656799316406, -10.522876739501953, 16.305252075195312, 31.060640335083008, 38.179344177246094, 22.3958740234375, 41.2205810546875, 37.42686462402344, 28.478897094726562, 36.63393783569336, 18.325607299804688, 33.81446838378906, 19.621109008789062, 43.70226287841797, 26.38947296142578, 12.151901245117188, 25.60559844970703, 7.745540618896484, 27.878650665283203, 27.147125244140625, 13.600196838378906, 21.927276611328125, 27.619964599609375, 3.0312042236328125, 28.699275970458984, 20.254562377929688, 31.55010223388672, 59.207191467285156, 2.671722412109375, 0.10284423828125, -6.9451904296875, 29.34967041015625, 45.284324645996094, 28.00428009033203, 31.937973022460938, 29.985443115234375, 25.965347290039062, 15.162208557128906, -4.781038284301758, 38.96217346191406, 9.106151580810547, 73.48831176757812, -3.6723461151123047, 24.616806030273438, 56.71992492675781, 3.7021484375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000372.npy"}
{"epoch": 0.5462555066079295, "step": 373, "batch_size": 64, "mean": 19.721309661865234, "std": 17.271305084228516, "min": -18.4754638671875, "p10": 0.7627098083496096, "median": 16.738059997558594, "p90": 41.18330802917481, "max": 65.64562225341797, "pos_frac": 0.921875, "sample": [13.375125885009766, 13.979705810546875, 7.419212341308594, 33.75065612792969, 17.000076293945312, 41.55049514770508, 16.475013732910156, 18.43053436279297, 12.51529312133789, 14.235301971435547, 5.301902770996094, 0.01657867431640625, 23.155517578125, 0.6827774047851562, 27.959632873535156, 24.096630096435547, 4.462795257568359, 30.688499450683594, 14.934860229492188, 18.551918029785156, 3.5499114990234375, 12.180145263671875, 34.95542526245117, 1.4342041015625, 8.042423248291016, 5.460229873657227, 38.33148956298828, 30.333221435546875, 10.58721923828125, -18.4754638671875, -14.135799407958984, 47.36859130859375, 6.824226379394531, 19.23455810546875, 46.598907470703125, -2.545024871826172, 39.43537902832031, 48.45851135253906, 20.06155776977539, 5.918365478515625, 38.69389343261719, 23.776508331298828, -3.605396270751953, 65.64562225341797, 56.74308776855469, 32.90876770019531, 40.3265380859375, 27.185211181640625, 35.985626220703125, 11.734344482421875, 14.530410766601562, 49.572296142578125, 10.893234252929688, 13.57590103149414, 0.94921875, 19.322906494140625, 29.37762451171875, 25.198917388916016, 4.5181121826171875, 11.974945068359375, 30.8734130859375, 36.96723937988281, -13.66119384765625, 16.476043701171875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000373.npy"}
{"epoch": 0.5477239353891337, "step": 374, "batch_size": 64, "mean": 20.857561111450195, "std": 16.909080505371094, "min": -6.434871673583984, "p10": -0.028024673461913917, "median": 18.259767532348633, "p90": 42.53700561523438, "max": 64.70901489257812, "pos_frac": 0.890625, "sample": [9.0198974609375, 13.351934432983398, 23.102577209472656, 41.2901611328125, 18.047096252441406, 64.70901489257812, 0.117279052734375, 16.88995361328125, 17.088211059570312, 13.213865280151367, 0.6645050048828125, -1.1299476623535156, 31.13421630859375, -0.09029769897460938, 30.306732177734375, 16.942550659179688, 17.027931213378906, 26.97032928466797, 49.33210754394531, -3.1578369140625, 21.260597229003906, 1.5066375732421875, 39.26115798950195, -2.1835861206054688, -5.6732330322265625, 42.586517333984375, 14.172578811645508, -5.5332794189453125, 1.410400390625, 25.36695098876953, 51.924224853515625, 54.78254699707031, 37.050048828125, 48.90562438964844, 21.059890747070312, 20.305679321289062, 1.1279373168945312, 36.236610412597656, 1.0740280151367188, 24.799102783203125, 15.674613952636719, 5.831592559814453, 9.577291488647461, 2.4355220794677734, 18.43594741821289, 4.137351989746094, 29.439239501953125, 42.421478271484375, 32.003700256347656, 37.00164794921875, 15.443603515625, 33.775611877441406, 27.600399017333984, 36.05998229980469, 18.083587646484375, 31.66427230834961, -6.434871673583984, 29.23328399658203, 33.14604187011719, 12.552688598632812, 28.19439697265625, 11.740196228027344, 7.1147308349609375, 45.48087692260742], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000374.npy"}
{"epoch": 0.5491923641703378, "step": 375, "batch_size": 64, "mean": 21.63034439086914, "std": 19.809179306030273, "min": -24.016250610351562, "p10": 1.2806777954101567, "median": 19.10414695739746, "p90": 48.8170783996582, "max": 66.2431869506836, "pos_frac": 0.90625, "sample": [31.945701599121094, 22.548067092895508, 10.573333740234375, 35.844200134277344, 32.430686950683594, 18.828304290771484, 52.86592102050781, 1.7960968017578125, 32.998748779296875, 3.295764923095703, 13.278507232666016, 31.100143432617188, 11.872398376464844, -24.016250610351562, 7.6456298828125, 48.8131103515625, 64.74378967285156, 22.310855865478516, 5.805488586425781, 10.587242126464844, 12.74029541015625, 17.33487319946289, 4.23162841796875, 44.245018005371094, 45.824684143066406, 36.38627624511719, 37.01769256591797, 5.79588508605957, 19.56536865234375, 20.028488159179688, 41.19501495361328, 6.296207427978516, 20.133201599121094, 21.024566650390625, 11.938156127929688, 6.38177490234375, 58.13465881347656, 24.725738525390625, 14.621269226074219, 19.379989624023438, 5.176368713378906, 6.300092697143555, -4.65179443359375, 10.351638793945312, 40.939910888671875, 31.104782104492188, 48.81877899169922, 17.12881088256836, 66.2431869506836, 7.86358642578125, 28.562686920166016, -9.274818420410156, 18.2657470703125, -0.14134597778320312, 64.12454223632812, 38.624053955078125, 42.286773681640625, 24.89531707763672, -14.535629272460938, 11.281646728515625, 1.059783935546875, 7.507049560546875, 50.714508056640625, -10.572090148925781], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000375.npy"}
{"epoch": 0.5506607929515418, "step": 376, "batch_size": 64, "mean": 23.631113052368164, "std": 18.860553741455078, "min": -35.5494384765625, "p10": 0.36324348449707067, "median": 23.57394027709961, "p90": 47.34362869262696, "max": 72.05795288085938, "pos_frac": 0.921875, "sample": [-35.5494384765625, 20.159713745117188, 28.488082885742188, 36.967918395996094, 49.221717834472656, 30.997207641601562, 44.05323791503906, -4.2441864013671875, 29.809356689453125, 10.820270538330078, 14.208221435546875, 0.23019790649414062, 25.174041748046875, 0.03488731384277344, 28.422454833984375, 22.859729766845703, 29.11688232421875, 6.5149688720703125, 37.624290466308594, -0.03243255615234375, 50.242340087890625, 1.0023422241210938, 43.91807556152344, 52.14500427246094, 53.316261291503906, 1.1250762939453125, 45.47553253173828, 33.94017028808594, 72.05795288085938, 21.835433959960938, 23.793533325195312, 33.42503356933594, 32.19226837158203, 48.14424133300781, 15.583877563476562, 7.0879669189453125, 34.21882629394531, 15.208572387695312, 33.40997314453125, 34.45229721069336, 30.568374633789062, 42.961151123046875, -12.434463500976562, 11.845413208007812, 15.712127685546875, 22.534805297851562, 22.826751708984375, 45.29510498046875, 20.217376708984375, 4.527412414550781, 14.776359558105469, -8.2491455078125, 32.316734313964844, 22.53082275390625, 13.589599609375, 51.51580810546875, 10.192588806152344, 38.525390625, 31.764419555664062, 3.303638458251953, 0.6736831665039062, 23.354347229003906, 31.87621307373047, 14.714889526367188], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000376.npy"}
{"epoch": 0.5521292217327459, "step": 377, "batch_size": 64, "mean": 24.837284088134766, "std": 21.71783447265625, "min": -12.330707550048828, "p10": 0.012201690673829368, "median": 22.7320499420166, "p90": 51.95682830810548, "max": 80.48141479492188, "pos_frac": 0.890625, "sample": [44.06269836425781, 46.87873840332031, 19.642921447753906, 69.02262878417969, 10.758342742919922, 1.690582275390625, 49.56114196777344, 36.45845031738281, 44.82615661621094, 10.48635482788086, 40.30885314941406, 80.48141479492188, 16.35382843017578, 21.8426513671875, 1.8310070037841797, 26.639984130859375, 44.128780364990234, 1.1852645874023438, 38.33188247680664, 22.63846206665039, 37.07672119140625, 74.9703598022461, -0.49053955078125, -8.587181091308594, 14.414840698242188, -11.955291748046875, 67.99549865722656, 31.244827270507812, 8.50885009765625, 14.133468627929688, 28.585247039794922, 21.180007934570312, 11.92694091796875, 30.841232299804688, 4.235200881958008, 43.86870574951172, 17.116119384765625, 32.65949249267578, 54.26051330566406, 52.983551025390625, 11.333913803100586, 24.817222595214844, 37.34374237060547, 44.1566162109375, -8.44891357421875, 18.185009002685547, 22.825637817382812, 13.24176025390625, 28.487533569335938, 70.50617980957031, 23.653953552246094, 12.429359436035156, -6.608558654785156, 4.900821685791016, 7.776477813720703, 27.77301788330078, 4.2218170166015625, 33.84716796875, -6.909645080566406, -12.330707550048828, 19.314010620117188, 22.894615173339844, 35.17713928222656, 8.899215698242188], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000377.npy"}
{"epoch": 0.55359765051395, "step": 378, "batch_size": 64, "mean": 22.49722671508789, "std": 19.35369300842285, "min": -38.93402099609375, "p10": -0.9107896804809547, "median": 24.01447296142578, "p90": 47.16464920043948, "max": 79.87178039550781, "pos_frac": 0.890625, "sample": [7.622444152832031, 8.461341857910156, 31.018478393554688, 51.46318054199219, 55.047203063964844, 27.100536346435547, 18.82776641845703, 17.33055877685547, 40.492576599121094, 14.808387756347656, 36.64772415161133, 29.228118896484375, -2.7812652587890625, 23.30712890625, 11.371498107910156, 5.702606201171875, 16.574459075927734, 33.29240417480469, 24.504806518554688, 14.158008575439453, 13.082168579101562, -7.495548248291016, 24.414230346679688, 28.168685913085938, 28.32927703857422, 25.7769832611084, -6.369667053222656, 34.95349884033203, 18.672115325927734, 17.9672908782959, 27.32262420654297, 18.21826171875, 25.586807250976562, 3.7449951171875, -10.741981506347656, 32.23516845703125, 6.5611419677734375, 50.02410888671875, -5.435276031494141, -38.93402099609375, 38.47658920288086, 57.641761779785156, 24.579246520996094, 39.32798767089844, 37.179298400878906, 1.3852481842041016, 53.47747802734375, 16.916879653930664, 66.43580627441406, 9.696638107299805, 18.206771850585938, 25.818008422851562, 23.08641815185547, 14.37176513671875, 38.651222229003906, 25.024078369140625, 23.614715576171875, 11.726234436035156, 24.714645385742188, 24.502487182617188, 32.98509216308594, -1.894805908203125, 3.7682723999023438, 79.87178039550781], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000378.npy"}
{"epoch": 0.5550660792951542, "step": 379, "batch_size": 64, "mean": 20.73077964782715, "std": 21.999765396118164, "min": -17.259889602661133, "p10": -7.626766586303709, "median": 16.01788330078125, "p90": 43.962245178222666, "max": 84.38357543945312, "pos_frac": 0.84375, "sample": [36.39765548706055, 11.687660217285156, 84.18836975097656, 45.871360778808594, -8.46993637084961, -8.835386276245117, 6.8120269775390625, 16.102142333984375, 6.9269256591796875, 5.5295867919921875, 27.96038818359375, 37.759727478027344, 38.75068664550781, 38.631683349609375, 41.58245849609375, 1.2403717041015625, 31.503021240234375, 38.36414337158203, 18.357872009277344, 13.433845520019531, 39.46275329589844, 29.972434997558594, 4.043121337890625, 3.8241329193115234, 15.933624267578125, 12.898185729980469, 29.184982299804688, 7.478252410888672, 18.870872497558594, -5.659370422363281, 74.02610778808594, 10.835899353027344, -14.56402587890625, 35.83292007446289, 44.734222412109375, 34.94184875488281, 39.03752136230469, 51.462554931640625, 35.68959045410156, 7.9072113037109375, 21.45824432373047, 32.51310729980469, -16.927261352539062, 15.23689079284668, 22.894332885742188, 14.769790649414062, -11.981338500976562, 9.06219482421875, 12.191261291503906, -2.4000396728515625, -0.029369354248046875, 9.05706787109375, 4.643524169921875, 45.49333190917969, 84.38357543945312, 3.7623672485351562, 26.856353759765625, 14.77236557006836, 10.557792663574219, 25.070011138916016, 31.777074813842773, 42.16096496582031, -17.259889602661133, -10.9998779296875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000379.npy"}
{"epoch": 0.5565345080763583, "step": 380, "batch_size": 64, "mean": 22.613746643066406, "std": 24.187694549560547, "min": -32.10243225097656, "p10": -3.320518112182616, "median": 18.18869972229004, "p90": 50.69476127624512, "max": 98.86178588867188, "pos_frac": 0.828125, "sample": [17.267776489257812, 26.87335205078125, -1.7849273681640625, 23.718765258789062, 7.987627029418945, 3.232105255126953, 34.446998596191406, 79.89311981201172, 31.469261169433594, 10.058124542236328, 7.689155578613281, 35.44207000732422, -0.073760986328125, 77.97220611572266, 49.905975341796875, 14.777900695800781, -0.27812957763671875, 17.19385528564453, 37.7120361328125, 17.340545654296875, 48.463539123535156, -2.2477378845214844, 26.56371307373047, 10.876018524169922, 22.903541564941406, 4.775262832641602, 17.71270751953125, 17.995521545410156, 40.64404296875, 15.150634765625, -7.9755706787109375, 64.71100616455078, 39.62016677856445, 52.04881286621094, 98.86178588867188, 50.19089126586914, -3.7802810668945312, 18.381877899169922, 37.3946533203125, 20.689231872558594, 41.8045654296875, 22.375282287597656, -11.936342239379883, -8.010017395019531, -6.364013671875, 12.783489227294922, 22.694915771484375, 35.34349060058594, 5.276268005371094, -32.10243225097656, 50.91070556640625, 8.66183853149414, 11.302543640136719, 19.10595703125, 6.5424957275390625, -27.993934631347656, 46.70580291748047, 21.32184600830078, 17.513973236083984, 8.38031005859375, 33.99040222167969, 31.510025024414062, 60.712120056152344, 12.926521301269531], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000380.npy"}
{"epoch": 0.5580029368575624, "step": 381, "batch_size": 64, "mean": 18.0841064453125, "std": 19.524564743041992, "min": -40.88710021972656, "p10": -1.7407760620117188, "median": 15.279891967773438, "p90": 45.46031951904297, "max": 61.96282958984375, "pos_frac": 0.828125, "sample": [10.501182556152344, -3.111480712890625, 13.206066131591797, 30.37596893310547, 35.63560485839844, 23.52556610107422, 45.53221130371094, 28.41645050048828, 32.281227111816406, 47.252349853515625, 24.396350860595703, 37.45713806152344, 10.093124389648438, 38.378021240234375, 61.96282958984375, 4.151988983154297, 55.78855895996094, 0.9579353332519531, 61.596771240234375, 17.945457458496094, 25.121055603027344, 18.747520446777344, 3.1491050720214844, 20.084396362304688, -1.70654296875, 35.3442268371582, 37.764678955078125, 11.50229263305664, 1.0864028930664062, 4.17097282409668, 6.774965286254883, 22.10924530029297, 22.482688903808594, 8.005884170532227, 6.651939392089844, 16.458091735839844, 45.292572021484375, -0.6103401184082031, 17.009307861328125, 3.9869537353515625, 10.996932983398438, 39.40531921386719, 17.92888641357422, 19.485824584960938, 14.101692199707031, 24.564407348632812, 7.4127197265625, 11.012527465820312, 55.119964599609375, 2.38592529296875, 10.724681854248047, 59.97385025024414, 9.610092163085938, -0.9203147888183594, -0.12095069885253906, -1.7554473876953125, -12.086601257324219, 28.988258361816406, 22.79314422607422, 7.654638290405273, -2.4200668334960938, -40.88710021972656, -4.47943115234375, -1.87493896484375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000381.npy"}
{"epoch": 0.5594713656387665, "step": 382, "batch_size": 64, "mean": 22.994197845458984, "std": 21.24288558959961, "min": -18.3392333984375, "p10": -1.388600921630859, "median": 21.897480010986328, "p90": 52.97406921386719, "max": 74.08287048339844, "pos_frac": 0.875, "sample": [74.08287048339844, 24.13579559326172, 21.63702392578125, 3.3694076538085938, 28.987327575683594, 70.009033203125, 53.2320556640625, 5.622833251953125, 44.79484558105469, 12.5904541015625, 33.19385528564453, 33.28714370727539, 12.221176147460938, 23.75658416748047, -1.003753662109375, 53.84136962890625, 52.14977264404297, -4.203765869140625, 13.596656799316406, 8.38150405883789, 52.057716369628906, 17.180408477783203, 36.98736572265625, 22.157936096191406, 33.624298095703125, 3.5890045166015625, 11.08425521850586, 34.848358154296875, 9.991020202636719, 25.072105407714844, 19.090988159179688, 1.0294952392578125, -8.001068115234375, 24.729839324951172, 42.648216247558594, 52.372100830078125, 9.052108764648438, 33.40876770019531, 2.0160293579101562, 61.45488739013672, 11.781723022460938, 26.76926612854004, 42.29695129394531, 7.996925354003906, -18.3392333984375, 12.693977355957031, 6.121101379394531, -2.094329833984375, 4.997322082519531, -5.965999603271484, 56.446922302246094, 63.325584411621094, 37.01451110839844, 26.459976196289062, 23.090476989746094, 16.380599975585938, -1.5535354614257812, 0.5609054565429688, 30.210716247558594, 7.5918731689453125, 13.603572845458984, -12.281349182128906, 25.588531494140625, 50.8560791015625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000382.npy"}
{"epoch": 0.5609397944199707, "step": 383, "batch_size": 64, "mean": 18.315284729003906, "std": 19.663192749023438, "min": -28.40430450439453, "p10": -8.556473922729488, "median": 17.475990295410156, "p90": 44.57725982666016, "max": 62.73194885253906, "pos_frac": 0.8125, "sample": [33.10881042480469, 0.0034770965576171875, 52.2889404296875, -11.491661071777344, 40.14295959472656, 10.108779907226562, 5.915641784667969, 25.358966827392578, 18.809070587158203, -12.360816955566406, 45.670005798339844, 24.527488708496094, 9.056793212890625, -9.949413299560547, 36.55271911621094, -1.219635009765625, 62.73194885253906, 29.964065551757812, 14.094573974609375, 19.683425903320312, 37.509368896484375, 1.7582359313964844, 11.436473846435547, 35.356475830078125, 41.50287628173828, -5.306282043457031, 47.8575439453125, 10.569145202636719, -16.047378540039062, -12.480091094970703, 43.294036865234375, 7.601985931396484, -12.559158325195312, 34.870018005371094, 10.17544937133789, 7.6197967529296875, 48.781578063964844, 14.723289489746094, 46.033164978027344, 21.12836265563965, 35.39271545410156, 22.751686096191406, -0.7558822631835938, 34.722110748291016, 31.32489013671875, 15.458717346191406, -28.40430450439453, 21.385894775390625, 9.142135620117188, 18.167572021484375, 45.12721252441406, 31.53656005859375, 5.8638458251953125, 40.07145690917969, 41.838172912597656, 0.9100914001464844, -4.1569061279296875, 16.0048828125, -0.5344924926757812, 10.839340209960938, 2.878662109375, 18.31295394897461, 16.784408569335938, 20.69538116455078], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000383.npy"}
{"epoch": 0.5624082232011748, "step": 384, "batch_size": 64, "mean": 20.731597900390625, "std": 20.244815826416016, "min": -17.73963165283203, "p10": 1.6103988647460943, "median": 16.820877075195312, "p90": 44.90930061340333, "max": 89.90149688720703, "pos_frac": 0.921875, "sample": [17.680509567260742, 47.74748611450195, 10.994483947753906, 8.235736846923828, -17.73963165283203, 7.4712371826171875, 3.341686248779297, 8.849655151367188, 85.88839721679688, 16.823638916015625, 32.16313934326172, 41.50188446044922, 18.69516372680664, 22.53984832763672, 35.42464065551758, 11.58758544921875, -2.7654953002929688, 5.047379493713379, 25.73895263671875, 6.545867919921875, 21.744121551513672, 5.110939025878906, 5.557802200317383, 50.67017364501953, 15.930442810058594, 32.455108642578125, 9.458702087402344, 4.787925720214844, 6.769975662231445, 42.21519088745117, 33.65250778198242, 29.762939453125, 5.661689758300781, 46.06391906738281, 29.737686157226562, -17.672611236572266, 1.4257965087890625, 26.41559600830078, 32.971092224121094, 2.0411376953125, 8.647270202636719, 23.703704833984375, 12.699966430664062, 32.26605224609375, 13.675308227539062, 13.165313720703125, 61.469337463378906, 33.7159423828125, 28.55987548828125, -6.665740966796875, 13.0382080078125, 16.818115234375, 22.597084045410156, 89.90149688720703, 32.835662841796875, 5.40423583984375, 21.142967224121094, 16.29499053955078, 1.2123603820800781, 14.66189193725586, 24.864120483398438, 56.945159912109375, -5.90667724609375, 25.24329376220703], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000384.npy"}
{"epoch": 0.5638766519823789, "step": 385, "batch_size": 64, "mean": 20.57828140258789, "std": 18.37662124633789, "min": -13.800827026367188, "p10": -5.810400009155272, "median": 23.04648494720459, "p90": 42.149134826660166, "max": 64.6677474975586, "pos_frac": 0.84375, "sample": [57.563873291015625, -7.5583038330078125, 21.348350524902344, 5.160083770751953, 39.954566955566406, 28.2891845703125, 38.88533020019531, 18.758216857910156, 16.48918914794922, 40.32008361816406, 25.889862060546875, 0.6163482666015625, 14.132675170898438, 3.38592529296875, 6.9102935791015625, -1.4948806762695312, 35.737762451171875, 36.698516845703125, 46.128211975097656, 35.32262420654297, 28.424091339111328, -4.131069183349609, 26.301864624023438, 10.270370483398438, -10.851921081542969, 27.327957153320312, 34.60761260986328, 52.83839416503906, 2.6144561767578125, 42.933013916015625, -8.92449951171875, -12.041717529296875, 37.69823455810547, 20.110694885253906, 35.126365661621094, 32.79210662841797, -13.800827026367188, 4.142448425292969, 64.6677474975586, 36.50227355957031, 16.40270233154297, 11.587844848632812, 29.293746948242188, 5.9416656494140625, 24.744619369506836, 35.77477264404297, 28.634910583496094, 24.79753875732422, 45.12516784667969, -1.8391799926757812, -8.749221801757812, 15.829788208007812, 33.2763671875, 29.431106567382812, 26.18724822998047, 2.779052734375, 24.910701751708984, 16.45431137084961, 20.709421157836914, 44.680328369140625, 9.865020751953125, 3.0365447998046875, -6.530113220214844, 15.520095825195312], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000385.npy"}
{"epoch": 0.5653450807635829, "step": 386, "batch_size": 64, "mean": 19.520633697509766, "std": 17.924114227294922, "min": -21.306137084960938, "p10": -1.6698493957519525, "median": 20.049468994140625, "p90": 40.29407424926758, "max": 69.8929672241211, "pos_frac": 0.859375, "sample": [6.447484970092773, 9.282184600830078, -11.780181884765625, -7.518772125244141, 22.86639404296875, 8.025100708007812, -0.9732933044433594, 25.194618225097656, -3.8453369140625, 35.23539733886719, 23.37401580810547, 12.312301635742188, 35.6666259765625, -5.9918212890625, 23.800086975097656, 64.09707641601562, 12.878646850585938, 13.037460327148438, 19.715438842773438, 0.42783355712890625, 22.798309326171875, 22.714340209960938, -1.0275115966796875, 25.804153442382812, 34.30485916137695, 26.092464447021484, 1.9771385192871094, 1.7855186462402344, 46.305198669433594, 69.8929672241211, 10.920547485351562, 18.660728454589844, 40.59033203125, 29.263778686523438, 30.64708709716797, 36.73419189453125, 17.03510284423828, 10.85650634765625, 29.058998107910156, 21.29266357421875, 13.335983276367188, 27.359779357910156, 4.7602691650390625, 31.610763549804688, 12.380889892578125, -16.235214233398438, -1.9451370239257812, 44.46763610839844, 39.602806091308594, 12.492584228515625, 11.330657958984375, 24.370193481445312, 3.6666336059570312, 31.364959716796875, 13.006675720214844, 41.726112365722656, 30.744651794433594, 15.456157684326172, -21.306137084960938, 35.946136474609375, 53.93547821044922, 9.145450592041016, 33.76115417480469, 20.383499145507812], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000386.npy"}
{"epoch": 0.566813509544787, "step": 387, "batch_size": 64, "mean": 19.068519592285156, "std": 18.669727325439453, "min": -16.520980834960938, "p10": -5.315198516845703, "median": 17.530078887939453, "p90": 42.79512596130371, "max": 59.631256103515625, "pos_frac": 0.8125, "sample": [22.370586395263672, -6.528717041015625, 29.376075744628906, 43.00267028808594, 1.7701053619384766, 40.619407653808594, 18.1522216796875, 2.0309715270996094, 41.948814392089844, 59.631256103515625, 3.703510284423828, 27.251968383789062, 0.1674041748046875, 12.248397827148438, 8.166091918945312, 42.310855865478516, 27.5367431640625, 47.68620300292969, 39.18519592285156, -9.69146728515625, -4.466117858886719, 36.547203063964844, 41.063819885253906, 43.153228759765625, 10.83087158203125, 23.143226623535156, 11.473609924316406, 8.41015625, -5.206298828125, 18.949851989746094, 12.189262390136719, 6.9695281982421875, 58.46795654296875, 6.415107727050781, 41.83668518066406, 27.84142303466797, -16.520980834960938, -0.97607421875, 12.705886840820312, 17.49224853515625, 16.972496032714844, 15.148534774780273, 32.50843811035156, 5.6106719970703125, 17.567909240722656, 27.3587646484375, -0.8445587158203125, 2.130626678466797, 24.767562866210938, -8.454551696777344, -5.361869812011719, 34.843841552734375, -9.744239807128906, 20.913997650146484, -6.8141632080078125, 39.615135192871094, -4.228553771972656, 38.09259033203125, 45.044921875, 27.578155517578125, 17.462631225585938, 11.604835510253906, 31.856719970703125, 45.496585845947266], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000387.npy"}
{"epoch": 0.5682819383259912, "step": 388, "batch_size": 64, "mean": 24.807453155517578, "std": 16.975664138793945, "min": -3.2136783599853516, "p10": 1.9076904296875028, "median": 23.78821563720703, "p90": 49.15668716430665, "max": 72.4151611328125, "pos_frac": 0.9375, "sample": [13.944427490234375, 29.994949340820312, 0.0516204833984375, 45.73295593261719, 34.80982971191406, 51.29132843017578, 25.82101821899414, 20.979049682617188, 39.93041229248047, 5.4633026123046875, 29.785442352294922, 39.99354553222656, 41.795692443847656, 31.50733184814453, 18.034435272216797, 21.664875030517578, 19.486602783203125, 7.708927154541016, 29.18553924560547, 25.189102172851562, 28.17043685913086, 26.183425903320312, 22.8560791015625, -2.534748077392578, 20.971057891845703, 38.15057373046875, 15.400680541992188, 23.238571166992188, 61.493194580078125, 24.337860107421875, 10.803085327148438, 7.25250244140625, 11.901718139648438, 14.627861022949219, 30.79259490966797, 24.404556274414062, 31.0963134765625, 10.986671447753906, 72.4151611328125, -3.2136783599853516, -1.099456787109375, 17.83163833618164, 38.04827880859375, 12.623130798339844, 35.720855712890625, 60.62163543701172, 39.386199951171875, 17.39685821533203, 39.13395690917969, 50.624000549316406, 37.29489517211914, 51.157867431640625, 0.74053955078125, 24.94231414794922, 0.7062263488769531, 4.63104248046875, 5.102054595947266, 17.428558349609375, 53.22602844238281, 10.919815063476562, 33.86647033691406, 15.05963134765625, 22.809425354003906, -2.199291229248047], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000388.npy"}
{"epoch": 0.5697503671071953, "step": 389, "batch_size": 64, "mean": 19.392778396606445, "std": 20.59193992614746, "min": -28.206573486328125, "p10": -5.0768089294433585, "median": 20.371849060058594, "p90": 44.85646972656251, "max": 65.93185424804688, "pos_frac": 0.796875, "sample": [24.499549865722656, 48.70240020751953, 32.107173919677734, -6.792903900146484, -6.056831359863281, 0.73919677734375, 41.27378463745117, 23.46576690673828, 21.827362060546875, -2.7607421875, 65.93185424804688, -11.094520568847656, 30.79161834716797, -13.424728393554688, -2.0343704223632812, 33.436744689941406, 2.404693603515625, 14.660133361816406, 19.608421325683594, 4.6529083251953125, 9.756149291992188, 21.488536834716797, 56.92961883544922, -25.09891128540039, 21.135276794433594, 12.753700256347656, -28.206573486328125, 45.5740966796875, 24.7091064453125, 26.265472412109375, 40.38883590698242, 19.524429321289062, 35.95540237426758, -4.514259338378906, -4.367053985595703, 11.490158081054688, 14.901634216308594, 1.384429931640625, 40.93798828125, 40.340118408203125, 50.469940185546875, 13.036680221557617, 1.6903610229492188, 34.79779052734375, 15.949348449707031, 34.447471618652344, 23.494720458984375, 4.529396057128906, 40.44534683227539, 11.604988098144531, 22.72491455078125, 52.2034912109375, 1.8199691772460938, -5.317901611328125, 29.775897979736328, 37.18254089355469, 57.94451904296875, 18.085365295410156, 43.1820068359375, 11.441810607910156, -0.323333740234375, -0.5572452545166016, 21.583240509033203, 37.64079284667969], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000389.npy"}
{"epoch": 0.5712187958883994, "step": 390, "batch_size": 64, "mean": 19.065292358398438, "std": 22.973493576049805, "min": -18.905906677246094, "p10": -6.155249786376952, "median": 16.22657299041748, "p90": 43.76772232055664, "max": 106.76026916503906, "pos_frac": 0.8125, "sample": [-10.122360229492188, 7.558559417724609, 31.650230407714844, 27.897933959960938, -6.2896881103515625, 3.52203369140625, 30.305023193359375, 3.186960220336914, 39.71578598022461, 18.594566345214844, 23.761642456054688, 42.35686492919922, 21.812992095947266, 12.633293151855469, 6.652191162109375, 3.898548126220703, 18.594928741455078, 106.76026916503906, 6.832847595214844, 69.5130615234375, 18.83179473876953, 39.90252685546875, 33.869590759277344, 23.683677673339844, -9.138240814208984, 0.07425308227539062, 4.693935394287109, 9.322105407714844, 8.406547546386719, 7.4867706298828125, -5.098731994628906, 60.256683349609375, 9.90240478515625, -18.905906677246094, 10.309162139892578, 64.1255111694336, 35.70878601074219, 42.051666259765625, 3.9520950317382812, 15.135292053222656, 11.61859130859375, 18.086639404296875, -5.841560363769531, 17.317853927612305, 30.254287719726562, -3.6079254150390625, -1.8891830444335938, 22.709136962890625, 51.453369140625, 27.37195587158203, 37.7783203125, 40.9534912109375, 0.716522216796875, 39.753807067871094, 5.570899963378906, -10.906383514404297, -3.277374267578125, -15.21933364868164, -6.965980529785156, 32.001441955566406, 1.5447540283203125, 20.611167907714844, 44.37237548828125, 52.366302490234375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000390.npy"}
{"epoch": 0.5726872246696035, "step": 391, "batch_size": 64, "mean": 20.136383056640625, "std": 18.701433181762695, "min": -16.601776123046875, "p10": -0.9150827407836912, "median": 16.739299774169922, "p90": 47.413533020019536, "max": 74.0357437133789, "pos_frac": 0.875, "sample": [-3.850006103515625, 13.659767150878906, 14.233650207519531, 33.34945297241211, 10.653356552124023, 14.666656494140625, 34.66027069091797, 13.206428527832031, 18.522781372070312, 46.89337158203125, 9.43621826171875, 23.195907592773438, -0.7820529937744141, 11.108474731445312, 7.5792694091796875, 3.9789352416992188, 11.319843292236328, 19.225852966308594, -0.9720954895019531, 18.259780883789062, 29.721118927001953, 50.66361618041992, 2.4439239501953125, 27.13720703125, 10.897069931030273, 12.910682678222656, 15.085685729980469, 21.251190185546875, 16.22858428955078, 28.571823120117188, 12.630977630615234, 58.204925537109375, 2.4789791107177734, 21.884687423706055, 5.474983215332031, -6.228950500488281, 39.76548767089844, 39.38524627685547, 43.014400482177734, 9.324913024902344, 19.987403869628906, -16.601776123046875, 31.86069107055664, 22.229907989501953, 55.998111724853516, -13.076065063476562, 24.531204223632812, 14.80801010131836, 4.586273193359375, 23.95998764038086, -2.2037696838378906, 31.608169555664062, 4.120819091796875, 47.63645935058594, 52.994834899902344, 74.0357437133789, 60.719207763671875, 0.5720062255859375, 31.39007568359375, 1.885955810546875, 34.06378936767578, 29.6592960357666, -2.480255126953125, 17.250015258789062], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000391.npy"}
{"epoch": 0.5741556534508077, "step": 392, "batch_size": 64, "mean": 21.572608947753906, "std": 19.62053108215332, "min": -25.211265563964844, "p10": -0.06258392333984322, "median": 21.580392837524414, "p90": 48.9836051940918, "max": 71.9895248413086, "pos_frac": 0.890625, "sample": [28.762535095214844, -0.287689208984375, 0.4626617431640625, 30.668682098388672, 27.830848693847656, 6.1946563720703125, 26.024322509765625, -14.84716796875, 6.462005615234375, 49.116615295410156, 35.32933044433594, 17.304718017578125, 23.51736831665039, 13.664112091064453, -12.628143310546875, 12.301494598388672, 65.57943725585938, 71.9895248413086, 14.775054931640625, 18.039749145507812, 15.915763854980469, 16.122283935546875, 54.290992736816406, 31.361434936523438, 4.320220947265625, 35.19425964355469, 48.673248291015625, 29.817230224609375, 28.582996368408203, 30.545867919921875, 14.819145202636719, 59.65631103515625, 26.38243865966797, 15.909805297851562, 24.470916748046875, 33.422752380371094, 5.426826477050781, 11.832603454589844, 6.09478759765625, 12.719636917114258, 25.12445068359375, 17.332443237304688, 31.354400634765625, 19.643417358398438, 40.691184997558594, 39.44474792480469, 24.15526580810547, 23.74846649169922, -0.9655647277832031, -25.211265563964844, 5.905902862548828, 49.2967529296875, 0.4799308776855469, 10.669700622558594, 9.504104614257812, 31.285724639892578, 32.51493835449219, -1.39190673828125, 62.02685546875, -21.6995849609375, 14.98975944519043, 7.3074798583984375, 25.84516143798828, 32.774993896484375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000392.npy"}
{"epoch": 0.5756240822320118, "step": 393, "batch_size": 64, "mean": 20.865938186645508, "std": 18.218353271484375, "min": -11.224159240722656, "p10": -0.10606079101562496, "median": 19.105140686035156, "p90": 41.639640045166026, "max": 73.00401306152344, "pos_frac": 0.875, "sample": [26.97753143310547, 37.077239990234375, 39.638301849365234, -6.987464904785156, 7.599815368652344, 50.696533203125, 3.5842056274414062, 39.238487243652344, 16.58880043029785, 29.720497131347656, 29.188339233398438, 18.372291564941406, 13.508092880249023, 36.09282684326172, 4.26434326171875, 13.993824005126953, -0.0601959228515625, 73.00401306152344, 22.869964599609375, 39.043235778808594, 10.433895111083984, 1.957794189453125, 23.906646728515625, 26.65607452392578, 54.514198303222656, 24.86200714111328, 1.5478248596191406, 33.30896759033203, 17.535621643066406, 8.042823791503906, 26.82152557373047, 19.65233612060547, 22.33655548095703, 36.057098388671875, 3.7685394287109375, 16.974105834960938, 30.628997802734375, 15.727340698242188, 47.916290283203125, 22.016143798828125, -11.224159240722656, 18.557945251464844, 18.187904357910156, 12.924263000488281, 42.49735641479492, 15.467910766601562, 30.511924743652344, -6.075798034667969, 28.716224670410156, 8.053855895996094, 31.056800842285156, 23.754470825195312, 39.119842529296875, -9.201274871826172, 15.496480941772461, 49.16119384765625, 9.446937561035156, 20.659347534179688, 3.8882598876953125, -9.078933715820312, -8.296575546264648, 3.7847442626953125, 69.06158447265625, -0.1257171630859375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000393.npy"}
{"epoch": 0.5770925110132159, "step": 394, "batch_size": 64, "mean": 22.18606185913086, "std": 18.24863624572754, "min": -10.685659408569336, "p10": 1.0136405944824223, "median": 21.64926528930664, "p90": 44.53365173339844, "max": 74.7290267944336, "pos_frac": 0.9375, "sample": [27.482362747192383, 21.660255432128906, 4.541954040527344, 18.810039520263672, 70.28977966308594, 20.78404998779297, 38.75309371948242, 0.5164222717285156, 33.46186828613281, 21.638275146484375, 32.41368865966797, 4.517856597900391, 4.884124755859375, 35.80541229248047, -10.685659408569336, 17.74738311767578, -9.015201568603516, 6.117637634277344, 4.16668701171875, -3.258880615234375, 2.0516357421875, 35.616798400878906, 46.17943572998047, 25.110137939453125, 44.88737487792969, 26.416030883789062, 43.70829772949219, 6.992107391357422, 12.416057586669922, 74.7290267944336, 50.41307067871094, 26.211036682128906, 15.17117691040039, 4.170297622680664, 27.778385162353516, 28.707107543945312, 15.959022521972656, 34.59918212890625, 31.1873779296875, 22.756546020507812, 1.3399124145507812, 20.305099487304688, 3.5692901611328125, 9.457481384277344, -5.479831695556641, 26.90472412109375, 45.396026611328125, 42.50090789794922, 34.87157440185547, 10.178688049316406, 17.915451049804688, 8.554397583007812, 68.73460388183594, 27.792037963867188, 12.907279968261719, 34.150146484375, 0.873809814453125, 20.057476043701172, 34.50132751464844, 21.8720703125, 21.67333221435547, 0.03206443786621094, 17.813905715942383, 28.294891357421875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000394.npy"}
{"epoch": 0.57856093979442, "step": 395, "batch_size": 64, "mean": 19.882238388061523, "std": 22.658201217651367, "min": -23.231494903564453, "p10": -5.303212356567382, "median": 15.21051025390625, "p90": 51.72984466552735, "max": 77.19021606445312, "pos_frac": 0.828125, "sample": [30.634788513183594, 11.495830535888672, -17.432113647460938, -4.657405853271484, -4.1023712158203125, 52.88768005371094, 10.818756103515625, 8.219284057617188, -20.61627769470215, 25.12816619873047, 14.606185913085938, 24.281021118164062, 3.7374343872070312, 15.814834594726562, 13.780708312988281, 31.23577880859375, 43.50743865966797, 32.36613464355469, 39.74974060058594, -2.5185976028442383, 36.621368408203125, 8.140800476074219, 61.888336181640625, 21.141681671142578, 9.098129272460938, 8.865592956542969, 18.208541870117188, 47.786376953125, -5.579986572265625, 2.2231063842773438, 34.892059326171875, 56.24053955078125, 28.14508819580078, 11.530845642089844, 32.7133674621582, 5.937042236328125, 27.878311157226562, 73.53507995605469, 77.19021606445312, 27.320098876953125, 22.345130920410156, 13.461166381835938, -23.231494903564453, -10.780929565429688, 4.34100341796875, 1.04010009765625, 9.098220825195312, 54.209869384765625, 10.493572235107422, -15.911033630371094, 5.148853302001953, -12.868980407714844, 7.406364440917969, 43.52509307861328, 49.028228759765625, 47.393409729003906, 1.7527999877929688, 20.845970153808594, 24.78528594970703, 56.14400863647461, 34.519508361816406, -0.5986213684082031, 34.81976318359375, 2.7823753356933594], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000395.npy"}
{"epoch": 0.580029368575624, "step": 396, "batch_size": 64, "mean": 16.61659812927246, "std": 18.989290237426758, "min": -30.194358825683594, "p10": -5.086912536621093, "median": 16.98104476928711, "p90": 38.711326599121094, "max": 68.68988037109375, "pos_frac": 0.828125, "sample": [37.880653381347656, 29.90198516845703, -5.364234924316406, 25.85455322265625, 4.975761413574219, 32.74884033203125, -4.439826965332031, 2.985748291015625, -0.24873733520507812, 7.82318115234375, 40.68060302734375, -6.720447540283203, 22.285518646240234, 35.504905700683594, -1.2087821960449219, 16.605697631835938, -0.5214977264404297, 1.7280426025390625, 7.9793548583984375, 47.99859619140625, 30.81841278076172, 14.912559509277344, 34.523094177246094, 24.04278564453125, 27.327957153320312, 11.9063720703125, 28.817218780517578, 25.612518310546875, 29.370803833007812, 68.68988037109375, 1.1761474609375, 8.530181884765625, -30.194358825683594, 26.56226348876953, 5.7048492431640625, 12.962074279785156, -20.804534912109375, 18.772960662841797, -8.935951232910156, 40.83576202392578, 32.19390869140625, 17.521286010742188, -10.654563903808594, 17.82354736328125, 12.90496826171875, 1.247344970703125, 14.198501586914062, 8.367538452148438, 17.35639190673828, 66.23544311523438, 56.324981689453125, 19.11147689819336, 3.9419403076171875, -11.502710342407227, 25.634872436523438, 27.546077728271484, 2.7368087768554688, 3.051239013671875, 2.3280105590820312, 4.048187255859375, 27.789825439453125, 18.681556701660156, 39.06732940673828, 20.427459716796875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000396.npy"}
{"epoch": 0.5814977973568282, "step": 397, "batch_size": 64, "mean": 25.206602096557617, "std": 23.586017608642578, "min": -23.007381439208984, "p10": -5.122070312499999, "median": 25.48540687561035, "p90": 60.63054733276368, "max": 75.72317504882812, "pos_frac": 0.859375, "sample": [0.5671234130859375, 61.6033935546875, 18.40831756591797, 67.36851501464844, 2.6519241333007812, 21.93817901611328, 1.61578369140625, 16.538177490234375, 4.968505859375, -5.407600402832031, 73.903076171875, 47.97737121582031, 23.517730712890625, 12.055145263671875, -3.9991989135742188, -7.1105194091796875, 10.744239807128906, 34.322509765625, 31.554847717285156, 28.347259521484375, 28.627471923828125, 61.99562072753906, 16.55272674560547, 25.076278686523438, 40.64634704589844, -23.007381439208984, 35.940025329589844, 50.120113372802734, -6.838237762451172, 22.336685180664062, 30.184539794921875, 34.46711349487305, 50.36432647705078, 65.0543212890625, 27.81940460205078, 17.68267822265625, 25.894535064697266, -4.455833435058594, 34.33045959472656, 27.25214385986328, 35.57989501953125, 0.7573013305664062, 1.2247085571289062, 19.661788940429688, 44.90959930419922, 44.17967224121094, -12.639541625976562, 50.36993408203125, -7.762676239013672, 1.1493606567382812, 2.041280746459961, 42.159912109375, 50.82398986816406, 58.360572814941406, 11.213851928710938, 75.72317504882812, 40.74168395996094, -10.25350570678711, 62.28133773803711, 0.7294464111328125, 28.747940063476562, 17.05084228515625, 16.564727783203125, 37.99921417236328], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000397.npy"}
{"epoch": 0.5829662261380323, "step": 398, "batch_size": 64, "mean": 24.48653793334961, "std": 22.08234977722168, "min": -20.25934600830078, "p10": 0.4275695800781265, "median": 22.161577224731445, "p90": 58.36796493530274, "max": 81.50845336914062, "pos_frac": 0.890625, "sample": [20.6363525390625, 21.87274169921875, 35.02064514160156, -9.140968322753906, 50.345977783203125, 15.796077728271484, 22.45041275024414, 14.389556884765625, 6.8013763427734375, -0.19097900390625, 31.32727813720703, -14.131729125976562, 9.963211059570312, 8.692153930664062, 15.36785888671875, 55.02214050292969, 1.99151611328125, 27.60546875, 32.322662353515625, 33.93010330200195, 50.5130615234375, 6.749631881713867, 7.6690216064453125, 3.1311721801757812, 71.38597106933594, 23.938995361328125, 25.769134521484375, -5.182258605957031, 19.656814575195312, 3.6966476440429688, 6.587562561035156, 58.145973205566406, 17.729278564453125, 20.823577880859375, 5.8116455078125, 26.713241577148438, 43.21582794189453, 30.487655639648438, 58.496681213378906, -20.25934600830078, 13.297779083251953, 40.237213134765625, 9.002878189086914, -7.075458526611328, 22.653839111328125, 16.852977752685547, 31.757869720458984, 28.804534912109375, 26.782363891601562, 48.248504638671875, -1.5273971557617188, 81.50845336914062, 58.463104248046875, 63.59843444824219, 1.870849609375, 28.720733642578125, 65.8211669921875, 33.826263427734375, 25.240310668945312, 19.036521911621094, 35.010597229003906, 68.88656616210938, 5.093849182128906, 15.874351501464844], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000398.npy"}
{"epoch": 0.5844346549192364, "step": 399, "batch_size": 64, "mean": 25.84492301940918, "std": 16.860872268676758, "min": -8.331571578979492, "p10": 6.866875457763672, "median": 24.60647201538086, "p90": 44.461051940917976, "max": 62.80622863769531, "pos_frac": 0.953125, "sample": [17.84746551513672, 0.0932464599609375, 25.273849487304688, 16.310302734375, 31.664173126220703, 3.3114662170410156, 34.081451416015625, 17.5562744140625, 36.197486877441406, 57.691864013671875, 16.467453002929688, 14.348411560058594, 23.412628173828125, 39.388916015625, 29.792510986328125, 19.739974975585938, 9.794754028320312, 6.8240814208984375, 8.543964385986328, 20.431968688964844, 26.0589599609375, 8.779285430908203, 23.93909454345703, 26.848167419433594, 42.33751678466797, 35.010589599609375, 31.052711486816406, 40.93085479736328, -2.616943359375, 27.88251304626465, 41.33451843261719, 60.48961639404297, 11.528263092041016, 41.714881896972656, 42.44215393066406, -5.96599006652832, 41.49786376953125, 16.660232543945312, 20.194618225097656, 54.90440368652344, 6.966728210449219, 30.559295654296875, 11.86474609375, 12.730636596679688, -8.331571578979492, 41.21638488769531, 8.843515396118164, 36.61140441894531, 45.3262939453125, 9.700859069824219, 42.00732421875, 62.80622863769531, 16.58489990234375, 53.66162109375, 15.813697814941406, 37.30811309814453, 0.3326416015625, 23.169330596923828, 19.5008544921875, 40.49870300292969, 42.232940673828125, 49.90340805053711, 33.19392395019531, 7.77751350402832], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000399.npy"}
{"epoch": 0.5859030837004405, "step": 400, "batch_size": 64, "mean": 26.4871826171875, "std": 20.98246955871582, "min": -12.30706787109375, "p10": -0.6069438934326171, "median": 27.801773071289062, "p90": 54.56327056884766, "max": 76.61355590820312, "pos_frac": 0.859375, "sample": [12.5638427734375, 16.723663330078125, 69.29763793945312, 32.300811767578125, 34.35600280761719, -6.0751190185546875, -0.48290252685546875, 39.80205535888672, 17.2806396484375, 30.522369384765625, 36.548583984375, 12.376655578613281, 33.36346435546875, 33.44744873046875, -0.6483497619628906, -0.5103302001953125, 59.633148193359375, 14.752349853515625, 36.829864501953125, 23.553985595703125, 76.61355590820312, 2.32086181640625, 35.57011413574219, 52.460792541503906, 27.420730590820312, 25.100723266601562, 17.005172729492188, 16.421981811523438, 31.135414123535156, 55.464332580566406, -3.119770050048828, 22.004959106445312, -7.706871032714844, 6.437347412109375, 27.41437530517578, 27.85797119140625, 37.73072052001953, 28.894668579101562, 25.98236846923828, 9.743728637695312, -11.858978271484375, -6.174293518066406, 38.06854248046875, 31.26397705078125, 3.751178741455078, 36.57245635986328, 27.082305908203125, 46.78567886352539, 8.782951354980469, 75.16517639160156, 48.684417724609375, -12.30706787109375, 8.578948974609375, 29.366474151611328, 28.407752990722656, 73.03515625, 3.0885009765625, 43.529212951660156, 57.283599853515625, 34.60468292236328, 24.768630981445312, 32.881927490234375, 35.68383026123047, 27.745574951171875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000400.npy"}
{"epoch": 0.5873715124816447, "step": 401, "batch_size": 64, "mean": 19.680631637573242, "std": 25.157018661499023, "min": -16.611114501953125, "p10": -6.114338684082031, "median": 13.159366607666016, "p90": 59.7957691192627, "max": 92.17056274414062, "pos_frac": 0.78125, "sample": [-7.958335876464844, -8.196861267089844, 6.6205902099609375, 58.21615219116211, 20.184364318847656, -1.350973129272461, -15.117420196533203, 13.685966491699219, 12.632766723632812, -16.611114501953125, -2.1427459716796875, 1.730316162109375, 14.129112243652344, -3.7096519470214844, 10.757904052734375, 20.213830947875977, 12.214981079101562, 20.158828735351562, -5.913330078125, 61.09312057495117, 3.683624267578125, 0.9506416320800781, 30.8564453125, 61.272186279296875, 0.2148590087890625, 88.35279846191406, 60.472747802734375, 52.79490661621094, 64.67333221435547, 1.0727710723876953, 15.057106018066406, -2.9918899536132812, -2.723155975341797, 40.92620086669922, -9.007547378540039, 33.07896423339844, 66.57916259765625, -6.2004852294921875, 18.922119140625, 5.44049072265625, 22.592140197753906, 21.66693115234375, 9.19659423828125, 2.3071212768554688, 30.817245483398438, 0.6830596923828125, 17.733291625976562, 12.168212890625, 92.17056274414062, 41.50133514404297, 41.777008056640625, 0.17440032958984375, 37.41572189331055, 11.97079849243164, 32.22993469238281, 47.991455078125, 17.301877975463867, 30.24197006225586, 14.408462524414062, -0.904149055480957, 9.123382568359375, 57.87281799316406, 4.434837341308594, -9.377386093139648], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000401.npy"}
{"epoch": 0.5888399412628488, "step": 402, "batch_size": 64, "mean": 27.708587646484375, "std": 19.893310546875, "min": -18.24288558959961, "p10": 5.249015045166016, "median": 27.334020614624023, "p90": 50.13126373291016, "max": 88.46714782714844, "pos_frac": 0.921875, "sample": [8.546531677246094, 8.691551208496094, 25.33469009399414, 30.1817626953125, 5.0359649658203125, 30.00547218322754, -4.2947235107421875, 8.000518798828125, 18.311859130859375, 33.85198974609375, 29.058250427246094, 35.41317367553711, 73.48977661132812, 28.283306121826172, 11.02032470703125, 43.16004180908203, -7.292320251464844, 45.256004333496094, 31.979454040527344, 43.818267822265625, 48.74821472167969, 20.82183837890625, 44.7216796875, -2.2137489318847656, 40.81828308105469, 26.384735107421875, 22.66259002685547, 61.57826232910156, 34.52025604248047, 37.678009033203125, 33.124263763427734, 21.2843017578125, 36.722900390625, 9.888927459716797, -2.6089248657226562, 24.54047393798828, 46.10736846923828, 42.73384094238281, 25.722976684570312, 23.387313842773438, 34.64861297607422, 2.5643043518066406, 50.7239990234375, 53.27587890625, 28.95960235595703, 16.213191986083984, 19.35340118408203, 24.024763107299805, 13.002487182617188, 88.46714782714844, 74.1416244506836, 25.350357055664062, 22.1185302734375, 34.47200012207031, 57.26658630371094, 31.906673431396484, 30.27376937866211, 6.540733337402344, 5.746131896972656, -18.24288558959961, 19.74114418029785, 6.3885955810546875, 15.969154357910156, 35.968284606933594], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000402.npy"}
{"epoch": 0.5903083700440529, "step": 403, "batch_size": 64, "mean": 22.11441421508789, "std": 21.259973526000977, "min": -14.132759094238281, "p10": -1.108635139465332, "median": 20.63594627380371, "p90": 48.144873809814456, "max": 86.88841247558594, "pos_frac": 0.84375, "sample": [18.96875762939453, 12.393226623535156, 12.927852630615234, 21.131820678710938, 27.760299682617188, 4.341178894042969, 57.46749496459961, 23.64257049560547, -10.455314636230469, -1.0371074676513672, 25.76746368408203, 27.2760009765625, -14.132759094238281, 63.1287841796875, 6.258508682250977, 39.836517333984375, 10.523662567138672, -3.7660789489746094, 33.899078369140625, 22.50326919555664, 45.214927673339844, 20.140071868896484, 38.76685333251953, 0.7733345031738281, 1.2173652648925781, 45.6793212890625, 46.99888610839844, 5.427001953125, 30.514942169189453, 66.85320281982422, 3.5050125122070312, -6.680450439453125, 86.88841247558594, 23.896163940429688, 19.648696899414062, 5.967231750488281, -2.4613685607910156, 35.74937057495117, 25.23694610595703, 47.959251403808594, -1.1392898559570312, 6.252452850341797, 48.22442626953125, -0.31116199493408203, 29.822105407714844, -4.471855163574219, 50.28346633911133, 18.776031494140625, 20.028099060058594, 44.844635009765625, 13.877998352050781, 16.212854385375977, 70.43667602539062, 25.013336181640625, 34.55565643310547, 1.0265045166015625, 1.143035888671875, 1.905670166015625, 25.60723876953125, 24.75550079345703, 13.439105987548828, -0.33954620361328125, 24.03687286376953, 31.612335205078125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000403.npy"}
{"epoch": 0.591776798825257, "step": 404, "batch_size": 64, "mean": 23.50090789794922, "std": 22.771345138549805, "min": -27.390174865722656, "p10": -3.9870502471923817, "median": 22.54512596130371, "p90": 47.52130889892578, "max": 100.15618133544922, "pos_frac": 0.859375, "sample": [47.843894958496094, -4.514747619628906, 15.126152038574219, 22.92779541015625, 40.811744689941406, 13.058012008666992, 41.819435119628906, 41.919464111328125, 39.450523376464844, 23.320297241210938, 46.76860809326172, 17.683731079101562, 0.7641677856445312, 84.65267944335938, 33.41547775268555, 27.260276794433594, 100.15618133544922, 39.704315185546875, -7.3377838134765625, 16.975330352783203, -5.450923919677734, 26.350391387939453, 19.812286376953125, 14.853378295898438, 26.272506713867188, 38.498172760009766, 0.2966270446777344, 43.001861572265625, 35.116546630859375, 13.213653564453125, -1.6676998138427734, 16.399682998657227, 2.1163177490234375, 29.126216888427734, 42.924713134765625, 19.744630813598633, 8.641571044921875, 32.17176818847656, 31.767868041992188, 6.027191162109375, -5.753166198730469, 4.484533309936523, 35.9980583190918, 60.771812438964844, -2.755756378173828, 19.92761993408203, -8.726333618164062, 13.96308708190918, 46.087249755859375, -27.390174865722656, 27.837078094482422, 64.66627502441406, 14.732511520385742, 26.304443359375, 51.212913513183594, 54.59590148925781, 30.114269256591797, 0.69921875, 1.6036567687988281, -8.512064933776855, 8.868476867675781, 24.484268188476562, 22.162456512451172, 7.659538269042969], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000404.npy"}
{"epoch": 0.593245227606461, "step": 405, "batch_size": 64, "mean": 19.302730560302734, "std": 18.67698097229004, "min": -18.99414825439453, "p10": -1.088600158691405, "median": 14.344215393066406, "p90": 43.944139099121095, "max": 64.44685363769531, "pos_frac": 0.890625, "sample": [13.387680053710938, 38.7974853515625, 8.263191223144531, 16.35722541809082, 25.95069122314453, 13.86932373046875, 26.494300842285156, -18.99414825439453, 13.548807144165039, 13.723602294921875, 13.343727111816406, 48.89344787597656, 63.97911834716797, 11.839515686035156, 9.94229507446289, 63.922027587890625, 16.27387237548828, 6.978614807128906, 34.99951934814453, 3.90234375, 23.629867553710938, 6.883979797363281, -1.6719970703125, 44.10704803466797, -4.405506134033203, 12.855106353759766, 4.3395843505859375, 11.873428344726562, 30.2237548828125, 4.725006103515625, -6.765567779541016, -2.215264320373535, 16.307788848876953, 4.836414337158203, 26.932968139648438, -5.023170471191406, 4.793022155761719, 2.08746337890625, 8.282032012939453, 0.2726593017578125, 43.56401824951172, 22.650909423828125, 55.14173889160156, 15.501140594482422, -9.084724426269531, 23.043275833129883, 19.071502685546875, 14.819107055664062, 10.02789306640625, 58.54833984375, 42.927520751953125, 11.29705810546875, 3.6926937103271484, 41.211944580078125, 6.890663146972656, 17.109603881835938, 41.31147766113281, 11.445503234863281, 30.60979461669922, 20.788681030273438, 64.44685363769531, 23.85169219970703, 19.958541870117188, 39.00829315185547], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000405.npy"}
{"epoch": 0.5947136563876652, "step": 406, "batch_size": 64, "mean": 19.543176651000977, "std": 19.20459747314453, "min": -23.99053192138672, "p10": -2.172596740722656, "median": 17.378318786621094, "p90": 38.519323730468756, "max": 74.6593017578125, "pos_frac": 0.84375, "sample": [18.941482543945312, 7.387992858886719, 7.037681579589844, 13.797637939453125, 16.45282745361328, 43.287200927734375, 6.158500671386719, 35.17913055419922, 63.97886657714844, -0.4713706970214844, 30.295516967773438, -5.180946350097656, 74.6593017578125, 27.44837188720703, 28.135719299316406, 16.588455200195312, 8.51812744140625, 17.84770965576172, -8.823867797851562, 7.797206878662109, 34.94911193847656, 39.50103759765625, 25.471328735351562, 29.754058837890625, 38.77622985839844, 8.914203643798828, 9.026254653930664, 26.090499877929688, 28.7335205078125, -23.99053192138672, -12.117801666259766, -4.0829620361328125, 6.361900329589844, 7.011722564697266, 24.520545959472656, 12.002372741699219, 29.630714416503906, -11.879226684570312, 19.21202850341797, 0.31729888916015625, 27.564714431762695, 33.5987548828125, 36.937355041503906, -2.2111244201660156, 15.97265625, -0.5933914184570312, 33.395751953125, 34.414493560791016, 28.895278930664062, 19.553024291992188, 73.89878845214844, 26.06183624267578, 11.344005584716797, 7.4605255126953125, 11.342384338378906, -2.0826988220214844, 2.2959365844726562, 14.280784606933594, 37.91987609863281, 16.90892791748047, 56.80219268798828, 9.386993408203125, 28.54632568359375, 31.832012176513672], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000406.npy"}
{"epoch": 0.5961820851688693, "step": 407, "batch_size": 64, "mean": 18.65850830078125, "std": 19.293792724609375, "min": -31.817184448242188, "p10": 0.797126388549805, "median": 13.42624282836914, "p90": 40.17920455932617, "max": 81.7950439453125, "pos_frac": 0.90625, "sample": [8.483274459838867, 56.10742950439453, 8.558372497558594, -14.255012512207031, -31.817184448242188, 20.329330444335938, 56.87841796875, -5.282859802246094, 11.0413818359375, -10.731815338134766, 24.290451049804688, 21.80535888671875, 18.638364791870117, 8.623970031738281, 35.35326385498047, 5.8505706787109375, 22.51995849609375, 37.0185546875, 8.114471435546875, 34.49576187133789, 6.2203369140625, 28.783828735351562, 60.379364013671875, 38.23614501953125, 12.808326721191406, 16.913909912109375, 20.440025329589844, 28.542583465576172, -3.51934814453125, 23.680343627929688, 0.6732406616210938, 81.7950439453125, 39.29431915283203, 13.683929443359375, 11.677589416503906, 19.27501678466797, 26.932891845703125, 11.870750427246094, 38.236454010009766, -7.5074615478515625, 3.3457794189453125, 7.3455047607421875, 6.5773162841796875, 7.4935150146484375, 38.523582458496094, 32.814613342285156, 15.5867919921875, 54.29847717285156, 31.67380142211914, 13.318328857421875, 44.21874237060547, 1.0861930847167969, 9.631362915039062, 27.71611785888672, 9.326545715332031, 9.18531608581543, 7.967151641845703, 7.6451263427734375, 13.534156799316406, 6.455657958984375, 40.558441162109375, 6.481239318847656, 6.583404541015625, 8.338081359863281], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000407.npy"}
{"epoch": 0.5976505139500734, "step": 408, "batch_size": 64, "mean": 18.556800842285156, "std": 22.205425262451172, "min": -20.760883331298828, "p10": -3.8642456054687497, "median": 12.236279487609863, "p90": 49.86603813171387, "max": 90.06394958496094, "pos_frac": 0.859375, "sample": [7.93768310546875, 90.06394958496094, 56.08544158935547, 43.589881896972656, 20.214523315429688, 21.093896865844727, 18.38459014892578, 12.216791152954102, 2.7744216918945312, 8.552322387695312, 11.632884979248047, 2.072772979736328, 27.51275634765625, 12.255767822265625, 4.2950286865234375, -20.760883331298828, 19.365562438964844, 31.399734497070312, 44.62260437011719, 21.052169799804688, 10.077655792236328, 23.95184326171875, 7.57342529296875, 19.524749755859375, -18.238197326660156, 43.15636444091797, -5.673368453979492, 7.0515899658203125, -4.031913757324219, 33.47084045410156, 0.38990020751953125, 20.902938842773438, 59.14076232910156, -5.437686920166016, -19.141883850097656, -3.604949951171875, 25.75262451171875, 45.50090026855469, 19.86054229736328, 58.1041259765625, 9.609085083007812, -1.849945068359375, 7.403358459472656, 61.83721923828125, 15.796554565429688, 3.3936805725097656, 50.154693603515625, 45.34381103515625, 0.704071044921875, 4.571968078613281, 14.968690872192383, -3.975372314453125, 11.647079467773438, 4.373100280761719, 4.9496307373046875, 1.7228622436523438, 4.717634201049805, 4.208831787109375, 25.414718627929688, 38.692665100097656, 64.31912231445312, 49.192508697509766, 0.5227203369140625, 17.224416732788086], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000408.npy"}
{"epoch": 0.5991189427312775, "step": 409, "batch_size": 64, "mean": 17.5576171875, "std": 19.278173446655273, "min": -26.67962646484375, "p10": -5.615674591064453, "median": 16.294175148010254, "p90": 41.775230789184576, "max": 73.40225219726562, "pos_frac": 0.828125, "sample": [3.694366455078125, -5.567604064941406, 25.661537170410156, 19.016632080078125, 29.119274139404297, 13.239471435546875, 54.457191467285156, 26.118377685546875, 25.833236694335938, -26.67962646484375, 44.90924072265625, 33.74623107910156, -5.067573547363281, 30.970855712890625, 23.246932983398438, 18.251251220703125, 23.336225509643555, 0.8772125244140625, 41.10799789428711, 73.40225219726562, 8.172050476074219, -20.535621643066406, 38.61878967285156, 43.022335052490234, 31.23388671875, 9.633708953857422, 27.502166748046875, 4.098140716552734, 26.446212768554688, 10.973854064941406, -6.563629150390625, 58.93206787109375, 38.78467559814453, -12.589286804199219, 26.80036163330078, 42.061187744140625, 8.506233215332031, 12.226119995117188, -10.01605224609375, 4.027229309082031, 14.96026611328125, 14.315345764160156, 4.5379180908203125, 4.9648590087890625, 22.25799560546875, 25.765926361083984, 11.221939086914062, 18.639026641845703, 26.35546875, 22.96605682373047, 15.82304573059082, 11.895744323730469, -5.434055328369141, 11.246723175048828, -14.42120361328125, 49.84919357299805, 37.52030944824219, 15.311676025390625, 6.667243957519531, 6.438911437988281, 25.197540283203125, 16.765304565429688, -4.5294342041015625, -5.6362762451171875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000409.npy"}
{"epoch": 0.6005873715124816, "step": 410, "batch_size": 64, "mean": 17.959930419921875, "std": 18.557422637939453, "min": -24.66382598876953, "p10": -1.3482118606567373, "median": 14.920612335205078, "p90": 40.147047424316405, "max": 87.92686462402344, "pos_frac": 0.859375, "sample": [32.99730682373047, 13.222461700439453, -1.7880420684814453, 8.309707641601562, 5.670433044433594, 24.7916259765625, 34.627227783203125, 55.7337646484375, 40.20086669921875, -0.20092391967773438, 9.641458511352539, 16.469818115234375, 24.67517852783203, 17.64897918701172, 12.503658294677734, 20.26678466796875, 40.02146911621094, 40.92778015136719, 6.322509765625, 12.877391815185547, 26.5872802734375, 9.128265380859375, 16.272216796875, 32.80870056152344, 14.539787292480469, 20.994110107421875, 17.040359497070312, 3.209257125854492, -2.9087905883789062, 39.391265869140625, 33.463134765625, -0.3219413757324219, 23.31757354736328, 22.993675231933594, 5.6849822998046875, -8.817138671875, 2.5763778686523438, -8.030921936035156, -11.364517211914062, -24.66382598876953, 1.5692367553710938, 24.168289184570312, 3.9976882934570312, 9.796173095703125, 4.539142608642578, 4.539398193359375, 12.478546142578125, 32.95575714111328, 40.73213195800781, 24.794265747070312, 3.139739990234375, 15.301437377929688, 32.46876525878906, 24.868789672851562, 35.28491973876953, 18.1590576171875, -2.0940322875976562, 45.69456481933594, 13.857284545898438, 87.92686462402344, 3.884998321533203, 12.599098205566406, 60.11598205566406, 11.8382568359375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000410.npy"}
{"epoch": 0.6020558002936858, "step": 411, "batch_size": 64, "mean": 20.6715087890625, "std": 20.327835083007812, "min": -13.000732421875, "p10": -1.9251033782958984, "median": 18.031234741210938, "p90": 45.08404388427734, "max": 95.968017578125, "pos_frac": 0.828125, "sample": [12.913978576660156, -8.134674072265625, 14.510597229003906, -1.9669189453125, 11.499130249023438, -7.617027282714844, -4.283435821533203, 11.280220031738281, 44.77069091796875, 43.83055114746094, 38.771392822265625, 12.807838439941406, 45.21833801269531, 5.600311279296875, 8.252071380615234, -0.3989715576171875, 13.5958251953125, 62.35625457763672, 55.68909454345703, 18.381057739257812, 43.08855438232422, 21.380271911621094, 24.8824462890625, -10.85453987121582, 27.307693481445312, -0.1549835205078125, 4.5700836181640625, 27.32433319091797, 3.227508544921875, 38.290618896484375, 21.220428466796875, 24.455604553222656, 21.925155639648438, 40.56039047241211, -13.000732421875, 2.38409423828125, 16.905906677246094, 57.60832977294922, 50.34382629394531, 28.464065551757812, 32.797550201416016, 42.90849304199219, 21.8743896484375, 11.944278717041016, 23.259761810302734, 27.121780395507812, 24.18780517578125, 3.357879638671875, 95.968017578125, 17.681411743164062, 24.653732299804688, 46.580196380615234, 12.823837280273438, 36.48719787597656, 26.04229736328125, -1.8275337219238281, 4.574531555175781, 12.079925537109375, -1.8057937622070312, 4.8959197998046875, 32.40106964111328, 3.597219467163086, 16.523422241210938, -2.1562271118164062], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000411.npy"}
{"epoch": 0.6035242290748899, "step": 412, "batch_size": 64, "mean": 20.16986083984375, "std": 20.06533432006836, "min": -14.174041748046875, "p10": -3.1650711059570305, "median": 17.38201904296875, "p90": 41.48425216674805, "max": 89.34213256835938, "pos_frac": 0.84375, "sample": [-3.828205108642578, 40.72275924682617, -1.7490730285644531, 0.09508514404296875, 32.982391357421875, 40.659820556640625, 41.778621673583984, -9.074726104736328, 2.712921142578125, 37.31317138671875, 7.506406784057617, 23.235824584960938, 18.74560546875, 45.40642547607422, 17.17235565185547, 21.563690185546875, 32.15683364868164, 9.18685531616211, 28.77770233154297, 57.389686584472656, 5.2428741455078125, 0.9919586181640625, 31.88713836669922, 7.816925048828125, 10.449146270751953, 17.061790466308594, 8.615097045898438, 24.33575439453125, -0.9076309204101562, 5.154758453369141, 0.027910232543945312, 15.359012603759766, 15.1041259765625, 3.2284774780273438, 23.709125518798828, 17.59168243408203, -2.3985595703125, 10.838411331176758, 36.012596130371094, 61.917266845703125, 23.651824951171875, -4.401451110839844, 40.79738998413086, 15.050689697265625, 29.68756866455078, -3.4935760498046875, 31.632080078125, 32.91434097290039, -10.051773071289062, 20.954124450683594, 37.19074249267578, 14.955879211425781, 6.322685241699219, 8.099517822265625, 60.692626953125, 89.34213256835938, -14.174041748046875, 29.096145629882812, 32.940757751464844, -13.184799194335938, 40.055091857910156, 26.26873016357422, 15.405349731445312, 46.327056884765625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000412.npy"}
{"epoch": 0.604992657856094, "step": 413, "batch_size": 64, "mean": 19.43267059326172, "std": 21.24412727355957, "min": -37.5576171875, "p10": -4.751888275146482, "median": 18.753952026367188, "p90": 40.44720764160157, "max": 99.75569152832031, "pos_frac": 0.859375, "sample": [22.017013549804688, -7.8922271728515625, 18.099788665771484, 2.619487762451172, 10.9073486328125, 10.145614624023438, 20.500335693359375, 15.860967636108398, 11.999420166015625, -17.253814697265625, 25.416038513183594, 35.37261962890625, -0.22652435302734375, 36.24451446533203, -6.9094390869140625, 9.628133773803711, -2.2684402465820312, 10.235382080078125, 41.35040283203125, 33.70750427246094, 1.5707950592041016, 16.89703369140625, 50.200103759765625, -33.39984130859375, 1.824951171875, 17.845279693603516, 14.768798828125, 29.19867706298828, 21.60065460205078, 33.61161804199219, -10.533267974853516, 17.941497802734375, 10.030021667480469, 34.640625, 43.268707275390625, 19.20257568359375, 67.16790008544922, 18.382232666015625, 37.031349182128906, 37.89398193359375, 22.152755737304688, 24.381622314453125, 20.699012756347656, 33.19590759277344, 20.343975067138672, 34.586158752441406, 30.46112060546875, -37.5576171875, 7.610443115234375, 33.35841369628906, 10.272811889648438, 19.12567138671875, 45.69171142578125, 50.63288879394531, 2.0792484283447266, 13.07468032836914, -5.81622314453125, 99.75569152832031, 20.3459529876709, 38.339752197265625, 24.270355224609375, 14.074434280395508, 9.751792907714844, 14.162513732910156], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000413.npy"}
{"epoch": 0.6064610866372981, "step": 414, "batch_size": 64, "mean": 19.660972595214844, "std": 22.256919860839844, "min": -38.25352478027344, "p10": -3.02148208618164, "median": 14.575027465820312, "p90": 49.86645278930664, "max": 84.9361572265625, "pos_frac": 0.828125, "sample": [34.7201042175293, 29.276851654052734, 35.084320068359375, -2.1059951782226562, 26.3819580078125, 48.672332763671875, 0.070770263671875, -38.25352478027344, -0.33673095703125, 50.49273681640625, 19.36138916015625, 11.662361145019531, 35.09219741821289, 17.368118286132812, 2.8214569091796875, 8.804506301879883, 21.084821701049805, 18.319908142089844, 5.749279022216797, -6.062288284301758, 11.969207763671875, 27.578392028808594, 10.889055252075195, -4.667930603027344, -1.45111083984375, 14.03570556640625, 4.400787353515625, 17.657154083251953, 40.151947021484375, 32.354522705078125, 50.20002746582031, 80.85474395751953, -1.3860740661621094, 10.157058715820312, 57.19733428955078, 8.45751953125, 13.095901489257812, 11.293663024902344, 7.532989501953125, 32.21121597290039, 22.371475219726562, 15.114349365234375, 54.87892150878906, 12.015525817871094, 4.549762725830078, 26.78082275390625, 18.63360595703125, 18.449527740478516, 1.1726531982421875, -8.800529479980469, 84.9361572265625, 49.088111877441406, 34.57335662841797, 5.997657775878906, 12.306461334228516, 11.980506896972656, 63.328521728515625, -17.303565979003906, 27.779682159423828, 31.797256469726562, -3.503143310546875, -3.4138336181640625, 7.466419219970703, 47.36589431762695], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000414.npy"}
{"epoch": 0.6079295154185022, "step": 415, "batch_size": 64, "mean": 19.73385238647461, "std": 22.08148765563965, "min": -32.17335510253906, "p10": -3.644650268554687, "median": 17.627992630004883, "p90": 50.578764343261746, "max": 90.10455322265625, "pos_frac": 0.84375, "sample": [41.365447998046875, 0.563507080078125, 22.012130737304688, 78.43946838378906, -3.001708984375, 1.1958770751953125, 90.10455322265625, 14.234687805175781, 29.426921844482422, 27.48784637451172, 20.77286148071289, 8.952117919921875, 15.204010009765625, 41.17314910888672, 11.16351318359375, 17.483280181884766, 20.707229614257812, 28.393096923828125, 3.186126708984375, 10.364566802978516, 5.022621154785156, 26.622467041015625, -3.920196533203125, -1.3708648681640625, 10.89261245727539, 30.041879653930664, 36.64427185058594, 14.742774963378906, 44.55494689941406, 8.37469482421875, -32.17335510253906, 25.93594741821289, -19.94805908203125, 13.603652954101562, 53.160400390625, -1.3378715515136719, -8.338882446289062, 17.772705078125, 21.36717987060547, 53.5821533203125, 3.522857666015625, 29.471115112304688, 12.228363037109375, -18.350494384765625, 11.08792495727539, 37.924625396728516, 3.125396728515625, 29.283676147460938, 17.29949188232422, 31.114242553710938, 15.75701904296875, 59.80506896972656, 18.994117736816406, -10.994659423828125, 28.649208068847656, 3.346181869506836, 19.933265686035156, 28.621780395507812, 54.556358337402344, -7.6233367919921875, 54.847686767578125, 4.4053955078125, 33.482383728027344, 28.021150588989258], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000415.npy"}
{"epoch": 0.6093979441997063, "step": 416, "batch_size": 64, "mean": 24.90656280517578, "std": 20.028270721435547, "min": -14.119667053222656, "p10": 3.5293151855468765, "median": 22.730891227722168, "p90": 54.437676239013676, "max": 71.79888916015625, "pos_frac": 0.90625, "sample": [13.492591857910156, 30.152015686035156, 67.94287109375, 12.459388732910156, -11.361064910888672, 42.82072448730469, 17.168807983398438, 5.785709381103516, 33.71516418457031, 46.38622283935547, 40.447357177734375, -6.888343811035156, 11.443241119384766, 33.448883056640625, 25.406600952148438, 28.98492431640625, 23.372129440307617, 8.127609252929688, 33.29493713378906, 7.741241455078125, 23.3226318359375, 49.41358184814453, -14.119667053222656, 49.89251708984375, 31.26366424560547, 54.68377685546875, 22.501760482788086, 71.79888916015625, 21.700958251953125, 42.50098419189453, 15.396385192871094, 53.863441467285156, 68.29397583007812, 22.96002197265625, 21.04000473022461, 5.780059814453125, 36.2203369140625, 13.158004760742188, -6.483329772949219, 31.28919219970703, 9.713996887207031, 44.2376708984375, 24.321868896484375, 14.291595458984375, -1.6923370361328125, 13.482406616210938, 15.216407775878906, 57.93287658691406, 21.345308303833008, 18.523521423339844, 9.599254608154297, 6.9770660400390625, 2.8860015869140625, 31.044754028320312, 9.579704284667969, 5.0303802490234375, 60.81507110595703, -3.120189666748047, 13.64394760131836, 24.644424438476562, 14.049537658691406, 26.148406982421875, 55.9181022644043, 41.01214599609375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000416.npy"}
{"epoch": 0.6108663729809104, "step": 417, "batch_size": 64, "mean": 22.92763900756836, "std": 20.348093032836914, "min": -23.7900390625, "p10": -2.031050109863281, "median": 21.78533363342285, "p90": 51.71606674194336, "max": 71.55857849121094, "pos_frac": 0.84375, "sample": [13.932315826416016, 53.07511901855469, 1.4745616912841797, 44.81040954589844, -0.8468093872070312, 71.55857849121094, 26.579978942871094, 18.259490966796875, 4.89324951171875, 51.399803161621094, -15.390602111816406, 10.048805236816406, 23.182220458984375, -4.818111419677734, 47.04852294921875, 37.09807586669922, -2.001708984375, 35.93052673339844, 29.928558349609375, -8.37457275390625, 29.327713012695312, 45.36273193359375, 25.036579132080078, 51.85160827636719, 48.649253845214844, 40.843719482421875, 54.53056335449219, -1.5142135620117188, 23.943939208984375, 47.17533874511719, 11.429878234863281, -2.0436248779296875, 27.722850799560547, 34.35478973388672, 19.361724853515625, 10.022918701171875, 12.946510314941406, 11.786421775817871, 23.613510131835938, 13.497200012207031, 3.9319992065429688, 16.604022979736328, 17.98979949951172, 20.477218627929688, 15.261390686035156, 57.08240509033203, 27.0540771484375, 3.041393280029297, -23.7900390625, 27.1943359375, 19.76134490966797, -9.286569595336914, 53.881568908691406, 43.43113708496094, 27.765487670898438, 14.683441162109375, 12.486312866210938, 23.093448638916016, 43.23115539550781, 52.58808898925781, 11.70123291015625, -7.9436187744140625, 36.321929931640625, 15.119514465332031], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000417.npy"}
{"epoch": 0.6123348017621145, "step": 418, "batch_size": 64, "mean": 27.021373748779297, "std": 22.232643127441406, "min": -38.82067108154297, "p10": 1.291721534729005, "median": 23.479557037353516, "p90": 60.32138137817383, "max": 80.16654968261719, "pos_frac": 0.90625, "sample": [23.257423400878906, 23.73157501220703, 13.890401840209961, 0.7877025604248047, 43.14043426513672, 23.380325317382812, 8.604515075683594, 25.057205200195312, 59.89342498779297, 45.162620544433594, 32.79718017578125, -2.1844329833984375, 62.69620895385742, 32.15441131591797, 33.10704803466797, 18.464675903320312, 19.208343505859375, 9.958633422851562, 17.50865936279297, 32.88262176513672, -38.82067108154297, 48.266502380371094, 45.001991271972656, 12.411376953125, 15.285930633544922, 31.68933868408203, 7.578330993652344, 50.55868911743164, 41.967376708984375, 26.573760986328125, 39.598121643066406, 37.92091369628906, 12.181625366210938, 74.82886505126953, 9.875228881835938, 11.809459686279297, 2.4677658081054688, 20.211849212646484, 23.459030151367188, 34.790618896484375, 80.16654968261719, 26.064544677734375, 63.678253173828125, 60.504791259765625, 45.305389404296875, 19.49396514892578, 11.932952880859375, 16.766666412353516, 21.744674682617188, -8.11895751953125, 63.88587951660156, 18.902759552001953, -5.3111114501953125, 41.87548065185547, 34.921653747558594, -0.45611572265625, -17.39502716064453, 57.22843933105469, 23.500083923339844, 16.081192016601562, 33.1614990234375, 61.74000549316406, 17.846405029296875, 14.692962646484375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000418.npy"}
{"epoch": 0.6138032305433186, "step": 419, "batch_size": 64, "mean": 22.709108352661133, "std": 24.08184242248535, "min": -29.35654067993164, "p10": -6.0526626586914025, "median": 19.188331604003906, "p90": 57.06195678710939, "max": 85.73056030273438, "pos_frac": 0.84375, "sample": [38.262657165527344, -11.3734130859375, 18.474960327148438, 2.646209716796875, 41.540706634521484, 19.901702880859375, -2.6702728271484375, 10.099555969238281, 11.56976318359375, 37.09368133544922, 85.73056030273438, 8.868240356445312, 23.639511108398438, 48.50685119628906, 9.849071502685547, 27.940284729003906, -29.35654067993164, 14.885419845581055, 45.98516082763672, 10.078609466552734, 43.46131134033203, 3.43328857421875, -28.20562744140625, 25.76690673828125, 12.900497436523438, -8.076995849609375, 15.844432830810547, 6.621673583984375, 32.98351287841797, -0.29135894775390625, -0.910980224609375, 4.259204864501953, 32.0474853515625, 24.18413543701172, 4.517608642578125, 21.43499755859375, 29.409584045410156, 61.116661071777344, 5.260463714599609, -7.50225830078125, 42.259490966796875, -8.886466979980469, 63.1864013671875, 46.06309509277344, 26.577604293823242, 9.334571838378906, 5.877246856689453, -8.937667846679688, 64.92379760742188, 64.0689926147461, 53.77155303955078, 8.556587219238281, 4.221626281738281, 58.899391174316406, 38.73882293701172, 53.36390686035156, 58.472129821777344, 53.296409606933594, 1.94683837890625, 40.99818420410156, 27.54107666015625, 11.045150756835938, 15.086593627929688, 33.05029296875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000419.npy"}
{"epoch": 0.6152716593245228, "step": 420, "batch_size": 64, "mean": 22.751480102539062, "std": 22.259098052978516, "min": -13.038673400878906, "p10": -3.615308380126952, "median": 21.55980396270752, "p90": 47.803321075439456, "max": 84.6187973022461, "pos_frac": 0.84375, "sample": [-2.1952972412109375, 28.666976928710938, 37.330543518066406, 6.248046875, 42.311553955078125, -13.038673400878906, 30.275863647460938, 9.100341796875, 84.6187973022461, 26.353797912597656, 3.8399658203125, -1.2375946044921875, 14.35727310180664, 15.880020141601562, 17.129669189453125, 23.498821258544922, 21.65757179260254, 21.96346664428711, 9.865388870239258, 6.258533477783203, -6.261423110961914, 20.099029541015625, 6.98211669921875, 12.547439575195312, 11.927452087402344, 52.47807312011719, 75.61929321289062, 23.47170639038086, 3.9156494140625, 45.725311279296875, 29.04468536376953, 24.619972229003906, 81.53791046142578, -12.190311431884766, 27.655845642089844, 41.03325653076172, 28.273574829101562, 3.3730926513671875, 19.3326416015625, 35.698036193847656, -4.223884582519531, -1.1240234375, 32.462196350097656, 2.526477813720703, 39.82328796386719, 29.214073181152344, 70.50518798828125, 11.03759765625, 9.937541961669922, 27.101593017578125, -6.677825927734375, -8.58349609375, 18.664505004882812, 25.42653465270996, -11.327804565429688, 45.064483642578125, 18.070098876953125, 22.73443603515625, 46.38536071777344, 38.81420135498047, 48.41101837158203, 64.783935546875, 7.8387298583984375, 21.4620361328125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000420.npy"}
{"epoch": 0.6167400881057269, "step": 421, "batch_size": 64, "mean": 21.629940032958984, "std": 20.726879119873047, "min": -16.664306640625, "p10": 0.5057994842529298, "median": 16.250581741333008, "p90": 47.1467945098877, "max": 83.6854248046875, "pos_frac": 0.921875, "sample": [12.77523422241211, 7.493862152099609, 39.49882888793945, 15.312713623046875, 43.72792053222656, 7.540618896484375, 60.632171630859375, 12.269279479980469, 9.117204666137695, 0.7224617004394531, 42.47709655761719, 7.461277008056641, 1.84771728515625, 47.082977294921875, 34.34798812866211, 18.406890869140625, 54.58219909667969, 1.3156509399414062, 21.622451782226562, 16.226585388183594, 83.6854248046875, 59.36296844482422, 41.13780212402344, 33.91330337524414, 22.45867156982422, 12.885086059570312, -2.1848068237304688, 16.274578094482422, 28.427215576171875, 47.17414474487305, 31.419113159179688, 33.29694747924805, 45.5458984375, 41.360740661621094, -8.803813934326172, 0.5808372497558594, 40.28327178955078, 48.49932861328125, 15.71185302734375, 31.412803649902344, 6.142890930175781, 0.13788604736328125, 43.147491455078125, 49.505523681640625, 29.242950439453125, 29.567413330078125, 37.34012222290039, 5.656242370605469, 6.64824104309082, 33.22151184082031, 8.588363647460938, 1.3453216552734375, 5.894969940185547, -15.165817260742188, 6.669223785400391, 3.3339309692382812, 39.71399688720703, 0.47364044189453125, -16.664306640625, 6.603752136230469, 1.791238784790039, -5.153594970703125, 27.198867797851562, 2.1738433837890625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000421.npy"}
{"epoch": 0.618208516886931, "step": 422, "batch_size": 64, "mean": 19.077220916748047, "std": 18.92664909362793, "min": -16.387619018554688, "p10": -6.185371398925781, "median": 17.418533325195312, "p90": 43.76482238769532, "max": 88.90568542480469, "pos_frac": 0.859375, "sample": [-10.810272216796875, 28.250282287597656, 59.75718688964844, 44.08778381347656, 34.05546569824219, -16.387619018554688, 1.6342926025390625, 13.187614440917969, 4.783882141113281, 88.90568542480469, 34.96311950683594, 44.65208435058594, 16.525802612304688, 16.671218872070312, 30.78748321533203, 12.586517333984375, 1.4729194641113281, 17.93596649169922, 18.486114501953125, 43.01124572753906, 23.47598648071289, 18.393035888671875, 14.811698913574219, 10.588043212890625, 20.059188842773438, 37.45628356933594, 16.44831085205078, 16.826507568359375, 14.495340347290039, 38.19642639160156, 59.33281707763672, -5.773345947265625, 16.78936767578125, 14.45511245727539, 0.8475761413574219, 18.567899703979492, 21.75147247314453, -6.92041015625, 30.750587463378906, -6.3619537353515625, 9.82009506225586, 17.3492431640625, 9.193496704101562, -7.2265777587890625, 12.477867126464844, 26.43393325805664, 28.025413513183594, 24.735702514648438, 12.863052368164062, 1.412435531616211, 17.487823486328125, 22.6995849609375, 51.7249755859375, -3.6942901611328125, 25.717071533203125, 45.340301513671875, 18.42706298828125, 17.603134155273438, 25.29883575439453, 12.974678039550781, -16.312728881835938, 12.56192398071289, 24.60248565673828, -7.320075988769531], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000422.npy"}
{"epoch": 0.6196769456681351, "step": 423, "batch_size": 64, "mean": 20.08889389038086, "std": 20.61139488220215, "min": -20.679931640625, "p10": -3.853458786010741, "median": 20.08120346069336, "p90": 41.524799346923835, "max": 82.70521545410156, "pos_frac": 0.828125, "sample": [20.514999389648438, -4.7380523681640625, 0.16323089599609375, -2.2208633422851562, 8.085445404052734, 5.8194732666015625, 33.810794830322266, 0.7636871337890625, 27.466176986694336, 21.707351684570312, 7.92657470703125, -4.185832977294922, 55.442138671875, 15.409271240234375, 12.959148406982422, 12.34067153930664, 33.868385314941406, 82.70521545410156, 42.05133056640625, 39.444122314453125, 23.87286376953125, -20.679931640625, 10.32513427734375, 17.309513092041016, 25.69561767578125, 14.669136047363281, 40.296226501464844, 18.84213638305664, -16.345312118530273, -16.294479370117188, 20.80801773071289, 23.745189666748047, 1.2596359252929688, 34.312896728515625, 57.591949462890625, 26.276153564453125, 31.698822021484375, 13.537437438964844, -0.23740768432617188, 7.3339385986328125, 25.05328369140625, 32.775882720947266, 5.977752685546875, -3.0779190063476562, 74.34971618652344, 19.162677764892578, 17.031044006347656, 0.1461963653564453, 38.68125915527344, 26.908119201660156, -5.263580322265625, -9.690605163574219, 56.441192626953125, 29.34002685546875, 29.862457275390625, 39.90290069580078, -0.4002246856689453, 23.02904510498047, 3.9461135864257812, 25.311737060546875, 26.64905548095703, 39.54273986816406, 19.64740753173828, 47.012203216552734], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000423.npy"}
{"epoch": 0.6211453744493393, "step": 424, "batch_size": 64, "mean": 21.220592498779297, "std": 22.75119972229004, "min": -25.157745361328125, "p10": 0.9468101501464845, "median": 16.89898681640625, "p90": 56.53835144042971, "max": 73.44293975830078, "pos_frac": 0.90625, "sample": [3.717254638671875, 61.396278381347656, 1.069061279296875, 4.936773300170898, 24.668052673339844, 38.32857131958008, 35.84973907470703, 58.739837646484375, 2.961864471435547, 4.4071044921875, 51.40155029296875, -4.280479431152344, 15.57244873046875, 1.469482421875, 13.402908325195312, 67.52115631103516, 18.22552490234375, 21.982872009277344, 15.259662628173828, 6.517940521240234, 22.132164001464844, -25.157745361328125, 41.71876525878906, 23.373138427734375, 26.25623321533203, 21.77361297607422, 7.3752899169921875, 33.61566925048828, 3.768413543701172, -22.777936935424805, 4.110116958618164, 34.1329345703125, 0.8944168090820312, 1.6578521728515625, 10.826828002929688, 10.5013427734375, 68.97648620605469, -20.48094940185547, 10.486587524414062, 20.821517944335938, 7.548397064208984, 3.5244674682617188, 10.746162414550781, 45.12660217285156, 47.74181365966797, 20.740081787109375, 13.761138916015625, -0.7530632019042969, -12.946243286132812, 73.44293975830078, 10.434364318847656, 5.388175964355469, 39.65721130371094, 28.815542221069336, 20.8948974609375, 4.990276336669922, 42.68219757080078, 19.515914916992188, 15.24831771850586, 60.306976318359375, 47.63039779663086, 69.94157409667969, 35.32902526855469, 31.198440551757812], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000424.npy"}
{"epoch": 0.6226138032305433, "step": 425, "batch_size": 64, "mean": 17.739974975585938, "std": 22.240772247314453, "min": -41.78854751586914, "p10": -5.1085249900817855, "median": 14.593576431274414, "p90": 47.8085620880127, "max": 72.90536499023438, "pos_frac": 0.75, "sample": [-3.913630485534668, -0.7952470779418945, 7.221366882324219, -7.325584411621094, -13.681182861328125, 36.17179870605469, 1.69500732421875, 36.750213623046875, 18.973617553710938, 41.16485595703125, 22.338287353515625, 9.848457336425781, -3.0351104736328125, -5.620622634887695, 11.57636833190918, -11.370323181152344, 2.0312347412109375, 15.717742919921875, 47.53904724121094, 13.26605224609375, 2.042236328125, 16.405467987060547, 11.832778930664062, -0.80523681640625, 12.179168701171875, 18.291858673095703, 20.646018981933594, -2.4025421142578125, 33.444740295410156, 56.65167236328125, -0.522796630859375, 15.011756896972656, 0.872406005859375, 1.2877845764160156, 46.20718002319336, 28.7017822265625, 60.82774353027344, -1.0542068481445312, -7.527435302734375, 52.60333251953125, -3.798006057739258, 34.59489440917969, 14.522430419921875, 9.642372131347656, 58.655006408691406, 47.924068450927734, 7.480754852294922, 19.84336280822754, 36.925132751464844, 44.575592041015625, 28.583274841308594, 64.07090759277344, 28.28857421875, -13.154640197753906, 17.235435485839844, 40.405731201171875, 31.822677612304688, 72.90536499023438, 11.236930847167969, 6.008110046386719, 14.664722442626953, -41.78854751586914, 22.09625244140625, -0.6281604766845703], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000425.npy"}
{"epoch": 0.6240822320117474, "step": 426, "batch_size": 64, "mean": 23.562868118286133, "std": 17.203031539916992, "min": -13.2120361328125, "p10": 3.7990623474121112, "median": 21.222795486450195, "p90": 48.374488830566406, "max": 59.40531921386719, "pos_frac": 0.921875, "sample": [14.24533462524414, 11.545677185058594, 8.569778442382812, 41.09803771972656, 0.6538734436035156, 21.715728759765625, 5.670928955078125, 52.696266174316406, 57.32923889160156, 18.36389923095703, 53.39411163330078, 27.867172241210938, 21.059402465820312, -0.44142913818359375, 29.176177978515625, 16.039737701416016, 20.89798355102539, 21.68011474609375, 31.147199630737305, 58.90435791015625, 16.994422912597656, 38.27130126953125, 8.343387603759766, 24.551910400390625, 11.120330810546875, 10.288509368896484, 24.44345474243164, -11.690811157226562, 30.324920654296875, 36.19569396972656, 31.058448791503906, 48.43951416015625, 19.937789916992188, 31.79168701171875, 41.32133483886719, 48.22276306152344, 17.014541625976562, 19.07866668701172, 10.546859741210938, 11.579544067382812, 7.6009674072265625, 32.560516357421875, 33.418304443359375, 18.12865447998047, 54.44989776611328, 9.682891845703125, 11.363101959228516, 42.883811950683594, -2.8263626098632812, 21.386188507080078, 38.798004150390625, 39.477088928222656, 6.125631332397461, 8.261161804199219, 32.96421813964844, 24.941059112548828, 12.66400146484375, 59.40531921386719, 20.958633422851562, -13.2120361328125, 34.488067626953125, 2.9968338012695312, -3.0015716552734375, 35.06126403808594], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000426.npy"}
{"epoch": 0.6255506607929515, "step": 427, "batch_size": 64, "mean": 19.664459228515625, "std": 16.366439819335938, "min": -16.025489807128906, "p10": 0.5581867218017578, "median": 18.423845291137695, "p90": 45.03484802246094, "max": 56.15251159667969, "pos_frac": 0.90625, "sample": [13.287826538085938, 11.63397216796875, -0.25189208984375, 15.359092712402344, 56.15251159667969, 50.28502655029297, 52.20478820800781, -16.025489807128906, 30.728500366210938, 17.664161682128906, 22.284652709960938, 19.183528900146484, -6.7572784423828125, 4.785511016845703, 55.650489807128906, 14.108970642089844, 32.37242889404297, 37.62135314941406, 7.320350646972656, 22.43157196044922, 15.284141540527344, 0.5557479858398438, 34.472564697265625, -1.930450439453125, 20.489418029785156, 12.78057861328125, 30.11609649658203, 28.040016174316406, 19.599136352539062, 13.012687683105469, 39.28802490234375, 45.626617431640625, 20.390602111816406, 10.42255973815918, 13.96017074584961, 43.654052734375, 19.24842071533203, 12.843055725097656, 26.640060424804688, 0.5638771057128906, 49.47198486328125, 7.519233703613281, 13.673622131347656, 19.9183349609375, 13.1368408203125, 17.127059936523438, 21.15874481201172, 24.64305877685547, 25.384315490722656, 1.2068157196044922, 16.05144500732422, 7.695335388183594, 36.743072509765625, 22.738174438476562, 21.830135345458984, 24.456008911132812, -4.2557525634765625, 4.477851867675781, -1.5368156433105469, 4.709157943725586, 55.26921081542969, 27.987464904785156, 1.47406005859375, 2.5485076904296875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000427.npy"}
{"epoch": 0.6270190895741556, "step": 428, "batch_size": 64, "mean": 17.093006134033203, "std": 21.59332847595215, "min": -51.278778076171875, "p10": -4.84343032836914, "median": 16.90288543701172, "p90": 43.67852401733399, "max": 72.8189697265625, "pos_frac": 0.78125, "sample": [12.237174987792969, -18.56409454345703, 19.06201934814453, -2.5358200073242188, 49.62353515625, 10.561483383178711, 35.20296096801758, 70.62710571289062, 33.52642059326172, 27.502914428710938, 49.37580108642578, -3.888092041015625, 27.999778747558594, -2.39105224609375, 44.07475280761719, 13.492416381835938, 4.7914276123046875, 34.46763610839844, 4.5448760986328125, 23.42791748046875, 33.428314208984375, 12.66705322265625, 18.93730926513672, 21.77346420288086, 31.981491088867188, 30.46057891845703, 6.4175567626953125, 4.865480422973633, 3.6741485595703125, 18.384811401367188, 21.467395782470703, -0.244781494140625, 27.034088134765625, 27.06880760192871, 42.753990173339844, -2.592496871948242, 7.9908599853515625, 28.78197479248047, 1.9499740600585938, -5.252861022949219, 23.13658905029297, 13.927108764648438, 15.589317321777344, -8.349166870117188, 8.559928894042969, 12.039480209350586, -5.2693634033203125, -13.903141021728516, 25.42132568359375, 72.8189697265625, 20.281646728515625, 18.216453552246094, -1.3564529418945312, -0.07684326171875, 26.72924041748047, 28.80487060546875, 62.293617248535156, 54.988006591796875, -51.278778076171875, 7.336063385009766, 24.357009887695312, 3.8825302124023438, 14.68115234375, -23.563522338867188], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000428.npy"}
{"epoch": 0.6284875183553598, "step": 429, "batch_size": 64, "mean": 21.598388671875, "std": 21.622217178344727, "min": -13.881832122802734, "p10": -4.019532012939453, "median": 21.910907745361328, "p90": 52.82714233398438, "max": 73.2080078125, "pos_frac": 0.78125, "sample": [21.656219482421875, -2.4897384643554688, 11.905155181884766, 58.50167465209961, 22.16559600830078, 39.266143798828125, -13.432342529296875, -10.275972366333008, 49.2469482421875, 9.8582763671875, 5.0494384765625, 16.131256103515625, 28.045249938964844, 8.692222595214844, -0.6921157836914062, 39.987213134765625, 8.20953369140625, -0.8062629699707031, 50.99427795410156, 33.944366455078125, 36.18669128417969, 27.06371307373047, 10.8358154296875, 24.012420654296875, -4.156913757324219, 7.562660217285156, 41.03471374511719, 7.185176849365234, 40.23029327392578, 53.991668701171875, -2.0113449096679688, 10.884208679199219, 34.96300506591797, 4.0616455078125, -13.597244262695312, -4.3796844482421875, -2.476665496826172, 32.94905090332031, 35.599365234375, 9.002296447753906, 13.00387191772461, 53.61265563964844, 13.416419982910156, 38.34817886352539, 66.29930114746094, 4.730531692504883, 68.46279907226562, 9.953262329101562, 12.505714416503906, -0.5441665649414062, 36.385009765625, 26.550750732421875, 38.35521697998047, 23.33099365234375, -13.881832122802734, -3.698974609375, -4.494050979614258, 22.306560516357422, 56.37692642211914, 27.245697021484375, 37.18153381347656, 73.2080078125, 25.41387939453125, 33.33058166503906], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000429.npy"}
{"epoch": 0.6299559471365639, "step": 430, "batch_size": 64, "mean": 20.758493423461914, "std": 19.995210647583008, "min": -24.782135009765625, "p10": -0.48446502685546866, "median": 18.528583526611328, "p90": 49.07106094360351, "max": 86.60601806640625, "pos_frac": 0.859375, "sample": [31.047714233398438, -0.5338897705078125, 49.049041748046875, -4.381431579589844, 37.34869384765625, 9.660530090332031, 52.342193603515625, 27.999576568603516, 16.226295471191406, 26.512493133544922, 10.726570129394531, 19.615097045898438, 6.850128173828125, -24.782135009765625, 31.445236206054688, 9.745738983154297, -4.154571533203125, 49.08049774169922, 28.412582397460938, 9.782856941223145, -15.28802490234375, 18.577560424804688, 12.637313842773438, 33.120880126953125, 15.450939178466797, 14.70306396484375, -0.7728080749511719, 57.70524597167969, 14.574172973632812, 18.47960662841797, 12.68742561340332, 4.352527618408203, 48.44793701171875, 27.723983764648438, 28.144866943359375, 26.31940460205078, -5.623260498046875, 13.562301635742188, 1.3183364868164062, 63.0750732421875, 15.255859375, 18.30188751220703, 19.218238830566406, 0.88525390625, 22.98839569091797, 55.84601593017578, 25.542221069335938, 26.6575927734375, 40.66343688964844, -0.03961181640625, 19.38995361328125, 27.30413055419922, 27.028427124023438, 20.391958236694336, 0.14510536193847656, -0.369140625, 7.459102630615234, 3.4541549682617188, 30.761436462402344, 59.8797607421875, 86.60601806640625, 11.854217529296875, 29.946243286132812, 8.183115005493164], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000430.npy"}
{"epoch": 0.631424375917768, "step": 431, "batch_size": 64, "mean": 20.57917022705078, "std": 23.168012619018555, "min": -42.40447998046875, "p10": -2.9333396911621086, "median": 19.0899600982666, "p90": 51.22292404174805, "max": 87.5651626586914, "pos_frac": 0.84375, "sample": [-3.2803726196289062, 32.868675231933594, 20.18054962158203, 54.50672912597656, 39.828163146972656, 28.608165740966797, 27.256744384765625, 2.2751312255859375, 8.936454772949219, 18.817989349365234, 33.22767639160156, 20.24170684814453, 1.6815071105957031, -7.602210998535156, 15.529731750488281, 18.495025634765625, 31.825393676757812, 21.684005737304688, 22.91796875, 27.300743103027344, 15.085418701171875, 20.84051513671875, 31.05474853515625, 87.5651626586914, -4.931303024291992, -34.13648986816406, 73.46839904785156, 30.86990737915039, 51.82769775390625, 39.79588317871094, 32.03974914550781, 22.095882415771484, 69.4740219116211, 7.005897521972656, 54.33488464355469, -1.3952255249023438, 48.54374694824219, -1.133209228515625, 19.36193084716797, 3.1744308471679688, 21.839187622070312, 14.277908325195312, 51.445884704589844, 0.4505615234375, -2.12359619140625, -10.830268859863281, 44.57640075683594, 12.51776123046875, 1.9425086975097656, 21.24236297607422, 0.1046295166015625, -3.7026519775390625, 14.523258209228516, 50.281646728515625, 7.027790069580078, 18.704818725585938, 11.06478500366211, 7.828289031982422, -42.40447998046875, 7.847968101501465, 50.70268249511719, 34.03578186035156, 12.353252410888672, 13.11865234375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000431.npy"}
{"epoch": 0.6328928046989721, "step": 432, "batch_size": 64, "mean": 25.08104705810547, "std": 22.348154067993164, "min": -8.268474578857422, "p10": -0.27407531738281243, "median": 20.891387939453125, "p90": 55.09023818969727, "max": 96.50065612792969, "pos_frac": 0.859375, "sample": [13.492515563964844, -0.21202850341796875, 33.17787170410156, 33.079795837402344, 20.272438049316406, 7.699863433837891, 20.75756072998047, 3.09063720703125, 57.238426208496094, 54.57476043701172, 1.0461807250976562, -6.4109954833984375, 57.714385986328125, 34.64128112792969, -0.30066680908203125, 73.9102783203125, 32.109519958496094, 55.3111572265625, 18.319580078125, -0.1811981201171875, 6.015220642089844, 64.87110137939453, 29.659927368164062, -5.955322265625, 1.5539779663085938, 66.20577239990234, 44.65586853027344, -8.268474578857422, 28.62518310546875, 20.36962127685547, 32.670806884765625, -1.0950546264648438, 7.309715270996094, 33.071285247802734, -5.888946533203125, 53.479248046875, 19.874404907226562, 13.685279846191406, 38.334991455078125, 22.471601486206055, 15.043266296386719, 13.747955322265625, 29.0772705078125, 47.659637451171875, 9.201263427734375, 22.811870574951172, 23.938552856445312, 13.848899841308594, 38.361358642578125, 3.5599021911621094, 36.003021240234375, 48.961002349853516, 29.063858032226562, 14.459663391113281, -7.4182586669921875, 10.612621307373047, 49.32282638549805, 13.792566299438477, 96.50065612792969, 7.5736083984375, 21.02521514892578, 10.015975952148438, 36.99385070800781, 50.052886962890625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000432.npy"}
{"epoch": 0.6343612334801763, "step": 433, "batch_size": 64, "mean": 20.366147994995117, "std": 22.961097717285156, "min": -42.377166748046875, "p10": -8.12639236450195, "median": 21.208724975585938, "p90": 53.185253906250004, "max": 73.42160034179688, "pos_frac": 0.8125, "sample": [6.2333221435546875, 45.27156448364258, 39.97813415527344, -13.240493774414062, 59.4202880859375, 17.108280181884766, -26.84248161315918, -10.054344177246094, 73.42160034179688, 60.992431640625, 31.951446533203125, -4.9754791259765625, 12.57275390625, 25.552947998046875, 30.383140563964844, 16.24437713623047, 27.154708862304688, 52.18898010253906, 48.25028991699219, -42.377166748046875, -17.05426025390625, 32.146629333496094, 29.199867248535156, 18.485671997070312, 15.56484603881836, 25.292236328125, 53.804473876953125, 50.46126937866211, 14.305366516113281, 38.4630126953125, 27.361709594726562, 25.433242797851562, 16.910476684570312, 26.659088134765625, 1.2440261840820312, 21.068405151367188, 26.260704040527344, -0.761749267578125, 21.349044799804688, 9.7255859375, 59.24626922607422, 33.35646057128906, 2.624469757080078, 6.0081939697265625, -20.8902587890625, 53.61222839355469, 29.883499145507812, 7.975852966308594, 11.852153778076172, 29.3267822265625, 17.39350128173828, 14.76568603515625, 38.07698059082031, -1.458144187927246, 13.983840942382812, -9.476783752441406, 21.584434509277344, 3.5738754272460938, -1.2949790954589844, 59.872642517089844, 26.82378387451172, 21.387924194335938, 1.3314495086669922, -1.280303955078125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000433.npy"}
{"epoch": 0.6358296622613803, "step": 434, "batch_size": 64, "mean": 19.341928482055664, "std": 21.825698852539062, "min": -16.617965698242188, "p10": -7.498940658569335, "median": 18.52715015411377, "p90": 41.97433242797852, "max": 118.779541015625, "pos_frac": 0.8125, "sample": [41.541954040527344, 46.5076904296875, 118.779541015625, 3.4193038940429688, -16.031539916992188, -11.597122192382812, 10.890655517578125, -1.23895263671875, -7.161712646484375, 3.3794784545898438, 23.39411163330078, 21.48321533203125, 8.754833221435547, 36.22496032714844, 35.6759033203125, -2.222900390625, 32.456260681152344, 16.857131958007812, 20.480422973632812, 22.04547119140625, 16.307735443115234, -7.849063873291016, 30.23076629638672, -6.8557281494140625, 42.721282958984375, 12.01199722290039, 18.9908504486084, 21.89991569519043, 14.889360427856445, 20.508499145507812, -16.617965698242188, 55.58984375, 32.406532287597656, 11.627204895019531, 24.489547729492188, 41.67018127441406, -1.8361625671386719, 2.713520050048828, 23.388656616210938, 29.997329711914062, 9.30267333984375, 12.389411926269531, 4.4591522216796875, -15.317581176757812, 45.672996520996094, 35.595741271972656, 15.512222290039062, 16.75359344482422, 16.403762817382812, 32.9625244140625, 12.209007263183594, -7.643466949462891, 39.98130798339844, 18.06344985961914, 0.6257095336914062, 21.350570678710938, 29.13530731201172, 41.98090362548828, 24.461807250976562, 52.04326629638672, 41.95899963378906, 13.12396240234375, 21.556713104248047, -14.621673583984375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000434.npy"}
{"epoch": 0.6372980910425844, "step": 435, "batch_size": 64, "mean": 21.372522354125977, "std": 20.06279754638672, "min": -20.41088104248047, "p10": 3.0472480773925787, "median": 17.417009353637695, "p90": 48.54635772705078, "max": 76.69743347167969, "pos_frac": 0.921875, "sample": [76.69743347167969, -6.8191986083984375, 12.628498077392578, 10.552818298339844, 50.13823699951172, 37.73884582519531, 14.143646240234375, 35.02937316894531, 38.754608154296875, 6.575286865234375, 65.18734741210938, 6.9825286865234375, 0.7871913909912109, 22.59345245361328, 41.46568298339844, 29.876754760742188, 19.030479431152344, 17.957290649414062, 19.841838836669922, 4.988197326660156, -10.646453857421875, 5.6392822265625, 15.694686889648438, 19.180152893066406, 4.977264404296875, 18.58843994140625, 21.6708984375, 25.867576599121094, -20.41088104248047, 6.053958892822266, 47.71661376953125, 4.640422821044922, 10.79107666015625, 12.682266235351562, 29.432186126708984, 7.684364318847656, 48.90196228027344, 6.166542053222656, 19.16033935546875, 4.760158538818359, 20.108150482177734, 15.131027221679688, 41.151611328125, 14.768070220947266, 36.82768630981445, 30.477787017822266, -10.7098388671875, 30.644622802734375, 16.876728057861328, 6.322174072265625, 69.43972778320312, 2.735767364501953, 15.277606964111328, -5.970235824584961, 46.249725341796875, 51.20619583129883, 37.24336242675781, 16.071685791015625, 8.010990142822266, 67.25660705566406, 3.774036407470703, 11.954292297363281, 34.777496337890625, 25.514907836914062], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000435.npy"}
{"epoch": 0.6387665198237885, "step": 436, "batch_size": 64, "mean": 25.003204345703125, "std": 20.277652740478516, "min": -14.791427612304688, "p10": 0.9550949096679711, "median": 24.158432960510254, "p90": 51.592414474487306, "max": 83.88827514648438, "pos_frac": 0.890625, "sample": [4.566267013549805, -9.587234497070312, 10.101276397705078, 51.71931457519531, 55.49297332763672, 29.888568878173828, 55.200721740722656, 14.550117492675781, 15.479446411132812, 9.343170166015625, -0.061367034912109375, 31.085975646972656, 36.83995819091797, -6.747241973876953, 67.62374877929688, 23.40544319152832, -14.791427612304688, 28.968231201171875, 14.946182250976562, 34.16387939453125, 27.068458557128906, 16.122268676757812, 14.149688720703125, 5.5391998291015625, 76.5712890625, 26.869400024414062, 26.280517578125, 42.42498779296875, 51.29631423950195, 24.911422729492188, 83.88827514648438, 52.17023468017578, 16.735698699951172, 32.75274658203125, 16.582054138183594, 9.299335479736328, 25.90997314453125, 15.508537292480469, 41.88585662841797, 18.886993408203125, 30.728347778320312, 10.167427062988281, 42.79747009277344, -0.9051284790039062, 25.17919158935547, 26.481094360351562, 8.619522094726562, -4.793792724609375, 49.15423583984375, -13.832061767578125, 49.967193603515625, 16.480112075805664, 22.079124450683594, 28.982498168945312, 43.19712829589844, 18.539642333984375, 35.63983917236328, 14.959510803222656, 13.223175048828125, 16.50237274169922, 3.3268394470214844, 28.242591857910156, 18.971406936645508, 39.45610809326172], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000436.npy"}
{"epoch": 0.6402349486049926, "step": 437, "batch_size": 64, "mean": 16.71832275390625, "std": 15.615690231323242, "min": -18.459060668945312, "p10": -2.2122844696044917, "median": 16.519588470458984, "p90": 38.04119491577149, "max": 56.01805877685547, "pos_frac": 0.875, "sample": [9.147209167480469, 9.460922241210938, 10.665695190429688, 24.73369789123535, 36.91227722167969, -12.622488021850586, 16.990074157714844, 10.461311340332031, 38.52501678466797, 12.550140380859375, 5.148349761962891, -6.349952697753906, -14.537269592285156, 26.48328399658203, 18.06031036376953, 45.636390686035156, 18.488693237304688, -10.754505157470703, 17.235595703125, 22.11646270751953, 4.226509094238281, 11.410011291503906, 2.551013946533203, 10.500747680664062, 12.06170654296875, 38.8284912109375, 20.40428924560547, -3.479095458984375, 35.384765625, -2.4590110778808594, 25.360183715820312, 24.250579833984375, 40.7408447265625, 3.4527053833007812, 22.68663787841797, -18.459060668945312, 20.98613739013672, 19.6251277923584, 35.00395965576172, 29.667198181152344, 32.85350799560547, 3.6845970153808594, 15.592185974121094, 27.66058921813965, 9.960075378417969, 3.268993377685547, 33.323326110839844, 14.964668273925781, 33.704254150390625, 12.487640380859375, 2.7668800354003906, 3.6474456787109375, 17.809280395507812, 16.049102783203125, 39.13373565673828, 18.834197998046875, 56.01805877685547, 27.369949340820312, -1.6365890502929688, 44.153228759765625, 1.4973907470703125, 3.994384765625, 26.363388061523438, 15.377357482910156], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000437.npy"}
{"epoch": 0.6417033773861968, "step": 438, "batch_size": 64, "mean": 21.238506317138672, "std": 18.284090042114258, "min": -16.882354736328125, "p10": -3.623821830749511, "median": 21.192045211791992, "p90": 46.74835968017578, "max": 58.10516357421875, "pos_frac": 0.875, "sample": [16.11591339111328, 24.208206176757812, 24.79897689819336, 11.418586730957031, 14.818845748901367, -8.226486206054688, 51.9971923828125, -2.8810291290283203, 24.364662170410156, 51.420562744140625, 21.908397674560547, 26.136016845703125, 8.119060516357422, 32.654624938964844, 52.647613525390625, 50.011505126953125, 46.76884460449219, 46.46360778808594, -8.272285461425781, 4.2843170166015625, -13.073150634765625, 38.95690155029297, 22.12071990966797, 46.7005615234375, 46.412109375, 13.016891479492188, 26.10748291015625, 18.327781677246094, 9.6109619140625, 6.699577331542969, 0.34935760498046875, 6.778484344482422, 53.80781936645508, 11.656394958496094, 42.692230224609375, 27.231796264648438, 12.272829055786133, 7.4207000732421875, 20.475692749023438, 23.152420043945312, 15.994873046875, 23.533172607421875, 11.062614440917969, 30.274459838867188, 0.8052825927734375, 18.872432708740234, 19.756927490234375, 43.28456115722656, -5.6470947265625, 35.734596252441406, 13.751976013183594, 26.47278594970703, 44.38422393798828, 23.874732971191406, 1.2677154541015625, 13.663177490234375, 14.658744812011719, -3.9421615600585938, 29.2752685546875, 58.10516357421875, 26.927230834960938, 29.94310760498047, -16.882354736328125, -5.3816680908203125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000438.npy"}
{"epoch": 0.6431718061674009, "step": 439, "batch_size": 64, "mean": 16.508562088012695, "std": 18.598798751831055, "min": -28.244056701660156, "p10": -4.81901340484619, "median": 16.133041381835938, "p90": 46.34485321044922, "max": 60.56536865234375, "pos_frac": 0.828125, "sample": [4.1196136474609375, 46.82463073730469, 4.855640411376953, 45.225372314453125, 8.372047424316406, -10.895954132080078, 16.378250122070312, 32.53639221191406, 28.10382080078125, -15.260108947753906, 16.323997497558594, 33.99689483642578, -1.17449951171875, 37.35234069824219, 2.548614501953125, 23.625919342041016, 14.947093963623047, 16.41130828857422, 17.1982421875, 23.907127380371094, 6.295114517211914, 27.410423278808594, 10.879417419433594, 8.612747192382812, 8.352935791015625, 48.567352294921875, 10.326663970947266, -12.219573974609375, -5.093996047973633, 5.485130310058594, 49.18315124511719, 2.2803726196289062, 16.069862365722656, -1.960205078125, 60.56536865234375, 16.852584838867188, 20.64337921142578, 26.17841339111328, 20.535625457763672, 0.540313720703125, -4.177387237548828, -12.771743774414062, 38.32279968261719, 6.644248962402344, 30.352447509765625, 9.695980072021484, 3.785480499267578, 24.13203239440918, 47.276458740234375, 16.19622039794922, 48.3857421875, 16.596874237060547, 8.395187377929688, 56.65319061279297, 5.2141876220703125, 10.817035675048828, 27.159744262695312, -3.3099136352539062, 20.82166290283203, 15.761489868164062, 32.65009307861328, 30.13519287109375, -28.244056701660156, -8.846782684326172], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000439.npy"}
{"epoch": 0.644640234948605, "step": 440, "batch_size": 64, "mean": 19.40981674194336, "std": 21.035799026489258, "min": -26.80597686767578, "p10": -4.901908874511718, "median": 17.728275299072266, "p90": 49.87466125488282, "max": 63.36944580078125, "pos_frac": 0.78125, "sample": [39.405494689941406, 3.810596466064453, 46.04832458496094, 14.64617919921875, 52.42134094238281, 5.378448486328125, -11.121543884277344, 19.514816284179688, 31.664981842041016, 39.361900329589844, -5.3824462890625, 18.566696166992188, 48.13134765625, 18.137489318847656, 63.36944580078125, 0.58856201171875, 17.56652069091797, 29.705856323242188, 5.1975555419921875, 17.890029907226562, 50.621795654296875, 35.158416748046875, 13.170076370239258, 26.964981079101562, 21.350265502929688, 22.71527862548828, -6.1383209228515625, 43.748687744140625, -6.4998626708984375, 1.5701713562011719, -0.502197265625, 55.346126556396484, 57.95263671875, 12.720352172851562, 16.09109878540039, -26.80597686767578, 52.65631103515625, -16.871898651123047, 31.676925659179688, -1.7350826263427734, 12.911453247070312, 41.643402099609375, 23.07860565185547, 41.238990783691406, 16.402183532714844, 14.930118560791016, -25.900131225585938, 5.754180908203125, -2.16009521484375, 36.27046203613281, 25.576210021972656, 54.070281982421875, 31.015090942382812, 11.032733917236328, 6.365843772888184, 26.711441040039062, -0.6469459533691406, -2.3809890747070312, 23.040084838867188, -3.7806549072265625, 12.548828125, 42.83470916748047, 14.814079284667969, -1.23309326171875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000440.npy"}
{"epoch": 0.6461086637298091, "step": 441, "batch_size": 64, "mean": 23.92841339111328, "std": 20.250173568725586, "min": -21.772125244140625, "p10": 3.9070301055908208, "median": 20.62213134765625, "p90": 52.13987312316895, "max": 86.95034790039062, "pos_frac": 0.953125, "sample": [30.629051208496094, 2.861586570739746, 44.91749572753906, 13.579780578613281, 37.89790344238281, 42.51312255859375, 21.800125122070312, 16.0665283203125, 18.638648986816406, 20.976703643798828, 2.6029624938964844, 23.762828826904297, 35.717628479003906, 29.968555450439453, 30.467052459716797, 23.392196655273438, 11.561790466308594, 25.719879150390625, 9.948013305664062, 24.141815185546875, -2.3170413970947266, 20.633010864257812, 78.882080078125, 65.57528686523438, 10.429222106933594, 17.247970581054688, 29.46685791015625, 21.609546661376953, 7.6050567626953125, 86.95034790039062, 7.505931854248047, -1.9549560546875, 10.512916564941406, 57.14556884765625, 32.19364929199219, 9.759334564208984, 20.611251831054688, 37.28428649902344, 52.30984878540039, 9.876182556152344, 32.7812385559082, 8.269668579101562, 2.9912567138671875, 11.272674560546875, 55.31846618652344, 15.680328369140625, 23.29775047302246, 44.18346405029297, 8.342742919921875, 17.663196563720703, -21.772125244140625, 3.704601287841797, 40.26856994628906, 14.803665161132812, 4.379364013671875, 66.052001953125, 6.540275573730469, 5.115875244140625, 11.352813720703125, 33.20454406738281, 35.953582763671875, 5.033226013183594, 16.748085021972656, 51.743263244628906], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000441.npy"}
{"epoch": 0.6475770925110133, "step": 442, "batch_size": 64, "mean": 23.821212768554688, "std": 18.96866226196289, "min": -5.450798034667969, "p10": 1.1142013549804715, "median": 22.261079788208008, "p90": 49.98020172119141, "max": 84.29891967773438, "pos_frac": 0.890625, "sample": [29.041015625, 11.652870178222656, 45.10713195800781, -0.003021240234375, 17.883480072021484, 48.58294677734375, 50.57902526855469, 23.18222427368164, 44.96482849121094, 11.553661346435547, 20.329864501953125, 52.455413818359375, 10.124343872070312, 26.27068328857422, 13.358264923095703, 30.811309814453125, 9.250564575195312, 16.380470275878906, -1.428131103515625, 9.241157531738281, 18.8656005859375, 33.3634033203125, 12.783302307128906, 57.9486083984375, 4.013153076171875, 29.5078125, 21.339935302734375, -1.7133941650390625, 28.520416259765625, 39.9857177734375, 17.5860595703125, 7.469259262084961, 15.401199340820312, -1.3074264526367188, 51.952796936035156, 26.654273986816406, 5.346563339233398, 16.303733825683594, 36.093505859375, 43.796913146972656, 6.370635986328125, 3.7674026489257812, 45.07714080810547, -4.858321189880371, 4.107391357421875, 41.19469451904297, 31.357528686523438, 6.59686279296875, 84.29891967773438, 27.86565399169922, 28.013402938842773, 25.82453155517578, -5.450798034667969, 3.7210540771484375, 13.991294860839844, -0.9872970581054688, 28.27313232421875, 8.466964721679688, 52.27220153808594, 39.90715789794922, 67.14396667480469, 24.053916931152344, 36.05005645751953, 24.25042724609375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000442.npy"}
{"epoch": 0.6490455212922174, "step": 443, "batch_size": 64, "mean": 20.12678337097168, "std": 17.81278419494629, "min": -31.04229736328125, "p10": -0.729280853271484, "median": 19.950197219848633, "p90": 41.32990951538086, "max": 63.15480041503906, "pos_frac": 0.859375, "sample": [44.49671173095703, 3.1790103912353516, 40.96009063720703, 29.082672119140625, 19.65448760986328, 11.972183227539062, 8.568531036376953, 7.3585052490234375, 23.92438507080078, 12.381454467773438, -5.022655487060547, -2.562744140625, 3.557229995727539, 1.3827743530273438, 35.84332275390625, -3.4392738342285156, 43.298614501953125, 19.013572692871094, 55.60297393798828, 11.659832000732422, 63.15480041503906, 36.95555114746094, 10.862106323242188, 5.790687561035156, 14.380836486816406, 31.72509765625, -31.04229736328125, 20.245906829833984, 24.285686492919922, 8.426956176757812, 24.62226104736328, 41.4884033203125, 7.486907958984375, 36.80968475341797, 31.569557189941406, 27.508712768554688, 32.42948913574219, 25.088111877441406, 14.24127197265625, -2.670745849609375, 8.370124816894531, -0.28238677978515625, 29.584228515625, 32.51789855957031, 1.3469467163085938, -0.920806884765625, 34.824275970458984, 25.839019775390625, 13.61123275756836, 36.11827087402344, 52.67536926269531, -9.342998504638672, 37.846649169921875, 17.215225219726562, 16.246383666992188, 36.57688903808594, 7.116764068603516, 21.930789947509766, 26.643905639648438, 29.212764739990234, 38.59748458862305, -0.12774658203125, 0.9234619140625, 47.31971740722656], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000443.npy"}
{"epoch": 0.6505139500734214, "step": 444, "batch_size": 64, "mean": 20.58400535583496, "std": 20.905948638916016, "min": -15.105300903320312, "p10": -4.267753982543945, "median": 16.002422332763672, "p90": 53.649630737304705, "max": 74.32467651367188, "pos_frac": 0.859375, "sample": [-5.796958923339844, 19.335617065429688, 8.710479736328125, 60.461639404296875, -12.750030517578125, 23.674636840820312, 13.354717254638672, 58.269683837890625, 28.9903564453125, -11.91876220703125, -0.8926010131835938, 42.968994140625, 6.7371368408203125, 26.908058166503906, 29.107669830322266, -4.0330810546875, 3.7616729736328125, 22.991409301757812, 46.615997314453125, 2.28839111328125, 14.75716781616211, -4.368328094482422, 29.52752685546875, 43.11548614501953, 3.4933815002441406, 12.349822998046875, 29.016109466552734, 49.21720886230469, 7.668853759765625, 40.1895751953125, 20.35735321044922, 67.28748321533203, 32.82048034667969, 4.181789398193359, 11.11578369140625, 57.7310905456543, 30.16124725341797, 12.564804077148438, 41.15571594238281, 24.978500366210938, 15.535446166992188, 35.274566650390625, 55.54924011230469, 13.505348205566406, 24.17428970336914, 29.566864013671875, 7.079048156738281, 16.469398498535156, -15.105300903320312, 20.395282745361328, 15.320022583007812, 3.2122840881347656, 9.483123779296875, 14.013038635253906, 19.942764282226562, -9.52785873413086, 1.6363563537597656, 29.706878662109375, 0.5268058776855469, 7.266571044921875, 59.788299560546875, -10.277694702148438, 13.410881042480469, 74.32467651367188], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000444.npy"}
{"epoch": 0.6519823788546255, "step": 445, "batch_size": 64, "mean": 20.141815185546875, "std": 17.381248474121094, "min": -15.79836654663086, "p10": 2.1715988159179687, "median": 16.022695541381836, "p90": 49.02049636840822, "max": 63.47651672363281, "pos_frac": 0.921875, "sample": [14.624778747558594, 2.9684791564941406, 34.1317138671875, 28.614898681640625, 4.914093017578125, 8.710166931152344, 14.843006134033203, 31.86554718017578, 17.807647705078125, 9.257938385009766, 16.107219696044922, 37.737335205078125, 13.087188720703125, 12.74478530883789, 4.608997344970703, -5.54241943359375, 54.74651336669922, 24.757179260253906, -11.027816772460938, 20.096298217773438, 26.70301055908203, 2.1234054565429688, 1.0307731628417969, 15.385078430175781, 8.83987045288086, 16.44403839111328, 63.47651672363281, 50.44989776611328, 8.317840576171875, 28.916236877441406, 21.186580657958984, -2.916656494140625, 15.654823303222656, 30.097129821777344, 32.616302490234375, 39.456451416015625, 34.78706359863281, 29.407691955566406, 2.2840499877929688, 52.169921875, 2.946929931640625, 29.482864379882812, 15.25335693359375, -15.79836654663086, 2.31427001953125, 52.55500030517578, 4.114322662353516, 14.870033264160156, 21.3232421875, 18.521663665771484, 9.237648010253906, 20.67169189453125, 4.787630081176758, 30.306716918945312, 15.93817138671875, 52.605445861816406, 15.773956298828125, 45.68522644042969, 52.30427551269531, -1.327056884765625, 12.607177734375, 17.609352111816406, 45.172393798828125, 10.636688232421875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000445.npy"}
{"epoch": 0.6534508076358296, "step": 446, "batch_size": 64, "mean": 20.84058952331543, "std": 19.645774841308594, "min": -17.466312408447266, "p10": -1.1554763793945313, "median": 17.269813537597656, "p90": 49.15673446655274, "max": 72.11991882324219, "pos_frac": 0.859375, "sample": [10.361038208007812, 20.328948974609375, 4.353546142578125, 18.8359375, 49.74700927734375, 25.70465850830078, 61.11374282836914, 36.14810562133789, 11.96856689453125, 37.05924987792969, 58.25349426269531, 3.313751220703125, 6.0021514892578125, 14.309921264648438, -1.4296760559082031, 2.6470947265625, 72.11991882324219, 39.823944091796875, 11.436880111694336, 35.69691467285156, 16.47039031982422, -8.400421142578125, 32.362998962402344, 13.499454498291016, -5.296516418457031, 9.034408569335938, 20.621517181396484, 37.726402282714844, 39.81915283203125, 29.047061920166016, 37.648109436035156, 10.0458984375, 47.77942657470703, 6.895336151123047, -1.0658073425292969, 30.368698120117188, -1.1429901123046875, 46.77638244628906, 38.48194885253906, 52.077301025390625, -9.195037841796875, -17.466312408447266, 32.095603942871094, 45.13282012939453, 4.219085693359375, 4.179359436035156, 33.80168151855469, 21.21436309814453, 12.60870361328125, 51.579795837402344, 6.004875183105469, 6.7371063232421875, -7.1937255859375, 15.955692291259766, 0.45965576171875, 18.069236755371094, -1.16082763671875, 19.298744201660156, 8.037261962890625, 9.287664413452148, 13.859939575195312, 51.16737365722656, 20.835975646972656, 23.724746704101562], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000446.npy"}
{"epoch": 0.6549192364170338, "step": 447, "batch_size": 64, "mean": 20.112377166748047, "std": 22.590373992919922, "min": -15.095657348632812, "p10": -3.648859214782714, "median": 17.403547286987305, "p90": 44.78117904663087, "max": 119.44436645507812, "pos_frac": 0.796875, "sample": [18.638410568237305, 42.14469909667969, 14.933563232421875, 22.049686431884766, -10.739212036132812, -15.095657348632812, 16.546066284179688, 22.109130859375, 13.80557632446289, 34.02647399902344, 3.88623046875, 30.2161865234375, 33.09159851074219, 23.720294952392578, -3.9964752197265625, 11.559730529785156, 58.17543029785156, -2.8377552032470703, 119.44436645507812, 56.85786437988281, 17.282238006591797, 47.331634521484375, 24.973663330078125, 29.84881591796875, 14.092185974121094, 0.05115509033203125, 13.424678802490234, -8.584678649902344, 54.97829818725586, -6.061195373535156, 7.3127899169921875, 22.74889373779297, -2.5929031372070312, -0.1080780029296875, 29.986221313476562, 3.9210376739501953, 22.59307861328125, 17.524856567382812, 15.0006103515625, -0.36151885986328125, 43.30119323730469, -5.772541046142578, 39.91725158691406, 7.209531784057617, 41.12214660644531, 30.438140869140625, 5.9088897705078125, -8.474117279052734, -0.4139251708984375, 14.873546600341797, 32.98451232910156, 6.4645538330078125, 72.175048828125, -1.1419639587402344, 1.0575942993164062, 28.302841186523438, 45.41545867919922, 4.927583694458008, 34.948455810546875, 20.191070556640625, 24.97442626953125, 25.566810607910156, 29.955039978027344, 1.362518310546875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000447.npy"}
{"epoch": 0.6563876651982379, "step": 448, "batch_size": 64, "mean": 21.071447372436523, "std": 21.067506790161133, "min": -20.407135009765625, "p10": -2.2316284179687496, "median": 19.59436798095703, "p90": 46.05420761108398, "max": 82.74796295166016, "pos_frac": 0.828125, "sample": [-2.52685546875, 3.1925125122070312, 3.8723678588867188, 4.5525665283203125, 68.92874908447266, 10.07452392578125, 9.939727783203125, 17.052391052246094, -10.468505859375, -8.521453857421875, 35.86528778076172, 10.109203338623047, -20.407135009765625, 5.858287811279297, -2.3503494262695312, 25.40411376953125, 19.24549102783203, 30.653610229492188, 33.31852722167969, 26.18389892578125, -17.698833465576172, 19.94324493408203, -6.2072296142578125, 24.172653198242188, 32.754024505615234, 70.8653564453125, -1.9546127319335938, 11.978378295898438, 36.83289337158203, 56.97422790527344, 32.04884719848633, 31.52685546875, -1.6186485290527344, 45.21985626220703, 45.032318115234375, 0.9017372131347656, 46.924468994140625, 15.128440856933594, 0.3700408935546875, 22.768329620361328, 36.374053955078125, 38.50494384765625, 24.470718383789062, -1.5785140991210938, 82.74796295166016, 5.305450439453125, 14.349441528320312, 55.27690124511719, 23.451217651367188, 24.536178588867188, 17.220977783203125, -0.3898773193359375, 30.113868713378906, 10.363418579101562, 45.96746063232422, 23.262451171875, 8.749702453613281, 15.125011444091797, 39.74974822998047, 21.68628692626953, 29.689464569091797, 14.428203582763672, 17.136817932128906, 46.09138488769531], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000448.npy"}
{"epoch": 0.657856093979442, "step": 449, "batch_size": 64, "mean": 22.15040397644043, "std": 20.87432861328125, "min": -28.87584686279297, "p10": -3.08810920715332, "median": 23.689204216003418, "p90": 48.88168563842775, "max": 74.30477905273438, "pos_frac": 0.875, "sample": [1.2043380737304688, 5.916130065917969, 25.00414276123047, 19.599411010742188, 49.88093566894531, 39.49726104736328, 24.126968383789062, 23.054210662841797, 35.87689208984375, 23.92429542541504, 61.58289337158203, -11.803104400634766, 17.8409423828125, 27.149124145507812, 6.796241760253906, 10.453475952148438, 36.878971099853516, 1.3891983032226562, 0.497833251953125, -11.49832534790039, 53.42610168457031, 13.985111236572266, 23.454113006591797, 27.55352783203125, 24.199172973632812, 30.76007080078125, 26.96893310546875, 5.566648483276367, 22.474685668945312, 27.674583435058594, 34.90776824951172, 8.947017669677734, 9.761650085449219, 16.873077392578125, 50.10386657714844, 41.23240661621094, -15.627265930175781, -11.558357238769531, -8.711257934570312, 51.347686767578125, 74.30477905273438, 31.924198150634766, -28.87584686279297, 36.47090148925781, 33.681732177734375, 11.211372375488281, 37.45819091796875, 9.603641510009766, -2.9946651458740234, 36.82066345214844, 27.956924438476562, 46.20756530761719, 21.568344116210938, -3.1281566619873047, 46.55010223388672, 1.4385986328125, 40.52076721191406, 66.54454803466797, 26.56920623779297, 45.51525115966797, 7.375789642333984, 2.8157196044921875, 16.37793731689453, 10.996917724609375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000449.npy"}
{"epoch": 0.6593245227606461, "step": 450, "batch_size": 64, "mean": 23.5332088470459, "std": 17.522886276245117, "min": -19.49017333984375, "p10": 3.5371725082397463, "median": 24.022632598876953, "p90": 44.04533767700195, "max": 61.00181579589844, "pos_frac": 0.9375, "sample": [26.75000762939453, 51.73058319091797, 44.08897399902344, 9.865089416503906, 43.40058898925781, 49.677513122558594, 38.872703552246094, 27.724258422851562, 6.8570098876953125, 3.408416748046875, 25.936264038085938, 42.30480194091797, 53.75812530517578, 22.138938903808594, 35.00935363769531, 61.00181579589844, 40.084815979003906, 21.894515991210938, 29.291030883789062, 31.250808715820312, -19.49017333984375, 6.449668884277344, 14.422317504882812, 29.8543701171875, 43.943519592285156, 22.84351348876953, 20.949264526367188, 40.441192626953125, 27.32653045654297, -16.171031951904297, 33.64906311035156, 52.384521484375, 39.813697814941406, 11.7483491897583, 25.201751708984375, 2.162525177001953, 11.644821166992188, 31.8853759765625, 28.249740600585938, 3.8376026153564453, 55.22467041015625, 4.6806793212890625, 40.222320556640625, 40.66307067871094, 16.32367706298828, 22.812759399414062, 0.6709213256835938, 6.8912200927734375, 12.39837646484375, 17.601573944091797, 33.57733917236328, 13.531097412109375, -1.4098358154296875, 17.071189880371094, 16.785179138183594, 6.120033264160156, 36.755882263183594, 12.75277328491211, 11.566932678222656, 27.059051513671875, 5.488380432128906, 31.470245361328125, -3.402435302734375, 5.077993392944336], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000450.npy"}
{"epoch": 0.6607929515418502, "step": 451, "batch_size": 64, "mean": 19.375080108642578, "std": 20.676454544067383, "min": -26.642654418945312, "p10": -7.049988555908202, "median": 19.996397018432617, "p90": 45.66733016967774, "max": 73.58131408691406, "pos_frac": 0.828125, "sample": [34.51472473144531, 27.105613708496094, 39.4251708984375, 12.487266540527344, 39.310821533203125, 28.294586181640625, -9.979972839355469, 46.25498962402344, 4.619083404541016, 19.771862030029297, 8.593017578125, 73.58131408691406, 68.67098999023438, 36.84980773925781, 23.270217895507812, 7.7639923095703125, 8.669061660766602, 7.7815399169921875, -7.512474060058594, -5.970855712890625, 21.945281982421875, 1.9424285888671875, 44.46031188964844, 40.14507293701172, 40.377933502197266, 12.95281982421875, 20.384536743164062, -9.0950927734375, 26.80333709716797, 9.037353515625, 7.895658493041992, 46.18462371826172, 21.159027099609375, 12.7894287109375, 20.855384826660156, 5.66851806640625, 27.493919372558594, -11.259586334228516, 24.79161834716797, -26.642654418945312, -4.866109848022461, 8.271751403808594, 35.61356735229492, -9.7835693359375, 3.290721893310547, 0.11487960815429688, 5.694400787353516, 23.074810028076172, 25.98102569580078, 12.658798217773438, 18.341716766357422, -12.8724365234375, 1.9074897766113281, 21.196876525878906, 38.68081283569336, 32.69921875, 56.13488006591797, 20.220932006835938, 19.19426727294922, -1.4993610382080078, 56.08903884887695, -2.6087512969970703, 52.30353546142578, 38.775970458984375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000451.npy"}
{"epoch": 0.6622613803230544, "step": 452, "batch_size": 64, "mean": 18.586013793945312, "std": 16.118940353393555, "min": -11.744117736816406, "p10": -3.1009094238281247, "median": 17.859365463256836, "p90": 40.5775016784668, "max": 65.2572250366211, "pos_frac": 0.84375, "sample": [-5.0922698974609375, 9.160911560058594, 35.49696350097656, 28.5899658203125, 25.77220916748047, 7.0425872802734375, 28.7452335357666, 41.64689636230469, -0.777740478515625, 18.035545349121094, 13.50567626953125, 31.121959686279297, 3.3313140869140625, 7.003948211669922, 35.9139404296875, -8.598915100097656, 17.683185577392578, 15.745368957519531, -5.533668518066406, -3.165475845336914, 39.99896240234375, 5.2539215087890625, 21.012969970703125, 38.90953063964844, -0.8286209106445312, 54.4984130859375, 13.06386947631836, -5.711000442504883, 41.05363464355469, -11.744117736816406, 3.5704345703125, 65.2572250366211, -2.950254440307617, 13.499336242675781, 30.119613647460938, 16.631744384765625, 15.680030822753906, 35.79607391357422, 17.353012084960938, 6.27496337890625, 19.54638671875, 4.770790100097656, 29.0523681640625, 20.331146240234375, 25.784317016601562, -4.319187164306641, 8.76708984375, 12.840347290039062, 20.182464599609375, 40.82544708251953, 20.986186981201172, 16.587112426757812, 29.341468811035156, 4.292924880981445, 9.287033081054688, 20.40367889404297, 21.706619262695312, 43.84767150878906, 24.265090942382812, 21.365493774414062, 41.05048370361328, 32.92497253417969, 7.785881042480469, 25.51169204711914], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000452.npy"}
{"epoch": 0.6637298091042585, "step": 453, "batch_size": 64, "mean": 24.068645477294922, "std": 19.59189224243164, "min": -4.063503265380859, "p10": 1.5473567962646486, "median": 20.678154945373535, "p90": 50.938864898681665, "max": 78.34083557128906, "pos_frac": 0.96875, "sample": [-3.97393798828125, 10.928565979003906, 2.2726593017578125, 3.3002777099609375, 33.55104446411133, 15.127815246582031, 25.658477783203125, 23.23221206665039, 13.65237045288086, 25.288894653320312, 16.980941772460938, 32.597747802734375, 45.30387878417969, 32.24595642089844, 16.020462036132812, 37.49998474121094, 28.870731353759766, 37.32294845581055, 20.458799362182617, 13.85791015625, 18.802200317382812, 0.394927978515625, 18.554412841796875, 13.624252319335938, 65.20205688476562, 11.553390502929688, 35.977142333984375, 43.44208526611328, 1.2194862365722656, 23.328933715820312, 2.4863319396972656, 26.52503204345703, 27.092605590820312, 35.421142578125, 60.14088439941406, 27.817218780517578, 20.897510528564453, 4.970796585083008, 75.8653564453125, 35.169219970703125, 9.630760192871094, 1.2377586364746094, 68.78913879394531, 21.667869567871094, 19.4998779296875, 31.357254028320312, 53.353858947753906, 12.579383850097656, 32.92622756958008, 78.34083557128906, 0.17144775390625, 16.463211059570312, -4.063503265380859, 43.68610382080078, 28.046951293945312, 4.137664794921875, 11.584861755371094, 3.9778175354003906, 62.372589111328125, 35.658653259277344, 1.4642829895019531, 19.02587890625, 1.7411956787109375, 8.060588836669922], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000453.npy"}
{"epoch": 0.6651982378854625, "step": 454, "batch_size": 64, "mean": 19.446483612060547, "std": 21.03237533569336, "min": -20.165557861328125, "p10": -5.489623260498046, "median": 17.77096939086914, "p90": 51.14480361938477, "max": 65.97513580322266, "pos_frac": 0.84375, "sample": [29.51980972290039, 14.386749267578125, 26.735443115234375, 9.41611099243164, 7.547431945800781, -5.053825378417969, 37.242462158203125, 0.3466033935546875, 14.410331726074219, 2.22406005859375, -17.48309326171875, 25.50696563720703, 39.427337646484375, 45.11920166015625, 4.029201507568359, 34.79169464111328, 2.7488059997558594, -9.613304138183594, -5.619300842285156, 21.529945373535156, 13.018133163452148, 65.97513580322266, 24.544052124023438, 19.3349609375, 0.5260772705078125, -8.526763916015625, 27.216548919677734, 18.130325317382812, 50.04501724243164, 26.703231811523438, 50.369293212890625, 34.863460540771484, 47.799644470214844, 23.03567886352539, 63.87570571899414, 21.811870574951172, 17.41161346435547, 15.055908203125, -12.799850463867188, 23.301620483398438, 54.24400329589844, 51.984230041503906, 13.623748779296875, 15.234153747558594, 30.1678466796875, 51.47716522216797, 58.52827453613281, 2.0204010009765625, 27.06344985961914, 17.10112762451172, 27.002220153808594, 60.69837951660156, -20.165557861328125, 18.681236267089844, 10.360374450683594, -2.7085113525390625, 1.882049560546875, -10.284538269042969, 27.925369262695312, 3.56829833984375, 6.574607849121094, 5.430202484130859, -5.187042236328125, 0.4491767883300781], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000454.npy"}
{"epoch": 0.6666666666666666, "step": 455, "batch_size": 64, "mean": 22.848453521728516, "std": 20.99519157409668, "min": -12.614547729492188, "p10": -2.212774658203125, "median": 18.301328659057617, "p90": 48.9809425354004, "max": 80.03741455078125, "pos_frac": 0.84375, "sample": [8.90707015991211, 50.15403747558594, 39.74787139892578, 31.42596435546875, 8.828754425048828, 37.227142333984375, 33.62879943847656, -2.044097900390625, 72.77267456054688, 5.786474227905273, 80.03741455078125, 35.485809326171875, -4.196929931640625, -1.9076309204101562, 17.637664794921875, -7.56810188293457, 46.24372100830078, 39.07609558105469, 18.693809509277344, -12.614547729492188, 17.591453552246094, 60.41065979003906, 43.45835876464844, 10.02499008178711, 22.875473022460938, 36.325714111328125, 52.089439392089844, 0.2798728942871094, -7.8083038330078125, 14.701553344726562, 42.863807678222656, 16.70171356201172, 37.117828369140625, 16.292587280273438, 17.90884780883789, 34.82914733886719, 10.678680419921875, 8.27081298828125, 9.831924438476562, -2.285064697265625, 11.394767761230469, 17.376800537109375, 31.275833129882812, 20.209999084472656, -10.049964904785156, 6.177038192749023, 6.3198089599609375, 33.787322998046875, 20.727745056152344, 5.628120422363281, 37.81660461425781, 25.008834838867188, 67.44125366210938, 12.419929504394531, 13.067291259765625, 2.4539413452148438, -3.4727783203125, 39.55583190917969, 26.52896499633789, 26.447620391845703, 38.98384094238281, 34.683868408203125, -1.8141365051269531, 60.850921630859375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000455.npy"}
{"epoch": 0.6681350954478708, "step": 456, "batch_size": 64, "mean": 21.325469970703125, "std": 18.57806968688965, "min": -15.91143798828125, "p10": 0.51748275756836, "median": 20.10101318359375, "p90": 44.15124511718751, "max": 75.74154663085938, "pos_frac": 0.90625, "sample": [8.736869812011719, 29.316619873046875, 45.09538269042969, 23.81918716430664, 2.7196502685546875, 34.44889450073242, -15.91143798828125, 8.847160339355469, -1.5244293212890625, 37.464683532714844, 28.948562622070312, 3.7556381225585938, 41.94825744628906, 21.01080322265625, 35.68946838378906, 24.51183319091797, 28.125640869140625, 27.43597412109375, -0.7058534622192383, 2.653045654296875, 19.19122314453125, -4.0141143798828125, 4.88043212890625, 15.8587646484375, 26.445449829101562, 55.51530456542969, 9.049728393554688, -4.396240234375, 35.11661148071289, 25.075851440429688, 38.93617248535156, 74.75541687011719, 10.411415100097656, 14.623641967773438, 1.193572998046875, -4.352380752563477, 24.882186889648438, 13.713581085205078, 64.60928344726562, 12.205001831054688, 17.068641662597656, 35.1300048828125, 24.22515869140625, 24.36412239074707, 6.053722381591797, 28.608139038085938, 0.22772979736328125, 17.044891357421875, 75.74154663085938, 1.3683357238769531, 51.56195068359375, 8.814506530761719, 45.746002197265625, 6.469337463378906, 8.929311752319336, 30.593780517578125, 24.698394775390625, 12.554244995117188, 18.564727783203125, 21.024063110351562, 35.73600769042969, 11.327117919921875, 13.402984619140625, 25.488510131835938], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000456.npy"}
{"epoch": 0.6696035242290749, "step": 457, "batch_size": 64, "mean": 22.390531539916992, "std": 16.50472640991211, "min": -10.171981811523438, "p10": 3.3440555572509796, "median": 21.528564453125, "p90": 42.80699768066407, "max": 75.64212036132812, "pos_frac": 0.921875, "sample": [47.66813659667969, 22.247024536132812, 28.012969970703125, 21.277599334716797, 28.704124450683594, 15.600440979003906, 75.64212036132812, 20.640056610107422, 24.971771240234375, 15.0009765625, 51.245452880859375, 33.754249572753906, 7.933494567871094, 2.1422348022460938, 21.702789306640625, 25.148855209350586, 44.78434753417969, 9.199322700500488, 43.12510681152344, 7.916007995605469, 31.353679656982422, 16.227935791015625, 29.32062530517578, 12.536865234375, 22.86152458190918, 23.263519287109375, 17.427017211914062, 9.383186340332031, 45.48700714111328, 8.699142456054688, 36.2679443359375, -10.171981811523438, 14.636798858642578, 21.354339599609375, 23.36928939819336, 9.932899475097656, 14.253639221191406, 37.46815490722656, 25.707611083984375, 34.03651428222656, 25.44245147705078, 18.78516387939453, 40.59004211425781, -1.4173660278320312, 6.148303985595703, 32.18034362792969, 36.86729431152344, 6.803195953369141, 13.332817077636719, 20.77672576904297, 9.96731185913086, 33.167442321777344, 28.805845260620117, 38.794189453125, -8.505599975585938, 64.16860961914062, 42.06474304199219, 15.878692626953125, 8.549617767333984, -3.358154296875, 0.0238037109375, 17.858924865722656, 23.201278686523438, -7.264472961425781], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000457.npy"}
{"epoch": 0.671071953010279, "step": 458, "batch_size": 64, "mean": 21.381813049316406, "std": 22.984508514404297, "min": -20.85675048828125, "p10": -4.954537200927734, "median": 19.321085929870605, "p90": 54.18813552856446, "max": 77.14642333984375, "pos_frac": 0.8125, "sample": [0.7419586181640625, -1.0492706298828125, 23.734214782714844, 29.226776123046875, 10.2237548828125, -4.860260009765625, 76.56761169433594, -7.8156280517578125, -20.85675048828125, 34.85621643066406, 11.889436721801758, 17.213340759277344, 42.64472579956055, -3.9496688842773438, 32.51806640625, 4.935508728027344, 27.50991439819336, -0.11763381958007812, 22.92690658569336, 5.1340789794921875, -19.22980499267578, 4.4725341796875, 22.503807067871094, 77.14642333984375, -7.79083251953125, 17.43012237548828, 12.209625244140625, 8.00311279296875, 44.58076477050781, 26.47679901123047, 58.405967712402344, 52.85011291503906, 32.10748291015625, 29.158615112304688, 4.377265930175781, -8.256523132324219, 75.91934204101562, 34.7796630859375, 7.842460632324219, 61.72064208984375, 22.878402709960938, 40.613311767578125, 9.597663879394531, 14.306705474853516, 15.650558471679688, -4.994941711425781, 54.761573791503906, 8.001129150390625, 7.727867126464844, 68.81368255615234, -0.5343017578125, 19.62537384033203, 0.7602615356445312, 19.84845542907715, 9.402477264404297, 45.55644226074219, 32.74784851074219, 32.496734619140625, 26.135482788085938, 35.50952911376953, 19.01679801940918, 38.04354476928711, -9.180007934570312, 25.470680236816406], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000458.npy"}
{"epoch": 0.6725403817914831, "step": 459, "batch_size": 64, "mean": 20.655725479125977, "std": 18.23946189880371, "min": -16.982952117919922, "p10": -0.7459819793701159, "median": 19.5192813873291, "p90": 46.544031143188484, "max": 62.10755920410156, "pos_frac": 0.890625, "sample": [43.53816223144531, 22.411190032958984, -16.982952117919922, 9.314056396484375, -5.430816650390625, 22.147258758544922, 19.503639221191406, 34.08644104003906, 36.170921325683594, -2.4028854370117188, 29.146591186523438, 4.303062438964844, 23.376876831054688, 35.06089782714844, 8.576179504394531, 28.62340545654297, 39.27796936035156, 6.133270263671875, -9.05126953125, 35.03401184082031, 9.310089111328125, 47.48744201660156, 54.684120178222656, 7.692968368530273, 0.7880096435546875, 14.784238815307617, 13.511035919189453, -3.9505157470703125, 23.409683227539062, 19.534923553466797, 15.182483673095703, 21.164146423339844, 42.229766845703125, 8.816375732421875, 42.53179931640625, 0.5687904357910156, 62.10755920410156, 22.57469940185547, -1.3094558715820312, 8.9150390625, 16.41314697265625, 6.213775634765625, 28.821609497070312, 17.863494873046875, 1.4630889892578125, 50.74261474609375, 11.584915161132812, 36.95008850097656, 5.0074462890625, 47.91583251953125, 47.83078384399414, 33.791534423828125, 7.979057312011719, 8.433395385742188, 34.35649108886719, 57.854896545410156, 5.590023040771484, -11.112056732177734, 24.731651306152344, 34.42875671386719, 44.34273910522461, 20.451385498046875, 3.670255661010742, 13.7723388671875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000459.npy"}
{"epoch": 0.6740088105726872, "step": 460, "batch_size": 64, "mean": 17.503557205200195, "std": 17.54983901977539, "min": -23.701705932617188, "p10": -3.128457641601562, "median": 16.68613052368164, "p90": 40.68261795043946, "max": 69.10696411132812, "pos_frac": 0.8125, "sample": [14.543769836425781, 17.000473022460938, 23.873489379882812, 19.779037475585938, -4.991119384765625, 29.12847900390625, 17.358871459960938, 46.02873992919922, 11.776763916015625, 4.051055908203125, 12.435432434082031, 18.74675750732422, -1.2013359069824219, 28.986190795898438, -8.911161422729492, 52.198089599609375, 31.355518341064453, 23.801742553710938, -23.701705932617188, 3.1689605712890625, 42.51640319824219, 3.1263885498046875, 52.65032958984375, 0.221588134765625, 22.689712524414062, 20.428192138671875, 69.10696411132812, 40.96160888671875, 31.203571319580078, 3.85198974609375, 11.580276489257812, -5.172271728515625, 16.594093322753906, 13.1605224609375, 28.632755279541016, 22.420814514160156, -2.4462032318115234, 12.4346923828125, 35.52672576904297, 50.76789855957031, 11.912933349609375, 35.664581298828125, 6.986936569213867, 30.869205474853516, 23.186126708984375, -2.4822769165039062, -2.8759765625, -4.840789794921875, 13.295377731323242, 10.807018280029297, 11.095653533935547, -3.236663818359375, 12.579116821289062, -9.587203979492188, 20.833892822265625, 29.876171112060547, 16.778167724609375, -1.2630538940429688, 1.3248062133789062, 23.754257202148438, 30.194747924804688, 12.874107360839844, 26.76480484008789, 40.031639099121094], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000460.npy"}
{"epoch": 0.6754772393538914, "step": 461, "batch_size": 64, "mean": 15.80099105834961, "std": 18.61309814453125, "min": -15.01898193359375, "p10": -4.0694431304931635, "median": 11.0106201171875, "p90": 44.47570686340334, "max": 79.85717010498047, "pos_frac": 0.796875, "sample": [5.777435302734375, 3.1408119201660156, -5.013370513916016, 40.061344146728516, 19.85204315185547, 22.315216064453125, 24.826629638671875, 27.567909240722656, -15.01898193359375, 24.31969451904297, 24.59128189086914, -6.015995025634766, 17.555461883544922, -1.8706588745117188, 49.291900634765625, 21.207427978515625, 49.86890411376953, 1.9157447814941406, 15.287734985351562, 21.202484130859375, 2.65765380859375, 27.65851593017578, 33.814857482910156, 31.5762939453125, 8.112327575683594, 3.3758325576782227, 46.367576599121094, 79.85717010498047, 4.8336944580078125, 20.089996337890625, 29.176223754882812, 7.7425994873046875, 24.528785705566406, -0.2690277099609375, 8.480216979980469, -5.015983581542969, 7.800315856933594, 7.355693817138672, -10.610427856445312, 10.658660888671875, 52.942718505859375, -3.414386749267578, 3.9299850463867188, 8.966419219970703, 0.3714141845703125, 11.362579345703125, -2.9315872192382812, 14.139869689941406, -8.844566345214844, 2.320892333984375, 46.57299041748047, -4.350181579589844, 16.37417984008789, 9.177886962890625, 2.9667892456054688, 26.1495361328125, 17.322463989257812, -2.487590789794922, 38.617469787597656, 24.998443603515625, 3.9425048828125, 54.85124206542969, -2.542694091796875, 21.773033142089844], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000461.npy"}
{"epoch": 0.6769456681350955, "step": 462, "batch_size": 64, "mean": 19.05565643310547, "std": 16.175155639648438, "min": -12.872062683105469, "p10": -0.2440538406372067, "median": 19.80124282836914, "p90": 35.90292053222656, "max": 62.014251708984375, "pos_frac": 0.890625, "sample": [21.341270446777344, 33.928199768066406, -12.872062683105469, 35.67796325683594, 15.503074645996094, 20.23886489868164, 6.291339874267578, -7.107776641845703, 28.627906799316406, 33.82331848144531, 27.556846618652344, 12.11181640625, 12.491897583007812, -6.737295150756836, 17.903850555419922, 53.459251403808594, 20.476051330566406, 5.8224334716796875, 8.226776123046875, 29.429996490478516, 18.2294921875, 40.46900939941406, 7.041473388671875, 27.730926513671875, 12.327911376953125, 62.014251708984375, 0.26184844970703125, 28.49188995361328, 0.3068370819091797, 12.047809600830078, 29.917816162109375, 19.283889770507812, 11.107429504394531, 31.521682739257812, 7.2051849365234375, 25.131675720214844, 4.2454376220703125, -3.1702308654785156, 30.683761596679688, 56.682525634765625, 0.07541275024414062, 34.96752166748047, 2.2281723022460938, 21.457672119140625, -2.1835784912109375, 20.18299102783203, 5.65277099609375, 25.810760498046875, 24.95147705078125, 31.194610595703125, -7.008079528808594, 21.6019287109375, 43.220458984375, 2.918548583984375, 19.41949462890625, 4.147010803222656, 36.506256103515625, 30.08263397216797, 17.337844848632812, 35.970367431640625, 35.74554443359375, 35.683189392089844, 2.2556076049804688, -0.3809680938720703], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000462.npy"}
{"epoch": 0.6784140969162996, "step": 463, "batch_size": 64, "mean": 25.350496292114258, "std": 22.86827278137207, "min": -19.25971221923828, "p10": -2.6270584106445294, "median": 23.200679779052734, "p90": 55.32487945556641, "max": 80.15312194824219, "pos_frac": 0.859375, "sample": [4.3190460205078125, 29.050537109375, -16.272520065307617, 8.231914520263672, 19.38758087158203, 36.177467346191406, 33.94245910644531, 40.338890075683594, 10.05252456665039, 14.492874145507812, 65.22776794433594, 26.63370704650879, -3.3902740478515625, 54.85919189453125, 19.472782135009766, 19.53448486328125, 15.470647811889648, 8.34615707397461, 8.734764099121094, 23.133888244628906, 48.6509895324707, 52.65899658203125, -19.25971221923828, -17.73346710205078, 20.134384155273438, 0.5362091064453125, 6.289569854736328, 44.672279357910156, 24.066024780273438, 38.64884948730469, 58.39680480957031, 23.267471313476562, 19.00025177001953, 25.727874755859375, 18.78484344482422, -8.999958038330078, 38.05426025390625, 13.391593933105469, 24.66883087158203, -8.267738342285156, 50.85725402832031, 33.087791442871094, 21.64698028564453, 11.601158142089844, 26.454742431640625, 56.70367431640625, -0.846221923828125, 25.259124755859375, 72.00149536132812, 39.70973205566406, -0.14800643920898438, 21.416152954101562, 52.41028594970703, 47.82048034667969, 30.028076171875, -14.686027526855469, 47.26491928100586, 4.12556266784668, 59.85768127441406, 13.535266876220703, 13.5003662109375, 54.721458435058594, 80.15312194824219, 55.52445983886719], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000463.npy"}
{"epoch": 0.6798825256975036, "step": 464, "batch_size": 64, "mean": 19.714889526367188, "std": 20.15580940246582, "min": -22.496036529541016, "p10": -1.9723609924316394, "median": 17.24871826171875, "p90": 41.18885574340822, "max": 99.23704528808594, "pos_frac": 0.875, "sample": [-2.5106048583984375, 16.907058715820312, 8.439193725585938, 19.569183349609375, 30.415771484375, 22.117218017578125, 6.374420166015625, 13.233718872070312, 30.93402099609375, 17.590377807617188, 0.02274322509765625, 28.757492065429688, -11.810592651367188, 19.084930419921875, 15.951202392578125, 36.5823974609375, 30.72522735595703, 2.2570114135742188, 5.759239196777344, 58.16734313964844, 10.1082763671875, -14.835494995117188, 14.519035339355469, 25.72064208984375, 37.97569274902344, 16.033035278320312, 14.986236572265625, 42.56592559814453, 7.352043151855469, -11.762275695800781, 33.617103576660156, 43.27922821044922, 11.104576110839844, 10.814517974853516, 31.553848266601562, 33.358497619628906, -8.640113830566406, 34.45991134643555, 3.5665435791015625, 32.460052490234375, 14.0855712890625, 6.866281509399414, 75.63052368164062, 10.263431549072266, -0.762725830078125, 32.2862548828125, -22.496036529541016, 9.719779968261719, 24.419662475585938, 43.14106750488281, 26.550148010253906, 23.408157348632812, 99.23704528808594, 22.958396911621094, -2.4907760620117188, 26.343475341796875, 21.15337371826172, 6.6108245849609375, 28.204185485839844, 49.530975341796875, 2.518310546875, 25.32091522216797, 12.044151306152344, 10.405433654785156], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000464.npy"}
{"epoch": 0.6813509544787077, "step": 465, "batch_size": 64, "mean": 24.127197265625, "std": 20.079092025756836, "min": -10.612907409667969, "p10": 0.12634506225585995, "median": 20.988581657409668, "p90": 52.7123348236084, "max": 72.61618041992188, "pos_frac": 0.890625, "sample": [26.448394775390625, 52.59468460083008, 17.833152770996094, -0.9473762512207031, 19.905546188354492, -9.445564270019531, 28.11731719970703, 36.63981246948242, 10.26131820678711, 34.25077819824219, 9.970901489257812, 62.5706787109375, 2.4377288818359375, -3.631622314453125, 16.414474487304688, 65.14509582519531, 45.100074768066406, 44.41792678833008, 25.274330139160156, 12.962791442871094, 15.83233642578125, 27.405372619628906, 52.76275634765625, 36.149383544921875, 63.061553955078125, 11.796005249023438, 38.44440460205078, 19.3912353515625, -0.11867523193359375, 70.29776000976562, 39.80948257446289, 30.974483489990234, 22.14828109741211, 14.857004165649414, 15.573556900024414, 34.724334716796875, 30.110595703125, 19.121246337890625, 4.352378845214844, 22.071617126464844, 14.435890197753906, 10.818744659423828, 12.039909362792969, -5.151813507080078, 53.30491638183594, 12.964958190917969, 25.207427978515625, 45.643890380859375, 13.808887481689453, 42.84133529663086, 2.3485183715820312, -8.80758285522461, 12.1802978515625, 5.836860656738281, 32.774391174316406, 72.61618041992188, -10.612907409667969, 36.01744079589844, 27.161788940429688, 1.1534576416015625, 30.387176513671875, 37.373291015625, 0.69805908203125, 14.013908386230469], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000465.npy"}
{"epoch": 0.6828193832599119, "step": 466, "batch_size": 64, "mean": 23.960386276245117, "std": 20.149324417114258, "min": -15.854171752929688, "p10": -1.5235307693481437, "median": 22.65292739868164, "p90": 52.29732437133789, "max": 71.48514556884766, "pos_frac": 0.875, "sample": [51.47085952758789, 37.988975524902344, 28.8936767578125, 2.0820236206054688, 20.92060089111328, -15.854171752929688, 33.30364227294922, -3.40960693359375, 31.457054138183594, 12.3065185546875, 38.28581237792969, 71.48514556884766, 27.255626678466797, 13.20025634765625, 12.157218933105469, 29.498611450195312, 18.49233627319336, 27.836597442626953, 30.503170013427734, 28.248920440673828, 5.915443420410156, 21.192611694335938, 19.663087844848633, 67.70361328125, 52.65152359008789, 19.415367126464844, 2.6483840942382812, 39.583953857421875, 10.548957824707031, 29.425704956054688, 42.463775634765625, 20.459321975708008, 30.060588836669922, 59.9173583984375, 5.7323455810546875, 30.43792724609375, 26.798908233642578, 50.50872039794922, 32.72069549560547, 19.343952178955078, -0.6463451385498047, -1.8994674682617188, -5.384090423583984, 70.30418395996094, 24.785720825195312, 56.509117126464844, 9.270467758178711, 23.36779022216797, 3.1022415161132812, 12.550079345703125, 32.177032470703125, 13.389144897460938, 41.756134033203125, 8.9552001953125, 36.83872985839844, 41.316585540771484, -10.757553100585938, -4.1231536865234375, 55.07499694824219, 12.649513244628906, 16.541458129882812, 4.855228424072266, -12.421844482421875, 21.938064575195312], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000466.npy"}
{"epoch": 0.684287812041116, "step": 467, "batch_size": 64, "mean": 25.117164611816406, "std": 22.424823760986328, "min": -15.862968444824219, "p10": -0.44644222259521454, "median": 24.704757690429688, "p90": 54.212838745117196, "max": 87.77397918701172, "pos_frac": 0.875, "sample": [30.85926055908203, 44.72572326660156, 85.342041015625, 24.481178283691406, 6.001880645751953, -8.937759399414062, 39.979217529296875, 45.545318603515625, 11.668939590454102, 19.975322723388672, 27.966552734375, 24.92833709716797, 44.020660400390625, 5.336505889892578, 21.310195922851562, 43.759300231933594, 28.499759674072266, 4.2039337158203125, 87.77397918701172, 14.9608154296875, 35.4385986328125, 29.766826629638672, 15.552871704101562, -7.044105529785156, 1.2813568115234375, 36.59173583984375, 6.514923095703125, 11.468070983886719, 57.13526916503906, 59.74217987060547, 21.288681030273438, -12.41543960571289, 48.292236328125, 19.560447692871094, 59.781158447265625, 27.828948974609375, 51.981353759765625, 41.40728759765625, -15.638626098632812, 13.246047973632812, 17.544841766357422, 9.5960693359375, 32.4272575378418, 16.933021545410156, 57.711830139160156, -0.5710391998291016, 36.354339599609375, -15.862968444824219, 2.139739990234375, 55.169189453125, 18.885757446289062, 26.958885192871094, 29.60589599609375, 20.903701782226562, 2.3907623291015625, 39.22296142578125, 26.23343276977539, 39.71211242675781, 43.18335723876953, 37.80506896972656, -13.47711181640625, -0.1557159423828125, 6.417076110839844, 14.189208984375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000467.npy"}
{"epoch": 0.6857562408223201, "step": 468, "batch_size": 64, "mean": 18.9315242767334, "std": 22.19029998779297, "min": -20.632064819335938, "p10": -3.003533935546873, "median": 15.831707000732422, "p90": 46.69086456298829, "max": 98.57333374023438, "pos_frac": 0.859375, "sample": [5.39215087890625, 4.745018005371094, 17.891952514648438, 23.257553100585938, 83.80099487304688, 27.967391967773438, 52.78138732910156, 15.364700317382812, 35.456138610839844, 17.101409912109375, 38.22428894042969, 0.3519287109375, 14.370071411132812, 0.3767890930175781, -13.829479217529297, 5.6929473876953125, 29.001266479492188, 4.708459854125977, 4.570335388183594, 10.427452087402344, -0.8682327270507812, 2.2874317169189453, 5.299072265625, 1.92718505859375, 4.039085388183594, 47.59730529785156, 27.495712280273438, 56.34489440917969, 26.776466369628906, -11.09814453125, -11.26727294921875, 2.3073806762695312, 24.078052520751953, -20.632064819335938, 54.799598693847656, 36.52508544921875, 27.724288940429688, 16.29871368408203, 14.10211181640625, 30.418243408203125, 22.104888916015625, 4.659877777099609, -0.01434326171875, 39.174278259277344, 4.002532958984375, 19.81896209716797, 25.067779541015625, 18.066116333007812, 16.37786865234375, 43.86627197265625, -3.9186630249023438, 31.767005920410156, 2.8331031799316406, 12.2991943359375, 7.012031555175781, 35.379737854003906, 98.57333374023438, 30.971099853515625, 50.672279357910156, 44.575836181640625, 3.6827239990234375, -9.293033599853516, -7.19268798828125, 9.323715209960938], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000468.npy"}
{"epoch": 0.6872246696035242, "step": 469, "batch_size": 64, "mean": 22.34588623046875, "std": 18.36735725402832, "min": -11.094863891601562, "p10": -2.6051998138427717, "median": 22.60645294189453, "p90": 45.449045562744146, "max": 64.2976303100586, "pos_frac": 0.859375, "sample": [32.24940490722656, 36.02788543701172, 9.66221809387207, -0.16898727416992188, 22.97283935546875, 4.426464080810547, 47.20397186279297, 27.178909301757812, 7.655010223388672, -3.461029052734375, 36.05802917480469, 47.815269470214844, 12.7847900390625, -10.438125610351562, 57.409210205078125, 16.649429321289062, 37.31211853027344, 59.451507568359375, 6.781219482421875, 28.346412658691406, 28.948341369628906, 9.3077392578125, 42.86547088623047, 17.517974853515625, 51.74755859375, 34.973114013671875, 25.89775848388672, 25.522903442382812, 33.39778137207031, 11.366416931152344, 15.280921936035156, 33.589874267578125, 30.11579132080078, 0.2505950927734375, 37.40150451660156, -9.938541412353516, 13.640613555908203, 15.360481262207031, 6.103668212890625, 40.80780029296875, 44.351722717285156, 16.214210510253906, 14.434852600097656, 11.843002319335938, 22.359695434570312, 0.5777511596679688, -6.074253082275391, 29.891815185546875, 33.824161529541016, -0.6082649230957031, -11.094863891601562, 22.83087158203125, -10.12896728515625, 45.91932678222656, 31.776458740234375, 64.2976303100586, -6.382377624511719, 34.813106536865234, 12.367820739746094, 37.26655197143555, 16.03936767578125, 41.485992431640625, 21.67467498779297, 22.382034301757812], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000469.npy"}
{"epoch": 0.6886930983847284, "step": 470, "batch_size": 64, "mean": 22.040130615234375, "std": 18.975242614746094, "min": -13.443313598632812, "p10": 1.1423961639404303, "median": 19.10087013244629, "p90": 50.44264450073243, "max": 73.66812896728516, "pos_frac": 0.90625, "sample": [12.045616149902344, 23.214385986328125, 37.287025451660156, 10.807975769042969, 55.759246826171875, 1.8422012329101562, 24.289281845092773, 12.323638916015625, 18.274456024169922, 22.390586853027344, 27.479415893554688, 15.232040405273438, 29.500083923339844, 3.4103050231933594, 28.992889404296875, -0.181396484375, 13.306333541870117, 24.0860595703125, 20.392913818359375, -1.9526138305664062, 11.941543579101562, -1.7684326171875, 32.272499084472656, -10.934661865234375, 48.928436279296875, 8.591514587402344, 9.778839111328125, 51.091590881347656, 26.678848266601562, 4.268772125244141, 5.796974182128906, 16.6833553314209, 63.379966735839844, 19.927284240722656, 0.8424797058105469, 37.142974853515625, 21.108867645263672, 25.559898376464844, 44.638954162597656, 8.893440246582031, 11.321578979492188, 9.1009521484375, 32.13109588623047, 51.63658142089844, 4.4793853759765625, 73.66812896728516, 17.43267059326172, 12.406723022460938, 7.8498992919921875, 61.29990005493164, 9.856491088867188, 7.608659744262695, 39.56611633300781, 32.159629821777344, 42.03495788574219, 31.872283935546875, 13.462810516357422, 26.831283569335938, 6.4644775390625, -1.02105712890625, 65.69921875, 41.47648620605469, 23.34979820251465, -13.443313598632812], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000470.npy"}
{"epoch": 0.6901615271659325, "step": 471, "batch_size": 64, "mean": 20.70227813720703, "std": 13.893501281738281, "min": -12.821126937866211, "p10": 3.8789474487304694, "median": 18.64150047302246, "p90": 41.23289222717286, "max": 49.32017517089844, "pos_frac": 0.96875, "sample": [12.094608306884766, 20.08995819091797, 28.09661102294922, 27.875518798828125, 25.637939453125, 38.480926513671875, 17.513198852539062, 49.32017517089844, 23.232666015625, 28.492660522460938, 2.8051605224609375, 15.046722412109375, 11.135162353515625, 3.7042236328125, 45.15286636352539, 7.936857223510742, 11.50390625, 4.887563705444336, 17.208236694335938, 32.133628845214844, 48.99041748046875, 18.8031005859375, 12.080764770507812, 21.149654388427734, 9.809318542480469, -12.821126937866211, 4.2866363525390625, 15.586112976074219, 43.534400939941406, 37.652198791503906, 13.511478424072266, 22.296669006347656, 42.013790130615234, 15.14358901977539, 6.883296966552734, 39.410797119140625, 0.738800048828125, 8.598358154296875, 21.561782836914062, 15.548641204833984, 36.787967681884766, 0.6789703369140625, 30.484909057617188, 8.501052856445312, 8.055313110351562, 30.565628051757812, 47.0206298828125, 17.059185028076172, 19.970916748046875, 27.540908813476562, 42.34037780761719, 18.13768768310547, 11.908927917480469, 13.793472290039062, 16.997329711914062, 22.909866333007812, -4.965364456176758, 2.25048828125, 32.84370422363281, 20.693511962890625, 35.181053161621094, 28.123741149902344, 18.479900360107422, 32.45832061767578], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000471.npy"}
{"epoch": 0.6916299559471366, "step": 472, "batch_size": 64, "mean": 25.412967681884766, "std": 21.36362648010254, "min": -10.442733764648438, "p10": 0.825839233398443, "median": 20.005672454833984, "p90": 54.922249603271496, "max": 83.59024047851562, "pos_frac": 0.890625, "sample": [42.40210723876953, -10.442733764648438, 7.161262512207031, 15.041748046875, 9.000579833984375, 18.465789794921875, 7.7453460693359375, -1.544891357421875, 35.403846740722656, 29.904071807861328, 16.660755157470703, 13.791595458984375, 48.39555358886719, 25.482999801635742, 20.463531494140625, 70.28080749511719, 24.049301147460938, 18.594161987304688, -4.517547607421875, -8.8377685546875, 83.59024047851562, 23.232749938964844, 32.8775634765625, 14.622283935546875, -8.514625549316406, 9.169540405273438, 39.97264862060547, 6.3575439453125, 14.231842041015625, 51.560821533203125, 7.18841552734375, 79.58235168457031, 20.960838317871094, 58.23823547363281, 68.35232543945312, 20.82056427001953, 8.39510726928711, 56.435302734375, 44.31752014160156, 42.55668640136719, 41.17720031738281, 39.814605712890625, 16.317161560058594, 14.265029907226562, 35.274192810058594, 48.72584533691406, 33.32633972167969, 56.36286163330078, 45.33860778808594, 18.334548950195312, 8.649383544921875, 9.999387741088867, -7.9366912841796875, 19.547813415527344, 17.152481079101562, 36.4276008605957, 36.654449462890625, 27.571739196777344, 8.436820983886719, 13.680267333984375, -4.636653900146484, 17.428970336914062, 12.824981689453125, 30.244476318359375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000472.npy"}
{"epoch": 0.6930983847283406, "step": 473, "batch_size": 64, "mean": 17.962514877319336, "std": 17.74390983581543, "min": -18.603240966796875, "p10": -1.64734878540039, "median": 14.447589874267578, "p90": 40.42151947021485, "max": 62.12059020996094, "pos_frac": 0.859375, "sample": [2.530242919921875, 12.336677551269531, 30.919178009033203, 7.844673156738281, 4.46502685546875, 62.12059020996094, 2.3651275634765625, 3.485767364501953, 2.3233642578125, 4.843334197998047, 54.02875518798828, 12.862167358398438, -1.8410491943359375, -1.0695648193359375, 14.841995239257812, 15.677080154418945, 6.874664306640625, -11.074748992919922, 34.93915557861328, 5.042976379394531, -1.1953811645507812, 28.31378936767578, 51.128318786621094, 10.915077209472656, 26.43236541748047, 13.840438842773438, 39.86405944824219, 6.222358703613281, 6.806285858154297, 21.81537628173828, 9.042930603027344, -7.9826507568359375, -18.603240966796875, 23.200180053710938, 13.44876480102539, 9.051712036132812, 47.51301574707031, 37.276451110839844, 20.69452667236328, 41.710723876953125, 9.118980407714844, 22.943939208984375, 14.053184509277344, 40.660430908203125, 23.72167205810547, 25.702302932739258, 6.399871826171875, 16.41590118408203, -7.713417053222656, 52.893829345703125, 30.055282592773438, 38.234344482421875, -10.838851928710938, -10.394519805908203, 30.427223205566406, 25.683212280273438, 31.355186462402344, 37.405601501464844, 37.10771179199219, 37.41216278076172, 16.610084533691406, 19.647003173828125, 9.374752044677734, 10.314521789550781], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000473.npy"}
{"epoch": 0.6945668135095447, "step": 474, "batch_size": 64, "mean": 16.981578826904297, "std": 21.021121978759766, "min": -37.29902648925781, "p10": -4.42747631072998, "median": 15.106315612792969, "p90": 43.0725456237793, "max": 61.26324462890625, "pos_frac": 0.8125, "sample": [17.24639892578125, 11.192977905273438, 13.530889511108398, 13.645317077636719, 31.673004150390625, 33.25709533691406, 46.11347961425781, 1.0076351165771484, 42.462074279785156, 43.18110656738281, 23.857833862304688, 61.26324462890625, 3.472074508666992, 18.55939483642578, -26.90636444091797, 9.28045654296875, -1.6096763610839844, 0.16830062866210938, -35.205810546875, 9.720178604125977, 32.369781494140625, 26.29962158203125, 5.598701477050781, 6.967296600341797, 1.8506050109863281, 23.78899383544922, 36.61589050292969, 21.374008178710938, 42.819236755371094, -19.376480102539062, 8.336532592773438, 6.952671051025391, 15.729690551757812, -5.2194061279296875, 0.6963844299316406, 7.4281463623046875, 8.686370849609375, 9.361770629882812, -0.7391242980957031, -1.2695579528808594, 53.530059814453125, 14.150955200195312, 25.840667724609375, 40.76005554199219, -4.3270721435546875, -37.29902648925781, 42.19997787475586, 50.997802734375, -8.433235168457031, -2.684661865234375, 10.424598693847656, 43.310569763183594, 40.18992614746094, 34.140846252441406, 19.574119567871094, 15.736522674560547, 26.454757690429688, 40.14836120605469, 58.09033966064453, -4.47050666809082, 22.124435424804688, 22.73963737487793, 14.482940673828125, 24.95819854736328], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000474.npy"}
{"epoch": 0.6960352422907489, "step": 475, "batch_size": 64, "mean": 20.654516220092773, "std": 21.644872665405273, "min": -13.093318939208984, "p10": -5.654642581939697, "median": 18.15454864501953, "p90": 45.63283767700196, "max": 85.61859130859375, "pos_frac": 0.828125, "sample": [5.1721954345703125, -13.093318939208984, 48.097740173339844, 4.375522613525391, -8.035018920898438, 16.60320281982422, 41.56096649169922, -8.693984985351562, 51.16008758544922, 39.80742645263672, 28.995849609375, 3.5172500610351562, 22.449554443359375, 12.783638000488281, 15.931320190429688, 21.11285400390625, 46.11488342285156, 37.55966567993164, 18.95172119140625, -3.992389678955078, 0.17910385131835938, 5.57421875, 2.7695770263671875, 26.55771255493164, 19.919597625732422, 36.273681640625, 25.571014404296875, -5.9601287841796875, 14.54498291015625, 7.299232482910156, 2.9202842712402344, -8.1798095703125, 14.2181396484375, 68.30451202392578, -0.78228759765625, 29.61156463623047, 37.630950927734375, 23.41211700439453, 44.50806427001953, 38.83172607421875, 14.901666641235352, 4.713459014892578, 56.784278869628906, -7.998146057128906, 14.152603149414062, 10.616325378417969, 6.365894317626953, 12.184574127197266, 33.6624755859375, 31.851829528808594, -2.1488037109375, 2.3436355590820312, 19.969100952148438, 17.357376098632812, 42.1864013671875, 22.89795684814453, 32.75652313232422, 85.61859130859375, 23.38669204711914, 85.60858154296875, -5.701141357421875, 23.69580078125, -5.546145439147949, 38.64616775512695], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000475.npy"}
{"epoch": 0.697503671071953, "step": 476, "batch_size": 64, "mean": 19.488996505737305, "std": 20.090534210205078, "min": -9.59771728515625, "p10": -3.209438514709473, "median": 16.86057949066162, "p90": 47.80866165161133, "max": 69.13094329833984, "pos_frac": 0.828125, "sample": [-4.6871337890625, -3.2817306518554688, 1.8756523132324219, -8.390043258666992, 46.56672668457031, 10.862281799316406, 58.09977722167969, 19.855472564697266, 18.502803802490234, -2.2785186767578125, 39.61943054199219, 32.824623107910156, 25.199256896972656, 69.13094329833984, 21.10144805908203, 39.40501403808594, 30.09008026123047, 2.529296875, 29.944313049316406, 2.4390182495117188, 59.897666931152344, 25.161758422851562, 1.5500640869140625, 38.13250732421875, 15.96595573425293, 37.69886016845703, 0.7417945861816406, -3.2258224487304688, 6.958763122558594, 6.989921569824219, 3.2525863647460938, -7.038211822509766, 0.8261299133300781, 26.100910186767578, 15.176349639892578, 40.36553192138672, -0.6094818115234375, -3.1712093353271484, 9.893165588378906, 37.93810272216797, 5.997768402099609, 2.7869415283203125, 12.5257568359375, -9.59771728515625, -2.239011764526367, 16.758108139038086, 58.635459899902344, 48.340919494628906, 48.36914825439453, 68.98844909667969, 38.90177917480469, 19.519332885742188, 11.906656265258789, 22.72030258178711, 16.963050842285156, 9.709724426269531, 20.93914794921875, 33.92224884033203, 18.53298568725586, -8.176750183105469, 18.89281463623047, 34.82606506347656, 13.529342651367188, 2.5291595458984375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000476.npy"}
{"epoch": 0.6989720998531571, "step": 477, "batch_size": 64, "mean": 22.225059509277344, "std": 18.093852996826172, "min": -14.973796844482422, "p10": -0.4580577850341776, "median": 26.036190032958984, "p90": 44.9030403137207, "max": 63.48020935058594, "pos_frac": 0.890625, "sample": [7.131587982177734, 31.97240447998047, 29.032485961914062, 6.9101104736328125, 46.02110290527344, 39.05325698852539, 26.84130859375, 34.32936477661133, 6.996280670166016, 27.351211547851562, 3.8642444610595703, 21.912662506103516, 59.0072021484375, 44.578285217285156, -6.1414031982421875, 2.770763397216797, 26.94646453857422, 23.06503677368164, 41.61958312988281, 29.720478057861328, 38.499755859375, 45.04222106933594, 31.107559204101562, 4.045515060424805, 15.48114013671875, 31.558250427246094, -11.04580307006836, 30.21405029296875, 8.81905746459961, 11.879886627197266, -14.973796844482422, 27.930679321289062, 2.3682098388671875, -10.500961303710938, 63.48020935058594, 5.9554290771484375, 27.526748657226562, 17.402254104614258, 25.23107147216797, 17.75519561767578, 18.261489868164062, -1.3224449157714844, 12.669464111328125, 31.48261260986328, 3.077056884765625, 31.252662658691406, 28.233251571655273, 58.79252624511719, 1.5588455200195312, 14.775676727294922, 36.086944580078125, 19.784839630126953, -13.044609069824219, 37.670936584472656, 14.768402099609375, 23.528945922851562, 32.53715515136719, 31.452621459960938, 32.25941467285156, 47.523582458496094, -9.099777221679688, 17.713485717773438, 33.235504150390625, 48.44618225097656], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000477.npy"}
{"epoch": 0.7004405286343612, "step": 478, "batch_size": 64, "mean": 20.522260665893555, "std": 18.867589950561523, "min": -30.822097778320312, "p10": -2.0933559417724608, "median": 17.901111602783203, "p90": 39.798011016845706, "max": 81.96746826171875, "pos_frac": 0.875, "sample": [14.324951171875, 29.487022399902344, 6.763763427734375, 9.125652313232422, 28.488250732421875, 23.195701599121094, 4.7251739501953125, 13.814666748046875, -1.736175537109375, -2.2464332580566406, -30.822097778320312, 12.335311889648438, 9.461177825927734, 26.589431762695312, 12.427692413330078, 40.12310028076172, 17.993141174316406, 81.96746826171875, -8.68841552734375, 11.292137145996094, 10.737655639648438, 20.0550537109375, 11.723724365234375, 18.65460968017578, 25.943092346191406, 39.82240295410156, 11.593952178955078, 17.568862915039062, 5.8632354736328125, -3.825763702392578, 31.478561401367188, 23.931926727294922, 16.270248413085938, 34.77703857421875, 30.640304565429688, 31.99095916748047, 72.03202819824219, 16.244140625, 15.61151123046875, 4.114582061767578, 32.42311096191406, 38.74315643310547, 7.810859680175781, -3.1264305114746094, 18.018722534179688, 10.049118041992188, 30.772796630859375, 52.86387634277344, 58.97357177734375, 17.80908203125, 26.01403045654297, 11.18023681640625, 32.08840560913086, 29.94801902770996, -8.487777709960938, 35.40570831298828, 45.98358154296875, 7.48358154296875, 37.415504455566406, 31.10877227783203, 22.138442993164062, 15.395784378051758, 39.74109649658203, -10.178253173828125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000478.npy"}
{"epoch": 0.7019089574155654, "step": 479, "batch_size": 64, "mean": 18.0128116607666, "std": 17.62907600402832, "min": -20.81097412109375, "p10": -2.694417572021483, "median": 15.087600708007812, "p90": 46.01304664611816, "max": 61.43659973144531, "pos_frac": 0.84375, "sample": [4.498588562011719, 4.850776672363281, 19.270492553710938, 14.791702270507812, 18.34912109375, 61.43659973144531, 1.0466575622558594, 34.825531005859375, 34.933258056640625, 6.217658042907715, 22.37744903564453, 13.016300201416016, 6.561302185058594, 47.081764221191406, 27.11078643798828, 10.388015747070312, 26.49444580078125, -8.905540466308594, 21.840927124023438, -0.418304443359375, 8.597480773925781, 46.88310241699219, 6.238655090332031, 3.412181854248047, 30.096725463867188, 46.39263916015625, 25.90289306640625, -1.2313079833984375, -5.5179595947265625, 6.989467620849609, 19.62976837158203, 14.850677490234375, 7.184303283691406, 22.649612426757812, 8.293212890625, 53.442779541015625, 37.64398193359375, 8.370658874511719, 7.443267822265625, 18.43764877319336, 46.02280044555664, 24.08685302734375, 53.63787078857422, 13.383377075195312, -7.7330780029296875, -4.05072021484375, 9.589202880859375, 8.310333251953125, 21.57750701904297, 23.120277404785156, 27.174041748046875, 23.346641540527344, -12.179698944091797, -20.81097412109375, -3.3214645385742188, 34.49565124511719, 15.32452392578125, 45.99028778076172, 24.92545509338379, 43.93048095703125, -0.7993621826171875, 32.617767333984375, 8.604743957519531, 14.100074768066406], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000479.npy"}
{"epoch": 0.7033773861967695, "step": 480, "batch_size": 64, "mean": 22.651287078857422, "std": 20.5920352935791, "min": -19.287887573242188, "p10": -0.9793449401855459, "median": 20.770484924316406, "p90": 49.57975540161134, "max": 75.72030639648438, "pos_frac": 0.890625, "sample": [18.50226593017578, 44.55523681640625, 31.770431518554688, 8.752967834472656, 27.67486572265625, 43.263431549072266, 57.865081787109375, 55.12208557128906, 8.005744934082031, 11.766580581665039, 0.284210205078125, 39.74806213378906, -1.4437026977539062, 0.104156494140625, 2.6428909301757812, 27.369476318359375, 17.538593292236328, 7.155452728271484, 28.98584747314453, 10.473403930664062, 67.78889465332031, 19.99140167236328, -11.802825927734375, 32.35050964355469, 7.17803955078125, 18.77391815185547, 26.64667510986328, 33.44242858886719, 29.456130981445312, -19.287887573242188, 31.35533905029297, 50.54957580566406, 47.31684112548828, 9.4365234375, 13.461990356445312, 24.410099029541016, 24.689571380615234, -13.652229309082031, 41.47808837890625, 16.01220703125, 9.342926025390625, -2.6586151123046875, 20.710243225097656, 20.25037384033203, 34.21307373046875, 4.899238586425781, 19.89801788330078, 37.283721923828125, 3.6847610473632812, 20.830726623535156, 3.4436798095703125, 3.1673593521118164, 32.722084045410156, 52.012535095214844, 38.07627868652344, -4.875238418579102, -14.206527709960938, 23.432456970214844, 31.922561645507812, 43.938499450683594, 68.78334045410156, 10.843788146972656, 26.51434326171875, 75.72030639648438], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000480.npy"}
{"epoch": 0.7048458149779736, "step": 481, "batch_size": 64, "mean": 22.764768600463867, "std": 21.032968521118164, "min": -24.38233184814453, "p10": -1.3790872573852528, "median": 19.52721405029297, "p90": 49.7394157409668, "max": 73.86582946777344, "pos_frac": 0.875, "sample": [31.859329223632812, -15.754730224609375, 11.992855072021484, 13.035324096679688, 6.300323486328125, 9.477012634277344, 7.0384063720703125, 27.229045867919922, 12.710531234741211, 47.716819763183594, 16.468666076660156, 10.858383178710938, -7.5285491943359375, 3.4231300354003906, 32.68536376953125, 13.345706939697266, 22.965126037597656, 30.89813232421875, 47.958831787109375, 10.542900085449219, 63.17718505859375, 44.69910430908203, 28.907936096191406, 34.16533660888672, 33.90623092651367, 19.46569061279297, 28.140625, 30.60717010498047, 17.293731689453125, 15.903465270996094, -8.808242797851562, 34.873008728027344, 32.922569274902344, 29.83434295654297, 16.57025146484375, 48.921607971191406, 15.927408218383789, 6.5435638427734375, 15.791252136230469, 52.91462707519531, 33.475975036621094, 19.58873748779297, 17.865585327148438, 63.085906982421875, -12.732833862304688, -24.38233184814453, 29.46221923828125, 26.938735961914062, 73.86582946777344, 10.1357421875, 12.21914291381836, 23.688400268554688, 15.710639953613281, 6.500415802001953, -0.4011383056640625, 2.1009979248046875, -9.930150985717773, 50.08990478515625, 22.076587677001953, 71.86763000488281, 69.92962646484375, 35.407508850097656, -1.798208236694336, 27.20067596435547], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000481.npy"}
{"epoch": 0.7063142437591777, "step": 482, "batch_size": 64, "mean": 18.955894470214844, "std": 18.159269332885742, "min": -23.495361328125, "p10": -3.19317512512207, "median": 18.160160064697266, "p90": 39.68782768249512, "max": 54.58087158203125, "pos_frac": 0.84375, "sample": [26.202316284179688, 11.860069274902344, 25.008926391601562, 25.088226318359375, 49.30134582519531, 30.413406372070312, 16.763916015625, -0.8617782592773438, 15.045166015625, 17.14765167236328, 3.9553184509277344, 7.062351226806641, -2.845928192138672, -18.71417236328125, 27.713760375976562, 37.74689483642578, 39.65907669067383, 11.473064422607422, 28.36723518371582, 43.244911193847656, 10.142227172851562, 31.210174560546875, -23.495361328125, 15.907180786132812, 6.075674057006836, 30.814098358154297, 5.368839263916016, 28.201202392578125, 43.04931640625, -6.661334991455078, 10.028947830200195, 19.08905029296875, 34.61354064941406, 29.213481903076172, 21.579544067382812, 12.550334930419922, 10.62335205078125, 24.158096313476562, 38.12135314941406, 29.374801635742188, 32.24608612060547, 2.614116668701172, -4.166391372680664, -0.72515869140625, 23.84134292602539, -19.94464111328125, -3.3419952392578125, 39.70014953613281, 54.58087158203125, 17.23126983642578, -20.6378173828125, 30.575889587402344, 6.22344970703125, 16.324722290039062, 2.8553466796875, 16.15044403076172, 33.60761642456055, 30.891647338867188, 34.83399963378906, 2.0187225341796875, 13.02593994140625, 52.808258056640625, 50.983123779296875, 37.883846282958984], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000482.npy"}
{"epoch": 0.7077826725403817, "step": 483, "batch_size": 64, "mean": 23.0577449798584, "std": 20.689611434936523, "min": -11.017753601074219, "p10": 0.5751407623291017, "median": 20.345508575439453, "p90": 54.10981903076174, "max": 83.12954711914062, "pos_frac": 0.90625, "sample": [4.054298400878906, 65.70722961425781, 34.31861877441406, 37.49407958984375, -3.6568450927734375, 28.58600616455078, 29.33349609375, 31.12909698486328, 16.421241760253906, 5.747314453125, 12.77783203125, 83.12954711914062, 17.416397094726562, 20.780906677246094, 12.533557891845703, 44.38359832763672, 69.9105224609375, 23.023849487304688, 26.615379333496094, 20.167770385742188, -4.488288879394531, 0.6896266937255859, 42.166534423828125, 32.28437805175781, 0.5260753631591797, 30.904373168945312, 36.84089660644531, 25.796173095703125, 22.728023529052734, 56.7003173828125, 1.4903030395507812, 40.87757110595703, 10.263420104980469, 60.21205139160156, 20.027023315429688, 20.52324676513672, 28.952312469482422, 4.889976501464844, 68.535400390625, 27.696613311767578, 7.818328857421875, 7.6153106689453125, 41.861289978027344, 9.225204467773438, 57.016510009765625, 48.06532287597656, 1.1530532836914062, -11.017753601074219, 21.826568603515625, 1.1530532836914062, 3.01666259765625, -1.8968048095703125, -6.264457702636719, 19.97106170654297, -2.1100921630859375, 29.26385498046875, 18.572402954101562, 27.765182495117188, 41.985595703125, 4.934055328369141, 14.049057006835938, 6.672794342041016, 16.637237548828125, 10.89227294921875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000483.npy"}
{"epoch": 0.7092511013215859, "step": 484, "batch_size": 64, "mean": 24.421932220458984, "std": 22.687559127807617, "min": -12.135238647460938, "p10": -0.7953506469726557, "median": 22.356189727783203, "p90": 49.717872619628906, "max": 87.00680541992188, "pos_frac": 0.875, "sample": [-12.135238647460938, 36.36888885498047, 10.044326782226562, 27.1702880859375, 16.2288818359375, 10.31109619140625, 0.8935813903808594, 30.162887573242188, 32.986602783203125, 31.024208068847656, -4.216663360595703, 4.385028839111328, 47.42108917236328, 49.011260986328125, 39.8853759765625, 3.271373748779297, 27.574073791503906, 0.31212615966796875, 12.796173095703125, 26.157928466796875, 22.15972900390625, 13.226909637451172, 39.08562469482422, 44.43476104736328, 79.60693359375, -2.8798751831054688, 46.51348114013672, 81.47743225097656, -1.011932373046875, 42.34483337402344, 11.857955932617188, 1.3298301696777344, 32.067893981933594, 26.706825256347656, 14.963363647460938, 28.009323120117188, 49.778411865234375, 6.369421005249023, -4.7885894775390625, 42.336204528808594, 16.106891632080078, 11.314067840576172, 7.4758453369140625, 1.2652053833007812, 58.43891906738281, 23.887725830078125, 10.948722839355469, 29.471363067626953, 4.916603088378906, -3.779937744140625, -0.2899932861328125, 22.552650451660156, 87.00680541992188, 14.897674560546875, 24.59701156616211, 19.101539611816406, 76.13494873046875, 41.45354461669922, 53.764190673828125, 0.41433143615722656, 33.879234313964844, 21.77328872680664, -5.145408630371094, 49.57661437988281], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000484.npy"}
{"epoch": 0.71071953010279, "step": 485, "batch_size": 64, "mean": 26.501728057861328, "std": 19.52461814880371, "min": -8.776081085205078, "p10": 3.909611892700196, "median": 23.14684295654297, "p90": 55.87941131591797, "max": 77.938232421875, "pos_frac": 0.9375, "sample": [38.65656280517578, 24.868759155273438, 39.83458709716797, -0.6150970458984375, 63.486663818359375, 20.489990234375, 55.64482116699219, 12.583122253417969, 20.406997680664062, 32.85375213623047, 15.188140869140625, 56.36095428466797, 11.925796508789062, 22.15038299560547, -6.1711578369140625, 61.674774169921875, 9.779190063476562, 13.103561401367188, 77.938232421875, 17.9970703125, 42.92303466796875, 42.23963928222656, 0.9453811645507812, 14.238697052001953, 40.168975830078125, 55.979949951171875, 9.109565734863281, 3.5779380798339844, 22.052997589111328, 22.361480712890625, 28.27367401123047, 17.643447875976562, 17.490943908691406, 9.118034362792969, 28.209003448486328, 23.932205200195312, 40.272987365722656, 38.34341049194336, -8.776081085205078, 27.202880859375, 61.16034698486328, 41.160491943359375, 37.27275085449219, 44.133453369140625, 9.296913146972656, 21.014808654785156, 5.910037994384766, 47.164459228515625, 27.117694854736328, 9.311798095703125, 6.6776885986328125, 37.061302185058594, 22.03514862060547, 47.48234558105469, -5.8130340576171875, 40.90702819824219, 30.081466674804688, 9.814966201782227, 2.3238258361816406, 8.4222412109375, 4.6835174560546875, 32.35670471191406, 65.73991394042969, 27.3294677734375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000485.npy"}
{"epoch": 0.7121879588839941, "step": 486, "batch_size": 64, "mean": 24.53888702392578, "std": 21.462217330932617, "min": -17.392593383789062, "p10": -0.8908203124999995, "median": 21.974660873413086, "p90": 56.08984069824219, "max": 66.27006530761719, "pos_frac": 0.875, "sample": [64.79508209228516, 42.161720275878906, 37.61936950683594, 22.18103790283203, 21.76828384399414, 34.134681701660156, 21.359519958496094, 14.820556640625, 14.409256935119629, 42.160804748535156, 54.06583023071289, 11.218490600585938, 55.54313659667969, 22.48102569580078, -1.2578277587890625, 53.693115234375, 33.3690185546875, 19.134689331054688, -1.1577339172363281, 36.87223815917969, 32.00663757324219, -9.639156341552734, 11.145500183105469, -10.627914428710938, 4.784580230712891, 9.616340637207031, 12.353660583496094, 46.69843673706055, 30.88296890258789, 60.33466339111328, 42.654205322265625, 57.866729736328125, 3.485321044921875, 5.50421142578125, 20.931716918945312, 10.491683959960938, 7.393669128417969, 32.236419677734375, 28.655967712402344, 41.593971252441406, 26.223770141601562, 10.807655334472656, 45.758636474609375, 12.34241008758545, 56.32414245605469, 50.64106750488281, 42.27494812011719, 58.673736572265625, 7.964164733886719, 29.308212280273438, 5.8546905517578125, 13.616199493408203, 6.785028457641602, 22.63915252685547, 5.594268798828125, 0.9068088531494141, 28.81353759765625, 8.3203125, -12.113212585449219, -1.1020736694335938, 66.27006530761719, -17.392593383789062, -0.39789581298828125, 64.63383483886719], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000486.npy"}
{"epoch": 0.7136563876651982, "step": 487, "batch_size": 64, "mean": 22.845443725585938, "std": 17.989728927612305, "min": -8.744186401367188, "p10": 4.000131225585938, "median": 21.467529296875, "p90": 45.82454986572266, "max": 86.43421936035156, "pos_frac": 0.9375, "sample": [21.942840576171875, 47.83148193359375, 14.8829345703125, 14.703910827636719, 23.782752990722656, 18.765846252441406, 23.86444091796875, 3.6531600952148438, 63.45475769042969, 19.739532470703125, 10.243087768554688, 14.337982177734375, 45.602264404296875, 13.321029663085938, 25.10974884033203, 86.43421936035156, 28.486572265625, 49.93900680541992, 12.579889297485352, 35.680328369140625, 37.3203125, 55.883384704589844, 19.99529266357422, 56.61272430419922, 10.612709045410156, 42.545982360839844, 14.616531372070312, 10.873504638671875, 5.887149810791016, 25.191551208496094, 5.392547607421875, 1.113077163696289, 14.803003311157227, 5.1308441162109375, 29.054397583007812, 11.963714599609375, 36.306732177734375, -7.564247131347656, 39.564178466796875, 28.106773376464844, 6.628620147705078, 3.48016357421875, -1.6901473999023438, 4.809730529785156, 25.172225952148438, 35.07792282104492, 6.430816650390625, -8.744186401367188, 32.264068603515625, 25.574459075927734, 10.03472900390625, 45.91981506347656, 23.246288299560547, 21.53728485107422, 5.213441848754883, -8.461944580078125, 23.635467529296875, 38.22997283935547, 43.08489990234375, 28.77667236328125, 20.18326187133789, 20.52375030517578, 22.017440795898438, 21.39777374267578], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000487.npy"}
{"epoch": 0.7151248164464024, "step": 488, "batch_size": 64, "mean": 21.75119400024414, "std": 18.906265258789062, "min": -31.385040283203125, "p10": 1.7376234054565434, "median": 20.311782836914062, "p90": 50.0599952697754, "max": 72.06121826171875, "pos_frac": 0.921875, "sample": [14.935394287109375, 20.162460327148438, 32.588706970214844, 40.109134674072266, 52.83735656738281, 7.504852294921875, 51.16874694824219, 5.6136016845703125, -8.808929443359375, 18.37512969970703, 2.2098846435546875, 5.9228057861328125, -2.155364990234375, 13.929412841796875, 22.428390502929688, 33.57246017456055, 31.850616455078125, 2.9186229705810547, 2.6541481018066406, -31.385040283203125, 12.484603881835938, 3.2296066284179688, 47.47290802001953, 17.81468963623047, 15.060867309570312, 72.06121826171875, 22.134687423706055, 24.322837829589844, 61.1453857421875, 14.808107376098633, 41.330482482910156, 29.19036865234375, 17.374671936035156, 26.00458526611328, 47.317466735839844, 17.54837417602539, 54.89933776855469, 33.82206726074219, 18.010051727294922, 5.660099029541016, 19.641332626342773, 39.255828857421875, 24.0185546875, 20.748504638671875, 21.499210357666016, -2.6585540771484375, 16.459205627441406, 20.461105346679688, 28.845550537109375, -1.3294830322265625, 24.01486587524414, 30.44287872314453, 54.194725036621094, 20.75559425354004, 23.269805908203125, 0.8212432861328125, 1.5580787658691406, 41.221553802490234, 2.1565608978271484, 3.5699462890625, 5.6595458984375, 17.077743530273438, 56.872589111328125, 25.395179748535156], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000488.npy"}
{"epoch": 0.7165932452276065, "step": 489, "batch_size": 64, "mean": 20.03380584716797, "std": 20.237672805786133, "min": -8.208152770996094, "p10": -2.307263755798338, "median": 14.148191452026367, "p90": 45.42837142944336, "max": 86.17884063720703, "pos_frac": 0.859375, "sample": [12.631759643554688, 19.01001739501953, 38.863807678222656, 6.175666809082031, 2.418752670288086, 17.701873779296875, 11.089872360229492, 47.33948516845703, 7.754112243652344, 11.774696350097656, 35.68389892578125, -0.6171035766601562, -8.034194946289062, 18.891809463500977, 11.506629943847656, 26.853317260742188, 39.799896240234375, 45.77964782714844, 13.663589477539062, 42.259925842285156, 26.33875274658203, 2.244661331176758, -7.671989440917969, 23.085418701171875, 6.803668975830078, -8.208152770996094, 39.66986083984375, 8.495246887207031, 8.07088851928711, 8.665267944335938, -3.031618118286133, 31.519821166992188, 1.4906482696533203, 7.147411346435547, 57.51325988769531, 10.944259643554688, 14.632793426513672, 16.868408203125, 3.5884857177734375, -3.6119003295898438, 20.0103759765625, -0.214813232421875, 64.88958740234375, 8.25899887084961, 34.04261779785156, 24.254043579101562, 25.462867736816406, 16.760543823242188, 57.27055358886719, 10.721771240234375, 86.17884063720703, 39.67262268066406, 23.379005432128906, 44.608726501464844, -7.995855331420898, 12.804649353027344, 29.75054931640625, 26.034332275390625, 40.02716064453125, 13.411941528320312, 6.0915985107421875, 1.0238494873046875, 67.80265808105469, -7.18560791015625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000489.npy"}
{"epoch": 0.7180616740088106, "step": 490, "batch_size": 64, "mean": 20.99321746826172, "std": 23.581018447875977, "min": -17.405887603759766, "p10": -8.551753616333007, "median": 19.96697425842285, "p90": 51.110782623291016, "max": 75.21434783935547, "pos_frac": 0.8125, "sample": [43.070159912109375, -7.4062957763671875, 50.46075439453125, -8.453315734863281, 21.013076782226562, 9.052282333374023, 21.72277069091797, -16.454933166503906, 45.099693298339844, 5.514888763427734, -17.405887603759766, 28.760894775390625, 39.5924072265625, 48.17732238769531, -2.1591110229492188, -9.285659790039062, -8.564682006835938, 51.389366149902344, -16.9520263671875, 2.8781814575195312, 2.619068145751953, 1.8908538818359375, -11.145736694335938, 6.921295166015625, -10.637886047363281, 48.02410125732422, 12.424753189086914, 3.5983753204345703, 18.138771057128906, 65.28201293945312, 2.53369140625, 75.21434783935547, 18.92087173461914, 36.52760696411133, 32.414215087890625, 24.62397003173828, 28.196773529052734, 30.83429718017578, 47.271270751953125, 0.5618667602539062, 41.26570129394531, 37.424644470214844, 48.858802795410156, 58.422027587890625, 14.82181167602539, 54.946868896484375, -8.521587371826172, 6.875385284423828, -8.388916015625, 43.27362060546875, 14.310760498046875, 5.4358673095703125, 29.75042724609375, 17.530929565429688, 23.03961181640625, 69.38353729248047, 16.915603637695312, 2.8134078979492188, 3.8959884643554688, 62.614158630371094, 21.444686889648438, 25.431167602539062, 25.947004318237305, 21.809921264648438], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000490.npy"}
{"epoch": 0.7195301027900147, "step": 491, "batch_size": 64, "mean": 20.265714645385742, "std": 20.661968231201172, "min": -40.10640335083008, "p10": -1.804499816894531, "median": 19.42052173614502, "p90": 50.420096588134776, "max": 62.216697692871094, "pos_frac": 0.875, "sample": [25.055496215820312, -10.52450180053711, 51.47023010253906, 62.216697692871094, 33.00193786621094, 3.2262935638427734, 18.196853637695312, -1.8727798461914062, 18.349349975585938, 19.70452880859375, 36.893310546875, 29.972885131835938, -1.6451797485351562, -2.982696533203125, 56.55613708496094, 34.044464111328125, 7.270286560058594, 41.60084533691406, 27.886962890625, 17.729331970214844, 25.7586669921875, 1.7722606658935547, 41.6861572265625, 4.4576416015625, 3.4548683166503906, 7.73365592956543, 19.31414794921875, 0.28745079040527344, 25.773727416992188, 5.670948028564453, 18.95948028564453, 34.299407958984375, 14.587860107421875, 19.52689552307129, 3.1002197265625, 51.58505630493164, 8.235260009765625, 58.174049377441406, 3.4450950622558594, 31.907562255859375, 21.936004638671875, 2.8633804321289062, 47.980369567871094, -21.601058959960938, 12.346221923828125, 48.087989807128906, 3.1887435913085938, 37.06993103027344, 33.28327941894531, 35.449188232421875, 3.2192649841308594, 40.92933654785156, 51.89141845703125, 18.364418029785156, 8.250116348266602, -9.1622314453125, 32.693885803222656, 29.669490814208984, 14.0526123046875, 21.889801025390625, 51.41957092285156, -13.502044677734375, 20.911544799804688, -40.10640335083008], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000491.npy"}
{"epoch": 0.7209985315712188, "step": 492, "batch_size": 64, "mean": 19.087968826293945, "std": 21.75289535522461, "min": -23.7178955078125, "p10": -6.7801269531249995, "median": 16.758148193359375, "p90": 47.66708602905273, "max": 84.826171875, "pos_frac": 0.765625, "sample": [29.603485107421875, 42.40254211425781, 1.1334800720214844, -8.17291259765625, 24.345043182373047, 5.341804504394531, 8.212371826171875, 30.036849975585938, -9.979412078857422, 27.10102081298828, -1.3121185302734375, 40.95435333251953, -2.5822296142578125, 47.21543884277344, 12.657600402832031, 47.86064910888672, -1.4198989868164062, 27.51629638671875, 14.45425033569336, 54.8275146484375, -3.7977981567382812, 31.48419189453125, 33.48993682861328, 84.826171875, 27.7772216796875, 16.598541259765625, 40.21380615234375, 24.457290649414062, 6.1683349609375, 15.737773895263672, 6.787384033203125, -2.425328254699707, 3.4715213775634766, -6.1648712158203125, 4.867181777954102, 40.096038818359375, 16.917755126953125, 4.781044006347656, -23.7178955078125, 29.499122619628906, 49.66093444824219, 51.92352294921875, 19.45648193359375, 6.914379119873047, 4.208686828613281, 54.65507888793945, -0.39965057373046875, 19.54468536376953, 35.386260986328125, 39.74985885620117, 59.49101257324219, -4.9043731689453125, 37.935585021972656, -19.139999389648438, 18.97686004638672, 35.69878005981445, -7.0438079833984375, 14.386472702026367, -7.490474700927734, 11.6055908203125, -9.854255676269531, 43.41197967529297, 3.988353729248047, 22.204559326171875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000492.npy"}
{"epoch": 0.7224669603524229, "step": 493, "batch_size": 64, "mean": 19.930822372436523, "std": 17.794815063476562, "min": -18.295597076416016, "p10": -1.2711261749267577, "median": 18.31800079345703, "p90": 43.15042572021485, "max": 61.599945068359375, "pos_frac": 0.859375, "sample": [1.4041862487792969, 12.352537155151367, 61.599945068359375, 30.9256591796875, 31.760894775390625, 33.23683166503906, 37.4202880859375, 41.8023681640625, 0.6039276123046875, -18.295597076416016, 60.51245880126953, 23.398269653320312, 49.12544250488281, 25.426040649414062, 9.26457405090332, 24.77996063232422, 18.09949493408203, 56.48321533203125, 3.9422531127929688, 7.053808212280273, 30.828285217285156, 17.944162368774414, 30.486942291259766, 29.032981872558594, -0.6874160766601562, 19.201332092285156, 47.137657165527344, -15.572532653808594, 18.53650665283203, -1.2033729553222656, -2.947662353515625, 4.777217864990234, 39.698341369628906, 27.74156951904297, 32.54655456542969, 0.2952003479003906, 16.99261474609375, 26.843704223632812, 11.647140502929688, -3.18267822265625, -1.3001632690429688, 19.897918701171875, 28.93585205078125, 11.241352081298828, 43.72816467285156, 43.85930633544922, -3.589994430541992, 40.735992431640625, 25.702552795410156, 36.79030227661133, 15.886184692382812, 14.471466064453125, 17.613630294799805, 19.301315307617188, -3.6793594360351562, 8.157806396484375, 4.640183448791504, 14.133115768432617, 35.99359130859375, 7.26540470123291, 4.250492095947266, 30.063339233398438, 7.8531494140625, 12.607940673828125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000493.npy"}
{"epoch": 0.723935389133627, "step": 494, "batch_size": 64, "mean": 24.046892166137695, "std": 21.169652938842773, "min": -29.1671142578125, "p10": -0.5469169616699192, "median": 22.49958038330078, "p90": 48.902337646484376, "max": 84.19009399414062, "pos_frac": 0.890625, "sample": [-2.643054962158203, 24.69867706298828, 38.08885192871094, 53.717674255371094, 84.19009399414062, 49.205963134765625, 4.353363037109375, 65.58546447753906, 38.498504638671875, 23.02462387084961, 19.246559143066406, 20.440658569335938, -8.79012680053711, 41.33367919921875, 17.998703002929688, 32.51238250732422, 27.966049194335938, -29.1671142578125, 31.55810546875, 52.667877197265625, 18.764694213867188, 3.1325454711914062, 12.137741088867188, 48.193878173828125, 2.4251251220703125, 19.74953842163086, 16.109573364257812, 45.28034210205078, 19.512664794921875, 13.307552337646484, 5.43218994140625, 4.157936096191406, 25.597190856933594, 44.88481903076172, 40.35875701904297, 22.937110900878906, 37.16729736328125, 32.51488494873047, 16.666534423828125, 61.2230224609375, 15.102714538574219, 2.04681396484375, -12.712150573730469, 2.287872314453125, 29.84619140625, 30.39046859741211, 23.49755859375, 10.553108215332031, -7.850067138671875, 44.113197326660156, 22.062049865722656, -5.842922210693359, 42.410614013671875, 33.62602996826172, 37.92930603027344, 22.943580627441406, 73.29740142822266, 38.88682556152344, -1.6585159301757812, 12.870674133300781, 12.7406005859375, 20.692516326904297, 10.420600891113281, 7.3063812255859375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000494.npy"}
{"epoch": 0.7254038179148311, "step": 495, "batch_size": 64, "mean": 18.81014633178711, "std": 20.70097541809082, "min": -26.6385498046875, "p10": -5.793658447265624, "median": 16.22942352294922, "p90": 47.67949829101564, "max": 75.4244384765625, "pos_frac": 0.828125, "sample": [24.885765075683594, -1.5844306945800781, 26.033164978027344, 34.13410949707031, 33.74664306640625, 19.423397064208984, 2.1289024353027344, 13.565505981445312, 7.5489959716796875, 22.623519897460938, 40.85344696044922, 6.189781188964844, 1.2336044311523438, 2.0444908142089844, 34.30603790283203, 32.553855895996094, 6.285427093505859, 13.035507202148438, 13.918304443359375, 13.932266235351562, 19.268447875976562, -26.6385498046875, 31.295555114746094, -2.25665283203125, 33.432586669921875, 55.98736572265625, 51.639671325683594, -16.345062255859375, 15.662605285644531, -0.98687744140625, 31.3460693359375, 28.94091796875, 16.951400756835938, -13.682846069335938, 30.886795043945312, 60.20376205444336, 27.519760131835938, 6.599456787109375, 75.4244384765625, 26.288543701171875, 11.79019546508789, -5.92877197265625, 11.347694396972656, 0.75421142578125, 8.829559326171875, 21.937057495117188, -15.97353744506836, 22.51374053955078, 10.80588150024414, 10.166450500488281, 14.245521545410156, 13.914344787597656, 12.797660827636719, 36.64005661010742, 50.126487731933594, -5.4783935546875, -9.075363159179688, 23.06650161743164, -10.296875, 16.796241760253906, 49.08839416503906, 44.39207458496094, 71.36061096191406, 21.634033203125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000495.npy"}
{"epoch": 0.7268722466960352, "step": 496, "batch_size": 64, "mean": 18.35360336303711, "std": 17.487611770629883, "min": -20.752273559570312, "p10": -6.1343864440917955, "median": 19.834304809570312, "p90": 38.45634918212892, "max": 62.56582260131836, "pos_frac": 0.84375, "sample": [-5.141792297363281, -12.555877685546875, 30.727134704589844, 21.203475952148438, -10.074356079101562, 49.78907775878906, -8.967781066894531, 60.495086669921875, 26.389728546142578, 16.596450805664062, 26.76856231689453, 32.484161376953125, -0.13048553466796875, 44.11235046386719, 20.99163818359375, 10.924263000488281, 18.768508911132812, 46.42192077636719, -6.559783935546875, 15.310798645019531, 15.322845458984375, 31.423797607421875, 36.29203796386719, 27.758378982543945, 62.56582260131836, 29.424171447753906, 25.986114501953125, 13.757240295410156, 29.44994354248047, 20.567337036132812, 31.808792114257812, -4.409461975097656, 15.506072998046875, 12.732505798339844, 15.952812194824219, 22.508045196533203, 6.6707305908203125, 7.420600891113281, 3.4589309692382812, 22.96947479248047, 1.5826683044433594, 4.550712585449219, 24.763946533203125, -20.752273559570312, 10.718330383300781, 39.3839111328125, 7.302520751953125, 59.1749267578125, 21.807723999023438, 25.96197509765625, 25.676673889160156, -8.4998779296875, 21.112693786621094, 22.193706512451172, 20.705398559570312, 13.70553970336914, 32.11297607421875, 19.991241455078125, 6.7933197021484375, 13.4781494140625, 19.6773681640625, 0.10115814208984375, 16.720458984375, -8.351890563964844], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000496.npy"}
{"epoch": 0.7283406754772394, "step": 497, "batch_size": 64, "mean": 18.66632080078125, "std": 19.94312858581543, "min": -18.910564422607422, "p10": -4.912795257568358, "median": 16.507190704345703, "p90": 45.05182571411133, "max": 70.44085693359375, "pos_frac": 0.84375, "sample": [10.99093246459961, 32.85012435913086, 22.122894287109375, 55.049407958984375, -2.3999576568603516, 6.899112701416016, 36.46492004394531, 29.422958374023438, 15.570281982421875, 20.865737915039062, 19.35809326171875, 36.53046417236328, 22.5057373046875, 1.1666946411132812, -5.7026824951171875, 9.519508361816406, 27.62700653076172, 62.5552978515625, 1.3038558959960938, 14.04046630859375, 13.515853881835938, 6.693859100341797, 43.46318054199219, -18.910564422607422, 21.647079467773438, 3.6294212341308594, 3.4649620056152344, 65.55503845214844, 23.440460205078125, -9.007881164550781, 22.587909698486328, 6.8571929931640625, 14.729301452636719, 21.981185913085938, -7.052677154541016, 65.52313232421875, 24.10161590576172, 55.280548095703125, 20.223377227783203, 23.506362915039062, 16.103500366210938, 22.503868103027344, 8.552532196044922, 70.44085693359375, -13.580097198486328, 6.791959762573242, 35.99748611450195, 24.160865783691406, 36.06748962402344, 14.298919677734375, 8.317352294921875, 3.119354248046875, 20.07379913330078, 10.948776245117188, 18.52159881591797, 16.91088104248047, -3.0697250366210938, 12.719711303710938, -10.00106430053711, 6.164360046386719, 40.1591796875, -11.343887329101562, 45.73267364501953, -2.9161376953125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000497.npy"}
{"epoch": 0.7298091042584435, "step": 498, "batch_size": 64, "mean": 20.020029067993164, "std": 19.052614212036133, "min": -20.57268524169922, "p10": 3.2497421264648443, "median": 16.57085418701172, "p90": 41.12855834960938, "max": 81.75137329101562, "pos_frac": 0.90625, "sample": [8.041629791259766, 22.000503540039062, 8.596235275268555, 33.338111877441406, 9.72027587890625, 4.8291473388671875, 16.96474838256836, 16.859119415283203, 24.29845428466797, 28.64194107055664, -0.5140266418457031, 13.200294494628906, 24.094974517822266, 23.56760025024414, -5.545967102050781, 33.497650146484375, 16.729095458984375, 22.674339294433594, 44.036651611328125, 81.75137329101562, 6.205289840698242, 18.532081604003906, 10.589141845703125, 11.323158264160156, 25.373092651367188, 41.65171813964844, 20.630088806152344, -20.57268524169922, 13.161178588867188, 39.90785217285156, 9.34088134765625, 27.739070892333984, 3.0685272216796875, 53.438499450683594, 5.340229034423828, 30.828460693359375, 11.794975280761719, 32.70843505859375, 21.98815155029297, 37.09980773925781, 39.55574035644531, 9.346542358398438, 69.27728271484375, 15.433723449707031, 48.528785705566406, 12.570549011230469, -19.883224487304688, 14.047569274902344, 35.819496154785156, 4.018684387207031, 23.539260864257812, 5.214851379394531, 3.672576904296875, 7.801689147949219, 10.067276000976562, -4.7303466796875, 28.734949111938477, 16.412612915039062, 72.73379516601562, 9.5985107421875, -2.94354248046875, 34.545166015625, 12.750595092773438, 8.239192962646484], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000498.npy"}
{"epoch": 0.7312775330396476, "step": 499, "batch_size": 64, "mean": 21.523197174072266, "std": 20.609928131103516, "min": -40.16912841796875, "p10": -2.5687721252441396, "median": 21.643917083740234, "p90": 44.29251403808594, "max": 87.95158386230469, "pos_frac": 0.859375, "sample": [22.65699005126953, 17.324874877929688, 20.636940002441406, -14.837148666381836, 13.612815856933594, 20.409835815429688, 44.38111877441406, 9.273956298828125, -3.1353759765625, -8.14361572265625, 4.230796813964844, 23.773441314697266, 3.4955644607543945, 36.73541259765625, 9.052749633789062, 47.63878631591797, 31.55328369140625, 15.5615234375, 87.95158386230469, 21.661773681640625, 26.77273941040039, 23.481155395507812, 17.718780517578125, 39.3271484375, 56.735984802246094, 30.123733520507812, 36.97711944580078, 28.338363647460938, 25.00946044921875, 39.13829803466797, 16.21302032470703, -0.11215972900390625, 31.48675537109375, 37.425575256347656, 3.1220626831054688, 3.0999069213867188, 38.789093017578125, 8.110198974609375, 30.217010498046875, -40.16912841796875, -4.002498626708984, 31.560970306396484, 20.63507080078125, 48.42914581298828, 40.881805419921875, 44.08576965332031, 36.044273376464844, 34.87622833251953, -8.471179962158203, 3.6971054077148438, -1.2466964721679688, 39.75489807128906, 13.894241333007812, 14.720283508300781, 4.650115966796875, 21.626060485839844, 63.11860656738281, 27.37969207763672, 19.31090545654297, -7.9608306884765625, 3.25921630859375, 28.29320526123047, 1.66693115234375, 45.640960693359375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000499.npy"}
{"epoch": 0.7327459618208517, "step": 500, "batch_size": 64, "mean": 20.32265853881836, "std": 22.07276153564453, "min": -21.48858642578125, "p10": -3.6904953002929677, "median": 16.788894653320312, "p90": 53.399784088134766, "max": 72.35073852539062, "pos_frac": 0.84375, "sample": [11.59299087524414, 11.042591094970703, -1.3155059814453125, 72.35073852539062, 43.40388870239258, 29.021099090576172, 23.223709106445312, 28.407302856445312, 0.09750747680664062, 22.74217987060547, 46.37159729003906, 56.95269775390625, -9.610649108886719, 1.6045379638671875, -4.272363662719727, 19.931785583496094, 3.4610137939453125, 6.2801361083984375, -0.34874725341796875, 16.126998901367188, 11.465866088867188, 14.702774047851562, 29.22235870361328, 53.49510192871094, 14.160247802734375, 55.646820068359375, 7.99285888671875, 71.90864562988281, 19.7415771484375, -5.598859786987305, 20.724037170410156, 35.738311767578125, 41.78337097167969, 30.560739517211914, -21.48858642578125, 42.80803680419922, 0.15489959716796875, 14.377975463867188, 28.293289184570312, -4.107879638671875, -11.923908233642578, 0.5918121337890625, 24.641983032226562, 67.55170440673828, 3.6263504028320312, 2.503520965576172, 50.53330993652344, 26.838043212890625, 13.1666259765625, 26.60260009765625, 44.45021057128906, 25.88079833984375, 24.81281280517578, 2.4079132080078125, 23.415130615234375, 62.516204833984375, 17.450790405273438, -11.753189086914062, -2.7165985107421875, 3.5164260864257812, 3.6014671325683594, 53.17737579345703, 2.524566650390625, 8.589149475097656], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000500.npy"}
{"epoch": 0.7342143906020558, "step": 501, "batch_size": 64, "mean": 22.228221893310547, "std": 24.435277938842773, "min": -17.904296875, "p10": -5.117921257019043, "median": 17.529600143432617, "p90": 54.678214263916026, "max": 95.91513061523438, "pos_frac": 0.78125, "sample": [46.94831085205078, 5.540107727050781, 16.37586212158203, -1.9125328063964844, 10.163780212402344, 5.757087707519531, -0.48143768310546875, 62.76246643066406, -10.174629211425781, 8.024444580078125, 31.082672119140625, -5.093713760375977, 30.06011199951172, 11.011390686035156, 42.232025146484375, 46.579673767089844, 11.710769653320312, -6.520271301269531, 3.296855926513672, 17.5172119140625, 52.82905578613281, 10.207759857177734, 27.789093017578125, 21.079978942871094, 35.008060455322266, 33.64922332763672, 7.849151611328125, -3.0455551147460938, 57.93074035644531, 25.05675506591797, -9.432968139648438, 7.28741455078125, 22.32061767578125, 2.3621978759765625, 38.807899475097656, 60.32940673828125, 4.628852844238281, 43.15692138671875, 30.243606567382812, 60.7275390625, 95.91513061523438, 10.44830322265625, 13.515056610107422, -5.1282958984375, 48.15397644042969, 46.97296905517578, -0.7062301635742188, 32.92014694213867, -1.4013633728027344, -0.9423980712890625, 55.47071075439453, -5.348228454589844, 4.991199493408203, 19.958297729492188, 17.541988372802734, -17.904296875, 24.872074127197266, 46.17915344238281, 23.983840942382812, -12.233768463134766, 89.8321533203125, 35.244850158691406, 41.38232421875, 5.222583770751953], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000501.npy"}
{"epoch": 0.73568281938326, "step": 502, "batch_size": 64, "mean": 18.023334503173828, "std": 17.06119155883789, "min": -30.283727645874023, "p10": -2.428511810302733, "median": 16.315998077392578, "p90": 40.13419723510743, "max": 62.83983612060547, "pos_frac": 0.859375, "sample": [3.6499786376953125, -30.283727645874023, 17.708322525024414, 12.420066833496094, 18.633834838867188, -6.451057434082031, -4.49656867980957, 8.055877685546875, 62.83983612060547, 22.305400848388672, 21.663970947265625, 45.31620788574219, 3.4488067626953125, 21.359649658203125, 31.013904571533203, 12.0596923828125, 4.935333251953125, 21.881500244140625, 34.77191162109375, 40.78588104248047, 28.505569458007812, 10.62436294555664, 10.156829833984375, 12.508808135986328, -4.8947906494140625, 14.469856262207031, 15.440849304199219, 9.064981460571289, 32.18549346923828, 24.587432861328125, 6.931396484375, 45.76155090332031, -3.607025146484375, -1.3139495849609375, 7.198516845703125, 27.060264587402344, 9.071136474609375, 44.55149841308594, -15.591068267822266, 10.371307373046875, 20.25677490234375, 32.6357421875, 2.0805130004882812, 15.322891235351562, 35.230918884277344, 38.378662109375, 8.292770385742188, -2.9061813354492188, 18.574363708496094, 44.075347900390625, 35.06077575683594, 18.418479919433594, 27.827301025390625, -1.15142822265625, 11.774162292480469, 17.191146850585938, 33.249053955078125, 7.703033447265625, 35.733299255371094, 4.069156646728516, 51.349884033203125, 13.738445281982422, 38.61360168457031, 23.272872924804688], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000502.npy"}
{"epoch": 0.737151248164464, "step": 503, "batch_size": 64, "mean": 21.831512451171875, "std": 20.486406326293945, "min": -26.233928680419922, "p10": -2.4562536239624015, "median": 22.342748641967773, "p90": 45.856864929199226, "max": 78.1484375, "pos_frac": 0.828125, "sample": [22.862564086914062, -3.0867691040039062, 46.40293884277344, -2.8551025390625, 24.704269409179688, 1.8252620697021484, -0.6567230224609375, 16.82583236694336, 4.0545806884765625, 40.39628219604492, 20.42620086669922, 41.544189453125, 27.552169799804688, 39.771934509277344, 37.55013656616211, 25.675525665283203, 9.991283416748047, 12.71136474609375, -25.99029541015625, 40.18779754638672, 41.003814697265625, 15.194931030273438, 15.275421142578125, 20.725114822387695, 1.2971839904785156, 21.216766357421875, -1.5122451782226562, 36.151123046875, -1.5256061553955078, 14.322990417480469, 1.80291748046875, 2.149822235107422, -4.4036865234375, 50.955169677734375, -0.7390899658203125, 53.11644744873047, 14.020126342773438, 44.3438720703125, 46.31982421875, 36.46971130371094, 36.930686950683594, 44.77662658691406, 13.165359497070312, 7.970684051513672, -26.233928680419922, 59.953765869140625, 22.46091079711914, 31.312881469726562, 32.01869201660156, 78.1484375, -10.190141677856445, 21.91720962524414, 7.0808258056640625, 29.56536865234375, 29.344268798828125, 35.325225830078125, 14.193740844726562, 33.751007080078125, 50.357513427734375, 33.064697265625, 22.224586486816406, 27.167118072509766, 29.923904418945312, -13.090682983398438], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000503.npy"}
{"epoch": 0.7386196769456681, "step": 504, "batch_size": 64, "mean": 24.942359924316406, "std": 20.003402709960938, "min": -15.515594482421875, "p10": 2.8934959411621097, "median": 20.81618881225586, "p90": 49.25785446166993, "max": 81.20895385742188, "pos_frac": 0.9375, "sample": [17.188934326171875, 15.2349853515625, 19.801895141601562, 14.885414123535156, 9.740310668945312, 21.37375259399414, 13.974380493164062, 0.7571258544921875, 44.46653747558594, 60.05816650390625, 76.3545913696289, 39.88984680175781, 36.16024398803711, -5.675350189208984, 17.15160369873047, 18.016380310058594, 26.897037506103516, 15.248481750488281, -9.334674835205078, 29.494915008544922, 40.676536560058594, 27.38811492919922, 32.86323547363281, -15.515594482421875, 13.90463638305664, 13.299381256103516, 29.813392639160156, 8.219165802001953, 15.562103271484375, 7.4117431640625, -4.658790588378906, 16.749122619628906, 8.5848388671875, 33.9490966796875, 20.54241180419922, 25.75958251953125, 49.641082763671875, 24.56585693359375, 26.98302459716797, 21.587371826171875, 32.44666290283203, 66.9673080444336, 21.0899658203125, 2.7497901916503906, 53.677520751953125, 81.20895385742188, 1.6239166259765625, 44.99726867675781, 15.122934341430664, 9.557281494140625, 16.564647674560547, 13.460075378417969, 67.63542175292969, 34.864532470703125, 16.045944213867188, 45.197235107421875, 44.7939453125, 48.36365509033203, 32.876792907714844, 3.228809356689453, 11.392227172851562, 42.65618133544922, 25.74521255493164, 5.0339508056640625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000504.npy"}
{"epoch": 0.7400881057268722, "step": 505, "batch_size": 64, "mean": 21.754974365234375, "std": 19.08247947692871, "min": -13.278778076171875, "p10": -2.1493087768554675, "median": 19.337989807128906, "p90": 46.59337463378907, "max": 80.74369812011719, "pos_frac": 0.859375, "sample": [-0.680328369140625, 23.39764404296875, 5.3351898193359375, 46.00450134277344, 20.0640869140625, 55.63214111328125, 8.221488952636719, 34.014976501464844, 64.76883697509766, 51.62382507324219, -8.780887603759766, 40.618408203125, 80.74369812011719, 16.06987762451172, 17.233200073242188, 18.611892700195312, 21.514808654785156, 10.672351837158203, 28.425437927246094, 13.254199981689453, 22.909160614013672, 28.614212036132812, 32.722564697265625, 23.0596923828125, -5.689849853515625, 17.238929748535156, 12.665149688720703, 42.29939651489258, 43.85931396484375, 46.84574890136719, -2.704784393310547, 32.36039733886719, -0.8531990051269531, 9.285411834716797, 26.415302276611328, 31.330291748046875, 16.193954467773438, 14.010377883911133, 9.378543853759766, 17.025501251220703, 55.12915802001953, 0.1969757080078125, 30.734458923339844, 13.676445007324219, 31.3013916015625, 26.18798828125, 9.5880126953125, 16.314556121826172, 10.379684448242188, 48.21514892578125, 16.341758728027344, 0.9198379516601562, 7.695291519165039, 43.97727966308594, 33.2935791015625, -5.682598114013672, 28.78700065612793, -13.278778076171875, 25.068710327148438, 13.403923034667969, 33.21070098876953, -10.583850860595703, 22.021636962890625, -8.291417121887207], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000505.npy"}
{"epoch": 0.7415565345080763, "step": 506, "batch_size": 64, "mean": 20.012042999267578, "std": 17.60698890686035, "min": -21.592012405395508, "p10": -0.8169994354248042, "median": 18.48882293701172, "p90": 44.193171691894534, "max": 75.68728637695312, "pos_frac": 0.875, "sample": [1.7520980834960938, 31.181655883789062, 6.272830963134766, 13.909599304199219, 24.926897048950195, 15.991790771484375, -3.8788909912109375, 41.373504638671875, 10.488494873046875, 12.414569854736328, 46.83797073364258, 6.84332275390625, 27.963687896728516, -0.35411834716796875, -6.098409652709961, 17.89165496826172, 23.40705108642578, -1.0153770446777344, 9.530281066894531, 44.82612609863281, 18.207061767578125, 9.65521240234375, 33.62994384765625, 46.651954650878906, 75.68728637695312, -6.055298805236816, 13.152557373046875, 34.17890548706055, 29.74334716796875, 19.035057067871094, 29.672134399414062, 8.275672912597656, 25.820709228515625, 15.555686950683594, 7.753936767578125, -2.3917999267578125, 53.581520080566406, 39.853302001953125, 19.439926147460938, 42.716278076171875, 18.59069061279297, 15.398326873779297, 51.27955627441406, 2.7637977600097656, 0.43872833251953125, 13.088356018066406, 23.2581787109375, 10.4527587890625, 27.085311889648438, 24.859352111816406, 20.152618408203125, 3.5170230865478516, 21.326904296875, 31.148452758789062, 29.754669189453125, 27.16999053955078, -8.17864990234375, 6.897222518920898, 9.836257934570312, 18.38695526123047, 51.324195861816406, 37.562740325927734, 27.821151733398438, -21.592012405395508], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000506.npy"}
{"epoch": 0.7430249632892805, "step": 507, "batch_size": 64, "mean": 20.463586807250977, "std": 19.96019744873047, "min": -34.223876953125, "p10": -0.34999828338622896, "median": 22.062294006347656, "p90": 43.94490356445313, "max": 73.38284301757812, "pos_frac": 0.890625, "sample": [3.961935043334961, 12.669853210449219, 28.794326782226562, 14.751129150390625, 73.38284301757812, 16.608383178710938, 25.9771728515625, 14.527153015136719, 4.642570495605469, 38.22227478027344, 39.72380065917969, 17.930892944335938, 51.996917724609375, 32.97125244140625, 25.825363159179688, -1.5718536376953125, -34.223876953125, -18.77850341796875, 28.74468994140625, 49.5194091796875, 13.03857421875, 29.774917602539062, 3.8835010528564453, 23.134048461914062, 1.1662063598632812, 43.35070037841797, 58.874168395996094, 24.082717895507812, -0.977447509765625, 35.30378723144531, 27.041793823242188, 27.517364501953125, -25.882408142089844, 8.644010543823242, 26.043209075927734, 42.18170166015625, -19.805065155029297, 25.93939208984375, 11.136123657226562, 17.46764373779297, 17.421207427978516, 22.124847412109375, 44.199562072753906, 13.873817443847656, 32.347259521484375, 23.71137237548828, 26.240875244140625, 37.856910705566406, 49.281585693359375, 25.674636840820312, 14.41961669921875, 6.509544372558594, 1.1140499114990234, 14.9776611328125, 10.802688598632812, 57.22367858886719, 42.38554382324219, 3.7370223999023438, 17.240020751953125, 1.3449172973632812, 2.301910400390625, 21.999740600585938, 26.352710723876953, -1.0622329711914062], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000507.npy"}
{"epoch": 0.7444933920704846, "step": 508, "batch_size": 64, "mean": 22.089759826660156, "std": 23.127580642700195, "min": -28.557708740234375, "p10": -3.3346279144287108, "median": 19.01995086669922, "p90": 52.01374664306641, "max": 64.21388244628906, "pos_frac": 0.828125, "sample": [37.03570556640625, 62.387176513671875, -15.998729705810547, 39.25830078125, -28.557708740234375, -28.063690185546875, -5.283287048339844, 52.740753173828125, 6.74420166015625, 39.70357894897461, 64.21388244628906, 20.8822021484375, 3.1219940185546875, 63.8780517578125, 49.12785339355469, 50.31739807128906, 29.05910873413086, 19.003097534179688, 47.110687255859375, 21.857940673828125, 0.41606903076171875, 52.91765594482422, 12.881778717041016, 38.814788818359375, 14.759231567382812, 33.26991271972656, 18.89928436279297, 3.4758834838867188, 6.7039337158203125, 8.636459350585938, -0.8088455200195312, 34.560638427734375, 42.89666748046875, -2.9323997497558594, 45.857025146484375, -3.5070114135742188, -8.100570678710938, 10.152721405029297, 0.3849334716796875, 32.95575714111328, 35.475502014160156, 35.40086364746094, 5.40277099609375, 30.5145263671875, 24.016693115234375, 26.47995376586914, 15.506523132324219, 34.395713806152344, 39.19572448730469, 62.89363098144531, 64.1937026977539, 19.03680419921875, -14.842964172363281, 48.352684020996094, 49.108070373535156, 18.530799865722656, 2.2204132080078125, 18.686370849609375, 0.32933807373046875, 9.460559844970703, -0.551483154296875, -1.2128677368164062, 5.034187316894531, 15.34454345703125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000508.npy"}
{"epoch": 0.7459618208516887, "step": 509, "batch_size": 64, "mean": 19.402435302734375, "std": 22.30762481689453, "min": -27.736251831054688, "p10": -6.274920082092284, "median": 15.977725982666016, "p90": 50.23179931640625, "max": 65.2177734375, "pos_frac": 0.828125, "sample": [0.23218154907226562, 8.160934448242188, -23.802642822265625, 14.49993896484375, -8.027557373046875, 63.22785949707031, 35.33460998535156, 10.118152618408203, 61.15477752685547, 16.80614471435547, 23.918289184570312, 26.599708557128906, 2.2586212158203125, 19.95135498046875, 11.779060363769531, 2.540985107421875, 42.59120178222656, -4.761024475097656, -26.7230224609375, 3.6485443115234375, 62.53974914550781, 30.713363647460938, 7.546073913574219, 11.76229476928711, 18.407997131347656, -6.667743682861328, 3.4545822143554688, 27.719871520996094, 21.382774353027344, 15.149307250976562, 36.048179626464844, -5.358331680297852, 31.809099197387695, -7.206047058105469, 21.11199188232422, 52.687889099121094, 37.67163848876953, 49.817840576171875, 36.778160095214844, 5.6340789794921875, -1.0623245239257812, 4.056739807128906, 6.537422180175781, 12.23095703125, 22.555728912353516, -27.736251831054688, 32.7391357421875, 40.69550323486328, 2.8739471435546875, 33.991973876953125, -15.536331176757812, 6.5986328125, 33.18189239501953, -2.6485633850097656, 59.73619079589844, 14.031185150146484, 26.62670135498047, 10.047714233398438, 13.663253784179688, 50.409210205078125, 65.2177734375, 32.997344970703125, 45.120174407958984, 44.916961669921875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000509.npy"}
{"epoch": 0.7474302496328928, "step": 510, "batch_size": 64, "mean": 22.05681610107422, "std": 23.531625747680664, "min": -41.223480224609375, "p10": -4.72711143493652, "median": 19.797395706176758, "p90": 53.990960693359376, "max": 79.59342193603516, "pos_frac": 0.875, "sample": [-7.814338684082031, 6.270301818847656, 70.33667755126953, 40.494815826416016, -12.472366333007812, 20.921836853027344, 73.80552673339844, 79.59342193603516, 64.21792602539062, 18.93584632873535, 21.017681121826172, -7.426605224609375, 0.17686843872070312, 44.868629455566406, 22.73993682861328, 48.17998504638672, 45.16930389404297, -41.223480224609375, -23.25802230834961, 37.33680725097656, 11.154495239257812, 2.922403335571289, 26.39623260498047, -21.470352172851562, 21.82134246826172, 22.51447296142578, 17.048660278320312, 34.8004035949707, 19.542755126953125, 19.306488037109375, 7.687828063964844, 20.05203628540039, 12.609811782836914, 46.44683074951172, 53.85041809082031, 17.455299377441406, 60.87541198730469, 1.5423431396484375, 3.582836151123047, 54.05119323730469, 14.549758911132812, 14.648719787597656, 11.012115478515625, 5.944496154785156, 24.344078063964844, 25.46855926513672, 23.44384765625, 38.897117614746094, 17.264373779296875, 21.644832611083984, 70.25274658203125, 9.222343444824219, 13.86614990234375, 28.9237060546875, 11.581779479980469, 34.032798767089844, 17.4217529296875, 7.8702392578125, -6.076892852783203, 33.30609893798828, 13.5870361328125, 26.04534912109375, -1.5776214599609375, 21.901214599609375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000510.npy"}
{"epoch": 0.748898678414097, "step": 511, "batch_size": 64, "mean": 21.31585121154785, "std": 24.386301040649414, "min": -26.609344482421875, "p10": -6.616131973266601, "median": 16.654117584228516, "p90": 57.106821060180664, "max": 98.49761962890625, "pos_frac": 0.859375, "sample": [18.560081481933594, 75.2013168334961, 6.23590087890625, 27.164016723632812, 14.504287719726562, 11.855194091796875, 16.451126098632812, 44.57866668701172, 57.701416015625, 16.120418548583984, 15.436767578125, -12.876052856445312, 56.4229736328125, -6.74786376953125, -3.637510299682617, 15.61834716796875, 3.3710098266601562, 19.1536865234375, 8.7850341796875, 16.85710906982422, 59.11376190185547, 7.120891571044922, 7.185554504394531, 18.024093627929688, 43.92911911010742, -6.916496276855469, -26.609344482421875, 34.45166778564453, 7.677757263183594, 31.98902130126953, 12.930450439453125, 11.818466186523438, 2.9325170516967773, 43.0037841796875, 33.480308532714844, 18.99237060546875, 43.27070617675781, 79.17489624023438, -15.990776062011719, 28.183456420898438, 31.746719360351562, 21.893707275390625, 30.285480499267578, 25.949630737304688, -17.716951370239258, 33.7437744140625, 17.236038208007812, 1.4446907043457031, -6.308757781982422, 1.853912353515625, 4.06793212890625, 7.786674499511719, 42.191314697265625, 14.662349700927734, -8.287071228027344, 20.033447265625, 57.399898529052734, 21.864837646484375, 98.49761962890625, 4.2621002197265625, 12.7049560546875, 31.875946044921875, 7.106288909912109, 75.40174865722656], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000511.npy"}
{"epoch": 0.750367107195301, "step": 512, "batch_size": 64, "mean": 20.856685638427734, "std": 22.410446166992188, "min": -26.47943115234375, "p10": -5.553348541259763, "median": 16.28571128845215, "p90": 53.58303565979004, "max": 75.19187927246094, "pos_frac": 0.84375, "sample": [73.06275939941406, -10.905784606933594, 36.40312194824219, 13.41448974609375, 15.192707061767578, 5.392673492431641, 45.63844299316406, -9.287078857421875, 2.8191394805908203, 5.024412155151367, 23.538660049438477, 34.519683837890625, -2.5967721939086914, 49.66615676879883, 9.353424072265625, 11.552324295043945, -16.012481689453125, 12.2255859375, 6.2783050537109375, -3.2394638061523438, -0.6385040283203125, 35.05943298339844, 3.4509010314941406, 26.5919189453125, 27.180511474609375, 46.396240234375, 5.855072021484375, 13.810821533203125, 14.311386108398438, 4.163917541503906, 40.89710235595703, 19.515872955322266, 14.542583465576172, 50.07096862792969, -12.09865951538086, 56.67936325073242, 32.22620391845703, 23.989234924316406, 9.418212890625, 58.87837219238281, -26.47943115234375, 54.027008056640625, 52.54710006713867, 55.336669921875, -6.545013427734375, 60.95240783691406, 11.0177001953125, 17.37871551513672, -11.219566345214844, 26.914039611816406, 24.601783752441406, 4.969814300537109, 48.68482971191406, 18.400436401367188, 22.426300048828125, 10.538711547851562, 31.223712921142578, 2.899932861328125, 30.544174194335938, 11.47796630859375, 4.887424468994141, 75.19187927246094, 23.500774383544922, 19.209205627441406], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000512.npy"}
{"epoch": 0.7518355359765051, "step": 513, "batch_size": 64, "mean": 20.959625244140625, "std": 19.70657730102539, "min": -53.90776062011719, "p10": 1.0431268692016615, "median": 20.85456085205078, "p90": 46.862129211425795, "max": 65.20599365234375, "pos_frac": 0.90625, "sample": [32.22576904296875, 8.993701934814453, 38.56652069091797, 2.729135513305664, 10.915168762207031, 7.808815002441406, -3.271923065185547, 31.319082260131836, 38.13166809082031, 49.988037109375, 16.754600524902344, 25.25738525390625, 25.507896423339844, 9.236824035644531, 4.301902770996094, 31.728973388671875, 2.4246444702148438, 5.696815490722656, 48.30003356933594, 57.51573944091797, 34.6328125, 31.170684814453125, 65.20599365234375, 52.245025634765625, 18.758159637451172, 5.49237060546875, -2.4223594665527344, 28.948158264160156, 53.21527862548828, 4.340053558349609, 25.413681030273438, 12.114933013916016, 32.93733215332031, 43.50701904296875, 16.443218231201172, 12.448577880859375, 9.40966796875, 32.00257873535156, 24.511245727539062, 21.866058349609375, 16.840816497802734, 0.4510478973388672, 52.299041748046875, -18.80596160888672, 34.229408264160156, 9.163909912109375, 10.600955963134766, 7.673229217529297, 27.344635009765625, 36.617462158203125, 18.37957000732422, -0.07953834533691406, 28.081790924072266, 23.30645751953125, 7.79241943359375, 40.52313232421875, 16.85002899169922, 19.843063354492188, -10.719612121582031, 30.17630386352539, -53.90776062011719, 40.30363464355469, 14.772071838378906, 23.308578491210938], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000513.npy"}
{"epoch": 0.7533039647577092, "step": 514, "batch_size": 64, "mean": 20.327138900756836, "std": 19.073442459106445, "min": -14.100967407226562, "p10": -7.056201934814451, "median": 19.227184295654297, "p90": 46.35195770263672, "max": 71.9914779663086, "pos_frac": 0.84375, "sample": [13.783348083496094, -14.100967407226562, 5.800811767578125, -11.201470375061035, 25.374053955078125, 17.846424102783203, 46.62247085571289, 46.37286376953125, 49.69358825683594, 42.979042053222656, 7.486232757568359, 10.047775268554688, 14.431167602539062, 43.60789489746094, 40.24376678466797, -11.13641357421875, 41.44417190551758, 38.24560546875, 13.329898834228516, 46.30317687988281, -9.405895233154297, 19.327606201171875, 39.07809066772461, -0.5122566223144531, 20.659286499023438, 4.2184295654296875, 16.692626953125, -7.753227233886719, 10.860954284667969, 19.548667907714844, 12.844127655029297, 16.372665405273438, 23.239036560058594, 18.94683074951172, 5.765689849853516, 29.952693939208984, 71.9914779663086, 0.815399169921875, 24.203895568847656, -11.763763427734375, 27.515941619873047, 5.96331787109375, -12.677467346191406, 22.278602600097656, 37.63938903808594, -0.8482494354248047, 4.948944091796875, 10.853683471679688, 25.390316009521484, 52.126041412353516, 6.4725799560546875, 44.47101593017578, 30.29818344116211, 18.90093231201172, 49.57099914550781, 11.023965835571289, -5.4298095703125, 33.09095001220703, 23.4774169921875, 28.057296752929688, 49.337310791015625, 24.981712341308594, 19.12676239013672, 22.111236572265625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000514.npy"}
{"epoch": 0.7547723935389133, "step": 515, "batch_size": 64, "mean": 17.215316772460938, "std": 21.287250518798828, "min": -23.264610290527344, "p10": -6.818709373474117, "median": 15.101200103759766, "p90": 41.055176162719725, "max": 111.14047241210938, "pos_frac": 0.828125, "sample": [9.670940399169922, -1.4322471618652344, 13.87216567993164, 30.190032958984375, 4.2557830810546875, -23.264610290527344, 29.54730987548828, 10.529365539550781, 5.202262878417969, -21.587547302246094, 14.427711486816406, 26.714065551757812, 9.575180053710938, 41.250701904296875, 45.65473175048828, -8.800857543945312, 27.839805603027344, 24.724349975585938, 17.352920532226562, -9.826179504394531, 111.14047241210938, -1.3064727783203125, 19.146621704101562, 25.540122985839844, 26.015159606933594, -18.099960327148438, 30.709381103515625, 19.556198120117188, 15.774688720703125, 9.990955352783203, 32.06492614746094, 8.975765228271484, 13.450634002685547, 12.703258514404297, 55.1253662109375, 6.366729736328125, 51.032447814941406, 16.54346466064453, 28.282684326171875, 17.716506958007812, 21.78008270263672, 26.613784790039062, 8.936317443847656, 3.7483367919921875, 5.6683502197265625, -0.055328369140625, -13.973678588867188, 48.72819519042969, 18.265892028808594, 4.350749969482422, -10.211212158203125, 60.70589065551758, 4.81800651550293, -2.193696975708008, 40.59894943237305, 27.816390991210938, 4.365318298339844, 20.23431968688965, 28.489913940429688, 5.421260833740234, 12.112174987792969, 30.533828735351562, 2.44805908203125, 25.95360565185547], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000515.npy"}
{"epoch": 0.7562408223201175, "step": 516, "batch_size": 64, "mean": 22.459487915039062, "std": 22.98818016052246, "min": -28.985504150390625, "p10": -0.5914684295654291, "median": 20.058921813964844, "p90": 55.6225082397461, "max": 95.86187744140625, "pos_frac": 0.875, "sample": [32.694541931152344, 29.999664306640625, 69.99838256835938, 1.8577938079833984, 27.78958511352539, 22.076370239257812, 19.843521118164062, 10.067428588867188, 29.786788940429688, 10.958892822265625, 42.07038879394531, 41.27146911621094, 16.683349609375, 17.74078369140625, -9.943695068359375, 21.30927276611328, 14.277111053466797, -19.469161987304688, 18.72430419921875, 38.532569885253906, 6.6561126708984375, 27.32843017578125, 1.4299278259277344, 12.223514556884766, 64.67001342773438, -5.5916595458984375, -7.192626953125, 3.075927734375, 43.350494384765625, 20.718284606933594, 4.939796447753906, 37.24990463256836, 54.486236572265625, 60.889251708984375, 10.966812133789062, 1.0633544921875, 22.290836334228516, -2.170612335205078, 8.541603088378906, 11.331890106201172, 30.04110336303711, -0.8260459899902344, 31.890472412109375, 10.408849716186523, 34.151344299316406, 25.68785858154297, 4.95343017578125, 56.10948181152344, 12.814279556274414, 59.96101379394531, 31.680038452148438, 9.783184051513672, 9.529937744140625, 6.981271743774414, 95.86187744140625, 30.429954528808594, -28.985504150390625, 4.06854248046875, 20.274322509765625, 73.10138702392578, 52.45582580566406, 28.271820068359375, 26.280109405517578, -0.04412078857421875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000516.npy"}
{"epoch": 0.7577092511013216, "step": 517, "batch_size": 64, "mean": 22.86195182800293, "std": 18.839384078979492, "min": -13.493255615234375, "p10": -2.3001373291015614, "median": 24.90363883972168, "p90": 46.43690261840821, "max": 69.2947769165039, "pos_frac": 0.859375, "sample": [33.888397216796875, 21.092857360839844, 21.66911506652832, 34.793426513671875, 30.94072723388672, 36.06205749511719, 31.054336547851562, 33.571678161621094, 25.56298065185547, -9.114665985107422, 13.457870483398438, 36.14128112792969, 2.4796676635742188, 41.192630767822266, 37.27085876464844, 16.78277587890625, 24.72104263305664, 12.19268798828125, 14.361309051513672, 28.494125366210938, 34.66239929199219, 62.084228515625, 47.16145324707031, 20.502685546875, -3.9541168212890625, -10.923822402954102, -1.4272842407226562, 4.59600830078125, 8.277877807617188, 21.478530883789062, -2.6742172241210938, 8.796745300292969, 5.05084228515625, 57.077430725097656, 17.47069549560547, 28.675750732421875, 50.56732177734375, 14.051681518554688, -7.325761795043945, 2.70147705078125, 8.498466491699219, 36.10810470581055, 9.745351791381836, 7.768623352050781, 43.13661193847656, 44.74628448486328, 56.88594055175781, 32.12549591064453, -0.712921142578125, 35.661041259765625, 14.01225471496582, 27.25482177734375, 25.08623504638672, 69.2947769165039, -13.493255615234375, 50.61134338378906, 0.639984130859375, 29.66259765625, -5.607696533203125, 34.74871826171875, 23.49250030517578, 28.273147583007812, 35.4681396484375, 26.29334259033203], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000517.npy"}
{"epoch": 0.7591776798825257, "step": 518, "batch_size": 64, "mean": 16.74312400817871, "std": 18.85817527770996, "min": -29.54534912109375, "p10": -3.179800415039062, "median": 13.361371994018555, "p90": 39.39135513305665, "max": 78.50479888916016, "pos_frac": 0.828125, "sample": [-3.636005401611328, 45.060882568359375, 20.517715454101562, 34.90012741088867, 10.668281555175781, 14.239280700683594, -2.41351318359375, 26.286888122558594, 16.880508422851562, 0.7033233642578125, 22.59400177001953, 11.495559692382812, 0.257537841796875, -11.540481567382812, 14.064754486083984, 19.29619789123535, 37.39324188232422, 21.88166046142578, 6.0242156982421875, 21.647491455078125, 40.92887878417969, -4.170417785644531, 0.9685592651367188, 26.95903205871582, 12.657989501953125, 11.17013168334961, 54.63200378417969, 43.84764099121094, -4.665016174316406, 78.50479888916016, 37.59691619873047, 20.20648193359375, 30.40081787109375, 32.743167877197266, 1.8348236083984375, 26.05463409423828, 11.427547454833984, -1.889007568359375, 73.67556762695312, 12.328784942626953, 15.331535339355469, 14.154312133789062, 23.951156616210938, 10.659271240234375, 2.963897705078125, 28.916061401367188, -0.5267486572265625, -1.1830673217773438, 12.1922607421875, 8.975296020507812, -3.508209228515625, -13.615589141845703, 10.494701385498047, 21.936119079589844, 7.2082366943359375, 3.4482421875, 10.894573211669922, 32.19203186035156, -29.54534912109375, 40.160400390625, 27.848464965820312, 12.64874267578125, 17.413909912109375, 7.0146942138671875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000518.npy"}
{"epoch": 0.7606461086637298, "step": 519, "batch_size": 64, "mean": 23.279375076293945, "std": 19.072216033935547, "min": -11.142204284667969, "p10": 3.2571151733398445, "median": 20.289569854736328, "p90": 49.010453796386734, "max": 87.06121826171875, "pos_frac": 0.9375, "sample": [0.6095657348632812, 44.046287536621094, 3.9528121948242188, 6.7257537841796875, 43.01373291015625, 35.386383056640625, 28.436813354492188, 3.888042449951172, 40.111785888671875, 5.657615661621094, 3.9118804931640625, 24.01230239868164, 13.20370864868164, 38.230072021484375, 32.95685577392578, 30.363567352294922, 55.174896240234375, 87.06121826171875, 50.433197021484375, 39.40808868408203, 20.188461303710938, 58.75907897949219, 6.156694412231445, 3.80816650390625, 17.092041015625, 32.170440673828125, 31.864418029785156, 28.6336669921875, -0.1220855712890625, 15.97926139831543, 24.689910888671875, 44.77900695800781, 31.479537963867188, 4.31671142578125, 18.301830291748047, 17.474411010742188, 55.3089599609375, -1.8009300231933594, 16.571868896484375, 24.041275024414062, 15.467903137207031, 3.0209503173828125, 25.11557388305664, 11.298858642578125, -0.81500244140625, 45.69071960449219, 27.93182373046875, 1.1847915649414062, 21.770751953125, 31.19873046875, 30.326156616210938, 56.04618835449219, 19.201263427734375, 5.6224365234375, 24.39776611328125, 64.7944107055664, -11.142204284667969, 14.216968536376953, 13.963134765625, 18.13946533203125, 5.926242828369141, 4.512395858764648, 20.39067840576172, 5.342765808105469], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000519.npy"}
{"epoch": 0.762114537444934, "step": 520, "batch_size": 64, "mean": 22.700105667114258, "std": 16.32847785949707, "min": -11.200675964355469, "p10": 1.995471000671387, "median": 22.398324012756348, "p90": 41.258946990966805, "max": 56.24162292480469, "pos_frac": 0.9375, "sample": [37.66552734375, 17.872787475585938, 54.63555908203125, 1.9085559844970703, 12.680870056152344, 38.18220901489258, 31.80901336669922, 11.327953338623047, -11.200675964355469, 14.060531616210938, -0.89410400390625, 29.145339965820312, 0.64434814453125, -8.19259262084961, 22.11589813232422, 7.3238067626953125, 28.866981506347656, 56.24162292480469, 36.503501892089844, 5.589881896972656, 44.709083557128906, 54.20635986328125, 3.49456787109375, 30.89185333251953, 29.484481811523438, 31.276641845703125, 30.55179214477539, 23.986175537109375, 2.198272705078125, 37.561492919921875, 15.844064712524414, 12.533863067626953, 35.827239990234375, 4.785163879394531, 36.15143585205078, 21.82691192626953, 39.66249084472656, 30.89177703857422, 5.173896789550781, 12.869392395019531, 13.564090728759766, 5.791627883911133, 8.385581970214844, 26.30120849609375, 31.680736541748047, 41.805747985839844, 1.7022018432617188, 14.000635147094727, 38.11579132080078, -4.329113006591797, 22.680749893188477, 18.490859985351562, 39.98307800292969, 21.630462646484375, 15.307525634765625, 44.037841796875, 17.453048706054688, 33.63140106201172, 38.8475341796875, 30.939437866210938, 53.9107666015625, 10.005569458007812, 5.8036041259765625, 32.852455139160156], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000520.npy"}
{"epoch": 0.7635829662261381, "step": 521, "batch_size": 64, "mean": 21.782573699951172, "std": 18.997690200805664, "min": -17.81250762939453, "p10": 0.6990554809570325, "median": 20.85554599761963, "p90": 47.579534912109374, "max": 72.2459945678711, "pos_frac": 0.90625, "sample": [23.324073791503906, 27.132232666015625, 47.8861198425293, 18.637535095214844, 15.062393188476562, 16.0625, 12.224166870117188, 27.968612670898438, 6.279987335205078, 31.941429138183594, 8.451900482177734, 3.5645294189453125, 1.9435272216796875, 18.80908203125, 5.796775817871094, 26.457733154296875, 0.16571044921875, -10.21139144897461, 35.642005920410156, 42.664825439453125, 2.5935516357421875, -17.81250762939453, 21.40178108215332, 1.9447669982910156, 23.834686279296875, 37.721641540527344, 18.79366683959961, 7.4674835205078125, 8.179271697998047, 8.022314071655273, 20.309310913085938, 31.38869285583496, 31.324600219726562, 16.625762939453125, -0.6356277465820312, 61.363563537597656, 2.493682861328125, 19.743980407714844, 60.21482849121094, 29.60092544555664, -2.9006996154785156, 32.66571807861328, 11.888946533203125, 20.175636291503906, -8.54433822631836, 47.31483459472656, -10.489892959594727, 47.69297790527344, 26.6973876953125, 3.4197158813476562, 39.861297607421875, 15.370113372802734, 25.759986877441406, 58.076019287109375, 21.460372924804688, 46.0296630859375, 21.565841674804688, 55.177268981933594, 5.190559387207031, 72.2459945678711, 24.16928482055664, 25.000335693359375, 39.660560607910156, 32.216976165771484], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000521.npy"}
{"epoch": 0.7650513950073421, "step": 522, "batch_size": 64, "mean": 19.691848754882812, "std": 20.64307403564453, "min": -16.33332061767578, "p10": -2.565446281433105, "median": 15.47938346862793, "p90": 45.991291046142585, "max": 95.59477233886719, "pos_frac": 0.84375, "sample": [32.542091369628906, 6.9702301025390625, 18.58186912536621, 6.56707763671875, 29.734214782714844, 6.099945068359375, 12.850677490234375, 14.3968505859375, 7.9424591064453125, 36.448143005371094, 5.334541320800781, 44.310264587402344, 57.04499816894531, -12.411521911621094, 95.59477233886719, 0.5673675537109375, 25.052997589111328, 10.106636047363281, 4.301933288574219, 8.9066162109375, 54.05742645263672, 43.14347839355469, 24.520580291748047, 9.669898986816406, 9.269920349121094, 43.38801574707031, 40.976409912109375, 32.594322204589844, 23.765769958496094, 25.868629455566406, 28.20490264892578, 8.38818359375, 12.914276123046875, 17.419883728027344, 20.244918823242188, 10.240005493164062, 25.268875122070312, -5.518959045410156, 51.07598876953125, 13.85845947265625, -3.727266311645508, 22.827224731445312, -1.736236572265625, 12.501655578613281, 16.56191635131836, 39.08003234863281, -2.7459716796875, -11.903129577636719, -0.04499053955078125, 46.71173095703125, 14.168865203857422, 17.37216567993164, 48.47572326660156, 7.326805114746094, -16.33332061767578, 27.200637817382812, -8.68704605102539, 5.93292236328125, 20.706466674804688, 4.7236480712890625, 71.60742950439453, -2.1442203521728516, 24.059417724609375, 28.050674438476562], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000522.npy"}
{"epoch": 0.7665198237885462, "step": 523, "batch_size": 64, "mean": 18.144161224365234, "std": 20.15941619873047, "min": -16.2142333984375, "p10": -9.832487297058105, "median": 17.054991722106934, "p90": 44.42833404541016, "max": 73.12451171875, "pos_frac": 0.828125, "sample": [2.52264404296875, 20.65719223022461, 44.854026794433594, 40.66365051269531, 27.491172790527344, -13.767227172851562, 23.557220458984375, 12.97943115234375, 41.85882568359375, 43.25529479980469, -9.858049392700195, 43.43505096435547, 2.305002212524414, 23.15997314453125, -6.1041717529296875, 1.2445755004882812, 73.12451171875, 16.75299072265625, 6.907890319824219, 25.137008666992188, 26.568283081054688, 25.01556396484375, -8.297782897949219, 7.118974685668945, 25.06977081298828, 53.07508850097656, 15.194580078125, -13.5225830078125, -16.2142333984375, 25.778823852539062, 13.156550407409668, 6.775007247924805, 6.596183776855469, -14.198787689208984, 25.49915313720703, 20.73193359375, 28.61458969116211, -9.882570266723633, 67.85745239257812, 6.085674285888672, 25.326496124267578, 16.982267379760742, 60.92144775390625, -0.23934173583984375, 18.04621696472168, 8.155158996582031, 46.22157287597656, 5.973598480224609, 25.282302856445312, 4.797336578369141, 15.833389282226562, 19.016803741455078, -9.772842407226562, 4.253387451171875, 31.808555603027344, 17.127716064453125, 16.20296859741211, 6.447288513183594, 47.39470672607422, 29.269973754882812, 27.880584716796875, 31.653732299804688, -13.25433349609375, 14.698684692382812], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000523.npy"}
{"epoch": 0.7679882525697503, "step": 524, "batch_size": 64, "mean": 21.927453994750977, "std": 20.094463348388672, "min": -26.000381469726562, "p10": 0.4301442146301282, "median": 19.25710678100586, "p90": 47.971682739257815, "max": 90.17221069335938, "pos_frac": 0.890625, "sample": [23.315460205078125, 42.044769287109375, 17.670364379882812, 17.563541412353516, 8.666877746582031, 6.275909423828125, 43.23680114746094, 1.6655807495117188, 9.465919494628906, 11.519142150878906, 12.795082092285156, 9.847763061523438, 5.92059326171875, 42.761138916015625, 31.751983642578125, 23.642410278320312, 34.970062255859375, 56.72674560546875, 26.06817626953125, 37.92767333984375, 41.82939910888672, 29.591644287109375, 40.12754821777344, 49.43806457519531, 9.315887451171875, 32.61260986328125, 47.30121612548828, 50.48291778564453, 10.619155883789062, -16.44888687133789, 23.326190948486328, 34.176788330078125, 13.52724838256836, 3.672391891479492, 48.25902557373047, -0.0942535400390625, 2.9080276489257812, 50.68718719482422, 36.7794075012207, 18.661048889160156, 26.540603637695312, 4.8611907958984375, 1.6537389755249023, -26.000381469726562, 52.75383758544922, -3.80743408203125, 46.1744384765625, 27.24688720703125, 26.838546752929688, 23.66678237915039, -2.13916015625, 35.3105354309082, 15.545753479003906, 5.0458984375, 19.828262329101562, 1.7410659790039062, -3.49017333984375, 8.809303283691406, 3.51800537109375, 90.17221069335938, 15.753303527832031, -0.5141143798828125, 24.553443908691406, 18.685951232910156], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000524.npy"}
{"epoch": 0.7694566813509545, "step": 525, "batch_size": 64, "mean": 21.149002075195312, "std": 20.245986938476562, "min": -29.427993774414062, "p10": -5.731462860107421, "median": 19.576858520507812, "p90": 48.83493919372559, "max": 58.62904357910156, "pos_frac": 0.84375, "sample": [21.508342742919922, 23.98828125, 47.31103515625, 12.099319458007812, -7.157676696777344, 58.62904357910156, 56.97459411621094, 16.29155731201172, -0.934112548828125, -6.1736297607421875, 16.929763793945312, 2.741943359375, 1.7157783508300781, 14.142059326171875, 47.2216796875, -15.986274719238281, 15.016735076904297, 24.159225463867188, 19.45782470703125, 19.695892333984375, 25.0506591796875, 4.008609771728516, 12.057861328125, 49.4825439453125, -29.427993774414062, 31.851318359375, 10.161575317382812, 42.83989715576172, 10.746688842773438, 44.1177978515625, 4.253730773925781, 8.905006408691406, 47.99559020996094, 33.243289947509766, 44.35645294189453, 12.91015625, 27.067752838134766, 22.13763427734375, -1.4769744873046875, 16.145042419433594, 35.82981872558594, 32.147491455078125, 30.866741180419922, -5.372688293457031, 7.37249755859375, 16.76799774169922, 40.53436279296875, 9.548664093017578, 25.095443725585938, 52.42426300048828, 57.71717834472656, 55.09973907470703, -6.1473388671875, 7.7197265625, 49.19466018676758, 14.771255493164062, -5.885223388671875, 21.264934539794922, 4.2535858154296875, 23.14600372314453, 30.655288696289062, 47.3095703125, -12.018074035644531, 37.18223571777344], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000525.npy"}
{"epoch": 0.7709251101321586, "step": 526, "batch_size": 64, "mean": 24.406322479248047, "std": 22.06157112121582, "min": -24.999244689941406, "p10": -1.3002359390258773, "median": 22.54286003112793, "p90": 56.26407012939453, "max": 66.81123352050781, "pos_frac": 0.890625, "sample": [9.122482299804688, 47.62129211425781, 22.176010131835938, 2.9276771545410156, 29.536239624023438, -7.6246490478515625, -2.000455856323242, 0.8480033874511719, 18.92669677734375, 6.5740509033203125, 42.75764083862305, 44.64629364013672, 27.21192169189453, 4.173274993896484, 18.725486755371094, 41.572776794433594, 24.61306381225586, 54.95172119140625, 39.51170349121094, 13.147529602050781, 0.6621589660644531, 0.3763885498046875, 22.909709930419922, 36.7412109375, 60.613616943359375, 30.039506912231445, 56.81156921386719, 56.14446258544922, 29.900909423828125, 13.520944595336914, 9.1690673828125, 58.869140625, 8.479972839355469, 56.315330505371094, -8.982452392578125, -9.749542236328125, 59.626853942871094, 30.174774169921875, 10.958770751953125, -2.493030548095703, 20.7049560546875, 48.162200927734375, 42.63038635253906, 33.75935363769531, 17.79395294189453, 7.631782531738281, 35.86833953857422, 42.28562545776367, 9.493965148925781, 10.254547119140625, 38.176002502441406, 66.81123352050781, 47.33796691894531, 64.51729583740234, 18.965965270996094, 30.56903076171875, 55.695343017578125, 4.3756561279296875, -4.06011962890625, -24.999244689941406, 0.33361053466796875, 28.937843322753906, 2.60650634765625, 4.644317626953125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000526.npy"}
{"epoch": 0.7723935389133627, "step": 527, "batch_size": 64, "mean": 23.436687469482422, "std": 21.078371047973633, "min": -19.748886108398438, "p10": 2.14193687438965, "median": 18.897663116455078, "p90": 54.06777420043946, "max": 86.55247497558594, "pos_frac": 0.90625, "sample": [5.287223815917969, 23.179351806640625, 11.172782897949219, 50.414276123046875, 20.603790283203125, 26.347640991210938, 30.48891830444336, -6.689149856567383, 35.66859436035156, 31.45977783203125, 57.89728546142578, -8.853050231933594, 56.56861877441406, 3.4790916442871094, 4.349021911621094, -0.48035430908203125, 30.695274353027344, 9.333545684814453, 69.36306762695312, 49.06475830078125, 11.466506958007812, 4.386680603027344, 7.460784912109375, 7.559486389160156, 14.675872802734375, 51.41387176513672, 14.233600616455078, 53.38236999511719, 13.216365814208984, 35.92797088623047, 18.50146484375, 24.15985107421875, 55.05101013183594, 19.293861389160156, 50.036102294921875, 14.37628173828125, 54.36151885986328, 10.862377166748047, 25.248733520507812, -2.5573883056640625, 52.72705078125, 11.025314331054688, 86.55247497558594, 29.20684814453125, -12.260330200195312, 12.976455688476562, 35.275535583496094, 26.99224853515625, 6.5228271484375, 16.01983642578125, 26.747730255126953, 12.943794250488281, 9.139556884765625, 34.06385803222656, 28.492507934570312, 4.568572998046875, 54.705291748046875, 16.406749725341797, 14.940174102783203, 24.246788024902344, 7.257474899291992, -19.748886108398438, 1.5688705444335938, 37.169456481933594], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000527.npy"}
{"epoch": 0.7738619676945668, "step": 528, "batch_size": 64, "mean": 21.40606117248535, "std": 19.900896072387695, "min": -21.09221649169922, "p10": -1.8414505004882784, "median": 18.40350341796875, "p90": 51.57149543762208, "max": 65.22200012207031, "pos_frac": 0.890625, "sample": [8.197513580322266, 42.88982391357422, 4.5920562744140625, -3.994476318359375, 27.589279174804688, 18.22430419921875, -21.09221649169922, 48.695674896240234, 42.24127197265625, 31.669940948486328, 11.725814819335938, 5.2903289794921875, 33.290283203125, 16.562931060791016, 8.5697021484375, 53.63011932373047, 20.525707244873047, 9.708221435546875, 53.05267333984375, 22.1907958984375, -10.147356033325195, 4.995609283447266, 22.30718994140625, 19.8306884765625, 2.1174545288085938, 49.141632080078125, 3.6408443450927734, 12.843055725097656, 14.495903015136719, 40.99589157104492, -3.0353050231933594, 12.604900360107422, 37.2108154296875, 3.849233627319336, 27.69476318359375, 14.07562255859375, 26.43834686279297, 36.61688232421875, 52.61286544799805, -8.388435363769531, 20.420825958251953, 25.297264099121094, 11.131725311279297, 41.74565124511719, 65.22200012207031, 20.445816040039062, 12.1630859375, 58.46551513671875, 18.58270263671875, 46.79456329345703, 7.875755310058594, 61.755882263183594, -4.9434356689453125, 39.122215270996094, 13.401144027709961, 60.29264831542969, 7.217039108276367, 7.342369079589844, -11.429443359375, 16.32281494140625, 20.91655731201172, 0.9442100524902344, 8.488334655761719, 28.952301025390625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000528.npy"}
{"epoch": 0.775330396475771, "step": 529, "batch_size": 64, "mean": 20.461353302001953, "std": 20.756223678588867, "min": -17.307632446289062, "p10": -3.1042190551757805, "median": 17.14482879638672, "p90": 47.37877578735352, "max": 87.47350311279297, "pos_frac": 0.84375, "sample": [6.7298431396484375, 22.396873474121094, 36.93067932128906, 19.774139404296875, 20.034835815429688, 6.0973052978515625, -0.916412353515625, 28.055824279785156, 24.60753631591797, 87.47350311279297, 14.116920471191406, 7.751808166503906, 22.51001739501953, 40.80484390258789, 16.740562438964844, -9.558517456054688, 2.624441146850586, 15.138412475585938, 13.820960998535156, -9.713661193847656, -2.1417884826660156, 20.451522827148438, 36.33427429199219, 47.55400085449219, 40.01826477050781, 0.28958892822265625, 41.70347595214844, 10.435516357421875, 46.96991729736328, 2.91412353515625, 1.867919921875, 9.744392395019531, 22.27303695678711, 54.98705291748047, 11.576614379882812, -14.629244804382324, 40.902313232421875, 6.248165130615234, 40.6165771484375, 22.007598876953125, -3.5166893005371094, 58.768455505371094, 17.549095153808594, -17.307632446289062, 15.607505798339844, 13.364496231079102, 24.49615478515625, 37.58470153808594, 48.27942657470703, 22.09088897705078, 35.222511291503906, 30.72454071044922, -0.6340923309326172, 8.503814697265625, 52.4700927734375, 15.010848999023438, 13.62042236328125, 25.692825317382812, 28.309654235839844, 13.54849624633789, 5.352264404296875, -7.055824279785156, -6.26026725769043, 72.561767578125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000529.npy"}
{"epoch": 0.7767988252569751, "step": 530, "batch_size": 64, "mean": 21.012853622436523, "std": 16.924938201904297, "min": -13.267478942871094, "p10": 3.0227371215820313, "median": 19.585933685302734, "p90": 39.91585083007813, "max": 75.05136108398438, "pos_frac": 0.953125, "sample": [20.823654174804688, 33.48093795776367, 31.356048583984375, 1.7904891967773438, 17.06151580810547, 7.454042434692383, 43.694732666015625, 8.185928344726562, 19.799896240234375, 19.462677001953125, -3.192096710205078, 8.829612731933594, 40.3671875, 13.31414794921875, 30.645477294921875, 14.77364730834961, 3.0954437255859375, 38.511077880859375, 33.963531494140625, 18.121322631835938, 2.7858200073242188, 25.41986846923828, 18.7559814453125, 13.422027587890625, 15.0257568359375, 5.985008239746094, 25.16832733154297, 17.825416564941406, 41.10682678222656, 23.475234985351562, 1.0479145050048828, 33.197181701660156, 26.043987274169922, 6.856868743896484, 5.6199493408203125, 7.234062194824219, 2.9915771484375, 20.145038604736328, 16.895729064941406, -8.329925537109375, 26.298667907714844, 75.05136108398438, 12.098739624023438, 69.98370361328125, 20.44818115234375, 40.75239562988281, 38.86273193359375, -13.267478942871094, 71.53254699707031, 17.954753875732422, 27.548282623291016, 11.272294998168945, 22.195960998535156, 27.698165893554688, 33.121131896972656, 29.042198181152344, 6.658294677734375, 24.26690673828125, 31.9969482421875, 14.852859497070312, 4.049476623535156, 19.709190368652344, 25.294593811035156, 5.188869476318359], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000530.npy"}
{"epoch": 0.7782672540381792, "step": 531, "batch_size": 64, "mean": 22.43130874633789, "std": 17.54400634765625, "min": -25.441192626953125, "p10": -0.8115711212158199, "median": 23.28386878967285, "p90": 42.136104583740234, "max": 58.370811462402344, "pos_frac": 0.875, "sample": [2.2293853759765625, -0.9789466857910156, 40.12705993652344, 41.30290222167969, 14.75726318359375, -2.50128173828125, 11.952081680297852, 41.688331604003906, 35.81952667236328, 49.3155517578125, 11.276203155517578, 58.370811462402344, -3.8572921752929688, 38.049869537353516, 21.565963745117188, 5.3518524169921875, 34.48486328125, 32.32373809814453, 13.184219360351562, 23.036075592041016, 51.273040771484375, 34.04547882080078, 29.22509765625, 25.088119506835938, 28.72523307800293, 53.10882568359375, 17.419612884521484, 12.600860595703125, 13.55072021484375, -10.876167297363281, 41.831398010253906, 38.17359161376953, 42.266693115234375, 26.4814453125, -0.42102813720703125, 37.61992645263672, 15.113449096679688, 32.35514831542969, 0.15242385864257812, 32.83948516845703, 25.088638305664062, 11.196949005126953, 30.853683471679688, 14.216495513916016, 46.00028991699219, -2.4692306518554688, 34.462120056152344, 25.124191284179688, 25.16644287109375, 32.89373779296875, 14.776046752929688, 17.941619873046875, 14.613800048828125, 16.562915802001953, 13.12759780883789, 5.9340057373046875, 23.531661987304688, 50.966346740722656, 8.509895324707031, -25.441192626953125, -13.566604614257812, 39.94861602783203, 20.957870483398438, 17.136409759521484], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000531.npy"}
{"epoch": 0.7797356828193832, "step": 532, "batch_size": 64, "mean": 25.75627899169922, "std": 17.88080406188965, "min": -8.550016403198242, "p10": 1.8639060974121093, "median": 25.082155227661133, "p90": 52.00669250488282, "max": 58.043701171875, "pos_frac": 0.921875, "sample": [52.656558990478516, 16.565479278564453, 44.23857879638672, 24.160320281982422, 38.65046691894531, 36.279052734375, 32.732452392578125, 47.784912109375, 16.134719848632812, 9.806514739990234, -0.0568389892578125, 55.07342529296875, 50.49033737182617, 25.353317260742188, 24.393596649169922, 26.597442626953125, 58.043701171875, 20.991256713867188, 34.33457946777344, 7.816558837890625, 22.831832885742188, 32.43074035644531, 38.45366668701172, 53.659027099609375, 53.14432144165039, 0.52935791015625, 3.3354339599609375, 8.055610656738281, 34.888458251953125, 1.8495025634765625, 46.45353698730469, 46.03254699707031, 2.4328460693359375, 23.896072387695312, 17.068191528320312, -8.550016403198242, 19.879730224609375, 56.5718994140625, 44.60359191894531, 5.6645050048828125, 31.64463996887207, 31.8726806640625, 13.589324951171875, 26.133872985839844, 27.514007568359375, 11.017196655273438, 24.810993194580078, 31.76556396484375, 1.8975143432617188, 12.422515869140625, 15.584098815917969, -3.744964599609375, -7.0455780029296875, 55.790740966796875, 37.6571044921875, 33.06401062011719, 15.597917556762695, 16.310997009277344, 47.32623291015625, 26.917953491210938, 17.16814422607422, 38.75984191894531, -6.498748779296875, 23.568561553955078], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000532.npy"}
{"epoch": 0.7812041116005873, "step": 533, "batch_size": 64, "mean": 16.019718170166016, "std": 18.80318260192871, "min": -14.364456176757812, "p10": -7.21403284072876, "median": 13.925289154052734, "p90": 39.423883056640626, "max": 71.96781921386719, "pos_frac": 0.796875, "sample": [3.2901077270507812, 11.244895935058594, 19.000202178955078, 22.948440551757812, 11.049312591552734, 10.613327026367188, -6.209869384765625, 61.31207275390625, -14.143661499023438, 25.02873992919922, 40.63219451904297, 22.42554473876953, 15.624626159667969, -2.3331069946289062, 0.893035888671875, 14.836044311523438, 66.71786499023438, -0.490570068359375, -7.104296684265137, -7.2610626220703125, 42.635345458984375, 33.47816467285156, 7.6323089599609375, 39.47001647949219, 71.96781921386719, 37.87177658081055, 39.31623840332031, -11.349987030029297, 7.506254196166992, 40.89982604980469, 0.21832275390625, 21.773513793945312, 34.080909729003906, -10.008224487304688, 12.342575073242188, 28.293643951416016, 31.137100219726562, 16.789663314819336, 13.080986022949219, 10.278488159179688, 0.9321098327636719, -0.36730003356933594, 24.195301055908203, 31.295791625976562, 12.26275634765625, 5.2764434814453125, 9.701454162597656, -14.364456176757812, 22.327484130859375, 28.560943603515625, 7.852134704589844, -9.202981948852539, 21.11099624633789, 14.707328796386719, 20.544235229492188, -13.483139038085938, -4.939167022705078, 16.06281280517578, 13.14324951171875, 25.656028747558594, 24.099063873291016, 5.0762786865234375, 26.484947204589844, 2.840984344482422], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000533.npy"}
{"epoch": 0.7826725403817915, "step": 534, "batch_size": 64, "mean": 22.300796508789062, "std": 19.282651901245117, "min": -17.639144897460938, "p10": -2.687781524658203, "median": 21.913612365722656, "p90": 50.1673583984375, "max": 65.65333557128906, "pos_frac": 0.828125, "sample": [41.60580062866211, 51.12488555908203, 27.01519775390625, 22.821243286132812, 10.668888092041016, 13.796318054199219, -2.713958740234375, 12.884143829345703, 10.200630187988281, 51.267417907714844, -1.0975303649902344, 44.003936767578125, 20.974082946777344, 19.571380615234375, 22.96540069580078, 49.60328674316406, 30.928600311279297, -7.0831298828125, -2.4355850219726562, 56.59040832519531, 16.66741180419922, -5.382965087890625, 33.877227783203125, 2.7918624877929688, 28.68390655517578, 22.567138671875, 33.23834228515625, 13.238609313964844, -6.435222625732422, 50.40910339355469, 8.209806442260742, 52.5662841796875, -1.698617935180664, 11.830047607421875, 24.994625091552734, -17.639144897460938, 0.7169647216796875, -5.278373718261719, 61.6427001953125, 26.583702087402344, 40.23956298828125, 9.762527465820312, 18.623672485351562, 35.207340240478516, 21.260086059570312, 65.65333557128906, 40.15797424316406, 11.152732849121094, 45.40148162841797, 33.17417907714844, 40.08140182495117, 12.966476440429688, 28.308425903320312, 23.40755844116211, 13.110702514648438, 45.72831726074219, 40.31822204589844, -4.760429382324219, 17.1395263671875, 6.8940887451171875, 25.161460876464844, 11.007904052734375, -2.6267013549804688, 25.60631561279297], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000534.npy"}
{"epoch": 0.7841409691629956, "step": 535, "batch_size": 64, "mean": 23.264667510986328, "std": 22.173709869384766, "min": -19.54138946533203, "p10": -3.9895782470703125, "median": 23.6336669921875, "p90": 48.242834472656256, "max": 88.99432373046875, "pos_frac": 0.875, "sample": [3.220684051513672, 48.477264404296875, 16.415794372558594, 3.772174835205078, 36.68431091308594, 23.355056762695312, 25.668163299560547, 58.71434020996094, 49.0640869140625, 0.6666641235351562, -13.140411376953125, 47.48766326904297, 37.04624557495117, 39.07929229736328, 10.919464111328125, 15.700754165649414, 0.9511642456054688, 47.695831298828125, 31.481781005859375, 30.322784423828125, 73.32573699951172, 11.21478271484375, 10.1939697265625, 43.50617980957031, 47.19513702392578, 46.831932067871094, 17.384387969970703, 40.387996673583984, 19.266555786132812, 31.493972778320312, 30.26403045654297, 19.445663452148438, 38.33050537109375, 45.23420715332031, 20.61859893798828, 2.3846893310546875, 23.912277221679688, 37.44807434082031, -3.8350753784179688, 3.0973434448242188, 5.7560882568359375, 88.99432373046875, 26.510875701904297, 21.324602127075195, -4.055793762207031, 25.472347259521484, 10.717391967773438, 25.468734741210938, -16.26788330078125, -5.9665069580078125, 3.6681671142578125, -8.242225646972656, 17.298423767089844, 6.657085418701172, 26.43798828125, -13.483863830566406, 49.30887222290039, -19.54138946533203, 67.55671691894531, 24.385478973388672, 14.221273422241211, 36.37709045410156, 32.7896728515625, 2.2672042846679688], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000535.npy"}
{"epoch": 0.7856093979441997, "step": 536, "batch_size": 64, "mean": 22.927248001098633, "std": 20.251827239990234, "min": -21.30982208251953, "p10": 2.0342458724975594, "median": 22.215003967285156, "p90": 56.31836471557618, "max": 71.28942108154297, "pos_frac": 0.921875, "sample": [1.687307357788086, 6.720527648925781, 46.99027633666992, 31.94206428527832, 24.143043518066406, 22.674602508544922, 36.67649841308594, 7.0316162109375, 7.8493499755859375, 27.38946533203125, 25.541961669921875, 31.381671905517578, 15.84014892578125, 61.479644775390625, 59.83282470703125, 22.97800064086914, 22.469558715820312, 5.446949005126953, 35.5223388671875, 57.081504821777344, 17.985763549804688, 11.127845764160156, 12.091156005859375, 29.059368133544922, 16.637569427490234, 34.22866439819336, 4.709524154663086, 54.53770446777344, 61.324317932128906, -7.4367218017578125, 44.95606994628906, 40.41658020019531, 21.96044921875, 0.3079681396484375, -9.891735076904297, 17.708236694335938, 26.1654052734375, 71.28813171386719, -6.548831939697266, 6.462257385253906, 26.86968994140625, -21.30982208251953, 27.051239013671875, -2.688262939453125, 29.687973022460938, 13.679435729980469, 19.03289794921875, 26.691598892211914, 3.4292984008789062, 12.13873291015625, 3.3378982543945312, 24.640506744384766, 42.18712615966797, 12.141578674316406, 3.1387176513671875, 59.20494079589844, 9.590953826904297, 4.861141204833984, 2.843769073486328, 29.37146759033203, 71.28942108154297, 37.44532012939453, 14.074630737304688, 20.86456298828125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000536.npy"}
{"epoch": 0.7870778267254038, "step": 537, "batch_size": 64, "mean": 27.462430953979492, "std": 20.61726951599121, "min": -17.730201721191406, "p10": 5.075705718994141, "median": 24.0436954498291, "p90": 53.56758193969727, "max": 76.43751525878906, "pos_frac": 0.96875, "sample": [-12.230487823486328, 22.439804077148438, 0.9748783111572266, 8.756135940551758, 7.7821044921875, 31.298988342285156, 10.844307899475098, 48.12005615234375, 11.287628173828125, 53.099891662597656, 19.522750854492188, 12.354034423828125, 1.2504119873046875, 13.031787872314453, 25.582443237304688, 47.96240234375, 29.903076171875, 4.870643615722656, 62.15147399902344, -17.730201721191406, 1.2006340026855469, 28.667190551757812, 7.5085296630859375, 46.65554428100586, 5.5541839599609375, 15.590705871582031, 33.62518310546875, 7.761711120605469, 15.250526428222656, 29.169967651367188, 49.02555847167969, 50.915924072265625, 72.37020874023438, 53.76802062988281, 56.873130798339844, 59.749000549316406, 2.26153564453125, 52.14801025390625, 38.820655822753906, 56.9534912109375, 12.394821166992188, 76.43751525878906, 46.34776306152344, 39.66950988769531, 36.05699157714844, 46.12811279296875, 10.947669982910156, 20.203845977783203, 11.659141540527344, 47.580413818359375, 17.52423858642578, 16.06884765625, 9.320072174072266, 21.409072875976562, 36.21892547607422, 24.642658233642578, 49.443824768066406, 13.818723678588867, 34.33207702636719, 13.545269012451172, 27.502456665039062, 23.444732666015625, 13.351339340209961, 44.40565490722656], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000537.npy"}
{"epoch": 0.788546255506608, "step": 538, "batch_size": 64, "mean": 19.475126266479492, "std": 16.842510223388672, "min": -4.831787109375, "p10": -0.14925689697265607, "median": 18.874841690063477, "p90": 41.829810333251956, "max": 65.07286071777344, "pos_frac": 0.890625, "sample": [40.4508056640625, 20.6451416015625, 27.809478759765625, 56.37141418457031, 32.61124801635742, 14.399810791015625, 19.619693756103516, 21.695724487304688, 22.31060028076172, 3.9283370971679688, 22.49075698852539, 23.576644897460938, 9.420562744140625, 9.698341369628906, 20.27245330810547, -1.323211669921875, 18.129989624023438, 6.4100494384765625, 20.657546997070312, 23.304122924804688, 50.08123779296875, 0.6482772827148438, 8.002326965332031, 22.770462036132812, 42.13726043701172, 39.08795928955078, 65.07286071777344, 26.60620880126953, 19.700225830078125, 6.26629638671875, 15.25103759765625, 52.9483642578125, 4.99066162109375, -0.2285003662109375, 30.253517150878906, 0.26815223693847656, 5.784202575683594, 45.059268951416016, 9.988758087158203, 6.8890838623046875, 24.17412567138672, 24.088218688964844, 9.757169723510742, 39.558570861816406, 4.2789306640625, 41.1124267578125, 2.823932647705078, -4.831787109375, -3.003305435180664, 0.03564453125, 17.20595932006836, -2.8954010009765625, 39.28418731689453, 19.914203643798828, -0.5415573120117188, 15.446662902832031, 10.858345031738281, 25.548675537109375, 62.77850341796875, 14.264846801757812, -4.3559722900390625, 23.482200622558594, 14.099113464355469, 9.267147064208984], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000538.npy"}
{"epoch": 0.7900146842878121, "step": 539, "batch_size": 64, "mean": 23.1872501373291, "std": 20.832191467285156, "min": -16.835521697998047, "p10": -5.172132110595702, "median": 24.74461555480957, "p90": 48.28630180358887, "max": 68.80503845214844, "pos_frac": 0.84375, "sample": [-5.8197174072265625, 3.87359619140625, -9.939552307128906, 19.401016235351562, 9.831016540527344, 32.12415313720703, 12.573440551757812, 41.924774169921875, 24.283828735351562, 36.83027648925781, -4.1003265380859375, 2.5502853393554688, 25.050338745117188, 33.03143310546875, 22.09014129638672, 33.11187744140625, 31.970008850097656, 54.34307861328125, 3.582855224609375, 31.806106567382812, 47.181793212890625, -15.041893005371094, 24.766921997070312, 55.918975830078125, 24.722309112548828, -16.835521697998047, 40.859832763671875, 38.57813262939453, 21.062164306640625, 32.636802673339844, 30.881847381591797, 19.282936096191406, 30.91900634765625, 63.253204345703125, 9.679740905761719, 29.621795654296875, 13.150224685668945, 59.89949035644531, 11.42325210571289, 18.317855834960938, -5.631477355957031, 45.99261474609375, 3.234466552734375, 3.8798828125, 37.91468811035156, 9.527910232543945, 36.29067611694336, -8.6981201171875, -3.9511795043945312, 12.037185668945312, 43.592979431152344, 10.45947265625, 51.95215606689453, 10.948966979980469, 44.396942138671875, -12.749786376953125, 42.21650695800781, 31.729995727539062, 32.33003234863281, 68.80503845214844, -2.0966720581054688, 44.073944091796875, 48.75966262817383, 0.17058563232421875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000539.npy"}
{"epoch": 0.7914831130690162, "step": 540, "batch_size": 64, "mean": 23.78058624267578, "std": 23.339229583740234, "min": -26.9959716796875, "p10": -2.1267639160156246, "median": 22.261198043823242, "p90": 48.101724624633796, "max": 92.04776000976562, "pos_frac": 0.875, "sample": [15.594451904296875, 29.666091918945312, 6.1583251953125, 12.869308471679688, 37.89912414550781, 3.392974853515625, 27.13634490966797, -26.9959716796875, 6.491580963134766, 16.011734008789062, -7.840742111206055, 44.14601135253906, 18.0941162109375, 46.09194564819336, 92.04776000976562, 12.739631652832031, 25.431846618652344, 36.313446044921875, 32.04313659667969, 43.09808349609375, 29.910430908203125, 5.887237548828125, 19.341588973999023, 28.504196166992188, 41.74657440185547, -2.6663284301757812, 21.705196380615234, 43.51899719238281, 29.8095703125, 13.08718490600586, 2.3482589721679688, 48.96305847167969, 34.41783905029297, 18.961952209472656, 41.05571746826172, 56.38722229003906, -21.791839599609375, 20.62255859375, 80.52599334716797, -10.214141845703125, 43.42298126220703, 72.74459838867188, 25.869033813476562, 26.746337890625, -23.84876251220703, 15.514053344726562, 17.625396728515625, -2.2440567016601562, 35.376373291015625, 14.185211181640625, -1.8530807495117188, 22.81719970703125, 25.755592346191406, 18.77770233154297, 5.9964141845703125, 2.0072708129882812, 24.88262939453125, 6.308101654052734, 61.790748596191406, 6.168895721435547, 33.10480499267578, 73.25108337402344, 34.07603454589844, 10.972412109375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000540.npy"}
{"epoch": 0.7929515418502202, "step": 541, "batch_size": 64, "mean": 21.030881881713867, "std": 19.04636001586914, "min": -18.45245361328125, "p10": 5.601624679565431, "median": 16.796497344970703, "p90": 50.02502136230469, "max": 80.43803405761719, "pos_frac": 0.9375, "sample": [34.32820129394531, 22.375335693359375, 31.961669921875, 65.83006286621094, 3.3815765380859375, 12.469467163085938, 23.908409118652344, 11.409000396728516, 21.23699951171875, 50.385986328125, 7.30242919921875, 10.480186462402344, 15.143508911132812, 24.914588928222656, 33.677825927734375, 5.2813262939453125, 8.064544677734375, 80.43803405761719, 16.574100494384766, 15.654457092285156, 53.41566467285156, 59.61106872558594, 35.59510803222656, 13.447616577148438, 16.74169921875, 18.335189819335938, -10.233329772949219, 19.569194793701172, 8.018329620361328, 51.4000244140625, 18.574951171875, -12.22430419921875, 11.470565795898438, 41.755584716796875, -18.45245361328125, 10.261920928955078, 20.778854370117188, 8.967960357666016, 41.821815490722656, 32.525482177734375, 22.756370544433594, 6.52570915222168, 12.09759521484375, 19.1177978515625, 7.254364013671875, 38.745826721191406, 9.160083770751953, 40.788753509521484, 7.17071533203125, 10.057174682617188, 6.348987579345703, 16.863853454589844, 9.283760070800781, 6.781883239746094, 57.028282165527344, 8.043804168701172, -9.974525451660156, 1.5574722290039062, 16.851295471191406, 25.463058471679688, 49.182769775390625, 13.810867309570312, 18.70146942138672, 36.160362243652344], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000541.npy"}
{"epoch": 0.7944199706314243, "step": 542, "batch_size": 64, "mean": 22.009353637695312, "std": 19.11579132080078, "min": -13.845123291015625, "p10": -0.825923156738281, "median": 21.498470306396484, "p90": 46.73464012145996, "max": 81.40957641601562, "pos_frac": 0.859375, "sample": [22.226085662841797, 49.469764709472656, 59.36479187011719, -0.5219268798828125, -7.324302673339844, 32.977752685546875, 7.809228897094727, 48.88861083984375, 43.585121154785156, 30.745555877685547, 47.06690216064453, 35.835731506347656, 35.85612487792969, 6.308135986328125, 45.9593620300293, 26.636199951171875, 22.100936889648438, 8.255294799804688, 26.872291564941406, 1.7952117919921875, 25.10736083984375, -0.1043548583984375, 15.662307739257812, 17.352460861206055, -13.845123291015625, 9.655136108398438, 4.353057861328125, 32.93305969238281, -0.956207275390625, 28.275253295898438, -3.6007156372070312, 36.78040313720703, -8.778732299804688, 7.984580993652344, 30.859230041503906, 13.304298400878906, 40.56764221191406, 13.765911102294922, -9.141960144042969, 10.837028503417969, 27.403968811035156, 34.446266174316406, 25.332592010498047, -2.748027801513672, 29.5496826171875, 37.53144836425781, 5.39216423034668, 20.592620849609375, 61.151084899902344, 5.437953948974609, 17.84854507446289, 31.403121948242188, 48.76350402832031, 20.89600372314453, 33.589599609375, 81.40957641601562, 20.65424919128418, 2.5948715209960938, 33.1044921875, 14.302131652832031, 16.47417640686035, 5.6309051513671875, 38.718414306640625, 4.2017059326171875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000542.npy"}
{"epoch": 0.7958883994126285, "step": 543, "batch_size": 64, "mean": 16.486434936523438, "std": 19.627962112426758, "min": -18.579856872558594, "p10": -10.393653869628906, "median": 13.194425582885742, "p90": 41.78560028076172, "max": 65.48747253417969, "pos_frac": 0.796875, "sample": [13.540428161621094, 4.0792999267578125, -4.0390167236328125, -10.567031860351562, 22.179428100585938, 52.69099426269531, -2.0066795349121094, -2.3741455078125, 14.598426818847656, 24.65949249267578, -13.47418212890625, -13.362762451171875, -18.579856872558594, 6.820564270019531, 65.48747253417969, 40.016719818115234, 32.4608154296875, 50.159584045410156, 27.39688491821289, 19.3856201171875, 28.867454528808594, 25.92121124267578, -15.008621215820312, 17.83486557006836, -17.173397064208984, 37.329498291015625, 7.3907318115234375, 51.50230407714844, 19.395736694335938, 17.986190795898438, 20.5291748046875, 17.632034301757812, -11.892349243164062, 35.260841369628906, 42.29192352294922, -3.2075157165527344, 50.48931884765625, 53.261749267578125, 31.99026870727539, 10.237236022949219, -3.8882827758789062, 28.100082397460938, -9.989105224609375, 10.844970703125, 10.047111511230469, 7.0046234130859375, 39.25139617919922, 36.13818359375, 2.0457420349121094, 10.522903442382812, 11.720630645751953, 12.84842300415039, 40.60417938232422, 9.9842529296875, 23.45886993408203, 37.148468017578125, 9.573638916015625, 3.48919677734375, 3.281707763671875, 8.328086853027344, 5.474922180175781, 7.052070617675781, 4.710979461669922, 17.668155670166016], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000543.npy"}
{"epoch": 0.7973568281938326, "step": 544, "batch_size": 64, "mean": 20.237133026123047, "std": 19.857149124145508, "min": -17.381027221679688, "p10": -2.7503776550292964, "median": 17.975910186767578, "p90": 41.73516616821289, "max": 112.50146484375, "pos_frac": 0.8125, "sample": [9.733108520507812, -5.025421142578125, -1.9365386962890625, 11.85906982421875, 112.50146484375, 28.328445434570312, 18.834396362304688, 41.562339782714844, 21.159671783447266, 10.881830215454102, 9.5130615234375, 13.892841339111328, 17.11742401123047, 29.088409423828125, -0.7740859985351562, 22.145416259765625, 30.06220817565918, 43.56756591796875, 30.473373413085938, 10.219463348388672, 2.9360885620117188, 10.585647583007812, -2.3187522888183594, -9.784725189208984, 24.8775634765625, 28.42888641357422, 37.09107971191406, 53.83201599121094, -3.076658248901367, 35.866573333740234, 47.44792175292969, 14.64483642578125, 29.43994140625, 25.961746215820312, 14.351520538330078, 22.163833618164062, 41.809234619140625, -0.6974105834960938, 8.756492614746094, -3.7078399658203125, 15.904251098632812, 12.001012802124023, 32.825374603271484, 16.59674072265625, 16.785354614257812, 23.807647705078125, 30.37126922607422, 29.572860717773438, 20.402992248535156, 24.765052795410156, 5.099082946777344, 31.28985595703125, 7.5448760986328125, 47.90106201171875, 38.17854309082031, 9.527687072753906, 12.07968521118164, -17.381027221679688, 52.873538970947266, -0.5789031982421875, 26.850196838378906, 35.12754821777344, -2.9353599548339844, -5.244882583618164], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000544.npy"}
{"epoch": 0.7988252569750367, "step": 545, "batch_size": 64, "mean": 22.10393714904785, "std": 21.566360473632812, "min": -24.78929901123047, "p10": -3.3178489685058588, "median": 21.10238742828369, "p90": 51.90746002197266, "max": 78.49403381347656, "pos_frac": 0.875, "sample": [9.328680038452148, -7.729957580566406, -4.765045166015625, -24.78929901123047, 1.3127593994140625, 32.34534454345703, 20.867172241210938, 4.320501327514648, 31.289875030517578, 15.961410522460938, 21.41778564453125, 78.49403381347656, -3.585784912109375, 16.577133178710938, 20.7376708984375, 22.878952026367188, 6.0062408447265625, 48.332176208496094, 32.208274841308594, -2.6926651000976562, 28.8001708984375, 30.544612884521484, 2.9066238403320312, -14.286689758300781, 31.459312438964844, 15.38446044921875, 0.3603515625, 7.9927520751953125, 57.65211486816406, 7.2148895263671875, 21.896026611328125, 29.725311279296875, 6.883726119995117, 26.04217529296875, 52.11468505859375, 42.961795806884766, 32.961509704589844, 20.41608428955078, 17.123672485351562, 51.42393493652344, 6.848480224609375, 55.832237243652344, 46.02943420410156, 1.3160133361816406, 0.0146484375, 21.337602615356445, 45.7855224609375, 59.815093994140625, 37.081878662109375, 8.822101593017578, 1.0619888305664062, 9.610626220703125, 18.816791534423828, 25.576417922973633, 50.81597900390625, 57.20121765136719, 34.744873046875, 45.188568115234375, 22.87273406982422, -12.531669616699219, 39.67411804199219, -8.931938171386719, 4.045707702636719, 55.530765533447266], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000545.npy"}
{"epoch": 0.8002936857562408, "step": 546, "batch_size": 64, "mean": 20.23662567138672, "std": 19.779340744018555, "min": -35.75421142578125, "p10": -0.6919952392578113, "median": 19.52997398376465, "p90": 45.62078132629395, "max": 75.80499267578125, "pos_frac": 0.890625, "sample": [6.6608123779296875, 5.956493377685547, 1.5643997192382812, 16.899980545043945, 39.75170135498047, 27.63249969482422, 17.199722290039062, 23.016361236572266, 51.415767669677734, 19.637405395507812, 31.42061996459961, 9.762638092041016, 35.001953125, 8.88228988647461, 31.689010620117188, 22.021591186523438, 28.409339904785156, 15.471553802490234, 56.46028137207031, 6.657806396484375, 3.3060264587402344, 45.8917350769043, -35.75421142578125, 35.6655387878418, 19.53656005859375, 55.359825134277344, 10.361160278320312, 7.071502685546875, 39.18479537963867, 10.103385925292969, 0.4962615966796875, 11.447418212890625, 8.890247344970703, 26.69681167602539, 11.365997314453125, 5.118583679199219, 33.092620849609375, 29.803497314453125, 31.448410034179688, 75.80499267578125, -32.48516845703125, 5.1165771484375, 21.1884765625, -8.204345703125, -5.085906982421875, 33.03516387939453, -2.8794422149658203, 44.988555908203125, 34.39314270019531, 28.025527954101562, 1.6732559204101562, 39.57783508300781, 22.01895523071289, -1.2012481689453125, 16.279869079589844, 47.07823944091797, 21.009628295898438, 18.222972869873047, -2.548480987548828, 16.031021118164062, 8.977775573730469, 41.88392639160156, 19.523387908935547, 48.12083435058594], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000546.npy"}
{"epoch": 0.801762114537445, "step": 547, "batch_size": 64, "mean": 23.299720764160156, "std": 19.437885284423828, "min": -8.405746459960938, "p10": -0.7049644470214823, "median": 21.89984130859375, "p90": 50.56024017333986, "max": 71.77671813964844, "pos_frac": 0.890625, "sample": [1.3263397216796875, -6.4310760498046875, 26.576324462890625, 71.77671813964844, -8.405746459960938, -4.5201416015625, 25.146942138671875, 6.0150604248046875, 25.887420654296875, 66.65925598144531, 13.882568359375, 34.277496337890625, 31.49414825439453, 69.69795989990234, 4.2252044677734375, -6.723213195800781, 14.596885681152344, 17.44988250732422, 14.978981018066406, 21.90831756591797, 29.615753173828125, 24.448524475097656, 14.11297607421875, 47.27143859863281, 28.699039459228516, 14.019119262695312, 43.90174102783203, 71.17036437988281, 29.52715301513672, 21.89136505126953, 11.337593078613281, 9.9000244140625, -6.100139617919922, 57.11651611328125, 47.021759033203125, 21.84039306640625, -1.5755233764648438, 27.774154663085938, 24.710451126098633, 57.08953857421875, 9.410934448242188, 15.973480224609375, 24.372669219970703, 51.9697265625, 38.41127014160156, 14.780754089355469, 17.118236541748047, 22.380462646484375, 3.455625534057617, 18.778343200683594, 27.881900787353516, 26.999832153320312, 38.59336853027344, 12.969573974609375, -5.410320281982422, 21.3563232421875, 24.272735595703125, 38.293304443359375, 35.466773986816406, 8.70086669921875, 16.419601440429688, 25.973297119140625, 4.534952163696289, 4.886749267578125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000547.npy"}
{"epoch": 0.8032305433186491, "step": 548, "batch_size": 64, "mean": 17.43431282043457, "std": 21.907316207885742, "min": -26.042221069335938, "p10": -7.078022384643553, "median": 15.409883499145508, "p90": 46.542112731933614, "max": 96.64216613769531, "pos_frac": 0.796875, "sample": [49.116455078125, 12.208213806152344, 14.214794158935547, 14.713272094726562, 17.889755249023438, 0.09650802612304688, 10.793807983398438, 53.88433837890625, 19.209842681884766, 21.712814331054688, 12.958122253417969, -5.239021301269531, 25.203121185302734, 10.287445068359375, 26.93451690673828, -22.20943832397461, 22.806488037109375, 18.763927459716797, 35.34820556640625, 16.336315155029297, 33.380401611328125, 12.797138214111328, 13.267929077148438, 21.479652404785156, 49.91664123535156, -26.042221069335938, 14.55202865600586, 7.166595458984375, 11.772232055664062, 9.910442352294922, 25.064529418945312, 96.64216613769531, 23.78394317626953, 41.75154113769531, 12.047767639160156, -3.2558517456054688, 17.977874755859375, 9.157890319824219, 34.831825256347656, 49.21363830566406, 34.660797119140625, -12.760478973388672, -0.8120841979980469, 19.45555877685547, 9.755699157714844, -22.11749267578125, -4.874168395996094, 26.39019775390625, 1.10888671875, -7.711353302001953, 0.42105865478515625, -2.85693359375, -14.598310470581055, 31.588165283203125, -19.177444458007812, 16.106494903564453, -5.600250244140625, 67.47361755371094, 12.262863159179688, 36.744285583496094, 19.577224731445312, 48.59521484375, 30.953948974609375, 40.764827728271484], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000548.npy"}
{"epoch": 0.8046989720998532, "step": 549, "batch_size": 64, "mean": 20.67557144165039, "std": 19.791828155517578, "min": -16.418540954589844, "p10": 0.5267005920410158, "median": 16.30229377746582, "p90": 46.594069290161144, "max": 89.68548583984375, "pos_frac": 0.921875, "sample": [17.981231689453125, 78.64088439941406, 29.30413055419922, 0.44290924072265625, 43.38790512084961, 35.669639587402344, 20.431114196777344, 47.9681396484375, 17.9091796875, 53.20234680175781, 12.321332931518555, 0.9205551147460938, 34.85877990722656, 9.42486572265625, 23.551376342773438, 11.687721252441406, 7.4551849365234375, -16.418540954589844, 14.20989990234375, 15.539569854736328, 15.632755279541016, 22.066753387451172, 11.2901611328125, 19.009033203125, 28.953201293945312, 11.592872619628906, 11.54488754272461, 12.721794128417969, 39.07527160644531, 38.764312744140625, 34.56221008300781, -7.464693069458008, 21.774124145507812, 1.9845695495605469, 28.704978942871094, -12.079620361328125, 14.962011337280273, 55.224853515625, 17.076377868652344, 2.8249359130859375, 4.1998443603515625, 16.501819610595703, 18.052391052246094, 30.269859313964844, 52.95258331298828, 39.292449951171875, -2.4292678833007812, 8.986988067626953, 16.102767944335938, 89.68548583984375, 17.50714874267578, 0.00421142578125, 5.207347869873047, 25.785202026367188, 38.08837890625, 11.257587432861328, -0.3535919189453125, 9.504880905151367, 0.7222137451171875, 53.58268737792969, 8.611778259277344, 34.79899597167969, 3.0184288024902344, 15.177406311035156], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000549.npy"}
{"epoch": 0.8061674008810573, "step": 550, "batch_size": 64, "mean": 21.115978240966797, "std": 15.970026016235352, "min": -18.339820861816406, "p10": 2.4791549682617204, "median": 21.53896713256836, "p90": 42.79632644653321, "max": 54.4459228515625, "pos_frac": 0.921875, "sample": [5.481008529663086, 4.745079040527344, 9.687015533447266, 43.94275665283203, 23.911376953125, 4.487068176269531, 41.19451141357422, 21.37987518310547, -0.08725547790527344, 23.258682250976562, 35.61029052734375, 23.59967041015625, 54.4459228515625, 18.850040435791016, 24.519424438476562, 27.98590087890625, 27.73194122314453, -18.339820861816406, 19.765899658203125, 37.03418731689453, 1.7068252563476562, 4.281257629394531, 21.005264282226562, 15.232749938964844, 16.665321350097656, 38.00103759765625, 14.875404357910156, 31.963611602783203, 16.72545623779297, 4.838144302368164, -3.1238555908203125, 22.335731506347656, 29.985008239746094, 6.958744049072266, 13.09912109375, 53.367698669433594, 9.100830078125, 43.482818603515625, 40.59518051147461, 12.749565124511719, 25.445404052734375, 39.84660339355469, 22.346080780029297, 5.87432861328125, 16.717071533203125, 24.810302734375, 50.766136169433594, 39.219696044921875, 29.40680694580078, 25.3548583984375, 21.516990661621094, 51.04961395263672, -9.139446258544922, -6.510551452636719, 4.949199676513672, 5.6507720947265625, 23.823165893554688, 26.532821655273438, 0.8834228515625, 21.560943603515625, 20.109909057617188, 24.776702880859375, 48.620418548583984, 14.761795043945312], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000550.npy"}
{"epoch": 0.8076358296622613, "step": 551, "batch_size": 64, "mean": 23.696578979492188, "std": 18.492572784423828, "min": -6.804191589355469, "p10": 3.004766845703126, "median": 20.074783325195312, "p90": 47.77998847961427, "max": 76.87582397460938, "pos_frac": 0.9375, "sample": [9.551000595092773, 15.225700378417969, 26.09661865234375, 2.1854782104492188, 21.10040283203125, 48.47072982788086, 0.48651123046875, 43.921546936035156, 8.375221252441406, 32.855770111083984, 16.329029083251953, 9.49844741821289, 40.01922607421875, 45.23442077636719, 20.447532653808594, 34.90476989746094, 13.706413269042969, 13.38677978515625, 18.997879028320312, 50.18726348876953, 35.16871643066406, 15.958797454833984, 75.9190673828125, 46.16825866699219, 14.052116394042969, 9.726593017578125, 76.87582397460938, 26.28925323486328, -0.20323657989501953, 26.664520263671875, 10.147422790527344, 17.5283203125, 67.09320068359375, -1.8074798583984375, 18.33984375, 16.788515090942383, 28.804595947265625, 9.66156005859375, 22.81970977783203, 6.049396514892578, 10.858200073242188, 2.588897705078125, 22.017074584960938, -1.74334716796875, 19.70203399658203, 31.060409545898438, 9.108612060546875, 26.56207275390625, 38.94252014160156, 4.371616363525391, 21.356788635253906, 39.04071044921875, 30.186363220214844, 8.534149169921875, 26.720247268676758, 48.68391418457031, -6.804191589355469, 24.087120056152344, 52.904937744140625, 44.33024597167969, 8.45676040649414, 45.368316650390625, 3.975128173828125, 13.246709823608398], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000551.npy"}
{"epoch": 0.8091042584434655, "step": 552, "batch_size": 64, "mean": 27.311275482177734, "std": 24.296459197998047, "min": -13.237163543701172, "p10": -1.2675630569458, "median": 27.048076629638672, "p90": 59.89355087280274, "max": 104.26856231689453, "pos_frac": 0.875, "sample": [3.642711639404297, 16.781614303588867, 30.297454833984375, 52.99159240722656, 57.69367980957031, 26.73052215576172, 58.61151885986328, 26.732521057128906, 16.61651611328125, 11.219642639160156, 38.55879211425781, 7.1942901611328125, -0.5560340881347656, 41.43402099609375, 27.363632202148438, 43.48106384277344, 43.86841583251953, 34.205589294433594, 104.26856231689453, -13.237163543701172, -10.120376586914062, 65.84979248046875, 54.40947723388672, 24.851917266845703, -11.032501220703125, 11.823065757751465, 23.108863830566406, 34.459617614746094, 12.717109680175781, 4.749855041503906, 53.04255676269531, -8.584747314453125, 16.624191284179688, 30.083663940429688, 37.913063049316406, 32.15320587158203, 36.672481536865234, 32.88819122314453, 79.68445587158203, 18.80059051513672, 61.56255340576172, 13.402519226074219, 5.06396484375, 33.88128662109375, 9.964805603027344, 60.4429931640625, 20.797210693359375, 46.98785400390625, 65.19085693359375, 36.161643981933594, 28.116012573242188, -5.046117782592773, -9.52276611328125, 9.236869812011719, 8.359077453613281, 3.6169052124023438, 28.26532745361328, 0.963409423828125, -1.5725040435791016, 7.646148681640625, 36.412132263183594, 31.104721069335938, 73.93106079101562, 14.962203979492188], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000552.npy"}
{"epoch": 0.8105726872246696, "step": 553, "batch_size": 64, "mean": 20.89364242553711, "std": 18.836469650268555, "min": -14.985000610351562, "p10": -2.142867279052734, "median": 17.723527908325195, "p90": 50.02621307373047, "max": 66.23007202148438, "pos_frac": 0.875, "sample": [66.23007202148438, 52.20521545410156, 12.58953857421875, 7.759986877441406, 42.35514831542969, -2.6248817443847656, 4.450897216796875, 2.0583114624023438, 17.288440704345703, 9.243362426757812, 31.600997924804688, 10.263458251953125, 13.786705017089844, 17.355194091796875, 3.0288772583007812, 25.416574478149414, -1.7449569702148438, 0.6331787109375, 38.88789367675781, -8.420207977294922, 0.7625732421875, 35.52852249145508, 29.213829040527344, 9.681087493896484, 4.566535949707031, 16.51312255859375, 62.441253662109375, 21.6317138671875, -6.6991424560546875, 24.061870574951172, 24.983123779296875, 18.165077209472656, 26.797866821289062, -4.82373046875, 37.54936218261719, -2.3134002685546875, 14.866241455078125, 50.207611083984375, 17.502506256103516, -3.8600234985351562, 16.3343505859375, 46.41271209716797, 3.8092880249023438, 25.469955444335938, 23.891685485839844, 12.586296081542969, 22.920501708984375, -14.985000610351562, 40.39647674560547, 4.4815826416015625, 51.128570556640625, 26.489212036132812, 49.60295104980469, 23.175796508789062, 33.947601318359375, 44.52843475341797, 55.6660041809082, 52.53782653808594, 12.315521240234375, 18.695663452148438, 13.659076690673828, 32.36449432373047, 4.679847717285156, 17.944549560546875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000553.npy"}
{"epoch": 0.8120411160058737, "step": 554, "batch_size": 64, "mean": 22.810638427734375, "std": 21.65435218811035, "min": -14.127296447753906, "p10": -3.184088325500488, "median": 19.062345504760742, "p90": 55.157698440551776, "max": 83.47534942626953, "pos_frac": 0.859375, "sample": [24.097339630126953, 5.743194580078125, 41.453956604003906, 22.013717651367188, 18.670001983642578, 41.86041259765625, 15.394073486328125, 23.47747802734375, 24.035003662109375, 29.742050170898438, 24.236724853515625, 7.8642120361328125, 38.356163024902344, 11.769371032714844, 4.821710586547852, 3.0195465087890625, 29.360580444335938, -9.394477844238281, -6.106056213378906, 24.034767150878906, 12.432716369628906, 0.5901660919189453, 12.751148223876953, 19.454689025878906, 21.080352783203125, 18.333480834960938, 17.736404418945312, 16.69688606262207, 7.0325164794921875, 18.067140579223633, 26.66204833984375, 40.782989501953125, 41.4515380859375, 17.480697631835938, 65.28034210205078, 29.156166076660156, 34.40252685546875, 68.23844909667969, 33.221527099609375, -3.0536861419677734, -7.0705108642578125, 30.03070831298828, 57.28194046020508, 1.5457496643066406, 24.30709457397461, 12.807777404785156, 57.712677001953125, -1.9068603515625, 64.3980712890625, 3.8853607177734375, -5.324771881103516, 38.03147888183594, 16.721343994140625, -14.127296447753906, 15.112930297851562, 50.201133728027344, 12.600326538085938, 12.513954162597656, -8.23065185546875, 42.45506286621094, 80.23977661132812, 83.47534942626953, 24.212360382080078, -3.2399749755859375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000554.npy"}
{"epoch": 0.8135095447870778, "step": 555, "batch_size": 64, "mean": 21.779020309448242, "std": 21.01947784423828, "min": -23.969383239746094, "p10": -3.4374982833862298, "median": 20.397319793701172, "p90": 49.93844718933107, "max": 87.2574462890625, "pos_frac": 0.875, "sample": [45.62017822265625, 7.0880126953125, 51.70961380004883, 33.217132568359375, 17.149093627929688, 52.245216369628906, 45.80572509765625, 7.755584716796875, -3.747194290161133, 13.105003356933594, 35.36646270751953, 25.907852172851562, 14.620697021484375, 42.81543731689453, 17.541488647460938, -10.559684753417969, 9.454826354980469, 25.330795288085938, 43.663360595703125, 54.85004425048828, 21.113388061523438, 29.390785217285156, 14.353172302246094, 6.504302978515625, 25.437332153320312, -18.07269287109375, 19.681251525878906, 8.244834899902344, 59.0570068359375, -23.969383239746094, 28.496673583984375, 44.23077392578125, 19.554365158081055, 24.023178100585938, -4.880584716796875, 7.750701904296875, 17.87708282470703, -4.543968200683594, 7.949756622314453, -10.931861877441406, 0.5539093017578125, 8.68991470336914, 8.444290161132812, 29.15740966796875, 23.548995971679688, 4.587018966674805, 71.81147766113281, 2.009889602661133, 7.5098724365234375, 19.595993041992188, 27.03375244140625, 42.96478271484375, 57.318458557128906, 5.251167297363281, 27.178226470947266, 13.111719131469727, 87.2574462890625, 21.866836547851562, 28.080398559570312, 28.5914306640625, 25.7957763671875, 24.477447509765625, -2.714874267578125, 31.530197143554688], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000555.npy"}
{"epoch": 0.8149779735682819, "step": 556, "batch_size": 64, "mean": 23.61233901977539, "std": 20.821821212768555, "min": -23.89031219482422, "p10": 1.4580223083496093, "median": 20.907896041870117, "p90": 51.33708877563477, "max": 83.815185546875, "pos_frac": 0.90625, "sample": [28.10955047607422, 1.439910888671875, 10.824256896972656, 13.230941772460938, 37.49477767944336, 21.17083740234375, 32.461158752441406, 46.638763427734375, -4.758610725402832, 51.442710876464844, 1.5002822875976562, 16.802764892578125, 47.93979263305664, 8.783302307128906, 23.097869873046875, 33.65509796142578, 46.9442138671875, 60.05029296875, 19.10238265991211, 13.839157104492188, -23.89031219482422, 51.09063720703125, 14.530143737792969, 13.711776733398438, -8.753517150878906, 52.09123611450195, -4.428306579589844, 14.58837890625, 51.96043395996094, -16.365585327148438, 5.001575469970703, 11.026123046875, 16.072410583496094, 30.471710205078125, 45.569244384765625, 7.543872833251953, 33.20250701904297, 10.073348999023438, 21.07098388671875, -5.52619743347168, 12.928375244140625, 27.813095092773438, 23.87979507446289, 49.47404479980469, 21.66193389892578, 8.495819091796875, 37.11299133300781, 24.42829132080078, 83.815185546875, 6.964408874511719, 12.761739730834961, 20.744808197021484, 38.17481994628906, 64.81907653808594, 51.84736633300781, 4.7603759765625, 18.23162841796875, 44.461387634277344, 9.192703247070312, 37.01177215576172, 48.153358459472656, 6.987377166748047, 6.1751708984375, 22.484268188476562], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000556.npy"}
{"epoch": 0.8164464023494861, "step": 557, "batch_size": 64, "mean": 22.41769027709961, "std": 22.239931106567383, "min": -11.883682250976562, "p10": -4.62861976623535, "median": 19.730159759521484, "p90": 50.82703170776367, "max": 71.61456298828125, "pos_frac": 0.78125, "sample": [-10.280532836914062, 65.1689453125, 47.219383239746094, 3.1859092712402344, 6.479915618896484, 14.871978759765625, -11.820999145507812, -1.8017578125, 35.71772766113281, 6.2899627685546875, 40.459320068359375, 29.343032836914062, 56.20310974121094, 17.161026000976562, 9.841495513916016, 33.920570373535156, 5.515289306640625, 51.652557373046875, 20.126365661621094, 17.280746459960938, 8.044158935546875, 53.95137023925781, 39.684478759765625, 30.094924926757812, 19.333953857421875, 45.277915954589844, -3.554290771484375, 36.041404724121094, -2.9693527221679688, 47.22709655761719, 45.291358947753906, 10.480697631835938, 50.834266662597656, 5.2777862548828125, 27.42398452758789, 1.6587638854980469, 37.93217468261719, 13.102272033691406, 42.04241943359375, 3.4545364379882812, -2.458271026611328, -6.639261245727539, 15.1463623046875, 20.345947265625, -5.713657379150391, -0.6197738647460938, 71.50144958496094, 41.25209045410156, -10.611648559570312, 39.56199264526367, 29.931808471679688, 44.90193176269531, 71.61456298828125, 20.732376098632812, 36.895904541015625, 13.768798828125, -11.883682250976562, 50.810150146484375, -3.3221893310546875, 12.006031036376953, -2.7547149658203125, -5.089046478271484, 40.36835479736328, 27.822647094726562], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000557.npy"}
{"epoch": 0.8179148311306902, "step": 558, "batch_size": 64, "mean": 25.042438507080078, "std": 21.73162078857422, "min": -13.435508728027344, "p10": -1.8486732482910146, "median": 22.202054977416992, "p90": 53.84360046386719, "max": 108.93128204345703, "pos_frac": 0.875, "sample": [27.955352783203125, 21.864192962646484, 31.838088989257812, 16.596343994140625, 31.6644287109375, 10.035619735717773, 23.102920532226562, 32.78913497924805, 7.426349639892578, -13.435508728027344, 15.9969482421875, 54.01776123046875, 13.466079711914062, 28.829986572265625, 7.768135070800781, -3.4904556274414062, 16.028152465820312, 24.393558502197266, 22.5399169921875, 51.50048828125, 45.91297912597656, 17.455337524414062, 34.2945556640625, 15.760139465332031, 7.070476531982422, 29.797042846679688, 33.32124328613281, -9.754524230957031, 15.47772216796875, -0.9961090087890625, 18.24416732788086, 7.071342468261719, 18.46947479248047, 51.15971374511719, 12.299240112304688, 54.197601318359375, 4.4139556884765625, 26.424209594726562, -3.8492279052734375, 36.96039962768555, -6.8288726806640625, 6.122507095336914, 24.210487365722656, 63.795806884765625, 3.2674713134765625, -2.4943389892578125, 9.28741455078125, 108.93128204345703, 19.440078735351562, 21.590559005737305, 50.13897705078125, 44.8240966796875, 60.15518569946289, 54.58137512207031, 39.44245147705078, 37.573760986328125, -2.2140579223632812, 12.791759490966797, 59.49702453613281, 7.171142578125, 30.795257568359375, 53.437225341796875, 37.68024444580078, 34.90196990966797], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000558.npy"}
{"epoch": 0.8193832599118943, "step": 559, "batch_size": 64, "mean": 24.246387481689453, "std": 23.02910804748535, "min": -29.451107025146484, "p10": -2.5492374420166013, "median": 21.75261688232422, "p90": 51.768192291259766, "max": 105.16633605957031, "pos_frac": 0.859375, "sample": [16.192962646484375, 24.773101806640625, 51.57328796386719, 27.727088928222656, 12.931648254394531, 22.334136962890625, 4.599687576293945, 28.777801513671875, 18.4857177734375, 62.00682067871094, 18.185043334960938, 21.171096801757812, 11.0037841796875, 51.66893768310547, 57.139892578125, 32.465301513671875, 41.09526062011719, 69.29393005371094, 22.536514282226562, 12.000152587890625, 105.16633605957031, 8.197027206420898, -8.148691177368164, 15.843414306640625, 5.055259704589844, 18.717803955078125, -29.451107025146484, -2.58026123046875, 15.172332763671875, 26.586963653564453, 28.161453247070312, -2.476848602294922, -3.3586273193359375, 15.494718551635742, 23.25546646118164, 46.875762939453125, 3.060871124267578, -14.94603157043457, 26.903228759765625, 6.478282928466797, 58.946533203125, 18.243385314941406, 36.09382629394531, 18.215667724609375, 46.313873291015625, 18.108285903930664, -10.338817596435547, 28.153377532958984, 49.72276306152344, -0.7509956359863281, 23.88359832763672, 36.9505615234375, 17.8919677734375, 14.281013488769531, 34.640960693359375, 51.81072998046875, -12.723098754882812, 67.65740966796875, 33.32469940185547, 8.634452819824219, 46.035072326660156, 7.7954559326171875, 34.570579528808594, 34.33799743652344], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000559.npy"}
{"epoch": 0.8208516886930984, "step": 560, "batch_size": 64, "mean": 21.220582962036133, "std": 20.374897003173828, "min": -28.471115112304688, "p10": -5.902568817138671, "median": 22.26529884338379, "p90": 48.81653747558594, "max": 63.905670166015625, "pos_frac": 0.8125, "sample": [55.18955993652344, 23.570575714111328, 50.491878509521484, 31.40392303466797, 10.95578384399414, 3.8746414184570312, -9.551040649414062, -28.471115112304688, 19.479644775390625, 31.349533081054688, 36.630340576171875, 27.791702270507812, 60.387691497802734, 1.520111083984375, 39.52906799316406, -5.244823455810547, 47.38432312011719, 22.33953094482422, 22.64324951171875, 27.15020751953125, 14.519516944885254, 4.218658447265625, 38.84620666503906, 38.789344787597656, 14.0572509765625, 28.862411499023438, -1.4812698364257812, 43.898651123046875, 7.1396331787109375, -10.204071044921875, 30.550880432128906, -11.284561157226562, 6.712047576904297, 53.150779724121094, 41.95252990722656, 17.696365356445312, 14.483226776123047, 48.465843200683594, 49.962310791015625, 23.33856201171875, 37.730743408203125, 9.603485107421875, 22.15306282043457, -6.184459686279297, 48.30451202392578, 33.59416198730469, 23.278636932373047, 63.905670166015625, 11.629398345947266, 48.966835021972656, -6.3939971923828125, 17.572364807128906, 22.19106674194336, -3.4343719482421875, 7.391319274902344, 10.153411865234375, -2.11627197265625, 28.708534240722656, 13.04949951171875, 31.155067443847656, -4.799201965332031, -9.214401245117188, 15.030586242675781, 23.742584228515625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000560.npy"}
{"epoch": 0.8223201174743024, "step": 561, "batch_size": 64, "mean": 19.7167911529541, "std": 19.549592971801758, "min": -11.443458557128906, "p10": -3.4428115844726555, "median": 15.276954650878906, "p90": 50.27169647216797, "max": 75.93057250976562, "pos_frac": 0.828125, "sample": [51.73786163330078, -6.428901672363281, -2.5235061645507812, 4.171791076660156, -11.443458557128906, 12.79022216796875, 27.091339111328125, 54.13652801513672, 50.46330261230469, 42.22187805175781, -0.6798686981201172, 10.967185974121094, 36.98252868652344, 39.49968719482422, 29.361602783203125, 32.732933044433594, 6.6067962646484375, 15.100570678710938, 39.24269104003906, 12.05323600769043, 49.824615478515625, -3.8367996215820312, 42.91383361816406, 15.953510284423828, 4.922454833984375, -10.149429321289062, 75.93057250976562, 10.779460906982422, 23.97207260131836, 33.18609619140625, 25.69593048095703, 62.08465576171875, 12.109588623046875, 26.27881622314453, 56.33427429199219, 22.183456420898438, -0.2305431365966797, 11.198654174804688, 11.456220626831055, 31.877052307128906, 3.1496238708496094, -5.371837615966797, 21.607643127441406, 8.266220092773438, -5.62530517578125, 5.244266510009766, 15.453338623046875, 50.585906982421875, 18.899198532104492, 26.505401611328125, 9.876724243164062, 42.033233642578125, -1.03497314453125, 10.540794372558594, 15.828361511230469, 1.6571311950683594, 7.212955474853516, 11.612503051757812, 19.056957244873047, 12.943275451660156, 7.9629364013671875, -9.638397216796875, 28.107635498046875, 20.432159423828125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000561.npy"}
{"epoch": 0.8237885462555066, "step": 562, "batch_size": 64, "mean": 20.69796371459961, "std": 22.165830612182617, "min": -16.70299530029297, "p10": -7.531266021728513, "median": 19.460731506347656, "p90": 45.29259262084961, "max": 80.3668212890625, "pos_frac": 0.8125, "sample": [13.012069702148438, 49.42090606689453, -4.213405609130859, 18.38085174560547, 19.65966796875, 24.58177947998047, 22.18242645263672, -3.454315185546875, 35.722328186035156, 25.798477172851562, 14.110252380371094, 31.418006896972656, 3.081202507019043, 44.765953063964844, 62.42548370361328, -11.576034545898438, 31.680084228515625, 41.46586608886719, 5.947666168212891, 32.66172790527344, 11.603889465332031, 24.310989379882812, 27.929183959960938, 4.604511260986328, 5.102691650390625, 37.37055206298828, 62.280242919921875, 34.593170166015625, 36.542236328125, 45.51829528808594, 1.7576828002929688, 5.152366638183594, 8.397159576416016, 32.906524658203125, -11.47088623046875, 80.3668212890625, 27.089950561523438, 17.12371826171875, -12.93963623046875, 56.227142333984375, -5.009002685546875, 37.46125793457031, 32.092376708984375, 30.894386291503906, 4.515987396240234, 24.836669921875, 34.043914794921875, 19.261795043945312, 16.251991271972656, 43.438751220703125, -12.715709686279297, 41.622467041015625, 36.27522277832031, -8.612236022949219, 9.06296157836914, 4.7615509033203125, -16.70299530029297, 11.42109489440918, 2.0624237060546875, -2.4771595001220703, 6.397891044616699, -4.184906005859375, 78.3839111328125, -9.950504302978516], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000562.npy"}
{"epoch": 0.8252569750367107, "step": 563, "batch_size": 64, "mean": 17.477962493896484, "std": 19.534791946411133, "min": -41.480926513671875, "p10": -3.369768524169921, "median": 18.232078552246094, "p90": 35.08688430786133, "max": 89.97393798828125, "pos_frac": 0.859375, "sample": [5.118190765380859, -41.480926513671875, 14.777862548828125, 3.4331207275390625, 30.32225799560547, 22.534317016601562, 31.35057830810547, -12.08905029296875, 27.87851333618164, 20.185379028320312, 34.25352478027344, 5.916769027709961, -4.5605926513671875, 25.77154541015625, 17.243362426757812, 27.148944854736328, 7.5027923583984375, 15.971343994140625, 12.065765380859375, 28.872756958007812, 41.92664337158203, 0.38266944885253906, -0.2906494140625, 37.37982177734375, 32.105995178222656, -7.978633880615234, 20.829280853271484, -3.817138671875, 57.842498779296875, 19.148033142089844, 5.171783447265625, 9.961109161376953, 17.83483123779297, 29.801082611083984, 16.702260971069336, 30.099445343017578, 20.006484985351562, -18.587448120117188, 25.307037353515625, 20.736099243164062, 29.21813201904297, 24.952911376953125, 89.97393798828125, 16.03771209716797, 17.090545654296875, 0.8009796142578125, 13.615684509277344, 70.60031127929688, 21.637916564941406, 18.62932586669922, 5.403541564941406, 35.44403839111328, -2.3259048461914062, 12.499893188476562, 12.594345092773438, -14.0263671875, 21.349197387695312, 0.20122718811035156, 22.329315185546875, 7.5303754806518555, 36.05949401855469, 25.1748104095459, 19.597373962402344, 7.423185348510742], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000563.npy"}
{"epoch": 0.8267254038179148, "step": 564, "batch_size": 64, "mean": 17.98065757751465, "std": 16.40151023864746, "min": -23.075592041015625, "p10": -1.8582195281982419, "median": 16.364112854003906, "p90": 38.586968994140626, "max": 60.721160888671875, "pos_frac": 0.875, "sample": [10.46405029296875, 24.606979370117188, 28.104324340820312, 12.597084045410156, 5.595550537109375, -1.6470184326171875, 5.529170989990234, 0.10828018188476562, 21.108291625976562, 20.931304931640625, 39.88673400878906, 25.42803955078125, 4.385261535644531, 5.238525390625, 43.124542236328125, -4.4078826904296875, 6.028861999511719, 23.257049560546875, -7.946205139160156, -1.9487342834472656, 30.89179229736328, 27.714950561523438, 13.934528350830078, -6.910125732421875, -23.075592041015625, 4.599922180175781, 26.260848999023438, 38.771392822265625, 28.29095458984375, 30.80328369140625, 40.46674346923828, 36.13097381591797, 17.96780014038086, 7.7114715576171875, 10.754817962646484, 38.156646728515625, 16.079261779785156, 50.957420349121094, 1.0645065307617188, -5.90545654296875, 14.52960205078125, 11.46551513671875, 29.476402282714844, 9.609077453613281, 3.82061767578125, 22.298614501953125, 21.221515655517578, 12.656097412109375, 16.648963928222656, 5.999053955078125, 6.5087890625, 10.66571044921875, 25.91571044921875, -2.1423568725585938, 32.4246940612793, 57.08641052246094, 31.110000610351562, 25.345436096191406, 24.29076385498047, 28.78875732421875, 12.241924285888672, 60.721160888671875, 6.967254638671875, 38.002037048339844], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000564.npy"}
{"epoch": 0.8281938325991189, "step": 565, "batch_size": 64, "mean": 24.278656005859375, "std": 19.54952049255371, "min": -18.748748779296875, "p10": -0.4260070800781245, "median": 23.347309112548828, "p90": 48.699858856201175, "max": 65.00848388671875, "pos_frac": 0.890625, "sample": [4.6181640625, 55.63829040527344, 65.00848388671875, 11.113021850585938, 17.46917724609375, 35.78382110595703, 38.11735534667969, -2.1188430786132812, 7.161685943603516, 34.6650390625, 29.96990203857422, 39.712493896484375, 20.105472564697266, 45.767845153808594, 5.781879425048828, 1.2726478576660156, 3.2141647338867188, 11.922439575195312, 49.78617858886719, 37.62516784667969, 46.87178039550781, 45.916542053222656, -2.5654983520507812, 46.04449462890625, 8.272323608398438, 2.8237762451171875, 39.26301193237305, 0.11412811279296875, -0.6574935913085938, 49.11225891113281, 30.428565979003906, 35.926597595214844, 19.872413635253906, 22.20665740966797, -18.748748779296875, -9.837661743164062, 26.47784423828125, 36.70423126220703, 24.31963348388672, 15.289974212646484, 17.524124145507812, 59.818626403808594, 18.440963745117188, 22.374984741210938, 3.999664306640625, 32.133056640625, 10.739407539367676, 47.77980041503906, 41.095603942871094, 7.152191162109375, 35.135467529296875, 10.556427001953125, 49.09416961669922, 31.162817001342773, 57.995513916015625, 47.25773620605469, 41.11372375488281, -2.123746871948242, -3.5980300903320312, 6.491832733154297, 25.054718017578125, 6.935146331787109, 19.25128173828125, 37.999366760253906], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000565.npy"}
{"epoch": 0.8296622613803231, "step": 566, "batch_size": 64, "mean": 24.469667434692383, "std": 21.917417526245117, "min": -30.009273529052734, "p10": 0.6973548889160175, "median": 21.81503677368164, "p90": 53.37447319030762, "max": 76.00604248046875, "pos_frac": 0.890625, "sample": [21.312713623046875, 43.058868408203125, 29.252044677734375, 44.91056823730469, 6.1244659423828125, 37.40907287597656, 20.0302734375, 39.094268798828125, 5.104946136474609, 58.86750030517578, 35.73390197753906, 17.514678955078125, 24.10723876953125, 9.321823120117188, 30.60297393798828, 27.43438720703125, 40.585357666015625, 7.377464294433594, -0.06905364990234375, 6.5842132568359375, 11.668296813964844, 23.256851196289062, 41.56322479248047, 60.653564453125, 23.766510009765625, -6.379688262939453, 38.37846374511719, 10.339241027832031, 3.674173355102539, 14.628219604492188, 27.318496704101562, 53.613250732421875, 31.76556396484375, 12.591461181640625, 69.49853515625, 14.940258026123047, -17.679031372070312, 37.756752014160156, 9.708351135253906, 2.4856414794921875, 4.189613342285156, 30.648651123046875, 9.372489929199219, 52.817325592041016, 16.791519165039062, 76.00604248046875, 69.71189880371094, 19.491592407226562, -30.009273529052734, 5.3480377197265625, -3.437877655029297, 43.985313415527344, 30.929285049438477, 38.979591369628906, 3.927145004272461, 16.808841705322266, 21.58361053466797, 43.93548583984375, 73.38639831542969, -5.205905914306641, -1.2553653717041016, 14.966964721679688, 22.046463012695312, 43.135032653808594], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000566.npy"}
{"epoch": 0.8311306901615272, "step": 567, "batch_size": 64, "mean": 19.73036766052246, "std": 19.103593826293945, "min": -25.908477783203125, "p10": -0.9929710388183589, "median": 19.0020809173584, "p90": 44.58182067871095, "max": 71.01551818847656, "pos_frac": 0.859375, "sample": [-25.908477783203125, 22.146446228027344, 33.064247131347656, 6.413307189941406, 22.346595764160156, 45.742462158203125, 26.124053955078125, 5.111869812011719, 30.955734252929688, 16.225982666015625, 24.332672119140625, 14.6290283203125, 40.048828125, 20.532257080078125, 66.539306640625, 28.26513671875, 41.8736572265625, 24.52271270751953, 27.05657196044922, 12.941551208496094, 8.59979248046875, 54.148719787597656, -1.7776050567626953, 52.47328186035156, 18.480571746826172, 15.630630493164062, 14.925712585449219, 23.14642333984375, 21.10235595703125, 20.099597930908203, 27.45195770263672, 41.7510986328125, 35.20006561279297, 9.893264770507812, 23.618362426757812, -1.7236061096191406, 18.471572875976562, 41.464698791503906, 71.01551818847656, 9.0748291015625, 5.3042144775390625, -0.1343536376953125, -1.1979141235351562, 3.4150466918945312, 18.462432861328125, 51.57169723510742, 2.1805877685546875, -5.710777282714844, 23.03607940673828, 3.9067115783691406, -21.253822326660156, 29.584434509277344, 3.6929359436035156, 19.523590087890625, 9.858505249023438, -0.5147705078125, -6.040016174316406, 16.62164306640625, 2.186626434326172, 36.638946533203125, 54.46319580078125, 6.796611785888672, 2.3981380462646484, 21.942642211914062], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000567.npy"}
{"epoch": 0.8325991189427313, "step": 568, "batch_size": 64, "mean": 22.558488845825195, "std": 19.464208602905273, "min": -10.973747253417969, "p10": -0.0019941329956042475, "median": 18.442832946777344, "p90": 52.284059143066415, "max": 83.56942749023438, "pos_frac": 0.890625, "sample": [8.223197937011719, 1.1686935424804688, 53.23866271972656, 3.3778343200683594, 10.163703918457031, 53.95320129394531, 13.565498352050781, 39.373435974121094, 39.55833435058594, -5.5028839111328125, 2.5102577209472656, -2.774139404296875, 26.34906005859375, 32.023468017578125, 11.080291748046875, 65.45733642578125, 23.378211975097656, 17.548141479492188, 4.882190704345703, 15.536983489990234, 9.664424896240234, 4.894645690917969, 50.888641357421875, 15.64297866821289, 43.77482604980469, 37.77710723876953, 10.663116455078125, 24.098175048828125, 32.033294677734375, 13.749370574951172, 22.685012817382812, 20.104618072509766, 8.886489868164062, 35.774505615234375, 58.5521240234375, 36.771419525146484, 14.192691802978516, 6.415271759033203, 15.24700927734375, 33.62389373779297, 83.56942749023438, 1.3217926025390625, 9.322029113769531, 15.853195190429688, -10.973747253417969, 19.583892822265625, 19.3375244140625, -0.6393890380859375, -0.5037174224853516, 52.88209533691406, 22.214088439941406, 60.25302505493164, 35.42321014404297, -3.9770278930664062, 42.59294891357422, 32.777557373046875, -4.967658996582031, 21.50417709350586, 30.97516632080078, 36.74205780029297, 24.31409454345703, 14.598587036132812, 17.12127685546875, 15.867610931396484], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000568.npy"}
{"epoch": 0.8340675477239354, "step": 569, "batch_size": 64, "mean": 18.095369338989258, "std": 20.40603256225586, "min": -34.44268798828125, "p10": -7.915923690795898, "median": 16.963088989257812, "p90": 44.91590690612794, "max": 80.45137023925781, "pos_frac": 0.8125, "sample": [-8.423896789550781, 17.347732543945312, 48.65425109863281, 18.783676147460938, 48.203582763671875, -34.44268798828125, 20.225051879882812, 35.01994323730469, -6.012531280517578, 45.581912994384766, 24.35289764404297, 8.9312744140625, 6.332418441772461, 13.502700805664062, 49.84308624267578, 26.578506469726562, 3.063232421875, 23.70196533203125, -0.7211380004882812, 9.759698867797852, -12.878692626953125, 34.142974853515625, 27.91647720336914, 10.0682373046875, 2.9495315551757812, 41.510589599609375, 39.37266540527344, 9.320785522460938, 4.616424560546875, -8.796188354492188, 5.585071563720703, 14.667266845703125, 46.39158630371094, 24.766860961914062, 14.751594543457031, 17.443321228027344, 19.28186798095703, 25.52647590637207, 51.03446960449219, -5.10455322265625, 37.28639221191406, 6.3102264404296875, -7.923725128173828, 43.36189270019531, 26.294158935546875, 12.419471740722656, 7.019248962402344, 37.46056365966797, -15.166801452636719, 80.45137023925781, -20.396453857421875, 25.14929962158203, 4.875909805297852, 38.962059020996094, 32.98230743408203, -7.8977203369140625, 33.87599182128906, 15.33993911743164, 24.323745727539062, -3.8108367919921875, 14.92251205444336, 13.797721862792969, 29.039520263671875, 16.578445434570312], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000569.npy"}
{"epoch": 0.8355359765051396, "step": 570, "batch_size": 64, "mean": 21.966522216796875, "std": 17.573711395263672, "min": -17.847061157226562, "p10": -0.4533672332763652, "median": 22.830028533935547, "p90": 39.86126861572266, "max": 73.105712890625, "pos_frac": 0.890625, "sample": [16.786376953125, 15.557296752929688, 12.050651550292969, 49.92601013183594, 24.33966827392578, 21.26412582397461, 34.1953239440918, -5.3841400146484375, 28.82714080810547, 57.97444152832031, 19.024105072021484, 65.85980224609375, 16.434341430664062, 39.15809631347656, -2.3469505310058594, 23.974319458007812, 5.7458648681640625, 18.227127075195312, 73.105712890625, 6.88519287109375, -1.3080635070800781, 30.92596435546875, 13.445892333984375, 18.819210052490234, 16.069454193115234, 27.283214569091797, 26.609664916992188, 55.998046875, 23.327049255371094, 5.623985290527344, 24.646705627441406, 7.679840087890625, 16.863357543945312, -10.00982666015625, 35.6966552734375, 36.27436828613281, 11.308708190917969, 36.641876220703125, 29.18061065673828, 34.7454833984375, 34.62350082397461, 1.540924072265625, 26.484481811523438, 22.3330078125, 40.162628173828125, 34.42616271972656, 29.969711303710938, 27.27985382080078, 25.031078338623047, 44.984283447265625, 15.08652114868164, 12.46453857421875, 27.18780517578125, -6.2391357421875, 3.679718017578125, 29.115516662597656, 9.86367416381836, -17.847061157226562, 11.597200393676758, -3.124950408935547, 4.089500427246094, 34.90375518798828, 26.701377868652344, 10.1165771484375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000570.npy"}
{"epoch": 0.8370044052863436, "step": 571, "batch_size": 64, "mean": 24.052448272705078, "std": 21.626728057861328, "min": -9.473201751708984, "p10": 0.4859106063842783, "median": 22.418012619018555, "p90": 48.19075660705569, "max": 90.257568359375, "pos_frac": 0.90625, "sample": [33.101844787597656, 38.57984924316406, 7.4955902099609375, 22.812767028808594, -8.297103881835938, 50.5401725769043, 16.202152252197266, 38.664676666259766, 1.9019775390625, 55.222076416015625, 21.157005310058594, 7.524009704589844, 28.37313461303711, 7.792304992675781, 1.4461803436279297, 14.989885330200195, 90.257568359375, 35.43804931640625, 28.496917724609375, 74.32130432128906, 41.380001068115234, 85.34770202636719, 37.57677459716797, 23.404251098632812, -3.190582275390625, 6.126251220703125, 27.243667602539062, 22.023258209228516, 24.019001007080078, 69.39363861083984, 0.07948684692382812, 42.70878601074219, 31.82109832763672, 29.40660858154297, 25.699440002441406, 13.408699035644531, 10.371009826660156, 31.29443359375, 28.358829498291016, 15.316596984863281, 21.43402862548828, 15.115043640136719, 38.944244384765625, 38.455665588378906, 39.272300720214844, 7.681079864501953, 56.46607971191406, 38.33428955078125, -8.687973022460938, 13.04037857055664, 8.097671508789062, -8.60455322265625, 32.120872497558594, -9.473201751708984, 5.390777587890625, 16.231327056884766, 1.4342327117919922, 10.352516174316406, 12.46124267578125, 9.18331527709961, 10.367820739746094, 35.772911071777344, -6.7132110595703125, 34.870513916015625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000571.npy"}
{"epoch": 0.8384728340675477, "step": 572, "batch_size": 64, "mean": 22.31145477294922, "std": 24.03070068359375, "min": -11.51690673828125, "p10": -3.5760564804077117, "median": 17.7092342376709, "p90": 47.40345153808594, "max": 127.86052703857422, "pos_frac": 0.859375, "sample": [12.66259765625, 6.189460754394531, -7.213417053222656, 19.502376556396484, 39.26587677001953, -11.51690673828125, 44.50392150878906, 17.53598403930664, 12.356281280517578, 12.658416748046875, 1.5751628875732422, 7.838657379150391, 41.83015441894531, -7.254425048828125, 4.8922271728515625, 24.68358612060547, 1.6152725219726562, 51.427833557128906, 17.882484436035156, 59.83198547363281, 3.7812728881835938, 31.662841796875, 4.025554656982422, 18.887535095214844, 18.348560333251953, 0.568511962890625, -7.164894104003906, 6.770454406738281, -0.6057415008544922, 31.649002075195312, 47.206939697265625, 18.63109588623047, 39.38898849487305, 29.089561462402344, -7.79791259765625, 127.86052703857422, 18.106590270996094, 14.373172760009766, 85.66294860839844, 9.922073364257812, 23.582229614257812, 18.706592559814453, -7.135473251342773, 59.01671600341797, -4.849048614501953, -0.5221023559570312, 62.33872985839844, 12.09820556640625, 14.472883224487305, 17.222190856933594, 40.08555603027344, 14.27392578125, 7.138404846191406, 42.900299072265625, 10.37972640991211, 37.76727294921875, 47.4876708984375, 35.3326416015625, 24.315811157226562, 41.478546142578125, 15.460380554199219, 40.49664306640625, 9.52532958984375, 25.725303649902344], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000572.npy"}
{"epoch": 0.8399412628487518, "step": 573, "batch_size": 64, "mean": 19.730823516845703, "std": 19.715970993041992, "min": -31.29283905029297, "p10": -1.1817008972167953, "median": 17.722061157226562, "p90": 50.67651901245118, "max": 76.72211456298828, "pos_frac": 0.890625, "sample": [37.099884033203125, 17.977312088012695, 11.274971008300781, 0.4014434814453125, 1.7334365844726562, 17.57904052734375, 19.262746810913086, -1.8601913452148438, 56.08282470703125, 76.72211456298828, 17.865081787109375, 58.622108459472656, 3.408538818359375, 51.65668487548828, 20.654354095458984, 20.773788452148438, 17.887405395507812, -17.481407165527344, 2.2427597045898438, 13.09814453125, 32.01001739501953, 13.559806823730469, 8.428970336914062, 60.54743957519531, 33.67710876464844, 54.055450439453125, 26.23155975341797, 39.144256591796875, 7.959442138671875, 23.61096954345703, -31.29283905029297, -5.156837463378906, 21.34198760986328, 59.111183166503906, 37.75482177734375, -3.933624267578125, 18.796371459960938, -2.8863658905029297, 31.313217163085938, 8.017784118652344, 37.17571258544922, 24.490371704101562, 32.3408203125, 9.227333068847656, 8.826423645019531, 2.123767852783203, 13.887710571289062, 24.155838012695312, 14.39449691772461, 4.025299072265625, 34.791412353515625, 13.512466430664062, 7.369014739990234, -8.9647216796875, 15.042327880859375, 11.515371322631836, 20.480560302734375, 20.805938720703125, 16.957611083984375, 48.38946533203125, 2.82281494140625, 21.793376922607422, 15.967475891113281, 14.349967956542969], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000573.npy"}
{"epoch": 0.8414096916299559, "step": 574, "batch_size": 64, "mean": 22.074352264404297, "std": 20.55087661743164, "min": -9.852951049804688, "p10": -0.6831687927246088, "median": 18.399166107177734, "p90": 47.95656814575196, "max": 85.60107421875, "pos_frac": 0.875, "sample": [11.859066009521484, 18.239356994628906, 8.6416015625, 12.957721710205078, 27.93047523498535, 12.828773498535156, 31.10308265686035, 52.49720764160156, 28.25579833984375, 35.74114990234375, 22.536788940429688, 58.64215087890625, 24.38622283935547, 85.60107421875, 27.52825164794922, 24.210983276367188, 14.699329376220703, 18.231796264648438, 44.456932067871094, 61.072959899902344, 6.069110870361328, 12.74456787109375, 23.054336547851562, 52.318443298339844, -6.0758514404296875, 11.847625732421875, 17.850791931152344, -0.15338897705078125, 7.552989959716797, 45.37342071533203, 11.20562744140625, -0.91021728515625, 10.440849304199219, 9.88079833984375, 18.923622131347656, -9.852951049804688, 0.9150772094726562, 45.743621826171875, 31.081626892089844, 3.8120498657226562, 28.273666381835938, 2.1162109375, 48.634307861328125, -5.38623046875, 19.73541259765625, 23.92254638671875, 5.275642395019531, 8.589950561523438, -5.224311828613281, 0.08109283447265625, 23.050186157226562, -2.4034461975097656, 12.370363235473633, 46.07085418701172, -7.20745849609375, 40.05371856689453, 14.981016159057617, 85.14299774169922, 33.37055206298828, 12.396530151367188, 31.065685272216797, 46.37517547607422, 19.672279357910156, 18.558975219726562], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000574.npy"}
{"epoch": 0.8428781204111601, "step": 575, "batch_size": 64, "mean": 24.45215606689453, "std": 22.081754684448242, "min": -26.776336669921875, "p10": -4.955781173706053, "median": 23.97592544555664, "p90": 47.865474319458016, "max": 75.8748779296875, "pos_frac": 0.875, "sample": [6.8732452392578125, 41.718746185302734, 43.9073486328125, 38.27650451660156, 23.557376861572266, 28.849273681640625, 5.88427734375, 46.479923248291016, 31.16943359375, 20.95022964477539, 11.933629989624023, 26.056594848632812, 44.067047119140625, 18.881622314453125, 8.58392333984375, 26.01654052734375, -24.056121826171875, 24.188941955566406, 39.299766540527344, 14.97186279296875, 40.499481201171875, -8.739265441894531, 41.57726287841797, 12.882850646972656, 30.520751953125, 15.529884338378906, 4.5653076171875, 17.730968475341797, 12.297218322753906, 31.081375122070312, -5.770839691162109, 55.634368896484375, 25.015071868896484, 33.93199157714844, -7.853122711181641, 13.087089538574219, 26.2879638671875, -3.0539779663085938, 23.762908935546875, 75.8748779296875, 29.584083557128906, 13.353233337402344, -7.1981964111328125, 55.003082275390625, 63.980743408203125, -26.776336669921875, 40.882293701171875, 23.574935913085938, 22.162017822265625, 41.15637969970703, 5.404022216796875, 75.21690368652344, 32.47169494628906, 22.11474609375, 75.33309936523438, 29.22600555419922, 48.45928192138672, 4.7682037353515625, 44.896568298339844, 8.132781982421875, -23.424514770507812, 21.948631286621094, 34.50261306762695, 17.693405151367188], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000575.npy"}
{"epoch": 0.8443465491923642, "step": 576, "batch_size": 64, "mean": 21.225650787353516, "std": 22.17283058166504, "min": -19.664287567138672, "p10": -3.676936340332031, "median": 20.956157684326172, "p90": 50.23323822021485, "max": 74.1160888671875, "pos_frac": 0.8125, "sample": [13.664543151855469, 35.407257080078125, -19.664287567138672, 22.655803680419922, -2.7461776733398438, 0.37640380859375, 30.42156219482422, -2.619903564453125, 3.2509422302246094, 21.176803588867188, 46.19096374511719, 47.41996765136719, 28.640060424804688, 47.1539306640625, 51.22514343261719, 47.918792724609375, 33.692352294921875, 9.05190658569336, 31.478195190429688, -13.970268249511719, 0.164337158203125, 57.28291702270508, 18.3525390625, 31.224937438964844, -12.114906311035156, 74.1160888671875, 2.0869216918945312, 42.850006103515625, 0.7346115112304688, 46.060821533203125, 1.0828933715820312, 21.304229736328125, 5.891571044921875, 5.244422912597656, 2.4512863159179688, 58.418304443359375, 65.40678405761719, -9.036693572998047, 20.735511779785156, 46.69831848144531, -3.82818603515625, 17.770790100097656, 1.3160991668701172, 28.240478515625, 35.86326599121094, 12.816810607910156, -5.522407531738281, 20.670658111572266, 19.678924560546875, 28.11199188232422, -0.29238128662109375, 47.652801513671875, 54.94940185546875, 53.24848937988281, -16.567474365234375, 28.697128295898438, -0.00786590576171875, 11.354583740234375, 26.12515640258789, 7.948699951171875, 32.32550048828125, 23.555187225341797, -3.3240203857421875, 27.979061126708984], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000576.npy"}
{"epoch": 0.8458149779735683, "step": 577, "batch_size": 64, "mean": 23.499950408935547, "std": 21.52387809753418, "min": -18.250267028808594, "p10": -0.18547897338867178, "median": 21.13322639465332, "p90": 51.206258392334, "max": 85.48329162597656, "pos_frac": 0.875, "sample": [12.390670776367188, 85.48329162597656, 34.882896423339844, 59.83385467529297, 7.376457214355469, -0.08335113525390625, 36.26502227783203, 25.264625549316406, 7.245697021484375, 47.653594970703125, 34.58656311035156, 30.952251434326172, 33.29402542114258, 29.100257873535156, -10.822530746459961, 29.035919189453125, 47.12855529785156, 61.720245361328125, 29.628692626953125, 17.14472198486328, 4.368022918701172, 12.041847229003906, -4.762596130371094, -9.853363037109375, 60.42280960083008, 16.94623565673828, 10.817646026611328, 21.79232406616211, 52.72882843017578, 26.133350372314453, 38.93304443359375, 6.36553955078125, 10.939029693603516, 33.5438232421875, 42.13158416748047, 1.5684051513671875, 65.68763732910156, 27.385902404785156, 4.46929931640625, -18.250267028808594, 1.9458093643188477, 5.565380096435547, 14.467704772949219, 3.1668777465820312, 5.448844909667969, 8.489288330078125, -1.6741485595703125, 65.61853790283203, -0.229248046875, 37.5184326171875, 20.47412872314453, 44.458946228027344, 20.227073669433594, 10.804435729980469, 47.19685363769531, 42.320926666259766, 26.898643493652344, -10.320663452148438, 11.801445007324219, 42.652740478515625, 24.453598022460938, 12.96612548828125, 16.146942138671875, 32.10749053955078], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000577.npy"}
{"epoch": 0.8472834067547724, "step": 578, "batch_size": 64, "mean": 18.738391876220703, "std": 18.090993881225586, "min": -22.99420166015625, "p10": -3.7760566711425776, "median": 17.767780303955078, "p90": 45.62105636596681, "max": 56.691497802734375, "pos_frac": 0.859375, "sample": [6.336067199707031, 9.533727645874023, 9.103004455566406, 22.404075622558594, 50.74687194824219, 35.069854736328125, 48.726226806640625, 2.5505523681640625, 13.075868606567383, 8.88833236694336, 3.1031150817871094, 20.483375549316406, 12.351776123046875, 33.125152587890625, 12.792617797851562, -3.054443359375, 26.138629913330078, 18.466796875, 30.43233871459961, 19.08281707763672, 12.722076416015625, 30.79639434814453, 5.00640869140625, 5.1466064453125, -11.285240173339844, 28.396499633789062, 47.40071105957031, 12.387428283691406, 8.861370086669922, 41.443992614746094, 48.86824035644531, -22.99420166015625, 17.068763732910156, 41.107269287109375, -7.7725830078125, 42.427162170410156, 15.039413452148438, -22.797378540039062, 8.182861328125, -4.248456954956055, 16.261573791503906, 12.222419738769531, 31.791976928710938, 23.46221923828125, 36.57762145996094, 14.59820556640625, 36.73828125, -4.085319519042969, 56.691497802734375, 30.572837829589844, 22.990596771240234, 24.206466674804688, 5.982177734375, 8.631622314453125, 22.677330017089844, 24.68561553955078, 21.885528564453125, 16.24176788330078, 46.9898681640625, 26.428688049316406, -19.271446228027344, -1.0934066772460938, 49.13795471191406, 19.818790435791016], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000578.npy"}
{"epoch": 0.8487518355359766, "step": 579, "batch_size": 64, "mean": 20.02633285522461, "std": 18.094518661499023, "min": -27.051300048828125, "p10": 0.27952499389648444, "median": 20.007314682006836, "p90": 39.035928344726564, "max": 84.14466094970703, "pos_frac": 0.90625, "sample": [0.4855461120605469, 9.274171829223633, -4.242790222167969, 51.7279052734375, 4.685516357421875, 39.12586975097656, 21.16156005859375, 6.804046630859375, 24.392852783203125, 38.82606506347656, 0.346405029296875, 18.853069305419922, 45.421722412109375, -1.4668960571289062, 28.350868225097656, 22.25450897216797, 10.921745300292969, 27.70676040649414, 38.8192138671875, 5.910530090332031, -27.051300048828125, 6.224287033081055, 23.115678787231445, 41.657752990722656, 17.564903259277344, 32.64271926879883, 29.389144897460938, 2.503631591796875, 28.88144302368164, -10.163360595703125, 30.62884521484375, 25.317611694335938, 3.08770751953125, 16.174720764160156, 84.14466094970703, 29.802337646484375, 24.163307189941406, 27.566986083984375, 3.6668663024902344, -1.6524810791015625, 21.988739013671875, 9.250297546386719, 38.16761779785156, 28.662460327148438, 38.7357177734375, 15.448368072509766, 9.285110473632812, 7.200904846191406, 6.2499542236328125, 8.19723892211914, 40.5322265625, 17.910953521728516, -0.5102519989013672, 31.9296875, 33.250938415527344, 18.509933471679688, 28.289657592773438, 11.944547653198242, 0.25086212158203125, 21.669822692871094, 17.273082733154297, 64.83562469482422, 4.8923797607421875, 30.6953125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000579.npy"}
{"epoch": 0.8502202643171806, "step": 580, "batch_size": 64, "mean": 21.945823669433594, "std": 19.79058837890625, "min": -16.481948852539062, "p10": -0.1808353424072257, "median": 18.83989715576172, "p90": 46.344041442871095, "max": 70.77045440673828, "pos_frac": 0.890625, "sample": [16.2257080078125, 28.206390380859375, 27.493453979492188, 16.098114013671875, 3.79034423828125, 19.260459899902344, 28.147430419921875, 70.77045440673828, 13.448677062988281, 29.27862548828125, 18.419334411621094, 8.285690307617188, 46.364105224609375, 23.039318084716797, -16.481948852539062, 43.328857421875, 10.965423583984375, -0.5521984100341797, 55.98979568481445, 30.1336669921875, 7.037437438964844, 38.34843444824219, 21.716171264648438, 6.335700988769531, 14.773223876953125, 21.494552612304688, 35.70829772949219, 21.825218200683594, 40.97075271606445, 3.7398834228515625, 45.730979919433594, 30.411361694335938, 33.813751220703125, 1.7259445190429688, 31.37427520751953, 57.17266845703125, 16.240455627441406, -14.901958465576172, 16.3477783203125, 7.191703796386719, 8.062225341796875, 18.393722534179688, 2.0103836059570312, 60.58625793457031, 7.332191467285156, -3.533050537109375, -9.466995239257812, -6.8945159912109375, -10.825523376464844, 59.64923858642578, 30.424789428710938, 1.5880813598632812, 46.29722595214844, 57.900245666503906, 20.80840301513672, 8.652687072753906, 37.85707092285156, 13.81304931640625, 0.6856784820556641, 14.692188262939453, 18.21551513671875, 43.83686828613281, 35.77154541015625, 39.407066345214844], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000580.npy"}
{"epoch": 0.8516886930983847, "step": 581, "batch_size": 64, "mean": 20.235645294189453, "std": 19.199417114257812, "min": -17.067947387695312, "p10": -5.326288604736328, "median": 19.31659698486328, "p90": 39.38985977172852, "max": 65.989013671875, "pos_frac": 0.859375, "sample": [34.46832275390625, -17.067947387695312, 16.99373435974121, 14.033554077148438, 5.388397216796875, 4.251850128173828, 25.864944458007812, 10.880767822265625, 18.009193420410156, 22.438499450683594, 38.808319091796875, 36.01720428466797, 39.63909149169922, 19.80718994140625, 6.10345458984375, 6.633842468261719, 65.72314453125, 26.41766357421875, 27.196746826171875, 63.0511474609375, 7.6143646240234375, 65.989013671875, -14.022613525390625, 5.8292236328125, 14.554855346679688, 18.149932861328125, 31.40062713623047, 34.79490661621094, 46.71160888671875, 35.795448303222656, 5.045528411865234, 44.530731201171875, 8.838375091552734, 34.82220458984375, 36.830718994140625, 10.766830444335938, 34.352256774902344, 34.70545196533203, 27.200668334960938, 38.38520812988281, 24.864200592041016, 27.013084411621094, -8.228408813476562, 6.7424774169921875, 31.46222686767578, -7.491176605224609, 26.553321838378906, -2.933826446533203, 7.238929748535156, 10.80617904663086, 28.22620391845703, -6.8592376708984375, 3.807016372680664, -14.267486572265625, 9.42477798461914, 28.207237243652344, -5.611152648925781, 26.5906982421875, 25.94232177734375, -4.6616058349609375, 12.203315734863281, 63.72322082519531, 6.548458099365234, 18.826004028320312], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000581.npy"}
{"epoch": 0.8531571218795888, "step": 582, "batch_size": 64, "mean": 21.433513641357422, "std": 22.126264572143555, "min": -17.446666717529297, "p10": -4.605190277099608, "median": 17.258893966674805, "p90": 55.08451385498047, "max": 73.04280090332031, "pos_frac": 0.84375, "sample": [29.067764282226562, 7.183097839355469, 23.879478454589844, 10.830055236816406, 65.5152587890625, 49.10223388671875, 43.76958465576172, 6.3873748779296875, 3.98248291015625, 40.255943298339844, 17.86385726928711, 35.315406799316406, 6.981903076171875, 13.660621643066406, 33.59742736816406, 49.964561462402344, 2.1272811889648438, 3.75006103515625, -17.13202667236328, 58.79377746582031, 5.67974853515625, 5.401878356933594, 36.64019012451172, 29.47269058227539, 14.97576904296875, 42.12431335449219, 24.744312286376953, 25.753135681152344, 61.25340270996094, 41.87944030761719, 8.346054077148438, 43.356666564941406, 16.6539306640625, 9.698677062988281, 54.89427185058594, -16.12126922607422, -0.051788330078125, 12.125404357910156, -8.150100708007812, 26.630584716796875, -6.548728942871094, 13.823410034179688, 64.89089965820312, 55.166046142578125, 7.15106201171875, 15.557220458984375, 18.49993896484375, 3.6786956787109375, -1.1113433837890625, 44.08563232421875, 0.4233245849609375, 73.04280090332031, 30.526504516601562, 8.841766357421875, -6.850505828857422, -17.446666717529297, 55.792877197265625, 9.76955795288086, 23.245620727539062, 19.316631317138672, 23.406692504882812, -5.303428649902344, -2.9759674072265625, 24.559478759765625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000582.npy"}
{"epoch": 0.8546255506607929, "step": 583, "batch_size": 64, "mean": 19.605195999145508, "std": 21.549663543701172, "min": -26.923599243164062, "p10": -3.026033020019531, "median": 16.601969718933105, "p90": 42.31785964965821, "max": 85.05258178710938, "pos_frac": 0.8125, "sample": [9.837860107421875, 27.759124755859375, 9.100303649902344, 6.1822052001953125, 16.72052574157715, 10.628814697265625, 7.8221893310546875, 26.591995239257812, 76.36487579345703, 10.404075622558594, 11.526840209960938, -8.9547119140625, -26.923599243164062, 12.816146850585938, 19.2801513671875, -13.884071350097656, 20.992843627929688, 1.949462890625, 39.124046325683594, 16.201797485351562, 39.54541015625, 13.38565444946289, 16.483413696289062, -6.677204132080078, -0.3377037048339844, -5.400306701660156, 7.996849060058594, 27.96044921875, -2.6003036499023438, 19.163902282714844, 17.922203063964844, 22.8896484375, 33.882652282714844, 64.97593688964844, -3.2084884643554688, 24.21417999267578, 39.503074645996094, 0.704193115234375, 48.07633972167969, 12.603950500488281, -0.848480224609375, 26.921329498291016, 42.90044403076172, 1.76458740234375, 15.443267822265625, -1.1944198608398438, 85.05258178710938, 30.6292724609375, 27.601383209228516, -7.097038269042969, 46.27411651611328, 13.202659606933594, 3.8580169677734375, 19.383808135986328, 40.95849609375, -0.6794624328613281, 12.345878601074219, 36.476680755615234, 24.347373962402344, 80.97470092773438, 31.732826232910156, 38.764190673828125, 21.863006591796875, 19.43262481689453], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000583.npy"}
{"epoch": 0.856093979441997, "step": 584, "batch_size": 64, "mean": 22.10968017578125, "std": 19.705081939697266, "min": -18.271827697753906, "p10": -5.1316490173339835, "median": 24.455848693847656, "p90": 46.59785995483399, "max": 69.13418579101562, "pos_frac": 0.84375, "sample": [4.1214447021484375, 25.812034606933594, 23.06085205078125, 11.605300903320312, 25.066925048828125, 19.17315673828125, 13.02077865600586, 24.792922973632812, 29.364601135253906, 14.256973266601562, 28.07318115234375, 6.38482666015625, 5.250518798828125, 23.14636993408203, 24.25537872314453, 56.73408508300781, 4.010833740234375, 67.68914794921875, 40.54429626464844, -4.643119812011719, 24.86138153076172, -1.3073883056640625, 39.694786071777344, 24.495391845703125, 20.69811248779297, -14.637901306152344, 69.13418579101562, 37.541725158691406, 15.7010498046875, 24.53699493408203, 30.158477783203125, -8.729888916015625, 17.915531158447266, -18.271827697753906, -11.840873718261719, 43.522613525390625, 13.453407287597656, 26.811111450195312, -10.740180969238281, -5.3410186767578125, 49.48183822631836, 24.416305541992188, 4.2054290771484375, 47.53900146484375, 23.45297622680664, 16.974525451660156, 44.40186309814453, 28.23088836669922, -15.126827239990234, 31.911483764648438, 40.863258361816406, 29.120689392089844, 26.730117797851562, 20.4151611328125, 14.46676254272461, 52.62562561035156, 24.88808822631836, 30.755157470703125, 64.0009994506836, 9.17584228515625, 24.97955322265625, 34.93994140625, 29.71656036376953, -2.522003173828125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000584.npy"}
{"epoch": 0.8575624082232012, "step": 585, "batch_size": 64, "mean": 22.367267608642578, "std": 20.254392623901367, "min": -25.817665100097656, "p10": -2.443460083007812, "median": 21.06968879699707, "p90": 50.485070037841794, "max": 66.97843933105469, "pos_frac": 0.875, "sample": [17.003339767456055, 23.776458740234375, 7.476783752441406, 25.503700256347656, 9.006576538085938, 63.033111572265625, 24.157859802246094, -1.8542022705078125, 25.669784545898438, 11.932506561279297, 24.406749725341797, 21.12073516845703, 50.41669464111328, 18.055679321289062, 17.797706604003906, 45.60477066040039, 62.059539794921875, 25.48133087158203, 50.514373779296875, -2.6959991455078125, 32.83003234863281, 13.0491943359375, 3.07989501953125, 7.1762542724609375, 33.86194610595703, 8.019729614257812, 24.25687026977539, 41.65276336669922, 62.22659683227539, 52.17369079589844, 66.97843933105469, -12.562911987304688, 7.3145294189453125, -3.72210693359375, 36.03496551513672, 8.302364349365234, 36.5963134765625, -25.817665100097656, 21.01864242553711, 59.21150207519531, 11.5299072265625, 27.74895477294922, 40.34604263305664, -5.579322814941406, 35.71202087402344, 29.758399963378906, 28.79187774658203, -10.224716186523438, 20.10407257080078, 15.941125869750977, 9.661478042602539, 7.5171051025390625, -3.8339920043945312, 10.5347900390625, 11.698883056640625, 39.98065948486328, 7.655754089355469, 24.708297729492188, 47.71559143066406, 11.64312744140625, 49.58351135253906, 0.23142623901367188, 7.4643096923828125, 22.667205810546875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000585.npy"}
{"epoch": 0.8590308370044053, "step": 586, "batch_size": 64, "mean": 19.860626220703125, "std": 19.411640167236328, "min": -21.09870147705078, "p10": -0.564892387390135, "median": 17.351191520690918, "p90": 47.06956939697266, "max": 78.44435119628906, "pos_frac": 0.890625, "sample": [45.579708099365234, 17.14887237548828, 2.9188385009765625, -1.2814407348632812, 17.553510665893555, 1.1070537567138672, 7.393627166748047, 18.149330139160156, 12.033443450927734, 6.063102722167969, -4.546577453613281, 20.410781860351562, 62.5396728515625, 25.234237670898438, -2.550769805908203, 59.754127502441406, 14.479782104492188, -7.710163116455078, 26.992115020751953, 11.618682861328125, 21.939437866210938, 14.266746520996094, 41.80936813354492, 16.554222106933594, 44.660926818847656, 8.580459594726562, 13.882072448730469, 51.482452392578125, 2.0412445068359375, 47.274871826171875, 17.932830810546875, 44.60833740234375, -2.3549957275390625, 3.898181915283203, 18.47890853881836, 7.4267425537109375, 10.439231872558594, 4.193950653076172, 14.534927368164062, 46.59053039550781, 51.990203857421875, 5.066825866699219, 19.353721618652344, 35.49945068359375, 1.8599777221679688, -6.472999572753906, 22.511764526367188, 11.676498413085938, 21.496253967285156, 23.564178466796875, 11.659767150878906, 40.12346649169922, -21.09870147705078, 11.833969116210938, 58.62952423095703, 32.33534240722656, 25.72582244873047, 78.44435119628906, 1.2876930236816406, 5.0158843994140625, 17.981124877929688, 19.595848083496094, 19.43482208251953, 22.4368896484375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000586.npy"}
{"epoch": 0.8604992657856094, "step": 587, "batch_size": 64, "mean": 19.25106430053711, "std": 21.53449821472168, "min": -31.93029022216797, "p10": -0.3469221115112305, "median": 14.270648956298828, "p90": 50.645664215087905, "max": 72.2593002319336, "pos_frac": 0.875, "sample": [8.9351806640625, 56.234466552734375, 11.035400390625, 0.7309436798095703, 13.110992431640625, 8.234338760375977, 12.442611694335938, 21.724533081054688, 9.320236206054688, 46.93153381347656, 8.84303092956543, 31.928314208984375, 67.54365539550781, 7.488262176513672, 26.68124008178711, 27.1943359375, 3.309661865234375, -12.05682373046875, 14.2783203125, 20.473796844482422, -31.93029022216797, 18.497955322265625, 42.81669616699219, 3.957090377807617, 26.371490478515625, 53.12042236328125, -16.323450088500977, 72.2593002319336, 19.338470458984375, 16.403427124023438, 10.565315246582031, 52.23743438720703, -12.077033996582031, 46.59547424316406, 1.7965087890625, 8.900491714477539, 12.418416976928711, 14.966606140136719, 4.237224578857422, 7.503021240234375, 0.6973114013671875, 59.684417724609375, -0.35022544860839844, 69.59625244140625, 9.290311813354492, 20.030670166015625, 0.5083427429199219, 3.6488876342773438, 26.292160034179688, 20.590726852416992, 41.262569427490234, 34.27783203125, 46.74505615234375, 41.03923034667969, 42.3487663269043, 21.325714111328125, 14.262977600097656, -2.4667739868164062, 2.911224365234375, -0.3392143249511719, 3.5115509033203125, -5.449012756347656, 20.546894073486328, 26.063804626464844], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000587.npy"}
{"epoch": 0.8619676945668135, "step": 588, "batch_size": 64, "mean": 20.96023941040039, "std": 24.521533966064453, "min": -36.397132873535156, "p10": -2.936011886596679, "median": 12.968833923339844, "p90": 60.327571868896484, "max": 87.9874267578125, "pos_frac": 0.828125, "sample": [30.90805435180664, -7.214599609375, 23.022018432617188, 25.586395263671875, 61.788604736328125, 11.800704956054688, 32.22607421875, 36.26247024536133, 55.678802490234375, 37.78510284423828, -6.188423156738281, 35.444915771484375, 0.19533538818359375, 7.109260559082031, -13.85297966003418, 38.87023162841797, 39.583587646484375, 28.15655517578125, -1.3246307373046875, 13.32501220703125, 9.858917236328125, 30.005905151367188, 19.010711669921875, 60.26805114746094, -36.397132873535156, 5.089813232421875, 62.67582702636719, 7.5117645263671875, -11.826034545898438, -1.5490951538085938, 8.455612182617188, 22.80419921875, 1.3011932373046875, 60.35308074951172, 25.477745056152344, -3.200389862060547, 5.1502532958984375, 16.400497436523438, 12.082279205322266, 49.95484161376953, 44.862632751464844, 19.566349029541016, 3.69305419921875, 12.612655639648438, 35.79426574707031, 2.0196094512939453, 11.357307434082031, 74.02691650390625, 4.702888488769531, 39.7979736328125, 2.99871826171875, 73.72433471679688, 87.9874267578125, 5.613311767578125, 11.624992370605469, 7.348243713378906, 17.31328582763672, -1.2664375305175781, 0.9926662445068359, 31.419944763183594, -8.088470458984375, -2.3191299438476562, 60.794525146484375, 12.287734985351562], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000588.npy"}
{"epoch": 0.8634361233480177, "step": 589, "batch_size": 64, "mean": 26.640060424804688, "std": 20.0481014251709, "min": -18.172164916992188, "p10": -1.2934864044189414, "median": 27.901885986328125, "p90": 48.894964599609374, "max": 84.34405517578125, "pos_frac": 0.890625, "sample": [34.647857666015625, 59.771385192871094, 27.84271240234375, 2.7037277221679688, 30.003448486328125, 6.227306365966797, 12.17275619506836, 36.27239990234375, 27.79669189453125, -18.172164916992188, 41.98072814941406, 52.15606689453125, 45.78861999511719, 15.651477813720703, 6.5645751953125, 42.19279479980469, 42.20787048339844, -2.9525146484375, -3.1781005859375, 34.52558898925781, 48.729705810546875, 44.63592529296875, 11.730850219726562, 36.033851623535156, 14.352340698242188, 27.618247985839844, 29.888656616210938, 48.091461181640625, 58.25128173828125, 26.433124542236328, 34.46693420410156, 32.668800354003906, 28.798019409179688, 16.371055603027344, 43.51094055175781, 22.600814819335938, 54.47820281982422, -12.124347686767578, 44.075714111328125, 11.760709762573242, 28.392009735107422, 3.8222274780273438, 10.089065551757812, 16.039508819580078, 48.965789794921875, 19.58673095703125, 27.9610595703125, 22.753509521484375, 63.25115203857422, 46.37455749511719, 84.34405517578125, 20.889041900634766, 6.594758987426758, 38.649078369140625, -7.200477600097656, 19.52661895751953, 38.212158203125, 26.292266845703125, 2.5775794982910156, 24.41766357421875, 28.617733001708984, -7.8507537841796875, -2.967010498046875, 29.050029754638672], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000589.npy"}
{"epoch": 0.8649045521292217, "step": 590, "batch_size": 64, "mean": 25.617919921875, "std": 21.065143585205078, "min": -9.149131774902344, "p10": 2.33395004272461, "median": 22.899961471557617, "p90": 56.26672210693361, "max": 77.6596450805664, "pos_frac": 0.921875, "sample": [-0.9911708831787109, 31.439407348632812, 14.714813232421875, 3.299285888671875, 13.883590698242188, 4.591361999511719, 32.02589416503906, 63.388671875, 49.83323669433594, 7.755615234375, -9.149131774902344, 2.9740066528320312, 23.806652069091797, 77.6596450805664, 53.200340270996094, 15.411701202392578, 6.858648300170898, -5.886810302734375, 61.21501159667969, 31.646759033203125, 21.721435546875, 42.47351837158203, 32.928794860839844, 53.35784912109375, 2.0728530883789062, -5.878593444824219, 16.343833923339844, 32.240966796875, 27.405853271484375, 2.94317626953125, 9.001083374023438, 37.90580749511719, 17.594802856445312, 41.660423278808594, 4.874256134033203, 42.16363525390625, 44.226226806640625, 1.6699066162109375, 6.212543487548828, 10.893814086914062, 21.12335968017578, 20.868934631347656, 24.917463302612305, 61.40770721435547, 9.066566467285156, 37.56927490234375, 36.165809631347656, 16.47559356689453, -5.716545104980469, 29.958694458007812, 20.963523864746094, 34.04603576660156, 9.893768310546875, 29.524070739746094, 9.031356811523438, 57.51338195800781, 3.4597854614257812, 67.06770324707031, 33.205360412597656, 76.77304077148438, 21.993270874023438, 29.3223876953125, 31.691917419433594, 43.73471450805664], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000590.npy"}
{"epoch": 0.8663729809104258, "step": 591, "batch_size": 64, "mean": 22.685993194580078, "std": 18.325469970703125, "min": -11.069091796875, "p10": 1.1563491821289071, "median": 22.494060516357422, "p90": 43.83818435668945, "max": 89.63002014160156, "pos_frac": 0.921875, "sample": [5.417205810546875, 43.84410095214844, 32.394737243652344, 13.047523498535156, 21.9385986328125, -9.626426696777344, 26.777050018310547, 10.513298034667969, 6.891185760498047, 89.63002014160156, 23.878097534179688, 30.400142669677734, 33.278594970703125, 43.63648223876953, 42.61235809326172, 44.77387237548828, 8.561492919921875, 39.96565246582031, 41.36076354980469, 30.66791534423828, 32.052154541015625, 21.488975524902344, 3.8852005004882812, -1.233184814453125, 14.634368896484375, 9.210384368896484, 18.38632583618164, 53.28357696533203, 49.74485778808594, 25.48773193359375, 13.768428802490234, 26.75921630859375, 18.725494384765625, 7.578248977661133, 29.398326873779297, 58.08601379394531, 24.03253173828125, 6.934059143066406, 8.660308837890625, 14.76937484741211, 2.482635498046875, 7.032268524169922, 43.824378967285156, 23.049522399902344, 0.738250732421875, 28.63671112060547, 35.849639892578125, -6.071054458618164, 18.361618041992188, 27.757247924804688, 33.715576171875, 20.047470092773438, 17.123855590820312, 19.310096740722656, 10.58172607421875, -11.069091796875, 30.389488220214844, -8.939682006835938, 2.1319122314453125, 0.410980224609375, 50.363426208496094, 30.182418823242188, 36.362396240234375, 24.016754150390625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000591.npy"}
{"epoch": 0.8678414096916299, "step": 592, "batch_size": 64, "mean": 19.12704849243164, "std": 18.662195205688477, "min": -17.56369400024414, "p10": -3.564216995239257, "median": 19.05182647705078, "p90": 42.46944885253907, "max": 66.25201416015625, "pos_frac": 0.84375, "sample": [42.60716247558594, 7.780914306640625, 0.2553558349609375, 26.954933166503906, -17.56369400024414, 8.987220764160156, 30.67566680908203, -12.397140502929688, 61.86669158935547, -0.24180984497070312, 6.148193359375, 21.591995239257812, -8.640228271484375, 23.99736785888672, -8.031021118164062, 15.491607666015625, 40.589683532714844, 8.616127014160156, 22.09479522705078, 66.25201416015625, 24.58686065673828, 7.887870788574219, 10.905418395996094, 20.90717315673828, -14.579498291015625, 6.378667831420898, 4.583198547363281, 41.81440734863281, 43.92729187011719, -1.5049209594726562, 21.62004852294922, 6.259742736816406, 1.5607833862304688, 42.14811706542969, 26.80788803100586, 14.239805221557617, 9.603836059570312, 7.0538330078125, 22.88939666748047, 38.297725677490234, 41.09204864501953, 32.115013122558594, 44.644657135009766, 42.87969970703125, 18.134140014648438, 2.2037734985351562, 17.973434448242188, 17.138931274414062, 20.053176879882812, 37.30354309082031, 19.056884765625, 3.426025390625, 43.43934631347656, 33.4063720703125, -4.053615570068359, 32.898963928222656, -2.4222869873046875, 12.045143127441406, 36.12571716308594, 33.14879608154297, -15.513717651367188, 19.046768188476562, 33.02360534667969, 34.54112243652344], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000592.npy"}
{"epoch": 0.869309838472834, "step": 593, "batch_size": 64, "mean": 19.348106384277344, "std": 22.135364532470703, "min": -27.79265594482422, "p10": -5.747275924682616, "median": 15.518478393554688, "p90": 41.431338500976565, "max": 92.2855453491211, "pos_frac": 0.8125, "sample": [4.5255889892578125, -4.0937957763671875, -6.683795928955078, 4.966728210449219, 18.93658447265625, -0.3172760009765625, 37.560264587402344, 6.8776397705078125, 31.161453247070312, 28.343521118164062, 32.557823181152344, 49.561279296875, 16.736358642578125, -3.4396209716796875, 29.44483184814453, 8.052913665771484, 77.1895523071289, 31.8817138671875, 31.097869873046875, 10.320907592773438, 34.97365951538086, 14.30059814453125, 13.681522369384766, 23.766983032226562, 4.581905364990234, 27.83513641357422, 4.672431945800781, 9.957225799560547, 20.26990509033203, 6.320762634277344, -7.958505630493164, 50.599761962890625, 12.43821907043457, 0.40161895751953125, 13.737876892089844, 54.625831604003906, 34.556304931640625, 75.87513732910156, 5.5645599365234375, 4.112091064453125, -4.261802673339844, 0.38187408447265625, -7.782344818115234, 32.426971435546875, 41.981964111328125, 0.6659088134765625, 17.751426696777344, 29.07965087890625, 31.38873291015625, 31.14126205444336, 10.322036743164062, 31.52789306640625, 92.2855453491211, 29.049785614013672, -9.489242553710938, 8.998764038085938, 40.14654541015625, -1.1131534576416016, -6.383907318115234, 32.826316833496094, 37.88960647583008, -27.79265594482422, -7.995502471923828, 26.239452362060547], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000593.npy"}
{"epoch": 0.8707782672540382, "step": 594, "batch_size": 64, "mean": 23.27004623413086, "std": 19.82186508178711, "min": -6.079227447509766, "p10": -1.556703186035156, "median": 18.823856353759766, "p90": 49.43396682739258, "max": 75.74359130859375, "pos_frac": 0.859375, "sample": [29.945823669433594, 30.524276733398438, 15.952384948730469, -0.6153812408447266, -1.234100341796875, 19.867843627929688, 49.20636749267578, 4.4300384521484375, 24.761505126953125, 18.128143310546875, -6.079227447509766, 45.757904052734375, 43.64701843261719, 36.353004455566406, 2.508014678955078, 16.225311279296875, 71.56378173828125, 34.13505554199219, -3.9665603637695312, -2.585865020751953, 34.52398681640625, 36.626922607421875, 2.506622314453125, 50.188262939453125, 19.040847778320312, 34.99896240234375, -3.7381019592285156, 59.531951904296875, 18.065940856933594, 50.518341064453125, 15.268386840820312, 22.178298950195312, -5.2285003662109375, 13.271377563476562, 38.69500732421875, 28.643569946289062, 10.008293151855469, -1.6949615478515625, 10.186805725097656, 2.42071533203125, 2.3850555419921875, 35.563201904296875, 17.743215560913086, 3.6086196899414062, 18.59511947631836, 49.53150939941406, 36.75390625, 16.209259033203125, 16.45171356201172, 13.828235626220703, -4.945339202880859, 15.02947998046875, 39.96080780029297, 28.505638122558594, 18.540939331054688, 6.915061950683594, 33.83749771118164, 35.948883056640625, 75.32771301269531, 18.60686492919922, 75.74359130859375, 19.96686363220215, 19.740676879882812, 30.896316528320312], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000594.npy"}
{"epoch": 0.8722466960352423, "step": 595, "batch_size": 64, "mean": 18.862171173095703, "std": 21.14678382873535, "min": -26.25334930419922, "p10": -4.011071777343748, "median": 16.524539947509766, "p90": 47.116844177246094, "max": 77.04753112792969, "pos_frac": 0.796875, "sample": [6.574073791503906, 19.51617431640625, 27.12963104248047, 15.280216217041016, -0.04905509948730469, 77.04753112792969, 1.0271072387695312, 21.16187286376953, -13.216827392578125, 44.71583557128906, 11.124099731445312, 34.01241683959961, 16.099712371826172, 35.383277893066406, -0.1965484619140625, 19.991748809814453, 35.70784378051758, 37.762786865234375, 54.71815490722656, 0.8456153869628906, 4.239482879638672, -0.3657875061035156, 20.675643920898438, 46.32948303222656, 15.000186920166016, 32.38371276855469, 18.70362091064453, 8.135478973388672, 8.718017578125, 26.70583724975586, -2.27337646484375, 31.77914810180664, 1.1298484802246094, 1.0329742431640625, 47.45428466796875, -5.8632049560546875, -8.32366943359375, 8.725624084472656, 17.191566467285156, 33.663612365722656, -6.297920227050781, -14.787528991699219, 28.538848876953125, -26.25334930419922, 9.036750793457031, 36.02191925048828, 27.634010314941406, 5.3164215087890625, 4.2628173828125, 50.72308349609375, -1.474578857421875, 16.30670166015625, -4.75579833984375, 22.89940643310547, 8.192039489746094, 20.267101287841797, 16.74237823486328, -0.51251220703125, 53.7613525390625, 4.241537094116211, 44.98866271972656, 32.3050651550293, 61.14990997314453, 69.19436645507812], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000595.npy"}
{"epoch": 0.8737151248164464, "step": 596, "batch_size": 64, "mean": 21.275171279907227, "std": 21.16473388671875, "min": -12.51301383972168, "p10": -4.358455657958984, "median": 17.860519409179688, "p90": 54.445812988281254, "max": 72.896484375, "pos_frac": 0.859375, "sample": [-9.4158935546875, 8.950820922851562, 46.131202697753906, 12.827606201171875, 21.96923065185547, 56.77812194824219, -4.5056610107421875, 5.84600830078125, 40.453094482421875, 7.846351623535156, 53.417015075683594, 13.380556106567383, 0.73931884765625, 56.88935089111328, 28.092803955078125, 59.785125732421875, 42.15553283691406, 3.418855667114258, 19.368087768554688, -4.4156646728515625, 3.7017364501953125, 42.11757278442383, 12.338226318359375, 21.65460205078125, 17.463706970214844, 33.96013641357422, -7.558549880981445, 29.881900787353516, 48.62982940673828, 14.958671569824219, 45.89616394042969, 26.56137466430664, -4.224967956542969, 62.72296142578125, 1.463226318359375, 2.5529556274414062, 42.65899658203125, 32.01152038574219, 54.88672637939453, -10.756927490234375, 4.276275634765625, 5.439775466918945, 3.80438232421875, 10.767623901367188, 23.94609832763672, 33.39434051513672, 7.2458343505859375, 19.706588745117188, 32.169925689697266, 56.14118957519531, -1.9058799743652344, 3.076061248779297, -7.335746765136719, 72.896484375, 10.916946411132812, -12.51301383972168, 40.301544189453125, 13.97064208984375, 32.64418029785156, 28.645355224609375, 20.446319580078125, 11.436626434326172, 18.25733184814453, 3.2503929138183594], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000596.npy"}
{"epoch": 0.8751835535976505, "step": 597, "batch_size": 64, "mean": 23.44117546081543, "std": 21.571943283081055, "min": -24.52496337890625, "p10": -1.5377243041992172, "median": 22.907960891723633, "p90": 56.55466232299805, "max": 78.92570495605469, "pos_frac": 0.875, "sample": [37.93922424316406, 13.921516418457031, 4.5513916015625, 22.90670394897461, 31.58123779296875, -16.537109375, 9.334213256835938, 50.81483459472656, 8.337459564208984, 25.596878051757812, 25.132774353027344, 20.72888946533203, -24.52496337890625, 5.411243438720703, 55.734161376953125, 16.97283935546875, -0.01556396484375, 11.644844055175781, 24.338836669921875, 58.669700622558594, 1.7666244506835938, 31.724227905273438, 15.547187805175781, 36.00384521484375, 58.35022735595703, 22.909217834472656, -2.1900787353515625, 2.8238143920898438, 24.570297241210938, 10.784706115722656, -2.5950088500976562, 9.28836441040039, 39.660499572753906, 30.978607177734375, 10.010101318359375, 78.92570495605469, 57.85795593261719, 3.575347900390625, 8.97153091430664, 23.525325775146484, 11.63946533203125, 34.804176330566406, -6.3412017822265625, 69.77875518798828, 19.931396484375, 31.63818359375, 25.784732818603516, 23.748069763183594, 15.123649597167969, 46.82342529296875, 10.129631042480469, -3.3399314880371094, -10.097480773925781, 49.5643310546875, 59.94696807861328, 15.596168518066406, 55.913734436035156, 11.649017333984375, 56.829345703125, 26.87689971923828, 32.4515380859375, 39.42365264892578, 5.277168273925781, 32.05596160888672], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000597.npy"}
{"epoch": 0.8766519823788547, "step": 598, "batch_size": 64, "mean": 18.64090347290039, "std": 19.729700088500977, "min": -13.927167892456055, "p10": -3.1551929473876945, "median": 17.09593963623047, "p90": 45.20925064086915, "max": 68.06696319580078, "pos_frac": 0.765625, "sample": [29.642047882080078, -1.10955810546875, 19.610370635986328, 67.56232452392578, 0.6481857299804688, 51.87359619140625, 36.13627624511719, -3.45989990234375, 9.101394653320312, 25.655349731445312, 19.27841567993164, -1.6952972412109375, -2.4442100524902344, 0.01031494140625, -0.5771942138671875, 14.807544708251953, -7.7253875732421875, 45.90545654296875, 30.65184783935547, 19.004667282104492, -10.193374633789062, 53.72618103027344, 20.539337158203125, 8.08489990234375, 10.525522232055664, 31.52666473388672, 40.56971740722656, 8.728118896484375, -13.057853698730469, -13.927167892456055, 7.4463653564453125, 39.32875061035156, 31.615070343017578, 9.851024627685547, 9.510200500488281, 23.984352111816406, 15.178421020507812, 8.494537353515625, -7.364166259765625, 17.08624267578125, -5.1077728271484375, 20.49750518798828, 12.140861511230469, 29.90618896484375, -1.0382118225097656, 68.06696319580078, 35.732666015625, 6.718343734741211, 9.884899139404297, 37.517578125, 25.585716247558594, 17.105636596679688, 62.33607482910156, -0.756195068359375, 28.03619384765625, 10.9141845703125, -0.08245849609375, 51.9097900390625, 18.269207000732422, 32.897674560546875, 28.593109130859375, 17.428552627563477, 43.58477020263672, -1.6524848937988281], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000598.npy"}
{"epoch": 0.8781204111600588, "step": 599, "batch_size": 64, "mean": 18.94031524658203, "std": 16.9210262298584, "min": -17.336402893066406, "p10": 0.1459091186523448, "median": 17.58313751220703, "p90": 41.78209381103516, "max": 68.1358413696289, "pos_frac": 0.890625, "sample": [14.659843444824219, 6.004604339599609, 19.746814727783203, 1.1614837646484375, 39.01750946044922, 44.860595703125, 46.036041259765625, -0.289337158203125, 1.8264427185058594, 11.024711608886719, 40.57636260986328, 39.99147033691406, 4.802585601806641, 17.89240264892578, -1.0179061889648438, 19.270736694335938, 18.614418029785156, 47.10301208496094, 7.5827178955078125, 42.29883575439453, 27.591751098632812, 37.185340881347656, 34.341224670410156, 16.040809631347656, 30.82006072998047, -1.6347846984863281, 26.11248016357422, 21.85955047607422, 7.487464904785156, -7.734832763671875, 23.2122802734375, 11.790512084960938, 1.6565628051757812, 10.32269287109375, 68.1358413696289, 40.50340270996094, 4.792564392089844, 9.490882873535156, 23.784828186035156, 8.83111572265625, 30.851768493652344, 22.683212280273438, 24.172565460205078, 10.315048217773438, 8.931831359863281, 2.452716827392578, 6.615949630737305, -17.336402893066406, 21.44178009033203, 25.487892150878906, 37.29560852050781, 2.0780601501464844, 10.558326721191406, -4.267036437988281, 60.26622772216797, 12.731143951416016, 24.048057556152344, 18.303199768066406, 5.916015625, 42.7599983215332, 17.27387237548828, -3.8282814025878906, 14.46963119506836, 23.20592498779297], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000599.npy"}
{"epoch": 0.8795888399412628, "step": 600, "batch_size": 64, "mean": 22.158344268798828, "std": 20.562549591064453, "min": -11.330268859863281, "p10": -0.9178161621093748, "median": 18.542255401611328, "p90": 46.144901275634766, "max": 93.04492950439453, "pos_frac": 0.875, "sample": [16.27532196044922, 15.310529708862305, 46.197120666503906, 14.278915405273438, 41.033294677734375, 23.507339477539062, 1.2955360412597656, -11.330268859863281, 7.725397109985352, 30.415874481201172, -8.351203918457031, 29.13176727294922, 46.02305603027344, 12.360136032104492, 19.15412139892578, 37.9368896484375, 4.6733245849609375, 28.856033325195312, 3.9722747802734375, 13.377803802490234, 21.545852661132812, 28.59278106689453, 9.893749237060547, 5.9007415771484375, 35.663330078125, 21.7647705078125, 10.911869049072266, 93.04492950439453, 45.79096984863281, 23.731529235839844, 5.976051330566406, 37.86457824707031, 71.59773254394531, 6.356224060058594, 56.84315490722656, 27.383743286132812, -1.002227783203125, 21.63995361328125, 3.802448272705078, 57.100738525390625, -1.18505859375, 65.13960266113281, 38.30152893066406, 17.930389404296875, 43.49348449707031, 10.039802551269531, 30.977195739746094, 11.148979187011719, 12.438018798828125, 16.865615844726562, -0.720855712890625, 56.0, 14.851898193359375, 9.253944396972656, 9.342430114746094, 29.592239379882812, -1.6356086730957031, 21.34588623046875, -10.453380584716797, 2.6766204833984375, 39.8834228515625, 22.817832946777344, -1.70269775390625, 25.48653793334961], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000600.npy"}
{"epoch": 0.8810572687224669, "step": 601, "batch_size": 64, "mean": 23.382741928100586, "std": 21.65766143798828, "min": -27.029151916503906, "p10": -1.2557323455810545, "median": 24.163894653320312, "p90": 49.83615646362305, "max": 81.18357849121094, "pos_frac": 0.84375, "sample": [-1.3757705688476562, 6.127540588378906, 81.18357849121094, 38.55719757080078, 8.327171325683594, 49.515235900878906, 20.479217529296875, 36.64849853515625, 18.033836364746094, 24.937271118164062, 16.2115535736084, 55.7520751953125, 80.06451416015625, 19.713050842285156, 24.458492279052734, 3.6942901611328125, -0.202392578125, 49.97369384765625, 31.04204559326172, 66.38285827636719, 38.1661376953125, 24.414138793945312, -1.3349609375, 8.258872985839844, 9.041812896728516, 41.108394622802734, 2.363231658935547, -0.3293800354003906, 62.04338073730469, 11.08407211303711, 14.576217651367188, 41.110626220703125, -27.029151916503906, 26.61188507080078, 41.226043701171875, 30.260055541992188, 7.429649353027344, 45.696006774902344, 12.794944763183594, 23.47534942626953, -1.0708656311035156, 33.09123992919922, -3.123607635498047, -4.66650390625, 13.723556518554688, -8.58697509765625, 9.416000366210938, 27.311859130859375, 36.19535446166992, 2.761066436767578, 26.856544494628906, 21.268447875976562, 29.138656616210938, 26.682456970214844, -16.48163604736328, 61.767906188964844, 23.796234130859375, 24.42218017578125, 31.352554321289062, 27.272117614746094, 43.48405456542969, 2.921785354614258, 23.913650512695312, 24.558135986328125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000601.npy"}
{"epoch": 0.882525697503671, "step": 602, "batch_size": 64, "mean": 22.943323135375977, "std": 19.39066505432129, "min": -11.03472900390625, "p10": -1.3326078414916989, "median": 22.011592864990234, "p90": 49.98343162536621, "max": 71.29742431640625, "pos_frac": 0.859375, "sample": [18.292282104492188, 14.97113037109375, -3.2935562133789062, 21.66741943359375, 14.233390808105469, 25.10651397705078, 28.908889770507812, 23.226646423339844, 6.501495361328125, 2.27764892578125, 16.75623321533203, -1.5183334350585938, 4.5160675048828125, 13.171333312988281, 23.7857666015625, 12.780937194824219, 36.0854377746582, 68.27833557128906, 47.065677642822266, 6.124916076660156, 37.13348388671875, 10.396415710449219, -3.305145263671875, 7.440074920654297, -0.3043670654296875, 49.662994384765625, 66.11052703857422, 2.747852325439453, 10.805572509765625, 12.630012512207031, 17.49966812133789, -11.03472900390625, 26.385356903076172, 7.4288177490234375, 49.96449661254883, 29.538177490234375, 22.82526397705078, 26.66461944580078, -3.8832168579101562, -0.8992481231689453, 31.653717041015625, 39.40381622314453, 34.60957336425781, 13.06134033203125, 15.575950622558594, 27.95044708251953, 71.29742431640625, 36.02296447753906, 52.419715881347656, -4.666248321533203, 22.797412872314453, 28.1030330657959, 28.676403045654297, 54.23216247558594, 49.991546630859375, 41.39019012451172, 15.1300048828125, 22.074935913085938, 17.879404067993164, 30.298072814941406, 21.94824981689453, 63.40496826171875, 25.889892578125, -7.517181396484375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000602.npy"}
{"epoch": 0.8839941262848752, "step": 603, "batch_size": 64, "mean": 25.826454162597656, "std": 22.18645477294922, "min": -24.622894287109375, "p10": 1.0055122375488283, "median": 25.648069381713867, "p90": 47.25779037475587, "max": 108.9638671875, "pos_frac": 0.90625, "sample": [28.097450256347656, 7.485939025878906, 12.18017578125, 3.8554534912109375, 33.9619140625, 44.516937255859375, 34.10951232910156, 48.43244171142578, 17.885971069335938, 22.407562255859375, 36.17008972167969, 27.74334716796875, -15.396339416503906, 38.65784454345703, 12.903013229370117, 31.708358764648438, 28.42266845703125, 61.42192077636719, 33.347572326660156, 12.982963562011719, 38.84745788574219, 18.117877960205078, 4.480133056640625, 17.40301513671875, 34.522789001464844, 1.2450103759765625, 27.50951385498047, 26.97817611694336, 15.705257415771484, 13.098136901855469, 33.32899475097656, -3.13299560546875, 42.2249755859375, -24.622894287109375, 23.652008056640625, 67.25979614257812, 28.562789916992188, 3.0196571350097656, 17.998706817626953, 23.90660858154297, 37.70945358276367, 43.431785583496094, 1.663360595703125, -4.533805847167969, 25.436656951904297, 18.85883331298828, 25.859481811523438, 18.186630249023438, 0.9028701782226562, 9.025806427001953, -4.136936187744141, 36.249061584472656, 66.93153381347656, 17.018783569335938, 26.67839813232422, 23.945602416992188, 108.9638671875, 32.541038513183594, 43.2142333984375, 22.332061767578125, 74.27702331542969, 63.88284683227539, 39.59980773925781, -6.145027160644531], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000603.npy"}
{"epoch": 0.8854625550660793, "step": 604, "batch_size": 64, "mean": 22.049591064453125, "std": 17.895565032958984, "min": -15.235164642333984, "p10": -3.2677062988281236, "median": 20.89727020263672, "p90": 43.53845138549805, "max": 74.49209594726562, "pos_frac": 0.875, "sample": [-4.633659362792969, 22.111576080322266, -13.871219635009766, 22.357818603515625, 44.844940185546875, 30.37786102294922, 31.173851013183594, 5.798091888427734, 20.372817993164062, 12.207183837890625, -1.860443115234375, 40.07981491088867, -12.466300964355469, 12.1365966796875, 30.888458251953125, 15.907241821289062, 16.83240509033203, 18.34099578857422, 25.213329315185547, 24.05878448486328, 26.195598602294922, 2.49334716796875, 7.638824462890625, 36.786537170410156, -15.235164642333984, -6.279853820800781, 47.95549011230469, 17.525802612304688, 42.803443908691406, 40.228515625, 7.937000274658203, 17.05960464477539, 23.635826110839844, 28.77149200439453, 41.588356018066406, 51.50947570800781, 10.101577758789062, 11.664833068847656, 15.435981750488281, 18.20865249633789, 34.52660369873047, 13.45849609375, 20.657581329345703, 7.045135498046875, 39.81727600097656, 58.10906982421875, 25.34856414794922, 74.49209594726562, 40.77617263793945, 3.858001708984375, 14.986434936523438, 33.67469024658203, 43.85345458984375, 13.406791687011719, 40.700626373291016, 25.990886688232422, -4.0196685791015625, -3.870819091796875, 37.02617645263672, 21.136959075927734, 45.2010498046875, 16.545974731445312, 25.863449096679688, 16.693405151367188], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000604.npy"}
{"epoch": 0.8869309838472834, "step": 605, "batch_size": 64, "mean": 20.655899047851562, "std": 19.46525764465332, "min": -19.71068572998047, "p10": -4.47454223632812, "median": 17.997556686401367, "p90": 40.691575622558595, "max": 98.58218383789062, "pos_frac": 0.890625, "sample": [12.660179138183594, 30.223861694335938, 45.56256103515625, 12.134544372558594, 25.98788070678711, 24.100936889648438, 39.1263313293457, 15.978469848632812, 56.62969207763672, 12.229970932006836, 26.794639587402344, 30.20001220703125, 14.160537719726562, 29.17633056640625, 29.136337280273438, 26.02175521850586, 16.09262466430664, 39.61231994628906, 26.104263305664062, -8.936431884765625, 98.58218383789062, 1.345458984375, 9.108318328857422, 14.995330810546875, 0.033416748046875, 18.679244995117188, 54.01432800292969, 5.315284729003906, -19.71068572998047, 15.668899536132812, 9.395980834960938, -7.179130554199219, 9.121566772460938, 38.12464904785156, 27.93889617919922, 6.985443115234375, 31.055343627929688, 7.406963348388672, 29.01879119873047, 41.15411376953125, 25.671199798583984, 55.360260009765625, 4.429935455322266, 17.93008804321289, 30.267913818359375, 6.197776794433594, -7.751708984375, 18.065025329589844, -13.393085479736328, 8.983219146728516, 14.184410095214844, -6.406524658203125, 34.333457946777344, 32.399147033691406, 3.090362548828125, 12.836540222167969, 35.68852615356445, 4.116374969482422, 32.22626495361328, 29.845870971679688, 44.507965087890625, 17.49211883544922, 34.41261291503906, -6.5614166259765625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000605.npy"}
{"epoch": 0.8883994126284875, "step": 606, "batch_size": 64, "mean": 24.202983856201172, "std": 19.722009658813477, "min": -5.302986145019531, "p10": 1.9358379364013683, "median": 21.145549774169922, "p90": 48.730339813232426, "max": 72.72657775878906, "pos_frac": 0.90625, "sample": [-4.779705047607422, 9.041610717773438, 49.309669494628906, -0.8380470275878906, 64.33910369873047, 5.542823791503906, 12.392332077026367, 20.00128173828125, 62.602054595947266, 22.098915100097656, 20.014366149902344, 71.86465454101562, 33.39312744140625, 1.4550704956054688, 7.99029541015625, 13.7828369140625, 17.09262466430664, 3.057628631591797, 20.752540588378906, 36.18911361694336, -1.5382270812988281, -5.302986145019531, 36.85792541503906, 30.934410095214844, 72.72657775878906, 30.934803009033203, 32.081756591796875, 5.676109313964844, 4.318027496337891, 25.71733856201172, 20.338638305664062, 20.290077209472656, -3.9727745056152344, 14.775917053222656, 4.81573486328125, 3.43023681640625, 25.161231994628906, 33.53227996826172, 7.597484588623047, 24.975242614746094, 7.723379135131836, 25.289051055908203, 17.22808837890625, 29.767560958862305, 67.71356964111328, 15.807136535644531, 4.3941497802734375, 68.09640502929688, 47.11113739013672, 16.919631958007812, 29.38558578491211, 22.356239318847656, 25.100807189941406, 47.378570556640625, 16.464736938476562, 43.632110595703125, 19.203819274902344, -0.26926422119140625, 47.2955322265625, 46.707069396972656, 35.830474853515625, 23.456336975097656, 21.538558959960938, 22.20806884765625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000606.npy"}
{"epoch": 0.8898678414096917, "step": 607, "batch_size": 64, "mean": 23.307056427001953, "std": 23.05050277709961, "min": -22.904048919677734, "p10": -6.6894691467285154, "median": 22.905773162841797, "p90": 52.69924926757813, "max": 80.96910095214844, "pos_frac": 0.828125, "sample": [47.431060791015625, 51.63690185546875, 39.95115661621094, -0.017406463623046875, 23.564300537109375, -6.9115142822265625, 40.164825439453125, 43.30027770996094, 53.154541015625, 31.447006225585938, 9.915447235107422, -22.904048919677734, 44.64495849609375, 10.190673828125, 12.827638626098633, 55.79817199707031, 19.10930633544922, 26.638328552246094, 54.08744812011719, 20.197973251342773, 46.93840789794922, 1.9668292999267578, 39.94436264038086, 16.396209716796875, -3.3202781677246094, 15.06964111328125, -9.755859375, 34.40766143798828, 26.77191162109375, 21.692890167236328, 56.881690979003906, 0.11189651489257812, 41.11711883544922, 17.389179229736328, 27.215911865234375, 10.411056518554688, 51.482208251953125, -9.644607543945312, 40.86522674560547, -7.915458679199219, -19.68463897705078, 34.03766632080078, -21.438064575195312, 46.70964050292969, -5.5142669677734375, 0.5996799468994141, 29.00885009765625, 15.305778503417969, 0.03185272216796875, 22.24724578857422, 27.921188354492188, 67.61802673339844, 39.28656005859375, 21.559558868408203, -6.171363830566406, 5.31591796875, 27.966567993164062, 80.96910095214844, 20.558029174804688, 31.380523681640625, 8.593488693237305, 3.0882911682128906, 61.16693115234375, 28.84194564819336], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000607.npy"}
{"epoch": 0.8913362701908958, "step": 608, "batch_size": 64, "mean": 22.603548049926758, "std": 19.300329208374023, "min": -42.661582946777344, "p10": 2.282218933105469, "median": 22.234655380249023, "p90": 45.19222564697266, "max": 67.43843078613281, "pos_frac": 0.90625, "sample": [44.46412658691406, 28.97159194946289, 7.737449645996094, 7.001747131347656, 15.252098083496094, 21.566635131835938, 22.19222640991211, 2.9056396484375, 30.762069702148438, 20.409278869628906, 17.66405487060547, 9.316568374633789, 5.573719024658203, 38.06524658203125, -14.794477462768555, 6.816234588623047, 19.88994598388672, 28.747085571289062, 3.5911026000976562, 35.96119689941406, 41.70191955566406, 40.50922393798828, 36.455535888671875, 29.854751586914062, 2.04754638671875, 21.7196044921875, -1.4285430908203125, 15.058509826660156, 18.148704528808594, 19.9708251953125, 36.203941345214844, 22.277084350585938, 67.43843078613281, 8.017623901367188, 31.38117218017578, 41.234947204589844, 47.03962707519531, -25.866989135742188, 21.59252166748047, -1.9766159057617188, 30.060688018798828, 40.94751739501953, 45.25874328613281, 2.8297882080078125, 24.111495971679688, 23.9454345703125, 10.560637474060059, 12.514816284179688, 41.210227966308594, 17.319747924804688, 48.60511779785156, -42.661582946777344, 15.661933898925781, 48.0794677734375, 25.959388732910156, 26.078989028930664, -7.530073165893555, 22.901214599609375, 49.977455139160156, 49.762359619140625, 38.78643798828125, 45.037017822265625, 16.68640899658203, 37.05049133300781], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000608.npy"}
{"epoch": 0.8928046989720999, "step": 609, "batch_size": 64, "mean": 24.306610107421875, "std": 22.530479431152344, "min": -24.369308471679688, "p10": -0.28365173339843736, "median": 20.21210479736328, "p90": 51.85178680419922, "max": 104.00789642333984, "pos_frac": 0.875, "sample": [19.853771209716797, 19.513107299804688, 3.408416748046875, 32.71675109863281, -2.4801712036132812, 51.96815490722656, 33.249420166015625, 9.156814575195312, 0.7635078430175781, 17.588714599609375, 48.234291076660156, 35.98671340942383, 1.20477294921875, 21.408016204833984, 12.674346923828125, 49.372962951660156, 38.97065734863281, 55.06986999511719, 21.954208374023438, 7.717926025390625, 20.14678955078125, 51.58026123046875, 36.08182144165039, 27.192161560058594, 5.551479339599609, 17.779651641845703, 50.24409484863281, -1.9975814819335938, 3.8001976013183594, 35.260887145996094, 8.642257690429688, 20.277420043945312, 46.41968536376953, 53.295310974121094, 45.55403137207031, 24.569549560546875, 21.97338104248047, -2.610980987548828, 26.082416534423828, 37.88996124267578, -0.33834075927734375, 30.3878173828125, 43.05205535888672, 104.00789642333984, 24.783554077148438, 18.503860473632812, 16.954681396484375, 9.036808013916016, -3.5946273803710938, 19.028297424316406, 28.383651733398438, 8.250114440917969, 6.764472961425781, -1.8741531372070312, 29.814666748046875, 68.9428939819336, -0.15604400634765625, 66.57495880126953, 10.345626831054688, 71.90420532226562, 3.6777725219726562, -24.369308471679688, 16.473480224609375, 3.0035667419433594], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000609.npy"}
{"epoch": 0.8942731277533039, "step": 610, "batch_size": 64, "mean": 23.985227584838867, "std": 19.955738067626953, "min": -37.08680725097656, "p10": -0.23071517944335923, "median": 26.20248031616211, "p90": 48.25469818115235, "max": 82.0166015625, "pos_frac": 0.875, "sample": [34.12321472167969, 40.84814453125, 4.430839538574219, 33.58557891845703, 26.757110595703125, 0.3084716796875, 48.88983154296875, 33.38372802734375, 38.09088897705078, 32.233978271484375, -5.727062225341797, 17.290145874023438, 7.901937484741211, 45.299346923828125, 82.0166015625, 32.946563720703125, 26.135818481445312, 26.696388244628906, 15.88589859008789, 16.749961853027344, 44.4091796875, 26.269142150878906, 8.257526397705078, 28.826675415039062, -0.295074462890625, 16.58228302001953, 3.9381027221679688, 15.40655517578125, 35.46806716918945, 59.066078186035156, 45.223785400390625, 4.144737243652344, 4.295373916625977, 6.160064697265625, 20.790199279785156, 32.67756652832031, -12.929962158203125, 33.052528381347656, -0.08054351806640625, 20.192787170410156, 50.58036804199219, 39.93376922607422, -1.5973262786865234, 34.82166290283203, 31.939590454101562, 12.784122467041016, 25.778789520263672, 32.680294036865234, -1.9230155944824219, 29.49523162841797, 13.619125366210938, 46.77272033691406, 21.124725341796875, 14.671255111694336, 22.732280731201172, 49.57564163208008, 23.63866424560547, 46.73190689086914, 9.214092254638672, 27.325942993164062, -8.233680725097656, -37.08680725097656, 49.55499267578125, 51.617767333984375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000610.npy"}
{"epoch": 0.895741556534508, "step": 611, "batch_size": 64, "mean": 21.690879821777344, "std": 22.83681297302246, "min": -14.957328796386719, "p10": -2.0829738616943354, "median": 15.784220695495605, "p90": 50.36961479187012, "max": 88.40342712402344, "pos_frac": 0.859375, "sample": [5.698638916015625, 49.143314361572266, 20.276634216308594, 36.217254638671875, 36.2591667175293, 23.264118194580078, 6.480060577392578, 11.646461486816406, 23.436935424804688, 26.870208740234375, 5.390602111816406, 79.30738830566406, -14.112701416015625, -1.60394287109375, -3.513671875, -11.618152618408203, -1.307413101196289, 14.760883331298828, 50.895172119140625, 54.88960266113281, 8.38726806640625, 72.04191589355469, 88.40342712402344, -2.2882728576660156, 43.21356964111328, 72.28058624267578, 8.245750427246094, 27.257797241210938, -5.265052795410156, 12.88592529296875, 40.74543762207031, 25.074935913085938, 18.156471252441406, 10.113187789916992, 3.245208740234375, 10.729408264160156, 5.340381622314453, 15.980710983276367, -14.957328796386719, 7.064491271972656, 25.20770263671875, 31.211471557617188, 15.587730407714844, 48.38307571411133, 35.975494384765625, 3.9369964599609375, 13.461511611938477, 15.048809051513672, 31.541179656982422, 2.5430679321289062, 18.568967819213867, 10.391387939453125, 67.53775787353516, 25.1484375, 2.048717498779297, 13.531974792480469, 25.055248260498047, 33.64848327636719, 11.51498794555664, 30.73193359375, 34.360748291015625, 0.9195289611816406, 45.03253173828125, -12.2078857421875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000611.npy"}
{"epoch": 0.8972099853157122, "step": 612, "batch_size": 64, "mean": 23.6812744140625, "std": 23.77232551574707, "min": -23.014999389648438, "p10": -1.4126146316528319, "median": 21.065956115722656, "p90": 48.283439636230476, "max": 126.14892578125, "pos_frac": 0.84375, "sample": [126.14892578125, 37.153839111328125, 24.22228240966797, 21.151397705078125, 10.647314071655273, 54.35938262939453, 27.655303955078125, 35.511810302734375, -1.4645233154296875, -1.291494369506836, 20.54383087158203, 6.6266021728515625, 42.13716125488281, 20.980514526367188, 62.45325469970703, -4.284782409667969, 48.83644104003906, 32.394447326660156, 17.91964340209961, 34.91099548339844, -14.925018310546875, 10.613216400146484, 11.511268615722656, 25.015769958496094, 52.753334045410156, -23.014999389648438, 9.868759155273438, 44.21770477294922, -0.33588409423828125, 36.15440368652344, 41.6994743347168, 12.173614501953125, 10.385910034179688, 30.633869171142578, 52.60741424560547, 21.15357208251953, 23.62895965576172, 19.123878479003906, 14.464874267578125, -1.0088729858398438, -6.429859161376953, 10.143749237060547, 6.056632995605469, 17.89373779296875, 46.99310302734375, 6.475727081298828, 44.27125549316406, 39.43206024169922, 74.41030883789062, -22.43224334716797, 8.776206970214844, 13.38753890991211, 42.15913391113281, -7.498779296875, 21.748001098632812, 18.476490020751953, 9.350852966308594, 26.389068603515625, 27.57915496826172, 32.428558349609375, 42.50318908691406, 45.37061309814453, 16.324111938476562, 8.459426879882812], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000612.npy"}
{"epoch": 0.8986784140969163, "step": 613, "batch_size": 64, "mean": 24.644756317138672, "std": 21.37565040588379, "min": -11.259429931640625, "p10": 2.2266746520996095, "median": 18.047807693481445, "p90": 53.689114379882824, "max": 83.5548095703125, "pos_frac": 0.921875, "sample": [13.728096008300781, 27.5186767578125, 17.127479553222656, 32.875404357910156, 44.643585205078125, 66.35242462158203, 43.199119567871094, 22.20604705810547, 27.87164306640625, 37.400367736816406, 9.214279174804688, -8.661460876464844, 62.49273681640625, 12.963386535644531, 51.70787811279297, 40.362022399902344, 16.42845916748047, 12.24039077758789, 12.74432373046875, 1.6130142211914062, 15.967147827148438, 9.052936553955078, 30.105754852294922, 19.236801147460938, 18.223594665527344, 13.729133605957031, 37.012882232666016, 69.46160888671875, 25.743087768554688, 11.434074401855469, 33.38227844238281, 31.002323150634766, 2.1653594970703125, 5.668922424316406, -11.259429931640625, 44.551666259765625, 12.56338119506836, 43.61174011230469, 83.5548095703125, 6.822370529174805, 50.23182678222656, -3.3999481201171875, 27.64765167236328, 13.223808288574219, 10.573089599609375, 54.53821563720703, 38.35529327392578, 76.80519104003906, 2.6072044372558594, 4.006378173828125, 69.1802978515625, 17.872020721435547, -7.77716064453125, 2.3697433471679688, 7.4392242431640625, 27.557476043701172, 30.880050659179688, 12.498908996582031, 13.526107788085938, 33.91227340698242, 24.803543090820312, 11.937026977539062, 12.981399536132812, -0.563507080078125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000613.npy"}
{"epoch": 0.9001468428781204, "step": 614, "batch_size": 64, "mean": 22.033950805664062, "std": 19.608352661132812, "min": -15.54248046875, "p10": -1.9725852966308588, "median": 20.412752151489258, "p90": 43.79263458251953, "max": 92.02360534667969, "pos_frac": 0.875, "sample": [17.04119110107422, 31.494239807128906, -3.85760498046875, 22.421112060546875, 43.899169921875, 19.55080795288086, 0.8465805053710938, -6.661689758300781, 9.630409240722656, 8.032476425170898, 11.419864654541016, 6.694892883300781, 28.923614501953125, 12.250244140625, 49.176246643066406, -15.54248046875, 28.929489135742188, 37.460296630859375, 18.551158905029297, 6.7649688720703125, 37.27745819091797, 26.379119873046875, 31.510757446289062, 17.242294311523438, 43.54405212402344, 31.06308364868164, 19.39666748046875, 34.078834533691406, 11.978988647460938, 3.1844558715820312, -11.971183776855469, 31.16051483154297, 35.83265686035156, 20.572101593017578, 25.592552185058594, -7.2100067138671875, 26.63131332397461, 8.720623016357422, 20.253402709960938, 40.8902587890625, 21.463504791259766, -1.4287490844726562, 34.71748352050781, 7.249431610107422, 92.02360534667969, 68.87760925292969, 12.168380737304688, 25.53105926513672, 19.710128784179688, 20.934844970703125, 24.271984100341797, 14.453353881835938, 32.75970458984375, 18.285995483398438, 54.96681213378906, 16.855331420898438, 44.26255798339844, -2.205657958984375, 60.839111328125, 0.31322479248046875, 29.44036865234375, -14.178253173828125, 38.71245574951172, 16.995697021484375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000614.npy"}
{"epoch": 0.9016152716593245, "step": 615, "batch_size": 64, "mean": 19.82468605041504, "std": 18.810060501098633, "min": -29.623977661132812, "p10": -0.7530914306640619, "median": 20.44586944580078, "p90": 41.26695022583009, "max": 64.22622680664062, "pos_frac": 0.875, "sample": [-6.003814697265625, 59.45237731933594, 13.537132263183594, 21.472030639648438, 5.203757286071777, 9.180335998535156, 12.806549072265625, 8.480300903320312, 8.927490234375, 15.073684692382812, 32.370697021484375, 25.957107543945312, 32.608680725097656, 38.193206787109375, 46.40528869628906, 10.7666015625, 38.286476135253906, 11.981834411621094, 2.0774574279785156, 6.516746520996094, 10.077774047851562, -29.623977661132812, 27.967010498046875, -0.13782501220703125, 6.069732666015625, 64.22622680664062, 38.52362823486328, 33.69242858886719, 45.04307556152344, 9.033510208129883, 6.729034423828125, 28.944778442382812, 19.877334594726562, 0.1110382080078125, 30.84294891357422, 38.70867156982422, 33.994056701660156, 26.068401336669922, 2.3466567993164062, 21.922903060913086, -10.243778228759766, 52.63371276855469, 31.545631408691406, 4.059074401855469, 39.32073211669922, 18.28327178955078, 25.910934448242188, 42.101043701171875, 23.4888916015625, 35.49235534667969, 2.1304588317871094, -2.5870208740234375, 35.11335754394531, -22.72968292236328, 21.014404296875, -4.618549346923828, -1.0167770385742188, 52.345130920410156, 1.1803741455078125, 25.785499572753906, 14.255294799804688, 22.467132568359375, 36.93489074707031, 18.202194213867188], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000615.npy"}
{"epoch": 0.9030837004405287, "step": 616, "batch_size": 64, "mean": 21.420740127563477, "std": 22.271535873413086, "min": -19.332305908203125, "p10": -1.4308029174804657, "median": 15.818124771118164, "p90": 57.280499267578136, "max": 82.95831298828125, "pos_frac": 0.890625, "sample": [37.61963653564453, 5.7538909912109375, 64.9584732055664, 41.20722961425781, 6.166328430175781, 15.530754089355469, -2.7045440673828125, -15.297821044921875, 58.8614501953125, 59.85829162597656, 35.88975524902344, 16.10549545288086, 15.367027282714844, 11.756904602050781, 69.77809143066406, 7.239219665527344, 28.617645263671875, 61.008155822753906, 13.34463119506836, 3.7868080139160156, 49.96159362792969, 37.228336334228516, 25.151588439941406, 4.699790954589844, 31.341339111328125, -19.332305908203125, 8.016029357910156, 5.5951690673828125, 38.4666748046875, 1.541259765625, 82.95831298828125, 18.651487350463867, -19.303817749023438, 21.84082794189453, 5.794593811035156, 4.8679656982421875, 7.08062744140625, 18.02801513671875, 13.140556335449219, 21.752456665039062, 8.715118408203125, 44.648475646972656, -11.510467529296875, 8.871200561523438, 6.58892822265625, 37.229736328125, 16.227725982666016, 13.1595458984375, 37.41813278198242, 13.425628662109375, 7.6870269775390625, 31.262466430664062, 23.68557357788086, -6.859382629394531, 28.81476593017578, 21.460418701171875, 20.505111694335938, 28.298320770263672, 15.0069580078125, -2.812835693359375, 53.59161376953125, 9.087051391601562, 69.88119506835938, 4.2171173095703125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000616.npy"}
{"epoch": 0.9045521292217328, "step": 617, "batch_size": 64, "mean": 18.88084602355957, "std": 21.381799697875977, "min": -28.212432861328125, "p10": -3.687057876586913, "median": 16.182336807250977, "p90": 49.61511764526368, "max": 76.49447631835938, "pos_frac": 0.828125, "sample": [5.755073547363281, 17.794443130493164, -2.4853515625, 27.4998779296875, -11.262615203857422, 21.424121856689453, 51.24934387207031, 9.930511474609375, 18.359146118164062, 9.770940780639648, 6.2523345947265625, 23.091209411621094, 41.98847961425781, 46.81958770751953, 19.791702270507812, 10.356712341308594, 6.933958053588867, 16.293785095214844, 10.364261627197266, 76.49447631835938, 19.054603576660156, 4.489566802978516, -28.212432861328125, 33.99163055419922, 14.51333999633789, 35.3330078125, 8.47121810913086, 15.129478454589844, 6.297462463378906, 29.43157958984375, 16.07088851928711, 56.99115753173828, 4.8237152099609375, 13.230194091796875, 60.84681701660156, 33.9822998046875, 26.612396240234375, 20.869613647460938, 58.562530517578125, 34.22599792480469, -9.982170104980469, 7.193323135375977, 41.36695098876953, 37.4320068359375, 46.62860107421875, -1.3209304809570312, 17.467330932617188, -18.382965087890625, 10.964324951171875, 12.213172912597656, 8.841583251953125, -1.9555282592773438, -2.6672134399414062, -4.124134063720703, 1.9569625854492188, -7.141538619995117, 28.160247802734375, 23.795867919921875, -22.04729652404785, 68.24510955810547, 50.813201904296875, 22.37652587890625, 22.057968139648438, 5.345672607421875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000617.npy"}
{"epoch": 0.9060205580029369, "step": 618, "batch_size": 64, "mean": 20.30179786682129, "std": 20.217187881469727, "min": -34.88862609863281, "p10": -6.309986495971677, "median": 19.72160530090332, "p90": 49.934417724609375, "max": 63.41138458251953, "pos_frac": 0.828125, "sample": [-17.237808227539062, 11.70125961303711, 17.74652862548828, 9.245445251464844, 35.37187194824219, 7.9684600830078125, 21.260581970214844, 30.422149658203125, 51.20057678222656, 22.19165802001953, 28.295875549316406, -3.8424758911132812, 53.729522705078125, 6.500028610229492, 22.156848907470703, 8.726470947265625, -34.88862609863281, 49.456275939941406, 27.387100219726562, 25.802474975585938, -8.019363403320312, 16.58490753173828, 19.7684326171875, -13.736007690429688, -8.357818603515625, 10.638275146484375, 27.34186553955078, -21.115554809570312, 32.92413330078125, 15.856742858886719, 29.808914184570312, 6.26307487487793, 12.262266159057617, 57.93544006347656, 53.598480224609375, 6.423820495605469, 11.660964965820312, 18.684906005859375, -0.26552581787109375, 18.05856704711914, 27.6500244140625, -4.185871124267578, -7.2203216552734375, 31.585525512695312, 39.302825927734375, -1.7478675842285156, 63.41138458251953, 19.67477798461914, 27.722923278808594, 52.07630920410156, 35.494136810302734, 6.062170028686523, 43.93440246582031, 42.930694580078125, 50.13933563232422, 26.51907730102539, 18.834320068359375, 11.936050415039062, 38.71586608886719, 23.712158203125, 14.39735221862793, 17.710617065429688, 40.85302734375, 20.295421600341797], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000618.npy"}
{"epoch": 0.9074889867841409, "step": 619, "batch_size": 64, "mean": 21.84807586669922, "std": 20.74909782409668, "min": -17.298980712890625, "p10": -2.5037832260131823, "median": 19.287914276123047, "p90": 49.61234741210937, "max": 63.76448059082031, "pos_frac": 0.84375, "sample": [-13.475250244140625, 41.715362548828125, 41.221282958984375, 49.82103729248047, 33.96897506713867, -3.4681549072265625, 9.806129455566406, 15.123191833496094, 49.597137451171875, 7.493471145629883, 55.691314697265625, 32.53797149658203, 19.26873016357422, 44.93077087402344, 15.34918212890625, 25.112403869628906, 49.618865966796875, 9.06191635131836, 39.888580322265625, 21.443885803222656, 2.8305206298828125, 16.827613830566406, 25.39117431640625, 56.49977111816406, 0.2252655029296875, -5.0251007080078125, -0.8907928466796875, 49.20939636230469, 7.53765869140625, 5.1342010498046875, 4.371517181396484, 27.71013641357422, 19.479930877685547, 31.916488647460938, 31.733306884765625, 12.8336181640625, 4.374763488769531, 5.532810211181641, 3.728668212890625, 16.336811065673828, 21.171119689941406, 62.935211181640625, -3.455596923828125, 36.727439880371094, 41.54383087158203, 14.04606819152832, -10.917850494384766, 47.85761260986328, 63.01336669921875, 19.307098388671875, -0.5895843505859375, 5.9039459228515625, 40.04070281982422, 63.76448059082031, 0.8090438842773438, 5.749595642089844, 32.65460968017578, -1.2147216796875, 32.01564025878906, 47.525634765625, 16.497848510742188, -17.298980712890625, -3.0562381744384766, 22.782058715820312], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000619.npy"}
{"epoch": 0.908957415565345, "step": 620, "batch_size": 64, "mean": 20.691162109375, "std": 22.892547607421875, "min": -20.747581481933594, "p10": -7.807159614562986, "median": 18.983428955078125, "p90": 46.075143432617196, "max": 75.55624389648438, "pos_frac": 0.8125, "sample": [6.99993896484375, 6.701023101806641, 58.22069549560547, 14.181114196777344, 32.08071517944336, 40.10150909423828, 29.988067626953125, -1.1601829528808594, 23.479949951171875, 21.29375457763672, 3.1089859008789062, -17.538177490234375, 0.28525543212890625, 41.0059814453125, 3.8099746704101562, -2.83856201171875, 18.522010803222656, 35.15678405761719, 25.752174377441406, 42.742584228515625, 16.45153045654297, 31.123050689697266, 0.9727516174316406, 39.28608703613281, -15.436561584472656, 13.682174682617188, 15.655281066894531, 16.91596221923828, 37.78800964355469, 10.980209350585938, -0.963470458984375, 46.74781799316406, 35.74147033691406, 75.55624389648438, 24.538299560546875, -20.747581481933594, 61.18279266357422, 9.233360290527344, 71.88229370117188, 30.34088134765625, 2.0941848754882812, 31.83155059814453, -3.3252029418945312, 13.459793090820312, 29.542125701904297, 21.437545776367188, 18.807540893554688, 44.50556945800781, 32.5462646484375, 5.186115264892578, -9.064123153686523, 42.73923110961914, 74.43286895751953, 11.329048156738281, 32.30596160888672, 68.57481384277344, 10.363883972167969, -9.640983581542969, 20.60010528564453, -11.763439178466797, -4.874244689941406, 19.159317016601562, -19.057022094726562, 20.219146728515625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000620.npy"}
{"epoch": 0.9104258443465492, "step": 621, "batch_size": 64, "mean": 21.058883666992188, "std": 21.956212997436523, "min": -23.538257598876953, "p10": -8.078889846801756, "median": 19.329200744628906, "p90": 50.298223876953145, "max": 99.00588989257812, "pos_frac": 0.8125, "sample": [23.055809020996094, 26.63064193725586, -23.538257598876953, 33.15995788574219, 29.426048278808594, -11.888534545898438, 36.04385757446289, 15.4827880859375, -2.2007827758789062, 8.085304260253906, 33.46949768066406, 14.612350463867188, 28.458656311035156, 55.87835693359375, -10.440376281738281, 15.073657989501953, 4.679729461669922, -8.572490692138672, 6.457183837890625, 9.544662475585938, 60.69255065917969, 27.456466674804688, 1.907501220703125, 22.21031951904297, 58.15742492675781, 35.769378662109375, 52.49437713623047, 1.76788330078125, 24.758010864257812, 29.259368896484375, -10.957542419433594, 33.584251403808594, 37.192596435546875, 8.614028930664062, 66.33032989501953, 27.405620574951172, 11.519697189331055, 54.66899108886719, 27.726451873779297, -0.5838165283203125, -6.927154541015625, 15.664260864257812, 15.935035705566406, 19.217788696289062, 18.51065444946289, 25.51651382446289, 32.85966491699219, 11.301097869873047, -13.255447387695312, 37.16163635253906, -1.2891273498535156, 10.899835586547852, 32.193695068359375, 37.3590087890625, -0.8718948364257812, -14.696380615234375, 28.201099395751953, 30.91534423828125, 17.674888610839844, 45.173866271972656, 99.00588989257812, 15.744083404541016, 19.44061279296875, 18.641551971435547], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000621.npy"}
{"epoch": 0.9118942731277533, "step": 622, "batch_size": 64, "mean": 23.937095642089844, "std": 18.859811782836914, "min": -12.807426452636719, "p10": 0.44628448486328137, "median": 24.67597007751465, "p90": 46.37722930908204, "max": 82.34443664550781, "pos_frac": 0.90625, "sample": [37.387088775634766, 15.908248901367188, 0.5750350952148438, 16.80413818359375, 0.39110565185546875, 6.299583435058594, 30.339584350585938, 28.135147094726562, 82.34443664550781, 40.54058837890625, 13.299018859863281, 35.79619598388672, 16.125606536865234, -2.6189117431640625, 12.004077911376953, 62.27892303466797, 18.84374237060547, 25.684234619140625, 1.90380859375, -1.9425506591796875, 14.865604400634766, 9.978202819824219, 9.162368774414062, 43.36627197265625, 38.095542907714844, 38.22509765625, 25.141216278076172, 21.160797119140625, 47.40730285644531, 10.914976119995117, 26.22924041748047, 42.50274658203125, 47.557289123535156, -1.965667724609375, 35.441184997558594, 48.524932861328125, 6.2437286376953125, 11.217727661132812, 57.035621643066406, 35.5860710144043, 43.973724365234375, 43.634124755859375, -2.2798004150390625, -9.448204040527344, 13.450996398925781, 9.230911254882812, 12.812515258789062, 2.8497161865234375, 27.42169189453125, 1.7991714477539062, 37.05276870727539, 24.210723876953125, 53.88722229003906, 36.15385055541992, 27.518035888671875, 20.75035858154297, 38.05939483642578, -12.807426452636719, 6.911617279052734, 30.38343048095703, 31.287731170654297, 18.624290466308594, 30.716094970703125, 38.99175262451172], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000622.npy"}
{"epoch": 0.9133627019089574, "step": 623, "batch_size": 64, "mean": 18.665071487426758, "std": 23.302227020263672, "min": -18.354862213134766, "p10": -6.1360630035400385, "median": 11.08549690246582, "p90": 52.496235656738286, "max": 72.63946533203125, "pos_frac": 0.828125, "sample": [24.650245666503906, 10.07220458984375, -15.14649772644043, 50.31615447998047, 58.96205139160156, 21.560630798339844, 5.347995758056641, 43.63629150390625, 31.96044158935547, 2.949615478515625, 52.57981872558594, 2.19384765625, -14.956623077392578, 12.684814453125, 46.28404998779297, -14.193492889404297, 72.63946533203125, 8.250732421875, 22.58452606201172, 0.802337646484375, 4.634071350097656, 23.080692291259766, 5.415914535522461, 8.812143325805664, -4.9686737060546875, -5.622650146484375, 40.87786102294922, 33.526920318603516, 19.15752410888672, 9.761985778808594, -12.210418701171875, 3.4381179809570312, 13.307403564453125, 0.5069999694824219, -6.252492904663086, 9.365747451782227, 50.99627685546875, 49.31055450439453, 9.842018127441406, 25.115066528320312, 44.683380126953125, -2.0194931030273438, 56.840843200683594, 11.583534240722656, -14.725784301757812, 9.96595573425293, 62.82197570800781, 0.6775970458984375, 2.1563358306884766, -18.354862213134766, 58.21720886230469, 61.53319549560547, 40.02738952636719, 12.110736846923828, 10.587459564208984, 52.30120849609375, 0.8094558715820312, -5.86439323425293, 2.111419677734375, 10.055206298828125, 49.068939208984375, 14.036712646484375, 21.35071563720703, 13.316194534301758], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000623.npy"}
{"epoch": 0.9148311306901615, "step": 624, "batch_size": 64, "mean": 26.686798095703125, "std": 23.140295028686523, "min": -19.71502685546875, "p10": -1.2031627655029293, "median": 26.90181541442871, "p90": 51.43482818603516, "max": 85.32415771484375, "pos_frac": 0.875, "sample": [39.53634262084961, 14.758697509765625, 35.139404296875, 17.795547485351562, -12.41843032836914, 27.692520141601562, 37.44151306152344, 35.63832092285156, 39.970458984375, 13.504814147949219, 44.85211181640625, -9.166641235351562, 44.58702850341797, 25.857322692871094, 80.99481201171875, 7.612518310546875, 38.815914154052734, -1.3468170166015625, 40.883544921875, 35.18488311767578, 23.791351318359375, 68.27626037597656, 43.15105438232422, 16.290119171142578, 30.44463348388672, 11.495155334472656, 71.07227325439453, 30.925193786621094, -8.997222900390625, 4.910499572753906, 27.39419174194336, 50.763763427734375, 37.68597412109375, 47.76500701904297, 31.861495971679688, -8.755950927734375, 41.213043212890625, 3.131183624267578, 26.409439086914062, 39.924407958984375, -9.062408447265625, 17.548667907714844, 13.311622619628906, 5.311256408691406, 47.63154602050781, 54.010955810546875, 16.406341552734375, 51.72242736816406, 27.696945190429688, 17.685134887695312, 13.667510986328125, -19.71502685546875, 17.63796043395996, 82.46754455566406, -0.8679695129394531, 3.6482772827148438, 1.407470703125, 14.35306167602539, 8.773876190185547, 28.334339141845703, 45.13966369628906, 16.686080932617188, 22.749900817871094, 85.32415771484375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000624.npy"}
{"epoch": 0.9162995594713657, "step": 625, "batch_size": 64, "mean": 20.588481903076172, "std": 18.202354431152344, "min": -7.8656005859375, "p10": -1.317710876464843, "median": 16.598236083984375, "p90": 45.1130096435547, "max": 66.02128601074219, "pos_frac": 0.84375, "sample": [-1.61102294921875, 17.567283630371094, 15.629188537597656, 41.61981964111328, 15.219490051269531, 34.674217224121094, 60.37019348144531, 36.06005859375, 9.465042114257812, 31.10155487060547, 13.223373413085938, -2.0966567993164062, 5.787334442138672, 24.89019775390625, 39.22796630859375, 1.9979400634765625, 7.724081039428711, -0.5624465942382812, 20.796951293945312, 32.92234802246094, 60.57322692871094, -1.9906082153320312, 8.451801300048828, 10.570674896240234, 1.080963134765625, 27.856353759765625, 13.43939208984375, 30.33511734008789, 10.545867919921875, -0.6333160400390625, 23.253623962402344, 29.70635986328125, 46.102813720703125, 20.817794799804688, 58.512184143066406, 10.771968841552734, 9.768714904785156, 26.278270721435547, 1.3814010620117188, 1.6700515747070312, -7.8656005859375, 21.373291015625, 31.080730438232422, 36.32946014404297, 25.482704162597656, 42.803466796875, -0.24275970458984375, 22.72126007080078, 40.13141632080078, 30.904273986816406, -1.6694984436035156, -7.450920104980469, -3.1158676147460938, 10.533935546875, 10.605125427246094, 66.02128601074219, 10.852554321289062, 32.56614685058594, 14.275836944580078, 11.575492858886719, 49.87528991699219, 20.883625030517578, 53.30088806152344, 14.191062927246094], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000625.npy"}
{"epoch": 0.9177679882525698, "step": 626, "batch_size": 64, "mean": 20.086833953857422, "std": 26.00185203552246, "min": -30.04027557373047, "p10": -5.8358329772949205, "median": 15.216438293457031, "p90": 57.061171722412126, "max": 100.76629638671875, "pos_frac": 0.75, "sample": [7.7662353515625, 27.969650268554688, -4.702911376953125, 5.833900451660156, -30.04027557373047, 50.8132438659668, 65.55868530273438, -21.100540161132812, 17.903297424316406, -20.183731079101562, 10.853973388671875, 36.78493118286133, 37.79839324951172, 26.989715576171875, -8.057144165039062, 21.08420181274414, 18.7313232421875, 37.141597747802734, 8.3116455078125, 1.5398445129394531, -2.82733154296875, 15.313735961914062, 33.044898986816406, 10.34981918334961, -8.7509765625, 18.34455108642578, 15.119140625, 10.348442077636719, 25.8642578125, 60.57086944580078, 6.801460266113281, 16.725196838378906, -6.286293029785156, -1.6057281494140625, 28.660472869873047, 27.581954956054688, 3.8251953125, 47.91636657714844, -0.8344917297363281, 14.105987548828125, -3.4365005493164062, -4.784759521484375, 67.1407470703125, 62.80615234375, 94.32026672363281, 30.72545623779297, 12.497150421142578, 38.31964111328125, 12.284873962402344, 59.25360107421875, -3.5170364379882812, 38.9246826171875, 20.725845336914062, 100.76629638671875, 30.57086944580078, -16.477554321289062, 37.575408935546875, 12.143417358398438, 9.232086181640625, -3.525989532470703, 31.884811401367188, 2.3442459106445312, 51.94550323486328, -1.4254531860351562], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000626.npy"}
{"epoch": 0.9192364170337739, "step": 627, "batch_size": 64, "mean": 20.532487869262695, "std": 19.594614028930664, "min": -20.13262176513672, "p10": -3.944833374023437, "median": 18.188408851623535, "p90": 50.57925109863281, "max": 60.72650909423828, "pos_frac": 0.84375, "sample": [17.06464385986328, 50.581787109375, 30.287567138671875, 37.327720642089844, 10.288307189941406, -7.0308990478515625, 31.333438873291016, 21.36077880859375, 40.28461456298828, 0.25684356689453125, 6.684730529785156, 23.719345092773438, 14.312454223632812, 18.227725982666016, -1.0868301391601562, 17.478561401367188, -5.576404571533203, 41.055633544921875, 29.480728149414062, 50.573333740234375, -20.13262176513672, 10.036140441894531, 18.26880645751953, 1.361673355102539, 54.65409851074219, -3.72247314453125, 56.159507751464844, 23.919723510742188, -1.3416404724121094, 16.979076385498047, 0.9833564758300781, 8.09366226196289, 23.70301055908203, 45.88360595703125, 18.214881896972656, 3.162628173828125, 39.21917724609375, 20.62899398803711, 48.889495849609375, -4.040130615234375, 9.841413497924805, 18.161935806274414, 52.65721130371094, 6.701835632324219, 35.9718017578125, -6.269569396972656, 47.420623779296875, 10.170791625976562, 17.74317169189453, 12.914230346679688, 50.746063232421875, 29.3370361328125, 31.23421287536621, -9.9222412109375, 3.0253334045410156, 18.359947204589844, 30.89226531982422, 15.226936340332031, 5.0574951171875, 56.64593505859375, 60.72650909423828, -9.8487548828125, 22.71080780029297, 17.02911949157715], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000627.npy"}
{"epoch": 0.920704845814978, "step": 628, "batch_size": 64, "mean": 22.091106414794922, "std": 18.051145553588867, "min": -20.922805786132812, "p10": 1.456163024902346, "median": 20.384613037109375, "p90": 44.90489730834961, "max": 73.02960205078125, "pos_frac": 0.90625, "sample": [23.477771759033203, -2.627969741821289, 21.35260772705078, 32.16552734375, 20.77600860595703, 18.86358642578125, 13.826160430908203, 7.48065185546875, 6.3812713623046875, 18.629779815673828, 27.721923828125, 7.5550689697265625, 13.337020874023438, 36.942115783691406, 7.2039794921875, 52.15055847167969, -0.4445667266845703, 19.99321746826172, 73.02960205078125, 12.653366088867188, 8.057220458984375, 10.28802490234375, 25.379966735839844, 3.6585845947265625, 15.776901245117188, 27.622604370117188, 45.31036376953125, 30.662506103515625, 18.91484832763672, 15.069168090820312, 31.602127075195312, 22.096641540527344, 35.03919982910156, 53.79583740234375, 43.95880889892578, 28.176406860351562, 16.74425506591797, 28.224586486816406, 26.428409576416016, 0.51226806640625, 10.531597137451172, 9.28460693359375, -4.796226501464844, 4.363655090332031, 65.87904357910156, 37.45086669921875, 52.26115417480469, -6.751796722412109, 5.077552795410156, 21.016494750976562, 5.171356201171875, 31.956710815429688, 38.73051452636719, 39.9620246887207, -20.922805786132812, 18.52642059326172, 55.477783203125, 10.437877655029297, 23.753128051757812, 14.15496826171875, 43.84584045410156, 35.141746520996094, 28.04730987548828, -2.5554428100585938], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000628.npy"}
{"epoch": 0.922173274596182, "step": 629, "batch_size": 64, "mean": 19.372583389282227, "std": 19.160127639770508, "min": -20.812103271484375, "p10": -3.4635856628417963, "median": 19.227191925048828, "p90": 43.037435150146486, "max": 60.623138427734375, "pos_frac": 0.8125, "sample": [37.54846954345703, 35.080810546875, 7.36939811706543, 26.32464599609375, 32.290122985839844, 23.469276428222656, 16.0364990234375, -1.3887176513671875, 21.761878967285156, 16.90420913696289, 30.123336791992188, 17.30169677734375, 51.53642272949219, 13.026870727539062, 35.384681701660156, -17.215408325195312, 2.6709747314453125, 31.420421600341797, 8.762765884399414, 27.663970947265625, -1.6834564208984375, 33.058006286621094, 39.78584289550781, 19.146865844726562, 3.1833267211914062, 43.191383361816406, 42.67822265625, -0.0370330810546875, 10.84661865234375, 6.298606872558594, 19.307518005371094, 7.2643280029296875, 60.623138427734375, 20.892967224121094, 26.363441467285156, -0.2130146026611328, -13.924484252929688, 21.007003784179688, 58.44673538208008, 22.16888427734375, 14.307605743408203, -2.939178466796875, 15.666618347167969, -15.354217529296875, 51.644439697265625, 4.622406005859375, 35.164161682128906, 16.973785400390625, 16.621292114257812, 2.0721988677978516, 29.371360778808594, 41.06669616699219, -3.6883316040039062, 46.10377502441406, -11.193971633911133, 57.43151092529297, 13.30935287475586, 24.175704956054688, 4.62054443359375, 26.761714935302734, 36.86836242675781, 28.255260467529297, -20.812103271484375, -5.6808929443359375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000629.npy"}
{"epoch": 0.9236417033773862, "step": 630, "batch_size": 64, "mean": 18.927364349365234, "std": 18.477148056030273, "min": -21.051101684570312, "p10": -5.853813171386716, "median": 16.529476165771484, "p90": 42.659754943847666, "max": 72.95353698730469, "pos_frac": 0.84375, "sample": [40.403778076171875, -8.961532592773438, 16.664737701416016, 19.93712615966797, 29.864730834960938, 43.77910614013672, 37.55170440673828, 14.3990478515625, 20.2069091796875, 12.686004638671875, 10.078041076660156, 72.95353698730469, 8.292442321777344, 57.511436462402344, 35.622650146484375, 15.079444885253906, -3.4212684631347656, 26.35504150390625, 18.86078643798828, 14.94765853881836, 12.824996948242188, 16.157440185546875, 1.7833938598632812, 19.757217407226562, -19.604515075683594, 47.93083190917969, 27.853530883789062, 43.62660217285156, 5.603550910949707, 21.422271728515625, -3.5200347900390625, 11.376235961914062, 15.217880249023438, -6.85400390625, 19.235977172851562, 17.632476806640625, -14.912513732910156, 28.298126220703125, 13.859413146972656, 13.59600830078125, -6.954780578613281, -2.388336181640625, 19.0780029296875, 14.584190368652344, 30.79583740234375, -21.051101684570312, 31.64911651611328, 40.09743881225586, -7.105955123901367, 11.166908264160156, 32.22618103027344, 30.39421844482422, 15.974174499511719, 17.78716278076172, 19.449813842773438, 12.934234619140625, 45.940032958984375, 14.187973022460938, 28.234020233154297, 15.617828369140625, 66.89314270019531, 6.545628547668457, 24.8050537109375, 16.394214630126953], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000630.npy"}
{"epoch": 0.9251101321585903, "step": 631, "batch_size": 64, "mean": 21.33681869506836, "std": 16.365066528320312, "min": -16.15012550354004, "p10": 1.8183223724365234, "median": 21.426361083984375, "p90": 38.96267776489258, "max": 67.88101196289062, "pos_frac": 0.90625, "sample": [37.0301513671875, 20.123565673828125, 21.932144165039062, 33.93989562988281, 25.86041259765625, 24.783409118652344, 33.46820068359375, 39.363616943359375, -16.15012550354004, 23.47662353515625, 66.5853271484375, 47.22605895996094, 7.1824798583984375, 25.877304077148438, 16.89501953125, 12.778640747070312, 28.644119262695312, 34.430267333984375, 19.802021026611328, 11.876140594482422, 9.579391479492188, 1.8348274230957031, 4.7167816162109375, 31.887649536132812, 17.562042236328125, 24.116043090820312, 17.97779083251953, 13.339420318603516, 22.013748168945312, 18.383132934570312, 18.41509246826172, 19.408493041992188, 31.419944763183594, 7.185455322265625, 24.772796630859375, 67.88101196289062, 3.3759994506835938, 30.93280029296875, 18.815279006958008, 49.485328674316406, 4.637107849121094, -2.4441280364990234, 28.652175903320312, 38.02715301513672, 31.583534240722656, 1.811248779296875, 34.91852951049805, 23.796401977539062, 31.386199951171875, 20.920578002929688, 5.8490142822265625, -1.6877098083496094, 13.614208221435547, 9.05584716796875, -13.140823364257812, -3.4785423278808594, 30.301971435546875, 47.126007080078125, 39.828704833984375, -1.4257640838623047, 26.98462677001953, 27.972579956054688, 3.190093994140625, 19.849098205566406], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000631.npy"}
{"epoch": 0.9265785609397944, "step": 632, "batch_size": 64, "mean": 24.196102142333984, "std": 21.422731399536133, "min": -21.47931671142578, "p10": -1.9997390747070307, "median": 21.06170654296875, "p90": 54.75511131286622, "max": 82.88389587402344, "pos_frac": 0.875, "sample": [36.59480667114258, -2.25836181640625, 6.059776306152344, 13.566963195800781, 27.299137115478516, 21.391754150390625, 12.66546630859375, 59.85862731933594, 68.18222045898438, 39.1071891784668, 47.496376037597656, 7.43609619140625, 16.128902435302734, 8.213623046875, 20.734710693359375, 20.49478530883789, 24.327224731445312, 53.30021286010742, -6.038225173950195, 16.49449920654297, 33.12298583984375, 21.388702392578125, 18.4583740234375, 30.954273223876953, 6.215263366699219, 82.88389587402344, 59.16502380371094, 18.552001953125, 34.59467315673828, -4.2138214111328125, -7.7691192626953125, -12.628616333007812, 63.43096923828125, 40.55223083496094, 0.8913116455078125, 20.440532684326172, 13.697250366210938, 30.570594787597656, 18.061737060546875, 11.658538818359375, 43.45482635498047, 35.72743225097656, 26.566055297851562, 36.13390350341797, 13.972900390625, 33.08909606933594, 24.76076316833496, 18.925182342529297, 40.33741760253906, 81.14155578613281, 55.378639221191406, 27.527450561523438, 11.49909782409668, 13.369590759277344, 13.703258514404297, 23.790746688842773, 1.0861968994140625, -1.3962860107421875, -21.47931671142578, 29.31903076171875, 31.356914520263672, 17.95774269104004, -9.57284927368164, 30.81854248046875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000632.npy"}
{"epoch": 0.9280469897209985, "step": 633, "batch_size": 64, "mean": 17.65105438232422, "std": 22.972837448120117, "min": -12.46099853515625, "p10": -5.015203666687011, "median": 12.859745025634766, "p90": 38.62502517700196, "max": 114.8394775390625, "pos_frac": 0.8125, "sample": [4.08123779296875, 18.688636779785156, 5.439048767089844, 4.6600494384765625, -10.719528198242188, 13.854789733886719, 12.565818786621094, -2.48809814453125, 114.8394775390625, 38.987548828125, 21.216609954833984, -8.4765625, 51.29218292236328, 5.637626647949219, 21.942718505859375, 27.044296264648438, 34.06935119628906, 34.65995788574219, 8.55267333984375, 21.264984130859375, -12.170829772949219, 58.669677734375, 45.060546875, 0.7218017578125, 21.482879638671875, 31.776504516601562, 0.7220344543457031, 11.756671905517578, 1.360870361328125, 18.893447875976562, 0.8456649780273438, 22.02587890625, 32.18282699584961, 23.13280487060547, -2.1094970703125, -3.6035499572753906, -7.507537841796875, -12.46099853515625, -10.6478271484375, 22.06365203857422, 52.2679443359375, 5.237098693847656, 37.779136657714844, 30.514373779296875, 19.13225555419922, 19.586219787597656, 11.919876098632812, 22.3743896484375, 97.431396484375, 8.084320068359375, -5.127769470214844, 26.77874755859375, -4.75255012512207, 36.277870178222656, 12.012039184570312, -3.560272216796875, 9.943641662597656, 12.49212646484375, 6.050327301025391, 34.318115234375, 13.153671264648438, 18.730331420898438, 2.5182266235351562, 7.198169708251953], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000633.npy"}
{"epoch": 0.9295154185022027, "step": 634, "batch_size": 64, "mean": 20.120935440063477, "std": 18.549516677856445, "min": -13.786827087402344, "p10": 3.4292970657348634, "median": 17.745800018310547, "p90": 39.28057746887209, "max": 85.515380859375, "pos_frac": 0.90625, "sample": [15.499649047851562, 10.10906982421875, 10.654857635498047, 24.2362060546875, 15.67844009399414, 17.31450653076172, 3.585803985595703, 21.925933837890625, 25.20025634765625, 52.357269287109375, 12.273605346679688, 35.45893859863281, 18.369430541992188, 6.383880615234375, 85.515380859375, 31.40679168701172, 20.741661071777344, 32.59962463378906, 18.866470336914062, 7.180389404296875, 12.90167236328125, 4.478507995605469, 18.706398010253906, 8.300567626953125, 24.587890625, 12.968648910522461, 31.190200805664062, 24.817733764648438, 53.352508544921875, 12.682554244995117, 80.13629913330078, 3.362222671508789, 4.7242889404296875, -7.588706970214844, 26.797576904296875, 24.801498413085938, 54.517333984375, 16.627044677734375, 45.34987258911133, -13.786827087402344, -11.031898498535156, 30.635986328125, 9.065494537353516, 40.91842269897461, 11.18069839477539, 26.95452880859375, 22.229393005371094, 15.08563232421875, 10.975105285644531, 31.951034545898438, -0.6270637512207031, 33.66535949707031, -8.87542724609375, 6.381834030151367, 13.604507446289062, 21.170814514160156, 12.746517181396484, 34.858734130859375, -7.597648620605469, 4.40460205078125, 28.32964324951172, 33.68797302246094, 5.563114166259766, 18.177093505859375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000634.npy"}
{"epoch": 0.9309838472834068, "step": 635, "batch_size": 64, "mean": 19.97067642211914, "std": 20.720016479492188, "min": -21.739587783813477, "p10": 0.9388870239257814, "median": 17.71721076965332, "p90": 51.11462860107422, "max": 76.98190307617188, "pos_frac": 0.90625, "sample": [26.97720718383789, 1.6850051879882812, 55.136871337890625, -6.755035400390625, 41.08014678955078, 23.473594665527344, 2.6075267791748047, 11.787551879882812, 27.635528564453125, 14.520225524902344, -1.43914794921875, 23.8072509765625, 23.163341522216797, 25.15375518798828, 51.50151062011719, 21.98830795288086, 4.4714508056640625, 50.280120849609375, 26.445449829101562, 40.11744689941406, 6.453758239746094, 6.562385559082031, 34.29510498046875, 1.3312835693359375, 20.619003295898438, 0.88623046875, 8.681182861328125, 29.1566162109375, 71.32028198242188, 51.47227478027344, 4.216266632080078, 11.82537841796875, -4.580535888671875, 76.98190307617188, 22.227529525756836, 8.256576538085938, 12.232402801513672, 7.519176483154297, 13.201431274414062, 19.006813049316406, 17.990264892578125, 62.68510818481445, 30.138473510742188, 27.56183624267578, 37.848663330078125, 1.5982437133789062, 19.760677337646484, 23.413124084472656, 1.7464599609375, 8.43524169921875, 46.685203552246094, 1.0617523193359375, -19.931137084960938, 31.918006896972656, 12.795440673828125, 6.739291191101074, 17.444156646728516, -14.000846862792969, 13.389366149902344, 66.15069580078125, -21.739587783813477, 8.083625793457031, 19.862457275390625, 13.183502197265625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000635.npy"}
{"epoch": 0.9324522760646109, "step": 636, "batch_size": 64, "mean": 19.708370208740234, "std": 19.19512176513672, "min": -18.12310028076172, "p10": -6.586960220336914, "median": 19.933263778686523, "p90": 43.86399688720703, "max": 65.3553695678711, "pos_frac": 0.796875, "sample": [10.053743362426758, 25.046810150146484, -1.965972900390625, 32.99919128417969, 19.426212310791016, 1.37445068359375, 30.224151611328125, -6.883905410766602, 12.727615356445312, -6.607780456542969, -7.2044830322265625, 25.617591857910156, 12.231136322021484, 30.4610538482666, 31.55675506591797, 29.881668090820312, -0.4707489013671875, 48.50835418701172, -2.716583251953125, -11.766510009765625, -10.860054016113281, 19.792858123779297, 12.869865417480469, 5.966531753540039, 43.976661682128906, 10.15509033203125, 21.428817749023438, 49.200042724609375, 30.5001220703125, 35.23967742919922, 19.01703643798828, -6.538379669189453, 6.221889495849609, 29.618759155273438, 24.57354736328125, 11.516433715820312, 43.601112365722656, 17.559036254882812, 24.463157653808594, 20.07366943359375, 29.545120239257812, -0.5775489807128906, 35.945220947265625, 45.430152893066406, 10.658672332763672, 34.87071228027344, -4.271278381347656, 39.243186950683594, 8.961980819702148, -8.795692443847656, 23.01006317138672, -18.12310028076172, 34.05213928222656, 41.38725280761719, 38.220054626464844, 54.61751937866211, 65.3553695678711, 65.34619140625, 31.673744201660156, 10.192371368408203, 3.5263671875, 7.626045227050781, 8.126819610595703, 24.445755004882812], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000636.npy"}
{"epoch": 0.933920704845815, "step": 637, "batch_size": 64, "mean": 22.652591705322266, "std": 23.24656105041504, "min": -36.3198356628418, "p10": -4.254946136474609, "median": 20.63357925415039, "p90": 55.10420150756836, "max": 72.6293716430664, "pos_frac": 0.796875, "sample": [8.045639038085938, 15.510597229003906, 34.45841979980469, 30.427196502685547, 24.570159912109375, 72.6293716430664, -12.440128326416016, -9.650081634521484, 48.975013732910156, 53.764320373535156, -12.664093017578125, -2.280649185180664, -3.9316329956054688, -0.601531982421875, 15.744037628173828, 71.27281188964844, 14.652584075927734, -9.509330749511719, 55.678436279296875, 47.99168395996094, 15.418792724609375, 21.702327728271484, 24.524612426757812, 39.897544860839844, 18.370941162109375, 35.16950988769531, 6.190521240234375, 33.466522216796875, 29.015472412109375, 20.433086395263672, -4.3935089111328125, 21.318883895874023, 15.287681579589844, 19.760711669921875, 66.58314514160156, -1.8320999145507812, 23.970458984375, 7.716361999511719, 24.8780574798584, 20.571556091308594, 52.738624572753906, 49.92058563232422, 28.6357421875, 48.883155822753906, -8.501091003417969, 43.00691223144531, 18.012420654296875, 13.641502380371094, 64.04969024658203, -1.8994598388671875, 27.44513702392578, -36.3198356628418, 20.28271484375, 56.25341796875, -1.1769695281982422, 4.20880126953125, 13.258960723876953, 69.49569702148438, 24.825023651123047, 22.000946044921875, 0.4892005920410156, 20.695602416992188, 31.42730712890625, 7.698406219482422], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000637.npy"}
{"epoch": 0.9353891336270191, "step": 638, "batch_size": 64, "mean": 25.182167053222656, "std": 19.49693489074707, "min": -9.282859802246094, "p10": 4.100817108154298, "median": 21.998828887939453, "p90": 46.48610229492188, "max": 99.88250732421875, "pos_frac": 0.96875, "sample": [99.88250732421875, 38.08636474609375, 43.054351806640625, 30.05463409423828, 26.823944091796875, 11.486068725585938, 22.16130828857422, 21.027931213378906, 24.22306251525879, 5.567867279052734, 38.11857604980469, 46.71265411376953, 55.36106872558594, 53.790809631347656, 21.069076538085938, 8.384906768798828, 0.47641563415527344, 21.343894958496094, 14.911922454833984, 9.536632537841797, 13.083061218261719, 13.882476806640625, 47.31364440917969, 23.206180572509766, 15.556808471679688, 43.598052978515625, 26.160011291503906, -9.282859802246094, 88.87786865234375, 26.02153778076172, 32.22812271118164, 21.836349487304688, 14.685661315917969, 10.462013244628906, 37.26737976074219, 31.0189208984375, 52.033233642578125, 7.631015777587891, 12.894546508789062, 20.97838592529297, 1.539499282836914, 37.075164794921875, 38.48291015625, 40.40705108642578, 1.8736495971679688, 2.2532520294189453, 36.843292236328125, 45.957481384277344, 5.129096984863281, 14.972976684570312, 14.069616317749023, -0.1616344451904297, 26.078750610351562, 26.214187622070312, 18.596450805664062, 27.26708984375, 11.510711669921875, 34.890018463134766, 29.853904724121094, 39.82289123535156, 6.928825378417969, 21.57904052734375, 3.660125732421875, 5.287834167480469], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000638.npy"}
{"epoch": 0.9368575624082232, "step": 639, "batch_size": 64, "mean": 15.845381736755371, "std": 18.75926971435547, "min": -35.61658477783203, "p10": -5.561175918579101, "median": 15.572582244873047, "p90": 39.91725234985352, "max": 52.19554901123047, "pos_frac": 0.828125, "sample": [27.558364868164062, 8.586692810058594, 40.666847229003906, 49.63544464111328, 13.015007019042969, 18.2274169921875, 29.031646728515625, 38.16819763183594, 25.907257080078125, 10.123790740966797, 13.937881469726562, -5.812335968017578, -4.01983642578125, -19.85260009765625, 35.1385498046875, 50.58247375488281, 0.7767181396484375, -23.41650390625, 11.23740005493164, 20.743812561035156, 27.77245330810547, -11.853851318359375, 15.80819320678711, 23.238298416137695, 33.618560791015625, 13.77730941772461, -5.998542785644531, 28.173202514648438, 35.783592224121094, 28.829360961914062, 28.087905883789062, 52.19554901123047, 51.30480194091797, -3.765026092529297, 29.62933349609375, 23.086135864257812, 17.14478302001953, 13.730316162109375, 10.5638427734375, -35.61658477783203, 16.17681884765625, 24.865829467773438, 10.726600646972656, 2.3648681640625, 46.14683532714844, -4.975135803222656, 19.338897705078125, 5.6757049560546875, 4.6696319580078125, 11.730072021484375, 12.873289108276367, 0.8890228271484375, 15.752616882324219, 5.218971252441406, -19.23918914794922, 2.0456161499023438, -3.798675537109375, 17.461410522460938, 1.4905319213867188, 49.03717803955078, 8.748165130615234, 27.272727966308594, 28.49420166015625, 15.392547607421875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000639.npy"}
{"epoch": 0.9383259911894273, "step": 640, "batch_size": 64, "mean": 24.32077407836914, "std": 21.088542938232422, "min": -38.58811950683594, "p10": -2.649780464172361, "median": 22.70906639099121, "p90": 50.53076171875, "max": 83.19180297851562, "pos_frac": 0.875, "sample": [10.98868179321289, 44.99676513671875, -5.110208511352539, 15.809013366699219, 2.768522262573242, 31.662246704101562, 29.057010650634766, 22.896770477294922, 31.628067016601562, 15.797477722167969, 29.34002685546875, 36.42909240722656, 21.31472396850586, 21.476947784423828, 37.596702575683594, 23.242319107055664, 35.17481994628906, -38.58811950683594, 1.7982864379882812, -9.39493179321289, 10.763641357421875, 12.81097412109375, 27.020599365234375, 13.28143310546875, 19.58185577392578, 7.011112213134766, 16.809066772460938, 49.33836364746094, 66.89820861816406, 22.5213623046875, 61.83119201660156, 61.302791595458984, 15.813316345214844, 17.69540023803711, 52.224365234375, -0.1536846160888672, -6.8588409423828125, 25.841243743896484, 16.38775634765625, 16.817626953125, 50.01377868652344, 23.782798767089844, 35.990570068359375, 36.65076446533203, 25.74818992614746, 43.59446716308594, 32.61766815185547, 18.1566162109375, -6.234844207763672, 50.312652587890625, 83.19180297851562, 31.727127075195312, -6.750221252441406, 37.34808349609375, -3.7195358276367188, 14.415122985839844, 40.178001403808594, 16.81816864013672, 20.054092407226562, 50.624237060546875, 1.06640625, 57.44810485839844, 29.821853637695312, 7.851551055908203], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000640.npy"}
{"epoch": 0.9397944199706314, "step": 641, "batch_size": 64, "mean": 23.554304122924805, "std": 19.85858154296875, "min": -15.963432312011719, "p10": -1.5510292053222638, "median": 24.268157958984375, "p90": 45.979067993164065, "max": 89.68121337890625, "pos_frac": 0.890625, "sample": [7.814414978027344, 36.211483001708984, 11.148200988769531, 46.195228576660156, -15.782249450683594, -11.28369140625, 39.4256591796875, 38.87202072143555, 11.00250244140625, 0.29961395263671875, 4.663066864013672, 25.31305694580078, 17.706436157226562, 25.88031768798828, 89.68121337890625, 26.109771728515625, 31.52584457397461, 11.71978759765625, 6.298187255859375, 12.278396606445312, 38.70838928222656, 4.7512969970703125, 19.466522216796875, 61.493621826171875, 23.4710693359375, 5.7572479248046875, -6.955821990966797, 28.89765167236328, 25.140262603759766, 57.32810974121094, 39.36769104003906, 32.44780731201172, 23.470069885253906, -4.01739501953125, 25.57122802734375, 32.88566970825195, 25.06524658203125, 40.34761047363281, 8.531707763671875, 1.8321914672851562, 22.269733428955078, -15.032752990722656, 20.52985382080078, 17.30500030517578, 19.056976318359375, -15.963432312011719, -2.3441619873046875, 28.172012329101562, 18.009300231933594, 31.466388702392578, 46.351356506347656, 55.81580352783203, 20.51019287109375, 35.95073699951172, 48.20561218261719, 17.41094970703125, 14.904708862304688, 26.828811645507812, 40.25910949707031, 11.423248291015625, 35.790992736816406, 45.474693298339844, 43.304351806640625, 43.13653564453125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000641.npy"}
{"epoch": 0.9412628487518355, "step": 642, "batch_size": 64, "mean": 21.451595306396484, "std": 22.045013427734375, "min": -17.85162353515625, "p10": -1.7600814819335922, "median": 18.389652252197266, "p90": 46.37595367431641, "max": 95.566650390625, "pos_frac": 0.875, "sample": [41.98728942871094, 12.665840148925781, 40.601783752441406, 1.4406661987304688, 23.761962890625, 9.783912658691406, 58.05055236816406, -8.363693237304688, 12.969795227050781, 6.769020080566406, -2.3872222900390625, 40.133209228515625, 10.020416259765625, -7.6124420166015625, 24.73004150390625, 31.799270629882812, 18.768707275390625, 19.03795623779297, -0.2967529296875, 18.393959045410156, -12.62459945678711, -7.600074768066406, 4.000843048095703, 6.599948883056641, 8.362083435058594, 42.26383972167969, 43.30097198486328, 13.237682342529297, 5.4135894775390625, 15.89013671875, 24.33448028564453, 34.06028366088867, 43.96794128417969, 38.573097229003906, 27.780174255371094, 69.178955078125, 55.75188446044922, 38.60285186767578, 17.623016357421875, 1.9725761413574219, 55.652427673339844, 17.76129150390625, 25.108245849609375, 3.2825775146484375, 3.211193084716797, 18.385345458984375, 19.527973175048828, 42.27979278564453, 26.447189331054688, 7.3456268310546875, 95.566650390625, 2.3994483947753906, 0.05062103271484375, 19.97454833984375, 1.972869873046875, -4.596529006958008, 0.9661407470703125, 68.83297729492188, 42.77943420410156, 10.689231872558594, 31.245941162109375, 47.407958984375, -17.85162353515625, 31.518722534179688], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000642.npy"}
{"epoch": 0.9427312775330396, "step": 643, "batch_size": 64, "mean": 19.87814712524414, "std": 16.777467727661133, "min": -13.199287414550781, "p10": -0.9719884872436519, "median": 19.20635986328125, "p90": 36.87419776916504, "max": 84.90997314453125, "pos_frac": 0.859375, "sample": [35.53038024902344, 42.63062286376953, 32.19108200073242, 19.288467407226562, 16.576919555664062, 26.846641540527344, 28.445297241210938, 28.40601348876953, -8.283138275146484, 25.38921356201172, -5.553495407104492, 3.1128005981445312, 0.7045478820800781, 35.578346252441406, 12.106742858886719, 38.96250534057617, 39.05897521972656, 16.3087158203125, 33.635597229003906, 37.42926025390625, 26.870010375976562, 20.141443252563477, 13.374197006225586, -0.4562263488769531, -1.2002124786376953, 7.864429473876953, 35.57905197143555, 32.74851989746094, 42.64900207519531, 9.712265014648438, -13.199287414550781, 26.965312957763672, -6.618507385253906, 29.70733642578125, 55.8497314453125, 32.36779022216797, -0.6256561279296875, 17.596092224121094, 15.860694885253906, 19.124252319335938, 24.937828063964844, 29.880355834960938, 84.90997314453125, 11.557037353515625, 33.956207275390625, 26.376914978027344, -1.1204166412353516, 11.305679321289062, 25.117279052734375, 6.828417778015137, 19.890731811523438, 17.300064086914062, 10.107250213623047, 30.640625, 9.238525390625, 1.7201690673828125, 17.79248046875, 17.2530517578125, 26.82171630859375, 9.688278198242188, 5.244850158691406, 17.81616973876953, -9.149070739746094, 21.411643981933594], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000643.npy"}
{"epoch": 0.9441997063142438, "step": 644, "batch_size": 64, "mean": 17.589366912841797, "std": 18.404930114746094, "min": -19.893150329589844, "p10": -6.52930908203125, "median": 16.63086700439453, "p90": 44.41327514648438, "max": 64.86990356445312, "pos_frac": 0.828125, "sample": [58.5675048828125, 22.254215240478516, 45.671173095703125, -19.893150329589844, -15.455619812011719, 43.32284164428711, -9.009452819824219, 28.582191467285156, 4.134666442871094, 14.55131721496582, 10.311058044433594, 16.765579223632812, 3.717926025390625, 23.13058853149414, 33.31165313720703, 44.74365234375, 5.654991149902344, 1.96746826171875, 9.297332763671875, 23.655263900756836, 20.648529052734375, 4.863533020019531, 17.112831115722656, -0.9102401733398438, 21.0928955078125, 22.332275390625, -7.403392791748047, 12.658573150634766, 30.321006774902344, 43.64239501953125, 16.49615478515625, 14.69057846069336, 10.877548217773438, 11.560386657714844, 23.42162322998047, 8.868927001953125, 35.762786865234375, 45.92926025390625, 10.822395324707031, 27.845993041992188, -6.7188873291015625, 25.00341796875, 45.682830810546875, 0.3774833679199219, 17.217819213867188, 64.86990356445312, 16.22650146484375, -6.0869598388671875, 7.4376678466796875, 49.439697265625, 24.914825439453125, 23.888168334960938, -0.8747329711914062, -19.794662475585938, 25.503997802734375, 38.254859924316406, 24.17047119140625, 13.89027214050293, 21.8155517578125, -3.9388351440429688, 11.80172348022461, 37.99510955810547, 6.916640281677246, -8.18865966796875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000644.npy"}
{"epoch": 0.9456681350954479, "step": 645, "batch_size": 64, "mean": 16.623207092285156, "std": 20.50801658630371, "min": -22.80914306640625, "p10": -8.358413696289062, "median": 14.351615905761719, "p90": 44.52913475036621, "max": 78.72527313232422, "pos_frac": 0.796875, "sample": [-9.42990493774414, 30.345108032226562, 14.366165161132812, -6.6788787841796875, 11.369888305664062, -22.80914306640625, 18.912322998046875, 36.180137634277344, 44.72975540161133, 4.018243789672852, 16.577335357666016, 5.9993438720703125, 34.694427490234375, -3.3755111694335938, 7.566425323486328, 11.077651977539062, 1.9593353271484375, 29.335472106933594, 44.06101989746094, 14.788291931152344, 31.74643325805664, -8.185432434082031, 16.213088989257812, 23.152679443359375, 7.825412750244141, 12.267623901367188, 2.1622467041015625, -11.128524780273438, -5.990623474121094, 12.180976867675781, 1.1187705993652344, 1.1151084899902344, 14.3016357421875, 30.361183166503906, 50.286651611328125, 8.796966552734375, 19.656944274902344, 45.45570373535156, 4.224800109863281, -8.432548522949219, 78.72527313232422, 14.337066650390625, -14.594131469726562, -8.842498779296875, 27.44244384765625, 76.11154174804688, 30.172882080078125, 16.09203338623047, 28.67181396484375, -0.05121612548828125, 19.20122528076172, 13.0045166015625, 26.351425170898438, -0.7299156188964844, 31.73382568359375, 59.60111999511719, 24.020557403564453, 21.073577880859375, -17.51043701171875, 45.71765899658203, 21.02541160583496, 5.55272102355957, 22.561058044433594, 13.400688171386719], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000645.npy"}
{"epoch": 0.947136563876652, "step": 646, "batch_size": 64, "mean": 21.496368408203125, "std": 15.187257766723633, "min": -7.578216552734375, "p10": 2.961273193359376, "median": 20.773672103881836, "p90": 41.218052291870116, "max": 57.13059997558594, "pos_frac": 0.90625, "sample": [35.3574333190918, 25.968727111816406, 16.794645309448242, 50.20130157470703, 3.864391326904297, 5.9004364013671875, 41.6080322265625, 33.934146881103516, 23.0272216796875, 36.34413146972656, 18.05329132080078, 33.107696533203125, 35.93219757080078, 7.434234619140625, 36.97467041015625, 20.814861297607422, 18.75265884399414, 26.580047607421875, 24.039276123046875, 57.13059997558594, -7.578216552734375, 17.075742721557617, 7.778388977050781, 25.81315803527832, 13.06472396850586, 34.00232696533203, 20.73248291015625, 22.982322692871094, 9.609394073486328, 26.063453674316406, 10.884635925292969, 25.0089111328125, 13.837020874023438, 18.741073608398438, 26.01013946533203, 13.21282958984375, -3.928924560546875, 44.05207061767578, 34.975929260253906, -5.148006439208984, 24.70184326171875, 5.313438415527344, 7.854351043701172, -3.661144256591797, -3.9709396362304688, 2.5742225646972656, 19.241531372070312, 16.59774398803711, 46.716217041015625, 37.08214569091797, 27.01502227783203, 41.384761810302734, 40.829063415527344, 6.8828125, 12.328483581542969, 17.883445739746094, 21.59748077392578, 20.205184936523438, 18.490283966064453, -5.835624694824219, 6.2880401611328125, 37.152740478515625, 25.28863525390625, 54.798370361328125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000646.npy"}
{"epoch": 0.9486049926578561, "step": 647, "batch_size": 64, "mean": 17.998077392578125, "std": 20.022048950195312, "min": -18.15106201171875, "p10": -9.689571571350097, "median": 16.531667709350586, "p90": 45.19423370361329, "max": 65.80886840820312, "pos_frac": 0.78125, "sample": [1.1552734375, 26.612876892089844, 38.92396545410156, -3.2757110595703125, 48.345314025878906, 53.90093994140625, 18.59284210205078, -10.633613586425781, 13.953414916992188, -10.465621948242188, 37.01051330566406, -3.833160400390625, 6.084259033203125, 36.053253173828125, 19.82640838623047, 35.76673126220703, 13.714027404785156, 24.498245239257812, 57.17102813720703, 5.526165008544922, 28.790435791015625, -8.849214553833008, 31.773052215576172, -0.6095085144042969, 26.528114318847656, 53.09761047363281, 7.035125732421875, 65.80886840820312, -2.4831085205078125, 56.4264030456543, 34.20135498046875, 15.621597290039062, 13.320610046386719, 7.481109619140625, 24.60387420654297, -0.25382232666015625, -18.15106201171875, 39.46821975708008, 9.756256103515625, -10.781768798828125, 1.1853828430175781, 22.973968505859375, 27.35485076904297, 0.7796630859375, -10.049724578857422, 4.521404266357422, 17.706459045410156, 7.400455474853516, 6.522300720214844, 37.03703308105469, 7.639554977416992, 27.372211456298828, 46.735565185546875, 41.59779357910156, 19.34033203125, 29.153121948242188, -15.3304443359375, 15.468551635742188, -0.22377777099609375, 26.059295654296875, 11.770843505859375, 38.307838439941406, -10.598922729492188, 17.44173812866211], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000647.npy"}
{"epoch": 0.9500734214390602, "step": 648, "batch_size": 64, "mean": 24.0517578125, "std": 20.774396896362305, "min": -17.089797973632812, "p10": 0.5939292907714848, "median": 20.7905216217041, "p90": 50.16783294677735, "max": 85.24116516113281, "pos_frac": 0.90625, "sample": [-0.018720626831054688, 8.046432495117188, 12.808349609375, 42.5096435546875, 10.574016571044922, 45.69243621826172, 42.70808410644531, 56.75770568847656, 1.931671142578125, 15.844139099121094, 11.616493225097656, 6.977497100830078, 3.972015380859375, 39.61065673828125, 26.1524658203125, 26.302776336669922, 60.2899284362793, 10.265457153320312, 70.78201293945312, 1.033538818359375, 0.40552520751953125, 5.8561553955078125, 1.0874710083007812, 28.07073974609375, 10.69329833984375, 25.51563262939453, 50.993194580078125, 40.884681701660156, 33.21747589111328, 38.6031494140625, 24.329879760742188, -5.906852722167969, -17.089797973632812, 85.24116516113281, 30.985206604003906, 1.3386516571044922, 7.529472351074219, 13.813346862792969, 18.39873504638672, -4.429412841796875, -4.2186279296875, 17.771316528320312, 36.57640838623047, 46.29949951171875, 27.053844451904297, 39.35444641113281, 61.067771911621094, 4.4954376220703125, 17.26647186279297, 23.182308197021484, 41.014976501464844, 24.355178833007812, 11.494792938232422, 12.774726867675781, 35.87098693847656, -0.06928062438964844, 46.717803955078125, 48.24198913574219, 17.46529769897461, 53.40806579589844, 13.472923278808594, 38.60857391357422, 12.565071105957031, 31.148223876953125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000648.npy"}
{"epoch": 0.9515418502202643, "step": 649, "batch_size": 64, "mean": 19.334102630615234, "std": 16.70734977722168, "min": -14.249099731445312, "p10": -2.528314781188964, "median": 19.278635025024414, "p90": 40.49261322021485, "max": 59.89764404296875, "pos_frac": 0.875, "sample": [22.424423217773438, 47.7528076171875, 16.473224639892578, 1.5218048095703125, 44.9428596496582, 5.802297592163086, -5.40325927734375, 4.161659240722656, 27.3961181640625, 3.7256317138671875, 20.66083526611328, -12.842987060546875, -10.080326080322266, -10.296249389648438, 33.39369201660156, 26.12696075439453, 12.697616577148438, 36.27229309082031, 10.140125274658203, 12.148185729980469, 29.49144744873047, 30.735492706298828, 0.4584064483642578, 0.4654560089111328, 22.97185516357422, 7.926143646240234, 32.533721923828125, 22.955612182617188, 4.62054443359375, 17.815902709960938, 6.643379211425781, -14.249099731445312, 18.638214111328125, 17.105815887451172, 18.862594604492188, 15.698699951171875, 19.262222290039062, 26.804698944091797, 23.245563507080078, 29.659866333007812, 32.58368682861328, 16.932968139648438, -1.7584667205810547, 40.947296142578125, 45.732383728027344, 14.776321411132812, 59.89764404296875, 36.86663818359375, 29.764068603515625, 23.343631744384766, 3.302400588989258, 36.52928161621094, 9.478271484375, 44.08782958984375, 39.18381881713867, 39.43168640136719, 41.71333312988281, -2.8582496643066406, 1.5568962097167969, 32.72235870361328, 31.666412353515625, 28.436676025390625, -4.9136505126953125, 19.295047760009766], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000649.npy"}
{"epoch": 0.9530102790014684, "step": 650, "batch_size": 64, "mean": 21.35319709777832, "std": 20.0443172454834, "min": -28.75732421875, "p10": -3.080380439758301, "median": 20.26061248779297, "p90": 45.10958480834962, "max": 94.05596923828125, "pos_frac": 0.859375, "sample": [13.792716979980469, 13.126518249511719, 13.621177673339844, 57.674583435058594, 30.61859130859375, 24.090057373046875, 21.349674224853516, -4.095512390136719, 44.133567810058594, 21.68970489501953, 64.63552856445312, 0.920166015625, 24.559112548828125, 8.910919189453125, 28.330787658691406, 47.88916778564453, 9.135330200195312, 38.80316162109375, 19.727981567382812, 46.86100769042969, 94.05596923828125, 32.02112579345703, 38.591270446777344, 28.95606231689453, 36.184383392333984, 11.319643020629883, -3.7557373046875, 1.2237319946289062, 17.756698608398438, 13.565109252929688, 32.32946014404297, 36.47434616088867, 13.869712829589844, -13.7318115234375, 15.211536407470703, 31.569091796875, 27.23162841796875, 45.52787780761719, -3.033750534057617, 30.836341857910156, -9.831085205078125, 26.143417358398438, 10.59315299987793, 20.793243408203125, 29.86475372314453, 30.225631713867188, 38.779808044433594, 4.289360046386719, 9.737640380859375, 19.035308837890625, 27.496803283691406, -0.39971160888671875, 25.80535888671875, 4.556171417236328, -9.981971740722656, 12.621223449707031, 17.1783447265625, 12.261734008789062, -28.75732421875, 17.873672485351562, 17.52264404296875, -3.1003646850585938, 56.677364349365234, 25.242515563964844], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000650.npy"}
{"epoch": 0.9544787077826725, "step": 651, "batch_size": 64, "mean": 25.779937744140625, "std": 19.849525451660156, "min": -3.6590042114257812, "p10": 2.485031890869141, "median": 25.269323348999023, "p90": 48.960294342041024, "max": 97.79183959960938, "pos_frac": 0.921875, "sample": [61.048851013183594, 53.759490966796875, 14.085395812988281, 62.660888671875, 29.106185913085938, 9.738853454589844, 43.614952087402344, 33.38689041137695, 36.03791809082031, 27.245513916015625, 36.646522521972656, 36.69575500488281, 22.189319610595703, 9.008888244628906, 50.13922119140625, 25.271080017089844, 26.913299560546875, -3.6590042114257812, 29.028533935546875, 58.14254379272461, 31.351760864257812, 2.9140777587890625, 46.20946502685547, 6.981807708740234, 34.58038330078125, 4.45654296875, -0.8863716125488281, -3.5441436767578125, 29.891876220703125, 22.5682373046875, 18.887222290039062, 8.004173278808594, 39.17848205566406, -3.028308868408203, 44.295867919921875, 25.267566680908203, -0.604156494140625, 44.22479248046875, 57.63566589355469, 1.2517013549804688, 9.26600456237793, 21.67523193359375, 3.603799819946289, 12.689823150634766, 2.3011550903320312, 41.885475158691406, 5.278907775878906, 97.79183959960938, 9.526435852050781, 20.3001708984375, 10.403865814208984, 4.528053283691406, 35.958126068115234, 15.226455688476562, 27.94792938232422, 26.970191955566406, 22.153228759765625, 6.446784973144531, 40.62226867675781, 20.936080932617188, 44.20244598388672, 18.138214111328125, 37.45591735839844, 43.90995788574219], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000651.npy"}
{"epoch": 0.9559471365638766, "step": 652, "batch_size": 64, "mean": 21.245826721191406, "std": 20.232812881469727, "min": -11.268463134765625, "p10": 0.07303657531738367, "median": 16.829143524169922, "p90": 46.860039520263676, "max": 82.72247314453125, "pos_frac": 0.890625, "sample": [6.772213935852051, 12.145927429199219, 25.027740478515625, 10.211021423339844, 0.9131546020507812, 46.665252685546875, 22.525848388671875, 14.421531677246094, 37.474700927734375, 27.878814697265625, 61.32258605957031, 39.574188232421875, 46.943519592285156, 27.608489990234375, 1.7784194946289062, 20.090492248535156, 5.0264739990234375, 54.2305908203125, 82.72247314453125, -1.014944076538086, 13.982101440429688, 4.559539794921875, 15.019142150878906, -9.937606811523438, 13.897340774536133, 24.954849243164062, 16.18756103515625, 41.710723876953125, 6.4016876220703125, 17.74652862548828, 2.9457778930664062, 65.41583251953125, 18.67303466796875, 64.10828399658203, 26.445281982421875, 28.336944580078125, -4.2069244384765625, 12.029037475585938, -3.8975486755371094, 14.986785888671875, 16.530838012695312, 29.7427978515625, 1.1568603515625, -0.2870140075683594, 32.05030059814453, 27.779388427734375, 32.899169921875, 13.069114685058594, 33.29871368408203, 17.12744903564453, 76.68643188476562, 4.120597839355469, 43.091888427734375, 25.77204132080078, 20.51494598388672, 10.128835678100586, 8.424629211425781, 4.599037170410156, -2.8114013671875, -11.268463134765625, 23.07037353515625, 26.615188598632812, 13.63330078125, 2.1109848022460938], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000652.npy"}
{"epoch": 0.9574155653450808, "step": 653, "batch_size": 64, "mean": 22.14688491821289, "std": 19.52638816833496, "min": -19.873321533203125, "p10": -1.604742622375488, "median": 22.178443908691406, "p90": 43.929856872558595, "max": 68.87265014648438, "pos_frac": 0.828125, "sample": [21.090835571289062, 5.615234375, 0.5141372680664062, 36.31968688964844, 18.033103942871094, 19.59820556640625, -2.2345352172851562, 11.696685791015625, 7.696399688720703, 39.82560729980469, 44.87187194824219, -1.71002197265625, -0.1238861083984375, 27.039443969726562, 59.53984069824219, 33.55656433105469, 43.90814208984375, 27.115798950195312, 43.93916320800781, -19.873321533203125, -5.900522232055664, 17.445606231689453, 7.7111358642578125, -0.18686676025390625, 40.96184158325195, 22.301177978515625, 23.339035034179688, 13.846107482910156, 34.054054260253906, 29.490631103515625, 68.87265014648438, 37.886680603027344, 26.48590850830078, 22.055709838867188, 21.35683822631836, 42.25550842285156, -15.655345916748047, 43.35919189453125, -0.16629981994628906, 16.490386962890625, 32.5101318359375, 18.11102294921875, 2.5613632202148438, 10.222900390625, 60.572547912597656, 21.34227752685547, 8.389373779296875, 11.356452941894531, 29.27779769897461, -1.359090805053711, 34.37776184082031, 3.3375205993652344, 34.59919738769531, 68.22532653808594, 17.470306396484375, 28.36884307861328, 23.966781616210938, -5.791908264160156, 34.40412902832031, 23.410734176635742, 32.7161865234375, 27.818893432617188, -11.813018798828125, 50.90269470214844], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000653.npy"}
{"epoch": 0.9588839941262849, "step": 654, "batch_size": 64, "mean": 20.12354850769043, "std": 21.726564407348633, "min": -31.093528747558594, "p10": -8.612551116943358, "median": 20.35346221923828, "p90": 44.010853576660175, "max": 90.8736572265625, "pos_frac": 0.84375, "sample": [18.453453063964844, -0.6965141296386719, 25.43316650390625, 11.658607482910156, 23.5821533203125, 20.434494018554688, -17.742996215820312, 8.763763427734375, 1.8009510040283203, 39.809539794921875, 31.28227996826172, 50.66070556640625, 30.840309143066406, 12.141731262207031, 1.9244804382324219, 19.0882568359375, 34.86730194091797, -11.643829345703125, 37.08575439453125, 24.011547088623047, 45.81141662597656, 65.13949584960938, 15.66177749633789, 24.020172119140625, 24.1424560546875, 79.09025573730469, -7.4450531005859375, 14.01806640625, -6.0477752685546875, 19.938552856445312, 34.00012969970703, 36.04161071777344, 27.262203216552734, 26.580642700195312, 37.51255798339844, 36.244895935058594, 34.55668640136719, 22.6571044921875, 5.85365104675293, 25.550048828125, 20.435958862304688, 15.764236450195312, 23.043167114257812, 20.023197174072266, -31.093528747558594, 58.66756057739258, 14.460319519042969, 23.377582550048828, 46.429901123046875, 2.525707244873047, 27.596725463867188, -9.112907409667969, 29.105655670166016, 6.112548828125, 9.976654052734375, -10.252792358398438, 8.42276382446289, -10.184467315673828, 90.8736572265625, 8.88037109375, 20.272430419921875, -11.871063232421875, 2.024627685546875, 10.084762573242188], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000654.npy"}
{"epoch": 0.960352422907489, "step": 655, "batch_size": 64, "mean": 18.406654357910156, "std": 20.04145050048828, "min": -19.17986297607422, "p10": -2.577292251586914, "median": 16.442941665649414, "p90": 41.135999298095705, "max": 84.16586303710938, "pos_frac": 0.84375, "sample": [5.248382568359375, 16.060317993164062, 40.4810791015625, 28.04631805419922, -2.592205047607422, 34.99814224243164, -8.185508728027344, 12.935012817382812, 1.6472358703613281, 25.052331924438477, 9.858863830566406, 11.174476623535156, 15.39764404296875, -3.1294326782226562, 13.48043441772461, 20.325439453125, 16.904342651367188, 27.864105224609375, 21.509567260742188, 27.049209594726562, 20.523040771484375, 35.52880096435547, 32.48241424560547, 34.953033447265625, 10.671943664550781, 6.996318817138672, 30.053020477294922, 41.41667938232422, 0.6358795166015625, 28.23663330078125, -14.535600662231445, 52.29998779296875, 20.98944091796875, 0.9171886444091797, 30.158832550048828, 16.825565338134766, -7.152061462402344, -2.5424957275390625, 50.199371337890625, 40.12120819091797, 0.4211082458496094, 27.323760986328125, 27.805999755859375, 55.63499450683594, 67.60514831542969, 7.319093704223633, 21.5589599609375, 5.816394805908203, 1.8530426025390625, -13.794876098632812, 5.6292724609375, 3.499267578125, 52.11396026611328, -19.17986297607422, 31.899749755859375, 0.68292236328125, 84.16586303710938, -1.8362197875976562, 1.226654052734375, 22.12017059326172, 32.15807342529297, 12.9241943359375, 9.794906616210938, -1.6216354370117188], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000655.npy"}
{"epoch": 0.9618208516886931, "step": 656, "batch_size": 64, "mean": 22.10733985900879, "std": 17.192564010620117, "min": -16.83489227294922, "p10": 0.5290534973144537, "median": 21.807727813720703, "p90": 48.09617118835452, "max": 62.62152099609375, "pos_frac": 0.921875, "sample": [51.008487701416016, 30.309593200683594, 18.576904296875, 24.249778747558594, 26.43084716796875, 62.62152099609375, 11.421239852905273, 39.20135498046875, 32.33211135864258, 22.802467346191406, 5.5574493408203125, 0.26895904541015625, 25.13897705078125, 58.800750732421875, 13.277912139892578, 21.612586975097656, 11.729904174804688, 7.559837341308594, 25.130123138427734, 18.2230224609375, 25.654266357421875, 30.416397094726562, 31.419788360595703, 4.006591796875, -4.4731903076171875, -1.6791229248046875, 34.33367919921875, 18.69139862060547, 24.504417419433594, 38.3984375, 22.181640625, 30.783660888671875, 19.58348846435547, -2.8029747009277344, 56.44800567626953, 28.06389617919922, 59.61483383178711, 22.00286865234375, 60.449951171875, 16.51287841796875, 33.10188293457031, 27.304542541503906, 33.259765625, 10.919990539550781, 7.7526092529296875, 1.1359405517578125, 28.509586334228516, 41.30076599121094, 0.021137237548828125, 15.005393981933594, 13.156684875488281, 12.430587768554688, -1.7236976623535156, 4.4513702392578125, 51.2125244140625, 14.543685913085938, 7.324798583984375, 6.137655258178711, 7.526939392089844, 20.899391174316406, 23.401626586914062, 20.7827205657959, -16.83489227294922, 32.883995056152344], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000656.npy"}
{"epoch": 0.9632892804698973, "step": 657, "batch_size": 64, "mean": 21.875469207763672, "std": 21.800230026245117, "min": -28.81463623046875, "p10": -2.0430614471435544, "median": 17.657733917236328, "p90": 51.32938079833985, "max": 89.01457214355469, "pos_frac": 0.84375, "sample": [25.601516723632812, -1.8695220947265625, 41.922401428222656, 20.71038818359375, 35.81492614746094, 89.01457214355469, 3.4085922241210938, -4.293022155761719, 32.91968536376953, 15.641494750976562, -28.81463623046875, 0.9098892211914062, 18.31061553955078, 51.956787109375, -0.1952056884765625, 60.438995361328125, 4.2328033447265625, 19.56937026977539, -5.543487548828125, 65.12673950195312, 34.501190185546875, 34.10774230957031, 36.31846618652344, 52.036537170410156, 12.695320129394531, 31.304275512695312, 11.820159912109375, 11.616935729980469, 24.79974365234375, 43.907073974609375, 5.9724273681640625, 17.09765625, 14.46291732788086, -15.503684997558594, 37.658599853515625, 12.254013061523438, 44.17461395263672, -0.29278564453125, -5.6598968505859375, 6.130731582641602, 18.217811584472656, 36.286415100097656, 4.096536636352539, 15.211292266845703, 49.86543273925781, 6.671730041503906, 1.1961822509765625, 14.94925308227539, 23.74850082397461, 22.231861114501953, 69.0932388305664, 22.898269653320312, 24.446487426757812, 36.99608612060547, 9.075798034667969, 14.028091430664062, 38.352264404296875, 60.42424011230469, 40.924312591552734, -2.1174354553222656, 11.5289306640625, 16.50395965576172, -4.7024078369140625, 15.838180541992188], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000657.npy"}
{"epoch": 0.9647577092511013, "step": 658, "batch_size": 64, "mean": 21.676746368408203, "std": 20.195974349975586, "min": -25.451370239257812, "p10": -0.8308048248291015, "median": 17.95958137512207, "p90": 54.523642730712915, "max": 71.17681121826172, "pos_frac": 0.875, "sample": [24.62103271484375, -2.491180419921875, 17.150955200195312, -11.309707641601562, 0.3385772705078125, 10.61468505859375, 59.04802703857422, 0.0945281982421875, 8.662261962890625, 5.628410339355469, -2.7161788940429688, 66.49795532226562, 10.670719146728516, 27.061126708984375, 24.891448974609375, 38.680625915527344, 32.08720016479492, 18.46865463256836, 32.054100036621094, 3.663341522216797, 11.600433349609375, 8.158588409423828, -0.6974105834960938, 24.716075897216797, 13.163528442382812, 32.036712646484375, 59.53204345703125, 62.99137878417969, 19.248260498046875, -25.451370239257812, 17.368972778320312, 25.460147857666016, 11.274578094482422, 35.82110595703125, 33.28733825683594, 17.45050811767578, 19.035411834716797, 46.432159423828125, 19.56291961669922, 1.8371944427490234, 57.143367767333984, 56.901100158691406, 22.302001953125, 37.32427215576172, 16.77105712890625, 25.625564575195312, 25.168289184570312, 16.757217407226562, 45.576866149902344, 48.97624206542969, 2.846250534057617, 15.830177307128906, 12.264827728271484, 13.475410461425781, 13.9072265625, -1.826568603515625, 43.41154479980469, 71.17681121826172, -0.8879737854003906, 23.709487915039062, -9.814468383789062, 31.71717071533203, 15.527595520019531, 6.8831024169921875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000658.npy"}
{"epoch": 0.9662261380323054, "step": 659, "batch_size": 64, "mean": 26.300888061523438, "std": 22.747161865234375, "min": -14.551416397094727, "p10": -2.0410446166992178, "median": 25.8840389251709, "p90": 57.872249603271484, "max": 90.01882934570312, "pos_frac": 0.875, "sample": [36.7960205078125, 31.905136108398438, -1.1120834350585938, 25.68683624267578, 5.095550537109375, 21.927452087402344, 68.62481689453125, 42.25225830078125, 26.702285766601562, 32.404319763183594, 90.01882934570312, -2.4391708374023438, 31.040084838867188, -7.401679992675781, 8.716300964355469, 11.13665771484375, 12.13818359375, 18.742874145507812, 48.54510498046875, 44.757957458496094, 15.843605041503906, 10.294891357421875, -10.451786041259766, 42.996002197265625, 59.390960693359375, 76.88677978515625, 37.4090576171875, 38.923423767089844, 34.98033142089844, 24.742813110351562, 18.870346069335938, 1.92449951171875, -4.31430721282959, 58.22715759277344, 15.530914306640625, 33.52372741699219, 38.69950866699219, 4.304782867431641, 2.6525115966796875, 32.11671447753906, 36.454322814941406, 50.992191314697266, 66.06295013427734, -14.551416397094727, 49.385658264160156, 2.453187942504883, 12.14642333984375, 43.371185302734375, 31.00371551513672, 20.611236572265625, 37.189979553222656, 16.711780548095703, 11.472755432128906, 38.654273986816406, 26.081241607666016, 11.873741149902344, 58.1842041015625, 18.00836944580078, 57.14435577392578, 0.8861846923828125, 21.790084838867188, 32.031837463378906, -11.25961685180664, -11.531524658203125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000659.npy"}
{"epoch": 0.9676945668135095, "step": 660, "batch_size": 64, "mean": 23.940519332885742, "std": 21.34345817565918, "min": -12.73971176147461, "p10": -2.1434604644775375, "median": 20.40393829345703, "p90": 57.98137283325196, "max": 77.88232421875, "pos_frac": 0.875, "sample": [36.40845489501953, 29.404293060302734, 40.994293212890625, 4.9592437744140625, -0.3340492248535156, 44.915504455566406, 23.90454864501953, 20.12059783935547, 12.675239562988281, 42.81056594848633, 2.2976951599121094, 40.832767486572266, 63.59825897216797, 7.899261474609375, 19.016708374023438, 58.9461669921875, 24.261688232421875, 5.979576110839844, 14.684669494628906, 3.217803955078125, 14.35565185546875, 20.687278747558594, 45.89270782470703, 36.52093505859375, 35.2725830078125, 65.04522705078125, -12.73971176147461, 4.986663818359375, 23.664703369140625, 27.202064514160156, 8.8212890625, 14.835750579833984, 9.671142578125, 58.81165313720703, 10.949554443359375, 67.33439636230469, 42.770965576171875, 25.115211486816406, 22.71111297607422, 39.30146789550781, -8.664186477661133, 19.86766815185547, 29.395492553710938, 37.651885986328125, -4.66179084777832, 31.986160278320312, 5.576454162597656, 63.70640563964844, 77.88232421875, 32.38954162597656, -4.548095703125, 56.04405212402344, 6.42578125, 18.887252807617188, 43.686378479003906, -4.275108337402344, 16.61334991455078, -2.9189224243164062, 4.578924179077148, 34.137718200683594, 8.66390609741211, -10.062149047851562, 6.027549743652344, 15.998714447021484], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000660.npy"}
{"epoch": 0.9691629955947136, "step": 661, "batch_size": 64, "mean": 23.661949157714844, "std": 22.737045288085938, "min": -18.672271728515625, "p10": -1.4619123458862302, "median": 21.729000091552734, "p90": 53.578591156005864, "max": 94.53109741210938, "pos_frac": 0.859375, "sample": [20.293651580810547, 2.1127891540527344, 27.643207550048828, 17.156566619873047, 48.264320373535156, -1.8015327453613281, 24.880603790283203, -13.881172180175781, 13.72549819946289, 18.97662353515625, 35.67314910888672, 14.211257934570312, 24.602569580078125, 38.63639831542969, 94.53109741210938, -9.060279846191406, 23.23943328857422, 10.014213562011719, -18.672271728515625, 39.144447326660156, 49.475074768066406, 38.14654541015625, 18.776655197143555, 5.349693298339844, 17.073383331298828, 54.01599884033203, 24.128143310546875, 49.66065216064453, 36.45344543457031, 5.12640380859375, -7.297981262207031, 44.56620788574219, 21.718704223632812, 54.003517150878906, 2.1924476623535156, 2.4190711975097656, 12.690879821777344, -1.116546630859375, 12.358299255371094, 21.739295959472656, 0.3585662841796875, 26.685791015625, -4.5662994384765625, 27.3887939453125, 54.5048828125, 0.19599533081054688, 52.58709716796875, 26.862327575683594, 23.18645668029785, 20.2283935546875, 62.49604034423828, 0.0076904296875, 22.616477966308594, -1.6099262237548828, 9.054088592529297, 39.05828094482422, 30.1826171875, 60.865692138671875, -0.9742393493652344, 88.81874084472656, 39.18031311035156, 29.667205810546875, 21.232627868652344, 15.166610717773438], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000661.npy"}
{"epoch": 0.9706314243759178, "step": 662, "batch_size": 64, "mean": 21.101051330566406, "std": 15.665277481079102, "min": -8.523200988769531, "p10": 1.3529340744018556, "median": 20.38235855102539, "p90": 40.54698104858399, "max": 67.93501281738281, "pos_frac": 0.90625, "sample": [-1.3605003356933594, 13.037216186523438, 29.769699096679688, -8.523200988769531, 40.86090850830078, 20.1207275390625, 32.20402526855469, 16.108779907226562, 40.98187255859375, 21.52593994140625, 22.973194122314453, 9.246051788330078, 8.760574340820312, 25.699172973632812, 34.458641052246094, 23.202346801757812, 6.524662017822266, -3.1145362854003906, 41.49701690673828, 22.521331787109375, 67.93501281738281, 14.31745719909668, 19.470436096191406, 35.38667297363281, 6.362762451171875, 39.814483642578125, 24.88762664794922, 64.91844177246094, 12.492988586425781, -1.50433349609375, 28.588645935058594, 1.3185291290283203, 22.52886962890625, 4.519147872924805, 19.2100830078125, 19.988571166992188, 37.919403076171875, 23.189163208007812, 15.327682495117188, 39.09273910522461, 15.614830017089844, 34.9093017578125, 15.324848175048828, 38.13433074951172, 14.328323364257812, 20.64398956298828, 12.671546936035156, 25.55331802368164, -2.9036941528320312, 39.402984619140625, 10.17740249633789, 24.821619033813477, 6.746734619140625, 15.62982177734375, 11.324195861816406, 3.61212158203125, 22.158447265625, 46.05824279785156, 24.083236694335938, 16.472564697265625, 41.916259765625, 1.4332122802734375, 26.523681640625, -6.428314208984375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000662.npy"}
{"epoch": 0.9720998531571219, "step": 663, "batch_size": 64, "mean": 23.321252822875977, "std": 20.974891662597656, "min": -13.714225769042969, "p10": 0.7879335403442391, "median": 19.429128646850586, "p90": 48.76063690185547, "max": 98.6756591796875, "pos_frac": 0.90625, "sample": [35.90478515625, 39.008636474609375, 24.218917846679688, 5.036735534667969, 9.293525695800781, 11.493110656738281, -3.2968921661376953, 2.2162246704101562, -2.3624267578125, 19.652435302734375, 8.914623260498047, 13.8614501953125, 28.686187744140625, 48.39060974121094, 46.59077453613281, 21.455055236816406, 9.126407623291016, 21.713045120239258, -13.714225769042969, 11.674003601074219, 52.70481872558594, 9.416328430175781, 31.286720275878906, 19.146499633789062, 53.689857482910156, 44.43951416015625, 21.819442749023438, 13.003036499023438, 48.919219970703125, 30.059661865234375, 1.554555892944336, 50.711395263671875, 63.2308349609375, 0.459381103515625, 18.35113525390625, -5.271240234375, 22.423385620117188, 9.122928619384766, 19.205821990966797, 25.21294403076172, 47.39607238769531, 6.9040069580078125, 11.999053955078125, -8.200065612792969, 13.940811157226562, 67.39840698242188, 18.97918701171875, 11.73541259765625, 11.340312957763672, 98.6756591796875, 16.231075286865234, 47.87492752075195, 47.339603424072266, 21.78736114501953, 48.285247802734375, 2.743335723876953, 9.4676513671875, 21.854961395263672, 8.228809356689453, 40.43653869628906, -10.416961669921875, 32.39442443847656, 24.037078857421875, 34.777992248535156], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000663.npy"}
{"epoch": 0.973568281938326, "step": 664, "batch_size": 64, "mean": 23.47808837890625, "std": 19.55084228515625, "min": -9.82611083984375, "p10": -3.7559196472167966, "median": 22.56729507446289, "p90": 49.785700225830084, "max": 70.41221618652344, "pos_frac": 0.84375, "sample": [33.96171569824219, -5.771507263183594, 22.66773223876953, 18.31927490234375, 38.46287536621094, 16.614585876464844, 22.46685791015625, 33.42796325683594, 39.552459716796875, 41.22027587890625, 14.155860900878906, 70.41221618652344, 33.40470886230469, 30.370162963867188, 50.056922912597656, 25.286418914794922, 37.244659423828125, 58.106056213378906, -3.4960098266601562, 40.40919494628906, 8.191513061523438, 36.6995849609375, 23.315338134765625, 51.70170593261719, 14.033451080322266, -5.8202056884765625, 32.68536376953125, 1.5982513427734375, -3.3943214416503906, 43.86329650878906, 2.94921875, 13.546745300292969, 40.99102783203125, 5.299022674560547, -0.3940887451171875, 31.27104949951172, 10.856609344482422, 29.158355712890625, 19.82398223876953, 42.946533203125, -5.3047637939453125, 26.981353759765625, 10.57217025756836, -3.8673095703125, 26.266910552978516, 62.3966064453125, 15.451873779296875, 25.061668395996094, 46.661773681640625, -8.766754150390625, 17.453720092773438, 17.767662048339844, 61.750244140625, 11.44607925415039, 21.319793701171875, 6.4072113037109375, 49.15284729003906, -4.676158905029297, -9.82611083984375, 7.644142150878906, 33.94622802734375, 18.092803955078125, 50.406333923339844, 10.064537048339844], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000664.npy"}
{"epoch": 0.9750367107195301, "step": 665, "batch_size": 64, "mean": 20.115211486816406, "std": 18.75139617919922, "min": -15.922927856445312, "p10": -0.016360473632811456, "median": 13.69809341430664, "p90": 48.32776565551758, "max": 79.251708984375, "pos_frac": 0.890625, "sample": [2.6745948791503906, 27.15441131591797, 11.022300720214844, 15.984298706054688, 13.72735595703125, 1.0008964538574219, 7.615440368652344, 12.33935546875, 17.389328002929688, 15.939300537109375, 23.194915771484375, 13.668830871582031, 7.507946014404297, 59.80010986328125, 10.556137084960938, 6.336143493652344, 3.221986770629883, 8.1302490234375, -4.392219543457031, 9.933380126953125, 14.821949005126953, 25.172584533691406, 25.93378448486328, 12.288673400878906, 17.965606689453125, 41.54804229736328, 32.077674865722656, 10.154373168945312, 52.925071716308594, 8.903411865234375, 13.066093444824219, 7.166679382324219, 48.97737121582031, 41.05161666870117, 33.02552795410156, -0.4523277282714844, 18.22516632080078, 31.931617736816406, -15.922927856445312, 9.466070175170898, 3.21932315826416, 25.771957397460938, 79.251708984375, 33.87431335449219, 13.512924194335938, 29.090576171875, -4.686359405517578, 26.038162231445312, 64.47677612304688, 46.81201934814453, 7.352415084838867, 13.522544860839844, 36.107460021972656, -4.8395233154296875, 49.29549789428711, 13.260940551757812, 24.683303833007812, -1.1221771240234375, 11.054466247558594, 12.705917358398438, 43.35699462890625, -5.472404479980469, 55.536651611328125, 23.439208984375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000665.npy"}
{"epoch": 0.9765051395007343, "step": 666, "batch_size": 64, "mean": 19.517906188964844, "std": 22.10633659362793, "min": -40.441558837890625, "p10": -9.621747970581055, "median": 18.395389556884766, "p90": 53.41545562744141, "max": 67.6717529296875, "pos_frac": 0.828125, "sample": [30.557519912719727, 29.005401611328125, 26.648788452148438, 6.471561431884766, -14.366092681884766, 4.687664031982422, 14.584808349609375, 32.033485412597656, 26.269676208496094, 53.75465393066406, 19.243179321289062, -5.4747161865234375, -13.637493133544922, -2.461334228515625, 6.001335144042969, 22.149887084960938, 55.54383087158203, 28.87677764892578, -9.86859130859375, 30.96776580810547, 55.017601013183594, 3.4343032836914062, 3.2456893920898438, -8.207992553710938, 61.589759826660156, -21.378501892089844, 10.442581176757812, -13.325363159179688, 13.432273864746094, -40.441558837890625, 25.304336547851562, 19.277915954589844, 17.283527374267578, 46.59858703613281, 20.255706787109375, 10.7491455078125, 58.922607421875, 15.623344421386719, -9.485710144042969, 27.974151611328125, 10.138753890991211, 14.033687591552734, -9.680049896240234, 17.47125244140625, 12.412429809570312, 47.16937255859375, 8.866851806640625, 64.0137939453125, 18.875404357910156, 18.962387084960938, 30.30860137939453, 32.19490051269531, 48.743377685546875, 11.719902038574219, 22.536300659179688, 17.915374755859375, 24.00714111328125, 7.1415252685546875, 36.227874755859375, 31.633522033691406, 11.766447067260742, 15.090904235839844, 67.6717529296875, 52.623992919921875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000666.npy"}
{"epoch": 0.9779735682819384, "step": 667, "batch_size": 64, "mean": 24.828340530395508, "std": 20.758514404296875, "min": -8.738555908203125, "p10": -0.004179000854491521, "median": 25.22286319732666, "p90": 54.17920837402345, "max": 80.3623046875, "pos_frac": 0.890625, "sample": [68.60411071777344, 8.152694702148438, 27.350494384765625, -1.9118499755859375, 27.818130493164062, 34.69313049316406, 32.87657165527344, 28.548046112060547, 46.50144958496094, 5.83172607421875, 3.103536605834961, 35.71490478515625, 8.337169647216797, 23.493995666503906, 25.22136878967285, 55.466156005859375, 38.637786865234375, -1.6668548583984375, 69.40530395507812, 6.895420074462891, 17.45899200439453, 21.755420684814453, 80.3623046875, 17.43476104736328, 35.51309585571289, 5.256134033203125, 32.26718521118164, 16.198196411132812, 57.91205978393555, 35.079803466796875, 22.336830139160156, 27.167186737060547, 29.95513153076172, -4.008247375488281, 19.185401916503906, 3.895721435546875, 6.823387145996094, 38.07377243041992, 20.822769165039062, 36.142234802246094, 2.5001983642578125, 40.752098083496094, 15.491195678710938, -0.2745170593261719, 27.73603057861328, 51.17633056640625, 25.22435760498047, 8.853607177734375, 42.50712585449219, 73.48721313476562, 1.8055648803710938, 35.14544677734375, 15.077949523925781, -4.5419769287109375, 0.6266098022460938, 62.16233825683594, 30.449501037597656, -1.5079994201660156, 26.284564971923828, 8.01409912109375, 26.58734130859375, 3.6579437255859375, 43.83192443847656, -8.738555908203125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000667.npy"}
{"epoch": 0.9794419970631424, "step": 668, "batch_size": 64, "mean": 25.462890625, "std": 19.41969871520996, "min": -10.60504150390625, "p10": 4.159209823608399, "median": 21.58450698852539, "p90": 48.48722839355469, "max": 74.60523986816406, "pos_frac": 0.953125, "sample": [28.905967712402344, 29.944625854492188, 62.18150329589844, 21.44293975830078, -0.4050750732421875, 27.027576446533203, 8.321159362792969, 8.776191711425781, 20.59381103515625, 39.70834732055664, 13.992347717285156, -3.756317138671875, 41.07213592529297, 11.144966125488281, 42.80708312988281, 7.7894439697265625, 0.768280029296875, 18.31208038330078, 19.488983154296875, 6.948360443115234, 28.50640869140625, 32.54254150390625, 27.51667022705078, 63.90708923339844, 47.927215576171875, 14.73895263671875, 8.109832763671875, 65.4603271484375, 47.40693664550781, 48.72723388671875, 24.57552719116211, 10.801971435546875, 46.68589782714844, 2.2377853393554688, 22.446563720703125, 10.005508422851562, 21.72607421875, 55.511322021484375, 18.267749786376953, 3.197345733642578, 32.859832763671875, 9.802070617675781, 44.88257598876953, 4.317726135253906, 7.069267272949219, 47.70588684082031, 30.046432495117188, 6.241203308105469, 40.79582214355469, 11.291425704956055, -10.60504150390625, 16.382278442382812, 47.05438232421875, 12.418289184570312, 8.871757507324219, 35.9490966796875, 9.511833190917969, 35.450294494628906, 74.60523986816406, 18.516502380371094, 4.091274261474609, 31.77435302734375, 42.550445556640625, 62.67866516113281], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000668.npy"}
{"epoch": 0.9809104258443465, "step": 669, "batch_size": 64, "mean": 21.572715759277344, "std": 19.92790985107422, "min": -22.155052185058594, "p10": -1.6958077430725096, "median": 20.203861236572266, "p90": 49.38551750183105, "max": 67.43095397949219, "pos_frac": 0.859375, "sample": [25.5858154296875, -2.3080596923828125, -15.184032440185547, 20.468612670898438, -2.3017845153808594, 39.50541687011719, 11.2847900390625, 36.707340240478516, 6.924009323120117, -1.5800113677978516, 67.43095397949219, 1.9995269775390625, 3.3018646240234375, 5.15631103515625, 6.822120666503906, 24.332839965820312, 24.812828063964844, 8.333517074584961, 20.034164428710938, 20.373558044433594, 19.313385009765625, 7.520271301269531, 25.679656982421875, 26.646812438964844, 34.30720520019531, 35.6822509765625, 45.85321044921875, 57.5830078125, 50.795494079589844, 6.490802764892578, 20.022884368896484, 42.942962646484375, 20.54144287109375, 61.50822448730469, 66.67162322998047, -1.2022895812988281, -1.8487281799316406, 7.1055755615234375, 49.591285705566406, 17.280357360839844, 2.4574813842773438, 2.429656982421875, 41.08342742919922, 16.376075744628906, -22.155052185058594, 23.933940887451172, 7.2799072265625, 16.308494567871094, 17.45315933227539, 1.1854362487792969, 37.74748229980469, 36.8681755065918, 25.50867462158203, 15.476493835449219, 24.201522827148438, 41.770263671875, -1.7454347610473633, 53.17922592163086, 28.301544189453125, 40.878021240234375, 7.346549987792969, -3.776988983154297, 25.455032348632812, 48.905391693115234], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000669.npy"}
{"epoch": 0.9823788546255506, "step": 670, "batch_size": 64, "mean": 26.07461166381836, "std": 20.561687469482422, "min": -14.468658447265625, "p10": -0.7408615112304657, "median": 26.682910919189453, "p90": 51.600917816162124, "max": 85.83557891845703, "pos_frac": 0.890625, "sample": [16.10711669921875, 39.999229431152344, 45.85343933105469, 13.677818298339844, 31.897064208984375, 8.984344482421875, 17.69390869140625, 5.185646057128906, 61.197364807128906, 30.189910888671875, 45.092002868652344, 6.901298522949219, -14.468658447265625, 47.95220947265625, 52.74696350097656, 39.74690628051758, 61.771331787109375, 33.29710388183594, 34.335784912109375, -11.058517456054688, 28.40813446044922, 48.92681121826172, 30.45648956298828, 23.657882690429688, 19.431808471679688, 41.524227142333984, 11.076156616210938, -1.999725341796875, 37.56111145019531, 12.7034912109375, 38.160919189453125, 22.815582275390625, 18.360816955566406, 28.887863159179688, 14.007858276367188, 85.83557891845703, 53.544464111328125, 10.3441162109375, -13.913955688476562, 33.2757568359375, -6.080333709716797, 39.71661376953125, 39.72705078125, 53.70878601074219, 43.940704345703125, 42.20189666748047, 34.268707275390625, 2.1964874267578125, -2.2837486267089844, -5.200977325439453, 9.325813293457031, 34.412681579589844, 64.73158264160156, 24.766448974609375, 7.197206497192383, 7.5753326416015625, 18.58822250366211, 24.957687377929688, 14.678504943847656, 38.8476676940918, 8.155166625976562, 14.112174987792969, 11.747871398925781, 37.31581115722656], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000670.npy"}
{"epoch": 0.9838472834067548, "step": 671, "batch_size": 64, "mean": 22.33591079711914, "std": 20.866256713867188, "min": -18.993717193603516, "p10": -0.9781173706054688, "median": 18.99784278869629, "p90": 48.477208709716805, "max": 92.01605224609375, "pos_frac": 0.859375, "sample": [8.029220581054688, 26.515113830566406, 13.078788757324219, 33.17084503173828, -18.993717193603516, 44.687408447265625, 18.988727569580078, 8.795822143554688, 26.36968994140625, 40.29362869262695, 28.499908447265625, 14.408454895019531, 11.740837097167969, 14.322677612304688, 8.831033706665039, -1.42156982421875, 1.6970291137695312, 9.72760009765625, 71.21922302246094, 21.569923400878906, -0.948455810546875, 51.39482116699219, 3.4629592895507812, 49.38763427734375, 33.68922424316406, 29.069324493408203, 44.091156005859375, 19.61878204345703, 43.30387878417969, 10.785737991333008, 4.137962341308594, 16.08319091796875, 25.475505828857422, -4.762027740478516, 30.375865936279297, 12.009258270263672, 49.04106140136719, 11.703086853027344, -0.9908294677734375, 31.231735229492188, 39.45075988769531, -16.146835327148438, 14.191291809082031, 17.157630920410156, 35.78157043457031, 47.16155242919922, 26.072818756103516, 73.00679016113281, 8.838798522949219, 8.207271575927734, 53.6265869140625, 24.484962463378906, 4.635631561279297, -1.1314697265625, 33.45135498046875, 9.472713470458984, -4.722614288330078, 19.0069580078125, -0.1451587677001953, 92.01605224609375, 27.908599853515625, 38.90851593017578, 13.368667602539062, 25.205223083496094], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000671.npy"}
{"epoch": 0.9853157121879589, "step": 672, "batch_size": 64, "mean": 20.399099349975586, "std": 19.963367462158203, "min": -10.611953735351562, "p10": -4.638659286499023, "median": 15.994346618652344, "p90": 48.35052947998047, "max": 75.78668212890625, "pos_frac": 0.84375, "sample": [6.8027191162109375, -10.611953735351562, 6.819709777832031, 18.71282958984375, 13.6182861328125, -6.126617431640625, 12.199569702148438, 30.38446044921875, 49.24930191040039, -1.38037109375, 18.28662872314453, -9.569992065429688, 11.945388793945312, 37.608612060546875, 8.638404846191406, 24.684356689453125, -10.043731689453125, 26.78607940673828, 11.838661193847656, 75.78668212890625, 29.579925537109375, 35.63615417480469, 2.4661083221435547, 31.039947509765625, 15.741966247558594, -6.073585510253906, 48.92744445800781, -4.880908966064453, 22.775390625, 44.75475311279297, 6.091266632080078, 62.90315246582031, 54.59248352050781, 29.26092529296875, 13.641792297363281, 27.845443725585938, 44.541229248046875, 17.1053466796875, 16.246726989746094, 18.125118255615234, 0.9549789428710938, 19.349666595458984, 14.747442245483398, 13.437904357910156, 15.59576416015625, 36.490272521972656, 66.6861343383789, 47.00439453125, 2.0666427612304688, 10.625606536865234, 33.03730773925781, 35.04595947265625, 9.132431030273438, 6.529319763183594, 6.708597183227539, -2.6110363006591797, -10.337272644042969, 12.177425384521484, 54.569252014160156, -4.0734100341796875, 13.257007598876953, 29.071701049804688, 43.515296936035156, 26.611297607421875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000672.npy"}
{"epoch": 0.986784140969163, "step": 673, "batch_size": 64, "mean": 22.179058074951172, "std": 17.8422794342041, "min": -18.56854248046875, "p10": 0.7650539398193359, "median": 20.846054077148438, "p90": 45.136619186401376, "max": 72.15938568115234, "pos_frac": 0.9375, "sample": [33.283103942871094, 4.565256118774414, 0.7021026611328125, 31.810279846191406, 32.05902099609375, 45.77377700805664, 31.516891479492188, 10.210613250732422, 42.58030319213867, 32.80255889892578, 21.775039672851562, 14.464599609375, -10.346946716308594, 27.992630004882812, 34.543880462646484, 19.24285888671875, 0.011304855346679688, 59.75604248046875, 43.64991760253906, 6.430809020996094, 9.173454284667969, 39.19230651855469, 0.7625923156738281, -0.7293014526367188, 33.65704345703125, 4.649269104003906, 14.022117614746094, 28.921722412109375, 0.7707977294921875, 32.03996276855469, 51.91093444824219, 72.15938568115234, 26.60979461669922, 29.638286590576172, 8.508163452148438, 12.620956420898438, 6.2917633056640625, 12.919639587402344, 33.273040771484375, 27.015472412109375, 12.617561340332031, 3.035686492919922, 26.09087371826172, 33.53112030029297, 19.917068481445312, 15.927131652832031, 12.442464828491211, 4.050622940063477, 7.366081237792969, 51.05690002441406, 32.03856658935547, 15.785282135009766, 57.37730407714844, 26.825237274169922, -18.56854248046875, 33.30145263671875, 13.548755645751953, 48.879852294921875, 14.865402221679688, 29.28961181640625, 33.92863464355469, 13.98672103881836, 11.276870727539062, -5.342315673828125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000673.npy"}
{"epoch": 0.9882525697503671, "step": 674, "batch_size": 64, "mean": 23.33966827392578, "std": 19.914329528808594, "min": -15.871162414550781, "p10": 1.7281307220458992, "median": 20.067635536193848, "p90": 47.91269226074219, "max": 95.62776947021484, "pos_frac": 0.953125, "sample": [41.43007278442383, 26.397918701171875, 36.903770446777344, 47.98431396484375, 9.107738494873047, 34.54845428466797, 9.093603134155273, 28.934677124023438, -15.871162414550781, 42.580787658691406, 0.8412628173828125, 31.224639892578125, 5.9303131103515625, 64.48277282714844, 26.428970336914062, 18.862567901611328, 0.4630594253540039, 29.434059143066406, 6.241966247558594, 42.268409729003906, 3.6263656616210938, 17.4130859375, 2.9822120666503906, 34.29491424560547, 17.842002868652344, 5.354412078857422, 24.114242553710938, -2.8282394409179688, 9.970932006835938, 11.321258544921875, 54.538753509521484, 95.62776947021484, 9.063957214355469, 1.3657455444335938, 5.376163482666016, 66.89494323730469, 18.1961727142334, 14.615219116210938, 30.99630355834961, 41.92674255371094, 21.272703170776367, 0.406494140625, 13.059593200683594, 22.345230102539062, 7.364841461181641, 48.12104797363281, 47.745574951171875, 31.88219451904297, 17.051040649414062, 38.440650939941406, 23.74053955078125, 3.2609100341796875, 60.53302001953125, -1.267181396484375, 16.41724395751953, 15.412742614746094, 15.695304870605469, 31.06621551513672, 43.30535125732422, 26.98767852783203, 24.00021743774414, 2.5736961364746094, 11.577144622802734, 22.765426635742188], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000674.npy"}
{"epoch": 0.9897209985315712, "step": 675, "batch_size": 64, "mean": 21.07205581665039, "std": 22.14422035217285, "min": -56.107269287109375, "p10": -2.0506406784057614, "median": 20.6585750579834, "p90": 52.78274154663087, "max": 75.87466430664062, "pos_frac": 0.859375, "sample": [-7.787326812744141, 30.618423461914062, 12.611015319824219, 33.361053466796875, 3.8229446411132812, 25.074573516845703, 56.695579528808594, -1.7195262908935547, 2.4681777954101562, 36.7457275390625, 9.618499755859375, 10.531967163085938, 15.483627319335938, 13.596122741699219, 32.882537841796875, 4.950599670410156, 5.780284881591797, 58.56182861328125, 20.98615264892578, -5.167976379394531, 59.38439178466797, 21.484817504882812, 75.87466430664062, 53.399932861328125, 23.73847198486328, 1.5109596252441406, 11.440013885498047, 23.586551666259766, 20.92220687866211, -12.930145263671875, 41.954833984375, 54.08003234863281, 14.494194030761719, 12.982078552246094, -0.7679901123046875, 61.2913818359375, 16.51776885986328, 28.745437622070312, 27.65301513671875, 1.5248260498046875, -2.192546844482422, 36.1097412109375, 6.515773773193359, 39.631202697753906, -6.93756103515625, 3.189401626586914, -56.107269287109375, 30.07184600830078, 32.57569885253906, 47.922393798828125, 6.908046722412109, 39.899200439453125, 15.856391906738281, 12.573108673095703, 20.394943237304688, 14.096988677978516, -8.685531616210938, 31.981040954589844, 26.620410919189453, 51.036521911621094, 8.282012939453125, 23.379608154296875, 51.342628479003906, 28.145824432373047], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000675.npy"}
{"epoch": 0.9911894273127754, "step": 676, "batch_size": 64, "mean": 24.266006469726562, "std": 17.40627670288086, "min": -21.188034057617188, "p10": 1.441667938232422, "median": 24.516769409179688, "p90": 44.753289031982426, "max": 66.28437042236328, "pos_frac": 0.90625, "sample": [13.494888305664062, 27.408462524414062, 16.308238983154297, 29.111846923828125, -21.188034057617188, 19.016498565673828, 44.901512145996094, 34.60734176635742, -1.710982322692871, 20.695232391357422, 24.606163024902344, 39.06211853027344, 22.439071655273438, 34.20164489746094, 21.393348693847656, 18.127853393554688, 22.48402976989746, 3.9382781982421875, 8.1436767578125, -11.6605224609375, 12.622249603271484, 10.974151611328125, 33.53766632080078, -0.9758148193359375, 14.945915222167969, 1.412689208984375, 29.092750549316406, 44.960662841796875, 44.11642837524414, 52.76402282714844, 27.25244140625, 8.889373779296875, 41.85445785522461, 24.42737579345703, 44.464561462402344, -1.1350059509277344, 57.81231689453125, 66.28437042236328, 32.988407135009766, 39.816314697265625, 9.805633544921875, -1.5345478057861328, 46.720703125, 1.5092849731445312, 28.926925659179688, 11.866897583007812, 14.59234619140625, 28.60138702392578, 42.214561462402344, 3.5758895874023438, 44.87702941894531, 42.75019836425781, 37.308082580566406, 19.97203826904297, 8.306570053100586, 34.34709930419922, 22.00775146484375, 14.001842498779297, 32.516300201416016, 24.786880493164062, 38.877410888671875, 11.890815734863281, 43.17903137207031, 40.438323974609375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000676.npy"}
{"epoch": 0.9926578560939795, "step": 677, "batch_size": 64, "mean": 25.309600830078125, "std": 16.772090911865234, "min": -12.030784606933594, "p10": 2.8592567443847665, "median": 25.81276512145996, "p90": 46.93890533447266, "max": 60.4903564453125, "pos_frac": 0.921875, "sample": [2.5520858764648438, 46.794036865234375, 35.665252685546875, 23.321395874023438, 30.657745361328125, 30.006990432739258, -8.71518325805664, 16.240074157714844, 42.395118713378906, 17.21508026123047, 5.063892364501953, 21.340545654296875, -1.9552459716796875, 49.18552017211914, -0.5911331176757812, 40.984352111816406, 41.58348846435547, 17.173072814941406, 22.467544555664062, 42.27485656738281, 29.707061767578125, 38.65846252441406, 34.51111602783203, 18.877395629882812, 47.00099182128906, 19.80279541015625, 33.110198974609375, 6.290103912353516, 29.883216857910156, 25.87688446044922, 30.584762573242188, 60.4903564453125, 7.677196502685547, 17.905841827392578, 56.745277404785156, 7.43597412109375, 38.09712219238281, 11.826093673706055, 11.371635437011719, 20.196731567382812, 34.588111877441406, 23.187198638916016, 9.897834777832031, 56.910186767578125, -0.4095306396484375, 45.1428108215332, -12.030784606933594, 11.470287322998047, 27.884307861328125, 47.334999084472656, 24.21617889404297, 0.29870033264160156, 3.57598876953125, 33.49629211425781, 37.00630569458008, 34.46082305908203, 21.235366821289062, 57.898414611816406, 27.63202667236328, 25.748645782470703, 12.036602020263672, 28.00103759765625, 18.3409423828125, 32.18312072753906], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000677.npy"}
{"epoch": 0.9941262848751835, "step": 678, "batch_size": 64, "mean": 24.243358612060547, "std": 20.8657283782959, "min": -13.89263916015625, "p10": 2.8625785827636725, "median": 20.62527084350586, "p90": 53.396497344970705, "max": 89.56077575683594, "pos_frac": 0.96875, "sample": [13.074165344238281, 41.11848449707031, 50.41994857788086, 46.16912841796875, 3.8481903076171875, 3.5960865020751953, 26.526931762695312, 32.88458251953125, 3.5208969116210938, 33.59166717529297, -13.89263916015625, 17.08892059326172, 8.580997467041016, 47.94446563720703, 13.391532897949219, 61.30589294433594, 21.307212829589844, 5.407249450683594, 7.3165740966796875, 45.09381103515625, 10.1529541015625, 20.271156311035156, 3.4850921630859375, 31.244232177734375, 53.75843048095703, 89.56077575683594, 21.304214477539062, 37.70652770996094, 0.4919853210449219, 34.23966979980469, 32.299110412597656, 6.372531890869141, 15.35906982421875, 4.9723968505859375, 11.398324012756348, 10.7313232421875, 9.659515380859375, 55.70344543457031, 2.5957870483398438, 35.54786682128906, 0.7612571716308594, 35.6611328125, 56.31602478027344, 52.55198669433594, 20.979385375976562, 2.2280731201171875, 45.10986328125, 39.3641357421875, 1.5602035522460938, -2.258056640625, 27.405715942382812, 8.877326965332031, 14.482933044433594, 6.710315704345703, 10.433086395263672, 24.72833251953125, 3.533111572265625, 75.01072692871094, 37.80381774902344, 15.913604736328125, 30.355785369873047, 6.747314453125, 55.16560363769531, 26.984771728515625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000678.npy"}
{"epoch": 0.9955947136563876, "step": 679, "batch_size": 64, "mean": 21.760284423828125, "std": 16.534360885620117, "min": -11.301815032958984, "p10": -1.5655700683593738, "median": 21.188186645507812, "p90": 42.53456840515137, "max": 59.94793701171875, "pos_frac": 0.875, "sample": [23.322021484375, 13.748878479003906, -3.2335166931152344, 55.26842498779297, 50.18467712402344, 15.347782135009766, -3.4397239685058594, 30.515583038330078, 18.85761260986328, 31.20636749267578, 31.34836196899414, 56.69097900390625, 29.559478759765625, 0.815582275390625, 42.92170715332031, 5.5973358154296875, 44.60687255859375, 31.29541015625, 22.143157958984375, 17.88776397705078, 37.57134246826172, 41.63124465942383, 1.9398651123046875, 59.94793701171875, 16.717933654785156, 7.567291259765625, 7.373313903808594, 28.08697509765625, 17.862567901611328, 14.196361541748047, 16.70526123046875, 20.23321533203125, 16.918045043945312, 31.6097412109375, 12.181106567382812, 15.941474914550781, 38.71092224121094, 15.445289611816406, -2.1120986938476562, -0.29033660888671875, -2.5236358642578125, -11.301815032958984, 28.547256469726562, 31.91071319580078, 33.554656982421875, 24.14312744140625, 18.843780517578125, 26.641258239746094, 26.485843658447266, -4.511993408203125, 45.76561737060547, 30.826736450195312, 11.563766479492188, -8.151988983154297, 25.936721801757812, 9.890907287597656, 29.999263763427734, 34.735107421875, 10.014320373535156, 41.07159423828125, 32.166229248046875, 9.173171997070312, 34.653358459472656, 0.34197998046875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000679.npy"}
{"epoch": 0.9970631424375918, "step": 680, "batch_size": 64, "mean": 22.779272079467773, "std": 21.148405075073242, "min": -13.943656921386719, "p10": -2.131080627441405, "median": 20.641395568847656, "p90": 52.359245300292976, "max": 81.38784790039062, "pos_frac": 0.859375, "sample": [60.4033203125, -5.698982238769531, -0.31980133056640625, -13.943656921386719, 20.731094360351562, 15.463211059570312, 39.022666931152344, 54.5408935546875, 19.70024871826172, 9.709190368652344, 35.90606689453125, 0.19666290283203125, 2.8658905029296875, 6.2816619873046875, 35.14604187011719, 57.084754943847656, 2.6871490478515625, 19.66741943359375, 10.988143920898438, -4.905281066894531, 39.59922790527344, 21.719085693359375, 20.861942291259766, 12.868820190429688, 17.36591339111328, 1.511260986328125, 13.468833923339844, 26.440189361572266, 14.249015808105469, 67.49996948242188, 4.587263107299805, 52.893341064453125, 0.8868026733398438, 24.8775634765625, 27.570472717285156, 30.573486328125, 28.085155487060547, 33.227691650390625, 40.011817932128906, -0.682769775390625, 1.434783935546875, 49.84295654296875, 20.55169677734375, 19.50188446044922, 27.047866821289062, -12.431022644042969, 41.322113037109375, -4.193271636962891, -8.179107666015625, 37.75959777832031, -2.7517852783203125, 81.38784790039062, 15.753875732421875, 1.9764404296875, 41.56251525878906, 9.266029357910156, 50.18750762939453, 51.11302185058594, 22.457801818847656, 26.360828399658203, 34.2203369140625, 59.27989196777344, 13.213058471679688, 38.04668426513672], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000680.npy"}
{"epoch": 0.9985315712187959, "step": 681, "batch_size": 64, "mean": 19.147510528564453, "std": 17.12638282775879, "min": -24.16240692138672, "p10": -1.9663063049316403, "median": 17.91724395751953, "p90": 45.455566406250014, "max": 62.93739318847656, "pos_frac": 0.875, "sample": [28.483821868896484, 2.3646507263183594, 46.98634338378906, 6.8813018798828125, 20.660232543945312, 13.931533813476562, 25.72393035888672, 12.648963928222656, 8.813430786132812, 28.511886596679688, 23.7777099609375, -1.723785400390625, 9.294795989990234, 13.813583374023438, -2.5915069580078125, 55.016815185546875, 57.085201263427734, 4.321662902832031, 42.791847229003906, 37.23447036743164, 3.719696044921875, -6.335472106933594, 18.380470275878906, 15.511016845703125, 62.93739318847656, -5.7685394287109375, 19.27214813232422, 29.991683959960938, 11.06317138671875, 9.981613159179688, 14.331153869628906, 21.39612579345703, 28.343994140625, 17.454017639160156, 14.811416625976562, 29.44841766357422, -2.0702438354492188, 24.384246826171875, 27.22311782836914, 16.135555267333984, 15.43853759765625, 36.3631591796875, 8.445037841796875, 50.43694305419922, -24.16240692138672, 25.81200408935547, -11.00296401977539, 23.453323364257812, 3.679718017578125, 15.46694564819336, 39.40122985839844, 22.45507049560547, 24.709823608398438, 21.683780670166016, 23.73210906982422, 15.184234619140625, 46.59716033935547, -4.840667724609375, 26.52132797241211, 6.81103515625, 4.470306396484375, 47.34320068359375, 19.44086456298828, 3.762969970703125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948/margin_logs/step_0000681.npy"}
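
Each record above stores the per-example margins for one step in `sample`, next to summary fields (`mean`, `min`, `p10`, `median`, `p90`, `max`, `pos_frac`). The sketch below recomputes those summaries from `sample` so a record can be sanity-checked. It rests on assumptions inferred from the logged values, not on the training code: percentiles use linear interpolation between order statistics, and `pos_frac` is the share of margins strictly greater than zero. The helper names `percentile`, `summarize`, and `check_record` are hypothetical. `std` is left out because the estimator (population vs. sample) is not stated in the log.

```python
import json
import math
import statistics


def percentile(sorted_vals, q):
    """Linearly interpolated percentile (q in [0, 100]) of a pre-sorted list."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = math.floor(pos)
    hi = math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def summarize(sample):
    """Recompute the summary fields of one log record from its margins."""
    vals = sorted(sample)
    return {
        "batch_size": len(vals),
        "mean": statistics.fmean(vals),
        "min": vals[0],
        "p10": percentile(vals, 10),
        "median": statistics.median(vals),
        "p90": percentile(vals, 90),
        "max": vals[-1],
        # Assumption: a margin counts as positive only if strictly > 0.
        "pos_frac": sum(v > 0 for v in vals) / len(vals),
    }


def check_record(line, tol=1e-3):
    """Compare logged summaries in one JSONL line against recomputed ones."""
    rec = json.loads(line)
    got = summarize(rec["sample"])
    return {
        key: math.isclose(got[key], rec[key], rel_tol=tol, abs_tol=tol)
        for key in got
    }
```

Calling `check_record(line)` on one line of the log returns a per-field dictionary of booleans. A `False` entry would suggest either a corrupted record or that one of the assumptions above does not match how the log was produced.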

151388
merges.txt Normal file

File diff suppressed because it is too large

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:556803a1461339f1b6ea7e5f73d363a2221c2df72c5977f67d590fa35bf4dc8f
size 4972454376

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:386475117c57e42760cc89917bfae621bf3675a0e4f0c27726e05c8af1c7fcfc
size 4832048608

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ba144f5c724c750db6dce750db1faae7f7df1d59d412cc55bef3d554235bf89d
size 4832048656

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:cab42a3ebf8c7dd9bdc7c336df89818408b7ca1d04583468581142a4410f2558
size 4999855528

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ccfe5f7cef3cff264302de0605a83ac326f2038e3056d80f3f3dd921fcb98fc8
size 4832048672

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:cd8f1bea119ecc5a65caea8d61789770d817afe03f72676cfdea4838a06bf77a
size 4832048672

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:291841df183f90f25cd1f27b4ce637269ddf34e194bfd145592cabf74b376714
size 3462482728

View File

@@ -0,0 +1,406 @@
{
"metadata": {
"total_size": 32762941440
},
"weight_map": {
"lm_head.weight": "model-00007-of-00007.safetensors",
"model.embed_tokens.weight": "model-00001-of-00007.safetensors",
"model.layers.0.input_layernorm.weight": "model-00001-of-00007.safetensors",
"model.layers.0.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.0.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.0.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.0.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
"model.layers.0.self_attn.k_norm.weight": "model-00001-of-00007.safetensors",
"model.layers.0.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.0.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.0.self_attn.q_norm.weight": "model-00001-of-00007.safetensors",
"model.layers.0.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.0.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.1.input_layernorm.weight": "model-00001-of-00007.safetensors",
"model.layers.1.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.1.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.1.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.1.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
"model.layers.1.self_attn.k_norm.weight": "model-00001-of-00007.safetensors",
"model.layers.1.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.1.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.1.self_attn.q_norm.weight": "model-00001-of-00007.safetensors",
"model.layers.1.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.1.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.10.input_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.10.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.10.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.10.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.10.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.10.self_attn.k_norm.weight": "model-00003-of-00007.safetensors",
"model.layers.10.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.10.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.10.self_attn.q_norm.weight": "model-00003-of-00007.safetensors",
"model.layers.10.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.10.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.11.input_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.11.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.11.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.11.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.11.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.11.self_attn.k_norm.weight": "model-00003-of-00007.safetensors",
"model.layers.11.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.11.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.11.self_attn.q_norm.weight": "model-00003-of-00007.safetensors",
"model.layers.11.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.11.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.12.input_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.12.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.12.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.12.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.12.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.12.self_attn.k_norm.weight": "model-00003-of-00007.safetensors",
"model.layers.12.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.12.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.12.self_attn.q_norm.weight": "model-00003-of-00007.safetensors",
"model.layers.12.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.12.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.13.input_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.13.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.13.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.13.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.13.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.13.self_attn.k_norm.weight": "model-00003-of-00007.safetensors",
"model.layers.13.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.13.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.13.self_attn.q_norm.weight": "model-00003-of-00007.safetensors",
"model.layers.13.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.13.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.14.input_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.14.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.14.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.14.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.14.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.14.self_attn.k_norm.weight": "model-00003-of-00007.safetensors",
"model.layers.14.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.14.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.14.self_attn.q_norm.weight": "model-00003-of-00007.safetensors",
"model.layers.14.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.14.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.15.input_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.15.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.15.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.15.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.15.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.15.self_attn.k_norm.weight": "model-00003-of-00007.safetensors",
"model.layers.15.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.15.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.15.self_attn.q_norm.weight": "model-00003-of-00007.safetensors",
"model.layers.15.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.15.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.16.input_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.16.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.16.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.16.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.16.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.16.self_attn.k_norm.weight": "model-00004-of-00007.safetensors",
"model.layers.16.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.16.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.16.self_attn.q_norm.weight": "model-00004-of-00007.safetensors",
"model.layers.16.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.16.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.17.input_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.17.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.17.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.17.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.17.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.17.self_attn.k_norm.weight": "model-00004-of-00007.safetensors",
"model.layers.17.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.17.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.17.self_attn.q_norm.weight": "model-00004-of-00007.safetensors",
"model.layers.17.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.17.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.18.input_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.18.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.18.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.18.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.18.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.18.self_attn.k_norm.weight": "model-00004-of-00007.safetensors",
"model.layers.18.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.18.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.18.self_attn.q_norm.weight": "model-00004-of-00007.safetensors",
"model.layers.18.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.18.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.19.input_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.19.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.19.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.19.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.19.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.19.self_attn.k_norm.weight": "model-00004-of-00007.safetensors",
"model.layers.19.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.19.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.19.self_attn.q_norm.weight": "model-00004-of-00007.safetensors",
"model.layers.19.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.19.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.2.input_layernorm.weight": "model-00001-of-00007.safetensors",
"model.layers.2.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.2.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.2.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.2.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
"model.layers.2.self_attn.k_norm.weight": "model-00001-of-00007.safetensors",
"model.layers.2.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.2.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.2.self_attn.q_norm.weight": "model-00001-of-00007.safetensors",
"model.layers.2.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.2.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.20.input_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.20.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.20.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.20.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.20.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.20.self_attn.k_norm.weight": "model-00004-of-00007.safetensors",
"model.layers.20.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.20.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.20.self_attn.q_norm.weight": "model-00004-of-00007.safetensors",
"model.layers.20.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.20.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.21.input_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.21.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.21.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.21.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.21.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.21.self_attn.k_norm.weight": "model-00004-of-00007.safetensors",
"model.layers.21.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.21.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.21.self_attn.q_norm.weight": "model-00004-of-00007.safetensors",
"model.layers.21.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.21.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.22.input_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.22.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.22.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.22.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.22.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.22.self_attn.k_norm.weight": "model-00004-of-00007.safetensors",
"model.layers.22.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.22.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.22.self_attn.q_norm.weight": "model-00004-of-00007.safetensors",
"model.layers.22.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.22.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.23.input_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.23.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.23.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.23.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.23.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.23.self_attn.k_norm.weight": "model-00005-of-00007.safetensors",
"model.layers.23.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.23.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.23.self_attn.q_norm.weight": "model-00005-of-00007.safetensors",
"model.layers.23.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.23.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.24.input_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.24.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.24.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.24.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.24.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.24.self_attn.k_norm.weight": "model-00005-of-00007.safetensors",
"model.layers.24.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.24.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.24.self_attn.q_norm.weight": "model-00005-of-00007.safetensors",
"model.layers.24.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.24.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.25.input_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.25.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.25.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.25.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.25.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.25.self_attn.k_norm.weight": "model-00005-of-00007.safetensors",
"model.layers.25.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.25.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.25.self_attn.q_norm.weight": "model-00005-of-00007.safetensors",
"model.layers.25.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.25.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.26.input_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.26.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.26.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.26.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.26.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.26.self_attn.k_norm.weight": "model-00005-of-00007.safetensors",
"model.layers.26.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.26.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.26.self_attn.q_norm.weight": "model-00005-of-00007.safetensors",
"model.layers.26.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.26.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.27.input_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.27.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.27.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.27.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.27.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.27.self_attn.k_norm.weight": "model-00005-of-00007.safetensors",
"model.layers.27.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.27.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.27.self_attn.q_norm.weight": "model-00005-of-00007.safetensors",
"model.layers.27.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.27.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.28.input_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.28.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.28.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.28.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.28.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.28.self_attn.k_norm.weight": "model-00005-of-00007.safetensors",
"model.layers.28.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.28.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.28.self_attn.q_norm.weight": "model-00005-of-00007.safetensors",
"model.layers.28.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.28.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.29.input_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.29.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.29.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.29.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.29.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.29.self_attn.k_norm.weight": "model-00006-of-00007.safetensors",
"model.layers.29.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.29.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.29.self_attn.q_norm.weight": "model-00006-of-00007.safetensors",
"model.layers.29.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.29.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.3.input_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.3.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.3.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.3.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.3.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.3.self_attn.k_norm.weight": "model-00001-of-00007.safetensors",
"model.layers.3.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.3.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.3.self_attn.q_norm.weight": "model-00001-of-00007.safetensors",
"model.layers.3.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.3.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.30.input_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.30.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.30.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.30.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.30.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.30.self_attn.k_norm.weight": "model-00006-of-00007.safetensors",
"model.layers.30.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.30.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.30.self_attn.q_norm.weight": "model-00006-of-00007.safetensors",
"model.layers.30.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.30.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.31.input_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.31.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.31.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.31.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.31.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.31.self_attn.k_norm.weight": "model-00006-of-00007.safetensors",
"model.layers.31.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.31.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.31.self_attn.q_norm.weight": "model-00006-of-00007.safetensors",
"model.layers.31.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.31.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.32.input_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.32.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.32.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.32.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.32.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.32.self_attn.k_norm.weight": "model-00006-of-00007.safetensors",
"model.layers.32.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.32.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.32.self_attn.q_norm.weight": "model-00006-of-00007.safetensors",
"model.layers.32.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.32.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.33.input_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.33.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.33.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.33.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.33.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.33.self_attn.k_norm.weight": "model-00006-of-00007.safetensors",
"model.layers.33.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.33.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.33.self_attn.q_norm.weight": "model-00006-of-00007.safetensors",
"model.layers.33.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.33.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.34.input_layernorm.weight": "model-00007-of-00007.safetensors",
"model.layers.34.mlp.down_proj.weight": "model-00007-of-00007.safetensors",
"model.layers.34.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.34.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.34.post_attention_layernorm.weight": "model-00007-of-00007.safetensors",
"model.layers.34.self_attn.k_norm.weight": "model-00006-of-00007.safetensors",
"model.layers.34.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.34.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.34.self_attn.q_norm.weight": "model-00006-of-00007.safetensors",
"model.layers.34.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.34.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.35.input_layernorm.weight": "model-00007-of-00007.safetensors",
"model.layers.35.mlp.down_proj.weight": "model-00007-of-00007.safetensors",
"model.layers.35.mlp.gate_proj.weight": "model-00007-of-00007.safetensors",
"model.layers.35.mlp.up_proj.weight": "model-00007-of-00007.safetensors",
"model.layers.35.post_attention_layernorm.weight": "model-00007-of-00007.safetensors",
"model.layers.35.self_attn.k_norm.weight": "model-00007-of-00007.safetensors",
"model.layers.35.self_attn.k_proj.weight": "model-00007-of-00007.safetensors",
"model.layers.35.self_attn.o_proj.weight": "model-00007-of-00007.safetensors",
"model.layers.35.self_attn.q_norm.weight": "model-00007-of-00007.safetensors",
"model.layers.35.self_attn.q_proj.weight": "model-00007-of-00007.safetensors",
"model.layers.35.self_attn.v_proj.weight": "model-00007-of-00007.safetensors",
"model.layers.4.input_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.4.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.4.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.4.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.4.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.4.self_attn.k_norm.weight": "model-00002-of-00007.safetensors",
"model.layers.4.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.4.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.4.self_attn.q_norm.weight": "model-00002-of-00007.safetensors",
"model.layers.4.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.4.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.5.input_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.5.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.5.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.5.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.5.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.5.self_attn.k_norm.weight": "model-00002-of-00007.safetensors",
"model.layers.5.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.5.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.5.self_attn.q_norm.weight": "model-00002-of-00007.safetensors",
"model.layers.5.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.5.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.6.input_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.6.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.6.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.6.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.6.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.6.self_attn.k_norm.weight": "model-00002-of-00007.safetensors",
"model.layers.6.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.6.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.6.self_attn.q_norm.weight": "model-00002-of-00007.safetensors",
"model.layers.6.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.6.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.7.input_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.7.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.7.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.7.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.7.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.7.self_attn.k_norm.weight": "model-00002-of-00007.safetensors",
"model.layers.7.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.7.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.7.self_attn.q_norm.weight": "model-00002-of-00007.safetensors",
"model.layers.7.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.7.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.8.input_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.8.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.8.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.8.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.8.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.8.self_attn.k_norm.weight": "model-00002-of-00007.safetensors",
"model.layers.8.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.8.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.8.self_attn.q_norm.weight": "model-00002-of-00007.safetensors",
"model.layers.8.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.8.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.9.input_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.9.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.9.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.9.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.9.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.9.self_attn.k_norm.weight": "model-00002-of-00007.safetensors",
"model.layers.9.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.9.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.9.self_attn.q_norm.weight": "model-00002-of-00007.safetensors",
"model.layers.9.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.9.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
"model.norm.weight": "model-00007-of-00007.safetensors"
}
}
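
Note on the weight map above: a single decoder layer can span two shard files (layer 28, for example, has its attention projections in shard 5 and its layer norms in shard 6). The following is a minimal sketch of how such an index could be inspected. It assumes the index has already been parsed into a plain dict; `WEIGHT_MAP` is a hand-copied excerpt, and `group_by_shard` and `shards_for_layer` are hypothetical helper names, not part of any library.

```python
from collections import defaultdict

# Hand-copied excerpt of the weight_map above (layer 28 only); the real
# index contains every tensor of the model.
WEIGHT_MAP = {
    "model.layers.28.input_layernorm.weight": "model-00006-of-00007.safetensors",
    "model.layers.28.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
    "model.layers.28.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
    "model.layers.28.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
}


def group_by_shard(weight_map):
    """Invert the tensor -> shard mapping into shard -> sorted tensor names."""
    groups = defaultdict(list)
    for name, shard in weight_map.items():
        groups[shard].append(name)
    return {shard: sorted(names) for shard, names in groups.items()}


def shards_for_layer(weight_map, layer):
    """Return the sorted shard files that hold any tensor of the given layer."""
    prefix = f"model.layers.{layer}."
    return sorted({s for n, s in weight_map.items() if n.startswith(prefix)})


by_shard = group_by_shard(WEIGHT_MAP)
layer_28_shards = shards_for_layer(WEIGHT_MAP, 28)
```

The trailing dot in the prefix matters: without it, a lookup for layer 2 would also match layers 20 through 29.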

31
special_tokens_map.json Normal file

@@ -0,0 +1,31 @@
{
"additional_special_tokens": [
"<|im_start|>",
"<|im_end|>",
"<|object_ref_start|>",
"<|object_ref_end|>",
"<|box_start|>",
"<|box_end|>",
"<|quad_start|>",
"<|quad_end|>",
"<|vision_start|>",
"<|vision_end|>",
"<|vision_pad|>",
"<|image_pad|>",
"<|video_pad|>"
],
"eos_token": {
"content": "<|endoftext|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"pad_token": {
"content": "<|endoftext|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
}
}

BIN
tokenizer.json (Stored with Git LFS) Normal file

Binary file not shown.

240
tokenizer_config.json Normal file

@@ -0,0 +1,240 @@
{
"add_bos_token": false,
"add_prefix_space": false,
"added_tokens_decoder": {
"151643": {
"content": "<|endoftext|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151644": {
"content": "<|im_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151645": {
"content": "<|im_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151646": {
"content": "<|object_ref_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151647": {
"content": "<|object_ref_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151648": {
"content": "<|box_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151649": {
"content": "<|box_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151650": {
"content": "<|quad_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151651": {
"content": "<|quad_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151652": {
"content": "<|vision_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151653": {
"content": "<|vision_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151654": {
"content": "<|vision_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151655": {
"content": "<|image_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151656": {
"content": "<|video_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151657": {
"content": "<tool_call>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151658": {
"content": "</tool_call>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151659": {
"content": "<|fim_prefix|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151660": {
"content": "<|fim_middle|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151661": {
"content": "<|fim_suffix|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151662": {
"content": "<|fim_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151663": {
"content": "<|repo_name|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151664": {
"content": "<|file_sep|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151665": {
"content": "<tool_response>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151666": {
"content": "</tool_response>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151667": {
"content": "<think>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151668": {
"content": "</think>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
}
},
"additional_special_tokens": [
"<|im_start|>",
"<|im_end|>",
"<|object_ref_start|>",
"<|object_ref_end|>",
"<|box_start|>",
"<|box_end|>",
"<|quad_start|>",
"<|quad_end|>",
"<|vision_start|>",
"<|vision_end|>",
"<|vision_pad|>",
"<|image_pad|>",
"<|video_pad|>"
],
"bos_token": null,
"chat_template": "{%- if tools %}\n {{- '<|im_start|>system\\n' }}\n {%- if messages[0].role == 'system' %}\n {{- messages[0].content + '\\n\\n' }}\n {%- endif %}\n {{- \"# Tools\\n\\nYou may call one or more functions to assist with the user query.\\n\\nYou are provided with function signatures within <tools></tools> XML tags:\\n<tools>\" }}\n {%- for tool in tools %}\n {{- \"\\n\" }}\n {{- tool | tojson }}\n {%- endfor %}\n {{- \"\\n</tools>\\n\\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\\n<tool_call>\\n{\\\"name\\\": <function-name>, \\\"arguments\\\": <args-json-object>}\\n</tool_call><|im_end|>\\n\" }}\n{%- else %}\n {%- if messages[0].role == 'system' %}\n {{- '<|im_start|>system\\n' + messages[0].content + '<|im_end|>\\n' }}\n {%- endif %}\n{%- endif %}\n{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}\n{%- for message in messages[::-1] %}\n {%- set index = (messages|length - 1) - loop.index0 %}\n {%- if ns.multi_step_tool and message.role == \"user\" and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}\n {%- set ns.multi_step_tool = false %}\n {%- set ns.last_query_index = index %}\n {%- endif %}\n{%- endfor %}\n{%- for message in messages %}\n {%- if (message.role == \"user\") or (message.role == \"system\" and not loop.first) %}\n {{- '<|im_start|>' + message.role + '\\n' + message.content + '<|im_end|>' + '\\n' }}\n {%- elif message.role == \"assistant\" %}\n {%- set content = message.content %}\n {%- set reasoning_content = '' %}\n {%- if message.reasoning_content is defined and message.reasoning_content is not none %}\n {%- set reasoning_content = message.reasoning_content %}\n {%- else %}\n {%- if '</think>' in message.content %}\n {%- set content = message.content.split('</think>')[-1].lstrip('\\n') %}\n {%- set reasoning_content = 
message.content.split('</think>')[0].rstrip('\\n').split('<think>')[-1].lstrip('\\n') %}\n {%- endif %}\n {%- endif %}\n {%- if loop.index0 > ns.last_query_index %}\n {%- if loop.last or (not loop.last and reasoning_content) %}\n {{- '<|im_start|>' + message.role + '\\n<think>\\n' + reasoning_content.strip('\\n') + '\\n</think>\\n\\n' + content.lstrip('\\n') }}\n {%- else %}\n {{- '<|im_start|>' + message.role + '\\n' + content }}\n {%- endif %}\n {%- else %}\n {{- '<|im_start|>' + message.role + '\\n' + content }}\n {%- endif %}\n {%- if message.tool_calls %}\n {%- for tool_call in message.tool_calls %}\n {%- if (loop.first and content) or (not loop.first) %}\n {{- '\\n' }}\n {%- endif %}\n {%- if tool_call.function %}\n {%- set tool_call = tool_call.function %}\n {%- endif %}\n {{- '<tool_call>\\n{\"name\": \"' }}\n {{- tool_call.name }}\n {{- '\", \"arguments\": ' }}\n {%- if tool_call.arguments is string %}\n {{- tool_call.arguments }}\n {%- else %}\n {{- tool_call.arguments | tojson }}\n {%- endif %}\n {{- '}\\n</tool_call>' }}\n {%- endfor %}\n {%- endif %}\n {{- '<|im_end|>\\n' }}\n {%- elif message.role == \"tool\" %}\n {%- if loop.first or (messages[loop.index0 - 1].role != \"tool\") %}\n {{- '<|im_start|>user' }}\n {%- endif %}\n {{- '\\n<tool_response>\\n' }}\n {{- message.content }}\n {{- '\\n</tool_response>' }}\n {%- if loop.last or (messages[loop.index0 + 1].role != \"tool\") %}\n {{- '<|im_end|>\\n' }}\n {%- endif %}\n {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n {{- '<|im_start|>assistant\\n' }}\n {%- if enable_thinking is defined and enable_thinking is false %}\n {{- '<think>\\n\\n</think>\\n\\n' }}\n {%- endif %}\n{%- endif %}",
"clean_up_tokenization_spaces": false,
"eos_token": "<|endoftext|>",
"errors": "replace",
"extra_special_tokens": {},
"model_max_length": 2048,
"pad_token": "<|endoftext|>",
"split_special_tokens": false,
"tokenizer_class": "Qwen2Tokenizer",
"unk_token": null
}
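
Note on the chat template above: for a conversation without tools and without prior assistant turns, the template reduces to wrapping each message in `<|im_start|>role` and `<|im_end|>` markers. The sketch below re-implements only that simple path in plain Python to make the resulting prompt visible. It is a hand-written approximation for illustration, not the Jinja template itself; `render_simple` is a hypothetical name, and assistant, tool, and tool-definition handling are deliberately left out.

```python
IM_START = "<|im_start|>"
IM_END = "<|im_end|>"


def render_simple(messages, add_generation_prompt=True, enable_thinking=None):
    """Approximate the template for system/user messages only (no tools)."""
    parts = []
    for message in messages:
        role = message["role"]
        if role not in ("system", "user"):
            # Assistant and tool turns follow more involved rules in the
            # template (reasoning extraction, tool-call blocks); not covered.
            raise ValueError(f"role not covered by this sketch: {role}")
        parts.append(f"{IM_START}{role}\n{message['content']}{IM_END}\n")
    if add_generation_prompt:
        parts.append(f"{IM_START}assistant\n")
        # The template emits an empty think block only when enable_thinking
        # is explicitly passed as false.
        if enable_thinking is False:
            parts.append("<think>\n\n</think>\n\n")
    return "".join(parts)


prompt = render_simple(
    [
        {"role": "system", "content": "You are helpful."},
        {"role": "user", "content": "Hi"},
    ]
)
```

For real use, the tokenizer's own template rendering should be preferred, since it also covers the tool and reasoning branches that this sketch skips.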

1720
train.log Normal file

File diff suppressed because one or more lines are too long

9
train_results.json Normal file

@@ -0,0 +1,9 @@
{
"epoch": 1.0,
"total_flos": 0.0,
"train_loss": 0.7553340482816823,
"train_runtime": 3298.7616,
"train_samples": 43598,
"train_samples_per_second": 13.216,
"train_steps_per_second": 0.206
}
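
Note on the figures above: the reported throughput can be cross-checked from the other fields. The sketch below recomputes samples per second from `train_samples` and `train_runtime`, and estimates the number of optimizer steps from `train_steps_per_second`. The batch size of 64 is an assumption taken from the "batch-64" part of the model name; it does not appear in this file.

```python
# Values copied from train_results.json above.
train_runtime = 3298.7616        # seconds
train_samples = 43598
train_samples_per_second = 13.216
train_steps_per_second = 0.206

# Assumption: global batch size inferred from "batch-64" in the model name.
assumed_batch_size = 64

# Throughput recomputed from the raw counts.
recomputed_samples_per_second = train_samples / train_runtime

# Step count implied by the (rounded) reported step rate.
implied_steps = train_steps_per_second * train_runtime

# Step count expected for one epoch at the assumed batch size.
expected_steps = train_samples / assumed_batch_size

print(f"samples/s recomputed: {recomputed_samples_per_second:.3f}")
print(f"steps implied by rate: {implied_steps:.1f}")
print(f"steps at batch {assumed_batch_size}: {expected_steps:.1f}")
```

The two step estimates agree to within a couple of steps, which is consistent with a single epoch at the assumed batch size; the small gap is within the rounding of the reported step rate.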

10354
trainer_state.json Normal file

File diff suppressed because it is too large Load Diff

1
vocab.json Normal file

File diff suppressed because one or more lines are too long