Initialize project; model provided by the ModelHub XC community

Model: jackf857/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249
Source: Original Platform
ModelHub XC
2026-05-19 02:33:40 +08:00
commit 4efda549c2
24 changed files with 164720 additions and 0 deletions

36
.gitattributes vendored Normal file

@@ -0,0 +1,36 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
tokenizer.json filter=lfs diff=lfs merge=lfs -text

80
README.md Normal file

@@ -0,0 +1,80 @@
---
library_name: transformers
base_model: jackf857/qwen3-8b-base-sft-hh-harmless-4xh200-batch-64-20260417-214452
tags:
- alignment-handbook
- margin-dpo
- generated_from_trainer
datasets:
- Anthropic/hh-rlhf
model-index:
- name: qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249
results: []
---
<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->
# qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249
This model is a fine-tuned version of [jackf857/qwen3-8b-base-sft-hh-harmless-4xh200-batch-64-20260417-214452](https://huggingface.co/jackf857/qwen3-8b-base-sft-hh-harmless-4xh200-batch-64-20260417-214452) on the Anthropic/hh-rlhf dataset.
It achieves the following results on the evaluation set:
- Loss: 0.5180
- Margin Dpo/margin Mean: 7.8948
- Margin Dpo/margin Std: 11.6820
- Logps/chosen: -90.0938
- Logps/rejected: -105.9037
- Logps/ref Chosen: -87.3172
- Logps/ref Rejected: -95.2323
- Logits/chosen: 1.4433
- Logits/rejected: 1.3188
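
The reported margin mean is consistent with the usual DPO log-ratio difference between chosen and rejected responses, computed from the mean log-probabilities listed above. The sketch below checks that arithmetic. Treating the margin as this quantity is an assumption inferred from the numbers, not something the Trainer output states, and the variable names are illustrative only.

```python
# Mean log-probabilities copied from the evaluation results above.
logps_chosen = -90.0938
logps_rejected = -105.9037
ref_logps_chosen = -87.3172
ref_logps_rejected = -95.2323

# Assumed definition: (policy - reference) log-ratio of the chosen response
# minus the same log-ratio of the rejected response.
chosen_log_ratio = logps_chosen - ref_logps_chosen
rejected_log_ratio = logps_rejected - ref_logps_rejected
margin = chosen_log_ratio - rejected_log_ratio

print(f"chosen log-ratio:   {chosen_log_ratio:.4f}")
print(f"rejected log-ratio: {rejected_log_ratio:.4f}")
print(f"margin:             {margin:.4f}")
```

Because a mean of per-example differences equals the difference of the means, the check can be done on the averaged values alone. Under this reading, most of the margin comes from the rejected responses becoming less likely than under the reference model, while the chosen responses also lose a little probability.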
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 2
- total_train_batch_size: 64
- total_eval_batch_size: 32
- optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
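
The total batch sizes in the list follow from the per-device sizes, the device count, and gradient accumulation. A minimal sketch of that arithmetic, using only the values listed above (variable names are illustrative):

```python
# Values copied from the hyperparameter list above.
train_batch_size = 8              # per-device training batch size
eval_batch_size = 8               # per-device evaluation batch size
num_devices = 4
gradient_accumulation_steps = 2

# Each optimizer step consumes one micro-batch per device per accumulation step.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps

# Evaluation does not accumulate gradients, so only the device count multiplies.
total_eval_batch_size = eval_batch_size * num_devices

print(total_train_batch_size, total_eval_batch_size)
```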
### Training results
| Training Loss | Epoch | Step | Validation Loss | Margin Dpo/margin Mean | Margin Dpo/margin Std | Logps/chosen | Logps/rejected | Logps/ref Chosen | Logps/ref Rejected | Logits/chosen | Logits/rejected |
|:-------------:|:------:|:----:|:---------------:|:----------------------:|:---------------------:|:------------:|:--------------:|:----------------:|:------------------:|:-------------:|:---------------:|
| 1.3236 | 0.1512 | 100 | 0.6540 | 0.9206 | 1.8427 | -86.6428 | -95.4785 | -87.3172 | -95.2323 | 1.6973 | 1.5878 |
| 1.1498 | 0.3023 | 200 | 0.5566 | 5.3407 | 9.0153 | -88.1022 | -101.3580 | -87.3172 | -95.2323 | 1.4121 | 1.2978 |
| 1.1522 | 0.4535 | 300 | 0.5328 | 7.2941 | 11.5055 | -91.8542 | -107.0635 | -87.3172 | -95.2323 | 1.4997 | 1.3738 |
| 1.2091 | 0.6047 | 400 | 0.5248 | 7.2854 | 11.1368 | -89.2882 | -104.4887 | -87.3172 | -95.2323 | 1.4582 | 1.3356 |
| 1.0214 | 0.7559 | 500 | 0.5192 | 8.0772 | 11.9903 | -90.4015 | -106.3938 | -87.3172 | -95.2323 | 1.7114 | 1.5744 |
| 1.1318 | 0.9070 | 600 | 0.5180 | 7.8948 | 11.6820 | -90.0938 | -105.9037 | -87.3172 | -95.2323 | 1.4433 | 1.3188 |
### Framework versions
- Transformers 4.51.0
- Pytorch 2.3.1+cu121
- Datasets 2.21.0
- Tokenizers 0.21.4

28
added_tokens.json Normal file

@@ -0,0 +1,28 @@
{
"</think>": 151668,
"</tool_call>": 151658,
"</tool_response>": 151666,
"<think>": 151667,
"<tool_call>": 151657,
"<tool_response>": 151665,
"<|box_end|>": 151649,
"<|box_start|>": 151648,
"<|endoftext|>": 151643,
"<|file_sep|>": 151664,
"<|fim_middle|>": 151660,
"<|fim_pad|>": 151662,
"<|fim_prefix|>": 151659,
"<|fim_suffix|>": 151661,
"<|im_end|>": 151645,
"<|im_start|>": 151644,
"<|image_pad|>": 151655,
"<|object_ref_end|>": 151647,
"<|object_ref_start|>": 151646,
"<|quad_end|>": 151651,
"<|quad_start|>": 151650,
"<|repo_name|>": 151663,
"<|video_pad|>": 151656,
"<|vision_end|>": 151653,
"<|vision_pad|>": 151654,
"<|vision_start|>": 151652
}

22
all_results.json Normal file

@@ -0,0 +1,22 @@
{
"epoch": 0.999244142101285,
"eval_logits/chosen": 1.3520251512527466,
"eval_logits/rejected": 1.231923222541809,
"eval_logps/chosen": -90.12765502929688,
"eval_logps/ref_chosen": -87.31719970703125,
"eval_logps/ref_rejected": -95.23231506347656,
"eval_logps/rejected": -105.97742462158203,
"eval_loss": 0.5182201266288757,
"eval_margin_dpo/margin_mean": 7.934661388397217,
"eval_margin_dpo/margin_std": 11.753697395324707,
"eval_runtime": 42.5799,
"eval_samples": 2303,
"eval_samples_per_second": 54.087,
"eval_steps_per_second": 1.691,
"total_flos": 0.0,
"train_loss": 1.122965409968516,
"train_runtime": 3224.9347,
"train_samples": 42336,
"train_samples_per_second": 13.128,
"train_steps_per_second": 0.205
}
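
The epoch fraction and throughput figures in all_results.json can be cross-checked against each other. The sketch below assumes a global batch of 64 (total_train_batch_size in README.md) and 661 optimizer steps (the line count of margin_logs/margins.jsonl, assuming one log line per step); both assumptions are inferred from this commit rather than stated by the trainer.

```python
# Values copied from all_results.json.
train_samples = 42336
train_runtime = 3224.9347   # seconds
eval_samples = 2303
eval_runtime = 42.5799      # seconds

# Assumptions taken from elsewhere in this commit (see the text above).
global_batch = 64
optimizer_steps = 661

# A full epoch would need 661.5 steps, so the trailing partial batch is not run.
steps_per_epoch = train_samples / global_batch
epoch = optimizer_steps / steps_per_epoch

train_samples_per_second = train_samples / train_runtime
train_steps_per_second = optimizer_steps / train_runtime
eval_samples_per_second = eval_samples / eval_runtime

print(f"epoch                    = {epoch:.6f}")
print(f"train_samples_per_second = {train_samples_per_second:.3f}")
print(f"train_steps_per_second   = {train_steps_per_second:.3f}")
print(f"eval_samples_per_second  = {eval_samples_per_second:.3f}")
```

Under these assumptions the computed values agree with the reported ones, which explains why the final epoch is slightly below 1.0.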

30
config.json Normal file

@@ -0,0 +1,30 @@
{
"architectures": [
"Qwen3ForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 151643,
"eos_token_id": 151643,
"head_dim": 128,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 12288,
"max_position_embeddings": 32768,
"max_window_layers": 36,
"model_type": "qwen3",
"num_attention_heads": 32,
"num_hidden_layers": 36,
"num_key_value_heads": 8,
"rms_norm_eps": 1e-06,
"rope_scaling": null,
"rope_theta": 1000000,
"sliding_window": null,
"tie_word_embeddings": false,
"torch_dtype": "float32",
"transformers_version": "4.51.0",
"use_cache": true,
"use_sliding_window": false,
"vocab_size": 151936
}
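
The attention fields in config.json describe grouped-query attention: 32 query heads share 8 key/value heads. A small sketch of the shapes this implies, using only values from the file above; the KV-cache estimate assumes one key and one value vector per KV head per layer and ignores any framework overhead.

```python
# Values copied from config.json.
hidden_size = 4096
head_dim = 128
num_attention_heads = 32
num_key_value_heads = 8
num_hidden_layers = 36

q_width = num_attention_heads * head_dim     # width of the query projection output
kv_width = num_key_value_heads * head_dim    # width of each key/value projection output
queries_per_kv = num_attention_heads // num_key_value_heads

# Cached elements per token: keys plus values, for every layer.
kv_cache_elems_per_token = 2 * num_hidden_layers * kv_width

# Bytes per token if the cache is held in float32 (the torch_dtype recorded here);
# loading the model in a 16-bit dtype would halve this.
kv_cache_bytes_per_token_fp32 = kv_cache_elems_per_token * 4

print(q_width, kv_width, queries_per_kv)
print(kv_cache_elems_per_token, kv_cache_bytes_per_token_fp32)
```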

16
eval_results.json Normal file

@@ -0,0 +1,16 @@
{
"epoch": 0.999244142101285,
"eval_logits/chosen": 1.3520251512527466,
"eval_logits/rejected": 1.231923222541809,
"eval_logps/chosen": -90.12765502929688,
"eval_logps/ref_chosen": -87.31719970703125,
"eval_logps/ref_rejected": -95.23231506347656,
"eval_logps/rejected": -105.97742462158203,
"eval_loss": 0.5182201266288757,
"eval_margin_dpo/margin_mean": 7.934661388397217,
"eval_margin_dpo/margin_std": 11.753697395324707,
"eval_runtime": 42.5799,
"eval_samples": 2303,
"eval_samples_per_second": 54.087,
"eval_steps_per_second": 1.691
}

6
generation_config.json Normal file

@@ -0,0 +1,6 @@
{
"bos_token_id": 151643,
"eos_token_id": 151643,
"max_new_tokens": 2048,
"transformers_version": "4.51.0"
}

661
margin_logs/margins.jsonl Normal file

@@ -0,0 +1,661 @@
{"epoch": 0.0, "step": 1, "batch_size": 64, "mean": -0.0029816031455993652, "std": 0.38981664180755615, "min": -0.7835464477539062, "p10": -0.5016929626464843, "median": 0.02667522430419922, "p90": 0.4355194091796875, "max": 1.2425384521484375, "pos_frac": 0.53125, "sample": [-0.2990684509277344, 0.05040740966796875, 0.4813804626464844, -0.7835464477539062, 0.16756057739257812, -0.21320724487304688, 0.066741943359375, 0.169891357421875, -0.06363677978515625, -0.33983612060546875, 0.20204925537109375, -0.003765106201171875, -0.7424850463867188, -0.039760589599609375, 0.008941650390625, 0.2320232391357422, 0.3860015869140625, 0.11869239807128906, -0.36592864990234375, -0.047290802001953125, -0.28316688537597656, 0.0283660888671875, -0.351715087890625, 0.11574554443359375, 0.86297607421875, -0.7426376342773438, 0.1338043212890625, -0.21837997436523438, 0.426910400390625, -0.12430953979492188, 0.2183837890625, -0.4932708740234375, 0.13604736328125, 0.1666259765625, 0.024984359741210938, -0.42929840087890625, -0.6993560791015625, -0.413604736328125, 0.22283935546875, -0.0557861328125, 1.2425384521484375, -0.2928791046142578, -0.14715576171875, 0.3737640380859375, -0.14208221435546875, 0.19033432006835938, 0.3464927673339844, 0.20479965209960938, 0.04190826416015625, -0.00957489013671875, -0.5053024291992188, 0.4848480224609375, 0.2988262176513672, 0.045352935791015625, 0.427978515625, -0.5745201110839844, 0.5770988464355469, 0.1401214599609375, -0.027454376220703125, -0.6424560546875, -0.2728919982910156, -0.428192138671875, 0.5285491943359375, 0.438751220703125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000001.npy"}
{"epoch": 0.0015117157974300832, "step": 2, "batch_size": 64, "mean": 0.029325813055038452, "std": 0.47058698534965515, "min": -1.2616119384765625, "p10": -0.39437751770019525, "median": -0.11953926086425781, "p90": 0.6386299133300786, "max": 1.48486328125, "pos_frac": 0.4375, "sample": [-0.43146514892578125, 0.07180404663085938, -0.20481109619140625, -0.00714111328125, 0.5232467651367188, 0.06253433227539062, -0.07450485229492188, -0.35506439208984375, -0.14567184448242188, -0.2234630584716797, -0.31732177734375, 1.456878662109375, 0.14324188232421875, -0.41083526611328125, -0.4837646484375, -0.12252044677734375, -0.1322479248046875, 0.45180511474609375, -0.6440353393554688, -1.2616119384765625, 0.7379837036132812, 0.0069866180419921875, 0.14553451538085938, 0.2057647705078125, -0.11970138549804688, 0.1814441680908203, -0.2711448669433594, -0.22872161865234375, 0.23077011108398438, 0.2108001708984375, 0.348419189453125, -0.10046005249023438, 0.4903106689453125, -0.209228515625, 0.3726234436035156, -0.2670707702636719, 0.056774139404296875, 0.1702728271484375, -0.3437042236328125, -0.5232925415039062, 0.1266021728515625, -0.31758880615234375, -0.4544639587402344, -0.13794708251953125, 0.5147171020507812, 0.03656768798828125, 1.48486328125, -0.2191619873046875, -0.22581100463867188, -0.11937713623046875, -0.1849536895751953, 0.9678802490234375, 0.3454742431640625, -0.16698455810546875, -0.2411823272705078, -0.1938018798828125, 0.999603271484375, -0.17424774169921875, 0.908782958984375, -0.3559761047363281, -0.17584609985351562, 0.688079833984375, 0.04034423828125, -0.2581329345703125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000002.npy"}
{"epoch": 0.0030234315948601664, "step": 3, "batch_size": 64, "mean": -0.023697078227996826, "std": 0.45782145857810974, "min": -1.91986083984375, "p10": -0.5236030578613281, "median": -0.015285491943359375, "p90": 0.4350975036621095, "max": 1.10260009765625, "pos_frac": 0.484375, "sample": [-0.37566375732421875, 0.23485183715820312, -0.17536163330078125, 0.24562835693359375, -1.91986083984375, -0.01930999755859375, -0.011260986328125, -0.28614044189453125, 0.85833740234375, 0.2634124755859375, -0.03998565673828125, 0.44366455078125, 0.2115020751953125, -0.107452392578125, -0.5330810546875, -0.24988555908203125, -0.1912078857421875, -0.577880859375, 0.05163764953613281, 0.414642333984375, -0.3824920654296875, -0.10361099243164062, 0.6924972534179688, 0.48990631103515625, -0.11035919189453125, 0.248046875, 0.2889251708984375, -0.771728515625, 0.14304542541503906, 0.2736968994140625, -0.5632228851318359, 0.12537384033203125, 0.41510772705078125, -0.5014877319335938, 0.3296852111816406, -0.2542743682861328, -0.8375320434570312, -0.21380615234375, 0.0877532958984375, -0.31082916259765625, -0.02677154541015625, 0.10428237915039062, -0.7775650024414062, 0.561798095703125, 0.1243896484375, -0.1341705322265625, -0.27362060546875, 0.013427734375, -0.43447113037109375, -0.06104278564453125, 0.1995086669921875, -0.37561798095703125, 0.32418060302734375, -0.0221099853515625, -0.33000946044921875, 0.22850799560546875, -0.189361572265625, 1.10260009765625, 0.24840164184570312, 0.6668701171875, 0.4114952087402344, 0.024749755859375, -0.48583984375, 0.3024749755859375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000003.npy"}
{"epoch": 0.0045351473922902496, "step": 4, "batch_size": 64, "mean": -0.05591192841529846, "std": 0.4326469898223877, "min": -1.0691566467285156, "p10": -0.6583343505859374, "median": -0.07081794738769531, "p90": 0.4313098907470704, "max": 1.0045242309570312, "pos_frac": 0.453125, "sample": [-0.2283477783203125, -0.07517242431640625, -0.1802825927734375, 0.21552276611328125, 0.498504638671875, -0.18966293334960938, -0.21212387084960938, -1.0691566467285156, 0.55279541015625, 0.088775634765625, 0.10100936889648438, 0.1404876708984375, -0.086944580078125, 0.6613006591796875, -0.4389190673828125, -0.4662628173828125, 0.917205810546875, 0.2422637939453125, -0.304412841796875, 0.12021636962890625, -0.9850921630859375, -0.23685073852539062, 0.07217025756835938, -0.7182464599609375, 0.3552742004394531, 0.8540191650390625, -0.4804840087890625, 0.4134483337402344, 1.0045242309570312, -0.15228271484375, -0.3690338134765625, -0.002971649169921875, -0.2252197265625, -0.03482818603515625, -0.25142669677734375, -0.315460205078125, -0.9277496337890625, 0.06322860717773438, -0.7583160400390625, -0.17298126220703125, 0.358062744140625, -0.17002105712890625, -0.06646347045898438, 0.10154342651367188, 0.43896484375, 0.188232421875, -0.3348579406738281, 0.17641067504882812, -0.244384765625, 0.19111251831054688, -0.19725799560546875, -0.22145843505859375, 0.1349048614501953, 0.06200408935546875, -0.778961181640625, 0.1268768310546875, -0.8601226806640625, 0.3565711975097656, 0.2837677001953125, -0.25106048583984375, -0.07921600341796875, -0.5185394287109375, 0.1865081787109375, 0.12050247192382812], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000004.npy"}
{"epoch": 0.006046863189720333, "step": 5, "batch_size": 64, "mean": -0.05295339226722717, "std": 0.46033552289009094, "min": -1.3843231201171875, "p10": -0.6246238708496092, "median": -0.071685791015625, "p90": 0.5273338317871095, "max": 1.045684814453125, "pos_frac": 0.421875, "sample": [0.3418693542480469, -0.4502105712890625, 0.021970748901367188, -0.06707763671875, 0.0313568115234375, -1.3843231201171875, -0.5203895568847656, -0.6576995849609375, -0.142181396484375, 0.5985336303710938, 0.5424880981445312, -0.875640869140625, 0.14334869384765625, 0.491973876953125, 0.8687744140625, -0.019989013671875, -0.15488815307617188, 0.19957923889160156, -0.22430419921875, -0.3468360900878906, -0.22603988647460938, 0.3718433380126953, 0.22152137756347656, -0.4967384338378906, 1.045684814453125, -0.2505035400390625, 0.48891448974609375, -0.0762939453125, -0.9005126953125, 0.46755218505859375, -0.09194183349609375, -0.2880401611328125, -0.25054931640625, -0.09242439270019531, -0.5474472045898438, -0.13890838623046875, -0.010328292846679688, -0.06174468994140625, 0.011089324951171875, 0.22116851806640625, 0.01992034912109375, -0.336395263671875, 0.6236190795898438, -0.15242767333984375, -0.4627227783203125, 0.1039276123046875, -0.08152008056640625, 0.301788330078125, 0.6996307373046875, -0.74322509765625, -0.1083984375, 0.777679443359375, -0.02599334716796875, 0.34894561767578125, -0.718109130859375, -0.09141921997070312, -0.35814666748046875, 0.20095443725585938, 0.1799182891845703, 0.16890716552734375, 0.3045158386230469, -0.8083724975585938, -0.5225372314453125, -0.5022125244140625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000005.npy"}
{"epoch": 0.007558578987150416, "step": 6, "batch_size": 64, "mean": 0.11092519760131836, "std": 0.3888793885707855, "min": -1.306121826171875, "p10": -0.20977706909179686, "median": 0.11199569702148438, "p90": 0.49496459960937506, "max": 1.3124313354492188, "pos_frac": 0.71875, "sample": [0.3983192443847656, 0.27388763427734375, 0.5018081665039062, 0.19976806640625, 0.114654541015625, 0.2878837585449219, 0.00287628173828125, -0.17101669311523438, 0.15926361083984375, 0.050540924072265625, -0.11278152465820312, -0.287200927734375, 0.2155132293701172, -0.086151123046875, 0.93951416015625, 0.184326171875, 0.04860687255859375, -0.10675048828125, 0.3272590637207031, -0.23872756958007812, 0.6820526123046875, -0.888946533203125, 0.1060943603515625, -0.1482524871826172, 0.2257232666015625, 0.10933685302734375, -0.20562744140625, -1.306121826171875, 0.0338897705078125, 0.19256591796875, -0.06197357177734375, 0.8635215759277344, 0.4420166015625, 0.14841842651367188, 0.24866485595703125, 0.0819091796875, 0.5501899719238281, 1.3124313354492188, 0.1243438720703125, 0.24396133422851562, -0.6861419677734375, 0.1249237060546875, 0.10659027099609375, 0.12500381469726562, -0.21155548095703125, 0.1806640625, 0.184295654296875, 0.47899627685546875, 0.025270462036132812, 0.10140800476074219, 0.11578369140625, 0.2185516357421875, 0.036373138427734375, -0.08511543273925781, 0.00443267822265625, 0.3087120056152344, -0.19737625122070312, -0.214202880859375, 1.0, -0.1848602294921875, 0.2952423095703125, 0.05629920959472656, 0.032283782958984375, -0.17215728759765625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000006.npy"}
{"epoch": 0.009070294784580499, "step": 7, "batch_size": 64, "mean": -0.08516579866409302, "std": 0.41920268535614014, "min": -1.33831787109375, "p10": -0.5426959991455078, "median": -0.04171562194824219, "p90": 0.42117881774902377, "max": 0.842559814453125, "pos_frac": 0.4375, "sample": [0.007091522216796875, 0.115325927734375, 0.842559814453125, -0.5744972229003906, -0.4913902282714844, -0.2840728759765625, 0.22412109375, -0.3116722106933594, -0.117034912109375, 0.15216827392578125, -0.22993850708007812, 0.24911117553710938, 0.1318683624267578, 0.33733367919921875, -0.20450401306152344, 0.5200271606445312, -0.3940887451171875, -0.07821273803710938, 0.11034393310546875, -0.48111724853515625, -0.19913101196289062, 0.5652084350585938, 0.2612457275390625, 0.1287078857421875, -1.153076171875, -0.34691619873046875, 0.0339813232421875, -0.9288139343261719, -0.47174835205078125, 0.2653656005859375, -0.5448875427246094, -0.24285125732421875, -0.5021133422851562, -0.14127349853515625, 0.07133865356445312, -0.20558929443359375, -0.035614013671875, -0.2751121520996094, 0.20868682861328125, -0.008056640625, -0.19052886962890625, -0.047817230224609375, 0.4557685852050781, 0.559112548828125, 0.1864013671875, 0.8091049194335938, -0.44550323486328125, 0.16884994506835938, 0.044506072998046875, 0.3404693603515625, -0.008953094482421875, 0.18051910400390625, -0.10285377502441406, -0.26537322998046875, -0.2841949462890625, -0.5375823974609375, -0.48152923583984375, -0.7779922485351562, -1.33831787109375, 0.5445709228515625, 0.15570068359375, -0.00856781005859375, 0.13588714599609375, -0.5450611114501953], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000007.npy"}
{"epoch": 0.010582010582010581, "step": 8, "batch_size": 64, "mean": 0.042690396308898926, "std": 0.37920111417770386, "min": -0.9971542358398438, "p10": -0.4463935852050781, "median": 0.06449127197265625, "p90": 0.49541511535644533, "max": 1.0818023681640625, "pos_frac": 0.546875, "sample": [-0.3851604461669922, 0.270782470703125, 0.03070068359375, -0.3773193359375, 0.2193603515625, 0.4956474304199219, -0.5853424072265625, 0.494873046875, 0.23632049560546875, 0.397491455078125, 0.60760498046875, -0.0206756591796875, -0.2118968963623047, -0.15597152709960938, -0.142425537109375, 0.26409149169921875, 0.019012451171875, 0.12432479858398438, -0.05310821533203125, 0.100372314453125, -0.07801055908203125, -0.3861885070800781, -0.9971542358398438, 1.0818023681640625, 0.2972869873046875, -0.21878433227539062, 0.733612060546875, 0.049285888671875, -0.40633392333984375, -0.01078033447265625, 0.2899436950683594, -0.178924560546875, -0.09769439697265625, 0.3932533264160156, -0.5130538940429688, -0.4964141845703125, -0.14728546142578125, 0.09308624267578125, 0.1412811279296875, 0.395263671875, -0.08394622802734375, -0.17898178100585938, 0.5129585266113281, 0.0796966552734375, 0.7049636840820312, -0.7156295776367188, 0.19393539428710938, 0.34189605712890625, -0.16073036193847656, 0.30158233642578125, 0.4967517852783203, -0.46356201171875, 0.2994651794433594, -0.08195877075195312, -0.1596221923828125, 0.32688140869140625, -0.3173408508300781, -0.57373046875, 0.25252532958984375, 0.1920013427734375, 0.1255950927734375, 0.32442474365234375, 0.3462982177734375, -0.30416107177734375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000008.npy"}
{"epoch": 0.012093726379440665, "step": 9, "batch_size": 64, "mean": 0.028308242559432983, "std": 0.4588913321495056, "min": -1.938690185546875, "p10": -0.4285530090332031, "median": 0.023145675659179688, "p90": 0.5668273925781252, "max": 1.04248046875, "pos_frac": 0.5, "sample": [0.180633544921875, -0.46083831787109375, 0.132354736328125, -0.9810028076171875, -0.23604965209960938, -0.104400634765625, 0.13064193725585938, 0.308685302734375, -0.797607421875, 0.7418975830078125, 0.233245849609375, 0.11417579650878906, 0.13680267333984375, 0.8250846862792969, -1.938690185546875, -0.17389297485351562, 0.862060546875, 0.122894287109375, -0.004810333251953125, 0.46735382080078125, 0.06072998046875, 0.09133338928222656, 0.3584442138671875, 0.4258842468261719, 0.3041400909423828, 0.3374481201171875, -0.28912353515625, -0.017242431640625, -0.10507392883300781, -0.00579071044921875, 0.5848312377929688, -0.2715110778808594, -0.14109039306640625, 0.2639617919921875, -0.252410888671875, 0.30387306213378906, 0.2540321350097656, -0.2611541748046875, -0.036651611328125, -0.150909423828125, -0.37109375, -0.1543560028076172, -0.072113037109375, 0.752685546875, -0.11477279663085938, -0.4575843811035156, 0.3291587829589844, -0.3403205871582031, -0.2356414794921875, 0.17022705078125, -0.24787521362304688, -0.01598358154296875, 0.0511016845703125, 0.6190338134765625, 0.323577880859375, -0.0093994140625, 0.5248184204101562, 0.16930389404296875, 0.2930793762207031, -0.3301258087158203, 1.04248046875, -0.5912017822265625, -0.0823516845703125, -0.45317840576171875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000009.npy"}
{"epoch": 0.013605442176870748, "step": 10, "batch_size": 64, "mean": -0.008004248142242432, "std": 0.4527832269668579, "min": -1.4342498779296875, "p10": -0.4912685394287109, "median": -0.0073261260986328125, "p90": 0.5871681213378906, "max": 1.017059326171875, "pos_frac": 0.5, "sample": [0.5934600830078125, 0.2234344482421875, -0.0542144775390625, 0.05194854736328125, 0.27060699462890625, -1.4342498779296875, -0.2347564697265625, -1.0058708190917969, 0.4930419921875, -0.1996307373046875, -0.4970512390136719, -0.5044937133789062, 0.6281356811523438, -0.1730804443359375, -0.4122772216796875, -0.042041778564453125, -0.28032875061035156, 0.0721435546875, 0.35174560546875, 0.16057968139648438, 0.014020919799804688, -0.36511993408203125, 0.3485260009765625, 0.1943511962890625, -0.14642715454101562, -0.020023345947265625, -0.035831451416015625, 0.1104736328125, -0.06221771240234375, 0.02510833740234375, 0.00537109375, 0.276275634765625, -0.29587554931640625, -0.3046112060546875, -0.40961456298828125, 0.7273635864257812, 1.017059326171875, 0.12991714477539062, -0.5258216857910156, -0.20914077758789062, -0.1039276123046875, -0.4586677551269531, 0.449676513671875, 0.80810546875, 0.05929374694824219, 0.424285888671875, -0.03951263427734375, -0.1439971923828125, 0.4043426513671875, 0.73382568359375, 0.7671661376953125, -0.47222137451171875, 0.082122802734375, -0.22540283203125, -0.47777557373046875, -0.22344207763671875, -0.8077239990234375, 0.15678977966308594, -0.8703460693359375, 0.33632659912109375, 0.32938385009765625, 0.5724868774414062, 0.064788818359375, -0.358734130859375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000010.npy"}
{"epoch": 0.015117157974300832, "step": 11, "batch_size": 64, "mean": 0.003546088933944702, "std": 0.3971622586250305, "min": -1.2397232055664062, "p10": -0.41245117187499997, "median": 0.0241241455078125, "p90": 0.5309379577636719, "max": 1.0954437255859375, "pos_frac": 0.53125, "sample": [0.21399688720703125, -0.14637374877929688, 0.53753662109375, -0.32186126708984375, -0.07396697998046875, 0.5826759338378906, 0.0902862548828125, 0.10465049743652344, 0.078826904296875, -0.7855377197265625, 0.2168121337890625, -0.2678985595703125, -0.417083740234375, 0.075042724609375, -0.3279266357421875, -0.195037841796875, 0.3932342529296875, 0.3811492919921875, 0.0680389404296875, 0.2057647705078125, 0.7758331298828125, -0.060943603515625, 1.0954437255859375, -0.4772987365722656, -0.32013702392578125, 0.0257415771484375, 0.00194549560546875, 0.046779632568359375, -0.16391754150390625, -0.16868209838867188, 0.062015533447265625, 0.08989715576171875, -0.19940567016601562, 0.2223663330078125, -0.11154937744140625, -0.29718780517578125, 0.2600860595703125, 0.3644561767578125, -0.3332672119140625, -0.401641845703125, 0.8499908447265625, 0.22687149047851562, -0.3592414855957031, -0.128509521484375, 0.07631683349609375, -1.2397232055664062, -0.5893402099609375, -0.09994888305664062, 0.142608642578125, -0.4915313720703125, -0.5667438507080078, 0.5587310791015625, 0.0225067138671875, 0.2152690887451172, 0.26090240478515625, -0.363037109375, 0.1990203857421875, -0.14618301391601562, 0.27722930908203125, -0.2207183837890625, 0.673095703125, 0.5155410766601562, -0.065643310546875, -0.34337615966796875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000011.npy"}
{"epoch": 0.016628873771730914, "step": 12, "batch_size": 64, "mean": 0.018486201763153076, "std": 0.3425329923629761, "min": -1.0438079833984375, "p10": -0.409661865234375, "median": 0.040073394775390625, "p90": 0.4006881713867188, "max": 0.7224655151367188, "pos_frac": 0.5625, "sample": [-0.06961822509765625, -0.10433197021484375, -0.3277301788330078, 0.06430244445800781, 0.2369861602783203, 0.2930755615234375, 0.33119964599609375, 0.290069580078125, -0.06285858154296875, 0.4046478271484375, 0.1475830078125, 0.16960906982421875, 0.388427734375, 0.20037841796875, 0.23726272583007812, -0.31536102294921875, -0.10594940185546875, -0.09004974365234375, -0.373687744140625, 0.154327392578125, 0.25574493408203125, 0.488006591796875, -0.6058502197265625, 0.7224655151367188, 0.04692840576171875, 0.391448974609375, -0.27068328857421875, 0.0653076171875, 0.08978271484375, 0.3071250915527344, -0.15618133544921875, 0.7152862548828125, 0.04195404052734375, -1.0438079833984375, -0.2735595703125, 0.016553878784179688, 0.2186431884765625, 0.004436492919921875, -0.19567108154296875, -0.12679672241210938, 0.1861114501953125, 0.2859230041503906, -0.10302734375, -0.071044921875, 0.5014381408691406, -0.49560546875, -0.0244293212890625, 0.033458709716796875, -0.20575714111328125, 0.07253265380859375, -0.607879638671875, -0.09049606323242188, -0.15028762817382812, -0.775482177734375, 0.4354705810546875, 0.0381927490234375, 0.0837860107421875, -0.425079345703125, -0.4846763610839844, 0.6419525146484375, 0.26959991455078125, 0.32093048095703125, -0.308349609375, -0.10358047485351562], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000012.npy"}
{"epoch": 0.018140589569160998, "step": 13, "batch_size": 64, "mean": 0.10214745998382568, "std": 0.3993344306945801, "min": -1.165008544921875, "p10": -0.35240249633789056, "median": 0.09507560729980469, "p90": 0.6155563354492188, "max": 1.3120269775390625, "pos_frac": 0.578125, "sample": [0.244293212890625, 0.55426025390625, 0.6240997314453125, -0.13354873657226562, 0.332550048828125, 0.6003570556640625, 0.11163330078125, -0.033458709716796875, 0.9995803833007812, -1.165008544921875, -0.01868438720703125, -0.17003631591796875, 0.07851791381835938, 0.6538238525390625, 0.704193115234375, -0.4022979736328125, -0.296844482421875, -0.515411376953125, 0.138275146484375, -0.137664794921875, -0.5721435546875, 0.7145538330078125, -0.00347900390625, 0.6220703125, 0.17141342163085938, -0.23095703125, 0.2817192077636719, -0.08391571044921875, 0.0454254150390625, 0.4019775390625, 0.155517578125, -0.247589111328125, -0.37621307373046875, 0.3775444030761719, -0.5824356079101562, 0.3508720397949219, 0.16277694702148438, 0.4437065124511719, 0.008886337280273438, -0.08890533447265625, -0.018280029296875, 0.3018455505371094, 0.326568603515625, 0.058441162109375, -0.2599945068359375, -0.11440658569335938, 0.04967498779296875, -0.1781463623046875, -0.43357086181640625, 0.4548187255859375, 0.2987518310546875, 0.187896728515625, -0.2700347900390625, 0.19377708435058594, 0.23076629638671875, -0.11572265625, 1.3120269775390625, -0.12000274658203125, 0.313995361328125, 0.20754241943359375, -0.12579345703125, 0.3344268798828125, 0.3040580749511719, -0.12065505981445312], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000013.npy"}
{"epoch": 0.019652305366591082, "step": 14, "batch_size": 64, "mean": -0.026755720376968384, "std": 0.3234489858150482, "min": -0.7077560424804688, "p10": -0.4747657775878906, "median": -0.03806304931640625, "p90": 0.4371215820312501, "max": 0.7574691772460938, "pos_frac": 0.421875, "sample": [-0.26917266845703125, 0.606231689453125, 0.4117431640625, -0.4652862548828125, -0.16710281372070312, -0.012451171875, -0.1636810302734375, 0.115234375, 0.3204841613769531, 0.1504669189453125, 0.01645660400390625, -0.47882843017578125, -0.05786323547363281, -0.2431640625, 0.5578193664550781, -0.007106781005859375, 0.48043060302734375, -0.1201019287109375, 0.0007171630859375, -0.5029487609863281, -0.12232589721679688, 0.13128662109375, -0.06146812438964844, -0.0355682373046875, -0.2196044921875, -0.6271743774414062, -0.247161865234375, 0.093475341796875, -0.038364410400390625, 0.447998046875, 0.5012016296386719, 0.30266571044921875, -0.075286865234375, -0.056884765625, -0.08719253540039062, 0.10858154296875, -0.3311805725097656, 0.7574691772460938, 0.3751239776611328, 0.2747039794921875, -0.5560302734375, -0.3037872314453125, 0.16899871826171875, -0.669647216796875, 0.0067596435546875, -0.1877918243408203, -0.0244140625, -0.2181854248046875, 0.2938423156738281, -0.0557708740234375, -0.7077560424804688, -0.5832901000976562, -0.13927268981933594, 0.10709381103515625, -0.17572402954101562, -0.037761688232421875, 0.2091064453125, -0.3802013397216797, -0.19953536987304688, 0.022985458374023438, 0.08885574340820312, 0.57476806640625, 0.06162261962890625, -0.26940155029296875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000014.npy"}
{"epoch": 0.021164021164021163, "step": 15, "batch_size": 64, "mean": -0.01594683527946472, "std": 0.3755708932876587, "min": -1.2086029052734375, "p10": -0.3673967361450195, "median": -0.060486793518066406, "p90": 0.5256324768066406, "max": 1.0513381958007812, "pos_frac": 0.453125, "sample": [0.027698516845703125, 0.026702880859375, -0.08658218383789062, -0.06350326538085938, 0.1524810791015625, -0.2054443359375, -0.1043853759765625, -0.28444671630859375, -0.708221435546875, 0.58941650390625, -0.12245559692382812, 0.19713592529296875, -0.1201019287109375, -0.07830619812011719, 0.2039794921875, -0.33597755432128906, 0.00408935546875, 0.0271453857421875, 0.018306732177734375, -0.16574478149414062, -0.6455154418945312, 0.096466064453125, 0.059864044189453125, -0.29088592529296875, 0.005138397216796875, 0.417083740234375, 0.712310791015625, -0.4702301025390625, 0.00606536865234375, -0.374755859375, -0.17373275756835938, -0.19412803649902344, 0.870574951171875, 1.0513381958007812, -0.17437744140625, 0.23345184326171875, 0.0973052978515625, -0.067626953125, -0.35022544860839844, -0.378692626953125, 0.227691650390625, -0.2832756042480469, 0.5177230834960938, -1.2086029052734375, 0.0377655029296875, 0.08221435546875, -0.34705352783203125, -0.0379791259765625, 0.00142669677734375, 0.529022216796875, -0.3107147216796875, -0.15384674072265625, -0.12181854248046875, 0.6858673095703125, -0.05747032165527344, 0.7321014404296875, 0.3201904296875, -0.07187271118164062, -0.548492431640625, -0.22045516967773438, -0.17444610595703125, 0.06742095947265625, -0.07732772827148438, -0.00988006591796875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000015.npy"}
{"epoch": 0.022675736961451247, "step": 16, "batch_size": 64, "mean": -0.07183182239532471, "std": 0.3608151972293854, "min": -0.996734619140625, "p10": -0.6489479064941405, "median": -0.011472702026367188, "p90": 0.2893363952636719, "max": 0.6451187133789062, "pos_frac": 0.46875, "sample": [-0.000308990478515625, 0.1939258575439453, 0.2876434326171875, 0.21825790405273438, -0.2982330322265625, 0.022380828857421875, -0.34085655212402344, -0.8370361328125, -0.87701416015625, 0.23340606689453125, -0.017589569091796875, 0.25101470947265625, -0.0079193115234375, -0.7732353210449219, 0.02555084228515625, 0.00820159912109375, 0.15044403076171875, -0.17550277709960938, -0.127349853515625, -0.195037841796875, -0.3022613525390625, -0.7152023315429688, 0.376800537109375, 0.44263458251953125, 0.22455978393554688, 0.6451187133789062, -0.3918495178222656, 0.29006195068359375, 0.017520904541015625, -0.996734619140625, -0.26993560791015625, 0.13999176025390625, -0.10121917724609375, -0.3111724853515625, 0.17977142333984375, -0.7367095947265625, 0.231536865234375, -0.8953094482421875, 0.28359222412109375, -0.174224853515625, -0.20107650756835938, -0.015026092529296875, -0.21207427978515625, -0.08097076416015625, -0.4275360107421875, 0.2470722198486328, -0.24250030517578125, 0.06824493408203125, -0.16766738891601562, 0.20610809326171875, -0.16553497314453125, 0.2776527404785156, -0.1493701934814453, 0.3186759948730469, 0.175201416015625, 0.012813568115234375, 0.0377197265625, 0.6013031005859375, -0.494354248046875, -0.0517120361328125, -0.08884429931640625, 0.043430328369140625, 0.3811492919921875, -0.3476524353027344], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000016.npy"}
{"epoch": 0.02418745275888133, "step": 17, "batch_size": 64, "mean": -0.0021145641803741455, "std": 0.41558346152305603, "min": -1.435150146484375, "p10": -0.3931510925292968, "median": 0.005275726318359375, "p90": 0.4587863922119141, "max": 1.503509521484375, "pos_frac": 0.53125, "sample": [-0.3163795471191406, 0.24957275390625, -0.1546630859375, 0.0076141357421875, 0.792205810546875, -0.3470458984375, -0.5756607055664062, 0.023578643798828125, -0.4319629669189453, -0.65460205078125, -0.305572509765625, -0.33235931396484375, 0.0014858245849609375, 0.2060394287109375, 0.2484588623046875, -0.07012176513671875, -0.319580078125, -0.7428436279296875, 0.10269737243652344, -0.15254974365234375, 0.36637115478515625, -0.0286712646484375, -0.16497802734375, 0.120758056640625, 0.15651702880859375, -0.41291046142578125, 0.176727294921875, -0.1327075958251953, 0.09575653076171875, -0.13397216796875, 0.6360321044921875, -0.9185791015625, -0.1029205322265625, -0.14548492431640625, 0.5689926147460938, 0.5370025634765625, -0.33342742919921875, -0.1227569580078125, 0.25566864013671875, -1.435150146484375, -0.1783599853515625, 0.06990814208984375, 0.00293731689453125, -0.155609130859375, 0.18637466430664062, 0.28249359130859375, -0.217041015625, 0.173187255859375, 0.22430801391601562, 1.503509521484375, 0.4506072998046875, 0.1932373046875, 0.0985260009765625, 0.2608470916748047, -0.022369384765625, 0.17395782470703125, 0.4622917175292969, 0.23984527587890625, -0.2385406494140625, 0.02954864501953125, -0.10207366943359375, 0.4831809997558594, 0.0391693115234375, -0.30584716796875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000017.npy"}
{"epoch": 0.025699168556311415, "step": 18, "batch_size": 64, "mean": 0.03362897038459778, "std": 0.35456904768943787, "min": -0.7523040771484375, "p10": -0.4257064819335937, "median": 0.03524017333984375, "p90": 0.49926452636718754, "max": 1.193206787109375, "pos_frac": 0.546875, "sample": [0.10205459594726562, 0.15775299072265625, -0.18589401245117188, 0.0277099609375, 0.1847991943359375, -0.6723861694335938, -0.5166263580322266, -0.077392578125, 0.415863037109375, -0.3272056579589844, -0.0716094970703125, 0.39947509765625, 0.063018798828125, -0.1073760986328125, -0.00635528564453125, 0.25775909423828125, -0.13134002685546875, 0.1747283935546875, -0.06705474853515625, 1.193206787109375, -0.217071533203125, -0.08606719970703125, 0.06659698486328125, -0.0336761474609375, -0.17397499084472656, 0.073455810546875, 0.053466796875, -0.2678565979003906, -0.49483489990234375, -0.22516632080078125, -0.2743568420410156, 0.5048675537109375, 0.126068115234375, -0.1065826416015625, 0.45980072021484375, 0.6508865356445312, 0.10326576232910156, -0.09954071044921875, 0.6666259765625, -0.7523040771484375, 0.01287841796875, 0.2151947021484375, 0.5370025634765625, -0.256439208984375, 0.3125038146972656, 0.6277122497558594, -0.27739906311035156, 0.017932891845703125, 0.23248672485351562, -0.453643798828125, -0.5405120849609375, 0.123046875, 0.04761505126953125, -0.625885009765625, -0.05410003662109375, 0.29839324951171875, -0.05555534362792969, 0.08032989501953125, 0.4861907958984375, 0.1061248779296875, 0.314697265625, 0.534698486328125, 0.0427703857421875, -0.3605194091796875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000018.npy"}
{"epoch": 0.027210884353741496, "step": 19, "batch_size": 64, "mean": 0.011277079582214355, "std": 0.4315812289714813, "min": -1.1468582153320312, "p10": -0.5671165466308593, "median": 0.04753875732421875, "p90": 0.56005859375, "max": 0.8171615600585938, "pos_frac": 0.578125, "sample": [-0.5024261474609375, -0.0218505859375, 0.2070465087890625, 0.644195556640625, -0.16363143920898438, -0.2498321533203125, 0.0277862548828125, 0.8171615600585938, -0.7382659912109375, -0.6362648010253906, 0.0108489990234375, 0.27689361572265625, 0.3336753845214844, 0.13988876342773438, 0.555572509765625, -0.1571807861328125, -0.5804672241210938, -0.22315216064453125, -0.21353912353515625, -0.080169677734375, -1.1468582153320312, -0.43837738037109375, -0.32806396484375, 0.527862548828125, 0.047393798828125, 0.0476837158203125, 0.7439727783203125, -0.13031387329101562, -0.31647491455078125, 0.05439186096191406, 0.37122344970703125, 0.14479827880859375, 0.0951995849609375, -0.47101593017578125, 0.7340545654296875, -0.7803192138671875, -0.6327781677246094, 0.3691864013671875, 0.5968399047851562, 0.3974285125732422, 0.1392974853515625, -0.41230010986328125, -0.5359649658203125, 0.527557373046875, 0.44768524169921875, 0.18442916870117188, -0.20702362060546875, 0.691009521484375, 0.09991455078125, 0.29578399658203125, 0.14727783203125, 0.14498329162597656, 0.13617706298828125, -0.053974151611328125, 0.03224945068359375, -0.43833160400390625, 0.176239013671875, -0.10932159423828125, 0.023847579956054688, 0.561981201171875, 0.1611328125, 0.5256500244140625, -0.878173828125, -0.27051544189453125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000019.npy"}
{"epoch": 0.02872260015117158, "step": 20, "batch_size": 64, "mean": -0.0018051862716674805, "std": 0.35725441575050354, "min": -0.743255615234375, "p10": -0.4165199279785156, "median": -0.03134727478027344, "p90": 0.415898895263672, "max": 0.9436187744140625, "pos_frac": 0.453125, "sample": [0.5077743530273438, -0.08196258544921875, 0.8248939514160156, -0.037933349609375, -0.2960357666015625, -0.37024688720703125, 0.069366455078125, 0.76116943359375, 0.2628326416015625, -0.4191246032714844, 0.0684356689453125, 0.21979141235351562, 0.108734130859375, -0.19018173217773438, 0.0181121826171875, 0.11263656616210938, -0.081695556640625, -0.39473724365234375, -0.1669158935546875, -0.4210624694824219, -0.04098701477050781, 0.43228912353515625, -0.003871917724609375, -0.6120033264160156, -0.22987747192382812, -0.298492431640625, 0.3212928771972656, -0.033199310302734375, 0.20471954345703125, -0.019834518432617188, -0.352569580078125, -0.3512153625488281, 0.342376708984375, 0.347869873046875, 0.235443115234375, -0.4104423522949219, 0.15938186645507812, -0.6554107666015625, -0.0731964111328125, 0.0091552734375, -0.27865028381347656, 0.34772491455078125, 0.50604248046875, -0.333770751953125, 0.32216644287109375, -0.20020294189453125, 0.35260009765625, -0.0294952392578125, -0.04248046875, -0.5680770874023438, -0.4679412841796875, -0.743255615234375, 0.5450019836425781, 0.053314208984375, -0.217559814453125, 0.16629409790039062, -0.19991302490234375, 0.377655029296875, 0.9436187744140625, -0.09614753723144531, -0.3536243438720703, 0.27154541015625, 0.22705078125, -0.1627063751220703], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000020.npy"}
{"epoch": 0.030234315948601664, "step": 21, "batch_size": 64, "mean": 0.04978856444358826, "std": 0.4457750618457794, "min": -1.2358245849609375, "p10": -0.4559528350830077, "median": -0.013433456420898438, "p90": 0.6667036056518558, "max": 1.24224853515625, "pos_frac": 0.484375, "sample": [0.36920928955078125, -0.08208465576171875, -0.07604026794433594, 0.7101287841796875, -0.22093963623046875, 0.4888153076171875, 0.0608673095703125, 0.20536422729492188, -0.1624908447265625, 0.2088184356689453, 0.2669563293457031, -0.24987030029296875, 1.24224853515625, -0.750701904296875, -0.0292205810546875, 0.7807273864746094, -0.9233016967773438, -0.37680816650390625, 0.537628173828125, 0.5653781890869141, -0.121307373046875, -0.08847427368164062, 0.2705955505371094, -0.107940673828125, 0.212310791015625, -0.5629959106445312, -0.229217529296875, -0.326995849609375, 0.7267913818359375, -0.09238052368164062, -0.7830810546875, -0.4898719787597656, -0.215789794921875, 0.007114410400390625, 0.8024444580078125, -0.17592620849609375, -1.2358245849609375, -0.08258056640625, -0.0418853759765625, 0.05918121337890625, 0.4703521728515625, 0.5064849853515625, 0.45135498046875, 0.018352508544921875, 0.8378143310546875, -0.1888427734375, -0.022762298583984375, -0.08276748657226562, -0.0041046142578125, -0.18481063842773438, 0.12316513061523438, -0.53375244140625, -0.3368663787841797, 0.33992767333984375, 0.1539592742919922, -0.19036483764648438, 0.7807159423828125, -0.09708404541015625, 0.0568084716796875, 0.279571533203125, 0.2129364013671875, 0.4030303955078125, 0.29967498779296875, -0.1951751708984375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000021.npy"}
{"epoch": 0.031746031746031744, "step": 22, "batch_size": 64, "mean": -0.03180846571922302, "std": 0.38974854350090027, "min": -0.802276611328125, "p10": -0.49106063842773434, "median": -0.043659210205078125, "p90": 0.34653816223144546, "max": 1.5310821533203125, "pos_frac": 0.4375, "sample": [-0.5625228881835938, 0.08671188354492188, -0.06859588623046875, -0.23936843872070312, -0.016124725341796875, -0.4435882568359375, -0.04357147216796875, 0.10651397705078125, 0.4871864318847656, 0.44975852966308594, -0.06631851196289062, 0.1190338134765625, 0.06750869750976562, 0.16579818725585938, -0.17207717895507812, -0.5080184936523438, -0.28017425537109375, -0.07574081420898438, 0.2106475830078125, 0.11347198486328125, -0.4514923095703125, -0.5298919677734375, -0.3316497802734375, 0.20844078063964844, -0.1195068359375, 1.0029296875, -0.18688583374023438, 0.0821990966796875, 0.116485595703125, 0.09605216979980469, 0.5932769775390625, -0.0974578857421875, -0.33563232421875, 0.16439437866210938, -0.15790176391601562, 0.16144943237304688, -0.17560577392578125, 0.042327880859375, 0.1023406982421875, 0.7179336547851562, 0.2586784362792969, 1.5310821533203125, -0.1148834228515625, 0.24964141845703125, -0.1641998291015625, -0.6305770874023438, 0.3638725280761719, -0.6810302734375, 0.30609130859375, -0.802276611328125, -0.021820068359375, -0.300994873046875, -0.0437469482421875, -0.19149017333984375, -0.4359588623046875, -0.4482460021972656, -0.44196319580078125, -0.13242721557617188, -0.5487289428710938, 0.1238555908203125, -0.03063201904296875, 0.161590576171875, 0.060878753662109375, -0.3347930908203125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000022.npy"}
{"epoch": 0.03325774754346183, "step": 23, "batch_size": 64, "mean": 0.04522597789764404, "std": 0.3444393575191498, "min": -1.0107421875, "p10": -0.3236640930175781, "median": 0.04911231994628906, "p90": 0.46906585693359376, "max": 0.8609619140625, "pos_frac": 0.59375, "sample": [-0.26251983642578125, -0.224609375, 0.05489349365234375, -0.482208251953125, 0.40501976013183594, 0.0052032470703125, 0.045253753662109375, 0.1215057373046875, -0.26430511474609375, -0.3343963623046875, -0.36731719970703125, 0.1420574188232422, 0.288055419921875, -0.169036865234375, 0.4838600158691406, -0.0235595703125, 0.0689544677734375, 0.36469268798828125, 0.028179168701171875, 0.38756561279296875, 0.18532180786132812, 0.09015274047851562, -0.14434432983398438, 0.6357955932617188, 0.8609619140625, 0.159393310546875, 0.11699676513671875, 0.2507915496826172, 0.0133819580078125, 0.47222900390625, -1.0107421875, 0.4616851806640625, -0.37928009033203125, 0.011180877685546875, -0.11227798461914062, 0.3166694641113281, -0.274383544921875, -0.72589111328125, -0.17988204956054688, -0.051181793212890625, -0.2034912109375, 0.6713027954101562, 0.08577728271484375, -0.172332763671875, -0.13343048095703125, -0.7442169189453125, 0.005458831787109375, 0.15078353881835938, -0.29862213134765625, -0.13885498046875, 0.2208099365234375, -0.028715133666992188, 0.2163715362548828, -0.07399940490722656, 0.5867767333984375, 0.3117523193359375, -0.204010009765625, 0.1673736572265625, 0.05297088623046875, 0.699859619140625, 0.1799468994140625, 0.346893310546875, 0.250579833984375, -0.0183868408203125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000023.npy"}
{"epoch": 0.03476946334089191, "step": 24, "batch_size": 64, "mean": 0.001662522554397583, "std": 0.30461716651916504, "min": -0.8986968994140625, "p10": -0.3763545989990234, "median": 0.0038909912109375, "p90": 0.3708881378173829, "max": 0.7423553466796875, "pos_frac": 0.53125, "sample": [0.037384033203125, -0.00438690185546875, -0.39492034912109375, 0.1598224639892578, 0.32227325439453125, 0.24739456176757812, -0.0643768310546875, -0.21619415283203125, -0.25439453125, 0.39469146728515625, 0.15311050415039062, 0.380126953125, 0.16402816772460938, -0.052764892578125, 0.05438232421875, 0.000640869140625, 0.5238227844238281, -0.06821060180664062, -0.04694366455078125, -0.31549835205078125, 0.17388153076171875, 0.17127227783203125, -0.0841827392578125, -0.09427261352539062, -0.270782470703125, -0.24398422241210938, 0.02557373046875, -0.015655517578125, 0.0918731689453125, 0.00714111328125, -0.6024017333984375, -0.0038299560546875, 0.4221839904785156, -0.032756805419921875, 0.22246932983398438, 0.000335693359375, 0.38401031494140625, 0.30113983154296875, -0.07725906372070312, -0.33502960205078125, -0.8986968994140625, 0.07269668579101562, 0.3493309020996094, -0.0565948486328125, -0.358856201171875, 0.2107696533203125, -0.3838539123535156, 0.1857452392578125, -0.08982467651367188, 0.0800628662109375, 0.0214691162109375, -0.4623069763183594, -0.13593292236328125, 0.05352783203125, 0.6014251708984375, 0.03980064392089844, 0.3346710205078125, -0.51617431640625, 0.0539703369140625, -0.17314910888671875, 0.7423553466796875, -0.80621337890625, -0.044158935546875, 0.2266254425048828], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000024.npy"}
{"epoch": 0.036281179138321996, "step": 25, "batch_size": 64, "mean": 0.04496079683303833, "std": 0.32489076256752014, "min": -0.735260009765625, "p10": -0.34247283935546874, "median": 0.06625938415527344, "p90": 0.385260009765625, "max": 0.9087066650390625, "pos_frac": 0.5625, "sample": [0.21988677978515625, 0.14656829833984375, 0.052112579345703125, 0.315582275390625, 0.3817138671875, 0.0009307861328125, 0.3509674072265625, 0.0011844635009765625, 0.6172027587890625, -0.01513671875, 0.17795562744140625, -0.3200531005859375, 0.33043670654296875, 0.2168731689453125, -0.090087890625, 0.159576416015625, 0.08040618896484375, 0.02838134765625, -0.0918121337890625, -0.1904144287109375, -0.3973388671875, 0.2451324462890625, -0.11551666259765625, -0.5652923583984375, -0.14825057983398438, 0.40927886962890625, 0.19841766357421875, -0.735260009765625, 0.6411590576171875, 0.21962738037109375, -0.0662689208984375, 0.30445098876953125, -0.352081298828125, 0.2549591064453125, -0.02677154541015625, 0.474822998046875, -0.51312255859375, 0.11953163146972656, -0.27768707275390625, 0.38677978515625, -0.2264556884765625, -0.0606231689453125, 0.13416481018066406, -0.2689056396484375, -0.15624618530273438, -0.24118423461914062, 0.9087066650390625, 0.1190032958984375, -0.5407257080078125, -0.1234588623046875, 0.09858131408691406, 0.2708549499511719, -0.3109588623046875, 0.35546875, -0.2914886474609375, -0.046268463134765625, -0.56378173828125, -0.0714263916015625, 0.16931915283203125, 0.19266510009765625, 0.2416534423828125, 0.8076171875, -0.07698440551757812, 0.129119873046875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000025.npy"}
{"epoch": 0.03779289493575208, "step": 26, "batch_size": 64, "mean": 0.11025133728981018, "std": 0.3825315833091736, "min": -0.7835006713867188, "p10": -0.38880996704101556, "median": 0.101470947265625, "p90": 0.557806396484375, "max": 1.33154296875, "pos_frac": 0.5625, "sample": [0.13511276245117188, 0.365325927734375, 0.283660888671875, 0.35480499267578125, 0.228790283203125, 0.4516181945800781, -0.408477783203125, 0.26396942138671875, 0.2508888244628906, 0.7573471069335938, 0.24393463134765625, 0.5538330078125, -0.18426513671875, -0.4820098876953125, -0.04770660400390625, -0.7835006713867188, -0.007900238037109375, -0.5383453369140625, 0.30559539794921875, -0.2270355224609375, 0.277374267578125, -0.10456085205078125, 0.402984619140625, -0.06217193603515625, -0.007129669189453125, 0.43558502197265625, -0.021144866943359375, -0.4332103729248047, 0.23240280151367188, 0.5157699584960938, 0.023090362548828125, 0.4086036682128906, 0.06935882568359375, 0.29248809814453125, -0.04339599609375, -0.03277587890625, 0.13358306884765625, -0.16259384155273438, 0.17750930786132812, 0.865997314453125, 0.6415443420410156, -0.12429046630859375, 0.3667144775390625, 0.25555419921875, 0.55950927734375, -0.26729583740234375, 0.1914825439453125, -0.005924224853515625, 1.33154296875, -0.0051898956298828125, -0.6827316284179688, -0.4471893310546875, 0.05950355529785156, -0.34291839599609375, 0.16236114501953125, 0.7156448364257812, 0.20890045166015625, -0.01981353759765625, 0.7489700317382812, 0.019563674926757812, -0.3147430419921875, -0.19587135314941406, -0.09641647338867188, -0.18622589111328125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000026.npy"}
{"epoch": 0.039304610733182165, "step": 27, "batch_size": 64, "mean": 0.07733336091041565, "std": 0.4554557204246521, "min": -0.862579345703125, "p10": -0.43507537841796873, "median": 0.04780387878417969, "p90": 0.47218589782714854, "max": 2.233795166015625, "pos_frac": 0.578125, "sample": [0.19905853271484375, -0.07380294799804688, -0.45017242431640625, -0.07715606689453125, 0.120269775390625, -0.5591583251953125, -0.14828109741210938, 0.986785888671875, -0.0569000244140625, -0.709228515625, 0.441925048828125, 0.6648712158203125, 0.2599601745605469, 0.12268257141113281, 0.18834686279296875, 0.0244140625, 0.2382049560546875, -0.05747222900390625, -0.14305877685546875, -0.1604766845703125, -0.23337554931640625, 0.1236572265625, 0.404693603515625, 0.29039764404296875, -0.42290496826171875, 2.233795166015625, -0.12108612060546875, -0.4258613586425781, 0.05619049072265625, 0.02536773681640625, 0.5337982177734375, 0.3554878234863281, -0.029146194458007812, 0.8302764892578125, 0.2535362243652344, 0.35593414306640625, 0.051288604736328125, -0.4389801025390625, 0.2567596435546875, 0.0319671630859375, 0.738250732421875, 0.27249908447265625, 0.3934326171875, 0.438140869140625, 0.2698783874511719, 0.15772247314453125, -0.3374481201171875, 0.040218353271484375, -0.11461639404296875, 0.48155975341796875, -0.18710708618164062, -0.10860443115234375, -0.17731285095214844, -0.862579345703125, 0.04431915283203125, -0.255767822265625, -0.47975921630859375, 0.09893417358398438, 0.2467498779296875, -0.42596435546875, 0.4503135681152344, 0.2041168212890625, -0.5894241333007812, -0.29082489013671875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000027.npy"}
{"epoch": 0.04081632653061224, "step": 28, "batch_size": 64, "mean": 0.008581280708312988, "std": 0.3488011062145233, "min": -0.807159423828125, "p10": -0.42013893127441404, "median": -0.012051582336425781, "p90": 0.39760284423828124, "max": 0.8465385437011719, "pos_frac": 0.484375, "sample": [-0.7634124755859375, 0.3237152099609375, -0.2757568359375, 0.06101226806640625, 0.6002655029296875, -0.1417388916015625, 0.3868255615234375, 0.7788848876953125, 0.0881500244140625, -0.22674560546875, 0.18692779541015625, -0.04538726806640625, 0.43708038330078125, 0.19384002685546875, 0.3980560302734375, -0.11855316162109375, -0.26663970947265625, -0.4627952575683594, -0.057147979736328125, 0.8465385437011719, -0.4222412109375, 0.06262969970703125, 0.043834686279296875, 0.5736007690429688, -0.4777336120605469, -0.06777191162109375, -0.03256797790527344, 0.3551177978515625, -0.4152336120605469, 0.35456085205078125, -0.19464111328125, -0.11703872680664062, -0.06696319580078125, -0.5102615356445312, -0.807159423828125, 0.020751953125, 0.25449180603027344, -0.06049346923828125, -0.41045379638671875, -0.8030853271484375, -0.016244888305664062, -0.147918701171875, 0.30866241455078125, -0.1679534912109375, 0.25200653076171875, 0.19230270385742188, 0.07257461547851562, -0.10858154296875, 0.10030174255371094, -0.3089637756347656, 0.39654541015625, -0.0078582763671875, 0.3388652801513672, -0.057159423828125, 0.069000244140625, -0.034149169921875, -0.14860916137695312, 0.654205322265625, 0.09604644775390625, -0.278076171875, -0.3288917541503906, 0.1666717529296875, 0.1329021453857422, 0.15106201171875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000028.npy"}
{"epoch": 0.042328042328042326, "step": 29, "batch_size": 64, "mean": 0.0027069151401519775, "std": 0.3982090353965759, "min": -0.9286041259765625, "p10": -0.6174690246582031, "median": 0.062061309814453125, "p90": 0.4827989578247071, "max": 0.841796875, "pos_frac": 0.578125, "sample": [-0.548126220703125, -0.24712753295898438, 0.3552284240722656, 0.3843536376953125, -0.9286041259765625, 0.09186553955078125, -0.5588302612304688, 0.4649066925048828, -0.6218185424804688, 0.642242431640625, 0.10221481323242188, -0.43798828125, -0.6329421997070312, 0.3248252868652344, 0.03188323974609375, 0.5332260131835938, 0.071380615234375, 0.3681755065917969, -0.153076171875, 0.080047607421875, 0.08746719360351562, 0.1750030517578125, 0.17222213745117188, 0.17299270629882812, -0.2617340087890625, 0.6633453369140625, 0.841796875, -0.20670318603515625, -0.1656494140625, 0.4904670715332031, -0.6126785278320312, -0.7478790283203125, 0.0662841796875, 0.04885101318359375, 0.100311279296875, -0.07987403869628906, 0.25608253479003906, -0.30292510986328125, 0.3739471435546875, 0.0076904296875, -0.010568618774414062, -0.10666656494140625, -0.3047218322753906, -0.04140472412109375, 0.10248184204101562, -0.09229278564453125, 0.05783843994140625, -0.8108367919921875, 0.189361572265625, 0.5851058959960938, 0.4007568359375, 0.3388404846191406, -0.336944580078125, -0.24852561950683594, -0.6195220947265625, 0.2844085693359375, 0.0829620361328125, -0.6404953002929688, 0.8253326416015625, 0.2541656494140625, 0.067718505859375, -0.1121673583984375, 0.00975799560546875, -0.10219573974609375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000029.npy"}
{"epoch": 0.04383975812547241, "step": 30, "batch_size": 64, "mean": 0.01508912444114685, "std": 0.4657590985298157, "min": -1.582763671875, "p10": -0.42397098541259765, "median": -0.035271644592285156, "p90": 0.47291221618652357, "max": 2.113739013671875, "pos_frac": 0.453125, "sample": [0.25634765625, -0.08907890319824219, 0.0550079345703125, 0.44382476806640625, 0.6019325256347656, -0.1255340576171875, 0.17806434631347656, -0.02486419677734375, -0.2908172607421875, -0.230133056640625, 0.3810157775878906, -0.2036285400390625, 0.23498916625976562, -0.076446533203125, -0.046600341796875, 0.3072509765625, 0.3495330810546875, -1.582763671875, -0.2578010559082031, -0.29804229736328125, -0.04271697998046875, 0.7131538391113281, -0.06191253662109375, -0.1788177490234375, 0.0972442626953125, -0.23787689208984375, 0.22149658203125, -0.031494140625, -0.00542449951171875, 0.36568450927734375, 0.217926025390625, 0.3498954772949219, 0.508697509765625, -0.14939117431640625, 2.113739013671875, -0.3695526123046875, 0.143402099609375, -0.3955364227294922, -0.04034423828125, -0.1456451416015625, -0.03904914855957031, 0.6602401733398438, -0.9219970703125, 0.063751220703125, -0.159271240234375, -0.70843505859375, 0.50286865234375, -0.4361572265625, -0.06023216247558594, -0.153167724609375, 0.3726463317871094, 0.1944427490234375, 0.4853782653808594, -0.7195243835449219, -0.1191558837890625, 0.03462791442871094, -0.2035980224609375, -0.4446754455566406, -0.145965576171875, 0.23509979248046875, 0.3005847930908203, -0.5036773681640625, 0.06255340576171875, 0.01363372802734375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000030.npy"}
{"epoch": 0.045351473922902494, "step": 31, "batch_size": 64, "mean": 0.008851438760757446, "std": 0.39495810866355896, "min": -1.1815643310546875, "p10": -0.42562332153320315, "median": 0.0038509368896484375, "p90": 0.4587451934814453, "max": 1.1712493896484375, "pos_frac": 0.53125, "sample": [0.20699310302734375, 0.00086212158203125, -0.248016357421875, -0.6518688201904297, 0.08123779296875, -0.4266510009765625, -0.5461273193359375, 0.17369842529296875, 0.26158905029296875, -0.20404815673828125, -0.04518890380859375, 1.1712493896484375, -0.2589225769042969, -0.77801513671875, -1.1815643310546875, 0.459808349609375, 0.19704627990722656, 0.702728271484375, -0.06218719482421875, 0.048919677734375, 0.112701416015625, 0.24895095825195312, 0.07710647583007812, 0.4562644958496094, -0.2827911376953125, -0.190643310546875, -0.42322540283203125, -0.12537384033203125, -0.386627197265625, 0.06158447265625, -0.019683837890625, 0.3865165710449219, 0.725738525390625, 0.6500663757324219, 0.00312042236328125, 0.21810150146484375, -0.44509124755859375, 0.371917724609375, -0.0459442138671875, 0.2893657684326172, 0.0695037841796875, -0.06396484375, 0.004581451416015625, 0.037353515625, 0.829071044921875, -0.22228240966796875, 0.036834716796875, -0.095794677734375, 0.380859375, -0.0590057373046875, -0.1094970703125, 0.14078903198242188, 0.021392822265625, -0.21375274658203125, -0.705291748046875, 0.1630859375, 0.29776763916015625, 0.56060791015625, -0.14696693420410156, 0.14641189575195312, -0.374053955078125, -0.3918323516845703, -0.2551422119140625, -0.067779541015625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000031.npy"}
{"epoch": 0.04686318972033258, "step": 32, "batch_size": 64, "mean": 0.03950721025466919, "std": 0.3459027409553528, "min": -0.9256439208984375, "p10": -0.5061203002929686, "median": 0.08813667297363281, "p90": 0.4199417114257813, "max": 0.6166458129882812, "pos_frac": 0.6875, "sample": [0.26673126220703125, 0.0442047119140625, -0.5676956176757812, -0.5662994384765625, 0.15838623046875, 0.4228401184082031, -0.2443084716796875, 0.00598907470703125, -0.8170166015625, -0.2274169921875, -0.6185226440429688, -0.03852081298828125, 0.08284378051757812, -0.39095306396484375, 0.08528900146484375, 0.5178680419921875, 0.2384033203125, 0.27672576904296875, 0.045501708984375, -0.29985809326171875, 0.007816314697265625, 0.3688774108886719, 0.12883758544921875, -0.20107269287109375, -0.09403228759765625, -0.14659881591796875, 0.18069076538085938, 0.15234375, 0.3006591796875, 0.4219512939453125, 0.27123260498046875, 0.08922958374023438, 0.21236038208007812, -0.9256439208984375, 0.6064529418945312, 0.1766815185546875, -0.4001312255859375, 0.17323684692382812, 0.0051422119140625, 0.00624847412109375, 0.3818931579589844, 0.3958015441894531, -0.1746978759765625, -0.7461776733398438, 0.0313720703125, 0.402801513671875, 0.007450103759765625, 0.6166458129882812, 0.415252685546875, 0.08704376220703125, 0.1530609130859375, 0.20566558837890625, -0.10507583618164062, 0.11431121826171875, 0.36843109130859375, 0.09959030151367188, -0.030975341796875, 0.5528945922851562, 0.01972198486328125, 0.200439453125, 0.121185302734375, -0.2884979248046875, 0.54339599609375, -0.551544189453125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000032.npy"}
{"epoch": 0.04837490551776266, "step": 33, "batch_size": 64, "mean": -0.03343048691749573, "std": 0.4397204518318176, "min": -1.3956298828125, "p10": -0.5084175109863281, "median": -0.03769874572753906, "p90": 0.5395698547363281, "max": 1.027496337890625, "pos_frac": 0.4375, "sample": [-0.22341156005859375, 0.0814971923828125, 0.4986419677734375, -0.948974609375, -0.0101318359375, -0.07745361328125, -1.3956298828125, 0.5409202575683594, -0.109588623046875, 0.426666259765625, 0.6031494140625, -0.4579010009765625, -0.32269859313964844, -0.212738037109375, -0.14056396484375, 0.054107666015625, -0.7904815673828125, -0.23763275146484375, -0.01235198974609375, 0.46685791015625, -0.828643798828125, 0.2759246826171875, -0.122802734375, -0.23929977416992188, 1.027496337890625, 0.7096405029296875, 0.316680908203125, 0.10951995849609375, -0.44183349609375, 0.7742767333984375, 0.076446533203125, -0.02205657958984375, 0.1672821044921875, -0.3875923156738281, 0.5364189147949219, -0.261322021484375, -0.11389732360839844, -0.21929168701171875, 0.1895904541015625, 0.13054275512695312, -0.2469024658203125, 0.13275527954101562, -0.012132644653320312, -0.6483306884765625, 0.7460479736328125, -0.427734375, 0.0030975341796875, -0.26618194580078125, 0.2763023376464844, -0.053340911865234375, -0.295867919921875, -0.634185791015625, 0.2808380126953125, -0.13519287109375, -0.05399322509765625, 0.0059051513671875, 0.11878204345703125, -0.4056396484375, 0.36236572265625, -0.510040283203125, -0.5046310424804688, 0.2534065246582031, -0.1009674072265625, 0.5667266845703125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000033.npy"}
{"epoch": 0.049886621315192746, "step": 34, "batch_size": 64, "mean": 0.0012740492820739746, "std": 0.3651844263076782, "min": -0.8800201416015625, "p10": -0.46042442321777344, "median": 0.010158538818359375, "p90": 0.4674407958984375, "max": 0.8861007690429688, "pos_frac": 0.515625, "sample": [-0.4429740905761719, -0.46790313720703125, 0.2506256103515625, 0.12353134155273438, -0.3062591552734375, 0.17852783203125, 0.3217201232910156, 0.36415863037109375, -0.052761077880859375, 0.4697113037109375, 0.38481903076171875, -0.7105865478515625, -0.41384124755859375, -0.553863525390625, -0.0228271484375, 0.29286956787109375, 0.12136077880859375, -0.1982421875, -0.3939170837402344, -0.0945281982421875, 0.18375015258789062, -0.09873199462890625, -0.078582763671875, 0.0076751708984375, -0.1560516357421875, -0.1820220947265625, 0.1416168212890625, 0.5449295043945312, 0.4621429443359375, 0.045650482177734375, 0.16387939453125, 0.5122833251953125, 0.23205947875976562, -0.23050880432128906, 0.42969512939453125, -0.8800201416015625, 0.1363067626953125, -0.166046142578125, 0.04779815673828125, 0.5093460083007812, -0.5090866088867188, -0.30142974853515625, -0.29688453674316406, -0.08588027954101562, -0.761444091796875, 0.56207275390625, -0.19042205810546875, -0.034793853759765625, 0.139434814453125, 0.3026542663574219, -0.43078041076660156, 0.0855712890625, 0.3014984130859375, -0.28731536865234375, -0.4745941162109375, 0.4493408203125, 0.1204986572265625, -0.10859489440917969, 0.10758209228515625, -0.09578323364257812, 0.6462478637695312, 0.01264190673828125, 0.8861007690429688, -0.4298858642578125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000034.npy"}
{"epoch": 0.05139833711262283, "step": 35, "batch_size": 64, "mean": 0.053957194089889526, "std": 0.3707262873649597, "min": -0.7386322021484375, "p10": -0.37102584838867186, "median": 0.05076408386230469, "p90": 0.43961715698242193, "max": 1.346923828125, "pos_frac": 0.578125, "sample": [0.77642822265625, -0.1278228759765625, 0.2918663024902344, 0.6764297485351562, -0.2330303192138672, 1.346923828125, -0.155364990234375, -0.30091094970703125, -0.5703277587890625, 0.0188751220703125, 0.7910919189453125, 0.14650535583496094, 0.24653244018554688, -0.5102729797363281, 0.3181343078613281, -0.4045372009277344, 0.8122100830078125, -0.28293609619140625, 0.014892578125, -0.3409423828125, -0.26933860778808594, -0.46173858642578125, -0.19980621337890625, 0.1760406494140625, 0.19964218139648438, 0.02567291259765625, 0.42431640625, -0.17124557495117188, 0.052978515625, 0.2424163818359375, 0.1741485595703125, 0.13985443115234375, 0.288360595703125, 0.105987548828125, -0.2864837646484375, 0.36783599853515625, 0.44617462158203125, -0.7386322021484375, -0.12325286865234375, -0.3272552490234375, 0.12725067138671875, -0.16535377502441406, 0.048549652099609375, 0.09621429443359375, -0.129119873046875, -0.38391876220703125, 0.3881378173828125, 0.4879608154296875, 0.3242225646972656, 0.036624908447265625, 0.22272491455078125, 0.17748260498046875, 0.1342620849609375, -0.133331298828125, -0.16988372802734375, 0.242706298828125, 0.28302001953125, 0.0790557861328125, -0.6573638916015625, 0.2181396484375, -0.0663299560546875, -0.13819122314453125, -0.02144622802734375, -0.12757301330566406], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000035.npy"}
{"epoch": 0.05291005291005291, "step": 36, "batch_size": 64, "mean": 0.06488761305809021, "std": 0.362324595451355, "min": -0.9944686889648438, "p10": -0.39102020263671877, "median": 0.06606769561767578, "p90": 0.49689102172851574, "max": 1.301849365234375, "pos_frac": 0.609375, "sample": [0.36620330810546875, 0.5098114013671875, 1.301849365234375, 0.27108001708984375, -0.46150970458984375, 0.15061187744140625, 0.24434661865234375, 0.5473365783691406, -0.002590179443359375, 0.0877838134765625, -0.2649688720703125, -0.13054466247558594, 0.09529876708984375, 0.2520751953125, -0.15971946716308594, 0.07578659057617188, -0.12169647216796875, 0.009922027587890625, -0.030376434326171875, -0.9944686889648438, 0.170867919921875, 0.3229522705078125, 0.4292755126953125, 0.06116294860839844, 0.020038604736328125, 0.20330810546875, -0.2108917236328125, -0.0527496337890625, 0.3002815246582031, 0.76611328125, 0.058502197265625, 0.11378288269042969, -0.4577980041503906, 0.3222503662109375, 0.409820556640625, -0.45563507080078125, 0.08864974975585938, 0.6928253173828125, -0.07465744018554688, -0.13510894775390625, 0.05509185791015625, -0.0451507568359375, -0.38701629638671875, -0.389129638671875, 0.06414222717285156, -0.22356033325195312, 0.008312225341796875, 0.641571044921875, -0.431854248046875, -0.3918304443359375, -0.3752403259277344, 0.46674346923828125, -0.12350845336914062, -0.32405853271484375, 0.2615242004394531, 0.0679931640625, 0.3820457458496094, 0.22027206420898438, 0.511138916015625, 0.2132110595703125, 0.08419227600097656, 0.1396636962890625, -0.1552448272705078, -0.43572235107421875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000036.npy"}
{"epoch": 0.05442176870748299, "step": 37, "batch_size": 64, "mean": -0.06963574886322021, "std": 0.3863889276981354, "min": -1.45166015625, "p10": -0.5107711791992188, "median": -0.1028127670288086, "p90": 0.4358745574951173, "max": 0.8555564880371094, "pos_frac": 0.4375, "sample": [-0.2114410400390625, 0.1031341552734375, -0.45742034912109375, -0.571533203125, 0.1383209228515625, 0.4155616760253906, -0.07059478759765625, 0.20623397827148438, -0.1543560028076172, 0.00957489013671875, -0.71307373046875, -0.12752151489257812, -0.23050308227539062, -0.6565399169921875, 0.03006744384765625, -0.21979522705078125, -0.161773681640625, 0.3999977111816406, -0.059040069580078125, 0.4716796875, 0.34499359130859375, -0.24252700805664062, 0.571075439453125, -0.165557861328125, -0.3389892578125, -0.7620277404785156, 0.19562530517578125, -0.2902050018310547, -0.374114990234375, -0.508880615234375, -1.45166015625, 0.6037139892578125, -0.3876495361328125, -0.6234817504882812, 0.02337646484375, 0.0717926025390625, -0.0894012451171875, 0.19614410400390625, 0.3417854309082031, 0.03568267822265625, 0.1021728515625, 0.1457347869873047, 0.8555564880371094, 0.1505889892578125, -0.05690765380859375, -0.27834320068359375, -0.1688232421875, -0.280914306640625, -0.5115814208984375, 0.1905975341796875, -0.12814712524414062, -0.11622428894042969, -0.11688613891601562, -0.12250900268554688, 0.23470306396484375, 0.5835685729980469, -0.26110076904296875, 0.052921295166015625, 0.444580078125, -0.3750762939453125, -0.215057373046875, -0.4433441162109375, 0.09453582763671875, 0.47259521484375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000037.npy"}
{"epoch": 0.055933484504913075, "step": 38, "batch_size": 64, "mean": -0.050820231437683105, "std": 0.39169442653656006, "min": -0.8740081787109375, "p10": -0.4475494384765625, "median": -0.09941768646240234, "p90": 0.45676422119140636, "max": 1.1850433349609375, "pos_frac": 0.390625, "sample": [0.2531623840332031, 0.12176895141601562, 0.6498031616210938, -0.2562713623046875, -0.057861328125, 0.13817596435546875, 0.11102294921875, 0.12335205078125, 1.1850433349609375, -0.42803955078125, -0.13833236694335938, -0.42148590087890625, -0.07791900634765625, 0.16373825073242188, -0.02614593505859375, 0.256988525390625, -0.8740081787109375, 0.07634925842285156, -0.366607666015625, 0.5322723388671875, -0.1432952880859375, 0.8823699951171875, -0.8603744506835938, 0.35809326171875, -0.13936233520507812, -0.4380645751953125, -0.5853424072265625, -0.08127403259277344, 0.5598602294921875, -0.23095703125, 0.042392730712890625, -0.5134735107421875, 0.12695884704589844, -0.21080780029296875, -0.11482620239257812, -0.030181884765625, 0.2062530517578125, -0.031223297119140625, 0.15652847290039062, 0.0041751861572265625, 0.436309814453125, -0.3003082275390625, -0.2102947235107422, -0.2692108154296875, 0.4655303955078125, -0.28147125244140625, 0.32247161865234375, 0.15474319458007812, -0.33319854736328125, -0.27655029296875, -0.8025970458984375, -0.20401763916015625, 0.6302032470703125, -0.18784332275390625, -0.251373291015625, -0.08400917053222656, -0.11607742309570312, -0.21878814697265625, -0.6852340698242188, 0.3201751708984375, -0.202850341796875, -0.41400146484375, -0.4516143798828125, -0.21494293212890625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000038.npy"}
{"epoch": 0.05744520030234316, "step": 39, "batch_size": 64, "mean": 0.03375011682510376, "std": 0.359550803899765, "min": -1.177978515625, "p10": -0.3484172821044922, "median": -0.007075309753417969, "p90": 0.5063102722167971, "max": 0.840087890625, "pos_frac": 0.484375, "sample": [0.5356369018554688, -0.28272247314453125, 0.25315093994140625, 0.1722869873046875, -0.1187286376953125, 0.6136016845703125, -0.1482086181640625, 0.0777130126953125, 0.3753700256347656, 0.533782958984375, -0.6599273681640625, -0.3128376007080078, 0.3255805969238281, 0.06926727294921875, -0.580078125, -0.059871673583984375, 0.09229087829589844, -0.0392303466796875, -0.09108734130859375, 0.0632781982421875, 0.3004341125488281, -0.15087890625, -0.0035247802734375, 0.374969482421875, -0.097320556640625, 0.03022003173828125, -0.2555999755859375, -0.06678581237792969, 0.433319091796875, -0.1042327880859375, -0.010625839233398438, 0.804443359375, 0.8238143920898438, -0.04510498046875, -0.332061767578125, 0.04180145263671875, 0.4078369140625, -0.38382720947265625, -0.3554267883300781, -0.37412261962890625, 0.3347320556640625, -0.206573486328125, -0.11685752868652344, 0.193389892578125, -0.1619873046875, -0.16378021240234375, 0.840087890625, 0.2747802734375, 0.004856109619140625, -0.4049949645996094, 0.530487060546875, 0.44989776611328125, 0.3145408630371094, -0.12934494018554688, 0.03231048583984375, -1.177978515625, -0.2722892761230469, -0.1376190185546875, 0.07086944580078125, 0.11612701416015625, 0.3342437744140625, -0.07829093933105469, -0.14011383056640625, -0.2030792236328125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000039.npy"}
{"epoch": 0.05895691609977324, "step": 40, "batch_size": 64, "mean": 0.062385231256484985, "std": 0.3560931086540222, "min": -0.8029212951660156, "p10": -0.3659791946411133, "median": 0.029623031616210938, "p90": 0.5377716064453127, "max": 0.7814254760742188, "pos_frac": 0.546875, "sample": [-0.49237060546875, 0.39574432373046875, 0.7814254760742188, 0.04633331298828125, 0.59130859375, 0.0050525665283203125, 0.24738311767578125, 0.16528701782226562, -0.1501312255859375, -0.3149566650390625, 0.4064445495605469, -0.023555755615234375, 0.7677001953125, -0.11931991577148438, -0.1317291259765625, -0.12471961975097656, 0.7522735595703125, 0.410308837890625, 0.1300334930419922, -0.008295059204101562, 0.5874710083007812, 0.6128997802734375, -0.47779083251953125, 0.0174560546875, -0.09532928466796875, -0.2115936279296875, 0.3923988342285156, -0.16650390625, 0.47991180419921875, 0.3720226287841797, 0.041790008544921875, 0.5625686645507812, 0.164031982421875, -0.6433563232421875, -0.3672466278076172, -0.2767295837402344, -0.019496917724609375, -0.039051055908203125, 0.2592449188232422, 0.360137939453125, -0.014148712158203125, 0.40283203125, 0.1305999755859375, 0.19476318359375, 0.1042022705078125, 0.1031341552734375, -0.16722488403320312, -0.10514068603515625, -0.2599754333496094, -0.027862548828125, 0.2655220031738281, 0.3214111328125, -0.8029212951660156, 0.190399169921875, -0.339996337890625, -0.10845184326171875, 0.0157012939453125, 0.3959197998046875, -0.7285308837890625, -0.5010528564453125, 0.2822227478027344, -0.06525802612304688, -0.3630218505859375, 0.1824798583984375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000040.npy"}
{"epoch": 0.06046863189720333, "step": 41, "batch_size": 64, "mean": 0.045472562313079834, "std": 0.36446693539619446, "min": -0.802703857421875, "p10": -0.41301422119140624, "median": 0.0006256103515625, "p90": 0.5551994323730469, "max": 0.997100830078125, "pos_frac": 0.5, "sample": [0.3529815673828125, 0.5630264282226562, 0.06782913208007812, -0.4472198486328125, 0.217742919921875, -0.161865234375, -0.04777717590332031, 0.2619972229003906, 0.5536880493164062, 0.299041748046875, -0.802703857421875, 0.2709369659423828, -0.7417144775390625, -0.40704345703125, 0.404296875, -0.0015869140625, -0.38986968994140625, 0.3068084716796875, -0.3855018615722656, 0.20018577575683594, -0.0809783935546875, 0.17608642578125, 0.15435409545898438, -0.3876800537109375, 0.11292457580566406, -0.1849365234375, 0.33885955810546875, -0.12714385986328125, -0.10610198974609375, -0.6887664794921875, -0.13138198852539062, -0.4965667724609375, 0.4361724853515625, 0.5469512939453125, 0.23479270935058594, 0.49628448486328125, -0.15824127197265625, -0.19169235229492188, -0.11119842529296875, -0.012948989868164062, 0.5748214721679688, 0.060398101806640625, 0.002838134765625, 0.08209228515625, 0.6386947631835938, -0.021144866943359375, 0.15917205810546875, 0.6777496337890625, -0.06494903564453125, -0.4155731201171875, 0.00417327880859375, 0.585784912109375, -0.04268646240234375, -0.106231689453125, 0.0064544677734375, -0.03912353515625, -0.31836700439453125, -0.014980316162109375, -0.005001068115234375, 0.14764404296875, 0.55584716796875, 0.997100830078125, -0.02252960205078125, -0.46398162841796875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000041.npy"}
{"epoch": 0.06198034769463341, "step": 42, "batch_size": 64, "mean": 0.06889224052429199, "std": 0.37070393562316895, "min": -0.5813827514648438, "p10": -0.3169563293457031, "median": 0.03316688537597656, "p90": 0.5156528472900391, "max": 1.2526321411132812, "pos_frac": 0.53125, "sample": [0.74578857421875, -0.240203857421875, 0.29660797119140625, -0.15039825439453125, 0.18226242065429688, 0.048828125, -0.2695503234863281, 0.08833885192871094, -0.5813827514648438, -0.30487060546875, -0.06371307373046875, 0.15304183959960938, -0.2799034118652344, -0.5015144348144531, -0.0429534912109375, 0.35687255859375, 0.9337005615234375, -0.2877960205078125, 0.32759857177734375, -0.30416107177734375, 0.4886627197265625, 0.10852813720703125, 0.6529045104980469, -0.2443389892578125, -0.21485519409179688, 1.2526321411132812, -0.3846397399902344, -0.32213592529296875, -0.2523975372314453, -0.17084884643554688, 0.20212554931640625, 0.21732330322265625, -0.25323486328125, -0.28266143798828125, -0.3391876220703125, 0.13349151611328125, 0.03798675537109375, 0.36475372314453125, -0.22600555419921875, 0.9136505126953125, 0.5035057067871094, -0.1816253662109375, -0.0727386474609375, -0.069366455078125, 0.09813690185546875, 0.381805419921875, 0.47657012939453125, 0.5208587646484375, 0.15731048583984375, -0.1161956787109375, 0.027017593383789062, -0.13280487060546875, 0.11907196044921875, 0.14376068115234375, -0.4498023986816406, 0.32524871826171875, -0.3871002197265625, 0.07037925720214844, 0.4030303955078125, 0.3807258605957031, 0.028347015380859375, -0.07941055297851562, 0.5309906005859375, -0.056957244873046875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000042.npy"}
{"epoch": 0.06349206349206349, "step": 43, "batch_size": 64, "mean": 0.04241099953651428, "std": 0.4501649737358093, "min": -1.28057861328125, "p10": -0.48452186584472656, "median": 0.014641761779785156, "p90": 0.5195564270019533, "max": 1.5465850830078125, "pos_frac": 0.515625, "sample": [0.3292694091796875, 0.432769775390625, 0.493499755859375, 0.047954559326171875, 0.12776565551757812, -0.05181121826171875, -0.1704559326171875, 0.10440444946289062, 0.3528900146484375, 0.294525146484375, 0.05366706848144531, -0.29991912841796875, -0.6204986572265625, -0.38306427001953125, 0.73828125, -0.4578857421875, -0.00115203857421875, -0.117431640625, -0.10376167297363281, 0.3945884704589844, 0.18575096130371094, 0.808624267578125, -0.0277252197265625, -0.4959373474121094, -0.18848037719726562, 0.5307235717773438, -0.2061176300048828, -0.1798553466796875, -0.07180404663085938, -0.6532135009765625, 1.5465850830078125, 0.433197021484375, -0.04431915283203125, 0.25555419921875, 0.16096115112304688, 0.753265380859375, 0.03826904296875, 0.2826957702636719, 0.3803443908691406, 0.46795654296875, 0.72369384765625, -0.07583999633789062, 0.13141250610351562, -0.6717262268066406, 0.020685195922851562, -1.02789306640625, 0.2698822021484375, 0.09504127502441406, -0.05420684814453125, 0.309906005859375, -0.01361083984375, -0.048370361328125, -0.4537506103515625, -0.272674560546875, -0.00510406494140625, 0.00859832763671875, 0.5345115661621094, 0.2446441650390625, 0.2485198974609375, -1.28057861328125, -0.8009185791015625, -0.011562347412109375, -0.2422351837158203, -0.054229736328125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000043.npy"}
{"epoch": 0.06500377928949358, "step": 44, "batch_size": 64, "mean": 0.032884448766708374, "std": 0.4050200581550598, "min": -0.9635810852050781, "p10": -0.5186126708984375, "median": 0.050640106201171875, "p90": 0.4772720336914064, "max": 0.9485321044921875, "pos_frac": 0.5625, "sample": [-0.06797409057617188, -0.018505096435546875, 0.523223876953125, 0.41481781005859375, -0.29579925537109375, -0.50927734375, 0.13373565673828125, 0.04853057861328125, -0.03978729248046875, -0.14257049560546875, -0.2141571044921875, -0.7146759033203125, 0.3092193603515625, 0.03526115417480469, 0.43470001220703125, -0.074981689453125, -0.07113265991210938, -0.02295684814453125, -0.522613525390625, -0.42363739013671875, -0.1550750732421875, 0.033477783203125, 0.1633453369140625, 0.30016326904296875, -0.18259048461914062, 0.1952056884765625, 0.1799468994140625, 0.311065673828125, 0.426116943359375, 0.9485321044921875, 0.438751220703125, -0.476470947265625, -0.5735321044921875, -0.9635810852050781, -0.2223529815673828, -0.218963623046875, 0.132110595703125, 0.08820724487304688, 0.3166961669921875, 0.8339157104492188, -0.5571670532226562, 0.0527496337890625, -0.4953155517578125, -0.04889678955078125, 0.7307357788085938, 0.32140350341796875, -0.21553802490234375, 0.5152053833007812, -0.7682876586914062, 0.18649864196777344, 0.08385467529296875, 0.6736907958984375, -0.12265777587890625, 0.1424713134765625, 0.26813507080078125, 0.30634307861328125, -0.0538177490234375, 0.49212646484375, 0.2932701110839844, -0.9605178833007812, 0.07061767578125, 0.00217437744140625, 0.4426116943359375, 0.38852691650390625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000044.npy"}
{"epoch": 0.06651549508692366, "step": 45, "batch_size": 64, "mean": 0.038203686475753784, "std": 0.4234500825405121, "min": -0.744171142578125, "p10": -0.41313438415527337, "median": -0.05988883972167969, "p90": 0.7423625946044928, "max": 1.1728515625, "pos_frac": 0.421875, "sample": [-0.1688385009765625, 0.0338897705078125, -0.6677894592285156, -0.1232147216796875, -0.024171829223632812, 0.37941741943359375, 0.5119743347167969, 0.8739776611328125, -0.744171142578125, -0.18099212646484375, 0.3363800048828125, 1.121917724609375, 0.18789291381835938, -0.22021484375, -0.3102836608886719, -0.5890960693359375, -0.040332794189453125, -0.2325592041015625, -0.5245819091796875, -0.174346923828125, 0.840789794921875, 0.9087753295898438, -0.22423553466796875, -0.12349319458007812, -0.1773223876953125, 0.8063201904296875, 0.4589996337890625, 0.15608596801757812, 0.0431671142578125, 0.1657562255859375, 0.01906585693359375, -0.432464599609375, -0.4445953369140625, -0.107635498046875, 0.3779106140136719, -0.2518157958984375, -0.3680305480957031, 0.28179168701171875, -0.11261749267578125, 0.27840423583984375, 0.0252227783203125, 1.1728515625, -0.08226776123046875, 0.135345458984375, -0.13045501708984375, 0.5931282043457031, 0.0571441650390625, -0.1150970458984375, -0.005340576171875, 0.9193115234375, -0.5604782104492188, -0.24314117431640625, -0.277008056640625, -0.07944488525390625, 0.20324325561523438, -0.0143585205078125, -0.3665924072265625, 0.35296630859375, -0.1940746307373047, -0.010213851928710938, -0.29058074951171875, 0.16070556640625, -0.0933074951171875, -0.25223541259765625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000045.npy"}
{"epoch": 0.06802721088435375, "step": 46, "batch_size": 64, "mean": 0.07199528813362122, "std": 0.376910001039505, "min": -0.903656005859375, "p10": -0.3916053771972656, "median": 0.07044696807861328, "p90": 0.5028182983398439, "max": 0.836151123046875, "pos_frac": 0.578125, "sample": [0.16949462890625, -0.2969093322753906, -0.2447052001953125, -0.37689208984375, 0.7313079833984375, 0.0928802490234375, 0.04885673522949219, 0.35762786865234375, -0.39791107177734375, -0.903656005859375, 0.4394111633300781, 0.2622489929199219, 0.4504852294921875, -0.2683258056640625, 0.4400634765625, 0.836151123046875, 0.7815017700195312, 0.27588653564453125, 0.22516632080078125, -0.1556549072265625, -0.17316627502441406, 0.7440185546875, -0.36334991455078125, 0.3412628173828125, -0.18697357177734375, -0.44026947021484375, 0.2868499755859375, 0.3231964111328125, -0.252899169921875, 0.2313690185546875, 0.2843780517578125, 0.06616973876953125, 0.22937774658203125, -0.22586822509765625, -0.08523941040039062, -0.02294921875, -0.031040191650390625, -0.45345306396484375, 0.1146087646484375, 0.4677734375, -0.4638099670410156, 0.44033050537109375, 0.53570556640625, 0.81390380859375, -0.10696029663085938, 0.188323974609375, 0.034206390380859375, -0.6393203735351562, 0.009063720703125, 0.2544059753417969, -0.10528755187988281, 0.11422538757324219, -0.3569183349609375, -0.30548858642578125, 0.5178375244140625, 0.41278839111328125, -0.0594482421875, -0.409149169921875, 0.03977203369140625, 0.441986083984375, -0.3506011962890625, 0.07472419738769531, 0.4061012268066406, -0.19951629638671875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000046.npy"}
{"epoch": 0.06953892668178382, "step": 47, "batch_size": 64, "mean": 0.018892019987106323, "std": 0.3710062503814697, "min": -0.8740081787109375, "p10": -0.41676769256591795, "median": -0.003604888916015625, "p90": 0.5349205017089844, "max": 0.7633285522460938, "pos_frac": 0.484375, "sample": [-0.1394939422607422, -0.31946563720703125, -0.2277374267578125, 0.02392578125, -0.0751800537109375, 0.164703369140625, -0.11861228942871094, 0.7633285522460938, -0.2435016632080078, 0.6987533569335938, 0.09619331359863281, -0.05341529846191406, -0.7075424194335938, 0.00112152099609375, -0.170196533203125, 0.06113433837890625, 0.4786567687988281, 0.6128692626953125, -0.5875091552734375, 0.19199752807617188, -0.16835784912109375, 0.26306915283203125, 0.2748260498046875, 0.5252113342285156, 0.00798797607421875, 0.2498016357421875, 0.2836418151855469, -0.2945995330810547, -0.042568206787109375, -0.0529327392578125, -0.1423492431640625, 0.3208808898925781, 0.259918212890625, -0.6607513427734375, -0.435546875, -0.00116729736328125, 0.06406784057617188, 0.08316802978515625, 0.4074363708496094, -0.19710540771484375, 0.1718597412109375, 0.7374725341796875, -0.8405685424804688, 0.1940460205078125, 0.3649787902832031, -0.1101226806640625, -0.00604248046875, -0.8740081787109375, 0.5820159912109375, -0.045928955078125, 0.001800537109375, -0.25699615478515625, 0.52972412109375, -0.19155120849609375, -0.3334770202636719, -0.01081085205078125, -0.37294960021972656, -0.17689895629882812, -0.21573638916015625, -0.4949684143066406, -0.01422882080078125, 0.5371475219726562, 0.603118896484375, 0.23655319213867188], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000047.npy"}
{"epoch": 0.0710506424792139, "step": 48, "batch_size": 64, "mean": 0.057750314474105835, "std": 0.34622976183891296, "min": -0.71636962890625, "p10": -0.37209510803222656, "median": 0.0652017593383789, "p90": 0.5162651062011719, "max": 1.029205322265625, "pos_frac": 0.578125, "sample": [-0.23810577392578125, -0.09564208984375, -0.14072799682617188, 0.3951263427734375, 0.4405670166015625, 0.1559906005859375, 0.1136322021484375, -0.37248992919921875, 0.21124649047851562, -0.0439910888671875, 0.2523193359375, 0.10366058349609375, 0.47480010986328125, 0.1534881591796875, 1.029205322265625, 0.0827178955078125, -0.10897064208984375, 0.11940193176269531, -0.3142547607421875, 0.418487548828125, -0.422882080078125, -0.25362396240234375, -0.71636962890625, 0.0311737060546875, -0.0831146240234375, 0.14176177978515625, -0.281890869140625, 0.08243560791015625, -0.6249427795410156, -0.2782135009765625, -0.1255950927734375, 0.7223358154296875, -0.4000587463378906, -0.40517425537109375, 0.5997848510742188, 0.6723861694335938, 0.15889358520507812, 0.3341064453125, 0.071929931640625, 0.08381271362304688, -0.06341361999511719, 0.7135391235351562, 0.2981109619140625, 0.05142402648925781, 0.4906463623046875, 0.04624176025390625, -0.04480743408203125, 0.5272445678710938, 0.203582763671875, 0.035541534423828125, -0.3711738586425781, -0.271759033203125, -0.34209442138671875, 0.27982330322265625, 0.06993675231933594, 0.060466766357421875, -0.1354827880859375, -0.1493968963623047, -0.1046142578125, -0.46844482421875, 0.18538665771484375, -0.08245086669921875, 0.14917373657226562, 0.675323486328125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000048.npy"}
{"epoch": 0.07256235827664399, "step": 49, "batch_size": 64, "mean": 0.04920418560504913, "std": 0.37390002608299255, "min": -1.1567840576171875, "p10": -0.3680519104003906, "median": 0.07145404815673828, "p90": 0.4987205505371094, "max": 0.95916748046875, "pos_frac": 0.578125, "sample": [0.01410675048828125, -0.10004806518554688, 0.14979934692382812, 0.12581634521484375, -0.1781768798828125, 0.16849517822265625, -0.083740234375, -0.29909515380859375, 0.08114051818847656, -0.35967254638671875, 0.4033794403076172, 0.0831451416015625, 0.05902862548828125, 0.1728973388671875, -0.37164306640625, -0.22350502014160156, 0.003017425537109375, -0.39687347412109375, 0.15183639526367188, -0.3148956298828125, -0.21122360229492188, -0.017541885375976562, 0.13540935516357422, -0.13877105712890625, -0.04380607604980469, -0.736968994140625, 0.4420166015625, -0.16710662841796875, 0.11901473999023438, -0.7567138671875, 0.5014419555664062, 0.2887115478515625, 0.09957504272460938, 0.3961067199707031, 0.6107444763183594, 0.4878387451171875, -0.60894775390625, 0.00345611572265625, 0.539306640625, -0.07757568359375, -1.1567840576171875, 0.5459785461425781, 0.95916748046875, 0.566070556640625, 0.346435546875, -0.26830291748046875, 0.39806365966796875, -0.24570465087890625, 0.5098876953125, 0.3998451232910156, 0.44530487060546875, 0.061767578125, 0.268310546875, 0.20760345458984375, 0.37017822265625, -0.21010589599609375, 0.0834197998046875, -0.1814727783203125, -0.039306640625, -0.07248687744140625, -0.41699981689453125, 0.49237060546875, -0.2438812255859375, 0.379730224609375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000049.npy"}
{"epoch": 0.07407407407407407, "step": 50, "batch_size": 64, "mean": -0.0005398094654083252, "std": 0.3561820983886719, "min": -1.353179931640625, "p10": -0.3953392028808594, "median": 0.06662464141845703, "p90": 0.37894821166992204, "max": 0.7989501953125, "pos_frac": 0.5625, "sample": [0.25319671630859375, -0.7236404418945312, 0.6128082275390625, 0.11759185791015625, -0.054065704345703125, -0.1972198486328125, 0.30080413818359375, -0.6238479614257812, 0.1038665771484375, 0.34009552001953125, -0.049938201904296875, 0.08710479736328125, -0.27167510986328125, -0.1078033447265625, 0.001964569091796875, -0.3655529022216797, 0.3155059814453125, 0.20959091186523438, -0.21384429931640625, 0.26363372802734375, 0.40389251708984375, 0.19591522216796875, -0.1926422119140625, 0.4135589599609375, -0.7320098876953125, -1.353179931640625, 0.0871734619140625, -0.16548538208007812, 0.06919479370117188, 0.22848892211914062, 0.7989501953125, -0.14631271362304688, -0.020175933837890625, 0.02878570556640625, 0.395599365234375, 0.285614013671875, 0.1545257568359375, 0.16349029541015625, -0.21701622009277344, -0.27712249755859375, -0.3373565673828125, -0.397552490234375, 0.15061569213867188, 0.5197067260742188, 0.17316436767578125, -0.07678413391113281, -0.26410675048828125, 0.19613265991210938, 0.2343902587890625, 0.05411529541015625, 0.22524452209472656, -0.11385345458984375, -0.10805130004882812, 0.06405448913574219, 0.13321495056152344, -0.39017486572265625, -0.5867042541503906, 0.6333885192871094, -0.02685546875, -0.0423736572265625, -0.5126609802246094, 0.07527923583984375, 0.15641212463378906, 0.08638763427734375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000050.npy"}
{"epoch": 0.07558578987150416, "step": 51, "batch_size": 64, "mean": -0.03419503569602966, "std": 0.5002948045730591, "min": -1.0634918212890625, "p10": -0.6893280029296875, "median": 0.015702247619628906, "p90": 0.5129730224609375, "max": 1.4795913696289062, "pos_frac": 0.53125, "sample": [0.40423583984375, 0.5069732666015625, 0.2915153503417969, -0.47246551513671875, -0.36431121826171875, 1.4795913696289062, 0.0830841064453125, 0.10829925537109375, 0.16121482849121094, -0.2231903076171875, 0.5121879577636719, -0.8398361206054688, 0.088134765625, 0.1738739013671875, 0.5152969360351562, 0.828094482421875, 1.00604248046875, -0.16217041015625, 0.5133094787597656, 1.02947998046875, -0.003875732421875, 0.16852569580078125, 0.5340499877929688, -0.1221923828125, 0.35382080078125, -0.6930084228515625, -0.24198150634765625, -0.6595115661621094, -0.38266944885253906, -0.8116340637207031, -0.03924560546875, -0.642181396484375, 0.0928497314453125, -1.0634918212890625, -0.6807403564453125, 0.04892730712890625, 0.0186004638671875, -0.19023513793945312, -0.4811859130859375, 0.14181137084960938, -0.18332290649414062, -0.83428955078125, -0.24518585205078125, 0.446136474609375, -0.4337310791015625, 0.11638641357421875, -0.30450439453125, -0.765655517578125, 0.157379150390625, 0.006378173828125, 0.09332275390625, 0.05926513671875, -0.38951873779296875, 0.12473297119140625, 0.27455902099609375, 0.012804031372070312, -0.008541107177734375, 0.31600189208984375, 0.4440727233886719, -0.45813751220703125, -0.9909515380859375, -0.6548309326171875, -0.08275985717773438, 0.12591552734375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000051.npy"}
{"epoch": 0.07709750566893424, "step": 52, "batch_size": 64, "mean": 0.07070627808570862, "std": 0.44669464230537415, "min": -1.0193328857421875, "p10": -0.3321987152099609, "median": -0.01950836181640625, "p90": 0.6837348937988283, "max": 1.401397705078125, "pos_frac": 0.484375, "sample": [-0.1476898193359375, -0.969329833984375, 0.7901458740234375, 0.222015380859375, -0.068450927734375, -0.02996063232421875, -0.08260536193847656, -0.07297897338867188, 0.32802581787109375, 0.8770523071289062, 0.305511474609375, 1.221099853515625, -0.24224853515625, 0.9174957275390625, -0.0631561279296875, 0.052825927734375, -0.11917877197265625, 0.07312774658203125, 0.2480316162109375, 0.17484283447265625, -0.3005638122558594, -0.11037826538085938, 0.18183135986328125, -0.00905609130859375, 0.9954681396484375, -0.19724273681640625, -0.34575653076171875, 0.09556007385253906, -0.36285400390625, -0.08114051818847656, 0.47531890869140625, -0.2265472412109375, 0.6389541625976562, 0.33150672912597656, -0.1648120880126953, 0.2922821044921875, -0.2273406982421875, -0.24414443969726562, -0.65411376953125, -0.122039794921875, 0.005558013916015625, 0.11553192138671875, -0.2918128967285156, 0.49585533142089844, -0.07297134399414062, 0.2708892822265625, -0.1325836181640625, -0.238037109375, -0.509124755859375, -0.12471771240234375, 0.1816558837890625, 0.467864990234375, 0.2512969970703125, -0.17304229736328125, 0.2219390869140625, 0.2005462646484375, 0.7029266357421875, 0.07582855224609375, 1.401397705078125, 0.18743133544921875, -1.0193328857421875, -0.12225341796875, -0.26787376403808594, -0.4812774658203125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000052.npy"}
{"epoch": 0.07860922146636433, "step": 53, "batch_size": 64, "mean": 0.12471383810043335, "std": 0.4325999617576599, "min": -1.1451339721679688, "p10": -0.3741813659667968, "median": 0.1285877227783203, "p90": 0.6876266479492189, "max": 1.150054931640625, "pos_frac": 0.671875, "sample": [-0.199005126953125, 0.36650848388671875, 0.4894599914550781, -0.39647674560546875, -0.06092071533203125, 0.00428009033203125, 0.26903533935546875, 0.20416259765625, -0.20001602172851562, -0.129150390625, -0.8017730712890625, 0.7576446533203125, 0.2604827880859375, 0.13152694702148438, 0.16451263427734375, 0.16019058227539062, 0.5369186401367188, 0.12564849853515625, -0.0811920166015625, 1.1437530517578125, 0.0026454925537109375, 0.11801910400390625, 0.7044448852539062, 0.38224029541015625, -0.15216827392578125, 0.81781005859375, -0.17261886596679688, 0.2606468200683594, 0.2613868713378906, 0.31433868408203125, 0.1620025634765625, -0.7992172241210938, -0.3221588134765625, -1.1451339721679688, -0.12144279479980469, 0.707733154296875, 0.6483840942382812, 0.037952423095703125, -0.3201942443847656, 0.431060791015625, 0.14792633056640625, 0.0731353759765625, 0.11561775207519531, -0.6235809326171875, 0.32579803466796875, 1.150054931640625, 0.5238113403320312, 0.7300262451171875, 0.177459716796875, -0.412933349609375, 0.47498321533203125, 0.3803558349609375, 0.005458831787109375, 0.021793365478515625, -0.415679931640625, 0.026912689208984375, 0.62335205078125, 0.09465789794921875, 0.550445556640625, -0.2620811462402344, -0.1685943603515625, -0.075347900390625, -0.26520538330078125, 0.2219982147216797], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000053.npy"}
{"epoch": 0.0801209372637944, "step": 54, "batch_size": 64, "mean": 0.06533166766166687, "std": 0.41315680742263794, "min": -0.9965972900390625, "p10": -0.471890640258789, "median": 0.058826446533203125, "p90": 0.5955745697021485, "max": 0.8416175842285156, "pos_frac": 0.578125, "sample": [-0.26186370849609375, 0.3413238525390625, -0.3651123046875, 0.0049076080322265625, 0.6317253112792969, -0.438629150390625, 0.493438720703125, 0.40436553955078125, -0.056880950927734375, -0.18880462646484375, 0.5233383178710938, -0.08520126342773438, 0.586639404296875, 0.33946990966796875, 0.14947128295898438, -0.12878036499023438, 0.6512298583984375, 0.5248050689697266, -0.4071044921875, 0.5708541870117188, 0.5967864990234375, 0.126007080078125, -0.5073070526123047, 0.054576873779296875, -0.0031890869140625, 0.5927467346191406, 0.4307746887207031, 0.49462890625, -0.43569183349609375, 0.6871719360351562, -0.571746826171875, 0.15958023071289062, 0.075653076171875, -0.0045166015625, 0.05761528015136719, -0.22006988525390625, 0.11319732666015625, 0.03545379638671875, -0.4820518493652344, 0.7475433349609375, -0.44818115234375, -0.08129119873046875, -0.02056884765625, 0.3631591796875, 0.12949752807617188, 0.3675651550292969, 0.28359222412109375, 0.2044696807861328, -0.023151397705078125, 0.8416175842285156, -0.11664581298828125, -0.716461181640625, 0.317291259765625, 0.02501678466796875, 0.06003761291503906, -0.2960834503173828, -0.6670379638671875, 0.20852279663085938, 0.607452392578125, -0.40187835693359375, -0.72039794921875, -0.9965972900390625, -0.04573822021484375, 0.07068252563476562], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000054.npy"}
{"epoch": 0.08163265306122448, "step": 55, "batch_size": 64, "mean": 0.23402303457260132, "std": 0.47766026854515076, "min": -0.6229171752929688, "p10": -0.23956718444824218, "median": 0.1851520538330078, "p90": 0.8469398498535157, "max": 2.47918701171875, "pos_frac": 0.71875, "sample": [0.3848876953125, 0.2871856689453125, 0.36757659912109375, 0.0950164794921875, -0.18181991577148438, 0.201995849609375, 0.1154937744140625, -0.18180084228515625, -0.24111557006835938, 0.33319091796875, -0.15532302856445312, 0.9563140869140625, -0.1042938232421875, 0.13504791259765625, 0.4961357116699219, 0.0341949462890625, 0.24662399291992188, 0.8743133544921875, 0.29669189453125, -0.1940765380859375, 0.18991851806640625, 0.57568359375, 0.4176483154296875, 0.07149696350097656, 2.47918701171875, 0.03907966613769531, 0.18038558959960938, 0.443939208984375, 1.055145263671875, 0.6504364013671875, 0.21459197998046875, 0.4306640625, -0.5124282836914062, 0.351898193359375, -0.23595428466796875, -0.24481201171875, -0.13781166076660156, 0.09587860107421875, 1.07354736328125, 0.15380477905273438, 0.5834197998046875, 0.306365966796875, -0.033496856689453125, 0.9582061767578125, 0.5256195068359375, 0.05494880676269531, 0.8231582641601562, -0.189727783203125, -0.379241943359375, -0.581390380859375, 0.11214065551757812, 0.3639373779296875, -0.061431884765625, -0.45319366455078125, 0.373809814453125, 0.35417938232421875, 0.8571319580078125, 0.07009506225585938, 0.2620658874511719, 0.06866455078125, 0.4765777587890625, -0.0001373291015625, 0.050151824951171875, -0.6229171752929688], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000055.npy"}
{"epoch": 0.08314436885865457, "step": 56, "batch_size": 64, "mean": 0.1093185544013977, "std": 0.4850178062915802, "min": -1.6096038818359375, "p10": -0.4689924240112304, "median": 0.1166067123413086, "p90": 0.5861740112304689, "max": 1.35150146484375, "pos_frac": 0.640625, "sample": [-0.17942047119140625, 0.47154998779296875, 0.1292743682861328, 0.2632293701171875, -0.042266845703125, -0.30760955810546875, 0.423248291015625, -0.5442962646484375, -0.2568016052246094, 0.05456352233886719, 0.005229949951171875, 0.54656982421875, 1.237945556640625, -0.19136810302734375, 0.42859649658203125, 0.3599853515625, 1.2188873291015625, 0.175018310546875, 0.24362945556640625, 0.36215782165527344, -0.4230194091796875, 1.35150146484375, -0.3660011291503906, 0.5076179504394531, -0.18134307861328125, 0.12678146362304688, 0.21991729736328125, 0.4969978332519531, -0.1923065185546875, -0.49715423583984375, 0.11552810668945312, -0.6595611572265625, 0.5557327270507812, 0.6685714721679688, 0.0785675048828125, 0.22069549560546875, -0.716796875, 0.4637908935546875, -0.4886951446533203, -0.096282958984375, 0.380950927734375, -0.098236083984375, 0.0884552001953125, 0.00681304931640625, 0.11768531799316406, 0.6629867553710938, 0.1720123291015625, -0.1627655029296875, -0.08965492248535156, 0.2987174987792969, 0.42200469970703125, 0.0312347412109375, 0.5465316772460938, -0.7737350463867188, -1.6096038818359375, 0.2827949523925781, -0.18112945556640625, 0.07099151611328125, -0.05663299560546875, 0.5992202758789062, -0.3525848388671875, 0.7536773681640625, 0.021465301513671875, 0.28252410888671875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000056.npy"}
{"epoch": 0.08465608465608465, "step": 57, "batch_size": 64, "mean": 0.16701054573059082, "std": 0.5862471461296082, "min": -1.4887237548828125, "p10": -0.5570905685424805, "median": 0.1906290054321289, "p90": 0.8081962585449219, "max": 1.3666114807128906, "pos_frac": 0.6875, "sample": [0.15792083740234375, -0.19665145874023438, 0.3219261169433594, 1.32672119140625, -0.17852401733398438, -0.0338897705078125, -0.29161834716796875, 0.812103271484375, 0.2406005859375, 0.27579498291015625, -0.030517578125, 0.43804931640625, 0.271209716796875, 0.1739349365234375, 0.795745849609375, 0.2338409423828125, -1.4887237548828125, 0.16166305541992188, 0.9893646240234375, 0.2073230743408203, -0.378692626953125, -0.1222686767578125, 0.5697669982910156, 0.329833984375, 0.22177505493164062, -0.5598468780517578, 0.25624847412109375, 0.8472442626953125, 0.127532958984375, 0.08805084228515625, 0.5211944580078125, 1.3666114807128906, -0.3826713562011719, -0.5506591796875, 0.07849884033203125, 0.6985549926757812, -1.0087890625, 0.6416053771972656, -1.2953643798828125, 0.4266510009765625, 0.2983818054199219, 0.6395225524902344, -0.5663604736328125, 0.493438720703125, 0.6724853515625, -0.7941131591796875, 0.4266204833984375, 0.5686912536621094, 0.16355133056640625, 0.11586761474609375, 0.13381576538085938, 0.04552459716796875, 0.6958389282226562, 1.332366943359375, 0.2992591857910156, 0.7990798950195312, 1.1664581298828125, -0.2725944519042969, 0.15245437622070312, -1.0173492431640625, -0.37773895263671875, -0.00450897216796875, 0.116241455078125, -0.459808349609375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000057.npy"}
{"epoch": 0.08616780045351474, "step": 58, "batch_size": 64, "mean": 0.04027312994003296, "std": 0.47528353333473206, "min": -0.848602294921875, "p10": -0.5882598876953125, "median": 0.040462493896484375, "p90": 0.5976219177246096, "max": 1.3958969116210938, "pos_frac": 0.53125, "sample": [0.19664764404296875, 0.6220779418945312, -0.053699493408203125, 0.19108963012695312, -0.041107177734375, -0.439483642578125, 0.4008331298828125, 1.3958969116210938, 0.21734619140625, 0.310272216796875, 0.7143325805664062, -0.01016998291015625, -0.15667724609375, -0.3129425048828125, 0.12932586669921875, 0.3482475280761719, 0.3158721923828125, -0.424224853515625, -0.3311004638671875, 0.953643798828125, 0.20406150817871094, 0.23768997192382812, -0.35034942626953125, -0.5928192138671875, -0.5730743408203125, -0.317291259765625, -0.36615753173828125, -0.179931640625, -0.6703071594238281, -0.3915214538574219, -0.4768409729003906, 0.033935546875, -0.6454277038574219, 0.4270172119140625, 0.30560302734375, 0.16574859619140625, 0.540557861328125, 1.32525634765625, 0.48012542724609375, -0.0729522705078125, -0.09954833984375, 0.1531810760498047, 0.06668853759765625, -0.227996826171875, -0.7054290771484375, 0.3483428955078125, -0.17840576171875, 0.388946533203125, 0.1314849853515625, 0.4118080139160156, -0.5776214599609375, 0.1678466796875, 0.04698944091796875, 0.303253173828125, 0.4320716857910156, 0.6927108764648438, 0.0162353515625, -0.29845428466796875, -0.7080459594726562, -0.848602294921875, -0.7345428466796875, 0.7234344482421875, -0.014310836791992188, -0.022058486938476562], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000058.npy"}
{"epoch": 0.08767951625094482, "step": 59, "batch_size": 64, "mean": 0.17183765769004822, "std": 0.5737805962562561, "min": -1.0540046691894531, "p10": -0.3873756408691406, "median": 0.12274932861328125, "p90": 0.7019840240478521, "max": 2.46710205078125, "pos_frac": 0.59375, "sample": [-0.395782470703125, -0.27227020263671875, 0.0265350341796875, -1.0540046691894531, 0.2808036804199219, -0.2978973388671875, 0.12885284423828125, -0.6647491455078125, -0.185028076171875, 0.4466552734375, 2.46710205078125, 0.244476318359375, 1.5274810791015625, 0.38726043701171875, 0.862213134765625, 0.23441314697265625, 0.158233642578125, -0.028966903686523438, 0.5646820068359375, 0.11474609375, -0.040111541748046875, -0.22274017333984375, -0.3660621643066406, 0.11664581298828125, -0.2202606201171875, 0.5651130676269531, -0.24893951416015625, 0.30503082275390625, 1.173614501953125, 0.5144424438476562, -0.09283065795898438, -0.03386688232421875, -0.055522918701171875, 0.0045166015625, -0.7397613525390625, 0.47541236877441406, 0.05614471435546875, -0.4001140594482422, 0.1073455810546875, 0.4182395935058594, 1.0760040283203125, 0.3970184326171875, -0.36775970458984375, 0.5812416076660156, 0.1533203125, 0.29912376403808594, 0.24908447265625, -0.00943756103515625, -0.2618598937988281, 1.895477294921875, -0.4046592712402344, 0.17795181274414062, 0.41291046142578125, -0.5033416748046875, 0.2636871337890625, 0.2021331787109375, -0.01007080078125, 0.5568008422851562, -0.07312393188476562, 0.7537307739257812, -0.3390083312988281, 0.1396942138671875, 0.19573974609375, -0.24809837341308594], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000059.npy"}
{"epoch": 0.08919123204837491, "step": 60, "batch_size": 64, "mean": 0.11775723099708557, "std": 0.5579010248184204, "min": -1.304840087890625, "p10": -0.5370635986328124, "median": 0.0846099853515625, "p90": 0.7668746948242191, "max": 2.3864288330078125, "pos_frac": 0.640625, "sample": [0.5819854736328125, 0.22900390625, 0.6432418823242188, -0.4356842041015625, -0.37589263916015625, 0.5475578308105469, -1.304840087890625, -0.6949462890625, -0.646209716796875, -0.0147857666015625, 0.3714637756347656, 0.3502960205078125, -0.17504501342773438, 0.1448822021484375, -0.4463653564453125, 0.3218841552734375, 0.09901809692382812, 0.5600662231445312, 0.3539772033691406, 0.6396942138671875, 0.0137481689453125, 0.080657958984375, 0.054473876953125, 0.0241241455078125, -0.59326171875, -0.4056854248046875, -0.4402580261230469, 0.019540786743164062, 0.5981292724609375, 0.11168670654296875, 0.015207290649414062, 0.08856201171875, 0.8729400634765625, -0.33522796630859375, -0.6418037414550781, -0.1745147705078125, -0.07490921020507812, -0.12288665771484375, 0.11029815673828125, -0.2354412078857422, -0.69189453125, 0.34954261779785156, 0.4583854675292969, -0.4501190185546875, 0.009342193603515625, 0.6715240478515625, 0.12065505981445312, 2.3864288330078125, 1.0991668701171875, 0.09051132202148438, 0.3237152099609375, -0.090850830078125, 0.1544036865234375, -0.324737548828125, 1.2138671875, 0.004840850830078125, -0.5743255615234375, -0.07845306396484375, 0.8384552001953125, 0.8702507019042969, 0.8077392578125, 0.13782882690429688, 0.46448516845703125, 0.031019210815429688], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000060.npy"}
{"epoch": 0.09070294784580499, "step": 61, "batch_size": 64, "mean": 0.04662443697452545, "std": 0.4838990867137909, "min": -1.3006744384765625, "p10": -0.6342811584472656, "median": 0.08600521087646484, "p90": 0.7164581298828125, "max": 1.03363037109375, "pos_frac": 0.578125, "sample": [-0.5117034912109375, 0.784912109375, -0.08369064331054688, 0.8766326904296875, 0.5171012878417969, 0.15294647216796875, 0.4815177917480469, 0.0713348388671875, -0.42256927490234375, 0.24706268310546875, -0.9555130004882812, 0.09506988525390625, 0.3672676086425781, -0.10150146484375, -0.19136810302734375, 0.04114532470703125, -0.08921051025390625, 0.13299560546875, -0.6501541137695312, -0.011749267578125, -0.16027450561523438, 0.194061279296875, 0.863677978515625, -0.0753488540649414, 0.1446990966796875, -0.234588623046875, -0.1420745849609375, -0.18109130859375, 0.24271392822265625, -0.21694564819335938, 0.08280181884765625, 0.08920860290527344, -0.6040534973144531, 0.0063629150390625, 0.137298583984375, 0.15697097778320312, -0.02175140380859375, -0.2844409942626953, -0.6472358703613281, 0.2523345947265625, 0.859832763671875, -1.1403579711914062, 0.721099853515625, -0.7551689147949219, 0.45121002197265625, 0.5527229309082031, -0.29638671875, 0.1560344696044922, -0.3319206237792969, 0.5018539428710938, 0.13409423828125, -0.698974609375, -0.20429039001464844, 0.063873291015625, 0.70562744140625, 0.1282501220703125, 1.03363037109375, 0.3165111541748047, 0.8326568603515625, 0.249542236328125, 0.49907684326171875, 0.24478912353515625, -1.3006744384765625, -0.0919189453125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000061.npy"}
{"epoch": 0.09221466364323508, "step": 62, "batch_size": 64, "mean": 0.17216727137565613, "std": 0.4864511787891388, "min": -1.032470703125, "p10": -0.4739028930664062, "median": 0.15052032470703125, "p90": 0.7534168243408204, "max": 1.6102218627929688, "pos_frac": 0.640625, "sample": [0.06890106201171875, -0.0573883056640625, -0.16077423095703125, -0.07389640808105469, -0.02642822265625, 0.4660491943359375, -0.11305999755859375, -0.3076744079589844, 0.29840850830078125, 0.144927978515625, 0.3388328552246094, 0.5451469421386719, 0.495086669921875, -0.7407455444335938, -0.02895355224609375, 1.6102218627929688, 0.1577911376953125, 0.4473419189453125, 0.18190765380859375, 0.3450469970703125, 0.14611053466796875, 0.02822113037109375, -0.5546875, 0.29123687744140625, 0.0007114410400390625, -0.15100860595703125, 0.5630340576171875, -0.23445892333984375, 0.5377883911132812, 0.7253990173339844, -0.5059356689453125, 0.02132415771484375, -0.08774185180664062, 0.0855255126953125, 0.5709648132324219, -1.032470703125, 0.41036224365234375, 0.8270721435546875, -0.003124237060546875, 0.116851806640625, 0.5136260986328125, -0.0386962890625, -0.4377899169921875, 0.757904052734375, 0.6904506683349609, 0.2924308776855469, 0.7822189331054688, 0.04302215576171875, -0.22841262817382812, 0.92279052734375, -0.7975425720214844, 0.7429466247558594, 0.36147308349609375, 0.2384319305419922, -0.7132492065429688, -0.27719879150390625, 1.09759521484375, 1.1660003662109375, 0.41462135314941406, 0.334136962890625, 0.21662139892578125, -0.0741424560546875, -0.4893798828125, 0.15493011474609375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000062.npy"}
{"epoch": 0.09372637944066516, "step": 63, "batch_size": 64, "mean": 0.22391453385353088, "std": 0.536447286605835, "min": -1.22216796875, "p10": -0.28531341552734374, "median": 0.20874786376953125, "p90": 0.8165302276611329, "max": 2.0606536865234375, "pos_frac": 0.609375, "sample": [0.8856277465820312, -0.19896697998046875, 0.49390411376953125, 0.264739990234375, 1.890777587890625, -0.5339775085449219, 0.14502716064453125, -0.271942138671875, 0.121856689453125, -0.072265625, -1.22216796875, -0.7387657165527344, 0.4420433044433594, -0.2347259521484375, 0.42942047119140625, -0.12613677978515625, 2.0606536865234375, -0.00600433349609375, 0.116302490234375, -0.28961181640625, 0.8901824951171875, 0.27379608154296875, 0.48760986328125, 1.0432586669921875, 0.6713027954101562, 0.2472076416015625, 0.1009979248046875, 0.4074897766113281, 0.4565105438232422, -0.505859375, 0.6054306030273438, 0.053852081298828125, 0.5212860107421875, -0.0258941650390625, -0.2752838134765625, 0.5085372924804688, -0.515350341796875, -0.26363372802734375, 0.391021728515625, 0.21669769287109375, 0.3333854675292969, 0.668212890625, -0.23583221435546875, 0.8256568908691406, -0.4497337341308594, 0.35268402099609375, -0.03472900390625, -0.015209197998046875, 0.642486572265625, -0.03496551513671875, -0.23957443237304688, -0.011322021484375, -0.03725433349609375, 0.513031005859375, 0.20079803466796875, 0.6628875732421875, 0.28271484375, -0.10304641723632812, -0.16819381713867188, 0.11000442504882812, 0.7952346801757812, 0.8544464111328125, 0.5753936767578125, 0.39850616455078125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000063.npy"}
{"epoch": 0.09523809523809523, "step": 64, "batch_size": 64, "mean": 0.0653490424156189, "std": 0.5291185975074768, "min": -1.52142333984375, "p10": -0.493417739868164, "median": 0.05550956726074219, "p90": 0.5873786926269533, "max": 1.96185302734375, "pos_frac": 0.546875, "sample": [0.229461669921875, 1.96185302734375, 0.12279510498046875, -0.0415802001953125, -0.6413230895996094, 1.3306350708007812, 0.5108566284179688, 0.6175155639648438, 0.261993408203125, 1.0509262084960938, -0.13701248168945312, 0.4423179626464844, 0.4632987976074219, 0.1273193359375, 0.7303390502929688, -0.04837799072265625, 0.5312423706054688, 0.115966796875, -0.065155029296875, -0.1968822479248047, -0.0808258056640625, 0.11698150634765625, 0.18387222290039062, 0.09771728515625, -1.0158309936523438, 0.033290863037109375, -0.24810409545898438, 0.4853515625, -0.24545669555664062, 0.1497821807861328, -0.1820049285888672, 1.0748443603515625, 0.054416656494140625, -0.64697265625, 0.6049156188964844, 0.3039398193359375, 0.5464591979980469, -1.52142333984375, -0.2342376708984375, 0.00865936279296875, -0.028533935546875, -0.06150054931640625, 0.1995849609375, 0.05660247802734375, -0.19768524169921875, -0.397552490234375, 0.13957595825195312, -0.1779937744140625, -0.9091567993164062, 0.3676300048828125, 0.05829620361328125, 0.06553459167480469, -0.42420387268066406, -0.048069000244140625, 0.2772178649902344, 0.44013023376464844, 0.2531013488769531, -0.11439323425292969, -0.2828102111816406, -0.23887252807617188, -0.30767822265625, -0.6420974731445312, -0.5230808258056641, -0.17327117919921875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000064.npy"}
{"epoch": 0.09674981103552532, "step": 65, "batch_size": 64, "mean": 0.07702624797821045, "std": 0.6449230313301086, "min": -1.5763626098632812, "p10": -0.6226043701171875, "median": -0.011533737182617188, "p90": 0.9556331634521487, "max": 1.6975631713867188, "pos_frac": 0.5, "sample": [0.3479290008544922, 0.1017303466796875, 1.1445693969726562, -0.4346771240234375, -0.1935272216796875, 0.4527320861816406, -0.4439697265625, -0.583648681640625, -0.4325141906738281, -0.045864105224609375, 0.3701629638671875, -0.130096435546875, 0.7123622894287109, -1.5763626098632812, 0.5552291870117188, -0.35150146484375, 0.8771286010742188, 0.6449356079101562, -0.0750579833984375, 0.378173828125, -0.038330078125, 0.27539825439453125, -0.68072509765625, 1.4285736083984375, 1.2776107788085938, 0.041629791259765625, -0.901336669921875, -0.313568115234375, 0.454803466796875, 0.04634857177734375, -0.22029876708984375, -0.387176513671875, 0.548828125, -0.5246047973632812, 0.5127410888671875, 0.9056320190429688, -0.2173614501953125, -0.591522216796875, 1.6975631713867188, 0.6187400817871094, 0.9770622253417969, 0.05863189697265625, -0.0370635986328125, -0.09267425537109375, -0.7816581726074219, -0.23409271240234375, 0.013996124267578125, 0.4787445068359375, -0.4497489929199219, -0.63592529296875, 0.1742095947265625, 0.03326606750488281, -1.03021240234375, -0.10828781127929688, -0.272216796875, 0.2646331787109375, 0.4427642822265625, -0.251953125, 0.52008056640625, -0.4035797119140625, 1.3448028564453125, -1.199554443359375, 0.98016357421875, -0.11238670349121094], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000065.npy"}
{"epoch": 0.0982615268329554, "step": 66, "batch_size": 64, "mean": 0.20094043016433716, "std": 0.616756796836853, "min": -2.2700653076171875, "p10": -0.45004386901855464, "median": 0.16564178466796875, "p90": 0.7830352783203125, "max": 2.1712188720703125, "pos_frac": 0.71875, "sample": [-0.03494453430175781, 2.1712188720703125, 0.39672088623046875, 0.21602439880371094, 0.1429615020751953, 0.5090789794921875, -0.2279987335205078, 0.581939697265625, -0.07412147521972656, 0.39609527587890625, 0.07471847534179688, -0.7816848754882812, 0.4255218505859375, -0.4033699035644531, 0.103912353515625, 0.4992523193359375, 0.5350570678710938, 0.2243194580078125, 0.43027496337890625, 1.1460418701171875, 0.603759765625, 1.2895889282226562, -0.5375442504882812, 0.13022994995117188, -0.3206214904785156, 0.7757453918457031, 0.4208393096923828, 0.7861595153808594, 1.1520919799804688, -0.4700469970703125, 0.5163497924804688, 0.14918136596679688, -0.10640716552734375, 0.0053558349609375, 0.31790733337402344, 0.5796699523925781, 0.041473388671875, -0.09552383422851562, 0.1450347900390625, 0.39447784423828125, -2.2700653076171875, -0.8725166320800781, 0.36240386962890625, 0.44525909423828125, 0.5549468994140625, 0.1343994140625, 0.02426910400390625, 0.18210220336914062, 0.378692626953125, 0.5958938598632812, 0.030649185180664062, 0.00128936767578125, -0.1634063720703125, 0.3529052734375, 0.5134963989257812, -0.6872024536132812, -0.296295166015625, 0.12799072265625, -0.0343780517578125, 1.4885711669921875, 0.870391845703125, -0.2822151184082031, 0.1038360595703125, -0.8095703125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000066.npy"}
{"epoch": 0.09977324263038549, "step": 67, "batch_size": 64, "mean": 0.18860608339309692, "std": 0.46832337975502014, "min": -0.54534912109375, "p10": -0.2847291946411133, "median": 0.08477210998535156, "p90": 0.7995079040527344, "max": 1.684783935546875, "pos_frac": 0.59375, "sample": [-0.42954254150390625, -0.1656036376953125, -0.367095947265625, -0.20647621154785156, -0.0430145263671875, 0.24187088012695312, 0.7764205932617188, 0.48833656311035156, 0.563995361328125, 0.286529541015625, 1.684783935546875, -0.056690216064453125, -0.0673980712890625, 0.08925628662109375, 0.1692962646484375, -0.29002952575683594, -0.0201873779296875, 0.46204376220703125, -0.19982147216796875, -0.14481353759765625, -0.1071624755859375, 0.4054603576660156, 0.032440185546875, 1.2405929565429688, -0.1611499786376953, -0.54534912109375, 0.9822502136230469, 0.5346832275390625, -0.13091278076171875, 0.06541061401367188, -0.406341552734375, -0.12061691284179688, -0.133087158203125, 0.6308975219726562, 0.32009124755859375, 0.3010997772216797, 0.08028793334960938, 0.5438270568847656, -0.19696807861328125, 0.5885162353515625, 0.1845245361328125, 0.077606201171875, 0.197662353515625, 0.8094024658203125, 0.1996002197265625, 0.00328826904296875, 0.208282470703125, -0.3020153045654297, 0.8558120727539062, 0.20786285400390625, 0.1216278076171875, 1.462890625, -0.20751571655273438, 0.1289520263671875, -0.0316009521484375, -0.4289093017578125, 0.02732086181640625, 0.11328125, -0.2496185302734375, -0.27236175537109375, 0.7662162780761719, -0.132965087890625, 1.274383544921875, 0.3612327575683594], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000067.npy"}
{"epoch": 0.10128495842781557, "step": 68, "batch_size": 64, "mean": 0.18833568692207336, "std": 0.5906465649604797, "min": -1.219024658203125, "p10": -0.4318008422851562, "median": 0.171295166015625, "p90": 0.9750656127929689, "max": 1.6233978271484375, "pos_frac": 0.578125, "sample": [1.082794189453125, 0.7232818603515625, -0.026515960693359375, -0.306854248046875, -0.905548095703125, 1.6233978271484375, -0.34348297119140625, 0.4802665710449219, 1.1616630554199219, -0.586944580078125, 0.9114303588867188, -1.219024658203125, 0.3824920654296875, 0.27685546875, -0.1439971923828125, 0.5886001586914062, 0.9875259399414062, -0.0567474365234375, 0.21636199951171875, 0.450042724609375, 1.0600814819335938, -0.3559074401855469, -0.7144317626953125, 0.7425765991210938, -0.16591262817382812, 0.1654815673828125, -0.342681884765625, -0.10915756225585938, 1.2880477905273438, 0.07073974609375, -0.02745819091796875, 0.4293537139892578, 0.358306884765625, 0.9459915161132812, -0.025791168212890625, 0.8968887329101562, 1.07244873046875, 0.5317459106445312, 0.46968841552734375, 0.2567291259765625, -0.913543701171875, -1.1253204345703125, -0.1469879150390625, 0.7975311279296875, -0.3932952880859375, 0.36060333251953125, 0.083892822265625, -0.19232940673828125, -0.29931640625, -0.44830322265625, 0.2346343994140625, 0.00605010986328125, 0.4147796630859375, 0.8292083740234375, 0.582611083984375, -0.0695343017578125, -0.0894775390625, -0.05075836181640625, 0.359466552734375, 0.06756210327148438, 0.55914306640625, -0.3490447998046875, -0.18353271484375, 0.1771087646484375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000068.npy"}
{"epoch": 0.10279667422524566, "step": 69, "batch_size": 64, "mean": 0.25942692160606384, "std": 0.5868950486183167, "min": -1.7499237060546875, "p10": -0.4157665252685547, "median": 0.2508735656738281, "p90": 0.8631347656250001, "max": 1.9273681640625, "pos_frac": 0.703125, "sample": [0.20826339721679688, -0.1456298828125, 0.8730659484863281, -0.4100494384765625, 0.19440460205078125, 0.36812591552734375, 0.6038551330566406, -0.146148681640625, -0.132720947265625, 0.6277236938476562, -0.4582481384277344, 0.04518890380859375, 0.24292373657226562, 0.6971969604492188, 0.24908447265625, 0.23226165771484375, 0.43541717529296875, 1.1558380126953125, 0.7448883056640625, -0.6049957275390625, 0.719573974609375, -0.40077972412109375, 0.219451904296875, 0.9260215759277344, 0.781585693359375, 0.4099884033203125, 0.4993896484375, -0.17669677734375, 0.616943359375, -0.0444793701171875, 0.12801742553710938, 0.5, 0.3450889587402344, 0.741851806640625, 0.6541252136230469, -1.7499237060546875, 1.7401657104492188, 0.09492874145507812, 0.299957275390625, -0.4341583251953125, -0.363983154296875, 0.27809715270996094, -0.351470947265625, -0.13031005859375, 1.0073089599609375, -0.41416168212890625, 1.3566017150878906, 0.21609878540039062, 0.0434112548828125, -0.27447509765625, 0.25266265869140625, 0.5307216644287109, -0.4164543151855469, -0.6809043884277344, 0.8399620056152344, 0.3670806884765625, 0.589324951171875, 0.5785789489746094, 0.45340728759765625, -0.4829750061035156, 1.9273681640625, 0.27913665771484375, 0.14910888671875, 0.1976909637451172], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000069.npy"}
{"epoch": 0.10430839002267574, "step": 70, "batch_size": 64, "mean": 0.21499556303024292, "std": 0.7080262303352356, "min": -1.417816162109375, "p10": -0.5360164642333984, "median": 0.06749820709228516, "p90": 1.1805915832519538, "max": 2.1089324951171875, "pos_frac": 0.515625, "sample": [-0.414337158203125, -0.13600921630859375, 0.9076156616210938, -1.417816162109375, -0.11939620971679688, -0.1264934539794922, -0.40709686279296875, 0.33429718017578125, 0.6924362182617188, -0.5371665954589844, 1.2604598999023438, -0.3739509582519531, -0.184326171875, 0.994232177734375, 0.0001430511474609375, 0.24298667907714844, 0.21384811401367188, -0.06732940673828125, -0.02217864990234375, -0.24777793884277344, 0.8584213256835938, 0.6255645751953125, 0.8509635925292969, 0.14867019653320312, -0.9839706420898438, 0.823150634765625, -0.0657958984375, 1.5169754028320312, 0.27184295654296875, 1.902862548828125, -0.11818313598632812, -0.3443450927734375, -0.5449752807617188, -0.34175872802734375, -0.689971923828125, 2.1089324951171875, 0.15681838989257812, -1.1241912841796875, 0.2279949188232422, 0.2273406982421875, -0.0179901123046875, 0.37297821044921875, -0.35772705078125, 1.7158203125, 0.7358932495117188, 1.3563766479492188, 0.15122222900390625, 0.253173828125, -0.18342208862304688, 0.6876029968261719, 0.5241775512695312, 1.5354537963867188, -0.5333328247070312, -0.4315185546875, -0.020893096923828125, -0.0678558349609375, -0.00737762451171875, -0.5643577575683594, 0.796966552734375, 0.9411067962646484, -0.036651611328125, 0.13485336303710938, -0.018459320068359375, 0.6951904296875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000070.npy"}
{"epoch": 0.10582010582010581, "step": 71, "batch_size": 64, "mean": 0.11634287238121033, "std": 0.6330570578575134, "min": -1.3487548828125, "p10": -0.82747802734375, "median": 0.2069377899169922, "p90": 0.8342723846435548, "max": 1.14947509765625, "pos_frac": 0.609375, "sample": [0.10675430297851562, 0.611541748046875, 0.6067848205566406, 0.8107872009277344, 0.5957603454589844, 0.1724090576171875, 0.6073436737060547, -0.5623626708984375, 0.564208984375, 1.14947509765625, -0.04525184631347656, -0.899200439453125, -0.24657440185546875, -0.6242218017578125, -0.8633346557617188, 0.21359634399414062, -0.6846160888671875, 0.30310821533203125, 0.7402420043945312, 0.1435089111328125, -0.21722412109375, 0.6237335205078125, 0.31510162353515625, -0.6463165283203125, 0.39645957946777344, 0.7136993408203125, 0.6313629150390625, 0.3587532043457031, 0.09157562255859375, -0.6560440063476562, -1.3487548828125, -0.7807159423828125, -0.8475189208984375, -0.2548637390136719, -0.06427764892578125, -0.5187530517578125, 0.5059051513671875, 0.5386505126953125, 0.8843345642089844, -0.8477325439453125, -0.0782318115234375, -1.2468414306640625, 0.755401611328125, 0.781585693359375, -0.6845779418945312, -0.19651412963867188, -0.124420166015625, 0.111663818359375, 0.32070159912109375, 0.23403358459472656, 0.8443374633789062, 0.9793128967285156, -0.9671707153320312, 0.95184326171875, 0.46985626220703125, 0.4161376953125, 0.5947303771972656, -0.07763671875, 0.20027923583984375, 0.7700080871582031, 1.0850563049316406, 0.1420269012451172, 1.1273956298828125, -0.5403671264648438], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000071.npy"}
{"epoch": 0.1073318216175359, "step": 72, "batch_size": 64, "mean": 0.23683109879493713, "std": 0.6620417833328247, "min": -1.567626953125, "p10": -0.46623611450195307, "median": 0.1990528106689453, "p90": 0.9692398071289064, "max": 1.9905014038085938, "pos_frac": 0.625, "sample": [-0.7177581787109375, 1.5662841796875, 0.6487350463867188, 0.50384521484375, -0.5752105712890625, 0.18374252319335938, 0.08864021301269531, 1.8695411682128906, 0.003467559814453125, 0.7054710388183594, 0.3304176330566406, -0.06514739990234375, 0.312103271484375, 0.0063343048095703125, 0.5543594360351562, -0.07357597351074219, 0.5797901153564453, -0.34792327880859375, 0.05696868896484375, -0.2910308837890625, 0.412445068359375, 0.9113426208496094, 0.4282073974609375, 0.21436309814453125, 0.4467315673828125, 0.9860992431640625, 0.11891746520996094, -0.266937255859375, -1.01507568359375, 0.007049560546875, 1.44927978515625, 0.6222286224365234, -0.41066741943359375, -0.2453155517578125, -0.40317535400390625, -0.2316436767578125, 0.4857940673828125, -0.001129150390625, -0.25399017333984375, 0.267303466796875, 0.8016319274902344, 0.7443218231201172, 0.5780487060546875, -0.3000526428222656, -0.12335205078125, 0.32462310791015625, 0.929901123046875, -0.2734222412109375, 0.6966552734375, 0.291473388671875, -1.567626953125, -0.697845458984375, 1.9905014038085938, -0.49005126953125, 1.5665435791015625, -0.4903144836425781, 0.2979583740234375, 0.5305023193359375, -0.3604888916015625, -0.09717941284179688, 0.8932876586914062, -0.164581298828125, 0.07158660888671875, 1.1441879272460938], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000072.npy"}
{"epoch": 0.10884353741496598, "step": 73, "batch_size": 64, "mean": 0.021545469760894775, "std": 0.7188830375671387, "min": -1.5119400024414062, "p10": -0.7290702819824219, "median": -0.10976314544677734, "p90": 0.9744834899902347, "max": 2.08026123046875, "pos_frac": 0.453125, "sample": [0.52850341796875, 0.005340576171875, 1.207855224609375, -0.96697998046875, -0.6321296691894531, -1.5119400024414062, -0.5527267456054688, -0.3159980773925781, -0.423828125, 0.8995285034179688, -0.5514068603515625, 0.34716796875, 0.8006019592285156, -0.6038246154785156, -0.609100341796875, 0.4863700866699219, -0.4472808837890625, -0.7169952392578125, -0.18312835693359375, 0.7832260131835938, -0.0206298828125, -0.34256744384765625, 0.1413116455078125, -0.2547760009765625, 2.08026123046875, 1.0066070556640625, -0.7342453002929688, -0.5045738220214844, 0.041698455810546875, 0.17721176147460938, 0.2588844299316406, -0.24940109252929688, 0.3413352966308594, -0.7762908935546875, -0.41619873046875, 0.44939422607421875, 0.7106590270996094, 0.62359619140625, -0.5676155090332031, -0.26554107666015625, -0.828125, 1.62744140625, -0.765472412109375, -0.29978179931640625, -0.6237945556640625, 1.6729583740234375, 0.19866943359375, -0.05435943603515625, 0.31337738037109375, -0.530517578125, 0.39715576171875, 0.37725830078125, -0.06507682800292969, -0.47847747802734375, -0.154449462890625, 0.5026779174804688, -0.6900634765625, 0.6239776611328125, -0.26734161376953125, -0.991241455078125, 0.09212875366210938, 1.3235321044921875, 1.29364013671875, -0.5375804901123047], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000073.npy"}
{"epoch": 0.11035525321239607, "step": 74, "batch_size": 64, "mean": 0.2516631782054901, "std": 0.6504147052764893, "min": -1.76580810546875, "p10": -0.4560020446777343, "median": 0.28249645233154297, "p90": 1.0494979858398439, "max": 1.33929443359375, "pos_frac": 0.65625, "sample": [-0.1289825439453125, 0.33303070068359375, -1.6024322509765625, 0.9044342041015625, -0.41748046875, -0.08306121826171875, 1.0017547607421875, 1.275054931640625, 0.5101966857910156, 0.92034912109375, 0.2652397155761719, 1.1165847778320312, 0.584442138671875, 0.38883209228515625, -0.060474395751953125, 0.7148971557617188, -0.341644287109375, 0.743865966796875, -0.3176746368408203, -0.69915771484375, 1.0543136596679688, 0.02249908447265625, -0.04610443115234375, 0.5992317199707031, 0.5039215087890625, -0.4133491516113281, 0.018939971923828125, 0.47560882568359375, 1.26800537109375, -0.7592544555664062, 0.0728302001953125, 0.6739349365234375, 0.5500564575195312, -0.47251129150390625, -0.7957115173339844, 0.9105567932128906, -0.3188972473144531, -0.35684967041015625, -0.5899658203125, 0.11496353149414062, 0.5364532470703125, -0.2948284149169922, 0.4869117736816406, 0.5073089599609375, 0.14359283447265625, 1.2343292236328125, 1.33929443359375, 0.29975318908691406, 1.0305709838867188, -0.111358642578125, -1.76580810546875, -0.12548446655273438, -0.035247802734375, -0.0990753173828125, 0.4669303894042969, 0.1970062255859375, 0.7084121704101562, 0.096221923828125, 0.05377960205078125, 0.8069839477539062, 1.0810623168945312, 1.0382614135742188, 0.14301300048828125, 0.7483673095703125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000074.npy"}
{"epoch": 0.11186696900982615, "step": 75, "batch_size": 64, "mean": 0.2583045959472656, "std": 0.7815049886703491, "min": -2.4210281372070312, "p10": -0.4076782226562499, "median": 0.2311868667602539, "p90": 1.0436031341552736, "max": 2.801513671875, "pos_frac": 0.65625, "sample": [-0.10805130004882812, 0.295257568359375, 0.6712989807128906, 0.4794120788574219, 1.6942367553710938, 0.31534576416015625, 0.37734222412109375, -0.08371734619140625, 0.5157318115234375, 0.9428901672363281, -0.48919677734375, -0.8417778015136719, -0.09251785278320312, 0.29105377197265625, 0.11519622802734375, 0.9847259521484375, 0.7208251953125, -0.24808883666992188, 0.021615982055664062, -0.4447822570800781, 2.801513671875, 0.7557640075683594, 1.1130905151367188, 0.9495010375976562, 0.1766510009765625, -0.1446380615234375, 0.19873046875, -1.767242431640625, 0.63446044921875, -0.3211021423339844, 0.5126190185546875, 1.0688362121582031, -0.8133392333984375, 0.6312751770019531, 0.4596099853515625, -0.1714038848876953, 2.2446136474609375, 0.04219818115234375, 0.2700958251953125, 0.0503692626953125, 0.01975250244140625, 0.4901695251464844, -2.4210281372070312, 0.28102874755859375, 0.4914703369140625, 0.00461578369140625, -0.2563934326171875, 0.1036529541015625, -0.0451812744140625, -0.1528167724609375, 0.23085594177246094, -0.20072174072265625, -0.04579925537109375, -0.22936248779296875, 0.7313385009765625, -0.31491851806640625, 1.475555419921875, 0.4101390838623047, -0.5848884582519531, 0.35646820068359375, 0.23151779174804688, -0.06049346923828125, 1.729522705078125, 0.478607177734375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000075.npy"}
{"epoch": 0.11337868480725624, "step": 76, "batch_size": 64, "mean": 0.2676779627799988, "std": 0.7423666715621948, "min": -1.4982528686523438, "p10": -0.659893226623535, "median": 0.1456451416015625, "p90": 1.1669942855834963, "max": 1.8912582397460938, "pos_frac": 0.65625, "sample": [-0.090301513671875, 0.01905059814453125, -0.8145751953125, 0.1856842041015625, 0.64501953125, -0.5090980529785156, -0.19131088256835938, 0.0680999755859375, 0.5950584411621094, 0.75006103515625, 1.1290855407714844, 1.081207275390625, -0.1378631591796875, -0.23809051513671875, 1.0283966064453125, -1.260955810546875, -0.0596466064453125, 0.5101318359375, 0.156951904296875, 0.053760528564453125, 0.0956268310546875, 0.6138839721679688, -1.04351806640625, 0.9390106201171875, 1.7360687255859375, 0.0595550537109375, 0.16726112365722656, 0.8279037475585938, -0.3858489990234375, 0.18459701538085938, -0.1498260498046875, -0.003002166748046875, -0.09188461303710938, -1.3290252685546875, -0.3379707336425781, 0.0704498291015625, 1.7667999267578125, 1.64337158203125, -0.3206062316894531, 0.2289276123046875, 0.3143310546875, 0.5967521667480469, -0.498931884765625, -0.0121917724609375, 1.2541885375976562, -0.15665817260742188, 0.39530181884765625, -1.4982528686523438, 1.1251449584960938, 0.41399383544921875, 0.7186965942382812, 0.6025390625, 0.826171875, 1.8912582397460938, 1.0621795654296875, 0.0997314453125, 0.11147117614746094, 1.1832408905029297, -0.7893524169921875, 1.0499420166015625, 1.336212158203125, 0.13433837890625, 0.103363037109375, -0.7245197296142578], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000076.npy"}
{"epoch": 0.11489040060468632, "step": 77, "batch_size": 64, "mean": 0.3807138502597809, "std": 0.7726268768310547, "min": -1.4603958129882812, "p10": -0.4301429748535156, "median": 0.2723388671875, "p90": 1.5748115539550782, "max": 2.3008193969726562, "pos_frac": 0.640625, "sample": [-0.14571762084960938, 0.3381805419921875, 0.4602317810058594, -0.7491531372070312, 0.9333477020263672, 0.46429443359375, 0.01752471923828125, -0.12276077270507812, -1.4603958129882812, -0.7668495178222656, 0.7425155639648438, -0.07454299926757812, -0.4449310302734375, -0.222747802734375, 0.313720703125, 2.3008193969726562, 1.2192535400390625, 0.7459716796875, 0.48282623291015625, 2.1409835815429688, -0.39563751220703125, 0.14107131958007812, 1.5749740600585938, 1.2493515014648438, 0.6663818359375, -0.2588310241699219, 0.27294921875, 0.9804229736328125, -0.023174285888671875, -0.0754852294921875, -0.3910713195800781, -0.09972381591796875, 0.380218505859375, 1.9245452880859375, 0.271728515625, 0.805633544921875, 1.2952880859375, -0.23341751098632812, -0.18675804138183594, 1.9925308227539062, 0.5509109497070312, -0.16091156005859375, 0.7189369201660156, -0.8231925964355469, -0.4465217590332031, 1.574432373046875, 0.13352203369140625, -0.38724327087402344, 0.5317764282226562, 0.6696052551269531, 0.6475067138671875, 0.1439666748046875, 0.2757072448730469, 0.94952392578125, 1.6740264892578125, 0.19364166259765625, -0.7153396606445312, -0.07375335693359375, 1.7685928344726562, 0.24144744873046875, 0.54510498046875, 0.2537841796875, -0.09703445434570312, 0.13362884521484375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000077.npy"}
{"epoch": 0.1164021164021164, "step": 78, "batch_size": 64, "mean": 0.3467490077018738, "std": 0.7690283060073853, "min": -0.9353790283203125, "p10": -0.5747772216796875, "median": 0.29079627990722656, "p90": 1.4072711944580079, "max": 2.27734375, "pos_frac": 0.671875, "sample": [0.894287109375, 1.742095947265625, 0.2574615478515625, 1.1005172729492188, 0.7967987060546875, 0.756866455078125, 1.4103050231933594, -0.42543792724609375, -0.17960739135742188, 1.010986328125, 0.42156982421875, 0.022430419921875, 0.10987663269042969, 1.6031494140625, 0.7797393798828125, 0.01145172119140625, -0.9263992309570312, 1.7620925903320312, 0.19207763671875, -0.0360107421875, -0.4918670654296875, -0.25374603271484375, -0.3934173583984375, 0.12509536743164062, -0.5306243896484375, 0.16797637939453125, 0.399139404296875, -0.7280120849609375, 1.05450439453125, -0.9353790283203125, 1.1655120849609375, 0.25023651123046875, -0.3362617492675781, 0.7721672058105469, 2.27734375, 0.0677032470703125, -0.8241500854492188, 0.04345703125, 0.7970008850097656, 0.9879722595214844, -0.2742156982421875, 0.3241310119628906, 1.7727203369140625, 0.32587432861328125, 0.33736419677734375, 0.34299468994140625, -0.8851966857910156, -0.3431282043457031, 0.6284713745117188, 0.5765151977539062, -0.07523918151855469, 1.4001922607421875, 0.01799774169921875, -0.8472442626953125, 1.379241943359375, 0.402984619140625, -0.35086822509765625, -0.5923233032226562, 0.6019802093505859, 0.9793701171875, -0.5338363647460938, 0.5500621795654297, 1.7562599182128906, -0.22107315063476562], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000078.npy"}
{"epoch": 0.11791383219954649, "step": 79, "batch_size": 64, "mean": 0.41709187626838684, "std": 0.950299859046936, "min": -1.7223434448242188, "p10": -0.43187694549560546, "median": 0.22289085388183594, "p90": 1.43421630859375, "max": 3.2451248168945312, "pos_frac": 0.6875, "sample": [0.24487686157226562, -0.43308258056640625, -1.5427398681640625, 0.7231025695800781, -0.10672760009765625, 0.9007797241210938, -0.3232421875, -0.4069023132324219, -0.355377197265625, -0.4021759033203125, -0.4290637969970703, -0.39035797119140625, 0.4501533508300781, 3.2451248168945312, -0.43889617919921875, 0.20090484619140625, 0.8990020751953125, 0.5184402465820312, -0.31219482421875, 0.11130523681640625, 0.6692543029785156, 2.95147705078125, -0.3725433349609375, 0.282379150390625, 0.8366508483886719, -0.09486770629882812, 2.5552215576171875, 1.4367523193359375, 0.7715911865234375, 1.6635284423828125, -1.7223434448242188, 0.19131851196289062, 0.7691574096679688, 0.8755264282226562, 0.6238555908203125, 0.11227989196777344, 0.06892013549804688, 0.7149848937988281, -0.08849334716796875, 0.3683624267578125, -0.9704742431640625, 0.884307861328125, 0.9910850524902344, 0.15595436096191406, 0.06520843505859375, 0.16466331481933594, 1.4282989501953125, 0.11420631408691406, 0.6395092010498047, -0.08543014526367188, 0.6155452728271484, -0.6992645263671875, 0.80657958984375, 1.032958984375, 0.781982421875, 3.221588134765625, -0.6063919067382812, 0.856201171875, -0.4068145751953125, 0.47955322265625, 0.125946044921875, 0.16039276123046875, 2.0188751220703125, 0.1534576416015625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000079.npy"}
{"epoch": 0.11942554799697656, "step": 80, "batch_size": 64, "mean": 0.290639191865921, "std": 0.9694415330886841, "min": -1.58038330078125, "p10": -0.9313846588134764, "median": 0.2748279571533203, "p90": 1.4642852783203129, "max": 3.27447509765625, "pos_frac": 0.640625, "sample": [0.4412879943847656, 1.0588340759277344, -0.8104476928710938, 0.67193603515625, 1.2171783447265625, 0.2912139892578125, -0.20854949951171875, -0.37558746337890625, -0.7744789123535156, -1.5220718383789062, 0.2708625793457031, 2.905609130859375, 0.31005859375, -0.00479888916015625, 0.5691757202148438, 1.4929962158203125, 0.1956787109375, -1.1156082153320312, 0.5853958129882812, 0.8818016052246094, 0.305908203125, 0.2215576171875, -0.7707500457763672, 1.6716461181640625, 0.6037139892578125, 1.0740299224853516, -0.195343017578125, 0.2419281005859375, 1.518463134765625, 0.660308837890625, -0.7522773742675781, 0.39319610595703125, 1.1119804382324219, -0.5141620635986328, 0.2787933349609375, 1.723602294921875, -0.2741851806640625, 1.0364837646484375, 1.1778945922851562, -0.8505859375, 0.7100372314453125, 1.1135787963867188, -0.8514175415039062, 1.3972930908203125, -0.3891925811767578, 0.6220855712890625, -1.58038330078125, 0.13678550720214844, -1.003265380859375, -0.5279197692871094, 3.27447509765625, 0.049072265625, 0.075042724609375, -0.260101318359375, 0.11063957214355469, -0.9656562805175781, 1.0598068237304688, -0.11941719055175781, 0.3318290710449219, -1.1544876098632812, 0.09255218505859375, -1.0229339599609375, 1.8941650390625, 0.865631103515625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000080.npy"}
{"epoch": 0.12093726379440665, "step": 81, "batch_size": 64, "mean": 0.3502560555934906, "std": 0.9169726967811584, "min": -1.880950927734375, "p10": -0.658694839477539, "median": 0.28601646423339844, "p90": 1.442718887329102, "max": 3.0143585205078125, "pos_frac": 0.65625, "sample": [0.09523200988769531, 0.7686424255371094, 0.6229438781738281, -0.6792793273925781, 0.9537734985351562, 0.398406982421875, 0.056396484375, 0.5630340576171875, -0.2411041259765625, -0.33908843994140625, 0.9213104248046875, -1.2516708374023438, -1.880950927734375, 0.298980712890625, 0.2549457550048828, 1.4826278686523438, -0.03248023986816406, -0.41025543212890625, 0.9751358032226562, 1.097442626953125, 0.2730522155761719, -0.5754718780517578, 0.4188690185546875, -0.7403564453125, 1.54119873046875, 0.8970413208007812, -0.5560760498046875, 2.5180702209472656, 0.404449462890625, 0.267242431640625, 0.23829269409179688, -1.359619140625, 1.9557418823242188, -0.20170974731445312, 0.30133056640625, 0.693939208984375, 0.7450408935546875, 1.1478271484375, 0.0071258544921875, 3.0143585205078125, -0.4630584716796875, 0.6625175476074219, -0.45428466796875, -0.21826934814453125, -0.448822021484375, 0.21825599670410156, 1.0772724151611328, 0.7569923400878906, 1.7552947998046875, 1.8563995361328125, 1.3495979309082031, -0.36959075927734375, 1.27020263671875, -0.6106643676757812, 0.0295562744140625, -0.16967010498046875, -1.2227859497070312, 0.4040679931640625, -0.1239776611328125, 1.2521095275878906, 0.022432327270507812, 0.9670448303222656, 1.0984344482421875, -0.8670578002929688], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000081.npy"}
{"epoch": 0.12244897959183673, "step": 82, "batch_size": 64, "mean": 0.5605196952819824, "std": 0.9824343919754028, "min": -1.73583984375, "p10": -0.5312793731689454, "median": 0.4590568542480469, "p90": 1.6933807373046879, "max": 3.522735595703125, "pos_frac": 0.75, "sample": [-0.00312042236328125, 0.3002510070800781, 0.15013504028320312, 0.12052536010742188, 0.6421432495117188, 0.08796501159667969, 1.3003253936767578, 1.0064506530761719, 1.972503662109375, 1.3008499145507812, 0.9663810729980469, 0.5469932556152344, 3.04949951171875, 1.046478271484375, 0.72607421875, 0.627899169921875, 0.7559967041015625, -0.5283927917480469, 1.3584709167480469, 1.2517929077148438, 0.4336128234863281, 0.28783416748046875, 0.2811126708984375, -0.06720352172851562, 0.1283111572265625, 2.0395584106445312, -0.08134078979492188, 0.44376373291015625, 0.8587455749511719, 0.1622447967529297, -0.4682941436767578, 0.5880546569824219, 3.522735595703125, -0.8731765747070312, 1.7213821411132812, -1.73583984375, 1.6138992309570312, 1.108551025390625, 0.03812408447265625, 2.2091331481933594, 0.05579376220703125, -0.5493392944335938, -0.1239013671875, 0.047695159912109375, 0.4743499755859375, 0.4019584655761719, 0.7835884094238281, -0.2161388397216797, 0.8584823608398438, 0.2839813232421875, -1.302520751953125, 0.5170402526855469, 0.606231689453125, 3.16900634765625, 1.058990478515625, -0.49526214599609375, 1.6280441284179688, 0.64642333984375, 1.2372512817382812, 0.22788429260253906, -0.735137939453125, -0.9429168701171875, -0.5325164794921875, -0.11615753173828125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000082.npy"}
{"epoch": 0.12396069538926682, "step": 83, "batch_size": 64, "mean": 0.3971697986125946, "std": 0.8984332084655762, "min": -1.8254776000976562, "p10": -0.6609565734863281, "median": 0.3318300247192383, "p90": 1.5049957275390629, "max": 3.1359710693359375, "pos_frac": 0.703125, "sample": [2.1110076904296875, 0.32904052734375, -1.4604911804199219, 0.33461952209472656, 0.4084014892578125, 0.7916488647460938, 1.013427734375, 0.3135833740234375, 0.0438232421875, 1.533203125, 0.3876953125, 0.25152587890625, -0.6685714721679688, 0.3603668212890625, 0.8956985473632812, 0.2832145690917969, 1.3211212158203125, 2.2197647094726562, 0.2339324951171875, -0.6431884765625, -0.11229705810546875, -0.3669776916503906, 1.1401748657226562, 0.7749881744384766, -0.7758636474609375, -0.0615997314453125, 2.267486572265625, 1.254669189453125, -0.14258193969726562, -0.44858741760253906, -0.7631072998046875, -0.0014858245849609375, 0.046581268310546875, -1.8254776000976562, 0.3938140869140625, 0.6386833190917969, 0.4961090087890625, 1.6914749145507812, -0.18434906005859375, -0.5711822509765625, 0.43206024169921875, -0.22660064697265625, 1.4109878540039062, 0.047374725341796875, 0.17789077758789062, -0.4591083526611328, 0.5817222595214844, 0.5415802001953125, 0.7779312133789062, 1.439178466796875, 3.1359710693359375, 0.7378559112548828, 1.1416969299316406, 0.7226028442382812, 0.16522979736328125, 1.0323314666748047, 0.3823833465576172, 0.2908191680908203, 1.5704727172851562, -1.28472900390625, 0.1314239501953125, -0.2379302978515625, 0.26424407958984375, -0.8668174743652344], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000083.npy"}
{"epoch": 0.1254724111866969, "step": 84, "batch_size": 64, "mean": 0.48253798484802246, "std": 0.9765664339065552, "min": -2.00970458984375, "p10": -0.7694274902343748, "median": 0.5083408355712891, "p90": 1.61606559753418, "max": 2.6135711669921875, "pos_frac": 0.671875, "sample": [0.4598388671875, 0.7939071655273438, 0.4370079040527344, 0.8751029968261719, 2.5368728637695312, 1.1210441589355469, 1.2880172729492188, 0.03246307373046875, 0.5659523010253906, -1.16326904296875, 0.422821044921875, 0.7864933013916016, 0.9936866760253906, -0.5426025390625, 1.239990234375, 1.5149154663085938, -0.049404144287109375, -1.1928939819335938, 2.2680435180664062, 0.4739265441894531, -0.0247955322265625, 0.52838134765625, -0.09378814697265625, 0.9745140075683594, 1.168813705444336, -1.155426025390625, -1.2745933532714844, 1.3191375732421875, -0.17944717407226562, 1.55230712890625, 0.118316650390625, 0.1724395751953125, 0.15533447265625, 1.7989349365234375, -0.24309539794921875, 1.0354804992675781, 0.9407768249511719, 0.7841529846191406, 0.4883003234863281, -0.853912353515625, -2.00970458984375, 2.6135711669921875, -0.3087615966796875, -0.249786376953125, -0.29978179931640625, -0.14725494384765625, -0.1310577392578125, 0.6467361450195312, 0.6174697875976562, -0.572296142578125, -0.36367034912109375, 1.530517578125, 0.7749481201171875, 0.5792770385742188, 0.9257087707519531, 1.6433906555175781, 2.2925567626953125, 0.040508270263671875, 0.15845489501953125, -1.0675506591796875, 1.1377105712890625, 0.9051666259765625, -0.3287620544433594, 2.421295166015625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000084.npy"}
{"epoch": 0.12698412698412698, "step": 85, "batch_size": 64, "mean": 0.5283865928649902, "std": 0.8956908583641052, "min": -1.9920806884765625, "p10": -0.8498447418212891, "median": 0.534210205078125, "p90": 1.7838626861572267, "max": 2.3202972412109375, "pos_frac": 0.75, "sample": [0.8943099975585938, -0.8543319702148438, 0.8052711486816406, 0.5908393859863281, 0.21380615234375, 1.2283401489257812, 0.672637939453125, 0.4370708465576172, 0.00333404541015625, -0.09252166748046875, 1.7268905639648438, -0.999908447265625, 1.7406654357910156, -1.9920806884765625, 0.6627864837646484, 0.044158935546875, 1.2713127136230469, 1.6150436401367188, 1.8227462768554688, 1.4940261840820312, -0.87200927734375, -0.05956268310546875, -0.8610382080078125, 0.6149177551269531, 1.0667266845703125, 1.8197860717773438, 0.3391609191894531, 0.9670448303222656, 1.9264602661132812, 0.22786521911621094, -0.125823974609375, 0.6448440551757812, 0.111053466796875, 1.45806884765625, 0.3198394775390625, 0.03768157958984375, 1.8023757934570312, 0.9861907958984375, 0.4701957702636719, 0.24630355834960938, -0.8393745422363281, 0.85369873046875, 1.3730850219726562, 2.0144577026367188, 2.3202972412109375, 0.7266006469726562, 0.8494873046875, -0.02985382080078125, 0.4775810241699219, 1.2050971984863281, 0.3744316101074219, -0.4942626953125, 0.13066482543945312, -0.319671630859375, 0.6773300170898438, -0.04775238037109375, -0.9227218627929688, -0.8663482666015625, 0.9785957336425781, 0.8097000122070312, 2.1012954711914062, -0.638519287109375, 0.21460723876953125, 0.4638404846191406], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000085.npy"}
{"epoch": 0.12849584278155707, "step": 86, "batch_size": 64, "mean": 0.33474162220954895, "std": 0.8201056718826294, "min": -1.4424285888671875, "p10": -0.7488143920898437, "median": 0.2772178649902344, "p90": 1.3737785339355468, "max": 2.83184814453125, "pos_frac": 0.65625, "sample": [1.4402084350585938, -0.1289825439453125, -0.08880615234375, -0.03531646728515625, 1.4078865051269531, 0.3334083557128906, -0.646881103515625, -0.4310302734375, -0.7690887451171875, -0.08710098266601562, 0.28211212158203125, -0.6358757019042969, 1.410064697265625, 0.489105224609375, -0.8266754150390625, 0.98675537109375, 0.7098731994628906, -0.8053550720214844, -0.9890823364257812, 0.0697784423828125, 2.83184814453125, 0.36440277099609375, 0.976043701171875, 1.2871589660644531, 1.374847412109375, 0.34023284912109375, -0.701507568359375, 0.13299560546875, 0.535064697265625, 0.4635448455810547, 1.0233306884765625, 1.6757965087890625, 0.13980865478515625, -1.4424285888671875, 0.8585052490234375, 1.0927505493164062, -0.296295166015625, 0.11899948120117188, 0.78045654296875, 0.356292724609375, -1.1043319702148438, 1.2339305877685547, 1.0005645751953125, 0.03298187255859375, 0.0697784423828125, 0.05478668212890625, 1.3084659576416016, 0.01216888427734375, 1.3712844848632812, -0.4057579040527344, 1.73968505859375, 0.2723236083984375, 0.0919952392578125, -0.079620361328125, 0.8897247314453125, -0.202423095703125, -0.8497352600097656, 1.2795333862304688, -0.2300262451171875, -0.2762184143066406, 0.29388427734375, -0.41417694091796875, 1.0902633666992188, 0.6475372314453125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000086.npy"}
{"epoch": 0.13000755857898716, "step": 87, "batch_size": 64, "mean": 0.6726410984992981, "std": 1.1031782627105713, "min": -1.7415924072265625, "p10": -0.7132377624511719, "median": 0.7800140380859375, "p90": 2.1393753051757813, "max": 2.9518280029296875, "pos_frac": 0.6875, "sample": [0.0541839599609375, 1.655059814453125, 1.8280181884765625, -0.13982391357421875, 2.1544265747070312, -0.728790283203125, 2.494171142578125, -0.5899810791015625, 2.32672119140625, 1.7974739074707031, 0.8292694091796875, -0.6116600036621094, 1.5831871032714844, -0.6769485473632812, -1.591705322265625, 1.87322998046875, -1.172821044921875, 1.4846038818359375, 0.8498420715332031, 0.9010505676269531, -0.19561767578125, 2.1042556762695312, 0.7198619842529297, -0.9307708740234375, 1.2247848510742188, -1.7415924072265625, -0.3905906677246094, -0.1464557647705078, 1.1797561645507812, -1.0217571258544922, 0.018341064453125, -0.041179656982421875, 0.5379714965820312, -0.1990509033203125, -0.12905120849609375, -0.8696441650390625, 1.0405044555664062, 0.7307586669921875, 1.0978317260742188, 0.13002395629882812, 0.4580078125, 0.985809326171875, 2.1979637145996094, -0.1390228271484375, 1.1598663330078125, 0.2477264404296875, 0.091888427734375, 1.5188102722167969, -0.5890731811523438, 1.5348052978515625, 2.62078857421875, 0.0223846435546875, 2.9518280029296875, 1.312042236328125, 1.0630416870117188, 0.1473541259765625, 0.6046619415283203, -0.12191390991210938, 2.56146240234375, 1.5806884765625, 1.821868896484375, 0.8485794067382812, 1.127471923828125, 1.6041030883789062], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000087.npy"}
{"epoch": 0.13151927437641722, "step": 88, "batch_size": 64, "mean": 0.3557293117046356, "std": 1.1379103660583496, "min": -2.6492462158203125, "p10": -1.1851995468139649, "median": 0.4471893310546875, "p90": 1.4942878723144535, "max": 3.3876190185546875, "pos_frac": 0.65625, "sample": [2.3284759521484375, -0.02512359619140625, -0.14029502868652344, 0.733734130859375, 1.5359649658203125, 1.393218994140625, 1.0642242431640625, -0.8042144775390625, -1.3336105346679688, 0.19670486450195312, 1.8425407409667969, 0.8066482543945312, 1.1513938903808594, 0.11248016357421875, -0.4798698425292969, 0.7753219604492188, 0.3410491943359375, -0.8327980041503906, -1.0292510986328125, 0.7146224975585938, 0.8941898345947266, 0.7242794036865234, 0.052066802978515625, 0.5511360168457031, -1.243255615234375, 2.0751724243164062, 1.1429328918457031, 0.363067626953125, -0.25997161865234375, -0.8469276428222656, -1.769866943359375, -1.1920928955078125, 3.2885894775390625, -0.5876617431640625, 0.4464111328125, 0.5208969116210938, -0.44476318359375, 0.8753662109375, 1.39697265625, 0.09273910522460938, 0.037807464599609375, 1.1662445068359375, 0.6445980072021484, 0.7692413330078125, -1.200164794921875, 0.8770751953125, 0.4427947998046875, -0.07649421691894531, 0.8459625244140625, 0.748321533203125, 1.3970413208007812, 1.101715087890625, 2.2037506103515625, 3.3876190185546875, 1.3357391357421875, -1.1691150665283203, -0.20941925048828125, 0.447967529296875, 0.2082672119140625, 0.6043243408203125, -0.7200031280517578, -2.6492462158203125, -1.6572723388671875, -0.2005767822265625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000088.npy"}
{"epoch": 0.1330309901738473, "step": 89, "batch_size": 64, "mean": 0.8478801846504211, "std": 1.3385714292526245, "min": -2.057567596435547, "p10": -0.6705881118774413, "median": 0.6697578430175781, "p90": 2.3816082000732424, "max": 5.511627197265625, "pos_frac": 0.75, "sample": [0.3956260681152344, 0.884246826171875, 0.5412673950195312, -0.7169418334960938, 0.722412109375, 2.4034767150878906, -0.4683341979980469, 0.6186752319335938, 0.29602813720703125, 1.9410476684570312, 0.8359718322753906, 2.0924453735351562, 1.669708251953125, 5.511627197265625, 0.6434288024902344, 0.041919708251953125, 1.2094535827636719, 0.597381591796875, -0.42505645751953125, -0.2966766357421875, -0.2374725341796875, -0.4787616729736328, 1.1653823852539062, 1.563323974609375, -0.7350425720214844, 4.49566650390625, -0.114227294921875, 1.0360565185546875, -0.5624294281005859, -0.40771484375, 0.2492523193359375, 0.31585693359375, 0.3799171447753906, -0.2008056640625, 1.4806365966796875, 0.6960868835449219, 0.8490982055664062, 1.5878372192382812, 3.1630859375, 2.3305816650390625, 1.3358306884765625, 1.83599853515625, 1.6689834594726562, 3.2973365783691406, 0.529388427734375, 0.1738300323486328, 0.760406494140625, 2.89434814453125, 1.56475830078125, -1.1681365966796875, 0.977508544921875, 1.7508201599121094, 0.47900390625, 0.28571319580078125, 1.5434799194335938, -1.076751708984375, 1.1429214477539062, -0.9998130798339844, -1.1301040649414062, -2.057567596435547, 2.5276336669921875, 0.5406837463378906, 2.2485923767089844, 0.06543159484863281], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000089.npy"}
{"epoch": 0.1345427059712774, "step": 90, "batch_size": 64, "mean": 0.6239124536514282, "std": 1.1234967708587646, "min": -2.16949462890625, "p10": -0.6360977172851562, "median": 0.6182432174682617, "p90": 2.0386817932128904, "max": 4.040443420410156, "pos_frac": 0.734375, "sample": [0.08125495910644531, 0.9200859069824219, 0.3330268859863281, 0.5190963745117188, 0.428497314453125, 0.2557258605957031, 1.1379318237304688, 3.6010284423828125, -0.694976806640625, -2.16949462890625, 1.732666015625, 1.1920852661132812, 1.7734222412109375, 0.670074462890625, 2.2743301391601562, -0.6680908203125, 0.5977249145507812, 0.981109619140625, -0.013370513916015625, 0.4788360595703125, 2.0287857055664062, -1.523345947265625, 2.546539306640625, 0.219146728515625, 0.77130126953125, 1.0044898986816406, 0.41455078125, 0.8858642578125, 0.23638343811035156, 2.6395797729492188, -0.14045333862304688, -0.8251953125, 0.7600479125976562, 2.0429229736328125, 0.2646598815917969, 0.5790557861328125, -0.7180557250976562, -1.7745361328125, 4.040443420410156, 1.1171550750732422, -0.4309539794921875, 1.6607818603515625, 1.755126953125, -0.2951812744140625, 0.6838760375976562, 0.6387615203857422, 0.7878074645996094, 0.7860603332519531, 0.6416473388671875, 0.7605018615722656, -0.5491180419921875, 0.7987289428710938, 0.4254608154296875, -0.075042724609375, 0.2698802947998047, 0.041449546813964844, 1.403045654296875, -0.3912162780761719, -0.5614471435546875, 1.244110107421875, -0.518890380859375, 2.11444091796875, -0.13645172119140625, 0.8767127990722656], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000090.npy"}
{"epoch": 0.1360544217687075, "step": 91, "batch_size": 64, "mean": 0.5400874614715576, "std": 1.3947391510009766, "min": -2.7492141723632812, "p10": -1.033991241455078, "median": 0.3635711669921875, "p90": 2.247067642211915, "max": 5.046558380126953, "pos_frac": 0.609375, "sample": [2.60205078125, 1.2137832641601562, 1.026773452758789, -0.041309356689453125, 0.4732666015625, 1.952484130859375, -0.18986129760742188, -0.45400238037109375, 1.3959121704101562, 3.426605224609375, 1.27874755859375, -1.455780029296875, 0.7549362182617188, -0.15142059326171875, 0.7127571105957031, -0.9642219543457031, -0.5281829833984375, -0.28804779052734375, -0.014850616455078125, -2.7492141723632812, 1.163604736328125, 3.4592742919921875, 1.2416229248046875, 0.07365226745605469, 0.26070404052734375, 0.3179359436035156, 1.296539306640625, -0.8388385772705078, -1.7073516845703125, 0.8987808227539062, 0.6632728576660156, 2.9760360717773438, -1.505126953125, 0.2934684753417969, 2.3733177185058594, -0.1654205322265625, -0.2980804443359375, 1.7967987060546875, -1.0638923645019531, -0.9627532958984375, 0.603607177734375, -0.523712158203125, 1.0511474609375, 0.327545166015625, 0.08495712280273438, 2.4672470092773438, 0.39959716796875, 1.6777114868164062, 1.5548133850097656, 1.8525314331054688, -0.27013397216796875, -0.6407661437988281, -0.4522838592529297, -2.175783157348633, 1.7636566162109375, 1.5590095520019531, 1.358062744140625, -0.20730972290039062, 5.046558380126953, 0.2832450866699219, -1.2129440307617188, 0.8366031646728516, 1.0081901550292969, -0.09992218017578125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000091.npy"}
{"epoch": 0.13756613756613756, "step": 92, "batch_size": 64, "mean": 0.6844162940979004, "std": 1.2589318752288818, "min": -1.4289321899414062, "p10": -0.7774089813232422, "median": 0.4853811264038086, "p90": 2.6105316162109378, "max": 4.502677917480469, "pos_frac": 0.671875, "sample": [0.7328033447265625, 0.7974433898925781, -0.9094696044921875, -0.30068016052246094, -1.4289321899414062, 0.18700408935546875, -0.26776123046875, 1.1749191284179688, -0.5533447265625, -0.3057212829589844, 0.31580352783203125, 0.49100303649902344, 1.5492477416992188, 1.3618316650390625, 1.5784645080566406, 1.6988601684570312, 0.673431396484375, 0.06322097778320312, 2.6448936462402344, 0.270294189453125, 0.599029541015625, -0.22528076171875, 0.9012680053710938, 0.233062744140625, 2.3056182861328125, -0.0885467529296875, -1.2825965881347656, -0.10752105712890625, -1.2565765380859375, 1.1219635009765625, -0.23736572265625, 1.3313751220703125, 2.6307373046875, -0.6090984344482422, 2.6928176879882812, 0.2962760925292969, 1.155853271484375, 1.4114532470703125, -0.875701904296875, 0.2924308776855469, -0.2139892578125, -0.08696746826171875, 4.174163818359375, 2.8357086181640625, 0.2774314880371094, -1.11309814453125, 0.7186279296875, 1.6512374877929688, 0.04949188232421875, -0.7876548767089844, 0.8483009338378906, 2.563385009765625, 1.3849563598632812, -0.13820266723632812, 1.6341552734375, -0.7535018920898438, 4.502677917480469, 0.9963150024414062, 0.5946178436279297, 0.703857421875, 3.2505035400390625, 0.47975921630859375, -0.1010284423828125, 0.26938629150390625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000092.npy"}
{"epoch": 0.13907785336356765, "step": 93, "batch_size": 64, "mean": 0.5415202975273132, "std": 1.6081990003585815, "min": -3.140228271484375, "p10": -1.2792694091796875, "median": 0.6106624603271484, "p90": 2.022844696044922, "max": 6.2830963134765625, "pos_frac": 0.625, "sample": [-0.4804363250732422, 1.8219528198242188, -1.7301025390625, 1.0744171142578125, -2.918548583984375, 1.0305366516113281, -1.0095977783203125, -1.010009765625, 0.6245918273925781, 0.6040153503417969, 3.8444442749023438, 2.901212692260742, 6.2830963134765625, -0.5720710754394531, -0.6294937133789062, 0.04485893249511719, 0.9780120849609375, 1.7553787231445312, -0.1991119384765625, -1.3242950439453125, 1.9552459716796875, 3.268707275390625, -0.80657958984375, 0.6173095703125, 0.3419647216796875, -0.17105484008789062, 1.9718780517578125, -0.056385040283203125, -0.5932197570800781, 0.4760704040527344, 1.014984130859375, 0.7290000915527344, 0.19145965576171875, 2.3784103393554688, -3.140228271484375, 1.5572433471679688, 0.3582344055175781, 1.1460018157958984, -1.5250167846679688, 0.01605987548828125, 1.8966140747070312, -0.32292938232421875, 1.7390518188476562, 1.0902900695800781, -0.6656761169433594, -0.6303863525390625, 0.9164772033691406, 0.9296073913574219, 3.6939239501953125, 0.7675094604492188, 1.529947280883789, 0.9512977600097656, 2.0364837646484375, 0.7913055419921875, -1.8214912414550781, -0.3187236785888672, 1.8784942626953125, -2.079193115234375, -0.677032470703125, 1.9910202026367188, 1.1984024047851562, -1.1742095947265625, -0.32588958740234375, 0.4434700012207031], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000093.npy"}
{"epoch": 0.14058956916099774, "step": 94, "batch_size": 64, "mean": 0.9314711689949036, "std": 1.6146849393844604, "min": -2.905487060546875, "p10": -0.8464759826660153, "median": 0.7767753601074219, "p90": 3.273703765869142, "max": 5.51397705078125, "pos_frac": 0.765625, "sample": [3.461334228515625, 3.468914031982422, -1.61468505859375, 1.2609081268310547, 0.46475982666015625, -2.1913833618164062, 0.5230731964111328, -0.1682758331298828, 0.26397705078125, 3.6093826293945312, 0.16604232788085938, 1.6620941162109375, 1.36328125, -2.8260345458984375, 1.1296863555908203, 1.4163284301757812, 1.00494384765625, 0.3652362823486328, 0.73468017578125, 0.0479736328125, 1.8860282897949219, 0.31420135498046875, 0.6946449279785156, 0.7196311950683594, 2.022686004638672, 0.20725250244140625, 0.7348861694335938, 1.1036300659179688, -1.108795166015625, 0.4296722412109375, 1.1183280944824219, 0.5755157470703125, 0.8714561462402344, -1.642730712890625, 0.9151458740234375, 1.8938064575195312, 1.9106369018554688, 2.902801513671875, 0.48194122314453125, 2.91510009765625, 2.1136093139648438, 1.8471832275390625, -0.32975006103515625, 5.51397705078125, 2.6519699096679688, 0.81866455078125, 3.4273910522460938, 4.242767333984375, 0.4368896484375, 2.075986862182617, 4.08172607421875, -0.49337005615234375, -0.4011993408203125, -0.3996257781982422, 0.8302593231201172, -0.030679702758789062, -2.905487060546875, 0.8746109008789062, 2.52197265625, 0.8785400390625, -0.00333404541015625, -0.5307998657226562, -0.9817657470703125, 0.2865409851074219], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000094.npy"}
{"epoch": 0.1421012849584278, "step": 95, "batch_size": 64, "mean": 0.7012571096420288, "std": 1.573445200920105, "min": -2.6410675048828125, "p10": -1.3235832214355467, "median": 0.5517673492431641, "p90": 2.7088226318359383, "max": 4.8262481689453125, "pos_frac": 0.671875, "sample": [2.2733154296875, -1.5790786743164062, 0.0501556396484375, -0.8899612426757812, -1.6892166137695312, -0.4252738952636719, 4.8262481689453125, 1.0335502624511719, 2.9629440307617188, 1.9134979248046875, 0.257568359375, 0.6679229736328125, 0.7960205078125, -0.16168975830078125, -0.34923553466796875, 1.2392158508300781, -2.6410675048828125, -0.46691131591796875, -2.1302108764648438, 0.5587539672851562, 2.2684326171875, 2.55670166015625, 0.5447807312011719, 0.987823486328125, -2.242462158203125, -0.4342803955078125, 2.356414794921875, 2.021270751953125, -1.1321182250976562, -2.63092041015625, 0.4117279052734375, 1.9129714965820312, 0.9388580322265625, 1.113006591796875, -0.08097648620605469, 0.1083526611328125, 2.4026260375976562, 1.4636306762695312, -0.509429931640625, 0.2395782470703125, 0.072174072265625, 0.09682464599609375, 2.774017333984375, 0.5659046173095703, -1.4056396484375, 3.041168212890625, 2.824859619140625, 0.039691925048828125, 1.298614501953125, 3.175262451171875, 0.45461273193359375, 3.6192550659179688, -0.5750942230224609, 0.34696197509765625, 1.6397933959960938, 2.2766876220703125, -0.46358680725097656, 2.2563400268554688, -0.07610702514648438, 1.4412689208984375, -0.26927947998046875, 2.3043136596679688, 1.6185760498046875, -0.7187004089355469], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000095.npy"}
{"epoch": 0.1436130007558579, "step": 96, "batch_size": 64, "mean": 0.7872134447097778, "std": 1.5985115766525269, "min": -2.7499847412109375, "p10": -0.9637359619140624, "median": 0.5884580612182617, "p90": 3.167860412597657, "max": 4.8265380859375, "pos_frac": 0.671875, "sample": [0.8614311218261719, -0.45555877685546875, 1.9346160888671875, -0.9820098876953125, 2.0677337646484375, -1.4240264892578125, 1.451995849609375, -0.5403232574462891, 0.9815521240234375, 0.6967391967773438, 3.9756240844726562, 0.10216712951660156, 0.2200164794921875, 2.9386749267578125, 2.458251953125, 2.1605453491210938, -1.2561111450195312, 0.21815872192382812, -0.8911495208740234, -1.1974449157714844, 0.35945892333984375, -0.428863525390625, 1.9266357421875, -2.7499847412109375, 0.49741363525390625, 0.5882148742675781, 1.1528511047363281, 0.540863037109375, 2.15106201171875, -0.31407928466796875, 3.651702880859375, 1.7548980712890625, 3.349365234375, 3.266082763671875, -0.01288604736328125, 2.2353286743164062, -2.7135467529296875, 0.39328765869140625, -0.122772216796875, 1.0382843017578125, 2.239288330078125, 0.7901458740234375, 0.9226722717285156, -0.7432327270507812, 4.8265380859375, 0.26573944091796875, 2.3270034790039062, -0.9210968017578125, -0.6419448852539062, -1.1070098876953125, 0.6907577514648438, -0.3421440124511719, 0.5887012481689453, 4.1062469482421875, 3.6973114013671875, 1.0940704345703125, -0.5459709167480469, 0.92083740234375, 0.46103858947753906, 0.5241546630859375, 1.9767913818359375, 0.6511306762695312, -0.3979644775390625, -0.8856048583984375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000096.npy"}
{"epoch": 0.14512471655328799, "step": 97, "batch_size": 64, "mean": 0.25008776783943176, "std": 1.6384333372116089, "min": -3.5253944396972656, "p10": -1.8214088439941405, "median": 0.3757781982421875, "p90": 2.219736099243165, "max": 4.15289306640625, "pos_frac": 0.578125, "sample": [-1.9182586669921875, -0.8254623413085938, -0.6752529144287109, 2.916078567504883, 0.5437793731689453, 4.15289306640625, -1.6990737915039062, -0.3292388916015625, -1.327667236328125, -3.406524658203125, -2.1758804321289062, 2.7079391479492188, 0.4488983154296875, 1.8765106201171875, -1.466094970703125, -3.5253944396972656, 0.6306419372558594, 0.22119522094726562, 1.1427001953125, 0.6278667449951172, 0.5618972778320312, 1.7037200927734375, -0.027570724487304688, -1.1115989685058594, 1.5093994140625, 0.5215377807617188, -1.8354530334472656, 0.2826690673828125, -0.281585693359375, 1.0492630004882812, -0.16867828369140625, -2.01611328125, 1.287139892578125, 0.9125022888183594, -0.6292533874511719, 1.6565017700195312, 2.7355499267578125, 1.9464988708496094, -0.18620872497558594, 0.170196533203125, 1.6698074340820312, -1.1298255920410156, 2.464935302734375, 0.7971954345703125, 0.585693359375, -0.5860939025878906, 1.7369728088378906, 2.3368377685546875, 0.59967041015625, -1.52496337890625, -0.9741287231445312, 0.08847808837890625, -3.25152587890625, -1.7886390686035156, 3.4086685180664062, 1.2432098388671875, 1.4836273193359375, -0.16156005859375, 1.8806514739990234, 1.6704559326171875, -1.6604385375976562, 1.156341552734375, 0.3026580810546875, -0.3424797058105469], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000097.npy"}
{"epoch": 0.14663643235071808, "step": 98, "batch_size": 64, "mean": 0.4052843153476715, "std": 1.3245368003845215, "min": -3.657989501953125, "p10": -1.1064048767089845, "median": 0.5766792297363281, "p90": 1.6006797790527345, "max": 3.7152481079101562, "pos_frac": 0.671875, "sample": [-1.0995101928710938, 0.23974037170410156, 1.4644546508789062, 1.6114120483398438, 1.5756378173828125, 0.6053543090820312, 1.2294769287109375, 1.3394279479980469, -0.4894676208496094, 3.7152481079101562, -1.9669342041015625, -1.0936698913574219, 1.3426971435546875, 0.1709136962890625, -2.8428421020507812, 1.48199462890625, 1.2489089965820312, 1.6124267578125, 0.29119873046875, -1.2397270202636719, 1.5346336364746094, 2.8928604125976562, 1.0572662353515625, 2.3184814453125, 0.6857528686523438, -0.9893016815185547, 2.9531822204589844, 1.4932899475097656, 0.7065238952636719, 0.0242767333984375, 1.013427734375, 0.6912994384765625, -1.1093597412109375, 0.76837158203125, -0.9684677124023438, 0.548004150390625, 1.2179756164550781, 0.6358299255371094, 0.7972946166992188, -1.0197792053222656, 0.93975830078125, -0.052978515625, -0.6328125, -0.04497528076171875, 1.0863113403320312, -1.6228370666503906, 0.06539535522460938, 0.47948455810546875, -0.1299896240234375, 1.3184623718261719, -0.2086162567138672, 0.369964599609375, -0.536529541015625, -0.7993545532226562, -3.657989501953125, 0.95037841796875, 1.1252326965332031, 0.4017982482910156, 0.070159912109375, 0.3657684326171875, 2.74755859375, 1.0020713806152344, -0.0792999267578125, -1.6670684814453125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000098.npy"}
{"epoch": 0.14814814814814814, "step": 99, "batch_size": 64, "mean": 0.6703593134880066, "std": 2.076678514480591, "min": -3.6578598022460938, "p10": -1.4741729736328124, "median": 0.3786458969116211, "p90": 3.1841255187988304, "max": 6.4973297119140625, "pos_frac": 0.59375, "sample": [5.473731994628906, 3.453704833984375, 0.2989654541015625, 5.3916778564453125, 0.8152542114257812, -0.59014892578125, -0.13950347900390625, -2.469125747680664, 0.1666259765625, 2.2077674865722656, -0.017608642578125, 1.8851509094238281, 0.7015037536621094, 1.6023445129394531, -0.22580528259277344, 2.5551071166992188, -0.329193115234375, 0.6897010803222656, -0.31157684326171875, 0.8435783386230469, 4.2512054443359375, -0.4512519836425781, -0.8192939758300781, -1.4383964538574219, -1.8657684326171875, -2.416961669921875, 5.9434814453125, -0.9549713134765625, 0.7599525451660156, 5.407249450683594, 0.695098876953125, 2.23516845703125, -1.2600898742675781, 1.4891738891601562, 0.06772613525390625, 1.455322265625, -0.20072555541992188, 0.4583263397216797, -1.1758804321289062, 1.9957046508789062, -0.027692794799804688, -3.4306182861328125, 2.544208526611328, 6.4973297119140625, -0.4611358642578125, 1.7613906860351562, 1.7235946655273438, 1.3194923400878906, -1.0950927734375, 0.9300765991210938, 0.6798362731933594, 1.20416259765625, 0.5052070617675781, 0.23527145385742188, 0.9199676513671875, -0.45943450927734375, -0.12374114990234375, 0.8460693359375, 0.16188812255859375, -3.6578598022460938, -0.19964218139648438, -1.4895057678222656, 0.19199371337890625, -1.8499908447265625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000099.npy"}
{"epoch": 0.14965986394557823, "step": 100, "batch_size": 64, "mean": 0.773510217666626, "std": 2.3156039714813232, "min": -4.503143310546875, "p10": -1.7168739318847657, "median": 0.6644210815429688, "p90": 3.3150314331054695, "max": 9.455474853515625, "pos_frac": 0.671875, "sample": [0.606903076171875, -0.025299072265625, -2.2977294921875, 1.8774642944335938, 2.4974822998046875, -0.12033843994140625, 0.31269073486328125, 0.7021942138671875, 3.3939361572265625, 0.3250732421875, 5.23529052734375, 1.2138748168945312, 0.7781906127929688, 4.582038879394531, -0.146392822265625, 4.5222320556640625, -1.0567245483398438, 0.55389404296875, 1.0538673400878906, 0.62664794921875, 0.7255058288574219, -1.1528472900390625, -1.1816940307617188, 0.4014625549316406, -1.7293319702148438, 2.2269287109375, 1.4645671844482422, 2.5537261962890625, 0.2261676788330078, -4.159698486328125, 0.4976043701171875, -0.0313568115234375, -1.4407577514648438, 0.48543548583984375, 0.7427978515625, -1.1556129455566406, 1.8232040405273438, 0.7839431762695312, -3.7988357543945312, 2.046722412109375, -0.472015380859375, 6.5018463134765625, -2.3239364624023438, 0.4646034240722656, -0.6872100830078125, 2.005706787109375, 0.8259792327880859, 0.27063751220703125, -0.03459358215332031, 1.5737724304199219, 1.5910491943359375, 3.13092041015625, 1.3225860595703125, 1.325765609741211, 1.7489776611328125, 0.8093357086181641, -2.6754608154296875, 3.91259765625, 1.9380683898925781, 1.1754531860351562, 9.455474853515625, -0.1271820068359375, -4.503143310546875, -1.68780517578125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000100.npy"}
{"epoch": 0.15117157974300832, "step": 101, "batch_size": 64, "mean": 0.9812161922454834, "std": 1.6951119899749756, "min": -4.2628173828125, "p10": -1.0170812606811521, "median": 1.0273962020874023, "p90": 3.4405364990234375, "max": 4.9547576904296875, "pos_frac": 0.734375, "sample": [0.0247650146484375, -0.35672760009765625, 1.1010818481445312, 0.05666351318359375, 3.4564208984375, 2.2467498779296875, 1.141082763671875, 0.3275165557861328, -0.2529144287109375, -0.00972747802734375, 3.4616546630859375, 1.45684814453125, -4.2628173828125, 0.32801055908203125, 2.9909515380859375, -0.5221939086914062, 1.3674964904785156, 1.1324748992919922, 0.9357414245605469, -1.2872428894042969, 1.9044036865234375, 1.960296630859375, 3.59686279296875, 2.1029739379882812, 0.3498725891113281, -1.15753173828125, 0.8504981994628906, 1.6334686279296875, 0.9307174682617188, 2.431060791015625, -0.14787673950195312, 3.5082855224609375, 1.2951164245605469, 0.38147926330566406, -1.0729141235351562, -0.8868045806884766, 0.7871322631835938, -1.7297439575195312, -1.1853408813476562, 1.1833419799804688, -0.1724853515625, 0.6945571899414062, 0.9537105560302734, 0.9175148010253906, 0.3994598388671875, 2.2223777770996094, 1.9949722290039062, -3.757415771484375, -0.6626377105712891, 0.517791748046875, 2.8201751708984375, 1.3963088989257812, 3.6030540466308594, 1.6219444274902344, 1.7885055541992188, -0.2671356201171875, 3.403472900390625, 2.38262939453125, -0.862884521484375, 4.9547576904296875, 1.364410400390625, 2.325284957885742, 3.473480224609375, 1.6148567199707031], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000101.npy"}
{"epoch": 0.15268329554043839, "step": 102, "batch_size": 64, "mean": 0.5816311836242676, "std": 1.9454294443130493, "min": -4.5856170654296875, "p10": -1.426142120361328, "median": 0.4170103073120117, "p90": 2.9394668579101566, "max": 6.15167236328125, "pos_frac": 0.5625, "sample": [1.8344802856445312, -1.4144515991210938, -1.1489639282226562, 0.4914817810058594, -3.0535736083984375, -0.5288143157958984, -1.43115234375, 0.9100456237792969, -2.2584457397460938, 0.3534965515136719, -0.48461151123046875, 1.3975982666015625, -1.6172256469726562, 1.177398681640625, 1.479217529296875, -0.8513946533203125, -0.9880504608154297, 2.8467864990234375, -0.5249099731445312, 4.635528564453125, -1.1013164520263672, 0.4056663513183594, 1.8996047973632812, -0.5783462524414062, 0.8066091537475586, 0.7832145690917969, 1.2813491821289062, 2.97918701171875, -4.5856170654296875, -0.0810394287109375, 4.878669738769531, 2.4275550842285156, -1.4847297668457031, 1.1404876708984375, -0.9517974853515625, 3.4604110717773438, 2.4192428588867188, 4.259311676025391, 2.0196380615234375, -0.9669570922851562, 1.7397117614746094, 3.5035858154296875, -0.56884765625, 0.13428497314453125, -0.7440109252929688, 1.1518325805664062, -0.6010513305664062, 1.4060440063476562, 0.274627685546875, -0.7093734741210938, 0.7600021362304688, 2.4559478759765625, 0.967864990234375, -2.1911048889160156, -0.360992431640625, -1.1913108825683594, -0.14849090576171875, 6.15167236328125, -0.6218109130859375, -0.6028213500976562, 1.8947792053222656, 2.4953460693359375, 1.7645759582519531, 0.42835426330566406], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000102.npy"}
{"epoch": 0.15419501133786848, "step": 103, "batch_size": 64, "mean": 1.1501903533935547, "std": 2.111143112182617, "min": -2.9614295959472656, "p10": -1.2855007171630857, "median": 0.9706754684448242, "p90": 3.642555618286133, "max": 7.2682952880859375, "pos_frac": 0.71875, "sample": [-2.5665969848632812, 2.7591323852539062, 2.602508544921875, -2.6339950561523438, 0.4651069641113281, 1.3833999633789062, 0.5127639770507812, 0.46302032470703125, 1.9595375061035156, 1.1046371459960938, -0.33362579345703125, -0.7176589965820312, -0.37044525146484375, 0.9380569458007812, 2.7763214111328125, 3.680736541748047, 1.0032939910888672, 1.1537399291992188, 5.3668975830078125, 3.553466796875, -0.04193115234375, 3.7617874145507812, 0.4141082763671875, 2.3874969482421875, 1.9906883239746094, -0.488677978515625, 0.45204925537109375, 6.0852508544921875, -0.9836807250976562, 2.2684974670410156, 1.309326171875, 7.2682952880859375, 2.335784912109375, 0.472869873046875, 0.4163932800292969, 0.405120849609375, -1.4806137084960938, 3.47747802734375, 0.8040313720703125, -0.0360107421875, 3.340038299560547, 0.766937255859375, 1.3442745208740234, -1.3611907958984375, 1.3210163116455078, -2.147857666015625, 3.4654273986816406, -0.249603271484375, 4.146177291870117, 0.06801986694335938, 2.7712860107421875, 1.7733001708984375, -2.9614295959472656, -1.1088905334472656, -0.938262939453125, 4.906951904296875, -2.9224014282226562, 0.14786148071289062, -1.0187816619873047, 0.6579360961914062, 1.3563117980957031, 1.2776260375976562, 3.39111328125, 1.667755126953125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000103.npy"}
{"epoch": 0.15570672713529857, "step": 104, "batch_size": 64, "mean": 1.176223874092102, "std": 2.412670850753784, "min": -5.045951843261719, "p10": -1.7926315307617187, "median": 1.037210464477539, "p90": 4.341725921630863, "max": 7.623931884765625, "pos_frac": 0.6875, "sample": [3.061859130859375, 2.367919921875, 1.24017333984375, 1.447845458984375, -1.69097900390625, 2.0864410400390625, -0.4155101776123047, 1.1041603088378906, 0.9608993530273438, -0.77117919921875, 0.0173492431640625, -5.045951843261719, 7.623931884765625, 3.5202789306640625, -2.2006607055664062, -0.15436935424804688, 5.20111083984375, 0.0999298095703125, 1.59393310546875, 3.3595504760742188, -2.56109619140625, -1.2922821044921875, 4.6622314453125, 0.9702606201171875, -0.943695068359375, -0.050731658935546875, -0.2602043151855469, -3.16998291015625, 3.2115325927734375, -0.8493499755859375, 0.6509456634521484, 0.773773193359375, -1.2248954772949219, 1.3083152770996094, 1.9505043029785156, 1.838348388671875, 0.8313140869140625, -0.9682579040527344, 3.5938796997070312, 2.055023193359375, 0.063995361328125, -1.8627967834472656, 5.28509521484375, 0.928985595703125, 3.1858444213867188, 6.426605224609375, 2.840667724609375, 0.5187721252441406, -1.0460968017578125, -1.8361968994140625, 2.1072540283203125, 1.4543418884277344, 2.8842086791992188, 3.499420166015625, 0.3430938720703125, 2.1144027709960938, -0.5364990234375, 1.3114013671875, 0.6214962005615234, 1.3314056396484375, -2.2486495971679688, 2.8529510498046875, 5.8738250732421875, 5.2324371337890625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000104.npy"}
{"epoch": 0.15721844293272866, "step": 105, "batch_size": 64, "mean": 0.9951044917106628, "std": 2.805931329727173, "min": -5.84051513671875, "p10": -2.960071563720703, "median": 1.0228462219238281, "p90": 5.599971008300781, "max": 8.121368408203125, "pos_frac": 0.640625, "sample": [-0.05927467346191406, 2.73602294921875, 2.097747802734375, 1.9585742950439453, 0.8151016235351562, 2.477191925048828, -5.84051513671875, 1.4075927734375, -2.9251174926757812, 1.4862327575683594, 3.911376953125, 1.3709564208984375, -0.721466064453125, 2.491424560546875, -0.41025543212890625, 7.199005126953125, -0.18021202087402344, -0.9883804321289062, 0.34433937072753906, 5.8789215087890625, 1.5702972412109375, 1.2273635864257812, 0.6876678466796875, -3.3625869750976562, 1.0477294921875, 0.6893310546875, 5.58319091796875, -0.8007659912109375, 1.4747848510742188, -3.018463134765625, -3.2218399047851562, 1.8873405456542969, -0.042724609375, 1.6221923828125, -4.0775604248046875, 8.121368408203125, 0.06382369995117188, -0.0713958740234375, 0.35820770263671875, 0.02095794677734375, 1.0886459350585938, -0.3042144775390625, -1.0239067077636719, 1.7749176025390625, -3.929168701171875, 1.9249229431152344, 6.1272735595703125, -0.6534805297851562, 1.2601776123046875, -1.6260452270507812, -2.9750518798828125, 1.2925758361816406, -1.0486564636230469, 5.6071624755859375, 6.2912139892578125, 0.9979629516601562, 2.5583763122558594, 1.6125946044921875, -0.22600173950195312, 2.6326045989990234, -1.843109130859375, 5.7326507568359375, 0.803497314453125, 4.803558349609375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000105.npy"}
{"epoch": 0.15873015873015872, "step": 106, "batch_size": 64, "mean": 0.9039825797080994, "std": 2.387871742248535, "min": -7.1685791015625, "p10": -1.5151405334472654, "median": 0.6383457183837891, "p90": 3.3432544708251952, "max": 7.866851806640625, "pos_frac": 0.703125, "sample": [-0.393951416015625, 1.3486785888671875, 0.06131172180175781, 3.1630325317382812, -1.3807449340820312, -1.0341796875, 4.571338653564453, 0.7884502410888672, 0.9309005737304688, 6.487945556640625, 0.17063140869140625, 4.3856658935546875, 4.940238952636719, 1.679443359375, 0.34661102294921875, -2.5578269958496094, -0.13528060913085938, 2.6037673950195312, -0.18206787109375, 1.0238800048828125, 0.6251296997070312, 0.1350860595703125, -4.582672119140625, 1.1708984375, -0.37034034729003906, 0.6515617370605469, -0.2720527648925781, -1.5727386474609375, 0.21984100341796875, 3.3483963012695312, 0.7053680419921875, -0.6434917449951172, 0.48387908935546875, 3.0690765380859375, 3.3175048828125, 0.7481765747070312, -0.08276748657226562, -0.5043849945068359, 1.3039054870605469, 7.866851806640625, 2.153461456298828, 1.9505538940429688, 1.7335243225097656, -0.19415283203125, -3.1593704223632812, 0.1424713134765625, 2.3535118103027344, 1.9855117797851562, -7.1685791015625, 0.5617866516113281, -2.0865936279296875, 0.164886474609375, 3.1225662231445312, 3.331256866455078, 2.716094970703125, 3.0503311157226562, 1.0433731079101562, 0.22802734375, 0.08735466003417969, 3.471832275390625, -2.8044586181640625, 0.0064983367919921875, 2.8225135803222656, -0.09258842468261719], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000106.npy"}
{"epoch": 0.1602418745275888, "step": 107, "batch_size": 64, "mean": 0.6506277918815613, "std": 2.3451988697052, "min": -7.710662841796875, "p10": -1.9517440795898435, "median": 0.7556381225585938, "p90": 3.5939081192016604, "max": 6.0134124755859375, "pos_frac": 0.671875, "sample": [0.8238525390625, -4.2371673583984375, 0.9565887451171875, 0.8344841003417969, 0.00958251953125, 2.8012237548828125, -4.6077880859375, 2.406543731689453, 0.4421844482421875, 1.6310081481933594, 0.6874237060546875, 6.0134124755859375, 1.0450973510742188, 1.2888641357421875, 0.8443832397460938, -1.1946258544921875, -3.3417739868164062, 0.22745513916015625, 3.0020828247070312, 1.0008544921875, 0.0820770263671875, 3.9069061279296875, 4.522789001464844, -1.3416385650634766, 0.974609375, -0.4121246337890625, 2.4478416442871094, -0.10427474975585938, -1.3558349609375, 1.1595916748046875, 0.6499366760253906, -2.358642578125, 0.618316650390625, 2.789104461669922, 0.25255584716796875, 4.020965576171875, 3.6154727935791016, -0.33941650390625, 0.132659912109375, 2.8923416137695312, 1.3638458251953125, 3.0124130249023438, -0.439483642578125, -0.22290802001953125, 1.5362815856933594, 2.6333541870117188, 0.8332862854003906, 2.3780517578125, 0.46520233154296875, -3.1890945434570312, 3.9885330200195312, 3.543590545654297, -1.2147026062011719, -2.0133819580078125, -0.4548187255859375, 0.5507736206054688, 2.21026611328125, 3.89788818359375, -0.06102752685546875, -0.08415985107421875, -1.592987060546875, -7.710662841796875, 1.2309188842773438, -1.80792236328125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000107.npy"}
{"epoch": 0.1617535903250189, "step": 108, "batch_size": 64, "mean": 0.905663013458252, "std": 2.309544563293457, "min": -4.9152984619140625, "p10": -1.9121949195861814, "median": 0.8849887847900391, "p90": 3.7307846069335944, "max": 7.56085205078125, "pos_frac": 0.6875, "sample": [-0.7223968505859375, -0.40561676025390625, 1.850128173828125, 2.2325820922851562, 0.42966461181640625, 2.170379638671875, 1.7320404052734375, 2.0510826110839844, 7.56085205078125, 0.6290283203125, 3.014923095703125, 2.1451873779296875, 1.4464263916015625, -2.0462255477905273, 1.2748565673828125, 0.91876220703125, -1.202728271484375, 0.150177001953125, 0.35030364990234375, 0.3102149963378906, 3.811676025390625, -0.8277053833007812, -0.9773674011230469, -0.51214599609375, 1.19696044921875, -2.8556060791015625, 1.5123634338378906, -4.9152984619140625, 2.0877227783203125, -0.176361083984375, -3.0311279296875, -0.4421348571777344, 4.1212005615234375, 0.23648834228515625, 3.0092391967773438, 5.560638427734375, 0.8512153625488281, -0.2366962432861328, -3.50384521484375, 3.2193832397460938, -1.599456787109375, 3.5420379638671875, 1.964447021484375, 0.96466064453125, -1.2622489929199219, 3.170238494873047, -0.4617176055908203, 1.522430419921875, 0.1666259765625, -2.2025375366210938, -3.8675994873046875, 1.3387603759765625, -0.58746337890625, 1.7507781982421875, 4.001472473144531, 0.030284881591796875, 1.191650390625, 5.865264892578125, 0.6805191040039062, 1.3678207397460938, 0.7428512573242188, 2.636241912841797, 0.14056015014648438, 4.84857177734375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000108.npy"}
{"epoch": 0.16326530612244897, "step": 109, "batch_size": 64, "mean": 1.351167917251587, "std": 2.7637624740600586, "min": -3.4673690795898438, "p10": -1.6757926940917967, "median": 1.2428035736083984, "p90": 4.784037017822266, "max": 9.751373291015625, "pos_frac": 0.671875, "sample": [-1.303497314453125, 6.046241760253906, 4.7417449951171875, 0.3337516784667969, 3.787088394165039, 1.9080848693847656, 3.815185546875, -1.206756591796875, 0.03678131103515625, 1.9752464294433594, 0.12105941772460938, 1.875213623046875, 4.737312316894531, 1.75897216796875, -1.441619873046875, 2.17498779296875, -0.2641735076904297, 1.3430519104003906, -0.772796630859375, 1.1822967529296875, 0.8199501037597656, 2.1279430389404297, -3.4511566162109375, -1.200037956237793, -1.0319671630859375, 9.751373291015625, -0.4991455078125, -1.00146484375, 1.355794906616211, 6.330291748046875, 1.3033103942871094, -2.1548004150390625, 0.7533149719238281, -0.9274749755859375, -1.5217971801757812, 2.0600738525390625, -1.7825260162353516, 4.93438720703125, 0.4866180419921875, 2.313396453857422, -0.2772636413574219, 2.468547821044922, 1.3896942138671875, 6.670806884765625, -1.2655754089355469, -2.1161575317382812, -1.161834716796875, 0.9104518890380859, -3.4673690795898438, -2.928192138671875, 0.9756431579589844, 1.7324142456054688, 2.2685928344726562, 4.07562255859375, 1.8602523803710938, 8.892173767089844, 4.802162170410156, 4.0411834716796875, 3.0871124267578125, 3.3867759704589844, 0.01152801513671875, 0.41371917724609375, -1.741790771484375, 2.9319915771484375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000109.npy"}
{"epoch": 0.16477702191987906, "step": 110, "batch_size": 64, "mean": 0.2415229082107544, "std": 2.5293023586273193, "min": -7.4971923828125, "p10": -2.966814613342285, "median": 0.47527122497558594, "p90": 3.5450630187988286, "max": 5.5690155029296875, "pos_frac": 0.5625, "sample": [1.3044319152832031, -0.020030975341796875, -1.6949005126953125, -1.7135276794433594, -3.059122085571289, 4.9234771728515625, 2.23065185546875, 3.7661514282226562, -5.020355224609375, 0.8420181274414062, 2.4797935485839844, 0.7901611328125, 1.464752197265625, 3.5824508666992188, 0.7094802856445312, -0.15528106689453125, 1.4808197021484375, 5.5690155029296875, -1.2396202087402344, -4.006296157836914, 2.258575439453125, -2.5137481689453125, 2.1213741302490234, 1.878753662109375, 0.40039825439453125, -0.6732864379882812, -0.23874855041503906, 2.4588851928710938, -0.9195976257324219, 0.33179473876953125, -1.6881446838378906, 2.386402130126953, 0.5501441955566406, 0.04225921630859375, -2.7514305114746094, 1.7291412353515625, -1.6930389404296875, 3.67498779296875, -1.3896331787109375, 0.35153675079345703, 4.84527587890625, -0.0469818115234375, -1.7350997924804688, 1.24237060546875, -3.5321998596191406, 1.4347000122070312, 2.3834495544433594, -1.86065673828125, -7.4971923828125, -4.56243896484375, -0.5566482543945312, 1.5723724365234375, -0.791168212890625, 1.1623992919921875, 0.9180450439453125, 3.623291015625, 2.8197250366210938, 1.1370429992675781, 0.7441787719726562, -0.663665771484375, -2.55291748046875, -1.3095855712890625, 3.45782470703125, -3.325347900390625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000110.npy"}
{"epoch": 0.16628873771730915, "step": 111, "batch_size": 64, "mean": 1.009019136428833, "std": 2.6326754093170166, "min": -4.3406829833984375, "p10": -1.8529449462890626, "median": 0.5686531066894531, "p90": 4.721892929077149, "max": 8.856231689453125, "pos_frac": 0.609375, "sample": [-1.561004638671875, -0.5423965454101562, 2.0791091918945312, 2.429962158203125, 4.984630584716797, -1.861907958984375, 0.31449127197265625, 2.7079925537109375, 1.8048782348632812, 0.04991912841796875, 3.080535888671875, -1.1465911865234375, -1.1444854736328125, 3.9828109741210938, -4.3406829833984375, -0.19264984130859375, -1.101755142211914, 0.23934364318847656, -2.939929962158203, -0.12971878051757812, 5.005615234375, 1.8077392578125, 4.771350860595703, 5.638153076171875, 4.6064910888671875, 2.217884063720703, 0.811767578125, 2.627838134765625, 1.4058113098144531, -1.9573822021484375, -0.2124309539794922, -0.9756507873535156, -1.83203125, -4.079990386962891, 1.2515106201171875, 0.14130401611328125, -0.898651123046875, 2.10882568359375, -0.5732078552246094, 3.131816864013672, 2.7802734375, 3.7927093505859375, 5.268836975097656, -0.20677947998046875, 5.186794281005859, -1.0547943115234375, 1.5562057495117188, 4.6042633056640625, -1.6399688720703125, -0.4035301208496094, 3.6806259155273438, 0.5355453491210938, 2.042125701904297, -1.6213607788085938, -3.4627151489257812, 1.6200065612792969, -3.3086700439453125, 2.5661087036132812, 0.012310028076171875, 0.016269683837890625, 8.856231689453125, -0.4168357849121094, 1.8624992370605469, 0.6017608642578125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000111.npy"}
{"epoch": 0.16780045351473924, "step": 112, "batch_size": 64, "mean": 1.2651406526565552, "std": 2.70064640045166, "min": -5.759731292724609, "p10": -2.0360992431640623, "median": 0.9052696228027344, "p90": 4.960898590087891, "max": 7.23638916015625, "pos_frac": 0.734375, "sample": [3.469451904296875, 0.5313720703125, 4.8922576904296875, 0.9699783325195312, 0.5121421813964844, 0.8405609130859375, 4.920860290527344, 2.0294456481933594, 0.6860198974609375, 1.448883056640625, 5.9092559814453125, 1.92486572265625, -3.9840126037597656, 0.8098659515380859, 0.5490818023681641, -2.8017425537109375, 4.4504241943359375, 0.19338226318359375, 0.136077880859375, 0.293548583984375, 2.599506378173828, -0.289154052734375, 2.2208709716796875, 5.94317626953125, 0.7736587524414062, 5.656208038330078, 1.1914215087890625, -1.908843994140625, 3.885162353515625, 1.7137470245361328, 7.23638916015625, 1.3036956787109375, -0.8503303527832031, 2.648479461669922, 2.6480026245117188, 6.69781494140625, 0.1198577880859375, -0.8403549194335938, 3.757965087890625, 1.9820365905761719, -2.2164077758789062, -1.550079345703125, 3.680572509765625, 0.7959575653076172, 1.5451812744140625, 6.1419830322265625, 4.978057861328125, 4.265045166015625, -0.03936767578125, -1.4277153015136719, -3.2441253662109375, 0.28376007080078125, -0.9612274169921875, 2.3966751098632812, 0.6653900146484375, -0.1634063720703125, 1.1334266662597656, -5.759731292724609, -2.09063720703125, 0.18377685546875, 1.0798797607421875, -3.9615020751953125, -0.26955413818359375, 1.232025146484375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000112.npy"}
{"epoch": 0.1693121693121693, "step": 113, "batch_size": 64, "mean": 0.7895805239677429, "std": 2.5974435806274414, "min": -7.402679443359375, "p10": -1.7168174743652342, "median": 0.9607095718383789, "p90": 3.99704055786133, "max": 8.595489501953125, "pos_frac": 0.609375, "sample": [-0.419586181640625, -3.937389373779297, -1.0168838500976562, 1.2498607635498047, -0.38726806640625, 2.5792236328125, 1.143218994140625, -1.8896484375, -3.1259918212890625, 2.67657470703125, -1.2974014282226562, -0.7260894775390625, -1.7888717651367188, 0.34654998779296875, -2.515338897705078, -0.564849853515625, 0.9461212158203125, 2.0426483154296875, 1.5862579345703125, -5.482324600219727, -0.7143936157226562, -0.485260009765625, 4.7306671142578125, 0.052703857421875, 1.1858940124511719, 1.4994010925292969, 2.277698516845703, 1.1248970031738281, 4.779090881347656, -0.3084259033203125, 1.8599090576171875, 8.595489501953125, 0.6757946014404297, 0.4961585998535156, 0.39441680908203125, 1.88580322265625, -7.402679443359375, -0.22821426391601562, 1.541534423828125, 1.2079086303710938, -0.0616607666015625, 1.1637954711914062, -1.241455078125, 1.200286865234375, 2.491771697998047, 6.7142486572265625, 0.3651885986328125, 1.7136287689208984, 0.9752979278564453, 3.336345672607422, -1.5486907958984375, -0.4814605712890625, -0.17209243774414062, 5.552520751953125, 4.1910247802734375, -0.5908088684082031, 2.2964935302734375, 1.0837669372558594, 3.46844482421875, 3.5444107055664062, 4.642509460449219, -1.0243339538574219, -0.9512290954589844, 1.2779464721679688], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000113.npy"}
{"epoch": 0.1708238851095994, "step": 114, "batch_size": 64, "mean": 1.39513099193573, "std": 3.1237103939056396, "min": -4.238105773925781, "p10": -2.267210388183593, "median": 1.340240478515625, "p90": 5.436885452270508, "max": 10.265594482421875, "pos_frac": 0.65625, "sample": [-1.609619140625, -1.44158935546875, 1.9472541809082031, 2.5481643676757812, 8.347293853759766, 2.39752197265625, 5.034217834472656, 4.2411041259765625, 1.438507080078125, -3.8741588592529297, 5.200252532958984, -1.5745086669921875, 2.0510101318359375, -2.495574951171875, -0.12502670288085938, 1.4459686279296875, 1.4009971618652344, 0.5688095092773438, -0.7873573303222656, 0.044097900390625, 0.9600677490234375, -0.5274200439453125, 10.265594482421875, -1.7343597412109375, 1.2289390563964844, 4.171882629394531, 4.6480255126953125, 2.5112056732177734, -1.021331787109375, 3.5500965118408203, -3.0386505126953125, 0.4388284683227539, 5.181243896484375, -4.029052734375, 2.2289505004882812, -0.3132820129394531, 5.538299560546875, 0.29816436767578125, 2.639911651611328, -2.831207275390625, 7.94476318359375, 2.0762100219726562, 5.173736572265625, 0.5207366943359375, 1.31378173828125, -0.4097442626953125, 7.660919189453125, 0.6996994018554688, -1.4524421691894531, -1.5897674560546875, 0.8269119262695312, 1.36669921875, 1.4296684265136719, 1.368316650390625, 3.2355194091796875, 5.68450927734375, 1.7041645050048828, -1.1929454803466797, -2.9336624145507812, -4.238105773925781, 5.862415313720703, -0.4202423095703125, 1.4498558044433594, -1.7158851623535156], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000114.npy"}
{"epoch": 0.17233560090702948, "step": 115, "batch_size": 64, "mean": 1.1847448348999023, "std": 2.669616937637329, "min": -4.084297180175781, "p10": -1.9649848937988281, "median": 0.7862129211425781, "p90": 4.8022285461425795, "max": 7.558135986328125, "pos_frac": 0.671875, "sample": [-1.629180908203125, -1.1315574645996094, 3.1378250122070312, 0.363922119140625, 7.0793304443359375, -0.84893798828125, 1.2151718139648438, 3.5673675537109375, -0.5940093994140625, 7.558135986328125, 0.7712631225585938, -2.6018218994140625, 7.2785797119140625, -1.3730239868164062, -2.3183822631835938, -3.8420486450195312, 0.6330413818359375, 2.04315185546875, 0.581878662109375, 3.5440330505371094, 1.1313705444335938, 4.2133636474609375, 0.7280960083007812, 0.6511611938476562, -2.1835098266601562, 0.5896759033203125, 1.584564208984375, 1.678955078125, -4.084297180175781, 5.564292907714844, -0.544708251953125, 4.173984527587891, 2.0250682830810547, 5.2237701416015625, -0.01985931396484375, 4.3516387939453125, 0.8011627197265625, 1.0770301818847656, -1.0219154357910156, 0.29946136474609375, 1.8270606994628906, -2.0062637329101562, 2.124542236328125, 2.7006072998046875, 3.8738632202148438, 1.3353080749511719, 4.308021545410156, 0.26232147216796875, -3.1230087280273438, 7.21844482421875, 1.0639305114746094, -1.0540771484375, 2.4648094177246094, -0.18478775024414062, 0.1654205322265625, -0.6282634735107422, -1.8686676025390625, -0.3697509765625, 0.882354736328125, -1.1519927978515625, 1.6107501983642578, 0.27252197265625, 1.4311389923095703, 4.995338439941406], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000115.npy"}
{"epoch": 0.17384731670445955, "step": 116, "batch_size": 64, "mean": 1.6251661777496338, "std": 2.6404953002929688, "min": -4.2654266357421875, "p10": -1.3266197204589842, "median": 1.0115203857421875, "p90": 5.55071258544922, "max": 8.448486328125, "pos_frac": 0.71875, "sample": [6.189903259277344, -1.63262939453125, -0.00133514404296875, 5.62933349609375, 5.3672637939453125, 1.4904251098632812, 3.9603118896484375, 0.14046287536621094, 3.4607009887695312, 3.8454360961914062, 2.3008804321289062, -1.0273056030273438, -1.2933425903320312, 4.1041259765625, 1.7011604309082031, 0.5815029144287109, -0.16271591186523438, 0.17574310302734375, 2.4945755004882812, 3.8110809326171875, -1.9210777282714844, 0.16124725341796875, 1.9163436889648438, 0.10619354248046875, -1.0876274108886719, 6.508201599121094, 1.0685882568359375, 3.3050613403320312, 8.448486328125, 1.4054298400878906, 0.9544525146484375, -0.7861862182617188, -2.475809097290039, 0.14386940002441406, 4.9112091064453125, 0.5242080688476562, 3.441131591796875, 2.807300567626953, -1.4423294067382812, -1.34088134765625, 2.1164894104003906, 1.96173095703125, 1.292236328125, -0.940765380859375, 6.307403564453125, 2.0133934020996094, 5.789070129394531, -1.6753616333007812, -0.7347869873046875, 0.948974609375, -0.6821060180664062, 0.45849609375, 0.8815879821777344, -4.2654266357421875, -0.5171890258789062, 2.7830886840820312, 0.9399642944335938, 0.5623245239257812, 0.9482021331787109, 7.543304443359375, 3.7250823974609375, 2.7516212463378906, 5.131444931030273, -1.1115303039550781], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000116.npy"}
{"epoch": 0.17535903250188964, "step": 117, "batch_size": 64, "mean": 1.6157585382461548, "std": 2.8597865104675293, "min": -4.658164978027344, "p10": -2.1272640228271484, "median": 1.301473617553711, "p90": 5.8828987121582035, "max": 6.8334808349609375, "pos_frac": 0.734375, "sample": [-1.9599151611328125, 5.940208435058594, 3.4420547485351562, 1.0219497680664062, 6.045806884765625, 0.2118377685546875, 3.399791717529297, 0.959747314453125, 3.1703567504882812, 2.3705482482910156, 0.7088623046875, -0.07954025268554688, 5.903564453125, -4.658164978027344, 1.2297210693359375, 2.5911483764648438, -3.0267181396484375, -0.9865646362304688, 4.267364501953125, 3.1756515502929688, 0.02630615234375, 4.351245880126953, 0.923736572265625, 0.462249755859375, 4.3046112060546875, -1.6330413818359375, 1.212738037109375, 2.7450714111328125, 2.7837867736816406, 6.2220458984375, 1.1959381103515625, 5.586891174316406, 6.742763519287109, 0.6442947387695312, 3.3807525634765625, -0.12528228759765625, 1.0793685913085938, 5.834678649902344, -0.22058486938476562, 4.228660583496094, 0.9104766845703125, 1.4066429138183594, 3.00592041015625, 6.100086212158203, -2.1666946411132812, -4.088470458984375, 1.3732261657714844, 0.6805992126464844, 4.785697937011719, 1.9308280944824219, 0.6158943176269531, 1.52020263671875, 4.670879364013672, -2.5661544799804688, 6.8334808349609375, -3.522106170654297, -2.035259246826172, 3.1920852661132812, -4.0108642578125, -1.1883697509765625, 2.0079193115234375, 1.9059982299804688, -0.9263343811035156, -0.5010795593261719], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000117.npy"}
{"epoch": 0.17687074829931973, "step": 118, "batch_size": 64, "mean": 1.2826839685440063, "std": 2.948547124862671, "min": -6.454677581787109, "p10": -2.721894836425781, "median": 1.4751243591308594, "p90": 5.078329849243165, "max": 7.95965576171875, "pos_frac": 0.6875, "sample": [0.0130157470703125, 0.19600677490234375, 0.31037139892578125, 1.5158538818359375, -6.454677581787109, -2.8970794677734375, 0.09771728515625, 3.3545913696289062, 4.930553436279297, -1.1969108581542969, 4.126865386962891, 1.545623779296875, -0.8384552001953125, 3.6838760375976562, 0.5074806213378906, -2.5057830810546875, 2.2283096313476562, 3.8179473876953125, 0.19618988037109375, 0.12748336791992188, -0.6015777587890625, 1.72698974609375, 3.62762451171875, 2.611297607421875, 2.8535079956054688, -2.81451416015625, -2.341278076171875, -1.7150192260742188, 2.756195068359375, -0.10930633544921875, 2.6499404907226562, 1.8159828186035156, 5.696922302246094, -1.8866729736328125, 1.4343948364257812, 3.5410308837890625, 0.7922477722167969, 2.2743911743164062, 4.8150787353515625, -3.8401947021484375, -0.171142578125, 4.488410949707031, 2.0166473388671875, 7.95965576171875, 0.8341140747070312, -3.000396728515625, -3.7327957153320312, 5.463470458984375, 6.048675537109375, 5.955146789550781, 4.7691802978515625, -4.238677978515625, 3.10418701171875, 2.4293365478515625, 0.23630142211914062, -0.6400337219238281, -1.6424560546875, 2.415752410888672, 3.427114486694336, 5.14166259765625, 5.913055419921875, -0.48543548583984375, 0.38492774963378906, -0.6309432983398438], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000118.npy"}
{"epoch": 0.17838246409674982, "step": 119, "batch_size": 64, "mean": 1.3526442050933838, "std": 3.6152901649475098, "min": -7.63720703125, "p10": -2.6264892578125, "median": 1.0433635711669922, "p90": 6.465339660644532, "max": 11.316162109375, "pos_frac": 0.59375, "sample": [-2.2942161560058594, 8.452648162841797, -1.7869453430175781, -2.610942840576172, -4.6102752685546875, -1.3089752197265625, 2.703563690185547, 1.433349609375, -1.305267333984375, -0.5059661865234375, -3.171396255493164, -0.7522811889648438, -0.4340972900390625, 7.370948791503906, 4.883388519287109, -7.63720703125, -1.8513031005859375, -2.8246002197265625, 1.5257492065429688, 4.8248443603515625, -1.9813232421875, 2.481861114501953, 0.43878173828125, 0.9482955932617188, 6.856010437011719, 2.8918685913085938, -3.417449951171875, 5.340244293212891, 6.1762542724609375, 6.5892333984375, 4.8790740966796875, 3.0098533630371094, 1.7630348205566406, 3.0333099365234375, 5.100006103515625, 11.316162109375, 1.1384315490722656, -1.2055206298828125, 0.6544265747070312, 5.611808776855469, 7.524120330810547, 1.5707550048828125, -2.6331520080566406, 1.4909210205078125, -1.63531494140625, -2.1958541870117188, 5.5357208251953125, 4.076499938964844, 2.9368438720703125, -1.5313873291015625, -0.105010986328125, -2.9948272705078125, -0.3192291259765625, -1.7036819458007812, 0.0767059326171875, 1.4518356323242188, 3.6939125061035156, 7.352447509765625, 0.5761489868164062, 2.4053497314453125, -1.072998046875, 1.301116943359375, 0.17494964599609375, -1.1320266723632812], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000119.npy"}
{"epoch": 0.17989417989417988, "step": 120, "batch_size": 64, "mean": 1.9806838035583496, "std": 2.963226556777954, "min": -4.524566650390625, "p10": -1.073977279663086, "median": 1.6718311309814453, "p90": 5.9234664916992195, "max": 9.922237396240234, "pos_frac": 0.703125, "sample": [2.699188232421875, 0.191925048828125, -1.0130653381347656, 9.922237396240234, 1.142364501953125, -0.25518035888671875, 4.34423828125, -0.1448211669921875, 1.7015876770019531, -3.9383010864257812, 7.8459320068359375, 0.9727935791015625, 3.2904701232910156, -0.1695709228515625, 2.4943161010742188, 5.9788055419921875, 5.111541748046875, -0.3426475524902344, 2.9073410034179688, 0.8761367797851562, 2.1245803833007812, 0.9081497192382812, 2.7014389038085938, 3.236896514892578, -1.2995147705078125, 1.7202644348144531, 6.769477844238281, -1.4964065551757812, 2.6852645874023438, 4.1509552001953125, 5.794342041015625, -4.524566650390625, 9.35052490234375, -0.41332054138183594, 4.4195556640625, -0.9346771240234375, 2.810924530029297, -0.5613670349121094, 4.78118896484375, 6.405342102050781, 1.0004825592041016, 1.6420745849609375, -0.1322479248046875, 1.64056396484375, 1.761301040649414, -3.2237472534179688, -0.2530021667480469, 2.99383544921875, 3.3573074340820312, 1.1442031860351562, 0.14959716796875, 0.9506320953369141, 4.842140197753906, -0.88726806640625, 2.501300811767578, -1.1000823974609375, 6.6820831298828125, 5.310577392578125, -0.06429672241210938, -2.9496536254882812, 1.5955047607421875, 1.2470512390136719, 4.305742263793945, 2.005321502685547], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000120.npy"}
{"epoch": 0.18140589569160998, "step": 121, "batch_size": 64, "mean": 1.469817876815796, "std": 3.3027284145355225, "min": -5.19671630859375, "p10": -2.5553268432617187, "median": 1.018075942993164, "p90": 5.912866210937503, "max": 9.983612060546875, "pos_frac": 0.640625, "sample": [3.5028762817382812, -5.19671630859375, 0.3118743896484375, -3.0799407958984375, 3.563922882080078, 8.287811279296875, 2.7702102661132812, -3.420534133911133, 2.1150360107421875, 4.807273864746094, -0.02904510498046875, -1.7644271850585938, 9.983612060546875, 3.266935348510742, -1.72552490234375, 0.7092018127441406, 3.7648887634277344, 0.684539794921875, 4.3090972900390625, -2.6793289184570312, 7.8127288818359375, -0.20538330078125, 3.839111328125, 3.9524307250976562, 4.049476623535156, -1.9697723388671875, 2.3867759704589844, -2.6748809814453125, 1.645050048828125, -0.17669677734375, -0.469146728515625, 8.3228759765625, -0.9755058288574219, 1.0032501220703125, 1.8640975952148438, 1.1970672607421875, -1.796875, 6.28680419921875, 2.3919219970703125, -0.5318450927734375, -0.4655933380126953, 0.5683536529541016, -1.199493408203125, 3.29400634765625, -1.4658355712890625, 2.3827972412109375, -4.147163391113281, 8.691688537597656, 0.2046356201171875, 1.0329017639160156, 1.5659122467041016, 0.8930091857910156, 3.9625396728515625, 1.7355155944824219, -2.0406646728515625, 7.5203857421875, -2.2763671875, 1.2032127380371094, 0.9806175231933594, 0.9334602355957031, -0.070465087890625, 5.04034423828125, 2.81805419921875, -3.226755142211914], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000121.npy"}
{"epoch": 0.18291761148904007, "step": 122, "batch_size": 64, "mean": 1.4627649784088135, "std": 3.547473192214966, "min": -4.682548522949219, "p10": -2.7147041320800778, "median": 0.89471435546875, "p90": 6.312227630615237, "max": 11.100494384765625, "pos_frac": 0.671875, "sample": [10.5267333984375, -3.837718963623047, -4.1893157958984375, 2.5503921508789062, 0.95166015625, -1.1538314819335938, 9.993408203125, 4.27374267578125, -3.7594985961914062, 4.840789794921875, 2.355602264404297, -1.3585128784179688, -2.7949867248535156, 0.8868255615234375, 0.1904125213623047, 11.100494384765625, -1.3078460693359375, 0.421539306640625, -3.3474807739257812, 4.020599365234375, 3.963369369506836, 5.259914398193359, 0.6464462280273438, 2.1944923400878906, 6.603996276855469, 6.703216552734375, -2.5273780822753906, -1.7740440368652344, 2.3664169311523438, 5.650215148925781, -2.37841796875, 3.5769805908203125, 1.0901508331298828, 0.7596435546875, -4.682548522949219, 2.0836448669433594, 1.149026870727539, 0.12957763671875, 2.163524627685547, -0.6037864685058594, 0.3959503173828125, 0.5259323120117188, -2.1365814208984375, 6.595947265625, -2.9785614013671875, 3.4982986450195312, 0.611663818359375, -0.23944854736328125, -0.7591285705566406, 0.9026031494140625, -0.3719215393066406, -0.9139366149902344, 2.7012195587158203, 1.1673049926757812, 0.3128089904785156, 0.09996414184570312, 1.2384109497070312, -0.5572776794433594, -0.701690673828125, 3.8068313598632812, 2.8350753784179688, 2.7494850158691406, 1.440582275390625, 10.655975341796875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000122.npy"}
{"epoch": 0.18442932728647016, "step": 123, "batch_size": 64, "mean": 1.6930320262908936, "std": 3.171293258666992, "min": -5.2956085205078125, "p10": -2.420412063598633, "median": 1.4670391082763672, "p90": 5.59511947631836, "max": 12.74737548828125, "pos_frac": 0.734375, "sample": [3.565937042236328, 7.147258758544922, 0.901580810546875, 12.74737548828125, -1.4725112915039062, -2.8314361572265625, 3.951202392578125, 2.0224609375, 5.185764312744141, 2.0622711181640625, 0.8782424926757812, 5.004791259765625, 2.0535202026367188, -1.3916015625, 3.119384765625, 1.8825454711914062, 3.1259841918945312, 4.478483200073242, 0.9425315856933594, 5.6512298583984375, 1.2192230224609375, 2.40673828125, 2.429790496826172, 3.0894832611083984, 3.7224807739257812, 5.464195251464844, 1.5934028625488281, -1.1090850830078125, 2.834442138671875, 3.8667564392089844, 1.3228588104248047, 3.967529296875, -5.2956085205078125, 5.986728668212891, -1.3762702941894531, -4.577606201171875, -3.2018966674804688, -2.2332115173339844, 0.09725570678710938, 0.8534622192382812, 4.044189453125, -0.2229766845703125, 2.890291213989258, -2.7956008911132812, 2.4156150817871094, 7.438865661621094, -0.8188343048095703, -2.6418418884277344, 0.454132080078125, 0.17169189453125, -1.3498458862304688, 1.1428146362304688, 0.22027587890625, 1.3340911865234375, -2.500640869140625, 7.323516845703125, -1.105703353881836, 0.5571708679199219, -0.7094879150390625, 1.3406753540039062, 1.9653444290161133, 2.4665985107421875, 6.090606689453125, 0.5574188232421875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000123.npy"}
{"epoch": 0.18594104308390022, "step": 124, "batch_size": 64, "mean": 1.9345088005065918, "std": 3.8106677532196045, "min": -6.007617950439453, "p10": -3.18485107421875, "median": 1.738494873046875, "p90": 7.356343841552736, "max": 13.0068359375, "pos_frac": 0.734375, "sample": [-0.14287567138671875, 9.1988525390625, 0.7856292724609375, -4.461391448974609, 2.051158905029297, 2.473114013671875, 7.7855072021484375, 2.7320404052734375, 10.245697021484375, 13.0068359375, 3.0830535888671875, 3.9590606689453125, -0.29756736755371094, 8.247543334960938, 1.067500114440918, 2.848949432373047, -1.7638320922851562, 2.066488265991211, -0.23400115966796875, -1.9178905487060547, -3.057403564453125, 3.1640167236328125, -1.3919143676757812, 3.4584121704101562, 0.33542633056640625, 2.7369003295898438, 0.4733867645263672, 2.9445724487304688, -1.4468841552734375, -3.239471435546875, 5.978950500488281, 5.671417236328125, 2.2127914428710938, 1.2636871337890625, 5.661918640136719, 7.49639892578125, 7.029548645019531, -5.6528167724609375, 1.71038818359375, 4.1542816162109375, -3.5027008056640625, 2.9494476318359375, 0.813751220703125, 2.8972244262695312, 1.2636489868164062, 3.6067123413085938, 1.5891189575195312, 1.2841567993164062, -3.2986793518066406, 2.226520538330078, 9.642478942871094, 0.6049346923828125, 0.11144065856933594, 0.48777008056640625, -3.8741455078125, 0.47727203369140625, 4.1436004638671875, -0.810089111328125, -6.007617950439453, 5.24041748046875, -1.5613632202148438, 3.2313385009765625, 0.28924560546875, 1.7666015625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000124.npy"}
{"epoch": 0.1874527588813303, "step": 125, "batch_size": 64, "mean": 2.159012794494629, "std": 3.643646717071533, "min": -7.4067535400390625, "p10": -2.2317955017089846, "median": 2.010997772216797, "p90": 6.073717117309571, "max": 12.564788818359375, "pos_frac": 0.765625, "sample": [-0.92681884765625, 3.2801132202148438, 5.4120941162109375, 1.909027099609375, 2.9698028564453125, -0.6783580780029297, 4.4774627685546875, 9.465421676635742, 6.8955078125, 1.2188262939453125, 9.334381103515625, -2.1809158325195312, 0.19324111938476562, 1.1104583740234375, -0.12005233764648438, 6.151943206787109, 5.476600646972656, 1.1584320068359375, 0.5977897644042969, -1.5268211364746094, 5.584903717041016, 5.853858947753906, 2.2053451538085938, 2.9773712158203125, -1.3646697998046875, 2.454803466796875, 0.6273880004882812, -2.233551025390625, -4.3609619140625, 0.340240478515625, 4.4712982177734375, 1.6030654907226562, 0.8866081237792969, 5.857513427734375, 5.492275238037109, -3.850841522216797, 2.3217334747314453, 1.569427490234375, 6.186347961425781, 12.564788818359375, 4.4599456787109375, 1.9306182861328125, 7.370429992675781, 4.492431640625, 5.165866851806641, -2.2276992797851562, 2.0913772583007812, 3.8121070861816406, 0.30880165100097656, 1.2725753784179688, 3.5091400146484375, 1.6324386596679688, 4.107078552246094, 1.320404052734375, -1.9319610595703125, 5.8911895751953125, 3.229084014892578, -2.9094009399414062, 2.455657958984375, 0.5486011505126953, 3.6915740966796875, -2.6536521911621094, -5.3881072998046875, -7.4067535400390625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000125.npy"}
{"epoch": 0.1889644746787604, "step": 126, "batch_size": 64, "mean": 1.6584042310714722, "std": 4.3218302726745605, "min": -6.53131103515625, "p10": -3.200392150878906, "median": 0.8564224243164062, "p90": 7.848157501220704, "max": 11.72900390625, "pos_frac": 0.609375, "sample": [9.982154846191406, -1.1157894134521484, 4.819018363952637, 1.3713150024414062, 1.1228351593017578, 7.7137298583984375, 6.1055145263671875, 0.5142784118652344, 1.5498046875, -1.656332015991211, 8.522443771362305, 6.2744598388671875, -6.53131103515625, 0.814971923828125, 10.974189758300781, 2.907062530517578, -2.7851486206054688, -2.0719985961914062, 0.6572895050048828, -2.9998321533203125, 6.76873779296875, 6.431610107421875, -0.9929265975952148, 2.551206588745117, -2.4298152923583984, 7.015514373779297, 3.382293701171875, -4.062019348144531, 10.9788818359375, -0.19484329223632812, -3.286346435546875, -4.56182861328125, -2.186676025390625, 0.7227554321289062, -1.9915351867675781, 1.172027587890625, 0.14649581909179688, 2.2358551025390625, 0.7604694366455078, -6.071952819824219, 5.248847961425781, -1.66070556640625, 2.9093475341796875, -0.6497650146484375, 9.517410278320312, -0.633544921875, 1.8388137817382812, 11.72900390625, 1.4246444702148438, 2.7069015502929688, -1.0848617553710938, 2.5112228393554688, 3.1046295166015625, 0.2653846740722656, -0.982696533203125, -0.826263427734375, 0.8978729248046875, 7.165336608886719, -1.6548385620117188, -3.7769546508789062, -0.6049652099609375, 7.905769348144531, -4.074703216552734, 2.305419921875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000126.npy"}
{"epoch": 0.19047619047619047, "step": 127, "batch_size": 64, "mean": 2.264709234237671, "std": 3.3415474891662598, "min": -5.1973114013671875, "p10": -1.4015518188476563, "median": 2.0950698852539062, "p90": 6.441313171386719, "max": 11.090187072753906, "pos_frac": 0.78125, "sample": [6.3893280029296875, 9.158843994140625, 3.0770263671875, 5.4945068359375, -1.4030075073242188, 8.662765502929688, -1.6421966552734375, 3.7065696716308594, 1.2841567993164062, 4.34539794921875, 7.5041961669921875, 0.55804443359375, 0.7828559875488281, 5.492527008056641, 0.19724273681640625, 0.31927490234375, 4.054450988769531, 2.4934730529785156, 0.05362701416015625, 2.1688308715820312, 6.463592529296875, -5.1973114013671875, 1.8828811645507812, 2.938304901123047, 2.508869171142578, -4.2863006591796875, -0.8506927490234375, 0.052265167236328125, 2.3117523193359375, 0.3131599426269531, -1.3981552124023438, 4.250999450683594, 4.322422027587891, 2.057830810546875, -0.18566131591796875, 4.679109573364258, 0.8538990020751953, 6.473304748535156, 4.452384948730469, 11.090187072753906, -4.884670257568359, 4.331111907958984, 4.1273651123046875, -1.1600494384765625, -2.4340896606445312, 6.214393615722656, 1.4978790283203125, -0.722259521484375, 5.772666931152344, 0.2183837890625, -1.2805557250976562, 4.981781005859375, 2.89874267578125, 0.29457855224609375, 2.9103851318359375, -1.8366317749023438, 3.856048583984375, 0.09227752685546875, 2.1323089599609375, 1.7019309997558594, 1.8996200561523438, 1.0520267486572266, 8.462078094482422, -0.6146907806396484], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000127.npy"}
{"epoch": 0.19198790627362056, "step": 128, "batch_size": 64, "mean": 1.5210485458374023, "std": 4.903607368469238, "min": -10.873886108398438, "p10": -5.5426288604736325, "median": 1.6785087585449219, "p90": 7.743847656250002, "max": 12.772895812988281, "pos_frac": 0.65625, "sample": [4.190582275390625, -2.701324462890625, 1.7997703552246094, 1.592071533203125, 1.821868896484375, 1.8207550048828125, -1.0487213134765625, 9.15740966796875, -0.540374755859375, 8.0682373046875, 0.6680793762207031, -10.873886108398438, 3.5722885131835938, 0.8346138000488281, 6.984312057495117, -6.871868133544922, 1.37841796875, 7.152565002441406, -5.7697906494140625, 10.868942260742188, 12.65301513671875, -1.31988525390625, 2.1640892028808594, 7.437042236328125, 11.550735473632812, 0.2782135009765625, -1.0698318481445312, 7.875335693359375, -5.8336181640625, 3.4364280700683594, 3.459564208984375, 0.181640625, -1.2112693786621094, 1.7649459838867188, 4.0372161865234375, 1.2647705078125, -2.2162628173828125, 4.841381072998047, -8.995506286621094, 4.330158233642578, -2.2493743896484375, 2.0479812622070312, 1.8382453918457031, 0.12146759033203125, -5.012584686279297, 4.230644226074219, 1.9297561645507812, -0.5665512084960938, 2.8510513305664062, -0.36412715911865234, -7.423980712890625, 5.315887451171875, 4.2499847412109375, -0.239501953125, 2.4497528076171875, -1.1482486724853516, 1.0367050170898438, 3.304229736328125, 12.772895812988281, -2.055696487426758, -7.544677734375, -0.445068359375, 5.145263671875, 0.370941162109375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000128.npy"}
{"epoch": 0.19349962207105065, "step": 129, "batch_size": 64, "mean": 1.7051732540130615, "std": 4.616244792938232, "min": -8.491943359375, "p10": -4.134046936035157, "median": 1.5831871032714844, "p90": 7.561157608032227, "max": 13.441070556640625, "pos_frac": 0.625, "sample": [-2.4122772216796875, -2.1862411499023438, 5.535207748413086, 0.727294921875, -1.1227760314941406, 8.458015441894531, 4.4180145263671875, -4.150215148925781, 5.00946044921875, 8.130435943603516, 7.097263336181641, 5.23785400390625, 1.19091796875, -1.0533828735351562, 0.8199005126953125, 1.8767776489257812, 7.324760437011719, -1.7162914276123047, -0.20359039306640625, 5.309822082519531, 5.46368408203125, -8.491943359375, 0.9498748779296875, 5.700050354003906, -1.6094856262207031, 13.441070556640625, 0.3072776794433594, -2.6574554443359375, 3.2145957946777344, 9.05377197265625, 1.1963958740234375, -0.172821044921875, -8.10748291015625, 4.414907455444336, 3.16973876953125, 5.868915557861328, -6.38983154296875, -4.3662567138671875, -2.1575927734375, 6.875030517578125, 2.2869338989257812, -1.717315673828125, -2.057384490966797, 0.8902130126953125, 7.6053924560546875, 3.5871448516845703, -4.096321105957031, 7.457942962646484, 2.8751373291015625, 9.439422607421875, 1.4578323364257812, 4.752616882324219, 8.511299133300781, -0.04905223846435547, 1.8462333679199219, -5.368125915527344, -1.7274742126464844, -7.148521423339844, 3.25933837890625, -1.4754142761230469, 2.1269378662109375, 1.7085418701171875, -2.6049747467041016, 3.5772857666015625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000129.npy"}
{"epoch": 0.19501133786848074, "step": 130, "batch_size": 64, "mean": 2.0208921432495117, "std": 4.105181694030762, "min": -8.779983520507812, "p10": -2.9466873168945313, "median": 1.670511245727539, "p90": 7.0533786773681655, "max": 11.55914306640625, "pos_frac": 0.65625, "sample": [2.3604278564453125, 2.0245437622070312, 5.5057220458984375, 11.035202026367188, 1.0783233642578125, -2.448211669921875, 2.289886474609375, 3.976350784301758, -1.0437374114990234, 0.9777660369873047, -1.35626220703125, 6.114189147949219, 5.470176696777344, 1.6910400390625, -2.9399795532226562, 8.358673095703125, -1.531646728515625, -3.7798843383789062, -2.9495620727539062, 10.63055419921875, 2.1259212493896484, -0.9375839233398438, -1.1187057495117188, 7.198295593261719, -0.06414794921875, -4.023460388183594, 1.58441162109375, 0.4121551513671875, 1.6499824523925781, -1.6598663330078125, 5.2095947265625, 3.001617431640625, 9.368209838867188, 5.560720443725586, 3.0048446655273438, 7.159976959228516, 6.804649353027344, -0.98236083984375, 11.55914306640625, 2.6186599731445312, -1.5200881958007812, 3.00775146484375, 1.456939697265625, -3.0327796936035156, -1.014678955078125, 2.3410911560058594, 2.6005210876464844, 1.0519180297851562, -0.5859756469726562, -4.297092437744141, -5.202213287353516, 0.7118473052978516, 2.973480224609375, 6.168163299560547, -8.779983520507812, 6.2536163330078125, 6.0125885009765625, -0.12897682189941406, 5.1710662841796875, 0.47711181640625, 4.133869171142578, -0.5723667144775391, 1.549367904663086, 6.6262969970703125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000130.npy"}
{"epoch": 0.1965230536659108, "step": 131, "batch_size": 64, "mean": 2.601199150085449, "std": 4.626436710357666, "min": -8.722579956054688, "p10": -2.715761184692383, "median": 1.861750602722168, "p90": 9.290518188476566, "max": 14.317733764648438, "pos_frac": 0.734375, "sample": [0.9657974243164062, 4.29632568359375, -2.4386234283447266, 0.5721931457519531, -2.781787872314453, 14.317733764648438, 2.20361328125, 3.4246387481689453, -0.09151077270507812, 11.914382934570312, 3.49139404296875, 0.7827510833740234, -0.7423820495605469, 0.8276214599609375, 5.738010406494141, 4.152097702026367, 7.6037750244140625, -1.7922210693359375, 1.8411407470703125, -8.722579956054688, -2.5616989135742188, 2.2774505615234375, 13.468734741210938, 0.6260757446289062, 2.2334136962890625, -3.2121505737304688, -1.8418750762939453, -0.7520904541015625, -4.924678802490234, -2.8536453247070312, 2.35626220703125, 1.3833465576171875, 5.954559326171875, 1.8823604583740234, 8.207275390625, 7.4379425048828125, 0.647308349609375, 8.351837158203125, 1.4510555267333984, 5.5383758544921875, 2.9129905700683594, 9.69281005859375, -2.2810897827148438, 7.2377471923828125, 4.830921173095703, -3.1957550048828125, 1.5786399841308594, -4.074333190917969, 0.46625518798828125, 9.72967529296875, 1.1478919982910156, 0.1771392822265625, 7.186532974243164, -2.2715110778808594, -2.2248916625976562, 10.529457092285156, 2.3027477264404297, 1.5640792846679688, 3.492828369140625, 4.816875457763672, 1.703155517578125, 4.192712783813477, 10.609169006347656, 5.120464324951172], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000131.npy"}
{"epoch": 0.1980347694633409, "step": 132, "batch_size": 64, "mean": 1.8213005065917969, "std": 4.031142234802246, "min": -8.337738037109375, "p10": -2.3842391967773433, "median": 1.8678665161132812, "p90": 5.948524856567383, "max": 13.587600708007812, "pos_frac": 0.734375, "sample": [3.6294097900390625, 1.0157470703125, 3.9441452026367188, 3.5393753051757812, -1.3952865600585938, -1.8646011352539062, 5.961128234863281, 1.5058374404907227, -7.077690124511719, 6.58148193359375, 3.0459938049316406, 1.951822280883789, 2.7722625732421875, 2.2912940979003906, -1.0831832885742188, 1.6797409057617188, -0.15069961547851562, 7.12261962890625, 4.2953033447265625, 3.0154190063476562, -7.823249816894531, -8.337738037109375, 6.489559173583984, -1.283721923828125, -0.517730712890625, -2.0961151123046875, 4.6250762939453125, 1.1981163024902344, 5.919116973876953, -1.7389335632324219, 11.192695617675781, 0.8295269012451172, -7.922554016113281, 4.96441650390625, 3.22216796875, 3.365020751953125, 5.3304443359375, 3.6767044067382812, 5.090204238891602, 2.51043701171875, 0.998382568359375, 0.43520355224609375, 0.9730224609375, 11.296783447265625, 1.8676300048828125, -0.126068115234375, 3.5256118774414062, 4.8439788818359375, 0.7496147155761719, 2.1199722290039062, 3.599620819091797, -0.7783012390136719, 1.325042724609375, 1.86810302734375, 2.1221799850463867, 1.6664581298828125, 13.587600708007812, -2.8715667724609375, 1.68017578125, 0.8782196044921875, 0.46356201171875, 2.170928955078125, -2.507720947265625, -2.79876708984375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000132.npy"}
{"epoch": 0.19954648526077098, "step": 133, "batch_size": 64, "mean": 2.0570218563079834, "std": 4.423807621002197, "min": -7.3202056884765625, "p10": -3.7541925430297853, "median": 1.7574939727783203, "p90": 6.768586349487306, "max": 14.300865173339844, "pos_frac": 0.65625, "sample": [8.171310424804688, -6.8653717041015625, 1.1626739501953125, 4.842472076416016, -0.2741374969482422, -6.791290283203125, 2.9600257873535156, 12.015430450439453, -0.17563629150390625, 4.201488494873047, -0.5130386352539062, 5.11590576171875, 5.69670295715332, 2.3803176879882812, -2.3419570922851562, 2.288818359375, 7.584358215332031, -7.3202056884765625, 3.342662811279297, 5.27410888671875, -3.8464508056640625, 5.6276092529296875, -1.0729293823242188, 5.574819564819336, 1.9558525085449219, -4.2771148681640625, 2.2981719970703125, -2.568115234375, 5.892112731933594, -0.19240856170654297, 5.179134368896484, 0.42409515380859375, 2.960205078125, 8.111822128295898, -0.48250579833984375, -0.2362060546875, -5.018733978271484, 6.912162780761719, 5.945993423461914, 6.181854248046875, -3.7158966064453125, 0.222808837890625, 1.1270599365234375, -1.3478240966796875, 1.5069694519042969, 10.043792724609375, 4.6613006591796875, 5.1332855224609375, -2.1826934814453125, 14.300865173339844, 0.1197357177734375, 1.3311386108398438, -3.7706050872802734, 1.3709564208984375, 5.6295928955078125, 0.35921478271484375, 3.8411483764648438, 6.433574676513672, 5.996986389160156, -0.6507186889648438, 3.3699569702148438, -0.2322540283203125, -3.5821304321289062, 1.5591354370117188], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000133.npy"}
{"epoch": 0.20105820105820105, "step": 134, "batch_size": 64, "mean": 2.98215389251709, "std": 4.885471343994141, "min": -6.2818603515625, "p10": -3.780097961425781, "median": 2.8133506774902344, "p90": 7.5396064758300785, "max": 19.458099365234375, "pos_frac": 0.734375, "sample": [-4.3689422607421875, 2.0169219970703125, -0.8068161010742188, 4.521846771240234, 1.7516288757324219, 1.55621337890625, 2.8946304321289062, -1.1322784423828125, 4.152462005615234, 6.4165191650390625, 3.4383773803710938, 4.543079376220703, 3.9558792114257812, 1.34033203125, -0.08804702758789062, 2.768707275390625, 7.623695373535156, 4.662864685058594, -4.0620574951171875, 6.627170562744141, 8.511001586914062, 10.09503173828125, 5.967338562011719, -4.873283386230469, 1.7045822143554688, 6.235237121582031, -0.87744140625, 4.064231872558594, -0.27871227264404297, 2.4541015625, -4.145069122314453, 0.07700538635253906, 5.714935302734375, -3.897125244140625, 1.7060508728027344, -2.126260757446289, 7.3433990478515625, 2.8712692260742188, 2.3481292724609375, 4.228107452392578, 19.458099365234375, 5.206846237182617, -3.5070343017578125, 4.4570770263671875, -0.66455078125, 3.898479461669922, 7.0501251220703125, -6.2818603515625, 11.361892700195312, 1.5779571533203125, 3.022571563720703, 2.047149658203125, -0.3557472229003906, 1.8021087646484375, 15.774871826171875, -4.720127105712891, 3.9434242248535156, 2.8579940795898438, 1.7732658386230469, 4.6817169189453125, 6.812744140625, 16.31694793701172, 0.15170669555664062, -0.7425384521484375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000134.npy"}
{"epoch": 0.20256991685563114, "step": 135, "batch_size": 64, "mean": 2.0684075355529785, "std": 4.323760986328125, "min": -11.78774642944336, "p10": -1.5697383880615234, "median": 1.937368392944336, "p90": 7.69840545654297, "max": 14.905372619628906, "pos_frac": 0.6875, "sample": [3.2375411987304688, -0.742462158203125, 0.5539474487304688, 4.255575180053711, 2.551942825317383, -0.08085155487060547, -4.086296081542969, -0.49597930908203125, 2.4091339111328125, 9.147369384765625, 0.09149169921875, -5.628570556640625, 3.2351455688476562, 0.2260284423828125, 3.5026473999023438, -6.025875091552734, 8.653007507324219, 0.0712890625, -1.58428955078125, 4.008003234863281, 2.2004356384277344, -0.5092315673828125, -11.78774642944336, 7.46533203125, 2.1693115234375, 1.108612060546875, 4.926372528076172, 2.7800750732421875, 1.7054252624511719, 4.7441864013671875, 1.2589225769042969, -3.005115509033203, 7.128395080566406, 2.4200057983398438, 6.8286590576171875, -0.26421356201171875, 0.8563156127929688, -0.2628021240234375, 4.922443389892578, 0.3253288269042969, -0.72802734375, 7.92742919921875, -0.47173309326171875, 3.5526199340820312, 3.656888961791992, 7.222679138183594, 1.6296157836914062, -0.3121376037597656, 2.697967529296875, 1.2967071533203125, 1.1713333129882812, -6.703277587890625, -1.3080062866210938, 3.3289718627929688, 7.7982940673828125, 10.251842498779297, -1.1456623077392578, 2.336641311645508, -1.5357856750488281, 8.351577758789062, 4.1273651123046875, 6.6641845703125, 14.905372619628906, -0.6462974548339844], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000135.npy"}
{"epoch": 0.20408163265306123, "step": 136, "batch_size": 64, "mean": 2.6320486068725586, "std": 6.181413650512695, "min": -9.821624755859375, "p10": -4.967971038818359, "median": 0.7832527160644531, "p90": 10.05734100341797, "max": 17.626480102539062, "pos_frac": 0.625, "sample": [5.1887969970703125, 7.347593307495117, -2.5348777770996094, -0.6547317504882812, -4.36920166015625, 5.8328094482421875, -9.821624755859375, -2.2235450744628906, -8.791336059570312, 7.691457748413086, -0.128021240234375, -5.224586486816406, -1.1223373413085938, -0.07783699035644531, -2.4487228393554688, 0.39658355712890625, 8.27682113647461, -0.7187156677246094, 0.4214897155761719, -1.5541114807128906, 10.631423950195312, 2.2825851440429688, 10.219284057617188, 9.679473876953125, -1.5607986450195312, 0.5870704650878906, 12.057395935058594, -1.5760040283203125, 0.90155029296875, 16.46630859375, 8.316238403320312, -0.2898292541503906, 6.5169677734375, 7.72576904296875, 9.298431396484375, -3.21478271484375, 0.330078125, 6.256443023681641, 8.702789306640625, -5.263385772705078, -6.59029483795166, 0.680694580078125, 3.4083251953125, 8.600234985351562, 2.658355712890625, 0.5317840576171875, -6.033027648925781, 5.1207733154296875, 2.2088470458984375, 17.626480102539062, 0.8858108520507812, -1.582916259765625, 5.4174957275390625, 9.300155639648438, 8.346733093261719, 17.174224853515625, -0.6246414184570312, 1.754608154296875, 6.141206741333008, 10.765544891357422, 0.17529296875, -1.4832839965820312, 0.2010040283203125, -9.785194396972656], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000136.npy"}
{"epoch": 0.20559334845049132, "step": 137, "batch_size": 64, "mean": 2.646651029586792, "std": 4.75977087020874, "min": -7.722381591796875, "p10": -3.4612064361572266, "median": 2.8731517791748047, "p90": 8.75976257324219, "max": 17.330917358398438, "pos_frac": 0.671875, "sample": [-4.426799774169922, -7.722381591796875, 0.4819145202636719, 10.056282043457031, 3.933013916015625, 5.384956359863281, 9.281803131103516, 13.24420166015625, 2.2796859741210938, 17.330917358398438, 4.676861763000488, 4.555206298828125, 8.030677795410156, -0.6558303833007812, 9.072227478027344, 3.71759033203125, -3.554595947265625, 4.186393737792969, -3.5670166015625, -2.1639938354492188, 2.018901824951172, 2.4935150146484375, 5.198310852050781, -3.243297576904297, 3.2947425842285156, 0.8238525390625, 9.975421905517578, 9.962310791015625, 5.550233840942383, 5.050590515136719, 1.456817626953125, 3.9684829711914062, -5.447212219238281, 2.831573486328125, 7.804328918457031, 2.9147300720214844, -1.441645622253418, -3.955047607421875, -0.4310150146484375, 1.3172569274902344, 1.4998626708984375, 1.1991844177246094, 0.8763656616210938, 5.3964996337890625, 6.037380218505859, -1.2113399505615234, -1.7159614562988281, 5.866031646728516, 4.276397705078125, 6.3352203369140625, -0.0046634674072265625, -1.3348655700683594, -0.5685606002807617, -5.427940368652344, 3.5205726623535156, -1.5817947387695312, -1.0811119079589844, -2.710693359375, -2.6661300659179688, 3.8815689086914062, 6.157112121582031, 6.602466583251953, 5.262969970703125, 6.493133544921875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000137.npy"}
{"epoch": 0.20710506424792138, "step": 138, "batch_size": 64, "mean": 1.9828981161117554, "std": 5.387482643127441, "min": -8.583789825439453, "p10": -3.669453144073486, "median": 1.065958023071289, "p90": 8.378413105010987, "max": 20.267547607421875, "pos_frac": 0.640625, "sample": [0.6541938781738281, 3.420318603515625, 0.46892547607421875, 3.042613983154297, 1.5643157958984375, 0.4965667724609375, 2.6665191650390625, -1.0687179565429688, -6.478973388671875, -3.857921600341797, 1.8806400299072266, -6.811981201171875, 8.479583740234375, -0.17115020751953125, 13.905319213867188, -7.7416839599609375, -0.48838043212890625, 4.866935729980469, 9.730644226074219, 1.3250999450683594, 6.236118316650391, -2.9385528564453125, 20.267547607421875, 3.4852066040039062, 0.7245216369628906, 5.8985137939453125, 5.854654312133789, -1.6438369750976562, 2.3281593322753906, -0.5988998413085938, 8.142348289489746, -3.2296934127807617, 4.9002685546875, 0.06752777099609375, -0.6775760650634766, -1.9456710815429688, 1.4921875, -8.50054931640625, -0.5515537261962891, 7.8548583984375, 4.827384948730469, 11.067024230957031, -1.93701171875, 5.2920684814453125, 0.06750106811523438, -8.583789825439453, 4.544548034667969, 0.8309097290039062, -1.830657958984375, 1.0652313232421875, -1.4829940795898438, 5.271308898925781, 5.9561004638671875, 2.3899383544921875, 0.954345703125, -5.676841735839844, -0.022579193115234375, 1.0666847229003906, -0.5096244812011719, 2.3318862915039062, 10.672645568847656, 3.490070343017578, 14.814170837402344, -0.7412872314453125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000138.npy"}
{"epoch": 0.20861678004535147, "step": 139, "batch_size": 64, "mean": 2.093411684036255, "std": 5.223027229309082, "min": -16.53033447265625, "p10": -4.931029510498046, "median": 2.6329708099365234, "p90": 8.115713500976563, "max": 11.1868896484375, "pos_frac": 0.734375, "sample": [-5.409568786621094, 8.328302383422852, 7.613311767578125, 1.1143341064453125, -5.949485778808594, 4.336875915527344, 4.3551177978515625, 5.329368591308594, 9.624198913574219, -1.859100341796875, -0.841064453125, 6.775440216064453, 3.268400192260742, -5.612981796264648, -0.5324554443359375, 1.2961578369140625, 10.365989685058594, 3.2630958557128906, 2.5080032348632812, 2.4092750549316406, 1.398681640625, -16.53033447265625, 0.33704376220703125, 3.307260513305664, 5.241432189941406, 5.248775482177734, 9.156173706054688, 1.730093002319336, 5.082618713378906, -3.8144378662109375, 2.7579383850097656, 3.2199783325195312, 3.737548828125, -8.25433349609375, 4.286277770996094, 0.79547119140625, 0.8978767395019531, 2.172504425048828, 11.1868896484375, 0.2863960266113281, 5.866180419921875, -3.469409942626953, 5.045215606689453, 0.4511985778808594, 1.722930908203125, -6.7192230224609375, 4.990203857421875, 4.8668365478515625, 8.0872802734375, 7.5200347900390625, -2.528900146484375, 5.243072509765625, 0.7535934448242188, -10.185401916503906, -1.135162353515625, 3.944194793701172, -3.3054428100585938, 6.902099609375, -2.6454858779907227, 8.127899169921875, 5.6651763916015625, 10.785011291503906, 2.239443778991699, -0.8700714111328125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000139.npy"}
{"epoch": 0.21012849584278157, "step": 140, "batch_size": 64, "mean": 3.352853775024414, "std": 5.806770324707031, "min": -11.883560180664062, "p10": -3.6173759460449215, "median": 2.5548877716064453, "p90": 11.014270782470705, "max": 15.322303771972656, "pos_frac": 0.65625, "sample": [10.76531982421875, 11.61248779296875, -0.5648994445800781, -2.115264892578125, -1.9809951782226562, 8.681961059570312, -4.366325378417969, -4.3772735595703125, 2.8177146911621094, 2.5640029907226562, -11.883560180664062, 7.9718017578125, 4.339698791503906, 2.480072021484375, 7.2278594970703125, -3.3887939453125, -1.2249984741210938, 4.78179931640625, 5.7980804443359375, 8.2017822265625, -0.5672683715820312, 9.259674072265625, 8.244953155517578, 7.614387512207031, 2.0523223876953125, 10.085678100585938, -0.4921226501464844, 11.120964050292969, 0.48659515380859375, -2.7924423217773438, -4.646240234375, 0.7117080688476562, 6.00480842590332, -4.636436462402344, -2.2258949279785156, 5.208900451660156, 2.5457725524902344, 10.589935302734375, 12.142562866210938, 11.901611328125, 9.547103881835938, 9.37990951538086, 11.357337951660156, 9.088340759277344, -0.6595840454101562, -0.8151168823242188, 14.889106750488281, -2.2662124633789062, 15.322303771972656, -3.1055984497070312, -4.0492401123046875, 6.872121810913086, 0.8525390625, 2.2187557220458984, 8.746856689453125, 0.9686908721923828, 4.404106140136719, -3.1250991821289062, 4.303459167480469, 1.3027725219726562, 3.919830322265625, 2.1611709594726562, -2.965503692626953, -3.7153396606445312], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000140.npy"}
{"epoch": 0.21164021164021163, "step": 141, "batch_size": 64, "mean": 3.231800079345703, "std": 5.925390720367432, "min": -11.5045166015625, "p10": -3.9685897827148438, "median": 3.058177947998047, "p90": 11.605976486206057, "max": 14.476123809814453, "pos_frac": 0.65625, "sample": [4.299957275390625, -5.908973693847656, 10.194778442382812, -1.1719207763671875, 8.265777587890625, 0.121185302734375, 4.4836273193359375, 11.28414535522461, 4.023458480834961, -1.50537109375, -0.4100494384765625, -3.4585723876953125, 3.8967819213867188, -5.972541809082031, 12.938636779785156, -0.49724769592285156, 2.937286376953125, 0.08469200134277344, -3.3316116333007812, -11.5045166015625, 0.19469451904296875, -4.937599182128906, -3.2308692932128906, 5.122611999511719, -0.5017242431640625, 2.2444992065429688, 5.787200927734375, 9.195465087890625, 2.99951171875, -4.737579345703125, -0.374053955078125, -3.798980712890625, 5.149436950683594, 11.743904113769531, 7.945121765136719, -4.020866394042969, 9.106391906738281, -1.114227294921875, 11.877159118652344, 14.251663208007812, 5.7774505615234375, 5.752796173095703, 4.3638153076171875, 2.8296585083007812, 14.476123809814453, 4.544731140136719, -1.3920669555664062, 7.22271728515625, 2.7835769653320312, 11.043846130371094, -3.8466110229492188, 2.0571327209472656, 7.565921783447266, 12.8621826171875, 4.8988800048828125, 12.50006103515625, -3.1424503326416016, 9.543914794921875, 3.1168441772460938, 9.606353759765625, 7.4238433837890625, -0.7167301177978516, -6.5694122314453125, 2.4613494873046875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000141.npy"}
{"epoch": 0.21315192743764172, "step": 142, "batch_size": 64, "mean": 2.2820658683776855, "std": 5.227050304412842, "min": -11.576828002929688, "p10": -2.5879289627075193, "median": 1.9832000732421875, "p90": 6.99320068359375, "max": 18.635101318359375, "pos_frac": 0.6875, "sample": [5.7573089599609375, -9.334617614746094, 0.3852663040161133, 7.057491302490234, 3.11370849609375, 13.027359008789062, 6.136207580566406, -7.001865386962891, 2.1693077087402344, 0.9162216186523438, 3.0205535888671875, 0.6295127868652344, 6.843189239501953, -5.798423767089844, 0.712554931640625, 2.4810562133789062, -1.2440872192382812, -11.576828002929688, -2.6303138732910156, 1.4524917602539062, 1.2319869995117188, 3.8209991455078125, 2.1054649353027344, -1.6350479125976562, 4.530708312988281, 10.382369995117188, -2.3678665161132812, 2.3780059814453125, 6.0810394287109375, 1.8800086975097656, 1.492218017578125, 2.480926513671875, 18.635101318359375, 4.14129638671875, -0.26446533203125, -3.9691162109375, -0.58148193359375, -2.361485481262207, 2.0066299438476562, 4.540424346923828, 13.766067504882812, -0.1588897705078125, 1.1364517211914062, -2.24029541015625, -0.9021759033203125, 1.011383056640625, 0.81085205078125, 5.907257080078125, -1.8597526550292969, 6.8271636962890625, 3.4255752563476562, -2.4890308380126953, -1.4046630859375, 14.534591674804688, 6.023708343505859, 5.810127258300781, 5.437053680419922, 6.2538299560546875, 8.841171264648438, 4.04326057434082, 2.8230972290039062, -4.100076675415039, -0.0480804443359375, 1.9597702026367188], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000142.npy"}
{"epoch": 0.2146636432350718, "step": 143, "batch_size": 64, "mean": 2.137986183166504, "std": 6.391254901885986, "min": -13.717033386230469, "p10": -4.42468490600586, "median": 1.6383628845214844, "p90": 10.302128410339357, "max": 16.148269653320312, "pos_frac": 0.59375, "sample": [0.8402252197265625, 6.643463134765625, 8.80716323852539, 8.494842529296875, 3.1969833374023438, 4.650360107421875, -0.7377471923828125, -0.4519615173339844, 12.87847900390625, -2.832752227783203, 3.326934814453125, -4.515693664550781, 16.148269653320312, -2.14959716796875, 8.307258605957031, 2.1243057250976562, -10.613601684570312, -4.328857421875, 9.350860595703125, 1.2517528533935547, -3.065521240234375, -4.980445861816406, 11.286632537841797, 10.036422729492188, 3.002105712890625, 2.1666126251220703, 0.1177825927734375, -8.802886962890625, -0.09765052795410156, -1.3709449768066406, -2.7920684814453125, 3.4086742401123047, -4.425209045410156, 0.298187255859375, 11.412322998046875, -0.5446548461914062, -3.835906982421875, -0.9247684478759766, 0.27039337158203125, 3.4330291748046875, 2.8701324462890625, -12.64971923828125, 7.556591033935547, 2.4200820922851562, 2.024972915649414, 9.144561767578125, 14.202674865722656, 8.280548095703125, -4.4234619140625, -1.45440673828125, -0.39910888671875, -2.5049381256103516, 4.936592102050781, 15.317611694335938, -13.717033386230469, 7.874851226806641, 5.957695007324219, 0.1978607177734375, 10.41600227355957, 8.232431411743164, 2.913797378540039, -0.864649772644043, -3.814708709716797, -0.6700515747070312], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000143.npy"}
{"epoch": 0.2161753590325019, "step": 144, "batch_size": 64, "mean": 1.7348195314407349, "std": 7.343339443206787, "min": -12.300506591796875, "p10": -6.899774551391602, "median": 0.1071929931640625, "p90": 11.933946228027345, "max": 21.671035766601562, "pos_frac": 0.515625, "sample": [9.81279182434082, 4.734035491943359, -11.643791198730469, -0.36029624938964844, 0.208160400390625, 4.367128372192383, 0.0062255859375, 8.956954956054688, 6.353607177734375, 1.3773727416992188, -2.8802108764648438, 14.869876861572266, -7.384742736816406, -0.8428821563720703, -2.6121063232421875, -12.300506591796875, 12.065109252929688, -0.47376251220703125, -8.048954010009766, 9.235618591308594, 0.90948486328125, -2.7711944580078125, -2.9293746948242188, 4.1816864013671875, -6.003959655761719, -3.8225250244140625, -5.713226318359375, 8.45498275756836, -2.127321243286133, -0.8560409545898438, 12.9815673828125, -4.551239013671875, -6.546428680419922, -0.9747085571289062, 7.207618713378906, -0.24454498291015625, 2.0316343307495117, 13.465812683105469, 18.884857177734375, 8.753067016601562, 3.749603271484375, 1.60369873046875, -8.02823257446289, 6.325164794921875, -6.225437164306641, -2.2878265380859375, 7.208396911621094, -5.476810455322266, 8.465644836425781, -3.2307586669921875, -0.06856536865234375, -4.6671295166015625, 3.5754051208496094, 9.956888198852539, -9.444656372070312, 13.02798843383789, 21.671035766601562, 11.627899169921875, -1.1014022827148438, -7.05120849609375, 4.721092224121094, -2.2996902465820312, 2.0846023559570312, 1.1229705810546875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000144.npy"}
{"epoch": 0.21768707482993196, "step": 145, "batch_size": 64, "mean": 2.9597392082214355, "std": 7.150273323059082, "min": -17.690216064453125, "p10": -7.1245780944824215, "median": 2.8507843017578125, "p90": 11.348302459716805, "max": 19.039581298828125, "pos_frac": 0.734375, "sample": [4.971233367919922, 4.638957977294922, 5.102451324462891, 7.5660400390625, 0.33539581298828125, -17.690216064453125, -0.031341552734375, -0.26694488525390625, 15.810230255126953, -8.583587646484375, 4.395046234130859, 3.594390869140625, 2.2077856063842773, 8.30328369140625, 2.437623977661133, 1.99237060546875, 2.3755416870117188, 14.559242248535156, -9.861648559570312, 6.353851318359375, 3.0482559204101562, 2.452239990234375, 0.9411544799804688, 8.2740478515625, 9.171424865722656, 1.2125701904296875, 1.1896743774414062, 3.79058837890625, -2.5995025634765625, -7.742931365966797, 12.28125, 2.4793930053710938, 5.377845764160156, 16.8497314453125, -12.773590087890625, 5.460850715637207, 7.414436340332031, 0.8301010131835938, 7.735660552978516, 2.6533126831054688, -6.6978607177734375, 7.580207824707031, 15.359024047851562, 6.555698394775391, -7.307456970214844, 2.6336669921875, 2.499277114868164, 6.858546257019043, -0.7066001892089844, 15.334564208984375, 7.472930908203125, 1.068338394165039, 7.450557708740234, -0.11437225341796875, -0.9158706665039062, 6.5927734375, 19.039581298828125, -2.51446533203125, -12.2662353515625, 3.5750885009765625, 7.7219696044921875, -5.726493835449219, -4.92474365234375, 4.598968505859375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000145.npy"}
{"epoch": 0.21919879062736206, "step": 146, "batch_size": 64, "mean": 2.163771629333496, "std": 6.222183704376221, "min": -13.831680297851562, "p10": -5.306139183044434, "median": 2.0608386993408203, "p90": 9.627034378051759, "max": 22.768218994140625, "pos_frac": 0.640625, "sample": [3.28338623046875, -0.2816009521484375, 0.76898193359375, 8.092544555664062, 2.6065826416015625, -6.459636688232422, 0.34679412841796875, -3.2348060607910156, 6.8817901611328125, -4.274272918701172, -10.6885986328125, -13.831680297851562, 7.20648193359375, 6.626020431518555, -6.1737518310546875, -5.310935974121094, 9.791484832763672, -3.4534072875976562, 12.537139892578125, -6.91748046875, 4.865161895751953, 6.041618347167969, -0.6897735595703125, -5.071523666381836, -5.294946670532227, 7.222785949707031, -0.353790283203125, 1.2057037353515625, -2.7445144653320312, 0.5877304077148438, 7.8859405517578125, -3.9485931396484375, 9.779037475585938, 2.67108154296875, 6.979701995849609, -0.7485694885253906, 3.0233306884765625, -1.0564346313476562, -5.51812744140625, 0.8409004211425781, 12.312088012695312, 0.9211692810058594, -1.4509639739990234, 4.101165771484375, 4.815711975097656, 9.180561065673828, 5.5681915283203125, 3.4643211364746094, 9.65822982788086, 3.8500843048095703, 5.307525634765625, -5.277557373046875, 10.727981567382812, -3.5017776489257812, 1.5150947570800781, -0.28900909423828125, 22.768218994140625, 7.49798583984375, 0.6655216217041016, 4.246849060058594, 6.047298431396484, 0.25348663330078125, 3.3532028198242188, 9.554244995117188], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000146.npy"}
{"epoch": 0.22071050642479215, "step": 147, "batch_size": 64, "mean": 1.9757896661758423, "std": 6.342806816101074, "min": -11.65673828125, "p10": -5.132260513305663, "median": 1.2271194458007812, "p90": 10.96446990966797, "max": 15.573234558105469, "pos_frac": 0.640625, "sample": [0.92755126953125, 2.5663528442382812, 11.652313232421875, 0.20360183715820312, -1.2116241455078125, 2.8390579223632812, -1.0525360107421875, -3.2630767822265625, 0.2516021728515625, -1.4181060791015625, 1.5266876220703125, 6.720603942871094, 11.092620849609375, 4.31439208984375, -2.0052032470703125, 6.794700622558594, 10.665451049804688, -11.65673828125, 8.749813079833984, -0.23891639709472656, 7.38752555847168, 4.131919860839844, -2.5218238830566406, -3.332305908203125, -2.9666671752929688, 4.490880966186523, -1.824859619140625, 3.638214111328125, 0.7484703063964844, 4.6470489501953125, -2.9613265991210938, 0.609527587890625, 0.634765625, -10.37520980834961, 7.515449523925781, 6.479194641113281, 6.694664001464844, 15.2454833984375, 13.839729309082031, -4.217601776123047, 10.358993530273438, 6.1094512939453125, -5.519050598144531, 3.7015914916992188, 2.1354541778564453, -5.248737335205078, 14.651878356933594, -2.8680152893066406, 15.573234558105469, -3.6624908447265625, -4.860481262207031, 4.205513000488281, 0.60443115234375, 2.1803359985351562, -9.2518310546875, 2.27410888671875, 0.3570899963378906, -4.078535079956055, 2.2059249877929688, -8.590179443359375, -7.889591217041016, 3.5944137573242188, 15.02838134765625, 0.1170186996459961], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000147.npy"}
{"epoch": 0.2222222222222222, "step": 148, "batch_size": 64, "mean": 3.6913676261901855, "std": 7.34673547744751, "min": -15.390975952148438, "p10": -5.052968025207519, "median": 2.856259346008301, "p90": 12.36052703857422, "max": 25.8238525390625, "pos_frac": 0.734375, "sample": [7.89202880859375, -4.181406021118164, -0.26688194274902344, -9.245462417602539, 2.4999847412109375, 11.081901550292969, 11.23562240600586, -5.426494598388672, 8.199554443359375, 12.537155151367188, -3.6997156143188477, 2.0521240234375, 6.92755126953125, 7.835834503173828, 7.380073547363281, -9.4263916015625, -1.158529281616211, 0.928558349609375, 5.875175476074219, -6.655742645263672, 2.058429718017578, -9.04193115234375, 25.8238525390625, 4.305107116699219, 1.2689228057861328, 10.114860534667969, 9.30539321899414, 1.0006332397460938, 19.67547607421875, 7.410343170166016, 5.3199005126953125, 7.44598388671875, 16.128833770751953, -3.59942626953125, 5.012748718261719, 15.624664306640625, 1.638723373413086, 2.6646957397460938, -15.390975952148438, 0.07636260986328125, 6.586952209472656, -1.9066390991210938, 3.7520980834960938, 15.236572265625, -8.320762634277344, 2.222036361694336, 6.655727386474609, 2.9632129669189453, 14.324470520019531, -3.6884384155273438, 5.45927619934082, 2.7493057250976562, 2.0419788360595703, -1.3451461791992188, 1.7653141021728516, 1.5869903564453125, 11.948394775390625, 6.0566253662109375, 2.6214256286621094, 4.1240692138671875, 4.40283203125, 10.066230773925781, -2.8925323486328125, -1.3899917602539062], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000148.npy"}
{"epoch": 0.2237339380196523, "step": 149, "batch_size": 64, "mean": 2.2720582485198975, "std": 5.917534351348877, "min": -14.534248352050781, "p10": -5.172604370117186, "median": 2.7389068603515625, "p90": 9.559907531738281, "max": 14.555608749389648, "pos_frac": 0.65625, "sample": [6.65673828125, 7.216209411621094, 0.45245361328125, -0.4567108154296875, -4.01910400390625, 2.8159103393554688, -5.666961669921875, 1.4594612121582031, -5.852901458740234, 12.561256408691406, -14.534248352050781, 3.1649436950683594, 6.112201690673828, 1.5542449951171875, -3.2157135009765625, -2.8490943908691406, 0.6586189270019531, 2.3762664794921875, 7.959442138671875, 9.435012817382812, 3.5852012634277344, 4.1055450439453125, 9.613433837890625, -2.3669967651367188, 14.387321472167969, -1.2180252075195312, 3.3878345489501953, -0.6966762542724609, 4.073152542114258, -6.803657531738281, 4.101312637329102, 4.7072906494140625, 6.592529296875, 14.555608749389648, 5.554174423217773, 13.228145599365234, 8.28244400024414, -6.887763977050781, 3.0086212158203125, -3.1472320556640625, 1.7571868896484375, 13.185226440429688, -1.9925384521484375, 11.612747192382812, 4.454010009765625, 5.3408966064453125, 0.8693389892578125, -7.274858474731445, -1.0022945404052734, 4.198127746582031, -2.325969696044922, 4.175971984863281, -2.781829833984375, -10.401611328125, -2.441082000732422, 1.1402587890625, 2.9698333740234375, 8.179664611816406, 2.6619033813476562, 4.594123840332031, -1.5076065063476562, 7.009033203125, -1.4458999633789062, 0.54681396484375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000149.npy"}
{"epoch": 0.2252456538170824, "step": 150, "batch_size": 64, "mean": 4.11046028137207, "std": 7.874888896942139, "min": -11.622798919677734, "p10": -6.8282737731933585, "median": 4.380529403686523, "p90": 14.688089370727544, "max": 21.229339599609375, "pos_frac": 0.71875, "sample": [7.081966400146484, 2.063457489013672, 19.640289306640625, 3.3951263427734375, -9.332763671875, 2.0476531982421875, 4.261333465576172, 0.6949615478515625, -2.8836669921875, 6.258354187011719, -0.94183349609375, 10.60760498046875, -7.443084716796875, -7.3289031982421875, -1.824625015258789, 5.884059906005859, 1.203826904296875, 17.48089599609375, -11.100341796875, -7.062568664550781, 11.759201049804688, 0.09992218017578125, 19.492156982421875, -11.332275390625, 0.349761962890625, 7.8026275634765625, 5.780826568603516, 4.782844543457031, 9.839393615722656, 15.159969329833984, 3.91741943359375, 9.886383056640625, 12.552940368652344, 6.730525970458984, 6.698875427246094, -4.053203582763672, 3.271881103515625, 4.69108772277832, 10.4798583984375, 15.791519165039062, 4.499725341796875, 11.337860107421875, -6.281585693359375, 2.3797149658203125, 19.18990707397461, -4.9234771728515625, 4.660728454589844, 7.361848831176758, -11.622798919677734, 21.229339599609375, 6.4221038818359375, -0.7177047729492188, 8.282577514648438, -3.9805221557617188, 1.0773658752441406, 7.314186096191406, -2.3303070068359375, -1.68975830078125, 13.5870361328125, -3.4005813598632812, 3.0552978515625, 9.207572937011719, 1.4686279296875, 10.538848876953125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000150.npy"}
{"epoch": 0.22675736961451248, "step": 151, "batch_size": 64, "mean": 4.3620452880859375, "std": 6.772552490234375, "min": -10.282501220703125, "p10": -2.385018157958984, "median": 3.793376922607422, "p90": 11.310577392578125, "max": 25.279876708984375, "pos_frac": 0.6875, "sample": [5.882476806640625, 11.967391967773438, 7.820070266723633, -0.3186836242675781, 6.163238525390625, 0.5858192443847656, 22.893890380859375, -1.755523681640625, -1.139923095703125, -1.0788688659667969, 4.056224822998047, -0.6254501342773438, 4.8063507080078125, -10.282501220703125, 7.206510543823242, 7.025712966918945, 11.37405776977539, -2.4250335693359375, 3.1001625061035156, -1.6057014465332031, -1.8606796264648438, -3.0397262573242188, 11.107406616210938, -0.0512237548828125, 0.2547760009765625, -6.307037353515625, 4.680700302124023, 7.695533752441406, -1.868551254272461, -2.2916488647460938, 5.608267784118652, -0.5417861938476562, -1.4250946044921875, 9.89874267578125, 2.575592041015625, 0.08538818359375, 3.530529022216797, -4.652442932128906, 0.6137008666992188, 25.279876708984375, 1.5493545532226562, 7.612091064453125, 2.7694625854492188, 10.132745742797852, 18.665740966796875, 17.415077209472656, 10.51742935180664, 4.8956298828125, 11.0322265625, 6.036724090576172, 2.637950897216797, 10.036857604980469, -1.68328857421875, 4.5038909912109375, -7.159832000732422, 7.523338317871094, 6.86900520324707, 11.162456512451172, -2.765583038330078, 6.249176025390625, 1.4881057739257812, 9.528427124023438, 3.100055694580078, 14.111312866210938], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000151.npy"}
{"epoch": 0.22826908541194255, "step": 152, "batch_size": 64, "mean": 3.14892840385437, "std": 7.273825645446777, "min": -13.298316955566406, "p10": -5.370693588256835, "median": 2.8351869583129883, "p90": 11.994710922241211, "max": 23.634567260742188, "pos_frac": 0.703125, "sample": [1.5202980041503906, 1.44134521484375, 7.378440856933594, 2.6740570068359375, -12.36993408203125, 6.355979919433594, 7.45672607421875, 5.665977478027344, 13.808490753173828, -1.8158111572265625, 9.80086612701416, -2.82940673828125, 1.3017234802246094, -13.298316955566406, 5.955390930175781, 6.696746826171875, 0.8840560913085938, 5.412776947021484, -1.0645217895507812, -3.2305679321289062, -8.91021728515625, 12.106346130371094, 7.628406524658203, -7.4525299072265625, -4.867977142333984, 0.8245677947998047, 3.8409957885742188, 4.903755187988281, 4.398807525634766, 13.015853881835938, 2.8247814178466797, 9.861190795898438, 2.845592498779297, -0.17382049560546875, 1.20330810546875, -12.51513671875, 0.2546348571777344, 1.780120849609375, 8.827423095703125, -2.22308349609375, -3.5517578125, 9.184410095214844, 8.412765502929688, -2.50238037109375, 12.072250366210938, 11.813785552978516, 4.1424407958984375, 6.319175720214844, 0.0631103515625, 3.8683319091796875, -9.091659545898438, 2.8210678100585938, 0.4501953125, 22.226951599121094, 8.2679443359375, 10.7694091796875, 13.389007568359375, -1.6704025268554688, 3.206024169921875, 23.634567260742188, -1.7172622680664062, -5.586143493652344, -1.1520004272460938, 6.244255065917969], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000152.npy"}
{"epoch": 0.22978080120937264, "step": 153, "batch_size": 64, "mean": 3.432448387145996, "std": 6.506551265716553, "min": -12.585784912109375, "p10": -4.694755554199219, "median": 2.9008235931396484, "p90": 12.016265869140625, "max": 17.812217712402344, "pos_frac": 0.703125, "sample": [10.393707275390625, 2.5290374755859375, -4.712608337402344, 5.7356414794921875, 13.409187316894531, 3.2726097106933594, -2.890838623046875, -0.84716796875, 5.643848419189453, -0.8346405029296875, 2.2946643829345703, 6.446231842041016, 1.29510498046875, 16.736167907714844, 4.682090759277344, 9.6517333984375, 0.5673141479492188, -3.2699508666992188, 4.324699401855469, 11.354057312011719, 3.4412384033203125, 3.8319168090820312, -12.585784912109375, -5.340873718261719, 17.812217712402344, -3.3675670623779297, 3.6317405700683594, 1.63800048828125, -2.8092269897460938, 4.73492431640625, 0.6872024536132812, 7.333400726318359, -8.56035041809082, -3.085174560546875, 7.1635589599609375, 5.302080154418945, -6.690643310546875, 10.624542236328125, 10.92474365234375, 2.0182647705078125, -6.0889892578125, 0.8319053649902344, 11.914848327636719, -4.653099060058594, 15.136764526367188, -1.1430931091308594, 12.059730529785156, 11.181808471679688, 8.546096801757812, -0.7244434356689453, -1.4126434326171875, 1.4760971069335938, 1.7194747924804688, -5.37921142578125, 3.5670089721679688, 0.640899658203125, -3.1541061401367188, 7.0341796875, 4.7882080078125, 12.172653198242188, 11.816581726074219, 1.846963882446289, 1.922088623046875, 13.091865539550781], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000153.npy"}
{"epoch": 0.23129251700680273, "step": 154, "batch_size": 64, "mean": 2.523413896560669, "std": 7.311330795288086, "min": -14.841079711914062, "p10": -7.432435607910156, "median": 2.6951756477355957, "p90": 12.126281356811523, "max": 17.310382843017578, "pos_frac": 0.65625, "sample": [-14.841079711914062, -9.155731201171875, -8.733047485351562, -4.926055908203125, 12.067035675048828, 4.730129241943359, -6.727607727050781, -1.7093238830566406, -10.118881225585938, -11.631057739257812, 4.160249710083008, -0.016786575317382812, 3.3508834838867188, 10.273086547851562, -0.5387725830078125, -7.734504699707031, 3.9839096069335938, 3.833568572998047, 1.8498916625976562, -5.189399719238281, 0.05401611328125, 17.310382843017578, 15.430658340454102, 5.3411712646484375, -6.215850830078125, 16.639606475830078, 6.839759826660156, -6.6417083740234375, 3.718475341796875, -10.181747436523438, 10.133661270141602, -0.38043212890625, 3.956695556640625, 7.276988983154297, 13.566116333007812, -4.785900115966797, 6.768665313720703, 12.455368041992188, 1.670273780822754, -3.5556068420410156, 1.3770942687988281, 0.090850830078125, 7.917137145996094, -4.681779861450195, 2.2114791870117188, 5.981578826904297, 14.228458404541016, 0.9768791198730469, 5.509265899658203, 1.4264602661132812, 6.3821563720703125, 2.0082931518554688, -1.6080551147460938, 8.587928771972656, 8.486419677734375, 12.15167236328125, 2.5685653686523438, -0.41965675354003906, 9.364265441894531, 11.673347473144531, -0.013824462890625, 3.407482147216797, 2.8217859268188477, 8.723579406738281], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000154.npy"}
{"epoch": 0.2328042328042328, "step": 155, "batch_size": 64, "mean": 3.3395771980285645, "std": 7.48551082611084, "min": -17.46194076538086, "p10": -6.435257911682128, "median": 3.1109466552734375, "p90": 13.409040069580081, "max": 17.889278411865234, "pos_frac": 0.640625, "sample": [0.9655914306640625, -0.0375213623046875, 16.671875, 3.278247833251953, -1.9362258911132812, 6.7890167236328125, 2.55780029296875, 9.190670013427734, -5.86976432800293, 11.963851928710938, 5.0675048828125, 4.436614990234375, -0.7656021118164062, 9.58935546875, -2.1428451538085938, 1.9668731689453125, -0.5957794189453125, 17.337600708007812, 0.96044921875, 14.852006912231445, 12.688484191894531, -1.2122764587402344, 3.347412109375, -0.020343780517578125, 8.26800537109375, 17.889278411865234, -6.6776123046875, 4.5989837646484375, -0.3887367248535156, 8.17702865600586, -8.12371826171875, 6.908376693725586, 17.469951629638672, 6.730232238769531, 10.536895751953125, 5.3131561279296875, -4.1494598388671875, 11.970405578613281, -11.034835815429688, 4.4527587890625, -2.2640228271484375, 7.375541687011719, -0.48485565185546875, -0.6024742126464844, 13.954421997070312, 5.084117889404297, 3.9906539916992188, 4.7943572998046875, -13.736724853515625, 2.943645477294922, 12.291542053222656, -0.6678466796875, 1.4711685180664062, -1.1201934814453125, 2.107346534729004, 1.100067138671875, 2.8334197998046875, -6.997119903564453, -17.46194076538086, -0.8071212768554688, 13.717849731445312, 6.6475372314453125, 7.9024658203125, -9.36260986328125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000155.npy"}
{"epoch": 0.23431594860166288, "step": 156, "batch_size": 64, "mean": 4.3534746170043945, "std": 6.7535624504089355, "min": -9.390373229980469, "p10": -3.1696655273437497, "median": 3.7831344604492188, "p90": 13.731863403320315, "max": 18.548553466796875, "pos_frac": 0.71875, "sample": [-0.7093658447265625, -2.8340110778808594, -0.36441802978515625, -0.2064056396484375, 12.849090576171875, 12.70855712890625, 15.6202392578125, -6.4876556396484375, 2.7951126098632812, 10.375587463378906, 3.753387451171875, 18.43817901611328, -7.596221923828125, 4.3064727783203125, -3.10369873046875, 11.905929565429688, 9.54052734375, 14.921932220458984, 1.1278877258300781, 6.461055755615234, 2.8252735137939453, 1.5641345977783203, 2.594806671142578, 6.917518615722656, 3.35113525390625, 6.026641845703125, -0.28472137451171875, 13.128036499023438, 6.64947509765625, 3.8128814697265625, -1.3882217407226562, 6.9040069580078125, -0.7449569702148438, -3.19793701171875, 14.284820556640625, 15.927696228027344, 0.541534423828125, 0.29822540283203125, -2.6412124633789062, 9.285957336425781, 12.412757873535156, 2.5511474609375, -1.0898246765136719, 0.1554718017578125, 6.328948974609375, 4.512601852416992, -4.170597076416016, -2.0605316162109375, 5.5664215087890625, 4.814506530761719, 0.9736957550048828, 5.788360595703125, 7.205394744873047, -9.390373229980469, 8.365745544433594, 0.564544677734375, -8.858207702636719, 8.412254333496094, 13.990646362304688, 2.7894210815429688, 9.840576171875, 18.548553466796875, 10.029159545898438, -8.015579223632812], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000156.npy"}
{"epoch": 0.23582766439909297, "step": 157, "batch_size": 64, "mean": 2.865233898162842, "std": 7.135542392730713, "min": -16.082717895507812, "p10": -4.433640670776367, "median": 2.059988021850586, "p90": 11.204469680786133, "max": 23.707435607910156, "pos_frac": 0.640625, "sample": [18.101795196533203, 13.430118560791016, 1.4172897338867188, 8.72171401977539, 4.905754089355469, -16.082717895507812, 5.852325439453125, 9.245635986328125, -0.49291229248046875, -4.298526763916016, 0.757598876953125, -1.0266647338867188, 7.51495361328125, -0.64544677734375, -6.276947021484375, 0.022844314575195312, -2.35211181640625, 1.0812835693359375, 11.048110961914062, 6.153209686279297, 2.1409912109375, 11.271480560302734, 23.707435607910156, 2.7871570587158203, 1.8820114135742188, 3.6347885131835938, -9.979362487792969, -3.6978492736816406, 4.261783599853516, -0.6805419921875, 6.76473331451416, -2.9228515625, 3.471027374267578, 5.311100006103516, 4.913673400878906, 17.0445556640625, 1.899932861328125, 7.401145935058594, 14.720840454101562, -1.1052932739257812, 5.962127685546875, 6.2740936279296875, -1.7114372253417969, 15.330253601074219, -4.491546630859375, -8.988420486450195, -0.9257354736328125, 6.294977188110352, -1.1910400390625, -0.5605735778808594, 7.169868469238281, -6.885890960693359, -3.287933349609375, 1.0716514587402344, 3.260634422302246, 9.048728942871094, 1.5270500183105469, 8.270252227783203, -13.2020263671875, 1.9789848327636719, 6.0006561279296875, -0.062591552734375, -2.5101165771484375, 5.098930358886719], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000157.npy"}
{"epoch": 0.23733938019652306, "step": 158, "batch_size": 64, "mean": 3.531813144683838, "std": 7.964321136474609, "min": -14.931137084960938, "p10": -5.81616096496582, "median": 2.816781997680664, "p90": 13.82601776123047, "max": 28.937835693359375, "pos_frac": 0.6875, "sample": [-1.2490997314453125, -4.85235595703125, -0.1845855712890625, 4.174854278564453, 0.31270599365234375, 4.386957168579102, 16.354461669921875, 16.329910278320312, 0.5402374267578125, 1.9503021240234375, 6.162971496582031, 10.401588439941406, 8.4017333984375, 10.101356506347656, -8.337535858154297, 6.903839111328125, 28.937835693359375, 0.8409385681152344, -1.0580930709838867, 9.905204772949219, -5.411689758300781, 2.4322052001953125, 8.555084228515625, 9.218414306640625, 11.426887512207031, 13.476348876953125, 2.35736083984375, 0.6162033081054688, 15.501663208007812, -5.187721252441406, -14.931137084960938, 0.22039031982421875, 8.183174133300781, -5.6222686767578125, 9.775226593017578, 8.531436920166016, 15.572052001953125, -1.36614990234375, -4.357200622558594, -9.494041442871094, 13.975875854492188, 14.227081298828125, 4.341888427734375, 12.546539306640625, -5.3900604248046875, 6.936016082763672, -0.7910194396972656, 6.2559356689453125, 10.945587158203125, 7.572166442871094, 3.2013587951660156, 4.620109558105469, -8.65618896484375, -6.7269744873046875, 0.49562835693359375, -2.3759517669677734, -1.5904121398925781, 1.689483642578125, 5.550239562988281, 0.7571678161621094, -5.899257659912109, 2.39141845703125, 4.304567337036133, -11.8646240234375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000158.npy"}
{"epoch": 0.23885109599395313, "step": 159, "batch_size": 64, "mean": 4.093843460083008, "std": 7.444799900054932, "min": -22.8101806640625, "p10": -3.3351997375488276, "median": 3.2716026306152344, "p90": 13.786949539184572, "max": 21.79192352294922, "pos_frac": 0.75, "sample": [9.97796630859375, 5.119243621826172, 2.0201797485351562, 18.43426513671875, 6.965797424316406, -10.972343444824219, 18.54998779296875, 10.853202819824219, 2.078868865966797, 11.701751708984375, 2.1138916015625, -0.38184356689453125, 14.056987762451172, -4.358802795410156, 0.8220138549804688, 10.242164611816406, 6.9061431884765625, 21.79192352294922, 12.420631408691406, 2.7933883666992188, 6.280357360839844, -1.7740859985351562, 1.8034496307373047, 2.4063186645507812, 6.279075622558594, 0.216766357421875, 0.1798725128173828, -2.8971710205078125, 7.290435791015625, 13.1568603515625, -8.2335205078125, 15.939117431640625, 8.179567337036133, -1.4051322937011719, 4.21282958984375, 2.9117660522460938, 0.3146858215332031, 4.332183837890625, -3.660888671875, -1.0569534301757812, 4.173786163330078, 0.87799072265625, 17.66534423828125, -0.6637535095214844, 1.2979087829589844, 5.7649946212768555, -3.5229263305664062, 6.662071228027344, -2.072265625, 9.045989990234375, 0.2417163848876953, 16.12335205078125, 6.313453674316406, 4.2105255126953125, -1.2513408660888672, -22.8101806640625, 5.999317169189453, -4.287376403808594, 2.1801834106445312, 0.4125022888183594, 6.280612945556641, -1.2376708984375, 3.631439208984375, 11.359367370605469], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000159.npy"}
{"epoch": 0.24036281179138322, "step": 160, "batch_size": 64, "mean": 3.462432384490967, "std": 6.897418022155762, "min": -10.674339294433594, "p10": -5.815556144714355, "median": 3.634693145751953, "p90": 12.685847473144534, "max": 22.318328857421875, "pos_frac": 0.671875, "sample": [4.68659782409668, -3.1531524658203125, 9.680252075195312, 9.238876342773438, -2.1219615936279297, 12.924507141113281, 2.338623046875, -2.50067138671875, 13.460617065429688, -0.3797454833984375, 20.905517578125, 1.4132766723632812, 10.426172256469727, 3.6818084716796875, 2.837310791015625, 2.85675048828125, 3.74700927734375, -8.20831298828125, 4.24565315246582, -5.812406539916992, -0.14565277099609375, -5.816905975341797, 6.412933349609375, 0.3233757019042969, 10.045867919921875, 6.006629943847656, 9.685127258300781, 0.41950416564941406, 5.2538909912109375, 6.666526794433594, 13.129310607910156, 4.171445846557617, 22.318328857421875, 15.765758514404297, 0.09446907043457031, 1.118886947631836, 4.875755310058594, -1.677734375, 5.909996032714844, 4.678977966308594, -0.4060516357421875, 8.581405639648438, -7.174037933349609, -10.674339294433594, -0.9741573333740234, 5.410438537597656, -6.752937316894531, -3.817291259765625, 4.063905715942383, 5.204681396484375, 6.9858551025390625, -6.5676374435424805, 3.5875778198242188, 12.128974914550781, 0.3366241455078125, -1.360687255859375, -1.1783218383789062, 10.789051055908203, -2.9711990356445312, 3.1613616943359375, 9.400177001953125, 14.59908676147461, -3.0859375, -7.194084167480469], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000160.npy"}
{"epoch": 0.2418745275888133, "step": 161, "batch_size": 64, "mean": 3.606900691986084, "std": 7.178943157196045, "min": -22.82312774658203, "p10": -1.7594116210937498, "median": 3.2585716247558594, "p90": 13.92485504150391, "max": 19.90252685546875, "pos_frac": 0.71875, "sample": [9.918983459472656, -15.670341491699219, -0.7365646362304688, 9.444747924804688, -1.8357391357421875, 0.014495849609375, 4.330078125, 14.476852416992188, 2.447216033935547, 2.10321044921875, -5.15606689453125, -0.04111480712890625, 8.244077682495117, 4.026584625244141, 3.3760604858398438, -1.5813140869140625, 4.1570892333984375, -22.82312774658203, 5.837806701660156, -10.033279418945312, 0.5802536010742188, 5.1724090576171875, 5.489158630371094, -0.6044769287109375, 12.872745513916016, -0.9282569885253906, 5.157917022705078, 1.4219932556152344, -6.702110290527344, 15.378822326660156, 4.280689239501953, 3.6346702575683594, 4.224979400634766, -0.235809326171875, 12.478202819824219, 4.925670623779297, -0.197479248046875, 3.1933746337890625, 2.4781341552734375, 14.862632751464844, 1.849945068359375, 9.480438232421875, -0.2600746154785156, 2.2752418518066406, 8.367294311523438, 1.5751457214355469, 2.2989959716796875, 4.24427604675293, 8.115791320800781, 10.20037841796875, -0.719940185546875, 12.479862213134766, 15.178529739379883, 3.3237686157226562, 0.0212554931640625, -0.8001203536987305, 0.907928466796875, -0.8712692260742188, 14.37575912475586, 15.107086181640625, 19.90252685546875, 2.6054153442382812, -3.3819026947021484, 6.582130432128906], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000161.npy"}
{"epoch": 0.24338624338624337, "step": 162, "batch_size": 64, "mean": 5.244149208068848, "std": 6.767694473266602, "min": -20.82818603515625, "p10": -2.894721221923828, "median": 5.489738464355469, "p90": 13.371510314941409, "max": 24.494949340820312, "pos_frac": 0.78125, "sample": [8.100845336914062, 1.8090591430664062, -20.82818603515625, 1.0384902954101562, 6.109619140625, 6.643619537353516, 2.9802112579345703, 4.982635498046875, 7.585906982421875, 14.060686111450195, 5.173042297363281, 3.0545616149902344, -0.3092041015625, 24.494949340820312, 4.520050048828125, 10.281597137451172, 5.877166748046875, 9.634902954101562, 2.900543212890625, 12.761146545410156, 13.633094787597656, 8.507705688476562, 5.428565979003906, -3.652313232421875, 9.431026458740234, 8.870964050292969, -2.94305419921875, 4.793907165527344, -6.184288024902344, 10.90985107421875, 2.563405990600586, 0.886077880859375, -1.716796875, 11.221818923950195, 6.530364990234375, 14.619224548339844, 13.919677734375, 5.406688690185547, 0.5830020904541016, -0.8983955383300781, -2.7819442749023438, 5.550910949707031, 21.18474578857422, 14.372581481933594, 6.1922607421875, -3.3648910522460938, -1.296539306640625, -0.40195465087890625, 8.777755737304688, -0.18328094482421875, -3.568756103515625, 6.407630920410156, 7.766502380371094, 9.014842987060547, 9.765995025634766, 5.070060729980469, 8.437210083007812, -3.416248321533203, 3.1928253173828125, 3.013763427734375, 2.918487548828125, 10.158514022827148, 8.52773666381836, 7.505180358886719], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000162.npy"}
{"epoch": 0.24489795918367346, "step": 163, "batch_size": 64, "mean": 4.165828704833984, "std": 7.414529800415039, "min": -9.834808349609375, "p10": -6.469278717041014, "median": 4.244178771972656, "p90": 13.286185264587408, "max": 22.037887573242188, "pos_frac": 0.671875, "sample": [-2.3835792541503906, 4.388710021972656, 3.0270233154296875, 4.117889404296875, 4.3704681396484375, 4.5541229248046875, -9.71689224243164, -0.85400390625, 22.037887573242188, -1.085662841796875, 7.507110595703125, 18.02557373046875, 10.96014404296875, 4.7393646240234375, 4.0001678466796875, -1.3508071899414062, 20.285720825195312, -0.27030181884765625, 8.430130004882812, 17.82550811767578, -0.369140625, 3.895233154296875, 10.785652160644531, -1.0375213623046875, -5.126609802246094, 13.804908752441406, 2.9229698181152344, 8.363739013671875, 0.8405609130859375, -1.603790283203125, 10.265998840332031, 0.01503753662109375, 7.104393005371094, 6.186372756958008, 8.394927978515625, 4.678245544433594, -7.379322052001953, 9.281501770019531, -5.110004425048828, 3.715177536010742, 6.809654235839844, 15.943058013916016, -7.836639404296875, 10.19729995727539, -7.044708251953125, -1.4141521453857422, 2.4942398071289062, 4.637184143066406, -0.3128337860107422, -7.5856781005859375, -8.240509033203125, 10.892831802368164, 5.897987365722656, 0.5986518859863281, -0.07630157470703125, -9.834808349609375, 17.184898376464844, 4.831459045410156, 10.942161560058594, 12.075830459594727, 8.916900634765625, -3.962139129638672, 2.9572620391845703, 10.304454803466797], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000163.npy"}
{"epoch": 0.24640967498110355, "step": 164, "batch_size": 64, "mean": 5.246478080749512, "std": 8.292464256286621, "min": -13.38885498046875, "p10": -3.9931064605712887, "median": 4.592660903930664, "p90": 17.256732940673828, "max": 20.224533081054688, "pos_frac": 0.71875, "sample": [-2.4693450927734375, -0.7473983764648438, 1.14129638671875, -2.4408950805664062, 7.817146301269531, 10.53338623046875, 19.291961669921875, 13.194854736328125, 11.123184204101562, -2.708770751953125, -0.502471923828125, -4.1977691650390625, 8.383813858032227, -0.08077049255371094, 9.29898452758789, 3.1353759765625, 6.631187438964844, 3.1209230422973633, 14.314727783203125, -0.32216644287109375, 0.7414875030517578, -12.795448303222656, 5.054359436035156, 4.5368804931640625, 0.014322280883789062, 0.9445152282714844, 8.655326843261719, 3.4353713989257812, 3.0775909423828125, 5.364051818847656, 9.988616943359375, 16.26329803466797, 2.3670997619628906, 7.6303558349609375, 4.648441314697266, -13.38885498046875, 15.408203125, 17.393844604492188, 17.331680297851562, -12.210983276367188, -4.750709533691406, -1.47021484375, 6.3761138916015625, 16.782562255859375, 5.444042205810547, 4.291250228881836, 10.220046997070312, -0.042026519775390625, 17.576202392578125, 14.605049133300781, 11.824981689453125, -3.5155601501464844, 17.08185577392578, -9.61746597290039, 0.23227691650390625, -7.72796630859375, -2.1304702758789062, 20.224533081054688, 17.795089721679688, 1.5724372863769531, 17.864795684814453, 12.627288818359375, 7.43379020690918, 4.0992584228515625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000164.npy"}
{"epoch": 0.24792139077853365, "step": 165, "batch_size": 64, "mean": 4.919890403747559, "std": 7.663153648376465, "min": -14.766433715820312, "p10": -3.7521430969238283, "median": 4.945552825927734, "p90": 13.133611679077148, "max": 24.8558349609375, "pos_frac": 0.75, "sample": [22.342636108398438, 6.5366668701171875, 9.485054016113281, 10.924808502197266, 9.14508056640625, 1.0836639404296875, 2.0064430236816406, -3.7203826904296875, 5.214496612548828, 2.7222747802734375, 7.330810546875, 5.161830902099609, 16.299102783203125, -0.4514923095703125, -0.0616607666015625, 5.290981292724609, 1.344329833984375, 9.919620513916016, -2.6649742126464844, -1.315704345703125, -14.766433715820312, -4.301250457763672, 11.288032531738281, -0.7364749908447266, 16.361343383789062, 7.627410888671875, 1.6026592254638672, 7.327140808105469, 12.605262756347656, 13.132770538330078, 3.589508056640625, -0.7015209197998047, 24.8558349609375, 5.3638916015625, 10.254203796386719, -11.84716796875, -0.3495941162109375, 6.402301788330078, 13.13397216796875, 11.268783569335938, 0.043849945068359375, -4.29803466796875, 3.48858642578125, 4.4379730224609375, 10.730289459228516, -2.8837890625, 1.8270835876464844, 4.853080749511719, 5.84228515625, -3.7657546997070312, 0.7094879150390625, 5.03802490234375, -11.012344360351562, 3.3917884826660156, 4.231132507324219, -7.7505035400390625, 10.707260131835938, 6.13020133972168, 11.297697067260742, 3.050445556640625, 20.745628356933594, 4.2646484375, 18.606643676757812, 6.4830474853515625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000165.npy"}
{"epoch": 0.2494331065759637, "step": 166, "batch_size": 64, "mean": 3.641010284423828, "std": 7.542336940765381, "min": -13.667404174804688, "p10": -4.601329040527344, "median": 1.7932319641113281, "p90": 15.412054824829102, "max": 19.862060546875, "pos_frac": 0.640625, "sample": [16.371131896972656, -0.4557781219482422, 0.463592529296875, 8.01910400390625, 11.437789916992188, -2.3626480102539062, -2.6750411987304688, 19.612144470214844, -0.119476318359375, 1.804779052734375, 3.7101287841796875, 2.8842334747314453, 11.384387969970703, -13.667404174804688, 11.517181396484375, 6.647636413574219, -0.3587188720703125, 3.925067901611328, 1.0103759765625, 6.530128479003906, -1.9051246643066406, 12.553459167480469, -7.191444396972656, 0.9087753295898438, 15.56905746459961, -0.4093170166015625, 0.4351043701171875, 8.609649658203125, 5.890068054199219, 5.3494415283203125, -9.827117919921875, -7.617095947265625, -3.1453704833984375, -1.016448974609375, 18.181922912597656, 1.5725212097167969, 14.078315734863281, -3.7269515991210938, 9.825695037841797, 8.690597534179688, 3.99456787109375, 19.862060546875, -2.9403457641601562, 0.2140960693359375, 6.480743408203125, -5.663543701171875, -1.1948013305664062, -4.545082092285156, 4.456504821777344, 19.402748107910156, 6.1164703369140625, 5.341728210449219, -1.8824920654296875, 1.5194644927978516, -4.625434875488281, -1.4376029968261719, 15.04571533203125, -2.3517398834228516, 15.890106201171875, -5.5411529541015625, 5.499870300292969, 4.18780517578125, 1.7816848754882812, 0.9089317321777344], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000166.npy"}
{"epoch": 0.2509448223733938, "step": 167, "batch_size": 64, "mean": 4.390508651733398, "std": 7.228689670562744, "min": -20.665802001953125, "p10": -2.777555656433105, "median": 4.8291015625, "p90": 12.912810516357423, "max": 21.828283309936523, "pos_frac": 0.703125, "sample": [5.061561584472656, 5.714870452880859, 4.152797698974609, 5.966461181640625, -0.49349212646484375, 12.146469116210938, 8.44989013671875, -0.2998924255371094, 1.0592994689941406, 18.03636932373047, 4.017671585083008, -0.41024017333984375, 5.115449905395508, -4.9499664306640625, -1.4361515045166016, -2.4928054809570312, 5.37567138671875, 18.868804931640625, 11.502334594726562, -9.854248046875, -6.570606231689453, -0.5876312255859375, 1.8747024536132812, 4.980804443359375, 7.646352767944336, -20.665802001953125, 5.6721038818359375, -0.6236133575439453, -0.08133697509765625, 6.3156280517578125, 5.225807189941406, 11.875356674194336, -2.8995914459228516, 10.38625717163086, 6.891986846923828, 4.677398681640625, 6.057491302490234, 11.04315185546875, 14.892105102539062, 2.7628402709960938, -0.28014373779296875, 13.789993286132812, -3.1680374145507812, 3.145923614501953, 12.963134765625, -1.88360595703125, -8.054298400878906, 12.761192321777344, 0.862091064453125, 5.9411163330078125, 12.795387268066406, 0.4486961364746094, 5.202156066894531, 21.828283309936523, 4.610342025756836, 7.990875244140625, 0.8375320434570312, 8.38372802734375, 4.6418304443359375, 17.78789520263672, 7.123085021972656, -0.4856700897216797, 0.5157623291015625, -1.1689586639404297], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000167.npy"}
{"epoch": 0.25245653817082386, "step": 168, "batch_size": 64, "mean": 3.505453586578369, "std": 6.111074924468994, "min": -8.134605407714844, "p10": -4.716554260253906, "median": 3.248532295227051, "p90": 11.571668624877933, "max": 20.84070587158203, "pos_frac": 0.71875, "sample": [3.612659454345703, -0.03606414794921875, 8.115394592285156, -3.8620071411132812, 1.7957229614257812, 9.414905548095703, 3.30230712890625, -4.899787902832031, -8.134605407714844, -6.113525390625, 2.9740066528320312, -0.9212226867675781, -1.314453125, 11.972858428955078, 6.88580322265625, 6.8704681396484375, 9.899887084960938, 8.02203369140625, 4.654632568359375, 12.110847473144531, 10.346015930175781, 3.869626998901367, 1.035186767578125, 3.1947574615478516, -1.2436132431030273, 3.4366188049316406, 12.512378692626953, 1.8184814453125, -5.327857971191406, 4.515411376953125, 0.180816650390625, 1.2814865112304688, 9.811538696289062, 4.396247863769531, 5.589544296264648, -5.2196197509765625, 15.377593994140625, -0.3682861328125, -1.4346504211425781, 0.3722381591796875, 5.965778350830078, 20.84070587158203, 0.4421234130859375, 9.377182006835938, 6.207603454589844, -7.9040374755859375, 10.63555908203125, 0.5412139892578125, 3.4954299926757812, 0.27841949462890625, 17.041900634765625, 1.555419921875, 3.5009841918945312, -4.289009094238281, -0.7810592651367188, 13.58090591430664, -6.181022644042969, -1.3049125671386719, -0.7884292602539062, 1.233205795288086, 6.5067901611328125, 5.808860778808594, 9.52276611328125, 0.5688858032226562], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000168.npy"}
{"epoch": 0.25396825396825395, "step": 169, "batch_size": 64, "mean": 4.8323869705200195, "std": 8.129197120666504, "min": -19.268798828125, "p10": -4.660197448730469, "median": 3.6824378967285156, "p90": 16.51864433288575, "max": 23.398773193359375, "pos_frac": 0.671875, "sample": [-4.593376159667969, 6.8447265625, 11.060104370117188, -2.435375213623047, 2.6594505310058594, 0.051334381103515625, 9.357833862304688, -1.5462646484375, 2.0701828002929688, 1.4680366516113281, -5.813987731933594, -1.3992576599121094, 21.565841674804688, 7.7535247802734375, 8.544952392578125, -4.783992767333984, 5.882972717285156, 1.8037185668945312, 9.244125366210938, 2.9593582153320312, -0.6034774780273438, 13.938125610351562, 14.184318542480469, 7.4089202880859375, -5.111141204833984, 23.398773193359375, 11.704582214355469, 6.1651458740234375, 0.7662315368652344, 5.749076843261719, -0.40288543701171875, 4.359199523925781, 2.1708450317382812, -2.41353702545166, 2.7597579956054688, -2.404888153076172, 17.428821563720703, -6.002960205078125, 12.138603210449219, 5.616119384765625, -0.5612564086914062, 3.00567626953125, 7.480707168579102, 7.930473327636719, -0.4704399108886719, 11.484901428222656, 8.10150146484375, -3.765472412109375, 9.889205932617188, 14.3948974609375, -2.4649658203125, 7.935659408569336, 12.423225402832031, -19.268798828125, -7.7424774169921875, 18.847763061523438, 18.306358337402344, 20.231185913085938, -0.6250839233398438, 17.98072052001953, -4.688835144042969, 9.660833358764648, -1.172882080078125, 0.8163490295410156], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000169.npy"}
{"epoch": 0.25547996976568405, "step": 170, "batch_size": 64, "mean": 3.0439300537109375, "std": 9.937443733215332, "min": -20.045082092285156, "p10": -11.184864425659178, "median": 3.868682861328125, "p90": 16.594039154052737, "max": 21.800750732421875, "pos_frac": 0.671875, "sample": [15.471199035644531, -11.406871795654297, 4.352436065673828, -0.45859527587890625, 5.342000961303711, 17.941688537597656, 2.041677474975586, 1.641352653503418, 6.311714172363281, 11.258975982666016, -11.712745666503906, 3.5781211853027344, 10.911865234375, 20.170516967773438, 14.314735412597656, -8.676410675048828, 3.7667007446289062, 7.9677734375, -3.2072067260742188, -12.302139282226562, 2.1502227783203125, 20.329315185546875, -2.9305496215820312, -2.2552947998046875, -8.464752197265625, -3.7985763549804688, -11.889862060546875, 4.97528076171875, -20.045082092285156, 6.7173614501953125, 12.38717269897461, 8.051628112792969, 18.77191162109375, 7.120689392089844, -8.846765518188477, 0.9493637084960938, -9.62451171875, 3.930023193359375, 11.606124877929688, -4.6012115478515625, 0.11383819580078125, -6.500711441040039, 2.0588645935058594, 5.066200256347656, 6.843524932861328, 17.07525634765625, 3.807342529296875, 21.800750732421875, 1.0789871215820312, 4.504997253417969, -12.034088134765625, -9.036617279052734, 4.891975402832031, 13.85647964477539, -14.956565856933594, 5.367780685424805, 5.748847961425781, 21.595550537109375, -5.081195831298828, 6.143217086791992, 13.357864379882812, 15.183868408203125, 2.7529220581054688, -10.666847229003906], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000170.npy"}
{"epoch": 0.25699168556311414, "step": 171, "batch_size": 64, "mean": 5.132603645324707, "std": 9.083740234375, "min": -19.43206787109375, "p10": -6.159224700927734, "median": 5.101863861083984, "p90": 16.675839996337892, "max": 22.5733642578125, "pos_frac": 0.71875, "sample": [4.6839599609375, 1.6936454772949219, 14.698959350585938, 17.100265502929688, 9.895530700683594, 19.32691192626953, 15.437004089355469, 16.993324279785156, 1.1135873794555664, 8.283285140991211, -1.6790904998779297, 5.160858154296875, 7.6348876953125, 13.20883560180664, 9.187973022460938, 5.042869567871094, 5.683368682861328, 4.82440185546875, 9.208598136901855, -2.4348373413085938, 0.1447315216064453, 11.865264892578125, 7.70263671875, 19.65283203125, 11.420967102050781, -0.1281299591064453, -6.392791748046875, -6.728919982910156, 21.51885986328125, -5.298189163208008, 14.289222717285156, 2.00732421875, 7.454902648925781, -4.0688629150390625, 7.898509979248047, 8.644332885742188, -5.614234924316406, -18.25078582763672, 17.586456298828125, 15.935043334960938, -19.43206787109375, -0.6072483062744141, 9.620498657226562, 4.8112945556640625, 3.3165664672851562, 22.5733642578125, 9.531036376953125, 1.9713134765625, 3.9034957885742188, 9.780044555664062, 4.218898773193359, 15.748893737792969, 13.611961364746094, -2.2221603393554688, -14.089241027832031, -6.78936767578125, 7.1254425048828125, 3.1678085327148438, -1.288116455078125, -9.207778930664062, 11.4656982421875, 0.45633697509765625, -1.423095703125, -2.4604530334472656], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000171.npy"}
{"epoch": 0.2585034013605442, "step": 172, "batch_size": 64, "mean": 4.593903541564941, "std": 8.2947998046875, "min": -17.045364379882812, "p10": -3.4519279479980467, "median": 3.2441792488098145, "p90": 15.739754104614258, "max": 24.475738525390625, "pos_frac": 0.703125, "sample": [-3.5839767456054688, 3.054279327392578, 2.0281982421875, 8.074844360351562, -3.1438140869140625, 3.3754892349243164, 0.22731685638427734, 13.690162658691406, 15.976593017578125, 15.874797821044922, 24.475738525390625, -2.033233642578125, -0.1339263916015625, -0.18449783325195312, 2.7860641479492188, 21.381484985351562, 10.485572814941406, -1.0974998474121094, 1.3687553405761719, 4.673763275146484, 14.66180419921875, 0.5820770263671875, -7.36277961730957, 4.78704833984375, 6.725391387939453, 0.17824745178222656, -1.0694503784179688, 5.357208251953125, 10.0955810546875, -4.3526458740234375, 1.3649940490722656, 3.68829345703125, 10.45947265625, 15.66156005859375, 0.8750534057617188, -6.539144515991211, 15.773265838623047, -2.629486083984375, 15.315206527709961, -0.8112907409667969, -3.122467041015625, 5.964801788330078, 4.698101043701172, 13.753341674804688, 5.768289566040039, -12.888687133789062, -0.7678375244140625, 11.270065307617188, 0.7374649047851562, 4.83984375, -11.528179168701172, 14.393966674804688, 17.880958557128906, -17.045364379882812, 3.1128692626953125, 12.99725341796875, 3.6058197021484375, -0.298187255859375, 0.8450355529785156, 17.579051971435547, -3.076629638671875, 1.8531646728515625, 13.26904296875, 10.111564636230469], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000172.npy"}
{"epoch": 0.2600151171579743, "step": 173, "batch_size": 64, "mean": 4.448987007141113, "std": 9.722726821899414, "min": -16.53714370727539, "p10": -7.533785629272461, "median": 3.096525192260742, "p90": 18.20465316772461, "max": 29.422348022460938, "pos_frac": 0.6875, "sample": [-13.66862678527832, 1.818023681640625, -9.81790542602539, 5.176170349121094, 3.0717620849609375, 18.90294647216797, 12.735034942626953, -4.990413665771484, 0.09372901916503906, 7.139217376708984, 0.9448509216308594, 7.215110778808594, 16.520111083984375, -7.545597076416016, -4.361223220825195, 14.505393981933594, 17.345840454101562, 1.9296455383300781, 2.291452407836914, 1.4122161865234375, 12.375579833984375, 5.072746276855469, -7.5062255859375, 6.07598876953125, -16.53714370727539, -13.13653564453125, 9.5731201171875, -0.9877777099609375, 0.3986167907714844, 3.501964569091797, 2.1086654663085938, 21.724319458007812, -2.5831432342529297, 10.023895263671875, 8.397331237792969, -1.3460254669189453, 12.098445892333984, -10.245174407958984, 12.976310729980469, 20.010635375976562, 4.38525390625, 3.8660354614257812, 4.124454498291016, 15.942474365234375, -0.7848358154296875, -9.035270690917969, 4.0272216796875, -7.069648742675781, 29.422348022460938, 16.885990142822266, 18.572715759277344, 1.3722152709960938, 3.02325439453125, 15.860137939453125, 18.7974853515625, 1.1140060424804688, -2.6589202880859375, -3.6714859008789062, 3.121288299560547, 20.111663818359375, -0.9817657470703125, 12.190595626831055, -4.421875, -2.1715087890625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000173.npy"}
{"epoch": 0.2615268329554044, "step": 174, "batch_size": 64, "mean": 5.6412458419799805, "std": 9.122847557067871, "min": -19.967121124267578, "p10": -4.863851928710937, "median": 4.870988845825195, "p90": 17.961389541625977, "max": 24.686172485351562, "pos_frac": 0.71875, "sample": [7.38494873046875, 14.690444946289062, 11.451465606689453, 1.8934440612792969, 8.421548843383789, 4.3509521484375, 9.465934753417969, 18.11895751953125, -5.6269989013671875, 7.248653411865234, 22.792007446289062, 24.285491943359375, 4.229148864746094, 0.7922401428222656, 8.083358764648438, -0.436279296875, 17.593730926513672, 11.816293716430664, 9.501655578613281, -15.982295989990234, 21.264366149902344, 2.7586135864257812, -0.5815086364746094, 6.346675872802734, 24.686172485351562, -0.5817317962646484, 10.85137939453125, 0.9124412536621094, 15.720169067382812, 9.443824768066406, 0.24840545654296875, 16.127845764160156, 18.87523651123047, -0.20524215698242188, 6.353935241699219, -1.6611785888671875, 7.647430419921875, -8.509979248046875, 6.9547576904296875, 16.755447387695312, 0.1486053466796875, 12.584259033203125, -2.8806228637695312, 2.2835121154785156, -6.100395202636719, 17.15911865234375, 3.9861679077148438, -0.313812255859375, 18.780807495117188, 4.53314208984375, -5.179351806640625, 2.8372421264648438, 9.9483642578125, 5.208835601806641, -4.127685546875, 2.7325439453125, 8.790283203125, 3.0495872497558594, -0.8916244506835938, -19.967121124267578, 7.210727691650391, -3.2558364868164062, -8.099876403808594, -0.8788681030273438], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000174.npy"}
{"epoch": 0.26303854875283444, "step": 175, "batch_size": 64, "mean": 4.860843658447266, "std": 9.784187316894531, "min": -23.848968505859375, "p10": -5.542345428466796, "median": 3.722858428955078, "p90": 18.062979125976565, "max": 23.65528106689453, "pos_frac": 0.75, "sample": [-12.307830810546875, 12.404226303100586, -5.441680908203125, 2.538818359375, 2.5458908081054688, 16.639392852783203, -4.704038619995117, 6.2822265625, 1.689910888671875, 2.8913497924804688, 5.171562194824219, 18.774032592773438, 2.0889434814453125, 10.661529541015625, -0.718963623046875, 17.167884826660156, 5.5604705810546875, 21.91492462158203, 5.364646911621094, -5.585487365722656, 3.9097824096679688, 5.393522262573242, 9.997894287109375, 11.971389770507812, 2.7606964111328125, 2.2127685546875, 0.11714935302734375, -12.728752136230469, 19.663700103759766, -23.848968505859375, 10.215545654296875, 1.3297271728515625, 12.073562622070312, -14.483352661132812, -1.672271728515625, 2.1166019439697266, 10.419876098632812, 23.65528106689453, 2.8586463928222656, -3.1291122436523438, 3.5359344482421875, -14.192230224609375, 1.97625732421875, 2.949390411376953, -4.95269775390625, 5.405303955078125, 7.571739196777344, -8.370361328125, 18.116226196289062, 18.877723693847656, 5.590776443481445, 15.953140258789062, -1.3317108154296875, -1.079681396484375, 7.4306182861328125, 8.992576599121094, -5.180656433105469, 17.938735961914062, 1.287200927734375, 21.942367553710938, 5.524757385253906, 17.743377685546875, 2.0854034423828125, 17.508285522460938], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000175.npy"}
{"epoch": 0.26455026455026454, "step": 176, "batch_size": 64, "mean": 3.15330171585083, "std": 9.363704681396484, "min": -16.810714721679688, "p10": -8.405690765380859, "median": 2.5619640350341797, "p90": 15.104405975341797, "max": 30.1494140625, "pos_frac": 0.640625, "sample": [25.0572509765625, -16.810714721679688, 3.8244171142578125, 15.040153503417969, -2.071788787841797, 13.533798217773438, -8.818389892578125, 6.928779602050781, 0.8656425476074219, 3.7603836059570312, -4.9882049560546875, -6.1835784912109375, -4.44390869140625, 18.468612670898438, -12.427619934082031, 4.444578170776367, 1.5271224975585938, 6.543437957763672, 2.3626861572265625, -1.61669921875, -2.2299346923828125, -7.956275939941406, 3.573963165283203, 20.786422729492188, 15.131942749023438, 0.5514068603515625, -5.4906463623046875, 8.450569152832031, -0.3978118896484375, 11.590995788574219, -2.6980438232421875, -3.9202327728271484, 0.6042938232421875, 4.2276763916015625, 4.781345367431641, 15.966022491455078, 1.3392105102539062, 9.9779052734375, 3.6504745483398438, -4.345924377441406, -3.8701248168945312, 2.2180938720703125, 9.269269943237305, 15.681659698486328, 6.335071563720703, 12.590713500976562, -4.1647186279296875, -13.030399322509766, 10.9610595703125, 4.691680908203125, -0.95294189453125, 11.879505157470703, 2.761241912841797, -3.463592529296875, 6.121368408203125, 4.528839111328125, 13.691329956054688, 2.198638916015625, -8.598297119140625, 30.1494140625, 2.0867156982421875, -10.007537841796875, 7.8269195556640625, -15.681930541992188], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000176.npy"}
{"epoch": 0.2660619803476946, "step": 177, "batch_size": 64, "mean": 8.589489936828613, "std": 9.332977294921875, "min": -12.83160400390625, "p10": -2.698448371887206, "median": 7.770557403564453, "p90": 20.277913665771486, "max": 31.173934936523438, "pos_frac": 0.828125, "sample": [4.332496643066406, -8.630134582519531, 9.497077941894531, 31.173934936523438, 8.008888244628906, 6.127834320068359, 2.9072647094726562, 6.087066650390625, 14.878067016601562, 8.672645568847656, 24.237125396728516, 15.777420043945312, 11.290651321411133, -4.941852569580078, 0.447235107421875, -3.200559616088867, 15.575958251953125, -0.9191513061523438, 0.8273963928222656, 11.948781967163086, 8.10064697265625, 1.4642791748046875, -3.7434730529785156, 14.219589233398438, 15.741588592529297, 7.5322265625, 7.362861633300781, 6.841028213500977, 0.27086639404296875, 3.548168182373047, 0.9009857177734375, 6.383617401123047, 20.882949829101562, -1.52685546875, 1.5433387756347656, 19.597412109375, 21.8453369140625, -0.45615386962890625, -0.15936279296875, 9.158805847167969, 5.307018280029297, 6.367168426513672, 16.175838470458984, 13.752593994140625, 20.569557189941406, -8.408226013183594, 0.8997802734375, -4.413818359375, 14.157400131225586, 4.500511169433594, 27.9749755859375, 1.7278556823730469, -12.83160400390625, 19.403079986572266, 12.462387084960938, 18.493881225585938, 16.856788635253906, 25.7017822265625, 16.341827392578125, 9.07247543334961, 16.240890502929688, 18.873287200927734, 3.6088714599609375, 13.287040710449219], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000177.npy"}
{"epoch": 0.2675736961451247, "step": 178, "batch_size": 64, "mean": 5.553801536560059, "std": 8.128524780273438, "min": -15.936397552490234, "p10": -3.8623226165771483, "median": 5.122081756591797, "p90": 16.460266876220704, "max": 27.300704956054688, "pos_frac": 0.796875, "sample": [6.079872131347656, 5.086830139160156, 9.522586822509766, -0.926605224609375, 6.577934265136719, 1.2189979553222656, 3.006927490234375, 13.911149978637695, 8.54034423828125, 12.628597259521484, -3.755615234375, 15.493904113769531, 14.883453369140625, -7.488536834716797, 6.0988311767578125, 5.1573333740234375, 17.382442474365234, 0.6053695678710938, 5.796699523925781, 1.4642715454101562, 21.652355194091797, 2.426858901977539, 4.186805725097656, -1.6005859375, 4.28704833984375, -4.6947174072265625, 1.7594528198242188, 27.300704956054688, -4.012687683105469, 0.16236495971679688, 5.2486114501953125, 0.27965354919433594, 1.5266265869140625, 17.35163116455078, 0.246734619140625, 9.296722412109375, 8.289413452148438, 0.6591606140136719, 16.48870086669922, 1.6695175170898438, -0.31916046142578125, 9.330181121826172, 16.3939208984375, 6.842922210693359, -15.936397552490234, 18.391773223876953, -3.9080543518066406, 3.3885345458984375, 20.04930877685547, 0.1097869873046875, 13.902740478515625, 12.22979736328125, 4.337005615234375, 9.630828857421875, 0.44137001037597656, 9.032341003417969, 5.426422119140625, -1.2367324829101562, -11.792312622070312, -2.3792495727539062, 12.380496978759766, 5.5617828369140625, 15.034065246582031, -5.277250289916992], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000178.npy"}
{"epoch": 0.2690854119425548, "step": 179, "batch_size": 64, "mean": 4.5835747718811035, "std": 9.501955032348633, "min": -16.206764221191406, "p10": -7.568682861328124, "median": 3.5187387466430664, "p90": 14.708164978027344, "max": 34.01322937011719, "pos_frac": 0.71875, "sample": [1.4334716796875, -6.425872802734375, 8.560836791992188, 0.37225341796875, 2.7345428466796875, 10.399402618408203, 3.9593849182128906, 13.556533813476562, 17.73944091796875, 7.825374603271484, 12.392894744873047, 2.382169723510742, 18.03136444091797, -14.413688659667969, 9.604267120361328, -11.761329650878906, -16.206764221191406, -12.257835388183594, -2.6346893310546875, 5.3122406005859375, -1.5890426635742188, 14.537445068359375, 14.716171264648438, 7.3307647705078125, 13.68170166015625, -7.87445068359375, -3.3469161987304688, 16.049758911132812, 2.980714797973633, -1.0832977294921875, 8.143718719482422, 6.4386138916015625, 3.2583541870117188, 1.1168975830078125, 2.1091651916503906, -11.81918716430664, -2.6948394775390625, 23.2833251953125, 14.689483642578125, -6.855224609375, 10.089664459228516, 3.9454116821289062, 2.4168128967285156, 10.957321166992188, -2.3897705078125, 13.99139404296875, 22.649917602539062, 7.358856201171875, 10.042900085449219, 34.01322937011719, 12.103103637695312, 3.779123306274414, 2.591888427734375, 0.09164047241210938, -2.2059059143066406, 8.626625061035156, -1.2689476013183594, 1.599151611328125, 10.041370391845703, 1.0208244323730469, -10.39181137084961, 2.378713607788086, 10.681640625, -2.451519012451172], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000179.npy"}
{"epoch": 0.2705971277399849, "step": 180, "batch_size": 64, "mean": 4.444667339324951, "std": 8.662073135375977, "min": -14.944229125976562, "p10": -3.7959144592285154, "median": 1.8352375030517578, "p90": 17.488232421875, "max": 31.027023315429688, "pos_frac": 0.609375, "sample": [31.027023315429688, -2.1839981079101562, 5.626567840576172, -1.7031402587890625, -2.2913818359375, 7.1429443359375, -6.5221405029296875, 10.411300659179688, -0.9277687072753906, -3.5898056030273438, 17.024120330810547, 0.402496337890625, -0.2822113037109375, -2.59130859375, 10.958175659179688, -1.0952301025390625, -2.1518402099609375, 6.619232177734375, -2.2213897705078125, -8.96728515625, 7.1551513671875, 9.086797714233398, 4.372751235961914, 12.973785400390625, 13.203311920166016, 11.062980651855469, -1.0131034851074219, 10.135482788085938, 6.520162582397461, 5.8114776611328125, 4.983146667480469, 11.713054656982422, -1.4506301879882812, 2.0271987915039062, -3.6008071899414062, 20.166366577148438, -0.7714653015136719, 14.829635620117188, -3.8795318603515625, -6.407066345214844, -1.7887840270996094, 10.711103439331055, 18.982254028320312, -1.4449481964111328, -1.6131057739257812, 1.6111297607421875, 20.609046936035156, 0.2960929870605469, 13.563495635986328, 0.046733856201171875, 17.687137603759766, -14.944229125976562, -1.9587783813476562, 1.8959693908691406, 1.4131851196289062, 1.774505615234375, 20.916015625, -5.059600830078125, 7.4754486083984375, -5.16064453125, 0.6057586669921875, 4.511695861816406, 20.34747314453125, 2.3786964416503906], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000180.npy"}
{"epoch": 0.272108843537415, "step": 181, "batch_size": 64, "mean": 5.5361409187316895, "std": 8.313342094421387, "min": -10.930435180664062, "p10": -3.54755859375, "median": 4.016040802001953, "p90": 17.736517333984377, "max": 27.003196716308594, "pos_frac": 0.75, "sample": [9.407520294189453, 5.820789337158203, -3.5766677856445312, 0.02484416961669922, 3.13214111328125, -0.07426071166992188, 3.1219940185546875, -10.930435180664062, 3.430419921875, -0.6518020629882812, 10.293258666992188, 3.005268096923828, 0.7650985717773438, -3.4316253662109375, 4.135032653808594, 4.580513000488281, 12.25436019897461, 8.293014526367188, 14.827003479003906, -5.804351806640625, 4.22528076171875, 9.586669921875, 19.090587615966797, 5.258506774902344, 25.568862915039062, 5.94549560546875, 2.7764129638671875, 27.003196716308594, -5.8463897705078125, -6.3921356201171875, 2.1210174560546875, 1.2058868408203125, 23.23541259765625, -7.025625228881836, 17.42578125, 2.5951499938964844, 4.207906723022461, 25.034683227539062, -0.07596206665039062, -1.7051315307617188, 11.468608856201172, 4.196891784667969, 20.81786346435547, 4.964111328125, 10.732894897460938, -0.5650920867919922, 0.12956619262695312, 4.489833831787109, 3.2056808471679688, 0.7065753936767578, -3.4796371459960938, 11.248050689697266, 17.86968994140625, 0.19008636474609375, 11.716827392578125, 12.976147651672363, 2.3582839965820312, -0.826019287109375, 13.00162124633789, -3.919483184814453, 3.8970489501953125, -0.78338623046875, 12.729888916015625, 4.3292388916015625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000181.npy"}
{"epoch": 0.273620559334845, "step": 182, "batch_size": 64, "mean": 7.154191017150879, "std": 9.780839920043945, "min": -14.77947998046875, "p10": -4.868570709228515, "median": 6.909534454345703, "p90": 19.08007278442383, "max": 36.237152099609375, "pos_frac": 0.765625, "sample": [8.043792724609375, -4.6278228759765625, -2.07421875, 4.494140625, 11.090034484863281, 18.267791748046875, 7.600654602050781, -4.993682861328125, -0.6334152221679688, 21.925125122070312, 12.215312957763672, 9.064411163330078, 21.398223876953125, 2.06256103515625, 5.412471771240234, 10.8148193359375, -2.128692626953125, -6.3645172119140625, 16.576374053955078, 5.32525634765625, 9.653392791748047, 1.2409896850585938, 8.935028076171875, 8.2998046875, 8.068115234375, -1.0461044311523438, 18.515045166015625, 6.519630432128906, 16.00653076171875, 19.322227478027344, 16.012344360351562, 4.527172088623047, 9.803512573242188, 2.96148681640625, 25.343399047851562, -12.772796630859375, 3.31475830078125, 11.1866455078125, 2.656524658203125, -0.3346538543701172, 8.91748046875, -12.567214965820312, 18.046836853027344, 6.572662353515625, 16.461318969726562, 36.237152099609375, 0.26949119567871094, -2.3785858154296875, 20.418182373046875, 7.246406555175781, 14.09478759765625, 24.29187774658203, 5.075492858886719, 16.166221618652344, 10.1456298828125, 2.0874691009521484, 3.0502090454101562, 0.536834716796875, 16.81960678100586, 1.3435497283935547, -0.7890434265136719, -4.971748352050781, -6.10856819152832, -14.77947998046875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000182.npy"}
{"epoch": 0.2751322751322751, "step": 183, "batch_size": 64, "mean": 2.573082208633423, "std": 8.662209510803223, "min": -20.04607391357422, "p10": -8.193566131591796, "median": 1.4514179229736328, "p90": 13.898304367065432, "max": 21.824371337890625, "pos_frac": 0.625, "sample": [7.205970764160156, 11.992820739746094, -0.08800506591796875, -0.1650238037109375, 1.2356948852539062, -3.1008377075195312, -8.51007080078125, 21.824371337890625, -13.037117004394531, -7.150970458984375, -8.837398529052734, -3.2067337036132812, 3.556839942932129, 1.81561279296875, 5.30914306640625, 7.390907287597656, -0.10471534729003906, 5.8796234130859375, 12.136085510253906, 18.663190841674805, 12.995407104492188, -7.6816558837890625, 10.267768859863281, -0.316650390625, 7.387641906738281, 1.6671409606933594, -0.2459564208984375, -1.9178314208984375, -17.650283813476562, 0.3182373046875, 4.155029296875, -5.337127685546875, -1.2496070861816406, 5.270660400390625, 1.0619430541992188, 3.9087753295898438, 18.654830932617188, 14.18204116821289, -20.04607391357422, 13.236251831054688, -9.252159118652344, -3.4438705444335938, 20.518905639648438, 1.072052001953125, 0.5072555541992188, 0.21383094787597656, 4.877479553222656, 0.9169387817382812, -8.412956237792969, -1.7059898376464844, -5.075096130371094, 2.627716064453125, 1.8292465209960938, 3.3623809814453125, 14.908905029296875, -3.9977169036865234, 5.842567443847656, 0.4491424560546875, -2.2395172119140625, 2.5675430297851562, 5.8579254150390625, 19.054397583007812, 10.235601425170898, 12.492752075195312], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000183.npy"}
{"epoch": 0.2766439909297052, "step": 184, "batch_size": 64, "mean": 5.2061357498168945, "std": 7.577889442443848, "min": -13.083911895751953, "p10": -2.562896728515624, "median": 3.3354339599609375, "p90": 17.05461006164551, "max": 22.03070068359375, "pos_frac": 0.765625, "sample": [-0.9847259521484375, 10.442337036132812, 15.775520324707031, -4.755348205566406, 11.430862426757812, 0.986663818359375, 1.7438545227050781, -3.2774429321289062, 0.3858184814453125, -0.503265380859375, -0.560114860534668, 9.396697998046875, 2.4404373168945312, 0.21863651275634766, 5.377628326416016, -3.0261192321777344, 4.486171722412109, 14.066963195800781, 7.378536224365234, 11.257705688476562, -6.8482666015625, 20.152442932128906, 3.599651336669922, 17.095592498779297, 1.0781402587890625, 4.001201629638672, 3.042285919189453, 8.576553344726562, 12.723377227783203, 2.644084930419922, 16.958984375, 6.472648620605469, 3.071216583251953, -9.289497375488281, -0.07625007629394531, 0.3039360046386719, 21.733253479003906, 6.160369873046875, -1.4820442199707031, 17.142555236816406, -0.4914093017578125, 9.242530822753906, 10.465667724609375, 2.163087844848633, 15.106773376464844, 4.864522933959961, 4.1597137451171875, 8.395492553710938, -1.0472412109375, 19.997894287109375, -0.6187152862548828, -3.3462142944335938, 3.0265121459960938, 2.7020435333251953, 22.03070068359375, 18.638336181640625, 0.44152259826660156, 5.971408843994141, 3.6380138397216797, 7.796241760253906, -13.083911895751953, 3.009218215942383, 0.18626976013183594, 0.6031646728515625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000184.npy"}
{"epoch": 0.2781557067271353, "step": 185, "batch_size": 64, "mean": 5.48830509185791, "std": 10.894311904907227, "min": -16.60727310180664, "p10": -9.21390380859375, "median": 4.737090110778809, "p90": 20.262998962402346, "max": 25.607086181640625, "pos_frac": 0.6875, "sample": [4.890287399291992, 25.607086181640625, 4.047340393066406, 14.713191986083984, 22.145782470703125, -16.08702850341797, 8.019012451171875, 4.516441345214844, 1.168853759765625, -5.0771942138671875, 18.229228973388672, 21.541061401367188, -9.217185974121094, 0.39923095703125, 18.85235595703125, 2.2511749267578125, 21.92328643798828, 1.8822212219238281, 16.227144241333008, 13.49959945678711, -2.015094757080078, 15.033973693847656, 4.143440246582031, 3.7107620239257812, 2.309804916381836, 16.735023498535156, 6.047765731811523, 12.303823471069336, 4.5196075439453125, 22.47296142578125, -7.5945587158203125, -11.903091430664062, 19.48306655883789, 13.393463134765625, -13.750629425048828, 11.248153686523438, 24.947616577148438, 8.019332885742188, 4.378879547119141, -0.0991058349609375, -9.206245422363281, 10.153335571289062, -1.7885246276855469, 15.44305419921875, 8.49200439453125, -11.870208740234375, 13.593696594238281, -0.8929634094238281, 9.617271423339844, -6.909412384033203, 11.220733642578125, -16.60727310180664, 9.902755737304688, -3.578380584716797, 6.9903411865234375, -6.619316101074219, -8.04571533203125, -12.197711944580078, 4.583892822265625, -3.0982284545898438, -3.04364013671875, 5.5873565673828125, 16.010345458984375, 20.59725570678711], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000185.npy"}
{"epoch": 0.2796674225245654, "step": 186, "batch_size": 64, "mean": 5.764609336853027, "std": 10.955605506896973, "min": -14.048568725585938, "p10": -9.06286506652832, "median": 3.4642677307128906, "p90": 18.903269958496093, "max": 31.202484130859375, "pos_frac": 0.640625, "sample": [18.008148193359375, 12.40472412109375, -4.518547058105469, 1.7405242919921875, -1.2085037231445312, -1.059906005859375, 17.418067932128906, -10.669883728027344, -2.237640380859375, 12.91644287109375, 16.081398010253906, -2.8450584411621094, 6.3633575439453125, 15.152877807617188, 31.202484130859375, 21.35947036743164, -3.632822036743164, 2.937450408935547, 14.0863037109375, -9.1690673828125, -8.815059661865234, 15.920570373535156, 7.065395355224609, 25.957839965820312, -14.048568725585938, -3.6313743591308594, -10.389820098876953, -3.390798568725586, -5.813072204589844, 2.217235565185547, 6.8444671630859375, 15.63204574584961, 11.01513671875, -12.19317626953125, -0.347015380859375, -2.1612815856933594, 17.637252807617188, 21.247291564941406, 17.094188690185547, 1.7723922729492188, -0.9374961853027344, -7.325202941894531, 5.8303375244140625, 0.8604278564453125, 18.908676147460938, 2.0152549743652344, 18.212459564208984, 16.72857666015625, -11.801597595214844, -0.83148193359375, 18.890655517578125, 12.642187118530273, 11.918975830078125, 21.769912719726562, 2.9935531616210938, 5.786109924316406, 27.899658203125, 6.1141815185546875, -0.5171775817871094, 3.9349822998046875, 0.6098175048828125, 2.45068359375, -10.969291687011719, 7.8073272705078125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000186.npy"}
{"epoch": 0.2811791383219955, "step": 187, "batch_size": 64, "mean": 5.348174095153809, "std": 9.408909797668457, "min": -17.849082946777344, "p10": -6.030551147460936, "median": 4.1978349685668945, "p90": 18.777243804931647, "max": 28.123764038085938, "pos_frac": 0.75, "sample": [10.078216552734375, -12.149444580078125, 21.196701049804688, 20.63311767578125, 28.123764038085938, 8.805633544921875, 17.00727081298828, 3.364837646484375, 6.7892303466796875, 10.415145874023438, -6.6898956298828125, 0.89385986328125, 1.4812088012695312, 4.26470947265625, 6.668609619140625, 4.130960464477539, 1.3673896789550781, 11.060020446777344, 8.060684204101562, 13.766946792602539, -6.901397705078125, -17.849082946777344, 11.999435424804688, 0.8145027160644531, 1.1760597229003906, 6.746337890625, 1.6853561401367188, -2.1898193359375, -4.1649169921875, 19.847850799560547, 9.156131744384766, 19.44811248779297, -10.253177642822266, -4.4920806884765625, -1.2347030639648438, 10.462478637695312, 5.170051574707031, 2.2178192138671875, -3.0867919921875, 13.697566986083984, -1.5487346649169922, -3.4164352416992188, 19.855918884277344, 3.1290054321289062, 6.215065002441406, 1.3052597045898438, 8.429893493652344, 4.110710144042969, 3.3983306884765625, 1.939117431640625, 15.746833801269531, -9.479110717773438, 12.628219604492188, 23.88446044921875, 2.6710643768310547, 6.388175964355469, 12.147918701171875, 16.8106689453125, 4.4713897705078125, 17.211883544921875, -0.590179443359375, -4.002656936645508, -13.719390869140625, 3.1770401000976562], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000187.npy"}
{"epoch": 0.28269085411942557, "step": 188, "batch_size": 64, "mean": 5.212343215942383, "std": 8.870586395263672, "min": -12.322097778320312, "p10": -4.270846176147461, "median": 3.0856704711914062, "p90": 18.80484390258789, "max": 25.829803466796875, "pos_frac": 0.6875, "sample": [3.0887374877929688, -2.1561203002929688, 18.911102294921875, 0.49184608459472656, -7.1079559326171875, 5.78497314453125, 10.926902770996094, 10.493782043457031, 8.795181274414062, 5.0511627197265625, -1.2748031616210938, 25.829803466796875, 22.288105010986328, 3.0826034545898438, 9.222137451171875, -2.7765579223632812, 2.5940933227539062, 13.729682922363281, 0.7357158660888672, -3.3235340118408203, 10.630491256713867, 0.5363311767578125, -1.4729766845703125, -8.4324951171875, 0.4416618347167969, 0.35015869140625, -3.9321861267089844, 1.3430633544921875, 5.772552490234375, 19.67205810546875, 15.431827545166016, -0.7095260620117188, 5.519317626953125, 22.52631378173828, 9.701881408691406, -0.4311981201171875, 16.898773193359375, 5.730033874511719, -6.9015045166015625, -0.2668609619140625, 18.21924591064453, -3.33770751953125, 5.024572372436523, 0.7734146118164062, 9.670379638671875, -0.1066131591796875, 6.272922515869141, 23.0858154296875, 7.015960693359375, 1.5120773315429688, 14.899810791015625, -12.322097778320312, -6.4447021484375, 0.9733772277832031, -3.50677490234375, 8.914047241210938, 2.892782211303711, -4.399822235107422, 18.556907653808594, 19.27227210998535, 8.53744888305664, -7.61407470703125, -3.9699020385742188, 12.876052856445312], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000188.npy"}
{"epoch": 0.2842025699168556, "step": 189, "batch_size": 64, "mean": 4.283927917480469, "std": 10.285325050354004, "min": -25.863136291503906, "p10": -8.348439025878905, "median": 4.178722381591797, "p90": 16.628875541687012, "max": 23.109146118164062, "pos_frac": 0.6875, "sample": [-2.7935638427734375, 9.294792175292969, 4.3692779541015625, -6.8616485595703125, -8.718856811523438, 8.059417724609375, 21.574176788330078, 10.995168685913086, 0.9122467041015625, -7.484130859375, -13.573165893554688, 16.46136474609375, -25.863136291503906, 4.311126708984375, -7.198036193847656, 3.057161331176758, 20.165306091308594, 14.351249694824219, -11.109920501708984, 4.046318054199219, 5.968639373779297, 16.51396369934082, 7.080024719238281, 2.8332977294921875, 5.661750793457031, -3.733734130859375, 2.7352371215820312, 0.8288764953613281, 3.6346378326416016, 2.5218429565429688, 9.424636840820312, -9.277740478515625, 1.0264739990234375, 20.207653045654297, 16.678123474121094, 12.03610610961914, -0.12047576904296875, 3.9688339233398438, 15.33864974975586, 12.743370056152344, 14.323421478271484, 3.66046142578125, 10.606674194335938, -1.7493324279785156, 7.616943359375, 11.124908447265625, -3.3190078735351562, -5.8166046142578125, -2.93182373046875, 1.30499267578125, -1.9317474365234375, 16.45311164855957, 13.773170471191406, 15.797584533691406, 16.832916259765625, 7.864482879638672, -2.7424468994140625, -18.23992156982422, 5.742286682128906, -17.691925048828125, 4.360630035400391, -1.681854248046875, 23.109146118164062, 17.640016555786133], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000189.npy"}
{"epoch": 0.2857142857142857, "step": 190, "batch_size": 64, "mean": 7.144102096557617, "std": 9.213125228881836, "min": -18.71712303161621, "p10": -4.722965621948242, "median": 6.312591552734375, "p90": 18.54218444824219, "max": 26.673187255859375, "pos_frac": 0.828125, "sample": [13.228351593017578, 13.883453369140625, 23.790592193603516, -9.845626831054688, 5.841926574707031, 9.021591186523438, 4.205924987792969, 4.415679931640625, -18.71712303161621, 26.673187255859375, -4.874259948730469, 6.230369567871094, 9.270530700683594, 10.940902709960938, 5.579507827758789, 2.2393722534179688, 1.1100730895996094, 10.88232421875, 7.342161178588867, 6.330810546875, 7.0234375, 13.64239501953125, 1.68255615234375, 2.33380126953125, -4.369945526123047, 14.196121215820312, -8.219223022460938, 1.5771522521972656, 3.9835777282714844, 7.730157852172852, 21.907325744628906, -7.163028717041016, 14.185039520263672, 24.31365966796875, 6.29437255859375, 2.3983688354492188, 17.731231689453125, -1.1919784545898438, 3.3172569274902344, 11.287303924560547, 3.0760650634765625, 4.1629180908203125, 16.19659423828125, 21.340545654296875, 5.986232757568359, -0.28188323974609375, -9.377105712890625, 11.9923095703125, -11.295623779296875, 0.9738101959228516, 0.7024765014648438, 11.478006362915039, -0.7446632385253906, 15.160736083984375, 16.634552001953125, 18.341400146484375, 19.565345764160156, 4.808006286621094, 17.09124755859375, 12.301292419433594, 0.8345870971679688, 18.62823486328125, 9.693984985351562, 9.744129180908203], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000190.npy"}
{"epoch": 0.2872260015117158, "step": 191, "batch_size": 64, "mean": 4.318768501281738, "std": 9.389914512634277, "min": -18.3079833984375, "p10": -6.194596862792968, "median": 4.668804168701172, "p90": 15.308944702148441, "max": 25.617469787597656, "pos_frac": 0.71875, "sample": [15.644302368164062, -1.7687606811523438, 3.78173828125, 25.617469787597656, 25.501068115234375, -1.5137176513671875, 5.123565673828125, 14.526443481445312, 1.5708236694335938, 12.99053955078125, -5.952850341796875, 21.63568878173828, -4.155540466308594, 4.875911712646484, 16.681007385253906, 5.276874542236328, -17.630714416503906, 2.21270751953125, 13.375244140625, 9.59524154663086, 8.288185119628906, 3.49334716796875, 18.783584594726562, 5.205841064453125, -16.967044830322266, 4.507072448730469, -0.7994613647460938, 5.552402496337891, -6.2982025146484375, -0.8627700805664062, 3.2528457641601562, 1.0081253051757812, 3.2311744689941406, 7.263099670410156, 6.907073974609375, -13.230209350585938, -1.408172607421875, -9.365402221679688, 10.536441802978516, 9.968040466308594, 13.019912719726562, 14.158317565917969, -4.598731994628906, 4.830535888671875, 7.7154083251953125, 1.8355236053466797, 2.74029541015625, 13.089630126953125, 0.6139030456542969, 0.9661407470703125, -4.772983551025391, 16.03527069091797, 8.42095947265625, -9.368507385253906, -2.5577468872070312, 13.613311767578125, 9.3914794921875, -18.3079833984375, 7.083026885986328, 0.8118133544921875, 5.303169250488281, 13.37567138671875, 0.0145721435546875, -3.4647979736328125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000191.npy"}
{"epoch": 0.2887377173091459, "step": 192, "batch_size": 64, "mean": 3.151545763015747, "std": 8.253920555114746, "min": -28.35144805908203, "p10": -4.609989261627197, "median": 2.3901262283325195, "p90": 11.755804443359375, "max": 24.844223022460938, "pos_frac": 0.625, "sample": [-28.35144805908203, 3.6397972106933594, 1.0152053833007812, 7.683563232421875, 6.737384796142578, 3.8526382446289062, -4.610304832458496, 11.728759765625, 2.33428955078125, 8.592506408691406, -1.2186126708984375, -7.91326904296875, -10.084850311279297, 1.2423973083496094, -0.44954681396484375, -1.548309326171875, 19.201770782470703, -2.5213470458984375, 1.6777076721191406, 1.3112335205078125, 7.8161773681640625, 24.5804443359375, 2.445962905883789, -2.102081298828125, 9.427947998046875, 8.225900650024414, -1.6681365966796875, 5.848197937011719, -0.518646240234375, 7.013202667236328, 12.557083129882812, 8.608406066894531, -4.6092529296875, 15.578285217285156, -0.7801113128662109, -1.5748882293701172, 0.55657958984375, 0.8092689514160156, -6.863254547119141, 7.8417510986328125, 6.555946350097656, -10.459793090820312, -4.909431457519531, 24.844223022460938, 5.1941680908203125, 6.153402328491211, -4.128366470336914, 10.987932205200195, 4.355918884277344, -1.8414230346679688, 2.578521728515625, 11.124275207519531, -1.778472900390625, 5.562705993652344, -1.0540847778320312, 10.885360717773438, 4.230369567871094, -1.3853225708007812, 11.76739501953125, 16.039108276367188, 3.787933349609375, -3.00653076171875, -0.600006103515625, 1.2826919555664062], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000192.npy"}
{"epoch": 0.29024943310657597, "step": 193, "batch_size": 64, "mean": 6.641099452972412, "std": 9.528197288513184, "min": -13.049285888671875, "p10": -3.5269199371337887, "median": 5.244096755981445, "p90": 20.454435348510742, "max": 26.515708923339844, "pos_frac": 0.6875, "sample": [3.4081802368164062, 5.052677154541016, 13.246618270874023, -2.7500457763671875, 3.0552101135253906, -4.760459899902344, 5.435516357421875, 2.10919189453125, -3.7395477294921875, -1.0003166198730469, 6.868724822998047, -1.6494865417480469, 10.702877044677734, -1.561187744140625, 22.076045989990234, -4.3562164306640625, 14.623123168945312, 3.6313247680664062, 6.647890090942383, 2.0600738525390625, -3.0089054107666016, 16.914775848388672, 19.76666259765625, 1.1299476623535156, 7.3271026611328125, 13.437232971191406, 1.888010025024414, 9.711917877197266, 1.782196044921875, 14.109134674072266, 21.852703094482422, 8.714431762695312, -4.422939300537109, 23.46204376220703, 7.149772644042969, 6.3712615966796875, -2.3462066650390625, -8.496963500976562, 12.695404052734375, 6.304168701171875, 6.8041839599609375, -3.0307884216308594, 19.47911834716797, 26.515708923339844, -2.0592079162597656, -0.240234375, 4.038257598876953, -1.9520187377929688, -1.1147994995117188, 15.755355834960938, 1.86492919921875, -2.6105575561523438, 25.74840545654297, 20.388519287109375, 7.292171478271484, 19.00157928466797, 23.369903564453125, -13.049285888671875, 16.624412536621094, 20.482685089111328, -2.168243408203125, 4.596456527709961, -9.26840591430664, 15.120254516601562], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000193.npy"}
{"epoch": 0.29176114890400606, "step": 194, "batch_size": 64, "mean": 3.219606876373291, "std": 10.38158893585205, "min": -21.61644744873047, "p10": -11.030622482299803, "median": 2.689830780029297, "p90": 16.75910110473633, "max": 26.2542724609375, "pos_frac": 0.65625, "sample": [2.5555191040039062, -1.6907882690429688, 6.323616027832031, -4.366571426391602, 4.4791107177734375, -5.039421081542969, -11.883550643920898, 8.561206817626953, -2.2414016723632812, 6.0467071533203125, -1.9732494354248047, 3.957427978515625, 2.7556228637695312, 18.164871215820312, -17.512893676757812, 1.094970703125, 13.726455688476562, -5.39288330078125, 3.319103240966797, 0.1956024169921875, 10.938156127929688, -2.355215072631836, 1.3969039916992188, 9.82427978515625, -7.609699249267578, 2.6240386962890625, -4.536670684814453, 10.521209716796875, 1.2580184936523438, 16.441619873046875, -8.135128021240234, 4.5087738037109375, -11.701251983642578, -14.425697326660156, -17.818206787109375, -7.316459655761719, 18.036415100097656, 19.964767456054688, 12.411628723144531, 13.182392120361328, -11.748294830322266, 0.8789443969726562, 4.69597053527832, -1.5038871765136719, 15.513534545898438, 21.83056640625, 8.816741943359375, -21.61644744873047, 26.2542724609375, 4.218231201171875, 8.851133346557617, -9.4658203125, -0.7553634643554688, 19.230932235717773, 13.781120300292969, 15.627090454101562, 14.051717758178711, 10.102226257324219, 0.5428924560546875, 16.895164489746094, 0.4549064636230469, 0.15282821655273438, -5.037113189697266, 5.994163513183594], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000194.npy"}
{"epoch": 0.29327286470143615, "step": 195, "batch_size": 64, "mean": 5.015788555145264, "std": 8.535880088806152, "min": -15.10357666015625, "p10": -5.049653244018554, "median": 5.063207626342773, "p90": 15.785859680175783, "max": 25.68645477294922, "pos_frac": 0.71875, "sample": [3.2683868408203125, 7.7035064697265625, -9.537643432617188, 5.6848297119140625, 23.17919921875, 7.491340637207031, 4.069160461425781, 2.4153671264648438, 11.55877685546875, 22.577682495117188, 9.077407836914062, 4.082263946533203, -2.2545547485351562, -12.626167297363281, -3.7018890380859375, -15.10357666015625, -1.3115692138671875, 12.076072692871094, 1.7430496215820312, 7.042430877685547, -4.4529571533203125, -0.04644775390625, 13.94124984741211, 9.140182495117188, 3.2719268798828125, 5.28778076171875, 4.064136505126953, 7.1616973876953125, 5.7147674560546875, 7.530975341796875, 3.3318862915039062, 3.2655258178710938, 22.708290100097656, 6.889122009277344, 15.393722534179688, -2.8944320678710938, 5.066200256347656, 7.3299713134765625, 5.831184387207031, 25.68645477294922, -0.4558219909667969, -3.0537872314453125, 8.042587280273438, -2.795013427734375, 5.060214996337891, 8.558143615722656, -7.21246337890625, 15.95391845703125, -5.2299652099609375, -7.139179229736328, -4.628925323486328, -2.85675048828125, 10.768806457519531, 19.53662109375, 3.0782012939453125, 8.697486877441406, 2.463775634765625, 7.514152526855469, -6.169273376464844, 23.26910400390625, 4.409095764160156, 4.409152984619141, 6.08026123046875, 11.054798126220703], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000195.npy"}
{"epoch": 0.2947845804988662, "step": 196, "batch_size": 64, "mean": 4.339295864105225, "std": 9.078714370727539, "min": -21.688034057617188, "p10": -5.016578865051269, "median": 4.143960952758789, "p90": 15.281288909912112, "max": 38.175537109375, "pos_frac": 0.703125, "sample": [-0.8533935546875, -21.688034057617188, 0.4828376770019531, -3.074859619140625, 4.740795135498047, -0.5121021270751953, 5.14727783203125, 3.5471267700195312, 7.6190643310546875, -6.1878204345703125, -3.77215576171875, 22.137611389160156, 7.6176300048828125, 2.1682662963867188, 5.157951354980469, 1.407623291015625, -10.37985610961914, 23.310604095458984, -10.643653869628906, 16.270217895507812, -4.278600692749023, 8.04538345336914, 18.55939483642578, 2.8243675231933594, -5.332855224609375, 7.7013702392578125, 11.4122314453125, 7.4077606201171875, 4.882904052734375, 38.175537109375, 1.4957160949707031, 8.106094360351562, -0.78070068359375, -2.403289794921875, 1.7209625244140625, 14.522697448730469, -2.7669754028320312, 8.788509368896484, 1.7332077026367188, 2.5429153442382812, 11.257438659667969, 6.073211669921875, 9.239299774169922, -1.6119461059570312, 15.606399536132812, -14.697174072265625, -0.5334606170654297, 2.3671875, 8.11505126953125, 8.53369140625, 6.277069091796875, 10.91545295715332, -1.0461273193359375, -1.6953048706054688, 2.389425277709961, 6.782135009765625, 7.405387878417969, 12.187301635742188, 1.8525390625, -5.6689910888671875, 5.664787292480469, 0.14191055297851562, 7.669532775878906, 15.638351440429688], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000196.npy"}
{"epoch": 0.2962962962962963, "step": 197, "batch_size": 64, "mean": 5.900899887084961, "std": 11.410016059875488, "min": -26.781692504882812, "p10": -8.6554573059082, "median": 7.071071624755859, "p90": 19.911020660400393, "max": 29.31140899658203, "pos_frac": 0.71875, "sample": [-4.910915374755859, 27.187057495117188, 4.913539886474609, 2.5131149291992188, 12.995880126953125, 4.3954620361328125, -13.380149841308594, -17.295639038085938, 7.124355316162109, 9.9080810546875, -0.32659912109375, 20.917457580566406, -15.338191986083984, -3.788349151611328, 7.667449951171875, 11.456336975097656, 10.984718322753906, 4.503173828125, 20.234519958496094, 22.220321655273438, -2.0573272705078125, 5.765342712402344, 11.052001953125, 7.670804977416992, 17.900779724121094, -0.7085189819335938, 14.964775085449219, -5.693037033081055, 7.017787933349609, 7.396049499511719, 17.58490753173828, 24.361328125, -26.781692504882812, 16.767236709594727, -4.444583892822266, -16.279312133789062, 4.7841033935546875, 10.834808349609375, 19.15618896484375, -2.6286468505859375, -1.958958625793457, 12.127254486083984, 18.802318572998047, 2.6974945068359375, 2.7432174682617188, 11.195266723632812, -4.3547210693359375, 0.7117805480957031, 7.360374450683594, 17.764110565185547, -9.925065994262695, 0.1223907470703125, 23.03412628173828, 10.387056350708008, -3.450113296508789, -13.44329833984375, 29.31140899658203, 11.174736022949219, 8.906547546386719, 11.486610412597656, 3.495532989501953, 0.0292510986328125, 5.746149063110352, 15.049530029296875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000197.npy"}
{"epoch": 0.29780801209372637, "step": 198, "batch_size": 64, "mean": 5.9372639656066895, "std": 8.91983699798584, "min": -18.12261962890625, "p10": -5.6752046585083, "median": 5.987797737121582, "p90": 17.18957672119141, "max": 24.921213150024414, "pos_frac": 0.765625, "sample": [-6.75701904296875, 12.693710327148438, 1.9341506958007812, 1.5676498413085938, 5.470273971557617, 16.292701721191406, 1.8666915893554688, 16.2757568359375, 3.757091522216797, 13.604827880859375, 9.315383911132812, 8.521347045898438, 3.5586090087890625, -5.10772705078125, 18.42757797241211, 18.12552261352539, 2.3055572509765625, 15.833465576171875, 2.4874725341796875, 1.3260440826416016, 8.692703247070312, 8.919330596923828, 12.661605834960938, -11.898185729980469, 15.382057189941406, -18.12261962890625, -7.376983642578125, -1.4440422058105469, 19.932907104492188, 12.177875518798828, 11.165130615234375, 13.0921630859375, 2.1126861572265625, -13.64910888671875, 7.422889709472656, 2.1579627990722656, -0.145721435546875, -3.3082008361816406, 5.2828826904296875, 6.599800109863281, 12.021614074707031, 21.618614196777344, 17.573951721191406, 5.673755645751953, 7.088314056396484, 4.2357177734375, -6.418338775634766, 21.068710327148438, 24.921213150024414, -2.1171798706054688, -3.212799072265625, 3.8926544189453125, 13.602668762207031, 14.225996017456055, -2.0133438110351562, 1.0576629638671875, 10.839157104492188, -4.109630584716797, -5.91840934753418, 6.301839828491211, 9.141616821289062, 8.40545654296875, 8.276222229003906, 2.675203323364258], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000198.npy"}
{"epoch": 0.29931972789115646, "step": 199, "batch_size": 64, "mean": 5.0952534675598145, "std": 10.930628776550293, "min": -20.801048278808594, "p10": -7.11936798095703, "median": 5.854846954345703, "p90": 17.21342277526856, "max": 26.939865112304688, "pos_frac": 0.640625, "sample": [16.37234878540039, 13.841842651367188, 8.21832275390625, 9.800235748291016, 0.4275951385498047, -13.273258209228516, 26.939865112304688, 17.573883056640625, 7.330940246582031, 13.106552124023438, 7.686431884765625, -3.390308380126953, 12.707267761230469, 15.165718078613281, 11.272689819335938, -1.37176513671875, 18.403961181640625, 12.407661437988281, 8.906295776367188, 9.836036682128906, -0.7434921264648438, -15.240890502929688, 16.321399688720703, 16.181442260742188, -0.3213043212890625, 1.5170211791992188, 0.16350555419921875, -6.155906677246094, 4.347965240478516, 13.511886596679688, 4.422660827636719, 7.808462142944336, 13.410392761230469, -3.0905532836914062, -0.6174106597900391, -2.0249481201171875, 24.990936279296875, -0.756072998046875, -7.532279968261719, 11.0751953125, -20.801048278808594, 10.1883544921875, -0.47316741943359375, 15.258026123046875, -13.60589599609375, 7.2870330810546875, 13.303192138671875, 0.9920120239257812, 0.36571502685546875, 0.2014923095703125, -2.7385311126708984, -16.87163543701172, -4.948333740234375, 21.099105834960938, -20.784713745117188, -2.8940811157226562, 13.688899993896484, -3.8274078369140625, 14.38372802734375, -1.6414718627929688, 22.615638732910156, -0.60052490234375, 24.28460693359375, 2.384918212890625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000199.npy"}
{"epoch": 0.30083144368858655, "step": 200, "batch_size": 64, "mean": 6.012909889221191, "std": 11.539518356323242, "min": -24.681915283203125, "p10": -6.577841567993162, "median": 5.431173324584961, "p90": 21.18509330749512, "max": 32.317413330078125, "pos_frac": 0.703125, "sample": [3.926547050476074, 6.623355865478516, 21.66558837890625, 14.607795715332031, 14.874160766601562, 0.39606666564941406, 25.52013397216797, 3.36529541015625, 9.779998779296875, -1.364959716796875, -0.3498382568359375, 1.790740966796875, -5.006401062011719, 23.9647216796875, 2.9440040588378906, -2.223377227783203, -0.300048828125, -0.6504364013671875, 15.608673095703125, -11.7470703125, 1.664215087890625, -2.657623291015625, 16.095474243164062, 20.623043060302734, 3.180217742919922, 6.311962127685547, 13.165596008300781, -4.470001220703125, 7.555332183837891, -14.504158020019531, 18.44799041748047, 14.431175231933594, -0.561767578125, 26.058944702148438, 20.140853881835938, -10.586994171142578, -24.681915283203125, 11.708209991455078, 14.829513549804688, 20.360252380371094, 14.391937255859375, 5.893329620361328, 32.317413330078125, 14.875205993652344, 0.4618682861328125, -20.227951049804688, -0.94256591796875, 1.77728271484375, 6.923561096191406, 23.518089294433594, 5.907173156738281, 6.24102783203125, 2.0550899505615234, -1.0428047180175781, 6.3370513916015625, 13.0867919921875, -3.931365966796875, 2.6425628662109375, 2.4832382202148438, 9.905525207519531, -7.251316070556641, -17.52516746520996, 21.42597198486328, 4.969017028808594], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000200.npy"}
{"epoch": 0.30234315948601664, "step": 201, "batch_size": 64, "mean": 6.195756912231445, "std": 9.086164474487305, "min": -15.535720825195312, "p10": -3.9673618316650385, "median": 5.677070617675781, "p90": 18.984067535400392, "max": 25.451080322265625, "pos_frac": 0.71875, "sample": [1.8409576416015625, 5.9862518310546875, 2.1851959228515625, 5.971889495849609, -1.3018112182617188, 5.7537841796875, 16.813125610351562, -0.0386810302734375, 2.9589672088623047, -0.8704071044921875, 8.944976806640625, 8.873031616210938, 18.955116271972656, 7.4039154052734375, -1.1270942687988281, -6.830474853515625, 20.524826049804688, 14.477561950683594, 1.018280029296875, 18.353593826293945, -7.1031341552734375, 15.39739990234375, 4.755516052246094, -0.15010833740234375, -2.5753650665283203, 23.768077850341797, 4.664239883422852, 13.260467529296875, -4.224639892578125, 18.996475219726562, 0.10102462768554688, 9.213394165039062, 5.951457977294922, -3.226604461669922, 19.924449920654297, 25.451080322265625, 3.009857177734375, -15.535720825195312, 14.640167236328125, 5.6003570556640625, 15.816558837890625, 10.688789367675781, -3.367046356201172, 6.945674896240234, -8.808332443237305, 16.65081787109375, 8.717967987060547, -0.00030517578125, -4.568178176879883, 19.277877807617188, -1.4015121459960938, -2.2206878662109375, 0.8556938171386719, 8.226299285888672, 13.2635498046875, 16.83746337890625, 15.749134063720703, 0.2482147216796875, 3.5774688720703125, 1.8756256103515625, 19.451416015625, 6.56817626953125, 2.237457275390625, -11.905075073242188], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000201.npy"}
{"epoch": 0.30385487528344673, "step": 202, "batch_size": 64, "mean": 5.776695728302002, "std": 9.473567008972168, "min": -15.02358627319336, "p10": -4.776603698730469, "median": 6.202314376831055, "p90": 16.95782852172852, "max": 30.95520782470703, "pos_frac": 0.734375, "sample": [-3.7623062133789062, 12.625473022460938, 25.707977294921875, 17.539520263671875, 12.363363265991211, 0.0876312255859375, 6.416419982910156, 13.367538452148438, 11.758321762084961, 4.316463470458984, 8.571990966796875, 9.769973754882812, 15.60614013671875, 2.450244903564453, -1.5605583190917969, 20.72429656982422, 12.32891845703125, -4.212757110595703, 7.597991943359375, -5.410888671875, 9.612075805664062, 6.044521331787109, 3.4971389770507812, -4.91912841796875, -13.072879791259766, 6.40032958984375, 4.449901580810547, 30.95520782470703, 0.017425537109375, 13.99637222290039, -2.9862442016601562, 4.236396789550781, 9.030204772949219, 0.441864013671875, 1.1216278076171875, -14.810813903808594, 17.351768493652344, 6.360107421875, 8.959453582763672, -11.397228240966797, -0.6527862548828125, 3.90814208984375, 0.16417694091796875, 16.03863525390625, 13.842979431152344, 1.2296371459960938, 15.437919616699219, 2.7277603149414062, 12.683967590332031, -4.4440460205078125, 1.7232170104980469, -3.1601905822753906, -0.6492748260498047, -2.548320770263672, -6.9509124755859375, 8.37579345703125, 22.953948974609375, 10.99283218383789, -1.3153934478759766, 19.025102615356445, -15.02358627319336, 14.945396423339844, 9.284370422363281, 9.545303344726562], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000202.npy"}
{"epoch": 0.30536659108087677, "step": 203, "batch_size": 64, "mean": 5.769729137420654, "std": 8.991868019104004, "min": -19.217819213867188, "p10": -2.615773773193359, "median": 4.233978271484375, "p90": 17.18982696533203, "max": 32.39839172363281, "pos_frac": 0.78125, "sample": [6.5020294189453125, 1.1595840454101562, -13.524734497070312, 0.740081787109375, -0.436492919921875, 9.394952774047852, 0.2223968505859375, 14.133697509765625, 5.094886779785156, -3.079875946044922, 24.657089233398438, 6.132266998291016, 0.6270713806152344, 32.39839172363281, 0.23581314086914062, 18.555435180664062, 16.012351989746094, -0.6132354736328125, 16.21015167236328, 13.3861083984375, -19.217819213867188, -1.8998947143554688, 7.776031494140625, -3.4176368713378906, -2.088714599609375, -3.6970367431640625, 2.278839111328125, 10.632034301757812, 5.4445037841796875, 6.957740783691406, 10.069908142089844, 5.984161376953125, 1.7207260131835938, 3.3816184997558594, 16.834945678710938, 0.9870681762695312, 18.39899444580078, 10.51739501953125, 27.634368896484375, 4.233528137207031, 10.968009948730469, 7.629302978515625, 1.703948974609375, 3.3764991760253906, 17.3419189453125, 2.2246780395507812, 6.710357666015625, 4.989715576171875, -2.3455886840820312, 24.338359832763672, 11.565975189208984, -1.8365325927734375, 1.1549835205078125, 4.579387664794922, 1.6670989990234375, -4.027015686035156, 2.184326171875, 0.5669097900390625, 13.311878204345703, 4.234428405761719, -1.4878311157226562, -2.7315673828125, 1.25738525390625, 11.547298431396484], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000203.npy"}
{"epoch": 0.30687830687830686, "step": 204, "batch_size": 64, "mean": 6.20828104019165, "std": 9.248422622680664, "min": -17.368427276611328, "p10": -3.3703655242919917, "median": 5.427648544311523, "p90": 19.90415725708008, "max": 30.25615692138672, "pos_frac": 0.75, "sample": [27.196693420410156, 14.281517028808594, -9.137104034423828, 6.977558135986328, 0.7982368469238281, 30.25615692138672, 4.916778564453125, 11.264501571655273, 1.4866180419921875, 9.343719482421875, 19.698165893554688, -2.9323501586914062, -5.86131477355957, 9.90496826171875, 4.134712219238281, -1.13134765625, 10.244682312011719, 5.344905853271484, -9.0628662109375, -0.5521774291992188, 3.302600860595703, 11.593826293945312, 5.535408020019531, 13.995620727539062, 0.000774383544921875, -2.3390960693359375, 12.98504638671875, 7.228107452392578, 4.179019927978516, 5.6438446044921875, 14.136676788330078, -4.738555908203125, 10.442607879638672, 11.142326354980469, 7.450294494628906, -3.558086395263672, 6.725364685058594, 21.427459716796875, 2.1032676696777344, 8.707359313964844, 3.9681930541992188, 1.8302078247070312, 16.6409912109375, 19.99243927001953, -9.430770874023438, -0.045368194580078125, 5.926033020019531, 3.6730289459228516, 22.347518920898438, 1.2980575561523438, 16.810882568359375, 23.41912841796875, 3.848388671875, 10.582130432128906, 1.6063995361328125, 22.573822021484375, -2.3693771362304688, 1.3385810852050781, -0.09668731689453125, 5.5103912353515625, -17.368427276611328, -2.0419273376464844, 6.122535705566406, -1.9421234130859375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000204.npy"}
{"epoch": 0.30839002267573695, "step": 205, "batch_size": 64, "mean": 4.923447608947754, "std": 10.68130111694336, "min": -16.611412048339844, "p10": -7.498354721069335, "median": 3.4243202209472656, "p90": 20.33745651245118, "max": 30.789459228515625, "pos_frac": 0.65625, "sample": [6.787628173828125, 1.6950607299804688, -0.9078922271728516, -6.5512542724609375, 5.082956314086914, 23.434200286865234, 0.6803092956542969, 17.77020263671875, -1.37799072265625, 23.84527587890625, 4.8672027587890625, 3.5994205474853516, 30.789459228515625, 12.938541412353516, 22.460912704467773, -2.562549591064453, 9.371841430664062, -5.68450927734375, 3.1177978515625, 1.6309242248535156, 1.152334213256836, 9.043830871582031, 1.1176109313964844, 3.4400711059570312, 9.158615112304688, -3.3096656799316406, -4.192268371582031, -2.7727088928222656, -3.151702880859375, 9.226760864257812, -0.323211669921875, -3.4021224975585938, -6.3621826171875, -0.0059661865234375, -9.626022338867188, 5.9326934814453125, 6.332218170166016, 3.4085693359375, -5.2736663818359375, 16.93377685546875, -12.710662841796875, 18.487411499023438, 15.443744659423828, 17.131591796875, 27.621902465820312, 15.1385498046875, 1.94073486328125, 10.495170593261719, 24.876808166503906, -3.7719268798828125, -9.303577423095703, 10.553611755371094, 0.8553733825683594, 13.635154724121094, -11.636489868164062, 21.130332946777344, 8.877265930175781, -13.613285064697266, 16.165634155273438, -7.904254913330078, 4.717597961425781, -16.611412048339844, 4.129039764404297, 1.1378459930419922], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000205.npy"}
{"epoch": 0.30990173847316704, "step": 206, "batch_size": 64, "mean": 3.545210123062134, "std": 9.580803871154785, "min": -18.106964111328125, "p10": -5.204959106445313, "median": 1.3975296020507812, "p90": 14.883589172363282, "max": 30.500778198242188, "pos_frac": 0.625, "sample": [-1.7888221740722656, -9.64019775390625, 4.790863037109375, -4.710304260253906, -7.762228012084961, -2.3570098876953125, 4.959953308105469, 1.4745559692382812, 13.173797607421875, 5.29536247253418, 0.4341888427734375, 23.466629028320312, 6.461784362792969, -4.424285888671875, 0.04053497314453125, 22.52142333984375, -0.8779525756835938, 6.84344482421875, -15.04400634765625, 0.2888526916503906, 30.500778198242188, 1.3205032348632812, 0.22113800048828125, -1.3914642333984375, -0.7504043579101562, 14.443023681640625, 8.8165283203125, -15.76694107055664, 10.1597900390625, 13.65325927734375, 14.125999450683594, 11.911575317382812, 4.938201904296875, 15.63228988647461, 4.220123291015625, -18.106964111328125, -16.478195190429688, -4.1398773193359375, -2.3094749450683594, 2.6055450439453125, 5.884662628173828, 0.9211044311523438, 10.759445190429688, -0.21332550048828125, -5.045585632324219, 2.02880859375, 21.71505355834961, 13.513519287109375, -2.1502227783203125, -4.019989013671875, 8.155593872070312, 0.7761611938476562, 7.449165344238281, -0.13535308837890625, 16.98639678955078, 14.853607177734375, -0.15159988403320312, 4.5600738525390625, -3.570758819580078, 14.896438598632812, -5.273262023925781, 7.9320068359375, -0.3956756591796875, 0.6651687622070312], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000206.npy"}
{"epoch": 0.31141345427059713, "step": 207, "batch_size": 64, "mean": 5.854440689086914, "std": 9.81875991821289, "min": -17.709152221679688, "p10": -3.718806648254394, "median": 4.101835250854492, "p90": 16.339612579345705, "max": 39.480255126953125, "pos_frac": 0.734375, "sample": [39.480255126953125, -1.5308074951171875, 0.7633190155029297, 4.771034240722656, 7.745426177978516, 7.691261291503906, 4.6519012451171875, 1.182373046875, 1.964385986328125, 11.10675048828125, 2.5682029724121094, 12.766918182373047, 1.0784282684326172, 13.366340637207031, -3.1487789154052734, -3.9697723388671875, -6.103981018066406, 15.52984619140625, 10.26165771484375, 11.798233032226562, 9.354148864746094, 4.115623474121094, 2.3703994750976562, -3.963104248046875, -0.4615631103515625, 0.3463287353515625, -2.508840560913086, 21.979774475097656, -2.5291290283203125, 1.4393463134765625, 15.704158782958984, 5.462898254394531, 33.5736083984375, 0.6647796630859375, 1.4891815185546875, 4.088047027587891, -6.346744537353516, 14.732349395751953, -1.17803955078125, -0.7539339065551758, 12.02093505859375, 4.533378601074219, 4.565330505371094, 4.302562713623047, 13.208282470703125, 14.395042419433594, 3.235870361328125, -1.5787506103515625, 16.611949920654297, 8.42498779296875, -17.709152221679688, 0.5342559814453125, 6.415557861328125, 6.456798553466797, 1.2208251953125, 24.288482666015625, 21.277366638183594, -2.4658050537109375, 11.985122680664062, -2.7090110778808594, -6.655445098876953, 2.180633544921875, -5.524223327636719, 26.11695098876953], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000207.npy"}
{"epoch": 0.3129251700680272, "step": 208, "batch_size": 64, "mean": 3.2392849922180176, "std": 9.753019332885742, "min": -16.8131103515625, "p10": -10.574472045898437, "median": 1.849431037902832, "p90": 15.830787658691406, "max": 22.038551330566406, "pos_frac": 0.609375, "sample": [-7.240253448486328, 0.23818397521972656, 12.599081039428711, -5.663393020629883, 16.356536865234375, 15.839462280273438, 2.071046829223633, 8.442474365234375, -16.8131103515625, -13.389694213867188, -0.8837356567382812, -0.15740966796875, 9.250789642333984, 9.839736938476562, 6.516075134277344, 8.809377670288086, 0.15102005004882812, 4.285308837890625, -1.7734222412109375, -10.184738159179688, 10.911277770996094, -11.463981628417969, 0.7270355224609375, -1.0023345947265625, 6.249912261962891, 13.709692001342773, 12.144477844238281, -2.8059139251708984, -0.47573089599609375, 14.488655090332031, -4.553194046020508, 4.6940460205078125, -9.569313049316406, -0.139556884765625, -8.84613037109375, 12.76824951171875, 21.745086669921875, -2.9730300903320312, -1.0178985595703125, -2.593991279602051, 22.038551330566406, 7.1998443603515625, 2.9622116088867188, -14.181159973144531, 13.615253448486328, 10.231040954589844, 19.75811767578125, 0.7719173431396484, 12.107612609863281, 6.866085052490234, -10.741500854492188, 17.63190460205078, 0.45619964599609375, 18.28345489501953, -5.0836181640625, 7.426979064941406, 3.6246337890625, 1.6278152465820312, -10.77407455444336, -2.0515213012695312, -15.871456146240234, 0.4896087646484375, 14.825096130371094, 15.810546875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000208.npy"}
{"epoch": 0.3144368858654573, "step": 209, "batch_size": 64, "mean": 7.20372200012207, "std": 10.547962188720703, "min": -12.747650146484375, "p10": -6.254153060913086, "median": 5.49653434753418, "p90": 19.437786102294922, "max": 35.05816650390625, "pos_frac": 0.75, "sample": [19.556838989257812, 17.211423873901367, -2.2978057861328125, 15.540355682373047, -4.813938140869141, 29.262847900390625, 6.286083221435547, 14.836708068847656, 1.7066421508789062, 35.05816650390625, 1.1877593994140625, -3.8342666625976562, -12.747650146484375, 17.72833251953125, -9.208206176757812, 2.2441177368164062, -0.38204193115234375, 1.8256340026855469, 1.7400588989257812, 30.469223022460938, 3.322742462158203, -6.457744598388672, -5.779106140136719, 9.4833984375, 13.317840576171875, 0.7913436889648438, 0.568389892578125, 11.823348999023438, 11.871734619140625, 16.473739624023438, 12.294132232666016, 7.3582763671875, 9.074172973632812, 13.036941528320312, 17.67437744140625, 4.261104583740234, 22.60882568359375, 4.480445861816406, -9.508804321289062, 8.28460693359375, -0.9127960205078125, 0.13657283782958984, 1.1630630493164062, -8.17059326171875, 17.431411743164062, 5.226768493652344, 19.159996032714844, 15.690774917602539, 6.6841888427734375, 16.518985748291016, 2.258687973022461, -8.728437423706055, -1.966928482055664, 1.758575439453125, -2.9459381103515625, 5.766300201416016, 22.502174377441406, -7.035247802734375, 9.037933349609375, 13.997369766235352, 24.21030044555664, 18.442420959472656, 4.661235809326172, -0.198638916015625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000209.npy"}
{"epoch": 0.31594860166288735, "step": 210, "batch_size": 64, "mean": 7.0089311599731445, "std": 9.389031410217285, "min": -10.555679321289062, "p10": -6.642888259887695, "median": 7.525676727294922, "p90": 19.360800933837893, "max": 26.43072509765625, "pos_frac": 0.6875, "sample": [9.212326049804688, -0.8501205444335938, 2.4556884765625, 8.852615356445312, -0.6040992736816406, 11.650154113769531, 22.328094482421875, 12.199668884277344, 18.974761962890625, 10.241714477539062, 18.271400451660156, 26.43072509765625, 10.180397033691406, 11.719146728515625, 18.741493225097656, -0.9662628173828125, -9.870035171508789, -10.555679321289062, 22.721031188964844, -6.280666351318359, 9.381568908691406, 14.867652893066406, 7.137424468994141, -1.822601318359375, 19.081344604492188, -0.36931610107421875, -1.7393131256103516, 7.830482482910156, 14.522415161132812, -7.496437072753906, 14.129058837890625, 5.991569519042969, 7.2208709716796875, 3.8104095458984375, 4.412647247314453, -0.6382293701171875, -0.034267425537109375, -3.1547279357910156, -9.416755676269531, 20.200668334960938, -6.798126220703125, 6.35345458984375, 11.483306884765625, 1.9146499633789062, 19.480567932128906, 1.2957725524902344, 12.790260314941406, -0.8327350616455078, 14.967002868652344, 6.407123565673828, 4.054447174072266, 10.102657318115234, 8.678192138671875, -9.57864761352539, -0.39139556884765625, 24.14887237548828, 20.386775970458984, 5.509052276611328, 8.71788215637207, -3.490997314453125, 14.417745590209961, -7.468940734863281, 11.856613159179688, 15.801227569580078], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000210.npy"}
{"epoch": 0.31746031746031744, "step": 211, "batch_size": 64, "mean": 5.841667175292969, "std": 9.339803695678711, "min": -13.998886108398438, "p10": -6.0627418518066385, "median": 4.999076843261719, "p90": 16.90578842163086, "max": 36.136932373046875, "pos_frac": 0.703125, "sample": [4.0119476318359375, -11.257928848266602, 12.615089416503906, 5.0412750244140625, -2.975116729736328, -0.26471710205078125, -6.799285888671875, 12.331634521484375, 8.934539794921875, 2.2603187561035156, 2.974853515625, 7.382488250732422, -0.295562744140625, 16.950592041015625, 2.586315155029297, 9.323310852050781, 1.8018569946289062, -2.749256134033203, 8.017402648925781, -2.4047317504882812, 15.762962341308594, -1.871673583984375, 11.7939453125, -8.322052001953125, -3.3999671936035156, -4.1075897216796875, 2.709869384765625, 1.877044677734375, 7.611591339111328, 9.512395858764648, 5.823699951171875, -6.8482513427734375, 2.7958106994628906, -0.5252838134765625, 12.922744750976562, 16.801246643066406, 4.848838806152344, 8.51544189453125, -7.135684967041016, 14.206768035888672, 7.291114807128906, 36.136932373046875, 24.13006591796875, 10.018047332763672, 2.9799423217773438, 15.978279113769531, 9.201583862304688, 21.08197021484375, -4.344139099121094, 20.90264892578125, -13.998886108398438, 4.956878662109375, 11.590408325195312, 12.982032775878906, 19.342185974121094, 13.616950988769531, 13.041805267333984, 9.3651123046875, -2.7098426818847656, 18.9088134765625, -9.199317932128906, -0.8538131713867188, 0.849945068359375, 2.141082763671875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000211.npy"}
{"epoch": 0.31897203325774753, "step": 212, "batch_size": 64, "mean": 7.075168609619141, "std": 10.566936492919922, "min": -20.56536865234375, "p10": -4.231561660766601, "median": 7.264531135559082, "p90": 20.448247909545902, "max": 29.851943969726562, "pos_frac": 0.75, "sample": [27.644088745117188, 13.214935302734375, 14.012252807617188, 13.477046966552734, 1.0152740478515625, 3.7074966430664062, 26.350509643554688, 2.488311767578125, -7.61846923828125, 17.028236389160156, -3.2974815368652344, -20.56536865234375, 19.687175750732422, 17.86395263671875, 10.885368347167969, 6.967781066894531, 20.77442169189453, 7.561281204223633, -16.834518432617188, 10.060020446777344, 7.786125183105469, 0.36456871032714844, 3.2593231201171875, 6.611162185668945, -8.098724365234375, 0.9016876220703125, 11.79400634765625, -2.724517822265625, 14.048225402832031, 29.851943969726562, 11.73714828491211, 8.27215576171875, 2.837860107421875, 3.2799911499023438, 15.115623474121094, 11.039688110351562, -2.4191856384277344, 13.449752807617188, -0.2461395263671875, 8.875030517578125, 10.7474365234375, -0.9312629699707031, 13.656234741210938, -2.3650054931640625, 3.2538204193115234, 10.24069595336914, -1.2168693542480469, 27.039398193359375, -10.246871948242188, -12.438995361328125, 4.906181335449219, 2.1081695556640625, 4.6275482177734375, -2.5411148071289062, 15.206771850585938, 24.791873931884766, 21.268325805664062, 18.181655883789062, 16.517040252685547, 6.6312713623046875, -4.50860595703125, 2.815460205078125, -3.585124969482422, 8.494735717773438], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000212.npy"}
{"epoch": 0.3204837490551776, "step": 213, "batch_size": 64, "mean": 5.6877593994140625, "std": 10.857646942138672, "min": -24.268333435058594, "p10": -6.167728614807128, "median": 5.560474395751953, "p90": 21.64547958374024, "max": 29.174407958984375, "pos_frac": 0.671875, "sample": [12.606468200683594, -3.1804122924804688, 6.219078063964844, 24.82568359375, 6.493854522705078, 3.438115119934082, -4.774688720703125, -2.715534210205078, 5.563690185546875, 20.065086364746094, 6.5589447021484375, 2.073781967163086, -3.7170867919921875, -1.928731918334961, -24.268333435058594, -7.416961669921875, 10.734687805175781, 14.438232421875, -3.4137191772460938, 7.430419921875, 7.873577117919922, -14.300247192382812, 28.743408203125, 26.73504638671875, 22.38201904296875, 5.677696228027344, -3.136983871459961, -6.343231201171875, -1.3780097961425781, 1.0651054382324219, -5.758222579956055, -7.244659423828125, -0.40979766845703125, 4.003944396972656, 10.560150146484375, -7.5460205078125, 2.503772735595703, 6.8927459716796875, 22.294105529785156, 5.61041259765625, 4.05657958984375, 8.513755798339844, 20.13201904296875, 1.2353191375732422, 20.055030822753906, 1.730499267578125, 5.557258605957031, 10.618461608886719, 28.481597900390625, -0.6624603271484375, -7.931510925292969, 6.230098724365234, -3.8852691650390625, 6.4211273193359375, -2.496795654296875, 2.8043060302734375, -5.68023681640625, 29.174407958984375, 17.473892211914062, 3.217803955078125, 8.6619873046875, 12.384765625, 14.404823303222656, 16.26177978515625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000213.npy"}
{"epoch": 0.3219954648526077, "step": 214, "batch_size": 64, "mean": 5.443133354187012, "std": 9.86967658996582, "min": -13.289215087890625, "p10": -8.002983474731446, "median": 4.311290740966797, "p90": 17.282386398315435, "max": 37.288330078125, "pos_frac": 0.765625, "sample": [7.410655975341797, 0.8664779663085938, 0.065948486328125, 0.5511207580566406, -0.5696296691894531, 11.443817138671875, 4.3583526611328125, 3.470479965209961, 5.030612945556641, 9.2635498046875, 19.652660369873047, 5.216684341430664, -0.5404472351074219, -13.289215087890625, 9.050140380859375, 12.652313232421875, 9.059545516967773, -8.526451110839844, 6.208381652832031, 13.869241714477539, 4.264228820800781, 10.700386047363281, 3.77850341796875, 0.2274932861328125, 2.8063125610351562, 15.769912719726562, 6.379119873046875, 3.531534194946289, 16.10391616821289, 8.486625671386719, 6.275505065917969, 0.20070266723632812, -8.053106307983398, -10.071952819824219, 5.9332122802734375, 18.49108123779297, -0.641326904296875, -3.9947967529296875, 23.411691665649414, 31.541412353515625, 0.3518524169921875, -7.886030197143555, 9.602020263671875, 2.3537025451660156, 15.915904998779297, 37.288330078125, 0.5794525146484375, 1.2689361572265625, 5.508525848388672, -3.322214126586914, -0.7202911376953125, 17.787445068359375, 1.5660629272460938, -1.4674644470214844, 2.55426025390625, 7.632106781005859, 27.650604248046875, 7.391838073730469, -12.880783081054688, 5.1928253173828125, 15.923362731933594, 2.77252197265625, -8.682735443115234, -8.404403686523438], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000214.npy"}
{"epoch": 0.3235071806500378, "step": 215, "batch_size": 64, "mean": 7.5287065505981445, "std": 10.125297546386719, "min": -10.606826782226562, "p10": -4.924847030639649, "median": 6.9828033447265625, "p90": 22.474487304687504, "max": 31.408950805664062, "pos_frac": 0.71875, "sample": [2.9479293823242188, -5.528694152832031, -1.5570297241210938, 9.610939025878906, 3.7075424194335938, -0.04897308349609375, 3.1151046752929688, -2.3033599853515625, 25.50091552734375, 22.67102813720703, -4.837493896484375, -3.3386459350585938, -8.066078186035156, 7.947208404541016, 9.479606628417969, 19.774253845214844, 4.551422119140625, 11.757118225097656, 12.243545532226562, 3.2076873779296875, 7.5434417724609375, 6.906646728515625, -2.5433082580566406, 9.535888671875, 20.18978500366211, 6.610803604125977, -10.606826782226562, 19.291709899902344, 10.079238891601562, -4.962284088134766, 2.1734161376953125, 16.362693786621094, 1.641082763671875, -1.975494384765625, -7.7860107421875, -10.036392211914062, 12.557861328125, 7.139793395996094, 14.35418701171875, 16.030364990234375, 4.330018997192383, 2.7728958129882812, 6.344482421875, -0.22660064697265625, 25.083816528320312, 31.408950805664062, 18.60393524169922, 24.51521873474121, 10.037921905517578, 11.254989624023438, 22.015892028808594, 10.095260620117188, 9.499465942382812, -6.747104644775391, -1.274282455444336, 4.827381134033203, 9.769775390625, 29.70574951171875, 23.11431884765625, -0.9212589263916016, 18.771034240722656, 0.2574501037597656, 7.0589599609375, -1.8016853332519531], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000215.npy"}
{"epoch": 0.3250188964474679, "step": 216, "batch_size": 64, "mean": 5.879211902618408, "std": 9.723938941955566, "min": -19.59429931640625, "p10": -6.6035114288330075, "median": 6.246061325073242, "p90": 17.634355926513674, "max": 27.06877899169922, "pos_frac": 0.734375, "sample": [2.2179946899414062, 17.468482971191406, 3.5543212890625, 13.220123291015625, 12.008304595947266, 4.36138916015625, 10.444686889648438, 8.706581115722656, 14.593299865722656, 17.902774810791016, 6.8472747802734375, 7.9198455810546875, 12.079017639160156, 2.888641357421875, 2.6727066040039062, -5.365959167480469, 6.346004486083984, 8.783683776855469, -10.747398376464844, 20.097671508789062, 12.415573120117188, 20.391586303710938, 6.1461181640625, 4.6280364990234375, 12.990673065185547, -6.666942596435547, 10.473411560058594, 0.12018585205078125, 7.229852676391602, 17.7054443359375, 11.864578247070312, -19.59429931640625, 1.9447784423828125, 7.998222351074219, 6.6551513671875, 4.25787353515625, -4.465507507324219, -1.430490493774414, 16.03277587890625, -8.211585998535156, -1.8601150512695312, 15.141986846923828, 27.06877899169922, -5.693763732910156, -6.45550537109375, -4.38934326171875, 5.773490905761719, 0.872039794921875, -9.034183502197266, 5.0447998046875, 26.219043731689453, -6.736274719238281, -11.034072875976562, 0.6585464477539062, 26.866119384765625, -1.1963167190551758, 7.839534759521484, 10.29302978515625, 3.601837158203125, 14.039306640625, -6.073123931884766, 16.714706420898438, -0.39455413818359375, 12.518699645996094], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000216.npy"}
{"epoch": 0.32653061224489793, "step": 217, "batch_size": 64, "mean": 6.38028621673584, "std": 11.014670372009277, "min": -16.458602905273438, "p10": -6.833580017089842, "median": 5.3771514892578125, "p90": 21.7814998626709, "max": 41.36344909667969, "pos_frac": 0.765625, "sample": [5.2346649169921875, 21.34772491455078, -7.666252136230469, 1.5881996154785156, -2.391643524169922, -1.7422332763671875, 1.3205280303955078, 41.36344909667969, -2.57928466796875, -11.6597900390625, 20.055908203125, 23.112060546875, 6.0687255859375, 4.579689025878906, 11.434051513671875, 5.6782073974609375, 7.2288970947265625, 6.408740997314453, 5.75311279296875, 3.1780853271484375, 0.7955055236816406, 21.967403411865234, 6.91156005859375, 8.39645767211914, 6.179069519042969, 17.720191955566406, -9.241172790527344, 4.27459716796875, -2.743457794189453, 17.55138397216797, 14.167736053466797, 2.8937835693359375, 11.347549438476562, -8.600120544433594, 1.0243759155273438, -8.709415435791016, 1.0909996032714844, 8.593353271484375, -1.0185623168945312, -11.209121704101562, -16.458602905273438, 22.474292755126953, -4.105659484863281, 0.438629150390625, 1.171905517578125, -4.890678405761719, 2.2578277587890625, 7.28851318359375, 5.5196380615234375, 30.308700561523438, 5.208587646484375, 14.255992889404297, 6.841209411621094, 0.7765674591064453, 28.027816772460938, 8.097076416015625, 31.32622528076172, 10.859664916992188, 3.0643539428710938, 12.985774993896484, 10.945564270019531, 15.277313232421875, 1.205657958984375, -4.2429962158203125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000217.npy"}
{"epoch": 0.328042328042328, "step": 218, "batch_size": 64, "mean": 4.195858001708984, "std": 10.920681953430176, "min": -19.129043579101562, "p10": -11.163957595825195, "median": 4.857635498046875, "p90": 19.836041641235365, "max": 28.90496826171875, "pos_frac": 0.6875, "sample": [-6.001731872558594, 0.1577320098876953, 8.729049682617188, 5.96875, 0.459686279296875, 4.557403564453125, 9.545967102050781, -3.826324462890625, 12.202133178710938, 8.988861083984375, 0.83502197265625, 3.5768890380859375, 15.410030364990234, -10.383995056152344, -11.248661041259766, -6.033164978027344, 9.897384643554688, 21.33673858642578, -19.129043579101562, 2.6138687133789062, 5.6592559814453125, 9.730819702148438, 0.5536880493164062, -1.1620922088623047, 14.215898513793945, 22.279022216796875, 13.541748046875, 25.987457275390625, 5.0981597900390625, 11.452224731445312, -2.7027769088745117, 8.392608642578125, -8.108413696289062, -11.312084197998047, -6.582759857177734, 4.6171112060546875, 0.845916748046875, 2.7523193359375, 0.46128082275390625, 9.724472045898438, 21.46997833251953, 5.75421142578125, 2.783449172973633, -5.27386474609375, 5.193107604980469, -13.245389938354492, 16.334415435791016, -2.6207046508789062, -10.966316223144531, -12.982025146484375, 11.398029327392578, -16.667064666748047, 13.697196960449219, -1.2576446533203125, 8.795989990234375, 12.334686279296875, 5.300270080566406, 28.90496826171875, -16.09790802001953, -1.5609397888183594, 7.566925048828125, 8.742744445800781, 22.930877685546875, 24.89948844909668], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000218.npy"}
{"epoch": 0.3295540438397581, "step": 219, "batch_size": 64, "mean": 5.7463812828063965, "std": 9.08888053894043, "min": -11.556732177734375, "p10": -4.366524887084961, "median": 3.622891426086426, "p90": 18.24415969848633, "max": 26.680038452148438, "pos_frac": 0.734375, "sample": [11.544258117675781, 21.655216217041016, 9.71075439453125, 14.595926284790039, 16.771316528320312, -1.8528900146484375, -2.1315231323242188, 3.4436893463134766, 2.821502685546875, -9.431877136230469, 1.3240032196044922, 26.680038452148438, 15.8077392578125, 1.40179443359375, -4.178993225097656, 2.3775634765625, 0.8885459899902344, 1.6499366760253906, 0.9024314880371094, -1.4027414321899414, -3.9446229934692383, -11.556732177734375, 0.7093734741210938, 3.2224197387695312, -3.243194580078125, 1.8443565368652344, 12.53509521484375, -0.33736228942871094, 22.28753662109375, 3.802093505859375, 14.118293762207031, 16.104900360107422, 9.097091674804688, 18.51166534423828, 17.619979858398438, 23.539840698242188, 3.2817230224609375, 5.968467712402344, -9.652908325195312, 4.998729705810547, 7.900463104248047, 6.905662536621094, 6.662788391113281, -9.472251892089844, 7.480621337890625, -6.350742340087891, 16.068132400512695, -1.986236572265625, 12.315879821777344, 21.226882934570312, -0.8023452758789062, 4.966682434082031, 10.654914855957031, 1.992767333984375, -10.409889221191406, 13.750679016113281, 7.628318786621094, 0.24668121337890625, 7.00311279296875, -4.446895599365234, 20.706045150756836, -0.513885498046875, 11.381141662597656, 3.376434326171875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000219.npy"}
{"epoch": 0.3310657596371882, "step": 220, "batch_size": 64, "mean": 5.052901268005371, "std": 9.742456436157227, "min": -14.142921447753906, "p10": -6.383942794799804, "median": 4.091811180114746, "p90": 16.919515228271486, "max": 27.683868408203125, "pos_frac": 0.65625, "sample": [-6.503337860107422, 3.7356414794921875, 5.1656341552734375, 3.5098228454589844, 5.165843963623047, -0.784271240234375, 15.254791259765625, 1.1296768188476562, -0.0032215118408203125, 3.0115203857421875, -0.22667694091796875, 27.683868408203125, 6.462894439697266, -10.574424743652344, 4.190265655517578, 11.461088180541992, 12.333969116210938, -11.244659423828125, -1.5358467102050781, 10.997665405273438, 13.785102844238281, 14.272510528564453, -4.075618743896484, -4.618389129638672, -6.105354309082031, -14.142921447753906, -0.00080108642578125, -3.0878829956054688, 20.36895751953125, 12.066009521484375, 16.653709411621094, 7.100807189941406, 11.073974609375, 8.320003509521484, 16.548519134521484, -1.1608772277832031, 17.033432006835938, 0.06107330322265625, 21.732248306274414, -6.791805267333984, 4.564353942871094, 4.189870834350586, 0.1593780517578125, 7.392473220825195, 27.1927490234375, -4.578765869140625, 22.86717987060547, -0.9213199615478516, -3.7663917541503906, 13.633331298828125, 2.7757720947265625, -10.33685302734375, 14.304708480834961, -2.7845897674560547, 3.9937515258789062, 7.4108734130859375, -12.091606140136719, 11.353713989257812, 10.231681823730469, 1.3502559661865234, 24.81439971923828, -4.526899337768555, 5.634929656982422, 2.2597522735595703], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000220.npy"}
{"epoch": 0.3325774754346183, "step": 221, "batch_size": 64, "mean": 5.583874702453613, "std": 10.934568405151367, "min": -20.218704223632812, "p10": -6.314083099365234, "median": 3.535247802734375, "p90": 22.71151905059815, "max": 33.885589599609375, "pos_frac": 0.65625, "sample": [8.605766296386719, -5.076900482177734, 8.578804016113281, 6.839111328125, -3.3681488037109375, 10.185699462890625, -0.4735832214355469, 5.73297119140625, 3.951915740966797, -6.4184417724609375, -5.195396423339844, -0.6233062744140625, 17.125900268554688, 15.153854370117188, -5.877174377441406, 0.8732395172119141, -1.2736358642578125, 0.057529449462890625, 2.3333492279052734, 2.319671630859375, 21.29157066345215, 23.530380249023438, 2.2014236450195312, 9.267868041992188, 23.320068359375, 1.3493232727050781, 12.632335662841797, 24.968021392822266, -2.0336456298828125, -9.609045028686523, 12.410263061523438, 20.531036376953125, -6.735301971435547, 4.0743255615234375, 24.484642028808594, 6.685062408447266, 8.946624755859375, -8.440010070800781, 12.511924743652344, 11.724365234375, 14.242408752441406, -1.4135360717773438, -1.4069137573242188, 4.091001510620117, 2.9943389892578125, -2.4861068725585938, -3.5438156127929688, 25.072509765625, -7.0566253662109375, -4.013507843017578, -20.218704223632812, -19.14478302001953, 3.118579864501953, 13.301361083984375, 15.478652954101562, -1.1747093200683594, 2.9231719970703125, 10.825675964355469, 33.885589599609375, 14.7894287109375, -6.070579528808594, 9.32550048828125, 24.529476165771484, 2.7571029663085938], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000221.npy"}
{"epoch": 0.3340891912320484, "step": 222, "batch_size": 64, "mean": 7.148991584777832, "std": 9.26502513885498, "min": -9.4921875, "p10": -3.7523656845092765, "median": 7.066184997558594, "p90": 18.893834495544436, "max": 30.51215934753418, "pos_frac": 0.703125, "sample": [-1.5373764038085938, -5.891357421875, 5.113430023193359, 1.202239990234375, 12.73675537109375, 12.487344741821289, 5.720661163330078, -1.5393486022949219, 17.981063842773438, 23.247493743896484, 12.158203125, 2.5707435607910156, -5.2797698974609375, 4.571647644042969, 9.893686294555664, 1.2354812622070312, -1.0790786743164062, 4.50262451171875, -0.25736236572265625, 4.146282196044922, 6.546501159667969, -0.5412445068359375, 22.500381469726562, -2.329864501953125, -0.27410125732421875, 21.314884185791016, 14.151016235351562, 29.87493896484375, -7.390045166015625, 5.621490478515625, 7.585868835449219, 9.977951049804688, 11.500297546386719, 18.17436408996582, 16.09356689453125, 12.25335693359375, 12.003284454345703, -9.465934753417969, -4.086275100708008, -2.6783905029296875, -2.9732437133789062, 17.477279663085938, 1.2815628051757812, 8.702812194824219, 2.346292495727539, 19.202178955078125, -1.374053955078125, -5.925971984863281, 7.636993408203125, 11.389274597167969, 9.078475952148438, 3.8294105529785156, 12.084190368652344, 30.51215934753418, -1.350860595703125, 8.70166015625, 20.928733825683594, 16.815048217773438, -2.4259185791015625, 15.87725830078125, 11.801239013671875, 11.765405654907227, 8.832290649414062, -9.4921875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000222.npy"}
{"epoch": 0.3356009070294785, "step": 223, "batch_size": 64, "mean": 7.442340850830078, "std": 9.640159606933594, "min": -16.85369110107422, "p10": -4.361658477783203, "median": 6.412261009216309, "p90": 19.213279724121094, "max": 32.4365234375, "pos_frac": 0.8125, "sample": [7.021886825561523, 13.829521179199219, 6.467231750488281, 0.2588958740234375, 0.6658554077148438, 12.507291793823242, -4.5487213134765625, 2.89422607421875, -8.13037109375, 15.584516525268555, 6.359617233276367, 32.4365234375, 8.709686279296875, 6.46490478515625, 1.4875335693359375, 3.48846435546875, 0.3213615417480469, 5.624542236328125, 8.754364013671875, -5.108528137207031, 19.129913330078125, 13.822479248046875, -9.226612091064453, 2.8056259155273438, 9.412376403808594, 1.9491157531738281, -1.6904335021972656, 19.98980712890625, -5.5402984619140625, -16.85369110107422, 9.18259048461914, 5.170724868774414, 0.3447418212890625, 2.16204833984375, 7.263496398925781, 18.423927307128906, 10.950202941894531, 16.01910400390625, -3.9251785278320312, 3.8884544372558594, -2.7546157836914062, 8.022552490234375, 4.584556579589844, 12.53948974609375, 24.5416259765625, 16.364097595214844, 16.541168212890625, 26.39504623413086, -0.570404052734375, 13.312870025634766, 29.33032989501953, -5.130363464355469, 19.249008178710938, 23.41106414794922, 13.246406555175781, 8.1517333984375, 15.171024322509766, 18.627811431884766, 2.123729705810547, -2.5351009368896484, 3.5728683471679688, 3.63800048828125, 5.741176605224609, 4.368522644042969], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000223.npy"}
{"epoch": 0.3371126228269085, "step": 224, "batch_size": 64, "mean": 5.134331226348877, "std": 8.727468490600586, "min": -12.846572875976562, "p10": -6.455252075195312, "median": 4.566547393798828, "p90": 17.40196342468262, "max": 28.38488006591797, "pos_frac": 0.75, "sample": [2.2997474670410156, 28.38488006591797, 0.4243640899658203, 1.8400421142578125, 2.489013671875, 8.281776428222656, 12.053489685058594, 6.6248016357421875, -1.1601600646972656, 4.473175048828125, 11.797836303710938, 9.42184066772461, 9.1063232421875, 18.46672821044922, 5.796415328979492, -0.5915107727050781, 7.937273025512695, 11.025081634521484, 8.99700927734375, 8.419448852539062, -4.452728271484375, 9.408203125, 3.064239501953125, -11.606708526611328, -6.778350830078125, -7.259941101074219, -0.8023300170898438, 4.625038146972656, 2.9083251953125, -12.525161743164062, 20.426136016845703, 0.05470085144042969, -7.328998565673828, 5.0624542236328125, 4.861534118652344, -0.33617401123046875, -2.945423126220703, -12.846572875976562, 5.2630615234375, 6.408111572265625, 2.7818069458007812, 2.0295143127441406, 13.494129180908203, 0.2799110412597656, -9.3369140625, 3.2124481201171875, 17.664066314697266, 13.454383850097656, 7.861137390136719, 3.025226593017578, -5.70135498046875, 7.342689514160156, 10.419727325439453, 16.790390014648438, 27.091773986816406, 11.277023315429688, 1.5713882446289062, 8.109893798828125, -1.7428054809570312, 22.807292938232422, -1.1203460693359375, 17.85302734375, 4.508056640625, 3.6377410888671875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000224.npy"}
{"epoch": 0.3386243386243386, "step": 225, "batch_size": 64, "mean": 6.576930046081543, "std": 11.66900634765625, "min": -27.940673828125, "p10": -5.227771949768066, "median": 3.6286048889160156, "p90": 20.27227039337158, "max": 42.851226806640625, "pos_frac": 0.71875, "sample": [-3.5165672302246094, 6.162578582763672, -1.0070266723632812, 3.0392608642578125, 12.77313232421875, 20.083343505859375, 1.40380859375, -4.779439926147461, -0.7792167663574219, 19.581790924072266, 0.7628936767578125, 16.276809692382812, 0.10812759399414062, 14.91339111328125, 0.16866302490234375, 2.5823135375976562, -2.5326919555664062, 0.08345794677734375, 10.254878997802734, 20.579185485839844, 20.353239059448242, 9.799489974975586, 42.851226806640625, 6.563804626464844, 31.353240966796875, 3.3281097412109375, 16.547637939453125, 12.285381317138672, 14.115242004394531, -3.4663867950439453, 3.7125091552734375, -27.940673828125, 12.425750732421875, -5.419914245605469, 19.45616912841797, 12.051727294921875, 4.035547256469727, -7.3489990234375, 16.99530029296875, 1.9194297790527344, 6.0462646484375, 3.5447006225585938, 27.462608337402344, -4.467987060546875, 10.338630676269531, -8.588287353515625, -9.0936279296875, -2.1419410705566406, 19.12500762939453, 0.1514892578125, 7.465263366699219, 3.2613143920898438, 8.100196838378906, 18.330429077148438, 23.819488525390625, -9.256402969360352, -0.7287750244140625, 3.5173110961914062, 23.16753387451172, 0.0978851318359375, -2.899871826171875, -10.121917724609375, 16.90771484375, -2.8900070190429688], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000225.npy"}
{"epoch": 0.3401360544217687, "step": 226, "batch_size": 64, "mean": 6.0893354415893555, "std": 10.224908828735352, "min": -20.22496795654297, "p10": -4.867883682250977, "median": 6.295965194702148, "p90": 19.155423355102545, "max": 28.01111602783203, "pos_frac": 0.75, "sample": [9.062026977539062, 0.23126220703125, -0.9551315307617188, 9.23958969116211, 7.8978424072265625, 5.00909423828125, 23.972381591796875, 11.064162254333496, 2.897258758544922, -12.542022705078125, 15.446507453918457, 3.2332801818847656, 13.075126647949219, 19.889423370361328, -0.3000450134277344, -3.4194412231445312, 6.77430534362793, -2.1742019653320312, 16.47322654724121, 7.463783264160156, 21.782817840576172, 1.2496337890625, 2.8325271606445312, 9.907943725585938, 6.290626525878906, 6.301303863525391, 4.062091827392578, 27.18651580810547, 17.019134521484375, 14.5098876953125, -5.588359832763672, 24.470291137695312, 2.5346450805664062, -0.34336090087890625, 16.374889373779297, 5.311614990234375, 3.2127113342285156, -18.852218627929688, -4.855747222900391, 0.4926624298095703, -0.795745849609375, 6.97076416015625, -4.873085021972656, -8.895271301269531, 10.790824890136719, 24.306961059570312, -20.22496795654297, 11.666763305664062, -10.936773300170898, 12.406089782714844, 6.42327880859375, 3.8692054748535156, 6.315887451171875, 6.743808746337891, 0.16046142578125, 28.01111602783203, 13.623870849609375, 16.092056274414062, 1.9605369567871094, -4.117841720581055, 17.44275665283203, -4.347078323364258, 0.5128326416015625, 10.372982025146484], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000226.npy"}
{"epoch": 0.3416477702191988, "step": 227, "batch_size": 64, "mean": 6.244579792022705, "std": 11.858757019042969, "min": -21.99779510498047, "p10": -6.351957321166992, "median": 4.7921600341796875, "p90": 23.868840789794923, "max": 31.223785400390625, "pos_frac": 0.671875, "sample": [24.130638122558594, 15.336380004882812, 0.46722412109375, 22.77184295654297, 1.5579071044921875, -19.825122833251953, 12.720329284667969, 19.256160736083984, -21.99779510498047, -2.8031978607177734, 24.297042846679688, -8.692970275878906, -2.6506004333496094, 25.192142486572266, 20.100200653076172, -5.0592041015625, -0.7019920349121094, -1.9924850463867188, 17.624393463134766, -4.442108154296875, 3.7836456298828125, 12.7115478515625, 7.213958740234375, 8.122087478637695, 26.60655975341797, 14.04656982421875, -6.465785980224609, 16.850746154785156, -3.635356903076172, -2.2501487731933594, -0.28916168212890625, 25.268646240234375, 23.257980346679688, 1.9720077514648438, 11.742385864257812, 12.170907020568848, 31.223785400390625, 0.12335968017578125, 2.320354461669922, 17.07275390625, 0.6804962158203125, 5.8006744384765625, -6.086357116699219, 7.2754669189453125, 13.348220825195312, 1.1445732116699219, 7.996543884277344, 6.315284729003906, -4.5608367919921875, 2.559751510620117, -14.116275787353516, 11.749008178710938, -1.0718841552734375, 24.187332153320312, 1.1316795349121094, -3.5554656982421875, 7.629539489746094, -0.13864517211914062, 14.30099105834961, 2.2226638793945312, -10.729499816894531, -13.409637451171875, 16.22243309020996, 13.621437072753906], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000227.npy"}
{"epoch": 0.3431594860166289, "step": 228, "batch_size": 64, "mean": 5.698853492736816, "std": 10.405200004577637, "min": -18.786029815673828, "p10": -6.35255241394043, "median": 5.910717010498047, "p90": 19.504127502441406, "max": 28.970439910888672, "pos_frac": 0.671875, "sample": [-2.4633541107177734, 24.777122497558594, 16.279144287109375, 26.207714080810547, 10.306739807128906, 10.283439636230469, 4.33549690246582, 20.034732818603516, 2.294769287109375, -2.8822097778320312, 11.90933609008789, 1.7277374267578125, 9.57489013671875, -18.786029815673828, 1.4206390380859375, 3.582670211791992, 13.61062240600586, 3.3595733642578125, -2.1094207763671875, 17.38241195678711, 5.901329040527344, 12.71246337890625, -2.9084434509277344, 2.6681594848632812, -6.396240234375, 14.321998596191406, 19.61174774169922, 28.970439910888672, 6.502140045166016, 11.781108856201172, -6.500940322875977, -8.416717529296875, 10.855178833007812, -9.147590637207031, 8.54322624206543, -15.51395034790039, 9.94647216796875, -1.0280342102050781, -13.128677368164062, -5.553506851196289, 7.0010986328125, 3.144012451171875, 12.623443603515625, -6.250614166259766, 5.92010498046875, 3.365011215209961, 17.131942749023438, 10.074481964111328, -4.418140411376953, 6.381683349609375, -4.156688690185547, -2.3257369995117188, 18.53875732421875, 6.6439666748046875, -3.8557281494140625, 8.1397705078125, 23.463165283203125, 24.353382110595703, 19.253013610839844, -4.970466613769531, -0.34002685546875, -0.925994873046875, 9.62521743774414, 2.24481201171875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000228.npy"}
{"epoch": 0.34467120181405897, "step": 229, "batch_size": 64, "mean": 6.6058149337768555, "std": 9.854999542236328, "min": -17.527130126953125, "p10": -4.528347778320311, "median": 6.840202331542969, "p90": 18.798546600341798, "max": 29.564849853515625, "pos_frac": 0.734375, "sample": [3.140350341796875, 9.556015014648438, 18.38751983642578, 10.613143920898438, 11.587158203125, 8.570037841796875, 22.01409339904785, 12.643247604370117, 14.130558013916016, 1.6547813415527344, 2.9618453979492188, 15.25921630859375, 16.7025146484375, -5.066535949707031, 9.5074462890625, 5.503793716430664, -2.79656982421875, -3.2725753784179688, 18.974700927734375, 10.20947265625, 4.148403167724609, 2.8349342346191406, 3.264301300048828, 13.601181030273438, -11.290481567382812, 7.263315200805664, -11.87542724609375, -2.650519371032715, -1.642843246459961, 7.072212219238281, 10.738296508789062, 29.564849853515625, -12.403175354003906, 10.279987335205078, -0.48548126220703125, 1.909881591796875, 21.018447875976562, -6.982872009277344, -2.2168045043945312, 6.992279052734375, 26.796844482421875, 8.118539810180664, -0.1905956268310547, -5.951564788818359, 7.688575744628906, 4.055702209472656, 17.205547332763672, 9.086502075195312, 6.548408508300781, 9.59914779663086, 6.6881256103515625, -1.0034961700439453, 6.602046966552734, 1.331756591796875, 15.079757690429688, 8.594917297363281, -2.2810440063476562, 1.7613697052001953, 23.74473762512207, 29.04931640625, 17.588638305664062, 3.4965362548828125, -2.731159210205078, -17.527130126953125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000229.npy"}
{"epoch": 0.34618291761148906, "step": 230, "batch_size": 64, "mean": 4.95400333404541, "std": 10.030776977539062, "min": -18.72406768798828, "p10": -9.744288635253906, "median": 4.5602521896362305, "p90": 19.50172538757324, "max": 24.314956665039062, "pos_frac": 0.703125, "sample": [-8.228408813476562, 6.04486083984375, 6.114545822143555, 0.9695892333984375, 10.918434143066406, -15.363128662109375, 3.5347442626953125, -0.20917510986328125, -12.19439697265625, 16.463966369628906, 4.6485748291015625, 19.36156463623047, 19.57439422607422, 9.822525024414062, -1.3639907836914062, 5.234287261962891, 13.04075813293457, 20.022445678710938, -12.320915222167969, -2.719512939453125, -18.72406768798828, 10.036376953125, 0.442596435546875, 6.2133941650390625, 10.210151672363281, -2.5831871032714844, 13.864646911621094, 0.09957122802734375, 19.743473052978516, 13.997673034667969, 11.307378768920898, -1.7159194946289062, -0.5800876617431641, 2.822232246398926, -1.6302719116210938, -10.480857849121094, -0.6469535827636719, 4.177635192871094, 22.7696533203125, 14.137260437011719, 8.459918975830078, 6.8170928955078125, -1.0861053466796875, 4.716194152832031, 24.07170867919922, 7.768642425537109, -14.872051239013672, 19.265560150146484, 2.2726287841796875, -4.939825057983398, 15.148300170898438, -10.393951416015625, 11.177818298339844, 10.506208419799805, 3.1790847778320312, 19.56179428100586, 2.3358421325683594, 1.3965606689453125, 4.471929550170898, 0.432403564453125, 0.2814483642578125, 24.314956665039062, -3.471820831298828, 8.830001831054688], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000230.npy"}
{"epoch": 0.3476946334089191, "step": 231, "batch_size": 64, "mean": 7.529757976531982, "std": 10.813018798828125, "min": -14.642057418823242, "p10": -4.702381896972656, "median": 5.448789596557617, "p90": 23.75874366760254, "max": 28.972015380859375, "pos_frac": 0.78125, "sample": [21.871971130371094, -3.4913330078125, 21.19287109375, 2.7512054443359375, 1.1637420654296875, 3.58404541015625, 0.35869598388671875, 20.861740112304688, 4.056793212890625, -0.8554487228393555, 5.991542816162109, 1.361663818359375, 6.318046569824219, 4.394683837890625, -10.469863891601562, 4.227897644042969, -0.882843017578125, 21.203983306884766, 19.70751953125, -3.1733245849609375, 6.971122741699219, -9.491180419921875, 8.786128997802734, 8.30483627319336, 1.8971672058105469, 24.612716674804688, -4.4972076416015625, -14.642057418823242, 25.270790100097656, 11.842582702636719, 24.046897888183594, -4.04437255859375, 0.835113525390625, 25.996551513671875, 10.446117401123047, -7.47845458984375, 3.204604148864746, 14.570755004882812, 7.24322509765625, -4.790313720703125, 19.838077545166016, 0.12450408935546875, 1.4441986083984375, -5.358001708984375, 13.768302917480469, 0.39609336853027344, 3.4858169555664062, -2.82525634765625, 11.195671081542969, 10.374794006347656, 11.076217651367188, 24.195907592773438, 4.686920166015625, 13.070304870605469, 20.938167572021484, 28.972015380859375, 1.6027069091796875, 12.998001098632812, 6.2196044921875, 17.566131591796875, -14.576416015625, 4.906036376953125, 23.086383819580078, 25.459733963012695], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000231.npy"}
{"epoch": 0.3492063492063492, "step": 232, "batch_size": 64, "mean": 4.994921684265137, "std": 11.612079620361328, "min": -18.829803466796875, "p10": -7.031211090087891, "median": 2.86944580078125, "p90": 20.533802032470707, "max": 33.20947265625, "pos_frac": 0.640625, "sample": [-0.924285888671875, 21.63271713256836, 10.639045715332031, 32.81739807128906, 2.845691680908203, -12.683502197265625, -3.396087646484375, 21.336647033691406, -7.038230895996094, 2.26007080078125, -7.01483154296875, 2.893199920654297, -15.666488647460938, 0.4264068603515625, -3.617961883544922, 17.279708862304688, -2.3097972869873047, -1.3971061706542969, 2.0389785766601562, 6.884532928466797, 0.40782928466796875, 9.4306640625, 4.028717041015625, 12.423873901367188, 9.729225158691406, 4.715799331665039, -6.275417327880859, 21.01819610595703, -6.483512878417969, 17.5706787109375, 2.2722625732421875, 18.008270263671875, -7.973670959472656, -0.8743782043457031, 0.49420166015625, 0.67095947265625, -0.8716087341308594, -3.4730758666992188, 29.2408447265625, 5.022651672363281, -0.0492706298828125, 11.698211669921875, 7.278678894042969, 33.20947265625, 19.403549194335938, 31.4140625, -15.228290557861328, 2.633514404296875, -3.0612525939941406, 3.3118667602539062, 18.969635009765625, 5.321758270263672, 16.36492919921875, 5.984905242919922, 9.175697326660156, -16.872520446777344, -1.1016483306884766, 5.902790069580078, -0.3938179016113281, 5.078498840332031, -0.781524658203125, -18.829803466796875, 15.433082580566406, 8.723844528198242], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000232.npy"}
{"epoch": 0.3507180650037793, "step": 233, "batch_size": 64, "mean": 5.419363498687744, "std": 10.9344482421875, "min": -15.271697998046875, "p10": -7.9686431884765625, "median": 4.152468681335449, "p90": 21.358055114746097, "max": 31.936981201171875, "pos_frac": 0.625, "sample": [31.936981201171875, 13.939689636230469, 18.32343292236328, -4.346330642700195, 5.802440643310547, 9.48446273803711, -2.995086669921875, 21.684555053710938, -1.3116073608398438, 8.701095581054688, 14.535030364990234, 27.571945190429688, -0.2675018310546875, -2.920074462890625, -2.2395706176757812, 23.277427673339844, 6.594633102416992, 0.35613250732421875, -2.128265380859375, -3.6101417541503906, 20.596221923828125, 4.589996337890625, 15.905693054199219, 2.757080078125, -13.098846435546875, 5.613029479980469, -9.025627136230469, 6.646980285644531, -8.137924194335938, -6.2769317626953125, 8.949363708496094, -0.9507865905761719, 12.538864135742188, 7.600162506103516, -0.51837158203125, -9.033435821533203, 10.599235534667969, 3.866579055786133, 4.438358306884766, 0.17158985137939453, 5.971122741699219, -15.271697998046875, 22.9039306640625, 1.98699951171875, 7.7552642822265625, -2.713237762451172, 4.859504699707031, -8.098190307617188, 20.252368927001953, 13.338628768920898, -2.7154617309570312, -6.786064147949219, 1.1819725036621094, 19.32366180419922, 26.484481811523438, 27.5369873046875, 2.411923408508301, -7.6663665771484375, 15.153945922851562, -9.047119140625, 1.1551933288574219, -2.43359375, -2.7878646850585938, 14.422393798828125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000233.npy"}
{"epoch": 0.35222978080120937, "step": 234, "batch_size": 64, "mean": 5.310544967651367, "std": 9.926766395568848, "min": -11.734619140625, "p10": -7.548867416381835, "median": 3.873478889465332, "p90": 21.022099494934086, "max": 25.16339874267578, "pos_frac": 0.65625, "sample": [-1.5984344482421875, -10.4129638671875, 8.056083679199219, -3.230682373046875, 1.9348697662353516, -3.2525863647460938, 17.62396240234375, -0.8592453002929688, 13.988845825195312, -3.92901611328125, 3.6566619873046875, 9.133031845092773, 8.691047668457031, 16.22772216796875, 7.547271728515625, -0.15701675415039062, -0.04515838623046875, -10.82254409790039, 14.112335205078125, -8.93269157409668, 4.625888824462891, 8.05042839050293, 13.928840637207031, 23.637344360351562, 2.4238834381103516, -1.2038116455078125, 24.008544921875, 4.081010818481445, -9.17559814453125, 1.0351600646972656, 13.265266418457031, 22.016300201416016, 22.447351455688477, 16.625843048095703, 5.244596481323242, -7.711086273193359, 4.10174560546875, -1.698944091796875, 2.1222915649414062, -1.4382209777832031, 7.420341491699219, 15.730682373046875, 2.0233154296875, 25.16339874267578, 19.732269287109375, 4.478189468383789, 0.8885688781738281, 9.203964233398438, 21.452674865722656, -8.729022979736328, -2.9716567993164062, 2.3966026306152344, -11.734619140625, 21.68804931640625, 1.6094493865966797, -6.642414093017578, 12.86767578125, 7.5341644287109375, -3.3332672119140625, 3.6659469604492188, -7.170356750488281, 20.017423629760742, 6.6062164306640625, -6.141078948974609], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000234.npy"}
{"epoch": 0.35374149659863946, "step": 235, "batch_size": 64, "mean": 6.812518119812012, "std": 9.057826042175293, "min": -14.131324768066406, "p10": -1.9885223388671873, "median": 5.296785354614258, "p90": 18.817884635925296, "max": 27.361888885498047, "pos_frac": 0.8125, "sample": [-1.0006294250488281, 7.882049560546875, 4.9789886474609375, 9.764595031738281, 16.720611572265625, -1.838409423828125, -3.2947921752929688, 8.471847534179688, 6.009368896484375, 4.2822265625, 2.2093124389648438, 15.554759979248047, 0.6910724639892578, 13.222419738769531, 10.491668701171875, 14.701757431030273, 5.035087585449219, 1.3954238891601562, 14.22671890258789, 5.733367919921875, 3.4579238891601562, 4.881450653076172, -11.150421142578125, 14.552295684814453, 17.155155181884766, 7.202430725097656, 0.4869537353515625, -6.674201965332031, 27.361888885498047, 1.1791973114013672, 9.373542785644531, 1.8983039855957031, 18.482519149780273, -0.856842041015625, 18.961612701416016, 2.4013118743896484, 13.306331634521484, 3.8722686767578125, 23.04345703125, -14.131324768066406, 4.076074600219727, 24.982948303222656, -2.0528564453125, 3.234039306640625, 14.006629943847656, 5.087650299072266, 1.9521484375, 8.216400146484375, 22.730636596679688, 5.7839202880859375, 4.6693878173828125, 3.318511962890625, 21.28729248046875, 13.495872497558594, 5.50592041015625, -12.293594360351562, 23.54936981201172, 8.879226684570312, 13.070823669433594, 0.8746261596679688, 8.61081314086914, -11.986228942871094, -0.8394584655761719, -0.20026779174804688], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000235.npy"}
{"epoch": 0.35525321239606955, "step": 236, "batch_size": 64, "mean": 5.982413291931152, "std": 11.80238151550293, "min": -23.772750854492188, "p10": -6.009896087646483, "median": 6.16999626159668, "p90": 21.13298263549805, "max": 28.573196411132812, "pos_frac": 0.65625, "sample": [7.098682403564453, -3.5943222045898438, 7.0446929931640625, 15.77301025390625, 5.295299530029297, 8.019134521484375, 16.520320892333984, 20.230606079101562, -1.782135009765625, 5.1898193359375, 11.754146575927734, 14.1812744140625, -2.092927932739258, 1.1726303100585938, -4.298061370849609, -0.9482269287109375, 13.714561462402344, 14.573348999023438, 25.664779663085938, 25.404041290283203, -0.7674617767333984, -3.81646728515625, 1.3279266357421875, -6.288063049316406, 18.374561309814453, 19.703201293945312, 15.26822280883789, 20.62921142578125, 21.34888458251953, 10.435874938964844, 23.585006713867188, 0.8268966674804688, -10.28741455078125, 12.216361999511719, 8.647018432617188, 0.4339752197265625, -5.36083984375, 27.246734619140625, -13.215538024902344, 11.484325408935547, -4.814329147338867, 4.349952697753906, 8.707176208496094, 18.192440032958984, 0.0178375244140625, -1.4270362854003906, -4.5888519287109375, -23.210792541503906, 9.352127075195312, 8.124069213867188, -3.1121978759765625, -23.772750854492188, -17.045425415039062, 20.317398071289062, 28.573196411132812, 10.744209289550781, 7.6849822998046875, -1.4551277160644531, -1.6948413848876953, -7.0705718994140625, 22.255054473876953, -1.7855072021484375, 0.19965744018554688, 3.6206817626953125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000236.npy"}
{"epoch": 0.35676492819349964, "step": 237, "batch_size": 64, "mean": 4.102872848510742, "std": 9.29997730255127, "min": -11.823936462402344, "p10": -7.371074676513671, "median": 3.5608749389648438, "p90": 16.295407485961917, "max": 34.07154846191406, "pos_frac": 0.65625, "sample": [16.76083755493164, 14.283721923828125, -5.2491302490234375, 25.30459213256836, 1.5279083251953125, 4.22137451171875, 4.123687744140625, 2.6859092712402344, 6.58685302734375, 6.696979522705078, 3.9809799194335938, -2.375030517578125, -9.25665283203125, 10.142753601074219, -0.4326133728027344, 19.6876220703125, 0.36029815673828125, 17.256912231445312, -11.56396484375, 4.266975402832031, -11.477584838867188, 9.726051330566406, -7.6248931884765625, 8.259193420410156, 19.784912109375, -4.028980255126953, 1.2006072998046875, 6.470733642578125, -8.305267333984375, 0.15273666381835938, 11.265663146972656, 5.442546844482422, 14.685760498046875, -10.446441650390625, 3.921985626220703, -4.040416717529297, 3.1997642517089844, -11.823936462402344, 20.332611083984375, -6.610736846923828, 1.7079277038574219, 4.599945068359375, 1.2625045776367188, -1.2229461669921875, -1.10736083984375, -2.6395225524902344, 10.63840103149414, 7.711395263671875, -2.44598388671875, 15.209403991699219, 12.457038879394531, 6.0930938720703125, 6.983242034912109, 34.07154846191406, -3.350250244140625, -0.4563751220703125, 2.0563392639160156, 10.345699310302734, 3.1562042236328125, 12.016010284423828, -1.4613265991210938, -6.778831481933594, -4.872325897216797, 9.515731811523438], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000237.npy"}
{"epoch": 0.35827664399092973, "step": 238, "batch_size": 64, "mean": 6.2172956466674805, "std": 9.62438678741455, "min": -16.05204200744629, "p10": -5.358917999267578, "median": 5.779550552368164, "p90": 18.58143768310547, "max": 28.579437255859375, "pos_frac": 0.734375, "sample": [-1.4128646850585938, 5.724384307861328, 20.561311721801758, -11.785240173339844, 6.7967376708984375, 27.23257064819336, -2.7300491333007812, 5.92578125, 26.597244262695312, -5.461589813232422, 0.635223388671875, -6.945404052734375, 5.834716796875, 3.9414100646972656, 15.60763168334961, 12.9989013671875, 0.6581077575683594, -5.119350433349609, 13.364356994628906, 14.815818786621094, 18.284683227539062, 28.579437255859375, 13.250381469726562, 6.768482208251953, 2.7588748931884766, 21.312969207763672, 7.923919677734375, -1.980438232421875, -2.546916961669922, 7.04486083984375, 23.344558715820312, 4.728391647338867, 4.231742858886719, -0.15293121337890625, -2.8533077239990234, 2.897287368774414, 16.3343505859375, 2.3614768981933594, 16.945106506347656, 18.7086181640625, 0.6522865295410156, -3.4266090393066406, 1.1495628356933594, -0.3062591552734375, 7.693626403808594, 8.946388244628906, 13.23468017578125, 16.972713470458984, 1.2798843383789062, 10.980621337890625, 11.148040771484375, 6.1063995361328125, 13.452659606933594, -16.05204200744629, -2.599367141723633, 5.659751892089844, -7.5369415283203125, 6.356529235839844, 7.533443450927734, 1.9305877685546875, 12.265289306640625, -5.658042907714844, -10.30194091796875, 3.2444000244140625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000238.npy"}
{"epoch": 0.35978835978835977, "step": 239, "batch_size": 64, "mean": 6.116840839385986, "std": 11.187093734741211, "min": -15.633453369140625, "p10": -9.006998825073241, "median": 4.609577178955078, "p90": 25.278382873535158, "max": 31.251068115234375, "pos_frac": 0.734375, "sample": [6.933628082275391, -1.0466766357421875, 15.087669372558594, 2.507598876953125, 25.544448852539062, 5.375633239746094, -9.39163589477539, 13.490730285644531, -9.56793212890625, 8.713394165039062, -15.633453369140625, -1.4325714111328125, -8.109512329101562, 29.075225830078125, -0.3703765869140625, 1.9824676513671875, -3.4318275451660156, -11.123287200927734, 1.521820068359375, 27.45941162109375, -0.4266815185546875, 12.796401977539062, 17.079452514648438, 4.3349761962890625, 25.697532653808594, 3.4516143798828125, 3.9885787963867188, 12.158802032470703, 2.2898521423339844, 7.1913604736328125, 24.657562255859375, 3.9243545532226562, -11.660873413085938, -1.4103050231933594, 0.039886474609375, 27.894287109375, 0.34506988525390625, 7.479766845703125, 0.298065185546875, 12.776046752929688, -3.8384647369384766, 9.111412048339844, 6.0143280029296875, 12.732879638671875, -6.2887725830078125, 15.968704223632812, 13.273048400878906, 31.251068115234375, 28.315475463867188, -15.580528259277344, -0.059459686279296875, 0.9593887329101562, -11.218734741210938, 4.993316650390625, 0.6248779296875, 4.884178161621094, 18.162250518798828, 1.4078006744384766, 6.8517913818359375, 14.890106201171875, 2.002918243408203, 9.270027160644531, 11.716796875, 5.5428924560546875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000239.npy"}
{"epoch": 0.36130007558578986, "step": 240, "batch_size": 64, "mean": 8.535426139831543, "std": 10.76880168914795, "min": -22.130950927734375, "p10": -3.545546340942382, "median": 9.189098358154297, "p90": 21.788418197631838, "max": 29.0736083984375, "pos_frac": 0.765625, "sample": [14.414958953857422, -6.1047210693359375, 14.595306396484375, -2.8990936279296875, 11.477546691894531, 28.115158081054688, -0.0237274169921875, 18.546142578125, -9.962322235107422, -3.8225975036621094, 9.425300598144531, 0.7112159729003906, 11.577308654785156, 21.922260284423828, 17.425643920898438, 17.49779510498047, 18.703227996826172, -7.2007293701171875, 13.71908187866211, 23.006446838378906, -1.597015380859375, 12.339164733886719, 3.413060188293457, -22.130950927734375, 13.819091796875, 29.0736083984375, 0.9757156372070312, 9.207275390625, 24.272354125976562, 17.471420288085938, 13.07861328125, 12.139450073242188, 4.534099578857422, -2.286041259765625, -2.242877960205078, 11.138557434082031, 1.9030036926269531, 13.995330810546875, 22.11772918701172, 3.557262420654297, 9.170921325683594, 5.991413116455078, -4.4570465087890625, 20.393768310546875, 28.784339904785156, -19.553363800048828, -1.3246421813964844, 4.652378082275391, 2.9738998413085938, 13.386192321777344, 3.365345001220703, 20.11579132080078, -0.5446510314941406, 8.517463684082031, 1.7160453796386719, 8.399116516113281, 17.419422149658203, 21.476119995117188, 7.2999725341796875, 7.785106658935547, 19.55451202392578, -0.02759552001953125, 3.1673355102539062, 12.102317810058594], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000240.npy"}
{"epoch": 0.36281179138321995, "step": 241, "batch_size": 64, "mean": 4.312898635864258, "std": 9.455873489379883, "min": -18.451210021972656, "p10": -6.125181198120116, "median": 2.226581573486328, "p90": 19.06888198852539, "max": 24.565048217773438, "pos_frac": 0.609375, "sample": [-4.152069091796875, -3.6318016052246094, 18.930587768554688, -8.765678405761719, 18.433303833007812, -5.146751403808594, -0.8253021240234375, 9.246376037597656, 4.1113433837890625, 9.952255249023438, -4.465789794921875, 22.261390686035156, -6.777435302734375, -0.7547683715820312, -2.1250839233398438, 16.262672424316406, 3.758106231689453, 4.376337051391602, 1.70916748046875, 19.128150939941406, 6.828216552734375, 10.960540771484375, -6.4690704345703125, 1.4486160278320312, 5.559242248535156, 0.4686717987060547, 9.117111206054688, 1.2475299835205078, -1.4006271362304688, -0.08643341064453125, 5.598457336425781, 8.51629638671875, 19.586837768554688, -0.9404220581054688, -2.710399627685547, -7.9317779541015625, -2.8208541870117188, 18.63482666015625, 2.7439956665039062, 5.1409454345703125, -18.451210021972656, 0.3243904113769531, -0.9532508850097656, 24.565048217773438, 12.132917404174805, 0.8388862609863281, -4.312164306640625, 21.0911865234375, 0.8225173950195312, -10.700088500976562, 10.06341552734375, 4.095710754394531, 5.703643798828125, 12.523468017578125, 19.188859939575195, -7.056205749511719, -3.4129486083984375, 13.770809173583984, 22.85931396484375, 4.630985260009766, -5.322772979736328, 13.445991516113281, -4.229898452758789, -0.6097850799560547], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000241.npy"}
{"epoch": 0.36432350718065004, "step": 242, "batch_size": 64, "mean": 7.401757717132568, "std": 10.379165649414062, "min": -16.809616088867188, "p10": -1.8858600616455075, "median": 5.875453948974609, "p90": 22.54939460754395, "max": 35.73585510253906, "pos_frac": 0.828125, "sample": [-6.632778167724609, 3.026683807373047, 11.22747802734375, 5.924579620361328, 15.0218505859375, -13.56600570678711, 22.80942153930664, 9.834762573242188, 21.222980499267578, 17.935394287109375, 6.505279541015625, 2.7280426025390625, -2.0267906188964844, 19.631065368652344, 29.864063262939453, -1.5570220947265625, 21.942665100097656, -1.4998931884765625, 4.293724060058594, 5.607524871826172, -0.7792892456054688, 3.3233795166015625, 23.84268569946289, 9.015968322753906, 2.6702728271484375, -5.628448486328125, 3.873260498046875, 7.256610870361328, 11.427654266357422, 9.861724853515625, 5.90447998046875, 0.3815193176269531, 35.73585510253906, 0.2814788818359375, 7.304584503173828, 2.568603515625, 2.9475574493408203, 2.9695568084716797, 27.844146728515625, 0.7073326110839844, -9.36767578125, 14.462944030761719, 5.4885101318359375, 1.1641693115234375, 20.981971740722656, 1.445770263671875, 6.762788772583008, 10.009368896484375, 9.224899291992188, 7.479421615600586, 26.837364196777344, 7.634197235107422, 24.847381591796875, 1.3874359130859375, -0.9700174331665039, 5.846427917480469, 9.53607177734375, 1.635793685913086, 5.134407043457031, -16.809616088867188, 16.248565673828125, 0.9176177978515625, -10.028900146484375, 10.041662216186523], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000242.npy"}
{"epoch": 0.36583522297808013, "step": 243, "batch_size": 64, "mean": 5.565812587738037, "std": 10.354945182800293, "min": -17.978515625, "p10": -8.17098236083984, "median": 4.253376007080078, "p90": 20.502125930786136, "max": 24.401134490966797, "pos_frac": 0.6875, "sample": [5.933925628662109, 21.99480438232422, -0.0563201904296875, -12.974723815917969, 24.401134490966797, 10.013065338134766, -11.090179443359375, 4.696041107177734, 14.371864318847656, -5.645530700683594, 18.663314819335938, 6.249122619628906, 1.90142822265625, 3.2633590698242188, 3.2494277954101562, 12.117118835449219, -1.2475452423095703, -10.402816772460938, 17.565540313720703, 5.09442138671875, -2.9107437133789062, 24.120086669921875, 8.79007339477539, -1.181401252746582, -3.01666259765625, 8.995071411132812, -3.2698287963867188, 16.666311264038086, 14.910675048828125, 11.388984680175781, -2.73626708984375, -9.253318786621094, 2.1739044189453125, -9.455581665039062, 10.15716552734375, 7.922935485839844, 23.344482421875, 23.815872192382812, -17.978515625, 9.755622863769531, 19.562747955322266, -1.2833938598632812, 8.864051818847656, 23.262001037597656, -4.0323638916015625, 2.91455078125, -4.194850921630859, 2.7909584045410156, 20.90471649169922, 16.25762939453125, 6.18536376953125, 4.156044006347656, 1.3192367553710938, 18.422027587890625, -4.120807647705078, 17.8258056640625, 3.2301368713378906, 4.134651184082031, 1.6135997772216797, -3.906158447265625, -12.559440612792969, 9.046945571899414, 1.13153076171875, 4.3507080078125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000243.npy"}
{"epoch": 0.3673469387755102, "step": 244, "batch_size": 64, "mean": 9.595220565795898, "std": 9.761358261108398, "min": -5.314716339111328, "p10": -1.7290782928466792, "median": 7.196907043457031, "p90": 26.343911743164067, "max": 32.37080383300781, "pos_frac": 0.84375, "sample": [11.210220336914062, 27.300643920898438, 6.011283874511719, 2.41436767578125, -5.1140289306640625, 6.141057968139648, -0.44312095642089844, 16.492448806762695, 3.217212677001953, -0.6635589599609375, -1.4004058837890625, 9.320381164550781, 5.4064483642578125, 19.467315673828125, 5.050296783447266, -4.76483154296875, 15.025749206542969, 26.832473754882812, -4.591209411621094, 12.938135147094727, 12.851364135742188, 14.353363037109375, -5.314716339111328, 17.51848602294922, 2.0879669189453125, 27.628997802734375, 10.619598388671875, 13.395538330078125, 30.830101013183594, 4.517021179199219, 28.086669921875, 7.062744140625, 2.7170581817626953, -3.8086776733398438, -3.058990478515625, 4.366010665893555, -1.8699378967285156, 18.37230682373047, 25.203933715820312, 32.37080383300781, 1.2668380737304688, 32.130523681640625, 0.7686977386474609, 3.5165061950683594, 3.0359039306640625, 18.386123657226562, 14.12567138671875, 8.949779510498047, 3.0408782958984375, 9.877204895019531, 6.648036956787109, 15.5238037109375, 3.2822628021240234, 11.944450378417969, 7.041709899902344, 20.460308074951172, 14.416301727294922, 4.65185546875, 13.94057846069336, 2.5391693115234375, 12.097343444824219, 2.602741241455078, 8.735855102539062, 7.3310699462890625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000244.npy"}
{"epoch": 0.3688586545729403, "step": 245, "batch_size": 64, "mean": 8.76123332977295, "std": 10.744494438171387, "min": -19.364593505859375, "p10": -3.233022308349609, "median": 6.323051452636719, "p90": 24.055764770507814, "max": 31.200611114501953, "pos_frac": 0.84375, "sample": [5.83770751953125, 8.889495849609375, 10.392799377441406, 28.192306518554688, 16.479183197021484, -3.3451309204101562, 1.4836807250976562, 16.223011016845703, 1.6765518188476562, -4.2594146728515625, 1.0107650756835938, 2.701812744140625, 22.451873779296875, 6.099559783935547, 18.024795532226562, 17.86337661743164, 0.5012989044189453, 22.748733520507812, -19.364593505859375, 31.200611114501953, 23.734764099121094, 4.2718658447265625, 6.8826904296875, 8.113521575927734, 20.751937866210938, 16.268386840820312, 9.476715087890625, -2.307140350341797, 15.067344665527344, 0.4947319030761719, 14.292854309082031, -10.257820129394531, 2.1937713623046875, 15.073627471923828, 9.348274230957031, 10.554916381835938, 3.4518585205078125, -6.062229156494141, 11.745468139648438, 30.346893310546875, 10.047744750976562, 1.7303466796875, 25.672264099121094, -0.14421844482421875, -4.3426513671875, 2.963221549987793, 26.11201286315918, 25.93065643310547, 4.877593994140625, -2.971435546875, 5.528778076171875, 9.711540222167969, 1.214996337890625, 5.9973602294921875, 24.193336486816406, 0.4320487976074219, -7.362701416015625, 22.982574462890625, 0.5721359252929688, 1.341094970703125, 22.407878875732422, 4.8177490234375, 4.2092132568359375, 6.546543121337891], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000245.npy"}
{"epoch": 0.37037037037037035, "step": 246, "batch_size": 64, "mean": 8.089115142822266, "std": 11.984938621520996, "min": -14.781417846679688, "p10": -6.280412101745605, "median": 8.286710739135742, "p90": 25.02798233032227, "max": 39.17041015625, "pos_frac": 0.71875, "sample": [17.180805206298828, 19.807558059692383, 30.034358978271484, 22.55127716064453, 19.705734252929688, 1.7748222351074219, -5.971685409545898, 4.2126007080078125, 3.2223968505859375, -6.412723541259766, 10.11578369140625, 18.202117919921875, 9.20977783203125, -0.6284904479980469, 23.826400756835938, 1.0914497375488281, 8.24673080444336, -8.422273635864258, -4.648506164550781, 0.1039581298828125, 10.787811279296875, 26.19900894165039, 0.5645751953125, 13.877023696899414, 33.475372314453125, 9.38568115234375, 15.262725830078125, 39.17041015625, -1.861785888671875, 7.423728942871094, 8.326690673828125, 13.144187927246094, 17.11365509033203, 9.246002197265625, -9.3990478515625, 11.303291320800781, -9.288375854492188, 8.679550170898438, 25.542945861816406, -4.6266937255859375, 15.647262573242188, -3.3011245727539062, -8.975959777832031, -4.6593475341796875, 9.338119506835938, -8.994873046875, 12.57208251953125, 16.595619201660156, 6.80316162109375, -14.781417846679688, -3.5035858154296875, 7.550821304321289, 15.627887725830078, 33.349395751953125, 2.6043472290039062, -3.1685333251953125, 3.318981170654297, 30.05743408203125, -2.645843505859375, 7.168304443359375, 1.6008453369140625, 8.582435607910156, 10.840221405029297, -1.4517135620117188], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000246.npy"}
{"epoch": 0.37188208616780044, "step": 247, "batch_size": 64, "mean": 4.3243513107299805, "std": 10.348287582397461, "min": -21.34491729736328, "p10": -6.098518371582031, "median": 3.104602813720703, "p90": 17.369541740417485, "max": 33.19354248046875, "pos_frac": 0.625, "sample": [-7.69586181640625, 15.743471145629883, 0.27117919921875, 1.8112449645996094, 9.289539337158203, -3.2856063842773438, -1.0509109497070312, 14.568634033203125, 28.392353057861328, -4.91973876953125, 10.558334350585938, 7.25433349609375, 20.475692749023438, 14.576351165771484, 1.6010589599609375, 19.5927734375, 4.809078216552734, 12.458343505859375, -1.6003265380859375, -2.278411865234375, 33.19354248046875, 25.95977783203125, -13.91037368774414, -4.488758087158203, 19.666397094726562, -6.0121612548828125, 6.642978668212891, 0.22093582153320312, 0.9966278076171875, 15.516044616699219, 14.833786010742188, 3.0231094360351562, 5.1347198486328125, 3.0286636352539062, 0.282470703125, -0.36119842529296875, 8.7020263671875, -21.34491729736328, -1.3492813110351562, -0.6725845336914062, -12.07156753540039, -0.6853904724121094, -3.828826904296875, 18.066429138183594, 5.991912841796875, 3.79144287109375, -6.135528564453125, -5.850276947021484, 8.367420196533203, 4.165748596191406, 3.1805419921875, 5.884326934814453, -0.9278564453125, 10.593215942382812, -13.301403045654297, 6.224132537841797, -0.9094600677490234, 4.915863037109375, -1.0248947143554688, -12.514434814453125, -3.693939208984375, 14.340782165527344, 9.313148498535156, 13.233755111694336], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000247.npy"}
{"epoch": 0.37339380196523053, "step": 248, "batch_size": 64, "mean": 6.709352493286133, "std": 12.218914985656738, "min": -17.073989868164062, "p10": -8.511989593505856, "median": 4.509781837463379, "p90": 22.83351707458496, "max": 34.994041442871094, "pos_frac": 0.734375, "sample": [3.274995803833008, 3.236297607421875, -4.39605712890625, -12.684417724609375, 18.691680908203125, 4.692710876464844, 18.91253662109375, 9.29611587524414, -0.18012619018554688, -1.5346641540527344, -1.174454689025879, 33.846309661865234, 5.227470397949219, -4.8724822998046875, 18.18035888671875, 8.756855010986328, 14.481643676757812, -17.073989868164062, -14.617752075195312, 5.844573974609375, -12.225570678710938, 0.11752510070800781, 12.528915405273438, 11.591638565063477, 15.781387329101562, 13.457927703857422, 6.547142028808594, 0.17666244506835938, 2.8930130004882812, 22.721832275390625, 0.7458343505859375, 27.647537231445312, 5.338632583618164, -12.938438415527344, 26.881614685058594, 22.504993438720703, -5.471473693847656, -2.5371932983398438, 4.034509658813477, 4.129081726074219, 12.245735168457031, 19.08602523803711, 4.389608383178711, 32.46685028076172, 19.497535705566406, 6.507274627685547, 2.530792236328125, 1.6685333251953125, 7.368610382080078, 16.3858642578125, 34.994041442871094, 4.629955291748047, 14.509963989257812, -3.3116226196289062, -5.574806213378906, 1.9152107238769531, -9.770782470703125, 22.88138198852539, 26.36077880859375, 3.4244766235351562, 0.44600677490234375, 1.8176631927490234, -13.358489990234375, -3.5452346801757812], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000248.npy"}
{"epoch": 0.3749055177626606, "step": 249, "batch_size": 64, "mean": 2.998680591583252, "std": 12.304089546203613, "min": -29.529098510742188, "p10": -14.15850372314453, "median": 4.688051223754883, "p90": 18.102752113342287, "max": 28.62480354309082, "pos_frac": 0.640625, "sample": [14.771072387695312, -6.849754333496094, 11.459541320800781, 11.212982177734375, 8.860780715942383, -10.54278564453125, -19.879432678222656, 20.452964782714844, 17.4530029296875, 8.596542358398438, 12.791284561157227, 3.063922882080078, 3.2070274353027344, 4.317222595214844, -14.4725341796875, -21.682281494140625, -2.9170455932617188, -21.701751708984375, -4.507041931152344, -13.221677780151367, 4.918972015380859, -1.2520599365234375, -13.42352294921875, 6.2946319580078125, 4.457130432128906, 6.429847717285156, -0.12623977661132812, -5.077476501464844, 18.52953338623047, -0.5643692016601562, -6.078361511230469, 16.267841339111328, 1.1330642700195312, -0.9443740844726562, 23.495223999023438, -17.811248779296875, 9.003761291503906, -9.493972778320312, 18.381216049194336, -13.425765991210938, 12.1153564453125, 28.62480354309082, 20.25836181640625, 18.998260498046875, 16.464141845703125, 9.80679702758789, -15.564834594726562, 6.824790954589844, 8.326911926269531, 7.9807586669921875, 3.9112110137939453, 9.329261779785156, 6.391143798828125, 0.1247100830078125, 11.834037780761719, 5.4124755859375, -7.247047424316406, 6.474754333496094, 16.12372589111328, -29.529098510742188, 2.868396759033203, -3.031513214111328, 0.7874755859375, 13.504798889160156], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000249.npy"}
{"epoch": 0.3764172335600907, "step": 250, "batch_size": 64, "mean": 8.984001159667969, "std": 12.617653846740723, "min": -35.68365478515625, "p10": -3.2414428710937493, "median": 7.867822647094727, "p90": 24.408189010620116, "max": 39.808563232421875, "pos_frac": 0.796875, "sample": [15.236595153808594, 1.5402641296386719, -1.797494888305664, 10.482723236083984, 1.1307830810546875, 17.91101837158203, 17.266380310058594, 21.904708862304688, 27.412479400634766, 1.6698837280273438, 24.17034912109375, -14.489860534667969, 9.453697204589844, 2.2242889404296875, 7.82843017578125, 12.267984390258789, -0.696441650390625, -35.68365478515625, 14.107185363769531, 5.209503173828125, 7.907215118408203, -7.75556755065918, 3.0539474487304688, -13.030717849731445, 11.391311645507812, 7.193511962890625, 27.505573272705078, 15.172279357910156, 10.874305725097656, 22.412803649902344, -0.3595237731933594, 8.120819091796875, 22.44509506225586, -5.509859085083008, 6.1514739990234375, 6.109291076660156, 6.3713531494140625, 16.331043243408203, 37.233673095703125, 3.1282196044921875, 9.636493682861328, -2.654449462890625, 21.79198455810547, -2.038238525390625, 27.491806030273438, 6.050870895385742, 36.021331787109375, 19.7681827545166, 6.479106903076172, 24.510120391845703, 1.298858642578125, 8.9404296875, 3.4710693359375, 8.22292709350586, -3.493011474609375, 9.326431274414062, 39.808563232421875, 13.369224548339844, 3.886138916015625, -0.4259357452392578, -3.6460189819335938, 17.11438751220703, 3.0510597229003906, 5.099697113037109], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000250.npy"}
{"epoch": 0.3779289493575208, "step": 251, "batch_size": 64, "mean": 4.860921859741211, "std": 10.900506973266602, "min": -14.417594909667969, "p10": -10.576417541503906, "median": 4.059737205505371, "p90": 17.02690353393555, "max": 35.03345489501953, "pos_frac": 0.65625, "sample": [2.5405101776123047, 22.606300354003906, 35.03345489501953, 0.7916450500488281, 4.101615905761719, 5.835395812988281, -13.647048950195312, -0.3319816589355469, -0.02506256103515625, 32.52068328857422, -1.5706024169921875, -14.417594909667969, 13.635635375976562, -3.0426406860351562, -2.247386932373047, -12.866628646850586, 1.7723236083984375, 13.9627685546875, -4.22845458984375, -11.54425048828125, 0.78704833984375, 32.282745361328125, 1.999725341796875, -11.507598876953125, 10.383922576904297, -2.1122970581054688, 17.394371032714844, 15.82562255859375, -10.817169189453125, 2.271942138671875, 12.862060546875, 6.8822021484375, -0.8732070922851562, 4.395861625671387, 9.498359680175781, 7.687248229980469, 16.169479370117188, 9.015907287597656, -7.917198181152344, -4.555359840393066, 7.892551422119141, 4.35369873046875, 12.881866455078125, 8.49338150024414, -1.10443115234375, 3.8641815185546875, 7.3086090087890625, 15.722175598144531, 6.846954345703125, 22.4329833984375, 12.441543579101562, 1.41815185546875, 11.940399169921875, -10.014663696289062, 10.977851867675781, -2.6986007690429688, 4.017858505249023, -13.122817993164062, 9.859737396240234, -2.0187454223632812, -4.137378692626953, 1.122833251953125, 17.72265625, 6.3458709716796875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000251.npy"}
{"epoch": 0.3794406651549509, "step": 252, "batch_size": 64, "mean": 5.119195938110352, "std": 10.91690444946289, "min": -27.708099365234375, "p10": -9.012529277801509, "median": 4.829566955566406, "p90": 20.760580825805672, "max": 27.54681396484375, "pos_frac": 0.75, "sample": [6.588104248046875, -20.402359008789062, -0.8495750427246094, -14.726959228515625, 24.194847106933594, 4.608451843261719, 3.3556442260742188, 18.7156982421875, 6.001949310302734, 18.088424682617188, 22.37096405029297, 8.428884506225586, 4.968498229980469, 5.751625061035156, 24.548091888427734, 1.5238113403320312, 22.571258544921875, 1.8216438293457031, 4.48797607421875, 0.8305997848510742, -1.46746826171875, 11.505481719970703, 15.4915771484375, 10.678369522094727, 13.48944091796875, 0.602630615234375, -1.7061691284179688, -17.532936096191406, 9.684173583984375, 6.008113861083984, 6.052003860473633, 10.855888366699219, 16.005008697509766, 3.49102783203125, 9.68661117553711, 1.8563003540039062, 0.41363525390625, 9.015769958496094, 5.012474060058594, -0.04390716552734375, 6.732337951660156, -2.180591583251953, -1.331573486328125, 8.059335708618164, 2.0628738403320312, 27.54681396484375, -1.1315841674804688, -0.05097198486328125, -27.708099365234375, -11.376914978027344, -4.222521781921387, 5.946990966796875, 0.5435791015625, -11.065389633178711, 4.124732971191406, 8.472625732421875, 7.0387725830078125, 26.90441131591797, 4.690635681152344, 21.636959075927734, 18.063232421875, 3.0270156860351562, -12.03071403503418, 1.9009933471679688], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000252.npy"}
{"epoch": 0.38095238095238093, "step": 253, "batch_size": 64, "mean": 5.03292179107666, "std": 10.492225646972656, "min": -13.40049934387207, "p10": -6.438712310791014, "median": 2.8419017791748047, "p90": 20.52332763671876, "max": 33.23353576660156, "pos_frac": 0.671875, "sample": [8.835502624511719, -4.4085845947265625, 30.920242309570312, 4.964042663574219, 2.8764305114746094, 3.028942108154297, 1.0817031860351562, 16.74416160583496, 17.58495330810547, -10.974903106689453, 3.4743881225585938, 0.8395919799804688, 24.842090606689453, -1.2627010345458984, 2.3659820556640625, -5.18389892578125, -10.506473541259766, 16.461990356445312, -3.0562515258789062, 3.0203018188476562, 1.586761474609375, 6.704708099365234, -3.8606033325195312, 13.6099853515625, 7.090934753417969, -2.107421875, 22.86890411376953, 9.107833862304688, 4.7488861083984375, 14.120071411132812, 1.1974945068359375, -0.4129066467285156, 7.22725772857666, 2.3240432739257812, 2.807373046875, 33.23353576660156, -2.9785003662109375, -3.774200439453125, 11.539291381835938, 8.781532287597656, 4.786079406738281, -7.211330413818359, 2.529296875, -13.40049934387207, 9.310279846191406, -6.9551544189453125, -7.074338912963867, 5.574462890625, 29.44724464416504, 0.6588382720947266, 12.564407348632812, 12.343742370605469, 30.542022705078125, 2.803680419921875, 7.091098785400391, -1.0833816528320312, -7.4625244140625, -4.931169509887695, 21.782630920410156, -5.233680725097656, -0.67169189453125, -3.5144577026367188, 4.359363555908203, 0.3895721435546875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000253.npy"}
{"epoch": 0.382464096749811, "step": 254, "batch_size": 64, "mean": 6.14630126953125, "std": 10.701043128967285, "min": -15.696823120117188, "p10": -6.242104339599608, "median": 4.511955261230469, "p90": 20.320073509216314, "max": 32.61957931518555, "pos_frac": 0.640625, "sample": [-9.539970397949219, 1.1518096923828125, 16.331497192382812, -2.39093017578125, 4.3007659912109375, 16.568389892578125, 23.634746551513672, 10.1761474609375, 2.806194305419922, -0.099884033203125, -0.7682113647460938, 16.81029510498047, 10.646682739257812, 10.295042037963867, 6.4103851318359375, 13.351573944091797, 28.94134521484375, 15.596084594726562, -2.8524322509765625, 7.0225830078125, -3.2760467529296875, 32.108726501464844, -2.8658599853515625, 20.938575744628906, -2.3598785400390625, -9.859207153320312, 12.230697631835938, 18.876901626586914, 21.186233520507812, -7.070396423339844, 32.61957931518555, -2.0432891845703125, 0.24776458740234375, -0.5406723022460938, -15.696823120117188, 11.834823608398438, -3.397754669189453, 16.680683135986328, -0.2386188507080078, 12.014678955078125, 8.690704345703125, -3.7730445861816406, 8.343330383300781, -8.769439697265625, -2.900493621826172, 27.73577880859375, -1.2799949645996094, 7.195182800292969, 12.233047485351562, 1.4088420867919922, -7.161590576171875, -4.488639831542969, 2.6276779174804688, 12.829803466796875, 18.799285888671875, 6.14666748046875, 3.17584228515625, 3.6992454528808594, 7.9677581787109375, 2.237060546875, 5.906362533569336, 4.72314453125, -4.9843292236328125, -6.781150817871094], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000254.npy"}
{"epoch": 0.3839758125472411, "step": 255, "batch_size": 64, "mean": 6.781064510345459, "std": 11.247841835021973, "min": -20.498123168945312, "p10": -7.559880065917968, "median": 7.754095077514648, "p90": 19.89926223754883, "max": 35.03717803955078, "pos_frac": 0.78125, "sample": [17.830896377563477, -20.498123168945312, 22.77521514892578, 20.12249755859375, 17.48815155029297, 17.280303955078125, 0.22919464111328125, -1.5457649230957031, 1.8454399108886719, 14.29620361328125, 0.8301467895507812, -3.4537506103515625, 11.206695556640625, 16.639022827148438, -15.801643371582031, 4.022056579589844, 9.974006652832031, 10.265426635742188, 4.326515197753906, -19.684585571289062, 19.378379821777344, 9.676605224609375, 7.6824188232421875, 4.630409240722656, 6.676361083984375, 6.0531463623046875, -8.115280151367188, 1.8937721252441406, 26.788223266601562, 2.6631698608398438, 16.654953002929688, 10.645042419433594, -4.218849182128906, 4.7789154052734375, 6.14715576171875, 7.854022979736328, 8.79833984375, 9.84442138671875, 25.489856719970703, 12.149982452392578, 3.1882667541503906, 9.971359252929688, 12.969802856445312, -5.6820526123046875, 4.353870391845703, -6.237133026123047, 10.079727172851562, 35.03717803955078, -9.660591125488281, 16.827987670898438, 20.754711151123047, 8.970085144042969, 4.299079895019531, 18.744400024414062, 7.825771331787109, 9.043190002441406, 6.095439910888672, 21.55289077758789, 14.110870361328125, -18.858295440673828, -6.263946533203125, -1.204803466796875, 5.400592803955078, -10.949195861816406], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000255.npy"}
{"epoch": 0.3854875283446712, "step": 256, "batch_size": 64, "mean": 6.714789390563965, "std": 11.731410026550293, "min": -15.788162231445312, "p10": -5.5675249099731445, "median": 4.279270172119141, "p90": 22.747369384765626, "max": 30.885513305664062, "pos_frac": 0.71875, "sample": [-1.8668136596679688, 0.31005859375, 8.070701599121094, 16.53668212890625, 21.938583374023438, 1.603118896484375, -15.50307846069336, 21.752655029296875, 0.20059585571289062, -1.5812568664550781, -15.07828140258789, 4.495458602905273, -5.7425537109375, 22.13141632080078, -3.0936431884765625, 30.885513305664062, 8.977943420410156, 4.74383544921875, -9.84566879272461, 2.5032196044921875, -8.324832916259766, 23.393478393554688, 3.737751007080078, 7.16680908203125, -4.515594482421875, 1.6667556762695312, 10.556390762329102, -3.1740188598632812, 4.360908508300781, 7.542949676513672, -3.2479019165039062, 16.452457427978516, 4.1976318359375, 20.807085037231445, 3.8931961059570312, 17.70086669921875, 21.89966583251953, 18.220256805419922, 1.508514404296875, 25.702449798583984, -0.0830078125, 1.3017711639404297, 28.010284423828125, -15.788162231445312, 26.437461853027344, 7.978359222412109, 19.014678955078125, -0.0645294189453125, 1.9467811584472656, 22.96173095703125, -0.9832229614257812, 19.579444885253906, 7.113372802734375, 5.597904205322266, 12.050003051757812, -15.542423248291016, 22.2471923828125, -2.6692657470703125, 1.4040565490722656, 25.433151245117188, -5.159124374389648, 7.248790740966797, 0.054962158203125, 0.6730384826660156], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000256.npy"}
{"epoch": 0.3869992441421013, "step": 257, "batch_size": 64, "mean": 6.60753059387207, "std": 12.558882713317871, "min": -27.23940086364746, "p10": -7.555035591125487, "median": 3.8622589111328125, "p90": 24.440256500244143, "max": 36.477359771728516, "pos_frac": 0.6875, "sample": [0.4356985092163086, 1.6997222900390625, -0.437225341796875, 24.50262451171875, 30.308349609375, 10.140804290771484, -2.1488609313964844, 23.63512420654297, 8.206756591796875, 26.46635627746582, 20.957420349121094, 3.4831008911132812, 9.794597625732422, -0.44121551513671875, 22.906646728515625, 18.956100463867188, 26.315841674804688, 3.5414810180664062, -13.149490356445312, 0.37194061279296875, -0.5752220153808594, 3.6741104125976562, -3.8719482421875, -8.04139518737793, 20.139328002929688, -14.568145751953125, -0.9225120544433594, 8.326652526855469, 9.184799194335938, -4.535125732421875, 8.659873962402344, 24.29473114013672, 0.71710205078125, -4.5567474365234375, 14.225959777832031, 4.374275207519531, -5.386474609375, 15.370246887207031, 23.68262481689453, -13.125762939453125, -27.23940086364746, 27.11932373046875, 10.118587493896484, 7.93341064453125, -6.420196533203125, -8.113210678100586, 3.9406890869140625, -0.7701339721679688, 9.996444702148438, 5.942108154296875, 12.72934341430664, 3.7838287353515625, 24.71722412109375, 0.8716316223144531, -2.569732666015625, 36.477359771728516, -1.0054740905761719, -13.151390075683594, 20.71086883544922, 3.4833908081054688, 12.084074020385742, 5.4786224365234375, 3.469970703125, 0.682464599609375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000257.npy"}
{"epoch": 0.3885109599395314, "step": 258, "batch_size": 64, "mean": 7.310370922088623, "std": 8.680947303771973, "min": -11.927291870117188, "p10": -2.6761175155639645, "median": 5.753782272338867, "p90": 18.723230934143075, "max": 29.61618423461914, "pos_frac": 0.828125, "sample": [5.183258056640625, 3.4060897827148438, 6.437583923339844, -2.049755096435547, 4.180694580078125, 25.739459991455078, 21.077049255371094, 4.87615966796875, 14.096672058105469, -2.547250747680664, 15.055198669433594, 14.222274780273438, -5.0172576904296875, 11.451932907104492, 11.956954956054688, 5.490478515625, -7.805377960205078, -4.889427185058594, 3.1340484619140625, -4.957847595214844, 8.076007843017578, 3.7494964599609375, -11.199211120605469, 14.813003540039062, 4.7349090576171875, 8.479557037353516, 7.18768310546875, 1.7963800430297852, 4.0475006103515625, 4.740577697753906, 9.554336547851562, 1.1601142883300781, 21.515403747558594, 16.910810470581055, 5.1832427978515625, -1.9649581909179688, -0.2910308837890625, 6.146949768066406, 4.96807861328125, 27.499038696289062, 8.488533020019531, 3.2042388916015625, 11.515121459960938, 3.7331771850585938, 16.979507446289062, 19.47054100036621, 16.14368438720703, 8.471649169921875, 2.7531356811523438, 16.72494125366211, 1.6404876708984375, 1.7971725463867188, 10.948226928710938, 1.4662284851074219, 9.668533325195312, 5.76513671875, 16.280242919921875, -11.927291870117188, 5.742427825927734, -2.7313461303710938, 20.0821533203125, 8.272537231445312, 29.61618423461914, 7.609687805175781], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000258.npy"}
{"epoch": 0.3900226757369615, "step": 259, "batch_size": 64, "mean": 9.96130657196045, "std": 12.523102760314941, "min": -20.612396240234375, "p10": -5.854181671142578, "median": 9.347434997558594, "p90": 25.0664342880249, "max": 36.29998779296875, "pos_frac": 0.78125, "sample": [-5.8968658447265625, -14.224334716796875, -20.612396240234375, 12.752338409423828, 8.947357177734375, 19.557559967041016, 21.57284927368164, 32.33921813964844, 9.033958435058594, -5.754585266113281, 25.09467315673828, 1.839569091796875, 10.278778076171875, 9.430648803710938, 12.876174926757812, -0.7967071533203125, -7.572669982910156, -1.9579582214355469, 36.29998779296875, 15.492263793945312, 0.6865386962890625, 24.8602294921875, 28.014057159423828, 24.719585418701172, 8.734695434570312, -0.633026123046875, 1.6858978271484375, 10.962085723876953, 7.673961639404297, 5.8978729248046875, 11.516216278076172, 6.636650085449219, -7.354213714599609, 15.446441650390625, 8.439933776855469, -5.3723602294921875, 3.09686279296875, 3.6101226806640625, 23.4100341796875, 14.347274780273438, 23.861003875732422, 10.202705383300781, -0.8249740600585938, 29.588760375976562, 3.3066635131835938, 9.517776489257812, -13.189624786376953, -1.8115921020507812, -10.728408813476562, 2.0930557250976562, 28.56427001953125, 21.783424377441406, 22.207366943359375, 9.26422119140625, 28.061752319335938, 25.00054359436035, 14.778778076171875, 1.3131484985351562, 3.6946334838867188, 7.1044158935546875, 24.8853759765625, 23.37596893310547, 20.94561767578125, 9.450027465820312], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000259.npy"}
{"epoch": 0.3915343915343915, "step": 260, "batch_size": 64, "mean": 6.716233730316162, "std": 13.308359146118164, "min": -20.567115783691406, "p10": -12.267129898071289, "median": 6.292747497558594, "p90": 25.987289428710938, "max": 33.15351867675781, "pos_frac": 0.703125, "sample": [20.031620025634766, -16.806785583496094, 12.423839569091797, -3.7047042846679688, -3.5652694702148438, 26.166046142578125, 3.382568359375, 2.8471603393554688, 28.21692657470703, -11.319686889648438, 25.880050659179688, 17.070392608642578, 26.033248901367188, 1.96551513671875, 8.074119567871094, -1.3331451416015625, 20.636337280273438, 4.554088592529297, 7.904672622680664, 33.15351867675781, 16.1083984375, 6.412376403808594, 28.409412384033203, 18.36786651611328, -1.289337158203125, 3.4014244079589844, 15.561603546142578, 27.96216583251953, -1.9011306762695312, 1.4477386474609375, 13.72198486328125, 8.878707885742188, 8.173316955566406, -13.21097183227539, -0.7979774475097656, 13.851531982421875, -2.2101898193359375, -11.751564025878906, -20.116817474365234, 6.173118591308594, -0.3549671173095703, 4.611743927001953, 6.102020263671875, 9.278602600097656, 4.076560974121094, 14.003690719604492, 14.296489715576172, -9.64910888671875, 27.998577117919922, 25.543441772460938, -12.488086700439453, 8.950532913208008, 2.612518310546875, 7.6294097900390625, 16.71338653564453, 17.064727783203125, -3.74456787109375, 18.747299194335938, 12.42681884765625, -20.567115783691406, 2.2425575256347656, -20.203369140625, -17.5093994140625, 3.2550277709960938], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000260.npy"}
{"epoch": 0.3930461073318216, "step": 261, "batch_size": 64, "mean": 4.976651191711426, "std": 11.74699878692627, "min": -24.329864501953125, "p10": -8.950648498535156, "median": 3.7804651260375977, "p90": 21.59264297485352, "max": 34.842918395996094, "pos_frac": 0.65625, "sample": [-0.7455368041992188, 7.561859130859375, 2.886402130126953, -1.65740966796875, -8.4307861328125, 18.732877731323242, 7.884883880615234, 20.306209564208984, 2.8683624267578125, -1.3389778137207031, -12.433238983154297, -9.173446655273438, -0.23146820068359375, 20.57330322265625, 14.26226806640625, -1.8135223388671875, 1.3304786682128906, 28.000289916992188, 0.4928092956542969, 31.980976104736328, 12.677352905273438, 9.106773376464844, -4.746131896972656, -11.350616455078125, 6.5097503662109375, -3.3889923095703125, 34.842918395996094, 5.814517974853516, -1.103424072265625, 2.7415390014648438, 4.254810333251953, 6.064426422119141, 9.370704650878906, 22.029502868652344, 0.7263526916503906, -1.1263408660888672, 29.123313903808594, -24.329864501953125, -13.355621337890625, 9.080974578857422, 11.085624694824219, 1.292510986328125, 11.231254577636719, 12.416679382324219, 11.160015106201172, -1.2421340942382812, -2.529247283935547, -16.921661376953125, -14.56509780883789, 6.255527496337891, 4.971160888671875, -1.7580718994140625, 24.659103393554688, 25.830078125, -0.3325958251953125, -7.449249267578125, 5.85711669921875, 3.9097366333007812, 1.5777740478515625, 4.9362640380859375, 7.2660064697265625, 0.9425773620605469, 12.2628173828125, 3.651193618774414], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000261.npy"}
{"epoch": 0.3945578231292517, "step": 262, "batch_size": 64, "mean": 6.939663887023926, "std": 9.569439888000488, "min": -10.92340087890625, "p10": -3.0629474639892575, "median": 5.275236129760742, "p90": 20.704629516601564, "max": 27.5775146484375, "pos_frac": 0.734375, "sample": [-8.501075744628906, 9.250015258789062, 11.283782958984375, -0.1323986053466797, 3.57940673828125, 13.258298873901367, 7.604736328125, -1.3348770141601562, 16.223064422607422, 18.797122955322266, -2.076202392578125, 24.675704956054688, -3.1850662231445312, 6.313652038574219, 24.82415771484375, -9.214004516601562, 4.349109649658203, 16.613357543945312, 5.57281494140625, 8.387924194335938, -6.015800476074219, 5.3045654296875, -1.213104248046875, -0.331939697265625, 17.155029296875, 18.125335693359375, 4.039093017578125, 14.749469757080078, -7.988128662109375, -0.7294883728027344, 20.330169677734375, 0.5535049438476562, 1.7683753967285156, 19.672317504882812, -10.92340087890625, 6.012420654296875, 3.875621795654297, 13.466865539550781, 21.635101318359375, 2.68499755859375, -2.778003692626953, -0.65142822265625, -2.124431610107422, 20.8651123046875, 27.5775146484375, 14.361473083496094, 5.245906829833984, 5.091552734375, -2.4365615844726562, 2.3509140014648438, 8.199165344238281, 1.72296142578125, 27.010414123535156, 7.065587997436523, 4.336601257324219, 0.5284194946289062, 2.917205810546875, 25.996414184570312, 5.469120025634766, 9.354904174804688, 6.219940185546875, 12.879730224609375, 3.4115371704101562, -6.936073303222656], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000262.npy"}
{"epoch": 0.3960695389266818, "step": 263, "batch_size": 64, "mean": 6.861823081970215, "std": 11.577089309692383, "min": -23.785442352294922, "p10": -6.05341796875, "median": 4.672203063964844, "p90": 22.728373718261725, "max": 34.98394775390625, "pos_frac": 0.75, "sample": [0.37206268310546875, 21.165924072265625, -6.572151184082031, 2.2542953491210938, 4.8989715576171875, 0.6344375610351562, 10.984085083007812, 4.4454345703125, 15.870964050292969, 16.424854278564453, 16.0953369140625, 5.603481292724609, -4.152397155761719, 8.21551513671875, 0.20893096923828125, -8.971202850341797, 15.788597106933594, 2.9695892333984375, -2.2547988891601562, -5.9135589599609375, 10.0616455078125, 18.91436004638672, 16.847305297851562, 9.975975036621094, -11.448341369628906, 4.274013519287109, 30.300979614257812, 12.872161865234375, 10.008880615234375, 8.785919189453125, 29.43880271911621, 7.042633056640625, -3.79998779296875, 9.392578125, -5.664369583129883, 0.8517303466796875, -6.1133575439453125, 26.91936492919922, 0.6838035583496094, 16.848365783691406, -0.6287841796875, -1.8616180419921875, 8.371337890625, 23.397994995117188, -2.9942626953125, 2.215606689453125, 2.379486083984375, 25.907546997070312, 8.242439270019531, 4.376750946044922, -9.671951293945312, 34.068511962890625, 17.5955810546875, -4.061882019042969, 34.98394775390625, 0.09177398681640625, 8.580562591552734, 4.378139495849609, 2.5710105895996094, -7.142612457275391, 0.2461700439453125, -23.785442352294922, 14.823844909667969, 12.811656951904297], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000263.npy"}
{"epoch": 0.3975812547241119, "step": 264, "batch_size": 64, "mean": 7.336230754852295, "std": 10.601770401000977, "min": -12.025527954101562, "p10": -3.783919525146484, "median": 6.103649139404297, "p90": 20.650928115844728, "max": 40.2835693359375, "pos_frac": 0.71875, "sample": [19.234588623046875, -4.6172943115234375, 1.8265533447265625, 5.284446716308594, 20.244304656982422, -2.8940391540527344, 4.9044189453125, 0.8668899536132812, 13.946826934814453, 11.384071350097656, -9.359354019165039, -0.675262451171875, -12.025527954101562, 13.20999526977539, -0.4769096374511719, -2.293365478515625, 0.09804534912109375, 11.652252197265625, 1.4736137390136719, 11.0654296875, 7.156890869140625, 15.3431396484375, 11.357643127441406, 12.069992065429688, 2.4890365600585938, 33.66558837890625, -0.5937538146972656, 12.31866455078125, -0.0340576171875, 19.125383377075195, 13.388298034667969, 6.6383514404296875, 21.480300903320312, 4.065227508544922, 7.468746185302734, 1.8296947479248047, 20.8251953125, -2.857391357421875, -3.4164199829101562, 29.784439086914062, 5.568946838378906, -5.489082336425781, -3.0111312866210938, 10.20404052734375, 20.223426818847656, 10.198806762695312, -1.25189208984375, 12.493499755859375, -9.969642639160156, 8.1405029296875, 0.904571533203125, 3.5677566528320312, 30.334468841552734, 7.502510070800781, 2.439666748046875, -4.961170196533203, 4.952850341796875, 9.533355712890625, 7.1529541015625, -3.4615249633789062, -3.922088623046875, 40.2835693359375, 21.982437133789062, 11.147308349609375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000264.npy"}
{"epoch": 0.39909297052154197, "step": 265, "batch_size": 64, "mean": 9.297408103942871, "std": 12.882739067077637, "min": -20.770675659179688, "p10": -8.21737232208252, "median": 9.15699577331543, "p90": 26.610978698730474, "max": 34.23711395263672, "pos_frac": 0.765625, "sample": [-11.023696899414062, 7.98944091796875, -6.796054840087891, -0.2416839599609375, 15.020545959472656, 1.689056396484375, 12.578907012939453, 10.6474609375, -6.869026184082031, 31.821151733398438, 3.1353626251220703, 18.091087341308594, 7.736763000488281, 9.291828155517578, -2.7431106567382812, 1.7713470458984375, 11.927938461303711, 7.753118515014648, 11.002750396728516, -20.770675659179688, -8.429117202758789, 23.23974609375, 29.55303955078125, 11.90109634399414, 0.7117099761962891, 6.3036346435546875, 16.09368896484375, 2.1598854064941406, 15.11065673828125, 3.143360137939453, 24.145679473876953, 21.98925018310547, -6.8121795654296875, 31.256093978881836, 12.920616149902344, 32.16111755371094, 6.421220779418945, 1.6215057373046875, -7.723300933837891, 0.03084564208984375, 10.262046813964844, 23.162185668945312, 12.222614288330078, 23.421096801757812, 34.23711395263672, 17.379066467285156, 28.706233978271484, 15.089645385742188, 9.022163391113281, -13.972888946533203, 12.85186767578125, 5.585594177246094, -0.40857696533203125, 23.04180908203125, 20.793312072753906, 21.1398983001709, 27.190818786621094, -9.170970916748047, 5.319622039794922, -9.918128967285156, 25.258018493652344, -0.7078628540039062, 8.551582336425781, -11.833221435546875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000265.npy"}
{"epoch": 0.40060468631897206, "step": 266, "batch_size": 64, "mean": 7.84024715423584, "std": 12.01264762878418, "min": -26.243202209472656, "p10": -1.7987602233886717, "median": 6.198419570922852, "p90": 23.54253311157227, "max": 39.07115173339844, "pos_frac": 0.796875, "sample": [-2.6961708068847656, 20.463924407958984, 34.096702575683594, 6.9601287841796875, 18.892738342285156, 0.011486053466796875, 1.1227874755859375, -0.7647056579589844, 3.2227783203125, -12.862804412841797, 8.183773040771484, -1.2368011474609375, 0.8382759094238281, 5.563459396362305, 2.125263214111328, 12.530647277832031, -1.864044189453125, 37.27092742919922, 0.604766845703125, 1.1426658630371094, 31.80008316040039, 10.435745239257812, 2.3128814697265625, 20.68728256225586, 7.108989715576172, 24.832061767578125, -7.813163757324219, 10.23944091796875, 24.618389129638672, 14.126930236816406, 14.826526641845703, 0.7434158325195312, 9.364330291748047, -0.40048980712890625, 5.112771987915039, -1.6464309692382812, 6.7266845703125, 23.983123779296875, 21.634326934814453, -0.7146034240722656, 8.680473327636719, -5.1501617431640625, 5.1615753173828125, 7.665637969970703, -20.927459716796875, 6.3041839599609375, 8.496055603027344, 4.1750640869140625, 4.8657989501953125, 13.993797302246094, -26.243202209472656, 4.5255584716796875, -1.6453399658203125, 13.979400634765625, 13.110694885253906, 13.941368103027344, 39.07115173339844, 2.7544078826904297, 18.62360382080078, 6.611869812011719, 22.514488220214844, 2.416290283203125, 6.092655181884766, 1.173828125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000266.npy"}
{"epoch": 0.4021164021164021, "step": 267, "batch_size": 64, "mean": 6.562788009643555, "std": 10.552145957946777, "min": -23.12903594970703, "p10": -7.493924522399901, "median": 7.008007049560547, "p90": 20.287796020507816, "max": 27.80740737915039, "pos_frac": 0.75, "sample": [12.675460815429688, -0.41145896911621094, 1.7614898681640625, -7.970788955688477, 9.137809753417969, -1.281402587890625, 14.312244415283203, 15.908172607421875, 19.677261352539062, 7.892230987548828, 2.2805633544921875, 1.7700881958007812, 2.4361610412597656, 24.725555419921875, 2.5250186920166016, -10.587509155273438, 0.6772079467773438, -0.8254165649414062, 0.83514404296875, -9.254329681396484, -4.9033966064453125, -6.3812408447265625, 4.662879943847656, 4.517669677734375, 22.414703369140625, 27.80740737915039, 0.4425010681152344, -4.211790084838867, 16.0501708984375, -9.83749008178711, 15.326866149902344, 10.127246856689453, 22.192970275878906, 6.516975402832031, 0.8615646362304688, 0.9488296508789062, 20.549453735351562, -9.136260986328125, 0.781890869140625, 24.607521057128906, 13.0089111328125, 0.3672008514404297, 27.226043701171875, 0.7876319885253906, 14.931785583496094, 11.369638442993164, 19.340057373046875, 9.681312561035156, -23.12903594970703, 14.068431854248047, 7.4990386962890625, -9.141464233398438, -4.199553489685059, 13.045064926147461, 12.467327117919922, -0.28379058837890625, 15.302619934082031, 13.322330474853516, 14.02764892578125, -3.179168701171875, 9.00363540649414, 16.596389770507812, 9.781990051269531, 8.50242805480957], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000267.npy"}
{"epoch": 0.4036281179138322, "step": 268, "batch_size": 64, "mean": 4.415614128112793, "std": 10.87142562866211, "min": -14.719375610351562, "p10": -6.544123077392577, "median": 1.604292869567871, "p90": 20.00389137268067, "max": 37.74383544921875, "pos_frac": 0.59375, "sample": [-1.1912689208984375, 27.816940307617188, 7.149528503417969, -13.622909545898438, 0.28271484375, 14.389328002929688, -2.8440780639648438, -14.719375610351562, -8.015121459960938, 22.72754669189453, 13.688148498535156, -0.9242172241210938, -1.4042530059814453, 2.327117919921875, 6.8775787353515625, -2.5201683044433594, 2.4822349548339844, -11.060867309570312, -7.009735107421875, 18.94028091430664, 18.211801528930664, 5.448974609375, -3.9234790802001953, -2.0902023315429688, -4.083778381347656, 1.65789794921875, -0.922027587890625, 12.108375549316406, 5.330497741699219, -1.4281158447265625, 6.565521240234375, -1.0815277099609375, -0.8322868347167969, 2.5895843505859375, 0.5719146728515625, 24.93651580810547, -0.330230712890625, -12.887340545654297, -5.087127685546875, 1.6377849578857422, 8.669746398925781, -1.2349700927734375, 10.416702270507812, 37.74383544921875, 31.737327575683594, 9.173614501953125, 14.119895935058594, -2.312379837036133, 1.57080078125, -10.957984924316406, -5.457695007324219, 1.4494705200195312, 11.15597915649414, 0.5022821426391602, 21.796707153320312, 7.947292327880859, -4.778861999511719, 8.598114013671875, 15.916263580322266, 1.8991546630859375, -3.2357521057128906, 20.45972442626953, 6.928112030029297, 0.7297477722167969], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000268.npy"}
{"epoch": 0.4051398337112623, "step": 269, "batch_size": 64, "mean": 6.5330915451049805, "std": 10.557945251464844, "min": -19.56763458251953, "p10": -4.017119598388671, "median": 5.20098876953125, "p90": 19.033627128601076, "max": 34.10550308227539, "pos_frac": 0.75, "sample": [3.2130508422851562, -18.653770446777344, 9.63177490234375, 6.077175140380859, 3.06298828125, -3.5679397583007812, 7.419403076171875, 34.10550308227539, 17.461387634277344, -4.209625244140625, -2.925018310546875, 5.61346435546875, 10.610359191894531, 4.78851318359375, 13.703296661376953, 25.401382446289062, 0.6360588073730469, 2.9901580810546875, 8.430122375488281, -5.799816131591797, -2.0589752197265625, -2.108154296875, 12.548900604248047, 6.957122802734375, -16.862213134765625, 20.09368133544922, 10.497795104980469, 32.013153076171875, 13.500907897949219, 18.319673538208008, 4.556316375732422, 9.528045654296875, -1.0554866790771484, 17.41869354248047, 7.1587371826171875, 1.8117504119873047, 0.9824752807617188, 2.978485107421875, -0.5830841064453125, 19.483184814453125, 25.259136199951172, 8.98847770690918, -2.8629302978515625, 7.889873504638672, 17.719791412353516, -5.113353729248047, 13.744110107421875, -2.930072784423828, -5.787651062011719, 4.52996826171875, 4.23748779296875, 2.454774856567383, 16.54522705078125, 4.6647796630859375, 0.33648681640625, -19.56763458251953, -2.56085205078125, 16.653076171875, 0.7675704956054688, 14.432182312011719, 19.33960723876953, 15.376380920410156, 9.34512710571289, 1.48681640625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000269.npy"}
{"epoch": 0.40665154950869237, "step": 270, "batch_size": 64, "mean": 5.86357307434082, "std": 10.015969276428223, "min": -17.658313751220703, "p10": -5.4665048599243145, "median": 4.24188232421875, "p90": 19.220964813232424, "max": 34.401573181152344, "pos_frac": 0.734375, "sample": [13.807086944580078, 17.581649780273438, -1.9149456024169922, 28.082504272460938, 5.456756591796875, 23.442420959472656, 13.683830261230469, 9.802143096923828, 11.114028930664062, 0.0208892822265625, 34.401573181152344, 19.61914825439453, -0.704620361328125, 10.047447204589844, 4.997844696044922, -1.933197021484375, 3.5171661376953125, 2.9114646911621094, 0.4826240539550781, 19.823997497558594, 4.18328857421875, -0.3418426513671875, -7.04437255859375, 4.30047607421875, 15.678901672363281, 11.03390121459961, -3.2587947845458984, -14.319366455078125, 23.07122039794922, 3.9534378051757812, -0.771942138671875, 3.89459228515625, 1.1674118041992188, 4.62548828125, -0.8062782287597656, 7.041744232177734, 3.819061279296875, -6.412666320800781, 12.836883544921875, -17.658313751220703, 7.34405517578125, 22.681686401367188, 12.336402893066406, -1.9009265899658203, 7.367982864379883, 9.447776794433594, -8.279205322265625, -0.9174346923828125, 7.1848907470703125, 2.629152297973633, 15.479915618896484, 1.1512641906738281, -11.27783203125, 14.220436096191406, 3.720611572265625, 3.528076171875, 3.3763885498046875, -16.29242706298828, 18.2918701171875, 4.119426727294922, 6.6299896240234375, -0.4618263244628906, 9.351829528808594, 6.3039398193359375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000270.npy"}
{"epoch": 0.40816326530612246, "step": 271, "batch_size": 64, "mean": 4.513890266418457, "std": 11.941535949707031, "min": -13.895011901855469, "p10": -9.05243911743164, "median": 2.1276931762695312, "p90": 19.44178619384766, "max": 38.525421142578125, "pos_frac": 0.578125, "sample": [5.608482360839844, 8.52447509765625, -0.30169677734375, -7.3733673095703125, 19.754425048828125, 22.92095947265625, 6.254156112670898, 5.632570266723633, 13.309303283691406, -6.6847686767578125, 14.434112548828125, -8.161235809326172, 18.712295532226562, -3.860032081604004, 11.034835815429688, 17.463394165039062, 4.6177215576171875, 1.6129302978515625, 1.1136493682861328, 34.21728515625, -6.928688049316406, -3.5478286743164062, 14.329345703125, -9.434383392333984, -1.8487548828125, -1.0843315124511719, -7.490814208984375, 3.9157066345214844, -1.9206085205078125, -2.9747848510742188, 6.957508087158203, 11.656482696533203, -13.895011901855469, -11.474006652832031, 6.148406982421875, 24.866371154785156, 4.652412414550781, -1.322479248046875, 13.757637023925781, 27.644315719604492, -1.315887451171875, -7.760829925537109, 38.525421142578125, -4.5633697509765625, 2.2448272705078125, 6.5484619140625, -0.26679229736328125, -10.757152557373047, -12.798038482666016, -9.697395324707031, 3.940521240234375, 0.64715576171875, 18.60639190673828, 17.192096710205078, 2.01055908203125, 2.356689453125, -2.7793426513671875, 10.86834716796875, 1.4548187255859375, -5.01458740234375, -3.504343032836914, -10.690231323242188, 34.88716125488281, 7.9185333251953125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000271.npy"}
{"epoch": 0.40967498110355255, "step": 272, "batch_size": 64, "mean": 8.433588981628418, "std": 9.773602485656738, "min": -12.675445556640625, "p10": -1.36756534576416, "median": 6.723589897155762, "p90": 24.53659057617188, "max": 31.82059097290039, "pos_frac": 0.78125, "sample": [4.7264556884765625, 0.5178604125976562, -1.1604843139648438, 4.646198272705078, 7.4165496826171875, 24.96686553955078, 12.095428466796875, 28.158737182617188, 28.487396240234375, 8.375740051269531, 7.867279052734375, 7.6792144775390625, -0.03436088562011719, 1.96734619140625, 14.144611358642578, 4.182126998901367, 3.3727874755859375, -2.85223388671875, 25.633934020996094, 27.038524627685547, 15.891300201416016, 5.597528457641602, 14.862102508544922, 8.898273468017578, 17.62883758544922, 5.9762420654296875, 14.913917541503906, -7.372711181640625, 2.748809814453125, 21.703006744384766, 8.415191650390625, 31.82059097290039, -1.1639251708984375, -0.4763622283935547, 20.86062240600586, 3.27886962890625, 1.1945953369140625, -1.4548397064208984, 14.608840942382812, -12.675445556640625, 12.215606689453125, 23.532615661621094, 13.123374938964844, 15.097766876220703, 4.643627166748047, 1.5289878845214844, -3.995361328125, -0.9847259521484375, 12.150970458984375, 13.170883178710938, 5.8333587646484375, 8.88840103149414, 2.103851318359375, -0.951019287109375, 7.138542175292969, 16.26251220703125, 2.1514968872070312, 6.3660430908203125, -4.113494873046875, 2.496561050415039, -3.49700927734375, -0.3072071075439453, 27.327354431152344, 7.081136703491211], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000272.npy"}
{"epoch": 0.41118669690098264, "step": 273, "batch_size": 64, "mean": 6.9506449699401855, "std": 11.74539852142334, "min": -21.45496368408203, "p10": -9.95615825653076, "median": 7.427787780761719, "p90": 20.44517288208008, "max": 34.5653076171875, "pos_frac": 0.78125, "sample": [7.4608612060546875, 8.171981811523438, 20.635009765625, -11.097236633300781, 10.808502197265625, 7.39471435546875, 5.06475830078125, 13.365047454833984, -17.686012268066406, 5.233619689941406, 0.7792510986328125, -12.889251708984375, 10.322738647460938, -7.293642044067383, 7.895282745361328, 17.209152221679688, 17.70063018798828, 5.063060760498047, 27.86400604248047, -0.5061721801757812, 10.933174133300781, 12.488971710205078, -4.057884216308594, -6.900054931640625, -21.45496368408203, 10.053760528564453, 14.330642700195312, 3.2138137817382812, -4.345064163208008, 0.662200927734375, 2.97412109375, 20.575775146484375, 8.456069946289062, -16.419715881347656, 14.237831115722656, 6.750801086425781, 14.056121826171875, 17.83538818359375, 4.519981384277344, 11.495323181152344, -11.863594055175781, 5.195335388183594, 13.119033813476562, -6.008598327636719, 1.9045333862304688, 34.5653076171875, 31.977291107177734, 19.998523712158203, 3.0195770263671875, 20.14043426513672, 4.092266082763672, 3.268695831298828, 8.387928009033203, -2.0865020751953125, 5.214427947998047, 13.856796264648438, 3.94659423828125, 17.470413208007812, 1.976675033569336, 8.596954345703125, 23.454421997070312, 27.010608673095703, -14.33908462524414, 17.04065704345703], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000273.npy"}
{"epoch": 0.4126984126984127, "step": 274, "batch_size": 64, "mean": 6.943209171295166, "std": 9.800843238830566, "min": -9.69805908203125, "p10": -4.568986511230468, "median": 5.918839454650879, "p90": 22.02057800292969, "max": 27.007232666015625, "pos_frac": 0.703125, "sample": [7.657764434814453, -1.7286453247070312, -3.065296173095703, 0.915557861328125, 1.0224456787109375, -8.671096801757812, 17.09906005859375, 13.617149353027344, 8.29852294921875, -4.349617004394531, 16.863176345825195, 18.002975463867188, -6.0945587158203125, -7.466400146484375, 6.666078567504883, 27.007232666015625, 22.364540100097656, -7.58885383605957, 9.129573822021484, -2.0344104766845703, 8.040618896484375, 17.840843200683594, 8.78106689453125, 23.879043579101562, 5.015815734863281, 0.8993167877197266, 21.824813842773438, 19.226959228515625, 1.8937740325927734, 4.4185943603515625, -3.5819320678710938, 1.2926254272460938, 12.884559631347656, -0.3959503173828125, -3.0973472595214844, 9.037971496582031, -0.5790519714355469, -0.5466461181640625, 11.128379821777344, 5.171600341796875, -4.663002014160156, 22.829635620117188, 0.17528533935546875, 4.1931610107421875, -0.22980880737304688, 18.211048126220703, 4.3362579345703125, -9.69805908203125, 6.7695770263671875, -3.8113021850585938, 21.354103088378906, 25.440231323242188, 12.125570297241211, 8.09271240234375, 2.7715225219726562, -2.0034351348876953, -4.7145538330078125, 10.587715148925781, 26.615142822265625, 9.545429229736328, 2.1075592041015625, 11.275123596191406, 22.104476928710938, 10.17071533203125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000274.npy"}
{"epoch": 0.41421012849584277, "step": 275, "batch_size": 64, "mean": 8.64724349975586, "std": 10.30456256866455, "min": -8.7369384765625, "p10": -3.3388818740844726, "median": 6.044593811035156, "p90": 25.858567810058595, "max": 31.746479034423828, "pos_frac": 0.796875, "sample": [2.0966949462890625, 6.867397308349609, 4.3165740966796875, 15.699012756347656, 23.127952575683594, 0.4635334014892578, 0.977508544921875, -4.897602081298828, 29.59947967529297, 18.53570556640625, 2.18560791015625, 9.917953491210938, 6.1549224853515625, 28.125776290893555, -2.44976806640625, 0.226165771484375, 13.730636596679688, 4.07231330871582, 20.414325714111328, -1.909027099609375, -8.7369384765625, 31.746479034423828, 1.9904098510742188, 5.038414001464844, 18.25751495361328, 1.3842010498046875, -3.39202880859375, -2.2980804443359375, 11.483444213867188, 5.0812225341796875, 12.681961059570312, 26.00971221923828, 0.038787841796875, 9.684783935546875, 26.37823486328125, 27.91461944580078, 3.817638397216797, 9.012840270996094, -2.1131515502929688, -4.2000732421875, 20.91217803955078, 4.8071746826171875, 3.1294631958007812, -0.7094154357910156, 5.93426513671875, 19.315933227539062, 19.6143798828125, 7.662689208984375, -3.7487564086914062, 13.763023376464844, 1.0606536865234375, -5.238559722900391, -3.592151641845703, 13.295730590820312, 26.19172477722168, 10.12640380859375, 3.994659423828125, 6.506874084472656, 11.535314559936523, 10.460113525390625, 25.505897521972656, 2.55133056640625, -3.214872360229492, 16.524391174316406], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000275.npy"}
{"epoch": 0.41572184429327286, "step": 276, "batch_size": 64, "mean": 6.956180095672607, "std": 10.667712211608887, "min": -19.141006469726562, "p10": -5.421747589111328, "median": 5.943256378173828, "p90": 22.878223991394044, "max": 31.325674057006836, "pos_frac": 0.78125, "sample": [7.833272933959961, 1.7411613464355469, -7.838848114013672, 4.595344543457031, 1.9461402893066406, 7.997406005859375, 31.325674057006836, -14.623443603515625, 8.774894714355469, 9.841741561889648, 11.155380249023438, 3.3055381774902344, 7.169822692871094, 13.701011657714844, 13.46194839477539, 11.094978332519531, 5.005039215087891, 3.0924453735351562, 7.341705322265625, 2.4242172241210938, 2.398174285888672, 26.695236206054688, 10.184932708740234, 24.70958709716797, -2.2132644653320312, -19.141006469726562, 13.113388061523438, 4.2268218994140625, 10.045421600341797, 23.1658878326416, 1.1455154418945312, 1.0752105712890625, -8.228958129882812, -5.620460510253906, 17.91806411743164, -4.9580841064453125, 28.335845947265625, -2.813121795654297, 9.389190673828125, 13.762908935546875, -13.089385986328125, 0.26615142822265625, 27.143646240234375, -0.21509552001953125, 4.971954345703125, -5.6647796630859375, 14.663625717163086, 13.807998657226562, 27.64837646484375, 19.398391723632812, -0.19748687744140625, 1.54693603515625, 5.946128845214844, 1.7601089477539062, 0.8984146118164062, 22.207008361816406, 17.382953643798828, 1.0396499633789062, -4.923851013183594, 9.2874755859375, 5.9403839111328125, 11.019332885742188, 15.112388610839844, -3.2915496826171875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000276.npy"}
{"epoch": 0.41723356009070295, "step": 277, "batch_size": 64, "mean": 3.8493056297302246, "std": 10.873051643371582, "min": -17.632186889648438, "p10": -9.361642837524414, "median": 2.8361968994140625, "p90": 18.00585346221924, "max": 36.94861602783203, "pos_frac": 0.625, "sample": [16.54383087158203, -7.0681304931640625, 12.937198638916016, 0.9454803466796875, 5.745157241821289, -7.421051025390625, 9.392906188964844, 1.0052337646484375, 17.907163619995117, 13.729965209960938, -10.902114868164062, -2.8371124267578125, 10.473312377929688, -0.515045166015625, -9.003490447998047, 5.898799896240234, -0.2051410675048828, 2.7787933349609375, -0.5957984924316406, -0.23715972900390625, 6.642171859741211, 11.568641662597656, 2.4893341064453125, 3.1435470581054688, -5.720909118652344, 15.19720458984375, 4.116294860839844, -6.115104675292969, -9.51513671875, 11.31329345703125, 0.08143329620361328, 16.538898468017578, -5.766994476318359, 18.51940155029297, 0.41785621643066406, 2.8936004638671875, -9.815017700195312, -17.632186889648438, -4.190559387207031, -6.214561462402344, 1.7270965576171875, -6.349826812744141, 10.87030029296875, 19.46584701538086, 6.1834564208984375, 7.129299163818359, 9.264839172363281, 6.220863342285156, -15.284629821777344, 36.94861602783203, 15.383392333984375, -1.9557743072509766, 18.891876220703125, 18.04814910888672, -0.5862503051757812, 1.0181770324707031, 21.530899047851562, -15.586395263671875, 5.48211669921875, 29.141319274902344, -8.600173950195312, 3.2513961791992188, -10.470170974731445, 8.107131958007812], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000277.npy"}
{"epoch": 0.41874527588813304, "step": 278, "batch_size": 64, "mean": 5.319490432739258, "std": 10.825571060180664, "min": -20.24456024169922, "p10": -9.476126861572265, "median": 5.239174842834473, "p90": 21.102634048461926, "max": 28.557029724121094, "pos_frac": 0.703125, "sample": [-3.7404403686523438, -4.7746429443359375, 12.586639404296875, 0.4351806640625, 6.424228668212891, -2.5053749084472656, -14.749465942382812, 16.344573974609375, 17.503578186035156, 22.859764099121094, 8.117691040039062, 1.4343109130859375, 18.480846405029297, 25.514400482177734, 26.15550994873047, -0.4290313720703125, 4.823200225830078, 5.679592132568359, 4.977909088134766, 8.904792785644531, 5.50044059753418, 8.862739562988281, 2.1351661682128906, -11.74169921875, -20.24456024169922, 22.22625732421875, -9.740966796875, 6.615154266357422, 7.92584228515625, 6.57012939453125, 7.6703338623046875, 0.8518829345703125, -11.314422607421875, -9.248100280761719, 16.526962280273438, 2.9294357299804688, 2.003223419189453, 1.2599334716796875, 14.327051162719727, -1.6123046875, 26.125930786132812, 28.557029724121094, -0.356048583984375, 7.465599060058594, 17.432592391967773, 13.048656463623047, 8.723838806152344, -0.8363571166992188, -6.7011566162109375, 4.260566711425781, 4.156242370605469, 11.059234619140625, -9.5738525390625, 0.6534614562988281, -2.654388427734375, 28.466079711914062, -11.05328369140625, 7.231903076171875, 7.917411804199219, -6.926490783691406, 0.46759796142578125, -0.5854454040527344, 6.069122314453125, 11.953399658203125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000278.npy"}
{"epoch": 0.42025699168556313, "step": 279, "batch_size": 64, "mean": 8.478212356567383, "std": 9.864803314208984, "min": -17.17413902282715, "p10": -2.377433586120605, "median": 6.408018112182617, "p90": 22.647789001464844, "max": 28.340957641601562, "pos_frac": 0.828125, "sample": [-6.121283531188965, 4.936481475830078, 28.340957641601562, 4.377269744873047, -17.17413902282715, 26.727645874023438, 5.247457504272461, -5.66839599609375, -9.017913818359375, 11.732383728027344, 6.095756530761719, 22.619964599609375, 4.057544708251953, 24.11385726928711, 10.812728881835938, 22.556121826171875, 12.515838623046875, 24.450294494628906, -6.534547805786133, -2.6745834350585938, -1.6840839385986328, 20.90412139892578, 0.9878730773925781, 9.501480102539062, 5.80377197265625, 14.071502685546875, 14.993122100830078, 1.8017807006835938, -0.3899993896484375, 19.154136657714844, 13.019916534423828, 17.433361053466797, 22.005233764648438, -1.49176025390625, 3.2554244995117188, 9.866334915161133, 10.034017562866211, 13.095382690429688, 6.08673095703125, 25.00834846496582, 14.721546173095703, 6.720279693603516, 11.560493469238281, 3.4754714965820312, -5.9565887451171875, 8.129951477050781, 0.09305763244628906, 2.6180648803710938, 3.534149169921875, 11.114555358886719, 14.01690673828125, -1.1518325805664062, 2.6851253509521484, 22.659713745117188, 1.037332534790039, 0.5703887939453125, 14.396194458007812, 2.4563446044921875, 14.002513885498047, 5.941905975341797, 11.463651657104492, 1.7716903686523438, 27.335853576660156, 4.558712005615234], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000279.npy"}
{"epoch": 0.4217687074829932, "step": 280, "batch_size": 64, "mean": 8.613632202148438, "std": 9.71975326538086, "min": -16.74911880493164, "p10": -0.8388217926025391, "median": 7.574909210205078, "p90": 22.141145324707033, "max": 34.22682189941406, "pos_frac": 0.859375, "sample": [1.04815673828125, 34.22682189941406, 5.955085754394531, 15.598258972167969, 0.073394775390625, 21.68994140625, 1.6346702575683594, -0.8453254699707031, 5.519731521606445, 8.622077941894531, 7.593544006347656, 10.05825424194336, 9.731948852539062, 5.4869232177734375, 14.240653991699219, 20.494056701660156, 8.814765930175781, 9.57806396484375, 22.334518432617188, -5.107231140136719, 0.18076324462890625, 4.644248962402344, 2.1199417114257812, -0.037906646728515625, 29.58295440673828, 11.508529663085938, 12.927621841430664, 12.920272827148438, 29.71063232421875, 10.485809326171875, 4.438112258911133, 2.90582275390625, 0.2142181396484375, 16.361495971679688, 2.177215576171875, -1.4347076416015625, 30.618179321289062, -2.62640380859375, 28.177284240722656, 9.987808227539062, -5.980499267578125, -16.74911880493164, 4.866065979003906, 1.0331077575683594, 3.1800804138183594, 19.75027847290039, 1.7838287353515625, -0.8792381286621094, 5.872016906738281, 10.919574737548828, 19.9033203125, 8.851968765258789, 4.626270294189453, 12.184459686279297, 0.8757476806640625, 7.4814300537109375, 8.512474060058594, -0.8236465454101562, 9.56704330444336, 0.31395721435546875, 7.5562744140625, 22.787567138671875, 7.957237243652344, 16.07208251953125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000280.npy"}
{"epoch": 0.42328042328042326, "step": 281, "batch_size": 64, "mean": 6.464018821716309, "std": 9.396986961364746, "min": -12.238636016845703, "p10": -5.721589279174804, "median": 5.669763565063477, "p90": 19.022432708740237, "max": 31.160987854003906, "pos_frac": 0.78125, "sample": [-6.030097961425781, 16.796165466308594, 3.5276031494140625, 7.198760986328125, 10.199222564697266, 11.846420288085938, 5.698863983154297, 11.880165100097656, 4.037506103515625, 21.870498657226562, 5.640663146972656, 13.491584777832031, 9.837654113769531, 19.083839416503906, -2.25933837890625, 6.598958969116211, 7.972463607788086, 16.886573791503906, -2.8924789428710938, 5.430328369140625, 18.59189224243164, 3.8432769775390625, 0.835235595703125, 16.570114135742188, 6.116493225097656, 3.6930160522460938, 10.443840026855469, -10.652076721191406, 1.7772903442382812, -8.417007446289062, 1.1539535522460938, 0.20766830444335938, -1.3050765991210938, 3.6421470642089844, 24.27074432373047, -12.238636016845703, 9.430526733398438, 31.160987854003906, -0.16640472412109375, -3.3161888122558594, 12.29140853881836, 1.2133255004882812, 16.13979721069336, 10.405506134033203, 22.18111801147461, 0.98626708984375, 8.125835418701172, 9.088722229003906, 20.954456329345703, 2.532815933227539, -9.014366149902344, 2.9659957885742188, 21.315433502197266, -10.481315612792969, 1.1832466125488281, 11.574604034423828, 13.832115173339844, -1.4606170654296875, 18.879150390625, 8.704959869384766, 0.01753997802734375, -5.001735687255859, 2.3276004791259766, -7.52178955078125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000281.npy"}
{"epoch": 0.42479213907785335, "step": 282, "batch_size": 64, "mean": 7.391363143920898, "std": 11.268403053283691, "min": -14.901664733886719, "p10": -4.782968139648437, "median": 4.566983222961426, "p90": 25.478729248046875, "max": 36.82971954345703, "pos_frac": 0.75, "sample": [4.698333740234375, 10.294670104980469, -3.1078262329101562, 3.147796630859375, 5.0044403076171875, 3.243419647216797, 7.317138671875, 3.7441883087158203, -5.606136322021484, 25.642425537109375, 24.401325225830078, 13.961112976074219, 4.522940635681152, 3.679534912109375, 28.939613342285156, 1.0826263427734375, 29.831205368041992, 7.283275604248047, -4.0065460205078125, 7.634910583496094, 3.6165122985839844, 11.760581970214844, 3.1940460205078125, 30.466232299804688, -0.225555419921875, -4.495166778564453, 4.5461883544921875, 1.647918701171875, 29.368446350097656, 9.593025207519531, -14.901664733886719, -4.7868499755859375, -4.7739105224609375, 17.030786514282227, -1.6606025695800781, -3.6824798583984375, 11.294090270996094, 2.7941761016845703, -0.9073028564453125, 4.613746643066406, 3.4213943481445312, 14.870260238647461, -2.55108642578125, 14.869117736816406, 17.697769165039062, 25.096771240234375, 10.178909301757812, 10.96002197265625, 0.5738639831542969, 18.05792236328125, -6.6212310791015625, 1.1236038208007812, -5.372161865234375, 4.587778091430664, -5.6201934814453125, 36.82971954345703, 15.238258361816406, 2.3148841857910156, 3.686656951904297, -14.126800537109375, 10.072494506835938, 13.949363708496094, 30.90351104736328, 6.705783843994141], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000282.npy"}
{"epoch": 0.42630385487528344, "step": 283, "batch_size": 64, "mean": 6.714934349060059, "std": 10.941810607910156, "min": -25.117263793945312, "p10": -7.368917465209961, "median": 6.489816665649414, "p90": 23.166682052612305, "max": 27.672119140625, "pos_frac": 0.703125, "sample": [26.082794189453125, -8.073802947998047, -17.00666046142578, -1.8692626953125, 11.575508117675781, 6.448207855224609, 7.513759613037109, 0.20876312255859375, 24.057952880859375, -7.372287750244141, -1.2353363037109375, -4.420845031738281, 8.717296600341797, 3.0093917846679688, 12.46255874633789, 7.7645263671875, 6.6895904541015625, 23.47601318359375, -7.8242340087890625, 9.40435791015625, 8.659912109375, 6.4442291259765625, 13.560928344726562, -1.6659717559814453, 8.580734252929688, 23.212345123291016, 27.57257843017578, 12.893325805664062, 19.094711303710938, 4.030975341796875, 21.52813720703125, 18.62820053100586, -1.3776397705078125, 3.050445556640625, -3.1570892333984375, 7.1488189697265625, 0.8734970092773438, -7.52691650390625, 9.811721801757812, 10.691154479980469, 23.404016494750977, -0.24324989318847656, 23.060134887695312, 22.45322036743164, 3.174591064453125, -0.6090545654296875, 10.65948486328125, -25.117263793945312, -0.4920072555541992, 0.275115966796875, 14.670528411865234, 2.5335922241210938, 27.672119140625, 6.286083221435547, 13.3997802734375, 12.947860717773438, -8.496490478515625, 6.531425476074219, -2.8831253051757812, 4.17474365234375, -7.361053466796875, 16.07605743408203, -0.1337738037109375, 6.11067008972168], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000283.npy"}
{"epoch": 0.42781557067271353, "step": 284, "batch_size": 64, "mean": 8.383764266967773, "std": 10.444693565368652, "min": -12.848472595214844, "p10": -2.722062683105468, "median": 7.572465896606445, "p90": 24.275754165649417, "max": 33.90605926513672, "pos_frac": 0.8125, "sample": [2.2347869873046875, 4.679725646972656, 12.980587005615234, 0.443572998046875, 31.48126220703125, 5.44696044921875, -1.4662208557128906, -10.98065185546875, 26.715316772460938, 1.8881492614746094, 1.9471511840820312, 23.688533782958984, 1.3354339599609375, 8.443435668945312, 16.021347045898438, -1.5889854431152344, 10.898052215576172, 18.3387393951416, -2.9852447509765625, 6.55767822265625, 7.9112701416015625, -5.0106964111328125, 1.4376220703125, -6.861370086669922, 1.2703399658203125, 11.123954772949219, 9.04388427734375, -4.1690673828125, 8.744140625, 12.212162017822266, 15.223773956298828, 18.459224700927734, 22.12940216064453, 6.21661376953125, 9.131285667419434, 1.2358779907226562, 7.233661651611328, 30.978439331054688, 26.266815185546875, 2.3046112060546875, 12.263778686523438, 31.34673309326172, 3.5243682861328125, 9.67425537109375, -0.273406982421875, 1.9247665405273438, 15.8031005859375, 9.433418273925781, 12.292274475097656, 24.527420043945312, 1.3345947265625, 18.57178497314453, 11.513389587402344, 3.1096343994140625, 2.1986923217773438, 10.889602661132812, 33.90605926513672, -2.10797119140625, 13.660274505615234, -3.8078689575195312, 0.5025177001953125, -12.848472595214844, 8.517173767089844, -0.3568305969238281], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000284.npy"}
{"epoch": 0.4293272864701436, "step": 285, "batch_size": 64, "mean": 4.79688835144043, "std": 11.243825912475586, "min": -26.321685791015625, "p10": -8.673931121826172, "median": 1.989349365234375, "p90": 20.464935684204107, "max": 29.9525146484375, "pos_frac": 0.625, "sample": [-0.7892913818359375, -0.16911697387695312, 0.904876708984375, 19.012210845947266, -2.5300445556640625, -12.6563720703125, -10.819572448730469, 3.8407936096191406, -2.5126953125, -26.321685791015625, 17.686986923217773, 14.548828125, 24.40146255493164, 14.014087677001953, 11.450996398925781, -8.272884368896484, 9.156166076660156, 29.23691177368164, 0.8660888671875, 1.6179428100585938, -0.147216796875, 1.0417633056640625, -8.686347961425781, 0.2204742431640625, -0.9052886962890625, -4.8507537841796875, -3.6350555419921875, 1.7250518798828125, 1.35296630859375, -2.4257125854492188, -3.3899307250976562, 9.003738403320312, -10.11722183227539, 25.23297119140625, 23.960121154785156, 6.789485931396484, 12.282295227050781, 8.841476440429688, 14.322181701660156, 14.939069747924805, 15.888328552246094, 9.114013671875, -2.690196990966797, 23.556018829345703, 21.08753204345703, 6.4080047607421875, 6.378751754760742, 29.9525146484375, -8.64495849609375, -6.742006301879883, 6.483499526977539, 9.687126159667969, 6.648773193359375, 12.095268249511719, 2.2536468505859375, -0.6065559387207031, 18.539596557617188, 10.178680419921875, -2.9500083923339844, -0.564788818359375, -10.861221313476562, -11.839157104492188, 0.5748291015625, 4.833412170410156], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000285.npy"}
{"epoch": 0.4308390022675737, "step": 286, "batch_size": 64, "mean": 8.115406036376953, "std": 11.623400688171387, "min": -25.492324829101562, "p10": -3.0212890624999997, "median": 7.014533996582031, "p90": 23.597143554687502, "max": 37.884918212890625, "pos_frac": 0.8125, "sample": [17.810409545898438, 11.515739440917969, 2.8785133361816406, 4.3439178466796875, 6.462944030761719, 16.86296844482422, 0.5297927856445312, -6.063302993774414, 7.019508361816406, -8.282875061035156, 21.068557739257812, 8.315559387207031, 2.9916229248046875, 11.899768829345703, 13.343509674072266, 22.213348388671875, 10.03499984741211, 7.009559631347656, -11.756755828857422, 8.82452392578125, 4.381805419921875, -17.442459106445312, 0.19850921630859375, 29.906139373779297, -3.1735916137695312, -2.112102508544922, 12.168088912963867, 10.351066589355469, 3.188262939453125, 7.8287353515625, 17.8808650970459, 11.71219253540039, 25.889739990234375, 22.333847045898438, 8.290843963623047, 37.884918212890625, -2.6659164428710938, 18.626569747924805, -4.442436218261719, 15.0518798828125, 34.32452392578125, 23.841552734375, 3.3015060424804688, -2.1885528564453125, 7.411567687988281, 3.199115753173828, 2.424825668334961, 0.660552978515625, 23.918960571289062, 23.02685546875, 13.68975830078125, 0.16097259521484375, 0.7364883422851562, 6.581043243408203, -1.7663497924804688, 2.9938201904296875, 26.48760223388672, 2.990070343017578, -25.492324829101562, 15.264022827148438, 6.163667678833008, -1.4497184753417969, 10.01123046875, 0.2155303955078125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000286.npy"}
{"epoch": 0.4323507180650038, "step": 287, "batch_size": 64, "mean": 10.722530364990234, "std": 12.37580680847168, "min": -16.485244750976562, "p10": -2.4597030639648434, "median": 7.625267028808594, "p90": 27.66605911254883, "max": 37.53710174560547, "pos_frac": 0.8125, "sample": [10.19659423828125, 3.854339599609375, 3.7876739501953125, 27.82030487060547, 32.504615783691406, 26.784500122070312, 7.360221862792969, 8.487770080566406, 27.30615234375, -2.2577972412109375, 8.729866027832031, 18.63579559326172, 16.16269302368164, 21.969497680664062, 28.61328125, 33.62559509277344, 17.09600830078125, 0.5797500610351562, 6.038639068603516, 2.6423721313476562, 10.178359985351562, -5.200439453125, 20.430023193359375, 6.334546089172363, 6.511829376220703, 16.533950805664062, -0.7972412109375, -3.4974594116210938, 24.790897369384766, 4.266887664794922, 13.358497619628906, 0.48783111572265625, -16.485244750976562, -1.9480514526367188, -1.117863655090332, 18.60706329345703, 11.347156524658203, -2.546234130859375, 4.326499938964844, 16.002704620361328, 16.67337417602539, 1.8883247375488281, 22.946048736572266, 32.24444580078125, 6.942718505859375, 7.890312194824219, 37.53710174560547, -9.929159164428711, 18.692424774169922, 23.374528884887695, -0.33887386322021484, 2.3781051635742188, 3.5057220458984375, 4.748538970947266, -13.80936050415039, 2.3213958740234375, 24.909549713134766, 11.542564392089844, 7.257194519042969, 5.782894134521484, -6.939338684082031, 34.061065673828125, 26.92281723022461, 4.11798095703125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000287.npy"}
{"epoch": 0.43386243386243384, "step": 288, "batch_size": 64, "mean": 5.741317272186279, "std": 11.74944019317627, "min": -20.499053955078125, "p10": -6.00998592376709, "median": 4.6529083251953125, "p90": 20.064642715454102, "max": 33.455406188964844, "pos_frac": 0.734375, "sample": [12.796731948852539, 4.62451171875, -19.42041015625, -15.21478271484375, 14.755363464355469, 7.2326812744140625, 8.715141296386719, -3.852386474609375, 13.629150390625, 0.03493499755859375, 7.765583038330078, -0.6199073791503906, 2.86328125, 19.563640594482422, 3.65142822265625, 12.017845153808594, -15.28594970703125, 10.281539916992188, -1.8401947021484375, 9.16683578491211, -2.5106735229492188, 17.363887786865234, 28.141494750976562, 31.62493896484375, 4.681304931640625, 18.306884765625, 8.660171508789062, 22.939311981201172, -5.9530487060546875, 4.577785491943359, 33.455406188964844, -5.5119171142578125, 5.855377197265625, 13.077262878417969, 23.59857177734375, 10.215644836425781, 4.598297119140625, -4.9266204833984375, 7.239177703857422, 2.2665328979492188, -14.825014114379883, -20.499053955078125, 31.65404510498047, 5.297050476074219, 9.383922576904297, -8.311412811279297, -5.580116271972656, 0.0232086181640625, 5.2596588134765625, 20.27935791015625, 11.983543395996094, 1.8302993774414062, 19.53704833984375, 0.81121826171875, -6.034387588500977, -5.333845138549805, 2.3410110473632812, 1.6442947387695312, 4.15240478515625, 15.483375549316406, 1.9464569091796875, 0.32675933837890625, 12.295242309570312, -4.7855987548828125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000288.npy"}
{"epoch": 0.43537414965986393, "step": 289, "batch_size": 64, "mean": 7.7562575340271, "std": 11.773369789123535, "min": -20.50469970703125, "p10": -4.907037925720214, "median": 4.409904479980469, "p90": 25.09828968048096, "max": 42.617340087890625, "pos_frac": 0.734375, "sample": [-5.321567535400391, -1.7719268798828125, 14.442916870117188, -1.6133880615234375, 0.24950408935546875, 12.759063720703125, -1.220458984375, 3.032703399658203, 3.8676376342773438, 1.6155853271484375, -3.8130569458007812, 9.75344467163086, 2.6474647521972656, 8.80343246459961, 18.775360107421875, 2.0913448333740234, 21.144866943359375, -7.92987060546875, 5.851871490478516, 24.296157836914062, 2.257781982421875, 26.993759155273438, 3.464977264404297, 0.40350341796875, 17.926780700683594, 3.9062461853027344, 18.366249084472656, 3.3177719116210938, 20.054035186767578, 31.181867599487305, -6.0457763671875, 10.925212860107422, 0.8462905883789062, -7.365074157714844, 29.26569366455078, -3.9398021697998047, -5.906120300292969, 42.617340087890625, 7.666412353515625, 10.380355834960938, 8.678451538085938, 6.4436187744140625, 6.072563171386719, -9.137348175048828, 13.014785766601562, -20.50469970703125, 4.6960906982421875, 0.7790088653564453, -1.158203125, -3.0165977478027344, 27.368494033813477, 16.5433349609375, 2.4405593872070312, -1.5257720947265625, 22.22325897216797, 25.442060470581055, 26.151344299316406, 20.026100158691406, 15.532516479492188, 4.12371826171875, 4.807682037353516, -3.3619518280029297, 16.91747283935547, -0.1345977783203125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000289.npy"}
{"epoch": 0.436885865457294, "step": 290, "batch_size": 64, "mean": 7.221979141235352, "std": 11.134799003601074, "min": -22.71741485595703, "p10": -5.558344268798826, "median": 6.105247497558594, "p90": 22.16081275939942, "max": 31.782501220703125, "pos_frac": 0.765625, "sample": [20.23333740234375, 7.5916900634765625, 7.624021530151367, 29.86260986328125, 14.64284896850586, 1.648834228515625, 1.5142974853515625, -0.282928466796875, 3.272918701171875, 10.211868286132812, 6.92388916015625, 4.566352844238281, -3.0915069580078125, -3.4231796264648438, 21.108230590820312, -0.369598388671875, 5.9929046630859375, -6.4683990478515625, -7.312660217285156, 11.534942626953125, 1.0678596496582031, 19.124343872070312, 16.300628662109375, 13.142387390136719, 5.852535247802734, -2.7282867431640625, 5.317859649658203, -1.2774200439453125, 10.820037841796875, -14.672370910644531, 24.03583526611328, 14.280597686767578, 16.100173950195312, 11.762664794921875, -8.21600341796875, 27.57244873046875, 19.188453674316406, 5.0891571044921875, 13.036491394042969, -16.72149658203125, 5.5739898681640625, 16.95636749267578, 0.9858169555664062, -22.71741485595703, 8.492267608642578, 2.4047470092773438, 0.9322109222412109, 31.782501220703125, 24.2008056640625, 22.611919403076172, 6.21759033203125, 13.28936767578125, -1.3906822204589844, 1.0237770080566406, -11.68020248413086, 15.318099975585938, 4.119873046875, 6.550212860107422, 5.203338623046875, 7.171234130859375, 14.768836975097656, 25.949493408203125, 3.0210037231445312, -3.4348831176757812], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000290.npy"}
{"epoch": 0.4383975812547241, "step": 291, "batch_size": 64, "mean": 8.44200325012207, "std": 13.809659957885742, "min": -29.69492530822754, "p10": -5.644538497924804, "median": 8.037736892700195, "p90": 26.813636779785156, "max": 37.085968017578125, "pos_frac": 0.71875, "sample": [24.668487548828125, 21.18877410888672, -0.5485916137695312, 4.3475189208984375, 17.091049194335938, 21.707069396972656, -16.670150756835938, 4.668098449707031, 32.788238525390625, -4.257354736328125, 25.46886444091797, 17.87262725830078, -12.73720932006836, -5.084453582763672, 29.392560958862305, 26.161941528320312, 1.24810791015625, -5.884574890136719, 22.690719604492188, 8.003093719482422, -1.4942169189453125, -2.7647132873535156, 12.506317138671875, 3.9514541625976562, 16.600662231445312, 10.922927856445312, 26.977401733398438, 0.18677330017089844, 37.085968017578125, 12.570667266845703, 15.141769409179688, 30.200607299804688, -3.5932998657226562, 1.1739311218261719, -2.5612411499023438, 8.42709732055664, 16.9285888671875, 0.191375732421875, 8.072380065917969, 4.127887725830078, 7.1587982177734375, 18.701095581054688, -29.69492530822754, 1.6413116455078125, 26.4315185546875, 16.124404907226562, 10.777458190917969, 12.400768280029297, 13.004436492919922, 31.084197998046875, -17.8675537109375, 1.70477294921875, -15.387344360351562, 10.059967041015625, -4.153553009033203, 5.9480438232421875, -2.4063053131103516, 30.316665649414062, 11.920333862304688, 14.89675521850586, 4.798576354980469, -2.3081512451171875, -9.474159240722656, -2.1560440063476562], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000291.npy"}
{"epoch": 0.4399092970521542, "step": 292, "batch_size": 64, "mean": 7.444643497467041, "std": 11.862417221069336, "min": -24.05804443359375, "p10": -7.833514022827145, "median": 7.7983903884887695, "p90": 24.797604370117188, "max": 30.45998764038086, "pos_frac": 0.71875, "sample": [-10.72222900390625, 1.8164520263671875, 9.992097854614258, -0.27517127990722656, 13.597244262695312, 9.406496047973633, 6.516990661621094, 29.969940185546875, 12.713768005371094, 5.3887786865234375, 10.20162582397461, -0.19966793060302734, 9.131275177001953, 4.8935546875, -12.086990356445312, 11.337749481201172, 11.946578979492188, 19.47735595703125, 21.121238708496094, 2.0209732055664062, 1.5137176513671875, 20.6763916015625, 15.85455322265625, -4.220958709716797, -10.800079345703125, 1.6182098388671875, -3.5733718872070312, 15.681880950927734, -13.567672729492188, 3.025920867919922, 3.5460968017578125, 13.042499542236328, 13.028213500976562, 0.7510528564453125, -0.18648147583007812, -3.3468551635742188, 18.122596740722656, -24.05804443359375, -2.748706817626953, -2.4336395263671875, -3.2992401123046875, 10.375492095947266, 13.196552276611328, -2.5167694091796875, 26.621017456054688, 30.45998764038086, 13.839614868164062, 9.227163314819336, 3.0502567291259766, 9.4927978515625, 24.6895751953125, 19.58343505859375, 1.5141677856445312, 27.005645751953125, 7.175226211547852, -3.5831146240234375, 8.421554565429688, 1.626708984375, 27.9322509765625, 20.2091064453125, 24.843902587890625, -9.6759033203125, -9.381752014160156, 27.476119995117188], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000292.npy"}
{"epoch": 0.4414210128495843, "step": 293, "batch_size": 64, "mean": 9.427419662475586, "std": 14.87829303741455, "min": -33.883392333984375, "p10": -9.36465950012207, "median": 7.613685607910156, "p90": 28.841078758239746, "max": 44.27223205566406, "pos_frac": 0.765625, "sample": [2.4523239135742188, 11.805660247802734, 16.415664672851562, 23.831645965576172, 9.717292785644531, 7.611839294433594, -0.6154041290283203, 5.704862594604492, 1.1372146606445312, 4.5148162841796875, 20.613914489746094, -6.338356018066406, 31.784996032714844, 22.091766357421875, 20.782691955566406, 0.7474212646484375, 24.022836685180664, 15.995918273925781, 20.15457534790039, 19.220375061035156, 14.665023803710938, -11.882061004638672, -0.5008773803710938, -9.924125671386719, 17.53672981262207, 3.607940673828125, 5.611005783081055, -1.0429763793945312, 7.615531921386719, 44.27223205566406, 6.352058410644531, 32.602691650390625, 31.35235595703125, -12.106008529663086, 7.7532501220703125, -3.561992645263672, 28.993282318115234, -33.883392333984375, 3.1233062744140625, 5.2863311767578125, -10.167221069335938, 40.676414489746094, -18.866378784179688, -8.05923843383789, 1.51336669921875, -1.6244277954101562, 13.97137451171875, 17.80890655517578, 33.29571533203125, 14.862205505371094, 28.485937118530273, -6.783721923828125, 18.081378936767578, 3.7693557739257812, 14.234596252441406, -16.298667907714844, 23.903488159179688, 3.19879150390625, 1.7374496459960938, 20.671554565429688, 11.47161865234375, 3.4980545043945312, 0.88104248046875, 25.570938110351562], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000293.npy"}
{"epoch": 0.4429327286470144, "step": 294, "batch_size": 64, "mean": 7.752444744110107, "std": 12.087320327758789, "min": -19.632740020751953, "p10": -4.676186370849609, "median": 6.45611572265625, "p90": 21.463747787475587, "max": 43.172210693359375, "pos_frac": 0.671875, "sample": [30.437393188476562, 9.458045959472656, 8.868217468261719, 2.279327392578125, 21.708515167236328, 16.374725341796875, 17.84986114501953, 1.6367340087890625, -19.632740020751953, 6.844066619873047, 19.771133422851562, -13.684730529785156, -2.14453125, 17.346755981445312, 14.035734176635742, -4.746917724609375, 29.972042083740234, 13.284088134765625, 16.827194213867188, 20.892623901367188, -3.885772705078125, 4.8594512939453125, 5.84984016418457, 12.463645935058594, -4.511146545410156, 29.59949493408203, -9.181537628173828, 12.210987091064453, 3.2685775756835938, 43.172210693359375, 16.60052490234375, -2.618165969848633, -4.215923309326172, 19.69286346435547, -2.596294403076172, -2.98699951171875, -3.8311691284179688, -2.4446659088134766, 6.256538391113281, 15.067359924316406, -3.5006179809570312, 2.3936386108398438, 25.064451217651367, 3.5241832733154297, 16.103801727294922, 16.516754150390625, 17.126859664916992, 12.683307647705078, -0.7875137329101562, 27.16714859008789, -5.960884094238281, 14.001708984375, 18.11578369140625, 11.14077377319336, 13.380987167358398, -3.0851821899414062, -2.9555740356445312, -8.026897430419922, 2.341470718383789, -1.3828277587890625, 6.655693054199219, 0.5691375732421875, 4.144527435302734, -9.221626281738281], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000294.npy"}
{"epoch": 0.4444444444444444, "step": 295, "batch_size": 64, "mean": 6.483031749725342, "std": 11.477566719055176, "min": -22.009544372558594, "p10": -5.343497085571288, "median": 4.8948469161987305, "p90": 20.532658004760748, "max": 41.09374237060547, "pos_frac": 0.75, "sample": [19.12335205078125, 1.2645378112792969, -2.7284393310546875, -10.579635620117188, 21.136646270751953, 15.720291137695312, 16.41712188720703, 5.561817169189453, 24.857372283935547, 2.6649551391601562, 41.09374237060547, 6.872941970825195, 21.91314697265625, 6.193614959716797, 2.940906524658203, 3.679046630859375, 0.5238685607910156, 7.852088928222656, -1.136505126953125, 12.21219253540039, 11.033958435058594, 4.227876663208008, 0.7640838623046875, 8.986228942871094, 11.4412841796875, -9.81890869140625, -1.464111328125, -12.36275863647461, 11.236618041992188, 0.7857131958007812, -1.5127716064453125, -3.140411376953125, 32.92155075073242, 11.828842163085938, 7.870994567871094, -5.716346740722656, 14.522989273071289, 3.8949813842773438, 17.826114654541016, 22.325071334838867, 11.130401611328125, -22.009544372558594, 3.370025634765625, 14.162109375, 2.8187408447265625, 12.542510986328125, -4.2671356201171875, 3.1313629150390625, -11.281360626220703, 7.323942184448242, 0.6067123413085938, 2.5409679412841797, -15.643905639648438, 1.5863418579101562, -4.473514556884766, 13.68034553527832, 15.033294677734375, 13.506378173828125, -0.678802490234375, 0.587554931640625, 9.041061401367188, 11.170166015625, 33.70903778076172, -3.906719207763672], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000295.npy"}
{"epoch": 0.4459561602418745, "step": 296, "batch_size": 64, "mean": 9.517582893371582, "std": 13.01368522644043, "min": -13.796661376953125, "p10": -6.534500885009765, "median": 8.1612548828125, "p90": 29.488320922851567, "max": 33.99394989013672, "pos_frac": 0.75, "sample": [11.155349731445312, 32.32172393798828, 5.8417816162109375, 9.719415664672852, -3.9719772338867188, 8.292037963867188, 3.088703155517578, 31.117446899414062, -3.7593345642089844, 13.158138275146484, 10.040386199951172, 33.67034149169922, -7.1212005615234375, 29.91735076904297, 7.0887908935546875, 6.655548095703125, 22.40399169921875, 21.302778244018555, 5.105033874511719, 15.376579284667969, 0.1340789794921875, -5.165534973144531, 26.201148986816406, -4.17315673828125, -4.1489410400390625, -12.855926513671875, 2.7657394409179688, 8.030471801757812, 2.0870361328125, 5.033424377441406, -11.449722290039062, -9.183979034423828, -3.4449005126953125, 13.80224609375, 26.00244140625, 5.34600830078125, 0.17366790771484375, 1.1790733337402344, 21.164306640625, -8.383834838867188, 8.297004699707031, 4.6688690185546875, 33.99394989013672, -13.796661376953125, 15.465713500976562, 10.993179321289062, 14.336692810058594, 19.55021858215332, 18.710437774658203, 31.458995819091797, 13.264350891113281, -3.8956451416015625, 24.509017944335938, 11.65920639038086, 33.61034393310547, 4.6551666259765625, -10.844047546386719, 28.48725128173828, 0.37335205078125, -0.119903564453125, -0.3852653503417969, 23.254302978515625, 16.732574462890625, 19.629650115966797], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000296.npy"}
{"epoch": 0.4474678760393046, "step": 297, "batch_size": 64, "mean": 9.552496910095215, "std": 11.046802520751953, "min": -18.552188873291016, "p10": -3.22424087524414, "median": 7.19146728515625, "p90": 25.819013023376467, "max": 32.14173889160156, "pos_frac": 0.796875, "sample": [14.586494445800781, -3.5297775268554688, 7.2360687255859375, 15.728172302246094, 11.285369873046875, 25.43109893798828, 8.015018463134766, 26.66876220703125, -5.114925384521484, -6.057403564453125, 9.334281921386719, 6.952781677246094, 4.552669525146484, 11.822746276855469, 6.415079116821289, 32.04317092895508, 5.784709930419922, 15.991287231445312, 5.2037506103515625, 15.220535278320312, 5.506004333496094, -1.948822021484375, -6.53656005859375, -18.552188873291016, 2.282114028930664, 31.821983337402344, 9.763168334960938, -0.39315032958984375, 27.494171142578125, 4.940158843994141, 17.834688186645508, 32.14173889160156, 1.484954833984375, 0.27167510986328125, 6.175884246826172, 6.620197296142578, 0.7280044555664062, -0.689422607421875, 16.10519790649414, 5.8306427001953125, 31.7235107421875, 0.19933700561523438, 12.864219665527344, 25.985261917114258, 4.1884613037109375, 14.462730407714844, -8.424064636230469, -2.363759994506836, 5.2078094482421875, 15.592655181884766, 5.090721130371094, 10.46000862121582, 16.043296813964844, 14.75555419921875, -0.2252044677734375, -5.07151985168457, 22.245391845703125, 22.27716827392578, -2.511322021484375, 25.23737335205078, 7.1468658447265625, 10.576181411743164, 19.432811737060547, 18.01599884033203], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000297.npy"}
{"epoch": 0.4489795918367347, "step": 298, "batch_size": 64, "mean": 8.081270217895508, "std": 14.164616584777832, "min": -18.44855499267578, "p10": -9.895085906982422, "median": 5.8105974197387695, "p90": 29.005224609375006, "max": 39.94313049316406, "pos_frac": 0.734375, "sample": [3.51873779296875, -18.44855499267578, 35.90276336669922, 5.012237548828125, 30.32201385498047, -4.9697418212890625, -6.164461135864258, 3.7579345703125, 6.764179229736328, 5.245950698852539, 27.887447357177734, 31.917011260986328, -7.117969512939453, -2.136444091796875, -4.958366394042969, 30.083221435546875, 14.699432373046875, -9.993507385253906, 25.097537994384766, 9.275177001953125, 6.375244140625, -7.471010208129883, 17.187252044677734, 15.0362548828125, 3.81707763671875, 1.167144775390625, -17.011703491210938, 15.096729278564453, 15.466302871704102, 29.484272003173828, 2.7507400512695312, 23.26828384399414, 7.942665100097656, 21.665382385253906, 11.425760269165039, 26.18280029296875, 8.054899215698242, 21.544410705566406, 2.690868377685547, 0.21349334716796875, 4.732311248779297, -2.1444292068481445, -0.9248390197753906, 7.621955871582031, 6.859579086303711, 39.94313049316406, 23.509605407714844, -9.665435791015625, 1.9800300598144531, 7.6666717529296875, 0.08382225036621094, 1.85418701171875, -5.416236877441406, -13.26536750793457, -14.205146789550781, 17.00560760498047, -11.533111572265625, 21.968032836914062, 3.608734130859375, -15.291099548339844, 25.74847412109375, 9.731796264648438, 5.2448883056640625, 31.506690979003906], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000298.npy"}
{"epoch": 0.4504913076341648, "step": 299, "batch_size": 64, "mean": 5.8140764236450195, "std": 12.366348266601562, "min": -21.535972595214844, "p10": -8.212271881103515, "median": 5.037075042724609, "p90": 20.26955070495606, "max": 44.36228942871094, "pos_frac": 0.6875, "sample": [11.367118835449219, 6.117588043212891, -2.6416091918945312, -3.6490325927734375, 3.1820526123046875, 14.253194808959961, 1.1324081420898438, 6.512584686279297, 20.711456298828125, -1.6648426055908203, -2.799806594848633, 9.60772705078125, -21.535972595214844, 5.11224365234375, -15.022377014160156, 2.4368133544921875, 6.45233154296875, 25.000991821289062, 15.083152770996094, 4.961906433105469, 29.97458267211914, -4.978267669677734, -5.051656723022461, 13.842636108398438, -9.066543579101562, -7.9222412109375, -7.704303741455078, 2.1787872314453125, 44.36228942871094, -3.2842330932617188, 17.685806274414062, 11.973709106445312, 10.88595199584961, -21.003646850585938, 0.34682464599609375, -8.6614990234375, 10.879653930664062, 19.23843765258789, 1.8099136352539062, 3.924407958984375, 8.635093688964844, -1.92657470703125, 16.318408966064453, 10.723323822021484, 36.362091064453125, -11.61441421508789, 3.706705093383789, 25.981216430664062, 6.5333404541015625, 18.649799346923828, 6.106109619140625, 3.7394561767578125, 17.888586044311523, -2.164520263671875, -4.176116943359375, 13.308113098144531, 10.796661376953125, 5.993797302246094, 0.27097511291503906, 3.631744384765625, 6.279693603515625, -2.352031707763672, -8.336570739746094, 23.69744873046875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000299.npy"}
{"epoch": 0.4520030234315949, "step": 300, "batch_size": 64, "mean": 6.707089900970459, "std": 12.713336944580078, "min": -22.493927001953125, "p10": -9.512313461303709, "median": 7.941295623779297, "p90": 21.32620239257813, "max": 34.818519592285156, "pos_frac": 0.703125, "sample": [8.720390319824219, 13.892593383789062, 4.2620391845703125, 14.280197143554688, 0.20186614990234375, -6.932014465332031, 11.546234130859375, 7.755683898925781, -18.889057159423828, -18.628355026245117, 34.36289978027344, 21.866546630859375, 18.610469818115234, 16.21319580078125, 34.818519592285156, 9.227893829345703, 15.084793090820312, -8.010417938232422, 1.820343017578125, 17.651222229003906, 4.4210662841796875, 20.065399169921875, 8.126907348632812, 2.200468063354492, -10.155982971191406, 4.652656555175781, 9.433738708496094, 3.2227020263671875, 14.017021179199219, 11.040298461914062, -0.5030097961425781, -22.493927001953125, 14.947433471679688, -1.993194580078125, -5.769683837890625, 6.793891906738281, -0.5223731994628906, 31.990341186523438, 16.371994018554688, 16.129562377929688, 9.630859375, 2.316570281982422, -5.0206146240234375, 25.368499755859375, 9.707450866699219, -21.370590209960938, 1.03265380859375, 15.77969741821289, -2.953704833984375, 18.34998321533203, 25.97332763671875, -0.18634033203125, -3.4824371337890625, 8.18850326538086, -0.992462158203125, -13.621511459350586, 4.543617248535156, 12.419677734375, 18.067543029785156, -10.36336898803711, 21.972488403320312, 13.362052917480469, 7.059104919433594, -6.357593536376953], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000300.npy"}
{"epoch": 0.45351473922902497, "step": 301, "batch_size": 64, "mean": 7.021984577178955, "std": 12.830981254577637, "min": -17.372880935668945, "p10": -8.287487030029297, "median": 4.077674865722656, "p90": 25.66988220214844, "max": 38.92875671386719, "pos_frac": 0.6875, "sample": [22.65522003173828, 10.468742370605469, 2.0464324951171875, 1.9922332763671875, 6.912361145019531, -9.161094665527344, 28.074142456054688, 10.70339584350586, 10.066600799560547, 15.170448303222656, 0.5429611206054688, 6.430957794189453, 17.58582305908203, -7.511287689208984, -15.871925354003906, 4.1817474365234375, -6.804290771484375, 3.973602294921875, 12.192596435546875, 0.28733062744140625, -2.480712890625, 11.3321533203125, 24.991466522216797, -6.240863800048828, 17.170814514160156, 11.656288146972656, 2.3947696685791016, 26.64099884033203, 3.0631866455078125, -2.2515029907226562, -10.958961486816406, -5.850830078125, 19.25400161743164, 28.104888916015625, -1.3363265991210938, 1.2058639526367188, -7.45635986328125, 38.16209411621094, -8.62014389038086, 7.677127838134766, 0.5169486999511719, 11.745880126953125, 8.39132308959961, -17.372880935668945, -0.06678390502929688, 18.5396728515625, 10.586380004882812, 9.614532470703125, 0.067230224609375, 23.452239990234375, 2.9762744903564453, 25.7657470703125, 19.52524185180664, -2.5369911193847656, -3.7626113891601562, 11.46695327758789, 26.38323974609375, 38.92875671386719, -0.7117080688476562, -8.69158935546875, 0.6685943603515625, -11.634574890136719, 25.446197509765625, -0.2850189208984375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000301.npy"}
{"epoch": 0.455026455026455, "step": 302, "batch_size": 64, "mean": 3.7031824588775635, "std": 11.131087303161621, "min": -20.320995330810547, "p10": -7.8750671386718745, "median": 3.0600147247314453, "p90": 17.27096481323242, "max": 34.639495849609375, "pos_frac": 0.671875, "sample": [7.012866973876953, 15.345611572265625, -1.0625457763671875, 34.639495849609375, 2.2336769104003906, 12.743968963623047, 31.10150146484375, -7.087310791015625, 17.30638885498047, 4.116237640380859, 4.461761474609375, 2.7667617797851562, 7.070503234863281, 3.110187530517578, 1.2543754577636719, 22.24085235595703, 5.733676910400391, 1.2137527465820312, -2.8181533813476562, 4.8389892578125, 0.014286041259765625, -14.476181030273438, -20.025711059570312, 2.0615081787109375, -0.992584228515625, -5.641696929931641, 4.649768829345703, 22.495010375976562, -2.900989532470703, 2.3455047607421875, -7.9603118896484375, 14.1361083984375, 6.898735046386719, -2.625054359436035, -7.501407623291016, 17.188308715820312, 28.168251037597656, -13.270927429199219, 2.04681396484375, -1.6748161315917969, 9.752248764038086, 10.434497833251953, -0.48043060302734375, 6.268623352050781, 3.3221282958984375, 2.9545516967773438, -17.301074981689453, 7.2920074462890625, -0.710540771484375, 3.521209716796875, 10.750883102416992, 4.803352355957031, 10.040313720703125, -3.9291152954101562, 3.0098419189453125, -20.320995330810547, 10.051071166992188, 24.921737670898438, -7.6761627197265625, 3.3653297424316406, -14.376243591308594, 0.04205322265625, 4.868385314941406, -2.7572174072265625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000302.npy"}
{"epoch": 0.4565381708238851, "step": 303, "batch_size": 64, "mean": 9.475261688232422, "std": 13.615092277526855, "min": -16.566795349121094, "p10": -6.203190231323242, "median": 8.900638580322266, "p90": 26.64032440185547, "max": 41.399375915527344, "pos_frac": 0.75, "sample": [0.815277099609375, 15.746986389160156, 15.556289672851562, 11.56268310546875, 34.2125244140625, 26.354934692382812, 16.63190460205078, -3.9187698364257812, 14.257596969604492, 0.9474029541015625, 14.54483413696289, -6.336597442626953, 7.887989044189453, -15.07632064819336, -5.055229187011719, -9.932571411132812, -12.763626098632812, 26.76263427734375, -5.462440490722656, 3.9024200439453125, -0.49588966369628906, 15.6595458984375, 36.67218017578125, 24.39910888671875, -16.566795349121094, 5.4150390625, 29.680709838867188, -15.885414123535156, -3.0813121795654297, 2.4185791015625, 13.401247024536133, 8.125541687011719, 5.821590423583984, 0.7158927917480469, 2.6262130737304688, -1.2505912780761719, 34.550785064697266, 2.86474609375, 21.91905975341797, 17.035888671875, 8.810234069824219, 16.68852996826172, 12.970630645751953, 17.613174438476562, -0.932342529296875, 11.536636352539062, 41.399375915527344, 9.159103393554688, 9.563018798828125, 15.638984680175781, -5.89190673828125, 6.594337463378906, -1.7977447509765625, 2.783538818359375, 23.917869567871094, 17.593597412109375, 3.185333251953125, 3.866455078125, 24.39940643310547, 8.991043090820312, 37.85020065307617, -11.827842712402344, 18.669219970703125, 20.971847534179688], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000303.npy"}
{"epoch": 0.4580498866213152, "step": 304, "batch_size": 64, "mean": 8.366886138916016, "std": 11.184070587158203, "min": -18.350181579589844, "p10": -1.9495513916015625, "median": 5.627045631408691, "p90": 25.31910438537599, "max": 34.85900115966797, "pos_frac": 0.84375, "sample": [27.37039566040039, 0.8651580810546875, 4.688323974609375, 28.50732421875, 0.1214599609375, 13.567996978759766, 4.463134765625, 7.794530868530273, -4.846044540405273, 0.8271980285644531, 5.3162841796875, 18.6055908203125, 12.779342651367188, -1.9496612548828125, 18.01451301574707, 4.439735412597656, -12.139671325683594, 8.05697250366211, 34.85900115966797, -16.410430908203125, 3.7824554443359375, 6.477256774902344, 4.393463134765625, 2.311586380004883, -0.13866424560546875, -9.491241455078125, 5.937807083129883, 7.9111480712890625, 4.985198974609375, 26.571487426757812, 20.451187133789062, -1.7576103210449219, 22.101348876953125, 9.338863372802734, 4.437889099121094, -18.350181579589844, 9.112083435058594, 1.5390090942382812, 6.102184295654297, 3.9144134521484375, 7.56292724609375, -3.267742156982422, 22.031723022460938, 11.543701171875, 7.754398345947266, 0.9092330932617188, 1.9671955108642578, 17.020156860351562, 17.633529663085938, 22.39687728881836, 4.943412780761719, 3.0657196044921875, 28.901016235351562, 2.0119895935058594, 18.134788513183594, -1.9492950439453125, 18.058353424072266, 28.52989959716797, 28.848129272460938, 0.20220184326171875, 1.9154815673828125, 2.6789932250976562, 10.265625, 19.76153564453125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000304.npy"}
{"epoch": 0.4595616024187453, "step": 305, "batch_size": 64, "mean": 7.849347114562988, "std": 13.399735450744629, "min": -28.050018310546875, "p10": -7.893338394165038, "median": 4.629734039306641, "p90": 26.01943511962891, "max": 39.60063934326172, "pos_frac": 0.734375, "sample": [27.2166748046875, -3.94403076171875, 1.9804153442382812, -2.3436813354492188, 1.3940200805664062, 2.332813262939453, 28.418907165527344, -15.919193267822266, 17.979721069335938, 25.018386840820312, 1.7726936340332031, 21.869117736816406, -7.526123046875, 19.532123565673828, -5.1653594970703125, -10.959766387939453, -8.050716400146484, 1.8694686889648438, 1.5347442626953125, 29.60369110107422, 2.43414306640625, 17.944154739379883, 13.463760375976562, 14.160964965820312, -2.4430618286132812, 9.296112060546875, 17.98196792602539, -1.2001571655273438, 4.57275390625, -28.050018310546875, 4.686714172363281, 10.030136108398438, 24.54452896118164, 18.764022827148438, -10.982223510742188, 1.9193229675292969, 39.60063934326172, 2.691009521484375, 2.5625839233398438, 18.171493530273438, 1.2291488647460938, 14.105804443359375, 26.448455810546875, 29.794384002685547, 10.983715057373047, -11.016349792480469, 1.591583251953125, -13.453842163085938, 18.5859375, 20.730026245117188, 4.200370788574219, 18.47692108154297, 7.07452392578125, 34.53891372680664, 10.152030944824219, 11.91363525390625, -0.7417678833007812, 6.210014343261719, -6.924224853515625, 20.975780487060547, -1.0488815307617188, 1.1824760437011719, 14.474517822265625, -3.8877182006835938], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000305.npy"}
{"epoch": 0.46107331821617537, "step": 306, "batch_size": 64, "mean": 7.2089385986328125, "std": 11.463809967041016, "min": -18.512855529785156, "p10": -5.274249267578124, "median": 6.27239990234375, "p90": 19.730225372314454, "max": 44.80760192871094, "pos_frac": 0.75, "sample": [26.381027221679688, 1.7446212768554688, 0.5493812561035156, 19.354999542236328, 2.2130355834960938, 9.13980484008789, 4.788124084472656, 3.8651771545410156, 8.746110916137695, 10.687446594238281, 18.884281158447266, 12.690711975097656, -18.512855529785156, -0.5982818603515625, 1.5094642639160156, -8.604488372802734, 17.78192138671875, 6.6347503662109375, 12.998069763183594, 7.429103851318359, 10.594497680664062, 22.164669036865234, 13.488700866699219, -2.058795928955078, 13.536077499389648, -1.0120010375976562, -11.134782791137695, 7.310890197753906, 4.2043914794921875, 2.2200469970703125, -4.9321746826171875, 23.36388397216797, -2.0478954315185547, 3.769023895263672, 12.512046813964844, 6.131744384765625, 24.71147918701172, 38.562896728515625, 0.781219482421875, -1.8933448791503906, -2.3881492614746094, 2.6386985778808594, 7.610599517822266, 0.8206405639648438, -11.231233596801758, 4.196861267089844, 2.2371292114257812, 14.729751586914062, 2.2900733947753906, 17.93941879272461, 18.23291778564453, 12.461288452148438, -5.4208526611328125, -1.3360671997070312, 19.544891357421875, 9.876108169555664, 10.410018920898438, -9.221328735351562, 19.809654235839844, 44.80760192871094, -11.953811645507812, 14.5789794921875, 6.413055419921875, -3.629153251647949], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000306.npy"}
{"epoch": 0.46258503401360546, "step": 307, "batch_size": 64, "mean": 5.781991958618164, "std": 11.829102516174316, "min": -13.30279541015625, "p10": -7.9413707733154295, "median": 4.777616500854492, "p90": 22.609509658813483, "max": 38.70036315917969, "pos_frac": 0.59375, "sample": [12.172470092773438, -5.210014343261719, 10.80221176147461, 4.469097137451172, -5.090484619140625, -0.8314666748046875, 38.70036315917969, 0.5814971923828125, 20.508438110351562, 23.33283233642578, -1.8950347900390625, 7.48944091796875, 11.498184204101562, 19.314559936523438, -12.75885009765625, 8.5048828125, 6.9181671142578125, -1.7151718139648438, 14.351579666137695, -2.6958694458007812, 5.0861358642578125, -2.7609634399414062, -5.476043701171875, -9.566070556640625, 2.682037353515625, -2.7111473083496094, -2.7394943237304688, 5.990680694580078, 18.961517333984375, -1.7163333892822266, -4.126976013183594, -0.970794677734375, 1.477813720703125, -8.323822021484375, -8.165287017822266, 11.859954833984375, -5.913848876953125, 6.444267272949219, 13.926708221435547, -12.722732543945312, 7.868843078613281, 1.4437255859375, 2.334259033203125, 26.258865356445312, 27.316566467285156, -10.033981323242188, 5.560649871826172, 9.106414794921875, 34.01239013671875, 24.13043975830078, -13.30279541015625, 24.698047637939453, 14.772994995117188, -0.15552330017089844, 7.590553283691406, 10.444986343383789, -0.66119384765625, 16.654621124267578, 11.429733276367188, -7.4188995361328125, -5.251260757446289, 18.264209747314453, 20.921756744384766, -5.620368957519531], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000307.npy"}
{"epoch": 0.46409674981103555, "step": 308, "batch_size": 64, "mean": 7.299023628234863, "std": 12.09521484375, "min": -18.20849609375, "p10": -7.477244567871093, "median": 7.197292327880859, "p90": 22.185910797119142, "max": 36.85749816894531, "pos_frac": 0.703125, "sample": [25.903209686279297, 9.693687438964844, 2.332611083984375, -3.9628868103027344, 17.342079162597656, 15.9765625, 0.019193649291992188, 0.492523193359375, 19.251251220703125, -12.752120971679688, 36.85749816894531, 14.072891235351562, 25.274124145507812, 10.62033462524414, -4.3015899658203125, -7.7880706787109375, 8.458908081054688, -0.8481636047363281, 20.236854553222656, 11.406112670898438, 22.197357177734375, -8.748424530029297, 18.37390899658203, 11.618659973144531, 10.570762634277344, 18.354248046875, -6.751983642578125, 19.649333953857422, -6.577484130859375, -13.009918212890625, 22.159202575683594, 4.56005859375, 8.993404388427734, -0.5458908081054688, 11.115829467773438, -3.6383209228515625, 12.64862060546875, -0.99456787109375, 4.229545593261719, -18.20849609375, -1.0316085815429688, 3.858837127685547, 11.119842529296875, 23.551849365234375, 3.0566558837890625, 7.113182067871094, 8.627822875976562, -5.5840606689453125, 4.3543853759765625, 7.312400817871094, 25.502182006835938, 2.64697265625, -3.1157684326171875, 18.38707733154297, 36.263031005859375, -1.402587890625, 21.747093200683594, -18.060321807861328, 16.885583877563477, 0.6369895935058594, 4.895639419555664, -7.8789215087890625, 6.688968658447266, 7.281402587890625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000308.npy"}
{"epoch": 0.4656084656084656, "step": 309, "batch_size": 64, "mean": 7.487186431884766, "std": 10.00122356414795, "min": -11.38482666015625, "p10": -4.651642227172851, "median": 4.5406036376953125, "p90": 19.564112281799318, "max": 35.77203369140625, "pos_frac": 0.78125, "sample": [11.799613952636719, 35.77203369140625, 7.640592575073242, 29.734764099121094, 16.978187561035156, -5.46783447265625, 18.212650299072266, 2.87335205078125, 3.2226181030273438, -2.3615875244140625, 11.031082153320312, -2.63433837890625, -4.173919677734375, 2.4474525451660156, 3.28277587890625, -10.732048034667969, 3.4194488525390625, 13.89229965209961, -6.2604522705078125, 12.911642074584961, 6.308483123779297, 20.382362365722656, 26.17215347290039, 0.6437454223632812, 18.840370178222656, 13.158061981201172, -5.077461242675781, 2.568103790283203, 4.457977294921875, 0.12935638427734375, -0.3258228302001953, -0.09747314453125, 1.216796875, 11.011878967285156, 7.481483459472656, 3.3190155029296875, 2.6403121948242188, 10.549781799316406, -4.856380462646484, 23.103622436523438, 0.8648834228515625, 9.943038940429688, 19.16465187072754, 15.20742416381836, 18.22045135498047, -11.38482666015625, 16.824798583984375, 9.439903259277344, 1.1287956237792969, 19.735309600830078, -1.4000511169433594, 1.8614654541015625, 1.2512283325195312, 15.608501434326172, 30.877731323242188, 4.62322998046875, 8.654792785644531, 4.1630706787109375, -4.964080810546875, -1.15771484375, 13.040283203125, 3.2324600219726562, 8.194278717041016, 12.835594177246094], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000309.npy"}
{"epoch": 0.4671201814058957, "step": 310, "batch_size": 64, "mean": 3.2906010150909424, "std": 11.702202796936035, "min": -18.30725860595703, "p10": -12.673817443847657, "median": 2.5693702697753906, "p90": 20.582329559326173, "max": 28.647659301757812, "pos_frac": 0.640625, "sample": [23.512542724609375, 0.1643829345703125, 13.43255615234375, 12.665891647338867, 9.842056274414062, -12.3760986328125, -7.706642150878906, 1.9400711059570312, 20.85375213623047, -2.3813705444335938, -2.6284027099609375, -12.497322082519531, 0.7458744049072266, 18.061294555664062, -3.89300537109375, 0.5360069274902344, -1.3011283874511719, 8.6904296875, 22.670326232910156, -13.932548522949219, 7.146282196044922, -2.1002464294433594, -12.749458312988281, 16.5283203125, -18.30725860595703, -6.663120269775391, 3.1094131469726562, -8.452739715576172, -5.93695068359375, 14.44622802734375, -8.570480346679688, 2.3788528442382812, 1.8213348388671875, 22.550430297851562, 4.9956207275390625, 10.159637451171875, 8.868999481201172, 11.278694152832031, 5.899330139160156, 3.7924118041992188, -4.72222900390625, 5.887176513671875, -7.05230712890625, -18.28087615966797, 6.481178283691406, 0.5113296508789062, -9.146995544433594, 10.929397583007812, 2.7598876953125, -14.200469970703125, 10.082595825195312, 28.647659301757812, 17.322586059570312, -16.591304779052734, 3.143321990966797, 2.1391067504882812, 0.274993896484375, 20.68260955810547, 20.348342895507812, -4.563026428222656, -15.698898315429688, 24.391250610351562, 6.864717483520508, 13.794452667236328], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000310.npy"}
{"epoch": 0.46863189720332576, "step": 311, "batch_size": 64, "mean": 6.807516098022461, "std": 11.958358764648438, "min": -14.731430053710938, "p10": -6.863223266601563, "median": 6.795684814453125, "p90": 20.858007049560563, "max": 50.20184326171875, "pos_frac": 0.71875, "sample": [15.103595733642578, 1.0585556030273438, 13.112968444824219, -13.544357299804688, -4.7129364013671875, -2.859394073486328, 10.74002456665039, 10.974296569824219, -4.628353118896484, 12.88836669921875, 10.24993896484375, -11.472488403320312, 6.5720672607421875, 6.369235992431641, 3.4716644287109375, 0.24509239196777344, 9.680938720703125, 4.276824951171875, 13.661941528320312, 7.720130920410156, -6.939971923828125, 16.4910888671875, 1.7717723846435547, 2.5012283325195312, 7.121177673339844, 0.5828056335449219, -14.731430053710938, -3.422039031982422, -1.83251953125, 7.0193023681640625, -3.690277099609375, 23.438705444335938, 41.330108642578125, -0.20745277404785156, 9.663108825683594, 22.334869384765625, 15.482948303222656, 7.824249267578125, 9.211029052734375, 14.389907836914062, 50.20184326171875, 7.4919586181640625, -4.8103485107421875, 29.357894897460938, -6.68414306640625, -3.0096435546875, 5.8501129150390625, 0.30767822265625, 9.548782348632812, 11.987083435058594, 28.5916748046875, 7.435394287109375, -0.0346832275390625, -9.398574829101562, 5.464481353759766, 13.025918960571289, 1.7960491180419922, -6.9764404296875, -8.310836791992188, 4.472320556640625, 7.562309265136719, 26.769058227539062, 17.41199493408203, 10.384414672851562], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000311.npy"}
{"epoch": 0.47014361300075586, "step": 312, "batch_size": 64, "mean": 8.344015121459961, "std": 11.919751167297363, "min": -12.229522705078125, "p10": -4.924909973144531, "median": 5.461408615112305, "p90": 24.169962692260746, "max": 38.301605224609375, "pos_frac": 0.734375, "sample": [11.200359344482422, 9.55419921875, -9.668182373046875, 0.27361297607421875, 8.530586242675781, 8.817771911621094, -0.000522613525390625, 2.9452590942382812, 0.11627197265625, 3.5987396240234375, 34.198631286621094, 9.79888916015625, 6.314323425292969, 5.2482452392578125, 3.817913055419922, 16.46746826171875, -4.051017761230469, 3.1082115173339844, -2.5648880004882812, 17.423126220703125, 32.98127746582031, 4.123077392578125, -7.175083160400391, 6.522727966308594, 0.698211669921875, 24.478633880615234, -5.3721160888671875, -4.07965087890625, 7.0246124267578125, 14.676105499267578, 34.36488342285156, -0.7562332153320312, 21.428237915039062, 32.10628890991211, -12.229522705078125, 3.691699981689453, 23.449729919433594, 3.3232574462890625, 20.313003540039062, 28.3328857421875, 38.301605224609375, 7.635143280029297, -4.40252685546875, 1.754119873046875, 17.336997985839844, -2.7035903930664062, -1.4357070922851562, 20.765453338623047, 23.44647979736328, -0.5475997924804688, -5.946876525878906, 3.965717315673828, 20.911876678466797, 11.553443908691406, -5.1487884521484375, 12.747001647949219, 13.80609130859375, 7.208034515380859, -5.633726119995117, 20.809112548828125, 5.674571990966797, 1.3659095764160156, -0.7667999267578125, 0.28997802734375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000312.npy"}
{"epoch": 0.47165532879818595, "step": 313, "batch_size": 64, "mean": 5.130142688751221, "std": 12.67215633392334, "min": -17.50989532470703, "p10": -10.917073440551757, "median": 3.4780216217041016, "p90": 22.476602935791018, "max": 40.92071533203125, "pos_frac": 0.671875, "sample": [2.356708526611328, 8.829177856445312, 11.407238006591797, 0.5074882507324219, 4.560813903808594, 21.208984375, 11.43136215209961, 8.77351188659668, 7.83990478515625, -16.015411376953125, 2.209644317626953, 2.5825538635253906, 8.753662109375, 5.187652587890625, 3.0707931518554688, -3.17254638671875, -0.3196868896484375, -3.990875244140625, -11.865264892578125, -2.636249542236328, 4.4017333984375, 9.513004302978516, 0.44705963134765625, 31.513275146484375, -17.50989532470703, -14.329277038574219, 9.393211364746094, 13.4290771484375, 10.184471130371094, 3.8690338134765625, 35.74596405029297, 24.25688934326172, -8.585418701171875, -4.926551818847656, 27.849342346191406, 15.849960327148438, 7.268033027648926, -11.392986297607422, -4.770637512207031, 26.817718505859375, 3.4507064819335938, 0.007877349853515625, 10.545562744140625, -9.806610107421875, 40.92071533203125, 4.116737365722656, 2.6774749755859375, -6.578590393066406, 5.0048370361328125, 2.465190887451172, 22.76439666748047, -1.5767440795898438, 0.14514923095703125, -4.597412109375, -2.8339309692382812, -15.859695434570312, 21.805084228515625, 3.5053367614746094, 21.223873138427734, -4.459136962890625, -15.130752563476562, -1.11248779296875, 11.24697494506836, 20.661113739013672], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000313.npy"}
{"epoch": 0.47316704459561604, "step": 314, "batch_size": 64, "mean": 7.6744208335876465, "std": 11.43148136138916, "min": -14.859634399414062, "p10": -4.266556739807128, "median": 5.177586555480957, "p90": 23.932749938964847, "max": 32.03771209716797, "pos_frac": 0.671875, "sample": [9.900337219238281, 0.73150634765625, 4.483787536621094, 27.657981872558594, -3.6882972717285156, 17.494049072265625, -0.5966262817382812, 22.206459045410156, 1.1724700927734375, 9.597675323486328, 0.9344024658203125, -2.5913333892822266, 30.429031372070312, 7.2845916748046875, 3.190044403076172, 22.2479248046875, 11.022136688232422, 22.333776473999023, 26.25646209716797, 8.32354736328125, 1.7927093505859375, -2.2869110107421875, 19.502178192138672, 32.03771209716797, 23.513931274414062, -10.854660034179688, -0.25627899169921875, 24.11224365234375, 5.9189300537109375, -6.94122314453125, 10.061859130859375, 8.3599853515625, 4.360393524169922, -0.7432937622070312, 1.4959373474121094, 5.87138557434082, 10.765289306640625, 7.38531494140625, -4.491607666015625, 10.874156951904297, -0.5477504730224609, -14.859634399414062, 27.818832397460938, 18.68292999267578, 12.126911163330078, -4.6461181640625, 11.251583099365234, 22.69316864013672, -2.7297897338867188, 19.94605255126953, 4.35858154296875, 28.748443603515625, -0.150909423828125, -1.4682083129882812, 1.8518142700195312, -5.7091064453125, -9.155754089355469, 9.889549255371094, -2.912220001220703, -2.8267288208007812, 4.224916458129883, -3.7414379119873047, -2.977275848388672, 22.42711639404297], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000314.npy"}
{"epoch": 0.47467876039304613, "step": 315, "batch_size": 64, "mean": 6.823853492736816, "std": 10.370864868164062, "min": -13.532512664794922, "p10": -6.651550292968749, "median": 5.639106750488281, "p90": 21.889068603515625, "max": 33.94661331176758, "pos_frac": 0.765625, "sample": [-9.056221008300781, 7.8380889892578125, 11.177082061767578, -0.8365745544433594, 0.35724735260009766, 7.902687072753906, -9.066864013671875, -3.409914016723633, 0.11383819580078125, 18.809967041015625, -9.213951110839844, 10.871402740478516, 7.0834808349609375, 14.693862915039062, -1.1272430419921875, 6.243003845214844, 0.18155670166015625, -3.163787841796875, 1.0176773071289062, 4.772361755371094, 4.190822601318359, 10.19293212890625, 6.399982452392578, 3.210906982421875, 0.8838043212890625, 4.725135803222656, 21.911407470703125, -6.847564697265625, -4.971683502197266, 5.242866516113281, 1.968048095703125, 0.4989013671875, 15.307670593261719, 7.99176025390625, 2.7214508056640625, 24.585983276367188, 15.121315002441406, -6.194183349609375, -8.656303405761719, 14.815673828125, 21.188766479492188, 3.5161285400390625, 12.186447143554688, 24.995534896850586, 3.5500640869140625, 10.721694946289062, 9.707168579101562, 33.94661331176758, -13.532512664794922, 6.035346984863281, 24.26941680908203, -1.7040901184082031, 15.84783935546875, 0.9565048217773438, 9.520599365234375, 8.019287109375, 18.939847946166992, 10.9986572265625, 21.836944580078125, -6.967212677001953, 23.58575439453125, -2.256448745727539, 29.760116577148438, 3.3175506591796875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000315.npy"}
{"epoch": 0.47619047619047616, "step": 316, "batch_size": 64, "mean": 7.313711166381836, "std": 9.8513822555542, "min": -9.92473030090332, "p10": -5.855305480957031, "median": 6.729211807250977, "p90": 22.98333778381348, "max": 29.295166015625, "pos_frac": 0.75, "sample": [-7.302650451660156, 5.1336822509765625, 11.136947631835938, -3.6564865112304688, -5.850856781005859, -6.563774108886719, 15.89544677734375, 4.600830078125, 3.7686767578125, 12.913619995117188, 19.964088439941406, 29.295166015625, 11.759941101074219, 25.303619384765625, 23.390167236328125, 0.1499176025390625, -6.974353790283203, 9.109382629394531, -9.92473030090332, 18.467422485351562, 5.663295745849609, 16.731048583984375, 6.169200897216797, 5.0921630859375, -5.857212066650391, 7.289222717285156, 7.9460601806640625, -2.7063446044921875, -0.9772109985351562, 9.653465270996094, 14.65432357788086, 20.504531860351562, 0.39719390869140625, 23.457763671875, 1.0336761474609375, -0.7236518859863281, -2.2308807373046875, 8.143608093261719, 2.63885498046875, 3.8203887939453125, 13.779449462890625, 8.712165832519531, 9.778980255126953, -1.533721923828125, 11.503707885742188, 2.1719741821289062, 0.8490352630615234, 11.389877319335938, -8.359573364257812, 22.166595458984375, -1.278656005859375, -6.630727767944336, 1.954010009765625, -5.573759078979492, 10.377300262451172, 9.92822265625, 3.668487548828125, 28.441848754882812, 9.638031005859375, 12.014488220214844, 23.84149932861328, 4.1536102294921875, 23.333370208740234, 12.43572998046875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000316.npy"}
{"epoch": 0.47770219198790626, "step": 317, "batch_size": 64, "mean": 8.168839454650879, "std": 12.060746192932129, "min": -31.99249267578125, "p10": -2.475580596923828, "median": 6.520515441894531, "p90": 23.20283546447754, "max": 39.00725555419922, "pos_frac": 0.75, "sample": [6.2761383056640625, 3.3351364135742188, 4.238803863525391, 7.78692626953125, 8.321182250976562, 39.00725555419922, -0.4053916931152344, 3.092742919921875, -0.824951171875, -0.5891342163085938, -1.8023147583007812, -0.6486358642578125, 0.642242431640625, 2.73858642578125, 8.376983642578125, 1.8665275573730469, 22.314743041992188, 5.7514801025390625, 21.91295623779297, 18.640087127685547, -16.315765380859375, -2.2737655639648438, 12.714603424072266, -0.3434906005859375, 15.836498260498047, 4.803779602050781, 6.4774322509765625, 10.065061569213867, 8.073234558105469, -31.99249267578125, 25.24152374267578, -5.0335845947265625, 12.779731750488281, 31.308008193969727, 32.403343200683594, -11.504417419433594, 12.925430297851562, -0.5579605102539062, 11.475095748901367, 23.583446502685547, -0.29541015625, 6.5635986328125, 31.10039520263672, -3.2035675048828125, 1.0824966430664062, 17.52113151550293, 9.776176452636719, 21.600563049316406, 10.616432189941406, 0.8720550537109375, 12.598785400390625, 17.548553466796875, 4.142610549926758, 16.340049743652344, -7.465599060058594, 5.683326721191406, -2.56207275390625, 13.7327880859375, 16.723281860351562, 29.243745803833008, 1.0690116882324219, 18.127685546875, 1.4298095703125, 10.86279296875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000317.npy"}
{"epoch": 0.47921390778533635, "step": 318, "batch_size": 64, "mean": 6.111828327178955, "std": 12.366908073425293, "min": -19.749710083007812, "p10": -7.472608280181884, "median": 4.280055046081543, "p90": 24.68086395263672, "max": 34.342437744140625, "pos_frac": 0.65625, "sample": [9.778793334960938, 20.25666046142578, -1.2633209228515625, -11.271621704101562, 28.723663330078125, 8.146007537841797, 2.5803375244140625, -19.749710083007812, 14.191864013671875, 22.674217224121094, 2.0409202575683594, -0.021209716796875, 6.6038055419921875, 9.983543395996094, 28.596662521362305, 1.9061737060546875, 11.285797119140625, -6.744678497314453, 9.904207229614258, 4.370204925537109, -5.295892715454102, 8.322736740112305, 24.879066467285156, -0.5229339599609375, 0.08349990844726562, 8.377792358398438, 24.21839141845703, -13.886177062988281, 16.89849853515625, 12.307144165039062, -6.412803649902344, 11.660564422607422, -11.43673324584961, -0.10208892822265625, 1.554779052734375, 26.892120361328125, 34.342437744140625, -1.9835357666015625, 25.358596801757812, 7.073432922363281, 1.87371826171875, 1.1545867919921875, 16.572830200195312, -3.0490188598632812, -3.9586410522460938, 3.600799560546875, 2.5694732666015625, 16.245323181152344, -1.7165374755859375, 13.952613830566406, -0.670079231262207, 30.730575561523438, 5.637775421142578, 16.720001220703125, -11.90267562866211, -6.104898452758789, 6.3707275390625, -7.7194976806640625, 8.201858520507812, -3.7897720336914062, 23.49981117248535, -18.676532745361328, -6.896533012390137, 4.189905166625977], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000318.npy"}
{"epoch": 0.48072562358276644, "step": 319, "batch_size": 64, "mean": 7.070160865783691, "std": 11.169227600097656, "min": -20.79296112060547, "p10": -5.3674270629882805, "median": 5.955537796020508, "p90": 20.35182304382324, "max": 40.725013732910156, "pos_frac": 0.734375, "sample": [15.31587028503418, 6.675285339355469, -4.231048583984375, 21.092010498046875, 11.004627227783203, 10.053909301757812, 11.890176773071289, 1.7572479248046875, -1.9625396728515625, 5.909782409667969, 16.55429458618164, 20.14757537841797, 0.4464607238769531, 7.2091217041015625, 10.631637573242188, 12.170440673828125, -8.033531188964844, 40.725013732910156, 2.2425804138183594, 25.632762908935547, 9.962663650512695, -0.322418212890625, 1.8692779541015625, 13.452499389648438, 19.629684448242188, -3.550273895263672, 20.43935775756836, 6.001293182373047, 16.87918472290039, 14.730178833007812, -9.452110290527344, -3.1543502807617188, 6.764110565185547, 30.27105712890625, -10.03421401977539, 5.825977325439453, -8.563766479492188, 9.390541076660156, 4.5915069580078125, -2.1538352966308594, 6.876766204833984, 3.7731361389160156, 5.320514678955078, 3.75653076171875, 11.805244445800781, -2.0184898376464844, -5.8544464111328125, -1.3608932495117188, -13.061588287353516, 17.098697662353516, 1.941793441772461, 1.4167633056640625, 1.0771636962890625, -0.252166748046875, 3.94061279296875, 20.086647033691406, 9.647928237915039, 8.02058219909668, 26.12009620666504, 14.26900863647461, -1.723358154296875, -20.79296112060547, 31.214637756347656, 3.3800430297851562], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000319.npy"}
{"epoch": 0.48223733938019653, "step": 320, "batch_size": 64, "mean": 4.156286239624023, "std": 11.171575546264648, "min": -17.478435516357422, "p10": -11.392533874511717, "median": 2.85646915435791, "p90": 19.816136932373052, "max": 38.96088409423828, "pos_frac": 0.6875, "sample": [8.208560943603516, -16.7120361328125, 2.856630325317383, 10.425796508789062, 5.396881103515625, -2.5429859161376953, 20.201805114746094, -0.832977294921875, 6.609214782714844, 1.0520391464233398, 10.27044677734375, 3.1164016723632812, -2.430307388305664, 1.4938430786132812, -12.526336669921875, 20.8118896484375, 0.321502685546875, -14.765769958496094, -3.7142868041992188, 7.051036834716797, -7.672550201416016, 12.11385726928711, -13.122711181640625, 6.992677688598633, 0.6667938232421875, 12.845245361328125, 38.96088409423828, 14.365768432617188, 1.2614326477050781, -17.19781494140625, 1.941497802734375, -13.172581672668457, 10.096508026123047, 6.181427001953125, 2.8305301666259766, 18.916244506835938, 3.3983116149902344, -5.719642639160156, -8.746994018554688, 4.117984771728516, 8.688613891601562, 25.470531463623047, 0.8518486022949219, -3.0658187866210938, 11.722663879394531, 2.8563079833984375, 22.634414672851562, 10.066478729248047, -2.591033935546875, 0.5807685852050781, 2.198953628540039, -17.478435516357422, 23.26800537109375, -5.1734161376953125, -2.175994873046875, 25.456253051757812, -0.6379852294921875, 12.14017105102539, 13.293701171875, 2.043426513671875, 10.040508270263672, -6.3423309326171875, 13.402908325195312, 5.403576850891113], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000320.npy"}
{"epoch": 0.4837490551776266, "step": 321, "batch_size": 64, "mean": 6.082797050476074, "std": 11.235809326171875, "min": -20.56766128540039, "p10": -8.48347473144531, "median": 5.506199836730957, "p90": 22.43231086730957, "max": 30.630233764648438, "pos_frac": 0.75, "sample": [5.6865386962890625, 12.772205352783203, 1.182525634765625, 13.89349365234375, 3.4327545166015625, -4.8833160400390625, 23.66824722290039, 13.561412811279297, 22.138362884521484, 0.600459098815918, 29.507274627685547, 1.5028076171875, 3.668437957763672, 15.880081176757812, -13.21786880493164, 1.6856803894042969, 1.393890380859375, 7.5807037353515625, 9.547920227050781, 17.638900756835938, 5.722755432128906, 7.153530120849609, -20.56766128540039, -9.5074462890625, 5.022377014160156, -12.805747985839844, -9.64886474609375, -6.094207763671875, 9.716976165771484, -0.6030426025390625, 5.425071716308594, 8.34017562866211, 12.630699157714844, 2.7237892150878906, 5.58732795715332, 30.630233764648438, 9.160934448242188, -0.6319656372070312, 6.417625427246094, 12.648483276367188, -2.1929931640625, 1.131927490234375, -2.7401390075683594, 7.783729553222656, 5.142890930175781, -14.644821166992188, 18.435508728027344, 27.311294555664062, 15.440162658691406, 16.34711456298828, 3.9380111694335938, 7.041011810302734, 6.544288635253906, 28.19095802307129, 22.55828857421875, -4.5121002197265625, 1.9075927734375, 6.1533355712890625, 30.27043342590332, -11.258464813232422, 0.717254638671875, -2.1672286987304688, 4.613105773925781, -5.273719787597656], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000321.npy"}
{"epoch": 0.4852607709750567, "step": 322, "batch_size": 64, "mean": 8.472402572631836, "std": 10.31460952758789, "min": -22.940139770507812, "p10": -2.5159732818603513, "median": 7.584041595458984, "p90": 21.863336944580084, "max": 32.68965148925781, "pos_frac": 0.8125, "sample": [-2.3120994567871094, 20.13001251220703, 20.108596801757812, -1.7809906005859375, 8.370758056640625, 5.228801727294922, 12.675064086914062, 12.551240921020508, 20.17235565185547, 6.721599578857422, -2.6033477783203125, 12.075973510742188, 24.11452865600586, 28.330101013183594, -0.61102294921875, 6.8877105712890625, 8.413818359375, 14.678817749023438, 1.3895721435546875, -1.609161376953125, 1.3765983581542969, 6.009010314941406, -5.225669860839844, 7.15730094909668, 11.923530578613281, 8.707019805908203, 2.5116825103759766, 6.761207580566406, 2.087627410888672, -8.520950317382812, -4.575897216796875, 8.987197875976562, 7.394416809082031, 13.144622802734375, 31.79894256591797, 4.914813995361328, 4.4530181884765625, 32.68965148925781, 2.4362945556640625, 3.347015380859375, 12.478958129882812, 25.230072021484375, 3.926717758178711, 19.070602416992188, 3.463642120361328, 22.588043212890625, 19.390045166015625, 11.759389877319336, -11.079879760742188, -22.940139770507812, 7.939567565917969, 11.121566772460938, 4.665233612060547, 4.616674423217773, 19.320449829101562, 7.9484100341796875, 13.080101013183594, 4.5017242431640625, 11.033447265625, -3.5326461791992188, 30.1512451171875, -0.0766448974609375, 9.493743896484375, 7.7736663818359375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000322.npy"}
{"epoch": 0.48677248677248675, "step": 323, "batch_size": 64, "mean": 7.173666954040527, "std": 14.000260353088379, "min": -20.407958984375, "p10": -6.653776550292967, "median": 7.154499053955078, "p90": 29.673927688598635, "max": 43.56275939941406, "pos_frac": 0.6875, "sample": [8.366840362548828, -0.69720458984375, 4.246307373046875, 1.0004730224609375, 6.138725280761719, 22.741928100585938, 8.993484497070312, 10.47978401184082, 8.465690612792969, 29.96218490600586, 3.971771240234375, -5.176719665527344, 24.589248657226562, 2.4565582275390625, 42.41575622558594, -11.049442291259766, -19.49645233154297, 7.06634521484375, 11.868492126464844, -5.4753570556640625, 10.347412109375, 7.512451171875, 8.892745971679688, 7.338218688964844, 32.08778381347656, -5.259218215942383, -7.1588134765625, 9.996604919433594, 2.871723175048828, -10.69915771484375, 43.56275939941406, 7.365875244140625, -3.0919723510742188, -4.911266326904297, 29.001327514648438, 2.6053466796875, -11.4185791015625, 11.795951843261719, 13.964752197265625, 0.03420448303222656, 7.7513580322265625, 7.242652893066406, -1.6891937255859375, -0.45980072021484375, 21.859580993652344, 39.341888427734375, 14.031570434570312, -20.407958984375, 35.079742431640625, 0.4502983093261719, -19.11199951171875, -5.462272644042969, 8.423900604248047, 4.465187072753906, -2.30084228515625, 13.576709747314453, 31.185333251953125, 10.213150024414062, 9.108718872070312, -0.7426338195800781, -0.8278045654296875, -2.147838592529297, 9.17462158203125, 4.6537933349609375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000323.npy"}
{"epoch": 0.48828420256991684, "step": 324, "batch_size": 64, "mean": 6.991987705230713, "std": 10.854487419128418, "min": -20.09333038330078, "p10": -4.48103675842285, "median": 6.185583114624023, "p90": 21.33690185546875, "max": 34.918121337890625, "pos_frac": 0.765625, "sample": [21.35704803466797, 15.419578552246094, 1.071197509765625, -0.987762451171875, -15.449760437011719, 18.67253875732422, 3.913456916809082, 14.364578247070312, -3.1956214904785156, -5.031929016113281, 21.289894104003906, 26.813220977783203, 4.9777679443359375, 5.814949035644531, 8.855175018310547, 4.6733856201171875, 12.866552352905273, -9.846790313720703, 2.267669677734375, -9.447357177734375, 13.776634216308594, -10.395294189453125, 4.830013275146484, 22.549461364746094, 7.061756134033203, -0.5982780456542969, -20.09333038330078, 5.411754608154297, 25.69110107421875, 8.475173950195312, 6.848140716552734, -11.718170166015625, 13.524158477783203, 7.983436584472656, 0.37010955810546875, 14.111705780029297, 15.047309875488281, 3.821563720703125, 7.731700897216797, 8.924201965332031, 8.222076416015625, 4.131813049316406, 31.874465942382812, -0.6204795837402344, 34.918121337890625, 1.5910930633544922, 6.556217193603516, 7.634330749511719, 11.491943359375, 17.57978057861328, 29.927001953125, -1.7078399658203125, 6.9880523681640625, 0.3653144836425781, 15.260040283203125, 1.8212890625, 4.900787353515625, -1.8409347534179688, -1.190521240234375, 4.973468780517578, 14.248512268066406, 1.899169921875, 8.542320251464844, -1.8297538757324219], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000324.npy"}
{"epoch": 0.4897959183673469, "step": 325, "batch_size": 64, "mean": 6.432037353515625, "std": 11.32148551940918, "min": -23.06268310546875, "p10": -7.164564704895019, "median": 5.234996795654297, "p90": 20.23046855926514, "max": 34.21599578857422, "pos_frac": 0.71875, "sample": [9.466068267822266, 34.21599578857422, 7.198513031005859, 18.84061050415039, -1.9165458679199219, 8.019676208496094, 23.113731384277344, -2.323688507080078, 2.2540817260742188, 4.565998077392578, 10.897254943847656, 19.335031509399414, -3.3997650146484375, -12.169248580932617, 14.81585693359375, 1.7332077026367188, 14.189689636230469, 5.482490539550781, 4.507621765136719, -7.203165054321289, 32.93107604980469, -10.953681945800781, 19.021629333496094, 4.713886260986328, -0.15124988555908203, 31.47498321533203, 1.9990615844726562, 10.207206726074219, -7.4320220947265625, 6.062685012817383, -10.9002685546875, -0.8200492858886719, 12.446357727050781, 4.1246490478515625, 1.066162109375, 20.614227294921875, 1.6383628845214844, 15.788864135742188, 12.22610855102539, 9.995208740234375, 9.530868530273438, 0.759185791015625, -0.896331787109375, 22.28607177734375, -23.06268310546875, 4.20843505859375, -1.992095947265625, 10.471282958984375, 17.34783935546875, 13.319107055664062, 32.97282409667969, -7.074497222900391, -8.130456924438477, -4.166130065917969, 2.5099945068359375, 0.5020675659179688, 11.86099624633789, -5.0338134765625, 4.9875030517578125, 8.569450378417969, 6.39068603515625, 7.9378509521484375, 6.043006896972656, -3.3674240112304688], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000325.npy"}
{"epoch": 0.491307634164777, "step": 326, "batch_size": 64, "mean": 4.824021339416504, "std": 10.432127952575684, "min": -17.313247680664062, "p10": -7.665455818176269, "median": 5.241809844970703, "p90": 18.324668502807622, "max": 29.265968322753906, "pos_frac": 0.640625, "sample": [-3.7576675415039062, -2.8027801513671875, 4.754016876220703, 2.410675048828125, 6.9744415283203125, -14.111305236816406, 29.265968322753906, 6.1019134521484375, 11.015899658203125, 8.345870971679688, -0.6207275390625, 12.132453918457031, 7.859989166259766, -2.617431640625, -1.1158447265625, 17.21869659423828, 2.980701446533203, 5.185432434082031, -1.3973922729492188, 13.469703674316406, 18.739765167236328, -2.1106719970703125, 0.8736343383789062, 14.572166442871094, -6.755455017089844, 2.48541259765625, -16.423358917236328, -4.992218017578125, -2.208740234375, 5.839653015136719, 22.285552978515625, -2.655048370361328, 7.100437164306641, 0.07143402099609375, 7.630889892578125, 6.278694152832031, 5.5360107421875, -2.925262451171875, -8.665138244628906, 12.855804443359375, 5.9692840576171875, 15.709869384765625, -10.99481201171875, -0.6499824523925781, 21.890625, 4.65350341796875, 24.07164764404297, -6.652217864990234, 16.703643798828125, 28.19426727294922, 5.298187255859375, -10.51854133605957, -3.03497314453125, 21.764015197753906, -5.93731689453125, 17.356109619140625, -17.313247680664062, 6.1131439208984375, 12.456501007080078, 10.83404541015625, 12.36690902709961, -8.055456161499023, 1.82476806640625, 7.861198425292969], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000326.npy"}
{"epoch": 0.4928193499622071, "step": 327, "batch_size": 64, "mean": 5.089669227600098, "std": 12.425185203552246, "min": -25.544815063476562, "p10": -9.576409912109373, "median": 3.0909881591796875, "p90": 20.991273117065436, "max": 35.85260009765625, "pos_frac": 0.65625, "sample": [4.490509033203125, -10.1053466796875, -3.2236404418945312, 35.26661682128906, -7.271240234375, 2.0389404296875, -7.757331848144531, 6.304679870605469, 0.9433269500732422, 13.367794036865234, 18.993675231933594, -7.339324951171875, -17.928009033203125, 19.63031005859375, 13.72113037109375, 13.880584716796875, 21.667312622070312, -0.09334754943847656, -2.7217559814453125, 1.7653350830078125, 0.158843994140625, -0.5528373718261719, 11.942455291748047, 16.251678466796875, 26.992950439453125, 2.8570175170898438, 0.32209014892578125, 2.9289093017578125, 13.835723876953125, -0.32736968994140625, 7.831209182739258, 3.2530670166015625, -11.676826477050781, 7.096794128417969, -0.5280685424804688, 1.47265625, 10.2703857421875, 7.484832763671875, -10.403358459472656, -3.8310718536376953, 29.099166870117188, 6.986608505249023, -25.544815063476562, -0.6858062744140625, 16.854183197021484, 35.31304931640625, -3.0514678955078125, 16.115806579589844, 35.85260009765625, 11.247665405273438, 21.574542999267578, 5.33209228515625, 5.1860198974609375, 7.6221466064453125, -14.519245147705078, 0.38140106201171875, 6.04058837890625, 2.0634689331054688, -0.0017852783203125, 5.082408905029297, -6.2900543212890625, -11.430265426635742, 9.843414306640625, -8.34222412109375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000327.npy"}
{"epoch": 0.4943310657596372, "step": 328, "batch_size": 64, "mean": 5.386219024658203, "std": 10.207247734069824, "min": -24.07427978515625, "p10": -4.092676544189453, "median": 3.768678665161133, "p90": 19.763301849365234, "max": 33.59284973144531, "pos_frac": 0.65625, "sample": [21.256134033203125, -1.0389747619628906, 7.4707489013671875, 4.479835510253906, 7.291164398193359, 1.7695980072021484, 0.8421745300292969, 14.552299499511719, 19.32817840576172, 2.0297012329101562, 6.427360534667969, 7.04022216796875, 22.003192901611328, 1.7969131469726562, 5.0551300048828125, -3.1328887939453125, 18.289817810058594, 10.007375717163086, 5.403961181640625, 1.4887619018554688, 1.7223052978515625, -3.6500930786132812, -0.09847068786621094, 24.38976287841797, -2.7261123657226562, -8.238494873046875, 33.59284973144531, -0.19004058837890625, -15.960861206054688, 2.230926513671875, -24.07427978515625, 8.40306282043457, 14.032678604125977, -0.1184844970703125, 24.33612060546875, -1.3895759582519531, -2.7852706909179688, 0.26589012145996094, 23.497817993164062, -3.7622604370117188, 12.388931274414062, 9.31866455078125, -8.000391006469727, -5.294816970825195, -1.3762359619140625, 6.922966003417969, -5.603919982910156, -0.8940277099609375, 0.9938735961914062, -1.28509521484375, 10.130596160888672, -2.2801971435546875, 10.498241424560547, 12.764404296875, -2.980792999267578, 3.1438255310058594, 14.261177062988281, 4.393531799316406, 18.35009765625, 4.914302825927734, 11.92940902709961, -4.234283447265625, 19.949783325195312, 14.869792938232422], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000328.npy"}
{"epoch": 0.4958427815570673, "step": 329, "batch_size": 64, "mean": 6.3929619789123535, "std": 11.211071968078613, "min": -19.23577880859375, "p10": -6.061318969726562, "median": 3.9764175415039062, "p90": 23.258207702636724, "max": 38.29798889160156, "pos_frac": 0.703125, "sample": [8.629783630371094, 8.581298828125, 3.1294174194335938, 6.856407165527344, -5.136222839355469, 6.243831634521484, 22.00555419921875, 0.8249588012695312, 11.65185546875, 0.647369384765625, 4.969932556152344, -0.3024444580078125, -7.455049514770508, 10.370555877685547, -3.9290103912353516, -8.336807250976562, -1.8344497680664062, 17.31854248046875, -5.7764739990234375, 5.0579376220703125, 1.2047653198242188, 0.8198280334472656, 23.795059204101562, 19.132110595703125, -7.7445831298828125, 23.990585327148438, 38.29798889160156, 9.46826171875, 8.189048767089844, 5.956451416015625, 12.949092864990234, 1.8707733154296875, -7.663536071777344, 7.91229248046875, 13.84457778930664, -4.483856201171875, 31.865966796875, -2.9280662536621094, 18.78594207763672, 12.771392822265625, 27.949203491210938, 10.877685546875, 2.9598388671875, -3.021697998046875, -0.3788909912109375, 7.705787658691406, -0.97967529296875, 2.233135223388672, 4.27252197265625, 15.15109634399414, 0.4567070007324219, 11.726486206054688, 3.0957107543945312, -6.281101226806641, 34.14643478393555, 13.548828125, -1.5967292785644531, 25.73065948486328, -1.3519096374511719, -19.23577880859375, 1.770721435546875, 1.3225173950195312, 3.6803131103515625, -6.1833953857421875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000329.npy"}
{"epoch": 0.4973544973544973, "step": 330, "batch_size": 64, "mean": 8.46575927734375, "std": 11.592424392700195, "min": -26.445566177368164, "p10": -6.488713455200194, "median": 7.219753265380859, "p90": 21.859556961059575, "max": 35.84751892089844, "pos_frac": 0.75, "sample": [-2.523162841796875, -8.605583190917969, -0.019535064697265625, -8.206260681152344, 6.964256286621094, -26.445566177368164, 13.910675048828125, 5.1551361083984375, -0.009063720703125, 14.119873046875, 12.9586181640625, 7.095344543457031, -3.3893280029296875, 6.566871643066406, 3.9443817138671875, 13.784896850585938, 27.446060180664062, -2.9907913208007812, 30.85028076171875, 6.968620300292969, 19.76845932006836, -12.069547653198242, 6.883758544921875, -7.527185440063477, 22.895950317382812, 22.223114013671875, 7.06341552734375, 0.45116424560546875, -5.147483825683594, 8.187644958496094, 7.815803527832031, 14.332305908203125, 5.935691833496094, 10.348068237304688, -7.063526153564453, 19.39666748046875, 21.01125717163086, 20.78173828125, 32.001312255859375, 1.2649116516113281, -1.1440963745117188, 14.364959716796875, 3.5804519653320312, 14.693729400634766, 16.38182830810547, 16.604389190673828, 5.136850357055664, 7.493755340576172, -0.2203826904296875, 7.3441619873046875, 5.313117980957031, 18.772979736328125, 20.707775115966797, 11.037223815917969, 7.862762451171875, 4.606269836425781, 11.6552734375, -1.262887954711914, 30.69843292236328, -10.378829956054688, 15.08481216430664, 14.783966064453125, 6.715282440185547, 35.84751892089844], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000330.npy"}
{"epoch": 0.4988662131519274, "step": 331, "batch_size": 64, "mean": 9.018034934997559, "std": 12.667943954467773, "min": -32.62091064453125, "p10": -3.6511779785156246, "median": 8.970088958740234, "p90": 24.08607349395752, "max": 40.272918701171875, "pos_frac": 0.8125, "sample": [32.08125305175781, 1.7781753540039062, 7.189506530761719, 4.586767196655273, 14.095291137695312, 19.30811309814453, 9.679061889648438, 14.883087158203125, 8.871223449707031, 7.212257385253906, 1.482229232788086, 32.21923828125, 18.385345458984375, 26.642578125, 10.046951293945312, 2.399810791015625, 40.272918701171875, 15.027862548828125, 9.962139129638672, 33.3287467956543, 9.201980590820312, 3.2722511291503906, 1.38323974609375, -1.4616546630859375, 1.1863384246826172, 23.905616760253906, 22.626508712768555, 25.59222412109375, -4.8293304443359375, 1.357208251953125, 21.264694213867188, 2.4294395446777344, 17.822097778320312, 22.213272094726562, -3.077728271484375, 2.25531005859375, 3.443553924560547, -10.416877746582031, 1.6175804138183594, 21.690452575683594, -3.896942138671875, 16.700050354003906, 2.4536094665527344, 0.2084808349609375, 9.60776138305664, -32.62091064453125, 10.436103820800781, -7.56658935546875, -12.690074920654297, 0.29564666748046875, -2.9687652587890625, 21.909217834472656, 13.463966369628906, 9.068954467773438, 10.579666137695312, 3.111358642578125, -1.8623390197753906, 19.61945343017578, -2.006988525390625, -8.522762298583984, 24.16341209411621, 10.048030853271484, 3.4833526611328125, 23.211830139160156], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000331.npy"}
{"epoch": 0.5003779289493575, "step": 332, "batch_size": 64, "mean": 7.280100345611572, "std": 10.728301048278809, "min": -9.68048095703125, "p10": -4.742617797851562, "median": 5.200803756713867, "p90": 23.70163650512696, "max": 35.25575256347656, "pos_frac": 0.78125, "sample": [-4.985134124755859, 5.5995025634765625, 2.4029159545898438, -3.0369110107421875, 5.37005615234375, 7.404975891113281, 7.4942169189453125, -1.5465927124023438, -1.0506134033203125, 1.5972671508789062, 5.961906433105469, 29.32207489013672, 24.340347290039062, 3.3121109008789062, 15.170616149902344, 4.7252197265625, -2.399759292602539, 4.816610336303711, 20.630096435546875, 2.98565673828125, 16.954593658447266, 0.8707427978515625, -8.484039306640625, 31.78042221069336, 8.064956665039062, 12.05208969116211, 0.22414207458496094, 1.8787002563476562, 9.22943115234375, 3.8005599975585938, 5.426689147949219, 30.386844635009766, -9.270782470703125, 3.563995361328125, -7.9562530517578125, 5.991678237915039, 2.9870052337646484, 21.981773376464844, -4.176746368408203, 2.3561363220214844, 16.4539794921875, 6.693695068359375, 14.478893280029297, 4.661113739013672, 2.473278045654297, -6.9602508544921875, 22.21131134033203, 7.894313812255859, 7.300933837890625, 5.325672149658203, 35.25575256347656, 14.649742126464844, 5.075935363769531, -3.5691261291503906, -8.050048828125, 3.745941162109375, 29.319107055664062, 4.328884124755859, 8.541824340820312, 14.299079895019531, 27.95404815673828, 11.024139404296875, -9.68048095703125, -3.277801513671875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000332.npy"}
{"epoch": 0.5018896447467877, "step": 333, "batch_size": 64, "mean": 9.485807418823242, "std": 10.605560302734375, "min": -7.792900085449219, "p10": -2.391204833984374, "median": 7.814598083496094, "p90": 23.033840370178225, "max": 41.90864562988281, "pos_frac": 0.8125, "sample": [9.381301879882812, 0.04515838623046875, 11.594207763671875, 29.838729858398438, 3.8822174072265625, -2.8949203491210938, 14.484054565429688, 14.6876220703125, 6.811908721923828, 0.3380126953125, 18.2861328125, 7.77069091796875, 11.0438232421875, 15.300907135009766, 13.233104705810547, 2.6475067138671875, 16.647506713867188, 33.1304931640625, 23.890823364257812, 10.972175598144531, 3.9552459716796875, 3.009929656982422, 1.9464111328125, 6.318809509277344, 0.2579936981201172, -0.6833572387695312, -7.792900085449219, 9.400047302246094, 20.685062408447266, 8.835220336914062, 7.8585052490234375, 31.865394592285156, -5.961448669433594, -2.7180023193359375, 12.661479949951172, 17.94664764404297, 41.90864562988281, -1.1637496948242188, 8.3642578125, -1.6286773681640625, -3.2553977966308594, -1.0055389404296875, -7.099769592285156, 7.650285720825195, 4.320415496826172, 0.9205474853515625, 16.69927978515625, 14.429351806640625, 2.3331146240234375, 2.8365707397460938, 2.1432647705078125, 4.992462158203125, 7.464778900146484, 32.88153076171875, 8.070295333862305, -3.9582366943359375, 18.006088256835938, -1.009521484375, 23.47286033630371, 7.37847900390625, 22.00946044921875, 17.999446868896484, 16.52899932861328, 17.125953674316406], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000333.npy"}
{"epoch": 0.5034013605442177, "step": 334, "batch_size": 64, "mean": 6.509448528289795, "std": 12.950199127197266, "min": -24.80593490600586, "p10": -7.847628784179687, "median": 5.143244743347168, "p90": 24.467674636840826, "max": 35.126487731933594, "pos_frac": 0.734375, "sample": [-8.103111267089844, -22.85247802734375, 3.9884109497070312, 14.506114959716797, 6.4920654296875, 0.05289459228515625, -19.575950622558594, 30.969039916992188, 2.705657958984375, -24.80593490600586, 2.6692047119140625, 29.712799072265625, -13.618206024169922, 0.5566215515136719, 4.406486511230469, 18.03034210205078, 9.90679931640625, -1.3070907592773438, 4.36444091796875, 27.686920166015625, 26.730133056640625, 26.962631225585938, -6.500598907470703, 9.580150604248047, 25.059154510498047, -14.782623291015625, 4.7066802978515625, -0.4783668518066406, 19.55596923828125, 3.7261886596679688, 0.22756576538085938, 5.704555511474609, 23.087554931640625, -9.92437744140625, 14.55636215209961, 10.651130676269531, 20.8287353515625, 3.5328369140625, 5.388984680175781, 3.7883834838867188, 7.414020538330078, -7.251502990722656, -4.554756164550781, -3.311859130859375, 17.46298599243164, 0.09729385375976562, 5.483757019042969, 4.897504806518555, -5.8852081298828125, 5.91943359375, 8.94235610961914, 5.541389465332031, -3.3189697265625, 8.368236541748047, -4.3185882568359375, 19.147354125976562, -0.42319488525390625, 35.126487731933594, 9.512374877929688, 2.3547630310058594, 22.090103149414062, 21.726943969726562, 15.046340942382812, 18.351356506347656], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000334.npy"}
{"epoch": 0.5049130763416477, "step": 335, "batch_size": 64, "mean": 7.221893787384033, "std": 14.592146873474121, "min": -25.390459060668945, "p10": -7.3642127990722654, "median": 3.8837223052978516, "p90": 27.453769302368165, "max": 36.791900634765625, "pos_frac": 0.65625, "sample": [17.583534240722656, 25.233657836914062, -5.580987930297852, 36.791900634765625, -7.451812744140625, 0.526123046875, 16.936676025390625, -0.008584976196289062, 27.58514404296875, 23.180587768554688, 33.678367614746094, 28.768020629882812, 2.596954345703125, 0.467254638671875, 26.69289779663086, 20.307239532470703, 7.956352233886719, 8.635536193847656, -7.497688293457031, 31.778480529785156, -4.691417694091797, 1.2356491088867188, 6.785133361816406, 34.98625183105469, -7.159812927246094, 11.208770751953125, 9.983551025390625, -7.588890075683594, 0.6185111999511719, 20.391624450683594, 25.258201599121094, -1.2852554321289062, 2.5549354553222656, 4.25567626953125, 11.372966766357422, -15.609733581542969, 26.037872314453125, -1.4879989624023438, -2.071197509765625, 33.987770080566406, -4.183097839355469, -1.3117790222167969, -3.7783966064453125, 7.9421234130859375, 0.02863311767578125, 27.147228240966797, 4.2968597412109375, -14.303825378417969, 7.364845275878906, -4.913127899169922, 2.8760833740234375, -6.1909332275390625, 3.6156234741210938, -25.390459060668945, -24.827598571777344, -3.2091445922851562, 24.254074096679688, 19.289955139160156, 4.151821136474609, 15.868324279785156, -5.6995697021484375, 1.6549072265625, -4.439811706542969, 4.9962158203125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000335.npy"}
{"epoch": 0.5064247921390779, "step": 336, "batch_size": 64, "mean": 8.332913398742676, "std": 12.2301664352417, "min": -22.391937255859375, "p10": -5.112725830078124, "median": 5.862394332885742, "p90": 26.740449523925793, "max": 37.05609130859375, "pos_frac": 0.765625, "sample": [11.229660034179688, 23.810897827148438, 10.425941467285156, 1.336334228515625, 27.9959716796875, 21.280719757080078, 13.931549072265625, 2.632110595703125, 33.693050384521484, 3.800933837890625, 11.36092758178711, 2.1405181884765625, 32.12242889404297, -5.529396057128906, 9.60833740234375, -1.3712348937988281, -0.8502273559570312, 13.817634582519531, 35.95465087890625, -3.506847381591797, 3.3437728881835938, 9.037338256835938, 14.13189697265625, 5.743782043457031, -9.3155517578125, 14.886817932128906, 14.433830261230469, -22.391937255859375, 9.239540100097656, -13.888656616210938, 33.15099334716797, -3.5396270751953125, 0.26800537109375, 5.57183837890625, -4.140495300292969, 1.557718276977539, 11.274436950683594, -7.405876159667969, 28.781322479248047, 6.705108642578125, 14.397987365722656, 4.79656982421875, 5.3611907958984375, 5.981006622314453, 18.61300277709961, 4.7571258544921875, 13.687850952148438, -5.7858428955078125, 1.7809867858886719, 8.005996704101562, -3.592252731323242, -6.7527923583984375, 5.466514587402344, 21.396102905273438, 37.05609130859375, 10.651985168457031, 1.4831695556640625, 19.4151611328125, 3.667449951171875, -2.7378768920898438, 0.2515106201171875, 15.668197631835938, 19.198211669921875, -0.78912353515625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000336.npy"}
{"epoch": 0.5079365079365079, "step": 337, "batch_size": 64, "mean": 5.384521484375, "std": 12.090513229370117, "min": -24.31427764892578, "p10": -9.048084259033203, "median": 4.049570083618164, "p90": 19.886999511718756, "max": 39.12907409667969, "pos_frac": 0.6875, "sample": [11.5682373046875, -9.19091796875, -8.714805603027344, -2.9790496826171875, 3.3736000061035156, 22.100440979003906, -6.730525970458984, 18.446731567382812, 1.1145782470703125, -2.4680328369140625, 15.686508178710938, -6.749092102050781, 10.815940856933594, -16.789794921875, -6.566650390625, -3.5462417602539062, 14.016799926757812, 32.870330810546875, 2.034759521484375, 0.212188720703125, 3.0799789428710938, -1.6292057037353516, -11.005914688110352, -13.855842590332031, 15.348052978515625, 13.542247772216797, -12.300750732421875, 11.492412567138672, 1.7688713073730469, 25.208084106445312, 17.052902221679688, -24.31427764892578, 12.812397003173828, -7.971282958984375, 20.504257202148438, 5.415824890136719, 21.915863037109375, 14.837631225585938, 1.6923713684082031, 10.790254592895508, 8.245613098144531, 4.331233978271484, 3.7679061889648438, 7.5467987060546875, 2.066436767578125, -0.8283309936523438, 12.11703872680664, 4.3548736572265625, 3.7631988525390625, 16.183998107910156, 5.5432586669921875, 39.12907409667969, -5.641357421875, 3.03570556640625, 0.21474075317382812, 17.37420654296875, -2.1285400390625, -6.7630462646484375, 12.801094055175781, 16.79658317565918, -10.698013305664062, 8.579544067382812, 26.897350311279297, 5.0311431884765625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000337.npy"}
{"epoch": 0.509448223733938, "step": 338, "batch_size": 64, "mean": 5.847989559173584, "std": 10.686797142028809, "min": -18.291580200195312, "p10": -4.88566951751709, "median": 3.862295150756836, "p90": 20.2761100769043, "max": 38.84755325317383, "pos_frac": 0.703125, "sample": [-6.713409423828125, 4.274257659912109, 19.45404052734375, 1.9497871398925781, 14.30758285522461, 9.340782165527344, -11.209762573242188, 12.9544677734375, 1.3599929809570312, 23.828426361083984, 29.944046020507812, 1.3787841796875, -1.8173370361328125, -0.5960693359375, 4.249244689941406, -7.973941802978516, -7.235893249511719, -4.337556838989258, -5.120574951171875, 0.34183502197265625, 5.487762451171875, 18.69812774658203, 2.5207481384277344, -1.5157089233398438, 20.62842559814453, -2.991077423095703, -14.378471374511719, 10.850997924804688, 10.6241455078125, 14.714080810546875, 3.9578628540039062, 11.258819580078125, 2.96466064453125, 1.9423599243164062, 6.242069244384766, 8.785858154296875, 9.050926208496094, 0.727020263671875, 16.111602783203125, 0.678985595703125, -3.501068115234375, 16.026708602905273, -2.1170997619628906, 12.499340057373047, 3.792205810546875, 5.497398376464844, -1.7390823364257812, -1.0619831085205078, 9.710380554199219, -18.291580200195312, -4.255226135253906, 21.507888793945312, 15.636756896972656, -2.0007705688476562, 7.521995544433594, 11.946701049804688, 0.3335418701171875, 29.251659393310547, 21.972518920898438, -2.3630828857421875, 3.932384490966797, 2.6802139282226562, 38.84755325317383, 3.7060775756835938], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000338.npy"}
{"epoch": 0.5109599395313681, "step": 339, "batch_size": 64, "mean": 8.431801795959473, "std": 10.800143241882324, "min": -14.40142822265625, "p10": -5.414237976074219, "median": 8.338676452636719, "p90": 21.023222732543946, "max": 38.423126220703125, "pos_frac": 0.75, "sample": [2.097137451171875, 6.6131134033203125, -2.6898155212402344, -1.7880859375, -7.4066314697265625, 14.560882568359375, 6.5005950927734375, 5.937509536743164, 4.1150970458984375, 6.0989532470703125, 14.543006896972656, 11.06292724609375, 5.184288024902344, -0.03701019287109375, 6.671546936035156, 14.560405731201172, -2.4725074768066406, 18.003616333007812, 27.345993041992188, -0.34572792053222656, 21.237354278564453, 8.464218139648438, 15.258419036865234, 14.816314697265625, 1.2075119018554688, 15.528907775878906, 3.8888092041015625, 8.998085021972656, -14.40142822265625, 12.62761116027832, 20.523582458496094, 20.201946258544922, 16.09569549560547, 30.803890228271484, 19.285858154296875, 10.612857818603516, 18.338478088378906, -8.838447570800781, 7.9276580810546875, 9.563262939453125, 23.626991271972656, 8.8543701171875, -5.12493896484375, -2.649557113647461, 5.992340087890625, -12.152793884277344, 8.213134765625, -0.158599853515625, -12.947555541992188, -7.173683166503906, 0.22687911987304688, 5.791851043701172, 10.581062316894531, 21.72210121154785, -2.8258209228515625, 38.423126220703125, -5.5382232666015625, 11.200061798095703, 13.028190612792969, 15.58013916015625, 17.786853790283203, 19.077598571777344, 3.089432716369629, 24.316482543945312], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000339.npy"}
{"epoch": 0.5124716553287982, "step": 340, "batch_size": 64, "mean": 7.551028251647949, "std": 13.462089538574219, "min": -23.266555786132812, "p10": -8.33939971923828, "median": 4.9249114990234375, "p90": 26.56649627685547, "max": 33.632652282714844, "pos_frac": 0.6875, "sample": [6.117927551269531, 25.025211334228516, 3.251798629760742, 1.1779708862304688, 1.9310455322265625, 25.974552154541016, -1.0782184600830078, -2.110443115234375, -1.0285663604736328, 8.23370361328125, 2.1254348754882812, 18.315814971923828, 8.471321105957031, 14.260543823242188, 33.632652282714844, 26.820186614990234, 3.5661468505859375, -23.266555786132812, 10.230560302734375, 11.71844482421875, -3.337278366088867, -1.5220603942871094, -4.955467224121094, 10.866935729980469, -2.0415191650390625, 27.938697814941406, 0.4801139831542969, 22.672500610351562, 21.72711944580078, 20.75860595703125, -7.651191711425781, 23.943626403808594, 0.12351226806640625, 18.83258056640625, 31.30768585205078, 10.74622917175293, 25.733715057373047, -0.8515701293945312, 4.49310302734375, 25.869455337524414, 1.4336357116699219, -6.081886291503906, -8.634346008300781, 29.54705810546875, 19.014602661132812, -6.033882141113281, -10.9498291015625, -9.209197998046875, 10.710063934326172, 20.083267211914062, 0.7296295166015625, 2.4910659790039062, -0.2125091552734375, -8.687480926513672, -22.36728286743164, -15.262710571289062, 27.939910888671875, 2.252887725830078, -3.4552955627441406, 26.949356079101562, 11.709587097167969, 5.356719970703125, 8.312881469726562, 9.125228881835938], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000340.npy"}
{"epoch": 0.5139833711262283, "step": 341, "batch_size": 64, "mean": 7.637750148773193, "std": 12.318869590759277, "min": -22.86449432373047, "p10": -7.8906818389892575, "median": 6.121484756469727, "p90": 24.832090759277346, "max": 31.43999481201172, "pos_frac": 0.71875, "sample": [-11.659675598144531, 19.75518798828125, 19.003005981445312, 12.567142486572266, 30.35419464111328, 23.940933227539062, 15.118335723876953, 25.122909545898438, 19.390106201171875, -1.9684600830078125, 24.994171142578125, 4.00724983215332, -9.109390258789062, -2.649442672729492, 4.277189254760742, 1.5370464324951172, 20.380508422851562, 2.1447296142578125, 13.306976318359375, 16.116958618164062, 31.43999481201172, 30.809066772460938, 24.453903198242188, 5.712619781494141, -8.06927490234375, 26.168346405029297, 6.5303497314453125, -8.157463073730469, -7.473964691162109, 0.4481964111328125, -0.5159149169921875, 21.191574096679688, -2.15521240234375, -3.045745849609375, 4.529273986816406, 14.436439514160156, 5.515037536621094, 9.208667755126953, -1.645843505859375, -19.41921615600586, -10.7950439453125, -1.30859375, 1.1511611938476562, 14.2899169921875, 3.8880062103271484, 25.150924682617188, 11.008102416992188, 7.2732391357421875, 17.677993774414062, 13.729141235351562, 10.155471801757812, 2.4923553466796875, 23.005939483642578, 1.21905517578125, 6.987743377685547, 17.232833862304688, -0.0762786865234375, 7.784877777099609, 1.0968189239501953, -5.7896575927734375, -2.7250213623046875, 11.376075744628906, 0.26495361328125, -22.86449432373047], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000341.npy"}
{"epoch": 0.5154950869236583, "step": 342, "batch_size": 64, "mean": 8.2747163772583, "std": 12.00847339630127, "min": -16.531314849853516, "p10": -6.110656356811523, "median": 6.636690139770508, "p90": 24.851713562011724, "max": 40.91961669921875, "pos_frac": 0.765625, "sample": [5.615814208984375, 1.2834243774414062, 1.0492668151855469, -7.894214630126953, 2.8620100021362305, 28.35171890258789, -8.219871520996094, 22.096435546875, -6.211601257324219, 18.393959045410156, -4.810062408447266, 9.441719055175781, 3.7459583282470703, 14.324356079101562, 11.2906494140625, -2.8856048583984375, 0.3516712188720703, -16.531314849853516, -1.786468505859375, -1.0152702331542969, 5.768306732177734, 19.60137939453125, 15.778793334960938, 23.46019744873047, -3.967926025390625, 6.893482208251953, 5.773468017578125, 11.296573638916016, 20.923065185546875, 17.602157592773438, -1.3576469421386719, 1.2875556945800781, 8.284393310546875, 10.06280517578125, 6.3798980712890625, -6.33950138092041, 18.274620056152344, 1.0837249755859375, 7.00433349609375, 23.480392456054688, 4.671516418457031, 29.267166137695312, 21.777755737304688, 7.231258392333984, -8.779216766357422, 27.345741271972656, 11.720317840576172, -2.3026771545410156, 30.224166870117188, 29.504638671875, 8.909492492675781, -14.676395416259766, 0.9625625610351562, 40.91961669921875, -5.875118255615234, 25.439422607421875, 13.173297882080078, 0.56719970703125, 1.0121917724609375, 2.548816680908203, 17.1274471282959, 10.875673294067383, 1.6236724853515625, 15.570648193359375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000342.npy"}
{"epoch": 0.5170068027210885, "step": 343, "batch_size": 64, "mean": 4.384973526000977, "std": 10.8245267868042, "min": -25.34720230102539, "p10": -9.501477050781249, "median": 5.217594146728516, "p90": 17.654057312011727, "max": 28.361961364746094, "pos_frac": 0.671875, "sample": [1.5297317504882812, 7.737964630126953, -1.351959228515625, -8.7843017578125, 5.117584228515625, 12.496193885803223, 5.914630889892578, -2.3016834259033203, -6.494606018066406, -6.5638275146484375, -25.34720230102539, -9.808837890625, 9.170623779296875, 14.10125732421875, 23.91943359375, -4.1609954833984375, 22.893455505371094, 13.195587158203125, 15.20700454711914, 10.634353637695312, 7.235595703125, 5.575204849243164, 18.538497924804688, 7.276092529296875, 6.5218658447265625, 15.213411331176758, 6.119350433349609, -7.503444671630859, 3.93402099609375, 3.3759422302246094, 0.270355224609375, 6.6494293212890625, 15.590362548828125, 7.16119384765625, 3.3329544067382812, -0.4306449890136719, 12.07699966430664, 18.70983123779297, -17.58240509033203, 24.054885864257812, -12.39208984375, -2.3837432861328125, 2.2433547973632812, 20.124176025390625, 8.975929260253906, -2.1818008422851562, 15.349334716796875, 6.97955322265625, -3.1308937072753906, 15.27219009399414, 28.361961364746094, 3.0185546875, -4.7695770263671875, 5.317604064941406, 2.407257080078125, 13.48248291015625, -11.587955474853516, 2.325225830078125, 11.739799499511719, -1.1377792358398438, -14.237686157226562, -15.946296691894531, -5.132678985595703, 4.71748161315918], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000343.npy"}
{"epoch": 0.5185185185185185, "step": 344, "batch_size": 64, "mean": 8.391152381896973, "std": 11.161296844482422, "min": -16.19127655029297, "p10": -4.054355621337891, "median": 6.899478912353516, "p90": 22.676311492919925, "max": 37.068084716796875, "pos_frac": 0.734375, "sample": [-6.175506591796875, 20.5262451171875, 5.81256103515625, 8.939302444458008, 21.454208374023438, -1.8257293701171875, 18.80685806274414, 6.901039123535156, 4.5099334716796875, 0.4310150146484375, -6.7022857666015625, 4.307060241699219, 20.37762451171875, 18.129764556884766, -2.1577072143554688, 27.087047576904297, -1.15301513671875, 37.068084716796875, -1.8225936889648438, 12.515762329101562, -6.4927520751953125, 8.157943725585938, -1.6819610595703125, 25.73821258544922, 8.649093627929688, -2.42333984375, -10.318695068359375, 0.3550567626953125, 21.726516723632812, 10.53506851196289, -4.2208099365234375, 31.300804138183594, 6.897918701171875, 6.144523620605469, 17.704620361328125, 24.690561294555664, 15.745201110839844, 3.9392852783203125, 1.7913589477539062, -7.870029449462891, 0.07288360595703125, 5.574371337890625, -3.6659622192382812, 1.2535476684570312, -3.542449951171875, 2.7859535217285156, 17.880592346191406, 10.776603698730469, 15.981887817382812, -0.48583221435546875, 20.739479064941406, 12.848495483398438, 27.22083854675293, 14.67618179321289, 9.581298828125, 2.403522491455078, 14.399810791015625, 11.014095306396484, -1.6994781494140625, 20.225051879882812, -16.19127655029297, 23.08336639404297, 5.140501022338867, 9.561996459960938], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000344.npy"}
{"epoch": 0.5200302343159486, "step": 345, "batch_size": 64, "mean": 7.053492069244385, "std": 11.909232139587402, "min": -21.831661224365234, "p10": -5.508676147460937, "median": 5.309630393981934, "p90": 24.49020004272461, "max": 35.73710632324219, "pos_frac": 0.71875, "sample": [3.7141571044921875, 2.8060150146484375, 24.235374450683594, -15.59347152709961, -0.4820289611816406, -0.8713531494140625, -0.10572052001953125, -5.326690673828125, -4.601409912109375, 5.119794845581055, 5.4994659423828125, 29.87311553955078, 11.625244140625, 1.3948535919189453, -5.074638366699219, 6.2342529296875, -14.501976013183594, 3.6072216033935547, 5.101570129394531, 21.637351989746094, 35.73710632324219, 7.74431037902832, 24.599411010742188, 10.4376220703125, 10.997671127319336, 1.463714599609375, 8.826972961425781, 16.73975372314453, -5.586669921875, -1.6257476806640625, 11.785392761230469, 27.791770935058594, 8.047744750976562, 17.833908081054688, 11.610336303710938, -3.4112091064453125, 22.052940368652344, 27.848861694335938, 20.464153289794922, -9.64044189453125, 14.489189147949219, -3.2981643676757812, -21.831661224365234, 18.839309692382812, 8.382987976074219, -7.524103164672852, 3.3373794555664062, 0.4717864990234375, 7.65985107421875, 2.8339157104492188, 25.982040405273438, 9.026130676269531, -10.117584228515625, 8.835205078125, 2.1099624633789062, 15.999710083007812, -0.022064208984375, -4.657611846923828, 1.9015703201293945, 0.9968643188476562, 14.298004150390625, 1.1109619140625, 18.47930145263672, 26.11176300048828], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000345.npy"}
{"epoch": 0.5215419501133787, "step": 346, "batch_size": 64, "mean": 8.276739120483398, "std": 13.897903442382812, "min": -26.39861297607422, "p10": -7.174999618530273, "median": 6.832572937011719, "p90": 25.64999351501465, "max": 39.5721435546875, "pos_frac": 0.640625, "sample": [18.355934143066406, 24.527385711669922, 25.729904174804688, 18.161209106445312, 20.327232360839844, 6.88702392578125, 3.132659912109375, 6.362874984741211, 26.097776412963867, 24.67431640625, 39.5721435546875, -1.8437995910644531, 4.7889404296875, 18.698179244995117, -7.447410583496094, 27.550559997558594, -7.5018463134765625, 10.780204772949219, -1.725494384765625, 6.339851379394531, 5.765556335449219, 18.896865844726562, -0.957550048828125, -1.162567138671875, 17.186492919921875, 37.59893798828125, -26.39861297607422, 5.702230453491211, -0.219207763671875, -16.29767608642578, 4.6551513671875, -2.10211181640625, 10.027816772460938, -4.960844039916992, 5.8067169189453125, 6.7781219482421875, 12.368942260742188, -3.5316925048828125, 14.051582336425781, 35.160926818847656, 31.570724487304688, 25.225053787231445, -0.7348098754882812, 16.501373291015625, 7.592765808105469, -6.739280700683594, 19.7867431640625, 10.135322570800781, -7.361736297607422, -6.6535491943359375, 15.784637451171875, -2.0517578125, -1.2727184295654297, -3.1587791442871094, -18.11178207397461, 8.702247619628906, 7.430442810058594, 19.84606170654297, -17.575057983398438, 25.46353530883789, -2.4927024841308594, -3.0530242919921875, 13.538734436035156, 15.502113342285156], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000346.npy"}
{"epoch": 0.5230536659108088, "step": 347, "batch_size": 64, "mean": 6.062224864959717, "std": 13.03646183013916, "min": -25.414173126220703, "p10": -8.751665878295897, "median": 3.385066032409668, "p90": 24.29538497924805, "max": 46.031005859375, "pos_frac": 0.59375, "sample": [-2.7885208129882812, 13.964653015136719, 13.787689208984375, -3.954925537109375, -25.414173126220703, 4.243888854980469, 8.980560302734375, 5.636631011962891, 8.52703857421875, 19.7093505859375, -0.25972747802734375, 23.78759765625, -0.1063232421875, -8.434967041015625, -10.449913024902344, 46.031005859375, 11.157203674316406, 13.801156997680664, -12.16180419921875, 5.6341705322265625, -0.22238922119140625, 0.19387054443359375, 35.750511169433594, -2.1757049560546875, -15.178672790527344, -5.15264892578125, 4.38067626953125, 30.813875198364258, 25.062393188476562, -5.119659423828125, 1.3215560913085938, -3.8660449981689453, -8.887393951416016, 10.067169189453125, -3.0532302856445312, -0.03275299072265625, -10.043506622314453, 12.193279266357422, -3.28350830078125, 15.404067993164062, 11.970296859741211, -0.4425811767578125, 1.4251823425292969, 12.816421508789062, 9.096315383911133, 29.14063262939453, -1.5999374389648438, -10.182403564453125, 26.025230407714844, 18.63207244873047, 18.66802978515625, 9.926101684570312, 8.585784912109375, -4.07220458984375, -6.219886779785156, 22.20238494873047, 2.526243209838867, 11.378326416015625, -0.7994155883789062, 0.0066986083984375, 1.9580307006835938, -0.5977783203125, 24.51300811767578, 13.163349151611328], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000347.npy"}
{"epoch": 0.5245653817082389, "step": 348, "batch_size": 64, "mean": 8.510453224182129, "std": 13.211325645446777, "min": -10.657485961914062, "p10": -4.187644577026367, "median": 3.637989044189453, "p90": 29.192391014099123, "max": 42.22041320800781, "pos_frac": 0.640625, "sample": [22.665985107421875, 23.341373443603516, -2.41497802734375, 2.7838401794433594, 17.32880401611328, 15.681106567382812, -9.3485107421875, 2.6445884704589844, -3.1329803466796875, -0.4139556884765625, 42.22041320800781, -4.13531494140625, 14.311965942382812, 15.452781677246094, 28.55559539794922, 29.465303421020508, 7.798717498779297, 38.22929382324219, 0.2914276123046875, 36.74199295043945, 2.4324493408203125, 8.781768798828125, 20.611427307128906, 1.9620285034179688, 8.015754699707031, 26.201248168945312, -2.4040145874023438, 16.530017852783203, -1.3071365356445312, -2.458587646484375, 5.662773132324219, 0.0055084228515625, 0.9696846008300781, 13.980056762695312, -1.166168212890625, -5.347465515136719, 11.992528915405273, -2.856517791748047, 0.6502037048339844, 13.883602142333984, 14.956703186035156, 40.73771286010742, 10.271194458007812, -2.8114633560180664, 18.840656280517578, -2.4117660522460938, -5.988868713378906, -2.895122528076172, -8.625045776367188, 12.147453308105469, -4.210071563720703, 29.662696838378906, -1.4593067169189453, 6.078344345092773, 31.15015411376953, -0.8575668334960938, 17.98310089111328, -0.03076934814453125, -1.1922035217285156, 4.492137908935547, -10.657485961914062, 9.765853881835938, 1.9828643798828125, -6.466808319091797], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000348.npy"}
{"epoch": 0.5260770975056689, "step": 349, "batch_size": 64, "mean": 8.114167213439941, "std": 12.274805068969727, "min": -25.41585922241211, "p10": -5.365062713623045, "median": 6.258040428161621, "p90": 25.813610076904308, "max": 39.59468078613281, "pos_frac": 0.765625, "sample": [-1.27593994140625, 7.0146331787109375, 26.893783569335938, 2.1265182495117188, 0.24352073669433594, 7.081207275390625, 9.885398864746094, 16.075817108154297, 3.0356292724609375, 9.76333236694336, -3.3558216094970703, 6.137617111206055, 23.29320526123047, 4.236351013183594, 6.756675720214844, 28.878807067871094, 22.041141510009766, -8.754035949707031, -25.41585922241211, 19.085758209228516, 0.7931137084960938, 13.244277954101562, 9.029129028320312, 6.3784637451171875, 10.764026641845703, 5.134986877441406, -1.207763671875, 18.542949676513672, -3.7627029418945312, 15.664093017578125, 16.199630737304688, 5.321689605712891, -8.254898071289062, 13.770072937011719, -1.4164962768554688, 14.383659362792969, 7.691793441772461, 35.69334411621094, -10.285148620605469, 12.640386581420898, 22.785690307617188, 2.7679595947265625, -1.0829277038574219, 2.7257232666015625, 2.748199462890625, 30.372764587402344, 29.903076171875, 0.038234710693359375, -6.051788330078125, -3.5985288619995117, 14.018768310546875, -2.284423828125, -10.464710235595703, 3.2700729370117188, 3.015491485595703, 26.91222381591797, 0.05899810791015625, 0.2183990478515625, -7.198726654052734, 15.263786315917969, 3.4575977325439453, 22.808441162109375, 15.955368041992188, 39.59468078613281], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000349.npy"}
{"epoch": 0.527588813303099, "step": 350, "batch_size": 64, "mean": 10.172553062438965, "std": 12.369440078735352, "min": -9.487998962402344, "p10": -4.323915100097656, "median": 9.605944633483887, "p90": 29.722360992431643, "max": 36.54426574707031, "pos_frac": 0.75, "sample": [32.8216552734375, -2.0770530700683594, 10.937957763671875, -2.7884178161621094, 13.285636901855469, 11.387523651123047, 34.75213623046875, 17.252578735351562, 15.973018646240234, 16.2381591796875, -6.731781005859375, -3.067047119140625, 3.143798828125, 10.312480926513672, 9.808866500854492, 7.505870819091797, 21.737083435058594, 0.6896247863769531, -0.06235504150390625, -7.320705413818359, 7.068084716796875, 16.806419372558594, -9.487998962402344, 13.339447021484375, 10.15321159362793, -4.394229888916016, -5.088954925537109, 17.516021728515625, 3.9984569549560547, -6.860809326171875, 21.31340789794922, -0.741455078125, 16.406646728515625, -3.2448501586914062, 16.630096435546875, 24.868576049804688, 14.619369506835938, 29.96826934814453, 36.54426574707031, 35.28367614746094, -4.159847259521484, 32.868370056152344, 29.148574829101562, 3.1835403442382812, 0.11888885498046875, 2.3200836181640625, 12.247047424316406, -0.3565998077392578, 3.128742218017578, 19.786697387695312, 3.282440185546875, 14.552848815917969, 1.155508041381836, 4.332496643066406, 9.403022766113281, -6.108055114746094, 6.256500244140625, 0.7359085083007812, -1.7880363464355469, 4.09991455078125, 34.73836135864258, 14.196624755859375, 22.031494140625, 27.3721923828125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000350.npy"}
{"epoch": 0.5291005291005291, "step": 351, "batch_size": 64, "mean": 7.216036796569824, "std": 12.384770393371582, "min": -36.488983154296875, "p10": -6.84301643371582, "median": 7.4863128662109375, "p90": 22.641891479492188, "max": 29.676002502441406, "pos_frac": 0.78125, "sample": [3.0952606201171875, 22.4422607421875, 22.30252456665039, 10.606094360351562, 1.7588043212890625, 25.058509826660156, -6.798793792724609, 9.416130065917969, 18.657440185546875, 13.537353515625, 4.897308349609375, 29.676002502441406, 2.065744400024414, -3.0868072509765625, 2.1314353942871094, -10.343368530273438, 3.0828628540039062, 4.584072113037109, 17.156455993652344, 5.0814056396484375, 11.946113586425781, -7.708900451660156, 14.062976837158203, 8.646224975585938, 22.33160400390625, 24.70166015625, 19.33258819580078, 10.15298080444336, -21.419849395751953, 3.6323928833007812, 10.202568054199219, -11.012100219726562, 23.316940307617188, 7.8109130859375, 10.764698028564453, -6.861968994140625, 1.0807342529296875, 22.727447509765625, -1.4697456359863281, 7.161712646484375, 11.757286071777344, 26.630184173583984, 3.7179336547851562, -1.811676025390625, 17.807403564453125, 2.9208450317382812, 12.939834594726562, 1.4660186767578125, 2.2124786376953125, 10.016326904296875, -0.5705413818359375, -2.8341827392578125, 12.381475448608398, 17.04537582397461, 21.245651245117188, 1.123382568359375, 9.314752578735352, 23.859298706054688, -36.488983154296875, 4.5153961181640625, -20.81787109375, 5.6406402587890625, 19.674102783203125, -4.63848876953125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000351.npy"}
{"epoch": 0.5306122448979592, "step": 352, "batch_size": 64, "mean": 6.523018836975098, "std": 11.42588996887207, "min": -14.5048828125, "p10": -6.491220092773437, "median": 4.405525207519531, "p90": 23.356115722656252, "max": 40.499176025390625, "pos_frac": 0.703125, "sample": [19.315284729003906, 29.994125366210938, 33.422752380371094, 7.567695617675781, 4.9035186767578125, -8.032684326171875, 0.7335929870605469, 11.660789489746094, -1.8522109985351562, 13.43922233581543, 1.7978477478027344, 9.659011840820312, -8.3701171875, -7.881704330444336, 4.4593048095703125, 4.8906097412109375, 2.58184814453125, -6.410919189453125, -3.0124969482421875, 3.2807159423828125, 11.356887817382812, 6.850288391113281, -5.4000396728515625, 22.167633056640625, 23.45294189453125, 7.55059814453125, 5.427303314208984, -6.258689880371094, 1.7019996643066406, -0.2809333801269531, 2.828439712524414, 8.301509857177734, 2.4628982543945312, 9.847515106201172, -6.525634765625, 5.930843353271484, -0.3991870880126953, 25.20166015625, 24.005691528320312, 18.826919555664062, 15.825889587402344, -14.5048828125, 40.499176025390625, -3.2197799682617188, 2.5278854370117188, 24.2911376953125, -6.778984069824219, 2.8999176025390625, 1.276540756225586, 13.394195556640625, -10.554405212402344, 9.016815185546875, 15.860610961914062, 19.410945892333984, 7.5162200927734375, 4.35174560546875, 1.4200210571289062, 23.13018798828125, -3.2614517211914062, -5.9279937744140625, 0.9823513031005859, 15.479949951171875, -0.97406005859375, -4.383689880371094], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000352.npy"}
{"epoch": 0.5321239606953893, "step": 353, "batch_size": 64, "mean": 7.737872123718262, "std": 13.19715690612793, "min": -39.59942626953125, "p10": -5.616711807250976, "median": 6.250591278076172, "p90": 27.08977737426758, "max": 33.49082946777344, "pos_frac": 0.734375, "sample": [21.598846435546875, 33.49082946777344, -7.799503326416016, 12.707298278808594, -12.25467300415039, 8.755725860595703, 6.830085754394531, -11.356090545654297, 11.509185791015625, 18.618141174316406, 2.9178085327148438, 6.1991424560546875, 11.388629913330078, -2.282470703125, 32.703407287597656, 5.659130096435547, 4.9724578857421875, 22.284637451171875, 27.137290954589844, 25.735748291015625, 32.88963317871094, 12.57177734375, 0.5967483520507812, 10.660114288330078, -2.3575057983398438, 0.17696380615234375, 18.59627914428711, 11.899337768554688, 26.978912353515625, -1.7995262145996094, 3.7344131469726562, -1.2889633178710938, 5.359447479248047, -2.1051025390625, -39.59942626953125, 20.58403778076172, -10.565183639526367, -5.815792083740234, 29.300373077392578, 17.94342613220215, 7.1480560302734375, 14.92498779296875, 33.03783416748047, 7.0196990966796875, -5.152191162109375, 6.3859405517578125, 8.869438171386719, -1.3015403747558594, -4.079017639160156, 4.101890563964844, 2.4392738342285156, 4.6875762939453125, 3.3068466186523438, 6.302040100097656, 2.3075180053710938, 4.844245910644531, 8.157005310058594, 0.5112457275390625, -3.7116546630859375, -5.918495178222656, 20.787460327148438, -1.6935882568359375, 28.986610412597656, 6.687015533447266], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000353.npy"}
{"epoch": 0.5336356764928194, "step": 354, "batch_size": 64, "mean": 8.035799026489258, "std": 12.396102905273438, "min": -18.761821746826172, "p10": -8.033920288085936, "median": 7.694334030151367, "p90": 24.495415115356447, "max": 38.985015869140625, "pos_frac": 0.75, "sample": [38.985015869140625, 20.564895629882812, 0.17952728271484375, 4.181240081787109, -8.644577026367188, -1.3271331787109375, 7.737236022949219, 14.951873779296875, 18.890892028808594, 0.9005966186523438, 6.589607238769531, 15.864500045776367, 17.859012603759766, 2.4400634765625, 5.857814788818359, 1.623331069946289, 9.591400146484375, -6.6090545654296875, 19.3258056640625, 28.889480590820312, 27.071033477783203, 9.457244873046875, 2.57391357421875, 10.092918395996094, -15.46917724609375, -3.7968215942382812, -18.761821746826172, 25.88558578491211, 10.198604583740234, 15.751419067382812, 14.663105010986328, 24.213905334472656, 10.048297882080078, 13.446136474609375, 18.135986328125, 12.586807250976562, -4.0620880126953125, -13.322656631469727, 16.7447509765625, -4.025764465332031, 12.863075256347656, 24.61606216430664, -10.374870300292969, -14.56850814819336, 29.621414184570312, -16.38582992553711, 8.432388305664062, 28.99956512451172, 7.651432037353516, -3.9897689819335938, 2.4901962280273438, 7.145275115966797, 21.99502182006836, 15.108474731445312, 12.551605224609375, 2.571460723876953, -0.6055755615234375, 5.653738021850586, -0.7411727905273438, 5.004066467285156, 2.8274784088134766, 20.67953872680664, -0.7353897094726562, 4.198581695556641], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000354.npy"}
{"epoch": 0.5351473922902494, "step": 355, "batch_size": 64, "mean": 8.830789566040039, "std": 12.520605087280273, "min": -16.623456954956055, "p10": -5.991912269592284, "median": 6.314909934997559, "p90": 27.85730571746827, "max": 36.45831298828125, "pos_frac": 0.765625, "sample": [5.542987823486328, 19.67779541015625, -4.969390869140625, 13.311767578125, 14.079864501953125, 21.900348663330078, 5.533962249755859, -6.430135726928711, -0.6074714660644531, 12.702552795410156, 10.248016357421875, 3.2547836303710938, 8.142293930053711, 1.2194061279296875, 6.248741149902344, 8.086029052734375, -0.4513397216796875, 11.275184631347656, -15.16256332397461, 3.9697113037109375, -6.8925018310546875, 13.045265197753906, 21.29218292236328, 13.389602661132812, 24.056903839111328, 3.9606399536132812, 4.5117340087890625, 11.863988876342773, 24.619281768798828, 29.034927368164062, -9.535911560058594, 14.862701416015625, 2.768556594848633, 36.45831298828125, 2.4704208374023438, -13.182807922363281, 14.385147094726562, -1.807220458984375, -1.8090667724609375, 12.167186737060547, 2.94293212890625, -4.07928466796875, 34.941436767578125, 2.9797210693359375, 6.35723876953125, 25.8280029296875, 3.497161865234375, -0.07456207275390625, 33.79509735107422, -16.623456954956055, 28.727006912231445, 18.816696166992188, 16.030441284179688, 5.6997833251953125, 34.86853790283203, -8.463031768798828, 3.9986228942871094, 0.8361549377441406, 33.27993392944336, -4.964515686035156, 9.778396606445312, 6.272581100463867, 8.409584045410156, 9.084136962890625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000355.npy"}
{"epoch": 0.5366591080876795, "step": 356, "batch_size": 64, "mean": 7.250809669494629, "std": 14.034934997558594, "min": -29.560653686523438, "p10": -9.879360961914061, "median": 7.614856719970703, "p90": 26.246414184570327, "max": 38.664276123046875, "pos_frac": 0.71875, "sample": [19.095460891723633, 22.758216857910156, -2.5411529541015625, -18.146018981933594, 19.579755783081055, 12.525901794433594, -1.2411117553710938, 22.168075561523438, 11.21468734741211, 0.5033702850341797, -24.17354965209961, -29.560653686523438, 11.954544067382812, -6.539581298828125, 28.329235076904297, 32.57106018066406, 8.700050354003906, 4.227684020996094, 17.899276733398438, -2.080089569091797, 20.594635009765625, -23.160383224487305, 13.770957946777344, 19.889389038085938, 8.045440673828125, 27.741355895996094, 0.40048980712890625, -9.392471313476562, 3.1040992736816406, 7.184272766113281, 5.563026428222656, 5.085636138916016, 1.4654045104980469, 1.2104320526123047, -11.391090393066406, 12.21697998046875, 8.234138488769531, 0.12192535400390625, 16.751731872558594, 16.090713500976562, 2.337545394897461, 8.501480102539062, 11.846588134765625, -10.088027954101562, -0.7921142578125, 1.0216407775878906, 18.344223022460938, 5.690460205078125, -2.1269073486328125, 10.078094482421875, -1.8296279907226562, 5.326377868652344, 32.275665283203125, 31.10419464111328, -1.8480453491210938, 16.822284698486328, 8.726076126098633, -0.7373580932617188, 28.404617309570312, -4.53515625, -14.016777038574219, 38.664276123046875, 11.956279754638672, 18.1541748046875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000356.npy"}
{"epoch": 0.5381708238851096, "step": 357, "batch_size": 64, "mean": 6.760571479797363, "std": 12.134714126586914, "min": -19.32452392578125, "p10": -8.765421295166014, "median": 6.369747161865234, "p90": 21.532076263427737, "max": 38.826133728027344, "pos_frac": 0.734375, "sample": [20.067359924316406, -1.6804618835449219, 27.45256805419922, 8.029563903808594, -11.993099212646484, 38.826133728027344, 14.052520751953125, 2.0967636108398438, 12.371696472167969, -2.3019256591796875, -9.046463012695312, 19.943214416503906, -8.109657287597656, 6.1599578857421875, 34.60149383544922, 11.542098999023438, 23.13968849182129, 7.7386932373046875, 9.300174713134766, 7.871662139892578, 8.130325317382812, 6.579536437988281, 5.702583312988281, 4.181894302368164, 8.763961791992188, 7.613315582275391, -0.6276607513427734, -2.999645233154297, -19.030975341796875, 1.8052597045898438, 1.2005157470703125, 13.37978744506836, -11.716232299804688, 7.743431091308594, 19.763778686523438, 22.996612548828125, 20.15601348876953, 2.0759429931640625, 13.498443603515625, 30.309650421142578, -2.1646041870117188, 20.81409454345703, -6.113964080810547, 4.3043975830078125, 6.83544921875, 21.83978271484375, -0.5296115875244141, 0.4019012451171875, -5.37542724609375, 0.6485366821289062, 1.2771739959716797, 13.747116088867188, 17.642330169677734, -0.32096099853515625, 20.32134246826172, 0.17079925537109375, 2.646045684814453, 6.9400787353515625, 16.388683319091797, -12.682697296142578, -11.129638671875, -19.32452392578125, 5.821434020996094, 0.9303445816040039], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000357.npy"}
{"epoch": 0.5396825396825397, "step": 358, "batch_size": 64, "mean": 9.095544815063477, "std": 12.209330558776855, "min": -18.46653175354004, "p10": -4.685739517211912, "median": 8.488937377929688, "p90": 26.732327270507813, "max": 38.27070617675781, "pos_frac": 0.703125, "sample": [9.490848541259766, 12.166709899902344, 15.865249633789062, 21.586868286132812, -0.24158287048339844, 23.13317108154297, 14.966323852539062, -5.329566955566406, -1.3436737060546875, -1.2550811767578125, -1.0034332275390625, 17.427637100219727, 6.2494049072265625, 14.010749816894531, 22.725902557373047, -0.5270767211914062, -3.046875, 0.6828460693359375, 7.4754638671875, 16.12381362915039, 10.1312255859375, 4.6007080078125, 4.2673187255859375, 26.1656494140625, 17.671138763427734, 0.3586273193359375, 5.982566833496094, -1.9652328491210938, -6.69219970703125, 1.1922988891601562, -6.652717590332031, 11.388442993164062, -9.92938232421875, 5.98248291015625, -0.271392822265625, -0.2563323974609375, 22.10773468017578, 28.4163818359375, 3.8181076049804688, -3.1834754943847656, 31.15587615966797, 15.368072509765625, 22.668487548828125, 5.139091491699219, 38.27070617675781, 35.41749572753906, 10.320789337158203, -7.443885803222656, 7.693943023681641, 26.975189208984375, -1.9011077880859375, -18.46653175354004, -12.14908218383789, 27.747974395751953, 1.3287200927734375, 16.023406982421875, -2.8196754455566406, 9.283931732177734, 10.743743896484375, 14.80960464477539, 11.93606948852539, 16.127471923828125, 10.327857971191406, 31.26708984375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000358.npy"}
{"epoch": 0.5411942554799698, "step": 359, "batch_size": 64, "mean": 4.953818321228027, "std": 11.911177635192871, "min": -25.737396240234375, "p10": -6.854681015014648, "median": 4.649274826049805, "p90": 20.717321586608893, "max": 34.413909912109375, "pos_frac": 0.640625, "sample": [2.1492919921875, 22.538345336914062, 0.2869377136230469, 13.548614501953125, 18.772132873535156, 1.8285293579101562, -6.926395416259766, 5.542642593383789, -3.8538169860839844, -25.737396240234375, 34.413909912109375, 4.683803558349609, 5.6269378662109375, 16.490402221679688, -3.131183624267578, 3.5397796630859375, 18.132966995239258, -4.3201904296875, 5.739044189453125, -18.311677932739258, 0.6433563232421875, 13.380439758300781, 5.009307861328125, -18.511138916015625, 6.294708251953125, 6.5785980224609375, 12.260953903198242, -1.0484466552734375, 6.281524658203125, -14.597476959228516, 24.632110595703125, 26.1690673828125, -4.9462127685546875, 8.239849090576172, -2.4887619018554688, 1.7194061279296875, 13.773555755615234, 11.315559387207031, -0.8595619201660156, 4.61474609375, -7.968936920166016, 13.158428192138672, 29.451335906982422, 10.0443115234375, -4.046638488769531, -2.30755615234375, -4.453460693359375, -6.687347412109375, 1.0875129699707031, 6.549171447753906, 4.148700714111328, -6.507730484008789, -6.576171875, -1.1068344116210938, 13.228157043457031, -4.956089019775391, 21.550973892211914, 14.345670700073242, 12.942352294921875, 12.167694091796875, 9.990711212158203, 32.36864471435547, -7.2994537353515625, -1.5533447265625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000359.npy"}
{"epoch": 0.5427059712773998, "step": 360, "batch_size": 64, "mean": 7.733197212219238, "std": 12.033853530883789, "min": -11.724525451660156, "p10": -4.967449569702148, "median": 5.368485450744629, "p90": 26.33492965698242, "max": 40.893821716308594, "pos_frac": 0.6875, "sample": [5.55767822265625, 40.893821716308594, 9.32421875, 0.0999908447265625, 9.080825805664062, 13.950103759765625, -5.349491119384766, 12.899253845214844, -4.858467102050781, 32.4600830078125, -0.7166767120361328, -6.0413818359375, -4.59521484375, 0.2787933349609375, 17.02123260498047, 6.0871429443359375, 28.16326904296875, -11.724525451660156, 5.41160774230957, 24.355316162109375, 1.0931396484375, -1.002349853515625, -3.9012908935546875, 0.894195556640625, -1.2016525268554688, 26.024837493896484, 9.9332275390625, 7.239158630371094, 15.54922103881836, -1.7687835693359375, 5.3253631591796875, 3.0111122131347656, 5.624961853027344, -3.9700469970703125, -5.434272766113281, 15.302154541015625, -1.205291748046875, 9.85357666015625, 4.819541931152344, -1.5851058959960938, 35.294212341308594, -3.2024078369140625, 7.180591583251953, -5.800762176513672, 19.090118408203125, 10.573013305664062, 31.57677459716797, 10.656356811523438, 4.511129379272461, 26.40843963623047, 25.476890563964844, 7.8226470947265625, 29.81515121459961, -5.014156341552734, 1.572601318359375, 9.748977661132812, -1.5984458923339844, -10.228950500488281, 5.108554840087891, 6.222740173339844, 3.3040313720703125, -0.7195625305175781, 4.063987731933594, 26.163406372070312], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000360.npy"}
{"epoch": 0.54421768707483, "step": 361, "batch_size": 64, "mean": 8.81015396118164, "std": 12.78119945526123, "min": -21.246353149414062, "p10": -2.849236488342285, "median": 7.190067291259766, "p90": 26.105733871459964, "max": 50.386871337890625, "pos_frac": 0.75, "sample": [-2.7070770263671875, 11.573097229003906, -2.9101619720458984, -4.7431230545043945, 18.967987060546875, 1.70953369140625, -1.64837646484375, -12.707195281982422, 4.469139099121094, 9.445598602294922, 1.9327545166015625, -1.8462295532226562, -1.73486328125, 16.815364837646484, 19.280685424804688, 8.491497039794922, -7.381420135498047, -2.536041259765625, 5.4700775146484375, 1.47412109375, 18.303741455078125, 27.14647674560547, 1.373291015625, 29.157546997070312, 14.776641845703125, 4.933513641357422, 17.02517318725586, -21.246353149414062, 28.704055786132812, 3.2192535400390625, 8.792560577392578, 3.8874359130859375, 12.435840606689453, 39.795692443847656, 19.09173583984375, 17.780715942382812, 22.948692321777344, 3.3428916931152344, -9.98583984375, -1.537445068359375, 2.537067413330078, -1.5609130859375, 29.272418975830078, 14.60195541381836, 19.535911560058594, 25.566936492919922, 11.115543365478516, -0.039764404296875, 10.527816772460938, 13.157394409179688, 19.668540954589844, 5.063133239746094, 7.663585662841797, 1.2406387329101562, 50.386871337890625, 5.518280029296875, 26.336647033691406, 6.7991485595703125, -16.387786865234375, 8.262519836425781, 3.706829071044922, 13.315339088439941, 7.580986022949219, -1.3802337646484375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000361.npy"}
{"epoch": 0.54572940287226, "step": 362, "batch_size": 64, "mean": 6.977950096130371, "std": 12.613837242126465, "min": -22.717670440673828, "p10": -5.0644897460937495, "median": 5.9539031982421875, "p90": 23.128878021240237, "max": 56.09161376953125, "pos_frac": 0.703125, "sample": [-3.241352081298828, 9.636299133300781, 0.575653076171875, -1.7246379852294922, 4.352428436279297, -22.717670440673828, 14.953926086425781, 13.258636474609375, 56.09161376953125, 7.6727142333984375, -0.76275634765625, -1.378448486328125, 17.624160766601562, 4.479835510253906, -3.3884525299072266, 8.941577911376953, 1.6841964721679688, 9.570648193359375, -1.2688827514648438, 24.172988891601562, 0.2720184326171875, 3.7572784423828125, 4.912544250488281, -4.67236328125, 22.957557678222656, 15.174667358398438, 25.31734848022461, -16.18048095703125, -9.168296813964844, 2.2962989807128906, 30.060211181640625, 7.089670181274414, -2.0565261840820312, 12.254966735839844, -1.141815185546875, 23.202301025390625, 6.746150970458984, -10.246734619140625, 2.4772605895996094, -1.6858901977539062, 2.7315673828125, 24.193832397460938, 19.82305908203125, 10.91910171508789, 3.8552780151367188, 19.188201904296875, 14.905487060546875, 12.845382690429688, 13.973390579223633, -4.175300598144531, -5.2325439453125, 13.60162353515625, 11.008247375488281, -13.772453308105469, 5.749725341796875, 6.8348236083984375, 20.051315307617188, -11.898818969726562, -4.36358642578125, 6.1580810546875, 29.787841796875, 4.048759460449219, 8.649810791015625, 7.807334899902344], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000362.npy"}
{"epoch": 0.54724111866969, "step": 363, "batch_size": 64, "mean": 8.87706470489502, "std": 10.75987720489502, "min": -14.242706298828125, "p10": -2.1358970642089825, "median": 6.669378280639648, "p90": 26.107922363281265, "max": 37.12135314941406, "pos_frac": 0.859375, "sample": [4.382946014404297, -7.751701354980469, 30.091434478759766, 8.620841979980469, 0.9837646484375, 2.9244384765625, 6.151782989501953, 35.17533874511719, 6.4787139892578125, 1.2738780975341797, 2.3114471435546875, 16.36865234375, 20.083251953125, 22.291290283203125, 5.611564636230469, 37.12135314941406, 0.327880859375, 7.125673294067383, 3.30712890625, 19.163963317871094, 8.462409973144531, 10.949657440185547, 6.860042572021484, 19.401077270507812, 15.275184631347656, 13.614715576171875, 4.3597412109375, 8.763137817382812, -14.242706298828125, 0.5294818878173828, -5.045986175537109, -7.9204559326171875, 6.928733825683594, -6.583953857421875, 27.743621826171875, 7.469535827636719, 11.61734390258789, 4.946910858154297, 12.442045211791992, -0.3141622543334961, 4.434814453125, 3.8374900817871094, 21.145294189453125, 29.529518127441406, -2.8632583618164062, 28.395957946777344, 21.571311950683594, 7.210601806640625, 15.235307693481445, 1.3092041015625, 28.063446044921875, 2.540576934814453, 9.391660690307617, 2.328022003173828, -4.828067779541016, -0.438720703125, 10.662002563476562, 2.818899154663086, 14.635391235351562, 13.636035919189453, 0.8200302124023438, 2.0807037353515625, 3.9416351318359375, 5.374289512634277], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000363.npy"}
{"epoch": 0.5487528344671202, "step": 364, "batch_size": 64, "mean": 9.481189727783203, "std": 14.140077590942383, "min": -18.433868408203125, "p10": -6.859275054931641, "median": 5.819648742675781, "p90": 26.140785217285156, "max": 44.3939208984375, "pos_frac": 0.75, "sample": [28.925437927246094, 0.5934638977050781, 3.765655517578125, 42.852508544921875, 25.57404327392578, -2.6121444702148438, 4.077304840087891, 11.593460083007812, -1.5279693603515625, 21.632125854492188, 26.0491943359375, 5.624908447265625, -7.866535186767578, -1.0822601318359375, 24.587326049804688, 37.110328674316406, 1.566070556640625, 6.281059265136719, 44.3939208984375, 3.4561691284179688, 16.701934814453125, 5.0302276611328125, 2.4200515747070312, 25.638778686523438, -18.433868408203125, -16.024465560913086, 26.180038452148438, -0.13458633422851562, 2.6147918701171875, 11.053291320800781, 14.312812805175781, 22.328826904296875, 4.21246337890625, 9.510574340820312, 2.6060256958007812, 8.797981262207031, 24.951915740966797, 19.1734619140625, -5.646171569824219, -10.488269805908203, 3.6751441955566406, 6.0143890380859375, -5.108757019042969, -6.959381103515625, 17.6500244140625, 2.8219375610351562, -3.1104965209960938, 17.891464233398438, 10.440513610839844, 39.005882263183594, -6.625694274902344, 17.288612365722656, -14.887832641601562, 20.093826293945312, -10.676445007324219, 9.017974853515625, 10.151641845703125, 22.15557098388672, 5.583232879638672, 2.8840713500976562, 4.3358917236328125, -1.7514877319335938, 29.094932556152344, 18.011253356933594], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000364.npy"}
{"epoch": 0.5502645502645502, "step": 365, "batch_size": 64, "mean": 7.240961074829102, "std": 9.391875267028809, "min": -11.02386474609375, "p10": -3.5433181762695307, "median": 6.368405342102051, "p90": 19.733749389648438, "max": 32.13911819458008, "pos_frac": 0.734375, "sample": [2.3670883178710938, 11.935348510742188, 12.555831909179688, 6.399515151977539, 4.926002502441406, 2.744424819946289, 12.415939331054688, 16.05518341064453, 18.948593139648438, -1.801361083984375, 11.98846435546875, 3.286855697631836, -10.681915283203125, 7.074001312255859, 9.235992431640625, 21.96999740600586, -2.7195186614990234, 12.534332275390625, -1.972015380859375, 6.74334716796875, 4.8981781005859375, 3.81219482421875, 18.371421813964844, -3.2215919494628906, -1.5991458892822266, -7.066871643066406, 11.286529541015625, 27.710487365722656, 1.497894287109375, 14.989086151123047, -7.6660614013671875, 2.8158416748046875, 12.73470687866211, 3.8942832946777344, 11.704254150390625, 20.257617950439453, 14.764610290527344, -1.2007017135620117, -0.670684814453125, 6.301338195800781, 20.18487548828125, 10.709800720214844, 19.32823944091797, 5.10955810546875, 4.9625701904296875, 21.82611083984375, 4.320819854736328, -7.945949554443359, 16.334716796875, 7.039203643798828, 19.90753936767578, -3.246654510498047, 7.726596832275391, -11.02386474609375, 10.71230697631836, 5.4425506591796875, -3.670459747314453, -6.8714447021484375, 6.3372955322265625, -1.3106231689453125, 32.13911819458008, 10.771736145019531, -1.130340576171875, 18.1483154296875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000365.npy"}
{"epoch": 0.5517762660619804, "step": 366, "batch_size": 64, "mean": 6.592994689941406, "std": 12.338757514953613, "min": -17.159103393554688, "p10": -7.236982727050781, "median": 4.95225715637207, "p90": 24.985643386840827, "max": 34.361846923828125, "pos_frac": 0.6875, "sample": [1.1508541107177734, 26.185791015625, 17.867008209228516, 7.779228210449219, -0.632659912109375, 21.360963821411133, -8.536529541015625, -6.271251678466797, -12.188446044921875, -4.999473571777344, -3.8114280700683594, 12.127731323242188, 12.151432037353516, 7.156044006347656, 9.756172180175781, 0.0121002197265625, -15.0482177734375, 4.2679595947265625, 6.332548141479492, 25.647621154785156, 6.686920166015625, 0.4550285339355469, 4.2865753173828125, -2.7908554077148438, -2.380859375, 7.819301605224609, 2.0494918823242188, 23.441028594970703, -8.084915161132812, 5.178081512451172, 34.361846923828125, -5.842262268066406, -5.584560394287109, 28.863601684570312, -17.159103393554688, 1.8051986694335938, 6.169197082519531, -1.7523574829101562, 10.251705169677734, 3.2648773193359375, 2.1601314544677734, -5.16124153137207, 3.7834911346435547, 2.8744354248046875, -7.549140930175781, 4.726432800292969, -0.09510421752929688, 11.450836181640625, 32.686702728271484, 9.138916015625, 6.10595703125, 29.79840850830078, 16.693817138671875, -15.970794677734375, 17.376434326171875, -6.508613586425781, 32.73078918457031, 8.756683349609375, 16.91565704345703, 21.857269287109375, 21.172439575195312, 11.31454849243164, 17.099287033081055, -0.7510566711425781], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000366.npy"}
{"epoch": 0.5532879818594104, "step": 367, "batch_size": 64, "mean": 5.910340309143066, "std": 9.948771476745605, "min": -23.89938735961914, "p10": -5.185955047607421, "median": 6.151222229003906, "p90": 18.31276206970215, "max": 27.132980346679688, "pos_frac": 0.6875, "sample": [7.825529098510742, 23.482696533203125, 13.389537811279297, -3.041271209716797, 5.4533538818359375, -2.432262420654297, 18.30093002319336, 23.752796173095703, -0.27088165283203125, 5.158090591430664, 1.7589263916015625, 5.2242431640625, 6.40472412109375, 2.36700439453125, 27.132980346679688, 11.305120468139648, 12.737617492675781, -7.0544586181640625, -1.7141380310058594, 18.317832946777344, -1.97320556640625, 8.902053833007812, -8.906719207763672, -7.087909698486328, 6.04241943359375, -12.220096588134766, -0.63128662109375, 12.811264038085938, 11.117329597473145, 22.999649047851562, 15.471206665039062, 11.047330856323242, 8.557937622070312, 6.2600250244140625, 7.175025939941406, 0.2861919403076172, 1.3410720825195312, -3.2910385131835938, 6.656047821044922, 12.655448913574219, -1.7213630676269531, -23.89938735961914, 0.01470184326171875, 17.15984344482422, 5.669673919677734, -5.4881591796875, 6.283271789550781, 13.10201644897461, 6.008308410644531, -2.7317981719970703, 8.743270874023438, 3.4022140502929688, -12.120742797851562, 17.8673095703125, 22.392791748046875, 16.425735473632812, -4.185634613037109, -3.2178564071655273, 13.468505859375, 9.723148345947266, -4.480812072753906, 11.478137969970703, -0.7114410400390625, 19.76893424987793], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000367.npy"}
{"epoch": 0.5547996976568406, "step": 368, "batch_size": 64, "mean": 6.968982219696045, "std": 11.919901847839355, "min": -17.10418701171875, "p10": -8.756368255615234, "median": 6.333415985107422, "p90": 23.411914253234865, "max": 33.298377990722656, "pos_frac": 0.6875, "sample": [-14.882259368896484, 12.696060180664062, -4.392219543457031, 9.602882385253906, -0.730377197265625, 6.625457763671875, 25.38958168029785, 7.400245666503906, 1.9787063598632812, 0.421539306640625, 13.787109375, -6.3023223876953125, -2.4366607666015625, 13.259231567382812, 28.124897003173828, -7.7986907958984375, -1.839447021484375, -10.106201171875, -3.5169906616210938, 22.653297424316406, -1.391845703125, 8.579574584960938, 14.063148498535156, -11.720039367675781, 23.18851661682129, 7.625932693481445, -9.138626098632812, 15.437393188476562, 33.298377990722656, 23.50765609741211, 0.9653129577636719, 18.425750732421875, 26.62833023071289, 5.204389572143555, 13.016510009765625, 14.372032165527344, 4.634010314941406, 10.507911682128906, 1.915771484375, 4.948875427246094, 6.041374206542969, -0.6390380859375, 29.936996459960938, 19.691085815429688, -9.409778594970703, -3.6640548706054688, 11.23508071899414, -10.350715637207031, 9.662666320800781, -6.038818359375, 19.415328979492188, 15.853492736816406, 27.71978759765625, 16.966537475585938, 5.745166778564453, 6.0009765625, 5.341796875, -7.864433288574219, -3.5052127838134766, 6.8074798583984375, 19.943222045898438, 9.08453369140625, -17.10418701171875, 1.14276123046875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000368.npy"}
{"epoch": 0.5563114134542706, "step": 369, "batch_size": 64, "mean": 6.299734592437744, "std": 12.26999282836914, "min": -21.394088745117188, "p10": -7.717792510986328, "median": 3.8108654022216797, "p90": 20.28699951171875, "max": 37.44898986816406, "pos_frac": 0.765625, "sample": [32.877159118652344, -11.2777099609375, 1.8899497985839844, 13.0213623046875, -16.742820739746094, 19.709747314453125, 17.673908233642578, 6.89239501953125, 9.560310363769531, 3.200408935546875, 2.461395263671875, 20.304046630859375, 23.241985321044922, 15.2772216796875, 25.58353614807129, 3.7183799743652344, 0.5498323440551758, 6.8406982421875, 0.1701812744140625, 4.158233642578125, 19.86058807373047, -21.394088745117188, 8.435440063476562, -7.785833358764648, 2.8534774780273438, 15.946456909179688, -7.48614501953125, 18.381078720092773, 9.322868347167969, -7.559030532836914, 20.247222900390625, -0.8740692138671875, 4.585380554199219, 22.296180725097656, 13.82468032836914, 19.415931701660156, 36.410770416259766, 1.2189483642578125, 1.806549072265625, -6.9871673583984375, -3.3755340576171875, 1.9415550231933594, 2.0162277221679688, 12.927764892578125, 1.2461280822753906, 0.90313720703125, 3.903350830078125, 10.602149963378906, 11.1597900390625, -3.984649658203125, 0.6694183349609375, 0.5474777221679688, 5.130611419677734, 16.20691680908203, -3.0970077514648438, -10.078737258911133, 0.475372314453125, 37.44898986816406, 0.5896148681640625, 14.102920532226562, -5.254337310791016, -16.811752319335938, -10.095863342285156, 14.380002975463867], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000369.npy"}
{"epoch": 0.5578231292517006, "step": 370, "batch_size": 64, "mean": 9.195744514465332, "std": 13.537933349609375, "min": -23.84021759033203, "p10": -4.896449851989746, "median": 7.670658111572266, "p90": 28.652903747558593, "max": 39.98835754394531, "pos_frac": 0.734375, "sample": [4.2884979248046875, 28.04418182373047, 21.305583953857422, 6.2172393798828125, 37.96125793457031, 4.824924468994141, 39.98835754394531, -0.950225830078125, 28.90106964111328, 10.517383575439453, 4.498802185058594, -0.9200210571289062, 0.6868209838867188, 14.390647888183594, 31.05291748046875, 7.782073974609375, 5.7675628662109375, -11.585227966308594, -4.544059753417969, 16.370132446289062, -6.591461181640625, 16.699630737304688, 21.320785522460938, 12.737396240234375, 21.353294372558594, 7.681427001953125, 20.24466323852539, -8.38986587524414, 8.484024047851562, 1.3523406982421875, 1.0264358520507812, -4.5419921875, -2.2685699462890625, -4.962921142578125, -2.5258712768554688, 29.749176025390625, 34.41532897949219, 11.482734680175781, 28.67957305908203, 15.144271850585938, 21.99866485595703, -20.48583984375, 16.50006866455078, 3.153228759765625, 16.7315673828125, 28.590675354003906, 7.0413665771484375, 16.63959503173828, 15.583412170410156, 16.125511169433594, -23.84021759033203, 7.676006317138672, 13.480655670166016, -4.157208442687988, 7.665309906005859, 0.2288360595703125, 6.3674163818359375, -1.2412796020507812, -4.741350173950195, -1.0792312622070312, 4.7115478515625, 23.773361206054688, -11.796989440917969, 3.914234161376953], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000370.npy"}
{"epoch": 0.5593348450491308, "step": 371, "batch_size": 64, "mean": 8.255373001098633, "std": 13.574396133422852, "min": -23.39449691772461, "p10": -9.546393966674804, "median": 6.722284317016602, "p90": 26.28198223114015, "max": 41.764427185058594, "pos_frac": 0.75, "sample": [22.27111053466797, -23.39449691772461, 38.554222106933594, 17.010581970214844, 28.094993591308594, 2.408905029296875, 14.554901123046875, -1.2253684997558594, 4.8789825439453125, 0.0009765625, 3.464324951171875, 16.343833923339844, -9.29830551147461, 37.9921760559082, 7.251447677612305, 13.436721801757812, 0.8682346343994141, 12.41965103149414, 5.883787155151367, 28.88043975830078, 1.8067474365234375, -16.209381103515625, 2.943714141845703, 7.231311798095703, 18.170082092285156, 27.977365493774414, 15.982990264892578, 41.764427185058594, 13.554183959960938, 17.894954681396484, 7.2961273193359375, 17.848461151123047, -11.313148498535156, -8.187454223632812, 6.121417999267578, 6.2132568359375, 5.3669891357421875, -10.856739044189453, -6.25335693359375, 6.158699035644531, -5.533649444580078, 21.738662719726562, -6.463739395141602, 5.756431579589844, 16.413612365722656, -6.542015075683594, 13.244426727294922, 11.53390121459961, -10.772926330566406, 18.430419921875, 4.358894348144531, -13.165729522705078, 2.6909265518188477, 11.5242919921875, 19.953689575195312, -1.2962779998779297, 21.197547912597656, 8.078056335449219, -1.2372970581054688, 10.227005004882812, 22.326087951660156, 28.62140655517578, 3.0051193237304688, -9.652717590332031], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000371.npy"}
{"epoch": 0.5608465608465608, "step": 372, "batch_size": 64, "mean": 6.743547439575195, "std": 13.9678373336792, "min": -23.967838287353516, "p10": -11.1544038772583, "median": 5.927013397216797, "p90": 25.6711051940918, "max": 39.15068817138672, "pos_frac": 0.6875, "sample": [-12.48746109008789, 12.303279876708984, 18.993553161621094, -8.180122375488281, 11.586952209472656, 7.65460205078125, 22.665000915527344, -11.388534545898438, 15.406967163085938, 21.63208770751953, 26.246658325195312, 3.4190902709960938, 6.420234680175781, 35.30030822753906, 13.746322631835938, 8.358329772949219, 10.312225341796875, -18.544578552246094, 1.2556304931640625, 24.328147888183594, 6.627195358276367, 3.170257568359375, 1.5197219848632812, 29.55249786376953, -3.275583267211914, 5.4337921142578125, -9.463729858398438, 23.53894805908203, 39.15068817138672, 2.9413280487060547, 28.16888427734375, 18.129928588867188, -5.277973175048828, -6.910297393798828, -7.2315826416015625, 17.436235427856445, -23.967838287353516, 8.013078689575195, 20.99594497680664, 9.568557739257812, 21.897064208984375, 11.643341064453125, 14.60369873046875, 29.011978149414062, 4.50775146484375, 11.643383026123047, -7.170158386230469, 1.3511180877685547, 4.386054992675781, -15.510543823242188, 4.85467529296875, -14.503562927246094, 12.719482421875, 0.20415496826171875, -3.4800758361816406, -10.608098983764648, -14.959114074707031, -7.129524230957031, 17.436450958251953, 26.272335052490234, -1.4773063659667969, -3.54852294921875, -0.29564666748046875, 2.5893707275390625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000372.npy"}
{"epoch": 0.562358276643991, "step": 373, "batch_size": 64, "mean": 8.264881134033203, "std": 13.1200532913208, "min": -18.38949203491211, "p10": -7.248336029052733, "median": 5.578788757324219, "p90": 24.962340164184575, "max": 42.61045837402344, "pos_frac": 0.734375, "sample": [-8.279682159423828, 16.950353622436523, 19.056602478027344, 17.410125732421875, 13.637649536132812, -15.824825286865234, 2.7210311889648438, 19.790733337402344, 9.64410400390625, 12.085794448852539, 17.42269515991211, -9.502288818359375, 10.60107421875, -3.1189117431640625, -18.38949203491211, 1.2548561096191406, 7.578407287597656, -3.415130615234375, -5.444732666015625, 4.098369598388672, 4.5678253173828125, -3.5448856353759766, 12.664543151855469, -8.021308898925781, 12.598007202148438, 30.285293579101562, -2.339569091796875, 13.648818969726562, 3.553985595703125, 5.740879058837891, 29.546287536621094, 0.0375823974609375, -0.512908935546875, 32.2340087890625, -2.8063583374023438, 3.9789962768554688, 1.741912841796875, 0.14031982421875, -2.7353591918945312, 25.412677764892578, 20.802635192871094, -8.968414306640625, -4.571479797363281, -3.4502182006835938, 0.12158203125, 20.639209747314453, -11.118465423583984, 23.91155242919922, 1.8835315704345703, 10.359909057617188, 9.699760437011719, 40.901092529296875, 5.416698455810547, 42.61045837402344, 9.250677108764648, 18.282875061035156, 4.961662292480469, 23.385696411132812, 13.52752685546875, 33.51025390625, 8.074962615966797, 2.5898208618164062, 2.9499549865722656, 19.71368408203125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000373.npy"}
{"epoch": 0.563869992441421, "step": 374, "batch_size": 64, "mean": 10.52479362487793, "std": 12.468159675598145, "min": -16.590179443359375, "p10": -3.7696166992187496, "median": 9.559274673461914, "p90": 26.008670043945315, "max": 36.414154052734375, "pos_frac": 0.8125, "sample": [31.78350830078125, 14.359161376953125, 3.946136474609375, -5.5657806396484375, 0.6015548706054688, 20.153175354003906, 22.17807388305664, 25.408164978027344, 31.70372772216797, 11.213180541992188, 16.562149047851562, 1.08514404296875, 12.599967956542969, 34.115447998046875, 0.6501998901367188, -14.89947509765625, 29.147613525390625, 4.486763000488281, -0.20642852783203125, 0.96893310546875, -0.8414230346679688, 14.431726455688477, 6.705715179443359, -5.4485931396484375, 19.45598602294922, 16.54049301147461, 13.53354263305664, 26.266029357910156, -3.52618408203125, 9.109214782714844, 2.8770904541015625, 22.579574584960938, 7.80213737487793, 3.7725086212158203, 24.067703247070312, 9.825237274169922, 7.734626770019531, -16.590179443359375, 14.22845458984375, 12.116806030273438, 9.293312072753906, -13.639236450195312, -0.06437873840332031, 30.415512084960938, 36.414154052734375, 10.695602416992188, -9.226104736328125, -3.8463287353515625, 13.213752746582031, 0.8960037231445312, 25.196319580078125, 21.344226837158203, 24.419326782226562, 8.651359558105469, 18.813274383544922, 0.9498252868652344, 25.194046020507812, 14.5394287109375, 7.053981781005859, 23.891040802001953, 0.7361946105957031, 6.439300537109375, 0.8651123046875, -3.5906219482421875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000374.npy"}
{"epoch": 0.5653817082388511, "step": 375, "batch_size": 64, "mean": 6.696353435516357, "std": 11.158818244934082, "min": -21.136451721191406, "p10": -8.00295467376709, "median": 6.09950065612793, "p90": 21.126186370849613, "max": 37.64512634277344, "pos_frac": 0.734375, "sample": [7.62115478515625, 4.0045166015625, 1.7085189819335938, 16.86452293395996, 10.30337905883789, 4.7935638427734375, 24.098602294921875, -1.7110557556152344, 1.8667449951171875, 11.001068115234375, -0.8850860595703125, 16.211822509765625, -0.76934814453125, 22.048343658447266, 13.170490264892578, -3.360279083251953, 15.439071655273438, -5.3249969482421875, 7.538330078125, -8.163431167602539, 8.858901977539062, 12.025360107421875, 20.41901397705078, 1.06951904296875, 8.014263153076172, 22.29138946533203, 4.456119537353516, -8.21234130859375, 6.111492156982422, 12.389335632324219, 4.348533630371094, -11.689201354980469, -1.0113162994384766, -1.4613723754882812, 4.6816253662109375, 13.654220581054688, 5.898895263671875, 2.1846466064453125, 37.64512634277344, 1.3662605285644531, 21.42926025390625, 6.369508743286133, 18.87903594970703, -21.136451721191406, 7.790000915527344, -2.57647705078125, 29.966102600097656, 3.632232666015625, 17.21728515625, 7.067424774169922, 17.14612579345703, 6.0875091552734375, -12.200050354003906, 10.213232040405273, 4.874134063720703, 19.837169647216797, 9.727706909179688, -17.989376068115234, -3.0176048278808594, 16.386611938476562, -9.438796997070312, 22.88919448852539, 3.5449676513671875, -7.628509521484375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000375.npy"}
{"epoch": 0.5668934240362812, "step": 376, "batch_size": 64, "mean": 9.763787269592285, "std": 14.530304908752441, "min": -15.561798095703125, "p10": -10.827406883239746, "median": 6.666782379150391, "p90": 33.24780540466309, "max": 37.619224548339844, "pos_frac": 0.78125, "sample": [7.1580810546875, 10.177825927734375, 0.7925453186035156, 5.663566589355469, 12.396705627441406, -9.966299057006836, -2.9587860107421875, 29.931808471679688, 10.522476196289062, 2.8828392028808594, 24.54583740234375, 10.76738166809082, -15.54043197631836, -1.713205337524414, -12.242721557617188, -2.4571781158447266, 34.752540588378906, 8.228355407714844, 15.840415954589844, 30.64739990234375, 3.4310531616210938, 37.619224548339844, -1.8667373657226562, 10.62130355834961, 30.411048889160156, 1.74359130859375, 35.773155212402344, 34.762969970703125, 5.04443359375, 16.345672607421875, 24.10497283935547, 13.213249206542969, 28.015945434570312, 5.411705017089844, 4.271890640258789, 32.10203552246094, 10.79400634765625, -11.196453094482422, 33.772216796875, -12.055641174316406, 5.716468811035156, -7.482646942138672, 14.710685729980469, 5.108211517333984, 4.755867004394531, 34.959747314453125, -12.631301879882812, 6.697212219238281, -4.034446716308594, 21.868450164794922, -13.2122802734375, 1.6055488586425781, 7.129669189453125, 6.6363525390625, 33.73884963989258, 3.024005889892578, 9.80731201171875, 4.895011901855469, 25.924144744873047, 20.039512634277344, 3.0951004028320312, 4.9771728515625, 1.3667364120483398, -15.561798095703125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000376.npy"}
{"epoch": 0.5684051398337112, "step": 377, "batch_size": 64, "mean": 6.881415367126465, "std": 11.004210472106934, "min": -13.089496612548828, "p10": -7.455572891235351, "median": 8.794075965881348, "p90": 19.34631519317627, "max": 35.24320983886719, "pos_frac": 0.703125, "sample": [23.95838165283203, -6.76605224609375, 14.09164047241211, 18.506250381469727, 16.504226684570312, 2.0166854858398438, -6.6645050048828125, -9.68798828125, 1.8516616821289062, 18.92751121520996, 14.497573852539062, -4.59197998046875, 19.525802612304688, 9.801483154296875, 10.6826171875, 9.128265380859375, -8.191162109375, 9.720100402832031, -6.260459899902344, -0.41255950927734375, 21.971294403076172, -0.17059040069580078, 12.842781066894531, 11.879203796386719, 12.383106231689453, -8.312206268310547, 7.5081787109375, 25.582717895507812, 0.9300174713134766, -2.0730934143066406, 10.090400695800781, -1.802377700805664, 8.621126174926758, -7.5137786865234375, 10.072616577148438, -11.050796508789062, 7.969970703125, 10.733419418334961, 8.967025756835938, 9.97747802734375, -13.089496612548828, 17.979576110839844, 9.314176559448242, 4.877998352050781, 3.6086044311523438, 33.5800666809082, 10.165464401245117, -6.820224761962891, -7.319759368896484, 9.590362548828125, 18.57366943359375, 6.12548828125, 0.08952713012695312, 35.24320983886719, -2.8309707641601562, -5.561923980712891, 28.691043853759766, 11.067655563354492, -8.794452667236328, 3.421161651611328, 14.254886627197266, 2.0261688232421875, 13.969825744628906, 7.004554748535156], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000377.npy"}
{"epoch": 0.5699168556311414, "step": 378, "batch_size": 64, "mean": 5.823657989501953, "std": 11.806870460510254, "min": -26.43956756591797, "p10": -3.518217468261718, "median": 3.785022735595703, "p90": 17.074781036376955, "max": 50.319183349609375, "pos_frac": 0.75, "sample": [3.0299949645996094, -0.28830718994140625, 4.09025764465332, 3.8568344116210938, 4.1257171630859375, 16.95733642578125, -2.52880859375, 2.1875343322753906, 2.4069976806640625, 15.089141845703125, -26.43956756591797, 4.235725402832031, 32.2926025390625, 4.342227935791016, 17.12511444091797, 13.079357147216797, 1.8483009338378906, 5.582651138305664, 23.217636108398438, -2.3449783325195312, 14.858625411987305, 14.289566040039062, 3.2232666015625, 2.9210205078125, 14.535289764404297, 8.062728881835938, 3.0312728881835938, 13.898040771484375, -6.3353271484375, 0.4978618621826172, 26.842803955078125, 7.8676910400390625, -10.809654235839844, -2.597991943359375, 18.719757080078125, 7.041229248046875, 3.7132110595703125, 5.6668243408203125, 9.542961120605469, -5.779485702514648, 50.319183349609375, -2.8299560546875, 2.8097991943359375, 16.14547348022461, -1.185699462890625, 13.476730346679688, 2.7411231994628906, 24.378782272338867, -0.1085205078125, 6.261417388916016, 3.3602447509765625, 2.8041000366210938, 1.3892698287963867, -20.06592559814453, -3.8131866455078125, 13.261070251464844, -2.29925537109375, 14.578277587890625, 2.744924545288086, 6.4930572509765625, -1.5659751892089844, -22.89008331298828, 1.6017951965332031, 14.051994323730469], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000378.npy"}
{"epoch": 0.5714285714285714, "step": 379, "batch_size": 64, "mean": 7.418521881103516, "std": 11.618158340454102, "min": -17.741294860839844, "p10": -6.501731109619141, "median": 5.971610069274902, "p90": 18.294884872436523, "max": 39.75169372558594, "pos_frac": 0.75, "sample": [6.635082244873047, 4.6235504150390625, 18.1754150390625, 1.6911239624023438, 3.061431884765625, 14.837860107421875, 14.939735412597656, -17.741294860839844, 0.3858757019042969, 7.291027069091797, 0.8524017333984375, 0.9716796875, -9.428848266601562, 18.343292236328125, -1.598886489868164, 39.75169372558594, 3.995758056640625, -3.782665252685547, 21.562389373779297, 15.32550048828125, 38.408447265625, 5.233154296875, 18.181934356689453, 15.71075439453125, -6.570159912109375, -1.859710693359375, 13.677974700927734, -13.958694458007812, 14.197280883789062, -10.2896728515625, 17.222991943359375, 10.500839233398438, 5.987905502319336, 5.28265380859375, -6.83880615234375, 17.9862060546875, 12.934921264648438, 9.803295135498047, 1.475616455078125, -6.342063903808594, -14.453475952148438, 5.9445648193359375, 7.7855987548828125, 25.830713272094727, 3.1939620971679688, 5.755456924438477, 39.05701446533203, 11.996490478515625, 4.036705017089844, -1.3374176025390625, 5.955314636230469, -1.7775497436523438, -0.3514251708984375, -3.7829055786132812, 20.676990509033203, -1.148162841796875, 17.23479461669922, 11.413192749023438, 7.217628479003906, 7.318365097045898, 11.922576904296875, 13.940414428710938, 12.587478637695312, 5.1320953369140625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000379.npy"}
{"epoch": 0.5729402872260015, "step": 380, "batch_size": 64, "mean": 9.494940757751465, "std": 12.402969360351562, "min": -27.718101501464844, "p10": -5.131973457336422, "median": 9.021228790283203, "p90": 23.748677062988282, "max": 42.990509033203125, "pos_frac": 0.84375, "sample": [3.697420120239258, 14.250167846679688, 8.237335205078125, 6.7461090087890625, -15.566978454589844, -13.780960083007812, 30.28997802734375, 23.749656677246094, -1.6250381469726562, 16.592071533203125, 1.7644233703613281, -6.558601379394531, -1.8031749725341797, 22.873550415039062, 8.528205871582031, 4.034950256347656, -0.2908744812011719, 9.2337646484375, 13.49837875366211, 26.214698791503906, 12.480308532714844, 10.09954833984375, 4.3409271240234375, -17.27482795715332, 23.74639129638672, 4.565944671630859, 20.608503341674805, 17.474380493164062, -12.226568222045898, 5.720428466796875, 9.766319274902344, 7.6890869140625, 6.5816192626953125, 9.604530334472656, 22.51873779296875, 4.588258743286133, 30.638427734375, 19.87651824951172, 0.7734832763671875, 14.209495544433594, 17.705429077148438, -27.718101501464844, 25.936248779296875, 42.990509033203125, 10.376346588134766, 18.44476890563965, 4.365726470947266, 8.808692932128906, 3.5602264404296875, 21.839017868041992, 6.4914703369140625, 4.410707473754883, -11.643798828125, 26.37329864501953, 13.926025390625, 21.03771209716797, 8.153003692626953, 10.500396728515625, 17.85712242126465, 13.897811889648438, 4.8192901611328125, 10.046859741210938, 1.2692489624023438, 8.361602783203125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000380.npy"}
{"epoch": 0.5744520030234316, "step": 381, "batch_size": 64, "mean": 5.586208820343018, "std": 12.650764465332031, "min": -32.41886901855469, "p10": -8.096161270141602, "median": 4.321575164794922, "p90": 21.95101852416993, "max": 39.21417236328125, "pos_frac": 0.671875, "sample": [-7.972625732421875, 4.5841522216796875, -2.06400203704834, -12.646095275878906, -32.41886901855469, 6.870149612426758, 7.118579864501953, 5.022321701049805, 30.85470962524414, 11.948677062988281, 3.9001617431640625, 2.9745941162109375, 10.286602020263672, 10.03305435180664, 1.6966209411621094, 39.21417236328125, 24.448944091796875, -0.4138660430908203, -1.247222900390625, -3.113525390625, 5.815528869628906, -4.9661407470703125, 9.381217956542969, 17.124420166015625, -8.233909606933594, 11.464218139648438, 1.071868896484375, -17.146774291992188, 7.422351837158203, -2.227680206298828, 17.47599220275879, 17.816864013671875, 15.0849609375, -6.3683013916015625, -1.6118621826171875, 17.710529327392578, 12.09783935546875, -4.457851409912109, -8.149105072021484, 7.1768646240234375, 19.394203186035156, -13.158096313476562, 20.693702697753906, 0.49498748779296875, 5.736724853515625, 33.54109191894531, 10.436759948730469, 7.473609924316406, -0.9717979431152344, 2.896556854248047, 0.438751220703125, 11.457603454589844, 22.4898681640625, 3.9742889404296875, -0.008647918701171875, 2.7996292114257812, 22.517074584960938, -13.65118408203125, -4.7503662109375, 4.058998107910156, 6.221473693847656, -3.5359344482421875, 0.2141876220703125, 33.196319580078125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000381.npy"}
{"epoch": 0.5759637188208617, "step": 382, "batch_size": 64, "mean": 6.46773624420166, "std": 11.170952796936035, "min": -11.2239990234375, "p10": -7.086234283447265, "median": 5.182903289794922, "p90": 21.14046859741212, "max": 38.80378341674805, "pos_frac": 0.703125, "sample": [-8.46600341796875, 1.6146183013916016, 33.6654052734375, 4.578926086425781, 18.75176239013672, 0.5488662719726562, 9.34521484375, 0.24561309814453125, 27.528350830078125, 17.619770050048828, 2.3418426513671875, 12.836029052734375, -3.9550018310546875, 6.5280303955078125, 23.82305145263672, -8.090614318847656, -9.361331939697266, 3.737945556640625, -11.2239990234375, 12.571815490722656, -3.29995059967041, 7.702232360839844, 5.279296875, 0.5307693481445312, -3.9043350219726562, 4.4823455810546875, 3.5738372802734375, 11.614967346191406, -5.7389373779296875, 38.80378341674805, 11.528244018554688, 10.313316345214844, -10.126701354980469, -10.346797943115234, 22.164199829101562, -1.7394065856933594, -4.615776062011719, 7.147212982177734, 7.8181915283203125, 10.590187072753906, -0.9838600158691406, 32.652976989746094, 11.422607421875, 10.313255310058594, -0.52752685546875, 18.391036987304688, -7.371082305908203, -3.1040573120117188, -2.419281005859375, 4.024909973144531, 6.80828857421875, 5.086509704589844, -6.421588897705078, 12.321453094482422, 9.798973083496094, -0.4848060607910156, 10.03497314453125, 9.8162841796875, 8.781646728515625, 32.62554168701172, 13.300064086914062, 3.349517822265625, 3.6786346435546875, 6.423694610595703], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000382.npy"}
{"epoch": 0.5774754346182918, "step": 383, "batch_size": 64, "mean": 8.128961563110352, "std": 13.362638473510742, "min": -18.588531494140625, "p10": -6.16986083984375, "median": 5.979270935058594, "p90": 27.34063873291016, "max": 39.030479431152344, "pos_frac": 0.734375, "sample": [1.7380313873291016, 27.780014038085938, 25.2314453125, 6.2286376953125, -8.442268371582031, -7.2888946533203125, 13.386249542236328, 2.3558883666992188, 7.38824462890625, -4.101383209228516, 11.159805297851562, 2.938201904296875, 2.28363037109375, 15.870437622070312, 15.785614013671875, -1.12274169921875, 12.317188262939453, -5.188083648681641, 37.488990783691406, 12.737907409667969, 0.6931915283203125, -5.8388519287109375, 16.189926147460938, 4.25946044921875, 29.959428787231445, 6.0219268798828125, 3.0040512084960938, -2.412689208984375, 13.555002212524414, 32.089569091796875, 23.137168884277344, -3.7254638671875, -6.3117218017578125, 31.442657470703125, -4.078010559082031, 9.905574798583984, 21.422317504882812, -17.77325439453125, 0.86309814453125, 5.936614990234375, 4.183876037597656, 2.9381484985351562, 0.5516357421875, -2.4153099060058594, 14.012222290039062, 2.8139095306396484, 7.42071533203125, 8.925514221191406, -9.689010620117188, 31.82329559326172, 9.76521110534668, 39.030479431152344, 15.547622680664062, 1.5802459716796875, 0.5710678100585938, 21.106002807617188, 18.688018798828125, -5.393035888671875, -1.7899856567382812, -17.37891387939453, 26.3154296875, -18.588531494140625, 24.049041748046875, 19.298973083496094], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000383.npy"}
{"epoch": 0.5789871504157218, "step": 384, "batch_size": 64, "mean": 8.016000747680664, "std": 11.88563346862793, "min": -18.60382080078125, "p10": -4.497525787353515, "median": 5.4680376052856445, "p90": 26.22333145141602, "max": 39.96678161621094, "pos_frac": 0.78125, "sample": [-8.726715087890625, 34.389564514160156, 3.7053298950195312, 8.186492919921875, -0.0824127197265625, 13.695411682128906, 5.314117431640625, 21.44794464111328, 19.10851287841797, -2.4377517700195312, 4.8773345947265625, 3.52154541015625, 5.982078552246094, -4.751792907714844, 1.8033256530761719, 7.7516937255859375, 2.6894989013671875, 25.077590942382812, 1.231170654296875, 21.973682403564453, 7.9538726806640625, 11.710281372070312, 4.975334167480469, -12.911041259765625, -5.318002700805664, 10.4715576171875, 29.134857177734375, 1.4086990356445312, -12.568920135498047, 3.13775634765625, 32.26527404785156, 12.325675964355469, 1.963205337524414, 5.621957778930664, 14.284130096435547, 0.5651016235351562, 39.96678161621094, 0.3976001739501953, -18.60382080078125, 20.034465789794922, -3.45745849609375, 6.898002624511719, 28.186553955078125, -9.447444915771484, 26.71436309814453, 27.31409454345703, 9.668853759765625, 4.570932388305664, 5.192626953125, 18.664810180664062, 2.4400482177734375, -0.4106025695800781, 1.0660972595214844, 17.470550537109375, 7.500770568847656, -0.655120849609375, -3.90423583984375, 22.7257080078125, -0.35594940185546875, 8.75655746459961, 10.266830444335938, 5.052082061767578, 6.00775146484375, 11.1868896484375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000384.npy"}
{"epoch": 0.5804988662131519, "step": 385, "batch_size": 64, "mean": 8.53204345703125, "std": 11.34797191619873, "min": -19.492565155029297, "p10": -3.479680633544922, "median": 6.235572814941406, "p90": 24.53489685058594, "max": 33.78394317626953, "pos_frac": 0.78125, "sample": [-19.492565155029297, -3.178028106689453, 17.591144561767578, 19.870391845703125, 13.187980651855469, 1.606719970703125, 2.3539276123046875, 9.638187408447266, 5.512432098388672, 18.044509887695312, -3.5309410095214844, 12.239616394042969, 10.983390808105469, 20.39788818359375, 4.38165283203125, 31.214962005615234, 23.6273193359375, 20.148529052734375, 2.018829345703125, 5.109378814697266, -5.006721496582031, 0.291778564453125, -2.207366943359375, 13.066535949707031, 21.92145538330078, 24.923858642578125, 8.739875793457031, 5.8225250244140625, -3.3600730895996094, -0.45147705078125, 11.04888916015625, -6.061351776123047, 21.290260314941406, 5.9013671875, 4.890357971191406, 29.37812042236328, 11.131298065185547, 14.092853546142578, 6.940061569213867, 17.049728393554688, 0.331817626953125, -10.022697448730469, 6.461090087890625, 6.914337158203125, 0.6099624633789062, 27.465835571289062, 14.66424560546875, -1.4966278076171875, -1.1455421447753906, 0.6480712890625, -2.700408935546875, 7.905485153198242, -7.92681884765625, 0.590972900390625, 14.395431518554688, 6.0100555419921875, 2.95367431640625, 22.655250549316406, 0.1433258056640625, 33.78394317626953, -5.1584625244140625, 0.9453086853027344, 31.548778533935547, 25.346481323242188], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000385.npy"}
{"epoch": 0.582010582010582, "step": 386, "batch_size": 64, "mean": 5.331875801086426, "std": 12.301651000976562, "min": -27.38933563232422, "p10": -9.934207916259764, "median": 4.655609130859375, "p90": 24.034690475463872, "max": 30.720611572265625, "pos_frac": 0.71875, "sample": [29.928192138671875, -2.8746910095214844, -0.8895759582519531, 18.03388214111328, 24.670379638671875, -4.3596343994140625, 3.2732772827148438, 0.7006072998046875, 5.350067138671875, 15.186508178710938, 2.5899810791015625, 9.83892822265625, -21.772430419921875, 3.4233551025390625, 1.4674453735351562, 10.666694641113281, -5.746490478515625, 24.56795883178711, 12.036888122558594, 10.885719299316406, 4.648323059082031, 17.257308959960938, 4.572479248046875, -1.8493194580078125, -7.994930267333984, 8.762142181396484, 19.84217071533203, 4.662895202636719, 6.159965515136719, 6.925079345703125, 14.154830932617188, 7.0526275634765625, -8.485069274902344, -10.927215576171875, -10.555267333984375, 4.765625, -11.749385833740234, 13.535987854003906, 5.130794525146484, 29.815444946289062, 2.9783477783203125, -17.48011016845703, 4.8915252685546875, 1.167327880859375, 24.946975708007812, -5.466121673583984, 9.163360595703125, 7.075691223144531, 2.336289405822754, 30.720611572265625, 1.5695877075195312, -5.3321533203125, -14.501075744628906, 11.482925415039062, 28.11737060546875, 0.4137420654296875, 4.031982421875, -1.8392410278320312, 20.019775390625, 22.79039764404297, 5.4444732666015625, -0.41949462890625, 3.815643310546875, -27.38933563232422], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000386.npy"}
{"epoch": 0.5835222978080121, "step": 387, "batch_size": 64, "mean": 9.498468399047852, "std": 11.701128959655762, "min": -12.456008911132812, "p10": -4.001258087158203, "median": 8.332691192626953, "p90": 25.15736045837403, "max": 41.1466064453125, "pos_frac": 0.765625, "sample": [4.4160919189453125, -0.5629081726074219, 35.24223327636719, -6.059591293334961, 3.2512741088867188, 30.693389892578125, -3.8168106079101562, 1.1949996948242188, 1.8559646606445312, 41.1466064453125, 11.574264526367188, 0.49900054931640625, 9.315238952636719, 8.521408081054688, 17.241291046142578, 2.9322509765625, 5.0547027587890625, 19.089431762695312, 14.689067840576172, 3.3575439453125, 13.49072265625, 28.78379249572754, 13.806129455566406, -4.0803070068359375, -2.1453475952148438, 17.904129028320312, -4.56817626953125, 7.395475387573242, 21.369338989257812, -0.7715873718261719, 2.3595733642578125, 4.023200988769531, 9.977500915527344, 23.503376007080078, -11.95440673828125, 18.333072662353516, 18.690982818603516, -0.4722862243652344, 9.050527572631836, -12.456008911132812, -2.6849803924560547, 15.020584106445312, 13.649391174316406, -10.048027038574219, -0.8492355346679688, 13.838340759277344, 29.23096466064453, 29.21515655517578, 1.1608867645263672, 18.98601531982422, -5.519111633300781, 7.501640319824219, 4.5057525634765625, 3.3365020751953125, -0.6870346069335938, 16.356847763061523, 10.211387634277344, 8.143974304199219, 23.119491577148438, 25.8662109375, 22.042312622070312, 10.440422058105469, 3.3683395385742188, 19.82096290588379], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000387.npy"}
{"epoch": 0.5850340136054422, "step": 388, "batch_size": 64, "mean": 5.524343490600586, "std": 11.955625534057617, "min": -19.82634735107422, "p10": -10.136578750610349, "median": 4.706901550292969, "p90": 21.765104293823246, "max": 29.538604736328125, "pos_frac": 0.65625, "sample": [18.379592895507812, 19.401931762695312, 20.166580200195312, 22.084381103515625, -0.30377960205078125, 12.92266845703125, 10.805107116699219, 3.555084228515625, 5.978813171386719, -15.080108642578125, 6.7492523193359375, 12.77301025390625, -13.145740509033203, 4.9137725830078125, 0.22255706787109375, 20.2579345703125, -1.6835289001464844, 25.28619384765625, 2.0997562408447266, 9.972915649414062, 24.037078857421875, 2.9034042358398438, 27.804645538330078, 5.047019958496094, 19.050025939941406, -7.873565673828125, 6.514129638671875, 9.083984375, 9.049983978271484, 3.7633590698242188, -2.797445297241211, -11.106441497802734, 0.8261890411376953, 29.538604736328125, -7.588291168212891, 3.448394775390625, -0.920989990234375, -1.0579376220703125, 9.613880157470703, -12.4632568359375, 16.724609375, 21.19781494140625, -7.107307434082031, 12.061080932617188, 15.59547233581543, 19.599021911621094, -4.3749847412109375, 3.06298828125, -1.92950439453125, 22.008228302001953, -4.565662384033203, 26.530717849731445, -3.550445556640625, 4.500030517578125, -19.82634735107422, -16.151832580566406, 3.38800048828125, 8.3577880859375, -6.23480224609375, -7.418609619140625, 7.286155700683594, -3.0045547485351562, -14.386138916015625, 9.567096710205078], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000388.npy"}
{"epoch": 0.5865457294028723, "step": 389, "batch_size": 64, "mean": 7.998516082763672, "std": 11.42385482788086, "min": -16.2337703704834, "p10": -6.320030021667479, "median": 8.240880966186523, "p90": 22.47689323425293, "max": 35.099647521972656, "pos_frac": 0.75, "sample": [5.93463134765625, -16.2337703704834, 4.529518127441406, 18.12408447265625, 8.602241516113281, 15.32537841796875, 11.903423309326172, 27.299560546875, 21.582260131835938, 13.585205078125, -2.22576904296875, -15.872459411621094, 13.13983154296875, 7.879520416259766, 6.546134948730469, 22.622669219970703, 1.8229522705078125, 20.667644500732422, 7.4401397705078125, 10.427436828613281, 10.405712127685547, 17.064300537109375, 0.11489295959472656, 10.449394226074219, 14.891342163085938, -0.40595245361328125, 22.136749267578125, -0.6076736450195312, 10.426101684570312, 13.093841552734375, -4.5119476318359375, -2.6298179626464844, -3.6596221923828125, 35.099647521972656, 27.37405014038086, 4.1427154541015625, 24.407608032226562, 2.4756851196289062, -2.79547119140625, -9.657541275024414, 10.634521484375, -6.679119110107422, 2.983123779296875, -5.607000350952148, 12.26043701171875, 8.697616577148438, 13.201507568359375, -6.625614166259766, -12.740730285644531, 9.606338500976562, 2.4438858032226562, 5.583853721618652, 1.020751953125, -3.219496726989746, -11.19301986694336, 30.67919921875, 30.747543334960938, 13.70123291015625, 14.292831420898438, 17.423248291015625, 2.7802162170410156, 17.98714828491211, 6.536857604980469, 6.475017547607422], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000389.npy"}
{"epoch": 0.5880574452003023, "step": 390, "batch_size": 64, "mean": 7.609128952026367, "std": 11.262855529785156, "min": -19.264602661132812, "p10": -6.004792022705077, "median": 5.701755523681641, "p90": 20.654330444335944, "max": 41.82928466796875, "pos_frac": 0.796875, "sample": [-6.6138916015625, 3.486177444458008, 13.068130493164062, -3.5045013427734375, 19.152990341186523, 3.1758880615234375, 5.493888854980469, 19.14178466796875, -2.9453811645507812, 12.568580627441406, 1.8757057189941406, -5.370048522949219, 19.4825439453125, 1.9933624267578125, 0.5280609130859375, 41.82928466796875, 14.693042755126953, -7.326145172119141, 13.109687805175781, -6.276824951171875, 15.900466918945312, 11.848297119140625, 9.235300064086914, 29.260345458984375, 15.855865478515625, 14.6119384765625, -2.1377716064453125, -12.433441162109375, 3.772838592529297, 29.443771362304688, -9.652511596679688, 6.633630752563477, 5.104515075683594, 5.9096221923828125, -11.572685241699219, 8.2786865234375, -19.264602661132812, 14.481826782226562, 33.50187683105469, -1.90875244140625, 22.409194946289062, 2.667552947998047, 1.021636962890625, 4.931816101074219, 1.3290367126464844, 13.146240234375, 6.236976623535156, 5.332706451416016, 21.156524658203125, 8.297325134277344, 16.534263610839844, 5.212436676025391, 0.9827499389648438, 10.687871932983398, 7.184589385986328, 2.6371231079101562, 4.60205078125, 0.038616180419921875, 2.634500503540039, -0.6798381805419922, 18.717185974121094, 16.538185119628906, 6.731422424316406, 24.202518463134766], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000390.npy"}
{"epoch": 0.5895691609977324, "step": 391, "batch_size": 64, "mean": 9.773367881774902, "std": 14.709888458251953, "min": -24.30780029296875, "p10": -6.851280593872069, "median": 9.296842575073242, "p90": 27.192557907104494, "max": 53.71995544433594, "pos_frac": 0.703125, "sample": [3.055450439453125, -24.30780029296875, 27.012401580810547, 13.917686462402344, 13.459953308105469, 20.878952026367188, -0.08497428894042969, 26.670440673828125, 17.135066986083984, 42.03643798828125, 5.580116271972656, -18.283462524414062, 15.764877319335938, -7.311702728271484, 14.63458251953125, 35.42859649658203, -5.342662811279297, 13.037921905517578, -2.9146575927734375, -1.6092605590820312, -5.7769622802734375, 11.630081176757812, 6.1952667236328125, 2.0232467651367188, 9.127220153808594, -5.1782684326171875, -16.39841079711914, -11.518058776855469, 13.837043762207031, 3.3126182556152344, 53.71995544433594, 9.401824951171875, 7.128551483154297, -0.9996109008789062, -4.0703125, -2.4497909545898438, 5.389013290405273, 12.625213623046875, 21.834320068359375, 19.722312927246094, 9.19186019897461, 21.8206787109375, -11.819849014282227, -10.249706268310547, 5.779392242431641, 19.527721405029297, 28.37433624267578, 6.3508453369140625, 20.559364318847656, 27.26976776123047, -4.677459716796875, -3.8144073486328125, 30.892669677734375, 20.04107666015625, 24.788223266601562, 28.725357055664062, 24.081233978271484, 18.365219116210938, -1.7798004150390625, 10.089170455932617, 4.392127990722656, 18.351234436035156, 13.05926513671875, 7.864013671875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000391.npy"}
{"epoch": 0.5910808767951625, "step": 392, "batch_size": 64, "mean": 11.152414321899414, "std": 13.613516807556152, "min": -19.135116577148438, "p10": -3.8081157684326157, "median": 11.81686782836914, "p90": 27.88286857604981, "max": 47.268768310546875, "pos_frac": 0.8125, "sample": [-16.099029541015625, 8.171710968017578, 22.42102813720703, 5.7860260009765625, 12.404556274414062, 2.393096923828125, 28.24356460571289, 2.438396453857422, 23.29931640625, 14.93115234375, 11.412139892578125, -19.135116577148438, 14.609600067138672, 14.231086730957031, 30.883548736572266, 16.255172729492188, -2.2159957885742188, 6.7610015869140625, 24.875762939453125, 12.221595764160156, 6.549201965332031, -4.45257568359375, 17.03105926513672, -11.138620376586914, 13.775283813476562, 27.041244506835938, 23.843135833740234, 5.432186126708984, 13.493927001953125, -8.837203979492188, -0.7632827758789062, 40.15415954589844, 29.520111083984375, 0.7681732177734375, -9.022598266601562, 13.724090576171875, 1.2056808471679688, 9.843936920166016, 12.23455810546875, 22.68832015991211, 43.7803955078125, 36.832176208496094, -1.7212905883789062, 13.476776123046875, 2.2811203002929688, 2.1534996032714844, 1.8595466613769531, 8.314315795898438, 22.393699645996094, 17.250473022460938, -2.3578529357910156, 47.268768310546875, 0.6961040496826172, 18.06768798828125, -4.429656982421875, -1.9528255462646484, 18.3057861328125, 2.046661376953125, 13.858978271484375, 23.779266357421875, 1.6933670043945312, 18.746623992919922, 3.8791656494140625, 10.552330017089844], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000392.npy"}
{"epoch": 0.5925925925925926, "step": 393, "batch_size": 64, "mean": 8.07259750366211, "std": 10.766840934753418, "min": -10.077102661132812, "p10": -3.676943969726562, "median": 6.6213836669921875, "p90": 23.686748123168947, "max": 36.426513671875, "pos_frac": 0.75, "sample": [2.9068222045898438, -7.227596282958984, 25.85162353515625, 10.573341369628906, 31.92355728149414, 9.447982788085938, -2.540201187133789, 11.936058044433594, 6.546630859375, 6.778266906738281, -2.83306884765625, -0.5200424194335938, 29.596710205078125, 9.10947036743164, 14.306480407714844, -7.982189178466797, 6.696136474609375, 19.95806884765625, -4.7921142578125, 4.800445556640625, 1.4609870910644531, 6.774106979370117, -0.88629150390625, 27.175674438476562, 0.7727890014648438, 23.792945861816406, -0.8242912292480469, 16.850074768066406, 15.661285400390625, 18.9176025390625, 2.6253585815429688, -6.080532073974609, 15.798568725585938, 23.438953399658203, 1.7790136337280273, 21.567474365234375, -7.891181945800781, 17.025299072265625, 0.8153533935546875, -1.3741531372070312, 8.21126937866211, 0.8447265625, 5.996849060058594, 5.454425811767578, -0.9062423706054688, 7.312116622924805, 18.0845947265625, 0.20958709716796875, 3.0269012451171875, 1.0922088623046875, 26.543874740600586, 36.426513671875, 8.586700439453125, -1.9628791809082031, 13.590717315673828, 12.468242645263672, 0.21630859375, -2.517669677734375, -10.077102661132812, -4.038604736328125, 9.299064636230469, 21.481746673583984, 6.0305328369140625, 9.336929321289062], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000393.npy"}
{"epoch": 0.5941043083900227, "step": 394, "batch_size": 64, "mean": 7.362521171569824, "std": 10.263371467590332, "min": -18.10479736328125, "p10": -3.5271724700927733, "median": 7.225616455078125, "p90": 19.894246673583986, "max": 33.83053970336914, "pos_frac": 0.75, "sample": [10.105545043945312, 28.612274169921875, 26.492446899414062, 1.3736038208007812, 0.3911590576171875, 2.9015884399414062, -18.10479736328125, 4.06695556640625, 14.874908447265625, 5.556915283203125, 25.08447265625, 9.490249633789062, 5.0094146728515625, -1.135498046875, -11.089252471923828, 14.076698303222656, -0.511260986328125, -6.237749099731445, -3.5507049560546875, 6.695167541503906, 21.613426208496094, -1.2272186279296875, 8.853137969970703, 7.027610778808594, 6.897907257080078, 19.16968536376953, 12.081361770629883, 23.939861297607422, 9.70440673828125, 6.065711975097656, -11.807064056396484, 7.423622131347656, 33.83053970336914, 9.092086791992188, 5.489276885986328, 15.966751098632812, 2.0911941528320312, 12.993110656738281, 6.878631591796875, -0.32913970947265625, 10.162086486816406, -3.4722633361816406, -16.572021484375, 9.027774810791016, 10.579025268554688, 20.20477294921875, -8.60150146484375, 10.872211456298828, -1.7434577941894531, 8.932479858398438, -2.3968353271484375, 18.851043701171875, 6.9410552978515625, 10.266281127929688, -1.676239013671875, 18.774749755859375, 2.9341506958007812, 13.99345588684082, 10.946830749511719, -0.4217376708984375, 9.538162231445312, 15.762271881103516, 3.817169189453125, 14.624874114990234], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000394.npy"}
{"epoch": 0.5956160241874527, "step": 395, "batch_size": 64, "mean": 7.337076187133789, "std": 11.47253131866455, "min": -14.770263671875, "p10": -4.436208343505859, "median": 5.436026096343994, "p90": 22.21518783569336, "max": 39.53178405761719, "pos_frac": 0.71875, "sample": [14.09482192993164, 22.282943725585938, 8.091796875, -0.568756103515625, 17.68634796142578, 9.929122924804688, -1.373708724975586, 16.538719177246094, 24.541122436523438, -2.591768264770508, 0.86248779296875, 0.4093780517578125, 23.754013061523438, 19.12033462524414, 11.201614379882812, 22.057090759277344, -3.8906097412109375, 5.457684516906738, -9.943359375, 2.2794418334960938, -4.670036315917969, 0.5502471923828125, 1.8816146850585938, 8.320533752441406, 21.242538452148438, -7.561290740966797, 11.352169036865234, 5.41436767578125, 2.4459972381591797, 2.5290069580078125, 15.243400573730469, 39.53178405761719, 1.2192840576171875, -0.8069076538085938, -10.988662719726562, 19.07855224609375, -14.763713836669922, 8.917633056640625, 16.31713104248047, -2.8939571380615234, 12.652412414550781, 0.25553131103515625, 31.13470458984375, 2.4541549682617188, 23.32003402709961, 28.966232299804688, -3.783538818359375, 21.105194091796875, -8.580879211425781, -2.216350555419922, 1.2252864837646484, 16.66278076171875, -0.487457275390625, 2.4706192016601562, 9.396064758300781, 17.918453216552734, -3.7930145263671875, 5.717720031738281, 11.73546028137207, -2.4434661865234375, 9.328296661376953, -14.770263671875, 3.833963394165039, 15.172531127929688], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000395.npy"}
{"epoch": 0.5971277399848829, "step": 396, "batch_size": 64, "mean": 7.993203639984131, "std": 10.034141540527344, "min": -24.803558349609375, "p10": -1.1835258483886715, "median": 7.269939422607422, "p90": 17.67746963500977, "max": 34.21229553222656, "pos_frac": 0.84375, "sample": [14.849796295166016, 31.34796142578125, 18.731597900390625, 29.69420623779297, 15.388786315917969, 11.437477111816406, 13.015560150146484, 8.19978141784668, 5.267799377441406, 13.123964309692383, 11.448097229003906, 5.163780212402344, 14.303512573242188, -13.585205078125, 2.1890220642089844, -0.883636474609375, 14.179229736328125, 16.656089782714844, 5.60926628112793, 13.478012084960938, 5.532867431640625, -0.10720062255859375, 34.21229553222656, -1.3120498657226562, 1.0566482543945312, 4.981224060058594, 9.576385498046875, -24.803558349609375, 18.115203857421875, 7.2793121337890625, 12.627864837646484, -12.488780975341797, 3.813262939453125, 31.209365844726562, 3.610809326171875, 16.37281036376953, 7.471168518066406, 7.260566711425781, 16.300540924072266, 5.412628173828125, 1.42584228515625, 15.308145523071289, 4.053962707519531, 11.66278076171875, 1.9095001220703125, 10.054611206054688, 5.856220245361328, 14.43621826171875, 14.549263000488281, 2.639801025390625, 0.272491455078125, -0.22064208984375, 7.994560241699219, -1.8917388916015625, 5.969465255737305, 8.246788024902344, 2.1927528381347656, 10.825275421142578, 6.1153564453125, 6.062408447265625, 1.166168212890625, -8.655143737792969, -2.5355682373046875, 18.390045166015625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000396.npy"}
{"epoch": 0.5986394557823129, "step": 397, "batch_size": 64, "mean": 5.623826503753662, "std": 12.130468368530273, "min": -31.698509216308594, "p10": -10.862580108642577, "median": 7.413963317871094, "p90": 18.924072265625, "max": 31.825420379638672, "pos_frac": 0.765625, "sample": [-0.5500984191894531, 3.1786956787109375, -14.3798828125, 21.4986572265625, -1.0467681884765625, 17.94428253173828, 8.4100341796875, 8.759979248046875, 11.893173217773438, 8.7532958984375, 4.501279830932617, 8.039810180664062, 14.557815551757812, 9.879554748535156, 10.060348510742188, 16.746925354003906, -23.025436401367188, -9.301300048828125, 9.510456085205078, 0.6455459594726562, 12.723514556884766, 7.1422576904296875, 11.122631072998047, 21.575653076171875, 5.954353332519531, -15.484024047851562, -1.2010955810546875, 7.6856689453125, 16.5582275390625, 15.701141357421875, 13.41744613647461, 0.068389892578125, 4.5797576904296875, 3.4000473022460938, 21.315092086791992, 3.1744384765625, -1.5636825561523438, 28.715972900390625, 18.646255493164062, 8.1802978515625, 13.958187103271484, 19.043136596679688, 11.240371704101562, 12.418708801269531, -31.698509216308594, 0.01861572265625, 24.133941650390625, -18.3883056640625, -11.531700134277344, 2.6065597534179688, 9.365165710449219, 4.882728576660156, 2.5864715576171875, 5.712728500366211, -4.7698974609375, -0.8501091003417969, 10.12765121459961, 1.5177574157714844, 0.1173858642578125, 14.4942626953125, -22.3475341796875, -8.4202880859375, 6.093418121337891, 31.825420379638672], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000397.npy"}
{"epoch": 0.600151171579743, "step": 398, "batch_size": 64, "mean": 9.113188743591309, "std": 12.788902282714844, "min": -14.520252227783203, "p10": -5.858295822143555, "median": 8.232524871826172, "p90": 26.742277145385742, "max": 40.34674072265625, "pos_frac": 0.734375, "sample": [-6.798389434814453, 13.244255065917969, 10.417137145996094, 6.148406982421875, 16.03927230834961, 0.6911773681640625, 40.34674072265625, 11.8590087890625, -4.850517272949219, 30.962921142578125, 10.683387756347656, 15.578422546386719, -7.532112121582031, -8.568191528320312, 21.13408660888672, -5.0533599853515625, 23.872013092041016, -5.857353210449219, 10.681106567382812, -1.68951416015625, 13.168380737304688, 6.101768493652344, 37.15818405151367, 26.777542114257812, 0.14691543579101562, 23.512971878051758, 24.502368927001953, 13.313102722167969, 2.4860973358154297, 14.133415222167969, -5.858699798583984, 14.172863006591797, 7.0976409912109375, 4.055381774902344, -4.813392639160156, -7.633413314819336, -0.5035781860351562, 26.659992218017578, 5.733551025390625, 2.938556671142578, -11.219062805175781, -5.7716064453125, 21.05113983154297, 12.916069030761719, 14.55374526977539, 29.51836395263672, 1.3780059814453125, 2.4597549438476562, -1.4007682800292969, 22.477920532226562, 1.2335739135742188, 36.220001220703125, 9.367408752441406, 1.3344268798828125, 10.784164428710938, -2.0834426879882812, 19.245758056640625, -14.520252227783203, 1.374481201171875, -0.08187103271484375, 14.64019775390625, 13.547660827636719, 29.461441040039062, 2.2988357543945312], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000398.npy"}
{"epoch": 0.6016628873771731, "step": 399, "batch_size": 64, "mean": 8.561114311218262, "std": 12.367218017578125, "min": -19.328975677490234, "p10": -5.278836822509765, "median": 7.8413543701171875, "p90": 23.75431747436524, "max": 42.279876708984375, "pos_frac": 0.765625, "sample": [17.253562927246094, 0.3238067626953125, 9.90484619140625, 10.879058837890625, -19.328975677490234, 16.105255126953125, 21.957664489746094, 6.156566619873047, -5.045448303222656, -8.785202026367188, -10.422561645507812, 14.313232421875, 9.210651397705078, -5.3788604736328125, 4.6925048828125, 9.255477905273438, 8.485298156738281, 9.785125732421875, 10.10870361328125, 35.46778869628906, -8.696868896484375, 2.1737213134765625, -13.027259826660156, 21.04558563232422, 21.451812744140625, -1.30926513671875, 13.556556701660156, 42.279876708984375, 4.846466064453125, 1.8301544189453125, 3.6140899658203125, -3.3545608520507812, 2.935089111328125, 3.9665565490722656, -1.987884521484375, 2.8908309936523438, 1.7432174682617188, 38.29705810546875, 17.782806396484375, 12.756599426269531, 2.4266433715820312, 1.5164108276367188, 29.455894470214844, -0.9159927368164062, 14.294631958007812, 7.197410583496094, 8.746269226074219, 6.266563415527344, -6.664947509765625, -4.3957061767578125, 24.356657028198242, 17.00690460205078, 22.532142639160156, 8.567520141601562, 13.166603088378906, 24.278106689453125, 21.961212158203125, 30.77850341796875, 1.3292999267578125, 5.408107757568359, 19.589550018310547, -3.8293609619140625, 9.072914123535156, -1.9670944213867188], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000399.npy"}
{"epoch": 0.6031746031746031, "step": 400, "batch_size": 64, "mean": 5.274971008300781, "std": 11.881673812866211, "min": -15.802391052246094, "p10": -10.653525924682617, "median": 3.7842178344726562, "p90": 19.13966217041016, "max": 37.14735412597656, "pos_frac": 0.65625, "sample": [2.1885299682617188, 7.1678009033203125, -4.175601959228516, 5.687225341796875, 12.510223388671875, 13.93294906616211, 1.93524169921875, 14.480415344238281, 18.712554931640625, 26.960613250732422, -0.9823455810546875, -2.796894073486328, 4.386772155761719, 2.4807281494140625, 6.603517532348633, 3.438720703125, -5.43780517578125, 31.104496002197266, 2.3880844116210938, 15.66212272644043, 1.3723602294921875, -9.521732330322266, 12.710281372070312, -11.703084945678711, -11.138580322265625, 19.322708129882812, -3.0321311950683594, 9.207107543945312, 2.2668533325195312, -3.982147216796875, -11.44317626953125, -3.1755104064941406, 11.70556640625, 1.6928787231445312, 36.24431228637695, -12.816497802734375, -8.579414367675781, -1.581207275390625, 14.345794677734375, 7.11578369140625, 11.976799011230469, 15.763408660888672, -15.802391052246094, 20.144210815429688, -4.3417510986328125, 12.514335632324219, -0.6810302734375, -0.1976165771484375, 5.6176605224609375, 37.14735412597656, 4.927028656005859, 11.9451904296875, -7.99395751953125, 5.0784149169921875, -11.938074111938477, 12.995979309082031, 30.614730834960938, 10.467361450195312, 1.1318092346191406, 0.987579345703125, -13.178642272949219, -2.6321868896484375, 13.664680480957031, 4.1297149658203125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000400.npy"}
{"epoch": 0.6046863189720333, "step": 401, "batch_size": 64, "mean": 10.499998092651367, "std": 11.979192733764648, "min": -16.16811752319336, "p10": -2.5947929382324215, "median": 9.2601318359375, "p90": 26.80611190795899, "max": 52.40472412109375, "pos_frac": 0.859375, "sample": [-2.712371826171875, 25.51329803466797, 27.46246337890625, 7.766807556152344, 2.440523147583008, 4.155119895935059, 7.412750244140625, 10.297807693481445, 12.072006225585938, -11.922889709472656, 17.838287353515625, 11.473098754882812, -7.735389709472656, -7.579643249511719, 14.142326354980469, 1.4988479614257812, 15.79132080078125, 3.848114013671875, 25.263465881347656, 27.252708435058594, 6.565971374511719, -1.9817390441894531, 6.4131011962890625, 21.059104919433594, -5.8384857177734375, -16.16811752319336, 9.226409912109375, 7.61058235168457, 13.615249633789062, 2.1587066650390625, 11.296890258789062, 9.699020385742188, 4.867897033691406, 20.016273498535156, 2.4940185546875, 25.764053344726562, 4.749195098876953, 12.359886169433594, 1.0000629425048828, 8.928665161132812, 0.5467987060546875, 27.718013763427734, 9.293853759765625, -11.087100982666016, 17.856460571289062, 10.449634552001953, 2.883056640625, 27.97582244873047, 13.656494140625, 21.65500259399414, 11.604629516601562, 4.904901504516602, 8.572628021240234, 2.6173324584960938, -2.3204421997070312, 9.142583847045898, 13.363765716552734, 17.107219696044922, 30.186595916748047, 35.02519607543945, 17.079866409301758, 6.597095489501953, 16.650360107421875, 52.40472412109375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000401.npy"}
{"epoch": 0.6061980347694633, "step": 402, "batch_size": 64, "mean": 8.272048950195312, "std": 11.973098754882812, "min": -18.559844970703125, "p10": -6.084205627441405, "median": 7.471549987792969, "p90": 19.98922576904297, "max": 40.180389404296875, "pos_frac": 0.765625, "sample": [19.190156936645508, 12.707672119140625, 13.26373291015625, 17.05107879638672, 6.058555603027344, 13.403526306152344, 37.56450653076172, 17.978515625, 7.699684143066406, 15.930381774902344, 0.4244728088378906, 33.50196838378906, 20.24091339111328, -4.1170196533203125, 16.588119506835938, 4.509044647216797, 4.034088134765625, 7.1332855224609375, -1.4448318481445312, 8.436386108398438, 32.011146545410156, 4.718143463134766, -1.2571754455566406, -12.021247863769531, 20.439483642578125, -1.1561088562011719, 8.498516082763672, 16.94775390625, 6.144493103027344, 8.525871276855469, 40.180389404296875, 33.23541259765625, -7.187965393066406, 5.280982971191406, 5.105499267578125, 17.51026725769043, 8.94149398803711, 19.401954650878906, 12.505195617675781, 9.243383407592773, -6.298564910888672, -5.584033966064453, -18.559844970703125, 14.974403381347656, 10.849761962890625, 2.5091629028320312, 4.119270324707031, -8.30047607421875, 4.2591705322265625, -4.004219055175781, 18.36745834350586, 7.243415832519531, 9.536365509033203, 18.56304931640625, -9.696731567382812, 0.5098457336425781, 0.1612548828125, -3.7888031005859375, -13.87774658203125, 16.065536499023438, 5.1866455078125, 9.581714630126953, 2.4327239990234375, -2.0599746704101562], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000402.npy"}
{"epoch": 0.6077097505668935, "step": 403, "batch_size": 64, "mean": 10.750381469726562, "std": 14.796972274780273, "min": -30.972244262695312, "p10": -7.517294692993164, "median": 9.624611854553223, "p90": 32.19883880615236, "max": 50.29730224609375, "pos_frac": 0.796875, "sample": [2.2968292236328125, 1.365203857421875, 20.666000366210938, 11.865280151367188, 14.641092300415039, 10.898590087890625, 42.08148193359375, 19.004653930664062, -0.4672203063964844, 24.848167419433594, 19.975818634033203, -7.7389068603515625, 21.047828674316406, 0.5318527221679688, 22.367111206054688, 12.538925170898438, 10.794931411743164, 3.1598682403564453, 9.19158935546875, 7.677516937255859, -3.982269287109375, -14.647098541259766, 50.29730224609375, 38.782169342041016, 4.5736541748046875, 20.168977737426758, 34.08433532714844, 2.790416717529297, 34.18014144897461, 3.1578750610351562, 9.013145446777344, 7.536186218261719, 8.185771942138672, 1.9916229248046875, 10.057634353637695, -7.553127288818359, 3.6739730834960938, 38.678897857666016, 5.748956680297852, 12.027801513671875, 5.701606750488281, -2.688264846801758, 34.85060119628906, -3.8755950927734375, 18.468366622924805, -8.022796630859375, 7.940986633300781, 12.975303649902344, 18.49169158935547, -7.433685302734375, 16.741943359375, 23.57608413696289, -30.972244262695312, -9.5623779296875, 15.15704345703125, -12.500255584716797, 27.15091323852539, 15.753189086914062, 5.105278015136719, 27.799346923828125, -4.5772857666015625, 4.078483581542969, 10.458770751953125, 17.894411087036133], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000403.npy"}
{"epoch": 0.6092214663643235, "step": 404, "batch_size": 64, "mean": 9.390998840332031, "std": 12.221561431884766, "min": -15.413959503173828, "p10": -4.725514221191406, "median": 6.946998596191406, "p90": 25.30107192993164, "max": 36.4962158203125, "pos_frac": 0.765625, "sample": [11.367172241210938, 6.689453125, 25.319869995117188, 7.930816650390625, 3.082317352294922, 9.758529663085938, 29.893463134765625, 4.971390724182129, -4.223457336425781, 1.98187255859375, 22.4158935546875, -0.386474609375, -9.139835357666016, 6.165988922119141, -0.3104419708251953, 34.26483154296875, -5.959140777587891, -3.182084083557129, 25.206130981445312, 19.017288208007812, 25.25720977783203, 7.0217132568359375, 4.544563293457031, 36.4962158203125, -7.683349609375, 8.510845184326172, -4.940681457519531, 21.378570556640625, 14.760421752929688, 14.522960662841797, 6.297966003417969, 6.872283935546875, -6.102130889892578, 15.798625946044922, -2.283935546875, 3.564453125, -2.2161941528320312, 16.682579040527344, 9.353919982910156, 2.55828857421875, 1.3332099914550781, 0.578125, 19.719383239746094, 10.086761474609375, 16.778301239013672, 10.967987060546875, 18.085693359375, 17.718448638916016, 36.16698455810547, 5.543689727783203, 1.329345703125, 21.787017822265625, 35.98114013671875, 1.8363800048828125, 28.894393920898438, 4.055442810058594, 22.924030303955078, -15.413959503173828, -3.575408935546875, -10.201416015625, 2.5738906860351562, -0.9941482543945312, 12.012470245361328, 7.5782012939453125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000404.npy"}
{"epoch": 0.6107331821617535, "step": 405, "batch_size": 64, "mean": 7.299925327301025, "std": 11.396657943725586, "min": -10.474678039550781, "p10": -6.368014526367187, "median": 4.555271148681641, "p90": 23.454294586181646, "max": 34.912498474121094, "pos_frac": 0.65625, "sample": [21.780670166015625, 2.4110946655273438, -8.245454788208008, 25.541366577148438, 21.6580810546875, -0.3520050048828125, 15.069141387939453, -5.75848388671875, -9.341106414794922, -9.315322875976562, -2.2892608642578125, 7.918153762817383, 2.0003280639648438, -2.6090011596679688, 14.365997314453125, -3.6788291931152344, 33.889434814453125, -4.294830322265625, 4.810020446777344, 4.111358642578125, 13.101959228515625, 17.858901977539062, 9.184402465820312, 13.19561767578125, -6.640011787414551, -2.3981246948242188, 21.3779296875, 24.416542053222656, -6.629241943359375, 6.63810920715332, 4.259330749511719, -4.062004089355469, 3.5886383056640625, 13.156257629394531, -2.9119720458984375, 11.35904312133789, 8.178760528564453, -6.751335144042969, 19.078712463378906, 20.520591735839844, 2.0865116119384766, -1.6251106262207031, 5.738807678222656, 8.62088394165039, 4.3005218505859375, 2.7242965698242188, -0.554168701171875, 17.95813751220703, 28.397674560546875, 14.835710525512695, -0.6468238830566406, 0.76507568359375, 16.901409149169922, 9.063243865966797, 28.186817169189453, 34.912498474121094, -10.474678039550781, 15.304534912109375, -1.0643310546875, 3.0564422607421875, -2.911163330078125, 24.17156219482422, 8.46295166015625, -5.209053039550781], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000405.npy"}
{"epoch": 0.6122448979591837, "step": 406, "batch_size": 64, "mean": 10.236560821533203, "std": 12.651453971862793, "min": -19.16329574584961, "p10": -6.060342216491699, "median": 9.054973602294922, "p90": 29.684873390197765, "max": 35.56001663208008, "pos_frac": 0.828125, "sample": [12.0635986328125, 2.3031768798828125, -5.460914611816406, -7.825325012207031, 14.010765075683594, 17.133163452148438, 22.150917053222656, 33.19062805175781, -7.2769012451171875, 31.45355224609375, 19.82073211669922, 13.541893005371094, 0.58831787109375, -3.800739288330078, 25.566421508789062, 24.088592529296875, 26.88140869140625, 7.967494964599609, 5.844001770019531, 33.995513916015625, 31.019859313964844, 6.185901641845703, 6.743549346923828, 8.2398681640625, 9.342720031738281, 9.011344909667969, 2.3582324981689453, 9.811363220214844, 0.7120800018310547, 4.347648620605469, -6.317239761352539, -0.12555313110351562, -19.16329574584961, 0.5988082885742188, 25.46636199951172, 10.971378326416016, 2.6079635620117188, 30.8863582611084, 1.9164924621582031, 35.56001663208008, 10.142646789550781, -7.59698486328125, 10.571609497070312, 20.55242919921875, 1.9472274780273438, 21.69001007080078, -2.468494415283203, 4.306648254394531, 13.432914733886719, 24.79529571533203, 10.402740478515625, 9.098602294921875, 4.475860595703125, -11.506729125976562, 11.311042785644531, -15.493400573730469, 5.5483856201171875, 16.500030517578125, 4.211097717285156, 8.342620849609375, 33.177581787109375, 19.026763916015625, 19.284652709960938, 6.9771270751953125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000406.npy"}
{"epoch": 0.6137566137566137, "step": 407, "batch_size": 64, "mean": 6.762814521789551, "std": 11.57795524597168, "min": -14.46246337890625, "p10": -6.880650901794433, "median": 4.486993789672852, "p90": 22.0281665802002, "max": 39.050148010253906, "pos_frac": 0.671875, "sample": [3.73724365234375, -1.0288429260253906, -8.831573486328125, 21.414852142333984, -1.7560806274414062, 20.687198638916016, 35.441436767578125, -7.9608306884765625, -14.395912170410156, 13.033561706542969, -0.517913818359375, 10.799869537353516, -0.3347320556640625, -6.9386444091796875, -3.0744781494140625, -2.6965560913085938, 12.699047088623047, 3.262359619140625, 20.5181884765625, 23.864412307739258, 0.37457275390625, -4.165803909301758, 12.809661865234375, 12.023368835449219, 4.09417724609375, 17.501686096191406, 2.2714767456054688, 16.730941772460938, 7.493537902832031, 39.050148010253906, 7.000694274902344, -0.29912567138671875, 22.291015625, 13.218246459960938, -0.1530303955078125, 0.607452392578125, -5.315521240234375, 13.012115478515625, 5.115215301513672, 0.27223777770996094, -9.968109130859375, 15.404464721679688, 17.92890167236328, 8.885108947753906, 27.560104370117188, 12.440475463867188, 14.549919128417969, 3.9451522827148438, 6.7189788818359375, -0.6006946563720703, 6.740501403808594, -11.759918212890625, -14.46246337890625, -1.573272705078125, -6.745332717895508, 0.46905517578125, 8.240928649902344, 28.078594207763672, 4.879810333251953, 0.6347198486328125, -1.2820358276367188, 14.675743103027344, 23.501007080078125, 2.7028350830078125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000407.npy"}
{"epoch": 0.6152683295540439, "step": 408, "batch_size": 64, "mean": 10.066441535949707, "std": 11.641014099121094, "min": -12.50656509399414, "p10": -4.525440979003906, "median": 7.403554916381836, "p90": 27.09783325195313, "max": 34.61149597167969, "pos_frac": 0.8125, "sample": [-0.746856689453125, 11.291742324829102, 19.460311889648438, 12.257545471191406, 3.1326255798339844, -12.50656509399414, -10.401100158691406, 0.9980926513671875, 18.949649810791016, 6.990253448486328, 6.671669006347656, 24.852752685546875, 5.030433654785156, 5.511260986328125, 23.196929931640625, 22.064102172851562, 0.5934219360351562, -4.621177673339844, 17.64432716369629, 7.163341522216797, 9.114044189453125, -1.2467803955078125, 0.019893646240234375, 8.657386779785156, 28.182571411132812, -5.5283203125, 4.041740417480469, 34.61149597167969, 19.39392852783203, 29.264759063720703, 27.489532470703125, 0.9267215728759766, 5.96746826171875, 9.981697082519531, 24.989036560058594, 26.183868408203125, 3.0151824951171875, -7.951850891113281, 33.90972900390625, 3.1800537109375, 3.0148162841796875, 17.595535278320312, 21.2869873046875, 27.900630950927734, -0.55096435546875, 8.762130737304688, 18.582244873046875, -4.302055358886719, 6.463138580322266, 10.828781127929688, 16.217666625976562, 20.045867919921875, 33.46112060546875, -5.146030426025391, 3.7526626586914062, 13.956504821777344, 1.3810348510742188, -1.9406147003173828, 7.643768310546875, -4.926582336425781, 3.21893310546875, 3.9135589599609375, 10.700736999511719, 20.65746307373047], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000408.npy"}
{"epoch": 0.6167800453514739, "step": 409, "batch_size": 64, "mean": 5.407742023468018, "std": 12.607215881347656, "min": -18.76080322265625, "p10": -8.647924423217772, "median": 2.8469791412353516, "p90": 26.51914749145508, "max": 35.184627532958984, "pos_frac": 0.609375, "sample": [4.02239990234375, 3.5686721801757812, -12.427947998046875, -4.500476837158203, -3.0543289184570312, 7.414573669433594, 8.245208740234375, -1.6084747314453125, 7.6161956787109375, -12.738807678222656, 0.7166805267333984, 27.86113739013672, 3.059703826904297, 33.385467529296875, 10.589839935302734, -2.8782806396484375, 12.580741882324219, 4.274835586547852, -13.335071563720703, 26.537574768066406, 18.092430114746094, -0.4295501708984375, 9.824951171875, 19.766822814941406, -4.271099090576172, -9.437332153320312, 1.6115875244140625, 16.63915252685547, 9.198440551757812, 0.8472137451171875, -3.7687530517578125, -3.0753555297851562, 14.937843322753906, -4.118282318115234, 9.228363037109375, -1.1705894470214844, 26.476150512695312, 27.518478393554688, -5.373321533203125, 8.33420181274414, -6.805973052978516, 2.6342544555664062, -1.2920379638671875, -9.639837265014648, 10.26348876953125, -1.0999298095703125, 0.3500785827636719, 1.276620864868164, -13.953323364257812, 35.184627532958984, -3.944049835205078, 30.254131317138672, -2.825174331665039, 20.849044799804688, -18.76080322265625, 3.0604476928710938, 1.269622802734375, -2.4615478515625, 6.0588226318359375, 19.74439811706543, 35.14866638183594, -3.445789337158203, 5.1075592041015625, 8.961189270019531], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000409.npy"}
{"epoch": 0.618291761148904, "step": 410, "batch_size": 64, "mean": 8.658401489257812, "std": 12.152556419372559, "min": -18.968107223510742, "p10": -7.043088150024413, "median": 8.375865936279297, "p90": 24.028820800781258, "max": 40.894309997558594, "pos_frac": 0.765625, "sample": [7.608211517333984, 7.9449005126953125, 6.981868743896484, 1.1265678405761719, 29.078929901123047, -3.064239501953125, 6.670764923095703, 11.62371826171875, -11.002761840820312, -18.968107223510742, 11.36883544921875, 18.096885681152344, -2.0640716552734375, -9.15582275390625, 22.336669921875, 19.036640167236328, 30.580310821533203, -14.158676147460938, 17.78510284423828, 21.099899291992188, 32.1480712890625, 14.064258575439453, -9.914054870605469, 13.558177947998047, 3.859375, 16.84693145751953, -8.779220581054688, -6.2255401611328125, 14.403343200683594, -2.3327713012695312, 21.931442260742188, 0.0960235595703125, 4.820764541625977, 21.569259643554688, 1.1069145202636719, 18.5396728515625, 8.928581237792969, -7.356678009033203, -3.6812095642089844, 24.7540283203125, -1.5937881469726562, -6.311378479003906, 10.525398254394531, 4.461517333984375, 25.31952667236328, 8.806831359863281, 27.12335968017578, 7.111808776855469, 11.220840454101562, 8.9451904296875, 4.7110748291015625, 1.307586669921875, 40.894309997558594, 16.610321044921875, 0.8167648315429688, 4.6419525146484375, 14.9866943359375, 3.4206504821777344, 15.649019241333008, 9.795955657958984, 22.191253662109375, 3.2748184204101562, -0.30399322509765625, 9.269050598144531], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000410.npy"}
{"epoch": 0.6198034769463341, "step": 411, "batch_size": 64, "mean": 8.891387939453125, "std": 13.163415908813477, "min": -21.602813720703125, "p10": -5.465934371948242, "median": 7.339622497558594, "p90": 28.71956672668458, "max": 37.410797119140625, "pos_frac": 0.734375, "sample": [-18.53076934814453, 20.544204711914062, 32.58274459838867, 22.679893493652344, 10.781242370605469, 6.564849853515625, -15.205398559570312, -1.4869842529296875, 6.404205322265625, 1.4660072326660156, 26.18476104736328, 35.80665588378906, 12.72262191772461, 13.74306869506836, 13.789199829101562, 3.5309791564941406, 12.111129760742188, -2.89739990234375, 17.642494201660156, -0.002490997314453125, -1.3656005859375, 12.046981811523438, 11.081344604492188, 9.322675704956055, 5.607322692871094, 22.357040405273438, 0.6952629089355469, 21.437847137451172, -1.2451496124267578, -21.602813720703125, -8.240798950195312, -4.966304779052734, -6.6944122314453125, 4.2575531005859375, 19.117645263671875, 5.023536682128906, 0.1424407958984375, 29.995243072509766, -8.377437591552734, 37.410797119140625, 10.5889892578125, 9.883018493652344, -0.5662307739257812, 22.80289077758789, 14.122322082519531, 36.74462127685547, -5.680061340332031, 16.21662139892578, 0.75994873046875, -1.259521484375, 5.877523422241211, 9.536048889160156, 29.805912017822266, 21.43163299560547, -3.1095314025878906, 8.954154968261719, 6.727348327636719, 7.951896667480469, 34.90788650512695, 5.2164306640625, 1.7814712524414062, 3.9465503692626953, -3.34417724609375, 11.318857192993164], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000411.npy"}
{"epoch": 0.6213151927437641, "step": 412, "batch_size": 64, "mean": 7.870449066162109, "std": 11.770347595214844, "min": -16.072433471679688, "p10": -5.246430206298827, "median": 6.808805465698242, "p90": 20.74403877258301, "max": 41.82799530029297, "pos_frac": 0.765625, "sample": [-4.2221221923828125, 28.4490966796875, 7.233184814453125, 33.38274002075195, 6.427520751953125, 13.547874450683594, 1.14764404296875, 16.622852325439453, -12.099340438842773, 7.057975769042969, 1.0983390808105469, 6.663768768310547, 21.191890716552734, 38.98777389526367, 2.4220142364501953, -0.198883056640625, 17.714649200439453, -3.2713680267333984, 2.1761322021484375, 1.4743156433105469, 14.486568450927734, 18.664505004882812, 7.7126007080078125, 9.891471862792969, 9.212265014648438, 3.9299774169921875, 2.4969329833984375, 1.7022819519042969, 9.325859069824219, 3.07659912109375, 1.6176509857177734, 19.699050903320312, 8.065736770629883, 41.82799530029297, -1.6959037780761719, 6.20135498046875, -6.3035430908203125, -1.6360359191894531, 18.603729248046875, 0.4266357421875, -8.414962768554688, -5.903350830078125, 9.935081481933594, -10.065078735351562, 11.543161392211914, 13.465835571289062, 28.543792724609375, 19.319786071777344, 30.279296875, -5.499259948730469, 6.9538421630859375, 7.779323577880859, 15.115013122558594, -3.864307403564453, 11.055191040039062, 10.433082580566406, 0.1491546630859375, 2.8914947509765625, -16.072433471679688, 6.652801513671875, 13.975154876708984, -1.9691543579101562, 18.94995880126953, -4.656494140625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000412.npy"}
{"epoch": 0.6228269085411943, "step": 413, "batch_size": 64, "mean": 7.638739585876465, "std": 13.474037170410156, "min": -19.36627960205078, "p10": -7.608306121826172, "median": 6.505146026611328, "p90": 27.047303771972658, "max": 41.51668167114258, "pos_frac": 0.71875, "sample": [-1.939727783203125, -19.12548065185547, 10.673324584960938, 15.615331649780273, -7.80718994140625, 0.4564781188964844, 16.71068572998047, 41.51668167114258, -4.300685882568359, 27.18121337890625, -19.36627960205078, 14.4268798828125, 7.7274169921875, -1.9686660766601562, -1.26873779296875, 5.612819671630859, 9.296333312988281, 33.656341552734375, 1.8102493286132812, -3.0955638885498047, 28.765968322753906, 10.607704162597656, 3.2853851318359375, 5.74383544921875, 10.283416748046875, -7.6824798583984375, 26.734848022460938, 33.51365661621094, 12.396224975585938, 3.1460342407226562, 9.662956237792969, 4.662723541259766, 0.6221961975097656, 4.2293701171875, 39.50693130493164, -6.7483673095703125, 10.144134521484375, 1.718017578125, 12.223052978515625, 7.266456604003906, -15.265220642089844, 22.31606674194336, 11.862968444824219, -4.960502624511719, 15.879692077636719, -8.997886657714844, -16.108966827392578, 1.5958023071289062, 5.270086288452148, 7.920387268066406, -7.435234069824219, -6.8167724609375, 24.339187622070312, 1.4708061218261719, -0.1161346435546875, 11.295913696289062, 2.668558120727539, 19.823402404785156, 9.27935791015625, 15.413314819335938, -1.6137771606445312, 12.294235229492188, 29.08655548095703, 23.783973693847656], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000413.npy"}
{"epoch": 0.6243386243386243, "step": 414, "batch_size": 64, "mean": 9.066047668457031, "std": 12.690054893493652, "min": -24.08808135986328, "p10": -4.055993652343748, "median": 6.648214340209961, "p90": 26.71881866455078, "max": 38.76548767089844, "pos_frac": 0.75, "sample": [23.017120361328125, -2.26788330078125, 17.193588256835938, 5.6030120849609375, -0.9444351196289062, 11.505313873291016, 1.8993682861328125, 21.792953491210938, 10.658515930175781, 1.6728363037109375, 5.371421813964844, 26.95795440673828, 20.878189086914062, -8.410715103149414, 27.927040100097656, 24.547569274902344, -8.756507873535156, -13.433334350585938, 9.9783935546875, -1.8784103393554688, 7.406352996826172, 1.4752120971679688, 38.31842041015625, 11.076278686523438, 25.346492767333984, 3.5622406005859375, 6.7245330810546875, 14.149948120117188, -0.1474609375, 17.35021209716797, 16.439037322998047, -2.0471954345703125, 21.54248046875, 6.571895599365234, 23.666534423828125, 9.691764831542969, 9.7177734375, 6.517608642578125, 11.493026733398438, 20.650875091552734, 3.629638671875, -7.978248596191406, 4.945461273193359, -1.5046348571777344, 26.729507446289062, 2.8217086791992188, 2.7476043701171875, 12.4896240234375, -24.08808135986328, 2.53070068359375, 29.20355224609375, 6.415458679199219, -12.725038528442383, 1.430521011352539, 27.78152084350586, 38.76548767089844, 1.0792274475097656, -2.0931396484375, 10.28656005859375, 15.232391357421875, -4.82232666015625, -1.6144790649414062, -0.5478515625, 26.693878173828125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000414.npy"}
{"epoch": 0.6258503401360545, "step": 415, "batch_size": 64, "mean": 8.127355575561523, "std": 14.285613059997559, "min": -30.35393524169922, "p10": -7.651316070556639, "median": 6.773616790771484, "p90": 27.473417091369633, "max": 42.53938293457031, "pos_frac": 0.71875, "sample": [0.992030143737793, 13.988286972045898, 1.1304130554199219, 14.286314010620117, -2.9378204345703125, 6.919410705566406, 23.948348999023438, 35.58324432373047, -8.244804382324219, 30.74883270263672, -2.6563987731933594, -17.468002319335938, 19.36034393310547, 27.75520896911621, -5.777984619140625, 16.67172622680664, 42.53938293457031, 5.148960113525391, 26.361190795898438, -2.0617923736572266, 7.922332763671875, 5.550994873046875, 2.638397216796875, -1.2087278366088867, 14.458747863769531, 26.730003356933594, -13.881366729736328, 17.347579956054688, 14.2589111328125, 25.696533203125, -6.266510009765625, 11.107704162597656, 13.805183410644531, -1.5185127258300781, -16.807044982910156, 26.815902709960938, 12.922294616699219, 1.866765022277832, 27.94493865966797, -12.334365844726562, 3.5936203002929688, 0.354644775390625, -14.974386215209961, -30.35393524169922, 11.786182403564453, 10.472831726074219, 1.1862030029296875, 0.47719573974609375, 8.569938659667969, -0.8692092895507812, -1.3379268646240234, 5.759696960449219, 7.120555877685547, 6.6278228759765625, -1.2113571166992188, 13.357154846191406, 0.38715457916259766, 11.216400146484375, 22.975494384765625, -3.42828369140625, 4.2946319580078125, 16.230079650878906, 33.727874755859375, 30.85175323486328], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000415.npy"}
{"epoch": 0.6273620559334845, "step": 416, "batch_size": 64, "mean": 5.993932723999023, "std": 12.860825538635254, "min": -24.984886169433594, "p10": -10.19079475402832, "median": 4.642311096191406, "p90": 22.239747428894045, "max": 40.296539306640625, "pos_frac": 0.703125, "sample": [-0.5046348571777344, -24.984886169433594, -5.8308258056640625, 4.035194396972656, 1.7998523712158203, 30.159149169921875, -6.8883209228515625, 2.6154327392578125, 8.229248046875, 35.628204345703125, 6.411277770996094, 27.505157470703125, 40.296539306640625, 0.858428955078125, -1.9369010925292969, 2.9423599243164062, -0.21631240844726562, 8.0526123046875, -16.256607055664062, 6.8699951171875, 11.800708770751953, 6.728290557861328, 4.714576721191406, 13.69580078125, -2.20889949798584, 3.2026748657226562, 14.16876220703125, 35.25033950805664, 21.246246337890625, -3.8201370239257812, 12.370893478393555, 10.601509094238281, 0.6323623657226562, -12.026836395263672, 4.870563507080078, 0.8115272521972656, 0.5072288513183594, -15.557563781738281, 2.9536399841308594, 4.713035583496094, -4.2960052490234375, 18.616744995117188, -10.485137939453125, -2.174161911010742, 1.50726318359375, -1.8216552734375, 22.296499252319336, -12.668121337890625, -3.1839160919189453, 17.8934326171875, 3.8467254638671875, 8.405433654785156, 4.571586608886719, 22.10732650756836, -9.50399398803711, 9.803672790527344, -12.164619445800781, 7.7401123046875, 6.551929473876953, 12.920372009277344, 19.7833251953125, 18.241317749023438, 24.857009887695312, 7.326873779296875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000416.npy"}
{"epoch": 0.6288737717309146, "step": 417, "batch_size": 64, "mean": 10.32547378540039, "std": 14.134320259094238, "min": -23.313796997070312, "p10": -5.134703445434569, "median": 8.080432891845703, "p90": 30.612555313110352, "max": 47.942291259765625, "pos_frac": 0.828125, "sample": [5.852687835693359, -1.1444358825683594, 2.5123672485351562, -10.028884887695312, 30.054595947265625, 19.3323974609375, -23.313796997070312, 30.851680755615234, 10.295154571533203, 5.688762664794922, 47.942291259765625, 7.9529876708984375, 15.512447357177734, 8.207878112792969, 12.964019775390625, -20.74955177307129, 33.1026496887207, -3.6395187377929688, 1.08795166015625, 5.086017608642578, 15.11065673828125, 32.253387451171875, 6.993907928466797, 7.393497467041016, 3.341449737548828, -4.420265197753906, 36.355674743652344, 9.867408752441406, 21.588401794433594, 25.9267578125, 3.0454177856445312, 28.33869171142578, 38.67375946044922, 4.097259521484375, 16.2554931640625, -5.440891265869141, 2.7870254516601562, 6.629795074462891, 22.823829650878906, 11.590373992919922, 21.03341293334961, 3.290790557861328, 18.229679107666016, 6.637901306152344, 0.023223876953125, 8.710594177246094, 2.711057662963867, 20.157752990722656, -11.430198669433594, 9.708709716796875, 3.9932174682617188, 11.345699310302734, -3.3944549560546875, -7.4142303466796875, 10.967918395996094, -14.195365905761719, 13.817926406860352, 23.324874877929688, 3.4514389038085938, 0.06991386413574219, 23.099838256835938, 5.981782913208008, 14.335922241210938, 35.59358215332031], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000417.npy"}
{"epoch": 0.6303854875283447, "step": 418, "batch_size": 64, "mean": 5.039796829223633, "std": 12.3927001953125, "min": -19.930145263671875, "p10": -10.473326110839842, "median": 4.011898040771484, "p90": 21.12338256835939, "max": 35.28404235839844, "pos_frac": 0.65625, "sample": [8.009117126464844, 12.008453369140625, 15.305580139160156, 6.055023193359375, -6.352178573608398, 18.085289001464844, 2.8882713317871094, 17.274290084838867, 33.551368713378906, -4.7581939697265625, 29.816749572753906, 32.68964385986328, -17.821929931640625, 8.1649169921875, -0.5141143798828125, -12.370616912841797, 10.512924194335938, 0.22552871704101562, 24.220535278320312, -5.7386474609375, 1.7896900177001953, 35.28404235839844, 1.0632171630859375, -0.01871490478515625, 16.66571044921875, 5.225269317626953, -12.570266723632812, 11.910724639892578, 3.1935958862304688, -2.7381820678710938, -19.930145263671875, -3.713287353515625, 4.933135986328125, 6.135703086853027, 34.56840515136719, -4.3629608154296875, 5.964351654052734, -8.050884246826172, 2.286294937133789, -4.0615081787109375, -3.8923568725585938, -10.749481201171875, -12.502516746520996, -10.974968910217285, 8.504074096679688, 0.0829010009765625, 3.9044647216796875, 6.4473724365234375, -6.5241851806640625, 2.948394775390625, 8.443832397460938, 8.738677978515625, 22.42542266845703, 15.471794128417969, 5.6586151123046875, 0.7774200439453125, -0.488525390625, 16.4952392578125, 14.336837768554688, 11.14461898803711, 4.119331359863281, -1.1747932434082031, 4.35760498046875, -9.828964233398438], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000418.npy"}
{"epoch": 0.6318972033257747, "step": 419, "batch_size": 64, "mean": 10.564603805541992, "std": 11.301063537597656, "min": -10.125717163085938, "p10": -3.661724853515625, "median": 9.7017822265625, "p90": 26.51580276489258, "max": 36.08043670654297, "pos_frac": 0.8125, "sample": [2.3857955932617188, 25.99285888671875, 11.260187149047852, 15.152801513671875, -10.125717163085938, -2.0933914184570312, 23.74677276611328, 8.6697998046875, 0.5558319091796875, 11.780771255493164, -7.835906982421875, 19.706180572509766, 19.967147827148438, 12.612197875976562, 4.443653106689453, 1.6048583984375, 15.692634582519531, 10.4744873046875, 23.665786743164062, 0.8309097290039062, 4.189605712890625, 17.818923950195312, -6.646976470947266, 14.41473388671875, 4.325782775878906, -6.2350311279296875, 28.90645980834961, -2.1842269897460938, 17.314865112304688, 23.989356994628906, -0.498809814453125, -5.223972320556641, -3.6925277709960938, 20.3824462890625, 21.019805908203125, 11.075782775878906, 13.537620544433594, 3.232102394104004, 0.3175468444824219, 17.184234619140625, 9.722427368164062, 14.821441650390625, 7.7907257080078125, -1.3941574096679688, 26.73992156982422, -3.5898513793945312, 31.662673950195312, 7.430084228515625, 2.2498016357421875, 29.303268432617188, 29.505203247070312, 3.9762630462646484, 15.937202453613281, 9.681137084960938, 17.21548080444336, 7.1845245361328125, 18.059036254882812, 8.008041381835938, 33.748382568359375, 2.9175567626953125, 9.200462341308594, 3.7102279663085938, -5.541046142578125, 36.08043670654297], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000419.npy"}
{"epoch": 0.6334089191232048, "step": 420, "batch_size": 64, "mean": 7.048455238342285, "std": 14.057759284973145, "min": -22.854351043701172, "p10": -8.757856369018553, "median": 5.141809463500977, "p90": 27.93088035583497, "max": 37.34100341796875, "pos_frac": 0.6875, "sample": [-1.8172760009765625, -2.304443359375, 20.95079803466797, -1.339752197265625, 19.470298767089844, -22.854351043701172, 2.3769683837890625, 4.814655303955078, 3.9245681762695312, -21.546772003173828, -2.0227813720703125, 4.88995361328125, 2.0660934448242188, 15.937515258789062, -7.482513427734375, 22.869709014892578, 37.34100341796875, 5.099185943603516, 35.70047378540039, 25.45074462890625, 11.018089294433594, -7.054431915283203, 12.880014419555664, 13.647330284118652, 9.825460433959961, 19.977127075195312, -6.0615997314453125, -20.4390869140625, 6.41459846496582, 6.980486869812012, -14.285194396972656, 12.759101867675781, -3.5510711669921875, 7.017116546630859, -5.620861053466797, -6.363025665283203, 5.20458984375, -1.349578857421875, 9.907768249511719, 9.6595458984375, 4.273082733154297, -14.0179443359375, -9.304431915283203, 3.2980880737304688, 25.98973846435547, 7.624454498291016, 9.244743347167969, 33.51496887207031, 1.8700180053710938, 5.1844329833984375, 32.50305938720703, 2.3331375122070312, 10.660308837890625, -2.565166473388672, -0.9931907653808594, 18.284379959106445, 17.326751708984375, 28.762798309326172, 16.02570343017578, 30.698959350585938, 1.4350624084472656, 35.205780029296875, -10.098220825195312, 1.7541580200195312], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000420.npy"}
{"epoch": 0.6349206349206349, "step": 421, "batch_size": 64, "mean": 6.666936874389648, "std": 14.317138671875, "min": -28.57427215576172, "p10": -12.522950744628902, "median": 8.863885879516602, "p90": 23.40378723144532, "max": 48.763763427734375, "pos_frac": 0.65625, "sample": [-18.022357940673828, 29.128021240234375, -4.798160552978516, 15.193349838256836, 13.705062866210938, 25.749401092529297, 19.63190460205078, 0.7591361999511719, 3.4284820556640625, 48.763763427734375, 30.555938720703125, -7.5585784912109375, 24.208595275878906, 10.515216827392578, -8.32479476928711, 9.509431838989258, 10.334800720214844, -2.2303466796875, 21.525901794433594, 16.33739471435547, 17.848403930664062, 11.51751708984375, -0.00635528564453125, -3.908954620361328, 6.459712982177734, -1.6858749389648438, 17.81194305419922, -28.57427215576172, -15.182701110839844, -14.068763732910156, 4.492881774902344, 24.66259002685547, 26.555015563964844, -1.1777477264404297, 3.2299957275390625, 17.95264434814453, 20.260955810546875, 17.52788543701172, -2.8116493225097656, 3.0169296264648438, 16.289276123046875, -4.254650115966797, 9.925529479980469, 12.889633178710938, -8.529708862304688, 16.674619674682617, 2.1652297973632812, 8.977840423583984, -19.808868408203125, -4.277099609375, 21.36767578125, 18.431365966796875, -8.916053771972656, 9.039398193359375, 2.3658294677734375, -20.629425048828125, 2.5037384033203125, 9.56411361694336, 11.890911102294922, -14.28106689453125, 8.749931335449219, 20.42731475830078, -1.5096206665039062, -4.7042694091796875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000421.npy"}
{"epoch": 0.636432350718065, "step": 422, "batch_size": 64, "mean": 7.757229328155518, "std": 13.099474906921387, "min": -13.407379150390625, "p10": -8.644361877441405, "median": 6.988347053527832, "p90": 23.387367248535156, "max": 43.93443298339844, "pos_frac": 0.6875, "sample": [14.775604248046875, 40.41023254394531, 4.548984527587891, 2.9969635009765625, 16.813678741455078, -1.4829463958740234, 2.65557861328125, -13.407379150390625, 4.398303985595703, 43.93443298339844, 9.131172180175781, 11.696083068847656, -3.176410675048828, 16.818267822265625, 22.999374389648438, -0.02281951904296875, 14.707664489746094, 28.48987579345703, -7.424713134765625, 5.5682830810546875, 20.76714324951172, -4.175750732421875, -0.16963958740234375, -12.902336120605469, 11.8204345703125, -3.5426788330078125, 5.9315032958984375, 9.09834098815918, 39.5247802734375, 4.08367919921875, 14.793487548828125, 5.91412353515625, 17.91851043701172, -10.667190551757812, 8.437198638916016, -10.922073364257812, 12.18227767944336, -3.6216468811035156, 27.789562225341797, 3.2900466918945312, -0.0521087646484375, -5.901756286621094, 11.111186981201172, 13.40639877319336, 13.361541748046875, -3.0140914916992188, -3.5682220458984375, 9.025634765625, 0.281951904296875, 8.287567138671875, -1.7522048950195312, 40.66618347167969, 0.8791275024414062, -9.167068481445312, 13.260726928710938, 11.380477905273438, 5.094770431518555, -13.291000366210938, 16.626190185546875, 23.55364990234375, 8.045190811157227, 8.462593078613281, -9.62017822265625, 9.406116485595703], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000422.npy"}
{"epoch": 0.6379440665154951, "step": 423, "batch_size": 64, "mean": 8.220325469970703, "std": 12.39243221282959, "min": -22.032196044921875, "p10": -4.706250762939453, "median": 6.1273956298828125, "p90": 26.94423065185547, "max": 37.99919128417969, "pos_frac": 0.78125, "sample": [4.759620666503906, -7.263969421386719, 24.516311645507812, -2.000131607055664, 35.3106575012207, 0.7541999816894531, 3.1303977966308594, -15.104583740234375, 24.40362548828125, 2.7144317626953125, -4.9017181396484375, 28.25178337097168, 2.654571533203125, 2.2773590087890625, -3.5542144775390625, 6.861030578613281, 4.7146453857421875, 2.075164794921875, 33.83403015136719, -9.431739807128906, 12.579545974731445, 8.479026794433594, -5.9822540283203125, 13.173206329345703, 4.797903060913086, 13.935531616210938, 31.809722900390625, -3.799833297729492, -22.032196044921875, 5.459247589111328, 15.436065673828125, 24.952476501464844, 9.330718994140625, 11.216567993164062, 13.56551742553711, 10.693504333496094, 7.3852386474609375, 4.59423828125, 26.609451293945312, 37.99919128417969, 3.651641845703125, 4.41893196105957, -0.7805099487304688, 14.047012329101562, 6.304962158203125, 16.905437469482422, 9.480403900146484, -0.08765411376953125, 1.5098648071289062, 7.17083740234375, 27.120895385742188, 6.958740234375, -3.003833770751953, 2.945985794067383, 10.580421447753906, 27.08770751953125, 2.6305770874023438, 11.15616226196289, -4.250160217285156, 5.9498291015625, 25.505142211914062, -13.372634887695312, 8.23386001586914, 1.7328567504882812], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000423.npy"}
{"epoch": 0.6394557823129252, "step": 424, "batch_size": 64, "mean": 9.963847160339355, "std": 12.055689811706543, "min": -15.091888427734375, "p10": -6.738660049438477, "median": 9.915277481079102, "p90": 27.747252273559575, "max": 34.16695022583008, "pos_frac": 0.828125, "sample": [-7.362812042236328, 26.553497314453125, 11.628059387207031, 28.258861541748047, 0.8007431030273438, 15.362363815307617, 12.006988525390625, 13.852928161621094, -7.5218048095703125, -15.091888427734375, 5.449817657470703, 9.684375762939453, -0.7643470764160156, -10.173194885253906, 34.04987335205078, 5.581058502197266, 14.895797729492188, -6.747917175292969, 4.426872253417969, 3.1820755004882812, 2.7350082397460938, 12.042366027832031, 1.4651870727539062, -3.2774200439453125, 2.2120208740234375, 11.954353332519531, 24.596832275390625, 3.906890869140625, 33.19996643066406, 12.748188018798828, 15.984733581542969, 6.479461669921875, 0.390228271484375, 4.254304885864258, 31.119304656982422, 8.097412109375, 12.073127746582031, 25.874679565429688, 20.49614715576172, 10.387290954589844, 3.382020950317383, 17.01046371459961, 2.815298080444336, 34.16695022583008, 6.653358459472656, 30.635618209838867, 16.77899169921875, 10.411373138427734, -10.207962036132812, 0.791717529296875, 20.419334411621094, 13.633544921875, 10.14617919921875, 15.891975402832031, -0.115570068359375, 0.18790435791015625, 9.496368408203125, -11.469245910644531, 24.272506713867188, 29.866413116455078, 7.885658264160156, 25.343852996826172, -6.717060089111328, 11.5950927734375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000424.npy"}
{"epoch": 0.6409674981103552, "step": 425, "batch_size": 64, "mean": 8.575277328491211, "std": 10.988032341003418, "min": -13.885955810546875, "p10": -4.666596984863281, "median": 7.84356689453125, "p90": 22.543521118164065, "max": 36.59124755859375, "pos_frac": 0.765625, "sample": [10.390174865722656, 8.603523254394531, 0.590789794921875, 1.929595947265625, 19.50829315185547, 1.4711990356445312, 36.59124755859375, -2.040088653564453, 21.693450927734375, 10.283035278320312, 23.300537109375, -2.6441268920898438, 12.570747375488281, -7.06134033203125, -10.031906127929688, 14.795509338378906, -0.512786865234375, 14.153648376464844, 1.0106582641601562, 9.5919189453125, -4.444557189941406, 14.528778076171875, -4.761756896972656, 2.6358089447021484, 15.180519104003906, 4.521148681640625, 8.917152404785156, 25.42437744140625, 3.1618423461914062, 20.557647705078125, 6.484779357910156, -4.390033721923828, 7.083610534667969, 4.473381042480469, 16.95379638671875, 4.037330627441406, 2.8935317993164062, 28.592517852783203, 6.262386322021484, 6.0930633544921875, 16.565040588378906, -8.489089965820312, 9.071006774902344, 21.697402954101562, 20.47415542602539, 9.419681549072266, 12.383659362792969, 12.671310424804688, 16.885894775390625, 4.311981201171875, 31.075355529785156, 5.902099609375, -4.201896667480469, 15.468490600585938, -6.3779144287109375, -3.015331268310547, 30.07373046875, 13.972517013549805, -6.4535064697265625, 18.75946044921875, -2.7977371215820312, 4.0018157958984375, 22.906143188476562, -13.885955810546875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000425.npy"}
{"epoch": 0.6424792139077853, "step": 426, "batch_size": 64, "mean": 5.097806930541992, "std": 14.597975730895996, "min": -36.9598388671875, "p10": -12.014838790893554, "median": 3.712291717529297, "p90": 25.640742874145516, "max": 51.890960693359375, "pos_frac": 0.640625, "sample": [-1.7228469848632812, -1.8993682861328125, 15.394630432128906, 11.019401550292969, 20.08917236328125, 26.725929260253906, -1.2177276611328125, 34.32365417480469, 4.850944519042969, 12.509086608886719, -15.391845703125, -6.07281494140625, 2.295543670654297, -15.541725158691406, 10.729583740234375, -5.674518585205078, 11.203914642333984, 4.906425476074219, 2.7409210205078125, -0.4138374328613281, -16.013427734375, 3.1901397705078125, -5.0230255126953125, 4.1776885986328125, -11.602325439453125, 3.2468948364257812, 0.5622138977050781, -36.9598388671875, 11.133441925048828, 12.286911010742188, 9.215438842773438, 23.372127532958984, 14.348560333251953, 6.7238006591796875, -17.541534423828125, 4.916046142578125, -11.870330810546875, 5.499904632568359, 1.251800537109375, 51.890960693359375, 35.40681457519531, 29.463546752929688, -12.076770782470703, 2.38336181640625, -5.460929870605469, 33.270957946777344, -5.942649841308594, -0.2298431396484375, 0.5060539245605469, -1.8661346435546875, -3.251789093017578, 14.329177856445312, 8.229095458984375, 26.613006591796875, 9.222969055175781, -1.7359848022460938, 4.937137603759766, 5.5078125, 1.9500656127929688, -14.465484619140625, 13.958251953125, 7.38836669921875, 19.99893569946289, -3.536283493041992], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000426.npy"}
{"epoch": 0.6439909297052154, "step": 427, "batch_size": 64, "mean": 8.076553344726562, "std": 12.292357444763184, "min": -19.56866455078125, "p10": -4.489707946777344, "median": 5.1709136962890625, "p90": 24.379368591308594, "max": 44.982177734375, "pos_frac": 0.71875, "sample": [10.59564208984375, 3.9173622131347656, -2.670307159423828, 1.0796966552734375, 7.57391357421875, 3.1133346557617188, 30.811023712158203, -1.2402267456054688, 18.844161987304688, -13.198997497558594, 14.516220092773438, -5.888938903808594, -9.619930267333984, -7.9216766357421875, 16.784751892089844, 12.703811645507812, 7.57647705078125, 3.5634994506835938, -0.36392974853515625, 4.173545837402344, -0.4240245819091797, 2.3460006713867188, 24.508132934570312, 9.10549545288086, 4.982398986816406, 18.659732818603516, 4.426738739013672, 18.736595153808594, 9.201522827148438, 8.848567962646484, 22.15077018737793, 2.7079010009765625, 10.693756103515625, 5.289466857910156, 5.052360534667969, 18.128684997558594, -1.2550907135009766, -19.56866455078125, 26.593170166015625, 30.764846801757812, 11.836250305175781, 3.175434112548828, -4.672607421875, -1.4266777038574219, -2.735137939453125, -10.305553436279297, 22.42047119140625, 24.07891845703125, 8.296524047851562, 0.5438709259033203, 30.326995849609375, 19.02081298828125, 2.5044937133789062, 4.0065765380859375, 14.022327423095703, -0.582855224609375, 8.592094421386719, 14.344329833984375, -3.2824554443359375, -1.484771728515625, 6.045207977294922, -4.0629425048828125, 44.982177734375, 35.95811462402344], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000427.npy"}
{"epoch": 0.6455026455026455, "step": 428, "batch_size": 64, "mean": 9.290613174438477, "std": 13.943106651306152, "min": -18.441978454589844, "p10": -8.799832534790038, "median": 7.09659481048584, "p90": 28.930963516235355, "max": 36.11219787597656, "pos_frac": 0.75, "sample": [-14.534461975097656, 15.305587768554688, 27.80816650390625, -0.5389776229858398, 0.19383621215820312, 4.188861846923828, -0.236724853515625, -18.441978454589844, -8.831783294677734, -3.6723785400390625, 27.835304260253906, -11.11346435546875, 2.6124343872070312, 1.592620849609375, 27.788860321044922, 1.0686798095703125, 31.63922119140625, 7.529523849487305, 0.5832138061523438, -0.5830154418945312, 15.33993911743164, 16.077360153198242, 2.1757354736328125, 8.990970611572266, 22.724735260009766, 5.312591552734375, 4.684642791748047, -5.98162841796875, 22.07537841796875, -11.772064208984375, 14.030426025390625, -14.013236999511719, 4.601663589477539, 8.793205261230469, 20.333572387695312, 36.11219787597656, 26.568328857421875, 11.5830078125, 1.4836235046386719, -9.156444549560547, 11.196136474609375, 25.1458740234375, 21.686111450195312, 34.84540557861328, -8.72528076171875, -1.7508773803710938, 12.9171142578125, 7.7209625244140625, 3.1834945678710938, 6.663665771484375, 2.836212158203125, 17.52259063720703, 14.468969345092773, -2.485126495361328, 33.4561767578125, 28.004840850830078, 33.62057113647461, 29.32787322998047, 5.50079345703125, 9.996795654296875, 32.076568603515625, 12.264957427978516, 3.2488784790039062, -8.280990600585938], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000428.npy"}
{"epoch": 0.6470143613000756, "step": 429, "batch_size": 64, "mean": 9.301875114440918, "std": 15.412348747253418, "min": -33.32352828979492, "p10": -10.959354591369626, "median": 8.578699111938477, "p90": 29.307640457153322, "max": 49.22416687011719, "pos_frac": 0.765625, "sample": [10.40655517578125, -12.539077758789062, -1.1439151763916016, 4.4780120849609375, 10.372039794921875, 19.015777587890625, 13.082420349121094, 22.288612365722656, -1.8272972106933594, 7.2247161865234375, 0.7581825256347656, -8.195068359375, 5.891242980957031, 8.44192886352539, 11.236679077148438, 17.008941650390625, -21.80474853515625, 4.3502960205078125, -2.9543609619140625, 16.458709716796875, 12.509445190429688, 2.0803070068359375, 14.738037109375, 20.765573501586914, 36.73274230957031, 28.963699340820312, 37.210357666015625, 7.4189605712890625, 11.875938415527344, -12.144048690795898, -16.364639282226562, -33.32352828979492, -0.323089599609375, 3.8799514770507812, 23.129188537597656, -0.7950782775878906, -17.609222412109375, 17.11176300048828, 4.100852966308594, 2.77093505859375, 28.86233901977539, 39.28046417236328, 9.107154846191406, 29.45504379272461, 31.4532470703125, -19.09925079345703, -4.992576599121094, 11.192428588867188, 2.5012588500976562, 8.715469360351562, 7.804964065551758, 36.37699890136719, 5.921607971191406, 4.920642852783203, 10.632858276367188, 5.030853271484375, 24.209362030029297, 15.198408126831055, 49.22416687011719, 20.478553771972656, 11.513607025146484, 6.337331771850586, -1.03131103515625, 16.94861602783203], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000429.npy"}
{"epoch": 0.6485260770975056, "step": 430, "batch_size": 64, "mean": 9.822078704833984, "std": 13.47584342956543, "min": -25.879165649414062, "p10": -5.5370876312255835, "median": 8.895130157470703, "p90": 28.18503875732422, "max": 42.918853759765625, "pos_frac": 0.796875, "sample": [27.811508178710938, 1.0984992980957031, -1.3678665161132812, 11.519248962402344, -7.608436584472656, 16.128273010253906, 0.8953351974487305, 39.78683853149414, 10.136894226074219, 11.100997924804688, -11.410903930664062, 21.475303649902344, 28.73432159423828, 7.873716354370117, 37.62212371826172, 17.633522033691406, 22.622817993164062, 11.174118041992188, 28.884132385253906, 9.117042541503906, 4.613273620605469, 31.55695343017578, 11.248619079589844, 3.970775604248047, -2.2946014404296875, 26.921104431152344, 1.9257278442382812, 20.892410278320312, 21.805328369140625, 8.232391357421875, 6.247165679931641, -7.9494781494140625, 9.770584106445312, 15.140533447265625, 4.525135040283203, 42.918853759765625, 28.345123291015625, 11.573005676269531, -10.763648986816406, 21.618465423583984, 0.7861709594726562, 10.992156982421875, 2.169973373413086, -0.0843963623046875, 2.8050613403320312, 9.67544937133789, 7.577171325683594, -1.1121673583984375, -25.879165649414062, -18.618011474609375, 2.29583740234375, 17.889328002929688, -2.8686981201171875, 26.910789489746094, 1.7820892333984375, 17.7987060546875, 8.645034790039062, 17.385820388793945, 11.04425048828125, 1.1296844482421875, -1.1506118774414062, -6.680683135986328, 8.6732177734375, 3.9208297729492188], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000430.npy"}
{"epoch": 0.6500377928949358, "step": 431, "batch_size": 64, "mean": 8.467656135559082, "std": 14.109068870544434, "min": -27.208786010742188, "p10": -6.96243677139282, "median": 7.308052062988281, "p90": 27.76457214355469, "max": 43.29418182373047, "pos_frac": 0.71875, "sample": [-23.255916595458984, 7.323936462402344, 14.121349334716797, 1.7799263000488281, -1.5014381408691406, 21.810348510742188, 23.570281982421875, 1.4860610961914062, 14.441787719726562, -8.302207946777344, 4.321460723876953, 42.72089385986328, 9.983102798461914, -1.8022842407226562, 1.6587791442871094, 1.3402824401855469, 22.892791748046875, -10.302558898925781, 23.05767822265625, 3.8553314208984375, 7.453224182128906, -18.19573211669922, -7.6593170166015625, 22.927711486816406, -9.856422424316406, 26.612289428710938, 33.884117126464844, -0.04314613342285156, 13.216026306152344, -5.336382865905762, -0.2849559783935547, 8.526479721069336, 12.237136840820312, 28.858169555664062, -3.4618682861328125, -5.1896514892578125, 30.192169189453125, 7.4200439453125, -27.208786010742188, 4.442913055419922, -1.427337646484375, 7.000755310058594, -4.638832092285156, 21.723373413085938, 5.211677551269531, 7.292167663574219, 31.406734466552734, 10.8857421875, 11.42951774597168, 43.29418182373047, -0.26860809326171875, 20.092899322509766, -2.3899078369140625, 8.499427795410156, 10.734477996826172, 18.340065002441406, 4.752082824707031, 3.3328170776367188, 5.775665283203125, 28.258407592773438, 19.660337448120117, 8.779510498046875, 10.436119079589844, 6.01507568359375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000431.npy"}
{"epoch": 0.6515495086923658, "step": 432, "batch_size": 64, "mean": 8.456931114196777, "std": 14.370782852172852, "min": -25.06049346923828, "p10": -6.439567756652831, "median": 5.158864498138428, "p90": 27.263481140136722, "max": 41.76062774658203, "pos_frac": 0.71875, "sample": [-1.5517578125, 0.17915725708007812, -9.954917907714844, -25.06049346923828, 4.899477958679199, 14.008392333984375, 0.4426689147949219, 5.955356597900391, 0.3164520263671875, 17.76714324951172, 2.084596633911133, -21.967323303222656, -0.33245849609375, 2.180194854736328, 23.398147583007812, -21.16021728515625, 5.418251037597656, 7.28594970703125, 41.76062774658203, 24.89208984375, -6.5972747802734375, 14.048870086669922, 26.399002075195312, -2.588104248046875, 4.170806884765625, 24.36473846435547, 12.264251708984375, 27.63397216796875, 17.993370056152344, -6.071584701538086, -7.438972473144531, 0.18095779418945312, 0.476837158203125, 0.14940643310546875, 4.891998291015625, -4.8481597900390625, 35.9078369140625, 17.02783966064453, 9.617477416992188, 19.10042953491211, -1.234222412109375, 20.361892700195312, 20.182861328125, 1.5861053466796875, 25.620071411132812, 16.26910400390625, -0.5381793975830078, 32.58500289916992, -1.2782821655273438, 39.728172302246094, 9.6824951171875, 21.034759521484375, 3.6618423461914062, -0.5899505615234375, 7.0506591796875, 31.919540405273438, -10.89654541015625, 10.76007080078125, 18.992477416992188, -0.20413970947265625, 0.009979248046875, 28.263980865478516, -2.47650146484375, 13.507347106933594], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000432.npy"}
{"epoch": 0.6530612244897959, "step": 433, "batch_size": 64, "mean": 11.235065460205078, "std": 15.82537841796875, "min": -24.409557342529297, "p10": -8.536418151855468, "median": 7.620704650878906, "p90": 35.844918060302746, "max": 48.41077423095703, "pos_frac": 0.8125, "sample": [2.1541748046875, 5.887306213378906, -12.954383850097656, 9.228191375732422, 8.401527404785156, 2.9940872192382812, -0.9175662994384766, 37.95635986328125, -2.754331588745117, 5.50396728515625, 3.5051498413085938, 13.831878662109375, 1.2180023193359375, -3.3476943969726562, 37.926422119140625, -15.796699523925781, 4.780561447143555, 28.707977294921875, 19.78363609313965, -9.07415771484375, 18.8634033203125, 1.1354751586914062, 6.55078125, 20.242889404296875, -24.409557342529297, 11.127763748168945, 17.170806884765625, 2.4387130737304688, 32.98993682861328, 19.720172882080078, 3.482372283935547, 26.859573364257812, -10.512435913085938, 5.9410400390625, 17.070541381835938, -6.842136383056641, -11.612136840820312, 2.387054443359375, 6.778472900390625, 48.41077423095703, 32.32539367675781, 37.0684814453125, 32.04412841796875, 4.662078857421875, 14.212486267089844, -14.260566711425781, 1.6208038330078125, 14.759880065917969, 1.8897628784179688, 37.39038848876953, 1.5516891479492188, 15.864585876464844, 16.516937255859375, -7.2816925048828125, 6.8456878662109375, 27.154468536376953, 26.312286376953125, 8.395721435546875, 28.42013168334961, 19.37322998046875, 38.59990692138672, 0.8355560302734375, 39.68931579589844, 10.225639343261719], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000433.npy"}
{"epoch": 0.654572940287226, "step": 434, "batch_size": 64, "mean": 9.755840301513672, "std": 13.922234535217285, "min": -19.195396423339844, "p10": -7.069803619384766, "median": 10.585586547851562, "p90": 28.422020721435555, "max": 40.421592712402344, "pos_frac": 0.734375, "sample": [-19.04931640625, 21.586257934570312, 15.231204986572266, -7.1002197265625, 1.0344467163085938, 30.83423614501953, -8.775554656982422, 29.15845489501953, -17.918197631835938, 10.259334564208984, -0.5493927001953125, -0.2249755859375, -19.195396423339844, 22.14649200439453, 20.883811950683594, 0.27507781982421875, 22.760047912597656, 14.011161804199219, 12.18731689453125, 25.573009490966797, 22.408851623535156, -1.24462890625, 35.01104736328125, -6.2409820556640625, -12.374759674072266, 10.974952697753906, 33.00844955444336, 4.13482666015625, 4.4605865478515625, 17.86058807373047, -2.298126220703125, 12.158924102783203, 5.801910400390625, 40.421592712402344, 33.877410888671875, 16.755645751953125, -5.3074798583984375, 4.0096282958984375, 16.335708618164062, -0.27542877197265625, 25.965009689331055, -6.998832702636719, 3.626744270324707, 16.502626419067383, 3.8382568359375, 1.3561325073242188, -0.4688072204589844, 16.88775634765625, 6.483772277832031, 12.873741149902344, 2.1629180908203125, 39.27569580078125, -4.618152618408203, 7.441497802734375, 6.721446990966797, 14.543800354003906, 13.489845275878906, 26.70367431640625, 10.91183853149414, -7.187349319458008, 13.272468566894531, 13.206718444824219, 18.882614135742188, 6.893831253051758], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000434.npy"}
{"epoch": 0.656084656084656, "step": 435, "batch_size": 64, "mean": 7.464873313903809, "std": 12.1670560836792, "min": -19.644821166992188, "p10": -5.866968536376953, "median": 5.327945709228516, "p90": 23.656570434570316, "max": 35.865882873535156, "pos_frac": 0.71875, "sample": [27.58926010131836, 16.03870964050293, -4.558620452880859, 5.714265823364258, -7.3880462646484375, 9.632987976074219, -2.5142974853515625, 3.91839599609375, 18.110576629638672, -3.010761260986328, 15.480522155761719, 17.697399139404297, -3.09375, 18.609054565429688, 30.98737335205078, 13.027595520019531, 3.1486358642578125, -12.649917602539062, -5.7198944091796875, 23.022216796875, 17.698165893554688, -5.930000305175781, 20.803970336914062, 25.68640899658203, 5.18951416015625, 0.2626953125, -3.5492935180664062, 2.652587890625, 27.187610626220703, 29.373809814453125, 2.1348800659179688, 23.928436279296875, -14.572509765625, 10.376136779785156, 10.352924346923828, 15.395423889160156, -5.2315521240234375, 1.4142913818359375, 8.906026840209961, 11.630302429199219, 13.272537231445312, 21.974227905273438, 35.865882873535156, 20.826332092285156, 2.269730567932129, 6.38868522644043, -5.490139007568359, -4.58912467956543, -4.5329742431640625, 21.969207763671875, -4.187553405761719, 4.705924987792969, -8.553937911987305, 12.580665588378906, 5.466377258300781, 1.6656265258789062, 9.009796142578125, -6.093719482421875, -19.644821166992188, 4.97991943359375, 2.68292236328125, 1.507080078125, 17.1519775390625, 0.775726318359375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000435.npy"}
{"epoch": 0.6575963718820862, "step": 436, "batch_size": 64, "mean": 8.60055160522461, "std": 13.844925880432129, "min": -25.686552047729492, "p10": -7.6133804321289045, "median": 6.772014617919922, "p90": 27.518448638916027, "max": 41.34138488769531, "pos_frac": 0.734375, "sample": [2.510772705078125, -0.6577033996582031, 19.708206176757812, 3.1191482543945312, 6.892642974853516, 5.8224382400512695, 4.496112823486328, 18.20290756225586, -8.755355834960938, 9.885517120361328, 22.127696990966797, 28.679710388183594, 4.209556579589844, 14.454254150390625, 4.847282409667969, 37.67435073852539, 41.34138488769531, 30.612777709960938, -23.540420532226562, 20.684345245361328, 13.496326446533203, 4.253871917724609, -5.328926086425781, 3.8388290405273438, -1.530813217163086, 3.5596694946289062, 24.410720825195312, 2.6828041076660156, 10.389183044433594, 19.033382415771484, 14.323081970214844, -8.483840942382812, -25.686552047729492, 7.26409912109375, -0.7823333740234375, 1.8555831909179688, 8.379203796386719, -8.86313247680664, -4.582267761230469, -2.5183639526367188, 16.4356689453125, 19.40448760986328, 6.638694763183594, 3.9612808227539062, -2.1877899169921875, 10.816054344177246, -10.846330642700195, 31.866424560546875, 11.293609619140625, 24.808837890625, 9.300582885742188, 0.35216808319091797, 16.564720153808594, 13.548370361328125, -0.6241874694824219, 17.9940185546875, 33.69702911376953, -5.582305908203125, 7.7995452880859375, 6.651386260986328, -0.964813232421875, 37.72889709472656, -15.469661712646484, 19.222488403320312], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000436.npy"}
{"epoch": 0.6591080876795162, "step": 437, "batch_size": 64, "mean": 7.463058948516846, "std": 15.065455436706543, "min": -28.270355224609375, "p10": -10.326903533935546, "median": 6.45164680480957, "p90": 27.660278320312504, "max": 46.75574493408203, "pos_frac": 0.640625, "sample": [42.031005859375, 14.577392578125, 11.333938598632812, 2.592376708984375, 13.742835998535156, -12.483558654785156, 17.716278076171875, 3.852081298828125, 40.69788360595703, 0.7616920471191406, 27.906280517578125, 5.5746002197265625, -28.270355224609375, 7.528099060058594, -10.617622375488281, -4.2651824951171875, 8.799911499023438, 13.794601440429688, 9.048904418945312, -17.808242797851562, 21.32769775390625, 33.95734405517578, 32.714046478271484, -10.936553955078125, -1.1925811767578125, 20.849151611328125, 17.84906005859375, -10.961814880371094, -11.591699600219727, -5.032684326171875, -2.3499488830566406, -3.2215137481689453, -2.3080520629882812, -9.6485595703125, 9.692790985107422, 36.5514030456543, 13.067092895507812, -5.14971923828125, 8.747329711914062, 1.3934440612792969, 8.981948852539062, 0.5558910369873047, -4.010677337646484, 15.697135925292969, -3.7678680419921875, 26.858352661132812, 5.197509765625, 10.284995079040527, 17.866851806640625, -4.158203125, 6.961688995361328, 12.294544219970703, 3.1136131286621094, 46.75574493408203, 11.101573944091797, -4.553676605224609, 5.9416046142578125, 22.403213500976562, -5.4896240234375, 9.572502136230469, -1.612466812133789, -5.045173645019531, -4.669120788574219, 27.086273193359375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000437.npy"}
{"epoch": 0.6606198034769464, "step": 438, "batch_size": 64, "mean": 9.921192169189453, "std": 13.56430721282959, "min": -19.912033081054688, "p10": -7.435047912597655, "median": 8.40620231628418, "p90": 29.690810775756844, "max": 35.625423431396484, "pos_frac": 0.75, "sample": [27.82054901123047, -8.096920013427734, 8.51788330078125, -9.492229461669922, 0.7405805587768555, -5.206787109375, 1.6413688659667969, 20.653514862060547, 0.9000587463378906, 25.332244873046875, 2.4048919677734375, 34.577049255371094, -7.936981201171875, -1.2285995483398438, 18.327102661132812, 2.8749122619628906, 24.467514038085938, -11.934078216552734, 13.049774169921875, 15.39830207824707, 21.82489776611328, 20.58823013305664, -6.2638702392578125, 0.8990325927734375, 14.878284454345703, 6.763877868652344, 20.37543487548828, 8.83932876586914, 26.817270278930664, 31.35710906982422, 10.887680053710938, -1.3217010498046875, 33.304481506347656, -0.029293060302734375, 1.7941398620605469, 33.08623123168945, 15.04571533203125, -3.789417266845703, 7.409297943115234, 21.67137908935547, 30.492351531982422, 13.713973999023438, 8.29452133178711, 31.26434326171875, 13.310104370117188, 3.3923797607421875, 18.24029541015625, 7.219104766845703, -11.184982299804688, 3.893352508544922, -6.202339172363281, -10.472249984741211, 6.698738098144531, 24.963455200195312, 20.138782501220703, -19.912033081054688, 35.625423431396484, 17.02838897705078, -5.650596618652344, 22.662242889404297, 0.2911262512207031, -0.20721435546875, 4.52935791015625, 9.879505157470703], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000438.npy"}
{"epoch": 0.6621315192743764, "step": 439, "batch_size": 64, "mean": 9.39749526977539, "std": 12.431477546691895, "min": -13.790313720703125, "p10": -4.84543228149414, "median": 7.027629852294922, "p90": 27.82070426940919, "max": 40.70500183105469, "pos_frac": 0.765625, "sample": [17.23638916015625, 5.3567352294921875, -2.716838836669922, -13.261268615722656, 13.284324645996094, 18.594181060791016, 4.73828125, 0.177764892578125, 5.645957946777344, 19.804107666015625, 40.70500183105469, 4.851354598999023, 16.777355194091797, 7.4753265380859375, 10.065292358398438, 29.300979614257812, 28.826400756835938, 5.4440460205078125, 11.016845703125, 8.1396484375, 2.982463836669922, 21.964664459228516, 11.911857604980469, -13.561641693115234, 14.986106872558594, 5.3047943115234375, 17.581939697265625, -4.1220245361328125, 4.957302093505859, 32.88488006591797, -3.7328224182128906, 34.41327667236328, -6.281791687011719, -6.2333984375, 25.474079132080078, 6.579933166503906, -0.73468017578125, -1.2365798950195312, -13.790313720703125, -3.566221237182617, -0.379302978515625, 3.8240966796875, 21.438087463378906, 8.5306396484375, 19.133865356445312, 17.52386474609375, 17.153350830078125, 21.96532440185547, 10.488258361816406, 3.4811973571777344, 4.585624694824219, 3.2027435302734375, 4.709388732910156, 28.847923278808594, 12.43118667602539, 39.25120544433594, -2.949676513671875, -6.7614593505859375, 1.1375045776367188, -5.155464172363281, 9.949073791503906, 17.259292602539062, 13.533428192138672, 0.9958267211914062], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000439.npy"}
{"epoch": 0.6636432350718064, "step": 440, "batch_size": 64, "mean": 12.008790969848633, "std": 12.633219718933105, "min": -11.53884506225586, "p10": -2.755699157714843, "median": 8.955602645874023, "p90": 29.773544692993166, "max": 40.106849670410156, "pos_frac": 0.828125, "sample": [3.9203414916992188, 9.22259521484375, 25.524707794189453, 27.755584716796875, 34.522369384765625, 28.526092529296875, 36.476905822753906, 0.05044364929199219, 26.52391815185547, 15.807846069335938, 7.700111389160156, 31.339038848876953, -3.0462646484375, 18.289382934570312, 7.945274353027344, 7.354637145996094, 26.86578369140625, -11.53884506225586, 2.4198379516601562, 5.6576690673828125, 7.679298400878906, -0.8438568115234375, 0.4272003173828125, 21.01103973388672, 12.273262023925781, 16.73110008239746, 16.889793395996094, 15.473251342773438, 1.4832477569580078, 7.17340087890625, -0.22875404357910156, 12.796371459960938, 1.2776679992675781, 11.120925903320312, 2.9508438110351562, 3.030620574951172, 8.674217224121094, -6.3555450439453125, 8.688610076904297, 29.490459442138672, 13.99563217163086, 21.343994140625, 3.2888031005859375, 26.82980728149414, -3.3972015380859375, -1.7405624389648438, 40.106849670410156, 12.021804809570312, 29.894866943359375, -2.0777130126953125, -3.9565086364746094, -6.7156219482421875, 25.009201049804688, 17.293304443359375, 20.746021270751953, 9.735210418701172, 0.354705810546875, -4.525751113891602, 11.971363067626953, 4.806755065917969, 4.196739196777344, 36.6551399230957, 34.69397735595703, 6.97125244140625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000440.npy"}
{"epoch": 0.6651549508692366, "step": 441, "batch_size": 64, "mean": 9.259559631347656, "std": 11.691469192504883, "min": -13.625709533691406, "p10": -5.266491317749024, "median": 7.962276458740234, "p90": 25.02992515563965, "max": 34.989112854003906, "pos_frac": 0.796875, "sample": [11.603553771972656, 4.902927398681641, 22.58358383178711, 11.528892517089844, 11.350517272949219, -1.3237380981445312, 17.690956115722656, 7.849815368652344, 16.494613647460938, -3.855884552001953, -6.436725616455078, 13.634819030761719, 5.588527679443359, 24.548446655273438, 4.225475311279297, 14.921875, 27.31268310546875, 8.276443481445312, -10.572944641113281, 28.79779052734375, 6.1651458740234375, 6.120208740234375, 28.031402587890625, 21.978004455566406, 0.1485748291015625, 34.989112854003906, 1.5822982788085938, 22.197853088378906, 34.669281005859375, -3.078399658203125, -0.7417984008789062, 3.4744949340820312, -9.228765487670898, 23.481674194335938, 23.685821533203125, -5.216388702392578, -12.749755859375, 24.939346313476562, 25.068744659423828, 2.0241851806640625, 4.2195892333984375, 18.113677978515625, 4.195720672607422, 12.708000183105469, 6.0870819091796875, -6.4885711669921875, 8.074737548828125, 2.1708526611328125, 5.2515716552734375, -5.2879638671875, 17.03289794921875, -4.398059844970703, 17.3985595703125, 26.85321807861328, -13.625709533691406, 8.150299072265625, 12.77587890625, 9.393241882324219, 3.37200927734375, 6.7635498046875, 0.4809112548828125, 10.336677551269531, 11.067224502563477, 1.303731918334961], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000441.npy"}
{"epoch": 0.6666666666666666, "step": 442, "batch_size": 64, "mean": 8.054361343383789, "std": 12.278545379638672, "min": -19.442134857177734, "p10": -6.660958099365233, "median": 8.396215438842773, "p90": 26.38447265625001, "max": 41.31687927246094, "pos_frac": 0.703125, "sample": [2.8927001953125, 8.081207275390625, -0.206085205078125, -5.0982818603515625, 5.233699798583984, 27.591018676757812, 41.31687927246094, 32.021446228027344, 10.903350830078125, 16.636207580566406, 13.994071960449219, 10.591590881347656, 28.326881408691406, -9.62823486328125, 8.711223602294922, 12.958454132080078, -1.9457321166992188, 2.2601318359375, 3.7063980102539062, -10.846637725830078, -2.1699752807617188, 8.880590438842773, -5.5920257568359375, -0.6873703002929688, 13.531906127929688, 17.06816864013672, -14.426101684570312, 13.988956451416016, 15.818962097167969, 13.68133544921875, 9.558952331542969, 7.945411682128906, 13.668815612792969, 30.569107055664062, 23.569198608398438, -0.061725616455078125, -1.8624267578125, -1.8290901184082031, 32.11211395263672, 17.112476348876953, 6.78424072265625, 0.58294677734375, -3.2433204650878906, 1.892333984375, 1.1667747497558594, -19.442134857177734, -1.7858428955078125, 28.079727172851562, -5.049537658691406, 10.148147583007812, 22.634475708007812, 6.1131591796875, -7.119071960449219, 12.383060455322266, 8.863983154296875, -11.40329360961914, 7.165834426879883, 10.686203002929688, -8.614486694335938, 14.381523132324219, 13.65264892578125, 20.900074005126953, 4.477756500244141, 13.846370697021484], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000442.npy"}
{"epoch": 0.6681783824640968, "step": 443, "batch_size": 64, "mean": 11.130758285522461, "std": 15.94648551940918, "min": -22.766639709472656, "p10": -5.465260314941406, "median": 8.325435638427734, "p90": 29.959649276733398, "max": 66.36068725585938, "pos_frac": 0.765625, "sample": [1.0921592712402344, 23.782142639160156, 11.988151550292969, 28.77398681640625, 4.579887390136719, 17.255218505859375, 23.32189178466797, -22.766639709472656, 12.159347534179688, 0.9853744506835938, 1.154510498046875, -6.996063232421875, -0.4539833068847656, 17.043529510498047, -5.030708312988281, 19.467132568359375, -0.2648124694824219, 10.888931274414062, 18.824302673339844, 19.792823791503906, 66.36068725585938, -21.751869201660156, 0.3932952880859375, 25.56122589111328, -3.9574928283691406, 42.898406982421875, 30.02556610107422, 2.033428192138672, 40.966888427734375, 28.149818420410156, 9.360740661621094, -7.7669219970703125, 7.726131439208984, 11.895788192749023, 4.5173187255859375, 29.805843353271484, 8.677070617675781, 28.917922973632812, 28.747867584228516, 31.014263153076172, 7.2966461181640625, 8.553178787231445, -0.46665000915527344, 12.400035858154297, 30.8897705078125, 23.07386016845703, -0.719207763671875, 10.586265563964844, 5.807926177978516, 3.758829116821289, 8.097692489624023, -1.4868545532226562, 14.503509521484375, 0.6503143310546875, 0.39078521728515625, 5.809577941894531, -11.926254272460938, -2.9764404296875, 39.04680633544922, 2.0593204498291016, -10.490720748901367, -5.651496887207031, 7.894691467285156, 26.093734741210938], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000443.npy"}
{"epoch": 0.6696900982615268, "step": 444, "batch_size": 64, "mean": 4.709733009338379, "std": 12.796536445617676, "min": -28.350692749023438, "p10": -7.626334381103516, "median": 4.139228820800781, "p90": 19.49191131591797, "max": 44.367431640625, "pos_frac": 0.6875, "sample": [8.470001220703125, 13.871131896972656, -2.746856689453125, 0.11101531982421875, 6.38934326171875, 7.068180084228516, 17.12837791442871, 0.330902099609375, -7.4216461181640625, 15.253664016723633, 5.350189208984375, 13.066108703613281, -7.714057922363281, -19.60601806640625, 1.72259521484375, 0.129852294921875, 9.146997451782227, 4.836114883422852, 8.104166030883789, -7.85198974609375, 39.667388916015625, 19.654998779296875, -3.111574172973633, 1.8717460632324219, 0.50299072265625, 1.421722412109375, -3.7537612915039062, -6.5210418701171875, -2.1293869018554688, -1.78826904296875, 26.301895141601562, 6.9535980224609375, 19.00829315185547, -0.9957275390625, 5.594841003417969, 19.111373901367188, 2.0938262939453125, 26.87738037109375, -9.246208190917969, -24.24657440185547, 4.050445556640625, 44.367431640625, -28.350692749023438, -2.5779037475585938, -5.2648773193359375, 14.245849609375, 6.074981689453125, 25.01416015625, 20.037559509277344, 4.2280120849609375, 4.444585800170898, 4.338035583496094, 2.744617462158203, 10.109161376953125, 11.02838134765625, -1.471261978149414, -4.247692108154297, 0.9092483520507812, 6.6719970703125, -3.2537689208984375, -18.648704528808594, 17.576675415039062, 4.791351318359375, 1.6997604370117188], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000444.npy"}
{"epoch": 0.671201814058957, "step": 445, "batch_size": 64, "mean": 10.363672256469727, "std": 14.45046329498291, "min": -19.19012451171875, "p10": -6.914987182617186, "median": 8.073738098144531, "p90": 30.57375946044922, "max": 36.16534423828125, "pos_frac": 0.78125, "sample": [23.744823455810547, 34.63172149658203, 1.0528945922851562, 2.70245361328125, -14.511856079101562, 30.169082641601562, 28.721967697143555, 16.432113647460938, 19.003555297851562, 6.288330078125, -19.19012451171875, -5.473915100097656, 2.440540313720703, -2.4583053588867188, -5.947723388671875, 19.541881561279297, -17.14605712890625, 1.1583328247070312, -7.32952880859375, 7.931480407714844, -1.4041404724121094, 30.7471923828125, -14.081405639648438, 10.476997375488281, -18.24920654296875, 27.832496643066406, 2.8958473205566406, 17.98302459716797, 11.035514831542969, -10.616073608398438, -3.425262451171875, -4.209739685058594, 4.181707382202148, 23.693675994873047, 30.999107360839844, 7.8748321533203125, 35.99723815917969, 33.048866271972656, 6.579647064208984, 3.6341094970703125, 18.253772735595703, 24.650638580322266, 14.08803939819336, 27.1500244140625, 13.199493408203125, 26.461610794067383, 26.652801513671875, 5.538556098937988, 19.993209838867188, -1.6502532958984375, 3.4544448852539062, 1.8812408447265625, 3.455474853515625, 4.9239349365234375, 15.278450012207031, 12.860855102539062, 32.555686950683594, 10.815814971923828, 8.215995788574219, 4.971923828125, 15.031993865966797, 21.360092163085938, 36.16534423828125, 1.2097625732421875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000445.npy"}
{"epoch": 0.672713529856387, "step": 446, "batch_size": 64, "mean": 11.430152893066406, "std": 11.74632453918457, "min": -9.17198371887207, "p10": -3.9014680862426747, "median": 10.941802501678467, "p90": 28.56683807373047, "max": 38.06874084472656, "pos_frac": 0.84375, "sample": [3.140596389770508, -5.868442535400391, -7.409343719482422, 8.430877685546875, 11.108314514160156, 11.4554443359375, 6.5895843505859375, 1.3412628173828125, 29.578338623046875, 23.376235961914062, 16.831932067871094, 17.15906524658203, 14.161849975585938, 18.00391387939453, 11.625877380371094, 32.2711181640625, 2.7953643798828125, 12.94052505493164, 24.989669799804688, 12.805023193359375, 1.1821041107177734, 19.33795928955078, 4.31378173828125, 12.397941589355469, 23.432056427001953, 14.01890754699707, 2.6851654052734375, 22.299468994140625, 29.949485778808594, -6.3787689208984375, 14.826080322265625, 5.532777786254883, -5.257869720458984, -0.36379241943359375, -9.17198371887207, -7.691017150878906, 7.145179748535156, -4.352581024169922, 7.939414978027344, 14.615234375, 1.4555130004882812, 36.06153106689453, 28.706817626953125, -2.171701431274414, 38.06874084472656, 0.6685905456542969, 23.8408203125, 19.829137802124023, -2.8488712310791016, 3.7580909729003906, 14.538627624511719, 0.6593780517578125, 10.244583129882812, 8.202072143554688, 28.240219116210938, 32.06360626220703, 6.320247650146484, 9.613292694091797, 3.591017723083496, 27.225929260253906, 1.5959854125976562, 10.775290489196777, 18.598861694335938, 20.705219268798828], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000446.npy"}
{"epoch": 0.674225245653817, "step": 447, "batch_size": 64, "mean": 9.552654266357422, "std": 13.292573928833008, "min": -30.253101348876953, "p10": -4.994871139526367, "median": 8.029747009277344, "p90": 28.685762023925786, "max": 34.14874267578125, "pos_frac": 0.765625, "sample": [29.760311126708984, 14.647163391113281, 14.26171875, 1.1475830078125, -0.9466094970703125, -7.224800109863281, 6.773506164550781, 3.3080062866210938, -12.877540588378906, 7.7778472900390625, 6.192298889160156, 19.008712768554688, 1.8261795043945312, 6.4638671875, 7.611930847167969, 30.685726165771484, -1.9509906768798828, 15.036703109741211, 6.570819854736328, 9.204902648925781, 12.441364288330078, -12.184814453125, 8.281646728515625, -8.16817855834961, 0.0015106201171875, 27.10723876953125, 21.595382690429688, 27.205184936523438, 16.824752807617188, -2.12982177734375, 16.245147705078125, 23.223960876464844, 3.8158206939697266, 27.853668212890625, 10.903453826904297, 10.583187103271484, 33.04597473144531, 8.578948974609375, 29.042373657226562, -3.6971092224121094, -1.5371589660644531, 6.240688323974609, 26.384002685546875, -5.1221923828125, 5.836437225341797, -13.3792724609375, 5.934295654296875, 31.795989990234375, -0.5213851928710938, 23.44293212890625, 0.830047607421875, 13.297359466552734, -4.697788238525391, 16.532806396484375, 18.31934356689453, 2.0489654541015625, 18.7696533203125, -30.253101348876953, 9.058719635009766, 34.14874267578125, 33.64744567871094, -4.002799987792969, 7.753570556640625, 8.99549388885498], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000447.npy"}
{"epoch": 0.6757369614512472, "step": 448, "batch_size": 64, "mean": 8.315608024597168, "std": 12.846585273742676, "min": -23.197675704956055, "p10": -6.393924331665039, "median": 6.195255279541016, "p90": 25.039204406738282, "max": 45.16730499267578, "pos_frac": 0.734375, "sample": [21.50408935546875, -12.005859375, 24.22818374633789, 5.233726501464844, 1.14630126953125, 8.353363037109375, -2.3130950927734375, 12.432750701904297, 25.53497314453125, 2.1875762939453125, 27.184036254882812, 22.92420196533203, 4.577117919921875, 1.498291015625, 21.908203125, 18.377914428710938, -6.991586685180664, 30.515758514404297, 16.12646484375, -3.7645206451416016, 8.235488891601562, 38.71866226196289, 3.0428390502929688, 5.4585418701171875, 2.505992889404297, 0.9275569915771484, 23.98590850830078, 0.6536083221435547, 11.056198120117188, -4.972320556640625, -8.324037551879883, -0.8513946533203125, 6.856132507324219, 5.1881103515625, -1.246185302734375, 6.454277038574219, -6.477989196777344, 18.76968002319336, 25.200790405273438, -2.5545883178710938, 11.265544891357422, -23.197675704956055, 10.578727722167969, 9.207794189453125, -1.299835205078125, 17.437782287597656, 12.667823791503906, 26.619552612304688, 14.512107849121094, -6.197772979736328, -6.685604095458984, -0.9169082641601562, 2.0744552612304688, 13.029960632324219, -5.184683799743652, 24.66217041015625, 2.1784820556640625, -12.727436065673828, 12.95849609375, 16.37303924560547, 11.903305053710938, 5.9362335205078125, 0.5509033203125, 45.16730499267578], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000448.npy"}
{"epoch": 0.6772486772486772, "step": 449, "batch_size": 64, "mean": 5.34522819519043, "std": 13.059829711914062, "min": -27.654552459716797, "p10": -9.982165908813474, "median": 5.387748718261719, "p90": 18.940885925292974, "max": 40.28890609741211, "pos_frac": 0.640625, "sample": [-1.124349594116211, 9.928543090820312, -2.5942001342773438, 9.1644287109375, 0.35129547119140625, -11.293212890625, 5.46295166015625, 11.061569213867188, 2.2181625366210938, 22.84618377685547, 15.325454711914062, 20.06732177734375, 4.0672760009765625, -10.93280029296875, -5.88787841796875, 5.4578857421875, 38.915138244628906, 4.648146629333496, -0.8223190307617188, 4.38525390625, 19.729141235351562, 7.126178741455078, 27.260398864746094, 40.28890609741211, -13.81631851196289, 5.303688049316406, 16.96335220336914, -15.870529174804688, 6.340278625488281, 17.10162353515625, 4.62664794921875, -6.9918670654296875, 14.332382202148438, -27.654552459716797, 16.887039184570312, 15.553276062011719, 35.811767578125, -7.415275573730469, -2.1755523681640625, 5.913749694824219, -2.7438011169433594, 8.264053344726562, -7.764019012451172, 5.3176116943359375, -6.371421813964844, 10.954803466796875, -1.2566719055175781, 12.012054443359375, 3.2318058013916016, 9.261306762695312, 14.6282958984375, -0.5081634521484375, 9.827064514160156, 5.718620300292969, -5.256683349609375, 14.30609130859375, 14.606651306152344, -19.25921630859375, 9.722248077392578, -3.7022628784179688, 14.6490478515625, -21.34405517578125, -0.5615882873535156, -2.19635009765625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000449.npy"}
{"epoch": 0.6787603930461074, "step": 450, "batch_size": 64, "mean": 7.263884544372559, "std": 12.029903411865234, "min": -21.83252716064453, "p10": -7.534014892578124, "median": 7.998985290527344, "p90": 22.356687927246096, "max": 40.06718444824219, "pos_frac": 0.703125, "sample": [-1.3344841003417969, -9.71392822265625, 4.699501037597656, -0.3974113464355469, 10.886322021484375, 20.66826629638672, 9.775138854980469, 6.096954345703125, 13.18115234375, 40.06718444824219, 20.931350708007812, 8.779525756835938, 12.107963562011719, 7.951694488525391, -3.803234100341797, 23.84862518310547, 1.4613189697265625, 0.6047248840332031, 8.69863510131836, 32.28729248046875, -7.762077331542969, 17.45806121826172, 0.786529541015625, 3.0952377319335938, -2.8807830810546875, -9.291873931884766, 4.4780120849609375, 17.169677734375, 13.109596252441406, -5.232425689697266, 8.949005126953125, 22.48297119140625, -0.16011810302734375, 11.200752258300781, -21.83252716064453, 22.062026977539062, 15.699695587158203, -0.9658241271972656, 20.97637939453125, 9.52872085571289, 8.138687133789062, 16.113601684570312, 8.046276092529297, -17.406600952148438, 26.285430908203125, 3.009031295776367, 5.933025360107422, 1.7817611694335938, 14.392974853515625, -7.001869201660156, 4.4311370849609375, 2.3245773315429688, -2.3785629272460938, 14.545795440673828, 33.32630920410156, 10.509223937988281, -12.760833740234375, 11.489934921264648, -0.7577095031738281, 22.88275146484375, -1.903533935546875, 11.24081039428711, -8.972679138183594, -4.048564910888672], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000450.npy"}
{"epoch": 0.6802721088435374, "step": 451, "batch_size": 64, "mean": 6.920779228210449, "std": 11.794063568115234, "min": -14.206268310546875, "p10": -8.136482238769531, "median": 5.719598770141602, "p90": 24.201014709472656, "max": 39.98575973510742, "pos_frac": 0.703125, "sample": [5.325435638427734, -8.080833435058594, 1.0081024169921875, 10.117584228515625, 23.995819091796875, 3.972991943359375, -12.853630065917969, -1.094635009765625, 7.9293365478515625, 22.876846313476562, 14.60968017578125, 7.678966522216797, 24.288955688476562, 7.945770263671875, 28.729164123535156, 1.18408203125, 6.113761901855469, 6.9760589599609375, 3.860858917236328, 10.601831436157227, 11.901693344116211, -8.191505432128906, -3.6618270874023438, 29.58941650390625, 22.94392967224121, -9.412887573242188, 2.3383026123046875, -14.206268310546875, -13.501205444335938, -0.21987342834472656, 7.377979278564453, 11.992874145507812, -9.718597412109375, 1.5310592651367188, 15.603330612182617, 4.854957580566406, 30.633142471313477, 13.734237670898438, -0.6876888275146484, 2.53179931640625, -4.011573791503906, 6.8078765869140625, 4.586127281188965, 5.144630432128906, 9.885326385498047, 10.98516845703125, 12.29116439819336, 31.936630249023438, 28.123313903808594, 9.642333984375, 16.467926025390625, 7.2218017578125, 7.55169677734375, -0.9031295776367188, -3.3478317260742188, -5.7891845703125, 11.743167877197266, -0.03202056884765625, -2.692403793334961, 2.6010589599609375, 39.98575973510742, 5.228736877441406, -8.160331726074219, -2.9554214477539062], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000451.npy"}
{"epoch": 0.6817838246409675, "step": 452, "batch_size": 64, "mean": 8.323768615722656, "std": 13.162985801696777, "min": -18.714618682861328, "p10": -6.160268211364746, "median": 8.237951278686523, "p90": 23.581106567382815, "max": 43.55491638183594, "pos_frac": 0.6875, "sample": [9.656036376953125, -0.4331207275390625, 10.838375091552734, 22.70250701904297, -2.4792137145996094, 20.24608039855957, 9.46124267578125, 43.55491638183594, 25.474750518798828, 20.08563232421875, 8.949241638183594, 15.536270141601562, -7.58868408203125, 19.142723083496094, 20.150772094726562, 4.5631256103515625, 33.337158203125, 4.0629425048828125, 15.488639831542969, 11.721038818359375, 4.603668212890625, -3.5131149291992188, 5.5705108642578125, 4.625053405761719, 0.4570484161376953, -6.236446380615234, 5.767292022705078, 4.8080902099609375, 14.438209533691406, -0.4621391296386719, -5.982519149780273, 38.05192565917969, -14.706520080566406, 23.799652099609375, -5.974800109863281, 27.0718994140625, 17.81826400756836, 2.4430618286132812, -9.212814331054688, 8.185359954833984, 11.525882720947266, -1.999298095703125, 17.327537536621094, 8.290542602539062, 18.550216674804688, -18.714618682861328, 37.391502380371094, 18.99725341796875, 9.175678253173828, -12.076332092285156, 12.358566284179688, -4.210676193237305, 9.166460037231445, 7.575477600097656, -5.3224945068359375, 2.8166885375976562, -2.201385498046875, 14.287544250488281, 13.74690055847168, -3.2353363037109375, -2.7789306640625, 23.0711669921875, -16.67139434814453, -0.37191009521484375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000452.npy"}
{"epoch": 0.6832955404383976, "step": 453, "batch_size": 64, "mean": 7.774205207824707, "std": 13.379837036132812, "min": -22.2265682220459, "p10": -10.417927932739255, "median": 6.43842887878418, "p90": 24.41546096801758, "max": 39.232421875, "pos_frac": 0.734375, "sample": [20.19403076171875, 16.80621337890625, -2.1699790954589844, -6.9149932861328125, 24.94763946533203, 9.905242919921875, -7.325099945068359, 20.112953186035156, 17.839248657226562, 7.563507080078125, -22.2265682220459, 6.791347503662109, 21.798141479492188, -2.02679443359375, 25.288108825683594, 26.92186737060547, 13.816818237304688, 2.556854248046875, 18.235671997070312, 10.810203552246094, 3.6806373596191406, -11.735488891601562, 9.973342895507812, 4.389106750488281, 1.8654098510742188, -15.317031860351562, -11.671913146972656, 20.362926483154297, 1.1961936950683594, 7.381385803222656, 20.817108154296875, 4.4228515625, 13.164764404296875, -17.890159606933594, -7.388221740722656, 4.996036529541016, 20.392963409423828, 3.9706459045410156, -16.156593322753906, 16.996726989746094, 6.053337097167969, 32.55030822753906, -2.8504104614257812, -4.405040740966797, 2.3007278442382812, 13.204132080078125, -4.455350875854492, 2.386444091796875, 19.024627685546875, 1.3635711669921875, 21.217529296875, -0.7027664184570312, 2.943174362182617, 5.163995742797852, 6.08551025390625, -13.166114807128906, -7.491962432861328, 39.232421875, 24.036231994628906, 9.142608642578125, 24.577987670898438, 21.99093246459961, 31.94866943359375, 11.023426055908203], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000453.npy"}
{"epoch": 0.6848072562358276, "step": 454, "batch_size": 64, "mean": 9.388453483581543, "std": 14.19239616394043, "min": -38.99445343017578, "p10": -4.013520050048827, "median": 7.535194396972656, "p90": 26.216858673095704, "max": 38.1365966796875, "pos_frac": 0.75, "sample": [27.116294860839844, -12.48906135559082, -0.7370376586914062, 21.95250701904297, -38.99445343017578, -3.0270729064941406, 23.273956298828125, 3.1550369262695312, 13.558982849121094, 0.1578502655029297, 20.29358673095703, 18.476547241210938, 4.3201446533203125, -2.302337646484375, 25.625961303710938, 10.159046173095703, 23.203044891357422, 24.140769958496094, 2.2122116088867188, 10.99359130859375, -1.3508720397949219, -19.824546813964844, -1.5249156951904297, 19.265052795410156, 6.3702850341796875, 38.1365966796875, -8.513235092163086, 16.326278686523438, 4.2522735595703125, 2.4020538330078125, 2.1961517333984375, 6.768196105957031, 18.009605407714844, -7.762554168701172, 18.355682373046875, 11.856729507446289, 37.707130432128906, -2.449522018432617, 37.81171417236328, 10.748069763183594, 5.774776458740234, -0.7671356201171875, 0.4999656677246094, 1.0159454345703125, -4.436283111572266, 8.302192687988281, 25.35280990600586, 18.788528442382812, -6.465126037597656, 13.495931625366211, 21.672622680664062, 10.07305908203125, 36.04252624511719, 21.93706512451172, 9.232439041137695, 2.4067153930664062, 4.878728866577148, 26.47010040283203, 5.3736419677734375, 29.631980895996094, 13.252330780029297, 0.9491958618164062, -1.6411190032958984, -0.8495826721191406], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000454.npy"}
{"epoch": 0.6863189720332578, "step": 455, "batch_size": 64, "mean": 10.655372619628906, "std": 13.229926109313965, "min": -16.7464599609375, "p10": -3.040259552001952, "median": 8.767297744750977, "p90": 24.688533020019538, "max": 49.39149475097656, "pos_frac": 0.828125, "sample": [16.47552490234375, 15.786224365234375, 17.975570678710938, 33.12017822265625, 5.351654052734375, -0.3164644241333008, 20.881561279296875, 46.15910339355469, 13.025321960449219, -0.1022796630859375, 8.472150802612305, -5.02703857421875, 5.735908508300781, 21.04369354248047, 9.917068481445312, 30.70867919921875, 14.47650146484375, -4.283836364746094, 3.5928955078125, -8.747024536132812, 23.256072998046875, -3.5186386108398438, 25.302444458007812, 1.12603759765625, 2.731332778930664, 23.213836669921875, 15.293785095214844, 14.257644653320312, 18.527931213378906, 2.579315185546875, 21.973655700683594, 2.90814208984375, 29.35533905029297, 5.6157989501953125, 9.307048797607422, 2.3009262084960938, -16.18930435180664, 17.369789123535156, 17.142364501953125, 11.375167846679688, -6.810214996337891, 6.283927917480469, 18.488235473632812, -16.7464599609375, 18.45975112915039, 2.245147705078125, 17.74384307861328, 0.2020721435546875, 9.76382064819336, 5.408012390136719, 49.39149475097656, 8.487747192382812, 3.486236572265625, 9.04684829711914, 5.640533447265625, 2.7188186645507812, 19.739505767822266, 0.12660980224609375, -1.924041748046875, 0.06987380981445312, 0.9555206298828125, 15.902664184570312, -0.230316162109375, 45.320159912109375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000455.npy"}
{"epoch": 0.6878306878306878, "step": 456, "batch_size": 64, "mean": 8.75540542602539, "std": 11.7522554397583, "min": -18.738006591796875, "p10": -4.044284820556641, "median": 8.136661529541016, "p90": 26.0357780456543, "max": 40.229400634765625, "pos_frac": 0.78125, "sample": [22.15606689453125, 3.6485748291015625, 2.265665054321289, 14.789863586425781, 31.600799560546875, -1.0028152465820312, -1.9859189987182617, 11.935539245605469, 0.21913719177246094, 40.229400634765625, 10.529228210449219, 7.342124938964844, 21.882278442382812, 12.580421447753906, 12.277259826660156, 8.679298400878906, 1.5514373779296875, 29.410507202148438, -0.25589752197265625, 7.594024658203125, -4.08343505859375, 14.175018310546875, 10.231231689453125, -4.080413818359375, 11.70281982421875, 4.024986267089844, -16.971134185791016, -5.8856201171875, 6.028255462646484, -13.42828369140625, 0.31084442138671875, 4.1767578125, 10.074562072753906, 9.307098388671875, 11.228214263916016, 14.773645401000977, 9.976219177246094, -0.0710296630859375, 17.14373016357422, 1.009572982788086, -5.752773284912109, 20.831985473632812, 2.445005416870117, 14.851089477539062, 7.355276107788086, -18.738006591796875, 23.115577697753906, 5.877967834472656, -0.8554306030273438, 1.8254547119140625, 26.399276733398438, 28.449024200439453, 0.9764404296875, 28.08721160888672, 14.899398803710938, 17.821170806884766, 30.143539428710938, 5.6772003173828125, -1.9327774047851562, 25.18761444091797, 8.76528549194336, -3.9599838256835938, 1.8944854736328125, 11.891830444335938], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000456.npy"}
{"epoch": 0.6893424036281179, "step": 457, "batch_size": 64, "mean": 7.240411758422852, "std": 10.545069694519043, "min": -12.182365417480469, "p10": -6.049192810058593, "median": 5.862530708312988, "p90": 20.899908828735356, "max": 32.68206024169922, "pos_frac": 0.734375, "sample": [5.660858154296875, -3.026153564453125, -0.9536857604980469, 3.377532958984375, 7.597602844238281, -6.131084442138672, 12.251691818237305, 2.3961944580078125, 1.435150146484375, -7.999626159667969, 1.9247550964355469, 16.46160888671875, 7.930625915527344, 1.250030517578125, -10.061426162719727, -12.182365417480469, 23.437713623046875, -0.693206787109375, -2.8393592834472656, 6.326927185058594, 29.090045928955078, 16.203166961669922, 32.68206024169922, 9.004959106445312, 5.887727737426758, -1.7620124816894531, -4.402378082275391, 20.131397247314453, 9.342628479003906, 1.6365089416503906, -0.26020050048828125, -6.1006317138671875, 1.5077552795410156, 17.10906219482422, 15.6954345703125, -1.969207763671875, -5.929168701171875, 8.412567138671875, 12.39080810546875, 3.413177490234375, 18.300613403320312, -9.370956420898438, 29.58660125732422, 19.390090942382812, -7.892845153808594, 5.025779724121094, 24.214202880859375, 3.307147979736328, 9.214115142822266, 27.281312942504883, 19.615375518798828, 2.5749740600585938, 19.45733642578125, 5.837333679199219, 12.981700897216797, 21.229270935058594, 1.6526031494140625, 7.8457183837890625, 8.797660827636719, 6.162239074707031, 8.4185791015625, 3.9297866821289062, 19.244373321533203, -1.6641674041748047], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000457.npy"}
{"epoch": 0.690854119425548, "step": 458, "batch_size": 64, "mean": 9.500862121582031, "std": 11.559039115905762, "min": -16.481155395507812, "p10": -3.356805610656738, "median": 8.993682861328125, "p90": 22.52744827270508, "max": 39.264652252197266, "pos_frac": 0.796875, "sample": [-1.6830711364746094, 19.892955780029297, 4.435602188110352, 7.26507568359375, 10.224645614624023, 19.6676025390625, 16.789894104003906, -16.481155395507812, -2.7687911987304688, 5.860874176025391, 3.0897979736328125, -12.474136352539062, -3.31097412109375, 15.306182861328125, 19.275074005126953, 0.0116424560546875, 18.70916748046875, 12.85604476928711, 4.888420104980469, 10.936660766601562, 6.881153106689453, 6.3007659912109375, 17.236515045166016, -6.9700927734375, 20.220230102539062, 9.044578552246094, 27.27972412109375, 8.418621063232422, 2.1938705444335938, 15.888702392578125, 19.505767822265625, 10.390495300292969, 1.5700263977050781, -1.155548095703125, 8.040290832519531, 0.9680423736572266, 2.975006103515625, 8.293235778808594, 17.029510498046875, 22.84941864013672, 10.163341522216797, 8.942787170410156, -12.607612609863281, -3.3764476776123047, 23.588134765625, 4.9335479736328125, 17.46197509765625, -0.8502120971679688, 37.54456329345703, 23.383201599121094, -10.197639465332031, 21.77618408203125, 15.202529907226562, 13.125892639160156, -10.209491729736328, 39.264652252197266, 18.30670166015625, 29.33521270751953, -1.84918212890625, 0.05792999267578125, 4.430818557739258, 15.491050720214844, 18.362403869628906, 16.323043823242188], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000458.npy"}
{"epoch": 0.6923658352229781, "step": 459, "batch_size": 64, "mean": 9.375499725341797, "std": 13.18284797668457, "min": -14.828475952148438, "p10": -8.285136413574218, "median": 9.43094253540039, "p90": 29.865792465209974, "max": 41.52122497558594, "pos_frac": 0.75, "sample": [9.901824951171875, 11.870153427124023, -12.999725341796875, 18.185165405273438, -1.7860908508300781, -11.07147216796875, 10.195398330688477, 11.190826416015625, 13.87823486328125, 14.021936416625977, 18.207733154296875, 0.3313255310058594, 17.788013458251953, -1.6954917907714844, 24.37956428527832, 40.08671569824219, 0.38603973388671875, 17.269927978515625, 12.903186798095703, 12.451347351074219, 3.388519287109375, 10.633773803710938, -0.30481719970703125, 1.891387939453125, 6.232551574707031, 13.065452575683594, 0.365997314453125, 12.925201416015625, 21.051246643066406, 32.02802658081055, 41.52122497558594, -2.0463104248046875, -8.922531127929688, 5.075319290161133, -6.797882080078125, 12.43471908569336, 1.5183181762695312, -9.472625732421875, 25.182510375976562, -2.0253162384033203, 5.039420127868652, 10.597793579101562, 8.481287002563477, 5.4200897216796875, 14.116012573242188, -1.5207977294921875, 16.063705444335938, 11.551124572753906, -14.828475952148438, 26.677871704101562, 1.1304779052734375, -0.24086761474609375, -10.512411117553711, 13.711318969726562, 6.2928619384765625, -13.032882690429688, 8.551361083984375, 8.741928100585938, 32.96412658691406, 33.890235900878906, 31.232044219970703, 8.960060119628906, 36.497291564941406, -2.9910125732421875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000459.npy"}
{"epoch": 0.6938775510204082, "step": 460, "batch_size": 64, "mean": 9.971198081970215, "std": 12.442729949951172, "min": -9.202949523925781, "p10": -4.33429718017578, "median": 6.724334716796875, "p90": 27.508308792114256, "max": 45.40089416503906, "pos_frac": 0.78125, "sample": [12.655776977539062, 16.279708862304688, 45.40089416503906, -5.4880828857421875, 1.7425613403320312, 23.806785583496094, 22.11529541015625, 6.7635650634765625, 27.522857666015625, -0.9285697937011719, -1.1604728698730469, 1.2632904052734375, -8.976959228515625, 11.194770812988281, -3.6074981689453125, 6.417701721191406, 10.149097442626953, 14.808521270751953, -4.645782470703125, 24.435062408447266, 1.7031707763671875, 41.33234405517578, 3.991851806640625, 17.643943786621094, 25.1220703125, 13.432086944580078, 4.04473876953125, 6.6851043701171875, 17.570877075195312, 9.366806030273438, 2.481414794921875, 28.57215118408203, 10.993133544921875, 6.68218994140625, -8.643543243408203, 11.152175903320312, 6.252561569213867, -6.01837158203125, 12.656112670898438, 5.906620025634766, 21.440433502197266, 16.319366455078125, -1.347320556640625, -1.1354894638061523, 14.006126403808594, 35.907379150390625, -2.4570770263671875, -8.218528747558594, 24.014434814453125, 1.890594482421875, -1.3298053741455078, 4.180633544921875, 10.177444458007812, 7.095008850097656, 2.70330810546875, 30.012428283691406, 1.3079452514648438, 28.412445068359375, -9.202949523925781, 0.5721588134765625, 3.78533935546875, 16.897491455078125, 4.9749755859375, 27.474361419677734], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000460.npy"}
{"epoch": 0.6953892668178382, "step": 461, "batch_size": 64, "mean": 9.276529312133789, "std": 11.993505477905273, "min": -14.044746398925781, "p10": -4.552881240844726, "median": 8.317161560058594, "p90": 25.503051376342775, "max": 41.706787109375, "pos_frac": 0.796875, "sample": [13.6793212890625, 7.251087188720703, 3.2857818603515625, 8.543624877929688, 25.6016845703125, 10.152603149414062, 16.612281799316406, 15.531755447387695, 4.424369812011719, -0.9787368774414062, 3.1719207763671875, 6.2468719482421875, 19.1917724609375, -5.219461441040039, 10.686114311218262, -2.9794540405273438, 1.439727783203125, 27.845035552978516, 3.412043571472168, 16.544551849365234, 4.542842864990234, 16.49184226989746, 17.399948120117188, 1.946685791015625, 2.616286277770996, 9.602432250976562, -14.044746398925781, -8.514238357543945, -4.813560485839844, 7.720592498779297, 11.188793182373047, 0.7120018005371094, -11.400760650634766, 9.424182891845703, 16.642311096191406, 10.35205078125, 24.788330078125, 12.31341552734375, 8.588218688964844, 41.706787109375, 11.12060546875, 22.01852035522461, 25.272907257080078, 10.248516082763672, 4.7974700927734375, 4.8094482421875, 2.623546600341797, -10.01812744140625, 12.622732162475586, 7.04730224609375, 18.568084716796875, 12.903642654418945, 2.6252517700195312, 4.082740783691406, 37.17402648925781, -10.821144104003906, 29.784088134765625, -2.9249801635742188, -0.9868221282958984, -3.944629669189453, 26.35675048828125, -0.4900474548339844, 41.03098678588867, 8.0906982421875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000461.npy"}
{"epoch": 0.6969009826152683, "step": 462, "batch_size": 64, "mean": 10.469758987426758, "std": 13.811188697814941, "min": -25.776100158691406, "p10": -5.679010772705078, "median": 11.60281753540039, "p90": 27.419103240966802, "max": 53.31816101074219, "pos_frac": 0.78125, "sample": [33.40727615356445, 33.4063720703125, 26.44898223876953, 12.772476196289062, -5.137748718261719, 5.124610900878906, 27.96233367919922, -10.53317642211914, 1.90673828125, 1.77020263671875, 14.301483154296875, 31.715251922607422, 25.281387329101562, 13.206161499023438, 15.989593505859375, 27.834869384765625, -2.5538101196289062, 32.27497863769531, 15.077215194702148, 16.852157592773438, 22.990440368652344, 13.519981384277344, 21.665176391601562, 13.144454956054688, 5.8860626220703125, 2.8267745971679688, 13.849868774414062, 20.4195556640625, 0.7344150543212891, 17.88277816772461, 7.3687286376953125, -9.10699462890625, 10.726959228515625, 10.86679458618164, 23.945755004882812, -3.5848846435546875, -5.910980224609375, 14.069015502929688, 20.151857376098633, -1.540771484375, 17.536544799804688, 10.348808288574219, 16.111644744873047, 2.5810546875, -2.569732666015625, 14.356101989746094, -7.207920074462891, 17.502899169921875, 10.782615661621094, -11.495719909667969, 1.2567520141601562, -25.776100158691406, 22.86474609375, 18.59579086303711, 12.135902404785156, -3.6218032836914062, 2.1108551025390625, 53.31816101074219, 11.069732666015625, 10.731414794921875, -18.547130584716797, -2.717327117919922, 1.217498779296875, 2.4673995971679688], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000462.npy"}
{"epoch": 0.6984126984126984, "step": 463, "batch_size": 64, "mean": 7.10738468170166, "std": 12.253130912780762, "min": -13.984725952148438, "p10": -6.768861007690429, "median": 4.81303596496582, "p90": 24.328824234008792, "max": 40.27442169189453, "pos_frac": 0.640625, "sample": [-8.038623809814453, 8.439964294433594, 11.497917175292969, -8.441482543945312, 29.400718688964844, 14.083396911621094, -2.298938751220703, 36.124786376953125, 28.47332763671875, 11.970722198486328, -0.4630699157714844, 2.54144287109375, 2.659679412841797, 14.333877563476562, 19.22069549560547, 10.12939453125, 3.4404830932617188, -5.3141632080078125, -1.3909568786621094, -0.8498764038085938, -5.904331207275391, 9.117607116699219, -5.094108581542969, 26.163116455078125, 12.111001968383789, -8.68475341796875, 8.559608459472656, 7.7476959228515625, -1.2139472961425781, -7.627342224121094, -1.189117431640625, 1.3730621337890625, 23.244583129882812, 24.793498992919922, -3.614421844482422, 22.673446655273438, 3.5743560791015625, 9.692779541015625, 14.097953796386719, 22.530258178710938, 6.051715850830078, -7.139373779296875, 10.293060302734375, -13.984725952148438, -3.268115997314453, -10.098190307617188, 40.27442169189453, -4.508522033691406, 21.71563720703125, 29.115509033203125, 1.8729324340820312, -2.3120880126953125, 1.9563522338867188, -5.726543426513672, 11.399261474609375, 9.983322143554688, 22.12614631652832, 1.852264404296875, 10.377243041992188, 0.870849609375, -5.471797943115234, 7.3546142578125, -2.184885025024414, 16.45330047607422], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000463.npy"}
{"epoch": 0.6999244142101285, "step": 464, "batch_size": 64, "mean": 9.51894760131836, "std": 15.631010055541992, "min": -23.53564453125, "p10": -7.983237457275391, "median": 8.781951904296875, "p90": 33.03021583557129, "max": 40.6672248840332, "pos_frac": 0.65625, "sample": [-0.9882049560546875, 11.325973510742188, 12.08841323852539, -15.117843627929688, 36.147071838378906, -2.9313125610351562, -0.4214134216308594, 28.529876708984375, 13.311660766601562, 13.430328369140625, 35.59302520751953, -10.758842468261719, 20.361190795898438, 10.80548095703125, 12.993812561035156, -7.729095458984375, 29.898521423339844, -5.733940124511719, 2.8155288696289062, 6.5772857666015625, 13.657276153564453, 32.13897705078125, -2.479717254638672, 7.2608642578125, 10.404804229736328, 17.020050048828125, -3.9839324951171875, 40.6672248840332, 14.938007354736328, 8.704917907714844, -1.94915771484375, 2.9148292541503906, 20.47522735595703, 33.81597900390625, -23.53564453125, 5.2103118896484375, 28.758892059326172, 26.31145477294922, -8.092155456542969, -5.50347900390625, 24.003211975097656, 8.858985900878906, 9.667991638183594, 34.03825378417969, -8.940582275390625, -2.16351318359375, -0.32553863525390625, -0.5361824035644531, -10.282028198242188, 30.57806396484375, 0.7085914611816406, 32.753875732421875, -23.274551391601562, -3.487781524658203, 6.883216857910156, -7.591526031494141, 1.9300155639648438, 33.14864730834961, 12.370697021484375, 39.88294982910156, 0.4550514221191406, 14.541950225830078, 9.272918701171875, -0.21229171752929688], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000464.npy"}
{"epoch": 0.7014361300075586, "step": 465, "batch_size": 64, "mean": 8.385457992553711, "std": 14.008581161499023, "min": -16.067119598388672, "p10": -9.127480316162108, "median": 6.750720977783203, "p90": 30.5218605041504, "max": 39.29216003417969, "pos_frac": 0.6875, "sample": [-16.067119598388672, 26.261749267578125, 17.96123504638672, -2.22088623046875, 22.806488037109375, 8.330551147460938, -11.546527862548828, 38.56570053100586, 1.087921142578125, 18.951171875, 5.6908416748046875, 26.40786361694336, 9.363117218017578, 34.17863464355469, -9.239582061767578, 21.4525146484375, 3.752685546875, 31.944232940673828, 11.67608642578125, -0.279754638671875, 5.8975830078125, -6.8078460693359375, 11.57919692993164, 32.40948486328125, 33.99671936035156, 31.52820587158203, -8.865909576416016, -9.248668670654297, -2.4094619750976562, -0.9603118896484375, 6.082328796386719, -3.832916259765625, 9.97650146484375, 6.650856018066406, 6.8505859375, 3.855712890625, -9.605422973632812, -7.238883972167969, -3.3436965942382812, 11.950349807739258, -6.588096618652344, 14.162452697753906, 9.013694763183594, -1.5821609497070312, 1.0301284790039062, -16.04698944091797, 2.177265167236328, 1.9622650146484375, 22.114662170410156, 28.173721313476562, 7.2498931884765625, 17.066848754882812, 6.8784027099609375, 19.106624603271484, -7.7968597412109375, -14.242998123168945, 10.547401428222656, -0.6102066040039062, 4.881156921386719, 6.441642761230469, 39.29216003417969, 13.712913513183594, 20.84180450439453, 11.342330932617188], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000465.npy"}
{"epoch": 0.7029478458049887, "step": 466, "batch_size": 64, "mean": 5.373028755187988, "std": 11.28433609008789, "min": -22.389991760253906, "p10": -8.786930847167968, "median": 5.261707782745361, "p90": 20.527112960815433, "max": 33.81877136230469, "pos_frac": 0.671875, "sample": [-5.281696319580078, 12.453907012939453, -6.47607421875, -13.031051635742188, 13.883563995361328, -4.776897430419922, 6.1631011962890625, 1.840423583984375, -11.287490844726562, -5.614837646484375, 6.043510437011719, 12.936729431152344, 20.954132080078125, -8.125457763671875, 10.412338256835938, -1.377685546875, 13.92959213256836, 22.582763671875, 12.764266967773438, 27.75109100341797, 20.734039306640625, 3.929443359375, 14.552699089050293, -3.915006637573242, 2.41925048828125, 6.97088623046875, 11.633373260498047, -22.389991760253906, 28.556076049804688, 3.3284149169921875, 3.7928123474121094, 3.66552734375, -6.407367706298828, 7.0432586669921875, 7.440071105957031, 5.0069169998168945, 2.9422988891601562, 13.364265441894531, -11.722373962402344, 5.516498565673828, 0.39421844482421875, -1.459503173828125, -11.098907470703125, -2.4152297973632812, 8.673225402832031, -2.51361083984375, 3.02081298828125, 18.097309112548828, 33.81877136230469, -9.216552734375, -6.7267303466796875, 8.911285400390625, -9.070419311523438, 17.964614868164062, -2.9374847412109375, 7.452482223510742, 1.961843490600586, -3.2334327697753906, 20.04428482055664, 8.488407135009766, 25.75811004638672, 11.571730613708496, 16.345932006835938, 7.837364196777344], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000466.npy"}
{"epoch": 0.7044595616024187, "step": 467, "batch_size": 64, "mean": 8.895553588867188, "std": 11.837189674377441, "min": -10.707328796386719, "p10": -4.260798072814941, "median": 6.884838104248047, "p90": 24.666429138183595, "max": 50.109710693359375, "pos_frac": 0.8125, "sample": [4.906913757324219, 24.788528442382812, 23.176856994628906, 10.198341369628906, 2.4777679443359375, -0.87493896484375, -2.1745986938476562, -7.084407806396484, 8.070724487304688, 5.854953765869141, 0.45223236083984375, -4.019098281860352, 16.367877960205078, 1.3474292755126953, 9.917282104492188, 7.445453643798828, 9.33708381652832, 4.886478424072266, -3.5966339111328125, 4.510353088378906, 9.774887084960938, -9.422626495361328, -5.404766082763672, 25.947635650634766, 24.38153076171875, 12.534385681152344, 18.67343521118164, 8.404655456542969, -8.212575912475586, 17.171661376953125, -4.902618408203125, 27.583602905273438, 6.954750061035156, 1.1003570556640625, 19.99630355834961, 0.6982574462890625, 13.985786437988281, 24.100250244140625, 11.7750244140625, 8.6739501953125, -3.3825511932373047, 0.1509990692138672, 1.5360641479492188, 3.8866539001464844, 3.206390380859375, 0.4194488525390625, 0.3368339538574219, -4.364383697509766, 19.411983489990234, 24.020885467529297, 50.109710693359375, 4.253562927246094, 4.5839080810546875, 0.7996788024902344, 9.329204559326172, 5.7981414794921875, -10.707328796386719, 37.57909393310547, 15.958757400512695, 27.578155517578125, 6.8149261474609375, 18.364898681640625, 26.399658203125, 7.428276062011719], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000467.npy"}
{"epoch": 0.7059712773998488, "step": 468, "batch_size": 64, "mean": 10.925762176513672, "std": 13.734302520751953, "min": -21.209564208984375, "p10": -6.568766784667968, "median": 10.423030853271484, "p90": 28.551438903808595, "max": 41.86615753173828, "pos_frac": 0.796875, "sample": [16.444316864013672, 1.4195594787597656, 15.752275466918945, 21.697296142578125, 9.242111206054688, -6.172271728515625, -0.3390998840332031, 4.775913238525391, 8.768043518066406, 4.42156982421875, 23.29220199584961, 27.410430908203125, 13.714370727539062, -4.94624137878418, 3.15203857421875, 25.71697235107422, -3.5591659545898438, -6.7386932373046875, -7.4212646484375, 14.471641540527344, 19.932823181152344, 0.5420074462890625, 14.652099609375, 34.31910705566406, 19.704174041748047, 11.528209686279297, 33.426910400390625, 9.277767181396484, -7.963081359863281, -21.209564208984375, 30.237518310546875, -14.6309814453125, 5.9601593017578125, -8.5277099609375, 3.450838088989258, -15.310928344726562, 27.940658569335938, 3.37091064453125, -1.204132080078125, 24.583694458007812, 10.708511352539062, 2.8226165771484375, 19.053916931152344, 28.813201904296875, 19.42591094970703, 3.4305992126464844, 5.153034210205078, 17.751258850097656, 4.798091888427734, 10.349601745605469, 33.78935241699219, 21.82946014404297, 10.4964599609375, -2.0295257568359375, 38.8583984375, 10.981719970703125, 2.9133148193359375, 41.86615753173828, 20.32140350341797, 22.809429168701172, 17.389629364013672, 22.789459228515625, 3.572235107421875, 0.17209625244140625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000468.npy"}
{"epoch": 0.7074829931972789, "step": 469, "batch_size": 64, "mean": 7.757468223571777, "std": 11.397430419921875, "min": -14.028938293457031, "p10": -4.067294883728027, "median": 6.002960205078125, "p90": 24.31803722381593, "max": 39.16189193725586, "pos_frac": 0.6875, "sample": [-5.67620849609375, 18.1574649810791, 11.991416931152344, 11.133407592773438, 8.6102294921875, 6.217170715332031, -1.76947021484375, 12.700464248657227, 6.18701171875, -4.770355224609375, 28.034751892089844, 10.130744934082031, 12.103164672851562, -1.8005599975585938, 1.2423782348632812, 19.629257202148438, -0.89959716796875, 1.0767498016357422, 13.084192276000977, 20.549057006835938, 27.073638916015625, 1.4755172729492188, 11.053230285644531, -5.717704772949219, 5.207000732421875, -14.028938293457031, 12.084213256835938, 4.5433197021484375, -1.7178421020507812, 12.043678283691406, -4.864784240722656, 0.2694234848022461, 7.022857666015625, 35.99108123779297, 18.864206314086914, -1.5929031372070312, -0.33545684814453125, -4.12214469909668, 29.65898895263672, 20.957368850708008, 16.233718872070312, 6.194847106933594, 5.81890869140625, 5.0569610595703125, -0.7776336669921875, -0.4797515869140625, -2.5811767578125, -1.6092758178710938, 39.16189193725586, 6.84722900390625, 14.220794677734375, 4.544639587402344, 25.758323669433594, 1.373575210571289, 35.902740478515625, -3.939311981201172, 4.82811164855957, 3.1288681030273438, 8.535942077636719, -10.00181770324707, 11.262371063232422, -2.9769821166992188, -3.481658935546875, 13.660652160644531], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000469.npy"}
{"epoch": 0.708994708994709, "step": 470, "batch_size": 64, "mean": 8.055923461914062, "std": 12.06848430633545, "min": -22.80817413330078, "p10": -5.230736541748047, "median": 7.499599456787109, "p90": 22.212991333007814, "max": 44.691741943359375, "pos_frac": 0.703125, "sample": [20.113494873046875, 13.581108093261719, 26.72039794921875, -22.80817413330078, 21.876129150390625, 18.241104125976562, 11.415382385253906, 6.880317687988281, 12.491950988769531, 14.969253540039062, 7.09765625, -10.587553024291992, 21.63037872314453, 20.595951080322266, 24.287094116210938, 0.3048133850097656, 22.35736083984375, -3.467174530029297, -4.6689453125, 12.70952033996582, 4.155567169189453, 14.014778137207031, 6.404365539550781, -3.900482177734375, 14.208518981933594, -1.4533767700195312, 27.25735855102539, 10.529617309570312, 6.04667854309082, -1.515350341796875, -1.5665969848632812, -6.85369873046875, 20.741962432861328, 3.1584091186523438, -0.9723014831542969, 3.9660377502441406, 13.3070068359375, 2.1142654418945312, 3.0301513671875, -11.269424438476562, 44.691741943359375, 8.38552474975586, 28.43982696533203, -0.3222007751464844, -3.1360092163085938, -5.471504211425781, 8.800637245178223, 16.44873809814453, 17.99798583984375, 26.116249084472656, -15.032730102539062, 14.471870422363281, 9.159534454345703, 4.737640380859375, 3.9040603637695312, 15.155754089355469, 1.0854969024658203, 7.901542663574219, -0.15004730224609375, 10.189453125, 20.71953582763672, -2.084667205810547, -9.47512435913086, -2.0977630615234375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000470.npy"}
{"epoch": 0.7105064247921391, "step": 471, "batch_size": 64, "mean": 10.513579368591309, "std": 13.723960876464844, "min": -32.88169860839844, "p10": -3.3448364257812493, "median": 7.666688919067383, "p90": 30.07406387329102, "max": 38.02629089355469, "pos_frac": 0.828125, "sample": [-2.265291213989258, 32.17184066772461, 19.877771377563477, 3.8342952728271484, 0.29544830322265625, -2.78924560546875, 13.840652465820312, 21.790904998779297, 31.912017822265625, 27.56462860107422, 7.299629211425781, 5.1383056640625, 38.02629089355469, 3.3717727661132812, -14.6275634765625, -0.010998725891113281, 5.186412811279297, 36.35491180419922, 17.582801818847656, -3.58294677734375, 2.8540802001953125, 15.415237426757812, 5.394874572753906, 8.470085144042969, 22.144821166992188, -32.88169860839844, 17.0003662109375, 22.3221435546875, 34.809906005859375, -2.6281204223632812, 12.322479248046875, 7.896495819091797, 2.452770233154297, 3.011199951171875, 17.563186645507812, 12.546287536621094, 27.725749969482422, -10.064987182617188, 9.053199768066406, 23.756549835205078, 4.840862274169922, 9.872726440429688, 32.8680305480957, 27.3299560546875, 0.7795658111572266, 3.0691070556640625, 22.240814208984375, 17.077510833740234, -4.556339263916016, 14.535591125488281, -7.055244445800781, 23.05652618408203, 4.195281982421875, 0.620208740234375, 10.5262451171875, 0.5022850036621094, 6.8860626220703125, 4.5417938232421875, 0.49127197265625, 29.149658203125, 2.46026611328125, 7.436882019042969, -8.606491088867188, 30.470237731933594], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000471.npy"}
{"epoch": 0.7120181405895691, "step": 472, "batch_size": 64, "mean": 9.537504196166992, "std": 15.407373428344727, "min": -21.86475372314453, "p10": -7.998012542724608, "median": 7.351879119873047, "p90": 31.043867874145512, "max": 49.686004638671875, "pos_frac": 0.734375, "sample": [0.7155532836914062, 1.0002670288085938, 7.346290588378906, 0.04931640625, 21.462806701660156, 25.859352111816406, -3.00128173828125, 17.54357147216797, 8.712089538574219, 17.42681884765625, 7.960346221923828, 32.44793701171875, 6.1507720947265625, -8.558326721191406, 3.8735084533691406, 30.401073455810547, -0.994903564453125, 7.2748565673828125, 46.762367248535156, 9.33929443359375, -2.1275253295898438, 3.7454566955566406, -1.9129981994628906, 41.13609313964844, -3.514312744140625, 11.5535888671875, -1.3071212768554688, 43.192222595214844, 38.086524963378906, -6.680461883544922, 18.644229888916016, 8.386222839355469, 4.439952850341797, -10.197868347167969, -9.062980651855469, 8.038803100585938, 12.833831787109375, 49.686004638671875, 0.5141944885253906, -0.9651107788085938, 19.780609130859375, 27.048965454101562, -17.22382354736328, 31.319351196289062, -17.643020629882812, -21.86475372314453, 25.131017684936523, 2.441038131713867, -9.796585083007812, 14.0682373046875, 10.405660629272461, 9.550048828125, 5.4626617431640625, 2.4281082153320312, -1.691741943359375, 6.2420501708984375, 7.3574676513671875, 9.125785827636719, 24.374927520751953, 6.316967010498047, 15.047657012939453, 9.080461502075195, -6.69061279296875, 23.86932373046875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000472.npy"}
{"epoch": 0.7135298563869993, "step": 473, "batch_size": 64, "mean": 9.295183181762695, "std": 14.347360610961914, "min": -35.25148391723633, "p10": -8.853775787353513, "median": 7.125178337097168, "p90": 29.02381820678712, "max": 44.44134521484375, "pos_frac": 0.796875, "sample": [41.544830322265625, 18.725059509277344, 3.3698997497558594, -13.168621063232422, -1.0673751831054688, 25.049041748046875, 3.66046142578125, 15.44580078125, 3.3966598510742188, 26.17298126220703, 30.777851104736328, 4.642133712768555, -1.2044048309326172, 16.452606201171875, 6.1896514892578125, -11.191268920898438, 44.44134521484375, 33.64271545410156, 9.521583557128906, 16.503570556640625, -35.25148391723633, -9.725410461425781, 10.9990234375, -6.8199615478515625, -13.139251708984375, 17.545814514160156, 6.0263671875, 18.7772216796875, 4.821636199951172, 5.7581634521484375, 5.0876312255859375, -3.1372528076171875, 0.020481109619140625, 5.806549072265625, -6.7953948974609375, 21.043764114379883, 30.24560546875, 21.890792846679688, 7.705036163330078, 7.276891708374023, 32.47767639160156, 25.84667205810547, 21.46221923828125, 16.564861297607422, -1.12109375, 5.346015930175781, 1.7340221405029297, 3.3062210083007812, 13.928627014160156, 11.03546142578125, 6.939247131347656, 7.812534332275391, 18.867164611816406, 5.379547119140625, 0.22567176818847656, 11.151163101196289, 37.97715759277344, -10.576011657714844, 9.256507873535156, 6.9734649658203125, 4.756439208984375, 10.3582763671875, 7.661994934082031, -13.51284408569336], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000473.npy"}
{"epoch": 0.7150415721844293, "step": 474, "batch_size": 64, "mean": 7.711645126342773, "std": 13.564223289489746, "min": -23.333751678466797, "p10": -6.0943759918212885, "median": 5.103799819946289, "p90": 28.627624511718754, "max": 36.486080169677734, "pos_frac": 0.703125, "sample": [32.30279541015625, 5.244010925292969, 8.75905990600586, 2.6800193786621094, -3.1949996948242188, -9.470138549804688, 12.325897216796875, 32.47290802001953, 18.192176818847656, -1.43768310546875, 9.6036376953125, 3.9541015625, -23.333751678466797, 19.628097534179688, 3.2412872314453125, 29.13562774658203, -0.20430755615234375, 4.963588714599609, 1.1777076721191406, 8.187187194824219, -12.2138671875, -5.737283706665039, 19.28913116455078, 8.828529357910156, 20.824111938476562, 4.790069580078125, -2.4190988540649414, -3.809295654296875, -0.10626983642578125, -3.8288040161132812, 31.325958251953125, 4.33441162109375, -11.979516983032227, 19.495582580566406, -1.0027389526367188, 10.442699432373047, 1.3545951843261719, 33.29798126220703, 21.936790466308594, 1.1858367919921875, 2.700960159301758, 6.3833465576171875, 20.971328735351562, 6.55401611328125, -4.905605316162109, 21.69084930419922, 11.4326171875, 3.01263427734375, 18.984447479248047, 2.779376983642578, 36.486080169677734, 35.88587188720703, 5.390899658203125, 10.898162841796875, 9.82967758178711, -17.859657287597656, 19.126731872558594, -4.4737701416015625, -6.247415542602539, 13.953804016113281, 27.442283630371094, -18.272872924804688, 4.151710510253906, -2.60626220703125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000474.npy"}
{"epoch": 0.7165532879818595, "step": 475, "batch_size": 64, "mean": 11.851011276245117, "std": 13.158246994018555, "min": -20.032760620117188, "p10": -1.890418815612792, "median": 9.311906814575195, "p90": 28.306443786621095, "max": 50.585330963134766, "pos_frac": 0.84375, "sample": [0.8923072814941406, 1.7830543518066406, 12.590660095214844, 13.548683166503906, 18.77322006225586, 2.7759628295898438, 23.98470115661621, 27.40418243408203, 6.473060607910156, -8.747737884521484, 9.578483581542969, -2.3747711181640625, 23.242019653320312, 28.58038330078125, 3.1392059326171875, 11.9156494140625, 16.968154907226562, 13.107505798339844, 4.810224533081055, 20.57538604736328, 7.326469421386719, -0.25323486328125, 5.8345184326171875, -0.6130447387695312, 9.045330047607422, 2.479248046875, 5.166099548339844, 30.623992919921875, 7.1738739013671875, -8.819793701171875, 50.585330963134766, 31.270877838134766, 23.439498901367188, 38.909202575683594, 9.620290756225586, -20.032760620117188, 19.142608642578125, 2.9630470275878906, 8.695175170898438, -0.7602634429931641, 26.65277099609375, 11.656494140625, 8.631599426269531, 13.358428955078125, 22.363719940185547, -7.651226043701172, 4.957120895385742, 2.3248424530029297, 28.139892578125, 3.5877151489257812, 21.9471492767334, 34.5194091796875, 28.377822875976562, 19.970703125, 13.524078369140625, -8.851348876953125, 0.5719223022460938, 24.0267333984375, 20.637649536132812, 26.877622604370117, 8.670112609863281, 5.127067565917969, -8.97113037109375, 7.198740005493164], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000475.npy"}
{"epoch": 0.7180650037792895, "step": 476, "batch_size": 64, "mean": 8.187226295471191, "std": 12.448899269104004, "min": -23.205463409423828, "p10": -5.226588439941404, "median": 6.291889190673828, "p90": 25.43195343017578, "max": 43.010650634765625, "pos_frac": 0.765625, "sample": [21.8692626953125, 13.865989685058594, 0.09076690673828125, 6.634128570556641, -6.111724853515625, 3.8083343505859375, -3.1612701416015625, 18.131324768066406, 29.72265625, -0.1502227783203125, 18.149505615234375, -1.0383415222167969, 34.77434539794922, 13.60212516784668, 7.5250396728515625, 1.0446014404296875, 12.287490844726562, 8.356246948242188, 13.074554443359375, 1.4971160888671875, -2.3640480041503906, 7.089163780212402, 26.92656707763672, 27.359790802001953, 5.4310455322265625, -23.205463409423828, -10.314102172851562, 4.456504821777344, 1.2349395751953125, 0.23107147216796875, 33.5670051574707, 1.3664627075195312, 6.649078369140625, 5.175052642822266, 5.676490783691406, 8.466156005859375, -14.920700073242188, -2.0238037109375, 21.743179321289062, -2.697711944580078, -9.528121948242188, 13.793182373046875, 3.916900634765625, 4.9730682373046875, 12.854515075683594, 7.756675720214844, -1.3925971984863281, 14.4525146484375, 0.3126983642578125, 25.392166137695312, 11.946590423583984, -7.29119873046875, 23.348716735839844, 5.925994873046875, 15.003210067749023, 43.010650634765625, 0.9746170043945312, 5.949649810791016, 25.449005126953125, 7.4926300048828125, -9.977436065673828, 16.88974380493164, 20.650291442871094, -1.7396011352539062], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000476.npy"}
{"epoch": 0.7195767195767195, "step": 477, "batch_size": 64, "mean": 7.209848880767822, "std": 14.194193840026855, "min": -30.461349487304688, "p10": -8.771303367614745, "median": 6.356717586517334, "p90": 27.69158859252931, "max": 36.57732391357422, "pos_frac": 0.671875, "sample": [-14.642227172851562, -4.416816711425781, 31.233108520507812, 2.9313278198242188, 10.308990478515625, 9.748916625976562, -2.8665390014648438, -3.6309471130371094, 12.419479370117188, -6.041893005371094, 13.999618530273438, 19.49958038330078, -7.163679122924805, 3.6880836486816406, 0.1457958221435547, 4.483238220214844, -2.0421600341796875, 17.504661560058594, 10.443801879882812, -3.5013294219970703, 20.5419921875, 22.792255401611328, -1.3727760314941406, 8.183311462402344, 15.79803466796875, -4.1392364501953125, -17.20758819580078, 8.776649475097656, 24.14348602294922, 4.077667236328125, -12.767181396484375, -3.80950927734375, 6.031855583190918, 2.2417221069335938, -30.461349487304688, -6.9286041259765625, 5.889839172363281, 21.788352966308594, 10.909278869628906, -0.24657058715820312, 12.441078186035156, -9.460285186767578, 18.039474487304688, 32.766571044921875, 10.900859832763672, 30.566085815429688, -10.942497253417969, 6.68157958984375, 29.212203979492188, -3.7720565795898438, 14.9052734375, 5.747913360595703, 10.465141296386719, 34.15039825439453, 21.0750732421875, 18.742599487304688, -25.328643798828125, 3.166534423828125, 9.874740600585938, 36.57732391357422, 3.189899444580078, 30.78449249267578, -2.5168609619140625, 17.82078742980957], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000477.npy"}
{"epoch": 0.7210884353741497, "step": 478, "batch_size": 64, "mean": 7.9240617752075195, "std": 16.367149353027344, "min": -27.12152862548828, "p10": -10.311952209472656, "median": 4.495885848999023, "p90": 29.592489433288577, "max": 46.146156311035156, "pos_frac": 0.640625, "sample": [-4.403293609619141, 43.911800384521484, 25.130996704101562, 29.789506912231445, 16.244945526123047, -14.21002197265625, 18.350799560546875, 24.076873779296875, -7.54595947265625, 21.304885864257812, -1.0516853332519531, 1.6691818237304688, 20.646339416503906, 35.98912048339844, -9.297691345214844, 4.682392120361328, -20.774520874023438, -0.806427001953125, -1.8114967346191406, 6.941802978515625, -3.630016326904297, -14.315261840820312, 22.918121337890625, 15.267967224121094, 26.63541030883789, -11.989509582519531, 5.294099807739258, 6.241630554199219, 19.496055603027344, -6.027862548828125, 10.442302703857422, -2.0355300903320312, -19.10369110107422, -4.616641998291016, 4.309379577636719, 29.132781982421875, 20.758323669433594, -8.04833984375, 3.214813232421875, 6.281379699707031, 34.231510162353516, -5.649883270263672, -27.12152862548828, -1.276031494140625, 46.146156311035156, -2.549785614013672, 4.2471160888671875, 25.186676025390625, 3.3889541625976562, 3.13519287109375, 8.835601806640625, 26.519241333007812, 0.00464630126953125, 17.491003036499023, 17.871627807617188, 41.116844177246094, 33.41065979003906, -10.746635437011719, 1.761962890625, 7.91131591796875, -7.508979797363281, -5.060209274291992, 5.680816650390625, 1.0507335662841797], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000478.npy"}
{"epoch": 0.7226001511715797, "step": 479, "batch_size": 64, "mean": 7.013975143432617, "std": 13.48591423034668, "min": -39.626502990722656, "p10": -8.416755294799803, "median": 6.081404685974121, "p90": 26.551438140869145, "max": 33.73158264160156, "pos_frac": 0.703125, "sample": [10.346508026123047, 25.621627807617188, 3.6873626708984375, -11.721336364746094, -1.8081016540527344, 9.359756469726562, 31.116004943847656, 11.524845123291016, 15.558250427246094, 1.7406158447265625, 5.790712356567383, -11.836280822753906, -4.968711853027344, -0.40868186950683594, 13.571434020996094, 12.363494873046875, 31.4024658203125, 9.85769271850586, -0.5118064880371094, 2.399293899536133, -3.064577102661133, 2.888519287109375, 14.79925537109375, 26.949928283691406, 3.51177978515625, -14.659881591796875, 13.281593322753906, 12.381927490234375, -5.474739074707031, -1.1517257690429688, 9.753875732421875, 20.35505485534668, -5.046360015869141, 29.13140106201172, 10.269271850585938, 2.0770740509033203, -15.517807006835938, 5.585533142089844, 4.1681671142578125, 33.73158264160156, 17.955154418945312, -39.626502990722656, 1.3196640014648438, 6.550067901611328, 18.537670135498047, 29.56290054321289, 5.126262664794922, -12.577392578125, -9.167919158935547, 2.6027374267578125, 8.375076293945312, 17.22211456298828, -6.664039611816406, 28.762359619140625, -4.815698623657227, 6.372097015380859, 24.184310913085938, 4.982246398925781, 9.807479858398438, 13.48068618774414, -3.1091442108154297, -4.200531005859375, 19.64324188232422, 17.51653289794922], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000479.npy"}
{"epoch": 0.7241118669690099, "step": 480, "batch_size": 64, "mean": 8.89599323272705, "std": 14.83957290649414, "min": -38.65086364746094, "p10": -7.53788146972656, "median": 10.330804824829102, "p90": 25.730405807495117, "max": 34.49848937988281, "pos_frac": 0.734375, "sample": [14.86819076538086, -0.044830322265625, 23.087326049804688, -11.460052490234375, 7.555681228637695, 0.8655586242675781, 14.556255340576172, -38.65086364746094, -1.2859153747558594, 13.378875732421875, -4.867927551269531, -1.7701606750488281, 19.048492431640625, -4.909324645996094, 20.422332763671875, -15.963165283203125, -1.275726318359375, 6.514892578125, -0.9346084594726562, 7.9970703125, -12.548198699951172, 26.11907958984375, 8.536579132080078, 2.1916656494140625, 11.89227294921875, 25.444856643676758, 28.655929565429688, 14.629837036132812, 34.49848937988281, 9.623847961425781, 14.784534454345703, -4.361512184143066, -2.527374267578125, 6.191402435302734, 20.909107208251953, 7.028537750244141, 23.67394256591797, 25.798675537109375, -36.03910827636719, 14.36856460571289, 23.345439910888672, 33.460350036621094, 21.423248291015625, 11.037761688232422, 1.337921142578125, 25.571109771728516, 2.5359725952148438, 13.727340698242188, 8.820037841796875, 12.170433044433594, 5.316898345947266, -3.6771087646484375, 34.343360900878906, 12.375885009765625, 13.235244750976562, 19.916650772094727, 29.39437484741211, 16.724502563476562, 11.921092987060547, -8.664405822753906, 7.217281341552734, 23.44255828857422, 6.66217041015625, -18.297760009765625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000480.npy"}
{"epoch": 0.7256235827664399, "step": 481, "batch_size": 64, "mean": 8.741968154907227, "std": 14.265273094177246, "min": -28.845733642578125, "p10": -4.041034698486328, "median": 6.213878631591797, "p90": 24.996411514282237, "max": 48.28466796875, "pos_frac": 0.78125, "sample": [26.021892547607422, -2.544708251953125, 37.18840789794922, -0.46527099609375, -4.685005187988281, 12.456916809082031, 18.550132751464844, 0.10013198852539062, -3.9824752807617188, -3.8864593505859375, 20.47137451171875, 0.7823257446289062, 5.7729034423828125, 13.833564758300781, -6.091209411621094, 14.906494140625, 15.215492248535156, 15.097702026367188, 6.654853820800781, 10.015670776367188, -28.845733642578125, 5.156871795654297, 14.456436157226562, 7.628482818603516, 4.651023864746094, -4.066131591796875, 0.13214111328125, 17.564727783203125, 4.94384765625, 22.603622436523438, 1.3272438049316406, 10.975715637207031, 0.8513031005859375, 0.36328125, 7.52569580078125, 3.4477081298828125, 14.102149963378906, 2.9198036193847656, 3.5656356811523438, 4.619869232177734, 7.84466552734375, 9.820858001708984, 20.691791534423828, 45.991363525390625, 1.3227119445800781, 33.10307693481445, -10.609565734863281, 5.477409362792969, 37.37903594970703, 17.468612670898438, 12.986312866210938, -0.8487052917480469, 3.0646324157714844, -2.864898681640625, -13.253673553466797, 42.296600341796875, -20.704387664794922, 10.712944030761719, 19.400760650634766, 48.28466796875, 14.655941009521484, 8.271495819091797, -3.102813720703125, 2.7606658935546875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000481.npy"}
{"epoch": 0.72713529856387, "step": 482, "batch_size": 64, "mean": 7.68571662902832, "std": 15.477113723754883, "min": -29.464004516601562, "p10": -7.851983642578125, "median": 7.013645172119141, "p90": 31.612688827514656, "max": 44.903717041015625, "pos_frac": 0.6875, "sample": [-4.794219970703125, -29.464004516601562, 0.35243988037109375, 8.073883056640625, 5.123256683349609, -7.828826904296875, 0.2273406982421875, -12.660408020019531, 2.858570098876953, 32.589569091796875, 8.509532928466797, 21.820892333984375, 21.785099029541016, 0.5521621704101562, 17.426929473876953, 9.26308822631836, 25.01983642578125, 8.379169464111328, -22.731544494628906, 3.0573043823242188, 33.89411544799805, 0.5689239501953125, 44.903717041015625, -9.045486450195312, -2.206867218017578, 12.819416046142578, 38.01323699951172, 18.54150390625, -5.424781799316406, 0.6621761322021484, -7.861907958984375, 13.492311477661133, -4.806842803955078, 14.599628448486328, 39.50994873046875, 25.27625274658203, 12.977272033691406, 0.6800727844238281, -1.03118896484375, -2.1590728759765625, 37.37835693359375, 7.326202392578125, 6.36248779296875, 7.881500244140625, 7.896524429321289, 29.333301544189453, 12.216720581054688, 7.866355895996094, 34.50566101074219, -5.3836669921875, 15.742958068847656, 6.701087951660156, 17.513572692871094, -4.268119812011719, 16.141521453857422, 1.24676513671875, -16.835494995117188, 24.357254028320312, -5.172279357910156, -12.949161529541016, 11.60357666015625, -5.825630187988281, -4.906890869140625, -7.8092498779296875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000482.npy"}
{"epoch": 0.7286470143613001, "step": 483, "batch_size": 64, "mean": 10.071846008300781, "std": 14.158522605895996, "min": -36.35368347167969, "p10": -3.465899658203125, "median": 7.1494293212890625, "p90": 28.09089431762696, "max": 44.89129638671875, "pos_frac": 0.78125, "sample": [3.4121665954589844, 23.681884765625, 1.1093215942382812, 14.996393203735352, 22.570655822753906, 28.822242736816406, 24.228729248046875, -0.6061248779296875, -3.4556884765625, 1.0095939636230469, 4.8232269287109375, 9.292716979980469, 23.297515869140625, 36.19805908203125, -0.042266845703125, 23.724485397338867, -0.17018508911132812, -5.215492248535156, 3.775423049926758, 40.30535888671875, 29.849105834960938, 37.6847038269043, 16.852935791015625, 0.007659912109375, 26.384414672851562, 5.8470458984375, 23.89642333984375, 8.0654296875, -8.743919372558594, 15.212814331054688, 17.889007568359375, -18.768394470214844, 26.136451721191406, 3.431680679321289, 2.35748291015625, 17.782135009765625, 14.559669494628906, -36.35368347167969, 3.896739959716797, 1.6755828857421875, 3.6053009033203125, 12.617485046386719, 9.715301513671875, 4.1683502197265625, -4.522083282470703, 13.416900634765625, 44.89129638671875, 29.886764526367188, 15.23590087890625, 14.209854125976562, 7.146003723144531, -3.47027587890625, 5.190284729003906, 21.938072204589844, -4.828073501586914, -3.23931884765625, -0.5945358276367188, 7.152854919433594, 11.622779846191406, 4.879730224609375, 0.8204727172851562, 4.162181854248047, -3.3170623779296875, 14.488693237304688], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000483.npy"}
{"epoch": 0.7301587301587301, "step": 484, "batch_size": 64, "mean": 7.1592864990234375, "std": 10.795513153076172, "min": -23.14742660522461, "p10": -3.7328823089599608, "median": 5.528822422027588, "p90": 19.15887451171875, "max": 34.79444885253906, "pos_frac": 0.765625, "sample": [9.8865966796875, -2.6667327880859375, -12.420204162597656, 4.756256103515625, 33.8043212890625, 2.8685131072998047, 9.635419845581055, -23.14742660522461, 25.275619506835938, -3.4673538208007812, 4.669891357421875, 0.38272857666015625, -3.806720733642578, 3.4538421630859375, 8.691082000732422, 8.659049987792969, -6.120765686035156, 8.507720947265625, 1.1793441772460938, 14.062919616699219, -2.1520118713378906, 14.795158386230469, 31.534095764160156, 12.228780746459961, 4.365022659301758, 16.42691421508789, 18.255096435546875, 10.69921875, 15.578529357910156, 9.022174835205078, -1.0947284698486328, 13.741291046142578, 2.6664581298828125, 1.1412353515625, 29.05493927001953, 1.8086090087890625, 2.819469451904297, 3.2671566009521484, 4.771381378173828, 7.943309783935547, 12.206747055053711, -3.1113204956054688, 34.79444885253906, -5.5072174072265625, 10.922582626342773, 3.411029815673828, 19.292770385742188, -1.2935142517089844, 16.569072723388672, 10.129173278808594, 21.984397888183594, 6.286263465881348, 14.080741882324219, 0.227203369140625, -6.7255706787109375, -7.3082733154296875, 18.846450805664062, -2.3484344482421875, 1.21710205078125, 1.3688507080078125, 7.383293151855469, 16.20575714111328, 12.047157287597656, -3.5605926513671875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000484.npy"}
{"epoch": 0.7316704459561603, "step": 485, "batch_size": 64, "mean": 8.80953598022461, "std": 13.289398193359375, "min": -19.13787078857422, "p10": -5.180641174316405, "median": 5.021055221557617, "p90": 29.278494262695315, "max": 39.14277648925781, "pos_frac": 0.75, "sample": [15.796737670898438, 1.7011795043945312, 29.458152770996094, 11.231224060058594, 5.093547821044922, -0.6415863037109375, 6.4305572509765625, -4.071264266967773, 21.904373168945312, -5.401145935058594, -19.13787078857422, 11.167308807373047, 31.076889038085938, 28.859291076660156, 4.047576904296875, 14.503814697265625, 15.43408203125, 15.412353515625, 25.026840209960938, 4.9485626220703125, 4.330270767211914, 1.7286529541015625, -11.691352844238281, 31.2496337890625, 3.1746368408203125, 3.185089111328125, -13.841209411621094, 5.195518493652344, -0.9093246459960938, 15.182079315185547, -8.262611389160156, 10.4730224609375, 27.952011108398438, 0.02614593505859375, 2.6972579956054688, -0.4523468017578125, 39.14277648925781, 37.90766143798828, 2.03387451171875, 21.622283935546875, 11.358489990234375, -10.371902465820312, 3.338226318359375, -3.7559738159179688, 4.8193359375, -6.6446380615234375, -1.8104047775268555, 20.2374267578125, 14.710018157958984, -3.092437744140625, 2.8510589599609375, 5.123632431030273, 22.34839630126953, 34.00764083862305, 5.611602783203125, 23.121902465820312, 9.862495422363281, 36.58062744140625, 1.586639404296875, -4.666130065917969, 8.044036865234375, 2.6147232055664062, -0.00128173828125, 4.352119445800781], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000485.npy"}
{"epoch": 0.7331821617535903, "step": 486, "batch_size": 64, "mean": 9.81094741821289, "std": 16.28900909423828, "min": -32.623191833496094, "p10": -6.280521392822265, "median": 6.3903350830078125, "p90": 32.15335845947266, "max": 57.99615478515625, "pos_frac": 0.75, "sample": [-1.032684326171875, 5.113945007324219, 34.703582763671875, 1.1986503601074219, 0.672943115234375, -2.4102020263671875, 9.558929443359375, 29.708282470703125, 15.199310302734375, 0.02655792236328125, -6.635425567626953, 19.6912841796875, 22.82247543334961, 11.62252140045166, 28.741134643554688, 3.9778671264648438, -1.5572509765625, 39.40021896362305, -0.10396766662597656, 27.827064514160156, 4.784732818603516, 10.451225280761719, -1.053497314453125, 0.4155731201171875, 41.038597106933594, -10.136566162109375, 21.556289672851562, -10.097772598266602, 16.164474487304688, -28.201026916503906, 34.60859680175781, 20.789627075195312, 5.49530029296875, -15.636714935302734, 2.7824764251708984, 9.739540100097656, 0.6969490051269531, 23.340179443359375, -7.037111282348633, -2.362041473388672, 11.31546401977539, 26.239017486572266, 5.59722900390625, 33.20124816894531, 23.240467071533203, 2.3370513916015625, 2.730316162109375, 10.110382080078125, 29.671531677246094, 37.67219543457031, 13.29852294921875, 3.6962661743164062, 7.183441162109375, 8.381103515625, -3.6471691131591797, 4.0341644287109375, 7.196281433105469, 9.140548706054688, 57.99615478515625, -5.452411651611328, 20.25379180908203, -32.623191833496094, 4.545158386230469, -4.0810089111328125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000486.npy"}
{"epoch": 0.7346938775510204, "step": 487, "batch_size": 64, "mean": 8.483081817626953, "std": 12.682685852050781, "min": -18.792509078979492, "p10": -6.436298370361327, "median": 8.092034339904785, "p90": 25.092992401123052, "max": 42.65924072265625, "pos_frac": 0.75, "sample": [-5.772491455078125, -7.455345153808594, 10.987258911132812, 23.058242797851562, 16.496368408203125, 9.204839706420898, 0.648101806640625, 11.770305633544922, 23.739463806152344, 10.832382202148438, 19.436614990234375, -5.697196960449219, 1.9019241333007812, -8.45956802368164, -5.276275634765625, 35.564720153808594, 19.923294067382812, 10.85946273803711, 41.331787109375, 5.121059417724609, -13.11627197265625, 4.347900390625, 2.252368927001953, 27.319046020507812, 10.331649780273438, -0.7600479125976562, -5.766529083251953, -6.96044921875, 10.232498168945312, 10.036468505859375, 17.24835205078125, 3.6718521118164062, 12.575191497802734, -8.884870529174805, 22.41278839111328, 6.003082275390625, 23.894241333007812, -1.8034439086914062, 11.89249038696289, 10.538097381591797, 9.465484619140625, 2.775888442993164, -2.770172119140625, 14.606758117675781, -1.133697509765625, 6.979228973388672, 10.313217163085938, 1.4631118774414062, 1.6726207733154297, -18.792509078979492, -1.5175437927246094, 29.746231079101562, 4.484367370605469, -6.720787048339844, 2.893157958984375, 12.859283447265625, 3.017406463623047, 11.979324340820312, 4.7047271728515625, 30.477989196777344, 25.60674285888672, 12.794050216674805, 42.65924072265625, 1.673757553100586], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000487.npy"}
{"epoch": 0.7362055933484505, "step": 488, "batch_size": 64, "mean": 9.746414184570312, "std": 15.447053909301758, "min": -25.868507385253906, "p10": -6.784367370605469, "median": 6.012371063232422, "p90": 33.52137298583985, "max": 51.705169677734375, "pos_frac": 0.6875, "sample": [-4.841144561767578, -2.346393585205078, -4.386791229248047, -9.490177154541016, 17.2869873046875, -5.634159088134766, -25.868507385253906, 17.366031646728516, 27.63311767578125, 24.307918548583984, 0.3456764221191406, -9.495986938476562, 38.74407196044922, 7.1654052734375, 3.1549224853515625, 51.705169677734375, 5.453575134277344, 23.54334259033203, -0.5265655517578125, -10.258285522460938, 5.848075866699219, 6.495216369628906, 4.513618469238281, 2.38427734375, -2.8602371215820312, 10.045196533203125, -7.506523132324219, 25.14361572265625, -3.1144561767578125, 33.854591369628906, 0.07526397705078125, -6.036338806152344, -2.2596702575683594, 34.47509002685547, 10.337814331054688, 1.4972763061523438, 30.88985252380371, -7.12861442565918, 31.404205322265625, 21.398971557617188, 29.799880981445312, -1.3445816040039062, 24.253013610839844, -1.2240142822265625, 34.412750244140625, 7.658058166503906, 1.8927154541015625, 6.176666259765625, -7.0254669189453125, 17.899566650390625, 2.801666259765625, 9.302675247192383, -4.906599044799805, 8.88836669921875, 13.153739929199219, 33.80952835083008, 14.486785888671875, 32.8490104675293, 18.42816162109375, 4.1906280517578125, 11.81072998046875, 4.678825378417969, -6.2218017578125, 34.684730529785156], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000488.npy"}
{"epoch": 0.7377173091458806, "step": 489, "batch_size": 64, "mean": 3.76362943649292, "std": 11.11733341217041, "min": -20.820846557617188, "p10": -9.512162780761718, "median": 3.7671966552734375, "p90": 16.249645996093754, "max": 35.493141174316406, "pos_frac": 0.625, "sample": [-9.974590301513672, 2.8121719360351562, 5.313438415527344, 12.430450439453125, 13.878997802734375, 29.25629425048828, -0.8540725708007812, 7.3798980712890625, 1.9506301879882812, 35.493141174316406, 14.783103942871094, -2.5283737182617188, 4.844970703125, -14.577529907226562, -1.047861099243164, -18.059310913085938, -6.8451995849609375, -1.7512321472167969, 13.875564575195312, 3.7879180908203125, 11.829120635986328, 11.133468627929688, 7.2686309814453125, 1.7379150390625, 16.612327575683594, 24.179656982421875, -2.7179641723632812, 1.2292938232421875, 0.064239501953125, 5.3753814697265625, -9.656257629394531, -20.820846557617188, 21.286727905273438, 4.704578399658203, -2.6461029052734375, -9.175941467285156, -0.8994293212890625, 8.494850158691406, 14.663734436035156, 5.332786560058594, 8.711368560791016, 11.6502685546875, 4.6272125244140625, 7.4826812744140625, 4.538764953613281, -5.8152923583984375, 14.629085540771484, 4.313224792480469, 15.403388977050781, 1.621795654296875, -11.909988403320312, 1.2081146240234375, 3.7464752197265625, -0.32829856872558594, -2.3092422485351562, -18.10406494140625, 18.9669189453125, -7.297035217285156, -5.973075866699219, -3.6386795043945312, -2.6189346313476562, 25.405609130859375, -6.477472305297852, 4.8748626708984375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000489.npy"}
{"epoch": 0.7392290249433107, "step": 490, "batch_size": 64, "mean": 8.646984100341797, "std": 15.835003852844238, "min": -23.83935546875, "p10": -8.833808135986327, "median": 5.2401018142700195, "p90": 31.057421302795415, "max": 44.49684143066406, "pos_frac": 0.703125, "sample": [0.6406955718994141, -16.194427490234375, -22.489166259765625, 21.344009399414062, -5.214412689208984, 10.175262451171875, 3.241424560546875, 4.6551513671875, 40.39460754394531, 24.803466796875, -10.868186950683594, 36.57769775390625, -15.073738098144531, 5.292749404907227, 23.97873878479004, 10.494855880737305, 34.32417297363281, 26.920562744140625, 13.666748046875, 2.455657958984375, 2.5987091064453125, 12.957931518554688, 26.225013732910156, 22.511329650878906, -5.73260498046875, -0.2606353759765625, 29.128074645996094, -1.8341569900512695, 5.633331298828125, 0.11602020263671875, -7.9663848876953125, 5.1874542236328125, 5.784881591796875, 19.031227111816406, -0.6335639953613281, 4.212303161621094, 3.4808197021484375, 40.84455108642578, -16.12033462524414, -1.7555313110351562, 22.119705200195312, -2.1916732788085938, -6.690093994140625, 31.7016544342041, 0.7008628845214844, 9.36221694946289, 1.8348312377929688, -4.428974151611328, -9.154739379882812, -8.084968566894531, 31.793243408203125, 8.547811508178711, 1.960205078125, 20.83831787109375, 44.49684143066406, 8.63283920288086, 8.964607238769531, 12.156013488769531, 2.169097900390625, 29.554210662841797, -23.83935546875, -2.035289764404297, 23.788497924804688, 18.67682647705078], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000490.npy"}
{"epoch": 0.7407407407407407, "step": 491, "batch_size": 64, "mean": 9.249521255493164, "std": 12.820592880249023, "min": -14.91720962524414, "p10": -4.7809303283691404, "median": 6.595676422119141, "p90": 27.930694580078125, "max": 37.053348541259766, "pos_frac": 0.75, "sample": [36.15493392944336, -4.020927429199219, 14.573570251464844, 7.438285827636719, 5.109733581542969, 14.624698638916016, 8.410860061645508, 4.05828857421875, 7.506187438964844, 6.161586761474609, 6.75665283203125, -2.69512939453125, 22.662212371826172, -2.9502792358398438, 22.96527862548828, -8.014205932617188, 4.384239196777344, 27.896728515625, 2.5309219360351562, 7.179962158203125, 3.3571624755859375, -0.406097412109375, 18.190170288085938, 2.1953086853027344, -4.109067916870117, 16.185222625732422, 2.6564788818359375, 7.0258636474609375, -4.954792022705078, 6.434700012207031, 37.053348541259766, 15.854179382324219, -4.2613677978515625, 9.53656005859375, 27.512649536132812, 7.5006256103515625, -2.4127044677734375, 27.94525146484375, 5.052864074707031, -8.678447723388672, 34.209869384765625, 0.19781494140625, -1.878000259399414, 0.7152881622314453, 26.00334930419922, 13.114910125732422, -4.79730224609375, -7.2650299072265625, 33.65419006347656, 26.642234802246094, 5.0143585205078125, 5.000701904296875, 30.870460510253906, 34.57530975341797, 15.891988754272461, 4.39569091796875, 11.904014587402344, 1.8414268493652344, 19.395105361938477, -14.91720962524414, -8.971920013427734, 7.016530990600586, -4.742729187011719, 21.686843872070312], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000491.npy"}
{"epoch": 0.7422524565381708, "step": 492, "batch_size": 64, "mean": 6.226871013641357, "std": 13.522151947021484, "min": -22.826370239257812, "p10": -11.393020629882812, "median": 5.5157928466796875, "p90": 28.004048919677736, "max": 32.25407409667969, "pos_frac": 0.640625, "sample": [30.23760986328125, 20.277191162109375, -3.097195625305176, 7.3929595947265625, 11.907943725585938, -8.801856994628906, 24.12494659423828, -1.02764892578125, 4.982158660888672, -1.1775970458984375, 15.307815551757812, 9.568122863769531, 12.146041870117188, -1.4500808715820312, 13.225265502929688, 7.8590087890625, -5.701438903808594, 2.55096435546875, -3.515565872192383, 4.4114990234375, 13.746036529541016, -15.589141845703125, -11.980316162109375, -2.6660003662109375, 11.217056274414062, 8.26095199584961, 6.049427032470703, 29.04559326171875, -0.2050323486328125, -22.826370239257812, 32.25407409667969, 8.829837799072266, 17.845291137695312, 4.045234680175781, -8.184228897094727, 3.1780471801757812, -7.291748046875, 30.865310668945312, -11.5838623046875, -11.900054931640625, 27.42010498046875, 20.361614227294922, 8.29473876953125, 0.7923812866210938, -3.7496490478515625, 9.55859375, 29.336467742919922, 8.122737884521484, 3.70379638671875, 21.843074798583984, -13.747032165527344, -10.947723388671875, -12.737297058105469, 23.187652587890625, 18.721370697021484, 4.498638153076172, 29.423439025878906, -4.8997039794921875, 8.093063354492188, -10.222015380859375, 10.066957473754883, 0.5510482788085938, -9.737075805664062, 28.254310607910156], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000492.npy"}
{"epoch": 0.7437641723356009, "step": 493, "batch_size": 64, "mean": 9.455911636352539, "std": 14.642498970031738, "min": -24.258880615234375, "p10": -8.915797424316406, "median": 8.866199493408203, "p90": 28.01949501037599, "max": 44.6460075378418, "pos_frac": 0.78125, "sample": [10.821041107177734, 3.0259971618652344, -8.752365112304688, 12.991455078125, 9.737251281738281, 3.3570709228515625, -24.200420379638672, 10.188453674316406, 5.611209869384766, 0.2924156188964844, 5.3743743896484375, 17.214447021484375, 24.965927124023438, 38.259918212890625, 8.93914794921875, -10.064800262451172, 31.261940002441406, 2.376178741455078, 22.462646484375, 24.441139221191406, -2.705099105834961, 2.1845245361328125, -13.112014770507812, 8.793251037597656, -0.9210739135742188, -7.658111572265625, 23.956390380859375, 10.238445281982422, 43.58624267578125, 10.589041709899902, -8.98583984375, 10.647850036621094, -5.5264892578125, 13.429618835449219, -7.50897216796875, 9.176681518554688, 22.112628936767578, -9.719505310058594, 17.742431640625, 7.993358612060547, 4.726898193359375, 20.363723754882812, 11.97247314453125, 22.9063720703125, 5.954013824462891, 35.734561920166016, -12.061996459960938, 14.592620849609375, 21.44549560546875, -4.255062103271484, 2.8432159423828125, 29.328166961669922, 6.3138427734375, 4.329811096191406, 36.232521057128906, 15.285400390625, 6.804771423339844, 2.9021987915039062, 4.884857177734375, 44.6460075378418, 16.968017578125, 6.867607116699219, 18.035293579101562, -24.258880615234375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000493.npy"}
{"epoch": 0.745275888133031, "step": 494, "batch_size": 64, "mean": 7.852076530456543, "std": 14.221895217895508, "min": -21.922714233398438, "p10": -8.503910827636718, "median": 5.329427719116211, "p90": 29.135587692260746, "max": 42.404052734375, "pos_frac": 0.71875, "sample": [13.75887680053711, -2.672698974609375, 4.109405517578125, -3.60418701171875, 14.575660705566406, 3.3072357177734375, 3.0536155700683594, 18.72864532470703, 0.7080116271972656, 4.4408111572265625, 0.22137451171875, -6.222126007080078, 2.4395523071289062, 14.836700439453125, 5.040332794189453, 0.7102432250976562, 26.77645492553711, -3.8446731567382812, 7.74932861328125, -10.7142333984375, 5.015357971191406, 42.404052734375, 10.87460708618164, 4.132701873779297, 17.338531494140625, 34.784576416015625, 8.326942443847656, -6.028560638427734, 22.447509765625, 28.25515365600586, 16.547523498535156, -18.826553344726562, -17.659507751464844, 14.710531234741211, -1.2550334930419922, -6.7383270263671875, -9.260589599609375, 13.781196594238281, 9.66015625, 5.304210662841797, 19.935714721679688, -5.267143249511719, -21.922714233398438, -2.0190963745117188, 12.181018829345703, 5.536773681640625, 40.106292724609375, 7.6060791015625, -3.1718521118164062, 14.465774536132812, 29.962860107421875, 5.7718048095703125, 9.859909057617188, 34.304168701171875, 3.3209075927734375, 19.501380920410156, -10.80322265625, -6.669853210449219, 29.512916564941406, 1.1148681640625, 21.434860229492188, -9.481964111328125, 5.354644775390625, 34.685943603515625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000494.npy"}
{"epoch": 0.7467876039304611, "step": 495, "batch_size": 64, "mean": 6.954477310180664, "std": 13.523691177368164, "min": -29.282512664794922, "p10": -7.845518112182616, "median": 5.79061222076416, "p90": 23.758553314208985, "max": 43.157569885253906, "pos_frac": 0.75, "sample": [-28.16607666015625, -0.720428466796875, -6.742706298828125, -29.282512664794922, 25.172283172607422, -8.53707504272461, -10.762285232543945, 5.642311096191406, 8.129676818847656, 15.683181762695312, -9.596405029296875, -12.373138427734375, 1.2203521728515625, 2.0118484497070312, 16.9295654296875, 23.777236938476562, -4.532470703125, 19.039657592773438, 17.553874969482422, -3.4419326782226562, 0.9033050537109375, 13.291254043579102, 3.4153823852539062, 8.229608535766602, 5.938913345336914, 25.593116760253906, 2.8930587768554688, -2.7661285400390625, 9.487024307250977, 6.106998443603516, 35.456695556640625, 14.219364166259766, 11.967464447021484, -8.165386199951172, 16.404029846191406, -7.099159240722656, 7.555412292480469, 5.5975341796875, 35.95758056640625, 17.179004669189453, 0.14508438110351562, 11.444129943847656, 14.538650512695312, 1.2568206787109375, 23.71495819091797, 12.098888397216797, 5.564105987548828, 1.6037864685058594, -5.1567535400390625, 43.157569885253906, 0.3670387268066406, 18.477916717529297, 6.240882873535156, 0.8097381591796875, 2.2474365234375, 34.06146240234375, 12.471817016601562, -6.5872650146484375, -0.942962646484375, 8.044891357421875, 4.310914993286133, 6.8052978515625, 22.3828125, 4.859291076660156], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000495.npy"}
{"epoch": 0.7482993197278912, "step": 496, "batch_size": 64, "mean": 8.523547172546387, "std": 14.092438697814941, "min": -17.500320434570312, "p10": -7.963307189941405, "median": 7.950885772705078, "p90": 26.992028999328625, "max": 48.91612243652344, "pos_frac": 0.734375, "sample": [11.385551452636719, 12.48858642578125, 48.91612243652344, 18.92486572265625, -3.30560302734375, 5.881095886230469, -12.37747573852539, 31.483856201171875, 15.408500671386719, 19.206741333007812, -1.5614509582519531, 6.092063903808594, 12.9287109375, 7.437370300292969, 3.9822235107421875, -2.6368865966796875, 0.49197959899902344, 12.859600067138672, 23.889638900756836, 17.873336791992188, -8.577896118164062, 28.321624755859375, 0.39060211181640625, 17.479949951171875, 9.486679077148438, -15.609832763671875, 34.438751220703125, 22.242156982421875, 2.846546173095703, 12.85382080078125, -13.574546813964844, 0.8686065673828125, 40.50347137451172, 34.71038818359375, 8.410743713378906, 5.271556854248047, -17.500320434570312, 1.8536796569824219, 0.8873443603515625, -3.1204910278320312, 5.630828857421875, 13.370643615722656, 20.41326904296875, 7.86943244934082, 14.778266906738281, 3.8129005432128906, 8.365585327148438, 8.032339096069336, -1.181051254272461, 9.562919616699219, 36.61195373535156, -16.905624389648438, -4.4702606201171875, 2.0032176971435547, 19.872817993164062, 9.66827392578125, 10.336841583251953, 20.273536682128906, -4.132503509521484, -0.8954620361328125, -6.529266357421875, 14.579212188720703, -17.006683349609375, -0.10581207275390625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000496.npy"}
{"epoch": 0.7498110355253212, "step": 497, "batch_size": 64, "mean": 10.414495468139648, "std": 13.68516731262207, "min": -19.279014587402344, "p10": -3.7084022521972653, "median": 8.470138549804688, "p90": 25.982096481323246, "max": 49.2979736328125, "pos_frac": 0.796875, "sample": [46.79435348510742, 0.4257240295410156, 49.2979736328125, 23.661537170410156, 17.697845458984375, 0.3928489685058594, 10.872894287109375, 15.278865814208984, 7.240455627441406, 0.1705780029296875, 16.106264114379883, 13.29037094116211, 18.30997085571289, 0.4441375732421875, 7.9378204345703125, -0.09026336669921875, 17.923545837402344, -19.279014587402344, 16.68670654296875, 3.3793869018554688, -4.067019462585449, 26.611270904541016, 10.138725280761719, -2.5692825317382812, 3.93798828125, -3.7671127319335938, -3.5714111328125, 13.19012451171875, 17.69434356689453, 24.17755889892578, 0.13131141662597656, 5.61376953125, 3.8154296875, 20.90699005126953, -9.039932250976562, 18.599895477294922, 27.433616638183594, 14.644317626953125, 8.571952819824219, -15.757331848144531, 22.917343139648438, 5.295783996582031, 2.9089622497558594, 3.400970458984375, 8.202972412109375, -2.8098526000976562, 38.75233459472656, 38.825408935546875, -3.4309310913085938, 6.615028381347656, 8.368324279785156, 14.62451171875, 14.883270263671875, 12.10714340209961, 6.696624755859375, 37.67008972167969, -12.732330322265625, 11.675308227539062, -0.7394485473632812, 10.331130981445312, 12.4749755859375, 24.514022827148438, -4.544281005859375, 7.283111572265625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000497.npy"}
{"epoch": 0.7513227513227513, "step": 498, "batch_size": 64, "mean": 9.127620697021484, "std": 14.43989372253418, "min": -28.012115478515625, "p10": -5.012401580810546, "median": 7.076711654663086, "p90": 29.402433776855478, "max": 45.27134323120117, "pos_frac": 0.71875, "sample": [-6.415424346923828, 11.14450454711914, 30.266586303710938, 1.2908935546875, 11.380706787109375, -14.667499542236328, 3.3677215576171875, 17.754867553710938, -0.6519889831542969, -2.67535400390625, 25.92212677001953, 7.588371276855469, -1.9527053833007812, 10.106460571289062, 23.415084838867188, 7.00323486328125, 1.8084487915039062, 4.689167022705078, -0.9194908142089844, 6.02606201171875, 43.582603454589844, 5.763755798339844, 20.897125244140625, -5.637968063354492, 8.300071716308594, 8.870803833007812, 0.3869476318359375, 24.1435546875, 17.165023803710938, -0.9082221984863281, 13.412948608398438, 25.294078826904297, 2.705728530883789, -2.518829345703125, 0.782196044921875, -10.1358642578125, -5.260337829589844, 22.38507843017578, 0.3639545440673828, 9.270257949829102, 33.525699615478516, 20.133604049682617, -28.012115478515625, 14.84246826171875, 10.074260711669922, 7.535194396972656, 7.993339538574219, 5.051399230957031, 9.959449768066406, 7.150188446044922, 30.725868225097656, 45.27134323120117, 9.146507263183594, -4.206123352050781, 27.386077880859375, -4.040041923522949, 40.165435791015625, 4.0945281982421875, -2.8369789123535156, -0.54144287109375, -6.6213531494140625, -4.4338836669921875, 4.592784881591797, 43.866859436035156], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000498.npy"}
{"epoch": 0.7528344671201814, "step": 499, "batch_size": 64, "mean": 8.541094779968262, "std": 14.515359878540039, "min": -19.007164001464844, "p10": -10.59760856628418, "median": 8.942817687988281, "p90": 27.739606094360354, "max": 38.90850830078125, "pos_frac": 0.6875, "sample": [26.507312774658203, 35.161041259765625, -2.5130691528320312, 15.985458374023438, 13.561355590820312, 3.7297210693359375, -2.4234752655029297, -18.5078125, -10.731922149658203, 10.435073852539062, 11.81479263305664, 20.85724639892578, 19.257659912109375, -1.0963306427001953, 20.932861328125, 0.3940582275390625, 34.375152587890625, -12.446731567382812, 30.87335205078125, 23.662559509277344, 12.96282958984375, -5.0486907958984375, -11.050863265991211, -16.43075180053711, -10.26161003112793, 14.64068603515625, 10.828117370605469, 26.853927612304688, -1.9382781982421875, -6.123191833496094, -2.1685562133789062, -5.837406158447266, 19.39664077758789, 6.57366943359375, 11.946670532226562, 4.849433898925781, 10.154655456542969, 6.658485412597656, 4.228580474853516, -1.4965782165527344, 25.35236358642578, -2.2829246520996094, 33.37958526611328, 0.9093818664550781, -10.284210205078125, 1.8056182861328125, 7.730979919433594, 22.308517456054688, 25.512298583984375, 38.90850830078125, 12.661735534667969, 14.38626480102539, 11.846485137939453, -11.228286743164062, -19.007164001464844, 11.20880126953125, 28.119182586669922, 13.974685668945312, -7.4161529541015625, 25.142181396484375, 0.8752975463867188, 31.205673217773438, 2.8607330322265625, 0.09444427490234375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000499.npy"}
{"epoch": 0.7543461829176115, "step": 500, "batch_size": 64, "mean": 8.593254089355469, "std": 13.071296691894531, "min": -20.988414764404297, "p10": -8.223120117187497, "median": 5.197566986083984, "p90": 26.755440521240235, "max": 39.857421875, "pos_frac": 0.78125, "sample": [2.14959716796875, 3.9756927490234375, 3.6203079223632812, -0.7286529541015625, 14.857711791992188, 5.40924072265625, 19.1441650390625, 15.736740112304688, 4.233482360839844, 36.54438400268555, 11.808914184570312, 34.183860778808594, 20.638896942138672, 2.584259033203125, 26.925392150878906, 12.897991180419922, 39.857421875, -16.407264709472656, 8.86260986328125, 0.15892791748046875, 26.35888671875, -10.285354614257812, -0.08913421630859375, 15.320747375488281, -1.8780860900878906, 23.57012176513672, 23.56159210205078, 3.227691650390625, 1.4192161560058594, -12.617347717285156, -9.340946197509766, 1.55780029296875, 2.39178466796875, 10.623321533203125, -20.988414764404297, 0.9441070556640625, 6.2079620361328125, 15.43280029296875, 22.366790771484375, 1.75115966796875, 8.117897033691406, -1.2058753967285156, 10.997291564941406, 7.409000396728516, -11.649223327636719, 10.380485534667969, 3.4847564697265625, -9.29315185546875, 28.144268035888672, -0.5610122680664062, 27.685684204101562, -5.72637939453125, 4.216707229614258, 4.985893249511719, 20.188762664794922, 3.3098182678222656, 0.7544498443603516, 19.698348999023438, 19.619674682617188, 1.3837051391601562, -4.8340301513671875, 13.257461547851562, 31.291561126708984, 22.32379150390625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000500.npy"}
{"epoch": 0.7558578987150416, "step": 501, "batch_size": 64, "mean": 7.153685092926025, "std": 11.721728324890137, "min": -15.80206298828125, "p10": -3.5342725753784174, "median": 4.134700775146484, "p90": 27.612107086181656, "max": 39.49831008911133, "pos_frac": 0.6875, "sample": [-0.7381858825683594, 23.82140350341797, -1.19744873046875, 4.7139739990234375, 29.162010192871094, 13.970623016357422, -0.318023681640625, 9.457374572753906, 16.103668212890625, 39.49831008911133, 29.951156616210938, -7.714166641235352, 5.955959320068359, 16.02837371826172, -3.101318359375, -0.667877197265625, 12.655715942382812, 22.234603881835938, -15.80206298828125, -3.6881046295166016, 8.931396484375, -1.71759033203125, -0.44936275482177734, 6.060356140136719, 32.0543212890625, 23.99566650390625, 8.899139404296875, 29.977951049804688, -4.1932525634765625, -0.7301597595214844, -4.468208312988281, 5.077667236328125, 1.4730758666992188, 3.999359130859375, 18.407577514648438, 0.118011474609375, 8.189460754394531, 0.07594108581542969, 2.509550094604492, -3.1753311157226562, 3.7452354431152344, 33.662818908691406, -2.6258888244628906, 6.21295166015625, 0.896636962890625, -9.837989807128906, 4.0976409912109375, 2.02838134765625, -9.561996459960938, 7.1387939453125, 2.9450225830078125, 30.74030303955078, -0.5635757446289062, 5.7575836181640625, 4.171760559082031, 0.7703828811645508, -1.99114990234375, 4.645477294921875, 17.92205047607422, 10.546119689941406, 12.750633239746094, 6.985115051269531, 3.2965621948242188, -1.258575439453125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000501.npy"}
{"epoch": 0.7573696145124716, "step": 502, "batch_size": 64, "mean": 6.294438362121582, "std": 10.632394790649414, "min": -16.603057861328125, "p10": -5.216841125488282, "median": 5.25913143157959, "p90": 18.037259674072267, "max": 42.708251953125, "pos_frac": 0.75, "sample": [-7.311618804931641, 0.30916595458984375, 0.4535789489746094, -5.1644287109375, 6.4775543212890625, 9.949996948242188, 1.9817428588867188, 3.311248779296875, -7.970634460449219, 9.869598388671875, 8.4501953125, 25.010360717773438, -5.2393035888671875, -16.603057861328125, 5.62213134765625, 37.224998474121094, 3.722198486328125, -4.7992095947265625, 10.897819519042969, 0.885986328125, 17.559112548828125, -5.776531219482422, 11.065019607543945, 10.094635009765625, 6.230255126953125, 0.5420761108398438, 9.026176452636719, -3.838878631591797, 0.9421749114990234, 9.897811889648438, 3.7452621459960938, 0.19190597534179688, 5.912757873535156, 4.4818267822265625, 6.500080108642578, 3.673828125, 31.283794403076172, 42.708251953125, -6.9457855224609375, -4.18890380859375, -2.807098388671875, -2.6304550170898438, 5.855564117431641, -2.852325439453125, 4.89613151550293, -1.1662750244140625, 2.1256790161132812, 18.24217987060547, 20.474517822265625, -9.541412353515625, 12.904342651367188, 14.335845947265625, 14.096061706542969, 13.110733032226562, 9.752349853515625, 9.894989013671875, 9.860824584960938, 20.755043029785156, 2.6913528442382812, 11.174835205078125, 16.486122131347656, -1.8710784912109375, 14.03558349609375, 2.8373565673828125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000502.npy"}
{"epoch": 0.7588813303099018, "step": 503, "batch_size": 64, "mean": 8.7264404296875, "std": 13.16087818145752, "min": -25.444671630859375, "p10": -6.281877136230468, "median": 6.8905487060546875, "p90": 27.934481048583987, "max": 37.28546905517578, "pos_frac": 0.78125, "sample": [5.681739807128906, -3.7633285522460938, 19.876022338867188, 6.860073089599609, 11.479629516601562, 8.489578247070312, 15.373146057128906, -25.444671630859375, -3.090118408203125, 25.882055282592773, 10.917922973632812, -2.88226318359375, 9.619400024414062, 37.28546905517578, -0.7738399505615234, 4.104522705078125, 2.7384109497070312, 1.7060394287109375, 28.494216918945312, -12.566452026367188, 0.6139984130859375, 10.021625518798828, 27.531993865966797, 1.9466514587402344, 6.360874176025391, 6.680755615234375, 6.123374938964844, 32.78673553466797, 6.921024322509766, 26.651893615722656, -6.237358093261719, 4.867462158203125, 28.09218978881836, 30.49713897705078, 28.896446228027344, -0.5186767578125, 10.077726364135742, -6.300956726074219, -6.3896942138671875, 1.104644775390625, -9.154296875, 2.2224197387695312, 1.2385082244873047, 10.546699523925781, 14.839410781860352, 7.583961486816406, 6.3803863525390625, 27.56649398803711, 21.612689971923828, 5.076532363891602, 31.53558349609375, 13.81454086303711, -13.930191040039062, 6.537057876586914, 22.51373291015625, 17.182373046875, 11.006989479064941, -5.320747375488281, 3.7251510620117188, 17.200145721435547, 11.109790802001953, 10.009002685546875, 17.57305908203125, -22.09247398376465], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000503.npy"}
{"epoch": 0.7603930461073318, "step": 504, "batch_size": 64, "mean": 11.30551528930664, "std": 12.032524108886719, "min": -7.478328704833984, "p10": -2.134260177612304, "median": 7.281558990478516, "p90": 28.303022575378428, "max": 44.43043518066406, "pos_frac": 0.828125, "sample": [-6.141387939453125, 19.496166229248047, 4.04071044921875, 5.9254150390625, -0.111053466796875, -2.491321563720703, 20.99066162109375, 7.254718780517578, 38.911582946777344, 5.148979187011719, -1.301116943359375, 10.097030639648438, 3.6620750427246094, 37.92527770996094, 4.200164794921875, 26.173730850219727, 19.288002014160156, -4.1519012451171875, 21.50334930419922, 31.69438934326172, 17.436100006103516, 3.6641921997070312, 5.280673980712891, 33.64374923706055, 21.549896240234375, -6.87005615234375, 1.1057395935058594, 20.899436950683594, 14.006370544433594, 33.42530822753906, 21.036041259765625, 11.5311279296875, 13.694269180297852, 21.170486450195312, 8.571022033691406, 2.88702392578125, 6.810771942138672, -0.0706024169921875, 29.215576171875, -4.157356262207031, 21.302963256835938, 3.7972335815429688, -1.2099647521972656, 1.9614715576171875, 10.671562194824219, 7.135383605957031, 5.483467102050781, 1.7617969512939453, 23.717254638671875, 7.308399200439453, 10.060691833496094, -7.478328704833984, -6.774608612060547, 7.163330078125, 5.782070159912109, 5.332557678222656, 2.213921546936035, 16.086288452148438, 15.353652954101562, 14.796600341796875, 16.53368377685547, 44.43043518066406, 16.84912109375, 4.3287811279296875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000504.npy"}
{"epoch": 0.7619047619047619, "step": 505, "batch_size": 64, "mean": 7.06602668762207, "std": 14.367191314697266, "min": -26.756793975830078, "p10": -9.678170776367187, "median": 5.844264030456543, "p90": 22.58840370178223, "max": 43.50115203857422, "pos_frac": 0.65625, "sample": [21.992942810058594, 5.946798324584961, -0.00872802734375, -3.4637718200683594, -5.509969711303711, 16.484649658203125, -12.264480590820312, 42.8529052734375, 11.027990341186523, 15.2655029296875, 21.514511108398438, -4.765201568603516, 5.741729736328125, 2.7232513427734375, 20.246601104736328, 2.33941650390625, -5.286529541015625, -0.884613037109375, 7.956119537353516, 5.2696075439453125, -1.2146759033203125, 22.84360122680664, 17.758045196533203, 11.812957763671875, 14.738807678222656, 43.50115203857422, 18.959365844726562, -3.901947021484375, 0.3896217346191406, 7.048795700073242, 2.7541351318359375, -14.313339233398438, 24.41551971435547, -14.6834716796875, 32.296546936035156, 14.142112731933594, 10.528587341308594, 17.352584838867188, -7.896705627441406, -7.42041015625, 19.57617950439453, -4.5615692138671875, 10.625133514404297, -13.476333618164062, 9.254684448242188, 3.263965606689453, 12.789222717285156, -7.159709930419922, 26.740951538085938, -26.756793975830078, -0.5015945434570312, 9.367996215820312, 39.22502136230469, 11.888710021972656, 16.218379974365234, -8.589996337890625, 16.674148559570312, -2.4355506896972656, 4.132667541503906, 21.81246566772461, 5.313388824462891, 2.696308135986328, -10.14453125, -20.0174560546875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000505.npy"}
{"epoch": 0.763416477702192, "step": 506, "batch_size": 64, "mean": 7.404515266418457, "std": 15.315028190612793, "min": -26.609493255615234, "p10": -12.218718719482421, "median": 6.1096649169921875, "p90": 28.949127197265632, "max": 47.353538513183594, "pos_frac": 0.734375, "sample": [8.752704620361328, 26.57726287841797, -0.8823966979980469, -2.171947479248047, 6.77618408203125, 7.4824676513671875, 7.3175201416015625, -10.178888320922852, 18.720346450805664, 3.74249267578125, 5.899223327636719, 14.771068572998047, 13.524429321289062, 2.4661788940429688, 9.324684143066406, 4.5422515869140625, -26.609493255615234, 39.55963134765625, 47.353538513183594, -3.67193603515625, 11.91849136352539, -17.746253967285156, -2.6342926025390625, -17.584243774414062, -22.48271942138672, -22.944419860839844, -0.40312957763671875, 29.541336059570312, 26.27105712890625, 13.729984283447266, 21.098716735839844, -11.551910400390625, 38.20872497558594, 2.1945037841796875, 30.686382293701172, 2.51678466796875, 18.685394287109375, 6.320106506347656, 9.437530517578125, 2.3017730712890625, -16.787109375, 0.25568389892578125, 3.267475128173828, 20.062545776367188, 4.875190734863281, 5.146297454833984, -4.989444732666016, 6.3873291015625, 3.7309036254882812, 4.017723083496094, 31.93415069580078, 11.39105224609375, 0.6230812072753906, 19.11963653564453, 12.244049072265625, 11.766571044921875, -0.23046112060546875, 27.567306518554688, -12.504493713378906, -9.119468688964844, 4.5258331298828125, 15.689407348632812, 14.386871337890625, 29.669692993164062], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000506.npy"}
{"epoch": 0.764928193499622, "step": 507, "batch_size": 64, "mean": 8.375062942504883, "std": 13.078124046325684, "min": -16.747058868408203, "p10": -7.064687347412108, "median": 6.487375259399414, "p90": 27.02536125183107, "max": 41.77848815917969, "pos_frac": 0.765625, "sample": [17.03809356689453, 5.2743988037109375, -3.0365848541259766, 1.36724853515625, 35.28852844238281, 5.270500183105469, 14.003890991210938, -15.4461669921875, 0.8681068420410156, 12.599327087402344, -5.136135101318359, 13.086067199707031, -4.385406494140625, -6.1307830810546875, -4.594516754150391, 5.067138671875, 3.5468292236328125, -8.237937927246094, 20.74041748046875, 3.7004241943359375, 3.6972503662109375, 7.155181884765625, -6.2353973388671875, -3.03631591796875, 34.24310302734375, 4.195818901062012, 7.17915153503418, 22.382720947265625, 10.324319839477539, 7.622272491455078, 6.132667541503906, 4.4200439453125, 41.77848815917969, -14.426483154296875, 15.453838348388672, 16.018354415893555, 6.614601135253906, 6.063179016113281, -13.224203109741211, -0.9995155334472656, 15.196395874023438, 12.139923095703125, 7.68011474609375, 31.022920608520508, 14.69900131225586, 1.8823394775390625, 11.193656921386719, 11.629913330078125, 34.36932373046875, 37.91645812988281, -9.747413635253906, 16.245101928710938, 4.386127471923828, 2.6586761474609375, 11.308731079101562, 3.271392822265625, 6.360149383544922, -7.420097351074219, 19.07912826538086, -16.747058868408203, 20.810951232910156, 21.63800048828125, 11.172718048095703, 29.015064239501953], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000507.npy"}
{"epoch": 0.7664399092970522, "step": 508, "batch_size": 64, "mean": 9.376272201538086, "std": 14.101686477661133, "min": -14.98486328125, "p10": -4.400241088867187, "median": 8.011913299560547, "p90": 30.770427703857425, "max": 50.31703186035156, "pos_frac": 0.71875, "sample": [12.043251037597656, -3.1753463745117188, 35.928131103515625, 14.456878662109375, 5.342292785644531, -12.564872741699219, 46.58696746826172, -1.335536003112793, 11.38592529296875, 31.02605438232422, 50.31703186035156, 30.173965454101562, -4.538719177246094, 6.723056793212891, -5.2999267578125, 10.174018859863281, -12.143787384033203, 9.773757934570312, 14.917499542236328, -3.9996776580810547, 10.603790283203125, -4.077125549316406, 10.671724319458008, 1.2344284057617188, 0.9762954711914062, -14.98486328125, 4.20478630065918, 5.828544616699219, 25.65008544921875, 8.015556335449219, 8.9156494140625, 10.565322875976562, 7.504425048828125, 5.4384307861328125, 19.02109146118164, 7.278228759765625, -2.991924285888672, 16.143600463867188, 28.441802978515625, -2.9533767700195312, -1.9271049499511719, -2.0170440673828125, 29.973052978515625, 16.62557601928711, 0.9556121826171875, -8.765472412109375, 6.6750640869140625, 13.91845703125, 33.54827880859375, 9.078330993652344, 16.0823974609375, -2.895526885986328, 0.4222698211669922, -3.803609848022461, 2.3931427001953125, -13.605342864990234, -3.8906402587890625, 32.12898635864258, 17.91241455078125, 19.11444091796875, 9.033538818359375, 8.008270263671875, 8.70098876953125, 31.137901306152344], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000508.npy"}
{"epoch": 0.7679516250944822, "step": 509, "batch_size": 64, "mean": 8.851554870605469, "std": 10.979546546936035, "min": -14.553695678710938, "p10": -6.13698101043701, "median": 8.847983360290527, "p90": 22.963791275024416, "max": 38.67627716064453, "pos_frac": 0.796875, "sample": [11.241043090820312, -4.0716705322265625, -0.48764801025390625, 15.475791931152344, 14.939849853515625, 2.0313796997070312, 6.323970794677734, -6.754924774169922, -3.4635848999023438, 14.264396667480469, 24.294414520263672, 21.836097717285156, 13.740447998046875, 8.459859848022461, 20.47698974609375, -14.553695678710938, 25.479644775390625, 25.961936950683594, 11.690185546875, 2.520538330078125, 18.97808837890625, -7.174716949462891, 13.57489013671875, -4.695112228393555, 10.395843505859375, 1.189056396484375, 4.4268951416015625, -4.572624206542969, 9.830665588378906, 2.1865234375, 27.436561584472656, 23.121150970458984, 4.579345703125, 4.951911926269531, 2.377471923828125, -0.19964885711669922, 11.90655517578125, 1.0614242553710938, 22.59661865234375, 8.010326385498047, 18.522193908691406, 6.847064971923828, 13.131820678710938, -11.894920349121094, 10.438667297363281, 11.836540222167969, 1.9030609130859375, 16.367828369140625, 38.67627716064453, 18.702800750732422, 15.872848510742188, 4.356143951416016, 2.625335693359375, -9.385894775390625, 0.2732696533203125, 20.117435455322266, 14.679428100585938, -9.425045013427734, -7.550926208496094, 8.352828979492188, 9.236106872558594, 19.611297607421875, 25.698043823242188, 8.121089935302734], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000509.npy"}
{"epoch": 0.7694633408919124, "step": 510, "batch_size": 64, "mean": 9.982291221618652, "std": 14.42245864868164, "min": -19.885578155517578, "p10": -4.683008575439452, "median": 8.066869258880615, "p90": 28.724000358581545, "max": 44.01146697998047, "pos_frac": 0.765625, "sample": [-3.7110824584960938, 14.150516510009766, 23.063186645507812, 5.7152557373046875, -5.09954833984375, 6.01947021484375, 17.566547393798828, 42.467323303222656, 5.19989013671875, 4.9274444580078125, -0.6471633911132812, 4.293495178222656, 3.9263763427734375, 4.840755462646484, -12.438657760620117, -0.4152183532714844, 1.5875396728515625, 18.33281707763672, -1.737274169921875, 8.666938781738281, 12.264968872070312, 21.802791595458984, 28.155242919921875, -3.4781875610351562, 5.009571075439453, 16.558067321777344, 14.599159240722656, 4.400115966796875, 2.935731887817383, 13.630111694335938, 13.623554229736328, 11.930435180664062, 28.20147705078125, 2.0783767700195312, 7.466799736022949, 15.282669067382812, 12.938087463378906, 44.01146697998047, -3.6948509216308594, -17.43133544921875, 3.332120895385742, -18.18518829345703, 16.868785858154297, 15.778728485107422, 34.74615478515625, 7.280632019042969, 12.103775024414062, 40.28802490234375, 26.970947265625, -3.3968238830566406, 16.572481155395508, 15.760902404785156, 26.67444610595703, 14.448005676269531, -19.885578155517578, 5.5459136962890625, 10.552001953125, 30.672767639160156, -1.5758304595947266, 28.947938919067383, 34.92742919921875, 2.0839099884033203, -17.48204803466797, -11.15570068359375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000510.npy"}
{"epoch": 0.7709750566893424, "step": 511, "batch_size": 64, "mean": 11.120245933532715, "std": 14.591052055358887, "min": -15.887069702148438, "p10": -6.012085723876952, "median": 8.02392578125, "p90": 30.724400329589844, "max": 43.83501434326172, "pos_frac": 0.765625, "sample": [29.18560791015625, 23.785968780517578, 7.8425750732421875, 39.488731384277344, 10.130016326904297, -0.5651931762695312, 7.032615661621094, -6.4918060302734375, 21.419021606445312, 15.980262756347656, 9.221725463867188, -4.892738342285156, 3.2962417602539062, 3.2862548828125, 23.287860870361328, 22.158966064453125, 18.325468063354492, 2.0087318420410156, 29.64740753173828, 23.41803741455078, 27.839935302734375, -15.887069702148438, 2.283782958984375, -13.664546966552734, -3.8512725830078125, -1.252593994140625, 35.03401184082031, 18.134193420410156, 15.641647338867188, 1.8403472900390625, 2.4973678588867188, 8.205276489257812, -14.153717041015625, 6.3771514892578125, 24.19806671142578, 8.266494750976562, 19.818992614746094, -2.696788787841797, -1.1072921752929688, 7.388664245605469, 6.0817108154296875, 8.937923431396484, 30.915443420410156, 28.150535583496094, 38.62031555175781, 18.335586547851562, -7.466819763183594, 5.222930908203125, 2.4783859252929688, 37.327178955078125, 4.346160888671875, 2.7383346557617188, 9.060958862304688, 6.691490173339844, -1.9356374740600586, 17.311199188232422, -9.018610000610352, 43.83501434326172, 13.433609008789062, 30.27863311767578, 36.308326721191406, 0.5902252197265625, -12.055465698242188, -0.9700851440429688], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000511.npy"}
{"epoch": 0.7724867724867724, "step": 512, "batch_size": 64, "mean": 7.068803310394287, "std": 13.874871253967285, "min": -18.66290283203125, "p10": -7.8346633911132795, "median": 3.588716506958008, "p90": 27.054305267333984, "max": 45.56446838378906, "pos_frac": 0.703125, "sample": [14.179229736328125, 6.60821533203125, 9.587749481201172, -14.508319854736328, -5.414577484130859, 37.325130462646484, -2.7087059020996094, 2.4167823791503906, 2.029756546020508, -1.4845199584960938, 6.8612823486328125, 23.539833068847656, -11.7666015625, -3.0783843994140625, 9.501415252685547, 15.355712890625, -3.4345016479492188, -10.226615905761719, 17.88749122619629, 2.938556671142578, 32.92420959472656, 14.360321044921875, 0.23468017578125, 20.415424346923828, -3.205841064453125, 27.105636596679688, -18.66290283203125, 13.728958129882812, -1.20672607421875, 32.30085754394531, 0.7207794189453125, 16.531417846679688, 35.90007019042969, 0.9338207244873047, 11.073272705078125, 1.28802490234375, 0.4180927276611328, 19.020606994628906, 3.792034149169922, 26.934532165527344, -10.184341430664062, -6.2954559326171875, -5.6492767333984375, 0.77655029296875, 5.843452453613281, 35.573638916015625, 0.1317901611328125, 4.025478363037109, 0.10089111328125, -6.089519500732422, 16.949607849121094, 3.7846717834472656, 3.39276123046875, 0.9025497436523438, 4.3294677734375, 12.954582214355469, -2.4736785888671875, 6.772548675537109, 45.56446838378906, 25.854705810546875, -5.188066482543945, -11.624652862548828, 11.229358673095703, -8.49432373046875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000512.npy"}
{"epoch": 0.7739984882842026, "step": 513, "batch_size": 64, "mean": 8.985391616821289, "std": 12.28852367401123, "min": -9.426071166992188, "p10": -3.44642219543457, "median": 6.90423583984375, "p90": 29.272434997558598, "max": 49.06035232543945, "pos_frac": 0.78125, "sample": [30.177719116210938, 24.047679901123047, 38.8499755859375, 0.3234443664550781, 8.546836853027344, 10.549129486083984, 10.766101837158203, 9.163093566894531, 4.295055389404297, 4.233146667480469, 8.899124145507812, -2.345855712890625, 27.8880615234375, 5.4783477783203125, 1.869415283203125, 16.113859176635742, 5.022312164306641, 1.4612579345703125, 49.06035232543945, 12.612777709960938, 3.6471405029296875, 30.89169692993164, 6.8783416748046875, 7.618663787841797, 12.16006851196289, -3.5467491149902344, -0.77734375, 4.323524475097656, -1.7385902404785156, 0.7874107360839844, 8.999265670776367, 8.77862548828125, 4.9254302978515625, -7.882396697998047, 12.158004760742188, 5.5325469970703125, 2.3500518798828125, 9.18487548828125, 34.48527526855469, 4.113735198974609, -6.157665252685547, -6.548368453979492, 29.865737915039062, -3.2123260498046875, 18.19274139404297, 27.029312133789062, 38.102935791015625, -9.426071166992188, 15.904586791992188, 6.591152191162109, 13.630851745605469, -8.548885345458984, 9.961128234863281, 10.941238403320312, 6.9301300048828125, -5.183704376220703, 2.4907302856445312, 9.637622833251953, -2.0522193908691406, 3.7624149322509766, 8.524009704589844, -1.2009506225585938, -1.7743682861328125, 7.703582763671875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000513.npy"}
{"epoch": 0.7755102040816326, "step": 514, "batch_size": 64, "mean": 9.43563461303711, "std": 13.044933319091797, "min": -24.562339782714844, "p10": -3.917700958251953, "median": 8.082538604736328, "p90": 26.802063751220707, "max": 47.908843994140625, "pos_frac": 0.734375, "sample": [7.354339599609375, 6.30462646484375, 7.9610748291015625, 24.566226959228516, 5.717643737792969, -1.4480819702148438, 12.00201416015625, -1.6150588989257812, 8.7705078125, -0.3485679626464844, 19.482929229736328, 21.983436584472656, 8.45950698852539, 16.0726318359375, -1.5538749694824219, -2.3724517822265625, -4.1333770751953125, -0.08461952209472656, 1.1447410583496094, 3.0919265747070312, -24.562339782714844, -1.7206649780273438, 7.624198913574219, 15.663681030273438, 39.814022064208984, -3.531005859375, 8.075492858886719, 18.516380310058594, 16.71092987060547, 19.183578491210938, 8.089584350585938, -12.635414123535156, 30.420249938964844, 10.379638671875, 6.0392913818359375, 14.597694396972656, 13.003036499023438, 20.131072998046875, 8.633049011230469, 2.6024131774902344, 0.6436691284179688, -12.665786743164062, -9.624963760375977, 8.518302917480469, 8.191619873046875, 17.098434448242188, 8.048988342285156, -1.2284393310546875, 26.158775329589844, 17.261947631835938, 0.340667724609375, 2.7465667724609375, 14.578176498413086, 30.224014282226562, -8.32440185546875, 28.667015075683594, 27.0777587890625, 5.096260070800781, 19.996826171875, -4.083427429199219, 47.908843994140625, 24.140716552734375, -3.400238037109375, 28.118812561035156], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000514.npy"}
{"epoch": 0.7770219198790628, "step": 515, "batch_size": 64, "mean": 8.115760803222656, "std": 13.31826400756836, "min": -20.462753295898438, "p10": -9.635838317871093, "median": 7.623760223388672, "p90": 27.24854431152344, "max": 41.58283996582031, "pos_frac": 0.765625, "sample": [4.9478759765625, 0.9123268127441406, 1.95550537109375, 6.406154632568359, 27.335235595703125, 24.622230529785156, 0.8671035766601562, 14.440196990966797, 10.711257934570312, 17.3984375, 10.275360107421875, 28.015090942382812, 13.100500106811523, -5.588193893432617, 11.209304809570312, -9.94720458984375, -6.247802734375, 10.70245361328125, -13.893280029296875, -11.500411987304688, 1.127777099609375, 0.6886444091796875, 0.0369110107421875, 8.964187622070312, -8.909317016601562, 10.477640151977539, 38.80708312988281, 18.48505401611328, 8.2476806640625, 1.9739456176757812, 3.4069595336914062, 5.696308135986328, -0.8497467041015625, 0.4641838073730469, 22.261539459228516, 27.40019989013672, -11.651390075683594, 19.30219268798828, -3.3524093627929688, 6.9605865478515625, -10.534706115722656, 9.359657287597656, 26.570556640625, -5.0758209228515625, 20.085403442382812, 41.58283996582031, 1.754180908203125, 13.635753631591797, 18.667869567871094, 27.0462646484375, 4.813323974609375, 9.715988159179688, 8.057228088378906, 31.981658935546875, 27.44336700439453, -5.298427581787109, 17.268634796142578, 2.9321441650390625, 15.238685607910156, -14.184394836425781, 10.95401382446289, 7.1902923583984375, -4.593256950378418, -20.462753295898438], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000515.npy"}
{"epoch": 0.7785336356764928, "step": 516, "batch_size": 64, "mean": 7.497915744781494, "std": 11.781123161315918, "min": -25.04665756225586, "p10": -6.314183044433594, "median": 8.103710174560547, "p90": 20.994192314147952, "max": 38.17853546142578, "pos_frac": 0.8125, "sample": [26.057167053222656, 38.17853546142578, 10.85866928100586, -5.114044189453125, -5.2480621337890625, 21.610336303710938, 2.784423828125, 8.325267791748047, 20.304719924926758, 6.609918594360352, 1.5167465209960938, 13.786197662353516, 5.28741455078125, 27.065147399902344, 3.324932098388672, 0.1521778106689453, 19.947311401367188, 8.281635284423828, 8.115966796875, 12.694793701171875, 2.20318603515625, -25.04665756225586, 7.153297424316406, 10.127311706542969, 35.19439697265625, 14.113292694091797, 5.845741271972656, -0.9973335266113281, 21.28968048095703, 11.284961700439453, -19.154296875, -8.74846076965332, 0.5858364105224609, -11.4263916015625, 1.2862091064453125, 9.733512878417969, 14.405876159667969, 5.572168350219727, 5.556024551391602, 4.648700714111328, 16.45732879638672, -20.239532470703125, 10.270881652832031, 16.43071937561035, 18.754486083984375, -6.400604248046875, 9.64166259765625, 10.8154296875, 8.243362426757812, 3.5045318603515625, -6.6683197021484375, 11.659896850585938, 10.08245849609375, 7.464637756347656, 4.321807861328125, 16.824569702148438, 2.6157302856445312, 8.091453552246094, -6.1125335693359375, -2.8759632110595703, 16.092315673828125, 2.021902084350586, 12.409782409667969, 28.294288635253906], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000516.npy"}
{"epoch": 0.780045351473923, "step": 517, "batch_size": 64, "mean": 7.712836742401123, "std": 15.735404014587402, "min": -24.382373809814453, "p10": -7.479921150207519, "median": 5.0511932373046875, "p90": 32.144020652771, "max": 48.381221771240234, "pos_frac": 0.65625, "sample": [21.896514892578125, -14.602546691894531, 2.8087997436523438, 3.785734176635742, -11.194717407226562, 8.911338806152344, 2.6752090454101562, -1.6567268371582031, 24.402587890625, 22.24346160888672, 9.585594177246094, -7.510978698730469, 0.8320083618164062, 5.446968078613281, -0.5647964477539062, -0.4173736572265625, 4.485385894775391, 6.026679992675781, 12.94278335571289, -6.733379364013672, 17.976295471191406, -4.339141845703125, 8.422027587890625, 35.08053207397461, -12.819580078125, 7.103916168212891, 5.720180511474609, 10.101322174072266, -5.31500244140625, 18.123306274414062, 9.21982192993164, 40.688629150390625, 14.0224609375, 3.538959503173828, -5.360710144042969, -23.98330307006836, -1.228057861328125, 38.67344665527344, -1.5465507507324219, -4.459465026855469, 6.285057067871094, 24.389385223388672, 44.44255828857422, 1.634124755859375, 2.8197860717773438, 32.213600158691406, 6.5238037109375, 10.03369140625, -6.991582870483398, -1.5373306274414062, -7.407453536987305, 9.435649871826172, 1.9023971557617188, 48.381221771240234, 38.18126678466797, 31.17449188232422, 6.402778625488281, -6.035728454589844, 4.655418395996094, -0.25138092041015625, 31.98166847229004, -24.382373809814453, 14.639446258544922, -7.8505706787109375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000517.npy"}
{"epoch": 0.781557067271353, "step": 518, "batch_size": 64, "mean": 7.462783336639404, "std": 13.377872467041016, "min": -15.291244506835938, "p10": -7.649910926818848, "median": 4.856035232543945, "p90": 28.529077911376962, "max": 45.024940490722656, "pos_frac": 0.734375, "sample": [7.7009124755859375, -8.46923828125, -1.5297927856445312, 5.701148986816406, -0.51324462890625, 0.267578125, 2.9794692993164062, 14.995948791503906, -11.381454467773438, 12.080192565917969, 5.479026794433594, -6.067649841308594, 5.957008361816406, 21.545272827148438, 20.87149429321289, -7.146881103515625, 12.118148803710938, 22.13774299621582, -3.616687774658203, 11.14645004272461, 29.3662109375, -15.291244506835938, 0.15312957763671875, 15.296142578125, 26.575767517089844, 9.255424499511719, -13.667015075683594, 6.3801116943359375, -13.41400146484375, 23.726882934570312, 0.43236255645751953, -2.5125198364257812, -1.9046688079833984, 7.990135192871094, 29.87115478515625, -5.195137023925781, 9.22015380859375, 0.9419097900390625, 16.78564453125, 45.024940490722656, 1.4443511962890625, 4.7188720703125, 3.0199642181396484, 38.145042419433594, 15.616836547851562, 32.765899658203125, -7.008598327636719, 29.520381927490234, 2.43096923828125, -9.929672241210938, -7.623601913452148, 3.5601348876953125, 14.149658203125, 1.0738773345947266, 0.2731590270996094, 22.490036010742188, 13.12594985961914, 4.993198394775391, 6.559680938720703, -7.661186218261719, 4.336780548095703, 1.3889427185058594, 33.02406311035156, 3.9125595092773438], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000518.npy"}
{"epoch": 0.783068783068783, "step": 519, "batch_size": 64, "mean": 4.838583946228027, "std": 15.57785415649414, "min": -38.76539611816406, "p10": -12.39114646911621, "median": 2.4977149963378906, "p90": 25.695719146728518, "max": 38.342430114746094, "pos_frac": 0.609375, "sample": [-12.083606719970703, 11.917045593261719, -19.474639892578125, -23.564987182617188, -2.7860755920410156, 13.237041473388672, 1.4715194702148438, 38.342430114746094, -4.63214111328125, 8.615692138671875, 0.3591766357421875, 9.599983215332031, 20.1058406829834, 1.8906021118164062, 2.50836181640625, 5.857631683349609, -1.1935501098632812, -1.5597000122070312, -9.991836547851562, -12.52294921875, 23.538341522216797, 25.25090789794922, -8.113174438476562, 14.893218994140625, 22.34332275390625, 33.77226257324219, -8.214134216308594, 11.745948791503906, 15.013534545898438, 12.8812255859375, -6.116785049438477, 24.169601440429688, 2.4870681762695312, 9.925277709960938, 5.031551361083984, -1.5212554931640625, -1.16827392578125, 1.0491943359375, 25.20722198486328, -17.00965118408203, -0.31656646728515625, 8.74777603149414, 25.8863525390625, 2.4142684936523438, 12.405319213867188, -38.76539611816406, 4.868873596191406, -13.068923950195312, 15.336204528808594, -26.798908233642578, 27.962345123291016, -7.1476898193359375, -8.502037048339844, 31.217041015625, 1.4899444580078125, 7.289337158203125, 21.579513549804688, 30.435802459716797, -7.309562683105469, -8.83049201965332, 7.738622665405273, -5.9741363525390625, 26.031761169433594, -8.281312942504883], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000519.npy"}
{"epoch": 0.7845804988662132, "step": 520, "batch_size": 64, "mean": 7.682526588439941, "std": 10.729304313659668, "min": -13.943611145019531, "p10": -2.4349365234375, "median": 4.505897521972656, "p90": 22.80618934631348, "max": 37.218868255615234, "pos_frac": 0.796875, "sample": [12.708171844482422, -2.5924205780029297, -0.14563941955566406, 9.720108032226562, 15.295783996582031, 21.59326171875, 12.899772644042969, 0.46363067626953125, 1.8368339538574219, 8.549346923828125, 13.731590270996094, 1.8392791748046875, 0.465728759765625, 1.8498344421386719, 8.733108520507812, 4.389778137207031, 2.4603500366210938, -13.943611145019531, 6.783500671386719, 25.443355560302734, 15.754379272460938, 19.246234893798828, 1.7314414978027344, 37.218868255615234, 8.725067138671875, 7.579078674316406, 33.24418640136719, 4.0001373291015625, 1.2289810180664062, 18.504379272460938, 4.3697662353515625, 2.4704551696777344, 3.50341796875, -0.5192108154296875, 6.504280090332031, 1.924947738647461, 33.507568359375, 12.4173583984375, 4.180774688720703, -0.3441009521484375, -8.722894668579102, -2.4478912353515625, 6.106971740722656, -9.611297607421875, 3.5389022827148438, 2.0637550354003906, 4.494361877441406, 4.517433166503906, 16.52900505065918, 11.2855224609375, -2.4047088623046875, 29.464004516601562, 21.125503540039062, 7.701438903808594, 26.952255249023438, 11.945571899414062, 0.579803466796875, 7.622993469238281, -1.555938720703125, -6.149627685546875, -2.2375259399414062, 10.211397171020508, 23.32601547241211, -9.983100891113281], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000520.npy"}
{"epoch": 0.7860922146636432, "step": 521, "batch_size": 64, "mean": 7.660675048828125, "std": 12.793606758117676, "min": -27.094459533691406, "p10": -3.6688211441040037, "median": 3.935708999633789, "p90": 24.405076599121095, "max": 43.991294860839844, "pos_frac": 0.671875, "sample": [9.221599578857422, 3.689300537109375, 25.084640502929688, -1.22308349609375, 21.076499938964844, -2.675048828125, 23.872695922851562, 1.9138870239257812, -3.9261398315429688, 16.74761199951172, 12.157936096191406, -4.3307647705078125, 9.24005126953125, 2.698394775390625, 21.776508331298828, 15.461395263671875, 4.983528137207031, 9.342239379882812, -9.411453247070312, 33.408843994140625, -0.2349700927734375, 26.850128173828125, -1.426727294921875, -2.2910995483398438, 11.32603645324707, 17.411109924316406, -3.7271270751953125, 21.766361236572266, 3.6796646118164062, 2.0460281372070312, 43.991294860839844, 35.8544921875, -22.502029418945312, 8.542577743530273, -0.46421051025390625, -1.3809661865234375, 2.65411376953125, 3.0354537963867188, -0.2710838317871094, 20.473434448242188, 24.63323974609375, 0.6281394958496094, -3.532773971557617, 2.3489227294921875, -2.2827091217041016, -5.171152114868164, 13.896129608154297, 15.255447387695312, 3.5770835876464844, 25.98354721069336, -3.366565704345703, 7.15289306640625, -27.094459533691406, 22.4539794921875, -2.275482177734375, 4.182117462158203, 8.616668701171875, -2.3146209716796875, 14.004638671875, -1.02532958984375, 21.693092346191406, 8.959480285644531, 1.4815826416015625, 8.0382080078125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000521.npy"}
{"epoch": 0.7876039304610734, "step": 522, "batch_size": 64, "mean": 6.073455810546875, "std": 13.86331844329834, "min": -31.205039978027344, "p10": -11.834510803222654, "median": 4.554513931274414, "p90": 23.945852661132815, "max": 43.097389221191406, "pos_frac": 0.671875, "sample": [5.01202392578125, 11.086864471435547, 4.674968719482422, 11.59939193725586, 6.447242736816406, 22.38232421875, 7.1353759765625, 23.076828002929688, 29.556472778320312, 4.434059143066406, -14.678581237792969, -12.982755661010742, 7.118507385253906, 43.097389221191406, 0.6833572387695312, 10.19903564453125, -12.3876953125, 10.764699935913086, 7.341419219970703, -1.2917251586914062, 2.2265777587890625, 0.6639900207519531, -2.3494873046875, -31.205039978027344, 12.93511962890625, -4.65673828125, -2.8811607360839844, 2.8067474365234375, -14.315948486328125, 1.2728195190429688, -2.202301025390625, 19.494972229003906, -8.7928466796875, 23.797256469726562, 15.347625732421875, -6.8011932373046875, -4.601020812988281, 6.294624328613281, -2.907703399658203, 6.242576599121094, 13.641342163085938, 0.24904632568359375, -0.6109771728515625, 1.4757461547851562, 9.242202758789062, -1.466176986694336, 24.009536743164062, -10.543746948242188, 28.90398406982422, 2.8294639587402344, 20.621612548828125, -16.57452392578125, 7.355125427246094, 31.58160400390625, 4.406669616699219, -0.8112106323242188, -17.317733764648438, 27.615318298339844, 19.359310150146484, 31.500839233398438, 0.8421211242675781, -0.40235137939453125, 16.043930053710938, 23.111984252929688], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000522.npy"}
{"epoch": 0.7891156462585034, "step": 523, "batch_size": 64, "mean": 8.297507286071777, "std": 11.87272834777832, "min": -23.539962768554688, "p10": -6.121337127685544, "median": 8.154130935668945, "p90": 23.073274993896487, "max": 32.64909362792969, "pos_frac": 0.78125, "sample": [-1.92681884765625, 26.921527862548828, 9.528282165527344, 2.572601318359375, 17.328903198242188, 22.88385009765625, 3.1143112182617188, 18.489227294921875, 18.725006103515625, 5.11419677734375, 2.3525257110595703, 14.692825317382812, 32.1629638671875, 3.824493408203125, 6.877685546875, 17.09759521484375, 11.428802490234375, -2.9190502166748047, 11.753143310546875, -2.6360855102539062, 6.741844177246094, 14.813720703125, -2.4788742065429688, 5.562095642089844, -11.978408813476562, 7.716739654541016, -0.13425445556640625, -2.0003509521484375, 4.65576171875, 10.559139251708984, 15.6390380859375, 17.79914093017578, 24.77294921875, 9.814641952514648, 23.154457092285156, 32.64909362792969, 31.61595916748047, 0.39990234375, 17.8656005859375, -16.65802001953125, 6.901031494140625, -8.839950561523438, 27.012550354003906, 5.3108062744140625, 14.272705078125, 3.4527511596679688, 0.08613204956054688, 8.591522216796875, 12.138580322265625, -18.163230895996094, 15.421295166015625, 4.698986053466797, 15.174629211425781, -3.4276199340820312, 10.600349426269531, 21.494949340820312, 4.5073394775390625, 10.191558837890625, 11.891647338867188, -23.539962768554688, -10.4974365234375, 5.829135894775391, -7.275787353515625, 17.312313079833984], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000523.npy"}
{"epoch": 0.7906273620559335, "step": 524, "batch_size": 64, "mean": 6.214181900024414, "std": 11.487544059753418, "min": -26.99494171142578, "p10": -5.765383911132812, "median": 7.331886291503906, "p90": 19.512768554687504, "max": 35.941062927246094, "pos_frac": 0.671875, "sample": [8.8958740234375, 8.501419067382812, 23.58538055419922, 3.8330917358398438, -2.1651763916015625, 0.3600921630859375, -6.00701904296875, 3.4366912841796875, 19.798248291015625, 2.990192413330078, 10.608062744140625, -4.376644134521484, -13.136911392211914, 1.7173233032226562, -5.201568603515625, 23.981826782226562, -21.160194396972656, -7.64044189453125, 13.250328063964844, 18.846649169921875, 11.883697509765625, -26.99494171142578, 6.7425079345703125, 5.937349319458008, -0.4542083740234375, 3.1511898040771484, 26.127914428710938, 17.696884155273438, 13.272424697875977, -4.471160888671875, -1.4209365844726562, 10.34619140625, -3.2200851440429688, 7.993560791015625, 7.9212646484375, 8.583648681640625, 16.39617919921875, 9.878944396972656, -1.445831298828125, -2.4141769409179688, -4.149200439453125, -1.965972900390625, -3.2432403564453125, 16.551773071289062, 10.586227416992188, 24.557785034179688, 14.95672607421875, -3.7715225219726562, 8.7161865234375, 13.81390380859375, 5.653877258300781, 10.777717590332031, 11.47153091430664, 18.73242950439453, 6.298065185546875, 11.380819320678711, 26.20208740234375, 11.530632019042969, -12.13232421875, -1.5370216369628906, 35.941062927246094, -6.7347412109375, 17.777511596679688, 0.6656646728515625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000524.npy"}
{"epoch": 0.7921390778533636, "step": 525, "batch_size": 64, "mean": 5.876955032348633, "std": 12.93425178527832, "min": -15.339469909667969, "p10": -9.451907730102537, "median": 4.007102966308594, "p90": 23.45762100219727, "max": 43.297515869140625, "pos_frac": 0.640625, "sample": [3.2176361083984375, -8.132125854492188, -1.6184921264648438, -4.323802947998047, -13.604110717773438, 7.911865234375, 4.105110168457031, 8.964553833007812, 29.662063598632812, -0.46855926513671875, 8.448493957519531, -2.8781204223632812, 3.4112186431884766, 6.046863555908203, 5.3871612548828125, 17.083038330078125, 19.35832405090332, -12.843666076660156, 36.692962646484375, -0.9277420043945312, 3.9090957641601562, -5.573474884033203, 14.008773803710938, 29.259552001953125, -5.867820739746094, 7.974237442016602, 5.448982238769531, -1.4536094665527344, -0.4346160888671875, 2.7403297424316406, -10.505237579345703, 15.406478881835938, 2.1971702575683594, 15.783092498779297, 10.438796997070312, -1.5579833984375, 6.095478057861328, 4.6972808837890625, 43.297515869140625, -14.69698715209961, -6.808052062988281, 3.6623153686523438, 9.887590408325195, -15.339469909667969, -5.07208251953125, 20.280487060546875, 1.4589862823486328, 1.535888671875, 24.18560028076172, 41.20420837402344, 7.8708038330078125, 21.759002685546875, 7.541015625, 0.5672760009765625, 8.321723937988281, 5.422698974609375, -4.1592254638671875, 11.051177978515625, -1.2572479248046875, 8.674331665039062, -10.017528533935547, 30.420166015625, -1.3491935729980469, -10.37506103515625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000525.npy"}
{"epoch": 0.7936507936507936, "step": 526, "batch_size": 64, "mean": 6.332164764404297, "std": 11.400925636291504, "min": -22.809326171875, "p10": -6.564289474487303, "median": 5.200655937194824, "p90": 20.707296752929693, "max": 36.32310485839844, "pos_frac": 0.734375, "sample": [36.32310485839844, 5.821739196777344, 4.210319519042969, 4.5680999755859375, -4.818126678466797, 16.08479118347168, 5.985805511474609, 26.380889892578125, -11.435836791992188, -3.1391754150390625, 8.92022705078125, 11.581722259521484, -4.787311553955078, 3.1086273193359375, 15.297348022460938, 0.6771621704101562, 5.388418197631836, 28.417251586914062, 8.792335510253906, 18.069854736328125, 5.0128936767578125, 19.47887420654297, 5.6436614990234375, 22.016708374023438, 10.222328186035156, 5.6376190185546875, 19.445022583007812, 3.6617813110351562, -8.510189056396484, 8.48046875, 0.9229335784912109, -2.4827117919921875, 3.3575592041015625, -4.196800231933594, 13.746456146240234, -2.9674224853515625, 3.025737762451172, -0.707916259765625, 13.865715026855469, -8.295333862304688, 19.712112426757812, 7.901264190673828, 2.228485107421875, -3.8121566772460938, 9.462448120117188, -17.655399322509766, -7.312644958496094, -1.2725601196289062, 29.27025604248047, -3.1939697265625, 12.450607299804688, 0.49338531494140625, 21.133804321289062, 2.3818893432617188, 33.426326751708984, 6.572807312011719, -7.676332473754883, 11.202743530273438, 4.73736572265625, 9.715248107910156, 1.6991119384765625, 2.5160293579101562, 11.282394409179688, -22.809326171875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000526.npy"}
{"epoch": 0.7951625094482238, "step": 527, "batch_size": 64, "mean": 5.349227428436279, "std": 11.839600563049316, "min": -25.554214477539062, "p10": -8.10306167602539, "median": 4.219547271728516, "p90": 21.623256492614747, "max": 33.192142486572266, "pos_frac": 0.65625, "sample": [-0.13070297241210938, 2.446186065673828, 9.342338562011719, 15.263435363769531, 1.6917572021484375, 3.9545135498046875, 23.69903564453125, 4.975589752197266, 22.505722045898438, 23.922271728515625, 18.981006622314453, 2.709808349609375, 13.123764038085938, -9.147056579589844, 10.249191284179688, 6.275367736816406, 20.143165588378906, -0.8116111755371094, 10.021049499511719, -6.670249938964844, 4.618858337402344, -18.165603637695312, -18.30963897705078, -1.2180328369140625, 0.1541290283203125, 13.81640625, 30.035308837890625, -3.833709716796875, 9.477607727050781, 9.466205596923828, -1.8943939208984375, 4.484580993652344, -8.106597900390625, 7.480556488037109, 13.979564666748047, 33.192142486572266, 25.255874633789062, -3.1428680419921875, -1.4916419982910156, 14.56942367553711, 12.44607162475586, 1.4575080871582031, 21.882537841796875, -3.8748931884765625, -7.851806640625, -5.138822555541992, -6.577108383178711, 21.018266677856445, -8.094810485839844, 2.8872528076171875, 5.21484375, 8.94295883178711, 8.7213134765625, 0.2581520080566406, -0.80596923828125, -0.18793487548828125, 2.4223403930664062, 15.710762023925781, 3.8739166259765625, 18.165283203125, -25.554214477539062, -12.121322631835938, 15.402664184570312, -8.759185791015625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000527.npy"}
{"epoch": 0.7966742252456538, "step": 528, "batch_size": 64, "mean": 5.947617053985596, "std": 12.763665199279785, "min": -26.046791076660156, "p10": -10.42483139038086, "median": 5.516212463378906, "p90": 22.397232818603516, "max": 37.630409240722656, "pos_frac": 0.703125, "sample": [15.446098327636719, -17.431793212890625, 27.299697875976562, -1.5849075317382812, 15.136985778808594, 9.755058288574219, 15.113059997558594, -6.8814849853515625, 4.159580230712891, 8.83676528930664, -0.6992950439453125, -10.612022399902344, 15.778465270996094, -26.046791076660156, 1.9899368286132812, -8.113224029541016, 23.293853759765625, 4.683986663818359, 3.2499122619628906, 1.22552490234375, 3.2148513793945312, 30.201095581054688, 1.2206974029541016, -14.36395263671875, 0.8363380432128906, 10.657852172851562, 2.2138519287109375, -9.988052368164062, -2.499897003173828, 22.523330688476562, 8.03818130493164, 8.693595886230469, 17.482620239257812, 30.891502380371094, 10.690017700195312, -9.467849731445312, 37.630409240722656, 7.918830871582031, 25.004302978515625, 6.200828552246094, 15.925529479980469, 14.785125732421875, 1.1215019226074219, 22.103004455566406, -15.771759033203125, 5.545013427734375, -0.060642242431640625, 1.9134750366210938, 18.932870864868164, -3.1086082458496094, 5.4874114990234375, -18.46021270751953, -11.04666519165039, 10.14544677734375, 11.7410888671875, 11.038204193115234, -5.8270111083984375, -0.38744354248046875, 15.661697387695312, -4.260490417480469, 3.893064498901367, 20.519412994384766, 13.012344360351562, 6.0471649169921875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000528.npy"}
{"epoch": 0.7981859410430839, "step": 529, "batch_size": 64, "mean": 12.104849815368652, "std": 13.649789810180664, "min": -15.302810668945312, "p10": -3.3485435485839834, "median": 8.9483642578125, "p90": 31.218559646606447, "max": 42.12769317626953, "pos_frac": 0.828125, "sample": [27.30392074584961, -7.395301818847656, 13.727851867675781, 4.992576599121094, 17.7637939453125, -2.4169769287109375, 37.49870300292969, 31.03945541381836, 7.186393737792969, 5.3667144775390625, -6.396339416503906, 0.6205863952636719, -3.70330810546875, 9.187393188476562, 14.017852783203125, 6.189735412597656, -15.302810668945312, 31.295318603515625, -1.3335838317871094, 27.23114776611328, 23.483909606933594, 6.110443115234375, 4.0361175537109375, -2.5207595825195312, 42.12769317626953, 25.442337036132812, 0.449127197265625, 8.509689331054688, 0.740447998046875, 0.6806240081787109, 17.3150634765625, 17.583969116210938, 12.837486267089844, 37.23033142089844, 22.6190185546875, 30.69969940185547, 23.066726684570312, 13.9227294921875, 8.224517822265625, 13.715965270996094, 10.9293212890625, 25.586814880371094, 32.310394287109375, 1.8009719848632812, 10.823005676269531, 17.044281005859375, 2.8590316772460938, 1.6132879257202148, 1.630767822265625, 24.12085723876953, 5.29620361328125, -5.126861572265625, 6.6025848388671875, 29.600692749023438, 0.7294807434082031, 37.08317565917969, 26.73914337158203, 11.374198913574219, -0.22441864013671875, 8.709335327148438, -13.085884094238281, 5.828102111816406, -4.754554748535156, 34.07221984863281], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000529.npy"}
{"epoch": 0.799697656840514, "step": 530, "batch_size": 64, "mean": 8.191718101501465, "std": 12.726482391357422, "min": -13.006149291992188, "p10": -7.804086303710937, "median": 5.556285858154297, "p90": 25.47969665527344, "max": 44.2123908996582, "pos_frac": 0.75, "sample": [35.898956298828125, 11.402839660644531, 13.798912048339844, -3.88427734375, 27.89165496826172, 3.060791015625, 29.676071166992188, 1.5092239379882812, 33.61973571777344, 15.005912780761719, -0.25496673583984375, 20.94415283203125, -8.226341247558594, 2.0507545471191406, 17.71308135986328, 15.022052764892578, 10.485931396484375, 1.28125, 44.2123908996582, 4.837615966796875, -4.452594757080078, -13.006149291992188, -9.0498046875, 12.45676040649414, 16.8896484375, 0.9487972259521484, 19.604812622070312, 16.21153450012207, 0.15174102783203125, 34.003639221191406, 2.8757553100585938, -7.791450500488281, 14.813247680664062, 24.99261474609375, 2.0740966796875, 0.4431724548339844, 15.449787139892578, 21.4661808013916, -7.809501647949219, 5.256965637207031, 3.3266448974609375, 9.45745849609375, -1.3694000244140625, 10.914299011230469, -2.7292633056640625, -11.021186828613281, 17.599647521972656, -2.716827392578125, -11.00655746459961, 0.311981201171875, -10.21841812133789, -1.7113113403320312, 19.579795837402344, 3.098125457763672, 25.688446044921875, 5.955780029296875, 11.75262451171875, -4.1495361328125, 0.62713623046875, 1.2128067016601562, 19.941986083984375, 5.982330322265625, 6.3127593994140625, 5.8556060791015625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000530.npy"}
{"epoch": 0.8012093726379441, "step": 531, "batch_size": 64, "mean": 6.534139633178711, "std": 11.407341957092285, "min": -21.831478118896484, "p10": -5.771330261230468, "median": 5.566732406616211, "p90": 21.8808090209961, "max": 37.23915100097656, "pos_frac": 0.703125, "sample": [-1.8823738098144531, 7.503208160400391, 16.80054473876953, -0.6994400024414062, 5.652095794677734, 8.859062194824219, -3.8944091796875, 11.27203369140625, 8.558586120605469, 4.780586242675781, 7.7874298095703125, 1.9538707733154297, 20.699676513671875, -10.390518188476562, 37.23915100097656, 17.121387481689453, -8.678497314453125, -3.4884109497070312, -6.614448547363281, 22.914901733398438, 4.3123626708984375, 12.578018188476562, -5.8775634765625, 3.750030517578125, -1.2134017944335938, 5.4813690185546875, 7.838859558105469, -1.7171478271484375, 7.9467010498046875, -3.44134521484375, 3.380462646484375, 3.048227310180664, -5.059638977050781, 0.8885650634765625, -18.791542053222656, -1.7835044860839844, 5.9697723388671875, 5.236904144287109, 11.933536529541016, -21.831478118896484, 6.556884765625, 10.603324890136719, 17.704208374023438, 17.089340209960938, -5.5234527587890625, -10.427734375, 29.484020233154297, 28.06287956237793, 14.415267944335938, 5.6880340576171875, 2.9810638427734375, 13.403457641601562, 4.6264495849609375, 20.136734008789062, 10.684722900390625, -1.1744003295898438, 22.387008666992188, 2.67913818359375, 7.810630798339844, 29.99676513671875, 8.912969589233398, 4.803760528564453, 29.16412353515625, -0.023883819580078125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000531.npy"}
{"epoch": 0.8027210884353742, "step": 532, "batch_size": 64, "mean": 8.024169921875, "std": 13.233915328979492, "min": -15.990447998046875, "p10": -6.4419969558715815, "median": 5.550403594970703, "p90": 26.04552917480469, "max": 44.010765075683594, "pos_frac": 0.71875, "sample": [10.019218444824219, 14.845340728759766, 18.498703002929688, 30.309234619140625, 5.1956787109375, -3.082836151123047, 3.8097991943359375, 44.010765075683594, 5.026390075683594, 13.861679077148438, -3.2749481201171875, 18.266632080078125, 20.711563110351562, 6.909126281738281, 12.362327575683594, 5.249755859375, 3.082733154296875, 18.71255111694336, 5.684165954589844, -11.982032775878906, -3.9545326232910156, -2.5931434631347656, 13.889183044433594, 8.134590148925781, 0.78466796875, 0.8529281616210938, 8.102428436279297, 17.319053649902344, 20.947250366210938, 19.87041473388672, 17.9998779296875, -1.0855560302734375, -3.4962158203125, 40.442893981933594, 8.174636840820312, 3.97723388671875, 26.469619750976562, -6.774929046630859, -0.0988006591796875, 4.704978942871094, -6.561130523681641, -15.990447998046875, -6.164018630981445, -1.2339439392089844, 29.318782806396484, 9.57550048828125, 32.75625991821289, 6.184669494628906, 7.337440490722656, 13.309776306152344, 9.128349304199219, 2.5736007690429688, 0.28924560546875, 40.29294204711914, -13.050041198730469, 10.821407318115234, -10.983489990234375, -1.987060546875, 25.055984497070312, -14.8333740234375, 4.920360565185547, -6.147304534912109, 1.634246826171875, 5.4166412353515625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000532.npy"}
{"epoch": 0.8042328042328042, "step": 533, "batch_size": 64, "mean": 8.410598754882812, "std": 13.606033325195312, "min": -20.682334899902344, "p10": -7.021802139282226, "median": 8.268041610717773, "p90": 30.71908187866211, "max": 33.905479431152344, "pos_frac": 0.671875, "sample": [10.322731018066406, 20.925479888916016, -3.417724609375, 13.258747100830078, 7.085853576660156, 13.624382019042969, -5.93548583984375, -2.319110870361328, 16.566429138183594, 18.185577392578125, -8.70928955078125, -5.9835357666015625, 10.257366180419922, -2.7922096252441406, 21.284082412719727, 19.506591796875, 7.950920104980469, 20.978530883789062, -9.107177734375, -0.13350677490234375, 0.7287445068359375, 5.6881866455078125, 11.646041870117188, 2.677398681640625, -0.7214202880859375, -6.312015533447266, 16.109350204467773, -5.292236328125, 6.413169860839844, -15.357036590576172, -4.0384674072265625, 8.910564422607422, 32.60621643066406, 30.903480529785156, -5.603099822998047, 9.01364517211914, -5.003204345703125, 33.64039611816406, 2.8888092041015625, -4.857418060302734, 11.804885864257812, 18.823394775390625, 12.770362854003906, 23.809791564941406, 31.101333618164062, 33.905479431152344, 2.7535247802734375, -12.816841125488281, 8.827789306640625, -7.885047912597656, -3.986034393310547, 30.996482849121094, 8.585163116455078, 1.7998428344726562, 4.08204460144043, -7.325996398925781, 11.510414123535156, 13.704742431640625, 6.672138214111328, 33.64483642578125, 30.288818359375, 29.50189208984375, -20.682334899902344, 20.80188751220703], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000533.npy"}
{"epoch": 0.8057445200302343, "step": 534, "batch_size": 64, "mean": 7.371869087219238, "std": 11.84791374206543, "min": -17.710617065429688, "p10": -8.619091796874999, "median": 7.692529678344727, "p90": 21.34693145751953, "max": 33.9129638671875, "pos_frac": 0.71875, "sample": [6.8255615234375, -7.808986663818359, 13.58148193359375, 15.227386474609375, 1.240753173828125, 12.692459106445312, 5.0056304931640625, -8.212738037109375, 15.520572662353516, -6.496849060058594, -4.168513298034668, -3.0172195434570312, 12.244071960449219, 21.386322021484375, -5.122226715087891, 31.080841064453125, 5.926475524902344, 10.88809585571289, 12.400970458984375, -7.5100555419921875, 33.9129638671875, 19.993492126464844, 14.521831512451172, 13.438377380371094, -5.631172180175781, -12.75537109375, -17.710617065429688, 0.22415542602539062, 18.4791259765625, 14.145488739013672, 2.1884765625, -11.012481689453125, 25.829330444335938, 31.162399291992188, 16.48227882385254, 8.580718994140625, -8.940788269042969, 6.330955505371094, 29.798355102539062, 17.519309997558594, 22.543106079101562, 4.924125671386719, -8.793243408203125, 4.6701812744140625, 12.36296272277832, 10.634449005126953, -3.6715049743652344, 4.764217376708984, 3.5856399536132812, 7.39215087890625, 7.992908477783203, 14.16025161743164, 6.170318603515625, -2.7324066162109375, 8.96722412109375, 12.88531494140625, 16.572036743164062, -8.805816650390625, 8.136898040771484, -4.461982727050781, 21.255020141601562, 3.851318359375, 20.902786254882812, -9.7471923828125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000534.npy"}
{"epoch": 0.8072562358276644, "step": 535, "batch_size": 64, "mean": 7.708707809448242, "std": 14.597306251525879, "min": -35.876766204833984, "p10": -8.104930877685545, "median": 6.728033065795898, "p90": 27.32766952514649, "max": 48.39802551269531, "pos_frac": 0.71875, "sample": [-0.5692481994628906, -9.947105407714844, 39.47080993652344, 26.20893096923828, 1.8226165771484375, 15.2166748046875, -15.289083480834961, 2.805164337158203, -4.685302734375, 34.988189697265625, 3.619476318359375, -0.8167266845703125, 12.661663055419922, 15.055618286132812, -4.97248649597168, 3.83673095703125, -8.778060913085938, 11.524528503417969, -2.149643898010254, -6.534294128417969, 6.1386260986328125, 11.436576843261719, 13.275798797607422, -5.8099822998046875, 22.46126937866211, 1.4897994995117188, 25.09172821044922, -19.67589569091797, -9.316818237304688, 6.979975700378418, 27.80712890625, 30.585716247558594, 13.1014404296875, 5.22113037109375, 5.760885238647461, 6.726680755615234, 8.025360107421875, 31.886978149414062, -11.903297424316406, 14.548990249633789, 7.072273254394531, -3.732759475708008, 48.39802551269531, 1.489593505859375, 16.693809509277344, -35.876766204833984, 10.188262939453125, 17.788124084472656, 2.2846145629882812, 12.432716369628906, 7.619171142578125, -5.810337066650391, 1.0309219360351562, 29.23815155029297, 12.628852844238281, -1.8221664428710938, 9.086959838867188, 16.289962768554688, 23.388168334960938, 0.4776153564453125, -6.310523986816406, 3.02899169921875, 23.743690490722656, 6.7293853759765625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000535.npy"}
{"epoch": 0.8087679516250945, "step": 536, "batch_size": 64, "mean": 9.557329177856445, "std": 10.852989196777344, "min": -7.340450286865234, "p10": -3.288261413574218, "median": 8.0040283203125, "p90": 24.92109832763672, "max": 36.83335876464844, "pos_frac": 0.828125, "sample": [10.20920181274414, -2.5812225341796875, -7.340450286865234, 22.166955947875977, 12.9720458984375, -5.0429534912109375, 1.7935562133789062, -3.591278076171875, 7.044868469238281, 0.07200241088867188, -5.094444274902344, 12.246994018554688, 26.3709716796875, 18.516204833984375, 9.101823806762695, 10.6767578125, 11.7398681640625, 33.74399185180664, 26.876190185546875, 3.9225120544433594, 11.945409774780273, 1.6853294372558594, 16.860931396484375, 23.40509033203125, -1.5117568969726562, 36.83335876464844, 12.804206848144531, 2.7495079040527344, 0.589752197265625, 18.381168365478516, 27.616607666015625, -7.168251037597656, 1.2498359680175781, 5.931888580322266, 5.10247802734375, 11.870361328125, 24.434566497802734, -6.077838897705078, 24.497024536132812, 4.05748176574707, 15.714065551757812, 16.459030151367188, 14.494302749633789, -6.101526260375977, 2.1862525939941406, 1.2473678588867188, 11.470573425292969, 6.808128356933594, 3.692413330078125, 25.10284423828125, 2.1106033325195312, 8.963188171386719, 4.468910217285156, 20.287765502929688, 17.541091918945312, 1.3673973083496094, 1.6650733947753906, 30.887542724609375, -2.2245960235595703, 2.5575408935546875, 12.249458312988281, 19.654212951660156, -0.23410797119140625, 2.24078369140625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000536.npy"}
{"epoch": 0.8102796674225246, "step": 537, "batch_size": 64, "mean": 7.318655014038086, "std": 12.146953582763672, "min": -17.636043548583984, "p10": -9.808131027221679, "median": 7.070400238037109, "p90": 25.429637908935554, "max": 31.86577606201172, "pos_frac": 0.734375, "sample": [15.041923522949219, 7.520622253417969, -1.0578460693359375, 3.708740234375, 26.40667724609375, 17.26767349243164, 26.84455108642578, 0.3002204895019531, 3.0565032958984375, 12.300048828125, 9.73678970336914, 10.682231903076172, 26.35565948486328, 22.932601928710938, 26.65890121459961, 12.704315185546875, 26.947025299072266, 7.823490142822266, 2.4896297454833984, -1.8013677597045898, -10.188955307006836, 6.260284423828125, 6.62017822265625, -5.149463653564453, -7.827136993408203, 15.720306396484375, 6.1307373046875, 10.609916687011719, -13.697196960449219, 0.20107269287109375, 15.023307800292969, 6.617462158203125, 9.885242462158203, -2.79937744140625, -3.3668289184570312, -1.8114395141601562, 21.5496826171875, 10.431671142578125, 6.40582275390625, 0.5815658569335938, 10.00921630859375, 31.86577606201172, -9.5626220703125, 2.8993911743164062, 2.3197364807128906, -5.030834197998047, 12.84088134765625, -12.195610046386719, 1.4352493286132812, 15.259834289550781, 8.319862365722656, 18.44831085205078, -17.636043548583984, 23.2689208984375, 27.954727172851562, 22.969497680664062, -15.215473175048828, 14.841583251953125, -9.913349151611328, 0.9185409545898438, 19.178085327148438, 16.665771484375, -10.52730941772461, -7.835472106933594], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000537.npy"}
{"epoch": 0.8117913832199547, "step": 538, "batch_size": 64, "mean": 9.057718276977539, "std": 12.00281047821045, "min": -12.960662841796875, "p10": -3.9474945068359375, "median": 8.435802459716797, "p90": 23.382646942138674, "max": 44.37156677246094, "pos_frac": 0.765625, "sample": [-10.58359146118164, 9.97490119934082, 1.2515335083007812, 9.688438415527344, 19.123653411865234, -12.960662841796875, 11.451847076416016, -2.3531951904296875, -2.2576904296875, 14.190622329711914, 16.274015426635742, 29.70899772644043, -3.9627838134765625, 14.715225219726562, 15.73678207397461, 11.852088928222656, 26.246675491333008, 23.442405700683594, 9.498291015625, 11.726766586303711, 9.058673858642578, -2.7788829803466797, 15.005996704101562, 18.051422119140625, 5.883228302001953, -2.9493446350097656, 23.01007843017578, 4.9101715087890625, 29.25518035888672, 9.72161865234375, 5.7925262451171875, 8.016197204589844, 1.2593231201171875, 4.156036376953125, 1.8658905029296875, 7.739738464355469, -3.9118194580078125, 8.268447875976562, -2.0860671997070312, 8.603157043457031, 22.201690673828125, 5.58197021484375, 23.243209838867188, 7.0383758544921875, -5.747547149658203, 16.857677459716797, -6.680206298828125, 22.779769897460938, 12.898975372314453, 44.37156677246094, 3.0807952880859375, 12.616928100585938, -12.2249755859375, -2.4986572265625, 6.7230072021484375, -0.4566650390625, 42.80088424682617, 3.191791534423828, 13.887306213378906, -12.750274658203125, 0.7256393432617188, 26.969432830810547, 8.676837921142578, 4.7705078125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000538.npy"}
{"epoch": 0.8133030990173847, "step": 539, "batch_size": 64, "mean": 7.516425132751465, "std": 11.497159957885742, "min": -9.067787170410156, "p10": -5.741968154907227, "median": 4.208251953125, "p90": 24.81894111633301, "max": 36.089393615722656, "pos_frac": 0.703125, "sample": [8.340333938598633, 20.415225982666016, -3.555588722229004, 31.135162353515625, 15.338058471679688, 3.485626220703125, 21.432388305664062, 6.479339599609375, 25.033935546875, -2.925731658935547, 28.7618408203125, 3.704864501953125, -0.477783203125, 7.274566650390625, -4.1936187744140625, 2.4868240356445312, -2.4476547241210938, 2.7531356811523438, 29.10883331298828, 7.922412872314453, 1.3061408996582031, -1.0232715606689453, 11.269744873046875, -7.4147491455078125, -0.9405364990234375, -9.067787170410156, -5.418182373046875, 7.0333709716796875, 33.81854248046875, 9.311019897460938, -1.0274658203125, 3.1852264404296875, -3.73052978515625, -1.7701873779296875, 17.374038696289062, 3.486013412475586, 7.9864044189453125, -2.8664398193359375, 29.19992446899414, 1.6583786010742188, -6.268562316894531, -7.756492614746094, 9.606021881103516, -5.880733489990234, 21.966522216796875, 14.995162963867188, -7.308267593383789, 36.089393615722656, 4.711639404296875, 0.07910919189453125, 16.214458465576172, 17.60480499267578, 10.135627746582031, 0.6519622802734375, 8.109825134277344, 15.227325439453125, 0.4683990478515625, 3.123128890991211, -8.313079833984375, 24.31728744506836, 7.887165069580078, 20.147323608398438, 10.215591430664062, 2.5857467651367188], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000539.npy"}
{"epoch": 0.8148148148148148, "step": 540, "batch_size": 64, "mean": 9.484641075134277, "std": 13.820647239685059, "min": -19.343753814697266, "p10": -7.813182830810546, "median": 8.403142929077148, "p90": 30.023159027099613, "max": 37.824127197265625, "pos_frac": 0.75, "sample": [-8.121498107910156, 13.02313232421875, 23.584716796875, 31.096431732177734, 5.725982666015625, -13.009408950805664, 10.870597839355469, 15.19580078125, 4.481292724609375, 8.01980209350586, 5.047687530517578, 13.9139404296875, 18.519760131835938, 12.822494506835938, -9.479072570800781, 2.8653564453125, 29.492652893066406, 5.11590576171875, -0.4898872375488281, 6.2240753173828125, 24.467002868652344, -4.3946990966796875, 15.594970703125, 16.231491088867188, 37.1942138671875, 11.738243103027344, -4.747406005859375, 9.624435424804688, -15.720184326171875, 1.37945556640625, -15.33203125, 33.80987548828125, 24.753036499023438, 6.703277587890625, 37.824127197265625, 6.273464202880859, -4.648124694824219, 20.474321365356445, 0.5089874267578125, 5.671289443969727, 2.520477294921875, 21.672149658203125, 8.798225402832031, 35.67084503173828, 28.820751190185547, -6.08984375, 20.918594360351562, 7.093727111816406, -3.39410400390625, -19.343753814697266, 8.786483764648438, 24.32147216796875, -9.047210693359375, 0.8747787475585938, -0.5202980041503906, 12.443862915039062, -7.093780517578125, 30.250518798828125, -1.641082763671875, 8.858901977539062, 5.2325286865234375, 8.931207656860352, 15.457473754882812, 31.189590454101562], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000540.npy"}
{"epoch": 0.8163265306122449, "step": 541, "batch_size": 64, "mean": 9.099142074584961, "std": 12.718914031982422, "min": -21.36328887939453, "p10": -9.959119415283203, "median": 9.655932426452637, "p90": 25.80594139099121, "max": 33.636749267578125, "pos_frac": 0.75, "sample": [15.994087219238281, 20.7982177734375, 12.072093963623047, 32.513526916503906, -1.2825698852539062, 25.727378845214844, -3.8701553344726562, 21.114654541015625, 16.172683715820312, -9.071990966796875, 8.972976684570312, -13.23040771484375, 6.34320068359375, 4.537343978881836, 4.526268005371094, -10.420127868652344, 13.274551391601562, 16.03167724609375, 4.776802062988281, 30.239599227905273, -1.1678695678710938, 0.1395721435546875, 29.074695587158203, 16.408157348632812, 24.12982177734375, 9.674680709838867, 18.548492431640625, -6.30035400390625, -11.497554779052734, -0.4358673095703125, 13.900157928466797, 13.100839614868164, 30.395418167114258, 18.608978271484375, -10.339317321777344, 25.839611053466797, 3.0136280059814453, -21.36328887939453, 8.836166381835938, 12.085594177246094, 3.7195186614990234, 1.012603759765625, 15.504409790039062, 17.154460906982422, 5.482204437255859, 2.7378311157226562, 19.07568359375, 10.635948181152344, -10.663352966308594, 9.637184143066406, 0.28964900970458984, 10.738794326782227, 7.836971282958984, 3.4184188842773438, -5.289146423339844, -1.4434814453125, 17.406230926513672, 10.629413604736328, -1.674306869506836, 23.22876739501953, 30.272321701049805, -10.441840171813965, 33.636749267578125, 21.56866455078125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000541.npy"}
{"epoch": 0.817838246409675, "step": 542, "batch_size": 64, "mean": 10.091310501098633, "std": 14.259110450744629, "min": -23.195663452148438, "p10": -5.612602996826172, "median": 7.164068222045898, "p90": 30.3084587097168, "max": 37.45446014404297, "pos_frac": 0.75, "sample": [27.022972106933594, 16.506317138671875, 7.3623046875, 28.614608764648438, 1.4611434936523438, 3.5933456420898438, 5.5546875, 19.85501480102539, 2.6271438598632812, 3.0384902954101562, 9.285980224609375, -6.158210754394531, -0.2686004638671875, 17.186065673828125, 2.601837158203125, 14.767066955566406, -0.26212310791015625, 15.555191040039062, -4.013145446777344, 36.506893157958984, 33.811885833740234, 29.671432495117188, 1.5036611557006836, 1.1210670471191406, -13.629016876220703, -23.195663452148438, 1.4985237121582031, -2.8222198486328125, 14.36836051940918, -12.402717590332031, 25.014965057373047, 28.592304229736328, 9.965435028076172, 14.917903900146484, -5.710548400878906, -1.4813232421875, -5.384063720703125, 12.837112426757812, -9.383003234863281, 30.82232666015625, 4.850517272949219, 4.090229034423828, 30.419418334960938, -3.5929031372070312, 20.734359741210938, 0.5611648559570312, 33.86228942871094, 14.949630737304688, 2.698741912841797, 30.04955291748047, 20.63470458984375, 37.45446014404297, -11.732688903808594, -0.6943092346191406, 10.369621276855469, -0.8665390014648438, 2.6236648559570312, 34.85627365112305, 6.965831756591797, 8.145484924316406, 3.7063026428222656, 29.938446044921875, 7.665946960449219, 27.200225830078125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000542.npy"}
{"epoch": 0.8193499622071051, "step": 543, "batch_size": 64, "mean": 9.297382354736328, "std": 13.409905433654785, "min": -27.100181579589844, "p10": -1.9059265136718746, "median": 6.140871047973633, "p90": 30.264068603515625, "max": 36.62336349487305, "pos_frac": 0.78125, "sample": [7.673431396484375, 3.217620849609375, 3.1931304931640625, 0.5461883544921875, 8.606719970703125, -0.9548072814941406, 19.22222900390625, 32.34437561035156, -11.483871459960938, 20.770751953125, 2.9384613037109375, 10.000076293945312, 14.858821868896484, 12.18304443359375, 29.896583557128906, 3.2862625122070312, 34.40825653076172, -20.77875518798828, 1.9150943756103516, 24.77554702758789, -0.5550479888916016, 0.48960113525390625, 5.10711669921875, 5.8444061279296875, 9.89840316772461, 36.62336349487305, -27.100181579589844, -0.0507354736328125, 0.7765617370605469, -4.438560485839844, 7.331321716308594, 21.023086547851562, 33.153526306152344, 11.872245788574219, 16.856706619262695, -7.1495208740234375, 31.247220993041992, -1.1764068603515625, -9.539608001708984, -1.5877857208251953, 6.4445953369140625, 1.9577903747558594, 29.039337158203125, 20.909271240234375, 6.437335968017578, 20.723419189453125, 30.42156219482422, 25.228729248046875, 2.9140472412109375, 8.664745330810547, 4.778984069824219, 4.060386657714844, 1.5617752075195312, 3.7963943481445312, 16.12396240234375, -0.00439453125, 7.556972503662109, 5.10467529296875, 2.9984130859375, -2.0422725677490234, 24.176170349121094, 35.375038146972656, -0.7042007446289062, 14.264854431152344], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000543.npy"}
{"epoch": 0.8208616780045351, "step": 544, "batch_size": 64, "mean": 10.775285720825195, "std": 14.78672981262207, "min": -14.49066162109375, "p10": -7.5592399597167965, "median": 8.33905029296875, "p90": 30.39546051025391, "max": 43.92280578613281, "pos_frac": 0.703125, "sample": [18.751861572265625, -12.326225280761719, 8.140365600585938, 0.021280288696289062, 17.136367797851562, 12.576881408691406, -7.8289794921875, -2.3188209533691406, 23.246597290039062, 3.0382156372070312, -2.4259185791015625, 4.340599060058594, 11.214324951171875, 24.803932189941406, 12.610443115234375, 3.5260143280029297, -11.398273468017578, 12.79153060913086, 11.647689819335938, 1.179656982421875, 7.143394470214844, 4.505805969238281, 7.333221435546875, 29.295059204101562, 28.015151977539062, -1.030313491821289, 6.162477493286133, -8.916717529296875, -0.8853387832641602, -3.0447845458984375, 33.456298828125, -11.413124084472656, 18.768829345703125, -3.2666015625, 34.64100646972656, 17.313217163085938, 30.181976318359375, -8.6651611328125, 13.273637771606445, 16.08154296875, 8.537734985351562, 20.613296508789062, 42.652793884277344, 2.5983314514160156, -2.1574172973632812, 6.59991455078125, 18.776065826416016, 6.137229919433594, 38.446197509765625, -6.929847717285156, -4.517087936401367, 43.92280578613281, 30.486953735351562, 28.469741821289062, -4.646720886230469, -0.9204559326171875, 16.526809692382812, -14.49066162109375, 29.412303924560547, 38.879722595214844, -4.035581588745117, 20.727767944335938, 17.107189178466797, 19.744049072265625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000544.npy"}
{"epoch": 0.8223733938019653, "step": 545, "batch_size": 64, "mean": 7.4319233894348145, "std": 12.742783546447754, "min": -16.90298843383789, "p10": -6.9857583999633786, "median": 6.235919952392578, "p90": 21.746749877929688, "max": 45.824066162109375, "pos_frac": 0.703125, "sample": [17.07968521118164, 10.814689636230469, -6.67523193359375, 34.20369338989258, 0.6153793334960938, 6.219657897949219, -0.8603668212890625, -9.58547592163086, -3.145732879638672, 6.2521820068359375, 8.27801513671875, -5.866119384765625, 16.310302734375, 0.6320152282714844, 1.9368820190429688, 17.059982299804688, 5.4652557373046875, 10.453826904296875, -0.9656219482421875, 21.784561157226562, 7.7320709228515625, 36.92266845703125, -1.1963882446289062, 15.712112426757812, 20.58099365234375, 21.658523559570312, 7.761161804199219, 9.761730194091797, 12.470512390136719, 3.0674095153808594, -3.153787612915039, -13.650678634643555, -2.8983306884765625, 5.612155914306641, -11.800125122070312, -3.1951675415039062, 20.57263946533203, 41.205623626708984, 3.33941650390625, -5.161262512207031, -1.3616104125976562, 22.937393188476562, 10.407424926757812, 13.282669067382812, 2.1229171752929688, 11.114696502685547, -7.118841171264648, 7.858680725097656, 13.001602172851562, -16.90298843383789, 0.3497314453125, 15.798858642578125, 4.2654571533203125, 45.824066162109375, 18.836959838867188, -10.465171813964844, -1.9613609313964844, 12.671989440917969, -9.844940185546875, 3.740306854248047, 7.5174560546875, 23.34943389892578, 14.062232971191406, 0.807281494140625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000545.npy"}
{"epoch": 0.8238851095993953, "step": 546, "batch_size": 64, "mean": 8.67685317993164, "std": 11.825909614562988, "min": -21.920129776000977, "p10": -6.802738952636719, "median": 8.133975982666016, "p90": 24.798417663574227, "max": 39.84745407104492, "pos_frac": 0.78125, "sample": [28.633712768554688, -8.085540771484375, 22.519935607910156, 4.438720703125, 8.14596939086914, 21.57660484313965, 10.050201416015625, 3.194305419921875, 1.235687255859375, -7.2152252197265625, -1.213623046875, 14.838298797607422, 12.519664764404297, 12.378238677978516, -11.035736083984375, 14.474166870117188, 0.0199737548828125, 6.914758682250977, -6.621490478515625, 0.8091888427734375, -1.793731689453125, 19.90079116821289, 3.0179481506347656, 9.6708984375, 9.797019958496094, 2.80963134765625, 20.914405822753906, -3.0032958984375, 30.437862396240234, 35.423492431640625, 5.820894241333008, 4.7684783935546875, 22.365489959716797, 14.255271911621094, 8.4886474609375, -6.8804168701171875, 7.320281982421875, 15.777679443359375, 5.491374969482422, -1.9107513427734375, 9.294692993164062, 4.086875915527344, -9.624553680419922, 28.002113342285156, 3.9920997619628906, 7.190399169921875, -3.610994338989258, 8.12198257446289, 9.67111587524414, 8.523372650146484, 14.084434509277344, 8.047353744506836, 25.77490997314453, -7.911163330078125, -2.0286102294921875, 0.6431312561035156, 19.726139068603516, 14.465118408203125, 39.84745407104492, 15.424976348876953, 16.8651123046875, -21.920129776000977, 25.842790603637695, 10.560178756713867], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000546.npy"}
{"epoch": 0.8253968253968254, "step": 547, "batch_size": 64, "mean": 9.925826072692871, "std": 12.200201034545898, "min": -15.922447204589844, "p10": -8.232769584655761, "median": 8.892202377319336, "p90": 26.13369064331056, "max": 44.925662994384766, "pos_frac": 0.84375, "sample": [10.607101440429688, -3.4069862365722656, 0.9730148315429688, 3.9937286376953125, 27.5230712890625, 12.297319412231445, -13.546096801757812, 44.925662994384766, 20.14056396484375, 23.598289489746094, 6.7172393798828125, 6.397918701171875, -7.753320693969727, 8.756103515625, 8.30286979675293, 2.2699966430664062, 19.281723022460938, 27.220291137695312, -9.014228820800781, 4.384407043457031, 23.34278106689453, 13.214004516601562, 34.638832092285156, 4.64251708984375, 22.771865844726562, 5.90631103515625, 3.7953720092773438, 15.236007690429688, 13.092086791992188, 8.394229888916016, 5.056541442871094, 1.935516357421875, -12.253631591796875, 11.446136474609375, 3.4397125244140625, 10.781496047973633, 27.64285659790039, 5.738626480102539, 12.888248443603516, 19.565685272216797, 13.146240234375, 28.33283233642578, 19.85295867919922, 21.111392974853516, -13.014957427978516, 17.15695571899414, 6.940662384033203, -15.922447204589844, -8.438247680664062, 19.254417419433594, 27.567123413085938, 9.569751739501953, 18.085479736328125, 1.2151031494140625, 4.08038330078125, -10.434951782226562, 6.133831024169922, 17.569786071777344, 9.028301239013672, 1.925201416015625, 7.53277587890625, 11.515754699707031, 19.700443267822266, -1.5998058319091797], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000547.npy"}
{"epoch": 0.8269085411942555, "step": 548, "batch_size": 64, "mean": 6.58412504196167, "std": 12.193231582641602, "min": -19.911855697631836, "p10": -8.504297637939452, "median": 3.9373397827148438, "p90": 23.42627410888672, "max": 34.0270881652832, "pos_frac": 0.71875, "sample": [4.0241851806640625, 9.003707885742188, 5.527626037597656, -5.398284912109375, 1.120025634765625, 27.517715454101562, -0.3965721130371094, 25.95699691772461, 16.96257781982422, 23.256134033203125, -2.7207489013671875, 21.476993560791016, 0.14086627960205078, 0.8736610412597656, -5.4654083251953125, 8.900602340698242, -13.271286010742188, 1.9694900512695312, 10.884147644042969, 7.1273193359375, 12.032318115234375, 18.469390869140625, -10.190383911132812, 10.880020141601562, 3.636138916015625, 5.930316925048828, 30.47894287109375, 23.499191284179688, 18.934478759765625, 20.130569458007812, 2.0733413696289062, 3.4681053161621094, 2.0170211791992188, 15.730751037597656, 3.67572021484375, 7.984344482421875, 27.119140625, 3.850494384765625, -12.77044677734375, -0.33940887451171875, 11.810249328613281, 2.1693649291992188, 15.611709594726562, -5.0438995361328125, 30.960525512695312, 8.00210952758789, -0.48612213134765625, -9.211929321289062, -6.853157043457031, 5.580863952636719, 8.223602294921875, -19.911855697631836, 2.137218475341797, 13.16921615600586, 3.7542266845703125, 3.8455810546875, 20.061737060546875, -2.6994247436523438, -2.7770004272460938, -10.975982666015625, 34.0270881652832, 19.21887969970703, -17.84765625, -5.481147766113281], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000548.npy"}
{"epoch": 0.8284202569916855, "step": 549, "batch_size": 64, "mean": 5.9100799560546875, "std": 13.45987319946289, "min": -26.342069625854492, "p10": -13.61439743041992, "median": 6.020261764526367, "p90": 24.704494857788088, "max": 34.14065170288086, "pos_frac": 0.6875, "sample": [-2.3646392822265625, 15.517852783203125, -2.687103271484375, 0.031955718994140625, -9.76791763305664, -14.401763916015625, 15.85137939453125, 7.020116806030273, -1.7764892578125, 27.19019317626953, 26.274566650390625, 3.882762908935547, 6.139978408813477, 26.15789031982422, -0.8311309814453125, -4.490966796875, -16.17092514038086, 3.8248672485351562, 21.797500610351562, 7.8914947509765625, 18.32574462890625, 8.647064208984375, 19.4476318359375, -12.250686645507812, 4.3435821533203125, 4.896736145019531, 7.7389068603515625, 18.5067138671875, 4.671836853027344, -14.198844909667969, 23.343429565429688, 30.787307739257812, 16.416259765625, -4.2432861328125, 6.709354400634766, 17.768447875976562, -0.030490875244140625, -5.012664794921875, 4.362689971923828, 13.13629150390625, -14.530242919921875, 6.5166168212890625, -0.11061286926269531, 5.900545120239258, 0.2775840759277344, 8.052082061767578, -1.8779621124267578, 7.027137756347656, 23.786575317382812, 6.184906005859375, -11.21621322631836, 25.097888946533203, 9.503814697265625, 3.754180908203125, 9.57427978515625, 10.983047485351562, 2.623779296875, 8.233123779296875, -15.758171081542969, 33.60108184814453, 34.14065170288086, -24.251998901367188, -26.342069625854492, 4.619476318359375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000549.npy"}
{"epoch": 0.8299319727891157, "step": 550, "batch_size": 64, "mean": 6.471392631530762, "std": 13.462759971618652, "min": -23.460514068603516, "p10": -9.93914108276367, "median": 5.676055908203125, "p90": 24.34835319519043, "max": 38.157806396484375, "pos_frac": 0.703125, "sample": [10.995033264160156, 28.04737091064453, 37.054141998291016, 3.921478271484375, -20.415088653564453, -4.671150207519531, 0.3423309326171875, -15.450546264648438, -4.4881591796875, 4.222663879394531, 24.402667999267578, 13.141685485839844, 15.579277038574219, 6.10865592956543, 4.464176177978516, 24.22161865234375, 33.2697868347168, 21.138442993164062, 11.758514404296875, 1.9043807983398438, 14.267932891845703, -3.121673583984375, 16.86603355407715, 13.283683776855469, 13.755340576171875, -10.773979187011719, 13.754974365234375, -3.8228702545166016, 3.509876251220703, -7.2967987060546875, -16.32701301574707, 3.8365478515625, -6.803070068359375, 10.552574157714844, 5.394477844238281, 9.237022399902344, -1.86639404296875, -1.3376083374023438, 1.7665863037109375, 5.957633972167969, 6.9284820556640625, 22.42633819580078, 3.7125244140625, 8.129114151000977, 6.38775634765625, 10.045581817626953, 9.588485717773438, 23.21870231628418, 0.00611114501953125, 30.27490997314453, -10.272880554199219, 13.667610168457031, 10.36279296875, 13.333927154541016, 25.287620544433594, 2.595231056213379, -18.961990356445312, -1.5282211303710938, 38.157806396484375, -23.460514068603516, 4.654876708984375, -0.2390289306640625, -9.160415649414062, -7.366249084472656], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000550.npy"}
{"epoch": 0.8314436885865457, "step": 551, "batch_size": 64, "mean": 8.07844066619873, "std": 12.034772872924805, "min": -15.265533447265625, "p10": -4.414794540405273, "median": 6.379150390625, "p90": 19.200531005859375, "max": 57.19091796875, "pos_frac": 0.765625, "sample": [11.636302947998047, 12.181167602539062, 12.816146850585938, 5.452972412109375, 14.596786499023438, 18.794164657592773, 57.19091796875, -2.547565460205078, 2.8792724609375, 18.862953186035156, 1.0859756469726562, 10.480880737304688, 30.202415466308594, 2.1706886291503906, 13.386421203613281, 11.980361938476562, 0.759857177734375, 15.598472595214844, -15.265533447265625, 2.0177717208862305, 3.1864166259765625, 3.6063919067382812, 17.8167724609375, 3.6202239990234375, 6.178012847900391, 44.90240478515625, -1.5559158325195312, 6.3082275390625, 12.193580627441406, 6.495319366455078, 9.063369750976562, 6.060882568359375, 4.48103141784668, 5.0940704345703125, -2.6052703857421875, 28.55438232421875, -4.838829040527344, 23.526748657226562, 0.3737640380859375, -0.4275665283203125, -4.516786575317383, 11.621654510498047, 1.3740081787109375, -12.099260330200195, 20.343460083007812, 11.076683044433594, 9.004364013671875, 13.327033996582031, -1.44732666015625, -0.7537040710449219, 6.4500732421875, 16.296348571777344, 6.88671875, -8.595869064331055, 1.822418212890625, -0.966583251953125, -9.128528594970703, 19.34520721435547, 8.899360656738281, 9.063644409179688, -5.1815948486328125, -4.176813125610352, 18.486671447753906, 13.5745849609375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000551.npy"}
{"epoch": 0.8329554043839759, "step": 552, "batch_size": 64, "mean": 7.28953218460083, "std": 11.269730567932129, "min": -24.321083068847656, "p10": -8.270006561279295, "median": 9.310493469238281, "p90": 18.51705856323242, "max": 29.299949645996094, "pos_frac": 0.796875, "sample": [-8.584522247314453, 0.5894927978515625, -9.478775024414062, -1.742034912109375, 13.983413696289062, 1.5074100494384766, 2.6929759979248047, 8.988571166992188, 9.942543029785156, 18.346851348876953, 17.406951904296875, 29.298049926757812, 8.385278701782227, 18.41266632080078, -3.5083961486816406, -4.831022262573242, 0.006252288818359375, 11.902290344238281, 10.859283447265625, 0.49309539794921875, 11.21221923828125, 17.77460479736328, 4.9320220947265625, 7.575664520263672, 20.68547821044922, -1.0614204406738281, 7.152809143066406, 13.50100326538086, 13.21337890625, -9.71783447265625, 12.279430389404297, 4.149925231933594, 14.531112670898438, -20.06961441040039, 22.166515350341797, -7.057228088378906, 14.415023803710938, 16.07720947265625, 0.6846275329589844, 3.6773147583007812, 18.561798095703125, 9.632415771484375, -12.53106689453125, 28.063278198242188, 1.8860015869140625, 8.380172729492188, 13.122970581054688, 8.5028076171875, -18.812698364257812, 7.741416931152344, 29.299949645996094, 10.582550048828125, 11.513879776000977, -24.321083068847656, 14.924795150756836, 15.634666442871094, 11.163986206054688, 14.204601287841797, 17.948699951171875, 12.773124694824219, 18.573043823242188, 5.037162780761719, 1.3911361694335938, -7.536136627197266], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000552.npy"}
{"epoch": 0.8344671201814059, "step": 553, "batch_size": 64, "mean": 8.325023651123047, "std": 14.536709785461426, "min": -25.25122833251953, "p10": -7.155131912231444, "median": 7.338798522949219, "p90": 27.42275390625, "max": 41.99794006347656, "pos_frac": 0.640625, "sample": [41.99794006347656, -4.764926910400391, -10.384246826171875, 27.66236114501953, -2.5240554809570312, -2.0499038696289062, 25.861278533935547, 26.6612548828125, -1.6525650024414062, -2.984769821166992, 13.025508880615234, 7.0174560546875, 23.277660369873047, 11.01068115234375, 7.720977783203125, 15.923477172851562, -3.6281356811523438, 18.95824432373047, -1.8568458557128906, -25.25122833251953, -4.814506530761719, 23.408599853515625, 19.77752685546875, 10.63205337524414, 26.863670349121094, 10.877960205078125, 7.5370941162109375, -1.6389694213867188, -5.159446716308594, -7.651756286621094, 34.10173034667969, 18.40130615234375, 13.675239562988281, 36.79339599609375, 13.450721740722656, 9.854988098144531, 33.03057861328125, 0.971099853515625, 2.9973907470703125, 31.441505432128906, 4.586151123046875, -0.9935951232910156, 1.4506168365478516, 8.052263259887695, -8.206306457519531, -0.348114013671875, -5.415033340454102, 34.03760528564453, -16.817794799804688, -17.233478546142578, 15.248054504394531, 12.810832977294922, 25.104896545410156, -1.2128715515136719, -15.206352233886719, 13.576255798339844, 5.52093505859375, -1.52606201171875, 7.1405029296875, 8.869766235351562, 0.1051025390625, -5.996341705322266, 24.894699096679688, 5.7894439697265625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000553.npy"}
{"epoch": 0.8359788359788359, "step": 554, "batch_size": 64, "mean": 8.051122665405273, "std": 13.249629020690918, "min": -24.979286193847656, "p10": -6.040985870361327, "median": 6.697375297546387, "p90": 27.59657173156739, "max": 44.525272369384766, "pos_frac": 0.78125, "sample": [8.619125366210938, 20.778358459472656, 28.65277862548828, 18.049072265625, 5.861419677734375, 6.586469650268555, 3.3511905670166016, 15.38238525390625, -19.570053100585938, 17.43962860107422, 2.373424530029297, -1.628997802734375, -9.69073486328125, 10.173210144042969, -24.979286193847656, 0.2662849426269531, 5.4990692138671875, 2.3397903442382812, 10.240341186523438, 3.44842529296875, 3.7284622192382812, 15.303558349609375, 5.255210876464844, 0.907501220703125, 7.5182952880859375, 11.298934936523438, 31.07769775390625, 16.606834411621094, -4.320505142211914, 10.311470031738281, 44.525272369384766, -0.6741790771484375, 11.342060089111328, -7.4044342041015625, 0.9430198669433594, 0.32743263244628906, 13.504219055175781, 2.333484649658203, 23.09076690673828, -1.2831649780273438, -20.64190673828125, 25.427200317382812, 9.7032470703125, 0.7254180908203125, 7.578155517578125, -5.085453033447266, 2.462251663208008, -3.925830841064453, 3.0443458557128906, 0.86846923828125, 23.29450225830078, -9.920257568359375, -6.3590240478515625, 29.680465698242188, 29.892807006835938, 8.162704467773438, 6.808280944824219, 18.54998016357422, 16.04641342163086, 24.027118682861328, 31.378772735595703, -5.298896789550781, 28.526302337646484, 12.742923736572266], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000554.npy"}
{"epoch": 0.8374905517762661, "step": 555, "batch_size": 64, "mean": 8.329030990600586, "std": 13.48740005493164, "min": -31.171173095703125, "p10": -10.8508731842041, "median": 9.040794372558594, "p90": 22.464606475830077, "max": 34.01073455810547, "pos_frac": 0.78125, "sample": [1.8634147644042969, -11.484565734863281, 7.926849365234375, 13.059600830078125, 22.572174072265625, -1.5604705810546875, -14.627639770507812, -3.1848907470703125, -11.122142791748047, 0.8388214111328125, 18.619827270507812, -10.217910766601562, 16.64776611328125, 6.686275482177734, 29.07293701171875, 18.19715118408203, 1.0138397216796875, 7.0567169189453125, 3.8097877502441406, 20.070579528808594, 9.752349853515625, 0.4260711669921875, 34.01073455810547, 5.46368408203125, 12.576801300048828, 16.660919189453125, 21.2259521484375, 14.908935546875, 12.319931030273438, 19.199249267578125, 16.059982299804688, 9.586029052734375, 21.243663787841797, 22.21361541748047, -14.599937438964844, 21.753582000732422, -31.171173095703125, 12.140716552734375, -2.98004150390625, 0.9249267578125, 11.796432495117188, 8.495559692382812, 16.698341369628906, 8.183685302734375, -15.662422180175781, 6.599361419677734, 17.870033264160156, 6.8759307861328125, 31.59874725341797, -4.912519454956055, 16.383861541748047, -3.0766868591308594, -25.093887329101562, 5.465717315673828, 25.381763458251953, 15.093978881835938, 8.234485626220703, 6.1198883056640625, 13.85037612915039, -10.138530731201172, 33.114768981933594, 15.880081176757812, 24.797882080078125, 2.547016143798828], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000555.npy"}
{"epoch": 0.8390022675736961, "step": 556, "batch_size": 64, "mean": 7.474764823913574, "std": 13.534162521362305, "min": -19.068161010742188, "p10": -7.081931304931639, "median": 6.143144607543945, "p90": 26.932122039794933, "max": 40.07904052734375, "pos_frac": 0.65625, "sample": [10.399559020996094, 0.884735107421875, 3.3121795654296875, 1.2809562683105469, -18.056869506835938, 21.570419311523438, 28.978897094726562, 3.348054885864258, -4.899435043334961, -5.7796478271484375, -3.4778709411621094, -1.1005516052246094, 13.980461120605469, 21.418907165527344, 7.170566558837891, 6.733772277832031, -1.02947998046875, 4.071723937988281, 16.345848083496094, 21.420169830322266, -2.5519561767578125, 11.256134033203125, 30.55638885498047, -1.0178070068359375, -5.40374755859375, 5.1691741943359375, -3.626201629638672, 12.17831039428711, 6.206424713134766, -3.6857452392578125, 2.3544960021972656, -13.0059814453125, 24.55181121826172, 2.8576278686523438, -2.0838623046875, 6.339021682739258, 24.539161682128906, 31.628997802734375, 4.874214172363281, 6.079864501953125, 27.952255249023438, 40.07904052734375, 12.009532928466797, 7.438529968261719, 19.690460205078125, 10.973731994628906, -3.2486343383789062, -0.437164306640625, -12.081205368041992, -0.9271469116210938, 30.329151153564453, -19.068161010742188, -5.819435119628906, 14.757644653320312, 6.383480072021484, -7.6230010986328125, 13.521339416503906, -15.638591766357422, 12.485618591308594, 22.328004837036133, 20.951927185058594, 38.667510986328125, 11.756952285766602, -9.885591506958008], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000556.npy"}
{"epoch": 0.8405139833711263, "step": 557, "batch_size": 64, "mean": 9.407912254333496, "std": 14.06295108795166, "min": -14.75079345703125, "p10": -9.112580108642577, "median": 7.879384994506836, "p90": 29.67140426635743, "max": 43.73258590698242, "pos_frac": 0.75, "sample": [36.76593017578125, 10.9840087890625, 9.68256950378418, 2.2464218139648438, 3.6549148559570312, -10.209617614746094, 37.11758804321289, -7.637016296386719, 13.715972900390625, 15.809280395507812, 4.898933410644531, -12.031349182128906, -6.523149490356445, 4.989814758300781, 1.8292350769042969, 4.195972442626953, 17.63042449951172, 7.785816192626953, 43.73258590698242, 6.943824768066406, 6.604911804199219, -4.6633758544921875, 25.212608337402344, -13.333953857421875, -3.4038734436035156, -14.75079345703125, 30.482261657714844, 36.71803283691406, -3.8379592895507812, 5.7423553466796875, 13.464324951171875, 3.557697296142578, 11.141265869140625, -4.9828643798828125, -0.6788482666015625, 31.17123031616211, 17.962646484375, 19.77391815185547, 7.972953796386719, 43.665496826171875, 4.450172424316406, 16.25216293334961, -9.744964599609375, -0.9559402465820312, 8.867012023925781, -5.942474365234375, 8.55963134765625, 22.684982299804688, 18.222267150878906, 15.171531677246094, 2.3699684143066406, 9.964332580566406, 8.940086364746094, 6.285165786743164, 11.201288223266602, 5.779449462890625, 20.576934814453125, 6.0072479248046875, 21.633644104003906, 16.978515625, -12.964263916015625, -13.03057861328125, 27.779403686523438, 19.620615005493164], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000557.npy"}
{"epoch": 0.8420256991685563, "step": 558, "batch_size": 64, "mean": 8.866661071777344, "std": 12.98138427734375, "min": -28.903636932373047, "p10": -5.877357482910155, "median": 8.890869140625, "p90": 24.484062957763676, "max": 45.44530487060547, "pos_frac": 0.765625, "sample": [11.016761779785156, 23.810867309570312, 22.78411865234375, -8.140777587890625, 18.7442626953125, 0.29299354553222656, 24.77257537841797, 4.967718124389648, 5.232247352600098, 14.832794189453125, 6.779659271240234, 13.430160522460938, -0.0653533935546875, 5.250328063964844, 6.080970764160156, 18.600597381591797, 10.183755874633789, 12.360649108886719, 14.421493530273438, 12.932384490966797, 17.591339111328125, 12.213729858398438, -0.033344268798828125, 19.101909637451172, 16.664875030517578, -4.8741912841796875, 28.38616943359375, 8.915634155273438, 11.625198364257812, 2.4513092041015625, 16.05416488647461, 12.616035461425781, 8.594680786132812, 45.44530487060547, -10.729045867919922, -6.654624938964844, 7.380054473876953, 0.3455047607421875, 3.0182876586914062, 36.41163635253906, 12.879222869873047, -11.564903259277344, 29.89734649658203, 3.364105224609375, 21.752708435058594, 30.344314575195312, -5.237308502197266, 0.9974212646484375, 25.956405639648438, 12.528411865234375, 7.8456573486328125, -28.903636932373047, 20.313980102539062, -6.09197998046875, 1.283111572265625, -1.97698974609375, 12.509407043457031, -5.3765716552734375, -17.850759506225586, -3.80712890625, 18.981689453125, 1.3182029724121094, 8.866104125976562, -3.3753204345703125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000558.npy"}
{"epoch": 0.8435374149659864, "step": 559, "batch_size": 64, "mean": 8.152387619018555, "std": 15.563899040222168, "min": -28.164348602294922, "p10": -10.453194427490232, "median": 6.73881721496582, "p90": 27.748803329467776, "max": 49.8701171875, "pos_frac": 0.703125, "sample": [-2.1530914306640625, -11.599418640136719, 4.141807556152344, -1.7301559448242188, 8.254119873046875, 49.8701171875, 25.33386993408203, 22.47052001953125, 3.1458511352539062, 0.9426727294921875, -18.370407104492188, 16.848175048828125, -1.2318801879882812, 12.385086059570312, 8.600231170654297, 30.234540939331055, 7.309173583984375, 4.9170684814453125, 18.658573150634766, 2.5021286010742188, 24.97903823852539, 6.318336486816406, 2.4660186767578125, 26.755538940429688, 1.3427467346191406, -12.391773223876953, -5.34320068359375, 23.93286895751953, -7.7786712646484375, 38.260719299316406, 8.251453399658203, -1.216827392578125, 8.565109252929688, 32.91062927246094, 16.75611114501953, -4.872936248779297, 6.942272186279297, -5.288227081298828, -2.5514450073242188, -4.03253173828125, 0.7812957763671875, -4.187217712402344, -2.1073646545410156, 18.120349884033203, 16.32105255126953, 28.174488067626953, 17.431886672973633, -17.52851104736328, 20.087528228759766, 0.4015655517578125, 6.535362243652344, 37.51939392089844, -28.164348602294922, 7.1415557861328125, 13.092842102050781, 36.897865295410156, 5.209621429443359, 22.48272705078125, 11.32468032836914, 6.519834518432617, 17.3660888671875, 15.766586303710938, -19.53437042236328, -22.434329986572266], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000559.npy"}
{"epoch": 0.8450491307634165, "step": 560, "batch_size": 64, "mean": 8.558538436889648, "std": 13.64900016784668, "min": -21.717273712158203, "p10": -5.373776245117187, "median": 5.725452423095703, "p90": 26.16575622558594, "max": 39.52680206298828, "pos_frac": 0.703125, "sample": [7.82257080078125, 12.525833129882812, 21.88671875, 16.35887908935547, -3.6344146728515625, -4.991554260253906, 19.993915557861328, 4.7115631103515625, 35.316802978515625, -3.5321502685546875, 7.3333892822265625, 4.0789947509765625, -4.5360565185546875, 6.7510833740234375, -1.1462249755859375, -4.747398376464844, -5.230018615722656, 3.16671085357666, 15.074256896972656, 16.314468383789062, -3.82879638671875, 17.110618591308594, 4.857753753662109, 3.504079818725586, 24.721454620361328, 17.253902435302734, -4.439109802246094, 30.8602237701416, 26.42180633544922, 15.763322830200195, 12.810005187988281, -7.1848297119140625, -15.494651794433594, 33.2057991027832, 21.59961700439453, -4.4405670166015625, -0.9896621704101562, -2.8424758911132812, 39.52680206298828, -9.045158386230469, 38.839168548583984, 20.328269958496094, 6.167182922363281, 17.348102569580078, 5.283721923828125, 9.679962158203125, -5.435386657714844, 31.918556213378906, 13.08258056640625, 2.1688003540039062, 1.8952178955078125, 3.89239501953125, -21.717273712158203, -9.861518859863281, 4.972747802734375, 16.226966857910156, -16.07917022705078, 2.3586177825927734, 21.596126556396484, 4.473854064941406, 25.56830596923828, 17.09356689453125, 1.4345893859863281, 13.62356948852539], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000560.npy"}
{"epoch": 0.8465608465608465, "step": 561, "batch_size": 64, "mean": 9.65665054321289, "std": 14.451604843139648, "min": -17.127674102783203, "p10": -4.383308982849121, "median": 5.924884796142578, "p90": 34.513165664672854, "max": 46.17548370361328, "pos_frac": 0.734375, "sample": [-0.587158203125, 5.411602020263672, 0.02972412109375, 9.226211547851562, 7.615140914916992, 22.7857666015625, 3.5984420776367188, -3.5690841674804688, 35.06490707397461, 6.64825439453125, -1.9619369506835938, 20.507980346679688, 2.1997642517089844, 16.29393768310547, 1.5234184265136719, -0.864410400390625, 8.732879638671875, 26.573272705078125, 46.17548370361328, 1.2260513305664062, 2.418018341064453, 20.56927490234375, 15.481952667236328, -2.5516891479492188, 13.59295654296875, -0.7170791625976562, 6.020881652832031, -4.078096389770508, -14.187034606933594, 39.00347900390625, -17.127674102783203, 9.224773406982422, 5.828887939453125, 31.050308227539062, 8.376251220703125, 13.812187194824219, 3.7737808227539062, 12.259710311889648, 1.2289962768554688, 41.80671691894531, 33.22576904296875, 37.17900085449219, 20.837661743164062, 43.57820129394531, 17.214282989501953, -5.84014892578125, -11.621932983398438, 12.78499984741211, 14.5599365234375, 5.7541656494140625, 6.2804412841796875, 0.4886016845703125, -0.6710739135742188, -4.707359313964844, 5.2880859375, 17.204914093017578, -0.5362319946289062, 2.7483749389648438, 7.879280090332031, -4.5141143798828125, -3.346607208251953, 2.468465805053711, 36.167259216308594, -6.813240051269531], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000561.npy"}
{"epoch": 0.8480725623582767, "step": 562, "batch_size": 64, "mean": 8.955015182495117, "std": 12.715414047241211, "min": -12.940887451171875, "p10": -4.998223495483398, "median": 5.163494110107422, "p90": 27.30884323120118, "max": 37.22967529296875, "pos_frac": 0.6875, "sample": [-3.6774826049804688, 23.914669036865234, -5.248996734619141, 16.0655517578125, 4.735198974609375, -0.28319549560546875, 2.0165634155273438, 37.22967529296875, 29.25934600830078, 3.2595062255859375, 12.080131530761719, 7.4996795654296875, -10.677989959716797, -8.437141418457031, 18.50189208984375, -1.4235687255859375, 2.3274688720703125, 8.927978515625, 23.31348419189453, -4.4130859375, 11.386100769042969, 29.398624420166016, -1.1571502685546875, -0.8316802978515625, 2.9474639892578125, 24.86931610107422, 1.1506767272949219, -11.228073120117188, 0.8289642333984375, 11.15921401977539, 12.330389022827148, -12.940887451171875, 24.4893798828125, 22.267303466796875, 12.551761627197266, 31.81134033203125, 0.00963592529296875, -0.8511199951171875, 3.354564666748047, 28.354354858398438, -2.0347213745117188, -0.19174575805664062, 17.398521423339844, 21.502792358398438, 16.95655059814453, 32.36848449707031, 18.703750610351562, 21.679962158203125, 24.505084991455078, 3.063751220703125, 22.1627197265625, 0.465118408203125, -2.334146499633789, 4.7778778076171875, 6.739360809326172, 10.842658996582031, 20.46517562866211, 29.224029541015625, 5.549110412597656, -7.471923828125, -0.550079345703125, -8.340805053710938, -2.907848358154297, -4.322563171386719], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000562.npy"}
{"epoch": 0.8495842781557067, "step": 563, "batch_size": 64, "mean": 9.173943519592285, "std": 15.34117317199707, "min": -18.89984130859375, "p10": -11.663505554199215, "median": 7.170318603515625, "p90": 31.927379608154297, "max": 45.12939453125, "pos_frac": 0.75, "sample": [3.2719573974609375, 4.362546920776367, 7.9675140380859375, 0.43935203552246094, 32.824798583984375, 21.094100952148438, -13.898771286010742, 45.12939453125, 20.575668334960938, 4.429840087890625, 3.6530227661132812, 11.687889099121094, 3.1387405395507812, -4.0190887451171875, 24.102333068847656, 7.810905456542969, -17.385536193847656, 31.7486572265625, -16.21353530883789, 38.10631561279297, -2.5313568115234375, 44.31040954589844, -3.034830093383789, 16.429031372070312, 16.527313232421875, -2.140308380126953, 19.357437133789062, 10.127853393554688, 12.442085266113281, -13.385246276855469, -4.876091003417969, -4.5737457275390625, 0.5672569274902344, 2.5365142822265625, -5.440526962280273, 35.18767166137695, 4.154815673828125, -14.983367919921875, 9.17190170288086, 8.51446533203125, 32.00397491455078, 31.326202392578125, 13.957195281982422, -7.646110534667969, -17.601242065429688, 5.383110046386719, 11.605926513671875, 1.8656158447265625, 17.28125762939453, 9.025581359863281, -18.89984130859375, 6.143711090087891, 15.798076629638672, 17.511234283447266, 16.567710876464844, -3.4009857177734375, 35.67221450805664, 20.43425750732422, 4.5133209228515625, 29.38153076171875, 6.337799072265625, 6.529731750488281, 15.867958068847656, 0.2867431640625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000563.npy"}
{"epoch": 0.8510959939531368, "step": 564, "batch_size": 64, "mean": 8.895227432250977, "std": 14.309100151062012, "min": -23.51287841796875, "p10": -6.8262140274047844, "median": 7.277202606201172, "p90": 29.573941230773926, "max": 38.76618194580078, "pos_frac": 0.75, "sample": [29.455787658691406, 0.6263198852539062, 33.20928955078125, 4.5408477783203125, 12.92121696472168, 32.33074188232422, 18.457733154296875, -21.078262329101562, 3.331462860107422, -7.183849334716797, 35.51642608642578, 29.62457847595215, 12.096176147460938, -14.188989639282227, 1.5839691162109375, 2.5050277709960938, -4.540491104125977, 11.320520401000977, 20.818771362304688, 7.249137878417969, -0.6801910400390625, 4.176784515380859, 16.17473602294922, 7.857452392578125, 10.78948974609375, 2.7997665405273438, 29.206298828125, 1.178741455078125, 3.7155494689941406, 8.847244262695312, 15.531646728515625, 19.017906188964844, 36.20117950439453, 3.4056015014648438, 0.0908203125, -14.023406982421875, 7.107276916503906, 2.067535400390625, -4.34918212890625, -0.8713226318359375, 17.03382110595703, 25.441871643066406, 4.217231750488281, -5.838218688964844, -10.868961334228516, -1.234771728515625, 21.065753936767578, 8.797576904296875, -2.5234298706054688, -23.51287841796875, 28.37459945678711, 14.952041625976562, 15.039154052734375, -5.991731643676758, 30.899948120117188, 7.305267333984375, 14.51483154296875, 38.76618194580078, -13.896408081054688, 27.352081298828125, -0.637176513671875, 12.022342681884766, 1.8583793640136719, 9.31671142578125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000564.npy"}
{"epoch": 0.8526077097505669, "step": 565, "batch_size": 64, "mean": 8.277042388916016, "std": 12.76017951965332, "min": -21.607765197753906, "p10": -5.116668701171874, "median": 6.939640998840332, "p90": 23.411215591430672, "max": 39.19098663330078, "pos_frac": 0.75, "sample": [1.6023540496826172, -19.110580444335938, -21.607765197753906, 19.49939727783203, 6.12841796875, -0.16802978515625, 5.932830810546875, 12.533782958984375, -0.3429298400878906, 3.5692825317382812, 13.447059631347656, 10.009147644042969, 26.761871337890625, -18.567331314086914, 6.2925262451171875, 13.45855712890625, 7.219049453735352, 8.65066909790039, 27.280975341796875, 33.90718460083008, 21.541770935058594, -12.210769653320312, -5.5437164306640625, 0.512298583984375, 11.77164077758789, 8.603103637695312, 1.1434822082519531, 29.17788314819336, 21.179222106933594, 10.611274719238281, 15.122991561889648, 2.499004364013672, 3.0208740234375, 2.201000213623047, 18.722761154174805, -1.0435562133789062, 18.58800506591797, 10.495353698730469, -7.701690673828125, 18.02783203125, 5.2171630859375, 5.0566864013671875, -12.286819458007812, 20.575645446777344, 4.2789764404296875, 16.299896240234375, -3.294708251953125, 16.845054626464844, 38.88981628417969, 3.5320510864257812, 15.581901550292969, -4.1202239990234375, -1.79754638671875, -2.6164779663085938, 13.282987594604492, 0.7197113037109375, 6.6602325439453125, 14.723701477050781, -1.3859176635742188, -2.2791900634765625, 39.19098663330078, 19.995098114013672, 9.234081268310547, 24.212406158447266], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000565.npy"}
{"epoch": 0.854119425547997, "step": 566, "batch_size": 64, "mean": 7.888600826263428, "std": 13.506945610046387, "min": -26.028533935546875, "p10": -7.913189315795898, "median": 7.271034240722656, "p90": 22.181803894042968, "max": 47.44544982910156, "pos_frac": 0.6875, "sample": [3.525970458984375, -3.0237884521484375, -2.26361083984375, 5.376983642578125, 8.790695190429688, 19.45806884765625, 27.276573181152344, 15.952190399169922, 14.127565383911133, 26.144084930419922, 13.707082748413086, 17.140060424804688, 7.869720458984375, 5.034065246582031, 12.719375610351562, 21.954605102539062, -26.028533935546875, 8.138877868652344, 6.0982818603515625, 9.227100372314453, 12.198551177978516, -0.5674667358398438, -8.464820861816406, 18.543289184570312, 3.4267635345458984, 5.636474609375, 17.579986572265625, 20.260021209716797, 12.682533264160156, -14.10247802734375, 19.297527313232422, 19.110336303710938, 26.306068420410156, -4.1853485107421875, -7.425079345703125, -19.184112548828125, 4.804718017578125, 2.0734806060791016, 22.2791748046875, -3.0795135498046875, 6.6723480224609375, -3.8935012817382812, 47.44544982910156, -11.333133697509766, 31.427026748657227, 10.441699981689453, -0.22475433349609375, -10.643138885498047, 11.447334289550781, -1.6295280456542969, 0.4198589324951172, -1.1201019287109375, -5.3945465087890625, 17.694995880126953, -3.0156822204589844, 6.65081787109375, 1.7311744689941406, 21.92798614501953, 9.583290100097656, -8.122379302978516, 14.899406433105469, 8.088165283203125, -1.1586761474609375, 44.56084442138672], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000566.npy"}
{"epoch": 0.8556311413454271, "step": 567, "batch_size": 64, "mean": 8.996700286865234, "std": 12.000800132751465, "min": -13.198333740234375, "p10": -5.782350158691406, "median": 8.533767700195312, "p90": 22.734149551391607, "max": 42.20555877685547, "pos_frac": 0.765625, "sample": [14.538894653320312, 2.2545204162597656, -8.199226379394531, 21.636093139648438, 13.710174560546875, 10.742431640625, 25.360851287841797, 11.186904907226562, 42.20555877685547, 2.7086715698242188, 14.225807189941406, 8.557464599609375, 11.16573715209961, -13.198333740234375, 7.204933166503906, -0.6928176879882812, -10.253486633300781, 8.1588134765625, -6.009521484375, 12.520206451416016, 6.1080169677734375, -2.9769210815429688, -3.2790298461914062, 21.12639617919922, -2.2918853759765625, 1.3319683074951172, -5.289669036865234, -5.800285339355469, 3.80548095703125, -5.886054992675781, 19.773651123046875, 21.740997314453125, 26.025772094726562, 34.4215087890625, 16.326587677001953, 17.00250816345215, 9.57552719116211, 1.8382644653320312, 19.552978515625, 37.56658935546875, -5.740501403808594, 5.524894714355469, 11.790550231933594, 16.358078002929688, 23.159786224365234, 16.61341094970703, -4.724220275878906, 5.473396301269531, 8.16552734375, 14.259849548339844, 14.696977615356445, 0.20528030395507812, 11.977119445800781, 8.51007080078125, -5.204994201660156, 0.3033905029296875, -9.83749771118164, 13.044544219970703, 34.288917541503906, 0.3031005859375, 14.67861557006836, 4.903373718261719, 15.777362823486328, 2.7657432556152344], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000567.npy"}
{"epoch": 0.8571428571428571, "step": 568, "batch_size": 64, "mean": 9.872920989990234, "std": 12.24352741241455, "min": -14.495052337646484, "p10": -3.9818592071533194, "median": 8.388742446899414, "p90": 26.696931457519536, "max": 42.1651611328125, "pos_frac": 0.796875, "sample": [12.986763000488281, 6.465282440185547, 9.92791748046875, 9.496112823486328, 0.6344566345214844, 9.263359069824219, 6.58404541015625, -14.495052337646484, 7.181190490722656, -6.228313446044922, -5.040306091308594, 30.494291305541992, 8.975791931152344, 21.255279541015625, -1.7649612426757812, -11.210372924804688, 4.788337707519531, 3.8306808471679688, 2.4332656860351562, 2.1778831481933594, 0.8149642944335938, 4.375396728515625, 9.192890167236328, 8.973793029785156, 22.960311889648438, 34.93276596069336, 22.799072265625, 34.235137939453125, -1.2920379638671875, 9.333259582519531, 20.47247314453125, 8.34189224243164, 12.635967254638672, -0.51812744140625, 3.355253219604492, 11.528570175170898, -8.018058776855469, 7.614799499511719, -1.2595176696777344, 10.090631484985352, 7.172416687011719, 24.565444946289062, 4.4392852783203125, 11.877769470214844, 21.14803695678711, 3.0364227294921875, 25.64293670654297, 0.895965576171875, -4.328826904296875, -0.8267135620117188, 15.40850830078125, 14.321182250976562, 8.435592651367188, 11.832700729370117, 38.04065704345703, 3.923351287841797, 0.5304489135742188, 24.9251708984375, 28.371307373046875, 23.613754272460938, -5.6251068115234375, -3.1722679138183594, 27.148643493652344, 42.1651611328125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000568.npy"}
{"epoch": 0.8586545729402872, "step": 569, "batch_size": 64, "mean": 7.4285478591918945, "std": 13.195655822753906, "min": -24.22014617919922, "p10": -8.756021881103514, "median": 6.582496643066406, "p90": 26.71719551086426, "max": 31.884288787841797, "pos_frac": 0.734375, "sample": [23.52802276611328, -4.156890869140625, 5.953479766845703, 26.94428253173828, 15.8809814453125, 26.20822525024414, -15.377410888671875, 27.583251953125, 12.44403076171875, 29.031158447265625, 24.734081268310547, 21.48334503173828, 4.536777496337891, 11.67770004272461, -14.7506103515625, 4.0081024169921875, 6.145671844482422, 23.932876586914062, 2.4103927612304688, 11.114734649658203, 29.25464630126953, 12.68414306640625, 6.736946105957031, 26.935325622558594, 5.6540679931640625, 12.248394012451172, 12.54180908203125, 2.9749488830566406, 0.694732666015625, -9.329082489013672, -20.280235290527344, -0.7917003631591797, -7.418880462646484, 7.620738983154297, 0.20970916748046875, 4.5447998046875, 30.766357421875, 14.683151245117188, 8.082351684570312, 6.428047180175781, 5.5434722900390625, -4.481151580810547, -18.477813720703125, -3.1797752380371094, 7.302242279052734, -24.22014617919922, 5.7943115234375, -14.451507568359375, 11.280326843261719, 3.0666046142578125, 9.445457458496094, 5.821990966796875, 18.387248992919922, -4.575122833251953, -1.9388465881347656, 6.928764343261719, 18.588134765625, 7.7626190185546875, 31.884288787841797, -2.7002334594726562, -2.1656112670898438, 12.558784484863281, 24.448505401611328, -4.767967224121094], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000569.npy"}
{"epoch": 0.8601662887377173, "step": 570, "batch_size": 64, "mean": 11.672306060791016, "std": 12.823506355285645, "min": -10.375648498535156, "p10": -4.202473831176757, "median": 11.109630584716797, "p90": 28.07255401611329, "max": 48.048614501953125, "pos_frac": 0.796875, "sample": [21.569534301757812, -6.924125671386719, 8.475852966308594, 1.443359375, 6.072475433349609, 17.340538024902344, -9.43182373046875, -4.489788055419922, 5.908756256103516, -7.2835235595703125, 4.542625427246094, 3.5019683837890625, 11.864913940429688, 17.548255920410156, 11.864593505859375, 17.82849884033203, -0.8264865875244141, -8.364433288574219, -2.877307891845703, 24.10340118408203, 22.8280029296875, -2.9552078247070312, -0.9507217407226562, 25.196125030517578, 37.11741638183594, 1.1702117919921875, 17.111000061035156, 12.651168823242188, 21.717796325683594, 6.1116943359375, 17.79065704345703, 28.929977416992188, 23.280033111572266, 1.462799072265625, 2.7477493286132812, 9.588447570800781, 18.621116638183594, 10.354667663574219, 29.770416259765625, 8.948938369750977, 22.128555297851562, 12.670730590820312, 24.917877197265625, -3.532073974609375, 0.16757965087890625, 8.10321044921875, 29.350738525390625, 16.889625549316406, 22.8701171875, -6.3109130859375, 26.0718994140625, -10.375648498535156, 6.883941650390625, 19.451637268066406, 38.30511474609375, 3.1682052612304688, 13.681815147399902, 4.007869720458984, 12.110115051269531, 15.352497100830078, 48.048614501953125, -0.5885505676269531, 33.228538513183594, 7.066558837890625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000570.npy"}
{"epoch": 0.8616780045351474, "step": 571, "batch_size": 64, "mean": 6.014917850494385, "std": 12.24540901184082, "min": -22.311317443847656, "p10": -10.364762878417968, "median": 7.728538513183594, "p90": 19.76586437225342, "max": 33.796302795410156, "pos_frac": 0.734375, "sample": [7.173370361328125, 8.600723266601562, 11.814651489257812, 5.255941390991211, -6.060127258300781, 23.629436492919922, 9.245956420898438, 16.98224639892578, 10.166097640991211, 18.005645751953125, 3.8319435119628906, 10.660331726074219, 1.0007820129394531, 8.450267791748047, -5.908740997314453, 3.5379791259765625, 9.658485412597656, 10.862770080566406, 3.8595428466796875, 15.918220520019531, 30.795955657958984, -0.33588409423828125, 10.97149658203125, 13.820625305175781, 11.195480346679688, 8.283706665039062, 18.72530746459961, 19.217126846313477, -20.81958770751953, 3.33697509765625, 0.3662872314453125, -14.925994873046875, 6.063236236572266, -2.801910400390625, 3.6381072998046875, 5.614299774169922, -17.66405487060547, -3.8464088439941406, -10.436233520507812, -6.9718780517578125, 1.9452629089355469, -5.337066650390625, 1.3710098266601562, 10.684593200683594, 14.808364868164062, 25.469623565673828, 26.07158660888672, -12.995288848876953, 10.70706558227539, 1.906585693359375, -10.197998046875, -22.311317443847656, 9.637481689453125, 20.00103759765625, 18.77197265625, -9.883407592773438, 19.16118621826172, 20.59149932861328, 16.839439392089844, 11.073577880859375, -9.49123764038086, 2.2688817977905273, -10.84658432006836, 33.796302795410156], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000571.npy"}
{"epoch": 0.8631897203325775, "step": 572, "batch_size": 64, "mean": 7.561029434204102, "std": 12.016935348510742, "min": -16.29772186279297, "p10": -6.510003662109375, "median": 7.004077911376953, "p90": 22.410131835937506, "max": 42.33148956298828, "pos_frac": 0.71875, "sample": [13.866065979003906, -16.29772186279297, 7.660129547119141, 19.250499725341797, 15.818229675292969, 8.958515167236328, 22.859161376953125, 13.072456359863281, -8.274088859558105, 42.33148956298828, 3.4767074584960938, 8.243667602539062, 11.291839599609375, 31.17205810546875, -3.434906005859375, 8.461071014404297, -10.404747009277344, 2.6806564331054688, -4.7231292724609375, 20.886154174804688, 16.571949005126953, 5.23309326171875, 9.124740600585938, 26.997276306152344, 8.643638610839844, 4.577014923095703, 1.5625534057617188, 15.934127807617188, 3.9319610595703125, 0.02881622314453125, -2.7755813598632812, 31.77361297607422, 40.53365707397461, -7.640838623046875, 7.506513595581055, 4.527565002441406, -0.8472518920898438, -7.61053466796875, 6.178197860717773, 5.109661102294922, 3.0264205932617188, 11.39071273803711, 6.571891784667969, 18.0841064453125, 8.748664855957031, 7.860076904296875, -1.1374282836914062, -6.60809326171875, 32.1436767578125, 8.988296508789062, 1.975107192993164, 21.362396240234375, -1.994110107421875, 12.356513977050781, -12.006362915039062, -6.2811279296875, -0.150787353515625, 0.7636489868164062, -4.141609191894531, 7.4362640380859375, 10.952529907226562, -0.6177883148193359, -0.6643295288085938, 9.592935562133789], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000572.npy"}
{"epoch": 0.8647014361300076, "step": 573, "batch_size": 64, "mean": 8.796982765197754, "std": 16.09462547302246, "min": -23.963401794433594, "p10": -10.144310379028319, "median": 7.819561958312988, "p90": 32.39860763549805, "max": 50.689910888671875, "pos_frac": 0.71875, "sample": [10.619499206542969, 10.7933349609375, -22.92637062072754, 18.71825408935547, -6.331874847412109, 0.6375503540039062, -10.984195709228516, 20.29507827758789, 7.3349456787109375, -11.54132080078125, -23.963401794433594, -14.999637603759766, 3.8863067626953125, -0.8954010009765625, 11.79486083984375, 14.38665771484375, -3.6836700439453125, 15.899162292480469, 31.75475311279297, -22.986968994140625, 33.597496032714844, 9.67193603515625, 22.78772735595703, 8.973808288574219, 7.4706573486328125, 3.0340919494628906, 50.689910888671875, 8.441019058227539, 32.67454528808594, -3.4710006713867188, 23.73833465576172, 3.4454345703125, 8.607986450195312, -0.590606689453125, 21.48260498046875, 2.1183013916015625, 8.419635772705078, 2.6370182037353516, 8.706596374511719, 44.78242492675781, 22.737327575683594, 12.257831573486328, 8.168466567993164, -3.6024932861328125, -0.539794921875, 3.6471099853515625, 45.18412780761719, 4.6614990234375, 4.498481750488281, 18.877914428710938, 31.092552185058594, 28.551254272460938, -2.358734130859375, 8.376220703125, -3.856719970703125, -8.184577941894531, 32.98907470703125, -7.8647003173828125, 8.828659057617188, 6.365699768066406, 2.9017410278320312, 2.0481605529785156, -12.153915405273438, 35.35624694824219], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000573.npy"}
{"epoch": 0.8662131519274376, "step": 574, "batch_size": 64, "mean": 10.483013153076172, "std": 12.970547676086426, "min": -11.288017272949219, "p10": -4.673501968383789, "median": 7.967290878295898, "p90": 27.533121490478518, "max": 39.73370361328125, "pos_frac": 0.765625, "sample": [5.983448028564453, -4.157005310058594, -1.0883560180664062, -7.884185791015625, -9.298561096191406, 23.693239212036133, 1.539642333984375, -8.240631103515625, -6.045642852783203, -10.610145568847656, 8.392024993896484, 7.4311370849609375, -1.5068206787109375, -3.241016387939453, -4.745571136474609, 4.101345062255859, -1.973306655883789, 16.144577026367188, 15.228694915771484, 7.779026031494141, 16.042999267578125, 1.983612060546875, -3.705718994140625, 6.39459228515625, 0.171630859375, 39.73370361328125, 23.755172729492188, 35.752349853515625, 2.3383026123046875, 25.467514038085938, 9.996978759765625, 1.7218914031982422, 15.388065338134766, 13.651336669921875, 0.808197021484375, 18.91405487060547, -11.288017272949219, 12.088973999023438, 27.80480194091797, 8.216682434082031, 18.976219177246094, 6.0799560546875, 36.869834899902344, 6.132568359375, 24.283544540405273, 32.15757751464844, 16.667022705078125, 14.992218017578125, 8.155555725097656, 7.588405609130859, -0.4481048583984375, 36.83164978027344, 16.940597534179688, 15.863197326660156, 26.874229431152344, 5.8653411865234375, 3.574848175048828, 12.118831634521484, 26.899200439453125, 32.08948516845703, -4.505340576171875, 18.922225952148438, 26.012908935546875, 5.2318572998046875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000574.npy"}
{"epoch": 0.8677248677248677, "step": 575, "batch_size": 64, "mean": 6.856937408447266, "std": 14.681061744689941, "min": -30.058841705322266, "p10": -8.224920654296875, "median": 4.8353729248046875, "p90": 29.119874572753908, "max": 46.734527587890625, "pos_frac": 0.71875, "sample": [8.0107421875, -11.593746185302734, -15.819171905517578, 5.828880310058594, 40.344635009765625, -1.796600341796875, -2.6254043579101562, 5.271339416503906, -0.030792236328125, -8.21026611328125, 9.582008361816406, -11.774688720703125, 29.120712280273438, 1.6063766479492188, -11.219581604003906, 6.7038116455078125, 0.5678787231445312, 5.2109375, 8.897045135498047, 6.964073181152344, 5.092857360839844, 21.2314453125, 8.576580047607422, 11.201175689697266, 5.225425720214844, -0.9712677001953125, -7.901092529296875, 6.227699279785156, 31.17059326171875, 1.262054443359375, 2.375345230102539, -1.6723136901855469, 29.117919921875, 41.47198486328125, 21.759357452392578, -8.231201171875, 40.887107849121094, 0.609954833984375, 35.18909454345703, 21.37026023864746, 4.08099365234375, 4.968597412109375, 1.4869918823242188, 3.0182952880859375, 1.357421875, 2.26513671875, 1.2632598876953125, -30.058841705322266, 4.7021484375, 46.734527587890625, 17.213836669921875, 1.8907623291015625, 2.5958709716796875, 10.29672622680664, -7.173046112060547, 7.6588134765625, 6.082099914550781, -7.8782196044921875, -0.5417404174804688, 7.70501708984375, -10.452354431152344, 25.56482696533203, 17.381698608398438, -0.3499717712402344], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000575.npy"}
{"epoch": 0.8692365835222978, "step": 576, "batch_size": 64, "mean": 10.249387741088867, "std": 13.091121673583984, "min": -21.319656372070312, "p10": -5.682965850830078, "median": 9.603904724121094, "p90": 25.546966552734375, "max": 33.9276123046875, "pos_frac": 0.71875, "sample": [12.362762451171875, 24.67328643798828, -4.716865539550781, 14.20584487915039, 23.470306396484375, 24.24195098876953, 33.69093704223633, 5.30078125, 16.195419311523438, 6.194358825683594, -5.165019989013672, -7.34385871887207, 17.814983367919922, -9.603736877441406, 8.8018798828125, -0.04235076904296875, -13.609909057617188, 12.855291366577148, 24.17132568359375, 10.938720703125, 18.024213790893555, 25.598831176757812, 9.253715515136719, 2.228588104248047, 26.335521697998047, 22.82736587524414, 18.83304214477539, -2.8058929443359375, 25.425949096679688, -1.0590896606445312, 8.08349609375, -1.6654205322265625, 1.813232421875, 24.53108024597168, 22.19860076904297, -1.0625534057617188, 25.274337768554688, -0.15726470947265625, 4.5386810302734375, 7.059841156005859, 9.954093933105469, -15.684555053710938, 23.258934020996094, -3.593170166015625, 17.953163146972656, 7.9208984375, 33.9276123046875, 29.822647094726562, -5.445945739746094, 28.057178497314453, 18.13671875, -21.319656372070312, 21.694122314453125, -5.7845458984375, 8.3331298828125, 18.116119384765625, 2.5179786682128906, -9.153200149536133, 30.9296875, 15.27471923828125, 3.69891357421875, -1.566619873046875, 8.550369262695312, 10.649845123291016], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000576.npy"}
{"epoch": 0.8707482993197279, "step": 577, "batch_size": 64, "mean": 5.949678897857666, "std": 12.540973663330078, "min": -27.497894287109375, "p10": -5.323233604431152, "median": 3.8467178344726562, "p90": 22.55973873138428, "max": 40.494468688964844, "pos_frac": 0.65625, "sample": [19.92384910583496, 6.561222076416016, -9.064048767089844, 5.647846221923828, -2.3109283447265625, -0.0276336669921875, -0.7321624755859375, -3.722378730773926, 17.136077880859375, 7.942710876464844, -3.2697830200195312, 21.933557510375977, 10.335052490234375, -0.882598876953125, -13.514720916748047, 32.74475860595703, 30.07318115234375, 35.723724365234375, -27.497894287109375, 10.135269165039062, 13.522552490234375, 10.410369873046875, -0.359405517578125, 22.828102111816406, -0.7384452819824219, -7.605628967285156, -4.987091064453125, 12.039226531982422, 3.1458873748779297, 7.508453369140625, 0.8797416687011719, 4.319034576416016, -4.715339660644531, 2.776449203491211, -1.563568115234375, 6.713844299316406, 3.440135955810547, 2.5267181396484375, 4.334293365478516, 2.143819808959961, -4.035068511962891, 1.2525405883789062, 10.073707580566406, 3.635974884033203, 40.494468688964844, 12.451431274414062, -10.836860656738281, 6.296600341796875, 3.610076904296875, 24.155654907226562, -4.400962829589844, 6.224967956542969, 34.16961669921875, 15.568340301513672, 3.6783065795898438, 4.567401885986328, -17.369247436523438, -5.467294692993164, -1.8003520965576172, 13.675979614257812, 7.999011993408203, 21.664676666259766, 4.015129089355469, -2.598907470703125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000577.npy"}
{"epoch": 0.872260015117158, "step": 578, "batch_size": 64, "mean": 5.471958160400391, "std": 10.968921661376953, "min": -14.421676635742188, "p10": -7.2530975341796875, "median": 3.3607254028320312, "p90": 21.63063735961915, "max": 30.713241577148438, "pos_frac": 0.609375, "sample": [-2.576435089111328, -3.9636878967285156, -1.33892822265625, 0.634002685546875, 5.6062774658203125, -8.1781005859375, 7.801971435546875, 14.16269302368164, 13.101890563964844, -1.9632415771484375, 0.7825927734375, 18.26287841796875, -2.438701629638672, 11.961624145507812, 16.29071044921875, -7.43157958984375, 22.591659545898438, 15.747955322265625, 10.559562683105469, 5.397514343261719, 16.941932678222656, 5.621543884277344, 1.2795524597167969, 8.699271202087402, -1.1556720733642578, 8.895580291748047, 17.181068420410156, -0.15062713623046875, -1.476287841796875, -6.836639404296875, 29.78534698486328, 9.538494110107422, 27.568950653076172, 2.54339599609375, -7.927921295166016, 2.6523590087890625, 4.47273063659668, 9.533439636230469, 7.6246337890625, 14.509037017822266, 27.58142852783203, -0.2943878173828125, 10.555084228515625, -2.313516616821289, -1.4592132568359375, -9.939815521240234, -5.1632537841796875, 19.38825225830078, 7.864191055297852, 25.62729263305664, 1.5988922119140625, -11.15936279296875, 30.713241577148438, -0.4796791076660156, -5.8279876708984375, -13.324148178100586, 4.069091796875, -3.1539382934570312, 2.153076171875, -3.513029098510742, -14.421676635742188, 6.627315521240234, -4.5588226318359375, 25.325454711914062], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000578.npy"}
{"epoch": 0.873771730914588, "step": 579, "batch_size": 64, "mean": 9.586578369140625, "std": 13.53989315032959, "min": -25.88885498046875, "p10": -3.9569885253906247, "median": 7.974128723144531, "p90": 28.2090316772461, "max": 42.290565490722656, "pos_frac": 0.8125, "sample": [8.318565368652344, 4.678672790527344, 7.811065673828125, 9.941291809082031, 8.783203125, -19.25409698486328, 20.58837890625, 6.730751037597656, 5.034576416015625, 28.732177734375, 19.50299072265625, 11.291511535644531, 17.846298217773438, 42.290565490722656, 0.1610088348388672, 6.204990386962891, 23.989364624023438, 12.247459411621094, -8.115425109863281, 31.5074462890625, 36.17559814453125, 1.8509864807128906, 21.71092987060547, 0.2031269073486328, -0.3715972900390625, -2.9752120971679688, 32.4176025390625, 13.968772888183594, 26.988357543945312, 5.612861633300781, -2.003173828125, 5.900245666503906, 7.6773834228515625, 12.231300354003906, 8.675315856933594, 19.09011459350586, 19.751731872558594, 22.631179809570312, -3.9886474609375, 25.09774398803711, 1.62811279296875, 5.521846771240234, -3.6810474395751953, -19.73009490966797, 10.69891357421875, 6.87249755859375, -8.681060791015625, -25.88885498046875, 36.65983581542969, 2.5852813720703125, 8.137191772460938, 9.870597839355469, -3.88311767578125, 21.157779693603516, 8.543022155761719, 2.5649242401123047, 30.831039428710938, 19.626388549804688, 1.0958404541015625, 5.535789489746094, -9.801414489746094, 4.698394775390625, 14.27227783203125, 5.971473693847656], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000579.npy"}
{"epoch": 0.8752834467120182, "step": 580, "batch_size": 64, "mean": 8.081050872802734, "std": 13.373946189880371, "min": -24.605777740478516, "p10": -6.34393310546875, "median": 6.63062858581543, "p90": 26.818779754638673, "max": 35.68879699707031, "pos_frac": 0.765625, "sample": [-6.3132781982421875, -4.8555145263671875, -10.173789978027344, 3.0553150177001953, -1.862701416015625, 32.99647521972656, 22.24309730529785, 12.28741455078125, 8.840194702148438, 0.8547172546386719, 17.743057250976562, 13.981678009033203, 2.723602294921875, 3.65008544921875, -6.3570709228515625, 3.665313720703125, 20.380870819091797, 8.08380126953125, -5.177375793457031, 8.007476806640625, 3.0965576171875, 11.371664047241211, 29.63597869873047, 3.6215972900390625, 16.365474700927734, -3.2894515991210938, 35.68879699707031, 6.0374298095703125, -4.310426712036133, 31.900054931640625, 25.466903686523438, 11.847679138183594, 4.332923889160156, 7.223827362060547, -9.462730407714844, 26.45934295654297, -24.605777740478516, -23.181537628173828, 0.21945953369140625, 31.70355224609375, -22.986560821533203, 14.638916015625, -2.5354080200195312, 16.889617919921875, 26.972824096679688, 5.0229644775390625, -1.6155586242675781, 4.494075775146484, 22.109207153320312, 4.6187286376953125, 27.348045349121094, 15.691204071044922, 11.135635375976562, -9.681095123291016, 2.2469749450683594, 1.0536670684814453, 17.34747314453125, 12.041606903076172, 10.299232482910156, 18.789093017578125, 13.616691589355469, 4.817596435546875, 1.7615127563476562, 19.216089248657227], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000580.npy"}
{"epoch": 0.8767951625094482, "step": 581, "batch_size": 64, "mean": 8.385841369628906, "std": 13.618674278259277, "min": -19.305282592773438, "p10": -9.430052185058592, "median": 7.446596145629883, "p90": 27.742429351806646, "max": 41.93701171875, "pos_frac": 0.734375, "sample": [20.024211883544922, 9.708549499511719, 11.724189758300781, 4.788440704345703, -7.50604248046875, 5.5879364013671875, 11.111618041992188, -1.9496307373046875, 28.913742065429688, 16.3599853515625, -13.506484985351562, 3.5296363830566406, 3.667034149169922, 12.342819213867188, 4.6163177490234375, -4.414556503295898, 21.26036834716797, 6.81488037109375, 20.663965225219727, 3.5977783203125, 32.47650146484375, 13.049388885498047, 31.7386474609375, -15.601348876953125, 9.217021942138672, 32.093650817871094, 30.90772247314453, -2.0067367553710938, 22.891586303710938, 1.9729232788085938, -9.963104248046875, 5.1736602783203125, 3.5944690704345703, 18.882904052734375, 2.3208389282226562, 14.459381103515625, 2.1105728149414062, -16.35369873046875, 41.93701171875, 26.488723754882812, 28.27973175048828, -8.186264038085938, 16.392318725585938, 6.692543029785156, 8.799686431884766, -16.781017303466797, 7.00799560546875, 2.3089752197265625, -4.626426696777344, -16.464866638183594, 17.941070556640625, 12.443679809570312, 19.670242309570312, 11.538673400878906, -0.33255577087402344, 23.993484497070312, -0.8428688049316406, -19.305282592773438, -3.3190231323242188, 14.40841293334961, 13.348533630371094, 15.481216430664062, 7.885196685791016, -2.3645095825195312], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000581.npy"}
{"epoch": 0.8783068783068783, "step": 582, "batch_size": 64, "mean": 8.123781204223633, "std": 14.082547187805176, "min": -27.387374877929688, "p10": -8.41414566040039, "median": 6.836008071899414, "p90": 25.511622619628906, "max": 49.015594482421875, "pos_frac": 0.75, "sample": [13.844490051269531, 49.015594482421875, 7.060108184814453, 8.739677429199219, 2.2467193603515625, -7.105247497558594, 28.102935791015625, 1.6819381713867188, 16.465560913085938, 25.38391876220703, 4.285179138183594, 15.895767211914062, 20.41986083984375, 36.60198974609375, 2.9137344360351562, 21.662429809570312, 20.256744384765625, -7.212799072265625, 12.974189758300781, 23.33099365234375, 6.10020637512207, 6.208183288574219, 6.458316802978516, 16.55242919921875, 15.088020324707031, 12.0316162109375, 2.1738414764404297, -2.1998748779296875, -16.489517211914062, -16.529266357421875, -0.19211578369140625, 12.849090576171875, 25.56635284423828, -8.929008483886719, -3.3008499145507812, 13.481884002685547, 29.818771362304688, 15.024696350097656, 27.573257446289062, -1.2279205322265625, 10.087432861328125, 18.262596130371094, 2.63446044921875, 11.15334701538086, -14.238224029541016, 2.8796844482421875, 3.9188385009765625, 15.7657470703125, 2.9429244995117188, -22.644866943359375, 0.09827232360839844, 5.542823791503906, -3.9895029067993164, -1.2407779693603516, 1.6761856079101562, 9.890731811523438, 16.98052978515625, -27.387374877929688, 18.740339279174805, -17.552536010742188, 6.611907958984375, -1.1258888244628906, 18.291526794433594, 26.00193977355957], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000582.npy"}
{"epoch": 0.8798185941043084, "step": 583, "batch_size": 64, "mean": 7.07718563079834, "std": 14.977029800415039, "min": -23.289430618286133, "p10": -10.41116256713867, "median": 5.354011535644531, "p90": 28.393214416503913, "max": 43.07792663574219, "pos_frac": 0.640625, "sample": [-5.662117004394531, -3.184253692626953, -15.222373962402344, -4.945243835449219, 43.07792663574219, 14.368160247802734, -13.739555358886719, -4.499874114990234, -12.682212829589844, -6.561195373535156, 10.64910888671875, -18.44577407836914, 6.065242767333984, 20.75701904296875, 6.453311920166016, -2.833568572998047, 30.200897216796875, -11.54019546508789, 42.15658187866211, 19.532278060913086, 23.046619415283203, -14.541343688964844, 17.623031616210938, -2.4848403930664062, 11.68438720703125, 6.831119537353516, 10.700531005859375, -7.5066986083984375, 6.0018768310546875, 26.962730407714844, -0.5757293701171875, -1.9322757720947266, 19.837738037109375, 2.829437255859375, 3.26361083984375, 31.912551879882812, -6.268035888671875, -7.776752471923828, 11.45458984375, 37.93436813354492, 1.3041820526123047, 21.23895263671875, -23.289430618286133, 19.743141174316406, -6.218776702880859, 23.819015502929688, 17.144384384155273, 11.320159912109375, 4.8030853271484375, -6.091917037963867, 5.904937744140625, 6.6222686767578125, 3.6184654235839844, -0.2873954772949219, 9.290590286254883, 4.414306640625, 2.9686412811279297, 17.51569366455078, 0.10124015808105469, 13.088371276855469, 36.41095733642578, 29.00627899169922, -2.64544677734375, 0.21707725524902344], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000583.npy"}
{"epoch": 0.8813303099017384, "step": 584, "batch_size": 64, "mean": 7.888943195343018, "std": 13.843225479125977, "min": -14.696304321289062, "p10": -9.036968612670899, "median": 5.933420181274414, "p90": 27.750355529785164, "max": 46.502227783203125, "pos_frac": 0.65625, "sample": [32.11308288574219, 4.7735137939453125, 5.928440093994141, -0.7367019653320312, 8.827987670898438, -6.604772567749023, 30.498336791992188, 17.856826782226562, 22.51953125, 14.857650756835938, 5.096202850341797, -2.052377700805664, -7.428043365478516, -13.654411315917969, -9.181385040283203, 10.271522521972656, 4.569793701171875, 32.2869873046875, 21.238384246826172, -14.696304321289062, 19.743961334228516, 26.192787170410156, -3.287647247314453, 25.438270568847656, 12.984405517578125, 1.4249343872070312, 7.021522521972656, 7.8888092041015625, 3.3500900268554688, -0.0951080322265625, 13.934362411499023, -8.699996948242188, 10.251277923583984, 3.4676971435546875, -11.18789291381836, 34.493202209472656, -5.306663513183594, -0.1813507080078125, -0.46221160888671875, -7.218574523925781, 24.14771270751953, 28.417884826660156, -10.662445068359375, 11.77341079711914, 13.364994049072266, 11.287139892578125, 1.764181137084961, 14.228973388671875, 8.492942810058594, -1.79632568359375, 5.9384002685546875, 7.2227020263671875, -0.4922294616699219, 25.07038116455078, 2.7217674255371094, 46.502227783203125, 33.44859313964844, 2.5552444458007812, 6.321006774902344, -1.7561416625976562, -1.1643829345703125, 19.393207550048828, -13.505302429199219, -14.617733001708984], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000584.npy"}
{"epoch": 0.8828420256991686, "step": 585, "batch_size": 64, "mean": 5.9494829177856445, "std": 12.256616592407227, "min": -28.328590393066406, "p10": -6.16638298034668, "median": 5.826751708984375, "p90": 21.576072692871097, "max": 35.173763275146484, "pos_frac": 0.734375, "sample": [0.3650054931640625, 10.946563720703125, 1.5680923461914062, 7.1694793701171875, 10.745868682861328, 12.189788818359375, 6.644752502441406, 19.730743408203125, -3.9983596801757812, 0.8994636535644531, -28.328590393066406, 2.600566864013672, 22.5433349609375, 5.234153747558594, 0.21144866943359375, 20.337005615234375, -18.04180908203125, 6.9487152099609375, 10.497848510742188, 0.6605606079101562, 15.439720153808594, 4.353725433349609, 1.1662235260009766, -1.6569976806640625, 8.615352630615234, 4.172405242919922, -1.98590087890625, 8.217498779296875, 13.448028564453125, 7.258171081542969, -6.173225402832031, 6.053855895996094, 22.107101440429688, -1.9083728790283203, -14.471565246582031, 5.599647521972656, 4.906951904296875, 9.844402313232422, 30.888816833496094, 12.02077865600586, -0.2664146423339844, 10.577018737792969, 7.522674560546875, -20.940868377685547, -21.416107177734375, 29.766841888427734, 31.51251220703125, 22.792156219482422, -3.1537399291992188, -8.076622009277344, -6.150417327880859, -0.956146240234375, 17.274337768554688, 12.876903533935547, 10.20750617980957, -4.0595245361328125, 4.875701904296875, 16.820362091064453, 35.173763275146484, 9.532485961914062, 1.027862548828125, -1.3958663940429688, 4.999629974365234, 15.401641845703125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000585.npy"}
{"epoch": 0.8843537414965986, "step": 586, "batch_size": 64, "mean": 9.045629501342773, "std": 12.068526268005371, "min": -25.03502655029297, "p10": -4.007038307189941, "median": 7.393317222595215, "p90": 24.694372558593752, "max": 40.33949279785156, "pos_frac": 0.734375, "sample": [14.591323852539062, -2.707042694091797, 11.186023712158203, 12.759185791015625, 5.8755950927734375, 18.835838317871094, 7.296108245849609, -4.148632049560547, 9.79412841796875, 8.769634246826172, 23.654735565185547, -7.853691101074219, 7.49052619934082, 26.00921630859375, 23.91259765625, -1.252492904663086, 6.583282470703125, 6.61688232421875, 7.225837707519531, 30.471012115478516, -3.1346206665039062, 22.95687484741211, 5.902820587158203, 6.843143463134766, 13.706146240234375, -3.6766529083251953, 6.846160888671875, 1.6889495849609375, 19.147621154785156, 11.856010437011719, 28.820350646972656, 16.72937774658203, 6.746788024902344, 25.0294189453125, 16.065963745117188, -1.7184219360351562, -15.321044921875, 40.33949279785156, 3.4398765563964844, 6.691032409667969, -1.6305370330810547, -4.895744323730469, 4.3640289306640625, 10.165634155273438, 18.177093505859375, -2.9448013305664062, -0.2720489501953125, 9.270885467529297, -4.434141159057617, 10.784839630126953, 20.138473510742188, -0.63360595703125, 13.330078125, -1.0760307312011719, 33.314453125, 0.993927001953125, -25.03502655029297, 27.030235290527344, -10.401260375976562, 8.378921508789062, 22.825393676757812, 5.7997589111328125, 22.43665313720703, 9.163726806640625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000586.npy"}
{"epoch": 0.8858654572940288, "step": 587, "batch_size": 64, "mean": 9.30241870880127, "std": 13.990851402282715, "min": -25.13885498046875, "p10": -6.397534561157226, "median": 7.988765716552734, "p90": 26.823165130615234, "max": 43.31153106689453, "pos_frac": 0.75, "sample": [-2.5675506591796875, -12.420944213867188, 1.3716621398925781, 7.452888488769531, 18.73815155029297, -9.473236083984375, 6.1957550048828125, -13.738906860351562, 18.328262329101562, 7.1602020263671875, 11.227800369262695, 18.379085540771484, -6.5474395751953125, 2.8650741577148438, 2.6619186401367188, 4.915618896484375, 22.826297760009766, 0.8066558837890625, -4.0086212158203125, 39.293601989746094, 14.579673767089844, 10.626541137695312, -0.5844936370849609, 32.163169860839844, 11.34771728515625, 12.04323959350586, 43.31153106689453, -2.06793212890625, -25.13885498046875, 1.0841407775878906, -2.312713623046875, 4.469425201416016, 14.609474182128906, 26.628944396972656, 5.495990753173828, 14.249839782714844, 18.429676055908203, -13.81280517578125, -17.219558715820312, 26.906402587890625, 33.7086181640625, -5.558189392089844, 33.199005126953125, 7.0957489013671875, 1.8595199584960938, 26.08383560180664, 11.378852844238281, 25.315128326416016, 20.82784652709961, 7.47149658203125, 24.335350036621094, -3.4772415161132812, 16.755537033081055, 2.2792816162109375, 12.245418548583984, 11.033905029296875, 13.480243682861328, -6.047756195068359, -0.3786201477050781, 8.506034851074219, 14.354850769042969, 28.112274169921875, 23.918060302734375, 0.5799026489257812], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000587.npy"}
{"epoch": 0.8873771730914588, "step": 588, "batch_size": 64, "mean": 8.139813423156738, "std": 11.138357162475586, "min": -24.651947021484375, "p10": -3.161301040649414, "median": 9.286666870117188, "p90": 19.23958263397217, "max": 38.15187072753906, "pos_frac": 0.796875, "sample": [-3.24920654296875, -2.7378082275390625, 12.606277465820312, 12.479171752929688, -0.352203369140625, 10.395179748535156, 16.57073974609375, 6.101409912109375, 9.487960815429688, 7.9489593505859375, 32.771881103515625, 14.086860656738281, 9.799610137939453, 14.28805160522461, 9.085372924804688, 10.107545852661133, 38.15187072753906, 32.91172790527344, 12.133712768554688, 24.284454345703125, 5.9944915771484375, 10.772048950195312, -0.7074375152587891, 17.712722778320312, 12.596904754638672, 19.446823120117188, 4.716011047363281, 0.47626495361328125, 4.312095642089844, -24.651947021484375, 3.4305906295776367, 11.553565979003906, -16.567102432250977, 6.34283447265625, 12.193988800048828, 22.667770385742188, -1.0412330627441406, 14.716842651367188, 17.126968383789062, 0.14345550537109375, 0.339263916015625, -15.166370391845703, 0.7846107482910156, 8.1070556640625, 14.779922485351562, 6.107078552246094, 18.75602149963379, -0.7453556060791016, 11.030281066894531, 22.47808837890625, 9.79803466796875, -7.9039306640625, -2.956188201904297, 2.8773651123046875, 7.161842346191406, 15.687255859375, 1.5035400390625, 17.27301788330078, 4.32684326171875, -5.4400482177734375, 5.826179504394531, 14.859771728515625, 14.763931274414062, -11.40740966796875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000588.npy"}
{"epoch": 0.8888888888888888, "step": 589, "batch_size": 64, "mean": 9.833610534667969, "std": 12.840791702270508, "min": -22.336746215820312, "p10": -2.5799295425415036, "median": 9.55181884765625, "p90": 27.619317054748535, "max": 39.29742431640625, "pos_frac": 0.859375, "sample": [5.482181549072266, 8.085617065429688, -11.171165466308594, 27.314136505126953, 16.382366180419922, 8.444038391113281, 10.260917663574219, 0.01943206787109375, 4.5187835693359375, 2.721406936645508, 1.46051025390625, -6.481483459472656, 39.29742431640625, 19.221923828125, 8.555717468261719, -0.296142578125, 10.224319458007812, 16.53583526611328, 21.771156311035156, 25.81597137451172, 3.780853271484375, 11.940902709960938, 17.375707626342773, -19.786102294921875, 18.285385131835938, 6.9071807861328125, 1.1394500732421875, 10.201255798339844, 18.17483139038086, 12.488280296325684, 8.858489990234375, 9.423187255859375, 28.250232696533203, 14.304824829101562, 1.57049560546875, -2.6371803283691406, 8.443458557128906, 11.504013061523438, 11.542869567871094, 30.32052230834961, 15.888313293457031, 34.445552825927734, 26.207088470458984, 3.7783279418945312, 14.661624908447266, 0.787139892578125, 3.7083053588867188, -15.773738861083984, -2.4463443756103516, 10.409408569335938, -22.336746215820312, 15.691337585449219, 37.92032241821289, -20.98571014404297, 11.087051391601562, 13.000064849853516, 27.75010871887207, 7.7295379638671875, 4.649101257324219, 6.092643737792969, 9.680450439453125, 1.094146728515625, 28.316421508789062, 7.745086669921875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000589.npy"}
{"epoch": 0.890400604686319, "step": 590, "batch_size": 64, "mean": 9.178227424621582, "std": 12.860471725463867, "min": -28.653404235839844, "p10": -2.9771026611328124, "median": 6.837215423583984, "p90": 28.733661651611328, "max": 36.1134033203125, "pos_frac": 0.734375, "sample": [20.98241424560547, 5.490631103515625, 17.76470947265625, -1.2859115600585938, 10.60089111328125, 36.1134033203125, 4.9623260498046875, -28.653404235839844, -2.570420265197754, -2.30419921875, 2.1877975463867188, -3.2653770446777344, 7.901649475097656, 5.393890380859375, -17.385299682617188, 0.5556373596191406, 16.81402587890625, 3.84130859375, -0.02623748779296875, 0.804656982421875, -3.9742965698242188, 7.0854644775390625, -2.2413673400878906, 27.309764862060547, 34.12085723876953, -3.0328216552734375, 7.937557220458984, 8.537139892578125, 23.232284545898438, 7.436981201171875, 8.043939590454102, 17.241836547851562, 6.463535308837891, 1.71478271484375, 15.801620483398438, 28.91185760498047, 22.146217346191406, 6.2473907470703125, 2.94940185546875, 2.754230499267578, 25.411415100097656, 6.588966369628906, 32.87602233886719, 16.564247131347656, 34.02888870239258, -1.323974609375, -0.125335693359375, 8.957023620605469, 4.31671142578125, 10.420257568359375, -2.8470916748046875, 22.62847900390625, 10.430675506591797, -0.5063705444335938, 0.02923583984375, 28.91779327392578, 16.266231536865234, 17.215492248535156, 31.392200469970703, -3.5381851196289062, 12.494163513183594, -0.39361572265625, 28.31787109375, -7.323432922363281], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000590.npy"}
{"epoch": 0.891912320483749, "step": 591, "batch_size": 64, "mean": 9.116888046264648, "std": 12.8169584274292, "min": -18.44989776611328, "p10": -6.815749740600585, "median": 8.949226379394531, "p90": 26.797944831848145, "max": 40.802001953125, "pos_frac": 0.734375, "sample": [-1.5171709060668945, 28.079681396484375, 8.783458709716797, 14.302734375, 19.34423828125, 17.262954711914062, 26.941761016845703, 9.128173828125, 15.294178009033203, 28.17807388305664, 7.006561279296875, -11.960403442382812, 11.132667541503906, -8.544830322265625, 2.1511802673339844, -7.219081878662109, 9.114994049072266, 4.779899597167969, 10.522186279296875, 6.468935012817383, 24.221954345703125, 17.25069808959961, 19.6461181640625, -4.241386413574219, 21.007904052734375, -16.355079650878906, 17.383941650390625, 13.71139144897461, 2.9681053161621094, -0.083587646484375, -14.934379577636719, -3.744487762451172, 29.076919555664062, 40.802001953125, -1.4018230438232422, 28.444618225097656, -18.44989776611328, 15.704917907714844, 16.591094970703125, 0.6160144805908203, 3.5736160278320312, -5.434216499328613, 13.314323425292969, 14.528507232666016, 1.7621822357177734, 21.218612670898438, -2.710719108581543, -9.967437744140625, 4.219085693359375, 19.30609130859375, -5.874641418457031, -3.680389404296875, 27.173751831054688, 8.414459228515625, 4.451423645019531, 26.462373733520508, 19.642501831054688, 5.112573623657227, 4.131805419921875, 11.262031555175781, 24.58563232421875, -0.1290283203125, 2.152599334716797, 22.50048828125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000591.npy"}
{"epoch": 0.8934240362811792, "step": 592, "batch_size": 64, "mean": 9.00094985961914, "std": 13.638615608215332, "min": -14.160163879394531, "p10": -4.968031883239745, "median": 4.215885162353516, "p90": 30.44575309753419, "max": 53.567840576171875, "pos_frac": 0.734375, "sample": [-1.4427108764648438, 35.12480163574219, 17.93933868408203, 10.718568801879883, 19.402015686035156, 33.792911529541016, 27.94381332397461, -7.138099670410156, 9.615928649902344, 3.5522003173828125, -5.907798767089844, 14.495903015136719, 7.2239837646484375, 3.9293594360351562, 3.0098609924316406, 26.264442443847656, 31.51801300048828, 14.842117309570312, -2.8761024475097656, -1.8209953308105469, -12.159561157226562, 21.13067626953125, 14.392965316772461, 27.142959594726562, -0.016357421875, 16.075042724609375, 3.2614822387695312, 40.36827850341797, 4.502410888671875, 14.359123229980469, 5.667781829833984, 9.555578231811523, -5.206947326660156, 3.4296875, 1.4801788330078125, 5.973880767822266, -14.160163879394531, 9.864898681640625, -8.710428237915039, -4.410562515258789, -6.684755325317383, 8.599620819091797, 0.1484832763671875, -1.2148666381835938, 1.990478515625, 16.764907836914062, 8.9771728515625, 37.73382568359375, 34.23204040527344, 0.047332763671875, 53.567840576171875, -0.9256696701049805, 1.9808807373046875, 7.389657974243164, -1.7252845764160156, 3.621246337890625, 12.62567138671875, 3.8689956665039062, -0.7035369873046875, 3.589580535888672, 2.4438629150390625, -0.6737136840820312, 2.0696182250976562, 15.608936309814453], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000592.npy"}
{"epoch": 0.8949357520786092, "step": 593, "batch_size": 64, "mean": 8.658736228942871, "std": 13.978523254394531, "min": -21.845909118652344, "p10": -7.363738250732422, "median": 7.727821350097656, "p90": 28.28304252624512, "max": 40.111778259277344, "pos_frac": 0.71875, "sample": [4.617229461669922, 6.126014709472656, 15.345703125, 5.558380126953125, 22.685325622558594, -7.4284210205078125, 0.7313690185546875, -6.5107574462890625, 35.139183044433594, 13.052490234375, 17.32357406616211, 3.481365203857422, 13.149871826171875, 30.542552947998047, 6.7491302490234375, -2.831707000732422, 14.884906768798828, 8.706512451171875, 9.359786987304688, 9.283061981201172, 4.546722412109375, 27.846057891845703, 20.750732421875, 26.671546936035156, 2.2796974182128906, 9.297115325927734, -7.139961242675781, 27.5997371673584, 5.4379119873046875, -19.594078063964844, 20.55437469482422, -7.410686492919922, -5.477973937988281, 19.36072540283203, 12.86007308959961, 14.887243270874023, 4.853716850280762, -12.45132064819336, 29.99188995361328, -2.9646759033203125, -15.126335144042969, -6.552066802978516, 28.470321655273438, -9.821281433105469, 38.403926849365234, 19.85457992553711, 35.257835388183594, 9.679302215576172, -7.254192352294922, 40.111778259277344, -1.1071319580078125, -1.9826126098632812, 13.076400756835938, 9.070022583007812, 16.053451538085938, -21.845909118652344, 5.82752799987793, 5.521568298339844, -0.9082469940185547, -4.540737152099609, 10.5572509765625, 3.9807968139648438, 3.7806396484375, 11.787790298461914], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000593.npy"}
{"epoch": 0.8964474678760394, "step": 594, "batch_size": 64, "mean": 7.293941497802734, "std": 12.264134407043457, "min": -16.287986755371094, "p10": -6.490124320983886, "median": 4.390399932861328, "p90": 21.201855087280276, "max": 44.280208587646484, "pos_frac": 0.671875, "sample": [5.9665679931640625, 14.79632568359375, -0.5400848388671875, 20.901813507080078, 20.235328674316406, 21.3304443359375, 4.644081115722656, -2.286834716796875, 10.730239868164062, -2.8391189575195312, -0.611602783203125, -7.592918395996094, -6.381898880004883, 4.762054443359375, 1.5417900085449219, -1.2740516662597656, -3.4546051025390625, -5.896228790283203, 20.220306396484375, 8.888294219970703, 17.35095977783203, 2.3209915161132812, 4.226081848144531, 2.2892303466796875, 44.280208587646484, 10.064056396484375, 3.9285240173339844, -8.900184631347656, 16.805511474609375, -1.7603588104248047, 29.459930419921875, 12.407012939453125, 36.871124267578125, 12.299442291259766, 1.1900482177734375, -4.1850128173828125, 7.2004547119140625, 10.884071350097656, 15.88228988647461, -6.536506652832031, 18.784744262695312, -2.2722702026367188, 18.548885345458984, 31.682788848876953, 0.8622093200683594, 10.115074157714844, 3.8299903869628906, 20.748146057128906, 0.3023185729980469, 24.590049743652344, 18.761383056640625, -16.287986755371094, -0.4355621337890625, 29.915855407714844, 4.554718017578125, 12.740425109863281, -2.9341964721679688, 3.1267852783203125, 1.8531570434570312, -3.1379852294921875, 7.774192810058594, -7.351676940917969, -6.777099609375, -11.39947509765625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000594.npy"}
{"epoch": 0.8979591836734694, "step": 595, "batch_size": 64, "mean": 10.371227264404297, "std": 14.964183807373047, "min": -31.943973541259766, "p10": -7.121761131286621, "median": 10.464609146118164, "p90": 28.908314132690435, "max": 47.61272430419922, "pos_frac": 0.78125, "sample": [-2.3819427490234375, 12.522354125976562, 30.84173583984375, 7.266448974609375, 21.576141357421875, 14.341426849365234, 13.658676147460938, 8.996631622314453, 3.719268798828125, 29.331073760986328, 20.449783325195312, 4.085292816162109, -1.531494140625, -19.260398864746094, 32.38528060913086, 2.500995635986328, 4.609579086303711, -6.832880020141602, 23.85250473022461, 1.7247810363769531, 8.021026611328125, 13.865203857421875, 22.410396575927734, 12.80838394165039, 27.921875, 4.845863342285156, 5.080146789550781, 15.646247863769531, 12.656524658203125, 16.36632537841797, -1.4702415466308594, -15.0953369140625, 9.89675521850586, -6.5482635498046875, 25.53859519958496, 9.32357406616211, 32.261131286621094, -8.789443969726562, -31.943973541259766, 47.61272430419922, 45.12281799316406, 32.73329162597656, 23.301910400390625, 10.90301513671875, -12.842422485351562, 16.814849853515625, 17.854171752929688, 19.49039077758789, 2.1079025268554688, 8.150032043457031, 0.2484111785888672, 17.52752685546875, 10.596134185791016, 8.190132141113281, 10.333084106445312, 8.372509002685547, 23.088272094726562, 10.831083297729492, -7.245567321777344, -6.7903900146484375, -2.7738332748413086, -19.363197326660156, 20.759319305419922, 24.086341857910156], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000595.npy"}
{"epoch": 0.8994708994708994, "step": 596, "batch_size": 64, "mean": 4.189225673675537, "std": 12.251760482788086, "min": -23.46917724609375, "p10": -7.59594383239746, "median": 0.4661235809326172, "p90": 23.363282394409204, "max": 38.331260681152344, "pos_frac": 0.53125, "sample": [-0.39821624755859375, 5.8973236083984375, 26.011672973632812, -4.915912628173828, 7.902027130126953, -8.47762680053711, -1.7662773132324219, 6.375820159912109, 1.4972763061523438, 9.09598159790039, -5.070178985595703, 3.4767608642578125, -5.4514923095703125, -1.6400985717773438, 31.93927001953125, 28.717266082763672, 10.138790130615234, -1.7203216552734375, -13.461471557617188, 8.04815673828125, -5.1969451904296875, 7.864418029785156, 0.3019371032714844, 3.7671985626220703, 34.45811462402344, 6.0622406005859375, -1.4117507934570312, -6.975621223449707, 3.163990020751953, 1.10595703125, -4.651584625244141, -1.1534042358398438, 15.393852233886719, -1.9748954772949219, 10.334457397460938, -2.916452407836914, -11.995086669921875, 34.62866973876953, -0.6691741943359375, -10.412689208984375, -1.9117012023925781, 7.644187927246094, -0.7552413940429688, -23.46917724609375, -1.1164703369140625, 9.140972137451172, -7.094669342041016, -7.928994178771973, 28.999908447265625, 7.020881652832031, 17.183704376220703, 3.9366683959960938, -7.8107757568359375, 3.5455856323242188, -3.3041229248046875, 38.331260681152344, -1.1121482849121094, 11.359928131103516, 0.18471527099609375, -1.5187263488769531, 15.74822998046875, 17.040054321289062, 0.63031005859375, -2.5559158325195312], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000596.npy"}
{"epoch": 0.9009826152683296, "step": 597, "batch_size": 64, "mean": 8.662214279174805, "std": 13.010931015014648, "min": -24.881086349487305, "p10": -6.26498908996582, "median": 6.776947975158691, "p90": 28.785075759887704, "max": 42.86152648925781, "pos_frac": 0.765625, "sample": [1.1764450073242188, -24.881086349487305, -2.5128135681152344, 2.248138427734375, 31.120468139648438, 6.467094421386719, 4.759276390075684, -1.946136474609375, -5.799160003662109, 29.836910247802734, -9.712654113769531, 11.5462646484375, 4.248910903930664, 2.7739715576171875, -8.605674743652344, 7.601785659790039, 30.016014099121094, 26.330795288085938, 13.144309997558594, -4.34857177734375, 14.04339599609375, 12.619377136230469, -17.716903686523438, -6.867637634277344, 31.98278045654297, 13.684659957885742, 3.515941619873047, -6.464630126953125, 12.821121215820312, 0.10126686096191406, 8.474325180053711, 21.718101501464844, 33.432533264160156, 16.1226806640625, 3.7196578979492188, 12.561203002929688, -0.0662841796875, 4.343360900878906, 12.197038650512695, 30.206092834472656, 7.086801528930664, 1.6990966796875, 2.5864334106445312, 2.0298080444335938, 6.254859924316406, -0.23509979248046875, -9.077625274658203, 0.05300140380859375, 11.245811462402344, 2.7698516845703125, 42.86152648925781, 13.721321105957031, 22.756027221679688, -4.940704345703125, -5.186794281005859, 12.535991668701172, 13.00910472869873, 13.781982421875, 20.835838317871094, 20.42481231689453, 5.294548034667969, 20.452693939208984, 21.708580017089844, 18.821502685546875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000597.npy"}
{"epoch": 0.9024943310657596, "step": 598, "batch_size": 64, "mean": 6.2312750816345215, "std": 12.755658149719238, "min": -22.1690673828125, "p10": -8.479900932312011, "median": 5.311352729797363, "p90": 21.624974060058594, "max": 39.808128356933594, "pos_frac": 0.65625, "sample": [5.436037063598633, -4.533897399902344, -0.467437744140625, 4.5345611572265625, 19.52159881591797, 1.5104141235351562, 4.194450378417969, 21.77685546875, 11.121196746826172, -10.279037475585938, 6.166168212890625, 5.186668395996094, 6.5073699951171875, 1.8596916198730469, 36.45988464355469, -13.219684600830078, 39.808128356933594, -11.55682373046875, 10.571815490722656, 17.32427978515625, -8.646142959594727, -17.005046844482422, -4.5299530029296875, 17.193653106689453, 1.6490325927734375, 2.2480831146240234, 7.11053466796875, 18.356597900390625, -8.092002868652344, -3.046905517578125, -4.881690979003906, -2.2947998046875, 7.371326446533203, 9.020126342773438, 18.967910766601562, 0.5049285888671875, 13.37506103515625, -0.0484771728515625, 21.270584106445312, 7.72137451171875, 23.238685607910156, -1.5479202270507812, 1.4731807708740234, 39.71432113647461, -3.0429840087890625, 15.645349502563477, 7.3184814453125, 1.0689735412597656, 14.568103790283203, 16.6712646484375, -3.924285888671875, -22.1690673828125, -1.2045974731445312, 6.173004150390625, 5.9797821044921875, -10.211845397949219, 22.69689178466797, 8.19952392578125, -5.747344970703125, -1.9549598693847656, 26.030426025390625, 18.647254943847656, -5.232025146484375, 18.244949340820312], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000598.npy"}
{"epoch": 0.9040060468631897, "step": 599, "batch_size": 64, "mean": 7.720818519592285, "std": 12.781628608703613, "min": -20.674129486083984, "p10": -6.130500221252441, "median": 5.335064888000488, "p90": 26.70118751525879, "max": 40.7767333984375, "pos_frac": 0.75, "sample": [-3.295989990234375, 15.7066650390625, 2.7059364318847656, 5.282958984375, 0.1162872314453125, 8.91670036315918, -5.349117279052734, 34.5230712890625, -3.337696075439453, -14.522933959960938, -9.004974365234375, 6.869157791137695, 2.105846405029297, 10.450080871582031, 17.961013793945312, -7.766181945800781, -0.898834228515625, 1.3875675201416016, 3.480801582336426, 5.854705810546875, 26.76516342163086, 12.600902557373047, 1.4698410034179688, -9.2384033203125, 0.44831275939941406, 2.76934814453125, 6.0980377197265625, 26.442588806152344, -20.674129486083984, 8.139549255371094, 29.663009643554688, 17.623451232910156, -1.6182441711425781, -5.35882568359375, 4.105888366699219, 3.1496353149414062, 1.2275867462158203, 9.644294738769531, -4.084659576416016, 11.312362670898438, 25.394184112548828, 4.882165908813477, 2.523212432861328, -6.461217880249023, 27.898155212402344, 17.568893432617188, 20.178512573242188, -13.341808319091797, 1.067840576171875, 8.830371856689453, 40.7767333984375, 11.5330810546875, 8.268196105957031, 5.387170791625977, 28.246932983398438, 15.766265869140625, 35.99244689941406, 8.96026611328125, 14.859909057617188, -1.5416374206542969, -4.226654052734375, 18.441524505615234, 26.551910400390625, 4.905120849609375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000599.npy"}
{"epoch": 0.9055177626606198, "step": 600, "batch_size": 64, "mean": 6.555037975311279, "std": 12.183257102966309, "min": -20.638999938964844, "p10": -7.807457160949707, "median": 6.189525604248047, "p90": 24.882865905761726, "max": 34.081260681152344, "pos_frac": 0.65625, "sample": [9.240768432617188, 23.25762176513672, 1.1011238098144531, -3.6781768798828125, 31.015426635742188, -0.30327606201171875, 11.206298828125, -7.744314193725586, 20.70275115966797, 25.57939910888672, -6.239246368408203, 9.94940185546875, 15.184501647949219, 14.537506103515625, -1.2479705810546875, -1.4763946533203125, -0.4762420654296875, -8.446325302124023, 5.4210205078125, 16.091140747070312, 5.647552490234375, 7.449623107910156, 22.132415771484375, 25.920196533203125, -0.5063629150390625, 15.880889892578125, 5.362419128417969, 34.081260681152344, -0.8969001770019531, 20.85283660888672, 8.859512329101562, 6.665924072265625, -12.804725646972656, -10.267181396484375, -3.6617050170898438, -18.86510467529297, 3.5989227294921875, -7.8345184326171875, 0.5614509582519531, 25.874237060546875, -4.429821014404297, 12.47283935546875, 6.434856414794922, 6.221649169921875, -0.41138458251953125, 10.922954559326172, -13.480712890625, -4.912286758422852, -20.638999938964844, 20.237564086914062, 7.572193145751953, 6.157402038574219, 4.220115661621094, 6.47198486328125, 7.904895782470703, -0.6058349609375, 7.9393768310546875, -2.6387481689453125, 12.615665435791016, 31.484725952148438, 2.0485076904296875, 0.353851318359375, 15.152908325195312, 26.702964782714844], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000600.npy"}
{"epoch": 0.9070294784580499, "step": 601, "batch_size": 64, "mean": 9.732687950134277, "std": 12.598760604858398, "min": -22.33110809326172, "p10": -3.995879554748535, "median": 8.881668090820312, "p90": 29.060784149169923, "max": 36.36001205444336, "pos_frac": 0.734375, "sample": [29.665489196777344, 16.74939727783203, 20.36139678955078, -22.33110809326172, 22.182601928710938, 28.819107055664062, 34.18670654296875, 1.1124267578125, 1.72735595703125, -3.7779693603515625, 32.72998046875, -13.399505615234375, 12.495979309082031, 9.870079040527344, 1.6103553771972656, 7.9848480224609375, 16.117494583129883, 18.969482421875, -2.4784774780273438, 18.730606079101562, 26.428203582763672, 5.1632232666015625, -1.5917510986328125, 22.912261962890625, 36.36001205444336, 5.445503234863281, -0.9015674591064453, -2.316986083984375, 10.390846252441406, -1.5580368041992188, 1.9070701599121094, 15.151939392089844, 17.43585205078125, -8.741851806640625, 4.736381530761719, 20.806934356689453, 8.724067687988281, 14.446937561035156, 5.003662109375, 9.039268493652344, -3.7338714599609375, 2.6414642333984375, 5.444522857666016, 15.17041015625, 15.560989379882812, 10.808475494384766, 13.786094665527344, -4.554019927978516, -4.089269638061523, 3.963207244873047, 1.5926780700683594, 29.978713989257812, -2.3751907348632812, -2.3379154205322266, 8.568546295166016, 15.85333251953125, 17.6700439453125, 30.12732696533203, 21.496788024902344, -5.929174423217773, -1.404541015625, 12.778118133544922, 29.16436004638672, -7.4572601318359375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000601.npy"}
{"epoch": 0.90854119425548, "step": 602, "batch_size": 64, "mean": 8.985538482666016, "std": 11.807914733886719, "min": -11.7213134765625, "p10": -5.611060333251952, "median": 7.317853927612305, "p90": 27.5797119140625, "max": 37.040260314941406, "pos_frac": 0.78125, "sample": [0.2942962646484375, 18.44879150390625, 10.310688018798828, 23.0245361328125, 5.797943115234375, 7.6834259033203125, 12.638145446777344, 5.56654167175293, 15.509719848632812, 5.8534393310546875, 33.620697021484375, -3.1435813903808594, 30.59173583984375, 21.0703125, -8.070243835449219, 37.040260314941406, -4.7449951171875, 30.164642333984375, 20.370269775390625, 2.9042205810546875, 0.02490997314453125, 0.4916677474975586, 5.202629089355469, -11.7213134765625, 8.142818450927734, 14.339176177978516, -0.05287933349609375, 20.08835220336914, 7.765201568603516, 13.860158920288086, -4.6086578369140625, 0.8287506103515625, 11.332672119140625, 1.3430824279785156, 12.093135833740234, 8.749759674072266, -3.4062881469726562, 10.301223754882812, 20.119144439697266, 0.5729827880859375, -4.093944549560547, 7.459293365478516, 6.897712707519531, 35.36781692504883, -6.88629150390625, -9.211925506591797, 13.022136688232422, 6.514713287353516, 6.684150695800781, 3.0636749267578125, -6.056419372558594, 7.176414489746094, 27.435943603515625, 1.5029067993164062, -0.9498443603515625, 17.305835723876953, 13.563499450683594, 4.606132507324219, 29.81573486328125, 27.641326904296875, -5.982231140136719, 17.655498504638672, 9.181190490722656, -7.036247253417969], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000602.npy"}
{"epoch": 0.91005291005291, "step": 603, "batch_size": 64, "mean": 7.853864669799805, "std": 13.191612243652344, "min": -26.060020446777344, "p10": -5.799628067016601, "median": 6.575564384460449, "p90": 24.709480285644535, "max": 44.609222412109375, "pos_frac": 0.71875, "sample": [7.895439147949219, 31.586181640625, 16.943588256835938, -0.108489990234375, -5.537990570068359, 1.531036376953125, 6.9856414794921875, 7.6250762939453125, -3.7264556884765625, 2.0679931640625, 15.209991455078125, 5.355995178222656, 20.923545837402344, -2.209657669067383, -13.187015533447266, 27.557083129882812, -0.9482803344726562, 3.7839317321777344, 5.893501281738281, 44.609222412109375, 11.1251220703125, 2.806427001953125, 19.800735473632812, 4.325340270996094, -4.272174835205078, 5.979652404785156, 12.372543334960938, -5.9117584228515625, 10.4224853515625, -18.992202758789062, 8.203433990478516, 1.9062957763671875, 11.805276870727539, 6.7155609130859375, -7.124309539794922, 20.637001037597656, 13.023027420043945, 2.0364723205566406, -0.2541675567626953, 39.28037643432617, -10.436653137207031, 12.473106384277344, 19.1094970703125, 12.873281478881836, 25.181381225585938, 20.484817504882812, 1.9007186889648438, -1.2422332763671875, 7.999324798583984, 1.5038681030273438, -3.1876373291015625, -0.9125289916992188, 18.536357879638672, 23.31653594970703, 8.994884490966797, -1.5403938293457031, 6.435567855834961, -17.85784912109375, 17.80670166015625, 24.147705078125, 26.59636878967285, 5.438838958740234, 24.950241088867188, -26.060020446777344], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000603.npy"}
{"epoch": 0.9115646258503401, "step": 604, "batch_size": 64, "mean": 7.7018914222717285, "std": 14.350754737854004, "min": -28.788192749023438, "p10": -8.797468566894532, "median": 4.970401763916016, "p90": 28.480017471313484, "max": 44.134422302246094, "pos_frac": 0.71875, "sample": [13.22369384765625, 15.387229919433594, -1.2153091430664062, -7.138019561767578, 13.597908020019531, 3.6232681274414062, 2.310375213623047, 19.058631896972656, 31.753257751464844, -11.829483032226562, 4.851898193359375, -0.35747528076171875, 3.1592111587524414, 14.525447845458984, 4.787445068359375, -0.32810211181640625, 24.74506378173828, 2.846405029296875, 2.277067184448242, 18.39385986328125, -23.048309326171875, -8.903518676757812, 2.110851287841797, -9.08734130859375, 4.136482238769531, 1.89111328125, -0.6159210205078125, 5.088905334472656, 6.311103820800781, -28.788192749023438, 5.705760955810547, -8.550018310546875, -3.3703536987304688, -0.3551177978515625, 32.59482192993164, -3.9348526000976562, -1.014617919921875, 12.688629150390625, 21.995956420898438, 12.207992553710938, 8.527791976928711, 39.45596694946289, 38.038299560546875, 7.365772247314453, 14.426071166992188, 32.300010681152344, 2.5182037353515625, 19.538230895996094, 44.134422302246094, 2.018352508544922, 29.155746459960938, 4.8484039306640625, 8.847305297851562, 6.8051300048828125, 5.4656982421875, 14.068344116210938, 26.903316497802734, -1.3490982055664062, 3.376689910888672, 13.369766235351562, 20.472213745117188, -12.776382446289062, -18.522109985351562, 13.197158813476562], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000604.npy"}
{"epoch": 0.9130763416477702, "step": 605, "batch_size": 64, "mean": 7.205513954162598, "std": 12.718545913696289, "min": -20.157981872558594, "p10": -7.424439239501953, "median": 6.212291717529297, "p90": 21.86900787353516, "max": 49.4017333984375, "pos_frac": 0.703125, "sample": [15.782115936279297, 2.3762435913085938, -10.188825607299805, 49.4017333984375, 17.111976623535156, 12.487892150878906, 5.767608642578125, 14.576004028320312, 14.364608764648438, -5.677543640136719, 8.759140014648438, -7.911865234375, 1.4227218627929688, -3.1329879760742188, -7.0098876953125, -3.0227012634277344, -1.813751220703125, 13.716659545898438, 14.233154296875, 4.794181823730469, 2.4920654296875, -3.0874099731445312, 9.177532196044922, -14.642875671386719, 3.408477783203125, -1.9204139709472656, 29.625080108642578, 10.53209114074707, 4.435955047607422, 11.536331176757812, 10.726966857910156, 19.679786682128906, -20.157981872558594, 21.00977325439453, 1.5565643310546875, 12.380813598632812, 22.23725128173828, 16.583465576171875, -0.01815032958984375, 6.407356262207031, -7.0515289306640625, 4.396770477294922, 24.88825225830078, 19.955608367919922, 32.36798095703125, 5.8484954833984375, 15.022346496582031, 11.646942138671875, 7.2003936767578125, -7.48883056640625, 34.164093017578125, 10.720199584960938, 6.0172271728515625, -10.934768676757812, -7.274192810058594, 3.11248779296875, 8.912040710449219, -10.896087646484375, 6.697456359863281, -6.182483673095703, 0.5662040710449219, 29.948822021484375, -2.7993927001953125, 14.315685272216797], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000605.npy"}
{"epoch": 0.9145880574452003, "step": 606, "batch_size": 64, "mean": 7.625718593597412, "std": 11.554939270019531, "min": -14.44720458984375, "p10": -4.808855056762695, "median": 4.212013244628906, "p90": 24.974526214599614, "max": 34.45317077636719, "pos_frac": 0.6875, "sample": [-3.218181610107422, 17.04035186767578, 11.170406341552734, 25.595947265625, 3.5377349853515625, -0.9224929809570312, 8.259445190429688, -0.7487640380859375, 1.9099445343017578, 22.287437438964844, 28.35601806640625, 4.5135498046875, 15.094566345214844, 12.92578125, 7.8575286865234375, -13.530784606933594, 14.136783599853516, 2.7609405517578125, 3.304342269897461, 16.080989837646484, 29.269287109375, 3.472381591796875, -5.726951599121094, 18.780052185058594, -4.8201446533203125, -1.274169921875, 13.10888671875, -0.9490127563476562, 13.982172012329102, 16.894638061523438, 3.9104766845703125, 3.142230987548828, -4.782512664794922, 23.009239196777344, -1.2085342407226562, 5.914994239807129, -0.7869720458984375, -7.547704696655273, 3.3598480224609375, 3.5177764892578125, 30.259532928466797, 28.91669464111328, 0.27458953857421875, 23.52454376220703, 5.611324310302734, 34.45317077636719, -1.5069580078125, 10.718765258789062, 3.6814022064208984, -4.278446197509766, 28.724205017089844, -8.98968505859375, -0.9030075073242188, 10.222610473632812, -14.44720458984375, 19.31947898864746, 3.3337326049804688, -9.731903076171875, -0.892303466796875, 16.63329315185547, 4.849697113037109, 17.74816131591797, 6.0402374267578125, -3.193462371826172], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000606.npy"}
{"epoch": 0.9160997732426304, "step": 607, "batch_size": 64, "mean": 8.795883178710938, "std": 10.878074645996094, "min": -19.711029052734375, "p10": -5.265088653564453, "median": 9.540390014648438, "p90": 23.048517227172855, "max": 31.82568359375, "pos_frac": 0.765625, "sample": [1.5860824584960938, -1.1650848388671875, -0.0824432373046875, 4.222900390625, 22.031631469726562, 19.095027923583984, 13.439815521240234, 13.114639282226562, -19.711029052734375, 3.4499168395996094, 12.410049438476562, 16.71630859375, 31.82568359375, 21.783950805664062, 3.03472900390625, -0.5362548828125, 5.458133697509766, 8.072879791259766, 16.767822265625, 23.95008087158203, 10.8653564453125, 16.607139587402344, 16.945404052734375, 23.484325408935547, 20.846389770507812, 9.514389038085938, -5.2695770263671875, 11.999610900878906, -5.254615783691406, -8.663566589355469, 15.526763916015625, 13.06265640258789, 14.5084228515625, 24.977222442626953, 2.4871063232421875, 21.07933807373047, 18.272415161132812, -1.8152236938476562, 26.223541259765625, -3.6258316040039062, -11.446460723876953, 6.352626800537109, 2.511669158935547, 14.434654235839844, 8.201339721679688, -2.1795272827148438, -8.269569396972656, 12.636518478393555, 21.358154296875, 23.636856079101562, -1.3835067749023438, -6.4501800537109375, 7.416831970214844, 23.751434326171875, 4.91534423828125, 13.675647735595703, 9.566390991210938, 1.7339706420898438, 3.598663330078125, 17.49750328063965, 0.49616241455078125, -11.72555160522461, 5.5681915283203125, 9.803215026855469], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000607.npy"}
{"epoch": 0.9176114890400605, "step": 608, "batch_size": 64, "mean": 5.993744850158691, "std": 12.279084205627441, "min": -19.148635864257812, "p10": -7.774757385253905, "median": 4.4656982421875, "p90": 23.621107482910162, "max": 38.732147216796875, "pos_frac": 0.671875, "sample": [10.141077041625977, -10.356636047363281, 0.43827247619628906, 11.855182647705078, -0.21495628356933594, -6.5835418701171875, 14.447433471679688, -9.983879089355469, 6.344718933105469, -9.008956909179688, 13.127052307128906, 4.0485382080078125, 14.212020874023438, 14.13680648803711, 6.708736419677734, 5.297645568847656, 22.094375610351562, -18.501609802246094, 5.256984710693359, -0.14947509765625, 0.34868621826171875, -5.947231292724609, -0.7548580169677734, 37.608360290527344, 1.4213142395019531, -3.2784423828125, -3.4421768188476562, 3.496004104614258, 10.399175643920898, 24.275421142578125, 9.740753173828125, -13.957626342773438, -4.377288818359375, 13.886711120605469, -1.988922119140625, 0.5186920166015625, -2.6083526611328125, -5.270111083984375, 4.613639831542969, 5.4945220947265625, -0.07400131225585938, 16.790359497070312, 4.317756652832031, -1.5510368347167969, 0.804656982421875, 15.737884521484375, -8.2852783203125, 29.878036499023438, 38.732147216796875, 10.346961975097656, 6.6131591796875, 2.1096572875976562, 15.516159057617188, 6.977058410644531, 4.752279281616211, 30.701576232910156, -19.148635864257812, 31.10805320739746, 2.141571044921875, -2.9914932250976562, 2.3220386505126953, 15.527442932128906, 11.985404968261719, 25.799835205078125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000608.npy"}
{"epoch": 0.9191232048374905, "step": 609, "batch_size": 64, "mean": 8.734305381774902, "std": 10.449430465698242, "min": -23.861827850341797, "p10": -2.2711544036865234, "median": 8.423943519592285, "p90": 21.97113418579102, "max": 31.02789306640625, "pos_frac": 0.796875, "sample": [30.425369262695312, 20.50897216796875, 18.35657501220703, 2.822998046875, 1.167938232421875, 0.6481781005859375, 18.12160873413086, 16.031982421875, -0.2994041442871094, 2.7478675842285156, -1.333749771118164, 4.580844879150391, -4.255546569824219, 8.459217071533203, 19.92376708984375, 11.6890869140625, 25.991134643554688, 10.818962097167969, 8.388669967651367, 28.94662857055664, -9.301895141601562, 1.4575653076171875, 13.291107177734375, 22.395492553710938, 0.27974510192871094, 20.98096466064453, 19.745708465576172, 5.926368713378906, -6.422794342041016, 7.018272399902344, 31.02789306640625, -6.138889312744141, 20.461353302001953, 2.7082290649414062, 12.42791748046875, 14.483875274658203, 11.546478271484375, -1.0131072998046875, 22.948699951171875, -0.5179061889648438, 1.4462394714355469, 1.6533393859863281, 7.502513885498047, 7.772483825683594, 11.66888427734375, -23.861827850341797, 11.708438873291016, 6.0819549560546875, 7.621829986572266, 2.849527359008789, 6.123018264770508, -0.6160812377929688, 9.157188415527344, 16.60354995727539, 13.771728515625, -2.253490447998047, 9.644317626953125, 22.736608505249023, 16.418188095092773, -8.34521484375, 13.727470397949219, 10.933292388916016, 11.884124755859375, -2.2787246704101562], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000609.npy"}
{"epoch": 0.9206349206349206, "step": 610, "batch_size": 64, "mean": 8.661681175231934, "std": 14.547538757324219, "min": -23.33642578125, "p10": -6.660463714599609, "median": 6.50743293762207, "p90": 30.802962493896512, "max": 45.13556671142578, "pos_frac": 0.703125, "sample": [11.140787124633789, 11.761398315429688, 7.956901550292969, 38.902587890625, 4.9600982666015625, 9.388633728027344, 2.2365951538085938, 10.989376068115234, -23.33642578125, 20.104766845703125, -4.322868347167969, -12.22127914428711, 10.492134094238281, 7.992279052734375, 3.166656494140625, 5.4506378173828125, 24.23110580444336, -5.871212005615234, -4.1754913330078125, -3.845489501953125, -2.85736083984375, -7.8466949462890625, 8.6435546875, 6.167491912841797, -10.52606201171875, -4.3411102294921875, -4.51348876953125, 44.773250579833984, 17.347862243652344, 37.10920715332031, 0.6621437072753906, 36.883514404296875, 15.21185302734375, 23.991302490234375, -6.959403991699219, 17.251556396484375, 33.61947250366211, 2.8739185333251953, 13.353694915771484, 7.16314697265625, 16.66626739501953, 1.0042076110839844, 3.0932540893554688, -9.5672607421875, 16.73175048828125, 0.4156684875488281, -1.7867660522460938, 6.764133453369141, 23.523822784423828, 21.949508666992188, 6.250732421875, 8.133747100830078, 3.834625244140625, -1.4652748107910156, -10.05487060546875, -4.326801300048828, -5.9629364013671875, 21.805030822753906, 45.13556671142578, 12.844465255737305, 19.570518493652344, -4.5692596435546875, 35.31561279296875, 6.032783508300781], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000610.npy"}
{"epoch": 0.9221466364323507, "step": 611, "batch_size": 64, "mean": 6.109829902648926, "std": 11.338781356811523, "min": -12.128807067871094, "p10": -7.230941581726074, "median": 4.317476272583008, "p90": 20.43435668945313, "max": 42.10308074951172, "pos_frac": 0.65625, "sample": [17.13257598876953, -1.0467300415039062, -10.848648071289062, -6.537212371826172, 17.482234954833984, 4.634281158447266, 7.5138397216796875, 9.26220703125, -12.128807067871094, -2.7945404052734375, -10.88160514831543, 27.279006958007812, 9.079416275024414, -6.88372802734375, -1.2254219055175781, 3.405303955078125, -10.955867767333984, 20.691802978515625, -10.878303527832031, 42.10308074951172, 16.591312408447266, 10.326583862304688, 4.00067138671875, -7.37974739074707, -4.5002899169921875, -2.9904212951660156, 19.225936889648438, 8.938884735107422, -0.43212890625, 19.020278930664062, 5.392185211181641, -2.438323974609375, -6.753974914550781, 28.531166076660156, 1.4521102905273438, 3.6043777465820312, -9.203996658325195, 16.786865234375, 10.919540405273438, 11.603500366210938, 13.665908813476562, -1.1153106689453125, 3.4208297729492188, 19.833648681640625, -1.0273818969726562, 10.505050659179688, -0.07055282592773438, 13.05218505859375, 9.92216682434082, 13.007049560546875, 24.074981689453125, 22.971607208251953, -1.80877685546875, 7.3717803955078125, 3.353292465209961, 3.589008331298828, 27.917747497558594, -6.709831237792969, 2.2533035278320312, 4.997657775878906, 5.402919769287109, 1.079437255859375, 0.9666671752929688, 7.278327941894531], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000611.npy"}
{"epoch": 0.9236583522297808, "step": 612, "batch_size": 64, "mean": 9.299038887023926, "std": 13.756364822387695, "min": -26.69367218017578, "p10": -5.336811447143554, "median": 8.226627349853516, "p90": 28.059106063842773, "max": 43.38862609863281, "pos_frac": 0.78125, "sample": [-0.3969764709472656, -14.293746948242188, 7.0053558349609375, -0.9803924560546875, 0.1688079833984375, 8.722259521484375, 23.594993591308594, 13.969436645507812, 1.6632080078125, 5.956016540527344, 29.4617919921875, 6.33709716796875, 17.638763427734375, 2.885986328125, 10.044601440429688, 9.3909912109375, 11.74898910522461, -0.71136474609375, 3.6301498413085938, 7.6149444580078125, 9.721504211425781, -10.367774963378906, 10.790916442871094, 2.41845703125, -6.8057098388671875, 2.6610794067382812, 19.704788208007812, -6.658393859863281, -3.0738067626953125, 0.265716552734375, 10.451778411865234, 33.63344955444336, -4.688198089599609, -4.632360458374023, 0.06511688232421875, 39.34657287597656, 4.0299530029296875, 28.08639907836914, 27.99542236328125, -18.61456298828125, 39.28094482421875, 3.9640655517578125, 19.068893432617188, -5.381401062011719, 24.488784790039062, 22.670318603515625, 18.72484588623047, -5.232769012451172, 17.039649963378906, 43.38862609863281, 1.4164314270019531, 5.411293029785156, 36.445892333984375, 9.79132080078125, 14.529020309448242, 15.2696533203125, 14.136619567871094, 10.21993637084961, 6.964447021484375, -26.69367218017578, 13.258905410766602, 18.45745086669922, 7.730995178222656, 12.406982421875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000612.npy"}
{"epoch": 0.9251700680272109, "step": 613, "batch_size": 64, "mean": 10.865882873535156, "std": 13.913680076599121, "min": -25.79698944091797, "p10": -4.55719108581543, "median": 8.586026191711426, "p90": 31.677628326416027, "max": 40.903709411621094, "pos_frac": 0.8125, "sample": [-5.267108917236328, 9.386791229248047, 0.6258430480957031, -2.114105224609375, 20.493810653686523, 5.233970642089844, 22.78412628173828, 8.83880615234375, -4.4407958984375, 5.171909332275391, 7.947517395019531, 12.595787048339844, -25.79698944091797, 15.677223205566406, 8.490381240844727, 25.336959838867188, 3.6307220458984375, 6.974359512329102, 27.864063262939453, 5.474464416503906, 4.914455413818359, 0.8662528991699219, -9.98388671875, -2.7908706665039062, 33.18912124633789, 9.999195098876953, 34.370216369628906, 5.849468231201172, -0.78948974609375, 15.478801727294922, 28.25531005859375, 7.125736236572266, 12.610870361328125, -4.607074737548828, 3.57830810546875, 8.723793029785156, -14.287322998046875, 35.04082489013672, 10.742950439453125, 40.903709411621094, 20.12255096435547, 28.894851684570312, 2.84930419921875, 8.681671142578125, 37.670440673828125, -4.434211730957031, 27.42426300048828, 15.691802978515625, 3.2900543212890625, 17.993789672851562, -5.8665008544921875, 2.8713836669921875, 10.4716796875, 32.87024688720703, 22.2274169921875, 5.8461151123046875, 5.651369094848633, 25.662487030029297, 9.311737060546875, 34.66786575317383, 8.281570434570312, 28.788070678710938, 0.8944091796875, -12.543960571289062], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000613.npy"}
{"epoch": 0.926681783824641, "step": 614, "batch_size": 64, "mean": 7.457139015197754, "std": 10.44261646270752, "min": -17.489364624023438, "p10": -4.1405517578125, "median": 5.923637390136719, "p90": 22.076779174804688, "max": 30.68060302734375, "pos_frac": 0.765625, "sample": [26.29120445251465, 9.131694793701172, 1.2269668579101562, 16.450103759765625, 15.681892395019531, 7.7913055419921875, -0.14626693725585938, -2.55682373046875, 5.912342071533203, 3.8458328247070312, 22.12328338623047, 3.08831787109375, 24.3486328125, 3.8963241577148438, 23.693267822265625, 10.199514389038086, 14.151603698730469, 10.428760528564453, 14.381317138671875, 22.33930206298828, -5.420867919921875, 5.934932708740234, -1.6375732421875, -7.38519287109375, 30.68060302734375, -4.1434173583984375, 6.13629150390625, -16.009544372558594, 12.747058868408203, -7.043975830078125, -0.1035003662109375, 0.486236572265625, 5.478973388671875, 21.52472686767578, 2.6141319274902344, 8.750679016113281, 14.454267501831055, 1.6297836303710938, -17.489364624023438, 13.920024871826172, 1.3860511779785156, 28.1671142578125, 4.352630615234375, 13.15117073059082, 3.6170196533203125, 17.154693603515625, 4.1576385498046875, 1.458892822265625, 15.472564697265625, -4.1338653564453125, 3.7292518615722656, 9.879066467285156, -3.794708251953125, -4.076635360717773, -11.817230224609375, 16.038299560546875, 21.96826934814453, 7.4149169921875, 16.666080474853516, 3.2903213500976562, 4.965282440185547, 14.263923645019531, -0.8254928588867188, 17.368804931640625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000614.npy"}
{"epoch": 0.9281934996220711, "step": 615, "batch_size": 64, "mean": 9.824195861816406, "std": 13.776714324951172, "min": -17.520538330078125, "p10": -6.190474700927734, "median": 6.952556610107422, "p90": 28.553350830078127, "max": 38.55921173095703, "pos_frac": 0.734375, "sample": [9.646873474121094, 11.437332153320312, 25.595060348510742, 7.053108215332031, 0.316070556640625, 21.89388084411621, 4.793487548828125, -17.520538330078125, 27.58098602294922, 15.824810028076172, -10.579742431640625, 7.498992919921875, -1.5784454345703125, 3.680328369140625, 15.191690444946289, -5.461048126220703, 5.751426696777344, 38.55921173095703, 21.324779510498047, 6.1653900146484375, 9.897296905517578, 0.8244552612304688, 22.09259033203125, -6.575817108154297, 18.960407257080078, -2.291717529296875, -8.523193359375, 6.456031799316406, 4.5316009521484375, 37.820098876953125, 22.234832763671875, -1.0153770446777344, 2.8507308959960938, -1.415130615234375, 10.181381225585938, 13.494209289550781, 32.07666015625, 5.7440948486328125, 6.8520050048828125, 30.54736328125, 23.19959259033203, 36.02716064453125, 27.636585235595703, 6.304046630859375, 20.888229370117188, -6.503086090087891, 1.2144012451171875, -4.735992431640625, -3.6012420654296875, 29.757369995117188, 5.2340087890625, 28.7125244140625, -3.6263694763183594, 25.135597229003906, -2.8430862426757812, 23.6619873046875, 10.375297546386719, 10.78619384765625, 28.18194580078125, 1.9785003662109375, 12.500110626220703, -4.672630310058594, -15.862476348876953, -12.916290283203125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000615.npy"}
{"epoch": 0.9297052154195011, "step": 616, "batch_size": 64, "mean": 7.308769226074219, "std": 11.993223190307617, "min": -16.356306076049805, "p10": -4.368105506896972, "median": 5.477882385253906, "p90": 24.96733703613281, "max": 35.32630920410156, "pos_frac": 0.65625, "sample": [19.882957458496094, 5.280570983886719, 5.2322998046875, -2.1686935424804688, 11.783180236816406, -5.974189758300781, 32.70475769042969, 10.226470947265625, 0.8029708862304688, 8.345786094665527, -4.507854461669922, -0.4539604187011719, 6.881950378417969, -15.069770812988281, -1.3034744262695312, -7.350612640380859, 24.958877563476562, -0.6231956481933594, -0.6398887634277344, 31.46728515625, 6.414451599121094, 30.915159225463867, -16.356306076049805, -1.4640998840332031, 33.59899139404297, -8.564346313476562, -2.0521621704101562, -3.0129623413085938, 0.5749626159667969, 17.084293365478516, 5.675193786621094, 10.688041687011719, 7.824165344238281, 12.864723205566406, 35.32630920410156, 24.970962524414062, -3.1283226013183594, 0.9597930908203125, -0.68841552734375, 18.93756866455078, 12.524856567382812, 7.3205108642578125, 5.856839179992676, 10.984367370605469, 18.287979125976562, 18.233810424804688, -0.17549896240234375, 2.5816268920898438, -3.973602294921875, 2.2987709045410156, -10.553977966308594, -3.667449951171875, -0.09683990478515625, 17.672744750976562, 6.513542175292969, 18.81537628173828, 3.8713760375976562, 7.1356658935546875, 15.206260681152344, 1.9106216430664062, 15.509256362915039, 30.30133056640625, 5.20220947265625, -4.042024612426758], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000616.npy"}
{"epoch": 0.9312169312169312, "step": 617, "batch_size": 64, "mean": 7.84160852432251, "std": 11.500407218933105, "min": -20.904769897460938, "p10": -5.265887451171874, "median": 7.2003936767578125, "p90": 18.64901695251465, "max": 44.55510711669922, "pos_frac": 0.78125, "sample": [7.168670654296875, 1.7125778198242188, -8.48284912109375, 17.66521453857422, 11.697616577148438, 11.584981918334961, 6.061187744140625, 15.921562194824219, 8.072261810302734, -13.917083740234375, 10.766258239746094, -0.07039642333984375, 8.883880615234375, 8.959066390991211, 3.987948417663574, 6.908882141113281, 8.613792419433594, 6.1228179931640625, 12.065261840820312, 4.949615478515625, 26.108062744140625, 4.994853973388672, -1.540313720703125, 18.135822296142578, -3.6957244873046875, -3.1536903381347656, -5.575843811035156, 18.86895751953125, 0.16762542724609375, 12.311759948730469, 5.845941543579102, -0.619903564453125, 8.658370971679688, 30.05872344970703, 4.95941162109375, 34.036598205566406, -0.6798553466796875, -5.966941833496094, 9.899845123291016, 16.505035400390625, 14.967819213867188, 2.68145751953125, 1.3077888488769531, 3.8075408935546875, 28.53274154663086, 7.23211669921875, -17.890426635742188, -20.904769897460938, -6.504997253417969, 4.592720031738281, 15.661346435546875, 17.808982849121094, 14.605110168457031, 11.492326736450195, 44.55510711669922, 28.796850204467773, 15.710044860839844, 2.298980712890625, 6.562065124511719, 10.727203369140625, 1.0834236145019531, -4.542655944824219, 11.856529235839844, 9.435638427734375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000617.npy"}
{"epoch": 0.9327286470143613, "step": 618, "batch_size": 64, "mean": 7.668525695800781, "std": 13.38885498046875, "min": -24.012802124023438, "p10": -5.398642730712891, "median": 4.9896240234375, "p90": 27.04985733032227, "max": 43.886451721191406, "pos_frac": 0.6875, "sample": [30.024505615234375, 26.35613250732422, 19.01342010498047, 4.568840026855469, 0.354583740234375, -6.060434341430664, -5.312995910644531, -0.6454315185546875, 11.883056640625, -0.5988330841064453, 3.511363983154297, 18.07452392578125, 27.34716796875, 12.39849853515625, 43.886451721191406, 2.7357177734375, 8.770790100097656, 4.063529968261719, 38.40199279785156, 17.669189453125, -1.0040512084960938, 42.081451416015625, 5.312824249267578, 14.389259338378906, -5.4353485107421875, 31.0592041015625, 1.0451164245605469, 3.5891647338867188, 9.683673858642578, 4.3746795654296875, -3.5649871826171875, -2.5108489990234375, 15.447925567626953, 4.838405609130859, 13.767860412597656, -2.4600067138671875, 19.031715393066406, -4.6376800537109375, 10.922679901123047, 3.918811798095703, -4.534934997558594, -9.995216369628906, -2.2888641357421875, -15.934181213378906, 5.304862976074219, -13.762649536132812, -1.14276123046875, -24.012802124023438, 6.675548553466797, 19.27682113647461, -6.7953033447265625, 17.465045928955078, 6.248176574707031, 7.675870895385742, 1.4517669677734375, -1.3710670471191406, -0.9807014465332031, 13.334144592285156, 5.140842437744141, 16.36935806274414, 15.862930297851562, 32.318511962890625, 0.499359130859375, 7.688976287841797], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000618.npy"}
{"epoch": 0.9342403628117913, "step": 619, "batch_size": 64, "mean": 9.305343627929688, "std": 12.831095695495605, "min": -21.8995361328125, "p10": -4.626228713989257, "median": 6.329195976257324, "p90": 27.712648010253908, "max": 37.63648986816406, "pos_frac": 0.78125, "sample": [17.205427169799805, 1.935455322265625, 3.039933204650879, 7.162860870361328, 36.805503845214844, 30.788623809814453, 28.534866333007812, -3.202239990234375, -8.925262451171875, 26.939712524414062, 27.330337524414062, -10.544807434082031, 11.99725341796875, -0.9706535339355469, 4.472137451171875, 0.4898490905761719, 21.01508331298828, 9.646804809570312, 0.34061241149902344, -0.9798431396484375, 6.373054504394531, 18.974464416503906, 2.5545616149902344, -5.1166839599609375, -1.0870323181152344, 13.972480773925781, 17.43486213684082, 21.45086669921875, 0.9819183349609375, -1.5224285125732422, 2.8154830932617188, -21.8995361328125, -4.667716979980469, 27.876495361328125, 12.203136444091797, 28.849788665771484, 6.285337448120117, 35.55936050415039, 37.63648986816406, -13.809715270996094, 5.2860565185546875, 17.043182373046875, 16.17870330810547, 1.8249855041503906, 17.72332763671875, 3.7303619384765625, 2.4052810668945312, 7.6678314208984375, 6.230998992919922, 15.009941101074219, -5.2080078125, 6.601116180419922, 19.667034149169922, 3.372802734375, 24.662994384765625, -4.529422760009766, 10.406608581542969, 19.82982063293457, 0.17752838134765625, 2.590473175048828, -3.3339462280273438, 16.689090728759766, 0.8945846557617188, 22.673866271972656], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000619.npy"}
{"epoch": 0.9357520786092215, "step": 620, "batch_size": 64, "mean": 6.665311336517334, "std": 12.971067428588867, "min": -26.806427001953125, "p10": -7.525658226013183, "median": 5.0673017501831055, "p90": 24.831802368164066, "max": 41.671783447265625, "pos_frac": 0.703125, "sample": [3.1885604858398438, -7.69573974609375, -13.345924377441406, 23.68579864501953, -23.531219482421875, 3.327322006225586, 6.48236083984375, 25.302154541015625, -0.824432373046875, 19.800716400146484, 28.090938568115234, 29.420654296875, -0.8282012939453125, 4.9998779296875, 6.6659393310546875, 22.514144897460938, 3.6602554321289062, 1.8251571655273438, -1.3717422485351562, -1.8656082153320312, -5.063385009765625, -0.2237701416015625, -26.806427001953125, 23.73431396484375, -16.586334228515625, 27.1019287109375, 4.322284698486328, -4.836357116699219, 41.671783447265625, 0.8302459716796875, 17.86481285095215, 12.109825134277344, -3.8352222442626953, 2.7702560424804688, 11.421546936035156, 7.878704071044922, -2.6256637573242188, 10.342155456542969, 15.68063735961914, 1.8743400573730469, 11.798961639404297, 7.1428070068359375, 10.770822525024414, 2.8732166290283203, -3.364154815673828, -8.361427307128906, 17.622032165527344, 32.968116760253906, -7.128801345825195, 13.227775573730469, 5.134725570678711, 26.04876708984375, -2.0308837890625, 2.3663253784179688, -8.837944030761719, 1.2197532653808594, 6.112417221069336, 10.026496887207031, 14.701398849487305, 17.808990478515625, 14.276092529296875, 6.455879211425781, 5.220851898193359, 3.4010276794433594], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000620.npy"}
{"epoch": 0.9372637944066515, "step": 621, "batch_size": 64, "mean": 6.981159210205078, "std": 11.987911224365234, "min": -21.83611297607422, "p10": -5.690655517578125, "median": 4.483762741088867, "p90": 24.74566955566407, "max": 35.618377685546875, "pos_frac": 0.671875, "sample": [-1.7301139831542969, 2.2537918090820312, 13.754814147949219, -5.867519378662109, 2.7122802734375, -21.83611297607422, 7.9622650146484375, 23.2972412109375, -1.13311767578125, 2.5767745971679688, 5.151981353759766, 25.366424560546875, 25.983963012695312, -2.876840591430664, 34.282596588134766, 2.9554595947265625, 8.596660614013672, 15.679405212402344, 2.91253662109375, 18.560791015625, 5.72021484375, 0.6212310791015625, 13.982162475585938, -9.878219604492188, 16.910736083984375, 14.4029541015625, -7.0408935546875, -4.51580810546875, 17.00408935546875, 3.8155441284179688, 34.13555908203125, 10.816543579101562, -0.38470458984375, 21.003173828125, 14.54909896850586, -2.0672454833984375, 8.293510437011719, 7.346576690673828, 9.983406066894531, -3.8902435302734375, -2.495157241821289, 3.5769500732421875, 18.461448669433594, 0.4270477294921875, 5.254724502563477, -3.2726707458496094, 8.707304000854492, -5.277973175048828, -10.114173889160156, -2.7034912109375, -4.349822998046875, 16.689619064331055, 20.87065887451172, -1.9430980682373047, 13.849533081054688, -8.525226593017578, -0.13780593872070312, 25.57965087890625, -10.662712097167969, 35.618377685546875, 0.55023193359375, 1.4748954772949219, 10.385238647460938, 25.419708251953125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000621.npy"}
{"epoch": 0.9387755102040817, "step": 622, "batch_size": 64, "mean": 7.7317914962768555, "std": 11.177224159240723, "min": -15.969772338867188, "p10": -5.602474212646484, "median": 6.915021896362305, "p90": 23.537176513671884, "max": 38.931705474853516, "pos_frac": 0.75, "sample": [13.010200500488281, 7.630229949951172, -8.968017578125, -5.478385925292969, 6.503440856933594, 11.451225280761719, -4.557559967041016, 17.581817626953125, -2.2208404541015625, 7.6707763671875, 38.931705474853516, 8.286201477050781, 0.62921142578125, -0.455078125, 26.838512420654297, -2.6116409301757812, 5.944122314453125, 7.624584197998047, 14.7691650390625, 27.280975341796875, 3.1776771545410156, 12.007406234741211, -2.697906494140625, 7.1734619140625, 25.74932861328125, 18.081710815429688, 7.020133972167969, -0.9369888305664062, 4.935661315917969, 18.933456420898438, 28.769325256347656, -15.969772338867188, 15.155645370483398, -8.447395324707031, 1.12750244140625, 2.029766082763672, 9.532028198242188, 15.007484436035156, 12.193695068359375, -3.5203323364257812, 17.406719207763672, 16.120513916015625, -5.846038818359375, 2.5593414306640625, 6.809909820556641, 14.666213989257812, 21.189117431640625, -15.293014526367188, 1.1466217041015625, 14.627059936523438, 28.503433227539062, 1.7556877136230469, 2.5857410430908203, -5.6556549072265625, 3.361034393310547, 1.6238632202148438, 24.543487548828125, 2.4367599487304688, -6.833778381347656, -1.7852935791015625, 18.12729263305664, 6.573238372802734, 16.64795684814453, 10.38189697265625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000622.npy"}
{"epoch": 0.9402872260015117, "step": 623, "batch_size": 64, "mean": 8.49173355102539, "std": 12.332313537597656, "min": -18.760517120361328, "p10": -5.059229469299316, "median": 7.312131881713867, "p90": 24.689614105224614, "max": 39.793907165527344, "pos_frac": 0.765625, "sample": [4.6811676025390625, 3.690948486328125, -2.532337188720703, -18.760517120361328, 10.633956909179688, 14.03509521484375, 9.04019546508789, 16.234325408935547, 16.94903564453125, 11.861503601074219, 0.6573562622070312, 3.678342819213867, 2.2040481567382812, 5.2633056640625, -4.137657165527344, 28.965675354003906, -5.943389892578125, 16.19817352294922, 17.8433837890625, 17.156265258789062, 7.562618255615234, 7.467098236083984, 19.80272674560547, -5.190057754516602, -4.4634552001953125, 11.72982406616211, 5.9550933837890625, 13.318069458007812, -13.835113525390625, 23.52203369140625, 33.832725524902344, 39.793907165527344, 13.715492248535156, 14.759063720703125, -3.01324462890625, -8.743267059326172, 2.4959564208984375, 3.259185791015625, 18.072357177734375, 25.110488891601562, 26.035919189453125, 31.931869506835938, 2.099733352661133, 2.506195068359375, -4.753963470458984, 0.8674087524414062, -7.1769256591796875, 11.793556213378906, 11.503799438476562, 11.84031867980957, 13.2294921875, 37.87120056152344, 7.15716552734375, 19.169708251953125, -3.6539306640625, 6.593223571777344, 1.242462158203125, 3.4900474548339844, 12.59417724609375, -3.716228485107422, 23.70757293701172, -1.9776535034179688, 1.765167236328125, -13.519752502441406], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000623.npy"}
{"epoch": 0.9417989417989417, "step": 624, "batch_size": 64, "mean": 9.085529327392578, "std": 12.508866310119629, "min": -28.961830139160156, "p10": -5.666164398193359, "median": 6.811023712158203, "p90": 28.146660614013676, "max": 37.574493408203125, "pos_frac": 0.78125, "sample": [-7.22247314453125, -7.0435791015625, 18.489730834960938, 10.768486022949219, 29.8525390625, 17.436111450195312, 3.4471168518066406, 5.8828125, -5.769355773925781, -1.2146415710449219, 5.288398742675781, 8.034099578857422, -1.5814361572265625, 2.2369937896728516, 12.450721740722656, 5.25250244140625, 2.284332275390625, 35.60870361328125, 13.456466674804688, -2.322254180908203, 2.9407806396484375, 4.659259796142578, -8.058300018310547, 21.452457427978516, 0.8929481506347656, 5.771400451660156, 3.2208938598632812, -7.054656982421875, 7.667728424072266, 12.442691802978516, 7.2000885009765625, 28.915435791015625, 24.64312744140625, 16.468597412109375, 11.12374496459961, 2.1864242553710938, 27.529861450195312, 35.047096252441406, 37.574493408203125, 17.607013702392578, 15.62374496459961, 10.557723999023438, 8.85538101196289, 12.033226013183594, -3.7412185668945312, 26.583297729492188, 30.57909393310547, 3.3245277404785156, -2.493741989135742, 17.834609985351562, 13.742179870605469, 6.790351867675781, 0.1208343505859375, 4.313292503356934, -28.961830139160156, 19.424842834472656, 6.5505218505859375, -0.9879684448242188, 6.831695556640625, -5.870361328125, -5.425384521484375, 4.660748481750488, 28.41100311279297, 15.150955200195312], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000624.npy"}
{"epoch": 0.9433106575963719, "step": 625, "batch_size": 64, "mean": 5.822243690490723, "std": 12.200562477111816, "min": -22.691253662109375, "p10": -9.626149559020995, "median": 6.039772033691406, "p90": 21.182026672363282, "max": 43.63671112060547, "pos_frac": 0.65625, "sample": [11.377487182617188, 16.774124145507812, 0.012939453125, 11.219635009765625, -5.703130722045898, 20.584293365478516, -1.6364479064941406, 0.7436904907226562, 13.81528091430664, 10.941802978515625, 3.3668746948242188, 0.4526023864746094, 7.806884765625, -2.7489852905273438, 26.753021240234375, 1.123046875, 43.63671112060547, 5.213249206542969, -4.062217712402344, 3.692117691040039, 6.583709716796875, 21.43819808959961, 7.368122100830078, -2.1363372802734375, -0.5319442749023438, 10.757709503173828, 18.78356170654297, 19.385284423828125, -2.129537582397461, 5.467628479003906, -10.714019775390625, 12.200393676757812, 25.705535888671875, -7.1285247802734375, 6.142189025878906, -5.782609939575195, -9.859987258911133, 26.7928466796875, -3.3649673461914062, -0.9681472778320312, 3.297027587890625, 28.950820922851562, 16.617843627929688, 6.822088241577148, 8.533721923828125, 8.30862045288086, -9.080528259277344, -14.660842895507812, -12.040237426757812, 19.594573974609375, -4.751686096191406, -22.691253662109375, -10.353963851928711, 24.55359649658203, 13.033151626586914, -3.344545364379883, 12.590927124023438, 5.937355041503906, -5.6732025146484375, -12.18062973022461, 7.964801788330078, 9.17791748046875, 7.804615020751953, 12.84130859375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000625.npy"}
{"epoch": 0.9448223733938019, "step": 626, "batch_size": 64, "mean": 9.84233570098877, "std": 15.539020538330078, "min": -20.98894500732422, "p10": -8.91995735168457, "median": 8.862558364868164, "p90": 31.92811164855957, "max": 45.734832763671875, "pos_frac": 0.671875, "sample": [23.53130340576172, 7.763946533203125, 8.640689849853516, -10.252090454101562, 45.734832763671875, 28.784481048583984, 33.84510803222656, -7.006837844848633, 6.050506591796875, 17.42022705078125, -2.257223129272461, -2.582744598388672, -6.164196014404297, -0.9687423706054688, -0.342254638671875, 42.58083724975586, 28.28476333618164, -9.52444839477539, 22.812889099121094, -1.3777427673339844, -8.276535034179688, 4.303802490234375, 3.5053157806396484, 31.616283416748047, 9.084426879882812, -7.5221099853515625, 11.500158309936523, 11.893112182617188, -2.3787841796875, 40.66204833984375, 11.161571502685547, -3.2563915252685547, 28.998924255371094, 12.687057495117188, 2.6654052734375, -20.98894500732422, -5.285072326660156, 19.374244689941406, 8.067825317382812, 11.314773559570312, 7.25030517578125, -2.880786895751953, 41.071144104003906, 5.936336517333984, -14.918182373046875, 18.80790901184082, 10.866874694824219, 27.08807373046875, 10.222362518310547, 4.838035583496094, 18.757064819335938, 9.89224624633789, -12.330270767211914, 13.89447021484375, -8.370681762695312, 20.534011840820312, 9.980928421020508, 35.0101318359375, 19.523784637451172, 12.811851501464844, 32.06175231933594, -9.15536117553711, 6.259208679199219, -11.342140197753906], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000626.npy"}
{"epoch": 0.9463340891912321, "step": 627, "batch_size": 64, "mean": 10.888916015625, "std": 13.114431381225586, "min": -10.13165283203125, "p10": -4.106046867370606, "median": 8.474403381347656, "p90": 32.54976272583008, "max": 45.62230682373047, "pos_frac": 0.796875, "sample": [6.0182647705078125, 5.958377838134766, 14.368919372558594, 8.61517333984375, 12.260810852050781, 19.676990509033203, 25.654212951660156, 27.443527221679688, 24.003448486328125, 6.1429443359375, -4.359912872314453, 22.959022521972656, 40.69535827636719, 12.671653747558594, 2.3178272247314453, -1.3780059814453125, -3.1370849609375, 2.4606246948242188, 36.43808364868164, 5.442176818847656, 15.873985290527344, 9.500545501708984, 1.3719749450683594, 1.854461669921875, 10.093120574951172, -7.449840545654297, 33.81843185424805, -4.884651184082031, 23.922412872314453, -3.878293991088867, -4.203655242919922, 34.80889892578125, 19.59991455078125, 9.204505920410156, 2.0165252685546875, 8.333633422851562, -0.6026535034179688, 3.4251785278320312, 2.1456069946289062, 32.58612060546875, 32.765655517578125, 12.394027709960938, -3.386089324951172, -5.0452880859375, 5.3197784423828125, 2.0569305419921875, -0.07547760009765625, 11.098037719726562, 3.5867767333984375, 15.30047607421875, 6.495450973510742, -10.13165283203125, 1.4664535522460938, 14.74631118774414, 14.358123779296875, 13.0906982421875, 5.740879058837891, -10.033679962158203, 13.261348724365234, 1.4737930297851562, 15.4857177734375, 27.046493530273438, 32.464927673339844, 45.62230682373047], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000627.npy"}
{"epoch": 0.9478458049886621, "step": 628, "batch_size": 64, "mean": 9.803083419799805, "std": 15.209822654724121, "min": -30.65374755859375, "p10": -8.181335449218748, "median": 9.286628723144531, "p90": 25.94344482421875, "max": 46.231719970703125, "pos_frac": 0.75, "sample": [46.231719970703125, -9.20672607421875, 25.954193115234375, 12.971662521362305, -10.459213256835938, -5.78875732421875, 25.918365478515625, -2.37347412109375, 27.1270694732666, 18.891395568847656, 42.665374755859375, 24.818679809570312, 20.8804931640625, 7.413074493408203, 6.915763854980469, 7.2978668212890625, -30.65374755859375, -30.03759765625, 12.84515380859375, -5.1535797119140625, 23.944290161132812, 4.97357177734375, 14.622968673706055, -13.336517333984375, 5.346160888671875, 13.906410217285156, 6.6037139892578125, 8.961898803710938, -1.0026397705078125, 15.2254638671875, 23.643062591552734, 19.275619506835938, 9.935821533203125, 32.71510314941406, 7.2811279296875, 9.611358642578125, 0.4891853332519531, 7.605733871459961, 22.64739227294922, -2.77801513671875, 17.73943328857422, 11.866348266601562, 8.575214385986328, 21.21424102783203, 18.204429626464844, 14.70721435546875, 34.06098556518555, 13.465301513671875, 40.629005432128906, 18.930889129638672, 4.182971954345703, -11.111923217773438, 5.141353607177734, -16.97633171081543, 13.976882934570312, 24.533660888671875, 0.39009857177734375, 17.110397338867188, -2.909515380859375, 0.9202346801757812, -3.119495391845703, 7.3306121826171875, -4.664669036865234, -2.7294235229492188], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000628.npy"}
{"epoch": 0.9493575207860923, "step": 629, "batch_size": 64, "mean": 8.333179473876953, "std": 11.57516098022461, "min": -13.306678771972656, "p10": -3.4788612365722655, "median": 5.922538757324219, "p90": 23.957857513427737, "max": 37.879638671875, "pos_frac": 0.78125, "sample": [-7.8893585205078125, 4.655921936035156, -3.5565185546875, 11.041620254516602, 5.82368278503418, 6.320323944091797, 0.4448394775390625, 8.883182525634766, 1.9626274108886719, 1.8245849609375, 9.929977416992188, 4.1623687744140625, 7.221732139587402, 33.02733612060547, 0.030914306640625, -13.306678771972656, 10.321287155151367, 6.7654266357421875, 5.9684600830078125, 24.032859802246094, -12.854972839355469, 11.360595703125, -7.357004165649414, 0.8784236907958984, -2.097681999206543, 1.671783447265625, 35.151641845703125, 37.879638671875, -1.3249397277832031, 18.939773559570312, 14.770187377929688, -3.8036041259765625, 7.592994689941406, 18.445743560791016, 23.782852172851562, 9.133499145507812, 13.409111022949219, 4.338478088378906, -0.012378692626953125, -2.059293746948242, -2.04156494140625, 5.025505065917969, 22.263572692871094, 5.307962417602539, 3.9594497680664062, 5.876617431640625, 1.3655471801757812, 18.685165405273438, 8.39002799987793, 25.60763931274414, 30.26811981201172, 1.6409988403320312, 18.541587829589844, -3.2976608276367188, 11.720855712890625, 3.764556884765625, -2.3852310180664062, 19.2935791015625, 18.050830841064453, 36.05891036987305, 3.30279541015625, 16.804893493652344, -9.313129425048828, 8.92303466796875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000629.npy"}
{"epoch": 0.9508692365835223, "step": 630, "batch_size": 64, "mean": 9.437376976013184, "std": 12.178630828857422, "min": -8.892539978027344, "p10": -5.318898010253905, "median": 6.833240509033203, "p90": 24.983323669433595, "max": 45.91697692871094, "pos_frac": 0.71875, "sample": [-5.8963470458984375, 2.64398193359375, 3.6290054321289062, 5.170066833496094, 25.312664031982422, 25.841400146484375, 17.332115173339844, -1.6523799896240234, 37.235504150390625, 14.76898193359375, -1.0781974792480469, 14.885211944580078, -8.892539978027344, 24.214862823486328, -1.7744712829589844, 20.724727630615234, -7.67633056640625, 10.367015838623047, 11.872688293457031, 5.9061279296875, 3.593414306640625, 0.36643218994140625, 17.928558349609375, 12.672698974609375, 28.057281494140625, 18.16820526123047, -3.5305099487304688, 14.623075485229492, -2.590972900390625, 6.1160888671875, 4.45953369140625, -1.91021728515625, -5.6949462890625, 29.490821838378906, 10.851211547851562, 11.088394165039062, 17.1962890625, -3.1028900146484375, 8.329437255859375, 5.9061126708984375, 45.91697692871094, 21.66839027404785, 18.973064422607422, -4.7319793701171875, 34.21112060546875, 18.172142028808594, 10.28448486328125, 3.5146408081054688, 18.113981246948242, -6.1684417724609375, 5.026786804199219, 0.5887527465820312, -1.0717620849609375, 24.197463989257812, 19.99750518798828, 0.4588165283203125, -3.595489501953125, 6.704883575439453, -5.5704345703125, 22.30248260498047, -7.203559875488281, 11.292800903320312, 6.961597442626953, -1.0041942596435547], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000630.npy"}
{"epoch": 0.9523809523809523, "step": 631, "batch_size": 64, "mean": 11.173954010009766, "std": 13.581205368041992, "min": -20.85387420654297, "p10": -5.385592460632324, "median": 8.720894813537598, "p90": 28.600827026367188, "max": 45.87306213378906, "pos_frac": 0.84375, "sample": [8.527345657348633, 0.5983734130859375, 7.4327239990234375, 20.17237091064453, 11.086048126220703, 25.807693481445312, 25.696502685546875, 24.109375, 0.5084609985351562, 8.120246887207031, 3.12896728515625, -12.566242218017578, 9.251260757446289, 10.801445007324219, -5.370565414428711, 4.37098503112793, 3.8957061767578125, 7.3954925537109375, 11.445358276367188, 40.34577941894531, 30.376991271972656, -8.804489135742188, 45.87306213378906, 19.580047607421875, 14.480743408203125, 22.917709350585938, 4.913787841796875, 28.13433837890625, -4.212833404541016, 7.22526741027832, 6.7164764404296875, 8.936660766601562, -20.85387420654297, 0.9208259582519531, 11.365943908691406, 13.255584716796875, 35.34113693237305, 6.98779296875, 40.712860107421875, 25.204696655273438, 2.2106170654296875, -6.008274078369141, 2.1444091796875, 13.638439178466797, -6.398719787597656, -5.392032623291016, 5.735649108886719, 8.231907844543457, -8.757438659667969, 28.800750732421875, 0.7340240478515625, 24.510841369628906, 1.4183502197265625, 31.270423889160156, -2.2256622314453125, 12.099884033203125, 0.983062744140625, 17.845481872558594, 8.914443969726562, 21.543411254882812, 13.63454818725586, 26.36969757080078, 7.348358154296875, 22.650863647460938], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000631.npy"}
{"epoch": 0.9538926681783825, "step": 632, "batch_size": 64, "mean": 8.9365816116333, "std": 12.194446563720703, "min": -19.286882400512695, "p10": -6.4556640624999995, "median": 8.573783874511719, "p90": 25.050223731994645, "max": 36.132591247558594, "pos_frac": 0.78125, "sample": [5.461322784423828, 18.159088134765625, 4.0931396484375, 1.5658760070800781, 13.446578979492188, 26.76100730895996, -5.393032073974609, 18.707530975341797, 2.965057373046875, 8.944293975830078, -0.1166839599609375, -10.115402221679688, 9.879409790039062, 7.128936767578125, -3.2121429443359375, 21.058395385742188, 18.297317504882812, 6.28630256652832, 14.25583267211914, -19.286882400512695, 4.8066864013671875, -0.7322921752929688, -9.040218353271484, 8.793968200683594, 1.8186759948730469, 16.855854034423828, 7.725372314453125, 14.26123046875, 3.1817588806152344, 16.144515991210938, 3.7077789306640625, 27.069381713867188, 12.839599609375, -5.597892761230469, 14.012008666992188, 18.603870391845703, 16.372390747070312, 3.318187713623047, 18.046836853027344, 17.458148956298828, 20.290367126464844, 18.24139404296875, -0.34297943115234375, 1.0496044158935547, 1.9900474548339844, 27.1800537109375, 8.646202087402344, 2.3950862884521484, 32.44786071777344, 29.414339065551758, 35.84672546386719, -13.594940185546875, 11.047119140625, 20.925811767578125, -11.78194808959961, 1.4457244873046875, 36.132591247558594, 8.501365661621094, 18.16144561767578, 7.765766143798828, 10.414894104003906, -0.08817481994628906, -15.85568618774414, -6.823280334472656], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000632.npy"}
{"epoch": 0.9554043839758125, "step": 633, "batch_size": 64, "mean": 10.094891548156738, "std": 14.462000846862793, "min": -12.3463134765625, "p10": -5.335243511199951, "median": 8.568102836608887, "p90": 32.13632278442383, "max": 52.52143859863281, "pos_frac": 0.703125, "sample": [-7.523113250732422, 2.0160179138183594, 11.568717956542969, 2.212615966796875, 10.300407409667969, 20.191253662109375, 9.783645629882812, 19.812660217285156, 9.928655624389648, 4.714591979980469, 2.7561721801757812, 1.7966976165771484, 35.141693115234375, 5.973928451538086, -4.837742805480957, 15.070648193359375, -4.7644805908203125, -0.505706787109375, 17.520751953125, -1.8645133972167969, -0.2948150634765625, -12.3463134765625, 17.939788818359375, 17.130905151367188, 12.786224365234375, 14.42288589477539, -4.546571731567383, 9.243087768554688, 7.826320648193359, 14.180694580078125, 16.392494201660156, 5.833406448364258, 18.922428131103516, 8.445404052734375, -10.321418762207031, 12.657569885253906, -1.4184646606445312, -5.548458099365234, -3.1008224487304688, -9.725959777832031, -0.6964111328125, 3.1533584594726562, 8.690801620483398, 52.52143859863281, 3.6571044921875, -3.053560256958008, 15.830780029296875, -11.307506561279297, 23.075389862060547, 31.081809997558594, 36.11457061767578, 19.500001907348633, 18.541961669921875, 32.5882568359375, 0.7990493774414062, -8.486087799072266, 6.035121917724609, -1.5362510681152344, 26.7412109375, 41.07686996459961, 46.955894470214844, -4.770994186401367, 34.81959533691406, 16.969345092773438], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000633.npy"}
{"epoch": 0.9569160997732427, "step": 634, "batch_size": 64, "mean": 4.412355422973633, "std": 11.925896644592285, "min": -19.67789077758789, "p10": -9.257940292358398, "median": 2.879767417907715, "p90": 20.839608383178714, "max": 42.31816864013672, "pos_frac": 0.640625, "sample": [3.2788467407226562, -14.1844482421875, 0.8375778198242188, 20.115402221679688, 18.077064514160156, -1.3417930603027344, 3.5427322387695312, -0.5397167205810547, 0.02587127685546875, -0.7497138977050781, 0.5712642669677734, -4.9569854736328125, -9.29202651977539, 2.903656005859375, -2.248016357421875, 2.959604263305664, 17.613466262817383, 1.3975944519042969, -3.63897705078125, 3.4625282287597656, -9.5657958984375, 2.8558788299560547, 6.5308380126953125, -18.43083953857422, 27.163414001464844, 0.5367298126220703, 19.20328140258789, 20.129194259643555, 27.825790405273438, -19.67789077758789, 3.4139251708984375, 3.014110565185547, -0.015186309814453125, -7.975948333740234, 21.144071578979492, -6.3535614013671875, -2.7960853576660156, -9.370357513427734, 42.31816864013672, 22.32742691040039, 9.070632934570312, -2.83148193359375, -5.062980651855469, 9.366310119628906, -9.688385009765625, -9.17840576171875, -7.695316314697266, 12.197362899780273, 30.82244873046875, 0.6162796020507812, -3.4213294982910156, -1.3927421569824219, 8.140449523925781, 3.77764892578125, 7.479496002197266, 1.7860679626464844, 12.713735580444336, 12.21975326538086, 8.623519897460938, 10.569664001464844, 0.04708099365234375, 4.903141021728516, 5.535133361816406, 23.681564331054688], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000634.npy"}
{"epoch": 0.9584278155706727, "step": 635, "batch_size": 64, "mean": 6.495750904083252, "std": 13.852375030517578, "min": -20.842041015625, "p10": -7.938927841186522, "median": 2.71630859375, "p90": 24.300407409667983, "max": 50.835662841796875, "pos_frac": 0.703125, "sample": [-3.460968017578125, 15.486946105957031, -13.982398986816406, -0.5340118408203125, 34.55516052246094, -2.3159332275390625, -5.641563415527344, -20.842041015625, -13.873802185058594, 34.35633087158203, 2.464426040649414, 14.568626403808594, 2.86297607421875, -0.9122390747070312, 18.47400665283203, 30.98816680908203, 18.732219696044922, 6.8281402587890625, 6.3146209716796875, 18.21527099609375, 8.946319580078125, 2.722808837890625, -2.01763916015625, 31.832611083984375, 1.0637664794921875, -1.7006683349609375, 12.143318176269531, 0.4932689666748047, 0.88262939453125, 6.081413269042969, 10.9261474609375, -10.158485412597656, 20.7742919921875, 1.0247421264648438, 4.5949249267578125, -6.232669830322266, 17.464584350585938, 2.2998504638671875, 18.01641082763672, -0.44524383544921875, -2.088348388671875, 25.811599731445312, -5.566059112548828, 43.38843536376953, 3.451904296875, -8.670181274414062, 13.853164672851562, 14.789718627929688, 0.5023040771484375, 3.075897216796875, 1.9873161315917969, 50.835662841796875, 2.2833175659179688, 2.709808349609375, -16.366737365722656, 1.180328369140625, 6.331024169921875, 2.5439605712890625, 1.8507232666015625, 4.68223762512207, -13.718551635742188, 11.534767150878906, 13.662841796875, -3.3333873748779297], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000635.npy"}
{"epoch": 0.9599395313681028, "step": 636, "batch_size": 64, "mean": 10.924173355102539, "std": 12.002284049987793, "min": -12.851173400878906, "p10": -3.734382629394531, "median": 9.402084350585938, "p90": 26.468354034423832, "max": 48.461944580078125, "pos_frac": 0.78125, "sample": [26.78980255126953, -2.175384521484375, 0.6991519927978516, 6.009727478027344, 20.85955047607422, 7.436180114746094, 26.80597686767578, 2.4956817626953125, 17.323196411132812, 21.956958770751953, 8.25, -5.3411712646484375, 28.646934509277344, 12.983894348144531, 6.4538421630859375, -4.27996826171875, 14.818504333496094, 3.1213455200195312, 5.614112854003906, 14.475654602050781, 16.231582641601562, 13.180831909179688, -0.1826915740966797, 18.780635833740234, 8.670352935791016, 23.56741714477539, 48.461944580078125, -5.4950714111328125, 3.3453617095947266, 22.832809448242188, 8.140617370605469, 10.18426513671875, 42.38561248779297, 15.542648315429688, -3.9425506591796875, 19.08202362060547, 9.414825439453125, 5.953437805175781, -3.2486572265625, 8.070541381835938, 12.759506225585938, 19.95410919189453, -0.32772254943847656, 25.718307495117188, 2.0927963256835938, -12.851173400878906, 33.559715270996094, 14.623741149902344, 9.543123245239258, 13.463859558105469, 13.53363037109375, -1.5070343017578125, 25.67742919921875, 27.7008056640625, 9.38934326171875, -2.6310958862304688, -8.034271240234375, -2.4233341217041016, -4.1223602294921875, 14.636112213134766, 6.807548522949219, 6.17425537109375, 6.4761505126953125, 15.01373291015625], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000636.npy"}
{"epoch": 0.9614512471655329, "step": 637, "batch_size": 64, "mean": 9.234546661376953, "std": 13.433594703674316, "min": -20.232574462890625, "p10": -4.159934997558593, "median": 5.877140045166016, "p90": 33.01398544311524, "max": 39.59759521484375, "pos_frac": 0.78125, "sample": [14.454498291015625, 15.426132202148438, 10.581188201904297, 39.59759521484375, -3.394336700439453, 14.966323852539062, -5.566566467285156, 7.293724060058594, 18.325958251953125, 10.169296264648438, 4.935508728027344, 8.168464660644531, 2.60491943359375, -0.979034423828125, 5.368324279785156, 8.307395935058594, -5.153200149536133, 15.709136962890625, 31.450782775878906, -3.1971893310546875, 8.042236328125, -3.267303466796875, 8.753318786621094, 3.4170379638671875, 0.6480503082275391, 9.7640380859375, 4.62908935546875, 37.742942810058594, 36.11280822753906, 39.28203582763672, 33.683929443359375, -2.5804672241210938, -20.232574462890625, 5.885406494140625, -2.0580291748046875, 16.017181396484375, 4.308052062988281, 4.910984039306641, 16.604202270507812, 26.073631286621094, 11.330184936523438, 28.446456909179688, 0.23901748657226562, 5.868873596191406, 12.622283935546875, 1.08367919921875, -10.819732666015625, 8.9215087890625, 1.1284561157226562, 7.579120635986328, 1.0400543212890625, 38.45500946044922, 3.4024486541748047, 5.1060791015625, 5.66326904296875, -9.096199035644531, 22.155685424804688, -0.3099517822265625, -4.488048553466797, -10.370330810546875, 9.88454818725586, 38.52386474609375, 2.5383987426757812, 5.300838470458984], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000637.npy"}
{"epoch": 0.9629629629629629, "step": 638, "batch_size": 64, "mean": 9.062637329101562, "std": 12.268311500549316, "min": -16.34827423095703, "p10": -4.202431106567382, "median": 9.213075637817383, "p90": 25.061576080322276, "max": 40.4517707824707, "pos_frac": 0.734375, "sample": [-6.4423828125, 8.262489318847656, 22.6998291015625, 5.069370269775391, -16.34827423095703, -4.290153503417969, 32.5136604309082, 9.7191162109375, 0.17492294311523438, -10.433990478515625, 1.0093841552734375, -2.4768905639648438, 0.9923095703125, -2.537353515625, 2.57257080078125, 17.195358276367188, 2.6618118286132812, -11.469398498535156, 26.073753356933594, -1.744659423828125, 13.905746459960938, 9.163619995117188, 15.051231384277344, 4.910186767578125, -11.470123291015625, 19.77667236328125, 14.97515869140625, 28.01797103881836, -2.4714889526367188, 11.055591583251953, 12.309673309326172, 20.34494400024414, 20.4970703125, 8.103805541992188, 17.861358642578125, -1.3131561279296875, 3.1482696533203125, 27.468799591064453, 9.3289794921875, 20.30594253540039, 17.008522033691406, 21.028980255126953, 10.750022888183594, 4.502784729003906, 20.09540557861328, -0.2897377014160156, 9.262531280517578, 40.4517707824707, 0.28639984130859375, 12.676311492919922, -1.0274658203125, 13.593132019042969, 18.2144775390625, 10.96026611328125, -3.9977455139160156, 14.812772750854492, -9.959705352783203, 38.7425651550293, 28.417072296142578, -1.8455924987792969, 4.9360809326171875, 2.4695663452148438, -1.3469390869140625, 16.095535278320312], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000638.npy"}
{"epoch": 0.9644746787603931, "step": 639, "batch_size": 64, "mean": 8.720342636108398, "std": 11.830388069152832, "min": -13.906723022460938, "p10": -3.325597953796385, "median": 6.058506011962891, "p90": 24.62279434204102, "max": 41.870513916015625, "pos_frac": 0.765625, "sample": [17.230369567871094, 22.379318237304688, 9.095878601074219, -1.223907470703125, 26.460647583007812, 16.159435272216797, 9.980255126953125, 1.8326873779296875, 7.959968566894531, 13.437538146972656, 6.215705871582031, 14.319896697998047, 41.870513916015625, 3.5823287963867188, 16.93639373779297, 2.086578369140625, 3.6140499114990234, -1.3582382202148438, -9.44866943359375, 10.259529113769531, 9.333702087402344, 20.266448974609375, 2.1681365966796875, 3.45794677734375, -8.644662857055664, 27.62557601928711, 0.032958984375, 19.917709350585938, -12.7506103515625, 24.792892456054688, 24.22589874267578, -3.9124584197998047, 2.4094276428222656, 17.49014663696289, -1.0356502532958984, 3.2397918701171875, 1.2412147521972656, 0.3918647766113281, 41.423805236816406, 11.995746612548828, 21.317176818847656, 2.8314056396484375, 25.691558837890625, 7.1759185791015625, 3.7384033203125, 4.0340118408203125, -1.9562568664550781, -0.00728607177734375, 6.877288818359375, 1.8874397277832031, 13.004425048828125, 16.10025405883789, -0.5475311279296875, 5.90130615234375, 1.5549163818359375, -7.811246871948242, -0.0644683837890625, -13.906723022460938, -0.7724380493164062, 22.64533233642578, -5.370353698730469, 26.440902709960938, 18.199777603149414, 16.078018188476562], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000639.npy"}
{"epoch": 0.9659863945578231, "step": 640, "batch_size": 64, "mean": 6.363860607147217, "std": 12.744903564453125, "min": -27.857318878173828, "p10": -9.623559951782223, "median": 6.096305847167969, "p90": 18.937545394897462, "max": 34.09196472167969, "pos_frac": 0.71875, "sample": [13.166648864746094, 2.594654083251953, 15.251617431640625, -19.666118621826172, 1.9265689849853516, 7.3764190673828125, 17.302785873413086, 4.205078125, 14.552093505859375, 18.726890563964844, -2.0370101928710938, -0.7821884155273438, 23.71877670288086, 4.828178405761719, -5.904876708984375, -1.2559432983398438, 12.503482818603516, 18.862144470214844, -0.12880516052246094, 26.727615356445312, -26.668533325195312, 7.4868316650390625, 8.901519775390625, 12.778770446777344, -12.48016357421875, -11.217281341552734, 12.291549682617188, 4.016181945800781, -27.857318878173828, 6.065586090087891, 13.825531005859375, -5.74578857421875, 3.1425323486328125, 17.85961151123047, 19.8673095703125, 2.8704452514648438, -11.878494262695312, 7.3888702392578125, 14.553047180175781, 4.5829010009765625, -1.95068359375, 18.969860076904297, 3.881389617919922, 10.928375244140625, 17.927146911621094, 17.24392318725586, 13.880905151367188, -22.622100830078125, 6.127025604248047, 17.913681030273438, 5.134792327880859, 34.09196472167969, -1.4007606506347656, 17.113616943359375, -2.5701217651367188, -5.795858383178711, 30.86358642578125, 0.44138336181640625, -4.74615478515625, 15.225885391235352, 0.07690238952636719, 21.600448608398438, 4.716541290283203, 18.484207153320312], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000640.npy"}
{"epoch": 0.9674981103552532, "step": 641, "batch_size": 64, "mean": 7.310059070587158, "std": 12.012986183166504, "min": -18.900787353515625, "p10": -7.228703308105467, "median": 7.089399337768555, "p90": 21.972738838195806, "max": 44.80058288574219, "pos_frac": 0.75, "sample": [-2.599681854248047, -11.047760009765625, 10.477643966674805, 7.020160675048828, 5.537757873535156, -4.9881744384765625, 23.362594604492188, -0.14330291748046875, -1.6630401611328125, 0.6870155334472656, -10.470489501953125, 1.8624677658081055, 13.031314849853516, 37.47246551513672, 1.7348785400390625, 5.937891006469727, 8.187366485595703, 32.451637268066406, 35.50245666503906, 11.763290405273438, 16.08273696899414, 2.738597869873047, -4.477989196777344, -1.005157470703125, 8.691165924072266, 13.521514892578125, 2.89971923828125, 8.307937622070312, 11.51259994506836, 22.335012435913086, -8.285369873046875, 10.35223388671875, 0.8643035888671875, 4.232433319091797, 7.609321594238281, -12.323661804199219, 12.81485366821289, 0.600830078125, -8.990158081054688, 7.562713623046875, -18.900787353515625, 18.12649917602539, 7.158638000488281, -7.997467041015625, 9.997695922851562, 21.12743377685547, -4.705638885498047, 1.1190471649169922, 7.403436660766602, 18.747467041015625, 44.80058288574219, 12.992147445678711, 24.074100494384766, 9.455671310424805, 9.13037109375, -2.1620712280273438, 4.31219482421875, 2.4033355712890625, 6.8076171875, 8.66552734375, -5.4349212646484375, 6.912200927734375, 13.825366973876953, 20.8251895904541], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000641.npy"}
{"epoch": 0.9690098261526833, "step": 642, "batch_size": 64, "mean": 7.077930450439453, "std": 12.064919471740723, "min": -22.98297882080078, "p10": -6.629565811157226, "median": 6.9994049072265625, "p90": 24.08835639953614, "max": 32.36254119873047, "pos_frac": 0.6875, "sample": [24.63918685913086, 3.4322357177734375, 5.265575408935547, -8.376450538635254, -0.6090011596679688, -2.1382408142089844, 11.878433227539062, 29.458900451660156, 8.22760009765625, -22.98297882080078, 31.87353515625, -4.688243865966797, 1.6348190307617188, -6.724479675292969, 12.486557006835938, 5.452545166015625, 14.727363586425781, 30.80706787109375, -5.375164031982422, -11.35638427734375, 10.199798583984375, 8.970382690429688, 15.139976501464844, -4.54766845703125, 13.181713104248047, 1.1351242065429688, 29.940082550048828, 9.375808715820312, 7.237396240234375, 9.3248291015625, 12.748775482177734, 4.50177001953125, 16.75072479248047, 4.787353515625, -1.8206901550292969, 22.322738647460938, -2.0904388427734375, 6.572784423828125, 10.669624328613281, -3.51708984375, 8.82534408569336, 1.6422805786132812, 3.504343032836914, -12.842704772949219, 6.1126861572265625, 14.285062789916992, -10.566375732421875, 14.053020477294922, 22.803085327148438, 32.36254119873047, 7.580169677734375, -1.803985595703125, 17.765220642089844, 21.225231170654297, 28.855438232421875, 10.705257415771484, -0.9517860412597656, 11.396560668945312, -6.408100128173828, -14.961479187011719, 6.76141357421875, -2.976715087890625, -2.7851104736328125, 9.890281677246094], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000642.npy"}
{"epoch": 0.9705215419501134, "step": 643, "batch_size": 64, "mean": 8.041906356811523, "std": 13.368385314941406, "min": -28.78924560546875, "p10": -7.049991607666015, "median": 5.644096374511719, "p90": 26.492737579345704, "max": 41.26803207397461, "pos_frac": 0.78125, "sample": [1.6716156005859375, 8.310203552246094, 1.0928688049316406, 1.767425537109375, 6.13792610168457, 26.726364135742188, 14.18359375, 0.7021446228027344, 19.9395751953125, 0.7502822875976562, 4.341766357421875, 4.217376708984375, 1.9262542724609375, 5.8712310791015625, 13.718017578125, 11.812026977539062, 1.2521133422851562, -3.8393383026123047, 35.65998077392578, 41.26803207397461, 10.287097930908203, 7.731693267822266, 23.38685417175293, 16.817718505859375, -0.44481658935546875, 32.9227294921875, 17.959548950195312, 1.63262939453125, 24.385208129882812, 6.4296875, 9.395156860351562, 5.208408355712891, -7.9951019287109375, 12.929611206054688, 5.416961669921875, -1.3352928161621094, -1.1931533813476562, 1.9280014038085938, 38.480648040771484, 31.254661560058594, 15.599395751953125, -14.032279968261719, 14.672760009765625, -7.98499870300293, 3.809600830078125, -11.353240966796875, -12.374137878417969, -28.78924560546875, 0.42327117919921875, 7.741689682006836, -5.251556396484375, 10.20907211303711, 35.165618896484375, 5.20494270324707, 3.0719451904296875, -7.543632507324219, -1.9662322998046875, 25.947608947753906, 0.8397979736328125, 12.464971542358398, 14.387893676757812, 8.772857666015625, 18.856346130371094, -5.898162841796875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000643.npy"}
{"epoch": 0.9720332577475435, "step": 644, "batch_size": 64, "mean": 7.383061408996582, "std": 12.757487297058105, "min": -22.145233154296875, "p10": -6.872967910766602, "median": 6.963754653930664, "p90": 23.67087554931641, "max": 38.38648986816406, "pos_frac": 0.6875, "sample": [19.618061065673828, 2.8741683959960938, -22.145233154296875, 18.427345275878906, -2.0646438598632812, -13.002861022949219, 6.807300567626953, 5.591884613037109, 30.359512329101562, 10.555305480957031, 7.120208740234375, -10.20370101928711, 5.8035736083984375, 21.56414794921875, 11.6588134765625, 10.143760681152344, -2.4222946166992188, 6.282867431640625, -2.5687408447265625, 26.87329864501953, 0.46027565002441406, 13.227546691894531, -16.910846710205078, -1.1611862182617188, 3.183013916015625, 15.150123596191406, -3.071918487548828, 11.690643310546875, 2.9613075256347656, -3.9316024780273438, 17.19799041748047, 11.84796142578125, -1.1954574584960938, 10.190902709960938, 24.083515167236328, -6.957012176513672, 1.4789886474609375, 28.873899459838867, 20.431026458740234, 6.618852615356445, 14.878875732421875, 13.34884262084961, 9.237625122070312, 5.74919319152832, -3.6136703491210938, -15.94525146484375, -2.9256725311279297, 36.69919967651367, 13.862556457519531, 22.708049774169922, 7.1328887939453125, 0.6288909912109375, -1.6926803588867188, -6.6768646240234375, 9.957489013671875, 9.974754333496094, -11.815559387207031, -3.200122833251953, 17.1124267578125, 8.768665313720703, -4.050666809082031, 29.653732299804688, 18.895950317382812, 38.38648986816406], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000644.npy"}
{"epoch": 0.9735449735449735, "step": 645, "batch_size": 64, "mean": 9.357705116271973, "std": 11.766825675964355, "min": -12.58026123046875, "p10": -4.370175552368163, "median": 6.262813568115234, "p90": 24.332541275024415, "max": 42.537994384765625, "pos_frac": 0.78125, "sample": [16.504356384277344, 10.845550537109375, 32.865943908691406, -1.3577308654785156, 24.532608032226562, 42.537994384765625, 34.53672409057617, -2.536428451538086, 9.931800842285156, -6.715667724609375, -1.1186981201171875, -4.5972137451171875, 2.2227420806884766, 1.0452156066894531, 30.961105346679688, 7.685894012451172, 17.751157760620117, 18.049428939819336, 4.039360046386719, -4.662342071533203, 28.691543579101562, -2.6180267333984375, -5.1568450927734375, 12.552253723144531, 0.6269073486328125, -4.5725860595703125, 5.031028747558594, 14.019432067871094, 15.142784118652344, 16.698806762695312, 8.093643188476562, 5.6819610595703125, 11.96955680847168, 6.649482727050781, 3.4739837646484375, 19.140670776367188, -6.506744384765625, 0.3525543212890625, 4.82470703125, 20.605987548828125, 19.648963928222656, 13.220993041992188, 38.177490234375, 4.496772766113281, 5.641025543212891, 15.914060592651367, 12.768623352050781, 3.04620361328125, 5.8761444091796875, 13.916893005371094, -3.8978843688964844, 23.865718841552734, 4.163093566894531, 18.952903747558594, -2.4979705810546875, 1.9280433654785156, 10.245361328125, -1.1817474365234375, 1.782928466796875, -12.58026123046875, 1.2192916870117188, 22.751922607421875, 4.734529495239258, 9.477134704589844], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000645.npy"}
{"epoch": 0.9750566893424036, "step": 646, "batch_size": 64, "mean": 9.095805168151855, "std": 12.814823150634766, "min": -24.01241683959961, "p10": -3.8371292114257805, "median": 7.826658248901367, "p90": 29.789262390136727, "max": 42.80731201171875, "pos_frac": 0.8125, "sample": [-4.1265411376953125, 0.34314537048339844, 5.4589691162109375, 10.04046630859375, 31.29669189453125, 10.907455444335938, 2.938873291015625, 25.968719482421875, 14.597667694091797, 13.25634765625, 0.244964599609375, 9.333091735839844, 31.409149169921875, 28.129913330078125, 11.503860473632812, -2.1088790893554688, 9.6956787109375, 12.128387451171875, 28.07563018798828, 16.038022994995117, 30.500411987304688, 7.6097259521484375, -18.67157745361328, 10.351451873779297, -6.639568328857422, 26.967727661132812, 4.419647216796875, -0.0182342529296875, 11.077293395996094, 0.3985748291015625, 2.322662353515625, 3.4005126953125, 1.7418441772460938, 33.016075134277344, 6.930091857910156, 3.6265296936035156, 5.330846786499023, 2.8212127685546875, 8.043590545654297, 17.922754287719727, 0.182525634765625, -9.266983032226562, 11.321113586425781, 4.1547393798828125, 16.658981323242188, 42.80731201171875, 8.44146728515625, 12.803993225097656, -0.15804004669189453, 30.718460083007812, 19.925945281982422, 3.0841522216796875, -3.161834716796875, 8.147605895996094, -24.01241683959961, 0.9704971313476562, 15.115371704101562, -8.8665771484375, -2.9176712036132812, 33.78905487060547, -4.436676025390625, 5.572883605957031, 2.6615028381347656, 12.312938690185547], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000646.npy"}
{"epoch": 0.9765684051398337, "step": 647, "batch_size": 64, "mean": 11.073234558105469, "std": 12.598631858825684, "min": -18.5128173828125, "p10": -2.1212680816650384, "median": 8.856101989746094, "p90": 28.809124183654788, "max": 50.18928527832031, "pos_frac": 0.875, "sample": [-4.035919189453125, 5.776496887207031, 19.28244972229004, 5.711814880371094, 6.3717803955078125, 6.9778594970703125, 3.67010498046875, 29.7305908203125, 5.749176025390625, 20.933380126953125, 4.851737976074219, 13.279205322265625, 3.6286773681640625, 2.6599960327148438, -4.619392395019531, 5.457550048828125, 50.18928527832031, 5.583948135375977, -1.4556694030761719, 3.41156005859375, 21.520105361938477, 11.857521057128906, 10.621986389160156, 8.417961120605469, 36.524314880371094, 28.325437545776367, 7.01873779296875, 7.2353363037109375, 13.412200927734375, 22.357681274414062, 9.014923095703125, 2.1457366943359375, 12.602737426757812, 2.918792724609375, 8.138261795043945, -4.300006866455078, 9.848522186279297, 20.276077270507812, -2.406524658203125, -14.211074829101562, 8.697280883789062, 15.665092468261719, 9.531017303466797, 17.046524047851562, -18.5128173828125, 6.971195220947266, 11.7666015625, 13.093605041503906, 2.7061080932617188, 12.21255111694336, 1.7940444946289062, 34.48462677001953, 21.576644897460938, 0.85552978515625, 10.784446716308594, 29.01641845703125, 4.105720520019531, 48.652313232421875, 9.577667236328125, 14.592254638671875, 29.8662109375, -7.05816650390625, 15.600332260131836, 21.188430786132812], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000647.npy"}
{"epoch": 0.9780801209372638, "step": 648, "batch_size": 64, "mean": 8.14780044555664, "std": 12.372627258300781, "min": -31.358997344970703, "p10": -4.549099731445312, "median": 5.750931739807129, "p90": 25.10865783691407, "max": 36.15703582763672, "pos_frac": 0.84375, "sample": [10.647071838378906, 1.5115432739257812, 11.985847473144531, -31.358997344970703, 27.10699462890625, 5.983192443847656, -15.31524658203125, 13.695854187011719, 18.526351928710938, 14.17527961730957, 2.244182586669922, 7.803504943847656, 6.0717620849609375, 2.4104537963867188, -4.696540832519531, -3.332986831665039, 10.978633880615234, 0.6535263061523438, 36.15703582763672, 21.103240966796875, 32.931617736816406, 4.851894378662109, 5.2060089111328125, 34.55625534057617, 11.256645202636719, 8.924930572509766, 2.4801101684570312, 16.880420684814453, 1.7584686279296875, 33.7313232421875, 2.1331634521484375, -2.0402603149414062, 23.10272216796875, 4.45941162109375, 5.716222763061523, -9.330657958984375, 10.715354919433594, 5.785640716552734, 5.419158935546875, 21.654373168945312, 5.7926177978515625, 3.643035888671875, 32.89713668823242, 4.133159637451172, -5.047294616699219, 1.42266845703125, 20.846588134765625, 4.809864044189453, 6.9244537353515625, 9.235088348388672, 8.375129699707031, 15.324378967285156, 5.413642883300781, -11.51422119140625, -9.604705810546875, 23.5963134765625, 25.756805419921875, 0.2697639465332031, 2.446758270263672, -4.205070495605469, 5.42877197265625, 5.384880065917969, 2.5306243896484375, 11.055355072021484], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000648.npy"}
{"epoch": 0.9795918367346939, "step": 649, "batch_size": 64, "mean": 8.692584037780762, "std": 14.195343017578125, "min": -22.485130310058594, "p10": -9.274611282348634, "median": 9.2687349319458, "p90": 27.85890922546387, "max": 42.10059356689453, "pos_frac": 0.75, "sample": [30.669357299804688, 14.259513854980469, 1.023508071899414, 6.616752624511719, 22.995166778564453, 17.5574951171875, 9.492477416992188, 18.604108810424805, 4.948566436767578, 31.151958465576172, 2.2788772583007812, 26.347610473632812, 3.6195068359375, -12.0645751953125, 14.206108093261719, 0.541748046875, 1.81036376953125, 9.044992446899414, 19.500839233398438, 12.249313354492188, 16.28030776977539, -15.000755310058594, -22.485130310058594, -6.933967590332031, -17.647247314453125, 18.429271697998047, 12.98895263671875, 0.9963455200195312, -9.135860443115234, -9.958389282226562, 22.125015258789062, 14.031303405761719, 24.23474884033203, 9.600584030151367, 27.61728286743164, 8.155914306640625, 11.27835464477539, 18.490642547607422, 10.25937271118164, 9.834091186523438, 3.8282394409179688, 42.10059356689453, 35.3758430480957, -5.905994415283203, -1.3364105224609375, 30.79076385498047, 6.221979141235352, 0.5875396728515625, -1.578887939453125, 3.206480026245117, 1.3982086181640625, 34.72148895263672, -9.334075927734375, -4.738433837890625, 27.96246337890625, -21.992996215820312, -5.275382995605469, -1.9892730712890625, 13.463726043701172, 16.462753295898438, 19.165435791015625, 13.098060607910156, 2.377655029296875, -0.298919677734375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000649.npy"}
{"epoch": 0.981103552532124, "step": 650, "batch_size": 64, "mean": 9.252646446228027, "std": 12.553420066833496, "min": -26.85870361328125, "p10": -5.111824798583984, "median": 8.518966674804688, "p90": 26.806208419799805, "max": 40.80283737182617, "pos_frac": 0.75, "sample": [-3.5333633422851562, 15.92633056640625, 8.94384765625, 26.871475219726562, 6.454315185546875, 31.215187072753906, 2.969573974609375, 8.822998046875, 19.665847778320312, 2.038270950317383, 17.724891662597656, 21.762535095214844, -5.144493103027344, 10.042366027832031, 2.032562255859375, 11.423431396484375, 20.824844360351562, 12.89599609375, 13.097679138183594, 0.6033782958984375, 19.288009643554688, 11.767145156860352, 2.4861106872558594, -7.208160400390625, 26.653919219970703, 21.23143768310547, 40.80283737182617, 22.80139923095703, 21.204666137695312, -2.1970672607421875, 4.982357025146484, -8.015464782714844, -2.8285980224609375, 22.449142456054688, -26.85870361328125, 3.031352996826172, 29.744964599609375, 17.088520050048828, 9.201957702636719, -7.4270477294921875, 14.440460205078125, 27.270282745361328, -13.154273986816406, 14.256996154785156, 4.744476318359375, -2.3607521057128906, 27.691131591796875, 24.013565063476562, 3.4561767578125, -7.084690093994141, 7.472108840942383, 2.662261962890625, 7.723545074462891, 2.542682647705078, -3.4532337188720703, 15.929275512695312, -5.0355987548828125, -4.6013641357421875, 9.859931945800781, 27.03668212890625, -0.343048095703125, 8.214935302734375, 8.053031921386719, -0.0016937255859375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000650.npy"}
{"epoch": 0.982615268329554, "step": 651, "batch_size": 64, "mean": 7.572590351104736, "std": 11.776309967041016, "min": -18.340641021728516, "p10": -3.1329109191894533, "median": 5.861116886138916, "p90": 26.371050262451174, "max": 37.58611297607422, "pos_frac": 0.6875, "sample": [3.2645702362060547, 1.5927581787109375, 8.439033508300781, -2.5071258544921875, 6.8779754638671875, -18.340641021728516, -2.2144851684570312, 5.665456771850586, -2.8872222900390625, 31.430225372314453, 0.6865806579589844, -4.2054443359375, -8.334182739257812, 7.2091522216796875, 26.45136260986328, -9.415451049804688, -1.6265907287597656, 17.2374267578125, 8.666641235351562, 15.915634155273438, 14.709850311279297, 1.60675048828125, 3.49847412109375, 5.6426544189453125, -11.170536041259766, 7.803581237792969, 10.427078247070312, 5.570831298828125, -3.0989608764648438, 26.18365478515625, 14.49264144897461, 24.91130828857422, 11.465801239013672, -3.1474609375, 7.064239501953125, -2.011362075805664, -1.2437515258789062, 13.907203674316406, -0.8946037292480469, 10.091781616210938, 28.87442398071289, 4.225963592529297, 24.793190002441406, 7.3536834716796875, -0.15935516357421875, 37.58611297607422, 28.87242889404297, 27.370868682861328, -0.64666748046875, 22.000717163085938, 6.056777000427246, -2.7492752075195312, 2.3225765228271484, 14.269363403320312, 11.46649169921875, 12.722118377685547, 7.40995979309082, -1.1010475158691406, 0.05098724365234375, -1.5504150390625, 1.7895851135253906, 9.282764434814453, 33.26304626464844, -8.573379516601562], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000651.npy"}
{"epoch": 0.9841269841269841, "step": 652, "batch_size": 64, "mean": 7.717067718505859, "std": 13.553836822509766, "min": -34.10447692871094, "p10": -6.41130313873291, "median": 4.535224914550781, "p90": 28.645178222656256, "max": 36.988685607910156, "pos_frac": 0.734375, "sample": [1.83251953125, 14.080368041992188, 29.365631103515625, 4.5975799560546875, 10.154312133789062, 31.27391815185547, -2.199390411376953, 18.418228149414062, 32.89570617675781, -0.44002532958984375, 27.226699829101562, -2.245830535888672, -0.22488975524902344, 8.748165130615234, 19.056610107421875, 16.070640563964844, 4.472869873046875, 21.799654006958008, 10.300308227539062, -4.8414154052734375, -6.587043762207031, 22.71856689453125, 10.151443481445312, 4.4563446044921875, 1.659088134765625, 8.075775146484375, -34.10447692871094, -10.08319091796875, 17.43146514892578, 10.190460205078125, 7.691749572753906, -6.343158721923828, 7.235115051269531, 1.314056396484375, 1.7230606079101562, 1.7432022094726562, 33.24864196777344, -12.399787902832031, -6.440507888793945, 4.0469818115234375, 2.6899490356445312, 36.988685607910156, 1.2331962585449219, 6.458385467529297, 0.08628082275390625, 3.928485870361328, 21.40802764892578, 27.423553466796875, -1.2880172729492188, 21.170806884765625, 29.168731689453125, -3.9188766479492188, 16.875526428222656, 2.2028732299804688, -11.834854125976562, 1.50408935546875, -5.2192230224609375, 5.042375564575195, -3.678436279296875, 34.619503021240234, 13.016738891601562, 1.6815643310546875, -10.262489318847656, 8.526046752929688], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000652.npy"}
{"epoch": 0.9856386999244142, "step": 653, "batch_size": 64, "mean": 8.90933895111084, "std": 12.694674491882324, "min": -21.03152847290039, "p10": -7.87465591430664, "median": 7.770469665527344, "p90": 24.15825881958008, "max": 42.75433349609375, "pos_frac": 0.78125, "sample": [3.738424301147461, 21.92877960205078, 3.973724365234375, -9.162315368652344, 9.785442352294922, 14.332691192626953, 10.184677124023438, 20.06085968017578, 33.4473876953125, -0.1981658935546875, -10.986164093017578, 2.496673583984375, 20.801677703857422, 3.235595703125, 23.50762176513672, 14.031517028808594, 2.125335693359375, 25.6512451171875, 5.692630767822266, 29.185943603515625, 16.248519897460938, 24.658828735351562, -3.0666122436523438, -2.0076217651367188, 14.950128555297852, 1.3903656005859375, 17.122154235839844, 7.3069000244140625, -6.897239685058594, 8.234039306640625, 42.75433349609375, 0.3671875, 2.6171951293945312, 6.967836380004883, 16.805736541748047, -6.472450256347656, 10.310859680175781, 5.809501647949219, 20.222412109375, 0.6360092163085938, 6.926849365234375, -8.293548583984375, 24.437103271484375, 16.684783935546875, 15.2474365234375, -21.03152847290039, 21.149463653564453, 2.89068603515625, 22.95391845703125, 15.743606567382812, -10.918510437011719, 35.269920349121094, -17.6864013671875, -0.30365753173828125, -4.294727325439453, 2.647552490234375, 15.15350341796875, 6.7307281494140625, 11.339683532714844, 15.419021606445312, 14.002994537353516, 2.9846229553222656, -10.06247329711914, 11.414993286132812], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000653.npy"}
{"epoch": 0.9871504157218443, "step": 654, "batch_size": 64, "mean": 7.37592077255249, "std": 13.102499008178711, "min": -13.422195434570312, "p10": -7.450526237487792, "median": 5.000383377075195, "p90": 29.049500083923345, "max": 39.69762420654297, "pos_frac": 0.6875, "sample": [3.5300064086914062, 30.690109252929688, -5.621185302734375, 14.350818634033203, 0.53326416015625, -9.44415283203125, 23.235679626464844, 8.33575439453125, 2.1865386962890625, 5.782691955566406, -4.9012908935546875, -10.366615295410156, 29.5352783203125, 10.70269775390625, -1.4389419555664062, 11.673660278320312, 4.337368011474609, -11.198707580566406, 35.18533706665039, -0.6174240112304688, 0.9425048828125, 4.689899444580078, -2.3828201293945312, 31.25495147705078, 0.8286495208740234, 6.343330383300781, -6.898157119750977, -8.215614318847656, -12.80633544921875, 11.209514617919922, -4.321636199951172, -6.569549560546875, 32.378173828125, 8.504913330078125, 24.36962127685547, 19.958587646484375, 8.297882080078125, 11.436691284179688, 5.883018493652344, 5.3108673095703125, 25.68804168701172, 4.034576416015625, -4.292816162109375, -2.983795166015625, 9.151874542236328, -3.4819488525390625, 31.82769012451172, 27.916017532348633, 11.042226791381836, 2.2332687377929688, 14.297630310058594, 26.335403442382812, 4.444736480712891, 4.6584625244140625, 39.69762420654297, -13.422195434570312, 12.703544616699219, 2.6474533081054688, -5.5318145751953125, 12.305034637451172, 6.004203796386719, -3.95062255859375, 11.716232299804688, -7.687255859375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000654.npy"}
{"epoch": 0.9886621315192744, "step": 655, "batch_size": 64, "mean": 8.42211627960205, "std": 11.549861907958984, "min": -11.342964172363281, "p10": -3.1708614349365236, "median": 4.939609527587891, "p90": 24.267616271972656, "max": 51.933685302734375, "pos_frac": 0.75, "sample": [24.282882690429688, 2.357616424560547, -2.1413803100585938, 5.941129684448242, 3.2141952514648438, 24.19586944580078, 14.026962280273438, 0.34067535400390625, -4.364067077636719, 6.1146240234375, 3.168853759765625, 12.111709594726562, 0.05045318603515625, 4.361900329589844, -8.015785217285156, 3.1388206481933594, 6.664085388183594, 3.7508201599121094, 26.798988342285156, 6.528892517089844, 30.730926513671875, -0.553131103515625, 9.849998474121094, 18.57733154296875, 24.23199462890625, -3.1440277099609375, 10.965110778808594, -1.2919578552246094, 25.913223266601562, -3.182361602783203, 13.48737907409668, -2.8904457092285156, 14.821361541748047, 4.5875701904296875, 17.0657958984375, -0.44245147705078125, 4.329914093017578, 18.93976593017578, 13.743789672851562, 51.933685302734375, -1.1741790771484375, -11.342964172363281, 4.529869079589844, 19.513214111328125, 4.497581481933594, -10.312000274658203, 5.149131774902344, 5.066795349121094, 14.185811996459961, 21.025924682617188, -4.128623962402344, 3.0420989990234375, 23.19232177734375, 5.783073425292969, -0.23126220703125, 1.6977958679199219, 28.209468841552734, 4.043975830078125, 12.002159118652344, -2.034942626953125, -3.4756336212158203, 27.301067352294922, 4.8124237060546875, 7.461639404296875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000655.npy"}
{"epoch": 0.9901738473167044, "step": 656, "batch_size": 64, "mean": 9.37906265258789, "std": 13.029580116271973, "min": -21.358951568603516, "p10": -3.6420978546142577, "median": 7.265419006347656, "p90": 27.539342117309573, "max": 42.656776428222656, "pos_frac": 0.75, "sample": [24.341323852539062, 4.3716278076171875, 5.416229248046875, 17.017982482910156, -6.58837890625, -3.6872482299804688, 8.4190673828125, 12.410789489746094, 6.9246368408203125, 1.2550277709960938, 6.112457275390625, 31.03453826904297, -2.2341651916503906, -0.9838485717773438, -2.648691177368164, 12.505096435546875, -15.208267211914062, -2.0365982055664062, 16.275691986083984, 14.904132843017578, 42.656776428222656, 14.528205871582031, -21.358951568603516, 24.2027587890625, 15.975006103515625, -9.597099304199219, 12.957473754882812, 19.5899658203125, -1.5043487548828125, 6.636383056640625, 37.07002258300781, 2.425037384033203, 6.420810699462891, 12.051712036132812, 8.67093276977539, 39.00086975097656, 7.606201171875, 3.0698299407958984, 26.466262817382812, 6.6046142578125, -9.1123046875, -0.863861083984375, 17.025711059570312, 27.99923324584961, 13.332862854003906, 33.215843200683594, 6.891151428222656, 13.410842895507812, 4.229804992675781, 11.189167022705078, 19.46331787109375, -3.5367469787597656, -2.949462890625, 37.31312561035156, 5.966400146484375, 6.690673828125, 9.203804016113281, -9.805343627929688, 2.8745803833007812, 0.7593727111816406, -1.5818862915039062, 14.339630126953125, 10.650463104248047, 12.479728698730469], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000656.npy"}
{"epoch": 0.9916855631141346, "step": 657, "batch_size": 64, "mean": 11.48154354095459, "std": 13.333126068115234, "min": -21.54138946533203, "p10": -2.9045524597167964, "median": 10.213342666625977, "p90": 30.065380859375004, "max": 38.32849884033203, "pos_frac": 0.796875, "sample": [31.164695739746094, 10.460121154785156, 0.3035736083984375, 19.32855224609375, 13.206939697265625, 26.90347671508789, 30.573333740234375, 36.56757354736328, 20.531192779541016, -16.647445678710938, -1.0282363891601562, -3.3038196563720703, 15.202293395996094, 3.49298095703125, 18.997726440429688, 12.774394989013672, 18.872177124023438, 31.555213928222656, -3.983978271484375, 27.806381225585938, -3.3892822265625, 9.685216903686523, -11.623588562011719, 1.5244503021240234, 6.4731597900390625, 17.20330238342285, 2.51025390625, 14.167442321777344, -0.5225639343261719, 1.0167236328125, 8.049163818359375, 12.550872802734375, 12.538360595703125, 28.228248596191406, 26.408531188964844, 11.680732727050781, 9.966564178466797, 38.044883728027344, 22.865280151367188, 5.789398193359375, 28.880157470703125, 4.496490478515625, 2.0419445037841797, 19.43444061279297, -1.0518112182617188, -3.054046630859375, 38.32849884033203, 8.355888366699219, -2.5557327270507812, 3.7524070739746094, -2.2313690185546875, -21.54138946533203, 15.3656005859375, -1.0108232498168945, 9.319339752197266, 7.165557861328125, 0.096435546875, 16.124736785888672, 26.916275024414062, 37.58631134033203, 2.230478286743164, 11.94033432006836, 8.084770202636719, 20.20000457763672], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000657.npy"}
{"epoch": 0.9931972789115646, "step": 658, "batch_size": 64, "mean": 8.180170059204102, "std": 14.247986793518066, "min": -30.262466430664062, "p10": -8.928837776184082, "median": 4.875862121582031, "p90": 25.898590469360357, "max": 40.35636901855469, "pos_frac": 0.734375, "sample": [15.129840850830078, 18.13550567626953, 18.35888671875, -9.064697265625, 3.97406005859375, 5.95611572265625, -18.396936416625977, 2.58087158203125, 11.325420379638672, 10.97906494140625, 2.3777313232421875, 8.44757080078125, -9.221134185791016, 24.601703643798828, 16.891254425048828, 9.763317108154297, -4.247833251953125, 0.9407272338867188, 39.14239501953125, 5.7776641845703125, -8.647222518920898, 3.286468505859375, -5.781768798828125, 18.423362731933594, 8.3232421875, 21.701400756835938, -9.403907775878906, 17.52440643310547, 24.176162719726562, -0.8228530883789062, 3.497783660888672, 2.7884864807128906, 0.5100154876708984, 26.45439910888672, -1.3759269714355469, 18.741924285888672, 14.0673828125, -9.323776245117188, 3.363739013671875, 10.072235107421875, 39.546844482421875, 3.459585189819336, 1.9552764892578125, 30.383892059326172, 0.05074310302734375, 22.766178131103516, 2.8684616088867188, 34.19956970214844, 23.36565399169922, -0.00724029541015625, -4.504173278808594, 1.2937774658203125, -4.868129730224609, -9.049530029296875, 11.695816040039062, 19.131561279296875, -2.866443634033203, -30.262466430664062, 36.26898193359375, -3.907928466796875, 40.35636901855469, 8.214370727539062, 11.987892150878906, 0.42475128173828125], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000658.npy"}
{"epoch": 0.9947089947089947, "step": 659, "batch_size": 64, "mean": 6.01186990737915, "std": 13.17009449005127, "min": -23.750896453857422, "p10": -9.513371276855468, "median": 5.085702896118164, "p90": 23.172697067260746, "max": 38.02110290527344, "pos_frac": 0.71875, "sample": [9.292854309082031, 5.217594146728516, 31.07623291015625, 3.3812484741210938, -5.701904296875, 34.4822998046875, 0.10767364501953125, 14.30040168762207, 3.15521240234375, 8.042156219482422, 18.8983154296875, -1.019439697265625, 2.6609878540039062, -0.47882843017578125, 3.0720624923706055, 13.347969055175781, 13.641426086425781, -13.536201477050781, 2.340557098388672, -14.340141296386719, 21.89844512939453, -7.534389495849609, -0.32861328125, 12.869743347167969, 5.827724456787109, 1.4223098754882812, 2.1714630126953125, 9.429874420166016, 0.9039707183837891, 4.96051025390625, 8.807632446289062, -9.599517822265625, 33.64155578613281, 3.6934661865234375, 6.2720184326171875, 10.137908935546875, 38.02110290527344, 12.80926513671875, 30.232078552246094, 7.937374114990234, 12.672561645507812, -17.646240234375, -13.957576751708984, 22.671924591064453, -9.312362670898438, 5.210895538330078, 0.9365997314453125, 12.0111083984375, 23.387313842773438, -5.0350799560546875, 6.841320037841797, 17.338573455810547, -23.750896453857422, -1.0382919311523438, 16.875083923339844, 10.78499984741211, -3.970245361328125, -21.953948974609375, 29.010833740234375, 5.5580596923828125, -1.1263275146484375, 4.749235153198242, 1.159576416015625, -8.171844482421875], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000659.npy"}
{"epoch": 0.9962207105064248, "step": 660, "batch_size": 64, "mean": 10.601615905761719, "std": 11.060906410217285, "min": -13.104316711425781, "p10": -2.4158613204956056, "median": 9.965511322021484, "p90": 25.869981765747074, "max": 33.70758056640625, "pos_frac": 0.796875, "sample": [22.93817901611328, 10.616607666015625, 10.049858093261719, -5.890602111816406, 11.250181198120117, -5.583534240722656, 6.1768341064453125, 20.385669708251953, -13.104316711425781, -10.45672607421875, 19.0576171875, 27.153213500976562, 10.354618072509766, 20.12108612060547, 12.383903503417969, 25.087387084960938, 26.205379486083984, 14.881519317626953, 9.88116455078125, -6.174957275390625, 7.465370178222656, 6.173980712890625, 17.235336303710938, -0.3982582092285156, 2.151611328125, 32.188682556152344, -1.1956787109375, 17.558326721191406, 7.498706817626953, -2.2510604858398438, 0.6754608154296875, 13.98040771484375, 33.70758056640625, 9.250778198242188, -2.3984603881835938, -0.8633804321289062, 1.8613510131835938, 13.454753875732422, 28.963520050048828, 5.695549011230469, 2.0714111328125, 7.262397766113281, 14.175079345703125, 5.8098297119140625, 25.03900909423828, 8.0091552734375, 11.882789611816406, 1.9497222900390625, 13.213798522949219, 26.5672607421875, 14.063865661621094, 32.00817108154297, -2.4237899780273438, 5.160682678222656, 24.510513305664062, -1.0447463989257812, 7.914402961730957, 19.45238494873047, 23.298194885253906, -2.423318862915039, 8.30748176574707, 13.1083984375, 1.3347320556640625, 23.16834259033203], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000660.npy"}
{"epoch": 0.9977324263038548, "step": 661, "batch_size": 64, "mean": 6.786632537841797, "std": 13.36817741394043, "min": -32.14326477050781, "p10": -7.809616088867187, "median": 5.866168975830078, "p90": 24.299882507324224, "max": 35.793277740478516, "pos_frac": 0.65625, "sample": [-3.6898193359375, -2.1520843505859375, -9.343215942382812, -8.177337646484375, 10.97219467163086, 11.47396469116211, 26.419429779052734, -2.5347900390625, -32.14326477050781, 13.407848358154297, -4.766071319580078, -0.4415168762207031, -2.3421173095703125, 35.793277740478516, 24.726119995117188, 30.76922607421875, 34.372196197509766, 16.635189056396484, 35.38311767578125, -6.95159912109375, -24.57645034790039, -0.3631744384765625, 23.305328369140625, 5.657096862792969, -2.0532608032226562, -0.20040130615234375, -0.92718505859375, -8.841659545898438, 5.798515319824219, 7.519432067871094, 9.672149658203125, 8.446510314941406, -13.441228866577148, 19.364852905273438, 9.907958984375, 0.4270925521850586, 7.925209045410156, 13.22646713256836, 4.310356140136719, 12.739891052246094, 14.821922302246094, -0.049289703369140625, 0.5956268310546875, 21.121318817138672, 22.401138305664062, 14.713287353515625, 22.829757690429688, 10.709762573242188, 5.9338226318359375, 10.671241760253906, -6.751544952392578, 13.149398803710938, -9.91644287109375, -1.2675666809082031, 6.540409088134766, 13.778778076171875, 2.0026931762695312, 1.3619384765625, -0.520416259765625, 9.132049560546875, 1.1403350830078125, 2.6053466796875, 2.3604202270507812, 31.6722412109375], "npy": "/scratch/qu.yang1/outputs/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249/margin_logs/step_0000661.npy"}
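
A minimal sketch of how one of the margin-log records above could be re-checked offline. It assumes (this is not confirmed by the commit) that `sample` holds every per-example margin of the batch and that `pos_frac` is the fraction of strictly positive margins. The function name `summarize_margin_record` and the synthetic `demo` record are hypothetical and exist only for illustration.

```python
import json
import statistics


def summarize_margin_record(line: str) -> dict:
    """Parse one JSONL margin-log record and recompute simple statistics.

    Assumption: `sample` contains all per-example margins for the step,
    and `pos_frac` counts margins strictly greater than zero.
    """
    rec = json.loads(line)
    sample = rec["sample"]
    return {
        "step": rec["step"],
        # Statistics recomputed from the raw sample values.
        "mean": statistics.fmean(sample),
        "pos_frac": sum(1 for m in sample if m > 0) / len(sample),
        # Values as written by the logger, kept for comparison.
        "logged_mean": rec["mean"],
        "logged_pos_frac": rec["pos_frac"],
    }


# Synthetic record with the same keys as the log lines (not real data).
demo = json.dumps(
    {
        "step": 1,
        "batch_size": 4,
        "mean": 1.0,
        "pos_frac": 0.75,
        "sample": [2.0, -1.0, 0.5, 2.5],
    }
)
summary = summarize_margin_record(demo)
print(summary)
```

Comparing `mean` with `logged_mean` and `pos_frac` with `logged_pos_frac` on a real record is one way to test whether the two assumptions hold for this log.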

151388
merges.txt Normal file

File diff suppressed because it is too large


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4e37f31f231a8609031763d23d2c543957deeb21d3e83928a978e209a664a07b
size 4972454376


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c99ba2f2b22f4fedd1247f0dc23584dc599ca6e76e0e2d9ca7ac2fc577b0077d
size 4832048608


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9bb61d42106c26347f8513289f1c0644aa8e8760e6a7f43d694d8b9a0b3943ed
size 4832048656


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f1979cf6caaf49322afdd6c1521149b50c1bfc067b191fb182a1bfcf544d1a78
size 4999855528


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b53ac951da0248c0fc7d71a719a2576eb92696992f6bec8a08633b5164de334d
size 4832048672


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c305a3ed03fd57165fdabc283bc66cc79fc2444c8e3896d73d1c9aca62c2cc0b
size 4832048672


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:83ebd09d324d5272387993dce800e51ee28f18e0c6c5764295ed2d6e91564212
size 3462482728


@@ -0,0 +1,406 @@
{
"metadata": {
"total_size": 32762941440
},
"weight_map": {
"lm_head.weight": "model-00007-of-00007.safetensors",
"model.embed_tokens.weight": "model-00001-of-00007.safetensors",
"model.layers.0.input_layernorm.weight": "model-00001-of-00007.safetensors",
"model.layers.0.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.0.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.0.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.0.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
"model.layers.0.self_attn.k_norm.weight": "model-00001-of-00007.safetensors",
"model.layers.0.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.0.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.0.self_attn.q_norm.weight": "model-00001-of-00007.safetensors",
"model.layers.0.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.0.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.1.input_layernorm.weight": "model-00001-of-00007.safetensors",
"model.layers.1.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.1.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.1.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.1.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
"model.layers.1.self_attn.k_norm.weight": "model-00001-of-00007.safetensors",
"model.layers.1.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.1.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.1.self_attn.q_norm.weight": "model-00001-of-00007.safetensors",
"model.layers.1.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.1.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.10.input_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.10.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.10.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.10.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.10.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.10.self_attn.k_norm.weight": "model-00003-of-00007.safetensors",
"model.layers.10.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.10.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.10.self_attn.q_norm.weight": "model-00003-of-00007.safetensors",
"model.layers.10.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.10.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.11.input_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.11.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.11.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.11.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.11.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.11.self_attn.k_norm.weight": "model-00003-of-00007.safetensors",
"model.layers.11.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.11.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.11.self_attn.q_norm.weight": "model-00003-of-00007.safetensors",
"model.layers.11.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.11.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.12.input_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.12.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.12.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.12.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.12.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.12.self_attn.k_norm.weight": "model-00003-of-00007.safetensors",
"model.layers.12.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.12.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.12.self_attn.q_norm.weight": "model-00003-of-00007.safetensors",
"model.layers.12.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.12.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.13.input_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.13.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.13.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.13.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.13.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.13.self_attn.k_norm.weight": "model-00003-of-00007.safetensors",
"model.layers.13.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.13.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.13.self_attn.q_norm.weight": "model-00003-of-00007.safetensors",
"model.layers.13.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.13.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.14.input_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.14.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.14.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.14.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.14.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.14.self_attn.k_norm.weight": "model-00003-of-00007.safetensors",
"model.layers.14.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.14.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.14.self_attn.q_norm.weight": "model-00003-of-00007.safetensors",
"model.layers.14.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.14.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.15.input_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.15.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.15.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.15.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.15.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.15.self_attn.k_norm.weight": "model-00003-of-00007.safetensors",
"model.layers.15.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.15.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.15.self_attn.q_norm.weight": "model-00003-of-00007.safetensors",
"model.layers.15.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.15.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.16.input_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.16.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.16.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.16.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.16.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.16.self_attn.k_norm.weight": "model-00004-of-00007.safetensors",
"model.layers.16.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.16.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.16.self_attn.q_norm.weight": "model-00004-of-00007.safetensors",
"model.layers.16.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.16.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.17.input_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.17.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.17.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.17.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.17.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.17.self_attn.k_norm.weight": "model-00004-of-00007.safetensors",
"model.layers.17.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.17.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.17.self_attn.q_norm.weight": "model-00004-of-00007.safetensors",
"model.layers.17.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.17.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.18.input_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.18.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.18.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.18.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.18.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.18.self_attn.k_norm.weight": "model-00004-of-00007.safetensors",
"model.layers.18.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.18.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.18.self_attn.q_norm.weight": "model-00004-of-00007.safetensors",
"model.layers.18.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.18.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.19.input_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.19.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.19.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.19.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.19.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.19.self_attn.k_norm.weight": "model-00004-of-00007.safetensors",
"model.layers.19.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.19.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.19.self_attn.q_norm.weight": "model-00004-of-00007.safetensors",
"model.layers.19.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.19.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.2.input_layernorm.weight": "model-00001-of-00007.safetensors",
"model.layers.2.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.2.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.2.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.2.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
"model.layers.2.self_attn.k_norm.weight": "model-00001-of-00007.safetensors",
"model.layers.2.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.2.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.2.self_attn.q_norm.weight": "model-00001-of-00007.safetensors",
"model.layers.2.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.2.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.20.input_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.20.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.20.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.20.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.20.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.20.self_attn.k_norm.weight": "model-00004-of-00007.safetensors",
"model.layers.20.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.20.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.20.self_attn.q_norm.weight": "model-00004-of-00007.safetensors",
"model.layers.20.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.20.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.21.input_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.21.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.21.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.21.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.21.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.21.self_attn.k_norm.weight": "model-00004-of-00007.safetensors",
"model.layers.21.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.21.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.21.self_attn.q_norm.weight": "model-00004-of-00007.safetensors",
"model.layers.21.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.21.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.22.input_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.22.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.22.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.22.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.22.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.22.self_attn.k_norm.weight": "model-00004-of-00007.safetensors",
"model.layers.22.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.22.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.22.self_attn.q_norm.weight": "model-00004-of-00007.safetensors",
"model.layers.22.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.22.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.23.input_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.23.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.23.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.23.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.23.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.23.self_attn.k_norm.weight": "model-00005-of-00007.safetensors",
"model.layers.23.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.23.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.23.self_attn.q_norm.weight": "model-00005-of-00007.safetensors",
"model.layers.23.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.23.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.24.input_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.24.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.24.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.24.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.24.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.24.self_attn.k_norm.weight": "model-00005-of-00007.safetensors",
"model.layers.24.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.24.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.24.self_attn.q_norm.weight": "model-00005-of-00007.safetensors",
"model.layers.24.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.24.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.25.input_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.25.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.25.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.25.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.25.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.25.self_attn.k_norm.weight": "model-00005-of-00007.safetensors",
"model.layers.25.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.25.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.25.self_attn.q_norm.weight": "model-00005-of-00007.safetensors",
"model.layers.25.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.25.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.26.input_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.26.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.26.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.26.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.26.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.26.self_attn.k_norm.weight": "model-00005-of-00007.safetensors",
"model.layers.26.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.26.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.26.self_attn.q_norm.weight": "model-00005-of-00007.safetensors",
"model.layers.26.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.26.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.27.input_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.27.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.27.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.27.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.27.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.27.self_attn.k_norm.weight": "model-00005-of-00007.safetensors",
"model.layers.27.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.27.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.27.self_attn.q_norm.weight": "model-00005-of-00007.safetensors",
"model.layers.27.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.27.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.28.input_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.28.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.28.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.28.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.28.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.28.self_attn.k_norm.weight": "model-00005-of-00007.safetensors",
"model.layers.28.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.28.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.28.self_attn.q_norm.weight": "model-00005-of-00007.safetensors",
"model.layers.28.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.28.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.29.input_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.29.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.29.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.29.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.29.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.29.self_attn.k_norm.weight": "model-00006-of-00007.safetensors",
"model.layers.29.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.29.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.29.self_attn.q_norm.weight": "model-00006-of-00007.safetensors",
"model.layers.29.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.29.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.3.input_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.3.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.3.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.3.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.3.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.3.self_attn.k_norm.weight": "model-00001-of-00007.safetensors",
"model.layers.3.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.3.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.3.self_attn.q_norm.weight": "model-00001-of-00007.safetensors",
"model.layers.3.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.3.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.30.input_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.30.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.30.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.30.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.30.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.30.self_attn.k_norm.weight": "model-00006-of-00007.safetensors",
"model.layers.30.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.30.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.30.self_attn.q_norm.weight": "model-00006-of-00007.safetensors",
"model.layers.30.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.30.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.31.input_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.31.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.31.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.31.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.31.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.31.self_attn.k_norm.weight": "model-00006-of-00007.safetensors",
"model.layers.31.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.31.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.31.self_attn.q_norm.weight": "model-00006-of-00007.safetensors",
"model.layers.31.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.31.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.32.input_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.32.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.32.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.32.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.32.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.32.self_attn.k_norm.weight": "model-00006-of-00007.safetensors",
"model.layers.32.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.32.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.32.self_attn.q_norm.weight": "model-00006-of-00007.safetensors",
"model.layers.32.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.32.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.33.input_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.33.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.33.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.33.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.33.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.33.self_attn.k_norm.weight": "model-00006-of-00007.safetensors",
"model.layers.33.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.33.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.33.self_attn.q_norm.weight": "model-00006-of-00007.safetensors",
"model.layers.33.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.33.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.34.input_layernorm.weight": "model-00007-of-00007.safetensors",
"model.layers.34.mlp.down_proj.weight": "model-00007-of-00007.safetensors",
"model.layers.34.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.34.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.34.post_attention_layernorm.weight": "model-00007-of-00007.safetensors",
"model.layers.34.self_attn.k_norm.weight": "model-00006-of-00007.safetensors",
"model.layers.34.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.34.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.34.self_attn.q_norm.weight": "model-00006-of-00007.safetensors",
"model.layers.34.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.34.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.35.input_layernorm.weight": "model-00007-of-00007.safetensors",
"model.layers.35.mlp.down_proj.weight": "model-00007-of-00007.safetensors",
"model.layers.35.mlp.gate_proj.weight": "model-00007-of-00007.safetensors",
"model.layers.35.mlp.up_proj.weight": "model-00007-of-00007.safetensors",
"model.layers.35.post_attention_layernorm.weight": "model-00007-of-00007.safetensors",
"model.layers.35.self_attn.k_norm.weight": "model-00007-of-00007.safetensors",
"model.layers.35.self_attn.k_proj.weight": "model-00007-of-00007.safetensors",
"model.layers.35.self_attn.o_proj.weight": "model-00007-of-00007.safetensors",
"model.layers.35.self_attn.q_norm.weight": "model-00007-of-00007.safetensors",
"model.layers.35.self_attn.q_proj.weight": "model-00007-of-00007.safetensors",
"model.layers.35.self_attn.v_proj.weight": "model-00007-of-00007.safetensors",
"model.layers.4.input_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.4.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.4.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.4.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.4.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.4.self_attn.k_norm.weight": "model-00002-of-00007.safetensors",
"model.layers.4.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.4.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.4.self_attn.q_norm.weight": "model-00002-of-00007.safetensors",
"model.layers.4.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.4.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.5.input_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.5.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.5.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.5.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.5.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.5.self_attn.k_norm.weight": "model-00002-of-00007.safetensors",
"model.layers.5.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.5.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.5.self_attn.q_norm.weight": "model-00002-of-00007.safetensors",
"model.layers.5.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.5.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.6.input_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.6.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.6.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.6.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.6.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.6.self_attn.k_norm.weight": "model-00002-of-00007.safetensors",
"model.layers.6.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.6.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.6.self_attn.q_norm.weight": "model-00002-of-00007.safetensors",
"model.layers.6.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.6.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.7.input_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.7.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.7.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.7.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.7.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.7.self_attn.k_norm.weight": "model-00002-of-00007.safetensors",
"model.layers.7.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.7.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.7.self_attn.q_norm.weight": "model-00002-of-00007.safetensors",
"model.layers.7.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.7.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.8.input_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.8.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.8.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.8.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.8.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.8.self_attn.k_norm.weight": "model-00002-of-00007.safetensors",
"model.layers.8.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.8.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.8.self_attn.q_norm.weight": "model-00002-of-00007.safetensors",
"model.layers.8.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.8.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.9.input_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.9.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.9.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.9.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.9.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.9.self_attn.k_norm.weight": "model-00002-of-00007.safetensors",
"model.layers.9.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.9.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.9.self_attn.q_norm.weight": "model-00002-of-00007.safetensors",
"model.layers.9.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.9.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
"model.norm.weight": "model-00007-of-00007.safetensors"
}
}
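
The weight map above assigns each tensor to one of seven shard files, and some decoder layers (for example layers 22, 28 and 34) have tensors in two shards. A minimal sketch of how such a map could be summarized per layer is shown below. The helper name `shards_per_layer` and the `sample` dictionary are illustrative assumptions, and only a handful of entries are copied from the map.

```python
from collections import defaultdict


def shards_per_layer(weight_map):
    """Map each decoder layer index to the sorted shard files holding its tensors.

    Hypothetical helper for illustration; it is not part of any library.
    """
    layers = defaultdict(set)
    for name, shard in weight_map.items():
        parts = name.split(".")
        # Layer tensors are named "model.layers.<idx>.<submodule>...weight".
        if len(parts) > 2 and parts[0] == "model" and parts[1] == "layers":
            layers[int(parts[2])].add(shard)
    return {idx: sorted(files) for idx, files in sorted(layers.items())}


# A few entries copied from the weight map above.
sample = {
    "model.layers.22.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
    "model.layers.22.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.23.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
    "model.norm.weight": "model-00007-of-00007.safetensors",
}

result = shards_per_layer(sample)
for layer, files in result.items():
    print(layer, files)
```

Non-layer tensors such as `model.norm.weight` are skipped by the sketch, since they do not belong to a numbered layer.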

31
special_tokens_map.json Normal file

@@ -0,0 +1,31 @@
{
"additional_special_tokens": [
"<|im_start|>",
"<|im_end|>",
"<|object_ref_start|>",
"<|object_ref_end|>",
"<|box_start|>",
"<|box_end|>",
"<|quad_start|>",
"<|quad_end|>",
"<|vision_start|>",
"<|vision_end|>",
"<|vision_pad|>",
"<|image_pad|>",
"<|video_pad|>"
],
"eos_token": {
"content": "<|endoftext|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"pad_token": {
"content": "<|endoftext|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
}
}

BIN
tokenizer.json (Stored with Git LFS) Normal file

Binary file not shown.

240
tokenizer_config.json Normal file

@@ -0,0 +1,240 @@
{
"add_bos_token": false,
"add_prefix_space": false,
"added_tokens_decoder": {
"151643": {
"content": "<|endoftext|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151644": {
"content": "<|im_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151645": {
"content": "<|im_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151646": {
"content": "<|object_ref_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151647": {
"content": "<|object_ref_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151648": {
"content": "<|box_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151649": {
"content": "<|box_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151650": {
"content": "<|quad_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151651": {
"content": "<|quad_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151652": {
"content": "<|vision_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151653": {
"content": "<|vision_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151654": {
"content": "<|vision_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151655": {
"content": "<|image_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151656": {
"content": "<|video_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151657": {
"content": "<tool_call>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151658": {
"content": "</tool_call>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151659": {
"content": "<|fim_prefix|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151660": {
"content": "<|fim_middle|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151661": {
"content": "<|fim_suffix|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151662": {
"content": "<|fim_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151663": {
"content": "<|repo_name|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151664": {
"content": "<|file_sep|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151665": {
"content": "<tool_response>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151666": {
"content": "</tool_response>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151667": {
"content": "<think>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151668": {
"content": "</think>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
}
},
"additional_special_tokens": [
"<|im_start|>",
"<|im_end|>",
"<|object_ref_start|>",
"<|object_ref_end|>",
"<|box_start|>",
"<|box_end|>",
"<|quad_start|>",
"<|quad_end|>",
"<|vision_start|>",
"<|vision_end|>",
"<|vision_pad|>",
"<|image_pad|>",
"<|video_pad|>"
],
"bos_token": null,
"chat_template": "{%- if tools %}\n {{- '<|im_start|>system\\n' }}\n {%- if messages[0].role == 'system' %}\n {{- messages[0].content + '\\n\\n' }}\n {%- endif %}\n {{- \"# Tools\\n\\nYou may call one or more functions to assist with the user query.\\n\\nYou are provided with function signatures within <tools></tools> XML tags:\\n<tools>\" }}\n {%- for tool in tools %}\n {{- \"\\n\" }}\n {{- tool | tojson }}\n {%- endfor %}\n {{- \"\\n</tools>\\n\\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\\n<tool_call>\\n{\\\"name\\\": <function-name>, \\\"arguments\\\": <args-json-object>}\\n</tool_call><|im_end|>\\n\" }}\n{%- else %}\n {%- if messages[0].role == 'system' %}\n {{- '<|im_start|>system\\n' + messages[0].content + '<|im_end|>\\n' }}\n {%- endif %}\n{%- endif %}\n{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}\n{%- for message in messages[::-1] %}\n {%- set index = (messages|length - 1) - loop.index0 %}\n {%- if ns.multi_step_tool and message.role == \"user\" and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}\n {%- set ns.multi_step_tool = false %}\n {%- set ns.last_query_index = index %}\n {%- endif %}\n{%- endfor %}\n{%- for message in messages %}\n {%- if (message.role == \"user\") or (message.role == \"system\" and not loop.first) %}\n {{- '<|im_start|>' + message.role + '\\n' + message.content + '<|im_end|>' + '\\n' }}\n {%- elif message.role == \"assistant\" %}\n {%- set content = message.content %}\n {%- set reasoning_content = '' %}\n {%- if message.reasoning_content is defined and message.reasoning_content is not none %}\n {%- set reasoning_content = message.reasoning_content %}\n {%- else %}\n {%- if '</think>' in message.content %}\n {%- set content = message.content.split('</think>')[-1].lstrip('\\n') %}\n {%- set reasoning_content = message.content.split('</think>')[0].rstrip('\\n').split('<think>')[-1].lstrip('\\n') %}\n {%- endif %}\n {%- endif %}\n {%- if loop.index0 > ns.last_query_index %}\n {%- if loop.last or (not loop.last and reasoning_content) %}\n {{- '<|im_start|>' + message.role + '\\n<think>\\n' + reasoning_content.strip('\\n') + '\\n</think>\\n\\n' + content.lstrip('\\n') }}\n {%- else %}\n {{- '<|im_start|>' + message.role + '\\n' + content }}\n {%- endif %}\n {%- else %}\n {{- '<|im_start|>' + message.role + '\\n' + content }}\n {%- endif %}\n {%- if message.tool_calls %}\n {%- for tool_call in message.tool_calls %}\n {%- if (loop.first and content) or (not loop.first) %}\n {{- '\\n' }}\n {%- endif %}\n {%- if tool_call.function %}\n {%- set tool_call = tool_call.function %}\n {%- endif %}\n {{- '<tool_call>\\n{\"name\": \"' }}\n {{- tool_call.name }}\n {{- '\", \"arguments\": ' }}\n {%- if tool_call.arguments is string %}\n {{- tool_call.arguments }}\n {%- else %}\n {{- tool_call.arguments | tojson }}\n {%- endif %}\n {{- '}\\n</tool_call>' }}\n {%- endfor %}\n {%- endif %}\n {{- '<|im_end|>\\n' }}\n {%- elif message.role == \"tool\" %}\n {%- if loop.first or (messages[loop.index0 - 1].role != \"tool\") %}\n {{- '<|im_start|>user' }}\n {%- endif %}\n {{- '\\n<tool_response>\\n' }}\n {{- message.content }}\n {{- '\\n</tool_response>' }}\n {%- if loop.last or (messages[loop.index0 + 1].role != \"tool\") %}\n {{- '<|im_end|>\\n' }}\n {%- endif %}\n {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n {{- '<|im_start|>assistant\\n' }}\n {%- if enable_thinking is defined and enable_thinking is false %}\n {{- '<think>\\n\\n</think>\\n\\n' }}\n {%- endif %}\n{%- endif %}",
"clean_up_tokenization_spaces": false,
"eos_token": "<|endoftext|>",
"errors": "replace",
"extra_special_tokens": {},
"model_max_length": 2048,
"pad_token": "<|endoftext|>",
"split_special_tokens": false,
"tokenizer_class": "Qwen2Tokenizer",
"unk_token": null
}
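
For the simplest case (no tools, only system and user turns), the `chat_template` above wraps each message as `<|im_start|>role\ncontent<|im_end|>\n` and then opens an assistant turn. The sketch below mirrors just that path in plain Python to make the resulting string visible. The function name `render_chatml` is a hypothetical stand-in, and assistant turns, tool calls and tool responses are deliberately left out, so this is an approximation of one branch and not a replacement for the template.

```python
def render_chatml(messages, add_generation_prompt=True, enable_thinking=True):
    """Render system/user turns the way the template's no-tools branch does.

    Hypothetical helper: assistant and tool messages are not covered.
    """
    out = []
    for message in messages:
        if message["role"] not in ("system", "user"):
            raise ValueError("this sketch covers system and user turns only")
        out.append(
            "<|im_start|>" + message["role"] + "\n" + message["content"] + "<|im_end|>\n"
        )
    if add_generation_prompt:
        out.append("<|im_start|>assistant\n")
        # The template emits an empty think block only when enable_thinking is false.
        if not enable_thinking:
            out.append("<think>\n\n</think>\n\n")
    return "".join(out)


prompt = render_chatml(
    [
        {"role": "system", "content": "You are helpful."},
        {"role": "user", "content": "Hi"},
    ]
)
print(prompt)
```

In practice the template would normally be applied through the tokenizer that ships with these files; the sketch only illustrates the text layout.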

1688
train.log Normal file

File diff suppressed because one or more lines are too long

9
train_results.json Normal file

@@ -0,0 +1,9 @@
{
"epoch": 0.999244142101285,
"total_flos": 0.0,
"train_loss": 1.122965409968516,
"train_runtime": 3224.9347,
"train_samples": 42336,
"train_samples_per_second": 13.128,
"train_steps_per_second": 0.205
}
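
The figures in `train_results.json` are mutually consistent if one assumes an effective batch size of 64 (taken from the `batch-64` part of the repository name, not from this file) and that the final partial batch was not run. Under those assumptions, the following sketch reproduces the reported `epoch`, `train_samples_per_second` and `train_steps_per_second` values.

```python
import math

# Values copied from train_results.json above.
train_samples = 42336
train_runtime = 3224.9347  # seconds

# Assumption: effective batch size of 64, inferred from the repository name.
batch_size = 64

# Assumption: the trailing partial batch (32 samples) was skipped.
steps = train_samples // batch_size
epoch = steps * batch_size / train_samples
samples_per_second = train_samples / train_runtime
steps_per_second = steps / train_runtime

print(steps)
print(epoch)
print(round(samples_per_second, 3))
print(round(steps_per_second, 3))
```

The 32 samples left over after 661 full batches account for the epoch value falling just short of 1.0.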

10054
trainer_state.json Normal file

File diff suppressed because it is too large Load Diff

1
vocab.json Normal file

File diff suppressed because one or more lines are too long