Initialize project; model provided by the ModelHub XC community

Model: jackf857/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85
Source: Original Platform
ModelHub XC
2026-05-16 03:03:45 +08:00
commit 4b3d06f5b1
23 changed files with 166086 additions and 0 deletions

.gitattributes (vendored, new file, 36 lines)

@@ -0,0 +1,36 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
tokenizer.json filter=lfs diff=lfs merge=lfs -text

README.md (new file, 82 lines)

@@ -0,0 +1,82 @@
---
library_name: transformers
license: apache-2.0
base_model: jackf857/qwen3-8b-base-sft-hh-helpful-4xh200-batch-64-20260417-214452
tags:
- alignment-handbook
- new-dpo
- generated_from_trainer
datasets:
- Anthropic/hh-rlhf
model-index:
- name: qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85
results: []
---
<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->
# qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85
This model is a fine-tuned version of [jackf857/qwen3-8b-base-sft-hh-helpful-4xh200-batch-64-20260417-214452](https://huggingface.co/jackf857/qwen3-8b-base-sft-hh-helpful-4xh200-batch-64-20260417-214452) on the Anthropic/hh-rlhf dataset.
It achieves the following results on the evaluation set (values from the last logged evaluation, at step 600, epoch 0.8811):
- Loss: 0.4843
- Fcm Dpo/beta: 0.0126
- Margin Dpo/margin Mean: 59.3348
- Margin Dpo/margin Std: 75.8064
- Logps/chosen: -239.6379
- Logps/rejected: -292.5469
- Logps/ref Chosen: -100.4936
- Logps/ref Rejected: -94.0678
- Logits/chosen: -1.6991
- Logits/rejected: -1.3195
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 2
- total_train_batch_size: 64
- total_eval_batch_size: 32
- optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
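
The batch-size entries above are related by simple arithmetic. The following is a minimal sketch, assuming `train_batch_size` and `eval_batch_size` are per-device values (the usual Trainer convention); the variable names are illustrative and not taken from the training code.

```python
# Values copied from the hyperparameter list above.
per_device_train_batch_size = 8
per_device_eval_batch_size = 8
num_devices = 4
gradient_accumulation_steps = 2

# Assumption: the listed batch sizes are per device, so the effective
# training batch is scaled by device count and accumulation steps.
total_train_batch_size = (
    per_device_train_batch_size * num_devices * gradient_accumulation_steps
)

# Evaluation does not accumulate gradients, so only device count applies.
total_eval_batch_size = per_device_eval_batch_size * num_devices

print(total_train_batch_size, total_eval_batch_size)
```

Both products agree with the `total_train_batch_size` and `total_eval_batch_size` entries in the list.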
### Training results
| Training Loss | Epoch | Step | Validation Loss | Fcm Dpo/beta | Margin Dpo/margin Mean | Margin Dpo/margin Std | Logps/chosen | Logps/rejected | Logps/ref Chosen | Logps/ref Rejected | Logits/chosen | Logits/rejected |
|:-------------:|:------:|:----:|:---------------:|:------------:|:----------------------:|:---------------------:|:------------:|:--------------:|:----------------:|:------------------:|:-------------:|:---------------:|
| 0.8851 | 0.1468 | 100 | 0.5619 | 0.1911 | 2.7888 | 5.1863 | -102.8106 | -99.1735 | -100.4936 | -94.0678 | -0.0489 | 0.1697 |
| 0.8588 | 0.2937 | 200 | 0.5018 | 0.0551 | 12.3878 | 17.0440 | -115.7780 | -121.7400 | -100.4936 | -94.0678 | -1.1454 | -0.8443 |
| 0.9115 | 0.4405 | 300 | 0.4888 | 0.0264 | 26.5166 | 33.8690 | -155.7471 | -175.8378 | -100.4936 | -94.0678 | -1.7435 | -1.4186 |
| 0.8025 | 0.5874 | 400 | 0.4957 | 0.0127 | 51.4539 | 65.7573 | -216.1685 | -261.1965 | -100.4936 | -94.0678 | -1.7851 | -1.4226 |
| 0.9588 | 0.7342 | 500 | 0.4848 | 0.0130 | 58.4689 | 75.4513 | -233.7258 | -285.7689 | -100.4936 | -94.0678 | -1.8270 | -1.4568 |
| 0.8671 | 0.8811 | 600 | 0.4843 | 0.0126 | 59.3348 | 75.8064 | -239.6379 | -292.5469 | -100.4936 | -94.0678 | -1.6991 | -1.3195 |
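
The margin column can be cross-checked against the log-probability columns. The sketch below assumes the margin is the policy-versus-reference log-ratio of the chosen response minus that of the rejected response (the usual DPO quantity). This is an observation about the table, not a statement about the training code, and the helper name `implied_margin` is hypothetical.

```python
# (step, logps_chosen, logps_rejected, reported_margin_mean) from the table.
rows = [
    (100, -102.8106, -99.1735, 2.7888),
    (200, -115.7780, -121.7400, 12.3878),
    (300, -155.7471, -175.8378, 26.5166),
    (400, -216.1685, -261.1965, 51.4539),
    (500, -233.7258, -285.7689, 58.4689),
    (600, -239.6379, -292.5469, 59.3348),
]

# Reference-model log-probabilities are constant across evaluations.
REF_CHOSEN = -100.4936
REF_REJECTED = -94.0678


def implied_margin(chosen: float, rejected: float) -> float:
    """Assumed definition: (chosen - ref_chosen) - (rejected - ref_rejected)."""
    return (chosen - REF_CHOSEN) - (rejected - REF_REJECTED)


# Largest gap between the recomputed and the reported margin mean.
max_abs_diff = max(
    abs(implied_margin(chosen, rejected) - reported)
    for _, chosen, rejected, reported in rows
)
print(f"max |implied - reported| = {max_abs_diff:.4f}")
```

Because the mean is linear, a margin recomputed from the mean log-probabilities should match the reported mean margin up to the four-decimal rounding in the table.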
### Framework versions
- Transformers 4.51.0
- PyTorch 2.3.1+cu121
- Datasets 2.21.0
- Tokenizers 0.21.4

added_tokens.json (new file, 28 lines)

@@ -0,0 +1,28 @@
{
"</think>": 151668,
"</tool_call>": 151658,
"</tool_response>": 151666,
"<think>": 151667,
"<tool_call>": 151657,
"<tool_response>": 151665,
"<|box_end|>": 151649,
"<|box_start|>": 151648,
"<|endoftext|>": 151643,
"<|file_sep|>": 151664,
"<|fim_middle|>": 151660,
"<|fim_pad|>": 151662,
"<|fim_prefix|>": 151659,
"<|fim_suffix|>": 151661,
"<|im_end|>": 151645,
"<|im_start|>": 151644,
"<|image_pad|>": 151655,
"<|object_ref_end|>": 151647,
"<|object_ref_start|>": 151646,
"<|quad_end|>": 151651,
"<|quad_start|>": 151650,
"<|repo_name|>": 151663,
"<|video_pad|>": 151656,
"<|vision_end|>": 151653,
"<|vision_pad|>": 151654,
"<|vision_start|>": 151652
}
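
A quick sanity check on this mapping, as a minimal sketch: it confirms that the added-token IDs form one contiguous block and fit below the `vocab_size` of 151936 given in config.json. The dictionary is copied from the file above; the variable names are illustrative.

```python
# Mapping copied from added_tokens.json.
added_tokens = {
    "</think>": 151668, "</tool_call>": 151658, "</tool_response>": 151666,
    "<think>": 151667, "<tool_call>": 151657, "<tool_response>": 151665,
    "<|box_end|>": 151649, "<|box_start|>": 151648, "<|endoftext|>": 151643,
    "<|file_sep|>": 151664, "<|fim_middle|>": 151660, "<|fim_pad|>": 151662,
    "<|fim_prefix|>": 151659, "<|fim_suffix|>": 151661, "<|im_end|>": 151645,
    "<|im_start|>": 151644, "<|image_pad|>": 151655,
    "<|object_ref_end|>": 151647, "<|object_ref_start|>": 151646,
    "<|quad_end|>": 151651, "<|quad_start|>": 151650, "<|repo_name|>": 151663,
    "<|video_pad|>": 151656, "<|vision_end|>": 151653,
    "<|vision_pad|>": 151654, "<|vision_start|>": 151652,
}

VOCAB_SIZE = 151936  # from config.json

ids = sorted(added_tokens.values())

# Contiguous means every integer between the smallest and largest ID is used.
is_contiguous = ids == list(range(ids[0], ids[-1] + 1))
fits_in_vocab = all(token_id < VOCAB_SIZE for token_id in ids)

print(len(ids), ids[0], ids[-1], is_contiguous, fits_in_vocab)
```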

all_results.json (new file, 23 lines)

@@ -0,0 +1,23 @@
{
"epoch": 1.0,
"eval_fcm_dpo/beta": 0.011290068738162518,
"eval_logits/chosen": -1.8993860483169556,
"eval_logits/rejected": -1.5321727991104126,
"eval_logps/chosen": -239.60226440429688,
"eval_logps/ref_chosen": -100.49356842041016,
"eval_logps/ref_rejected": -94.06775665283203,
"eval_logps/rejected": -292.7149963378906,
"eval_loss": 0.49325719475746155,
"eval_margin_dpo/margin_mean": 59.53855895996094,
"eval_margin_dpo/margin_std": 76.0225830078125,
"eval_runtime": 43.8024,
"eval_samples": 2339,
"eval_samples_per_second": 53.399,
"eval_steps_per_second": 1.689,
"total_flos": 0.0,
"train_loss": 0.9292974616462438,
"train_runtime": 2239.8819,
"train_samples": 43598,
"train_samples_per_second": 19.464,
"train_steps_per_second": 0.304
}
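
The throughput fields can be reproduced from the sample counts and runtimes. The following is a minimal sketch using values copied from the file above. The step counts it derives are implied values, not logged ones, and the variable names are illustrative.

```python
# Values copied from all_results.json.
results = {
    "train_samples": 43598,
    "train_runtime": 2239.8819,
    "train_samples_per_second": 19.464,
    "train_steps_per_second": 0.304,
    "eval_samples": 2339,
    "eval_runtime": 43.8024,
    "eval_samples_per_second": 53.399,
    "eval_steps_per_second": 1.689,
}

# Samples per second is the sample count divided by wall-clock runtime.
train_sps = results["train_samples"] / results["train_runtime"]
eval_sps = results["eval_samples"] / results["eval_runtime"]

# Steps per second times runtime gives an approximate number of steps.
implied_train_steps = results["train_steps_per_second"] * results["train_runtime"]
implied_eval_steps = results["eval_steps_per_second"] * results["eval_runtime"]

print(f"train: {train_sps:.3f} samples/s, ~{implied_train_steps:.0f} steps")
print(f"eval:  {eval_sps:.3f} samples/s, ~{implied_eval_steps:.0f} steps")
```

The implied training step count rounds to 681, which matches the 681 lines listed for margin_logs/margins.jsonl in this commit.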

config.json (new file, 30 lines)

@@ -0,0 +1,30 @@
{
"architectures": [
"Qwen3ForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 151643,
"eos_token_id": 151643,
"head_dim": 128,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 12288,
"max_position_embeddings": 32768,
"max_window_layers": 36,
"model_type": "qwen3",
"num_attention_heads": 32,
"num_hidden_layers": 36,
"num_key_value_heads": 8,
"rms_norm_eps": 1e-06,
"rope_scaling": null,
"rope_theta": 1000000,
"sliding_window": null,
"tie_word_embeddings": false,
"torch_dtype": "float32",
"transformers_version": "4.51.0",
"use_cache": true,
"use_sliding_window": false,
"vocab_size": 151936
}
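
These fields are enough for a rough size estimate. The sketch below assumes the standard Qwen3 decoder layout: bias-free q/k/v/o projections (consistent with `attention_bias: false`), a three-matrix gated MLP, two RMSNorms per layer, and per-head q/k norms. The layout is an assumption rather than something read from the checkpoint, so treat the total as an estimate.

```python
# Values copied from config.json.
hidden_size = 4096
intermediate_size = 12288
num_hidden_layers = 36
num_attention_heads = 32
num_key_value_heads = 8
head_dim = 128
vocab_size = 151936
tie_word_embeddings = False

# Attention projections (assumed bias-free, grouped-query attention).
q_and_o = 2 * hidden_size * num_attention_heads * head_dim
k_and_v = 2 * hidden_size * num_key_value_heads * head_dim

# Gated MLP: gate, up, and down projections (assumed layout).
mlp = 3 * hidden_size * intermediate_size

# Two layer norms over hidden_size plus q/k norms over head_dim (assumed).
norms = 2 * hidden_size + 2 * head_dim

per_layer = q_and_o + k_and_v + mlp + norms

# Input embeddings, plus a separate output head when weights are untied.
embeddings = vocab_size * hidden_size * (1 if tie_word_embeddings else 2)

# The final hidden_size term is the last norm before the output head.
total_params = num_hidden_layers * per_layer + embeddings + hidden_size

# KV cache: one key and one value vector per KV head, per layer, per token.
kv_elements_per_token = 2 * num_hidden_layers * num_key_value_heads * head_dim

print(f"estimated parameters: {total_params:,}")
print(f"KV-cache elements per token: {kv_elements_per_token:,}")
```

Under these assumptions the estimate comes to about 8.19 billion parameters. At the `float32` dtype recorded in the config, that is roughly 33 GB of weights.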

eval_results.json (new file, 17 lines)

@@ -0,0 +1,17 @@
{
"epoch": 1.0,
"eval_fcm_dpo/beta": 0.011290068738162518,
"eval_logits/chosen": -1.8993860483169556,
"eval_logits/rejected": -1.5321727991104126,
"eval_logps/chosen": -239.60226440429688,
"eval_logps/ref_chosen": -100.49356842041016,
"eval_logps/ref_rejected": -94.06775665283203,
"eval_logps/rejected": -292.7149963378906,
"eval_loss": 0.49325719475746155,
"eval_margin_dpo/margin_mean": 59.53855895996094,
"eval_margin_dpo/margin_std": 76.0225830078125,
"eval_runtime": 43.8024,
"eval_samples": 2339,
"eval_samples_per_second": 53.399,
"eval_steps_per_second": 1.689
}

generation_config.json (new file, 6 lines)

@@ -0,0 +1,6 @@
{
"bos_token_id": 151643,
"eos_token_id": 151643,
"max_new_tokens": 2048,
"transformers_version": "4.51.0"
}

margin_logs/margins.jsonl (new file, 681 lines)

@@ -0,0 +1,681 @@
{"epoch": 0.0, "step": 1, "batch_size": 64, "mean": 0.061235010623931885, "std": 0.44581517577171326, "min": -0.9558563232421875, "p10": -0.49806652069091795, "median": 0.0538330078125, "p90": 0.6431861877441406, "max": 1.349212646484375, "pos_frac": 0.5625, "sample": [0.2952117919921875, -0.9558563232421875, 0.002197265625, 0.5723953247070312, 0.45829010009765625, 0.045989990234375, 0.07258987426757812, 0.16957473754882812, -0.0712890625, 0.067230224609375, 0.061676025390625, 0.9847030639648438, 0.25745391845703125, 0.223114013671875, 0.8358078002929688, -0.205474853515625, -0.4477996826171875, -0.6458778381347656, 1.101287841796875, 0.0405731201171875, -0.05789947509765625, 0.8459625244140625, -0.16561508178710938, -0.1964569091796875, -0.08588409423828125, 0.07699203491210938, -0.15012359619140625, 0.08970260620117188, -0.6135711669921875, -0.3250083923339844, -0.5196094512939453, -0.055461883544921875, 0.06568145751953125, -0.1916046142578125, 0.07837677001953125, -0.5969200134277344, -0.6232337951660156, -0.23907470703125, 0.10578536987304688, 1.349212646484375, 0.6396026611328125, -0.1342315673828125, -0.33837318420410156, -0.6055755615234375, 0.28924560546875, -0.053333282470703125, -0.13497543334960938, 0.040065765380859375, 0.156463623046875, 0.30047607421875, 0.18242645263671875, -0.4043922424316406, 0.10161590576171875, -0.2782421112060547, 0.6447219848632812, 0.592559814453125, 0.2892589569091797, 0.2594795227050781, 0.278594970703125, 0.7102890014648438, -0.0288848876953125, 0.5050926208496094, -0.35527801513671875, -0.3906135559082031], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000001.npy"}
{"epoch": 0.0014684287812041115, "step": 2, "batch_size": 64, "mean": -0.12442246079444885, "std": 0.41391414403915405, "min": -1.25341796875, "p10": -0.5899612426757812, "median": -0.16697025299072266, "p90": 0.40547485351562507, "max": 0.74664306640625, "pos_frac": 0.375, "sample": [-0.4356536865234375, 0.0176544189453125, -0.20431900024414062, 0.74664306640625, -0.2661247253417969, -0.1260967254638672, -0.8647918701171875, -0.17073822021484375, -0.477691650390625, -0.18256378173828125, 0.3944091796875, -0.16320228576660156, -0.3320465087890625, -0.39223480224609375, -0.4348640441894531, -0.602691650390625, -1.201324462890625, -0.4142303466796875, -0.037799835205078125, 0.09072113037109375, -0.5269927978515625, 0.05698394775390625, 0.5052413940429688, 0.13204193115234375, -0.4286689758300781, 0.03350067138671875, 0.65325927734375, 0.6349716186523438, -0.19322967529296875, 0.34609222412109375, 0.65667724609375, -0.2496337890625, -0.6125564575195312, -0.11834716796875, -0.10693359375, -0.062713623046875, -0.3054351806640625, 0.3363838195800781, 0.35395050048828125, -0.008523941040039062, -0.6783828735351562, 0.23185348510742188, -0.02996826171875, -0.4708709716796875, -1.25341796875, -0.22631072998046875, 0.0096588134765625, 0.437835693359375, 0.345306396484375, 0.17419815063476562, -0.5602569580078125, 0.17741012573242188, 0.04550933837890625, -0.2523040771484375, -0.6806221008300781, -0.28516387939453125, -0.21526336669921875, -0.24713897705078125, 0.18416213989257812, 0.41021728515625, -0.4717559814453125, -0.22457504272460938, 0.0484161376953125, -0.47069549560546875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000002.npy"}
{"epoch": 0.002936857562408223, "step": 3, "batch_size": 64, "mean": 0.033603474497795105, "std": 0.47235339879989624, "min": -1.1258697509765625, "p10": -0.6252763748168945, "median": 0.051342010498046875, "p90": 0.6293060302734376, "max": 1.12786865234375, "pos_frac": 0.5625, "sample": [-0.6029033660888672, -0.911865234375, 0.1763153076171875, 0.0651092529296875, 0.3923759460449219, -0.18950462341308594, 0.14054107666015625, 0.09049606323242188, 0.2709808349609375, 0.334228515625, 0.6973876953125, 0.5523185729980469, -1.1258697509765625, -0.20796966552734375, -0.078369140625, 0.055694580078125, -0.57720947265625, -0.6348648071289062, -0.7830657958984375, -0.106048583984375, 0.6094284057617188, 0.8430023193359375, 0.197174072265625, -0.25201416015625, 0.06450653076171875, 0.03399658203125, -0.6698684692382812, -0.17305755615234375, 0.04698944091796875, -0.8885498046875, 0.44861602783203125, 1.0282745361328125, 0.013353347778320312, -0.178070068359375, -0.248809814453125, 0.1596832275390625, -0.01377105712890625, -0.2791481018066406, 0.30573272705078125, 0.4403038024902344, -0.29293060302734375, 0.134063720703125, 0.3950958251953125, 0.4018821716308594, -0.258209228515625, -0.19649124145507812, 0.7665863037109375, -0.2421875, 0.27696990966796875, 0.03937530517578125, -0.3887939453125, -0.2666740417480469, 0.17680740356445312, 0.7966537475585938, -0.45803070068359375, -0.7217254638671875, 0.3378448486328125, 0.6378250122070312, -0.06058502197265625, 0.4389190673828125, -0.02663707733154297, 1.12786865234375, 0.4224395751953125, 0.06500625610351562], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000003.npy"}
{"epoch": 0.004405286343612335, "step": 4, "batch_size": 64, "mean": -0.027436137199401855, "std": 0.4447876811027527, "min": -1.311279296875, "p10": -0.5476089477539062, "median": 0.07236099243164062, "p90": 0.5351360321044925, "max": 0.7963943481445312, "pos_frac": 0.546875, "sample": [-0.7179412841796875, -0.3704833984375, -0.3488044738769531, -0.15949249267578125, 0.0425262451171875, 0.13916778564453125, -0.0830841064453125, -0.123321533203125, 0.7134609222412109, 0.183990478515625, 0.11034774780273438, -0.0839691162109375, 0.39380645751953125, 0.7184219360351562, -0.41815185546875, 0.019100189208984375, -0.15550994873046875, 0.3012733459472656, 0.4821815490722656, 0.06844711303710938, -0.2345428466796875, 0.29325103759765625, 0.0944671630859375, -0.434814453125, -0.22177886962890625, -1.311279296875, 0.2604522705078125, 0.3702239990234375, -0.4659767150878906, -0.504913330078125, 0.12053871154785156, 0.10930252075195312, -0.8468093872070312, -0.20601654052734375, -0.10233116149902344, 0.155059814453125, -1.2399444580078125, 0.557830810546875, 0.1468658447265625, 0.61627197265625, 0.7020416259765625, 0.13254547119140625, 0.0836029052734375, -0.537322998046875, 0.3772735595703125, -0.16379547119140625, 0.692657470703125, -0.5520172119140625, 0.4746379852294922, 0.7963943481445312, 0.1820831298828125, -0.10736846923828125, 0.15705490112304688, -0.5859832763671875, -0.14397048950195312, 0.07627487182617188, 0.1190032958984375, -0.7981338500976562, -0.2470703125, 0.083038330078125, -0.3531341552734375, 0.23792648315429688, -0.485748291015625, 0.23627471923828125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000004.npy"}
{"epoch": 0.005873715124816446, "step": 5, "batch_size": 64, "mean": 0.08646932244300842, "std": 0.4314703643321991, "min": -0.9998703002929688, "p10": -0.3593414306640625, "median": 0.022283554077148438, "p90": 0.6566970825195313, "max": 1.1161727905273438, "pos_frac": 0.546875, "sample": [0.56536865234375, -0.05377960205078125, 0.4616737365722656, -0.362335205078125, -0.3203277587890625, -0.4602813720703125, -0.183685302734375, 0.285736083984375, 0.3110618591308594, 0.5123825073242188, 0.2551116943359375, 0.65179443359375, 0.04759979248046875, -0.5222969055175781, 0.8195877075195312, -0.10049057006835938, -0.21957778930664062, 0.10091781616210938, -0.7761077880859375, 0.3072509765625, 0.7047653198242188, 0.5625228881835938, 0.3489227294921875, -0.16378402709960938, 0.025592803955078125, -0.9998703002929688, 0.01322174072265625, 0.06769561767578125, 0.3147735595703125, 1.1161727905273438, -0.21176910400390625, 0.037212371826171875, 0.01897430419921875, -0.06331253051757812, 0.2996788024902344, -0.963623046875, 0.3362274169921875, 0.6587982177734375, -0.35235595703125, -0.2437591552734375, -0.0226898193359375, -0.44922637939453125, -0.2544670104980469, -0.14208984375, -0.155975341796875, 0.1239776611328125, -0.04617881774902344, 0.5975341796875, -0.274658203125, 0.2646636962890625, 0.8865814208984375, -0.197845458984375, 0.1165313720703125, -0.109588623046875, -0.25342559814453125, 0.5528106689453125, 0.577728271484375, 0.690521240234375, 0.7365493774414062, 0.00019073486328125, -0.17270660400390625, 0.41571807861328125, -0.074127197265625, -0.10147857666015625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000005.npy"}
{"epoch": 0.007342143906020558, "step": 6, "batch_size": 64, "mean": 0.048646748065948486, "std": 0.4201069474220276, "min": -1.2440185546875, "p10": -0.4966560363769531, "median": 0.05651664733886719, "p90": 0.6180194854736328, "max": 0.997955322265625, "pos_frac": 0.546875, "sample": [-0.0325164794921875, 0.034603118896484375, -0.1503467559814453, -0.5611114501953125, 0.170654296875, -0.34418487548828125, 0.05550384521484375, -0.049915313720703125, 0.153167724609375, -0.6444854736328125, 0.1490478515625, -0.052478790283203125, 0.01103973388671875, 0.26611328125, 0.7215118408203125, -0.0831451416015625, 0.41944122314453125, 0.22825050354003906, -0.15184402465820312, 0.27948760986328125, -0.21933746337890625, 0.06270599365234375, 0.6256103515625, -0.08404731750488281, 0.21889495849609375, 0.147369384765625, -0.48746490478515625, 0.829833984375, 0.21227264404296875, -0.1412506103515625, 0.6929168701171875, -0.7042617797851562, 0.0718536376953125, 0.41796875, 0.43088531494140625, -0.1435089111328125, 0.574920654296875, 0.27423095703125, 0.057529449462890625, -0.0555572509765625, -0.2078857421875, 0.6326141357421875, 0.6067428588867188, -0.22504806518554688, 0.44188690185546875, 0.08831977844238281, -0.39166259765625, 0.6228523254394531, 0.1444549560546875, -0.033031463623046875, 0.997955322265625, -0.10357666015625, 0.349395751953125, 0.22892379760742188, 0.5082855224609375, -0.08707046508789062, -0.28813934326171875, -1.2440185546875, -0.56298828125, -0.072021484375, -0.5005950927734375, 0.15143966674804688, -0.97235107421875, -0.17144775390625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000006.npy"}
{"epoch": 0.00881057268722467, "step": 7, "batch_size": 64, "mean": 0.021133065223693848, "std": 0.42899325489997864, "min": -1.1426544189453125, "p10": -0.47563285827636714, "median": -0.024682998657226562, "p90": 0.5870330810546875, "max": 1.1676712036132812, "pos_frac": 0.453125, "sample": [0.22600173950195312, -0.9534149169921875, -0.13063430786132812, 0.030612945556640625, -0.40608978271484375, -0.266510009765625, -0.5043678283691406, 0.6663818359375, -0.055606842041015625, -0.00399017333984375, -0.08392333984375, 0.84210205078125, 0.2655792236328125, -0.26751708984375, -0.14408111572265625, -0.1616973876953125, -0.030872344970703125, 0.2198638916015625, -0.07741928100585938, -0.7867050170898438, -0.13141632080078125, 0.0598602294921875, -0.283355712890625, -0.064971923828125, -1.1426544189453125, -0.13294219970703125, 0.37860107421875, -0.2108917236328125, 0.40960693359375, -0.112030029296875, -0.226959228515625, -0.1922760009765625, 0.06189727783203125, -0.4085845947265625, 0.36147308349609375, -0.256378173828125, -0.016326904296875, -0.3563995361328125, 0.57769775390625, 0.1484375, 0.591033935546875, 0.5030975341796875, -0.06469345092773438, 0.039947509765625, 1.1676712036132812, -0.7474365234375, -0.07881546020507812, -0.10015869140625, -0.5187225341796875, 0.08673858642578125, -0.2144317626953125, 0.8159332275390625, -0.5450973510742188, 0.2779693603515625, 0.4717559814453125, 0.19301605224609375, 0.28151702880859375, 0.13760757446289062, -0.01849365234375, 0.0744781494140625, 0.3524932861328125, 0.3752593994140625, 0.7969436645507812, 0.6348037719726562], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000007.npy"}
{"epoch": 0.010279001468428781, "step": 8, "batch_size": 64, "mean": -0.11333887279033661, "std": 0.4834345281124115, "min": -1.1573944091796875, "p10": -0.6775115966796874, "median": -0.13326644897460938, "p90": 0.45822372436523473, "max": 1.40948486328125, "pos_frac": 0.4375, "sample": [0.01010894775390625, 0.6789779663085938, 0.2788276672363281, 0.03991889953613281, -0.241485595703125, 0.15570831298828125, 0.12849807739257812, -1.05303955078125, 1.40948486328125, -0.5865058898925781, -0.44261932373046875, 0.2495574951171875, -0.1405487060546875, 0.22802734375, 0.0047454833984375, -0.2491302490234375, -0.528564453125, -0.2137603759765625, -0.32646942138671875, 0.824462890625, -0.22674560546875, 0.035797119140625, -0.496337890625, -0.483245849609375, 0.09883499145507812, -0.12598419189453125, 0.0738067626953125, -0.1769866943359375, -1.118133544921875, 0.36615753173828125, -0.5378646850585938, -0.11190032958984375, -0.4752197265625, -0.3691558837890625, -0.02618408203125, -0.6338043212890625, -0.45929718017578125, 0.08713531494140625, -0.18030452728271484, 0.6482086181640625, -0.06740188598632812, -0.2078704833984375, -0.7672195434570312, -1.1573944091796875, -0.16545486450195312, 1.013031005859375, -1.03167724609375, -0.7977142333984375, 0.21246910095214844, 0.08750152587890625, 0.4976806640625, 0.19208526611328125, -0.5095672607421875, 0.0211181640625, 0.0535888671875, -0.266448974609375, -0.6962432861328125, -0.2391815185546875, -0.163818359375, 0.5217208862304688, -0.3800201416015625, 0.060588836669921875, 0.1928863525390625, 0.2286834716796875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000008.npy"}
{"epoch": 0.011747430249632892, "step": 9, "batch_size": 64, "mean": 0.05302073061466217, "std": 0.4571126103401184, "min": -1.0199508666992188, "p10": -0.4905250549316406, "median": -0.03192138671875, "p90": 0.6600130081176758, "max": 1.12091064453125, "pos_frac": 0.46875, "sample": [-0.1328582763671875, 0.248046875, 0.22692108154296875, -0.38324737548828125, -0.3137245178222656, 0.6410446166992188, 0.4206695556640625, -0.010990142822265625, -0.03717041015625, -0.1387786865234375, 0.6330718994140625, -0.200531005859375, 0.24227142333984375, 0.36524200439453125, 1.0277175903320312, -0.1216583251953125, 0.2601966857910156, 0.3894233703613281, 0.106689453125, 0.2611236572265625, -0.48693084716796875, -0.07864761352539062, -0.053619384765625, -0.13706207275390625, -0.949951171875, -0.5597381591796875, -0.33722686767578125, 1.12091064453125, 0.4974861145019531, 0.3669586181640625, -0.23032760620117188, -0.16770172119140625, -0.279937744140625, 0.3927764892578125, -0.089324951171875, -0.449462890625, -0.07207584381103516, 0.16921234130859375, 0.446014404296875, -0.6844558715820312, 0.3064613342285156, -1.0199508666992188, 0.8393707275390625, -0.4920654296875, -0.6400642395019531, 0.7194595336914062, -0.10388946533203125, 0.5409622192382812, -0.17317962646484375, -0.05805206298828125, -0.12285614013671875, 0.20124053955078125, 0.9662322998046875, 0.06403732299804688, 0.08559799194335938, -0.6584930419921875, 0.6994857788085938, -0.02667236328125, 0.6681423187255859, 0.1402587890625, -0.2090911865234375, 0.3189697265625, -0.084716796875, -0.4682159423828125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000009.npy"}
{"epoch": 0.013215859030837005, "step": 10, "batch_size": 64, "mean": -0.07061484456062317, "std": 0.4116254150867462, "min": -0.8816680908203125, "p10": -0.6797134399414062, "median": 0.016096115112304688, "p90": 0.3403518676757813, "max": 1.0978851318359375, "pos_frac": 0.5, "sample": [-0.0131988525390625, -0.4303398132324219, -0.0189361572265625, 0.09816169738769531, 0.2627696990966797, -0.6470184326171875, -0.022031784057617188, -0.6937255859375, -0.0892333984375, 0.0592041015625, -0.630340576171875, 0.3002166748046875, -0.515350341796875, 0.22849273681640625, 0.22502899169921875, 0.072113037109375, -0.7314224243164062, 0.16617202758789062, 0.16126632690429688, -0.45610809326171875, 0.6657333374023438, -0.23311996459960938, 0.5129547119140625, 0.16503334045410156, 0.12030029296875, 0.2884864807128906, -0.8816680908203125, -0.7889289855957031, -0.3256416320800781, -0.5460624694824219, 1.0978851318359375, 0.1761322021484375, -0.083892822265625, -0.8229141235351562, -0.5482177734375, -0.1297760009765625, -0.8011474609375, -0.13695526123046875, 0.3659248352050781, 0.11137962341308594, 0.14386367797851562, 0.21533966064453125, -0.19580078125, 0.347442626953125, -0.2996063232421875, 0.1263580322265625, -0.3001708984375, 0.2770843505859375, 0.3238067626953125, -0.33411407470703125, 0.3710365295410156, 0.045391082763671875, 0.438446044921875, -0.748626708984375, 0.27960968017578125, 0.2001495361328125, -0.3699913024902344, -0.204620361328125, -0.5847396850585938, 0.286346435546875, -0.3533935546875, 0.2661399841308594, 0.19698333740234375, -0.17750930786132812], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000010.npy"}
{"epoch": 0.014684287812041116, "step": 11, "batch_size": 64, "mean": 0.044263795018196106, "std": 0.5459516048431396, "min": -1.5420989990234375, "p10": -0.5988838195800781, "median": 0.031490325927734375, "p90": 0.7026943206787111, "max": 1.76446533203125, "pos_frac": 0.546875, "sample": [0.34413909912109375, 0.56109619140625, 0.09896659851074219, -0.3684253692626953, -0.5588912963867188, 0.0399932861328125, 0.20452117919921875, 0.31927967071533203, 0.35888671875, 0.9434356689453125, 0.5629844665527344, 0.31186676025390625, 0.94329833984375, -0.1234588623046875, -0.6095123291015625, -0.16201019287109375, 0.07578277587890625, -0.18238067626953125, 0.77642822265625, -0.4166603088378906, 0.17362594604492188, -0.13486480712890625, 0.6387252807617188, -0.465362548828125, 0.9560394287109375, 0.23097991943359375, -0.3016510009765625, -0.1235809326171875, 0.021697998046875, -0.708831787109375, 0.4821624755859375, -0.079864501953125, 0.175750732421875, -0.0383148193359375, -0.4489288330078125, 0.67205810546875, -0.6071624755859375, -0.28975677490234375, 0.4400177001953125, 0.061187744140625, 0.12066268920898438, -0.3613433837890625, 0.05991363525390625, -1.5420989990234375, 0.7158241271972656, -0.4389305114746094, 1.1189498901367188, 0.33028411865234375, 1.76446533203125, -0.22199630737304688, -0.19340896606445312, -0.00670623779296875, -0.6435699462890625, 0.048858642578125, -1.1004905700683594, 0.1665802001953125, 0.02298736572265625, 0.5304679870605469, -0.7484130859375, 0.3067817687988281, -0.5795669555664062, 0.007080078125, -0.2606048583984375, -0.03610992431640625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000011.npy"}
{"epoch": 0.016152716593245228, "step": 12, "batch_size": 64, "mean": 0.11056801676750183, "std": 0.5247921943664551, "min": -0.9948883056640625, "p10": -0.44559402465820314, "median": 0.07611083984375, "p90": 0.7692073822021485, "max": 2.0369415283203125, "pos_frac": 0.5625, "sample": [-0.3850555419921875, 0.95526123046875, 0.26165008544921875, -0.6289215087890625, -0.15207672119140625, -0.43332672119140625, 0.015117645263671875, 0.5313568115234375, -0.0178070068359375, 0.667877197265625, -0.20670318603515625, 0.3492431640625, -0.2788238525390625, 0.3052215576171875, 0.4244537353515625, -0.4508514404296875, 0.009998321533203125, 2.0369415283203125, -0.030673980712890625, -0.3101234436035156, 0.754486083984375, 0.057956695556640625, 0.044891357421875, 0.1107635498046875, 0.38525390625, 0.11815643310546875, 0.6153106689453125, -0.879852294921875, -0.3122596740722656, 0.5822906494140625, -0.905792236328125, 0.2082672119140625, -0.24925994873046875, 0.16259002685546875, 0.97564697265625, 0.7929916381835938, -0.07203102111816406, -0.4891204833984375, -0.6709518432617188, -0.2603302001953125, 0.5250473022460938, 0.21635818481445312, -0.04676055908203125, 0.6311416625976562, 0.39038848876953125, -0.02251434326171875, -0.3331756591796875, -0.35242462158203125, -0.2614707946777344, 0.7755165100097656, -0.2799949645996094, -0.23005294799804688, -0.9948883056640625, 0.8219528198242188, -0.029205322265625, 1.149017333984375, 0.3493804931640625, 0.18179702758789062, 0.24656295776367188, 0.09426498413085938, 0.232452392578125, -0.15264129638671875, 0.2891082763671875, 0.24472808837890625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000012.npy"}
{"epoch": 0.01762114537444934, "step": 13, "batch_size": 64, "mean": 0.05721601843833923, "std": 0.5203707814216614, "min": -1.300689697265625, "p10": -0.6708366394042968, "median": 0.04156780242919922, "p90": 0.77260856628418, "max": 1.573944091796875, "pos_frac": 0.5625, "sample": [-0.122528076171875, -0.4059104919433594, -0.56402587890625, -0.7046051025390625, 0.5011138916015625, 0.7146339416503906, -0.21112060546875, -0.024812698364257812, 0.023029327392578125, -0.6766433715820312, 0.40811920166015625, -0.14127349853515625, -0.9095306396484375, -0.0115509033203125, 0.10699462890625, 0.49979400634765625, -0.65728759765625, 1.573944091796875, 0.08975982666015625, 0.0841217041015625, 0.16567420959472656, 0.35964202880859375, 1.0096511840820312, 0.10612106323242188, -0.19311904907226562, 0.797454833984375, -0.6815071105957031, 0.025228500366210938, 0.18416786193847656, 0.44342041015625, -0.5108871459960938, 0.26705169677734375, -0.03250885009765625, 0.5069694519042969, -1.300689697265625, 0.009006500244140625, 0.4098052978515625, 0.8724594116210938, 0.08576393127441406, -0.777313232421875, -0.4273719787597656, -0.051998138427734375, 0.998748779296875, 0.20289993286132812, -0.3660888671875, -0.037342071533203125, 0.0579071044921875, -0.08713912963867188, 0.3165245056152344, 0.02274322509765625, -0.14759063720703125, 0.60223388671875, -0.7843856811523438, -0.023693084716796875, 0.26148223876953125, 0.9287872314453125, -0.141693115234375, 0.11516571044921875, 0.896728515625, 0.3433837890625, -0.1411895751953125, -0.5093154907226562, 0.242767333984375, 0.07164764404296875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000013.npy"}
{"epoch": 0.01908957415565345, "step": 14, "batch_size": 64, "mean": -0.0824495255947113, "std": 0.5198246836662292, "min": -1.8970413208007812, "p10": -0.6720996856689453, "median": 0.017806053161621094, "p90": 0.4418388366699219, "max": 0.9465255737304688, "pos_frac": 0.5, "sample": [-0.81231689453125, 0.09931182861328125, -0.0005130767822265625, 0.21491241455078125, -1.144012451171875, -0.656494140625, -0.162841796875, 0.05823516845703125, -1.8970413208007812, 0.18373870849609375, -0.6015853881835938, 0.38381195068359375, -0.4546966552734375, 0.5709609985351562, 0.19335174560546875, -0.040496826171875, 0.7380943298339844, 0.4454345703125, -0.26587677001953125, 0.31089019775390625, 0.9465255737304688, -0.449127197265625, -0.4233055114746094, -0.038299560546875, 0.10010528564453125, 0.7022323608398438, -0.1379547119140625, -0.5784645080566406, 0.10186004638671875, 0.03612518310546875, -0.10680389404296875, -0.15377426147460938, 0.3482666015625, 0.407073974609375, 0.19188308715820312, -0.5920944213867188, -0.035297393798828125, -0.33331298828125, -0.08199310302734375, 0.03800201416015625, 0.20556640625, 0.4225578308105469, -1.1560745239257812, -0.3469085693359375, -0.6678886413574219, 0.06319427490234375, -0.09811782836914062, 0.127197265625, -0.7895126342773438, 0.109771728515625, 0.43344879150390625, 0.2738800048828125, -0.6131973266601562, -0.596649169921875, 0.15368270874023438, -0.6739044189453125, 0.04752349853515625, 0.658843994140625, 0.39923095703125, 0.0595855712890625, -0.12989044189453125, 0.8333778381347656, -1.0326080322265625, -0.06439208984375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000014.npy"}
{"epoch": 0.020558002936857563, "step": 15, "batch_size": 64, "mean": 0.043842852115631104, "std": 0.5014774203300476, "min": -0.9378509521484375, "p10": -0.5016084671020508, "median": 0.004530906677246094, "p90": 0.5263648986816407, "max": 2.035125732421875, "pos_frac": 0.5, "sample": [-0.05821037292480469, 0.25360107421875, 0.13532638549804688, -0.2936134338378906, -0.357208251953125, -0.21964645385742188, -0.22849655151367188, 0.46152496337890625, -0.299346923828125, 0.5350189208984375, 0.0186767578125, 0.09746932983398438, -0.2423248291015625, 0.0687408447265625, -0.46614837646484375, -0.180511474609375, 0.20164871215820312, -0.108184814453125, -0.4648399353027344, 0.7935256958007812, -0.331939697265625, 0.360443115234375, -0.03467559814453125, -0.33831787109375, 0.34954833984375, -0.6482009887695312, -0.22475051879882812, -0.47411346435546875, 1.2087783813476562, 1.086090087890625, 0.2196502685546875, -0.484039306640625, 0.3270263671875, 0.025537490844726562, -0.5434417724609375, -0.9378509521484375, 0.468292236328125, 0.24112701416015625, 0.928985595703125, -0.015993118286132812, -0.11075210571289062, 0.3517265319824219, -0.30931854248046875, 0.1415252685546875, 0.09692764282226562, -0.18263816833496094, -0.5091381072998047, -0.009614944458007812, 0.5061721801757812, -0.6619300842285156, 0.350250244140625, -0.20207977294921875, 0.0985565185546875, 0.40221405029296875, -0.6155242919921875, 0.09687042236328125, 2.035125732421875, -0.22069549560546875, 0.39402198791503906, -0.7337417602539062, -0.05534934997558594, 0.7495155334472656, 0.16468048095703125, 0.199981689453125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000015.npy"}
{"epoch": 0.022026431718061675, "step": 16, "batch_size": 64, "mean": -0.02028217911720276, "std": 0.46277642250061035, "min": -1.18548583984375, "p10": -0.5903709411621094, "median": 0.022424697875976562, "p90": 0.4738189697265626, "max": 1.073699951171875, "pos_frac": 0.578125, "sample": [0.1175537109375, -0.0726165771484375, -0.0407867431640625, 1.0189666748046875, 0.565673828125, -0.1184234619140625, 1.073699951171875, 0.06753158569335938, 0.017276763916015625, 0.11429595947265625, -0.8375396728515625, 0.3609046936035156, 0.3092193603515625, -0.1782989501953125, 0.3061656951904297, 0.227783203125, 0.7230033874511719, -0.9431991577148438, -1.1715240478515625, -0.59454345703125, 0.5760345458984375, -0.2344818115234375, -0.21886825561523438, -1.1241912841796875, -0.20282363891601562, -0.223236083984375, -0.230255126953125, 0.1102447509765625, 0.012514114379882812, 0.008863449096679688, 0.16393470764160156, 0.13087081909179688, 0.29087066650390625, 0.48712158203125, 0.077880859375, 0.020782470703125, 0.38459014892578125, -0.2540283203125, 0.442779541015625, -0.2014923095703125, -0.19196510314941406, 0.2920722961425781, -0.1995868682861328, -0.12860107421875, 0.40685272216796875, -0.2825736999511719, 0.1845245361328125, 0.09819221496582031, 0.15299224853515625, 0.120697021484375, -0.3003082275390625, 0.0030059814453125, 0.025665283203125, -1.0062446594238281, -0.25661468505859375, 0.22534942626953125, 0.16228866577148438, -0.10540771484375, -0.5806350708007812, -1.18548583984375, 0.6955413818359375, 0.08770751953125, 0.024066925048828125, -0.5018463134765625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000016.npy"}
{"epoch": 0.023494860499265784, "step": 17, "batch_size": 64, "mean": 0.03842410445213318, "std": 0.4417261779308319, "min": -1.212921142578125, "p10": -0.4437843322753906, "median": 0.03247642517089844, "p90": 0.6084777832031254, "max": 1.1150360107421875, "pos_frac": 0.546875, "sample": [-0.3145751953125, -0.25797080993652344, 0.34917449951171875, 0.26380157470703125, 0.7422637939453125, 0.09200096130371094, -0.393829345703125, 0.515960693359375, 0.1764373779296875, -0.2332305908203125, 0.34610748291015625, 0.28476715087890625, 0.2657318115234375, -0.24407958984375, 0.17669677734375, 0.8132095336914062, 0.35509490966796875, -0.4173431396484375, -0.16810989379882812, -0.2232666015625, 0.01976776123046875, -0.1698455810546875, 0.099517822265625, 0.4748725891113281, 0.00713348388671875, 0.07341766357421875, 0.6455154418945312, -0.15167617797851562, -0.624267578125, 0.045185089111328125, -0.4097900390625, 0.6651229858398438, 1.0926055908203125, -0.6887283325195312, -0.18963050842285156, -0.0427703857421875, 0.40761566162109375, -0.47696685791015625, 0.26513671875, 0.04827880859375, -0.02225494384765625, 0.11598587036132812, -0.7695541381835938, 0.14948654174804688, 0.14847564697265625, 1.1150360107421875, 0.5017623901367188, -0.19158935546875, -0.1978607177734375, -0.0234832763671875, 0.5220565795898438, 0.014446258544921875, 0.837188720703125, -0.7979393005371094, -0.04701995849609375, 0.193817138671875, -1.212921142578125, -0.2872161865234375, 0.197662353515625, 0.161895751953125, -0.21636962890625, -0.2004241943359375, -0.45511627197265625, -0.29625701904296875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000017.npy"}
{"epoch": 0.024963289280469897, "step": 18, "batch_size": 64, "mean": 0.004341289401054382, "std": 0.39755260944366455, "min": -1.0481719970703125, "p10": -0.46836776733398433, "median": 0.0048999786376953125, "p90": 0.4898574829101563, "max": 0.959747314453125, "pos_frac": 0.53125, "sample": [0.016407012939453125, -0.0165863037109375, -0.24761199951171875, 0.4324188232421875, 0.3668212890625, 0.230987548828125, -0.37624359130859375, -0.9129791259765625, -0.3413238525390625, -0.4124946594238281, -0.0720062255859375, -0.02820873260498047, 0.18865203857421875, -0.25153350830078125, -0.010345458984375, 0.49308013916015625, 0.004383087158203125, -0.1547718048095703, -0.5169486999511719, 0.224029541015625, -0.22562789916992188, 0.39203643798828125, -0.202301025390625, -0.23241424560546875, 0.41717529296875, -0.4291343688964844, -1.0481719970703125, 0.48233795166015625, 0.2701683044433594, 0.6956214904785156, 0.10390853881835938, 0.19755935668945312, 0.13371658325195312, -0.284027099609375, 0.959747314453125, 0.0019378662109375, 0.0371551513671875, -0.16517066955566406, 0.527099609375, -0.4809722900390625, 0.18466949462890625, -0.04581451416015625, 0.9002609252929688, -0.5089263916015625, 0.0639190673828125, 0.0480499267578125, 0.242767333984375, 0.025135040283203125, 0.1209716796875, -0.43895721435546875, -0.26816558837890625, -0.2376861572265625, 0.7326126098632812, 0.22528839111328125, -0.11707496643066406, -0.43650054931640625, -0.5904998779296875, -0.0042171478271484375, 0.68096923828125, 0.185760498046875, 0.2140636444091797, 0.0054168701171875, -0.6043853759765625, 0.13381576538085938], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000018.npy"}
{"epoch": 0.02643171806167401, "step": 19, "batch_size": 64, "mean": 0.009020596742630005, "std": 0.4559498429298401, "min": -0.9843978881835938, "p10": -0.6344291687011718, "median": -0.058582305908203125, "p90": 0.5696716308593752, "max": 1.4770965576171875, "pos_frac": 0.4375, "sample": [-0.29834556579589844, 0.15152931213378906, 0.14551925659179688, -0.0685272216796875, 0.00997161865234375, 1.4770965576171875, 0.5891952514648438, -0.2619209289550781, 0.006519317626953125, -0.382171630859375, 0.7799606323242188, -0.1331329345703125, -0.11208343505859375, -0.142181396484375, -0.42267608642578125, -0.2777252197265625, 0.8626251220703125, -0.06700515747070312, 0.73260498046875, 0.344207763671875, -0.373565673828125, -0.1616973876953125, 0.2176513671875, -0.3861083984375, -0.05601310729980469, 0.09615898132324219, 0.4642333984375, 0.296051025390625, 0.696258544921875, 0.33597564697265625, -0.077301025390625, -0.23104095458984375, 0.47709083557128906, -0.04187774658203125, -0.64398193359375, -0.10417938232421875, 0.5241165161132812, -0.06115150451660156, -0.9843978881835938, 1.004241943359375, 0.0364837646484375, -0.694915771484375, -0.164337158203125, -0.2914581298828125, 0.4868316650390625, 0.12773895263671875, -0.20118141174316406, -0.6187973022460938, -0.64923095703125, -0.6769256591796875, -0.6411285400390625, 0.23775100708007812, 0.24939727783203125, -0.7001609802246094, -0.4407196044921875, -0.13905715942382812, 0.13836669921875, -0.043064117431640625, -0.013153076171875, 0.132537841796875, 0.337371826171875, 0.5167388916015625, -0.20258331298828125, -0.13311004638671875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000019.npy"}
{"epoch": 0.027900146842878122, "step": 20, "batch_size": 64, "mean": 0.0042449235916137695, "std": 0.38712218403816223, "min": -1.0001678466796875, "p10": -0.5227432250976562, "median": 0.029455184936523438, "p90": 0.4228225708007813, "max": 0.9121856689453125, "pos_frac": 0.546875, "sample": [0.18120574951171875, 0.14393234252929688, -0.009441375732421875, -1.0001678466796875, -0.23931884765625, 0.2888374328613281, -0.2486419677734375, 0.9121856689453125, 0.8925018310546875, -0.6699600219726562, -0.19751739501953125, -0.2042999267578125, 0.0087432861328125, 0.08061599731445312, 0.07797622680664062, 0.412109375, -0.07921981811523438, 0.018657684326171875, -0.1201934814453125, -0.07537841796875, -0.7023544311523438, 0.22693634033203125, 0.38509368896484375, 0.0499114990234375, 0.1621856689453125, 0.17023468017578125, -0.2533721923828125, 0.11864089965820312, 0.5234603881835938, -0.08250999450683594, -0.10250473022460938, 0.05127716064453125, 0.2764129638671875, -0.229827880859375, 0.41123199462890625, 0.034637451171875, -0.212982177734375, -0.8822021484375, -0.2146759033203125, 0.13013839721679688, -0.1630401611328125, -0.528717041015625, 0.2914085388183594, 0.4274139404296875, -0.04669189453125, 0.07525634765625, -0.53228759765625, -0.294036865234375, 0.38018035888671875, 0.07070541381835938, -0.560760498046875, -0.38636016845703125, 0.22234344482421875, 0.6047744750976562, -0.5088043212890625, 0.024272918701171875, -0.1180419921875, 0.30622100830078125, 0.44080543518066406, -0.1906890869140625, 0.14232254028320312, 0.887969970703125, -0.488800048828125, 0.18387222290039062], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000020.npy"}
{"epoch": 0.02936857562408223, "step": 21, "batch_size": 64, "mean": 0.12687447667121887, "std": 0.3982231318950653, "min": -0.607940673828125, "p10": -0.3740840911865234, "median": 0.09633636474609375, "p90": 0.622072982788086, "max": 1.2558135986328125, "pos_frac": 0.609375, "sample": [-0.225189208984375, 0.6442451477050781, 0.30474090576171875, -0.2266845703125, 0.0852508544921875, 0.06838607788085938, 0.08022308349609375, 0.6136054992675781, 0.48278045654296875, -0.098907470703125, -0.312469482421875, 0.5991134643554688, 0.027956008911132812, -0.3900413513183594, 0.3066825866699219, 0.5934219360351562, -0.26529693603515625, 0.107421875, 0.3680763244628906, 0.1267547607421875, -0.607940673828125, 0.799285888671875, 0.9881134033203125, -0.3797798156738281, 0.0070819854736328125, -0.41039276123046875, 0.4762725830078125, 0.17685317993164062, 0.12502288818359375, -0.263458251953125, -0.538055419921875, 0.4948539733886719, -0.2889137268066406, -0.3607940673828125, -0.5330276489257812, 0.019939422607421875, 0.2199554443359375, -0.111297607421875, 0.258148193359375, -0.03997802734375, 0.3885955810546875, -0.3570976257324219, -0.2972564697265625, 0.05513954162597656, 0.46366119384765625, -0.020442962646484375, 0.20466232299804688, 0.50384521484375, -0.2280120849609375, -0.0722503662109375, -0.013515472412109375, 0.35634613037109375, 0.3570556640625, -0.4388580322265625, 0.3161659240722656, -0.08220291137695312, -0.1566162109375, 0.625701904296875, 1.2558135986328125, 0.29616546630859375, 0.6457901000976562, 0.12483978271484375, 0.5178947448730469, 0.7525825500488281], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000021.npy"}
{"epoch": 0.030837004405286344, "step": 22, "batch_size": 64, "mean": 0.11361068487167358, "std": 0.3960931599140167, "min": -1.0052032470703125, "p10": -0.32989959716796874, "median": 0.058208465576171875, "p90": 0.6111892700195313, "max": 1.3006439208984375, "pos_frac": 0.53125, "sample": [0.399139404296875, -0.07861328125, 0.3790550231933594, -0.08126258850097656, 0.48859405517578125, -0.496734619140625, 0.4615325927734375, 0.11035919189453125, -0.20047378540039062, -0.24399566650390625, -0.2666778564453125, -0.1395263671875, -0.3368186950683594, 0.22406387329101562, -0.4083824157714844, 0.1861572265625, 0.26978302001953125, -0.01071929931640625, 0.08864593505859375, 0.17108154296875, -0.2751350402832031, 1.3006439208984375, 0.721527099609375, 0.6892547607421875, 0.4092826843261719, -0.07822418212890625, -1.0052032470703125, 0.2623291015625, 0.7243499755859375, 0.88427734375, -0.41893577575683594, 0.05344390869140625, 0.6177444458007812, -0.4266357421875, 0.07099533081054688, 0.06281280517578125, 0.0536041259765625, -0.0826263427734375, 0.2725486755371094, -0.00287628173828125, 0.492706298828125, 0.6635322570800781, -0.04128265380859375, -0.08765792846679688, -0.3137550354003906, -0.2573089599609375, 0.3427581787109375, 0.5828781127929688, 0.55853271484375, -0.06380653381347656, -0.12066078186035156, 0.387969970703125, 0.5958938598632812, 0.07184219360351562, -0.03607177734375, -0.2410125732421875, -0.406951904296875, -0.14597702026367188, -0.0399017333984375, 0.52459716796875, 0.5260467529296875, 0.16802978515625, -0.0656280517578125, -0.1720733642578125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000022.npy"}
{"epoch": 0.032305433186490456, "step": 23, "batch_size": 64, "mean": 0.02496352791786194, "std": 0.4065571129322052, "min": -1.164794921875, "p10": -0.4479156494140625, "median": 0.06338691711425781, "p90": 0.49053497314453137, "max": 1.026580810546875, "pos_frac": 0.578125, "sample": [-0.1878643035888672, -0.08821487426757812, -0.01141357421875, 0.03179931640625, 0.0669403076171875, -0.46257781982421875, 0.3304023742675781, 0.2024078369140625, -0.24701309204101562, 1.026580810546875, -0.3368988037109375, -0.2378387451171875, 0.6021270751953125, -0.27911376953125, 0.33109283447265625, 0.10588455200195312, -0.5267677307128906, 0.7842979431152344, 0.20842742919921875, -0.15389633178710938, -0.51361083984375, 0.0707855224609375, 0.347686767578125, -0.442596435546875, -0.4246711730957031, -0.3577423095703125, 0.1568756103515625, 0.72845458984375, 0.41918182373046875, 0.8365478515625, 0.08492279052734375, -0.4107666015625, 0.0614013671875, 0.21138381958007812, -0.28400611877441406, -0.17201995849609375, -0.5464019775390625, 0.18291473388671875, 0.41374969482421875, 0.16083526611328125, 0.6671581268310547, 0.06537246704101562, 0.028865814208984375, 0.19664382934570312, 0.07401657104492188, -0.4501953125, 0.4653167724609375, -0.172454833984375, 0.4041595458984375, -0.32940673828125, -0.3538665771484375, 0.2732124328613281, 0.097442626953125, 0.001384735107421875, 0.3381195068359375, -0.271728515625, 0.42426300048828125, -1.164794921875, 0.3559684753417969, -0.687530517578125, -0.14022064208984375, -0.4364013671875, 0.5013427734375, 0.029712677001953125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000023.npy"}
{"epoch": 0.033773861967694566, "step": 24, "batch_size": 64, "mean": 0.012234941124916077, "std": 0.4346502721309662, "min": -0.9354400634765625, "p10": -0.5806732177734375, "median": 0.032337188720703125, "p90": 0.5644607543945312, "max": 0.9850616455078125, "pos_frac": 0.515625, "sample": [-0.5341415405273438, -0.8839111328125, -0.3895988464355469, -0.15247726440429688, -0.01268768310546875, 0.47002410888671875, 0.1916351318359375, 0.07110977172851562, 0.04917144775390625, 0.12408447265625, -0.029142379760742188, -0.1303863525390625, -0.1388874053955078, 0.9322586059570312, 0.1345672607421875, 0.4742546081542969, -0.00469207763671875, -0.9211387634277344, -0.188507080078125, 0.5983619689941406, -0.03783416748046875, 0.23537826538085938, -0.4045066833496094, 0.5815277099609375, 0.5518035888671875, 0.19789886474609375, 0.3802337646484375, -0.66363525390625, -0.113677978515625, 0.06501007080078125, 0.30340576171875, -0.585296630859375, 0.056240081787109375, 0.520172119140625, -0.04952239990234375, 0.424072265625, 0.577301025390625, 0.4136505126953125, 0.10457611083984375, -0.10332489013671875, -0.9354400634765625, -0.7075386047363281, -0.05120849609375, 0.6871490478515625, 0.0555877685546875, 0.31021881103515625, 0.0505218505859375, -0.19971942901611328, -0.56988525390625, -0.35433197021484375, 0.04146575927734375, -0.09977531433105469, -0.2308349609375, -0.2676525115966797, 0.452850341796875, -0.528289794921875, 0.0232086181640625, 0.56988525390625, 0.05388832092285156, 0.9850616455078125, -0.6802902221679688, -0.0719146728515625, -0.3934783935546875, 0.5301895141601562], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000024.npy"}
{"epoch": 0.03524229074889868, "step": 25, "batch_size": 64, "mean": 0.06985485553741455, "std": 0.5171591639518738, "min": -0.8009185791015625, "p10": -0.42043304443359375, "median": 0.0017757415771484375, "p90": 0.6632068634033206, "max": 2.23309326171875, "pos_frac": 0.5, "sample": [-0.19905853271484375, -0.11075782775878906, 0.37261962890625, 0.47003173828125, 0.18341445922851562, 0.03446769714355469, 0.2637176513671875, -0.21172332763671875, 1.131927490234375, -0.25407981872558594, 0.8693084716796875, -0.63189697265625, 0.18707275390625, -0.1961822509765625, -0.4172821044921875, 0.6896781921386719, -0.4107017517089844, -0.21404266357421875, -0.28194427490234375, 0.0789031982421875, -0.10205841064453125, 0.016727447509765625, 0.5904388427734375, 0.7223968505859375, -0.365997314453125, 0.728302001953125, -0.10303497314453125, 0.23218536376953125, -0.6524505615234375, -0.017915725708007812, -0.8009185791015625, -0.421783447265625, -0.09905624389648438, -0.3347625732421875, -0.1651153564453125, 0.06776809692382812, 0.19910812377929688, 0.6014404296875, 2.23309326171875, -0.38507080078125, -0.4160652160644531, 0.17894744873046875, -0.6400070190429688, 0.43106842041015625, -0.008434295654296875, -0.13616943359375, 0.24646377563476562, 0.0261077880859375, 0.22728729248046875, 0.24887466430664062, 1.550018310546875, -0.07391738891601562, 0.01198577880859375, -0.3604278564453125, 0.08581924438476562, 0.20367431640625, -0.0745086669921875, -0.5927200317382812, 0.24179840087890625, -0.5195999145507812, -0.16539764404296875, -0.2377471923828125, 0.47525787353515625, 0.4716339111328125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000025.npy"}
{"epoch": 0.03671071953010279, "step": 26, "batch_size": 64, "mean": 0.05025133490562439, "std": 0.4283388555049896, "min": -1.306640625, "p10": -0.4859874725341797, "median": 0.13268280029296875, "p90": 0.5145027160644532, "max": 0.89068603515625, "pos_frac": 0.609375, "sample": [0.3744354248046875, 0.10941314697265625, 0.2050628662109375, 0.2226409912109375, 0.51898193359375, 0.30255126953125, 0.04038238525390625, 0.42589569091796875, -0.17559051513671875, -0.146209716796875, 0.321990966796875, 0.28984832763671875, -0.385101318359375, -0.10504913330078125, 0.28330230712890625, -1.306640625, 0.2087249755859375, 0.43909454345703125, -0.03460693359375, 0.4021034240722656, 0.14013671875, 0.1410064697265625, 0.2934303283691406, 0.773101806640625, -0.9154891967773438, -0.9611663818359375, -0.058502197265625, -0.361724853515625, 0.2798881530761719, -0.25286865234375, 0.42839813232421875, -0.23548126220703125, -0.1324462890625, -0.5624237060546875, -0.2608680725097656, 0.521728515625, -0.64111328125, 0.1936187744140625, 0.37184906005859375, -0.702178955078125, 0.14887237548828125, 0.554443359375, -0.48822021484375, -0.46918487548828125, 0.33847808837890625, 0.5040512084960938, -0.072479248046875, -0.12830352783203125, -0.1652984619140625, 0.009378433227539062, -0.331512451171875, 0.43359375, 0.0658721923828125, 0.33426666259765625, 0.89068603515625, 0.451568603515625, 0.026763916015625, 0.2491912841796875, 0.6362533569335938, -0.4807777404785156, -0.18927383422851562, 0.08986282348632812, 0.6324996948242188, 0.1252288818359375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000026.npy"}
{"epoch": 0.0381791483113069, "step": 27, "batch_size": 64, "mean": 0.07869991660118103, "std": 0.48043230175971985, "min": -1.49163818359375, "p10": -0.4471385955810547, "median": 0.08673095703125, "p90": 0.775406646728516, "max": 1.4207305908203125, "pos_frac": 0.59375, "sample": [-0.2546253204345703, 0.37877655029296875, -0.31426239013671875, -0.2647438049316406, 1.4207305908203125, -0.05385589599609375, 1.2269439697265625, 0.0985107421875, 0.1285247802734375, -0.06577873229980469, -0.3820991516113281, -0.25775146484375, 0.9365310668945312, 0.00739288330078125, -0.1699676513671875, 0.0283355712890625, 0.135223388671875, 0.0984649658203125, -0.06383132934570312, -0.5990447998046875, -0.12530899047851562, 1.0778961181640625, 0.17774200439453125, 0.41578102111816406, 0.0749969482421875, -0.0243072509765625, -0.4280967712402344, -0.1329326629638672, 0.4082794189453125, 0.811004638671875, 0.06281280517578125, -0.015148162841796875, 0.1266918182373047, 0.1793212890625, 0.31336212158203125, 0.8925552368164062, 0.0667724609375, 0.9238357543945312, -0.4756050109863281, -0.40975189208984375, -0.272979736328125, 0.3051910400390625, -0.73095703125, 0.2600440979003906, 0.11639022827148438, -1.49163818359375, 0.418670654296875, -0.0123443603515625, -0.41756439208984375, -0.124603271484375, 0.6923446655273438, 0.03318023681640625, 0.3635406494140625, 0.10865020751953125, 0.35699462890625, 0.0984954833984375, -0.49160003662109375, 0.17218780517578125, 0.127166748046875, 0.16704559326171875, -0.45529937744140625, 0.231689453125, -0.59228515625, 0.221099853515625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000027.npy"}
{"epoch": 0.039647577092511016, "step": 28, "batch_size": 64, "mean": 0.0692436695098877, "std": 0.4800063967704773, "min": -0.9635772705078125, "p10": -0.4972137451171875, "median": 0.014942169189453125, "p90": 0.7306743621826173, "max": 1.413543701171875, "pos_frac": 0.53125, "sample": [-0.03582763671875, -0.49591064453125, 0.386566162109375, 0.3794097900390625, 0.15453720092773438, 0.8549041748046875, -0.40111541748046875, -0.06170654296875, 0.07675552368164062, 0.8733024597167969, -0.23394775390625, 0.44793701171875, 1.413543701171875, -0.2922515869140625, 0.4262847900390625, 0.2621574401855469, 0.3218193054199219, 0.412353515625, -0.7455024719238281, -0.05959320068359375, -0.249603271484375, 0.089508056640625, 0.4304962158203125, 0.3576202392578125, -0.152923583984375, -0.4268035888671875, 0.14416885375976562, 1.01800537109375, -0.10961151123046875, 0.06642913818359375, -0.048664093017578125, -0.06833648681640625, 0.15494537353515625, 1.0811004638671875, 0.7120170593261719, 0.0254669189453125, 3.814697265625e-05, 0.15032196044921875, -0.13703536987304688, 0.7386703491210938, -0.092742919921875, 0.05706787109375, 0.00441741943359375, -0.497772216796875, 0.4556884765625, -0.8310546875, -0.5314407348632812, 0.49832916259765625, 0.5184326171875, -0.4840545654296875, -0.5282440185546875, -0.41135406494140625, -0.020236968994140625, -0.04222869873046875, -0.2879905700683594, 0.110687255859375, 0.4055900573730469, -0.01483154296875, -0.9635772705078125, -0.3275566101074219, 0.7708244323730469, -0.419189453125, -0.7721099853515625, 0.3754158020019531], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000028.npy"}
{"epoch": 0.041116005873715125, "step": 29, "batch_size": 64, "mean": 0.1637905091047287, "std": 0.4623018801212311, "min": -0.71649169921875, "p10": -0.2945449829101562, "median": 0.0869302749633789, "p90": 0.7844818115234375, "max": 1.57708740234375, "pos_frac": 0.546875, "sample": [-0.17212677001953125, 0.09996604919433594, 0.4355316162109375, 0.2768211364746094, -0.18953704833984375, -0.0174407958984375, 0.45861053466796875, -0.15996551513671875, 0.5222930908203125, 0.8849334716796875, -0.115478515625, 0.02949237823486328, -0.19516754150390625, 0.8653106689453125, -0.10398101806640625, -0.71649169921875, -0.21161651611328125, 0.76190185546875, -0.13362503051757812, 0.2878303527832031, 0.35739898681640625, -0.0962677001953125, -0.5922622680664062, -0.003681182861328125, 0.794158935546875, -0.322601318359375, 0.3355903625488281, 0.3727264404296875, 0.5544891357421875, -0.63275146484375, 0.13285064697265625, -0.21735763549804688, -0.49407958984375, 1.13018798828125, -0.2290802001953125, -0.03519439697265625, 0.4805908203125, -0.03369140625, -0.19287109375, 1.57708740234375, 0.2391204833984375, -0.0577545166015625, 0.6135101318359375, -0.48134613037109375, 0.3627758026123047, 0.35503578186035156, 1.0661468505859375, 0.0075531005859375, 0.577392578125, -0.12075233459472656, 0.07389450073242188, -0.1476287841796875, 0.58123779296875, 0.30588531494140625, 0.6156845092773438, 0.2257537841796875, -0.0853424072265625, -0.1961669921875, 0.2819099426269531, 0.376800537109375, 1.0061187744140625, -0.68780517578125, 0.23435211181640625, -0.15628623962402344], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000029.npy"}
{"epoch": 0.042584434654919234, "step": 30, "batch_size": 64, "mean": 0.13101181387901306, "std": 0.40445253252983093, "min": -1.271820068359375, "p10": -0.3410110473632812, "median": 0.15625572204589844, "p90": 0.5856475830078125, "max": 0.883270263671875, "pos_frac": 0.6875, "sample": [0.296234130859375, 0.5808143615722656, 0.011150360107421875, 0.28851318359375, 0.5032806396484375, -0.221038818359375, 0.424041748046875, 0.1259918212890625, 0.1222076416015625, 0.36789703369140625, -0.5372238159179688, 0.7200469970703125, 0.19246673583984375, 0.5599479675292969, -0.356689453125, -0.69024658203125, -0.5018692016601562, 0.63983154296875, -0.1089630126953125, -0.6358489990234375, -0.009002685546875, 0.0538330078125, 0.359100341796875, 0.0896148681640625, -0.00287628173828125, 0.2451457977294922, -0.2242431640625, 0.1786041259765625, -0.2668647766113281, -0.11326789855957031, 0.7299957275390625, 0.37465667724609375, 0.10764312744140625, -0.12405776977539062, 0.4955711364746094, 0.41358184814453125, -0.69140625, 0.883270263671875, 0.0850830078125, 0.5272445678710938, 0.001483917236328125, -0.3044281005859375, 0.10489273071289062, 0.09729766845703125, -0.1349334716796875, 0.6380691528320312, -1.271820068359375, -0.14312362670898438, 0.12226676940917969, 0.503509521484375, 0.18209457397460938, 0.7912826538085938, 0.21458816528320312, 0.3619804382324219, 0.159942626953125, 0.5280494689941406, 0.5108985900878906, 0.27175140380859375, -0.2686309814453125, 0.15256881713867188, 0.5877189636230469, 0.18328857421875, 0.3568115234375, -0.15297317504882812], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000030.npy"}
{"epoch": 0.04405286343612335, "step": 31, "batch_size": 64, "mean": 0.11164584755897522, "std": 0.3907439708709717, "min": -0.4414863586425781, "p10": -0.30876617431640624, "median": 0.04256439208984375, "p90": 0.6050346374511719, "max": 1.389068603515625, "pos_frac": 0.515625, "sample": [-0.20691299438476562, 0.1152801513671875, 0.6947097778320312, 0.6054534912109375, 0.37467193603515625, -0.3065338134765625, -0.11753082275390625, 0.02005767822265625, 0.12827301025390625, -0.39739990234375, -0.357696533203125, 0.425079345703125, -0.03564453125, -0.08376502990722656, -0.3866767883300781, -0.1347637176513672, 0.19121551513671875, 0.2752971649169922, 0.8599853515625, -0.05885887145996094, -0.266845703125, -0.15192031860351562, -0.03717041015625, -0.413360595703125, 0.3485870361328125, -0.4414863586425781, 0.06507110595703125, -0.15267181396484375, -0.15581512451171875, 0.40240478515625, -0.168243408203125, 0.10902786254882812, -0.00484466552734375, -0.0153961181640625, 0.18929672241210938, 1.2021484375, 0.17661666870117188, 0.07306671142578125, 0.30231475830078125, -0.27159881591796875, -0.140869140625, 0.6040573120117188, -0.24734878540039062, -0.012050628662109375, -0.12889862060546875, 1.131439208984375, 0.6959686279296875, 0.46068572998046875, 0.21376800537109375, -0.04953575134277344, 0.22670364379882812, 0.0764617919921875, 0.18707275390625, 0.233978271484375, -0.309722900390625, 0.24139404296875, -0.20091629028320312, 0.3727874755859375, 1.389068603515625, 0.20473861694335938, -0.00389862060546875, -0.39208221435546875, 0.4657135009765625, -0.2666015625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000031.npy"}
{"epoch": 0.04552129221732746, "step": 32, "batch_size": 64, "mean": 0.0882592499256134, "std": 0.4879763126373291, "min": -1.2233734130859375, "p10": -0.4122577667236328, "median": 0.06586074829101562, "p90": 0.6740272521972658, "max": 1.7897796630859375, "pos_frac": 0.59375, "sample": [0.43929290771484375, 0.740264892578125, -0.4593048095703125, -0.25881195068359375, -0.2548236846923828, 1.1346588134765625, -0.49359130859375, 0.6172027587890625, 0.11857986450195312, 1.1206436157226562, 0.1274566650390625, 0.02581787109375, 0.1929473876953125, 0.19995880126953125, -0.121337890625, 0.03521728515625, 0.36019134521484375, 0.01611328125, 0.23969268798828125, -0.726776123046875, 0.23525238037109375, 0.634490966796875, 1.7897796630859375, -0.616668701171875, 0.07848739624023438, -0.0531158447265625, 0.6909713745117188, 0.23406982421875, 0.4408378601074219, -0.293609619140625, -0.8536605834960938, 0.14468765258789062, 0.043422698974609375, 0.8468017578125, 0.08451080322265625, 0.2593231201171875, -0.2377471923828125, -0.10068511962890625, -0.330596923828125, -0.24438858032226562, 0.168304443359375, -0.13017654418945312, 0.2147064208984375, -0.16630172729492188, 0.149688720703125, 0.228759765625, -0.19062042236328125, 0.0648193359375, 0.4585113525390625, -0.370849609375, -0.1680908203125, 0.07644271850585938, -1.2233734130859375, 0.8482818603515625, -0.2922172546386719, -0.3210906982421875, -0.01348114013671875, -0.0551605224609375, -0.30878448486328125, 0.54400634765625, -0.4300041198730469, 0.0634918212890625, 0.6292724609375, 0.06690216064453125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000032.npy"}
{"epoch": 0.04698972099853157, "step": 33, "batch_size": 64, "mean": 0.13513219356536865, "std": 0.5082568526268005, "min": -0.9523391723632812, "p10": -0.47583770751953125, "median": 0.13007354736328125, "p90": 0.7032901763916016, "max": 1.6356201171875, "pos_frac": 0.578125, "sample": [-0.0141448974609375, 0.10613250732421875, 0.413299560546875, -0.355987548828125, -0.1936187744140625, 0.32906341552734375, -0.821807861328125, 0.3052978515625, 0.8722724914550781, 0.4856224060058594, 1.1498641967773438, 0.6294021606445312, 1.6356201171875, 0.7138633728027344, 0.678619384765625, 0.2713813781738281, -0.08466339111328125, -0.21734237670898438, -0.45068359375, 0.0323638916015625, 0.25086212158203125, -0.19744873046875, 1.2460174560546875, 0.894287109375, 0.5917129516601562, 0.179412841796875, 0.03034210205078125, 0.16913604736328125, 0.602294921875, -0.581329345703125, -0.88836669921875, 0.6422271728515625, -0.9523391723632812, 0.3783283233642578, -0.440277099609375, -0.1369476318359375, -0.4183235168457031, -0.5238265991210938, 0.3477325439453125, -0.019073486328125, 0.3555450439453125, -0.4866180419921875, 0.29468536376953125, 0.3200492858886719, 0.15401458740234375, -0.12105560302734375, -0.21289825439453125, 0.39582061767578125, 0.3380279541015625, -0.3037300109863281, 0.06197357177734375, -0.007940292358398438, 0.43035888671875, -0.12484359741210938, -0.08001708984375, 0.451141357421875, 0.03078460693359375, -0.5233917236328125, -0.08206558227539062, 0.9534683227539062, -0.4252586364746094, -0.22972869873046875, 0.28997039794921875, 0.5111923217773438], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000033.npy"}
{"epoch": 0.048458149779735685, "step": 34, "batch_size": 64, "mean": 0.14157328009605408, "std": 0.4822671115398407, "min": -0.5831222534179688, "p10": -0.515777587890625, "median": 0.07332992553710938, "p90": 0.7485557556152346, "max": 1.859619140625, "pos_frac": 0.640625, "sample": [0.09793853759765625, 0.3160552978515625, 0.5076370239257812, -0.0314178466796875, 1.292755126953125, 0.030185699462890625, -0.1680145263671875, -0.33600616455078125, 0.06935501098632812, -0.2903289794921875, 0.4308929443359375, -0.14389801025390625, 0.1551666259765625, 0.6913528442382812, -0.141815185546875, 0.7730712890625, -0.5541877746582031, -0.061290740966796875, 0.2295684814453125, 0.5991058349609375, -0.56207275390625, -0.517242431640625, 0.40724945068359375, 0.0729827880859375, 1.859619140625, 0.27773284912109375, 0.4629364013671875, 0.23663711547851562, 1.0620574951171875, -0.0822906494140625, 0.42412567138671875, -0.04007720947265625, 0.056671142578125, 0.0600128173828125, 0.551116943359375, -0.5400161743164062, -0.5187320709228516, 0.8157958984375, -0.3482475280761719, -0.5725631713867188, -0.10938262939453125, -0.22065353393554688, 0.029773712158203125, 0.573974609375, 0.1125640869140625, 0.0694427490234375, 0.24531173706054688, 0.2931404113769531, 0.8678817749023438, -0.512359619140625, 0.22589111328125, 0.65789794921875, 0.21714019775390625, 0.1165313720703125, 0.124481201171875, 0.07367706298828125, 1.0443496704101562, -0.36907196044921875, 0.1785430908203125, 0.057910919189453125, -0.5831222534179688, 0.025543212890625, -0.15885543823242188, -0.47174072265625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000034.npy"}
{"epoch": 0.049926578560939794, "step": 35, "batch_size": 64, "mean": 0.1686515212059021, "std": 0.46060439944267273, "min": -1.07403564453125, "p10": -0.40479125976562497, "median": 0.174407958984375, "p90": 0.7528545379638674, "max": 1.4149932861328125, "pos_frac": 0.625, "sample": [0.2746009826660156, 0.7738075256347656, 0.2373046875, 0.323150634765625, -0.048583984375, 0.08747291564941406, 0.568695068359375, 0.28672027587890625, -0.15665054321289062, -0.1298980712890625, -0.902069091796875, -0.017663955688476562, 0.8919525146484375, 0.4660186767578125, -0.07773208618164062, -0.5549430847167969, -0.17398834228515625, 0.2745361328125, 0.361602783203125, -0.10971260070800781, 0.4266071319580078, 0.0676116943359375, 1.4149932861328125, 0.7993392944335938, -0.24721527099609375, -0.1833953857421875, 0.15561294555664062, -0.705718994140625, -0.058349609375, 0.946258544921875, 0.539398193359375, 0.10962677001953125, -1.07403564453125, 0.2890663146972656, 0.0304718017578125, 0.461395263671875, 0.24878692626953125, 0.4460105895996094, -0.351470947265625, 0.42697906494140625, -0.17020416259765625, 0.5333671569824219, -0.11871337890625, 0.09058380126953125, 0.39907073974609375, -0.19834136962890625, -0.427642822265625, -0.0491790771484375, 0.26788330078125, 0.17304611206054688, 0.17576980590820312, 0.0174560546875, 0.4156913757324219, -0.534393310546875, 0.7039642333984375, 0.42333221435546875, 0.4366302490234375, -0.053676605224609375, 1.152740478515625, -0.44012451171875, -0.036998748779296875, 0.5232696533203125, 0.9260711669921875, 0.4675025939941406], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000035.npy"}
{"epoch": 0.0513950073421439, "step": 36, "batch_size": 64, "mean": 0.12370041012763977, "std": 0.5115190148353577, "min": -1.460479736328125, "p10": -0.48172607421875, "median": 0.07850456237792969, "p90": 0.7111694335937502, "max": 1.89837646484375, "pos_frac": 0.578125, "sample": [0.07154083251953125, -0.05559349060058594, -0.29948997497558594, -0.7113571166992188, 0.2792510986328125, 0.2766590118408203, -0.03226470947265625, 0.08546829223632812, -0.06173896789550781, 1.89837646484375, -0.0685577392578125, -0.37240028381347656, -0.15177154541015625, -0.20963668823242188, 0.3415718078613281, 0.1869354248046875, 0.46584320068359375, 0.32891082763671875, 0.40830230712890625, -0.056396484375, 0.3102569580078125, -0.21453475952148438, -0.35800933837890625, 0.27669525146484375, 0.970977783203125, 0.9162139892578125, 0.46685028076171875, -0.482421875, -1.460479736328125, -0.09268951416015625, -0.48462677001953125, 0.05461883544921875, 0.055286407470703125, -0.06317138671875, 0.4195404052734375, 0.012836456298828125, -0.4801025390625, 0.273590087890625, 0.323822021484375, 0.36176300048828125, 1.0084877014160156, 0.615264892578125, 1.1512947082519531, 0.1310882568359375, -0.006195068359375, -0.000701904296875, 0.3395042419433594, 1.0750885009765625, 0.11954116821289062, -0.02527618408203125, -0.3615264892578125, 0.663787841796875, 0.302093505859375, 0.3274993896484375, 0.731475830078125, -0.2044219970703125, -0.7811279296875, -0.10332489013671875, -0.54461669921875, 0.19392776489257812, 0.1734466552734375, 0.4411468505859375, -0.5068283081054688, 0.047130584716796875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000036.npy"}
{"epoch": 0.05286343612334802, "step": 37, "batch_size": 64, "mean": 0.2637891173362732, "std": 0.6376413106918335, "min": -1.06170654296875, "p10": -0.2700630187988281, "median": 0.09233474731445312, "p90": 1.114488983154297, "max": 2.418060302734375, "pos_frac": 0.6875, "sample": [-0.1173095703125, -0.11727714538574219, -0.006378173828125, 0.04776763916015625, -0.14559173583984375, 0.16939163208007812, 0.0032958984375, 0.15918350219726562, 0.5024261474609375, -0.1165924072265625, 1.1494064331054688, 1.821533203125, -0.19440078735351562, 1.5855712890625, -0.784942626953125, 0.21002197265625, 0.14102554321289062, 0.006861686706542969, -0.2180194854736328, 2.418060302734375, -1.06170654296875, -0.376861572265625, 0.035888671875, 0.4603271484375, 0.42330169677734375, -0.3963775634765625, -0.11260986328125, 0.2229766845703125, 0.5113906860351562, 0.22194671630859375, 0.023400306701660156, 1.1274642944335938, 0.041131019592285156, 0.6607208251953125, 0.4897003173828125, 0.6465873718261719, -0.08147144317626953, -0.268829345703125, 0.0830078125, 2.340240478515625, 0.04621124267578125, -0.341644287109375, 0.3552093505859375, -0.1555633544921875, 0.40012359619140625, 0.0507659912109375, 0.091705322265625, 0.743438720703125, -0.6553802490234375, 1.2476768493652344, -0.20413970947265625, 0.2712211608886719, 0.6592254638671875, -0.27059173583984375, 0.09851264953613281, 0.09296417236328125, -0.1015472412109375, 1.0842132568359375, 0.2615509033203125, 0.39306640625, 0.11043548583984375, 1.079376220703125, 0.029790878295898438, 0.09162139892578125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000037.npy"}
{"epoch": 0.05433186490455213, "step": 38, "batch_size": 64, "mean": 0.3004909157752991, "std": 0.5923438668251038, "min": -0.8516044616699219, "p10": -0.2894298553466797, "median": 0.22199153900146484, "p90": 0.9088211059570312, "max": 3.00372314453125, "pos_frac": 0.65625, "sample": [-0.0522308349609375, 1.363006591796875, 3.00372314453125, 0.5105743408203125, 0.6505508422851562, -0.20545005798339844, 0.0731658935546875, 0.5593109130859375, -0.5379104614257812, 0.07476806640625, 0.9796295166015625, 0.3589324951171875, -0.112213134765625, -0.18291473388671875, -0.18390655517578125, -0.2793388366699219, 0.3027496337890625, 0.3624076843261719, -0.2540245056152344, 0.260711669921875, 0.328948974609375, 0.823211669921875, 0.014225006103515625, -0.029951095581054688, -0.39678955078125, 0.48596954345703125, 1.076263427734375, -0.5889816284179688, 0.655120849609375, -0.06750106811523438, 0.17908096313476562, -0.26149749755859375, 0.18099212646484375, 0.9027862548828125, 0.1995372772216797, 0.15043258666992188, 0.7767257690429688, 0.8892059326171875, -0.2551422119140625, -0.4239349365234375, 0.911407470703125, 0.24444580078125, 0.5337066650390625, -0.15980148315429688, -0.07241058349609375, 0.44608306884765625, 0.116424560546875, 0.6953887939453125, 0.28377532958984375, 0.7152786254882812, 0.2594757080078125, -0.33464813232421875, -0.11987113952636719, 0.6168746948242188, -0.8516044616699219, 1.3101654052734375, 0.6813812255859375, 0.861846923828125, -0.29375457763671875, 0.198944091796875, -0.14682769775390625, 0.15285015106201172, 0.7874755859375, 1.0645675659179688], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000038.npy"}
{"epoch": 0.055800293685756244, "step": 39, "batch_size": 64, "mean": 0.25560712814331055, "std": 0.5706183910369873, "min": -1.1253662109375, "p10": -0.318426513671875, "median": 0.201446533203125, "p90": 0.9334930419921876, "max": 2.3105621337890625, "pos_frac": 0.625, "sample": [0.043643951416015625, -0.070404052734375, 0.5068359375, 0.18976593017578125, 0.6869354248046875, -0.2076416015625, 0.6826171875, -0.42498016357421875, -0.15235137939453125, -0.2988090515136719, 0.2064056396484375, 0.435272216796875, 0.9000701904296875, -0.2038421630859375, 1.4535255432128906, -0.4925689697265625, 0.5423355102539062, -0.7564048767089844, 0.9079971313476562, 0.1964874267578125, -0.2559051513671875, 0.32218170166015625, -0.053386688232421875, 0.9444198608398438, -0.07563972473144531, -0.31932830810546875, -0.24832916259765625, 0.6377716064453125, -1.1253662109375, -0.04093170166015625, 0.6162109375, 0.17460250854492188, 0.5739669799804688, 0.5695686340332031, 0.56573486328125, -0.11339187622070312, -0.35082054138183594, 0.17686080932617188, -0.1596240997314453, 2.3105621337890625, -0.018688201904296875, 0.2240447998046875, 0.2550811767578125, 0.1666717529296875, 1.10150146484375, 0.665924072265625, -0.2684326171875, -0.31632232666015625, -0.83526611328125, 0.18889617919921875, 0.8426284790039062, 0.43935394287109375, 1.0183868408203125, 1.08306884765625, 0.2522850036621094, 0.5714950561523438, 1.050750732421875, 0.8323326110839844, 0.453125, 0.30388450622558594, -0.29029273986816406, 0.12677383422851562, -0.0062999725341796875, 0.223907470703125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000039.npy"}
{"epoch": 0.05726872246696035, "step": 40, "batch_size": 64, "mean": 0.23065805435180664, "std": 0.47920456528663635, "min": -0.72119140625, "p10": -0.3305366516113281, "median": 0.16544532775878906, "p90": 0.7459121704101563, "max": 1.92938232421875, "pos_frac": 0.671875, "sample": [-0.30718231201171875, 0.461761474609375, -0.319091796875, -0.158447265625, -0.72119140625, 0.24991607666015625, 0.2876167297363281, 0.6134185791015625, 0.376190185546875, 0.3237724304199219, 0.6473541259765625, 0.0904541015625, -0.33544158935546875, 0.7358932495117188, 0.4562873840332031, -0.14687347412109375, -0.3388557434082031, 0.5143890380859375, -0.18085479736328125, 0.580322265625, -0.5752334594726562, 0.08460617065429688, 0.4496307373046875, 0.7170867919921875, 0.7502059936523438, 0.40962982177734375, -0.3513984680175781, 0.19665908813476562, -0.09549713134765625, 0.4086761474609375, 0.5054473876953125, 0.2345752716064453, -0.3542327880859375, 0.389862060546875, 0.06342697143554688, 0.2338714599609375, -0.07147598266601562, 1.231781005859375, 0.092864990234375, 0.2877464294433594, 1.92938232421875, 0.07932472229003906, -0.06586456298828125, 0.645721435546875, -0.5033950805664062, 0.0418853759765625, 0.8216018676757812, 0.0262451171875, 0.8121223449707031, 0.4781074523925781, 1.1744384765625, -0.21116256713867188, 1.1983184814453125, 0.02471923828125, 0.7155685424804688, -0.29901123046875, -0.13238525390625, -0.12640380859375, 0.623687744140625, 0.1342315673828125, -0.05100250244140625, 0.11016464233398438, 0.09748077392578125, -0.19932937622070312], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000040.npy"}
{"epoch": 0.05873715124816446, "step": 41, "batch_size": 64, "mean": 0.25650691986083984, "std": 0.6502161026000977, "min": -1.228240966796875, "p10": -0.477042007446289, "median": 0.15751075744628906, "p90": 1.0492080688476564, "max": 1.9108428955078125, "pos_frac": 0.6875, "sample": [0.59466552734375, -0.30280303955078125, 0.5282516479492188, 0.2578849792480469, 0.0974273681640625, -0.6380615234375, 0.11114501953125, -0.32703399658203125, -0.786590576171875, 0.110137939453125, 1.9108428955078125, 0.48740386962890625, 0.34881591796875, 1.6067695617675781, -0.3545112609863281, -0.054351806640625, 0.13169097900390625, 0.11116790771484375, 0.614990234375, -0.2869415283203125, -0.15296554565429688, 1.2982101440429688, 0.6684951782226562, -0.5082054138183594, 1.6257095336914062, 0.38721466064453125, 0.8334503173828125, 1.058074951171875, 0.01123809814453125, -0.6432723999023438, 0.5938873291015625, -0.404327392578125, 0.1687469482421875, -0.12128829956054688, 0.07755470275878906, 0.7867507934570312, 0.9949417114257812, 0.1016693115234375, 0.14605712890625, 0.7720298767089844, 0.7433700561523438, 0.12505340576171875, -0.8216094970703125, 1.4472503662109375, 0.18035125732421875, -1.0366973876953125, -0.342010498046875, 0.2604408264160156, -0.16158294677734375, 0.4847259521484375, 0.22226524353027344, 0.2711944580078125, -1.228240966796875, 0.8487701416015625, -0.009290695190429688, 0.3627433776855469, -0.18750762939453125, 1.6303253173828125, 0.18532562255859375, -0.3485107421875, 0.14627456665039062, 0.011796951293945312, 0.7486152648925781, 1.0285186767578125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000041.npy"}
{"epoch": 0.06020558002936858, "step": 42, "batch_size": 64, "mean": 0.4121454060077667, "std": 0.5679777264595032, "min": -0.5118637084960938, "p10": -0.20832271575927733, "median": 0.32122802734375, "p90": 1.039710235595703, "max": 2.109130859375, "pos_frac": 0.78125, "sample": [0.20653533935546875, 0.1303558349609375, 0.3697013854980469, 0.9180450439453125, 2.108795166015625, 0.2891998291015625, 0.018768310546875, 1.701019287109375, 0.5511817932128906, 0.3742218017578125, 0.4238739013671875, 0.20688629150390625, 0.00673675537109375, 2.109130859375, 0.21496200561523438, 1.031982421875, 0.9409332275390625, 0.46982574462890625, 0.053173065185546875, 1.1037445068359375, 0.5475616455078125, 0.050312042236328125, 0.3268280029296875, 0.12235641479492188, -0.5118637084960938, 0.4700202941894531, -0.486846923828125, 0.94970703125, 0.5000839233398438, 0.867156982421875, 0.6884536743164062, 1.107177734375, -0.045543670654296875, 0.129730224609375, 1.0430221557617188, 0.22145843505859375, -0.4873046875, -0.07449913024902344, 0.654327392578125, -0.3001861572265625, -0.1998138427734375, -0.18929290771484375, 0.3156280517578125, 0.43109130859375, 0.7580528259277344, 0.89935302734375, 0.3913841247558594, 0.23348236083984375, 0.2161388397216797, 0.1979827880859375, -0.0842437744140625, 0.8788299560546875, -0.25183868408203125, -0.21196937561035156, 0.9339675903320312, 0.638397216796875, -0.22423171997070312, 1.6699371337890625, -0.1728057861328125, 0.00732421875, 0.0707550048828125, 0.6217422485351562, -0.144073486328125, 0.590484619140625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000042.npy"}
{"epoch": 0.06167400881057269, "step": 43, "batch_size": 64, "mean": 0.3553672134876251, "std": 0.6694585084915161, "min": -1.0314254760742188, "p10": -0.3110973358154297, "median": 0.2554292678833008, "p90": 1.1268306732177735, "max": 2.596038818359375, "pos_frac": 0.65625, "sample": [0.35247039794921875, -0.37145233154296875, -0.20563316345214844, -0.06993484497070312, 0.9605560302734375, 0.348907470703125, -0.1407318115234375, 0.2915496826171875, -0.36461639404296875, 0.8769989013671875, 0.207061767578125, 0.01335906982421875, -0.2828788757324219, -0.1655731201171875, 0.9728050231933594, 0.13162994384765625, -0.05934906005859375, 0.29381561279296875, -1.0314254760742188, -0.1703929901123047, 0.5582275390625, 0.33274078369140625, -0.03704833984375, 1.1014976501464844, 0.39922332763671875, 0.8393630981445312, -0.20215606689453125, 0.3355083465576172, 0.011585235595703125, 0.188446044921875, -0.418701171875, 1.1376876831054688, 0.2979583740234375, 0.78125, -0.531951904296875, -0.08474349975585938, -0.10118865966796875, -0.1932659149169922, -0.05303192138671875, 0.990753173828125, 0.8482818603515625, -0.17937088012695312, 1.3309478759765625, 1.7786788940429688, 0.29941558837890625, 0.172088623046875, -0.3018074035644531, 0.449737548828125, 1.9813232421875, -0.71563720703125, 1.0150299072265625, 0.497894287109375, 1.7586441040039062, 0.21930885314941406, 0.4051971435546875, 2.596038818359375, -0.3150787353515625, 1.020364761352539, 0.09397125244140625, 0.07673835754394531, 0.5703887939453125, 0.6590499877929688, 1.4649581909179688, 0.0780181884765625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000043.npy"}
{"epoch": 0.0631424375917768, "step": 44, "batch_size": 64, "mean": 0.34468963742256165, "std": 0.6256860494613647, "min": -1.166961669921875, "p10": -0.3174877166748047, "median": 0.3420295715332031, "p90": 0.973747253417969, "max": 2.31414794921875, "pos_frac": 0.75, "sample": [-0.19572067260742188, -0.12038421630859375, 0.4891815185546875, 0.08950614929199219, 2.2849960327148438, 0.31640625, 0.4008064270019531, 0.2042694091796875, 0.4348716735839844, -0.10155868530273438, 0.33966827392578125, 0.0470123291015625, 0.3295745849609375, -0.0197601318359375, 1.00274658203125, -1.0307464599609375, 0.25492095947265625, -0.104644775390625, 0.36060333251953125, -0.3413238525390625, 0.48000144958496094, 0.8128204345703125, 0.9060821533203125, 0.6093368530273438, -0.523223876953125, 0.626617431640625, 1.3064804077148438, -0.31517791748046875, 0.400665283203125, 0.4404945373535156, 0.02281951904296875, 0.6655998229980469, 0.6631870269775391, -0.3184776306152344, 0.6973152160644531, -0.36380767822265625, 0.8993988037109375, 0.6167411804199219, 1.8634796142578125, 1.3296051025390625, 0.1166229248046875, 0.6221466064453125, 0.371734619140625, 0.774383544921875, 0.10797119140625, 0.0587310791015625, 1.2886962890625, 0.465728759765625, 0.1880321502685547, 0.00392913818359375, 0.382781982421875, 0.4802703857421875, -1.166961669921875, 0.344390869140625, -0.07489204406738281, 2.31414794921875, -0.16422653198242188, 0.00960540771484375, -0.4758758544921875, 0.3289031982421875, 0.0880126953125, -0.2856178283691406, 0.3826103210449219, 0.438629150390625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000044.npy"}
{"epoch": 0.06461086637298091, "step": 45, "batch_size": 64, "mean": 0.20663078129291534, "std": 0.5793676376342773, "min": -1.5067901611328125, "p10": -0.44396400451660156, "median": 0.1505126953125, "p90": 1.0469821929931644, "max": 1.60137939453125, "pos_frac": 0.625, "sample": [0.6436996459960938, 0.217132568359375, -0.47600555419921875, -0.08435440063476562, -0.06832122802734375, 0.11243057250976562, 0.15759658813476562, 0.41574859619140625, 0.3744964599609375, 1.2102203369140625, 1.3729934692382812, 0.6494674682617188, 0.2809867858886719, -0.4505958557128906, 0.14342880249023438, -0.5395050048828125, -0.2821807861328125, 0.1056671142578125, -0.0451202392578125, 0.354827880859375, 0.26299285888671875, 0.3387460708618164, 0.772735595703125, -0.20973587036132812, -0.11073875427246094, -0.047748565673828125, 0.251373291015625, -0.20197677612304688, 0.036342620849609375, 0.19225502014160156, 0.845001220703125, 0.29282379150390625, 0.39327239990234375, 1.348602294921875, -0.1866607666015625, -0.570159912109375, 1.365509033203125, -0.11844253540039062, 0.2872772216796875, -0.712493896484375, 0.43946075439453125, -0.0552215576171875, 0.26305389404296875, -0.08208465576171875, 0.0694427490234375, 1.560577392578125, 1.60137939453125, 0.090301513671875, 0.00753021240234375, -0.13863563537597656, 0.7234230041503906, 0.9469985961914062, 1.0898323059082031, 0.24901580810546875, -0.42848968505859375, 0.6334304809570312, -0.91461181640625, -1.5067901611328125, 0.4233837127685547, 0.259857177734375, -0.039516448974609375, -0.1361846923828125, 0.08559036254882812, -0.23896026611328125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000045.npy"}
{"epoch": 0.06607929515418502, "step": 46, "batch_size": 64, "mean": 0.5048205852508545, "std": 0.612766444683075, "min": -0.8315963745117188, "p10": -0.1488971710205078, "median": 0.426666259765625, "p90": 1.4554378509521484, "max": 1.934326171875, "pos_frac": 0.765625, "sample": [0.6698837280273438, -0.8315963745117188, 1.4886703491210938, 1.6718902587890625, 0.7476844787597656, -0.1103515625, 0.16703033447265625, -0.263916015625, -0.34433746337890625, 1.34698486328125, 0.43212890625, 0.05972480773925781, 0.32085418701171875, 1.4420166015625, 1.10919189453125, -0.313018798828125, 0.6747817993164062, 0.3485107421875, 0.40387725830078125, 0.54595947265625, 0.4698982238769531, -0.5446624755859375, 0.19921112060546875, 0.5971145629882812, 0.3080482482910156, 0.7161865234375, 1.53955078125, 0.3973579406738281, 1.4479103088378906, 0.39730072021484375, 0.3594322204589844, -0.007537841796875, 1.576690673828125, 0.6636428833007812, 1.4586639404296875, 0.114471435546875, 0.059436798095703125, 0.6808547973632812, 0.6619186401367188, 0.825775146484375, -0.14113998413085938, 0.9765777587890625, 0.14748382568359375, 0.30942535400390625, 1.148712158203125, -0.08006668090820312, 0.7507553100585938, 0.1845245361328125, -0.0021953582763671875, -0.1522216796875, -0.09651947021484375, 1.934326171875, 1.0291576385498047, 0.43527984619140625, -0.08885955810546875, 1.652130126953125, 0.4692115783691406, 0.7986106872558594, -0.1339874267578125, 0.5423965454101562, 0.052520751953125, 1.173269271850586, -0.5093135833740234, 0.42120361328125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000046.npy"}
{"epoch": 0.06754772393538913, "step": 47, "batch_size": 64, "mean": 0.3553228974342346, "std": 0.6757086515426636, "min": -0.945587158203125, "p10": -0.4153007507324219, "median": 0.26810646057128906, "p90": 1.3275966644287114, "max": 2.1450042724609375, "pos_frac": 0.75, "sample": [0.1279754638671875, 1.3819732666015625, -0.3337249755859375, 0.0300445556640625, 1.8993682861328125, 0.06274795532226562, 0.409332275390625, 1.1413345336914062, -0.25011444091796875, 0.3876495361328125, 0.1861419677734375, -0.46685791015625, 1.7363052368164062, 0.9637680053710938, 0.5506973266601562, 0.8251571655273438, 0.3773193359375, -0.2575492858886719, 0.3514404296875, -0.4173583984375, 0.03957366943359375, 0.4848480224609375, 0.15017127990722656, 0.2634620666503906, -0.03130340576171875, 0.17311859130859375, 0.129180908203125, 0.0528106689453125, 0.82464599609375, 0.5931015014648438, -0.14525127410888672, 0.12259292602539062, 0.404327392578125, 0.3311614990234375, 0.0926055908203125, -0.13089752197265625, -0.912567138671875, 0.14167022705078125, 0.4978485107421875, 0.917724609375, 1.77825927734375, 1.2007179260253906, 0.7444381713867188, -0.458709716796875, 1.6121826171875, 0.2727508544921875, 0.8167266845703125, 1.0604629516601562, -0.945587158203125, 1.64410400390625, -0.7455062866210938, 0.666534423828125, 0.243438720703125, 0.4024639129638672, -0.66510009765625, -0.3649864196777344, 0.416290283203125, 0.30838775634765625, 0.038330078125, 0.1682415008544922, 2.1450042724609375, -0.41049957275390625, 0.4359722137451172, -0.3277244567871094], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000047.npy"}
{"epoch": 0.06901615271659324, "step": 48, "batch_size": 64, "mean": 0.5949971079826355, "std": 0.8422984480857849, "min": -1.7154693603515625, "p10": -0.352430534362793, "median": 0.5254545211791992, "p90": 1.6693958282470707, "max": 3.22027587890625, "pos_frac": 0.796875, "sample": [0.5628585815429688, 0.20755767822265625, 1.15350341796875, 1.472625732421875, 0.9944076538085938, 2.1961593627929688, -0.6988525390625, -0.6312026977539062, 0.6497650146484375, 0.5660400390625, 0.9400787353515625, 2.7314605712890625, -1.7154693603515625, 1.96038818359375, 0.118865966796875, -0.5226974487304688, 1.40728759765625, -0.0092010498046875, 0.044158935546875, 0.36932373046875, -0.0290069580078125, 0.4853973388671875, -0.16585731506347656, 1.0985565185546875, 0.117401123046875, 1.69781494140625, 1.1943511962890625, 1.479339599609375, 0.1515045166015625, 0.2530231475830078, 0.3866996765136719, 1.0672492980957031, 1.1051483154296875, 0.19103050231933594, 0.3467864990234375, -0.3499298095703125, 0.8547134399414062, 0.81927490234375, 0.66387939453125, 0.5436553955078125, 0.72613525390625, -0.37139320373535156, 0.5519180297851562, 0.00017547607421875, 0.6975555419921875, 0.12315559387207031, 0.6442184448242188, -0.15758514404296875, 0.10097122192382812, 0.6605300903320312, -0.21147918701171875, 1.0897941589355469, 0.07379150390625, 0.5032806396484375, 1.70867919921875, 3.22027587890625, 0.5072536468505859, -0.4201087951660156, -0.3535022735595703, 0.28179931640625, 2.27984619140625, 0.37570953369140625, 0.7376194000244141, 1.6030845642089844], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000048.npy"}
{"epoch": 0.07048458149779736, "step": 49, "batch_size": 64, "mean": 0.8068201541900635, "std": 1.0285474061965942, "min": -0.4747314453125, "p10": -0.2647056579589844, "median": 0.48738861083984375, "p90": 2.615111541748047, "max": 3.1688003540039062, "pos_frac": 0.765625, "sample": [-0.268218994140625, 1.2061920166015625, -0.3486061096191406, 0.5965118408203125, 0.07143020629882812, 1.333587646484375, 0.1486358642578125, -0.2825355529785156, 0.0666351318359375, 0.044036865234375, 0.2308673858642578, -0.044384002685546875, 1.670989990234375, 2.615814208984375, 0.430450439453125, 1.070220947265625, -0.1119537353515625, -0.0552825927734375, 3.1107330322265625, 1.0602340698242188, -0.4317817687988281, -0.29308319091796875, 0.210540771484375, 0.006069183349609375, 2.998992919921875, 1.136077880859375, 0.7029266357421875, -0.4747314453125, -0.150390625, 0.233642578125, 1.9991073608398438, 3.0528564453125, 1.0956878662109375, 1.73382568359375, 0.73126220703125, 0.3152008056640625, 0.16696929931640625, 1.1870536804199219, 0.5443267822265625, 2.6134719848632812, 0.14942550659179688, -0.010572433471679688, 0.731536865234375, -0.12023162841796875, 2.8582000732421875, 1.02685546875, 0.36572265625, 0.6852798461914062, -0.1090240478515625, 1.4027252197265625, 0.4213905334472656, -0.25650787353515625, 3.1688003540039062, 3.1677322387695312, 1.1200942993164062, 0.2520751953125, 1.1273956298828125, 0.8565902709960938, 0.14602279663085938, 0.9546737670898438, 2.5306854248046875, -0.2776069641113281, 0.08159637451171875, 1.44024658203125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000049.npy"}
{"epoch": 0.07195301027900147, "step": 50, "batch_size": 64, "mean": 0.6463215947151184, "std": 0.9229667782783508, "min": -1.0845298767089844, "p10": -0.4101512908935547, "median": 0.4969482421875, "p90": 1.968364715576172, "max": 2.96337890625, "pos_frac": 0.78125, "sample": [0.812408447265625, 1.54852294921875, -0.220611572265625, 0.00440216064453125, 0.7050666809082031, 0.4394073486328125, 0.36910247802734375, -1.0845298767089844, 0.4923248291015625, 0.19248199462890625, -0.40203094482421875, 1.3166656494140625, -0.19048309326171875, 1.49468994140625, 0.10063934326171875, -0.006103515625, 1.9249191284179688, 0.12895584106445312, 0.6781005859375, 2.921356201171875, -0.7535610198974609, 2.1494293212890625, 1.1001129150390625, 0.626678466796875, -0.4136314392089844, 0.4317207336425781, -0.5057449340820312, 0.0802459716796875, 0.06122589111328125, 1.1949615478515625, 0.5286865234375, 1.8020095825195312, 1.9869842529296875, 2.96337890625, -0.3259124755859375, 1.3276557922363281, -0.60552978515625, 2.2601318359375, -0.06195640563964844, 1.537261962890625, -0.8001632690429688, 0.9146194458007812, 0.7091712951660156, 0.5968704223632812, 0.7575531005859375, 0.01092529296875, 0.38024139404296875, 0.16327667236328125, 1.5699920654296875, 0.1612396240234375, 0.845367431640625, 0.195587158203125, 1.4036331176757812, 0.49040985107421875, 0.1682586669921875, 0.606109619140625, 0.5428543090820312, 1.4333038330078125, 0.5015716552734375, -0.6566848754882812, 2.330944061279297, 2.5929336547851562, -0.2063751220703125, 0.04351043701171875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000050.npy"}
{"epoch": 0.07342143906020558, "step": 51, "batch_size": 64, "mean": 0.6294623017311096, "std": 1.0829473733901978, "min": -2.432647705078125, "p10": -0.3223533630371094, "median": 0.49744606018066406, "p90": 1.7670852661132814, "max": 4.7806854248046875, "pos_frac": 0.75, "sample": [0.11746597290039062, -0.4249725341796875, -0.11019515991210938, -0.3320808410644531, 0.7330360412597656, 1.7188568115234375, 1.6607284545898438, -0.7416572570800781, -2.432647705078125, -1.407562255859375, 3.7850341796875, 1.740264892578125, 0.7825565338134766, 0.992034912109375, 0.6927852630615234, 1.9525527954101562, 0.841278076171875, 1.0050106048583984, 1.1872406005859375, 0.6507225036621094, 0.40579986572265625, 1.171539306640625, 0.12118148803710938, 0.28377532958984375, 0.8699951171875, 1.3375930786132812, 0.3276100158691406, 0.653472900390625, 0.412841796875, 0.73602294921875, -0.0572052001953125, 0.4670906066894531, 0.1521148681640625, 0.08127212524414062, -0.2996559143066406, 4.7806854248046875, 0.19199371337890625, 1.6999359130859375, 1.7785797119140625, 0.10768890380859375, 0.29052734375, 0.4280586242675781, 1.3409690856933594, 2.408416748046875, 0.2706298828125, 0.8461418151855469, 0.8495330810546875, -0.072479248046875, -0.5250282287597656, 1.796478271484375, 2.2147216796875, -1.1940765380859375, -0.11650848388671875, -0.14997100830078125, -0.093841552734375, 0.527801513671875, 0.1267404556274414, 0.016815185546875, 0.62677001953125, 1.00640869140625, 1.2278366088867188, -0.1236114501953125, 1.1632080078125, -0.212738037109375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000051.npy"}
{"epoch": 0.07488986784140969, "step": 52, "batch_size": 64, "mean": 0.7756293416023254, "std": 0.866299569606781, "min": -0.722686767578125, "p10": -0.2528919219970702, "median": 0.7428932189941406, "p90": 1.9252555847167974, "max": 3.95916748046875, "pos_frac": 0.84375, "sample": [0.3178272247314453, 1.8205108642578125, 0.06850433349609375, 0.0802154541015625, 0.233062744140625, 1.1630859375, 0.9989242553710938, 0.775115966796875, -0.722686767578125, 2.25048828125, 0.36647796630859375, 1.3759651184082031, 0.037200927734375, -0.2875823974609375, 0.9108734130859375, 0.48291015625, 0.6476211547851562, 0.6061172485351562, 0.910980224609375, 1.3710556030273438, 1.319610595703125, 0.7465362548828125, 0.05340576171875, 0.7392501831054688, -0.09576797485351562, 1.0459060668945312, -0.5589599609375, 0.7657318115234375, 0.39785003662109375, 3.1049957275390625, 3.95916748046875, 0.502349853515625, 0.030426025390625, -0.17194747924804688, 0.979827880859375, 1.2297515869140625, 2.0549392700195312, -0.0135498046875, 0.48888397216796875, 1.9701461791992188, 0.2587127685546875, 1.6592254638671875, 1.2293319702148438, -0.7160186767578125, 0.4242210388183594, 1.1345062255859375, 0.9042510986328125, 1.555267333984375, -0.41519927978515625, 0.6700115203857422, 0.9517478942871094, 2.28179931640625, 0.9061927795410156, 0.6643600463867188, -0.40636444091796875, 2.0410919189453125, -0.3358001708984375, 0.937896728515625, 1.06219482421875, 1.03759765625, 0.44181060791015625, 1.0390090942382812, 0.24421310424804688, 0.11499786376953125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000052.npy"}
{"epoch": 0.0763582966226138, "step": 53, "batch_size": 64, "mean": 1.0988894701004028, "std": 1.4979040622711182, "min": -0.907562255859375, "p10": -0.38580207824707025, "median": 0.8119335174560547, "p90": 3.1445472717285172, "max": 7.868865966796875, "pos_frac": 0.796875, "sample": [-0.021854400634765625, 0.8143768310546875, 1.1188888549804688, 4.4890899658203125, -0.019256591796875, -0.8908233642578125, 1.8561019897460938, -0.5675430297851562, 2.1330642700195312, 1.013946533203125, 3.452789306640625, 7.868865966796875, 0.9028167724609375, 0.8639297485351562, 0.214080810546875, 0.18156814575195312, 0.0354461669921875, 1.2614364624023438, 0.665252685546875, 2.0092849731445312, 0.8910293579101562, 0.37945556640625, 3.3360137939453125, 1.0056304931640625, 1.0160331726074219, 0.43398284912109375, 1.9551544189453125, -0.20436859130859375, 2.0165443420410156, -0.07205390930175781, 3.39794921875, -0.6195030212402344, 1.2495384216308594, 0.2250213623046875, 1.05987548828125, 0.532318115234375, -0.5734519958496094, 0.8861770629882812, 0.35193634033203125, 0.5120086669921875, 0.5106964111328125, -0.3228340148925781, 1.9217872619628906, 2.6977920532226562, -0.41278839111328125, 2.062328338623047, 2.53045654296875, -0.770477294921875, -0.907562255859375, 0.4735565185546875, 0.08458328247070312, 0.17252349853515625, 0.5867118835449219, -0.3206329345703125, 1.1378097534179688, 0.8094902038574219, 0.6762123107910156, 1.9141387939453125, 0.7232627868652344, 3.9989013671875, 1.167449951171875, 2.1956939697265625, 3.907684326171875, 0.331390380859375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000053.npy"}
{"epoch": 0.07782672540381791, "step": 54, "batch_size": 64, "mean": 1.159604787826538, "std": 1.440098762512207, "min": -1.2510757446289062, "p10": -0.09393157958984374, "median": 0.7989530563354492, "p90": 2.8221557617187507, "max": 7.138702392578125, "pos_frac": 0.859375, "sample": [0.26625823974609375, 2.9725265502929688, 0.5856552124023438, 0.1500835418701172, 1.0425567626953125, 0.2253875732421875, 1.1100387573242188, 0.02262115478515625, 2.555419921875, 2.0298843383789062, 0.40035247802734375, 1.551025390625, 1.9952163696289062, 0.7260227203369141, -1.1337432861328125, 1.1301956176757812, 0.5479564666748047, 0.6232070922851562, 2.0189208984375, 2.8988418579101562, -0.26543426513671875, 2.6511077880859375, 2.9000396728515625, -0.72491455078125, 2.1544189453125, -0.024280548095703125, 1.5089874267578125, 1.8002777099609375, 1.2279243469238281, 2.3143959045410156, 0.0554046630859375, 0.1396026611328125, 2.604644775390625, 0.25223541259765625, 0.3132896423339844, 1.8557891845703125, 3.00140380859375, -0.8153839111328125, 0.1620330810546875, 2.8954620361328125, 0.5751914978027344, 1.1924896240234375, 1.7203369140625, -1.2510757446289062, 0.033031463623046875, 1.2943077087402344, -0.0878753662109375, 0.194000244140625, 0.6644058227539062, -0.096527099609375, 2.0611190795898438, 0.6922950744628906, 1.7042083740234375, -0.298553466796875, 0.20749664306640625, 5.773040771484375, 0.8718833923339844, 1.4463882446289062, 0.00382232666015625, 1.614410400390625, 0.16847896575927734, 7.138702392578125, 0.4254341125488281, 2.4422607421875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000054.npy"}
{"epoch": 0.07929515418502203, "step": 55, "batch_size": 64, "mean": 1.2348113059997559, "std": 1.6062026023864746, "min": -1.7999267578125, "p10": -0.32629890441894527, "median": 0.7473831176757812, "p90": 3.1790229797363287, "max": 6.639381408691406, "pos_frac": 0.84375, "sample": [3.3634185791015625, 1.25054931640625, 0.279937744140625, 2.7474365234375, 3.26708984375, -0.4270172119140625, 0.7486534118652344, 1.4445724487304688, 3.8400306701660156, 2.5629959106445312, 0.6847457885742188, 0.4696693420410156, 1.0976333618164062, 0.771820068359375, 6.583717346191406, 2.4059295654296875, 0.8845291137695312, 1.5902862548828125, 2.9358291625976562, -0.9592475891113281, 2.9652786254882812, 0.0316314697265625, 0.5579719543457031, -0.0988922119140625, 3.6941375732421875, 0.3607063293457031, 2.207550048828125, 2.2708282470703125, 0.20159530639648438, 0.10852432250976562, -0.7319107055664062, 1.47503662109375, 1.1490859985351562, 0.8088130950927734, 0.2694549560546875, -0.4372711181640625, 0.32765960693359375, -0.3357391357421875, 2.4747657775878906, 1.6901969909667969, 0.6689300537109375, 0.3014335632324219, 0.3703765869140625, 6.639381408691406, 3.8909988403320312, -0.3042716979980469, 0.21586227416992188, 0.043182373046875, 0.30083465576171875, -1.7999267578125, 0.6039676666259766, 0.750396728515625, 2.3621368408203125, 0.48375701904296875, 2.54541015625, 0.5654067993164062, 0.038532257080078125, 2.9735336303710938, -0.12943267822265625, 1.6761550903320312, -0.9314498901367188, 0.7461128234863281, 0.083160400390625, 2.40142822265625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000055.npy"}
{"epoch": 0.08076358296622614, "step": 56, "batch_size": 64, "mean": 1.0258824825286865, "std": 1.6156896352767944, "min": -3.947540283203125, "p10": -0.30241556167602535, "median": 0.7275753021240234, "p90": 2.9555114746093767, "max": 6.357688903808594, "pos_frac": 0.828125, "sample": [0.36801910400390625, 0.4690399169921875, 0.463958740234375, 2.0365333557128906, -0.7506637573242188, 0.25424957275390625, 4.269256591796875, 0.5319328308105469, -0.3336067199707031, -0.09776687622070312, 0.7628555297851562, 0.4999961853027344, 0.890533447265625, 3.123382568359375, 0.9405555725097656, 0.24718093872070312, 1.0564422607421875, 0.6787796020507812, 6.357688903808594, 1.80059814453125, 0.14526748657226562, 0.728607177734375, 1.6931838989257812, -0.4889068603515625, 1.714202880859375, 1.348541259765625, -0.006237030029296875, 0.20316314697265625, 0.7265434265136719, 0.7227745056152344, 1.5436248779296875, 3.143096923828125, 0.6636962890625, 0.6386699676513672, 0.9935836791992188, 2.563812255859375, 0.12808609008789062, 0.954132080078125, 1.5550460815429688, 5.185447692871094, 1.5054588317871094, 0.9207916259765625, 1.2398834228515625, 6.3358154296875, 2.1849365234375, 1.173004150390625, -0.35784339904785156, -0.665740966796875, 0.5208587646484375, 1.084747314453125, 0.519287109375, -0.22963619232177734, 0.8290557861328125, 2.197021484375, 0.057891845703125, 0.7906875610351562, 0.31137847900390625, -3.947540283203125, -0.14211654663085938, 1.6350326538085938, -1.8210601806640625, 0.11535263061523438, 0.32178497314453125, 3.3521270751953125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000056.npy"}
{"epoch": 0.08223201174743025, "step": 57, "batch_size": 64, "mean": 1.1884701251983643, "std": 1.6261351108551025, "min": -3.454132080078125, "p10": -0.34449558258056634, "median": 0.8007259368896484, "p90": 3.4550430297851578, "max": 5.750457763671875, "pos_frac": 0.78125, "sample": [1.07086181640625, 1.1615333557128906, 0.9032173156738281, 2.8923492431640625, 2.9655838012695312, 0.80340576171875, 2.963409423828125, -0.20676136016845703, 1.7806396484375, 0.8314743041992188, 2.0724334716796875, 0.2694568634033203, 0.6096267700195312, 0.769805908203125, 1.2896499633789062, 5.750457763671875, 0.12943077087402344, 1.7988739013671875, -1.5283355712890625, 5.24053955078125, 2.2712783813476562, 0.6083908081054688, 0.778533935546875, 0.5011405944824219, -0.2921886444091797, 0.657684326171875, 2.387725830078125, -0.3962669372558594, 0.5649948120117188, 3.9491958618164062, 1.9800643920898438, 0.7000846862792969, 0.412322998046875, -0.11653900146484375, -3.454132080078125, 4.7772369384765625, 0.1217193603515625, 0.35997772216796875, 0.7980461120605469, 1.8868560791015625, 3.605010986328125, 0.3739204406738281, 1.7906341552734375, 1.1920623779296875, 2.01702880859375, 0.6630935668945312, 0.8108024597167969, -0.5559349060058594, 1.331787109375, -0.1666107177734375, -0.6439609527587891, 4.2797088623046875, -0.28308868408203125, -0.06432342529296875, 3.1051177978515625, 0.8690948486328125, -0.061370849609375, 2.034618377685547, -0.366912841796875, -1.211639404296875, 0.31805419921875, 2.572418212890625, 0.7798538208007812, 3.6089401245117188], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000057.npy"}
{"epoch": 0.08370044052863436, "step": 58, "batch_size": 64, "mean": 1.547766923904419, "std": 1.949338436126709, "min": -2.35150146484375, "p10": -0.009648895263671856, "median": 0.9171905517578125, "p90": 4.4931825637817395, "max": 8.310592651367188, "pos_frac": 0.890625, "sample": [1.2651214599609375, 0.3847007751464844, -0.18260765075683594, 1.8759078979492188, 1.2271347045898438, -0.16878128051757812, 0.46517181396484375, 1.3864364624023438, 2.4051589965820312, -0.13254547119140625, 0.348663330078125, 6.5778045654296875, 0.37244415283203125, 2.2192115783691406, 0.5292930603027344, 1.1956634521484375, 1.2908287048339844, -0.01766204833984375, 1.3083572387695312, 2.964733123779297, 0.796173095703125, 0.9632148742675781, 5.2725677490234375, 0.32535552978515625, 3.5660858154296875, 1.067413330078125, 2.5518646240234375, 4.139448165893555, 3.9323959350585938, 0.5125961303710938, 8.310592651367188, 0.8464698791503906, 0.4163475036621094, -0.17181396484375, 0.5677299499511719, 0.12196159362792969, 0.18091583251953125, 0.8711662292480469, 3.9617843627929688, -0.31756591796875, 0.10959243774414062, 0.36225318908691406, 6.1148223876953125, 2.4456634521484375, 2.4945602416992188, 4.644783020019531, 1.0309295654296875, 1.322265625, 1.858856201171875, 5.70635986328125, 0.042613983154296875, 0.248260498046875, 0.0992431640625, 2.2789459228515625, 1.0226516723632812, 0.18615341186523438, 4.66656494140625, -2.35150146484375, 0.0090484619140625, 0.158660888671875, 0.08991241455078125, 0.15509033203125, 0.8442611694335938, 2.287353515625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000058.npy"}
{"epoch": 0.08516886930983847, "step": 59, "batch_size": 64, "mean": 1.1774799823760986, "std": 1.4349702596664429, "min": -2.338348388671875, "p10": -0.20257968902587883, "median": 1.0889396667480469, "p90": 2.5937873840332033, "max": 5.822074890136719, "pos_frac": 0.8125, "sample": [1.2929534912109375, 0.03857421875, 2.0601806640625, 1.315765380859375, 1.2315483093261719, 2.3004379272460938, 0.3654022216796875, 4.903717041015625, 0.18778228759765625, 2.761199951171875, -0.10098648071289062, 1.9951171875, 2.0174713134765625, -0.2653045654296875, 1.68707275390625, 0.6095123291015625, -0.2312946319580078, 1.7166824340820312, 0.7971038818359375, 1.1633377075195312, -0.726043701171875, -2.338348388671875, 0.00041961669921875, 0.6594924926757812, 1.4349899291992188, 1.9463958740234375, 0.8205833435058594, -0.039829254150390625, 1.4089584350585938, -0.0541229248046875, 1.2940216064453125, -0.34220123291015625, 1.0388259887695312, 0.18297576904296875, 0.45450592041015625, 0.8813705444335938, 1.5198783874511719, 1.1021194458007812, 1.7101898193359375, 1.117218017578125, 0.99310302734375, 3.5538787841796875, 0.09746742248535156, -1.1826095581054688, -0.11119461059570312, 1.0757598876953125, 1.569793701171875, 1.3184280395507812, 2.4067306518554688, 2.4990081787109375, 1.7525558471679688, 0.6829376220703125, 1.9960784912109375, 2.6344070434570312, 0.8240966796875, -0.6462020874023438, 1.5882568359375, 0.06723785400390625, -0.13557815551757812, 0.42897796630859375, 5.822074890136719, 4.1788482666015625, 0.6562423706054688, 5.3707427978515625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000059.npy"}
{"epoch": 0.08663729809104258, "step": 60, "batch_size": 64, "mean": 1.2050297260284424, "std": 1.4179848432540894, "min": -1.29290771484375, "p10": -0.11770439147949215, "median": 0.8144645690917969, "p90": 3.256043243408203, "max": 4.9782562255859375, "pos_frac": 0.828125, "sample": [0.8370628356933594, 3.257080078125, 0.10418701171875, -0.033885955810546875, 0.7531280517578125, 0.8528938293457031, 2.4799270629882812, 4.098854064941406, 1.11090087890625, 2.1919403076171875, 0.7599830627441406, 1.084930419921875, -0.07219123840332031, 0.012451171875, 2.8692169189453125, 1.514495849609375, 2.12518310546875, 2.204631805419922, 0.3081207275390625, 1.4482994079589844, 4.7744903564453125, 0.6114349365234375, 4.650238037109375, 3.6736984252929688, 0.770233154296875, 1.3113327026367188, 0.1547393798828125, 0.11734771728515625, 0.8021163940429688, -0.16415977478027344, 0.25384521484375, 4.9782562255859375, -0.21482086181640625, 2.5673065185546875, 0.5643310546875, -0.07258987426757812, 0.06744384765625, 2.225555419921875, 0.245208740234375, 0.0059661865234375, 1.0619583129882812, 2.080760955810547, 1.479888916015625, 2.8584136962890625, 3.9229583740234375, 0.07245063781738281, 0.826812744140625, 3.2536239624023438, 0.931365966796875, 0.4498443603515625, 2.0443382263183594, -0.669586181640625, 0.6836395263671875, -0.5489959716796875, 1.3877983093261719, -0.1370391845703125, -1.29290771484375, 0.721160888671875, 1.3843536376953125, 1.5271949768066406, 0.38684844970703125, -0.6938018798828125, 0.22505569458007812, -0.06341934204101562], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000060.npy"}
{"epoch": 0.0881057268722467, "step": 61, "batch_size": 64, "mean": 1.0777602195739746, "std": 1.8553279638290405, "min": -2.3487548828125, "p10": -0.3008569717407226, "median": 0.7490501403808594, "p90": 3.222039031982422, "max": 10.8310546875, "pos_frac": 0.75, "sample": [-0.2076568603515625, 0.4536628723144531, 0.8029708862304688, 0.5678558349609375, 1.8358688354492188, 0.5195751190185547, 0.6511077880859375, 3.559215545654297, -0.07457733154296875, 2.2670211791992188, 0.9481201171875, 5.1702728271484375, 3.141754150390625, -0.040958404541015625, 0.9330062866210938, 0.6555366516113281, -1.1629638671875, 0.9882545471191406, 0.2225341796875, -0.094573974609375, 3.2564468383789062, -2.06439208984375, 2.9733123779296875, 0.63372802734375, 1.5725479125976562, 0.4757118225097656, 1.54449462890625, 0.9787673950195312, 0.3419914245605469, 2.2479934692382812, 1.2108955383300781, 0.8904571533203125, 1.3663177490234375, -0.15643310546875, 10.8310546875, 2.4065322875976562, 1.0010986328125, -0.04268646240234375, -2.3487548828125, -0.3115234375, 0.9618148803710938, -0.06452369689941406, 3.5206451416015625, 0.5564002990722656, -0.2759685516357422, 4.046112060546875, 0.5648651123046875, 0.0608367919921875, -0.681640625, 1.1752777099609375, 3.5021820068359375, -0.2137298583984375, -1.286285400390625, 0.5531387329101562, 0.4736289978027344, 0.69512939453125, 0.5579833984375, 0.9753952026367188, 2.3408660888671875, 1.3682975769042969, 1.4748687744140625, 1.0297622680664062, -1.1615066528320312, 0.8595161437988281], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000061.npy"}
{"epoch": 0.08957415565345081, "step": 62, "batch_size": 64, "mean": 1.1855626106262207, "std": 1.9207544326782227, "min": -6.734771728515625, "p10": -0.25049362182617174, "median": 0.9223957061767578, "p90": 3.2471822738647464, "max": 8.036445617675781, "pos_frac": 0.859375, "sample": [0.6812629699707031, 0.4365730285644531, -1.2949485778808594, 4.14398193359375, 1.30389404296875, 0.18878555297851562, 1.3425102233886719, 0.052417755126953125, 2.0600547790527344, -0.5964508056640625, 4.8035888671875, 0.048004150390625, 0.9261817932128906, -0.30670928955078125, 0.239593505859375, 0.2867279052734375, 1.67431640625, 2.420684814453125, 0.06743049621582031, 1.27001953125, 0.7000846862792969, 3.46533203125, 1.9065399169921875, -0.336517333984375, 0.2513236999511719, 4.78497314453125, 0.2603130340576172, 3.0580596923828125, 0.7676219940185547, 6.259727478027344, 0.9459190368652344, 3.2857208251953125, 2.5405807495117188, 0.918609619140625, 0.772216796875, 1.5723152160644531, 1.0323562622070312, 0.3511962890625, 1.097198486328125, 1.371307373046875, -0.855072021484375, -0.106109619140625, 0.29486083984375, 1.3322296142578125, -0.11932373046875, 0.3812580108642578, 2.090930938720703, 1.2853050231933594, 2.4018707275390625, 1.8659820556640625, 1.7454109191894531, 0.2284984588623047, 0.0061798095703125, 1.8643817901611328, 3.157258987426758, 0.6772994995117188, 0.3742523193359375, 0.7057418823242188, 8.036445617675781, -0.6019439697265625, 1.1113052368164062, 0.8069076538085938, 1.1743087768554688, -6.734771728515625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000062.npy"}
{"epoch": 0.09104258443465492, "step": 63, "batch_size": 64, "mean": 1.5173449516296387, "std": 1.5604740381240845, "min": -2.2088165283203125, "p10": -0.22064208984374992, "median": 1.3735618591308594, "p90": 3.5050079345703136, "max": 6.299644470214844, "pos_frac": 0.828125, "sample": [0.607147216796875, 3.8819656372070312, 2.6787261962890625, 3.984832763671875, 0.3206787109375, -0.0117340087890625, -0.005096435546875, 2.035308837890625, 2.7855148315429688, 1.3637847900390625, 1.0253982543945312, 1.9898300170898438, 0.7590255737304688, 0.5285911560058594, -0.254241943359375, -0.142242431640625, 3.6373291015625, 0.8107433319091797, 0.37970733642578125, 2.1884918212890625, 6.299644470214844, 1.7398910522460938, 1.2764434814453125, -0.0876007080078125, 0.02939605712890625, 0.7310333251953125, 1.5792007446289062, 0.000213623046875, 1.9593734741210938, 0.9244422912597656, 2.21197509765625, 2.100250244140625, 2.8022613525390625, 0.8828926086425781, 1.4659576416015625, 2.390380859375, 2.582763671875, 3.196258544921875, 1.3833389282226562, 0.91912841796875, -0.36474609375, 2.0541839599609375, -0.36385345458984375, -0.6250228881835938, -2.2088165283203125, 3.168376922607422, 0.317352294921875, 1.1026763916015625, 2.48065185546875, 0.19293212890625, 0.90582275390625, 4.122840881347656, 4.131172180175781, -0.30517578125, 2.54058837890625, 0.059844970703125, 1.4089202880859375, 5.414947509765625, 0.811798095703125, -0.7874565124511719, 3.025177001953125, 2.8627853393554688, 2.81378173828125, 1.4002914428710938], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000063.npy"}
{"epoch": 0.09251101321585903, "step": 64, "batch_size": 64, "mean": 1.638750672340393, "std": 2.126108169555664, "min": -1.752777099609375, "p10": -0.3868671417236328, "median": 1.1556262969970703, "p90": 4.335562133789063, "max": 8.111457824707031, "pos_frac": 0.765625, "sample": [-0.82470703125, 0.37790679931640625, 4.4899444580078125, 0.9151763916015625, 0.878631591796875, 1.4433631896972656, 0.4924278259277344, -0.5223159790039062, 1.3069496154785156, -0.21440887451171875, -0.06741714477539062, 1.2255706787109375, 0.2658348083496094, 1.412994384765625, -0.3619537353515625, 0.49639892578125, 1.4375, 0.38614463806152344, 6.006744384765625, 0.32303619384765625, 2.9460906982421875, -1.7215499877929688, 4.373016357421875, 0.6188278198242188, -1.752777099609375, 1.3307342529296875, -0.109619140625, 0.5205421447753906, -0.3909149169921875, 2.4563941955566406, 0.13249969482421875, -0.3774223327636719, 7.83056640625, 1.390625, -0.07601165771484375, 2.9004440307617188, -0.8135223388671875, 2.3866729736328125, 4.0261383056640625, 0.7678070068359375, 0.6616668701171875, 2.2733840942382812, 4.557525634765625, 4.2481689453125, -0.8408203125, 1.7041549682617188, 2.7721099853515625, 2.63360595703125, 7.261383056640625, 1.8599395751953125, 3.5419692993164062, 0.5108642578125, 2.9090957641601562, 1.0237503051757812, 8.111457824707031, -0.06502151489257812, -0.20919036865234375, 1.1215133666992188, 3.5742340087890625, 2.8874359130859375, 2.5036239624023438, 0.6220779418945312, 4.121013641357422, 1.1897392272949219], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000064.npy"}
{"epoch": 0.09397944199706314, "step": 65, "batch_size": 64, "mean": 1.7660255432128906, "std": 2.10125994682312, "min": -3.0152587890625, "p10": -0.1767623901367187, "median": 1.4183425903320312, "p90": 4.770558166503906, "max": 10.606178283691406, "pos_frac": 0.859375, "sample": [4.7745819091796875, 5.33551025390625, 1.50921630859375, 1.258758544921875, 1.5023269653320312, -0.14371490478515625, -0.19092559814453125, 1.4310150146484375, 1.2573356628417969, 0.7212142944335938, 10.606178283691406, -0.7751083374023438, 0.2056884765625, 0.799835205078125, 4.896209716796875, 0.781646728515625, 2.5977516174316406, 0.6098909378051758, 1.9764251708984375, -3.0152587890625, 1.5859489440917969, 4.380859375, 1.405670166015625, 0.4742279052734375, 2.9384918212890625, 5.9456024169921875, 3.661102294921875, 2.8932571411132812, 5.610282897949219, 0.41057586669921875, 1.4977798461914062, 0.8075942993164062, -0.8831863403320312, 1.0828933715820312, 2.494293212890625, 1.6247444152832031, 3.0771102905273438, 0.48056793212890625, 5.613525390625, 0.3251686096191406, -0.0045928955078125, -1.545257568359375, 1.2755889892578125, 0.4519309997558594, 4.76116943359375, 0.8743972778320312, 2.3348388671875, 0.48651123046875, 2.497039794921875, 3.5605545043945312, 1.548675537109375, -0.7255401611328125, 2.855316162109375, 1.5052833557128906, 1.6070556640625, 1.216461181640625, 1.0512008666992188, -0.412017822265625, 1.8449554443359375, 2.401123046875, 0.2688751220703125, 2.1155242919921875, 0.6500015258789062, 0.8414497375488281], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000065.npy"}
{"epoch": 0.09544787077826726, "step": 66, "batch_size": 64, "mean": 1.1536920070648193, "std": 1.9623749256134033, "min": -3.740795135498047, "p10": -0.5469528198242187, "median": 0.5890722274780273, "p90": 3.9924690246582055, "max": 6.956840515136719, "pos_frac": 0.765625, "sample": [2.374866485595703, 5.4626007080078125, 4.243408203125, 0.1305389404296875, 1.4978866577148438, -0.1195526123046875, 0.003265380859375, 0.6659774780273438, 0.21397018432617188, 2.0126266479492188, 2.2030181884765625, 0.3306922912597656, 1.034576416015625, 0.22127532958984375, -1.99224853515625, -3.740795135498047, 1.2753944396972656, 0.36808109283447266, 0.5278244018554688, 6.956840515136719, 0.341400146484375, 1.9978141784667969, 0.15381431579589844, 0.854827880859375, -0.57305908203125, 0.4317474365234375, -0.11219024658203125, 2.0156898498535156, 1.0181427001953125, 0.817840576171875, 0.7046279907226562, 5.974205017089844, 2.822895050048828, 1.9674301147460938, 1.9205093383789062, 0.7469940185546875, 1.4415626525878906, 0.6092586517333984, 0.25380706787109375, 3.4069442749023438, -0.1055145263671875, -0.4860382080078125, 0.43164825439453125, -0.2140960693359375, 1.1119117736816406, -1.9235763549804688, 0.5688858032226562, 0.3914604187011719, -0.829315185546875, 0.02561187744140625, 2.8043670654296875, 2.3817291259765625, 5.564979553222656, -0.29135894775390625, -0.18664932250976562, 5.052459716796875, -1.0045242309570312, 0.16476058959960938, -0.00318145751953125, 0.48254966735839844, 5.5174560546875, -0.648712158203125, 2.357086181640625, 2.2098388671875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000066.npy"}
{"epoch": 0.09691629955947137, "step": 67, "batch_size": 64, "mean": 1.0333654880523682, "std": 2.767472743988037, "min": -5.202568054199219, "p10": -1.83896484375, "median": 0.7019920349121094, "p90": 4.875941467285159, "max": 10.443382263183594, "pos_frac": 0.640625, "sample": [5.2987518310546875, 1.1447906494140625, 7.7298126220703125, 1.09521484375, 0.3988933563232422, -0.162384033203125, 6.864044189453125, 0.6169109344482422, 0.6800689697265625, -0.06390571594238281, 0.7855720520019531, 2.2475128173828125, -1.4214859008789062, 0.22198486328125, 0.18120574951171875, -0.8148727416992188, 1.3113250732421875, 2.3109664916992188, 0.16958236694335938, -2.891876220703125, 1.783447265625, 1.433685302734375, 1.736907958984375, -0.9409942626953125, -0.09851837158203125, -1.239959716796875, 2.760833740234375, -0.053775787353515625, 1.2906341552734375, -2.1291275024414062, -1.442352294921875, 2.072620391845703, 1.0169563293457031, 0.5698318481445312, -0.525909423828125, 1.0296363830566406, -0.000736236572265625, -0.0240631103515625, -1.907867431640625, -0.6670989990234375, -2.76812744140625, 3.6690750122070312, -0.12476348876953125, 1.7840995788574219, 2.394134521484375, 2.69012451171875, 0.3925933837890625, 4.216102600097656, 1.0069599151611328, 5.158729553222656, -1.678192138671875, 0.8897342681884766, 2.84283447265625, 1.216796875, 1.3935699462890625, -2.7501678466796875, 7.01470947265625, 0.7239151000976562, -5.202568054199219, 7.059661865234375, 10.443382263183594, 0.34147071838378906, -4.038818359375, -0.9061260223388672], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000067.npy"}
{"epoch": 0.09838472834067548, "step": 68, "batch_size": 64, "mean": 2.0699267387390137, "std": 2.780046224594116, "min": -1.8524169921875, "p10": -0.24716625213623045, "median": 1.3689794540405273, "p90": 6.553750610351571, "max": 13.2196044921875, "pos_frac": 0.84375, "sample": [1.4728240966796875, 2.1918182373046875, 2.9307861328125, 1.9983444213867188, 0.3589668273925781, 4.0789642333984375, 4.337005615234375, 2.541484832763672, -1.165313720703125, -0.09549808502197266, 1.69232177734375, 13.2196044921875, 3.1883010864257812, 3.2638626098632812, 1.7794876098632812, 0.84515380859375, 1.235626220703125, 3.161792755126953, 0.7391815185546875, 8.092193603515625, 0.4820709228515625, 1.5144901275634766, 0.7719917297363281, 0.550506591796875, -0.22652626037597656, 2.6713333129882812, 7.7667236328125, 8.414833068847656, 2.6094932556152344, -1.4703903198242188, 2.7418060302734375, 1.875732421875, 0.200347900390625, 1.9677276611328125, 1.4070587158203125, 0.9330787658691406, 0.135955810546875, 0.4032421112060547, 0.509246826171875, 3.0024948120117188, -1.8524169921875, 0.01861572265625, -0.051387786865234375, 0.39013671875, 1.2735137939453125, 1.076446533203125, 1.3309001922607422, 7.5037841796875, -1.1010360717773438, 0.17090511322021484, 1.6408157348632812, 8.983688354492188, 7.761405944824219, 2.6931610107421875, -0.48056793212890625, 3.6769485473632812, 1.1521415710449219, 1.0779914855957031, 3.2480010986328125, 0.01636505126953125, -0.256011962890625, -0.7598991394042969, 1.3181915283203125, 1.5154953002929688], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000068.npy"}
{"epoch": 0.09985315712187959, "step": 69, "batch_size": 64, "mean": 2.242255210876465, "std": 3.0561177730560303, "min": -6.134002685546875, "p10": -0.4556274414062499, "median": 1.7835578918457031, "p90": 5.467201232910156, "max": 15.057907104492188, "pos_frac": 0.828125, "sample": [0.92889404296875, -0.6837272644042969, 7.56878662109375, 0.17507171630859375, 3.0944747924804688, 5.4853057861328125, -6.134002685546875, 1.02850341796875, 0.042369842529296875, 0.775299072265625, -0.1251220703125, 0.6158065795898438, 4.0099334716796875, -0.2745094299316406, 1.78729248046875, 3.4090499877929688, 4.1861419677734375, 2.6318740844726562, 5.424957275390625, 4.7845306396484375, 1.2867355346679688, 0.381317138671875, 1.7798233032226562, 2.321552276611328, 3.417987823486328, 3.197662353515625, 0.9478607177734375, 12.027565002441406, 2.316547393798828, 1.8081798553466797, -0.50048828125, -0.6740798950195312, 1.11505126953125, 8.116439819335938, 0.5690994262695312, 4.2123260498046875, 2.9783248901367188, 3.484771728515625, 2.2591629028320312, 0.9144630432128906, -0.170196533203125, 4.0699920654296875, 0.4849090576171875, -0.9912185668945312, 0.32448768615722656, 1.7888145446777344, 3.4650497436523438, 6.0113372802734375, 1.248626708984375, 0.3781852722167969, 2.115144729614258, 2.373668670654297, -0.3509521484375, 0.9442596435546875, 2.352752685546875, 1.1276702880859375, 5.863616943359375, 15.057907104492188, 1.4035491943359375, 3.8103179931640625, -0.7006702423095703, 0.9415645599365234, -1.0706214904785156, 2.33489990234375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000069.npy"}
{"epoch": 0.1013215859030837, "step": 70, "batch_size": 64, "mean": 2.6392879486083984, "std": 3.53118634223938, "min": -3.38763427734375, "p10": -0.8856090545654295, "median": 1.7517499923706055, "p90": 6.626463317871096, "max": 16.55279541015625, "pos_frac": 0.796875, "sample": [-0.229736328125, 9.996444702148438, 2.108367919921875, 0.7669677734375, -1.3147964477539062, 0.3981895446777344, -3.38763427734375, 6.1476593017578125, -1.0357666015625, 0.4231719970703125, 2.898956298828125, -0.5127716064453125, -1.1814231872558594, 2.4092559814453125, 16.55279541015625, 1.7180919647216797, 6.8316650390625, 3.1354217529296875, 4.131908416748047, 5.480621337890625, 3.7501068115234375, 4.841197967529297, 2.1206283569335938, 5.7024383544921875, -1.005340576171875, 3.585042953491211, -0.7247161865234375, 1.1771965026855469, 1.7134170532226562, 0.7886199951171875, -0.0780181884765625, 9.804145812988281, 1.7854080200195312, 2.3747406005859375, 4.0094146728515625, 1.31158447265625, 1.566925048828125, 1.9769287109375, 1.3787422180175781, 0.16588783264160156, -0.20883560180664062, -0.9545631408691406, 0.6665802001953125, 0.941650390625, 1.2456817626953125, -1.2359771728515625, 8.518234252929688, 4.949958801269531, 2.956024169921875, 0.40840911865234375, 2.6214675903320312, 4.132083892822266, 0.3522300720214844, 3.2257919311523438, 0.7418441772460938, 3.1931076049804688, 10.735702514648438, 4.661224365234375, 12.110809326171875, 1.155557632446289, -0.5510406494140625, 1.0236892700195312, 3.0504989624023438, 3.592559814453125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000070.npy"}
{"epoch": 0.1027900146842878, "step": 71, "batch_size": 64, "mean": 3.616886615753174, "std": 4.024207592010498, "min": -2.2289505004882812, "p10": -0.11109771728515623, "median": 2.9598751068115234, "p90": 8.344434356689453, "max": 18.774627685546875, "pos_frac": 0.875, "sample": [0.93194580078125, 13.292236328125, -0.6757011413574219, -0.3498115539550781, 7.584007263183594, -0.09825897216796875, 2.1201858520507812, 3.4641952514648438, 1.9638290405273438, 3.4304332733154297, 3.544830322265625, 2.0427627563476562, 2.9304656982421875, 0.17554664611816406, 2.4801483154296875, 3.4532318115234375, 4.753204345703125, 6.24603271484375, 1.7155532836914062, 1.2605056762695312, 8.231689453125, 8.392753601074219, 3.2305984497070312, 3.72991943359375, 1.0477027893066406, 2.9892845153808594, 2.0155487060546875, 0.6610298156738281, 3.2036476135253906, 0.7495269775390625, -2.2289505004882812, 4.03521728515625, 4.8179931640625, -0.11660003662109375, 3.9763259887695312, 2.6171875, 1.5065841674804688, 6.7411041259765625, 3.006256103515625, 9.656684875488281, -0.12909698486328125, 5.1759490966796875, 5.4828033447265625, 0.925384521484375, 5.3773651123046875, 3.7279624938964844, 0.3269004821777344, 17.793304443359375, 18.774627685546875, 6.0284881591796875, 0.5273971557617188, 5.7877655029296875, 2.0271167755126953, 0.08565139770507812, 3.3282470703125, 1.277587890625, -0.6290435791015625, 1.5475502014160156, 3.4738693237304688, 10.682441711425781, 9.982666015625, 0.48172760009765625, -1.1860923767089844, 2.081329345703125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000071.npy"}
{"epoch": 0.10425844346549193, "step": 72, "batch_size": 64, "mean": 2.966139316558838, "std": 3.3009915351867676, "min": -3.0714569091796875, "p10": -0.6855300903320312, "median": 2.0163440704345703, "p90": 7.992829132080079, "max": 13.825042724609375, "pos_frac": 0.84375, "sample": [3.1387596130371094, 4.828834533691406, -2.117706298828125, 0.648834228515625, 3.3665390014648438, -0.9075508117675781, 2.0267181396484375, 7.078502655029297, -2.1396484375, -1.90576171875, 0.9918327331542969, 0.26273155212402344, 2.11468505859375, 7.775238037109375, 5.721050262451172, 1.3074722290039062, 2.865478515625, 5.559123992919922, 0.92645263671875, 3.729248046875, 4.927093505859375, 2.005970001220703, 1.0601959228515625, 3.124011993408203, 1.8576507568359375, 2.3960037231445312, 1.3091373443603516, 2.8284454345703125, 1.734130859375, 13.825042724609375, 0.954071044921875, 1.7337799072265625, 1.689453125, 4.732421875, 1.052398681640625, 10.889129638671875, 8.97955322265625, 1.666341781616211, 1.823516845703125, 8.792732238769531, 1.945465087890625, 8.117523193359375, 0.6461639404296875, -0.2639579772949219, -0.9196376800537109, 5.986297607421875, 8.739486694335938, 2.4643821716308594, -0.24477005004882812, 8.086082458496094, 2.988739013671875, 2.8900833129882812, 1.1744804382324219, -0.7060470581054688, 1.4642410278320312, 6.6012115478515625, 3.961353302001953, 4.954925537109375, 6.70111083984375, 1.6739654541015625, 2.6619033813476562, 1.96710205078125, -3.0714569091796875, -0.6376571655273438], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000072.npy"}
{"epoch": 0.10572687224669604, "step": 73, "batch_size": 64, "mean": 3.1142072677612305, "std": 4.924386501312256, "min": -4.0630950927734375, "p10": -1.6725143432617187, "median": 1.7502098083496094, "p90": 10.431603240966798, "max": 19.058425903320312, "pos_frac": 0.75, "sample": [1.8859405517578125, -0.1944732666015625, 0.8455848693847656, 0.813232421875, 1.0839347839355469, 5.208625793457031, -4.0630950927734375, -0.1821136474609375, 4.11199951171875, 1.1043243408203125, 19.058425903320312, 8.203788757324219, 3.6774749755859375, 0.7070884704589844, -1.6143341064453125, -1.1329421997070312, 17.031784057617188, 3.4202880859375, 1.1903724670410156, 0.48764801025390625, 6.247432708740234, 3.7539291381835938, 2.6737632751464844, 0.6531143188476562, -0.3671092987060547, 2.4655189514160156, 3.048503875732422, 4.294219970703125, 5.09844970703125, 1.1929473876953125, 0.9575672149658203, 0.13526344299316406, 11.3818359375, 3.4943084716796875, -2.47265625, 4.910003662109375, 1.5774383544921875, -0.2162189483642578, -0.4331836700439453, 0.0711669921875, -3.5388031005859375, 2.1985626220703125, -0.03570556640625, -0.1235198974609375, 10.343467712402344, 3.2414703369140625, 1.7207756042480469, 0.281585693359375, 7.339626312255859, 10.469375610351562, 2.348052978515625, 4.633075714111328, 13.66778564453125, 0.058437347412109375, 1.8057861328125, -1.69744873046875, 12.41448974609375, 7.8820343017578125, -1.861602783203125, 15.984970092773438, 1.7796440124511719, 6.144371032714844, -2.419769287109375, -3.4372406005859375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000073.npy"}
{"epoch": 0.10719530102790015, "step": 74, "batch_size": 64, "mean": 3.480034589767456, "std": 4.490489959716797, "min": -2.737926483154297, "p10": -0.9722274780273437, "median": 2.2939529418945312, "p90": 11.09144744873047, "max": 16.520599365234375, "pos_frac": 0.765625, "sample": [4.302093505859375, 2.224365234375, 2.765117645263672, 0.268096923828125, 9.680068969726562, 2.0081634521484375, 2.3635406494140625, 12.198104858398438, -0.6492118835449219, 0.070343017578125, 11.224739074707031, -0.8567619323730469, 12.245292663574219, 4.325775146484375, 0.5636215209960938, 13.22540283203125, 3.12457275390625, 2.4662094116210938, 1.3063125610351562, 0.04058074951171875, 1.822885513305664, 16.520599365234375, -2.0998382568359375, 4.978755950927734, 1.2927093505859375, 4.237373352050781, -0.17464828491210938, 7.0853424072265625, -0.5938568115234375, 3.1666641235351562, 3.2470016479492188, -0.4576835632324219, 0.5736274719238281, 11.825386047363281, -1.8568344116210938, 5.533184051513672, -2.04376220703125, 0.9384880065917969, 3.808269500732422, 0.42889404296875, -1.0613441467285156, 1.5456771850585938, 5.290412902832031, 10.074859619140625, 7.00347900390625, 5.90753173828125, -0.86309814453125, -0.6274871826171875, 7.98175048828125, 0.4809417724609375, 4.150421142578125, 11.913330078125, -1.9038314819335938, 7.080322265625, 6.43585205078125, 1.1463851928710938, 4.1622467041015625, 4.4067230224609375, -1.0189971923828125, 10.780433654785156, -0.7941131591796875, 0.4157257080078125, -2.737926483154297, 1.8239364624023438], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000074.npy"}
{"epoch": 0.10866372980910426, "step": 75, "batch_size": 64, "mean": 4.428960800170898, "std": 5.337127208709717, "min": -3.66973876953125, "p10": -0.32873153686523426, "median": 3.1910762786865234, "p90": 12.741963195800784, "max": 22.896240234375, "pos_frac": 0.859375, "sample": [0.06150245666503906, 13.034881591796875, 6.92413330078125, 1.7388229370117188, -3.66973876953125, 4.5952301025390625, 3.746265411376953, 0.32601165771484375, 1.9725112915039062, 13.755157470703125, 4.5076141357421875, 1.1248912811279297, 5.607658386230469, 2.1044387817382812, 7.576713562011719, 3.3645782470703125, 3.79522705078125, 5.443778991699219, 0.10039710998535156, 0.9860057830810547, 9.901763916015625, 2.7875213623046875, 1.8886985778808594, -0.215057373046875, 7.6552734375, 2.67279052734375, -1.254302978515625, 0.054790496826171875, 3.206371307373047, 9.793437957763672, 7.160247802734375, 0.6721992492675781, 5.59271240234375, 17.036544799804688, 0.7498664855957031, 3.6336746215820312, 17.921875, -3.61553955078125, 1.949951171875, 1.2448348999023438, 0.21200180053710938, 6.905010223388672, 3.739490509033203, 5.102333068847656, -0.5443649291992188, 4.8155059814453125, 0.7546234130859375, 2.449169158935547, 3.17578125, 10.352920532226562, 0.04136466979980469, 4.4639434814453125, -0.5673103332519531, 4.295627593994141, -0.4917793273925781, -0.041222572326660156, 4.41424560546875, 12.058486938476562, -0.37744903564453125, 0.8632125854492188, 22.896240234375, 13.838134765625, 16.503494262695312, 2.6603164672851562], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000075.npy"}
{"epoch": 0.11013215859030837, "step": 76, "batch_size": 64, "mean": 3.066539764404297, "std": 4.768495559692383, "min": -7.189521789550781, "p10": -1.5527679443359375, "median": 2.30361270904541, "p90": 7.108723449707031, "max": 20.647552490234375, "pos_frac": 0.828125, "sample": [-3.8208694458007812, 2.353466033935547, 0.5190162658691406, 19.887283325195312, 2.1562423706054688, 1.0737762451171875, 5.8288421630859375, 7.135589599609375, 3.2031326293945312, 2.868938446044922, 4.397975921630859, -0.673248291015625, -4.083133697509766, 3.7123641967773438, 2.538055419921875, 1.1976165771484375, 4.25823974609375, 20.647552490234375, 2.3631134033203125, 2.016265869140625, 5.892066955566406, 3.4192256927490234, 0.06676483154296875, 1.0548248291015625, -0.42694091796875, 5.746578216552734, 0.575779914855957, 4.157646179199219, -2.66546630859375, -7.189521789550781, 5.733554840087891, 4.683727264404297, 2.0850906372070312, 1.2142066955566406, -4.762298583984375, 0.5724983215332031, 5.494804382324219, 2.8407516479492188, -0.8384838104248047, 0.7586288452148438, 2.2537593841552734, 4.701637268066406, -2.238140106201172, 0.99102783203125, 13.287567138671875, 7.0460357666015625, 9.467193603515625, 3.5814056396484375, 6.525005340576172, 1.7823104858398438, -1.5560226440429688, -1.5451736450195312, 0.5263595581054688, 0.3861503601074219, 11.276145935058594, 1.1064224243164062, 0.048809051513671875, 9.51220703125, 4.382987976074219, 4.91729736328125, 2.1696701049804688, 5.835289001464844, 4.996406555175781, 0.8105316162109375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000076.npy"}
{"epoch": 0.11160058737151249, "step": 77, "batch_size": 64, "mean": 3.324207305908203, "std": 4.964902400970459, "min": -7.6295166015625, "p10": -1.1995040893554687, "median": 2.289379119873047, "p90": 9.158369445800783, "max": 18.634201049804688, "pos_frac": 0.765625, "sample": [2.7931671142578125, 13.74920654296875, 1.077972412109375, 5.386573791503906, 2.91314697265625, -3.4874954223632812, 14.154556274414062, 5.265007019042969, 1.374074935913086, 4.772296905517578, -1.1236038208007812, 3.9874649047851562, 6.123340606689453, 1.1899375915527344, 2.1117706298828125, 2.4669876098632812, -7.6295166015625, 8.132766723632812, -1.2320327758789062, 9.298820495605469, 3.6560211181640625, -0.622314453125, 0.09412384033203125, 0.3190460205078125, 3.3719940185546875, 5.4163818359375, -0.5115089416503906, 8.830650329589844, 0.07586288452148438, 1.9531707763671875, 10.706714630126953, 11.307846069335938, 0.9054985046386719, 0.9289398193359375, 7.187980651855469, 1.7701911926269531, 6.9486083984375, 18.634201049804688, -0.7619857788085938, 5.243705749511719, -3.2206573486328125, 2.0126075744628906, -6.67742919921875, 0.20537948608398438, 4.648786544799805, -0.3076019287109375, 8.509613037109375, 8.479156494140625, 1.930419921875, 12.359935760498047, 0.5311660766601562, 0.381561279296875, -0.9314937591552734, 1.07373046875, 6.1710052490234375, 4.405609130859375, -0.4766502380371094, -5.8023529052734375, 8.329666137695312, 7.889900207519531, -1.2598381042480469, 3.1777114868164062, 5.206310272216797, -0.666839599609375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000077.npy"}
{"epoch": 0.1130690161527166, "step": 78, "batch_size": 64, "mean": 3.5206122398376465, "std": 4.205119609832764, "min": -5.679693222045898, "p10": -0.8864656448364255, "median": 2.960987091064453, "p90": 8.154825973510743, "max": 17.609756469726562, "pos_frac": 0.8125, "sample": [6.83538818359375, -1.1152191162109375, 7.990203857421875, -1.2147064208984375, 2.0157203674316406, 10.208587646484375, 3.6586532592773438, 14.95452880859375, 7.672760009765625, 2.8426513671875, 4.184501647949219, 8.132205963134766, -1.3533172607421875, -5.679693222045898, 3.881702423095703, 3.98956298828125, 1.762115478515625, 4.6045989990234375, 3.8946380615234375, 4.950286865234375, 0.44544219970703125, 1.867959976196289, 5.808311462402344, 13.799163818359375, 2.63299560546875, 1.7781524658203125, -0.16482925415039062, 4.462799072265625, 10.0753173828125, 3.2942428588867188, -0.48892974853515625, 0.9917182922363281, -2.5013427734375, 3.0793228149414062, 4.4444732666015625, 11.24981689453125, 3.095550537109375, 1.0728073120117188, 2.5097198486328125, 6.09930419921875, 6.775398254394531, 5.010307312011719, 8.164520263671875, 4.2574615478515625, 4.2964324951171875, 1.6422615051269531, -0.9812049865722656, 0.9778804779052734, -0.6654071807861328, 3.437530517578125, 0.8636398315429688, 0.6044902801513672, 0.8300056457519531, -0.1849822998046875, 2.2091217041015625, -2.0441055297851562, 17.609756469726562, 1.4191646575927734, 1.7540435791015625, 5.523719787597656, -0.56201171875, 7.0425567626953125, 1.021942138671875, 0.5495071411132812], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000078.npy"}
{"epoch": 0.1145374449339207, "step": 79, "batch_size": 64, "mean": 4.717245101928711, "std": 5.740406036376953, "min": -7.0709381103515625, "p10": -1.1128120422363281, "median": 3.5461339950561523, "p90": 11.873710632324222, "max": 25.727249145507812, "pos_frac": 0.8125, "sample": [-5.221855163574219, 9.415447235107422, 1.48846435546875, 9.356300354003906, -0.150848388671875, 2.9023895263671875, -1.5984306335449219, 1.2517929077148438, 9.76385498046875, 13.318374633789062, 11.192947387695312, -7.0709381103515625, 3.0769271850585938, 25.727249145507812, 0.86029052734375, 12.16546630859375, 4.7164764404296875, -1.1527862548828125, 3.603822708129883, 12.929397583007812, 16.338356018066406, 5.661563873291016, 6.447540283203125, -1.71343994140625, 10.536918640136719, -1.9054908752441406, 2.9880447387695312, 6.44776725769043, 5.409950256347656, 8.072463989257812, 8.760429382324219, 2.986164093017578, 8.098991394042969, 5.032154083251953, -0.25682830810546875, 3.488445281982422, 3.4750595092773438, -1.0195388793945312, 20.468124389648438, 12.514312744140625, 5.713191986083984, 3.1931610107421875, 4.117195129394531, 2.9321136474609375, 8.637969970703125, 0.044185638427734375, 6.1772613525390625, 2.539081573486328, 1.7094688415527344, -0.13196182250976562, 7.509254455566406, 0.3092460632324219, 2.5733070373535156, 4.8626556396484375, 1.21484375, 5.188629150390625, 3.9714736938476562, 0.10582733154296875, -3.645477294921875, 0.6355247497558594, 4.430538177490234, 10.122222900390625, -0.8332633972167969, 2.1219253540039062], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000079.npy"}
{"epoch": 0.11600587371512482, "step": 80, "batch_size": 64, "mean": 3.698101043701172, "std": 4.994575023651123, "min": -5.994110107421875, "p10": -0.8488719940185546, "median": 2.1918773651123047, "p90": 10.675811767578127, "max": 22.43414306640625, "pos_frac": 0.796875, "sample": [2.046630859375, 3.5818519592285156, 2.2583694458007812, -5.994110107421875, 1.7080268859863281, -0.19356536865234375, 1.0414276123046875, 0.19847869873046875, 9.306243896484375, -2.3043136596679688, 1.1207218170166016, 3.9958419799804688, -1.478658676147461, 1.3144760131835938, 0.7013778686523438, -0.13906097412109375, 22.43414306640625, 3.6222381591796875, 10.237083435058594, 5.083930969238281, 18.182571411132812, 1.6843490600585938, 4.8046875, -2.3374404907226562, 4.507091522216797, 2.125385284423828, 0.6047611236572266, 11.806533813476562, 1.9206085205078125, 4.8540496826171875, 0.5300788879394531, -0.1103515625, 2.29052734375, 9.021591186523438, 5.00006103515625, -2.010040283203125, 3.6505279541015625, 5.674858093261719, -0.8841209411621094, 13.756759643554688, 0.8052444458007812, 6.292144775390625, 1.5608139038085938, 10.863838195800781, 11.046180725097656, 2.9614791870117188, 3.5557098388671875, 4.9107513427734375, 1.8547039031982422, 1.6652755737304688, 0.9901885986328125, -0.7666244506835938, -0.040302276611328125, 3.722747802734375, 1.5129508972167969, 4.2057037353515625, 1.4080657958984375, -2.6681900024414062, 4.267581939697266, 4.6586761474609375, 7.59991455078125, 9.958877563476562, -0.2545146942138672, 12.953628540039062], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000080.npy"}
{"epoch": 0.11747430249632893, "step": 81, "batch_size": 64, "mean": 5.233097076416016, "std": 6.959782123565674, "min": -9.387664794921875, "p10": -0.5148021697998046, "median": 4.358171463012695, "p90": 11.940229797363282, "max": 34.35777282714844, "pos_frac": 0.859375, "sample": [6.307884216308594, 16.4814453125, 11.427749633789062, 4.993255615234375, 1.7424449920654297, 11.941909790039062, 0.49491119384765625, 0.3900489807128906, -0.5492134094238281, 5.115348815917969, 4.824493408203125, -9.387664794921875, 1.3517303466796875, 4.5983428955078125, 9.4366455078125, 0.7984542846679688, 2.0315399169921875, 5.6842803955078125, 11.936309814453125, -1.7131423950195312, 0.4306774139404297, 6.636871337890625, 0.23238372802734375, 4.400909423828125, -0.43450927734375, 1.7071342468261719, 9.950653076171875, 8.729934692382812, 3.0559558868408203, 5.358642578125, -1.0271759033203125, 1.0970001220703125, 0.172454833984375, -4.9692840576171875, 7.0367431640625, 23.60113525390625, 1.0110702514648438, -0.36846923828125, -2.767963409423828, 34.35777282714844, 5.1720123291015625, 12.397674560546875, 5.229316711425781, 5.9073638916015625, 2.0795631408691406, 2.6707000732421875, -1.0560226440429688, 3.644775390625, 7.3128509521484375, 21.768402099609375, 4.315433502197266, 0.6751747131347656, 2.784271240234375, 2.1354751586914062, 6.647727966308594, 9.701385498046875, 22.87286376953125, 1.3934707641601562, 3.8709793090820312, 5.115089416503906, 5.0241241455078125, 5.9736175537109375, 3.0382614135742188, 10.124992370605469], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000081.npy"}
{"epoch": 0.11894273127753303, "step": 82, "batch_size": 64, "mean": 3.512044906616211, "std": 4.788589954376221, "min": -7.6970367431640625, "p10": -1.121614074707031, "median": 3.065530776977539, "p90": 8.953352355957033, "max": 17.37994384765625, "pos_frac": 0.8125, "sample": [1.5529403686523438, 0.0095367431640625, 4.940887451171875, 2.9989891052246094, 1.4646072387695312, 6.717071533203125, 1.3843002319335938, 0.2977333068847656, 4.261569976806641, 16.468238830566406, 8.350906372070312, 0.17566680908203125, 0.6964797973632812, 4.5783233642578125, 1.9056663513183594, 15.69207763671875, 3.7413711547851562, -2.5844268798828125, 4.2517547607421875, 4.1234893798828125, -0.3898468017578125, 7.0685882568359375, 9.154571533203125, 6.935577392578125, -0.8218154907226562, 1.6383514404296875, 1.9015045166015625, 3.6017608642578125, -3.7287139892578125, 3.1320724487304688, 6.336402893066406, 1.6197357177734375, -0.42535400390625, 0.1641826629638672, 14.168350219726562, 3.2371139526367188, 2.1047515869140625, 4.711009979248047, 3.9618988037109375, -0.28778076171875, 1.4920616149902344, 4.968666076660156, 13.3726806640625, 1.1819114685058594, 0.3451499938964844, 1.9397430419921875, 5.633636474609375, 4.751922607421875, -1.5811233520507812, -4.750152587890625, -1.2500991821289062, -2.7431488037109375, -7.6970367431640625, 6.101959228515625, 5.2475738525390625, 5.151954650878906, 1.695159912109375, 9.949310302734375, 2.518299102783203, 17.37994384765625, 8.483840942382812, 3.5016708374023438, -0.35637664794921875, 4.323783874511719], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000082.npy"}
{"epoch": 0.12041116005873716, "step": 83, "batch_size": 64, "mean": 4.038934230804443, "std": 5.924076557159424, "min": -7.607269287109375, "p10": -2.2184509277343745, "median": 3.3419504165649414, "p90": 12.257090759277345, "max": 20.498855590820312, "pos_frac": 0.71875, "sample": [12.000839233398438, 0.740936279296875, -1.2509841918945312, 10.654067993164062, 4.8437957763671875, 1.897665023803711, 1.0958480834960938, 2.493682861328125, 12.366912841796875, 2.303974151611328, 7.7356109619140625, 5.012632369995117, -0.59619140625, -5.163360595703125, -0.5842323303222656, 12.944099426269531, 1.9991302490234375, 0.816986083984375, 20.480323791503906, 8.810012817382812, 10.410415649414062, 3.1931838989257812, -1.82440185546875, -1.5014724731445312, -0.8892745971679688, 4.369773864746094, 9.925262451171875, -2.3873291015625, 3.4907169342041016, 7.553123474121094, -5.6502838134765625, 4.493812561035156, 0.582977294921875, -1.7108078002929688, 4.4127044677734375, 7.184547424316406, 8.74570083618164, -1.106943130493164, 4.701690673828125, 2.3470840454101562, 4.594444274902344, -0.02860260009765625, 6.1902618408203125, 20.498855590820312, 2.9050064086914062, 10.927604675292969, -4.459312438964844, -1.52044677734375, -4.006561279296875, -1.6608810424804688, -7.607269287109375, 4.821258544921875, 3.7544212341308594, 12.906883239746094, 11.104667663574219, 13.449005126953125, 3.1841049194335938, -2.4508323669433594, 6.355812072753906, 13.981575012207031, 1.25701904296875, 4.937252044677734, 3.7273941040039062, 0.6879005432128906], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000083.npy"}
{"epoch": 0.12187958883994127, "step": 84, "batch_size": 64, "mean": 2.4480395317077637, "std": 5.011202812194824, "min": -10.409591674804688, "p10": -2.0223094940185544, "median": 1.744537353515625, "p90": 8.794988250732425, "max": 20.094635009765625, "pos_frac": 0.671875, "sample": [1.8032913208007812, 5.3017425537109375, 1.6857833862304688, -0.8479022979736328, 17.254913330078125, -1.4416046142578125, 3.3793296813964844, 2.47357177734375, -0.03636741638183594, 3.9952468872070312, -0.3208770751953125, 1.8411636352539062, -2.099090576171875, -0.54388427734375, -1.9744873046875, -3.8757095336914062, 4.261165618896484, 3.7856674194335938, 3.3998947143554688, 3.374420166015625, 1.6219291687011719, 5.226020812988281, 20.094635009765625, -0.07504844665527344, 0.2301006317138672, -1.3873481750488281, 3.0560302734375, -0.95751953125, -10.409591674804688, 2.3589324951171875, 2.019916534423828, -0.3037109375, 14.628776550292969, -1.9487380981445312, 0.8810234069824219, 8.088478088378906, 10.500053405761719, -4.9071197509765625, 0.1855010986328125, -3.19134521484375, -0.0564727783203125, 3.8559722900390625, 0.7593097686767578, -2.042804718017578, 11.167266845703125, 3.639759063720703, 3.238494873046875, 2.431896209716797, 5.2540283203125, 7.039497375488281, -0.6967620849609375, 1.8057327270507812, 9.0977783203125, 6.5777435302734375, 2.05780029296875, -1.4400634765625, 5.297901153564453, 0.4610862731933594, 0.140411376953125, 11.001678466796875, 0.813934326171875, -2.72607421875, 1.6103439331054688, 0.2588348388671875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000084.npy"}
{"epoch": 0.12334801762114538, "step": 85, "batch_size": 64, "mean": 3.2974720001220703, "std": 5.686344146728516, "min": -7.597785949707031, "p10": -4.056065368652344, "median": 2.786334991455078, "p90": 9.130353164672853, "max": 20.97010040283203, "pos_frac": 0.703125, "sample": [13.31253433227539, 7.0820465087890625, 14.489456176757812, 0.7601547241210938, -7.597785949707031, -1.954437255859375, 1.0279617309570312, -1.7329559326171875, 20.97010040283203, -3.5038108825683594, 0.46126556396484375, 0.8747329711914062, 9.001617431640625, 8.792808532714844, -1.3324947357177734, 3.082988739013672, -5.9517974853515625, 6.713531494140625, 2.59210205078125, -3.6042633056640625, -0.13167190551757812, 7.9505462646484375, 0.32070159912109375, -0.3506355285644531, -0.20318603515625, -6.1525421142578125, 7.396888732910156, 1.59808349609375, 12.042896270751953, 4.2860107421875, 9.018363952636719, -5.06103515625, -6.39593505859375, 2.154390335083008, -4.486572265625, -0.2477855682373047, 11.007553100585938, 8.652109146118164, 1.5170974731445312, 0.9736595153808594, 4.11187744140625, 6.7152099609375, 3.6566925048828125, 2.9805679321289062, 8.168563842773438, 3.7506046295166016, 2.5228424072265625, 14.709335327148438, 7.96539306640625, 3.5325775146484375, 7.173393249511719, 8.97793197631836, -0.1911468505859375, 3.4048843383789062, -1.4090652465820312, 4.438701629638672, 0.4995574951171875, 0.3700523376464844, 8.2080078125, 9.178348541259766, -4.24969482421875, 3.7775650024414062, -0.24066925048828125, 5.613983154296875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000085.npy"}
{"epoch": 0.12481644640234948, "step": 86, "batch_size": 64, "mean": 4.088076114654541, "std": 6.586512565612793, "min": -7.8817138671875, "p10": -1.3013328552246093, "median": 1.921891212463379, "p90": 13.937344360351565, "max": 32.04132080078125, "pos_frac": 0.796875, "sample": [-2.679119110107422, 8.194442749023438, 15.205024719238281, 4.76837158203125, -0.408935546875, -0.34914398193359375, 4.158061981201172, -1.249053955078125, 3.202880859375, 2.55767822265625, -0.15758895874023438, 3.414886474609375, -7.8817138671875, 4.769144058227539, 8.676490783691406, 4.767913818359375, 16.612960815429688, 3.4510574340820312, 1.4580955505371094, 12.489707946777344, 0.5331573486328125, 4.046909332275391, 1.6238784790039062, 1.6912479400634766, 7.419090270996094, 0.2548656463623047, 1.0049819946289062, 15.174266815185547, 0.24516677856445312, -0.23264694213867188, 6.576530456542969, -2.7046051025390625, 1.960348129272461, -3.4322967529296875, 7.667793273925781, 13.035003662109375, 2.1195068359375, -5.5041351318359375, 16.668594360351562, 1.5720787048339844, 13.122222900390625, -2.7154541015625, 0.92437744140625, 2.744518280029297, -1.3237380981445312, 19.8909912109375, 1.2850017547607422, 32.04132080078125, 14.28668212890625, 3.89007568359375, 5.1016998291015625, 0.2944374084472656, 5.962303161621094, 1.8834342956542969, 1.8523025512695312, 1.5213699340820312, 1.5694694519042969, 2.4644317626953125, -0.5073699951171875, 1.487508773803711, 0.7300186157226562, 1.2054595947265625, 1.0367584228515625, 2.1681671142578125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000086.npy"}
{"epoch": 0.1262848751835536, "step": 87, "batch_size": 64, "mean": 4.472693920135498, "std": 6.854785442352295, "min": -19.9454345703125, "p10": -1.0834907531738276, "median": 3.5095062255859375, "p90": 14.40572624206543, "max": 26.64581298828125, "pos_frac": 0.828125, "sample": [-0.6379852294921875, 8.259029388427734, 1.4403305053710938, 4.780784606933594, 4.2208709716796875, 3.0636672973632812, 5.727153778076172, -0.0986175537109375, -1.2744216918945312, -3.045623779296875, 3.519622802734375, 4.6213531494140625, -1.414520263671875, 14.8277587890625, 7.0520172119140625, 6.6856231689453125, 6.360298156738281, 1.1947288513183594, 18.717559814453125, 3.4993896484375, 1.6446380615234375, 1.0246620178222656, 2.251392364501953, 9.160850524902344, 4.756595611572266, 0.04207801818847656, 8.247406005859375, 26.64581298828125, 5.86395263671875, 5.850811004638672, -5.498077392578125, 11.904869079589844, 15.651123046875, 14.406116485595703, 7.0018157958984375, -0.5303611755371094, 2.5548477172851562, 3.2800941467285156, 14.404815673828125, 5.3896636962890625, 0.24198150634765625, 14.905143737792969, 0.866729736328125, 12.561912536621094, -19.9454345703125, 1.3241806030273438, 12.029998779296875, -4.031040191650391, 3.4555397033691406, 4.5411376953125, 0.5525569915771484, 1.8175048828125, 3.0887088775634766, -8.000938415527344, -0.15911102294921875, 17.559051513671875, 3.968841552734375, 6.234092712402344, 6.840217590332031, 1.9190292358398438, 0.13269805908203125, 0.10583877563476562, 3.6642608642578125, 1.02740478515625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000087.npy"}
{"epoch": 0.1277533039647577, "step": 88, "batch_size": 64, "mean": 3.4725332260131836, "std": 5.774807929992676, "min": -11.61016845703125, "p10": -2.6137317657470702, "median": 2.8163490295410156, "p90": 11.120979309082033, "max": 23.238265991210938, "pos_frac": 0.75, "sample": [0.4791107177734375, -2.8353729248046875, 4.424774169921875, 3.1881942749023438, 7.348289489746094, 1.51348876953125, 6.277473449707031, 23.238265991210938, -2.709625244140625, 0.9422588348388672, 14.56890869140625, 4.733913421630859, 3.6424942016601562, 0.05869293212890625, -11.61016845703125, 4.9452362060546875, 5.351402282714844, 3.032745361328125, -2.849456787109375, 2.060882568359375, -0.2953643798828125, 10.26556396484375, 9.5072021484375, 1.1335868835449219, 1.882354736328125, -6.250205993652344, 13.151992797851562, 4.153564453125, 3.5522518157958984, 7.3904876708984375, 6.563911437988281, 15.198020935058594, 0.7163715362548828, 6.551979064941406, -1.9767379760742188, 15.76898193359375, -1.58123779296875, 8.615936279296875, 0.5878105163574219, 12.01470947265625, 3.116382598876953, 1.8614578247070312, 2.6776504516601562, 1.9877910614013672, -0.45290374755859375, -1.108001708984375, -1.615438461303711, 1.0414447784423828, 10.561233520507812, 11.360870361328125, 5.403083801269531, 1.840087890625, -1.6060981750488281, -0.6767044067382812, -2.7508010864257812, 9.555503845214844, -2.3899803161621094, 0.3047466278076172, 3.22027587890625, 2.3875045776367188, 3.449756622314453, -4.664466857910156, 2.955047607421875, 3.0309982299804688], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000088.npy"}
{"epoch": 0.12922173274596183, "step": 89, "batch_size": 64, "mean": 3.7134885787963867, "std": 6.169270038604736, "min": -8.246505737304688, "p10": -0.8725738525390625, "median": 2.3427581787109375, "p90": 8.984289550781252, "max": 29.0867919921875, "pos_frac": 0.78125, "sample": [4.5146026611328125, 1.2001419067382812, 1.720489501953125, 1.8032722473144531, -2.3565750122070312, 2.7240447998046875, 2.8760223388671875, 8.29916000366211, 2.906494140625, 10.533294677734375, 6.036342620849609, 14.024585723876953, 7.326133728027344, -0.6476821899414062, 1.2139739990234375, 2.7635726928710938, -0.1164398193359375, 0.8521575927734375, -0.7399311065673828, 6.960094451904297, 2.291168212890625, -1.054718017578125, 5.640663146972656, 1.8530502319335938, 6.6752777099609375, -0.09584236145019531, -1.7097396850585938, 0.6477546691894531, 8.048927307128906, 2.7547149658203125, 2.39434814453125, 26.581817626953125, 2.8537521362304688, 5.337532043457031, 5.674510955810547, 5.986289978027344, 1.5709247589111328, 29.0867919921875, 1.2854499816894531, -0.84405517578125, 14.28985595703125, -0.3739776611328125, 0.45081329345703125, 2.780590057373047, -7.2096710205078125, -8.246505737304688, 1.1184463500976562, 0.21954727172851562, 4.7463531494140625, 8.538970947265625, 7.32806396484375, 5.445159912109375, -0.7692604064941406, 2.0247344970703125, 15.868904113769531, 0.01617431640625, 0.7938232421875, 0.40228271484375, -3.82843017578125, 2.4211692810058594, 9.175140380859375, -0.884796142578125, 5.34588623046875, 1.1376266479492188], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000089.npy"}
{"epoch": 0.13069016152716592, "step": 90, "batch_size": 64, "mean": 4.664685249328613, "std": 6.156388759613037, "min": -8.68206787109375, "p10": -1.495690155029297, "median": 3.0812416076660156, "p90": 13.857106781005859, "max": 28.6483154296875, "pos_frac": 0.84375, "sample": [2.7483062744140625, -3.4680862426757812, 1.5194778442382812, 15.360870361328125, 4.574527740478516, 3.5015411376953125, 0.7555084228515625, 0.9654312133789062, 11.849365234375, 0.4324150085449219, 13.801292419433594, -1.4981689453125, 3.9385528564453125, 5.601692199707031, 2.7537689208984375, 6.417884826660156, 1.4009666442871094, 14.522216796875, 2.0242919921875, 0.9690322875976562, 4.745477676391602, 1.8622665405273438, 9.985450744628906, 3.1545448303222656, 28.6483154296875, 2.4315223693847656, 10.751708984375, 8.474029541015625, -1.3794269561767578, -1.4899063110351562, 14.730010986328125, 6.00225830078125, -3.3795928955078125, 2.846926689147949, 8.014450073242188, -8.68206787109375, 1.3866729736328125, 3.9726104736328125, 12.767807006835938, -0.1480560302734375, 7.709716796875, -1.5999298095703125, 0.17217254638671875, 0.7455768585205078, -2.3650970458984375, 1.6804962158203125, 0.8798065185546875, -5.0931854248046875, 0.4835662841796875, 2.5729141235351562, 7.354305267333984, 3.0079383850097656, 3.6890411376953125, 12.57196044921875, 13.881027221679688, 4.403678894042969, 1.506387710571289, 9.034774780273438, 14.7559814453125, 5.2439117431640625, 5.479827880859375, 1.5846328735351562, 14.349288940429688, 3.625171661376953], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000090.npy"}
{"epoch": 0.13215859030837004, "step": 91, "batch_size": 64, "mean": 4.2195258140563965, "std": 5.789723873138428, "min": -4.4940643310546875, "p10": -2.028165817260742, "median": 2.869029998779297, "p90": 11.552529144287112, "max": 23.880233764648438, "pos_frac": 0.765625, "sample": [1.6509780883789062, 11.047348022460938, 1.4993362426757812, 2.502239227294922, 23.880233764648438, 0.46353721618652344, 0.710052490234375, -1.8579216003417969, 5.294948577880859, 6.058444976806641, 5.207275390625, 7.580280303955078, 5.0342254638671875, 6.3442230224609375, 2.2878646850585938, 18.559677124023438, 2.639892578125, 6.16668701171875, -0.0391845703125, 12.233505249023438, 6.303962707519531, 3.237171173095703, 7.667877197265625, 2.280670166015625, 8.253089904785156, 3.1206741333007812, 1.0155029296875, -0.41533660888671875, -0.9236564636230469, -2.4985580444335938, 3.9006805419921875, 2.5534515380859375, -1.06024169921875, 3.74603271484375, 2.8224639892578125, 4.759967803955078, 3.73443603515625, -2.9537353515625, 0.7061653137207031, 1.825225830078125, -3.2081298828125, -2.591949462890625, 2.9155960083007812, -2.8371849060058594, 0.07086944580078125, 14.923065185546875, 2.2373046875, 9.67144775390625, -4.4940643310546875, 7.397727966308594, 3.6475372314453125, -0.6547470092773438, 0.015155792236328125, 19.766273498535156, -0.9943809509277344, 11.031864166259766, -2.1011276245117188, 11.769035339355469, 8.287059783935547, 3.542797088623047, 15.861503601074219, -1.3599700927734375, 9.006988525390625, 2.8074951171875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000091.npy"}
{"epoch": 0.13362701908957417, "step": 92, "batch_size": 64, "mean": 3.4903693199157715, "std": 5.3700432777404785, "min": -9.961532592773438, "p10": -1.2248027801513672, "median": 2.3525943756103516, "p90": 11.30472564697266, "max": 24.266876220703125, "pos_frac": 0.71875, "sample": [-4.077156066894531, -0.17195892333984375, 1.5060043334960938, 11.742362976074219, 1.8874053955078125, 3.3597488403320312, 5.2742767333984375, -0.6517124176025391, 0.9042434692382812, -1.2122764587402344, 9.545787811279297, -0.16788101196289062, 3.10247802734375, 0.7370204925537109, -9.961532592773438, 10.595230102539062, 6.7515869140625, 24.266876220703125, -3.73809814453125, -0.30810546875, 3.8664169311523438, -0.70159912109375, 5.9955291748046875, -0.38226318359375, 0.290191650390625, 3.074005126953125, 4.280342102050781, 4.34393310546875, 7.0455474853515625, 3.5483837127685547, 14.339069366455078, -1.4921960830688477, 1.1795196533203125, -1.1038360595703125, 4.712993621826172, -0.41946983337402344, 3.93743896484375, 11.734130859375, -1.3315467834472656, 14.660804748535156, 11.608795166015625, 9.858779907226562, 1.5797195434570312, 2.5579681396484375, 0.6547393798828125, 3.3922996520996094, 2.0692138671875, 9.346427917480469, 2.627758026123047, 4.020833969116211, -0.5402050018310547, -1.2301712036132812, 12.517982482910156, 0.9426116943359375, -0.021778106689453125, 1.26641845703125, 1.5184383392333984, -2.781463623046875, 8.335296630859375, 3.0036067962646484, 4.311809539794922, 1.8345565795898438, 7.4010772705078125, 2.1472206115722656], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000092.npy"}
{"epoch": 0.13509544787077826, "step": 93, "batch_size": 64, "mean": 3.492115020751953, "std": 5.824976444244385, "min": -8.397956848144531, "p10": -3.2765899658203117, "median": 2.400869369506836, "p90": 10.961052703857423, "max": 20.552749633789062, "pos_frac": 0.703125, "sample": [-5.9294586181640625, 0.8828105926513672, 7.922607421875, 8.57861328125, 6.3155517578125, -1.9436492919921875, 2.1780548095703125, 0.7244377136230469, 11.016792297363281, -0.5587310791015625, 8.790237426757812, 2.4979209899902344, 2.3023910522460938, -1.0490837097167969, 13.5712890625, 9.730037689208984, 2.3038177490234375, -2.3833465576171875, -8.397956848144531, 12.396110534667969, -1.6513099670410156, 4.1822509765625, 8.968544006347656, -5.4296112060546875, -0.0067901611328125, 1.8822784423828125, 15.666030883789062, 14.719062805175781, -0.8568077087402344, 4.444950103759766, 3.4022369384765625, -1.2984085083007812, 0.3869972229003906, 6.571136474609375, 7.981941223144531, 20.552749633789062, -5.265098571777344, 5.494293212890625, -0.30889892578125, -4.04766845703125, 5.1773223876953125, 1.813507080078125, 5.3764495849609375, 1.7246475219726562, 4.060680389404297, 4.603069305419922, -0.5991744995117188, 0.7285652160644531, 10.83099365234375, 9.553394317626953, -3.6594085693359375, 13.518360137939453, 2.7977142333984375, 3.575469970703125, 2.751636505126953, 7.9642791748046875, -1.5536575317382812, 6.4229736328125, 2.1781005859375, 8.118244171142578, 1.4672431945800781, -1.8800125122070312, -7.267547607421875, 1.4561805725097656], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000093.npy"}
{"epoch": 0.13656387665198239, "step": 94, "batch_size": 64, "mean": 5.036520957946777, "std": 6.226236820220947, "min": -9.919509887695312, "p10": -1.8033452987670893, "median": 4.7220611572265625, "p90": 13.389707183837892, "max": 23.137176513671875, "pos_frac": 0.828125, "sample": [4.367561340332031, 3.7315673828125, 7.8393707275390625, -1.1827983856201172, -7.707305908203125, 1.4828109741210938, 1.1301612854003906, 17.139938354492188, 1.7815704345703125, 3.6935577392578125, 1.1848793029785156, -0.83697509765625, 5.750835418701172, 7.9182891845703125, -0.6590938568115234, 0.8501987457275391, 12.868034362792969, 8.249210357666016, 5.2194671630859375, 10.15338134765625, -2.7253646850585938, 4.107166290283203, 7.3495330810546875, 5.076560974121094, 9.513381958007812, 8.259536743164062, -2.069293975830078, 5.220130920410156, 5.534660339355469, 0.8392333984375, -4.148881912231445, 4.318107604980469, 2.7573204040527344, 17.360687255859375, 4.277263641357422, 1.8694686889648438, 23.137176513671875, 13.61328125, 5.6301116943359375, 5.673187255859375, 0.7887725830078125, 0.2790641784667969, -9.919509887695312, 1.6362419128417969, 8.23135757446289, 9.85085678100586, -5.6532135009765625, 10.629058837890625, 7.405426025390625, 7.124229431152344, 14.219146728515625, 5.746917724609375, -0.2172088623046875, 6.1262054443359375, 2.6087570190429688, 11.997650146484375, 0.8950653076171875, -3.0425758361816406, 19.08226776123047, 1.5939254760742188, 8.402606964111328, 7.23797607421875, 15.6669921875, 3.0793914794921875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000094.npy"}
{"epoch": 0.13803230543318648, "step": 95, "batch_size": 64, "mean": 4.036772727966309, "std": 5.646301746368408, "min": -5.3335723876953125, "p10": -1.4659399032592773, "median": 2.8796215057373047, "p90": 11.75714492797852, "max": 23.171279907226562, "pos_frac": 0.78125, "sample": [0.616302490234375, 3.4333457946777344, 6.107452392578125, -1.4355888366699219, 6.014373779296875, 14.633087158203125, 2.886566162109375, 1.0988693237304688, -0.8370475769042969, 13.120628356933594, 0.33605194091796875, 2.8726768493652344, 6.2921905517578125, 19.36614990234375, 16.785053253173828, 0.5011749267578125, 5.640220642089844, 0.6440010070800781, -1.5989646911621094, 3.1644058227539062, 9.448509216308594, 2.9643402099609375, 3.400218963623047, 0.45594024658203125, -1.7177543640136719, -0.2799835205078125, 0.24533843994140625, 13.718757629394531, -0.31982421875, 7.2121429443359375, -4.543750762939453, 4.725311279296875, 10.110015869140625, 9.364120483398438, -5.3335723876953125, 12.180519104003906, 8.843276977539062, 0.8375129699707031, 1.5912551879882812, 7.4809112548828125, 6.3748321533203125, 23.171279907226562, 4.837493896484375, 5.019035339355469, 1.5140113830566406, 0.3935890197753906, 0.32501983642578125, 4.2359161376953125, -1.6264724731445312, -4.223541259765625, -1.4762516021728516, 1.0976314544677734, 0.91168212890625, 0.9800262451171875, 7.364967346191406, -0.11095809936523438, 10.769271850585938, 5.189388275146484, 2.6938247680664062, 7.3817596435546875, -0.5456771850585938, 2.3817214965820312, 3.112558364868164, -1.4418792724609375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000095.npy"}
{"epoch": 0.1395007342143906, "step": 96, "batch_size": 64, "mean": 4.7870965003967285, "std": 5.893151760101318, "min": -9.786209106445312, "p10": -0.980921173095703, "median": 3.5184555053710938, "p90": 12.553553771972659, "max": 21.627395629882812, "pos_frac": 0.859375, "sample": [10.212860107421875, 3.6147842407226562, 11.53173828125, 1.6930007934570312, 7.1520233154296875, 19.37701416015625, -9.786209106445312, 0.5338687896728516, -1.0333786010742188, 5.620216369628906, 21.627395629882812, 4.863273620605469, 5.38531494140625, 0.2414398193359375, 2.9540023803710938, 4.049495697021484, 1.9876136779785156, 13.380538940429688, 1.6999359130859375, 13.773780822753906, -1.430816650390625, 7.919994354248047, 8.825973510742188, 20.245635986328125, 12.050567626953125, 5.563835144042969, 7.854560852050781, -3.00555419921875, 12.769119262695312, 3.749126434326172, -1.3626785278320312, 8.866668701171875, 1.7075576782226562, 2.470947265625, 2.6090087890625, 0.9004974365234375, -0.8585205078125, 2.801738739013672, -5.9524383544921875, 1.7004995346069336, 2.4369735717773438, 3.7911529541015625, 6.999847412109375, 2.6711349487304688, 0.44746971130371094, 0.4751091003417969, 4.724334716796875, 0.6867446899414062, 0.3903007507324219, 10.211448669433594, 4.7340240478515625, 8.394786834716797, -1.676340103149414, -0.3465118408203125, 18.84771728515625, 4.495723724365234, 2.2665557861328125, 7.5565643310546875, 5.544258117675781, 6.188606262207031, 2.2760963439941406, 3.0721282958984375, 2.4594879150390625, 3.4221267700195312], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000096.npy"}
{"epoch": 0.14096916299559473, "step": 97, "batch_size": 64, "mean": 4.642391204833984, "std": 4.9778923988342285, "min": -4.06695556640625, "p10": -0.17427635192871074, "median": 3.1284751892089844, "p90": 12.304787826538089, "max": 18.533828735351562, "pos_frac": 0.890625, "sample": [1.6051406860351562, 8.45623779296875, 4.906036376953125, -1.690155029296875, 0.5376548767089844, 0.7523841857910156, 8.916282653808594, 0.8127231597900391, 2.6122093200683594, 3.46136474609375, 15.987564086914062, 12.981491088867188, -0.34120941162109375, 2.1164016723632812, 16.41747283935547, 16.30005645751953, 0.01810455322265625, 5.0585174560546875, 1.571319580078125, 2.726736068725586, 1.1685562133789062, -0.9172325134277344, 1.473592758178711, 2.7955856323242188, 6.65576171875, 5.286094665527344, 2.1519393920898438, 0.9618072509765625, 11.573478698730469, 7.289302825927734, 0.6751594543457031, -1.0332927703857422, 9.899955749511719, 1.7026119232177734, 8.548027038574219, 4.330841064453125, 11.476394653320312, 4.207599639892578, -4.06695556640625, 0.37929534912109375, 3.835540771484375, 1.6046562194824219, -1.6422500610351562, -0.2567253112792969, 12.618206024169922, 6.4329376220703125, 6.206523895263672, 1.9222068786621094, 4.036518096923828, 0.32128143310546875, 2.0417747497558594, 1.9396209716796875, 2.6407699584960938, 4.7817840576171875, 3.5073394775390625, 18.533828735351562, 0.3731346130371094, 6.7693634033203125, 4.487449645996094, 8.986953735351562, 9.977058410644531, 4.847019195556641, 1.9275665283203125, 13.45562744140625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000097.npy"}
{"epoch": 0.14243759177679882, "step": 98, "batch_size": 64, "mean": 4.884597301483154, "std": 5.232528209686279, "min": -5.284210205078125, "p10": -0.1582910537719725, "median": 3.7987003326416016, "p90": 12.248529434204109, "max": 27.751678466796875, "pos_frac": 0.890625, "sample": [0.0325927734375, 3.3466835021972656, 13.230865478515625, 12.971061706542969, -2.1775074005126953, 3.533344268798828, 6.659454345703125, 10.600894927978516, 2.1293182373046875, 1.4986495971679688, 4.064056396484375, 5.056549072265625, 2.0994338989257812, 13.224555969238281, 5.458396911621094, 8.714447021484375, 7.472648620605469, 5.501251220703125, 0.16571807861328125, 5.028007507324219, 8.345497131347656, 2.471515655517578, 1.67181396484375, 5.806060791015625, 13.914756774902344, 2.4006004333496094, 5.566009521484375, 10.575634002685547, 8.6595458984375, 3.359668731689453, 1.8000106811523438, 5.186187744140625, 7.656902313232422, 5.7105865478515625, 6.706939697265625, 15.587646484375, 2.0758056640625, -5.284210205078125, -1.7920379638671875, 5.828117370605469, -3.5986900329589844, 6.864429473876953, 1.516143798828125, 6.015663146972656, 4.6888885498046875, 2.330158233642578, 1.9000778198242188, 1.8019943237304688, 2.7814178466796875, 12.954658508300781, 5.895210266113281, 8.551231384277344, 2.893218994140625, 27.751678466796875, -1.4215087890625, 3.463489532470703, -0.23095703125, 0.011262893676757812, -1.0302276611328125, 1.716339111328125, 8.044113159179688, 1.745849609375, 1.033050537109375, 2.0792617797851562], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000098.npy"}
{"epoch": 0.14390602055800295, "step": 99, "batch_size": 64, "mean": 5.220826148986816, "std": 6.537954807281494, "min": -3.7038803100585938, "p10": 0.01151657104492193, "median": 2.899017333984375, "p90": 13.461184692382815, "max": 30.38104248046875, "pos_frac": 0.890625, "sample": [1.4245071411132812, 0.16167449951171875, 0.6246700286865234, 0.53564453125, -0.23334503173828125, 5.367393493652344, 8.61007308959961, 6.499748229980469, 15.218002319335938, 0.3633880615234375, 2.8033370971679688, 10.10443115234375, 9.818206787109375, -1.756683349609375, 13.850494384765625, 6.3251800537109375, 5.337547302246094, 4.4400482177734375, 1.29791259765625, 0.3209991455078125, 12.55279541015625, -3.7038803100585938, 0.3193778991699219, 20.327972412109375, 0.8135452270507812, 8.106979370117188, 3.878448486328125, -0.0120697021484375, 0.06655120849609375, -2.6844329833984375, 1.4241218566894531, 4.8853607177734375, 5.801788330078125, 5.559074401855469, 8.296409606933594, 19.139022827148438, 6.7465057373046875, 0.9521865844726562, 10.97835922241211, 12.082405090332031, 3.534637451171875, 6.370704650878906, 1.5007705688476562, 0.285858154296875, 30.38104248046875, 2.3079299926757812, 3.090618133544922, 6.0653076171875, 0.09804534912109375, 1.7880516052246094, 15.694686889648438, 2.2532806396484375, -0.3844890594482422, -1.0826568603515625, 11.299736022949219, 2.9946975708007812, 2.744508743286133, 0.4718894958496094, 25.136978149414062, 3.616058349609375, 2.268247604370117, 2.7784576416015625, 2.1468238830566406, 2.1279144287109375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000099.npy"}
{"epoch": 0.14537444933920704, "step": 100, "batch_size": 64, "mean": 4.128679275512695, "std": 4.47694206237793, "min": -3.6604042053222656, "p10": -0.5377410888671874, "median": 3.2541465759277344, "p90": 8.60804901123047, "max": 19.3895263671875, "pos_frac": 0.84375, "sample": [1.8044204711914062, 1.7982254028320312, -1.2603950500488281, 3.914775848388672, 0.46581268310546875, -0.5587158203125, 2.4450836181640625, 5.513973236083984, -3.6604042053222656, 8.676681518554688, 2.970672607421875, 0.16936492919921875, 7.1420745849609375, 1.5910911560058594, 0.05348396301269531, 2.799957275390625, 3.1806793212890625, 6.780834197998047, -0.488800048828125, 7.0180511474609375, 4.501529693603516, 5.154029846191406, 2.40362548828125, 1.1045265197753906, 7.569103240966797, 13.211524963378906, 0.748565673828125, 2.10205078125, -0.2974853515625, -1.422576904296875, 4.512119293212891, 8.162593841552734, 1.2507400512695312, 4.8955535888671875, 18.22174835205078, 8.447906494140625, 5.901132583618164, 5.671394348144531, 7.302436828613281, 5.2208404541015625, 14.10333251953125, -0.8944549560546875, 4.1583251953125, 6.327861785888672, 4.825809478759766, 5.656955718994141, 3.7535552978515625, 2.0140228271484375, 3.3276138305664062, 1.8009910583496094, 8.879287719726562, 12.368354797363281, 5.9474334716796875, 2.9029693603515625, 19.3895263671875, -0.3073844909667969, -1.7446746826171875, 0.6631927490234375, 3.6012115478515625, -1.0279541015625, 8.318412780761719, 1.0738487243652344, 2.0479888916015625, 2.0309906005859375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000100.npy"}
{"epoch": 0.14684287812041116, "step": 101, "batch_size": 64, "mean": 3.557770252227783, "std": 5.378474235534668, "min": -11.609619140625, "p10": -1.5379470825195312, "median": 2.673105239868164, "p90": 11.85643539428711, "max": 18.131912231445312, "pos_frac": 0.78125, "sample": [15.608245849609375, 13.68634033203125, 0.7040863037109375, 1.3352699279785156, 5.348285675048828, 12.608489990234375, 12.47967529296875, 2.5982933044433594, 11.629562377929688, 11.953666687011719, -11.609619140625, 3.248931884765625, 2.7906036376953125, 7.652915954589844, -0.17178058624267578, 0.3253173828125, 0.1974639892578125, 3.5267601013183594, 5.7649993896484375, 4.7274627685546875, -6.8084259033203125, 5.2064971923828125, 8.403541564941406, 1.9636306762695312, 6.3193206787109375, 0.8991508483886719, -0.6283645629882812, 18.131912231445312, 10.034866333007812, 4.5712890625, -1.4931182861328125, -1.2652587890625, -1.557159423828125, 6.050071716308594, 5.831050872802734, 3.7163238525390625, 2.4271392822265625, 14.174331665039062, 2.09405517578125, -2.5796546936035156, -1.187826156616211, 2.517822265625, 0.04839515686035156, 7.5260772705078125, 1.2018585205078125, -0.02878570556640625, 11.55181884765625, 6.210945129394531, 4.119861602783203, 2.225635528564453, 2.9883041381835938, -0.3438568115234375, 4.53497314453125, -3.1283836364746094, 2.7479171752929688, 5.334995269775391, 1.7295417785644531, -3.7929763793945312, -2.4868240356445312, 0.4788684844970703, 2.1205406188964844, 0.3251972198486328, 2.9326934814453125, 0.17431640625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000101.npy"}
{"epoch": 0.14831130690161526, "step": 102, "batch_size": 64, "mean": 2.9680778980255127, "std": 5.068187236785889, "min": -14.975616455078125, "p10": -1.9767627716064453, "median": 2.665560722351074, "p90": 9.116550445556642, "max": 20.60498046875, "pos_frac": 0.8125, "sample": [-1.5315322875976562, 2.46502685546875, 10.214424133300781, 4.4337310791015625, -14.975616455078125, 0.7780551910400391, 0.4153289794921875, 6.8119964599609375, 3.442211151123047, 1.51739501953125, -3.2997589111328125, 9.656288146972656, 5.41680908203125, -0.0984649658203125, 2.8504791259765625, 0.1968402862548828, 0.9591827392578125, 7.685604095458984, -8.516510009765625, 1.4982070922851562, 4.856269836425781, -0.02276611328125, 6.957008361816406, 0.09167861938476562, 3.9615020751953125, 3.698152542114258, 2.1634483337402344, 12.494949340820312, 8.537498474121094, 2.6883277893066406, 7.9451904296875, 0.2061309814453125, 5.331409454345703, 4.710456848144531, 2.642793655395508, -2.306304931640625, 2.3673019409179688, -0.70477294921875, 5.2912139892578125, -2.008892059326172, 1.1586589813232422, 2.001708984375, 5.447425842285156, 0.07869720458984375, 9.464599609375, 20.60498046875, 2.2485122680664062, 11.819633483886719, 0.041046142578125, -5.286956787109375, 9.364715576171875, 3.474090576171875, 4.9696197509765625, 1.5553131103515625, 3.5866317749023438, -1.90179443359375, 3.681049346923828, 3.4312667846679688, 4.244720458984375, 2.5149307250976562, -3.7410507202148438, 8.145835876464844, 3.254444122314453, 0.9786090850830078], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000102.npy"}
{"epoch": 0.14977973568281938, "step": 103, "batch_size": 64, "mean": 4.861823081970215, "std": 5.409820556640625, "min": -10.55743408203125, "p10": -1.3314025878906248, "median": 4.060270309448242, "p90": 11.65772171020508, "max": 19.210205078125, "pos_frac": 0.796875, "sample": [2.91845703125, 1.0085067749023438, 0.8596076965332031, -0.45996856689453125, -2.592498779296875, 5.4548797607421875, 1.5844345092773438, 8.661632537841797, 1.6034126281738281, 4.363201141357422, -0.0623016357421875, 3.4589767456054688, -0.550567626953125, 8.921134948730469, -1.6036758422851562, 6.957210540771484, 13.893867492675781, 11.097164154052734, 7.771812438964844, 7.9352264404296875, 4.8582611083984375, 14.617904663085938, 0.00241851806640625, -10.55743408203125, 3.03424072265625, 8.285255432128906, 9.722511291503906, -1.4489059448242188, 4.6842193603515625, 3.678802490234375, 1.3843841552734375, 13.076812744140625, 6.190788269042969, -1.0572280883789062, 9.569908142089844, 2.8021926879882812, 3.7573394775390625, -0.5163192749023438, -1.6457443237304688, 0.9529132843017578, -2.98968505859375, 7.13714599609375, 10.398155212402344, -1.630950927734375, 9.409942626953125, 13.172126770019531, 0.3769035339355469, 2.021270751953125, 19.210205078125, 11.551460266113281, 11.703262329101562, 8.121223449707031, 7.421775817871094, 9.340293884277344, 2.6760711669921875, -0.5166835784912109, 4.995113372802734, 6.8443450927734375, 2.5082473754882812, 3.0309295654296875, 16.257110595703125, 6.917699813842773, 2.03216552734375, 8.555694580078125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000103.npy"}
{"epoch": 0.1512481644640235, "step": 104, "batch_size": 64, "mean": 5.340367317199707, "std": 6.180846214294434, "min": -4.3167724609375, "p10": -0.4284313201904295, "median": 3.753814697265625, "p90": 12.566345977783204, "max": 30.906402587890625, "pos_frac": 0.859375, "sample": [1.6758575439453125, 4.7625579833984375, 0.022954940795898438, 4.678371429443359, 13.030563354492188, 2.2852249145507812, -0.23232269287109375, 0.44121551513671875, 2.3855743408203125, 9.60595703125, 1.0294857025146484, -3.2311630249023438, 16.129776000976562, 5.1148223876953125, 13.792434692382812, -0.6800880432128906, 10.043731689453125, 9.417301177978516, 3.5540924072265625, 5.074371337890625, -0.1497650146484375, 3.3190574645996094, 6.370044708251953, 3.3634567260742188, -1.6579837799072266, 2.142335891723633, 19.401947021484375, 3.728424072265625, 8.490364074707031, 7.888847351074219, 12.395271301269531, 12.639663696289062, 1.5492591857910156, 4.433013916015625, 0.7097892761230469, 4.989921569824219, 5.129611968994141, 8.618148803710938, 30.906402587890625, 0.6984024047851562, 10.011978149414062, -1.0726776123046875, 12.20623779296875, -1.2452011108398438, 4.0421600341796875, 5.35810661315918, 10.326904296875, 3.399585723876953, 1.5264892578125, 2.3990325927734375, 0.3548545837402344, 1.3378448486328125, 1.3759918212890625, 8.542732238769531, 7.8961181640625, 22.597686767578125, 2.6742935180664062, 8.6300048828125, -0.5124778747558594, 2.0159530639648438, 5.2249603271484375, 1.3635749816894531, -4.3167724609375, 3.779205322265625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000104.npy"}
{"epoch": 0.1527165932452276, "step": 105, "batch_size": 64, "mean": 5.128673553466797, "std": 5.7731242179870605, "min": -11.695266723632812, "p10": -0.34552383422851557, "median": 4.444063186645508, "p90": 13.778254699707032, "max": 23.664215087890625, "pos_frac": 0.875, "sample": [-2.999420166015625, 3.1972198486328125, 4.297481536865234, 1.7999114990234375, 5.9330291748046875, 6.391120910644531, 2.6184158325195312, -11.695266723632812, -1.8987388610839844, -3.141937255859375, 4.263591766357422, 8.054573059082031, 5.213996887207031, 4.977916717529297, 0.5767288208007812, 8.276451110839844, 1.1752090454101562, 3.7457046508789062, -0.2866973876953125, -0.37073516845703125, 6.1241912841796875, 13.648361206054688, 5.134033203125, 0.7529258728027344, 2.3468399047851562, 9.985488891601562, 9.447822570800781, 3.725238800048828, 1.217620849609375, 6.226905822753906, 9.553939819335938, 9.862754821777344, 0.8731460571289062, 13.909980773925781, 4.8181915283203125, 11.536880493164062, 16.07686996459961, 11.743240356445312, 1.4276275634765625, 5.589183807373047, 5.9041595458984375, 11.794227600097656, 4.590644836425781, 1.0700111389160156, 3.2720489501953125, 1.6763954162597656, 1.4455032348632812, 2.5774459838867188, 0.224456787109375, 5.85528564453125, 5.621070861816406, 0.1185455322265625, -0.8286285400390625, 2.4188003540039062, 14.63763427734375, 5.783504486083984, 0.8617992401123047, 23.664215087890625, 14.123085021972656, 16.72607421875, 8.648757934570312, -2.1978302001953125, 2.2841567993164062, 13.83392333984375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000105.npy"}
{"epoch": 0.15418502202643172, "step": 106, "batch_size": 64, "mean": 5.0136871337890625, "std": 5.848348617553711, "min": -4.3797454833984375, "p10": -0.9276950836181641, "median": 3.40484619140625, "p90": 12.226905059814454, "max": 28.450294494628906, "pos_frac": 0.84375, "sample": [6.4583740234375, 1.7095890045166016, -0.9240074157714844, -1.4948654174804688, 6.873577117919922, 3.346343994140625, 1.61688232421875, -3.9295654296875, 9.765838623046875, 9.048057556152344, 6.549022674560547, 1.3271827697753906, 1.1801776885986328, 10.11435317993164, 4.213340759277344, 16.784027099609375, -1.1439933776855469, 2.6803359985351562, 12.685791015625, 0.5017280578613281, -0.08989715576171875, 9.240257263183594, -0.9292755126953125, 19.69781494140625, 3.463348388671875, 2.714702606201172, 1.3508987426757812, 1.5259780883789062, 13.451065063476562, 2.3374557495117188, 0.18860626220703125, 1.297821044921875, 6.8805694580078125, 6.155097961425781, 3.7848968505859375, 4.463829040527344, 2.7579345703125, 2.0831146240234375, 6.533699035644531, -4.3797454833984375, 11.460556030273438, 6.045366287231445, 0.26056671142578125, 1.4595870971679688, 28.450294494628906, 11.3707275390625, 5.940956115722656, 0.906982421875, 15.500137329101562, 5.831581115722656, -1.2926826477050781, 7.1195220947265625, 0.05849456787109375, 3.2245559692382812, 12.026702880859375, 4.5991668701171875, 5.7599334716796875, 12.312705993652344, 5.8541412353515625, 10.842391967773438, -0.0523681640625, 1.86956787109375, -1.6034927368164062, 3.040210723876953], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000106.npy"}
{"epoch": 0.15565345080763582, "step": 107, "batch_size": 64, "mean": 5.5514326095581055, "std": 8.169926643371582, "min": -6.017860412597656, "p10": -1.4238151550292961, "median": 4.31005859375, "p90": 12.577403259277343, "max": 53.352508544921875, "pos_frac": 0.859375, "sample": [1.716461181640625, 13.909400939941406, -0.1845703125, 3.6338958740234375, 2.3893508911132812, -1.7325973510742188, 4.360931396484375, 1.3766422271728516, -2.19354248046875, -6.017860412597656, 1.4854106903076172, 5.073097229003906, 4.605884552001953, 5.341548919677734, 2.770662307739258, 1.1361770629882812, 5.145904541015625, 5.214996337890625, -0.7033233642578125, 3.419109344482422, 10.667076110839844, 13.767925262451172, 9.78558349609375, 1.47998046875, -4.107017517089844, 5.498815536499023, -3.91070556640625, 4.545036315917969, 53.352508544921875, 6.580940246582031, 2.766021728515625, 5.254364013671875, 4.0861968994140625, 4.729084014892578, 11.268409729003906, 12.592422485351562, 1.368814468383789, 4.0738677978515625, 3.1885032653808594, 3.445343017578125, 9.116134643554688, 3.8130645751953125, 5.45428466796875, 12.5423583984375, 4.259185791015625, 2.832000732421875, 0.18083953857421875, 24.41986083984375, 5.253021240234375, 7.988861083984375, 3.51702880859375, 1.381622314453125, 19.947341918945312, 5.453498840332031, -2.2130889892578125, 6.012420654296875, 1.460592269897461, 4.643161773681641, 8.547073364257812, 0.8588447570800781, 7.146087646484375, 19.127212524414062, -3.1784515380859375, 5.5479736328125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000107.npy"}
{"epoch": 0.15712187958883994, "step": 108, "batch_size": 64, "mean": 5.115566253662109, "std": 6.195886611938477, "min": -9.682662963867188, "p10": -0.6695018768310544, "median": 3.911273956298828, "p90": 15.101004028320315, "max": 27.300567626953125, "pos_frac": 0.875, "sample": [2.7591323852539062, 1.198944091796875, 5.741065979003906, 2.1981201171875, -9.682662963867188, 6.58123779296875, -0.98724365234375, -1.173248291015625, 10.80242919921875, -3.5473861694335938, 0.8120574951171875, 15.6148681640625, 1.04461669921875, 0.0290985107421875, 0.3097705841064453, 3.6426925659179688, 1.031595230102539, 19.64739990234375, 27.300567626953125, 7.340320587158203, 16.151718139648438, 8.0728759765625, 3.033092498779297, 1.1999320983886719, 1.7446937561035156, 3.089670181274414, -0.4546852111816406, 2.6899871826171875, 16.4052734375, 4.845859527587891, 5.034221649169922, -1.3941383361816406, 14.7081298828125, 7.23516845703125, 15.54888916015625, 1.0750579833984375, 7.505287170410156, 3.9111251831054688, 13.75372314453125, 8.50136947631836, 6.3885345458984375, 5.0791015625, 1.6513748168945312, 3.9691009521484375, 3.9114227294921875, -0.761566162109375, 7.134857177734375, 15.269378662109375, 3.4748077392578125, 8.339950561523438, -3.4340133666992188, 11.732772827148438, 0.730682373046875, 6.9188385009765625, 0.04949188232421875, 1.5380725860595703, 9.174564361572266, 1.8931961059570312, 5.631317138671875, 2.0784149169921875, 4.976737976074219, 0.056194305419921875, 4.182323455810547, 4.090053558349609], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000108.npy"}
{"epoch": 0.15859030837004406, "step": 109, "batch_size": 64, "mean": 6.441263675689697, "std": 6.990314483642578, "min": -8.41339111328125, "p10": -1.2235679626464842, "median": 5.197715759277344, "p90": 16.473106384277347, "max": 25.970569610595703, "pos_frac": 0.859375, "sample": [8.282752990722656, 4.548583984375, 5.89495849609375, 2.250232696533203, 2.997894287109375, 6.929058074951172, 6.666900634765625, 2.3301773071289062, -5.114036560058594, 9.88592529296875, 18.214111328125, 8.02410888671875, 1.7305755615234375, 2.0785446166992188, 11.312870025634766, 2.9732723236083984, 1.797952651977539, 0.19028091430664062, -1.8206062316894531, -1.5446596145629883, 1.8993988037109375, 24.1383056640625, -1.4632148742675781, 2.3511962890625, 6.706363677978516, 1.460205078125, 4.7566680908203125, 1.083150863647461, 4.233562469482422, -2.06787109375, 10.135688781738281, 19.054000854492188, 0.34148216247558594, 4.0279998779296875, 16.732452392578125, 9.080047607421875, 6.2698974609375, 15.836135864257812, 9.747726440429688, 14.58935546875, -1.3347015380859375, -0.9642562866210938, -8.41339111328125, 7.024829864501953, 2.317535400390625, 25.970569610595703, 7.1909332275390625, 3.8065948486328125, 1.8478736877441406, 19.671035766601562, 7.98736572265625, 7.208770751953125, 3.0823745727539062, 5.638763427734375, 21.30908203125, -0.901092529296875, 4.3570098876953125, 11.712623596191406, 15.867965698242188, 10.78802490234375, 14.165542602539062, 8.137557983398438, 0.19811439514160156, 9.030288696289062], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000109.npy"}
{"epoch": 0.16005873715124816, "step": 110, "batch_size": 64, "mean": 4.108128547668457, "std": 5.288022041320801, "min": -6.903434753417969, "p10": -0.7223510742187499, "median": 2.996091842651367, "p90": 11.292715835571292, "max": 19.868011474609375, "pos_frac": 0.84375, "sample": [12.358688354492188, 0.4519805908203125, 8.070442199707031, 8.898284912109375, 5.7473602294921875, 3.1341094970703125, 3.850017547607422, 4.99284553527832, 2.6640625, 13.117469787597656, 0.5667572021484375, 0.6720733642578125, -6.843841552734375, -0.6028022766113281, 8.967658996582031, 0.00115966796875, 2.1468944549560547, 19.868011474609375, 1.9787139892578125, 5.611808776855469, -0.7735862731933594, 3.7427597045898438, 2.247894287109375, 13.299705505371094, 3.54144287109375, 14.050384521484375, 7.4949188232421875, 5.4676513671875, 1.5186347961425781, 7.2493743896484375, -1.5136871337890625, 0.04827117919921875, 1.7901611328125, 0.8203582763671875, -2.779937744140625, 8.7406005859375, 8.928749084472656, 11.526485443115234, 2.8000640869140625, 0.5092201232910156, 2.3345184326171875, 4.886667251586914, 6.162487030029297, 2.662445068359375, 1.1176185607910156, 6.8266754150390625, -0.12563323974609375, -5.34808349609375, 6.030059814453125, -0.14086151123046875, 6.4929656982421875, 0.3855419158935547, 4.3407745361328125, 2.858074188232422, -1.07427978515625, 5.427803039550781, 6.217250823974609, 0.36181640625, 1.1559944152832031, 10.74725341796875, -6.903434753417969, 0.6656951904296875, 3.9071998596191406, 19.570510864257812], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000110.npy"}
{"epoch": 0.16152716593245228, "step": 111, "batch_size": 64, "mean": 7.069258689880371, "std": 6.86132287979126, "min": -5.393280029296875, "p10": -0.2915494918823242, "median": 6.486198425292969, "p90": 15.433299255371095, "max": 31.544937133789062, "pos_frac": 0.84375, "sample": [22.183822631835938, 13.805450439453125, -5.393280029296875, 13.214569091796875, 4.504203796386719, 7.197052001953125, -3.4722900390625, 5.634147644042969, -0.4729576110839844, 10.23724365234375, 9.332763671875, 7.439422607421875, 9.343982696533203, 14.955375671386719, 12.64599609375, 4.8127593994140625, 1.6654129028320312, 8.96051025390625, 17.020797729492188, 1.9377155303955078, 1.5098724365234375, -0.296966552734375, 15.544837951660156, 2.882781982421875, 6.1483001708984375, 9.68121337890625, 4.2962493896484375, 3.6571121215820312, -0.17156219482421875, 7.059883117675781, 8.174484252929688, 3.254018783569336, 10.79388427734375, 7.769721984863281, 15.173042297363281, 2.154327392578125, -0.24848175048828125, 7.538810729980469, 31.544937133789062, 3.15130615234375, 2.562774658203125, 0.447906494140625, 2.659341812133789, 8.210365295410156, 9.370529174804688, 16.155105590820312, 5.6977081298828125, 3.6067047119140625, 13.400718688964844, 2.8285751342773438, 6.7586212158203125, 6.213775634765625, 13.133415222167969, 10.445472717285156, 23.308700561523438, -0.27890968322753906, -3.2860336303710938, 7.397392272949219, 4.502002716064453, 7.068145751953125, 0.09899520874023438, 19.05877685546875, -1.6876296997070312, -0.41033935546875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000111.npy"}
{"epoch": 0.16299559471365638, "step": 112, "batch_size": 64, "mean": 4.050925254821777, "std": 4.67784309387207, "min": -3.9096527099609375, "p10": -1.3448951721191407, "median": 3.0936546325683594, "p90": 10.961716461181643, "max": 16.58092498779297, "pos_frac": 0.8125, "sample": [2.8081817626953125, 9.437881469726562, 8.002838134765625, 1.2239913940429688, -0.43243408203125, 1.4302558898925781, -1.3047714233398438, -1.362091064453125, 4.708442687988281, 1.8763160705566406, 7.7912139892578125, 2.3404808044433594, 0.24995040893554688, -2.9151611328125, 11.521392822265625, 9.33673095703125, -1.5906047821044922, 6.441902160644531, 4.091220855712891, 1.4705581665039062, 3.7117385864257812, 8.588420867919922, 6.1498565673828125, -0.7946662902832031, 4.31915283203125, 12.515277862548828, 3.061676025390625, 3.1256332397460938, 0.6259613037109375, 0.269073486328125, 8.591304779052734, 4.703559875488281, 4.5428619384765625, 13.435604095458984, 6.585784912109375, -1.430145263671875, 16.58092498779297, 1.354248046875, 6.539825439453125, 0.7307891845703125, -3.9096527099609375, 4.697540283203125, 6.268524169921875, -2.7617950439453125, 0.2052001953125, 10.237037658691406, 15.299919128417969, 1.7931385040283203, 13.197952270507812, 11.272293090820312, -0.23499298095703125, 1.5387496948242188, 6.4835205078125, 1.4698638916015625, -0.7617034912109375, 3.935028076171875, 1.4120559692382812, -1.7523574829101562, 5.883670806884766, 2.9538097381591797, 1.1975250244140625, 4.1510467529296875, 0.7547035217285156, 7.594959259033203], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000112.npy"}
{"epoch": 0.1644640234948605, "step": 113, "batch_size": 64, "mean": 4.301935195922852, "std": 5.041431903839111, "min": -8.710235595703125, "p10": -1.2483783721923827, "median": 3.71148681640625, "p90": 9.60699005126953, "max": 20.786376953125, "pos_frac": 0.828125, "sample": [2.1803359985351562, 5.2872467041015625, 0.6247348785400391, 20.786376953125, 9.614433288574219, -2.3847122192382812, 5.753318786621094, 8.207771301269531, 0.23438262939453125, 0.758544921875, 3.7605209350585938, 3.045166015625, 3.7148590087890625, 6.34417724609375, 6.132007598876953, -0.12179946899414062, 1.2547073364257812, -2.248077392578125, 4.12835693359375, 6.193534851074219, -1.329833984375, -2.8023414611816406, 10.064048767089844, -8.710235595703125, 6.598442077636719, 7.3211669921875, 5.483501434326172, 3.7081146240234375, 5.1927642822265625, 6.95758056640625, 3.1040725708007812, 7.3980560302734375, 3.6683197021484375, 9.589622497558594, 6.633476257324219, -5.5711822509765625, 2.283172607421875, -1.0009613037109375, 14.505996704101562, 7.006317138671875, 2.2797927856445312, 2.2700271606445312, -0.48822021484375, 3.4292068481445312, 10.233673095703125, 6.0006866455078125, 4.97198486328125, 7.643943786621094, -1.7531204223632812, 2.1318492889404297, 19.524185180664062, 3.085845947265625, 3.231761932373047, 2.7144699096679688, 4.166206359863281, 8.552276611328125, 8.220115661621094, 13.951286315917969, -1.0583152770996094, 2.663665771484375, 0.925445556640625, 3.386077880859375, 1.7121505737304688, 4.1628570556640625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000113.npy"}
{"epoch": 0.16593245227606462, "step": 114, "batch_size": 64, "mean": 5.540801048278809, "std": 6.9556498527526855, "min": -6.411590576171875, "p10": -1.9486877441406243, "median": 4.063877105712891, "p90": 15.593045806884765, "max": 30.647003173828125, "pos_frac": 0.828125, "sample": [30.647003173828125, 15.393394470214844, 16.870010375976562, -4.1895294189453125, 1.4235267639160156, 10.479896545410156, 1.5438919067382812, 6.130577087402344, 3.1739044189453125, 1.8457717895507812, 1.9202213287353516, 5.391448974609375, 13.456275939941406, 5.17413330078125, 8.657852172851562, 5.6337432861328125, 3.762847900390625, -1.26470947265625, -0.6045570373535156, 16.493438720703125, 0.3488883972167969, 1.1831703186035156, 16.839462280273438, 5.757240295410156, 2.355499267578125, 15.646331787109375, 1.374969482421875, -0.5689163208007812, -2.2418212890625, 8.577503204345703, 2.4865894317626953, 3.4754505157470703, -4.584442138671875, 4.948421478271484, -6.411590576171875, 0.12815475463867188, 3.2888317108154297, 0.627716064453125, 12.733139038085938, 15.468711853027344, 7.586328506469727, 9.221656799316406, 1.1711463928222656, 12.244430541992188, 3.56719970703125, -2.692535400390625, 19.517822265625, 2.44219970703125, 5.651123046875, 1.1974029541015625, 0.8944244384765625, 6.9665069580078125, 5.065986633300781, 4.364906311035156, 10.42864990234375, 19.44992446899414, -0.2376422882080078, 12.052841186523438, 6.5598297119140625, -4.415256500244141, 5.470207214355469, 1.4946746826171875, -3.611297607421875, 6.848304748535156], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000114.npy"}
{"epoch": 0.16740088105726872, "step": 115, "batch_size": 64, "mean": 5.725839614868164, "std": 6.139260768890381, "min": -6.4259796142578125, "p10": -0.7296443939208981, "median": 4.505157470703125, "p90": 13.69646110534668, "max": 25.913543701171875, "pos_frac": 0.859375, "sample": [-1.5145111083984375, -1.5127983093261719, 9.25518798828125, 7.386810302734375, 10.522308349609375, 5.658935546875, 4.978759765625, 3.008045196533203, -0.8981437683105469, 13.588493347167969, 3.006305694580078, 0.9510536193847656, 2.468660354614258, 9.320564270019531, 5.620750427246094, 18.271286010742188, -0.06777381896972656, 4.3021240234375, 6.7108917236328125, 0.95733642578125, 7.254783630371094, 0.5582962036132812, 10.344871520996094, 17.76284408569336, 8.312980651855469, 7.076454162597656, 2.3500213623046875, 13.742733001708984, 1.3070926666259766, 21.538040161132812, 10.05255126953125, 0.96722412109375, 5.379425048828125, 0.72076416015625, 4.514862060546875, -6.4259796142578125, 3.4986801147460938, 6.977939605712891, 13.342304229736328, 14.932685852050781, 4.122283935546875, 8.215896606445312, 10.643569946289062, -1.9218902587890625, 25.913543701171875, 3.304962158203125, 2.23394775390625, -0.9557228088378906, 4.380260467529297, 5.150627136230469, 3.889862060546875, 10.3763427734375, 3.7132415771484375, 15.064468383789062, 0.23572158813476562, 7.282623291015625, 10.281974792480469, 0.24344348907470703, 1.891204833984375, 2.423828125, 5.84552001953125, 4.495452880859375, -0.33647918701171875, -6.2637786865234375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000115.npy"}
{"epoch": 0.16886930983847284, "step": 116, "batch_size": 64, "mean": 4.71929931640625, "std": 5.427305698394775, "min": -4.2934722900390625, "p10": -0.9668228149414061, "median": 3.450185775756836, "p90": 12.865737915039062, "max": 18.163909912109375, "pos_frac": 0.796875, "sample": [3.4540481567382812, 6.8692474365234375, 1.8202095031738281, 15.003158569335938, 3.2186431884765625, 3.8691864013671875, 0.42208099365234375, 1.2274646759033203, 2.3684234619140625, 0.5874786376953125, 0.504730224609375, -1.1663856506347656, 11.857166290283203, -1.5529708862304688, 4.009239196777344, -0.8486785888671875, 3.6102752685546875, -1.485626220703125, 9.92138671875, 18.163909912109375, 4.6883697509765625, 9.805397033691406, 0.0161590576171875, 5.316078186035156, 11.978996276855469, 1.08807373046875, 4.367950439453125, 15.712753295898438, 0.7106208801269531, -0.406982421875, 7.1458587646484375, 5.73077392578125, 2.735828399658203, 4.4158477783203125, -0.021211624145507812, 3.4463233947753906, -4.2934722900390625, 6.051902770996094, 2.8952903747558594, 2.5450286865234375, -1.0174560546875, 15.161941528320312, -1.5331573486328125, 15.961227416992188, 2.0508270263671875, -1.4448623657226562, 14.110969543457031, 12.818687438964844, 2.019580841064453, 3.9912796020507812, 3.8139305114746094, -0.20403099060058594, 12.236740112304688, 1.1247787475585938, 11.107357025146484, -0.7473678588867188, 1.6733894348144531, 6.569793701171875, 0.5263938903808594, 3.761871337890625, -0.21966934204101562, 12.7379150390625, 8.866539001464844, 12.885902404785156], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000116.npy"}
{"epoch": 0.17033773861967694, "step": 117, "batch_size": 64, "mean": 4.342745304107666, "std": 5.133291244506836, "min": -10.488311767578125, "p10": -1.1653833389282224, "median": 4.429618835449219, "p90": 10.179060745239259, "max": 21.951385498046875, "pos_frac": 0.8125, "sample": [1.2892189025878906, 0.8258056640625, 13.274215698242188, 4.57720947265625, 6.7176666259765625, 2.508575439453125, 7.2162322998046875, -1.6886825561523438, 4.411506652832031, 0.7447395324707031, 4.447731018066406, 9.346046447753906, 9.403968811035156, -10.488311767578125, 6.8772430419921875, 0.9348793029785156, 5.2104034423828125, 4.9333953857421875, 2.6547698974609375, -3.006103515625, 3.6916656494140625, 10.646499633789062, 6.1815032958984375, -1.2511768341064453, 10.75335693359375, 1.0375595092773438, -2.0597991943359375, 0.5087642669677734, -0.5938949584960938, -0.9651985168457031, 5.796680450439453, 7.60845947265625, 5.058296203613281, 9.006011962890625, 9.099288940429688, 3.976024627685547, 21.951385498046875, 8.741424560546875, -0.1844635009765625, 10.03689193725586, 3.6427001953125, 0.49536895751953125, 2.2135009765625, -1.6915435791015625, -3.086273193359375, 0.8773345947265625, 9.84820556640625, 5.171424865722656, 5.053821563720703, 1.3543243408203125, -0.0870513916015625, 4.144893646240234, 5.676181793212891, 4.702850341796875, 16.965576171875, 10.239990234375, 0.05872917175292969, 6.267963409423828, 12.379440307617188, 2.4537734985351562, 5.5560150146484375, -0.013629913330078125, 1.852935791015625, 4.629367828369141], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000117.npy"}
{"epoch": 0.17180616740088106, "step": 118, "batch_size": 64, "mean": 5.370712757110596, "std": 5.230695724487305, "min": -7.926399230957031, "p10": -0.14978141784667942, "median": 4.952371597290039, "p90": 12.531702423095703, "max": 19.72643280029297, "pos_frac": 0.890625, "sample": [1.963226318359375, 2.6422119140625, 12.96114730834961, -0.4023284912109375, 7.484685897827148, 7.4732818603515625, 1.4308319091796875, 6.910480499267578, 7.019712448120117, 6.699817657470703, 2.971923828125, 1.1224822998046875, 15.47198486328125, 0.8526229858398438, 3.5175323486328125, 9.046550750732422, 4.23919677734375, 6.6229705810546875, 7.963134765625, 12.353279113769531, -2.7624359130859375, 1.0698928833007812, 2.188629150390625, 2.3496322631835938, 0.43294525146484375, 1.9649658203125, 0.6909370422363281, 4.559864044189453, 2.73681640625, 8.459320068359375, 10.702596664428711, 5.003726959228516, -1.9231948852539062, 6.242092132568359, 15.142929077148438, 12.608169555664062, 2.241943359375, 5.24591064453125, 13.504684448242188, 7.762001037597656, -0.2585563659667969, 0.10402679443359375, -7.926399230957031, 4.680477142333984, 19.72643280029297, 8.73785400390625, 9.68243408203125, 11.552909851074219, 5.299415588378906, -2.911508560180664, 5.54876708984375, 12.350311279296875, 4.9010162353515625, 6.539905548095703, 3.7692718505859375, 5.265556335449219, 4.375181198120117, 4.055900573730469, 2.833953857421875, 13.395645141601562, 7.164329528808594, -6.210178375244141, 12.08465576171875, 2.398040771484375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000118.npy"}
{"epoch": 0.17327459618208516, "step": 119, "batch_size": 64, "mean": 5.35649299621582, "std": 6.601394176483154, "min": -13.674362182617188, "p10": -1.3911079406738274, "median": 4.9603986740112305, "p90": 13.42982406616211, "max": 24.37554931640625, "pos_frac": 0.8125, "sample": [5.014890670776367, 7.331451416015625, 2.366466522216797, 20.47247314453125, 13.167083740234375, 11.1041259765625, -0.48448944091796875, 0.332855224609375, 1.1277122497558594, 8.346885681152344, -1.7097320556640625, 6.5453948974609375, 14.4088134765625, 7.4964447021484375, -0.0462493896484375, 9.779823303222656, 5.217044830322266, 6.755180358886719, 3.5529632568359375, 0.1500244140625, 4.689300537109375, 10.232280731201172, 12.185661315917969, 10.167442321777344, 4.905906677246094, 4.403022766113281, 13.542427062988281, 1.7948226928710938, 8.254150390625, 6.9808197021484375, -0.5721588134765625, -1.8118438720703125, 0.06937408447265625, 2.3719863891601562, 15.186691284179688, -4.9362335205078125, 7.734748840332031, 21.390548706054688, 3.174488067626953, 0.24553680419921875, -3.5466461181640625, 3.6028671264648438, 1.4080581665039062, 24.37554931640625, -0.128875732421875, 5.535621643066406, 6.5760345458984375, 9.607147216796875, 6.359771728515625, 8.51708984375, -0.6476516723632812, -13.674362182617188, 3.0148773193359375, 4.402496337890625, 10.912582397460938, 5.111301422119141, -2.5675582885742188, 2.163278579711914, 8.434440612792969, 0.7566165924072266, -8.03643798828125, 4.7705078125, 11.310813903808594, 13.619888305664062], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000119.npy"}
{"epoch": 0.17474302496328928, "step": 120, "batch_size": 64, "mean": 7.322626113891602, "std": 8.441629409790039, "min": -10.389938354492188, "p10": -1.2784362792968746, "median": 6.1245832443237305, "p90": 21.000767517089848, "max": 26.33026123046875, "pos_frac": 0.828125, "sample": [20.383941650390625, 13.10382080078125, 8.008560180664062, 9.489059448242188, 6.212423324584961, 5.463294982910156, -2.3155059814453125, 22.746421813964844, 9.475791931152344, 1.6754302978515625, 14.633537292480469, 10.867538452148438, 6.464321136474609, 9.146484375, 4.57768440246582, -10.02264404296875, 7.9892730712890625, 1.7798347473144531, 9.915969848632812, -1.040374755859375, -10.389938354492188, 19.376937866210938, 12.30279541015625, 13.493568420410156, 3.3079376220703125, -2.0539398193359375, 1.21539306640625, 8.994361877441406, 17.70880889892578, -0.5377349853515625, -1.380462646484375, 24.556228637695312, 25.443862915039062, 8.89169692993164, 21.265121459960938, 2.030284881591797, 6.3936920166015625, 3.733099937438965, 3.6280899047851562, -2.261463165283203, 2.240142822265625, 7.670318603515625, 0.9954757690429688, 26.33026123046875, 25.273483276367188, 10.69500732421875, 24.327133178710938, 3.5732879638671875, 8.462142944335938, 9.912078857421875, 1.396575927734375, 2.442352294921875, -0.6481170654296875, 8.48117446899414, -0.6153717041015625, -2.853515625, 0.22365570068359375, 0.2788543701171875, 16.4625244140625, 3.6507797241210938, 6.0367431640625, 1.9506607055664062, 4.6631011962890625, 3.4261398315429688], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000120.npy"}
{"epoch": 0.1762114537444934, "step": 121, "batch_size": 64, "mean": 6.205869674682617, "std": 6.141052722930908, "min": -3.9956111907958984, "p10": -0.3809799194335937, "median": 5.2922821044921875, "p90": 14.656572723388674, "max": 26.71753692626953, "pos_frac": 0.875, "sample": [1.5159263610839844, 6.95263671875, 2.3845672607421875, 14.845718383789062, -2.0196266174316406, 6.042938232421875, 7.83135986328125, 2.1670150756835938, 1.3398551940917969, 9.456321716308594, 2.2583999633789062, -2.6671371459960938, 0.44496917724609375, 7.227165222167969, 10.790779113769531, 3.4452438354492188, -0.9141006469726562, 3.1371078491210938, 8.293037414550781, 5.465095520019531, 5.63507080078125, 1.5921821594238281, 6.574623107910156, 4.608306884765625, 7.056510925292969, 0.9944381713867188, 6.584869384765625, 5.7706451416015625, 12.434768676757812, 2.93414306640625, 6.0901641845703125, -0.35158538818359375, 5.563545227050781, 17.392868041992188, 1.5575942993164062, 0.662567138671875, 9.881797790527344, 21.07470703125, -1.8219070434570312, 26.71753692626953, 1.5664596557617188, 10.748710632324219, 5.119468688964844, -0.8564453125, 4.297821044921875, 7.6576385498046875, 16.323776245117188, 0.4817047119140625, 12.763206481933594, 13.184341430664062, 3.5286483764648438, 4.402587890625, 13.82757568359375, 3.3434295654296875, 11.027427673339844, -0.39357757568359375, 0.8270263671875, 14.89569091796875, 9.886383056640625, 14.215232849121094, 18.239486694335938, 4.507175445556641, -3.9956111907958984, 2.62738037109375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000121.npy"}
{"epoch": 0.1776798825256975, "step": 122, "batch_size": 64, "mean": 6.558840751647949, "std": 6.455270290374756, "min": -9.269561767578125, "p10": -0.6013593673706054, "median": 6.15907096862793, "p90": 15.463894653320317, "max": 22.08355712890625, "pos_frac": 0.8125, "sample": [9.9779052734375, -2.2714004516601562, 6.067195892333984, 4.069919586181641, 9.670326232910156, -3.232624053955078, 0.2081146240234375, 3.0632400512695312, 7.107906341552734, 5.72369384765625, 8.103229522705078, -9.269561767578125, 11.283683776855469, 10.1854248046875, 5.585166931152344, 12.47479248046875, 6.503807067871094, -1.1658477783203125, 22.08355712890625, 6.4373931884765625, -0.6301727294921875, 5.24224853515625, 5.463310241699219, 8.633682250976562, 3.5616531372070312, 2.6254043579101562, 8.57586669921875, 13.735588073730469, 9.877532958984375, 12.054466247558594, 19.179935455322266, -4.659248352050781, 5.3441314697265625, 0.5948753356933594, -0.37652587890625, 6.250946044921875, 7.652660369873047, 8.499519348144531, 20.526885986328125, 17.725849151611328, -0.13922119140625, 18.89520263671875, 2.3294754028320312, 14.576675415039062, 0.30992889404296875, 17.380489349365234, -0.12087249755859375, 9.007400512695312, 5.032905578613281, 8.263023376464844, 3.5980453491210938, -2.074737548828125, 11.422557830810547, -0.5341281890869141, 10.103721618652344, 6.288963317871094, 14.144912719726562, 15.844131469726562, -0.3695106506347656, 0.2599296569824219, 6.0049285888671875, 4.741607666015625, 3.434459686279297, 8.881393432617188], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000122.npy"}
{"epoch": 0.17914831130690162, "step": 123, "batch_size": 64, "mean": 5.186343193054199, "std": 5.921340465545654, "min": -4.37982177734375, "p10": -1.5208305358886718, "median": 4.662052154541016, "p90": 12.35783081054688, "max": 28.567001342773438, "pos_frac": 0.78125, "sample": [6.457328796386719, 5.450229644775391, 6.474510192871094, 2.9311447143554688, 5.117034912109375, 2.997711181640625, 10.772262573242188, 8.3045654296875, 4.722236633300781, 2.767333984375, -0.7585716247558594, 6.03271484375, -0.7151031494140625, 10.927680969238281, -0.0864105224609375, 5.131900787353516, 4.110740661621094, 14.880050659179688, -4.042839050292969, 7.6492462158203125, 4.06072998046875, 28.567001342773438, 5.9637908935546875, 0.92877197265625, 3.2641639709472656, -0.7379913330078125, 15.623634338378906, 3.072357177734375, 14.682418823242188, 4.60186767578125, 9.541439056396484, 7.2525787353515625, 1.9241790771484375, -1.1103324890136719, 12.819435119628906, 9.491920471191406, 11.280754089355469, 8.29842758178711, 4.0693817138671875, 2.4704132080078125, 3.1284103393554688, 9.485092163085938, 1.2156810760498047, -2.475677490234375, 10.016105651855469, -1.4671630859375, -0.04633331298828125, -4.17315673828125, -2.27703857421875, 15.387832641601562, 9.649421691894531, 9.039871215820312, 1.0260009765625, 3.1686782836914062, 13.609504699707031, -1.5438308715820312, 4.993785858154297, 0.6347770690917969, -4.37982177734375, 5.189964294433594, 11.11065673828125, -3.243572235107422, 1.2762203216552734, 7.411857604980469], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000123.npy"}
{"epoch": 0.18061674008810572, "step": 124, "batch_size": 64, "mean": 6.1312971115112305, "std": 8.156783103942871, "min": -5.103116989135742, "p10": -1.9038169860839844, "median": 3.779094696044922, "p90": 18.616233062744143, "max": 33.93724822998047, "pos_frac": 0.75, "sample": [7.152534484863281, -0.38025665283203125, 16.039207458496094, -1.803009033203125, -0.8597126007080078, 9.21014404296875, -0.2241668701171875, 1.6945266723632812, -3.16424560546875, 33.93724822998047, 1.2188568115234375, 3.3482704162597656, 0.1767578125, 7.285697937011719, 3.0656280517578125, 21.367111206054688, 23.512306213378906, 3.5879364013671875, 3.7920150756835938, 2.1168212890625, 2.5746917724609375, -1.8628120422363281, 19.120315551757812, -4.907386779785156, 17.7325439453125, 12.485095977783203, 13.289535522460938, 1.25653076171875, 8.121898651123047, 8.365692138671875, -5.103116989135742, 2.0683021545410156, -1.9060440063476562, 6.0499725341796875, 8.806655883789062, 23.894073486328125, 8.505958557128906, 11.819686889648438, 8.868904113769531, 3.1918792724609375, 8.152923583984375, 8.259849548339844, 4.747016906738281, -1.89862060546875, 7.192441940307617, -2.2702407836914062, 8.79779052734375, 3.76617431640625, -0.7799816131591797, -5.0281829833984375, -3.1827011108398438, 0.957672119140625, 7.39459228515625, 4.12506103515625, 10.436859130859375, 23.99835205078125, 4.872467041015625, 18.994956970214844, -0.7946224212646484, 0.0448760986328125, -0.5029983520507812, 3.2257843017578125, 16.731555938720703, 1.7159442901611328], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000124.npy"}
{"epoch": 0.18208516886930984, "step": 125, "batch_size": 64, "mean": 4.905576705932617, "std": 6.154337406158447, "min": -9.849533081054688, "p10": -1.2618011474609374, "median": 3.4904136657714844, "p90": 12.851531219482423, "max": 22.01309585571289, "pos_frac": 0.828125, "sample": [5.1511077880859375, 2.5224456787109375, 4.3555755615234375, -1.578948974609375, 0.46308135986328125, -3.846454620361328, 12.921401977539062, 2.419994354248047, 15.5792236328125, 4.9610748291015625, 0.7198295593261719, -1.33880615234375, -0.5127449035644531, -9.849533081054688, 10.530155181884766, 0.5275459289550781, 4.116363525390625, 3.4696578979492188, 9.753215789794922, 2.1125869750976562, 5.9576568603515625, 1.3528366088867188, 0.19036865234375, -1.082122802734375, 3.424234390258789, 20.02496337890625, 1.793609619140625, 3.2750186920166016, 8.112388610839844, 5.835868835449219, 4.805393218994141, 1.5948944091796875, -0.5802764892578125, 1.695444107055664, 15.229679107666016, 9.649238586425781, -3.2098922729492188, 20.12059783935547, 11.935455322265625, 9.884803771972656, 2.9121017456054688, 22.01309585571289, 2.06695556640625, 11.811569213867188, 5.9863433837890625, -0.9432525634765625, 4.751739501953125, 4.4145965576171875, 1.1319961547851562, 10.323638916015625, 2.5341262817382812, 2.8970985412597656, 10.442512512207031, -2.7255935668945312, 12.308006286621094, 4.833583831787109, 1.3274383544921875, 12.688499450683594, 3.6891021728515625, 5.66058349609375, -3.8680572509765625, 13.249435424804688, 3.51116943359375, 0.453277587890625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000125.npy"}
{"epoch": 0.18355359765051396, "step": 126, "batch_size": 64, "mean": 6.910326957702637, "std": 5.541971683502197, "min": -2.1504974365234375, "p10": -0.31354751586914054, "median": 6.031398773193359, "p90": 15.919324874877931, "max": 21.817108154296875, "pos_frac": 0.859375, "sample": [21.817108154296875, -1.7825927734375, 4.047685623168945, 9.610649108886719, 11.1021728515625, 16.357749938964844, 9.968555450439453, -0.936492919921875, 5.854301452636719, 2.333343505859375, 4.435821533203125, 12.47705078125, 8.411834716796875, 10.051536560058594, 2.4782028198242188, -0.12197494506835938, 4.585365295410156, 4.8031158447265625, 7.6048736572265625, -0.21373748779296875, 4.922863006591797, 3.647205352783203, 1.5746650695800781, -1.9277572631835938, 3.2086257934570312, 8.547256469726562, 15.289363861083984, 16.50775146484375, 11.33056640625, 8.633468627929688, 3.927326202392578, -0.3563232421875, -0.678985595703125, 3.9092636108398438, 17.678192138671875, 2.0910110473632812, 8.922401428222656, -0.7290802001953125, 9.59246826171875, 8.576797485351562, 1.5016326904296875, 7.9567413330078125, 7.183746337890625, 12.321662902832031, 4.356254577636719, 6.20849609375, 8.25680923461914, 6.732563018798828, 6.567634582519531, 5.809783935546875, 4.938804626464844, 9.311676025390625, 16.189308166503906, 17.938323974609375, 4.8920745849609375, 0.7545166015625, 11.209762573242188, 4.8608551025390625, 5.521381378173828, 9.210746765136719, 4.7744140625, 11.419265747070312, -2.1504974365234375, 18.943321228027344], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000126.npy"}
{"epoch": 0.18502202643171806, "step": 127, "batch_size": 64, "mean": 5.684311866760254, "std": 7.019923210144043, "min": -11.514663696289062, "p10": -1.6191490173339844, "median": 5.4981842041015625, "p90": 15.131465148925782, "max": 26.445327758789062, "pos_frac": 0.84375, "sample": [0.7886466979980469, 15.856216430664062, -1.8966407775878906, 1.1263809204101562, 4.589820861816406, 0.5739593505859375, 3.2673797607421875, 3.2684402465820312, 6.522552490234375, 13.484619140625, 2.039318084716797, 6.271862030029297, 15.625411987304688, -6.442737579345703, 6.259735107421875, 1.6884384155273438, 6.217979431152344, 2.437540054321289, 5.23614501953125, 8.154922485351562, 8.535385131835938, 23.716552734375, -2.5913772583007812, 7.087074279785156, 7.531085968017578, 10.176132202148438, 6.686798095703125, 9.81207275390625, -6.1288604736328125, 9.485832214355469, 21.879291534423828, 15.07452392578125, 9.24862289428711, 1.1905813217163086, -11.514663696289062, 0.00473785400390625, 0.0029497146606445312, 5.5970001220703125, 0.41655731201171875, -0.19388198852539062, 5.6700897216796875, 12.387840270996094, 0.8957672119140625, 6.165199279785156, 3.7872314453125, 11.892303466796875, 5.3993682861328125, 7.5648956298828125, -6.0987091064453125, 8.921714782714844, 3.6349716186523438, 1.9299583435058594, 6.073816299438477, 26.445327758789062, 2.4062652587890625, 9.8013916015625, -1.6152877807617188, 12.310260772705078, -1.6208038330078125, 0.6612129211425781, 16.10529327392578, 15.155868530273438, -0.185791015625, 5.0213775634765625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000127.npy"}
{"epoch": 0.18649045521292218, "step": 128, "batch_size": 64, "mean": 5.552103042602539, "std": 6.990488052368164, "min": -21.7242431640625, "p10": -0.33334999084472655, "median": 4.121860504150391, "p90": 14.708990478515627, "max": 24.87408447265625, "pos_frac": 0.84375, "sample": [2.152149200439453, -0.5849838256835938, 5.2710113525390625, 10.205520629882812, 14.951934814453125, 2.0752944946289062, 17.053024291992188, 15.460540771484375, 1.1018753051757812, 1.7383804321289062, 5.750648498535156, 5.5279388427734375, -0.341522216796875, 3.62359619140625, 3.6760940551757812, 14.142120361328125, 0.4995880126953125, 4.0619049072265625, 0.761871337890625, 0.5945091247558594, -1.3107452392578125, 3.4505138397216797, 11.594718933105469, 19.515823364257812, 1.33807373046875, -0.3142814636230469, 24.87408447265625, 13.910781860351562, 16.36285400390625, 7.283046722412109, 9.220962524414062, 0.7382125854492188, 9.36465072631836, -0.06601715087890625, 2.196155548095703, 10.82568359375, -21.7242431640625, 9.981155395507812, 5.535144805908203, 0.6494903564453125, 5.914253234863281, 2.127941131591797, 1.8868484497070312, -0.07890701293945312, 2.7271080017089844, 9.370071411132812, -3.1380558013916016, 2.8920135498046875, 6.942913055419922, 2.2314529418945312, 8.752395629882812, 4.938266754150391, 6.13848876953125, -1.2264022827148438, 4.181816101074219, 9.9705810546875, 0.1968994140625, 9.847244262695312, 13.14404296875, -3.255687713623047, 13.247650146484375, 7.655342102050781, 18.69097137451172, 1.0298023223876953], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000128.npy"}
{"epoch": 0.18795888399412627, "step": 129, "batch_size": 64, "mean": 5.623868942260742, "std": 6.491948127746582, "min": -3.650653839111328, "p10": -1.1955726623535154, "median": 3.5278396606445312, "p90": 15.84967994689942, "max": 23.365036010742188, "pos_frac": 0.796875, "sample": [-0.6957550048828125, 0.4199867248535156, 2.079944610595703, 13.63759994506836, -1.6882286071777344, 10.375167846679688, 0.0180511474609375, 2.6389312744140625, 14.694900512695312, -0.3340606689453125, 16.344585418701172, 8.838699340820312, 0.3391094207763672, 18.669784545898438, -0.6876716613769531, -1.2805976867675781, 5.16668701171875, 2.463104248046875, 5.8268280029296875, 3.3003406524658203, 8.096527099609375, 12.36676025390625, 5.9799041748046875, 7.605567932128906, -2.3957443237304688, -2.530517578125, -0.881195068359375, 5.37091064453125, 17.75537109375, 23.365036010742188, 1.8949432373046875, 21.91046142578125, 18.871200561523438, 3.875518798828125, 9.834823608398438, 2.2220382690429688, -0.143096923828125, 1.749481201171875, 9.988655090332031, 2.921680450439453, 1.1216278076171875, 3.0432586669921875, 6.754142761230469, 9.233833312988281, 16.580772399902344, 1.934600830078125, 12.701583862304688, 2.9481163024902344, -3.650653839111328, -2.638216018676758, 7.310329437255859, 6.521308898925781, 1.8816871643066406, 6.4282379150390625, 3.62945556640625, 3.4262237548828125, 9.13686752319336, 1.9936981201171875, 0.6113662719726562, -2.9897079467773438, 12.926528930664062, 6.309326171875, 7.7246856689453125, -0.9971809387207031], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000129.npy"}
{"epoch": 0.1894273127753304, "step": 130, "batch_size": 64, "mean": 7.054594993591309, "std": 7.807432174682617, "min": -4.6740264892578125, "p10": -1.1170087814331053, "median": 5.760711669921875, "p90": 18.47584686279297, "max": 35.01933288574219, "pos_frac": 0.859375, "sample": [5.1248931884765625, 5.923851013183594, -2.483917236328125, 19.510902404785156, 0.41649627685546875, 10.718978881835938, 5.047821044921875, 8.858638763427734, 25.123062133789062, -1.0040302276611328, 8.539138793945312, 18.35730743408203, 5.4190673828125, 8.751190185546875, 7.176139831542969, -1.1654281616210938, 5.63226318359375, -1.7793121337890625, 11.684623718261719, 9.634723663330078, 3.1104507446289062, -2.5029754638671875, 0.337493896484375, 16.233482360839844, 20.882732391357422, -3.2529144287109375, 0.7776947021484375, 7.955965042114258, 0.9820823669433594, 14.747840881347656, -4.6740264892578125, 3.647735595703125, -3.6094284057617188, 3.9251708984375, 6.710052490234375, 13.5638427734375, 7.7310333251953125, 3.2155075073242188, 8.117767333984375, 0.01337432861328125, 7.1099853515625, -0.15328598022460938, 18.526649475097656, 10.17706298828125, 5.003395080566406, 0.37627410888671875, 35.01933288574219, 5.88916015625, 2.08416748046875, 2.7232666015625, 8.212196350097656, 0.02701568603515625, 10.766532897949219, 0.651947021484375, 7.4179534912109375, 3.304370880126953, 11.0693359375, 2.3091487884521484, 21.889297485351562, 10.441642761230469, 25.49530029296875, 2.9298667907714844, 3.6940040588378906, 9.130149841308594], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000130.npy"}
{"epoch": 0.19089574155653452, "step": 131, "batch_size": 64, "mean": 5.799962997436523, "std": 6.874149799346924, "min": -4.801239013671875, "p10": -0.868701934814453, "median": 4.841888427734375, "p90": 14.304223632812501, "max": 25.73455810546875, "pos_frac": 0.734375, "sample": [-0.5866241455078125, -0.40020751953125, 6.293731689453125, 0.07695388793945312, 23.31097412109375, -4.801239013671875, -1.7815322875976562, 3.6407623291015625, 10.697273254394531, 5.388114929199219, 25.73455810546875, 10.308212280273438, -4.04876708984375, 1.4739456176757812, -0.0760650634765625, -0.7723846435546875, 3.2676334381103516, -2.135993003845215, -0.31809234619140625, 5.927642822265625, 1.1649208068847656, 7.7444000244140625, -0.19011306762695312, 0.9251670837402344, 12.288604736328125, 2.6196746826171875, 2.5408592224121094, 4.295661926269531, 14.396774291992188, 7.189628601074219, -0.32781410217285156, 12.288711547851562, 0.5545654296875, 1.6959686279296875, 8.405990600585938, -0.14161109924316406, 11.944961547851562, 5.766471862792969, 22.60833740234375, 3.622528076171875, 4.199314117431641, 9.445684432983398, 12.891304016113281, 5.840061187744141, -0.3227653503417969, 8.420440673828125, 7.868370056152344, 5.539878845214844, -3.1055850982666016, 6.318325042724609, 5.935752868652344, -0.9099807739257812, -2.3521881103515625, 14.088272094726562, 13.491989135742188, 15.191070556640625, -0.03424835205078125, 3.9589691162109375, 0.30440807342529297, 16.849319458007812, 10.89031982421875, 7.144702911376953, 20.870864868164062, 8.080787658691406], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000131.npy"}
{"epoch": 0.19236417033773862, "step": 132, "batch_size": 64, "mean": 6.553079605102539, "std": 6.372128486633301, "min": -15.59796142578125, "p10": 0.23441295623779307, "median": 5.25189208984375, "p90": 16.475100708007815, "max": 20.844642639160156, "pos_frac": 0.90625, "sample": [5.7437591552734375, 2.0745620727539062, 9.5496826171875, 4.911857604980469, 5.249504089355469, 18.364707946777344, 2.9098358154296875, 20.033721923828125, 15.688690185546875, 3.3360061645507812, 2.5938568115234375, 4.921501159667969, 2.9177780151367188, 0.4722900390625, 6.97747802734375, 10.099906921386719, 11.01629638671875, 4.823951721191406, 7.155052185058594, 4.6675567626953125, -3.4500885009765625, 4.5486297607421875, 17.210845947265625, 3.2455902099609375, 1.9414043426513672, 5.254280090332031, 3.5667572021484375, 19.42931365966797, 4.578269958496094, 7.133811950683594, -1.3766250610351562, -15.59796142578125, 0.19233131408691406, 11.24368667602539, 5.939659118652344, 20.844642639160156, 11.402812957763672, 4.5789794921875, 3.5996932983398438, 10.74428939819336, 13.854248046875, 5.9683685302734375, 12.166545867919922, 13.014289855957031, -0.6133041381835938, 4.009906768798828, 16.8121337890625, 5.773551940917969, -0.4290657043457031, 3.6158218383789062, 0.33260345458984375, 13.581268310546875, 5.365509033203125, 3.8855972290039062, 8.279815673828125, 9.612106323242188, 2.6743316650390625, -1.4416275024414062, 5.885711669921875, 8.153678894042969, 3.4851455688476562, 0.38360595703125, 8.377632141113281, 18.110916137695312], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000132.npy"}
{"epoch": 0.19383259911894274, "step": 133, "batch_size": 64, "mean": 6.717145919799805, "std": 6.872471809387207, "min": -3.6815719604492188, "p10": -0.5976648330688474, "median": 4.97222900390625, "p90": 17.034574890136724, "max": 27.914840698242188, "pos_frac": 0.859375, "sample": [6.054817199707031, 11.052116394042969, -0.8721771240234375, 2.5963287353515625, 2.7578582763671875, -1.6808013916015625, 14.02899169921875, -2.3846282958984375, 2.3864593505859375, 8.746429443359375, 7.07171630859375, 5.271034240722656, 1.254791259765625, 22.75177001953125, 4.597908020019531, 15.109809875488281, 1.4180526733398438, -0.3733062744140625, 14.471481323242188, 4.5948638916015625, 8.723876953125, 11.164131164550781, 5.3356781005859375, 4.645233154296875, 6.237266540527344, 13.614860534667969, -0.3900432586669922, 5.1561126708984375, 13.288948059082031, -0.6866455078125, 8.233314514160156, 6.236560821533203, 3.684253692626953, 1.0225601196289062, 2.112201690673828, 8.022163391113281, 1.995086669921875, 8.660408020019531, 1.6099853515625, 7.099445343017578, 5.3806610107421875, 0.6880264282226562, 2.0819740295410156, 27.914840698242188, 2.170879364013672, 6.587226867675781, 18.083343505859375, -1.6680908203125, 15.853271484375, 1.2398872375488281, 4.576080322265625, 12.81704330444336, 2.765533447265625, 3.6400146484375, 20.85639190673828, 17.665191650390625, 9.415641784667969, 22.990745544433594, 17.540847778320312, 4.7883453369140625, 2.3225021362304688, -0.8638153076171875, 2.1134586334228516, -3.6815719604492188], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000133.npy"}
{"epoch": 0.19530102790014683, "step": 134, "batch_size": 64, "mean": 7.475937843322754, "std": 6.956516265869141, "min": -5.7245025634765625, "p10": -0.16019592285156206, "median": 6.708080291748047, "p90": 16.745574188232425, "max": 26.246047973632812, "pos_frac": 0.890625, "sample": [1.954010009765625, 1.2798614501953125, -2.1972122192382812, 9.537582397460938, 26.246047973632812, 4.274932861328125, 2.4232749938964844, 6.819759368896484, -3.0250320434570312, 10.001106262207031, 2.06048583984375, 7.018463134765625, 7.361680030822754, 1.7404556274414062, 0.2664947509765625, 2.101898193359375, 1.8971023559570312, 7.841522216796875, 11.429412841796875, 1.1521263122558594, 5.1287689208984375, 4.790653228759766, -3.1071929931640625, 5.152629852294922, 6.021087646484375, 6.5097198486328125, 15.153900146484375, 9.708534240722656, 10.041748046875, 16.00075912475586, 4.830621719360352, 16.945358276367188, 2.461507797241211, 9.594070434570312, 10.998184204101562, 13.904243469238281, 10.153888702392578, 2.8732070922851562, -2.5013351440429688, 6.911262512207031, 13.661239624023438, 8.705497741699219, 6.7584686279296875, 1.8107261657714844, 22.70556640625, 13.571121215820312, 17.758758544921875, -2.0516586303710938, 17.523941040039062, 3.5553483963012695, -5.7245025634765625, 4.297096252441406, -0.3430633544921875, 23.905685424804688, 14.551155090332031, 8.1473388671875, 5.41510009765625, 1.49237060546875, 6.657691955566406, 11.45462417602539, 2.2900123596191406, 21.748741149902344, 16.27941131591797, 12.533737182617188], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000134.npy"}
{"epoch": 0.19676945668135096, "step": 135, "batch_size": 64, "mean": 6.340190887451172, "std": 8.152007102966309, "min": -9.343353271484375, "p10": -2.1596534729003896, "median": 4.732841491699219, "p90": 14.102827072143555, "max": 45.843414306640625, "pos_frac": 0.859375, "sample": [1.740936279296875, 8.24267578125, 6.436370849609375, 10.117324829101562, -6.063201904296875, 15.491256713867188, 13.899337768554688, 0.5883560180664062, 8.269378662109375, 13.128158569335938, 8.814273834228516, 0.23823928833007812, 12.69537353515625, -2.807758331298828, 4.8694915771484375, 1.8425827026367188, 14.19003677368164, 9.813621520996094, 5.332965850830078, 2.5249404907226562, 3.9429702758789062, 15.342536926269531, 1.142120361328125, -2.6873321533203125, 2.919431686401367, 21.96062469482422, 0.5526962280273438, 45.843414306640625, -4.474063873291016, 25.16765594482422, 4.71307373046875, 4.578224182128906, -3.0480499267578125, 3.0794715881347656, 3.2533226013183594, 10.649288177490234, 3.1164608001708984, 9.773691177368164, -2.5597686767578125, -1.2260513305664062, 3.0036544799804688, -0.16019821166992188, 0.6982154846191406, 1.9810943603515625, -9.343353271484375, 7.252727508544922, 6.6031341552734375, 4.7526092529296875, 3.492755889892578, 5.916454315185547, 4.093515396118164, 2.300506591796875, 12.485393524169922, 1.4061088562011719, 10.458335876464844, 10.827407836914062, 8.281608581542969, 12.914642333984375, 6.495170593261719, 0.6104583740234375, 3.8379592895507812, 19.44470977783203, 8.4326171875, 8.582633972167969], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000135.npy"}
{"epoch": 0.19823788546255505, "step": 136, "batch_size": 64, "mean": 7.531214714050293, "std": 6.414876461029053, "min": -5.90679931640625, "p10": 0.6479089736938478, "median": 7.04681396484375, "p90": 16.815982818603516, "max": 22.908905029296875, "pos_frac": 0.90625, "sample": [4.461395263671875, 15.519210815429688, 4.977813720703125, 6.320274353027344, 3.8558425903320312, 2.9132003784179688, 17.238967895507812, 8.773887634277344, 7.122932434082031, 2.7546939849853516, 8.417564392089844, 0.8165817260742188, 8.377578735351562, 0.5756206512451172, 5.535305023193359, 7.4089202880859375, -0.15750503540039062, 2.7500839233398438, 7.100334167480469, 22.074615478515625, 6.6171417236328125, 16.909446716308594, 22.908905029296875, 5.455009460449219, 1.6327972412109375, 16.93207550048828, 17.281463623046875, 3.4547348022460938, 16.597900390625, 1.8551139831542969, 9.436164855957031, 6.270328521728516, 4.040557861328125, -5.90679931640625, 4.560333251953125, 1.036712646484375, 8.450660705566406, 4.638725280761719, 10.620590209960938, 8.175899505615234, 2.613727569580078, 7.870269775390625, 1.6190338134765625, 20.579544067382812, -2.332914352416992, 8.879739761352539, 1.6448097229003906, 14.02353286743164, -3.4228286743164062, -3.5568771362304688, 16.564979553222656, 13.136772155761719, 11.973499298095703, 6.993293762207031, 12.464645385742188, 7.284778594970703, 9.878677368164062, 3.7271995544433594, 9.469074249267578, -2.5749588012695312, 13.246841430664062, 11.116928100585938, 16.5220947265625, 6.470817565917969], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000136.npy"}
{"epoch": 0.19970631424375918, "step": 137, "batch_size": 64, "mean": 10.205141067504883, "std": 8.651503562927246, "min": -5.864349365234375, "p10": 1.36285400390625, "median": 7.441120147705078, "p90": 23.635669708251957, "max": 33.8299560546875, "pos_frac": 0.953125, "sample": [5.4825439453125, 13.037649154663086, 6.8674468994140625, 5.831748962402344, 10.478866577148438, 14.464813232421875, 6.3812713623046875, 29.51513671875, 6.418159484863281, 17.12747573852539, 1.4705734252929688, 5.125236511230469, 32.630615234375, 7.0058441162109375, 5.070564270019531, 5.142978668212891, 7.371337890625, 14.587539672851562, 4.6602020263671875, 9.90057373046875, 24.120132446289062, 7.831886291503906, 6.669595718383789, 33.8299560546875, 21.313995361328125, 9.6865234375, 14.0675048828125, 9.488449096679688, -1.313100814819336, 8.570133209228516, 10.435653686523438, 9.319503784179688, 28.134422302246094, 3.5687484741210938, 26.32391357421875, 4.5436859130859375, 10.373882293701172, 9.452438354492188, 22.50525665283203, 7.379158020019531, 5.022920608520508, 7.1636505126953125, 1.3166885375976562, -5.864349365234375, 3.0443115234375, 2.6232986450195312, 24.3968505859375, -1.2267532348632812, 14.918037414550781, 14.092361450195312, 7.503082275390625, 0.629241943359375, 3.9449329376220703, 12.194095611572266, 2.06573486328125, 0.18179702758789062, 4.207122802734375, 4.784332275390625, 2.4011154174804688, 18.40699005126953, 20.536972045898438, 0.772552490234375, 14.753787994384766, 20.387969970703125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000137.npy"}
{"epoch": 0.2011747430249633, "step": 138, "batch_size": 64, "mean": 7.02825403213501, "std": 7.0232367515563965, "min": -3.9825668334960938, "p10": -0.2464941024780273, "median": 5.603023529052734, "p90": 18.13671264648438, "max": 24.3814697265625, "pos_frac": 0.859375, "sample": [5.5817413330078125, 2.2859878540039062, 14.827262878417969, 14.123016357421875, 6.203132629394531, 15.871963500976562, 3.4672698974609375, 2.9280433654785156, 5.624305725097656, 21.97272491455078, 1.0289726257324219, -2.5210418701171875, 3.13909912109375, 1.2412548065185547, 3.1396484375, 5.131744384765625, 8.091751098632812, 2.6240463256835938, 1.9087944030761719, -0.56402587890625, 8.023063659667969, 20.696792602539062, -0.26363563537597656, 3.3578948974609375, 19.443267822265625, 13.455123901367188, -3.9825668334960938, 5.937652587890625, 8.083763122558594, -0.2064971923828125, 3.6859283447265625, 2.1252288818359375, 5.567535400390625, 16.597640991210938, 14.4332275390625, 23.567047119140625, 12.237533569335938, 10.18893051147461, -0.13405609130859375, 17.157859802246094, 4.63421630859375, 9.114551544189453, -1.2878265380859375, 1.3140230178833008, 4.369810104370117, -3.4390907287597656, 15.224258422851562, 5.915260314941406, -1.5140151977539062, 8.29690170288086, 6.521644592285156, 1.6622562408447266, 0.36698150634765625, 5.7623443603515625, 1.4936294555664062, 6.975303649902344, 9.776443481445312, 19.108810424804688, 5.82244873046875, 24.3814697265625, 2.0892372131347656, 18.55622100830078, 2.181549072265625, 6.404396057128906], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000138.npy"}
{"epoch": 0.2026431718061674, "step": 139, "batch_size": 64, "mean": 8.000020980834961, "std": 8.227112770080566, "min": -21.35797119140625, "p10": -1.5043479919433593, "median": 8.277154922485352, "p90": 18.875445556640624, "max": 26.75466537475586, "pos_frac": 0.859375, "sample": [-0.3403968811035156, 6.9553985595703125, 8.926376342773438, 15.167205810546875, 14.16799545288086, 9.714653015136719, 26.75466537475586, 3.764270782470703, 11.316513061523438, 3.9738845825195312, -21.35797119140625, 9.486488342285156, 10.271629333496094, 14.907073974609375, 19.47320556640625, 7.812583923339844, 10.325172424316406, 18.677139282226562, 12.165817260742188, -1.6014175415039062, 18.960433959960938, 2.952442169189453, 4.8388824462890625, 3.6782455444335938, 4.711250305175781, 1.3342361450195312, 10.430892944335938, 13.552513122558594, -3.7943115234375, 2.73101806640625, 15.623741149902344, 8.602018356323242, -2.2897415161132812, -2.601795196533203, 11.932247161865234, 6.564525604248047, 7.632759094238281, 2.5735206604003906, 0.6185035705566406, 18.1220703125, 1.6848373413085938, 7.956287384033203, 8.670173645019531, 2.3033905029296875, 9.710372924804688, 6.5439910888671875, 0.8494873046875, 13.168338775634766, 0.18305587768554688, -1.7154083251953125, 26.577560424804688, 9.597496032714844, 4.249645233154297, 6.469432830810547, 25.231109619140625, -1.5034027099609375, 23.712982177734375, 9.949966430664062, 20.446876525878906, 4.47528076171875, 8.5980224609375, 10.763931274414062, 8.848918914794922, -1.5047531127929688], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000139.npy"}
{"epoch": 0.20411160058737152, "step": 140, "batch_size": 64, "mean": 7.332958221435547, "std": 7.606283664703369, "min": -5.84412956237793, "p10": -1.088503265380859, "median": 6.312124252319336, "p90": 17.851006317138673, "max": 24.522064208984375, "pos_frac": 0.84375, "sample": [6.320560455322266, 4.206632614135742, 19.053466796875, 15.019729614257812, 0.4120330810546875, 0.3333396911621094, 3.0991668701171875, 11.454963684082031, 17.122743606567383, 16.949310302734375, 14.604324340820312, -4.432159423828125, 7.143653869628906, 2.309864044189453, 9.23984146118164, 5.421058654785156, 12.391075134277344, 2.6271018981933594, 6.603412628173828, 19.009456634521484, -1.2466583251953125, 0.9335556030273438, 16.33071517944336, 17.60340118408203, -0.14965057373046875, 10.069168090820312, 0.4495849609375, 8.349258422851562, 3.6863861083984375, 2.229522705078125, 2.8138656616210938, 1.54376220703125, 5.986125946044922, 18.794342041015625, 1.768829345703125, 15.515708923339844, -1.3163375854492188, 9.434539794921875, 12.764083862304688, 7.1717529296875, 3.15802001953125, 1.1330184936523438, 24.295867919921875, -4.8890228271484375, 6.109413146972656, 6.303688049316406, -0.7194747924804688, 24.522064208984375, 10.753181457519531, -3.8317794799804688, -5.84412956237793, -0.1273040771484375, 0.9794254302978516, 5.9467315673828125, 7.218700408935547, 11.77044677734375, 7.5238189697265625, -5.298858642578125, 24.311447143554688, 17.957122802734375, 9.998416900634766, 8.215641021728516, 3.5083770751953125, 14.6929931640625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000140.npy"}
{"epoch": 0.2055800293685756, "step": 141, "batch_size": 64, "mean": 8.8695650100708, "std": 8.220972061157227, "min": -6.452545166015625, "p10": -0.24200363159179686, "median": 7.548137664794922, "p90": 22.85269241333008, "max": 36.43144226074219, "pos_frac": 0.84375, "sample": [6.03338623046875, 6.951141357421875, 7.3408355712890625, 6.967002868652344, 10.849197387695312, 2.6549606323242188, 8.286415100097656, 1.8022804260253906, 4.467437744140625, 2.988872528076172, -0.33502960205078125, 22.88604736328125, -6.452545166015625, 6.943977355957031, -0.122314453125, 24.02680206298828, -0.32364463806152344, 6.1973876953125, 23.102127075195312, 5.354484558105469, 22.774864196777344, 25.74908447265625, 13.102998733520508, 9.160202026367188, 14.880455017089844, 3.7484130859375, 11.7374267578125, 7.9629058837890625, 4.889556884765625, 0.18229293823242188, 11.374214172363281, 8.137313842773438, 5.398250579833984, -2.0279693603515625, 24.411346435546875, 7.755439758300781, 9.597785949707031, 4.313301086425781, 3.4893722534179688, 5.386516571044922, 8.459320068359375, 12.791606903076172, 1.49188232421875, 36.43144226074219, 4.890554428100586, 8.302574157714844, 10.830680847167969, -1.862457275390625, -0.19278621673583984, -0.2435302734375, 17.2001953125, 12.547069549560547, 5.0917205810546875, -0.23844146728515625, 13.85443115234375, 9.833724975585938, 24.4862060546875, 11.450576782226562, 4.792819976806641, 12.580650329589844, 14.253807067871094, -3.6053314208984375, 18.23444366455078, 14.628402709960938], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000141.npy"}
{"epoch": 0.20704845814977973, "step": 142, "batch_size": 64, "mean": 7.893212795257568, "std": 8.378360748291016, "min": -3.476520538330078, "p10": -1.0369819641113278, "median": 6.967742919921875, "p90": 18.57354431152345, "max": 38.9609375, "pos_frac": 0.828125, "sample": [7.270851135253906, 38.9609375, 3.8576183319091797, 6.430229187011719, 12.180252075195312, 3.115478515625, 10.040008544921875, 10.798412322998047, 14.651351928710938, 6.270908355712891, 4.9992218017578125, 9.591598510742188, 6.664634704589844, 0.3641166687011719, 20.919288635253906, 26.837249755859375, 1.2231826782226562, 4.53509521484375, 2.5152587890625, 0.7084426879882812, 10.164894104003906, 19.78192138671875, 5.9829864501953125, 3.4301681518554688, 13.592941284179688, 7.726066589355469, -0.5565414428710938, 7.577024459838867, -0.7318572998046875, -0.4260997772216797, 30.07769775390625, 7.5887298583984375, 11.260406494140625, 15.328014373779297, -3.4102134704589844, 19.88421630859375, 9.364532470703125, 5.693197250366211, -3.476520538330078, 0.3923225402832031, 7.762809753417969, 7.973503112792969, -1.3484649658203125, 3.7227783203125, 2.8314056396484375, 10.346298217773438, 6.571506500244141, 5.6440582275390625, 15.753997802734375, 9.995101928710938, 11.013359069824219, -2.9116744995117188, -1.566680908203125, 9.700042724609375, 28.024658203125, 13.959548950195312, -2.7521190643310547, 7.799156188964844, 11.483749389648438, -0.771881103515625, 3.25799560546875, 8.338661193847656, 0.31036376953125, -1.1505966186523438], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000142.npy"}
{"epoch": 0.20851688693098386, "step": 143, "batch_size": 64, "mean": 7.293005466461182, "std": 8.576821327209473, "min": -12.871963500976562, "p10": -1.6230358123779296, "median": 6.172115325927734, "p90": 17.96746063232422, "max": 36.56398010253906, "pos_frac": 0.8125, "sample": [8.02907943725586, 10.368865966796875, 1.8625869750976562, -7.710700988769531, 11.54071044921875, 9.837600708007812, -1.1005973815917969, 18.1004638671875, 12.469810485839844, 5.359869003295898, 21.052337646484375, -0.7653579711914062, 6.16851806640625, 7.98828125, -1.4339981079101562, -2.3754043579101562, 1.8869972229003906, -0.9212570190429688, -4.590599060058594, 0.0778045654296875, 2.597156524658203, 8.27862548828125, -1.7040519714355469, 20.767494201660156, 5.665229797363281, 3.902496337890625, 1.4405441284179688, 15.028778076171875, 6.45648193359375, 15.862781524658203, 8.236579895019531, 4.830930709838867, 25.105377197265625, 3.2384204864501953, 9.378036499023438, 3.3504867553710938, 5.364471435546875, 8.539894104003906, 14.726333618164062, 11.233573913574219, 13.722755432128906, -1.9316253662109375, 0.6959457397460938, 7.837921142578125, -0.3567209243774414, 17.657119750976562, 0.1171112060546875, 11.523658752441406, 0.1782989501953125, 9.924850463867188, 16.172821044921875, 36.56398010253906, 9.463188171386719, 5.623695373535156, 27.12109375, 6.175712585449219, 10.522117614746094, 5.2971649169921875, 22.014663696289062, -12.871963500976562, 3.302806854248047, 8.732288360595703, -3.5517578125, 4.672554016113281], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000143.npy"}
{"epoch": 0.20998531571218795, "step": 144, "batch_size": 64, "mean": 8.005437850952148, "std": 9.913551330566406, "min": -12.08917236328125, "p10": -2.5591865539550773, "median": 6.926715850830078, "p90": 18.744670867919922, "max": 41.197364807128906, "pos_frac": 0.8125, "sample": [-3.102447509765625, 6.010257720947266, 2.5969390869140625, 3.1523056030273438, -3.5730209350585938, 4.87750244140625, 28.876197814941406, 6.826713562011719, 7.9150390625, -6.4250030517578125, 41.197364807128906, 11.127838134765625, 15.882797241210938, 19.30878448486328, 6.207088470458984, 18.399322509765625, 11.281818389892578, 2.3136062622070312, 3.6557693481445312, 28.126163482666016, 2.438140869140625, 9.595287322998047, -0.341644287109375, -1.885894775390625, 9.726097106933594, 3.240081787109375, 4.891988754272461, 8.560775756835938, 8.374870300292969, -0.15420913696289062, 12.86773681640625, 11.267135620117188, 15.589248657226562, 8.3037109375, -12.08917236328125, 8.254508972167969, 0.7338905334472656, 2.8722152709960938, -2.8477401733398438, 8.859237670898438, 6.209300994873047, 5.9457550048828125, 29.37481689453125, 17.816848754882812, 7.0267181396484375, 13.563392639160156, -0.7704696655273438, -10.969131469726562, 18.892677307128906, 11.2354736328125, 3.7083969116210938, 13.434211730957031, 5.6111907958984375, 8.550537109375, -7.271705627441406, 3.5454483032226562, 35.485389709472656, 7.761016845703125, 4.0069732666015625, 11.147071838378906, 16.313674926757812, -1.8766098022460938, 8.252799987792969, 2.442962646484375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000144.npy"}
{"epoch": 0.21145374449339208, "step": 145, "batch_size": 64, "mean": 8.439842224121094, "std": 8.86569881439209, "min": -14.291160583496094, "p10": -1.7190254211425773, "median": 8.096729278564453, "p90": 20.128185272216804, "max": 36.389427185058594, "pos_frac": 0.84375, "sample": [7.606422424316406, 11.755485534667969, 24.20697021484375, 31.466529846191406, 8.162300109863281, 17.23968505859375, 10.900146484375, -0.1886138916015625, 10.259971618652344, 16.66712188720703, -2.5899505615234375, 15.155006408691406, 6.502891540527344, 4.8699493408203125, 6.30164909362793, 3.008270263671875, 21.54955291748047, 8.281185150146484, -6.868110656738281, 3.611522674560547, 2.493938446044922, -3.7062454223632812, 1.0146217346191406, 0.8996124267578125, 10.941650390625, 5.302637100219727, 12.533958435058594, -5.468345642089844, 0.9591827392578125, 18.224349975585938, 4.073984146118164, 12.39031982421875, 15.136070251464844, -2.1583824157714844, 15.211353302001953, 13.864364624023438, 8.886222839355469, 3.153193473815918, 1.4207305908203125, 8.031158447265625, -14.291160583496094, 5.624908447265625, 22.410240173339844, 3.7385711669921875, 9.160152435302734, 36.389427185058594, 20.944114685058594, -0.8830490112304688, 9.63096809387207, 23.913497924804688, 11.573661804199219, 12.514688491821289, -2.077301025390625, 9.074138641357422, 8.236068725585938, 3.3079051971435547, 5.1981964111328125, 10.963920593261719, 6.62127685546875, 4.611194610595703, 11.666183471679688, 14.872200012207031, -0.78228759765625, 6.6300048828125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000145.npy"}
{"epoch": 0.21292217327459617, "step": 146, "batch_size": 64, "mean": 9.692916870117188, "std": 8.440576553344727, "min": -5.7920684814453125, "p10": 1.4616502761840822, "median": 7.8084716796875, "p90": 19.113049316406254, "max": 34.229583740234375, "pos_frac": 0.90625, "sample": [18.391319274902344, 18.50562286376953, 16.84691619873047, 1.5354557037353516, 2.6609344482421875, -2.4337844848632812, 19.373374938964844, 9.201919555664062, 2.7939453125, 9.379825592041016, 11.646026611328125, 15.446510314941406, 1.4506359100341797, 3.9426422119140625, 3.4464378356933594, 7.8296661376953125, 5.3338165283203125, 14.623382568359375, 30.511825561523438, 5.787303924560547, 3.0203819274902344, 2.136962890625, 6.646781921386719, 15.5201416015625, -5.13897705078125, 15.271690368652344, -2.7677536010742188, 11.041694641113281, 16.268756866455078, 20.128463745117188, 1.4873504638671875, 2.484424591064453, 7.450693130493164, 16.810211181640625, 29.512855529785156, -2.4823532104492188, 4.88134765625, 3.943338394165039, 7.7872772216796875, 5.5449371337890625, 22.531082153320312, 11.3919677734375, 16.48267364501953, 7.187858581542969, 2.5706329345703125, 14.417556762695312, 17.810558319091797, -5.7920684814453125, 12.9888916015625, 11.863174438476562, 26.538040161132812, 34.229583740234375, 8.805221557617188, 7.067447662353516, 6.51409912109375, 17.988204956054688, 4.424995422363281, 8.615348815917969, 12.761894226074219, 5.931495666503906, 4.921173095703125, 8.717002868652344, -0.8697509765625, 7.427589416503906], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000146.npy"}
{"epoch": 0.2143906020558003, "step": 147, "batch_size": 64, "mean": 10.242698669433594, "std": 8.834641456604004, "min": -7.248264312744141, "p10": -1.7599906921386719, "median": 9.943134307861328, "p90": 20.61955318450928, "max": 32.904449462890625, "pos_frac": 0.828125, "sample": [26.64470672607422, 5.176845550537109, 18.284454345703125, 18.127220153808594, 8.671310424804688, 2.786376953125, 7.474357604980469, 15.787010192871094, -2.8894424438476562, 6.10400390625, 9.281183242797852, 20.2132568359375, 20.79368019104004, 9.163818359375, 18.268924713134766, 26.498428344726562, 12.537071228027344, 13.909000396728516, -1.7953414916992188, -2.254913330078125, 16.66954803466797, -6.088859558105469, -7.248264312744141, 11.193855285644531, 26.11822509765625, -1.3391304016113281, 13.435859680175781, 16.671611785888672, 13.29173469543457, -3.640827178955078, 5.167671203613281, 11.283039093017578, 4.843086242675781, 1.8583793640136719, 9.017425537109375, 19.713729858398438, 4.612733840942383, 32.904449462890625, 6.288597106933594, 19.852020263671875, 9.462936401367188, 7.380867004394531, -0.548004150390625, 0.6248092651367188, -3.3003616333007812, 13.250141143798828, 8.795806884765625, 18.656492233276367, 21.94617462158203, 1.0272293090820312, -1.6775054931640625, 17.074302673339844, 16.946434020996094, 14.375564575195312, 15.102043151855469, -1.2629585266113281, 21.908172607421875, 10.423332214355469, 8.931159973144531, 13.810989379882812, 11.32830810546875, 5.054054260253906, 7.2658843994140625, 11.570014953613281], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000147.npy"}
{"epoch": 0.21585903083700442, "step": 148, "batch_size": 64, "mean": 8.08431625366211, "std": 9.565133094787598, "min": -10.767414093017578, "p10": -1.4586224555969234, "median": 6.187158584594727, "p90": 20.415539550781254, "max": 37.314735412597656, "pos_frac": 0.84375, "sample": [-2.4570693969726562, 3.930217742919922, 5.73565673828125, 21.650146484375, 24.558212280273438, -3.1829605102539062, 2.5393333435058594, 5.542198181152344, 3.521636962890625, 36.348304748535156, 9.535469055175781, 19.27496337890625, 4.027629852294922, 2.1519775390625, 16.663475036621094, 20.90435791015625, 3.2982025146484375, -0.9606781005859375, 10.249832153320312, 8.024856567382812, 5.046165466308594, 11.21185302734375, 3.776029586791992, 6.3410797119140625, 37.314735412597656, 4.064846038818359, 8.255348205566406, -8.00537109375, 16.34375, -1.9677314758300781, 30.104934692382812, 6.116455078125, 8.384639739990234, 0.16119384765625, -0.9971466064453125, -1.656397819519043, 6.762580871582031, 0.22314453125, 13.67120361328125, -10.767414093017578, 8.301490783691406, 12.472114562988281, -1.6952629089355469, 7.524742126464844, 5.794677734375, 8.454055786132812, 11.78521728515625, 1.0570182800292969, 8.087249755859375, 13.080757141113281, 11.275581359863281, 10.193534851074219, 15.329004287719727, 5.3043060302734375, -0.700164794921875, 1.7287607192993164, 17.614280700683594, 13.039276123046875, 31.260345458984375, 1.3941154479980469, 1.459686279296875, 0.8902587890625, 6.257862091064453, 1.7476329803466797], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000148.npy"}
{"epoch": 0.2173274596182085, "step": 149, "batch_size": 64, "mean": 8.242945671081543, "std": 10.114802360534668, "min": -6.1544036865234375, "p10": -1.8357856750488277, "median": 6.2925262451171875, "p90": 22.263747406005862, "max": 40.47991943359375, "pos_frac": 0.78125, "sample": [32.405616760253906, -0.2801666259765625, 36.313934326171875, 5.5895843505859375, -1.4087905883789062, 5.594108581542969, 0.0216827392578125, 7.587653160095215, 3.539775848388672, 8.235763549804688, 3.608274459838867, 14.294113159179688, 8.68560791015625, -6.1544036865234375, 10.908683776855469, 6.871063232421875, 14.020179748535156, 6.0623779296875, 8.191024780273438, 1.5875778198242188, 9.087081909179688, -2.52667236328125, -0.8257408142089844, 0.5265636444091797, 6.192638397216797, 5.212150573730469, 4.734539031982422, 5.852199554443359, 10.559577941894531, 23.401596069335938, -5.23016357421875, -0.29952239990234375, 18.43354034423828, -0.23266983032226562, -0.9474411010742188, 15.009063720703125, 40.47991943359375, 6.392414093017578, 1.4697494506835938, 20.80108642578125, 5.605720520019531, 19.135154724121094, 7.221767425537109, 7.446174621582031, 1.1053009033203125, 15.04644775390625, 6.936882019042969, 25.220123291015625, -2.176666259765625, 8.63970947265625, 22.75347900390625, 21.12104034423828, 3.3079833984375, 7.23321533203125, 0.872650146484375, -2.0187835693359375, -0.23629379272460938, 12.037410736083984, 0.16656494140625, 31.016448974609375, 13.201950073242188, -5.7636871337890625, 9.509521484375, -3.5971450805664062], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000149.npy"}
{"epoch": 0.21879588839941264, "step": 150, "batch_size": 64, "mean": 9.116851806640625, "std": 10.137989044189453, "min": -8.202476501464844, "p10": -0.9540191650390623, "median": 6.867000579833984, "p90": 19.541014099121096, "max": 49.64814758300781, "pos_frac": 0.875, "sample": [9.2506103515625, 38.1669921875, 10.551902770996094, 7.702049255371094, 18.110244750976562, -5.775062561035156, 4.507392883300781, 16.47330093383789, 11.508880615234375, -1.05206298828125, 49.64814758300781, -1.6542205810546875, 19.055248260498047, 2.0093994140625, 5.908119201660156, 27.66058349609375, 5.7113494873046875, 3.4256668090820312, 10.749153137207031, 15.104339599609375, 6.334297180175781, 25.571044921875, 9.355827331542969, 11.233100891113281, 16.753360748291016, 7.3997039794921875, 10.3504638671875, 17.33684539794922, 2.6933021545410156, 17.23871612548828, 3.0542259216308594, 5.752494812011719, 0.13382720947265625, 24.781158447265625, 1.08062744140625, 1.1885204315185547, 5.948371887207031, 11.894824981689453, 19.301040649414062, -5.556007385253906, 18.33257293701172, 19.64385986328125, 10.495229721069336, -0.725250244140625, -4.257453918457031, 0.8809490203857422, -8.202476501464844, 7.537771224975586, 9.173583984375, 4.821422576904297, -2.7358856201171875, 15.260005950927734, 4.776432037353516, 3.0171546936035156, 20.292598724365234, 9.939308166503906, 5.39208984375, 0.5932197570800781, 0.3204307556152344, 5.2867584228515625, 3.9954376220703125, 3.5718040466308594, 2.2199554443359375, 14.941219329833984], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000150.npy"}
{"epoch": 0.22026431718061673, "step": 151, "batch_size": 64, "mean": 8.495565414428711, "std": 10.190389633178711, "min": -8.87677001953125, "p10": -2.507686233520508, "median": 6.676877975463867, "p90": 22.053808593750002, "max": 46.813262939453125, "pos_frac": 0.78125, "sample": [-3.4674072265625, 7.705207824707031, 12.451629638671875, 4.896209716796875, -2.4979209899902344, 19.67321014404297, -1.8263778686523438, 2.7977066040039062, -7.006378173828125, 4.1893310546875, 11.254920959472656, -2.8547210693359375, -2.9291152954101562, 1.1315841674804688, 7.535003662109375, -1.9724960327148438, 3.2510147094726562, 10.531112670898438, 11.279251098632812, 22.926551818847656, 46.813262939453125, 7.569698333740234, 4.9329833984375, 1.8162689208984375, -2.511871337890625, 19.291061401367188, 6.991279602050781, 4.66741943359375, -0.1489410400390625, 25.237548828125, 24.224624633789062, 13.869956970214844, 6.056865692138672, 18.004430770874023, -0.7135009765625, 6.651561737060547, 7.443912506103516, -0.2939109802246094, 6.7021942138671875, 20.030609130859375, 9.247325897216797, 16.633270263671875, 22.678985595703125, -2.0366973876953125, 5.458106994628906, 20.512474060058594, 27.497726440429688, -7.568016052246094, 18.787235260009766, -8.87677001953125, 1.1062774658203125, 3.1183547973632812, 9.927574157714844, 6.176727294921875, 21.932662963867188, 0.32294178009033203, 6.178367614746094, 9.239906311035156, 17.940643310546875, 2.27069091796875, 11.191120147705078, 22.105728149414062, 0.5092048645019531, 15.658531188964844], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000151.npy"}
{"epoch": 0.22173274596182085, "step": 152, "batch_size": 64, "mean": 9.964032173156738, "std": 9.50886344909668, "min": -10.919769287109375, "p10": -0.8539005279541013, "median": 9.455989837646484, "p90": 20.39683990478516, "max": 40.4671630859375, "pos_frac": 0.859375, "sample": [2.8370819091796875, 6.840858459472656, 37.49042510986328, 16.424293518066406, 4.178138732910156, 10.684425354003906, 3.3893585205078125, 10.420379638671875, 40.4671630859375, -5.396389007568359, 10.33270263671875, 6.8105316162109375, 9.136974334716797, 10.583290100097656, 12.63201904296875, 7.004814147949219, 5.008689880371094, 18.135616302490234, 11.321220397949219, -1.9813919067382812, 13.883392333984375, 5.427528381347656, 2.439493179321289, 10.438713073730469, 5.505928039550781, 17.7752685546875, 10.941314697265625, 22.146392822265625, 7.363800048828125, 6.414283752441406, -1.178955078125, 8.2333984375, 21.101547241210938, -1.9994583129882812, 8.226293563842773, 9.775005340576172, 6.8818511962890625, 16.793075561523438, 2.2290496826171875, -2.7739105224609375, 20.981170654296875, 14.116790771484375, 5.353296279907227, 19.033401489257812, -0.9673652648925781, -10.919769287109375, 10.49530029296875, 2.74761962890625, 13.577220916748047, 38.98603820800781, 8.625473022460938, 3.2948760986328125, 13.216156005859375, 3.8602142333984375, 24.768508911132812, -0.4192466735839844, 10.34228515625, 3.6082000732421875, 13.587738037109375, 18.70807647705078, 13.57931900024414, -0.5891494750976562, 11.482940673828125, 14.284740447998047], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000152.npy"}
{"epoch": 0.22320117474302498, "step": 153, "batch_size": 64, "mean": 7.391656875610352, "std": 9.438112258911133, "min": -12.55160903930664, "p10": -3.2723186492919916, "median": 5.683196067810059, "p90": 20.481260681152346, "max": 37.82478332519531, "pos_frac": 0.78125, "sample": [4.899829864501953, -2.5263519287109375, 9.114822387695312, 4.361499786376953, -8.731788635253906, 19.514652252197266, -0.940582275390625, 24.43206787109375, 10.98983383178711, 11.169151306152344, 4.520904541015625, -2.752532958984375, 8.4112548828125, -4.301338195800781, 8.532211303710938, 2.2117462158203125, 3.4405059814453125, 4.184787750244141, 1.6565208435058594, 1.5198974609375, 7.143638610839844, 22.1932373046875, 17.76158905029297, 12.125326156616211, 8.642333984375, 6.5590972900390625, 6.751190185546875, 5.440906524658203, 37.82478332519531, 21.215087890625, 13.925216674804688, 3.0313720703125, 23.0311279296875, -2.537435531616211, 0.3219261169433594, -12.55160903930664, 16.89678955078125, 18.422821044921875, 12.058265686035156, -2.830718994140625, 14.164283752441406, 20.838897705078125, 5.9334716796875, 11.298355102539062, -3.456340789794922, 1.7526397705078125, -3.8861732482910156, 4.524505615234375, 26.128326416015625, -6.959320068359375, 5.183685302734375, 13.729171752929688, 16.312557220458984, 3.6232948303222656, 5.925485610961914, 3.1003265380859375, 19.646774291992188, 4.001823425292969, -2.8429336547851562, 1.5274658203125, -3.6929054260253906, -0.1079864501953125, 8.30743408203125, 12.881141662597656], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000153.npy"}
{"epoch": 0.22466960352422907, "step": 154, "batch_size": 64, "mean": 11.730561256408691, "std": 9.884175300598145, "min": -5.297449111938477, "p10": 0.8723447799682619, "median": 10.860755920410156, "p90": 26.21300849914552, "max": 39.6763916015625, "pos_frac": 0.921875, "sample": [1.0187606811523438, 4.617815017700195, 6.2913970947265625, 11.442243576049805, 22.77093505859375, 17.110790252685547, 0.8095951080322266, 18.935791015625, 29.095932006835938, 7.018138885498047, 19.608070373535156, 28.320220947265625, 13.785812377929688, 11.381996154785156, 8.069766998291016, 9.762073516845703, -0.7639760971069336, 7.32293701171875, 3.2648372650146484, -5.297449111938477, 21.614364624023438, 23.175540924072266, 0.2728843688964844, 2.989788055419922, 2.0132598876953125, 8.056991577148438, 23.61351776123047, 33.912384033203125, -1.1593475341796875, 11.714424133300781, 7.779605865478516, 19.894790649414062, -1.36920166015625, 11.28934097290039, 2.9242477416992188, 10.805122375488281, 28.6085205078125, 8.232696533203125, 7.62445068359375, 21.241188049316406, 13.225959777832031, 8.531211853027344, 11.635860443115234, 17.506683349609375, 39.6763916015625, 13.985137939453125, 10.916389465332031, 17.510051727294922, 8.655143737792969, 11.006843566894531, 5.514228820800781, 18.682174682617188, 3.151905059814453, 1.24420166015625, 6.223236083984375, -4.798309326171875, 27.327075958251953, 34.703102111816406, 13.617538452148438, 3.587738037109375, 12.035552978515625, 11.989021301269531, 1.3717880249023438, 5.65673828125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000154.npy"}
{"epoch": 0.2261380323054332, "step": 155, "batch_size": 64, "mean": 9.972282409667969, "std": 11.939743995666504, "min": -29.07269287109375, "p10": -2.8694013595581054, "median": 8.282093048095703, "p90": 26.343616485595724, "max": 38.66954803466797, "pos_frac": 0.8125, "sample": [3.92010498046875, -10.346221923828125, -7.163066864013672, 18.988433837890625, 7.342185974121094, -0.06822586059570312, -1.8402748107910156, -4.6170196533203125, 9.620502471923828, 4.163820266723633, 5.547294616699219, 5.408744812011719, 2.7612991333007812, -29.07269287109375, 11.66253662109375, 7.9093170166015625, 30.549579620361328, 19.918880462646484, -3.5260391235351562, 17.611312866210938, 33.806549072265625, 10.726612091064453, 5.253425598144531, 21.527511596679688, 37.31158447265625, 4.824197769165039, 7.3470306396484375, 18.646636962890625, 8.654869079589844, 16.738250732421875, 4.751044273376465, 18.131927490234375, 12.429000854492188, 28.683929443359375, 21.35704803466797, -1.6087493896484375, 38.66954803466797, 29.217010498046875, -9.779731750488281, 18.825119018554688, -2.3425865173339844, 5.328775405883789, 12.172271728515625, 6.478912353515625, 16.89574432373047, 11.29071044921875, 20.622833251953125, 4.050636291503906, 13.068817138671875, 7.03662109375, 16.69641876220703, 10.817314147949219, 28.40766143798828, 7.868869781494141, 16.54754066467285, 0.1322174072265625, 19.180908203125, 4.767059326171875, 9.200027465820312, 5.477771759033203, 5.930427551269531, 9.991615295410156, -2.7626819610595703, -2.9151382446289062], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000155.npy"}
{"epoch": 0.2276064610866373, "step": 156, "batch_size": 64, "mean": 8.519001007080078, "std": 10.094517707824707, "min": -11.016250610351562, "p10": -3.8977592468261713, "median": 8.591775894165039, "p90": 20.97410964965821, "max": 39.23042297363281, "pos_frac": 0.796875, "sample": [-4.197990417480469, 25.975868225097656, 39.23042297363281, 16.435325622558594, -2.3958511352539062, 2.6089935302734375, 12.297035217285156, 0.20642852783203125, -6.560638427734375, 14.410552978515625, 8.889480590820312, 11.820789337158203, -6.442417144775391, 7.298004150390625, 2.7943992614746094, 1.2496871948242188, 12.76657485961914, 15.69107437133789, 26.638504028320312, 9.743240356445312, -4.8460693359375, 24.530059814453125, 17.503662109375, 12.151046752929688, 5.726165771484375, 0.33290863037109375, 4.895970344543457, 14.901641845703125, 32.83307647705078, 2.34564208984375, 14.597648620605469, 18.649192810058594, 16.02069854736328, 12.809333801269531, 10.839597702026367, 1.8486270904541016, 3.27984619140625, -11.016250610351562, 11.261772155761719, -3.1972198486328125, 12.753826141357422, -0.9232063293457031, 14.904640197753906, 19.457412719726562, 1.8128814697265625, 22.121322631835938, 21.624122619628906, 10.017545700073242, 8.574951171875, 8.531867980957031, 0.4051952362060547, -1.2364082336425781, -1.9805068969726562, 17.477922439575195, -4.47003173828125, 15.546600341796875, 5.121986389160156, 2.7819671630859375, 5.616035461425781, 4.816436767578125, 8.608600616455078, -9.603538513183594, -0.706817626953125, 10.036396026611328], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000156.npy"}
{"epoch": 0.2290748898678414, "step": 157, "batch_size": 64, "mean": 10.88486385345459, "std": 10.052377700805664, "min": -9.046928405761719, "p10": -1.5480022430419922, "median": 8.774080276489258, "p90": 24.005062103271484, "max": 38.55926513671875, "pos_frac": 0.875, "sample": [12.285469055175781, 17.67913818359375, -6.624351501464844, 13.055267333984375, 7.9315643310546875, 4.696750640869141, 9.293582916259766, 0.55279541015625, 6.711006164550781, 38.55926513671875, 20.937225341796875, 15.695358276367188, 8.563735961914062, 24.04986572265625, 23.702117919921875, 9.982566833496094, 18.646690368652344, 4.736522674560547, 15.716377258300781, 21.941238403320312, 8.847694396972656, -2.6299896240234375, 26.338104248046875, 3.683338165283203, 14.144515991210938, 1.9335174560546875, 5.7304840087890625, 19.35370635986328, 6.158935546875, 25.06922149658203, 3.6802978515625, 18.054420471191406, -1.588348388671875, -1.4538612365722656, 7.5995941162109375, 23.90052032470703, 15.15234375, -2.5208663940429688, 8.531795501708984, 11.470420837402344, 5.448101043701172, 20.262283325195312, 5.1874542236328125, 3.3483963012695312, 3.200214385986328, 10.86508560180664, 10.977630615234375, -3.283576011657715, 6.19927978515625, 11.410484313964844, 37.457977294921875, 8.243919372558594, 2.4798202514648438, 5.027503967285156, 16.779953002929688, 17.127166748046875, 33.58500671386719, 0.6641998291015625, 8.70046615600586, 7.388702392578125, 24.111846923828125, 13.309455871582031, -2.381169319152832, -9.046928405761719], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000157.npy"}
{"epoch": 0.2305433186490455, "step": 158, "batch_size": 64, "mean": 11.145757675170898, "std": 9.649316787719727, "min": -10.977569580078125, "p10": 1.0707992553710943, "median": 9.932426452636719, "p90": 25.29023246765137, "max": 36.1484375, "pos_frac": 0.921875, "sample": [4.960184097290039, 2.666292190551758, 10.000045776367188, 10.480369567871094, 7.048286437988281, 4.211467742919922, 5.616132736206055, 10.027069091796875, 9.86480712890625, 24.159122467041016, 3.0105438232421875, 33.547401428222656, 31.77288818359375, 6.007106781005859, 8.582817077636719, -10.977569580078125, -6.206418991088867, 15.007026672363281, -4.4857635498046875, 13.615325927734375, 17.428131103515625, -0.9532394409179688, 5.843353271484375, 16.05193328857422, 15.665924072265625, 10.978416442871094, 17.595718383789062, 9.588207244873047, 8.72027587890625, 2.0553627014160156, 16.47164535522461, 7.424858093261719, 11.91250228881836, 4.80633544921875, 9.515380859375, 18.678314208984375, 19.210716247558594, 10.333576202392578, 16.14398193359375, 36.1484375, 14.841766357421875, 4.6787567138671875, 3.462078094482422, 28.939453125, 11.303779602050781, 28.93716049194336, 5.892194747924805, 0.2889556884765625, 14.571151733398438, 7.156822204589844, 12.439697265625, 22.501197814941406, 0.8650665283203125, 25.774993896484375, 3.6083335876464844, 2.9969558715820312, 20.038787841796875, -2.0050926208496094, 16.37127685546875, 6.1948089599609375, 13.95196533203125, 1.55084228515625, 29.39031982421875, 7.050266265869141], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000158.npy"}
{"epoch": 0.23201174743024963, "step": 159, "batch_size": 64, "mean": 11.194160461425781, "std": 11.748787879943848, "min": -16.46259307861328, "p10": -1.3707260131835937, "median": 9.75372314453125, "p90": 26.227711486816414, "max": 47.10014343261719, "pos_frac": 0.828125, "sample": [22.311676025390625, 33.16871643066406, 10.502632141113281, 0.7715988159179688, -0.737396240234375, 20.42138671875, -3.7296218872070312, 7.244972229003906, 20.25634765625, 12.830230712890625, 11.085647583007812, -2.002727508544922, 0.872528076171875, -1.4276580810546875, 6.587543487548828, 2.1163177490234375, 33.455535888671875, 17.26811981201172, 11.043636322021484, 20.29043197631836, 47.10014343261719, 12.0072021484375, -7.8420867919921875, 5.20074462890625, 2.814725875854492, 18.623886108398438, 6.9439239501953125, -16.46259307861328, 6.895965576171875, 9.03875732421875, 9.44073486328125, 11.240272521972656, -6.609161376953125, 12.290695190429688, 15.975875854492188, 4.824291229248047, 16.149396896362305, 10.06671142578125, -0.6095638275146484, 17.606475830078125, 22.94378662109375, 35.678436279296875, 8.15230941772461, 12.164894104003906, 3.2883834838867188, 26.873016357421875, 15.983627319335938, -0.29668235778808594, -5.216403961181641, 24.722000122070312, 15.508520126342773, 0.111572265625, 6.9922943115234375, -1.237884521484375, 6.68035888671875, 34.325374603271484, 5.408538818359375, 22.330841064453125, 6.754707336425781, 32.126953125, 7.054290771484375, 14.768070220947266, 9.0643310546875, 15.218639373779297], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000159.npy"}
{"epoch": 0.23348017621145375, "step": 160, "batch_size": 64, "mean": 10.25672721862793, "std": 10.002972602844238, "min": -12.91427993774414, "p10": -1.8541896820068358, "median": 10.914588928222656, "p90": 20.93763885498047, "max": 41.16215515136719, "pos_frac": 0.84375, "sample": [14.0416259765625, 15.735267639160156, 11.799354553222656, 13.079887390136719, -1.5752220153808594, 7.752372741699219, 10.817100524902344, 15.349197387695312, 3.455718994140625, -0.7886848449707031, -5.24003791809082, 11.012077331542969, 13.606529235839844, 12.895103454589844, 15.154361724853516, 15.549842834472656, 0.12759780883789062, 20.427398681640625, 4.7592620849609375, 3.2468414306640625, 27.82525634765625, 9.917518615722656, 15.450103759765625, 13.792320251464844, 9.196510314941406, -7.250862121582031, 4.084754943847656, 18.339637756347656, 41.16215515136719, 20.822463989257812, 1.8618049621582031, 6.107536315917969, 15.549949645996094, 17.418609619140625, -12.91427993774414, 6.777137756347656, -9.089683532714844, 7.961540222167969, 25.87640380859375, 8.986885070800781, 7.422855377197266, 13.930198669433594, 14.7239990234375, 16.30718994140625, 11.122650146484375, 10.272430419921875, 4.067237854003906, 1.5108089447021484, -3.2451705932617188, 20.98699951171875, 20.33718490600586, 28.660118103027344, 14.187515258789062, 2.0614395141601562, 11.801239013671875, 4.762657165527344, 3.2901649475097656, -0.17891883850097656, -7.113151550292969, 28.254913330078125, -1.9737472534179688, 24.2762451171875, 18.517333984375, 9.366962432861328], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000160.npy"}
{"epoch": 0.23494860499265785, "step": 161, "batch_size": 64, "mean": 9.975399017333984, "std": 11.314095497131348, "min": -11.956798553466797, "p10": -3.5888988494873035, "median": 9.8943510055542, "p90": 24.487240600585938, "max": 39.64018249511719, "pos_frac": 0.765625, "sample": [-2.5305938720703125, 13.418426513671875, 9.829133987426758, 33.200523376464844, 30.297576904296875, 11.3349609375, 6.7023162841796875, -1.6538238525390625, 26.322265625, -5.622669219970703, 12.354522705078125, 26.67694854736328, 2.8466873168945312, 10.53494644165039, 15.273792266845703, 0.5778732299804688, 23.752548217773438, 17.984046936035156, 11.895538330078125, 39.64018249511719, -6.798828125, -2.3822784423828125, 20.09851837158203, -1.2362537384033203, 5.8616485595703125, -1.5919971466064453, 13.636962890625, 14.228744506835938, 24.13019561767578, 22.937469482421875, 14.147926330566406, -11.201507568359375, 5.648591995239258, -4.043376922607422, 17.931488037109375, 24.21429443359375, 6.604499816894531, -4.041603088378906, 21.55016326904297, 7.794563293457031, -9.494232177734375, 4.38494873046875, -11.956798553466797, 12.983875274658203, 7.0950164794921875, 2.2647247314453125, 6.3869781494140625, 16.519062042236328, -1.8019828796386719, 11.264190673828125, -0.29791259765625, 10.589584350585938, 8.09661865234375, 23.546913146972656, 13.287567138671875, 24.604217529296875, 3.531036376953125, 29.40839385986328, 9.919122695922852, -2.5325889587402344, 7.573307037353516, 10.161376953125, 9.869579315185547, 2.6981048583984375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000161.npy"}
{"epoch": 0.23641703377386197, "step": 162, "batch_size": 64, "mean": 10.478342056274414, "std": 10.924077987670898, "min": -26.556427001953125, "p10": -1.2088203430175768, "median": 10.112613677978516, "p90": 24.881641387939457, "max": 39.14839172363281, "pos_frac": 0.890625, "sample": [0.13521575927734375, 5.9971466064453125, 17.404388427734375, 10.537979125976562, 0.525665283203125, 20.766738891601562, 12.435237884521484, 18.328994750976562, 13.543991088867188, 4.841800689697266, -7.47282600402832, 2.835277557373047, 14.674362182617188, 6.309478759765625, 25.201492309570312, 30.928258895874023, 12.254646301269531, 0.1768341064453125, 1.059844970703125, 19.441940307617188, 20.686126708984375, 3.7616233825683594, 18.300888061523438, 2.5781326293945312, 7.450098037719727, 10.453422546386719, 16.84235382080078, 2.4193115234375, -26.556427001953125, 29.32952117919922, 7.5184478759765625, 15.903533935546875, 4.118261337280273, 0.19699859619140625, 27.4144287109375, 8.690620422363281, 21.487930297851562, 11.619110107421875, 11.992740631103516, -2.343231201171875, 14.471771240234375, 17.98590087890625, 25.764984130859375, 10.521202087402344, 5.349433898925781, 8.017745971679688, 14.361198425292969, 9.043678283691406, -2.2483978271484375, 8.145748138427734, 8.363800048828125, 24.13532257080078, 8.490692138671875, -1.7848358154296875, 39.14839172363281, 21.644119262695312, 12.291694641113281, -5.162055969238281, 3.8347625732421875, 9.771804809570312, -8.488418579101562, 0.4983978271484375, 15.402790069580078, 29.263816833496094], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000162.npy"}
{"epoch": 0.23788546255506607, "step": 163, "batch_size": 64, "mean": 13.094063758850098, "std": 13.253658294677734, "min": -9.155776977539062, "p10": -1.1585492134094233, "median": 11.179527282714844, "p90": 32.01480216979981, "max": 63.90147399902344, "pos_frac": 0.875, "sample": [5.133392333984375, 8.84515380859375, 23.108871459960938, 27.96936798095703, 2.597339630126953, 3.4093360900878906, -2.408905029296875, 18.063232421875, 4.456630706787109, 35.5438232421875, 11.276779174804688, 10.161361694335938, 11.082275390625, 32.67853546142578, 22.157428741455078, -1.43609619140625, 16.553335189819336, 35.6668701171875, 4.2795562744140625, 15.354789733886719, 1.6329154968261719, 12.739891052246094, 7.001251220703125, 20.252174377441406, -9.155776977539062, 1.5339126586914062, 11.36202621459961, 4.989208221435547, 4.97357177734375, 6.735382080078125, 42.25428771972656, -3.6359291076660156, 15.364402770996094, 6.207363128662109, -6.467460632324219, -3.9073429107666016, 2.39556884765625, 4.5566558837890625, 15.024749755859375, 14.314071655273438, 23.06890106201172, 3.2023544311523438, 16.0159912109375, 13.424957275390625, 30.46609115600586, 11.902069091796875, 2.970458984375, 37.26856994628906, 63.90147399902344, -0.5109395980834961, 9.317970275878906, 4.351921081542969, 25.560165405273438, 34.56800079345703, 24.118751525878906, 16.106048583984375, 8.223918914794922, 12.574928283691406, -3.1858749389648438, 11.40444564819336, 22.133834838867188, 7.6001129150390625, 25.651268005371094, 5.220664978027344], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000163.npy"}
{"epoch": 0.2393538913362702, "step": 164, "batch_size": 64, "mean": 8.783533096313477, "std": 9.993690490722656, "min": -20.070755004882812, "p10": -2.9875953674316404, "median": 8.71550464630127, "p90": 21.391646575927737, "max": 33.752532958984375, "pos_frac": 0.796875, "sample": [2.2731857299804688, 14.0975341796875, 16.276046752929688, 8.241546630859375, 1.0421218872070312, -3.3087158203125, 17.931671142578125, 24.44537353515625, -6.051727294921875, 9.122970581054688, 0.8128509521484375, -7.1107330322265625, 1.9325637817382812, 8.748201370239258, 17.40882110595703, 6.423408508300781, 13.223892211914062, 20.626811981201172, 7.265048980712891, 6.698678970336914, -2.8243980407714844, 4.302680969238281, 9.713150024414062, 18.643707275390625, 4.888874053955078, -6.187042236328125, 4.355703353881836, 18.784992218017578, 0.37592315673828125, -0.42379188537597656, 3.0335540771484375, -3.057537078857422, 10.453277587890625, -0.7789764404296875, 4.9370269775390625, 14.557243347167969, 9.9410400390625, 27.1844482421875, -1.0875320434570312, 33.72163391113281, -1.4342117309570312, -20.070755004882812, -5.838275909423828, 13.802055358886719, 2.406704902648926, 7.841438293457031, 21.719432830810547, 8.646507263183594, 9.129837036132812, 8.682807922363281, 14.919063568115234, 19.126575469970703, 8.81622314453125, 14.365921020507812, 33.752532958984375, 11.988082885742188, 10.511917114257812, 14.01806640625, 22.348403930664062, 6.832328796386719, -0.7017898559570312, 22.985336303710938, 12.761550903320312, 14.902870178222656], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000164.npy"}
{"epoch": 0.24082232011747431, "step": 165, "batch_size": 64, "mean": 12.753755569458008, "std": 14.481099128723145, "min": -11.555614471435547, "p10": -0.1694728851318357, "median": 10.596622467041016, "p90": 22.804380798339846, "max": 84.20907592773438, "pos_frac": 0.890625, "sample": [6.224765777587891, 6.269554138183594, 1.2036209106445312, 8.659664154052734, 14.43328857421875, 2.565812110900879, 17.713050842285156, -4.0586700439453125, 15.67523193359375, 7.467390060424805, 13.578575134277344, 2.6933364868164062, 28.883316040039062, 20.384376525878906, 15.27157974243164, 14.786453247070312, 1.3658828735351562, 22.91748046875, 5.242515563964844, 44.68685531616211, 12.503059387207031, 0.30115509033203125, 3.8600387573242188, 84.20907592773438, 16.98944854736328, 38.67138671875, 0.06935501098632812, -11.555614471435547, 17.86658477783203, 22.540481567382812, 35.083961486816406, 5.242420196533203, 22.000518798828125, 20.567718505859375, -3.0170745849609375, 3.63507080078125, 11.585502624511719, -0.27182769775390625, 2.9897689819335938, -0.6363945007324219, 9.96761703491211, 1.57421875, 7.2305450439453125, -6.639533996582031, 21.00537872314453, 16.116191864013672, 18.974700927734375, 8.166007995605469, 10.700935363769531, 4.608165740966797, 15.048919677734375, 16.026214599609375, 8.901477813720703, -5.37335205078125, 12.669715881347656, 10.4923095703125, 21.39122772216797, 7.407661437988281, 44.45423126220703, 21.655155181884766, 3.41705322265625, 18.644439697265625, 1.0844192504882812, 20.11797332763672], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000165.npy"}
{"epoch": 0.2422907488986784, "step": 166, "batch_size": 64, "mean": 12.143659591674805, "std": 14.20314884185791, "min": -9.251667022705078, "p10": -1.6419269561767573, "median": 9.857589721679688, "p90": 27.857276153564452, "max": 57.6168212890625, "pos_frac": 0.828125, "sample": [27.884414672851562, 34.32209777832031, -7.319095611572266, 55.0535888671875, 10.4658203125, 33.03726577758789, 3.8770484924316406, 10.623092651367188, 0.7260665893554688, 6.916015625, 19.802459716796875, 8.638275146484375, 4.480989456176758, 19.046554565429688, 12.150566101074219, 4.001031875610352, 3.7280426025390625, 13.644805908203125, 12.375625610351562, 12.843093872070312, 12.21340560913086, 43.53857421875, 3.1182632446289062, 3.1719589233398438, -3.2743568420410156, -4.661832809448242, 16.299259185791016, -1.0319366455078125, 4.794677734375, 54.10345458984375, 15.247390747070312, 57.6168212890625, 12.829376220703125, 21.48773193359375, -0.278961181640625, 27.79395294189453, -6.05828857421875, 3.5667266845703125, 9.249359130859375, -0.909881591796875, 14.545974731445312, 3.9195556640625, 3.0279998779296875, 13.584270477294922, 11.197635650634766, 8.897018432617188, 13.171096801757812, 24.296470642089844, 7.951271057128906, 17.989212036132812, 15.404273986816406, 7.2647857666015625, -4.585149765014648, 0.7760963439941406, 3.0521316528320312, -9.251667022705078, -1.1408500671386719, 25.620758056640625, 23.095836639404297, 15.758148193359375, 17.315269470214844, 5.168262481689453, -1.8566741943359375, 6.879066467285156], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000166.npy"}
{"epoch": 0.24375917767988253, "step": 167, "batch_size": 64, "mean": 13.465675354003906, "std": 11.786768913269043, "min": -5.7142333984375, "p10": 1.2524818420410158, "median": 10.648664474487305, "p90": 29.397566223144537, "max": 48.500335693359375, "pos_frac": 0.921875, "sample": [16.196151733398438, 3.139333724975586, 12.7786865234375, 16.79370880126953, 27.71221923828125, 24.475669860839844, 4.689971923828125, 12.067131042480469, 24.87464141845703, 10.667964935302734, 27.17718505859375, 37.45526885986328, 12.747516632080078, -5.7142333984375, 4.105560302734375, 13.971515655517578, 26.378250122070312, 10.352951049804688, 20.998069763183594, 14.895858764648438, 1.1931190490722656, 3.010112762451172, 20.551841735839844, 4.776603698730469, 8.38724136352539, 21.592987060546875, 41.92658233642578, 10.401988983154297, 4.7968292236328125, 10.880264282226562, -1.3828773498535156, 0.2069549560546875, 38.85216522216797, 6.377368927001953, 5.852874755859375, 13.22281265258789, 48.500335693359375, 9.963218688964844, 43.69242858886719, 7.0034637451171875, 9.994598388671875, 5.237998962402344, 9.089714050292969, 4.994026184082031, 30.119857788085938, 16.037353515625, 13.55172348022461, 17.163833618164062, 31.3565673828125, 10.48138427734375, 10.41607666015625, 10.629364013671875, 16.311264038085938, -2.8437652587890625, 2.327606201171875, 20.502670288085938, 6.872306823730469, 5.3426055908203125, -1.42254638671875, 6.633880615234375, 11.787704467773438, -2.5866241455078125, 12.844841003417969, 1.3909950256347656], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000167.npy"}
{"epoch": 0.24522760646108663, "step": 168, "batch_size": 64, "mean": 11.17927360534668, "std": 12.425471305847168, "min": -14.476848602294922, "p10": 0.28993129730224615, "median": 10.064657211303711, "p90": 26.06568908691407, "max": 61.41432189941406, "pos_frac": 0.90625, "sample": [11.354305267333984, 10.594741821289062, -14.476848602294922, 8.79994010925293, -1.895223617553711, 9.753257751464844, 24.14617919921875, 10.495609283447266, -12.686859130859375, 6.541511535644531, 0.33935546875, -9.452720642089844, 13.604034423828125, 9.667831420898438, 8.048812866210938, 14.580390930175781, 16.83214569091797, 19.102798461914062, 9.745613098144531, 26.888336181640625, 21.24169921875, 15.052597045898438, 8.456039428710938, 11.286201477050781, 44.50306701660156, 31.3419189453125, 0.27453041076660156, 32.58642578125, 5.7709503173828125, 11.834922790527344, -10.346595764160156, 12.942134857177734, -7.988555908203125, 11.144599914550781, 6.714778900146484, 4.565025329589844, 6.7331085205078125, 3.028108596801758, 7.8897705078125, 4.9428558349609375, 2.9811058044433594, 27.145343780517578, 21.609458923339844, 0.42423248291015625, 0.32586669921875, 3.3463993072509766, 9.832706451416016, 15.123878479003906, 12.504501342773438, 13.621971130371094, 6.517913818359375, 10.771709442138672, 7.518459320068359, 6.6338958740234375, 31.23260498046875, 10.296607971191406, 14.774856567382812, 12.32537841796875, 17.553024291992188, 1.0582962036132812, 61.41432189941406, 4.645408630371094, 20.4072265625, 19.451560974121094], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000168.npy"}
{"epoch": 0.24669603524229075, "step": 169, "batch_size": 64, "mean": 12.494746208190918, "std": 12.370039939880371, "min": -7.091527938842773, "p10": -2.2766716003417966, "median": 11.704097747802734, "p90": 26.374504852294923, "max": 52.49693298339844, "pos_frac": 0.859375, "sample": [-2.321685791015625, 14.548820495605469, 1.4703292846679688, 5.4077606201171875, 2.525787353515625, 17.709320068359375, 28.12572479248047, 20.752395629882812, 7.678314208984375, -3.0917720794677734, 6.179779052734375, 31.315452575683594, -4.5959320068359375, -0.9492645263671875, 10.730823516845703, 18.26300048828125, 8.087112426757812, 13.068878173828125, 0.2754974365234375, 2.8889923095703125, 14.569759368896484, 1.2650222778320312, 8.632034301757812, -5.094139099121094, 26.320816040039062, 22.74689483642578, 7.948169708251953, 18.062835693359375, 6.9188690185546875, 1.9130239486694336, 24.131332397460938, 14.55096435546875, 1.8108444213867188, 1.065877914428711, 20.113872528076172, -2.1716384887695312, 17.754364013671875, -7.091527938842773, 14.283653259277344, 26.39751434326172, 6.211278915405273, 49.46028137207031, 11.997966766357422, 10.42755126953125, 25.036659240722656, 6.834209442138672, 13.122261047363281, 11.259666442871094, 27.968795776367188, 14.798828125, 42.79734802246094, 7.161468505859375, 13.540706634521484, -3.7367706298828125, 52.49693298339844, 6.955986022949219, 11.410228729248047, -6.977630615234375, 15.306861877441406, 22.099239349365234, 15.014152526855469, 13.990959167480469, 23.46308135986328, 16.825790405273438], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000169.npy"}
{"epoch": 0.24816446402349487, "step": 170, "batch_size": 64, "mean": 11.533134460449219, "std": 11.139452934265137, "min": -13.605731964111328, "p10": -1.617112350463867, "median": 10.7564697265625, "p90": 27.42874755859375, "max": 37.66852569580078, "pos_frac": 0.84375, "sample": [9.378379821777344, 16.426284790039062, 6.3178558349609375, 15.200946807861328, 10.752723693847656, 2.7894325256347656, -0.4827423095703125, -1.3625984191894531, 16.140522003173828, 3.0934295654296875, 27.459552764892578, 4.102565765380859, 4.602262496948242, 5.099521636962891, 0.5030498504638672, -4.310157775878906, 18.806968688964844, 36.178958892822266, 36.135009765625, 11.917167663574219, 7.698970794677734, 12.878898620605469, 19.86162567138672, 20.702409744262695, -1.6979522705078125, 3.45196533203125, -3.7536468505859375, 10.02645492553711, -6.095062255859375, 14.52450942993164, 20.05877685546875, 37.66852569580078, 6.9434814453125, 28.385581970214844, 22.517166137695312, 1.942352294921875, 7.340484619140625, -1.4284858703613281, 6.904754638671875, 15.68994140625, 27.356868743896484, 10.5185546875, 12.508438110351562, 15.570709228515625, 33.80949401855469, 15.704193115234375, 18.38809585571289, 6.700706481933594, 16.63714599609375, 10.407716751098633, 12.245853424072266, 18.06623077392578, -3.2259674072265625, 15.982322692871094, 34.222633361816406, 10.760215759277344, 11.13958740234375, -8.397422790527344, 3.7652359008789062, -13.605731964111328, 14.811859130859375, 3.853290557861328, 18.805282592773438, 9.725378036499023], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000170.npy"}
{"epoch": 0.24963289280469897, "step": 171, "batch_size": 64, "mean": 13.098483085632324, "std": 16.056306838989258, "min": -25.112319946289062, "p10": 0.1857160568237307, "median": 9.049083709716797, "p90": 29.908365631103518, "max": 75.5731201171875, "pos_frac": 0.90625, "sample": [1.9693145751953125, 16.43531036376953, 36.35240173339844, 1.8360061645507812, 9.107994079589844, -5.996437072753906, 7.637714385986328, 3.4738311767578125, -8.355422973632812, 20.309112548828125, 18.832515716552734, 18.162525177001953, 5.846527099609375, 3.5898284912109375, 15.682968139648438, -10.289749145507812, 7.4873046875, 15.343475341796875, 10.314598083496094, 0.07462310791015625, 21.006866455078125, 27.916244506835938, 27.156570434570312, 2.7165565490722656, 28.182876586914062, 11.011314392089844, 8.99017333984375, 52.243499755859375, 3.364238739013672, 31.223045349121094, 8.489852905273438, -25.112319946289062, 57.89154052734375, 2.5421829223632812, 22.710872650146484, 0.78106689453125, 17.523536682128906, 24.284000396728516, 7.276700973510742, 23.692474365234375, 3.7006607055664062, 2.7208480834960938, 24.646392822265625, 2.299999237060547, 1.2058906555175781, 13.63836669921875, 17.025909423828125, 20.1292724609375, 0.53741455078125, 4.883232116699219, -4.8661041259765625, -2.3802947998046875, 34.153221130371094, 29.662757873535156, 5.344268798828125, 5.497539520263672, 75.5731201171875, 10.316238403320312, 12.034446716308594, 4.797843933105469, 6.155494689941406, 0.4449329376220703, 19.0621337890625, 30.013626098632812], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000171.npy"}
{"epoch": 0.2511013215859031, "step": 172, "batch_size": 64, "mean": 12.417065620422363, "std": 9.546381950378418, "min": -12.0672607421875, "p10": 0.47002944946289094, "median": 12.226600646972656, "p90": 23.018926239013673, "max": 35.141448974609375, "pos_frac": 0.9375, "sample": [12.060447692871094, 14.610279083251953, 0.2583332061767578, 1.89794921875, 19.74927520751953, 7.471092224121094, 13.119640350341797, 3.95123291015625, 12.543319702148438, 0.7681350708007812, 9.977867126464844, 10.984954833984375, -12.0672607421875, 18.928146362304688, 10.322708129882812, 21.723670959472656, 27.506668090820312, 20.65924835205078, 6.466224670410156, 9.350349426269531, 3.7191390991210938, 15.434280395507812, 20.360443115234375, 35.141448974609375, 7.297935485839844, 3.814271926879883, 20.56012725830078, 12.447486877441406, 0.041015625, 33.8985595703125, 12.05535888671875, -0.6138744354248047, 8.78408432006836, 9.519058227539062, 6.606758117675781, 23.01123046875, 16.884475708007812, 23.40053367614746, 3.1680755615234375, 12.392753601074219, -5.861232757568359, 17.658737182617188, 33.647239685058594, 17.941238403320312, -2.3252182006835938, 12.960884094238281, 0.3422698974609375, 4.840354919433594, 10.542800903320312, 17.78504180908203, 15.145095825195312, 18.974456787109375, 10.39947509765625, 15.740791320800781, 19.143775939941406, 22.27752685546875, 23.02222442626953, 16.41680908203125, 4.95513916015625, 8.65118408203125, 6.335563659667969, 2.117889404296875, 30.886474609375, 14.888221740722656], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000172.npy"}
{"epoch": 0.2525697503671072, "step": 173, "batch_size": 64, "mean": 10.324000358581543, "std": 11.206513404846191, "min": -9.384265899658203, "p10": -4.422364616394042, "median": 9.460893630981445, "p90": 23.9846923828125, "max": 39.673828125, "pos_frac": 0.828125, "sample": [-2.18072509765625, 0.41272735595703125, 4.477485656738281, 15.536636352539062, 3.050304412841797, 39.673828125, 22.90216827392578, 8.471748352050781, -3.0925445556640625, -0.36176300048828125, 17.1888427734375, 9.197059631347656, 0.5128555297851562, 1.3900527954101562, 15.085166931152344, 16.274513244628906, 13.587677001953125, 14.186613082885742, 5.622707366943359, 39.32533264160156, -6.540397644042969, -3.8121490478515625, 7.117038726806641, 12.642593383789062, 19.482398986816406, 15.094585418701172, 1.537078857421875, 1.872690200805664, 17.907203674316406, 13.202507019042969, 2.8615455627441406, 3.4620361328125, 3.97186279296875, 24.21782684326172, 15.485000610351562, 5.4385528564453125, 26.8905029296875, 20.68499755859375, 7.40186882019043, -6.456024169921875, 23.440711975097656, 23.157379150390625, 10.301403045654297, 24.579994201660156, 12.816688537597656, -5.108711242675781, 35.40608215332031, 10.055854797363281, -8.830368041992188, 12.961860656738281, 15.416311264038086, 5.9205169677734375, 8.556602478027344, 20.553390502929688, -9.384265899658203, 19.9599609375, 1.1129474639892578, -4.68388557434082, 9.724727630615234, 5.610958099365234, -5.238677978515625, 7.285820007324219, 25.93895721435547, 17.459388732910156], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000173.npy"}
{"epoch": 0.2540381791483113, "step": 174, "batch_size": 64, "mean": 13.184261322021484, "std": 13.0292329788208, "min": -21.774871826171875, "p10": -0.549943161010742, "median": 11.678722381591797, "p90": 28.021855163574223, "max": 71.04119873046875, "pos_frac": 0.84375, "sample": [24.551925659179688, -0.37592315673828125, 12.02685546875, 11.710800170898438, -4.39788818359375, 10.310791015625, 5.666259765625, 18.786815643310547, 23.058094024658203, 26.657241821289062, 8.600196838378906, -4.391883850097656, 10.504005432128906, 6.678033828735352, 24.004959106445312, 23.71329116821289, 26.026180267333984, 12.911972045898438, 14.53903579711914, -7.9326171875, 19.084396362304688, 24.436359405517578, 24.985984802246094, 8.55640983581543, 32.84930419921875, -21.774871826171875, 13.810333251953125, 15.977401733398438, 1.4537811279296875, 17.063779830932617, 9.189098358154297, 7.271018028259277, 10.954154968261719, -2.9095230102539062, 28.65106201171875, 10.432876586914062, 13.305572509765625, 7.405872344970703, -0.0504608154296875, 29.949417114257812, 20.848297119140625, 30.221572875976562, 17.904268264770508, -4.93060302734375, -0.6245231628417969, 4.863059997558594, 6.753671646118164, 29.776779174804688, 5.71875, 9.255867004394531, 12.173347473144531, 10.720430374145508, 13.028511047363281, 28.606689453125, 11.646644592285156, 14.40679931640625, 10.444183349609375, 4.1853179931640625, 2.4563674926757812, -0.05515480041503906, 71.04119873046875, 25.141845703125, 9.274173736572266, 17.645030975341797], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000174.npy"}
{"epoch": 0.2555066079295154, "step": 175, "batch_size": 64, "mean": 13.580202102661133, "std": 11.814506530761719, "min": -12.398239135742188, "p10": -0.032505798339843156, "median": 11.104940414428711, "p90": 30.406169891357422, "max": 43.76524353027344, "pos_frac": 0.890625, "sample": [10.950138092041016, 19.13599395751953, 2.2993946075439453, 18.52349853515625, 6.6046600341796875, 1.9052734375, 11.259742736816406, 9.737236022949219, 3.732624053955078, 26.267684936523438, 24.709548950195312, -0.27923583984375, 0.5431976318359375, 43.76524353027344, 42.66852569580078, 25.172462463378906, -0.6769161224365234, 3.8162918090820312, 9.599502563476562, 2.346038818359375, 23.07762908935547, 10.656478881835938, 33.06639099121094, 22.034317016601562, 8.840690612792969, 12.78173828125, 23.024620056152344, 24.85277557373047, 16.500579833984375, 12.382865905761719, 11.654510498046875, 3.2988357543945312, 3.6855621337890625, 20.362205505371094, 10.811885833740234, 13.393617630004883, -6.64959716796875, 31.53717041015625, 5.23106575012207, -2.0253658294677734, 24.255802154541016, 11.661201477050781, 14.145170211791992, 10.748090744018555, 4.986909866333008, 33.05165100097656, 22.568511962890625, -2.1577301025390625, 5.540130615234375, 35.71354675292969, 13.070503234863281, 24.808917999267578, 9.817073822021484, 3.74444580078125, 7.232685089111328, 8.804603576660156, 18.384910583496094, -3.1476478576660156, -12.398239135742188, 30.109207153320312, 30.53343963623047, 18.188827514648438, 10.14630126953125, 8.725753784179688], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000175.npy"}
{"epoch": 0.25697503671071953, "step": 176, "batch_size": 64, "mean": 13.813408851623535, "std": 15.215455055236816, "min": -14.89947509765625, "p10": -4.389229583740234, "median": 11.96527099609375, "p90": 34.99623107910157, "max": 49.650421142578125, "pos_frac": 0.8125, "sample": [16.661014556884766, 12.041694641113281, 12.885551452636719, 44.7381591796875, 28.65003204345703, 13.840965270996094, 7.900676727294922, 3.4596099853515625, -1.9150981903076172, 1.9505462646484375, 6.279140472412109, 32.953033447265625, 2.811677932739258, 15.658760070800781, 29.231826782226562, 11.283599853515625, 15.162353515625, 24.5531005859375, 6.759941101074219, -6.246623992919922, 35.87188720703125, 9.9224853515625, 29.771133422851562, 0.20651626586914062, -14.89947509765625, -4.838371276855469, 0.387298583984375, 42.31261444091797, 13.638214111328125, -0.04364776611328125, 4.274024963378906, 27.676788330078125, 28.675140380859375, 1.3362197875976562, 15.881362915039062, -1.422821044921875, 12.460350036621094, 4.8764190673828125, -5.105308532714844, 14.450767517089844, 29.041229248046875, 5.817140579223633, 24.632747650146484, 2.0280075073242188, 7.575263977050781, 39.87129211425781, 27.06768035888672, 11.888847351074219, 28.885665893554688, 46.9285888671875, 26.780487060546875, -10.004173278808594, 19.5401611328125, -3.837665557861328, -4.625614166259766, -0.275787353515625, -9.379117965698242, 8.069957733154297, 23.929824829101562, 6.7712249755859375, 49.650421142578125, 39.56645202636719, 7.567331314086914, 12.476692199707031], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000176.npy"}
{"epoch": 0.25844346549192365, "step": 177, "batch_size": 64, "mean": 15.80119800567627, "std": 17.112051010131836, "min": -5.9520721435546875, "p10": 0.27735805511474626, "median": 12.436332702636719, "p90": 33.366738128662114, "max": 79.67616271972656, "pos_frac": 0.921875, "sample": [4.45111083984375, 21.50445556640625, 22.637344360351562, 4.9895477294921875, -5.9520721435546875, 29.4730224609375, 13.707015991210938, 0.45764923095703125, 11.473003387451172, 32.19232940673828, 77.48127746582031, 6.881065368652344, 20.133012771606445, 17.632415771484375, 13.784439086914062, 14.421161651611328, 3.4520721435546875, 61.76385498046875, 4.2262115478515625, 28.981231689453125, 4.127967834472656, 14.985111236572266, 13.387260437011719, 12.367439270019531, -3.6192007064819336, 8.033746719360352, 46.052268981933594, -2.3371810913085938, 0.6938552856445312, 35.53498077392578, 14.659622192382812, 8.45792007446289, 18.737442016601562, 12.505226135253906, 22.25360107421875, 0.7345428466796875, -4.0073699951171875, 16.811342239379883, 13.946187973022461, 4.538782119750977, 7.11419677734375, 43.13194274902344, 11.920530319213867, 7.0029144287109375, 11.178009033203125, 14.117256164550781, 22.331764221191406, 1.3964805603027344, 2.5860061645507812, 7.177734375, 19.750476837158203, 8.136543273925781, 14.606605529785156, 0.2000904083251953, 31.41729736328125, 11.684844970703125, 7.920783996582031, 5.249897003173828, 25.149551391601562, 0.16280746459960938, 79.67616271972656, 33.87005615234375, 27.188232421875, -3.2471694946289062], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000177.npy"}
{"epoch": 0.2599118942731278, "step": 178, "batch_size": 64, "mean": 18.723678588867188, "std": 16.19768524169922, "min": -7.137443542480469, "p10": 0.7130981445312514, "median": 16.320388793945312, "p90": 43.78019485473634, "max": 68.36727905273438, "pos_frac": 0.90625, "sample": [20.072288513183594, 17.13402557373047, 13.643135070800781, -2.0946807861328125, 24.84325408935547, 6.610923767089844, 4.072574615478516, 4.104564666748047, 2.597431182861328, 16.220733642578125, 2.931873321533203, 19.501876831054688, 30.63385772705078, 20.664688110351562, 21.834747314453125, 19.644535064697266, 63.390594482421875, 20.81995391845703, 42.172515869140625, 5.5894775390625, 68.36727905273438, 6.327169418334961, 0.119049072265625, 21.84967803955078, 26.402385711669922, 44.469200134277344, 47.16398620605469, 12.214790344238281, -2.73175048828125, 16.4200439453125, 19.332778930664062, 52.12513732910156, -1.3746719360351562, 12.936562538146973, 21.012939453125, 32.415435791015625, 17.51959228515625, 2.099212646484375, 15.566646575927734, 32.9078369140625, 6.25885009765625, -0.63153076171875, 32.49315643310547, 12.348342895507812, 16.11945915222168, 24.28607940673828, 16.603172302246094, 26.618255615234375, 12.948810577392578, 48.8489990234375, 6.928615570068359, 32.103187561035156, 16.979766845703125, 9.16839599609375, 7.079376220703125, 16.046188354492188, -2.362060546875, 13.069572448730469, -7.137443542480469, 14.386310577392578, 55.84672546386719, 6.607124328613281, 20.059978485107422, 14.114326477050781], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000178.npy"}
{"epoch": 0.26138032305433184, "step": 179, "batch_size": 64, "mean": 12.73602294921875, "std": 16.115755081176758, "min": -21.044342041015625, "p10": -4.403768920898437, "median": 11.40704345703125, "p90": 38.60220718383791, "max": 48.811279296875, "pos_frac": 0.734375, "sample": [-3.7466201782226562, 27.70059585571289, 20.755462646484375, 25.131546020507812, 5.75507926940918, 18.877403259277344, 22.108840942382812, -0.14365386962890625, -4.5309906005859375, -2.169994354248047, 41.40851593017578, 13.805362701416016, 3.798391342163086, -1.1119651794433594, 1.1416473388671875, 21.511085510253906, -9.7462158203125, 9.691390991210938, 5.144157409667969, 22.493911743164062, 23.22431182861328, -1.865875244140625, 20.894378662109375, 1.2264137268066406, 26.512554168701172, 15.003326416015625, 6.614898681640625, 33.62700653076172, 8.72528076171875, 21.273178100585938, 1.2437591552734375, 40.73443603515625, 48.811279296875, 11.608604431152344, -5.716808319091797, 17.642433166503906, -7.405647277832031, 25.076873779296875, -3.644317626953125, 0.4792442321777344, 4.202304840087891, -21.044342041015625, 4.220417022705078, -0.72705078125, 21.85700225830078, 46.4984130859375, 19.984275817871094, 18.733612060546875, 41.382110595703125, -4.1069183349609375, 16.817710876464844, 45.051666259765625, -1.9832305908203125, 30.12908935546875, 42.664512634277344, 25.384387969970703, -4.1068572998046875, 14.321380615234375, -13.461906433105469, 3.9149818420410156, -9.799507141113281, 19.622406005859375, 2.406299591064453, 11.205482482910156], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000179.npy"}
{"epoch": 0.26284875183553597, "step": 180, "batch_size": 64, "mean": 15.557160377502441, "std": 14.6504545211792, "min": -10.894325256347656, "p10": -0.9578212738037102, "median": 13.114585876464844, "p90": 33.29262962341309, "max": 79.21377563476562, "pos_frac": 0.875, "sample": [13.509658813476562, 25.756248474121094, 16.8135986328125, -10.894325256347656, 11.8248291015625, 79.21377563476562, 28.728443145751953, 10.829833984375, 32.13259506225586, 26.216064453125, -3.0809497833251953, 9.618133544921875, -8.291885375976562, 6.092216491699219, 3.94757080078125, 37.153289794921875, 8.758773803710938, 15.265422821044922, 12.392749786376953, 9.51824951171875, 8.516101837158203, 15.350364685058594, 16.030086517333984, 26.972625732421875, 34.268638610839844, 31.74200439453125, -6.2854766845703125, 17.198001861572266, 13.567928314208984, -1.2995681762695312, 38.171730041503906, 2.9271316528320312, 19.076461791992188, 24.74365997314453, 24.314041137695312, 9.126502990722656, 8.743095397949219, 23.482391357421875, 17.789443969726562, 33.78978729248047, 15.024452209472656, 19.932373046875, 28.757347106933594, 9.059471130371094, -4.585166931152344, 2.3542747497558594, 16.97796630859375, 22.400711059570312, 5.323478698730469, 4.792655944824219, -1.3473129272460938, 24.824966430664062, 9.835731506347656, 48.14054870605469, 5.734169006347656, 2.1468582153320312, 11.76287841796875, 6.687896728515625, 12.719512939453125, 7.466766357421875, 35.48738098144531, 12.338897705078125, -0.16041183471679688, 16.253616333007812], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000180.npy"}
{"epoch": 0.2643171806167401, "step": 181, "batch_size": 64, "mean": 12.171871185302734, "std": 13.419724464416504, "min": -11.083984375, "p10": -2.2442561149597164, "median": 9.4342041015625, "p90": 28.15670013427735, "max": 51.76856994628906, "pos_frac": 0.828125, "sample": [9.395751953125, 0.94525146484375, 12.501077651977539, 29.02356719970703, 15.166091918945312, -3.375425338745117, 1.4880409240722656, -0.19498062133789062, 0.379974365234375, 22.489303588867188, 22.588062286376953, 13.033992767333984, -1.8951807022094727, 21.650802612304688, 7.692352294921875, 7.642604827880859, 7.346649169921875, 11.892509460449219, 17.313617706298828, -0.7803268432617188, 9.612834930419922, -11.083984375, 33.16734313964844, 25.3707275390625, 5.4166107177734375, 7.921222686767578, -6.67718505859375, 48.57659912109375, 30.822429656982422, 15.62629508972168, 5.296836853027344, 19.2396240234375, -4.5268707275390625, 22.605457305908203, 11.666481018066406, 23.474994659423828, 24.408493041992188, 4.3697052001953125, 20.370834350585938, 2.7528228759765625, 4.271528244018555, -4.9466094970703125, 33.47425079345703, 13.838031768798828, 51.76856994628906, 16.71837043762207, -2.39385986328125, 8.589370727539062, 15.95880126953125, 8.006690979003906, 26.134010314941406, 51.402740478515625, -4.006050109863281, 9.47265625, 7.368900299072266, -0.4836292266845703, 6.306034088134766, 20.596054077148438, 14.2791748046875, 14.907852172851562, 1.68060302734375, 3.07281494140625, 0.25548553466796875, 0.01293182373046875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000181.npy"}
{"epoch": 0.2657856093979442, "step": 182, "batch_size": 64, "mean": 15.727121353149414, "std": 17.087377548217773, "min": -18.957748413085938, "p10": -2.2546096801757805, "median": 14.643694400787354, "p90": 35.88784408569337, "max": 69.15199279785156, "pos_frac": 0.859375, "sample": [23.301345825195312, 14.195220947265625, 36.492515563964844, 22.95556640625, 25.052825927734375, 34.47694396972656, 61.400848388671875, 16.6820068359375, 69.15199279785156, 14.641366004943848, 4.891632080078125, 18.19012451171875, -10.077396392822266, -0.6255064010620117, 26.790069580078125, -18.280242919921875, 23.944461822509766, 25.817062377929688, 2.96004581451416, 12.531322479248047, 15.156158447265625, -11.224069595336914, 31.70920181274414, 18.967933654785156, 9.376514434814453, -1.455841064453125, 12.024276733398438, 1.5281448364257812, -3.3109970092773438, 16.47747802734375, 14.64602279663086, 15.294052124023438, 33.94512176513672, 18.08454132080078, 17.346542358398438, 22.686691284179688, 7.661285400390625, -18.957748413085938, 4.8780517578125, 40.7264404296875, 1.24908447265625, 14.617450714111328, 5.818611145019531, 9.222023010253906, 45.11427307128906, 11.339942932128906, 15.58419418334961, 34.02241134643555, 14.61468505859375, -2.5969390869140625, 46.22483825683594, 19.441850662231445, 5.569011688232422, 3.9785079956054688, 7.730186462402344, 3.8569507598876953, -12.212890625, 0.45760345458984375, 43.84519958496094, 1.2837791442871094, 29.727386474609375, 6.7477264404296875, 24.097686767578125, 26.750160217285156], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000182.npy"}
{"epoch": 0.26725403817914833, "step": 183, "batch_size": 64, "mean": 16.041793823242188, "std": 14.990285873413086, "min": -10.117446899414062, "p10": 0.23574180603027353, "median": 14.007205963134766, "p90": 34.58793487548828, "max": 55.209930419921875, "pos_frac": 0.90625, "sample": [30.733261108398438, 26.671737670898438, 3.8248672485351562, 16.075443267822266, 3.438182830810547, 29.890846252441406, 40.94902038574219, -7.246002197265625, 12.663993835449219, -5.892856597900391, 17.655441284179688, 55.209930419921875, 14.65521240234375, 2.9033279418945312, -3.557373046875, 11.43917465209961, 51.26239013671875, 34.413970947265625, 12.201309204101562, 3.600492477416992, 5.155853271484375, 17.786354064941406, 14.256515502929688, 15.026885986328125, 0.51348876953125, 1.3055925369262695, 34.66249084472656, 32.01359558105469, 33.845706939697266, 6.006988525390625, 49.528465270996094, 12.785446166992188, 12.599180221557617, 2.0791072845458984, 25.663848876953125, 0.32602691650390625, 14.496721267700195, 26.977874755859375, 16.599369049072266, 35.02371597290039, -2.80181884765625, -5.1508941650390625, 27.80865478515625, 22.359527587890625, 11.225875854492188, 46.90601348876953, 9.713005065917969, 0.4227294921875, 13.757896423339844, 18.26146697998047, 21.962806701660156, 5.432029724121094, 21.919219970703125, 13.302284240722656, 6.837913513183594, 32.981483459472656, 6.6971893310546875, 14.927803039550781, 4.9859466552734375, 0.19704818725585938, 31.81256103515625, -10.117446899414062, 17.2896728515625, 8.398185729980469], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000183.npy"}
{"epoch": 0.2687224669603524, "step": 184, "batch_size": 64, "mean": 10.603963851928711, "std": 15.783801078796387, "min": -47.2125244140625, "p10": -4.570604324340819, "median": 9.133541107177734, "p90": 32.94616851806641, "max": 42.2587890625, "pos_frac": 0.734375, "sample": [20.315628051757812, 15.761238098144531, 0.6661605834960938, 29.454376220703125, 42.03345489501953, 1.21368408203125, 11.896652221679688, 1.1603546142578125, 42.2587890625, -1.4420051574707031, 11.652145385742188, -0.35190582275390625, 12.461811065673828, 13.819839477539062, -1.4077529907226562, 28.922691345214844, 18.736373901367188, 32.934814453125, 32.56871032714844, 38.295799255371094, 2.4523582458496094, 6.595985412597656, -47.2125244140625, 14.5572509765625, -0.38106536865234375, 26.178298950195312, 12.529888153076172, -0.342681884765625, -3.151336669921875, 14.43048095703125, 1.1887969970703125, -15.670867919921875, 14.405036926269531, 35.13085174560547, -3.6152267456054688, 19.95575714111328, 17.118019104003906, 5.779945373535156, 18.23957061767578, 4.4390869140625, 22.357315063476562, -4.097469329833984, 5.276985168457031, -4.77337646484375, 10.391845703125, 7.875236511230469, -0.3083839416503906, 4.733295440673828, 21.743003845214844, -3.0219039916992188, -5.14581298828125, 6.113136291503906, 3.09796142578125, 12.31351089477539, -5.3660888671875, -11.812347412109375, 37.78209686279297, 36.65937042236328, 7.722675323486328, 19.248497009277344, 2.881561279296875, -8.663818359375, 17.116891860961914, 32.95103454589844], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000184.npy"}
{"epoch": 0.2701908957415565, "step": 185, "batch_size": 64, "mean": 12.89339542388916, "std": 16.69576644897461, "min": -20.958749771118164, "p10": -7.054742813110351, "median": 13.453571319580078, "p90": 34.44989128112794, "max": 67.961181640625, "pos_frac": 0.8125, "sample": [-20.958749771118164, -10.560813903808594, 5.602176666259766, -6.2561798095703125, 7.584259033203125, 3.9876937866210938, -3.029294967651367, 0.8728256225585938, 27.817230224609375, 15.282814025878906, 16.078357696533203, 49.53410339355469, 15.643775939941406, 16.770347595214844, 12.770065307617188, 19.15448760986328, 13.047004699707031, -1.8542861938476562, 23.170364379882812, 39.021728515625, -7.289661407470703, 20.78508758544922, 4.836463928222656, 1.33062744140625, -6.506599426269531, -12.552513122558594, 21.57227325439453, 26.37448501586914, 10.308223724365234, 14.1617431640625, -14.71807861328125, 0.4550018310546875, 23.592453002929688, 19.534347534179688, -15.413421630859375, 11.5693359375, 13.860137939453125, 18.974517822265625, 41.79376220703125, 8.016288757324219, 16.358325958251953, 35.13136291503906, 2.2336959838867188, 24.93212127685547, 7.786418914794922, 5.383907318115234, 24.10674285888672, 16.125839233398438, 20.967849731445312, 10.639883041381836, 67.961181640625, 15.62447738647461, 14.785346984863281, -2.1005172729492188, 0.7148704528808594, 32.85979080200195, 12.253368377685547, 7.7969818115234375, -16.908401489257812, 43.20494079589844, 16.198062896728516, 43.889015197753906, 19.101608276367188, 1.7680397033691406], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000185.npy"}
{"epoch": 0.27165932452276065, "step": 186, "batch_size": 64, "mean": 14.769529342651367, "std": 15.212905883789062, "min": -20.373321533203125, "p10": -1.042755126953125, "median": 14.007831573486328, "p90": 39.00422592163087, "max": 59.98222351074219, "pos_frac": 0.859375, "sample": [28.188339233398438, 17.845413208007812, 13.802963256835938, 18.872644424438477, 12.99176025390625, 39.81031799316406, 11.45294189453125, 21.104766845703125, 59.98222351074219, -5.798927307128906, 21.115386962890625, 20.167312622070312, 1.9747848510742188, -7.431480407714844, 19.37512969970703, 26.425193786621094, 4.168109893798828, -0.416351318359375, -1.4809722900390625, 4.1252593994140625, 27.059097290039062, 21.46837615966797, -20.373321533203125, 23.270584106445312, 9.950332641601562, 12.193870544433594, 5.530300140380859, 41.439537048339844, 1.3260345458984375, 1.6609115600585938, 0.9350013732910156, -10.419570922851562, 37.12334442138672, 0.3027000427246094, 7.211658477783203, 17.174942016601562, 42.56929016113281, 23.951438903808594, 43.07051086425781, 6.633907318115234, 18.692642211914062, 8.669261932373047, 31.494117736816406, 40.45851135253906, 0.5354995727539062, -0.955810546875, 1.711822509765625, 21.581893920898438, 4.6285247802734375, 10.716796875, 31.33837890625, 18.137561798095703, 18.90972900390625, 17.931812286376953, -7.55169677734375, 41.647056579589844, 2.5061874389648438, -1.08001708984375, 14.212699890136719, 21.853836059570312, 16.918075561523438, 16.881637573242188, 7.358283996582031, 10.299346923828125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000186.npy"}
{"epoch": 0.27312775330396477, "step": 187, "batch_size": 64, "mean": 12.98681354522705, "std": 16.200305938720703, "min": -20.451385498046875, "p10": -6.889683914184568, "median": 10.645275115966797, "p90": 33.73826293945313, "max": 64.89169311523438, "pos_frac": 0.84375, "sample": [-4.138648986816406, 33.328895568847656, 1.5062084197998047, 4.201316833496094, 1.4637832641601562, 9.919639587402344, 36.033287048339844, 38.748016357421875, 17.838207244873047, 16.89171600341797, 2.3449172973632812, 1.9708118438720703, 39.84696960449219, 4.905059814453125, 22.046913146972656, 2.3282928466796875, 22.20056915283203, 6.05084228515625, 2.1469573974609375, -11.941341400146484, 7.112251281738281, -2.4462890625, 29.145416259765625, 4.5074310302734375, 4.125328063964844, 19.008411407470703, 64.89169311523438, -11.33523178100586, 0.0140533447265625, 10.774772644042969, 14.045738220214844, 33.09137725830078, 10.817474365234375, -8.06869888305664, 1.857656478881836, 29.44805908203125, 40.66985321044922, 9.798198699951172, 0.8271102905273438, 10.515777587890625, 33.33628845214844, 20.105201721191406, 25.951560974121094, 33.91053771972656, 5.401092529296875, -14.744941711425781, -0.30047607421875, 1.6109580993652344, -20.451385498046875, -13.279434204101562, 8.670726776123047, 26.349632263183594, 34.90846252441406, 16.93750762939453, 23.459354400634766, 25.342987060546875, 28.28244400024414, -11.038002014160156, 20.067893981933594, 25.583208084106445, 14.749382019042969, 13.810123443603516, 5.0387725830078125, 10.961404800415039], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000187.npy"}
{"epoch": 0.2745961820851689, "step": 188, "batch_size": 64, "mean": 15.28190803527832, "std": 15.410323143005371, "min": -37.93367004394531, "p10": -2.3641105651855465, "median": 16.237979888916016, "p90": 29.46320419311524, "max": 57.97193908691406, "pos_frac": 0.859375, "sample": [22.14873504638672, 19.021804809570312, 47.10645294189453, 6.599819183349609, 16.530174255371094, 21.125221252441406, 17.853744506835938, 16.579116821289062, 15.076766967773438, 25.32950210571289, 5.057910919189453, 22.671661376953125, 13.548614501953125, 6.170463562011719, 10.582931518554688, 12.0040283203125, 12.175544738769531, 30.0345458984375, 6.578067779541016, 2.36215877532959, 35.409523010253906, 14.959583282470703, 20.83055877685547, 24.368165969848633, 20.842666625976562, -9.819053649902344, 27.977294921875, 3.9309539794921875, 28.115985870361328, -2.1266098022460938, 57.97193908691406, -37.93367004394531, 22.792755126953125, 15.512100219726562, 44.44153594970703, 20.583282470703125, -12.6163330078125, 24.90686798095703, 4.630826950073242, 22.722023010253906, 28.13007354736328, -13.929763793945312, 16.84149169921875, 15.122337341308594, -3.669689178466797, -1.83251953125, 5.508899688720703, 7.878376007080078, 13.852828979492188, 17.31005859375, 16.694305419921875, 22.82110595703125, 18.091720581054688, -2.4658966064453125, 14.570388793945312, 15.945785522460938, 23.940235137939453, 49.9144287109375, 37.231109619140625, 15.498260498046875, 18.697113037109375, -3.1695213317871094, 8.816841125488281, 0.18645095825195312], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000188.npy"}
{"epoch": 0.27606461086637296, "step": 189, "batch_size": 64, "mean": 15.897202491760254, "std": 18.756498336791992, "min": -15.749164581298828, "p10": -3.138838958740234, "median": 12.711017608642578, "p90": 37.988162231445315, "max": 83.24737548828125, "pos_frac": 0.8125, "sample": [-1.5930099487304688, 7.645576477050781, 21.173797607421875, 40.07637023925781, 6.229438781738281, -11.03424072265625, 23.68754005432129, -2.8454971313476562, 4.245819091796875, 15.591423034667969, 19.734630584716797, 69.62259674072266, 9.339324951171875, 30.527145385742188, 24.653709411621094, 21.92176055908203, -4.84661865234375, 9.660030364990234, 1.6478118896484375, 4.4974212646484375, -9.504981994628906, 37.954078674316406, -1.9370613098144531, 33.09821319580078, 19.926395416259766, 22.28679656982422, -3.264556884765625, 13.127313613891602, 11.769367218017578, 32.54521179199219, 11.638374328613281, 15.338533401489258, 31.25930404663086, 60.248817443847656, 14.286109924316406, 21.62750244140625, 1.8912010192871094, 4.915210723876953, -15.749164581298828, 23.549789428710938, -0.516021728515625, 2.1614837646484375, 23.042984008789062, 8.180122375488281, 8.778221130371094, 27.908939361572266, 54.42840576171875, 83.24737548828125, 1.2897911071777344, 17.627853393554688, 41.48767852783203, 3.5281200408935547, 12.294721603393555, 13.313056945800781, -3.313201904296875, 5.551506042480469, 20.304580688476562, -0.5616302490234375, 38.002769470214844, -7.113765716552734, 2.180999755859375, 24.04163360595703, 20.480369567871094, 6.1335296630859375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000189.npy"}
{"epoch": 0.2775330396475771, "step": 190, "batch_size": 64, "mean": 17.790395736694336, "std": 15.284417152404785, "min": -13.211410522460938, "p10": -0.17778320312499807, "median": 14.855083465576172, "p90": 38.85698165893555, "max": 61.471588134765625, "pos_frac": 0.890625, "sample": [3.0151290893554688, 12.76607894897461, -0.9719619750976562, 9.3453369140625, 8.237396240234375, 24.0244140625, 29.99726104736328, -6.701713562011719, 4.157360076904297, 31.9124755859375, 19.42803955078125, 14.064079284667969, 29.960220336914062, 16.674819946289062, 32.75123596191406, 18.314422607421875, -9.713371276855469, 16.364959716796875, 29.59680938720703, 31.410354614257812, 4.46834659576416, 44.58766174316406, 10.317520141601562, 26.929092407226562, 40.910888671875, 12.327789306640625, 4.556060791015625, 37.30848693847656, 9.161972045898438, 15.646087646484375, -13.211410522460938, 39.52062225341797, -4.3214111328125, 23.161014556884766, 44.6849365234375, 32.851783752441406, 13.048080444335938, 27.444900512695312, 8.842666625976562, -10.671234130859375, 44.025062561035156, 1.6753005981445312, 13.839706420898438, 35.24330139160156, 4.922126770019531, 8.099639892578125, 27.270896911621094, 10.37725830078125, 40.94268798828125, 27.204742431640625, 26.03832244873047, 61.471588134765625, 15.667236328125, 20.656665802001953, 13.014198303222656, 10.807962417602539, -1.3924407958984375, 13.7947998046875, 5.75861930847168, 25.4183349609375, 6.2881011962890625, 28.691558837890625, 12.087589263916016, 4.484893798828125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000190.npy"}
{"epoch": 0.2790014684287812, "step": 191, "batch_size": 64, "mean": 19.89072036743164, "std": 18.23760986328125, "min": -14.429595947265625, "p10": 0.6378787994384768, "median": 18.267213821411133, "p90": 47.486627197265626, "max": 72.3485107421875, "pos_frac": 0.921875, "sample": [41.119110107421875, -4.891632080078125, 68.2523193359375, 48.05699920654297, 22.32903289794922, 11.792633056640625, 0.5390625, 22.997024536132812, 2.988128662109375, 51.613861083984375, 13.576705932617188, 15.594886779785156, 38.86470031738281, 45.89625549316406, 6.753509521484375, 6.0142822265625, 5.252219200134277, 18.75170135498047, 24.039512634277344, 3.4833717346191406, 19.7308349609375, 26.872161865234375, -14.429595947265625, 30.243438720703125, 53.12751770019531, 20.3409423828125, 11.248424530029297, 46.6873779296875, 4.3310699462890625, 7.8625030517578125, 11.492431640625, 3.923828125, 7.297370910644531, 0.06990814208984375, 17.91339874267578, 18.3807373046875, 14.323356628417969, 12.490875244140625, 47.82916259765625, 3.0455055236816406, 2.0176267623901367, 21.72832489013672, 0.8684501647949219, 6.46844482421875, 2.9438858032226562, 48.019073486328125, -0.9307689666748047, 10.366924285888672, 11.008544921875, 33.165992736816406, -11.980789184570312, 28.703781127929688, 30.443382263183594, -0.1795654296875, 29.206077575683594, 23.513412475585938, 36.130271911621094, 18.87845230102539, 20.797340393066406, 33.84906768798828, 72.3485107421875, 28.38907241821289, 18.153690338134766, 23.292037963867188], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000191.npy"}
{"epoch": 0.28046989720998533, "step": 192, "batch_size": 64, "mean": 14.054752349853516, "std": 17.253639221191406, "min": -27.44781494140625, "p10": -6.169287872314453, "median": 13.251760482788086, "p90": 35.15620689392091, "max": 62.22575378417969, "pos_frac": 0.765625, "sample": [15.954559326171875, 13.270458221435547, -12.16937255859375, 10.01922607421875, 50.1396484375, 27.799320220947266, 3.5960235595703125, -3.8626060485839844, -11.310462951660156, 8.7762451171875, 11.523468017578125, 23.209156036376953, 12.403289794921875, 4.2396087646484375, -7.4035491943359375, 1.1587753295898438, 20.55718994140625, 17.447647094726562, -2.0966796875, 32.0522346496582, -9.270111083984375, 27.75604248046875, -10.49676513671875, 11.885482788085938, -6.2708587646484375, 44.86076354980469, 1.4386825561523438, 62.22575378417969, -0.998504638671875, 50.86578369140625, -1.80462646484375, 22.312335968017578, 55.090087890625, 23.257781982421875, 12.5147705078125, 7.623771667480469, 36.486480712890625, 28.18267059326172, 29.10693359375, 13.233062744140625, 21.543006896972656, 19.728736877441406, 9.899993896484375, 10.5694580078125, 22.153076171875, 13.902816772460938, 18.33270263671875, 36.81597900390625, -4.903053283691406, 25.016685485839844, 14.007926940917969, 10.9031982421875, 16.088943481445312, -2.449676513671875, 2.793459892272949, 20.16876220703125, -5.932289123535156, 20.092933654785156, -0.114044189453125, 31.439590454101562, 14.917766571044922, 16.92667007446289, -27.44781494140625, 1.74560546875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000192.npy"}
{"epoch": 0.28193832599118945, "step": 193, "batch_size": 64, "mean": 12.118255615234375, "std": 17.465810775756836, "min": -33.056732177734375, "p10": -8.383171081542969, "median": 14.277795791625977, "p90": 33.117122650146484, "max": 47.74615478515625, "pos_frac": 0.703125, "sample": [17.773258209228516, -7.306175231933594, 38.209754943847656, -9.568138122558594, -33.056732177734375, 5.123954772949219, 17.869720458984375, 0.31400299072265625, -0.7618408203125, 24.973968505859375, 24.69556427001953, 14.512210845947266, 39.33965301513672, 17.18091583251953, 33.71379089355469, -4.230354309082031, 19.985366821289062, 11.140308380126953, 13.3385009765625, 39.3603515625, -3.2138233184814453, 30.83154296875, 23.35805320739746, 26.118576049804688, 15.59661865234375, 16.279850006103516, 30.930252075195312, 15.393630981445312, -2.4044876098632812, 33.111488342285156, -4.393013000488281, 33.119537353515625, 42.739715576171875, 10.032020568847656, -4.047630310058594, -22.218231201171875, 6.697212219238281, 4.7187652587890625, -8.367950439453125, -2.1791610717773438, -8.389694213867188, 6.4914703369140625, 14.043380737304688, 24.668373107910156, 19.69214630126953, -5.966773986816406, 21.15801239013672, -16.928619384765625, 28.564823150634766, 5.360477447509766, 0.8139076232910156, 32.543975830078125, -16.473533630371094, 16.053955078125, 32.52694320678711, 1.1746673583984375, 25.32599639892578, 47.74615478515625, -10.242515563964844, 0.030780792236328125, 26.81865692138672, -3.1009063720703125, 29.291709899902344, -0.3460540771484375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000193.npy"}
{"epoch": 0.2834067547723935, "step": 194, "batch_size": 64, "mean": 14.118998527526855, "std": 16.479721069335938, "min": -17.469966888427734, "p10": -3.4993003845214843, "median": 13.646613121032715, "p90": 35.670315170288085, "max": 57.27717590332031, "pos_frac": 0.828125, "sample": [24.898983001708984, -3.4966354370117188, 24.34923553466797, 7.55474853515625, 18.920955657958984, 31.795501708984375, 1.205169677734375, 12.265544891357422, 23.325279235839844, -7.440359115600586, 1.1630287170410156, 18.60112762451172, 16.35094451904297, 19.939720153808594, 6.390403747558594, 9.540481567382812, 17.27082061767578, 20.32602310180664, 18.69583511352539, 33.06471252441406, 2.598438262939453, 5.538581848144531, -15.389976501464844, -3.1233253479003906, 8.183570861816406, 23.85074234008789, -0.7206869125366211, 35.28530502319336, 2.3125839233398438, 49.892452239990234, 57.27717590332031, 36.27662658691406, 41.158668518066406, -17.469966888427734, 0.8379364013671875, -14.300689697265625, 7.642265319824219, 5.485298156738281, 12.907333374023438, 4.042288780212402, 16.815452575683594, 19.50971221923828, -17.0078125, 12.537849426269531, 14.385892868041992, 22.67901611328125, 34.39441680908203, 17.053123474121094, 0.16943740844726562, -1.430389404296875, 1.3987846374511719, 19.927291870117188, 17.41553497314453, 35.83531951904297, 4.281503677368164, 47.44416809082031, -3.5004425048828125, 27.224822998046875, 50.48634338378906, 6.117927551269531, 20.824691772460938, -4.409454345703125, 17.288532257080078, 7.168006896972656], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000194.npy"}
{"epoch": 0.28487518355359764, "step": 195, "batch_size": 64, "mean": 17.347488403320312, "std": 18.364614486694336, "min": -27.323318481445312, "p10": -4.32576389312744, "median": 17.46213150024414, "p90": 41.461969757080084, "max": 68.26751708984375, "pos_frac": 0.84375, "sample": [24.960697174072266, -0.178436279296875, 17.813156127929688, 21.12485122680664, 31.087074279785156, 45.688507080078125, 9.759780883789062, 41.695953369140625, 10.5650634765625, 1.3893718719482422, 15.116630554199219, 25.806133270263672, 10.630844116210938, 28.60712432861328, 30.58245086669922, 16.63592529296875, 17.111106872558594, 0.129608154296875, -27.323318481445312, 22.415557861328125, 21.31707000732422, 12.104114532470703, 0.1605377197265625, 28.925994873046875, 20.885692596435547, 36.81028747558594, 49.15240478515625, 26.453750610351562, 7.574943542480469, 10.094520568847656, -14.481903076171875, 27.574501037597656, 21.066246032714844, 27.87073516845703, -6.3516082763671875, -9.313674926757812, 4.767156600952148, 20.709251403808594, 6.841697692871094, 31.0487060546875, -26.605751037597656, 29.03509521484375, 21.248699188232422, 2.1838455200195312, 48.86552429199219, 50.98014831542969, 5.522357940673828, 48.58570861816406, -5.127693176269531, 20.754714965820312, 9.672863006591797, -1.6766815185546875, 68.26751708984375, 40.91600799560547, 14.16290283203125, -6.0152130126953125, 16.001747131347656, 11.518451690673828, 28.700185775756836, 19.024097442626953, 5.137298583984375, 6.920391082763672, 37.793121337890625, -2.4545955657958984], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000195.npy"}
{"epoch": 0.28634361233480177, "step": 196, "batch_size": 64, "mean": 17.488330841064453, "std": 15.598090171813965, "min": -10.472915649414062, "p10": -1.3917076110839834, "median": 16.74329376220703, "p90": 37.66196899414063, "max": 55.02227020263672, "pos_frac": 0.859375, "sample": [31.57486343383789, 34.89141082763672, 43.67070007324219, 25.39728546142578, 16.704910278320312, 11.713916778564453, 1.5831146240234375, 36.52146911621094, 31.23303985595703, 27.384010314941406, 1.050567626953125, 10.774169921875, 19.62255096435547, 26.232826232910156, 17.354278564453125, 19.531782150268555, 8.105985641479492, 23.376205444335938, -3.4767799377441406, 6.219047546386719, 7.5450286865234375, -6.610025405883789, -10.472915649414062, 9.487930297851562, 17.62017250061035, 16.78167724609375, 6.192726135253906, -1.7613906860351562, 21.94373321533203, 2.2199764251708984, 39.256263732910156, 12.972221374511719, 41.23016357421875, 36.108917236328125, 3.426849365234375, 31.832740783691406, 9.326858520507812, 31.89281463623047, 3.0802345275878906, 32.84815979003906, 9.688850402832031, 33.715065002441406, 41.13163375854492, -4.133848190307617, 37.44456481933594, 6.52825927734375, 11.1466064453125, 11.785709381103516, 8.369994163513184, 17.272186279296875, -6.5017852783203125, -7.351161956787109, 20.64352035522461, 34.297515869140625, 45.20195007324219, 19.541900634765625, 33.689964294433594, -0.52911376953125, -0.134033203125, 37.75514221191406, 55.02227020263672, 16.670841217041016, 2.58294677734375, 1.0267400741577148], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000196.npy"}
{"epoch": 0.2878120411160059, "step": 197, "batch_size": 64, "mean": 10.927433013916016, "std": 14.32720947265625, "min": -35.41196060180664, "p10": -3.06452522277832, "median": 9.431133270263672, "p90": 26.35276603698731, "max": 48.54790496826172, "pos_frac": 0.875, "sample": [4.980377197265625, 33.271644592285156, 17.82278060913086, 5.096549987792969, 1.5277099609375, 8.480365753173828, 12.227619171142578, 8.72684097290039, 5.544399261474609, 1.9310379028320312, 38.02619934082031, 48.54790496826172, 0.5986137390136719, 23.231124877929688, 8.16912841796875, 22.355195999145508, -14.392112731933594, 23.91834259033203, 7.6211395263671875, 10.672237396240234, 13.062557220458984, 21.223167419433594, 6.163875579833984, 8.561607360839844, 9.87762451171875, 4.030948638916016, 15.703361511230469, 15.576751708984375, 22.35717010498047, 13.293258666992188, 1.125762939453125, -8.7567138671875, -6.366847991943359, 18.179534912109375, 6.802459716796875, -2.6368331909179688, 32.4996337890625, 33.19304656982422, 5.95684814453125, -35.41196060180664, 4.292652130126953, -27.823654174804688, 1.1034355163574219, 22.185943603515625, 2.402069091796875, 16.86492919921875, 12.83685302734375, 15.086872100830078, 18.974327087402344, 26.79541778564453, -3.247821807861328, 25.31991195678711, 2.8875350952148438, 11.657730102539062, 19.08281707763672, 44.33662414550781, 8.984642028808594, 6.274600982666016, 5.714111328125, 10.115325927734375, -7.966156005859375, 21.65264892578125, 7.729095458984375, 11.303359985351562], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000197.npy"}
{"epoch": 0.28928046989721, "step": 198, "batch_size": 64, "mean": 16.559494018554688, "std": 16.446805953979492, "min": -17.343181610107422, "p10": -2.6678337097167963, "median": 15.797870635986328, "p90": 37.83827514648438, "max": 75.55947875976562, "pos_frac": 0.859375, "sample": [-2.3110885620117188, -7.7397003173828125, 35.65402603149414, 47.541259765625, 19.16918182373047, 10.461479187011719, 25.30218505859375, 4.145759582519531, 26.467453002929688, 15.586284637451172, 33.070281982421875, -6.6460723876953125, -3.053497314453125, 31.72931671142578, 22.09856414794922, -10.993583679199219, 29.35354232788086, 18.316757202148438, 18.83349609375, 5.935382843017578, 22.24091339111328, 37.511634826660156, 0.9700584411621094, 25.555450439453125, 11.254817962646484, 1.3052215576171875, 18.127952575683594, 8.632156372070312, 5.812721252441406, 39.31382751464844, 16.009456634521484, 7.285026550292969, 3.03192138671875, 12.313125610351562, 3.882476806640625, 0.27693939208984375, -17.343181610107422, 2.0421981811523438, 47.45177459716797, 37.97826385498047, 26.43170166015625, 38.938079833984375, 25.099136352539062, -2.8207244873046875, 8.24398422241211, 10.101226806640625, 47.27601623535156, 10.887992858886719, 7.769947052001953, 12.425067901611328, -1.7795639038085938, 26.258941650390625, 24.880142211914062, 19.53378677368164, 13.291240692138672, 27.185420989990234, 9.288909912109375, 22.488998413085938, -3.624950408935547, 20.951370239257812, 19.73546600341797, 75.55947875976562, 8.815895080566406, 16.296348571777344], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000198.npy"}
{"epoch": 0.2907488986784141, "step": 199, "batch_size": 64, "mean": 14.728889465332031, "std": 16.262046813964844, "min": -16.61456298828125, "p10": -4.435576629638671, "median": 14.176219940185547, "p90": 35.50546798706055, "max": 53.00489807128906, "pos_frac": 0.765625, "sample": [4.817283630371094, 22.900405883789062, -16.61456298828125, 26.16302490234375, 20.82861328125, 22.461105346679688, -1.7996864318847656, 19.909286499023438, 16.588134765625, 11.570938110351562, 53.00489807128906, 25.75579071044922, -3.3529510498046875, -2.2475547790527344, 2.8447723388671875, -9.04384994506836, -6.875526428222656, 26.286636352539062, 42.38127899169922, 29.916748046875, -10.139457702636719, 11.286163330078125, 32.52001953125, 11.6153564453125, -1.3264999389648438, 15.055335998535156, 1.8648452758789062, 17.930423736572266, 35.63684844970703, 38.974342346191406, 8.842826843261719, 20.883277893066406, 23.18414306640625, 44.80562973022461, -3.1228599548339844, 35.102455139160156, 4.906700134277344, 3.8817520141601562, -1.4321250915527344, -0.6390342712402344, -4.899559020996094, 2.066131591796875, 25.102027893066406, 5.634796142578125, 51.214019775390625, 13.297103881835938, 3.8314285278320312, 3.5366458892822266, 3.7822799682617188, -2.941204071044922, -6.288856506347656, 16.706525802612305, 20.05477523803711, 25.888973236083984, 35.19891357421875, 20.248077392578125, 6.285575866699219, 25.41944122314453, 17.831336975097656, -8.994667053222656, 31.46735382080078, 40.90423583984375, 31.68133544921875, 10.297332763671875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000199.npy"}
{"epoch": 0.2922173274596182, "step": 200, "batch_size": 64, "mean": 17.302513122558594, "std": 18.827529907226562, "min": -22.58995819091797, "p10": -10.67705192565918, "median": 16.743335723876953, "p90": 41.52826919555664, "max": 53.82623291015625, "pos_frac": 0.84375, "sample": [22.69781494140625, 51.542930603027344, 18.190359115600586, 40.132965087890625, 14.003791809082031, -0.890838623046875, -12.158554077148438, 16.23710060119629, 47.854156494140625, -10.25027847290039, 15.812118530273438, 24.629913330078125, 5.0213623046875, 0.5521583557128906, 3.7285900115966797, 30.23419189453125, 37.81346893310547, 49.833984375, 28.81397247314453, 39.64092254638672, 2.145751953125, 7.031133651733398, 16.655763626098633, 9.414459228515625, -15.997100830078125, 30.702301025390625, 16.67053985595703, 8.492240905761719, 34.428558349609375, 0.7060546875, 38.25851058959961, -14.685964584350586, 5.114990234375, 13.289840698242188, 2.1746158599853516, 43.924842834472656, 10.59271240234375, 30.064727783203125, 26.72241973876953, 53.82623291015625, 5.7689056396484375, 41.622222900390625, 17.024398803710938, 21.620880126953125, 0.6609325408935547, 6.8457489013671875, 25.7685546875, 18.337778091430664, 51.544830322265625, -10.972759246826172, -5.082887649536133, 33.694183349609375, -11.367080688476562, 26.27973175048828, 19.975372314453125, 4.35894775390625, 41.309043884277344, 35.36787414550781, 7.894287109375, 30.674819946289062, -10.859954833984375, 16.816131591796875, -22.58995819091797, 19.696090698242188], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000200.npy"}
{"epoch": 0.2936857562408223, "step": 201, "batch_size": 64, "mean": 16.05862045288086, "std": 19.847042083740234, "min": -38.465576171875, "p10": -8.616516876220702, "median": 15.310525894165039, "p90": 36.347848892211914, "max": 70.25204467773438, "pos_frac": 0.828125, "sample": [0.6437835693359375, 15.632015228271484, 14.988380432128906, -38.465576171875, 35.9684944152832, 29.833160400390625, 3.9116039276123047, 2.2704010009765625, 6.2006988525390625, 11.344085693359375, -14.004852294921875, 35.317413330078125, -9.321533203125, 18.52761459350586, 29.8612060546875, 4.285741806030273, 27.748733520507812, 2.4334640502929688, 9.216331481933594, 20.498443603515625, 26.741920471191406, 34.643348693847656, 20.391403198242188, -2.2265090942382812, 17.073410034179688, 44.914817810058594, -22.937829971313477, 60.10270690917969, -3.3636703491210938, 6.164237976074219, 14.989036560058594, 22.109397888183594, 22.659225463867188, 20.754196166992188, -9.187057495117188, 21.457656860351562, 36.51042938232422, 8.480148315429688, 6.217504501342773, 29.76904296875, 54.905914306640625, 13.560844421386719, 38.321014404296875, 26.790420532226562, 5.437889099121094, 34.70830535888672, -8.12677001953125, -8.826408386230469, 33.45558166503906, 7.6560516357421875, 51.028961181640625, 0.16485595703125, -5.9659271240234375, 33.69264221191406, 23.66827392578125, 13.145278930664062, 12.884246826171875, 25.215259552001953, 0.16135025024414062, -18.570037841796875, 70.25204467773438, 17.42144775390625, 30.513031005859375, 14.104354858398438], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000201.npy"}
{"epoch": 0.29515418502202645, "step": 202, "batch_size": 64, "mean": 14.717092514038086, "std": 17.564855575561523, "min": -23.800643920898438, "p10": -3.60436553955078, "median": 14.373109817504883, "p90": 36.18740310668946, "max": 66.97451782226562, "pos_frac": 0.765625, "sample": [51.58355712890625, -0.9364738464355469, -0.4264945983886719, -8.072830200195312, 1.2168636322021484, 30.02386474609375, 46.42021942138672, -0.6095371246337891, 45.18567657470703, 28.316970825195312, 17.646419525146484, 16.426979064941406, 13.117355346679688, 17.147262573242188, -0.882598876953125, 28.527389526367188, 4.5474853515625, 27.0738525390625, 10.71432876586914, -4.4367828369140625, 1.3857269287109375, 12.047271728515625, -4.1243896484375, 28.09320068359375, 2.9116363525390625, -10.018577575683594, 23.92620849609375, -0.9286823272705078, 12.34034538269043, 22.58954620361328, 6.6380462646484375, 27.138011932373047, 56.5950927734375, -7.325531005859375, 18.130279541015625, -23.800643920898438, 6.106910705566406, -2.3909759521484375, 49.13090515136719, 23.7645263671875, 1.1803436279296875, -0.6452980041503906, -6.123655319213867, 34.67015075683594, 5.697471618652344, 23.83951187133789, 15.888553619384766, 17.739730834960938, 16.4425048828125, 66.97451782226562, 2.1387786865234375, 22.354583740234375, 4.521125793457031, -0.33969879150390625, 4.4332122802734375, 20.923904418945312, 19.85031509399414, 0.02236175537109375, 28.53522491455078, 15.628864288330078, 25.77637481689453, 2.0208511352539062, 36.83765411376953, 18.734127044677734], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000202.npy"}
{"epoch": 0.2966226138032305, "step": 203, "batch_size": 64, "mean": 17.103050231933594, "std": 16.53299903869629, "min": -17.35010528564453, "p10": -4.3770702362060545, "median": 17.331161499023438, "p90": 37.89013824462891, "max": 57.78846740722656, "pos_frac": 0.84375, "sample": [-5.7906494140625, -0.138427734375, 46.67870330810547, -4.4024200439453125, -14.52886962890625, 38.24859619140625, 0.7554092407226562, 26.810577392578125, 11.805147171020508, 17.030776977539062, 5.589258193969727, 11.455503463745117, 21.89447021484375, 57.78846740722656, 6.029338836669922, 18.359619140625, 22.864181518554688, 12.473739624023438, 19.503814697265625, 1.4820213317871094, 10.3543701171875, 38.080718994140625, 5.477874755859375, 8.415149688720703, 38.628662109375, 5.060728073120117, 57.464935302734375, 32.063907623291016, -17.35010528564453, 16.89289093017578, 33.58411407470703, -4.317920684814453, 37.44544982910156, 32.66413879394531, 35.54887390136719, 29.26520538330078, 47.388084411621094, 22.433162689208984, 14.358551025390625, 8.291885375976562, 13.021202087402344, 17.631546020507812, 19.234222412109375, 23.54692840576172, 7.3577728271484375, 20.028480529785156, -4.561767578125, -12.628192901611328, 9.841175079345703, 8.596927642822266, 21.142852783203125, 22.008880615234375, 15.55340576171875, 11.878374099731445, 29.89611053466797, -2.94647216796875, 0.3542289733886719, 21.432815551757812, -11.186447143554688, 25.226638793945312, 27.501502990722656, 25.48443603515625, 31.963024139404297, 28.557594299316406], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000203.npy"}
{"epoch": 0.29809104258443464, "step": 204, "batch_size": 64, "mean": 14.365680694580078, "std": 17.87637710571289, "min": -19.607894897460938, "p10": -8.584695434570312, "median": 16.02229404449463, "p90": 37.20218658447266, "max": 67.45779418945312, "pos_frac": 0.75, "sample": [8.327056884765625, 4.30340576171875, -7.526641845703125, 1.6156578063964844, 21.519306182861328, 18.14873695373535, 40.50732421875, 35.66967010498047, 16.721099853515625, -1.3780632019042969, -2.2265090942382812, 16.180755615234375, 12.283798217773438, -3.2223052978515625, 24.369094848632812, -0.44605255126953125, -3.67730712890625, 35.767921447753906, 38.967437744140625, 15.863832473754883, 67.45779418945312, -13.463653564453125, -9.03814697265625, -6.069480895996094, 28.29462432861328, 11.89079475402832, 25.06005859375, 52.15220260620117, 27.151901245117188, 18.882614135742188, 21.24908447265625, -5.0685272216796875, 7.9656982421875, 7.431827545166016, -14.636077880859375, 40.068389892578125, 21.947341918945312, -19.48754119873047, 33.502166748046875, -7.2888946533203125, -17.154830932617188, 18.889541625976562, 9.664527893066406, 27.898334503173828, 6.3046112060546875, 19.51113510131836, 17.33983612060547, 22.046524047851562, 27.196670532226562, 20.098270416259766, -12.038848876953125, 17.5382080078125, 7.653472900390625, 5.3243255615234375, 14.099189758300781, 23.770416259765625, -19.607894897460938, 26.281570434570312, 13.901863098144531, 37.816871643066406, 18.88983154296875, 46.08248519897461, 12.889528274536133, 15.237564086914062], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000204.npy"}
{"epoch": 0.29955947136563876, "step": 205, "batch_size": 64, "mean": 19.112470626831055, "std": 15.536169052124023, "min": -11.820968627929688, "p10": -2.487664031982421, "median": 19.672542572021484, "p90": 37.040485382080085, "max": 66.29908752441406, "pos_frac": 0.859375, "sample": [6.791204452514648, 29.208946228027344, 19.51219940185547, 17.694000244140625, 1.092559814453125, 8.049659729003906, -1.3369674682617188, 10.176925659179688, 39.332611083984375, 36.153656005859375, -0.8929824829101562, 54.181396484375, 10.307239532470703, 22.09205436706543, 9.330028533935547, 39.45790481567383, 18.357820510864258, 32.56129455566406, 43.90688705444336, 22.423099517822266, 9.952499389648438, 13.394493103027344, 5.994964599609375, 17.646408081054688, 26.640647888183594, 12.144432067871094, 30.255260467529297, -3.891357421875, 37.420555114746094, 28.689468383789062, -10.123342514038086, 18.974822998046875, -2.9808197021484375, 16.392494201660156, 10.568763732910156, 7.1024932861328125, 54.380035400390625, 24.509010314941406, 16.04602813720703, 23.662147521972656, -6.828830718994141, 22.187088012695312, 24.648727416992188, 13.88104248046875, 11.430877685546875, 15.323112487792969, 31.68274688720703, -3.0364761352539062, 34.830238342285156, 33.89801788330078, 5.461936950683594, 25.934051513671875, 23.675518035888672, 19.8328857421875, 27.290756225585938, 66.29908752441406, 23.21436309814453, 28.599441528320312, -11.820968627929688, 20.567279815673828, -5.158180236816406, 20.5614013671875, 24.770050048828125, 20.77545928955078], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000205.npy"}
{"epoch": 0.3010279001468429, "step": 206, "batch_size": 64, "mean": 17.530092239379883, "std": 14.728669166564941, "min": -12.784912109375, "p10": 1.6830101013183603, "median": 16.0987548828125, "p90": 36.243508911132814, "max": 59.857383728027344, "pos_frac": 0.90625, "sample": [16.157455444335938, 2.4580078125, 10.129470825195312, 35.826324462890625, 34.05175018310547, 34.50708770751953, 20.072494506835938, 7.391304016113281, 6.95751953125, 13.734107971191406, 24.05156707763672, -0.4376068115234375, 20.2061767578125, 13.235729217529297, -9.0230712890625, 9.243621826171875, 18.3243408203125, 5.879302978515625, 3.1821937561035156, 13.56007194519043, 2.72210693359375, 12.90191650390625, 22.919937133789062, 36.42230224609375, 11.76171875, 7.608253479003906, 6.8701629638671875, 25.300804138183594, 38.82420349121094, 21.85962677001953, 27.889511108398438, 1.3508682250976562, 14.608341217041016, 15.650581359863281, 24.096332550048828, 19.94326400756836, 39.084686279296875, 19.37511444091797, 39.727622985839844, 19.473411560058594, -9.949546813964844, 22.714523315429688, -12.784912109375, 7.036384582519531, 25.445964813232422, 17.574813842773438, -2.2775421142578125, 23.615365982055664, 2.886016845703125, 6.930614471435547, -1.4137153625488281, 29.121681213378906, 16.040054321289062, 9.73012924194336, 55.98454284667969, 39.60273742675781, 6.453784942626953, 24.17145538330078, 59.857383728027344, 30.524322509765625, 5.611236572265625, 7.0449981689453125, 34.31712341308594, 35.789947509765625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000206.npy"}
{"epoch": 0.302496328928047, "step": 207, "batch_size": 64, "mean": 18.237651824951172, "std": 22.477550506591797, "min": -24.02716064453125, "p10": -8.690975189208983, "median": 15.920848846435547, "p90": 41.86172332763672, "max": 91.08413696289062, "pos_frac": 0.8125, "sample": [4.739921569824219, -1.7612228393554688, 26.046035766601562, 19.09454345703125, 23.834861755371094, -17.903079986572266, 54.799652099609375, -3.5757904052734375, -11.732505798339844, 27.33959197998047, -6.849517822265625, 7.263404846191406, 27.39447021484375, 21.555152893066406, 0.792877197265625, 30.757984161376953, 33.88307189941406, 40.51268768310547, 2.906190872192383, 6.749814987182617, 7.983940124511719, 7.4491729736328125, -9.480171203613281, 10.322578430175781, 12.141048431396484, -10.39251708984375, 36.90996551513672, 16.569717407226562, -3.813140869140625, 11.090988159179688, 15.271980285644531, 61.39295959472656, 27.573532104492188, 11.917888641357422, 35.054832458496094, 43.91189193725586, 41.91819763183594, 40.88923645019531, 12.723716735839844, -17.6929931640625, 24.200111389160156, 35.47028732299805, 30.79662322998047, 30.205299377441406, 35.510162353515625, 5.336387634277344, 2.723785400390625, 22.63001251220703, 41.729949951171875, 31.856094360351562, 1.9234848022460938, 28.475242614746094, -13.262588500976562, 24.040435791015625, 79.89251708984375, -24.02716064453125, 1.5105648040771484, 1.1331329345703125, -0.8307723999023438, 91.08413696289062, 4.108367919921875, 52.90771484375, 1.434967041015625, 20.770004272460938], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000207.npy"}
{"epoch": 0.3039647577092511, "step": 208, "batch_size": 64, "mean": 19.343997955322266, "std": 17.72580909729004, "min": -40.717994689941406, "p10": 2.4564746856689457, "median": 17.104816436767578, "p90": 42.09548416137695, "max": 66.56083679199219, "pos_frac": 0.921875, "sample": [13.620277404785156, 9.291866302490234, 12.765899658203125, 7.712318420410156, 11.88106918334961, -7.5393524169921875, 39.85307312011719, 4.137336730957031, 3.9094276428222656, 41.971588134765625, 23.864669799804688, 34.619808197021484, 14.075370788574219, 23.583345413208008, 29.576805114746094, 19.50659942626953, 10.428787231445312, 1.4307594299316406, 22.228103637695312, 34.01454162597656, 29.070159912109375, 24.90896987915039, 36.63902282714844, 15.480316162109375, 4.494056701660156, 17.970138549804688, 48.690185546875, 17.365188598632812, 38.340362548828125, 46.52995300292969, -6.149467468261719, 4.191017150878906, 8.002861022949219, 17.02527618408203, 34.5578727722168, 46.46190643310547, 8.105384826660156, 13.495758056640625, 13.857574462890625, 13.43228530883789, 66.56083679199219, 2.306079864501953, 35.83833312988281, 6.3602752685546875, 2.8073959350585938, 35.75702667236328, 25.78367805480957, 47.130088806152344, -40.717994689941406, 42.148582458496094, -10.754837036132812, 5.9140625, 33.267662048339844, 10.030471801757812, 27.845726013183594, 26.85570526123047, 10.311386108398438, 17.184356689453125, 47.93070983886719, 10.53561019897461, 26.684642791748047, 12.686843872070312, 19.92591094970703, -7.777870178222656], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000208.npy"}
{"epoch": 0.3054331864904552, "step": 209, "batch_size": 64, "mean": 16.96176528930664, "std": 18.68191909790039, "min": -21.197586059570312, "p10": -6.620913505554199, "median": 12.62936782836914, "p90": 43.13382148742676, "max": 61.83253479003906, "pos_frac": 0.859375, "sample": [22.826812744140625, 9.083309173583984, 48.932090759277344, 5.957313537597656, 49.916290283203125, 61.83253479003906, 12.745880126953125, -6.348657608032227, 39.72206115722656, 13.897178649902344, 25.985092163085938, 11.686195373535156, 33.32377243041992, 6.7696990966796875, 42.77182388305664, 13.00653076171875, 47.49156951904297, 12.512855529785156, 11.519676208496094, 35.18572235107422, 16.450218200683594, 11.727821350097656, 3.691636085510254, 6.703315734863281, 11.233932495117188, -21.197586059570312, -10.31446647644043, 30.652183532714844, 19.228660583496094, 7.8593902587890625, 33.73704528808594, 35.691551208496094, 5.925994873046875, 0.2916450500488281, 3.24273681640625, 43.7266845703125, -13.588224411010742, 11.787979125976562, 24.66217041015625, 53.99176025390625, 0.2216339111328125, 5.476432800292969, 7.380376815795898, 6.90484619140625, -18.817230224609375, 18.770179748535156, 32.36882019042969, -8.245487213134766, 35.87120819091797, 8.766651153564453, 16.49547576904297, 11.165359497070312, 18.722427368164062, -13.407257080078125, 39.83265686035156, 36.1485595703125, 32.4460334777832, 9.808807373046875, 23.152099609375, -6.7375946044921875, 13.734603881835938, 43.288963317871094, 0.98309326171875, -3.099761962890625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000209.npy"}
{"epoch": 0.3069016152716593, "step": 210, "batch_size": 64, "mean": 16.353267669677734, "std": 18.01125144958496, "min": -15.154769897460938, "p10": -3.5937793731689442, "median": 11.70030689239502, "p90": 43.484214782714844, "max": 66.66470336914062, "pos_frac": 0.84375, "sample": [9.39422607421875, 8.934860229492188, 43.52019500732422, 31.99254608154297, 3.948974609375, 5.3359375, 17.27045440673828, 2.028367042541504, 7.294036865234375, -2.766040802001953, 47.326087951660156, 42.68355178833008, 5.809221267700195, 3.1099510192871094, 36.76139450073242, -11.548004150390625, 4.92694091796875, 30.466171264648438, 45.76909637451172, 8.431221008300781, 40.41386413574219, 10.966226577758789, 3.7721595764160156, 28.447052001953125, -3.9485244750976562, 3.2180747985839844, 6.733312606811523, -9.903656005859375, 14.202682495117188, 24.639801025390625, 13.379783630371094, 43.40026092529297, -15.154769897460938, 34.59618377685547, 15.222930908203125, 3.908294677734375, 34.21034240722656, 3.0420379638671875, 10.912960052490234, -4.887094497680664, 16.211498260498047, 12.43438720703125, 0.8315658569335938, -7.2850341796875, 8.604175567626953, 46.84251403808594, 49.81427001953125, -1.4806900024414062, 16.738632202148438, -4.196409225463867, 47.34339904785156, 66.66470336914062, 27.30417251586914, 32.189640045166016, 22.27093505859375, 10.228416442871094, 17.42754364013672, -1.3173179626464844, 1.333892822265625, 12.6241455078125, 16.782669067382812, 4.6325836181640625, 13.870681762695312, 38.87755584716797], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000210.npy"}
{"epoch": 0.30837004405286345, "step": 211, "batch_size": 64, "mean": 16.896448135375977, "std": 18.903907775878906, "min": -23.68798828125, "p10": -7.385555267333984, "median": 16.515193939208984, "p90": 40.56665420532227, "max": 84.94468688964844, "pos_frac": 0.8125, "sample": [24.13461685180664, 10.916204452514648, 29.448272705078125, -7.431983947753906, 29.218597412109375, 16.539321899414062, 39.437042236328125, 41.77259826660156, 56.95014953613281, 19.723312377929688, 26.382583618164062, -10.985803604125977, 6.6372222900390625, 30.827438354492188, 2.707763671875, 45.216094970703125, 45.95710754394531, 6.285064697265625, 36.855125427246094, -13.964981079101562, 7.164087295532227, -23.68798828125, 2.3930511474609375, 27.54216766357422, 20.9971923828125, -8.143363952636719, -6.096397399902344, 29.782516479492188, -0.9771194458007812, 16.840179443359375, 84.94468688964844, 16.132308959960938, 20.17833709716797, 24.471534729003906, 43.063453674316406, 26.932083129882812, 10.73284912109375, 25.66606903076172, 11.825981140136719, 25.984115600585938, 15.285579681396484, 16.940231323242188, 19.89495849609375, 6.103107452392578, 2.7068023681640625, 41.05077362060547, 14.635631561279297, 20.057632446289062, 3.1727237701416016, 23.175884246826172, -7.2772216796875, -7.252830505371094, -9.005683898925781, 8.488204956054688, 39.197669982910156, 16.491065979003906, 18.865257263183594, 10.739753723144531, -8.91627311706543, 16.40833854675293, 6.481880187988281, 37.14808654785156, -0.6651763916015625, 5.2747650146484375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000211.npy"}
{"epoch": 0.30983847283406757, "step": 212, "batch_size": 64, "mean": 23.030893325805664, "std": 19.947629928588867, "min": -23.489913940429688, "p10": 0.016555023193360496, "median": 21.442806243896484, "p90": 51.36478576660157, "max": 82.04487609863281, "pos_frac": 0.890625, "sample": [18.90955352783203, 3.6561203002929688, 82.04487609863281, 62.00245666503906, 19.83942413330078, 18.605064392089844, -3.417724609375, 4.879917144775391, 28.241188049316406, 8.723640441894531, 39.84748840332031, -23.489913940429688, 1.1221237182617188, -1.3693695068359375, 14.816482543945312, 30.367897033691406, 44.74755859375, 52.87786102294922, 63.4522705078125, 36.479522705078125, -0.4572601318359375, -2.2792739868164062, 21.45331573486328, 26.125335693359375, 25.37512969970703, 46.855316162109375, 58.804962158203125, 2.2867584228515625, 30.052261352539062, 4.235912322998047, 18.449851989746094, 8.824344635009766, 23.36358642578125, 44.452484130859375, 8.401063919067383, 16.91150665283203, 26.27227783203125, 13.743942260742188, 26.01160430908203, 21.770919799804688, 27.35602569580078, 47.59979248046875, 21.202131271362305, 52.64903259277344, 31.591751098632812, 19.31072998046875, 6.8078155517578125, -4.302131652832031, 48.36820983886719, -11.592884063720703, 23.837257385253906, 24.507247924804688, 21.432296752929688, 10.294075012207031, 18.99044418334961, 10.965766906738281, 30.181396484375, 36.458744049072266, 23.59588623046875, 33.974971771240234, 2.7138214111328125, 14.098308563232422, 4.9316864013671875, 56.01434326171875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000212.npy"}
{"epoch": 0.31130690161527164, "step": 213, "batch_size": 64, "mean": 18.629619598388672, "std": 18.594533920288086, "min": -32.88861083984375, "p10": -2.3645076751708984, "median": 17.81218910217285, "p90": 43.622871780395506, "max": 64.49574279785156, "pos_frac": 0.828125, "sample": [34.869293212890625, 8.717483520507812, 15.279136657714844, 33.29796600341797, 17.77679443359375, 15.523628234863281, 1.5837821960449219, 43.49762725830078, -0.5053558349609375, 31.498138427734375, -2.4069252014160156, 29.2962646484375, 29.79059600830078, 3.282590866088867, 33.76581954956055, 28.109851837158203, -1.2297821044921875, -5.765186309814453, 43.67654800415039, 64.49574279785156, 45.09526062011719, 17.983619689941406, 16.90842056274414, 48.81578063964844, 45.27753448486328, 18.17090606689453, 5.56353759765625, -11.636512756347656, 12.238754272460938, -5.634033203125, 9.367835998535156, 14.750396728515625, 23.51189422607422, 3.5310821533203125, 28.25220489501953, 21.744178771972656, 14.266586303710938, 33.71293640136719, 17.806140899658203, 24.163856506347656, 22.958702087402344, -2.265533447265625, 44.255897521972656, -18.2303466796875, -0.8199539184570312, 21.82740020751953, 20.94192123413086, 64.21432495117188, -32.88861083984375, 33.98651885986328, 17.8182373046875, 41.12007141113281, 16.853988647460938, 1.8112068176269531, 8.8463134765625, -7.146598815917969, 35.360740661621094, 4.71826171875, 9.370668411254883, 13.742271423339844, 26.59408950805664, 6.091175079345703, 24.359577178955078, 30.33083724975586], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000213.npy"}
{"epoch": 0.31277533039647576, "step": 214, "batch_size": 64, "mean": 16.192214965820312, "std": 22.042314529418945, "min": -26.196880340576172, "p10": -11.294264030456539, "median": 14.878044128417969, "p90": 47.3191505432129, "max": 91.41706848144531, "pos_frac": 0.78125, "sample": [21.689727783203125, 52.25598907470703, 0.792449951171875, -0.15301132202148438, 23.450550079345703, -15.5423583984375, 3.0248336791992188, -17.343856811523438, 21.87206268310547, 18.709495544433594, -13.930057525634766, 11.840774536132812, 5.241436004638672, -1.7108535766601562, 34.43556213378906, 10.236961364746094, -3.344512939453125, 11.353580474853516, 21.71484375, 37.47621154785156, 11.37704849243164, 15.126739501953125, 23.8721923828125, 6.179283142089844, 25.744747161865234, 32.722381591796875, 34.873443603515625, 5.428956985473633, 14.629348754882812, -20.052764892578125, 28.772464752197266, 40.268699645996094, 15.721588134765625, 55.18268585205078, 8.388916015625, 52.615509033203125, 60.601253509521484, 9.3690185546875, 23.778564453125, 19.104049682617188, 17.08759307861328, 1.0254135131835938, 32.87842559814453, 91.41706848144531, -26.196880340576172, 50.48388671875, 12.323440551757812, 9.177459716796875, 18.55572509765625, -3.8003082275390625, 16.811676025390625, 26.912063598632812, 8.141535758972168, 1.728546142578125, -1.280029296875, -23.645591735839844, 15.93850326538086, 45.877830505371094, 33.584869384765625, 1.359039306640625, -5.8777313232421875, 47.936859130859375, -6.609506607055664, -13.302017211914062], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000214.npy"}
{"epoch": 0.3142437591776799, "step": 215, "batch_size": 64, "mean": 21.456758499145508, "std": 19.102970123291016, "min": -8.102008819580078, "p10": -0.6553718566894529, "median": 18.151824951171875, "p90": 45.47415122985841, "max": 75.2066650390625, "pos_frac": 0.875, "sample": [32.428871154785156, 40.71698760986328, 10.6253662109375, 25.027969360351562, 9.732612609863281, 30.18353271484375, -8.102008819580078, 23.145545959472656, 29.4871826171875, 31.974000930786133, 75.0446548461914, 29.78011703491211, 23.1773738861084, 27.02257537841797, 41.65214538574219, 28.140228271484375, 8.183120727539062, 15.473442077636719, 71.19073486328125, 36.33127212524414, 1.3754768371582031, 49.36907958984375, 75.2066650390625, 22.258270263671875, 17.319427490234375, -5.8851165771484375, -2.577381134033203, 15.633201599121094, 35.44407653808594, 20.663772583007812, 7.076053619384766, 14.223033905029297, -4.119529724121094, 0.3538322448730469, 17.90953826904297, 17.682788848876953, -2.803558349609375, 24.072250366210938, 29.471725463867188, 16.186080932617188, -2.71795654296875, 21.597023010253906, 54.37336730957031, 43.140167236328125, 36.27817916870117, 13.291671752929688, 46.474430084228516, 7.18865966796875, 18.39411163330078, 50.55780792236328, 22.148277282714844, 17.16353988647461, 30.9989013671875, 1.7413368225097656, -0.435089111328125, 6.82513427734375, 8.841545104980469, 35.57646942138672, -0.7497787475585938, 11.493158340454102, 0.06260490417480469, 13.855297088623047, 1.8727760314941406, 5.185478210449219], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000215.npy"}
{"epoch": 0.315712187958884, "step": 216, "batch_size": 64, "mean": 17.97332000732422, "std": 17.59032440185547, "min": -18.308197021484375, "p10": 0.13700332641601612, "median": 14.386823654174805, "p90": 41.165831375122075, "max": 82.6205825805664, "pos_frac": 0.890625, "sample": [17.429183959960938, 31.630126953125, 2.6082916259765625, 22.289962768554688, 4.063423156738281, 7.5999603271484375, 29.567832946777344, 14.286914825439453, 27.061660766601562, 3.5566940307617188, 23.0927734375, 1.019256591796875, 27.901596069335938, 2.6660537719726562, 23.640586853027344, 27.063201904296875, -5.714899063110352, 44.59648132324219, 16.032001495361328, -18.308197021484375, 2.3217086791992188, 54.901763916015625, 9.06924057006836, 25.666046142578125, 41.62220001220703, 14.89788818359375, 34.91322326660156, 40.020408630371094, 49.6224365234375, 4.243671417236328, 32.589599609375, 27.301631927490234, -1.7191076278686523, 10.671952247619629, 20.345558166503906, 12.34320068359375, 11.345901489257812, 27.45378875732422, 7.3415374755859375, 9.740581512451172, 8.094776153564453, 5.401176452636719, -3.985820770263672, 8.55911636352539, 42.22645568847656, 15.920289993286133, 40.10097122192383, 22.199737548828125, -0.07297515869140625, 29.307647705078125, 5.5566253662109375, 12.065608978271484, -2.5174808502197266, 11.199588775634766, 7.1294708251953125, 82.6205825805664, 25.997909545898438, 14.486732482910156, 49.969032287597656, 35.39216613769531, 0.65350341796875, -0.18662643432617188, 2.7709922790527344, 0.626953125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000216.npy"}
{"epoch": 0.31718061674008813, "step": 217, "batch_size": 64, "mean": 16.616840362548828, "std": 19.96677017211914, "min": -19.275955200195312, "p10": -8.874074268341062, "median": 14.099662780761719, "p90": 46.55165710449219, "max": 68.44793701171875, "pos_frac": 0.828125, "sample": [47.1219482421875, 31.930179595947266, 18.889663696289062, -13.443387985229492, 8.00545883178711, 19.249191284179688, 26.638904571533203, 11.650711059570312, 20.9293212890625, 26.026397705078125, 43.1844482421875, 55.236419677734375, 22.28515625, 56.40691375732422, 24.491016387939453, -13.928817749023438, 14.360157012939453, -9.875892639160156, 32.10284423828125, 68.44793701171875, -4.9066925048828125, 5.434673309326172, 16.70965576171875, 6.0096282958984375, 7.610321044921875, 29.01740264892578, 2.504180908203125, -16.772884368896484, -4.00531005859375, 68.41578674316406, 54.00920104980469, 12.081130981445312, -14.990825653076172, 5.518470764160156, 45.220977783203125, 18.732463836669922, 20.081390380859375, 11.059135437011719, -19.275955200195312, 26.779937744140625, 32.82398986816406, 13.223281860351562, 49.993988037109375, 10.115524291992188, 19.423629760742188, 15.931587219238281, 25.628524780273438, 9.282135009765625, 17.662220001220703, 21.130144119262695, -17.040695190429688, 10.585281372070312, 10.187469482421875, 13.839168548583984, 25.788772583007812, 5.2550506591796875, 3.4360198974609375, 7.794614791870117, 7.7457427978515625, 2.31451416015625, 16.74151611328125, 13.100357055664062, -3.8897933959960938, -6.536498069763184], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000217.npy"}
{"epoch": 0.3186490455212922, "step": 218, "batch_size": 64, "mean": 18.908458709716797, "std": 21.201505661010742, "min": -35.66002655029297, "p10": -6.244234466552735, "median": 18.507831573486328, "p90": 43.80574264526368, "max": 79.41845703125, "pos_frac": 0.84375, "sample": [20.465660095214844, 34.535987854003906, 44.209190368652344, 12.932796478271484, 41.72480010986328, 7.650707244873047, 48.18341064453125, 28.670494079589844, 8.649181365966797, -6.2486724853515625, 22.38037109375, 20.550315856933594, -12.148452758789062, 27.129364013671875, 34.869964599609375, 8.466598510742188, -15.283340454101562, 7.982521057128906, 25.894119262695312, 22.80677032470703, 12.29351806640625, 25.722455978393555, 21.178260803222656, -31.050872802734375, 6.1583251953125, -5.321449279785156, 18.063461303710938, 37.38890838623047, 42.86436462402344, 57.113059997558594, 31.615129470825195, 25.087554931640625, 54.0869140625, 12.223129272460938, 37.668670654296875, 34.779808044433594, 27.919164657592773, -2.0137557983398438, 17.204448699951172, 13.494773864746094, 23.428024291992188, 4.482280731201172, 27.17403793334961, 11.616020202636719, 9.206817626953125, -7.856910705566406, 79.41845703125, 54.541473388671875, -9.705989837646484, 18.95220184326172, 6.452934265136719, 37.200355529785156, 24.5289306640625, 14.21017837524414, 6.797050476074219, 35.31597137451172, 63.375938415527344, 1.7550048828125, 15.072906494140625, -35.66002655029297, 1.41131591796875, 2.209808349609375, -6.233879089355469, 12.550804138183594], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000218.npy"}
{"epoch": 0.3201174743024963, "step": 219, "batch_size": 64, "mean": 17.627521514892578, "std": 22.152233123779297, "min": -49.757965087890625, "p10": -3.414673614501953, "median": 13.695892333984375, "p90": 47.652035903930674, "max": 76.46487426757812, "pos_frac": 0.84375, "sample": [41.15704345703125, 27.242233276367188, 28.18838882446289, 48.67906188964844, 16.535993576049805, 21.586639404296875, -49.757965087890625, 4.335029602050781, 9.22479248046875, 51.8492431640625, 28.544403076171875, 1.9655647277832031, 11.322662353515625, 62.50042724609375, 70.54263305664062, 10.654685974121094, -0.2736091613769531, 38.18701934814453, 8.146354675292969, 45.25564193725586, 31.10741424560547, 5.6582489013671875, -3.2753219604492188, 21.622940063476562, 30.65447998046875, -4.766929626464844, 40.275360107421875, 8.600757598876953, 12.251060485839844, 14.02569580078125, 26.266956329345703, 76.46487426757812, -7.442535400390625, 4.579021453857422, -8.338401794433594, 14.834747314453125, 32.91388702392578, 6.804847717285156, 10.997406005859375, 53.34332275390625, 26.33385467529297, 1.3505859375, 0.167327880859375, 16.3353271484375, 54.94672775268555, 16.887622833251953, 6.794525146484375, -2.2487335205078125, 8.15066909790039, 29.895294189453125, -28.578216552734375, 28.09210205078125, 3.6181564331054688, -3.474395751953125, -17.7845458984375, 22.774898529052734, 7.054721832275391, 21.52239990234375, 23.710113525390625, 10.941650390625, 13.3660888671875, 39.25004577636719, 1.3581161499023438, 5.232967376708984], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000219.npy"}
{"epoch": 0.32158590308370044, "step": 220, "batch_size": 64, "mean": 15.754082679748535, "std": 16.540620803833008, "min": -14.899101257324219, "p10": -5.655397415161133, "median": 13.83199691772461, "p90": 37.3312671661377, "max": 62.18357849121094, "pos_frac": 0.828125, "sample": [-14.899101257324219, 9.136642456054688, 11.028182983398438, -5.688007354736328, 46.011722564697266, 39.39015197753906, 19.818588256835938, 8.9937744140625, -11.836761474609375, 36.13135528564453, 23.251327514648438, 0.063079833984375, 34.671852111816406, 22.058094024658203, 36.17764663696289, 27.4923095703125, 14.286483764648438, -9.895036697387695, 37.82567596435547, 41.660003662109375, 62.18357849121094, 6.767911911010742, 42.654327392578125, 22.877761840820312, -0.3755950927734375, 13.377510070800781, 26.631465911865234, 2.55767822265625, -5.579307556152344, 8.222049713134766, 43.46159362792969, -6.7072296142578125, 10.225666046142578, 2.57305908203125, 34.349609375, 21.636856079101562, 3.383575439453125, 3.76605224609375, 2.4504318237304688, 21.254653930664062, 2.72625732421875, 22.070907592773438, 13.267210006713867, -8.726242065429688, 9.507137298583984, -4.819150924682617, 27.52435302734375, 21.289993286132812, 10.771507263183594, 3.0356826782226562, 2.315032958984375, 33.880897521972656, 10.767189025878906, 17.69464111328125, -9.895271301269531, 24.020915985107422, 29.2076416015625, 29.801528930664062, 23.0626220703125, 19.583648681640625, 23.388641357421875, -1.6180000305175781, 21.30804443359375, 6.706451416015625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000220.npy"}
{"epoch": 0.32305433186490456, "step": 221, "batch_size": 64, "mean": 23.293336868286133, "std": 20.17242431640625, "min": -16.729843139648438, "p10": -1.934389495849608, "median": 19.174917221069336, "p90": 51.75602951049806, "max": 77.84333038330078, "pos_frac": 0.875, "sample": [2.5748634338378906, -8.609603881835938, -4.610816955566406, 8.3634033203125, 20.931304931640625, 55.5152587890625, 18.90753173828125, 17.955963134765625, 31.479190826416016, 77.84333038330078, 44.69581604003906, -8.180870056152344, 27.904888153076172, 39.479095458984375, 28.587093353271484, 8.984443664550781, 38.87200927734375, 36.31617736816406, 14.127815246582031, 41.09075927734375, 39.06632995605469, 5.857574462890625, -0.42375946044921875, 17.992225646972656, 23.484878540039062, 59.86644744873047, 38.5148811340332, 16.986907958984375, 9.265701293945312, -6.251312255859375, 30.7904052734375, 52.83271026611328, 29.185997009277344, 34.20728302001953, 8.82940673828125, 33.63471221923828, 7.998775482177734, 19.442302703857422, 17.183330535888672, 13.654960632324219, 13.968452453613281, 70.54290771484375, 57.784969329833984, 13.988555908203125, 20.82485580444336, 7.1440582275390625, 38.552101135253906, 35.49854278564453, 45.542137145996094, 23.02972412109375, 16.626022338867188, -6.079845428466797, 14.0458984375, 3.9792938232421875, 18.535186767578125, 12.258926391601562, -16.729843139648438, 35.3505859375, 14.025646209716797, -2.5818023681640625, 21.25335693359375, 1.812032699584961, 49.2437744140625, 57.81053924560547], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000221.npy"}
{"epoch": 0.3245227606461087, "step": 222, "batch_size": 64, "mean": 19.7185001373291, "std": 24.799470901489258, "min": -44.952545166015625, "p10": -5.419067764282226, "median": 16.97915554046631, "p90": 51.57283020019532, "max": 89.45675659179688, "pos_frac": 0.828125, "sample": [7.5537261962890625, 40.301025390625, 12.345527648925781, 12.601123809814453, -9.410476684570312, 22.190860748291016, -2.037811279296875, 35.82024383544922, -44.08447265625, 34.41264343261719, 22.64191436767578, 3.070697784423828, 83.52436828613281, 35.63025665283203, 16.64764404296875, 50.12590789794922, 13.7977294921875, 14.186416625976562, 38.37501525878906, 34.31010818481445, 11.696945190429688, 8.066757202148438, -44.952545166015625, 35.2633056640625, 5.4279632568359375, 16.650428771972656, 20.722145080566406, -4.880889892578125, 25.272628784179688, 17.30788230895996, 27.54144287109375, 10.780471801757812, 28.11968231201172, 6.455341339111328, -25.474884033203125, 10.430938720703125, -0.25179290771484375, 22.905994415283203, 2.175884246826172, 35.637603759765625, 32.432777404785156, 15.863296508789062, 22.634258270263672, -27.119674682617188, 52.648345947265625, 16.052867889404297, 23.184688568115234, 40.194969177246094, 13.876930236816406, 5.730079650878906, 15.105316162109375, 29.537076950073242, 30.59514617919922, 52.827308654785156, 60.10321044921875, -10.169082641601562, 52.19293975830078, -5.649715423583984, -2.0595016479492188, 0.2991943359375, 89.45675659179688, 37.50909423828125, 63.8055419921875, 22.034408569335938], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000222.npy"}
{"epoch": 0.32599118942731276, "step": 223, "batch_size": 64, "mean": 15.932899475097656, "std": 20.124418258666992, "min": -32.8095588684082, "p10": -8.35456848144531, "median": 13.95182991027832, "p90": 41.224230194091795, "max": 82.139404296875, "pos_frac": 0.828125, "sample": [6.3993988037109375, -9.16412353515625, 20.022125244140625, -5.747215270996094, 3.3268814086914062, 2.1613845825195312, 28.593833923339844, 15.193075180053711, 18.426136016845703, -16.59009552001953, -14.615760803222656, 27.470428466796875, 16.409629821777344, 12.8406982421875, 14.324874877929688, 13.739021301269531, 14.61362075805664, 17.64640998840332, 41.94146728515625, 6.574420928955078, 26.55970001220703, -10.501937866210938, 14.16463851928711, -32.8095588684082, -11.649730682373047, 46.56468963623047, 2.4641647338867188, 35.47770690917969, 36.81578063964844, 3.04705810546875, 22.511497497558594, 15.355159759521484, 10.630203247070312, 35.668174743652344, 36.38087463378906, 41.18690490722656, -6.465606689453125, 13.232955932617188, 12.367172241210938, 82.139404296875, 40.2618408203125, -18.229114532470703, 20.225444793701172, 37.35646057128906, 41.24022674560547, -4.455841064453125, 5.7291107177734375, 11.841453552246094, 25.82868194580078, 43.54523849487305, 3.9651756286621094, 1.724752426147461, 10.874763488769531, 48.648223876953125, -0.5805091857910156, 48.015350341796875, 2.552715301513672, 1.3776493072509766, 9.1220703125, 1.84674072265625, 21.338808059692383, 38.148414611816406, 4.340721130371094, 38.28175354003906], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000223.npy"}
{"epoch": 0.3274596182085169, "step": 224, "batch_size": 64, "mean": 19.053667068481445, "std": 20.1626033782959, "min": -25.511825561523438, "p10": -6.717670822143554, "median": 17.720404624938965, "p90": 40.83401260375977, "max": 79.75128173828125, "pos_frac": 0.828125, "sample": [9.421249389648438, -25.511825561523438, 23.284080505371094, 8.328170776367188, 3.2119216918945312, 62.306983947753906, 5.792900085449219, 29.126258850097656, 15.550125122070312, 19.760658264160156, 17.28362464904785, 45.80616760253906, 25.820404052734375, 25.11688995361328, 39.96795654296875, -6.877845764160156, 16.395004272460938, 8.019594192504883, 34.093170166015625, 36.43109130859375, 40.76874542236328, 18.510330200195312, -5.934471130371094, 17.835777282714844, -9.119064331054688, 30.515419006347656, 7.88970947265625, 26.984634399414062, 33.2120361328125, -9.040119171142578, -5.306976318359375, -12.309440612792969, 15.928180694580078, 79.75128173828125, 11.374862670898438, -12.399765014648438, 19.120155334472656, -9.119918823242188, 23.46805191040039, 7.008968353271484, 36.22415542602539, 40.86198425292969, 8.337806701660156, 2.6913719177246094, 12.907272338867188, 17.605031967163086, -6.343929290771484, 11.607131958007812, 21.42099380493164, 13.663009643554688, 23.260589599609375, 16.013076782226562, 52.278106689453125, 28.08509063720703, -3.8403396606445312, 13.117843627929688, 25.67285919189453, 18.762481689453125, 15.872932434082031, 29.165237426757812, 31.14211654663086, 49.09169006347656, 77.6791000366211, 21.694141387939453], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000224.npy"}
{"epoch": 0.328928046989721, "step": 225, "batch_size": 64, "mean": 21.81402587890625, "std": 23.53586196899414, "min": -33.759090423583984, "p10": -2.6490238189697264, "median": 22.44729995727539, "p90": 52.72145004272461, "max": 100.70321655273438, "pos_frac": 0.828125, "sample": [22.614181518554688, 84.96438598632812, 19.19139862060547, 18.857784271240234, 34.83707046508789, 16.463314056396484, 4.972618103027344, 27.9140625, -7.929779052734375, 32.987144470214844, 20.76616668701172, 25.476768493652344, -30.283340454101562, 9.02349853515625, 31.68366241455078, 52.01725769042969, -33.759090423583984, -2.751117706298828, -2.4108047485351562, 8.27154541015625, 8.859909057617188, 55.668182373046875, 56.824188232421875, 54.93825149536133, -0.9023532867431641, 4.94354248046875, 67.28751373291016, 33.283348083496094, 38.24736404418945, 23.040931701660156, 16.78136444091797, 11.53131103515625, 9.926467895507812, 10.851203918457031, -5.892539978027344, 40.971832275390625, 27.1051025390625, 4.064414978027344, 4.9597320556640625, 26.523155212402344, 31.878524780273438, 25.21070098876953, 21.8004150390625, 35.40430450439453, 53.02324676513672, 14.423492431640625, 100.70321655273438, 44.804988861083984, 29.334030151367188, 30.41309356689453, 24.747238159179688, 17.856369018554688, -1.9620208740234375, -1.053863525390625, -16.324138641357422, 32.07112121582031, 23.746063232421875, 22.280418395996094, 15.789939880371094, 26.77520751953125, 0.5836067199707031, 24.494461059570312, -10.180503845214844, 28.358184814453125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000225.npy"}
{"epoch": 0.3303964757709251, "step": 226, "batch_size": 64, "mean": 19.608936309814453, "std": 18.59842872619629, "min": -17.77538299560547, "p10": -1.0894866943359371, "median": 21.060680389404297, "p90": 45.14620437622071, "max": 70.49813842773438, "pos_frac": 0.875, "sample": [70.49813842773438, 14.988815307617188, -8.139511108398438, 58.80364990234375, 12.90958023071289, -0.6982574462890625, 5.929412841796875, 46.12556457519531, 8.936527252197266, 14.39895248413086, 29.92786407470703, 23.809059143066406, 1.3797969818115234, 28.65642547607422, 10.646011352539062, 5.567899703979492, 24.871871948242188, 2.5622406005859375, 29.394718170166016, 18.936233520507812, 39.55158996582031, 39.226409912109375, 11.137704849243164, 21.143280029296875, 13.260459899902344, 4.0468902587890625, 22.79572296142578, 22.979263305664062, -14.299560546875, -17.77538299560547, -1.2571563720703125, 28.935569763183594, 26.95355224609375, 31.456226348876953, 21.420852661132812, 21.094970703125, 40.2928466796875, 21.026390075683594, 17.46703338623047, 7.657157897949219, 25.018661499023438, 2.10626220703125, 13.555500030517578, 5.951591491699219, 46.751277923583984, -3.8890304565429688, 28.84844970703125, 23.56304931640625, -7.550329208374023, 21.345367431640625, 39.28204345703125, 4.54217529296875, 24.720748901367188, 21.510459899902344, 0.4509315490722656, 48.29754638671875, 17.4901123046875, 5.8996429443359375, 58.81156921386719, 9.114278793334961, 37.68363952636719, -17.458045959472656, 49.4461669921875, 42.86103057861328], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000226.npy"}
{"epoch": 0.33186490455212925, "step": 227, "batch_size": 64, "mean": 19.70592498779297, "std": 22.856199264526367, "min": -23.717803955078125, "p10": -8.294147491455076, "median": 16.66073989868164, "p90": 52.51931457519532, "max": 77.80963134765625, "pos_frac": 0.78125, "sample": [16.795944213867188, 7.7885894775390625, 7.269645690917969, 42.21403503417969, 35.757080078125, 11.330074310302734, 33.16507339477539, 28.792911529541016, 32.23388671875, 13.614738464355469, 9.556835174560547, 58.61042785644531, -4.8074493408203125, 3.892427444458008, -6.512825012207031, 28.58380889892578, -9.399299621582031, 28.238555908203125, 48.65966796875, 77.80963134765625, 10.439460754394531, 22.0048828125, 27.650955200195312, 19.5361328125, 9.792091369628906, -12.972160339355469, 45.34790802001953, -2.6233062744140625, 17.995803833007812, 16.525535583496094, 41.41389465332031, 45.2010498046875, 26.906875610351562, -5.716587066650391, -17.36639404296875, -20.1207275390625, 12.66351318359375, -5.9293212890625, 52.974853515625, -23.717803955078125, 33.183067321777344, 53.62837219238281, -6.109779357910156, 55.258575439453125, 5.978080749511719, 20.27752685546875, 38.71476745605469, 7.465980529785156, 56.104705810546875, 14.345979690551758, 15.37252426147461, 54.23600769042969, 51.456390380859375, -21.200286865234375, -3.4450740814208984, 28.505151748657227, 41.51972961425781, -9.057571411132812, 7.3875732421875, 7.8596343994140625, 19.763195037841797, 14.142858505249023, 4.075771331787109, 48.11566162109375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000227.npy"}
{"epoch": 0.3333333333333333, "step": 228, "batch_size": 64, "mean": 15.936595916748047, "std": 17.9883975982666, "min": -28.310951232910156, "p10": -5.8372863769531245, "median": 18.41212272644043, "p90": 36.85183258056641, "max": 59.14545440673828, "pos_frac": 0.8125, "sample": [-6.98138427734375, -22.112884521484375, 29.303436279296875, 46.869293212890625, -20.348541259765625, 46.118186950683594, -3.3566360473632812, 10.704818725585938, 28.42247772216797, -1.0964927673339844, 27.288742065429688, 27.916168212890625, 26.516510009765625, 37.20374298095703, 19.319847106933594, 16.71435546875, 44.58283996582031, -28.310951232910156, 8.061098098754883, 10.984912872314453, 1.92926025390625, 1.5342254638671875, 3.48883056640625, 43.29816818237305, 21.064651489257812, 14.985580444335938, 3.534473419189453, -0.3943328857421875, 10.366584777832031, -7.5481109619140625, 59.14545440673828, 14.7015380859375, 24.29761505126953, 47.16630172729492, 30.28582000732422, 10.946056365966797, 3.6026611328125, 18.841835021972656, 30.23028564453125, 30.646514892578125, -4.378631591796875, 36.03070831298828, 24.189682006835938, 33.20991134643555, 7.322948455810547, 18.460281372070312, 6.0809783935546875, -21.674545288085938, 19.573593139648438, 12.999069213867188, 18.887008666992188, 24.118072509765625, -6.192474365234375, 25.040138244628906, 14.912483215332031, 3.023679733276367, 28.14373016357422, -5.008514404296875, 18.707077026367188, 26.604049682617188, 29.96649169921875, 26.44049072265625, 5.1989898681640625, 18.363964080810547], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000228.npy"}
{"epoch": 0.33480176211453744, "step": 229, "batch_size": 64, "mean": 25.238357543945312, "std": 23.789390563964844, "min": -22.511207580566406, "p10": -2.578567504882812, "median": 23.5614070892334, "p90": 54.1190860748291, "max": 93.56983947753906, "pos_frac": 0.875, "sample": [53.620201110839844, 17.247270584106445, 9.537429809570312, 40.907928466796875, 17.779022216796875, 10.828258514404297, 28.14306640625, 37.80294418334961, 6.0759429931640625, 31.34436798095703, 53.89178466796875, 50.37645721435547, 23.694225311279297, 64.93901062011719, 67.97984313964844, 40.231163024902344, 24.349685668945312, 17.41094970703125, 30.125022888183594, 70.64859008789062, -12.810592651367188, 21.987350463867188, 11.937007904052734, 32.578346252441406, 12.535171508789062, 34.349544525146484, 60.1251220703125, 1.893280029296875, 51.265289306640625, 41.283912658691406, 23.4285888671875, 30.736282348632812, 8.133808135986328, 32.98335266113281, 93.56983947753906, 7.407642364501953, 74.65232849121094, 54.13095474243164, -7.713043212890625, 3.4639434814453125, 42.20152282714844, -22.511207580566406, -2.716583251953125, 1.7616424560546875, 24.106853485107422, -4.575355529785156, 25.46441650390625, 54.091392517089844, 11.9866943359375, -2.25653076171875, -5.015117645263672, 35.27193832397461, 8.435094833374023, 3.625743865966797, 18.371135711669922, 19.275222778320312, 5.6609039306640625, 10.164352416992188, 32.978660583496094, 36.006980895996094, -6.549842834472656, 4.099208831787109, 0.713165283203125, 51.7933349609375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000229.npy"}
{"epoch": 0.33627019089574156, "step": 230, "batch_size": 64, "mean": 24.8081111907959, "std": 20.33191680908203, "min": -20.380630493164062, "p10": 3.005419540405274, "median": 21.9781494140625, "p90": 51.666441726684575, "max": 76.37946319580078, "pos_frac": 0.90625, "sample": [10.214927673339844, 60.37445068359375, 12.893611907958984, 35.88671112060547, -2.5380210876464844, 19.255224227905273, 17.701812744140625, 44.02174377441406, 31.753978729248047, 28.866683959960938, 13.524551391601562, 6.4514617919921875, -14.971199035644531, 28.728973388671875, 38.789215087890625, 68.81204223632812, -6.656440734863281, 36.66636657714844, 13.885757446289062, 17.444316864013672, 59.33903503417969, -15.202037811279297, 4.084495544433594, 19.278038024902344, 29.918542861938477, 30.022911071777344, 31.531410217285156, 20.86060333251953, 9.413787841796875, 16.719879150390625, 4.8757781982421875, -0.9549674987792969, 37.889404296875, 46.862396240234375, 54.473297119140625, 50.946075439453125, 19.15863037109375, 22.166969299316406, 22.277179718017578, 45.21813201904297, 2.7775115966796875, 33.37036895751953, 34.65039825439453, 19.70294952392578, 10.634864807128906, 7.903903961181641, 62.87763977050781, 37.17596435546875, 51.97517013549805, 8.415199279785156, 12.27779769897461, 37.27418518066406, 21.789329528808594, 3.5372047424316406, 49.4716796875, -20.380630493164062, 76.37946319580078, 24.4097900390625, 17.92424774169922, 27.126083374023438, 12.594745635986328, 14.816085815429688, 44.0987548828125, 26.93073272705078], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000230.npy"}
{"epoch": 0.3377386196769457, "step": 231, "batch_size": 64, "mean": 19.79311752319336, "std": 18.631580352783203, "min": -19.818248748779297, "p10": -3.1139450073242183, "median": 17.92343807220459, "p90": 46.758520889282245, "max": 73.78521728515625, "pos_frac": 0.859375, "sample": [73.78521728515625, 20.02429962158203, 48.9235954284668, 60.58746337890625, -3.311870574951172, 28.391090393066406, 32.89385986328125, 24.959671020507812, 0.6758213043212891, 19.134613037109375, 1.4838943481445312, 11.658393859863281, -3.822355270385742, 29.90624237060547, 66.37113952636719, 23.99140167236328, 25.564239501953125, 34.44926452636719, 14.346870422363281, -4.614105224609375, 22.47222137451172, 28.950292587280273, 14.911613464355469, 10.98733139038086, 13.954994201660156, 24.16006088256836, 35.67005157470703, -5.0526275634765625, 17.48419761657715, 39.66753387451172, 41.70668029785156, 4.1790008544921875, 16.124473571777344, 18.744529724121094, 8.403175354003906, 5.748539924621582, 3.700042724609375, 26.77169418334961, 3.3058013916015625, 3.574188232421875, 13.36236572265625, 16.518413543701172, 11.7440185546875, -0.24882125854492188, 32.07347869873047, 24.32280731201172, 21.89453125, -2.652118682861328, -19.818248748779297, 16.027923583984375, 53.312171936035156, 21.06531524658203, 11.216167449951172, 29.82469940185547, 30.177669525146484, 52.87383270263672, -8.665912628173828, 18.36267852783203, -3.6157455444335938, 54.31013488769531, 12.09234619140625, 13.436161041259766, 8.076148986816406, 20.206939697265625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000231.npy"}
{"epoch": 0.3392070484581498, "step": 232, "batch_size": 64, "mean": 18.341995239257812, "std": 21.190006256103516, "min": -26.7393798828125, "p10": -8.512464904785157, "median": 19.61841583251953, "p90": 44.16654891967774, "max": 83.41450500488281, "pos_frac": 0.8125, "sample": [-21.061424255371094, 17.668716430664062, 0.6000041961669922, 20.375518798828125, 9.878829956054688, 28.681838989257812, 57.319427490234375, 25.294097900390625, -18.3546142578125, 44.00298309326172, -2.7841415405273438, 10.889137268066406, 55.997859954833984, 8.273635864257812, -4.1334228515625, 1.4935455322265625, -13.663726806640625, 0.7949180603027344, -4.29986572265625, 24.27142333984375, 25.094711303710938, 50.41290283203125, 49.829566955566406, 23.454544067382812, 32.550621032714844, -8.78160285949707, 36.421417236328125, 13.434959411621094, 11.46099853515625, 28.118316650390625, 33.04767608642578, 0.6841583251953125, -26.7393798828125, 35.869720458984375, 1.4463310241699219, 24.65845489501953, 30.284738540649414, 3.9692001342773438, 23.28192138671875, 49.25628662109375, 35.95152282714844, 24.620460510253906, 36.85576629638672, 6.5856475830078125, 10.478446960449219, 12.408945083618164, 32.13240432739258, -1.2072677612304688, 25.649658203125, -9.040407180786133, 83.41450500488281, 4.4252471923828125, 18.861312866210938, 21.971946716308594, 42.988182067871094, -8.369369506835938, 44.23664855957031, 7.549465179443359, 14.87615966796875, -8.57379150390625, 3.6335906982421875, 27.50722885131836, 34.46315002441406, 33.467933654785156], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000232.npy"}
{"epoch": 0.3406754772393539, "step": 233, "batch_size": 64, "mean": 24.76534652709961, "std": 23.254087448120117, "min": -24.658912658691406, "p10": -4.994076538085937, "median": 22.571121215820312, "p90": 55.782317352294925, "max": 76.37222290039062, "pos_frac": 0.84375, "sample": [3.2746658325195312, 29.828407287597656, 37.0670166015625, -8.0467529296875, 20.62126922607422, 37.245365142822266, -15.754920959472656, 14.619953155517578, 0.0619659423828125, -6.885124206542969, 15.450119018554688, 12.48630142211914, 25.468345642089844, 23.446914672851562, 76.37222290039062, 31.497215270996094, 21.180931091308594, 1.6551971435546875, 50.256160736083984, 30.41539764404297, 49.87046813964844, 48.77116394042969, -11.336212158203125, 72.84203338623047, 22.714309692382812, 16.296661376953125, 28.563884735107422, 3.4287109375, -8.333213806152344, 34.051239013671875, -1.7702102661132812, 55.617042541503906, 8.517902374267578, 55.8531494140625, 21.299522399902344, 70.79592895507812, 5.414894104003906, 41.57801818847656, -3.063018798828125, 22.427932739257812, 2.18377685546875, 57.846397399902344, 24.758560180664062, 14.410964965820312, -24.658912658691406, 16.187225341796875, -4.7020111083984375, -5.1192474365234375, 7.4858551025390625, 18.58844757080078, 7.747825622558594, 41.85881805419922, 35.47129821777344, 44.18519592285156, 53.19932556152344, 21.83990478515625, 21.206356048583984, 39.536865234375, 30.298267364501953, 59.172393798828125, 46.867462158203125, 58.62303924560547, 54.0482177734375, 30.145187377929688], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000233.npy"}
{"epoch": 0.342143906020558, "step": 234, "batch_size": 64, "mean": 26.18871307373047, "std": 26.06709861755371, "min": -26.539627075195312, "p10": -3.6126068115234364, "median": 24.762535095214844, "p90": 62.23657302856445, "max": 90.4316177368164, "pos_frac": 0.859375, "sample": [17.429431915283203, 31.83966827392578, 62.23484802246094, 45.1033935546875, 29.990280151367188, 62.23731231689453, 33.039154052734375, 28.515335083007812, 33.31909942626953, 26.508773803710938, 12.81357192993164, 0.7909393310546875, -15.8321533203125, -15.602619171142578, 14.750213623046875, 24.78986358642578, 3.2478179931640625, 1.2182884216308594, -13.770294189453125, 26.188217163085938, 88.94105529785156, 49.65290451049805, 13.647649765014648, -4.0076751708984375, 33.27867126464844, 48.04962158203125, 47.77142333984375, 24.516273498535156, 17.433364868164062, 90.4316177368164, 1.0433712005615234, 27.02157211303711, 49.10411834716797, 42.24820327758789, 65.76536560058594, -15.682388305664062, 24.03832244873047, 36.873470306396484, 18.613792419433594, 69.8177490234375, -2.6907806396484375, 9.33049201965332, 16.54651641845703, 5.478742599487305, 0.3865547180175781, -1.9037322998046875, 32.8594970703125, 11.920059204101562, 33.39667510986328, 22.031105041503906, 43.85432434082031, 71.29008483886719, 74.26803588867188, -6.904609680175781, 45.81641387939453, 18.505477905273438, 8.657905578613281, 50.115882873535156, 24.735206604003906, -26.539627075195312, 57.14749526977539, 1.2672882080078125, 42.723785400390625, 6.415283203125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000234.npy"}
{"epoch": 0.3436123348017621, "step": 235, "batch_size": 64, "mean": 19.665771484375, "std": 19.340375900268555, "min": -23.706298828125, "p10": -3.3247643470764148, "median": 16.9315185546875, "p90": 50.3983757019043, "max": 58.149993896484375, "pos_frac": 0.84375, "sample": [20.33153533935547, 8.082958221435547, -0.4919586181640625, -0.08618545532226562, 0.21087646484375, 58.12736511230469, 37.12828826904297, 7.903186798095703, 11.735824584960938, 47.704833984375, 10.075813293457031, 34.5447998046875, 50.79957580566406, 10.436447143554688, 13.45855712890625, -12.121383666992188, 27.796279907226562, -1.8727531433105469, 21.314800262451172, 32.80839538574219, -3.947054862976074, 34.381439208984375, -4.419242858886719, 16.557960510253906, 34.74512481689453, 58.149993896484375, 22.70941925048828, 26.04674530029297, 16.772628784179688, 30.031021118164062, -23.706298828125, 17.090408325195312, 5.485008239746094, 24.07202911376953, -5.343299865722656, 52.650245666503906, 25.379844665527344, 1.5126190185546875, 20.20471954345703, 56.746612548828125, 2.337779998779297, 7.019365310668945, 32.514068603515625, 9.608932495117188, 53.024810791015625, 20.275161743164062, 49.462242126464844, -6.673866271972656, 12.917724609375, 49.15660858154297, 39.55821228027344, 50.85801696777344, 10.6575927734375, 5.1422271728515625, 8.438966751098633, 2.440774917602539, -6.250755310058594, 32.762779235839844, 4.9595794677734375, 18.281967163085938, 15.387969970703125, 17.742630004882812, 37.36051940917969, 8.6190185546875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000235.npy"}
{"epoch": 0.34508076358296624, "step": 236, "batch_size": 64, "mean": 24.66576385498047, "std": 29.16991424560547, "min": -24.21904754638672, "p10": 0.13289833068847845, "median": 18.46198081970215, "p90": 60.97619705200195, "max": 145.69418334960938, "pos_frac": 0.890625, "sample": [18.11835479736328, 8.550323486328125, 52.029632568359375, 11.050399780273438, 16.07347297668457, 12.834373474121094, 10.681488037109375, 11.426727294921875, -1.9240150451660156, -24.21904754638672, 100.47467041015625, 105.3875732421875, 20.485301971435547, -0.6553878784179688, 23.46636199951172, 29.65765380859375, 7.699733734130859, 5.1163787841796875, 21.41228485107422, 17.421112060546875, 16.83594512939453, 22.046371459960938, 8.368841171264648, 47.10343933105469, 61.0400390625, 27.252063751220703, 18.247966766357422, 145.69418334960938, 42.45166015625, 10.82429313659668, 44.27056884765625, 24.566179275512695, 60.827232360839844, 34.710880279541016, 29.914268493652344, 25.86072540283203, 42.4598503112793, 84.6143569946289, 20.22698211669922, 11.447881698608398, 28.431663513183594, 3.9667510986328125, 2.2720909118652344, -11.076255798339844, 1.9722328186035156, 30.26537322998047, 2.6335525512695312, 21.644981384277344, 4.712253570556641, -8.449087142944336, 16.394607543945312, 61.7193603515625, 7.055809020996094, -22.626567840576172, 25.183982849121094, 30.613006591796875, 66.0843505859375, 18.675994873046875, 10.903289794921875, -12.393142700195312, 14.528663635253906, 26.475006103515625, 6.189826965332031, 29.580078125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000236.npy"}
{"epoch": 0.3465491923641703, "step": 237, "batch_size": 64, "mean": 20.084592819213867, "std": 28.654911041259766, "min": -55.22856140136719, "p10": -10.040402221679686, "median": 15.922436714172363, "p90": 55.562661361694346, "max": 104.24484252929688, "pos_frac": 0.78125, "sample": [-7.999835968017578, 28.471233367919922, 75.02059936523438, 41.499122619628906, 2.3817977905273438, 39.85477828979492, 51.65428924560547, 23.435279846191406, 8.936996459960938, -5.2599945068359375, 13.646183013916016, 28.526275634765625, 46.382911682128906, -8.785751342773438, 6.8957977294921875, 30.536285400390625, 9.018245697021484, -17.692407608032227, 17.887100219726562, 51.23784637451172, 12.600982666015625, -7.676971435546875, 64.209716796875, 104.24484252929688, 30.97795867919922, 43.268882751464844, -3.894073486328125, 60.09326934814453, 23.167682647705078, 18.52667236328125, 88.82119750976562, 12.526824951171875, 56.35182189941406, 53.72128677368164, 18.088363647460938, 4.8604278564453125, 19.508255004882812, -16.293502807617188, 15.661590576171875, 24.09374237060547, 81.4012451171875, 10.665473937988281, 16.18328285217285, 31.884735107421875, 3.398773193359375, -12.212539672851562, -12.43988037109375, 2.804412841796875, -55.22856140136719, 50.69673156738281, 13.241004943847656, 4.969337463378906, -25.51158905029297, 11.027400970458984, 22.009841918945312, 2.1226425170898438, 6.873332977294922, -10.578109741210938, -7.3274078369140625, 7.006752014160156, -5.8611602783203125, 51.55308532714844, 22.650527954101562, 17.57886505126953], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000237.npy"}
{"epoch": 0.34801762114537443, "step": 238, "batch_size": 64, "mean": 21.72978401184082, "std": 24.07063102722168, "min": -16.255584716796875, "p10": -11.07748565673828, "median": 20.7137393951416, "p90": 56.455907440185555, "max": 92.22206115722656, "pos_frac": 0.796875, "sample": [61.499908447265625, 10.658496856689453, 7.963081359863281, 16.485031127929688, 3.6449813842773438, 21.91545867919922, 1.5451812744140625, 53.70562744140625, -5.273590087890625, 57.11634826660156, 25.36331558227539, -2.0433387756347656, 7.824047088623047, 17.37933349609375, 22.767864227294922, 7.52227783203125, 51.10478210449219, -13.756195068359375, -11.707645416259766, 7.902656555175781, 21.378650665283203, 61.20452880859375, 29.903900146484375, 23.54681396484375, -3.7078514099121094, 92.22206115722656, 10.678398132324219, 22.236183166503906, 43.14921569824219, 34.47001647949219, -2.8996353149414062, -15.181488037109375, 25.676177978515625, 22.158157348632812, 47.85887145996094, 8.262939453125, -12.966041564941406, 27.072620391845703, 65.806396484375, 20.048828125, 8.224262237548828, 66.58685302734375, 48.85486602783203, -13.297599792480469, -9.607112884521484, 54.914878845214844, 12.129196166992188, 9.691192626953125, 9.807628631591797, 63.64656066894531, 25.74791717529297, 17.380054473876953, -2.6386260986328125, 36.27796936035156, -11.783782958984375, 29.33297348022461, 8.496345520019531, 19.255210876464844, 35.22604751586914, -16.255584716796875, 26.94892120361328, 42.43717575073242, 28.456756591796875, 38.337703704833984], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000238.npy"}
{"epoch": 0.34948604992657856, "step": 239, "batch_size": 64, "mean": 20.524173736572266, "std": 22.791799545288086, "min": -22.419204711914062, "p10": -6.092936706542969, "median": 20.343181610107422, "p90": 53.46880493164064, "max": 70.17558288574219, "pos_frac": 0.78125, "sample": [61.59705352783203, 23.598770141601562, 5.4199371337890625, 28.759536743164062, 29.301734924316406, 46.82965087890625, 41.136009216308594, 3.3343238830566406, -5.666767120361328, 22.611865997314453, 45.93316650390625, 64.6463623046875, -16.562122344970703, 36.07322311401367, 15.966007232666016, -1.9836883544921875, 70.17558288574219, 25.10631561279297, -6.0729522705078125, 55.02949523925781, 37.33573913574219, 29.044235229492188, -9.978073120117188, -2.1178531646728516, -4.1763153076171875, 0.07297897338867188, 11.6220703125, 3.8801498413085938, 23.885513305664062, 9.966796875, 41.59307861328125, -13.5439453125, 28.165626525878906, 13.295875549316406, 48.935279846191406, 18.902225494384766, 0.7358551025390625, 31.624053955078125, 34.699806213378906, 6.5954742431640625, -2.4495277404785156, 44.609580993652344, 35.72723388671875, 8.848564147949219, -6.10150146484375, 11.08780288696289, -22.419204711914062, 8.1661376953125, -3.4320755004882812, 20.831275939941406, 56.44254684448242, 3.639892578125, 32.182167053222656, 31.036312103271484, 19.855087280273438, 18.385494232177734, 49.82719421386719, -12.769012451171875, 24.065196990966797, 28.622135162353516, 56.49385070800781, 12.580940246582031, 60.88348388671875, -18.33856773376465], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000239.npy"}
{"epoch": 0.3509544787077827, "step": 240, "batch_size": 64, "mean": 18.81613540649414, "std": 21.17365837097168, "min": -23.822166442871094, "p10": -6.741899871826171, "median": 20.346942901611328, "p90": 45.023912048339845, "max": 68.81407928466797, "pos_frac": 0.734375, "sample": [45.198951721191406, 40.47697830200195, 9.860282897949219, -0.01830291748046875, 17.631454467773438, -1.5688323974609375, 11.602401733398438, 25.272979736328125, 24.981121063232422, -4.342906951904297, -6.286659240722656, 1.665252685546875, 27.798912048339844, -23.822166442871094, 41.573036193847656, -13.437896728515625, 4.914739608764648, 14.952728271484375, 31.0135498046875, 38.272056579589844, 23.124774932861328, 51.79547119140625, 46.98588562011719, 33.00967788696289, 21.12464141845703, -4.1140594482421875, 4.448463439941406, 10.439979553222656, 44.61548614501953, 33.968360900878906, 25.156845092773438, 40.15633773803711, 13.28179931640625, -1.9392471313476562, -23.782791137695312, 68.81407928466797, 23.703262329101562, 30.779876708984375, -6.473594665527344, 20.86219024658203, -0.021778106689453125, 44.49851989746094, 36.312767028808594, 11.150962829589844, 41.503761291503906, 27.79106903076172, 49.86376953125, -10.782302856445312, 49.42632293701172, -10.046043395996094, 32.50127410888672, 58.22001647949219, -6.8568878173828125, -1.3763351440429688, 18.97339630126953, 25.386009216308594, 37.40102005004883, 35.01852035522461, 8.285560607910156, 3.986297607421875, -0.15996551513671875, 6.107086181640625, 19.831695556640625, -14.477188110351562], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000240.npy"}
{"epoch": 0.3524229074889868, "step": 241, "batch_size": 64, "mean": 31.7427978515625, "std": 27.80092430114746, "min": -9.79046630859375, "p10": -2.4757308959960933, "median": 33.16873741149902, "p90": 60.26560363769532, "max": 133.5748748779297, "pos_frac": 0.859375, "sample": [61.305747985839844, 13.822914123535156, -2.6072235107421875, 18.2130126953125, 6.035369873046875, 43.30774688720703, 36.70665740966797, 36.564849853515625, 46.97557067871094, 61.60284423828125, 83.9076919555664, 35.90964126586914, 18.89196014404297, 35.66581726074219, -5.957851409912109, 36.03738784790039, -9.25423812866211, 22.317123413085938, 51.97060775756836, 40.19951629638672, -9.79046630859375, 35.35816955566406, 16.657352447509766, -3.001312255859375, 107.42446899414062, 30.129989624023438, 9.2950439453125, 39.759429931640625, 56.44017028808594, 23.86994171142578, 133.5748748779297, 18.478408813476562, 34.354217529296875, 18.38329315185547, 53.107933044433594, 13.683265686035156, 37.343292236328125, 47.343360900878906, 7.114765167236328, 46.44707489013672, 55.95196533203125, 31.983257293701172, 4.9066925048828125, 22.177879333496094, 45.993408203125, -0.783416748046875, 52.074737548828125, 35.98019790649414, 61.577781677246094, 0.167572021484375, 18.622406005859375, 50.58551788330078, -2.168914794921875, 57.838600158691406, 5.069698333740234, 13.899356842041016, 23.14886474609375, 50.39933776855469, 16.087738037109375, 39.91680145263672, -7.4276123046875, 88.42926788330078, -9.784757614135742, 29.304214477539062], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000241.npy"}
{"epoch": 0.35389133627019087, "step": 242, "batch_size": 64, "mean": 22.97677993774414, "std": 21.96134376525879, "min": -25.956085205078125, "p10": -6.534420776367186, "median": 20.5636043548584, "p90": 55.961305999755865, "max": 68.46836853027344, "pos_frac": 0.859375, "sample": [2.550365447998047, 20.835765838623047, -4.3441619873046875, 48.91155242919922, 19.302040100097656, 36.165283203125, 37.441932678222656, -10.886161804199219, 33.087257385253906, 24.8994140625, 1.816263198852539, 51.914215087890625, 60.501426696777344, -9.089035034179688, 16.529632568359375, 4.0844573974609375, 37.243614196777344, 54.884971618652344, 58.729469299316406, 3.5210304260253906, 67.66950988769531, 24.803321838378906, 41.54261779785156, 10.385177612304688, -14.670368194580078, 33.67671203613281, 31.42833709716797, 13.016426086425781, 21.907909393310547, 34.525634765625, 18.05907440185547, 17.1429443359375, 14.029190063476562, 14.209259033203125, 15.309028625488281, -12.129447937011719, 12.211090087890625, 12.5123291015625, -5.813774108886719, 9.8616943359375, 18.850894927978516, 14.59002685546875, 42.35319519042969, 68.46836853027344, -6.843269348144531, 24.97809600830078, 56.42259216308594, 58.77159118652344, -18.639358520507812, 51.47052764892578, 27.324356079101562, 21.686363220214844, 19.52587127685547, 3.162933349609375, -25.956085205078125, 20.29144287109375, 21.347644805908203, 19.37198257446289, 34.77812194824219, 24.663246154785156, 13.163677215576172, 41.542755126953125, 30.261985778808594, 61.150909423828125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000242.npy"}
{"epoch": 0.355359765051395, "step": 243, "batch_size": 64, "mean": 22.756088256835938, "std": 25.121309280395508, "min": -29.27275848388672, "p10": -7.055486297607421, "median": 24.638763427734375, "p90": 54.588833618164074, "max": 76.72940826416016, "pos_frac": 0.796875, "sample": [25.845848083496094, -24.07347869873047, -6.092041015625, 41.74748992919922, 21.38964080810547, 42.42527770996094, -6.056110382080078, -4.544694900512695, 18.223237991333008, 28.3372802734375, 66.95330810546875, 12.974853515625, 63.28422546386719, 25.988189697265625, 15.609756469726562, 55.435829162597656, -20.13665771484375, 37.021995544433594, -7.468391418457031, 8.890777587890625, -20.692672729492188, 11.915336608886719, 41.34152603149414, 0.1875, -3.212923049926758, 71.69740295410156, 32.811641693115234, -25.294395446777344, 38.782405853271484, 17.24505615234375, 55.87858581542969, 13.47491455078125, 22.587139129638672, 29.026927947998047, 4.625602722167969, 76.72940826416016, 33.23976135253906, 1.8961181640625, -5.5124969482421875, 52.612510681152344, 36.999977111816406, 24.12120819091797, 37.81950378417969, 70.22774505615234, 50.65666198730469, 35.75590515136719, -29.27275848388672, 33.22681427001953, 24.214752197265625, -3.4460315704345703, 51.44084167480469, 28.723861694335938, 18.448043823242188, 13.865875244140625, 22.206085205078125, 4.1456146240234375, 37.528602600097656, 25.062774658203125, 41.9160041809082, 29.651641845703125, 26.357803344726562, 21.81578826904297, 36.14922332763672, -26.32205581665039], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000243.npy"}
{"epoch": 0.3568281938325991, "step": 244, "batch_size": 64, "mean": 23.336746215820312, "std": 25.058128356933594, "min": -24.440622329711914, "p10": -4.202680206298828, "median": 19.611038208007812, "p90": 58.619150543212896, "max": 88.09950256347656, "pos_frac": 0.828125, "sample": [18.905136108398438, 14.474594116210938, 12.272529602050781, 32.62855529785156, -4.38519287109375, 13.850635528564453, 36.32581329345703, 39.0152587890625, 54.03507995605469, 58.85485076904297, 29.937480926513672, 1.4095611572265625, 33.486663818359375, 20.993911743164062, 5.2613525390625, 37.488861083984375, 3.2897415161132812, 73.98563385009766, 22.868240356445312, 4.4785919189453125, -3.395648956298828, 13.243217468261719, 40.43467712402344, 1.227609634399414, 33.61677551269531, 9.726175308227539, -24.11349868774414, 16.571048736572266, -4.397239685058594, -7.7250518798828125, 20.316940307617188, -1.182098388671875, 72.60214233398438, 24.543685913085938, 72.91461181640625, 10.210281372070312, 5.119728088378906, 28.614112854003906, -3.7768173217773438, 34.143035888671875, 46.89634704589844, -24.440622329711914, -13.417724609375, 6.0556793212890625, 10.41729736328125, -14.590866088867188, 42.14070510864258, 18.579818725585938, 27.660385131835938, -0.7008132934570312, 58.069183349609375, 73.32177734375, 27.556224822998047, 15.089065551757812, 3.4273452758789062, 20.519977569580078, 16.042415618896484, 74.432373046875, 17.733421325683594, 88.09950256347656, 38.120689392089844, 36.3587646484375, 32.81977844238281, 45.49010467529297], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000244.npy"}
{"epoch": 0.35829662261380324, "step": 245, "batch_size": 64, "mean": 26.122928619384766, "std": 22.983217239379883, "min": -20.828468322753906, "p10": -1.1408538818359368, "median": 22.736465454101562, "p90": 56.624252319335945, "max": 83.40640258789062, "pos_frac": 0.875, "sample": [-1.3909072875976562, 42.29768371582031, 44.776344299316406, 16.23198699951172, 30.754959106445312, 12.023120880126953, 9.514764785766602, 21.889617919921875, 53.84190368652344, 53.255088806152344, 57.323394775390625, 1.1304893493652344, 21.42810821533203, 54.992919921875, 12.182594299316406, 69.20684051513672, 31.628395080566406, 30.484481811523438, 39.762001037597656, 4.376678466796875, 62.53013610839844, 37.84196472167969, 1.084538459777832, 45.56950378417969, 4.679570198059082, 2.2368316650390625, 46.916908264160156, -3.8457603454589844, 6.749301910400391, 12.1282958984375, 35.03265380859375, 15.345367431640625, 83.40640258789062, 47.396697998046875, 40.5907096862793, 39.47819519042969, -12.464141845703125, 20.605560302734375, 16.705322265625, 6.8315277099609375, 15.813528060913086, -0.5573959350585938, 27.910186767578125, 41.115440368652344, -4.9560394287109375, 19.205474853515625, 33.27709197998047, 64.7763671875, 14.657630920410156, 62.32255554199219, 45.953102111816406, -3.1703720092773438, -6.65606689453125, 1.7595863342285156, -20.828468322753906, 63.230560302734375, 16.766857147216797, 41.471893310546875, 49.89994812011719, 23.58331298828125, 33.71763610839844, 27.422927856445312, 10.080768585205078, 0.540888786315918], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000245.npy"}
{"epoch": 0.35976505139500736, "step": 246, "batch_size": 64, "mean": 20.089324951171875, "std": 23.074047088623047, "min": -32.89813232421875, "p10": -7.238546752929685, "median": 18.094575881958008, "p90": 49.53313903808594, "max": 91.35279083251953, "pos_frac": 0.8125, "sample": [40.50445556640625, 43.640586853027344, 21.486312866210938, 45.32225799560547, 49.987945556640625, 27.222911834716797, 75.52226257324219, 6.796440124511719, -9.177665710449219, -0.722503662109375, 54.296844482421875, 29.018512725830078, 21.718645095825195, 14.188945770263672, 4.757457733154297, 39.24684143066406, -32.89813232421875, 15.584518432617188, 22.19190216064453, 7.3831939697265625, 15.797309875488281, 28.025897979736328, 16.806812286376953, 2.1189804077148438, 29.14196014404297, 22.09368896484375, -8.200912475585938, -18.510467529296875, 91.35279083251953, -14.925226211547852, -3.4083824157714844, -3.324859619140625, 4.8766326904296875, 36.13825225830078, -11.903861999511719, 47.41136169433594, 8.833013534545898, 11.040168762207031, 51.14728546142578, 14.931907653808594, 50.241546630859375, 41.663673400878906, 30.24243927001953, -4.9930267333984375, 13.289787292480469, 29.814193725585938, 5.682319641113281, 48.471923828125, -1.008610725402832, 37.9075927734375, 51.12727355957031, 28.153579711914062, 17.95269775390625, 13.078147888183594, 4.909523010253906, 22.450000762939453, 2.670154571533203, 30.886764526367188, 37.08317565917969, 2.538726806640625, 21.9659423828125, 18.236454010009766, -24.880905151367188, 12.719348907470703], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000246.npy"}
{"epoch": 0.36123348017621143, "step": 247, "batch_size": 64, "mean": 29.328170776367188, "std": 27.784740447998047, "min": -31.25676727294922, "p10": -0.21823616027831966, "median": 28.121002197265625, "p90": 62.88597640991211, "max": 127.24456787109375, "pos_frac": 0.890625, "sample": [-3.2423629760742188, 18.676002502441406, 0.509033203125, 34.51182556152344, 28.402938842773438, 38.72419738769531, 56.6911506652832, 46.53868865966797, -0.49924468994140625, 45.527862548828125, 35.48503875732422, 59.36207580566406, 38.40252685546875, 35.17912292480469, 8.033313751220703, 32.440773010253906, 34.711669921875, 11.990447998046875, 68.89237976074219, 62.439125061035156, 2.03216552734375, -2.4194297790527344, 27.887847900390625, 3.9445457458496094, 39.958065032958984, 21.18274688720703, 1.3400421142578125, 4.77166748046875, 33.75356674194336, 8.339576721191406, 71.62989807128906, 53.18144226074219, 10.602813720703125, 15.052215576171875, 16.768638610839844, 20.248931884765625, 127.24456787109375, 82.44586181640625, 20.876174926757812, -5.202178955078125, 0.4374504089355469, 46.28330993652344, -31.25676727294922, 42.118370056152344, -5.252552032470703, 4.163414001464844, 25.843963623046875, 10.951255798339844, 100.19091796875, 28.354156494140625, 28.4842529296875, 18.526947021484375, 43.94792175292969, 86.40204620361328, -3.1448593139648438, 15.652523040771484, 29.648605346679688, 39.83960723876953, 37.802528381347656, 12.702568054199219, 63.077484130859375, 12.563236236572266, 46.630889892578125, 16.620010375976562], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000247.npy"}
{"epoch": 0.36270190895741555, "step": 248, "batch_size": 64, "mean": 20.61870002746582, "std": 23.08053207397461, "min": -26.816696166992188, "p10": -6.270581817626952, "median": 20.263269424438477, "p90": 55.95127296447756, "max": 77.01568603515625, "pos_frac": 0.828125, "sample": [22.435638427734375, 61.619049072265625, -9.840194702148438, -10.596813201904297, -5.138801574707031, 9.18899917602539, -7.093086242675781, 36.857574462890625, 9.783527374267578, -20.0147705078125, 0.21319580078125, 24.451156616210938, 22.46137237548828, 15.929790496826172, 77.01568603515625, 8.465749740600586, 3.8338260650634766, 6.4557647705078125, 29.216217041015625, 2.85693359375, -4.2585601806640625, 50.730472564697266, 9.818504333496094, 66.22378540039062, 23.15936279296875, -3.297718048095703, 36.857906341552734, 64.95999908447266, 24.125946044921875, 47.3599853515625, 1.2618560791015625, 49.52928161621094, 40.20195007324219, 29.598602294921875, 36.69609069824219, 26.70623779296875, -26.816696166992188, 42.43498229980469, 5.364051818847656, -4.669193267822266, 10.776840209960938, 22.97728729248047, 26.88074493408203, 58.188758850097656, 20.60967254638672, 12.811914443969727, 34.23394012451172, 65.19831085205078, -8.579292297363281, 67.21764373779297, 11.461151123046875, 13.337902069091797, 38.43330001831055, 26.431053161621094, 33.74940490722656, 25.018539428710938, 2.3293418884277344, 25.24877166748047, -6.7556304931640625, 19.916866302490234, 1.4118270874023438, 8.784912109375, 6.933010101318359, 8.892822265625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000248.npy"}
{"epoch": 0.3641703377386197, "step": 249, "batch_size": 64, "mean": 19.5279541015625, "std": 25.447851181030273, "min": -31.839805603027344, "p10": -15.619208908081053, "median": 18.951553344726562, "p90": 52.28814926147462, "max": 76.36151123046875, "pos_frac": 0.796875, "sample": [6.743434906005859, 53.29351043701172, -19.51239776611328, -24.536388397216797, 36.48065948486328, 9.157882690429688, -1.018341064453125, 20.0596923828125, -11.555770874023438, 8.473121643066406, 28.395339965820312, 34.508201599121094, 6.2218475341796875, 33.909523010253906, -17.914222717285156, -31.839805603027344, 1.7603874206542969, 76.36151123046875, 20.751293182373047, -7.4920806884765625, 68.9566650390625, 36.9786376953125, -22.99811553955078, 28.199260711669922, 9.196998596191406, 22.46552276611328, 13.593799591064453, 0.0251922607421875, 35.34992218017578, 29.68700408935547, -13.425361633300781, 17.843414306640625, 0.6822586059570312, 24.572906494140625, 20.33553695678711, 8.55279541015625, 38.11793518066406, 25.536285400390625, 8.91937255859375, 30.42951202392578, 39.226043701171875, 13.948104858398438, 67.04591369628906, -16.559429168701172, 8.486801147460938, 16.911392211914062, 46.02721405029297, 7.331159591674805, 21.449996948242188, 16.068553924560547, -2.976917266845703, 49.94230651855469, 16.437225341796875, 64.3277587890625, 45.73199462890625, 44.48009490966797, -3.830108642578125, 3.89501953125, 62.59819412231445, -29.859054565429688, 49.54753875732422, 63.28129577636719, 39.82832336425781, 21.182666778564453], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000249.npy"}
{"epoch": 0.3656387665198238, "step": 250, "batch_size": 64, "mean": 19.97228240966797, "std": 24.433605194091797, "min": -56.741127014160156, "p10": -5.7848365783691404, "median": 19.907360076904297, "p90": 54.14714508056642, "max": 77.225341796875, "pos_frac": 0.765625, "sample": [55.11869812011719, 41.946937561035156, -3.6729736328125, -18.73104476928711, -12.28204345703125, -5.7616424560546875, 57.96257781982422, 58.5379638671875, 15.792228698730469, 28.242515563964844, 17.435791015625, 19.4732666015625, 36.39147186279297, 3.814178466796875, -1.42431640625, 9.719221115112305, 51.88018798828125, 27.094223022460938, 51.243202209472656, 33.76665496826172, 27.60546875, 12.681991577148438, 35.57154846191406, 41.489036560058594, 28.899410247802734, 37.896610260009766, 28.025177001953125, -4.736213684082031, 24.587242126464844, -28.399688720703125, -1.7831649780273438, 8.31964111328125, -1.6324386596679688, 26.93950653076172, 3.184793472290039, 77.225341796875, 63.82997131347656, 0.1032257080078125, 12.451133728027344, 57.70930099487305, 5.001800537109375, 28.67707061767578, 39.062583923339844, 26.291908264160156, 18.698822021484375, 27.739990234375, 9.785209655761719, -8.076229095458984, 17.805938720703125, 20.341453552246094, 35.57872772216797, -1.7134857177734375, 26.86811637878418, 56.6527099609375, -56.741127014160156, 4.641605377197266, 14.323654174804688, 45.45391082763672, 23.762306213378906, -5.794776916503906, 16.6260986328125, -18.841739654541016, -2.9905471801757812, 38.557106018066406], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000250.npy"}
{"epoch": 0.3671071953010279, "step": 251, "batch_size": 64, "mean": 19.943553924560547, "std": 25.718564987182617, "min": -29.184192657470703, "p10": -12.91717300415039, "median": 16.329185485839844, "p90": 56.01881103515627, "max": 89.33958435058594, "pos_frac": 0.75, "sample": [14.155403137207031, -3.5012264251708984, 6.442626953125, 38.02740478515625, 35.38689422607422, 12.375564575195312, 35.998626708984375, 71.04415893554688, 9.902397155761719, -9.088737487792969, 89.33958435058594, 9.155616760253906, 50.389015197753906, -5.295738220214844, 23.557945251464844, 7.817359924316406, -15.93408203125, 58.30438995361328, 24.052715301513672, 28.40599822998047, 57.91617965698242, 42.05926513671875, -2.5034637451171875, 3.3078536987304688, 44.716896057128906, -11.865768432617188, -19.591175079345703, -29.184192657470703, 59.921085357666016, 28.45665740966797, 14.772750854492188, -0.6322250366210938, 47.13713073730469, 24.791671752929688, -4.190097808837891, 36.39839172363281, 28.710357666015625, 10.683841705322266, 8.624958038330078, 51.591617584228516, 46.670166015625, -23.872127532958984, 11.226078033447266, 16.170181274414062, 21.8741455078125, 32.509742736816406, 38.797119140625, 72.3172607421875, 26.23157501220703, 37.180511474609375, -16.893611907958984, 29.828758239746094, 61.43138122558594, 16.488189697265625, 21.611366271972656, 11.060991287231445, 5.157955169677734, 29.1669921875, 8.74435043334961, 5.2570648193359375, -17.996734619140625, -6.540435791015625, -8.323251724243164, -13.367774963378906], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000251.npy"}
{"epoch": 0.368575624082232, "step": 252, "batch_size": 64, "mean": 20.19826889038086, "std": 22.705158233642578, "min": -25.800025939941406, "p10": -13.20432243347168, "median": 21.25990104675293, "p90": 52.74816551208498, "max": 68.11009979248047, "pos_frac": 0.8125, "sample": [68.11009979248047, 54.73106384277344, -7.241264343261719, 22.772525787353516, 32.138336181640625, -25.800025939941406, -12.984378814697266, 59.31036376953125, 21.226577758789062, 46.41374206542969, 30.27410888671875, 7.495609283447266, 48.121402740478516, -2.5731887817382812, 32.64647674560547, 30.34515380859375, 26.895736694335938, 15.661811828613281, 8.462203979492188, 29.112396240234375, 31.97760009765625, -13.298583984375, 21.293224334716797, 26.18328857421875, 10.0650634765625, 1.2130203247070312, 5.615140914916992, 27.628799438476562, 59.19194030761719, 17.890453338623047, 36.70415115356445, 33.930572509765625, 31.27081298828125, 8.481300354003906, 66.1605224609375, -15.531951904296875, 38.43004608154297, -23.71208953857422, 35.70452117919922, 33.6551513671875, -5.757699966430664, -18.69696807861328, 3.17437744140625, 1.107086181640625, -16.978561401367188, 42.163700103759766, 27.092864990234375, 25.042564392089844, 4.716087341308594, 35.5715217590332, 32.82878112792969, -15.385421752929688, 18.992874145507812, 34.88880157470703, 18.872879028320312, 15.52475357055664, 19.098386764526367, 13.513908386230469, 57.98626708984375, 6.050437927246094, -5.658199310302734, 13.042510986328125, 56.55683898925781, 10.969734191894531], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000252.npy"}
{"epoch": 0.3700440528634361, "step": 253, "batch_size": 64, "mean": 19.479793548583984, "std": 22.452821731567383, "min": -39.490386962890625, "p10": -6.221220397949217, "median": 20.07970428466797, "p90": 45.86936569213867, "max": 74.34259033203125, "pos_frac": 0.796875, "sample": [-1.9435253143310547, -19.4625244140625, 25.492660522460938, 23.454185485839844, 33.3501091003418, 3.0032196044921875, 29.65887451171875, 23.645660400390625, 24.13678741455078, 12.770301818847656, 57.744056701660156, 29.810592651367188, 38.542083740234375, 15.387619018554688, 38.764007568359375, 33.600067138671875, 40.016944885253906, 6.292301177978516, -8.420295715332031, 4.867908477783203, -37.283607482910156, -16.800384521484375, 48.28114318847656, 42.0220947265625, 19.176956176757812, 9.567514419555664, 60.56040954589844, -1.2710533142089844, 15.682685852050781, 4.5763702392578125, -4.7687225341796875, 36.28631591796875, 21.938072204589844, 45.94932556152344, 42.577545166015625, 40.138336181640625, 12.610244750976562, 10.28472900390625, -0.788116455078125, 4.773506164550781, 26.94066619873047, 46.47773742675781, 52.04278564453125, 74.34259033203125, 30.37567138671875, -4.361717224121094, 13.848861694335938, 9.469741821289062, -14.995094299316406, 6.882045745849609, 37.597259521484375, 11.956832885742188, -39.490386962890625, -1.60601806640625, 20.982452392578125, 13.049205780029297, 29.984390258789062, -6.843719482421875, 39.0701789855957, 37.94659423828125, 6.616382598876953, 17.34747314453125, 29.165687561035156, 45.68279266357422], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000253.npy"}
{"epoch": 0.37151248164464024, "step": 254, "batch_size": 64, "mean": 18.474328994750977, "std": 22.77364158630371, "min": -35.64396667480469, "p10": -8.544906616210938, "median": 19.070877075195312, "p90": 49.63790588378907, "max": 68.70477294921875, "pos_frac": 0.765625, "sample": [56.512176513671875, -35.64396667480469, 36.832000732421875, 0.7168369293212891, 22.340116500854492, 44.964073181152344, -11.935012817382812, -2.9583568572998047, 23.874160766601562, 15.912521362304688, -8.547401428222656, 11.460853576660156, 15.800262451171875, 31.26152801513672, 9.24896240234375, -20.554283142089844, -7.836956024169922, 22.90491485595703, 28.085464477539062, 50.63011932373047, -8.45977783203125, 59.337493896484375, 60.175201416015625, 68.70477294921875, 48.55933380126953, -34.255126953125, 11.264762878417969, 6.5599517822265625, 5.99354362487793, 30.158222198486328, -10.625160217285156, 41.547325134277344, -0.26749420166015625, 13.145187377929688, -1.5682563781738281, 30.040176391601562, 32.5264892578125, 50.71186828613281, 29.241256713867188, 29.270465850830078, -0.20319366455078125, 7.193336486816406, 5.872785568237305, 45.441162109375, -8.539085388183594, 38.694190979003906, 0.3660392761230469, 1.7526779174804688, 10.989192962646484, 29.877830505371094, 23.55847930908203, -1.147430419921875, 41.85040283203125, 37.192779541015625, 10.357540130615234, 20.43096923828125, -9.072677612304688, 50.10015106201172, 37.48472595214844, 19.348052978515625, 27.182952880859375, 23.344085693359375, 6.360137939453125, 18.793701171875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000254.npy"}
{"epoch": 0.37298091042584436, "step": 255, "batch_size": 64, "mean": 24.718219757080078, "std": 23.98863983154297, "min": -23.839157104492188, "p10": -3.5139495849609355, "median": 26.61838150024414, "p90": 53.98002319335938, "max": 92.47921752929688, "pos_frac": 0.828125, "sample": [55.931060791015625, 2.257293701171875, 1.5134658813476562, 53.05189514160156, 24.050369262695312, -6.04925537109375, 14.605842590332031, -23.839157104492188, 16.248397827148438, 40.10675811767578, 12.9552001953125, 33.100223541259766, 29.452835083007812, 1.5309257507324219, 42.01483154296875, 41.86140441894531, 13.366291046142578, 35.378929138183594, 26.7513427734375, 55.865699768066406, 20.102432250976562, 4.987010955810547, 8.059028625488281, 32.61592483520508, 17.830612182617188, 8.015434265136719, -20.264997482299805, 40.916526794433594, 6.997978210449219, -0.0747222900390625, -4.312225341796875, 92.47921752929688, 39.95500946044922, 39.687599182128906, 5.9220428466796875, 35.128028869628906, 67.61604309082031, 29.696792602539062, 28.450881958007812, 45.84292984008789, 39.092491149902344, -19.774702072143555, 36.100921630859375, 15.950347900390625, 7.345249176025391, 29.834125518798828, 49.41981506347656, 54.37779235839844, 21.823715209960938, 39.523746490478516, 47.548614501953125, 13.107009887695312, -1.4886093139648438, -1.65130615234375, -6.098049163818359, 46.34443664550781, 51.65440368652344, 60.85541534423828, 12.062324523925781, 26.48542022705078, -19.83873748779297, 45.82389831542969, -0.769317626953125, 64.43113708496094], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000255.npy"}
{"epoch": 0.3744493392070485, "step": 256, "batch_size": 64, "mean": 25.614931106567383, "std": 28.126327514648438, "min": -43.29029846191406, "p10": -8.106841659545896, "median": 26.176830291748047, "p90": 55.68030052185059, "max": 104.34419250488281, "pos_frac": 0.828125, "sample": [37.371925354003906, 51.454986572265625, 7.8071441650390625, 17.147384643554688, -1.1129150390625, 53.918128967285156, 34.28098678588867, 64.31910705566406, 7.111549377441406, 39.684181213378906, 35.79924011230469, 7.968681335449219, 61.9835205078125, 13.939987182617188, 53.980499267578125, -43.29029846191406, 10.494808197021484, 52.517578125, 10.014686584472656, 8.10016918182373, 7.82928466796875, -17.51386260986328, 27.637409210205078, 45.5301513671875, 24.716251373291016, 38.60837936401367, 55.09537124633789, 23.52490234375, 13.437217712402344, 19.72161865234375, 42.63275146484375, 29.307395935058594, 17.03429412841797, -38.90129089355469, 0.9822425842285156, 51.43710708618164, 12.393280029296875, 51.10173034667969, -3.383819580078125, -13.997650146484375, 33.27650451660156, -22.27447509765625, 47.87346649169922, -5.877105712890625, 16.30401611328125, -3.840423583984375, 53.674766540527344, 67.52215576171875, 38.72032928466797, -9.062442779541016, 17.938331604003906, 32.189476013183594, 50.154510498046875, 66.31918334960938, -10.445335388183594, 55.93098449707031, 43.49787139892578, 6.850879669189453, 79.44322204589844, 3.2172393798828125, 29.090972900390625, 4.63172721862793, 29.19145965576172, 104.34419250488281], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000256.npy"}
{"epoch": 0.37591776798825255, "step": 257, "batch_size": 64, "mean": 24.06505584716797, "std": 26.24692153930664, "min": -35.10497283935547, "p10": -0.8967193603515619, "median": 19.51236343383789, "p90": 61.50673446655274, "max": 95.28298950195312, "pos_frac": 0.875, "sample": [95.28298950195312, 15.1923828125, 20.022186279296875, 13.228992462158203, 28.332542419433594, 19.324134826660156, 56.834991455078125, 79.12870788574219, 45.417076110839844, -35.10497283935547, 14.859371185302734, -1.6351394653320312, 48.31504821777344, 44.266380310058594, 43.95458221435547, 16.234161376953125, 26.7340087890625, 27.129146575927734, 3.55224609375, 20.660377502441406, 34.079261779785156, 71.86576843261719, 54.2239990234375, -34.84379959106445, 11.70468521118164, 17.789276123046875, -0.286376953125, 6.139839172363281, 23.70250701904297, 19.700592041015625, 32.03343963623047, 11.630630493164062, 59.72345733642578, 49.434906005859375, 9.324874877929688, 0.612762451171875, 25.482439041137695, -4.846309661865234, 45.45656967163086, 4.8018951416015625, 43.69035339355469, 11.252395629882812, 14.454505920410156, 9.616779327392578, 18.589187622070312, 20.012969970703125, 5.321647644042969, 1.0155391693115234, 63.58348083496094, 63.11786651611328, 26.514041900634766, 15.081031799316406, 5.881317138671875, 0.73992919921875, 33.58770751953125, 62.27099609375, 34.01466369628906, -1.158294677734375, 14.774955749511719, -18.202402114868164, -29.2724609375, 48.434226989746094, 9.162620544433594, 68.22078704833984], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000257.npy"}
{"epoch": 0.37738619676945667, "step": 258, "batch_size": 64, "mean": 23.28742027282715, "std": 22.506834030151367, "min": -33.96942138671875, "p10": -4.521169281005856, "median": 26.659442901611328, "p90": 51.17782936096192, "max": 66.1357421875, "pos_frac": 0.84375, "sample": [7.57379150390625, -0.8271541595458984, 35.015953063964844, 32.292381286621094, -9.607826232910156, 34.35298156738281, 44.87861633300781, 17.405487060546875, 20.830406188964844, 5.9771270751953125, -1.7201919555664062, 17.01513671875, -8.812660217285156, 42.80145263671875, 30.844804763793945, 28.083755493164062, 6.151599884033203, -5.721588134765625, 65.5637435913086, 40.04734802246094, 34.73362731933594, 31.436491012573242, 26.0980224609375, 27.220863342285156, 60.71678161621094, 49.080352783203125, 31.9010009765625, 12.012565612792969, 10.272598266601562, 36.67123794555664, 48.569366455078125, 12.313369750976562, 45.79936218261719, 24.454391479492188, -1.37518310546875, 5.604251861572266, 33.454166412353516, -32.86945343017578, 3.3053741455078125, 7.107421875, 27.585220336914062, 2.2535667419433594, 57.39463806152344, 40.415069580078125, 6.153659820556641, 0.2379913330078125, 60.105316162109375, 38.15794372558594, -9.25709342956543, -7.587471008300781, 35.08122253417969, 16.737407684326172, 20.568893432617188, 52.07674789428711, 10.299224853515625, 53.27190399169922, -33.96942138671875, 13.836395263671875, 66.1357421875, 48.11661148071289, 29.689453125, 11.987228393554688, 39.137420654296875, 43.31549072265625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000258.npy"}
{"epoch": 0.3788546255506608, "step": 259, "batch_size": 64, "mean": 19.601879119873047, "std": 25.75547981262207, "min": -49.36518096923828, "p10": -5.108844375610351, "median": 18.20303726196289, "p90": 49.28154449462891, "max": 109.37217712402344, "pos_frac": 0.84375, "sample": [32.04393768310547, 6.4284210205078125, 47.49256896972656, -14.489120483398438, 32.45470428466797, -38.74766159057617, -10.074325561523438, 109.37217712402344, 50.04905700683594, 55.6455078125, 20.685895919799805, -41.39372253417969, 28.240806579589844, 5.659088134765625, 13.734100341796875, 1.148712158203125, 17.02519989013672, 9.438247680664062, 35.999855041503906, 17.36939239501953, 39.56739807128906, 19.03668212890625, 0.302276611328125, 22.557811737060547, 53.52143859863281, 11.718280792236328, 61.07404708862305, 21.870201110839844, 19.15496826171875, -49.36518096923828, 50.048248291015625, 3.9172630310058594, 45.654876708984375, 34.93293762207031, 10.416915893554688, 37.911712646484375, 34.72161102294922, 72.95446014404297, 20.498767852783203, 28.77332305908203, 0.5997695922851562, 21.038795471191406, 24.66413116455078, 30.270736694335938, -4.05657958984375, 12.229198455810547, 14.252647399902344, 39.189613342285156, -4.1711578369140625, 36.01402282714844, 3.5212669372558594, 46.026084899902344, 42.4566650390625, 12.731803894042969, 14.534927368164062, -5.252979278564453, 20.35346794128418, 0.8732452392578125, 8.19314193725586, -6.114418029785156, 3.7315673828125, -4.772529602050781, 16.621009826660156, 14.234962463378906], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000259.npy"}
{"epoch": 0.3803230543318649, "step": 260, "batch_size": 64, "mean": 18.564983367919922, "std": 27.96923828125, "min": -47.528934478759766, "p10": -21.253549194335935, "median": 21.688045501708984, "p90": 54.59248962402346, "max": 75.96333312988281, "pos_frac": 0.75, "sample": [27.57537841796875, -7.220550537109375, 21.809043884277344, -31.928592681884766, 13.50460433959961, 41.690155029296875, 49.08272171020508, 24.72551727294922, 21.567047119140625, 6.7063140869140625, 7.364253997802734, 40.37349319458008, 10.133159637451172, 30.414260864257812, 43.135345458984375, 71.96478271484375, 33.702701568603516, -37.35082244873047, -2.74432373046875, 40.03699493408203, -47.528934478759766, -10.792457580566406, 19.389026641845703, 44.26027297973633, -6.366535186767578, -3.6374053955078125, 27.01211929321289, 37.39665222167969, 11.459716796875, 15.557905197143555, 22.015602111816406, 49.177764892578125, 34.11377716064453, 59.35076904296875, 24.95990753173828, -10.54422378540039, 75.96333312988281, 67.60675811767578, 3.0596580505371094, 14.190780639648438, -20.264442443847656, -14.152824401855469, 5.079685211181641, -23.118267059326172, 22.907852172851562, -21.677452087402344, -6.5590667724609375, -21.781230926513672, 40.53217697143555, 7.0686492919921875, 42.14398193359375, 4.41937255859375, -36.432090759277344, 62.417388916015625, 29.478965759277344, 37.87449645996094, 9.424270629882812, 33.821380615234375, 34.86612319946289, 4.7956695556640625, 16.531944274902344, 56.9130859375, 33.0400390625, 59.643287658691406], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000260.npy"}
{"epoch": 0.38179148311306904, "step": 261, "batch_size": 64, "mean": 25.985095977783203, "std": 24.344921112060547, "min": -16.593238830566406, "p10": -3.044468688964843, "median": 24.983875274658203, "p90": 53.06479415893555, "max": 120.53912353515625, "pos_frac": 0.859375, "sample": [27.637481689453125, 52.75001525878906, 38.34728240966797, 10.845458984375, -3.2551040649414062, 120.53912353515625, 19.33536148071289, 32.637481689453125, 3.539579391479492, -3.4432907104492188, 43.453773498535156, 11.7281494140625, 29.523033142089844, -0.7946319580078125, 44.97782897949219, 38.482887268066406, 23.495983123779297, 2.1401824951171875, 44.44573211669922, 1.8205757141113281, 47.79045867919922, -6.79229736328125, 20.49448013305664, 12.843086242675781, 38.23122787475586, 34.13762664794922, 33.81959533691406, 38.50923156738281, 75.32308959960938, -2.5529861450195312, 5.884304046630859, 27.297950744628906, 8.507406234741211, 39.71674728393555, 24.268470764160156, 8.602054595947266, 39.752803802490234, 22.308799743652344, 51.9036865234375, 17.540817260742188, 73.85202026367188, 25.69927978515625, 11.360702514648438, 29.036056518554688, -16.593238830566406, 53.19969940185547, 18.819778442382812, 33.555511474609375, 17.954490661621094, 52.56697463989258, 63.5755615234375, 33.05896759033203, 14.987579345703125, 1.1310272216796875, -13.406612396240234, 55.074440002441406, 36.521087646484375, -5.4805145263671875, 10.882637023925781, -11.954742431640625, 9.173477172851562, 26.0445556640625, 56.50757598876953, 11.686447143554688], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000261.npy"}
{"epoch": 0.3832599118942731, "step": 262, "batch_size": 64, "mean": 23.777236938476562, "std": 24.481460571289062, "min": -33.204498291015625, "p10": -2.800972747802734, "median": 19.3740234375, "p90": 57.15074920654297, "max": 101.9554672241211, "pos_frac": 0.84375, "sample": [6.501415252685547, 20.507110595703125, -33.204498291015625, 28.047286987304688, 23.965606689453125, 9.010574340820312, 64.24874877929688, 56.14380645751953, 23.71700096130371, 11.67950439453125, -14.633888244628906, 19.35431671142578, 1.9333362579345703, 43.300079345703125, -7.8040008544921875, -2.7151870727539062, 38.425758361816406, 45.04172897338867, -2.4715518951416016, 56.39361572265625, 21.102462768554688, 29.0880184173584, 9.796993255615234, 15.820587158203125, 16.15533447265625, 15.420440673828125, 11.693544387817383, 68.1170425415039, 24.627059936523438, 4.1425018310546875, 7.9114990234375, 13.881649017333984, 16.640151977539062, 12.250213623046875, 15.024749755859375, -1.9943046569824219, 12.002861022949219, 43.70030975341797, 61.09825134277344, 17.752159118652344, 25.8447265625, 19.39373016357422, 38.00769805908203, 48.029052734375, 57.47523498535156, 65.57361602783203, 38.22996520996094, -27.654388427734375, 33.60945129394531, 36.83759689331055, -4.535377502441406, 47.88623809814453, 5.779579162597656, 49.291385650634766, -7.733531951904297, 60.026397705078125, 45.144989013671875, 28.39751434326172, 11.809280395507812, 16.155731201171875, 13.549293518066406, 19.83502197265625, -2.837738037109375, 101.9554672241211], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000262.npy"}
{"epoch": 0.38472834067547723, "step": 263, "batch_size": 64, "mean": 32.29640197753906, "std": 28.91850471496582, "min": -42.73094177246094, "p10": -6.302722549438476, "median": 33.795143127441406, "p90": 66.74054794311523, "max": 122.419189453125, "pos_frac": 0.859375, "sample": [38.74755859375, -9.835922241210938, 12.991546630859375, 61.8292236328125, 30.288406372070312, 23.364501953125, 42.94635772705078, 36.84022521972656, 16.208221435546875, 38.8436279296875, -6.64361572265625, 71.11998748779297, 11.857343673706055, 34.982757568359375, 13.936286926269531, 26.174789428710938, 59.10268783569336, 59.976837158203125, 17.778717041015625, 67.2396011352539, -12.584304809570312, 19.471343994140625, 59.06207275390625, -11.208061218261719, -5.507305145263672, 16.918304443359375, 22.85321044921875, 47.97639846801758, -15.942237854003906, 40.13719177246094, 28.58159637451172, 7.9011993408203125, 47.147499084472656, 18.7039852142334, 24.326271057128906, 55.6529541015625, 57.94036102294922, 35.07542037963867, 66.5445327758789, 36.152587890625, 16.662940979003906, 81.08880615234375, 41.75017547607422, 45.774085998535156, 31.00951385498047, 12.266731262207031, 38.65293884277344, 66.82455444335938, 35.64556884765625, 4.058372497558594, 6.7216033935546875, 49.32209014892578, 84.54780578613281, -42.73094177246094, 32.60752868652344, 29.454177856445312, 59.70436096191406, 47.640743255615234, 3.0491180419921875, -2.3917694091796875, 76.50006103515625, 122.419189453125, 54.77959060668945, -15.339813232421875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000263.npy"}
{"epoch": 0.38619676945668135, "step": 264, "batch_size": 64, "mean": 26.88092803955078, "std": 26.89993667602539, "min": -26.324390411376953, "p10": -10.56842861175537, "median": 25.259666442871094, "p90": 58.76534767150879, "max": 95.41010284423828, "pos_frac": 0.84375, "sample": [23.783641815185547, 23.03021240234375, 25.42888641357422, 35.875579833984375, 8.824386596679688, 58.11481475830078, 51.80687713623047, 95.41010284423828, 28.51421356201172, 31.640350341796875, 52.54781723022461, 14.872337341308594, 16.095232009887695, -22.855300903320312, 31.981830596923828, 9.625804901123047, 74.83110046386719, 20.566070556640625, 59.04414749145508, -23.6932373046875, -13.886138916015625, 18.488754272460938, 31.5074462890625, 29.69630241394043, 21.73660659790039, 48.79894256591797, 66.74739837646484, 6.298439025878906, 7.855968475341797, 21.410247802734375, 4.13494873046875, 11.961700439453125, -9.917947769165039, 8.111526489257812, 6.164222717285156, 25.09044647216797, 14.175910949707031, 39.33216857910156, -5.533641815185547, 55.57514953613281, 48.27912139892578, 4.876007080078125, 47.26104736328125, -17.61620330810547, 30.79895782470703, 21.107742309570312, 43.177772521972656, -11.191322326660156, 50.76214599609375, 1.5669708251953125, 49.855995178222656, -10.847206115722656, 34.925514221191406, 76.39553833007812, -2.7093048095703125, 51.29265594482422, 30.57062530517578, 71.87236022949219, 42.475563049316406, 68.84033203125, 49.05797576904297, 47.376869201660156, 15.38137435913086, -26.324390411376953], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000264.npy"}
{"epoch": 0.3876651982378855, "step": 265, "batch_size": 64, "mean": 26.15304946899414, "std": 28.96063232421875, "min": -35.14190673828125, "p10": -5.199279403686522, "median": 20.492298126220703, "p90": 68.90739135742189, "max": 93.22926330566406, "pos_frac": 0.8125, "sample": [30.846946716308594, 93.22926330566406, 45.535499572753906, 54.24074172973633, 0.35540008544921875, 17.339893341064453, -23.77886962890625, 26.244842529296875, 73.11064147949219, 30.047775268554688, 18.443252563476562, 44.068450927734375, -1.7095985412597656, 1.6726455688476562, 0.42990875244140625, 4.218479156494141, 12.811470031738281, -12.202655792236328, 15.73006820678711, 23.929527282714844, 5.7926177978515625, 73.60775756835938, 47.16065979003906, 91.54647827148438, 25.214187622070312, 69.74003601074219, 21.45270538330078, 65.90933990478516, 19.064529418945312, 65.14220428466797, 19.043073654174805, -3.4966888427734375, 9.765499114990234, 4.811853408813477, 16.178245544433594, 36.18438720703125, 7.754219055175781, 43.92292785644531, 72.82035827636719, 45.445533752441406, 66.96455383300781, 19.531890869140625, -5.728404998779297, -35.14190673828125, 27.927600860595703, -14.066291809082031, -0.036563873291015625, 14.455387115478516, 12.186721801757812, 37.88749694824219, 44.1000862121582, -1.284332275390625, 69.90444946289062, 65.7363510131836, -3.9646530151367188, 18.81071662902832, 60.53898620605469, 42.8419075012207, 33.61732482910156, -14.804168701171875, 31.581886291503906, 5.58740234375, -17.745399475097656, 23.270565032958984], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000265.npy"}
{"epoch": 0.3891336270190896, "step": 266, "batch_size": 64, "mean": 22.744022369384766, "std": 24.06202507019043, "min": -22.667537689208984, "p10": -8.805171585083007, "median": 23.990222930908203, "p90": 55.61866149902344, "max": 77.25936889648438, "pos_frac": 0.796875, "sample": [8.215805053710938, 56.05256652832031, 34.547813415527344, 35.44728088378906, 44.801063537597656, -8.433746337890625, 1.9173583984375, 35.185218811035156, 21.932373046875, 41.707664489746094, 28.333240509033203, 61.15583801269531, 2.692718505859375, 60.71112823486328, -7.088829040527344, 21.56769561767578, 62.545928955078125, 34.38371276855469, 50.484745025634766, -19.929603576660156, -1.1719436645507812, -22.667537689208984, 43.719635009765625, 38.08253479003906, 42.44116973876953, 13.587387084960938, -12.610984802246094, 14.310333251953125, -2.7133941650390625, 54.60621643066406, 42.12157440185547, 23.310195922851562, 77.25936889648438, 25.188430786132812, 36.384010314941406, 12.027053833007812, 57.751312255859375, 61.20445251464844, 12.0054931640625, -22.317535400390625, 28.69403076171875, 19.074851989746094, -1.6878890991210938, -8.787769317626953, 40.403221130371094, 19.264259338378906, 24.670249938964844, 27.942672729492188, -21.796424865722656, 18.96221160888672, 2.5476303100585938, 24.933547973632812, 41.25981903076172, 43.29997253417969, 1.3818435668945312, 18.211685180664062, 6.995477676391602, 14.037628173828125, 5.8235931396484375, 34.35496520996094, 25.387054443359375, 50.325286865234375, -9.615680694580078, -8.812629699707031], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000266.npy"}
{"epoch": 0.39060205580029367, "step": 267, "batch_size": 64, "mean": 27.538055419921875, "std": 24.466886520385742, "min": -24.59958267211914, "p10": -0.2431228637695309, "median": 27.67640495300293, "p90": 56.62265014648438, "max": 122.21633911132812, "pos_frac": 0.890625, "sample": [34.232879638671875, 37.659629821777344, 0.9468917846679688, -24.59958267211914, 33.49864196777344, 25.45218276977539, 29.68071746826172, 30.995452880859375, 28.180328369140625, -5.182395935058594, 34.016082763671875, 36.53094482421875, 2.9164581298828125, 7.965839385986328, 44.211090087890625, 9.147579193115234, 37.74462890625, 20.037445068359375, 49.91642761230469, 122.21633911132812, 8.109025955200195, -4.915435791015625, 54.61448669433594, 13.375007629394531, 17.947799682617188, 27.416259765625, 30.019393920898438, -3.0405521392822266, 30.3212890625, 57.48329162597656, 3.7425537109375, 23.92737579345703, 67.00830078125, 39.74383544921875, 45.391883850097656, 18.936477661132812, -3.5490074157714844, 20.844398498535156, 21.706459045410156, 40.780181884765625, 66.60984802246094, 0.08734130859375, 16.725852966308594, 35.194252014160156, 17.98944091796875, -0.3847503662109375, 14.084823608398438, 62.315093994140625, 81.60157775878906, 22.297256469726562, 24.368114471435547, 36.138389587402344, 3.7057113647460938, 41.49885559082031, 72.25933837890625, 7.9548492431640625, 16.489974975585938, -17.833595275878906, 11.05447006225586, 43.48202133178711, 39.68943786621094, 27.93655014038086, 44.95515441894531, 28.784832000732422], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000267.npy"}
{"epoch": 0.3920704845814978, "step": 268, "batch_size": 64, "mean": 28.003337860107422, "std": 25.255781173706055, "min": -21.275665283203125, "p10": -4.052496337890624, "median": 28.459888458251953, "p90": 61.58820381164553, "max": 89.33184814453125, "pos_frac": 0.875, "sample": [33.914669036865234, 49.71418762207031, 22.08349609375, 14.202207565307617, 13.103404998779297, 12.384988784790039, 4.8465118408203125, 0.5311813354492188, 32.971717834472656, 17.633811950683594, 16.670379638671875, -19.427474975585938, 29.50848388671875, -2.469635009765625, 54.67581558227539, 34.479339599609375, 19.052356719970703, 21.088481903076172, 13.9886474609375, 35.54616928100586, 15.307464599609375, 49.042335510253906, 38.225982666015625, 63.77618408203125, 73.66346740722656, 57.17474365234375, 48.56817626953125, 89.33184814453125, 0.6112518310546875, 64.62200927734375, 6.064002990722656, 15.369499206542969, 49.932106018066406, 23.781543731689453, 36.69389343261719, 18.725341796875, 55.716922760009766, 49.08751678466797, 42.298736572265625, 63.47968673706055, 72.36332702636719, 56.499176025390625, 27.411293029785156, -5.068418502807617, 31.27312469482422, -4.730865478515625, 7.975101470947266, 33.361114501953125, -17.127609252929688, 72.31983184814453, 8.890296936035156, -6.7307281494140625, 36.525421142578125, 16.02373504638672, 5.6373443603515625, 50.99549865722656, 34.14506149291992, 43.10968017578125, 35.566131591796875, 55.76028060913086, -10.9200439453125, -21.275665283203125, 1.1815986633300781, 3.0575485229492188], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000268.npy"}
{"epoch": 0.3935389133627019, "step": 269, "batch_size": 64, "mean": 28.96139907836914, "std": 27.59026336669922, "min": -21.486282348632812, "p10": -10.405976486206054, "median": 27.673450469970703, "p90": 65.03318405151367, "max": 97.18658447265625, "pos_frac": 0.84375, "sample": [25.25025177001953, 63.50922393798828, 26.432815551757812, 47.26203155517578, -21.486282348632812, 26.17375946044922, 12.039741516113281, 27.672149658203125, 50.48778533935547, 6.9849853515625, 71.91326904296875, -10.901264190673828, 54.34550476074219, 49.21678161621094, 60.333282470703125, 35.07794189453125, 55.37236785888672, -11.455123901367188, 18.804840087890625, -11.961021423339844, 47.679107666015625, 28.306968688964844, 3.1401824951171875, 29.72954559326172, 20.22454833984375, 10.347732543945312, 42.67957305908203, 10.533935546875, 69.06744384765625, 9.005996704101562, 36.824710845947266, -9.25030517578125, -16.662185668945312, 51.631561279296875, 10.201057434082031, 32.60790252685547, 22.962982177734375, 15.825607299804688, 39.68150329589844, 50.349517822265625, -16.785831451416016, 36.901954650878906, 36.99613952636719, -1.5460205078125, 25.022396087646484, 94.83242797851562, 7.3146209716796875, 4.182647705078125, 68.60475158691406, 29.461563110351562, 2.208953857421875, 16.37286376953125, 53.23988342285156, 97.18658447265625, -0.18489837646484375, 27.67475128173828, 65.68630981445312, 8.728813171386719, 9.039039611816406, 59.244407653808594, 51.46269989013672, 43.286521911621094, -15.143547058105469, 69.78196716308594], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000269.npy"}
{"epoch": 0.39500734214390604, "step": 270, "batch_size": 64, "mean": 26.342090606689453, "std": 31.098766326904297, "min": -31.772598266601562, "p10": -12.510458374023436, "median": 26.905250549316406, "p90": 62.748948669433595, "max": 99.8875503540039, "pos_frac": 0.765625, "sample": [-2.2464828491210938, 59.04463195800781, 71.51744079589844, 8.961898803710938, -1.8065643310546875, -27.1417236328125, 47.18037414550781, 62.01213073730469, 60.59910583496094, 93.55227661132812, 36.003089904785156, -10.236419677734375, 34.3193359375, 23.867599487304688, 23.78759765625, 31.325878143310547, 24.80731201171875, -10.036235809326172, -18.245849609375, 6.241546630859375, 30.240699768066406, 44.60139465332031, 53.542572021484375, 42.7271728515625, 23.261764526367188, 8.728652954101562, 1.8832168579101562, 14.075714111328125, 45.002593994140625, 37.787200927734375, 25.105514526367188, 14.706352233886719, 87.109375, -29.389183044433594, 63.064727783203125, 31.168983459472656, 47.65996551513672, -0.21590042114257812, 34.961090087890625, 16.52988624572754, 57.759033203125, 29.47943115234375, 61.628448486328125, -13.48504638671875, -16.40424346923828, 49.95063781738281, 66.06867218017578, 7.562469482421875, 28.704986572265625, 7.461250305175781, 39.69813537597656, -0.273529052734375, 0.08612823486328125, -8.702899932861328, 41.766082763671875, 43.46287536621094, -31.772598266601562, -1.3397483825683594, -27.475990295410156, 12.937477111816406, 41.60514831542969, 99.8875503540039, 4.514926910400391, 86.71586608886719], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000270.npy"}
{"epoch": 0.3964757709251101, "step": 271, "batch_size": 64, "mean": 30.45282745361328, "std": 29.376808166503906, "min": -47.88587951660156, "p10": -9.018387413024891, "median": 35.01700210571289, "p90": 66.53064498901368, "max": 86.45755004882812, "pos_frac": 0.890625, "sample": [33.19854736328125, 67.63996887207031, 12.169975280761719, 38.858421325683594, 20.52667236328125, -13.583171844482422, 29.748367309570312, 86.45755004882812, 41.09807586669922, 44.38935852050781, 17.760162353515625, -47.88587951660156, 71.60658264160156, 35.346397399902344, 33.57087326049805, 51.95105743408203, -29.83721923828125, 79.1889877319336, 1.8695755004882812, 10.44591999053955, -32.1937255859375, 43.53960418701172, 69.18679809570312, 51.6697998046875, 32.28148651123047, 76.77647399902344, 17.974544525146484, 27.39234161376953, 37.9476318359375, 19.6534423828125, -24.994243621826172, 32.08458709716797, 40.736839294433594, 44.365760803222656, 56.67961120605469, 44.51971435546875, 23.615196228027344, 1.6327762603759766, 35.26398468017578, 39.82921600341797, 53.70451354980469, 39.82221984863281, 18.036624908447266, 3.675394058227539, 13.311161041259766, 18.5347957611084, 6.489288330078125, -14.71603012084961, 63.942222595214844, 46.11262512207031, 63.582794189453125, 68.98662567138672, 45.663368225097656, 8.032562255859375, 21.58556365966797, -46.13983154296875, 3.8892440795898438, 42.04032897949219, 44.69001770019531, 34.77001953125, 47.919219970703125, 21.454811096191406, 57.60315704345703, 63.50804901123047], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000271.npy"}
{"epoch": 0.39794419970631423, "step": 272, "batch_size": 64, "mean": 30.06342124938965, "std": 30.48558235168457, "min": -28.249893188476562, "p10": -6.3379375457763665, "median": 25.123313903808594, "p90": 74.37529983520508, "max": 93.71018981933594, "pos_frac": 0.859375, "sample": [10.887962341308594, -11.169200897216797, 44.268943786621094, -5.213748931884766, 16.800880432128906, 60.00194549560547, 64.21399688720703, 93.15233612060547, 60.76628875732422, 53.56897735595703, 15.471923828125, 36.31867980957031, 8.342872619628906, 13.37118911743164, -19.274301528930664, 59.57063293457031, 47.52509689331055, 26.44109344482422, 74.689208984375, 73.6428451538086, -25.260055541992188, 24.509361267089844, 19.359413146972656, 90.27777099609375, -21.603256225585938, 1.1226348876953125, 53.417213439941406, 93.71018981933594, 40.48112487792969, 82.64004516601562, 58.03695297241211, 17.01757049560547, 11.82559585571289, 24.669189453125, 25.779129028320312, 13.726242065429688, 59.068328857421875, -28.249893188476562, 25.577438354492188, 33.373558044433594, 79.84990692138672, 23.490814208984375, 1.7696685791015625, 20.107616424560547, 27.500030517578125, 9.263923645019531, 75.979736328125, 19.881023406982422, 18.37126922607422, 65.11216735839844, 3.7377166748046875, 63.32538986206055, -10.020164489746094, -2.3122634887695312, 8.894683837890625, 42.73579406738281, 40.0426025390625, 0.23108673095703125, 29.759666442871094, 6.366334915161133, 8.150581359863281, 33.41545104980469, 42.369468688964844, -6.819732666015625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000272.npy"}
{"epoch": 0.39941262848751835, "step": 273, "batch_size": 64, "mean": 31.21113395690918, "std": 33.70786666870117, "min": -70.43782806396484, "p10": -8.234107971191406, "median": 30.929975509643555, "p90": 75.75953598022461, "max": 109.5216293334961, "pos_frac": 0.84375, "sample": [-7.906303405761719, 57.578369140625, -26.81739044189453, 4.904289245605469, 95.73896789550781, 109.5216293334961, 17.7669677734375, 29.69054412841797, 45.70085144042969, 15.0240478515625, 1.1690673828125, 25.655853271484375, 41.853271484375, 37.542152404785156, -16.775821685791016, 80.55181884765625, -19.890846252441406, 25.576766967773438, 57.76420593261719, 83.86233520507812, 51.98412322998047, 29.476341247558594, 4.292999267578125, 29.823604583740234, -14.912517547607422, 59.29872131347656, 39.04347229003906, 30.387271881103516, 39.074546813964844, 78.1404800415039, -33.43098449707031, 2.049713134765625, 73.41413116455078, 59.5418701171875, 45.28480529785156, 76.76470947265625, 34.62245178222656, 43.946311950683594, 31.84371566772461, 64.9898681640625, -4.12225341796875, 55.48918914794922, 22.213638305664062, 7.73626708984375, 27.751792907714844, 31.472679138183594, 6.354034423828125, -8.374595642089844, 14.577278137207031, 1.4221878051757812, 55.54670715332031, 22.7293701171875, -0.7719192504882812, 102.3116683959961, 46.69242858886719, 57.36647033691406, 17.61444091796875, 3.990774154663086, 41.667572021484375, 38.321319580078125, 62.36051940917969, -70.43782806396484, 35.56822204589844, 25.886123657226562], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000273.npy"}
{"epoch": 0.4008810572687225, "step": 274, "batch_size": 64, "mean": 25.03656768798828, "std": 25.081161499023438, "min": -26.074722290039062, "p10": -2.445079994201659, "median": 20.212629318237305, "p90": 57.23003311157227, "max": 82.4390869140625, "pos_frac": 0.875, "sample": [11.443023681640625, 13.952381134033203, 17.930660247802734, -26.074722290039062, 45.22642517089844, 30.296661376953125, 14.514007568359375, 4.015556335449219, -2.9424190521240234, 70.57286071777344, 18.827693939208984, 77.79395294189453, 14.579513549804688, -6.353546142578125, 12.517402648925781, 56.668060302734375, 39.993431091308594, 31.005172729492188, -17.9967041015625, 0.44496917724609375, 80.24605560302734, 21.969749450683594, 26.712074279785156, 12.343955993652344, -21.364784240722656, 36.03227233886719, 35.333091735839844, 14.633842468261719, 24.625503540039062, 52.33720397949219, 15.542409896850586, 54.546783447265625, 2.945343017578125, 33.619773864746094, 8.43118667602539, 54.85490417480469, 57.47087860107422, 20.358863830566406, 82.4390869140625, 34.663516998291016, 54.11253356933594, 20.234554290771484, 8.993125915527344, -14.62448501586914, -20.68170928955078, 19.9395751953125, 48.21966552734375, 9.79345703125, 8.502922058105469, 72.47327423095703, 46.73366928100586, -1.2846221923828125, 21.537166595458984, 27.39141845703125, 61.5439338684082, 20.190704345703125, 21.454090118408203, 13.21147632598877, 13.082061767578125, 18.308258056640625, 41.52226257324219, 6.55609130859375, 10.476112365722656, 40.498653411865234], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000274.npy"}
{"epoch": 0.4023494860499266, "step": 275, "batch_size": 64, "mean": 26.782381057739258, "std": 29.081092834472656, "min": -45.157588958740234, "p10": -12.51008834838867, "median": 27.825389862060547, "p90": 66.13914184570314, "max": 83.34931182861328, "pos_frac": 0.8125, "sample": [-13.058143615722656, -4.865810394287109, 11.15509033203125, 7.429561614990234, 26.828102111816406, 47.98231506347656, 43.652061462402344, -3.6875076293945312, 58.310386657714844, 55.48714065551758, 52.4439697265625, 61.83100128173828, 43.25425720214844, 14.593381881713867, 6.80865478515625, 25.300079345703125, 17.785598754882812, 30.463485717773438, -45.157588958740234, -2.9657745361328125, 67.02119445800781, 43.39375305175781, 12.241287231445312, 17.99378204345703, -18.397781372070312, 18.36541748046875, 4.938804626464844, -11.231292724609375, 40.82037353515625, -17.673568725585938, 19.79343605041504, 39.51227951049805, 70.41014099121094, 19.2984619140625, 31.142616271972656, 13.453948974609375, 6.768341064453125, 42.032081604003906, 0.8436450958251953, 17.425979614257812, -20.643157958984375, -4.409271240234375, -21.69628143310547, 45.236141204833984, 53.42023468017578, 73.51870727539062, 44.018096923828125, 70.21343231201172, 16.990468978881836, 17.319046020507812, 44.371944427490234, -35.96236038208008, 28.822677612304688, 75.67912292480469, 60.055999755859375, 11.55718994140625, 35.62409210205078, 83.34931182861328, 31.39483642578125, 55.270320892333984, 65.00091552734375, 36.703857421875, 66.626953125, 29.866912841796875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000275.npy"}
{"epoch": 0.40381791483113066, "step": 276, "batch_size": 64, "mean": 25.471071243286133, "std": 29.577238082885742, "min": -38.461185455322266, "p10": -11.642927551269532, "median": 23.824939727783203, "p90": 66.91635589599609, "max": 110.69120788574219, "pos_frac": 0.84375, "sample": [25.9212646484375, 66.94876098632812, 66.84074401855469, 42.629486083984375, 5.3802337646484375, 21.441133499145508, 25.90882110595703, 68.38621520996094, -11.709243774414062, 42.92304229736328, 28.159698486328125, 1.3235321044921875, 14.040679931640625, 23.83483123779297, 28.333114624023438, 58.2783203125, -0.5952911376953125, -8.144874572753906, 53.95994567871094, 42.59964370727539, 14.601192474365234, -23.805038452148438, 25.312458038330078, 18.309982299804688, 4.2503662109375, 40.34381866455078, 2.9999923706054688, 12.148910522460938, 28.23048210144043, 29.425472259521484, -11.488189697265625, -33.07867431640625, 35.261253356933594, 11.444900512695312, 57.782562255859375, 67.89746856689453, 18.018508911132812, 9.97796630859375, 2.8669776916503906, 16.72769546508789, 31.098464965820312, 9.284912109375, -33.19171142578125, 54.908843994140625, 17.247222900390625, -17.949859619140625, 1.8031845092773438, 26.201095581054688, 72.41937255859375, 75.28221130371094, 73.37629699707031, 15.409584045410156, 40.09952926635742, 63.771141052246094, -14.292884826660156, 55.89862060546875, 8.987422943115234, -38.461185455322266, 50.116119384765625, 23.183006286621094, 37.41810607910156, 110.69120788574219, 19.34467315673828, 23.815048217773438], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000276.npy"}
{"epoch": 0.4052863436123348, "step": 277, "batch_size": 64, "mean": 24.163555145263672, "std": 23.871143341064453, "min": -27.870269775390625, "p10": -5.0187835693359375, "median": 25.493751525878906, "p90": 54.41658210754395, "max": 72.4430923461914, "pos_frac": 0.8125, "sample": [-5.838623046875, 44.69801330566406, 30.064857482910156, 47.36639404296875, 60.190372467041016, 2.904407501220703, -2.8535614013671875, 40.7833251953125, -5.024574279785156, 25.8028564453125, 31.77478790283203, 31.48074722290039, 0.2326202392578125, -22.20760726928711, 63.53112030029297, 49.2388916015625, 12.613517761230469, 7.927665710449219, 40.007102966308594, -5.005271911621094, 30.814044952392578, -4.82379150390625, 46.79737091064453, 50.75047302246094, -1.0081825256347656, 28.118091583251953, 15.628532409667969, 36.916046142578125, 2.9317703247070312, 28.87963104248047, 40.621612548828125, 57.32537078857422, 10.608421325683594, 72.4430923461914, 41.183143615722656, 8.098651885986328, -2.5549468994140625, 32.157169342041016, 20.04827880859375, 24.25689697265625, 2.8764190673828125, 50.786441802978516, 43.28468322753906, -9.733871459960938, 3.086437225341797, 17.870071411132812, 7.285438537597656, 55.12294006347656, 66.10287475585938, 37.322113037109375, 25.124542236328125, 32.23533630371094, 52.76841354370117, -23.289093017578125, 25.184646606445312, 18.33159637451172, -12.463020324707031, 16.002548217773438, 55.845733642578125, 43.09553527832031, 9.16629409790039, 23.778568267822266, -27.870269775390625, 47.674407958984375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000277.npy"}
{"epoch": 0.4067547723935389, "step": 278, "batch_size": 64, "mean": 26.432132720947266, "std": 29.078899383544922, "min": -60.92011260986328, "p10": -6.860929870605468, "median": 26.13400650024414, "p90": 66.24444541931155, "max": 83.07136535644531, "pos_frac": 0.828125, "sample": [26.74445343017578, 56.592105865478516, 35.09978485107422, 52.94746780395508, 34.20350646972656, 40.476470947265625, 47.354774475097656, -30.182205200195312, 83.07136535644531, 38.17974853515625, 6.651847839355469, 47.524009704589844, 47.00840377807617, 18.149124145507812, 12.55472183227539, 25.474544525146484, -19.281845092773438, 73.22650146484375, 71.76028442382812, 4.176025390625, 14.449737548828125, 70.90899658203125, 57.19715881347656, -5.981636047363281, -3.9136505126953125, 19.357070922851562, 37.146034240722656, 4.900566101074219, 9.343284606933594, 57.30328369140625, 74.92236328125, 46.969139099121094, 46.13923645019531, -60.92011260986328, 33.482086181640625, 19.97911834716797, 32.1939811706543, 37.23616027832031, -7.237770080566406, 44.719139099121094, 50.304683685302734, 7.552066802978516, 13.065092086791992, 25.5235595703125, -8.916366577148438, 36.20111083984375, 0.5213050842285156, 0.37985992431640625, 11.463768005371094, 47.94854736328125, 27.13214111328125, 61.53750991821289, 6.117454528808594, 22.742332458496094, 81.83851623535156, 68.26170349121094, -1.6824569702148438, 10.305274963378906, -5.8042755126953125, 9.231895446777344, 37.72046661376953, -26.26263427734375, 24.109237670898438, -7.5596466064453125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000278.npy"}
{"epoch": 0.40822320117474303, "step": 279, "batch_size": 64, "mean": 27.01772689819336, "std": 29.080509185791016, "min": -42.733116149902344, "p10": -6.032017135620116, "median": 24.203861236572266, "p90": 66.85079879760742, "max": 117.58193969726562, "pos_frac": 0.796875, "sample": [7.991390228271484, 31.968338012695312, -2.0275421142578125, -8.193023681640625, -11.066986083984375, 22.344566345214844, 28.960769653320312, 23.2486572265625, 4.23919677734375, -11.4410400390625, 43.91082000732422, 40.37022399902344, 66.87727355957031, 49.1618537902832, 70.09235382080078, -3.0495758056640625, -6.519733428955078, 29.1868896484375, 61.990264892578125, -42.733116149902344, 35.94256591796875, 74.2123031616211, 25.050029754638672, 41.88207244873047, 39.82912063598633, 46.03164291381836, 66.78902435302734, 21.933815002441406, 18.565391540527344, 58.363807678222656, -16.5511474609375, 39.93760681152344, 25.016250610351562, 75.72586059570312, 24.02857208251953, 47.83545684814453, 9.965095520019531, 117.58193969726562, 32.84361267089844, 73.56127166748047, 13.370353698730469, -2.3492965698242188, 24.379150390625, 7.26031494140625, -2.733123779296875, 47.96327209472656, 16.544189453125, 15.74258804321289, 58.821693420410156, 11.922992706298828, 21.4991455078125, 23.84265899658203, 78.59848022460938, -4.894012451171875, 54.837677001953125, 10.62582015991211, 16.787139892578125, 0.12337493896484375, -29.87811851501465, 27.876747131347656, 39.156585693359375, 16.472885131835938, -4.199913024902344, 33.50807189941406], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000279.npy"}
{"epoch": 0.40969162995594716, "step": 280, "batch_size": 64, "mean": 31.758270263671875, "std": 29.139339447021484, "min": -41.53939437866211, "p10": -4.401605415344236, "median": 28.290027618408203, "p90": 70.55203170776367, "max": 99.22549438476562, "pos_frac": 0.875, "sample": [11.210868835449219, 43.48655700683594, -41.53939437866211, 72.36318969726562, -11.704551696777344, -16.235984802246094, 28.581748962402344, 6.4508514404296875, 70.661865234375, 65.94126892089844, 2.5450477600097656, 3.3964080810546875, 73.36287689208984, 22.83429718017578, 27.998306274414062, 54.576255798339844, 37.287391662597656, 70.2957534790039, 45.87724304199219, 7.079341888427734, 2.8559093475341797, 50.8530158996582, 54.80699157714844, 19.07388687133789, 64.74883270263672, 55.04024887084961, 24.459144592285156, 53.24043273925781, -16.271713256835938, 51.35613250732422, 74.89753723144531, 69.00711822509766, 22.3328857421875, 78.73713684082031, 65.83724975585938, -2.670461654663086, 32.632080078125, 99.22549438476562, 0.3965911865234375, 68.42491912841797, 47.73342514038086, 42.62665939331055, -10.755729675292969, -10.715911865234375, 1.9384918212890625, 14.341888427734375, 17.62259292602539, 30.324935913085938, -5.143524169921875, 72.68769073486328, 29.917816162109375, 26.83404541015625, 18.569297790527344, 11.838081359863281, 49.38937759399414, 24.606002807617188, 18.65802001953125, 59.85626983642578, 12.144561767578125, 23.227890014648438, 34.99267578125, 23.700302124023438, 43.14685821533203, 11.534866333007812], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000280.npy"}
{"epoch": 0.4111600587371512, "step": 281, "batch_size": 64, "mean": 32.501609802246094, "std": 33.846595764160156, "min": -45.925628662109375, "p10": -3.987183761596678, "median": 27.26114273071289, "p90": 80.13626098632814, "max": 124.90835571289062, "pos_frac": 0.84375, "sample": [-16.20214080810547, 107.52516174316406, 18.135299682617188, 61.43352127075195, 27.01757049560547, 10.533035278320312, 78.35133361816406, 22.43321990966797, 16.958953857421875, 10.217750549316406, 51.16761016845703, 53.885398864746094, 23.857803344726562, 12.367767333984375, 20.749061584472656, -45.925628662109375, 60.92792892456055, 12.119705200195312, 18.938133239746094, 30.74468231201172, 45.06095886230469, 80.90122985839844, 21.461318969726562, 8.409049987792969, -4.568702697753906, 103.99514770507812, 33.31694030761719, 63.56757736206055, 24.440231323242188, 85.89378356933594, 31.950576782226562, 26.030651092529297, -2.6303062438964844, 51.49623107910156, 24.609329223632812, 30.810531616210938, 32.32984924316406, 29.474365234375, 54.52582550048828, 26.592315673828125, 89.55398559570312, 16.919448852539062, 30.244705200195312, 55.366294860839844, 103.05426788330078, 35.588993072509766, 21.79342269897461, 58.826202392578125, 124.90835571289062, 29.776674270629883, -7.042266845703125, 23.2889404296875, -15.893196105957031, 9.92578125, -2.2491455078125, 29.422897338867188, 14.106611251831055, 77.26054382324219, -27.126541137695312, 48.081966400146484, 27.504714965820312, -35.316314697265625, -1.885061264038086, 31.088783264160156], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000281.npy"}
{"epoch": 0.41262848751835535, "step": 282, "batch_size": 64, "mean": 37.71876907348633, "std": 30.769556045532227, "min": -17.513513565063477, "p10": 7.8058013916015625, "median": 31.251022338867188, "p90": 77.44027328491211, "max": 145.69094848632812, "pos_frac": 0.953125, "sample": [72.55244445800781, 21.62504005432129, 29.73727798461914, 89.84623718261719, 32.799842834472656, 58.15972137451172, 85.55235290527344, 67.26771545410156, -11.195503234863281, 58.70960998535156, 71.23530578613281, 66.90504455566406, 66.9178695678711, -17.513513565063477, 34.79486846923828, 8.914146423339844, 145.69094848632812, 68.39551544189453, -16.11962890625, 0.717193603515625, 64.872802734375, 76.05415344238281, 30.982662200927734, 16.36276626586914, 52.762847900390625, 89.3028793334961, 51.07844924926758, 58.57036590576172, 19.89844512939453, 31.73346710205078, 41.53852081298828, 43.208351135253906, 1.9968414306640625, 10.926897048950195, 17.796009063720703, 78.0343246459961, 65.52467346191406, 22.144332885742188, 24.494964599609375, 31.51938247680664, 11.759429931640625, 13.844650268554688, 26.65563201904297, 86.25373840332031, 29.320037841796875, 23.680381774902344, 7.8297271728515625, 23.70183563232422, 10.563034057617188, 47.33918762207031, 15.01751708984375, 13.957778930664062, 36.53367614746094, 31.938142776489258, 8.168434143066406, 29.08880615234375, 34.9508056640625, 34.87348556518555, 27.835521697998047, 95.06329345703125, 17.576736450195312, 13.407310485839844, 3.050830841064453, 7.7955474853515625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000282.npy"}
{"epoch": 0.41409691629955947, "step": 283, "batch_size": 64, "mean": 25.754066467285156, "std": 34.799869537353516, "min": -59.326019287109375, "p10": -14.210248184204097, "median": 24.530895233154297, "p90": 71.21881713867188, "max": 99.77613830566406, "pos_frac": 0.828125, "sample": [0.6689682006835938, 2.7671127319335938, 50.222755432128906, 6.262905120849609, 17.884071350097656, 91.37733459472656, 34.02574157714844, 24.434234619140625, 68.33242797851562, -2.6793670654296875, 2.6655120849609375, 50.735687255859375, 3.328948974609375, 6.5772857666015625, -30.670013427734375, 49.45135498046875, 13.831390380859375, 40.564369201660156, -9.681869506835938, 7.336702346801758, 18.26776123046875, 45.72327423095703, -47.65167999267578, 72.45584106445312, 9.821113586425781, 58.73857116699219, 16.47707748413086, 74.43292236328125, 39.31683349609375, 99.77613830566406, 5.675537109375, 24.62755584716797, 79.13140869140625, 39.25916290283203, 26.320281982421875, 34.13633728027344, -32.32989501953125, 49.244140625, 6.875804901123047, 5.377037048339844, -16.117324829101562, 62.59393310546875, 56.14239501953125, -46.32499694824219, -9.76040267944336, 57.57024383544922, 61.486419677734375, 37.590240478515625, 19.85595703125, -32.55150604248047, 78.08158874511719, 15.758834838867188, 62.785614013671875, 15.319206237792969, 47.571990966796875, 30.4722900390625, 79.86328125, -8.594802856445312, 25.085693359375, 47.141395568847656, 11.91680908203125, 58.452728271484375, -59.326019287109375, 0.13579940795898438], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000283.npy"}
{"epoch": 0.4155653450807636, "step": 284, "batch_size": 64, "mean": 37.36852264404297, "std": 29.967016220092773, "min": -27.363014221191406, "p10": 1.2991125106811523, "median": 36.748111724853516, "p90": 68.48020324707032, "max": 149.92007446289062, "pos_frac": 0.921875, "sample": [48.0428466796875, 43.93023681640625, 33.881744384765625, 43.8572998046875, 19.981727600097656, 11.97130012512207, 24.1883544921875, 19.45531463623047, 36.35314178466797, 43.07081604003906, 36.28192138671875, 26.372474670410156, 39.59223175048828, -1.9829788208007812, 30.88567352294922, 1.3480548858642578, -27.363014221191406, 64.87501525878906, 64.25447082519531, 36.742881774902344, 18.72144317626953, 103.50732421875, 22.68543243408203, 89.43333435058594, 60.43452072143555, 0.0510711669921875, 49.51704406738281, 26.689117431640625, 36.75334167480469, -13.823799133300781, -3.268585205078125, 149.92007446289062, 44.65863037109375, 37.992591857910156, 26.372650146484375, 38.88676452636719, 46.9837646484375, 1.27813720703125, 41.46952819824219, 123.09579467773438, 74.01942443847656, 37.513824462890625, 18.514129638671875, 31.524444580078125, 45.159912109375, 10.589046478271484, 28.191055297851562, 22.439987182617188, -0.108154296875, 53.59099578857422, 38.06373596191406, 28.04217529296875, 65.47892761230469, 76.64602661132812, 69.76646423339844, 9.63882827758789, 15.206977844238281, 9.773666381835938, 15.43145751953125, 45.37485885620117, 48.39118957519531, 51.86543273925781, 60.442710876464844, 38.93049621582031], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000284.npy"}
{"epoch": 0.4170337738619677, "step": 285, "batch_size": 64, "mean": 26.28765296936035, "std": 25.850746154785156, "min": -49.0576171875, "p10": -8.04016647338867, "median": 29.410966873168945, "p90": 51.0203296661377, "max": 91.12884521484375, "pos_frac": 0.84375, "sample": [65.55349731445312, 5.1343841552734375, 25.923431396484375, 14.857254028320312, -19.861347198486328, 28.495288848876953, 28.588394165039062, -8.543914794921875, 47.43605422973633, 4.396575927734375, 7.304786682128906, 50.76195526123047, 50.5004768371582, 14.924470901489258, 47.72231674194336, 62.73157501220703, 49.295860290527344, 14.76080322265625, 91.12884521484375, 41.51936340332031, 16.829605102539062, -11.17791748046875, 2.3756256103515625, 30.10071563720703, 27.288436889648438, 32.64704895019531, 31.26209259033203, 41.67930603027344, 35.86729431152344, 31.43572998046875, 45.49684143066406, 33.988731384277344, -35.02296447753906, -0.7535629272460938, 35.99523162841797, 64.26687622070312, -4.4747161865234375, -14.322097778320312, 34.07293701171875, 58.119354248046875, 24.14571762084961, 31.173049926757812, 42.984710693359375, 20.358047485351562, 28.674869537353516, 3.4802894592285156, 6.26300048828125, 42.9410400390625, 46.34421920776367, 19.84741973876953, -18.991912841796875, 29.356204986572266, 39.108604431152344, 29.465728759765625, 22.52535629272461, 51.13106155395508, 81.10186767578125, -49.0576171875, -6.864753723144531, 19.375228881835938, 42.30723571777344, 25.0885009765625, 40.19153594970703, 33.15563201904297], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000285.npy"}
{"epoch": 0.4185022026431718, "step": 286, "batch_size": 64, "mean": 32.153831481933594, "std": 30.46472930908203, "min": -41.33367919921875, "p10": -10.11059112548828, "median": 30.991310119628906, "p90": 62.05588760375977, "max": 116.54522705078125, "pos_frac": 0.828125, "sample": [15.192207336425781, 33.41107940673828, 47.63604736328125, 35.708709716796875, 50.147586822509766, 43.01639938354492, 59.04692840576172, -2.3356094360351562, 13.6240234375, 17.496002197265625, 23.697032928466797, 30.937469482421875, 47.40814208984375, -4.394748687744141, 10.027725219726562, 75.86492919921875, 112.439453125, 13.560783386230469, 27.124603271484375, 21.35848045349121, 90.73805236816406, -23.18579864501953, -11.929649353027344, 38.74977493286133, 30.633255004882812, 12.49905776977539, 30.767608642578125, 51.16168975830078, 16.884017944335938, 56.4549560546875, 77.68368530273438, 51.34629821777344, 46.95630645751953, 47.8662109375, 55.50151062011719, 55.84886932373047, 65.7347412109375, 27.940773010253906, 34.27271270751953, -24.08209228515625, 49.150779724121094, 27.34832763671875, 32.58263397216797, -9.386932373046875, -23.800888061523438, 38.55685806274414, 61.04070281982422, 31.045150756835938, -41.33367919921875, 116.54522705078125, 18.644615173339844, 43.911712646484375, -6.752416610717773, 30.459136962890625, -10.420730590820312, 62.490966796875, -11.13470458984375, 26.935829162597656, 30.24396324157715, 18.28533172607422, 55.61975860595703, 37.47010040283203, 57.263641357421875, 20.270416259765625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000286.npy"}
{"epoch": 0.4199706314243759, "step": 287, "batch_size": 64, "mean": 21.69069480895996, "std": 33.02899169921875, "min": -49.456825256347656, "p10": -15.000632858276365, "median": 22.46906089782715, "p90": 54.36954841613771, "max": 131.02304077148438, "pos_frac": 0.734375, "sample": [1.6475601196289062, 2.25115966796875, 45.46961212158203, -0.8700485229492188, 4.431610107421875, -21.714218139648438, 33.535400390625, 6.2318115234375, 76.84349060058594, -12.885318756103516, 110.95818328857422, 62.31562805175781, 40.27071762084961, 35.513206481933594, 32.840179443359375, -4.4047698974609375, -0.06684494018554688, 33.881134033203125, -15.907196044921875, -17.98260498046875, 24.058767318725586, 14.505062103271484, 50.810489654541016, 27.6793212890625, 131.02304077148438, 29.57621955871582, 5.713069915771484, -42.48675537109375, -2.8514022827148438, 24.67900848388672, 10.058002471923828, 39.25755310058594, 1.5904197692871094, 1.9915618896484375, 14.373106002807617, 49.967002868652344, 30.398788452148438, 55.894859313964844, 28.64238739013672, -6.933807373046875, 35.69392395019531, 49.18102264404297, -49.456825256347656, 22.859851837158203, 17.218673706054688, 11.075565338134766, 41.74577331542969, 1.30029296875, 46.127845764160156, 73.7408447265625, -8.072366714477539, -4.362037658691406, 33.40741729736328, 45.55448532104492, -19.18023681640625, -4.719423294067383, -29.025959014892578, 45.36549377441406, -5.573945999145508, 6.168212890625, 23.1083984375, 40.898311614990234, 22.078269958496094, 92.76553344726562], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000287.npy"}
{"epoch": 0.42143906020558003, "step": 288, "batch_size": 64, "mean": 33.86872482299805, "std": 33.573055267333984, "min": -21.18072509765625, "p10": -12.378703308105466, "median": 32.50166893005371, "p90": 78.09091033935547, "max": 127.17744445800781, "pos_frac": 0.8125, "sample": [-10.05392837524414, 20.225753784179688, -21.18072509765625, 30.5556640625, -15.298789978027344, 64.72262573242188, -10.58401870727539, 60.47764587402344, -13.14785385131836, 78.50325012207031, 83.73835754394531, 46.74981689453125, -13.52212142944336, 45.92317199707031, 13.319637298583984, 1.0918502807617188, 17.48625946044922, 39.533790588378906, 56.99871826171875, 54.30485534667969, 8.921154022216797, 37.9544677734375, 43.496795654296875, -6.8787689208984375, 59.52039337158203, 26.941871643066406, 10.654251098632812, 6.442310333251953, 41.61604309082031, 19.779449462890625, -18.87957763671875, -13.453605651855469, 89.6626968383789, 77.1287841796875, 91.25377655029297, 6.5322418212890625, 4.652351379394531, 88.19684600830078, 23.159027099609375, 26.379005432128906, 9.389297485351562, 25.229652404785156, -5.351844787597656, 76.29991912841797, 75.67050170898438, 52.866111755371094, 20.257598876953125, -0.3129158020019531, 63.323394775390625, 49.165130615234375, 54.79584503173828, 54.542022705078125, 25.112655639648438, 58.326148986816406, 49.959346771240234, -15.913070678710938, 127.17744445800781, 34.44767379760742, 21.67071533203125, 61.76640319824219, 43.64336395263672, 88.39956665039062, 7.4870452880859375, 36.722904205322266], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000288.npy"}
{"epoch": 0.42290748898678415, "step": 289, "batch_size": 64, "mean": 36.33726501464844, "std": 40.942176818847656, "min": -38.387451171875, "p10": -4.723001098632812, "median": 25.665325164794922, "p90": 94.00893325805664, "max": 145.30941772460938, "pos_frac": 0.8125, "sample": [23.18543815612793, 41.67364501953125, 124.01139831542969, 95.0547103881836, -9.845439910888672, 21.956199645996094, 6.355064392089844, 79.68492889404297, 2.394683837890625, 26.323226928710938, 35.03324890136719, 42.891502380371094, 55.6263427734375, 145.30941772460938, 60.272918701171875, 65.08131408691406, 91.33680725097656, 125.54386901855469, 16.502105712890625, 86.42220306396484, 25.007423400878906, 14.657295227050781, -38.387451171875, -1.6502456665039062, 6.583346366882324, 2.6099166870117188, 116.80938720703125, -5.436798095703125, 34.58283996582031, 3.472198486328125, 77.53971862792969, 82.84297180175781, 2.2924423217773438, 4.935951232910156, -8.324127197265625, -0.2518310546875, 93.9270248413086, 31.48517608642578, 8.526443481445312, 3.367584228515625, 9.063316345214844, 100.2431640625, -0.7050762176513672, 30.16735076904297, -3.9904022216796875, 54.70178985595703, 94.04403686523438, 6.634979248046875, -5.0369720458984375, 34.22594451904297, 53.76948928833008, 72.81466674804688, -10.534130096435547, 38.89031982421875, 12.350173950195312, 15.981582641601562, 6.32835578918457, -13.04433822631836, -3.334808349609375, 72.95863342285156, 4.87518310546875, 68.29859924316406, 70.83987426757812, 26.640472412109375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000289.npy"}
{"epoch": 0.4243759177679883, "step": 290, "batch_size": 64, "mean": 36.90747833251953, "std": 39.21573257446289, "min": -55.48419189453125, "p10": -3.3489818572998042, "median": 33.73259162902832, "p90": 86.42757263183594, "max": 174.41403198242188, "pos_frac": 0.84375, "sample": [38.16786193847656, 93.92040252685547, 58.16832733154297, 26.02874755859375, 32.172340393066406, 8.887611389160156, -7.2783660888671875, 4.979057312011719, 27.996871948242188, -2.4944610595703125, 54.763580322265625, 42.86936950683594, 24.378463745117188, 58.486785888671875, 71.31637573242188, 43.71665954589844, -37.64253234863281, 6.037742614746094, -12.975273132324219, 22.659103393554688, 38.879913330078125, 61.43989562988281, 42.58420181274414, 38.862640380859375, 38.688743591308594, -55.48419189453125, 87.40220642089844, 28.197296142578125, 174.41403198242188, 119.63217163085938, 21.12948989868164, 27.147193908691406, 25.854995727539062, 84.15342712402344, -27.23199462890625, -3.075824737548828, 36.48151397705078, 38.49323272705078, 15.059494018554688, 66.7374267578125, 43.06892395019531, 35.292842864990234, 28.880508422851562, 40.46263885498047, 14.094970703125, 10.5386962890625, 54.80451202392578, 41.00480270385742, -5.756538391113281, 28.103656768798828, 73.89471435546875, 48.169677734375, 22.42790412902832, 101.85104370117188, 5.36662483215332, -3.4660491943359375, 128.15243530273438, 17.220703125, -0.5516262054443359, 6.340780258178711, 23.460235595703125, 108.21591186523438, 80.76563262939453, 46.21117401123047], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000290.npy"}
{"epoch": 0.42584434654919234, "step": 291, "batch_size": 64, "mean": 28.44838523864746, "std": 32.47624206542969, "min": -75.52906799316406, "p10": -9.357759475708008, "median": 25.935500144958496, "p90": 69.75317840576172, "max": 93.73088073730469, "pos_frac": 0.859375, "sample": [33.209449768066406, 6.858158111572266, 18.803634643554688, 8.919960021972656, 23.52375030517578, 60.48546600341797, 70.29873657226562, 41.969337463378906, 81.72149658203125, 11.6546630859375, -20.051498413085938, 17.97613525390625, 80.39644622802734, 13.06197738647461, 14.216167449951172, 15.014495849609375, 65.29698181152344, 49.633628845214844, 33.63005065917969, 8.587982177734375, 93.73088073730469, -54.46253967285156, 44.04726791381836, 65.76531982421875, 50.122779846191406, 69.31643676757812, 22.966426849365234, 42.63818359375, 28.036514282226562, -11.40103530883789, 29.175514221191406, 50.03800964355469, 24.819564819335938, 5.0582122802734375, 81.37748718261719, 62.22022247314453, -75.52906799316406, 40.98454284667969, 18.171890258789062, 33.350311279296875, 25.511411666870117, 42.226478576660156, 63.66931915283203, 12.778129577636719, 13.116430282592773, 15.288778305053711, -9.064067840576172, 57.25224304199219, 13.732292175292969, 51.459228515625, 13.555343627929688, -36.424468994140625, 54.31426239013672, -8.789840698242188, 12.980735778808594, -12.819244384765625, 69.94035339355469, 16.783451080322266, 38.91459655761719, -9.483627319335938, 70.3182144165039, 42.559661865234375, 26.359588623046875, 0.8834152221679688], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000291.npy"}
{"epoch": 0.42731277533039647, "step": 292, "batch_size": 64, "mean": 35.03672790527344, "std": 26.916410446166992, "min": -15.838493347167969, "p10": -2.7461299896240234, "median": 33.7474308013916, "p90": 68.4274383544922, "max": 111.98550415039062, "pos_frac": 0.859375, "sample": [39.72978973388672, 68.01644897460938, 37.664405822753906, 29.847732543945312, 40.94166564941406, 32.2896728515625, -12.232635498046875, 29.29302978515625, 44.7674560546875, 5.2770843505859375, 34.95469665527344, -2.582721710205078, 59.283790588378906, 13.553573608398438, 22.305763244628906, 30.072887420654297, 28.472686767578125, 69.05680847167969, 56.89561462402344, 35.555519104003906, 58.51667785644531, 111.98550415039062, 40.1517333984375, 48.90348434448242, 30.698272705078125, 89.20750427246094, 70.77859497070312, 47.81681823730469, 40.023895263671875, 26.763080596923828, -15.838493347167969, -5.3497161865234375, 1.2391510009765625, 17.886314392089844, 10.328086853027344, 68.60357666015625, 26.408885955810547, 32.540164947509766, 63.05809783935547, -8.215896606445312, 67.94441986083984, 48.207977294921875, 64.5287094116211, -9.640811920166016, 13.51715087890625, 70.2581787109375, 67.18846130371094, 29.121597290039062, 47.3843994140625, 25.461078643798828, -2.816162109375, 17.200775146484375, 37.50760269165039, -4.252349853515625, 43.569190979003906, 18.347625732421875, 23.8482666015625, 39.40576934814453, 59.72480010986328, 27.49987030029297, 81.5507583618164, 6.1122589111328125, 53.67687225341797, -1.6647453308105469], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000292.npy"}
{"epoch": 0.4287812041116006, "step": 293, "batch_size": 64, "mean": 25.03589630126953, "std": 34.24531173706055, "min": -33.140708923339844, "p10": -16.466721725463866, "median": 19.885770797729492, "p90": 74.98184814453127, "max": 113.31050109863281, "pos_frac": 0.78125, "sample": [23.540193557739258, 21.26241683959961, -24.091148376464844, 11.825325012207031, 0.4065208435058594, 21.245201110839844, 71.56631469726562, -11.495285034179688, 49.758628845214844, 9.48288345336914, 25.761436462402344, 21.663238525390625, 21.639930725097656, -32.2667236328125, 113.31050109863281, 8.150642395019531, 1.594207763671875, 34.6927490234375, 19.852447509765625, 43.26788330078125, -7.1504364013671875, -33.140708923339844, 90.72911071777344, -1.7942733764648438, 24.3270263671875, 9.270122528076172, 14.806983947753906, 88.22343444824219, -17.941543579101562, 85.37095642089844, 52.236358642578125, 67.46175384521484, 57.801300048828125, 9.226951599121094, 68.8403091430664, 19.296859741210938, 69.01036834716797, -15.415966033935547, 12.252403259277344, 9.241676330566406, 60.231964111328125, -24.22789764404297, -7.685462951660156, 3.5909423828125, 80.62626647949219, 66.39736938476562, -15.033426284790039, 76.44564819335938, 18.575660705566406, -31.695995330810547, 7.958595275878906, -16.91704559326172, 32.216636657714844, 17.186172485351562, 79.62107849121094, 19.91909408569336, 26.209915161132812, 14.059288024902344, 4.148029327392578, 52.18830871582031, -2.5352706909179688, 22.430221557617188, 60.5777587890625, 24.189407348632812], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000293.npy"}
{"epoch": 0.4302496328928047, "step": 294, "batch_size": 64, "mean": 22.482465744018555, "std": 34.787357330322266, "min": -47.3316650390625, "p10": -18.67851104736328, "median": 19.612241744995117, "p90": 66.93650436401369, "max": 160.099609375, "pos_frac": 0.78125, "sample": [-37.382164001464844, 71.67350769042969, 22.417404174804688, 80.6683349609375, 43.369781494140625, 36.602821350097656, 37.151092529296875, 20.319622039794922, 43.47967529296875, 34.130126953125, 26.757080078125, 23.819664001464844, -8.214252471923828, 40.08514404296875, -39.53120422363281, 9.697669982910156, -16.653854370117188, 25.481781005859375, -14.519886016845703, 31.90435791015625, -12.732620239257812, 1.8458099365234375, -11.529167175292969, 23.59917449951172, -0.9049530029296875, -17.1956787109375, 41.10727310180664, 16.667373657226562, 40.010345458984375, 60.959632873535156, 16.37633514404297, 86.52115631103516, 18.904861450195312, 60.67387390136719, 27.74882698059082, -47.3316650390625, 160.099609375, 50.882286071777344, 17.941978454589844, 44.94335174560547, -20.051849365234375, 45.62759780883789, 6.543270111083984, 10.474079132080078, 68.63008880615234, 64.9548568725586, 27.167930603027344, 17.266807556152344, 1.7587966918945312, 2.3723087310791016, -19.314010620117188, 8.9752197265625, -35.909603118896484, 12.314884185791016, 32.52091979980469, 39.583740234375, 11.247196197509766, 17.404922485351562, -19.495651245117188, 6.299671173095703, 70.27349853515625, 9.653488159179688, 2.9493865966796875, 67.78578186035156], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000294.npy"}
{"epoch": 0.43171806167400884, "step": 295, "batch_size": 64, "mean": 32.85896301269531, "std": 34.89580535888672, "min": -48.719966888427734, "p10": -4.4083518981933585, "median": 29.577850341796875, "p90": 73.84995574951175, "max": 141.00689697265625, "pos_frac": 0.84375, "sample": [16.848541259765625, 51.0833740234375, 6.8844757080078125, 8.97235107421875, 51.878013610839844, 10.877769470214844, 25.164649963378906, 28.43352508544922, 59.214691162109375, 50.987030029296875, 31.75872802734375, 0.5333633422851562, 30.062652587890625, 37.39148712158203, 63.69703674316406, 43.56138610839844, -40.86134338378906, 62.64825439453125, 23.935192108154297, 66.77571105957031, 41.24104309082031, 3.841156005859375, 11.577423095703125, 39.94959259033203, 6.333927154541016, -3.4733505249023438, 64.76541137695312, 21.433216094970703, 76.88177490234375, 62.99293518066406, 16.569900512695312, 33.69950866699219, 1.8020477294921875, 29.093048095703125, 26.448566436767578, 56.093544006347656, 64.27651977539062, -7.813396453857422, -9.826454162597656, 79.0482177734375, 1.5522193908691406, 81.97401428222656, -3.24188232421875, -22.310211181640625, -1.0335807800292969, -4.8090667724609375, 42.88740539550781, 13.12188720703125, 46.52961730957031, 63.234222412109375, 25.448577880859375, 49.2764892578125, 15.191703796386719, 141.00689697265625, 80.80268859863281, 105.66340637207031, 35.17879104614258, 28.0076904296875, 30.30768585205078, -48.719966888427734, -6.768125534057617, 25.421524047851562, 124.76748657226562, 34.70256805419922], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000295.npy"}
{"epoch": 0.4331864904552129, "step": 296, "batch_size": 64, "mean": 34.803794860839844, "std": 36.26123809814453, "min": -53.84001159667969, "p10": -4.583621215820312, "median": 35.54410171508789, "p90": 72.04629058837891, "max": 148.76919555664062, "pos_frac": 0.8125, "sample": [-2.9577503204345703, 29.165794372558594, 72.08721923828125, 141.92959594726562, 46.86316680908203, 148.76919555664062, 9.282615661621094, 13.572166442871094, 43.04981994628906, -53.84001159667969, 74.11311340332031, -1.5532913208007812, -27.871963500976562, 54.355873107910156, 2.910552978515625, -1.1337814331054688, 43.70226287841797, 31.3240966796875, 71.76074981689453, 50.617401123046875, 11.54364013671875, 44.0540771484375, 7.8856353759765625, 47.32200622558594, 10.408782958984375, 54.144805908203125, 60.727874755859375, 38.97441101074219, 50.338714599609375, -4.758216857910156, 67.04837036132812, 71.95079040527344, 53.89849853515625, 61.593841552734375, 7.062767028808594, 55.34552001953125, 50.591705322265625, 20.562816619873047, -8.558761596679688, 28.150493621826172, -4.176231384277344, 53.70338439941406, -7.4148712158203125, 20.48387908935547, 115.18475341796875, -0.3375244140625, 4.188560485839844, 74.85233306884766, 13.428970336914062, 14.884946823120117, 57.72810363769531, 10.58694839477539, 36.203399658203125, 78.41813659667969, -12.445114135742188, 12.085334777832031, 34.884803771972656, 54.009185791015625, 38.59763717651367, -9.869823455810547, 36.91253662109375, 34.79285430908203, 26.119216918945312, 70.18666076660156], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000296.npy"}
{"epoch": 0.434654919236417, "step": 297, "batch_size": 64, "mean": 29.71851921081543, "std": 31.939250946044922, "min": -30.65680503845215, "p10": -6.35141487121582, "median": 23.58730697631836, "p90": 75.93250122070313, "max": 116.02651977539062, "pos_frac": 0.84375, "sample": [10.977886199951172, 51.79041290283203, 21.758041381835938, 52.95196533203125, 25.577880859375, 1.2156486511230469, 46.65507507324219, 36.81803894042969, 53.744895935058594, -28.03443145751953, 42.06257629394531, 60.67433166503906, -6.6261749267578125, 23.111602783203125, 51.459228515625, -1.9221954345703125, 0.63360595703125, 67.07888793945312, 42.22723388671875, -30.65680503845215, 8.420196533203125, -13.529325485229492, 3.9824562072753906, 14.41824722290039, 34.0155029296875, 44.63330078125, 43.647705078125, 15.759178161621094, 29.066112518310547, 13.765045166015625, 22.07813262939453, 13.383201599121094, 8.52825927734375, 24.063011169433594, 33.158660888671875, -5.710308074951172, 76.63601684570312, 83.05130004882812, 97.81379699707031, 74.53218078613281, 33.35034942626953, 55.7129020690918, 5.094970703125, 17.803176879882812, -3.859983444213867, -9.004833221435547, 16.783363342285156, 7.385169982910156, 76.53263854980469, 6.622093200683594, 14.712396621704102, 42.869056701660156, 47.375518798828125, 0.114715576171875, -13.896377563476562, -14.833793640136719, 9.754955291748047, 69.3578109741211, 102.25581359863281, 116.02651977539062, 13.919501304626465, 78.74374389648438, 55.03352737426758, 30.921585083007812], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000297.npy"}
{"epoch": 0.43612334801762115, "step": 298, "batch_size": 64, "mean": 35.245792388916016, "std": 38.42403793334961, "min": -18.208206176757812, "p10": -7.707609176635742, "median": 28.962139129638672, "p90": 79.40145416259767, "max": 196.45916748046875, "pos_frac": 0.796875, "sample": [41.38819122314453, 63.203155517578125, 10.430013656616211, 22.301223754882812, -12.107284545898438, 68.06927490234375, -7.453357696533203, 81.4935302734375, 86.54203796386719, -6.321197509765625, -7.8165740966796875, 50.964141845703125, 61.399559020996094, 37.99150085449219, -11.577644348144531, 45.7073974609375, -16.58361053466797, 26.332923889160156, 28.82452392578125, 12.113849639892578, 62.6402587890625, -18.208206176757812, 44.54078674316406, -16.10144805908203, 75.2337646484375, 103.58238220214844, 28.125221252441406, 22.07928466796875, 29.099754333496094, 74.25321960449219, -2.9200210571289062, 3.0332183837890625, 10.498184204101562, 2.0603981018066406, -0.5579833984375, 58.974884033203125, -14.159408569335938, 81.18760681152344, 62.4317626953125, 35.33008575439453, 19.92947006225586, 17.294357299804688, 196.45916748046875, 10.690792083740234, -0.0545196533203125, -0.6712646484375, 70.01549530029297, 100.37077331542969, 62.69512939453125, 22.336971282958984, 62.539947509765625, 38.77613067626953, 35.04766845703125, 65.51508331298828, 37.71031188964844, 47.44169616699219, 20.67490005493164, 106.61740112304688, 1.6405181884765625, 51.620445251464844, 11.21533203125, 11.3125, 1.3350944519042969, 49.19172668457031], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000298.npy"}
{"epoch": 0.43759177679882527, "step": 299, "batch_size": 64, "mean": 37.45670700073242, "std": 33.99776077270508, "min": -23.841781616210938, "p10": -2.594499969482422, "median": 35.79804992675781, "p90": 77.78833236694337, "max": 130.33987426757812, "pos_frac": 0.859375, "sample": [30.64067840576172, 9.152946472167969, 46.64967346191406, 46.28997802734375, 8.241905212402344, -2.707977294921875, 61.03556823730469, 55.9781494140625, 13.35849380493164, 8.634639739990234, 74.82730102539062, 30.26068115234375, 94.16064453125, 50.91654586791992, 61.42202377319336, 67.98103332519531, 26.690994262695312, 91.79932403564453, 21.682451248168945, 71.56698608398438, 17.51641845703125, 22.37616729736328, 6.031806945800781, -11.309654235839844, -11.110382080078125, 8.412239074707031, 32.124481201171875, -0.38695526123046875, -23.841781616210938, 3.6722640991210938, 15.62442398071289, 46.038726806640625, 52.17979431152344, 34.348121643066406, 66.64637756347656, 50.03046798706055, 17.846466064453125, 50.87593460083008, 69.65780639648438, 56.931983947753906, 67.36817932128906, 21.066699981689453, 3.3883705139160156, -17.963714599609375, 84.32572174072266, 37.40093231201172, 9.152862548828125, 123.4999771118164, 53.873260498046875, 130.33987426757812, 68.48213195800781, 14.008548736572266, 23.451980590820312, -5.008064270019531, -2.3297195434570312, 74.98629760742188, 45.1219367980957, 88.01045989990234, 37.48957824707031, 78.98920440673828, -17.46282958984375, 37.24797821044922, 7.680593490600586, 61.8621940612793], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000299.npy"}
{"epoch": 0.4390602055800294, "step": 300, "batch_size": 64, "mean": 31.3201904296875, "std": 37.1410026550293, "min": -53.49272155761719, "p10": -7.30917282104492, "median": 32.14188194274902, "p90": 73.08132781982422, "max": 124.29103088378906, "pos_frac": 0.84375, "sample": [45.92261505126953, 14.217891693115234, 47.419593811035156, 99.65830993652344, 12.420417785644531, -5.8116912841796875, 34.91770935058594, 70.64901733398438, 33.22002029418945, -43.538272857666016, -53.49272155761719, 3.6876678466796875, 43.72066116333008, 9.4049072265625, -12.737438201904297, 35.02381134033203, 22.68587875366211, 8.377197265625, 8.178159713745117, 1.1933670043945312, 71.87005615234375, 89.12214660644531, -1.480133056640625, 5.541473388671875, -7.950950622558594, 70.59605407714844, 13.505874633789062, 36.58247375488281, 58.55992889404297, 0.00152587890625, 31.063743591308594, 120.71583557128906, 39.746612548828125, -27.851341247558594, 51.235321044921875, 2.0013389587402344, 72.6220703125, -13.220741271972656, 55.1365966796875, 71.7052001953125, 43.970359802246094, 33.57060241699219, 1.6545562744140625, 56.95289611816406, -23.431427001953125, 60.37556457519531, 23.959041595458984, 21.260356903076172, 73.27815246582031, 56.856964111328125, 85.57675170898438, 6.770046234130859, 5.8436279296875, 8.724189758300781, 5.7525835037231445, 16.327957153320312, 40.36299133300781, 17.886444091796875, -5.429954528808594, 124.29103088378906, 38.91693115234375, 110.79853820800781, 52.27412796020508, 33.32963943481445], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000300.npy"}
{"epoch": 0.44052863436123346, "step": 301, "batch_size": 64, "mean": 38.26066589355469, "std": 38.069610595703125, "min": -36.12165069580078, "p10": -6.586894226074217, "median": 32.72390365600586, "p90": 87.42831573486329, "max": 157.54385375976562, "pos_frac": 0.828125, "sample": [38.667938232421875, 64.81855773925781, 31.331409454345703, 85.10379791259766, 122.12612915039062, 71.00471496582031, 24.799545288085938, 60.53236389160156, 54.49059295654297, -14.091033935546875, 75.77474975585938, 80.41603088378906, 4.747833251953125, 29.543697357177734, -10.973506927490234, 34.996360778808594, -1.791351318359375, 157.54385375976562, 4.684700012207031, 33.528602600097656, 52.57835388183594, 39.34907913208008, -11.933998107910156, 26.90673828125, 18.635452270507812, 50.622039794921875, 26.64000701904297, 45.482200622558594, 11.71902084350586, 49.82896423339844, 120.16575622558594, -19.274742126464844, -3.7891693115234375, 55.08721923828125, 13.082866668701172, 34.34449768066406, 52.573394775390625, -7.3002777099609375, 24.45724105834961, 15.334754943847656, 13.32217025756836, 29.311660766601562, 19.392929077148438, -4.922332763671875, -36.12165069580078, 109.00313568115234, 91.30531311035156, -13.947654724121094, 30.297439575195312, 48.652931213378906, 74.25787353515625, 43.02552795410156, 31.919204711914062, 78.0568618774414, 94.28787231445312, 88.4245376586914, 5.549591064453125, 75.90689849853516, 37.33799743652344, 25.396560668945312, 2.9435806274414062, -2.0776443481445312, 13.705974578857422, 51.88939666748047], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000301.npy"}
{"epoch": 0.4419970631424376, "step": 302, "batch_size": 64, "mean": 31.317073822021484, "std": 36.61674118041992, "min": -45.644588470458984, "p10": -10.0365306854248, "median": 25.742488861083984, "p90": 74.1426628112793, "max": 124.73274993896484, "pos_frac": 0.84375, "sample": [52.983306884765625, 96.84947204589844, 8.226409912109375, 39.436614990234375, 54.23747253417969, 61.011409759521484, 71.01302337646484, 16.44965362548828, 25.926651000976562, -43.9664306640625, 26.7833251953125, 2.0391292572021484, -6.086750030517578, 22.6290283203125, -40.051361083984375, -26.14789390563965, -0.3535308837890625, 2.1717071533203125, 97.80662536621094, 38.35308837890625, -11.729293823242188, 74.49718475341797, 49.82673645019531, 11.020729064941406, 17.418956756591797, -34.67951965332031, 73.31544494628906, 53.859222412109375, 41.764678955078125, 18.54302978515625, 15.027610778808594, 44.68870544433594, 23.432113647460938, 13.233909606933594, 7.021116256713867, 71.91246032714844, 124.73274993896484, 10.276359558105469, 46.449275970458984, -45.644588470458984, 38.33860778808594, 91.11446380615234, 36.882965087890625, 62.74031448364258, 5.495796203613281, 66.27397918701172, 25.558326721191406, 99.41250610351562, 65.78816986083984, -5.3320159912109375, 3.3922252655029297, 43.56207275390625, 7.975151062011719, 25.339584350585938, 31.3096923828125, 17.61172103881836, 53.262115478515625, 83.63362884521484, 5.58123779296875, 63.36918640136719, 21.010971069335938, 4.4917144775390625, 67.26724243164062, -14.064733505249023], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000302.npy"}
{"epoch": 0.4434654919236417, "step": 303, "batch_size": 64, "mean": 36.312965393066406, "std": 34.5268440246582, "min": -30.837608337402344, "p10": -5.856856536865229, "median": 34.72014236450195, "p90": 78.6925422668457, "max": 116.90744018554688, "pos_frac": 0.875, "sample": [78.34857177734375, -14.20035171508789, 44.802955627441406, 49.287193298339844, 0.050693511962890625, 59.90808868408203, 7.760986328125, 27.358360290527344, 11.739677429199219, 10.313827514648438, 1.752288818359375, 8.884223937988281, 72.73377990722656, 7.397407531738281, 78.83995819091797, 23.74657440185547, 22.347793579101562, 27.108966827392578, 5.595909118652344, 104.44573974609375, 35.462646484375, 61.992286682128906, -29.769515991210938, -8.870285034179688, 116.90744018554688, 39.30675506591797, -8.050224304199219, 79.94811248779297, 58.47653579711914, 64.90693664550781, 52.25480651855469, 52.14612579345703, 63.44378662109375, 14.894462585449219, 94.57151794433594, -24.811721801757812, -30.837608337402344, 35.23851013183594, 73.42134094238281, 28.482086181640625, 81.6683349609375, 19.088150024414062, 42.54010009765625, 54.86399841308594, 66.2835693359375, 74.1661376953125, 74.25236511230469, 24.12087631225586, 49.666259765625, -26.094057083129883, 111.61978149414062, 6.442741394042969, 54.984893798828125, 40.53371810913086, 31.389528274536133, 49.88927459716797, -0.7389984130859375, 4.364887237548828, 15.54559326171875, 18.92097282409668, 31.975452423095703, 34.20177459716797, 9.19959831237793, 57.80824279785156], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000303.npy"}
{"epoch": 0.44493392070484583, "step": 304, "batch_size": 64, "mean": 36.74197769165039, "std": 34.9139404296875, "min": -50.20197296142578, "p10": -7.872029113769531, "median": 40.29056739807129, "p90": 79.13302688598634, "max": 115.64736938476562, "pos_frac": 0.84375, "sample": [-11.321477890014648, 18.702224731445312, 55.734413146972656, 46.2745361328125, 16.900283813476562, 58.4920768737793, -7.904571533203125, 86.7458724975586, 29.386032104492188, 58.74085998535156, 68.40640258789062, 37.447906494140625, 25.417579650878906, 43.05548858642578, 48.734764099121094, 39.152488708496094, 22.75360870361328, 5.344612121582031, 18.321365356445312, 41.428646087646484, 80.55864715576172, 41.4747314453125, 72.83847045898438, 50.852508544921875, 65.88108825683594, 65.55099487304688, 46.9952392578125, 15.749580383300781, 100.878173828125, 4.975868225097656, 48.69386291503906, 57.41700744628906, -9.86981201171875, 70.48324584960938, -50.20197296142578, -3.68255615234375, 12.79150390625, 44.73675537109375, 23.738853454589844, 55.95032501220703, -35.54191589355469, 115.64736938476562, -21.376792907714844, -7.7960968017578125, 34.37164306640625, 18.20858383178711, 20.467430114746094, 0.5504236221313477, 51.61335754394531, 65.28424072265625, -32.271942138671875, -1.1099052429199219, 79.73098754882812, 76.24256896972656, 77.34178924560547, 36.57421112060547, 11.521644592285156, 26.899887084960938, 13.838027954101562, 85.52633666992188, 77.73778533935547, 81.62197875976562, 76.48868560791016, 2.2906112670898438], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000304.npy"}
{"epoch": 0.44640234948604995, "step": 305, "batch_size": 64, "mean": 37.75819396972656, "std": 35.27877426147461, "min": -45.107635498046875, "p10": -6.581229972839354, "median": 36.35969161987305, "p90": 81.84809341430665, "max": 115.04203033447266, "pos_frac": 0.84375, "sample": [72.90774536132812, 69.52787017822266, 28.80084228515625, 56.24658203125, -9.827198028564453, 49.77301025390625, 72.26728820800781, 15.240503311157227, 42.21620178222656, 90.39478302001953, 49.49370574951172, 9.460468292236328, 21.442020416259766, 61.45530700683594, -4.632598876953125, 15.728729248046875, 37.5924072265625, 110.49043273925781, 82.72486877441406, 68.9116439819336, 115.04203033447266, 34.133323669433594, 48.36036682128906, 67.38754272460938, 27.71224594116211, 15.620941162109375, 33.174476623535156, 8.198287963867188, 66.53728485107422, 18.643829345703125, 14.866943359375, 62.69921112060547, 8.631317138671875, 5.45806884765625, 95.10235595703125, 6.749256134033203, 98.77383422851562, 19.229724884033203, 14.016036987304688, -9.088821411132812, -5.877145767211914, -4.8842620849609375, 65.4105453491211, 63.24318313598633, 46.47972869873047, 70.02301025390625, 46.39764404296875, 7.738067626953125, 56.1483154296875, 87.09194946289062, 72.2120132446289, 79.80228424072266, 59.5552978515625, 58.44520568847656, -45.107635498046875, -26.323516845703125, 11.589202880859375, -21.921310424804688, 37.34342956542969, -11.938812255859375, 35.375953674316406, 34.128143310546875, -6.8829803466796875, 17.01347541809082], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000305.npy"}
{"epoch": 0.447870778267254, "step": 306, "batch_size": 64, "mean": 45.435882568359375, "std": 30.717479705810547, "min": -18.036209106445312, "p10": 12.865103912353517, "median": 42.23556137084961, "p90": 84.1665412902832, "max": 108.40219116210938, "pos_frac": 0.953125, "sample": [66.10177612304688, 14.013011932373047, 41.52732849121094, 30.8504581451416, 13.682281494140625, 43.07733154296875, 103.93394470214844, 84.32036590576172, 87.47306823730469, 34.13810729980469, 74.98942565917969, 20.91510009765625, -17.453811645507812, 83.8076171875, 19.09825897216797, 76.02122497558594, 76.13381958007812, 82.47422790527344, 16.360061645507812, 30.4696044921875, 26.65233612060547, 42.94379425048828, 64.91466522216797, 31.0772705078125, 13.791656494140625, 23.123029708862305, 0.8051795959472656, 22.55936050415039, 45.37930679321289, 108.40219116210938, 39.94343566894531, -18.036209106445312, 107.1864013671875, 12.514884948730469, 42.990562438964844, 57.22666931152344, 34.15887451171875, 69.71179962158203, 45.37498474121094, 23.480804443359375, 27.94488525390625, 17.868438720703125, 45.33213806152344, 73.48492431640625, -4.171792984008789, 11.4002685546875, 31.152755737304688, 96.6146011352539, 71.66724395751953, 16.6259765625, 62.7657470703125, 70.80339813232422, 15.767616271972656, 79.99614715576172, 34.91340637207031, 56.7636833190918, 62.58317565917969, 39.7841796875, 78.24201965332031, 20.471153259277344, 60.82987976074219, 59.45347595214844, 5.0236663818359375, 96.44513702392578], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000306.npy"}
{"epoch": 0.44933920704845814, "step": 307, "batch_size": 64, "mean": 32.57453918457031, "std": 39.671817779541016, "min": -45.48042297363281, "p10": -21.832102966308593, "median": 34.81639862060547, "p90": 73.40713882446289, "max": 144.76492309570312, "pos_frac": 0.796875, "sample": [72.27693176269531, -6.594657897949219, 20.23102569580078, 27.974685668945312, 33.13888931274414, 16.620723724365234, -43.2373046875, -15.145309448242188, 3.7645263671875, 39.172119140625, 44.934288024902344, -39.483924865722656, 88.83328247070312, -22.711196899414062, 42.759918212890625, -23.59221649169922, 56.898616790771484, -19.7808837890625, 72.36902618408203, 5.5316314697265625, 42.1805419921875, 42.195526123046875, -8.150726318359375, 50.3459358215332, 124.35070037841797, 40.33434295654297, 20.16150665283203, -4.0185546875, 29.75707244873047, 52.59741973876953, 22.883298873901367, 67.5333251953125, 36.334678649902344, -37.08815002441406, 40.43603515625, 32.45250701904297, 73.36438751220703, 73.42546081542969, 144.76492309570312, 53.54193115234375, 36.34302520751953, 40.519813537597656, -45.48042297363281, 7.66070556640625, 75.32400512695312, 35.75041961669922, 24.205413818359375, 68.96866607666016, 25.39342498779297, 11.099159240722656, 23.71184539794922, 33.88237762451172, 64.32757568359375, 43.30792999267578, 52.93096923828125, 104.33504486083984, 38.646087646484375, 14.390003204345703, 116.05661010742188, 70.42196655273438, 31.06524658203125, 0.48846435546875, -15.36004638671875, -24.58003807067871], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000307.npy"}
{"epoch": 0.45080763582966227, "step": 308, "batch_size": 64, "mean": 32.360687255859375, "std": 40.394996643066406, "min": -37.658206939697266, "p10": -18.210248565673826, "median": 25.995319366455078, "p90": 91.96374130249023, "max": 150.10272216796875, "pos_frac": 0.78125, "sample": [54.07782745361328, 61.5206298828125, -21.3004150390625, -8.798198699951172, 36.47722625732422, 87.04915618896484, -37.658206939697266, 22.817066192626953, 19.42767333984375, 19.784997940063477, 36.928924560546875, -18.588947296142578, -26.726089477539062, 26.280502319335938, 15.992340087890625, 81.57211303710938, -17.326618194580078, 58.287506103515625, 50.4375, 21.879852294921875, -0.922332763671875, 150.10272216796875, 92.33575439453125, 97.00089263916016, 31.629135131835938, -5.372650146484375, -33.14251708984375, 24.531415939331055, 84.37206268310547, 23.02731704711914, 94.27428436279297, 19.565704345703125, 34.77671813964844, 117.94207763671875, 35.564300537109375, 109.48673248291016, 101.09707641601562, 11.694595336914062, 28.72796630859375, 30.470687866210938, 20.41583251953125, -14.02239990234375, 33.28245544433594, 44.57852554321289, 38.87670135498047, 91.09571075439453, -3.793426513671875, 65.94279479980469, 26.715469360351562, 13.443923950195312, 82.02490234375, 49.89397430419922, 10.522674560546875, 11.994903564453125, -10.512380599975586, 7.128120422363281, 26.388168334960938, 25.71013641357422, 9.751617431640625, 14.366153717041016, -32.903953552246094, -26.821495056152344, 52.379669189453125, 25.329208374023438], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000308.npy"}
{"epoch": 0.4522760646108664, "step": 309, "batch_size": 64, "mean": 34.5657958984375, "std": 37.57004165649414, "min": -46.941741943359375, "p10": -8.720758056640623, "median": 34.91949653625488, "p90": 84.09542007446291, "max": 123.48368835449219, "pos_frac": 0.828125, "sample": [35.618682861328125, 3.702554702758789, 29.945541381835938, 85.42581176757812, 14.595611572265625, 68.93444061279297, 25.26760482788086, -23.025951385498047, 64.26107788085938, 11.312408447265625, 81.45906066894531, 47.64262390136719, 10.034889221191406, 117.53633880615234, -35.10856628417969, -40.38325881958008, 37.0858268737793, 12.875648498535156, 39.572540283203125, 31.68708038330078, 29.871292114257812, -46.941741943359375, 53.06639099121094, 85.22528839111328, -5.7460784912109375, -30.85696792602539, 50.28477478027344, 85.23512268066406, 14.597492218017578, 44.90255355834961, 5.283008575439453, 112.48200988769531, 24.04228973388672, -9.995620727539062, 55.543251037597656, 38.299652099609375, 104.26885986328125, 65.95223999023438, 35.99004364013672, -1.127166748046875, -22.79779815673828, 56.33975601196289, 1.0494613647460938, 123.48368835449219, -2.7098236083984375, 61.550498962402344, 64.5987548828125, 26.39153289794922, 55.60913848876953, 7.001213073730469, 31.340469360351562, -4.653377532958984, 62.43498229980469, 11.256874084472656, 49.18647766113281, 23.94402313232422, 73.95181274414062, 66.15438842773438, 14.43411636352539, 34.22031021118164, 13.392959594726562, 47.76847839355469, 50.68402862548828, 38.762359619140625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000309.npy"}
{"epoch": 0.45374449339207046, "step": 310, "batch_size": 64, "mean": 32.65531921386719, "std": 37.35422897338867, "min": -55.32006072998047, "p10": -15.191454696655269, "median": 36.88519287109375, "p90": 73.78514022827149, "max": 116.33793640136719, "pos_frac": 0.796875, "sample": [-2.961355209350586, 5.404144287109375, 113.75810241699219, 23.557579040527344, 14.091651916503906, 109.92901611328125, 63.83984375, 28.696481704711914, 20.332420349121094, -25.72760009765625, -22.959915161132812, -9.895217895507812, 6.9852447509765625, 58.19240188598633, -8.391040802001953, 6.478736877441406, 16.579387664794922, 19.550209045410156, 67.29586791992188, 39.96173858642578, 63.58568572998047, -2.2917938232421875, -10.192459106445312, -45.548912048339844, 52.46559143066406, 0.18500518798828125, 73.25643920898438, 23.680667877197266, 57.76934814453125, 50.720458984375, 20.345840454101562, 56.769744873046875, 70.57846069335938, 1.7176094055175781, 42.34403991699219, 49.12162780761719, 42.74028778076172, 40.073280334472656, 30.117141723632812, 51.90660858154297, -33.46954345703125, 50.72601318359375, 116.33793640136719, 74.01172637939453, 93.16400146484375, 59.23738098144531, 69.39883422851562, 37.57878112792969, 80.97772979736328, 41.22148132324219, -21.349781036376953, -4.093603134155273, 74.68482971191406, 31.709915161132812, -55.32006072998047, 0.7962532043457031, 57.415252685546875, 15.285575866699219, 58.91615295410156, 53.349632263183594, -17.333881378173828, 49.394168853759766, 27.04778289794922, 36.19160461425781], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000310.npy"}
{"epoch": 0.4552129221732746, "step": 311, "batch_size": 64, "mean": 40.49799346923828, "std": 37.112648010253906, "min": -25.330825805664062, "p10": -2.6571562767028793, "median": 33.934295654296875, "p90": 93.0981185913086, "max": 146.07737731933594, "pos_frac": 0.875, "sample": [6.874916076660156, 91.89324188232422, 49.34974670410156, 48.155609130859375, -3.2881174087524414, 131.16477966308594, -4.583465576171875, 59.371246337890625, 55.867279052734375, 104.28854370117188, 54.72686004638672, 12.682388305664062, 61.91626739501953, 75.0111083984375, -25.330825805664062, 1.6600341796875, 28.027999877929688, 15.718294143676758, 49.505332946777344, 53.9840087890625, 34.14093017578125, 93.61449432373047, 111.91314697265625, 57.50441360473633, 44.791770935058594, 25.320018768310547, 30.517532348632812, 69.34899139404297, 65.67335510253906, 43.82952117919922, 28.948497772216797, 14.312458038330078, 55.69600296020508, 83.54248809814453, 38.68321228027344, 17.345962524414062, 10.8936767578125, 74.63240051269531, 1.9681549072265625, 99.03521728515625, 8.243011474609375, -1.1849136352539062, 95.25062561035156, 33.7276611328125, 74.15168762207031, 25.181320190429688, -19.990951538085938, -4.613101959228516, 6.149467468261719, 27.541921615600586, -5.3443450927734375, -4.696746826171875, 20.800891876220703, 19.563899993896484, 7.112525939941406, 46.54612731933594, 91.71029663085938, 38.86280822753906, 9.597381591796875, 0.6737823486328125, 57.19366455078125, 21.29763412475586, 29.312095642089844, 146.07737731933594], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000311.npy"}
{"epoch": 0.4566813509544787, "step": 312, "batch_size": 64, "mean": 38.68157958984375, "std": 40.04439926147461, "min": -61.4798583984375, "p10": -7.882765579223632, "median": 35.26186943054199, "p90": 90.54352569580078, "max": 128.47793579101562, "pos_frac": 0.828125, "sample": [44.462684631347656, 126.23211669921875, 75.08030700683594, 38.82179260253906, 11.226875305175781, -61.4798583984375, 55.452178955078125, -0.03925323486328125, 35.659793853759766, 78.40441131591797, -26.395889282226562, 113.25132751464844, 91.26972198486328, 72.17984008789062, 72.94990539550781, 8.987483978271484, -25.050281524658203, 73.99337768554688, 44.15741729736328, 108.78059387207031, 54.4156379699707, -35.399269104003906, 52.055625915527344, 29.634353637695312, 26.057296752929688, 5.213649749755859, 86.08970642089844, 46.01727294921875, 95.55647277832031, 1.63934326171875, 33.57176971435547, 67.55545043945312, 11.294082641601562, 34.86394500732422, 29.445547103881836, 56.9815673828125, 13.112598419189453, 33.4842643737793, -31.246192932128906, 72.43006134033203, 128.47793579101562, 18.7861328125, -15.380012512207031, 21.510147094726562, 53.01877975463867, 46.73728942871094, -8.270687103271484, 88.84906768798828, 20.175926208496094, 107.69510650634766, 32.77320098876953, -2.771320343017578, 19.782323837280273, 11.505058288574219, 13.670848846435547, 50.66847229003906, 33.97376251220703, -0.02204132080078125, 42.32081604003906, 74.73503112792969, 48.63822555541992, -6.9776153564453125, 44.86102294921875, 30.146221160888672], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000312.npy"}
{"epoch": 0.4581497797356828, "step": 313, "batch_size": 64, "mean": 34.8759765625, "std": 42.337066650390625, "min": -59.60631561279297, "p10": -12.814888954162594, "median": 31.201683044433594, "p90": 89.65176239013672, "max": 148.2212677001953, "pos_frac": 0.796875, "sample": [93.74893951416016, -36.256195068359375, 43.482574462890625, 148.2212677001953, 16.163970947265625, 63.68739318847656, 59.330177307128906, 24.15882110595703, 100.53795623779297, -6.1392974853515625, 49.675472259521484, 12.507675170898438, 90.03448486328125, 36.58943176269531, 21.795391082763672, -14.034980773925781, 81.62246704101562, 11.974563598632812, 81.6582260131836, -35.23339080810547, 21.05768394470215, 54.55445861816406, -51.665096282958984, 24.234046936035156, 27.43746566772461, 60.204124450683594, 83.6214370727539, 13.433219909667969, 62.68171691894531, 104.92481994628906, 37.29014587402344, 46.121002197265625, -38.093143463134766, 69.61239624023438, 30.912681579589844, 7.3669891357421875, -9.968008041381836, 0.01300048828125, 33.20923614501953, 73.56285095214844, 31.490684509277344, -59.60631561279297, 19.309547424316406, 10.657768249511719, 64.62041473388672, 98.52751922607422, 10.495864868164062, -35.633148193359375, 34.824501037597656, 20.83475112915039, 88.75874328613281, 75.49319458007812, 59.87754821777344, 11.011021614074707, -7.802242279052734, 28.621109008789062, -0.125885009765625, 45.69792175292969, -2.4186410903930664, 27.952064514160156, -9.545791625976562, 39.003578186035156, 86.52993774414062, 99.4522705078125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000313.npy"}
{"epoch": 0.45961820851688695, "step": 314, "batch_size": 64, "mean": 36.76209259033203, "std": 41.16289520263672, "min": -37.732627868652344, "p10": -16.894852447509766, "median": 36.40312957763672, "p90": 85.01303482055664, "max": 141.13327026367188, "pos_frac": 0.796875, "sample": [81.02273559570312, -17.324745178222656, -30.193302154541016, -9.703079223632812, -30.682884216308594, 93.06380462646484, 42.31199645996094, -12.373886108398438, 49.730804443359375, -37.732627868652344, 42.294761657714844, 15.59716796875, -7.223472595214844, 122.51174926757812, 56.092872619628906, 9.907264709472656, 16.601226806640625, 33.8955192565918, 85.50587463378906, 141.13327026367188, 81.672607421875, 10.26633071899414, 39.175621032714844, 98.93537902832031, -2.34930419921875, 24.888748168945312, 22.34980010986328, -4.855371475219727, -32.50190734863281, 18.40058135986328, 61.500648498535156, 56.96543884277344, 51.89404296875, 76.3614501953125, 56.37569046020508, 29.303020477294922, 45.5666389465332, 75.43612670898438, 64.57426452636719, 139.86795043945312, 56.16706848144531, 35.579246520996094, 67.98165893554688, 83.86307525634766, -19.302059173583984, 46.30494689941406, 41.36638259887695, 62.32423400878906, -37.72296142578125, 30.38129425048828, 4.2813262939453125, 56.219871520996094, -15.891769409179688, 28.922439575195312, 40.754920959472656, 47.980926513671875, 9.706085205078125, 37.227012634277344, 20.025863647460938, 19.194564819335938, 27.697898864746094, 9.40349006652832, 32.68379211425781, 109.36184692382812], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000314.npy"}
{"epoch": 0.461086637298091, "step": 315, "batch_size": 64, "mean": 36.83607482910156, "std": 44.11724853515625, "min": -79.99275207519531, "p10": -27.11802597045898, "median": 39.10050392150879, "p90": 92.633837890625, "max": 131.29151916503906, "pos_frac": 0.796875, "sample": [9.20599365234375, 100.95872497558594, 30.00701141357422, 5.3384552001953125, -6.617822647094727, 7.872550964355469, -30.96471405029297, -35.16448974609375, -24.752166748046875, -4.690723419189453, 56.76631164550781, 3.7973251342773438, -35.933998107910156, 44.12105941772461, 68.48326110839844, 71.16314697265625, 60.822425842285156, 49.37191390991211, 54.149845123291016, -13.727806091308594, 37.122802734375, 25.705886840820312, 34.7061767578125, 68.67566680908203, 89.58358764648438, 19.70425033569336, 59.94297790527344, 40.26877975463867, 41.346954345703125, 27.274147033691406, 75.62362670898438, -35.68377685546875, 91.46707153320312, 6.168041229248047, -33.08491516113281, 110.56747436523438, 67.33786010742188, 47.63326644897461, 77.82771301269531, 1.2929611206054688, 8.392244338989258, 41.4991455078125, 131.29151916503906, 29.814584732055664, 87.80136108398438, 45.300384521484375, -28.13196563720703, -21.863155364990234, 28.427288055419922, 93.13388061523438, -3.3222923278808594, 25.13302993774414, 111.65518188476562, 55.74574661254883, 106.94554138183594, 23.85369873046875, 83.06090545654297, 29.869827270507812, 95.87777709960938, 68.25542449951172, -79.99275207519531, 76.41253662109375, 37.932228088378906, 46.729881286621094], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000315.npy"}
{"epoch": 0.46255506607929514, "step": 316, "batch_size": 64, "mean": 36.19043731689453, "std": 41.632266998291016, "min": -70.29933166503906, "p10": -4.706139373779297, "median": 34.97837829589844, "p90": 88.40820922851563, "max": 155.1497802734375, "pos_frac": 0.84375, "sample": [25.007705688476562, 10.288007736206055, 24.38927459716797, 43.01904296875, 54.09734344482422, 55.902130126953125, 25.76202392578125, 33.88153076171875, 100.65225219726562, 155.1497802734375, 29.87030029296875, -20.30133056640625, -2.148529052734375, 1.5392913818359375, -18.472583770751953, 14.945785522460938, 11.537853240966797, 5.902198791503906, 14.207561492919922, 76.28976440429688, 20.61638641357422, 36.075225830078125, 41.77827835083008, -4.7751312255859375, 67.44319152832031, 5.132118225097656, 25.471893310546875, 122.95050048828125, 48.356910705566406, 82.794921875, 11.168413162231445, 0.2141876220703125, 39.60636901855469, 36.7943115234375, 78.59030151367188, 39.958396911621094, 31.80487823486328, -70.29933166503906, 6.4191436767578125, 36.94889831542969, 90.5053939819336, 45.89945983886719, 22.430801391601562, 9.358329772949219, 58.01523208618164, -47.44150161743164, 55.04345703125, -4.545158386230469, 122.98870849609375, 88.89994812011719, 128.77325439453125, -0.95989990234375, -29.3970947265625, 52.596527099609375, 58.25421142578125, 8.295612335205078, 49.550537109375, 62.15425109863281, 3.0997886657714844, 78.40998077392578, 48.86142349243164, 37.78110122680664, -8.216545104980469, 87.26081848144531], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000316.npy"}
{"epoch": 0.46402349486049926, "step": 317, "batch_size": 64, "mean": 50.12665557861328, "std": 46.87739562988281, "min": -49.73982238769531, "p10": -5.130646514892578, "median": 39.15777778625488, "p90": 113.48311614990234, "max": 163.82199096679688, "pos_frac": 0.875, "sample": [54.72590637207031, 15.989555358886719, -40.35353088378906, -4.8999176025390625, 99.37515258789062, 41.75498962402344, 12.714065551757812, 36.777435302734375, 51.09001159667969, 35.88872528076172, 113.53376770019531, 2.7153472900390625, 34.82617950439453, -10.108177185058594, 87.31291961669922, 81.21694946289062, 6.911346435546875, 12.04974365234375, 113.36492919921875, 37.40008544921875, 25.881484985351562, 42.47948455810547, 32.90962219238281, 75.55978393554688, -49.73982238769531, 35.23394775390625, -20.437225341796875, 87.72559356689453, 163.82199096679688, 74.25613403320312, 44.39952087402344, 12.676536560058594, -22.242568969726562, 112.55496215820312, 81.71575927734375, 22.481231689453125, 157.4237060546875, 37.401611328125, 95.69371032714844, 74.29010009765625, 121.55461120605469, 13.528549194335938, 0.48737335205078125, 37.45676803588867, 110.30419921875, 12.03973388671875, 138.2044677734375, 40.858787536621094, 114.84142303466797, -6.760017395019531, 34.65871047973633, 57.2308349609375, 33.81232833862305, 59.362335205078125, 44.443939208984375, 27.950000762939453, 31.827606201171875, 105.24321746826172, 120.4774169921875, 79.5741195678711, 14.238739013671875, -5.229530334472656, 76.41495513916016, 79.21436309814453], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000317.npy"}
{"epoch": 0.4654919236417034, "step": 318, "batch_size": 64, "mean": 43.18408966064453, "std": 41.377349853515625, "min": -49.795127868652344, "p10": -3.9314901351928704, "median": 40.63339424133301, "p90": 97.46854705810549, "max": 154.75970458984375, "pos_frac": 0.84375, "sample": [16.852954864501953, 62.671142578125, 53.43363952636719, 37.622802734375, 5.209613800048828, 70.63578796386719, 50.45349884033203, 74.33198547363281, 26.038917541503906, 106.62572479248047, -4.231899261474609, 1.5453948974609375, 24.791648864746094, 67.1669921875, 131.8587646484375, 60.17642593383789, -4.427341461181641, 77.64032745361328, 62.16229248046875, 33.94322967529297, 74.06819915771484, 29.740509033203125, 7.1840667724609375, 76.41419982910156, 70.73936462402344, 53.55063247680664, -6.152098655700684, -28.110519409179688, 63.81555938720703, 14.693817138671875, 62.354209899902344, 110.73033142089844, 92.04261779785156, -1.8134841918945312, 36.05089569091797, 53.0257568359375, 154.75970458984375, -3.2305355072021484, 10.224807739257812, 106.3663101196289, 99.7939453125, 15.391830444335938, 6.23968505859375, 55.7261962890625, 77.06505584716797, 23.962173461914062, -3.2026901245117188, 32.92375183105469, -49.795127868652344, -7.949064254760742, 3.734844207763672, 48.93846893310547, 91.90904235839844, 42.98460388183594, 111.36963653564453, 38.28218460083008, 3.5418548583984375, 33.214115142822266, 61.15223693847656, 76.51187133789062, -44.997398376464844, 31.08570098876953, 16.44818115234375, 68.4943618774414], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000318.npy"}
{"epoch": 0.4669603524229075, "step": 319, "batch_size": 64, "mean": 45.22449493408203, "std": 44.756736755371094, "min": -66.19596099853516, "p10": -4.472754287719725, "median": 35.55034065246582, "p90": 100.3127685546875, "max": 165.30018615722656, "pos_frac": 0.859375, "sample": [46.17655944824219, 75.01982116699219, 9.311019897460938, 29.111385345458984, 130.82058715820312, 21.64366912841797, 165.30018615722656, 97.57486724853516, 93.90987396240234, 32.797454833984375, 9.438070297241211, 20.986923217773438, 54.140869140625, 22.357131958007812, 65.37085723876953, 11.794265747070312, 8.019817352294922, 89.62554931640625, 25.1414794921875, 16.97614288330078, 90.70159912109375, -5.434345245361328, 73.03936767578125, 1.2755241394042969, 29.930328369140625, -1.160348892211914, 46.05314254760742, 87.75970458984375, 10.298377990722656, 109.06558227539062, 54.34281921386719, 51.09357452392578, 27.45111083984375, 100.40614318847656, 26.06659698486328, -15.405693054199219, 69.80167388916016, 18.565534591674805, 11.62237548828125, 73.16543579101562, 105.79933166503906, 134.8124237060547, 52.14524841308594, -2.652416229248047, 80.9864501953125, 15.903404235839844, 59.55188751220703, 61.35691833496094, 94.38040924072266, 81.97038269042969, 58.455474853515625, 38.303226470947266, 24.098419189453125, 88.33293151855469, -66.19596099853516, 30.63824462890625, -11.412559509277344, -27.617151260375977, -5.252899169921875, 4.4069366455078125, 111.9892578125, 0.7185745239257812, 100.09489440917969, -20.60052490234375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000319.npy"}
{"epoch": 0.4684287812041116, "step": 320, "batch_size": 64, "mean": 39.59357452392578, "std": 41.39896011352539, "min": -67.21539306640625, "p10": -7.775055122375484, "median": 33.56629943847656, "p90": 92.79963684082034, "max": 188.05841064453125, "pos_frac": 0.84375, "sample": [-18.525279998779297, 67.8899917602539, 116.96522521972656, 11.855522155761719, 23.85394287109375, 71.20822143554688, 100.38218688964844, 51.71746826171875, 188.05841064453125, 58.59099197387695, 19.569297790527344, -11.919845581054688, -9.823028564453125, 20.031875610351562, 97.6024169921875, 85.46184539794922, 76.64407348632812, -26.087135314941406, -9.393749237060547, -1.1848220825195312, 14.258865356445312, 99.0489730834961, -3.9981021881103516, 44.012298583984375, 7.1808929443359375, 54.0343017578125, 60.72332763671875, 25.769424438476562, 10.225387573242188, 31.198993682861328, -2.0927276611328125, 21.947494506835938, 49.301025390625, 35.578407287597656, 60.134605407714844, 26.766780853271484, 67.56768798828125, 41.82424545288086, 57.38987731933594, 30.468231201171875, 31.690139770507812, -67.21539306640625, 24.80828094482422, 25.667938232421875, 46.97273254394531, 27.402389526367188, 68.05143737792969, 12.40167236328125, 1.9568500518798828, 24.614181518554688, 28.496421813964844, 35.44245910644531, 62.760215759277344, 69.59362030029297, 56.665489196777344, 132.04071044921875, 95.94440460205078, 6.2842254638671875, 36.69305419921875, 18.9072265625, 77.78300476074219, 61.55167007446289, 41.96869659423828, -30.730209350585938], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000320.npy"}
{"epoch": 0.4698972099853157, "step": 321, "batch_size": 64, "mean": 44.4124641418457, "std": 43.94962692260742, "min": -67.8741455078125, "p10": -12.414965438842774, "median": 44.3895378112793, "p90": 100.14708633422853, "max": 128.19223022460938, "pos_frac": 0.828125, "sample": [96.96337127685547, 46.81842041015625, 49.2955322265625, 62.011924743652344, 8.41098403930664, 46.952274322509766, 81.92144775390625, -3.863811492919922, 28.516494750976562, 45.4248046875, 93.99803161621094, 27.837020874023438, -10.118759155273438, -1.8186416625976562, -67.8741455078125, 25.027053833007812, 63.390647888183594, 124.56759643554688, 122.7548828125, -16.746505737304688, 32.92823791503906, 5.564247131347656, 17.675979614257812, 38.764312744140625, 123.5901870727539, 31.95848846435547, 64.2747573852539, -12.457206726074219, 23.38239288330078, 40.71289825439453, 46.80706787109375, 93.06166076660156, -19.561859130859375, 66.08378601074219, 40.62921905517578, 93.63668823242188, -51.53716278076172, -34.63203430175781, 101.51153564453125, 106.80867004394531, 38.01057434082031, 23.793548583984375, 14.77435302734375, 47.214752197265625, 21.04492950439453, 128.19223022460938, 52.08642578125, -12.316402435302734, 90.83732604980469, 69.79855346679688, 88.2322998046875, 72.86468505859375, 19.89781951904297, 88.63983917236328, 40.92585754394531, 56.60743713378906, 86.35584259033203, 43.354270935058594, 5.501956939697266, 29.321388244628906, -25.115501403808594, 67.86335754394531, 101.700927734375, 60.140716552734375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000321.npy"}
{"epoch": 0.4713656387665198, "step": 322, "batch_size": 64, "mean": 50.00999450683594, "std": 48.14103698730469, "min": -50.50304412841797, "p10": -8.266658782958983, "median": 43.05326843261719, "p90": 112.86385879516602, "max": 134.34585571289062, "pos_frac": 0.859375, "sample": [43.428550720214844, 83.45126342773438, 45.131072998046875, 128.48497009277344, 72.22779846191406, -50.50304412841797, -42.384979248046875, 104.40220642089844, 81.96671295166016, 26.316747665405273, -26.14842987060547, 6.583717346191406, 102.66702270507812, -8.659317016601562, 130.2015838623047, 106.51229858398438, 88.2518310546875, 36.15849304199219, 26.05823516845703, 30.684661865234375, 111.34664916992188, 109.78826904296875, -7.350456237792969, 116.53958892822266, -1.8891677856445312, 44.667938232421875, 4.963775634765625, 58.069313049316406, -29.155105590820312, 67.71005249023438, 67.61491394042969, 21.36109161376953, 120.33529663085938, 42.67798614501953, 87.64656066894531, 33.44467544555664, 23.690582275390625, 21.863067626953125, 15.707740783691406, 73.87057495117188, -24.201324462890625, 124.09725952148438, 63.35367202758789, 12.343769073486328, 92.0191421508789, 30.624835968017578, 79.33164978027344, -36.880828857421875, 18.250625610351562, 2.0792617797851562, 98.32006072998047, 96.36793518066406, 113.51409149169922, 84.34684753417969, 97.58733367919922, 25.03594970703125, 37.54492950439453, 17.374290466308594, 22.748933792114258, 94.04173278808594, 31.974761962890625, 9.636528015136719, 134.34585571289062, 9.047670364379883], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000322.npy"}
{"epoch": 0.47283406754772395, "step": 323, "batch_size": 64, "mean": 44.52705383300781, "std": 43.87985610961914, "min": -42.05519104003906, "p10": -4.586957550048822, "median": 36.82014083862305, "p90": 105.14348373413085, "max": 158.00804138183594, "pos_frac": 0.890625, "sample": [-15.891891479492188, 98.72614288330078, 45.31993103027344, 18.261062622070312, 27.643592834472656, 37.178001403808594, 103.1821517944336, 158.00804138183594, 14.307157516479492, 9.042182922363281, 50.31389617919922, 25.06877899169922, 1.1963577270507812, 77.42486572265625, 18.17546844482422, 124.00764465332031, 4.739837646484375, 66.81558990478516, 24.22547149658203, 97.61227416992188, 21.287933349609375, 4.687564849853516, 36.1097412109375, 51.05107116699219, 5.184696197509766, 104.82453918457031, 14.781330108642578, 43.654544830322266, 115.63438415527344, 12.441535949707031, 59.458702087402344, -24.361572265625, 28.234390258789062, 38.669715881347656, -42.05519104003906, 36.4622802734375, 8.580764770507812, 20.223464965820312, 147.71884155273438, 75.42402648925781, -13.421810150146484, -7.065521240234375, 53.61275100708008, 43.00608825683594, 59.70482635498047, 60.189300537109375, 22.35507583618164, 96.24181365966797, 12.070320129394531, 84.50975799560547, 110.40704345703125, 99.32344818115234, -26.295166015625, 105.2801742553711, 10.34796142578125, 68.7232894897461, 58.24015426635742, 116.54688262939453, 59.368080139160156, 24.005889892578125, 17.068462371826172, -7.909063339233398, 13.100215911865234, 46.95204162597656], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000323.npy"}
{"epoch": 0.47430249632892807, "step": 324, "batch_size": 64, "mean": 47.70296096801758, "std": 39.36297607421875, "min": -52.028228759765625, "p10": -5.4637107849121085, "median": 49.42839431762695, "p90": 93.86378097534183, "max": 150.55816650390625, "pos_frac": 0.84375, "sample": [50.09754180908203, 41.176658630371094, 56.81142807006836, 55.31600570678711, 18.694589614868164, 128.11102294921875, 73.00735473632812, 77.37791442871094, 47.967002868652344, 45.7994499206543, 97.50334167480469, -12.850204467773438, 150.55816650390625, 35.57270812988281, 44.473289489746094, 64.36474609375, 16.273536682128906, 77.0313720703125, 30.20459747314453, 80.52287292480469, 17.468399047851562, 6.016574859619141, -52.028228759765625, 40.224151611328125, 54.973907470703125, 97.22478485107422, 65.68789672851562, -11.72528076171875, 61.38928985595703, 85.96818542480469, 113.54019165039062, 72.38526153564453, 63.39867401123047, 66.22805786132812, 77.92756652832031, -28.148834228515625, 76.34962463378906, 99.43795776367188, 20.733457565307617, 47.18975830078125, 75.32804870605469, -1.9530868530273438, 80.21598815917969, 39.939208984375, -12.226287841796875, -4.848609924316406, 6.463188171386719, 10.716484069824219, -11.170089721679688, 106.48274230957031, 81.95677185058594, 73.4702377319336, 46.225486755371094, -2.06158447265625, 62.287017822265625, -5.727325439453125, 64.5404052734375, 3.5023536682128906, 53.78822326660156, 86.02143859863281, 48.759246826171875, 27.79364013671875, 26.811599731445312, 44.41946029663086], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000324.npy"}
{"epoch": 0.47577092511013214, "step": 325, "batch_size": 64, "mean": 46.04853820800781, "std": 43.3525505065918, "min": -50.04536437988281, "p10": -7.966883277893065, "median": 52.612876892089844, "p90": 97.9573486328125, "max": 138.48776245117188, "pos_frac": 0.84375, "sample": [-6.636320114135742, -50.04536437988281, 65.58992004394531, 1.4453392028808594, 85.91450500488281, 55.35121154785156, 64.91094970703125, -45.12474822998047, -18.376052856445312, 72.75096130371094, 56.52076721191406, 59.2376708984375, 58.145263671875, 39.2287483215332, 6.635589599609375, 10.473649978637695, 89.64512634277344, 94.03948974609375, -2.53448486328125, 98.77938842773438, 97.730224609375, 98.0546875, 60.33955383300781, 102.5531997680664, 43.29755401611328, 16.825756072998047, 108.07344055175781, 65.22602844238281, 27.545745849609375, 5.27609920501709, 138.48776245117188, 53.65370178222656, 88.0347900390625, 3.9159088134765625, 35.481163024902344, 20.165592193603516, 117.4574203491211, 31.53385353088379, -24.030487060546875, 78.23094177246094, 24.583290100097656, -0.16940689086914062, 64.15945434570312, -36.82667541503906, 80.034912109375, 19.108871459960938, 55.32855224609375, 30.33404541015625, -8.537124633789062, 28.67828369140625, 50.642311096191406, 4.1129150390625, 18.040355682373047, 96.14373016357422, 96.6259765625, 36.20276641845703, 86.48048400878906, 92.80247497558594, 74.67459869384766, 55.22972869873047, 51.572052001953125, -16.711151123046875, 19.768474578857422, 121.023193359375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000325.npy"}
{"epoch": 0.47723935389133626, "step": 326, "batch_size": 64, "mean": 41.62855529785156, "std": 45.054019927978516, "min": -67.01953887939453, "p10": -14.478147125244138, "median": 36.87641906738281, "p90": 99.70622711181643, "max": 150.12554931640625, "pos_frac": 0.828125, "sample": [23.973312377929688, 80.37257385253906, -67.01953887939453, -12.790279388427734, 4.134178161621094, -22.09942626953125, 50.72696304321289, 22.10595703125, 80.30567932128906, 27.315242767333984, 105.02239227294922, 51.02983856201172, 126.38243103027344, 24.95794677734375, -9.463153839111328, -19.329862594604492, 16.56903076171875, 8.723386764526367, 34.062076568603516, 138.50071716308594, 1.430389404296875, 58.68004608154297, 92.3896713256836, 84.31216430664062, 54.41606140136719, 15.901344299316406, 57.08258819580078, -4.8555755615234375, 17.38900375366211, 14.027519226074219, 89.74671936035156, 87.9397201538086, 94.33062744140625, 64.97752380371094, 42.75560760498047, 66.80847930908203, 36.988922119140625, 57.85298156738281, -15.201519012451172, 72.89517211914062, 7.1511688232421875, -28.74762725830078, 52.9143180847168, 63.295379638671875, 106.12492370605469, 34.27635955810547, 83.90913391113281, 36.763916015625, 40.551353454589844, 27.211772918701172, -49.459800720214844, 63.943206787109375, 36.47737121582031, 10.657272338867188, 120.3125228881836, 150.12554931640625, 27.155471801757812, 102.01005554199219, -3.519550323486328, -17.68828582763672, 67.78987884521484, 21.870101928710938, 54.726112365722656, 3.0300064086914062], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000326.npy"}
{"epoch": 0.4787077826725404, "step": 327, "batch_size": 64, "mean": 44.55260467529297, "std": 45.017696380615234, "min": -44.06468963623047, "p10": -12.11943435668945, "median": 44.90619468688965, "p90": 108.62780914306646, "max": 148.72613525390625, "pos_frac": 0.828125, "sample": [32.74690246582031, -2.4207420349121094, 91.57373809814453, 81.11613464355469, 12.31610107421875, 94.43305969238281, 8.77073860168457, 74.05792236328125, 51.467254638671875, 91.92118835449219, 68.97264099121094, 8.532073974609375, 1.669677734375, 114.71127319335938, 29.440082550048828, -36.728614807128906, 7.038612365722656, 70.11454010009766, 66.13041687011719, 60.42613220214844, -44.06468963623047, 55.15217590332031, 148.72613525390625, 68.6229248046875, 16.24696922302246, 38.04975891113281, 93.06275939941406, -23.974281311035156, 40.38404083251953, 25.3956298828125, -7.261299133300781, 77.80633544921875, 14.049026489257812, 42.93558883666992, 57.54679870605469, 93.29878997802734, -21.319072723388672, 34.088104248046875, 6.9994354248046875, 51.24839401245117, -20.5089111328125, 18.24781036376953, 44.01445007324219, 117.74803161621094, -8.922805786132812, 61.196388244628906, 34.94036102294922, 84.37916564941406, 120.59745788574219, 54.7342529296875, -7.146018981933594, -16.136154174804688, 121.66075897216797, 9.739242553710938, 73.82597351074219, 13.269254684448242, 47.671852111816406, 119.03659057617188, 63.05575180053711, -13.489418029785156, 45.79793930053711, 141.06988525390625, 2.5317916870117188, 50.77033996582031], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000327.npy"}
{"epoch": 0.4801762114537445, "step": 328, "batch_size": 64, "mean": 51.705078125, "std": 47.7775764465332, "min": -43.593902587890625, "p10": -2.816714286804199, "median": 40.869510650634766, "p90": 112.98736038208008, "max": 184.6364288330078, "pos_frac": 0.875, "sample": [113.76626586914062, 8.237030029296875, 25.867340087890625, 38.31768035888672, 88.4662857055664, 32.277496337890625, 31.162105560302734, 98.59197998046875, 64.261474609375, 33.086524963378906, 36.970909118652344, 0.7880859375, 77.1676025390625, 32.420379638671875, 5.002388000488281, 64.80807495117188, -24.86399269104004, -15.938690185546875, 32.03404235839844, 65.63845825195312, 24.696868896484375, 60.695709228515625, 88.44305419921875, 143.97958374023438, 10.943370819091797, 51.69340515136719, 19.7822265625, 11.986297607421875, 78.91165924072266, 135.7607421875, 41.18144226074219, 111.16991424560547, 109.32441711425781, 32.473731994628906, 6.307855606079102, -2.732086181640625, 24.536773681640625, 98.24549102783203, 15.37115478515625, -15.55950927734375, -2.8529834747314453, 139.05780029296875, -29.26683807373047, 97.53821563720703, 24.36782455444336, 184.6364288330078, -43.593902587890625, 46.33019256591797, 96.80419921875, 31.179096221923828, 115.83662414550781, 123.41848754882812, 81.82270050048828, -22.803314208984375, 25.635990142822266, 98.96437072753906, 102.31782531738281, 69.03370666503906, 80.71784973144531, 63.220703125, 66.35295104980469, 52.23992919921875, 12.336359024047852, 40.557579040527344], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000328.npy"}
{"epoch": 0.48164464023494863, "step": 329, "batch_size": 64, "mean": 43.669822692871094, "std": 42.9894905090332, "min": -53.95903778076172, "p10": -10.227544403076172, "median": 42.67643165588379, "p90": 102.85713958740236, "max": 133.52783203125, "pos_frac": 0.84375, "sample": [125.10503387451172, 23.319000244140625, 133.52783203125, 59.848052978515625, 21.528335571289062, 13.050865173339844, 43.64869689941406, 0.863006591796875, 110.046875, 92.66339874267578, 82.51664733886719, 72.35444641113281, 5.189144134521484, 32.373199462890625, 18.62451171875, 43.30961990356445, -11.942893981933594, 56.12947082519531, 22.70978546142578, 111.82965087890625, 52.002235412597656, 20.484878540039062, 80.77339935302734, 22.595684051513672, 71.20951080322266, -13.923301696777344, -37.02944564819336, 74.36692810058594, 60.58515167236328, 91.51528930664062, 46.19768524169922, -10.411270141601562, 62.697288513183594, 22.318641662597656, 104.09431457519531, 62.391658782958984, 82.67649841308594, 78.33950805664062, 107.50880432128906, 99.97039794921875, -4.20556640625, -15.57464599609375, 82.44013977050781, 62.014404296875, 0.541259765625, 13.738548278808594, 119.22990417480469, -2.62713623046875, 87.59423065185547, 33.578819274902344, 25.932086944580078, -9.798851013183594, -53.95903778076172, 30.771514892578125, 28.530136108398438, 42.043243408203125, 44.561859130859375, 10.226287841796875, 65.17437744140625, 26.031982421875, 19.986663818359375, 19.464174270629883, 79.65047454833984, -45.53489685058594], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000329.npy"}
{"epoch": 0.4831130690161527, "step": 330, "batch_size": 64, "mean": 47.70458984375, "std": 38.083518981933594, "min": -55.751251220703125, "p10": -1.9637969970703113, "median": 52.81452941894531, "p90": 90.744384765625, "max": 129.450927734375, "pos_frac": 0.875, "sample": [26.421005249023438, 43.44659423828125, -0.845672607421875, -4.196186065673828, 96.31502532958984, 17.570220947265625, 25.005741119384766, 66.32245635986328, 57.82013702392578, 31.49216079711914, 4.052021026611328, 70.1119384765625, 87.52178955078125, 79.6075668334961, 32.51416015625, 30.6573486328125, 79.0727767944336, 45.594940185546875, 21.382858276367188, 66.07084655761719, 56.146995544433594, -2.4429931640625, 13.261734008789062, -7.644798278808594, 96.88511657714844, 102.0259017944336, 26.116207122802734, -46.77260208129883, -8.96197509765625, 70.79473876953125, 90.90336608886719, 51.040260314941406, 67.28270721435547, 47.31133270263672, 65.23387908935547, 80.92816925048828, 98.44734954833984, 89.48648071289062, 73.94560241699219, 74.88021087646484, 2.7599525451660156, 84.60831451416016, 21.874149322509766, 82.57389831542969, -24.568443298339844, 114.03028869628906, 50.38923645019531, 57.828609466552734, 44.499786376953125, 59.381202697753906, 61.708648681640625, 90.37342834472656, 6.362037658691406, 48.2177734375, 54.563087463378906, 129.450927734375, 1.8885993957519531, 55.80836486816406, -55.751251220703125, 18.659744262695312, 64.99058532714844, 51.06597137451172, 76.45024108886719, 41.122955322265625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000330.npy"}
{"epoch": 0.4845814977973568, "step": 331, "batch_size": 64, "mean": 50.507652282714844, "std": 43.304420471191406, "min": -50.74574661254883, "p10": -1.4946268081665035, "median": 48.69760322570801, "p90": 104.88175659179687, "max": 159.08901977539062, "pos_frac": 0.875, "sample": [26.790287017822266, 125.85801696777344, 72.61714172363281, 34.608123779296875, 33.84089660644531, -25.552749633789062, 11.642452239990234, 33.346832275390625, 23.05432891845703, 94.56416320800781, 60.00623321533203, -1.1818733215332031, 79.20127868652344, 36.247581481933594, 104.25633239746094, -18.282432556152344, 159.08901977539062, 18.483882904052734, 42.68867492675781, 67.2952880859375, 55.04981994628906, 63.92295455932617, 130.25497436523438, 17.592967987060547, 31.669288635253906, -23.359107971191406, 70.685791015625, -50.74574661254883, 126.04244232177734, 92.31448364257812, 71.13821411132812, 41.404930114746094, 99.14181518554688, 1.7012405395507812, 32.71778869628906, 6.0455322265625, 25.1229248046875, -12.348588943481445, 26.709346771240234, 37.76919174194336, 42.18175506591797, 61.477806091308594, -1.6286640167236328, 118.72171783447266, 50.36107635498047, 83.08650207519531, 105.14979553222656, 94.85894775390625, 86.34381103515625, 60.79124450683594, 9.557846069335938, 60.351966857910156, 63.14482116699219, 42.69477844238281, -32.684242248535156, 48.727455139160156, 48.66775131225586, 41.81926727294922, 2.179290771484375, 107.4563217163086, 78.50321960449219, 76.72655487060547, 64.63191223144531, 97.96472930908203], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000331.npy"}
{"epoch": 0.48604992657856094, "step": 332, "batch_size": 64, "mean": 37.903587341308594, "std": 45.0799446105957, "min": -79.5777587890625, "p10": -13.923488616943352, "median": 33.49231719970703, "p90": 100.99140396118167, "max": 147.52597045898438, "pos_frac": 0.828125, "sample": [147.52597045898438, 16.876800537109375, 66.79779052734375, 49.94572448730469, 23.009010314941406, -51.019561767578125, 39.295799255371094, 91.28368377685547, 7.367034912109375, -7.485237121582031, 113.03707885742188, -7.419273376464844, -16.6827392578125, 105.15185546875, 16.802433013916016, 31.790298461914062, 55.39295196533203, 111.644775390625, 32.52696990966797, 7.941092491149902, 42.61834716796875, 28.62151336669922, 56.85364532470703, 13.513313293457031, 123.69313049316406, 78.08180236816406, 39.45451354980469, -79.5777587890625, 30.093063354492188, 88.13589477539062, 68.42539978027344, 71.706787109375, -40.66297149658203, -0.8109207153320312, 33.938232421875, 19.463825225830078, 48.92839050292969, 84.61898803710938, -1.5893535614013672, 37.88892364501953, 16.856094360351562, -31.776466369628906, 64.95588684082031, 49.220184326171875, 62.39365768432617, 34.77973937988281, 32.909271240234375, 133.67544555664062, 58.44889831542969, -30.07416534423828, 58.25489807128906, 17.702224731445312, 56.68086624145508, 33.04640197753906, 125.9286880493164, 4.638296127319336, 23.669570922851562, 1.7384262084960938, 5.959877967834473, 54.96027374267578, 56.94709777832031, 29.397369384765625, 23.841136932373047, -35.50138854980469], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000332.npy"}
{"epoch": 0.48751835535976507, "step": 333, "batch_size": 64, "mean": 49.67053985595703, "std": 59.22256088256836, "min": -147.71148681640625, "p10": -13.744326972961424, "median": 50.714426040649414, "p90": 121.67290496826175, "max": 210.28369140625, "pos_frac": 0.75, "sample": [29.449508666992188, -26.477981567382812, 97.59817504882812, -11.711257934570312, 78.25364685058594, 129.58682250976562, -58.00373077392578, 61.91863250732422, -11.421119689941406, -2.798917770385742, 50.385101318359375, -9.456107139587402, 55.862213134765625, 19.313812255859375, 59.41081237792969, 79.76559448242188, 33.0883674621582, 23.222442626953125, 86.01461791992188, -12.635154724121094, -1.8399124145507812, 111.11331176757812, 22.746994018554688, 91.5757064819336, 74.56324768066406, 65.06522369384766, 33.1993522644043, 17.529067993164062, 10.84942626953125, 112.96096801757812, 67.58975219726562, 10.180776596069336, 93.50927734375, 142.495361328125, 124.90098571777344, -9.751724243164062, -18.78883934020996, -14.219686508178711, 53.971519470214844, 48.93208312988281, 35.900146484375, 114.14071655273438, -15.760635375976562, 110.76127624511719, 39.95612335205078, 51.04375076293945, -0.6365966796875, 101.93606567382812, 210.28369140625, 128.8305206298828, 83.3487319946289, 92.18902587890625, 163.20391845703125, 10.93157958984375, 99.6192626953125, 75.36924743652344, 78.05335998535156, 19.85507583618164, -4.371318817138672, 19.30278778076172, -19.701431274414062, -147.71148681640625, 92.87887573242188, 131.5435333251953], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000333.npy"}
{"epoch": 0.4889867841409692, "step": 334, "batch_size": 64, "mean": 45.11811828613281, "std": 53.31507873535156, "min": -129.5018768310547, "p10": -7.286766815185546, "median": 47.13508987426758, "p90": 113.66325149536135, "max": 147.09951782226562, "pos_frac": 0.8125, "sample": [74.86039733886719, 29.131011962890625, 5.291942596435547, 45.6950569152832, 93.1882553100586, -6.57415771484375, -5.664634704589844, 63.685386657714844, 32.52783966064453, 43.685211181640625, -2.79571533203125, -7.592170715332031, 9.46307373046875, 11.707046508789062, 67.23185729980469, -1.273895263671875, 59.85378646850586, 116.36939239501953, -129.5018768310547, 72.84552001953125, 17.528106689453125, 19.657928466796875, 100.40008544921875, -20.811681747436523, 55.42205810546875, 55.59033203125, 134.1446533203125, 11.283859252929688, 12.207099914550781, 27.468063354492188, -1.063812255859375, 17.778278350830078, 71.17864990234375, 61.64133834838867, 26.555404663085938, 4.517578125, 76.14646911621094, 74.45071411132812, 50.84373474121094, 47.57140350341797, 124.24087524414062, -42.42329406738281, 107.34892272949219, -53.62823486328125, 81.1517562866211, 89.43995666503906, 5.6622467041015625, 37.92193603515625, 103.43301391601562, 84.81427001953125, 81.19548034667969, 64.15716552734375, 120.95082092285156, 74.56595611572266, 26.307754516601562, 147.09951782226562, 46.69877624511719, 92.51387786865234, -28.193115234375, 140.72520446777344, 0.2215423583984375, 139.10931396484375, 90.31383514404297, -60.71149444580078], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000334.npy"}
{"epoch": 0.49045521292217326, "step": 335, "batch_size": 64, "mean": 42.2041130065918, "std": 49.56914520263672, "min": -57.89805603027344, "p10": -24.299842071533202, "median": 37.97121047973633, "p90": 99.27430648803711, "max": 179.60630798339844, "pos_frac": 0.796875, "sample": [-57.184478759765625, 76.3494873046875, 33.08720397949219, 67.01924133300781, 74.5191879272461, 16.056873321533203, 26.850189208984375, 37.69679260253906, 98.82945251464844, -46.034889221191406, -7.6492156982421875, 24.669036865234375, 38.245628356933594, -33.478179931640625, 74.78033447265625, 79.33120727539062, 99.46495819091797, 93.8463134765625, 4.372714996337891, -10.713310241699219, 63.18840789794922, 38.83266067504883, 47.74414825439453, -23.08562469482422, 47.807037353515625, 25.526824951171875, 107.5246810913086, -57.89805603027344, -39.23993682861328, 55.55390930175781, 68.84567260742188, 34.04206848144531, 50.32112121582031, 19.48040771484375, 138.03326416015625, 36.206050872802734, 179.60630798339844, 44.45030975341797, 87.75244903564453, -36.50999450683594, 26.184967041015625, 66.37605285644531, 35.3758544921875, 14.26800537109375, 54.838844299316406, -2.797893524169922, 36.92892074584961, 48.07493591308594, -24.820220947265625, 0.28558349609375, 72.89078521728516, 22.125484466552734, 83.93608093261719, -13.563186645507812, -16.207962036132812, 130.3261260986328, 110.21507263183594, 127.58638000488281, 34.33912658691406, 94.81049346923828, 15.780227661132812, 29.09429931640625, 94.57624816894531, 82.19895935058594], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000335.npy"}
{"epoch": 0.4919236417033774, "step": 336, "batch_size": 64, "mean": 50.96186828613281, "std": 52.45704650878906, "min": -96.89781951904297, "p10": -2.930039215087889, "median": 47.584415435791016, "p90": 113.15190505981447, "max": 255.37066650390625, "pos_frac": 0.859375, "sample": [77.30380249023438, 20.012367248535156, 56.94879150390625, 74.05523681640625, 77.81785583496094, -96.89781951904297, 38.988670349121094, -3.5563125610351562, -1.4687347412109375, 124.36557006835938, -37.375736236572266, 35.66551208496094, 46.60198974609375, 5.18194580078125, 255.37066650390625, 47.214149475097656, 83.52969360351562, 63.16170883178711, 77.2231674194336, 8.387901306152344, 11.31991195678711, 109.29817199707031, 114.80350494384766, 66.03929901123047, 37.336090087890625, 50.639892578125, 58.624298095703125, 98.63473510742188, -0.5599212646484375, 19.6754150390625, 20.889911651611328, 90.23881530761719, 47.110076904296875, -33.071388244628906, 88.82962799072266, -16.58960723876953, -18.225845336914062, 37.95848846435547, 76.5110855102539, 75.92098999023438, 143.84490966796875, 69.74264526367188, 87.61450958251953, 10.530071258544922, 144.90673828125, 47.954681396484375, 11.110160827636719, 28.167861938476562, 36.902793884277344, 80.88971710205078, 92.70452117919922, 52.82819366455078, 37.74267578125, 78.69488525390625, 118.49342346191406, -22.050933837890625, 42.423858642578125, 59.754241943359375, 13.577106475830078, 18.268295288085938, 1.50054931640625, 118.64846801757812, 71.94395446777344, 27.452316284179688], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000336.npy"}
{"epoch": 0.4933920704845815, "step": 337, "batch_size": 64, "mean": 48.112449645996094, "std": 55.161678314208984, "min": -68.34254455566406, "p10": -26.699817657470696, "median": 43.713829040527344, "p90": 120.66911621093752, "max": 175.20846557617188, "pos_frac": 0.84375, "sample": [77.45755004882812, 138.30960083007812, -68.34254455566406, -28.944198608398438, 79.67286682128906, 121.73431396484375, 56.75736999511719, 64.90717315673828, 110.04889678955078, 82.67259216308594, 16.35211181640625, 77.83265686035156, 29.05895233154297, -37.09590148925781, 105.73994445800781, 9.049728393554688, 113.34416198730469, 45.21928405761719, 29.856857299804688, 128.77987670898438, 83.33357238769531, 15.915214538574219, 52.26421356201172, 0.8319473266601562, 92.41696166992188, 22.425689697265625, 118.18365478515625, 95.35609436035156, 2.792266845703125, 21.783348083496094, 31.717926025390625, 37.71095275878906, -61.517120361328125, -57.97008514404297, 82.58415222167969, -21.462928771972656, 38.69451141357422, 70.1723403930664, -13.94171142578125, 65.33781433105469, 29.404281616210938, 175.20846557617188, 13.612052917480469, 39.07264709472656, -31.07917022705078, 86.73579406738281, 131.3704833984375, -7.249748229980469, 64.1741943359375, 17.900131225585938, 103.2761001586914, 26.186813354492188, 26.5257568359375, 136.65048217773438, -60.00975036621094, 53.068145751953125, 14.028060913085938, 75.41353607177734, 42.2083740234375, 139.3257293701172, 94.11871337890625, 2.4091415405273438, 2.5396041870117188, 75.26661682128906], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000337.npy"}
{"epoch": 0.4948604992657856, "step": 338, "batch_size": 64, "mean": 45.65751647949219, "std": 46.82432556152344, "min": -71.65596008300781, "p10": -13.506916236877437, "median": 41.57236862182617, "p90": 101.73042221069336, "max": 139.68663024902344, "pos_frac": 0.8125, "sample": [-51.987548828125, 96.61774444580078, -3.1929473876953125, -0.9070472717285156, 34.637664794921875, 87.70983123779297, 13.035652160644531, 63.04058837890625, 101.04801940917969, 90.90591430664062, 117.94154357910156, 28.319995880126953, 27.91277313232422, 95.13896179199219, -48.370025634765625, 31.819183349609375, 78.86263275146484, 80.15992736816406, 139.68663024902344, 50.02350616455078, 17.37631607055664, 39.796592712402344, 37.614234924316406, -22.7935791015625, -39.772552490234375, 86.78707122802734, 58.07436752319336, 2.8428688049316406, 48.88816833496094, -6.0626678466796875, 54.13890075683594, 39.18171691894531, 99.18377685546875, -7.540910720825195, 34.36370849609375, 91.02549743652344, -15.678131103515625, 8.738954544067383, 102.88752746582031, 43.291831970214844, 107.9602279663086, -30.939983367919922, 38.18415069580078, -71.65596008300781, 67.10795593261719, 103.07799530029297, 100.78518676757812, 22.344482421875, 56.61709976196289, 56.22761535644531, 73.70028686523438, -8.44074821472168, 102.02288055419922, 51.58244323730469, 85.38026428222656, 97.87379455566406, 110.36181640625, 20.129600524902344, 32.47270202636719, 90.24293518066406, 39.8529052734375, 39.351322174072266, 17.933910369873047, 15.161445617675781], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000338.npy"}
{"epoch": 0.49632892804698975, "step": 339, "batch_size": 64, "mean": 50.0194206237793, "std": 55.875335693359375, "min": -98.520263671875, "p10": -27.044472503662107, "median": 42.365936279296875, "p90": 120.35612564086915, "max": 173.1795654296875, "pos_frac": 0.8125, "sample": [64.40478515625, 67.47980499267578, 95.87223052978516, 82.25177001953125, 5.820732116699219, 48.60498046875, 40.637786865234375, -29.64132308959961, 99.47270965576172, -19.342803955078125, 173.1795654296875, 67.67190551757812, 54.31233215332031, -34.797088623046875, 43.55412292480469, 62.03520584106445, -39.835357666015625, -8.047203063964844, -24.645309448242188, 122.12660217285156, 115.212890625, 36.61380386352539, -20.44439697265625, 72.22195434570312, 29.8382568359375, 86.59611511230469, 64.14170837402344, -1.5562362670898438, 136.39752197265625, 116.84601593017578, 16.993064880371094, 67.04732513427734, -98.520263671875, 38.54603576660156, 121.86045837402344, 20.780990600585938, 29.872581481933594, 0.46343231201171875, 39.9066162109375, 27.380157470703125, 41.17774963378906, 39.77394104003906, 11.996749877929688, 87.97174072265625, 101.14584350585938, 160.6756591796875, -28.07268524169922, 115.02438354492188, 33.493202209472656, 14.477607727050781, 65.86148071289062, 151.20921325683594, 137.5039825439453, -32.99272918701172, 87.03218078613281, 93.45628356933594, 31.475555419921875, 84.92366027832031, 37.88653564453125, -45.22911071777344, 113.45648193359375, 86.10896301269531, 24.512496948242188, 17.060142517089844], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000339.npy"}
{"epoch": 0.4977973568281938, "step": 340, "batch_size": 64, "mean": 45.136905670166016, "std": 51.16151809692383, "min": -101.6008529663086, "p10": -11.025224494934081, "median": 46.59071731567383, "p90": 102.45276336669924, "max": 236.38548278808594, "pos_frac": 0.828125, "sample": [78.60881042480469, 83.4039535522461, 41.89314651489258, 31.4503173828125, -79.01512145996094, 74.6043472290039, 64.10507202148438, 90.75067138671875, 41.62150573730469, 56.754966735839844, 67.10060119628906, 236.38548278808594, -2.5210113525390625, -11.262052536010742, 114.03775024414062, 48.91039276123047, 20.701522827148438, -13.917900085449219, 109.73222351074219, 1.3921127319335938, 57.29454803466797, 46.848777770996094, -101.6008529663086, 61.89320373535156, 76.48338317871094, 41.856719970703125, -16.53992462158203, 28.783523559570312, 51.726924896240234, 31.562767028808594, 82.72779846191406, 46.33265686035156, 55.09596252441406, 54.92949676513672, 19.830352783203125, 12.070850372314453, 92.41594696044922, 6.678565979003906, -7.810832977294922, 95.80279541015625, 9.808197021484375, 64.48977661132812, 49.4624137878418, 17.307640075683594, 1.5524749755859375, 105.30274963378906, -10.472625732421875, 40.4360237121582, 30.333232879638672, 31.428131103515625, 9.290603637695312, 132.7274932861328, 5.6234283447265625, 54.290008544921875, 61.47888946533203, 108.45223999023438, -26.111934661865234, 91.59001922607422, 91.66610717773438, 116.59477233886719, -2.3103561401367188, 86.61747741699219, -17.537033081054688, 45.62261199951172], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000340.npy"}
{"epoch": 0.49926578560939794, "step": 341, "batch_size": 64, "mean": 43.65577697753906, "std": 54.095924377441406, "min": -76.89141845703125, "p10": -24.45951538085937, "median": 41.027069091796875, "p90": 110.75011978149415, "max": 165.732177734375, "pos_frac": 0.796875, "sample": [165.732177734375, 21.014892578125, -59.286354064941406, 21.30352020263672, 73.20603942871094, 27.972503662109375, -17.38616943359375, 52.610931396484375, 144.00445556640625, 36.22598648071289, 48.1285514831543, 50.004756927490234, 32.748165130615234, -26.650115966796875, 10.747661590576172, 5.264362335205078, -11.482711791992188, 136.4708709716797, 0.050006866455078125, -19.05364990234375, -4.963315963745117, 102.04092407226562, 89.5693588256836, 31.446189880371094, 79.56826782226562, 112.21826171875, 11.922767639160156, -29.506179809570312, -19.348114013671875, 43.95855712890625, 54.02659606933594, 120.85833740234375, 148.8218231201172, 116.44046783447266, 96.05778503417969, 107.32445526123047, 86.09119415283203, -37.46284484863281, 9.585517883300781, 14.988147735595703, 33.01991271972656, -76.89141845703125, 88.29154968261719, -3.4671096801757812, 18.163818359375, 38.0955810546875, 20.34442138671875, 58.73919677734375, -72.29150390625, 73.70348358154297, 90.13523864746094, 18.344219207763672, -46.22425842285156, 72.34674072265625, 89.5203857421875, 37.29557800292969, 68.33577728271484, 26.44452667236328, 68.92115783691406, 60.753501892089844, 64.03904724121094, 90.92613220214844, 103.85112762451172, 46.30846405029297], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000341.npy"}
{"epoch": 0.5007342143906021, "step": 342, "batch_size": 64, "mean": 61.41213607788086, "std": 54.315086364746094, "min": -84.82722473144531, "p10": -5.056172180175781, "median": 63.15629577636719, "p90": 128.0539993286133, "max": 150.64630126953125, "pos_frac": 0.875, "sample": [79.96546173095703, 61.58551025390625, 87.88470458984375, 135.91259765625, 21.1165828704834, 103.11582946777344, 71.96234130859375, 46.788360595703125, 40.99284362792969, -12.141654968261719, 68.58858489990234, 32.78962707519531, 124.08897399902344, 135.2110595703125, 99.29643249511719, 42.26628112792969, 77.33274841308594, 119.1165771484375, 116.99833679199219, -5.0230712890625, 49.647918701171875, -5.0703582763671875, 71.66561889648438, 2.692626953125, 41.95808792114258, -44.40806579589844, 53.379608154296875, 64.8692626953125, 100.47737884521484, -84.82722473144531, -5.912834167480469, 118.08599090576172, 120.43626403808594, 17.066707611083984, 32.82585144042969, 23.62134552001953, 82.08052062988281, 142.40208435058594, 120.41455841064453, 104.0209732055664, 3.92047119140625, 43.63337326049805, 59.966339111328125, 11.120086669921875, 150.3016357421875, 150.64630126953125, 58.791015625, 51.05317687988281, 106.96854400634766, 21.699447631835938, 81.6225357055664, 137.72378540039062, 119.17588806152344, 42.373329162597656, 79.9053955078125, 105.17522430419922, 43.44746398925781, 13.328193664550781, -52.24150848388672, -65.07746124267578, 9.784927368164062, 64.72708129882812, 129.7532958984375, 109.30366516113281], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000342.npy"}
{"epoch": 0.5022026431718062, "step": 343, "batch_size": 64, "mean": 51.96100616455078, "std": 51.64067077636719, "min": -56.70075988769531, "p10": -18.747718811035153, "median": 51.779693603515625, "p90": 114.2436981201172, "max": 181.69012451171875, "pos_frac": 0.859375, "sample": [10.818557739257812, -35.37073516845703, -7.7738800048828125, -28.912506103515625, 60.20102310180664, 15.182159423828125, 96.53828430175781, 24.079933166503906, 63.63566589355469, 54.18603515625, 37.72401428222656, 130.64736938476562, 20.928009033203125, 125.93991088867188, 96.9211654663086, -29.36652374267578, 12.41006088256836, 60.21339416503906, 106.85869598388672, 45.040771484375, 90.68112182617188, 47.89653015136719, -40.67169189453125, 101.4143295288086, 112.44990539550781, 61.678993225097656, 68.69178009033203, 8.05560302734375, 86.41337585449219, 111.68668365478516, 98.68961334228516, 30.158540725708008, -20.84210205078125, 52.27351379394531, 100.29157257080078, 51.28587341308594, 2.2211456298828125, -35.54243087768555, 23.238332748413086, 16.693832397460938, 18.02471923828125, 8.34564208984375, 67.72758483886719, 39.48280334472656, 115.01246643066406, 65.43128967285156, 65.34454345703125, 98.76099395751953, 48.932674407958984, 23.540313720703125, 121.45408630371094, 0.44635009765625, 110.86247253417969, 19.987293243408203, 86.3539810180664, 127.99105834960938, 32.19139862060547, 96.2465591430664, 110.95472717285156, -13.860824584960938, 10.047691345214844, 120.5714111328125, 181.69012451171875, -56.70075988769531], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000343.npy"}
{"epoch": 0.5036710719530103, "step": 344, "batch_size": 64, "mean": 50.76179504394531, "std": 56.002323150634766, "min": -73.34451293945312, "p10": -11.08728847503662, "median": 41.63364791870117, "p90": 125.40261230468754, "max": 193.79074096679688, "pos_frac": 0.875, "sample": [54.749122619628906, 0.21693801879882812, -44.227935791015625, -11.146018981933594, 64.17219543457031, 25.037734985351562, 87.91664123535156, 43.29743194580078, 35.94102478027344, 87.0819091796875, 33.500492095947266, 26.042930603027344, -65.58358764648438, 93.07939910888672, 66.46823120117188, 95.74173736572266, 143.44146728515625, 82.89188385009766, 20.499771118164062, 54.92023468017578, 41.65456008911133, -10.950250625610352, 11.541905403137207, 14.299797058105469, 25.484329223632812, 193.79074096679688, 35.33039855957031, 93.38864135742188, 35.545509338378906, 43.1156005859375, 18.955734252929688, 94.52133178710938, 10.077110290527344, 34.30271911621094, 129.22918701171875, 41.612735748291016, 116.47393798828125, 8.317131042480469, 10.943817138671875, 8.17668342590332, 148.0645751953125, 102.83882904052734, -64.01460266113281, 99.50672149658203, -29.798137664794922, 145.75637817382812, 94.9404296875, 96.61701965332031, 35.7010383605957, 172.38717651367188, 0.20911216735839844, -19.5743408203125, 25.519197463989258, 83.9305419921875, 88.87571716308594, 153.9663543701172, -73.34451293945312, 33.343345642089844, 60.34215545654297, 39.42658996582031, 51.08536911010742, 44.6750373840332, 85.49581909179688, 22.95151138305664], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000344.npy"}
{"epoch": 0.5051395007342144, "step": 345, "batch_size": 64, "mean": 56.23204803466797, "std": 54.72203826904297, "min": -55.7286376953125, "p10": -3.0132068634033202, "median": 51.3322696685791, "p90": 130.78189544677736, "max": 201.7724151611328, "pos_frac": 0.875, "sample": [35.263084411621094, 119.82373046875, -35.869537353515625, -31.74175262451172, 26.448253631591797, 25.8309326171875, 201.7724151611328, 134.59332275390625, 137.7415771484375, -5.36181640625, 25.2861270904541, 89.336669921875, 103.26141357421875, 40.954376220703125, 123.59663391113281, 22.397598266601562, 12.360794067382812, 86.00028228759766, 105.79973602294922, 96.83428955078125, 98.38264465332031, 88.54679870605469, 48.31844711303711, 92.00456237792969, 158.7930908203125, 62.648345947265625, 22.409011840820312, 29.780868530273438, -2.730602264404297, 19.53717041015625, -3.7386932373046875, 96.97886657714844, 77.48822021484375, 20.691654205322266, 112.71670532226562, 82.74537658691406, -3.1343231201171875, 7.813407897949219, 19.298171997070312, 114.66020965576172, 2.9442825317382812, 141.03472900390625, 19.354873657226562, 58.65904998779297, 87.19134521484375, 54.346092224121094, 12.566879272460938, 56.18629455566406, 55.91468811035156, 127.24986267089844, 58.24895477294922, 55.4891357421875, 4.118867874145508, 170.2752685546875, 3.3146629333496094, -55.7286376953125, 2.749309539794922, -32.935638427734375, 30.57251739501953, 72.183349609375, 40.75539016723633, 132.29562377929688, 43.980682373046875, 0.5453948974609375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000345.npy"}
{"epoch": 0.5066079295154186, "step": 346, "batch_size": 64, "mean": 46.195072174072266, "std": 59.02115249633789, "min": -99.126220703125, "p10": -11.327004241943357, "median": 48.29217529296875, "p90": 114.40014877319337, "max": 209.79051208496094, "pos_frac": 0.796875, "sample": [144.53488159179688, 107.11972045898438, -8.701107025146484, 135.80133056640625, 89.13545989990234, 92.57281494140625, 48.84393310546875, 121.5495834350586, 38.424705505371094, 65.56705474853516, 84.32337951660156, 63.462196350097656, 60.49238586425781, 13.65313720703125, 21.734397888183594, 66.96728515625, 55.407981872558594, -61.697174072265625, 27.90692901611328, -8.684270858764648, -0.07767486572265625, -48.32384490966797, -99.126220703125, 115.00345611572266, -66.44586944580078, 84.88967895507812, 44.81317901611328, -6.4356842041015625, 93.68502807617188, 57.09327697753906, 66.38225555419922, 65.9339828491211, -90.65603637695312, 10.701034545898438, 61.63929748535156, 49.94355010986328, 38.38459396362305, 122.97052001953125, 11.300537109375, 33.44599914550781, 35.918861389160156, 2.2781829833984375, 92.6877670288086, 181.92344665527344, -3.72088623046875, 47.74041748046875, 19.91541290283203, 0.6841812133789062, 105.78347778320312, 112.992431640625, 102.3992919921875, -12.452388763427734, 85.28656005859375, -3.2614784240722656, 19.188823699951172, 57.681365966796875, 20.523914337158203, 82.08074188232422, 209.79051208496094, 8.9281005859375, 21.2872314453125, -21.78759002685547, 12.095100402832031, 74.98558044433594], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000346.npy"}
{"epoch": 0.5080763582966226, "step": 347, "batch_size": 64, "mean": 43.91358184814453, "std": 58.35564041137695, "min": -107.29408264160156, "p10": -23.444079399108887, "median": 43.13309097290039, "p90": 117.58977737426758, "max": 184.75253295898438, "pos_frac": 0.75, "sample": [-107.29408264160156, -18.47799301147461, 46.196258544921875, 65.76048278808594, 6.107173919677734, -6.709327697753906, 48.17647171020508, 36.02458190917969, -25.580703735351562, 117.60492706298828, 129.67002868652344, 3.2381515502929688, -22.898452758789062, 54.131103515625, 13.026771545410156, 41.369606018066406, -68.52611541748047, -12.53143310546875, 184.75253295898438, 17.04357147216797, 118.53338623046875, 18.22901153564453, 20.188003540039062, 31.532066345214844, 54.88172912597656, 60.78704071044922, 168.24832153320312, 74.45654296875, 52.310482025146484, 117.23346710205078, 30.800575256347656, -16.2423095703125, 117.55442810058594, 86.24152374267578, -11.77375602722168, 102.01606750488281, 114.313720703125, 38.53731918334961, 22.526351928710938, -12.930816650390625, 74.4930648803711, -40.77204132080078, 95.72343444824219, 63.431636810302734, 104.06957244873047, 69.37370300292969, 44.896575927734375, 69.30517578125, 75.17735290527344, 14.341066360473633, 52.4364013671875, 82.60917663574219, 107.29261016845703, 140.8112335205078, -43.03385543823242, 57.50860595703125, -40.18488311767578, -23.677919387817383, 152.1661834716797, 38.46074295043945, 27.5374755859375, 1.58001708984375, -0.8818740844726562, -0.7209548950195312], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000347.npy"}
{"epoch": 0.5095447870778267, "step": 348, "batch_size": 64, "mean": 47.98047637939453, "std": 54.09994125366211, "min": -78.11915588378906, "p10": -11.263417816162109, "median": 48.18965721130371, "p90": 115.26260070800781, "max": 171.70095825195312, "pos_frac": 0.84375, "sample": [-11.377769470214844, -2.8590545654296875, 99.83270263671875, 7.1760711669921875, 136.59071350097656, 49.0723876953125, 23.377609252929688, 108.92176818847656, 51.35395050048828, 107.0452880859375, 6.6033477783203125, -6.5491943359375, 48.060546875, 61.44556427001953, 13.566940307617188, 31.442733764648438, 0.9985389709472656, 111.48579406738281, 36.9219856262207, 54.953941345214844, 54.32281494140625, 35.51097106933594, 121.71696472167969, 54.075042724609375, 93.53596496582031, 63.86772155761719, 21.30573272705078, 118.18183898925781, 18.14263343811035, 53.612457275390625, 40.492095947265625, 27.662479400634766, 2.213623046875, 111.26451110839844, 124.00534057617188, 63.881561279296875, 7.701622009277344, 12.177505493164062, 94.81089782714844, 64.97882843017578, -78.11915588378906, 21.051254272460938, 43.43916320800781, 127.78262329101562, -21.046798706054688, -49.705291748046875, 108.49532318115234, 9.70831298828125, 40.21343994140625, 109.254638671875, 98.35334014892578, 48.31876754760742, 171.70095825195312, 55.18871307373047, 102.30132293701172, -10.996597290039062, 11.719390869140625, 82.32044219970703, 18.698837280273438, 115.20819091796875, 115.28591918945312, -68.38609313964844, -75.65583038330078, -15.904764175415039], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000348.npy"}
{"epoch": 0.5110132158590308, "step": 349, "batch_size": 64, "mean": 45.96704864501953, "std": 55.00606918334961, "min": -116.87010192871094, "p10": -18.14445571899414, "median": 41.60102844238281, "p90": 119.80840301513675, "max": 168.26866149902344, "pos_frac": 0.8125, "sample": [5.263458251953125, 27.49664306640625, -43.417083740234375, -3.9132652282714844, 69.71672821044922, 80.12765502929688, 148.27838134765625, -0.26873016357421875, 66.92706298828125, 79.51228332519531, 33.07609176635742, 62.73377227783203, 37.1800537109375, 60.50267791748047, 38.392791748046875, 5.502969741821289, 7.418575286865234, 45.16105651855469, 48.761322021484375, 33.315101623535156, 168.26866149902344, 99.64305114746094, -30.523666381835938, 21.15011215209961, 123.69660186767578, -18.265541076660156, 110.55500793457031, 72.6824722290039, 62.80995559692383, -31.810256958007812, 166.42022705078125, 15.261909484863281, -42.22722625732422, 20.11507797241211, 37.36915588378906, -23.971664428710938, -116.87010192871094, 21.907073974609375, 102.65562438964844, 132.22354125976562, 53.44197082519531, 3.6852569580078125, 12.449607849121094, 29.227615356445312, 111.78557586669922, 88.89009857177734, 53.129852294921875, 93.54252624511719, 34.790367126464844, 66.05255126953125, 14.177360534667969, 152.15184020996094, 62.93910598754883, -3.45904541015625, 58.98643493652344, -17.861923217773438, 3.9104957580566406, 123.24675750732422, 100.14818572998047, 53.48847961425781, -8.01751708984375, 17.672819137573242, 44.80926513671875, 99.84573364257812], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000349.npy"}
{"epoch": 0.5124816446402349, "step": 350, "batch_size": 64, "mean": 38.87583923339844, "std": 47.71432113647461, "min": -50.33867645263672, "p10": -23.680993652343748, "median": 34.9733829498291, "p90": 108.89687118530276, "max": 157.20372009277344, "pos_frac": 0.78125, "sample": [63.90266418457031, -47.923301696777344, -1.9219589233398438, 34.16556167602539, 88.38632202148438, 52.249900817871094, 48.349273681640625, 7.79119873046875, -18.728378295898438, 40.040992736816406, 16.042957305908203, 129.92276000976562, 10.56500244140625, 53.06026840209961, 53.729400634765625, -23.164474487304688, 17.553298950195312, -26.676162719726562, 33.41621398925781, 111.20677185058594, 99.28578186035156, 70.90095520019531, 28.013412475585938, 37.09895324707031, -23.902359008789062, 120.13832092285156, 94.27273559570312, -36.89830017089844, -27.750717163085938, 62.58457946777344, 31.94017791748047, 129.8490447998047, 35.78120422363281, 84.44412231445312, 21.336318969726562, 103.5071029663086, -50.33867645263672, 157.20372009277344, 47.874412536621094, 28.118896484375, -3.3326416015625, 56.14091491699219, 68.48725891113281, 68.82046508789062, 90.68731689453125, 125.12985229492188, 13.11407470703125, 18.278602600097656, 14.823013305664062, 77.12496948242188, 10.624542236328125, 116.06114196777344, -11.024818420410156, 39.0426025390625, 15.052717208862305, -11.176162719726562, 1.5706024169921875, -28.156036376953125, 24.976852416992188, -2.9397354125976562, 44.66547393798828, 6.5733489990234375, 38.64915466308594, 59.43229675292969], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000350.npy"}
{"epoch": 0.5139500734214391, "step": 351, "batch_size": 64, "mean": 56.837730407714844, "std": 58.14760208129883, "min": -69.48434448242188, "p10": -29.882355499267575, "median": 54.300785064697266, "p90": 125.91506805419924, "max": 191.0748291015625, "pos_frac": 0.828125, "sample": [53.59790802001953, 37.322998046875, 49.68865966796875, 160.90484619140625, 141.90982055664062, 15.408061981201172, 52.899681091308594, 40.20610046386719, 108.24113464355469, 148.90721130371094, 85.62802124023438, 55.003662109375, 47.42127990722656, 12.236221313476562, 102.50444030761719, 49.41621398925781, -55.88043212890625, 89.76014709472656, 75.3326187133789, 100.43916320800781, -60.866180419921875, 160.41757202148438, 79.989990234375, 87.11415100097656, -28.63793182373047, 93.45026397705078, 64.30010223388672, 104.67704772949219, -28.457321166992188, 47.84587097167969, 49.826812744140625, 121.75201416015625, -43.6701545715332, 73.18391418457031, 38.166229248046875, 93.72163391113281, 20.777572631835938, 127.69923400878906, 26.123214721679688, -2.3491363525390625, 13.017204284667969, -1.6264877319335938, 105.70870971679688, 145.63697814941406, -30.415679931640625, 26.24394416809082, 65.30262756347656, -32.242401123046875, 70.20586395263672, -55.078765869140625, 21.744417190551758, 72.97640991210938, -69.48434448242188, 50.50165557861328, 110.71994018554688, 39.17591857910156, 98.56718444824219, 191.0748291015625, 83.407470703125, 30.912033081054688, 103.53689575195312, 106.8541259765625, 30.028629302978516, 64.8349380493164], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000351.npy"}
{"epoch": 0.5154185022026432, "step": 352, "batch_size": 64, "mean": 53.94491195678711, "std": 55.1077766418457, "min": -43.93131637573242, "p10": -17.596114540100093, "median": 52.123483657836914, "p90": 134.23280181884766, "max": 179.94525146484375, "pos_frac": 0.8125, "sample": [174.119140625, 153.7538604736328, 15.848163604736328, 79.579345703125, -28.57470703125, 26.791954040527344, 15.186729431152344, -30.911224365234375, 111.84117126464844, 135.4966278076172, 39.19267272949219, 27.23138427734375, 58.95030212402344, -26.32172393798828, 41.26017379760742, -20.20842933654785, 30.655941009521484, 179.94525146484375, 25.622154235839844, -31.055217742919922, 152.63478088378906, 120.74510955810547, 33.544464111328125, 24.60797119140625, 106.11994934082031, 51.912689208984375, -11.500713348388672, 11.0645751953125, 95.23560333251953, 56.555084228515625, -6.7569580078125, 95.01544952392578, 57.23429870605469, -43.93131637573242, 112.43673706054688, 56.251739501953125, -22.310379028320312, 64.8639907836914, -3.3041000366210938, 52.33427810668945, 18.201631546020508, 51.10569763183594, 77.38871765136719, 54.469139099121094, -8.991386413574219, 69.35464477539062, 137.80380249023438, 146.08377075195312, 129.15476989746094, 77.87295532226562, 48.225425720214844, 5.027839660644531, -2.660381317138672, 131.28387451171875, 81.8587417602539, 51.33191680908203, 13.467121124267578, 76.07041931152344, 61.6951904296875, 59.063385009765625, 14.80615234375, 1.1358413696289062, 63.241764068603516, 114.3265380859375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000352.npy"}
{"epoch": 0.5168869309838473, "step": 353, "batch_size": 64, "mean": 33.43932342529297, "std": 51.50300598144531, "min": -106.14622497558594, "p10": -36.352634048461915, "median": 36.26612854003906, "p90": 91.96075286865235, "max": 145.87513732910156, "pos_frac": 0.734375, "sample": [145.87513732910156, 71.02912902832031, 30.07365608215332, 124.61067199707031, 29.47552490234375, 65.1528549194336, -4.660930633544922, -55.39142608642578, 60.183631896972656, 14.959938049316406, 10.278633117675781, -16.092681884765625, 75.49766540527344, 44.88758087158203, 61.53124237060547, 117.21286010742188, -52.51515197753906, -15.131118774414062, 32.90589904785156, 66.0347900390625, 58.85218811035156, 77.42342376708984, 122.73416900634766, -34.53780746459961, 15.140625, 40.5087890625, -37.13041687011719, 41.11830139160156, 15.163040161132812, 37.405357360839844, 29.136268615722656, -21.657474517822266, -45.187744140625, 59.94115447998047, -5.189075469970703, -106.14622497558594, 133.9473114013672, 94.07676696777344, -38.504150390625, 11.060456275939941, 83.80245971679688, 64.64192962646484, -20.460426330566406, 27.462615966796875, 93.30400085449219, 67.83998107910156, 74.4311752319336, -81.64541625976562, 22.7449951171875, 70.95974731445312, 33.67723083496094, 78.88357543945312, 88.82650756835938, 40.79352569580078, 20.617828369140625, 43.253662109375, 51.66779327392578, 66.39787292480469, -5.588043212890625, -16.56597900390625, -13.583026885986328, 42.808387756347656, 35.12689971923828, 16.646530151367188], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000353.npy"}
{"epoch": 0.5183553597650514, "step": 354, "batch_size": 64, "mean": 53.27729797363281, "std": 45.38572692871094, "min": -55.717254638671875, "p10": -1.8781562805175729, "median": 46.68876647949219, "p90": 117.82258224487305, "max": 169.16432189941406, "pos_frac": 0.890625, "sample": [60.23268127441406, 74.25840759277344, 39.836875915527344, 90.36323547363281, 169.16432189941406, 86.53323364257812, 129.11622619628906, -11.700355529785156, 14.811172485351562, -4.150177001953125, 29.700897216796875, 6.194635391235352, 68.78827667236328, 137.50320434570312, 104.06245422363281, 18.14590072631836, 42.793701171875, 51.783355712890625, 118.10885620117188, 26.339431762695312, 74.6590576171875, 26.940956115722656, 21.250274658203125, 18.65326690673828, 79.63517761230469, 47.46040344238281, 45.91712951660156, 82.30967712402344, 53.1939697265625, 47.64808654785156, 41.75897216796875, 59.36103439331055, 125.55965423583984, 34.31496047973633, 28.47850799560547, 63.8431396484375, -55.717254638671875, 33.426170349121094, 48.52391052246094, -20.298416137695312, 25.00689697265625, 90.97553253173828, 69.19799041748047, 3.4232254028320312, 44.24455261230469, 85.20600891113281, -4.163675308227539, 117.15460968017578, 95.36747741699219, 56.900421142578125, 15.885456085205078, 26.526241302490234, 28.413711547851562, 23.140625, -13.90216064453125, 57.96205139160156, 152.98382568359375, 42.59419250488281, 39.954307556152344, -4.3582305908203125, 161.17367553710938, 48.98588562011719, 29.151458740234375, 109.11788177490234], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000354.npy"}
{"epoch": 0.5198237885462555, "step": 355, "batch_size": 64, "mean": 51.005733489990234, "std": 52.92388916015625, "min": -74.66207122802734, "p10": -7.6870727539062464, "median": 52.27284049987793, "p90": 114.34884033203124, "max": 175.13284301757812, "pos_frac": 0.8125, "sample": [82.45193481445312, 66.31156158447266, 20.052162170410156, 107.57650756835938, -74.66207122802734, 69.15042114257812, -48.620906829833984, 114.45086669921875, -8.989799499511719, 66.01164245605469, 99.39907836914062, 109.90373229980469, -35.000083923339844, 23.38429832458496, 5.772098541259766, 108.47853088378906, 146.7322540283203, 155.03643798828125, -38.281558990478516, 23.168498992919922, 132.5266876220703, 57.8028564453125, 59.553443908691406, 72.10667419433594, 14.638778686523438, 40.72323226928711, 50.35275650024414, 46.518211364746094, 64.57684326171875, 93.16729736328125, -2.4396896362304688, -2.0549774169921875, 84.67716979980469, 81.25552368164062, 86.91986846923828, 8.618209838867188, 24.320785522460938, 119.6772232055664, -2.007587432861328, 175.13284301757812, 93.54263305664062, 75.696044921875, 30.385513305664062, 22.318359375, 63.649864196777344, -14.201866149902344, 43.25522994995117, 1.1456146240234375, 26.74889373779297, 54.19292449951172, 91.46310424804688, 127.69844818115234, -54.424652099609375, 114.11077880859375, 0.2508544921875, 102.23686218261719, 73.82227325439453, 43.56150817871094, 43.60307312011719, -0.8071193695068359, 24.14739990234375, -4.647377014160156, 72.2855453491211, 35.941314697265625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000355.npy"}
{"epoch": 0.5212922173274597, "step": 356, "batch_size": 64, "mean": 37.909202575683594, "std": 56.47854232788086, "min": -103.62787628173828, "p10": -31.223735809326165, "median": 40.92465019226074, "p90": 98.27117919921876, "max": 189.83218383789062, "pos_frac": 0.71875, "sample": [70.37014770507812, 50.352622985839844, 25.129863739013672, 21.896697998046875, 10.61849594116211, -7.484882354736328, 80.4043197631836, 66.21284484863281, 30.311912536621094, 75.48247528076172, 88.89179992675781, 35.58131408691406, 71.37139892578125, 144.8518524169922, 112.83915710449219, 73.55763244628906, -34.077606201171875, 99.91876220703125, -8.512596130371094, 59.96501159667969, -19.926422119140625, 61.16404724121094, -24.56470489501953, 94.42681884765625, 22.82714080810547, 57.116485595703125, 81.02506256103516, 66.75070190429688, -67.39063262939453, 33.98614501953125, -12.375717163085938, 189.83218383789062, 130.40353393554688, 34.26615905761719, 50.465911865234375, -103.62787628173828, -3.20111083984375, 55.45780944824219, -92.88987731933594, 47.90728759765625, -34.48139190673828, 40.5118408203125, -65.90505981445312, 38.1432991027832, 11.36073112487793, 14.56048583984375, 47.786598205566406, 26.861053466796875, 152.9522705078125, -5.3177337646484375, 104.64208221435547, 61.72944641113281, 90.93772888183594, 63.55718994140625, -4.257686614990234, -5.388458251953125, 41.00431442260742, 73.15202331542969, -13.194477081298828, 79.73681640625, 40.84498596191406, -51.401031494140625, -21.738418579101562, 70.75831604003906], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000356.npy"}
{"epoch": 0.5227606461086637, "step": 357, "batch_size": 64, "mean": 62.15252685546875, "std": 51.46938705444336, "min": -59.382843017578125, "p10": -1.3618755340576159, "median": 60.97359848022461, "p90": 129.6569091796875, "max": 157.04470825195312, "pos_frac": 0.875, "sample": [113.57245635986328, 77.90228271484375, 88.7864990234375, 140.2413330078125, 157.04470825195312, 79.84773254394531, 53.61555480957031, 35.082855224609375, 12.2496337890625, 32.09031677246094, 115.74615478515625, 25.327178955078125, -48.859214782714844, 56.857818603515625, 42.728607177734375, 60.175445556640625, 125.1130599975586, 103.3348388671875, 129.49630737304688, -11.511104583740234, 24.023193359375, -8.870128631591797, 40.244590759277344, 77.83831787109375, -59.382843017578125, -18.751001358032227, 70.28431701660156, 129.72573852539062, 5.7942352294921875, 129.88595581054688, 127.89840698242188, 7.145164489746094, 52.37946319580078, 39.49846649169922, 107.717041015625, -1.9074897766113281, 99.0486831665039, 142.99838256835938, 37.671630859375, 106.8656997680664, -0.088775634765625, 12.171611785888672, 156.3226318359375, 84.92576599121094, 52.109283447265625, -7.638580322265625, 2.017547607421875, 36.85804748535156, 127.55179595947266, 97.58547973632812, 52.1096305847168, 74.82743835449219, 107.25364685058594, 63.74464416503906, 132.06484985351562, 10.575767517089844, 14.996337890625, 61.771751403808594, 110.42137145996094, 62.719482421875, 27.114273071289062, 98.42596435546875, 65.2963638305664, 35.67530822753906], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000357.npy"}
{"epoch": 0.5242290748898678, "step": 358, "batch_size": 64, "mean": 43.57078552246094, "std": 55.56226348876953, "min": -47.61363220214844, "p10": -24.020783233642575, "median": 34.49484062194824, "p90": 125.30503540039062, "max": 196.419677734375, "pos_frac": 0.78125, "sample": [-24.61382293701172, 81.7756576538086, 115.79339599609375, 149.69393920898438, 48.11155700683594, 95.75919342041016, 28.81391143798828, 76.33050537109375, 22.21480941772461, 52.154457092285156, -25.085693359375, 114.58106994628906, 78.33786010742188, 11.271194458007812, -22.27670669555664, 139.39332580566406, -27.341552734375, 59.69865417480469, 55.84925842285156, 21.626232147216797, -36.72633743286133, 64.77200317382812, -25.49858856201172, 1.6173954010009766, 28.391921997070312, 53.803741455078125, 38.37982177734375, 27.12283706665039, 138.9439697265625, 125.75027465820312, 90.46414184570312, 11.682014465332031, 128.40866088867188, -4.2798919677734375, 124.26614379882812, 22.836837768554688, 16.409683227539062, -47.61363220214844, 50.504276275634766, 32.077392578125, 6.110191345214844, -14.883773803710938, 42.03633117675781, 18.719261169433594, 36.912288665771484, -22.63702392578125, 186.3832550048828, 72.6552734375, 12.686595916748047, 14.205638885498047, -42.06575012207031, 48.678245544433594, 40.76007080078125, 21.424907684326172, 68.6749038696289, 196.419677734375, 9.565231323242188, 48.97949981689453, 45.70830535888672, -6.307525634765625, 114.07942199707031, -9.855316162109375, 22.031494140625, -15.150848388671875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000358.npy"}
{"epoch": 0.5256975036710719, "step": 359, "batch_size": 64, "mean": 41.65609359741211, "std": 60.94831466674805, "min": -61.64706802368164, "p10": -32.51900634765625, "median": 28.155941009521484, "p90": 120.7372917175293, "max": 221.83200073242188, "pos_frac": 0.765625, "sample": [7.592071533203125, 2.9510498046875, 126.46395874023438, 81.29100036621094, -32.42524719238281, 177.40167236328125, -45.249298095703125, 221.83200073242188, 53.01029586791992, 0.23989486694335938, -22.659103393554688, -15.4881591796875, 109.24761199951172, 115.7498779296875, -46.454689025878906, 158.1690673828125, 130.0648651123047, 55.2282829284668, 108.6845932006836, 45.30833435058594, 21.00078582763672, 102.4924545288086, 84.78817749023438, 54.220176696777344, 20.927921295166016, 26.83654022216797, 96.06486511230469, 1.297454833984375, -18.11883544921875, -2.779510498046875, 144.27828979492188, 13.462905883789062, 118.82926940917969, 9.186813354492188, 29.475341796875, 106.89014434814453, 24.111038208007812, 121.55501556396484, 7.544456481933594, 52.488128662109375, -61.64706802368164, 11.116897583007812, 34.929786682128906, -5.444334030151367, 3.5218048095703125, 2.6458873748779297, 53.740386962890625, 61.143341064453125, 96.26309967041016, 40.003173828125, -29.506797790527344, 26.210006713867188, 34.22027587890625, -13.71826171875, 31.168296813964844, -32.55918884277344, 88.07144165039062, -33.833255767822266, 65.80577087402344, 14.733428955078125, 98.31019592285156, -33.78736877441406, 20.682552337646484, -51.589534759521484], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000359.npy"}
{"epoch": 0.527165932452276, "step": 360, "batch_size": 64, "mean": 52.00859069824219, "std": 51.166568756103516, "min": -119.9491958618164, "p10": -7.374932861328125, "median": 48.936439514160156, "p90": 116.51559448242188, "max": 179.75682067871094, "pos_frac": 0.875, "sample": [78.84074401855469, 19.66156005859375, 45.19142150878906, 97.82368469238281, 36.704612731933594, 123.81748962402344, 100.16104125976562, 122.91413116455078, 116.65086364746094, 19.110458374023438, 79.87422180175781, 54.6419677734375, 24.6961669921875, 2.823474884033203, 105.85497283935547, 9.125110626220703, 8.71632194519043, 39.76270294189453, 116.19996643066406, 2.980743408203125, 93.46539306640625, 31.69345474243164, -7.366783142089844, 103.61614990234375, -16.304588317871094, 24.512908935546875, 121.1076889038086, 63.5977783203125, 95.64252471923828, 106.6278076171875, 179.75682067871094, 66.77149963378906, 69.01624298095703, -119.9491958618164, 2.048480987548828, 56.59087371826172, 75.13566589355469, -7.378425598144531, 77.7666015625, 46.57069396972656, 42.59867477416992, 146.90374755859375, 11.607837677001953, 62.63542938232422, 23.2401123046875, 70.44126892089844, -13.86568832397461, 50.30205154418945, 25.870685577392578, 30.84614372253418, -24.87982940673828, 74.06487274169922, -9.029609680175781, 96.55117797851562, 118.35153198242188, 110.41986083984375, 75.010986328125, 81.11738586425781, 47.57082748413086, -33.09978485107422, 20.66114044189453, 37.81156539916992, 5.2888946533203125, 9.657318115234375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000360.npy"}
{"epoch": 0.5286343612334802, "step": 361, "batch_size": 64, "mean": 46.802886962890625, "std": 60.345680236816406, "min": -103.05709075927734, "p10": -9.784484100341796, "median": 33.955055236816406, "p90": 131.95123748779298, "max": 192.43380737304688, "pos_frac": 0.78125, "sample": [74.12360382080078, 2.2626304626464844, 7.2718353271484375, 97.1822509765625, -43.260284423828125, 7.121257781982422, 114.84811401367188, 162.5686492919922, 102.2088394165039, 61.917274475097656, 7.636604309082031, 9.7635498046875, -9.863739013671875, 145.7044219970703, 7.097198486328125, -2.2479114532470703, 17.36589813232422, 20.6888427734375, 10.792709350585938, 59.95459747314453, 155.6597900390625, 48.68163299560547, -28.914306640625, 73.064697265625, -12.097297668457031, 4.556419372558594, 52.141265869140625, 83.50813293457031, 34.5572509765625, 96.45038604736328, 81.77100372314453, 24.613967895507812, 31.814605712890625, 61.101295471191406, 111.44023895263672, 62.633819580078125, -68.86650085449219, 116.93136596679688, 85.15057373046875, 78.53805541992188, 85.84571838378906, 22.51667022705078, 129.46083068847656, 33.35285949707031, -7.043647766113281, 128.1452178955078, 192.43380737304688, 43.3901481628418, -0.9495086669921875, -9.599555969238281, 8.960070610046387, 22.836341857910156, 133.267333984375, 25.298072814941406, -54.1929931640625, 44.445587158203125, 133.0185546875, 8.363584518432617, -5.14605712890625, 64.98094177246094, 159.10650634765625, -5.073616027832031, -103.05709075927734, -0.8476791381835938], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000361.npy"}
{"epoch": 0.5301027900146843, "step": 362, "batch_size": 64, "mean": 50.67647933959961, "std": 55.76588821411133, "min": -115.75123596191406, "p10": -5.281710815429685, "median": 48.55665588378906, "p90": 119.01262512207032, "max": 234.28549194335938, "pos_frac": 0.859375, "sample": [40.21807861328125, 157.39886474609375, 50.6375732421875, 53.10089874267578, 22.53101921081543, -115.75123596191406, 234.28549194335938, 119.79985046386719, 41.49785614013672, 124.03422546386719, 101.78218078613281, 10.273193359375, 36.80047607421875, 59.01854705810547, 21.27477264404297, 6.345672607421875, 8.79658317565918, -97.4635238647461, 117.17576599121094, 81.24513244628906, 43.03926086425781, 31.8218994140625, 6.041542053222656, 58.895851135253906, 85.44464874267578, 2.962566375732422, 42.8001708984375, 64.52495574951172, 3.6820602416992188, 60.899566650390625, 64.62765502929688, 82.9266357421875, 95.95600891113281, 11.697761535644531, 137.34925842285156, 37.010414123535156, 107.66654968261719, -26.060836791992188, 95.85002136230469, 46.475738525390625, 53.95871353149414, -6.34942626953125, 57.23907470703125, 62.254844665527344, -16.891265869140625, 15.166412353515625, 131.01571655273438, 69.57659149169922, 28.90289306640625, 72.77095031738281, 38.3018798828125, 120.78819274902344, -33.632293701171875, 113.71675109863281, -16.415924072265625, 43.756446838378906, 66.29591369628906, 60.14332580566406, 41.07284927368164, 89.13693237304688, 45.06520462036133, -2.790374755859375, -1.6134490966796875, 85.2115478515625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000362.npy"}
{"epoch": 0.5315712187958884, "step": 363, "batch_size": 64, "mean": 65.30412292480469, "std": 62.63591766357422, "min": -78.89653015136719, "p10": -13.87988300323486, "median": 64.72445678710938, "p90": 146.35989685058598, "max": 235.82516479492188, "pos_frac": 0.8125, "sample": [-26.802833557128906, 16.884910583496094, 126.87415313720703, 56.5802001953125, -23.48532485961914, -1.5531272888183594, 71.58792877197266, 86.47125244140625, -8.654205322265625, 126.8980712890625, 70.35954284667969, 33.51405334472656, 163.11541748046875, 45.703792572021484, 112.02272033691406, 61.55357360839844, 30.699655532836914, 90.9520263671875, 113.47320556640625, 176.4154510498047, 15.069801330566406, 56.353271484375, 74.76690673828125, 169.05545043945312, 98.6248779296875, 76.32633972167969, 235.82516479492188, 67.89533996582031, -0.23486328125, 38.28703308105469, 47.85871124267578, 151.88247680664062, 14.270950317382812, -11.415323257446289, 184.65676879882812, 111.09272766113281, 21.256275177001953, 4.352943420410156, 43.21717071533203, 32.11810302734375, 100.6303482055664, 82.89176940917969, -14.93612289428711, 111.02326965332031, -20.465007781982422, 56.073211669921875, 44.66949462890625, -78.89653015136719, 132.3343963623047, 125.74855041503906, 11.377182006835938, -28.891250610351562, 75.56497192382812, -20.603858947753906, 104.04752349853516, 10.941421508789062, 85.26207733154297, 36.60923767089844, 86.84957885742188, -2.675212860107422, 159.39031982421875, 133.473876953125, 120.61941528320312, 114.55487823486328], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000363.npy"}
{"epoch": 0.5330396475770925, "step": 364, "batch_size": 64, "mean": 50.632606506347656, "std": 47.960636138916016, "min": -34.62777328491211, "p10": -18.143750381469722, "median": 52.48990249633789, "p90": 114.44559020996095, "max": 170.15895080566406, "pos_frac": 0.796875, "sample": [20.569961547851562, 58.84410858154297, -24.261993408203125, 39.629058837890625, 19.22386932373047, 18.143775939941406, -10.288948059082031, 73.89802551269531, -26.447959899902344, 102.16470336914062, 33.076141357421875, 73.66222381591797, 29.559715270996094, 63.10737609863281, 51.219539642333984, 19.619354248046875, 79.19392395019531, 52.0225830078125, -33.3536376953125, -34.62777328491211, 5.211280822753906, -8.220558166503906, 52.95722198486328, 123.24103546142578, 30.598854064941406, 29.220840454101562, 82.13617706298828, 170.15895080566406, 115.67235565185547, 127.31078338623047, -23.656326293945312, 75.13957977294922, 55.185020446777344, 107.29541778564453, 104.30378723144531, 11.956954956054688, 77.5161361694336, 58.25205993652344, -4.4536285400390625, 77.91021728515625, -13.554862976074219, 120.21171569824219, 118.4082260131836, 33.744041442871094, 89.6038589477539, 64.2972640991211, 78.55863952636719, 49.11883544921875, 107.3405532836914, -24.398971557617188, 41.095611572265625, 78.28575897216797, -2.2128658294677734, -6.6749114990234375, 73.79592895507812, 58.82685089111328, 89.65988159179688, 111.58313751220703, -20.110416412353516, 133.49026489257812, 51.354644775390625, 36.15565490722656, 12.791728973388672, 86.42607116699219], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000364.npy"}
{"epoch": 0.5345080763582967, "step": 365, "batch_size": 64, "mean": 55.836181640625, "std": 61.49734115600586, "min": -82.51747131347656, "p10": -30.84305343627929, "median": 57.66157531738281, "p90": 124.31091766357423, "max": 262.8988037109375, "pos_frac": 0.8125, "sample": [105.03926086425781, -36.90299987792969, 121.84785461425781, -9.401626586914062, 5.1854705810546875, 13.346582412719727, 40.43348693847656, 41.74107360839844, 62.855892181396484, -25.187271118164062, 58.39888000488281, 56.92427062988281, -33.26696014404297, 83.2528076171875, 37.70659255981445, -15.85641860961914, 56.549896240234375, 132.21728515625, 34.882286071777344, 106.49444580078125, 108.65933227539062, 46.15581512451172, 0.5173568725585938, 45.583160400390625, 84.29922485351562, -23.186622619628906, 113.12552642822266, 84.86607360839844, 134.29022216796875, -57.42811965942383, -82.51747131347656, 25.237991333007812, 21.775310516357422, 64.01043701171875, -33.606292724609375, 133.28855895996094, 98.81375122070312, 74.86140441894531, 107.64494323730469, -40.82704162597656, 53.0108642578125, 75.89772033691406, 99.9674072265625, -2.6947174072265625, 8.009895324707031, 125.36651611328125, 95.8224868774414, 22.468719482421875, 71.49345397949219, 33.74748229980469, 50.82611083984375, 52.89292907714844, -44.55470275878906, 71.79765319824219, 60.804893493652344, 70.46113586425781, 262.8988037109375, 162.23760986328125, 18.25104522705078, 103.1164779663086, 94.82923889160156, 175.00927734375, 84.63430786132812, 85.3966293334961], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000365.npy"}
{"epoch": 0.5359765051395007, "step": 366, "batch_size": 64, "mean": 51.919151306152344, "std": 56.83214569091797, "min": -31.432754516601562, "p10": 0.40644054412841935, "median": 33.57999610900879, "p90": 145.27774353027345, "max": 221.50421142578125, "pos_frac": 0.890625, "sample": [23.509246826171875, 1.8170642852783203, 75.89226531982422, 2.9570865631103516, 27.58575439453125, -0.19811248779296875, 53.56737518310547, 51.15641784667969, 5.275169372558594, 139.82473754882812, 23.860410690307617, 21.30982208251953, 46.27977752685547, 196.52932739257812, 69.74757385253906, 35.20318603515625, 45.445220947265625, 35.427974700927734, -1.8008003234863281, -2.862091064453125, 93.53341674804688, 4.51593017578125, 98.13365173339844, -19.950611114501953, 114.71186828613281, 75.16748809814453, 104.37442016601562, 40.3066520690918, 18.96001434326172, 14.581100463867188, 147.61474609375, -1.5896759033203125, 69.18525695800781, 2.518890380859375, 80.86764526367188, 4.006690979003906, 158.68365478515625, 160.45849609375, 28.660736083984375, 18.258634567260742, 108.77755737304688, 3.2287063598632812, 3.8937625885009766, 27.520999908447266, 119.14952087402344, 27.396400451660156, 5.99346923828125, 5.447223663330078, -31.432754516601562, 221.50421142578125, 63.05390930175781, 6.6126556396484375, 178.202880859375, 49.645263671875, 165.48342895507812, 47.15324401855469, 31.956806182861328, 42.64630126953125, 29.860675811767578, 12.29827880859375, 64.24641418457031, 77.15213775634766, 14.64645004272461, -15.138214111328125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000366.npy"}
{"epoch": 0.5374449339207048, "step": 367, "batch_size": 64, "mean": 55.335411071777344, "std": 54.15477752685547, "min": -82.024169921875, "p10": 2.2635772705078137, "median": 48.78618621826172, "p90": 125.90439147949219, "max": 210.66665649414062, "pos_frac": 0.90625, "sample": [8.40323257446289, 47.626121520996094, 136.09310913085938, 40.84211349487305, 189.84803771972656, 63.383644104003906, 54.29804229736328, 23.407068252563477, 67.63638305664062, 50.898590087890625, 96.01029968261719, 15.657989501953125, -34.56446075439453, 210.66665649414062, 139.57855224609375, 124.43353271484375, 108.96443176269531, 7.422004699707031, 182.68814086914062, 1.7195205688476562, 34.984073638916016, 126.53475952148438, 39.32769775390625, 77.20985412597656, 46.2567138671875, 81.29676818847656, 49.946250915527344, 23.217857360839844, 173.16387939453125, 38.40385437011719, 100.80191802978516, 19.738216400146484, 52.19432830810547, 46.44416809082031, 61.21697998046875, 57.95074462890625, 59.81364440917969, 45.24687194824219, 39.31599426269531, 64.34632110595703, 71.73515319824219, 114.89258575439453, 64.41494750976562, -24.505151748657227, 50.01420593261719, -82.024169921875, 45.78566360473633, -10.245697021484375, 86.75152587890625, 21.674110412597656, 22.89275360107422, -20.1224365234375, 47.57524108886719, 9.713748931884766, 3.5330429077148438, 23.372222900390625, 63.65632629394531, 35.919586181640625, -25.743820190429688, 11.588088989257812, 34.20988464355469, 54.98450469970703, 50.62751770019531, 118.342529296875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000367.npy"}
{"epoch": 0.5389133627019089, "step": 368, "batch_size": 64, "mean": 57.98668670654297, "std": 58.0166130065918, "min": -100.50733947753906, "p10": -8.178204345703122, "median": 63.40228080749512, "p90": 130.63208618164063, "max": 169.88023376464844, "pos_frac": 0.84375, "sample": [104.99634552001953, 67.65108489990234, 41.623226165771484, 139.1457061767578, 66.16888427734375, -9.030029296875, -26.810562133789062, -100.50733947753906, 64.12569427490234, 46.98418426513672, 9.769668579101562, 26.287704467773438, 38.11762237548828, 90.97665405273438, 62.67886734008789, 145.1304931640625, 94.01192474365234, 105.54159545898438, 47.08716583251953, 150.19747924804688, 96.33914184570312, 9.69040298461914, 128.31671142578125, 72.00778198242188, 169.88023376464844, 128.1250762939453, -2.6480255126953125, -3.64508056640625, 43.820289611816406, -9.657465934753418, 143.661865234375, 131.6243896484375, 83.39002990722656, 12.8387451171875, 66.91197967529297, 116.8983383178711, 7.232177734375, 123.88427734375, 21.95794677734375, 11.141448974609375, 54.79731750488281, 116.1147232055664, 29.302169799804688, -6.19061279296875, 50.529422760009766, 9.352622985839844, 79.9583740234375, 75.50106811523438, -44.347862243652344, -24.138870239257812, 50.47898864746094, 70.33207702636719, 20.784347534179688, -93.54596710205078, 121.80889892578125, 20.00265121459961, 34.42974853515625, 104.67434692382812, 111.9797134399414, 118.2094497680664, 74.90454864501953, 137.5192108154297, 15.790203094482422, 66.98454284667969], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000368.npy"}
{"epoch": 0.540381791483113, "step": 369, "batch_size": 64, "mean": 57.41761779785156, "std": 47.8602180480957, "min": -30.303329467773438, "p10": -1.7896369934082028, "median": 60.864097595214844, "p90": 127.14695434570315, "max": 185.83663940429688, "pos_frac": 0.875, "sample": [92.99876403808594, 27.856630325317383, 109.72758483886719, 88.05205535888672, 12.031089782714844, 45.00472640991211, -1.9317703247070312, 104.63119506835938, -30.303329467773438, 185.83663940429688, 66.67271423339844, 62.625335693359375, 61.4467887878418, 17.66826629638672, 42.868438720703125, 68.65711212158203, 26.051712036132812, 129.57412719726562, -17.58319854736328, 74.29266357421875, 69.2422866821289, 62.777679443359375, 87.17912292480469, -6.73492431640625, 36.531639099121094, -9.016143798828125, 34.314002990722656, 4.8564605712890625, 60.72344970703125, 45.88035583496094, 73.75477600097656, 72.011962890625, 14.534698486328125, -2.8190155029296875, 80.4757080078125, 68.15436553955078, 54.00531768798828, 147.98291015625, 84.05974578857422, 27.757850646972656, 72.33575439453125, 54.017295837402344, 157.74407958984375, 70.85103607177734, 121.48355102539062, 34.746185302734375, 7.400077819824219, 64.35039520263672, -7.041627883911133, 71.19795227050781, 20.933303833007812, -1.4579925537109375, 34.555511474609375, 35.715728759765625, 137.7513885498047, 61.00474548339844, 17.34783172607422, 23.132648468017578, 89.65290069580078, 116.89300537109375, 155.63914489746094, 15.577133178710938, 148.80604553222656, 0.24169540405273438], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000369.npy"}
{"epoch": 0.5418502202643172, "step": 370, "batch_size": 64, "mean": 42.137481689453125, "std": 57.44393539428711, "min": -104.65985107421875, "p10": -29.112420654296873, "median": 37.78211212158203, "p90": 111.6766960144043, "max": 164.6201171875, "pos_frac": 0.78125, "sample": [80.37837219238281, 102.76455688476562, -104.65985107421875, 111.48722839355469, 99.72666931152344, 11.353607177734375, -27.8231201171875, 23.79546356201172, -4.754207611083984, 111.75789642333984, 131.49362182617188, 74.738037109375, 3.747589111328125, 23.356796264648438, 109.24781799316406, 61.938026428222656, -29.66497802734375, 1.7716140747070312, -32.24920654296875, 96.02112579345703, 90.38638305664062, 55.805633544921875, 109.94313049316406, 21.733154296875, 41.607337951660156, 13.087285995483398, 35.009521484375, -13.80889892578125, 32.516845703125, 6.496156692504883, 16.650466918945312, 39.623050689697266, 52.233741760253906, 78.86317443847656, 35.9411735534668, 125.13921356201172, 56.47312927246094, 78.37266540527344, 44.53558349609375, -13.8195161819458, 127.85741424560547, 28.524246215820312, -51.99230194091797, 14.578907012939453, 88.98457336425781, 43.9049072265625, -9.2239990234375, 12.605697631835938, 164.6201171875, -43.70391082763672, -0.3887939453125, -9.941162109375, -59.31165313720703, 103.24520111083984, 25.4793701171875, 9.402572631835938, 19.639312744140625, 120.138671875, -94.89815521240234, 82.01010131835938, 75.55963134765625, 66.30348205566406, 136.40728759765625, 95.78109741210938], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000370.npy"}
{"epoch": 0.5433186490455213, "step": 371, "batch_size": 64, "mean": 41.45665740966797, "std": 56.679378509521484, "min": -64.6585693359375, "p10": -30.876966857910155, "median": 33.30080986022949, "p90": 105.72377700805664, "max": 172.83303833007812, "pos_frac": 0.75, "sample": [71.79519653320312, 87.37728881835938, 68.66172790527344, -49.47106170654297, -62.8291015625, 155.26971435546875, 104.23070526123047, 104.655517578125, 56.99896240234375, 18.253013610839844, 22.822128295898438, -27.22320556640625, 88.30146026611328, 137.28744506835938, 94.90274047851562, 31.740737915039062, 73.25301361083984, 22.004776000976562, 15.871871948242188, 106.18160247802734, 74.62956237792969, 52.97798538208008, 53.78520965576172, 28.105567932128906, 19.1776123046875, 92.67536163330078, 110.81698608398438, 16.2122802734375, 160.54083251953125, 33.927978515625, 172.83303833007812, -64.6585693359375, -53.957427978515625, -7.5549468994140625, -57.86640930175781, 2.6782875061035156, 49.79656982421875, 47.745174407958984, 32.673641204833984, -36.703330993652344, -17.52849006652832, -8.317035675048828, 36.62945556640625, 1.1424007415771484, 17.304885864257812, -0.09230804443359375, -3.0663604736328125, 90.42398071289062, 82.30989074707031, -29.980743408203125, -31.261062622070312, -8.129531860351562, 30.498607635498047, 101.74520874023438, 82.77972412109375, 19.05858612060547, 93.04527282714844, 0.08832168579101562, 41.86524963378906, 5.879692077636719, 117.5181655883789, 92.27439880371094, 93.09353637695312, -1.9758033752441406], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000371.npy"}
{"epoch": 0.5447870778267254, "step": 372, "batch_size": 64, "mean": 67.37609100341797, "std": 60.192962646484375, "min": -61.53942108154297, "p10": -1.6247817993163998, "median": 69.04162979125977, "p90": 132.33979797363284, "max": 285.3144226074219, "pos_frac": 0.890625, "sample": [77.8045883178711, -14.082099914550781, 91.11654663085938, 62.789093017578125, 32.43743896484375, 140.14019775390625, 34.68260192871094, 85.16683959960938, -53.23963928222656, 14.211456298828125, 285.3144226074219, 87.71849060058594, 14.541900634765625, 101.06181335449219, 35.478546142578125, 10.00897216796875, 8.953132629394531, 8.108283996582031, 91.24212646484375, -14.289257049560547, 116.4276123046875, 124.26812744140625, 69.5400161743164, 92.54925537109375, 100.83575439453125, 108.06112670898438, 35.72270202636719, 127.14352416992188, 79.95571899414062, 102.28364562988281, 89.73528289794922, 124.7576675415039, 72.40265655517578, 32.022605895996094, 24.488075256347656, 27.09775161743164, 70.83465576171875, 84.425048828125, 63.33119201660156, 125.89884948730469, 64.48580169677734, 23.761573791503906, 68.54324340820312, 85.03033447265625, 134.5667724609375, 176.62356567382812, -29.137550354003906, 28.760223388671875, -4.3922882080078125, 51.26887512207031, 109.30240631103516, 91.03813934326172, 115.42074584960938, 55.28839111328125, 57.229026794433594, 67.63575744628906, -30.63144874572754, 143.10491943359375, 18.864498138427734, 199.25579833984375, -61.53942108154297, 32.888427734375, 138.92274475097656, 4.832733154296875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000372.npy"}
{"epoch": 0.5462555066079295, "step": 373, "batch_size": 64, "mean": 53.74197006225586, "std": 57.01975631713867, "min": -101.30043029785156, "p10": -16.171763992309568, "median": 51.75439262390137, "p90": 131.44988861083985, "max": 162.6571807861328, "pos_frac": 0.828125, "sample": [5.264507293701172, 33.978431701660156, -19.20575714111328, 133.02386474609375, 76.52110290527344, 117.15049743652344, 63.129737854003906, 96.1123275756836, 5.385578155517578, 54.776885986328125, 7.514396667480469, 51.46443176269531, 67.0599365234375, 15.282257080078125, 83.04767608642578, 52.04435348510742, 7.3122100830078125, 59.723243713378906, 73.48970794677734, 11.802452087402344, -13.746322631835938, -17.524322509765625, 127.05492401123047, -11.632598876953125, 41.99369430541992, 21.811365127563477, 162.6571807861328, 62.524681091308594, 46.650665283203125, -39.51435089111328, -34.37372970581055, 78.7386474609375, -7.952690124511719, 111.20880126953125, 149.24522399902344, -4.972469329833984, 116.95709228515625, 88.33221435546875, 17.094486236572266, 50.87574768066406, 124.52378845214844, 72.9036636352539, -17.211238861083984, 141.3031005859375, 123.15046691894531, 128.85643005371094, 94.6712646484375, 42.45039367675781, 132.762451171875, 22.373687744140625, 10.927108764648438, 132.56137084960938, 35.250457763671875, 37.80070114135742, 40.78606414794922, 110.12645721435547, 89.91294860839844, 87.4162368774414, 6.455352783203125, -34.318359375, 52.847259521484375, 161.69000244140625, -101.30043029785156, 3.2409820556640625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000373.npy"}
{"epoch": 0.5477239353891337, "step": 374, "batch_size": 64, "mean": 55.66587829589844, "std": 49.33210754394531, "min": -33.04296875, "p10": -4.9896610260009755, "median": 53.80585861206055, "p90": 119.43814239501954, "max": 191.1314239501953, "pos_frac": 0.828125, "sample": [21.43743133544922, 86.83501434326172, 53.521949768066406, 105.27059936523438, 49.515968322753906, 191.1314239501953, 24.114364624023438, -33.04296875, 45.88233184814453, 63.37812042236328, 7.427703857421875, -16.537181854248047, 60.7454833984375, -5.523700714111328, 96.62579345703125, 59.67005920410156, 84.83503723144531, 88.12378692626953, 119.76637268066406, 14.00094985961914, 72.06773376464844, -9.858642578125, 114.784423828125, -2.7277908325195312, -11.204437255859375, 56.11576843261719, 12.843080520629883, -24.324440002441406, 20.728530883789062, 75.76420593261719, 148.38063049316406, 118.67227172851562, 122.02593994140625, 103.00001525878906, 24.422622680664062, 52.87928771972656, -3.7435684204101562, 121.97933959960938, -3.4834823608398438, 67.15701293945312, 37.55242919921875, 17.156299591064453, 68.16952514648438, 12.384943008422852, 97.65048217773438, 36.39079284667969, 104.84862518310547, 89.6720962524414, 85.38790130615234, 131.03878784179688, 34.6981201171875, 68.78607940673828, 25.834232330322266, 79.74683380126953, 39.17000198364258, 97.93376159667969, -16.523021697998047, -2.0569534301757812, 105.82566833496094, 42.73906707763672, 54.08976745605469, 2.9848175048828125, 20.95197296142578, 155.52706909179688], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000374.npy"}
{"epoch": 0.5491923641703378, "step": 375, "batch_size": 64, "mean": 60.347686767578125, "std": 59.30996322631836, "min": -55.75122833251953, "p10": -10.684545898437499, "median": 56.93462371826172, "p90": 135.93816833496095, "max": 219.42312622070312, "pos_frac": 0.84375, "sample": [122.28961181640625, 56.17259216308594, 57.6966552734375, 53.469085693359375, 111.66322326660156, 127.76960754394531, 114.22067260742188, 10.366455078125, 73.20162963867188, 34.4207763671875, 26.93703842163086, 58.85850524902344, -22.593856811523438, -32.42433166503906, 10.241050720214844, 111.34716796875, 172.7634735107422, 66.41426849365234, 63.05339813232422, -11.438858032226562, 27.58697509765625, 63.70113754272461, 77.02960205078125, 135.07391357421875, 142.57272338867188, 131.29718017578125, 90.44039154052734, 36.74828338623047, 54.48982238769531, 64.80978393554688, 136.30856323242188, 5.526477813720703, 29.858871459960938, 42.620025634765625, 62.923194885253906, 17.527908325195312, 160.94476318359375, 115.65394592285156, 40.72932434082031, 25.0894775390625, -24.511093139648438, 16.97551155090332, -1.4460906982421875, -54.48771667480469, 87.91885375976562, 100.9512939453125, 97.12850189208984, 101.55679321289062, 219.42312622070312, -8.924484252929688, 111.64654541015625, -39.31640625, 55.35676574707031, 43.91270065307617, 160.71444702148438, 124.96354675292969, 63.1484375, 20.48907470703125, -5.543327331542969, 22.1424560546875, 5.467864990234375, 13.932174682617188, 141.14358520507812, -55.75122833251953], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000375.npy"}
{"epoch": 0.5506607929515418, "step": 376, "batch_size": 64, "mean": 66.08840942382812, "std": 60.032562255859375, "min": -71.28012084960938, "p10": -2.4118644714355466, "median": 64.32518005371094, "p90": 139.38058166503907, "max": 213.93142700195312, "pos_frac": 0.859375, "sample": [-25.825439453125, 61.24247741699219, 127.19124603271484, 105.62081146240234, 159.820068359375, 54.6478271484375, 97.79011535644531, -26.452674865722656, 82.34254455566406, 75.222900390625, -11.6925048828125, 0.3698005676269531, 26.609451293945312, 7.086559295654297, 83.2613754272461, 61.138362884521484, 137.6551055908203, 7.215850830078125, 94.20211791992188, 3.2447738647460938, 124.93856811523438, 3.806964874267578, 77.50686645507812, 117.00718688964844, 127.34464263916016, 15.057052612304688, 168.9329833984375, 55.940086364746094, 213.93142700195312, 59.0703125, 31.431365966796875, -2.1598052978515625, 67.1711196899414, 177.94003295898438, 64.54759216308594, 53.06309127807617, 152.21421813964844, -0.6260223388671875, 112.33464050292969, 66.03471374511719, 91.36981964111328, 166.87603759765625, -71.28012084960938, 74.80928039550781, 46.38274383544922, 69.39192962646484, 75.06866455078125, 137.26425170898438, 44.022857666015625, 3.2055130004882812, 43.428428649902344, -57.74778366088867, 140.1200714111328, 51.13787841796875, 96.25738525390625, 111.2412109375, 64.10276794433594, 129.83786010742188, 108.64448547363281, -39.9360466003418, -2.5198898315429688, 61.68108367919922, 39.454627990722656, 40.6671142578125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000376.npy"}
{"epoch": 0.5521292217327459, "step": 377, "batch_size": 64, "mean": 67.94343566894531, "std": 65.5228042602539, "min": -59.3089599609375, "p10": -9.254408264160155, "median": 68.5103988647461, "p90": 149.0623031616211, "max": 213.83197021484375, "pos_frac": 0.828125, "sample": [165.86917114257812, 176.04493713378906, 83.33181762695312, 180.218505859375, 23.654743194580078, 4.2842559814453125, 117.8114013671875, 94.06600952148438, 122.81349182128906, 22.779422760009766, 94.73820495605469, 207.09701538085938, 70.19451904296875, 44.476409912109375, 3.1791133880615234, -10.92364501953125, 127.26937866210938, -7.937080383300781, 106.54891967773438, 32.559139251708984, 130.50262451171875, 213.83197021484375, 55.34466552734375, -54.99559783935547, 88.33413696289062, -3.620819091796875, 169.38536071777344, 116.6080093383789, 50.092445373535156, 19.824554443359375, 124.47540283203125, 51.00978088378906, 74.28547668457031, 66.82627868652344, 5.579858779907227, 139.91094970703125, 11.429595947265625, 100.26235961914062, 147.7316131591797, 112.78988647460938, 2.5108165740966797, 29.403724670410156, 108.12720489501953, 77.56985473632812, -59.3089599609375, 118.44898986816406, 53.510498046875, 15.50323486328125, 96.71675872802734, 149.63259887695312, 60.981117248535156, -9.818977355957031, -3.0375289916992188, 40.92328643798828, 21.852664947509766, 48.57801055908203, -20.864830017089844, 140.3756103515625, -57.028472900390625, -29.36140251159668, 92.20985412597656, 98.68270874023438, 98.58131408691406, -3.4925384521484375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000377.npy"}
{"epoch": 0.55359765051395, "step": 378, "batch_size": 64, "mean": 56.504417419433594, "std": 57.14765548706055, "min": -114.89825439453125, "p10": -11.181060791015625, "median": 62.31708526611328, "p90": 129.4477569580078, "max": 197.42555236816406, "pos_frac": 0.828125, "sample": [16.075057983398438, 24.435569763183594, 44.30332946777344, 99.13059997558594, 76.30953979492188, 58.98058319091797, 23.515213012695312, 35.56110382080078, 135.5283203125, 89.82268524169922, 81.28421020507812, 93.69882202148438, -61.3707275390625, 94.65478515625, -41.01868438720703, 17.59503173828125, 46.20994186401367, 114.10279846191406, 25.767345428466797, 76.50952911376953, 65.21621704101562, -11.330726623535156, 70.09637451171875, 102.12249755859375, 31.41999053955078, 54.16172790527344, -30.769107818603516, 131.18373107910156, 77.97976684570312, 61.61888122558594, 81.62266540527344, 63.015289306640625, 46.03246307373047, -10.831840515136719, -47.81036376953125, 61.314170837402344, -1.2057647705078125, 104.68218994140625, -5.596973419189453, -114.89825439453125, 153.02987670898438, 155.1011505126953, 91.50394439697266, 79.27766418457031, 123.60894012451172, -0.08016777038574219, 129.31866455078125, 92.43866729736328, 197.42555236816406, 63.363853454589844, 57.17588806152344, 70.56338500976562, -49.67723083496094, 8.567398071289062, 76.49935913085938, 63.90824890136719, 84.05598449707031, 41.551239013671875, 51.862335205078125, 139.42642211914062, 41.05534362792969, 11.719841003417969, 25.965456008911133, 129.50308227539062], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000378.npy"}
{"epoch": 0.5550660792951542, "step": 379, "batch_size": 64, "mean": 54.58235168457031, "std": 69.63346099853516, "min": -102.10183715820312, "p10": -34.770678710937496, "median": 57.29200553894043, "p90": 135.62238922119144, "max": 246.45828247070312, "pos_frac": 0.75, "sample": [186.53109741210938, 25.309715270996094, 170.68577575683594, 128.35052490234375, -38.32407760620117, -102.10183715820312, 57.6217041015625, 77.03921508789062, 0.7131500244140625, 4.1080474853515625, 31.517616271972656, 83.7872543334961, 121.15859985351562, 43.865386962890625, 72.16302490234375, -30.392196655273438, 96.40811157226562, 151.19476318359375, 42.48846435546875, 6.125907897949219, 168.77699279785156, 21.30500030517578, -18.892044067382812, -7.207490921020508, 28.55462646484375, 28.647560119628906, 91.79206848144531, 38.0050048828125, 107.13289642333984, -1.2938232421875, 192.59271240234375, 92.39495086669922, -36.64717102050781, 129.77085876464844, 96.20906066894531, 95.06690979003906, 83.2855224609375, 138.13018798828125, 34.710113525390625, -43.676170349121094, 104.1962890625, 105.68223571777344, -26.340110778808594, 67.73725891113281, 57.3790283203125, -2.44012451171875, -57.06549835205078, 22.195648193359375, -12.4134521484375, -10.757545471191406, 31.552547454833984, 67.33905029296875, -42.528076171875, 71.07937622070312, 246.45828247070312, 9.99658203125, 107.42648315429688, 82.47578430175781, 57.20498275756836, 121.97564697265625, 108.79888916015625, 98.20942687988281, -15.464189529418945, -66.33631896972656], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000379.npy"}
{"epoch": 0.5565345080763583, "step": 380, "batch_size": 64, "mean": 60.12629699707031, "std": 69.050048828125, "min": -76.56877899169922, "p10": -18.45185165405272, "median": 56.057212829589844, "p90": 161.75715332031257, "max": 282.2494812011719, "pos_frac": 0.84375, "sample": [69.29916381835938, 74.58674621582031, -44.18328857421875, 67.01127624511719, 17.72269630432129, 18.9237060546875, 85.23208618164062, 199.54196166992188, 76.47696685791016, 12.195026397705078, 69.47553253173828, 68.1099624633789, -5.553009033203125, 193.55307006835938, 84.75587463378906, 30.840591430664062, 19.627281188964844, 4.376869201660156, 55.29023742675781, 56.824188232421875, 147.77249145507812, 1.0539054870605469, 46.25560760498047, 36.190635681152344, 112.00880432128906, 35.89756774902344, 75.76504516601562, 31.088905334472656, 139.50677490234375, -2.158843994140625, -38.9112548828125, 194.5636749267578, 60.344791412353516, 92.52250671386719, 282.2494812011719, 167.75057983398438, -23.97992706298828, 94.47514343261719, 132.32476806640625, 70.70738983154297, 121.0850830078125, 17.51318359375, 27.96501350402832, -43.591827392578125, -42.88646697998047, 45.21202850341797, 30.95958709716797, 44.064605712890625, -4.297760009765625, -64.45977783203125, 106.52267456054688, 36.9844856262207, 68.18444061279297, 80.59207153320312, 30.187606811523438, -76.56877899169922, 168.3342742919922, 37.247535705566406, 58.38192367553711, 11.2449951171875, 83.13185119628906, 66.384765625, 206.89747619628906, 29.45899200439453], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000380.npy"}
{"epoch": 0.5580029368575624, "step": 381, "batch_size": 64, "mean": 54.18187713623047, "std": 57.460086822509766, "min": -130.8907470703125, "p10": -4.965937805175781, "median": 46.615821838378906, "p90": 126.6042465209961, "max": 172.6307373046875, "pos_frac": 0.828125, "sample": [22.81958770751953, -7.9010009765625, 92.35160827636719, 63.306610107421875, 106.3365478515625, 106.28083801269531, 117.07899475097656, 82.004150390625, 159.63531494140625, 90.25717163085938, 101.43960571289062, 99.02400207519531, 26.13182830810547, 44.940635681152344, 157.6612548828125, 30.860427856445312, 154.37872314453125, 7.020481109619141, 169.95291137695312, 14.551231384277344, 100.70650482177734, 29.18395233154297, 16.116535186767578, -22.629249572753906, -14.774803161621094, 147.2389678955078, 101.26031494140625, 24.150211334228516, -0.6763763427734375, 11.938398361206055, 3.2700366973876953, 48.29100799560547, 48.843910217285156, 43.93194580078125, 39.87190246582031, 39.96508026123047, 75.66651916503906, 4.886329650878906, 61.72455596923828, 52.177734375, 21.281936645507812, 126.67683410644531, 108.13555145263672, 84.45758819580078, 96.35963439941406, 96.82312774658203, 19.174789428710938, 49.79078674316406, 126.43487548828125, -4.8473663330078125, 25.021686553955078, 172.6307373046875, -2.3194732666015625, 25.48953628540039, 4.676424026489258, -4.364532470703125, -13.31365966796875, 116.81707763671875, 56.19795227050781, 53.986602783203125, -5.016754150390625, -130.8907470703125, 0.587646484375, -5.424530029296875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000381.npy"}
{"epoch": 0.5594713656387665, "step": 382, "batch_size": 64, "mean": 59.04779052734375, "std": 61.29759979248047, "min": -68.2753677368164, "p10": -17.476593017578125, "median": 55.87657165527344, "p90": 136.33928070068362, "max": 207.71145629882812, "pos_frac": 0.84375, "sample": [199.35414123535156, 108.28504180908203, 85.97520446777344, -11.19232177734375, 94.6099624633789, 112.08294677734375, 119.49993896484375, 54.88623046875, 91.04783630371094, 60.05120086669922, 177.14788818359375, 34.84597396850586, 0.9486236572265625, 109.72534942626953, 2.217254638671875, 121.56523132324219, 152.73773193359375, -19.849037170410156, 72.40775299072266, 9.653369903564453, 155.2458953857422, 12.927806854248047, 71.54487609863281, 43.36457824707031, 110.80587768554688, 33.957977294921875, 107.51188659667969, 62.06187438964844, 29.33428955078125, 60.62690734863281, 35.83423614501953, -17.690093994140625, -38.07441711425781, 86.75862884521484, 46.33226776123047, 93.97915649414062, -25.787246704101562, 138.66464233398438, 2.9359512329101562, 186.49725341796875, 23.7347412109375, 90.88241577148438, 52.070098876953125, 32.14196014404297, -29.860122680664062, 59.709556579589844, -6.173500061035156, -16.978424072265625, 18.269691467285156, -47.49117660522461, 130.91343688964844, 207.71145629882812, 41.3511962890625, 102.85443878173828, 89.63261413574219, 36.94322204589844, 9.638191223144531, -68.2753677368164, 64.57485961914062, 25.972381591796875, 29.81411361694336, 7.676780700683594, 94.24636840820312, 56.866912841796875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000382.npy"}
{"epoch": 0.5609397944199707, "step": 383, "batch_size": 64, "mean": 50.940895080566406, "std": 58.0232048034668, "min": -49.07309341430664, "p10": -23.47902450561523, "median": 48.76097297668457, "p90": 128.28487243652344, "max": 159.80130004882812, "pos_frac": 0.765625, "sample": [135.07534790039062, -38.63737487792969, 127.45855712890625, -47.47223663330078, 108.33639526367188, 1.2259063720703125, -6.353477478027344, 159.80130004882812, 51.33280944824219, 11.423110961914062, 115.98893737792969, 106.79798889160156, 12.550735473632812, -49.07309341430664, 100.91180419921875, -16.035438537597656, 158.42636108398438, 104.57693481445312, 19.816848754882812, 72.49822998046875, 95.16192626953125, -14.518024444580078, -3.0627784729003906, 90.88145446777344, 134.31690979003906, 15.246063232421875, 89.41236877441406, 69.64285278320312, -25.864776611328125, -45.224735260009766, 124.93377685546875, 65.55581665039062, -31.631317138671875, 128.63900756835938, 1.4899253845214844, -0.8337860107421875, 149.61410522460938, 105.17759704589844, 146.350830078125, 38.110565185546875, 110.49166870117188, 8.403778076171875, -17.912269592285156, 46.18913650512695, 96.65382385253906, 75.34521484375, 5.796173095703125, 62.505279541015625, -0.17545318603515625, 34.51372528076172, 92.53292846679688, 39.493247985839844, -39.26507568359375, 56.3558349609375, 102.75540924072266, 21.303050994873047, 75.19645690917969, 26.682952880859375, -14.977775573730469, 31.125030517578125, 13.878326416015625, 66.60476684570312, 44.16667175292969, 60.506996154785156], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000383.npy"}
{"epoch": 0.5624082232011748, "step": 384, "batch_size": 64, "mean": 52.73126983642578, "std": 68.84490966796875, "min": -120.10953521728516, "p10": -44.553394317626946, "median": 56.61350059509277, "p90": 135.40037841796877, "max": 241.00057983398438, "pos_frac": 0.84375, "sample": [80.88297271728516, 164.02780151367188, 73.26334381103516, 39.20954513549805, -46.76634979248047, -80.47494506835938, -4.325511932373047, 39.35276794433594, 241.00057983398438, 54.796730041503906, 84.307861328125, 135.93511962890625, 118.24445343017578, 62.073387145996094, 56.58365249633789, 45.126861572265625, -120.10953521728516, 21.481792449951172, 3.076446533203125, -31.887649536132812, 134.15264892578125, 60.639251708984375, 17.487459182739258, 155.4595947265625, 35.4222297668457, 122.23692321777344, 15.016624450683594, 14.4119873046875, 5.290653228759766, 103.23612976074219, 73.38467407226562, 35.521484375, 26.36988067626953, 163.89752197265625, 89.76176452636719, -63.016361236572266, -63.09355926513672, 84.24889373779297, 89.81429290771484, -39.38983154296875, 56.643348693847656, 75.02640533447266, 24.655929565429688, 121.3128662109375, 48.918304443359375, 6.462921142578125, 162.03797912597656, 70.9395523071289, 84.86895751953125, -90.85742950439453, 9.015869140625, 87.16152954101562, 90.99431610107422, 168.84271240234375, 46.954002380371094, 22.354148864746094, 106.66759490966797, 18.079605102539062, 11.546398162841797, 80.2430419921875, 74.26658630371094, 105.9429931640625, -69.49957275390625, 65.5715103149414], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000384.npy"}
{"epoch": 0.5638766519823789, "step": 385, "batch_size": 64, "mean": 62.7065544128418, "std": 67.4426498413086, "min": -69.97038269042969, "p10": -21.56408462524414, "median": 50.963134765625, "p90": 152.23838195800784, "max": 209.45803833007812, "pos_frac": 0.78125, "sample": [188.95391845703125, 13.282638549804688, 76.12797546386719, 27.87100601196289, 153.99215698242188, 41.830055236816406, 99.0534439086914, 105.43773651123047, 102.38304901123047, 130.35800170898438, 60.26329040527344, 5.1003875732421875, 47.158111572265625, 1.6452178955078125, -14.689620971679688, -9.999458312988281, 166.15684509277344, 112.36392974853516, 71.9835205078125, 154.36087036132812, 115.36579895019531, -12.791378021240234, 45.245330810546875, 75.83702850341797, -59.51490783691406, 45.02525329589844, 116.91911315917969, 177.08238220214844, -33.4176025390625, 120.0548095703125, -22.482269287109375, -23.53924560546875, 74.31468963623047, 12.757194519042969, 135.1385498046875, 102.92057037353516, -38.356842041015625, -19.421653747558594, 209.45803833007812, 134.56602478027344, -7.371467590332031, 36.35894012451172, 29.618896484375, -14.897758483886719, 121.1462173461914, 146.3028564453125, 51.079795837402344, 66.82247924804688, 134.00106811523438, -69.97038269042969, -25.08380126953125, 25.076522827148438, 148.146240234375, 44.199546813964844, 127.4110336303711, 40.039085388183594, 114.02021789550781, 50.846473693847656, 58.48967742919922, 175.92721557617188, 8.958770751953125, 24.893661499023438, -6.249000549316406, 44.65934753417969], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000385.npy"}
{"epoch": 0.5653450807635829, "step": 386, "batch_size": 64, "mean": 60.51496124267578, "std": 62.57432174682617, "min": -119.29080200195312, "p10": -14.961188125610347, "median": 67.09077072143555, "p90": 121.95155029296875, "max": 203.60003662109375, "pos_frac": 0.84375, "sample": [50.3006591796875, 43.87112808227539, -105.55778503417969, -39.31110763549805, 23.086456298828125, 38.364593505859375, -16.634578704833984, 79.86304473876953, 68.42330932617188, 104.2958984375, 81.63089752197266, 106.49339294433594, 105.59034729003906, -11.056610107421875, 130.81097412109375, 97.6446533203125, 110.61064910888672, 51.64210510253906, 57.53886413574219, -20.71881103515625, 120.32356262207031, 121.67064666748047, 44.16606903076172, 39.447021484375, 152.83328247070312, 108.58683013916016, 17.093151092529297, 9.0718994140625, 171.45742797851562, 203.60003662109375, -6.1733856201171875, 48.91216278076172, 89.30357360839844, 81.57077026367188, 59.529762268066406, 121.54684448242188, 84.54407501220703, -7.856658935546875, 65.57434844970703, 114.78839874267578, 18.225830078125, 89.05663299560547, -46.94853210449219, 120.04183959960938, 36.955047607421875, -74.48783111572266, 26.437057495117188, 142.28805541992188, 87.08613586425781, 70.12339782714844, 68.78611755371094, 67.91405487060547, 15.436958312988281, 115.3012466430664, 4.754585266113281, 159.3085479736328, 47.518348693847656, 54.59659957885742, -119.29080200195312, 66.26748657226562, 107.18065643310547, 23.215030670166016, 74.24128723144531, 122.07193756103516], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000386.npy"}
{"epoch": 0.566813509544787, "step": 387, "batch_size": 64, "mean": 58.13371658325195, "std": 57.13772964477539, "min": -77.45654296875, "p10": -8.946256256103513, "median": 54.358741760253906, "p90": 135.61081848144534, "max": 189.17239379882812, "pos_frac": 0.84375, "sample": [71.26925659179688, -6.094505310058594, 49.187171936035156, 140.87696838378906, 44.18999481201172, 139.2230682373047, 88.50311279296875, 16.362396240234375, 155.67861938476562, 91.04185485839844, 2.7262802124023438, -31.746566772460938, 10.64852523803711, 52.982666015625, 50.814849853515625, 108.8929443359375, 122.437744140625, 175.0394744873047, 56.844757080078125, -77.45654296875, -7.720054626464844, 104.7844009399414, 91.7016830444336, 53.387115478515625, 53.319793701171875, 71.30268096923828, 47.75009536743164, 22.356040954589844, 30.373672485351562, 76.50757598876953, 7.747810363769531, 30.880020141601562, 148.32635498046875, 63.0218505859375, 83.2738037109375, 38.81749725341797, -50.25035095214844, -10.342483520507812, 22.96759033203125, 55.33036804199219, 55.615272521972656, 26.79823875427246, 111.67420959472656, 33.693878173828125, 38.83229064941406, 83.46849060058594, 40.24458312988281, 11.90854263305664, 98.81818389892578, -39.40644836425781, -4.455253601074219, 74.58853149414062, -9.471771240234375, 115.363525390625, -48.1883544921875, 114.09507751464844, 3.8535003662109375, 189.17239379882812, 127.18223571777344, 63.152435302734375, 103.8301010131836, 96.0492935180664, 77.85688781738281, 160.9246368408203], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000387.npy"}
{"epoch": 0.5682819383259912, "step": 388, "batch_size": 64, "mean": 66.6114501953125, "std": 58.60031509399414, "min": -60.45602035522461, "p10": -5.591610336303708, "median": 69.7759780883789, "p90": 145.58685913085938, "max": 180.28387451171875, "pos_frac": 0.859375, "sample": [15.209930419921875, 121.14976501464844, 8.512046813964844, 77.23707580566406, 132.42282104492188, 170.57479858398438, 72.55990600585938, 132.46307373046875, 135.10733032226562, 31.894744873046875, 141.14559936523438, 94.90451049804688, 26.75640106201172, 103.71665954589844, -0.030391693115234375, 45.48794937133789, 24.82550048828125, 32.815433502197266, 147.15072631835938, 85.49385070800781, 77.77690124511719, 53.68165588378906, 18.972869873046875, -60.45602035522461, 110.36747741699219, 119.16354370117188, 58.59117126464844, 47.02046203613281, 158.5071258544922, 88.53038024902344, -17.064712524414062, 3.2046279907226562, 77.46177673339844, 42.57598876953125, 162.84130859375, 114.6903076171875, 35.26799011230469, 25.146926879882812, 120.11503601074219, -11.70506477355957, 5.734893798828125, 34.31941604614258, 147.58883666992188, -31.134185791015625, 109.0634765625, 180.28387451171875, 45.492462158203125, 6.866096496582031, 94.81204223632812, 152.32029724121094, 74.5223388671875, 141.93783569335938, -7.4061279296875, 59.407745361328125, -6.774806976318359, 0.2967987060546875, 6.766109466552734, 66.99205017089844, 108.03437805175781, -2.8308181762695312, 123.42778015136719, 86.96400451660156, 79.90675354003906, -37.54594039916992], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000388.npy"}
{"epoch": 0.5697503671071953, "step": 389, "batch_size": 64, "mean": 58.680137634277344, "std": 72.37788391113281, "min": -118.5276870727539, "p10": -22.6953950881958, "median": 50.14767265319824, "p90": 160.1288589477539, "max": 230.34799194335938, "pos_frac": 0.796875, "sample": [40.387306213378906, 169.32711791992188, 129.80746459960938, -42.75265121459961, -9.832500457763672, -6.6446990966796875, 141.55535888671875, 104.46536254882812, 77.5976791381836, -76.67813110351562, 191.3878631591797, -21.414512634277344, 77.47551727294922, -118.5276870727539, 16.817733764648438, 37.38580322265625, 9.713066101074219, 56.165618896484375, 30.28955841064453, 8.453475952148438, 62.510589599609375, 57.80901336669922, 230.34799194335938, -53.79987716674805, 87.2741928100586, 22.857376098632812, -6.163116455078125, 150.45533752441406, 4.6670989990234375, 102.96379089355469, 115.74295043945312, 84.28314208984375, 51.795345306396484, -27.21051025390625, -15.417484283447266, 2.2677001953125, 21.33782196044922, 25.279815673828125, 162.88742065429688, 60.543365478515625, 205.14501953125, 29.316972732543945, 33.44468688964844, 85.208740234375, 45.40282440185547, 136.75991821289062, 49.74639892578125, 27.449268341064453, 161.26588439941406, 90.0364761352539, 40.71263122558594, 181.01568603515625, -10.825386047363281, -55.77336120605469, 140.72247314453125, 93.70855712890625, 115.66746520996094, 95.2110366821289, 157.47579956054688, 0.316375732421875, 35.797088623046875, -23.24434471130371, 50.548946380615234, 115.0091552734375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000389.npy"}
{"epoch": 0.5712187958883994, "step": 390, "batch_size": 64, "mean": 52.51655578613281, "std": 73.70211029052734, "min": -93.16535949707031, "p10": -29.632872772216796, "median": 47.80708694458008, "p90": 138.86381835937502, "max": 338.34228515625, "pos_frac": 0.734375, "sample": [-66.26300811767578, 56.7657470703125, 99.43817901611328, 14.294795989990234, -27.220428466796875, 16.69986343383789, 49.82737731933594, 56.215049743652344, 141.52093505859375, 26.40642547607422, 12.727752685546875, 99.7539291381836, 91.20651245117188, 73.53939056396484, 18.20067596435547, -32.03279495239258, 15.070545196533203, 338.34228515625, -10.851219177246094, 182.0418701171875, 46.23136901855469, 126.02554321289062, 78.51461791992188, 126.72269439697266, -27.330379486083984, 13.72720718383789, 25.820972442626953, -3.8988494873046875, -13.124336242675781, 22.358322143554688, -12.622810363769531, 166.31314086914062, 70.19685363769531, -44.17961883544922, 3.5496978759765625, 184.60427856445312, 160.21485900878906, 104.95878601074219, -8.519439697265625, 49.38280487060547, 60.25254821777344, 76.8248519897461, -9.377464294433594, 28.720922470092773, 30.313278198242188, -32.60310363769531, -10.361808776855469, 30.147567749023438, 132.66387939453125, 126.18655395507812, 117.06353759765625, 66.85165405273438, -10.66119384765625, 122.82103729248047, 1.2676162719726562, -30.61965560913086, 65.56716918945312, -93.16535949707031, -50.928497314453125, 117.27558898925781, 90.64204406738281, 50.33677673339844, 106.7162094116211, 150.49598693847656], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000390.npy"}
{"epoch": 0.5726872246696035, "step": 391, "batch_size": 64, "mean": 59.23310852050781, "std": 68.53031158447266, "min": -89.39692687988281, "p10": -16.997875213623047, "median": 50.74971008300781, "p90": 137.6612976074219, "max": 219.61495971679688, "pos_frac": 0.828125, "sample": [-40.74279022216797, 25.098770141601562, -4.826148986816406, 126.99623107910156, 40.243309020996094, 90.35887145996094, 131.04034423828125, 25.803871154785156, 40.81614685058594, 61.326690673828125, 80.61674499511719, 13.636672973632812, -14.842912673950195, 11.512008666992188, 0.510101318359375, 33.875205993652344, 47.891448974609375, 32.22515869140625, -5.752773284912109, 53.60797119140625, 120.73965454101562, 179.14613342285156, 2.8367919921875, 133.43655395507812, 87.96858215332031, 67.41895294189453, 35.074562072753906, 56.8118896484375, 86.60772705078125, 58.896453857421875, -17.459014892578125, 161.3220672607422, 17.538999557495117, 114.0197525024414, 60.935630798339844, -54.10065460205078, 89.87251281738281, 127.6942367553711, 108.42827606201172, 36.81501007080078, 87.63880920410156, -64.75470733642578, 115.18182373046875, 55.72825241088867, 215.3236083984375, -53.89823913574219, 38.38194274902344, 15.108165740966797, 21.420623779296875, 32.82625961303711, -15.921882629394531, 97.59275817871094, 13.664047241210938, 138.89938354492188, 134.77243041992188, 219.61495971679688, 205.3532257080078, -37.24659729003906, 40.37062072753906, 5.312744140625, 98.1275634765625, 168.45018005371094, -89.39692687988281, 124.9708251953125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000391.npy"}
{"epoch": 0.5741556534508077, "step": 392, "batch_size": 64, "mean": 66.70355224609375, "std": 70.923828125, "min": -120.09214782714844, "p10": -10.217131805419921, "median": 61.574275970458984, "p90": 152.5317153930664, "max": 239.89593505859375, "pos_frac": 0.84375, "sample": [75.10487365722656, -10.695716857910156, -23.799636840820312, 111.29255676269531, 41.023216247558594, 3.3073577880859375, 67.62142944335938, -2.244049072265625, 21.415393829345703, 145.1026153564453, 138.371826171875, -9.100433349609375, 68.68849182128906, 58.568275451660156, -55.822784423828125, 39.093360900878906, 151.88833618164062, 226.7760009765625, 88.56231689453125, 35.567935943603516, 49.12177276611328, 5.08135986328125, 159.1788330078125, 141.09841918945312, 28.97722625732422, 130.32994079589844, 102.99868774414062, 152.8074493408203, 54.79673385620117, -5.676727294921875, 103.6203842163086, 239.89593505859375, 37.092445373535156, 92.66546630859375, 139.29769897460938, 156.04730224609375, 23.72197723388672, 68.79193878173828, 52.63641357421875, 51.66901397705078, 118.36293029785156, 82.22109985351562, 49.64056396484375, 61.505699157714844, 161.348876953125, 113.00788879394531, 55.426231384277344, 21.60230255126953, -51.33686447143555, -120.09214782714844, 46.6159553527832, 137.45199584960938, 9.905414581298828, 10.822708129882812, 3.56903076171875, 107.99713134765625, 128.10202026367188, -40.07643127441406, 154.5380859375, -106.41448974609375, 113.72638702392578, 61.642852783203125, 74.24005889892578, 120.34671020507812], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000392.npy"}
{"epoch": 0.5756240822320118, "step": 393, "batch_size": 64, "mean": 62.09144592285156, "std": 63.19862365722656, "min": -67.10519409179688, "p10": -21.00044174194335, "median": 64.20133209228516, "p90": 142.40579071044922, "max": 216.42324829101562, "pos_frac": 0.78125, "sample": [132.8112335205078, 120.12825012207031, 60.072818756103516, -0.6752700805664062, -12.306808471679688, 45.82582092285156, -23.968482971191406, 111.8526611328125, 59.317710876464844, 62.37433624267578, 98.01228332519531, 143.14511108398438, 25.544904708862305, 103.28268432617188, 82.32888793945312, 60.15422821044922, -14.07501220703125, 216.42324829101562, -24.568313598632812, 95.44351959228516, 86.97372436523438, 31.243637084960938, 18.228500366210938, 72.7669906616211, 138.59547424316406, 167.32290649414062, -6.826915740966797, 140.6807098388672, 111.72392272949219, 11.340751647949219, 72.39761352539062, 72.77774810791016, 117.57569885253906, 107.54913330078125, -46.94989776611328, 41.10984802246094, 48.430328369140625, 16.218170166015625, 151.35362243652344, 91.55784606933594, -29.25231170654297, 66.02832794189453, 102.9355697631836, 4.150154113769531, 148.78453063964844, 72.73257446289062, 41.38420867919922, -44.17548370361328, 71.10771942138672, 31.93335723876953, 112.67115020751953, 81.53468322753906, 134.15924072265625, -11.421249389648438, 29.661100387573242, 164.65838623046875, 26.296981811523438, 92.53434753417969, -3.6392974853515625, -67.10519409179688, -40.713470458984375, 37.27398681640625, 179.9654083251953, -12.845890045166016], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000393.npy"}
{"epoch": 0.5770925110132159, "step": 394, "batch_size": 64, "mean": 65.45079040527344, "std": 57.48057556152344, "min": -49.58894348144531, "p10": -11.971379470825193, "median": 62.94366455078125, "p90": 133.7700119018555, "max": 193.89813232421875, "pos_frac": 0.828125, "sample": [121.36985778808594, 103.6272201538086, -4.2734527587890625, 50.593563079833984, 154.2309112548828, 87.50770568847656, 94.90074157714844, 7.09307861328125, 62.620147705078125, 107.70075988769531, 103.6511459350586, 46.27208709716797, 58.02445983886719, 154.75308227539062, -30.482290267944336, 50.15656280517578, -47.790775299072266, 29.123558044433594, 9.392913818359375, -49.58894348144531, 15.852645874023438, 73.55310821533203, 129.2156219482422, 83.35462951660156, 171.94070434570312, 102.78903198242188, 60.006744384765625, 19.157901763916016, -12.820919036865234, 176.30615234375, 97.14117431640625, 78.37311553955078, 27.010005950927734, -17.96876335144043, 138.26988220214844, 97.06117248535156, 43.92164611816406, 127.34370422363281, 123.980712890625, 63.22355651855469, -9.989120483398438, 46.42192077636719, 13.358940124511719, 62.66377258300781, -14.631664276123047, -5.936737060546875, 123.73284912109375, 135.72189331054688, 82.83409118652344, 56.451507568359375, 82.47457885742188, 22.53338623046875, 193.89813232421875, 119.14022827148438, 87.6998519897461, 101.27458953857422, -6.264373779296875, 32.793846130371094, 91.05287170410156, 45.677032470703125, 103.96187591552734, -22.763566970825195, 119.146240234375, 21.004302978515625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000394.npy"}
{"epoch": 0.57856093979442, "step": 395, "batch_size": 64, "mean": 54.53062438964844, "std": 87.35565185546875, "min": -141.61697387695312, "p10": -60.33939971923827, "median": 43.1900520324707, "p90": 185.21234741210944, "max": 219.36611938476562, "pos_frac": 0.703125, "sample": [139.0662078857422, -5.773105621337891, -120.53388214111328, -92.77278137207031, -4.5370635986328125, 167.37693786621094, -5.812095642089844, 43.9764404296875, -86.78607177734375, 124.91302490234375, 31.482254028320312, 50.29090118408203, 31.227890014648438, -38.534053802490234, 36.50775909423828, 58.17962646484375, 98.0617904663086, 160.45986938476562, 107.55325317382812, -19.769878387451172, 116.38807678222656, 63.55616760253906, 154.59707641601562, 99.31434631347656, 28.490676879882812, 45.80945587158203, 73.37471771240234, 193.0160369873047, -64.98257446289062, -0.098297119140625, 206.32894897460938, 146.78456115722656, 53.592140197753906, 71.2876205444336, 61.51176834106445, 20.839675903320312, 31.582870483398438, 163.18211364746094, 219.36611938476562, 18.530242919921875, 42.403663635253906, -10.699745178222656, -97.49447631835938, -92.65473937988281, 42.333702087402344, -1.6201324462890625, 40.47383117675781, 203.54208374023438, 53.26633071899414, -11.6739501953125, 12.937545776367188, -141.61697387695312, 41.44481658935547, 202.4971923828125, 194.67422485351562, 127.12377166748047, -49.50532531738281, 26.658111572265625, 80.99624633789062, 192.85609436035156, 157.21168518066406, -5.279666900634766, 121.47126007080078, -16.43431854248047], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000395.npy"}
{"epoch": 0.580029368575624, "step": 396, "batch_size": 64, "mean": 45.51504135131836, "std": 62.767391204833984, "min": -108.0500259399414, "p10": -35.14653549194336, "median": 52.087520599365234, "p90": 118.3721748352051, "max": 211.95794677734375, "pos_frac": 0.765625, "sample": [92.57070922851562, 105.60474395751953, -16.4185791015625, 52.242347717285156, 57.38982391357422, 72.6776123046875, -14.572807312011719, 25.895606994628906, -34.35462951660156, 53.66119384765625, 103.26602172851562, -51.24752426147461, 18.631916046142578, 129.14053344726562, 1.9253425598144531, 28.920013427734375, -9.901861190795898, 10.543197631835938, 7.130516052246094, 82.16281127929688, 54.83360290527344, 13.849807739257812, 120.5945053100586, 113.18673706054688, 133.7894287109375, 32.681549072265625, 123.89248657226562, 78.08538818359375, 98.03797912597656, 211.95794677734375, 17.917152404785156, 7.43804931640625, -108.0500259399414, 51.93269348144531, 15.083419799804688, 32.18718719482422, -43.413177490234375, 103.06523895263672, -35.485923767089844, 188.28750610351562, 64.7219467163086, 8.175163269042969, -59.92444610595703, 27.539695739746094, 79.33483123779297, -71.29176330566406, 102.65612030029297, -3.5351791381835938, 65.29096221923828, 124.24130249023438, 109.879638671875, 110.45907592773438, -19.853668212890625, -47.47697448730469, 94.86898803710938, 60.86979293823242, -25.66126251220703, -9.203628540039062, 16.64630126953125, 53.44142150878906, 84.54182434082031, 105.60057830810547, 77.8329086303711, 38.67041015625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000396.npy"}
{"epoch": 0.5814977973568282, "step": 397, "batch_size": 64, "mean": 81.79728698730469, "std": 81.22312927246094, "min": -141.8875274658203, "p10": -22.95569038391113, "median": 92.32292938232422, "p90": 181.12046203613284, "max": 240.59664916992188, "pos_frac": 0.8125, "sample": [-12.38287353515625, 192.93368530273438, 24.97222137451172, 205.1951904296875, -42.10735321044922, 124.14210510253906, -1.721038818359375, 32.95591735839844, -26.036636352539062, 56.59252166748047, 190.56005859375, 68.66729736328125, 60.16261291503906, 83.1015396118164, 3.8385086059570312, -24.890907287597656, 35.324241638183594, 109.14408874511719, 123.65116882324219, 97.51406860351562, 15.080177307128906, 187.64959716796875, 3.9867782592773438, 102.10536193847656, 94.72367858886719, -141.8875274658203, 132.80038452148438, 182.06033325195312, -16.190059661865234, 100.53716278076172, 140.27171325683594, 171.07943725585938, 176.3243865966797, 178.92742919921875, 113.7020492553711, 73.09021759033203, 66.767333984375, 20.14002227783203, 65.17770385742188, 143.8701629638672, 171.20257568359375, -11.557792663574219, -24.23175811767578, 120.46817016601562, 154.108642578125, 132.96311950683594, -31.048202514648438, 96.8255615234375, -53.71919631958008, 6.293052673339844, 31.309934616088867, 117.20306396484375, 173.7216339111328, 240.59664916992188, 89.92218017578125, 170.91055297851562, 175.15664672851562, -19.978199005126953, 225.71981811523438, 2.1927413940429688, 133.7579345703125, 58.42005920410156, 27.503036499023438, 135.45346069335938], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000397.npy"}
{"epoch": 0.5829662261380323, "step": 398, "batch_size": 64, "mean": 61.175594329833984, "std": 75.65242004394531, "min": -148.77838134765625, "p10": -15.795900344848626, "median": 47.61848068237305, "p90": 171.9841247558594, "max": 221.30506896972656, "pos_frac": 0.828125, "sample": [73.33182525634766, 43.81565856933594, 27.476333618164062, -6.668296813964844, 176.34500122070312, -33.61240768432617, 19.316059112548828, 0.433746337890625, 36.929962158203125, -18.860519409179688, 105.04608917236328, -148.77838134765625, 19.574798583984375, 18.329360961914062, 45.210975646972656, 211.36233520507812, 17.168502807617188, 144.3411865234375, 17.258544921875, 140.6219024658203, 112.800537109375, 63.792335510253906, -4.133247375488281, 16.44867706298828, 221.30506896972656, 67.3809814453125, 81.40264892578125, -66.11983489990234, 0.1089935302734375, -8.645122528076172, 0.07196807861328125, 163.09841918945312, 53.06101989746094, 93.07008361816406, -74.17393493652344, 116.09419250488281, 164.3909149169922, 79.34408569335938, 175.2383575439453, -4.4411773681640625, 40.695255279541016, 164.01080322265625, 50.02598571777344, 0.20940780639648438, 42.44140625, 65.85148620605469, 181.5699005126953, 106.34736633300781, 94.55987548828125, 75.48846435546875, -42.842994689941406, 177.9490966796875, 142.2938995361328, 197.5547637939453, -39.03692626953125, 38.7425537109375, 90.7191162109375, 93.12130737304688, 9.436767578125, 33.71831512451172, 87.70823669433594, 144.26260375976562, 7.3926239013671875, 14.2811279296875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000398.npy"}
{"epoch": 0.5844346549192364, "step": 399, "batch_size": 64, "mean": 77.61836242675781, "std": 62.0500602722168, "min": -54.778228759765625, "p10": 10.233241271972657, "median": 66.53778648376465, "p90": 156.80231628417968, "max": 219.52989196777344, "pos_frac": 0.921875, "sample": [41.48430633544922, -45.373695373535156, 82.8870620727539, 15.245819091796875, 111.74317932128906, 44.74028778076172, 100.69792175292969, 29.199447631835938, 91.54379272460938, 150.7302703857422, 26.639968872070312, 30.261444091796875, 145.01919555664062, 51.178314208984375, 38.35777282714844, 61.60856628417969, 47.895660400390625, -13.977973937988281, 53.10810470581055, 29.447128295898438, 40.686370849609375, 24.948162078857422, 157.91429138183594, 106.64601135253906, 186.91949462890625, 157.4942626953125, 155.18777465820312, 162.7755889892578, -54.778228759765625, 141.55734252929688, 151.46292114257812, 219.52989196777344, 24.008014678955078, 173.63803100585938, 103.32832336425781, -32.25755310058594, 85.268310546875, 10.094680786132812, 137.36630249023438, 147.64410400390625, 25.173789978027344, 81.95481872558594, 10.556549072265625, 36.79309844970703, -22.973569869995117, 70.36245727539062, 42.48676300048828, 105.66336059570312, 118.21371459960938, 44.05022048950195, 86.1126708984375, 191.5615997314453, 7.3227996826171875, 135.9608154296875, 59.697715759277344, 69.6423110961914, 53.43597412109375, 63.43326187133789, 33.22663879394531, 109.03284454345703, 121.79157257080078, 147.60000610351562, 149.15542602539062, 35.44953918457031], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000399.npy"}
{"epoch": 0.5859030837004405, "step": 400, "batch_size": 64, "mean": 79.97297668457031, "std": 77.99774932861328, "min": -80.87203979492188, "p10": -24.33271255493164, "median": 91.11384582519531, "p90": 164.52607269287108, "max": 266.0137023925781, "pos_frac": 0.859375, "sample": [39.122901916503906, 37.044342041015625, 236.34982299804688, 66.9058837890625, 82.20750427246094, -78.36904907226562, -67.9260025024414, 114.96876525878906, 25.39776611328125, 114.8997802734375, 121.79874420166016, 115.71806335449219, 5.186553955078125, 31.282485961914062, -10.708812713623047, -53.038482666015625, 217.13131713867188, 79.08428955078125, 154.3258514404297, 89.44570922851562, 266.0137023925781, -80.87203979492188, 64.78608703613281, 161.80288696289062, 112.76219940185547, 44.3162841796875, 93.98263549804688, 125.98216247558594, 92.781982421875, 164.40707397460938, 21.691532135009766, 127.53577423095703, -51.24547576904297, 10.921829223632812, 171.99594116210938, 116.40418243408203, 66.52555847167969, 53.304351806640625, 39.056671142578125, 9.463531494140625, -45.65623474121094, -21.320152282714844, 154.90277099609375, 146.93350219726562, 42.53253936767578, 158.7225799560547, 23.023452758789062, 164.5770721435547, 23.133811950683594, 171.24497985839844, 120.47128295898438, -25.623809814453125, 43.29066467285156, 135.453857421875, 124.66968536376953, 193.6650848388672, 1.902008056640625, 164.22998046875, 140.71160888671875, 138.44549560546875, 102.40115356445312, 21.596649169921875, 106.6113510131836, 99.90724182128906], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000400.npy"}
{"epoch": 0.5873715124816447, "step": 401, "batch_size": 64, "mean": 50.725948333740234, "std": 79.305419921875, "min": -104.76039123535156, "p10": -31.319961547851562, "median": 44.21118927001953, "p90": 162.27029571533205, "max": 297.2654724121094, "pos_frac": 0.71875, "sample": [-4.911567687988281, -30.87866973876953, 27.06902313232422, 193.95912170410156, 113.86750030517578, 52.78254699707031, -104.76039123535156, 9.206398010253906, -14.280853271484375, -25.180145263671875, -2.7373275756835938, -17.574127197265625, 43.20436096191406, -15.06076431274414, 16.89288330078125, 46.038429260253906, 35.75236511230469, 67.86109924316406, -33.588531494140625, 163.6662139892578, -19.421859741210938, 30.497562408447266, 52.137542724609375, 176.90655517578125, -21.05213165283203, 84.35757446289062, 181.25396728515625, 84.27102661132812, 297.2654724121094, 2.583688735961914, 60.50371551513672, 14.215965270996094, -6.388401031494141, 159.01315307617188, -31.50908660888672, 55.19536590576172, 195.07513427734375, -53.20258331298828, 62.00520324707031, -87.08174133300781, 53.08525085449219, 73.81320190429688, 43.34331512451172, 8.982933044433594, 122.7818832397461, 10.22705078125, -49.23628234863281, 75.21083068847656, 239.93313598632812, 128.16757202148438, 144.12924194335938, 0.3880462646484375, 87.22879028320312, 67.91206359863281, 123.03804016113281, 84.48176574707031, 45.079063415527344, 110.74244689941406, 61.27900695800781, -4.687710762023926, 16.545196533203125, 123.19499206542969, 14.376014709472656, -91.50885009765625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000401.npy"}
{"epoch": 0.5888399412628488, "step": 402, "batch_size": 64, "mean": 92.74884033203125, "std": 64.57414245605469, "min": -53.281044006347656, "p10": 7.0079772949218775, "median": 111.93198776245117, "p90": 162.21433258056643, "max": 225.52041625976562, "pos_frac": 0.90625, "sample": [38.26374816894531, 43.747894287109375, 31.267932891845703, 124.82298278808594, 19.384628295898438, 133.1127471923828, -28.865203857421875, 80.32778930664062, 40.527427673339844, 81.81082153320312, 113.62679290771484, 154.62586975097656, 209.32814025878906, 136.59564208984375, 70.48416137695312, 116.494873046875, -53.281044006347656, 131.74786376953125, 80.1769790649414, 146.16064453125, 164.74400329589844, 36.77001953125, 149.03125, -15.155155181884766, 115.08009338378906, 93.498046875, 124.23304748535156, 178.20094299316406, 41.301353454589844, 195.71597290039062, 117.80085754394531, 138.6604461669922, 134.92727661132812, 81.25408172607422, -15.6663818359375, 142.30223083496094, 195.81695556640625, 133.3881072998047, 29.985488891601562, 156.311767578125, 111.79698944091797, 5.9681396484375, 143.34750366210938, 188.75863647460938, 145.8919677734375, 37.74528121948242, 119.07592010498047, 82.4140853881836, 123.39056396484375, 225.52041625976562, 127.7079086303711, 50.67023468017578, 51.20741271972656, 91.88533020019531, 109.30155944824219, 137.39608764648438, 112.06698608398438, 19.55396270751953, -7.977821350097656, -44.58857345581055, 130.09954833984375, 16.55579376220703, 9.43426513671875, 80.14241027832031], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000402.npy"}
{"epoch": 0.5903083700440529, "step": 403, "batch_size": 64, "mean": 68.3082046508789, "std": 70.84237670898438, "min": -72.18103790283203, "p10": -24.1972728729248, "median": 72.04260635375977, "p90": 161.57269287109378, "max": 240.43994140625, "pos_frac": 0.84375, "sample": [32.80847930908203, 70.45787811279297, 113.96281433105469, 40.0325927734375, 59.958343505859375, 46.77956008911133, 202.95179748535156, 101.60941314697266, -72.18103790283203, 13.531076431274414, 32.584022521972656, 127.14276123046875, -52.33826446533203, 163.61422729492188, 29.259145736694336, 240.43994140625, 39.28573226928711, -51.8052864074707, 73.25324249267578, 72.84974670410156, 121.97784423828125, 33.90180587768555, 83.4362564086914, -63.44960403442383, 21.903732299804688, 181.2928466796875, 116.36347961425781, 29.389556884765625, 134.6866912841797, 208.17666625976562, 12.44498062133789, -35.82090759277344, 177.69464111328125, 45.89263153076172, 94.04302978515625, -19.627029418945312, -9.6270751953125, 144.4522247314453, 72.74919891357422, 102.46101379394531, 0.5688247680664062, -26.155948638916016, 125.69744873046875, -2.36379337310791, 108.2511215209961, -57.33428955078125, 127.25247192382812, 18.092666625976562, 88.07512664794922, 178.81561279296875, 48.35496520996094, 121.48004150390625, 156.80911254882812, 86.11848449707031, 117.19941711425781, 4.1475677490234375, 11.616081237792969, 71.33601379394531, 109.63843536376953, 75.63239288330078, 38.03422164916992, 21.30225372314453, 86.57141876220703, 126.04736328125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000403.npy"}
{"epoch": 0.591776798825257, "step": 404, "batch_size": 64, "mean": 61.89646911621094, "std": 81.83818817138672, "min": -125.94248962402344, "p10": -30.93199348449706, "median": 52.36526107788086, "p90": 160.75469360351565, "max": 294.1169128417969, "pos_frac": 0.84375, "sample": [146.77902221679688, 4.202880859375, 20.88135528564453, -34.780487060546875, 97.14411926269531, 29.468385696411133, 103.4913101196289, 199.8828125, 139.00802612304688, 30.58624267578125, 206.26177978515625, 138.8174591064453, 10.487075805664062, 222.96246337890625, 98.44158935546875, 111.6516342163086, 294.1169128417969, 106.18577575683594, -21.95217514038086, 37.70136642456055, 22.51150131225586, 99.76356506347656, 24.792434692382812, 18.443771362304688, 119.3585205078125, 161.84201049804688, -4.991275787353516, 99.1697998046875, 32.99446105957031, 40.24836730957031, 21.745580673217773, 44.273712158203125, -89.8411865234375, 58.885807037353516, 94.14105224609375, 80.27083587646484, -10.411140441894531, 96.40728759765625, 85.89215087890625, 65.7254638671875, -81.15966033935547, 20.99043083190918, 144.16262817382812, 120.08134460449219, 12.825000762939453, 48.63215637207031, -88.25752258300781, 71.51482391357422, 136.23548889160156, -125.94248962402344, 48.47476577758789, 161.91737365722656, 19.591318130493164, 66.66178894042969, 158.21762084960938, 192.92442321777344, 99.34172058105469, -98.42755126953125, 9.24893569946289, -90.5116195678711, 56.098365783691406, 34.15802001953125, 34.40804672241211, 7.628639221191406], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000404.npy"}
{"epoch": 0.593245227606461, "step": 405, "batch_size": 64, "mean": 60.25043487548828, "std": 62.32162094116211, "min": -51.43402099609375, "p10": -14.956688690185544, "median": 48.88612174987793, "p90": 139.9820220947266, "max": 247.60923767089844, "pos_frac": 0.84375, "sample": [41.30625915527344, 58.84228515625, -17.552108764648438, 37.20298767089844, -11.237228393554688, 33.446044921875, 90.0778579711914, 17.775115966796875, 15.934480667114258, 61.16595458984375, 80.77088928222656, 190.04331970214844, 219.48123168945312, 17.298377990722656, 65.78173828125, 162.24014282226562, 25.221664428710938, 23.831947326660156, 118.32913970947266, 10.547126770019531, 97.41763305664062, 47.97344207763672, -48.054229736328125, 176.8570556640625, -6.007129669189453, 65.15373229980469, 59.10047912597656, 128.3586883544922, 105.62786865234375, 12.035972595214844, -23.11264419555664, -0.808619499206543, 142.9961395263672, 85.61285400390625, 30.96758270263672, -51.43402099609375, 13.321548461914062, 28.47802734375, 49.79880142211914, -16.550743103027344, 89.11124420166016, 16.9991455078125, 132.94908142089844, 44.5393180847168, -23.366058349609375, 116.79119873046875, 41.354522705078125, 76.32992553710938, 44.237030029296875, 159.94699096679688, 105.27669525146484, 9.449615478515625, 37.642547607421875, 113.76289367675781, 1.5870895385742188, 92.58953857421875, 85.37985229492188, 21.782447814941406, 95.55642700195312, -16.782867431640625, 247.60923767089844, 74.74549865722656, 81.32400512695312, 68.97280883789062], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000405.npy"}
{"epoch": 0.5947136563876652, "step": 406, "batch_size": 64, "mean": 55.51643371582031, "std": 73.90116882324219, "min": -120.11734008789062, "p10": -34.98161926269531, "median": 55.889732360839844, "p90": 152.99042968750004, "max": 209.36767578125, "pos_frac": 0.796875, "sample": [125.57452392578125, 53.09797668457031, 38.32282257080078, 89.69403839111328, 13.423530578613281, 79.23685455322266, 10.817092895507812, 128.2998046875, 178.80343627929688, -53.71238327026367, 96.36048889160156, -11.936248779296875, 143.57986450195312, 81.64759063720703, 59.01765441894531, 45.171112060546875, 27.410400390625, 33.17565155029297, -36.182823181152344, 26.087200164794922, 172.2672576904297, 182.80133056640625, 63.40718078613281, 54.18986511230469, 143.2382354736328, 18.67532730102539, 30.07646369934082, 94.73684692382812, 80.24391174316406, -120.11734008789062, -91.22299194335938, -102.52599334716797, 3.2370529174804688, 18.515216827392578, 57.91455841064453, -32.178810119628906, 157.02352905273438, -43.94990539550781, 74.29025268554688, -1.8944454193115234, 110.52234649658203, 114.04605102539062, 162.50250244140625, -14.828784942626953, 136.5091552734375, 11.672737121582031, 1.84130859375, 121.51109313964844, 96.39202880859375, 10.982162475585938, 209.36767578125, 100.02973175048828, 99.05977630615234, 9.130157470703125, -68.55310821533203, -2.973834991455078, -15.266395568847656, 3.2749252319335938, 126.99942016601562, 57.589599609375, 184.57052612304688, 11.53326416015625, 108.1644287109375, 92.35884094238281], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000406.npy"}
{"epoch": 0.5961820851688693, "step": 407, "batch_size": 64, "mean": 51.120460510253906, "std": 69.25262451171875, "min": -150.14418029785156, "p10": -18.422685241699217, "median": 40.8007698059082, "p90": 131.55535125732422, "max": 248.03836059570312, "pos_frac": 0.78125, "sample": [6.615652084350586, 193.6510009765625, 41.662620544433594, -83.7100601196289, -150.14418029785156, 107.83749389648438, 146.57875061035156, -16.961685180664062, 19.840171813964844, -24.70189666748047, 18.3828125, -2.0313720703125, 77.7739028930664, 33.037696838378906, 166.679443359375, 13.53912353515625, 101.90750122070312, 121.65379333496094, 43.40101623535156, 120.32119750976562, 5.2158966064453125, 80.17552185058594, 160.2747039794922, 78.052490234375, -2.4208145141601562, 39.93891906738281, 73.07929229736328, 117.87750244140625, 0.090728759765625, 38.1458740234375, 11.565818786621094, 203.3175048828125, 110.08969116210938, 11.935310363769531, 44.70287322998047, 12.661476135253906, 86.69624328613281, 38.57122039794922, 131.6524200439453, -54.60334777832031, -0.4960479736328125, 49.92607879638672, -19.048828125, 69.30184936523438, 25.293373107910156, 117.4375991821289, 34.66438293457031, 248.03836059570312, 86.66687774658203, 48.29768371582031, 131.328857421875, -0.8788528442382812, -21.9619140625, 60.31257629394531, 54.72270965576172, -0.04286384582519531, 28.857196807861328, 3.067474365234375, 56.95213317871094, 84.9791259765625, 130.61773681640625, -14.643211364746094, -24.8746337890625, 0.8395309448242188], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000407.npy"}
{"epoch": 0.5976505139500734, "step": 408, "batch_size": 64, "mean": 52.61655807495117, "std": 76.4020004272461, "min": -107.89964294433594, "p10": -39.9503433227539, "median": 42.4035758972168, "p90": 133.95527801513674, "max": 286.2783203125, "pos_frac": 0.78125, "sample": [73.83445739746094, 286.2783203125, 188.88650512695312, 102.00373840332031, 96.59330749511719, 76.88968658447266, 41.872825622558594, 42.934326171875, 25.219482421875, 60.44440460205078, 12.790180206298828, 6.592342376708984, 136.73597717285156, 52.87548828125, -8.315773010253906, -107.89964294433594, 19.18297576904297, 121.91133117675781, 165.18946838378906, 126.33749389648438, 31.052326202392578, 29.93255615234375, -40.476226806640625, 107.68894958496094, -90.35821533203125, 72.26826477050781, -66.90575408935547, -7.671142578125, -12.020416259765625, 105.90660858154297, 54.54180908203125, 99.33262634277344, 104.44796752929688, -44.6346549987793, -54.04939270019531, -12.383087158203125, 78.83779907226562, 69.26475524902344, 17.52336883544922, 216.5076446533203, 3.7585906982421875, 14.470947265625, 1.6814346313476562, 127.46697998046875, 114.609130859375, -5.067195892333984, 119.008544921875, 153.43414306640625, 34.807411193847656, 14.977989196777344, 74.54086303710938, -46.535369873046875, -38.72328186035156, -27.326751708984375, 1.1851043701171875, 5.4998321533203125, 20.657629013061523, 3.427642822265625, 62.88514709472656, 124.20600891113281, 230.82350158691406, 121.580810546875, 13.444122314453125, 63.483985900878906], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000408.npy"}
{"epoch": 0.5991189427312775, "step": 409, "batch_size": 64, "mean": 52.24652862548828, "std": 74.69048309326172, "min": -165.61328125, "p10": -27.489616775512687, "median": 56.06451988220215, "p90": 142.88751068115235, "max": 193.01881408691406, "pos_frac": 0.8125, "sample": [5.7651214599609375, -98.19386291503906, 120.36150360107422, 54.26255416870117, 124.61305236816406, -52.60093688964844, 150.45458984375, 68.58615112304688, 94.79183959960938, -165.61328125, 160.21099853515625, 86.79476928710938, -30.90111541748047, 40.33744812011719, 26.69775390625, 99.847900390625, 142.2897186279297, -6.300994873046875, 140.4732666015625, 145.195556640625, 29.379196166992188, -163.2662811279297, 127.8164291381836, 193.01881408691406, 78.47132873535156, 12.040889739990234, 98.39767456054688, 6.350826263427734, 83.18098449707031, 81.98701477050781, -19.058334350585938, 160.2952423095703, 143.14370727539062, -86.3947525024414, 142.2626495361328, 101.7952880859375, 3.9762039184570312, 25.054046630859375, -56.410064697265625, 49.35304260253906, -4.37652587890625, 61.07257843017578, 7.09747314453125, 36.86760711669922, 74.7614517211914, 131.6054229736328, 15.128555297851562, 75.248046875, 93.89508056640625, 98.4118423461914, 25.739992141723633, 50.10491180419922, -19.52945327758789, 32.23978042602539, 31.739898681640625, 170.18319702148438, 59.338958740234375, 12.57342529296875, 63.57763671875, 15.582847595214844, 119.54059600830078, 57.866485595703125, -8.948265075683594, 25.590316772460938], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000409.npy"}
{"epoch": 0.6005873715124816, "step": 410, "batch_size": 64, "mean": 56.06606674194336, "std": 66.01399993896484, "min": -177.219482421875, "p10": -17.731908798217773, "median": 52.86868667602539, "p90": 117.0957015991211, "max": 214.22720336914062, "pos_frac": 0.78125, "sample": [117.77120971679688, 55.0385627746582, -13.433834075927734, 24.310646057128906, 3.2072982788085938, 48.13032531738281, 83.16970825195312, 182.78823852539062, 168.95211791992188, 58.34478759765625, 97.05481719970703, 30.34807586669922, 46.027992248535156, 35.89012908935547, 13.282184600830078, 38.11354064941406, 103.50190734863281, 103.3612060546875, 11.348709106445312, 111.88044738769531, 50.69881057739258, 10.360427856445312, 95.81575012207031, 66.07462310791016, 50.35944366455078, 91.69083404541016, 44.050262451171875, 57.197593688964844, -32.037986755371094, 96.09307861328125, 115.51951599121094, -5.125583648681641, 80.37498474121094, 103.43928527832031, -3.7476348876953125, -31.512039184570312, 30.44739532470703, -21.04267120361328, -42.64057922363281, -177.219482421875, -16.008651733398438, 43.115966796875, 2.084991455078125, 38.837196350097656, -18.470447540283203, -29.6236572265625, -12.749427795410156, 88.00094604492188, 90.24678039550781, 73.06343078613281, -1.01165771484375, 104.13044738769531, 176.30189514160156, 160.05857849121094, 109.53219604492188, 36.231109619140625, -6.6988983154296875, 110.48818969726562, 82.46990966796875, 182.67523193359375, 90.92042541503906, 63.27699279785156, 214.22720336914062, 109.24562072753906], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000410.npy"}
{"epoch": 0.6020558002936858, "step": 411, "batch_size": 64, "mean": 58.06561279296875, "std": 64.37434387207031, "min": -55.93852233886719, "p10": -24.146640014648433, "median": 54.490360260009766, "p90": 138.53145294189454, "max": 257.8119812011719, "pos_frac": 0.796875, "sample": [-4.459770202636719, -55.93852233886719, 4.355224609375, -33.09246826171875, 19.072799682617188, -26.512481689453125, -34.11891555786133, 44.911773681640625, 109.85501098632812, 109.979248046875, 163.74420166015625, 17.43267059326172, 117.870849609375, -26.700035095214844, 59.72641372680664, -3.27099609375, 23.761260986328125, 136.68661499023438, 149.87429809570312, 55.525726318359375, 111.77228546142578, 90.19319152832031, 90.89959716796875, -39.406044006347656, 92.55258178710938, 6.997810363769531, 26.863143920898438, 114.8259048461914, 35.93426513671875, 49.284576416015625, 53.454994201660156, 78.95917510986328, 103.32843017578125, 139.3220977783203, -3.21240234375, 19.482898712158203, 88.96344757080078, 213.89599609375, 145.52479553222656, 75.45995330810547, 42.945899963378906, 60.17945861816406, 121.9232406616211, 56.4831428527832, 38.62081527709961, 71.44612121582031, 77.71583557128906, 31.954376220703125, 257.8119812011719, 92.99659729003906, 96.65058898925781, 37.80741500854492, 57.531219482421875, 163.7936248779297, 81.46907043457031, -4.8501129150390625, 8.613693237304688, -18.6263427734375, 17.068572998046875, -6.457550048828125, 134.17538452148438, 18.161096572875977, 4.68499755859375, -49.69951629638672], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000411.npy"}
{"epoch": 0.6035242290748899, "step": 412, "batch_size": 64, "mean": 55.503990173339844, "std": 77.85572052001953, "min": -89.8660888671875, "p10": -45.60023231506347, "median": 53.022464752197266, "p90": 160.642919921875, "max": 277.8353271484375, "pos_frac": 0.703125, "sample": [-16.26248550415039, 159.6334228515625, -89.8660888671875, 63.15919494628906, 112.54377746582031, 81.36334228515625, 179.8058319091797, -47.07441329956055, -4.11737060546875, 72.35733032226562, 55.770774841308594, 143.8851318359375, 21.498001098632812, 190.579345703125, 7.872032165527344, 77.16226196289062, 131.4371795654297, 21.899417877197266, 64.12580871582031, 106.77632904052734, -0.8382720947265625, -25.627532958984375, 123.5147705078125, 50.27415466308594, -15.730060577392578, 22.56580352783203, 68.07872009277344, 35.56681823730469, -10.54388427734375, -42.16047668457031, -65.16477966308594, 46.656375885009766, -53.508087158203125, 76.21867370605469, 106.72927856445312, 42.806800842285156, 2.1525306701660156, 35.55889129638672, 74.96923065185547, 167.0301513671875, -1.068572998046875, -19.754470825195312, 130.36175537109375, -1.61041259765625, 80.45301055908203, -42.025489807128906, 147.7115020751953, 187.8488311767578, -48.11016845703125, 65.22964477539062, 161.0755615234375, 83.83769226074219, -18.307716369628906, 49.42152404785156, 176.55377197265625, 277.8353271484375, -87.68135833740234, 97.55348205566406, 11.922584533691406, -49.528961181640625, 91.2463150024414, 23.68378448486328, 108.35296630859375, 156.1569061279297], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000412.npy"}
{"epoch": 0.604992657856094, "step": 413, "batch_size": 64, "mean": 62.048255920410156, "std": 75.31135559082031, "min": -149.8158721923828, "p10": -14.65769805908203, "median": 59.550458908081055, "p90": 151.22720336914062, "max": 276.6510009765625, "pos_frac": 0.84375, "sample": [30.779144287109375, 13.175643920898438, 34.0334587097168, 20.037166595458984, 73.4368896484375, 10.332267761230469, 93.12864685058594, 27.059110641479492, -7.9549407958984375, -86.94849395751953, 188.07139587402344, 163.7570343017578, -14.943283081054688, 54.596351623535156, -13.9913330078125, 16.416406631469727, 32.26692199707031, 28.43659210205078, 136.138916015625, 107.05331420898438, -15.78514289855957, 78.73654174804688, 162.64495849609375, -130.033935546875, 19.465972900390625, 35.78074264526367, 52.594390869140625, 125.89491271972656, 21.622825622558594, 60.59895324707031, -20.457626342773438, 121.50785827636719, 94.13450622558594, 126.14445495605469, 151.8751220703125, 70.1046142578125, 216.1575927734375, 57.194091796875, 149.71539306640625, 118.975830078125, 37.59657287597656, 73.59815979003906, 127.1530990600586, 92.37788391113281, 60.59592819213867, 108.69316101074219, 45.398681640625, -149.8158721923828, -42.51055908203125, 58.50498962402344, -7.847869873046875, 88.42478942871094, 81.65377807617188, 233.87660217285156, 4.448347091674805, 75.20689392089844, 11.315170288085938, 276.6510009765625, 91.41871643066406, 27.569091796875, 39.017723083496094, 72.6706314086914, 90.53021240234375, 72.80795288085938], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000413.npy"}
{"epoch": 0.6064610866372981, "step": 414, "batch_size": 64, "mean": 57.7109260559082, "std": 80.65249633789062, "min": -116.91129302978516, "p10": -24.430300903320312, "median": 61.650400161743164, "p90": 167.92324066162112, "max": 257.1994323730469, "pos_frac": 0.765625, "sample": [181.34036254882812, 60.39280319213867, 109.38520812988281, -66.90130615234375, 116.28179931640625, 257.1994323730469, 48.53582763671875, -109.62571716308594, -24.65472412109375, 112.19366455078125, 77.778076171875, -23.906646728515625, 95.3944091796875, 27.173065185546875, -62.61653137207031, 9.591142654418945, 141.0403594970703, 70.07135009765625, -12.79311752319336, -16.457651138305664, -2.000030517578125, 62.907997131347656, 93.75669860839844, -6.824470520019531, -22.548660278320312, 78.64915466308594, -115.9891357421875, 14.493968963623047, 76.27700805664062, 108.146484375, 149.2498321533203, 225.01858520507812, 2.8156089782714844, 54.93794250488281, 192.89340209960938, 41.75918960571289, 10.663482666015625, 75.56757354736328, -14.135574340820312, 139.8738250732422, 43.54522705078125, 109.89604187011719, 214.6620330810547, 17.526161193847656, 22.250812530517578, 96.48049926757812, 48.990142822265625, 66.38026428222656, 6.2067413330078125, -64.43950653076172, 159.30360412597656, 103.57610321044922, 82.29688262939453, 18.73310089111328, 68.8066177368164, 34.275352478027344, 171.61737060546875, -116.91129302978516, 113.76780700683594, 79.13629150390625, -6.79669189453125, 12.470260620117188, 70.9635009765625, 185.82717895507812], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000414.npy"}
{"epoch": 0.6079295154185022, "step": 415, "batch_size": 64, "mean": 52.944557189941406, "std": 71.97238159179688, "min": -98.85816955566406, "p10": -41.29225158691406, "median": 45.04956245422363, "p90": 148.4326919555664, "max": 238.0833740234375, "pos_frac": 0.796875, "sample": [103.89944458007812, 36.7850341796875, 99.23249816894531, 172.9554901123047, 15.419097900390625, 45.678504943847656, 238.0833740234375, -35.58740234375, 61.94173049926758, 124.43598937988281, 28.398895263671875, 16.433448791503906, 2.676025390625, 157.75692749023438, -8.328933715820312, 37.28322219848633, 59.41545104980469, 70.8671875, -34.507537841796875, 12.503063201904297, 30.362159729003906, 103.15697479248047, 26.74835205078125, -52.33880615234375, 44.67909622192383, 85.43132019042969, 140.9472198486328, 47.87086486816406, 148.5283966064453, -26.458709716796875, -74.88467407226562, 66.61958312988281, -98.85816955566406, -42.910430908203125, 134.56265258789062, -88.94606018066406, -40.35460662841797, 45.42002868652344, 56.41869354248047, 121.5931396484375, 30.992305755615234, 122.38543701171875, 3.1641616821289062, -47.69050598144531, 11.505176544189453, 83.23770141601562, 24.226593017578125, 35.89952087402344, 30.96576690673828, 136.86529541015625, 30.923828125, 178.67674255371094, 96.28302001953125, -30.65948486328125, 136.45144653320312, 32.672882080078125, 43.50697326660156, 45.474609375, 148.20938110351562, -41.69409942626953, 168.46514892578125, 80.38538360595703, 74.60429382324219, 160.67161560058594], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000415.npy"}
{"epoch": 0.6093979441997063, "step": 416, "batch_size": 64, "mean": 70.30528259277344, "std": 76.35330200195312, "min": -117.53197479248047, "p10": -19.208038711547847, "median": 75.3592529296875, "p90": 171.98128967285157, "max": 265.97833251953125, "pos_frac": 0.8125, "sample": [75.90837097167969, 79.51346588134766, 121.31390380859375, 41.752662658691406, -54.28217697143555, 172.44017028808594, 80.77891540527344, 44.666744232177734, 77.86363220214844, 125.28971099853516, 170.9105682373047, -117.53197479248047, 24.08846664428711, 99.66474914550781, 66.05198669433594, 98.26465606689453, 141.3661346435547, 0.18408203125, 38.915008544921875, -12.843147277832031, 114.65190124511719, 155.30426025390625, -59.666229248046875, 71.30729675292969, 151.885986328125, 185.45361328125, 131.014892578125, 265.97833251953125, -2.000640869140625, 90.67135620117188, 107.64012145996094, 183.51449584960938, 169.21450805664062, 62.250213623046875, 53.696720123291016, 15.683422088623047, 91.13539123535156, 51.071372985839844, -86.13639831542969, 109.00713348388672, 77.43376922607422, 145.33758544921875, 24.224044799804688, 103.63996124267578, -10.125930786132812, 74.84906005859375, 1.9728012084960938, 178.86962890625, 73.49749755859375, 23.005935668945312, -20.690624237060547, -37.66191101074219, 16.753860473632812, 78.40898132324219, 29.045082092285156, -26.069717407226562, 225.7554931640625, -15.748672485351562, -7.958484649658203, 43.364837646484375, 29.780075073242188, 79.91085052490234, 200.08099365234375, 75.86944580078125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000416.npy"}
{"epoch": 0.6108663729809104, "step": 417, "batch_size": 64, "mean": 65.29705810546875, "std": 67.07811737060547, "min": -112.45299530029297, "p10": -18.142824172973633, "median": 67.87801361083984, "p90": 154.7063430786133, "max": 185.95726013183594, "pos_frac": 0.859375, "sample": [3.5003013610839844, 119.39297485351562, 5.91630744934082, 110.07476806640625, 3.7788314819335938, 160.28585815429688, 28.476028442382812, 104.43861389160156, 39.968971252441406, 132.54718017578125, -50.587120056152344, 28.537673950195312, 103.40248107910156, -18.57925033569336, 105.40872192382812, 148.44557189941406, -44.202980041503906, 89.88253784179688, 114.93888854980469, -73.11457061767578, 71.78424072265625, 182.1884002685547, 154.89231872558594, 131.30206298828125, 181.24903869628906, 154.27239990234375, 132.1123504638672, 11.741790771484375, 83.80089569091797, 80.09756469726562, 28.422149658203125, -18.922653198242188, 69.25553894042969, 86.89733123779297, 40.98467254638672, 13.123908996582031, -17.124496459960938, 91.65037536621094, 102.1764907836914, 13.575164794921875, 41.23814392089844, 58.47182083129883, 66.50048828125, 76.91976165771484, 12.826362609863281, 153.29747009277344, 59.40008544921875, 3.773143768310547, -39.39593505859375, 59.761619567871094, 27.613052368164062, -112.45299530029297, 170.65322875976562, 95.69247436523438, 50.83851623535156, 85.8922119140625, 115.31523132324219, 80.90020751953125, 185.95726013183594, 173.06581115722656, -17.064666748046875, 22.925277709960938, 46.79447937011719, 54.09745788574219], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000417.npy"}
{"epoch": 0.6123348017621145, "step": 418, "batch_size": 64, "mean": 79.63304138183594, "std": 70.61311340332031, "min": -113.64617156982422, "p10": 4.058103942871098, "median": 63.75849151611328, "p90": 174.6940444946289, "max": 265.544189453125, "pos_frac": 0.921875, "sample": [119.61126708984375, 45.21784973144531, 38.653228759765625, -8.738649368286133, 150.7117919921875, 31.915298461914062, 8.312545776367188, 19.16529083251953, 152.07803344726562, 133.56838989257812, 30.995697021484375, -3.6091156005859375, 162.70875549316406, 98.85407257080078, 168.8209228515625, 1.9570159912109375, 54.538360595703125, 41.20841979980469, 120.4944076538086, 55.030799865722656, -113.64617156982422, 178.1685791015625, 138.62600708007812, 37.65777587890625, 94.27837371826172, 77.93799591064453, 17.23827362060547, 206.3752899169922, 107.68533325195312, 108.0155029296875, 175.2464599609375, 116.91020202636719, 13.179237365722656, 211.89955139160156, 81.25518798828125, 39.9814567565918, 9.390281677246094, 54.36994552612305, 80.40174865722656, 87.39619445800781, 265.544189453125, 62.913665771484375, 194.8179931640625, 88.8505859375, 164.22055053710938, 42.887535095214844, 64.60331726074219, 76.83883666992188, 45.64167785644531, -49.04351806640625, 106.55247497558594, 17.22262954711914, 2.234771728515625, 144.38568115234375, 30.40526580810547, -8.329757690429688, 20.575531005859375, 214.81378173828125, 34.682594299316406, 90.05508422851562, 62.24322509765625, 173.4050750732422, 53.182708740234375, 53.948944091796875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000418.npy"}
{"epoch": 0.6138032305433186, "step": 419, "batch_size": 64, "mean": 69.2376937866211, "std": 71.9326400756836, "min": -102.68609619140625, "p10": -27.14317398071289, "median": 70.20010375976562, "p90": 160.60507965087893, "max": 226.25701904296875, "pos_frac": 0.84375, "sample": [72.8779296875, -54.78461837768555, 29.716415405273438, 44.3359375, 91.76179504394531, 78.7069091796875, -27.111915588378906, 43.791099548339844, 30.84088134765625, 108.35275268554688, 179.20944213867188, 9.212242126464844, 94.65602111816406, 210.35858154296875, 43.74837112426758, 77.51103210449219, -102.68609619140625, 20.972929000854492, 117.56246185302734, 67.52227783203125, 76.0437240600586, 16.910369873046875, -33.306793212890625, 55.349609375, 52.078369140625, -68.98757934570312, 82.81192016601562, 64.2782211303711, 147.8240203857422, 12.763046264648438, -51.4256591796875, 22.28133773803711, 111.3960189819336, 140.07757568359375, -8.65167236328125, 31.58489227294922, 76.14274597167969, 139.03109741210938, 48.78010177612305, -27.156570434570312, 118.13597106933594, -30.031265258789062, 206.28286743164062, 113.28562927246094, 126.62420654296875, 5.45501708984375, 38.07075881958008, 25.88873291015625, 187.9860382080078, 163.33360290527344, 101.41549682617188, 18.843643188476562, 60.404518127441406, 152.1865234375, 145.76412963867188, 147.21136474609375, 226.25701904296875, 154.238525390625, 50.223846435546875, 73.739013671875, 77.73049926757812, -15.241790771484375, 77.73443603515625, 181.29449462890625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000419.npy"}
{"epoch": 0.6152716593245228, "step": 420, "batch_size": 64, "mean": 63.06053924560547, "std": 72.83586120605469, "min": -75.79473876953125, "p10": -25.699176025390624, "median": 55.413997650146484, "p90": 159.38004913330082, "max": 231.1338653564453, "pos_frac": 0.8125, "sample": [-26.033004760742188, 42.917755126953125, 79.77335357666016, 54.27996826171875, 103.07217407226562, -57.56537628173828, 80.77253723144531, 37.57344055175781, 231.1338653564453, 68.6864013671875, 29.629261016845703, -12.882881164550781, 53.7087287902832, 6.171165466308594, 26.496200561523438, -27.214374542236328, 60.98163604736328, 133.1977081298828, 17.95829963684082, 16.438549041748047, -22.14544105529785, 63.91316223144531, -25.2711181640625, 48.79786682128906, 53.32078552246094, 138.40777587890625, 204.2952423095703, 72.72567749023438, 34.14494323730469, 99.11651611328125, 138.462646484375, 28.653785705566406, 226.88046264648438, -75.79473876953125, 139.54348754882812, 128.07034301757812, 54.558753967285156, 56.01539611816406, 73.98480987548828, 116.59429168701172, -19.007049560546875, -25.88262939453125, 118.3111801147461, 20.1685791015625, 77.20988464355469, 54.812599182128906, 216.40847778320312, 14.855194091796875, 9.58993148803711, 75.59589385986328, -73.53998565673828, -63.83476257324219, 70.64788818359375, 73.89094543457031, 18.646240234375, 177.03103637695312, 58.2005615234375, 40.072784423828125, 136.96656799316406, 147.4308624267578, 186.94802856445312, 164.50112915039062, -25.250701904296875, 108.73196411132812], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000420.npy"}
{"epoch": 0.6167400881057269, "step": 421, "batch_size": 64, "mean": 67.26036071777344, "std": 71.44530487060547, "min": -92.10398864746094, "p10": -7.122518539428707, "median": 54.23846435546875, "p90": 168.1824951171875, "max": 211.7091522216797, "pos_frac": 0.828125, "sample": [158.09176635742188, 2.980205535888672, 168.80555725097656, 37.41578674316406, 160.98397827148438, 39.08277893066406, 211.7091522216797, 13.54559326171875, 60.130165100097656, 10.602725982666016, 146.78746032714844, 57.668453216552734, 24.70043182373047, 130.81675720214844, 96.75422668457031, 53.85115051269531, 126.02975463867188, 10.004661560058594, 51.399330139160156, 61.536895751953125, 193.20587158203125, 160.62228393554688, 51.72203063964844, 53.23482894897461, 54.62577819824219, 61.86885070800781, -14.683647155761719, 101.99855041503906, 43.93389892578125, 172.8113555908203, 113.87265014648438, 166.7286834716797, 174.87159729003906, 128.45237731933594, 0.4231681823730469, -8.681163787841797, 169.52127075195312, 118.49745178222656, 12.657379150390625, 126.97319793701172, -30.841224670410156, -3.4856796264648438, 97.61091613769531, 51.15234375, 120.93032836914062, 66.40167236328125, 176.91619873046875, -0.6621551513671875, 14.052484512329102, 154.13755798339844, 75.26040649414062, -66.62347412109375, 17.718334197998047, -92.10398864746094, 8.01108169555664, -1.470489501953125, 140.65516662597656, 0.2304229736328125, -55.111480712890625, -3.0763626098632812, 22.65953254699707, -34.11738204956055, 103.88258361816406, 36.98328399658203], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000421.npy"}
{"epoch": 0.618208516886931, "step": 422, "batch_size": 64, "mean": 65.76749420166016, "std": 69.72413635253906, "min": -81.93333435058594, "p10": -10.806053161621094, "median": 57.23948287963867, "p90": 143.7778579711914, "max": 298.6029357910156, "pos_frac": 0.84375, "sample": [-11.135482788085938, 65.95609283447266, 217.45973205566406, 165.87701416015625, 34.93415832519531, -31.489181518554688, -38.54591369628906, 39.458038330078125, 34.87594223022461, 298.6029357910156, 143.51573181152344, 142.6848907470703, 86.83197021484375, 39.37345886230469, 159.81173706054688, -10.037384033203125, 27.875560760498047, 37.316795349121094, 77.1279067993164, 135.5467529296875, 91.60983276367188, 92.78587341308594, 42.853782653808594, 44.26899719238281, 16.593505859375, 134.80313110351562, 82.89337921142578, 126.23101043701172, 68.41999053955078, 79.07415771484375, 213.16497802734375, 8.455078125, 74.79541015625, 33.15202713012695, 16.001079559326172, 124.5071029663086, 92.64447784423828, -63.80613708496094, 119.24119567871094, -3.0366744995117188, 17.15353012084961, 136.46923828125, 25.291854858398438, -13.877883911132812, 26.833255767822266, 150.42910766601562, 143.89019775390625, 61.753387451171875, 42.821311950683594, -4.084314346313477, 119.294189453125, 75.31863403320312, 123.96212768554688, 3.72418212890625, 13.34417724609375, 137.17958068847656, 13.68804931640625, 57.68328857421875, 61.53472900390625, 56.795677185058594, -81.93333435058594, 31.152217864990234, 23.221900939941406, -23.218284606933594], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000422.npy"}
{"epoch": 0.6196769456681351, "step": 423, "batch_size": 64, "mean": 61.6230583190918, "std": 78.239501953125, "min": -110.01030731201172, "p10": -27.169914627075194, "median": 58.52078628540039, "p90": 159.51715087890625, "max": 250.28492736816406, "pos_frac": 0.71875, "sample": [15.329597473144531, -17.406539916992188, -31.638999938964844, 13.623252868652344, 20.498085021972656, -11.18316650390625, 102.89334106445312, -32.230934143066406, 128.62176513671875, 132.99093627929688, 13.024017333984375, -27.570667266845703, 222.1184844970703, 98.74156951904297, 26.45022964477539, 84.95637512207031, 98.64844512939453, 221.17515563964844, 143.84507751464844, 121.18960571289062, -34.07069396972656, -68.65597534179688, 24.960494995117188, 8.38602066040039, 56.691375732421875, 97.1359634399414, 159.29071044921875, 129.10110473632812, -110.01030731201172, -38.865753173828125, 55.856910705566406, 36.7647819519043, 37.45775604248047, 110.37149810791016, 196.22235107421875, 6.6358489990234375, 60.350196838378906, 85.9207992553711, -23.93285369873047, -20.02410888671875, 94.87326049804688, 159.61419677734375, -14.5302734375, -24.620773315429688, 250.28492736816406, 49.96181106567383, 119.23050689697266, -9.099096298217773, 90.4815673828125, 105.75053405761719, -26.234825134277344, -25.66124725341797, 208.88331604003906, 162.10250854492188, 45.97172546386719, 153.06484985351562, -8.785993576049805, 80.79688262939453, -16.90575408935547, 72.02952575683594, 75.5382308959961, 108.99226379394531, 77.07632446289062, 121.39961242675781], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000423.npy"}
{"epoch": 0.6211453744493393, "step": 424, "batch_size": 64, "mean": 65.39669799804688, "std": 70.09913635253906, "min": -115.32315826416016, "p10": -14.272481536865234, "median": 61.26641845703125, "p90": 150.33312225341797, "max": 238.51812744140625, "pos_frac": 0.859375, "sample": [-13.371147155761719, 181.64688110351562, 47.87017822265625, 27.960309982299805, 109.07209777832031, 124.86563110351562, 95.13529205322266, 180.44107055664062, 4.345726013183594, 19.908111572265625, 144.17108154296875, -24.553939819335938, 115.5582046508789, 37.65184020996094, 19.771377563476562, 218.41204833984375, 82.53219604492188, 123.80342864990234, 34.61195755004883, 13.313823699951172, 76.6250991821289, -95.73014831542969, 116.51554870605469, 41.430213928222656, 63.311607360839844, 30.22699737548828, 67.17620849609375, 86.08438110351562, 6.816165924072266, -31.147581100463867, 13.782487869262695, 73.05128479003906, 13.058502197265625, 28.33477020263672, 60.9239501953125, 15.6466064453125, 151.11322021484375, -115.32315826416016, 18.73236083984375, 41.45658874511719, 19.975345611572266, -25.318099975585938, -11.556854248046875, 136.98196411132812, 148.32052612304688, 41.594146728515625, 27.77276611328125, -15.489547729492188, -14.658767700195312, 238.51812744140625, 106.76477813720703, 93.12783813476562, 87.06901550292969, 122.89419555664062, 50.22515869140625, 6.491424560546875, 148.5128936767578, 61.60888671875, 74.09059143066406, 141.31231689453125, 163.74642944335938, 181.80044555664062, 86.78448486328125, 109.58941650390625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000424.npy"}
{"epoch": 0.6226138032305433, "step": 425, "batch_size": 64, "mean": 43.77002716064453, "std": 83.19957733154297, "min": -174.35218811035156, "p10": -41.39962692260742, "median": 23.42754364013672, "p90": 178.31704406738282, "max": 246.20156860351562, "pos_frac": 0.609375, "sample": [-15.531989097595215, 8.5027494430542, -43.400474548339844, -57.83555603027344, -39.966552734375, 154.13479614257812, -11.03104019165039, 114.98828125, 55.478179931640625, 114.85324096679688, 98.42134094238281, 23.248275756835938, 23.6068115234375, -37.36609649658203, 23.643678665161133, -53.20489501953125, -14.745040893554688, 33.905517578125, 205.65997314453125, 72.45306396484375, -14.56658935546875, -2.4177207946777344, 36.35112762451172, -71.17240905761719, -22.110275268554688, 4.926349639892578, 44.29254150390625, -48.89366912841797, 12.743110656738281, 200.6202392578125, -24.241928100585938, -21.03985595703125, -30.06310272216797, 1.7763710021972656, 172.73406982421875, 14.42529296875, 130.21597290039062, -1.0024566650390625, -2.1822586059570312, 180.70974731445312, -19.505334854125977, 89.11798095703125, -3.4278717041015625, -4.166648864746094, 201.82666015625, 194.6958465576172, 52.09654235839844, 48.29784393310547, 116.84745025634766, 107.89375305175781, 88.54788970947266, 185.25469970703125, 105.40918731689453, -27.93218231201172, 51.70637512207031, 118.88055419921875, 125.06532287597656, 246.20156860351562, -42.01380157470703, -27.504066467285156, 73.18299865722656, -174.35218811035156, 65.31364440917969, 12.926923751831055], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000425.npy"}
{"epoch": 0.6240822320117474, "step": 426, "batch_size": 64, "mean": 64.79296875, "std": 64.8238754272461, "min": -110.89173889160156, "p10": -22.529104614257808, "median": 62.809499740600586, "p90": 158.76752319335938, "max": 218.70167541503906, "pos_frac": 0.828125, "sample": [63.12665939331055, 12.037879943847656, 28.962844848632812, 10.863143920898438, -40.9450798034668, 79.50788116455078, -18.264739990234375, 164.05661010742188, 218.70167541503906, 72.02576446533203, 157.02850341796875, 57.13982391357422, 96.76801300048828, 56.39394760131836, 49.202728271484375, 60.2547492980957, 58.3967399597168, 43.263267517089844, 162.98162841796875, 168.42623901367188, 103.58185577392578, 86.05990600585938, 90.25138092041016, 94.75695037841797, -2.05126953125, 25.89175796508789, 56.35051345825195, -110.89173889160156, 72.77364349365234, 149.7706756591797, 104.3390121459961, 178.311767578125, 45.65690612792969, 78.47830200195312, 123.73635864257812, 100.03813934326172, 96.27731323242188, 21.942398071289062, 18.542984008789062, -72.18547058105469, 27.505355834960938, 125.6810302734375, 130.23452758789062, -3.0510635375976562, 159.7149658203125, -5.0629425048828125, -27.356090545654297, 21.000167846679688, -32.68653869628906, 92.63955688476562, 77.88436889648438, 96.9094009399414, 18.764089584350586, 47.269493103027344, 122.63831329345703, 37.242488861083984, 46.42401123046875, 159.5128173828125, 62.492340087890625, -24.356689453125, 132.75497436523438, 77.19939422607422, -33.52079772949219, 75.35685729980469], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000426.npy"}
{"epoch": 0.6255506607929515, "step": 427, "batch_size": 64, "mean": 60.51248550415039, "std": 59.453407287597656, "min": -90.23011016845703, "p10": -5.588325500488278, "median": 51.6998405456543, "p90": 142.5576889038086, "max": 213.64120483398438, "pos_frac": 0.875, "sample": [86.52481079101562, 20.39862060546875, -7.2430267333984375, -1.72735595703125, 133.6736297607422, 76.31617736816406, 108.60446166992188, -90.23011016845703, 41.257537841796875, 42.351478576660156, 8.687454223632812, 103.77420043945312, 5.535823822021484, 37.68422317504883, 155.56761169433594, 32.45111846923828, 19.861907958984375, 90.47789001464844, -31.802825927734375, 58.571258544921875, 109.86871337890625, -33.83758544921875, 52.28297424316406, 50.90399169921875, 113.65837860107422, 16.725433349609375, 89.7275390625, 140.26988220214844, 48.218963623046875, 81.43071746826172, 132.0328826904297, 213.64120483398438, 95.79381561279297, 70.58828735351562, 25.732837677001953, 135.027587890625, 54.176170349121094, 38.15736389160156, 129.1707305908203, 79.15242004394531, 151.58389282226562, 22.579299926757812, 27.842857360839844, 3.8641357421875, 26.165481567382812, 91.35086059570312, 23.614463806152344, 51.11670684814453, 143.53817749023438, -38.34894561767578, 23.237945556640625, 39.467124938964844, 146.490966796875, 78.10067749023438, 169.20901489257812, 4.319793701171875, -14.925346374511719, 76.86797332763672, 1.5698051452636719, 52.85369110107422, 166.1847381591797, 91.1172866821289, 26.454437255859375, -24.913162231445312], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000427.npy"}
{"epoch": 0.6270190895741556, "step": 428, "batch_size": 64, "mean": 48.68279266357422, "std": 75.49652099609375, "min": -174.43063354492188, "p10": -47.72638053894042, "median": 43.83076477050781, "p90": 145.4643112182617, "max": 222.13253784179688, "pos_frac": 0.75, "sample": [61.57679748535156, -60.03642272949219, -7.861808776855469, -79.64481353759766, 144.95724487304688, 43.17900085449219, 125.6414794921875, 216.43133544921875, 80.35807037353516, 135.63156127929688, 148.8291473388672, -1.123992919921875, 47.96125793457031, 26.1990966796875, 58.82151794433594, -1.1516265869140625, -5.486419677734375, 111.12661743164062, 30.85840606689453, 60.827117919921875, 25.231735229492188, 21.777313232421875, 70.70175170898438, 22.052730560302734, 130.0826873779297, 87.59174346923828, -3.5001678466796875, 37.697021484375, 36.289772033691406, 36.86854553222656, 15.91817855834961, 44.48252868652344, 148.8826141357422, 59.98658752441406, 145.68162536621094, 14.971841812133789, 47.507591247558594, 81.51361083984375, 4.377830505371094, -9.854164123535156, -4.764793395996094, 48.46836853027344, 61.13470458984375, -39.30763244628906, 19.972389221191406, 50.566375732421875, -52.63154602050781, -51.334415435791016, 107.51287841796875, 196.60995483398438, 114.70323181152344, 60.045860290527344, -12.2999267578125, 30.045936584472656, 121.98179626464844, 121.07058715820312, 182.7015380859375, 222.13253784179688, -107.14352416992188, 9.229976654052734, 98.76029968261719, -65.77936553955078, 23.09930419921875, -174.43063354492188], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000428.npy"}
{"epoch": 0.6284875183553598, "step": 429, "batch_size": 64, "mean": 68.02421569824219, "std": 78.03314971923828, "min": -83.29788208007812, "p10": -24.518481636047362, "median": 62.92340087890625, "p90": 171.8238723754883, "max": 249.73863220214844, "pos_frac": 0.796875, "sample": [93.76484680175781, -18.807785034179688, 13.160762786865234, 201.5391387939453, 71.71928405761719, 88.25180053710938, -66.98265075683594, -24.31684684753418, 100.7266845703125, -9.4515380859375, 10.837394714355469, 33.52992248535156, 42.011436462402344, 54.73069381713867, -26.572372436523438, 134.4304656982422, 45.155914306640625, -5.816173553466797, 197.64637756347656, 127.6326904296875, 69.03077697753906, 141.49838256835938, 30.464317321777344, 7.8698577880859375, 10.104904174804688, 18.213699340820312, 89.6422119140625, 6.157848358154297, 180.03976440429688, 237.99119567871094, 11.247184753417969, 49.830322265625, 123.68727111816406, 32.858245849609375, -83.29788208007812, -24.604896545410156, -67.69584655761719, 110.60337829589844, 133.8897705078125, 10.272537231445312, 132.6681671142578, 166.14952087402344, 78.9537582397461, 142.0775146484375, 155.53135681152344, 37.55145263671875, 235.44451904296875, 38.934417724609375, 68.26832580566406, -25.374855041503906, 65.7340087890625, 125.69747924804688, 128.8704833984375, 60.34039306640625, -9.446094512939453, -47.21846008300781, -19.679548263549805, 66.40354919433594, 174.2557373046875, 60.15223693847656, 99.67536163330078, 249.73863220214844, 65.50640869140625, 152.32247924804688], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000429.npy"}
{"epoch": 0.6299559471365639, "step": 430, "batch_size": 64, "mean": 50.407920837402344, "std": 67.27767944335938, "min": -85.41578674316406, "p10": -25.24410209655761, "median": 50.786346435546875, "p90": 133.60447387695314, "max": 215.1168212890625, "pos_frac": 0.734375, "sample": [5.1484832763671875, -56.01091003417969, -2.7110137939453125, -57.70826721191406, 136.74810791015625, 68.20420837402344, 70.44668579101562, 126.17611694335938, 83.81399536132812, 124.31759643554688, 77.16809844970703, 41.18121337890625, 0.0067138671875, -70.4417724609375, 98.62163543701172, -27.286319732666016, -54.85868835449219, 177.53924560546875, 106.32609558105469, 31.89517593383789, -85.41578674316406, 95.38424682617188, -10.2115478515625, 115.50999450683594, 3.7974205017089844, 67.19739532470703, -70.54507446289062, 178.08726501464844, 87.724853515625, 58.25196838378906, 38.50144958496094, -2.675189971923828, 90.12567138671875, 102.28448486328125, 110.56244659423828, 61.63288879394531, -10.6099853515625, -20.478927612304688, 8.388511657714844, 129.46975708007812, 40.62986755371094, 41.98625946044922, 78.2970962524414, -18.79150390625, 48.428565979003906, 158.66534423828125, 18.041259765625, 103.45077514648438, 86.79780578613281, 66.19424438476562, 66.7572021484375, -2.744415283203125, 79.84600830078125, 53.144126892089844, -1.526611328125, -17.23052978515625, 34.204715728759766, -10.974784851074219, 135.37649536132812, 179.73556518554688, 215.1168212890625, 13.951057434082031, 28.579452514648438, 2.6139163970947266], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000430.npy"}
{"epoch": 0.631424375917768, "step": 431, "batch_size": 64, "mean": 60.23283767700195, "std": 71.6241683959961, "min": -80.077880859375, "p10": -35.87823638916015, "median": 64.37686920166016, "p90": 156.26074829101563, "max": 242.52972412109375, "pos_frac": 0.78125, "sample": [-57.161048889160156, 26.968215942382812, 6.655998229980469, 134.1592559814453, 160.1170654296875, 94.47052001953125, 110.17245483398438, -27.927364349365234, 81.41191864013672, 69.40684509277344, 119.07380676269531, -17.714630126953125, 12.82357406616211, -55.08872985839844, 86.73155212402344, 52.62847900390625, 104.51103973388672, 100.08943939208984, 157.13385009765625, 74.884033203125, 62.94630432128906, 88.01588439941406, 74.80628967285156, 137.26080322265625, -18.706960678100586, -38.018707275390625, 154.2235107421875, 78.91581726074219, 20.088836669921875, 81.05233764648438, 124.23272705078125, 42.84989547729492, 209.39419555664062, -7.459022521972656, 181.31857299804688, -32.42936706542969, 104.8111572265625, -37.3563232421875, 113.49169921875, -73.62117004394531, 57.70802307128906, 73.54507446289062, 242.52972412109375, -4.197917938232422, -32.117347717285156, 7.8964691162109375, 53.14210510253906, 82.96083068847656, 15.754322052001953, 65.80743408203125, -46.38255310058594, 36.84217834472656, 10.338245391845703, 172.01699829101562, 32.22832107543945, 103.39329528808594, 11.634273529052734, 50.0626335144043, -80.077880859375, 58.88380432128906, 148.38027954101562, 171.7135772705078, 67.18844604492188, 56.488372802734375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000431.npy"}
{"epoch": 0.6328928046989721, "step": 432, "batch_size": 64, "mean": 72.14842987060547, "std": 72.28042602539062, "min": -69.38639831542969, "p10": -15.919144439697261, "median": 67.587646484375, "p90": 154.86210784912112, "max": 272.0621337890625, "pos_frac": 0.84375, "sample": [51.5465087890625, -8.814437866210938, 123.6865234375, 133.11892700195312, 33.45281219482422, -1.9674034118652344, 43.091705322265625, 3.6740875244140625, 196.5880126953125, 176.30743408203125, 44.475990295410156, -20.442001342773438, 137.31613159179688, 101.3820571899414, 0.9233551025390625, 272.0621337890625, 80.72277069091797, 142.03530883789062, 50.44293212890625, 21.653045654296875, 13.458122253417969, 142.80502319335938, 114.37321472167969, -17.78801727294922, 12.328948974609375, 187.62396240234375, 68.16484069824219, -69.38639831542969, 27.209991455078125, 73.90270233154297, 26.443740844726562, -54.808135986328125, -57.459144592285156, 123.74858093261719, -11.558441162109375, 128.73321533203125, 107.11956787109375, 8.722091674804688, 112.866943359375, 79.42936706542969, 118.21025848388672, -23.43170166015625, 114.72785949707031, 194.15872192382812, 1.2179107666015625, 67.01045227050781, 99.38262939453125, 4.473075866699219, 137.02447509765625, 48.93100357055664, 93.62713623046875, 141.4010772705078, 84.97610473632812, 51.72734069824219, 3.5831985473632812, 42.01920700073242, 156.3758544921875, 107.64212799072266, 208.68589782714844, 63.73223876953125, 57.5894775390625, -33.65330505371094, 129.57235717773438, 151.3300323486328], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000432.npy"}
{"epoch": 0.6343612334801763, "step": 433, "batch_size": 64, "mean": 57.14216995239258, "std": 75.06452178955078, "min": -171.7236785888672, "p10": -28.646405029296872, "median": 50.06402587890625, "p90": 160.08431091308594, "max": 201.97134399414062, "pos_frac": 0.703125, "sample": [71.23133850097656, 143.10447692871094, -20.015853881835938, -8.41534423828125, 191.7432403564453, 99.98646545410156, -31.173009872436523, -29.893692016601562, 201.97134399414062, 160.6822509765625, 66.54983520507812, 24.770477294921875, -9.599105834960938, 32.954925537109375, 104.73905944824219, 13.794723510742188, 40.13551330566406, 127.83696746826172, 122.0883560180664, -171.7236785888672, -48.091552734375, 152.1614990234375, 161.04771423339844, -25.353775024414062, 158.68911743164062, 40.06927490234375, 185.87832641601562, 166.46022033691406, 68.0805435180664, 93.17872619628906, 40.674896240234375, 121.96653747558594, -8.549148559570312, 36.06695556640625, -1.380767822265625, 67.3939437866211, 50.19597625732422, -64.23712158203125, 84.95319366455078, 49.93207550048828, 191.72671508789062, 138.01319885253906, -33.5927848815918, -34.42225646972656, -25.736068725585938, 133.89080810546875, 84.71903991699219, -11.099456787109375, -4.266529083251953, 103.4163818359375, 46.933746337890625, 18.251144409179688, 129.21524047851562, -5.7600812911987305, 104.77493286132812, -7.893638610839844, 52.786773681640625, 18.704803466796875, -15.87713623046875, 128.6718292236328, 59.97801208496094, 72.34073638916016, 15.971639633178711, 36.44696044921875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000433.npy"}
{"epoch": 0.6358296622613803, "step": 434, "batch_size": 64, "mean": 59.13129806518555, "std": 77.37425994873047, "min": -82.39614868164062, "p10": -34.37220840454101, "median": 59.486690521240234, "p90": 152.25554504394535, "max": 349.5191650390625, "pos_frac": 0.765625, "sample": [167.65301513671875, 100.69816589355469, 349.5191650390625, 10.026092529296875, -50.100318908691406, -82.39614868164062, 35.48382568359375, 115.99021911621094, -42.28312683105469, -25.267059326171875, 121.18802642822266, -20.991302490234375, 37.38125991821289, 75.68937683105469, 105.78651428222656, -35.80279541015625, 60.39623260498047, 27.372482299804688, 66.34226989746094, 62.811859130859375, 130.8092803955078, -5.570499420166016, 63.84208679199219, -2.7781829833984375, 113.68516540527344, 2.4581642150878906, 73.15791320800781, 49.237335205078125, 58.5771484375, 138.49440002441406, -73.88980102539062, 201.19680786132812, 103.5068359375, -31.03417205810547, 113.267578125, 170.55535888671875, -24.684741973876953, 15.430049896240234, -17.185806274414062, 120.01974487304688, 6.892120361328125, 25.745445251464844, 5.7356109619140625, -43.259971618652344, 145.43539428710938, 107.625732421875, 115.09734344482422, 86.0870132446289, 22.03546142578125, 68.90553283691406, 21.123207092285156, -22.84439468383789, 155.178466796875, 36.28128433227539, 1.1482391357421875, 37.21119689941406, 172.13497924804688, 102.1316909790039, 76.19267272949219, 144.4225311279297, 158.14309692382812, 83.18079376220703, 51.77888107299805, -50.571746826171875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000434.npy"}
{"epoch": 0.6372980910425844, "step": 435, "batch_size": 64, "mean": 65.7310791015625, "std": 72.73974609375, "min": -104.35537719726562, "p10": -22.444498825073243, "median": 61.45952796936035, "p90": 174.34404296875002, "max": 248.96461486816406, "pos_frac": 0.828125, "sample": [248.96461486816406, -29.057464599609375, -22.639095306396484, 19.87390899658203, 178.67233276367188, 130.51016235351562, 93.23345947265625, 30.680404663085938, 39.40095520019531, 46.275184631347656, 160.99270629882812, -55.2689208984375, 20.22703742980957, 102.2150650024414, 171.794921875, 175.4365234375, 75.32625579833984, 62.113861083984375, 60.80519485473633, -21.990440368652344, 1.281158447265625, -7.335296630859375, 54.405548095703125, 82.10627746582031, 34.02618408203125, 11.20029067993164, 108.33134460449219, 107.47411346435547, -104.35537719726562, 36.3665885925293, 63.325164794921875, 6.610637664794922, 90.64320373535156, 3.2427215576171875, 127.531494140625, -1.4493408203125, 100.12269592285156, -34.95433807373047, 18.032623291015625, -27.763320922851562, 134.62176513671875, 64.24102020263672, 98.85722351074219, 4.816043853759766, 126.17886352539062, 119.40733337402344, -45.81744384765625, 97.8970947265625, 66.6815185546875, 34.16619873046875, 239.53665161132812, 46.019588470458984, 76.86338806152344, 15.829996109008789, 110.78671264648438, 181.47450256347656, 202.89437866210938, 40.50579833984375, 64.23216247558594, 199.29885864257812, 57.10818099975586, 56.513427734375, 65.70274353027344, -7.4359283447265625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000435.npy"}
{"epoch": 0.6387665198237885, "step": 436, "batch_size": 64, "mean": 75.75979614257812, "std": 63.03631591796875, "min": -64.19618225097656, "p10": 1.9842391967773438, "median": 80.06029510498047, "p90": 164.9123077392578, "max": 194.62191772460938, "pos_frac": 0.90625, "sample": [15.511831283569336, -57.07902526855469, 63.7266845703125, 174.03857421875, 173.8343963623047, 98.32766723632812, 68.57276916503906, 38.015846252441406, 149.9133758544922, 55.58758544921875, 11.613658905029297, 101.4997329711914, 95.90909576416016, -64.19618225097656, 141.04067993164062, 146.1803436279297, -2.0230255126953125, 43.742828369140625, 82.93557739257812, 92.08174133300781, 100.14276885986328, 81.84458923339844, 45.36236572265625, 54.102813720703125, 164.99154663085938, 38.06459045410156, 102.65545654296875, 183.54904174804688, 89.77153015136719, 77.18644714355469, 194.62191772460938, 176.1125946044922, 69.03834533691406, 125.93882751464844, 42.40283966064453, 64.62567138671875, 84.3016586303711, 6.880195617675781, 162.6754150390625, 78.2760009765625, 69.54232788085938, 86.60479736328125, 89.46882629394531, -23.697528839111328, 98.74893951416016, 60.43682861328125, 20.235130310058594, -13.426055908203125, 141.42958068847656, -58.06739807128906, 171.37728881835938, 99.41170501708984, 120.01654052734375, 2.0434341430664062, 90.97499084472656, 26.116622924804688, 164.7274169921875, 12.967582702636719, 2.467376708984375, 1.9588699340820312, 2.981975555419922, 56.31578063964844, 92.9267349243164, 161.28643798828125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000436.npy"}
{"epoch": 0.6402349486049926, "step": 437, "batch_size": 64, "mean": 50.75785446166992, "std": 59.347320556640625, "min": -80.70347595214844, "p10": -34.6760902404785, "median": 63.88522911071777, "p90": 121.29343948364259, "max": 168.2631378173828, "pos_frac": 0.796875, "sample": [94.83309936523438, -0.13604736328125, 17.742340087890625, 91.80635833740234, 88.39012145996094, -58.77184295654297, 76.79095458984375, 48.7255859375, 137.31753540039062, 12.451202392578125, 27.173702239990234, -39.719261169433594, 41.849647521972656, 82.07447052001953, 24.57115936279297, 159.58822631835938, 72.31562805175781, -74.98735046386719, 124.96273803710938, 52.48358917236328, 18.05516815185547, 14.540092468261719, -0.9466819763183594, 89.05628967285156, 6.7709503173828125, 87.41622924804688, 65.06009674072266, -44.616111755371094, 88.34196472167969, -13.798229217529297, 99.56983184814453, 17.699310302734375, 118.57855224609375, 10.479225158691406, 168.2631378173828, -64.97164916992188, 40.230682373046875, -17.568788528442383, 105.45248413085938, 73.47504425048828, 98.45117950439453, -80.70347595214844, 106.80343627929688, 110.56816101074219, 68.79212951660156, 62.71036148071289, 81.4523696899414, 86.2461166381836, 144.45242309570312, 70.08419799804688, -52.55812454223633, 79.18218994140625, 29.309677124023438, 60.753143310546875, 120.80135345458984, 21.995040893554688, 131.16970825195312, 12.883407592773438, 28.887405395507812, 121.50433349609375, -21.430191040039062, -22.90869140625, 78.08023834228516, 71.42677307128906], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000437.npy"}
{"epoch": 0.6417033773861968, "step": 438, "batch_size": 64, "mean": 60.790897369384766, "std": 65.49947357177734, "min": -77.3135986328125, "p10": -32.007057189941406, "median": 64.91218948364258, "p90": 141.28597717285157, "max": 185.34756469726562, "pos_frac": 0.78125, "sample": [107.98959350585938, 125.59678649902344, 134.48631286621094, 76.27045440673828, 103.48320007324219, -36.38896179199219, 105.16024780273438, -57.46763610839844, -31.103805541992188, 154.27308654785156, 61.28315353393555, 62.83049011230469, -25.274402618408203, 150.78073120117188, 121.13455200195312, 122.41470336914062, 139.7196502685547, 141.95726013183594, -7.218116760253906, -23.080703735351562, -77.3135986328125, 158.32742309570312, 74.724853515625, 130.38880920410156, 164.75445556640625, 84.87166595458984, 105.19476318359375, 35.66015625, -45.304931640625, 21.39720916748047, -17.25677490234375, 24.325824737548828, 185.34756469726562, 64.04975128173828, 98.22628784179688, 43.27464294433594, 99.75646209716797, 12.801406860351562, 46.55487060546875, -3.54248046875, 69.46493530273438, 60.17028045654297, 28.84381103515625, 74.984375, -32.3941650390625, 65.77462768554688, 48.528961181640625, 81.11674499511719, 9.146820068359375, 120.65068817138672, 25.740074157714844, 22.381263732910156, 168.70767211914062, 69.7188720703125, 15.365455627441406, 52.27410888671875, 42.536888122558594, -50.848968505859375, 111.91192626953125, 130.08456420898438, 83.36268615722656, 125.76757049560547, -60.50154113769531, -5.2552642822265625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000438.npy"}
{"epoch": 0.6431718061674009, "step": 439, "batch_size": 64, "mean": 56.781715393066406, "std": 70.68641662597656, "min": -105.25611114501953, "p10": -21.161859703063964, "median": 44.626277923583984, "p90": 143.5169448852539, "max": 240.16094970703125, "pos_frac": 0.828125, "sample": [42.51954650878906, 89.15750122070312, -20.704448699951172, 153.38844299316406, 44.399200439453125, -6.132846832275391, 45.249298095703125, 114.46884155273438, 41.97301483154297, -75.9267807006836, 14.325843811035156, 210.5631103515625, 2.1326141357421875, 160.89666748046875, 31.414199829101562, 128.95932006835938, 111.80278015136719, 7.766395568847656, 59.63396453857422, 37.581268310546875, 5.67820930480957, 141.78237915039062, 86.21399688720703, 20.042221069335938, 49.92015838623047, 127.42926025390625, 11.032524108886719, -47.88177490234375, -21.316682815551758, 40.79051971435547, 144.2603302001953, 7.140129089355469, 56.207950592041016, 22.68328094482422, 240.16094970703125, 39.79298400878906, 88.87505340576172, 140.9598388671875, 67.83772277832031, 88.72813415527344, 1.0783576965332031, -58.42608642578125, -23.04425048828125, 44.853355407714844, 32.79548645019531, 42.88349533081055, 19.982982635498047, 156.21006774902344, 125.93692016601562, -14.862129211425781, 141.12911987304688, 92.16543579101562, 67.37269592285156, 224.23703002929688, 6.592597961425781, 33.18471908569336, 57.72344970703125, -49.08002471923828, 44.89910125732422, 82.22474670410156, 96.53418731689453, 131.89022827148438, -105.25611114501953, -20.80060577392578], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000439.npy"}
{"epoch": 0.644640234948605, "step": 440, "batch_size": 64, "mean": 66.29185485839844, "std": 72.9873275756836, "min": -82.60221862792969, "p10": -21.800879287719717, "median": 63.67947769165039, "p90": 156.46375274658206, "max": 289.72113037109375, "pos_frac": 0.828125, "sample": [160.73617553710938, 88.33111572265625, 165.53732299804688, 23.780410766601562, 87.93463134765625, 21.041366577148438, -8.647193908691406, 18.186302185058594, 55.23225784301758, 85.10899353027344, 0.0211639404296875, 48.413177490234375, 132.39715576171875, 97.635498046875, 289.72113037109375, -68.66444396972656, 94.5185775756836, 114.16426086425781, 103.95549774169922, 44.62776184082031, 192.74533081054688, 145.34396362304688, 43.86486053466797, 146.49476623535156, 55.06553649902344, 85.66988372802734, -11.885536193847656, 172.1702880859375, 20.219261169433594, -10.627655029296875, -38.35749816894531, 192.06541442871094, 144.44363403320312, 60.012939453125, 108.28799438476562, -31.677574157714844, 189.25228881835938, -49.26902389526367, 44.222259521484375, 0.27501869201660156, 135.54566955566406, 46.09661865234375, 78.76112365722656, 89.17308807373047, 20.993736267089844, 36.206912994384766, -82.60221862792969, 67.34601593017578, -68.2939453125, 144.44808959960938, 122.76051330566406, 127.79530334472656, 115.2953109741211, 91.52127075195312, 20.586750030517578, 31.071380615234375, 4.794734954833984, -26.050312042236328, 72.51937866210938, 22.294845581054688, 77.66829681396484, 102.1347427368164, -7.6089630126953125, 7.8731231689453125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000440.npy"}
{"epoch": 0.6461086637298091, "step": 441, "batch_size": 64, "mean": 69.98018646240234, "std": 65.57110595703125, "min": -31.318161010742188, "p10": 8.56330909729004, "median": 54.03856658935547, "p90": 163.45076141357424, "max": 327.2849426269531, "pos_frac": 0.921875, "sample": [121.55328369140625, 14.614825248718262, 45.14862060546875, 71.13948822021484, 180.62869262695312, 127.41188049316406, 48.324798583984375, 63.87754821777344, 32.061912536621094, 2.141925811767578, 13.570354461669922, 80.09632873535156, 164.54698181152344, 97.93963623046875, 80.06224822998047, 63.00726318359375, 31.39421844482422, 38.402374267578125, 58.49037170410156, 47.958526611328125, -1.8968524932861328, 126.86392974853516, 210.70083618164062, 160.89291381835938, 34.728965759277344, 11.248542785644531, 64.86714172363281, 83.74862670898438, 21.83080291748047, 327.2849426269531, 8.149791717529297, -31.318161010742188, 58.02832794189453, 195.0174102783203, 95.9911117553711, 12.253162384033203, 50.048805236816406, 70.9143295288086, 176.812255859375, 38.40167999267578, 111.268798828125, 24.312477111816406, 22.305923461914062, 11.024063110351562, 31.434173583984375, 34.30088806152344, 136.87181091308594, 159.99685668945312, 16.91827392578125, 30.249908447265625, -16.612686157226562, -0.17193984985351562, 105.42506408691406, 60.179473876953125, 20.14636993408203, 121.3853759765625, 29.324951171875, -1.5689239501953125, 46.49707794189453, 71.11815643310547, 98.5701904296875, 9.528182983398438, 71.26637268066406, 188.0215301513672], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000441.npy"}
{"epoch": 0.6475770925110133, "step": 442, "batch_size": 64, "mean": 76.8173828125, "std": 68.49284362792969, "min": -46.328765869140625, "p10": -1.2009801864624006, "median": 72.78909301757812, "p90": 178.07024993896488, "max": 223.6913299560547, "pos_frac": 0.890625, "sample": [79.34210205078125, 19.342453002929688, 200.54946899414062, 13.71826171875, 68.71835327148438, 186.68875122070312, 203.72067260742188, 84.32591247558594, 125.73057556152344, 61.47636032104492, 87.24150085449219, 182.46192932128906, 14.211036682128906, 97.32506561279297, 69.27664947509766, 45.38092803955078, 33.67070007324219, 65.11556243896484, -40.138885498046875, 18.157569885253906, 11.476036071777344, 122.95024108886719, 97.5145263671875, 147.9561767578125, 11.811088562011719, 84.45472717285156, -1.9163360595703125, -3.7289581298828125, 53.09381103515625, 123.61261749267578, 97.97185516357422, 0.4681835174560547, 66.29484558105469, -28.54029083251953, 138.44186401367188, 75.216796875, 22.130308151245117, 83.35128784179688, 121.48857879638672, 162.92205810546875, 1.355621337890625, 15.873832702636719, 118.5332260131836, -46.2461051940918, -22.543045043945312, 167.822998046875, 35.31016540527344, 50.415679931640625, 223.6913299560547, 126.96033477783203, 93.81573486328125, 30.200790405273438, 7.3548126220703125, 24.477264404296875, 74.32130432128906, -46.328765869140625, 127.3473892211914, 133.2891845703125, 215.26638793945312, 71.25688171386719, 217.69789123535156, 83.21035766601562, 143.91799926757812, 66.0270767211914], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000442.npy"}
{"epoch": 0.6490455212922174, "step": 443, "batch_size": 64, "mean": 56.881134033203125, "std": 64.20823669433594, "min": -97.75401306152344, "p10": -12.935807991027831, "median": 63.9353141784668, "p90": 130.64302062988284, "max": 199.3967742919922, "pos_frac": 0.828125, "sample": [194.46844482421875, 18.75261116027832, 143.65676879882812, 92.39752197265625, 16.952133178710938, 33.244720458984375, 32.48930740356445, -7.095787048339844, 37.90641784667969, 39.71466827392578, -93.90057373046875, -41.3262939453125, -12.016271591186523, -14.160804748535156, 82.12384033203125, -62.538352966308594, 125.92337036132812, 15.327186584472656, 184.24237060546875, 57.36465072631836, 199.3967742919922, 64.51615905761719, 33.95314025878906, 99.54520416259766, 79.89508056640625, 73.47824096679688, -97.75401306152344, 121.99951171875, 118.68806457519531, -13.32989501953125, 78.26942443847656, 67.4287109375, 50.3363037109375, 63.354469299316406, 76.16513061523438, 28.8670654296875, 72.26246643066406, 17.262069702148438, 66.75497436523438, -11.713790893554688, 28.861709594726562, 45.650115966796875, -9.47711181640625, 89.23873138427734, 31.657302856445312, 34.69781494140625, 76.06193542480469, 78.0859375, 39.04098892211914, 132.47581481933594, 162.71463012695312, 7.874561309814453, 65.08407592773438, 100.86318969726562, 56.55116271972656, 116.76922607421875, 75.1641616821289, 144.82470703125, 126.36650085449219, 125.67860412597656, 85.90379333496094, 7.635955810546875, -96.3744888305664, 82.14218139648438], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000443.npy"}
{"epoch": 0.6505139500734214, "step": 444, "batch_size": 64, "mean": 55.769325256347656, "std": 72.77098846435547, "min": -106.52452087402344, "p10": -36.52985153198242, "median": 49.90106010437012, "p90": 150.5919967651367, "max": 211.87741088867188, "pos_frac": 0.8125, "sample": [-56.43256378173828, 50.826171875, 21.248138427734375, 106.29791259765625, -36.08844757080078, 62.18025207519531, 8.96152114868164, 204.73873901367188, 136.10902404785156, -62.87959289550781, -9.80633544921875, 78.72874450683594, 10.968597412109375, 107.78478240966797, 151.31332397460938, 13.728363037109375, 8.30303955078125, 64.21617126464844, 140.9165496826172, -41.45806884765625, 49.0603141784668, -20.673477172851562, 78.59384155273438, 171.2019500732422, 18.119197845458984, -8.369094848632812, 112.32368469238281, 149.13890075683594, 33.49391174316406, 26.297409057617188, 39.19517517089844, 211.87741088867188, 142.21498107910156, 7.998706817626953, 66.59915161132812, 144.02003479003906, 88.29979705810547, 10.792259216308594, 111.84693908691406, 169.9495849609375, 10.342247009277344, 138.4709014892578, 113.14932250976562, 50.74180603027344, 84.63534545898438, 123.96754455566406, 76.89312744140625, 6.885490417480469, -106.52452087402344, 107.77546691894531, 6.4608001708984375, -7.054798126220703, 17.86431121826172, 26.957042694091797, 87.2047119140625, 7.473148345947266, 0.9040412902832031, 114.91281127929688, -89.88833618164062, -36.719024658203125, 156.00047302246094, -48.682281494140625, 14.615676879882812, 151.21475219726562], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000444.npy"}
{"epoch": 0.6519823788546255, "step": 445, "batch_size": 64, "mean": 64.85956573486328, "std": 64.95843505859375, "min": -88.54615783691406, "p10": -14.93157997131347, "median": 62.76518630981445, "p90": 160.6088958740235, "max": 224.181640625, "pos_frac": 0.84375, "sample": [62.340736389160156, 4.178142547607422, 165.9085693359375, 185.90431213378906, -4.79669189453125, -60.69475555419922, 91.016845703125, 178.39036560058594, 93.70033264160156, 41.90369415283203, 70.84095764160156, 91.84111022949219, 49.349334716796875, 39.12846755981445, -8.432544708251953, -88.54615783691406, 169.30648803710938, 62.264869689941406, -24.659080505371094, 72.32803344726562, 82.72728729248047, -5.4822235107421875, 7.776447296142578, 30.204727172851562, 5.156429290771484, 34.13969421386719, 224.181640625, 96.06258392333984, 21.023666381835938, 97.1905746459961, 136.16116333007812, 82.1959228515625, 68.3868637084961, 147.54525756835938, 136.3555450439453, 146.989990234375, 150.20419311523438, 88.54263305664062, -20.105331420898438, 115.25680541992188, 4.589813232421875, 48.42366027832031, 65.64695739746094, -39.06209945678711, 33.447967529296875, 184.80001831054688, 31.700302124023438, 68.5191421508789, 89.15081787109375, 52.143619537353516, 14.49664306640625, 46.44842529296875, 16.324007034301758, 63.18963623046875, 31.705459594726562, 81.71440887451172, 28.475860595703125, 95.59405517578125, 165.06805419921875, -22.4320068359375, -17.716880798339844, 113.37335205078125, 118.78729248046875, 40.836669921875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000445.npy"}
{"epoch": 0.6534508076358296, "step": 446, "batch_size": 64, "mean": 58.844242095947266, "std": 65.3250732421875, "min": -112.68881225585938, "p10": -17.44843940734863, "median": 61.02729034423828, "p90": 147.7141586303711, "max": 200.44036865234375, "pos_frac": 0.796875, "sample": [25.352294921875, 61.799285888671875, 10.122589111328125, -1.5872650146484375, 88.54954528808594, 72.8001708984375, 172.28421020507812, 153.09103393554688, 81.41099548339844, 92.87352752685547, 117.2921142578125, 11.014678955078125, 4.952392578125, 46.7034912109375, -16.178447723388672, 7.342414855957031, 195.4104766845703, 100.33380126953125, 63.61968994140625, 128.36483764648438, 60.25529479980469, -9.758529663085938, 30.184188842773438, 59.82563400268555, -60.90039825439453, 98.33985900878906, 95.14588928222656, 135.15579223632812, 126.41022491455078, 53.05844497680664, 176.28810119628906, -22.119949340820312, 61.89170837402344, -5.357738494873047, -3.603771209716797, 71.1783218383789, 13.518814086914062, 166.29078674316406, 83.701904296875, 200.44036865234375, -36.72248840332031, -112.68881225585938, 75.418701171875, 126.48645782470703, 32.04493713378906, -23.84246063232422, 119.45610046386719, 66.50161743164062, 23.755691528320312, 80.78800964355469, 20.50811004638672, -5.199310302734375, -17.992721557617188, 36.11464309692383, 6.607841491699219, 72.74765014648438, 4.714942932128906, 148.04733276367188, 18.840423583984375, 71.34972381591797, -20.467620849609375, 146.93675231933594, 56.13459014892578, 130.99462890625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000446.npy"}
{"epoch": 0.6549192364170338, "step": 447, "batch_size": 64, "mean": 71.3275375366211, "std": 74.52296447753906, "min": -88.44059753417969, "p10": -28.05784130096435, "median": 80.61674118041992, "p90": 153.53705596923828, "max": 292.0722961425781, "pos_frac": 0.84375, "sample": [150.09927368164062, 149.48556518554688, 66.94985961914062, 86.71246337890625, 16.69110870361328, -30.758071899414062, 4.9142303466796875, 136.00894165039062, 118.17577362060547, 38.919921875, 17.50408935546875, 81.27032470703125, 157.02395629882812, 50.843082427978516, 88.71453857421875, 15.985572814941406, 180.782470703125, -21.75730323791504, 292.0722961425781, 219.2738494873047, 122.36705017089844, 206.16708374023438, 95.5770034790039, 121.47224426269531, 86.78556060791016, 17.381614685058594, 77.16285705566406, -83.4095687866211, 166.57443237304688, 14.130012512207031, 100.80572509765625, 95.068115234375, 11.831962585449219, -7.5916595458984375, 66.97930908203125, 32.26753234863281, -39.4007568359375, 101.2177734375, 29.89501953125, -1.0646591186523438, 120.85841369628906, -51.603511810302734, 155.01039123535156, 14.809850692749023, 131.1511688232422, 123.97594451904297, 86.75932312011719, -88.44059753417969, -66.3047866821289, 61.42739486694336, 120.48324584960938, 26.863479614257812, 128.2051544189453, -35.51022720336914, 89.49591827392578, 66.14100646972656, 144.5, 41.91162109375, 109.72428894042969, 35.86186981201172, 111.19613647460938, 121.56088256835938, 79.9631576538086, 3.7937469482421875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000447.npy"}
{"epoch": 0.6563876651982379, "step": 448, "batch_size": 64, "mean": 70.38318634033203, "std": 83.93827819824219, "min": -129.52493286132812, "p10": -38.77143096923828, "median": 68.32210540771484, "p90": 169.56487884521485, "max": 250.05413818359375, "pos_frac": 0.765625, "sample": [-67.25521850585938, 1.2850494384765625, 7.393585205078125, -15.668685913085938, 208.64566040039062, 43.22239685058594, -11.304641723632812, -11.110015869140625, -4.280158996582031, -67.43814086914062, 159.87747192382812, 82.45310974121094, -45.139495849609375, 31.604991912841797, -47.60972595214844, 170.24534606933594, 63.37141418457031, 104.25523376464844, 149.19091796875, 30.594482421875, -129.52493286132812, 98.1861801147461, 74.9254150390625, 42.585052490234375, 154.09271240234375, 179.26185607910156, -3.0886917114257812, 41.60784912109375, 168.88009643554688, 75.84759521484375, 91.32536315917969, 140.4473114013672, 2.317249298095703, 138.21188354492188, 213.0819854736328, -6.525112152099609, 144.40768432617188, 23.75469207763672, -54.4412841796875, 174.3660888671875, 73.27279663085938, 168.94615173339844, 154.8196563720703, -39.188453674316406, 250.05413818359375, 45.111602783203125, 44.96974182128906, 169.83004760742188, 162.70675659179688, 30.747711181640625, 19.910842895507812, 3.2266311645507812, 129.70999145507812, -32.783477783203125, 166.76779174804688, 152.44241333007812, -37.798377990722656, 125.64827728271484, 158.64102172851562, 82.14148712158203, 154.1475830078125, 46.98784255981445, 81.9538803100586, 40.205352783203125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000448.npy"}
{"epoch": 0.657856093979442, "step": 449, "batch_size": 64, "mean": 72.28010559082031, "std": 75.46662902832031, "min": -109.24331665039062, "p10": -23.585840606689445, "median": 80.30645751953125, "p90": 155.65738220214845, "max": 275.29052734375, "pos_frac": 0.78125, "sample": [-9.777656555175781, 48.921051025390625, 138.54502868652344, 43.46412658691406, 191.91587829589844, 140.89572143554688, 125.84336853027344, 119.03713989257812, 88.67628479003906, 82.1002197265625, 195.70028686523438, -109.24331665039062, 133.9398193359375, 122.64381408691406, -1.6912384033203125, 93.87228393554688, 120.01268005371094, -16.859657287597656, -8.39605712890625, -38.911067962646484, 100.79670715332031, 25.721019744873047, 112.58456420898438, 103.58642578125, 127.79403686523438, 157.2954864501953, 70.60823059082031, 9.821134567260742, 59.55995178222656, 125.97239685058594, 150.7349090576172, 4.953441619873047, 91.66553497314453, 41.00010681152344, 97.44264221191406, 156.6170654296875, -27.839752197265625, -33.451622009277344, -26.468490600585938, 159.40255737304688, 275.29052734375, 153.41812133789062, -89.48388671875, 109.9066162109375, 130.57205200195312, 33.190696716308594, 78.5126953125, 39.607730865478516, -14.88465690612793, 127.43632507324219, 129.25057983398438, 92.03668975830078, 22.651565551757812, -43.27433776855469, 141.57611083984375, 2.219024658203125, 73.01858520507812, 209.3994140625, 2.2872238159179688, 77.89287567138672, -0.6230812072753906, -0.506622314453125, 71.1349868774414, 36.81236267089844], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000449.npy"}
{"epoch": 0.6593245227606461, "step": 450, "batch_size": 64, "mean": 71.86397552490234, "std": 73.44386291503906, "min": -113.80902099609375, "p10": -28.381845855712882, "median": 66.51541137695312, "p90": 175.51070404052734, "max": 201.11790466308594, "pos_frac": 0.859375, "sample": [64.99919128417969, 201.11790466308594, 193.0856475830078, 84.6954116821289, 63.42881774902344, 178.8194122314453, 132.18563842773438, 49.95454406738281, 37.25670623779297, -66.8818359375, 54.38648986816406, 186.71087646484375, 153.94924926757812, 38.55791473388672, 66.58258056640625, 156.15696716308594, 93.91624450683594, 93.61288452148438, 50.59867858886719, 152.83184814453125, -113.80902099609375, 2.3424224853515625, 87.88587951660156, 60.2518310546875, 177.00482177734375, 136.2247314453125, 55.51179504394531, 149.684326171875, 70.21377563476562, -84.18666076660156, 109.66771697998047, 176.63059997558594, 172.89761352539062, 75.39522552490234, 29.727134704589844, 10.216850280761719, 35.15461730957031, 83.91712188720703, 148.8910369873047, 8.57518196105957, 193.8764190673828, 53.73841857910156, 34.41557312011719, 127.22163391113281, 112.1317138671875, 66.4482421875, -53.11200714111328, -20.419586181640625, -64.6204605102539, 86.92756652832031, 122.54386138916016, 33.22277069091797, -45.64494323730469, 40.18043518066406, 131.87518310546875, -31.79424285888672, 129.75848388671875, 97.83328247070312, 2.5946426391601562, 76.84774017333984, 57.30408477783203, 41.57820129394531, -14.896835327148438, 43.1219482421875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000450.npy"}
{"epoch": 0.6607929515418502, "step": 451, "batch_size": 64, "mean": 60.40147399902344, "std": 82.55384063720703, "min": -127.96249389648438, "p10": -27.89381580352783, "median": 51.635555267333984, "p90": 164.62923889160157, "max": 241.19107055664062, "pos_frac": 0.734375, "sample": [59.444366455078125, 110.68844604492188, -3.22174072265625, 12.002021789550781, 112.27604675292969, 50.50343322753906, -44.66827392578125, 166.70416259765625, 119.08720397949219, 12.78756332397461, 8.0079345703125, 241.19107055664062, 139.31292724609375, 123.09304809570312, 127.85711669921875, -17.14319610595703, 5.137880325317383, 54.56492614746094, -52.80207061767578, -20.90667724609375, 38.47303771972656, -18.9710693359375, 136.47576904296875, 118.5978012084961, 208.66831970214844, -25.3385009765625, 36.842063903808594, 20.964828491210938, 160.88568115234375, -6.190460205078125, 17.207822799682617, 211.1110076904297, 105.32943725585938, 50.564598083496094, 91.30670928955078, 52.706512451171875, 147.97584533691406, -71.34674835205078, 44.52191925048828, -118.1189956665039, -28.988950729370117, 8.229270935058594, 161.070556640625, -127.96249389648438, 13.641422271728516, -15.55935287475586, -5.634952545166016, 62.07711410522461, 166.15438842773438, 47.59882354736328, 69.36131286621094, -65.83184814453125, 6.149837493896484, 117.2889633178711, 207.19664001464844, 97.90330505371094, 156.1845703125, 54.210289001464844, 113.3975601196289, -2.1427383422851562, 186.63385009765625, -2.096799850463867, 157.93011474609375, 83.30155181884766], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000451.npy"}
{"epoch": 0.6622613803230544, "step": 452, "batch_size": 64, "mean": 63.32386779785156, "std": 58.54567337036133, "min": -57.823211669921875, "p10": -12.483492469787596, "median": 59.436622619628906, "p90": 145.74776611328127, "max": 199.12164306640625, "pos_frac": 0.828125, "sample": [97.69100952148438, 46.95387268066406, 108.52970123291016, 129.4752197265625, 117.68203735351562, 141.87298583984375, 185.74913024902344, 169.5006866455078, 53.31298828125, 67.50828552246094, 7.413360595703125, 75.16047668457031, 19.845550537109375, 19.09040069580078, -4.7121734619140625, 17.38568878173828, 90.29667663574219, 51.39557647705078, 24.450607299804688, -24.9702091217041, 131.5166015625, 16.367908477783203, -15.060272216796875, 163.80604553222656, -9.498069763183594, 147.40838623046875, 100.12696838378906, -17.65238380432129, 112.06190490722656, -25.65363311767578, -57.823211669921875, 165.745849609375, 34.22576141357422, 50.42253112792969, 78.81173706054688, 83.48674011230469, 56.26835632324219, 199.12164306640625, 90.31385040283203, 37.40422058105469, 66.97551727294922, 31.985374450683594, 39.18115234375, 70.98709106445312, 62.604888916015625, -6.019725799560547, 14.96075439453125, 11.296798706054688, 81.08486938476562, 81.0899429321289, 147.6873321533203, 114.23616027832031, 37.57783508300781, -13.258245468139648, -10.675735473632812, 120.51134490966797, 94.1446304321289, 45.32447052001953, 95.26583099365234, 7.6407928466796875, 112.8868408203125, 79.76457977294922, -14.670906066894531, 47.11324691772461], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000452.npy"}
{"epoch": 0.6637298091042585, "step": 453, "batch_size": 64, "mean": 77.42842102050781, "std": 69.07745361328125, "min": -59.880767822265625, "p10": -4.147872924804686, "median": 73.60690307617188, "p90": 174.59532165527347, "max": 304.29412841796875, "pos_frac": 0.859375, "sample": [-8.56268310546875, 42.23920440673828, 1.790008544921875, -2.1785888671875, 98.71051025390625, 71.6771011352539, 91.2890853881836, 90.34872436523438, 56.33024978637695, 131.59527587890625, 67.79305267333984, 140.5386505126953, 133.1136016845703, 71.46234130859375, 63.520263671875, 125.00886535644531, 141.93470764160156, 166.42498779296875, 119.35969543457031, -4.991851806640625, 62.04412841796875, -13.500679016113281, 34.01780700683594, -1.0315093994140625, 135.52621459960938, 21.906654357910156, 17.284271240234375, 190.97146606445312, -14.955852508544922, 102.94818878173828, 48.2047233581543, 111.05718994140625, 29.344985961914062, 178.09689331054688, 187.26258850097656, 55.31557083129883, 17.309158325195312, 18.67807960510254, 304.29412841796875, 76.09605407714844, 89.09259033203125, 31.02237319946289, 98.78794860839844, 79.69745635986328, 142.42227172851562, 84.99050903320312, 104.01068115234375, 75.53670501708984, 146.29356384277344, 206.04774475097656, -26.80780029296875, 25.17828369140625, -59.880767822265625, 180.8736572265625, 123.77650451660156, -8.502891540527344, 20.627037048339844, 20.94095230102539, 208.53573608398438, 90.38920593261719, 31.13561248779297, 100.9675064086914, 5.3882293701171875, 26.622692108154297], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000453.npy"}
{"epoch": 0.6651982378854625, "step": 454, "batch_size": 64, "mean": 62.07777786254883, "std": 85.26810455322266, "min": -122.02013397216797, "p10": -60.42327270507811, "median": 64.12895202636719, "p90": 172.9046051025391, "max": 263.328125, "pos_frac": 0.75, "sample": [165.13742065429688, 79.13751220703125, 5.52508544921875, 36.165225982666016, 130.00872802734375, -35.650718688964844, 66.94477844238281, 42.28401184082031, 69.91304016113281, 18.398117065429688, -88.77886962890625, 26.88359832763672, 176.2333984375, 133.71473693847656, -6.753421783447266, 76.2767105102539, 32.22262191772461, 4.367561340332031, 13.966331481933594, -30.161865234375, 91.04743194580078, 208.19174194335938, 123.74607849121094, -19.824951171875, -5.979591369628906, -0.9648284912109375, 144.75430297851562, -86.97356414794922, 185.77969360351562, 108.60494995117188, 91.95137023925781, 80.34220886230469, 109.97205352783203, 129.30812072753906, 153.42196655273438, 55.04612350463867, -15.031425476074219, -48.069580078125, -122.02013397216797, 95.47946166992188, 180.50477600097656, 210.16070556640625, 86.53041076660156, 52.89019775390625, 114.19168090820312, 132.3516845703125, 182.31417846679688, 13.009727478027344, 163.11294555664062, 61.31312561035156, 94.01241302490234, 263.328125, -79.3363037109375, 89.74796295166016, 28.316375732421875, -91.38697814941406, 149.27264404296875, -83.03202819824219, 133.76962280273438, -65.71771240234375, 45.66271209716797, 55.7453727722168, 54.8157958984375, -13.215190887451172], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000454.npy"}
{"epoch": 0.6666666666666666, "step": 455, "batch_size": 64, "mean": 63.95558166503906, "std": 72.43963623046875, "min": -81.7669906616211, "p10": -31.268123245239256, "median": 67.1108169555664, "p90": 161.81008758544922, "max": 227.54083251953125, "pos_frac": 0.8125, "sample": [21.794017791748047, 109.8814697265625, 183.310791015625, 45.36279296875, 21.405132293701172, 46.103729248046875, 123.62562561035156, -10.28717041015625, 227.54083251953125, 69.66661834716797, 163.88629150390625, 126.04503631591797, -77.40330505371094, 8.295906066894531, 88.24407958984375, -60.05900573730469, 88.73474884033203, 12.62847900390625, 10.194854736328125, -81.7669906616211, 110.5544204711914, 106.69502258300781, 129.93165588378906, 117.56146240234375, 129.59751892089844, 44.38432312011719, 162.29273986816406, -31.42544174194336, -30.901046752929688, 51.66741943359375, 174.41241455078125, 76.38288116455078, 83.75250244140625, 12.530960083007812, 8.280635833740234, 164.957275390625, -16.22154998779297, 15.260986328125, 30.529144287109375, -18.209747314453125, 44.032493591308594, 14.552253723144531, 131.8768310546875, 141.98858642578125, -42.5416259765625, -24.047204971313477, 65.05886840820312, 99.89187622070312, 167.4261932373047, -67.92636108398438, 69.16276550292969, 41.75672149658203, 119.39309692382812, 72.30693817138672, 45.95361328125, 29.468429565429688, 0.23611068725585938, 160.68389892578125, 112.940673828125, 118.81755065917969, 116.31842041015625, 132.4536590576172, -37.86380386352539, 141.9798583984375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000455.npy"}
{"epoch": 0.6681350954478708, "step": 456, "batch_size": 64, "mean": 58.58759307861328, "std": 66.63648986816406, "min": -87.49510955810547, "p10": -22.144207000732415, "median": 61.04969024658203, "p90": 131.53837890625002, "max": 219.69920349121094, "pos_frac": 0.78125, "sample": [10.142730712890625, 128.2088623046875, 129.0322265625, 147.24356079101562, 36.60252380371094, 71.20181274414062, -87.49510955810547, 67.81766510009766, -39.86717224121094, 109.60022735595703, 43.03668212890625, 9.462303161621094, 129.006591796875, 90.57281494140625, 101.15835571289062, 49.45600128173828, 85.65216064453125, -4.8276519775390625, 29.39278793334961, -10.224151611328125, 41.103851318359375, -14.131210327148438, 60.99510192871094, -30.207855224609375, 24.838043212890625, 77.71461486816406, 90.96673583984375, -75.61767578125, 128.81463623046875, 76.72396850585938, 46.812896728515625, 219.69920349121094, -15.941368103027344, 117.1747817993164, -29.280654907226562, -13.883535385131836, 24.434539794921875, -8.792583465576172, 204.62379455566406, -43.44403076171875, 14.056991577148438, 146.05496215820312, 105.84676361083984, 96.06751251220703, 81.06491088867188, 61.104278564453125, -6.9845123291015625, 19.514678955078125, 177.65478515625, 16.29446029663086, 175.5201873779297, 32.682884216308594, 132.231689453125, 21.12720489501953, 69.49285888671875, 120.19187927246094, 14.147079467773438, 106.63801574707031, 129.920654296875, 91.59683227539062, 89.7950439453125, -24.802566528320312, 1.462646484375, 101.15043640136719], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000456.npy"}
{"epoch": 0.6696035242290749, "step": 457, "batch_size": 64, "mean": 77.71039581298828, "std": 61.69509506225586, "min": -69.54832458496094, "p10": 4.946020507812503, "median": 75.61523056030273, "p90": 149.55636444091797, "max": 257.05023193359375, "pos_frac": 0.921875, "sample": [161.58511352539062, 56.17869567871094, 122.75912475585938, 132.6886749267578, 150.07614135742188, 135.17230224609375, 257.05023193359375, 110.44625854492188, 21.55620574951172, 44.04052734375, 134.02633666992188, 73.91564178466797, 45.785179138183594, -16.71575927734375, 84.08282470703125, 133.5379638671875, 90.05368041992188, 56.82615280151367, 148.3435516357422, 59.16474914550781, 173.24197387695312, 55.905738830566406, 81.8653335571289, 101.72508239746094, 124.8290786743164, 108.54901885986328, 43.864715576171875, 52.43829345703125, 34.98181915283203, -6.5369720458984375, 107.13291931152344, -69.54832458496094, 78.26480102539062, 71.13121032714844, 42.79677963256836, 87.31140899658203, 7.833488464355469, 135.20498657226562, 29.10272216796875, 128.3340301513672, 116.98230743408203, 190.45030212402344, 53.263824462890625, 3.7085342407226562, 10.259429931640625, 109.81639099121094, 88.44662475585938, 38.57816696166992, 9.340187072753906, 98.79155731201172, 33.28173065185547, 150.474365234375, 120.22819519042969, 70.41839599609375, -35.337684631347656, 204.54824829101562, 132.77047729492188, 45.975494384765625, 43.9324836730957, 24.810829162597656, 1.9766159057617188, 43.882652282714844, 77.3148193359375, -49.45030212402344], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000457.npy"}
{"epoch": 0.671071953010279, "step": 458, "batch_size": 64, "mean": 58.9216423034668, "std": 79.96781921386719, "min": -102.48347473144531, "p10": -41.55339355468749, "median": 57.80746078491211, "p90": 153.07990264892578, "max": 304.49200439453125, "pos_frac": 0.78125, "sample": [-26.624252319335938, 7.233562469482422, 75.3479995727539, 120.43140411376953, 37.550865173339844, -51.59302520751953, 304.49200439453125, -63.0738525390625, -102.48347473144531, 3.3618316650390625, 45.419105529785156, 62.42539978027344, 143.34632873535156, 2.2310409545898438, 56.452117919921875, 7.351173400878906, 33.593692779541016, 8.77774429321289, -16.569683074951172, 41.076690673828125, 79.03985595703125, -10.406280517578125, 112.1538314819336, 153.59046936035156, -49.72943115234375, 62.70055389404297, 94.24266052246094, -31.050445556640625, 156.14874267578125, 59.162803649902344, 99.5781478881836, 155.99716186523438, 29.448745727539062, 115.8615951538086, -26.4847412109375, -15.759956359863281, 141.01278686523438, 67.1806640625, 40.82642364501953, 146.50106811523438, 137.315185546875, 95.89404296875, 36.600730895996094, -4.201030731201172, 71.28583526611328, -82.2234115600586, 183.86122131347656, 77.33151245117188, 12.485771179199219, 264.4745178222656, 12.207534790039062, 67.75772857666016, 16.780982971191406, 66.12322998046875, -72.03718566894531, 151.88858032226562, 98.74174499511719, 44.783203125, 96.09428405761719, 148.0279083251953, 93.01606750488281, 197.4251251220703, -46.054656982421875, 34.64502716064453], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000458.npy"}
{"epoch": 0.6725403817914831, "step": 459, "batch_size": 64, "mean": 72.08201599121094, "std": 76.73512268066406, "min": -82.23626708984375, "p10": -19.32445068359375, "median": 61.69231986999512, "p90": 169.5783462524414, "max": 265.2967834472656, "pos_frac": 0.828125, "sample": [45.76445007324219, 57.49333572387695, -19.3924560546875, -37.22168731689453, -27.028114318847656, 144.91146850585938, 95.05733489990234, 70.57182312011719, 128.96270751953125, -46.691612243652344, 20.987991333007812, 30.013214111328125, 105.91875457763672, 103.30863952636719, -2.7375335693359375, 93.06143951416016, 103.10076904296875, 7.149803161621094, -82.23626708984375, 36.711151123046875, 20.437179565429688, 140.6534423828125, 176.47262573242188, 65.89130401611328, -46.956504821777344, 36.13019561767578, 41.337520599365234, 11.457878112792969, 139.32937622070312, 126.29635620117188, 31.473682403564453, 147.45889282226562, 81.73211669921875, 32.18415069580078, 265.2967834472656, 4.803951263427734, 218.22384643554688, 169.91697692871094, 9.928520202636719, 15.135025024414062, 22.683975219726562, -6.516807556152344, 121.84227752685547, 127.09970092773438, -7.841102600097656, 162.93138122558594, 4.703155517578125, 111.1114273071289, 77.63307189941406, 136.65277099609375, 199.20785522460938, 141.38063049316406, 32.98755645751953, 85.34202575683594, 168.66192626953125, 168.7882080078125, 19.84848403930664, -43.7392463684082, 189.89871215820312, 139.2636260986328, 204.81455993652344, 49.822418212890625, 10.93000602722168, -19.165771484375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000459.npy"}
{"epoch": 0.6740088105726872, "step": 460, "batch_size": 64, "mean": 64.40177917480469, "std": 75.63972473144531, "min": -95.61180114746094, "p10": -34.0976318359375, "median": 60.05703353881836, "p90": 163.8306396484375, "max": 276.5517883300781, "pos_frac": 0.796875, "sample": [132.3671112060547, 14.295166015625, 117.0155029296875, 157.7152099609375, -48.89991760253906, 151.671142578125, 62.005767822265625, 183.750732421875, 2.4581298828125, -0.00423431396484375, 83.64221954345703, 58.333763122558594, -40.32791519165039, 67.512939453125, -1.3622188568115234, 72.78256225585938, 63.893802642822266, 85.53343200683594, -56.613555908203125, 18.920379638671875, 212.28639221191406, -0.4774017333984375, 205.0177001953125, -4.049842834472656, 29.824798583984375, -8.13702392578125, 276.5517883300781, 142.46546936035156, 133.47987365722656, -30.932891845703125, 14.149185180664062, 8.173812866210938, 58.780555725097656, 60.16246032714844, 112.73432922363281, 59.95160675048828, 4.051584243774414, 67.9517822265625, 81.22235870361328, 190.18455505371094, 46.863555908203125, 166.4515380859375, 33.639808654785156, 139.14942932128906, 85.6979751586914, -61.01931381225586, 26.323211669921875, -95.61180114746094, 64.47868347167969, 43.795631408691406, 57.52012252807617, -68.86593627929688, 22.203277587890625, 27.539779663085938, 108.00980377197266, 166.51710510253906, 30.829681396484375, 49.119773864746094, -35.453948974609375, 70.5518798828125, 103.55126190185547, 127.970458984375, 123.1795654296875, 151.19122314453125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000460.npy"}
{"epoch": 0.6754772393538914, "step": 461, "batch_size": 64, "mean": 52.35367965698242, "std": 68.24530029296875, "min": -71.35308074951172, "p10": -29.009869766235347, "median": 42.65960693359375, "p90": 151.52167816162108, "max": 221.3590850830078, "pos_frac": 0.734375, "sample": [9.244842529296875, 62.79249572753906, 3.538707733154297, 80.61614990234375, 122.63472747802734, 79.66720581054688, 27.060073852539062, 121.1473388671875, -35.080413818359375, -36.846656799316406, 56.856937408447266, 39.355438232421875, 79.04820251464844, 24.292823791503906, 106.7540283203125, 60.80949401855469, 149.1346435546875, -4.371040344238281, 61.112464904785156, 124.64051818847656, 11.438720703125, 169.8678436279297, 151.79052734375, 89.02091979980469, 83.24396514892578, 21.17255401611328, 193.6729278564453, 221.3590850830078, 9.330162048339844, -1.016693115234375, 21.463592529296875, 24.346771240234375, 113.58948516845703, 3.0145492553710938, -22.0159912109375, -30.763870239257812, -11.359466552734375, 52.51829528808594, -23.226150512695312, 54.90709686279297, 189.57748413085938, -7.481678009033203, -6.64569091796875, -24.91720199584961, -39.55919647216797, 109.92906951904297, -71.35308074951172, 35.07117462158203, 35.58625030517578, 45.30665588378906, 173.84080505371094, 29.211326599121094, 105.64320373535156, -1.15643310546875, 40.01255798339844, 65.95394897460938, -48.77592468261719, -59.91853713989258, 150.8943634033203, 118.1478500366211, -6.474067687988281, 154.27471923828125, 46.79435729980469, 51.911231994628906], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000461.npy"}
{"epoch": 0.6769456681350955, "step": 462, "batch_size": 64, "mean": 70.1772232055664, "std": 66.27196502685547, "min": -96.6753158569336, "p10": -1.7115720748901329, "median": 63.325008392333984, "p90": 156.3254379272461, "max": 201.485595703125, "pos_frac": 0.890625, "sample": [30.118370056152344, 81.92426300048828, -71.96894836425781, 98.90408325195312, 115.70321655273438, 90.46366882324219, 33.00361251831055, -19.677295684814453, 116.29027557373047, 151.54747009277344, 187.7762451171875, 51.57684326171875, 21.007431030273438, -27.680051803588867, 90.81187438964844, 201.485595703125, 188.4334716796875, -29.967361450195312, 14.963043212890625, 158.37313842773438, 104.40030670166016, 85.07516479492188, 32.478240966796875, 88.97648620605469, 108.08955383300781, 107.68650817871094, 9.999519348144531, 119.81327056884766, 2.0583553314208984, 49.76641082763672, 102.44032287597656, 132.13101196289062, 15.242721557617188, 42.13179016113281, 116.83075714111328, 83.85330200195312, 18.59720230102539, 6.334987640380859, 147.64447021484375, 63.97943115234375, 8.766239166259766, 143.02627563476562, 40.91780090332031, 45.047386169433594, -3.3272552490234375, 2.1458969116210938, 35.485565185546875, 43.174285888671875, 95.90442657470703, 136.70016479492188, -96.6753158569336, 145.10194396972656, 184.508056640625, -5.085166931152344, 62.67058563232422, 40.405548095703125, 35.00904846191406, 166.15707397460938, 42.58537292480469, 189.6746063232422, 147.6590118408203, 73.10198211669922, 34.020240783691406, 3.749502182006836], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000462.npy"}
{"epoch": 0.6784140969162996, "step": 463, "batch_size": 64, "mean": 81.48208618164062, "std": 78.64169311523438, "min": -138.5433349609375, "p10": -23.9262466430664, "median": 83.89880752563477, "p90": 172.98584442138673, "max": 257.0567626953125, "pos_frac": 0.84375, "sample": [30.202789306640625, 64.1246337890625, -67.63347625732422, 89.75715637207031, 7.3822479248046875, 192.9791259765625, 153.48800659179688, 84.72906494140625, 47.52309036254883, 127.08386993408203, 208.82583618164062, 156.90826416015625, -26.791961669921875, 75.10734558105469, 89.45182800292969, 107.7421875, 46.6282958984375, -13.16390609741211, 32.062217712402344, 26.448013305664062, 170.06509399414062, 162.0125732421875, -84.04525756835938, -138.5433349609375, 74.97335815429688, -17.239578247070312, 18.416927337646484, 168.86082458496094, 110.682861328125, 139.6636962890625, 128.5342559814453, 69.06489562988281, 63.700897216796875, 61.407806396484375, 79.45964813232422, -10.737529754638672, 71.18630981445312, 48.038543701171875, 76.85535430908203, 35.61888885498047, 66.59053039550781, 37.58641815185547, 119.93280792236328, -27.61711883544922, 187.28689575195312, 85.97030639648438, 89.97016906738281, 174.2375946044922, 175.27743530273438, 156.9484405517578, -67.57907104492188, 115.08042907714844, 212.56163024902344, 113.07205200195312, 135.23110961914062, -78.01658630371094, 136.6133270263672, 83.06855010986328, 136.67877197265625, 100.97309875488281, 71.42710876464844, 149.19754028320312, 257.0567626953125, 122.47477722167969], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000463.npy"}
{"epoch": 0.6798825256975036, "step": 464, "batch_size": 64, "mean": 65.88970947265625, "std": 71.93437957763672, "min": -125.07319641113281, "p10": -11.496292495727538, "median": 63.07847023010254, "p90": 142.02950897216797, "max": 284.1348876953125, "pos_frac": 0.828125, "sample": [-72.79109191894531, 36.51918029785156, 21.87480926513672, 62.80376434326172, 65.05148315429688, 57.84100341796875, 47.22407531738281, 143.13677978515625, 44.261199951171875, 125.99710083007812, -12.517967224121094, 73.99758911132812, 33.58412170410156, 19.457229614257812, 50.612945556640625, 124.73895263671875, 98.22078704833984, 6.413017272949219, -3.1225433349609375, 127.73411560058594, 106.52824401855469, -76.38443756103516, 12.206230163574219, 82.04795837402344, 194.6477508544922, 96.06734466552734, 18.683822631835938, 56.59465789794922, -16.675010681152344, -46.68013000488281, 107.24820709228516, 63.44036865234375, 104.54740142822266, 63.35317611694336, 36.94599914550781, 167.07635498046875, -30.901634216308594, 98.73895263671875, -0.9092559814453125, 102.529052734375, 2.2624282836914062, 52.13597869873047, 146.40655517578125, -9.112384796142578, -3.1839675903320312, 122.60939025878906, -125.07319641113281, 65.43521118164062, 101.48257446289062, 101.54811096191406, 115.42012023925781, 139.4458770751953, 284.1348876953125, 61.36400604248047, 29.767234802246094, 195.58740234375, 139.0, 34.37713623046875, 108.12055969238281, 234.46034240722656, 39.38182067871094, 63.36882781982422, 104.40949249267578, 23.45165252685547], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000464.npy"}
{"epoch": 0.6813509544787077, "step": 465, "batch_size": 64, "mean": 78.90789794921875, "std": 72.07849884033203, "min": -70.63436889648438, "p10": -5.116298675537099, "median": 71.73055267333984, "p90": 176.46363067626956, "max": 231.84512329101562, "pos_frac": 0.890625, "sample": [51.35502624511719, 126.19822692871094, 110.08694458007812, 9.191730499267578, 39.99464416503906, 13.847206115722656, 167.11289978027344, 64.29281616210938, 75.30404663085938, 161.2802734375, 5.779998779296875, 102.90838623046875, 9.172080993652344, 16.078125, 19.441848754882812, 231.84512329101562, 221.52734375, 219.13546752929688, 118.36275482177734, -9.786140441894531, 9.11016845703125, 161.9436492919922, 76.88796997070312, 39.68891906738281, 222.02474975585938, 81.34525299072266, 120.35746002197266, 35.49090576171875, -24.855392456054688, 172.85922241210938, 178.0083770751953, 92.41664123535156, 44.8734016418457, 106.55072021484375, 124.65058898925781, 162.86151123046875, 15.398956298828125, 60.637359619140625, -19.744178771972656, 68.9863510131836, 77.07498931884766, 15.25430679321289, 59.30394744873047, 18.14511489868164, 112.90765380859375, 88.59526062011719, 37.2802734375, 71.64146423339844, 118.39997863769531, 219.4417724609375, 46.29975128173828, -70.63436889648438, -12.093399047851562, 64.75408172607422, 129.36190795898438, 226.0813751220703, 31.540794372558594, 111.93553161621094, 111.83642578125, -22.22771453857422, 78.03125, 71.81964111328125, -32.837554931640625, 15.571342468261719], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000465.npy"}
{"epoch": 0.6828193832599119, "step": 466, "batch_size": 64, "mean": 82.00619506835938, "std": 75.74956512451172, "min": -111.06680297851562, "p10": -8.369316482543944, "median": 70.34088516235352, "p90": 186.13500213623047, "max": 245.170654296875, "pos_frac": 0.875, "sample": [202.2390594482422, 87.23272705078125, 55.67791748046875, 24.89736557006836, 71.24213409423828, -23.062957763671875, 187.32443237304688, 24.342422485351562, 120.74358367919922, 31.043411254882812, 166.31578063964844, 208.58038330078125, 101.38233947753906, 12.959671020507812, 86.66834259033203, 127.45868682861328, 122.24664306640625, 152.34243774414062, 81.10885620117188, 171.41246032714844, 26.14472198486328, 54.055946350097656, 66.2913589477539, 169.96124267578125, 201.21519470214844, 41.598350524902344, -18.07061004638672, 156.5189208984375, 42.093223571777344, 160.7399444580078, 88.69918823242188, 141.60386657714844, 183.3596649169922, 195.30746459960938, -111.06680297851562, 110.93728637695312, 69.43963623046875, 129.109130859375, 121.78960418701172, 7.715641021728516, 62.06463623046875, -6.907646179199219, -49.944210052490234, 178.3517303466797, 86.76728820800781, 245.170654296875, 36.09149169921875, 47.57469940185547, 18.913963317871094, 35.464752197265625, 116.02897644042969, 31.82706069946289, 60.67974853515625, 1.5735092163085938, 181.92860412597656, 193.10182189941406, -11.387504577636719, -8.995746612548828, 95.37904357910156, 64.76332092285156, 24.981765747070312, 7.909656524658203, -45.00071716308594, 32.46080017089844], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000466.npy"}
{"epoch": 0.684287812041116, "step": 467, "batch_size": 64, "mean": 71.97554016113281, "std": 83.57694244384766, "min": -98.12985229492188, "p10": -36.30528450012207, "median": 70.17601013183594, "p90": 169.87976379394533, "max": 291.4591064453125, "pos_frac": 0.828125, "sample": [36.74669647216797, 141.21243286132812, 157.68923950195312, 151.10980224609375, 28.931060791015625, -90.25904846191406, 189.88720703125, 70.46722412109375, 88.5580825805664, -0.197021484375, 120.30429077148438, 111.2106704711914, 227.90379333496094, 23.925968170166016, 26.39154815673828, 158.483154296875, 19.973918914794922, 37.65025329589844, 291.4591064453125, 60.958892822265625, 150.093017578125, 122.93771362304688, 125.3696060180664, -4.499000549316406, 30.616958618164062, 122.27291870117188, -64.98677062988281, 0.3606719970703125, 155.6234588623047, 179.25387573242188, 75.41421508789062, -37.00300216674805, 120.77442932128906, 64.06632995605469, 121.9102783203125, 69.73947143554688, 134.57650756835938, 163.98658752441406, -98.12985229492188, 70.37344360351562, 71.5882568359375, -73.31208801269531, 172.40541076660156, 17.906570434570312, 177.24295043945312, 7.047666549682617, 38.26353454589844, -88.32911682128906, 20.685882568359375, 127.54190063476562, 61.40473937988281, 16.459518432617188, 177.88438415527344, 12.067710876464844, -34.677276611328125, 160.7766571044922, 104.20791625976562, 159.90895080566406, 100.37415313720703, 69.97857666015625, -76.34229278564453, 20.794204711914062, -32.831451416015625, 40.229888916015625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000467.npy"}
{"epoch": 0.6857562408223201, "step": 468, "batch_size": 64, "mean": 58.95384979248047, "std": 81.47587585449219, "min": -115.01063537597656, "p10": -20.807677459716796, "median": 48.53470993041992, "p90": 178.29155426025392, "max": 298.0343933105469, "pos_frac": 0.703125, "sample": [-3.1318359375, -17.253501892089844, 111.14244079589844, 56.233924865722656, 191.57949829101562, 102.53742980957031, 201.26010131835938, -0.3242645263671875, 57.225914001464844, 47.08722686767578, 146.21612548828125, -18.9434814453125, 77.32749938964844, -24.231258392333984, -28.010089874267578, 14.952362060546875, 51.60997009277344, 39.73793029785156, 3.4522247314453125, -20.910186767578125, -20.56848907470703, 8.724584579467773, -49.39735412597656, 34.165283203125, -5.2593231201171875, 176.07106018066406, 127.05039978027344, 116.58380126953125, 102.9547348022461, -42.035400390625, -18.76757049560547, -2.7357864379882812, 115.50057983398438, -80.28556823730469, 220.90878295898438, 175.70997619628906, 150.90414428710938, 13.658676147460938, 52.3809814453125, 90.93901062011719, 108.75648498535156, 14.595191955566406, 5.54376220703125, 184.1680908203125, -7.724273681640625, 20.48943328857422, -15.862762451171875, 49.98219299316406, 71.07112121582031, 109.432861328125, -16.074745178222656, 199.85052490234375, 10.11117172241211, 50.35627746582031, 64.65799713134766, 90.44535064697266, 298.0343933105469, 179.24319458007812, 139.654052734375, 129.057373046875, 25.320388793945312, -4.878383636474609, -115.01063537597656, 27.766799926757812], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000468.npy"}
{"epoch": 0.6872246696035242, "step": 469, "batch_size": 64, "mean": 74.86597442626953, "std": 76.90406799316406, "min": -119.6056137084961, "p10": -24.666133880615227, "median": 72.39346694946289, "p90": 165.872543334961, "max": 257.24853515625, "pos_frac": 0.84375, "sample": [112.68219757080078, 177.1215362548828, 24.33924674987793, 51.88222885131836, 140.5970458984375, 62.14682388305664, 224.10174560546875, 25.515281677246094, 9.440658569335938, -27.286056518554688, 139.5072479248047, 115.02322387695312, -19.527137756347656, -26.868560791015625, 170.63287353515625, 50.612152099609375, 42.54899597167969, 257.24853515625, 47.480743408203125, 146.95204162597656, 152.3262481689453, 116.85179138183594, 135.08767700195312, 66.06027221679688, 83.55693054199219, 154.76510620117188, 46.46489715576172, 47.994598388671875, 116.78675842285156, 16.40899658203125, 28.929893493652344, 84.97158813476562, 177.15609741210938, 3.37451171875, 83.35761260986328, -49.62404251098633, 44.25529861450195, 56.464317321777344, 20.314498901367188, 85.65878295898438, 78.7266616821289, 25.17974853515625, 96.48663330078125, 32.61174011230469, 44.01622009277344, -4.292030334472656, -119.6056137084961, 175.14324951171875, 126.82926940917969, -4.979450225830078, -30.051284790039062, 20.9669189453125, -103.52784729003906, 150.4761505126953, 135.43264770507812, 219.7239532470703, -48.66094970703125, 154.75192260742188, 35.23633575439453, 105.02191162109375, 93.14533996582031, 144.37057495117188, 154.53636169433594, 114.57107543945312], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000469.npy"}
{"epoch": 0.6886930983847284, "step": 470, "batch_size": 64, "mean": 71.6021499633789, "std": 71.91655731201172, "min": -106.56178283691406, "p10": -11.269404983520504, "median": 64.59227752685547, "p90": 148.73650512695312, "max": 255.77835083007812, "pos_frac": 0.875, "sample": [93.51644897460938, 59.97117614746094, 148.2957763671875, -56.68608856201172, 130.6678009033203, 17.56566619873047, 97.33350372314453, 94.15806579589844, 35.52040481567383, 85.5048599243164, 97.14463806152344, 93.82723999023438, 134.12969970703125, 30.804824829101562, 103.27416229248047, -106.56178283691406, 26.393362045288086, 38.95916748046875, 38.57068634033203, -22.39696502685547, -6.817161560058594, 35.585121154785156, 129.88546752929688, -59.9351806640625, 89.45387268066406, 61.37824249267578, 16.189605712890625, 127.86626434326172, 103.43356323242188, 22.240863800048828, 35.261390686035156, 23.077848434448242, 255.77835083007812, 36.33990478515625, -13.177509307861328, 74.22994995117188, 37.78030776977539, 195.78411865234375, 87.5312271118164, 67.80631256103516, 56.475929260253906, 43.57441711425781, 141.91299438476562, 148.29710388183594, 39.513641357421875, 225.57455444335938, 86.67053985595703, 107.69991302490234, 20.834571838378906, 222.392822265625, 42.23258972167969, 30.816160202026367, 50.41668701171875, 85.7103042602539, 181.38356018066406, 107.81964111328125, 31.641849517822266, 1.4961700439453125, 75.70732116699219, -28.010955810546875, 221.32473754882812, 148.92481994628906, 136.5964813232422, -56.153221130371094], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000470.npy"}
{"epoch": 0.6901615271659325, "step": 471, "batch_size": 64, "mean": 68.89513397216797, "std": 65.97065734863281, "min": -73.43057250976562, "p10": -18.333655929565424, "median": 62.407169342041016, "p90": 158.2834396362305, "max": 242.96511840820312, "pos_frac": 0.84375, "sample": [36.87448501586914, 65.84317779541016, 42.08562469482422, 161.62911987304688, 48.35365295410156, 179.55557250976562, 71.29759216308594, 150.4768524169922, 55.28550720214844, 48.35760498046875, 34.90699768066406, -14.23809814453125, 125.73217010498047, -66.48268127441406, 86.27366638183594, 89.33554077148438, 33.862876892089844, -20.33475685119629, 57.206886291503906, 103.5045166015625, 242.96511840820312, 41.889434814453125, 19.1904296875, 112.99737548828125, 1.130645751953125, -49.235496520996094, -19.9815673828125, 95.92446899414062, 106.77831268310547, 171.4801788330078, 49.725093841552734, 103.54222869873047, 91.89875793457031, 96.16722869873047, 46.70992660522461, 100.89990234375, 55.5869140625, -4.51898193359375, 8.523941040039062, 86.79188537597656, 134.43255615234375, 20.547073364257812, 91.1942367553711, -73.43057250976562, 110.69427490234375, 146.92376708984375, 45.568603515625, -14.488529205322266, 58.971160888671875, 72.36433410644531, 169.22348022460938, 42.56183624267578, -57.79179382324219, 104.17950439453125, 52.21467590332031, 107.65097045898438, -29.98649024963379, 18.230857849121094, 165.5499267578125, 83.61691284179688, 149.43008422851562, 132.95257568359375, 46.64516830444336, 184.04196166992188], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000471.npy"}
{"epoch": 0.6916299559471366, "step": 472, "batch_size": 64, "mean": 84.37684631347656, "std": 65.92009735107422, "min": -33.94075012207031, "p10": 5.976443481445321, "median": 76.95679473876953, "p90": 174.4975296020508, "max": 310.3833923339844, "pos_frac": 0.921875, "sample": [32.48478698730469, -26.44574737548828, 42.98412322998047, 65.86661529541016, 13.70269775390625, 33.147430419921875, 30.323379516601562, 37.09375, 56.383079528808594, 60.164913177490234, 52.65989303588867, 19.680068969726562, 77.93707275390625, 125.37378692626953, 39.369537353515625, 96.53648376464844, 90.268310546875, 82.81034851074219, -33.94075012207031, 2.555938720703125, 227.4141845703125, 152.3725128173828, 126.14501953125, 55.86822509765625, 26.046363830566406, 59.331451416015625, 127.80487823486328, 53.027488708496094, -13.472564697265625, 182.69027709960938, 22.403778076171875, 127.42256164550781, 128.25221252441406, 168.61912536621094, 310.3833923339844, 117.42784881591797, 34.19003677368164, 154.69482421875, 151.62374877929688, 188.5192108154297, 101.99555969238281, 177.016845703125, 44.93934631347656, 56.101318359375, 81.38690185546875, 153.59974670410156, 188.20416259765625, 162.68756103515625, 98.76234436035156, 75.97651672363281, 93.38934326171875, 95.83892822265625, -8.956573486328125, 86.16793823242188, 103.02826690673828, 177.0537872314453, 70.05783081054688, 79.85029602050781, 24.005393981933594, 71.00982666015625, -10.366443634033203, 50.66204833984375, 2.665191650390625, 125.32160949707031], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000472.npy"}
{"epoch": 0.6930983847283406, "step": 473, "batch_size": 64, "mean": 54.608848571777344, "std": 78.60802459716797, "min": -80.81608581542969, "p10": -50.173923492431626, "median": 50.76431846618652, "p90": 162.83010711669922, "max": 257.65594482421875, "pos_frac": 0.75, "sample": [-8.83404541015625, 25.685752868652344, 172.50648498535156, -68.96366119384766, -54.72369384765625, 149.06597900390625, 53.86412048339844, -64.14009094238281, -20.096038818359375, 49.712039947509766, 220.190673828125, 23.451202392578125, -39.55779266357422, -64.43638610839844, 43.51799011230469, 63.49772644042969, 28.14483642578125, -54.759281158447266, 150.235595703125, 18.603904724121094, -10.758628845214844, 72.10031127929688, 164.65670776367188, 18.017539978027344, 88.08158874511719, 48.462730407714844, 100.27336120605469, -16.716331481933594, 40.854698181152344, 3.2398147583007812, 15.060874938964844, -75.613525390625, -36.17950439453125, 77.78428649902344, 18.655536651611328, -27.30022430419922, 257.65594482421875, 158.5680389404297, 51.81659698486328, 104.491455078125, 21.040199279785156, 83.55937957763672, 84.02987670898438, 207.94302368164062, 126.48600769042969, 75.17109680175781, 35.24549865722656, 83.22489929199219, -36.783424377441406, 215.13519287109375, 112.65662384033203, 27.55377197265625, -32.383148193359375, -80.81608581542969, 99.5505142211914, 85.74168395996094, 176.06716918945312, 97.28219604492188, 117.84388732910156, 65.08454132080078, 6.18603515625, 76.8208236694336, 60.0809211730957, 112.12908172607422], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000473.npy"}
{"epoch": 0.6945668135095447, "step": 474, "batch_size": 64, "mean": 48.58485412597656, "std": 75.74409484863281, "min": -140.13287353515625, "p10": -49.268786621093746, "median": 46.18341064453125, "p90": 146.85603637695314, "max": 207.8853759765625, "pos_frac": 0.734375, "sample": [13.259757995605469, 2.7133026123046875, 115.48126220703125, 20.014015197753906, 38.58927917480469, -42.839630126953125, 98.6453857421875, 14.320493698120117, 143.43734741210938, 162.77981567382812, 158.59141540527344, 207.8853759765625, 5.159635543823242, 73.2142105102539, -140.13287353515625, 45.33094787597656, -1.4306831359863281, 1.7965087890625, -52.024139404296875, 21.279661178588867, 107.5782470703125, 58.138458251953125, 98.64384460449219, 34.89216232299805, -2.76495361328125, 88.48084259033203, 90.7586669921875, 108.8335189819336, 145.33123779296875, -99.35804748535156, -34.121360778808594, -70.1275634765625, 103.71945190429688, -1.6285247802734375, 7.785511016845703, 4.0740203857421875, 47.03587341308594, 9.349250793457031, -83.72026062011719, -28.037212371826172, 147.509521484375, 39.42573547363281, 26.933753967285156, 89.09481811523438, -21.377365112304688, -61.09928894042969, 155.53463745117188, 142.57403564453125, -64.1264419555664, -34.03184509277344, -16.06866455078125, 161.92257690429688, 83.55589294433594, 65.54456329345703, 85.6903076171875, 116.96893310546875, 103.91976928710938, 104.77880859375, 166.7889404296875, -29.239152908325195, 77.50274658203125, 142.99806213378906, 84.16151428222656, 69.53453826904297], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000474.npy"}
{"epoch": 0.6960352422907489, "step": 475, "batch_size": 64, "mean": 74.97346496582031, "std": 78.39911651611328, "min": -52.95426559448242, "p10": -15.677076721191405, "median": 67.21626281738281, "p90": 184.82134857177735, "max": 281.1488037109375, "pos_frac": 0.828125, "sample": [39.902069091796875, -52.95426559448242, 201.53543090820312, 22.51205062866211, -31.26470947265625, 166.22824096679688, 133.23126220703125, -49.854698181152344, 135.95602416992188, 210.65142822265625, 75.71778106689453, -0.9081039428710938, 75.905029296875, 5.6887359619140625, 57.12800598144531, 39.21807098388672, 77.06588745117188, 153.8603515625, 0.2902679443359375, -10.14614486694336, 32.592159271240234, 31.801971435546875, 4.69549560546875, 130.37631225585938, 32.8293571472168, 109.0820541381836, 121.55810546875, -14.579353332519531, 38.471717834472656, -36.647369384765625, 13.963397979736328, -2.8769683837890625, 31.361297607421875, 234.74356079101562, 42.44451141357422, 70.0573501586914, 73.3875732421875, 71.1336669921875, 142.88819885253906, 182.64883422851562, 92.25128173828125, 48.798011779785156, 185.8128204345703, -48.133148193359375, 66.95655822753906, 67.47596740722656, 37.02659225463867, 90.19761657714844, 146.69512939453125, 171.46170043945312, -30.183990478515625, 36.291900634765625, 104.18283081054688, 48.02787780761719, 158.89645385742188, 14.995712280273438, 91.01993560791016, 281.1488037109375, 152.25216674804688, 237.21856689453125, 2.012603759765625, 114.59696960449219, -16.14752960205078, 185.75242614746094], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000475.npy"}
{"epoch": 0.697503671071953, "step": 476, "batch_size": 64, "mean": 65.93070983886719, "std": 80.3464584350586, "min": -77.302001953125, "p10": -30.65645294189453, "median": 48.79498481750488, "p90": 164.1319030761719, "max": 282.5957336425781, "pos_frac": 0.8125, "sample": [5.1944580078125, -32.34925842285156, 7.261669158935547, -31.325979232788086, 187.8514404296875, 30.5048828125, 282.5957336425781, 125.64755249023438, 28.83120346069336, -49.29216003417969, 89.08638000488281, 125.36509704589844, 60.42381286621094, 233.2247314453125, 91.08123779296875, 114.79344177246094, 34.61102294921875, 1.4585418701171875, 136.61410522460938, 12.359092712402344, 135.72125244140625, 104.15121459960938, -19.035690307617188, 156.14141845703125, 112.32718658447266, 10.681556701660156, -14.49459457397461, -32.62645721435547, 2.0583724975585938, 1.3849563598632812, 26.48138427734375, -31.93132781982422, -21.989795684814453, 50.37184524536133, 38.27975082397461, 100.905517578125, -17.7333984375, -29.09422492980957, 21.554771423339844, 88.27764892578125, 36.19552993774414, 25.42009735107422, 71.45842742919922, 7.556190490722656, -36.296165466308594, 24.299779891967773, 116.10233306884766, 211.81423950195312, 271.8537902832031, 229.28244018554688, 167.556396484375, 122.51217651367188, 84.8146743774414, 110.4000244140625, 106.6341781616211, 26.260452270507812, 117.98786926269531, 136.04168701171875, 100.72610473632812, -77.302001953125, 60.73115539550781, 119.36506652832031, 47.21812438964844, 3.5644073486328125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000476.npy"}
{"epoch": 0.6989720998531571, "step": 477, "batch_size": 64, "mean": 71.10099792480469, "std": 78.16577911376953, "min": -155.28761291503906, "p10": -22.171673202514647, "median": 70.43376159667969, "p90": 171.97371826171877, "max": 231.0826416015625, "pos_frac": 0.8125, "sample": [-19.352909088134766, 99.2150650024414, 76.09664916992188, -77.11985778808594, 101.45985412597656, 174.2647247314453, 49.1273193359375, 166.62803649902344, 20.583484649658203, 151.2379608154297, 60.50419616699219, 75.17950439453125, 176.408935546875, 174.44789123535156, 28.121292114257812, -9.436100006103516, -14.343864440917969, 160.32382202148438, 231.0826416015625, 109.99221801757812, 141.7324676513672, 129.33535766601562, 104.1328125, 45.26335144042969, 53.631439208984375, 126.9857177734375, -15.912830352783203, 119.19134521484375, 86.81175231933594, 37.46776580810547, -155.28761291503906, 166.45370483398438, -47.396827697753906, -87.99076843261719, 208.7782745361328, 89.51747131347656, 74.2607421875, 97.08429718017578, 66.60678100585938, 57.865264892578125, 124.06388854980469, 9.422416687011719, 25.870956420898438, 94.0146713256836, -35.630157470703125, 186.4852294921875, 50.423484802246094, 145.19544982910156, -60.51240539550781, 50.47768020629883, 57.36474609375, 29.103374481201172, -13.248329162597656, 159.53085327148438, 45.36054992675781, 16.904739379882812, 149.80245971679688, 26.07220458984375, 34.46649169921875, 124.76685333251953, -23.379714965820312, 45.18994903564453, 180.4304656982422, 95.33637237548828], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000477.npy"}
{"epoch": 0.7004405286343612, "step": 478, "batch_size": 64, "mean": 73.42304992675781, "std": 75.09782409667969, "min": -155.6069793701172, "p10": -0.5910205841064378, "median": 65.03711318969727, "p90": 178.00674285888672, "max": 272.15728759765625, "pos_frac": 0.890625, "sample": [31.00299072265625, 66.26060485839844, 45.414085388183594, 19.036483764648438, 18.591705322265625, 93.53556823730469, -22.383148193359375, 58.459495544433594, 47.88136291503906, 6.984813690185547, -155.6069793701172, 35.22926330566406, 8.072513580322266, 23.576492309570312, 135.67709350585938, 207.76211547851562, 57.477928161621094, 272.15728759765625, -72.98555755615234, 25.987586975097656, 88.27194213867188, 11.91058349609375, 36.59229278564453, 120.9415054321289, 12.677803039550781, 117.00706481933594, 135.4753875732422, 110.2611312866211, 22.781646728515625, 70.56320190429688, 141.07177734375, 175.58262634277344, 99.58935546875, 62.90660858154297, 59.94170379638672, 63.813621520996094, 184.1476287841797, 114.54251861572266, 38.34502410888672, 127.69430541992188, 178.95704650878906, 160.97256469726562, -28.589141845703125, -17.744041442871094, 71.77030944824219, -3.8378067016601562, 84.47323608398438, 226.4048309326172, 191.4610595703125, 33.39805603027344, 72.59725189208984, 68.64556884765625, 175.3589630126953, 182.4581756591797, -40.69720458984375, 94.17388916015625, 80.94044494628906, 29.685821533203125, 80.5864028930664, 40.28361511230469, 54.71246337890625, 80.34474182128906, 175.78936767578125, 10.680023193359375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000478.npy"}
{"epoch": 0.7019089574155654, "step": 479, "batch_size": 64, "mean": 63.484867095947266, "std": 68.95425415039062, "min": -136.40235900878906, "p10": -24.121891784667962, "median": 71.99038696289062, "p90": 141.01531829833988, "max": 222.66749572753906, "pos_frac": 0.78125, "sample": [38.065185546875, 68.27078247070312, 170.46334838867188, 83.06362915039062, 112.57362365722656, 126.14210510253906, 32.11613464355469, 5.783935546875, 130.694580078125, 32.13494873046875, 127.5040054321289, 72.60234069824219, 84.0354232788086, 143.49546813964844, 95.9959716796875, -46.0611572265625, 65.57632446289062, -51.580406188964844, 91.03009796142578, 50.56610107421875, -15.548545837402344, 124.95828247070312, 60.65479278564453, 43.149017333984375, 97.83784484863281, 164.8481903076172, -16.974334716796875, -15.939849853515625, -0.646453857421875, 64.74021911621094, 151.39407348632812, 96.36895751953125, 5.93994140625, 116.282958984375, 32.83601379394531, 151.15985107421875, 82.25048828125, -1.1475677490234375, 5.8202362060546875, 108.13027954101562, 105.59536743164062, 135.22830200195312, 222.66749572753906, 30.587387084960938, -26.248199462890625, -55.8583984375, 94.2667236328125, -40.18310546875, 101.31803894042969, 68.9321060180664, 64.96138763427734, 71.37843322753906, -136.40235900878906, -55.158355712890625, -19.160507202148438, 131.4091033935547, 116.69172668457031, 90.20148468017578, 109.25729370117188, 122.95710754394531, -15.851547241210938, 172.62112426757812, 6.172508239746094, 79.06165313720703], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000479.npy"}
{"epoch": 0.7033773861967695, "step": 480, "batch_size": 64, "mean": 63.504146575927734, "std": 75.74657440185547, "min": -77.78607177734375, "p10": -38.17751541137694, "median": 60.85433006286621, "p90": 169.057373046875, "max": 206.06056213378906, "pos_frac": 0.78125, "sample": [12.311225891113281, 156.07449340820312, 139.90737915039062, 29.944095611572266, 87.10247802734375, 132.23023986816406, 170.324462890625, 201.70111083984375, 25.149986267089844, 65.04450225830078, -62.437583923339844, 96.25980377197266, 12.455924987792969, 2.5690231323242188, 35.669349670410156, 32.91461181640625, 68.24996948242188, 56.66415786743164, 92.8507308959961, 43.00559997558594, 146.1535186767578, 132.53903198242188, -62.70628356933594, 51.5872802734375, 24.55712890625, 76.01311492919922, 183.40476989746094, 184.63304138183594, 87.42105102539062, 10.158096313476562, 179.36550903320312, 206.06056213378906, 192.85910034179688, 14.279457092285156, -68.22593688964844, 122.98550415039062, 36.841732025146484, -77.78607177734375, 52.49725341796875, 13.927345275878906, 102.56546020507812, -4.533721923828125, 10.539649963378906, 85.43549346923828, -6.1120452880859375, -45.941551208496094, 77.86536407470703, -20.061431884765625, -58.79917526245117, 128.09530639648438, -12.850784301757812, -9.372836112976074, 74.4366683959961, 166.100830078125, 71.80570220947266, -5.470712661743164, -67.28907775878906, 43.01660919189453, 142.3009796142578, 91.93982696533203, 146.31228637695312, 92.24028015136719, -7.07354736328125, 164.55923461914062], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000480.npy"}
{"epoch": 0.7048458149779736, "step": 481, "batch_size": 64, "mean": 68.99488830566406, "std": 79.05152130126953, "min": -133.7528076171875, "p10": -18.763103294372552, "median": 64.3205451965332, "p90": 167.9746871948242, "max": 275.46087646484375, "pos_frac": 0.828125, "sample": [166.63587951660156, -116.80078125, 69.28326416015625, -6.976654052734375, 19.208770751953125, 16.292343139648438, 20.256454467773438, 49.69816970825195, 23.704500198364258, 168.5484619140625, -12.582527160644531, 97.04635620117188, 50.85624694824219, 37.61943435668945, 95.56416320800781, -7.369419097900391, 32.53471374511719, 39.106109619140625, 183.89007568359375, 76.6667251586914, 106.13943481445312, 68.88065338134766, 126.23849487304688, 133.73703002929688, 124.04338073730469, 18.098609924316406, 45.02076721191406, 163.83743286132812, 50.22853088378906, 99.02471160888672, -21.21279525756836, 182.48831176757812, 91.97261047363281, 121.69403839111328, 77.39666748046875, 129.71609497070312, 40.325836181640625, 59.2685546875, 108.6094741821289, 18.66680908203125, 3.4788055419921875, 59.76043701171875, 27.196640014648438, 231.73121643066406, -96.85136413574219, -133.7528076171875, 126.89239501953125, 104.39891052246094, 275.46087646484375, 36.250213623046875, 45.45784378051758, 79.15362548828125, 105.39160919189453, 20.777698516845703, -28.43340301513672, -24.237274169921875, -62.54386901855469, 158.31536865234375, 113.00444030761719, 170.9070587158203, 137.54396057128906, 190.4306182861328, -13.047155380249023, 141.03048706054688], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000481.npy"}
{"epoch": 0.7063142437591777, "step": 482, "batch_size": 64, "mean": 67.4673080444336, "std": 73.67701721191406, "min": -157.8822021484375, "p10": -14.51023712158203, "median": 70.8050651550293, "p90": 167.77415618896484, "max": 208.17684936523438, "pos_frac": 0.828125, "sample": [180.6990966796875, 15.205810546875, 143.92599487304688, 12.544586181640625, 166.2191619873047, 71.56455993652344, 45.850189208984375, 18.62364959716797, 63.9273681640625, -60.34242248535156, 39.94407272338867, 71.9378662109375, -6.193408966064453, -14.967391967773438, 98.80799865722656, 201.58932495117188, 145.08741760253906, 61.07392501831055, 104.06373596191406, 196.24688720703125, 26.602874755859375, 3.87164306640625, -4.487937927246094, 72.818115234375, 35.153114318847656, 168.44058227539062, 1.3790168762207031, 152.79347229003906, 102.26736450195312, 13.99179458618164, 70.04557037353516, 84.8741455078125, 208.17684936523438, 143.47642517089844, 9.939865112304688, 21.112300872802734, 46.44373321533203, 51.336395263671875, 87.69316101074219, 139.09030151367188, 92.27798461914062, 43.94794845581055, 0.9593372344970703, 38.15370178222656, 102.05253601074219, -55.78076171875, -57.91404724121094, 117.24663543701172, 113.75057983398438, 138.33119201660156, -157.8822021484375, 147.13845825195312, -13.44354248046875, 106.39189147949219, -20.46649169921875, 85.63380432128906, 101.06196594238281, 175.25967407226562, 32.35240173339844, -19.620590209960938, -3.5091018676757812, 184.13502502441406, 83.50421142578125, 93.500244140625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000482.npy"}
{"epoch": 0.7077826725403817, "step": 483, "batch_size": 64, "mean": 71.71238708496094, "std": 71.45669555664062, "min": -111.03945922851562, "p10": 0.3614723205566413, "median": 64.09388732910156, "p90": 156.87183990478516, "max": 269.25933837890625, "pos_frac": 0.90625, "sample": [18.179092407226562, 117.236328125, 151.7231903076172, 152.9001922607422, -74.24984741210938, 129.69613647460938, 138.1023712158203, 34.589935302734375, 16.421836853027344, 0.069366455078125, 25.95513916015625, 216.45948791503906, 83.85810089111328, 116.05355072021484, 5.958339691162109, 147.09869384765625, 144.67718505859375, 82.24886322021484, 142.56031799316406, 139.86172485351562, -26.804847717285156, -2.2287120819091797, 25.563201904296875, 61.16117858886719, 42.91046905517578, 27.568145751953125, 100.72517395019531, 171.45526123046875, 131.84425354003906, 99.98342895507812, -16.613975524902344, 269.25933837890625, 25.476234436035156, 67.02659606933594, 75.05968475341797, 29.56475067138672, 72.57655334472656, 41.15149688720703, 187.05496215820312, 86.70286560058594, 77.16204833984375, 12.067611694335938, 170.72503662109375, 59.66063690185547, 170.09063720703125, 130.0000457763672, 12.556663513183594, 1.0430526733398438, 53.205810546875, 24.049606323242188, 39.03413391113281, 53.16355895996094, -85.65679168701172, 31.64813995361328, -111.03945922851562, 75.2408447265625, 89.50731658935547, 106.26509094238281, 158.573974609375, 50.9305534362793, 18.282958984375, 60.53904724121094, 129.65487670898438, 4.051361083984375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000483.npy"}
{"epoch": 0.7092511013215859, "step": 484, "batch_size": 64, "mean": 76.90882873535156, "std": 81.04461669921875, "min": -126.41262817382812, "p10": -22.012659072875973, "median": 73.21420288085938, "p90": 173.38624114990236, "max": 245.65139770507812, "pos_frac": 0.828125, "sample": [-29.108810424804688, 141.38336181640625, 28.91534423828125, -9.29364013671875, 48.171112060546875, 44.76898956298828, 0.24919509887695312, 87.97340393066406, 119.38677978515625, 46.62285614013672, 4.793903350830078, 7.027286529541016, 197.08047485351562, 189.3270263671875, 170.72291564941406, 20.60092544555664, 89.97491455078125, -29.60942840576172, -42.4361572265625, 92.2196044921875, 138.18165588378906, 7.740863800048828, 145.59698486328125, 162.0477294921875, 233.9534912109375, -77.68586730957031, 138.0514678955078, 237.87074279785156, 1.3992652893066406, 125.98905181884766, 96.61474609375, -23.205699920654297, 141.23915100097656, 174.26821899414062, -13.919204711914062, 148.22808837890625, 77.88888549804688, 50.58576965332031, -2.4867019653320312, 105.16490173339844, 67.0439453125, 113.724609375, 57.24647521972656, 25.348186492919922, 171.3282928466797, -29.1497802734375, 115.36618041992188, 39.94259262084961, 101.64354705810547, -126.41262817382812, 6.931446075439453, 84.23497009277344, 245.65139770507812, 68.53952026367188, 32.34897232055664, 63.820220947265625, 135.85760498046875, 118.72447967529297, 150.47482299804688, 10.021005630493164, 152.80517578125, 63.874874114990234, -19.228897094726562, 225.7347412109375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000484.npy"}
{"epoch": 0.71071953010279, "step": 485, "batch_size": 64, "mean": 88.29972839355469, "std": 75.34162139892578, "min": -40.598846435546875, "p10": 2.015275955200196, "median": 85.80889511108398, "p90": 194.11225891113287, "max": 268.8243408203125, "pos_frac": 0.90625, "sample": [176.677001953125, 113.29721069335938, 88.1424331665039, 40.68310546875, 215.16705322265625, 117.95201110839844, 130.71348571777344, -40.598846435546875, 144.33364868164062, 106.57019805908203, 53.196624755859375, 200.04190063476562, 28.602752685546875, 87.84569549560547, -40.0804443359375, 159.23895263671875, 115.53839111328125, 8.13067626953125, 157.76815795898438, 82.0184326171875, 230.74163818359375, 211.7301788330078, -15.707046508789062, 22.05471420288086, 143.5920867919922, 169.31790161132812, 29.98902130126953, 28.990028381347656, 123.44255065917969, 107.99996948242188, 139.83663940429688, 10.671028137207031, 56.825965881347656, 23.32752227783203, -22.261585235595703, 55.41548156738281, 170.95834350585938, 155.36378479003906, -39.67144012451172, 83.7720947265625, 180.27642822265625, 93.20314025878906, 218.5415496826172, 138.87429809570312, 15.14141845703125, 44.342315673828125, -9.939189910888672, 78.86497497558594, 95.33451843261719, 3.0670623779296875, 26.75909423828125, 203.87295532226562, 43.09496307373047, 117.29946899414062, 9.074188232421875, 161.91839599609375, 119.28474426269531, 69.49846649169922, 1.5645103454589844, 14.961196899414062, 21.24590301513672, 28.76239776611328, 268.8243408203125, 75.6881103515625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000485.npy"}
{"epoch": 0.7121879588839941, "step": 486, "batch_size": 64, "mean": 78.58280181884766, "std": 81.8096923828125, "min": -120.18292236328125, "p10": -20.46575393676757, "median": 70.08208465576172, "p90": 185.18345642089847, "max": 239.5874786376953, "pos_frac": 0.828125, "sample": [227.04501342773438, 191.5845947265625, 68.97650146484375, 110.83226776123047, 109.09019470214844, 25.851165771484375, 123.98921966552734, 44.77593994140625, 64.9791030883789, 146.67498779296875, 174.75852966308594, 36.87065887451172, 143.8948211669922, 6.470672607421875, 63.93623352050781, 162.9129638671875, 126.71446228027344, 6.902008056640625, -3.0912704467773438, 64.47984313964844, 115.69161987304688, -42.747867584228516, 40.771629333496094, 61.96574401855469, 9.591739654541016, 68.50072479248047, -24.14392852783203, 129.53945922851562, 106.63327026367188, 165.6405029296875, 144.67337036132812, 193.39157104492188, 36.202667236328125, -11.883346557617188, 156.25521850585938, -37.004051208496094, 8.6475830078125, 102.08041381835938, 189.59249877929688, 169.07223510742188, 54.45072937011719, 102.7598648071289, 99.94000244140625, 0.07089519500732422, 212.7701873779297, 173.2562713623047, 87.62147521972656, 174.89569091796875, 25.00506591796875, 144.93202209472656, 52.98606872558594, 71.81156921386719, -10.960996627807617, 71.18766784667969, 3.049407958984375, 4.111719131469727, 140.18960571289062, -27.500457763671875, -120.18292236328125, -2.186573028564453, 201.85440063476562, -92.06044006347656, -58.40850067138672, 239.5874786376953], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000486.npy"}
{"epoch": 0.7136563876651982, "step": 487, "batch_size": 64, "mean": 71.78614807128906, "std": 73.78055572509766, "min": -73.13101196289062, "p10": -2.150388717651366, "median": 60.29062461853027, "p90": 186.4601196289064, "max": 258.6658935546875, "pos_frac": 0.875, "sample": [80.56707763671875, 219.81930541992188, 11.207305908203125, 14.713302612304688, 98.5987548828125, 66.41278839111328, 100.95062255859375, 20.37706756591797, 240.19503784179688, 8.544387817382812, 78.14303588867188, 90.86531066894531, 127.17591857910156, -5.314796447753906, 109.68712615966797, 258.6658935546875, 117.19188690185547, 149.7220001220703, 74.13215637207031, 59.8494873046875, 137.08529663085938, 208.78921508789062, -18.822975158691406, 136.38674926757812, 60.737648010253906, 141.87164306640625, 21.548439025878906, 27.961318969726562, 32.8531608581543, 44.06354522705078, 41.10845947265625, 0.7574672698974609, 102.29853820800781, 55.42103576660156, 38.98893737792969, 51.25453186035156, 142.60781860351562, -68.32767486572266, 202.1565704345703, 46.45318603515625, -2.756793975830078, -0.735443115234375, 12.515586853027344, 22.5421142578125, 30.304397583007812, 149.83506774902344, 6.3043212890625, -41.09901428222656, 64.539794921875, 121.68424987792969, -73.13101196289062, 221.51556396484375, 60.73176193237305, 57.95014953613281, 23.251527786254883, -49.383758544921875, 67.16708374023438, 209.46397399902344, 113.62667083740234, 35.8836669921875, 43.30300521850586, 65.68985748291016, 27.000320434570312, 101.41394805908203], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000487.npy"}
{"epoch": 0.7151248164464024, "step": 488, "batch_size": 64, "mean": 74.18093872070312, "std": 65.45681762695312, "min": -117.0220947265625, "p10": 1.8598697662353567, "median": 73.3923225402832, "p90": 143.99515686035159, "max": 251.16287231445312, "pos_frac": 0.890625, "sample": [43.27824401855469, 37.48820495605469, 55.81224822998047, 164.34881591796875, 243.40480041503906, -3.5421218872070312, 102.197509765625, -0.40491485595703125, -27.568222045898438, 97.3268051147461, 78.17892456054688, 13.573837280273438, 23.99420166015625, 111.24801635742188, 10.1986083984375, 83.98109436035156, 95.91424560546875, -15.745172500610352, 7.144367218017578, -117.0220947265625, 99.68588256835938, 8.810661315917969, 174.00186157226562, 52.199989318847656, 63.810062408447266, 134.36474609375, 98.38336181640625, 119.92672729492188, 113.99838256835938, 106.44376373291016, 51.77037811279297, 58.56233215332031, 59.52783203125, 137.12484741210938, 162.03244018554688, 106.67697143554688, 251.16287231445312, 107.88565063476562, 49.03092575073242, 71.10335540771484, 50.785072326660156, 111.63287353515625, 93.35629272460938, 127.61984252929688, 134.90560913085938, -20.96857452392578, 48.14423370361328, 115.56527709960938, 18.71270751953125, 18.301910400390625, 95.55654907226562, 145.42474365234375, 75.68128967285156, 137.15512084960938, 57.80755615234375, -53.951904296875, 12.96310806274414, 158.78469848632812, 18.703237533569336, 19.737869262695312, 44.12226867675781, 39.592437744140625, 126.97775268554688, 140.65945434570312], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000488.npy"}
{"epoch": 0.7165932452276065, "step": 489, "batch_size": 64, "mean": 69.30853271484375, "std": 73.26781463623047, "min": -53.695953369140625, "p10": -16.35525588989258, "median": 68.26888275146484, "p90": 163.73064270019535, "max": 233.31643676757812, "pos_frac": 0.8125, "sample": [58.40428161621094, 76.85908508300781, 36.285789489746094, 16.3365478515625, 69.18262481689453, 23.226043701171875, 93.0355453491211, 100.68732452392578, 25.21843719482422, 79.35721588134766, 126.68463897705078, -32.49385070800781, -51.90824890136719, 84.45137786865234, 67.35514068603516, 43.28080749511719, 177.22760009765625, 150.18199157714844, -53.695953369140625, 175.10169982910156, 76.41036224365234, 16.441598892211914, -10.051918029785156, 58.91876220703125, 36.86061096191406, -5.242652893066406, 166.69430541992188, 17.63599395751953, 77.59415435791016, -49.83879089355469, 12.267614364624023, 73.11189270019531, -0.44658470153808594, 32.443565368652344, 220.04953002929688, 44.014434814453125, 22.506221771240234, 5.7306976318359375, 27.564559936523438, -16.704307556152344, 131.50985717773438, -53.02240753173828, 231.51515197753906, 19.94983673095703, 123.34172058105469, 100.03071594238281, 156.8154296875, 134.67245483398438, 223.388916015625, -15.540802001953125, 233.31643676757812, 142.58261108398438, 100.56585693359375, 154.64776611328125, -21.15199089050293, 31.801177978515625, 79.55703735351562, 119.08963775634766, 128.99435424804688, 87.30062103271484, -7.15264892578125, 113.98876953125, 147.546142578125, 1.2614593505859375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000489.npy"}
{"epoch": 0.7180616740088106, "step": 490, "batch_size": 64, "mean": 59.85785675048828, "std": 94.34295654296875, "min": -142.0689239501953, "p10": -62.043659591674796, "median": 58.302019119262695, "p90": 168.9592742919922, "max": 279.2066650390625, "pos_frac": 0.734375, "sample": [144.81494140625, -142.0689239501953, 115.23046875, -9.764015197753906, 83.29196166992188, 9.549169540405273, -31.629661560058594, -17.203636169433594, 198.34719848632812, 8.306293487548828, -131.02197265625, 162.36953735351562, 89.0198974609375, 200.64418029785156, -66.79440307617188, -46.59571838378906, -38.27055358886719, 169.65640258789062, -82.369873046875, -15.993751525878906, 0.5809707641601562, 18.66082763671875, -120.09622192382812, 34.65754699707031, -74.88306427001953, 116.64331817626953, 8.247804641723633, -20.641027450561523, 65.88481903076172, 279.2066650390625, 41.66804504394531, 276.676513671875, 53.3835563659668, 130.3341522216797, 144.72738647460938, 99.07318115234375, 111.70317077636719, 43.056480407714844, 112.05404663085938, -27.409141540527344, 143.45556640625, 82.66753387451172, 153.33724975585938, 156.14744567871094, 58.852291107177734, 153.44815063476562, -85.28388977050781, 86.30865478515625, -37.81109619140625, 114.83804321289062, 95.22123718261719, -50.95859146118164, 197.24293518066406, 62.70805358886719, 51.530303955078125, 167.3326416015625, 102.00215911865234, 37.067832946777344, 25.06562042236328, 209.78292846679688, 57.751747131347656, 79.43560791015625, 56.937721252441406, 20.776039123535156], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000490.npy"}
{"epoch": 0.7195301027900147, "step": 491, "batch_size": 64, "mean": 63.129661560058594, "std": 78.33280181884766, "min": -183.8103485107422, "p10": -29.494820022583006, "median": 63.023555755615234, "p90": 166.0752731323242, "max": 212.3819580078125, "pos_frac": 0.8125, "sample": [54.302947998046875, -30.179821014404297, 88.06919860839844, 166.50927734375, 98.1419448852539, 18.563013076782227, 94.19477081298828, 1.1667518615722656, 47.31150817871094, 74.40725708007812, 77.83605194091797, 56.206512451171875, 66.67759704589844, -40.81861114501953, 166.44607543945312, 151.67892456054688, 56.77978515625, 157.74810791015625, 82.60101318359375, 1.654541015625, 19.76667022705078, 35.16448211669922, 212.3819580078125, 18.48944091796875, -37.83783721923828, 66.93937683105469, 141.09323120117188, 18.95100212097168, 146.83416748046875, 35.00092315673828, 142.72885131835938, 134.40298461914062, -21.922935485839844, 69.16365051269531, -27.896484375, 203.40786743164062, -4.835914611816406, 193.57183837890625, -1.1901893615722656, 115.99125671386719, 78.37982177734375, -16.179672241210938, 211.97222900390625, -92.43890380859375, 59.591827392578125, 80.4332504272461, 60.80345153808594, 132.2877197265625, 165.21006774902344, 171.3817138671875, 2.918712615966797, 67.68101501464844, 25.922317504882812, 32.51476287841797, 65.24365997314453, -79.98779296875, 110.31987762451172, 87.10258483886719, 29.358261108398438, 43.93232727050781, 145.92971801757812, -56.731475830078125, 48.96208190917969, -183.8103485107422], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000491.npy"}
{"epoch": 0.7209985315712188, "step": 492, "batch_size": 64, "mean": 52.02274703979492, "std": 90.35128021240234, "min": -208.05519104003906, "p10": -67.17593002319336, "median": 43.65361022949219, "p90": 168.79033203125002, "max": 241.96383666992188, "pos_frac": 0.734375, "sample": [70.02222442626953, 132.6717987060547, 52.701358795166016, -72.71285247802734, 112.64631652832031, 28.660354614257812, -4.043426513671875, 144.94497680664062, -71.81512451171875, 3.450408935546875, -90.74246215820312, 205.6941375732422, -1.7060394287109375, 145.3779754638672, 70.13479614257812, 164.22146606445312, 2.2925186157226562, 164.69650268554688, 82.63088989257812, 181.982177734375, -69.12903594970703, 87.22797393798828, 31.717819213867188, 107.20794677734375, 127.44962310791016, 20.344444274902344, 49.14244842529297, -62.618682861328125, 24.170608520507812, 8.349590301513672, 7.45556640625, -17.490921020507812, 25.5560359954834, -43.89677429199219, 9.187707901000977, 141.41290283203125, 38.164772033691406, -2.4520339965820312, -208.05519104003906, 110.1558837890625, 241.96383666992188, 122.87545776367188, 90.90609741210938, 9.825820922851562, 22.93213653564453, 226.72479248046875, 76.29005432128906, 99.55077362060547, 74.30694580078125, 185.51332092285156, 95.95088195800781, -14.828056335449219, 100.51959991455078, -113.91547393798828, 95.15771484375, 170.54483032226562, -3.6113433837890625, 34.91014099121094, -60.09047317504883, 81.93450927734375, -121.89033508300781, 198.474365234375, 14.96127700805664, -4.559783935546875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000492.npy"}
{"epoch": 0.7224669603524229, "step": 493, "batch_size": 64, "mean": 61.896034240722656, "std": 65.36371612548828, "min": -85.47159576416016, "p10": -28.60513000488281, "median": 66.78594207763672, "p90": 142.00155792236328, "max": 195.2083740234375, "pos_frac": 0.84375, "sample": [40.69658279418945, 46.77814483642578, 122.29559326171875, 32.470458984375, 77.86648559570312, 110.41999816894531, 195.2083740234375, 142.70803833007812, -54.243812561035156, -73.5467529296875, 140.3531036376953, 55.76805114746094, 101.55723571777344, 54.343505859375, 30.779695510864258, 82.95478057861328, 85.3236083984375, 78.16249084472656, -67.5815200805664, 21.050695419311523, 51.04991912841797, 70.7254409790039, 128.44473266601562, 1.6737136840820312, 96.92642211914062, 72.99573516845703, 127.8637924194336, -85.47159576416016, 114.7393569946289, -24.323360443115234, 64.47352600097656, -2.202930450439453, 151.96868896484375, 46.91925811767578, 133.55697631835938, -30.440174102783203, 174.22720336914062, 67.83424377441406, 98.04945373535156, -9.721420288085938, 23.6019287109375, 19.359695434570312, 186.23524475097656, 35.00717544555664, 90.54936218261719, 88.14112854003906, 16.206995010375977, 22.527496337890625, 172.68466186523438, 95.94854736328125, 41.087303161621094, 123.21570587158203, 106.76947021484375, 162.14169311523438, -69.13199615478516, -38.20555114746094, 66.21743774414062, 26.03156089782715, 24.981430053710938, 83.78555297851562, 67.35444641113281, 101.94595336914062, 8.368850708007812, 33.868560791015625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000493.npy"}
{"epoch": 0.723935389133627, "step": 494, "batch_size": 64, "mean": 84.58407592773438, "std": 78.070556640625, "min": -110.10407257080078, "p10": -11.089061737060547, "median": 80.19169235229492, "p90": 189.75792694091803, "max": 267.98602294921875, "pos_frac": 0.859375, "sample": [-11.26772689819336, 49.663414001464844, 107.337646484375, 195.41494750976562, 267.98602294921875, 137.9315185546875, -12.84722900390625, 229.98133850097656, 152.38955688476562, 49.36626052856445, 93.72479248046875, 76.08262634277344, -19.33584213256836, 156.87234497070312, 26.708740234375, 147.757568359375, 78.10975646972656, -88.21246337890625, 87.47794342041016, 220.28575134277344, 99.915283203125, 75.98037719726562, 96.32441711425781, 172.93101501464844, 3.2746429443359375, 76.5235595703125, 52.097320556640625, 196.1031494140625, 147.76638793945312, 88.23719787597656, -46.73793029785156, 29.656517028808594, 60.289222717285156, 176.55821228027344, 144.26373291015625, 74.62092590332031, 138.45901489257812, 24.78496551513672, 85.61734008789062, 129.47451782226562, 65.06575775146484, -10.672176361083984, -110.10407257080078, 20.179168701171875, 153.26828002929688, 220.5836181640625, 106.200927734375, 10.531761169433594, -69.27357482910156, 153.01904296875, 82.5677719116211, 4.435138702392578, 111.32913208007812, 45.69700622558594, 117.99446105957031, 148.54806518554688, 198.22744750976562, 76.86296081542969, -1.3530120849609375, 17.802993774414062, 80.16497802734375, 73.68751525878906, 80.2184066772461, 66.83233642578125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000494.npy"}
{"epoch": 0.7254038179148311, "step": 495, "batch_size": 64, "mean": 71.03709411621094, "std": 81.49250030517578, "min": -159.5187225341797, "p10": -35.2141788482666, "median": 71.9976692199707, "p90": 163.90857086181643, "max": 277.47088623046875, "pos_frac": 0.8125, "sample": [121.969482421875, -5.819530487060547, 101.12235260009766, 69.03594970703125, 164.82260131835938, 142.13519287109375, -43.666290283203125, -0.5688934326171875, 19.174911499023438, 44.23529052734375, 110.22130584716797, 20.456695556640625, 62.75391387939453, -28.90829086303711, 153.56182861328125, 78.82408905029297, 14.165081024169922, 40.86936950683594, 33.750640869140625, 54.88099670410156, 17.595352172851562, -39.338287353515625, 124.09429931640625, -19.221092224121094, 95.84896850585938, 83.07522583007812, 177.88311767578125, -37.91670227050781, 38.12648010253906, 34.40882873535156, 92.76016235351562, 124.34153747558594, 95.4931640625, -84.79752349853516, 169.3345947265625, 202.9381103515625, 161.7758331298828, 57.20964050292969, 234.51968383789062, 122.52273559570312, 105.51506042480469, -54.96905517578125, 78.9953384399414, 56.473968505859375, 29.597137451171875, 97.97285461425781, -159.5187225341797, 147.13983154296875, 58.19540786743164, 38.393798828125, 72.515380859375, 40.5126953125, 121.5525894165039, 104.26445007324219, 161.71681213378906, -19.14654541015625, 26.76880645751953, 155.0360565185547, -104.99629211425781, 71.4799575805664, 205.52708435058594, 148.09934997558594, 277.47088623046875, 84.10612487792969], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000495.npy"}
{"epoch": 0.7268722466960352, "step": 496, "batch_size": 64, "mean": 60.36466598510742, "std": 70.49468231201172, "min": -123.86347198486328, "p10": -26.636392974853507, "median": 66.82257461547852, "p90": 139.08839263916016, "max": 201.69049072265625, "pos_frac": 0.765625, "sample": [-123.86347198486328, -51.10279846191406, 104.38011932373047, 113.71344757080078, -2.3150100708007812, 121.92802429199219, -93.5068359375, 174.4584197998047, 123.74642944335938, -16.796493530273438, 37.213661193847656, 185.39071655273438, 4.885993957519531, 118.21481323242188, 146.3349609375, 50.76764678955078, 27.795623779296875, 113.37203979492188, -11.120674133300781, 39.549888610839844, 59.251373291015625, 70.68081665039062, 114.17294311523438, 139.63096618652344, 201.69049072265625, 152.842529296875, 65.20942687988281, 74.04242706298828, 85.2713394165039, 82.75189208984375, 36.39031982421875, -4.750389099121094, 54.40570068359375, 118.74604797363281, -6.372856140136719, 106.30397033691406, -2.0417022705078125, 67.75170135498047, -43.61204528808594, 127.00332641601562, 3.628002166748047, 10.001564025878906, 81.90251159667969, -30.853492736816406, -9.234535217285156, 65.89344787597656, 90.39654541015625, 197.37277221679688, 72.70735931396484, 59.69444274902344, 60.77092742919922, -89.98977661132812, 137.8223876953125, 120.36637878417969, 108.48246765136719, 25.883892059326172, 134.9859619140625, 85.4709701538086, 22.626556396484375, 76.970947265625, 92.13873291015625, -14.097854614257812, 37.21485137939453, -39.231468200683594], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000496.npy"}
{"epoch": 0.7283406754772394, "step": 497, "batch_size": 64, "mean": 45.23841094970703, "std": 75.94994354248047, "min": -130.94227600097656, "p10": -55.21966476440429, "median": 32.89691162109375, "p90": 151.68666229248046, "max": 215.95578002929688, "pos_frac": 0.734375, "sample": [6.229984283447266, 149.9886474609375, 65.44325256347656, 184.2334747314453, -3.570169448852539, -10.988361358642578, 55.50218200683594, 47.25262451171875, 69.99988555908203, 20.604034423828125, 99.06707000732422, 100.06240844726562, -2.617431640625, 3.2092971801757812, 3.6888809204101562, 31.1485595703125, 95.3788833618164, 110.83132934570312, 52.000694274902344, 14.43896484375, -7.579963684082031, 26.418766021728516, 65.85427856445312, -58.15824890136719, 7.7074127197265625, 18.743854522705078, -38.15725326538086, 157.310791015625, -3.079620361328125, -73.0511474609375, 82.784912109375, 93.66948699951172, 56.20732116699219, 122.81918334960938, -99.22984313964844, 215.95578002929688, 14.394371032714844, 97.31942749023438, 134.4854278564453, 93.52915954589844, -12.14910888671875, 118.97676849365234, 13.830982208251953, 212.398681640625, -130.94227600097656, 63.09526062011719, 171.72756958007812, 107.32189178466797, 164.566162109375, 34.645263671875, -74.1434326171875, -59.787208557128906, 52.87187957763672, 30.13988494873047, 59.82032012939453, 3.3813400268554688, -67.8475341796875, -12.437515258789062, -38.57853698730469, 14.248069763183594, 152.4143829345703, -48.36296844482422, 133.2007598876953, 7.019401550292969], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000497.npy"}
{"epoch": 0.7298091042584435, "step": 498, "batch_size": 64, "mean": 69.55242919921875, "std": 69.90756225585938, "min": -117.04827880859375, "p10": -4.069352722167966, "median": 56.70199012756348, "p90": 155.12825775146484, "max": 256.9486999511719, "pos_frac": 0.875, "sample": [-35.199764251708984, 124.44381713867188, 37.94188690185547, 133.85867309570312, 22.0943603515625, 27.3720703125, 18.878421783447266, 92.33274841308594, 83.01964569091797, 163.67176818847656, 28.16094207763672, 20.79894256591797, 19.068889617919922, 154.2830047607422, -45.8673095703125, 101.5353012084961, 104.76646423339844, 104.3857650756836, 123.5103759765625, 209.95535278320312, 2.194814682006836, 71.6698989868164, 44.84309387207031, 75.9846420288086, 115.19879913330078, 197.66070556640625, 83.59293365478516, -5.468635559082031, 14.168731689453125, 155.49050903320312, 52.85420227050781, 152.71046447753906, 44.81565856933594, 256.9486999511719, 54.02931594848633, 59.374664306640625, 48.644493103027344, 112.23968505859375, 157.8193359375, 102.04319763183594, 28.537216186523438, 12.324111938476562, 131.95614624023438, 52.779014587402344, 141.625, 21.208587646484375, -117.04827880859375, 18.402969360351562, 78.36952209472656, 93.88939666748047, 43.08552551269531, -65.52545928955078, 77.18623352050781, -0.8043594360351562, 0.096649169921875, -35.610687255859375, 124.84304809570312, 103.26924133300781, 194.08921813964844, 52.61903381347656, 19.032508850097656, 149.63877868652344, 48.98255920410156, -7.417217254638672], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000498.npy"}
{"epoch": 0.7312775330396476, "step": 499, "batch_size": 64, "mean": 68.91854095458984, "std": 82.9411392211914, "min": -139.43910217285156, "p10": -21.99670257568359, "median": 62.91853332519531, "p90": 172.22937622070313, "max": 298.3147888183594, "pos_frac": 0.8125, "sample": [89.6631851196289, 61.03257751464844, 30.554351806640625, -139.43910217285156, 105.76785278320312, 28.724716186523438, 51.68272399902344, 29.389007568359375, -97.78075408935547, -51.13153076171875, 50.190834045410156, 17.666545867919922, 18.31153106689453, 91.71743774414062, -6.467681884765625, 118.35069274902344, 85.73855590820312, -12.673416137695312, 298.3147888183594, 47.85322570800781, 85.22477722167969, 141.51283264160156, 46.24180603027344, 180.66915893554688, 168.8753662109375, 30.737655639648438, 173.66680908203125, 121.37060546875, 37.142127990722656, 202.30320739746094, 132.375732421875, 15.222969055175781, 92.05451965332031, 52.49351501464844, 6.657569885253906, 64.80448913574219, 90.61334228515625, 32.82157897949219, 119.81475830078125, -99.25975036621094, -36.63007354736328, 177.3214111328125, 148.87071228027344, 175.80838012695312, 157.7848663330078, 104.68643188476562, 103.82312774658203, 131.10549926757812, -13.522586822509766, -24.192718505859375, 12.310455322265625, 143.6450958251953, 139.57797241210938, 47.18731689453125, 23.884525299072266, 134.40585327148438, 247.03814697265625, 79.54216003417969, 0.7067413330078125, -70.34868621826172, -9.591442108154297, 97.12085723876953, -16.872665405273438, 144.31634521484375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000499.npy"}
{"epoch": 0.7327459618208517, "step": 500, "batch_size": 64, "mean": 62.958309173583984, "std": 83.96858978271484, "min": -76.28460693359375, "p10": -34.72176055908203, "median": 45.63586616516113, "p90": 179.7774291992188, "max": 266.39544677734375, "pos_frac": 0.734375, "sample": [2.836048126220703, 26.41244888305664, -35.33860778808594, 266.39544677734375, 113.58665466308594, 161.214111328125, 59.56745147705078, 31.656478881835938, 1.4339866638183594, 165.94369506835938, 115.75343322753906, 159.03961181640625, -7.011878967285156, 44.48518371582031, -33.82001495361328, 128.4896240234375, 5.465667724609375, 2.408966064453125, -1.5407943725585938, 125.11573791503906, 46.91004943847656, -27.335235595703125, 107.55730438232422, 195.22218322753906, 21.834671020507812, 192.84356689453125, -19.94890594482422, 239.5044708251953, 113.74140930175781, -47.100006103515625, 47.169677734375, 51.16563415527344, 56.589813232421875, 134.65878295898438, -62.431365966796875, 149.26446533203125, -70.3570556640625, -35.10822296142578, 72.3533935546875, -16.578521728515625, -76.28460693359375, 8.972335815429688, 145.4474639892578, 169.32115173339844, -19.49553680419922, 1.3088722229003906, 169.3551025390625, 92.1068115234375, 43.779876708984375, 47.28688049316406, 184.244140625, 163.35504150390625, 84.98119354248047, 32.383975982666016, 29.29943084716797, 198.14974975585938, 41.70869445800781, -36.13037109375, 30.407562255859375, -15.413497924804688, 46.78654861450195, 216.31198120117188, -3.089111328125, -7.511329650878906], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000500.npy"}
{"epoch": 0.7342143906020558, "step": 501, "batch_size": 64, "mean": 67.81106567382812, "std": 95.60289001464844, "min": -139.99188232421875, "p10": -41.6958267211914, "median": 57.42985153198242, "p90": 210.82040557861336, "max": 291.9793701171875, "pos_frac": 0.78125, "sample": [141.88082885742188, -87.06830596923828, 55.97608184814453, -31.663204193115234, 37.119102478027344, 26.6434326171875, 0.7171401977539062, 189.19007873535156, -35.47113800048828, -20.883743286132812, 55.34312438964844, 16.300142288208008, 103.64263153076172, 101.4750747680664, 184.113037109375, 224.10751342773438, 96.29582214355469, -67.41570281982422, -115.95785522460938, 12.833282470703125, 139.07545471191406, 10.309856414794922, 117.33872985839844, 146.6785430908203, 176.00721740722656, 147.86419677734375, -42.84477233886719, -36.164085388183594, 177.7899932861328, 101.42866516113281, 0.1543731689453125, 9.729141235351562, 66.31876373291016, 4.441204071044922, 103.87940979003906, 222.2203369140625, -22.718154907226562, 56.32525634765625, 80.29072570800781, 169.86004638671875, 240.0814208984375, 46.9755859375, 69.34061431884766, -39.01495361328125, 225.020751953125, 86.8631362915039, 5.398429870605469, 137.01519775390625, -11.06679916381836, -65.88505554199219, 220.09054565429688, 1.0294570922851562, 31.05108642578125, 8.406669616699219, 89.64619445800781, -139.99188232421875, 74.98747253417969, 225.62391662597656, 178.0365447998047, -43.24067306518555, 291.9793701171875, 58.534446716308594, 106.65194702148438, 27.212661743164062], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000501.npy"}
{"epoch": 0.73568281938326, "step": 502, "batch_size": 64, "mean": 61.971439361572266, "std": 82.99942779541016, "min": -129.19741821289062, "p10": -48.9416763305664, "median": 60.92786407470703, "p90": 161.9532897949219, "max": 215.9161376953125, "pos_frac": 0.78125, "sample": [80.32275390625, -80.1076889038086, 56.350677490234375, 67.86119079589844, 13.500259399414062, -30.761856079101562, -39.3343505859375, -51.43511962890625, 215.9161376953125, 133.95677185058594, 137.91397094726562, 180.41903686523438, 59.38337707519531, 151.59503173828125, 134.7189483642578, 13.918746948242188, -37.66389465332031, 125.73104858398438, 130.22877502441406, 153.54486083984375, 17.0997314453125, 29.595523834228516, 40.23048400878906, 33.65629959106445, 37.4769287109375, -23.85449981689453, -55.70140838623047, 78.00523376464844, 80.33898162841797, 164.63279724121094, -37.446258544921875, 202.99119567871094, -129.19741821289062, -7.7784576416015625, 28.107421875, 163.80873107910156, 2.47821044921875, 108.52793884277344, -98.69131469726562, 13.063125610351562, 176.0643768310547, 97.64601135253906, -77.00300598144531, -43.12364196777344, 158.5728759765625, 54.33253479003906, 37.18290710449219, 12.88958740234375, 163.40203857421875, 123.80484008789062, 149.6870574951172, 63.49188995361328, 150.43780517578125, 62.47235107421875, 18.30126190185547, 35.438140869140625, 139.90452575683594, -76.4290771484375, 119.60774230957031, 16.077091217041016, 155.65182495117188, 125.08851623535156, 147.9583740234375, 91.31407165527344], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000502.npy"}
{"epoch": 0.737151248164464, "step": 503, "batch_size": 64, "mean": 71.16193389892578, "std": 82.44644165039062, "min": -153.2558135986328, "p10": -25.378284454345692, "median": 60.86140441894531, "p90": 177.89323883056642, "max": 211.49758911132812, "pos_frac": 0.796875, "sample": [8.714157104492188, -1.3660392761230469, 92.94650268554688, -56.01667785644531, 27.44361114501953, 17.785917282104492, -30.127731323242188, 2.5561256408691406, 21.381244659423828, 180.46435546875, -3.4366302490234375, 124.95504760742188, 52.63829803466797, 160.904296875, 170.5845489501953, 143.0654296875, 127.85310363769531, 22.925460815429688, -81.1069564819336, 127.26605224609375, 119.50962829589844, 118.31951904296875, 56.322357177734375, 108.43436431884766, 31.23215103149414, 22.93121337890625, -14.296241760253906, 126.08334350585938, -4.916254043579102, 43.370521545410156, 0.6085052490234375, 21.08590316772461, -67.79734802246094, 163.70204162597656, -7.8422393798828125, 192.62179565429688, 86.94577026367188, 143.84375, 76.9296875, 39.64904022216797, 184.00033569335938, 197.31668090820312, 111.94694519042969, -46.25880813598633, -153.2558135986328, 171.8939666748047, 143.93927001953125, 201.68934631347656, 122.53939056396484, 142.10018920898438, -48.644798278808594, 163.3288116455078, 10.188629150390625, 52.58392333984375, 5.5117645263671875, 117.98747253417969, 43.25914001464844, 65.40045166015625, 186.82864379882812, 111.54116821289062, 168.7401580810547, 38.0016975402832, 211.49758911132812, -13.940292358398438], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000503.npy"}
{"epoch": 0.7386196769456681, "step": 504, "batch_size": 64, "mean": 82.92597961425781, "std": 76.5731201171875, "min": -113.64083862304688, "p10": -1.720368194580075, "median": 96.06114196777344, "p90": 178.20998840332035, "max": 265.3204345703125, "pos_frac": 0.890625, "sample": [110.74851989746094, 107.66032409667969, 119.91899871826172, 45.330047607421875, 14.09295654296875, 41.71757888793945, 20.54723358154297, 43.3026123046875, 123.34649658203125, 263.5069885253906, 214.39022827148438, 155.6466522216797, 182.28822326660156, -2.998462677001953, 41.04991912841797, 103.4483642578125, 154.67881774902344, 116.13539123535156, -103.13473510742188, 128.7636260986328, 181.35507202148438, 45.11357879638672, 95.9150390625, -113.64083862304688, 35.89425277709961, 108.32745361328125, 43.132537841796875, -65.61294555664062, 52.86164855957031, -7.174774169921875, -39.865196228027344, 45.671485900878906, 109.28490447998047, 203.56304931640625, 48.291648864746094, 45.575103759765625, 73.81719970703125, 76.09164428710938, 130.02906799316406, 51.806610107421875, 114.96133422851562, 265.3204345703125, 117.48678588867188, 1.2618522644042969, 120.32714080810547, 136.57516479492188, -30.967849731445312, 144.6357879638672, 22.74456214904785, 20.758651733398438, 9.723339080810547, 136.27890014648438, 213.12257385253906, 99.97309112548828, 99.755859375, 96.20724487304688, 170.8714599609375, 110.42568969726562, 129.1370849609375, 62.27406311035156, 34.60942077636719, 39.40790557861328, 144.14698791503906, 47.34907531738281], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000504.npy"}
{"epoch": 0.7400881057268722, "step": 505, "batch_size": 64, "mean": 66.58810424804688, "std": 70.41443634033203, "min": -61.04573440551758, "p10": -26.92497634887695, "median": 66.7277717590332, "p90": 159.60498046875003, "max": 259.1971435546875, "pos_frac": 0.796875, "sample": [-15.38922119140625, 27.643951416015625, 42.87835693359375, 123.26831817626953, 34.443328857421875, 182.66578674316406, 2.846710205078125, 143.17086791992188, 170.10284423828125, 169.30287170410156, -22.379703521728516, 136.50244140625, 259.1971435546875, 86.14627838134766, 130.3934326171875, 70.98207092285156, 132.51934814453125, 22.201480865478516, 147.70687866210938, 75.79528045654297, 108.28190612792969, 77.85574340820312, 84.32080078125, 53.702911376953125, -41.441375732421875, 52.68524932861328, 57.84010696411133, 27.287242889404297, 56.75657653808594, 163.2867889404297, -38.91184616088867, 161.36737060546875, -20.125591278076172, -22.691539764404297, 153.38595581054688, 101.78515625, -23.8382568359375, 99.14861297607422, 20.905597686767578, 120.49443054199219, 76.1321792602539, 42.15077209472656, 173.08828735351562, 38.3104248046875, 76.3043212890625, 64.09881591796875, 52.337158203125, 75.59097290039062, 59.40899658203125, 145.93446350097656, 18.041580200195312, -14.863441467285156, 26.591848373413086, 87.39199829101562, 113.6885986328125, -44.85344314575195, 69.35672760009766, -42.116455078125, 28.034011840820312, -28.24785614013672, 155.49273681640625, -61.04573440551758, 97.4208984375, -58.70330047607422], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000505.npy"}
{"epoch": 0.7415565345080763, "step": 506, "batch_size": 64, "mean": 68.321533203125, "std": 68.3592758178711, "min": -112.3463134765625, "p10": -19.875444412231438, "median": 60.114646911621094, "p90": 162.0868682861328, "max": 201.1782684326172, "pos_frac": 0.875, "sample": [-23.503318786621094, 70.94656372070312, 19.077213287353516, 76.72627258300781, 60.63066101074219, 48.262298583984375, 13.633201599121094, 105.9344711303711, 53.425262451171875, 45.85060501098633, 78.47865295410156, 30.986648559570312, 161.68109130859375, 15.716232299804688, -39.24022674560547, 177.60183715820312, 123.2066879272461, -11.410404205322266, 25.888229370117188, 177.1693572998047, 106.84246826171875, 122.25289916992188, 122.26510620117188, 167.3197021484375, 201.1782684326172, -34.120262145996094, 10.944984436035156, 118.44105529785156, 178.80494689941406, 57.42772674560547, 105.18714904785156, 47.668514251708984, 136.47804260253906, 35.217864990234375, 37.16754150390625, -90.97676086425781, 120.01644134521484, 107.11595153808594, 113.73123168945312, 119.88516235351562, 136.93637084960938, 55.07589340209961, 148.72354125976562, 18.69454574584961, -24.492477416992188, 6.322662353515625, 40.516685485839844, 77.28825378417969, 46.27789306640625, 162.26077270507812, 59.5986328125, 28.82419776916504, 54.555572509765625, 57.788780212402344, 100.79989624023438, 98.15103912353516, -112.3463134765625, 51.54589080810547, 67.12387084960938, 127.35697937011719, 181.59664916992188, 76.51286315917969, 2.7842559814453125, -83.22803497314453], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000506.npy"}
{"epoch": 0.7430249632892805, "step": 507, "batch_size": 64, "mean": 73.00973510742188, "std": 85.29108428955078, "min": -136.4137725830078, "p10": -29.42193145751953, "median": 65.10589981079102, "p90": 176.13164062500005, "max": 314.2185974121094, "pos_frac": 0.859375, "sample": [42.54939270019531, 37.61316680908203, 102.3818359375, 66.85806274414062, 314.2185974121094, -30.132431030273438, 106.96829223632812, 113.69014739990234, 20.428070068359375, 57.43373107910156, 105.19413757324219, 106.13127136230469, 157.17898559570312, 182.36087036132812, 4.6857147216796875, -68.9752197265625, -27.76409912109375, -63.88519287109375, -0.0906219482421875, 236.22323608398438, 38.13878631591797, 139.29405212402344, 9.533315658569336, 51.36729431152344, 26.197799682617188, 126.08853912353516, 235.86810302734375, 49.16754150390625, 43.88398742675781, 18.713134765625, 139.56675720214844, 116.40151977539062, -136.4137725830078, 28.613805770874023, 161.59677124023438, 146.2266845703125, -91.80804443359375, 144.1713409423828, 141.48382568359375, 39.52699661254883, 71.18515014648438, 94.95562744140625, 153.26034545898438, 76.23192596435547, 128.82180786132812, 85.62080383300781, 111.00381469726562, 189.9536895751953, 109.81779479980469, 209.55682373046875, 31.645095825195312, -53.13603973388672, 11.112180709838867, 50.86851501464844, 66.26876068115234, 230.578125, 63.94303894042969, 19.409645080566406, 24.527130126953125, -91.43414306640625, 8.90716552734375, 105.071044921875, 39.388736724853516, 44.37946319580078], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000507.npy"}
{"epoch": 0.7444933920704846, "step": 508, "batch_size": 64, "mean": 62.412715911865234, "std": 85.20691680908203, "min": -121.89508056640625, "p10": -39.32404174804687, "median": 44.381107330322266, "p90": 178.08126068115234, "max": 255.5589599609375, "pos_frac": 0.765625, "sample": [155.74050903320312, 191.15591430664062, -76.7912368774414, 179.32859802246094, -23.53497314453125, -121.89508056640625, -12.401710510253906, 165.67852783203125, -3.93780517578125, 185.72398376464844, 171.60922241210938, 63.11094665527344, 32.61089324951172, 210.89187622070312, 166.89222717285156, 110.13462829589844, 60.76594161987305, -20.085281372070312, 185.05484008789062, 146.70184326171875, -43.13835144042969, 155.88632202148438, 85.36502075195312, 3.0519866943359375, 83.59934997558594, 34.50018310546875, 37.727684020996094, -42.175872802734375, 59.56201171875, 33.06837463378906, -16.166336059570312, 167.05250549316406, 139.94363403320312, -10.695354461669922, 175.17080688476562, -23.40575408935547, -50.42430114746094, 15.59927749633789, 6.087654113769531, 130.490478515625, 43.38976287841797, 50.30918884277344, 31.4041748046875, 70.9069595336914, 106.14163208007812, 79.87339782714844, 45.37245178222656, 66.43731689453125, 117.27909851074219, 197.10882568359375, 101.88623046875, 32.95793151855469, -119.88030242919922, 255.5589599609375, 159.34632873535156, 0.9858932495117188, 34.6378173828125, 11.115753173828125, 31.32817840576172, 27.547771453857422, 17.92938232421875, -46.58423614501953, 4.177947998046875, -32.669769287109375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000508.npy"}
{"epoch": 0.7459618208516887, "step": 509, "batch_size": 64, "mean": 60.193206787109375, "std": 77.10820770263672, "min": -133.65200805664062, "p10": -36.22503356933593, "median": 67.87903594970703, "p90": 147.71634216308595, "max": 216.50074768066406, "pos_frac": 0.8125, "sample": [3.6182518005371094, 70.2828369140625, -133.65200805664062, 15.021484375, -12.4759521484375, 122.26052856445312, 163.4832305908203, 27.501567840576172, 104.96711730957031, 140.73736572265625, 65.47523498535156, 101.68405151367188, 8.216720581054688, 42.655372619628906, 63.19146728515625, 8.425949096679688, 113.90742492675781, 11.615715026855469, -61.1065673828125, -12.348434448242188, 212.30889892578125, 142.13356018066406, 20.967063903808594, 46.603824615478516, 114.06156158447266, -11.954334259033203, -6.315895080566406, 125.46320343017578, 70.5727767944336, 3.3110809326171875, 141.7264404296875, -68.17935943603516, 107.56808471679688, -41.05235290527344, 80.45219421386719, 108.96309661865234, 108.1512680053711, 105.03584289550781, 162.4821014404297, -45.30592346191406, 0.8528366088867188, -24.961288452148438, 38.92731475830078, 103.29264068603516, 113.14051055908203, -127.17527770996094, 150.21475219726562, 123.67591094970703, 3.59661865234375, 216.50074768066406, -119.9250259399414, 100.50341796875, 115.76580810546875, 11.87539291381836, 84.50379943847656, 44.4669303894043, 85.23030853271484, 38.60328674316406, 94.2798080444336, 171.32455444335938, 135.97032165527344, 57.0797119140625, 150.1089630126953, 64.05877685546875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000509.npy"}
{"epoch": 0.7474302496328928, "step": 510, "batch_size": 64, "mean": 71.46773529052734, "std": 80.51123809814453, "min": -122.55581665039062, "p10": -31.32707977294922, "median": 69.91939926147461, "p90": 188.2080764770508, "max": 223.34976196289062, "pos_frac": 0.78125, "sample": [-49.23179626464844, 25.086036682128906, 161.10775756835938, 144.25103759765625, -61.42109680175781, 108.39240264892578, 218.3387908935547, 210.93109130859375, 121.15216064453125, 64.88307189941406, 16.652965545654297, 43.681243896484375, -0.17167282104492188, 196.26315307617188, 6.131690979003906, 141.77783203125, 149.92349243164062, -122.55581665039062, -88.1326904296875, 113.01083374023438, -7.45355224609375, -2.1579113006591797, 72.46727752685547, -72.84872436523438, 69.87776947021484, 136.18807983398438, 93.99554443359375, 126.23121643066406, 194.89723205566406, 55.074554443359375, 30.64989471435547, 98.49433898925781, 51.13452911376953, 223.34976196289062, 192.37196350097656, 32.342323303222656, 104.33833312988281, 99.25846862792969, -6.081455230712891, 202.44308471679688, 83.48164367675781, 46.97612762451172, 69.96102905273438, 38.48798370361328, 44.85076904296875, 104.87887573242188, 134.41102600097656, 178.49234008789062, -30.852890014648438, 66.0928955078125, 122.67755126953125, 35.56550979614258, 38.25341796875, 74.56491088867188, 43.177032470703125, 163.24346923828125, -18.903152465820312, 46.91576385498047, -29.44754409790039, 139.5384979248047, 118.3567123413086, 88.31485748291016, -48.21484375, -31.530303955078125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000510.npy"}
{"epoch": 0.748898678414097, "step": 511, "batch_size": 64, "mean": 72.47528076171875, "std": 80.88565826416016, "min": -86.55412292480469, "p10": -21.85988159179687, "median": 59.135047912597656, "p90": 184.1612518310547, "max": 326.879150390625, "pos_frac": 0.828125, "sample": [51.36121368408203, 217.13699340820312, 13.874488830566406, 141.19302368164062, 85.68844604492188, 19.687255859375, 96.15203857421875, 102.40464782714844, 185.22625732421875, 58.17597579956055, 6.124786376953125, 79.88214111328125, 213.8726348876953, 40.61457824707031, -2.009702682495117, 97.89822387695312, -39.97338104248047, 34.95857238769531, -10.0224609375, -23.89788055419922, 225.99224853515625, 140.6243133544922, 13.245574951171875, 95.63888549804688, 179.62777709960938, -68.95079803466797, -86.55412292480469, 134.6527099609375, 72.49071502685547, 131.58396911621094, 76.23458862304688, 58.574066162109375, 48.88106155395508, 51.35589599609375, 108.54530334472656, 115.17053985595703, 54.918243408203125, 215.7555389404297, 32.33367156982422, 119.30596923828125, 116.73735809326172, 9.235153198242188, 70.262939453125, 38.53192138671875, -41.577178955078125, 45.786293029785156, 59.69602966308594, 9.582263946533203, -24.570350646972656, 1.78924560546875, -23.682289123535156, 32.979515075683594, 131.82815551757812, -1.8860206604003906, -17.60759735107422, 121.98859405517578, 181.67623901367188, 70.71458435058594, 326.879150390625, 5.4751434326171875, 73.19114685058594, 90.7537841796875, 24.098194122314453, 248.7615966796875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000511.npy"}
{"epoch": 0.750367107195301, "step": 512, "batch_size": 64, "mean": 70.07406616210938, "std": 83.94904327392578, "min": -111.67393493652344, "p10": -36.15088272094725, "median": 69.90876770019531, "p90": 170.07967529296874, "max": 249.87351989746094, "pos_frac": 0.828125, "sample": [142.54193115234375, -54.604156494140625, 170.09536743164062, 52.068809509277344, 56.284507751464844, 20.968875885009766, 249.87351989746094, -46.89031982421875, -13.84358024597168, -3.937112808227539, 111.40380096435547, 151.5162353515625, -15.895951271057129, 224.2195587158203, 61.75968933105469, 103.89371490478516, -104.77725982666016, 82.99578857421875, 8.1240234375, 3.357219696044922, -111.67393493652344, 152.173583984375, 8.362674713134766, 50.42120361328125, 85.9227523803711, 225.43325805664062, 0.96044921875, 83.17997741699219, 78.17230987548828, 113.64936065673828, 86.5980453491211, 31.63840103149414, 68.46580505371094, 62.89013671875, -110.56935119628906, 170.04306030273438, 178.72967529296875, 135.7978057861328, 115.23851013183594, 124.54765319824219, -93.74551391601562, 171.72259521484375, 166.1885528564453, 149.96975708007812, 1.7299957275390625, 127.24502563476562, 30.274986267089844, 86.10541534423828, -20.13361358642578, 49.145591735839844, 72.78805541992188, 16.69670867919922, 157.88528442382812, 68.199462890625, 79.65126037597656, 21.56576156616211, 160.17739868164062, 12.887535095214844, 65.83406066894531, 71.35173034667969, -43.01542663574219, 242.1009979248047, 81.95632934570312, 59.02201843261719], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000512.npy"}
{"epoch": 0.7518355359765051, "step": 513, "batch_size": 64, "mean": 63.00016784667969, "std": 71.62881469726562, "min": -175.04461669921875, "p10": -13.396506500244136, "median": 59.57754135131836, "p90": 139.60330657958986, "max": 214.67147827148438, "pos_frac": 0.84375, "sample": [75.2937240600586, 5.155544281005859, 56.90088653564453, 13.286439895629883, -25.78827667236328, 43.750762939453125, 17.35409927368164, 111.92255401611328, 78.11151123046875, 12.655654907226562, 102.82799530029297, 60.330406188964844, 156.41455078125, -15.013023376464844, 10.990699768066406, 88.1907958984375, 14.734840393066406, -9.6246337890625, 96.92718505859375, 161.83119201660156, 93.25416564941406, 214.67147827148438, 203.73910522460938, 63.727752685546875, 54.49846649169922, 26.772789001464844, -27.944690704345703, 112.37680053710938, 136.90664672851562, -37.60707473754883, 114.93856811523438, 31.90921401977539, 37.78199768066406, 166.66094970703125, 123.97135925292969, 89.3690185546875, 57.027740478515625, 117.59420776367188, 72.67445373535156, 133.90496826171875, 92.50306701660156, -3.0633716583251953, 140.75901794433594, -157.19842529296875, 131.80972290039062, 50.333587646484375, 23.460254669189453, 24.166118621826172, 106.38407135009766, 58.824676513671875, 100.98200225830078, -2.987691879272461, 55.34419631958008, 117.26631164550781, 40.71742248535156, 40.2025146484375, 116.14374542236328, 73.30904388427734, -15.575645446777344, 24.98406982421875, -175.04461669921875, 197.07647705078125, 122.96450805664062, 26.168663024902344], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000513.npy"}
{"epoch": 0.7533039647577092, "step": 514, "batch_size": 64, "mean": 64.95950317382812, "std": 76.77040100097656, "min": -57.93986511230469, "p10": -35.10834579467773, "median": 59.387168884277344, "p90": 181.25411987304688, "max": 258.25494384765625, "pos_frac": 0.734375, "sample": [51.16143035888672, -57.93986511230469, -16.32586669921875, -52.436561584472656, 23.52178955078125, 77.96916198730469, 192.9945831298828, 147.11456298828125, 182.3936767578125, 189.36651611328125, 23.643455505371094, 96.22992706298828, 78.55592346191406, 38.32395935058594, 133.81802368164062, -31.055755615234375, 185.37611389160156, 52.86468505859375, 87.2373046875, 90.66342163085938, -37.79787826538086, 35.523681640625, 178.59515380859375, 24.072330474853516, -30.769309997558594, 52.523956298828125, 139.94204711914062, 3.2845916748046875, -13.337059020996094, 60.826377868652344, 60.87364959716797, 57.947959899902344, 55.57090759277344, 94.40140533447266, -5.729862213134766, 114.40406799316406, 258.25494384765625, -47.80290222167969, -37.244956970214844, -3.5553817749023438, 42.500789642333984, 45.34776306152344, -31.380821228027344, 96.7394790649414, 100.570068359375, -10.450864791870117, -15.346000671386719, -54.82908630371094, 157.02687072753906, 184.89096069335938, 72.80418395996094, 156.46908569335938, 177.22442626953125, -24.082908630371094, 62.92140197753906, 65.65818786621094, -36.70585632324219, 158.07382202148438, 40.775604248046875, 107.72537231445312, 191.2193145751953, 74.31749725341797, 48.92706298828125, 93.55157470703125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000514.npy"}
{"epoch": 0.7547723935389133, "step": 515, "batch_size": 64, "mean": 56.79905700683594, "std": 77.83741760253906, "min": -120.00175476074219, "p10": -36.983924484252924, "median": 58.149471282958984, "p90": 158.5887405395508, "max": 277.74932861328125, "pos_frac": 0.796875, "sample": [33.829837799072266, -70.59236145019531, 102.75033569335938, 58.820037841796875, -39.65571594238281, -114.6640396118164, 98.62763977050781, 9.165580749511719, -20.230873107910156, -64.08291625976562, 97.13162994384766, 113.66532897949219, 66.61848449707031, 49.99639892578125, 127.73116302490234, -79.81381225585938, 106.48623657226562, 96.37374877929688, 33.24815368652344, -19.011184692382812, 277.74932861328125, -30.749744415283203, 116.43318176269531, 169.34152221679688, 77.30367279052734, -104.15487670898438, 57.478904724121094, 134.93472290039062, 83.49125671386719, 66.56023406982422, 160.82725524902344, 25.549964904785156, -2.9017066955566406, 71.77313232421875, 181.4005126953125, 54.98497009277344, 153.36553955078125, 69.66912078857422, 62.19737243652344, 43.284523010253906, 109.54283142089844, 96.60936737060547, 39.649436950683594, 22.937332153320312, 73.00164794921875, 21.853790283203125, 17.536544799804688, 122.572998046875, 70.28143310546875, 20.24435043334961, -120.00175476074219, 207.7222442626953, 25.418577194213867, -3.600851058959961, 178.3765106201172, 129.0684814453125, 3.7975311279296875, 34.446502685546875, 21.602439880371094, 48.41605758666992, 30.782958984375, -21.647476196289062, 73.37583923339844, 178.22015380859375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000515.npy"}
{"epoch": 0.7562408223201175, "step": 516, "batch_size": 64, "mean": 63.58304977416992, "std": 91.88068389892578, "min": -144.53225708007812, "p10": -46.65171356201171, "median": 49.14762878417969, "p90": 189.3095886230469, "max": 256.06304931640625, "pos_frac": 0.765625, "sample": [155.82034301757812, 119.27366638183594, 213.31967163085938, -52.47685241699219, 155.1498260498047, 73.17152404785156, 86.35308074951172, 32.07073974609375, 27.446701049804688, 63.086753845214844, 190.38540649414062, 193.63011169433594, 52.814552307128906, 45.48070526123047, -144.53225708007812, 56.89659881591797, 73.80047607421875, -114.47484588623047, 45.15556335449219, 87.45357513427734, -1.4006805419921875, 157.93850708007812, -55.70873260498047, 6.072612762451172, 191.3485107421875, -81.35273742675781, -33.059722900390625, 23.055389404296875, 186.79934692382812, 28.246429443359375, -16.194503784179688, 80.09881591796875, 218.01785278320312, 166.07501220703125, -25.440963745117188, -61.238037109375, 73.66000366210938, -10.773693084716797, 17.252532958984375, 22.4268798828125, 109.001953125, -21.37854766845703, 131.05105590820312, 38.05461883544922, 137.0133056640625, 182.25418090820312, -24.3828125, 142.5659637451172, 15.719606399536133, 157.84701538085938, 16.907997131347656, 18.504119873046875, 36.319427490234375, 34.319175720214844, 231.84140014648438, 37.69633483886719, -141.68338012695312, -11.886993408203125, 61.56541442871094, 256.06304931640625, 185.6469268798828, 115.29931640625, 113.25694274902344, 2.0708770751953125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000516.npy"}
{"epoch": 0.7577092511013216, "step": 517, "batch_size": 64, "mean": 75.99791717529297, "std": 79.48611450195312, "min": -68.79515075683594, "p10": -31.87541275024414, "median": 81.47855758666992, "p90": 179.8624481201172, "max": 246.70108032226562, "pos_frac": 0.796875, "sample": [111.79796600341797, 82.59783935546875, 147.7985382080078, 163.9735107421875, 113.04753875732422, 135.80970764160156, 130.5209503173828, 48.787071228027344, 103.15972137451172, -68.65640258789062, 11.554313659667969, 179.75209045410156, -46.658233642578125, 94.21678161621094, 188.8285675048828, 56.304832458496094, 22.87417221069336, 20.654449462890625, 35.59809494018555, 158.04307556152344, 189.5916748046875, 150.4693603515625, 155.2329559326172, 101.83354949951172, -3.2167205810546875, -66.5064468383789, 24.74974822998047, 37.99533462524414, -24.795257568359375, 184.63169860839844, -68.79515075683594, -1.4444808959960938, 1.0773391723632812, 229.19985961914062, 26.6253662109375, 28.69732666015625, 135.72312927246094, -0.2742767333984375, -25.64280891418457, 7.230262756347656, 47.451087951660156, 168.9088134765625, 52.02149963378906, -33.39582061767578, 99.11222839355469, 181.41412353515625, 82.70115661621094, 80.3592758178711, -32.02281951904297, 37.29168701171875, 31.066919326782227, 114.53756713867188, 130.999267578125, 246.70108032226562, 36.54985046386719, 179.9097442626953, -31.531463623046875, 142.0548095703125, -38.612403869628906, 163.50881958007812, 108.24563598632812, 153.84585571289062, 78.302001953125, 92.06047058105469], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000517.npy"}
{"epoch": 0.7591776798825257, "step": 518, "batch_size": 64, "mean": 56.313072204589844, "std": 78.62860107421875, "min": -136.8994903564453, "p10": -40.56511917114258, "median": 47.55389976501465, "p90": 159.71313629150396, "max": 237.57962036132812, "pos_frac": 0.796875, "sample": [-88.0604248046875, 212.776123046875, 118.34169006347656, 36.660091400146484, 39.490318298339844, 130.3732452392578, 27.38719940185547, 85.250732421875, 10.173309326171875, -16.33533477783203, 98.28978729248047, 112.9712905883789, 2.011505126953125, -39.650177001953125, 47.878299713134766, 90.65117645263672, 165.25543212890625, 47.22949981689453, 21.863449096679688, 17.815658569335938, 190.1797637939453, 48.25849151611328, 3.6822738647460938, 107.21369934082031, 48.66743469238281, 115.39259338378906, 205.3449249267578, 119.24748229980469, 6.010337829589844, 221.3571014404297, 113.61952209472656, 105.42996215820312, 80.65766906738281, 165.9185028076172, 26.09698486328125, 117.33533477783203, 33.60622787475586, 39.002288818359375, 237.57962036132812, 72.49574279785156, 83.1010513305664, 15.204116821289062, 70.98233032226562, -0.2325592041015625, 41.32037353515625, 132.50347900390625, -136.8994903564453, -22.484886169433594, 23.001197814941406, 20.37805938720703, -62.26976013183594, -46.9426155090332, 54.02442169189453, -18.460464477539062, -42.209957122802734, -38.67681884765625, 132.022216796875, 146.78111267089844, -40.957237243652344, 110.21856689453125, 41.674659729003906, 8.341796875, 51.14166259765625, -94.99368286132812], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000518.npy"}
{"epoch": 0.7606461086637298, "step": 519, "batch_size": 64, "mean": 63.967472076416016, "std": 67.00696563720703, "min": -52.8997802734375, "p10": -12.739543533325188, "median": 52.955026626586914, "p90": 163.17063293457036, "max": 244.556640625, "pos_frac": 0.84375, "sample": [-36.83990478515625, 184.75050354003906, 11.199234008789062, -33.67396545410156, 85.4459228515625, 115.68296813964844, 90.35174560546875, -52.696163177490234, 195.9007110595703, 24.763580322265625, -5.412841796875, 145.75978088378906, 52.383243560791016, 129.7969512939453, 101.79515075683594, 61.85244369506836, 105.65292358398438, 244.556640625, 106.54603576660156, 167.73114013671875, 45.84275817871094, 212.26824951171875, 9.400415420532227, 53.52680969238281, -2.885284423828125, 85.64749145507812, 53.77838134765625, 100.84712219238281, -52.8997802734375, 48.99757385253906, 55.344207763671875, 116.2337875366211, 35.56077575683594, 18.47211456298828, 87.04550170898438, 9.162765502929688, 82.74169921875, 29.438770294189453, 24.558395385742188, 34.44445037841797, 58.13072967529297, 48.76161193847656, 65.43989562988281, 26.50390625, -19.78033447265625, 129.6611328125, 60.682159423828125, -45.22940444946289, 152.52944946289062, 38.217010498046875, 62.33180236816406, 212.7604522705078, 82.58445739746094, 29.044593811035156, 82.10447692871094, 187.30850219726562, -3.4194717407226562, 41.91969680786133, 40.95030212402344, 51.59651184082031, -15.879558563232422, 3.811307907104492, 26.272422790527344, 34.544410705566406], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000519.npy"}
{"epoch": 0.762114537444934, "step": 520, "batch_size": 64, "mean": 73.66596984863281, "std": 67.22393035888672, "min": -113.5177993774414, "p10": -5.280877304077146, "median": 74.20817947387695, "p90": 151.7524124145508, "max": 271.630615234375, "pos_frac": 0.875, "sample": [-78.83053588867188, 47.67710876464844, 133.47705078125, 14.363069534301758, 80.27747344970703, 169.6635284423828, 137.448486328125, 13.470104217529297, -113.5177993774414, -6.2633056640625, -16.87506103515625, 115.22291564941406, -8.16973876953125, -15.871967315673828, 144.1739959716797, 67.69645690917969, 104.19293975830078, 271.630615234375, 16.017295837402344, 18.314422607421875, 177.04229736328125, 125.24090576171875, 47.8275032043457, 66.86835479736328, 135.20887756347656, 88.30238342285156, 104.78730773925781, 95.17617797851562, 3.5647048950195312, 125.38665771484375, 104.23713684082031, 47.8266716003418, 155.00030517578125, -11.489913940429688, 165.38211059570312, 117.00621795654297, 180.75714111328125, 124.76251983642578, 20.544448852539062, 16.309707641601562, 70.48970794677734, 57.833595275878906, 77.92665100097656, 130.88381958007812, 92.59814453125, 91.65404510498047, 35.05115509033203, 33.80851745605469, 199.99111938476562, -2.988544464111328, 17.081743240356445, 11.915367126464844, 114.697021484375, 21.215576171875, 35.49242401123047, 89.93667602539062, 49.16064453125, 119.335205078125, 103.4798583984375, 43.913551330566406, 116.38937377929688, 61.58305358886719, 49.61688232421875, 109.71791076660156], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000520.npy"}
{"epoch": 0.7635829662261381, "step": 521, "batch_size": 64, "mean": 70.42420196533203, "std": 68.47748565673828, "min": -93.67367553710938, "p10": -17.219112396240234, "median": 61.162532806396484, "p90": 169.10497283935547, "max": 212.630126953125, "pos_frac": 0.84375, "sample": [40.75938415527344, 106.83184814453125, 177.2702178955078, 42.64196014404297, 21.99420166015625, 51.793365478515625, 35.090721130371094, 102.59971618652344, 47.92631912231445, 91.15816497802734, 17.461376190185547, 52.390010833740234, 25.838886260986328, 104.67330932617188, 29.42833709716797, 114.63086700439453, -17.52227783203125, -37.32509231567383, 120.9906005859375, 83.82867431640625, 49.97288513183594, -93.67367553710938, 77.90802001953125, -3.1278648376464844, 136.02664184570312, 161.64291381835938, 39.92744064331055, -53.71784210205078, 34.55046463012695, 50.42443084716797, 65.6325912475586, 86.62393188476562, 164.57705688476562, 127.62294006347656, 5.0131378173828125, 166.4427490234375, 53.116729736328125, 12.785507202148438, 89.16349792480469, 44.14875030517578, -24.23282241821289, 170.2459259033203, 21.667922973632812, 109.8100357055664, -60.71250534057617, 127.83229064941406, -13.473550796508789, 145.93927001953125, 130.85037231445312, -16.51172637939453, 22.0533447265625, -22.75894546508789, 78.86042785644531, 56.692474365234375, 68.24053955078125, 197.57196044921875, 75.82498168945312, 172.27297973632812, 38.81953811645508, 212.630126953125, 137.06277465820312, 100.09898376464844, 174.52879333496094, 176.31483459472656], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000521.npy"}
{"epoch": 0.7650513950073421, "step": 522, "batch_size": 64, "mean": 67.86552429199219, "std": 81.81644439697266, "min": -79.01913452148438, "p10": -22.505841827392576, "median": 54.20322608947754, "p90": 177.93660888671877, "max": 244.97531127929688, "pos_frac": 0.765625, "sample": [179.52142333984375, 69.80574798583984, 129.6317596435547, 54.18342590332031, 194.1878662109375, -22.74376678466797, 28.66082763671875, 29.590408325195312, 53.612003326416016, 174.23870849609375, 0.6581878662109375, 197.06300354003906, 240.38644409179688, -19.646446228027344, 237.35316467285156, -11.892066955566406, 89.45883178710938, 35.83375549316406, 1.0221405029296875, -30.180877685546875, 222.45001220703125, 148.2713623046875, 54.80316925048828, 87.14844512939453, -2.3956527709960938, 117.38371276855469, 171.0201416015625, -7.880462646484375, 83.33448791503906, 128.04403686523438, 131.33526611328125, -21.95068359375, 26.079238891601562, 64.56520080566406, 37.240264892578125, 0.4696044921875, 112.67843627929688, -43.85114288330078, 161.64984130859375, -79.01913452148438, -46.30280303955078, 171.14111328125, -8.802093505859375, 47.519012451171875, 41.82864761352539, 121.23072814941406, 10.174667358398438, 31.551055908203125, -65.8087387084961, 130.71051025390625, 77.70709228515625, 54.223026275634766, 156.92127990722656, 24.6497802734375, -58.77326583862305, 64.0947265625, -0.3205375671386719, -14.822158813476562, 60.33027648925781, 23.33462142944336, 244.97531127929688, 6.869977951049805, 117.07726287841797, 131.76333618164062], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000522.npy"}
{"epoch": 0.7665198237885462, "step": 523, "batch_size": 64, "mean": 48.934547424316406, "std": 81.6767578125, "min": -138.91937255859375, "p10": -59.529745483398436, "median": 53.84083557128906, "p90": 142.80589904785157, "max": 261.65008544921875, "pos_frac": 0.75, "sample": [55.28666687011719, 68.23236083984375, 46.09026336669922, 32.87501525878906, 15.538490295410156, -47.58758544921875, 167.11441040039062, 66.82339477539062, 59.03900146484375, 128.56643676757812, -67.27397918701172, 140.83462524414062, 10.935281753540039, -13.98321533203125, -46.948829650878906, 5.241153717041016, 110.41250610351562, 72.17962646484375, -26.337158203125, 197.26495361328125, 198.53831481933594, 95.4154052734375, -21.332237243652344, 18.81095314025879, 79.18500518798828, 141.79159545898438, 68.62955474853516, -138.91937255859375, -131.4543914794922, 52.39500427246094, 41.97209548950195, 33.21607971191406, 30.25841522216797, -76.38427734375, 95.29576873779297, 59.086936950683594, 39.12815475463867, -29.054964065551758, 180.27342224121094, -38.177513122558594, 73.71697998046875, 35.458534240722656, 261.65008544921875, 14.93777847290039, 88.2953109741211, -37.95664978027344, 83.09616088867188, -64.27081298828125, 160.0741424560547, 76.45857238769531, 71.3792724609375, 132.05250549316406, -109.47209167480469, 39.761253356933594, 73.91681671142578, 42.839576721191406, 114.61083221435547, 35.365753173828125, 134.39642333984375, 143.2406005859375, 104.69454956054688, 103.42269134521484, -59.136444091796875, -59.69830322265625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000523.npy"}
{"epoch": 0.7679882525697503, "step": 524, "batch_size": 64, "mean": 66.92581939697266, "std": 76.5450668334961, "min": -71.4327392578125, "p10": -27.244808959960935, "median": 58.88350868225098, "p90": 173.0885482788086, "max": 249.01409912109375, "pos_frac": 0.828125, "sample": [-15.504676818847656, 155.99270629882812, 7.3958740234375, 57.94145584106445, 21.34636688232422, 12.196502685546875, 172.34083557128906, 4.785533905029297, 53.34190368652344, 65.1270980834961, 40.8677978515625, 16.221435546875, 4.4697113037109375, 74.3258285522461, 123.33135986328125, 125.9005126953125, 97.97305297851562, 249.01409912109375, 59.8255615234375, 181.50848388671875, 23.58289337158203, 116.13359069824219, 82.82487487792969, 130.38946533203125, 67.85247802734375, 167.1877899169922, 192.36692810058594, 176.70687866210938, 8.482261657714844, -71.4327392578125, 57.825862884521484, 93.39541625976562, 66.18122863769531, -39.177146911621094, 106.12967681884766, -24.3472900390625, 12.001815795898438, 82.67996215820312, 173.40899658203125, 8.383979797363281, 110.16825103759766, 19.481399536132812, 63.4829216003418, -54.28777313232422, 161.95492553710938, 16.64739990234375, 190.3841094970703, 107.72528076171875, 101.01831817626953, 150.29025268554688, -7.5610198974609375, 94.85528564453125, 144.0628204345703, 46.181976318359375, 47.869903564453125, -66.46150207519531, -15.766098022460938, 38.02532958984375, -28.486602783203125, 248.64697265625, 46.631919860839844, -62.808746337890625, 34.6806640625, -42.462074279785156], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000524.npy"}
{"epoch": 0.7694566813509545, "step": 525, "batch_size": 64, "mean": 73.25902557373047, "std": 83.87409973144531, "min": -51.825286865234375, "p10": -19.196748352050776, "median": 53.352848052978516, "p90": 198.2029388427735, "max": 293.64178466796875, "pos_frac": 0.75, "sample": [88.15312194824219, 121.25372314453125, 42.168365478515625, 45.622161865234375, -4.958152770996094, 204.41796875, 293.64178466796875, 47.22434997558594, -44.95703125, -39.22289276123047, 52.92723083496094, -1.9064178466796875, 22.129676818847656, -20.949081420898438, 111.4119873046875, -49.26866149902344, 131.9847412109375, 138.36074829101562, 82.630859375, 95.13128662109375, 22.66302490234375, -7.201450347900391, 14.421226501464844, 92.43975830078125, -51.825286865234375, 84.05155181884766, 63.10809326171875, 211.57150268554688, 43.21034240722656, 83.8194580078125, 41.97303009033203, 5.004661560058594, 246.23863220214844, -3.0723609924316406, 183.70120239257812, -15.10797119140625, 177.93667602539062, 58.24278259277344, -5.138465881347656, 53.778465270996094, 127.41647338867188, -22.10504150390625, 162.03704833984375, -14.781410217285156, 26.528701782226562, 87.23538208007812, -1.775238037109375, 18.676036834716797, 134.60977172851562, 182.35699462890625, 219.85357666015625, 231.45858764648438, 80.54974365234375, 41.78932189941406, 178.90560913085938, -8.974647521972656, 9.249908447265625, 151.75772094726562, -26.064865112304688, 108.44930267333984, 12.885894775390625, 141.1656494140625, 7.513679504394531, 224.22879028320312], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000525.npy"}
{"epoch": 0.7709251101321586, "step": 526, "batch_size": 64, "mean": 72.99378967285156, "std": 82.32686614990234, "min": -113.14873504638672, "p10": -17.699146461486812, "median": 76.40750503540039, "p90": 178.5783721923828, "max": 254.93528747558594, "pos_frac": 0.78125, "sample": [75.75139617919922, 84.06796264648438, 113.91523742675781, 19.315204620361328, 175.06642150878906, -113.14873504638672, -18.84412956237793, 36.55776596069336, 93.93667602539062, -40.78358459472656, 152.749267578125, 192.74778747558594, 101.55916595458984, 24.373699188232422, 80.0431137084961, 177.76632690429688, 125.18141174316406, 144.23780822753906, 93.17900085449219, -15.027519226074219, 19.892566680908203, -1.1797943115234375, 172.4850616455078, 181.5878448486328, 151.8089599609375, 163.90646362304688, 124.70887756347656, 178.86349487304688, 88.74632263183594, 44.66938018798828, 27.76611328125, 229.05409240722656, 20.70935821533203, 254.93528747558594, -70.78509521484375, -51.06695556640625, 181.06723022460938, 90.02059936523438, 17.036865234375, -8.252155303955078, 90.36593627929688, 182.92510986328125, 48.77003479003906, 41.83189392089844, 21.873497009277344, -4.076026916503906, 136.99880981445312, 122.71049499511719, 13.604904174804688, -29.888504028320312, 70.11893463134766, 177.9130859375, 96.9326171875, 176.11102294921875, 66.82251739501953, 77.06361389160156, 125.70709228515625, 1.014129638671875, -9.719650268554688, -111.4563217163086, -9.980804443359375, 54.14031219482422, 17.808563232421875, -4.6073760986328125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000526.npy"}
{"epoch": 0.7723935389133627, "step": 527, "batch_size": 64, "mean": 71.01980590820312, "std": 68.45335388183594, "min": -83.20178985595703, "p10": -10.095051193237296, "median": 62.841392517089844, "p90": 159.7912109375, "max": 236.54592895507812, "pos_frac": 0.859375, "sample": [-24.681907653808594, 34.57318115234375, 39.2071533203125, 149.68991088867188, 90.77279663085938, 12.616378784179688, 125.51907348632812, -31.496713638305664, 40.70799255371094, 138.83364868164062, 158.84780883789062, -83.20178985595703, 190.49253845214844, 0.12700271606445312, -28.33440399169922, -32.399635314941406, 155.5450439453125, 7.633174896240234, 236.54592895507812, 183.21795654296875, 75.87002563476562, 29.604965209960938, 5.479522705078125, -0.8092231750488281, 35.04060363769531, 108.10673522949219, 33.79252243041992, 170.71966552734375, 51.55613327026367, 155.5638427734375, 102.97833251953125, 47.01930236816406, 84.82310485839844, 63.1226806640625, 104.75100708007812, 23.710464477539062, 174.1409912109375, 35.03049087524414, 124.50080871582031, -14.074691772460938, 235.45550537109375, 62.40589904785156, 138.3096923828125, 42.57759094238281, 41.863014221191406, 83.11102294921875, 114.02462768554688, 53.49609375, 59.3082275390625, 62.56010437011719, 160.19552612304688, 83.11966705322266, 33.91230773925781, 121.95941162109375, 69.06015014648438, -0.625732421875, 71.15365600585938, 79.14734649658203, 119.58415222167969, 68.92220306396484, 4.649496078491211, -37.71361541748047, 4.345417022705078, 99.30314636230469], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000527.npy"}
{"epoch": 0.7738619676945668, "step": 528, "batch_size": 64, "mean": 72.56159973144531, "std": 77.36080932617188, "min": -106.5186767578125, "p10": -20.24466323852539, "median": 57.74424743652344, "p90": 167.9199234008789, "max": 252.87364196777344, "pos_frac": 0.828125, "sample": [37.24721145629883, 147.24310302734375, -11.772857666015625, -30.017555236816406, 55.443359375, 79.60391235351562, -106.5186767578125, 220.27615356445312, 181.04808044433594, 13.418392181396484, 48.53871154785156, 23.57806396484375, 132.87051391601562, 46.337642669677734, 91.54151916503906, 166.91201782226562, 121.16378784179688, 32.2359619140625, 128.73080444335938, 12.162139892578125, -70.5404052734375, 37.02033615112305, 115.2352294921875, 124.982177734375, 12.618095397949219, 119.87161254882812, -32.83649444580078, 27.688331604003906, 122.15231323242188, 168.3518829345703, -83.43154907226562, 40.64939498901367, 141.83453369140625, 15.854154586791992, 47.26325988769531, 53.863563537597656, 132.34356689453125, 99.72834014892578, 186.08538818359375, -19.91149139404297, 79.82173919677734, 149.08775329589844, 39.75117874145508, 162.25204467773438, 252.87364196777344, 82.7573013305664, 125.54296875, 209.19308471679688, 121.47312927246094, 97.97776794433594, -20.387451171875, 164.7625274658203, -19.404556274414062, 186.39297485351562, 19.222082138061523, 117.188720703125, 60.045135498046875, 42.77478790283203, -36.879966735839844, -8.9287109375, 34.568336486816406, 18.57263946533203, 17.761619567871094, 118.6590576171875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000528.npy"}
{"epoch": 0.775330396475771, "step": 529, "batch_size": 64, "mean": 63.64674377441406, "std": 86.27729034423828, "min": -108.31644439697266, "p10": -34.406909179687496, "median": 57.09857940673828, "p90": 177.65496826171878, "max": 345.8447570800781, "pos_frac": 0.75, "sample": [-69.19340515136719, 123.00125885009766, 188.10986328125, 91.37860107421875, 69.04000091552734, 43.065582275390625, 15.629997253417969, 77.35210418701172, 103.71395874023438, 345.8447570800781, 79.94954681396484, 66.51819610595703, 52.780967712402344, 173.06260681152344, 123.98614501953125, -17.271774291992188, 19.432180404663086, 35.921653747558594, 115.44081115722656, -49.393943786621094, 4.968379974365234, -11.59649658203125, 212.88079833984375, 128.86509704589844, 209.22930908203125, -35.788970947265625, 158.25083923339844, -31.182098388671875, 60.292808532714844, -0.436187744140625, -11.101089477539062, 66.34658813476562, 64.10812377929688, 223.944091796875, 12.877052307128906, -102.42078399658203, 55.1773681640625, 14.36709976196289, 53.60603332519531, 102.62637329101562, -26.317928314208984, 154.22198486328125, 60.460174560546875, -62.67143249511719, 57.364662170410156, -20.803754806518555, 68.81672668457031, 106.3331298828125, 135.71109008789062, 39.245338439941406, 150.88076782226562, 56.832496643066406, -12.092733383178711, 20.728843688964844, 179.6231231689453, 52.99127197265625, -10.021575927734375, 165.0321502685547, 125.0554428100586, 26.36032485961914, 4.0937347412109375, -108.31644439697266, -48.557167053222656, 195.03819274902344], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000529.npy"}
{"epoch": 0.7767988252569751, "step": 530, "batch_size": 64, "mean": 67.5218276977539, "std": 63.36030197143555, "min": -31.978866577148438, "p10": -4.442748641967773, "median": 52.7377815246582, "p90": 153.06400146484378, "max": 228.31307983398438, "pos_frac": 0.875, "sample": [2.6297454833984375, 104.91011047363281, 137.15977478027344, 118.21241760253906, 47.56903076171875, 48.667816162109375, 186.2723388671875, 67.05050659179688, 77.41879272460938, 25.517250061035156, -4.783206939697266, 26.793472290039062, 95.92928314208984, 44.940521240234375, 24.620147705078125, 112.651611328125, 26.812545776367188, 101.32621765136719, 122.6500244140625, 118.24258422851562, 58.43817901611328, 156.3785400390625, 25.724472045898438, 3.33428955078125, 19.40057373046875, 34.843849182128906, 22.47803497314453, 24.98796844482422, 68.87286376953125, -15.436920166015625, 2.420999526977539, 117.4799575805664, 106.89727783203125, 12.465686798095703, 9.421287536621094, 28.487030029296875, 37.6881103515625, 105.39027404785156, 57.092041015625, -31.978866577148438, 135.30113220214844, 228.31307983398438, -7.406089782714844, 219.4547119140625, 74.10535430908203, 163.024658203125, 98.34194946289062, -12.348472595214844, 208.55349731445312, 44.73934555053711, 173.80430603027344, 32.030372619628906, 56.80774688720703, -3.648345947265625, 145.330078125, 131.1790771484375, -24.852340698242188, 97.30264282226562, 121.02843475341797, -13.381011962890625, 20.56389617919922, 23.162124633789062, 70.85696411132812, 10.157333374023438], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000530.npy"}
{"epoch": 0.7782672540381792, "step": 531, "batch_size": 64, "mean": 79.07819366455078, "std": 70.17310333251953, "min": -125.82054138183594, "p10": -9.567765808105468, "median": 81.87490844726562, "p90": 165.64542236328126, "max": 242.86911010742188, "pos_frac": 0.859375, "sample": [50.16241455078125, -8.959228515625, 91.73625183105469, 195.40806579589844, 116.59764099121094, -66.56024169921875, 72.04881286621094, 165.19764709472656, 242.86911010742188, 176.19581604003906, 53.53212356567383, 178.21023559570312, -15.923259735107422, 165.8373260498047, 82.56989288330078, 30.281723022460938, 185.4832000732422, 127.8567886352539, 24.70539093017578, 137.60940551757812, 138.01626586914062, 73.3744125366211, 83.43716430664062, 41.16731262207031, 82.63745880126953, 117.99705505371094, 33.01107406616211, 68.57811737060547, 41.796485900878906, -125.82054138183594, 212.50830078125, 115.60737609863281, 155.1147003173828, 86.53648376464844, -9.828567504882812, 81.17992401123047, 34.17497253417969, 56.88743591308594, 22.47756576538086, 117.3316879272461, 165.11334228515625, 23.498470306396484, 116.42727661132812, 78.90274047851562, 90.31668090820312, -16.442218780517578, 157.74456787109375, 105.5817642211914, 108.74195861816406, 89.98371887207031, 72.18582153320312, -4.0559844970703125, 76.37408447265625, 42.61351776123047, 29.845355987548828, 8.058696746826172, 136.87344360351562, 73.6727066040039, 107.866455078125, -33.430389404296875, -64.85475158691406, 122.31690979003906, 55.30894470214844, 87.31753540039062], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000531.npy"}
{"epoch": 0.7797356828193832, "step": 532, "batch_size": 64, "mean": 89.30570983886719, "std": 74.70391082763672, "min": -51.2708740234375, "p10": -11.684263610839837, "median": 85.09942626953125, "p90": 189.37484741210938, "max": 232.38809204101562, "pos_frac": 0.84375, "sample": [124.60804748535156, 162.87197875976562, 167.30914306640625, 74.85446166992188, 32.8551025390625, 95.17251586914062, 41.4576416015625, 189.63792419433594, 62.961761474609375, 55.385684967041016, -3.4269561767578125, 206.08941650390625, 200.78099060058594, 117.40135192871094, 80.180908203125, 12.310836791992188, 165.3438720703125, 11.513137817382812, 139.10403442382812, -14.739822387695312, 63.66899108886719, 193.45751953125, 232.38809204101562, 149.25668334960938, 132.90513610839844, -49.003875732421875, 72.25660705566406, 75.73954772949219, 80.54241943359375, -40.359130859375, 151.17076110839844, 188.76100158691406, -43.62469482421875, 89.65643310546875, 29.352310180664062, -34.02691650390625, 26.343490600585938, 147.31460571289062, 25.915924072265625, 78.7166748046875, 39.81280517578125, 162.3089599609375, 97.74417114257812, 166.43814086914062, 136.68418884277344, 119.98075103759766, 109.96683502197266, 199.42593383789062, 2.8118133544921875, 73.59431457519531, 110.80725860595703, -51.2708740234375, -38.93446350097656, 217.42193603515625, 152.94598388671875, 71.80702209472656, 139.887451171875, 54.021728515625, 174.74325561523438, 111.89080047607422, -2.8837966918945312, 111.04811096191406, -4.55462646484375, 67.7640380859375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000532.npy"}
{"epoch": 0.7812041116005873, "step": 533, "batch_size": 64, "mean": 53.07807922363281, "std": 76.38224029541016, "min": -110.15476989746094, "p10": -39.3216968536377, "median": 39.900970458984375, "p90": 163.54732971191407, "max": 261.3065490722656, "pos_frac": 0.75, "sample": [19.37671661376953, 17.11913299560547, 34.05453872680664, 30.499176025390625, 3.373149871826172, 58.1724853515625, -110.15476989746094, 120.4691162109375, -32.73139953613281, 150.64642333984375, 76.64131164550781, 64.98545837402344, 50.00218963623047, -3.8364334106445312, -8.645301818847656, -10.431259155273438, 220.229248046875, 39.63337707519531, -17.38631820678711, -19.087722778320312, 164.24502563476562, 167.74307250976562, 26.946762084960938, 62.94525146484375, 154.2123565673828, 126.0291748046875, 178.01690673828125, -21.00421905517578, 30.178010940551758, 261.3065490722656, 37.01398468017578, 58.479705810546875, 165.92807006835938, -39.90559387207031, 77.15592956542969, 83.89862060546875, 111.5938720703125, 79.73883056640625, 65.96528625488281, 75.5528564453125, 10.48830795288086, 13.896600723266602, 185.82452392578125, 78.06465148925781, 29.37841796875, -56.33978271484375, 4.1895751953125, -58.41737365722656, 121.73881530761719, 75.83221435546875, 63.46783447265625, -93.8008804321289, 85.859619140625, 40.16856384277344, 35.053123474121094, -49.66349792480469, -37.95927047729492, 35.29673767089844, 22.611419677734375, 160.86895751953125, 73.03609466552734, -42.50209045410156, 161.91937255859375, -10.984615325927734], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000533.npy"}
{"epoch": 0.7826725403817915, "step": 534, "batch_size": 64, "mean": 75.91458129882812, "std": 63.225982666015625, "min": -58.18415832519531, "p10": -12.011733818054182, "median": 76.08272171020508, "p90": 154.92073669433594, "max": 188.2947998046875, "pos_frac": 0.890625, "sample": [80.61277770996094, 188.2947998046875, 108.60398864746094, 149.96591186523438, 25.41222381591797, 74.3005599975586, 49.4583740234375, 43.35331344604492, 16.935043334960938, 171.46888732910156, 16.233036041259766, 150.76348876953125, -20.62010955810547, 163.97140502929688, 88.65572357177734, 26.598175048828125, 175.58865356445312, -58.18415832519531, -21.089691162109375, 83.45535278320312, 80.97520446777344, -29.359359741210938, 99.21195983886719, 32.12806701660156, 130.88319396972656, 142.41949462890625, 57.57261657714844, 3.8424530029296875, 12.277706146240234, 121.4556884765625, 54.01481628417969, 131.54336547851562, -18.75181007385254, 69.9754867553711, 77.86488342285156, 5.8403778076171875, -47.015045166015625, 36.64262390136719, 143.28887939453125, 136.43545532226562, 137.97845458984375, 51.46116638183594, 101.62366485595703, 119.26847839355469, 91.9727554321289, 155.81658935546875, 132.37200927734375, 65.2707290649414, 116.88367462158203, 186.28140258789062, 136.4356689453125, 45.95038604736328, 44.108551025390625, 108.66902160644531, 32.394386291503906, 152.83041381835938, 183.45521545410156, 42.48446273803711, 47.06616973876953, 36.110557556152344, 3.7151107788085938, 26.438644409179688, -21.747024536132812, 106.6689224243164], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000534.npy"}
{"epoch": 0.7841409691629956, "step": 535, "batch_size": 64, "mean": 80.23560333251953, "std": 80.39476776123047, "min": -126.80530548095703, "p10": -28.01006240844726, "median": 82.17752075195312, "p90": 182.31244201660164, "max": 271.9852600097656, "pos_frac": 0.828125, "sample": [-3.0472068786621094, 151.68710327148438, 102.86991119384766, 59.8618049621582, 135.81471252441406, 72.07698059082031, 59.06464767456055, 146.75086975097656, 109.83270263671875, 17.313621520996094, -13.029876708984375, 112.81460571289062, 161.79612731933594, 57.880775451660156, 104.13987731933594, 106.90214538574219, 35.58087921142578, 151.71353149414062, 89.31549072265625, 27.9578857421875, 235.50350952148438, 27.588714599609375, 97.99494934082031, 152.57977294921875, 88.52606201171875, 223.48599243164062, 117.20520782470703, 191.1051483154297, 111.26250457763672, 146.43247985839844, 151.21348571777344, 50.36448669433594, 161.35940551757812, 116.76702117919922, 5.560661315917969, -22.16492462158203, 5.231719970703125, 194.07106018066406, -30.515121459960938, 56.50147247314453, 27.220245361328125, 271.9852600097656, 75.8289794921875, 44.99955749511719, -39.336402893066406, 100.88948059082031, 32.68894577026367, 113.09886932373047, -47.22528076171875, -47.927833557128906, 48.90154266357422, -32.989837646484375, 107.23277282714844, 25.364662170410156, 72.622314453125, -36.06444549560547, 216.87269592285156, -126.80530548095703, 225.12718200683594, 161.17755126953125, 45.73070526123047, 117.01986694335938, -15.70819091796875, 27.005088806152344], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000535.npy"}
{"epoch": 0.7856093979441997, "step": 536, "batch_size": 64, "mean": 73.65609741210938, "std": 75.00313568115234, "min": -140.80345153808594, "p10": -24.024046325683585, "median": 73.81912231445312, "p90": 161.44126129150393, "max": 266.1372985839844, "pos_frac": 0.875, "sample": [7.083066940307617, 98.06039428710938, 188.6741943359375, 187.6967010498047, 54.972801208496094, 96.3046875, 115.87371826171875, -27.506996154785156, -48.89435577392578, 123.55839538574219, 148.41375732421875, 141.1428680419922, 103.24417114257812, 94.69697570800781, 163.31893920898438, 74.55836486816406, 145.4429168701172, 6.710762023925781, 116.12804412841797, 179.87615966796875, 201.62864685058594, 71.97117614746094, 36.32575988769531, 141.76669311523438, 20.033771514892578, 157.0600128173828, 10.945470809936523, 95.91877746582031, 135.91104125976562, -84.47184753417969, 142.61827087402344, 73.07987976074219, 61.263946533203125, 54.0704345703125, -35.3029899597168, 84.36690521240234, 49.09605407714844, 229.28567504882812, 19.07717514038086, -15.897163391113281, 85.80116271972656, -140.80345153808594, 107.88514709472656, 125.95215606689453, 37.98381042480469, 9.786773681640625, 53.261253356933594, 115.12078094482422, 62.91948699951172, -30.402008056640625, 36.99513244628906, 64.48518371582031, 75.60616302490234, 106.87503051757812, -34.10569763183594, 60.88053894042969, 31.324398040771484, 9.429882049560547, 0.6106834411621094, 1.4937896728515625, 266.1372985839844, 100.78865814208984, 58.61724853515625, 89.24334716796875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000536.npy"}
{"epoch": 0.7870778267254038, "step": 537, "batch_size": 64, "mean": 88.06642150878906, "std": 77.63011169433594, "min": -97.07540893554688, "p10": -0.05024948120117084, "median": 82.03357696533203, "p90": 185.47940216064458, "max": 254.5069122314453, "pos_frac": 0.890625, "sample": [-97.07540893554688, 16.706527709960938, 3.779003143310547, 36.112213134765625, -54.22021484375, 74.49370574951172, 31.549434661865234, 224.20091247558594, 57.771453857421875, 207.63514709472656, 35.64459228515625, 25.3798828125, 7.592109680175781, 78.5164794921875, 87.41846466064453, 158.50164794921875, 85.55067443847656, 0.9624252319335938, 162.81039428710938, -30.79729461669922, -2.8230247497558594, 124.64559936523438, -0.4842529296875, 158.17684936523438, 50.066162109375, -18.58071517944336, 110.28839111328125, 71.3304214477539, 123.91021728515625, 171.5249786376953, 254.5069122314453, 164.24273681640625, 250.8915557861328, 126.4290771484375, 170.039794921875, 107.925537109375, 4.859790802001953, 191.45986938476562, 11.823921203613281, 52.61210632324219, 19.034942626953125, 217.99893188476562, 158.9381103515625, 93.22146606445312, 35.507102966308594, 126.53254699707031, 99.52684783935547, 152.9229736328125, -2.1770095825195312, 120.77037048339844, 154.60601806640625, 100.63589477539062, 46.68977737426758, 52.42633056640625, 118.93816375732422, 26.490192413330078, 250.22463989257812, 54.24827575683594, 156.29547119140625, 38.60563278198242, 120.97933959960938, 70.22055053710938, 53.31266784667969, 134.9239959716797], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000537.npy"}
{"epoch": 0.788546255506608, "step": 538, "batch_size": 64, "mean": 67.333984375, "std": 65.2879409790039, "min": -68.58363342285156, "p10": -2.8120384216308576, "median": 66.80461883544922, "p90": 145.8506072998047, "max": 239.5009765625, "pos_frac": 0.859375, "sample": [81.50965881347656, 182.88394165039062, 56.75481414794922, 221.20472717285156, 74.62643432617188, 71.2950439453125, 114.22352600097656, 94.39378356933594, 58.584068298339844, 41.818031311035156, 71.41960144042969, 141.9648895263672, 58.668304443359375, 126.43383026123047, 39.35004425048828, -0.7975845336914062, 35.30201721191406, 6.0240020751953125, 111.00794982910156, 47.237701416015625, 46.70225524902344, 37.08135986328125, 26.5635986328125, 128.54530334472656, 56.99751281738281, 79.78724670410156, 147.5159149169922, 79.24695587158203, 106.39822387695312, -39.40586853027344, 58.079345703125, 191.01988220214844, 77.1333999633789, -65.22981262207031, 164.88259887695312, -17.7274227142334, 6.725002288818359, 105.93093872070312, 85.87788391113281, -33.16987609863281, 112.52323913574219, 108.40199279785156, 18.921667098999023, 114.21118927001953, 7.061309814453125, 87.24916076660156, 3.070201873779297, -68.58363342285156, 5.172613143920898, -1.3105621337890625, 133.09397888183594, 2.6900787353515625, 165.03369140625, 54.22140884399414, -32.75755310058594, 24.216941833496094, -3.4555282592773438, 90.69105529785156, 239.5009765625, 62.31419372558594, 1.4263153076171875, 116.18069458007812, 117.91036987304688, 74.73178100585938], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000538.npy"}
{"epoch": 0.7900146842878121, "step": 539, "batch_size": 64, "mean": 71.33673095703125, "std": 82.17650604248047, "min": -118.40109252929688, "p10": -32.73659210205078, "median": 68.90102005004883, "p90": 160.18490447998047, "max": 247.07513427734375, "pos_frac": 0.8125, "sample": [17.807174682617188, 30.553939819335938, -68.49501037597656, 47.53113555908203, 80.31539154052734, 119.45276641845703, 10.008663177490234, 81.38186645507812, 65.14666748046875, 115.85920715332031, -34.11097717285156, -3.8909912109375, 50.08686828613281, 128.15060424804688, 144.13082885742188, 91.48162841796875, 80.05574035644531, 105.43742370605469, 45.60478973388672, 144.67276000976562, 224.7985382080078, -118.40109252929688, 145.64404296875, 205.7764434814453, 156.71505737304688, -45.05217361450195, 160.18344116210938, 59.10948944091797, 129.4658203125, 125.45856475830078, 37.89145278930664, 136.50262451171875, 74.08688354492188, 36.308685302734375, 26.425827026367188, 18.5321044921875, 58.583465576171875, 247.07513427734375, 41.60879898071289, 92.49727630615234, -87.18807983398438, 176.14566040039062, -21.733322143554688, 63.903785705566406, 151.10073852539062, 16.338396072387695, 152.2047882080078, -76.35736083984375, -87.7437973022461, 30.607620239257812, 209.4912567138672, 46.60966491699219, 103.93547821044922, 26.86127471923828, 125.52523803710938, -29.529693603515625, 212.90728759765625, 72.6553726196289, 16.235397338867188, 160.18553161621094, -18.37920379638672, 130.17800903320312, 152.23886108398438, -25.033309936523438], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000539.npy"}
{"epoch": 0.7914831130690162, "step": 540, "batch_size": 64, "mean": 81.52059936523438, "std": 85.75823211669922, "min": -134.04025268554688, "p10": -12.222925186157221, "median": 74.80424118041992, "p90": 185.17412567138672, "max": 267.5906982421875, "pos_frac": 0.859375, "sample": [97.08209991455078, 21.644378662109375, 35.072174072265625, -77.14579772949219, 146.2476806640625, -76.52015686035156, 136.05255126953125, -7.652099609375, -14.18185043334961, 22.280601501464844, 5.685922622680664, 185.31591796875, 27.4451904296875, 176.9423065185547, 266.54107666015625, 103.57292938232422, 74.15624237060547, 149.62344360351562, 162.3836669921875, 102.27166748046875, 110.29267120361328, 50.07929992675781, 133.28492736816406, 111.6766357421875, 101.58151245117188, -27.275558471679688, 141.4637451171875, 219.66725158691406, 79.08372497558594, 75.45223999023438, 9.3326416015625, 164.29833984375, 167.98101806640625, 70.45970153808594, 84.5833740234375, 184.84327697753906, -62.49011993408203, 34.8707275390625, 252.78530883789062, -2.7693252563476562, 219.10476684570312, 239.938720703125, 85.961669921875, 24.817138671875, -38.72875213623047, 71.92631530761719, 136.65576171875, 22.3172607421875, 147.86370849609375, 69.92692565917969, -134.04025268554688, 24.682464599609375, 63.97686767578125, 126.64871215820312, 14.574111938476562, 15.399246215820312, 41.17108154296875, 13.155765533447266, 267.5906982421875, 58.80722427368164, 28.630081176757812, 139.99009704589844, 76.8058090209961, 64.12361145019531], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000540.npy"}
{"epoch": 0.7929515418502202, "step": 541, "batch_size": 64, "mean": 58.86784362792969, "std": 71.31011962890625, "min": -108.40313720703125, "p10": -17.089589691162104, "median": 51.78759002685547, "p90": 159.55283203125003, "max": 233.3856964111328, "pos_frac": 0.8125, "sample": [5.502166748046875, 31.586654663085938, 94.02116394042969, 70.66108703613281, 50.086021423339844, -12.940170288085938, 87.15158081054688, 78.72793579101562, 100.42204284667969, 115.80743408203125, 7.920196533203125, -18.86791229248047, 46.03511047363281, 152.69754028320312, 93.78369140625, 48.063446044921875, 121.25709533691406, 188.562744140625, 97.6097412109375, 53.775604248046875, 233.3856964111328, 207.47816467285156, 125.61592102050781, 77.50172424316406, 28.655174255371094, 0.2812957763671875, -93.39790344238281, 29.152828216552734, -1.3077507019042969, 95.72235107421875, 37.24176025390625, -50.630615234375, 60.013336181640625, 136.85592651367188, -108.40313720703125, 10.469104766845703, 30.165695190429688, -1.3068199157714844, 162.49081420898438, 190.62191772460938, 53.489158630371094, 44.77455139160156, 42.0986328125, 30.494583129882812, 20.14752197265625, 109.97689819335938, -38.65907669067383, 118.58953857421875, -12.550148010253906, -38.13059997558594, 15.483028411865234, 38.303443908691406, 11.063674926757812, 57.929298400878906, 220.13113403320312, 69.76176452636719, -38.256195068359375, 56.02257537841797, -5.4013214111328125, 54.54252624511719, 180.6282958984375, 57.45442199707031, 109.97244262695312, 27.20716094970703], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000541.npy"}
{"epoch": 0.7944199706314243, "step": 542, "batch_size": 64, "mean": 82.24880981445312, "std": 76.41389465332031, "min": -67.90126037597656, "p10": -23.36165618896484, "median": 86.37420272827148, "p90": 182.28163909912112, "max": 245.37228393554688, "pos_frac": 0.828125, "sample": [83.98529052734375, 151.904052734375, 190.67572021484375, -25.239791870117188, -9.506874084472656, 132.5984344482422, 5.704713821411133, 187.751220703125, 188.74061584472656, 88.03053283691406, 100.87368774414062, 102.2489242553711, 159.39874267578125, 61.072998046875, 184.66542053222656, 187.21495056152344, 105.64466857910156, 3.906219482421875, 141.0247344970703, 9.589691162109375, 52.90985107421875, -7.991294860839844, 66.52613830566406, 75.76216888427734, -61.55610656738281, 30.846603393554688, -50.42359924316406, 132.51480102539062, -56.545257568359375, 146.5349884033203, -32.25054168701172, 114.91071319580078, -67.90126037597656, -45.56211853027344, 84.7178726196289, 40.354217529296875, 138.777587890625, 130.73976135253906, 9.256759643554688, 58.297698974609375, 68.59305572509766, 175.82363891601562, 174.8174591064453, -15.009521484375, 110.97276306152344, 138.71975708007812, 12.73444938659668, 61.3681640625, 108.42320251464844, 22.562583923339844, 121.5398941040039, 76.90446472167969, 186.4867706298828, 131.53604125976562, 118.46183776855469, 245.37228393554688, 133.13150024414062, -18.979339599609375, 176.719482421875, 58.495948791503906, 125.22462463378906, 67.07664489746094, 163.87301635742188, 8.872222900390625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000542.npy"}
{"epoch": 0.7958883994126285, "step": 543, "batch_size": 64, "mean": 45.18752670288086, "std": 87.88217163085938, "min": -144.33384704589844, "p10": -58.95977020263672, "median": 34.074974060058594, "p90": 154.46973266601566, "max": 288.767578125, "pos_frac": 0.6875, "sample": [36.56329345703125, 22.483383178710938, -3.8989486694335938, -59.72270202636719, 145.730712890625, 208.59805297851562, -16.157451629638672, -11.667343139648438, -68.60213470458984, 91.22298431396484, -97.4415054321289, -77.94680786132812, -54.443702697753906, 107.4580307006836, 288.767578125, 44.459712982177734, 70.70347595214844, 248.54342651367188, 10.891483306884766, 117.56291198730469, 77.25586700439453, 48.22760772705078, -45.7967529296875, 46.4428596496582, -106.86691284179688, 122.66246032714844, -13.127243041992188, 194.13490295410156, -47.93279266357422, 102.64469909667969, 76.28424072265625, 19.1068115234375, -144.33384704589844, 113.44453430175781, 26.652389526367188, -3.520549774169922, 132.7384033203125, 138.13121032714844, 187.99325561523438, -22.818038940429688, 56.88982391357422, 23.656692504882812, -108.0037841796875, 31.07819366455078, 27.62451934814453, 33.37640380859375, 163.74476623535156, 138.76617431640625, -30.460247039794922, 46.21844482421875, 83.28564453125, 45.560794830322266, 158.21502685546875, 4.036285400390625, 108.29182434082031, 140.04525756835938, 26.10724639892578, -57.179595947265625, 15.8502197265625, -28.999664306640625, 34.77354431152344, 26.9664306640625, -2.3894081115722656, 50.119773864746094], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000543.npy"}
{"epoch": 0.7973568281938326, "step": 544, "batch_size": 64, "mean": 55.278053283691406, "std": 71.21431732177734, "min": -75.49259185791016, "p10": -17.281611251831052, "median": 36.6396369934082, "p90": 150.26485290527347, "max": 321.83251953125, "pos_frac": 0.8125, "sample": [27.571945190429688, 12.2562255859375, -55.19136047363281, 34.52813720703125, 321.83251953125, 12.839614868164062, 43.469512939453125, 92.82536315917969, 16.215267181396484, 12.895898818969727, 2.0423583984375, 15.315258026123047, 82.33499145507812, 48.55131530761719, -75.49259185791016, 71.52204132080078, 173.1230010986328, 144.54859924316406, 51.50849914550781, 55.479557037353516, 104.21175384521484, 16.41973876953125, -4.065517425537109, 0.6788825988769531, 41.85955047607422, 183.42013549804688, 152.7146759033203, 159.9441680908203, 15.270723342895508, 144.52532958984375, 143.04400634765625, 25.123756408691406, 44.47383117675781, 72.43148803710938, 33.15646743774414, -0.200927734375, 39.98518371582031, 18.02190399169922, -19.09783935546875, -9.091217041015625, 121.97589874267578, 38.048057556152344, 82.5499267578125, 33.08184814453125, 71.77743530273438, 31.124603271484375, 14.739486694335938, 35.23121643066406, 100.29802703857422, 32.41680908203125, -15.3662109375, 77.3681869506836, 72.62731170654297, 213.2830810546875, 121.07850646972656, 3.7415313720703125, -18.102497100830078, -25.733779907226562, 175.44065856933594, -57.81755065917969, 98.79130554199219, 120.56874084472656, -33.66785430908203, -6.661523818969727], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000544.npy"}
{"epoch": 0.7988252569750367, "step": 545, "batch_size": 64, "mean": 67.8095474243164, "std": 84.61007690429688, "min": -127.92819213867188, "p10": -43.75001831054687, "median": 72.60540008544922, "p90": 186.12010192871097, "max": 220.69012451171875, "pos_frac": 0.8125, "sample": [83.47606658935547, -105.41458892822266, -10.981689453125, -127.92819213867188, 70.72843933105469, 132.92095947265625, 39.83690643310547, 11.693204879760742, 108.71674346923828, 60.72810363769531, 84.74043273925781, 200.6242218017578, -46.50614929199219, 16.515792846679688, 122.78390502929688, 111.08597564697266, -35.37812805175781, 175.17799377441406, 172.4116668701172, 9.685379028320312, 119.32867431640625, 157.7208709716797, -109.57390594482422, -61.78565979003906, 91.90204620361328, 39.79418182373047, 16.713531494140625, -37.31904602050781, 196.8909912109375, 89.0914306640625, 88.42359924316406, 39.052947998046875, 4.313325881958008, 68.05813598632812, 190.36050415039062, 199.12704467773438, 58.466712951660156, 89.15834045410156, 60.80682373046875, 34.785797119140625, 21.77643585205078, 138.783447265625, 145.6141815185547, 7.998035430908203, 17.001113891601562, 94.84603118896484, 110.65678405761719, 203.084716796875, 97.20433044433594, -0.9568138122558594, -7.298366546630859, 23.346420288085938, 73.78294372558594, 110.49465942382812, 176.225830078125, 220.69012451171875, 134.71542358398438, 137.0248565673828, 71.4278564453125, -93.91415405273438, 76.97642517089844, -78.47463989257812, 39.50172424316406, 209.07008361816406], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000545.npy"}
{"epoch": 0.8002936857562408, "step": 546, "batch_size": 64, "mean": 64.24735260009766, "std": 78.48299407958984, "min": -172.19561767578125, "p10": -13.361320877075192, "median": 61.95509338378906, "p90": 178.7099029541016, "max": 218.80503845214844, "pos_frac": 0.78125, "sample": [-10.059661865234375, 44.01481246948242, -38.67577362060547, 102.95049285888672, 71.34259796142578, 76.15534210205078, 83.077880859375, 143.78060913085938, 196.49948120117188, 66.03976440429688, -14.776317596435547, 98.34370422363281, 105.13487243652344, -36.34115219116211, 96.9598159790039, 53.77784729003906, 36.22010803222656, 28.150554656982422, 218.80503845214844, 52.492034912109375, 8.450145721435547, 202.2425537109375, -172.19561767578125, 198.83358764648438, 10.52325439453125, 198.51528930664062, 41.277801513671875, 104.59471130371094, 170.6790008544922, 94.32633972167969, -54.41815185546875, 66.89410400390625, 13.921951293945312, 182.15171813964844, 87.92520904541016, 41.841651916503906, 127.07240295410156, 119.40656280517578, 120.62660217285156, 142.13113403320312, -141.10064697265625, 49.00651550292969, 25.547149658203125, -0.11602783203125, -47.3624267578125, 159.71847534179688, -7.711496353149414, 31.6212158203125, 42.73078155517578, 182.197509765625, -5.0579681396484375, -3.9783935546875, 26.180095672607422, -9.543685913085938, 102.04959869384766, 102.5860366821289, 30.879959106445312, 60.07693099975586, -8.429195404052734, 83.7501220703125, 34.51531219482422, 129.99850463867188, 63.833255767822266, 131.74656677246094], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000546.npy"}
{"epoch": 0.801762114537445, "step": 547, "batch_size": 64, "mean": 73.09080505371094, "std": 63.33012771606445, "min": -70.13029479980469, "p10": -11.695862770080563, "median": 68.6412239074707, "p90": 145.70867309570312, "max": 224.53887939453125, "pos_frac": 0.859375, "sample": [29.153518676757812, -27.781768798828125, 84.35595703125, 110.38778686523438, -13.46728515625, -3.8692855834960938, 131.7506866455078, -29.665267944335938, 39.886993408203125, 215.78297424316406, 14.255355834960938, 115.87246704101562, 150.90567016601562, 224.53887939453125, -70.13029479980469, 53.701866149902344, 129.70816040039062, 89.83690643310547, 35.19287109375, 147.7176055908203, 65.95037841796875, 127.18285369873047, 23.731857299804688, 145.3084716796875, 146.42965698242188, 100.93338012695312, 130.64476013183594, 145.88018798828125, 127.02377319335938, 39.300933837890625, 81.35103607177734, 13.606109619140625, -32.02299880981445, 90.52110290527344, 141.69976806640625, 112.46768188476562, -68.02391815185547, 124.48802185058594, 56.25288391113281, 138.62313842773438, 61.509246826171875, 72.52183532714844, 160.37547302246094, 140.2484893798828, 63.55089569091797, 113.32720947265625, 22.191341400146484, 82.1402587890625, 24.3623104095459, 20.462486267089844, 59.64925003051758, 111.40667724609375, 66.16488647460938, 80.07540893554688, -41.262847900390625, 42.79530334472656, 69.86666870117188, 64.7161865234375, 67.41577911376953, 52.95941162109375, 134.49114990234375, 39.95562744140625, -7.562543869018555, 36.96843719482422], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000547.npy"}
{"epoch": 0.8032305433186491, "step": 548, "batch_size": 64, "mean": 54.670692443847656, "std": 86.08524322509766, "min": -134.03955078125, "p10": -71.47863006591797, "median": 51.73506164550781, "p90": 146.2186492919922, "max": 322.26458740234375, "pos_frac": 0.78125, "sample": [155.7578125, 90.48194885253906, 22.637710571289062, 36.15971374511719, 71.92426300048828, 34.01681137084961, -2.2270050048828125, 200.8362274169922, 125.80953216552734, 90.13687133789062, 93.54570007324219, 48.536903381347656, 151.05149841308594, 36.80596923828125, 53.926124572753906, -115.08206176757812, -7.9991912841796875, 0.3821830749511719, 146.60061645507812, 134.4671630859375, 142.24462890625, 31.334667205810547, 27.27106475830078, 75.10176086425781, 133.95074462890625, -134.03955078125, -89.59959411621094, 70.75590515136719, 52.193580627441406, -35.08023452758789, 75.23042297363281, 322.26458740234375, 118.87428283691406, 83.82118225097656, 33.66802215576172, -56.35761260986328, 66.32551574707031, 43.29853057861328, 143.34237670898438, 73.75444030761719, 135.04953002929688, -71.94451904296875, 23.979694366455078, 51.27654266357422, 39.754417419433594, -117.42167663574219, 46.31077575683594, 87.30465698242188, 19.410751342773438, -70.39155578613281, 39.91436767578125, -76.3966064453125, -58.26689910888672, 175.82745361328125, -78.61796569824219, 52.86209487915039, -34.89312744140625, 145.327392578125, 50.623809814453125, 141.70367431640625, 24.607437133789062, 141.894775390625, 80.09776306152344, 204.78782653808594], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000548.npy"}
{"epoch": 0.8046989720998532, "step": 549, "batch_size": 64, "mean": 66.53626251220703, "std": 79.45561981201172, "min": -140.55426025390625, "p10": -37.26612205505371, "median": 61.76848602294922, "p90": 178.91716156005862, "max": 225.719482421875, "pos_frac": 0.8125, "sample": [9.325302124023438, 172.04823303222656, 4.958732604980469, 21.605873107910156, 143.2030487060547, 150.83009338378906, 84.01978302001953, 162.38331604003906, 114.08894348144531, 200.8537139892578, 60.08992004394531, 36.0728759765625, 106.95198059082031, 83.30711364746094, 69.77571105957031, 135.785888671875, 16.219833374023438, -140.55426025390625, 38.828643798828125, -33.3098258972168, 68.47361755371094, 123.62881469726562, 109.60684967041016, 39.34907531738281, 82.668701171875, 58.712310791015625, 114.48896789550781, 14.759571075439453, 180.45237731933594, 86.49539184570312, 85.24925231933594, -11.579706192016602, 63.447052001953125, -62.34650802612305, 175.33499145507812, -11.827865600585938, 115.8886489868164, 194.53147888183594, 45.06420135498047, 4.411407470703125, 51.308563232421875, 97.71292114257812, 120.23944854736328, 186.11077880859375, 225.719482421875, 146.40048217773438, -53.93568420410156, 14.85098648071289, -44.00689697265625, 194.5118408203125, 50.850276947021484, -86.49124908447266, 20.058765411376953, 66.29643249511719, 111.46888732910156, 34.94169235229492, -22.651748657226562, 10.128034591674805, -0.62255859375, 218.27598571777344, -47.45427703857422, 49.754302978515625, 40.52254104614258, -38.96167755126953], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000549.npy"}
{"epoch": 0.8061674008810573, "step": 550, "batch_size": 64, "mean": 64.60523986816406, "std": 67.81615447998047, "min": -103.75785064697266, "p10": -12.92852325439453, "median": 55.29528999328613, "p90": 168.1207702636719, "max": 207.23583984375, "pos_frac": 0.859375, "sample": [41.07239532470703, 38.247779846191406, 47.410587310791016, 207.23583984375, 36.10345458984375, -44.08708190917969, 167.159423828125, 19.56659698486328, 5.138879776000977, 81.01200103759766, 130.1558837890625, 116.1072998046875, 182.25640869140625, 65.53677368164062, 31.97369384765625, 168.53277587890625, 51.29737854003906, -79.0443344116211, 60.15641784667969, 29.882064819335938, -21.130599975585938, 8.735809326171875, 41.042259216308594, 127.04289245605469, 88.29539489746094, 53.89056396484375, 54.82354736328125, 97.89207458496094, 77.2639389038086, -4.722455978393555, 7.376312255859375, 13.480064392089844, 44.60272216796875, 21.376300811767578, 26.81873321533203, 194.88092041015625, -13.24530029296875, 202.21766662597656, 144.9510040283203, 31.900733947753906, 118.64669799804688, 101.19227600097656, 55.767032623291016, -30.4368896484375, 97.232421875, 63.89312744140625, 148.23089599609375, 70.22994995117188, 150.00592041015625, 90.09133911132812, 39.75225067138672, 187.05331420898438, -42.7518196105957, -103.75785064697266, 19.693416595458984, 30.891979217529297, 64.66128540039062, 101.58561706542969, -12.189376831054688, 45.74554443359375, 66.56845092773438, 79.604736328125, 174.3099365234375, 65.50814819335938], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000550.npy"}
{"epoch": 0.8076358296622613, "step": 551, "batch_size": 64, "mean": 75.28948974609375, "std": 74.68304443359375, "min": -57.0518798828125, "p10": -9.26210050582885, "median": 56.1925106048584, "p90": 183.61144409179695, "max": 279.6213073730469, "pos_frac": 0.875, "sample": [9.047826766967773, 67.7584457397461, 38.33617401123047, 50.164337158203125, 31.9505615234375, 95.59458923339844, -30.894378662109375, 150.89669799804688, 101.92349243164062, 66.44972229003906, 35.3286018371582, -12.71243667602539, 61.809051513671875, 90.51176452636719, 42.23686981201172, 71.16343688964844, 42.7989501953125, 29.955535888671875, 111.93087768554688, 263.07366943359375, 157.98475646972656, 15.302658081054688, 233.455322265625, 80.75787353515625, 21.05431365966797, -16.612625122070312, 233.67169189453125, 16.372764587402344, -1.2113161087036133, 63.99249267578125, -31.4803466796875, 39.40734100341797, 279.6213073730469, -57.0518798828125, 52.723907470703125, 158.21612548828125, 24.7911376953125, 66.240478515625, 53.14912414550781, 25.405155181884766, 21.911727905273438, 18.4647216796875, 81.89691925048828, 25.197311401367188, 69.4865493774414, 162.51052856445312, 14.54681396484375, -26.119293212890625, 119.64324951171875, 54.36479568481445, 129.02197265625, 99.61215209960938, 191.6158447265625, 42.394866943359375, 164.93450927734375, 121.45286560058594, 31.79271697998047, 145.0710906982422, 193.71481323242188, 199.45846557617188, 33.84335708618164, 150.61810302734375, -18.041030883789062, 58.020225524902344], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000551.npy"}
{"epoch": 0.8091042584434655, "step": 552, "batch_size": 64, "mean": 95.46898651123047, "std": 86.39298248291016, "min": -143.31903076171875, "p10": -14.962752723693846, "median": 95.74886322021484, "p90": 207.91727600097656, "max": 306.4083557128906, "pos_frac": 0.828125, "sample": [58.64567947387695, 71.79546356201172, 121.9515380859375, 207.1170654296875, 234.01858520507812, 69.26696014404297, 163.97021484375, 110.60584259033203, 75.08403015136719, -9.121376037597656, 167.76954650878906, 48.37376403808594, -31.937149047851562, 152.61923217773438, 142.4014892578125, 95.33174133300781, 123.06646728515625, 113.45003509521484, 306.4083557128906, -56.68404006958008, -34.44999694824219, 154.46795654296875, 188.25997924804688, 131.1356964111328, -143.31903076171875, 46.46294021606445, 120.97884368896484, 172.4891357421875, 63.078269958496094, -13.837142944335938, 217.47581481933594, -24.179931640625, 101.6272964477539, 96.16598510742188, 184.40145874023438, 109.34412384033203, 139.1129913330078, 235.6730499267578, 215.41607666015625, 140.79989624023438, 190.4359130859375, 70.29737854003906, 18.593475341796875, 86.34066772460938, 8.793350219726562, 236.96865844726562, 83.54042053222656, 136.05728149414062, 208.26022338867188, 107.61433410644531, 46.14129638671875, -14.014318466186523, -15.369224548339844, 26.233993530273438, 77.2690200805664, 50.836517333984375, 31.871658325195312, -30.652725219726562, -1.3550357818603516, 20.840728759765625, 202.87332153320312, 78.95063781738281, 162.38336181640625, 61.86744689941406], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000552.npy"}
{"epoch": 0.8105726872246696, "step": 553, "batch_size": 64, "mean": 61.68965148925781, "std": 73.67684173583984, "min": -94.22862243652344, "p10": -47.49476356506347, "median": 60.262184143066406, "p90": 142.50587005615233, "max": 219.4986572265625, "pos_frac": 0.8125, "sample": [98.30343627929688, 182.54147338867188, 35.693023681640625, -50.639747619628906, 129.830078125, -8.710887908935547, 66.89424896240234, -94.22862243652344, 109.85731506347656, 29.096206665039062, 142.20443725585938, 43.75811004638672, 7.8536376953125, 99.7698745727539, 25.50940704345703, 142.6350555419922, -40.15646743774414, 22.25701904296875, 129.82818603515625, -61.65182113647461, 50.767333984375, 173.62181091308594, 27.34062957763672, 21.644268035888672, 2.2892837524414062, 79.48141479492188, 219.4986572265625, 134.00384521484375, -54.492889404296875, 48.80507278442383, 63.69456481933594, 38.42639923095703, 117.20355224609375, -14.587844848632812, 39.259857177734375, 22.776626586914062, 67.34228515625, 94.6766357421875, 72.63572692871094, -94.073974609375, 18.92843246459961, 130.23837280273438, -0.40937042236328125, 105.18434143066406, 153.33172607421875, 114.6023178100586, 43.87760925292969, -66.34667205810547, 137.23208618164062, 32.118263244628906, 138.8980712890625, 19.256561279296875, 213.6446533203125, 83.33955383300781, 109.478759765625, 129.89614868164062, 95.03033447265625, 138.3213348388672, 44.62294006347656, 56.829803466796875, -13.162334442138672, 161.44650268554688, -86.97929382324219, 67.80028533935547], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000553.npy"}
{"epoch": 0.8120411160058737, "step": 554, "batch_size": 64, "mean": 69.89382934570312, "std": 93.18569946289062, "min": -130.9364013671875, "p10": -60.33680953979491, "median": 73.4058609008789, "p90": 172.03226776123046, "max": 309.8730773925781, "pos_frac": 0.765625, "sample": [69.77679443359375, 78.49871826171875, 156.12600708007812, 108.4434814453125, 28.344539642333984, 144.78253173828125, -12.459091186523438, -15.097602844238281, 58.228111267089844, 135.64682006835938, 47.560211181640625, 87.25064086914062, 98.96355438232422, 27.82647705078125, 18.827768325805664, 6.1823883056640625, 175.1479034423828, -101.6546401977539, -63.150245666503906, 78.5714111328125, -91.00633239746094, -5.735906600952148, 68.53782653808594, 61.50245666503906, 164.928466796875, 77.03492736816406, -0.1796417236328125, 45.43091583251953, 41.9141845703125, 38.539207458496094, 163.334228515625, 161.10336303710938, 172.24847412109375, 91.41413879394531, 180.09097290039062, 131.4032440185547, 128.49151611328125, 133.87579345703125, 111.40388488769531, -30.6523380279541, -66.00863647460938, 119.1735610961914, 195.7528533935547, -12.173931121826172, 143.63986206054688, 24.73249053955078, 171.5277862548828, -53.772125244140625, 230.85037231445312, -49.5413818359375, -103.21725463867188, 111.614990234375, 7.8087005615234375, -130.9364013671875, 113.63699340820312, 107.56937408447266, 26.84283447265625, 40.094947814941406, -90.2430419921875, 148.4243927001953, 309.8730773925781, 285.5557861328125, 140.8145294189453, 29.689743041992188], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000554.npy"}
{"epoch": 0.8135095447870778, "step": 555, "batch_size": 64, "mean": 64.8729248046875, "std": 85.04773712158203, "min": -94.79646301269531, "p10": -44.85582885742187, "median": 50.20684051513672, "p90": 173.1858367919922, "max": 269.645751953125, "pos_frac": 0.765625, "sample": [164.13525390625, -18.717880249023438, 147.5738983154297, -1.303558349609375, 119.69076538085938, 178.50250244140625, 111.94625854492188, -29.528167724609375, -81.4773941040039, 109.12166595458984, 27.885345458984375, 66.69189453125, 100.12042236328125, 171.56988525390625, 35.304901123046875, -38.94837188720703, 66.14738464355469, -17.132896423339844, 50.85528564453125, 225.39547729492188, 65.9918212890625, 171.29678344726562, 19.254531860351562, -72.80783081054688, 160.5475616455078, -94.79646301269531, 58.09215545654297, 17.946998596191406, 209.51913452148438, -47.387596130371094, -13.01202392578125, 173.87838745117188, 149.22865295410156, 183.11563110351562, 2.2567138671875, 35.54624938964844, 23.110427856445312, -57.67292785644531, 14.347881317138672, -83.90747833251953, -60.25440979003906, 47.33903121948242, 54.67982482910156, 186.62466430664062, 44.58995056152344, 36.846778869628906, 148.45497131347656, 3.2160587310791016, 40.296234130859375, 120.29595947265625, 167.55203247070312, 107.33038330078125, 170.10006713867188, -15.643394470214844, 29.805011749267578, 36.45196533203125, 269.645751953125, 57.90614318847656, 96.9393310546875, 97.0198974609375, 49.55839538574219, 8.8858642578125, -1.1830902099609375, 153.02838134765625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000555.npy"}
{"epoch": 0.8149779735682819, "step": 556, "batch_size": 64, "mean": 69.96516418457031, "std": 78.43024444580078, "min": -145.04002380371094, "p10": -21.37979888916015, "median": 72.26668548583984, "p90": 154.9167022705078, "max": 297.69219970703125, "pos_frac": 0.84375, "sample": [100.89250183105469, 12.52557373046875, 51.308692932128906, -15.550399780273438, 150.08010864257812, 89.81326293945312, 134.50997924804688, 41.76490783691406, -83.7257308959961, 205.49398803710938, -14.846881866455078, 53.663963317871094, 94.68186950683594, 83.98226928710938, 110.20982360839844, 20.380218505859375, 91.25010681152344, 90.57608032226562, 93.05000305175781, 62.380760192871094, -63.015892028808594, 148.81857299804688, 12.303787231445312, 35.125030517578125, -73.46184539794922, 176.35379028320312, -63.52422332763672, 131.9687042236328, 146.295654296875, -145.04002380371094, 38.18980026245117, 6.1990509033203125, 91.59757232666016, 107.55339050292969, 74.40107727050781, 40.34842300415039, 155.3785400390625, 70.13229370117188, 9.128143310546875, -32.91020202636719, 54.74259948730469, 83.47589111328125, 153.83908081054688, 199.02285766601562, 84.39338684082031, 45.17694091796875, 142.5727081298828, 102.69778442382812, 297.69219970703125, 11.022636413574219, 49.676177978515625, 124.13580322265625, 40.59648132324219, 131.99282836914062, 167.58932495117188, -23.87811279296875, 36.15673828125, 150.93341064453125, -3.3292922973632812, 94.53949737548828, 205.83349609375, 8.977474212646484, 31.920989990234375, 49.707069396972656], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000556.npy"}
{"epoch": 0.8164464023494861, "step": 557, "batch_size": 64, "mean": 76.68682861328125, "std": 84.54766845703125, "min": -100.93087768554688, "p10": -24.250439453124997, "median": 76.10598373413086, "p90": 188.14103851318362, "max": 266.88134765625, "pos_frac": 0.78125, "sample": [-43.74755859375, 218.64852905273438, 180.3963165283203, 24.07635498046875, 50.845916748046875, 90.84579467773438, 3.62884521484375, -4.959251403808594, 111.1009292602539, 12.251174926757812, 84.93643188476562, 118.26657104492188, 266.88134765625, 54.06903076171875, 36.92595672607422, 143.32626342773438, 42.04766845703125, 90.86572265625, 33.36780548095703, 123.11148071289062, 31.424560546875, 163.2969970703125, 172.48828125, 89.76123046875, -21.469879150390625, 133.6251220703125, -39.677459716796875, 133.70912170410156, -1.79974365234375, 157.34901428222656, 138.08740234375, 33.76185607910156, 207.28546142578125, 26.47399139404297, 158.8875732421875, 15.04928970336914, 197.80349731445312, -11.540138244628906, 83.471923828125, -14.158443450927734, -34.87150955200195, -62.13323211669922, 50.874366760253906, 68.74004364013672, 0.4910163879394531, -6.992866516113281, 191.460205078125, 100.94725036621094, -30.474441528320312, 250.88253784179688, 97.05070495605469, 104.04054260253906, 177.06427001953125, 129.2445068359375, 154.07229614257812, 50.573883056640625, -100.93087768554688, 234.3060760498047, 9.38372802734375, 116.34072875976562, -25.442108154296875, -16.571537017822266, 7.897148132324219, 151.28929138183594], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000557.npy"}
{"epoch": 0.8179148311306902, "step": 558, "batch_size": 64, "mean": 79.33963012695312, "std": 75.40190887451172, "min": -107.38394165039062, "p10": 1.0137802124023456, "median": 73.25312805175781, "p90": 157.19672393798828, "max": 325.53546142578125, "pos_frac": 0.90625, "sample": [78.70111083984375, 79.30902099609375, 154.97381591796875, 177.072265625, 111.7322998046875, 73.22756958007812, 89.21739196777344, 158.14939880371094, 32.134037017822266, -94.39725494384766, 2.7625732421875, 214.7234649658203, 47.3818359375, 131.8131103515625, 39.60215759277344, -1.551025390625, 51.07440185546875, 13.891948699951172, 82.28036499023438, 148.5936279296875, 146.37246704101562, 29.835533142089844, 73.2786865234375, 46.51738739013672, 58.506717681884766, 88.98544311523438, 185.53427124023438, -20.099761962890625, 57.35551452636719, -51.57543182373047, 112.07072448730469, -10.546073913574219, 25.222328186035156, 135.43695068359375, 62.81105041503906, 154.48333740234375, 24.651458740234375, 138.73745727539062, -107.38394165039062, 139.36802673339844, 2.7816314697265625, 41.602806091308594, 128.22344970703125, 259.412353515625, 0.2642974853515625, 33.487125396728516, 23.40869903564453, 325.53546142578125, 69.34759521484375, 38.34623718261719, 54.619659423828125, 123.163330078125, 110.78237915039062, 110.26052856445312, 39.228965759277344, 107.37750244140625, 28.805686950683594, 6.889835357666016, 194.19821166992188, 88.72683715820312, 116.69418334960938, 118.08831787109375, 54.050559997558594, 122.18640899658203], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000558.npy"}
{"epoch": 0.8193832599118943, "step": 559, "batch_size": 64, "mean": 69.80716705322266, "std": 79.47175598144531, "min": -140.974609375, "p10": -26.474672126770017, "median": 64.9943962097168, "p90": 161.4798263549805, "max": 264.7794189453125, "pos_frac": 0.828125, "sample": [53.68064880371094, 140.67662048339844, 130.04522705078125, 176.26119995117188, 17.710487365722656, 68.74810791015625, -9.709394454956055, 24.747146606445312, 98.12527465820312, 150.2630157470703, 136.07086181640625, 109.34030151367188, 25.981307983398438, 182.31394958496094, 106.168701171875, 66.61112976074219, 98.20988464355469, 230.78125, 37.715789794921875, -21.1336669921875, 264.7794189453125, 50.977073669433594, -65.55474853515625, -24.688453674316406, 64.5189437866211, 58.012786865234375, -140.974609375, -140.54702758789062, 136.3592529296875, 101.87449645996094, 129.12567138671875, 7.973186492919922, 34.17707824707031, 113.13617706298828, 91.15987396240234, 65.4698486328125, 31.482563018798828, -27.24019432067871, 54.19010925292969, 23.69662857055664, 164.6278076171875, -9.212455749511719, 99.26985168457031, 30.043014526367188, 132.421875, 134.96795654296875, -68.80575561523438, 154.13453674316406, 191.86114501953125, 6.576480865478516, 97.46154022216797, 46.067291259765625, 30.486709594726562, 64.33415985107422, 174.46788024902344, 106.18106079101562, -33.69024658203125, 141.76788330078125, 64.40203094482422, 52.54479217529297, 48.972381591796875, -48.53595733642578, 149.72744750976562, 87.05165100097656], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000559.npy"}
{"epoch": 0.8208516886930984, "step": 560, "batch_size": 64, "mean": 65.19851684570312, "std": 78.45458984375, "min": -128.78590393066406, "p10": -23.19542121887207, "median": 61.98963928222656, "p90": 171.53177490234378, "max": 225.81680297851562, "pos_frac": 0.796875, "sample": [71.37825012207031, 22.84134292602539, 89.36546325683594, 164.40017700195312, 82.50975036621094, 8.57940673828125, -128.78590393066406, -117.17670440673828, 56.40771484375, 104.30943298339844, 85.94442749023438, 32.28369140625, 182.01510620117188, -23.395355224609375, 128.62506103515625, -22.72890853881836, 161.4431610107422, 32.562095642089844, 108.71106719970703, 49.933563232421875, 74.94720458984375, -14.597846984863281, 87.25228881835938, 173.92098999023438, 53.89762878417969, 65.01040649414062, 2.9138717651367188, 225.81680297851562, -3.429962158203125, -27.224853515625, 99.76617431640625, -111.44956970214844, 48.979679107666016, 161.486083984375, 61.23411560058594, 62.74516296386719, 60.05625534057617, 123.87033081054688, 198.56195068359375, 25.741897583007812, 175.10052490234375, 46.141448974609375, 118.33987426757812, 20.776607513427734, 127.7888412475586, 165.95693969726562, 131.97215270996094, 199.78152465820312, 40.748046875, 219.99465942382812, -21.132965087890625, 22.736099243164062, 71.25483703613281, 52.00227355957031, 66.10740661621094, 94.97228240966797, -21.749252319335938, 91.39352416992188, 42.245452880859375, 136.00527954101562, -43.077301025390625, -17.674758911132812, 23.822860717773438, -29.52301025390625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000560.npy"}
{"epoch": 0.8223201174743024, "step": 561, "batch_size": 64, "mean": 61.706260681152344, "std": 69.03496551513672, "min": -77.53230285644531, "p10": -16.714825439453126, "median": 46.49811553955078, "p90": 164.60923614501954, "max": 233.24990844726562, "pos_frac": 0.8125, "sample": [125.23204803466797, -16.88611602783203, -16.906661987304688, 29.61389923095703, 10.57733154296875, 99.97146606445312, 66.93977355957031, 156.6531524658203, 233.24990844726562, 130.5185089111328, -8.470006942749023, 31.2515869140625, 175.46896362304688, 74.17766571044922, 164.82025146484375, 80.71172332763672, 69.83110809326172, 15.456497192382812, 86.39453887939453, 41.255523681640625, 229.24374389648438, 7.542442321777344, 29.334426879882812, 97.32756042480469, 35.232757568359375, -18.337982177734375, 181.70138549804688, 67.119140625, 61.853755950927734, 143.48251342773438, 164.1168670654297, 126.39752197265625, 13.850051879882812, 96.00640869140625, 167.75465393066406, 28.808319091796875, 0.2044239044189453, 48.130126953125, 33.863975524902344, 92.44969940185547, 18.794116973876953, -16.315147399902344, 39.65724182128906, 55.67744445800781, -27.157638549804688, 16.17995834350586, 110.76174926757812, 155.14154052734375, 20.81745719909668, 67.75019836425781, -48.6273193359375, 188.120361328125, -6.239448547363281, -77.53230285644531, 41.50238037109375, 9.900447845458984, 17.610271453857422, 67.75080108642578, -3.238750457763672, -5.379158020019531, -41.52458190917969, 44.86610412597656, 67.81106567382812, 96.9310302734375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000561.npy"}
{"epoch": 0.8237885462555066, "step": 562, "batch_size": 64, "mean": 59.46961975097656, "std": 88.24432373046875, "min": -153.6268310546875, "p10": -37.85145454406738, "median": 61.52490997314453, "p90": 155.96751708984377, "max": 331.362548828125, "pos_frac": 0.703125, "sample": [-40.654632568359375, 170.136474609375, -35.07828903198242, 38.210243225097656, 87.09535217285156, 35.64350128173828, 152.64866638183594, -31.994735717773438, 134.69808959960938, 63.52642822265625, 68.60003662109375, 184.67416381835938, 18.302894592285156, 43.96485137939453, 234.31639099121094, -102.30207824707031, 41.3778076171875, 86.18061828613281, 57.67464065551758, 191.81915283203125, 61.15882873535156, 97.0894775390625, 127.34976959228516, -2.0416526794433594, 21.479248046875, 151.42474365234375, 156.31658935546875, 132.24075317382812, 123.42727661132812, 119.90165710449219, 40.523582458496094, -2.14306640625, 57.328147888183594, 155.15301513671875, -74.31595611572266, 331.362548828125, 47.61424255371094, 92.45281982421875, -132.87152099609375, 135.74099731445312, -9.754165649414062, 98.31584167480469, 87.89141845703125, 24.19573974609375, -19.304439544677734, 112.03797149658203, 117.29096984863281, 64.480224609375, -23.189430236816406, 112.69955444335938, -38.3332633972168, 78.03976440429688, 122.09945678710938, -30.512855529785156, 81.06004333496094, -11.089195251464844, -153.6268310546875, 61.8909912109375, -1.0815277099609375, -29.26799201965332, 14.203635215759277, -36.72723388671875, 199.7958221435547, -53.08961868286133], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000562.npy"}
{"epoch": 0.8252569750367107, "step": 563, "batch_size": 64, "mean": 54.94388961791992, "std": 75.76880645751953, "min": -160.0216064453125, "p10": -28.498231887817376, "median": 49.560733795166016, "p90": 141.3908477783203, "max": 263.6005554199219, "pos_frac": 0.796875, "sample": [17.029644012451172, -160.0216064453125, 51.93965148925781, 45.72724914550781, 167.81787109375, 27.53900146484375, 165.80960083007812, -34.471954345703125, 144.748291015625, 103.70468139648438, 128.14208984375, 1.812795639038086, 2.7977447509765625, 59.35400390625, 69.68128967285156, 48.51852798461914, 43.60014343261719, 92.80232238769531, 83.50205993652344, 139.51255798339844, 125.51537322998047, 5.694601058959961, -69.14816284179688, 94.86628723144531, 92.53578186035156, -30.69704818725586, 45.3135986328125, -23.367660522460938, 204.20477294921875, 82.94464111328125, 49.3043212890625, -15.404796600341797, 39.95368194580078, 140.51309204101562, 101.26514434814453, 133.74630737304688, 45.31013488769531, -93.1781005859375, 141.76702880859375, 27.534774780273438, 68.91712188720703, 48.29228210449219, 231.88018798828125, 49.81714630126953, -16.5589599609375, -42.676353454589844, 68.12898254394531, 263.6005554199219, 102.44013214111328, -13.3948974609375, 2.2365570068359375, 51.224403381347656, -12.306526184082031, 11.498123168945312, 22.886398315429688, -89.29473114013672, 50.58283996582031, -9.298490524291992, 24.99115753173828, 106.69231414794922, 106.34867858886719, 62.21135711669922, 101.07207489013672, 28.89893913269043], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000563.npy"}
{"epoch": 0.8267254038179148, "step": 564, "batch_size": 64, "mean": 59.6172981262207, "std": 64.01009368896484, "min": -118.91574096679688, "p10": -19.386983489990232, "median": 62.587867736816406, "p90": 143.4113037109375, "max": 195.9733123779297, "pos_frac": 0.8125, "sample": [53.281036376953125, 67.29277038574219, 53.695709228515625, 107.08504486083984, 44.28718566894531, -38.74371337890625, 39.426700592041016, 12.039813995361328, 33.26136016845703, 92.39186096191406, 138.9718475341797, 60.45777893066406, 17.461441040039062, -16.404739379882812, 195.9733123779297, -20.141494750976562, -20.867462158203125, 134.58314514160156, 1.133148193359375, 15.454814910888672, 74.03784942626953, 64.71795654296875, 4.364902496337891, -52.00047302246094, -118.91574096679688, -31.584033966064453, 103.43374633789062, 104.83377075195312, 122.061767578125, 12.8697509765625, 91.78252410888672, 129.34542846679688, 68.776611328125, -4.0489654541015625, 33.72868728637695, 69.97528076171875, 91.02782440185547, 169.97454833984375, 12.227279663085938, -17.62645721435547, 64.862060546875, 54.52203369140625, 69.14795684814453, 115.58163452148438, -0.1254425048828125, 90.5150146484375, 141.25535583496094, 155.02059936523438, 97.78509521484375, 19.37451171875, -1.7410125732421875, 84.97489929199219, 84.18392181396484, -46.18224334716797, 143.59878540039062, 147.99755859375, 161.96900939941406, 43.23137664794922, 34.97569274902344, 24.30047607421875, 90.06016540527344, 168.731689453125, 28.872413635253906, 142.97384643554688], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000564.npy"}
{"epoch": 0.8281938325991189, "step": 565, "batch_size": 64, "mean": 78.6133041381836, "std": 75.35665893554688, "min": -90.99372863769531, "p10": -14.183987236022945, "median": 82.20711517333984, "p90": 169.05376129150392, "max": 228.82357788085938, "pos_frac": 0.84375, "sample": [5.252479553222656, 181.1612548828125, 144.77679443359375, -0.6712188720703125, 12.296234130859375, 190.31243896484375, 73.46910858154297, 2.9438858032226562, -53.73781967163086, 94.15081024169922, 90.52581024169922, 158.7708740234375, 29.430683135986328, 53.09564208984375, 24.835582733154297, 5.849334716796875, -7.650115966796875, 16.480865478515625, 83.08734130859375, 127.03790283203125, 228.82357788085938, 132.4326934814453, 19.379196166992188, 187.01217651367188, 3.8959503173828125, 15.3892822265625, 147.02236938476562, 59.032615661621094, -32.311607360839844, 169.1335906982422, 168.86749267578125, 118.63471984863281, 58.274269104003906, 49.096961975097656, -90.99372863769531, -38.28533935546875, 125.29527282714844, 133.8472900390625, 114.0325927734375, 82.8133544921875, 78.23855590820312, 190.05990600585938, -44.44868469238281, 81.60087585449219, 3.6325225830078125, 161.05230712890625, 100.65386199951172, 194.52578735351562, 56.299713134765625, -19.742156982421875, 140.88369750976562, 27.45648193359375, 167.25604248046875, 147.51730346679688, 119.99417114257812, 167.43115234375, 126.11174011230469, -10.936197280883789, -15.575897216796875, 44.88449478149414, 157.2183380126953, 16.86501693725586, 105.46568298339844, 151.9961395263672], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000565.npy"}
{"epoch": 0.8296622613803231, "step": 566, "batch_size": 64, "mean": 73.64616394042969, "std": 77.61248016357422, "min": -130.50796508789062, "p10": -9.498128509521479, "median": 68.03791809082031, "p90": 186.67532653808595, "max": 239.8994140625, "pos_frac": 0.859375, "sample": [52.800575256347656, 136.58518981933594, 111.53897094726562, 183.9757080078125, 72.36642456054688, 41.704833984375, 37.94281005859375, 111.835205078125, -23.593067169189453, 141.09893798828125, 61.20372009277344, 20.866378784179688, 26.392578125, 67.98808288574219, 63.242225646972656, 30.328475952148438, 99.82772827148438, 89.5545425415039, -3.6183090209960938, -78.75634765625, 45.209083557128906, 140.62710571289062, 192.01480102539062, 138.66058349609375, 136.4405059814453, -48.533634185791016, 124.08519744873047, 18.18878173828125, -0.9758892059326172, 68.08775329589844, 77.60018920898438, 176.79812622070312, 80.8318099975586, 77.35052490234375, 190.5701904296875, 9.29085922241211, -87.67111206054688, 200.83587646484375, 39.04743194580078, 3.0307464599609375, 17.160736083984375, 92.75377655029297, 55.10248565673828, 208.61907958984375, 28.08056640625, 239.8994140625, 154.91505432128906, 0.1329498291015625, -130.50796508789062, 29.820938110351562, -12.018051147460938, 187.83230590820312, 84.733642578125, 103.54035186767578, 24.909250259399414, 86.47331237792969, 18.81153106689453, 144.51974487304688, 175.6705322265625, 44.01308059692383, -29.56369972229004, 55.79071044921875, 93.42218017578125, 214.4693603515625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000566.npy"}
{"epoch": 0.8311306901615272, "step": 567, "batch_size": 64, "mean": 62.4362907409668, "std": 74.05352783203125, "min": -117.97066497802734, "p10": -20.842268562316896, "median": 54.781394958496094, "p90": 154.3027313232422, "max": 242.4022216796875, "pos_frac": 0.8125, "sample": [-88.12944030761719, 63.53260040283203, 149.34034729003906, 36.55793762207031, 51.048805236816406, 201.74050903320312, 89.47627258300781, 92.63213348388672, 100.7916259765625, 39.66387939453125, 57.889801025390625, 29.389572143554688, 111.24734497070312, 59.55107116699219, 173.74261474609375, 86.79415893554688, 156.04266357421875, 144.1017303466797, 13.095603942871094, 113.6985092163086, 54.87725830078125, 200.72750854492188, -20.918943405151367, 150.24288940429688, 44.101341247558594, 68.12028503417969, 38.056129455566406, 30.374221801757812, 106.51467895507812, 137.88392639160156, 54.68553161621094, 87.52705383300781, 53.83808135986328, -37.1829833984375, 107.47476196289062, -12.090110778808594, 93.44326782226562, 110.05509948730469, 242.4022216796875, 8.807449340820312, 9.831787109375, -20.663360595703125, -55.727027893066406, -1.1306304931640625, 48.616416931152344, 172.23707580566406, 9.139236450195312, -117.97066497802734, 183.68450927734375, 40.351776123046875, -95.41259002685547, 144.79173278808594, 117.88607788085938, 26.770477294921875, 16.938278198242188, -6.8173370361328125, 32.735084533691406, 35.090301513671875, -2.4420433044433594, 98.222412109375, 125.82749938964844, 26.590702056884766, -55.320045471191406, 61.54539489746094], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000567.npy"}
{"epoch": 0.8325991189427313, "step": 568, "batch_size": 64, "mean": 74.95256805419922, "std": 73.00387573242188, "min": -58.54026794433594, "p10": -5.142860031127928, "median": 58.95730972290039, "p90": 176.84050598144532, "max": 235.49522399902344, "pos_frac": 0.859375, "sample": [18.23590087890625, -4.102630615234375, 137.53277587890625, -5.588672637939453, 46.39610290527344, 115.69737243652344, 52.604591369628906, 177.76419067382812, 162.05892944335938, -8.657684326171875, 3.066577911376953, -25.473480224609375, 54.80580139160156, 235.49522399902344, 57.976219177246094, 217.6138153076172, 65.11231231689453, 83.68355560302734, 79.93278503417969, 7.085552215576172, 24.33631134033203, 3.8443527221679688, 174.68524169921875, 58.45643997192383, 114.54368591308594, 40.16301727294922, 91.22456359863281, 141.5945281982422, 148.83816528320312, 71.79193115234375, 140.94398498535156, 8.956180572509766, 56.19129943847656, 128.18951416015625, 186.5608367919922, 158.0626220703125, 59.45817947387695, 93.12669372558594, 30.34033203125, 95.8083267211914, 223.51693725585938, -45.94127655029297, 9.481483459472656, 44.633270263671875, -58.54026794433594, 17.16668701171875, 20.264877319335938, -23.393402099609375, -2.5541515350341797, 193.9264373779297, 106.58082580566406, 223.404052734375, 156.64138793945312, -32.56689453125, 126.0542984008789, 18.62567138671875, 32.70453643798828, 70.30451965332031, 142.92849731445312, 113.51483917236328, 103.48532104492188, 22.285171508789062, 6.5510101318359375, 29.535076141357422], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000568.npy"}
{"epoch": 0.8340675477239354, "step": 569, "batch_size": 64, "mean": 62.19904327392578, "std": 83.970947265625, "min": -136.71444702148438, "p10": -62.432112121582016, "median": 66.17057037353516, "p90": 160.15097351074218, "max": 213.05638122558594, "pos_frac": 0.84375, "sample": [-117.63709259033203, 110.31488800048828, 159.7361602783203, 75.57865905761719, 63.2222900390625, -110.55221557617188, 91.44328308105469, 42.075523376464844, 45.424468994140625, 152.34832763671875, 135.1929931640625, 5.1447601318359375, 33.478065490722656, 90.2814712524414, 206.8349151611328, 126.14865112304688, 16.67754364013672, 85.27871704101562, -22.221656799316406, 65.27043151855469, -132.23829650878906, 116.28079223632812, 125.86819458007812, 4.9460601806640625, 20.040016174316406, 171.69972229003906, 36.05021667480469, 47.445098876953125, 3.44940185546875, -74.14410400390625, 58.6723747253418, 72.7130126953125, 189.30235290527344, 145.5346221923828, 74.49880981445312, 29.54931640625, 119.1302490234375, 132.6191864013672, 167.6961212158203, -86.54229736328125, 150.65185546875, 9.196197509765625, -69.05300903320312, 213.05638122558594, 111.64270782470703, 4.555908203125, 105.62959289550781, 122.30882263183594, -136.71444702148438, 160.32875061035156, -37.88665771484375, 108.19568634033203, 8.930215835571289, 123.08568572998047, 145.2095947265625, 20.352920532226562, 120.317138671875, 11.645648956298828, 67.07070922851562, -46.98335266113281, 59.0888557434082, 21.65558624267578, 52.31036376953125, 179.53260803222656], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000569.npy"}
{"epoch": 0.8355359765051396, "step": 570, "batch_size": 64, "mean": 79.84074401855469, "std": 70.21666717529297, "min": -105.71533203125, "p10": -11.893210983276365, "median": 86.69221115112305, "p90": 172.81041564941407, "max": 231.6696014404297, "pos_frac": 0.828125, "sample": [109.55743408203125, 90.39754486083984, 63.112709045410156, 231.6696014404297, 97.27266693115234, 84.62452697753906, 202.250244140625, 3.702362060546875, 135.89910888671875, 161.16810607910156, 101.03364562988281, 173.4710693359375, 18.615646362304688, 116.10354614257812, -105.71533203125, 146.25912475585938, -13.057540893554688, 88.75989532470703, 193.00714111328125, 71.5837631225586, -9.176441192626953, 27.156341552734375, 11.131492614746094, 76.62785339355469, 107.59141540527344, 110.17648315429688, 60.1015625, 197.31918334960938, 171.26889038085938, -2.2902755737304688, -24.208045959472656, 80.50056457519531, 43.89970397949219, -2.075836181640625, 45.433258056640625, 168.48834228515625, 92.38035583496094, 189.25048828125, 92.83332824707031, 140.15524291992188, 179.6292724609375, -3.3051300048828125, 34.239540100097656, 66.68415832519531, 108.25772094726562, 119.20771026611328, 110.27410888671875, 146.06829833984375, 106.24806213378906, 122.03402709960938, 32.02934646606445, 12.572463989257812, 69.79275512695312, -22.526824951171875, -26.424606323242188, 83.109619140625, 77.21505737304688, -27.893753051757812, 105.78185272216797, -46.46598434448242, 23.204193115234375, 148.21737670898438, 97.6683349609375, 47.91059875488281], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000570.npy"}
{"epoch": 0.8370044052863436, "step": 571, "batch_size": 64, "mean": 73.75041961669922, "std": 86.01789093017578, "min": -114.6440200805664, "p10": -47.0948932647705, "median": 80.41279602050781, "p90": 189.52273254394535, "max": 249.1363067626953, "pos_frac": 0.828125, "sample": [74.78730773925781, 129.4651336669922, -114.6440200805664, 44.97923278808594, -90.07064819335938, 231.48989868164062, 33.98286056518555, 128.86221313476562, 11.925262451171875, 194.30816650390625, 82.63780975341797, -66.91653442382812, 70.91378784179688, 27.721214294433594, -64.58889770507812, 118.40332794189453, 221.31069946289062, 159.5973663330078, 76.01837158203125, 249.1363067626953, 199.89674377441406, 160.4154052734375, 125.6352767944336, -37.18067932128906, 13.625045776367188, 103.91014862060547, 85.28026580810547, 117.54531860351562, 160.8505859375, 240.271240234375, 6.0787506103515625, 90.26945495605469, -50.51184844970703, 88.7957534790039, 78.18778228759766, 41.06610107421875, 34.741539001464844, 102.58670043945312, 45.112911224365234, 23.784698486328125, 117.42999267578125, 60.705596923828125, 174.9593048095703, 148.45236206054688, 194.53253173828125, 7.771404266357422, 138.531005859375, 90.87083435058594, -19.928237915039062, 98.39903259277344, 12.205131530761719, -72.525634765625, 96.53182220458984, -39.12199783325195, 10.676589965820312, 132.92279052734375, 11.541620254516602, 111.52281188964844, -31.968711853027344, 103.51336669921875, 59.203121185302734, 178.35671997070312, -79.79911804199219, 65.56436157226562], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000571.npy"}
{"epoch": 0.8384728340675477, "step": 572, "batch_size": 64, "mean": 73.78263092041016, "std": 84.3973159790039, "min": -69.07749938964844, "p10": -22.350676345825196, "median": 58.0110969543457, "p90": 192.0596481323242, "max": 370.3412780761719, "pos_frac": 0.8125, "sample": [32.46771240234375, 65.0059585571289, -11.465888977050781, 25.54122543334961, 193.98712158203125, -52.57377624511719, 84.96847534179688, 17.866615295410156, 26.290515899658203, 46.0771484375, 6.380712509155273, 64.07672119140625, 127.11529541015625, -14.994338989257812, 17.55474853515625, 85.80054473876953, -21.727508544921875, 182.36959838867188, 120.41242218017578, 180.21267700195312, 28.757293701171875, 94.24552917480469, -22.617748260498047, 137.38720703125, 41.90719985961914, -7.9584197998046875, -62.417518615722656, 26.306686401367188, 2.122499465942383, 95.39060974121094, 108.05570983886719, 32.35676574707031, 192.0348358154297, 5.635894775390625, -69.07749938964844, 370.3412780761719, -26.967918395996094, 58.97337341308594, 196.01962280273438, 60.977210998535156, 116.11184692382812, 143.15252685546875, -3.601461410522461, 195.29067993164062, -41.91413879394531, 39.55970764160156, 213.82508850097656, 50.922515869140625, 39.25447082519531, 57.04882049560547, 63.91578674316406, 101.45556640625, -29.34893035888672, 218.10629272460938, 34.84474563598633, 192.07028198242188, 176.8799591064453, 162.68841552734375, 25.33868408203125, 142.55313110351562, 29.955421447753906, 124.71275329589844, 69.34782409667969, 163.08018493652344], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000572.npy"}
{"epoch": 0.8399412628487518, "step": 573, "batch_size": 64, "mean": 70.3113021850586, "std": 80.95068359375, "min": -148.91665649414062, "p10": -10.001570320129392, "median": 62.27304458618164, "p90": 175.03159790039064, "max": 221.799560546875, "pos_frac": 0.84375, "sample": [175.20025634765625, 9.05082893371582, 58.71411895751953, 14.894851684570312, 44.065773010253906, 70.14109802246094, 69.04164123535156, -51.14629364013672, 174.6380615234375, 183.17855834960938, 192.00454711914062, 168.6307373046875, 23.03158950805664, 221.799560546875, 37.1454963684082, 128.67042541503906, 159.53952026367188, -148.91665649414062, 44.32109832763672, 42.32774353027344, 112.52974700927734, 39.87061309814453, 2.325103759765625, 182.5695037841797, 114.44012451171875, 201.82630920410156, 110.57201385498047, 167.12918090820312, 65.83197021484375, 118.02845001220703, -126.62019348144531, -7.999229431152344, 173.78477478027344, 185.1947021484375, 106.46183013916016, -90.9996337890625, 34.860755920410156, -10.859716415405273, 5.944122314453125, 3.2181015014648438, 120.44538116455078, 150.9518585205078, 169.40988159179688, 35.694068908691406, 77.07994079589844, -5.840755462646484, 51.90284729003906, -23.461715698242188, -6.023693084716797, -65.72611999511719, 30.984954833984375, 29.350936889648438, 86.70994567871094, 10.825912475585938, 53.41259765625, 109.20065307617188, 41.303436279296875, 82.09947204589844, 121.49258422851562, 160.25518798828125, 52.367095947265625, 89.07174682617188, 12.304473876953125, 111.67107391357422], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000573.npy"}
{"epoch": 0.8414096916299559, "step": 574, "batch_size": 64, "mean": 70.3678207397461, "std": 81.21170806884766, "min": -101.30770111083984, "p10": -37.539681243896474, "median": 63.488704681396484, "p90": 164.1844421386719, "max": 262.1033630371094, "pos_frac": 0.8125, "sample": [25.27048110961914, 15.265121459960938, 139.03009033203125, -46.62971878051758, 139.25515747070312, 47.460819244384766, 143.4544219970703, 155.7626495361328, 60.610801696777344, 146.04544067382812, 47.88714599609375, 197.34814453125, 57.76622772216797, 238.6879119873047, 99.27491760253906, 86.56741333007812, 75.71379089355469, 18.609848022460938, 153.7836456298828, 140.40269470214844, 49.835323333740234, -17.496612548828125, 40.99266052246094, 223.30886840820312, 15.512664794921875, 98.01431274414062, 17.442054748535156, -44.0391845703125, 66.36660766601562, 165.19293212890625, 24.152587890625, -85.2306137084961, -4.7434844970703125, 97.11015319824219, 38.688934326171875, -101.30770111083984, -4.744232177734375, 149.67636108398438, 84.36450958251953, 38.67362976074219, 75.44920349121094, -30.065155029296875, 157.9228973388672, -63.509361267089844, 116.60151672363281, 114.48104858398438, 69.99585723876953, 36.52466583251953, -40.74304962158203, -96.510498046875, 93.67355346679688, -0.9696083068847656, 75.53203582763672, 161.831298828125, 24.63165283203125, 182.95309448242188, 42.59794616699219, 262.1033630371094, 22.99688720703125, 84.01182556152344, 120.16921997070312, 220.08428955078125, 48.65583038330078, 31.785614013671875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000574.npy"}
{"epoch": 0.8428781204111601, "step": 575, "batch_size": 64, "mean": 73.34466552734375, "std": 88.89456939697266, "min": -128.7067413330078, "p10": -52.382767868041995, "median": 74.10614013671875, "p90": 204.344482421875, "max": 299.85382080078125, "pos_frac": 0.796875, "sample": [19.447418212890625, 142.94773864746094, 123.21647644042969, 211.69857788085938, 65.69261169433594, 59.78910827636719, -70.71465301513672, 82.61572265625, 93.16311645507812, 120.15144348144531, 41.73295593261719, 53.998939514160156, 144.01388549804688, 83.40176391601562, -0.38649749755859375, -14.559654235839844, -103.03286743164062, 78.47498321533203, 88.44670104980469, 74.0753173828125, 169.88531494140625, -80.55815887451172, 203.68310546875, 23.478485107421875, 111.33248901367188, 82.44414520263672, -51.713836669921875, 49.891727447509766, -9.556327819824219, 133.2654571533203, -16.161304473876953, 112.80117797851562, 112.10852813720703, 104.40721130371094, -52.66945266723633, 66.07625579833984, 55.097991943359375, -39.660987854003906, 69.04219818115234, 223.3360595703125, 58.15741729736328, 29.457138061523438, -128.7067413330078, 86.35653686523438, 219.4814453125, -64.76374816894531, 204.6279296875, 117.43025207519531, 63.13462829589844, 121.58060455322266, 33.17320251464844, 172.26898193359375, 74.136962890625, 100.33001708984375, 299.85382080078125, 39.40576934814453, 213.74169921875, 50.95018005371094, 219.75656127929688, 15.95745849609375, -110.88903045654297, 103.2143783569336, 160.21888732910156, 54.480735778808594], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000575.npy"}
{"epoch": 0.8443465491923642, "step": 576, "batch_size": 64, "mean": 69.30174255371094, "std": 86.04670715332031, "min": -161.3775177001953, "p10": -34.94043045043944, "median": 61.495582580566406, "p90": 186.9743896484375, "max": 239.76123046875, "pos_frac": 0.78125, "sample": [91.78209686279297, 66.60404968261719, -65.1072998046875, 123.83673095703125, -53.93132400512695, 55.62889862060547, 134.21031188964844, 18.536300659179688, 31.028179168701172, 153.93887329101562, 200.15347290039062, 148.6865997314453, 6.488395690917969, 187.46340942382812, 239.76123046875, 216.90829467773438, 39.38128662109375, 74.90614318847656, 34.69659423828125, -105.09107208251953, -4.579803466796875, 198.90350341796875, 140.68724060058594, 110.0009765625, -40.156494140625, 171.27392578125, 7.980724334716797, 129.31146240234375, -22.769615173339844, 135.92514038085938, 2.1289596557617188, 39.27239990234375, 47.564178466796875, 40.51451110839844, 27.6883544921875, 193.4832763671875, 185.83334350585938, -1.3874931335449219, 6.800895690917969, 130.2699737548828, -9.081718444824219, 68.86677551269531, -2.0220584869384766, 46.54905700683594, 147.62705993652344, -6.372428894042969, 29.911827087402344, 143.00148010253906, 49.86174011230469, 121.744140625, -90.09923553466797, 183.9891357421875, 197.39111328125, 119.28718566894531, -161.3775177001953, 56.387115478515625, 44.200958251953125, 74.01641845703125, 153.82630920410156, -4.9688720703125, 67.87358093261719, 87.93394470214844, -60.14875793457031, 78.28770446777344], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000576.npy"}
{"epoch": 0.8458149779735683, "step": 577, "batch_size": 64, "mean": 83.21434783935547, "std": 88.69630432128906, "min": -104.27642822265625, "p10": -39.86539077758789, "median": 88.61109924316406, "p90": 193.3446243286133, "max": 329.6661376953125, "pos_frac": 0.78125, "sample": [-78.23591613769531, 329.6661376953125, 202.95556640625, 149.41650390625, 54.64543914794922, -1.7815933227539062, 161.88381958007812, 115.82404327392578, -25.073532104492188, 125.27743530273438, 41.72138977050781, 138.81849670410156, 170.37112426757812, 139.4412384033203, -104.27642822265625, 106.45565032958984, 196.36212158203125, 193.91844177246094, 74.59307861328125, 141.08131408691406, 59.55680465698242, 101.84851837158203, -26.54688262939453, -36.61407470703125, 193.74232482910156, 60.39422607421875, 42.4874153137207, 75.40272521972656, 168.60311889648438, 105.71527099609375, 61.6883544921875, 91.50721740722656, 68.1432113647461, 115.60262298583984, 154.29421997070312, -43.12554931640625, 203.3131561279297, 85.71498107910156, 84.5226821899414, -87.58817291259766, 27.994041442871094, -2.5571022033691406, 96.73794555664062, 8.588150024414062, 49.42565155029297, 11.070724487304688, -41.258811950683594, 161.04090881347656, -64.8328628540039, 183.6224365234375, 62.511444091796875, 134.37161254882812, 95.26715850830078, -2.4746742248535156, 138.7991943359375, 173.4027557373047, 62.086997985839844, -101.23371887207031, -11.220291137695312, 210.31298828125, 142.90029907226562, 79.62791442871094, 107.3924789428711, 192.41665649414062], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000577.npy"}
{"epoch": 0.8472834067547724, "step": 578, "batch_size": 64, "mean": 62.50342559814453, "std": 79.31452178955078, "min": -158.0628662109375, "p10": -41.52497787475585, "median": 63.36244201660156, "p90": 170.03372650146488, "max": 247.9851837158203, "pos_frac": 0.828125, "sample": [88.15555572509766, 22.79060173034668, 22.846649169921875, 159.99490356445312, 174.7244110107422, 108.34980010986328, 79.7717056274414, -45.81660461425781, 46.956626892089844, 35.3000602722168, 30.522789001464844, 100.7165298461914, 12.41412353515625, 162.0072784423828, 20.427108764648438, -26.201828002929688, 76.48855590820312, 87.64275360107422, 122.7501220703125, 68.77312469482422, 32.134796142578125, 150.13580322265625, 15.468315124511719, -50.20072937011719, -94.05294799804688, 145.29232788085938, 247.9851837158203, -6.470222473144531, 103.40191650390625, 173.4736328125, 138.20233154296875, -61.60118103027344, 45.29156494140625, 59.879913330078125, 17.741844177246094, 86.89044952392578, 5.019630432128906, -158.0628662109375, 0.9713287353515625, -32.10309600830078, 29.66490936279297, 66.5797119140625, 82.07377624511719, 45.995460510253906, 179.5561065673828, 9.396575927734375, 100.4654541015625, -45.56292724609375, 187.82406616210938, 127.59400939941406, 100.60087585449219, 60.4176025390625, 29.266586303710938, 62.162506103515625, 67.7890396118164, 128.97947692871094, 64.5623779296875, 51.169105529785156, 222.74295043945312, 180.13092041015625, -106.9230728149414, -6.185096740722656, 128.77398681640625, 65.13264465332031], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000578.npy"}
{"epoch": 0.8487518355359766, "step": 579, "batch_size": 64, "mean": 64.23765563964844, "std": 78.89299011230469, "min": -110.85607147216797, "p10": -32.359544372558595, "median": 57.07699012756348, "p90": 175.04991455078132, "max": 304.9245300292969, "pos_frac": 0.8125, "sample": [-19.08061981201172, 44.04350280761719, -85.8960189819336, 185.97268676757812, 97.36715698242188, 122.81636047363281, 42.59706115722656, -48.91853332519531, 62.72173309326172, 131.2708282470703, 13.0216064453125, 16.95931625366211, 37.90821838378906, -71.26917266845703, 38.73778533935547, 88.80895233154297, 87.7391586303711, 155.19923400878906, 223.64834594726562, 125.63546752929688, -110.85607147216797, 22.289758682250977, -38.168983459472656, 209.65350341796875, 25.629661560058594, 104.12872314453125, 54.850467681884766, 25.639617919921875, 96.64869689941406, -31.400772094726562, -2.6708984375, 39.370849609375, 35.852935791015625, 92.62387084960938, 304.9245300292969, 150.23721313476562, 22.9454345703125, 76.60101318359375, 101.727783203125, -10.423782348632812, 88.114990234375, 15.299949645996094, 103.76261901855469, 91.93476867675781, 106.67523193359375, 59.30351257324219, 108.77772521972656, 65.92771911621094, -68.59921264648438, 27.217571258544922, 187.03639221191406, 129.5048370361328, -9.346868515014648, 183.67547607421875, 92.37747955322266, 45.154541015625, 156.212890625, 12.706197738647461, 11.9002685546875, 15.038589477539062, 49.070552825927734, 183.1229248046875, 70.22555541992188, -32.77044677734375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000579.npy"}
{"epoch": 0.8502202643171806, "step": 580, "batch_size": 64, "mean": 73.02272033691406, "std": 79.85865020751953, "min": -72.58403015136719, "p10": -28.976558685302734, "median": 69.2580337524414, "p90": 184.0193588256836, "max": 234.61038208007812, "pos_frac": 0.78125, "sample": [30.141555786132812, 162.89126586914062, 56.30052185058594, 87.71424102783203, 27.748794555664062, 50.06282043457031, 142.33804321289062, 212.47085571289062, 98.87381744384766, 97.93827819824219, 29.640350341796875, 36.88020324707031, 232.43258666992188, 141.052490234375, -43.71467590332031, 70.58097839355469, -17.843475341796875, 1.742868423461914, 178.41281127929688, 115.1544418334961, -27.83124542236328, 144.43260192871094, 63.75627136230469, 2.4085617065429688, -0.1065216064453125, 67.94325256347656, 79.39894104003906, 65.05460357666016, 206.38534545898438, 37.126609802246094, 143.68768310546875, 126.44584655761719, 70.57281494140625, 9.442611694335938, 198.17596435546875, 171.7079620361328, 97.18699645996094, -34.25681686401367, 84.98565673828125, -0.2019805908203125, -29.4674072265625, -18.87451171875, -53.45386505126953, 127.6282958984375, 44.66075897216797, 31.912078857421875, -71.11863708496094, -72.58403015136719, -19.537551879882812, 234.61038208007812, 140.99029541015625, 10.607231140136719, 192.59405517578125, 162.68743896484375, 145.7525634765625, 71.0709228515625, 153.7252960205078, 78.59835815429688, -8.86048698425293, -52.82099533081055, 44.865997314453125, 97.61746215820312, 186.4221649169922, 59.294281005859375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000580.npy"}
{"epoch": 0.8516886930983847, "step": 581, "batch_size": 64, "mean": 64.5920181274414, "std": 88.29571533203125, "min": -123.58969116210938, "p10": -58.90564231872558, "median": 71.41394805908203, "p90": 174.25166931152344, "max": 241.5004119873047, "pos_frac": 0.75, "sample": [99.93597412109375, -123.58969116210938, 61.58142852783203, -100.78431701660156, -20.362823486328125, 29.796890258789062, 148.8539581298828, 16.047286987304688, 114.5991439819336, 54.264427185058594, 163.28688049316406, 43.029380798339844, 179.86572265625, 116.91049194335938, -55.28218078613281, 55.96129608154297, 174.4552001953125, 132.0123291015625, 59.242706298828125, 241.5004119873047, 81.8070068359375, 173.77676391601562, -117.263916015625, 29.417564392089844, 105.26029205322266, 77.44580841064453, 127.3734130859375, 108.81363677978516, 52.91265869140625, 146.5228729248047, -0.6384849548339844, 165.87196350097656, -0.2578849792480469, 178.91639709472656, 135.29080200195312, 18.229293823242188, 179.49163818359375, 155.4765625, 101.60707092285156, 67.72151184082031, 129.38018798828125, 56.45813751220703, -58.97001647949219, 13.0208740234375, 123.3437728881836, -94.04747009277344, 178.80032348632812, -58.755435943603516, 93.15401458740234, 5.230457305908203, 77.46369171142578, -33.30943298339844, 14.299993515014648, -97.56065368652344, 66.39909362792969, 140.81051635742188, -17.021881103515625, 75.10638427734375, 137.26104736328125, -86.0262451171875, -7.9164276123046875, 211.29095458984375, -36.59324264526367, 122.97113037109375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000581.npy"}
{"epoch": 0.8531571218795888, "step": 582, "batch_size": 64, "mean": 62.802974700927734, "std": 80.24565887451172, "min": -174.2293701171875, "p10": -20.993894958496085, "median": 50.720293045043945, "p90": 176.59877929687502, "max": 256.9533996582031, "pos_frac": 0.8125, "sample": [33.604583740234375, 52.41394805908203, 119.51065826416016, -5.48931884765625, 161.45779418945312, 119.70347595214844, 62.33406066894531, 28.79718017578125, -65.46624755859375, 126.30108642578125, 83.55534362792969, 127.04141235351562, 38.607513427734375, 173.18927001953125, 60.18489074707031, 197.06759643554688, 25.43775177001953, 58.75990295410156, -82.39759826660156, 129.293212890625, -7.2340087890625, 0.9116744995117188, 54.33025360107422, 19.341381072998047, 48.92500305175781, 85.78001403808594, 106.76194763183594, 90.5179443359375, 254.38519287109375, 203.5988311767578, 84.95841979980469, 131.18930053710938, 48.90009307861328, 82.34793090820312, 157.23178100585938, -24.869583129882812, 9.802726745605469, -44.41802978515625, 24.458511352539062, 83.76640319824219, 0.4095458984375, 21.683807373046875, 256.9533996582031, 99.41702270507812, 26.96514892578125, 127.60591125488281, 42.34002685546875, 9.221542358398438, -6.023078918457031, 199.95233154296875, -11.95062255859375, 186.8330841064453, 64.95106506347656, -44.84907531738281, -29.22183609008789, -174.2293701171875, 178.05999755859375, 31.62533187866211, 29.408920288085938, 49.02663803100586, 42.64659118652344, 8.487064361572266, -0.9772491455078125, 56.461936950683594], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000582.npy"}
{"epoch": 0.8546255506607929, "step": 583, "batch_size": 64, "mean": 64.84408569335938, "std": 78.77951049804688, "min": -74.96652221679688, "p10": -24.925909423828124, "median": 52.904457092285156, "p90": 180.55399017333986, "max": 275.43951416015625, "pos_frac": 0.75, "sample": [3.55267333984375, 32.356536865234375, -43.03052520751953, 9.739227294921875, 138.60792541503906, 27.116561889648438, 60.95008850097656, 91.68778991699219, 275.43951416015625, 41.222557067871094, 132.00222778320312, 22.021865844726562, -10.155807495117188, 71.08190155029297, 74.09117889404297, -26.464927673339844, 103.86868286132812, 40.052284240722656, 88.50305938720703, 122.59820556640625, 197.79843139648438, 61.136661529541016, 33.84019470214844, -25.45096206665039, -23.700786590576172, -33.356056213378906, 47.066734313964844, 152.15794372558594, -21.165611267089844, 13.290687561035156, 26.62664031982422, 79.167724609375, 170.35134887695312, 179.98052978515625, -1.6944351196289062, 110.6418685913086, 180.7997589111328, -6.162345886230469, 144.6123504638672, 23.37976837158203, -10.514404296875, 74.11026000976562, 49.90497589111328, -13.175010681152344, -59.58234405517578, -17.275833129882812, 232.40032958984375, 25.97320556640625, 78.59127807617188, -42.25176239013672, 236.08517456054688, 92.91348266601562, 15.265144348144531, 55.90393829345703, 187.86288452148438, -74.96652221679688, 44.95012664794922, 193.20230102539062, 64.90385437011719, 140.9014892578125, 135.09854125976562, 85.12716674804688, 98.13177490234375, -8.100090026855469], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000583.npy"}
{"epoch": 0.856093979441997, "step": 584, "batch_size": 64, "mean": 71.72808837890625, "std": 82.30305480957031, "min": -95.41262817382812, "p10": -38.024383544921875, "median": 73.3425407409668, "p90": 185.7469589233399, "max": 247.37277221679688, "pos_frac": 0.8125, "sample": [-60.93785095214844, 83.43920135498047, 24.95709991455078, 74.20004272460938, 109.98123168945312, -9.610565185546875, 95.16078186035156, 84.51835632324219, 128.64483642578125, 81.33621978759766, 32.75999450683594, -95.41262817382812, 25.8038330078125, 55.10493469238281, 43.17322540283203, 247.37277221679688, -5.303352355957031, 216.2316436767578, 19.68267822265625, -38.80061340332031, 64.92571258544922, 0.9911651611328125, 88.21145629882812, 73.3345947265625, 64.8878402709961, -29.09210205078125, 161.93124389648438, 131.5645751953125, 44.041168212890625, 54.85145568847656, 145.10552978515625, -87.94012451171875, 140.79959106445312, -52.19012451171875, 30.287315368652344, 78.47882080078125, 39.929237365722656, 124.68246459960938, -36.21318054199219, -83.21855163574219, 190.03964233398438, 73.3504867553711, -31.435340881347656, 243.1790771484375, 175.73069763183594, 55.767723083496094, 194.1536865234375, 96.81524658203125, -93.37284851074219, 138.88795471191406, 199.22439575195312, 50.368370056152344, 87.66232299804688, 54.48663330078125, 33.33266067504883, 163.97390747070312, 113.60710144042969, 146.2282257080078, 194.0791015625, 66.33261108398438, 152.38223266601562, 114.21630859375, 99.58843231201172, 4.3291168212890625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000584.npy"}
{"epoch": 0.8575624082232012, "step": 585, "batch_size": 64, "mean": 83.6611557006836, "std": 90.11136627197266, "min": -134.05413818359375, "p10": -37.49532089233398, "median": 72.2989387512207, "p90": 201.4703033447266, "max": 326.88427734375, "pos_frac": 0.859375, "sample": [55.440467834472656, 63.06349182128906, 49.833824157714844, 144.55624389648438, 41.074668884277344, 326.88427734375, 192.83383178710938, -28.948944091796875, 134.7670440673828, 36.934139251708984, 158.65939331054688, 107.85832977294922, 235.0419921875, 106.23764038085938, 98.43197631835938, 117.77363586425781, 174.30491638183594, 189.65635681152344, 208.25308227539062, 23.607284545898438, 75.7496566772461, 68.75747680664062, 23.017532348632812, -29.48199462890625, 187.86441040039062, 24.748367309570312, 68.84822082519531, 194.19735717773438, 248.5946044921875, 173.28001403808594, 217.72479248046875, 0.40575408935546875, 51.02613830566406, -50.714447021484375, 82.6570053100586, 34.653358459472656, 131.3682861328125, -134.05413818359375, 103.80172729492188, 156.05809020996094, 55.04966735839844, -40.929603576660156, 208.75390625, 4.527046203613281, 166.65528869628906, 88.97744750976562, 158.85699462890625, -52.06818389892578, 30.80962371826172, 78.63249969482422, 34.76654815673828, 14.276473999023438, -47.46744155883789, -75.66999816894531, 55.71229553222656, 204.5872802734375, 35.9066162109375, 85.18870544433594, 63.13166809082031, 44.85401916503906, 115.52058410644531, 48.4641227722168, -87.21446228027344, 98.22698974609375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000585.npy"}
{"epoch": 0.8590308370044053, "step": 586, "batch_size": 64, "mean": 58.4819450378418, "std": 69.90046691894531, "min": -96.67218780517578, "p10": -13.707448577880857, "median": 50.990461349487305, "p90": 173.83836517333987, "max": 223.2901611328125, "pos_frac": 0.8125, "sample": [139.53050231933594, 0.47956085205078125, 31.405563354492188, 9.275558471679688, 44.10521697998047, 23.55051612854004, 45.344242095947266, 82.52719116210938, 37.07233810424805, -12.021797180175781, -39.402313232421875, 91.26541900634766, 223.2901611328125, 117.10530090332031, -9.750358581542969, 62.61363983154297, 61.226318359375, -54.02787399291992, 133.544189453125, 80.0168685913086, 28.86236572265625, 14.974418640136719, 51.417110443115234, 45.04851531982422, 195.025146484375, 7.6069793701171875, 123.36621856689453, 81.53128051757812, -60.80064392089844, 33.903533935546875, 50.563812255859375, 68.61836242675781, -4.8347320556640625, -76.08995056152344, 147.23435974121094, 52.269683837890625, 41.01299285888672, 80.13912963867188, 75.67127990722656, 182.30816650390625, 198.8830108642578, -10.609783172607422, 80.28626251220703, 176.03514099121094, -96.67218780517578, -14.42987060546875, 55.04503631591797, 47.84508514404297, 26.809066772460938, 101.69463348388672, 24.90587615966797, 168.71255493164062, -43.94786071777344, 44.31153869628906, 195.03485107421875, 84.26127624511719, 43.096885681152344, 177.8672637939453, -4.434566497802734, 5.589111328125, 64.91207122802734, 92.02088928222656, 65.26805877685547, 55.382080078125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000586.npy"}
{"epoch": 0.8604992657856094, "step": 587, "batch_size": 64, "mean": 62.21076202392578, "std": 77.88706970214844, "min": -112.38121032714844, "p10": -35.86400833129883, "median": 43.748857498168945, "p90": 166.79055328369142, "max": 281.06829833984375, "pos_frac": 0.8125, "sample": [84.22444915771484, 168.10003662109375, 13.482833862304688, -1.329519271850586, 47.7117919921875, 61.59803009033203, 26.153404235839844, 121.56581115722656, 30.81363296508789, 175.2466583251953, 12.860433578491211, 131.59402465820312, 281.06829833984375, 31.88399887084961, 163.73509216308594, 179.82595825195312, 1.1286354064941406, -56.29542541503906, 27.821395874023438, 112.65534973144531, -112.38121032714844, 2.313568115234375, 173.24307250976562, 12.076154708862305, 89.48506164550781, 160.92782592773438, -47.51476287841797, 185.14822387695312, 42.31019592285156, 60.82957458496094, 34.6690559387207, 135.083740234375, -73.13645935058594, 117.05023193359375, -13.955368041992188, 63.944129943847656, 9.567533493041992, -45.59019470214844, 37.53937530517578, 8.608776092529297, 18.666488647460938, 134.3279266357422, -14.679986953735352, 149.398681640625, 30.729001998901367, 153.37847900390625, 41.151405334472656, -14.704574584960938, 60.892852783203125, 142.70835876464844, 158.56768798828125, 93.59463500976562, 168.22940063476562, 33.000770568847656, 153.81854248046875, 102.727294921875, 11.804206848144531, -35.624168395996094, -53.28752136230469, 34.891597747802734, 78.60012817382812, -35.966796875, 45.18751907348633, 70.01345825195312], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000587.npy"}
{"epoch": 0.8619676945668135, "step": 588, "batch_size": 64, "mean": 59.2062873840332, "std": 85.31361389160156, "min": -154.1258544921875, "p10": -36.23185424804686, "median": 56.00674819946289, "p90": 174.57270355224608, "max": 235.50079345703125, "pos_frac": 0.734375, "sample": [103.30253601074219, -77.98959350585938, -15.930328369140625, 79.42465209960938, 161.49098205566406, 31.958343505859375, 174.33311462402344, 82.97526550292969, 208.3160400390625, 177.79234313964844, -13.100868225097656, 116.17390441894531, 12.103302001953125, 52.645057678222656, -110.97624969482422, 69.5701675415039, 67.61614990234375, 97.84677124023438, -9.172409057617188, 25.80443572998047, 86.26519775390625, 93.56019592285156, 79.76959228515625, 200.3540802001953, -154.1258544921875, 24.4295654296875, 98.69041442871094, -4.2994232177734375, -110.68235778808594, -1.6645355224609375, 39.79060745239258, 147.87586975097656, -42.07096862792969, 204.25294494628906, 74.7237777709961, -15.485694885253906, -22.607254028320312, 18.910964965820312, 42.3707389831543, 169.4700927734375, 138.58779907226562, 137.2926025390625, -55.70814514160156, 40.629302978515625, 151.5426025390625, 25.16703987121582, 50.07581329345703, 174.67538452148438, 63.890716552734375, 111.00360107421875, 14.386138916015625, 207.6375732421875, 235.50079345703125, -12.853469848632812, 1.8462905883789062, 7.046119689941406, 59.368438720703125, -1.0856437683105469, 15.935808181762695, 98.66605377197266, -70.62525939941406, -19.996047973632812, 125.15925598144531, 127.34795379638672], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000588.npy"}
{"epoch": 0.8634361233480177, "step": 589, "batch_size": 64, "mean": 78.61827850341797, "std": 69.85184478759766, "min": -86.2852783203125, "p10": -18.331515502929683, "median": 86.67219543457031, "p90": 166.97510223388673, "max": 227.27685546875, "pos_frac": 0.859375, "sample": [184.3336181640625, 167.6915740966797, 28.930816650390625, 12.534919738769531, 106.29878234863281, -1.6914329528808594, 43.02552795410156, 143.89913940429688, 39.41473388671875, -21.842514038085938, 148.9976348876953, 166.31497192382812, 50.634559631347656, 102.50379943847656, 32.36708068847656, 166.7031707763672, 227.27685546875, 64.07181549072266, -13.920135498046875, 85.3070068359375, 91.60025787353516, 147.10610961914062, -36.73973083496094, 114.29871368408203, 96.59355163574219, 147.7176513671875, 89.52752685546875, 173.32815551757812, 98.42523193359375, 27.680767059326172, 88.03738403320312, 115.75849151611328, 95.75199127197266, 51.97991180419922, 154.41624450683594, 78.51889038085938, 167.09164428710938, -86.2852783203125, 116.878173828125, 65.38948059082031, 162.74342346191406, 19.38127899169922, -32.50050354003906, 91.04801940917969, 178.22402954101562, 58.71852111816406, 70.31381225585938, -20.22210693359375, 178.72592163085938, 128.9062042236328, 159.68795776367188, 8.72275161743164, 46.69274139404297, 96.24456787109375, 1.1777572631835938, 13.894950866699219, 44.231842041015625, 107.88650512695312, 40.782981872558594, 42.61920166015625, 64.01194763183594, -74.5484619140625, -36.215782165527344, 151.1153106689453], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000589.npy"}
{"epoch": 0.8649045521292217, "step": 590, "batch_size": 64, "mean": 75.031005859375, "std": 71.89248657226562, "min": -94.53520965576172, "p10": 4.926534271240238, "median": 62.59465789794922, "p90": 175.08861083984377, "max": 251.45101928710938, "pos_frac": 0.90625, "sample": [14.193742752075195, 98.17121124267578, 51.53343200683594, 8.1942138671875, 51.07942199707031, -9.436882019042969, 35.76445007324219, 251.45101928710938, 222.81396484375, 16.04876708984375, 31.062938690185547, 66.28450775146484, 69.53321838378906, 250.54776000976562, 208.17221069335938, 53.34262466430664, 11.490758895874023, -13.790328979492188, 101.80915832519531, 32.767669677734375, 89.92534637451172, 175.92991638183594, 78.86983489990234, 86.93022155761719, 38.745574951171875, -94.53520965576172, -32.093589782714844, 91.774169921875, 53.59967041015625, 20.372787475585938, -45.76325988769531, 96.19064331054688, 80.70088195800781, 181.82730102539062, 19.790409088134766, 110.5416259765625, 56.005645751953125, -37.628822326660156, 55.601383209228516, 16.29547119140625, 83.00208282470703, 72.01704406738281, 58.904808044433594, 154.08444213867188, 38.240692138671875, 165.5087890625, 172.13046264648438, 93.77680969238281, 3.5261001586914062, 13.952552795410156, 29.17450714111328, 119.06403350830078, 29.557464599609375, 93.33972930908203, 54.91001892089844, 121.74351501464844, 38.81140899658203, 173.1255645751953, 21.318153381347656, 171.434814453125, 68.93968200683594, 102.83547973632812, 123.94290924072266, 204.52940368652344], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000590.npy"}
{"epoch": 0.8663729809104258, "step": 591, "batch_size": 64, "mean": 80.97084045410156, "std": 75.46971130371094, "min": -62.458396911621094, "p10": -13.357201766967773, "median": 82.71969985961914, "p90": 173.65335388183598, "max": 328.73095703125, "pos_frac": 0.84375, "sample": [118.8602066040039, 89.93949890136719, 101.98279571533203, 43.02781677246094, 78.51290130615234, -62.458396911621094, 138.9982147216797, 30.98131561279297, 4.678936004638672, 328.73095703125, 56.88014221191406, 126.97732543945312, 178.20510864257812, 101.22956848144531, 133.11965942382812, 230.34573364257812, 90.25519561767578, 161.3013916015625, 143.10150146484375, 114.51451110839844, 69.84358215332031, 67.3905258178711, 13.512374877929688, -3.2622108459472656, 17.11328125, 14.59194564819336, 59.34847640991211, 177.00704956054688, 85.65159606933594, 64.9144287109375, 6.884918212890625, 104.55107116699219, 152.2247772216797, 8.050516128540039, 186.53836059570312, 115.82954406738281, 86.61128234863281, 37.20277786254883, 6.2642059326171875, 23.544139862060547, -25.346633911132812, -12.17923355102539, 165.82806396484375, 20.593231201171875, -27.844276428222656, 82.06532287597656, 149.31259155273438, -22.353837966918945, 50.48878479003906, 127.3348388671875, 148.90182495117188, 81.38920593261719, 201.12887573242188, 143.505126953125, 111.80020141601562, -6.573280334472656, 83.37407684326172, -13.862045288085938, -25.119117736816406, -58.83027648925781, 155.06149291992188, 182.42019653320312, 78.8145751953125, 89.22736358642578], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000591.npy"}
{"epoch": 0.8678414096916299, "step": 592, "batch_size": 64, "mean": 62.02149200439453, "std": 82.17369079589844, "min": -115.92794799804688, "p10": -60.43837089538574, "median": 68.16657638549805, "p90": 171.5059036254883, "max": 276.3427734375, "pos_frac": 0.78125, "sample": [177.1005401611328, -66.6390380859375, -11.747695922851562, 36.41912078857422, -61.7203483581543, 15.661994934082031, 147.28756713867188, -20.734664916992188, 276.3427734375, 71.98684692382812, 31.309120178222656, 100.31073760986328, -57.44709014892578, 87.16409301757812, -91.1421890258789, 23.60943603515625, 49.43194580078125, 45.354766845703125, 73.196044921875, 158.74623107910156, 49.48012161254883, 64.82351684570312, 62.54725646972656, 141.90582275390625, -63.67955780029297, 46.347679138183594, 76.64173126220703, 106.24440002441406, 185.1505584716797, 108.02630615234375, 83.91519165039062, -24.52039337158203, 7.972190856933594, 136.56527709960938, 194.2002716064453, 121.08428192138672, -115.92794799804688, 45.51948547363281, 54.6424560546875, 167.3983917236328, 115.34947967529297, 119.80690002441406, 173.26626586914062, 154.034423828125, 26.7159423828125, 12.126319885253906, 79.17826843261719, 69.01818084716797, 70.85223388671875, 176.52955627441406, -22.620681762695312, -26.148208618164062, 207.0172576904297, 30.290878295898438, -31.41231918334961, 93.46907043457031, -68.17186737060547, 116.4034423828125, 88.30796813964844, 72.78072357177734, -106.8504638671875, 16.68408203125, 67.31497192382812, 102.60621643066406], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000592.npy"}
{"epoch": 0.869309838472834, "step": 593, "batch_size": 64, "mean": 55.27935791015625, "std": 82.42095947265625, "min": -105.59095001220703, "p10": -64.98533668518066, "median": 44.780378341674805, "p90": 157.2614501953125, "max": 265.89300537109375, "pos_frac": 0.78125, "sample": [18.056793212890625, -73.42800903320312, -84.06013488769531, 32.966712951660156, 26.691085815429688, 19.42486572265625, 63.82842254638672, -21.95458221435547, 66.20201873779297, 38.973175048828125, 85.12486267089844, 185.38674926757812, 42.41261291503906, -7.81866455078125, 14.406440734863281, 14.761539459228516, 265.89300537109375, 157.33914184570312, 136.0166778564453, 48.211692810058594, 118.45326232910156, 52.63261413574219, 45.96733474731445, 81.7573471069336, 74.810546875, 87.48883056640625, 20.448768615722656, 102.67512512207031, 10.487960815429688, 20.575942993164062, -36.01374053955078, 154.88087463378906, 72.11819458007812, -14.360191345214844, 40.883460998535156, 59.082557678222656, 70.73074340820312, 243.86192321777344, 37.80775451660156, 31.90642547607422, -105.59095001220703, 22.383041381835938, -74.47882080078125, 43.593421936035156, 157.32925415039062, -80.94834899902344, 24.343910217285156, 129.808349609375, 84.57261657714844, 152.80526733398438, 36.7081298828125, 134.02882385253906, 258.72198486328125, 118.21745300292969, -68.04129028320312, 70.48683166503906, 116.96096801757812, -10.91465950012207, -78.25001525878906, 163.1951904296875, 157.10324096679688, -44.13292694091797, -57.85477828979492, 83.20195007324219], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000593.npy"}
{"epoch": 0.8707782672540382, "step": 594, "batch_size": 64, "mean": 80.72880554199219, "std": 70.40156555175781, "min": -47.946144104003906, "p10": -2.460500335693359, "median": 71.36650466918945, "p90": 175.46569824218759, "max": 245.06150817871094, "pos_frac": 0.875, "sample": [142.91891479492188, 151.59259033203125, 18.59709930419922, -7.335908889770508, 13.099929809570312, 55.951133728027344, 221.51882934570312, 67.98719787597656, 105.90658569335938, 68.96791076660156, 31.501644134521484, 68.76264953613281, 145.18174743652344, 92.778564453125, 19.3494873046875, 135.94223022460938, 216.99032592773438, 157.3723602294922, 7.3535919189453125, 19.945697784423828, 151.0985107421875, 67.01683044433594, -2.062835693359375, 238.88348388671875, 52.08514404296875, 63.955474853515625, -23.985736846923828, 183.21998596191406, 89.5022201538086, 140.45474243164062, 41.53832244873047, 70.1524887084961, -47.946144104003906, 51.07402038574219, 80.11209106445312, 147.09066772460938, 23.273719787597656, 4.515110015869141, -21.42388153076172, -43.10650634765625, 28.783164978027344, 74.30763244628906, 51.152496337890625, -2.6309280395507812, 75.69903564453125, 216.67184448242188, 156.799072265625, 84.8707275390625, 36.10276794433594, 115.81715393066406, -24.460918426513672, 72.58052062988281, 128.18063354492188, 108.84709930419922, 93.1286392211914, 14.813232421875, 123.56642150878906, 101.12806701660156, 245.06150817871094, 32.010948181152344, 192.13861083984375, 104.13726043701172, 56.096527099609375, 82.01171875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000594.npy"}
{"epoch": 0.8722466960352423, "step": 595, "batch_size": 64, "mean": 67.65665435791016, "std": 80.97412872314453, "min": -77.03699493408203, "p10": -31.01928405761718, "median": 61.15827560424805, "p90": 165.74178771972657, "max": 257.1819763183594, "pos_frac": 0.734375, "sample": [89.52676391601562, 168.29214477539062, 141.48939514160156, 57.249263763427734, 19.089445114135742, 252.01284790039062, 1.8355560302734375, 72.73596954345703, -48.14076232910156, 175.4989776611328, 109.13945007324219, 155.92625427246094, 126.84138488769531, 32.3470458984375, -1.5891685485839844, 91.03047180175781, 113.76020812988281, 202.5728759765625, 213.05963134765625, -11.525157928466797, -51.663692474365234, -53.60837936401367, 78.98828125, 152.65530395507812, 11.349231719970703, 134.41207885742188, 42.92530059814453, 12.712078094482422, -56.92384338378906, 142.1673126220703, 7.361415863037109, 134.83058166503906, -24.25076675415039, -6.0839080810546875, 155.86907958984375, 60.46160888671875, -23.12335205078125, 50.78363037109375, 103.92063903808594, -4.389839172363281, -34.569374084472656, -33.110321044921875, 94.11592102050781, -77.03699493408203, 97.2457046508789, 127.19097900390625, 88.37203216552734, 12.333282470703125, -26.14019775390625, 133.97457885742188, -15.81231689453125, 45.765960693359375, -0.27571868896484375, 25.04827880859375, -20.05225372314453, 61.854942321777344, 74.8203125, 24.12017822265625, 159.79095458984375, 39.027099609375, 142.3754425048828, 136.59275817871094, 187.66754150390625, 257.1819763183594], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000595.npy"}
{"epoch": 0.8737151248164464, "step": 596, "batch_size": 64, "mean": 61.938507080078125, "std": 82.13892364501953, "min": -134.5306854248047, "p10": -43.94523010253906, "median": 58.976829528808594, "p90": 155.20941009521485, "max": 235.4217529296875, "pos_frac": 0.8125, "sample": [-72.33352661132812, 55.60243225097656, 173.75509643554688, -8.299896240234375, 97.13724517822266, 222.8404083251953, -85.66557312011719, -34.129547119140625, 56.507568359375, 50.52733612060547, 235.4217529296875, 39.269439697265625, -17.6126708984375, 151.72671508789062, 63.246978759765625, 223.33477783203125, 154.90769958496094, 2.4145679473876953, 108.23192596435547, -59.59974670410156, 57.441627502441406, 94.17929077148438, 34.83262634277344, 100.17808532714844, 59.85267639160156, 87.72699737548828, -13.156831741333008, 18.81533432006836, 113.10376739501953, 6.9276885986328125, 98.44509887695312, 140.89727783203125, 69.88673400878906, 155.33871459960938, 10.749710083007812, 11.576637268066406, 228.04562377929688, 147.0912628173828, 130.53709411621094, -113.61296844482422, -46.46076965332031, 7.686975479125977, -38.07563781738281, 122.29449462890625, 11.473106384277344, 77.32675170898438, 20.407058715820312, 68.4215087890625, 154.79718017578125, 160.339599609375, 30.35110855102539, 21.822383880615234, -66.01848602294922, 147.796630859375, 116.00013732910156, -134.5306854248047, 85.68792724609375, 49.621421813964844, 58.100982666015625, 105.59762573242188, 116.64469909667969, 4.830875396728516, 62.31462860107422, 31.495548248291016], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000596.npy"}
{"epoch": 0.8751835535976505, "step": 597, "batch_size": 64, "mean": 78.33621215820312, "std": 80.49236297607422, "min": -116.39402770996094, "p10": -28.703213500976556, "median": 72.10479354858398, "p90": 188.84696502685551, "max": 273.6109619140625, "pos_frac": 0.828125, "sample": [61.249053955078125, 28.16702651977539, 72.9655532836914, 149.50802612304688, 58.847198486328125, -46.16693115234375, 13.611259460449219, 213.01329040527344, 24.314495086669922, 141.2703857421875, 84.99191284179688, 43.27140808105469, -116.39402770996094, 78.6880111694336, 125.77377319335938, 19.365478515625, 42.62080383300781, 34.519981384277344, -31.213104248046875, 240.0672607421875, -51.952606201171875, 83.685546875, 93.9713363647461, 192.53311157226562, 222.72984313964844, 58.582359313964844, -22.8468017578125, 11.101860046386719, 58.90825653076172, -18.059906005859375, -39.03099822998047, 69.20297241210938, 202.43246459960938, 163.73541259765625, 62.15641784667969, 273.6109619140625, 200.34640502929688, -0.7438507080078125, 20.108684539794922, 113.38917541503906, -35.719627380371094, 95.54094696044922, -11.236297607421875, 78.4498519897461, 75.79470825195312, 119.48319244384766, 180.24595642089844, 105.84393310546875, 54.62194061279297, 124.06788635253906, 72.2414779663086, 29.093685150146484, -40.45806121826172, 71.96810913085938, 165.44284057617188, 21.319297790527344, 159.97071838378906, 40.92920684814453, 148.39218139648438, 138.61175537109375, 77.73239135742188, 173.20352172851562, 64.29557037353516, 171.35073852539062], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000597.npy"}
{"epoch": 0.8766519823788547, "step": 598, "batch_size": 64, "mean": 60.842613220214844, "std": 79.35845184326172, "min": -127.03347778320312, "p10": -53.27487487792968, "median": 70.50802230834961, "p90": 159.7171615600586, "max": 213.78770446777344, "pos_frac": 0.75, "sample": [101.99281311035156, -0.140411376953125, 85.3392333984375, 202.52447509765625, -54.12611389160156, 94.831787109375, 60.0965576171875, -3.027923583984375, 100.37084197998047, 121.71240234375, 83.83650207519531, -51.28865051269531, -85.21989440917969, -22.07440185546875, 119.85055541992188, 81.15733337402344, -28.035858154296875, 161.64236450195312, 207.6660919189453, 119.1759262084961, -66.18310546875, 209.66453552246094, 31.650985717773438, -7.9095611572265625, 70.32494354248047, 136.70474243164062, 155.2250213623047, 73.4939956665039, -127.03347778320312, -44.4654541015625, 33.24620819091797, 185.64907836914062, 174.75917053222656, 70.69110107421875, 10.470062255859375, 138.887939453125, 122.96808624267578, 37.235015869140625, -4.260223388671875, 51.75751495361328, -57.93096923828125, -48.046775817871094, 77.97274017333984, 42.860382080078125, 4.661106109619141, 129.77548217773438, 119.1438980102539, 40.08587646484375, 82.51869201660156, 113.31844329833984, 135.41429138183594, 49.21296691894531, 213.78770446777344, -68.40650177001953, 73.0377197265625, 35.617591857910156, 65.84996032714844, 76.15377807617188, 76.50833129882812, 60.2574462890625, 13.343093872070312, 61.268768310546875, 122.95357513427734, -74.59051513671875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000598.npy"}
{"epoch": 0.8781204111600588, "step": 599, "batch_size": 64, "mean": 58.25450134277344, "std": 79.3045883178711, "min": -125.439697265625, "p10": -43.113969421386706, "median": 52.15511703491211, "p90": 169.0711013793946, "max": 217.174072265625, "pos_frac": 0.71875, "sample": [106.5862045288086, 58.40780258178711, 118.55577087402344, -9.284225463867188, 201.548095703125, 146.3946533203125, 175.8810272216797, 3.2125167846679688, 24.09427261352539, -3.445281982421875, 74.53707885742188, 151.53089904785156, 41.00675582885742, -8.923545837402344, -62.31887435913086, 54.4031982421875, 19.423873901367188, 180.15206909179688, 49.90703582763672, 106.31612396240234, 108.44505310058594, 217.174072265625, 123.67826080322266, 71.47943115234375, 153.1812744140625, -3.9057884216308594, 117.03498077392578, 126.2106704711914, 42.44710159301758, -125.439697265625, 93.44650268554688, 43.314544677734375, -74.34793853759766, -49.00482177734375, 208.41226196289062, 178.39874267578125, -58.14262390136719, -7.175712585449219, 87.16361236572266, -30.706283569335938, 115.41739654541016, 59.10255432128906, 137.6920166015625, 1.4328460693359375, 2.6346511840820312, -15.220333099365234, 49.63334655761719, -70.31583404541016, 109.80345916748047, 99.03548431396484, 73.683349609375, 9.203449249267578, -48.431549072265625, -16.877975463867188, 209.93881225585938, 40.08756637573242, 92.9061050415039, 40.94654083251953, -8.553985595703125, 98.25621032714844, -26.62200927734375, -22.188358306884766, 16.36129379272461, 130.71383666992188], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000599.npy"}
{"epoch": 0.8795888399412628, "step": 600, "batch_size": 64, "mean": 71.27790832519531, "std": 77.35630798339844, "min": -93.35980224609375, "p10": -19.791983413696286, "median": 61.741294860839844, "p90": 171.7406967163086, "max": 263.6187438964844, "pos_frac": 0.828125, "sample": [93.03645324707031, 96.26058197021484, 91.29871368408203, 2.2897796630859375, 203.34494018554688, 146.66796875, 3.6336326599121094, -93.35980224609375, 29.16136360168457, 154.8140106201172, -70.76902770996094, 117.84423828125, 117.32684326171875, 102.56575012207031, 46.61477279663086, 148.20919799804688, -21.314224243164062, 200.27713012695312, 37.8651123046875, 39.166954040527344, 13.313125610351562, 82.16580963134766, 11.79513931274414, 36.64551544189453, 149.435546875, 130.11842346191406, 57.41976547241211, 82.83243560791016, 196.140380859375, 143.71737670898438, 56.038482666015625, -1.128143310546875, 172.0570526123047, 83.57276153564453, 172.12416076660156, 160.41578674316406, -63.055572509765625, 15.81744384765625, 20.38875961303711, 263.6187438964844, -17.798797607421875, 99.97555541992188, 141.28164672851562, 28.422889709472656, 106.90411376953125, 53.610992431640625, 171.00253295898438, 64.70006561279297, -57.72412109375, 53.49256896972656, -10.305397033691406, 201.63548278808594, 6.4599456787109375, 52.121192932128906, 58.78252410888672, 147.98841857910156, -20.64620590209961, 7.407684326171875, -43.57356643676758, 0.8980941772460938, 118.533447265625, 117.26516723632812, -11.962448120117188, 64.94680786132812], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000600.npy"}
{"epoch": 0.8810572687224669, "step": 601, "batch_size": 64, "mean": 71.59075927734375, "std": 74.00298309326172, "min": -60.611228942871094, "p10": -18.172698593139646, "median": 65.02828216552734, "p90": 168.87684326171876, "max": 297.0807189941406, "pos_frac": 0.84375, "sample": [-15.597053527832031, 43.89013671875, 297.0807189941406, 114.49600982666016, 8.22451400756836, 154.99224853515625, 42.45893859863281, 130.7657470703125, 95.84835815429688, 64.31028747558594, 5.68110466003418, 138.736083984375, 185.14285278320312, 44.07598114013672, 137.73248291015625, -10.613723754882812, -44.58036804199219, 120.42227172851562, 102.75110626220703, 182.2958526611328, 80.44482421875, 58.11021423339844, 92.64923095703125, 26.8040771484375, 65.74627685546875, 175.34793090820312, 3.611652374267578, 11.188106536865234, 144.4291534423828, 17.336666107177734, 26.069793701171875, 171.9239044189453, -59.55940246582031, 68.62126159667969, 127.73182678222656, 30.76507568359375, -4.973274230957031, 144.36024475097656, 42.913230895996094, 115.72931671142578, -47.5275993347168, 110.89131164550781, -19.276546478271484, -28.461746215820312, 2.6547088623046875, -60.611228942871094, 59.620941162109375, 121.07083129882812, 164.86453247070312, 72.72074890136719, 100.80267333984375, 82.36630249023438, 40.746551513671875, 123.97638702392578, -60.169612884521484, 227.9998779296875, 35.01324462890625, 15.391387939453125, 91.91960144042969, 50.83805847167969, 170.59640502929688, 23.01114845275879, 62.601036071777344, 103.406005859375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000601.npy"}
{"epoch": 0.882525697503671, "step": 602, "batch_size": 64, "mean": 86.27711486816406, "std": 85.09407806396484, "min": -103.36310577392578, "p10": -15.66305351257324, "median": 81.42399597167969, "p90": 190.67092132568362, "max": 339.91278076171875, "pos_frac": 0.828125, "sample": [56.03302001953125, 100.4185791015625, -20.574508666992188, 174.01060485839844, 41.21021270751953, 100.10244750976562, 161.33901977539062, 51.86016845703125, 44.34233093261719, -68.46884155273438, 43.15253448486328, -69.23680877685547, 71.74017333984375, 51.89723205566406, 154.04747009277344, 106.05046081542969, 61.2768440246582, 194.03916931152344, 195.17152404785156, -3.1135406494140625, 121.78240966796875, 91.8396987915039, 14.48272705078125, 33.95615768432617, -11.747978210449219, 59.70001220703125, 230.01144409179688, 112.31288146972656, 67.13583374023438, 35.14815902709961, 126.9599609375, -84.12959289550781, 50.33327865600586, 35.50109100341797, 246.11953735351562, -16.214584350585938, 27.660987854003906, 124.82627868652344, -103.36310577392578, 29.450437545776367, 84.52920532226562, 131.65191650390625, 88.75452423095703, -7.610748291015625, 143.9948272705078, 86.54943084716797, 339.91278076171875, 50.29551696777344, 199.6859588623047, -14.376148223876953, 92.45196533203125, 166.0169219970703, 155.96112060546875, 182.81167602539062, 135.77597045898438, 53.85511016845703, 69.87821960449219, 78.31878662109375, 159.84185791015625, 160.76870727539062, 164.71786499023438, 253.2812042236328, 128.1034698486328, -20.498443603515625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000602.npy"}
{"epoch": 0.8839941262848752, "step": 603, "batch_size": 64, "mean": 76.12080383300781, "std": 80.42584228515625, "min": -132.69606018066406, "p10": -10.690446090698241, "median": 75.91014099121094, "p90": 165.677848815918, "max": 371.69989013671875, "pos_frac": 0.84375, "sample": [111.98284912109375, 31.84900665283203, -16.958984375, 13.674007415771484, 109.39346313476562, 143.12799072265625, 37.792327880859375, 145.92327880859375, 129.59315490722656, -2.9988861083984375, 116.22956848144531, 12.450653076171875, 16.102401733398438, 76.8661880493164, 27.853330612182617, 66.27772521972656, 122.79083251953125, 155.54067993164062, 187.81399536132812, 55.533180236816406, 118.37306213378906, 58.60549545288086, 12.694000244140625, 59.24946212768555, 128.33595275878906, -29.107925415039062, 18.111923217773438, 99.76266479492188, 14.435718536376953, 7.779777526855469, 75.57743072509766, 12.651031494140625, 176.76593017578125, -132.69606018066406, 104.7130355834961, 133.77247619628906, 55.800453186035156, -9.63675308227539, 70.07706451416016, 134.41195678710938, 200.74371337890625, 104.83020782470703, 42.37794494628906, -39.59140396118164, 98.8280029296875, 61.934661865234375, 141.69149780273438, 108.82563781738281, -11.14202880859375, 2.778980255126953, -9.480525970458984, 160.62454223632812, 223.226318359375, 4.110595703125, 76.24285125732422, -44.69017028808594, 371.69989013671875, 106.79859924316406, 95.12620544433594, 92.94284057617188, 167.8435516357422, 217.813232421875, 92.9915771484375, -45.308837890625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000603.npy"}
{"epoch": 0.8854625550660793, "step": 604, "batch_size": 64, "mean": 74.73595428466797, "std": 69.04105377197266, "min": -103.17939758300781, "p10": -6.88024978637695, "median": 75.97947692871094, "p90": 158.0195922851563, "max": 216.96170043945312, "pos_frac": 0.84375, "sample": [-95.52169036865234, 100.4776611328125, -64.83319091796875, 35.09605407714844, 57.01165771484375, 190.22882080078125, 162.47482299804688, 71.753662109375, 108.68997955322266, 93.80287170410156, -103.17939758300781, 125.0423583984375, -8.217842102050781, 46.851654052734375, 115.85478210449219, 75.96571350097656, 101.9613265991211, 110.57261657714844, 0.23031234741210938, 147.16671752929688, 143.32940673828125, 12.71868896484375, 57.04014587402344, 168.8270263671875, -64.56781005859375, -43.83452606201172, 216.96170043945312, 100.00886535644531, 70.66793060302734, 79.5946044921875, -14.39139175415039, 17.975223541259766, 125.15775299072266, 34.0494384765625, 137.3365936279297, 163.0736083984375, 66.81161499023438, -0.7361831665039062, 32.89808654785156, 70.02726745605469, 131.83963012695312, 147.62405395507812, 61.582881927490234, -0.435760498046875, 125.11543273925781, 143.41607666015625, 50.758399963378906, 194.24908447265625, 175.31407165527344, 68.07356262207031, 46.280799865722656, 143.2510223388672, 141.52874755859375, 51.18206024169922, 105.92985534667969, 56.46290969848633, 23.609375, 77.29458618164062, 54.10797119140625, 88.26594543457031, 86.25562286376953, -3.7592010498046875, 75.99324035644531, 94.78365325927734], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000604.npy"}
{"epoch": 0.8869309838472834, "step": 605, "batch_size": 64, "mean": 62.80385971069336, "std": 72.851318359375, "min": -87.03396606445312, "p10": -31.580352020263668, "median": 56.99800109863281, "p90": 159.26956176757812, "max": 210.49105834960938, "pos_frac": 0.78125, "sample": [-19.82233428955078, 7.387092590332031, 160.32122802734375, 88.83026885986328, 26.667980194091797, 138.61151123046875, 83.41799926757812, 89.04576873779297, 112.12622833251953, 43.68780517578125, 156.815673828125, 133.2364501953125, 49.909446716308594, 111.79067993164062, 173.54823303222656, 20.579296112060547, 14.038738250732422, 134.71469116210938, 164.1860809326172, -47.66725158691406, 201.73416137695312, 91.80862426757812, 31.49740219116211, -26.30927276611328, -33.839385986328125, 107.66157531738281, 68.22854614257812, -15.124916076660156, -41.49742889404297, 49.109352111816406, 4.8227386474609375, -47.44615936279297, 53.89636993408203, 152.67709350585938, 120.5735092163086, -9.282669067382812, 209.33847045898438, 18.08371353149414, 105.24213409423828, 65.17138671875, 107.75455474853516, 210.49105834960938, -21.83218002319336, 53.491722106933594, 71.74163818359375, 83.04994201660156, -76.59895324707031, 33.89312744140625, -44.18400955200195, 75.1478271484375, 45.633018493652344, -87.03396606445312, 191.66302490234375, 60.099632263183594, -6.666828155517578, 64.82252502441406, 136.9194793701172, 42.920955657958984, 101.13375091552734, 51.375213623046875, 150.29449462890625, 39.19245147705078, 7.5682373046875, -19.200454711914062], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000605.npy"}
{"epoch": 0.8883994126284875, "step": 606, "batch_size": 64, "mean": 71.01239013671875, "std": 73.25605010986328, "min": -127.38089752197266, "p10": -14.247961044311523, "median": 68.08872604370117, "p90": 153.9578048706055, "max": 248.0955810546875, "pos_frac": 0.828125, "sample": [-58.812828063964844, 40.67466735839844, 171.59686279296875, -47.264896392822266, 116.23274993896484, -5.944007873535156, 36.45556640625, 22.300323486328125, 248.0955810546875, 120.97947692871094, 145.8642578125, 228.86561584472656, 191.721923828125, -37.095298767089844, -1.1039581298828125, -0.8499755859375, 64.38735961914062, 24.135433197021484, 42.76549530029297, 141.71726989746094, -13.940143585205078, -127.38089752197266, 156.71629333496094, 88.43061065673828, 121.33985900878906, 106.51472473144531, 121.65097045898438, 92.33390045166016, 37.34910202026367, 81.07571411132812, 50.57843017578125, 104.09862518310547, -64.66000366210938, 84.92784881591797, 17.129898071289062, 54.56346130371094, 66.65383911132812, 103.22310638427734, 44.290836334228516, 104.45166778564453, -101.19463348388672, 99.22500610351562, 49.25030517578125, 119.69640350341797, 189.94371032714844, 68.7535171508789, -14.3798828125, 177.32894897460938, 131.35305786132812, 44.731346130371094, 147.52133178710938, 49.40190887451172, 137.19960021972656, 82.37117004394531, 56.1197509765625, 59.873046875, 72.18315124511719, 32.84845733642578, 25.63641357421875, 73.6506576538086, 144.04339599609375, 110.57954406738281, 67.42393493652344, 47.16368103027344], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000606.npy"}
{"epoch": 0.8898678414096917, "step": 607, "batch_size": 64, "mean": 75.94203186035156, "std": 94.61343383789062, "min": -183.65957641601562, "p10": -32.74885940551758, "median": 85.9010009765625, "p90": 171.73819274902345, "max": 239.64776611328125, "pos_frac": 0.765625, "sample": [161.0904083251953, 170.27117919921875, 27.177963256835938, -13.43838882446289, 131.49542236328125, -31.741867065429688, 111.24203491210938, 58.99101257324219, 154.69354248046875, 172.36691284179688, 115.81465148925781, -17.773479461669922, 159.12112426757812, -28.836151123046875, 33.447418212890625, 236.1393585205078, 65.28994750976562, 145.70144653320312, 221.25057983398438, 45.39251708984375, 167.71983337402344, -18.427274703979492, 161.03585815429688, 79.76225280761719, -26.241344451904297, 47.24378967285156, -33.18042755126953, 127.44415283203125, 146.3323974609375, 57.18296432495117, 239.64776611328125, -26.601238250732422, 104.5624771118164, 54.902130126953125, 131.80487060546875, -15.367752075195312, 231.20510864257812, -36.42622375488281, 198.55712890625, 2.423858642578125, -117.46653747558594, 121.11348724365234, -109.73074340820312, 152.59837341308594, -92.50434112548828, 7.070352554321289, 44.191253662109375, 102.0002212524414, -183.65957641601562, 144.61212158203125, 140.54278564453125, 144.19325256347656, 142.82330322265625, 58.55782699584961, -66.8254623413086, 2.7568817138671875, 95.24427795410156, 237.2505340576172, 166.9923553466797, 22.330535888671875, 64.93521881103516, 42.010169982910156, 137.93792724609375, 92.03974914550781], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000607.npy"}
{"epoch": 0.8913362701908958, "step": 608, "batch_size": 64, "mean": 78.56434631347656, "std": 83.21826171875, "min": -209.05462646484375, "p10": -27.727102661132797, "median": 86.4079360961914, "p90": 173.55433197021483, "max": 245.1653289794922, "pos_frac": 0.84375, "sample": [118.03179931640625, 131.24522399902344, 42.191680908203125, 18.843536376953125, 51.99720764160156, 135.73097229003906, 32.56991958618164, 96.19007873535156, 126.03897857666016, 130.0506591796875, 27.486976623535156, 45.961334228515625, 17.663787841796875, 118.26229095458984, -78.70606231689453, 22.743000030517578, 79.12183380126953, 55.38249206542969, -0.7647705078125, 89.66249084472656, 189.75869750976562, 149.44677734375, 160.5843505859375, 245.1653289794922, -34.66539001464844, 95.16158294677734, -56.641998291015625, 70.53251647949219, 108.52365112304688, -5.35089111328125, 167.365966796875, 58.4697265625, 173.84400939941406, 71.61509704589844, 189.20144653320312, 186.43099975585938, 139.71409606933594, -123.92730712890625, 99.45767211914062, -40.252708435058594, 106.36537170410156, 62.431724548339844, 23.841583251953125, 6.4257049560546875, -11.537765502929688, 127.10635375976562, 16.673065185546875, 120.89982604980469, 103.88385772705078, 71.78882598876953, 226.98770141601562, -209.05462646484375, 65.52692413330078, 172.1219940185547, 140.48428344726562, 149.5320587158203, -41.187232971191406, 105.20164489746094, 179.8359375, 50.010223388671875, 135.96099853515625, 83.15338134765625, 64.65082550048828, 172.87841796875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000608.npy"}
{"epoch": 0.8928046989720999, "step": 609, "batch_size": 64, "mean": 79.48576354980469, "std": 83.70735931396484, "min": -127.26499938964844, "p10": -8.528715515136714, "median": 79.73587417602539, "p90": 179.95715637207033, "max": 374.2892150878906, "pos_frac": 0.828125, "sample": [51.41481399536133, 179.13079833984375, 12.32147216796875, 109.50065612792969, -36.650428771972656, 153.68409729003906, 110.03137969970703, 102.09684753417969, -0.4020881652832031, -3.6318206787109375, 197.35739135742188, 191.39617919921875, -10.292694091796875, 38.451175689697266, 4.5884246826171875, 158.78958129882812, -3.644500732421875, 93.055419921875, 58.06559753417969, 26.40521240234375, 88.39895629882812, 119.76605224609375, 180.31130981445312, 150.90020751953125, -33.39755630493164, 77.59736633300781, 203.7555389404297, -61.013160705566406, -58.06758499145508, 195.44102478027344, 65.16554260253906, 84.47308349609375, 146.80734252929688, 125.36595916748047, 131.5021209716797, 127.41215515136719, 130.60305786132812, 27.824390411376953, 58.11137008666992, 104.8042221069336, 29.908592224121094, 82.68527221679688, 81.87438201904297, 374.2892150878906, 34.672027587890625, 24.487747192382812, 129.71603393554688, 75.73507690429688, -4.4127655029296875, 117.27540588378906, 98.48118591308594, 40.33367919921875, 37.175506591796875, 20.181968688964844, 68.22406005859375, 175.4344482421875, 0.1365966796875, 243.6143798828125, 110.54682922363281, 166.14129638671875, 5.460121154785156, -127.26499938964844, 23.628570556640625, -18.664382934570312], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000609.npy"}
{"epoch": 0.8942731277533039, "step": 610, "batch_size": 64, "mean": 79.26726531982422, "std": 72.40300750732422, "min": -137.79754638671875, "p10": -11.338178634643553, "median": 86.74457550048828, "p90": 176.93203430175782, "max": 204.2815399169922, "pos_frac": 0.859375, "sample": [168.44625854492188, 40.60345458984375, 12.931747436523438, 111.30982208251953, 84.03323364257812, -17.584259033203125, 185.81761169433594, 130.99386596679688, 143.14366149902344, 94.20965576171875, -81.99678039550781, -8.428298950195312, 114.11798095703125, 68.32514953613281, 184.54974365234375, 132.76422119140625, 130.4372100830078, 47.264259338378906, 60.66047286987305, 94.23857116699219, 80.43031311035156, 157.91873168945312, 6.374774932861328, 139.5619659423828, 49.628387451171875, 49.818809509277344, -10.346420288085938, 104.94224548339844, 108.4609375, 200.26959228515625, 45.11138916015625, 20.70958709716797, 64.95713806152344, 63.063446044921875, 41.145729064941406, 195.02574157714844, -137.79754638671875, 114.41477966308594, 45.686370849609375, 89.45591735839844, 171.46014404296875, 39.99053955078125, -12.294317245483398, 191.88543701171875, 48.56410217285156, -11.76321792602539, 93.12083435058594, 50.99778366088867, -91.40110778808594, 120.22124481201172, 71.26968383789062, 179.27713012695312, 61.66120147705078, 131.92637634277344, 89.69483947753906, 125.87797546386719, 117.4050521850586, 204.2815399169922, 32.06169509887695, 108.3721923828125, -25.605239868164062, 5.2348480224609375, 99.74470520019531, 146.45237731933594], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000610.npy"}
{"epoch": 0.895741556534508, "step": 611, "batch_size": 64, "mean": 81.11207580566406, "std": 82.183837890625, "min": -47.55595397949219, "p10": -15.611544418334955, "median": 65.79672622680664, "p90": 204.87585449218753, "max": 298.25726318359375, "pos_frac": 0.828125, "sample": [13.142196655273438, 211.32843017578125, 94.3808822631836, 101.84378051757812, 180.9293212890625, 137.7586212158203, 19.36056900024414, 45.1270751953125, 98.99125671386719, 119.53532409667969, -7.616600036621094, 298.25726318359375, -44.45746612548828, -2.9503021240234375, -47.55595397949219, -18.21798324584961, 0.6451091766357422, 8.215282440185547, 148.7635040283203, 71.73301696777344, -36.94337463378906, 187.62619018554688, 206.47518920898438, -45.24742889404297, 241.1947021484375, 219.82327270507812, 53.704307556152344, 109.17001342773438, -9.529853820800781, 45.03099822998047, 123.01502990722656, 111.58189392089844, 27.91931915283203, 19.35675621032715, -1.489410400390625, 37.243072509765625, 31.557483673095703, 76.25371551513672, -23.366859436035156, 59.501930236816406, 117.63323974609375, 84.63877868652344, 131.57859802246094, 227.33876037597656, 44.72979736328125, 77.05555725097656, 117.574462890625, 107.86906433105469, 187.60174560546875, 19.853477478027344, 39.72120666503906, 48.84782409667969, 223.89669799804688, 50.865753173828125, 17.621543884277344, 59.860435485839844, 84.41238403320312, 199.19326782226562, 40.53605270385742, 201.14407348632812, 117.3106689453125, 6.144287109375, 146.89797973632812, -23.242950439453125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000611.npy"}
{"epoch": 0.8972099853157122, "step": 612, "batch_size": 64, "mean": 79.22400665283203, "std": 73.83154296875, "min": -87.79598999023438, "p10": -2.0506668090820295, "median": 74.48856353759766, "p90": 162.93798217773437, "max": 297.95513916015625, "pos_frac": 0.875, "sample": [297.95513916015625, 130.06898498535156, 67.85681915283203, 34.57427978515625, 91.42670440673828, 189.1929473876953, 87.55076599121094, 139.35336303710938, -22.951812744140625, 5.956567764282227, 86.84278869628906, -2.7308349609375, -5.098419189453125, 86.62008666992188, 209.7279815673828, 56.81269073486328, 70.5684814453125, 120.8946762084961, 114.76617431640625, 113.5787124633789, -87.79598999023438, 122.26617431640625, 22.65143585205078, 123.01091766357422, 164.54141235351562, 21.322601318359375, 83.38814544677734, 142.065673828125, 22.60553741455078, 66.90892028808594, 133.1587677001953, 80.42532348632812, 62.03599548339844, 94.36526489257812, 122.8971176147461, 55.44287109375, 163.15060424804688, 72.51925659179688, 22.15155029296875, 18.355567932128906, 26.99044418334961, 34.0202522277832, 54.545021057128906, 6.892219543457031, 145.56320190429688, 19.013469696044922, 117.71342468261719, 128.19911193847656, 270.5833435058594, -70.5040054321289, 47.30762481689453, 12.71975326538086, 62.67596435546875, -0.4636077880859375, 76.45787048339844, 105.34664916992188, -11.347541809082031, 42.93745422363281, 159.79632568359375, 47.59623718261719, 162.44186401367188, 196.85067749023438, 125.24064636230469, -68.67312622070312], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000612.npy"}
{"epoch": 0.8986784140969163, "step": 613, "batch_size": 64, "mean": 82.93061065673828, "std": 72.89334106445312, "min": -70.9023208618164, "p10": 8.593204498291017, "median": 74.41709518432617, "p90": 199.85712127685548, "max": 266.2840881347656, "pos_frac": 0.921875, "sample": [104.52400207519531, 67.07174682617188, 157.87802124023438, 109.16255950927734, 70.19515991210938, 208.83627319335938, 147.24913024902344, 83.032470703125, 106.035400390625, 74.49365997314453, 23.8857421875, -70.9023208618164, 250.01718139648438, 98.88693237304688, 124.57042694091797, 123.16560363769531, 75.31478881835938, 40.9796257019043, 117.27169036865234, 17.85498046875, 45.263526916503906, 60.491207122802734, 177.1953125, 49.9299430847168, 78.10404968261719, 24.331092834472656, 200.0084991455078, 186.9247283935547, 81.74961853027344, 91.83004760742188, 96.24488830566406, 125.1051025390625, -15.698066711425781, 9.868240356445312, 22.660736083984375, 209.18014526367188, 100.57099914550781, 127.29055786132812, 266.2840881347656, 45.812591552734375, 60.9046630859375, 10.948562622070312, 64.07269287109375, 8.046760559082031, -16.562591552734375, 199.50390625, 45.821929931640625, 206.3288116455078, 11.376976013183594, -46.987823486328125, 224.11207580566406, 27.18929672241211, 47.77764892578125, 3.7333297729492188, 47.03672790527344, 74.34053039550781, 88.29547119140625, 36.938209533691406, 85.59040832519531, 152.19436645507812, 19.589996337890625, 48.89845275878906, 10.661361694335938, -14.922859191894531], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000613.npy"}
{"epoch": 0.9001468428781204, "step": 614, "batch_size": 64, "mean": 68.47464752197266, "std": 79.6109619140625, "min": -115.48089599609375, "p10": -42.718675613403306, "median": 81.15713500976562, "p90": 160.36685791015626, "max": 265.732177734375, "pos_frac": 0.796875, "sample": [108.20575714111328, 133.46995544433594, -65.2684326171875, 86.64553833007812, 133.99151611328125, 58.730857849121094, -4.505210876464844, -47.696014404296875, 88.11643981933594, -25.719858169555664, 41.14091873168945, 26.53594207763672, 55.69140625, -8.6409912109375, 184.95236206054688, -48.98072814941406, 152.28347778320312, 94.69245910644531, 154.4608612060547, 0.7304229736328125, 142.38909912109375, 57.044586181640625, 82.79501342773438, 56.41719055175781, 144.611328125, 25.333621978759766, 26.798858642578125, 119.13462829589844, 23.88054656982422, -115.48089599609375, -49.083961486816406, 148.6260986328125, 161.8816680908203, 28.37624740600586, 11.994316101074219, -86.89716339111328, 161.2793731689453, 4.776721954345703, 132.87335205078125, 158.23765563964844, 177.19317626953125, -94.49748992919922, 121.48294067382812, 4.437076568603516, 265.732177734375, 175.91636657714844, 103.63081359863281, 92.90425872802734, 41.70122528076172, 2.909576416015625, 70.89205932617188, 102.50870513916016, 100.47731018066406, 63.1309814453125, 79.51925659179688, 104.1463623046875, 125.76213073730469, -13.467769622802734, 126.16014099121094, -31.10488510131836, 123.75151824951172, -22.554351806640625, 181.77076721191406, 126.15017700195312], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000614.npy"}
{"epoch": 0.9016152716593245, "step": 615, "batch_size": 64, "mean": 64.12843322753906, "std": 68.139892578125, "min": -119.06594848632812, "p10": -23.147696304321283, "median": 68.1121826171875, "p90": 150.9455947875977, "max": 195.7455291748047, "pos_frac": 0.859375, "sample": [-39.11016845703125, 195.7455291748047, 98.98717498779297, 63.991981506347656, 10.05151653289795, 17.84759521484375, 24.975364685058594, 4.113014221191406, 24.989501953125, 49.89949035644531, 89.25294494628906, 110.52030181884766, 125.18790435791016, 118.655029296875, 99.72042846679688, 65.44317626953125, 82.5250015258789, 16.39582061767578, 0.005519866943359375, 24.44335174560547, 124.9765625, -119.06594848632812, 85.66744995117188, 5.1628875732421875, 19.520851135253906, 127.15432739257812, 123.12289428710938, 79.58969116210938, 155.98912048339844, 84.72108459472656, 35.48572540283203, 86.51414489746094, 1.93585205078125, 28.238372802734375, 109.60538482666016, 123.98566436767578, 139.1773681640625, 122.32217407226562, -16.281661987304688, 50.27332305908203, -49.68503189086914, 171.95913696289062, 169.91949462890625, 19.508922576904297, 172.90777587890625, 67.80889892578125, 88.02284240722656, 176.645751953125, 50.1441650390625, 156.49368286132812, 15.994544982910156, 19.272071838378906, 68.41546630859375, -100.65756225585938, 57.47846984863281, -26.090282440185547, -14.654396057128906, 85.43548583984375, -27.780181884765625, 125.50481414794922, -32.24430847167969, 122.8187255859375, 137.9723663330078, 97.29348754882812], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000615.npy"}
{"epoch": 0.9030837004405287, "step": 616, "batch_size": 64, "mean": 80.85198974609375, "std": 84.75638580322266, "min": -135.7255859375, "p10": -29.185228729248045, "median": 81.9344596862793, "p90": 179.2446517944336, "max": 339.0926513671875, "pos_frac": 0.828125, "sample": [179.9019775390625, 11.019134521484375, 176.8524932861328, 177.7108917236328, 58.714080810546875, 61.08290100097656, -42.89009094238281, -85.5469970703125, 144.25164794921875, 253.3900909423828, 196.88380432128906, 120.84579467773438, 126.054443359375, -29.102455139160156, 167.9423065185547, 105.60380554199219, 147.44601440429688, 186.7442626953125, 101.48030090332031, -0.12974929809570312, 151.83580017089844, 191.02548217773438, 93.4937973022461, 108.11365509033203, 123.67679595947266, -54.84967041015625, 46.675621032714844, 28.329238891601562, 128.373779296875, 35.032691955566406, 339.0926513671875, 97.23219299316406, -135.7255859375, 21.453079223632812, -36.534149169921875, -13.043312072753906, 47.942970275878906, 61.91834259033203, 98.98143005371094, 83.83025360107422, 48.89732360839844, 98.73026275634766, -29.220703125, 39.60752868652344, -33.43315124511719, 115.10015869140625, 87.32376098632812, 50.50624084472656, 78.62644958496094, 61.47093200683594, 17.7803955078125, 94.091064453125, 96.85397338867188, -11.488914489746094, 168.08444213867188, 47.39836883544922, 28.748275756835938, 106.80073547363281, 59.69554901123047, 80.03866577148438, 175.77606201171875, 53.32792663574219, 252.94320678710938, 11.759262084960938], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000616.npy"}
{"epoch": 0.9045521292217328, "step": 617, "batch_size": 64, "mean": 65.0345230102539, "std": 77.20874786376953, "min": -179.455322265625, "p10": -23.274589538574215, "median": 61.71889877319336, "p90": 152.1254104614258, "max": 236.9046173095703, "pos_frac": 0.78125, "sample": [36.846214294433594, 92.14698028564453, 56.289306640625, 150.32513427734375, -73.77250671386719, 124.29435729980469, 145.67771911621094, 64.50792694091797, 99.20606994628906, 54.437965393066406, -1.77435302734375, 121.8520278930664, 147.95057678222656, 100.55699920654297, 112.01785278320312, -15.647918701171875, 19.579008102416992, 106.90837860107422, 50.866939544677734, 235.9625244140625, 51.06067657470703, 93.1575927734375, -179.455322265625, 105.35861206054688, 25.757633209228516, 115.37576293945312, 6.095211029052734, 14.314956665039062, 8.453681945800781, 58.92987060546875, 29.525997161865234, 203.5223388671875, -18.731842041015625, 94.39209747314453, 236.9046173095703, 86.16118621826172, 144.61537170410156, 108.29306030273438, 110.94967651367188, 141.5030517578125, -46.75624084472656, -3.609556198120117, 171.7977294921875, 152.89695739746094, 155.96090698242188, 47.646514892578125, 41.14305114746094, -25.221481323242188, 24.769248962402344, 94.70911407470703, -44.21449279785156, -3.8129806518554688, -73.04595184326172, -4.623081207275391, 55.214698791503906, 32.90058898925781, 76.81929016113281, 115.45779418945312, -51.792015075683594, 177.72215270996094, 106.95162963867188, 64.94224548339844, 33.82728576660156, -1.8891983032226562], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000617.npy"}
{"epoch": 0.9060205580029369, "step": 618, "batch_size": 64, "mean": 68.39972686767578, "std": 80.7316665649414, "min": -182.37535095214844, "p10": -30.14806747436523, "median": 75.98752975463867, "p90": 161.66343078613284, "max": 206.45956420898438, "pos_frac": 0.796875, "sample": [-89.68577575683594, 15.934345245361328, 55.57740020751953, 43.84053039550781, 159.37896728515625, -0.2632293701171875, 113.02216339111328, 79.69941711425781, 157.1461639404297, 115.5851821899414, 12.650436401367188, -88.99681854248047, 162.64248657226562, 15.190305709838867, 108.84382629394531, 60.496917724609375, -182.37535095214844, 202.3658447265625, 95.98709106445312, 83.94171142578125, -93.19338989257812, 135.56475830078125, 97.50944519042969, 11.516555786132812, -64.017578125, 64.7132797241211, 43.713966369628906, -63.14497375488281, 113.87713623046875, 83.9178695678711, 177.47718811035156, 39.23340606689453, 139.735595703125, 189.8975067138672, 40.822235107421875, 63.46311950683594, -25.611541748046875, 73.02580261230469, -9.355823516845703, 152.1659393310547, 112.64924621582031, -32.09229278564453, -20.0687255859375, 134.4517822265625, 140.64346313476562, -18.17562484741211, 125.4251937866211, 80.74552917480469, 78.94925689697266, 143.4647674560547, 179.3457794189453, 68.69091033935547, 206.45956420898438, 103.40174102783203, 183.26296997070312, 63.06398391723633, 66.88360595703125, -13.384841918945312, 135.81959533691406, 12.505218505859375, 43.12876892089844, 58.904319763183594, 93.08985900878906, 138.1262969970703], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000618.npy"}
{"epoch": 0.9074889867841409, "step": 619, "batch_size": 64, "mean": 65.72555541992188, "std": 84.322509765625, "min": -96.75715637207031, "p10": -33.427135086059565, "median": 54.766990661621094, "p90": 170.9777572631836, "max": 323.2409973144531, "pos_frac": 0.765625, "sample": [-51.62506103515625, 159.735107421875, 103.64859008789062, 180.35400390625, 66.69001770019531, -39.27134704589844, -4.3012847900390625, 20.707435607910156, 148.54541015625, 42.270965576171875, 180.6261444091797, 206.35537719726562, 9.553291320800781, 160.90264892578125, 28.583892822265625, 48.333580017089844, 122.98904418945312, 22.18044662475586, 151.98052978515625, 76.9115219116211, -72.8900375366211, 81.93923950195312, 91.54612731933594, 130.9551544189453, -92.98207092285156, -29.806793212890625, 14.98431396484375, 125.40943908691406, 11.574623107910156, -29.527099609375, 5.0597076416015625, 155.7715301513672, 116.19158935546875, 115.317138671875, 173.1401824951172, 13.2254638671875, -14.462158203125, -34.97871017456055, 53.90190887451172, 62.730377197265625, 55.63207244873047, 157.66921997070312, 36.615936279296875, 93.22700500488281, 183.62432861328125, 36.649940490722656, -96.75715637207031, 183.6544189453125, 97.4993896484375, 49.727378845214844, -23.296642303466797, -10.653213500976562, 113.28376007080078, 323.2409973144531, 16.00952911376953, -13.055595397949219, 152.09808349609375, 14.382858276367188, 165.93209838867188, 139.1409454345703, 22.643600463867188, -68.25111389160156, -7.454076766967773, 72.60148620605469], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000619.npy"}
{"epoch": 0.908957415565345, "step": 620, "batch_size": 64, "mean": 72.52044677734375, "std": 87.43000793457031, "min": -98.63429260253906, "p10": -42.29463233947754, "median": 63.69932174682617, "p90": 183.88930969238282, "max": 227.14666748046875, "pos_frac": 0.796875, "sample": [30.716049194335938, -0.027935028076171875, 159.170654296875, 63.68407440185547, 63.607791900634766, 186.0107879638672, 131.16207885742188, -58.2166633605957, 93.34843444824219, 196.58416748046875, 3.2487716674804688, -41.466087341308594, -41.49177551269531, 166.4502716064453, -78.80610656738281, 33.83729553222656, 143.67587280273438, 91.31724548339844, 169.05348205566406, 165.232666015625, 80.23501586914062, 165.17491149902344, -63.882667541503906, 63.714569091796875, -98.63429260253906, -12.534439086914062, 56.850059509277344, 116.94224548339844, 158.83688354492188, 44.47303771972656, 1.4137725830078125, 209.82443237304688, 159.572265625, 196.33663940429688, 23.177337646484375, 9.992652893066406, 199.70755004882812, 68.10118103027344, 227.14666748046875, 172.3164520263672, 21.243213653564453, 116.51494598388672, -80.90496826171875, 10.933258056640625, 55.683162689208984, 25.832839965820312, 67.00518798828125, 57.243011474609375, 173.446533203125, 26.58126449584961, -39.21289825439453, 92.13815307617188, 191.7261962890625, 34.72737121582031, 143.25628662109375, 178.84658813476562, 47.0723876953125, 17.208099365234375, 120.0178451538086, -42.63871383666992, -28.408126831054688, 178.93919372558594, -90.66825866699219, 108.87055969238281], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000620.npy"}
{"epoch": 0.9104258443465492, "step": 621, "batch_size": 64, "mean": 70.27882385253906, "std": 90.24836730957031, "min": -169.71383666992188, "p10": -31.634332275390623, "median": 64.64590072631836, "p90": 183.88993225097659, "max": 309.27008056640625, "pos_frac": 0.75, "sample": [44.895790100097656, -12.99972915649414, -169.71383666992188, -6.234519958496094, 160.65090942382812, -58.65196228027344, 136.66110229492188, 32.5806884765625, 27.58749771118164, 17.610511779785156, 127.89045715332031, 80.74088287353516, 58.73773956298828, 120.70455932617188, -11.817657470703125, 61.223106384277344, -7.467960357666016, -9.222270965576172, -118.7430419921875, -29.609420776367188, 143.54217529296875, 52.28877258300781, -25.8753662109375, 134.6988525390625, 226.20135498046875, 207.99830627441406, 185.50967407226562, -22.321884155273438, 82.45829772949219, 102.37380981445312, -66.8211898803711, 180.11053466796875, 115.68646240234375, 107.93673706054688, 223.32827758789062, 57.19084548950195, 47.788185119628906, 124.79978942871094, 64.10264587402344, 6.4858245849609375, -38.42494201660156, 73.40618133544922, 0.2186737060546875, 64.945556640625, 56.778568267822266, 85.7857666015625, 104.8756103515625, 67.28941345214844, 31.96990966796875, 178.81802368164062, -17.707088470458984, 36.37323760986328, 95.62905883789062, 250.81427001953125, -59.92717742919922, -32.50215148925781, 144.52830505371094, 191.3685302734375, 131.69033813476562, 103.30765533447266, 309.27008056640625, 64.34624481201172, 164.3001708984375, 98.38511657714844], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000621.npy"}
{"epoch": 0.9118942731277533, "step": 622, "batch_size": 64, "mean": 73.6937255859375, "std": 74.56144714355469, "min": -69.23141479492188, "p10": -15.119918823242186, "median": 65.7932357788086, "p90": 166.96679077148437, "max": 230.58595275878906, "pos_frac": 0.78125, "sample": [145.15345764160156, 25.894485473632812, -9.252586364746094, 144.14315795898438, 33.26500701904297, -17.48236083984375, 151.95233154296875, 117.31676483154297, 230.58595275878906, 138.93373107910156, 32.83714294433594, 54.43024444580078, -5.135303497314453, -44.91069793701172, 34.69118118286133, 229.67568969726562, 82.42079162597656, 59.53204345703125, 45.8404541015625, -8.437057495117188, 31.239734649658203, -8.35791015625, 99.313232421875, 116.83914184570312, 140.16371154785156, 98.595703125, 94.21502685546875, 13.815093994140625, 206.48709106445312, -22.5776309967041, 83.8051986694336, 140.69696044921875, 156.10333251953125, 4.822811126708984, 95.97927856445312, -48.164794921875, 91.04655456542969, -2.9101409912109375, 178.17840576171875, 166.1517333984375, 167.31610107421875, 144.0614013671875, -13.899169921875, -35.41541290283203, 29.56322479248047, 13.517303466796875, -15.643096923828125, 4.4788055419921875, 86.4478759765625, 67.170654296875, 151.61502075195312, -0.29001617431640625, 102.79417419433594, 220.75975036621094, 55.7567138671875, 61.44273376464844, 125.37255859375, -69.23141479492188, 18.243167877197266, 106.07575225830078, 64.41581726074219, 37.389556884765625, 148.6290740966797, 168.9311981201172], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000622.npy"}
{"epoch": 0.9133627019089574, "step": 623, "batch_size": 64, "mean": 54.46601104736328, "std": 89.44459533691406, "min": -149.78492736816406, "p10": -47.61735916137695, "median": 45.02042007446289, "p90": 169.27205505371094, "max": 261.14739990234375, "pos_frac": 0.671875, "sample": [114.8999252319336, 57.327545166015625, -41.862083435058594, 214.24594116210938, 116.51329803466797, 110.24617767333984, 44.53554916381836, 136.2955322265625, 137.98751831054688, -97.7396240234375, 151.81138610839844, 45.50529098510742, -54.099971771240234, -4.1443939208984375, 153.50241088867188, 0.284912109375, 188.14593505859375, 12.48651123046875, -30.155067443847656, -21.354812622070312, -6.8434906005859375, 68.04345703125, -46.59797668457031, -22.60579490661621, -78.11378479003906, -41.87554168701172, 151.49609375, 70.79789733886719, 110.59390258789062, 6.184822082519531, -43.073455810546875, 64.18424987792969, 124.97956085205078, 28.99081802368164, -48.054237365722656, 38.572792053222656, 205.646240234375, 100.72148895263672, 35.739654541015625, 48.19114685058594, 175.44699096679688, -82.7609634399414, 109.95783233642578, 75.30513000488281, -26.671188354492188, 10.880517959594727, 222.47682189941406, -1.7183380126953125, -4.300642013549805, -149.78492736816406, 166.7069549560547, 170.3713836669922, 166.33580017089844, 77.47972106933594, -58.09617233276367, 261.14739990234375, -24.142478942871094, -32.08763885498047, 18.061264038085938, 74.25354766845703, 162.68316650390625, 34.159149169921875, 98.11346435546875, 40.59788513183594], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000623.npy"}
{"epoch": 0.9148311306901615, "step": 624, "batch_size": 64, "mean": 80.68818664550781, "std": 78.51807403564453, "min": -104.81259155273438, "p10": -10.518982696533197, "median": 76.6513557434082, "p90": 186.11929016113282, "max": 241.93429565429688, "pos_frac": 0.84375, "sample": [215.11607360839844, 14.994346618652344, 78.34487915039062, 72.60015869140625, -104.81259155273438, 117.87461853027344, 45.54119873046875, 59.29447937011719, 165.03201293945312, 108.29883575439453, 148.44065856933594, -54.49787902832031, 188.7889404296875, 99.33412170410156, 229.79800415039062, -3.88873291015625, 193.98062133789062, -68.72221374511719, 98.2706298828125, 83.62435913085938, 128.97750854492188, 86.11415100097656, 159.87486267089844, 31.733776092529297, -4.794242858886719, 13.744483947753906, 241.93429565429688, 84.22880554199219, -42.92948913574219, 28.945953369140625, 78.89309692382812, 142.65435791015625, 103.37199401855469, 179.89010620117188, 35.58879852294922, 74.17314147949219, 107.36485290527344, 22.15518569946289, 104.93388366699219, 136.64944458007812, -18.6767578125, -12.972442626953125, 7.92242431640625, 53.60936737060547, 179.7078094482422, 93.83084869384766, 51.978668212890625, 170.99227905273438, 143.28538513183594, 68.29696655273438, 74.95783233642578, -58.60894775390625, 67.81121826171875, 228.54673767089844, 12.32565689086914, 21.592391967773438, 46.22999572753906, 33.38802719116211, -4.528194427490234, 74.93312072753906, 226.9824676513672, 51.388648986816406, 94.97834777832031, 155.15478515625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000624.npy"}
{"epoch": 0.9162995594713657, "step": 625, "batch_size": 64, "mean": 61.96144485473633, "std": 72.5468978881836, "min": -87.98300170898438, "p10": -16.59994125366211, "median": 58.92151641845703, "p90": 166.09989929199224, "max": 241.55776977539062, "pos_frac": 0.78125, "sample": [35.57048034667969, 50.482818603515625, -71.01688385009766, 200.63189697265625, 16.08379364013672, 170.7313232421875, 123.63435363769531, 148.34507751464844, 94.08808135986328, 72.40644836425781, 39.03954315185547, -41.05438232421875, -12.980846405029297, 29.746009826660156, 200.77139282226562, 28.375091552734375, 19.86749839782715, 3.7036819458007812, 62.49586486816406, 58.995635986328125, 181.86390686035156, -29.22919464111328, 19.667491912841797, 62.98038101196289, -16.603775024414062, 97.02330017089844, 125.37074279785156, -3.7990798950195312, 9.719970703125, 62.71965026855469, 123.98535919189453, 64.66825866699219, 77.08621978759766, 33.94242858886719, 241.55776977539062, -5.917873382568359, -2.8725662231445312, 66.98275756835938, -87.98300170898438, 20.577655792236328, -57.31611633300781, 29.120391845703125, 125.13096618652344, 103.9749984741211, 44.912940979003906, 122.06941223144531, 17.113693237304688, -7.977317810058594, 155.29324340820312, 58.84739685058594, -16.811466217041016, -16.59099578857422, -9.275886535644531, 81.91118621826172, 65.39881134033203, 152.28872680664062, 77.37809753417969, 146.40553283691406, 89.10037231445312, 0.29239654541015625, 190.70790100097656, 89.46762084960938, 204.2217559814453, 48.211395263671875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000625.npy"}
{"epoch": 0.9177679882525698, "step": 626, "batch_size": 64, "mean": 60.04652404785156, "std": 89.08826446533203, "min": -116.1103286743164, "p10": -39.44444084167479, "median": 45.45368576049805, "p90": 180.11094360351564, "max": 301.08526611328125, "pos_frac": 0.75, "sample": [29.904129028320312, -13.246688842773438, 45.301963806152344, 98.32756042480469, -116.1103286743164, 208.31527709960938, 179.55697631835938, 21.208587646484375, 3.2518310546875, -78.46339416503906, 57.25828552246094, 84.64482116699219, 131.39126586914062, 72.62368774414062, -17.1214599609375, 150.530517578125, 48.58607482910156, 103.35792541503906, 36.689117431640625, -28.350154876708984, 33.16497802734375, 31.978805541992188, 99.41516876220703, 24.097484588623047, -72.24382019042969, 50.99165344238281, 50.46929931640625, 16.674423217773438, 37.547996520996094, 163.23541259765625, -20.77545928955078, 45.49883270263672, -114.89225769042969, -44.199134826660156, 99.1556396484375, 134.42189025878906, 28.49603271484375, 123.81001281738281, 47.26100540161133, -19.29559326171875, 35.88617706298828, -66.67164611816406, 156.73797607421875, 185.73626708984375, 301.08526611328125, 123.52510833740234, -4.716487884521484, 99.29035949707031, 29.164527893066406, 180.34835815429688, -27.791236877441406, 138.76918029785156, 10.458740234375, 270.1064758300781, 134.70956420898438, -106.75872039794922, 209.05262756347656, 45.408538818359375, 36.346649169921875, -4.526622772216797, 109.82748413085938, 47.523536682128906, 213.06326293945312, -6.066158294677734], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000626.npy"}
{"epoch": 0.9192364170337739, "step": 627, "batch_size": 64, "mean": 71.95941162109375, "std": 86.09464263916016, "min": -130.86911010742188, "p10": -32.615384292602535, "median": 68.08267974853516, "p90": 195.07635803222658, "max": 280.5596923828125, "pos_frac": 0.796875, "sample": [100.73314666748047, 76.88525390625, 115.05606079101562, 202.646728515625, 79.46620178222656, -11.631362915039062, 79.38431549072266, 94.58087158203125, 139.46112060546875, 26.787506103515625, 25.063095092773438, 1.7161712646484375, 11.444061279296875, 21.46500015258789, -1.5703048706054688, 66.0714111328125, -63.57786178588867, 148.33547973632812, 90.46298217773438, 120.96310424804688, -67.22333526611328, 69.50760650634766, 141.31008911132812, 7.975748062133789, 195.68292236328125, -28.857223510742188, 125.34752655029297, 125.80931091308594, -2.035968780517578, 139.57740783691406, 61.43003463745117, 38.22401809692383, 93.0790023803711, 102.36477661132812, 30.40473175048828, 23.58666229248047, 142.90121459960938, 27.473644256591797, 77.81982421875, -11.141658782958984, 33.56709289550781, 40.53993225097656, 253.1308135986328, -35.073509216308594, 142.24102783203125, -130.86911010742188, 207.24330139160156, 66.65775299072266, 57.97596740722656, 52.13092041015625, 270.3639831542969, 151.95050048828125, 210.32479858398438, -67.04164123535156, -34.22602462768555, 71.74761962890625, 142.21096801757812, 25.20207977294922, -12.292251586914062, 193.66104125976562, 280.5596923828125, -79.84565734863281, 45.618919372558594, 102.64479064941406], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000627.npy"}
{"epoch": 0.920704845814978, "step": 628, "batch_size": 64, "mean": 76.80429077148438, "std": 70.81092834472656, "min": -67.74557495117188, "p10": -13.670254516601556, "median": 73.83473205566406, "p90": 177.00446929931647, "max": 241.6798095703125, "pos_frac": 0.859375, "sample": [133.0963897705078, 5.124505996704102, 146.35174560546875, 109.23646545410156, 102.86199188232422, 73.07626342773438, 41.588993072509766, 29.450775146484375, -16.145858764648438, 14.323688507080078, 134.26480102539062, 34.91693115234375, -30.90576171875, 205.9862060546875, 53.875213623046875, 191.9716339111328, -27.127622604370117, 99.76038360595703, 241.6798095703125, 32.917144775390625, 10.49945068359375, 14.103408813476562, 140.27993774414062, -26.155288696289062, 46.63459777832031, 17.098098754882812, 67.86947631835938, 145.52565002441406, 108.64582824707031, 94.5814437866211, 29.348190307617188, 160.67601013183594, 212.8878173828125, 200.7302703857422, 144.83139038085938, 89.99286651611328, 107.59663391113281, 36.429786682128906, 103.2774658203125, 74.59320068359375, 15.584980010986328, 39.207366943359375, 32.25868225097656, 48.6805419921875, 111.94708251953125, 93.75502014160156, 138.753173828125, 102.31192016601562, -39.313636779785156, -7.8938446044921875, -48.50025939941406, 95.21363830566406, 195.35009765625, 184.00238037109375, -67.74557495117188, 44.39317321777344, 137.06427001953125, 23.244190216064453, 49.41947937011719, 117.42217254638672, 115.85731506347656, 122.93997192382812, 60.54236602783203, -4.769752502441406], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000628.npy"}
{"epoch": 0.922173274596182, "step": 629, "batch_size": 64, "mean": 66.93384552001953, "std": 78.53201293945312, "min": -116.95863342285156, "p10": -43.91746978759765, "median": 78.20393371582031, "p90": 166.6323699951172, "max": 234.53488159179688, "pos_frac": 0.828125, "sample": [109.86177062988281, 174.55838012695312, 10.161592483520508, 33.63135528564453, 116.09847259521484, 151.75332641601562, 61.41416931152344, -47.985260009765625, 155.9833984375, 81.11944580078125, 117.25215148925781, 120.51528930664062, 172.25997924804688, 73.1607666015625, 130.29815673828125, -100.73992919921875, 6.7713623046875, 109.61344909667969, 38.996864318847656, 158.15377807617188, -22.147430419921875, 167.2921142578125, 117.1158447265625, 93.03398132324219, -44.347694396972656, 130.86805725097656, 104.20013427734375, -42.913612365722656, 22.588790893554688, 13.572479248046875, 90.96971893310547, 96.88323974609375, 140.21957397460938, 74.05369567871094, 81.36280059814453, 28.41566276550293, -116.95863342285156, 148.4025421142578, 144.3869171142578, 18.412643432617188, 22.198200225830078, -7.3388671875, 39.00006866455078, 3.992584228515625, 93.71270751953125, 46.08381652832031, 167.29295349121094, 29.11304473876953, 35.98095703125, 3.9128971099853516, 110.03213500976562, 102.50765991210938, -56.996673583984375, 176.9803009033203, -32.20928192138672, 234.53488159179688, 75.28842163085938, 100.94691467285156, 32.41064453125, 25.915966033935547, 165.48062133789062, 167.1259765625, -106.41741180419922, -64.07160949707031], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000629.npy"}
{"epoch": 0.9236417033773862, "step": 630, "batch_size": 64, "mean": 52.37853240966797, "std": 76.4432144165039, "min": -111.39522552490234, "p10": -54.42351684570312, "median": 63.20903015136719, "p90": 143.95699768066407, "max": 232.92935180664062, "pos_frac": 0.765625, "sample": [158.16018676757812, -55.31500244140625, -33.62173843383789, 67.34413146972656, 94.97785949707031, 103.97737884521484, 92.01383972167969, -52.3433837890625, 76.69479370117188, 101.2485122680664, 20.187637329101562, 232.92935180664062, 73.66857147216797, 187.6295166015625, 121.95516204833984, 52.83563232421875, -23.266727447509766, 32.5889892578125, 83.0394287109375, -4.262325286865234, 75.33543395996094, 67.07516479492188, 4.92626953125, 89.44482421875, -66.96854400634766, 148.15133666992188, 114.16558837890625, 83.73861694335938, 20.263381958007812, 142.71334838867188, -63.64888000488281, -39.85575866699219, 59.3428955078125, -38.822418212890625, 107.889404296875, 17.762161254882812, -111.39522552490234, 69.24961853027344, 46.29692840576172, 2.4576873779296875, -90.9067611694336, -51.96061706542969, 57.596763610839844, 3.8109283447265625, 95.97897338867188, -93.24664306640625, 70.64643096923828, 185.8610076904297, -71.2499771118164, 25.14415740966797, 141.99708557128906, 137.36761474609375, 46.14203643798828, 67.16152954101562, 25.136474609375, -9.770622253417969, 188.88330078125, 21.40422821044922, 85.11068725585938, 49.310997009277344, 133.9017791748047, 28.523025512695312, 144.489990234375, 102.33015441894531], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000630.npy"}
{"epoch": 0.9251101321585903, "step": 631, "batch_size": 64, "mean": 70.1238021850586, "std": 68.4873046875, "min": -117.13912963867188, "p10": -11.485499000549314, "median": 73.3205337524414, "p90": 152.31283569335938, "max": 250.24667358398438, "pos_frac": 0.828125, "sample": [132.46331787109375, 99.87601470947266, -5.3779754638671875, 137.84030151367188, 64.84638977050781, 98.51412200927734, 93.12664794921875, 116.33280944824219, -117.13912963867188, 27.431114196777344, 250.24667358398438, 150.36598205566406, -14.276313781738281, 69.71331787109375, 61.365379333496094, 91.38236999511719, 93.37741088867188, 149.57052612304688, -1.3415184020996094, 11.92513656616211, 75.44313049316406, 51.90877151489258, 28.257339477539062, 107.38885498046875, 143.7879180908203, 89.83417510986328, 33.96037292480469, 32.533267974853516, 167.09902954101562, 71.19793701171875, 16.61286163330078, -3.6130828857421875, 194.5413818359375, 9.521011352539062, 61.436492919921875, 200.37503051757812, -12.040359497070312, 96.2415771484375, 97.64059448242188, 136.54299926757812, 19.648948669433594, -10.190824508666992, 126.02903747558594, 86.11140441894531, 153.14720153808594, -29.786582946777344, 160.44358825683594, 9.953903198242188, 109.53729248046875, 157.77926635742188, 87.72421264648438, 23.685550689697266, 5.989437103271484, 7.048103332519531, 51.647071838378906, -71.10060119628906, 79.24615478515625, 119.05690002441406, 59.95881652832031, -17.18221092224121, 64.44091796875, 111.15972900390625, 104.9466552734375, -30.28270721435547], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000631.npy"}
{"epoch": 0.9265785609397944, "step": 632, "batch_size": 64, "mean": 77.0180435180664, "std": 85.80314636230469, "min": -135.43873596191406, "p10": -24.199931716918943, "median": 74.07853698730469, "p90": 193.97193298339843, "max": 280.34002685546875, "pos_frac": 0.796875, "sample": [139.67974853515625, -19.972320556640625, 50.62383270263672, 111.87306213378906, 156.65069580078125, 70.29777526855469, 19.173492431640625, 197.47613525390625, 254.21405029296875, 194.311767578125, 152.1348114013672, 23.0948486328125, 68.873779296875, 50.71165466308594, -10.86981201171875, 12.883922576904297, 76.75593566894531, 194.9379119873047, -6.376855850219727, 61.586326599121094, 97.23020935058594, 72.22142028808594, 121.10055541992188, 118.77943420410156, -19.611236572265625, 222.1825408935547, 193.17898559570312, 67.23336791992188, 147.8458251953125, -43.700218200683594, -24.690948486328125, -81.04219055175781, 280.34002685546875, 153.5219268798828, -22.741966247558594, 161.05560302734375, 125.82406616210938, 129.32427978515625, 85.14187622070312, 7.596183776855469, 115.15259552001953, 5.208595275878906, 87.3948745727539, 83.63026428222656, 67.49179077148438, 110.13374328613281, 75.93565368652344, 79.14141845703125, 137.06529235839844, 253.89627075195312, 192.30377197265625, -24.4141845703125, 40.360504150390625, -74.78823852539062, 37.86428451538086, 45.018699645996094, 27.498641967773438, -46.52271270751953, -135.43873596191406, 58.3408203125, 85.53009033203125, 61.667503356933594, -23.700008392333984, 81.53369140625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000632.npy"}
{"epoch": 0.9280469897209985, "step": 633, "batch_size": 64, "mean": 50.707542419433594, "std": 95.78597259521484, "min": -112.93782043457031, "p10": -69.4041389465332, "median": 44.87798881530762, "p90": 168.238232421875, "max": 404.4156494140625, "pos_frac": 0.6875, "sample": [-54.449798583984375, 75.40384674072266, 56.298545837402344, -2.1069793701171875, -112.93782043457031, 6.006538391113281, 8.959159851074219, -64.62651824951172, 404.4156494140625, 198.55819702148438, 72.583251953125, -71.45169067382812, 201.39520263671875, 17.381309509277344, 94.741943359375, 49.226043701171875, 167.91070556640625, 147.41961669921875, -8.053512573242188, 102.8448257446289, -40.50910186767578, 177.75576782226562, 168.37860107421875, -25.92974853515625, 79.83625793457031, 157.45114135742188, -26.388896942138672, 41.784976959228516, -100.8585205078125, 79.32532501220703, -8.469619750976562, 12.029327392578125, 152.66329956054688, 115.75052642822266, -53.28007888793945, -86.71788024902344, -40.66161346435547, -83.98439025878906, -36.97901153564453, 150.4155731201172, 108.15777587890625, 30.647865295410156, 130.4310760498047, 126.26457214355469, 28.486351013183594, 57.22136688232422, -30.33447265625, 60.38407897949219, 191.005615234375, 14.813369750976562, -107.72391510009766, 132.6027374267578, 4.164785385131836, 181.87286376953125, 47.97100067138672, -85.85722351074219, 48.840309143066406, 38.69195556640625, 2.128345489501953, 131.0819091796875, -3.161712646484375, 98.0638427734375, 15.704544067382812, 102.69488525390625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000633.npy"}
{"epoch": 0.9295154185022027, "step": 634, "batch_size": 64, "mean": 61.819000244140625, "std": 66.95844268798828, "min": -80.78943634033203, "p10": -26.213160705566402, "median": 62.180023193359375, "p90": 144.14383544921876, "max": 260.7149963378906, "pos_frac": 0.78125, "sample": [-60.08287048339844, -30.598892211914062, 23.330753326416016, 61.311004638671875, 6.851558685302734, 41.159278869628906, 68.14990997314453, 119.24275207519531, 124.6376724243164, 79.15748596191406, 23.27533721923828, 80.00010681152344, 100.51594543457031, 25.967117309570312, 225.65093994140625, 99.55767822265625, 81.4369125366211, 115.97648620605469, 56.59336853027344, 12.477203369140625, 69.83673095703125, -5.199958801269531, 92.34342956542969, -2.341644287109375, 46.88188171386719, 79.18243408203125, 145.58103942871094, 105.82334899902344, 119.41265869140625, 50.57288360595703, 260.7149963378906, 4.792085647583008, 7.2336883544921875, -80.78943634033203, 54.020660400390625, 140.7903594970703, 147.41082763671875, -42.598236083984375, 95.48709106445312, -19.030548095703125, -11.835357666015625, 149.94577026367188, 70.7427749633789, 159.35252380371094, 101.07441711425781, 138.51881408691406, 82.78507995605469, -28.006912231445312, 80.08050537109375, 139.46145629882812, -47.76180648803711, 63.049041748046875, -22.027740478515625, 58.08201599121094, -31.101547241210938, 80.19168853759766, -5.833316802978516, 54.07073974609375, -0.9502716064453125, 34.07061767578125, 104.76202392578125, 146.71780395507812, 59.15668869018555, 57.136993408203125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000634.npy"}
{"epoch": 0.9309838472834068, "step": 635, "batch_size": 64, "mean": 49.26845932006836, "std": 77.72618865966797, "min": -150.44952392578125, "p10": -36.54271545410156, "median": 50.24028015136719, "p90": 144.52463989257816, "max": 287.5557556152344, "pos_frac": 0.734375, "sample": [82.30790710449219, -23.102920532226562, 154.92996215820312, -59.09980773925781, 130.1385498046875, 103.19620513916016, -51.079627990722656, 38.424285888671875, 46.52703857421875, 66.57205963134766, 77.87132263183594, 100.23451232910156, 76.7691650390625, 41.87725830078125, 89.48272705078125, 44.16386795043945, -12.5975341796875, 138.0816192626953, 60.83869934082031, 70.25411987304688, -9.651458740234375, -4.363670349121094, 34.34416198730469, 54.00633239746094, 48.269264221191406, -19.276268005371094, 24.180999755859375, 136.25328063964844, 179.3609619140625, 174.26576232910156, 1.5845985412597656, 53.411773681640625, -3.7645034790039062, 287.5557556152344, 128.3125457763672, 10.977706909179688, 11.022663116455078, 37.67032241821289, 29.35089874267578, 77.19611358642578, 60.696510314941406, 229.72027587890625, 147.2859344482422, 73.46044158935547, 82.08131408691406, -30.846923828125, 16.142261505126953, 44.33055877685547, -45.97333526611328, -35.71714782714844, 87.56062316894531, -28.383010864257812, -150.44952392578125, 52.21129608154297, 60.47716522216797, 65.73004150390625, 71.54032897949219, -92.64640808105469, -5.385627746582031, 194.84036254882812, -111.718994140625, 17.40087890625, 61.224334716796875, -36.89653015136719], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000635.npy"}
{"epoch": 0.9324522760646109, "step": 636, "batch_size": 64, "mean": 72.37382507324219, "std": 78.07594299316406, "min": -104.3466567993164, "p10": -9.162501525878906, "median": 62.6963005065918, "p90": 179.8440353393555, "max": 234.16030883789062, "pos_frac": 0.796875, "sample": [37.649803161621094, 163.90647888183594, -7.965576171875, 52.71095275878906, 117.31570434570312, -9.200454711914062, 183.14581298828125, -7.083963394165039, 11.442703247070312, -19.311439514160156, 25.493820190429688, 77.11885833740234, 12.734413146972656, 142.41058349609375, 146.05276489257812, 174.0135040283203, -8.907173156738281, 233.5802001953125, 58.14070129394531, -9.073944091796875, -75.668701171875, 172.94769287109375, 43.73937225341797, 22.233530044555664, 49.053733825683594, 43.28564453125, 117.73051452636719, 198.75543212890625, 77.21368408203125, 199.14796447753906, -2.4713821411132812, -41.8393669128418, 34.268211364746094, 151.2764129638672, 13.717018127441406, -2.1593780517578125, 151.7432861328125, 62.92992401123047, 87.17743682861328, 83.00286865234375, 75.8675537109375, 3.062763214111328, 125.82725524902344, 143.56964111328125, 71.45538330078125, 147.40530395507812, -43.401832580566406, 108.01619720458984, 48.0565185546875, -104.3466567993164, 62.462677001953125, -22.642608642578125, 67.54168701171875, 145.3827667236328, 182.34283447265625, 234.16030883789062, 225.21112060546875, 131.99488830566406, 98.098876953125, 15.607643127441406, 0.286651611328125, 53.79771423339844, 32.71122360229492, 69.19935607910156], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000636.npy"}
{"epoch": 0.933920704845815, "step": 637, "batch_size": 64, "mean": 77.41453552246094, "std": 81.78123474121094, "min": -114.9149169921875, "p10": -5.850955200195311, "median": 72.25156021118164, "p90": 188.1249496459961, "max": 239.002685546875, "pos_frac": 0.84375, "sample": [1.644775390625, 108.71903991699219, 163.3994140625, 165.28317260742188, -21.740806579589844, 199.64996337890625, -73.55998229980469, -114.9149169921875, 127.30113983154297, 188.8350830078125, 30.842147827148438, -0.7017917633056641, -4.3893890380859375, 44.787445068359375, 0.7845726013183594, 228.927978515625, 137.05201721191406, -75.63416290283203, 129.10333251953125, 186.4679718017578, 42.23731994628906, 60.30851745605469, 73.03093719482422, 145.64260864257812, 32.230140686035156, 117.80863952636719, 1.9321441650390625, 55.62876892089844, 2.1752471923828125, 78.15780639648438, -6.4773406982421875, 131.6182403564453, 89.11831665039062, 50.496185302734375, 222.9505157470703, 0.3982391357421875, 22.657073974609375, 40.278221130371094, 103.14102935791016, 35.844886779785156, 198.05047607421875, 183.19943237304688, 102.73578643798828, 154.76434326171875, -11.458572387695312, 136.92161560058594, 91.28782653808594, 4.249519348144531, 205.09674072265625, 51.619873046875, 157.60809326171875, -82.79443359375, 57.466400146484375, 176.6014404296875, 71.47218322753906, 25.38983154296875, 42.53579330444336, 239.002685546875, 108.08784484863281, 115.220703125, -2.6314773559570312, 100.23944091796875, 103.595703125, 5.234642028808594], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000637.npy"}
{"epoch": 0.9353891336270191, "step": 638, "batch_size": 64, "mean": 81.04854583740234, "std": 65.33538055419922, "min": -27.159759521484375, "p10": -0.5699098587036102, "median": 73.6426773071289, "p90": 172.20635986328125, "max": 237.59600830078125, "pos_frac": 0.890625, "sample": [177.769775390625, -1.8946533203125, 30.406494140625, 111.19219207763672, 118.84036254882812, 29.873611450195312, 75.60585021972656, 115.12705993652344, 134.7328338623047, 32.10050582885742, 128.67684936523438, 210.24908447265625, 194.98553466796875, 236.21804809570312, 5.746856689453125, 7.720417022705078, 5.565073013305664, 12.407569885253906, 47.06850051879883, 82.15995788574219, 96.4013671875, 71.67950439453125, 170.76536560058594, 57.93172836303711, 99.21345520019531, 199.80238342285156, 66.1825180053711, 59.83142852783203, 237.59600830078125, 121.5518569946289, 144.58651733398438, 88.01727294921875, 50.482147216796875, -13.111534118652344, 92.43505859375, 124.44851684570312, 48.75469970703125, 60.64438247680664, 28.973281860351562, 12.225326538085938, -18.379179000854492, 172.8239288330078, 90.07598876953125, 130.72222900390625, -5.744560241699219, 4.475555419921875, 122.64073944091797, 120.96112823486328, -13.057022094726562, 86.61299133300781, 60.54894256591797, 2.521158218383789, 65.8779296875, 126.0270004272461, 45.46112823486328, 86.61337280273438, 126.46794891357422, 148.8812713623047, 65.01216125488281, 134.52989196777344, -15.764274597167969, -27.159759521484375, 62.5594482421875, 41.43579864501953], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000638.npy"}
{"epoch": 0.9368575624082232, "step": 639, "batch_size": 64, "mean": 44.23047637939453, "std": 81.16999816894531, "min": -147.1557159423828, "p10": -72.1784454345703, "median": 41.48894691467285, "p90": 155.3227508544922, "max": 206.64788818359375, "pos_frac": 0.734375, "sample": [92.53614807128906, 39.064842224121094, 93.05232238769531, 153.6270751953125, 49.33766174316406, 19.17266082763672, 37.735992431640625, 156.04946899414062, 80.16876220703125, 37.86246109008789, 29.924118041992188, -25.167770385742188, 41.09373474121094, -90.15083312988281, 142.94775390625, 67.43675231933594, -31.774864196777344, -88.9476318359375, 41.884159088134766, 76.46678924560547, 172.26702880859375, -147.1557159423828, 33.70993423461914, 97.16520690917969, 172.6504669189453, -88.29777526855469, -68.61053466796875, 135.64993286132812, 80.677490234375, 87.39741516113281, 197.98422241210938, 199.17030334472656, 177.92811584472656, -33.52749252319336, 89.42318725585938, 80.02885437011719, 42.247703552246094, 61.425941467285156, -73.70755004882812, -90.41876983642578, 101.75931549072266, 80.6195068359375, 37.39600372314453, -6.23016357421875, 125.08076477050781, -102.69217681884766, 18.35748291015625, 40.997840881347656, 19.597213745117188, 33.721588134765625, 24.592470169067383, -35.88005065917969, 131.38084411621094, 56.32191467285156, -60.600189208984375, -52.26947784423828, -8.789031982421875, 20.730926513671875, -44.940773010253906, 206.64788818359375, 69.76731872558594, 45.63312530517578, 63.22845458984375, 17.9921875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000639.npy"}
{"epoch": 0.9383259911894273, "step": 640, "batch_size": 64, "mean": 75.16079711914062, "std": 77.56948852539062, "min": -165.97430419921875, "p10": -13.574027633666992, "median": 67.02810287475586, "p90": 195.9341033935547, "max": 228.12448120117188, "pos_frac": 0.875, "sample": [-13.673542022705078, 124.85223388671875, -33.30718994140625, 31.70447540283203, 6.628225326538086, 33.43739318847656, 94.40864562988281, 100.58038330078125, 132.61767578125, 52.18060302734375, 88.45195007324219, 196.133056640625, 95.1846923828125, 45.65842056274414, 89.19913482666016, 65.34595489501953, 79.75341796875, -165.97430419921875, 53.42200469970703, 37.32247543334961, 14.787109375, 105.09101867675781, 80.65167236328125, 63.91277313232422, 61.94873046875, 35.22297286987305, 109.48353576660156, 46.40840148925781, 197.1084747314453, 69.41667175292969, 175.60667419433594, 213.67108154296875, 38.71148681640625, 28.27157974243164, 228.12448120117188, 1.7807941436767578, 11.781547546386719, 94.60247802734375, 40.83601379394531, -38.632904052734375, 147.27313232421875, 195.46987915039062, 121.40191650390625, 139.747802734375, 154.91995239257812, 76.110595703125, 100.59757232666016, 15.71710205078125, -38.74345779418945, 227.1817169189453, 162.601318359375, 57.97132873535156, -34.25419616699219, 197.029296875, -100.83949279785156, 68.71025085449219, 49.99718475341797, 59.02726745605469, 116.47280883789062, 197.58766174316406, 60.46885681152344, 132.14260864257812, -13.341827392578125, 24.33127212524414], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000640.npy"}
{"epoch": 0.9397944199706314, "step": 641, "batch_size": 64, "mean": 75.97200012207031, "std": 85.58711242675781, "min": -94.07441711425781, "p10": -32.05515747070312, "median": 86.6772346496582, "p90": 170.51169738769534, "max": 377.44158935546875, "pos_frac": 0.78125, "sample": [-24.74517059326172, 144.30027770996094, -28.697608947753906, 206.35018920898438, -32.48480987548828, 6.964508056640625, 105.8330078125, 164.6886749267578, 23.317703247070312, -23.050262451171875, 4.165576934814453, 47.605567932128906, -29.68060302734375, 13.003303527832031, 377.44158935546875, 18.441139221191406, 133.47161865234375, 28.257537841796875, -10.269210815429688, 155.47084045410156, 71.06135559082031, 37.86712646484375, 115.66719818115234, 115.37176513671875, 110.81570434570312, 54.65730285644531, -48.23539352416992, 92.5779037475586, 159.09060668945312, 173.0072784423828, 104.28924560546875, 151.02719116210938, 78.3037109375, -34.18406677246094, -4.340576171875, 164.05490112304688, 27.093902587890625, 136.70559692382812, -67.13137817382812, -50.687477111816406, 46.02602005004883, -94.07441711425781, 122.56748962402344, 26.498130798339844, 111.22926330566406, -69.80474853515625, -31.052635192871094, 146.78860473632812, 94.61544799804688, 127.67634582519531, 122.94965362548828, 175.30517578125, 86.03741455078125, 114.86308288574219, 178.39820861816406, 26.886276245117188, 76.78348541259766, 112.62924194335938, 130.8772735595703, 70.84771728515625, 87.31705474853516, 146.06707763671875, 182.81997680664062, 202.5615234375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000641.npy"}
{"epoch": 0.9412628487518355, "step": 642, "batch_size": 64, "mean": 65.17139434814453, "std": 85.81626892089844, "min": -91.50529479980469, "p10": -56.95937919616699, "median": 66.30398178100586, "p90": 171.25851287841797, "max": 343.371826171875, "pos_frac": 0.78125, "sample": [108.24927520751953, -71.58812713623047, 64.28324890136719, 26.70543670654297, 175.4564208984375, -72.06815338134766, 78.75654602050781, -16.235122680664062, 54.16034698486328, 24.24303436279297, -17.086807250976562, 67.64678955078125, 83.78010559082031, -58.001747131347656, 139.68455505371094, 72.41064453125, 110.9344482421875, 103.27592468261719, -89.0994873046875, 110.12284851074219, -54.52718734741211, -22.693038940429688, -59.8216667175293, 1.3800392150878906, 38.9451904296875, 171.73558044433594, 95.94678497314453, 65.12171936035156, 40.95692443847656, 89.94363403320312, 67.48624420166016, 180.6313018798828, 127.29690551757812, 170.14535522460938, 115.38330841064453, 176.27481079101562, 163.52072143554688, 103.727294921875, 56.678955078125, 36.1005744934082, 137.78732299804688, 89.77592468261719, 116.13555908203125, -3.9011383056640625, 5.068363189697266, 53.61320495605469, 77.25531005859375, 190.39834594726562, 109.95912170410156, 32.47509765625, 343.371826171875, -3.1859283447265625, 56.025535583496094, 8.071685791015625, 17.44756317138672, -70.85529327392578, 16.223098754882812, 306.3631591796875, 70.44200134277344, -6.972938537597656, 15.270843505859375, 130.53201293945312, -91.50529479980469, 111.30997467041016], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000642.npy"}
{"epoch": 0.9427312775330396, "step": 643, "batch_size": 64, "mean": 64.00656127929688, "std": 72.64767456054688, "min": -115.7318344116211, "p10": -26.2063735961914, "median": 60.40215873718262, "p90": 161.76649017333986, "max": 210.59152221679688, "pos_frac": 0.828125, "sample": [178.73825073242188, 147.61968994140625, 145.3840789794922, 26.815597534179688, 99.55279541015625, 158.6634979248047, 162.90499877929688, 27.871292114257812, 14.859024047851562, 57.60371398925781, -56.086456298828125, -50.81249237060547, 16.887752532958984, 95.2748794555664, -16.8502197265625, 103.53834533691406, 91.53861999511719, 49.654815673828125, 166.31610107421875, 131.73696899414062, 166.26675415039062, 109.37198638916016, 85.7876968383789, 22.71346664428711, -5.034921646118164, 35.861087799072266, 85.602783203125, 111.94705200195312, 210.59152221679688, 4.253143310546875, -39.29627227783203, 96.80195617675781, -102.37870025634766, 21.279075622558594, 159.10997009277344, 55.40411376953125, 65.56962585449219, 93.22452545166016, 59.85447311401367, 94.76678466796875, 136.35531616210938, 152.98129272460938, 184.9246826171875, -5.26495361328125, 84.73635864257812, 32.46028137207031, 0.7812557220458984, 35.62144470214844, 184.53067016601562, 33.1542854309082, 35.543487548828125, 33.62379455566406, -57.142940521240234, 84.73184204101562, 111.77377319335938, 11.994171142578125, -28.7110595703125, 5.438232421875, 99.21429443359375, -20.362106323242188, 48.00208282470703, 60.94984436035156, -115.7318344116211, 103.90831756591797], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000643.npy"}
{"epoch": 0.9441997063142438, "step": 644, "batch_size": 64, "mean": 55.31208801269531, "std": 68.82864379882812, "min": -108.34590911865234, "p10": -36.30487098693847, "median": 58.31765174865723, "p90": 146.9589736938477, "max": 206.44781494140625, "pos_frac": 0.765625, "sample": [206.44781494140625, 49.02298355102539, 48.945098876953125, -90.48809814453125, -108.34590911865234, 149.9726104736328, -41.438865661621094, 82.11592864990234, 15.394538879394531, 52.71405029296875, 67.46804809570312, 72.11272430419922, 30.26305389404297, 130.8048553466797, 93.32247161865234, 125.62362670898438, 60.6282958984375, -49.15806579589844, 44.465789794921875, 84.34194946289062, 33.49945068359375, 83.641845703125, 85.99244689941406, -3.3877334594726562, 14.3974609375, 156.2809600830078, -9.22470474243164, 27.07436752319336, 23.657432556152344, 202.10589599609375, 77.91972351074219, 56.00700759887695, 68.90464782714844, 74.2021484375, 134.76011657714844, 71.15017700195312, 86.15695190429688, 139.92715454101562, 46.38241958618164, 117.04443359375, -9.045028686523438, 64.43659973144531, 182.74179077148438, 5.542201995849609, -48.65733337402344, 183.80772399902344, 24.50171661376953, -13.565338134765625, 31.082637786865234, 155.82907104492188, 36.085052490234375, 76.1219482421875, -24.325550079345703, -69.00627136230469, 61.40997314453125, 103.49756622314453, 44.81757354736328, 114.50981140136719, -11.015335083007812, -1.7284164428710938, 62.39976119995117, 118.33938598632812, -10.835824012756348, -47.67523193359375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000644.npy"}
{"epoch": 0.9456681350954479, "step": 645, "batch_size": 64, "mean": 41.705081939697266, "std": 83.37627410888672, "min": -115.03807067871094, "p10": -65.4136589050293, "median": 41.447975158691406, "p90": 146.41982421875, "max": 216.49502563476562, "pos_frac": 0.640625, "sample": [-115.03807067871094, 130.97189331054688, -37.21278381347656, -55.48411560058594, -41.62974548339844, -73.54988098144531, 58.702178955078125, 130.12460327148438, 199.29867553710938, 39.70549774169922, 13.74520492553711, -22.128890991210938, 90.42503356933594, -5.693508148193359, 125.26551818847656, 54.522369384765625, -67.82795715332031, 86.00717163085938, 57.99244689941406, 100.42919158935547, 108.44520568847656, -82.91217041015625, 1.4215850830078125, 8.220352172851562, 43.190452575683594, -59.780296325683594, -13.141128540039062, -33.230567932128906, -68.64891815185547, -49.215187072753906, 19.406902313232422, 25.077129364013672, 30.267044067382812, 136.30068969726562, 162.90325927734375, -33.997833251953125, 67.99020385742188, 162.3917694091797, -10.693244934082031, -104.49628448486328, 210.71243286132812, 114.93228149414062, -28.53887939453125, -43.10285186767578, 137.3153533935547, 181.08966064453125, 51.822784423828125, 87.0066146850586, 146.2805633544922, 35.801963806152344, -33.265289306640625, 5.8365020751953125, 146.47950744628906, -7.045726776123047, 75.81744384765625, 216.49502563476562, 85.93830871582031, 52.49879455566406, -105.840087890625, 100.27024841308594, 69.25566101074219, -0.29465675354003906, 128.23297119140625, 63.302886962890625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000645.npy"}
{"epoch": 0.947136563876652, "step": 646, "batch_size": 64, "mean": 71.85616302490234, "std": 67.20948791503906, "min": -66.08417510986328, "p10": -5.28554573059082, "median": 75.42411804199219, "p90": 163.78188934326172, "max": 205.576416015625, "pos_frac": 0.84375, "sample": [165.70289611816406, 135.253662109375, 152.8299102783203, 159.61766052246094, -4.498737335205078, 29.688507080078125, -26.64312744140625, 93.58697509765625, 77.21401977539062, 57.87342834472656, 15.485908508300781, 98.60201263427734, 80.53141021728516, 27.470474243164062, 95.51795959472656, 25.818599700927734, 31.399188995361328, 170.28384399414062, 91.1947021484375, 102.70610046386719, -20.903778076171875, 72.43744659423828, 65.50570678710938, 110.36279296875, 117.6436767578125, 21.135032653808594, 192.84844970703125, 29.268814086914062, 58.6203498840332, 140.529541015625, 20.170303344726562, 81.12771606445312, -53.31495666503906, 33.35053253173828, 126.89077758789062, 120.54299926757812, 26.50494384765625, 157.39093017578125, 0.483734130859375, -33.22313690185547, 32.550926208496094, -5.622749328613281, 85.17585754394531, -23.453025817871094, -66.08417510986328, 5.737140655517578, 31.43572998046875, 19.357799530029297, 158.25718688964844, 202.21664428710938, 165.56655883789062, 131.3142852783203, 172.93319702148438, 12.876434326171875, 10.315086364746094, 73.63421630859375, 104.37376403808594, 90.33387756347656, 102.18240356445312, -0.3594818115234375, -1.737579345703125, 205.576416015625, 100.95303344726562, 144.25347900390625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000646.npy"}
{"epoch": 0.9486049926578561, "step": 647, "batch_size": 64, "mean": 59.60916519165039, "std": 87.4765625, "min": -100.00120544433594, "p10": -49.49076614379882, "median": 56.69459533691406, "p90": 181.73297424316408, "max": 294.39337158203125, "pos_frac": 0.71875, "sample": [51.69105529785156, 160.57052612304688, 184.83636474609375, 63.38545227050781, 174.49172973632812, 156.97744750976562, 34.026084899902344, -76.35369873046875, 52.146514892578125, 38.55821228027344, 73.13723754882812, 31.374786376953125, -14.520599365234375, 109.85712432861328, 61.24267578125, 21.72496795654297, 34.43666076660156, 74.06657409667969, 228.24453735351562, -6.433803558349609, 86.6026611328125, -41.74909210205078, 134.69937133789062, -32.59911346435547, 93.61568450927734, 193.5582275390625, 31.988693237304688, 294.39337158203125, -80.982421875, 210.5073699951172, 112.87643432617188, 74.52853393554688, 86.43685150146484, 68.68061065673828, 98.84700775146484, -61.76018524169922, -51.3104248046875, 128.95094299316406, -10.552581787109375, -14.74444580078125, -23.677791595458984, 33.206153869628906, 122.08140563964844, 14.043426513671875, -97.734619140625, 24.001461029052734, 67.6655502319336, -12.466670989990234, -45.244895935058594, 20.896026611328125, 3.3091182708740234, 142.51393127441406, 218.38943481445312, 109.31878662109375, -21.93902587890625, 150.12799072265625, -100.00120544433594, 113.31018829345703, -28.13823699951172, 103.26724243164062, 16.4642333984375, 211.23797607421875, -58.44148254394531, 77.35014343261719], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000647.npy"}
{"epoch": 0.9500734214390602, "step": 648, "batch_size": 64, "mean": 85.91301727294922, "std": 82.87735748291016, "min": -143.3994903564453, "p10": -10.396869659423817, "median": 79.66897201538086, "p90": 189.6148635864258, "max": 341.47393798828125, "pos_frac": 0.890625, "sample": [8.433847427368164, 4.961524963378906, 37.88172149658203, 130.4642791748047, 81.98825073242188, 61.546958923339844, 111.27735900878906, 261.49432373046875, 31.76141357421875, 68.86664581298828, 27.93329620361328, 33.41650390625, 4.1724853515625, 122.56388854980469, 4.26739501953125, 129.430419921875, 208.04550170898438, 6.30377197265625, 129.7088623046875, -26.60809326171875, -15.633541107177734, 44.89427947998047, 1.3893814086914062, 67.34857177734375, 30.6517333984375, 172.51361083984375, 229.04498291015625, 192.6976776123047, 85.03669738769531, 164.2613067626953, 72.37214660644531, 70.20381164550781, -143.3994903564453, 341.47393798828125, 182.421630859375, 56.114768981933594, 29.660682678222656, 97.30690002441406, 41.93804168701172, -19.428863525390625, -35.948516845703125, 6.0838623046875, 200.1287841796875, 120.0179443359375, 57.21547317504883, 142.181396484375, 142.12509155273438, -15.4481201171875, 138.77645874023438, 75.12806701660156, 143.87985229492188, 111.056640625, 92.44476318359375, 80.99413299560547, 128.36447143554688, 21.2491397857666, 117.88203430175781, 155.96937561035156, 164.70999145507812, 206.14398193359375, 112.88107299804688, 160.72653198242188, -45.2515869140625, 78.34381103515625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000648.npy"}
{"epoch": 0.9515418502202643, "step": 649, "batch_size": 64, "mean": 69.18254852294922, "std": 66.83673095703125, "min": -45.90351104736328, "p10": -22.63207550048828, "median": 64.73954391479492, "p90": 156.85565032958985, "max": 215.6269989013672, "pos_frac": 0.828125, "sample": [92.1651611328125, 137.83023071289062, 113.55448913574219, 13.379264831542969, 89.80061340332031, 6.527750015258789, 32.7581787109375, 53.988037109375, 95.68853759765625, -33.11125183105469, 109.60404968261719, -30.2144775390625, -45.90351104736328, -12.058364868164062, 38.06733703613281, 0.38742828369140625, 29.384811401367188, 157.82957458496094, 10.01131820678711, 100.46871185302734, 116.70488739013672, 81.79953002929688, -20.84735107421875, 2.6531009674072266, 154.58316040039062, 48.48661422729492, 136.5115966796875, 59.12989807128906, -4.730659484863281, 153.96726989746094, -39.03578186035156, -23.396957397460938, 64.22073364257812, 91.57682800292969, 72.03489685058594, 64.2894058227539, 65.18968200683594, 100.35809326171875, 169.6951141357422, 84.82501983642578, 91.35743713378906, 43.899505615234375, 4.216608047485352, 167.10804748535156, 143.52816772460938, 127.79393768310547, 215.6269989013672, 20.482681274414062, 79.59771728515625, 120.44618225097656, 0.7425975799560547, 113.311767578125, -3.691009521484375, 149.59890747070312, 181.5087127685547, 206.7398681640625, 54.605224609375, -28.169658660888672, 52.41257095336914, 179.6592559814453, 42.57533264160156, 108.81404113769531, -31.338348388671875, 48.68376159667969], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000649.npy"}
{"epoch": 0.9530102790014684, "step": 650, "batch_size": 64, "mean": 68.0292739868164, "std": 80.41361236572266, "min": -175.21170043945312, "p10": -26.892950439453124, "median": 73.34943008422852, "p90": 179.09219360351562, "max": 241.5806884765625, "pos_frac": 0.828125, "sample": [73.50296020507812, 62.57916259765625, 79.65245819091797, 182.3662109375, 89.77359008789062, 80.72792053222656, 87.96247863769531, -27.848793029785156, 151.3800048828125, 154.37344360351562, 214.099853515625, 4.108879089355469, 84.78018188476562, 55.93397521972656, 123.49923706054688, 214.0282745361328, 17.945999145507812, 241.5806884765625, 52.96405029296875, 177.43736267089844, 213.1591796875, 73.1958999633789, 76.2469711303711, 109.4297103881836, 179.80140686035156, 80.48865509033203, -27.75799560546875, 12.941436767578125, 73.53456115722656, 74.08146667480469, 76.26383972167969, 159.5806121826172, 29.887168884277344, -175.21170043945312, 47.48236846923828, 68.59367370605469, -42.376922607421875, 206.5196990966797, -9.629499435424805, 46.547119140625, -24.87451171875, -29.499252319335938, 28.33110237121582, 56.12086486816406, 111.88533782958984, 17.6458740234375, 145.42530822753906, 18.108871459960938, 41.03265380859375, 73.99163818359375, 104.26335906982422, -7.398643493652344, 68.82365417480469, 30.237743377685547, -60.239501953125, 76.34609985351562, 123.38584899902344, -17.453048706054688, -130.41143798828125, 47.67082214355469, 84.8992919921875, 9.175849914550781, 172.8431396484375, 19.936866760253906], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000650.npy"}
{"epoch": 0.9544787077826725, "step": 651, "batch_size": 64, "mean": 82.91757202148438, "std": 72.1063461303711, "min": -70.32748413085938, "p10": -10.941246795654292, "median": 95.8122787475586, "p90": 167.41909637451172, "max": 265.5610046386719, "pos_frac": 0.84375, "sample": [163.58941650390625, 139.65988159179688, 77.15327453613281, 168.3739776611328, 92.32803344726562, 22.75957489013672, 188.74246215820312, 85.20854187011719, 96.94792175292969, 66.81077575683594, 150.7470245361328, 165.1910400390625, 137.20144653320312, -70.32748413085938, 55.91143798828125, 114.23809051513672, 35.46717834472656, -50.12952423095703, 126.09329223632812, 184.2141876220703, 112.05094146728516, 16.055152893066406, 126.02950286865234, -39.880859375, 16.4830322265625, 40.28314208984375, -14.70431900024414, -6.1967926025390625, 137.36599731445312, 125.12783813476562, 135.89321899414062, -24.871841430664062, 236.26504516601562, 33.5046272277832, 66.46987915039062, 96.13777160644531, 100.79048156738281, 128.68698120117188, 58.10986328125, -6.650238037109375, 25.41346549987793, 107.2582778930664, -32.983062744140625, 109.97047424316406, 6.8109588623046875, 197.47994995117188, 9.881553649902344, 265.5610046386719, -0.11730194091796875, 44.38824462890625, 32.317935943603516, 29.22180938720703, 177.3907928466797, 105.67103576660156, 95.48678588867188, 114.77180480957031, 109.6241455078125, -12.780250549316406, 138.7706298828125, 11.923141479492188, 129.56100463867188, 109.66400909423828, 90.24031066894531, 154.06826782226562], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000651.npy"}
{"epoch": 0.9559471365638766, "step": 652, "batch_size": 64, "mean": 70.08258819580078, "std": 64.61573791503906, "min": -53.882110595703125, "p10": -5.195122528076169, "median": 69.47147369384766, "p90": 158.79793395996094, "max": 249.08004760742188, "pos_frac": 0.875, "sample": [46.96751403808594, 42.504600524902344, 89.20541381835938, 35.53288269042969, 2.2440643310546875, 95.86898040771484, 95.42023468017578, -2.4703445434570312, 131.93408203125, 78.15316772460938, 98.42654418945312, 191.9355926513672, 201.9459228515625, 27.394287109375, 1.0117034912109375, 28.369461059570312, 41.56543731689453, 164.15301513671875, 163.71548461914062, -44.215797424316406, 77.77752685546875, 76.45008850097656, 21.409446716308594, -53.882110595703125, 72.56428527832031, 45.37574768066406, 59.593040466308594, 151.16119384765625, 133.08721923828125, 5.760284423828125, 67.66123962402344, 121.66580200195312, 71.28170776367188, 155.79693603515625, 160.08407592773438, 126.85745239257812, -12.471405029296875, 11.7464599609375, -12.66152572631836, 59.76708984375, 75.3894271850586, 94.83050537109375, -8.453948974609375, 27.925289154052734, 212.71092224121094, 108.22093200683594, 49.446136474609375, 91.05094146728516, 67.61377716064453, 76.0258560180664, 249.08004760742188, 1.8490066528320312, 148.85610961914062, 16.909507751464844, 91.68993377685547, 37.65321350097656, 3.993438720703125, -33.68388366699219, 24.629302978515625, 46.298309326171875, 89.9060287475586, 116.01226806640625, 75.00871276855469, -6.362884521484375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000652.npy"}
{"epoch": 0.9574155653450808, "step": 653, "batch_size": 64, "mean": 67.90987396240234, "std": 73.79393768310547, "min": -132.31112670898438, "p10": -33.093054962158206, "median": 74.38645935058594, "p90": 164.26560516357424, "max": 216.0851287841797, "pos_frac": 0.84375, "sample": [60.5277099609375, 74.29257202148438, 38.9954948425293, 137.1265411376953, 6.510536193847656, 62.459014892578125, -33.28581237792969, -33.0733642578125, 31.532604217529297, 216.0851287841797, 87.26164245605469, 56.97742462158203, -20.076080322265625, 110.65359497070312, 185.20037841796875, 155.32015991210938, 127.57594299316406, 69.14385986328125, 168.71466064453125, -132.31112670898438, -20.491750717163086, 143.95111083984375, 74.4803466796875, 30.52161407470703, 126.1251220703125, 73.38648986816406, 9.485977172851562, 7.647834777832031, 95.23336029052734, 89.1285400390625, 158.16024780273438, 77.05585479736328, -33.10149383544922, 55.79277801513672, 110.99209594726562, 175.4788818359375, -63.15616226196289, 91.96263122558594, -42.980796813964844, 166.88218688964844, 111.42431640625, 94.58895874023438, 0.318603515625, 42.58390808105469, 196.1757049560547, 93.2059555053711, 23.45062255859375, 12.092933654785156, 121.39799499511719, 13.426752090454102, 95.46144104003906, 9.348468780517578, 111.38883972167969, 204.12596130371094, 97.57365417480469, 47.69751739501953, 105.25504302978516, -88.95355224609375, 113.95984649658203, 43.42613983154297, 101.15422058105469, 62.05903625488281, -67.56440734863281, 106.44810485839844], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000653.npy"}
{"epoch": 0.9588839941262849, "step": 654, "batch_size": 64, "mean": 57.98382568359375, "std": 89.82442474365234, "min": -150.3082275390625, "p10": -52.87064666748046, "median": 54.13536262512207, "p90": 180.6074279785157, "max": 320.18084716796875, "pos_frac": 0.796875, "sample": [59.251976013183594, 60.0771369934082, -27.175369262695312, 48.359580993652344, 43.12434387207031, 46.41645812988281, -115.93133544921875, 62.31004333496094, 10.982454299926758, 104.91915893554688, 167.366943359375, 129.9783477783203, 55.521392822265625, 47.69914245605469, 7.458171844482422, 63.34461975097656, 148.8690185546875, 26.393569946289062, 112.0997314453125, 51.99532699584961, 209.73959350585938, 197.76168823242188, 1.9523048400878906, 7.574462890625, 52.88063049316406, 239.06883239746094, -72.08528137207031, 59.08552551269531, -4.149711608886719, 63.781951904296875, 107.0814208984375, 186.28192138671875, 165.34730529785156, 6.575714111328125, 144.00047302246094, 15.388534545898438, 213.2782745361328, 65.26119995117188, 65.23614501953125, 137.96591186523438, -14.088043212890625, 90.7225112915039, 29.385040283203125, 48.88962173461914, -104.353271484375, 234.41888427734375, 2.1112747192382812, 55.39009475708008, 148.22042846679688, -37.36751174926758, 59.07867431640625, -57.763893127441406, 103.024658203125, -41.45307159423828, 63.517478942871094, -74.12303161621094, 31.270050048828125, -150.3082275390625, 320.18084716796875, 87.27235412597656, 18.32952880859375, -75.60140991210938, -11.878582000732422, 21.002853393554688], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000654.npy"}
{"epoch": 0.960352422907489, "step": 655, "batch_size": 64, "mean": 67.70552825927734, "std": 77.22325897216797, "min": -135.0894775390625, "p10": -22.361821365356437, "median": 66.47673797607422, "p90": 169.49739379882814, "max": 193.29432678222656, "pos_frac": 0.828125, "sample": [1.877685546875, 87.15005493164062, 84.82106018066406, 74.1490478515625, -15.966136932373047, 78.79971313476562, 23.225990295410156, 38.823944091796875, 51.40977478027344, 88.4443130493164, 42.299468994140625, 73.98709106445312, 5.8296051025390625, -3.058380126953125, 30.787715911865234, 120.1693115234375, 137.0786590576172, 65.6962890625, 48.82383728027344, 148.06976318359375, 24.925254821777344, 172.13320922851562, 137.58673095703125, 12.813980102539062, 4.334930419921875, 52.49277114868164, 171.30018615722656, 120.85286712646484, -38.05226135253906, 127.19245910644531, -53.08592987060547, 174.98072814941406, 108.0130386352539, 22.580297470092773, 137.1625213623047, 106.95791625976562, -132.7344207763672, 37.696044921875, 183.96841430664062, 164.77655029296875, 21.552196502685547, 165.29087829589844, 156.35243225097656, 193.29432678222656, 191.97836303710938, 34.140655517578125, 84.62458801269531, -15.918643951416016, -50.51373291015625, -52.191070556640625, 13.654541015625, 67.25718688964844, 193.27322387695312, -135.0894775390625, 122.67886352539062, 61.52815246582031, 153.11807250976562, -25.102828979492188, -14.310096740722656, 139.21636962890625, 129.6986083984375, 119.44463348388672, 28.687652587890625, 32.17481231689453], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000655.npy"}
{"epoch": 0.9618208516886931, "step": 656, "batch_size": 64, "mean": 85.67755126953125, "std": 75.87490844726562, "min": -58.49580383300781, "p10": 5.092994689941406, "median": 75.73480987548828, "p90": 193.5147186279297, "max": 303.31878662109375, "pos_frac": 0.921875, "sample": [195.96177673339844, 123.2515869140625, 8.259963989257812, 97.0156478881836, 46.992034912109375, 184.37826538085938, 26.274160385131836, 83.05941772460938, 97.54994201660156, 17.653564453125, 89.23452758789062, 8.36749267578125, 61.386322021484375, 95.74856567382812, 85.14418029785156, 165.76483154296875, 98.72927856445312, 75.56022644042969, 144.5531005859375, 125.14799499511719, 102.86810302734375, 58.1986083984375, 94.05860900878906, 27.875762939453125, 30.8182373046875, -2.3548583984375, 133.85452270507812, 137.30416870117188, 153.36105346679688, 160.34243774414062, 71.65929412841797, 132.76556396484375, 75.90939331054688, -7.79638671875, 303.31878662109375, 41.69975280761719, 204.10366821289062, 69.02548217773438, 286.00653076171875, 14.516571044921875, 197.47515869140625, 26.42351531982422, 187.80491638183594, 25.354530334472656, 131.71273803710938, -58.49580383300781, 50.073299407958984, 202.54083251953125, -31.001659393310547, 46.233360290527344, 5.3580474853515625, 85.32710266113281, 70.6734619140625, 23.99749755859375, 228.28541564941406, 4.979400634765625, 3.97369384765625, 23.898527145385742, 28.41149139404297, 56.13134002685547, 35.65531921386719, 80.97006225585938, -35.680076599121094, 175.69305419921875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000656.npy"}
{"epoch": 0.9632892804698973, "step": 657, "batch_size": 64, "mean": 72.03126525878906, "std": 86.89146423339844, "min": -120.42321014404297, "p10": -29.064809417724604, "median": 63.755245208740234, "p90": 197.17935180664065, "max": 306.984375, "pos_frac": 0.828125, "sample": [144.12353515625, -15.56405258178711, 139.91558837890625, 50.777496337890625, 161.35040283203125, 306.984375, 5.455085754394531, 17.73320770263672, 200.21469116210938, 19.911727905273438, -120.42321014404297, 0.7472610473632812, 66.1261215209961, 137.34942626953125, -54.255836486816406, 113.006591796875, 12.054771423339844, 68.13560485839844, -65.27253723144531, 259.55712890625, 91.62423706054688, 163.77142333984375, 102.04405975341797, 214.68353271484375, 9.141799926757812, 50.30656433105469, 48.83171081542969, 115.81616973876953, -32.22569274902344, 190.09689331054688, 57.252960205078125, 80.37532806396484, 37.127742767333984, -103.59600067138672, 135.28836059570312, -21.689414978027344, 157.41427612304688, -49.109779357910156, -11.243736267089844, 32.101409912109375, 58.39611053466797, 86.64117431640625, 1.050790786743164, 16.87287139892578, 100.22233581542969, 87.50160217285156, 32.93971252441406, 33.716365814208984, 68.7274169921875, 40.693607330322266, 207.2659912109375, 73.62528991699219, 132.7291259765625, 213.35769653320312, -11.72686767578125, 69.33738708496094, 143.73699951171875, 232.00582885742188, 78.91830444335938, -55.299678802490234, 59.478973388671875, 61.384368896484375, 23.764663696289062, 138.72186279296875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000657.npy"}
{"epoch": 0.9647577092511013, "step": 658, "batch_size": 64, "mean": 60.930870056152344, "std": 73.79741668701172, "min": -129.91616821289062, "p10": -23.458081817626947, "median": 50.20601463317871, "p90": 160.94861602783203, "max": 241.34793090820312, "pos_frac": 0.828125, "sample": [98.86466979980469, 11.938713073730469, 24.342498779296875, -43.71266174316406, -14.906623840332031, 82.71682739257812, 161.5862579345703, 40.39948272705078, 90.92892456054688, 93.48422241210938, 62.8922119140625, 103.84564208984375, 18.284290313720703, 2.219146728515625, 168.16180419921875, 101.28397369384766, 77.40785217285156, 30.51803970336914, 185.14259338378906, -6.916248321533203, 0.978057861328125, -26.60171890258789, -87.39002227783203, 54.508907318115234, 29.975234985351562, 166.30467224121094, 142.17672729492188, 125.22984313964844, 20.76983642578125, -129.91616821289062, 96.95392608642578, 155.37828063964844, 85.57779693603516, 122.32977294921875, 29.190887451171875, 99.28904724121094, 45.64383316040039, 50.32072448730469, 57.075538635253906, -2.071992874145508, 188.5198516845703, 159.46078491210938, 13.097549438476562, 20.821311950683594, 39.359649658203125, 128.7372283935547, -35.632904052734375, 22.921066284179688, 206.34783935546875, 144.36123657226562, 19.042943954467773, 116.62335968017578, 50.091304779052734, 73.36414337158203, 38.27043151855469, -34.23377990722656, 39.47523498535156, 241.34793090820312, -16.122928619384766, 129.25892639160156, -81.97533416748047, 74.52127838134766, 32.923179626464844, 4.79071044921875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000658.npy"}
{"epoch": 0.9662261380323054, "step": 659, "batch_size": 64, "mean": 79.48670959472656, "std": 97.0342025756836, "min": -138.17666625976562, "p10": -54.23401870727538, "median": 89.45333099365234, "p90": 187.79335937500005, "max": 316.3569030761719, "pos_frac": 0.78125, "sample": [198.9478302001953, 89.05403137207031, 18.44274139404297, 156.91018676757812, 78.92478942871094, 74.22313690185547, 240.33807373046875, 114.5452880859375, 72.26066589355469, 115.40467071533203, 316.3569030761719, -5.950279235839844, 151.86849975585938, -103.64263153076172, -70.33245086669922, 33.940460205078125, 17.689178466796875, 35.23753356933594, 94.90911865234375, 73.6976089477539, 55.64665985107422, 47.67864990234375, -138.17666625976562, 125.24794006347656, 151.59527587890625, 304.31134033203125, 141.1505584716797, 151.7090301513672, 93.61351013183594, -7.9646759033203125, -36.09721374511719, 23.64331817626953, -35.120941162109375, 174.63739013671875, 87.54391479492188, 115.28724670410156, 127.14168548583984, -39.479248046875, 32.55397033691406, 157.58651733398438, 110.43404388427734, 194.86956787109375, 168.35345458984375, -111.5904312133789, 212.45074462890625, -45.97825622558594, -57.73066711425781, 108.7100830078125, 172.37698364257812, 112.532470703125, 68.81519317626953, 23.144542694091797, 123.61287689208984, 168.86190795898438, 174.83746337890625, 39.331214904785156, 120.8023681640625, 27.48931884765625, 193.34588623046875, -46.075172424316406, 89.85263061523438, 146.08627319335938, -82.95741271972656, -59.75921630859375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000659.npy"}
{"epoch": 0.9676945668135095, "step": 660, "batch_size": 64, "mean": 80.31880187988281, "std": 80.78871154785156, "min": -74.44376373291016, "p10": -21.140383148193358, "median": 84.14671325683594, "p90": 184.72500457763675, "max": 278.03277587890625, "pos_frac": 0.8125, "sample": [115.27987670898438, 76.662353515625, 166.39114379882812, 20.074378967285156, 43.6901741027832, 155.85073852539062, -20.90978240966797, 194.6710968017578, 20.131454467773438, 125.31867980957031, -25.567508697509766, 179.5135955810547, 240.26568603515625, 28.89110565185547, 121.79812622070312, 164.11688232421875, 129.53680419921875, -57.90105438232422, -21.239212036132812, 44.26983642578125, 50.315582275390625, 87.63904571533203, 186.95846557617188, 189.00189208984375, 84.16323852539062, 278.03277587890625, -55.52400588989258, 84.53412628173828, 73.64334106445312, 59.84492492675781, 110.68059539794922, 100.59297180175781, 84.13018798828125, 264.94671630859375, 97.54949951171875, 169.12673950195312, 98.83396911621094, 45.287200927734375, 51.900665283203125, 95.19778442382812, -11.39057731628418, 86.88339233398438, 99.8760757446289, 102.44635009765625, -12.221414566040039, 131.3152618408203, -12.882762908935547, 164.02684020996094, 144.75758361816406, 113.55180358886719, -60.0364990234375, 146.68081665039062, -4.4420623779296875, 45.321197509765625, 62.02832794189453, -74.44376373291016, 56.34429168701172, 13.3616943359375, 19.779653549194336, 201.95462036132812, 62.5216178894043, -71.76488494873047, 13.699836730957031, 65.33573913574219], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000660.npy"}
{"epoch": 0.9691629955947136, "step": 661, "batch_size": 64, "mean": 83.06249237060547, "std": 84.5733642578125, "min": -90.732177734375, "p10": -26.77875213623047, "median": 80.82958984375, "p90": 189.71059570312502, "max": 300.3523864746094, "pos_frac": 0.84375, "sample": [71.86160278320312, 16.481460571289062, 101.00497436523438, 29.02798843383789, 184.47119140625, -77.87722778320312, 94.73614501953125, 61.515281677246094, 81.96534729003906, 64.78895568847656, 174.45486450195312, 29.024215698242188, 96.16552734375, 79.91000366210938, 263.5610656738281, -26.289085388183594, 57.51935577392578, 36.708091735839844, -33.29583740234375, 134.7935791015625, 178.807373046875, 95.78497314453125, 122.6741943359375, 48.438499450683594, 46.2541389465332, 211.93576049804688, 140.55157470703125, 83.45088958740234, 191.9560546875, 37.85798645019531, -68.05390930175781, 145.53404235839844, 67.47329711914062, 172.7064208984375, 8.549312591552734, 11.856204986572266, 56.681251525878906, 2.908416748046875, 137.07131958007812, 72.73211669921875, -38.47220230102539, 40.615325927734375, -26.988609313964844, 119.29986572265625, 82.76441955566406, -90.732177734375, 155.16445922851562, 99.09283447265625, 127.58122253417969, 152.30252075195312, 300.3523864746094, -1.484527587890625, 38.19313049316406, -0.48043251037597656, 81.74917602539062, 84.38121032714844, 133.09585571289062, 235.85528564453125, -76.31417846679688, 254.60723876953125, 213.49835205078125, 100.50383758544922, 53.24162292480469, 72.47529602050781], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000661.npy"}
{"epoch": 0.9706314243759178, "step": 662, "batch_size": 64, "mean": 73.25682067871094, "std": 66.38719177246094, "min": -55.968841552734375, "p10": -6.954984283447264, "median": 69.77767181396484, "p90": 149.24573364257813, "max": 245.05332946777344, "pos_frac": 0.859375, "sample": [-4.599277496337891, 95.24935150146484, 91.31144714355469, -48.56464385986328, 132.9973907470703, 105.13174438476562, -13.869247436523438, 120.90278625488281, 208.10919189453125, 114.84587860107422, 95.44717407226562, 47.020389556884766, 35.3125, 119.11878967285156, 203.85670471191406, 75.98309326171875, 31.942485809326172, -28.825084686279297, 204.8690185546875, 24.582916259765625, 245.05332946777344, 124.11207580566406, 97.35631561279297, 23.11115264892578, 43.16154479980469, 66.47091674804688, 54.139808654785156, 149.4324188232422, 31.40058135986328, -33.794921875, 73.08442687988281, 46.19984436035156, 6.356781005859375, 18.63612937927246, 58.273529052734375, 133.61984252929688, 76.77410888671875, 141.9546356201172, 43.311279296875, 173.8697052001953, 103.8931655883789, 126.92349243164062, 36.576961517333984, 130.75350952148438, 64.71614074707031, 148.8101348876953, 21.19867706298828, 119.7667236328125, -55.968841552734375, 88.05284118652344, -23.554500579833984, 172.31100463867188, 15.7049560546875, 115.56163787841797, 25.70367431640625, -5.924354553222656, 13.023406982421875, 126.02020263671875, 15.712646484375, 31.4334716796875, 113.72698974609375, -7.3966827392578125, 78.1981201171875, 49.846893310546875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000662.npy"}
{"epoch": 0.9720998531571219, "step": 663, "batch_size": 64, "mean": 70.93704223632812, "std": 81.64666748046875, "min": -91.59114837646484, "p10": -29.335370635986322, "median": 60.37479019165039, "p90": 185.94402465820312, "max": 222.6580810546875, "pos_frac": 0.796875, "sample": [174.43585205078125, 105.68118286132812, 76.13261413574219, 54.19509506225586, 108.83629608154297, 106.1326675415039, 31.093503952026367, -21.853538513183594, 6.797393798828125, 187.94473266601562, -0.8804130554199219, 59.519256591796875, 172.1142578125, 175.5823211669922, 106.62510681152344, 45.192604064941406, 18.923599243164062, 159.43280029296875, -78.9554214477539, -61.48021697998047, 50.01881408691406, 3.5326004028320312, 72.85518646240234, 45.99540710449219, 145.94102478027344, 120.56036376953125, 121.35356140136719, -0.7576446533203125, 154.5655517578125, 194.14422607421875, -13.77971076965332, 152.27398681640625, 185.63140869140625, -91.59114837646484, 90.01058959960938, -78.55968475341797, 49.17683410644531, 59.55561447143555, 61.193965911865234, 186.0780029296875, 151.5647735595703, -9.650970458984375, 102.545654296875, -66.47050476074219, -6.8792877197265625, 189.560302734375, 32.11841583251953, 47.11187744140625, 46.22908020019531, 222.6580810546875, 66.06779479980469, 118.67324829101562, 186.67637634277344, 22.047035217285156, 163.24920654296875, -84.42343139648438, 1.1020050048828125, 19.073612213134766, 21.418277740478516, 93.62364196777344, -32.5418701171875, 126.53287506103516, 8.711166381835938, 187.30459594726562], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000663.npy"}
{"epoch": 0.973568281938326, "step": 664, "batch_size": 64, "mean": 65.49781799316406, "std": 70.27774810791016, "min": -118.65831756591797, "p10": -24.040784835815426, "median": 74.19659042358398, "p90": 152.72066650390627, "max": 216.90463256835938, "pos_frac": 0.828125, "sample": [54.151214599609375, -118.65831756591797, 73.18375396728516, 111.36270141601562, 194.15591430664062, 35.803382873535156, 13.58587646484375, 99.02674865722656, 97.26862335205078, 104.9364013671875, -16.845123291015625, 108.73486328125, 75.20942687988281, 51.9511604309082, 128.12158203125, 166.56671142578125, 86.87075805664062, 80.86754608154297, -3.3145370483398438, 109.03179931640625, 40.47662353515625, 89.827392578125, 18.87030029296875, 115.25480651855469, -1.0946464538574219, -37.484619140625, 159.423828125, 11.657752990722656, -29.275440216064453, 55.5030517578125, 127.95386505126953, 6.3389739990234375, 174.84185791015625, 20.824291229248047, 55.757102966308594, 121.80817413330078, -25.179706573486328, 50.23780822753906, 100.28699493408203, 103.30327606201172, 10.388748168945312, 107.826904296875, 31.63107681274414, -95.48419189453125, 166.36817932128906, 216.90463256835938, 115.99945068359375, 142.73330688476562, 143.02557373046875, 18.700820922851562, -39.58393859863281, 92.4831314086914, 93.79098510742188, 18.396930694580078, 45.018898010253906, 28.233871459960938, 156.60391235351562, -57.92520523071289, -21.38330078125, 0.3483123779296875, 143.65975952148438, 98.02536010742188, 141.08047485351562, 23.67418670654297], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000664.npy"}
{"epoch": 0.9750367107195301, "step": 665, "batch_size": 64, "mean": 70.52334594726562, "std": 67.92237091064453, "min": -127.05296325683594, "p10": 0.09599456787109431, "median": 62.305049896240234, "p90": 157.57082824707035, "max": 209.59767150878906, "pos_frac": 0.890625, "sample": [41.61037826538086, 41.902610778808594, 21.079498291015625, 9.26519775390625, 56.87028503417969, 64.30744934082031, 29.37274932861328, 111.3482666015625, 121.6171875, 31.9544677734375, 87.46627044677734, -0.14539337158203125, 67.99468994140625, 98.97073364257812, 95.0377426147461, 4.330894470214844, 12.911172866821289, 29.5877685546875, 0.6592330932617188, 30.408889770507812, 45.506134033203125, 118.50682830810547, 128.0178680419922, 57.05793762207031, 61.57749938964844, 80.08899688720703, 143.53634643554688, 5.60174560546875, 160.36288452148438, -9.754364013671875, 27.46709442138672, 63.03260040283203, 208.76104736328125, 119.96554565429688, 122.88560485839844, 10.361316680908203, 151.0560302734375, 117.42607879638672, -127.05296325683594, 59.294960021972656, 36.922645568847656, 103.97268676757812, 209.59767150878906, 83.98712158203125, 55.75349426269531, 145.0449676513672, -17.339191436767578, 83.20576477050781, 203.41720581054688, 141.6951904296875, 61.10625457763672, 23.78791046142578, 80.88837432861328, 49.51464080810547, 175.10830688476562, -37.817413330078125, 134.80841064453125, -12.67645263671875, 50.41771697998047, 125.6964111328125, 73.46322631835938, -101.83338928222656, 183.94171142578125, 160.58160400390625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000665.npy"}
{"epoch": 0.9765051395007343, "step": 666, "batch_size": 64, "mean": 67.13858032226562, "std": 85.15947723388672, "min": -128.05133056640625, "p10": -37.331652069091795, "median": 70.97018814086914, "p90": 187.74839324951176, "max": 265.6046142578125, "pos_frac": 0.765625, "sample": [193.7222137451172, 110.34326171875, 67.36480712890625, -10.223167419433594, -37.333045959472656, -25.367115020751953, 78.61297607421875, 94.51972198486328, 70.03704833984375, 265.6046142578125, 51.446533203125, -0.7252655029296875, -50.7330207824707, -85.10299682617188, 11.154533386230469, -46.937744140625, 209.12506103515625, 104.57400512695312, -33.33152770996094, 74.27355194091797, 232.94430541992188, 33.083900451660156, 23.184593200683594, 60.163238525390625, 249.962158203125, -37.328399658203125, 71.58863067626953, -106.34748077392578, 16.87848663330078, -128.05133056640625, 123.07130432128906, 133.08413696289062, 92.58735656738281, 90.76904296875, 91.48857116699219, 10.88116455078125, 160.3185272216797, 70.35174560546875, -50.699615478515625, 82.632080078125, 28.35001564025879, 39.84964370727539, -5.722705841064453, 17.591064453125, -33.27545166015625, 165.47744750976562, 10.263565063476562, 111.28466796875, 101.52381134033203, 87.3069839477539, 118.78950500488281, 146.21441650390625, 97.65829467773438, 58.22106170654297, 39.16682434082031, 71.80393981933594, 150.61209106445312, 54.50572204589844, 175.7681884765625, 97.67066192626953, 109.09910583496094, -6.836067199707031, 192.8827667236328, 207.07655334472656], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000666.npy"}
{"epoch": 0.9779735682819384, "step": 667, "batch_size": 64, "mean": 78.54322052001953, "std": 69.91694641113281, "min": -55.86640930175781, "p10": -16.829280090332023, "median": 84.27954483032227, "p90": 183.66357574462893, "max": 195.6074676513672, "pos_frac": 0.84375, "sample": [195.6074676513672, 46.524658203125, 133.59127807617188, -30.66020965576172, 66.99964141845703, 163.72213745117188, 186.15432739257812, 76.28981018066406, 102.94358825683594, 56.26396942138672, 25.741064071655273, 143.21963500976562, 8.31655502319336, 98.83983612060547, 109.55380249023438, 173.77996826171875, 117.97700500488281, 19.4205322265625, 143.21966552734375, 35.380130767822266, 105.05581665039062, 39.90639114379883, 195.376220703125, 100.06904602050781, 100.48561096191406, 6.399871826171875, 94.80891418457031, 89.38938903808594, 184.5789337158203, 77.36166381835938, 20.870262145996094, -24.00961685180664, 88.14337921142578, 18.74781036376953, 83.79779815673828, -4.45404052734375, -20.52942657470703, 193.5611114501953, 121.20540618896484, 120.14690399169922, -52.10307312011719, 171.59735107421875, 84.76129150390625, -19.609935760498047, 62.53199768066406, 181.52774047851562, 90.70564270019531, 49.38019561767578, 76.60710144042969, 194.480224609375, -29.7833251953125, 155.25514221191406, 5.879600524902344, 23.135726928710938, 2.7353897094726562, 185.12933349609375, 100.38249969482422, -10.341083526611328, 64.92665100097656, -6.004058837890625, 86.66691589355469, 43.79096221923828, 157.1837158203125, -55.86640930175781], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000667.npy"}
{"epoch": 0.9794419970631424, "step": 668, "batch_size": 64, "mean": 85.928466796875, "std": 82.99992370605469, "min": -50.87725830078125, "p10": -5.968509292602537, "median": 67.49929428100586, "p90": 208.11985626220707, "max": 286.6799011230469, "pos_frac": 0.828125, "sample": [73.17813110351562, 68.55380249023438, 268.26947021484375, 40.51679992675781, -0.6991958618164062, 121.41883850097656, 12.206184387207031, 36.30835723876953, -7.9324798583984375, 186.77273559570312, 9.599655151367188, -8.846908569335938, 151.0242919921875, -1.0926055908203125, 128.28981018066406, 56.557708740234375, 4.744529724121094, 66.44478607177734, 7.301666259765625, -16.663105010986328, 69.91618347167969, 191.88519287109375, 23.044334411621094, 211.35289001464844, 200.57611083984375, 55.223060607910156, 6.448883056640625, 115.70401000976562, 119.10743713378906, 126.57862854003906, 91.94151306152344, -13.744598388671875, 199.59947204589844, 47.39056396484375, 89.40621185302734, -50.87725830078125, 58.41986083984375, 220.51998901367188, 46.96589279174805, 18.108158111572266, 82.78315734863281, 38.45256805419922, 196.60650634765625, 74.982177734375, -2.1499481201171875, 246.55667114257812, 146.83868408203125, -3.7859573364257812, 191.6309356689453, 33.60851287841797, -23.705276489257812, 44.998146057128906, 104.77276611328125, 87.97109985351562, 25.83600616455078, 219.625, 101.52140045166016, 51.82133483886719, 286.6799011230469, 60.717613220214844, -6.903888702392578, 134.15481567382812, 127.16680908203125, 255.7235870361328], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000668.npy"}
{"epoch": 0.9809104258443465, "step": 669, "batch_size": 64, "mean": 82.30418395996094, "std": 74.42383575439453, "min": -53.9858283996582, "p10": -14.074181365966792, "median": 75.47517776489258, "p90": 173.5625869750977, "max": 257.97308349609375, "pos_frac": 0.8125, "sample": [128.4951171875, -53.24724578857422, -53.9858283996582, 135.95498657226562, -15.76998519897461, 153.56797790527344, 42.39397430419922, 136.25030517578125, 40.110252380371094, -9.706537246704102, 257.97308349609375, -24.382850646972656, -6.752357482910156, 70.31851196289062, 46.84790802001953, 20.094955444335938, 150.61172485351562, 51.367027282714844, -5.4209747314453125, 42.09416198730469, 85.94137573242188, -35.28852081298828, 116.4525146484375, 73.44747161865234, 60.91693115234375, 149.58316040039062, 111.91641235351562, 193.59902954101562, 227.2747802734375, -10.117305755615234, 97.40628051757812, 147.28463745117188, 49.81531524658203, 153.02264404296875, 199.45333862304688, -20.519603729248047, -2.378131866455078, 38.20970153808594, 132.99539184570312, 68.08919525146484, 6.411079406738281, 26.363807678222656, 109.55509185791016, 135.90579223632812, -51.11723327636719, 67.64360046386719, 52.70323181152344, 66.86648559570312, 133.85311889648438, 107.57246398925781, 164.08497619628906, 77.50288391113281, 114.6827392578125, 119.00423431396484, 147.38343811035156, 177.62442016601562, 8.986617088317871, 192.26625061035156, 139.914794921875, 126.1218032836914, 10.488121032714844, 46.82474136352539, 152.15769958496094, 190.7487030029297], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000669.npy"}
{"epoch": 0.9823788546255506, "step": 670, "batch_size": 64, "mean": 90.21934509277344, "std": 78.18925476074219, "min": -131.2679443359375, "p10": 3.4359809875488287, "median": 94.11100769042969, "p90": 181.2890380859375, "max": 257.56549072265625, "pos_frac": 0.90625, "sample": [156.4254150390625, 142.9059295654297, 97.83152770996094, 62.74790954589844, 87.62848663330078, 26.55349349975586, 38.0487060546875, 42.44878005981445, 190.6148681640625, 70.65042114257812, 196.90353393554688, 49.03289794921875, -131.2679443359375, 143.62762451171875, 132.46377563476562, 197.0647430419922, 257.56549072265625, 94.99116516113281, 181.45205688476562, -66.4289779663086, 92.32875061035156, 161.38787841796875, 97.64006805419922, 151.17176818847656, 38.674041748046875, 143.07742309570312, 130.3631591796875, -7.086311340332031, 103.50940704345703, -37.55792236328125, 156.0989532470703, 104.03070068359375, 47.003395080566406, 50.47825622558594, 61.72287368774414, 253.4100341796875, 180.90866088867188, 40.230072021484375, -86.70352172851562, 149.82891845703125, 11.339717864990234, 129.0108642578125, 160.17251586914062, 74.55584716796875, 143.65292358398438, 175.09042358398438, 51.32478332519531, 37.86085510253906, 13.35788345336914, -84.65342712402344, 96.90154266357422, 130.86062622070312, 171.6166229248047, 89.77734375, 86.57061004638672, 34.51919937133789, 76.45863342285156, 93.23085021972656, 3.9190521240234375, 185.41676330566406, 61.733177185058594, 96.19783020019531, 3.2289505004882812, 130.11813354492188], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000670.npy"}
{"epoch": 0.9838472834067548, "step": 671, "batch_size": 64, "mean": 61.64239501953125, "std": 78.61564636230469, "min": -142.61251831054688, "p10": -23.532062149047846, "median": 49.12520980834961, "p90": 176.80166168212895, "max": 261.423095703125, "pos_frac": 0.875, "sample": [69.67901611328125, 37.32373046875, 61.92906188964844, 15.365913391113281, -142.0061492919922, 88.73787689208984, 30.459781646728516, 18.41899871826172, 24.957923889160156, 102.51033020019531, 75.6893310546875, 73.32156372070312, 38.55828857421875, 6.39862060546875, 112.1712646484375, 51.095489501953125, 10.812225341796875, 0.5284881591796875, 182.73130798339844, 148.2653350830078, 19.469032287597656, 132.04295349121094, -25.27863311767578, 93.9876708984375, 190.50006103515625, 103.19166564941406, 195.570556640625, 50.16851806640625, 207.2001495361328, 11.448389053344727, 42.079368591308594, 62.709434509277344, 79.03068542480469, -19.456729888916016, 135.14605712890625, 67.75721740722656, 112.669677734375, -33.830902099609375, -48.793182373046875, 47.522796630859375, 136.05638122558594, -142.61251831054688, 3.219146728515625, 26.573265075683594, 48.08190155029297, 82.4241943359375, 151.19007873535156, 162.9658203125, 4.550239562988281, 60.19021987915039, 202.04263305664062, -27.79669952392578, 2.6452903747558594, 55.751983642578125, 25.8067626953125, 39.628787994384766, -32.47731399536133, 30.941970825195312, 6.777303695678711, 261.423095703125, 15.066383361816406, 228.39801025390625, 30.948394775390625, 143.23472595214844], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000671.npy"}
{"epoch": 0.9853157121879589, "step": 672, "batch_size": 64, "mean": 56.48442840576172, "std": 73.70994567871094, "min": -87.72930908203125, "p10": -37.927182769775385, "median": 52.31110763549805, "p90": 150.71468658447267, "max": 214.21343994140625, "pos_frac": 0.78125, "sample": [54.605438232421875, -63.56195068359375, 8.501934051513672, 55.370262145996094, 66.59754180908203, 35.66954803466797, -10.864158630371094, 86.26528930664062, 211.76307678222656, -71.48771667480469, 30.024436950683594, 3.1852645874023438, -87.72930908203125, 106.86555480957031, 30.376869201660156, 124.29208374023438, 59.1351318359375, 94.04714965820312, 50.01677703857422, 177.02676391601562, 127.52091979980469, 103.96267700195312, 6.948369979858398, 43.709991455078125, 30.505767822265625, -79.64158630371094, 202.46937561035156, -27.565509796142578, 25.003883361816406, 214.21343994140625, 7.445594787597656, 148.57801818847656, 105.99620056152344, 131.4511260986328, 113.4857177734375, 108.76666259765625, 110.12130737304688, -41.4814453125, 48.604347229003906, 97.51866149902344, -70.16706085205078, 86.92720031738281, 47.49774932861328, 30.632461547851562, 56.73893737792969, 166.85333251953125, 185.6635284423828, 72.443603515625, -29.63390350341797, 58.50349044799805, 79.59574890136719, 30.085357666015625, -4.98492431640625, 125.90921783447266, 8.923028945922852, -14.842718124389648, -28.469207763671875, 81.79150390625, 148.28477478027344, -13.889175415039062, -45.298179626464844, 25.830276489257812, 151.63040161132812, 27.26439666748047], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000672.npy"}
{"epoch": 0.986784140969163, "step": 673, "batch_size": 64, "mean": 67.50054931640625, "std": 72.65570831298828, "min": -146.8754425048828, "p10": -3.6480150222778307, "median": 59.859947204589844, "p90": 160.68819274902347, "max": 294.859375, "pos_frac": 0.875, "sample": [165.01544189453125, 18.50225257873535, -35.58617401123047, 118.47696685791016, 4.4343719482421875, 190.2042999267578, 82.68782043457031, 51.49040222167969, 194.85128784179688, 82.1795883178711, 5.9256591796875, 46.132049560546875, -60.608184814453125, 55.51847839355469, 175.85391235351562, 95.268310546875, 28.611814498901367, 165.28268432617188, 116.12350463867188, -9.639625549316406, -32.634735107421875, 35.20948791503906, 0.19365310668945312, 7.785831451416016, 33.75138854980469, 16.19048309326172, 13.305549621582031, 84.07815551757812, 5.321842193603516, 144.30014038085938, 146.0485076904297, 218.70928955078125, 54.440025329589844, 41.99191665649414, 100.98193359375, -23.351913452148438, 17.568634033203125, 81.95528411865234, 64.201416015625, 115.84394836425781, 95.62735748291016, 4.836551666259766, 79.47081756591797, 150.59127807617188, 93.22239685058594, 91.25067138671875, 51.68761444091797, -2.223836898803711, 26.143051147460938, 294.859375, 117.95806121826172, 75.64205932617188, 138.0889434814453, 44.472957611083984, -146.8754425048828, 66.1558837890625, 2.587799072265625, 109.87736511230469, 47.69544982910156, 120.43547058105469, 95.4728775024414, 36.54854965209961, 114.15228271484375, -4.2583770751953125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000673.npy"}
{"epoch": 0.9882525697503671, "step": 674, "batch_size": 64, "mean": 82.70986938476562, "std": 76.67236328125, "min": -45.21881866455078, "p10": -4.452337360382075, "median": 63.77672576904297, "p90": 197.03866882324223, "max": 299.0943298339844, "pos_frac": 0.875, "sample": [137.45240783691406, 56.03486633300781, 85.99906158447266, 249.97698974609375, 46.83314895629883, 84.59305572509766, 94.32271575927734, 93.94025421142578, -41.83195495605469, 118.85883331298828, 27.922508239746094, 109.80917358398438, -0.04889678955078125, 187.8111572265625, 86.17831420898438, 57.41810989379883, -6.339526176452637, 225.0269775390625, 54.81671142578125, 238.4178466796875, 22.44896697998047, 133.5950927734375, 30.583763122558594, 76.3663101196289, 34.323326110839844, 6.956478118896484, 203.992919921875, -45.21881866455078, 42.907196044921875, 77.03811645507812, 163.09388732910156, 299.0943298339844, 38.50938415527344, 10.564380645751953, -39.27363204956055, 211.19650268554688, 51.17815399169922, -35.56025695800781, 44.10222244262695, 93.89524841308594, 77.95509338378906, -32.48774719238281, 52.46508026123047, 144.65518188476562, 36.79423904418945, 180.42124938964844, 88.47381591796875, 64.53137969970703, 51.73979949951172, 176.8460693359375, 137.82073974609375, 31.310035705566406, 200.99331665039062, -35.015228271484375, 127.0626449584961, 51.155120849609375, 58.182456970214844, 146.4911651611328, 149.06719970703125, 63.022071838378906, 86.50286865234375, 45.02610397338867, 35.2921257019043, 28.141921997070312], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000674.npy"}
{"epoch": 0.9897209985315712, "step": 675, "batch_size": 64, "mean": 72.52214050292969, "std": 90.69078063964844, "min": -257.2762756347656, "p10": -33.070989227294916, "median": 80.88442993164062, "p90": 177.70649871826174, "max": 291.6551818847656, "pos_frac": 0.75, "sample": [-27.690364837646484, 160.76654052734375, 37.147865295410156, 118.76133728027344, -13.235504150390625, 87.2071533203125, 143.27774047851562, -1.596609115600586, 20.745079040527344, 130.13528442382812, 31.259902954101562, -14.2052001953125, -1.908843994140625, -30.19732666015625, 70.0473403930664, 14.470481872558594, -21.015254974365234, 174.3067626953125, 120.2847671508789, -12.928153991699219, 185.086181640625, 60.15522766113281, 291.6551818847656, 210.774169921875, 125.68907928466797, -37.65053939819336, 24.136402130126953, 113.16812133789062, 61.730411529541016, -61.55010223388672, 111.64067840576172, 238.5926055908203, -36.07292938232422, 104.98839569091797, -5.49395751953125, 150.17913818359375, 96.95891571044922, 91.86738586425781, 144.5791473388672, 6.9384002685546875, -34.9465446472168, 141.58294677734375, 99.09526062011719, 188.2567138671875, -34.30255889892578, 14.955263137817383, -257.2762756347656, 104.73344421386719, 121.1162109375, 115.8480224609375, 59.3242301940918, 179.1635284423828, 64.548583984375, 76.24812316894531, 78.48007202148438, 117.24525451660156, -63.58006286621094, 28.974632263183594, 83.28878784179688, 240.00521850585938, 12.804084777832031, 135.7463836669922, 158.09478759765625, 149.00576782226562], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000675.npy"}
{"epoch": 0.9911894273127754, "step": 676, "batch_size": 64, "mean": 76.57514953613281, "std": 79.75215148925781, "min": -144.76934814453125, "p10": -23.383885765075682, "median": 80.2295036315918, "p90": 183.49204254150393, "max": 232.6951904296875, "pos_frac": 0.828125, "sample": [38.10327911376953, 104.64103698730469, 115.52633666992188, 77.78406524658203, -144.76934814453125, 85.36209106445312, 222.34130859375, 189.0279083251953, -10.685822486877441, 24.864818572998047, 62.811195373535156, 73.08505249023438, 129.49966430664062, 112.661376953125, 59.09027862548828, 20.634567260742188, 5.495367050170898, -7.2909698486328125, 24.55377960205078, -61.63273620605469, 29.310184478759766, -63.907135009765625, 187.85487365722656, -49.04905700683594, 141.56231689453125, -36.22129821777344, 119.89096069335938, 209.2081756591797, 77.81651306152344, 160.3252410888672, 79.3680419921875, -20.126373291015625, 160.98828125, 81.0909652709961, 146.7103271484375, 5.343341827392578, 232.6951904296875, 141.88772583007812, 195.90335083007812, 72.29910278320312, 21.450042724609375, -24.77996253967285, 60.019256591796875, 15.277969360351562, 100.03368377685547, 89.44705200195312, 16.366119384765625, 84.77252197265625, 149.99444580078125, 14.135993957519531, 121.17939758300781, 208.80946350097656, 125.49662017822266, 133.68212890625, 60.16709899902344, 100.22579956054688, 55.93849182128906, -2.2431602478027344, 123.94410705566406, 97.29438018798828, 163.70419311523438, -71.71153259277344, 173.31210327148438, 90.23982238769531], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000676.npy"}
{"epoch": 0.9926578560939795, "step": 677, "batch_size": 64, "mean": 78.80928039550781, "std": 68.12860870361328, "min": -49.96009063720703, "p10": -7.866487121582029, "median": 68.4556770324707, "p90": 179.41778411865235, "max": 206.73931884765625, "pos_frac": 0.859375, "sample": [-5.454383850097656, 131.58798217773438, 198.95928955078125, 91.19705963134766, 148.3656005859375, 77.90684509277344, -28.323261260986328, -3.6241073608398438, 170.8241424560547, 31.123306274414062, 41.98806381225586, 106.65875244140625, 14.863723754882812, 174.17242431640625, 51.483856201171875, 91.72978973388672, 206.73931884765625, -49.96009063720703, 32.89347839355469, 181.43536376953125, 199.61233520507812, 145.79721069335938, 138.941162109375, 56.5433349609375, 34.66429138183594, 142.6494140625, 10.59576416015625, 26.98343276977539, 101.89319610595703, 58.949951171875, 103.19859313964844, 62.50311279296875, 47.36931228637695, 115.50540161132812, 145.58767700195312, 48.5816650390625, -46.85661315917969, 20.431474685668945, 104.07682037353516, 40.78905487060547, 67.10677337646484, 53.24605941772461, 81.26356506347656, 153.09152221679688, -8.900245666503906, 177.7604217529297, -26.109291076660156, 45.680110931396484, 36.784637451171875, 200.56967163085938, 89.92031860351562, -11.650823593139648, 20.96978759765625, 34.94011688232422, 180.12808227539062, 94.4231185913086, 44.648406982421875, 202.76502990722656, 97.34766387939453, 115.31130981445312, -36.86735153198242, 45.55488586425781, 93.62176513671875, 69.80458068847656], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000677.npy"}
{"epoch": 0.9941262848751835, "step": 678, "batch_size": 64, "mean": 89.56806945800781, "std": 76.15312194824219, "min": -114.497314453125, "p10": -1.1669906616210923, "median": 93.61478042602539, "p90": 183.7312240600586, "max": 252.87716674804688, "pos_frac": 0.890625, "sample": [95.21880340576172, 92.01075744628906, 136.36767578125, 184.5226287841797, 14.827407836914062, 29.95603370666504, 131.10012817382812, 114.30888366699219, 30.384506225585938, 99.84568786621094, -41.338226318359375, 68.94933319091797, 41.30409622192383, 163.35635375976562, 105.48690032958984, 120.05267333984375, 126.31983947753906, 45.76653289794922, -114.497314453125, 140.4653778076172, 102.43733215332031, 44.02666473388672, 10.465736389160156, 90.67056274414062, 151.15380859375, 251.88475036621094, 119.81143188476562, 63.75830078125, 0.07036209106445312, 201.4055938720703, 121.83125305175781, 26.653507232666016, 86.47552490234375, 17.64007568359375, 63.521995544433594, 105.93798828125, 7.8813018798828125, 252.87716674804688, -6.5303802490234375, 36.503509521484375, -15.92605209350586, 175.84390258789062, 140.38694763183594, 153.6739501953125, 212.81195068359375, 42.672176361083984, 231.1730499267578, 157.37326049804688, 29.035179138183594, -38.46562194824219, 133.43087768554688, 45.134193420410156, 71.48845672607422, -1.6972846984863281, 123.09910583496094, 84.95609283447266, 48.938148498535156, 222.8756561279297, 181.88461303710938, -8.468109130859375, 151.75540161132812, 0.110595703125, 129.28213500976562, 128.10311889648438], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000678.npy"}
{"epoch": 0.9955947136563876, "step": 679, "batch_size": 64, "mean": 73.9551773071289, "std": 62.57904815673828, "min": -47.11396408081055, "p10": 5.057492828369142, "median": 71.94274520874023, "p90": 157.7172760009766, "max": 253.81060791015625, "pos_frac": 0.90625, "sample": [19.022796630859375, 43.13897705078125, 19.896564483642578, 161.74734497070312, 144.95828247070312, 60.17354965209961, 77.70973205566406, 122.79005432128906, 4.638618469238281, 72.29761505126953, 88.43244934082031, 147.9404754638672, 118.186279296875, -16.80083465576172, 209.04830932617188, 6.611228942871094, 63.156517028808594, 115.356201171875, 45.83311462402344, 42.64342498779297, 181.59765625, 187.29806518554688, 35.280242919921875, 194.84939575195312, 72.58299255371094, 18.562744140625, 35.95246505737305, 114.88450622558594, 6.809169769287109, 99.58448791503906, 77.33563995361328, 46.90901184082031, 52.20387268066406, 161.4123992919922, 96.81967163085938, 133.2698974609375, 54.55226135253906, 17.858116149902344, -13.357646942138672, -25.643211364746094, 107.08711242675781, -14.915943145751953, 85.36856079101562, 79.79548645019531, 71.58787536621094, 137.0426025390625, 42.167030334472656, 149.09532165527344, 108.57368469238281, 64.25419616699219, 117.80641174316406, 28.703125, 25.986061096191406, -47.11396408081055, 78.75373840332031, 26.882232666015625, 96.90583801269531, 86.86087799072266, 6.0348663330078125, 253.81060791015625, 85.04398345947266, 7.671173095703125, 60.25408935546875, -20.066001892089844], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000679.npy"}
{"epoch": 0.9970631424375918, "step": 680, "batch_size": 64, "mean": 73.86129760742188, "std": 80.54899597167969, "min": -90.61123657226562, "p10": -29.277639389038086, "median": 64.03754806518555, "p90": 180.3043899536133, "max": 251.8128662109375, "pos_frac": 0.8125, "sample": [165.78131103515625, -66.80248260498047, -66.68535614013672, -8.123420715332031, 122.61217498779297, -10.941741943359375, 169.24517822265625, 62.5892333984375, 137.6227264404297, 82.49147033691406, 159.20263671875, -3.740203857421875, 62.218170166015625, 67.10385131835938, 137.45431518554688, 251.8128662109375, -18.266639709472656, 2.90802001953125, 39.034820556640625, -90.61123657226562, 114.11653137207031, 103.03494262695312, 44.12321853637695, 29.183982849121094, 96.96611022949219, 57.268218994140625, 63.88817596435547, 72.84356689453125, 31.381088256835938, 244.08139038085938, 28.76936149597168, 126.41618347167969, -56.41102600097656, 147.32171630859375, 88.4125747680664, 44.274505615234375, 159.458251953125, 64.18692016601562, 43.21052551269531, 15.060821533203125, 47.73341369628906, 208.46484375, 49.28180694580078, 53.61302947998047, 59.12495422363281, -80.00238800048828, 194.49533081054688, -44.9781379699707, -29.87674331665039, 106.7978515625, 73.19541931152344, 204.30615234375, 65.40687561035156, -27.879730224609375, 104.33779907226562, 11.934822082519531, 175.97750854492188, 188.910400390625, 30.216529846191406, 150.91830444335938, 90.80972290039062, 182.1587677001953, 35.79917907714844, 163.88455200195312], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000680.npy"}
{"epoch": 0.9985315712187959, "step": 681, "batch_size": 64, "mean": 69.10145568847656, "std": 75.96519470214844, "min": -153.4980926513672, "p10": -27.603844833374012, "median": 65.09303665161133, "p90": 184.02260742187502, "max": 242.13034057617188, "pos_frac": 0.828125, "sample": [77.35420227050781, -17.528369903564453, 103.26673889160156, 14.454910278320312, 97.08857727050781, -31.921905517578125, 59.42266082763672, 46.52641296386719, 72.1485595703125, 59.703155517578125, 150.1324462890625, -10.619491577148438, 46.38289260864258, 45.84046173095703, -6.7371368408203125, 216.9616241455078, 193.444091796875, -36.259857177734375, 141.4812469482422, 181.09396362304688, -77.17132568359375, 22.61376190185547, 92.50393676757812, 87.02810668945312, 242.13034057617188, 57.247283935546875, 53.33544921875, 73.69551086425781, 35.10210418701172, 71.93928527832031, 63.908226013183594, 31.146835327148438, 68.3240966796875, 112.23747253417969, 22.86920928955078, 88.13447570800781, 1.1695938110351562, 107.54226684570312, 56.35490798950195, 63.11317825317383, 12.574508666992188, 211.00782775878906, 87.66755676269531, 117.4601058959961, -153.4980926513672, 123.31031799316406, -33.6629524230957, 73.56385803222656, 45.84580993652344, 66.27784729003906, 210.21202087402344, -7.9666290283203125, 133.89459228515625, 140.57009887695312, 78.09113311767578, -34.19044494628906, 235.83444213867188, -32.807861328125, 95.872802734375, 59.91416931152344, 44.50946044921875, 185.27774047851562, 75.1634292602539, 12.111541748046875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000681.npy"}
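
A minimal sketch for sanity-checking one of the margin log records above. It assumes that `sample` holds every per-example margin in the batch (its length matches `batch_size` in the records shown) and that `pos_frac` is the fraction of strictly positive margins. The function name `summarize_margin_record` and the toy record are hypothetical and are not part of the repository. `std` and the percentiles are left out because the log does not say which estimator or interpolation rule produced them.

```python
import json
import statistics


def summarize_margin_record(line: str) -> dict:
    """Recompute simple summary statistics from one JSONL margin record."""
    record = json.loads(line)
    values = record["sample"]  # assumed to be the full batch of margins
    return {
        "step": record["step"],
        "n": len(values),
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
        # Assumption: pos_frac counts strictly positive margins.
        "pos_frac": sum(v > 0 for v in values) / len(values),
    }


# Hypothetical toy record with the same keys as the log lines above.
example = '{"step": 1, "batch_size": 4, "sample": [-1.0, 2.0, 3.0, 4.0]}'
summary = summarize_margin_record(example)
print(summary)
```

Comparing the recomputed values against the logged `mean`, `median`, `min`, `max`, and `pos_frac` fields is a quick way to confirm a log line was not truncated or corrupted; small floating-point differences in `mean` are to be expected if the logged value was computed in float32.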

151388
merges.txt Normal file

File diff suppressed because it is too large

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4d8c35d3d078b0d06864be672c8ee0fee43eb73a26596f161f302646b5f353d1
size 4972454376

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b39fa6a67f9202d733d62a06bf51464384fe8f01da6483930d485b13973b40c6
size 4832048608

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:47fcdebeb14578a7179e336e6229ec7507a15e8889d401347db946e25f1f9bbc
size 4832048656

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:86c7dec8fb24020e873a7757c12643c90cdacaf8f80e37f375e4d3d48cc7f519
size 4999855528

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b3a69e8c49b152bd3e20e15a3a51c73ebba12d77127ddebbf1a4eccfbb15d48b
size 4832048672

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:989673bd70819641a05b80e567fccb7458bcf86dbdb18280b32acf5d4f7cffad
size 4832048672

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:41e3b6aac9c9fd322b3b740c62a56f31c5856fd6df0eeab55b34610c7bf10bf5
size 3462482728

@@ -0,0 +1,406 @@
{
"metadata": {
"total_size": 32762941440
},
"weight_map": {
"lm_head.weight": "model-00007-of-00007.safetensors",
"model.embed_tokens.weight": "model-00001-of-00007.safetensors",
"model.layers.0.input_layernorm.weight": "model-00001-of-00007.safetensors",
"model.layers.0.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.0.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.0.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.0.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
"model.layers.0.self_attn.k_norm.weight": "model-00001-of-00007.safetensors",
"model.layers.0.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.0.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.0.self_attn.q_norm.weight": "model-00001-of-00007.safetensors",
"model.layers.0.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.0.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.1.input_layernorm.weight": "model-00001-of-00007.safetensors",
"model.layers.1.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.1.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.1.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.1.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
"model.layers.1.self_attn.k_norm.weight": "model-00001-of-00007.safetensors",
"model.layers.1.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.1.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.1.self_attn.q_norm.weight": "model-00001-of-00007.safetensors",
"model.layers.1.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.1.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.10.input_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.10.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.10.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.10.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.10.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.10.self_attn.k_norm.weight": "model-00003-of-00007.safetensors",
"model.layers.10.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.10.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.10.self_attn.q_norm.weight": "model-00003-of-00007.safetensors",
"model.layers.10.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.10.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.11.input_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.11.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.11.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.11.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.11.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.11.self_attn.k_norm.weight": "model-00003-of-00007.safetensors",
"model.layers.11.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.11.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.11.self_attn.q_norm.weight": "model-00003-of-00007.safetensors",
"model.layers.11.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.11.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.12.input_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.12.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.12.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.12.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.12.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.12.self_attn.k_norm.weight": "model-00003-of-00007.safetensors",
"model.layers.12.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.12.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.12.self_attn.q_norm.weight": "model-00003-of-00007.safetensors",
"model.layers.12.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.12.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.13.input_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.13.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.13.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.13.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.13.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.13.self_attn.k_norm.weight": "model-00003-of-00007.safetensors",
"model.layers.13.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.13.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.13.self_attn.q_norm.weight": "model-00003-of-00007.safetensors",
"model.layers.13.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.13.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.14.input_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.14.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.14.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.14.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.14.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.14.self_attn.k_norm.weight": "model-00003-of-00007.safetensors",
"model.layers.14.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.14.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.14.self_attn.q_norm.weight": "model-00003-of-00007.safetensors",
"model.layers.14.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.14.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.15.input_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.15.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.15.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.15.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.15.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.15.self_attn.k_norm.weight": "model-00003-of-00007.safetensors",
"model.layers.15.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.15.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.15.self_attn.q_norm.weight": "model-00003-of-00007.safetensors",
"model.layers.15.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.15.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.16.input_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.16.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.16.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.16.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.16.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.16.self_attn.k_norm.weight": "model-00004-of-00007.safetensors",
"model.layers.16.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.16.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.16.self_attn.q_norm.weight": "model-00004-of-00007.safetensors",
"model.layers.16.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.16.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.17.input_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.17.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.17.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.17.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.17.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.17.self_attn.k_norm.weight": "model-00004-of-00007.safetensors",
"model.layers.17.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.17.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.17.self_attn.q_norm.weight": "model-00004-of-00007.safetensors",
"model.layers.17.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.17.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.18.input_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.18.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.18.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.18.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.18.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.18.self_attn.k_norm.weight": "model-00004-of-00007.safetensors",
"model.layers.18.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.18.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.18.self_attn.q_norm.weight": "model-00004-of-00007.safetensors",
"model.layers.18.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.18.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.19.input_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.19.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.19.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.19.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.19.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.19.self_attn.k_norm.weight": "model-00004-of-00007.safetensors",
"model.layers.19.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.19.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.19.self_attn.q_norm.weight": "model-00004-of-00007.safetensors",
"model.layers.19.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.19.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.2.input_layernorm.weight": "model-00001-of-00007.safetensors",
"model.layers.2.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.2.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.2.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.2.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
"model.layers.2.self_attn.k_norm.weight": "model-00001-of-00007.safetensors",
"model.layers.2.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.2.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.2.self_attn.q_norm.weight": "model-00001-of-00007.safetensors",
"model.layers.2.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.2.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.20.input_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.20.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.20.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.20.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.20.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.20.self_attn.k_norm.weight": "model-00004-of-00007.safetensors",
"model.layers.20.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.20.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.20.self_attn.q_norm.weight": "model-00004-of-00007.safetensors",
"model.layers.20.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.20.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.21.input_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.21.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.21.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.21.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.21.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.21.self_attn.k_norm.weight": "model-00004-of-00007.safetensors",
"model.layers.21.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.21.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.21.self_attn.q_norm.weight": "model-00004-of-00007.safetensors",
"model.layers.21.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.21.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.22.input_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.22.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.22.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.22.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.22.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.22.self_attn.k_norm.weight": "model-00004-of-00007.safetensors",
"model.layers.22.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.22.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.22.self_attn.q_norm.weight": "model-00004-of-00007.safetensors",
"model.layers.22.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.22.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.23.input_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.23.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.23.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.23.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.23.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.23.self_attn.k_norm.weight": "model-00005-of-00007.safetensors",
"model.layers.23.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.23.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.23.self_attn.q_norm.weight": "model-00005-of-00007.safetensors",
"model.layers.23.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.23.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.24.input_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.24.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.24.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.24.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.24.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.24.self_attn.k_norm.weight": "model-00005-of-00007.safetensors",
"model.layers.24.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.24.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.24.self_attn.q_norm.weight": "model-00005-of-00007.safetensors",
"model.layers.24.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.24.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.25.input_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.25.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.25.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.25.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.25.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.25.self_attn.k_norm.weight": "model-00005-of-00007.safetensors",
"model.layers.25.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.25.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.25.self_attn.q_norm.weight": "model-00005-of-00007.safetensors",
"model.layers.25.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.25.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.26.input_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.26.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.26.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.26.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.26.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.26.self_attn.k_norm.weight": "model-00005-of-00007.safetensors",
"model.layers.26.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.26.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.26.self_attn.q_norm.weight": "model-00005-of-00007.safetensors",
"model.layers.26.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.26.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.27.input_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.27.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.27.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.27.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.27.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.27.self_attn.k_norm.weight": "model-00005-of-00007.safetensors",
"model.layers.27.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.27.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.27.self_attn.q_norm.weight": "model-00005-of-00007.safetensors",
"model.layers.27.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.27.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.28.input_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.28.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.28.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.28.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.28.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.28.self_attn.k_norm.weight": "model-00005-of-00007.safetensors",
"model.layers.28.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.28.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.28.self_attn.q_norm.weight": "model-00005-of-00007.safetensors",
"model.layers.28.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.28.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.29.input_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.29.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.29.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.29.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.29.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.29.self_attn.k_norm.weight": "model-00006-of-00007.safetensors",
"model.layers.29.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.29.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.29.self_attn.q_norm.weight": "model-00006-of-00007.safetensors",
"model.layers.29.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.29.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.3.input_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.3.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.3.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.3.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.3.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.3.self_attn.k_norm.weight": "model-00001-of-00007.safetensors",
"model.layers.3.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.3.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.3.self_attn.q_norm.weight": "model-00001-of-00007.safetensors",
"model.layers.3.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.3.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.30.input_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.30.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.30.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.30.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.30.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.30.self_attn.k_norm.weight": "model-00006-of-00007.safetensors",
"model.layers.30.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.30.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.30.self_attn.q_norm.weight": "model-00006-of-00007.safetensors",
"model.layers.30.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.30.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.31.input_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.31.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.31.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.31.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.31.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.31.self_attn.k_norm.weight": "model-00006-of-00007.safetensors",
"model.layers.31.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.31.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.31.self_attn.q_norm.weight": "model-00006-of-00007.safetensors",
"model.layers.31.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.31.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.32.input_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.32.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.32.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.32.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.32.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.32.self_attn.k_norm.weight": "model-00006-of-00007.safetensors",
"model.layers.32.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.32.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.32.self_attn.q_norm.weight": "model-00006-of-00007.safetensors",
"model.layers.32.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.32.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.33.input_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.33.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.33.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.33.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.33.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.33.self_attn.k_norm.weight": "model-00006-of-00007.safetensors",
"model.layers.33.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.33.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.33.self_attn.q_norm.weight": "model-00006-of-00007.safetensors",
"model.layers.33.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.33.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.34.input_layernorm.weight": "model-00007-of-00007.safetensors",
"model.layers.34.mlp.down_proj.weight": "model-00007-of-00007.safetensors",
"model.layers.34.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.34.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.34.post_attention_layernorm.weight": "model-00007-of-00007.safetensors",
"model.layers.34.self_attn.k_norm.weight": "model-00006-of-00007.safetensors",
"model.layers.34.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.34.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.34.self_attn.q_norm.weight": "model-00006-of-00007.safetensors",
"model.layers.34.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.34.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.35.input_layernorm.weight": "model-00007-of-00007.safetensors",
"model.layers.35.mlp.down_proj.weight": "model-00007-of-00007.safetensors",
"model.layers.35.mlp.gate_proj.weight": "model-00007-of-00007.safetensors",
"model.layers.35.mlp.up_proj.weight": "model-00007-of-00007.safetensors",
"model.layers.35.post_attention_layernorm.weight": "model-00007-of-00007.safetensors",
"model.layers.35.self_attn.k_norm.weight": "model-00007-of-00007.safetensors",
"model.layers.35.self_attn.k_proj.weight": "model-00007-of-00007.safetensors",
"model.layers.35.self_attn.o_proj.weight": "model-00007-of-00007.safetensors",
"model.layers.35.self_attn.q_norm.weight": "model-00007-of-00007.safetensors",
"model.layers.35.self_attn.q_proj.weight": "model-00007-of-00007.safetensors",
"model.layers.35.self_attn.v_proj.weight": "model-00007-of-00007.safetensors",
"model.layers.4.input_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.4.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.4.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.4.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.4.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.4.self_attn.k_norm.weight": "model-00002-of-00007.safetensors",
"model.layers.4.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.4.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.4.self_attn.q_norm.weight": "model-00002-of-00007.safetensors",
"model.layers.4.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.4.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.5.input_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.5.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.5.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.5.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.5.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.5.self_attn.k_norm.weight": "model-00002-of-00007.safetensors",
"model.layers.5.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.5.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.5.self_attn.q_norm.weight": "model-00002-of-00007.safetensors",
"model.layers.5.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.5.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.6.input_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.6.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.6.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.6.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.6.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.6.self_attn.k_norm.weight": "model-00002-of-00007.safetensors",
"model.layers.6.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.6.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.6.self_attn.q_norm.weight": "model-00002-of-00007.safetensors",
"model.layers.6.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.6.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.7.input_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.7.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.7.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.7.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.7.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.7.self_attn.k_norm.weight": "model-00002-of-00007.safetensors",
"model.layers.7.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.7.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.7.self_attn.q_norm.weight": "model-00002-of-00007.safetensors",
"model.layers.7.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.7.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.8.input_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.8.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.8.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.8.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.8.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.8.self_attn.k_norm.weight": "model-00002-of-00007.safetensors",
"model.layers.8.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.8.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.8.self_attn.q_norm.weight": "model-00002-of-00007.safetensors",
"model.layers.8.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.8.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.9.input_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.9.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.9.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.9.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.9.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.9.self_attn.k_norm.weight": "model-00002-of-00007.safetensors",
"model.layers.9.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.9.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.9.self_attn.q_norm.weight": "model-00002-of-00007.safetensors",
"model.layers.9.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.9.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
"model.norm.weight": "model-00007-of-00007.safetensors"
}
}
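
The index that ends above maps each tensor name to the shard file that stores it. Layers do not always sit in a single shard: layer 28, for example, keeps its attention weights and `mlp.gate_proj` in shard 5 while its norms and remaining MLP weights are in shard 6. A minimal Python sketch of looking this up, assuming the mapping has already been parsed into a dict; the `shards_for_layer` helper is hypothetical and the excerpt holds only a few entries copied from the listing:

```python
from collections import defaultdict

# Small excerpt of the tensor-to-shard mapping, copied from the listing above.
weight_map = {
    "model.layers.28.input_layernorm.weight": "model-00006-of-00007.safetensors",
    "model.layers.28.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
    "model.layers.28.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
    "model.layers.28.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
    "model.layers.29.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
}


def shards_for_layer(weight_map, layer):
    """Group the tensors of one decoder layer by the shard file holding them.

    Hypothetical helper for illustration; not part of the repository.
    """
    # The trailing dot keeps layer 2 from matching layers 20-29.
    prefix = f"model.layers.{layer}."
    by_shard = defaultdict(list)
    for name, shard in weight_map.items():
        if name.startswith(prefix):
            by_shard[shard].append(name[len(prefix):])
    return dict(by_shard)


layer_28 = shards_for_layer(weight_map, 28)
layer_29 = shards_for_layer(weight_map, 29)
```

With this excerpt, `layer_28` has two keys (shards 5 and 6), so reading layer 28 means opening both files, while `layer_29` has one.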

31
special_tokens_map.json Normal file

@@ -0,0 +1,31 @@
{
"additional_special_tokens": [
"<|im_start|>",
"<|im_end|>",
"<|object_ref_start|>",
"<|object_ref_end|>",
"<|box_start|>",
"<|box_end|>",
"<|quad_start|>",
"<|quad_end|>",
"<|vision_start|>",
"<|vision_end|>",
"<|vision_pad|>",
"<|image_pad|>",
"<|video_pad|>"
],
"eos_token": {
"content": "<|endoftext|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"pad_token": {
"content": "<|endoftext|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
}
}

BIN
tokenizer.json (Stored with Git LFS) Normal file

Binary file not shown.

240
tokenizer_config.json Normal file

@@ -0,0 +1,240 @@
{
"add_bos_token": false,
"add_prefix_space": false,
"added_tokens_decoder": {
"151643": {
"content": "<|endoftext|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151644": {
"content": "<|im_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151645": {
"content": "<|im_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151646": {
"content": "<|object_ref_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151647": {
"content": "<|object_ref_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151648": {
"content": "<|box_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151649": {
"content": "<|box_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151650": {
"content": "<|quad_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151651": {
"content": "<|quad_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151652": {
"content": "<|vision_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151653": {
"content": "<|vision_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151654": {
"content": "<|vision_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151655": {
"content": "<|image_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151656": {
"content": "<|video_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151657": {
"content": "<tool_call>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151658": {
"content": "</tool_call>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151659": {
"content": "<|fim_prefix|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151660": {
"content": "<|fim_middle|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151661": {
"content": "<|fim_suffix|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151662": {
"content": "<|fim_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151663": {
"content": "<|repo_name|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151664": {
"content": "<|file_sep|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151665": {
"content": "<tool_response>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151666": {
"content": "</tool_response>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151667": {
"content": "<think>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151668": {
"content": "</think>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
}
},
"additional_special_tokens": [
"<|im_start|>",
"<|im_end|>",
"<|object_ref_start|>",
"<|object_ref_end|>",
"<|box_start|>",
"<|box_end|>",
"<|quad_start|>",
"<|quad_end|>",
"<|vision_start|>",
"<|vision_end|>",
"<|vision_pad|>",
"<|image_pad|>",
"<|video_pad|>"
],
"bos_token": null,
"chat_template": "{%- if tools %}\n {{- '<|im_start|>system\\n' }}\n {%- if messages[0].role == 'system' %}\n {{- messages[0].content + '\\n\\n' }}\n {%- endif %}\n {{- \"# Tools\\n\\nYou may call one or more functions to assist with the user query.\\n\\nYou are provided with function signatures within <tools></tools> XML tags:\\n<tools>\" }}\n {%- for tool in tools %}\n {{- \"\\n\" }}\n {{- tool | tojson }}\n {%- endfor %}\n {{- \"\\n</tools>\\n\\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\\n<tool_call>\\n{\\\"name\\\": <function-name>, \\\"arguments\\\": <args-json-object>}\\n</tool_call><|im_end|>\\n\" }}\n{%- else %}\n {%- if messages[0].role == 'system' %}\n {{- '<|im_start|>system\\n' + messages[0].content + '<|im_end|>\\n' }}\n {%- endif %}\n{%- endif %}\n{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}\n{%- for message in messages[::-1] %}\n {%- set index = (messages|length - 1) - loop.index0 %}\n {%- if ns.multi_step_tool and message.role == \"user\" and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}\n {%- set ns.multi_step_tool = false %}\n {%- set ns.last_query_index = index %}\n {%- endif %}\n{%- endfor %}\n{%- for message in messages %}\n {%- if (message.role == \"user\") or (message.role == \"system\" and not loop.first) %}\n {{- '<|im_start|>' + message.role + '\\n' + message.content + '<|im_end|>' + '\\n' }}\n {%- elif message.role == \"assistant\" %}\n {%- set content = message.content %}\n {%- set reasoning_content = '' %}\n {%- if message.reasoning_content is defined and message.reasoning_content is not none %}\n {%- set reasoning_content = message.reasoning_content %}\n {%- else %}\n {%- if '</think>' in message.content %}\n {%- set content = message.content.split('</think>')[-1].lstrip('\\n') %}\n {%- set reasoning_content = message.content.split('</think>')[0].rstrip('\\n').split('<think>')[-1].lstrip('\\n') %}\n {%- endif %}\n {%- endif %}\n {%- if loop.index0 > ns.last_query_index %}\n {%- if loop.last or (not loop.last and reasoning_content) %}\n {{- '<|im_start|>' + message.role + '\\n<think>\\n' + reasoning_content.strip('\\n') + '\\n</think>\\n\\n' + content.lstrip('\\n') }}\n {%- else %}\n {{- '<|im_start|>' + message.role + '\\n' + content }}\n {%- endif %}\n {%- else %}\n {{- '<|im_start|>' + message.role + '\\n' + content }}\n {%- endif %}\n {%- if message.tool_calls %}\n {%- for tool_call in message.tool_calls %}\n {%- if (loop.first and content) or (not loop.first) %}\n {{- '\\n' }}\n {%- endif %}\n {%- if tool_call.function %}\n {%- set tool_call = tool_call.function %}\n {%- endif %}\n {{- '<tool_call>\\n{\"name\": \"' }}\n {{- tool_call.name }}\n {{- '\", \"arguments\": ' }}\n {%- if tool_call.arguments is string %}\n {{- tool_call.arguments }}\n {%- else %}\n {{- tool_call.arguments | tojson }}\n {%- endif %}\n {{- '}\\n</tool_call>' }}\n {%- endfor %}\n {%- endif %}\n {{- '<|im_end|>\\n' }}\n {%- elif message.role == \"tool\" %}\n {%- if loop.first or (messages[loop.index0 - 1].role != \"tool\") %}\n {{- '<|im_start|>user' }}\n {%- endif %}\n {{- '\\n<tool_response>\\n' }}\n {{- message.content }}\n {{- '\\n</tool_response>' }}\n {%- if loop.last or (messages[loop.index0 + 1].role != \"tool\") %}\n {{- '<|im_end|>\\n' }}\n {%- endif %}\n {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n {{- '<|im_start|>assistant\\n' }}\n {%- if enable_thinking is defined and enable_thinking is false %}\n {{- '<think>\\n\\n</think>\\n\\n' }}\n {%- endif %}\n{%- endif %}",
"clean_up_tokenization_spaces": false,
"eos_token": "<|endoftext|>",
"errors": "replace",
"extra_special_tokens": {},
"model_max_length": 2048,
"pad_token": "<|endoftext|>",
"split_special_tokens": false,
"tokenizer_class": "Qwen2Tokenizer",
"unk_token": null
}
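
For plain system and user turns with no tools, the `chat_template` above reduces to wrapping each message as `<|im_start|>role`, a newline, the content, then `<|im_end|>` and a newline, followed by an open assistant header when a generation prompt is requested. The sketch below reproduces only that simple path in pure Python. The `render_chatml` function is hypothetical and written for illustration; it deliberately leaves out assistant turns, tool calls, tool responses, and the `enable_thinking` branch that the real template handles.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Render system/user messages the way the template's simplest path does.

    Hypothetical helper: covers only the no-tools, no-assistant-turn case.
    """
    parts = []
    for message in messages:
        role = message["role"]
        if role not in ("system", "user"):
            # Assistant and tool turns need the template's extra logic.
            raise ValueError("this sketch only handles system and user messages")
        parts.append("<|im_start|>" + role + "\n" + message["content"] + "<|im_end|>\n")
    if add_generation_prompt:
        # Leaves the assistant turn open so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = render_chatml(
    [
        {"role": "system", "content": "Be brief."},
        {"role": "user", "content": "Hi"},
    ]
)
```

Note that `model_max_length` is set to 2048 in this config, so a prompt built this way may be truncated well before the base architecture's context limit, depending on how the tokenizer is called.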

9
train_results.json Normal file

@@ -0,0 +1,9 @@
{
"epoch": 1.0,
"total_flos": 0.0,
"train_loss": 0.9292974616462438,
"train_runtime": 2239.8819,
"train_samples": 43598,
"train_samples_per_second": 19.464,
"train_steps_per_second": 0.304
}
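
The throughput figures above can be cross-checked against each other. The sketch below recomputes samples per second from `train_samples` and `train_runtime`, and estimates the number of optimizer steps from `train_steps_per_second`. The batch size of 64 is an assumption taken only from the `batch-64` fragment of the model name; it is not stated in this file.

```python
# Values copied from the results file above.
train_samples = 43598
train_runtime = 2239.8819  # seconds
reported_steps_per_second = 0.304

# Samples per second follows directly from the two reported values.
samples_per_second = train_samples / train_runtime

# Approximate step count implied by the (rounded) reported step rate.
approx_steps = reported_steps_per_second * train_runtime

# Assumption: effective batch size 64, inferred from "batch-64" in the model name.
assumed_batch_size = 64
steps_from_batch = train_samples / assumed_batch_size

print(round(samples_per_second, 3), round(approx_steps), round(steps_from_batch, 1))
```

Both step estimates land near 681, which is consistent with one epoch over 43598 samples at the assumed batch size.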

13084
trainer_state.json Normal file

File diff suppressed because it is too large

1
vocab.json Normal file

File diff suppressed because one or more lines are too long