Initialize project; model provided by the ModelHub XC community

Model: jackf857/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85
Source: Original Platform
This commit is contained in:
ModelHub XC
2026-06-18 23:09:55 +08:00
commit 36da24c147
20 changed files with 16324 additions and 0 deletions

36
.gitattributes vendored Normal file

@@ -0,0 +1,36 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
tokenizer.json filter=lfs diff=lfs merge=lfs -text

78
README.md Normal file

@@ -0,0 +1,78 @@
---
library_name: transformers
base_model: W-61/llama-3-8b-base-sft-hh-helpful-4xh200
tags:
- alignment-handbook
- new-dpo
- generated_from_trainer
datasets:
- Anthropic/hh-rlhf
model-index:
- name: llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85
results: []
---
<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->
# llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85
This model is a fine-tuned version of [W-61/llama-3-8b-base-sft-hh-helpful-4xh200](https://huggingface.co/W-61/llama-3-8b-base-sft-hh-helpful-4xh200) on the Anthropic/hh-rlhf dataset.
It achieves the following results on the evaluation set:
- Loss: 0.5312
- Fcm Dpo/beta: 0.0040
- Margin Dpo/margin Mean: 149.2117
- Margin Dpo/margin Std: 231.6038
- Logps/chosen: -579.7042
- Logps/rejected: -736.6628
- Logps/ref Chosen: -79.0510
- Logps/ref Rejected: -86.7979
- Logits/chosen: 0.3259
- Logits/rejected: 0.3477
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 2
- total_train_batch_size: 64
- total_eval_batch_size: 32
- optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
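The total batch sizes listed above follow from the per-device sizes, the device count, and the gradient accumulation steps. A minimal sketch of that arithmetic, using hypothetical helper names that are not taken from the training code:

```python
def effective_train_batch_size(per_device: int, num_devices: int, grad_accum_steps: int) -> int:
    """Samples consumed per optimizer step across all devices."""
    return per_device * num_devices * grad_accum_steps


def effective_eval_batch_size(per_device: int, num_devices: int) -> int:
    """Samples evaluated per step; gradient accumulation does not apply to evaluation."""
    return per_device * num_devices


# Values copied from the hyperparameter list above.
total_train = effective_train_batch_size(per_device=8, num_devices=4, grad_accum_steps=2)
total_eval = effective_eval_batch_size(per_device=8, num_devices=4)
print(total_train, total_eval)
```

The two results should agree with `total_train_batch_size` and `total_eval_batch_size` as reported.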
### Training results
| Training Loss | Epoch | Step | Validation Loss | Fcm Dpo/beta | Margin Dpo/margin Mean | Margin Dpo/margin Std | Logps/chosen | Logps/rejected | Logps/ref Chosen | Logps/ref Rejected | Logits/chosen | Logits/rejected |
|:-------------:|:------:|:----:|:---------------:|:------------:|:----------------------:|:---------------------:|:------------:|:--------------:|:----------------:|:------------------:|:-------------:|:---------------:|
| 0.8901 | 0.2937 | 200 | 0.5793 | 0.0146 | 37.0480 | 73.5224 | -146.9453 | -191.7402 | -79.0510 | -86.7979 | -0.4808 | -0.4605 |
| 0.7153 | 0.5874 | 400 | 0.5444 | 0.0046 | 115.8499 | 187.1605 | -440.1346 | -563.7314 | -79.0510 | -86.7979 | 0.1299 | 0.1525 |
| 0.8902 | 0.8811 | 600 | 0.5312 | 0.0040 | 149.2117 | 231.6038 | -579.7042 | -736.6628 | -79.0510 | -86.7979 | 0.3259 | 0.3477 |
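The margin mean column appears to be recoverable from the log-probability columns. The sketch below assumes the margin is the policy-versus-reference log-ratio of the chosen response minus that of the rejected response. This definition is inferred from the numbers in the table, not confirmed by the training code, and the function name is hypothetical.

```python
# Rows copied from the table above:
# (step, logps_chosen, logps_rejected, ref_chosen, ref_rejected, reported_margin_mean)
ROWS = [
    (200, -146.9453, -191.7402, -79.0510, -86.7979, 37.0480),
    (400, -440.1346, -563.7314, -79.0510, -86.7979, 115.8499),
    (600, -579.7042, -736.6628, -79.0510, -86.7979, 149.2117),
]


def implied_margin(chosen: float, rejected: float, ref_chosen: float, ref_rejected: float) -> float:
    """Assumed definition: (chosen - ref_chosen) - (rejected - ref_rejected)."""
    return (chosen - ref_chosen) - (rejected - ref_rejected)


# Because the assumed margin is linear in the log-probabilities, the mean of
# per-sample margins equals the margin computed from the column means.
recomputed = {
    step: implied_margin(chosen, rejected, ref_c, ref_r)
    for step, chosen, rejected, ref_c, ref_r, _ in ROWS
}

for step, *_, reported in ROWS:
    print(f"step {step}: recomputed={recomputed[step]:.4f} reported={reported:.4f}")
```

Under this assumption the recomputed values agree with the reported column to the precision shown. Both mean log-probabilities fall far below the reference values as training proceeds, so the margin grows because the rejected responses lose likelihood faster than the chosen ones, not because the chosen ones gain any.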
### Framework versions
- Transformers 4.51.0
- PyTorch 2.3.1+cu121
- Datasets 2.21.0
- Tokenizers 0.21.4

23
all_results.json Normal file

@@ -0,0 +1,23 @@
{
"epoch": 1.0,
"eval_fcm_dpo/beta": 0.0035767327062785625,
"eval_logits/chosen": 0.3511611819267273,
"eval_logits/rejected": 0.37540391087532043,
"eval_logps/chosen": -578.7294311523438,
"eval_logps/ref_chosen": -79.05104064941406,
"eval_logps/ref_rejected": -86.79793548583984,
"eval_logps/rejected": -737.1654663085938,
"eval_loss": 0.5333206057548523,
"eval_margin_dpo/margin_mean": 150.68910217285156,
"eval_margin_dpo/margin_std": 227.09837341308594,
"eval_runtime": 39.2308,
"eval_samples": 2339,
"eval_samples_per_second": 59.622,
"eval_steps_per_second": 1.886,
"total_flos": 0.0,
"train_loss": 0.8810622134572784,
"train_runtime": 1867.9067,
"train_samples": 43598,
"train_samples_per_second": 23.341,
"train_steps_per_second": 0.365
}

29
config.json Normal file

@@ -0,0 +1,29 @@
{
"architectures": [
"LlamaForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 128000,
"eos_token_id": 128001,
"head_dim": 128,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 14336,
"max_position_embeddings": 8192,
"mlp_bias": false,
"model_type": "llama",
"num_attention_heads": 32,
"num_hidden_layers": 32,
"num_key_value_heads": 8,
"pretraining_tp": 1,
"rms_norm_eps": 1e-05,
"rope_scaling": null,
"rope_theta": 500000.0,
"tie_word_embeddings": false,
"torch_dtype": "float32",
"transformers_version": "4.51.0",
"use_cache": true,
"vocab_size": 128256
}

17
eval_results.json Normal file

@@ -0,0 +1,17 @@
{
"epoch": 1.0,
"eval_fcm_dpo/beta": 0.0035767327062785625,
"eval_logits/chosen": 0.3511611819267273,
"eval_logits/rejected": 0.37540391087532043,
"eval_logps/chosen": -578.7294311523438,
"eval_logps/ref_chosen": -79.05104064941406,
"eval_logps/ref_rejected": -86.79793548583984,
"eval_logps/rejected": -737.1654663085938,
"eval_loss": 0.5333206057548523,
"eval_margin_dpo/margin_mean": 150.68910217285156,
"eval_margin_dpo/margin_std": 227.09837341308594,
"eval_runtime": 39.2308,
"eval_samples": 2339,
"eval_samples_per_second": 59.622,
"eval_steps_per_second": 1.886
}

9
generation_config.json Normal file

@@ -0,0 +1,9 @@
{
"bos_token_id": 128000,
"do_sample": true,
"eos_token_id": 128001,
"max_length": 4096,
"temperature": 0.6,
"top_p": 0.9,
"transformers_version": "4.51.0"
}

681
margin_logs/margins.jsonl Normal file

@@ -0,0 +1,681 @@
{"epoch": 0.0, "step": 1, "batch_size": 64, "mean": -0.02287048101425171, "std": 0.42023447155952454, "min": -1.4034271240234375, "p10": -0.46674575805664065, "median": 0.04234886169433594, "p90": 0.4323463439941407, "max": 0.89263916015625, "pos_frac": 0.53125, "sample": [-0.06523895263671875, 0.436798095703125, 0.27811431884765625, -0.9194221496582031, 0.018890380859375, 0.20587158203125, 0.18878173828125, -0.3968696594238281, 0.26206207275390625, 0.2470550537109375, -0.040912628173828125, 0.4394989013671875, -0.44133758544921875, -0.39148712158203125, 0.2764854431152344, 0.89263916015625, -0.42584991455078125, -0.46125030517578125, -0.8638992309570312, -0.3508758544921875, 0.371368408203125, 0.887847900390625, -0.382904052734375, 0.36145782470703125, -0.4890003204345703, 0.052455902099609375, -0.036136627197265625, 0.23079299926757812, 0.2469482421875, 0.1643218994140625, -0.07129669189453125, 0.2790794372558594, 0.3637123107910156, -0.8916168212890625, 0.03298759460449219, -0.2790107727050781, -0.17860984802246094, 0.23892593383789062, 0.05171012878417969, -0.2564239501953125, -0.14655303955078125, 0.27777862548828125, 0.0810394287109375, -1.4034271240234375, -0.28739166259765625, -0.1489429473876953, 0.44918060302734375, 0.1693286895751953, 0.10933303833007812, -0.14766693115234375, -0.40944671630859375, -0.18532562255859375, 0.6261310577392578, -0.20856857299804688, 0.602569580078125, 0.05538177490234375, 0.1505279541015625, 0.1313800811767578, -0.006317138671875, 0.42195892333984375, -0.29936981201171875, -0.4691009521484375, 0.16705322265625, -0.5789260864257812], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000001.npy"}
{"epoch": 0.0014684287812041115, "step": 2, "batch_size": 64, "mean": -0.06572240591049194, "std": 0.3523969054222107, "min": -0.9291305541992188, "p10": -0.46334152221679686, "median": -0.05502510070800781, "p90": 0.3672500610351563, "max": 1.0444793701171875, "pos_frac": 0.4375, "sample": [-0.2829437255859375, 0.3027191162109375, -0.19867706298828125, -0.3062286376953125, 0.10318756103515625, 0.20131683349609375, -0.34906005859375, 0.2802886962890625, 0.1914520263671875, -0.31072998046875, 0.08922195434570312, 0.10284614562988281, -0.03655242919921875, -0.0604095458984375, -0.06208038330078125, 0.32562255859375, -0.37982177734375, 0.2746162414550781, -0.049640655517578125, 0.3752174377441406, -0.103973388671875, 0.0699462890625, 0.36417388916015625, -0.033428192138671875, 0.37265777587890625, -0.3787078857421875, -0.6610565185546875, 0.4720420837402344, 0.47701263427734375, -0.27928924560546875, -0.44719696044921875, -0.0965118408203125, -0.7628555297851562, 0.046764373779296875, 0.06670379638671875, -0.9291305541992188, -0.7122802734375, -0.16554832458496094, 0.1485595703125, -0.07539939880371094, 0.2588920593261719, 0.039890289306640625, 0.201690673828125, 0.0623016357421875, 1.0444793701171875, -0.37696075439453125, -0.02794647216796875, -0.223297119140625, -0.35730743408203125, -0.1309051513671875, -0.3106689453125, -0.11409187316894531, -0.1669769287109375, 0.131317138671875, -0.2361297607421875, 0.4093780517578125, -0.6485977172851562, 0.36856842041015625, -0.1951904296875, -0.4702606201171875, -0.7624168395996094, 0.008928298950195312, -0.31630706787109375, 0.022550582885742188], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000002.npy"}
{"epoch": 0.002936857562408223, "step": 3, "batch_size": 64, "mean": 0.009388208389282227, "std": 0.28488123416900635, "min": -0.5578231811523438, "p10": -0.34405670166015623, "median": -0.038600921630859375, "p90": 0.38259963989257817, "max": 0.7552642822265625, "pos_frac": 0.4375, "sample": [-0.1851177215576172, -0.40454864501953125, -0.19455718994140625, 0.05818939208984375, -0.25875282287597656, -0.16633033752441406, 0.23579978942871094, 0.2530097961425781, 0.5445556640625, 0.009889602661132812, -0.01627349853515625, -0.10354042053222656, -0.2176971435546875, -0.1152496337890625, -0.1434955596923828, -0.408447265625, 0.230865478515625, -0.16727828979492188, 0.11531829833984375, -0.0381927490234375, -0.3487396240234375, 0.607452392578125, 0.064697265625, 0.123046875, 0.4116058349609375, -0.092315673828125, 0.13521575927734375, 0.35221099853515625, 0.1404590606689453, -0.18487548828125, 0.22316741943359375, 0.5703125, -0.04603004455566406, -0.172393798828125, -0.007045745849609375, -0.21282196044921875, -0.5578231811523438, -0.4862823486328125, 0.3663482666015625, -0.03900909423828125, 0.19307327270507812, -0.14304351806640625, -0.16375732421875, 0.284759521484375, 0.2542724609375, -0.06398963928222656, 0.14159393310546875, 0.2613182067871094, -0.11353302001953125, -0.31179046630859375, 0.1640605926513672, -0.3802165985107422, 0.4802055358886719, 0.2588348388671875, -0.1695842742919922, 0.38956451416015625, -0.04706573486328125, -0.5288925170898438, -0.00555419921875, 0.7552642822265625, -0.09318351745605469, -0.3331298828125, 0.03462982177734375, -0.13831710815429688], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000003.npy"}
{"epoch": 0.004405286343612335, "step": 4, "batch_size": 64, "mean": -0.04822826385498047, "std": 0.45266056060791016, "min": -1.5372314453125, "p10": -0.5598731994628906, "median": -0.01579570770263672, "p90": 0.4012956619262697, "max": 1.3105926513671875, "pos_frac": 0.5, "sample": [-0.2571525573730469, -0.12437820434570312, -0.4063873291015625, 0.22118377685546875, -0.068878173828125, 0.28668975830078125, 0.6942901611328125, -1.5372314453125, -0.7919979095458984, -0.6689376831054688, 0.011333465576171875, 0.3090858459472656, 0.252471923828125, -0.2560577392578125, -0.5504150390625, 0.045009613037109375, -0.48236083984375, -0.09471893310546875, 0.14928436279296875, 0.0045070648193359375, -0.9766769409179688, -0.04454803466796875, -0.155364990234375, -0.5639266967773438, -0.12810516357421875, -0.285186767578125, 0.4510498046875, 0.24947357177734375, -0.08808135986328125, -0.2989501953125, 0.0014476776123046875, 0.10707473754882812, -0.422607421875, 0.032901763916015625, 0.16542625427246094, 0.4208965301513672, -0.5365447998046875, 0.0121612548828125, 0.12916946411132812, 0.7624053955078125, 0.24300384521484375, -0.12639617919921875, 0.34405517578125, 0.355560302734375, -0.2748146057128906, 0.2820777893066406, 0.115509033203125, 0.702178955078125, -0.03975868225097656, -0.4715576171875, 0.3024444580078125, 0.24290847778320312, 0.2862586975097656, -0.8917007446289062, -0.033039093017578125, 0.6132698059082031, 1.3105926513671875, -0.23751068115234375, 0.041900634765625, 0.0184173583984375, -0.72674560546875, -0.28997039794921875, -0.350128173828125, -0.07051849365234375], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000004.npy"}
{"epoch": 0.005873715124816446, "step": 5, "batch_size": 64, "mean": 0.01697593927383423, "std": 0.3499475419521332, "min": -0.7906570434570312, "p10": -0.44270954132080076, "median": 0.030420303344726562, "p90": 0.39948959350585944, "max": 0.9911956787109375, "pos_frac": 0.53125, "sample": [-0.0181427001953125, 0.40520477294921875, -0.21201324462890625, 0.18061065673828125, 0.5753707885742188, -0.1158599853515625, 0.48572540283203125, -0.18075180053710938, 0.12430953979492188, -0.17308807373046875, -0.5694427490234375, -0.05120849609375, 0.12969398498535156, 0.02570343017578125, 0.2815361022949219, 0.156951904296875, 0.27785491943359375, -0.3466472625732422, 0.095611572265625, -0.2638092041015625, -0.04306602478027344, 0.37831878662109375, 0.6563758850097656, 0.10047149658203125, 0.20947265625, 0.013336181640625, 0.3404998779296875, -0.3858146667480469, -0.7906570434570312, 0.3861541748046875, -0.33617591857910156, 0.3269538879394531, 0.9911956787109375, 0.1442108154296875, -0.43588829040527344, 0.085296630859375, 0.510833740234375, 0.19445419311523438, -0.16400146484375, 0.1306610107421875, -0.23486328125, 0.15425872802734375, -0.11827850341796875, -0.4456329345703125, 0.34716033935546875, -0.5464019775390625, -0.21573257446289062, -0.412200927734375, -0.091766357421875, -0.5723114013671875, -0.11967658996582031, -0.00167083740234375, -0.14682769775390625, 0.21824264526367188, 0.3319740295410156, 0.16887664794921875, 0.356842041015625, -0.06191253662109375, -0.0512542724609375, 0.035137176513671875, -0.6589813232421875, 0.5013427734375, -0.6724090576171875, 0.20230484008789062], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000005.npy"}
{"epoch": 0.007342143906020558, "step": 6, "batch_size": 64, "mean": 0.010399848222732544, "std": 0.3816351890563965, "min": -0.7031402587890625, "p10": -0.5422424316406249, "median": 0.0093994140625, "p90": 0.4537620544433595, "max": 0.922332763671875, "pos_frac": 0.5, "sample": [-0.45447540283203125, -0.08581924438476562, 0.061305999755859375, -0.0177764892578125, 0.1698150634765625, 0.838165283203125, -0.6897640228271484, -0.569488525390625, 0.1896209716796875, -0.6470947265625, 0.1257781982421875, 0.922332763671875, -0.67852783203125, 0.12175369262695312, -0.4097175598144531, -0.7031402587890625, -0.02962493896484375, -0.23130226135253906, -0.02910614013671875, 0.3904380798339844, -0.14720916748046875, 0.1301422119140625, -0.169708251953125, -0.1938934326171875, -0.13327789306640625, -0.1012420654296875, 0.3443145751953125, 0.7983627319335938, 0.1420440673828125, 0.09947967529296875, -0.478668212890625, 0.21090126037597656, 0.42591094970703125, 0.0766143798828125, 0.11228179931640625, -0.0025787353515625, -0.670654296875, -0.40238189697265625, 0.30268096923828125, -0.08853912353515625, -0.377471923828125, -0.3121795654296875, -0.010772705078125, 0.2259979248046875, 0.2893829345703125, -0.06041908264160156, 0.23539352416992188, 0.12461090087890625, -0.0764312744140625, -0.18709182739257812, -0.6215057373046875, 0.4901275634765625, -0.3122978210449219, 0.28644752502441406, 0.4656982421875, -0.25186920166015625, 0.388458251953125, 0.470977783203125, 0.3665771484375, 0.6907958984375, -0.2423095703125, 0.11550140380859375, 0.41864013671875, 0.0213775634765625], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000006.npy"}
{"epoch": 0.00881057268722467, "step": 7, "batch_size": 64, "mean": 0.042353659868240356, "std": 0.4169289767742157, "min": -0.869659423828125, "p10": -0.4642112731933593, "median": -0.04889678955078125, "p90": 0.5722251892089844, "max": 1.0817337036132812, "pos_frac": 0.46875, "sample": [0.03148651123046875, -0.15866851806640625, 0.09681320190429688, 0.37587738037109375, -0.7756576538085938, 0.2044830322265625, -0.052074432373046875, -0.145355224609375, -0.4844970703125, 0.7711334228515625, -0.869659423828125, 0.17836761474609375, -0.1456146240234375, 0.9419097900390625, -0.09661865234375, 0.0921630859375, 0.06692314147949219, -0.3977794647216797, -0.2406940460205078, -0.17890548706054688, -0.14165115356445312, 0.5568389892578125, 0.11682891845703125, -0.10100555419921875, 0.56549072265625, -0.09775924682617188, -0.18148040771484375, -0.18965721130371094, -0.0066070556640625, -0.3326301574707031, 0.48415374755859375, 0.06340789794921875, 0.17546844482421875, 0.755645751953125, 0.1393585205078125, -0.08510971069335938, -0.12712860107421875, 0.27161407470703125, -0.6515655517578125, -0.4035072326660156, -0.4886627197265625, -0.086700439453125, 0.3230438232421875, -0.41687774658203125, -0.05156707763671875, 0.5201416015625, 0.5751113891601562, 0.31824493408203125, -0.3979644775390625, -0.04622650146484375, 0.4019012451171875, -0.71551513671875, -0.34560394287109375, 1.0817337036132812, 0.5107040405273438, -0.0529937744140625, -0.12923431396484375, 0.36295318603515625, 0.8038959503173828, 0.12233734130859375, 0.6116561889648438, 0.3726348876953125, -0.4941864013671875, -0.092529296875], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000007.npy"}
{"epoch": 0.010279001468428781, "step": 8, "batch_size": 64, "mean": -0.05177316069602966, "std": 0.39431923627853394, "min": -1.7095947265625, "p10": -0.39315032958984375, "median": -0.037555694580078125, "p90": 0.4091590881347659, "max": 1.0687026977539062, "pos_frac": 0.421875, "sample": [-0.1796875, 0.30267333984375, -0.1782684326171875, 0.07019424438476562, -0.5732879638671875, 0.07734298706054688, 0.46915435791015625, 0.16086578369140625, 0.2996063232421875, -0.033916473388671875, -0.25251007080078125, -0.356414794921875, -0.13872909545898438, 0.09006690979003906, 0.44019317626953125, -0.2931060791015625, 0.016254425048828125, 0.4892463684082031, 0.2490081787109375, -0.29479217529296875, -0.03569793701171875, -1.7095947265625, -0.18824005126953125, 0.9436492919921875, -0.17342185974121094, -0.40001678466796875, -0.34020233154296875, -0.25565147399902344, 0.467987060546875, -0.493988037109375, -0.07691192626953125, -0.37712860107421875, -0.0632171630859375, 0.49853515625, 0.09186553955078125, -0.23238372802734375, -0.02081298828125, 0.09164047241210938, 0.04820060729980469, 1.0687026977539062, -0.14101791381835938, 0.3367462158203125, -0.1044921875, -0.191986083984375, 0.18000221252441406, -0.53570556640625, -0.885040283203125, 0.1340789794921875, 0.09700965881347656, 0.038055419921875, 0.07837677001953125, -0.21240234375, 0.0056915283203125, -0.10564422607421875, 0.186492919921875, -0.08275222778320312, -0.0331878662109375, -0.7226333618164062, -0.0061492919921875, -0.2961273193359375, -0.20636367797851562, -0.13323402404785156, 0.11900711059570312, -0.0394134521484375], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000008.npy"}
{"epoch": 0.011747430249632892, "step": 9, "batch_size": 64, "mean": -0.041812002658843994, "std": 0.4325736165046692, "min": -1.039306640625, "p10": -0.6003341674804688, "median": -0.045319557189941406, "p90": 0.5164672851562501, "max": 1.12310791015625, "pos_frac": 0.4375, "sample": [0.0433349609375, -0.19998931884765625, 0.7397670745849609, -0.0221405029296875, 0.10559844970703125, -0.2219696044921875, -0.35308074951171875, 0.16094970703125, 0.08971405029296875, 0.6305160522460938, -0.42975616455078125, -1.039306640625, -0.4172210693359375, -0.7225723266601562, -0.555206298828125, -0.100006103515625, 0.49835205078125, -0.3889617919921875, -0.3807525634765625, -0.5074691772460938, -0.6066970825195312, 0.042934417724609375, 0.14746856689453125, 0.45705413818359375, 0.18453216552734375, 0.1034088134765625, 0.5372428894042969, -0.753875732421875, -0.22394752502441406, 0.12636566162109375, -0.044155120849609375, -0.015653610229492188, 0.2514495849609375, -0.3381500244140625, -0.0787200927734375, -0.8576126098632812, -0.049541473388671875, -0.04648399353027344, 0.588287353515625, -0.3372764587402344, -0.5854873657226562, -0.17400360107421875, -0.8063507080078125, -0.09567070007324219, -0.13214111328125, 0.52423095703125, -0.1425933837890625, -0.1351165771484375, 0.27032470703125, 0.1566638946533203, 0.256622314453125, -0.1008148193359375, 0.34698486328125, 0.3391914367675781, 0.41850852966308594, 1.12310791015625, 0.0909881591796875, 0.0349273681640625, -0.03942680358886719, 1.045989990234375, -0.6133956909179688, -0.13824462890625, -0.36773681640625, 0.03104400634765625], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000009.npy"}
{"epoch": 0.013215859030837005, "step": 10, "batch_size": 64, "mean": 0.017669767141342163, "std": 0.4249260127544403, "min": -1.870452880859375, "p10": -0.32405776977539064, "median": 0.051853179931640625, "p90": 0.5007177352905275, "max": 1.066986083984375, "pos_frac": 0.546875, "sample": [-0.2081756591796875, 0.214385986328125, 0.1437397003173828, 0.16422080993652344, -0.3572425842285156, 0.3284912109375, 0.0214996337890625, 0.66802978515625, 0.3575897216796875, -0.3225250244140625, 0.1438140869140625, 0.3045539855957031, -0.32471466064453125, 0.2257232666015625, -1.0439605712890625, -0.28143310546875, -0.3154869079589844, -0.27508544921875, 0.26009178161621094, 0.07209205627441406, 0.04894256591796875, -0.18194198608398438, 0.37605857849121094, 0.1482982635498047, -0.3616485595703125, -0.24257278442382812, -0.29891204833984375, 0.2334136962890625, 0.4427204132080078, -0.021900177001953125, -0.5775909423828125, -0.21319961547851562, 0.06461906433105469, -0.17588043212890625, -0.11531829833984375, -0.06525421142578125, -1.870452880859375, 0.7733650207519531, 0.21472549438476562, -0.2844352722167969, 0.20262718200683594, -0.12444305419921875, -0.32065391540527344, 0.275177001953125, -0.19890594482421875, -0.1395263671875, 0.5464630126953125, 0.52557373046875, 0.0547637939453125, 0.6123580932617188, -0.03925323486328125, 0.2815093994140625, 0.101654052734375, 0.3744392395019531, 0.1529388427734375, -0.12809181213378906, 0.3386383056640625, 0.24080657958984375, -0.46175384521484375, 0.545867919921875, 1.066986083984375, -0.2933177947998047, -0.16456031799316406, 0.0129241943359375], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000010.npy"}
{"epoch": 0.014684287812041116, "step": 11, "batch_size": 64, "mean": 0.026336759328842163, "std": 0.34159544110298157, "min": -0.8680419921875, "p10": -0.3608335494995117, "median": 0.00046825408935546875, "p90": 0.4314411163330081, "max": 0.8780059814453125, "pos_frac": 0.5, "sample": [-0.21158599853515625, 0.0653839111328125, 0.4610252380371094, 0.1550006866455078, -0.3230438232421875, -0.17247772216796875, 0.32332611083984375, 0.0061359405517578125, -0.11227989196777344, 0.1342926025390625, -0.22371864318847656, -0.1451263427734375, -0.007541656494140625, 0.340728759765625, 0.3229217529296875, 0.34740447998046875, -0.039886474609375, -0.3669624328613281, 0.77593994140625, 0.12139892578125, 0.24231529235839844, 0.35208892822265625, -0.34653282165527344, 0.1626873016357422, -0.14339447021484375, 0.18594741821289062, 0.2878875732421875, -0.4580078125, -0.2664031982421875, 0.06270217895507812, 0.655426025390625, 0.347869873046875, -0.18499755859375, -0.8680419921875, -0.1203460693359375, 0.6252899169921875, -0.00885772705078125, 0.8780059814453125, -0.6695785522460938, -0.3899250030517578, -0.2906074523925781, -0.3923301696777344, 0.32413482666015625, -0.286834716796875, 0.2467041015625, 0.21303939819335938, 0.3624114990234375, -0.1258678436279297, 0.16326904296875, 0.03455352783203125, -0.23018836975097656, -0.04688262939453125, -0.0810699462890625, -0.5847396850585938, 0.5333290100097656, 0.13375282287597656, -0.15382766723632812, -0.049655914306640625, 0.06948089599609375, 0.57440185546875, -0.33599090576171875, 0.01373291015625, -0.19513320922851562, -0.005199432373046875], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000011.npy"}
{"epoch": 0.016152716593245228, "step": 12, "batch_size": 64, "mean": 0.01722833514213562, "std": 0.39411911368370056, "min": -0.8962249755859375, "p10": -0.47024078369140626, "median": 0.01251983642578125, "p90": 0.513765525817871, "max": 0.7334747314453125, "pos_frac": 0.515625, "sample": [0.561859130859375, 0.223663330078125, 0.7334747314453125, -0.8962249755859375, -0.322784423828125, -0.0832977294921875, -0.20318984985351562, 0.5146713256835938, 0.14360809326171875, 0.4996681213378906, 0.6181411743164062, -0.690216064453125, -0.719482421875, 0.321319580078125, -0.16829681396484375, -0.016632080078125, 0.4289741516113281, 0.358978271484375, 0.08749771118164062, -0.44417572021484375, -0.7732696533203125, -0.28131103515625, 0.339080810546875, -0.46321868896484375, -0.4633941650390625, -0.1412506103515625, -0.1328277587890625, 0.503692626953125, -0.2214031219482422, -0.3907928466796875, -0.5390472412109375, 0.3802947998046875, -0.038700103759765625, 0.011571884155273438, -0.198028564453125, -0.13120651245117188, 0.013467788696289062, -0.473175048828125, -0.40570068359375, -0.2341156005859375, -0.1704254150390625, 0.023433685302734375, 0.04953575134277344, 0.18224716186523438, -0.39300537109375, 0.43227386474609375, 0.452880859375, 0.3638801574707031, 0.1916637420654297, 0.2584686279296875, 0.08807182312011719, 0.5116519927978516, 0.19617462158203125, 0.14296913146972656, 0.6959228515625, -0.506805419921875, 0.5689239501953125, 0.049152374267578125, -0.03948974609375, -0.1434154510498047, 0.5871429443359375, -0.023303985595703125, -0.20794677734375, 0.4843902587890625], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000012.npy"}
{"epoch": 0.01762114537444934, "step": 13, "batch_size": 64, "mean": 0.010488033294677734, "std": 0.382878839969635, "min": -0.8511962890625, "p10": -0.39113159179687496, "median": -0.027861595153808594, "p90": 0.49131355285644557, "max": 1.2947845458984375, "pos_frac": 0.4375, "sample": [0.2512359619140625, -0.12474632263183594, -0.3407745361328125, -0.4670257568359375, -0.3653106689453125, 0.5163612365722656, 0.25557708740234375, 0.13442611694335938, 0.16827392578125, -0.24564743041992188, -0.26227569580078125, 0.6515884399414062, -0.1146087646484375, -0.2080535888671875, 0.04549217224121094, 1.2947845458984375, -0.36895751953125, 0.42352294921875, 0.014928817749023438, 0.241455078125, -0.18131637573242188, -0.26610565185546875, 0.43286895751953125, -0.1814289093017578, 0.5359706878662109, -0.833831787109375, -0.10030746459960938, -0.2550067901611328, -0.28447532653808594, -0.404144287109375, -0.20050430297851562, 0.15418243408203125, 0.100067138671875, 0.053791046142578125, -0.8511962890625, -0.02433013916015625, -0.1624298095703125, 0.24528884887695312, 0.41228294372558594, -0.227142333984375, 0.5475807189941406, -0.03139305114746094, -0.400634765625, -0.04468536376953125, 0.22756195068359375, -0.19223403930664062, 0.08820533752441406, -0.008638381958007812, 0.0478515625, -0.19379425048828125, -0.096527099609375, 0.229705810546875, -0.01345062255859375, -0.012010574340820312, 0.6719131469726562, 0.3084716796875, 0.12965774536132812, -0.23541259765625, 1.140625, -0.4019889831542969, -0.1714019775390625, -0.4286651611328125, 0.25624847412109375, -0.20822906494140625], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000013.npy"}
{"epoch": 0.01908957415565345, "step": 14, "batch_size": 64, "mean": 0.020410001277923584, "std": 0.39739495515823364, "min": -0.9185791015625, "p10": -0.5051288604736328, "median": 0.05271720886230469, "p90": 0.47166519165039067, "max": 0.8568496704101562, "pos_frac": 0.5625, "sample": [-0.4218597412109375, 0.22768020629882812, 0.0045566558837890625, -0.5081901550292969, 0.4751739501953125, -0.5644302368164062, -0.49798583984375, -0.24547576904296875, 0.362213134765625, 0.5510826110839844, 0.46347808837890625, -0.2762451171875, 0.3641357421875, 0.8568496704101562, 0.031223297119140625, -0.44953155517578125, -0.2707958221435547, -0.29338836669921875, 0.14744186401367188, -0.1364879608154297, 0.3825492858886719, -0.3930511474609375, 0.1078948974609375, -0.1215972900390625, 0.425872802734375, 0.17849349975585938, -0.9185791015625, -0.15254974365234375, 0.461395263671875, 0.6630096435546875, -0.20262908935546875, -0.1501483917236328, 0.64886474609375, 0.24499130249023438, 0.020587921142578125, 0.38820648193359375, 0.2770957946777344, 0.032562255859375, -0.25244903564453125, 0.12463951110839844, -0.022003173828125, -0.5649948120117188, -0.0242767333984375, 0.2387847900390625, 0.07287216186523438, -0.052410125732421875, 0.41363525390625, -0.7339210510253906, 0.263397216796875, 0.21698760986328125, -0.8184051513671875, 0.2170257568359375, 0.17887115478515625, -0.2678871154785156, -0.2782096862792969, 0.5427398681640625, 0.24063873291015625, 0.8315277099609375, -0.7089500427246094, -0.08901214599609375, -0.3772449493408203, 0.10226249694824219, 0.1001434326171875, 0.2400646209716797], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000014.npy"}
{"epoch": 0.020558002936857563, "step": 15, "batch_size": 64, "mean": 0.02144584059715271, "std": 0.3367857038974762, "min": -1.258544921875, "p10": -0.42457275390625, "median": 0.08524322509765625, "p90": 0.39294242858886724, "max": 0.5288658142089844, "pos_frac": 0.640625, "sample": [0.15075111389160156, 0.4368896484375, 0.08232688903808594, 0.2857227325439453, -0.13166046142578125, 0.11521530151367188, 0.012514114379882812, 0.5001678466796875, 0.07698822021484375, 0.30329132080078125, -0.43321990966796875, -0.2518596649169922, 0.1469268798828125, -0.40439605712890625, 0.5288658142089844, 0.062713623046875, 0.2684783935546875, -0.444244384765625, 0.10509300231933594, 0.1605987548828125, 0.5208587646484375, -1.258544921875, 0.0248870849609375, 0.38603973388671875, -0.6622829437255859, 0.12556076049804688, 0.31903839111328125, 0.4161376953125, -0.7510604858398438, 0.2940673828125, -0.09098052978515625, -0.545654296875, -0.2365264892578125, -0.036548614501953125, -0.320404052734375, 0.5160369873046875, -0.2536468505859375, 0.2005615234375, 0.1210784912109375, -0.06554794311523438, 0.013830184936523438, -0.0654296875, 0.055118560791015625, 0.0153961181640625, -0.0076141357421875, -0.016241073608398438, 0.10579872131347656, -0.1852874755859375, 0.15238189697265625, 0.21895980834960938, 0.35807037353515625, 0.3292503356933594, 0.1398773193359375, -0.3416748046875, 0.21148681640625, 0.25901031494140625, -0.17388916015625, -0.5498199462890625, 0.19602203369140625, 0.0466766357421875, 0.08815956115722656, -0.37633514404296875, 0.3959007263183594, 0.2286529541015625], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000015.npy"}
{"epoch": 0.022026431718061675, "step": 16, "batch_size": 64, "mean": 0.020204484462738037, "std": 0.37695881724357605, "min": -0.9639739990234375, "p10": -0.4106306076049804, "median": -0.0019683837890625, "p90": 0.4428176879882813, "max": 1.093017578125, "pos_frac": 0.5, "sample": [0.21337127685546875, 1.093017578125, -0.5573348999023438, -0.213958740234375, 0.095367431640625, 0.3429832458496094, 0.6995849609375, 0.01598358154296875, 0.17409515380859375, 0.4491119384765625, 0.013153076171875, 0.1300506591796875, -0.1379852294921875, 0.31592559814453125, -0.04485511779785156, -0.01267242431640625, -0.32009124755859375, -0.1806640625, -0.735504150390625, 0.2841949462890625, 0.17494964599609375, 0.36295509338378906, 0.67279052734375, -0.11284637451171875, 0.06777381896972656, -0.08243179321289062, -0.708526611328125, -0.5128669738769531, -0.15302467346191406, -0.1976146697998047, 0.1688232421875, 0.248291015625, 0.22762298583984375, -0.19911956787109375, 0.01116943359375, 0.7673797607421875, -0.09369659423828125, -0.9639739990234375, -0.13344573974609375, -0.10191154479980469, 0.11640548706054688, -0.0724945068359375, -0.10841751098632812, -0.064788818359375, 0.2407684326171875, 0.5899925231933594, -0.15635108947753906, -0.4378223419189453, 0.00873565673828125, 0.2566680908203125, -0.20186233520507812, 0.1884002685546875, -0.018035888671875, -0.12359809875488281, 0.24514007568359375, -0.3471832275390625, -0.1792430877685547, 0.428131103515625, 0.6089859008789062, 0.2873382568359375, -0.2134246826171875, -0.123138427734375, 0.203216552734375, -0.9004058837890625], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000016.npy"}
{"epoch": 0.023494860499265784, "step": 17, "batch_size": 64, "mean": 0.09094801545143127, "std": 0.4269859790802002, "min": -1.137451171875, "p10": -0.46690788269042965, "median": 0.17073631286621094, "p90": 0.5522102355957033, "max": 1.0062255859375, "pos_frac": 0.625, "sample": [-0.3350791931152344, 0.2189922332763672, -0.08092498779296875, 0.03328704833984375, -0.48162841796875, -0.20517730712890625, 0.4566802978515625, -1.137451171875, 0.532989501953125, -0.23651885986328125, 0.9690780639648438, 0.2764434814453125, 0.660064697265625, 0.286895751953125, -0.06258201599121094, 0.35065460205078125, -0.5577049255371094, 0.18657875061035156, 0.08080482482910156, 0.715667724609375, 0.18907546997070312, 0.3136310577392578, -0.14027023315429688, 0.33683013916015625, 0.25215911865234375, 0.16765594482421875, 0.2099781036376953, 0.09662628173828125, 0.3901214599609375, 0.19116592407226562, -0.662506103515625, 0.7616424560546875, 0.1081085205078125, -0.1535186767578125, 0.18781471252441406, 0.39842987060546875, 0.497283935546875, -0.05137443542480469, -0.03664970397949219, -0.574462890625, 0.113037109375, 0.0211334228515625, 0.34682655334472656, 0.2372112274169922, -0.5306549072265625, -0.0126495361328125, -0.4325599670410156, 0.25399208068847656, -0.28688812255859375, 0.01782989501953125, -0.37998199462890625, -0.060749053955078125, 0.4775848388671875, -0.1307373046875, -0.02646636962890625, 0.78631591796875, 1.0062255859375, 0.5275039672851562, 0.3236083984375, -1.0836029052734375, 0.1983203887939453, -0.43170166015625, 0.5604476928710938, 0.17381668090820312], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000017.npy"}
{"epoch": 0.024963289280469897, "step": 18, "batch_size": 64, "mean": -0.02423509955406189, "std": 0.28429028391838074, "min": -0.7091064453125, "p10": -0.4207092285156249, "median": -0.01544189453125, "p90": 0.3058504104614258, "max": 0.652801513671875, "pos_frac": 0.453125, "sample": [-0.2791786193847656, -0.606109619140625, 0.2309722900390625, 0.20337677001953125, 0.06270980834960938, -0.3666839599609375, -0.0421295166015625, -0.7091064453125, -0.08416748046875, 0.3052482604980469, -0.5941925048828125, -0.084381103515625, -0.07099151611328125, -0.17081451416015625, -0.0727386474609375, -0.02201080322265625, -0.0211029052734375, 0.11239433288574219, 0.3342132568359375, 0.3321380615234375, 0.30049896240234375, -0.0649566650390625, -0.04949951171875, 0.002162933349609375, 0.19543838500976562, 0.29340362548828125, 0.10394287109375, -0.0410308837890625, -0.11942481994628906, -0.08372306823730469, 0.2718353271484375, 0.09476852416992188, 0.2431640625, 0.30638885498046875, -0.0097808837890625, -0.10205078125, 0.041290283203125, 0.3316764831542969, 0.008575439453125, -0.2664337158203125, 0.28191184997558594, -0.0428619384765625, -0.6816482543945312, 0.34899139404296875, 0.0637969970703125, -0.33818817138671875, 0.17325973510742188, -0.4438629150390625, 0.209808349609375, -0.12032699584960938, -0.1109771728515625, -0.2589454650878906, -0.6644287109375, -0.0233154296875, -0.00199127197265625, 0.07865142822265625, 0.652801513671875, -0.11107826232910156, -0.58428955078125, 0.023578643798828125, 0.12644386291503906, -0.3434600830078125, -0.0047149658203125, 0.3061084747314453], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000018.npy"}
{"epoch": 0.02643171806167401, "step": 19, "batch_size": 64, "mean": 0.09300819039344788, "std": 0.3002864718437195, "min": -0.775177001953125, "p10": -0.25102691650390624, "median": 0.07503318786621094, "p90": 0.4022407531738281, "max": 1.25628662109375, "pos_frac": 0.640625, "sample": [0.22014999389648438, -0.03456306457519531, -0.06508636474609375, -0.05987548828125, 0.14228439331054688, 0.181304931640625, 0.23393630981445312, -0.009387969970703125, 0.102508544921875, 0.2795257568359375, 0.36962890625, -0.14748001098632812, 0.039943695068359375, 0.302978515625, 0.647613525390625, -0.775177001953125, -0.41156005859375, -0.21856689453125, -0.2649383544921875, 0.1294403076171875, -0.3014373779296875, 0.041568756103515625, 0.1401195526123047, 0.148406982421875, -0.2676963806152344, 0.29889488220214844, 0.3909759521484375, -0.040740966796875, -0.3325233459472656, -0.0411224365234375, -0.20177459716796875, 0.8394508361816406, 0.015027999877929688, 0.012479782104492188, 0.0833740234375, -0.07969856262207031, 0.160400390625, 0.2987251281738281, -0.19075775146484375, -0.16571044921875, 0.068634033203125, 0.425537109375, -0.0103302001953125, 0.0448760986328125, 0.005062103271484375, 0.40520477294921875, 0.31491851806640625, 0.251678466796875, 0.1251373291015625, -0.058399200439453125, 0.1829071044921875, 0.18414688110351562, 0.11474609375, 0.06517219543457031, 0.3059234619140625, 0.08143234252929688, 0.4049072265625, 0.39601898193359375, -0.3412322998046875, 0.470367431640625, 1.25628662109375, -0.058746337890625, 0.03334808349609375, -0.1857147216796875], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000019.npy"}
{"epoch": 0.027900146842878122, "step": 20, "batch_size": 64, "mean": 0.10994073748588562, "std": 0.3572402894496918, "min": -0.6739349365234375, "p10": -0.3135507583618164, "median": 0.12745380401611328, "p90": 0.6014587402343751, "max": 0.9145736694335938, "pos_frac": 0.640625, "sample": [0.02191925048828125, -0.2786540985107422, -0.00806427001953125, 0.801605224609375, -0.14757537841796875, 0.3025836944580078, 0.02100372314453125, -0.1728801727294922, 0.1440582275390625, 0.299713134765625, -0.4990997314453125, 0.28591156005859375, 0.6646003723144531, 0.23479652404785156, -0.25461578369140625, -0.0518798828125, 0.061351776123046875, -0.3285064697265625, -0.2094440460205078, 0.21954345703125, -0.0287017822265625, -0.512725830078125, -0.33349609375, -0.26747703552246094, -0.545562744140625, -0.16965866088867188, -0.1194915771484375, 0.0693511962890625, 0.28272247314453125, -0.03843879699707031, 0.015867233276367188, -0.0997772216796875, -0.6739349365234375, 0.2056121826171875, 0.23462677001953125, 0.01670074462890625, 0.795989990234375, -0.6190185546875, 0.12702560424804688, -0.186920166015625, 0.25621795654296875, 0.5749053955078125, 0.429107666015625, 0.2198486328125, 0.4686737060546875, 0.17498397827148438, 0.1986541748046875, 0.6128387451171875, 0.4353179931640625, 0.07784271240234375, 0.7821044921875, -0.22504425048828125, 0.4607429504394531, 0.75787353515625, 0.9145736694335938, -0.19350242614746094, 0.391876220703125, 0.1318187713623047, 0.3570423126220703, 0.1278820037841797, 0.026866912841796875, 0.328948974609375, 0.3304557800292969, 0.1371173858642578], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000020.npy"}
{"epoch": 0.02936857562408223, "step": 21, "batch_size": 64, "mean": 0.10829953849315643, "std": 0.3992394506931305, "min": -0.6503753662109375, "p10": -0.3042173385620117, "median": 0.09803581237792969, "p90": 0.6115791320800782, "max": 1.203338623046875, "pos_frac": 0.578125, "sample": [0.4253692626953125, 0.12546730041503906, 0.8658599853515625, 0.404266357421875, 0.22939300537109375, 0.08705997467041016, -0.011381149291992188, -0.22932052612304688, 0.301055908203125, -0.6503753662109375, 0.622467041015625, 0.03740692138671875, 0.11443901062011719, 0.293426513671875, 0.14119720458984375, 0.988494873046875, 0.3207550048828125, -0.17413330078125, -0.11930084228515625, 0.4080047607421875, 0.270416259765625, 0.2327423095703125, -0.6087417602539062, 0.08975982666015625, -0.3062915802001953, 1.0654067993164062, 0.32208251953125, -0.42157554626464844, -0.2198486328125, -0.152099609375, -0.29937744140625, 0.21842193603515625, -0.007091522216796875, 0.3288555145263672, -0.6292800903320312, 0.07031822204589844, 0.10631179809570312, -0.1951751708984375, -0.117340087890625, 0.005157470703125, 0.13358306884765625, -0.13554000854492188, 0.2642974853515625, -0.06316757202148438, -0.207763671875, -0.2437000274658203, 0.4964866638183594, -0.23192596435546875, -0.07693099975585938, 0.20745849609375, 0.23639488220214844, 0.36496734619140625, 0.7717628479003906, 0.6401214599609375, -0.2452373504638672, -0.003704071044921875, 0.528045654296875, -0.5841598510742188, 1.203338623046875, 0.5861740112304688, -0.4742889404296875, -0.1472320556640625, 0.2579498291015625, -0.2785625457763672], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000021.npy"}
{"epoch": 0.030837004405286344, "step": 22, "batch_size": 64, "mean": 0.15809422731399536, "std": 0.4044044315814972, "min": -0.748870849609375, "p10": -0.37582530975341794, "median": 0.19183349609375, "p90": 0.6482650756835938, "max": 1.1539230346679688, "pos_frac": 0.6875, "sample": [0.19355010986328125, 0.069091796875, -0.11516189575195312, 0.3560009002685547, 0.420196533203125, -0.5955047607421875, 0.158843994140625, 0.10811805725097656, 0.06302642822265625, 0.889373779296875, 0.4834022521972656, 0.19097900390625, -0.3084220886230469, 1.1539230346679688, 0.04300689697265625, -0.4956817626953125, -0.22081756591796875, 0.37954139709472656, 0.2139129638671875, -0.533172607421875, 0.02500152587890625, 0.8894805908203125, -0.1626739501953125, -0.0432281494140625, -0.01609039306640625, 0.5550975799560547, 0.6544036865234375, 0.19268798828125, 0.293182373046875, 0.6062774658203125, 0.15036964416503906, 0.207275390625, 0.206787109375, -0.12482070922851562, 0.1330127716064453, 0.966278076171875, 0.6766853332519531, -0.47310638427734375, -0.23293495178222656, 0.4988365173339844, 0.28765869140625, 0.22542572021484375, 0.376800537109375, 0.2955665588378906, -0.35567665100097656, 0.8232955932617188, -0.3222236633300781, -0.6927947998046875, 0.633941650390625, 0.24193572998046875, -0.048816680908203125, 0.222808837890625, -0.18883323669433594, 0.0247039794921875, -0.748870849609375, 0.27227783203125, 0.1307220458984375, 0.3964385986328125, 0.173980712890625, 0.539459228515625, -0.38446044921875, 0.5615463256835938, 0.2279052734375, -0.03148841857910156], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000022.npy"}
{"epoch": 0.032305433186490456, "step": 23, "batch_size": 64, "mean": 0.20202842354774475, "std": 0.3789483606815338, "min": -0.6140861511230469, "p10": -0.18919925689697265, "median": 0.10356903076171875, "p90": 0.7166036605834961, "max": 1.4065742492675781, "pos_frac": 0.640625, "sample": [-0.1929798126220703, 0.0501251220703125, 0.5280914306640625, 0.19490432739257812, 0.7236251831054688, 0.4198646545410156, 0.4474945068359375, -0.10694122314453125, -0.05125999450683594, -0.1753082275390625, 0.82489013671875, 1.0273590087890625, 0.09832763671875, 0.0511322021484375, 0.40814971923828125, -0.20086669921875, 0.3353404998779297, 0.4740867614746094, -0.24407958984375, 0.06750869750976562, 0.519775390625, 0.161376953125, 0.7541961669921875, 0.15176773071289062, -0.6140861511230469, -0.0680999755859375, -0.1512603759765625, -0.10786056518554688, -0.27286529541015625, -0.001007080078125, -0.1092529296875, -0.0699310302734375, 0.32196044921875, 0.7058010101318359, -0.07335281372070312, 0.39397430419921875, 0.1088104248046875, 0.4927825927734375, 0.044399261474609375, 0.03927421569824219, -0.009052276611328125, 0.09337234497070312, 0.04593658447265625, 1.4065742492675781, -0.060260772705078125, 1.2808837890625, -0.18037796020507812, 0.04736328125, 0.31084442138671875, -0.2364044189453125, 0.5870609283447266, 0.40256309509277344, 0.26409149169921875, -0.07655906677246094, 0.18667030334472656, 0.235504150390625, 0.3806610107421875, 0.3446502685546875, -0.023075103759765625, 0.2116851806640625, -0.16538238525390625, -0.28769683837890625, 0.7212333679199219, 0.5436668395996094], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000023.npy"}
{"epoch": 0.033773861967694566, "step": 24, "batch_size": 64, "mean": 0.21480104327201843, "std": 0.3640005886554718, "min": -0.7636375427246094, "p10": -0.23071441650390623, "median": 0.19333648681640625, "p90": 0.6749931335449221, "max": 1.140350341796875, "pos_frac": 0.71875, "sample": [0.22612762451171875, 0.6357345581054688, -0.1075592041015625, -0.238006591796875, 0.3623504638671875, 0.24193763732910156, 0.14298248291015625, -0.01222991943359375, 0.5579071044921875, 0.24611663818359375, 0.40415000915527344, -0.3618431091308594, 0.12409782409667969, 0.59320068359375, 0.18411636352539062, 0.2191944122314453, 0.13776016235351562, 0.5094242095947266, 0.4744873046875, -0.220001220703125, 0.3637123107910156, -0.7636375427246094, -0.2656116485595703, 0.518707275390625, 0.2013092041015625, 0.6947174072265625, 0.3489646911621094, -0.58935546875, 0.6918182373046875, 0.33032989501953125, 1.140350341796875, 0.8350677490234375, 0.15977859497070312, -0.1116790771484375, 0.5005722045898438, -0.4158935546875, -0.0001678466796875, -0.1059722900390625, -0.2353057861328125, 0.371673583984375, 0.75103759765625, 0.5280704498291016, -0.15817642211914062, -0.01537322998046875, 0.5060653686523438, 0.947601318359375, 0.25335693359375, -0.0034046173095703125, -0.0384521484375, 0.11116218566894531, 0.3955230712890625, 0.04213523864746094, 0.11916732788085938, 0.051280975341796875, 0.218414306640625, 0.485198974609375, 0.6243228912353516, 0.0212554931640625, -0.15473556518554688, 0.06284332275390625, 0.18536376953125, 0.0398406982421875, 0.15045166015625, 0.834991455078125], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000024.npy"}
{"epoch": 0.03524229074889868, "step": 25, "batch_size": 64, "mean": 0.2061150074005127, "std": 0.4741508662700653, "min": -1.387664794921875, "p10": -0.2832302093505859, "median": 0.1278705596923828, "p90": 0.7049468994140625, "max": 1.651641845703125, "pos_frac": 0.65625, "sample": [-0.01738739013671875, 0.1172332763671875, 0.725738525390625, -0.00382232666015625, -0.13557052612304688, -0.11524772644042969, 0.066558837890625, 0.7043914794921875, 0.487823486328125, -0.02161407470703125, 0.12319183349609375, 0.5772705078125, -0.1162109375, 0.5615615844726562, 1.3849563598632812, -0.3592662811279297, -0.04724884033203125, 0.87298583984375, -0.055736541748046875, 0.5622673034667969, -0.3902587890625, 0.3215675354003906, 0.24657630920410156, 0.07884979248046875, 0.320404052734375, 0.542633056640625, -0.0171661376953125, 0.921417236328125, -1.0261154174804688, 0.008668899536132812, 0.39340972900390625, 0.31783294677734375, -0.12313079833984375, 0.05615997314453125, 0.6069488525390625, -0.122589111328125, -0.286895751953125, 1.651641845703125, -0.3549652099609375, 0.7051849365234375, -0.2746772766113281, 0.5329132080078125, 0.4368133544921875, 0.11861801147460938, -0.13700485229492188, -0.34276580810546875, 0.36798858642578125, -0.05438232421875, 0.8083343505859375, 0.035602569580078125, 0.282958984375, 0.408233642578125, 0.13254928588867188, -1.387664794921875, 0.360137939453125, 0.03652763366699219, -0.07647705078125, 0.5191535949707031, 0.5101165771484375, 0.6956291198730469, 0.13680076599121094, 0.5072860717773438, 0.40564727783203125, 0.0069732666015625], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000025.npy"}
{"epoch": 0.03671071953010279, "step": 26, "batch_size": 64, "mean": 0.3534834384918213, "std": 0.5806217193603516, "min": -0.74755859375, "p10": -0.2320434570312499, "median": 0.21116065979003906, "p90": 1.420623016357422, "max": 1.9849700927734375, "pos_frac": 0.78125, "sample": [0.2169036865234375, 0.3353157043457031, 0.2898674011230469, 0.10579490661621094, 1.467987060546875, 0.49898529052734375, 0.48996734619140625, 1.4521980285644531, 0.1428985595703125, -0.38866233825683594, 1.35943603515625, 0.042236328125, 1.54193115234375, 0.32440185546875, -0.07795333862304688, 0.88226318359375, 0.00811004638671875, 0.07840728759765625, 0.864105224609375, -0.5052680969238281, -0.13279342651367188, -0.0675811767578125, 0.4290008544921875, 0.019989013671875, 0.024688720703125, 1.4468460083007812, 0.4495697021484375, -0.74755859375, 0.39371681213378906, 0.8983612060546875, -0.2702484130859375, -0.11663055419921875, 0.30632972717285156, 0.04977226257324219, -0.1428985595703125, -0.2925262451171875, 1.58355712890625, -0.1046600341796875, 0.21635055541992188, 1.5499954223632812, 0.07637786865234375, 0.0371551513671875, 1.9849700927734375, 0.20597076416015625, 0.4131622314453125, 0.6650161743164062, -0.31148529052734375, 0.025909423828125, 0.33603668212890625, 0.0739288330078125, 0.048564910888671875, 0.23873138427734375, 0.009138107299804688, 0.32660675048828125, -0.5306320190429688, 0.139312744140625, 1.1378707885742188, 1.0044784545898438, 0.9762649536132812, -0.08763504028320312, 0.19487762451171875, 0.20502090454101562, 0.49306678771972656, 0.3380260467529297], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000026.npy"}
{"epoch": 0.0381791483113069, "step": 27, "batch_size": 64, "mean": 0.4813147783279419, "std": 0.5254032015800476, "min": -0.52703857421875, "p10": -0.06738471984863273, "median": 0.4080963134765625, "p90": 1.1280609130859376, "max": 2.4729385375976562, "pos_frac": 0.890625, "sample": [0.34529685974121094, 0.5217971801757812, 0.45551300048828125, 0.3930206298828125, 0.8338432312011719, 1.2103347778320312, -0.103302001953125, 0.38067626953125, 1.2309417724609375, -0.52703857421875, 0.4860343933105469, 0.4325752258300781, 0.5462608337402344, 0.04438018798828125, 1.1313018798828125, 1.2383346557617188, 1.75714111328125, 0.046295166015625, 0.4988250732421875, 1.0849609375, 0.03537750244140625, 0.26544189453125, 0.45201873779296875, 0.15112876892089844, 0.46175384521484375, 0.6534233093261719, 0.1996288299560547, 0.08970451354980469, 0.20375442504882812, 2.4729385375976562, 0.14352035522460938, 0.13152313232421875, -0.13582611083984375, 0.863739013671875, 1.00970458984375, 0.707733154296875, 0.6486129760742188, 1.280029296875, 0.016422271728515625, 0.2560272216796875, -0.23751068115234375, 0.396209716796875, 0.0754241943359375, 0.7062606811523438, 0.32369232177734375, 0.52069091796875, -0.47898101806640625, 0.2664451599121094, 1.1181793212890625, 1.076202392578125, 0.480010986328125, 0.27777862548828125, 0.8523101806640625, 0.14357757568359375, 0.79730224609375, 0.41998291015625, 1.1204986572265625, -0.12436485290527344, 0.2381153106689453, -0.33309173583984375, 0.03794097900390625, 0.9120254516601562, 0.16153717041015625, 0.1400604248046875], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000027.npy"}
{"epoch": 0.039647577092511016, "step": 28, "batch_size": 64, "mean": 0.301299124956131, "std": 0.6241872906684875, "min": -1.171051025390625, "p10": -0.407757568359375, "median": 0.22960853576660156, "p90": 1.1306259155273441, "max": 1.7890167236328125, "pos_frac": 0.734375, "sample": [1.4446907043457031, 0.509490966796875, -0.11617279052734375, 0.20142364501953125, 0.4725055694580078, 0.067169189453125, 0.10793304443359375, 0.2201671600341797, -0.1845836639404297, -0.3297309875488281, 0.16487884521484375, 1.166839599609375, 1.7178497314453125, 0.5972442626953125, 0.27382659912109375, -0.15586280822753906, 0.2266845703125, 0.533416748046875, -0.14365768432617188, 0.3562164306640625, 0.5822830200195312, -0.6112747192382812, 0.050445556640625, 1.7890167236328125, -0.1257781982421875, 1.0105819702148438, 0.29001617431640625, 0.7782135009765625, 1.6521568298339844, 0.4285163879394531, 0.23253250122070312, 0.1730499267578125, -0.2928581237792969, -0.4107513427734375, 0.7336845397949219, 0.5254058837890625, 0.6478919982910156, 0.17885589599609375, 0.3732185363769531, -0.5434455871582031, 0.7250823974609375, 0.5707130432128906, 0.21112632751464844, 0.9961090087890625, 1.394378662109375, 0.7558746337890625, 0.04296112060546875, 0.20032882690429688, 1.0461273193359375, 0.138458251953125, 0.2754383087158203, -0.8544692993164062, -0.6907501220703125, 0.1876373291015625, -0.3302154541015625, 0.46039581298828125, -0.4007720947265625, 0.842559814453125, 1.1819000244140625, -0.3468494415283203, 0.28388214111328125, -1.171051025390625, -0.9875869750976562, 0.1597747802734375], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000028.npy"}
{"epoch": 0.041116005873715125, "step": 29, "batch_size": 64, "mean": 0.48947349190711975, "std": 0.5348759293556213, "min": -0.39105224609375, "p10": -0.14045200347900388, "median": 0.4916238784790039, "p90": 1.1318115234375001, "max": 2.2104568481445312, "pos_frac": 0.78125, "sample": [1.1119117736816406, -0.14998435974121094, 0.50103759765625, 0.05208587646484375, 0.44609832763671875, 0.4888935089111328, -0.35521697998046875, 1.3331146240234375, 0.5514335632324219, 0.169952392578125, 0.1797332763671875, -0.009359359741210938, 0.8657455444335938, 1.0457839965820312, -0.1182098388671875, 0.9529876708984375, 0.9305419921875, -0.228118896484375, 0.024694442749023438, 1.619476318359375, -0.00075531005859375, 0.6043853759765625, 0.4807586669921875, 0.1659564971923828, 1.0964202880859375, -0.06966400146484375, -0.182525634765625, 0.6227607727050781, 0.7212371826171875, 0.494354248046875, -0.10900115966796875, 0.5084762573242188, 2.2104568481445312, -0.299072265625, 0.2191009521484375, 0.08849525451660156, 0.30629730224609375, -0.10750961303710938, 1.1577911376953125, 0.357666015625, 1.143768310546875, 1.1385498046875, 0.6376991271972656, 0.59674072265625, 0.6857223510742188, 0.03226661682128906, 1.1160888671875, -0.24112701416015625, 0.3445167541503906, 0.010011672973632812, 0.31812286376953125, 0.12515830993652344, 1.0419769287109375, 0.26877593994140625, 0.7607841491699219, 0.746368408203125, -0.02889251708984375, 0.963653564453125, 0.5055732727050781, 0.7263374328613281, 1.5264511108398438, 0.6610336303710938, -0.39105224609375, 0.9595451354980469], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000029.npy"}
{"epoch": 0.042584434654919234, "step": 30, "batch_size": 64, "mean": 0.6395825147628784, "std": 0.5862607359886169, "min": -0.6992111206054688, "p10": -0.04998035430908194, "median": 0.6337699890136719, "p90": 1.3738697052001954, "max": 2.28778076171875, "pos_frac": 0.890625, "sample": [1.3365364074707031, -0.0881195068359375, 0.05901908874511719, 0.473602294921875, 1.40655517578125, -0.6992111206054688, 0.6945610046386719, 0.359344482421875, -0.32385826110839844, 0.438140869140625, 1.1071929931640625, 1.1357269287109375, 0.9679946899414062, 0.8723907470703125, 0.801666259765625, 1.3356704711914062, 0.6023483276367188, 1.1153793334960938, 0.22167587280273438, 1.1382904052734375, 0.7892913818359375, 1.05755615234375, 2.28778076171875, 0.6626052856445312, 0.796844482421875, 0.21187591552734375, 0.6049346923828125, 0.590667724609375, 0.08492469787597656, 0.34949493408203125, 0.5079803466796875, 0.3092021942138672, 0.7029647827148438, 0.36395835876464844, -0.3758697509765625, 0.7252044677734375, 1.6825408935546875, 1.897674560546875, 1.2332916259765625, 0.5193061828613281, 0.03901100158691406, 0.8618621826171875, 0.19975662231445312, 0.9325294494628906, -0.34781646728515625, 1.3898696899414062, 1.0717620849609375, 0.5775718688964844, 0.77532958984375, 0.8579025268554688, 1.469696044921875, 0.740203857421875, 0.10381507873535156, -0.1979351043701172, 0.05119895935058594, 1.120635986328125, 0.675506591796875, 0.3864555358886719, 0.23086166381835938, 0.15857887268066406, -0.4124298095703125, 1.6663894653320312, 0.243011474609375, 0.38237571716308594], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000030.npy"}
{"epoch": 0.04405286343612335, "step": 31, "batch_size": 64, "mean": 0.4423332214355469, "std": 0.6922799348831177, "min": -1.3129730224609375, "p10": -0.22923202514648436, "median": 0.4007406234741211, "p90": 1.3205467224121095, "max": 2.754180908203125, "pos_frac": 0.796875, "sample": [0.0077342987060546875, 1.4388236999511719, -0.432891845703125, 0.41609954833984375, 1.567169189453125, 0.24073028564453125, 0.025341033935546875, -0.3001556396484375, 0.7705307006835938, -0.9930877685546875, 2.754180908203125, 0.27619171142578125, 0.43753814697265625, 0.3492469787597656, 0.23652267456054688, 0.10647964477539062, 1.00311279296875, 0.40651893615722656, 1.241363525390625, 0.9246864318847656, -0.00334930419921875, 0.04852294921875, 0.3003692626953125, 0.22180938720703125, 0.03269195556640625, 0.1701030731201172, 0.5235710144042969, 0.5467491149902344, 0.5352382659912109, -0.17181396484375, 1.3217620849609375, 0.41751861572265625, 1.725433349609375, -1.3129730224609375, 0.4562225341796875, -0.8227996826171875, -0.22225189208984375, 0.7302398681640625, -0.36814117431640625, 0.1048126220703125, 0.7646865844726562, 0.7716751098632812, -0.020992279052734375, 0.3408641815185547, 0.0588836669921875, -0.021270751953125, 0.612060546875, 0.492706298828125, 0.033161163330078125, 0.043243408203125, 0.4763927459716797, 0.8677749633789062, 0.49542236328125, -0.2322235107421875, -0.18928146362304688, 1.5234909057617188, 0.3949623107910156, 0.7318878173828125, 0.7382354736328125, 0.22571945190429688, 1.2591400146484375, 2.2498397827148438, 1.3177108764648438, 0.6653861999511719], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000031.npy"}
{"epoch": 0.04552129221732746, "step": 32, "batch_size": 64, "mean": 0.7098907232284546, "std": 0.8560023307800293, "min": -0.7479896545410156, "p10": -0.19838752746582025, "median": 0.5367431640625, "p90": 1.916912078857422, "max": 3.3843841552734375, "pos_frac": 0.84375, "sample": [0.5082244873046875, 0.5350379943847656, 0.8183212280273438, 1.2855148315429688, 0.8194503784179688, 0.9507904052734375, -0.3678131103515625, 0.9598388671875, 0.3104877471923828, 0.8256759643554688, 1.2408103942871094, 0.08326339721679688, -0.50091552734375, 0.4174518585205078, -0.050018310546875, 1.0734100341796875, -0.5008773803710938, 0.8469657897949219, -0.7479896545410156, 2.624908447265625, 0.43958282470703125, 1.7447052001953125, 1.758941650390625, 0.223388671875, 0.645233154296875, 0.5482215881347656, -0.22507858276367188, -0.1313934326171875, -0.3562431335449219, 0.8092117309570312, 0.3193817138671875, 0.3829307556152344, 0.3179759979248047, 2.6309356689453125, 1.5778694152832031, 0.24426651000976562, 0.759613037109375, 0.5384483337402344, 0.06039619445800781, 0.5466842651367188, 2.4705047607421875, 0.9098644256591797, -0.1361083984375, 0.25551605224609375, 3.3843841552734375, 0.958343505859375, 0.42731475830078125, 1.9341812133789062, 1.95977783203125, -0.4055328369140625, 0.06666183471679688, 0.306060791015625, 2.8223876953125, 0.5682296752929688, 0.04923248291015625, 0.20712661743164062, 0.7366065979003906, 1.2568817138671875, 0.09795570373535156, 0.313690185546875, 0.23254013061523438, 0.856597900390625, 1.876617431640625, 0.3165626525878906], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000032.npy"}
{"epoch": 0.04698972099853157, "step": 33, "batch_size": 64, "mean": 0.5745028257369995, "std": 0.570174515247345, "min": -0.7579154968261719, "p10": -0.08554306030273436, "median": 0.48003578186035156, "p90": 1.369311904907227, "max": 2.060760498046875, "pos_frac": 0.828125, "sample": [-0.1033935546875, 0.29775428771972656, 0.3904571533203125, 0.6353302001953125, 1.7151336669921875, -0.048492431640625, 0.3554534912109375, 0.34765625, 0.49193572998046875, 0.5585117340087891, -0.00690460205078125, 0.8228988647460938, 1.475799560546875, 1.4138259887695312, 1.7549285888671875, 0.45931243896484375, 0.7798614501953125, -0.7579154968261719, -0.0240020751953125, 0.2860260009765625, -0.09137725830078125, 0.7965545654296875, 0.72796630859375, 1.2155914306640625, 0.6972503662109375, -0.071929931640625, 1.4447021484375, 0.04044342041015625, 0.22821044921875, 1.873260498046875, 0.17032241821289062, 0.054962158203125, 2.060760498046875, -0.16971397399902344, 0.1452178955078125, 0.5478630065917969, 0.5822410583496094, 0.4390411376953125, 0.9884262084960938, 0.3852424621582031, 0.18701934814453125, 1.0292510986328125, 0.218963623046875, 0.7499198913574219, 0.8809738159179688, 0.42047882080078125, 0.339385986328125, 1.199127197265625, 0.8422107696533203, -0.21697235107421875, 0.402587890625, -0.30061912536621094, 1.2654457092285156, 0.06121826171875, 0.58416748046875, 0.7558860778808594, 1.2205047607421875, -0.19811248779296875, 1.02874755859375, 0.4681358337402344, 0.85516357421875, 0.6546478271484375, 0.29137229919433594, 1.1194648742675781], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000033.npy"}
{"epoch": 0.048458149779735685, "step": 34, "batch_size": 64, "mean": 0.6845252513885498, "std": 0.7803934216499329, "min": -0.7032051086425781, "p10": -0.15060749053955078, "median": 0.5644588470458984, "p90": 1.9506340026855473, "max": 2.8947601318359375, "pos_frac": 0.828125, "sample": [2.2408981323242188, -0.7032051086425781, 0.5222930908203125, 0.4051017761230469, 1.8277740478515625, 0.38838958740234375, 0.868377685546875, 0.6712493896484375, 0.0696563720703125, 0.0007476806640625, 0.6179275512695312, 0.024654388427734375, 0.30066680908203125, 0.34665679931640625, 1.4416732788085938, 2.1663970947265625, 0.2260265350341797, -0.15481948852539062, 1.8326873779296875, 0.4906463623046875, 0.8163833618164062, -0.5394401550292969, 1.0108489990234375, 0.8615989685058594, 0.87701416015625, 1.4264144897460938, 2.259796142578125, -0.4148101806640625, 0.505859375, 1.044830322265625, 0.32418060302734375, 0.0986785888671875, 0.811614990234375, 1.0156021118164062, 2.0323486328125, 0.7896690368652344, 0.6077613830566406, 2.8947601318359375, 0.6209697723388672, -0.14148712158203125, 0.0796661376953125, 0.24267196655273438, 1.3154678344726562, 1.1185226440429688, 0.3737335205078125, 0.6562347412109375, -0.03402900695800781, -0.32274627685546875, 1.139556884765625, 0.3580322265625, 0.3910331726074219, -0.41925048828125, 0.14563751220703125, 0.773193359375, 0.3215179443359375, -0.15451622009277344, 2.0011825561523438, 0.6066246032714844, -0.1309661865234375, 0.6999664306640625, 1.4179534912109375, 2.4940185546875, 0.2680034637451172, -0.01828765869140625], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000034.npy"}
{"epoch": 0.049926578560939794, "step": 35, "batch_size": 64, "mean": 1.0402470827102661, "std": 1.0527760982513428, "min": -0.8935546875, "p10": 0.09258632659912115, "median": 0.7959442138671875, "p90": 2.312187957763672, "max": 6.41046142578125, "pos_frac": 0.9375, "sample": [1.0692672729492188, 0.4869842529296875, 2.345672607421875, 1.51318359375, 0.013214111328125, 0.5726337432861328, 0.3223876953125, 0.0696563720703125, 0.0665283203125, 0.944183349609375, 1.8566436767578125, 0.1460895538330078, 1.1209793090820312, 0.5957050323486328, 0.407379150390625, 0.4174041748046875, 1.7377853393554688, 3.0274429321289062, 0.287109375, 0.38761329650878906, 0.7795333862304688, 2.3181228637695312, 2.29833984375, -0.0407562255859375, 0.242645263671875, 0.6151885986328125, 0.4083271026611328, 1.083953857421875, 0.46623992919921875, 1.1625823974609375, 6.41046142578125, 1.0059585571289062, 0.172393798828125, 0.9383163452148438, 0.7163543701171875, 2.8618927001953125, 0.6368827819824219, 1.1202468872070312, 2.42840576171875, 1.7621536254882812, 0.1755828857421875, 1.8157806396484375, 1.5547027587890625, 1.2779998779296875, 0.3944358825683594, -0.24814224243164062, 1.54388427734375, 1.7602081298828125, 0.9893569946289062, 0.571563720703125, 0.6388626098632812, 0.37725830078125, 0.7041568756103516, 2.4731292724609375, 0.8123550415039062, 0.8286495208740234, 1.1539306640625, -0.2668914794921875, 1.46148681640625, -0.8935546875, 0.6443157196044922, 1.9854888916015625, 1.787811279296875, 0.258331298828125], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000035.npy"}
{"epoch": 0.0513950073421439, "step": 36, "batch_size": 64, "mean": 1.0272307395935059, "std": 1.1828550100326538, "min": -2.2750015258789062, "p10": -0.27951774597167967, "median": 0.8293113708496094, "p90": 2.6426208496093753, "max": 3.91168212890625, "pos_frac": 0.828125, "sample": [1.3009185791015625, -0.22872352600097656, 1.0808086395263672, 0.5209274291992188, 1.2885818481445312, 0.8498516082763672, 0.35321807861328125, -0.5070419311523438, -0.3012866973876953, 2.464202880859375, 2.8332977294921875, 0.2678356170654297, -0.45890045166015625, 0.3056297302246094, 0.08153343200683594, 1.4514541625976562, -0.025026321411132812, 1.75994873046875, 1.694183349609375, 0.14246368408203125, -0.0113983154296875, -0.32086181640625, 0.42620849609375, 0.07036209106445312, 3.7662429809570312, 2.573863983154297, 1.0768623352050781, -0.7782974243164062, 2.9263153076171875, 0.7731781005859375, 0.6003093719482422, 1.185272216796875, -0.46959686279296875, 2.7278823852539062, 0.5057487487792969, 0.01628875732421875, 2.6673583984375, 0.6917190551757812, 1.9304046630859375, -0.017360687255859375, 1.4670867919921875, 0.8711032867431641, 2.2215576171875, 0.8087711334228516, 2.358062744140625, 1.0665435791015625, 0.4070091247558594, 3.7685394287109375, 0.7813167572021484, -2.2750015258789062, 2.260578155517578, 0.9827346801757812, 1.5721359252929688, 0.709197998046875, 2.2095565795898438, 1.2482242584228516, 0.2285919189453125, 0.5669574737548828, 2.58489990234375, 0.571502685546875, 1.0236434936523438, 3.91168212890625, 0.8727569580078125, 0.3109397888183594], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000036.npy"}
{"epoch": 0.05286343612334802, "step": 37, "batch_size": 64, "mean": 0.979668140411377, "std": 1.2481446266174316, "min": -1.47515869140625, "p10": -0.14779930114746093, "median": 0.6688232421875, "p90": 2.630265045166016, "max": 5.469451904296875, "pos_frac": 0.8125, "sample": [1.5456695556640625, -0.009698867797851562, 1.0669631958007812, 1.0096588134765625, 0.2753753662109375, 0.4002532958984375, 0.396514892578125, 0.9359893798828125, 1.401214599609375, -0.13270950317382812, 0.5621261596679688, 3.2235107421875, 0.6381683349609375, -0.744476318359375, -0.738861083984375, 0.00760650634765625, 1.013162612915039, 0.17512130737304688, 1.8579540252685547, 5.469451904296875, 0.6994781494140625, 3.8144989013671875, 1.4932403564453125, 0.6153907775878906, 1.0409011840820312, 0.7855339050292969, 0.23020172119140625, 0.5324897766113281, 1.1129684448242188, 1.138031005859375, 0.48104286193847656, -0.0041961669921875, 0.1884784698486328, -0.9470748901367188, 0.5856399536132812, -0.011322021484375, 0.2787303924560547, 2.5382843017578125, 1.5036697387695312, 2.36370849609375, -0.4003334045410156, 3.0626602172851562, 0.730194091796875, 1.5761032104492188, 0.08028411865234375, 0.8873729705810547, 0.0280609130859375, -0.154266357421875, 0.577056884765625, 1.5391807556152344, 2.508625030517578, 0.19659423828125, 2.6696853637695312, 2.498443603515625, 0.1219482421875, 3.0436477661132812, 1.5983924865722656, -1.47515869140625, -0.1307373046875, 1.28411865234375, 2.11517333984375, 3.551849365234375, -0.1628856658935547, 0.16005897521972656], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000037.npy"}
{"epoch": 0.05433186490455213, "step": 38, "batch_size": 64, "mean": 1.3030192852020264, "std": 1.5184988975524902, "min": -0.5276565551757812, "p10": 0.011898803710937501, "median": 0.9543304443359375, "p90": 3.0832672119140625, "max": 7.436431884765625, "pos_frac": 0.90625, "sample": [0.4638786315917969, 7.436431884765625, 4.42303466796875, 1.4033317565917969, 0.5318260192871094, 0.012096405029296875, 1.5993499755859375, 0.6015701293945312, 1.0780715942382812, 1.631500244140625, 1.525421142578125, 1.10931396484375, 0.187255859375, 2.0327224731445312, 0.855133056640625, -0.20042037963867188, 0.6928558349609375, 0.4719810485839844, 0.034473419189453125, 2.0526580810546875, 0.6461296081542969, 1.1722259521484375, 1.6964569091796875, 0.09013557434082031, 3.461212158203125, 0.20372962951660156, 2.3851394653320312, 2.0053482055664062, 1.6986465454101562, 0.16179656982421875, 0.5032119750976562, 3.0559234619140625, -0.4605865478515625, 1.2935028076171875, 0.45113182067871094, 0.011814117431640625, 0.2629547119140625, 5.4914093017578125, 0.6292839050292969, 0.10102081298828125, 3.0949859619140625, 4.900360107421875, 1.726531982421875, 0.22038650512695312, 0.9014129638671875, -0.5276565551757812, 0.5343513488769531, 0.4340629577636719, 0.5651206970214844, 1.6905975341796875, -0.3693084716796875, 0.079132080078125, -0.20018959045410156, 1.347320556640625, 1.3817920684814453, 1.9037704467773438, 1.0072479248046875, 4.834678649902344, 1.8033905029296875, 0.26659393310546875, 2.0343170166015625, -0.3573169708251953, 1.0533294677734375, 2.2653427124023438], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000038.npy"}
{"epoch": 0.055800293685756244, "step": 39, "batch_size": 64, "mean": 1.47984778881073, "std": 1.2735984325408936, "min": -2.02783203125, "p10": 0.1035911560058594, "median": 1.3285980224609375, "p90": 3.3608650207519535, "max": 4.538482666015625, "pos_frac": 0.90625, "sample": [1.9578170776367188, 2.51495361328125, 4.538482666015625, 1.95660400390625, 1.5687789916992188, -0.73358154296875, 1.9765853881835938, 2.369964599609375, 0.9487628936767578, 1.9207897186279297, 1.004608154296875, 3.4616317749023438, 1.513519287109375, 1.183969497680664, 3.3817138671875, 0.9762039184570312, 0.09462356567382812, 0.6579399108886719, 0.44659423828125, 1.29595947265625, 0.8226318359375, 1.8791732788085938, 2.1022796630859375, 0.4285602569580078, 1.0296382904052734, 2.4062271118164062, -0.6695709228515625, 1.6283092498779297, 3.589935302734375, 0.3383636474609375, 0.46555328369140625, -0.02407073974609375, 3.1848678588867188, 2.1719627380371094, -2.02783203125, 1.8812980651855469, 0.4578857421875, 1.7174568176269531, 3.3122177124023438, 4.2091064453125, 1.2253913879394531, 2.539886474609375, 3.383716583251953, 0.607879638671875, 0.8757667541503906, 0.415679931640625, -0.064300537109375, 0.4834403991699219, 1.361236572265625, 0.9123039245605469, 0.6764450073242188, 3.1631202697753906, 2.5759658813476562, 1.9913482666015625, 0.2665424346923828, 3.674102783203125, 2.789337158203125, 0.8517780303955078, 1.7077445983886719, 0.12451553344726562, 0.6970424652099609, -0.021741867065429688, 0.5127010345458984, 2.020437240600586], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000039.npy"}
{"epoch": 0.05726872246696035, "step": 40, "batch_size": 64, "mean": 1.4126379489898682, "std": 1.5398181676864624, "min": -0.7543487548828125, "p10": -0.18571453094482418, "median": 1.1659622192382812, "p90": 3.6119976043701176, "max": 6.4597015380859375, "pos_frac": 0.84375, "sample": [-0.0903167724609375, 0.0821533203125, 1.3379287719726562, 1.5019245147705078, 1.5030746459960938, 0.5585441589355469, 0.7844390869140625, 0.549163818359375, 0.15099716186523438, 1.7580451965332031, 6.1331634521484375, 2.8994903564453125, 0.7679080963134766, 3.6729087829589844, 1.0890045166015625, -0.5853958129882812, 1.7492713928222656, 2.2583770751953125, 0.15546607971191406, -0.7543487548828125, 2.7330856323242188, -0.19919967651367188, 0.5479888916015625, -0.0677337646484375, 1.9247283935546875, 1.7675323486328125, 0.13399124145507812, 0.9143524169921875, 0.5595550537109375, -0.41943359375, 3.7716522216796875, 0.2889366149902344, 3.21429443359375, -0.7288055419921875, 0.6164798736572266, 4.453575134277344, 0.19916534423828125, 3.4698715209960938, 0.52423095703125, 1.3346996307373047, 4.172004699707031, 1.0180130004882812, 0.8353004455566406, 6.4597015380859375, -0.1542491912841797, 1.7945938110351562, -0.4583892822265625, 1.67364501953125, 2.7469024658203125, 1.8631744384765625, 1.242919921875, 1.6241302490234375, 3.4179153442382812, 1.7335681915283203, 1.0242538452148438, 1.324371337890625, 3.706146240234375, 0.21254730224609375, 1.3973197937011719, 0.2878074645996094, -0.23964691162109375, 1.7798919677734375, 1.9692764282226562, 0.4168701171875], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000040.npy"}
{"epoch": 0.05873715124816446, "step": 41, "batch_size": 64, "mean": 1.6483460664749146, "std": 1.8015201091766357, "min": -2.6493682861328125, "p10": -0.18758869171142575, "median": 1.1969614028930664, "p90": 3.9540710449218754, "max": 7.804969787597656, "pos_frac": 0.859375, "sample": [0.6492462158203125, 0.5986976623535156, 3.0688629150390625, 1.5209159851074219, 1.0229949951171875, 2.344512939453125, 3.3463897705078125, 1.95208740234375, -0.21563720703125, 2.487518310546875, 5.074615478515625, 0.9938430786132812, 1.866668701171875, 7.804969787597656, 0.263763427734375, 0.60430908203125, 0.90032958984375, 1.1416816711425781, 0.07162094116210938, -0.68743896484375, 1.5713882446289062, 3.296703338623047, 3.9830474853515625, 0.9510974884033203, 5.9998779296875, 1.5810432434082031, 3.7252044677734375, 4.148040771484375, 0.8880558013916016, 0.2592048645019531, 1.8972320556640625, -0.15728759765625, 5.284393310546875, 0.6183795928955078, 1.2522411346435547, 3.8864593505859375, 1.8303146362304688, 2.5625, -0.5089111328125, 2.0558700561523438, 3.7903594970703125, 1.3049678802490234, -2.6493682861328125, 3.047454833984375, 0.06304931640625, -0.0384521484375, 1.7486114501953125, 1.8450965881347656, 1.9098491668701172, 0.39234352111816406, 0.7397079467773438, 0.9671401977539062, 1.0138397216796875, 0.4595489501953125, 0.5424079895019531, 0.9108715057373047, -0.46707916259765625, 2.7706985473632812, 0.7215194702148438, 0.44519615173339844, -0.6266345977783203, -0.2005748748779297, 1.9413604736328125, 4.92742919921875], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000041.npy"}
{"epoch": 0.06020558002936858, "step": 42, "batch_size": 64, "mean": 2.276841640472412, "std": 2.028303384780884, "min": -1.45574951171875, "p10": 0.15325279235839845, "median": 1.9668254852294922, "p90": 5.049465942382813, "max": 9.156402587890625, "pos_frac": 0.9375, "sample": [1.8948974609375, -0.06250762939453125, 2.0840110778808594, 8.379318237304688, 9.156402587890625, 2.6066131591796875, 0.33197784423828125, 5.3561859130859375, 2.3143310546875, 2.7554893493652344, 2.0247955322265625, 5.2696685791015625, 2.532247543334961, 5.607635498046875, 0.9519939422607422, 2.1573028564453125, 2.4180831909179688, 0.00574493408203125, 2.2337493896484375, 2.631011962890625, 1.5118026733398438, 1.4963455200195312, 3.6503829956054688, 0.40175628662109375, 0.9639053344726562, 2.7380294799804688, 0.171600341796875, 2.1480865478515625, 1.5952491760253906, 0.22858428955078125, 1.9088554382324219, 5.0988311767578125, 0.13153839111328125, 4.72283935546875, 2.8399734497070312, 1.00653076171875, 0.7477912902832031, 1.471435546875, 1.830810546875, -0.8898162841796875, 2.9873504638671875, -0.040252685546875, 1.6235733032226562, 6.2572784423828125, 1.4424209594726562, 3.9851760864257812, 0.9776706695556641, 0.14538955688476562, 1.2639102935791016, 1.3552780151367188, 0.4533653259277344, 4.9342803955078125, -1.45574951171875, 0.6040916442871094, 2.14288330078125, 3.2274627685546875, 0.4538993835449219, 1.6548004150390625, 4.302490234375, 2.3009796142578125, 3.7848663330078125, 4.35748291015625, 0.7809314727783203, 3.7548065185546875], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000042.npy"}
{"epoch": 0.06167400881057269, "step": 43, "batch_size": 64, "mean": 2.1357014179229736, "std": 1.704218864440918, "min": -0.9363250732421875, "p10": 0.09036636352539071, "median": 1.9310932159423828, "p90": 4.108570861816407, "max": 8.340560913085938, "pos_frac": 0.921875, "sample": [-0.19528961181640625, 0.05666351318359375, -0.06653404235839844, 3.2003135681152344, 3.4264678955078125, 0.960479736328125, 3.5323333740234375, 2.4494781494140625, 2.2635040283203125, 5.777427673339844, 0.6902313232421875, 1.6567497253417969, 0.6070137023925781, 1.962890625, 3.0569725036621094, 1.049835205078125, 3.0041656494140625, 4.376708984375, 1.28692626953125, 0.16900634765625, 5.2900390625, 2.5466766357421875, 4.491050720214844, 0.7919235229492188, 2.625732421875, 2.1168899536132812, 0.6862983703613281, 1.8407058715820312, -0.2421741485595703, 3.9758148193359375, 1.4992446899414062, 2.505573272705078, 3.7010955810546875, 1.899749755859375, 2.7558746337890625, 3.448772430419922, 0.7961654663085938, 1.8921928405761719, 1.3353385925292969, 0.3371429443359375, 1.6247329711914062, 1.8687667846679688, 5.835166931152344, 3.1078033447265625, 0.3616790771484375, -0.9363250732421875, 0.7668972015380859, 2.4881134033203125, 2.4113922119140625, 3.2874603271484375, 2.089611053466797, 3.193706512451172, 1.9624366760253906, 0.017599105834960938, 1.369110107421875, 8.340560913085938, 3.707855224609375, 1.6552448272705078, -0.22820091247558594, 0.32472991943359375, 1.83642578125, 4.16546630859375, 3.2276458740234375, 0.6475582122802734], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000043.npy"}
{"epoch": 0.0631424375917768, "step": 44, "batch_size": 64, "mean": 2.370201587677002, "std": 1.8722184896469116, "min": -0.9519500732421875, "p10": 0.44582481384277345, "median": 2.071582794189453, "p90": 4.5974067687988285, "max": 9.10760498046875, "pos_frac": 0.953125, "sample": [3.6156158447265625, 2.0557708740234375, 1.6384506225585938, 0.4422874450683594, 5.226310729980469, 2.6623687744140625, 2.5403213500976562, 1.4928207397460938, 2.0873947143554688, 4.621940612792969, 5.4700164794921875, 0.5690269470214844, 3.9112014770507812, 4.5401611328125, 4.430419921875, 3.4094161987304688, 0.8951644897460938, 2.0147705078125, 1.9785099029541016, 4.4774627685546875, 0.5162944793701172, 2.6565399169921875, 2.272705078125, 5.5716552734375, 2.4983367919921875, 3.1924972534179688, 0.0315399169921875, 1.2171173095703125, 4.137725830078125, 1.164337158203125, 2.341388702392578, -0.2381439208984375, 1.8818798065185547, 1.197479248046875, 1.462839126586914, 2.6446304321289062, 0.45407867431640625, 1.3594894409179688, 4.161834716796875, 1.199859619140625, 0.752288818359375, 2.5367507934570312, 3.0548248291015625, 3.246734619140625, 1.0897636413574219, 3.7079849243164062, 2.54827880859375, 2.0262985229492188, 0.5693206787109375, 0.2427501678466797, 5.1954498291015625, 3.5528106689453125, 1.2346649169921875, 2.9402923583984375, 0.6472358703613281, 9.10760498046875, 0.6728858947753906, -0.9519500732421875, 2.7405929565429688, 1.3125572204589844, -0.7038326263427734, 0.03568458557128906, 1.2893524169921875, 7.041038513183594], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000044.npy"}
{"epoch": 0.06461086637298091, "step": 45, "batch_size": 64, "mean": 1.8630380630493164, "std": 2.225584030151367, "min": -1.4972457885742188, "p10": -0.20156555175781243, "median": 1.2055816650390625, "p90": 5.116755676269531, "max": 11.10870361328125, "pos_frac": 0.859375, "sample": [2.9848861694335938, 0.8179779052734375, 0.34671783447265625, 1.7744941711425781, 0.20669937133789062, 1.153543472290039, 2.9317359924316406, 5.103607177734375, 5.253631591796875, 2.0469589233398438, 0.7940216064453125, 6.2929840087890625, 3.021543502807617, 0.32330322265625, 0.03694915771484375, 1.6839370727539062, -0.12210845947265625, 1.5890731811523438, 0.05070304870605469, 4.719425201416016, 0.0378570556640625, 0.351470947265625, 2.2645339965820312, 0.7165718078613281, 0.4408130645751953, 4.407375335693359, 5.637413024902344, 3.6223297119140625, 0.06150245666503906, 0.5753517150878906, 3.6300811767578125, 1.5561599731445312, 5.1223907470703125, -0.2809333801269531, -0.23561859130859375, 1.55706787109375, 5.22747802734375, 3.4163894653320312, -0.2952461242675781, 1.5823898315429688, 1.1804637908935547, 0.6182022094726562, -1.1215400695800781, -1.0426712036132812, 0.6265869140625, 11.10870361328125, 1.0776138305664062, 5.9759368896484375, 2.4244766235351562, -0.8627243041992188, 1.9870128631591797, 1.1313323974609375, 1.1268863677978516, 0.29403114318847656, 1.2306995391845703, 4.500152587890625, 0.8258209228515625, 1.9973602294921875, 1.8658447265625, 1.5179901123046875, -0.02654266357421875, 3.483642578125, 0.4069366455078125, -1.4972457885742188], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000045.npy"}
{"epoch": 0.06607929515418502, "step": 46, "batch_size": 64, "mean": 2.5065414905548096, "std": 2.5875461101531982, "min": -2.1418075561523438, "p10": 0.02454795837402346, "median": 1.9173593521118164, "p90": 5.356118392944336, "max": 12.32891845703125, "pos_frac": 0.90625, "sample": [0.9940910339355469, -0.19084930419921875, 4.503814697265625, 5.993133544921875, 3.45526123046875, -2.1418075561523438, 1.9162406921386719, 1.7174434661865234, 5.54010009765625, 3.0063095092773438, 4.956867218017578, 1.4635124206542969, 0.605316162109375, 1.9675788879394531, 3.788909912109375, 0.4792137145996094, -0.5208358764648438, 0.8617210388183594, 3.0529251098632812, 9.64501953125, -0.3763580322265625, 0.3141326904296875, 0.820587158203125, 3.610393524169922, 0.014629364013671875, 3.923980712890625, 3.46380615234375, 1.1406841278076172, 3.9246482849121094, 1.959848403930664, 3.3886184692382812, 0.74542236328125, -1.528717041015625, 0.33827972412109375, 3.2501678466796875, 5.359447479248047, 4.597328186035156, 0.158721923828125, 7.053863525390625, -0.149810791015625, 0.12092971801757812, 2.8060760498046875, 4.574531555175781, 0.5669708251953125, 12.32891845703125, 0.32391357421875, 5.191162109375, 5.243316650390625, 0.5740203857421875, 1.9130706787109375, 1.7829208374023438, 7.2812652587890625, 2.744924545288086, 0.04769134521484375, 1.4899520874023438, 5.348350524902344, 1.2701835632324219, 1.918478012084961, 3.1322708129882812, 4.4015655517578125, 1.9325637817382812, 0.7282619476318359, 1.408681869506836, 0.18498992919921875], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000046.npy"}
{"epoch": 0.06754772393538913, "step": 47, "batch_size": 64, "mean": 2.371908664703369, "std": 2.1768782138824463, "min": -1.9719810485839844, "p10": -0.23514251708984357, "median": 1.8874835968017578, "p90": 5.3151123046875, "max": 8.66400146484375, "pos_frac": 0.875, "sample": [2.2157440185546875, 2.6910743713378906, 0.7494049072265625, 1.3969879150390625, 3.643596649169922, 1.2990684509277344, 0.8074188232421875, 1.3613243103027344, 5.676166534423828, 2.5680103302001953, -0.500946044921875, 4.972564697265625, 5.318023681640625, 1.5842819213867188, 2.5817947387695312, 1.8453826904296875, -0.0631866455078125, 0.9681053161621094, 1.2271652221679688, 4.202064514160156, 5.139839172363281, 2.0266799926757812, 0.645843505859375, 2.689067840576172, 2.842926025390625, 1.2306976318359375, -1.4410171508789062, 4.42596435546875, 5.98345947265625, 2.3172359466552734, -0.308837890625, 2.5293540954589844, 5.813556671142578, 3.3961105346679688, -1.9719810485839844, 0.2377300262451172, 0.0161285400390625, 0.9335842132568359, 1.4058303833007812, 8.66400146484375, 4.8631744384765625, 1.9295845031738281, 1.17706298828125, 1.82379150390625, 6.80059814453125, 4.70660400390625, -0.33632659912109375, 2.765247344970703, 7.89599609375, -0.42919158935546875, 1.5947723388671875, 1.445709228515625, -0.6197052001953125, 1.0687751770019531, 2.0370025634765625, 1.2476730346679688, 5.308319091796875, 1.1781234741210938, 4.2630462646484375, 1.4607658386230469, 3.344818115234375, 3.308502197265625, 2.1599159240722656, 1.687673568725586], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000047.npy"}
{"epoch": 0.06901615271659324, "step": 48, "batch_size": 64, "mean": 2.3516030311584473, "std": 2.359065055847168, "min": -4.549468994140625, "p10": 0.038265228271484465, "median": 1.8140754699707031, "p90": 5.7594013214111355, "max": 9.686935424804688, "pos_frac": 0.890625, "sample": [2.7083282470703125, 0.5056991577148438, 6.1518096923828125, 6.307823181152344, 5.164955139160156, 2.9857635498046875, -0.23846435546875, 1.8299102783203125, 3.5550155639648438, 3.33184814453125, 1.1695785522460938, 4.963615417480469, 1.7997283935546875, -0.240936279296875, 0.12769317626953125, 0.8311271667480469, 4.92327880859375, -0.09797477722167969, 0.74005126953125, 0.47307586669921875, 0.6731147766113281, 6.7100372314453125, -0.31878662109375, 1.5743560791015625, 4.871315002441406, 1.2008590698242188, 2.9966659545898438, 3.02569580078125, -6.103515625e-05, 2.582805633544922, 1.3176536560058594, 1.8632793426513672, 2.6289443969726562, 1.6282234191894531, 7.357269287109375, 3.3536949157714844, 3.8341827392578125, 7.331413269042969, 2.9661865234375, 0.6786727905273438, 0.5281410217285156, 1.2382068634033203, 0.38057708740234375, 1.8194770812988281, 0.902191162109375, 2.896139144897461, 2.095590591430664, 1.2208099365234375, -1.3564319610595703, 6.014163970947266, 1.9162673950195312, 2.879056930541992, 1.195526123046875, 9.686935424804688, 3.5441818237304688, -4.549468994140625, 4.535541534423828, 1.0745086669921875, 0.5513076782226562, 1.07684326171875, 4.5484466552734375, 1.6658401489257812, 1.8086738586425781, 1.5626220703125], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000048.npy"}
{"epoch": 0.07048458149779736, "step": 49, "batch_size": 64, "mean": 2.6118998527526855, "std": 2.5745668411254883, "min": -2.109222412109375, "p10": -0.023649406433105404, "median": 2.1525516510009766, "p90": 6.1859474182128915, "max": 9.959976196289062, "pos_frac": 0.890625, "sample": [2.893707275390625, 0.817169189453125, 0.1699047088623047, 2.1521759033203125, 2.4387359619140625, 4.265575408935547, 0.6814346313476562, 1.7333011627197266, -0.9183502197265625, 0.41626739501953125, 0.04108238220214844, 1.6267223358154297, 1.7221908569335938, 7.00018310546875, 5.938652038574219, 2.37847900390625, 2.5162925720214844, 1.1619224548339844, 2.1455001831054688, 1.9257736206054688, 1.3253421783447266, -2.109222412109375, 2.42108154296875, 0.2675132751464844, 9.001190185546875, 6.29193115234375, 0.9311904907226562, 4.757484436035156, 3.8829498291015625, 0.8509349822998047, 5.1326446533203125, 9.811431884765625, 2.6952362060546875, 0.417755126953125, -0.23358917236328125, -1.9015121459960938, 2.086872100830078, 2.538818359375, 2.8567657470703125, 2.8334426879882812, 2.1529273986816406, -0.0619659423828125, -0.0513916015625, 2.9640655517578125, 3.7750473022460938, 9.959976196289062, 1.4683952331542969, 2.1462020874023438, 6.3362274169921875, 4.0062408447265625, 4.722503662109375, 2.471111297607422, 3.5115509033203125, 3.1780242919921875, 9.205047607421875, 0.3647480010986328, -0.949310302734375, 1.9561004638671875, 1.6634902954101562, 2.5587158203125, 3.2955322265625, 1.5034332275390625, 4.309349060058594, 1.7105865478515625], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000049.npy"}
{"epoch": 0.07195301027900147, "step": 50, "batch_size": 64, "mean": 2.6372203826904297, "std": 2.5380706787109375, "min": -1.87628173828125, "p10": -0.3694915771484373, "median": 2.141895294189453, "p90": 6.03632049560547, "max": 9.541915893554688, "pos_frac": 0.859375, "sample": [4.36517333984375, 0.8486709594726562, -0.898712158203125, 6.563255310058594, 0.6743392944335938, 1.3475112915039062, 5.41473388671875, 0.8885002136230469, 3.0202293395996094, 1.3384160995483398, -0.633087158203125, 5.228492736816406, 2.316314697265625, 9.39654541015625, 0.5756988525390625, 1.58056640625, 4.5684967041015625, 2.3908615112304688, 5.478759765625, 1.779449462890625, 0.49462890625, 3.1907958984375, 6.383201599121094, 2.1144256591796875, 1.6357841491699219, 2.9575271606445312, 0.17415618896484375, -0.46303558349609375, 2.0189170837402344, 6.2450103759765625, 4.4899444580078125, 3.4598541259765625, 6.515750885009766, 9.541915893554688, -1.658712387084961, 0.3180961608886719, 1.2351226806640625, 0.6494216918945312, 0.2458038330078125, 2.4765472412109375, 1.6318817138671875, 1.2611141204833984, 3.902538299560547, 2.1693649291992188, 5.389011383056641, 4.398811340332031, -0.8738632202148438, 1.7531242370605469, 3.33819580078125, -0.1455078125, 5.54937744140625, 2.4222564697265625, 2.886157989501953, 3.8567886352539062, 1.7955474853515625, 4.682037353515625, 4.648292541503906, 1.3778457641601562, 7.015922546386719, -0.6433944702148438, 5.448951721191406, 0.67578125, -1.87628173828125, -0.15122222900390625], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000050.npy"}
{"epoch": 0.07342143906020558, "step": 51, "batch_size": 64, "mean": 2.6572089195251465, "std": 3.1503689289093018, "min": -3.2017822265625, "p10": -0.3540201187133789, "median": 1.9454078674316406, "p90": 6.787166595458986, "max": 13.368698120117188, "pos_frac": 0.84375, "sample": [6.942909240722656, 2.5899581909179688, 0.6777191162109375, -0.3613243103027344, 0.537445068359375, 1.240509033203125, 5.6800994873046875, -0.8558025360107422, 1.5602035522460938, 0.2871856689453125, 6.42376708984375, 4.816265106201172, 1.194122314453125, 3.0117645263671875, 1.959686279296875, 3.228240966796875, 2.7970619201660156, -0.16199684143066406, 7.454887390136719, 3.6050338745117188, 3.0867652893066406, -0.4498138427734375, 0.29408836364746094, 3.4653244018554688, 1.9311294555664062, 4.916431427001953, 0.1827259063720703, 4.089241027832031, 0.4663352966308594, 2.5663681030273438, 13.368698120117188, 1.8416824340820312, 0.7270088195800781, 4.011871337890625, 2.423694610595703, 9.194183349609375, 1.12982177734375, 9.283493041992188, 4.906013488769531, 1.2575836181640625, 1.76483154296875, 0.6970329284667969, 2.4572830200195312, 6.023193359375, 1.6948966979980469, 0.1518096923828125, 3.97955322265625, -1.9897003173828125, -0.3369770050048828, 10.673553466796875, 9.530181884765625, 4.027275085449219, 4.922435760498047, -0.7600784301757812, 0.7275333404541016, 3.8545150756835938, -0.0024700164794921875, 0.32799530029296875, 1.1175689697265625, -3.2017822265625, 2.1864776611328125, -1.3552703857421875, 2.223163604736328, 0.0279541015625], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000051.npy"}
{"epoch": 0.07488986784140969, "step": 52, "batch_size": 64, "mean": 4.237063407897949, "std": 3.3493356704711914, "min": -1.7982864379882812, "p10": 0.6294500350952148, "median": 3.613128662109375, "p90": 8.485014343261719, "max": 13.931549072265625, "pos_frac": 0.953125, "sample": [1.0987968444824219, 8.463203430175781, 3.018688201904297, 8.464324951171875, 6.132354736328125, 6.9514923095703125, 2.1147994995117188, 5.3301849365234375, 7.861297607421875, 8.57891845703125, 0.6287765502929688, 8.493881225585938, 4.092437744140625, 3.6527099609375, 6.39069938659668, 6.45942497253418, 2.5824928283691406, 3.854898452758789, 12.137802124023438, 7.1951751708984375, 2.5299148559570312, 0.72186279296875, 7.931510925292969, 1.7334632873535156, 0.11501121520996094, 5.3351287841796875, 1.6422233581542969, 1.3283195495605469, 6.529165267944336, 13.931549072265625, 6.05059814453125, 1.9577560424804688, 7.395591735839844, 3.2675209045410156, 3.0606765747070312, 9.678024291992188, 3.329570770263672, 0.3978271484375, 3.57354736328125, -1.7982864379882812, -1.70880126953125, 2.0833740234375, 5.566507339477539, 3.8369216918945312, 1.5179023742675781, 5.875587463378906, 3.2610321044921875, 8.62225341796875, 5.37371826171875, 1.0484390258789062, 0.4574012756347656, 1.268829345703125, 2.1425247192382812, 3.6831893920898438, 2.4617462158203125, 7.120197296142578, 4.031898498535156, 0.8465652465820312, 11.746994018554688, -0.8127784729003906, 4.803733825683594, 3.3020858764648438, 1.7983932495117188, 0.6310214996337891], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000052.npy"}
{"epoch": 0.0763582966226138, "step": 53, "batch_size": 64, "mean": 4.196630477905273, "std": 4.2808685302734375, "min": -1.9476776123046875, "p10": 0.3231447219848634, "median": 2.7606887817382812, "p90": 10.571642303466799, "max": 18.73052978515625, "pos_frac": 0.921875, "sample": [-0.8099937438964844, 2.1188201904296875, 2.2463836669921875, 2.20050048828125, 0.4113616943359375, 3.1447219848632812, 8.641929626464844, -0.24010467529296875, 5.091560363769531, -0.7379302978515625, 1.49053955078125, 18.3831787109375, -1.3860931396484375, 0.7262420654296875, 4.019065856933594, 4.992179870605469, 1.2074127197265625, 8.249702453613281, 2.0535202026367188, 8.563697814941406, 2.9289398193359375, 4.0063018798828125, 7.85394287109375, 5.768074035644531, 10.839958190917969, 1.3887405395507812, 4.581062316894531, 0.7178726196289062, 4.0500335693359375, 1.6293373107910156, 11.612152099609375, 6.162799835205078, 5.166408538818359, 6.251197814941406, 3.0218963623046875, 0.21321487426757812, 0.7402763366699219, 3.2049293518066406, 0.9293117523193359, 3.2331275939941406, 2.2421493530273438, 5.925832748413086, 1.341684341430664, 9.945571899414062, -1.9476776123046875, 11.44354248046875, 12.562088012695312, 2.0373306274414062, 3.8625946044921875, 2.483297348022461, 2.0592575073242188, 2.5559215545654297, 1.7740325927734375, 1.53369140625, 2.0783767700195312, 3.119091033935547, 2.2975330352783203, 18.73052978515625, 1.9578666687011719, 9.8035888671875, 4.039310455322266, 2.592437744140625, 11.194686889648438, 0.2853374481201172], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000053.npy"}
{"epoch": 0.07782672540381791, "step": 54, "batch_size": 64, "mean": 3.9426727294921875, "std": 3.5844852924346924, "min": -0.7063922882080078, "p10": 0.43321208953857426, "median": 3.0832958221435547, "p90": 8.562593841552735, "max": 17.212173461914062, "pos_frac": 0.953125, "sample": [5.277534484863281, 3.3276901245117188, 6.590782165527344, 0.45670127868652344, 0.3134193420410156, 1.958404541015625, 0.9408149719238281, 2.1193008422851562, 4.6100616455078125, 7.013362884521484, 0.8744449615478516, 3.3246078491210938, 8.570281982421875, 1.5412483215332031, 2.4814834594726562, 8.065559387207031, 0.03601837158203125, 3.786266326904297, 3.977781295776367, 4.517097473144531, -0.11805534362792969, 1.2114028930664062, 10.675956726074219, 1.0071601867675781, 4.9674530029296875, 0.4231452941894531, 3.7352867126464844, 3.3562469482421875, 3.9692611694335938, 4.912921905517578, 1.40179443359375, 0.6071815490722656, 5.6462249755859375, 0.15789413452148438, 0.47603416442871094, 2.8085594177246094, 8.544654846191406, 3.1452598571777344, 4.9804229736328125, 10.581512451171875, 1.6556892395019531, 2.4998512268066406, 17.212173461914062, 8.646430969238281, -0.7063922882080078, 2.6058692932128906, -0.137542724609375, 9.745025634765625, 0.5463180541992188, 3.4451751708984375, 1.3735427856445312, 2.8476600646972656, 8.230934143066406, 1.5646591186523438, 1.9809646606445312, 12.860137939453125, 1.0972518920898438, 3.021331787109375, 3.922760009765625, 8.36865234375, 1.9145870208740234, 8.03753662109375, 1.0915679931640625, 8.23370361328125], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000054.npy"}
{"epoch": 0.07929515418502203, "step": 55, "batch_size": 64, "mean": 4.647494316101074, "std": 5.00372314453125, "min": -6.6195068359375, "p10": -0.7114389419555662, "median": 3.889634132385254, "p90": 11.178639221191407, "max": 20.0045166015625, "pos_frac": 0.828125, "sample": [7.203422546386719, 7.4886474609375, 4.123573303222656, -1.404510498046875, 17.2767333984375, -0.89581298828125, 5.230327606201172, 2.963665008544922, 11.547149658203125, -0.22374725341796875, 3.7293243408203125, 4.102653503417969, 1.6938552856445312, 8.4273681640625, 13.664825439453125, 7.5207366943359375, 3.6654052734375, 6.73760986328125, 20.0045166015625, 3.1762466430664062, 7.8856048583984375, -1.310943603515625, 2.7619552612304688, 5.820240020751953, 17.005279541015625, 0.188720703125, 11.21209716796875, 0.742523193359375, 0.3457489013671875, 5.431488037109375, 2.9542770385742188, -0.8835296630859375, 3.710956573486328, 0.132781982421875, 8.150680541992188, 2.8279266357421875, -0.7954769134521484, 0.8907699584960938, 1.3423423767089844, -3.410858154296875, 6.966464996337891, 2.317415237426758, 1.0354080200195312, 11.100570678710938, 6.000244140625, -0.515350341796875, 1.0096511840820312, 7.8471527099609375, 4.049943923950195, -6.6195068359375, 2.090097427368164, 12.400039672851562, 5.3971405029296875, 3.280254364013672, 5.023136138916016, 4.442346572875977, -0.3647727966308594, 7.288818359375, 5.2167510986328125, 9.959098815917969, 4.429481506347656, -0.24744415283203125, 3.47235107421875, 10.825767517089844], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000055.npy"}
{"epoch": 0.08076358296622614, "step": 56, "batch_size": 64, "mean": 4.4003424644470215, "std": 4.850205421447754, "min": -6.169879913330078, "p10": -0.8566898345947264, "median": 4.1577043533325195, "p90": 10.041467285156251, "max": 17.850967407226562, "pos_frac": 0.796875, "sample": [-0.93695068359375, 5.410865783691406, -1.5038909912109375, 6.495414733886719, -0.45127296447753906, -6.169879913330078, 7.798187255859375, -0.6694145202636719, 2.9950294494628906, -2.6388320922851562, 2.497314453125, 2.2322921752929688, 5.7478179931640625, 11.00567626953125, 1.9350910186767578, -0.07195281982421875, -0.26609230041503906, 8.198204040527344, 7.550773620605469, 6.461051940917969, -1.1780242919921875, 7.34027099609375, 1.7268600463867188, 14.4989013671875, 6.3543853759765625, 10.153564453125, 1.1521968841552734, 0.03583335876464844, 7.083574295043945, 1.992025375366211, 4.627952575683594, 5.960231781005859, 6.982452392578125, 1.085042953491211, 8.13631820678711, 8.777229309082031, 0.22155380249023438, 12.971694946289062, 4.906578063964844, 5.2577362060546875, 4.3245697021484375, 3.6985321044921875, 2.767223358154297, 14.247772216796875, 5.327186584472656, 9.412368774414062, 4.545969009399414, 8.627059936523438, 1.1382369995117188, 17.850967407226562, 1.536407470703125, -0.06459808349609375, 2.8432388305664062, 9.7799072265625, -4.560333251953125, 2.130237579345703, -0.1116180419921875, 16.77690887451172, 4.128198623657227, -1.6636962890625, 2.7658538818359375, 1.358551025390625, 4.1872100830078125, 6.8699493408203125], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000056.npy"}
{"epoch": 0.08223201174743025, "step": 57, "batch_size": 64, "mean": 4.910983085632324, "std": 4.3147149085998535, "min": -6.692779541015625, "p10": 0.16893215179443377, "median": 4.94651985168457, "p90": 10.740789031982423, "max": 15.795303344726562, "pos_frac": 0.921875, "sample": [5.968318939208984, 2.635833740234375, 0.8149185180664062, -1.1626434326171875, 7.158782958984375, -6.692779541015625, 7.1305084228515625, 0.35544776916503906, 9.731201171875, 5.505775451660156, 11.6373291015625, 3.2033843994140625, 15.795303344726562, 2.482086181640625, 5.197608947753906, 4.06988525390625, -0.7091255187988281, 1.8412322998046875, 5.347328186035156, 15.132186889648438, 8.596794128417969, 10.499710083007812, 8.305877685546875, 3.515979766845703, 4.964666366577148, 0.9833183288574219, 11.69961166381836, 4.928373336791992, 2.9974517822265625, 2.6489524841308594, 0.7833480834960938, 3.406513214111328, 3.49029541015625, 0.08899688720703125, 7.1102294921875, 5.58282470703125, 0.03163909912109375, 1.5858192443847656, 1.8030624389648438, 6.363807678222656, 7.3914947509765625, 1.7430591583251953, 7.149681091308594, 4.124176025390625, 6.6438140869140625, -1.7656822204589844, 7.0087890625, 5.744682312011719, 6.792205810546875, 2.32568359375, 1.3794326782226562, 12.684982299804688, 4.519844055175781, 2.194355010986328, 6.7040252685546875, 3.1053237915039062, 0.5071620941162109, 9.520721435546875, 8.052101135253906, -2.5657119750976562, 10.844108581542969, 14.043731689453125, 5.073564529418945, 6.251533508300781], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000057.npy"}
{"epoch": 0.08370044052863436, "step": 58, "batch_size": 64, "mean": 5.7713303565979, "std": 5.657684803009033, "min": -5.928565979003906, "p10": 0.21852321624755883, "median": 3.8745803833007812, "p90": 13.77212982177735, "max": 20.018661499023438, "pos_frac": 0.90625, "sample": [11.271759033203125, -1.028635025024414, 0.46285057067871094, 3.4391632080078125, 3.5609664916992188, 3.5947036743164062, 14.446479797363281, 2.1942367553710938, 10.926155090332031, 2.644533157348633, 3.979717254638672, 5.9615478515625, 5.779792785644531, 3.0706005096435547, 2.0129623413085938, 12.082038879394531, 8.419147491455078, 7.2857818603515625, 3.7694435119628906, 2.389719009399414, 5.1368560791015625, 10.86566162109375, 11.191925048828125, -0.4236907958984375, 9.107276916503906, 2.5308914184570312, 7.606101989746094, 5.599882125854492, 17.184173583984375, 3.1443634033203125, 20.018661499023438, 7.044490814208984, 6.075962066650391, 0.11381149291992188, 12.198646545410156, 0.8935756683349609, 10.277702331542969, 2.8717384338378906, 3.369647979736328, -2.2952423095703125, 1.3782691955566406, 1.7015666961669922, 16.895126342773438, -0.6351242065429688, 8.0185546875, 0.4912452697753906, 1.1420040130615234, 6.599151611328125, -3.836456298828125, 16.745574951171875, 1.9903202056884766, 5.2597808837890625, 0.7492027282714844, 11.719135284423828, 9.237297058105469, -5.928565979003906, 18.999069213867188, 2.09356689453125, 2.6790008544921875, 6.051036834716797, 11.960426330566406, 1.6408939361572266, 0.6366043090820312, 15.002082824707031], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000058.npy"}
{"epoch": 0.08516886930983847, "step": 59, "batch_size": 64, "mean": 5.904120922088623, "std": 5.8766703605651855, "min": -4.167633056640625, "p10": -0.8982154846191405, "median": 5.100193023681641, "p90": 14.493823242187501, "max": 24.790420532226562, "pos_frac": 0.875, "sample": [6.448703765869141, 5.6241302490234375, 11.740921020507812, 16.3671875, 12.034339904785156, 3.722911834716797, 24.790420532226562, 15.716567993164062, 5.835432052612305, 11.961502075195312, 0.7617359161376953, 7.754085540771484, -2.71868896484375, 1.9028797149658203, 15.907241821289062, -0.9519500732421875, 2.0314388275146484, 10.717155456542969, 5.867622375488281, 14.563934326171875, 0.40130043029785156, 1.8956985473632812, 0.85076904296875, 0.6390800476074219, 2.9631729125976562, 18.48236083984375, 2.036684036254883, 1.7098979949951172, 5.8824462890625, -4.167633056640625, 6.834114074707031, -1.1528816223144531, 5.855934143066406, 6.01922607421875, 7.669166564941406, 14.038871765136719, 2.094085693359375, 10.069496154785156, 4.99993896484375, 3.123973846435547, -3.7522201538085938, -1.9243354797363281, 4.79095458984375, -0.7728347778320312, 3.2116756439208984, 8.072372436523438, 6.2455291748046875, 2.1307601928710938, 5.7762451171875, 1.7792110443115234, 2.8757362365722656, 4.728797912597656, 2.7119216918945312, 7.966411590576172, 4.231672286987305, 10.268623352050781, 7.757354736328125, 2.4224624633789062, 1.861602783203125, -1.3078842163085938, 14.008415222167969, 14.927322387695312, 5.200447082519531, 14.330230712890625], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000059.npy"}
{"epoch": 0.08663729809104258, "step": 60, "batch_size": 64, "mean": 4.9901885986328125, "std": 5.19807243347168, "min": -9.092742919921875, "p10": -1.0888229370117182, "median": 5.3869781494140625, "p90": 10.974187469482425, "max": 21.057083129882812, "pos_frac": 0.84375, "sample": [6.2084197998046875, -1.7886962890625, 1.1574516296386719, 10.011886596679688, 0.6982269287109375, 2.596874237060547, 8.319869995117188, 6.093162536621094, 2.9236297607421875, -9.092742919921875, 5.196689605712891, -0.09032440185546875, 1.751382827758789, 2.8239002227783203, 8.606155395507812, 3.940582275390625, 4.5912322998046875, 8.35748291015625, 1.8901824951171875, 7.348640441894531, 11.608009338378906, 9.319305419921875, 14.689453125, 7.225059509277344, 7.267219543457031, 5.8544921875, -0.42945098876953125, 5.2878875732421875, 7.3043060302734375, 0.2488861083984375, 6.986240386962891, 13.121795654296875, 2.8211631774902344, 5.4860687255859375, 5.897418975830078, -1.31048583984375, 9.017585754394531, 10.289199829101562, 1.681671142578125, -3.6586532592773438, 9.96124267578125, 2.5952415466308594, 4.976020812988281, 7.1240234375, 3.5243682861328125, -0.5716094970703125, -7.573207855224609, 12.506210327148438, 14.6629638671875, 8.6400146484375, 11.267753601074219, 1.8645248413085938, -1.322906494140625, -1.6233901977539062, 8.858833312988281, 5.965860366821289, 21.057083129882812, 6.397365570068359, 1.6146087646484375, 7.077352523803711, 0.8898391723632812, 6.436820983886719, 3.8631324768066406, 0.9287605285644531], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000060.npy"}
{"epoch": 0.0881057268722467, "step": 61, "batch_size": 64, "mean": 5.912901878356934, "std": 7.457601547241211, "min": -7.6274566650390625, "p10": -0.431458282470703, "median": 4.498050689697266, "p90": 13.635128021240234, "max": 41.295013427734375, "pos_frac": 0.859375, "sample": [2.116292953491211, 1.1697845458984375, -0.47957611083984375, 2.3249130249023438, 10.264297485351562, 0.25241661071777344, 0.21465492248535156, 5.774391174316406, -0.116455078125, 20.0989990234375, 16.205169677734375, 15.037872314453125, 22.578529357910156, 9.701530456542969, 5.49346923828125, 7.42987060546875, -0.7934494018554688, 1.3894577026367188, 9.820659637451172, 9.464042663574219, 2.7467117309570312, 5.258110046386719, 11.851898193359375, 5.823268890380859, 4.4611968994140625, 5.644264221191406, 4.646690368652344, 4.307746887207031, 6.237083435058594, 6.6816558837890625, 0.72430419921875, 2.421783447265625, 11.119903564453125, 13.7059326171875, 12.234405517578125, 16.766586303710938, 13.334808349609375, 1.7587814331054688, 0.415802001953125, 10.084976196289062, 2.6173934936523438, 0.14586257934570312, 8.984298706054688, 2.0706138610839844, 1.2983512878417969, 13.469917297363281, 1.3711624145507812, 2.8479461669921875, 41.295013427734375, 9.525131225585938, 10.1549072265625, -0.319183349609375, -7.6274566650390625, 8.87481689453125, 2.0202388763427734, -3.947734832763672, 4.534904479980469, 4.294380187988281, -4.601593017578125, 1.71661376953125, 5.7366790771484375, -4.5992279052734375, -0.6762237548828125, 1.0660991668701172], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000061.npy"}
{"epoch": 0.08957415565345081, "step": 62, "batch_size": 64, "mean": 4.51983642578125, "std": 5.374363899230957, "min": -5.486785888671875, "p10": -1.3240362167358388, "median": 4.119012832641602, "p90": 11.231359100341798, "max": 22.063453674316406, "pos_frac": 0.859375, "sample": [6.114582061767578, 0.9098892211914062, 0.7740316390991211, 8.578086853027344, 4.573514938354492, 8.111183166503906, 5.587518692016602, 0.7585792541503906, 8.355621337890625, -3.7765045166015625, 11.32293701171875, 4.833641052246094, 1.4283771514892578, 0.05170249938964844, 7.415061950683594, 2.909872055053711, 7.1531982421875, 3.610015869140625, 0.05087852478027344, 4.122219085693359, 5.904232025146484, 6.824867248535156, 1.2559928894042969, 3.757232666015625, -3.027313232421875, 6.2147369384765625, 3.504467010498047, 6.1160888671875, -0.2690601348876953, 22.063453674316406, 14.828781127929688, 5.6171722412109375, 4.334877014160156, 2.0558624267578125, 1.6210250854492188, 4.138151168823242, 0.964599609375, 13.066848754882812, -5.486785888671875, -1.7761688232421875, -3.89752197265625, 0.9755897521972656, 1.3347091674804688, 1.1685867309570312, 1.4295196533203125, 6.936544418334961, 13.198009490966797, 7.120521545410156, -2.672821044921875, 10.397735595703125, 7.211132049560547, 4.146171569824219, -3.768402099609375, 3.0380496978759766, 11.017677307128906, 2.97064208984375, 2.48248291015625, 4.115806579589844, 20.512115478515625, 14.785964965820312, 5.76611328125, 0.8836097717285156, -0.234375, 5.758209228515625], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000062.npy"}
{"epoch": 0.09104258443465492, "step": 63, "batch_size": 64, "mean": 6.1104888916015625, "std": 6.131771087646484, "min": -9.339500427246094, "p10": -0.5052549362182611, "median": 5.784873962402344, "p90": 12.408344268798828, "max": 25.06402587890625, "pos_frac": 0.890625, "sample": [19.757888793945312, 5.783576965332031, 3.447427749633789, 7.448444366455078, 2.925323486328125, 1.0969009399414062, 3.1216659545898438, -4.4330291748046875, 5.091957092285156, 5.0791778564453125, -9.339500427246094, 12.443992614746094, 7.542407989501953, 4.470115661621094, 2.106199264526367, -0.7497539520263672, 12.325164794921875, 3.874217987060547, 8.193267822265625, 17.380409240722656, 10.191688537597656, 10.74737548828125, 5.2318572998046875, 0.06524276733398438, -1.062957763671875, 10.272336959838867, -2.5672760009765625, 0.1860065460205078, 4.641838073730469, 2.3830204010009766, 8.185302734375, 6.56048583984375, 6.1798095703125, 7.503089904785156, 8.13934326171875, 9.540046691894531, 9.980491638183594, 9.853233337402344, 3.8221206665039062, 3.9044570922851562, 0.772979736328125, 11.654327392578125, 1.7979850769042969, 5.911521911621094, -6.857276916503906, 10.76576042175293, 4.8894805908203125, 0.74493408203125, 2.2014007568359375, 5.361419677734375, 2.384033203125, 10.897354125976562, 13.929618835449219, 7.193668365478516, 14.13912582397461, 0.8020706176757812, 7.027887344360352, 25.06402587890625, -2.6287841796875, 11.847732543945312, 6.1835174560546875, 20.808425903320312, 5.786170959472656, 9.070541381835938], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000063.npy"}
{"epoch": 0.09251101321585903, "step": 64, "batch_size": 64, "mean": 6.8527960777282715, "std": 7.456490993499756, "min": -10.993743896484375, "p10": -0.5238840103149414, "median": 5.46428108215332, "p90": 16.024870300292974, "max": 35.107818603515625, "pos_frac": 0.859375, "sample": [0.8980941772460938, 12.987548828125, 21.463699340820312, 5.9749755859375, 5.018032073974609, 9.678482055664062, 2.3797435760498047, 12.598236083984375, -0.5559406280517578, 4.840156555175781, 1.2230110168457031, 8.243253707885742, 1.3094367980957031, 3.7265090942382812, 9.070625305175781, 22.479995727539062, 3.62835693359375, -1.4458808898925781, 13.107223510742188, 6.028650283813477, 8.07904052734375, -2.4117965698242188, 21.864532470703125, 6.490520477294922, 0.14386558532714844, 5.488670349121094, 0.8651828765869141, 5.300867080688477, -10.993743896484375, -0.08867454528808594, 5.322048187255859, 0.12767791748046875, 35.107818603515625, 4.934942245483398, 1.6982994079589844, 7.594753265380859, 12.913726806640625, 14.858200073242188, 7.03582763671875, 5.033779144287109, 4.306011199951172, 17.560867309570312, 9.211219787597656, 5.658195495605469, -0.5736236572265625, -7.313941955566406, 18.284896850585938, 2.903472900390625, 16.524871826171875, 1.4262161254882812, 11.253341674804688, 10.793930053710938, 9.444068908691406, -1.5513229370117188, 10.266288757324219, -0.4490852355957031, 3.612771987915039, 5.439891815185547, 9.44085693359375, 11.030349731445312, 2.7749099731445312, 8.234268188476562, 14.78558349609375, 3.4951438903808594], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000064.npy"}
{"epoch": 0.09397944199706314, "step": 65, "batch_size": 64, "mean": 6.764092445373535, "std": 7.006280899047852, "min": -15.503700256347656, "p10": 0.25150909423828127, "median": 5.589824676513672, "p90": 16.410420989990236, "max": 21.730880737304688, "pos_frac": 0.90625, "sample": [21.730880737304688, 19.12628173828125, 10.024612426757812, 10.9942626953125, 3.5163326263427734, 6.216156005859375, 1.2268753051757812, 10.3004150390625, 2.6696548461914062, 15.832725524902344, 12.430458068847656, 0.9989967346191406, -0.3700408935546875, 15.556610107421875, 20.56036376953125, 0.9311408996582031, 7.8014984130859375, 2.039091110229492, -3.5045089721679688, 3.5147705078125, 1.023193359375, 11.967819213867188, 4.497200012207031, -15.503700256347656, 10.7989501953125, 6.535186767578125, -2.8327789306640625, 5.627674102783203, 10.43096923828125, 5.2048492431640625, 5.240325927734375, 5.551975250244141, -1.3449554443359375, 3.943876266479492, 21.00335693359375, 4.672794342041016, 16.658004760742188, 10.912044525146484, 19.56702423095703, 5.665416717529297, 0.253997802734375, -6.655433654785156, 6.691619873046875, 1.9564933776855469, 5.53131103515625, 12.944580078125, 16.784996032714844, 0.40154266357421875, 5.30169677734375, 14.443145751953125, 9.577957153320312, 2.3816680908203125, 13.94512939453125, 0.8223667144775391, 9.767990112304688, 4.1921539306640625, 4.4820556640625, 1.70623779296875, 12.750152587890625, 10.373863220214844, 0.2504425048828125, 7.022041320800781, 0.882781982421875, 5.877290725708008], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000065.npy"}
{"epoch": 0.09544787077826726, "step": 66, "batch_size": 64, "mean": 6.522555351257324, "std": 7.3919596672058105, "min": -5.599090576171875, "p10": -0.2401662826538083, "median": 5.236921310424805, "p90": 15.329477691650391, "max": 34.85237121582031, "pos_frac": 0.890625, "sample": [5.821113586425781, 34.85237121582031, 3.0031814575195312, 11.529647827148438, 10.9637451171875, 9.167701721191406, 1.4882984161376953, 0.52142333984375, 8.586843490600586, 5.688560485839844, -5.599090576171875, 2.006898880004883, 4.785282135009766, -0.37213897705078125, 10.239482879638672, 2.438201904296875, 3.477083206176758, 0.06777000427246094, 2.4189701080322266, 14.609664916992188, 4.443107604980469, 8.488311767578125, 0.8520736694335938, 7.7688751220703125, 13.974105834960938, 2.6302337646484375, 2.6339111328125, -1.5389175415039062, -2.6274681091308594, -5.4033203125, 6.480583190917969, 15.430778503417969, 15.742908477783203, 5.9316558837890625, 2.6756134033203125, 8.018013000488281, 6.232860565185547, 0.253082275390625, 1.0382575988769531, 11.512336730957031, 8.841583251953125, 3.4590682983398438, 0.43953704833984375, 6.069160461425781, 4.544040679931641, -5.253387451171875, 8.393898010253906, 1.0358505249023438, -1.1077346801757812, 10.518684387207031, 2.5744400024414062, 15.093109130859375, 20.124069213867188, 16.787551879882812, 2.4485092163085938, 11.983688354492188, 1.9254150390625, 6.368236541748047, 10.995872497558594, 2.3313446044921875, 25.473052978515625, 0.5321121215820312, 23.646209716796875, 9.987197875976562], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000066.npy"}
{"epoch": 0.09691629955947137, "step": 67, "batch_size": 64, "mean": 6.675516605377197, "std": 7.243378162384033, "min": -6.775115966796875, "p10": -0.5569335937499994, "median": 4.937321662902832, "p90": 15.652740478515627, "max": 32.694366455078125, "pos_frac": 0.875, "sample": [17.169082641601562, 2.0575523376464844, 16.800552368164062, 1.6858139038085938, 1.4937667846679688, -0.005828857421875, 12.818778991699219, 3.944978713989258, -5.1368865966796875, 1.757162094116211, 12.986438751220703, 8.16790771484375, 3.2550582885742188, 11.662017822265625, 0.69952392578125, -4.19816780090332, 9.427490234375, 7.649650573730469, 0.49738311767578125, -6.775115966796875, 14.677665710449219, 6.114479064941406, 1.1720695495605469, 0.40057373046875, 3.1707534790039062, 2.879180908203125, 15.791030883789062, 5.334844589233398, -0.9590606689453125, 7.061805725097656, 6.3792877197265625, 9.894397735595703, 4.174613952636719, 2.2007083892822266, 0.15045928955078125, 8.063167572021484, 11.831361770629883, 3.807832717895508, 1.21380615234375, -4.0703125, 3.3536148071289062, 13.944976806640625, 3.886768341064453, 2.897937774658203, 19.802841186523438, 11.645111083984375, 5.494604110717773, 13.072067260742188, 10.820396423339844, 9.840179443359375, 9.548538208007812, 4.362977981567383, 10.647710800170898, 24.53582763671875, 5.697078704833984, 1.2890148162841797, 32.694366455078125, 3.4975109100341797, -1.1147842407226562, 20.06073760986328, 15.330062866210938, 4.539798736572266, -0.793121337890625, 6.933038711547852], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000067.npy"}
{"epoch": 0.09838472834067548, "step": 68, "batch_size": 64, "mean": 6.57466983795166, "std": 6.196515083312988, "min": -3.9877471923828125, "p10": 0.26153793334960945, "median": 4.905551910400391, "p90": 16.188015747070313, "max": 27.139129638671875, "pos_frac": 0.90625, "sample": [-2.5362396240234375, 4.4011993408203125, 17.5606689453125, 5.557708740234375, 4.551445007324219, -3.9877471923828125, 6.585441589355469, 17.796463012695312, -0.5423469543457031, 0.23459625244140625, 11.636383056640625, 12.568878173828125, 12.95245361328125, 9.341503143310547, 10.2391357421875, 0.5392379760742188, 4.997402191162109, 9.401615142822266, 1.8900222778320312, 4.980842590332031, 0.32440185546875, 4.196819305419922, 3.5798187255859375, 4.08135986328125, 4.530345916748047, 12.078887939453125, 14.562728881835938, 7.826713562011719, 7.328792572021484, 16.389724731445312, 15.717361450195312, 17.986248016357422, 2.1282577514648438, 4.83026123046875, 6.889122009277344, 1.245992660522461, -0.33075904846191406, 4.3649749755859375, 3.235015869140625, 4.103431701660156, 12.321468353271484, 2.0477867126464844, 9.292692184448242, -3.065460205078125, 5.732521057128906, 5.1636810302734375, 2.8791465759277344, 16.92547607421875, 2.7542190551757812, 1.0458507537841797, 2.1120471954345703, 27.139129638671875, 18.285079956054688, 13.976943969726562, 9.01141357421875, 1.4343338012695312, 2.1226730346679688, 4.992740631103516, 9.752578735351562, -1.7152023315429688, 3.394561767578125, 1.7755565643310547, 4.4338836669921875, 5.72760009765625], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000068.npy"}
{"epoch": 0.09985315712187959, "step": 69, "batch_size": 64, "mean": 7.318740367889404, "std": 6.902398109436035, "min": -2.6322402954101562, "p10": 0.16949958801269532, "median": 5.481818199157715, "p90": 14.692226409912111, "max": 32.24334716796875, "pos_frac": 0.90625, "sample": [3.133148193359375, 5.090465545654297, 22.572921752929688, 5.416206359863281, 8.739517211914062, 10.788040161132812, -0.356353759765625, 0.32659912109375, -0.04786491394042969, 1.6804885864257812, 3.7866363525390625, 6.719524383544922, -0.786895751953125, 5.812583923339844, 8.184032440185547, 11.356624603271484, 16.77557373046875, 5.5988006591796875, 13.252777099609375, 12.285209655761719, 1.670806884765625, 14.961585998535156, 10.731010437011719, 10.360260009765625, 14.063720703125, 9.832809448242188, 4.427791595458984, 11.789466857910156, 10.316238403320312, 3.4309730529785156, 0.7023944854736328, 0.8092231750488281, 13.19647216796875, 32.24334716796875, 3.2995834350585938, 13.669586181640625, 3.739593505859375, 15.650192260742188, 11.342041015625, 0.4172954559326172, 2.2773399353027344, 12.774444580078125, 9.062313079833984, 0.18624496459960938, -0.06419754028320312, 10.139015197753906, 5.424427032470703, 29.333877563476562, 4.015533447265625, 5.47735595703125, 5.48628044128418, -2.6322402954101562, 3.6231536865234375, 3.6212539672851562, 2.2264785766601562, 5.705810546875, 14.02679443359375, 16.131866455078125, -0.364898681640625, 10.025543212890625, 1.020761489868164, 1.1899490356445312, 2.567535400390625, 0.162322998046875], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000069.npy"}
{"epoch": 0.1013215859030837, "step": 70, "batch_size": 64, "mean": 8.128886222839355, "std": 8.045876502990723, "min": -4.31121826171875, "p10": 0.452383041381836, "median": 7.872323989868164, "p90": 19.63241119384766, "max": 29.769851684570312, "pos_frac": 0.90625, "sample": [4.815633773803711, 2.3622589111328125, 8.98862075805664, 15.274696350097656, 19.112014770507812, 9.241823196411133, -3.34326171875, 1.207305908203125, 9.916252136230469, 3.0888671875, 0.421844482421875, 8.469673156738281, -1.4096813201904297, 5.1308746337890625, 29.769851684570312, 2.0326709747314453, 9.839080810546875, 19.855438232421875, 9.215738296508789, 29.2137451171875, 7.673095703125, 8.230743408203125, 1.40216064453125, 22.184585571289062, -0.46327972412109375, 12.893890380859375, 13.779983520507812, 9.94206428527832, 0.7394504547119141, 5.689464569091797, 3.577686309814453, 23.226226806640625, 1.1100044250488281, 0.8035926818847656, 1.2275390625, 0.5236396789550781, 9.693225860595703, 1.995697021484375, 8.323280334472656, -0.6374893188476562, 1.7976150512695312, 2.439882278442383, 2.416748046875, 2.051025390625, 8.613950729370117, 2.1297149658203125, 14.336135864257812, 17.786346435546875, -4.31121826171875, 4.176445007324219, 11.486114501953125, 0.7335433959960938, 1.2279224395751953, 22.888107299804688, 9.501720428466797, -0.7742424011230469, 25.197174072265625, 8.071552276611328, 16.910934448242188, 12.381021499633789, 3.163440704345703, 17.865798950195312, 9.43365478515625, 15.606338500976562], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000070.npy"}
{"epoch": 0.1027900146842878, "step": 71, "batch_size": 64, "mean": 8.960904121398926, "std": 8.312749862670898, "min": -7.409997940063477, "p10": 0.5756675720214852, "median": 7.625293731689453, "p90": 17.078650665283202, "max": 34.215667724609375, "pos_frac": 0.9375, "sample": [-3.8443527221679688, 28.08087158203125, 6.238239288330078, 0.18658447265625, 5.404354095458984, 9.849300384521484, 6.231956481933594, 8.786720275878906, 13.370162963867188, 3.753023147583008, 14.90643310546875, 4.190954208374023, 12.20440673828125, 3.7605762481689453, 1.7101516723632812, 7.596611022949219, 9.464893341064453, 10.572235107421875, 0.13632965087890625, 1.5601158142089844, 3.6556625366210938, 30.781753540039062, 5.3201751708984375, 4.9470367431640625, 12.452369689941406, 12.686546325683594, 4.429721832275391, 7.6539764404296875, 9.489936828613281, -5.792686462402344, 5.060127258300781, 15.4764404296875, 6.6717071533203125, 0.17917251586914062, 3.9922351837158203, 13.69244384765625, 14.543601989746094, 2.34100341796875, 15.212051391601562, 9.389747619628906, 5.279144287109375, 9.213058471679688, 6.961006164550781, 17.140625, 10.789993286132812, 10.740272521972656, 4.351341247558594, 25.94244384765625, 34.215667724609375, 31.285858154296875, 1.9011993408203125, 16.934043884277344, -7.409997940063477, -0.7246513366699219, 8.514579772949219, 2.6182098388671875, 6.45721435546875, 9.467010498046875, 15.341182708740234, 10.955245971679688, 22.295135498046875, 1.4835281372070312, 4.889671325683594, 8.513519287109375], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000071.npy"}
{"epoch": 0.10425844346549193, "step": 72, "batch_size": 64, "mean": 9.382756233215332, "std": 8.397205352783203, "min": -6.5157928466796875, "p10": 0.6926544189453125, "median": 7.4148406982421875, "p90": 21.557283020019533, "max": 35.409637451171875, "pos_frac": 0.921875, "sample": [13.572147369384766, 10.744298934936523, 5.22613525390625, 9.842979431152344, 8.03238296508789, 0.6773719787597656, 5.626502990722656, 19.12982940673828, -6.5157928466796875, 19.5081787109375, 9.9083251953125, 9.749893188476562, 6.195470809936523, -0.137664794921875, 13.561595916748047, 7.112171173095703, 7.898017883300781, 16.390396118164062, 9.322067260742188, 24.5966796875, 27.125564575195312, 0.7659244537353516, 4.765174865722656, 7.303409576416016, 12.848487854003906, 4.091529846191406, 13.956657409667969, 5.476325988769531, 4.984397888183594, 11.795921325683594, 2.7303848266601562, 5.865913391113281, 5.608673095703125, -6.4504547119140625, 21.890411376953125, 19.74053955078125, 13.90228271484375, 5.455053329467773, 15.595504760742188, 0.32027435302734375, 4.7441558837890625, 27.171142578125, 35.409637451171875, -4.825828552246094, 7.223197937011719, 21.998916625976562, 7.5107421875, 5.377296447753906, 0.7283134460449219, 20.779983520507812, 3.497894287109375, 2.6888389587402344, 2.8703460693359375, 15.21392822265625, 8.924938201904297, 14.118316650390625, 7.318939208984375, 12.059272766113281, 24.32550048828125, 8.364339828491211, 4.392154693603516, 1.7067947387695312, -0.7717742919921875, 1.4564094543457031], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000072.npy"}
{"epoch": 0.10572687224669604, "step": 73, "batch_size": 64, "mean": 8.946794509887695, "std": 11.48235034942627, "min": -14.023269653320312, "p10": -3.7999465942382806, "median": 6.220461845397949, "p90": 22.76017532348633, "max": 42.23579406738281, "pos_frac": 0.8125, "sample": [22.55963134765625, -4.939659118652344, 3.6079254150390625, 9.09710693359375, 3.8744983673095703, 22.84612274169922, -1.6612319946289062, 19.563610076904297, 4.454154968261719, -1.4351272583007812, 15.578811645507812, 23.90599822998047, 15.387184143066406, -0.5220184326171875, 2.162353515625, -10.138031005859375, 36.985076904296875, -0.9430160522460938, 8.355833053588867, 12.053825378417969, 12.72188949584961, 5.479034423828125, 5.899650573730469, 0.7132682800292969, 1.2895927429199219, 6.152095794677734, -8.10787582397461, 8.667327880859375, 19.03040313720703, 11.343738555908203, 8.590003967285156, 1.168355941772461, 35.499237060546875, -6.7968902587890625, 5.853843688964844, 20.058349609375, 17.326377868652344, -6.030242919921875, 2.0925941467285156, 3.6519622802734375, 1.685394287109375, -3.3609619140625, -3.9880828857421875, 6.288827896118164, 14.479438781738281, 11.937797546386719, 1.9626541137695312, 3.9484481811523438, 16.337867736816406, 28.936264038085938, 8.656723022460938, 4.730060577392578, 12.831787109375, 4.917549133300781, 7.144889831542969, 0.19564056396484375, 12.41265869140625, 34.88560485839844, 14.690933227539062, 42.23579406738281, 2.8350982666015625, 21.41022491455078, 16.04778289794922, -14.023269653320312], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000073.npy"}
{"epoch": 0.10719530102790015, "step": 74, "batch_size": 64, "mean": 9.200439453125, "std": 12.116964340209961, "min": -23.250022888183594, "p10": -1.010174751281738, "median": 7.525510787963867, "p90": 20.905987548828126, "max": 67.24639892578125, "pos_frac": 0.84375, "sample": [20.801589965820312, 7.864898681640625, 6.82574462890625, 0.4428730010986328, 17.632225036621094, 3.6551132202148438, 6.011442184448242, 34.846290588378906, 2.421072006225586, 3.5959701538085938, 8.16998291015625, 8.135156631469727, 13.687847137451172, 2.9170455932617188, 12.70672607421875, 67.24639892578125, 6.5578460693359375, 19.485641479492188, 11.029617309570312, 0.73419189453125, 1.8499412536621094, 31.244766235351562, 1.1610794067382812, 20.527503967285156, 20.326332092285156, 6.499336242675781, -2.4137306213378906, 3.55694580078125, 0.3229961395263672, -10.924583435058594, 9.834678649902344, 0.26215362548828125, -0.7561702728271484, 7.3648223876953125, -23.250022888183594, 14.605350494384766, -0.01204681396484375, 14.93471908569336, 4.6865997314453125, -0.6117820739746094, -5.225563049316406, 7.923595428466797, 11.16162109375, 20.950729370117188, -1.1190338134765625, 24.317962646484375, 10.732200622558594, 6.504508972167969, 4.2590179443359375, 6.234683990478516, 7.686199188232422, 25.168426513671875, 11.327903747558594, 14.157958984375, 23.23619842529297, 15.391609191894531, 4.597972869873047, 8.957695007324219, 15.251602172851562, 16.95281982421875, 2.676483154296875, -3.0298309326171875, -1.9298553466796875, 8.66666030883789], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000074.npy"}
{"epoch": 0.10866372980910426, "step": 75, "batch_size": 64, "mean": 12.63138198852539, "std": 11.624005317687988, "min": -5.204526901245117, "p10": -1.249568939208983, "median": 10.564733505249023, "p90": 29.680998992919925, "max": 45.88404846191406, "pos_frac": 0.890625, "sample": [0.11257171630859375, 12.21478271484375, 10.328048706054688, 13.276481628417969, 13.609664916992188, 10.099910736083984, 8.32659912109375, 4.9034576416015625, 25.4935302734375, 34.83892822265625, 20.70471954345703, 10.053749084472656, -1.833343505859375, 16.298744201660156, 45.88404846191406, 9.053020477294922, 23.32598876953125, 38.02888488769531, 6.387180328369141, 0.4223957061767578, 26.05682373046875, 14.380889892578125, 10.468830108642578, 8.437553405761719, 19.4468994140625, 29.175636291503906, 30.360031127929688, 2.9579849243164062, 5.247932434082031, 21.362224578857422, 11.906295776367188, 10.322017669677734, 4.8304595947265625, 41.06951904296875, -2.1860923767089844, -2.897216796875, 15.284866333007812, -5.1346893310546875, 13.170516967773438, 13.074487686157227, 2.357318878173828, 8.932674407958984, -2.93658447265625, 3.9053878784179688, -5.204526901245117, 16.742645263671875, 1.9624767303466797, 6.146327972412109, 22.7972412109375, 16.6988525390625, 3.7126331329345703, -3.4591827392578125, 5.7537384033203125, 3.6362266540527344, 10.660636901855469, 2.895658493041992, 7.8634490966796875, 11.207183837890625, 13.580604553222656, 11.608245849609375, 21.19335174560547, 29.8975830078125, 37.99755859375, 11.594673156738281], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000075.npy"}
{"epoch": 0.11013215859030837, "step": 76, "batch_size": 64, "mean": 8.773109436035156, "std": 9.099518775939941, "min": -20.450454711914062, "p10": 0.6716329574584962, "median": 7.432178497314453, "p90": 19.039941787719727, "max": 40.775726318359375, "pos_frac": 0.90625, "sample": [9.235357284545898, 7.944950103759766, 1.6662712097167969, 15.074905395507812, 1.2999229431152344, 15.164947509765625, 13.050552368164062, 17.604949951171875, 19.052570343017578, 7.245391845703125, 5.742179870605469, 11.180587768554688, 5.960615158081055, 22.979583740234375, 2.6284866333007812, 3.4383182525634766, 6.998935699462891, 40.775726318359375, 19.010475158691406, 2.7803497314453125, 1.1433639526367188, 8.70411491394043, 0.6970767974853516, 6.869880676269531, -8.5113525390625, 6.482454299926758, 3.307239532470703, 7.618965148925781, 4.3690948486328125, 8.764274597167969, 19.19646453857422, 13.534942626953125, 12.610328674316406, 6.79499626159668, 19.752777099609375, 2.5431594848632812, 0.6607284545898438, 15.580123901367188, 3.5582828521728516, -3.7612838745117188, 5.515445709228516, -0.9190731048583984, -2.1436920166015625, 1.4406280517578125, 11.117279052734375, 16.701919555664062, 33.158172607421875, 17.73133087158203, 6.931312561035156, 2.6788787841796875, 9.319145202636719, 1.256521224975586, 12.135726928710938, 7.0897674560546875, 10.611053466796875, 8.175273895263672, -0.3675041198730469, 7.0023193359375, 22.07666778564453, 12.762908935546875, 13.372848510742188, 16.943206787109375, 10.588607788085938, -20.450454711914062], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000076.npy"}
{"epoch": 0.11160058737151249, "step": 77, "batch_size": 64, "mean": 13.343066215515137, "std": 12.258736610412598, "min": -16.5076904296875, "p10": 0.9337432861328127, "median": 13.392967224121094, "p90": 30.18117370605469, "max": 42.02732849121094, "pos_frac": 0.921875, "sample": [8.980087280273438, 32.023643493652344, 19.850543975830078, 23.473072052001953, 26.093154907226562, 2.1454849243164062, 36.96458435058594, 15.151386260986328, 6.525846481323242, 6.5501708984375, 3.5205020904541016, 23.225448608398438, 22.159156799316406, 4.8842010498046875, 7.9115753173828125, 13.790908813476562, -15.11480712890625, 10.287689208984375, 17.751808166503906, 20.83563232421875, 10.680767059326172, -1.5295753479003906, 5.687465667724609, 0.2185516357421875, 1.05621337890625, 29.809661865234375, 0.881256103515625, 14.274490356445312, 6.405437469482422, -3.3990707397460938, 22.129337310791016, 18.361679077148438, 9.911605834960938, 40.581268310546875, 19.44989776611328, 6.423259735107422, 30.34039306640625, 41.65216064453125, 13.860122680664062, 4.959197998046875, -16.5076904296875, 4.6662750244140625, -2.55987548828125, 3.674989700317383, 7.447019577026367, 19.110504150390625, 21.410659790039062, 17.441762924194336, 16.592391967773438, 18.844009399414062, 4.639196395874023, 4.4126129150390625, 3.705587387084961, 14.828804016113281, 17.664031982421875, 16.89842987060547, 1.5060882568359375, 30.965011596679688, 42.02732849121094, 6.299571990966797, 13.22280502319336, 29.525985717773438, 13.563129425048828, 5.8133697509765625], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000077.npy"}
{"epoch": 0.1130690161527166, "step": 78, "batch_size": 64, "mean": 11.015655517578125, "std": 10.469409942626953, "min": -7.4049835205078125, "p10": -0.5744194030761715, "median": 10.833045959472656, "p90": 26.408636856079102, "max": 43.6893310546875, "pos_frac": 0.84375, "sample": [8.616119384765625, 1.3888778686523438, 23.7796630859375, 9.111343383789062, 0.9900360107421875, 28.452529907226562, 12.507038116455078, 43.6893310546875, 18.109375, 15.889881134033203, 13.516902923583984, 5.715290069580078, -3.407329559326172, 12.641403198242188, 16.59161376953125, 14.983531951904297, -0.1190338134765625, 4.979692459106445, -5.7267913818359375, -0.7474517822265625, 7.005117416381836, 11.564981460571289, 16.540645599365234, 11.693328857421875, 12.803356170654297, 13.016803741455078, 11.461166381835938, 11.387115478515625, 10.278976440429688, 4.455413818359375, 26.378646850585938, 12.44508171081543, 4.9304351806640625, -3.5954856872558594, 17.572914123535156, -2.55657958984375, 28.584686279296875, 12.849098205566406, 26.577667236328125, 7.1650543212890625, 26.421489715576172, 4.318267822265625, 30.06562042236328, 23.444313049316406, 14.844711303710938, 2.71697998046875, 8.01761245727539, 2.588888168334961, 9.777381896972656, 6.24348258972168, 3.0846633911132812, -0.16425132751464844, 9.777408599853516, -0.17067718505859375, 18.232025146484375, -7.4049835205078125, 36.043121337890625, 3.1117286682128906, -6.905265808105469, 20.02178955078125, 7.4373016357421875, 15.904617309570312, 0.6765899658203125, 15.398651123046875], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000078.npy"}
{"epoch": 0.1145374449339207, "step": 79, "batch_size": 64, "mean": 12.404184341430664, "std": 13.748146057128906, "min": -6.4891815185546875, "p10": -2.531490707397461, "median": 8.998678207397461, "p90": 33.950745391845714, "max": 50.09580993652344, "pos_frac": 0.78125, "sample": [2.1647872924804688, 36.669830322265625, 4.010686874389648, 42.00752258300781, 6.630229949951172, -0.8987808227539062, -0.7282009124755859, -4.1859893798828125, 28.622703552246094, 31.103668212890625, 13.789443969726562, -6.4891815185546875, 17.997779846191406, 41.939666748046875, 8.402873992919922, 17.159324645996094, 50.09580993652344, 39.41297912597656, 13.013010025024414, 2.1194610595703125, 45.01165771484375, 11.985557556152344, -0.8529434204101562, -4.9918212890625, 17.694541931152344, 0.8422374725341797, 5.430877685546875, 10.953441619873047, 8.892326354980469, 35.170921325683594, 3.0896072387695312, 7.120523452758789, 9.105030059814453, 20.37290382385254, -2.5154876708984375, 8.140312194824219, -2.9102210998535156, 6.2179718017578125, 27.085113525390625, 22.483184814453125, 10.961814880371094, 19.77154541015625, 20.421463012695312, 13.971122741699219, 18.754913330078125, -2.560304641723633, 2.3925552368164062, 17.724464416503906, 21.462081909179688, 7.872957229614258, 10.041664123535156, -3.3973751068115234, -1.4922332763671875, 25.626205444335938, -1.75872802734375, 11.943206787109375, 4.470359802246094, 4.075286865234375, 0.09262847900390625, -1.30218505859375, 7.753965377807617, 25.199966430664062, -2.538349151611328, 13.217422485351562], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000079.npy"}
{"epoch": 0.11600587371512482, "step": 80, "batch_size": 64, "mean": 11.025638580322266, "std": 13.256199836730957, "min": -10.358562469482422, "p10": -2.1448509216308587, "median": 10.114007949829102, "p90": 26.29355545043946, "max": 51.61427307128906, "pos_frac": 0.796875, "sample": [24.718421936035156, 6.038051605224609, -0.3877716064453125, 18.21454620361328, 2.9386959075927734, 1.6192626953125, -0.08202743530273438, -3.0724029541015625, 12.972579956054688, 10.510231018066406, 2.818979263305664, 11.272418975830078, -10.001934051513672, -10.358562469482422, 0.67535400390625, 16.619216918945312, 48.428619384765625, 20.456642150878906, 51.61427307128906, 33.46471405029297, 12.748275756835938, 5.250885009765625, 26.968612670898438, 23.761512756347656, 23.077865600585938, 9.717784881591797, -0.5084056854248047, 16.925949096679688, 9.560083389282227, 13.119064331054688, 0.22466278076171875, -8.53900146484375, 5.4471893310546875, -2.5371246337890625, 17.047683715820312, -1.2295455932617188, 38.42933654785156, 18.465530395507812, -4.933933258056641, 45.632080078125, 7.085798263549805, 2.6121063232421875, 3.441486358642578, -1.1327362060546875, 27.550987243652344, 6.786937713623047, 16.45146942138672, 13.80743408203125, 11.972030639648438, 11.180763244628906, 0.7879409790039062, -1.2128334045410156, 0.09110069274902344, 1.265045166015625, 18.024240493774414, 14.051528930664062, 3.5500259399414062, 3.924030303955078, 21.468521118164062, 13.884796142578125, 15.357353210449219, 20.292617797851562, -3.7325592041015625, 11.045028686523438], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000080.npy"}
{"epoch": 0.11747430249632893, "step": 81, "batch_size": 64, "mean": 15.350555419921875, "std": 16.064435958862305, "min": -8.888763427734375, "p10": -3.37009124755859, "median": 13.089485168457031, "p90": 34.20310363769532, "max": 72.83660888671875, "pos_frac": 0.890625, "sample": [24.081985473632812, 18.72747802734375, -8.888763427734375, 27.05290985107422, 13.898151397705078, 31.3841552734375, 11.200321197509766, -6.192493438720703, 5.141735076904297, 23.20691680908203, 17.122451782226562, -4.8180389404296875, 1.2086029052734375, 13.09979248046875, 49.716583251953125, 3.387125015258789, 13.079177856445312, 18.022537231445312, 35.411224365234375, -6.378292083740234, 0.6380634307861328, 12.410713195800781, 16.243980407714844, 41.73646545410156, 16.535926818847656, 8.78271484375, 8.60125732421875, 3.4014625549316406, 2.9616622924804688, 0.008453369140625, 16.57611083984375, 9.415328979492188, 2.3029937744140625, 9.915771484375, 23.91619873046875, 64.78033447265625, 26.447086334228516, 12.270706176757812, 2.7228050231933594, 36.799346923828125, 25.517364501953125, 28.026336669921875, 7.609779357910156, 2.945148468017578, 4.468894958496094, 1.1501121520996094, -7.731319427490234, 9.127763748168945, 22.64165496826172, 18.374252319335938, 11.791271209716797, 13.78814697265625, 13.445343017578125, -5.216548919677734, 29.501686096191406, 30.021461486816406, 72.83660888671875, 39.884490966796875, 6.293304443359375, 25.87371826171875, 5.394172668457031, 15.973861694335938, -5.0173797607421875, 19.804489135742188], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000081.npy"}
{"epoch": 0.11894273127753303, "step": 82, "batch_size": 64, "mean": 12.550470352172852, "std": 17.606733322143555, "min": -34.037872314453125, "p10": -4.440937805175781, "median": 11.6666259765625, "p90": 32.23881874084473, "max": 71.24411010742188, "pos_frac": 0.8125, "sample": [-0.8586540222167969, 16.97248077392578, -5.787422180175781, 13.650367736816406, 14.98629379272461, 13.871225357055664, -11.879768371582031, 11.445358276367188, 15.68377685546875, 39.55951690673828, 40.42547607421875, 10.92253303527832, -3.8736724853515625, 3.4110946655273438, 10.152551651000977, 66.57655334472656, 7.544486999511719, -34.037872314453125, -4.684051513671875, 22.03436279296875, 1.5271224975585938, 23.698623657226562, 31.34051513671875, 2.020536422729492, -10.02984619140625, -1.2141189575195312, 3.7390670776367188, 13.884918212890625, 14.0919189453125, 8.9520263671875, 10.391746520996094, 24.409255981445312, 20.50292205810547, -0.5518932342529297, 42.65596008300781, 13.505950927734375, 29.600021362304688, 11.887893676757812, 7.667081832885742, -1.0272216796875, 0.068878173828125, 20.831573486328125, 34.79747009277344, -7.091285705566406, 19.853225708007812, 12.223930358886719, 5.774480819702148, 32.62380599975586, 17.08789825439453, 5.656585693359375, 4.77081298828125, 5.30462646484375, -32.9161376953125, 5.06341552734375, 20.27037811279297, 16.61516571044922, 7.372200012207031, 19.523513793945312, 13.989479064941406, 71.24411010742188, 10.041358947753906, 15.88509750366211, 0.01703643798828125, 31.055419921875], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000082.npy"}
{"epoch": 0.12041116005873716, "step": 83, "batch_size": 64, "mean": 14.497251510620117, "std": 15.50546932220459, "min": -15.426780700683594, "p10": -2.4614952087402333, "median": 12.258077621459961, "p90": 36.29059143066407, "max": 54.85858154296875, "pos_frac": 0.828125, "sample": [12.906570434570312, 12.91476058959961, 11.246410369873047, 48.17816162109375, 28.505203247070312, 9.315496444702148, -2.9994277954101562, 34.30596923828125, 33.916259765625, 7.280025482177734, 3.7692184448242188, 13.062492370605469, 9.67181396484375, -1.3349380493164062, 13.245803833007812, 46.96455383300781, -7.026092529296875, 54.85858154296875, 6.308784484863281, 42.215599060058594, 23.632858276367188, 7.668540954589844, 37.141143798828125, -9.700401306152344, -15.426780700683594, -2.944305419921875, 32.684356689453125, -10.957633972167969, 14.843648910522461, 39.90574645996094, 16.993667602539062, 10.120559692382812, 3.735454559326172, 4.48822021484375, 13.451812744140625, 9.867424011230469, 19.225292205810547, -1.0393257141113281, 24.456863403320312, 5.439052581787109, 11.60958480834961, 7.995494842529297, 28.03424072265625, 45.02618408203125, 15.311851501464844, 30.47631072998047, 6.507774353027344, -11.15869140625, -1.03118896484375, 0.4542884826660156, -0.7988739013671875, 25.00213623046875, 7.450603485107422, 27.559219360351562, 15.758834838867188, 15.3857421875, 2.6406478881835938, 5.5778045654296875, 13.058902740478516, 19.569656372070312, 29.341400146484375, 11.514610290527344, 17.78241729736328, 3.863697052001953], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000083.npy"}
{"epoch": 0.12187958883994127, "step": 84, "batch_size": 64, "mean": 13.803592681884766, "std": 12.719524383544922, "min": -22.492843627929688, "p10": -0.16063461303710908, "median": 12.821098327636719, "p90": 28.634481048583986, "max": 51.36932373046875, "pos_frac": 0.890625, "sample": [7.3596343994140625, 7.6707000732421875, 3.834564208984375, 2.7633743286132812, 27.9471435546875, 3.13812255859375, 3.53826904296875, 16.017608642578125, 8.940702438354492, 10.569656372070312, 21.656471252441406, 21.44890594482422, -9.458038330078125, 14.382217407226562, 18.77151870727539, 25.018905639648438, 6.508901596069336, 51.36932373046875, 21.05569076538086, 1.0146331787109375, 6.353269577026367, 25.096839904785156, 11.044227600097656, 5.5979461669921875, -1.5487403869628906, 28.929054260253906, 16.58747100830078, 27.299224853515625, -22.492843627929688, -0.6547908782958984, 37.38402557373047, 3.38995361328125, 25.739242553710938, 12.740432739257812, 10.779403686523438, 0.12511825561523438, 30.596649169921875, 27.08148193359375, 17.079971313476562, 20.03807830810547, 4.2420196533203125, 22.436134338378906, -0.48461151123046875, 11.574926376342773, 20.702056884765625, 20.155590057373047, 0.19464874267578125, 0.4037055969238281, 22.277389526367188, 24.044143676757812, 5.907447814941406, 31.341476440429688, 33.44386291503906, 16.782997131347656, 38.074920654296875, 12.901763916015625, 20.603897094726562, 3.865509033203125, -4.568878173828125, 22.44029998779297, 8.95703125, 13.658611297607422, 10.043769836425781, -0.2831001281738281], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000084.npy"}
{"epoch": 0.12334801762114538, "step": 85, "batch_size": 64, "mean": 13.623421669006348, "std": 15.16620922088623, "min": -15.116950988769531, "p10": -1.2472665786743165, "median": 11.644775390625, "p90": 34.55279617309571, "max": 50.49420166015625, "pos_frac": 0.78125, "sample": [26.69377899169922, 8.545608520507812, 22.78443145751953, 2.264455795288086, 5.24627685546875, 17.329208374023438, 13.218452453613281, 3.8548545837402344, 40.324485778808594, 5.554634094238281, -2.2265567779541016, 5.841590881347656, 37.04499816894531, 13.638633728027344, 13.792417526245117, 15.63705062866211, -1.0117950439453125, -8.3724365234375, 16.09709930419922, -0.5644302368164062, -15.116950988769531, 42.66358947753906, 11.516815185546875, -1.274404525756836, 1.96368408203125, 31.979782104492188, 10.784637451171875, 1.75445556640625, 11.772735595703125, 7.319480895996094, 29.511947631835938, 0.3585052490234375, -5.9803924560546875, -2.0471744537353516, 10.522537231445312, 4.473058700561523, 47.99188232421875, 13.02059555053711, 23.17010498046875, 0.9744319915771484, -0.6324920654296875, 50.49420166015625, -0.9244632720947266, 27.9427490234375, 33.80574035644531, 9.789348602294922, 34.282493591308594, 34.66864013671875, 5.677520751953125, 3.1433868408203125, 17.671592712402344, 33.39618682861328, -1.166229248046875, -1.1839447021484375, 12.700637817382812, 12.758443832397461, 40.58207702636719, 12.203826904296875, 23.93364715576172, 25.942760467529297, -10.82720947265625, 14.667671203613281, -0.6407032012939453, 32.561004638671875], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000085.npy"}
{"epoch": 0.12481644640234948, "step": 86, "batch_size": 64, "mean": 14.717806816101074, "std": 21.53960609436035, "min": -25.152420043945312, "p10": -5.888851928710936, "median": 9.503189086914062, "p90": 40.551599121093766, "max": 83.99191284179688, "pos_frac": 0.78125, "sample": [-6.328746795654297, 10.862373352050781, 42.046051025390625, -9.163604736328125, 26.705839157104492, 17.785682678222656, 19.68937110900879, -3.5483169555664062, 61.080810546875, 7.184543609619141, 4.145734786987305, 2.6751327514648438, 3.9094085693359375, 12.387298583984375, -2.6533050537109375, 13.013481140136719, 33.207550048828125, 16.089214324951172, -1.0675468444824219, -20.726547241210938, 34.032012939453125, 11.354007720947266, 21.718685150146484, 8.144004821777344, 7.391502380371094, -8.764163970947266, 3.773406982421875, 43.7825927734375, -4.862430572509766, 4.215177536010742, 25.703937530517578, 15.315807342529297, 5.240137100219727, -25.152420043945312, 26.325576782226562, 37.064544677734375, 17.499923706054688, -15.135505676269531, 42.62544250488281, 4.152950286865234, 80.06124877929688, 13.978511810302734, 20.571487426757812, 7.537555694580078, -0.41751670837402344, 83.99191284179688, 5.501909255981445, 71.1593017578125, 28.088485717773438, 4.5790252685546875, -18.447181701660156, 29.124298095703125, 7.356298446655273, 25.799034118652344, 18.346160888671875, 28.481685638427734, 29.622480392456055, 2.322298049926758, 5.8715667724609375, 5.847145080566406, -0.9968185424804688, -4.71514892578125, 3.812225341796875, 12.744060516357422], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000086.npy"}
{"epoch": 0.1262848751835536, "step": 87, "batch_size": 64, "mean": 16.16939926147461, "std": 19.185428619384766, "min": -32.716949462890625, "p10": -1.4508285522460933, "median": 14.916419982910156, "p90": 40.49762420654297, "max": 81.44999694824219, "pos_frac": 0.828125, "sample": [17.133291244506836, 21.638446807861328, 0.6213016510009766, 6.949060440063477, 26.303939819335938, 2.0725479125976562, 15.074485778808594, -0.3956012725830078, 1.0164871215820312, 0.23352813720703125, -5.829620361328125, 22.04913330078125, -1.6147537231445312, 39.02128601074219, 17.078628540039062, 31.969711303710938, 36.671295166015625, 6.979970932006836, 60.204803466796875, 28.49041748046875, 15.249786376953125, 13.631290435791016, 4.570493698120117, 27.4337158203125, 23.129444122314453, 1.6731033325195312, 21.33111572265625, 81.44999694824219, 12.936973571777344, -0.2058563232421875, 12.02855110168457, 44.82891845703125, 42.13310241699219, 5.134757995605469, 38.28404235839844, 16.49748992919922, 10.688074111938477, 25.837757110595703, 14.249092102050781, 36.05604553222656, 0.40509033203125, 50.3973388671875, 18.222984313964844, 17.31606674194336, -32.716949462890625, 5.680887222290039, 25.813079833984375, -21.95402717590332, 24.8433837890625, 42.091644287109375, -6.495632171630859, -5.702537536621094, 5.401702880859375, 14.758354187011719, -0.1413116455078125, 41.130340576171875, 24.111541748046875, 21.695236206054688, -18.609893798828125, 12.795890808105469, 13.548606872558594, 1.9890995025634766, 28.722793579101562, -1.0683364868164062], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000087.npy"}
{"epoch": 0.1277533039647577, "step": 88, "batch_size": 64, "mean": 12.597654342651367, "std": 17.918014526367188, "min": -27.826080322265625, "p10": -2.212751388549804, "median": 9.893268585205078, "p90": 34.08934936523438, "max": 71.764892578125, "pos_frac": 0.828125, "sample": [9.355293273925781, 10.351799011230469, 44.25354766845703, 35.332763671875, 24.804840087890625, 26.301834106445312, 1.8245315551757812, 38.81419372558594, 4.036956787109375, 9.128108978271484, 17.889129638671875, 4.886636734008789, 7.0769500732421875, 25.108047485351562, -13.757125854492188, 19.579971313476562, -12.950775146484375, 15.135185241699219, -24.306808471679688, -24.686553955078125, -1.5542182922363281, 44.82427978515625, 20.018035888671875, -2.4949798583984375, 29.940383911132812, 4.093841552734375, 16.38507080078125, -0.8292388916015625, 14.981412887573242, 20.912002563476562, 27.898719787597656, 4.570518493652344, -0.2539405822753906, 22.584136962890625, 0.8660888671875, 51.73565673828125, 16.539146423339844, 31.18804931640625, 2.7827682495117188, 71.764892578125, 1.7836151123046875, 26.656509399414062, 5.075439453125, 10.871671676635742, 11.791641235351562, 13.675125122070312, 4.701271057128906, 0.5995693206787109, 19.665000915527344, 47.06231689453125, 20.979843139648438, 9.434738159179688, 2.426166534423828, 5.507061004638672, -27.826080322265625, 0.41965484619140625, 7.218240737915039, -1.5277595520019531, 5.534978866577148, 12.589637756347656, 14.288908004760742, -13.817153930664062, 9.146942138671875, 25.86138916015625], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000088.npy"}
{"epoch": 0.12922173274596183, "step": 89, "batch_size": 64, "mean": 13.830251693725586, "std": 15.56725025177002, "min": -28.45245361328125, "p10": -0.03566818237304631, "median": 12.890018463134766, "p90": 30.653027343750008, "max": 68.73341369628906, "pos_frac": 0.890625, "sample": [0.5542678833007812, 1.7271728515625, 2.6764373779296875, 15.87785530090332, -3.9764671325683594, 28.858978271484375, 19.673805236816406, 12.871650695800781, 0.5307159423828125, 40.26512908935547, 35.75349426269531, 16.972362518310547, 15.624893188476562, 26.748191833496094, 16.98310089111328, 22.676002502441406, 11.177505493164062, 7.3278045654296875, -1.2982540130615234, 12.698444366455078, 9.288749694824219, 1.3347339630126953, 12.955757141113281, 7.590526580810547, -3.6228790283203125, 3.993074417114258, -0.27840423583984375, 7.030113220214844, 41.77159118652344, 12.90838623046875, 13.556602478027344, 58.245361328125, 12.505657196044922, 31.421905517578125, 14.6932373046875, 4.68402099609375, 14.098407745361328, 68.73341369628906, 1.4868316650390625, 0.9756107330322266, 19.642803192138672, 4.486865997314453, 14.096822738647461, 19.13113784790039, 13.393655776977539, 5.7620086669921875, 7.075153350830078, 5.564815521240234, 4.958099365234375, 21.30516815185547, 13.698402404785156, 22.798873901367188, -1.4625778198242188, 9.013378143310547, 24.72803497314453, -28.45245361328125, -8.913345336914062, 17.84039306640625, 13.958162307739258, 10.56768798828125, 20.602813720703125, 17.763145446777344, 56.50996398925781, 3.9712982177734375], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000089.npy"}
{"epoch": 0.13069016152716592, "step": 90, "batch_size": 64, "mean": 17.629772186279297, "std": 23.41208267211914, "min": -25.36791229248047, "p10": -1.9634609222412105, "median": 11.436012268066406, "p90": 54.39006347656251, "max": 85.8770751953125, "pos_frac": 0.828125, "sample": [36.4212646484375, -25.36791229248047, 26.77130126953125, 51.799713134765625, 17.68645477294922, 24.38311767578125, 0.4247856140136719, -1.3287773132324219, 40.686431884765625, -0.2985877990722656, 30.04364776611328, 11.524848937988281, 2.4043807983398438, 11.415740966796875, 21.66234588623047, 57.93635559082031, -0.4155006408691406, 85.8770751953125, 32.31272888183594, 1.912841796875, 3.458925247192383, 4.857975006103516, 8.853477478027344, -2.1377487182617188, 66.11602783203125, 15.295707702636719, 19.09600067138672, 67.09033203125, 4.02906608581543, 17.888446807861328, 82.27264404296875, 11.456283569335938, -3.5121002197265625, 5.950279235839844, 8.113334655761719, -23.933441162109375, 18.594364166259766, 5.2419586181640625, 39.153106689453125, 2.6744918823242188, 6.8032989501953125, 8.92596435546875, -14.093681335449219, -1.5567893981933594, 9.281539916992188, 46.5986328125, 55.500213623046875, -21.095237731933594, -10.1036376953125, 7.364418029785156, 18.39122772216797, 1.9150047302246094, 13.323368072509766, 16.975257873535156, 37.963287353515625, 20.72167205810547, 8.794349670410156, 5.442575454711914, 26.091705322265625, 6.756557464599609, 12.416191101074219, 9.663007736206055, 58.461944580078125, 27.353111267089844], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000090.npy"}
{"epoch": 0.13215859030837004, "step": 91, "batch_size": 64, "mean": 16.636859893798828, "std": 20.00498390197754, "min": -17.73090362548828, "p10": -6.201046752929687, "median": 14.099203109741211, "p90": 40.173654937744146, "max": 72.70280456542969, "pos_frac": 0.75, "sample": [13.787483215332031, 37.939971923828125, -13.784446716308594, 5.028308868408203, 40.62214660644531, -0.37822914123535156, -2.1017913818359375, 31.589916229248047, 4.938322067260742, 12.609130859375, 29.38408660888672, 25.279067993164062, 26.420631408691406, 10.085700988769531, -17.73090362548828, 72.70280456542969, -8.324943542480469, 1.0582351684570312, 4.9394073486328125, 51.8135986328125, 10.861328125, 21.594085693359375, 61.30291748046875, -5.6569061279296875, 22.375701904296875, 39.127174377441406, -5.652992248535156, 0.5654888153076172, 14.41092300415039, -1.6284713745117188, -1.7534942626953125, 14.802953720092773, 36.08604431152344, -9.752182006835938, 16.492515563964844, 13.62038803100586, 37.409156799316406, -7.001190185546875, 16.218433380126953, -1.764364242553711, 5.622276306152344, 31.927505493164062, 38.98351287841797, 2.3446197509765625, -6.4342498779296875, 21.699554443359375, 17.536087036132812, 51.978271484375, 28.733078002929688, 22.395767211914062, 16.260009765625, -13.798931121826172, 0.28156471252441406, 12.734649658203125, -1.0207500457763672, 34.03362274169922, 0.8410511016845703, 54.49342346191406, 28.825660705566406, 12.639152526855469, 31.137359619140625, -3.7775802612304688, 30.013580322265625, 49.77375030517578], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000091.npy"}
{"epoch": 0.13362701908957417, "step": 92, "batch_size": 64, "mean": 16.04425811767578, "std": 20.02904510498047, "min": -26.968597412109375, "p10": -5.2096008300781245, "median": 13.160582542419434, "p90": 38.909125900268556, "max": 76.80056762695312, "pos_frac": 0.765625, "sample": [-0.7900581359863281, 3.271392822265625, 10.47100830078125, 29.596633911132812, -7.8969573974609375, 4.900657653808594, -26.968597412109375, -9.365493774414062, 26.130210876464844, 17.02727508544922, 39.856388092041016, 1.2742252349853516, 27.820068359375, 1.7740364074707031, 1.8838882446289062, 68.08981323242188, 39.159942626953125, 76.80056762695312, -15.45074462890625, -5.258270263671875, 17.357074737548828, -6.229072570800781, 22.4559326171875, -2.5951271057128906, -5.096038818359375, 33.11542510986328, -3.218292236328125, 51.11090087890625, 21.77178192138672, 7.359628677368164, 50.344764709472656, -1.2698841094970703, 22.74718475341797, 2.2027721405029297, 36.34540557861328, 3.9049129486083984, 17.490585327148438, 56.56916809082031, 7.976327896118164, 33.61262512207031, 29.313400268554688, 32.85923767089844, 0.119476318359375, 1.6463775634765625, 4.815673828125, 13.029945373535156, -2.0523834228515625, 26.509613037109375, 25.84478759765625, 6.741857528686523, -0.017047882080078125, -6.970148086547852, 38.32388687133789, 9.78829574584961, 13.291219711303711, 23.34349822998047, 25.348674774169922, 27.415748596191406, 28.340057373046875, 15.643089294433594, 8.325139999389648, -3.7213211059570312, 37.46044921875, 23.150970458984375], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000092.npy"}
{"epoch": 0.13509544787077826, "step": 93, "batch_size": 64, "mean": 16.5429630279541, "std": 17.971166610717773, "min": -21.28677749633789, "p10": -1.9383253097534165, "median": 13.13955307006836, "p90": 40.22276000976563, "max": 93.47415161132812, "pos_frac": 0.875, "sample": [18.478179931640625, 6.604188919067383, 16.3765869140625, 42.651519775390625, 41.19275665283203, 30.871620178222656, 4.0971527099609375, 13.682853698730469, 25.1219482421875, -2.576904296875, 46.479034423828125, 5.150493621826172, 33.595672607421875, 36.278663635253906, 93.47415161132812, 7.576740264892578, 17.70818328857422, 7.0884857177734375, 8.320167541503906, 55.392250061035156, -4.930870056152344, 9.955730438232422, 14.797016143798828, -0.6365585327148438, 20.421646118164062, 12.222213745117188, 14.291862487792969, 6.721626281738281, -2.496225357055664, 13.515323638916016, 0.1972808837890625, 12.870903015136719, 6.602361679077148, 18.325794219970703, 25.097618103027344, 29.202560424804688, 40.361000061035156, -7.037620544433594, -3.3365478515625, 28.47674560546875, 27.353736877441406, 28.41400718688965, 12.980720520019531, 4.400449752807617, 7.84980583190918, 6.619720458984375, 17.84839630126953, 2.0536327362060547, 40.51116943359375, 5.01848030090332, 2.1682281494140625, 39.90019989013672, 13.298385620117188, 8.500053405761719, 4.318630218505859, 4.346046447753906, -10.413108825683594, 29.780433654785156, 28.63336181640625, 26.056396484375, 24.40509033203125, 12.247428894042969, -21.28677749633789, 1.5595588684082031], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000093.npy"}
{"epoch": 0.13656387665198239, "step": 94, "batch_size": 64, "mean": 17.906766891479492, "std": 17.921972274780273, "min": -10.796768188476562, "p10": -3.253224182128906, "median": 13.427339553833008, "p90": 45.092795181274425, "max": 62.490234375, "pos_frac": 0.828125, "sample": [16.75847625732422, -5.506103515625, 30.17498779296875, 11.790672302246094, 7.906623840332031, -6.522159576416016, 5.899314880371094, 42.01646041870117, 4.860391616821289, -8.569206237792969, 19.330739974975586, 11.246917724609375, 16.978294372558594, 3.1530609130859375, 6.7374267578125, 0.6130771636962891, 35.96250915527344, 9.257946014404297, -3.3577499389648438, 62.490234375, 5.718137741088867, 13.45041275024414, 19.069931030273438, 37.97856903076172, 48.5234375, 13.404266357421875, 12.573577880859375, 19.886734008789062, 13.397083282470703, -0.7759552001953125, 1.9955921173095703, 12.687616348266602, 22.64683723449707, 30.760223388671875, 31.006145477294922, 13.087020874023438, 26.106491088867188, 40.634620666503906, 26.966873168945312, 17.65349006652832, 3.4950599670410156, 32.09337615966797, 4.426536560058594, -5.023956298828125, 46.411224365234375, 47.26158905029297, -1.6211261749267578, 46.4833984375, 56.1845703125, -0.6200485229492188, -10.796768188476562, 61.01275634765625, 17.680885314941406, 31.421493530273438, -3.0093307495117188, 11.762054443359375, -5.936187744140625, 0.41001129150390625, 24.59748077392578, 6.5778656005859375, 26.17947006225586, 32.395843505859375, 33.57891845703125, 23.074905395507812], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000094.npy"}
{"epoch": 0.13803230543318648, "step": 95, "batch_size": 64, "mean": 14.971220970153809, "std": 15.88862419128418, "min": -31.848342895507812, "p10": -1.1319664001464833, "median": 12.803985595703125, "p90": 37.60997924804688, "max": 53.26441192626953, "pos_frac": 0.875, "sample": [7.964263916015625, 24.034645080566406, 13.388809204101562, 14.795623779296875, 12.613370895385742, 41.005859375, 7.645057678222656, 4.80402946472168, -2.54156494140625, 16.36705780029297, 17.242143630981445, 2.88037109375, 21.02254867553711, 46.78599548339844, 35.713134765625, 4.793937683105469, 24.5374755859375, 4.047307968139648, 8.182476043701172, 23.684341430664062, 7.5074920654296875, 22.473388671875, 15.245466232299805, 11.735631942749023, 7.752771377563477, -2.568939208984375, 6.278797149658203, 28.452133178710938, 14.03925895690918, 19.435222625732422, -1.8863677978515625, 2.837310791015625, 38.42291259765625, 51.499053955078125, 53.26441192626953, 32.59986877441406, 16.431686401367188, 2.1591835021972656, 1.98406982421875, 32.21258544921875, 29.198307037353516, 44.328948974609375, 46.17303466796875, 5.137176513671875, 11.951873779296875, 6.723295211791992, -0.16351699829101562, 21.136369705200195, 11.124504089355469, -31.848342895507812, 0.6923084259033203, 1.8629035949707031, -9.42584228515625, 23.45813751220703, 30.103736877441406, 12.508197784423828, 26.710235595703125, 10.646757125854492, 12.994600296020508, 13.213165283203125, -1.5470161437988281, 0.8118419647216797, 15.404386520385742, -11.879753112792969], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000095.npy"}
{"epoch": 0.1395007342143906, "step": 96, "batch_size": 64, "mean": 19.99195098876953, "std": 20.732364654541016, "min": -27.427520751953125, "p10": 0.9795801162719727, "median": 16.409629821777344, "p90": 49.56314926147464, "max": 92.82789611816406, "pos_frac": 0.90625, "sample": [33.18247985839844, 19.132843017578125, 23.412124633789062, 18.857975006103516, 24.566864013671875, 37.65669250488281, 7.0441131591796875, 1.6544761657714844, 0.9456787109375, 27.26959228515625, 66.85212707519531, 9.285743713378906, 13.584098815917969, 2.235370635986328, -1.5116195678710938, 17.135910034179688, 20.376590728759766, 32.892181396484375, 60.22101593017578, 52.790252685546875, 8.788604736328125, 25.22075653076172, 20.48619842529297, 92.82789611816406, 42.033241271972656, 14.857112884521484, 19.754608154296875, 15.683349609375, 12.257003784179688, 13.151588439941406, 6.953453063964844, 74.41812133789062, 22.957902908325195, 20.338237762451172, 3.21820068359375, 55.202117919921875, 30.402786254882812, 4.598705291748047, 23.50867462158203, 2.7357311248779297, 26.904861450195312, 23.073240280151367, 11.19085693359375, -0.9528961181640625, 10.31998062133789, -0.09482765197753906, 6.831428527832031, 24.36170196533203, 1.225637435913086, 14.132789611816406, 10.002609252929688, 31.130859375, -8.14356803894043, -0.2382354736328125, 32.3321533203125, 3.8909454345703125, 12.162412643432617, 29.265609741210938, 65.63539123535156, 12.238410949707031, 1.0586833953857422, 20.16930389404297, -27.427520751953125, 3.4361038208007812], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000096.npy"}
{"epoch": 0.14096916299559473, "step": 97, "batch_size": 64, "mean": 16.080947875976562, "std": 16.205642700195312, "min": -31.37212371826172, "p10": -2.23061408996582, "median": 14.842074394226074, "p90": 34.99550018310547, "max": 64.31375122070312, "pos_frac": 0.828125, "sample": [11.245109558105469, 35.02101135253906, 16.212142944335938, 6.6673736572265625, -1.6505088806152344, 5.859699249267578, 14.10415267944336, 19.079896926879883, 0.5548629760742188, 26.802047729492188, 13.196033477783203, 26.031944274902344, -0.2530975341796875, 14.567207336425781, 53.940673828125, 32.71903991699219, -2.4227981567382812, 9.919929504394531, 1.622314453125, 9.995254516601562, 18.539405822753906, 5.499004364013672, 12.097259521484375, -4.014091491699219, 45.5107421875, 15.817371368408203, 16.912796020507812, 19.172828674316406, 29.212432861328125, 11.394798278808594, 17.012176513671875, -1.1724815368652344, 10.29522705078125, -4.079652786254883, 26.84796905517578, 19.399826049804688, 34.93597412109375, 18.870983123779297, -31.37212371826172, -2.477865219116211, 31.383682250976562, 27.183155059814453, -4.053291320800781, 3.7344627380371094, 34.239501953125, 21.489971160888672, 2.4234390258789062, 15.485795974731445, 15.116941452026367, 10.662330627441406, -1.7821846008300781, 30.663406372070312, -5.152671813964844, 34.05239486694336, 7.941455841064453, 36.539306640625, 3.6989669799804688, 38.73616027832031, 32.75013732910156, 40.67466735839844, 22.914958953857422, 2.609485626220703, 64.31375122070312, 11.93994140625], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000097.npy"}
{"epoch": 0.14243759177679882, "step": 98, "batch_size": 64, "mean": 17.607675552368164, "std": 19.85318374633789, "min": -47.773284912109375, "p10": -5.821557998657226, "median": 16.582931518554688, "p90": 43.02947540283205, "max": 59.461761474609375, "pos_frac": 0.8125, "sample": [18.804046630859375, 7.656949996948242, 47.62654113769531, 23.17029571533203, 4.215559005737305, 13.798357009887695, 14.127182006835938, 16.955581665039062, 16.210281372070312, 3.75201416015625, 19.167970657348633, 9.4603271484375, 13.843681335449219, 12.067802429199219, 44.539825439453125, 33.24949645996094, -6.904090881347656, 8.752410888671875, 4.7522735595703125, -12.14324951171875, 29.782123565673828, 23.78982925415039, -8.798240661621094, -6.272735595703125, 39.39710235595703, 32.942684173583984, 24.816848754882812, 44.80047607421875, 37.6961669921875, 19.85959815979004, 22.547225952148438, 0.00461578369140625, 39.31989288330078, 59.461761474609375, -47.773284912109375, 47.79637145996094, 39.50532531738281, 19.7518310546875, 25.041831970214844, 8.387626647949219, -4.768810272216797, 27.52825164794922, -16.921417236328125, 16.20281982421875, 23.451446533203125, 2.146543502807617, 29.182777404785156, 55.95857238769531, 37.112632751464844, 26.593685150146484, 8.185440063476562, -0.3769989013671875, 35.944000244140625, 58.8680419921875, 23.47718048095703, 14.83895492553711, 12.351608276367188, 6.851064682006836, -2.6015892028808594, -2.773956298828125, 37.338348388671875, -0.9002494812011719, 3.163595199584961, -9.1209716796875], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000098.npy"}
{"epoch": 0.14390602055800295, "step": 99, "batch_size": 64, "mean": 20.36504364013672, "std": 22.309019088745117, "min": -13.437042236328125, "p10": 0.475838088989258, "median": 14.260103225708008, "p90": 45.86993865966797, "max": 98.43316650390625, "pos_frac": 0.90625, "sample": [1.636453628540039, -1.173095703125, 3.8947906494140625, 14.772205352783203, 4.699323654174805, 31.085437774658203, 11.574150085449219, 14.429435729980469, 64.765380859375, 13.995691299438477, 12.430610656738281, 64.89541625976562, 14.386608123779297, -13.437042236328125, 42.00328063964844, 6.298561096191406, 10.966026306152344, 24.356185913085938, 0.6634712219238281, 8.302024841308594, 33.0863037109375, 8.227195739746094, 4.957233428955078, 98.43316650390625, 4.226221084594727, 30.0604248046875, 6.3416748046875, -6.948753356933594, 8.609495162963867, -11.351688385009766, 0.39542388916015625, 38.06348419189453, 24.550277709960938, 14.133598327636719, 35.34770202636719, 89.95872497558594, 46.64952087402344, 18.192264556884766, 15.295700073242188, 11.312744140625, 1.3930130004882812, 17.568119049072266, 21.357177734375, 41.1173210144043, 67.26837921142578, 2.5118942260742188, 10.308059692382812, 41.748291015625, -1.5056076049804688, 3.3035717010498047, 22.335548400878906, 46.64396667480469, 18.96646499633789, 8.827434539794922, 31.939071655273438, 1.8478164672851562, 25.95840072631836, 11.699575424194336, 38.92271423339844, 44.063873291015625, 16.32167625427246, 31.9139404296875, 1.01800537109375, -2.2514724731445312], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000099.npy"}
{"epoch": 0.14537444933920704, "step": 100, "batch_size": 64, "mean": 13.70488166809082, "std": 19.86121368408203, "min": -30.482486724853516, "p10": -5.276523399353027, "median": 10.76858901977539, "p90": 42.0465717315674, "max": 81.3082275390625, "pos_frac": 0.796875, "sample": [36.622840881347656, 23.064821243286133, -4.371417999267578, 10.713424682617188, 13.259098052978516, 5.152557373046875, 24.6031551361084, 10.823753356933594, 8.261215209960938, 9.683822631835938, 9.602310180664062, 0.7288455963134766, 22.4591064453125, 11.639739990234375, 1.6094093322753906, 49.05377197265625, 13.563285827636719, -25.60955810546875, -4.8774261474609375, -20.22021484375, -5.447565078735352, -3.4000320434570312, 3.0710296630859375, -30.482486724853516, 37.8574104309082, 18.932022094726562, 34.730079650878906, 17.874725341796875, 0.8205814361572266, 4.965545654296875, 26.81561279296875, 2.889007568359375, 31.42779541015625, -0.3097419738769531, 46.23443603515625, 23.06079864501953, 5.859731674194336, 8.957221984863281, 32.18421936035156, 12.906112670898438, 24.20821762084961, 9.583114624023438, 9.648483276367188, 2.062135696411133, 19.81816864013672, 22.831085205078125, 5.346412658691406, -15.567970275878906, -2.4372940063476562, 43.84192657470703, 22.104629516601562, 45.67163848876953, 18.119922637939453, -28.113433837890625, 81.3082275390625, -1.31561279296875, 12.567062377929688, 47.04045104980469, -8.602439880371094, 44.725494384765625, 23.032638549804688, 5.842597961425781, 20.814464569091797, 9.873493194580078], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000100.npy"}
{"epoch": 0.14684287812041116, "step": 101, "batch_size": 64, "mean": 15.905902862548828, "std": 21.710723876953125, "min": -21.944808959960938, "p10": -8.140522956848143, "median": 13.254981994628906, "p90": 45.674140167236345, "max": 92.33865356445312, "pos_frac": 0.71875, "sample": [47.141998291015625, 92.33865356445312, 20.80133056640625, 19.803619384765625, -2.51116943359375, 57.598426818847656, 63.39495849609375, 3.926097869873047, 53.34541320800781, 61.75221252441406, -10.640045166015625, 15.252090454101562, -12.042686462402344, 30.593856811523438, 9.287857055664062, -1.7813644409179688, -5.245941162109375, 8.546892166137695, 29.30152130126953, 21.483688354492188, -2.5373077392578125, 13.350128173828125, 30.679107666015625, 32.389015197753906, 15.26797103881836, 13.125019073486328, 5.4082183837890625, 42.24913787841797, 58.65228271484375, 13.159835815429688, 6.475006103515625, -4.169525146484375, -8.858808517456055, 31.212238311767578, 15.949575424194336, 23.025909423828125, 35.86572265625, 23.92017364501953, -3.78497314453125, 5.857646942138672, -0.2510089874267578, -11.743511199951172, 13.074125289916992, 3.3829193115234375, 8.281120300292969, -3.2089099884033203, 14.982231140136719, 19.661941528320312, 6.264373779296875, 15.314476013183594, -21.944808959960938, 27.595504760742188, 12.427764892578125, -11.50198745727539, 19.321239471435547, 13.678865432739258, -1.7594032287597656, 24.909183502197266, 3.7623214721679688, -1.110940933227539, 35.81501007080078, -14.074932098388672, 21.982940673828125, -6.4645233154296875], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000101.npy"}
{"epoch": 0.14831130690161526, "step": 102, "batch_size": 64, "mean": 12.05476188659668, "std": 19.768177032470703, "min": -26.354522705078125, "p10": -9.675699615478514, "median": 12.189958572387695, "p90": 32.37996749877931, "max": 81.07023620605469, "pos_frac": 0.734375, "sample": [2.0518951416015625, 33.68640899658203, 27.18883514404297, 3.1444778442382812, 14.954620361328125, 3.4021263122558594, -17.751754760742188, -22.396438598632812, 16.006296157836914, -25.7603759765625, 19.61151123046875, 45.881317138671875, 0.8672294616699219, 24.718772888183594, 14.19644546508789, -3.9433326721191406, 11.075485229492188, 21.998558044433594, 27.79381561279297, -0.5314750671386719, 4.389503479003906, -2.416677474975586, 19.284576416015625, -1.942138671875, 29.33160400390625, 13.159244537353516, 23.3502140045166, 81.07023620605469, -1.2418289184570312, 0.3569793701171875, 59.233612060546875, 0.0455169677734375, 25.553112030029297, 27.74175453186035, 2.4944286346435547, -7.0317230224609375, 14.692712783813477, -26.354522705078125, 1.1661834716796875, -13.706291198730469, 0.2090587615966797, -15.644912719726562, -10.352249145507812, 7.143585205078125, 29.16918182373047, 11.220672607421875, 21.035171508789062, 46.03956604003906, 25.31897735595703, 18.144119262695312, 40.63166046142578, -7.6288604736328125, 1.0818023681640625, 9.514301300048828, 37.21770477294922, -3.3667678833007812, 21.59845733642578, 18.651878356933594, 13.856576919555664, 27.21673583984375, -5.007728576660156, -8.097084045410156, 22.60358428955078, 25.578411102294922], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000102.npy"}
{"epoch": 0.14977973568281938, "step": 103, "batch_size": 64, "mean": 22.175342559814453, "std": 21.04568099975586, "min": -11.041763305664062, "p10": -0.9198093414306635, "median": 19.602664947509766, "p90": 51.24158020019534, "max": 86.26910400390625, "pos_frac": 0.875, "sample": [2.4810104370117188, 4.4879608154296875, 10.169233322143555, 16.53894805908203, 4.373687744140625, -9.053070068359375, 26.739933013916016, 25.811203002929688, 8.290691375732422, 15.303577423095703, 19.78131103515625, 25.82086753845215, 8.752601623535156, 20.22069549560547, 0.31939697265625, 41.047088623046875, 64.5196533203125, 23.54669952392578, 29.74950408935547, 30.466331481933594, -2.0867156982421875, 5.231842041015625, 23.21070098876953, 30.752639770507812, -8.504806518554688, 43.24407958984375, 23.027809143066406, -9.093955993652344, 4.25244140625, 17.39434051513672, 16.126113891601562, 76.63031005859375, 13.451148986816406, 22.44833755493164, 38.86552429199219, 27.68152618408203, -2.72088623046875, -11.041763305664062, 14.980012893676758, 14.606731414794922, -0.3589286804199219, 21.741073608398438, 29.35985565185547, 5.4019622802734375, 12.741546630859375, 66.44817352294922, 45.459007263183594, 20.715957641601562, 86.26910400390625, 56.17002868652344, 64.37676239013672, 42.21379089355469, 12.218643188476562, 19.42401885986328, 44.363075256347656, -1.160186767578125, 24.01169204711914, 2.178953170776367, 5.290477752685547, 37.353126525878906, 53.719825744628906, 14.620952606201172, 18.762939453125, 30.07728385925293], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000103.npy"}
{"epoch": 0.1512481644640235, "step": 104, "batch_size": 64, "mean": 20.85307502746582, "std": 24.284900665283203, "min": -30.517410278320312, "p10": -2.143929862976073, "median": 13.828636169433594, "p90": 54.144687271118165, "max": 87.61309814453125, "pos_frac": 0.875, "sample": [44.39665985107422, 2.243804931640625, 8.376930236816406, 5.7431488037109375, 36.926422119140625, 4.281379699707031, -9.506118774414062, 1.8083305358886719, 11.637924194335938, 23.670623779296875, -5.675561904907227, 16.263343811035156, 80.18748474121094, 35.677520751953125, 46.75409698486328, 0.9661483764648438, 58.54278564453125, 24.242361068725586, -5.177886962890625, 66.6602783203125, 13.88641357421875, 11.985427856445312, 17.800765991210938, 6.342502593994141, 9.844606399536133, 2.6119728088378906, 87.61309814453125, -15.619148254394531, 32.94342041015625, 71.41636657714844, 75.48638916015625, 17.569732666015625, 7.77015495300293, 34.968711853027344, 5.9257354736328125, 6.561798095703125, -2.6419944763183594, -7.528900146484375, 53.28173828125, 10.751678466796875, 35.226688385009766, 5.927459716796875, 41.066741943359375, 8.53476333618164, 54.514522552490234, 31.08075523376465, 21.902915954589844, 27.65035629272461, 13.770858764648438, 28.806121826171875, 1.3105220794677734, 47.78770446777344, 7.3630218505859375, 14.691001892089844, 46.73689270019531, 41.320411682128906, -0.9817790985107422, 21.55207633972168, 0.08496284484863281, 14.042789459228516, 2.808704376220703, 2.504873275756836, -30.517410278320312, 8.421792984008789], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000104.npy"}
{"epoch": 0.1527165932452276, "step": 105, "batch_size": 64, "mean": 29.143280029296875, "std": 22.649538040161133, "min": -16.502586364746094, "p10": 2.394055175781252, "median": 27.317867279052734, "p90": 60.5586784362793, "max": 109.11296081542969, "pos_frac": 0.953125, "sample": [25.499969482421875, 21.503494262695312, 35.12294387817383, 23.12139892578125, 29.932044982910156, 21.827064514160156, 44.30744934082031, 66.56685638427734, 10.708703994750977, -9.449371337890625, 1.5188865661621094, 52.385711669921875, 41.800811767578125, 10.801080703735352, 27.71198272705078, 29.704822540283203, -16.502586364746094, 8.237083435058594, 1.315673828125, 61.17115020751953, 22.555038452148438, 64.80329895019531, 61.68022155761719, 11.031990051269531, 10.376205444335938, 52.54743194580078, 37.93922424316406, 23.79051971435547, 5.7688446044921875, 30.5538330078125, 32.277923583984375, 77.28184509277344, 11.209487915039062, 33.762001037597656, 10.926864624023438, 14.578193664550781, 39.025238037109375, 56.09626770019531, 26.923751831054688, 12.406558990478516, 35.291900634765625, 66.33935546875, 21.250118255615234, 23.249174118041992, 14.6114501953125, 34.89997100830078, 0.7257919311523438, 43.203338623046875, 44.141536712646484, 46.649261474609375, 17.34499740600586, 18.998199462890625, -3.1051177978515625, 4.436115264892578, 35.76615905761719, 34.212345123291016, 31.37458610534668, 59.12957763671875, 33.68190002441406, 109.11296081542969, 12.071983337402344, 0.376800537109375, 9.648618698120117, 48.93910217285156], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000105.npy"}
{"epoch": 0.15418502202643172, "step": 106, "batch_size": 64, "mean": 18.59684944152832, "std": 19.547449111938477, "min": -24.225357055664062, "p10": -2.92393054962158, "median": 18.438199996948242, "p90": 45.80099487304689, "max": 85.75528717041016, "pos_frac": 0.84375, "sample": [18.47386932373047, 2.6964855194091797, 3.413961410522461, 0.2680377960205078, 15.489856719970703, -1.1720867156982422, 32.397377014160156, -3.6747207641601562, 20.925613403320312, 42.500484466552734, 25.827789306640625, 24.95183563232422, 4.0160675048828125, 49.50962829589844, 13.164421081542969, 47.060943603515625, 0.7369651794433594, 12.437257766723633, 37.486061096191406, -0.30576133728027344, 42.861114501953125, 20.422229766845703, 3.71453857421875, 47.93035888671875, 14.100992202758789, 24.28003692626953, 10.399452209472656, -8.772216796875, 25.550880432128906, -9.700820922851562, 24.003644943237305, 13.48660659790039, 22.003040313720703, 27.656055450439453, 1.83905029296875, 8.022041320800781, 16.34492301940918, 2.548583984375, 48.40313720703125, 47.46794891357422, 29.136993408203125, 18.402530670166016, -24.225357055664062, 1.0671768188476562, 85.75528717041016, 17.72149658203125, 21.03097152709961, 28.87622833251953, 11.487472534179688, 4.177467346191406, 2.190673828125, 29.4749755859375, 61.076316833496094, 24.692285537719727, 32.641448974609375, 30.648971557617188, -6.601799011230469, 21.12890625, -0.1968841552734375, 21.322250366210938, 40.79252624511719, -12.388664245605469, -6.059532165527344, 29.280990600585938], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000106.npy"}
{"epoch": 0.15565345080763582, "step": 107, "batch_size": 64, "mean": 21.153640747070312, "std": 21.61376953125, "min": -30.763015747070312, "p10": -4.4839433670043904, "median": 15.030487060546875, "p90": 50.82872886657715, "max": 73.69660949707031, "pos_frac": 0.875, "sample": [41.00993347167969, 59.08393859863281, 29.380172729492188, 42.322059631347656, 14.380294799804688, 17.66507339477539, 30.490989685058594, -0.4622478485107422, -30.763015747070312, 3.5914134979248047, 10.617862701416016, 45.11875915527344, -10.482337951660156, 23.6279296875, 8.274776458740234, 8.609184265136719, 9.746980667114258, 57.79222869873047, 2.542642593383789, -9.951278686523438, 50.14153289794922, 40.13078308105469, 48.741302490234375, 3.4394359588623047, -6.207527160644531, 5.6343231201171875, 5.912300109863281, 3.9567413330078125, 73.69660949707031, 37.94940948486328, 19.743093490600586, 15.680679321289062, 46.540748596191406, 22.457901000976562, 37.17921447753906, 58.815818786621094, 11.773571014404297, 17.22857666015625, 9.086179733276367, -10.634506225585938, 36.411407470703125, -19.01934051513672, 36.83013153076172, 51.12324142456055, 11.440544128417969, 52.34629821777344, 5.521476745605469, 54.704193115234375, 35.89039611816406, 30.915878295898438, 5.950935363769531, 10.677652359008789, 12.928604125976562, -6.522491455078125, 9.414756774902344, 5.9543609619140625, 9.345029830932617, 39.47483825683594, 14.243888854980469, 21.77923583984375, 34.15437316894531, 39.73150634765625, 4.2876434326171875, 12.386871337890625], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000107.npy"}
{"epoch": 0.15712187958883994, "step": 108, "batch_size": 64, "mean": 22.344249725341797, "std": 28.262174606323242, "min": -23.720226287841797, "p10": -7.840344238281249, "median": 18.37006950378418, "p90": 64.19185714721681, "max": 97.48150634765625, "pos_frac": 0.75, "sample": [22.563201904296875, 66.65815734863281, 25.210586547851562, -18.16094970703125, -21.11322021484375, 50.97314453125, 21.049394607543945, 4.097869873046875, 97.48150634765625, -3.7928237915039062, 3.64227294921875, 68.50501251220703, 3.0874710083007812, -0.1790771484375, -0.48772621154785156, 33.720619201660156, -2.2110538482666016, 47.57017517089844, 75.80091857910156, 64.77906036376953, 62.38543701171875, 42.691009521484375, -1.547882080078125, -12.507543563842773, 24.908979415893555, 7.729314804077148, 17.751873016357422, 11.27932357788086, 51.702392578125, 6.772218704223633, 9.646595001220703, -0.5737781524658203, 48.536964416503906, -8.322158813476562, 93.1749267578125, -6.7161102294921875, 44.986000061035156, -11.48110580444336, -5.7712249755859375, 3.8920459747314453, 21.30713653564453, 6.1751251220703125, 24.405670166015625, 71.27340698242188, 18.988265991210938, 4.075309753417969, 45.70545196533203, 36.82411193847656, 14.730539321899414, -22.85577392578125, 6.535896301269531, 34.361167907714844, 62.82171630859375, 29.41790771484375, 3.199920654296875, -1.2059497833251953, 31.21465301513672, 11.759607315063477, 45.96223449707031, -23.720226287841797, 19.451004028320312, 34.37952423095703, 25.071151733398438, 12.422300338745117], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000108.npy"}
{"epoch": 0.15859030837004406, "step": 109, "batch_size": 64, "mean": 25.194257736206055, "std": 23.478912353515625, "min": -14.301128387451172, "p10": -3.3113481521606434, "median": 23.345308303833008, "p90": 57.302837371826186, "max": 89.01226806640625, "pos_frac": 0.828125, "sample": [32.23199462890625, 4.622020721435547, 58.64274597167969, 45.598175048828125, 10.517356872558594, 34.84920883178711, -11.207870483398438, 11.567607879638672, 10.347236633300781, 34.678680419921875, 64.89837646484375, 23.310916900634766, 40.52806091308594, 22.149246215820312, 38.177734375, 6.31724739074707, 17.455127716064453, 26.43787956237793, -4.614383697509766, -3.726621627807617, -0.7889404296875, 89.01226806640625, -0.5935859680175781, -4.545654296875, 21.379304885864258, 3.0391998291015625, 69.47297668457031, 22.83970832824707, 11.709440231323242, -14.301128387451172, 30.182449340820312, 38.853240966796875, 3.654844284057617, 26.4066162109375, 37.87286376953125, 23.68787956237793, 48.8455810546875, 44.92162322998047, 53.723655700683594, 60.22498321533203, 6.1935882568359375, 7.648529052734375, -8.539306640625, 17.14684295654297, 1.364959716796875, 45.375, 28.503219604492188, 24.941852569580078, 15.151582717895508, 53.07611083984375, -12.64251708984375, 23.37969970703125, 23.944690704345703, 60.07838439941406, 68.05162048339844, 12.915863037109375, 49.68824005126953, 16.464637756347656, 46.61981201171875, 54.17638397216797, -1.54486083984375, -2.342376708984375, 6.783563613891602, 47.618934631347656], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000109.npy"}
{"epoch": 0.16005873715124816, "step": 110, "batch_size": 64, "mean": 23.25214958190918, "std": 26.59100341796875, "min": -41.80027770996094, "p10": -1.0487892150878901, "median": 19.7039737701416, "p90": 61.20070495605472, "max": 99.13632202148438, "pos_frac": 0.859375, "sample": [44.23480224609375, 23.4307861328125, 20.347660064697266, 37.05525207519531, 31.85541534423828, 21.647682189941406, 27.78261947631836, 17.66433334350586, -41.80027770996094, 23.631744384765625, 33.565147399902344, 11.856437683105469, -9.713470458984375, 11.200395584106445, 13.933910369873047, -0.587646484375, 18.84543228149414, 99.13632202148438, 65.38980102539062, 16.40682601928711, 3.264616012573242, -1.2464218139648438, 92.90061950683594, 66.50727844238281, 51.5213623046875, 64.83824157714844, 7.4330291748046875, 10.722320556640625, 1.0373382568359375, 41.6099967956543, -15.182937622070312, -9.96832275390625, 5.200824737548828, 13.387933731079102, 4.932949066162109, 85.4014892578125, 30.225059509277344, 52.71311950683594, 48.389678955078125, 1.9877185821533203, 38.76991271972656, -0.10721683502197266, 13.180252075195312, 28.929643630981445, 11.684463500976562, 21.57476043701172, 43.83019256591797, -33.04962158203125, 35.33526611328125, -8.568206787109375, 19.060287475585938, 1.8192977905273438, 26.88318634033203, 17.455928802490234, 33.969539642333984, 22.508773803710938, 16.567340850830078, 21.097976684570312, 11.106502532958984, 32.291419982910156, 32.62300109863281, 1.298349380493164, 7.243312835693359, 71.07412719726562], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000110.npy"}
{"epoch": 0.16152716593245228, "step": 111, "batch_size": 64, "mean": 24.890220642089844, "std": 28.001632690429688, "min": -40.67413330078125, "p10": -7.694386291503903, "median": 22.52447509765625, "p90": 59.824664306640635, "max": 111.18896484375, "pos_frac": 0.828125, "sample": [111.18896484375, -14.005523681640625, -40.67413330078125, 47.1572265625, -5.122108459472656, 29.571502685546875, 2.9857940673828125, 19.756027221679688, 12.47640609741211, 70.67825317382812, 20.186203002929688, 47.24394226074219, 25.856590270996094, 28.028030395507812, 18.26727294921875, 40.130126953125, 63.198631286621094, 4.98193359375, 51.88874816894531, 7.991050720214844, 36.664772033691406, -8.796791076660156, -2.802265167236328, 2.3126220703125, 35.26133728027344, 57.571624755859375, 7.70928955078125, 55.10026550292969, -11.8931884765625, 9.505146026611328, 0.711151123046875, 20.02683448791504, 60.790252685546875, 12.839111328125, 73.26455688476562, 25.283370971679688, 17.26294708251953, 31.324771881103516, 35.71106719970703, 80.04451751708984, 15.834991455078125, -10.954910278320312, 13.385992050170898, 12.704242706298828, 81.28851318359375, 26.396194458007812, 22.986923217773438, 23.664291381835938, 52.76654052734375, 2.2452831268310547, 56.45686340332031, 24.278667449951172, 46.73081970214844, 46.89087677001953, 27.38690948486328, 6.167415618896484, -27.01837158203125, 22.062026977539062, 14.79789924621582, 27.30762481689453, -3.249868392944336, 48.30210876464844, -14.339614868164062, -2.79364013671875], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000111.npy"}
{"epoch": 0.16299559471365638, "step": 112, "batch_size": 64, "mean": 17.130971908569336, "std": 21.753332138061523, "min": -32.63335418701172, "p10": -8.381766128540038, "median": 14.167064666748047, "p90": 51.4933578491211, "max": 69.9100341796875, "pos_frac": 0.8125, "sample": [19.218616485595703, 69.9100341796875, 39.5963134765625, -3.7634716033935547, -17.747982025146484, 15.63154411315918, 20.275253295898438, 14.490608215332031, 7.523643493652344, -6.879779815673828, 59.50437927246094, 50.110137939453125, 1.4575157165527344, 1.1698226928710938, 52.08616638183594, 40.69855880737305, 24.159748077392578, 36.70970153808594, 26.469757080078125, 22.262863159179688, 31.342788696289062, 32.153560638427734, 40.36439514160156, 2.4607772827148438, 18.205642700195312, 19.325353622436523, 20.09607696533203, 8.294788360595703, -10.042770385742188, 14.348529815673828, 4.019510269165039, -13.056652069091797, 14.158912658691406, 58.792640686035156, 66.49246978759766, -13.340499877929688, 24.006423950195312, 10.7080078125, 36.049041748046875, 3.7581920623779297, -32.63335418701172, 55.036590576171875, 10.279273986816406, 6.6706390380859375, 3.6262054443359375, 9.108928680419922, 31.255508422851562, 3.101043701171875, 13.857765197753906, 52.98541259765625, 6.984626770019531, 4.781673431396484, 28.55503273010254, -11.154342651367188, 4.579736709594727, -9.025474548339844, -4.138519287109375, -1.0859107971191406, 37.99479675292969, 1.2260551452636719, 14.175216674804688, 5.97802734375, -2.5267715454101562, 25.729469299316406], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000112.npy"}
{"epoch": 0.1644640234948605, "step": 113, "batch_size": 64, "mean": 19.44598960876465, "std": 19.499792098999023, "min": -16.053970336914062, "p10": -4.852844238281249, "median": 18.416847229003906, "p90": 44.72538795471191, "max": 61.276206970214844, "pos_frac": 0.8125, "sample": [-14.881072998046875, 39.583038330078125, -5.960724830627441, 44.884429931640625, 10.597587585449219, 17.56512451171875, 17.2051944732666, 28.9205322265625, 10.40349006652832, 4.708789825439453, 4.559234619140625, 38.480926513671875, 33.06379699707031, 54.53717041015625, 19.843013763427734, -8.332670211791992, 6.015632629394531, -2.8832931518554688, 11.485092163085938, 52.35542297363281, -10.059555053710938, 25.254409790039062, 39.97322082519531, -2.28448486328125, 30.12975311279297, 8.704635620117188, 9.027925491333008, 61.276206970214844, 23.162254333496094, -9.413871765136719, 9.46112060546875, 51.73905944824219, 21.411396026611328, -4.046791076660156, 21.004241943359375, 36.536346435546875, 8.537483215332031, 43.230674743652344, 31.662567138671875, 6.0346527099609375, 31.360626220703125, 41.44501495361328, -1.9717731475830078, 2.244565963745117, -16.053970336914062, 33.36049270629883, 19.268569946289062, 23.845184326171875, -5.198295593261719, 5.77470588684082, 58.399200439453125, 3.8591651916503906, 21.860061645507812, 33.821136474609375, 49.70250701904297, 44.35429000854492, 24.283443450927734, 43.6549072265625, -1.5094776153564453, 9.4326171875, 8.501800537109375, 8.614742279052734, 13.96712875366211, 28.034683227539062], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000113.npy"}
{"epoch": 0.16593245227606462, "step": 114, "batch_size": 64, "mean": 28.083698272705078, "std": 31.90363311767578, "min": -15.71583366394043, "p10": -5.186272239685058, "median": 27.044106483459473, "p90": 56.332905578613286, "max": 164.78240966796875, "pos_frac": 0.8125, "sample": [87.13931274414062, 56.98406982421875, 35.80741882324219, 45.44786834716797, 13.319755554199219, 73.93156433105469, 8.39057731628418, 27.778671264648438, -0.434600830078125, 33.08946990966797, -15.71583366394043, 29.7391357421875, 35.70586395263672, 3.4994583129882812, -12.271461486816406, 43.84642028808594, 7.562507629394531, -5.293216705322266, 23.2103271484375, 54.81352233886719, 13.718687057495117, 23.79218864440918, -7.564483642578125, 57.187644958496094, 34.050621032714844, 41.630462646484375, 18.41650390625, 18.38735580444336, 14.827728271484375, 38.43185043334961, 39.889122009277344, -0.12452316284179688, -2.70550537109375, 38.51426696777344, -4.936735153198242, 7.84844970703125, 20.46024513244629, 31.586380004882812, 44.23951721191406, 44.26536560058594, 36.70566940307617, 45.44214630126953, 3.2408580780029297, 26.58524513244629, 19.688823699951172, 0.03420257568359375, 128.7811737060547, 3.441446304321289, 27.502967834472656, 87.26508331298828, 2.1576061248779297, 42.277549743652344, -8.657417297363281, -5.985843658447266, 164.78240966796875, 43.92983627319336, -0.23734092712402344, 36.220703125, 31.215301513671875, -15.28829574584961, 12.235664367675781, 52.71372985839844, 0.5639381408691406, 44.27519989013672], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000114.npy"}
{"epoch": 0.16740088105726872, "step": 115, "batch_size": 64, "mean": 20.51511001586914, "std": 30.815521240234375, "min": -56.919525146484375, "p10": -6.48261547088623, "median": 13.612865447998047, "p90": 68.19215164184571, "max": 126.25241088867188, "pos_frac": 0.765625, "sample": [4.218273162841797, -3.3619613647460938, 70.00498962402344, 27.62811279296875, -8.88250732421875, 2.100677490234375, -8.634483337402344, 8.100591659545898, 4.310657501220703, 16.081504821777344, 11.71213150024414, -4.2223052978515625, 29.02194595336914, 32.277496337890625, 25.06361961364746, 83.39201354980469, 17.70383644104004, 1.8063735961914062, 126.25241088867188, -6.690202713012695, 35.54364776611328, 2.1415863037109375, 69.23997497558594, 37.803985595703125, 77.982177734375, -4.088470458984375, 19.88177490234375, 27.45090103149414, 3.311542510986328, 46.472625732421875, 86.84080505371094, 46.36668395996094, 8.92312240600586, -52.218666076660156, 6.017055511474609, 24.413745880126953, 29.21466827392578, 10.861442565917969, 13.401418685913086, 27.48394775390625, 18.012710571289062, 20.187484741210938, 22.816741943359375, 12.944480895996094, 76.986083984375, -5.9982452392578125, 12.131267547607422, 6.9383544921875, -0.51214599609375, 5.986320495605469, 13.824312210083008, 43.55436706542969, -0.306488037109375, 23.494808197021484, -9.666337966918945, 65.74723052978516, 41.33075714111328, -3.5147743225097656, 40.79566955566406, -56.919525146484375, 47.362945556640625, -7.748878479003906, 5.271875381469727, -4.679161071777344], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000115.npy"}
{"epoch": 0.16886930983847284, "step": 116, "batch_size": 64, "mean": 21.212888717651367, "std": 21.901363372802734, "min": -12.208919525146484, "p10": -2.937494850158691, "median": 17.449246406555176, "p90": 53.52203063964845, "max": 116.37559509277344, "pos_frac": 0.828125, "sample": [6.989105224609375, 3.227642059326172, -12.208919525146484, -4.937095642089844, 18.473031997680664, 24.19873046875, 20.616607666015625, 14.728195190429688, 30.51225471496582, 35.023902893066406, 13.043312072753906, 2.084920883178711, 22.631004333496094, 19.781166076660156, 1.4495086669921875, -4.516611099243164, 27.401878356933594, -0.45810699462890625, 41.04164123535156, 50.938323974609375, 20.297592163085938, 10.634536743164062, 12.21014404296875, 5.785697937011719, 27.002227783203125, 15.16656494140625, 31.8585205078125, 55.9871826171875, 11.451438903808594, -2.7012557983398438, 62.38128662109375, 26.202545166015625, 14.61037826538086, 19.863204956054688, -3.0387401580810547, -8.888145446777344, 15.47275161743164, 14.327224731445312, 8.837148666381836, 54.62933349609375, 3.147127151489258, 55.474945068359375, 16.425460815429688, 42.968841552734375, 34.92289733886719, 10.965015411376953, 38.96800994873047, -0.62652587890625, -3.884136199951172, 39.360862731933594, 22.703388214111328, -4.929218292236328, 116.37559509277344, 32.46753692626953, 14.609382629394531, 25.86115264892578, 27.587615966796875, 26.643848419189453, -1.1330814361572266, 8.556785583496094, 1.85296630859375, 36.03052520751953, 55.75968933105469, 55.406158447265625], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000116.npy"}
{"epoch": 0.17033773861967694, "step": 117, "batch_size": 64, "mean": 19.626995086669922, "std": 24.237754821777344, "min": -35.514129638671875, "p10": -3.3991889953613277, "median": 18.257533073425293, "p90": 50.12329864501956, "max": 103.39422607421875, "pos_frac": 0.859375, "sample": [11.481388092041016, 26.14240074157715, 28.3255615234375, 6.726909637451172, 13.447296142578125, 5.6822662353515625, 18.625349044799805, 11.795578002929688, 11.799552917480469, 22.723554611206055, 30.231182098388672, 41.61384582519531, 55.659149169921875, 84.14439392089844, 21.971542358398438, 18.90229034423828, -17.985275268554688, 41.73805236816406, 29.508453369140625, -35.514129638671875, -11.330535888671875, 6.543510437011719, 13.31031608581543, 33.20803451538086, 32.45130157470703, 3.1520004272460938, 7.601432800292969, 1.9387664794921875, 21.856849670410156, 1.6599769592285156, 22.531940460205078, 67.48770141601562, -3.0050277709960938, 24.199859619140625, 63.149269104003906, 12.400230407714844, 103.39422607421875, 14.751571655273438, 11.115737915039062, 29.74736785888672, 4.257602691650391, 1.9557380676269531, 5.298173904418945, -3.568115234375, -29.42822265625, 13.892738342285156, 43.59668731689453, 20.8687744140625, 52.92041778564453, 21.74161720275879, -15.97503662109375, 0.5974674224853516, 4.829097747802734, 33.885433197021484, 58.101036071777344, 39.675689697265625, 7.4660797119140625, 28.26946258544922, 27.901947021484375, 22.85772705078125, 17.88971710205078, -0.7116622924804688, -12.653976440429688, 29.275299072265625], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000117.npy"}
{"epoch": 0.17180616740088106, "step": 118, "batch_size": 64, "mean": 20.390953063964844, "std": 21.802257537841797, "min": -17.476730346679688, "p10": -2.1401634216308576, "median": 15.58329963684082, "p90": 46.554080963134766, "max": 83.69734191894531, "pos_frac": 0.875, "sample": [-2.9769210815429688, 11.957063674926758, 7.577157974243164, -11.369064331054688, 14.771484375, 15.832515716552734, 35.98215103149414, 29.789751052856445, 32.484046936035156, 46.842620849609375, 45.62055969238281, 15.09979248046875, 64.44381713867188, 18.696041107177734, 0.21893310546875, 24.276952743530273, 1.8801918029785156, 2.3090057373046875, 20.15713882446289, 83.69734191894531, 54.294708251953125, -3.700458526611328, -13.173934936523438, 5.2583465576171875, -17.476730346679688, 12.912111282348633, 3.9477405548095703, 17.665590286254883, 2.5288352966308594, 2.587371826171875, 10.33123779296875, 27.40642547607422, -8.895210266113281, 42.53468322753906, 15.334083557128906, 24.385055541992188, 14.04403305053711, 21.95635223388672, 69.54388427734375, 2.0304298400878906, 10.233016967773438, -6.4287567138671875, 8.817779541015625, 21.468215942382812, 67.6768798828125, 41.72502899169922, 68.17604064941406, 30.772842407226562, 45.880821228027344, 10.79026985168457, 2.0269622802734375, 31.459922790527344, 42.59931945800781, 34.49388122558594, 27.182281494140625, 17.696884155273438, 20.8668155670166, 9.866264343261719, 2.6817970275878906, 21.35942840576172, 11.312667846679688, 43.62774658203125, 0.11743927001953125, -0.1877288818359375], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000118.npy"}
{"epoch": 0.17327459618208516, "step": 119, "batch_size": 64, "mean": 20.894737243652344, "std": 25.70353889465332, "min": -23.255706787109375, "p10": -9.022619628906249, "median": 18.199519157409668, "p90": 56.341681289672856, "max": 102.77952575683594, "pos_frac": 0.8125, "sample": [18.162084579467773, 19.005081176757812, 13.011283874511719, 69.40647888183594, 33.764617919921875, -1.09234619140625, 2.9234390258789062, 28.816001892089844, -4.546291351318359, 36.17947006225586, 5.03411865234375, 1.0354156494140625, 93.07769775390625, 33.20586395263672, 4.250720977783203, 64.46066284179688, 16.76669692993164, 8.550247192382812, 102.77952575683594, 30.221389770507812, 32.97618103027344, 39.83634948730469, 39.978492736816406, 5.103725433349609, 16.728469848632812, 22.409992218017578, -17.527999877929688, 7.0394744873046875, 33.23799133300781, 34.06744384765625, 5.595161437988281, -7.38763427734375, 54.9644775390625, 18.342304229736328, 70.26541137695312, -9.72332763671875, 32.39514923095703, 25.939895629882812, 4.26753044128418, 12.169792175292969, -0.114776611328125, 17.87353515625, -15.511978149414062, 61.533485412597656, 23.080230712890625, 32.81884765625, 4.6367645263671875, -15.623947143554688, 43.824806213378906, 28.013446807861328, -16.56572723388672, -23.255706787109375, 10.886039733886719, 33.24855041503906, 22.152114868164062, 26.412071228027344, 28.472702026367188, 7.998857498168945, 56.93191146850586, -2.2537403106689453, -22.099361419677734, 5.003021240234375, 18.236953735351562, 15.874076843261719], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000119.npy"}
{"epoch": 0.17474302496328928, "step": 120, "batch_size": 64, "mean": 27.11212921142578, "std": 28.037742614746094, "min": -23.430587768554688, "p10": -6.180593109130852, "median": 24.50049591064453, "p90": 70.03729248046875, "max": 96.69424438476562, "pos_frac": 0.890625, "sample": [82.22630310058594, 40.1732177734375, 23.82684326171875, 64.07066345214844, 21.23526382446289, 13.154813766479492, -16.74871826171875, 60.46517562866211, 15.125091552734375, 37.89141082763672, 39.42094421386719, 83.60450744628906, 37.603370666503906, 19.960128784179688, 5.686588287353516, 9.83544921875, 59.04838562011719, 20.45612335205078, 34.29601287841797, -16.583824157714844, -21.590911865234375, 75.24020385742188, 25.174148559570312, 28.06525421142578, 26.199417114257812, 12.303665161132812, 20.676544189453125, 8.57061767578125, 71.05718994140625, -23.430587768554688, -9.263778686523438, 36.08464050292969, 27.218284606933594, 2.920928955078125, 53.14640808105469, 2.196809768676758, 94.70285034179688, 35.32229995727539, 34.52198791503906, 1.8333110809326172, 1.2732810974121094, 4.694801330566406, 36.92394256591797, 69.04736328125, 20.961944580078125, 27.775238037109375, 70.4615478515625, 96.69424438476562, 15.329269409179688, 26.649818420410156, 0.630462646484375, 39.58715057373047, -11.853546142578125, 16.73424530029297, 30.22754669189453, 11.414779663085938, 5.290744781494141, 49.11326599121094, 29.867401123046875, 0.7064208984375, 42.94309997558594, -9.099617004394531, 21.27508544921875, 2.8307418823242188], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000120.npy"}
{"epoch": 0.1762114537444934, "step": 121, "batch_size": 64, "mean": 29.79346466064453, "std": 31.868616104125977, "min": -33.584014892578125, "p10": -3.3727903366088863, "median": 26.998981475830078, "p90": 79.0282424926758, "max": 109.45573425292969, "pos_frac": 0.84375, "sample": [4.448850631713867, 5.384727478027344, 1.29119873046875, 96.93846130371094, -3.826719284057617, 5.794425964355469, 43.502655029296875, -3.027742385864258, 58.82334899902344, 39.88677978515625, 9.220649719238281, 36.66088104248047, 74.68328857421875, 66.3964614868164, 31.015724182128906, -6.291130065917969, 38.747100830078125, 41.23128890991211, 37.52153015136719, -9.424932479858398, 14.489496231079102, 71.26736450195312, -14.769691467285156, 50.83207702636719, 81.73570251464844, 31.784133911132812, 18.089675903320312, 4.224399566650391, 59.57829284667969, 4.778190612792969, 31.317626953125, -6.269798278808594, 34.45452117919922, 33.64488220214844, 5.299211502075195, 82.0001220703125, 3.7857284545898438, 82.26437377929688, 23.369956970214844, 109.45573425292969, 3.0467300415039062, 69.02216339111328, -3.5206680297851562, 9.000900268554688, 0.050811767578125, 17.528404235839844, 52.0753059387207, 8.51580810546875, 9.282711029052734, 27.01361846923828, 5.790853500366211, 7.3928680419921875, 35.16273498535156, 60.887969970703125, 80.89036560058594, 51.039031982421875, -33.584014892578125, 33.911285400390625, -3.0232391357421875, 43.98536682128906, 105.20951843261719, 12.79840087890625, -2.9884071350097656, 26.984344482421875], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000121.npy"}
{"epoch": 0.1776798825256975, "step": 122, "batch_size": 64, "mean": 37.39657211303711, "std": 30.369840621948242, "min": -22.03052520751953, "p10": 6.000138092041016, "median": 32.84161376953125, "p90": 77.23844909667969, "max": 114.00886535644531, "pos_frac": 0.921875, "sample": [63.747406005859375, 8.489395141601562, 24.666000366210938, 47.37950134277344, 12.949167251586914, 21.868545532226562, 47.802886962890625, 53.27092742919922, 20.315502166748047, 59.60441589355469, 9.99917221069336, 59.37256622314453, 58.29735565185547, 109.06393432617188, 62.51403045654297, 26.466156005859375, 10.867927551269531, 47.908363342285156, 75.42012023925781, 38.04217529296875, 5.816215515136719, 76.29861450195312, 14.549732208251953, 6.586204528808594, 28.841445922851562, 21.759384155273438, 27.981216430664062, 71.08447265625, 44.27033233642578, 37.79113006591797, 46.355567932128906, -9.440925598144531, 95.0474624633789, -1.352254867553711, 31.97516632080078, 77.6412353515625, 39.32500457763672, 33.70806121826172, 57.34923553466797, 79.99565124511719, 8.226324081420898, 35.22419738769531, -14.919906616210938, 114.00886535644531, 25.208480834960938, 25.419376373291016, 22.20745086669922, 15.57175064086914, 34.60255432128906, -22.03052520751953, -2.9189529418945312, 17.06072235107422, 9.189142227172852, 56.77130889892578, 103.89227294921875, 7.472454071044922, 62.296844482421875, 31.338523864746094, 2.8641319274902344, 6.429290771484375, 83.6597900390625, 43.00587463378906, 16.84160804748535, 68.3304443359375], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000122.npy"}
{"epoch": 0.17914831130690162, "step": 123, "batch_size": 64, "mean": 25.154926300048828, "std": 34.64956283569336, "min": -68.9216079711914, "p10": -10.142229270935058, "median": 19.79494857788086, "p90": 68.62985763549806, "max": 125.45978546142578, "pos_frac": 0.78125, "sample": [35.296653747558594, 42.92066192626953, 1.742340087890625, 6.032459259033203, 31.782958984375, 32.23948669433594, 51.363616943359375, 46.612518310546875, 43.005088806152344, 30.973648071289062, 1.4796886444091797, 10.203048706054688, 11.049690246582031, -8.343452453613281, 89.65425109863281, 39.608123779296875, 13.56539535522461, -1.461761474609375, -2.275146484375, 29.194839477539062, 1.0346832275390625, 76.37631225585938, 31.919448852539062, -37.886993408203125, 17.527664184570312, 47.17486572265625, 125.45978546142578, 10.553377151489258, 69.65664672851562, 73.709716796875, -9.923929214477539, 52.0601806640625, 17.650188446044922, 6.509063720703125, 52.70507049560547, 63.63929748535156, 21.141571044921875, 36.65965270996094, 96.76039123535156, -10.235786437988281, 40.45330810546875, 36.45439147949219, -20.984210968017578, -12.334854125976562, 42.08502197265625, 1.9267578125, -20.943748474121094, -0.5329875946044922, -11.454246520996094, 115.07958984375, -8.022621154785156, 29.078903198242188, 22.980987548828125, 66.23401641845703, 7.402839660644531, 16.808212280273438, 64.56929016113281, 24.342729568481445, -68.9216079711914, 18.448326110839844, 10.261177062988281, -1.7114295959472656, 0.9074058532714844, 10.65257453918457], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000123.npy"}
{"epoch": 0.18061674008810572, "step": 124, "batch_size": 64, "mean": 33.948753356933594, "std": 43.55914306640625, "min": -43.00956726074219, "p10": -9.856531906127929, "median": 24.22890281677246, "p90": 85.31137847900392, "max": 193.18295288085938, "pos_frac": 0.84375, "sample": [68.091064453125, -8.116226196289062, 29.538040161132812, 23.46697998046875, 3.1099071502685547, 81.84596252441406, -43.00956726074219, -4.797895431518555, 68.41107177734375, 94.39093017578125, 2.4224624633789062, 21.705425262451172, 3.4452857971191406, 30.832984924316406, 31.80963897705078, 140.302001953125, 68.4444580078125, 24.132545471191406, 5.4173583984375, 41.34038543701172, -33.123809814453125, -16.885910034179688, 69.04037475585938, 3.8132286071777344, 102.88739013671875, 68.64677429199219, 36.22755432128906, 8.096755981445312, 35.71009063720703, 56.92967224121094, -10.048450469970703, 24.299922943115234, 24.157882690429688, 26.53365135192871, 4.1589508056640625, 152.58486938476562, 53.0772819519043, 86.74232482910156, 13.385734558105469, 74.63502502441406, 52.584449768066406, -11.965507507324219, 22.50271987915039, -9.408721923828125, 27.86036491394043, 57.16609191894531, -32.780853271484375, 22.10198974609375, 14.154253005981445, 16.124481201171875, 26.454872131347656, 30.16778564453125, -20.358566284179688, 100.87899780273438, 18.280487060546875, 193.18295288085938, 9.36578369140625, 81.97250366210938, 2.3239212036132812, 20.431507110595703, 32.783363342285156, 32.04112243652344, 9.99599838256836, 13.207813262939453], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000124.npy"}
{"epoch": 0.18208516886930984, "step": 125, "batch_size": 64, "mean": 31.42009735107422, "std": 31.614765167236328, "min": -23.44208526611328, "p10": -0.08410282135009739, "median": 19.958462715148926, "p90": 76.4515151977539, "max": 118.53756713867188, "pos_frac": 0.890625, "sample": [-0.19824790954589844, 49.39201354980469, 30.759780883789062, 7.44366455078125, 9.151090621948242, 10.01620864868164, 56.78739929199219, -5.587135314941406, 76.38780212402344, 15.928966522216797, 1.0107307434082031, -13.637153625488281, 14.444097518920898, 15.281486511230469, 51.57250213623047, 45.42973327636719, 36.10066223144531, 33.612953186035156, 7.148561477661133, 10.080181121826172, 72.70697021484375, 12.87774658203125, -1.590057373046875, 15.431365966796875, 51.66053009033203, 118.53756713867188, -19.075454711914062, 19.770776748657227, 36.47320556640625, 11.190376281738281, 11.421310424804688, -23.44208526611328, 20.146148681640625, 7.453788757324219, 18.973480224609375, 98.21237182617188, 4.753017425537109, 91.84416961669922, 63.25639343261719, 0.9511184692382812, 7.300376892089844, 64.69528198242188, 76.47882080078125, 54.73951721191406, 93.56317138671875, 44.25633239746094, 43.10955810546875, 23.820880889892578, 77.54722595214844, 37.43971252441406, 13.085952758789062, 59.12162399291992, 20.47795867919922, 2.4389171600341797, 52.31562805175781, 21.463573455810547, 15.282241821289062, 53.57572937011719, 18.18552017211914, -0.79791259765625, 99.96058654785156, 59.562156677246094, 0.1822357177734375, 10.403081893920898], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000125.npy"}
{"epoch": 0.18355359765051396, "step": 126, "batch_size": 64, "mean": 29.75729751586914, "std": 31.211505889892578, "min": -40.94841003417969, "p10": -1.0629356384277338, "median": 23.62807846069336, "p90": 75.36683197021485, "max": 114.7825927734375, "pos_frac": 0.875, "sample": [57.472076416015625, 39.14014434814453, 8.752403259277344, 48.76652526855469, 10.151107788085938, 23.110145568847656, -12.215599060058594, 2.018156051635742, 25.282516479492188, 93.61654663085938, 8.4927978515625, 14.074504852294922, 73.98275756835938, 26.348587036132812, 45.90003204345703, -1.2830352783203125, 66.48829650878906, 22.81562042236328, 57.47423553466797, 30.325672149658203, 31.027416229248047, 28.398448944091797, -0.5493698120117188, 18.093944549560547, 32.72515106201172, 82.48312377929688, -3.1531143188476562, 76.72450256347656, 22.950031280517578, 27.77167510986328, 14.39498519897461, 13.096179962158203, 24.146011352539062, 8.40606689453125, 75.96000671386719, 32.88543701171875, 42.15351104736328, -23.998825073242188, 53.51487731933594, 13.82802963256836, -11.158721923828125, 4.6524505615234375, 104.57086181640625, 10.525327682495117, 15.500219345092773, 6.793010711669922, 5.542629241943359, 10.868757247924805, 58.9373779296875, 58.45751953125, -40.94841003417969, 114.7825927734375, 40.30614471435547, 15.911300659179688, 17.20465087890625, 45.619049072265625, 26.508407592773438, 43.115928649902344, 13.966045379638672, -15.437644958496094, 4.030738830566406, 52.459800720214844, 11.455451965332031, 99.23196411132812], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000126.npy"}
{"epoch": 0.18502202643171806, "step": 127, "batch_size": 64, "mean": 24.356143951416016, "std": 26.474632263183594, "min": -26.8907470703125, "p10": -9.304018402099608, "median": 23.384559631347656, "p90": 59.530442810058595, "max": 81.06056213378906, "pos_frac": 0.8125, "sample": [30.505878448486328, 46.508148193359375, -0.4226951599121094, 25.16436004638672, 2.228759765625, -21.16020965576172, 75.77430725097656, 7.7460479736328125, 59.56531524658203, -18.65032958984375, 7.474119186401367, 28.540367126464844, 45.78672790527344, -14.030004501342773, 15.698509216308594, 18.2514591217041, 37.36289978027344, 6.768400192260742, -26.8907470703125, 49.276283264160156, 20.194129943847656, 49.5224609375, 57.96379089355469, 17.10912322998047, 42.56187438964844, 75.93848419189453, 23.660633087158203, 38.562835693359375, 0.6356201171875, 19.895023345947266, 76.78809356689453, 26.298309326171875, 51.292205810546875, 15.178321838378906, 18.840286254882812, 67.11489868164062, -2.0265769958496094, -2.9321746826171875, 21.220048904418945, -2.1287155151367188, 8.65267562866211, 59.449073791503906, 6.582754135131836, 30.99579620361328, 27.581695556640625, 37.45028305053711, 31.10218048095703, 52.67927551269531, 48.73768615722656, 34.171356201171875, -20.85491943359375, 24.56124496459961, 13.317508697509766, 81.06056213378906, -23.183448791503906, 23.244033813476562, 9.035652160644531, 66.5507583618164, 30.385467529296875, 6.927707672119141, 23.52508544921875, 13.382003784179688, -9.949394226074219, -7.7981414794921875], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000127.npy"}
{"epoch": 0.18649045521292218, "step": 128, "batch_size": 64, "mean": 25.285327911376953, "std": 32.860408782958984, "min": -62.047821044921875, "p10": -9.360805320739745, "median": 17.95591449737549, "p90": 76.1209831237793, "max": 121.92131042480469, "pos_frac": 0.796875, "sample": [25.34939956665039, 22.163829803466797, 74.92289733886719, -14.525215148925781, 62.1353759765625, -13.026748657226562, 66.23045349121094, 76.59354400634766, 13.768104553222656, 18.344329833984375, 79.009033203125, 20.398292541503906, 31.928504943847656, 17.502857208251953, -16.06043243408203, 59.98167419433594, 56.14299011230469, 5.903924942016602, 5.278852462768555, 46.50625991821289, -24.821792602539062, -1.204742431640625, 121.92131042480469, -62.047821044921875, 75.01834106445312, 17.63159942626953, 60.36358642578125, 78.93766021728516, 20.37726593017578, 27.499622344970703, -4.808261871337891, -7.8273162841796875, 57.26588439941406, 8.385517120361328, -0.313568115234375, 16.01300048828125, -14.097305297851562, 9.803180694580078, 18.280229568481445, 36.77888870239258, 34.66924285888672, -7.559154510498047, 42.369850158691406, -10.018014907836914, 0.9783744812011719, 83.01783752441406, 8.915731430053711, 35.51585388183594, 16.048389434814453, 25.35773468017578, 12.44720458984375, 7.3767547607421875, 32.46728515625, -1.1502437591552734, 10.256696701049805, 9.256454467773438, 8.877281188964844, 37.760414123535156, 38.2239990234375, 1.9548301696777344, 2.4267120361328125, 80.10205078125, 76.90369415283203, 0.35886192321777344], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000128.npy"}
{"epoch": 0.18795888399412627, "step": 129, "batch_size": 64, "mean": 26.0760498046875, "std": 36.68892288208008, "min": -61.743011474609375, "p10": -10.429520606994627, "median": 21.66796588897705, "p90": 79.28621139526368, "max": 148.91387939453125, "pos_frac": 0.703125, "sample": [-6.234228134155273, 21.21672821044922, -15.383445739746094, 62.11070251464844, -11.408971786499023, 40.13153076171875, 8.228553771972656, -6.674278259277344, 82.07553100585938, 22.119203567504883, 19.152610778808594, 51.58467102050781, 16.108638763427734, 26.323158264160156, 32.27140426635742, 40.151397705078125, 83.37652587890625, -2.43994140625, 7.260162353515625, 47.140525817871094, 90.31588745117188, 49.38606262207031, 79.84133911132812, -35.189048767089844, 28.045379638671875, -27.436309814453125, -1.0320968627929688, 22.396076202392578, 148.91387939453125, 77.99091339111328, 20.11747932434082, 18.990203857421875, -8.144134521484375, 46.00165557861328, -4.3282928466796875, 3.045379638671875, 9.924102783203125, 64.5521240234375, 9.607566833496094, -7.081977844238281, -21.877273559570312, 7.440263748168945, 22.701004028320312, 48.327083587646484, 95.40904235839844, -61.743011474609375, 12.708040237426758, 12.31456184387207, -2.998220443725586, -3.445333480834961, 61.19989013671875, 92.11064147949219, 32.82941818237305, 27.927947998046875, 42.3065185546875, -3.5602798461914062, 36.11688995361328, 29.00469207763672, -8.02032470703125, 41.53284454345703, 61.79852294921875, -15.689933776855469, 60.984031677246094, -1.5364246368408203], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000129.npy"}
{"epoch": 0.1894273127753304, "step": 130, "batch_size": 64, "mean": 37.14537048339844, "std": 38.54509353637695, "min": -57.40104675292969, "p10": -4.062894439697265, "median": 31.572856903076172, "p90": 92.8198127746582, "max": 126.39910888671875, "pos_frac": 0.828125, "sample": [45.002445220947266, 34.38662338256836, 11.879676818847656, 93.26641845703125, -4.20947265625, 17.397415161132812, 77.6402587890625, 59.93327331542969, -3.7208786010742188, 1.0578269958496094, 15.749044418334961, 22.547229766845703, 22.54965591430664, 72.51052856445312, 106.7555923461914, 12.157997131347656, 62.42144775390625, -3.2435474395751953, 25.686294555664062, 50.900962829589844, 4.246879577636719, 17.155426025390625, -0.14208221435546875, 48.37825012207031, 60.19999694824219, 14.091835021972656, 30.27667999267578, 68.05882263183594, 39.47362518310547, 96.43949890136719, 81.9307861328125, 94.50228881835938, -7.995758056640625, 12.563247680664062, 64.8388671875, -57.40104675292969, 23.01833724975586, 17.653728485107422, 18.910778045654297, -9.24237060546875, 42.821083068847656, 33.808536529541016, 50.88697052001953, 118.86488342285156, 91.7777328491211, -6.9599609375, 126.39910888671875, -13.809356689453125, -2.0755653381347656, 42.523338317871094, 32.282920837402344, 22.875110626220703, 95.88457489013672, 1.683258056640625, 75.38133239746094, -49.46819305419922, 68.70789337158203, 3.326292037963867, 45.84381103515625, 75.52610778808594, 87.05038452148438, 46.57749938964844, 30.86279296875, 18.906646728515625], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000130.npy"}
{"epoch": 0.19089574155653452, "step": 131, "batch_size": 64, "mean": 34.836463928222656, "std": 32.549556732177734, "min": -41.641448974609375, "p10": 4.1777080535888675, "median": 29.058603286743164, "p90": 74.9644142150879, "max": 113.64627075195312, "pos_frac": 0.921875, "sample": [32.70427703857422, 27.223608016967773, 30.86082649230957, 24.461753845214844, 106.072021484375, -41.641448974609375, 16.771739959716797, 20.30187225341797, 32.30603790283203, 49.37317657470703, 70.11880493164062, 15.635414123535156, 20.1829833984375, 8.037652969360352, 40.96443176269531, 15.168235778808594, 19.542369842529297, -2.5513362884521484, 68.06846618652344, 34.149227142333984, 30.715354919433594, 76.24821472167969, 12.920475006103516, -3.552186965942383, 27.490097045898438, 12.526313781738281, 60.4683837890625, 8.510366439819336, 80.7940673828125, 27.85675048828125, 23.48287010192871, -33.19403076171875, 71.96887969970703, 4.096149444580078, 113.64627075195312, 1.7993412017822266, 37.24208068847656, 12.840221405029297, 80.22474670410156, 30.260456085205078, 55.94708251953125, 24.245113372802734, 107.94142150878906, 15.866813659667969, -41.17634582519531, 57.9913330078125, 7.9124603271484375, 57.49223327636719, 4.368011474609375, 21.366466522216797, 4.884025573730469, 54.99138641357422, 12.052604675292969, 96.39462280273438, 59.941253662109375, 71.48828125, 46.0191650390625, 37.859222412109375, 15.940357208251953, 50.05853271484375, 46.545433044433594, 64.99220275878906, 68.47976684570312, 23.837242126464844], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000131.npy"}
{"epoch": 0.19236417033773862, "step": 132, "batch_size": 64, "mean": 31.518390655517578, "std": 36.81360626220703, "min": -46.319854736328125, "p10": -0.8922243118286124, "median": 26.55368709564209, "p90": 72.16932754516603, "max": 156.12741088867188, "pos_frac": 0.890625, "sample": [12.191507339477539, 9.398189544677734, 81.49710083007812, -20.44239044189453, 30.762603759765625, 35.51099395751953, 38.408817291259766, 141.4117431640625, 130.96066284179688, 6.9001922607421875, 32.368072509765625, 50.181983947753906, 5.387273788452148, 19.979766845703125, -46.319854736328125, 14.164825439453125, 81.84942626953125, -2.1396522521972656, 28.23108673095703, -1.290853500366211, 5.219997406005859, 0.03791046142578125, 56.741172790527344, 56.015480041503906, 5.688762664794922, 16.532638549804688, 5.455774307250977, 56.82733917236328, 4.5491180419921875, 9.967391967773438, 8.4554443359375, 10.2005615234375, 10.413032531738281, 48.050392150878906, 54.092098236083984, 65.20255279541016, 22.709877014160156, 31.501449584960938, 47.419063568115234, 38.57710647583008, 27.55565071105957, 28.29836654663086, 36.91176986694336, 69.77886962890625, -30.745376586914062, 25.55172348022461, 73.19380950927734, -15.896293640136719, 7.553493499755859, 52.27288818359375, -8.015405654907227, 41.917144775390625, 68.22107696533203, 6.774726867675781, 29.846160888671875, 6.703887939453125, 20.131980895996094, 49.768157958984375, 20.442520141601562, 11.408292770385742, 95.54635620117188, 11.016845703125, 30.144256591796875, 156.12741088867188], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000132.npy"}
{"epoch": 0.19383259911894274, "step": 133, "batch_size": 64, "mean": 28.69933319091797, "std": 36.52239990234375, "min": -34.35804748535156, "p10": -4.861134338378903, "median": 21.18086528778076, "p90": 75.09685440063477, "max": 140.27447509765625, "pos_frac": 0.859375, "sample": [18.77086639404297, 60.81658935546875, 10.44818115234375, 51.29888916015625, 9.093345642089844, -11.555191040039062, 3.62420654296875, 13.251029968261719, -28.502532958984375, 109.99983215332031, 30.26325225830078, 46.48053741455078, 1.4365043640136719, 73.18932342529297, -32.011810302734375, 2.442474365234375, 38.10736083984375, -1.80889892578125, 105.9388427734375, 11.821496963500977, 52.13581848144531, 26.604019165039062, -0.9813766479492188, 63.115631103515625, 2.4861297607421875, 60.34754943847656, 26.584701538085938, 45.9015007019043, 48.13214874267578, 23.75634765625, 16.468734741210938, 6.969810485839844, 15.682012557983398, 117.30438232421875, 2.7755126953125, 35.09900665283203, 40.210533142089844, -13.820571899414062, 106.00997924804688, 26.92041015625, 39.63161087036133, 22.507217407226562, 22.13665199279785, 140.27447509765625, -34.35804748535156, 0.5819549560546875, 40.03282165527344, 90.45925903320312, 0.32843017578125, 20.225078582763672, 7.0848236083984375, 52.77659606933594, -23.42749786376953, 75.91436767578125, 46.077903747558594, -6.1692352294921875, 60.0079345703125, 23.776748657226562, 16.242340087890625, 12.737390518188477, 5.07904052734375, 0.12485885620117188, 6.402212142944336, 3.5038223266601562], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000133.npy"}
{"epoch": 0.19530102790014683, "step": 134, "batch_size": 64, "mean": 31.654129028320312, "std": 31.065528869628906, "min": -13.500297546386719, "p10": 1.074802589416504, "median": 24.952600479125977, "p90": 75.38041534423829, "max": 129.53369140625, "pos_frac": 0.90625, "sample": [7.7428741455078125, -13.500297546386719, 0.9618682861328125, 65.54739379882812, 113.91525268554688, 17.03643035888672, 27.22185516357422, 15.074222564697266, 4.2293548583984375, 25.961326599121094, 57.36384582519531, 13.717597961425781, 8.539966583251953, 9.200630187988281, 34.189613342285156, 25.397537231445312, -5.605695724487305, 30.974822998046875, 43.318458557128906, 8.979408264160156, 6.080722808837891, 4.335075378417969, 35.38862609863281, 26.514450073242188, 103.11154174804688, 34.77813720703125, 62.4468994140625, 64.17178344726562, 31.908519744873047, 12.803314208984375, 14.43564224243164, 76.6937026977539, 5.4591827392578125, 23.407943725585938, 70.70527648925781, 72.31607818603516, 84.0850601196289, -4.402109146118164, 45.584651947021484, 5.944604873657227, 19.650680541992188, 8.49517822265625, 24.50766372680664, 6.226167678833008, 129.53369140625, 21.836437225341797, 50.35655212402344, 22.060684204101562, 42.231903076171875, 1.3383159637451172, 28.473899841308594, -3.6139450073242188, -9.954582214355469, 96.65408325195312, 46.76641845703125, 26.144630432128906, 39.53985595703125, 17.487537384033203, 21.295166015625, 43.22327423095703, -10.308769226074219, 40.32169723510742, 18.000003814697266, 79.56210327148438], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000134.npy"}
{"epoch": 0.19676945668135096, "step": 135, "batch_size": 64, "mean": 37.6085319519043, "std": 41.21818542480469, "min": -41.011837005615234, "p10": -4.916728401184081, "median": 28.063804626464844, "p90": 99.02548065185549, "max": 154.59906005859375, "pos_frac": 0.859375, "sample": [27.274147033691406, 56.03315734863281, 10.777278900146484, 100.8675537109375, 44.457427978515625, 44.13883972167969, 131.0070343017578, 29.81563949584961, 110.4486083984375, 109.47142028808594, 83.21416473388672, 21.598281860351562, 94.211181640625, 75.55307006835938, 121.8731689453125, 49.7112922668457, 23.436965942382812, 80.63174438476562, 11.264392852783203, -5.631498336791992, 5.3907318115234375, 72.90562438964844, 70.46717834472656, 50.04206085205078, 19.108190536499023, 57.78001403808594, 10.555755615234375, 154.59906005859375, -17.012348175048828, 63.84602355957031, -11.8651123046875, 104.24899291992188, 94.72731018066406, -41.011837005615234, 22.429283142089844, -3.2140769958496094, 19.155155181884766, 33.371986389160156, 47.76368713378906, 5.139312744140625, -30.70965576171875, 14.354629516601562, 40.97791290283203, 17.54210662841797, 13.775203704833984, 29.288795471191406, 13.97665023803711, -3.248931884765625, 28.405899047851562, 12.217437744140625, 13.130180358886719, 46.7706298828125, 45.662506103515625, 35.712913513183594, 3.4362030029296875, 47.45045471191406, 27.721710205078125, 78.994384765625, 12.617080688476562, -28.237518310546875, 0.5920944213867188, 2.4995880126953125, -8.104415893554688, 13.539230346679688], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000135.npy"}
{"epoch": 0.19823788546255505, "step": 136, "batch_size": 64, "mean": 27.498376846313477, "std": 40.1883430480957, "min": -19.7808837890625, "p10": -9.600277709960936, "median": 16.445656776428223, "p90": 66.19746704101563, "max": 195.25360107421875, "pos_frac": 0.75, "sample": [41.88395309448242, 28.59209442138672, 47.914520263671875, 45.00347900390625, 3.4560012817382812, -19.636398315429688, 135.60679626464844, 68.47027587890625, -12.87213134765625, 30.356876373291016, 47.113914489746094, 37.345638275146484, 30.806625366210938, 6.658414840698242, 46.51361083984375, 23.7923583984375, 40.016151428222656, 15.811433792114258, 66.56631469726562, 137.30145263671875, -2.8263092041015625, 50.50724411010742, 195.25360107421875, 10.109306335449219, 48.42027282714844, 5.513175964355469, 95.23602294921875, 8.27609634399414, 120.645263671875, 8.381523132324219, 15.755409240722656, 7.146900177001953, 37.050575256347656, -19.7808837890625, 17.837661743164062, -8.003524780273438, 15.03662109375, -9.002357482910156, 35.98114013671875, 41.341331481933594, 14.834283828735352, -9.856529235839844, -7.7408294677734375, 42.22273254394531, -3.309267044067383, 1.5037250518798828, -0.8970603942871094, 29.547603607177734, 47.43281173706055, 54.065513610839844, -13.010093688964844, -3.1967620849609375, 22.25366973876953, 24.68236541748047, -8.895233154296875, 8.217033386230469, 17.079879760742188, -3.0743980407714844, -18.9827880859375, 8.026514053344727, 11.423336029052734, 65.33682250976562, 6.851005554199219, -18.19866180419922], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000136.npy"}
{"epoch": 0.19970631424375918, "step": 137, "batch_size": 64, "mean": 41.521568298339844, "std": 50.55982971191406, "min": -66.91748046875, "p10": -5.204382514953612, "median": 26.59206771850586, "p90": 107.1087585449219, "max": 193.3846435546875, "pos_frac": 0.84375, "sample": [15.903274536132812, 18.133182525634766, 21.420883178710938, 54.67914581298828, 78.09449005126953, 13.168434143066406, 17.905128479003906, 41.516357421875, -4.256538391113281, 72.68207550048828, 8.551456451416016, 56.14977264404297, 193.3846435546875, 10.622274398803711, 22.659759521484375, -32.9051628112793, 55.17839050292969, 15.817798614501953, 21.440645217895508, 87.66299438476562, 102.06610107421875, 46.91063690185547, 4.665544509887695, 154.02072143554688, 52.70906066894531, 133.2584686279297, 90.52212524414062, 47.282012939453125, 8.329723358154297, 101.49054718017578, -33.60429382324219, 25.446380615234375, 39.848609924316406, 12.020462036132812, 80.12174987792969, 3.0108985900878906, 84.5205078125, 40.287841796875, -9.059257507324219, 29.347137451171875, 39.05571746826172, 21.00836181640625, 21.971046447753906, -66.91748046875, -15.439323425292969, -24.047531127929688, 109.2698974609375, -1.6143760681152344, 27.737754821777344, 171.8697509765625, 27.890487670898438, 62.280487060546875, 5.0659942626953125, 65.40721130371094, -5.610601425170898, 4.899135589599609, 22.47186279296875, 5.954425811767578, 4.985681533813477, 56.57117462158203, 138.98411560058594, -1.6653900146484375, 73.03173065185547, 133.21649169921875], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000137.npy"}
{"epoch": 0.2011747430249633, "step": 138, "batch_size": 64, "mean": 31.326282501220703, "std": 35.36564636230469, "min": -63.070037841796875, "p10": -3.0516342163085914, "median": 25.457637786865234, "p90": 77.83893356323243, "max": 139.66632080078125, "pos_frac": 0.875, "sample": [79.3221435546875, -3.94403076171875, 54.70212173461914, 79.14752197265625, 23.481101989746094, 42.42560577392578, 1.6105499267578125, -0.9693756103515625, 0.8954925537109375, 67.19659423828125, 21.466083526611328, 18.247344970703125, 44.23289489746094, 5.9070892333984375, 21.155248641967773, -63.070037841796875, 11.493131637573242, -8.04172134399414, 1.7057266235351562, 7.4449005126953125, 86.18741607666016, 89.05950164794922, 7.770298004150391, 4.071556091308594, 83.37075805664062, 72.29133605957031, 28.14722442626953, 8.108795166015625, 53.03907775878906, 62.75755310058594, 35.400604248046875, 12.687555313110352, 58.04837417602539, 8.189346313476562, 69.35851287841797, 65.21968078613281, 15.458282470703125, 36.34816360473633, -32.59278869628906, 18.47020721435547, 40.73577117919922, 19.396591186523438, 27.434173583984375, 5.125814437866211, 2.2129745483398438, -16.985198974609375, 60.56718444824219, 74.78556060791016, 14.445446014404297, 31.220703125, 59.29725646972656, 22.008779525756836, 62.26579284667969, -30.53569793701172, -15.944023132324219, 38.64693069458008, 0.3289794921875, 38.57823181152344, 8.303123474121094, 139.66632080078125, 55.49462127685547, 34.382911682128906, 57.81147003173828, 89.83868408203125], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000138.npy"}
{"epoch": 0.2026431718061674, "step": 139, "batch_size": 64, "mean": 39.141056060791016, "std": 38.93692398071289, "min": -29.610427856445312, "p10": -3.0071731567382796, "median": 33.25684356689453, "p90": 82.53921051025391, "max": 163.35812377929688, "pos_frac": 0.84375, "sample": [-12.49169921875, 67.60021209716797, 55.99121856689453, 81.91741943359375, -1.3277435302734375, 45.963783264160156, 79.45452117919922, -0.5193099975585938, 35.233829498291016, 75.60565948486328, -29.610427856445312, 34.115806579589844, -10.159862518310547, 129.21273803710938, 82.08765411376953, 38.613731384277344, 60.35704040527344, 30.72797393798828, 24.01300811767578, 29.478130340576172, 82.73273468017578, 41.480369567871094, -19.178489685058594, 76.01168060302734, 26.165374755859375, 2.9154224395751953, 84.17091369628906, 48.85498046875, 6.761327743530273, 32.23381042480469, 74.69009399414062, 7.677392959594727, 48.28404235839844, 49.527610778808594, 35.47401428222656, 42.07562255859375, 13.829330444335938, 0.8323612213134766, 2.2223072052001953, 5.612052917480469, 27.680152893066406, 1.7381362915039062, 7.905216217041016, 77.53987121582031, 22.045494079589844, 109.51559448242188, 63.80579376220703, -11.760269165039062, 32.39788055419922, -0.8997879028320312, 49.593299865722656, -3.7269287109375, 22.857526779174805, 42.26606750488281, 163.35812377929688, 23.273578643798828, 143.20155334472656, 21.251724243164062, 44.555633544921875, -8.459930419921875, 89.71757507324219, 22.962661743164062, 26.440616607666016, 59.13160705566406], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000139.npy"}
{"epoch": 0.20411160058737152, "step": 140, "batch_size": 64, "mean": 36.26915740966797, "std": 43.82479476928711, "min": -47.96546173095703, "p10": -11.039588928222654, "median": 31.329246520996094, "p90": 95.0215042114258, "max": 160.1151123046875, "pos_frac": 0.765625, "sample": [-1.5944252014160156, 78.42745971679688, 160.1151123046875, -20.035423278808594, -4.617027282714844, 66.94049072265625, 58.887535095214844, 97.42178344726562, 43.065277099609375, 87.75679016113281, 63.670066833496094, 33.999473571777344, 73.34771728515625, 18.373291015625, 20.337982177734375, 39.42759704589844, 135.43496704101562, 6.090862274169922, -0.8778190612792969, 37.373268127441406, 11.880096435546875, 45.445396423339844, 49.389278411865234, 59.66254425048828, 2.8658409118652344, -12.359375, 121.82743835449219, -5.4191741943359375, 18.26495361328125, 58.45155715942383, 16.88185691833496, 11.195960998535156, 8.624969482421875, 65.77061462402344, -8.98736572265625, 34.0108642578125, 61.684261322021484, 20.664817810058594, 3.1334075927734375, 76.8364028930664, 69.17951202392578, -15.294334411621094, 89.42085266113281, -43.48200988769531, -7.8349456787109375, 40.2969970703125, 24.584945678710938, 53.84130096435547, 107.55445098876953, 8.375511169433594, -7.879184722900391, 77.18974304199219, -5.136116027832031, 30.403961181640625, -47.96546173095703, 1.1724357604980469, -23.502288818359375, 15.445571899414062, 120.35345458984375, 99.16150665283203, -11.919113159179688, 32.25453186035156, 14.214096069335938, 67.4212646484375], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000140.npy"}
{"epoch": 0.2055800293685756, "step": 141, "batch_size": 64, "mean": 35.71399688720703, "std": 39.767616271972656, "min": -82.13302612304688, "p10": -5.432839584350585, "median": 32.0679874420166, "p90": 88.77985229492188, "max": 118.39105224609375, "pos_frac": 0.84375, "sample": [70.08682250976562, 27.896015167236328, 11.922794342041016, 64.22330474853516, 86.75254821777344, -5.893222808837891, 42.652679443359375, 40.95636749267578, 13.152595520019531, -0.13726806640625, 34.31528854370117, 100.47903442382812, -82.13302612304688, 26.97838592529297, -4.358612060546875, 36.517242431640625, 0.8357753753662109, -17.0517578125, 115.20835876464844, -34.20085906982422, 30.29705047607422, 89.64869689941406, 32.05927658081055, 52.973663330078125, 80.06951904296875, 46.70313262939453, 46.206787109375, 12.628791809082031, 21.41156005859375, 37.013126373291016, 69.03952026367188, 19.405231475830078, 14.028425216674805, 84.54356384277344, 118.39105224609375, 23.03214454650879, 53.01032257080078, 13.309310913085938, -15.515321731567383, 13.696529388427734, -3.2170257568359375, 33.71289825439453, 2.0962600708007812, 106.01408386230469, 10.53773307800293, 71.86518096923828, 77.10721588134766, 28.814682006835938, 10.64619255065918, -47.34576416015625, 17.896484375, 58.49267578125, 32.076698303222656, 25.00476837158203, 47.002838134765625, 70.29621124267578, 111.5557861328125, 64.65625, 5.343072891235352, 99.76654052734375, 38.898101806640625, 47.45904541015625, -20.341278076171875, 27.202285766601562], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000141.npy"}
{"epoch": 0.20704845814977973, "step": 142, "batch_size": 64, "mean": 40.46052551269531, "std": 51.41898727416992, "min": -107.7098388671875, "p10": -11.132499885559081, "median": 28.407533645629883, "p90": 115.8721046447754, "max": 153.59109497070312, "pos_frac": 0.8125, "sample": [-6.043373107910156, 109.14218139648438, 8.406789779663086, 20.766799926757812, -3.1021194458007812, 95.80616760253906, -17.408985137939453, -3.600088119506836, -31.42266845703125, 23.879520416259766, 60.149269104003906, 73.43270874023438, 108.0503921508789, 19.642000198364258, 30.520498275756836, 103.79774475097656, 5.023639678955078, -107.7098388671875, -25.3475341796875, 38.85459899902344, 37.196083068847656, 28.751434326171875, 14.394596099853516, 124.27890014648438, 103.14027404785156, 28.71229362487793, 69.29885864257812, 27.210899353027344, 64.69121551513672, -10.603338241577148, 42.119049072265625, 81.94557189941406, 85.22972106933594, 53.81385803222656, 0.3521099090576172, 105.50408935546875, 141.03501892089844, 20.1024227142334, 8.284582138061523, 15.805122375488281, -11.359283447265625, -15.97420883178711, 4.78118896484375, 121.92915344238281, 37.150245666503906, 117.44984436035156, 9.443511962890625, 0.065643310546875, 31.489730834960938, 2.7155532836914062, 42.23611068725586, -3.3950271606445312, 16.54857063293457, -33.63481140136719, 132.8538055419922, 153.59109497070312, 26.46654510498047, 112.19071197509766, 8.463699340820312, 3.5347461700439453, 68.90677642822266, 119.05473327636719, 28.102773666381836, 72.76216888427734], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000142.npy"}
{"epoch": 0.20851688693098386, "step": 143, "batch_size": 64, "mean": 26.287750244140625, "std": 41.852787017822266, "min": -64.51527404785156, "p10": -21.934994888305663, "median": 23.576565742492676, "p90": 74.70422744750977, "max": 164.61288452148438, "pos_frac": 0.703125, "sample": [40.28460693359375, 0.5861053466796875, 35.50830841064453, 29.46735382080078, 117.81396484375, 30.824737548828125, -22.23980140686035, 55.692054748535156, 74.83195495605469, 15.298942565917969, 63.54945373535156, -54.843414306640625, -25.170162200927734, 56.24658203125, 51.337196350097656, -36.56914520263672, -18.969640731811523, -0.025720596313476562, 65.97531127929688, -9.934646606445312, 13.451740264892578, 101.91531372070312, -25.38671112060547, 54.38496398925781, 27.935482025146484, -13.75851821899414, 37.462379455566406, 70.05243682861328, 20.060874938964844, 30.851303100585938, -12.590728759765625, -21.223779678344727, -9.793426513671875, 7.7701568603515625, 64.21865844726562, 23.77979278564453, 21.72356414794922, 29.883264541625977, 25.032913208007812, 164.61288452148438, -1.5912704467773438, -4.412944793701172, 30.472183227539062, 65.26081848144531, 0.12856292724609375, 47.836036682128906, 74.40619659423828, 81.08934783935547, -3.4166393280029297, 4.89825439453125, -22.591217041015625, 120.8070068359375, -3.1605224609375, 41.725990295410156, 25.819366455078125, 21.65571403503418, 50.24995422363281, 20.22173309326172, 76.6719741821289, -64.51527404785156, 6.114238739013672, 23.37333869934082, -0.8531112670898438, 12.179763793945312], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000143.npy"}
{"epoch": 0.20998531571218795, "step": 144, "batch_size": 64, "mean": 35.076011657714844, "std": 35.68683624267578, "min": -40.471710205078125, "p10": -8.139253234863276, "median": 33.75960922241211, "p90": 77.97599258422854, "max": 131.12498474121094, "pos_frac": 0.875, "sample": [-10.434860229492188, 56.70459747314453, 12.583681106567383, 88.8465576171875, -40.471710205078125, 10.004158020019531, 52.31642150878906, 65.37483978271484, 34.16716766357422, 26.800304412841797, 53.015098571777344, 3.5538673400878906, 35.88469696044922, 47.11268615722656, 47.700157165527344, 70.21953582763672, 36.844810485839844, 66.0580825805664, 71.92401885986328, 80.50943756103516, 68.27761840820312, 91.02729034423828, -2.7828369140625, -33.25384521484375, 49.926719665527344, 1.0883941650390625, 36.60332489013672, 6.6000823974609375, 50.26728057861328, 10.564743041992188, -15.512575149536133, 58.959388732910156, 43.375244140625, 24.57842254638672, -15.243995666503906, 83.15028381347656, 29.126708984375, 18.072978973388672, 17.387290954589844, 47.56562042236328, 19.446495056152344, 34.704322814941406, 125.94054412841797, 34.869476318359375, 21.75690460205078, 48.217674255371094, -21.27944564819336, 18.929962158203125, 33.35205078125, 63.948997497558594, 57.52953338623047, 116.59327697753906, 3.0654754638671875, 3.027374267578125, 7.007659912109375, 3.4202880859375, 131.12498474121094, 13.114362716674805, 16.80092430114746, 20.507118225097656, 32.44300842285156, 25.423011779785156, 72.06462097167969, -15.63564682006836], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000144.npy"}
{"epoch": 0.21145374449339208, "step": 145, "batch_size": 64, "mean": 36.60434341430664, "std": 40.49720764160156, "min": -39.31402587890625, "p10": -10.755808639526366, "median": 28.721290588378906, "p90": 92.77067947387697, "max": 159.452392578125, "pos_frac": 0.8125, "sample": [16.833892822265625, -1.1982955932617188, -18.110794067382812, 159.452392578125, 78.09449768066406, 75.56072235107422, 94.02542114257812, 6.246829986572266, -39.31402587890625, 38.2786979675293, 8.116790771484375, 31.00916290283203, 79.82649230957031, -13.046867370605469, 16.188331604003906, 58.775291442871094, 49.849388122558594, 21.410364151000977, 47.124576568603516, -15.193708419799805, 3.859525680541992, 19.803955078125, 26.43341827392578, 95.91132354736328, 48.47150421142578, -33.56776428222656, 13.829345703125, 7.788311004638672, -0.841827392578125, 94.11288452148438, 24.491796493530273, 15.701202392578125, 68.03407287597656, -11.25271224975586, -12.224632263183594, 67.32643127441406, 4.771965026855469, 9.698305130004883, 41.350379943847656, -6.569587707519531, 46.10617446899414, 12.020008087158203, 89.84294891357422, -1.8560218811035156, 74.38076782226562, 103.58049011230469, 51.611572265625, -9.596366882324219, 33.44342803955078, 104.4477767944336, 14.815227508544922, 25.644306182861328, 82.27685546875, 87.14928436279297, 47.56051254272461, 12.036087036132812, 4.488868713378906, 34.805809020996094, 44.543609619140625, 66.39399719238281, 122.36012268066406, 38.86417007446289, 9.094501495361328, 77.60687255859375], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000145.npy"}
{"epoch": 0.21292217327459617, "step": 146, "batch_size": 64, "mean": 29.2852783203125, "std": 43.514251708984375, "min": -95.33432006835938, "p10": -26.773853302001946, "median": 27.51817798614502, "p90": 82.6104949951172, "max": 128.731201171875, "pos_frac": 0.765625, "sample": [31.708580017089844, -4.242679595947266, 79.9017333984375, -36.43224334716797, 14.379779815673828, 7.523193359375, 61.711448669433594, 31.594345092773438, 43.35077667236328, 29.82910919189453, 121.04937744140625, 27.31487274169922, 17.548097610473633, 8.405807495117188, 23.71831512451172, 15.057533264160156, 51.34203338623047, 7.988861083984375, 9.563179016113281, -3.9453983306884766, 0.8439826965332031, 21.635162353515625, 47.00457763671875, 83.77139282226562, -41.25047302246094, 41.755943298339844, 74.42933654785156, -95.33432006835938, -29.594512939453125, 88.93313598632812, -34.895660400390625, 35.703369140625, 6.99127197265625, 48.917022705078125, 128.731201171875, -35.600914001464844, 42.468902587890625, 19.206106185913086, 65.70667266845703, 121.82089233398438, 20.63068389892578, 48.6988639831543, 58.816043853759766, 32.59344482421875, 31.658462524414062, 51.460182189941406, 76.68054962158203, -39.572021484375, 18.150619506835938, -20.19231414794922, 56.92656707763672, 73.68643951416016, 73.19647216796875, 27.72148323059082, -0.27863311767578125, 17.75287628173828, 15.866609573364258, -10.160552978515625, 48.140838623046875, -18.633583068847656, 109.82258605957031, -11.829694747924805, 98.86894226074219, -14.35687255859375], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000146.npy"}
{"epoch": 0.2143906020558003, "step": 147, "batch_size": 64, "mean": 39.91291427612305, "std": 47.765506744384766, "min": -56.929229736328125, "p10": -8.618191528320311, "median": 31.876008987426758, "p90": 114.32717742919924, "max": 167.5880126953125, "pos_frac": 0.828125, "sample": [101.34283447265625, -33.291011810302734, 35.1339111328125, 119.45442199707031, -5.006736755371094, 2.37493896484375, 54.436248779296875, 69.54029846191406, 20.620223999023438, -14.931861877441406, 16.47050666809082, 45.62178039550781, 36.60948181152344, 73.0469741821289, 25.303424835205078, 115.93746185302734, 70.52870178222656, 19.79480743408203, -7.610595703125, 110.5698471069336, 116.0311279296875, 13.746955871582031, 92.05000305175781, 18.567861557006836, 149.059326171875, 28.25048828125, 149.30474853515625, 64.24775695800781, 8.951377868652344, 35.481178283691406, 20.122875213623047, -56.929229736328125, 5.118095397949219, 78.93045043945312, 49.584259033203125, -15.720809936523438, 44.23027038574219, 167.5880126953125, 36.485931396484375, 16.183330535888672, 86.82598114013672, 11.556854248046875, -1.5081806182861328, 10.869277954101562, 23.35809326171875, 40.82574462890625, 7.24761962890625, 40.29570388793945, 46.31605529785156, -7.269371032714844, 16.474273681640625, 34.46767044067383, 20.115947723388672, 78.40321350097656, -9.050018310546875, 33.71560287475586, 124.9495849609375, 4.349346160888672, 25.94818115234375, -45.555908203125, 100.02017211914062, -28.9111328125, 33.077789306640625, 30.67422866821289], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000147.npy"}
{"epoch": 0.21585903083700442, "step": 148, "batch_size": 64, "mean": 31.903329849243164, "std": 40.64439010620117, "min": -45.33416748046875, "p10": -4.6212825775146475, "median": 20.02794647216797, "p90": 79.51107788085939, "max": 165.8648681640625, "pos_frac": 0.84375, "sample": [-3.5468788146972656, 30.167770385742188, 6.853363037109375, 71.110595703125, 3.0750198364257812, 43.465110778808594, 68.94068908691406, 26.28165054321289, -14.438766479492188, 105.73507690429688, 12.880203247070312, 80.94115447998047, 18.703811645507812, 15.404335021972656, 75.77058410644531, 116.3087158203125, -0.3782501220703125, 0.2864532470703125, -12.124580383300781, -24.648056030273438, 81.30828857421875, 70.95639038085938, 1.9740276336669922, 53.106658935546875, 71.79557037353516, 30.051206588745117, 41.77618408203125, 24.226425170898438, 72.88113403320312, -1.1382331848144531, 76.17423248291016, 14.5621337890625, 23.27680778503418, 31.230018615722656, -7.756565093994141, 20.281009674072266, 11.618995666503906, 19.200149536132812, 65.69835662841797, 14.935283660888672, 82.90319061279297, 24.585594177246094, -42.4443359375, 20.132431030273438, 24.933347702026367, 16.81909942626953, 70.82258605957031, 18.53545379638672, 15.22610092163086, -5.0817413330078125, -45.33416748046875, 31.361557006835938, 15.708179473876953, 9.244749069213867, 22.258201599121094, 5.489980697631836, 19.9234619140625, 157.78814697265625, 165.8648681640625, 5.704719543457031, 6.630277633666992, 10.89013671875, 66.47029113769531, 6.434867858886719], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000148.npy"}
{"epoch": 0.2173274596182085, "step": 149, "batch_size": 64, "mean": 41.27139663696289, "std": 42.81531524658203, "min": -66.76600646972656, "p10": 0.4316488265991245, "median": 32.945852279663086, "p90": 92.14445114135744, "max": 177.0043182373047, "pos_frac": 0.890625, "sample": [69.49667358398438, 42.26405334472656, 124.95684814453125, 32.96916580200195, 71.34188079833984, 25.790695190429688, 32.92253875732422, -10.269937515258789, -15.615989685058594, 9.906078338623047, 12.83466911315918, 10.43460464477539, 48.304813385009766, 31.251388549804688, 26.80681610107422, -25.52941131591797, 86.17411041259766, 56.42315673828125, 17.66900634765625, 66.8050308227539, 44.07837677001953, 97.85134887695312, 66.28266143798828, 19.41447639465332, 12.580747604370117, 50.63560485839844, 24.440706253051758, 77.02332305908203, 55.0315055847168, 71.30389404296875, 46.907554626464844, -1.0429401397705078, 177.0043182373047, 21.605775833129883, 93.26158905029297, 24.93805694580078, 126.5364990234375, 37.632911682128906, 13.953842163085938, 67.37213897705078, 22.810062408447266, 64.12918090820312, 12.329690933227539, 34.164756774902344, -5.249458312988281, 36.80796813964844, 20.606067657470703, 53.76581573486328, 7.829010009765625, -66.76600646972656, 42.2437744140625, 26.926780700683594, 10.117141723632812, 88.52236938476562, 18.835744857788086, 104.85928344726562, 15.92296028137207, 26.943483352661133, 89.53779602050781, 161.87396240234375, 10.854530334472656, 3.872356414794922, 52.847904205322266, -34.16424560546875], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000149.npy"}
{"epoch": 0.21879588839941264, "step": 150, "batch_size": 64, "mean": 38.6958122253418, "std": 50.88700866699219, "min": -59.17975616455078, "p10": -5.0247364044189435, "median": 23.56205177307129, "p90": 103.99325332641604, "max": 211.54461669921875, "pos_frac": 0.84375, "sample": [151.399658203125, 193.07676696777344, 15.710731506347656, 17.92473602294922, 43.06166076660156, -10.830390930175781, 84.7857894897461, 28.587980270385742, -59.17975616455078, 22.66681671142578, 211.54461669921875, -12.480314254760742, 7.2940521240234375, 14.096084594726562, 10.084888458251953, 64.34480285644531, 23.980186462402344, 37.719581604003906, 39.70353698730469, 39.91094207763672, 26.671993255615234, 59.913658142089844, 113.147216796875, 21.288665771484375, 74.71407318115234, 54.480995178222656, 106.70906066894531, 71.43717956542969, 14.722919464111328, 57.206783294677734, 7.784912109375, -28.86101531982422, 53.99365997314453, 98.69331359863281, 17.520065307617188, 5.130393981933594, 106.26465606689453, 17.04316520690918, 73.89676666259766, 2.9451065063476562, 55.78306579589844, -5.766868591308594, -9.7005615234375, -1.4110946655273438, 79.55229187011719, 2.6855239868164062, 12.992515563964844, 39.48939514160156, 13.196632385253906, 23.143917083740234, 21.56842041015625, 34.45151138305664, 41.971961975097656, 9.589115142822266, 33.65301513671875, 184.75506591796875, 11.429170608520508, 10.819358825683594, -1.1360721588134766, 39.07037353515625, 37.96118927001953, -3.2930946350097656, -34.47346496582031, 2.094829559326172], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000150.npy"}
{"epoch": 0.22026431718061673, "step": 151, "batch_size": 64, "mean": 41.395328521728516, "std": 41.7921142578125, "min": -56.40985107421875, "p10": -9.052989959716797, "median": 34.50985336303711, "p90": 109.4956970214844, "max": 153.68418884277344, "pos_frac": 0.875, "sample": [-56.40985107421875, 153.68418884277344, 42.88554382324219, 3.8025894165039062, 16.938255310058594, 25.078781127929688, 46.76776885986328, 63.76179885864258, 54.33534240722656, 22.917465209960938, 102.88790893554688, -18.41956901550293, -10.666160583496094, 68.15542602539062, 32.00006866455078, -9.132343292236328, 10.811664581298828, 24.472023010253906, 32.68318176269531, 36.42955780029297, 128.2452392578125, 50.033817291259766, 60.89732360839844, 34.32627868652344, -22.17076873779297, 114.00121307373047, 17.218360900878906, 16.867706298828125, -8.86783218383789, 31.113525390625, 34.69342803955078, 23.073598861694336, 19.416027069091797, 18.70113182067871, 43.67390441894531, 112.32760620117188, -36.952659606933594, 57.51310729980469, 46.58263397216797, 54.930694580078125, 9.626794815063477, 43.29681396484375, 43.3739013671875, 83.78495788574219, 20.36864471435547, 118.52714538574219, 72.08563232421875, 28.84358024597168, 80.62921142578125, 27.947860717773438, 71.14039611816406, 13.121688842773438, 2.1348037719726562, 81.70926666259766, 18.559776306152344, 8.453851699829102, 117.01201629638672, 73.44806671142578, 121.08926391601562, 50.37403869628906, -20.459213256835938, 47.72685241699219, 19.80889129638672, 78.08882141113281], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000151.npy"}
{"epoch": 0.22173274596182085, "step": 152, "batch_size": 64, "mean": 45.03105545043945, "std": 48.00522994995117, "min": -40.406341552734375, "p10": -4.011425018310545, "median": 34.89765930175781, "p90": 113.99167022705082, "max": 168.82742309570312, "pos_frac": 0.859375, "sample": [25.417068481445312, 22.244117736816406, 100.20707702636719, 103.87690734863281, 41.44160842895508, 92.21378326416016, 43.559146881103516, 63.0247802734375, 150.6915740966797, 23.34728240966797, 95.7598648071289, 78.1611328125, 24.284992218017578, -22.55791473388672, 15.682350158691406, 131.270263671875, 73.4494400024414, 18.506446838378906, 43.2530403137207, 22.957054138183594, 46.81334686279297, 28.62574005126953, -32.02533721923828, 38.46943664550781, 10.188346862792969, 167.21055603027344, 62.09947967529297, 128.95556640625, 23.7181396484375, 5.543266296386719, 47.60911560058594, -40.406341552734375, 31.474178314208984, 0.2808837890625, 6.7393798828125, 44.19355773925781, 25.36017608642578, 88.650390625, 25.930892944335938, 26.689407348632812, 168.82742309570312, 1.0151824951171875, 38.725669860839844, 63.323699951171875, 36.589202880859375, -2.459320068359375, 144.7790069580078, 18.79793357849121, -25.888771057128906, 70.19021606445312, 22.774030685424805, 45.90751647949219, 33.20611572265625, 19.944740295410156, 118.32656860351562, -17.262710571289062, 75.10415649414062, -30.044189453125, 11.591476440429688, 74.08349609375, -1.6541900634765625, -4.676612854003906, 77.17057037353516, 60.706260681152344], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000152.npy"}
{"epoch": 0.22320117474302498, "step": 153, "batch_size": 64, "mean": 38.37079620361328, "std": 48.338417053222656, "min": -69.53680419921875, "p10": -12.759013366699214, "median": 28.348827362060547, "p90": 105.7468627929688, "max": 205.30938720703125, "pos_frac": 0.796875, "sample": [59.90904998779297, 24.132240295410156, 13.40350341796875, 4.5284576416015625, 10.09115982055664, 22.546478271484375, 18.642330169677734, 69.71047973632812, -9.437324523925781, 119.44593811035156, 50.58113098144531, 27.00859832763672, -4.481647491455078, 51.3723258972168, 137.31187438964844, -40.44953155517578, 63.557464599609375, 9.105508804321289, 39.04594421386719, 69.83536529541016, 37.504974365234375, 205.30938720703125, 24.98754119873047, 43.857872009277344, 49.73789978027344, 112.42235565185547, -14.422126770019531, -8.661712646484375, 120.07661437988281, 64.3414306640625, 95.27151489257812, 9.750267028808594, 93.34637451171875, 6.205251693725586, 0.7171897888183594, 5.584259033203125, 62.76451873779297, 45.64817810058594, -15.752532958984375, -14.182594299316406, 46.64606475830078, 29.689056396484375, 9.000015258789062, 110.23629760742188, 15.855663299560547, -5.506284713745117, -33.37793731689453, 38.13966369628906, 24.8099365234375, -69.53680419921875, 56.50452423095703, 153.98374938964844, 50.778297424316406, -14.475902557373047, 38.97478485107422, -0.7181358337402344, 53.22031784057617, 23.713504791259766, 62.262367248535156, 90.77020263671875, 26.53707504272461, -9.265464782714844, 17.294090270996094, 79.83006286621094], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000153.npy"}
{"epoch": 0.22466960352422907, "step": 154, "batch_size": 64, "mean": 37.15574264526367, "std": 49.44888687133789, "min": -65.61349487304688, "p10": -22.449165725708003, "median": 27.231151580810547, "p90": 99.49546813964845, "max": 152.49227905273438, "pos_frac": 0.765625, "sample": [-2.3261947631835938, 15.277328491210938, -65.61349487304688, 5.625356674194336, 152.49227905273438, 40.0567626953125, 1.1615352630615234, 56.0250244140625, 86.37405395507812, -17.475399017333984, 37.065834045410156, 121.08175659179688, -28.161102294921875, 54.16712951660156, 72.14328002929688, 36.79349899291992, -0.7343330383300781, -8.427444458007812, 23.67331314086914, -0.26676177978515625, 55.383140563964844, 55.64902114868164, 1.0480461120605469, 30.788990020751953, -0.1104888916015625, 85.161376953125, 2.9479942321777344, 92.9119873046875, 78.2929458618164, -24.580780029296875, 2.25091552734375, 103.50147247314453, -41.26099395751953, 95.3345718383789, -25.024934768676758, 107.62283325195312, 143.8197021484375, 101.26678466796875, 6.7137603759765625, 92.2362060546875, 95.36239624023438, 73.16453552246094, 13.014049530029297, 46.29475402832031, 92.48866271972656, 4.980812072753906, 16.52375030517578, 17.485998153686523, -34.75447082519531, 63.01336669921875, 72.85014343261719, 17.337677001953125, -6.971746444702148, 22.58984375, 9.436004638671875, -47.48404312133789, 65.55508422851562, 62.43370056152344, 59.81133270263672, -15.175056457519531, 3.7070770263671875, 15.189878463745117, 43.935333251953125, 144.29342651367188], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000154.npy"}
{"epoch": 0.2261380323054332, "step": 155, "batch_size": 64, "mean": 40.0943489074707, "std": 45.07286071777344, "min": -89.65045166015625, "p10": -3.955239868164062, "median": 35.25883865356445, "p90": 92.723072052002, "max": 168.77273559570312, "pos_frac": 0.859375, "sample": [10.74407958984375, 35.053062438964844, 15.32988166809082, -4.20709228515625, 59.5646858215332, 5.042928695678711, 60.77915954589844, -3.367584228515625, 48.01577377319336, 10.766937255859375, -5.0015411376953125, 29.271080017089844, 37.658897399902344, -6.144500732421875, 44.76110076904297, 49.164772033691406, 14.080760955810547, 38.144371032714844, 44.08606719970703, 18.248443603515625, 52.40711975097656, 32.881500244140625, 45.79875183105469, 133.32786560058594, 129.8116455078125, 7.246559143066406, 74.91229248046875, 65.6641616821289, 20.27686309814453, 1.9631500244140625, 12.665931701660156, 2.4251861572265625, 22.219993591308594, 41.77813720703125, 111.65895080566406, -89.65045166015625, 137.9838104248047, 27.11949920654297, 77.1663589477539, 0.8796005249023438, 14.67416000366211, 59.15885925292969, 78.20396423339844, 25.891510009765625, 168.77273559570312, 48.69757080078125, 159.2891845703125, 78.49411010742188, -27.990692138671875, 35.46461486816406, -0.528076171875, 97.45173645019531, 20.94473648071289, 62.32970428466797, 51.34439468383789, -18.49279022216797, 47.743385314941406, 81.68952178955078, 66.3735580444336, 2.4430770874023438, 72.6221923828125, 6.852941513061523, -5.667566299438477, 29.74738311767578], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000155.npy"}
{"epoch": 0.2276064610866373, "step": 156, "batch_size": 64, "mean": 26.240877151489258, "std": 52.01670837402344, "min": -129.365234375, "p10": -30.965802001953122, "median": 27.27737045288086, "p90": 82.42936325073242, "max": 142.30850219726562, "pos_frac": 0.71875, "sample": [75.29264831542969, 28.43347930908203, 122.46925354003906, 142.0821075439453, 75.18030548095703, 42.10768127441406, -85.42572021484375, -2.795745849609375, -8.917694091796875, 14.411785125732422, -52.34928894042969, 85.24172973632812, 39.53486633300781, -7.646635055541992, -18.37753677368164, -82.49646759033203, 38.2283935546875, 80.59588623046875, 131.46424865722656, 31.434829711914062, 8.8995361328125, 65.74237060546875, 13.1376953125, 35.59513854980469, 12.780702590942383, 2.6318893432617188, 11.166543960571289, 28.074752807617188, 142.30850219726562, -2.8263473510742188, 120.294677734375, 81.35352325439453, 82.77251434326172, 31.826446533203125, -9.399871826171875, 4.826417922973633, 29.032493591308594, -39.834381103515625, -129.365234375, -27.452301025390625, 44.25263214111328, -14.117874145507812, 16.737350463867188, 62.25834655761719, 22.862136840820312, -32.471588134765625, 46.56953430175781, 81.62867736816406, 41.50779724121094, 18.534170150756836, -1.3811511993408203, 48.03272247314453, 26.47998809814453, 40.96526336669922, -69.9898452758789, -6.949432373046875, 41.518585205078125, 43.60426330566406, 23.05181884765625, 9.051956176757812, 19.094867706298828, 42.32958221435547, -1.7577247619628906, 67.57080078125], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000156.npy"}
{"epoch": 0.2290748898678414, "step": 157, "batch_size": 64, "mean": 46.5479736328125, "std": 49.059593200683594, "min": -82.34423828125, "p10": -7.47056045532226, "median": 43.218666076660156, "p90": 111.82839355468754, "max": 184.43849182128906, "pos_frac": 0.84375, "sample": [82.78080749511719, 87.85431671142578, 55.806373596191406, 116.79240417480469, 51.43810272216797, 8.81857681274414, 37.62040710449219, 116.343017578125, 10.991188049316406, 82.93136596679688, 38.72090530395508, 115.39111328125, 32.419708251953125, 45.137176513671875, 100.06661987304688, 84.65275573730469, 66.54114532470703, 45.938621520996094, 31.639846801757812, 81.6175308227539, 41.30015563964844, -0.6248550415039062, 50.18788146972656, -11.627681732177734, 63.00701904296875, 91.11073303222656, -1.8610916137695312, 23.227832794189453, 6.544830322265625, 88.66456604003906, -82.34423828125, 9.811315536499023, 19.5916748046875, 14.322837829589844, 52.24485778808594, 57.30735778808594, 103.515380859375, 55.639068603515625, 62.743865966796875, -14.253683090209961, -0.11620330810546875, 172.18472290039062, 21.504873275756836, 16.454864501953125, 3.7848167419433594, 121.82276916503906, 67.66819763183594, -17.698938369750977, 136.89273071289062, 56.2042121887207, 74.7964859008789, 15.061370849609375, -17.127376556396484, 17.401504516601562, 93.88211822509766, 65.7271728515625, 184.43849182128906, 38.81385803222656, 30.2789249420166, -46.829994201660156, 24.022003173828125, -9.874618530273438, 4.163707733154297, 3.60491943359375], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000157.npy"}
{"epoch": 0.2305433186490455, "step": 158, "batch_size": 64, "mean": 46.50477600097656, "std": 59.507991790771484, "min": -45.118682861328125, "p10": -9.235852432250976, "median": 32.29118728637695, "p90": 131.2784133911133, "max": 224.01388549804688, "pos_frac": 0.8125, "sample": [17.667354583740234, 6.251575469970703, 35.112396240234375, 32.80925750732422, 35.740325927734375, 0.09770584106445312, 15.40438461303711, 25.579490661621094, 33.94429016113281, 3.872020721435547, 102.30197143554688, -45.118682861328125, 125.41157531738281, 56.703208923339844, -3.2647552490234375, -16.565826416015625, -4.111230850219727, 71.52583312988281, 4.105775833129883, 6.719047546386719, 140.63595581054688, -9.6181640625, 106.79449462890625, -18.53266143798828, 133.2539520263672, 20.659873962402344, 70.98741149902344, 13.864809036254883, 10.063972473144531, 10.594547271728516, 48.685916900634766, 45.340980529785156, 4.8949737548828125, 85.889892578125, -8.343791961669922, -5.0269927978515625, 159.3688507080078, 74.6235122680664, -43.31798553466797, 171.7471466064453, 200.3544921875, 15.821792602539062, 25.667938232421875, 224.01388549804688, 0.8145904541015625, 18.909408569335938, -16.07567596435547, 60.6634521484375, 59.693603515625, 31.773117065429688, 200.5909423828125, 92.68545532226562, 61.17790222167969, 73.06721496582031, -11.119775772094727, 87.587646484375, 14.337936401367188, -6.214405059814453, 32.97985076904297, 57.26422882080078, 126.6688232421875, 37.81739807128906, 68.69073486328125, 2.382600784301758], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000158.npy"}
{"epoch": 0.23201174743024963, "step": 159, "batch_size": 64, "mean": 37.868499755859375, "std": 50.76020431518555, "min": -85.84523010253906, "p10": -12.709533691406248, "median": 28.957965850830078, "p90": 106.25537872314455, "max": 191.80560302734375, "pos_frac": 0.796875, "sample": [10.642509460449219, 28.99677276611328, 62.37263488769531, -19.867534637451172, 112.51434326171875, 56.872161865234375, -85.84523010253906, -17.952465057373047, 70.46029663085938, 65.89398193359375, 45.997406005859375, -13.162490844726562, 10.304824829101562, -3.96710205078125, 35.77201843261719, 0.26992034912109375, 5.208770751953125, 47.27022933959961, 103.73393249511719, 91.04841613769531, 191.80560302734375, 5.850589752197266, -22.642913818359375, 9.230354309082031, 9.805038452148438, -11.652633666992188, 13.000656127929688, -20.5833740234375, 25.877670288085938, 28.919158935546875, 4.991184234619141, 109.18351745605469, 21.76128387451172, 12.537174224853516, 43.72418212890625, -11.051431655883789, 40.385894775390625, -3.6559600830078125, 6.542387008666992, 133.54495239257812, 71.11581420898438, 107.33599853515625, 36.761016845703125, 52.20972442626953, 13.476936340332031, 168.31887817382812, 9.132944107055664, -0.2499847412109375, 31.331405639648438, 45.99224853515625, 80.4716567993164, 20.585044860839844, -0.6495590209960938, -46.42250061035156, 44.86207580566406, 48.65631103515625, 73.29612731933594, 75.07279968261719, 13.778940200805664, 162.60830688476562, 92.61138153076172, 35.19609069824219, 15.28851318359375, 78.66706848144531], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000159.npy"}
{"epoch": 0.23348017621145375, "step": 160, "batch_size": 64, "mean": 42.595428466796875, "std": 57.0815544128418, "min": -40.22950744628906, "p10": -22.437585449218748, "median": 23.76125717163086, "p90": 111.29849166870119, "max": 219.77734375, "pos_frac": 0.796875, "sample": [58.913909912109375, 4.615312576293945, 89.02996826171875, 80.20866394042969, -1.613128662109375, -17.12395477294922, 41.376319885253906, 98.0426025390625, 98.60519409179688, 0.40235328674316406, -38.13703155517578, 11.301864624023438, 92.31475830078125, 120.57256317138672, 63.82872009277344, -23.106369018554688, 16.860816955566406, 37.89848327636719, -40.22950744628906, 105.81929016113281, 164.83102416992188, -26.7232666015625, 69.47727966308594, -20.877090454101562, 66.1581039428711, 15.555511474609375, 19.43842315673828, 6.42181396484375, 219.77734375, 41.10308074951172, 10.626838684082031, 42.00096893310547, 11.949752807617188, 158.8114013671875, -5.626213073730469, 6.716728210449219, 18.780113220214844, 104.1597900390625, 161.737060546875, 183.22166442871094, 33.59049606323242, 76.80433654785156, -24.9490966796875, 113.64672088623047, 59.26862335205078, 15.534652709960938, -11.052583694458008, 23.095258712768555, 24.237411499023438, 66.48623657226562, 9.225746154785156, 23.28510284423828, 25.865707397460938, -33.10760498046875, 22.7579345703125, 52.65843200683594, 3.7594356536865234, -0.8859386444091797, -35.45050048828125, 68.13078308105469, 9.058151245117188, 104.1697998046875, 0.5937042236328125, 52.26359176635742], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000160.npy"}
{"epoch": 0.23494860499265785, "step": 161, "batch_size": 64, "mean": 39.81043243408203, "std": 50.381832122802734, "min": -43.955474853515625, "p10": -19.578657722473142, "median": 32.24727439880371, "p90": 112.48352661132814, "max": 182.62521362304688, "pos_frac": 0.75, "sample": [47.49855041503906, 108.67861938476562, -20.605323791503906, 107.6851806640625, 126.70162963867188, -43.955474853515625, -31.135498046875, -22.402694702148438, 43.44056701660156, -17.037303924560547, 35.646629333496094, 78.95465850830078, 50.132667541503906, -32.936988830566406, -4.502559661865234, 51.728912353515625, 135.5647430419922, 42.29667663574219, 65.16368103027344, 132.77781677246094, 33.93632125854492, 28.613094329833984, 21.832176208496094, -13.514152526855469, 30.5582275390625, -5.842994689941406, 66.55693817138672, 14.815757751464844, 152.31680297851562, 182.62521362304688, 59.61119842529297, 2.603513717651367, 17.95842170715332, -13.13809585571289, 99.58203125, 59.025413513183594, -0.9019393920898438, -28.72271156311035, 10.263442993164062, 54.77193832397461, 23.670028686523438, 11.756309509277344, -15.5281982421875, 25.985042572021484, 38.64557647705078, 25.363037109375, 109.28994750976562, 59.030677795410156, 8.937255859375, 48.05271911621094, 94.26040649414062, -33.26488494873047, -4.1214752197265625, 113.85220336914062, 21.532188415527344, 18.977020263671875, 19.86483383178711, 50.60520935058594, 67.98597717285156, -17.183103561401367, 20.409664154052734, 115.0682601928711, 70.73121643066406, 47.30268859863281], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000161.npy"}
{"epoch": 0.23641703377386197, "step": 162, "batch_size": 64, "mean": 41.22889709472656, "std": 54.980892181396484, "min": -49.712799072265625, "p10": -12.566300964355468, "median": 32.789363861083984, "p90": 105.27063980102541, "max": 297.3839416503906, "pos_frac": 0.796875, "sample": [-20.413772583007812, 27.494064331054688, 47.889556884765625, 107.69587707519531, 6.8778533935546875, 125.5594482421875, 7.561487197875977, 74.61137390136719, 21.559938430786133, -9.934530258178711, -12.914100646972656, 100.88923645019531, 31.039833068847656, -26.129638671875, -12.914138793945312, -2.7114639282226562, 80.87062072753906, 75.17491912841797, 9.235153198242188, 61.480560302734375, 54.767356872558594, 17.483970642089844, 44.403076171875, 12.802715301513672, 4.90101432800293, 82.21833801269531, 64.59186553955078, 11.124465942382812, -37.35736083984375, 67.35635375976562, 65.24629974365234, -35.6446533203125, -2.0915355682373047, 15.08875846862793, 84.87598419189453, 42.62320327758789, 107.14838409423828, 297.3839416503906, 47.6973876953125, 49.016746520996094, 39.899192810058594, 46.71421813964844, 76.24575805664062, 95.16827392578125, 81.10917663574219, 5.826507568359375, 2.8999977111816406, 15.304649353027344, 60.524993896484375, 15.956901550292969, -5.9097442626953125, 112.50257873535156, 2.611968994140625, -11.754768371582031, 175.53115844726562, 37.850921630859375, -49.712799072265625, 3.52459716796875, 23.611766815185547, 60.75768280029297, 34.53889465332031, -8.183868408203125, 28.207237243652344, 108.86563110351562], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000162.npy"}
{"epoch": 0.23788546255506607, "step": 163, "batch_size": 64, "mean": 39.377601623535156, "std": 45.40939712524414, "min": -40.628196716308594, "p10": -10.522235298156733, "median": 34.202392578125, "p90": 99.50545883178714, "max": 161.07557678222656, "pos_frac": 0.84375, "sample": [11.007957458496094, 3.9419021606445312, 143.0389862060547, 114.21223449707031, -17.64022445678711, -30.925384521484375, 26.77276611328125, -1.6778564453125, 1.0914459228515625, 42.95158386230469, 51.49108123779297, 9.424201965332031, 67.66735076904297, 161.07557678222656, 47.47834014892578, 57.12641143798828, 68.25782775878906, 60.39288330078125, 11.810386657714844, 13.727104187011719, 25.07427978515625, 60.14221954345703, 48.59016418457031, 25.238258361816406, 42.22502136230469, 2.072690963745117, 78.75335693359375, -37.16319274902344, -12.321739196777344, 41.252166748046875, 78.85514831542969, 8.82198715209961, 63.62431335449219, 16.5859432220459, 40.44889831542969, -6.323392868041992, -21.60523223876953, 136.9281463623047, 71.2219009399414, 14.97479248046875, -12.964218139648438, 28.958702087402344, 14.569625854492188, 92.02253723144531, 38.881622314453125, 60.897621154785156, 9.368972778320312, 147.3706512451172, 65.82984924316406, 7.270734786987305, 29.523162841796875, 46.9251708984375, 29.473281860351562, 102.7124252319336, 57.032135009765625, 2.48809814453125, 46.561553955078125, 7.861824035644531, -40.628196716308594, 0.5546779632568359, 137.81558227539062, 43.886871337890625, 85.95897674560547, -0.8255023956298828], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000163.npy"}
{"epoch": 0.2393538913362702, "step": 164, "batch_size": 64, "mean": 45.797706604003906, "std": 46.90169906616211, "min": -53.83784484863281, "p10": -4.637977790832518, "median": 41.877437591552734, "p90": 98.568416595459, "max": 213.84169006347656, "pos_frac": 0.875, "sample": [-53.83784484863281, 73.77084350585938, 85.5753402709961, 86.4198226928711, 104.68045043945312, 38.19249725341797, 44.04849624633789, 72.65292358398438, 54.7072639465332, 56.24343490600586, 126.22795104980469, 17.88469696044922, -5.808294296264648, 42.48368835449219, 19.935279846191406, 40.91234588623047, 42.973045349121094, 47.12751007080078, -5.231584548950195, 29.215166091918945, -14.663797378540039, 55.73784637451172, 43.149330139160156, 68.6673583984375, 32.19097900390625, 48.3521728515625, 3.115081787109375, -21.98589324951172, 60.51494598388672, 15.76883316040039, 20.636749267578125, -3.2528953552246094, 69.6677474975586, 100.71854400634766, 64.11322021484375, 67.5976333618164, 14.931037902832031, 67.29098510742188, 32.93279266357422, 93.55145263671875, 21.52623748779297, 36.23969650268555, 32.29328155517578, 36.259124755859375, 31.358678817749023, 3.7916107177734375, 41.27118682861328, 7.336921691894531, -40.783660888671875, 71.57808685302734, 47.502723693847656, 27.795516967773438, 159.04898071289062, 17.714630126953125, 168.71739196777344, 124.78858184814453, 4.052093505859375, 79.02987670898438, 61.503196716308594, -14.095413208007812, 15.05251693725586, 45.250946044921875, 213.84169006347656, 2.7723541259765625], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000164.npy"}
{"epoch": 0.24082232011747431, "step": 165, "batch_size": 64, "mean": 30.478483200073242, "std": 41.49768829345703, "min": -81.3035659790039, "p10": -17.369551849365234, "median": 23.818069458007812, "p90": 89.38191375732423, "max": 126.3907470703125, "pos_frac": 0.796875, "sample": [38.37132263183594, 50.50401306152344, -18.000572204589844, 12.16685676574707, -15.897171020507812, -0.5745601654052734, 46.97051239013672, 40.456207275390625, 5.2491607666015625, -20.637977600097656, 34.64037322998047, -81.3035659790039, 1.03033447265625, 27.196090698242188, 18.29156494140625, -32.8377685546875, -6.048419952392578, 4.510986328125, 84.2723617553711, 77.49566650390625, 52.606136322021484, 26.582122802734375, 14.223445892333984, 107.25518798828125, -22.387847900390625, 59.320037841796875, 3.198183059692383, -23.964950561523438, 13.890701293945312, 73.40695190429688, -2.0790786743164062, 16.30964469909668, 88.648193359375, 89.69636535644531, 16.062896728515625, -11.113525390625, 44.71436309814453, 5.410491943359375, 33.47002410888672, 5.54661750793457, 19.647687911987305, 17.046648025512695, 49.224517822265625, 50.416656494140625, 126.3907470703125, 11.740570068359375, 53.543060302734375, 42.1680908203125, 25.122215270996094, -27.15399169921875, 7.197021484375, 73.27056884765625, 29.80274200439453, -15.5355224609375, 28.464618682861328, 107.42164611816406, 16.84276580810547, 117.78602600097656, 119.7505874633789, 91.46170043945312, 36.22493362426758, 7.0089874267578125, 22.51392364501953, 83.61528778076172], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000165.npy"}
{"epoch": 0.2422907488986784, "step": 166, "batch_size": 64, "mean": 54.38330841064453, "std": 64.4186782836914, "min": -34.45964050292969, "p10": -9.803523635864252, "median": 39.159292221069336, "p90": 141.99731750488283, "max": 268.8385009765625, "pos_frac": 0.859375, "sample": [218.7258758544922, 142.4540557861328, 6.651821136474609, 226.59698486328125, -13.746757507324219, 24.25033187866211, 19.748687744140625, 45.26141357421875, 119.43961334228516, -4.578338623046875, 53.67425537109375, 2.1567535400390625, 6.870121002197266, 79.97091674804688, -4.022056579589844, 37.52539825439453, 69.4837646484375, 34.37054443359375, 40.79318618774414, 74.7325668334961, 21.633028030395508, 268.8385009765625, 16.672069549560547, -12.042888641357422, -34.45964050292969, -21.331340789794922, 21.087276458740234, 52.48456573486328, -12.431991577148438, 106.50442504882812, 157.5580596923828, 202.24752807617188, 66.06610870361328, 67.33135986328125, 14.01934814453125, 198.41281127929688, 15.712417602539062, 71.9185791015625, 42.901275634765625, 61.920196533203125, 14.873577117919922, 43.203426361083984, 10.3685302734375, 27.060813903808594, 125.88623046875, 62.51328659057617, 140.9315948486328, 15.537090301513672, 46.13762664794922, 67.425537109375, 20.635536193847656, 63.62211608886719, -17.49159049987793, 23.798839569091797, 3.2051467895507812, -13.034648895263672, 84.41515350341797, 11.834075927734375, 51.19684982299805, 8.963302612304688, 63.022884368896484, 14.391805648803711, 94.05013275146484, 32.583641052246094], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000166.npy"}
{"epoch": 0.24375917767988253, "step": 167, "batch_size": 64, "mean": 57.01264953613281, "std": 53.657108306884766, "min": -75.17105102539062, "p10": 2.2651878356933612, "median": 49.18985939025879, "p90": 128.977816772461, "max": 191.87313842773438, "pos_frac": 0.90625, "sample": [77.868408203125, 47.34251022338867, 81.23670959472656, 100.34059143066406, 66.6156005859375, 84.69740295410156, 26.90509033203125, 39.385650634765625, 71.56993103027344, 44.7476806640625, 191.87313842773438, 108.91423797607422, 43.1529541015625, -3.2896881103515625, 16.771575927734375, 33.042503356933594, 53.63629913330078, -17.63196563720703, 67.08238220214844, 96.64956665039062, 23.1068058013916, 23.548583984375, 34.039772033691406, 25.7928466796875, 61.800270080566406, -75.17105102539062, 155.33058166503906, 111.2259750366211, 1.5350341796875, 113.98846435546875, -6.971193313598633, 33.550758361816406, 68.4381103515625, 18.741958618164062, 163.7830810546875, 86.98695373535156, 153.527099609375, 58.18768310546875, 99.92720031738281, -46.73225402832031, 99.02597045898438, 135.40182495117188, 159.49929809570312, 85.7992935180664, 164.2608642578125, 6.18927001953125, 90.05177307128906, 32.536346435546875, 107.90591430664062, 96.06611633300781, 57.03065490722656, 23.36065673828125, 23.39915657043457, 24.473407745361328, 11.757781982421875, 79.98320770263672, 51.037208557128906, 36.400634765625, 10.207504272460938, -32.446678161621094, 32.471126556396484, 6.874475479125977, 8.007659912109375, 3.9688796997070312], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000167.npy"}
{"epoch": 0.24522760646108663, "step": 168, "batch_size": 64, "mean": 44.928321838378906, "std": 48.53337097167969, "min": -39.77587890625, "p10": -6.308740234375, "median": 39.109153747558594, "p90": 100.52998657226563, "max": 202.63900756835938, "pos_frac": 0.8125, "sample": [21.394378662109375, 41.564308166503906, 21.16364288330078, 18.035926818847656, 4.850421905517578, 28.42349624633789, 78.62140655517578, -2.5844993591308594, 25.074893951416016, -5.638786315917969, -38.86225891113281, 89.14974975585938, 188.70242309570312, 54.17644500732422, -11.395843505859375, 97.09037780761719, 103.52892303466797, 97.38111877441406, 94.36770629882812, 39.308380126953125, 110.44586181640625, 144.12033081054688, 101.38301086425781, 81.28588104248047, 2.5772781372070312, -18.619400024414062, -0.1124267578125, 79.00076293945312, 75.70309448242188, 51.595497131347656, 38.90992736816406, 23.575881958007812, -21.891178131103516, 42.159339904785156, 28.67963981628418, 76.0400390625, -6.595863342285156, -1.5813980102539062, -1.0307884216308594, 40.350982666015625, 21.085248947143555, 58.234352111816406, 21.805404663085938, 44.27653503417969, 47.12654113769531, 57.85863494873047, 13.896696090698242, -14.94189453125, 108.424560546875, 28.308700561523438, 98.53959655761719, 14.56867790222168, 28.80230140686035, 11.10296630859375, 70.8746566772461, 40.25757598876953, 27.37940788269043, -39.77587890625, 48.53426742553711, 4.236419677734375, 202.63900756835938, 18.698440551757812, 76.46574401855469, 96.66612243652344], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000168.npy"}
{"epoch": 0.24669603524229075, "step": 169, "batch_size": 64, "mean": 39.69366455078125, "std": 62.22714614868164, "min": -86.26084899902344, "p10": -19.663462829589843, "median": 26.339811325073242, "p90": 112.2622528076172, "max": 267.96728515625, "pos_frac": 0.78125, "sample": [30.2646484375, 50.52843475341797, -3.8235511779785156, 15.106910705566406, 78.21122741699219, 77.75670623779297, 16.913101196289062, 62.05113220214844, 45.76293182373047, 16.093982696533203, -10.69418716430664, 46.93182373046875, -86.26084899902344, -1.2676048278808594, 19.85861587524414, -7.777099609375, 13.390605926513672, -20.03076171875, 43.890525817871094, 70.10675048828125, 64.63523864746094, -19.873878479003906, 28.0361328125, -24.760494232177734, 114.59693908691406, 108.87472534179688, 35.44218063354492, 12.058494567871094, 23.064102172851562, 18.133926391601562, 105.26790618896484, 10.241874694824219, 6.057573318481445, 35.650917053222656, 50.60049057006836, 70.37226867675781, 113.71405029296875, 9.586299896240234, 58.8335075378418, 105.6693115234375, 14.116283416748047, 243.7359161376953, -25.592552185058594, 5.9230194091796875, 33.44487762451172, 36.45299530029297, 25.218292236328125, -19.17249298095703, 186.54165649414062, 5.659034729003906, 267.96728515625, -4.4573211669921875, 67.69786071777344, 21.020973205566406, 129.73361206054688, 4.175750732421875, 46.10610580444336, -7.7402191162109375, 27.46133041381836, 8.546371459960938, -58.76933288574219, 58.75578689575195, 148.8689727783203, -58.51451873779297], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000169.npy"}
{"epoch": 0.24816446402349487, "step": 170, "batch_size": 64, "mean": 49.90868377685547, "std": 51.49773025512695, "min": -78.32504272460938, "p10": -2.822162628173828, "median": 42.785430908203125, "p90": 113.60707015991213, "max": 168.34823608398438, "pos_frac": 0.8125, "sample": [106.64714813232422, -47.662498474121094, 45.93766784667969, 51.88185119628906, 107.14358520507812, 39.86259841918945, 86.36212158203125, 4.003318786621094, 83.67774963378906, -3.5543670654296875, 60.2584228515625, -2.386432647705078, 50.13160705566406, -28.64081573486328, 6.719387054443359, -5.0662841796875, 61.87914276123047, 157.16986083984375, 30.632247924804688, 136.52032470703125, 3.6525707244873047, 5.731054306030273, 86.33865356445312, 35.42464828491211, 0.7300262451171875, 29.275413513183594, -78.32504272460938, -2.8046722412109375, 38.355865478515625, 109.93661499023438, 79.86711120605469, 84.51393127441406, 45.7082633972168, 59.83100891113281, 115.18012237548828, -2.8296585083007812, 155.6563720703125, 6.407688140869141, -1.1728534698486328, 168.34823608398438, 64.63658905029297, -0.064178466796875, 33.38840866088867, 94.29177856445312, 92.93204498291016, -6.8935394287109375, 30.117456436157227, 3.0536975860595703, 16.710784912109375, 33.04325485229492, 61.89618682861328, 55.21436309814453, 87.40646362304688, 35.08042526245117, 104.66898345947266, 37.15447998046875, 88.70838928222656, 122.41425323486328, -0.80853271484375, 104.53070831298828, 134.6294708251953, 20.292863845825195, 88.93545532226562, 11.47378921508789], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000170.npy"}
{"epoch": 0.24963289280469897, "step": 171, "batch_size": 64, "mean": 40.819374084472656, "std": 60.9570426940918, "min": -90.45155334472656, "p10": -23.57658233642578, "median": 38.572879791259766, "p90": 122.1280326843262, "max": 284.10577392578125, "pos_frac": 0.78125, "sample": [-24.472091674804688, 124.56172180175781, 38.957557678222656, 31.428054809570312, -27.17499542236328, -4.822906494140625, 10.0814208984375, 41.389678955078125, -30.633346557617188, 74.41156005859375, 8.756595611572266, 86.7718505859375, 51.935882568359375, 50.98332214355469, 38.188201904296875, 6.779045104980469, 42.04597473144531, -20.938095092773438, 58.01289749145508, -6.466285705566406, 116.44942474365234, 46.7445182800293, 56.926483154296875, 40.75200653076172, 166.74331665039062, 41.022804260253906, -33.849273681640625, 284.10577392578125, -56.825279235839844, 37.03288269042969, 60.13069534301758, -90.45155334472656, 179.13421630859375, 57.1651611328125, 42.28446960449219, -21.487060546875, 142.88369750976562, 65.49221801757812, 18.08203887939453, 142.82949829101562, 37.521209716796875, 45.557395935058594, 94.78768920898438, 6.801565170288086, -53.16099548339844, 22.367401123046875, 56.70697784423828, 40.77397155761719, -18.274097442626953, 22.12237548828125, -2.2805328369140625, -7.8533782958984375, 92.83824920654297, 30.945226669311523, 44.29020690917969, 8.739950180053711, 19.29864501953125, 104.0089340209961, 2.3112640380859375, 3.812103271484375, 42.40266418457031, 9.737293243408203, 134.6638641357422, 29.360095977783203], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000171.npy"}
{"epoch": 0.2511013215859031, "step": 172, "batch_size": 64, "mean": 42.23426055908203, "std": 55.837120056152344, "min": -77.13348388671875, "p10": -12.104402160644531, "median": 43.937110900878906, "p90": 99.42894897460938, "max": 243.7176971435547, "pos_frac": 0.71875, "sample": [-4.797019958496094, 73.78729248046875, -77.13348388671875, 52.65576934814453, 11.20758056640625, -9.995841979980469, 89.58523559570312, 53.429656982421875, 35.94823455810547, -25.915468215942383, -52.68116760253906, 23.464271545410156, 7.611198425292969, -5.1829986572265625, 88.46597290039062, 54.436561584472656, 20.991382598876953, 24.971847534179688, -10.739013671875, 62.377838134765625, -7.92279052734375, 67.3470458984375, 63.26909637451172, 94.96359252929688, 7.1839752197265625, 41.991416931152344, 100.13925170898438, 97.77157592773438, -11.983673095703125, 152.65252685546875, -12.156143188476562, 59.37891387939453, 83.19866180419922, 42.48175048828125, 67.78189086914062, 171.19381713867188, 243.7176971435547, 37.28947448730469, 65.24800872802734, 26.862689971923828, -14.350542068481445, -11.493354797363281, 45.39247131347656, 49.09974670410156, 91.14148712158203, -4.935493469238281, 14.012243270874023, 26.68574333190918, -5.953369140625, 106.2804183959961, 90.38008117675781, 0.2530517578125, -5.2886962890625, 76.97686767578125, 58.108497619628906, -55.23004150390625, 58.651309967041016, 58.2315673828125, 135.06210327148438, 47.82865905761719, -6.0773468017578125, -39.67744445800781, 110.05897521972656, 74.93927764892578], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000172.npy"}
{"epoch": 0.2525697503671072, "step": 173, "batch_size": 64, "mean": 42.6837158203125, "std": 47.6840705871582, "min": -67.74227905273438, "p10": -6.962204933166501, "median": 33.02223014831543, "p90": 102.36215362548828, "max": 186.5557098388672, "pos_frac": 0.859375, "sample": [21.39147186279297, 38.930213928222656, 35.351593017578125, 10.351654052734375, -8.009788513183594, 51.40568542480469, 6.048709869384766, 102.38606262207031, 75.72361755371094, 5.348562240600586, 25.407363891601562, -13.211814880371094, 12.239212036132812, 13.318229675292969, 44.652137756347656, 132.71009826660156, 42.499244689941406, 33.43832778930664, 40.01532745361328, 113.54010009765625, 59.36151123046875, 25.600143432617188, 55.422210693359375, -11.075393676757812, 26.50445556640625, 25.98321533203125, 186.5557098388672, 18.441543579101562, 96.62361907958984, 102.30636596679688, 34.777137756347656, 15.917974472045898, -23.485397338867188, 89.82808685302734, 75.76625061035156, 13.136184692382812, 77.99796295166016, 49.0068359375, -4.517843246459961, 144.93670654296875, 51.48967361450195, 80.80950927734375, -0.9049949645996094, 20.35277557373047, 15.718971252441406, 12.985553741455078, 147.2191619873047, 71.29161071777344, -23.869918823242188, 13.019378662109375, 45.64832305908203, 32.60613250732422, 58.054378509521484, -26.7552490234375, -67.74227905273438, 162.06219482421875, 26.468584060668945, 1.5358753204345703, 53.58494567871094, 24.32088851928711, 96.98764038085938, 32.2921142578125, 43.22469711303711, 18.73455810546875], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000173.npy"}
{"epoch": 0.2540381791483113, "step": 174, "batch_size": 64, "mean": 45.755714416503906, "std": 57.358524322509766, "min": -61.41322326660156, "p10": -16.228048706054683, "median": 33.45191764831543, "p90": 131.93708038330084, "max": 222.31781005859375, "pos_frac": 0.796875, "sample": [9.482284545898438, 14.571548461914062, -22.88890838623047, -20.280731201171875, 0.6101837158203125, 14.180923461914062, 2.7365379333496094, 21.921342849731445, 64.78317260742188, 98.25325012207031, 116.74298095703125, -5.131004333496094, 43.61492919921875, 9.683656692504883, 46.73820495605469, 2.7459564208984375, 73.24684143066406, 77.0946044921875, 82.47196960449219, 157.05398559570312, 34.53240966796875, 32.37142562866211, 29.554344177246094, -3.6477108001708984, 138.64773559570312, -61.41322326660156, 70.87914276123047, 147.37005615234375, 3.815460205078125, 62.95740509033203, 17.998708724975586, -10.071487426757812, 31.180618286132812, -29.285144805908203, 61.73455810546875, -18.10271453857422, 58.070037841796875, 57.63018035888672, -40.921478271484375, 138.44883728027344, 152.04815673828125, 222.31781005859375, 11.092002868652344, 49.770477294921875, -8.424276351928711, -48.263885498046875, -3.9781951904296875, 59.16954803466797, 54.09132385253906, 68.3254165649414, 65.85430908203125, 31.693519592285156, 26.91761016845703, 84.9601821899414, -11.853828430175781, 110.20164489746094, 98.89085388183594, 69.33943176269531, 24.552650451660156, 12.232454299926758, 197.71633911132812, 63.280052185058594, 63.95438766479492, 25.096826553344727], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000174.npy"}
{"epoch": 0.2555066079295154, "step": 175, "batch_size": 64, "mean": 64.21873474121094, "std": 58.933040618896484, "min": -33.38127899169922, "p10": 2.1954935073852546, "median": 51.558372497558594, "p90": 143.6851913452149, "max": 251.3679656982422, "pos_frac": 0.90625, "sample": [6.690916061401367, 22.754051208496094, 39.172149658203125, -19.048995971679688, 66.29432678222656, 54.87384033203125, 117.17984008789062, 47.171607971191406, 19.820755004882812, 102.38072204589844, 177.69114685058594, -28.44516372680664, 2.7138671875, 182.56539916992188, 38.68826675415039, 9.305679321289062, 126.29398345947266, 114.32521057128906, 93.00020599365234, 108.69379425048828, 103.4327163696289, -11.076957702636719, 131.7142791748047, 251.3679656982422, 12.737232208251953, 62.61792755126953, 186.3287353515625, 37.66142272949219, 83.78590393066406, 49.967559814453125, -2.8519515991210938, 28.549774169921875, 156.1099853515625, 20.290218353271484, 24.435646057128906, 27.297697067260742, 13.74947738647461, 148.81558227539062, 9.044990539550781, 39.056427001953125, 86.33248901367188, 76.4627685546875, 84.60176849365234, 112.44036102294922, 1.9733333587646484, 114.55277252197266, 112.97225952148438, 119.32359313964844, -8.563098907470703, 83.35131072998047, 109.95928955078125, 53.31170654296875, 53.14918518066406, 70.60013580322266, 25.274375915527344, 26.10611343383789, 69.36044311523438, 10.677923202514648, 38.04363250732422, 162.78067016601562, -33.38127899169922, 31.160720825195312, 29.129959106445312, 25.222259521484375], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000175.npy"}
{"epoch": 0.25697503671071953, "step": 176, "batch_size": 64, "mean": 52.805381774902344, "std": 59.41462326049805, "min": -33.411529541015625, "p10": -6.858613014221191, "median": 40.79395294189453, "p90": 128.66326293945312, "max": 256.308837890625, "pos_frac": 0.828125, "sample": [28.27825927734375, 15.05322265625, 45.18186950683594, 93.09980773925781, 50.62736129760742, 19.221134185791016, 12.083908081054688, 158.60940551757812, 12.093347549438477, 6.64373779296875, 0.5743484497070312, 211.67257690429688, -8.20024299621582, -7.0128021240234375, 128.7760467529297, 48.348968505859375, 1.0087890625, 162.8608856201172, 83.73680114746094, 6.1436767578125, 3.189422607421875, 59.094261169433594, 70.22976684570312, -25.541545867919922, 128.4001007080078, -6.498838424682617, 16.138507843017578, 99.644287109375, 89.83709716796875, 6.618476867675781, 60.167144775390625, -15.389068603515625, 61.24855041503906, -33.411529541015625, 22.38915252685547, 80.3124008178711, -8.514507293701172, 28.481971740722656, 73.15483856201172, 73.05305480957031, 86.9728012084961, -0.0390167236328125, -0.28864288330078125, -9.118770599365234, 90.69461059570312, 61.46514892578125, 91.02581024169922, 43.941368103027344, 91.94577026367188, 181.573486328125, 37.64653778076172, -0.45728302001953125, 24.856414794921875, 10.375421524047852, 81.86982727050781, 56.10408020019531, 2.6537628173828125, 26.231250762939453, 86.93111419677734, 5.561687469482422, 122.70455932617188, 256.308837890625, 29.308090209960938, 149.87269592285156], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000176.npy"}
{"epoch": 0.25844346549192365, "step": 177, "batch_size": 64, "mean": 54.26726150512695, "std": 62.28555679321289, "min": -37.063201904296875, "p10": -10.962886047363277, "median": 34.34335708618164, "p90": 131.62494049072268, "max": 279.2540588378906, "pos_frac": 0.78125, "sample": [76.4349365234375, 32.35546875, 110.87873840332031, 45.931541442871094, 47.330474853515625, 81.05876159667969, -4.8050079345703125, 44.18739318847656, 110.52680206298828, 90.25137329101562, 128.71205139160156, 13.695762634277344, 58.257568359375, 132.87332153320312, 69.13131713867188, 50.303653717041016, 14.325080871582031, 189.03042602539062, 10.860466003417969, 119.5269775390625, 15.766908645629883, 104.74928283691406, -32.46971130371094, 100.89749145507812, -1.1532268524169922, 25.014760971069336, 161.29641723632812, -14.416061401367188, -12.592803955078125, 27.11713409423828, 28.58270835876465, 162.9542694091797, 88.41148376464844, -0.4985198974609375, 35.593170166015625, 25.870925903320312, 11.103614807128906, 165.37118530273438, -1.6732921600341797, -16.35049057006836, -37.063201904296875, 70.56080627441406, 1.1663265228271484, 32.81104278564453, -28.154739379882812, 89.20858764648438, 103.97196960449219, 1.9766311645507812, 0.8809585571289062, -2.3173675537109375, 120.29714965820312, 33.093544006347656, 66.87034606933594, -17.692115783691406, 90.67486572265625, 27.511272430419922, 27.9540958404541, -5.123220443725586, 99.27236938476562, 13.181381225585938, 279.2540588378906, 158.48497009277344, 59.00251007080078, -7.1597442626953125], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000177.npy"}
{"epoch": 0.2599118942731278, "step": 178, "batch_size": 64, "mean": 67.36396789550781, "std": 74.49321746826172, "min": -86.79591369628906, "p10": -10.377513122558591, "median": 52.2801628112793, "p90": 148.88585052490237, "max": 316.6278991699219, "pos_frac": 0.828125, "sample": [40.40787124633789, -7.47479248046875, 102.4080581665039, 4.14581298828125, 92.03064727783203, 49.68231964111328, 84.1567611694336, 22.75194549560547, 13.37689208984375, 175.3687286376953, 14.148727416992188, 83.3561782836914, 130.07763671875, 59.00758361816406, 33.960693359375, 88.5880126953125, 222.82882690429688, 48.14402770996094, 316.6278991699219, -11.621536254882812, 123.05279541015625, 38.48523712158203, 14.900115966796875, 54.87800598144531, 22.8512020111084, 120.90499877929688, 152.12838745117188, -21.972068786621094, 40.0673828125, 7.301383972167969, -20.04681396484375, 64.4296875, 134.01681518554688, 21.450180053710938, 66.3093032836914, 131.0613250732422, 45.771026611328125, -22.836349487304688, 45.71131896972656, 124.97135162353516, 19.289535522460938, 72.62994384765625, -86.79591369628906, 248.05892944335938, -4.054449081420898, 97.26622009277344, 69.06460571289062, 89.95833587646484, -18.479703903198242, 239.16390991210938, 26.757944107055664, 81.74852752685547, 127.66598510742188, 5.703290939331055, 25.63250732421875, 78.95979309082031, 64.6635971069336, -3.2996826171875, 47.54277038574219, 141.31993103027344, 205.3536376953125, -40.83625793457031, 119.87464141845703, -1.3020477294921875], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000178.npy"}
{"epoch": 0.26138032305433184, "step": 179, "batch_size": 64, "mean": 55.60845184326172, "std": 71.67807006835938, "min": -120.42732238769531, "p10": -22.819378662109372, "median": 53.963905334472656, "p90": 155.29380493164066, "max": 229.20376586914062, "pos_frac": 0.71875, "sample": [63.81755065917969, 103.46914672851562, 105.21011352539062, 188.43450927734375, 39.82213592529297, -36.62181091308594, 136.57325744628906, 40.15970993041992, -2.72149658203125, -0.6612892150878906, 82.59097290039062, 36.016456604003906, 18.909610748291016, 54.38416290283203, -13.157844543457031, 89.87625885009766, -14.103897094726562, 169.3343505859375, 27.49881362915039, -9.508697509765625, 53.65980529785156, -31.387340545654297, 165.03970336914062, 17.205644607543945, 136.35337829589844, 91.68852996826172, 91.21551513671875, 171.54481506347656, 119.43452453613281, 47.58179473876953, 75.40861511230469, 229.20376586914062, 176.5951385498047, 59.773895263671875, 61.88099670410156, 65.96358489990234, -16.544883728027344, 10.116931915283203, 0.25977325439453125, 54.26800537109375, 60.75428009033203, -19.15393829345703, 50.20527648925781, -18.172012329101562, 109.46649169921875, 137.42071533203125, -49.33502960205078, 83.07054138183594, 79.39460754394531, -15.400535583496094, -9.415458679199219, 126.06315612792969, 10.908035278320312, 125.8006591796875, 158.634033203125, 147.49993896484375, -83.00944519042969, 135.3355712890625, -25.325950622558594, -24.390281677246094, -120.42732238769531, 37.08489990234375, 21.47484016418457, -18.126365661621094], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000179.npy"}
{"epoch": 0.26284875183553597, "step": 180, "batch_size": 64, "mean": 61.51070785522461, "std": 62.781436920166016, "min": -58.93031311035156, "p10": -7.440760993957519, "median": 54.26250076293945, "p90": 139.50648345947266, "max": 235.82176208496094, "pos_frac": 0.84375, "sample": [54.56812286376953, 101.5861587524414, 15.523681640625, -8.669620513916016, -48.530860900878906, 235.82176208496094, -1.5995445251464844, 24.628204345703125, 72.79278564453125, 56.941192626953125, 0.3529815673828125, 121.77458953857422, -58.93031311035156, 84.88999938964844, 80.94512939453125, 53.956878662109375, 77.18289947509766, 65.51214599609375, 73.05498504638672, 25.973892211914062, 107.89105987548828, -39.155517578125, 22.607959747314453, 157.01092529296875, 43.69678497314453, 160.3870849609375, 85.21964263916016, 21.98865509033203, 18.783662796020508, 28.384212493896484, 137.3618621826172, 6.24738883972168, 135.97113037109375, 84.73251342773438, 78.76136016845703, -2.414682388305664, 76.27242279052734, 117.60196685791016, 8.905319213867188, 118.23336029052734, 44.2509765625, -7.480323791503906, 174.22198486328125, 26.48651885986328, -40.58342742919922, 39.37606430053711, 19.559066772460938, 124.90696716308594, 139.72862243652344, 27.671249389648438, 59.04182815551758, 138.9881591796875, -7.348447799682617, 191.2975616455078, 42.686737060546875, 1.3928070068359375, 97.8931655883789, 51.50627136230469, 23.402793884277344, -24.318580627441406, 170.68902587890625, 118.27543640136719, 17.122657775878906, 111.65597534179688], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000180.npy"}
{"epoch": 0.2643171806167401, "step": 181, "batch_size": 64, "mean": 40.64165496826172, "std": 56.92431640625, "min": -77.44400787353516, "p10": -19.524598693847647, "median": 32.70277976989746, "p90": 119.26678543090821, "max": 210.42250061035156, "pos_frac": 0.84375, "sample": [47.357627868652344, -10.812271118164062, 14.609621047973633, 49.42708206176758, 67.6302719116211, -5.533763885498047, 7.80706787109375, 34.37514114379883, 40.011878967285156, 57.68565368652344, 27.13199234008789, 26.308116912841797, 3.607542037963867, 7.561332702636719, -61.56071472167969, 119.9328842163086, 46.77054977416992, 2.8757858276367188, 52.69709014892578, 3.5778579711914062, 31.379459381103516, -32.23303985595703, -3.30523681640625, 125.03578186035156, -23.258453369140625, 48.445831298828125, 52.2252197265625, 162.20677185058594, 59.054969787597656, 26.75197410583496, 79.8665771484375, 42.77069091796875, -57.22883605957031, 32.845375061035156, 23.30108642578125, 29.808277130126953, 7.985376358032227, 71.68727111816406, 115.94728088378906, -44.64549255371094, 16.189929962158203, -77.44400787353516, 149.73861694335938, 4.677761077880859, 140.35569763183594, 94.75565338134766, -56.10816955566406, 49.86334991455078, 4.743095397949219, 67.59471130371094, 117.71255493164062, 210.42250061035156, 70.03915405273438, 32.560184478759766, 77.70816040039062, 1.5773429870605469, 3.0709991455078125, 6.4908294677734375, 188.5610809326172, 45.69622802734375, 24.686416625976562, 69.17384338378906, 50.95842742919922, 27.939857482910156], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000181.npy"}
{"epoch": 0.2657856093979442, "step": 182, "batch_size": 64, "mean": 65.56771850585938, "std": 69.84908294677734, "min": -115.74585723876953, "p10": -15.046351623535152, "median": 51.68027114868164, "p90": 161.44234924316407, "max": 213.1708221435547, "pos_frac": 0.859375, "sample": [163.75164794921875, 50.4608154296875, 178.01220703125, 87.31381225585938, 56.607383728027344, 100.74749755859375, 130.35098266601562, 151.47238159179688, 213.1708221435547, 57.01321792602539, 48.52992248535156, 35.844600677490234, 20.370386123657227, -5.393959045410156, 133.09027099609375, -115.74585723876953, 156.05398559570312, 28.661666870117188, 4.700035095214844, 26.380958557128906, 197.56398010253906, 0.7065067291259766, 168.65573120117188, 50.46400451660156, 71.19806671142578, -23.06685447692871, 151.04132080078125, 36.0223503112793, 17.595714569091797, 49.992523193359375, 101.300537109375, 139.6793670654297, -10.141181945800781, -51.58452606201172, 82.39794158935547, 131.55426025390625, 47.31635284423828, 48.254425048828125, 107.54837036132812, 127.92880249023438, -17.14856719970703, 41.03771209716797, 32.04802703857422, 94.69227600097656, 74.19682312011719, 90.87183380126953, 80.7616195678711, 99.62015533447266, -44.00920104980469, 2.083942413330078, 188.53561401367188, 88.18785858154297, 45.66577911376953, 2.0178585052490234, -17.445091247558594, 6.4239654541015625, -69.89774322509766, 20.926307678222656, 147.6689453125, 21.599334716796875, 180.80645751953125, 4.125728607177734, 52.89653778076172, 104.84730529785156], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000182.npy"}
{"epoch": 0.26725403817914833, "step": 183, "batch_size": 64, "mean": 42.76314926147461, "std": 55.15424346923828, "min": -50.95018768310547, "p10": -27.001188468933098, "median": 32.639774322509766, "p90": 113.58900222778323, "max": 181.14208984375, "pos_frac": 0.8125, "sample": [50.93220520019531, 50.75720977783203, 9.74778938293457, 32.89024353027344, 0.5203742980957031, 116.87478637695312, 84.54139709472656, 92.77770233154297, 49.457122802734375, 32.389305114746094, -2.621368408203125, 78.6607666015625, 27.580951690673828, 13.841264724731445, 6.4987945556640625, 46.7191047668457, 172.63755798339844, 166.337158203125, 98.70439147949219, 24.008962631225586, 36.57398223876953, 57.57943344116211, 153.94618225097656, 78.2686767578125, 18.05408477783203, 3.4196643829345703, 56.60719680786133, 25.550334930419922, 14.520729064941406, -43.10113525390625, 90.02862548828125, 50.852149963378906, -20.49662971496582, 5.440250396728516, 78.2234878540039, 3.2049407958984375, 39.18148422241211, 95.86327362060547, 95.72969818115234, 158.74826049804688, -34.41658020019531, -4.459556579589844, 181.14208984375, -29.788856506347656, 23.22051239013672, -15.140998840332031, 68.58777618408203, -38.367706298828125, -44.372650146484375, -42.032508850097656, 116.41919708251953, 55.508758544921875, 106.98521423339844, 90.8809585571289, 26.551029205322266, 31.984519958496094, 12.176017761230469, 5.272918701171875, 50.721885681152344, 53.068660736083984, -50.95018768310547, -4.590297698974609, 12.25830078125, 14.732528686523438], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000183.npy"}
{"epoch": 0.2687224669603524, "step": 184, "batch_size": 64, "mean": 39.89654541015625, "std": 54.458248138427734, "min": -46.53705596923828, "p10": -15.427247619628897, "median": 27.910093307495117, "p90": 113.28846130371096, "max": 196.00588989257812, "pos_frac": 0.78125, "sample": [11.773956298828125, -34.17156219482422, 37.017539978027344, 54.291404724121094, 105.68849182128906, 67.34600830078125, -0.0647735595703125, 7.447975158691406, 73.19415283203125, -5.299934387207031, 196.00588989257812, 160.25469970703125, 29.045852661132812, 101.29879760742188, 5.438758850097656, 89.84635162353516, 29.854507446289062, 85.83880615234375, 65.42982482910156, 129.01930236816406, -2.609285354614258, 149.60733032226562, -24.092254638671875, 9.03448486328125, 13.692264556884766, 34.068084716796875, 27.67683982849121, 8.702354431152344, 72.01121520996094, 10.346426010131836, 39.824859619140625, 30.51727294921875, 94.04133605957031, 115.32270812988281, -39.78982925415039, 125.26683807373047, 8.771392822265625, -19.76752471923828, -2.6948814392089844, 27.83603286743164, 64.3067855834961, -3.725788116455078, 3.5952510833740234, 6.945932388305664, 27.984153747558594, 108.54188537597656, 31.52366828918457, 33.94668197631836, -3.0723190307617188, -34.935760498046875, -28.510822296142578, 38.85237121582031, 8.015911102294922, 19.237205505371094, -46.53705596923828, 16.603042602539062, 8.716995239257812, 103.04292297363281, 16.379558563232422, 30.62934112548828, 12.476646423339844, -2.5318870544433594, 61.09904098510742, 193.77362060546875], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000184.npy"}
{"epoch": 0.2701908957415565, "step": 185, "batch_size": 64, "mean": 33.747230529785156, "std": 57.302425384521484, "min": -89.48863983154297, "p10": -19.41500434875488, "median": 23.734447479248047, "p90": 118.08504562377931, "max": 226.06118774414062, "pos_frac": 0.71875, "sample": [-89.48863983154297, -36.94259262084961, 4.827238082885742, 65.10794067382812, 52.44514846801758, 18.51841926574707, -3.9360218048095703, 32.575836181640625, 153.70831298828125, 26.843856811523438, 5.865978240966797, 65.39380645751953, 64.99818420410156, 6.877708435058594, 120.35684204101562, 43.285369873046875, -7.441211700439453, 23.732337951660156, 51.78843688964844, -33.92254638671875, -3.7497730255126953, 36.458824157714844, 9.26095199584961, 40.2618408203125, 42.2172737121582, 23.736557006835938, -31.061431884765625, 27.877288818359375, 33.19622039794922, -37.229156494140625, 72.11920166015625, -0.5703315734863281, 80.3657455444336, 65.92141723632812, 0.2520904541015625, 185.07191467285156, -20.48804473876953, 56.18994903564453, 123.26507568359375, -1.5582923889160156, 9.849903106689453, 15.155210494995117, -6.324302673339844, -7.7814483642578125, -6.221963882446289, 12.200729370117188, 119.6933364868164, 104.87236785888672, 94.74057006835938, 10.320352554321289, 226.06118774414062, 7.498146057128906, 28.204986572265625, 33.9039192199707, -6.317176818847656, 130.9806671142578, 18.475967407226562, 70.86137390136719, -86.69723510742188, 114.33236694335938, -14.050468444824219, 31.432273864746094, 9.41131591796875, -16.911243438720703], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000185.npy"}
{"epoch": 0.27165932452276065, "step": 186, "batch_size": 64, "mean": 60.132774353027344, "std": 59.561275482177734, "min": -65.62652587890625, "p10": 0.5038398742675814, "median": 47.87161827087402, "p90": 136.25339202880863, "max": 229.8030548095703, "pos_frac": 0.890625, "sample": [-65.62652587890625, 88.02510070800781, 162.52081298828125, 83.51927185058594, 87.78711700439453, 20.572296142578125, 56.341339111328125, 20.202686309814453, 146.60874938964844, 121.22479248046875, 31.26800537109375, 122.1759033203125, 8.619850158691406, 43.29417419433594, 31.98428726196289, 5.960636138916016, 48.39722442626953, -43.78465270996094, 29.11029815673828, 3.8409671783447266, 123.62710571289062, 24.789257049560547, 18.862709045410156, 96.77610778808594, 29.01263427734375, 47.368831634521484, 69.1965560913086, 104.76813507080078, 157.7255859375, 38.646018981933594, 120.22846221923828, -4.006052017211914, 98.58766174316406, 37.500118255615234, 27.663915634155273, 86.29704284667969, 214.11196899414062, 141.66465759277344, 54.4476203918457, -3.7649192810058594, 86.83882904052734, 17.40509033203125, 64.3746337890625, 31.232818603515625, 9.6654052734375, 229.8030548095703, -0.82269287109375, 64.19249725341797, -7.1741943359375, 3.5990829467773438, 220.35546875, 108.65814971923828, 41.336669921875, 96.46006774902344, 19.966339111328125, 94.6425552368164, 41.37736129760742, -19.507041931152344, 6.001701354980469, 64.19258117675781, 67.68589782714844, 57.50065612792969, 16.79247283935547, 48.37440490722656], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000186.npy"}
{"epoch": 0.27312775330396477, "step": 187, "batch_size": 64, "mean": 48.3011589050293, "std": 61.51124954223633, "min": -78.03987884521484, "p10": -12.377404022216794, "median": 41.81460762023926, "p90": 130.69376525878909, "max": 224.77499389648438, "pos_frac": 0.828125, "sample": [-78.03987884521484, 154.83673095703125, -4.682167053222656, 39.49336624145508, -53.72329330444336, 80.26461029052734, 64.05119323730469, 57.052459716796875, 31.9792537689209, 52.46037673950195, 50.8531379699707, 60.939048767089844, 171.89231872558594, 33.734703063964844, 45.40773010253906, 98.43214416503906, 107.52586364746094, 14.491233825683594, 36.89064025878906, -35.848880767822266, 44.82267761230469, 12.951126098632812, 224.77499389648438, -3.1078643798828125, 85.74893188476562, 44.75962829589844, 162.34585571289062, -23.37557601928711, 111.33467102050781, 16.216814041137695, -13.89984130859375, 77.4659652709961, 12.512901306152344, -38.035255432128906, -8.607345581054688, 133.56008911132812, 6.267158508300781, 12.568204879760742, -8.825050354003906, 113.64798736572266, 105.7449951171875, 217.7500457763672, 44.13584899902344, 12.569816589355469, 67.19400024414062, 19.952796936035156, 20.248218536376953, 22.2132568359375, -72.27326202392578, 35.67462921142578, 29.65685272216797, 6.334510803222656, 61.67912292480469, 57.39143371582031, 60.775672912597656, 67.15626525878906, 63.715919494628906, 11.194183349609375, 44.94099426269531, 153.1795196533203, 124.00567626953125, 36.588462829589844, 4.491243362426758, 5.817413330078125], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000187.npy"}
{"epoch": 0.2745961820851689, "step": 188, "batch_size": 64, "mean": 61.76872253417969, "std": 66.62774658203125, "min": -84.9134292602539, "p10": -5.214736938476559, "median": 40.59449768066406, "p90": 151.8629577636719, "max": 247.4945831298828, "pos_frac": 0.859375, "sample": [168.6159210205078, 57.53077697753906, 142.45372009277344, 22.407108306884766, 70.49604034423828, 12.628265380859375, -17.557083129882812, -33.558692932128906, 148.75245666503906, 74.49593353271484, 40.537635803222656, -8.580802917480469, 64.17534637451172, 56.396644592285156, 153.19602966308594, 126.27024841308594, 51.317054748535156, -1.40887451171875, 38.50525665283203, 3.0376415252685547, 16.39251708984375, 8.310710906982422, 56.815948486328125, 88.66460418701172, 20.692115783691406, 36.99309539794922, 30.323104858398438, 8.827205657958984, 190.07888793945312, 17.205642700195312, 228.0049591064453, -84.9134292602539, 154.3341064453125, 91.24436950683594, 116.97854614257812, 65.7158203125, 14.710708618164062, 144.09043884277344, -2.177030563354492, 130.65744018554688, 69.25044250488281, 37.3538818359375, -44.826446533203125, 18.3465518951416, 130.61273193359375, 31.106964111328125, 148.15643310546875, 28.187435150146484, 35.648193359375, 195.16696166992188, 29.249916076660156, 107.1033935546875, 40.65135955810547, -9.032381057739258, 108.93943786621094, 40.89857482910156, 75.12642669677734, 247.4945831298828, 67.91409301757812, 36.56391143798828, 37.92803955078125, -6.516611099243164, 13.125740051269531, 12.088245391845703], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000188.npy"}
{"epoch": 0.27606461086637296, "step": 189, "batch_size": 64, "mean": 52.98753356933594, "std": 70.69912719726562, "min": -111.4983901977539, "p10": -19.15800914764404, "median": 39.40630340576172, "p90": 136.57869873046877, "max": 307.3416748046875, "pos_frac": 0.8125, "sample": [21.206192016601562, 120.6822509765625, -51.14160919189453, 179.49142456054688, -10.667182922363281, -36.04991912841797, 31.82661247253418, -6.0348052978515625, -15.731416702270508, 57.107582092285156, 93.14836120605469, 98.8303451538086, 81.00212097167969, 36.07060241699219, 105.20543670654297, 42.50794219970703, 71.91199493408203, 19.890125274658203, -41.903564453125, 10.115240097045898, -21.3687686920166, 45.86982727050781, 65.93809509277344, 138.36923217773438, 99.37808227539062, 32.735313415527344, -20.626548767089844, 39.4741096496582, 21.10614013671875, 132.40078735351562, 19.482547760009766, 28.615201950073242, 79.9012451171875, 224.46295166015625, 11.304443359375, 12.599746704101562, 6.94420051574707, 162.6035614013672, 61.5130615234375, 26.589149475097656, -50.91636657714844, 76.44490814208984, 1.15252685546875, 49.361412048339844, 21.48953628540039, 217.68565368652344, 128.90098571777344, 307.3416748046875, -1.4146499633789062, 95.64132690429688, 145.2102813720703, 23.744977951049805, 39.338497161865234, 86.0048828125, 43.90332794189453, 74.68029022216797, 105.75482177734375, 56.423091888427734, 15.364921569824219, -111.4983901977539, 8.184494018554688, -3.1762619018554688, 4.41015625, 82.40986633300781], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000189.npy"}
{"epoch": 0.2775330396475771, "step": 190, "batch_size": 64, "mean": 55.913509368896484, "std": 71.45211791992188, "min": -88.05149841308594, "p10": -20.428277587890623, "median": 46.265907287597656, "p90": 158.8952301025391, "max": 241.6675262451172, "pos_frac": 0.8125, "sample": [-0.6764678955078125, 77.61515045166016, 70.73332214355469, 38.32389831542969, 0.8872718811035156, -32.80531311035156, 121.5003662109375, 27.215538024902344, 7.624439239501953, 172.46612548828125, 48.340728759765625, 24.056793212890625, 54.114280700683594, 28.60491943359375, 221.00209045410156, 71.37713623046875, 50.015953063964844, 46.2950439453125, 119.4747085571289, 38.24908447265625, 3.2417945861816406, 163.5386199951172, -88.05149841308594, 46.23677062988281, 139.00537109375, 8.855026245117188, 27.447967529296875, 168.0862274169922, 131.87962341308594, 63.067840576171875, 0.2701416015625, 148.06065368652344, 8.066230773925781, -3.405855178833008, 57.671592712402344, 73.42022705078125, -21.16162872314453, -2.5867767333984375, -81.99681854248047, 25.313491821289062, 100.81089782714844, -18.717124938964844, 72.69562530517578, 20.221603393554688, -14.158636093139648, -25.497222900390625, 96.09722900390625, 138.565185546875, -28.514556884765625, 48.867347717285156, -79.7021484375, 206.5347900390625, 49.55665588378906, 28.974031448364258, 122.33746337890625, 74.61417388916016, 43.99256134033203, 241.6675262451172, 1.602874755859375, 186.50439453125, 57.46626281738281, 142.04322814941406, 32.75723648071289, 28.371246337890625], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000190.npy"}
{"epoch": 0.2790014684287812, "step": 191, "batch_size": 64, "mean": 79.48197174072266, "std": 81.00853729248047, "min": -52.0858154296875, "p10": 2.4208404541015645, "median": 59.43080520629883, "p90": 194.37849121093757, "max": 342.5007629394531, "pos_frac": 0.9375, "sample": [210.75039672851562, 57.584381103515625, 310.0414123535156, 11.01312255859375, 145.46798706054688, 55.76484680175781, 103.86783599853516, 167.580810546875, 0.8660449981689453, 200.4095458984375, 68.87443542480469, 75.0317153930664, 28.072555541992188, 129.50479125976562, 142.92691040039062, 5.399360656738281, 21.731351852416992, 47.007293701171875, 36.91368103027344, 59.440879821777344, 124.13475036621094, 41.429046630859375, 27.91455078125, 164.213134765625, 82.0224609375, 25.268447875976562, 1.6390190124511719, 293.9932556152344, 48.54313659667969, 19.00951385498047, 106.74929809570312, -51.49454879760742, 13.640525817871094, -6.361598968505859, 60.58918762207031, 84.4716796875, 77.16238403320312, 90.87252807617188, 223.42201232910156, 17.35465431213379, -1.492319107055664, 112.57730102539062, 137.35757446289062, -52.0858154296875, 16.241775512695312, 342.5007629394531, 44.05352020263672, 1.1687545776367188, 64.10687255859375, 115.71990966796875, 82.91862487792969, 180.3060302734375, 35.94947052001953, 4.245090484619141, 81.96144104003906, 59.42073059082031, 11.220283508300781, 87.61653137207031, 4.98101806640625, 50.72863006591797, 215.6515350341797, 47.906028747558594, 33.851219177246094, 87.11846160888672], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000191.npy"}
{"epoch": 0.28046989720998533, "step": 192, "batch_size": 64, "mean": 57.93366241455078, "std": 71.90042877197266, "min": -78.69625854492188, "p10": -16.529489135742185, "median": 48.18352127075195, "p90": 171.65653228759766, "max": 257.2716064453125, "pos_frac": 0.8125, "sample": [10.545097351074219, -40.11766052246094, 49.07550811767578, 106.44509887695312, 111.0425796508789, 18.314117431640625, -12.731773376464844, -12.506643295288086, 17.915573120117188, -78.69625854492188, 17.53076171875, 47.291534423828125, -15.22406005859375, 121.11839294433594, 40.71562194824219, 12.230758666992188, -13.659820556640625, 49.216773986816406, -17.088958740234375, 84.68122863769531, -35.63948059082031, 209.17361450195312, -27.908401489257812, 52.102630615234375, 3.9551544189453125, 189.8884735107422, 182.50926208496094, 248.30728149414062, 26.280162811279297, 170.947998046875, 50.482818603515625, 62.4670524597168, 257.2716064453125, 57.492305755615234, 92.47779846191406, 9.990461349487305, 89.51844787597656, 126.79676818847656, 121.95126342773438, 16.53182601928711, 58.81730651855469, 171.96018981933594, 15.645669937133789, 18.704689025878906, 86.54998779296875, 50.81201934814453, 46.58789825439453, -53.11241149902344, 94.5079345703125, 40.228416442871094, 59.613121032714844, 0.28948974609375, 46.17237091064453, -20.39842987060547, 10.781671524047852, 134.19540405273438, 56.040069580078125, 11.375419616699219, -2.0129547119140625, 114.90023803710938, 93.275634765625, 75.11309051513672, 13.69390869140625, 183.318603515625], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000192.npy"}
{"epoch": 0.28193832599118945, "step": 193, "batch_size": 64, "mean": 33.98414611816406, "std": 64.69102478027344, "min": -190.0749053955078, "p10": -19.325426673889158, "median": 38.23432159423828, "p90": 98.52520141601568, "max": 212.26100158691406, "pos_frac": 0.71875, "sample": [62.27665710449219, -8.592483520507812, 25.444992065429688, 80.3775634765625, -190.0749053955078, 71.1351318359375, 71.8241958618164, 51.30311584472656, -151.79649353027344, 155.85028076171875, 61.96556854248047, -13.002622604370117, 81.35885620117188, -7.820842742919922, 29.288177490234375, 55.5688591003418, 41.943092346191406, 51.59159851074219, -1.3128814697265625, 63.40789794921875, 2.4385986328125, 4.198272705078125, 27.707233428955078, 50.19306182861328, 18.969961166381836, 68.73712158203125, -15.747516632080078, 34.525550842285156, 7.67474365234375, -17.08386993408203, -20.172414779663086, 124.91566467285156, -17.34912109375, -15.640602111816406, 103.3976821899414, -47.80992889404297, 75.19574737548828, -14.863761901855469, 6.062379837036133, 50.048614501953125, 6.9338531494140625, 65.0920639038086, 103.34688568115234, 60.785728454589844, 60.10063934326172, -75.75028991699219, 174.38966369628906, 76.43599700927734, 53.532958984375, 70.58763885498047, -37.860069274902344, 56.769493103027344, -11.215267181396484, -11.457962036132812, 21.122283935546875, 87.27460479736328, 212.26100158691406, 78.80795288085938, -41.11240768432617, 18.179420471191406, 142.52609252929688, 26.883453369140625, 25.532669067382812, 55.685752868652344], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000193.npy"}
{"epoch": 0.2834067547723935, "step": 194, "batch_size": 64, "mean": 54.738746643066406, "std": 53.14759826660156, "min": -46.20798873901367, "p10": -6.08463745117187, "median": 47.31213188171387, "p90": 136.92454833984377, "max": 167.9375, "pos_frac": 0.859375, "sample": [80.44158935546875, -0.960205078125, 25.58648109436035, 34.94212341308594, 46.882659912109375, 126.26275634765625, 26.1708984375, 109.2121810913086, 1.3839263916015625, 2.9080963134765625, 19.142614364624023, 151.59854125976562, -19.2625732421875, 19.884368896484375, 38.56316375732422, 17.809356689453125, 94.8219985961914, 33.39921569824219, 47.74160385131836, 86.64371490478516, 22.594898223876953, 47.934425354003906, 8.711727142333984, -0.5963783264160156, 75.6816635131836, -46.20798873901367, 1.8841190338134766, 158.78884887695312, 53.07391357421875, 78.16707611083984, 164.62510681152344, 91.13992309570312, 139.4611358642578, 94.69515228271484, 13.792724609375, 167.9375, 50.076576232910156, 14.188436508178711, 82.17079162597656, 33.84459686279297, 127.06846618652344, -8.925949096679688, 10.511880874633789, 5.026214599609375, 71.40031433105469, 106.38883209228516, 6.257038116455078, -9.584320068359375, 29.330278396606445, -8.28082275390625, 54.46942901611328, -17.54407501220703, 30.49591064453125, 146.2860107421875, 30.683326721191406, 166.49502563476562, 75.53816223144531, 75.65118408203125, 131.00584411621094, 66.79520416259766, 105.80371856689453, 65.83133697509766, 56.07623291015625, -8.636505126953125], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000194.npy"}
{"epoch": 0.28487518355359764, "step": 195, "batch_size": 64, "mean": 56.52834701538086, "std": 57.938880920410156, "min": -46.829612731933594, "p10": -19.1647979736328, "median": 54.59556579589844, "p90": 118.30521545410159, "max": 243.23959350585938, "pos_frac": 0.859375, "sample": [59.502784729003906, 28.580413818359375, 138.80638122558594, 39.36268997192383, 53.14501190185547, 184.39230346679688, 26.60425567626953, 160.4792938232422, 90.24325561523438, -5.474946975708008, -7.5093536376953125, 36.856056213378906, 14.760627746582031, -46.829612731933594, 76.71524047851562, 65.08053588867188, 64.21305084228516, 120.04136657714844, -24.933860778808594, 225.7500762939453, 2.184680938720703, 73.09941864013672, -30.747928619384766, 13.520721435546875, 99.33160400390625, 69.32935333251953, 101.915283203125, 125.63578796386719, 79.56092834472656, 111.91694641113281, 78.6479263305664, -31.78174591064453, 62.928688049316406, 101.43869018554688, 19.226364135742188, 85.61965942382812, 3.147523880004883, 16.591819763183594, 77.53347778320312, 114.25419616699219, -25.72154998779297, 107.11788940429688, 29.88052749633789, 25.851112365722656, 243.23959350585938, 92.4928970336914, 59.58953094482422, 45.93995666503906, -26.71233367919922, 29.817346572875977, 66.52142333984375, 9.204666137695312, 100.80084228515625, 62.13899230957031, 33.13014221191406, 17.41473388671875, 27.630992889404297, 45.085662841796875, 21.310537338256836, 56.046119689941406, -24.159988403320312, 29.65850067138672, 75.04531860351562, 43.352439880371094], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000195.npy"}
{"epoch": 0.28634361233480177, "step": 196, "batch_size": 64, "mean": 66.21810913085938, "std": 71.35926055908203, "min": -62.05438232421875, "p10": -10.984938430786126, "median": 45.89542770385742, "p90": 166.44441528320317, "max": 236.7515411376953, "pos_frac": 0.859375, "sample": [135.39657592773438, 146.40603637695312, 135.92718505859375, 119.74932861328125, -4.068809509277344, 153.13841247558594, 147.86997985839844, 204.4984588623047, 139.29736328125, 44.15928649902344, 13.379337310791016, 44.614112854003906, 144.4836883544922, 151.33267211914062, 38.32159423828125, 23.37529945373535, -13.948993682861328, 38.35552978515625, 3.841827392578125, 64.97211456298828, 236.7515411376953, -3.4551773071289062, 37.43492126464844, 35.847049713134766, 58.74708557128906, 52.76604461669922, 77.98614501953125, 104.42288970947266, 28.07506561279297, 1.6838035583496094, 111.28578186035156, 47.57252502441406, 223.0101318359375, 215.35484313964844, -19.118549346923828, 79.03434753417969, 58.480228424072266, 44.21197509765625, 37.56123733520508, 40.75617980957031, 44.478126525878906, 57.184051513671875, 177.48497009277344, 0.016798019409179688, 87.12511444091797, 26.982952117919922, 29.996925354003906, -42.12647247314453, 33.04111099243164, -61.0177001953125, -45.184661865234375, 73.64757537841797, 51.365478515625, 17.920177459716797, 172.14698791503906, -26.888587951660156, -62.05438232421875, 47.17674255371094, 22.738990783691406, 90.41378021240234, 181.08920288085938, 148.25816345214844, 8.8551025390625, 5.799627304077148], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000196.npy"}
{"epoch": 0.2878120411160059, "step": 197, "batch_size": 64, "mean": 51.957035064697266, "std": 58.448184967041016, "min": -74.8345947265625, "p10": -0.9656972885131829, "median": 41.785316467285156, "p90": 121.57483062744143, "max": 234.0718994140625, "pos_frac": 0.875, "sample": [22.697885513305664, 199.46023559570312, 61.81568908691406, 9.755596160888672, 93.35935974121094, 1.1665878295898438, 25.291954040527344, -30.33083724975586, 41.70293426513672, 202.11074829101562, 234.0718994140625, 114.50579833984375, 58.18230438232422, 55.85308837890625, -32.8868408203125, 2.3355484008789062, 37.696693420410156, 162.49497985839844, 45.731689453125, 29.931217193603516, 77.28614044189453, 105.43264770507812, 49.96091842651367, 8.797378540039062, 41.867698669433594, 7.150999069213867, -24.062759399414062, 67.77922058105469, 98.8622055053711, 83.29246520996094, 168.57644653320312, 62.478790283203125, 32.95127868652344, 103.905517578125, 0.81939697265625, 90.27067565917969, 18.484466552734375, 2.2274932861328125, -1.232421875, -74.8345947265625, 16.78337287902832, 76.29695892333984, 0.3424224853515625, 9.788322448730469, 19.48476219177246, 31.976181030273438, 114.38993835449219, 83.2972640991211, 38.74775695800781, 124.60441589355469, -0.3433399200439453, 50.38106918334961, 57.60498809814453, 23.90911102294922, 6.790763854980469, 67.11515045166016, 75.68022155761719, 57.32795715332031, 129.55123901367188, -14.721549987792969, -1.641143798828125, 25.25439453125, 15.759765625, 61.909698486328125], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000197.npy"}
{"epoch": 0.28928046989721, "step": 198, "batch_size": 64, "mean": 61.515472412109375, "std": 71.88182067871094, "min": -58.09253692626953, "p10": -0.4364814758300776, "median": 49.73006820678711, "p90": 134.86078262329107, "max": 332.1396179199219, "pos_frac": 0.890625, "sample": [-58.09253692626953, 18.222183227539062, 20.134714126586914, 244.69326782226562, 70.30137634277344, 114.49288177490234, 150.27499389648438, 27.398067474365234, 69.03059387207031, 52.431907653808594, 92.64720153808594, 70.20843505859375, -55.609588623046875, 37.610801696777344, 23.004913330078125, 84.69502258300781, 85.77857971191406, 66.66789245605469, 52.366485595703125, 55.61695861816406, 0.8768253326416016, 0.050872802734375, 27.52949333190918, 46.596038818359375, 10.53004264831543, 33.564842224121094, -12.828628540039062, 119.0667495727539, 59.54884338378906, 87.17816162109375, 61.78076934814453, 53.746551513671875, 46.4207763671875, 27.197265625, 69.67269897460938, 12.689571380615234, 38.971073150634766, 94.6777114868164, 62.446746826171875, 141.62965393066406, -34.411865234375, 302.083740234375, 47.093650817871094, 34.631858825683594, -0.6453475952148438, 12.5711669921875, 151.88204956054688, -5.668588638305664, 6.473749160766602, 33.705718994140625, -11.930545806884766, 41.03216552734375, 81.44871520996094, 86.30203247070312, 13.139142990112305, 100.51565551757812, 90.93577575683594, 111.9826431274414, 1.451202392578125, 234.3822021484375, 7.8084716796875, 332.1396179199219, 30.73369789123535, 66.11305236816406], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000198.npy"}
{"epoch": 0.2907488986784141, "step": 199, "batch_size": 64, "mean": 60.14692687988281, "std": 70.12164306640625, "min": -90.66707611083984, "p10": -13.843490600585934, "median": 45.4381217956543, "p90": 153.87288513183594, "max": 245.368896484375, "pos_frac": 0.828125, "sample": [84.09497833251953, 103.07798767089844, -40.34503173828125, 1.38104248046875, 160.63275146484375, 51.79894256591797, -25.521759033203125, 129.56317138671875, -90.66707611083984, 84.22737884521484, 24.677040100097656, -15.5831298828125, -36.18072509765625, 112.32319641113281, 137.0115966796875, 67.95722198486328, 9.221111297607422, 77.44194793701172, 207.4485626220703, 29.137720108032227, 31.975814819335938, 102.9493408203125, 190.91879272460938, 149.44918823242188, -53.64641189575195, 11.553298950195312, 5.933998107910156, 57.723182678222656, 127.12484741210938, 16.9940185546875, 19.196426391601562, 150.21392822265625, 227.38380432128906, 40.79417419433594, 29.061012268066406, 121.6375503540039, -9.784332275390625, 67.16886138916016, -1.32159423828125, 102.16926574707031, 17.2115478515625, 95.47146606445312, 86.80996704101562, 23.278118133544922, 155.44100952148438, 42.985008239746094, 40.25248336791992, 6.955854415893555, -5.853076934814453, -64.89342498779297, 110.99085235595703, 32.558258056640625, 36.98712158203125, 76.87478637695312, 158.34117126464844, 74.84281921386719, 5.577051162719727, 64.66439819335938, 19.233932495117188, 32.90215301513672, 47.8912353515625, 91.4614028930664, 245.368896484375, -5.1419677734375], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000199.npy"}
{"epoch": 0.2922173274596182, "step": 200, "batch_size": 64, "mean": 64.84251403808594, "std": 79.98242950439453, "min": -135.5108642578125, "p10": -11.885632705688476, "median": 36.90866661071777, "p90": 178.63456268310551, "max": 324.218505859375, "pos_frac": 0.8125, "sample": [163.15914916992188, 90.15914916992188, 49.20113754272461, 195.649169921875, 33.421268463134766, 89.36063385009766, 9.645004272460938, 7.450706481933594, 220.83697509765625, -21.60678482055664, 188.39328002929688, 137.213134765625, 69.31739044189453, -15.770427703857422, 37.65224838256836, 170.358154296875, 128.66986083984375, 201.1758575439453, 23.042739868164062, 132.73013305664062, 35.775421142578125, 17.469728469848633, 98.19712829589844, 127.31025695800781, -5.790035247802734, 51.54027557373047, 96.77908325195312, -23.888259887695312, 324.218505859375, 22.27684783935547, 69.03065490722656, -12.09981918334961, 5.916069030761719, 31.936798095703125, 34.860755920410156, 4.971384048461914, -135.5108642578125, 36.16508483886719, 87.99179077148438, 236.2874755859375, 15.174917221069336, 70.93669891357422, 33.53977966308594, 29.75597381591797, -3.364978790283203, -70.0838623046875, 51.75299072265625, -9.075481414794922, 147.20208740234375, -5.172780990600586, 19.675155639648438, 182.1815948486328, 33.55097198486328, 22.344154357910156, 32.54109191894531, 142.94496154785156, 104.44583129882812, 98.80889129638672, 34.26222229003906, 70.17263793945312, -54.81943893432617, 100.59486389160156, -11.3858642578125, 100.44139862060547], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000200.npy"}
{"epoch": 0.2936857562408223, "step": 201, "batch_size": 64, "mean": 66.80262756347656, "std": 78.36316680908203, "min": -114.75070190429688, "p10": -13.786550140380852, "median": 54.256874084472656, "p90": 157.53540039062503, "max": 318.8988952636719, "pos_frac": 0.859375, "sample": [133.91976928710938, 32.07789611816406, 136.0597381591797, -97.49090576171875, 37.72125244140625, 18.153701782226562, 13.99842643737793, 11.87165641784668, -4.410400390625, 30.85888671875, 0.418182373046875, 31.962310791015625, 18.030517578125, 17.72198486328125, 223.23800659179688, 79.69742584228516, 54.9508056640625, 71.11683654785156, 22.575927734375, 153.9556121826172, 207.74465942382812, 120.90550231933594, 19.39727783203125, 53.56294250488281, 86.09501647949219, 185.57952880859375, -38.7893180847168, 97.01615905761719, 109.28533172607422, 15.485101699829102, 57.27457046508789, 88.62400817871094, 97.58049774169922, 92.62559509277344, -16.907516479492188, -23.112468719482422, 135.20941162109375, 37.53130340576172, 40.75848388671875, 318.8988952636719, 63.7100830078125, -6.504295349121094, 53.338218688964844, 47.71026611328125, 24.946462631225586, 128.0489044189453, 46.790245056152344, 138.9044647216797, 207.67520141601562, 57.632545471191406, 68.3189926147461, -31.109230041503906, 74.70967102050781, 66.91316986083984, -114.75070190429688, 159.06959533691406, 270.0946960449219, 96.03778839111328, 20.304460525512695, -36.41923522949219, 128.89755249023438, 18.55095672607422, 23.298912048339844, 98.0066909790039], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000201.npy"}
{"epoch": 0.29515418502202645, "step": 202, "batch_size": 64, "mean": 47.20063018798828, "std": 73.76876068115234, "min": -107.48710632324219, "p10": -30.137131118774413, "median": 45.45691680908203, "p90": 148.67474365234375, "max": 245.4305877685547, "pos_frac": 0.71875, "sample": [245.4305877685547, -25.28958511352539, 71.61782836914062, 160.34754943847656, 4.213085174560547, 20.26288604736328, 98.45085906982422, -7.450216293334961, 87.29859924316406, 74.2194595336914, 28.840927124023438, -80.67727661132812, 76.43870544433594, 149.00396728515625, 84.18273162841797, 53.555320739746094, -107.48710632324219, 18.63494873046875, 55.80436706542969, 207.3144073486328, 49.97747802734375, 156.6195526123047, -67.26520538330078, 139.08816528320312, 9.4674072265625, -1.33245849609375, 228.47845458984375, -1.3123703002929688, 57.52696228027344, 100.94113159179688, 51.80302429199219, 93.575439453125, 147.90655517578125, 0.87481689453125, -12.213348388671875, -23.932056427001953, 40.93635559082031, 40.493408203125, 32.988487243652344, 127.48150634765625, 82.87872314453125, -28.342742919921875, 10.782329559326172, 163.611572265625, 56.309349060058594, 105.82920837402344, 62.83118438720703, -92.64131164550781, 54.29546356201172, 22.099945068359375, 15.089046478271484, 28.21686363220215, -20.081253051757812, -22.97313690185547, 16.12225341796875, -33.26963424682617, 52.02557373046875, -39.52495574951172, -25.268436431884766, 51.40862274169922, 141.87942504882812, -30.90615463256836, 65.31536865234375, -1.6621246337890625], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000202.npy"}
{"epoch": 0.2966226138032305, "step": 203, "batch_size": 64, "mean": 57.385345458984375, "std": 64.01374816894531, "min": -57.034934997558594, "p10": -16.44442024230956, "median": 39.57668876647949, "p90": 147.55090789794926, "max": 257.7147216796875, "pos_frac": 0.84375, "sample": [15.69375991821289, 90.46409606933594, 117.49099731445312, 19.77935028076172, -5.811351776123047, 156.2681884765625, -24.37978744506836, 39.02253723144531, 30.57958221435547, 72.27789306640625, 9.819374084472656, 75.82328033447266, 79.30944061279297, 54.631858825683594, -40.591800689697266, 88.89750671386719, 28.094955444335938, 4.908546447753906, 107.36105346679688, 55.71490478515625, 179.71035766601562, 200.0504150390625, 2.660003662109375, 35.14748001098633, 72.06861114501953, 38.18358612060547, 257.7147216796875, 13.388938903808594, -37.21525573730469, 73.98928833007812, -4.971458435058594, 101.05998992919922, 77.219482421875, 176.61080932617188, 76.82272338867188, 115.07838439941406, 76.68891906738281, 76.65834045410156, 18.40276336669922, 27.80194854736328, 18.996231079101562, 77.91706085205078, 51.576866149902344, 137.7901611328125, -2.5369186401367188, -57.034934997558594, -21.001449584960938, 35.203182220458984, -36.14189147949219, 29.40023422241211, 113.83880615234375, 23.999427795410156, 40.13084030151367, 35.31927490234375, 23.603729248046875, -25.81964111328125, 85.56224822998047, 41.71075439453125, 110.44955444335938, 212.96270751953125, 12.377378463745117, 151.7340850830078, 35.796958923339844, 24.403156280517578], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000203.npy"}
{"epoch": 0.29809104258443464, "step": 204, "batch_size": 64, "mean": 54.453792572021484, "std": 76.88208770751953, "min": -59.103050231933594, "p10": -20.959363937377926, "median": 34.145663261413574, "p90": 135.7411743164063, "max": 331.7828369140625, "pos_frac": 0.734375, "sample": [87.14279174804688, 44.909732818603516, 12.156780242919922, 31.065221786499023, 78.28104400634766, 29.45653533935547, 111.912353515625, -59.103050231933594, 85.13397979736328, 21.05388069152832, -19.00937271118164, 26.92983055114746, 8.243148803710938, 54.32384490966797, 45.35661315917969, -4.289154052734375, -1.4832763671875, 99.20182037353516, 28.510284423828125, 5.982309341430664, 139.85801696777344, -35.402366638183594, -0.7759246826171875, -14.495269775390625, 24.229690551757812, -6.851858139038086, 42.35704040527344, 126.13520812988281, 118.91874694824219, -21.795074462890625, 268.38482666015625, -8.031349182128906, -4.8056488037109375, 16.820295333862305, -4.3590240478515625, 104.44302368164062, 6.5776214599609375, 60.13074493408203, 260.8095703125, -58.4434814453125, 102.47718811035156, 112.51741027832031, 52.994537353515625, 60.166805267333984, 44.044219970703125, 13.639913558959961, 30.132720947265625, 108.47200012207031, 188.76235961914062, 11.11474609375, 37.226104736328125, 61.70745086669922, -28.880157470703125, 55.873130798339844, 331.7828369140625, 211.51492309570312, -7.91535758972168, 98.34163665771484, 26.524538040161133, 174.17416381835938, -22.097946166992188, 57.82456970214844, 89.68701171875, -24.522167205810547], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000204.npy"}
{"epoch": 0.29955947136563876, "step": 205, "batch_size": 64, "mean": 67.72974395751953, "std": 77.51020812988281, "min": -111.53890991210938, "p10": -35.43956108093261, "median": 63.61687469482422, "p90": 168.30741577148441, "max": 236.38385009765625, "pos_frac": 0.796875, "sample": [24.298992156982422, 60.189910888671875, 78.6510238647461, 228.43731689453125, 40.489349365234375, 25.031869888305664, -8.378484725952148, 16.907943725585938, 40.420494079589844, 128.61355590820312, 159.7388153076172, -62.9322509765625, -41.87153625488281, 72.34737396240234, -5.4748382568359375, 64.880126953125, 8.506763458251953, 217.42465209960938, 175.13026428222656, 25.98214340209961, 100.8797378540039, 44.93388366699219, -33.8101806640625, 89.59941101074219, 160.3580322265625, 134.50473022460938, 136.50161743164062, 26.05447006225586, -17.715065002441406, 122.70902252197266, -39.1158561706543, 236.38385009765625, 44.392547607421875, 40.12968444824219, 23.36971664428711, 75.76699829101562, 172.63345336914062, 155.27056884765625, 57.36628723144531, 62.35362243652344, -98.97021484375, 78.00413513183594, 102.94945526123047, 106.47178649902344, 79.32284545898438, -36.13786697387695, 91.82887268066406, 53.431373596191406, 171.71429443359375, 127.7289047241211, 156.2740936279297, 150.0571746826172, 121.34858703613281, 56.85767364501953, 102.76878356933594, 177.50997924804688, 77.59600830078125, -14.460700988769531, 2.438629150390625, -111.53890991210938, -48.48304748535156, 120.32151794433594, -16.328689575195312, 43.03875732421875], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000205.npy"}
{"epoch": 0.3010279001468429, "step": 206, "batch_size": 64, "mean": 73.30876159667969, "std": 78.87586975097656, "min": -89.5447998046875, "p10": -31.22425079345702, "median": 69.41500473022461, "p90": 160.6477722167969, "max": 288.08404541015625, "pos_frac": 0.84375, "sample": [77.91400146484375, 114.69973754882812, 42.671356201171875, 67.24870300292969, 288.08404541015625, 199.6866455078125, 66.053955078125, 15.119131088256836, 135.52357482910156, 151.4532012939453, 49.2430419921875, 31.187179565429688, 19.20973777770996, 33.704978942871094, -89.5447998046875, 76.89479064941406, 65.90679931640625, 58.78752136230469, 37.40690612792969, -11.130590438842773, 107.70210266113281, -47.434425354003906, 142.53988647460938, 219.32843017578125, 102.87309265136719, 108.44081115722656, 35.2716064453125, 147.7559814453125, -81.40521240234375, -61.572486877441406, 136.90846252441406, 164.5883026123047, 133.15972900390625, 105.10733032226562, 18.686342239379883, 147.5015869140625, 20.451889038085938, -51.6430778503418, 130.25747680664062, -48.49383544921875, 12.292381286621094, 130.52255249023438, 6.317447662353516, 75.56483459472656, 71.58130645751953, -7.89752197265625, 63.06336212158203, 57.934715270996094, 95.33294677734375, 72.14772033691406, -36.226646423339844, -19.55199432373047, 45.78535461425781, 109.1798324584961, 234.48338317871094, 142.48536682128906, 2.8414134979248047, 92.12944030761719, 90.53522491455078, 221.55953979492188, 36.263916015625, 19.11363983154297, 203.96173095703125, 112.197021484375], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000206.npy"}
{"epoch": 0.302496328928047, "step": 207, "batch_size": 64, "mean": 84.67915344238281, "std": 87.7833251953125, "min": -50.007286071777344, "p10": 1.135661315917972, "median": 51.571794509887695, "p90": 198.04455718994143, "max": 351.3677978515625, "pos_frac": 0.890625, "sample": [4.466320037841797, -41.46851348876953, 38.642234802246094, 26.653133392333984, 87.26133728027344, 40.38001251220703, 191.7436065673828, 11.294166564941406, 7.8431243896484375, -0.27099609375, 66.60511779785156, 62.255271911621094, 150.61724853515625, -6.4925384521484375, 58.55016326904297, 79.83573150634766, 50.108375549316406, 102.78239440917969, 25.472522735595703, 23.134553909301758, 28.51651382446289, 122.56343841552734, 22.746971130371094, 169.9761199951172, 24.11548614501953, -18.992034912109375, 181.52223205566406, 37.26976013183594, 39.242271423339844, 8.09720230102539, 200.74496459960938, 137.2315216064453, -25.361228942871094, 4.4178619384765625, 43.25421142578125, 158.45806884765625, 27.023086547851562, 124.78076171875, 51.36477279663086, 98.29127502441406, 163.10272216796875, 93.55184936523438, 51.77881622314453, 126.698486328125, 143.71469116210938, 145.8131103515625, 43.50665283203125, 257.50390625, 297.703857421875, 145.36965942382812, 39.258731842041016, 275.1507873535156, -11.190214157104492, 50.291404724121094, 351.3677978515625, 36.44403076171875, 53.844547271728516, 63.66863250732422, 48.0479736328125, 275.133056640625, -50.007286071777344, 140.05062866210938, 251.93228149414062, 12.052909851074219], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000207.npy"}
{"epoch": 0.3039647577092511, "step": 208, "batch_size": 64, "mean": 65.2934799194336, "std": 79.50164794921875, "min": -133.71218872070312, "p10": -5.821620178222654, "median": 45.661014556884766, "p90": 170.16142730712892, "max": 305.69403076171875, "pos_frac": 0.875, "sample": [-79.75222778320312, 68.04280090332031, -47.321319580078125, 110.64566040039062, 29.932228088378906, 10.122295379638672, 180.5755157470703, 7.82960319519043, 69.35059356689453, 154.49932861328125, 143.6544952392578, 40.58837127685547, 54.59054183959961, 111.60060119628906, 122.33280181884766, 50.163612365722656, 305.69403076171875, 29.912307739257812, 240.85784912109375, 98.89492797851562, 14.79034423828125, 42.433204650878906, 127.88436889648438, 94.1373519897461, 91.39410400390625, 31.253273010253906, 25.714202880859375, 15.25927734375, -73.21052551269531, 167.46572875976562, -73.557373046875, 35.92881774902344, 24.842586517333984, 46.942161560058594, 76.69930267333984, 171.3167266845703, 139.67604064941406, 43.12173080444336, 2.2916641235351562, 113.6171875, 22.622772216796875, 27.405548095703125, 12.176429748535156, 140.97518920898438, 44.37986755371094, 182.06500244140625, 52.11437225341797, -133.71218872070312, -20.133705139160156, 59.939544677734375, 4.2435150146484375, 41.582611083984375, 191.79104614257812, 62.92646789550781, 42.16865539550781, -6.9505157470703125, 25.301773071289062, 237.84854125976562, 125.68911743164062, 125.77996826171875, 84.6656265258789, 30.067264556884766, 8.80889892578125, -3.187530517578125], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000208.npy"}
{"epoch": 0.3054331864904552, "step": 209, "batch_size": 64, "mean": 56.85578918457031, "std": 74.11839294433594, "min": -120.03630828857422, "p10": -21.874330520629883, "median": 46.05218315124512, "p90": 170.1059768676758, "max": 230.38778686523438, "pos_frac": 0.765625, "sample": [222.3215789794922, 55.364688873291016, 108.784423828125, 4.237022399902344, 214.8988800048828, 202.97039794921875, 186.89111328125, 38.608924865722656, 66.59307098388672, 43.248382568359375, -21.90114974975586, 47.601173400878906, 99.93907165527344, 30.531471252441406, 114.25613403320312, 75.92145538330078, 124.75608825683594, -67.6412353515625, -2.3402786254882812, 83.43656158447266, 123.98219299316406, 108.97764587402344, 11.409828186035156, -54.36199951171875, 44.07157897949219, -21.811752319335938, -18.124475479125977, 47.72010040283203, 37.701324462890625, 103.70327758789062, 62.95945739746094, 65.74058532714844, -60.634117126464844, -6.584110260009766, 32.12194061279297, 39.60479736328125, -41.55178451538086, -14.338848114013672, 230.38778686523438, 132.58216857910156, 26.038345336914062, 170.856201171875, 12.437110900878906, -6.753456115722656, -14.350204467773438, 16.04961395263672, 65.46524047851562, 42.9506721496582, 79.13276672363281, 81.9547119140625, 84.88479614257812, 5.280065536499023, -19.716156005859375, -120.03630828857422, 94.31533813476562, 41.1051025390625, 44.50319290161133, 43.64970397949219, 52.84330749511719, -50.157752990722656, 168.35545349121094, 179.02774047851562, 97.45223999023438, 91.44969177246094], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000209.npy"}
{"epoch": 0.3069016152716593, "step": 210, "batch_size": 64, "mean": 61.69956970214844, "std": 67.8758544921875, "min": -70.55352783203125, "p10": -5.21521759033203, "median": 48.39678192138672, "p90": 174.0730728149414, "max": 238.754638671875, "pos_frac": 0.8125, "sample": [-0.201446533203125, 21.74643898010254, 81.68001556396484, 49.02301025390625, 11.865280151367188, 22.31686019897461, 125.31371307373047, -2.1452579498291016, 64.73426818847656, -20.81865692138672, 103.37799072265625, -4.515281677246094, 24.853328704833984, 8.726593017578125, 174.71791076660156, 80.48543548583984, 0.55987548828125, 47.77055358886719, 54.79253387451172, -3.1309165954589844, 157.5963592529297, 78.19471740722656, 37.76551818847656, 110.07756042480469, 54.60719299316406, 3.0615806579589844, -6.391380310058594, 132.02728271484375, 110.71414184570312, 176.32395935058594, 54.84160614013672, 126.370361328125, 86.22785186767578, 189.16261291503906, -14.556427001953125, -5.515190124511719, -18.93502426147461, 30.058578491210938, 25.78923225402832, 17.110713958740234, 98.4016342163086, 81.83416748046875, 197.6390838623047, 32.45951461791992, 7.397195816040039, -7.038238525390625, 172.56845092773438, 18.934799194335938, 72.28608703613281, 79.66683197021484, 150.520751953125, 238.754638671875, 127.54659271240234, 56.903221130371094, 5.5561065673828125, 40.11988067626953, 49.135887145996094, 8.74978256225586, 13.887214660644531, 203.2729949951172, -0.7094535827636719, 3.5403900146484375, -70.55352783203125, 182.21498107910156], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000210.npy"}
{"epoch": 0.30837004405286345, "step": 211, "batch_size": 64, "mean": 52.43247604370117, "std": 71.23040008544922, "min": -79.0660400390625, "p10": -26.145406341552732, "median": 43.74513244628906, "p90": 137.82195281982425, "max": 267.36444091796875, "pos_frac": 0.78125, "sample": [55.99537658691406, -6.318838119506836, 142.32174682617188, 70.36890411376953, 52.47489547729492, 56.366554260253906, 10.592666625976562, 102.27832794189453, 267.36444091796875, -38.407806396484375, 21.708717346191406, -1.8933238983154297, 12.040767669677734, -27.16143798828125, 72.61680603027344, 133.30857849121094, 91.09245300292969, 47.7757568359375, 44.58528137207031, 14.878433227539062, 4.762626647949219, 6.553558349609375, 84.42291259765625, 63.20872497558594, -7.394676208496094, 1.9331207275390625, -9.2799072265625, 213.745849609375, 28.565444946289062, 252.07571411132812, 207.280517578125, -12.343330383300781, -38.301509857177734, 54.735107421875, 115.50114440917969, 89.57382202148438, -57.653411865234375, 100.59725952148438, 32.02070617675781, 26.6539306640625, 70.00436401367188, 139.4127960205078, -23.77466583251953, 25.332365036010742, 21.219282150268555, 48.96275329589844, 42.90498352050781, -49.467193603515625, -20.794376373291016, 134.1099853515625, -58.395111083984375, 64.93362426757812, 114.40509033203125, 68.49803924560547, 88.44898223876953, 152.18804931640625, 39.98589324951172, 122.9318618774414, 0.3488922119140625, 80.61190795898438, 42.81954574584961, 22.207794189453125, 29.203720092773438, -79.0660400390625], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000211.npy"}
{"epoch": 0.30983847283406757, "step": 212, "batch_size": 64, "mean": 75.48409271240234, "std": 83.11174011230469, "min": -92.02413940429688, "p10": -12.925804901123046, "median": 68.07704162597656, "p90": 176.41643829345705, "max": 295.88958740234375, "pos_frac": 0.765625, "sample": [-92.02413940429688, -49.70338439941406, 231.19952392578125, 128.19879150390625, 46.05419158935547, 143.75448608398438, 22.823211669921875, -6.377696990966797, 15.244531631469727, 9.09018325805664, 119.06614685058594, 81.6086196899414, -52.458251953125, -24.207550048828125, -8.041255950927734, 24.30225372314453, 138.88494873046875, 173.31686401367188, 252.3894500732422, 117.14625549316406, 57.11930847167969, -55.100074768066406, 39.39388656616211, 66.25997924804688, 75.72850036621094, 82.90469360351562, 142.85614013671875, -10.22647476196289, 180.7225341796875, -5.884111404418945, 53.122467041015625, 86.14973449707031, 295.88958740234375, 68.58731079101562, 29.287988662719727, -14.480888366699219, 144.5288543701172, -8.506210327148438, 92.22716522216797, 277.15191650390625, 135.81964111328125, 67.5667724609375, 48.323585510253906, 85.32501983642578, 19.511077880859375, 27.897186279296875, -11.338489532470703, -1.6964454650878906, 165.86117553710938, 27.46670913696289, 88.57425689697266, 188.96527099609375, 177.7448272705078, 124.19270324707031, 170.4246826171875, -13.606082916259766, 105.36738586425781, 31.123409271240234, 125.3857192993164, 95.15415954589844, 20.712013244628906, 166.92779541015625, -7.585906982421875, 124.86587524414062], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000212.npy"}
{"epoch": 0.31130690161527164, "step": 213, "batch_size": 64, "mean": 74.51496124267578, "std": 76.25312042236328, "min": -106.94775390625, "p10": -12.2928466796875, "median": 73.76980590820312, "p90": 193.6643478393555, "max": 246.78701782226562, "pos_frac": 0.84375, "sample": [200.91094970703125, -19.90264892578125, 23.51917266845703, -21.595815658569336, 84.96742248535156, 10.590385437011719, 37.527862548828125, 92.69486999511719, -106.94775390625, -23.2178955078125, 2.873037338256836, 86.47509765625, 158.33660888671875, -9.360361099243164, 54.20391082763672, 88.76412963867188, 8.126619338989258, 46.99407196044922, 122.31504821777344, 155.92391967773438, 241.2688446044922, 145.27896118164062, 35.310699462890625, 205.62249755859375, 80.94778442382812, 165.62310791015625, 209.54190063476562, -15.964096069335938, 13.837379455566406, 15.823688507080078, 87.10900115966797, 52.039756774902344, 83.08007049560547, 73.00033569335938, 12.385635375976562, 80.93130493164062, 79.98843383789062, 137.85308837890625, 187.3695068359375, 97.31239318847656, 19.639286041259766, 60.290321350097656, 246.78701782226562, -12.514678955078125, 85.62010192871094, 55.351959228515625, 101.74752044677734, 217.35617065429688, -11.775238037109375, 126.89415740966797, 132.07666015625, 191.3423614501953, 194.65948486328125, 4.721485137939453, -16.945266723632812, 0.23479080200195312, -10.114631652832031, 74.53927612304688, 35.42756652832031, 19.92192840576172, 45.10334014892578, 34.3477783203125, 89.67303466796875, 103.01406860351562], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000213.npy"}
{"epoch": 0.31277533039647576, "step": 214, "batch_size": 64, "mean": 61.19904327392578, "std": 76.56452941894531, "min": -115.60761260986328, "p10": -5.676867675781249, "median": 41.61366081237793, "p90": 183.8830535888672, "max": 277.01617431640625, "pos_frac": 0.828125, "sample": [29.03150177001953, 204.88153076171875, 5.293521881103516, 68.42459106445312, 14.594446182250977, -2.418008804321289, -11.915218353271484, -115.60761260986328, 152.47540283203125, 3.861621856689453, 106.16084289550781, 54.33373260498047, 54.292633056640625, -0.7788906097412109, 39.21681213378906, 41.526248931884766, 74.76116180419922, -6.110219955444336, 21.763839721679688, 86.7567138671875, 37.6937255859375, 16.266746520996094, 69.52367401123047, 105.76507568359375, 116.12503814697266, 41.701072692871094, 33.61212158203125, -4.665712356567383, 51.75497817993164, -16.50199317932129, 27.452190399169922, 52.21941375732422, 21.579551696777344, 203.06314086914062, -53.46735382080078, 237.04641723632812, 198.7816162109375, 15.422698974609375, 5.54638671875, 184.35543823242188, 96.76084899902344, 76.75792694091797, 191.1378631591797, 173.1871795654297, -79.92784881591797, 277.01617431640625, 16.29453468322754, 84.86107635498047, 131.77694702148438, 63.481834411621094, 1.2201766967773438, 87.680908203125, -0.5191173553466797, 38.8408088684082, 9.156822204589844, 96.26811218261719, 32.4247932434082, 182.78082275390625, 35.03759765625, 51.33992004394531, 42.39094161987305, 167.09043884277344, -29.25127410888672, 7.112331390380859], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000214.npy"}
{"epoch": 0.3142437591776799, "step": 215, "batch_size": 64, "mean": 65.80323791503906, "std": 72.91478729248047, "min": -76.97673034667969, "p10": -9.62750930786132, "median": 57.330820083618164, "p90": 144.17865142822268, "max": 337.60296630859375, "pos_frac": 0.84375, "sample": [42.56135177612305, 169.0989990234375, 76.88994598388672, 61.932334899902344, -13.874176025390625, -31.335304260253906, 0.5452499389648438, 98.03470611572266, 93.75175476074219, 50.07551574707031, 44.2187385559082, 28.5681209564209, 16.406625747680664, 7.273294448852539, 237.79052734375, 126.74288177490234, 118.25473022460938, 147.1607666015625, 337.60296630859375, 69.7113037109375, 24.58160400390625, 156.8570556640625, 72.85529327392578, -0.59332275390625, 30.019500732421875, 79.6762924194336, -28.21535873413086, 34.578643798828125, 137.2203826904297, 113.61676788330078, 26.549726486206055, 35.059043884277344, 34.49632263183594, -0.7946300506591797, 116.60505676269531, 59.199371337890625, -42.90129089355469, 108.80411529541016, 106.55104064941406, 107.47463989257812, 171.435546875, 76.6359634399414, 106.5091552734375, 12.87615966796875, 119.68447875976562, 28.400146484375, 59.07477569580078, -2.8647537231445312, 28.058834075927734, 122.62090301513672, -76.97673034667969, 59.94062805175781, 269.19305419921875, 22.645030975341797, 59.06651306152344, 10.13067626953125, 131.0164794921875, -12.525833129882812, 55.59512710571289, 21.552370071411133, 68.0091781616211, 12.281515121459961, -34.25992202758789, 50.25719451904297], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000215.npy"}
{"epoch": 0.315712187958884, "step": 216, "batch_size": 64, "mean": 69.67721557617188, "std": 80.6275863647461, "min": -95.81723022460938, "p10": -24.529902839660643, "median": 48.999996185302734, "p90": 177.6442825317383, "max": 258.6192626953125, "pos_frac": 0.828125, "sample": [126.4882583618164, 154.0167236328125, -26.679298400878906, 90.23600769042969, 83.3206558227539, 35.23063659667969, -49.78501892089844, 89.9317855834961, 122.20974731445312, 2.7508316040039062, 258.6192626953125, 13.693252563476562, 83.21180725097656, -24.943069458007812, 20.510356903076172, 0.8261566162109375, -44.846458435058594, 143.81494140625, -0.8773612976074219, -95.81723022460938, 2.1653289794921875, 176.05267333984375, 7.570404052734375, 31.15655517578125, 106.48185729980469, 77.1937255859375, 9.455413818359375, 100.39803314208984, 110.08526611328125, 42.211578369140625, 76.3005142211914, 89.40138244628906, 59.48793029785156, -8.312389373779297, 205.16067504882812, 150.6949920654297, -36.92234802246094, 192.8822021484375, 17.869956970214844, 40.17223358154297, 20.108983993530273, 151.85092163085938, 15.943094253540039, -85.35694885253906, 43.118873596191406, 90.66193389892578, 145.8271484375, 237.8964080810547, 30.81884765625, -17.164329528808594, 158.04904174804688, 46.79716491699219, 30.03740692138672, 51.20282745361328, 46.6534423828125, 178.32640075683594, 123.68470001220703, 156.7294464111328, 216.697509765625, 214.3601531982422, 20.585647583007812, -23.565847396850586, 25.65235137939453, 149.00836181640625], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000216.npy"}
{"epoch": 0.31718061674008813, "step": 217, "batch_size": 64, "mean": 77.040771484375, "std": 71.24847412109375, "min": -68.14031982421875, "p10": 7.027622795104982, "median": 67.7517204284668, "p90": 182.1899871826172, "max": 265.3919677734375, "pos_frac": 0.921875, "sample": [81.71974182128906, 81.47981262207031, 66.9901123046875, 25.707094192504883, 21.834617614746094, 36.79521179199219, 46.768585205078125, 113.58598327636719, 78.11321258544922, 37.55816650390625, 4.4885406494140625, 212.50924682617188, 97.80390167236328, 171.8374481201172, -43.529029846191406, 13.319023132324219, 41.244422912597656, -18.498275756835938, 80.49952697753906, 257.8410949707031, 6.269313812255859, 54.864830017089844, 105.66313934326172, 212.13540649414062, 98.47612762451172, 130.58636474609375, 30.083953857421875, 24.043460845947266, -68.14031982421875, 70.9482192993164, 265.3919677734375, 108.09095001220703, 12.966291427612305, 45.1834716796875, 115.02450561523438, 24.959136962890625, 84.68148803710938, 145.39016723632812, 11.863471984863281, 183.1793975830078, 179.88136291503906, -27.938758850097656, 175.81883239746094, 144.06784057617188, 83.04457092285156, 190.42013549804688, 18.490158081054688, 51.768096923828125, 55.89672088623047, 41.46355438232422, 71.43681335449219, 112.81583404541016, 207.20208740234375, 38.82757568359375, 97.72236633300781, 37.640052795410156, -1.7907257080078125, 10.364387512207031, 29.87091064453125, 33.80604553222656, 68.5133285522461, 123.91361999511719, 108.84782409667969, 8.79701042175293], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000217.npy"}
{"epoch": 0.3186490455212922, "step": 218, "batch_size": 64, "mean": 62.84312438964844, "std": 74.30023193359375, "min": -74.78504943847656, "p10": -20.49938201904296, "median": 54.048274993896484, "p90": 152.4404052734375, "max": 331.53521728515625, "pos_frac": 0.828125, "sample": [57.2892951965332, 31.911720275878906, 111.87786102294922, 63.08390426635742, 94.24974822998047, 17.997295379638672, 188.30801391601562, 10.203109741210938, 12.195207595825195, 53.35614776611328, 106.75684356689453, 48.502349853515625, 4.980998992919922, 108.9500961303711, 214.76409912109375, 51.85693359375, 67.31129455566406, 3.4914169311523438, 45.320030212402344, 107.85002136230469, 26.8974609375, 32.52973175048828, 17.229137420654297, 113.2352294921875, -33.467498779296875, -36.737548828125, 163.0306396484375, 71.07933044433594, 93.68682861328125, 105.232421875, 22.65472412109375, 153.73675537109375, 108.15042114257812, 128.27401733398438, 97.37176513671875, 66.01861572265625, 2.725351333618164, -10.27122688293457, 74.77320861816406, 15.250362396240234, -8.408164978027344, -45.360313415527344, 86.2357406616211, -6.962757110595703, -13.001640319824219, -23.71269989013672, 331.53521728515625, 161.2431640625, 124.70941162109375, 5.6130218505859375, -33.80445861816406, 138.78306579589844, 250.64639282226562, 5.537065505981445, 20.033458709716797, 66.9699935913086, 122.85243225097656, 67.9416275024414, 16.33855628967285, -31.318870544433594, 54.74040222167969, -74.78504943847656, 149.41558837890625, 45.062774658203125], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000218.npy"}
{"epoch": 0.3201174743024963, "step": 219, "batch_size": 64, "mean": 54.65869903564453, "std": 71.99447631835938, "min": -140.0369873046875, "p10": -12.595654678344722, "median": 42.77796173095703, "p90": 159.40168151855477, "max": 237.33444213867188, "pos_frac": 0.859375, "sample": [227.5081024169922, 104.0005874633789, 11.218564987182617, 173.29928588867188, 13.581903457641602, 221.2023468017578, -140.0369873046875, 4.680488586425781, 13.36260986328125, 120.06683349609375, 7.165889739990234, 53.670936584472656, 99.52583312988281, 202.78289794921875, 237.33444213867188, 41.029136657714844, 30.227062225341797, 168.8116455078125, 95.63069152832031, 105.03937530517578, 81.16046142578125, 12.752212524414062, -49.28363800048828, 39.7044677734375, 44.52678680419922, 0.01482391357421875, 100.72742462158203, 5.7252197265625, 178.68600463867188, 137.44509887695312, 40.69944763183594, 132.3232421875, 23.029678344726562, 21.459562301635742, 18.337875366210938, 91.64723205566406, 71.5737533569336, 6.405342102050781, 48.623992919921875, 4.45697021484375, 129.4970703125, 66.1527099609375, 77.13421630859375, -43.307281494140625, 35.79449462890625, 73.40333557128906, -55.309486389160156, 8.714302062988281, 15.2109375, 72.23706817626953, 25.52777099609375, -6.738273620605469, 5.1808929443359375, -18.978099822998047, -60.83110046386719, 82.27418518066406, 71.81745910644531, 54.889923095703125, 12.857479095458984, -14.66949462890625, 48.357818603515625, 49.527984619140625, 77.05208587646484, -7.756694793701172], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000219.npy"}
{"epoch": 0.32158590308370044, "step": 220, "batch_size": 64, "mean": 49.40396499633789, "std": 53.194053649902344, "min": -66.75981140136719, "p10": -15.179590415954582, "median": 53.01373291015625, "p90": 120.46457443237306, "max": 177.01370239257812, "pos_frac": 0.828125, "sample": [142.79428100585938, 20.975255966186523, 65.0125732421875, -5.494468688964844, 38.786415100097656, 169.88336181640625, 177.01370239257812, 3.152027130126953, 81.58749389648438, 86.19259643554688, 65.16754913330078, 70.71794891357422, 93.55579376220703, 64.74781799316406, -66.75981140136719, 53.813720703125, 120.72085571289062, 52.2137451171875, 117.46387481689453, 111.77255249023438, 82.77473449707031, 12.319873809814453, 23.214378356933594, -35.901397705078125, -22.801651000976562, 119.86658477783203, -3.206361770629883, 30.022430419921875, 134.487060546875, 32.04758071899414, -6.429412841796875, 74.65968322753906, 88.78713989257812, 5.5011138916015625, 75.45738220214844, 59.34253692626953, 129.95233154296875, 123.51695251464844, 25.79143524169922, 47.932716369628906, 3.055095672607422, 13.875850677490234, 1.8460407257080078, 35.83045959472656, -18.92966651916504, 19.763839721679688, 64.28366088867188, 63.75886535644531, -55.846431732177734, 54.03691101074219, 25.022964477539062, 27.71764373779297, 109.74121856689453, -20.28232765197754, 21.124897003173828, 99.36460876464844, 74.52923583984375, 17.98395538330078, 65.7401123046875, 54.81205749511719, 16.79149627685547, 75.4326400756836, -5.557159423828125, -42.894447326660156], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000220.npy"}
{"epoch": 0.32305433186490456, "step": 221, "batch_size": 64, "mean": 90.53008270263672, "std": 77.92446899414062, "min": -66.1712646484375, "p10": 1.3959535598754895, "median": 88.65191650390625, "p90": 197.0647186279297, "max": 268.6773681640625, "pos_frac": 0.90625, "sample": [50.85376739501953, 41.501495361328125, 38.0289306640625, 4.56121826171875, 58.94276428222656, 268.6773681640625, 0.9019241333007812, 195.7664031982422, 146.9869384765625, 156.25900268554688, 177.22750854492188, -0.8046531677246094, 197.6211395263672, 64.68844604492188, 80.33258056640625, 133.86309814453125, 14.8358154296875, 191.842041015625, 2.5486888885498047, 107.87812042236328, 141.251953125, 81.15313720703125, 173.0703125, 241.552978515625, 230.41937255859375, 146.45059204101562, 152.3466033935547, -66.1712646484375, 35.13399124145508, 30.434181213378906, 95.44591522216797, 97.17280578613281, 3.9346160888671875, 49.121543884277344, 39.412933349609375, 86.47136688232422, 51.17266082763672, -55.468101501464844, 101.27106475830078, 201.6105499267578, -4.909019470214844, 134.63470458984375, 164.83255004882812, 126.24630737304688, 71.2014389038086, 120.13887786865234, 23.845550537109375, 220.58245849609375, 145.57986450195312, 92.1685791015625, 22.052295684814453, -54.090065002441406, 49.407264709472656, 116.51274871826172, 21.28985595703125, -21.0682373046875, 27.270219802856445, 94.88008880615234, 40.842899322509766, 90.83246612548828, 191.8837890625, 26.900259017944336, 207.27734375, 117.31329345703125], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000221.npy"}
{"epoch": 0.3245227606461087, "step": 222, "batch_size": 64, "mean": 69.71733093261719, "std": 76.03740692138672, "min": -83.29112243652344, "p10": -13.144255638122557, "median": 69.2297592163086, "p90": 172.71329650878909, "max": 263.1428527832031, "pos_frac": 0.828125, "sample": [-12.114051818847656, 188.8802490234375, 3.308441162109375, -13.585771560668945, 20.390480041503906, -2.8774280548095703, -2.672393798828125, 13.092117309570312, 11.21014404296875, 54.104400634765625, 72.67430877685547, 79.90631866455078, 263.1428527832031, 33.36802291870117, 96.43553161621094, 49.691368103027344, -83.29112243652344, 91.84040069580078, 136.49697875976562, 93.38745880126953, 105.73799133300781, 59.968231201171875, -50.9925537109375, 173.98385620117188, -39.681007385253906, 127.38047790527344, 258.018310546875, 52.5106201171875, 39.227821350097656, 8.663711547851562, 82.34784698486328, 10.749191284179688, 143.75694274902344, -39.802772521972656, 84.84394836425781, 101.24397277832031, 21.698074340820312, 116.01669311523438, -14.20376968383789, 83.207275390625, 0.35257720947265625, 94.87168884277344, -6.978523254394531, 6.7400360107421875, 169.7486572265625, 100.48250579833984, 94.87218475341797, 138.96188354492188, 72.60515594482422, 34.326332092285156, 102.96095275878906, 104.26532745361328, 232.3658905029297, 43.37441635131836, 113.05184936523438, 65.85436248779297, 112.45817565917969, -76.40536499023438, 117.92190551757812, 23.09215545654297, 219.88824462890625, 33.126708984375, 197.20187377929688, 48.70753479003906], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000222.npy"}
{"epoch": 0.32599118942731276, "step": 223, "batch_size": 64, "mean": 61.169429779052734, "std": 87.91458892822266, "min": -176.61898803710938, "p10": -52.87220001220703, "median": 64.65215301513672, "p90": 175.02635192871094, "max": 294.626220703125, "pos_frac": 0.75, "sample": [78.18536376953125, -63.70500946044922, 94.15928649902344, 92.62750244140625, 51.42163848876953, 192.64178466796875, 123.28392028808594, 56.401710510253906, 105.60006713867188, 99.68937683105469, -42.370941162109375, 17.468353271484375, 28.118667602539062, 141.23936462402344, 172.1448974609375, 138.44900512695312, 63.91716003417969, 65.38714599609375, 222.9854736328125, -19.375804901123047, 104.16695404052734, -5.286352157592773, 84.04566955566406, -50.10468673706055, 3.8283157348632812, 170.89578247070312, -54.05827713012695, 179.16360473632812, 78.35338592529297, -104.49909210205078, 16.107070922851562, -6.3139801025390625, 142.94178771972656, -104.84663391113281, 73.32180786132812, 119.1605224609375, -42.81986999511719, 91.82323455810547, 50.48281478881836, 294.626220703125, 184.2330780029297, -55.017845153808594, 65.7313003540039, 128.4637908935547, -60.33555603027344, 10.841426849365234, 123.62872314453125, 159.64447021484375, 61.61768341064453, -3.5986480712890625, 43.12482452392578, 5.86309814453125, -29.205108642578125, 192.77084350585938, 21.508934020996094, 35.019081115722656, -10.103231430053711, 19.218191146850586, 93.05693054199219, 109.43814086914062, 62.629295349121094, 97.41449737548828, -176.61898803710938, 176.26126098632812], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000223.npy"}
{"epoch": 0.3274596182085169, "step": 224, "batch_size": 64, "mean": 77.3526611328125, "std": 89.359130859375, "min": -120.9803237915039, "p10": -11.371294403076172, "median": 57.028533935546875, "p90": 206.76949005126966, "max": 343.80438232421875, "pos_frac": 0.828125, "sample": [109.8555908203125, 4.941986083984375, 93.63937377929688, 14.895576477050781, -34.73298645019531, 248.143798828125, 140.7105255126953, 47.416534423828125, 57.09532928466797, -120.9803237915039, 56.96173858642578, 125.13203430175781, 127.87245178222656, 70.85800170898438, 239.27359008789062, 51.98672866821289, 90.5810317993164, 146.88856506347656, 123.53897094726562, 343.80438232421875, 178.57948303222656, 64.12645721435547, 157.31158447265625, 139.67416381835938, -10.828216552734375, -31.867290496826172, 152.46084594726562, 60.99639892578125, 132.88055419921875, -17.765289306640625, 32.56321716308594, 34.74853515625, 11.840433120727539, 285.4134826660156, 139.58309936523438, 155.8130340576172, 70.72911834716797, -11.392829895019531, 22.393827438354492, 42.85763931274414, 137.93800354003906, -11.321044921875, 19.539169311523438, -10.369909286499023, 49.88129425048828, -42.826324462890625, -3.056640625, 4.07879638671875, -38.20002746582031, 16.84341049194336, 112.42094421386719, 119.29473876953125, 235.83412170410156, 22.086977005004883, 12.324905395507812, 29.421764373779297, 9.007110595703125, 168.38467407226562, 33.859100341796875, 0.584075927734375, 76.54933166503906, 218.85092163085938, 228.095947265625, 13.347881317138672], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000224.npy"}
{"epoch": 0.328928046989721, "step": 225, "batch_size": 64, "mean": 92.91471862792969, "std": 94.33313751220703, "min": -98.07281494140625, "p10": -19.929166030883785, "median": 83.00249481201172, "p90": 225.10938110351572, "max": 391.3451843261719, "pos_frac": 0.84375, "sample": [169.26846313476562, 95.80130004882812, 234.9043426513672, 246.24026489257812, 33.01122283935547, 41.78794860839844, 37.87284851074219, -13.630218505859375, -13.8065185546875, 200.7176971435547, 235.2122802734375, 150.46920776367188, -39.591339111328125, -5.553810119628906, 170.3918914794922, 13.375858306884766, -64.40052795410156, 13.094747543334961, -22.553157806396484, 31.214279174804688, 121.14657592773438, 81.63302612304688, 105.15868377685547, 114.57698822021484, 0.2564830780029297, 66.85220336914062, 72.45442199707031, 93.01034545898438, 31.409732818603516, 155.883544921875, 195.13685607910156, 84.37196350097656, 132.36830139160156, 61.84423065185547, 3.569561004638672, 269.47271728515625, 254.3463134765625, 2.4564437866210938, 177.3697509765625, 238.29376220703125, 61.18754959106445, 156.6377716064453, 62.07130432128906, -39.29712677001953, 111.08485412597656, 52.619049072265625, 391.3451843261719, 177.6337890625, 111.93283081054688, 78.00192260742188, 184.0154266357422, 50.489654541015625, 136.17091369628906, -24.97479248046875, -50.31473159790039, 202.2544708251953, 150.4545135498047, 96.56068420410156, 130.36488342285156, -98.07281494140625, 76.13323974609375, 95.89787292480469, 5.290748596191406, 53.616119384765625], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000225.npy"}
{"epoch": 0.3303964757709251, "step": 226, "batch_size": 64, "mean": 71.96773529052734, "std": 90.19156646728516, "min": -120.44528198242188, "p10": -12.621466064453125, "median": 51.44171905517578, "p90": 193.93486785888675, "max": 350.219970703125, "pos_frac": 0.8125, "sample": [350.219970703125, -60.105255126953125, -12.577625274658203, 38.514068603515625, -12.640254974365234, 127.70457458496094, 12.8787841796875, 59.4207649230957, 85.89256286621094, -10.381389617919922, 70.04744720458984, 10.738739013671875, -6.961891174316406, 33.391807556152344, -75.60063171386719, 44.93207931518555, -30.524066925048828, 139.1357421875, 207.86456298828125, 67.90283966064453, 196.5291290283203, 187.881591796875, 35.22064208984375, 73.13841247558594, 75.81602478027344, 155.42916870117188, 52.658653259277344, 16.401870727539062, 67.06414794921875, 250.67092895507812, 144.4514923095703, -29.303756713867188, 108.87655639648438, 56.724395751953125, 33.521461486816406, 238.1240692138672, 161.91827392578125, 275.6517639160156, 167.24191284179688, -3.4569854736328125, 8.287796020507812, 50.22478485107422, -120.44528198242188, 88.03440856933594, 40.28705978393555, 47.889434814453125, 7.640083312988281, 17.233154296875, 36.716094970703125, 58.60279083251953, -1.897308349609375, 182.6940155029297, 66.86117553710938, 126.18859100341797, 42.871063232421875, 85.38459777832031, -26.679214477539062, 7.000755310058594, 279.60333251953125, 30.10327911376953, 25.900489807128906, 70.77568817138672, 16.03741455078125, 162.20823669433594], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000226.npy"}
{"epoch": 0.33186490455212925, "step": 227, "batch_size": 64, "mean": 91.20738220214844, "std": 96.4043960571289, "min": -204.1812744140625, "p10": -9.385914802551266, "median": 89.63237762451172, "p90": 217.81774902343753, "max": 317.72772216796875, "pos_frac": 0.84375, "sample": [212.43807983398438, 52.989898681640625, 74.28071594238281, 233.698974609375, 129.72412109375, 51.54185485839844, 96.5240478515625, 104.65546417236328, 170.6787872314453, -5.7866668701171875, 65.07353973388672, 106.78955078125, 113.07164764404297, -14.724367141723633, -26.074146270751953, 19.340316772460938, 129.5977783203125, 75.88888549804688, 110.50611877441406, 306.7437744140625, 92.78799438476562, 43.888702392578125, 160.19583129882812, 100.96743774414062, 78.2421646118164, 159.8045196533203, 132.15994262695312, 129.27658081054688, 77.9180908203125, 65.66232299804688, 113.96387481689453, 86.47676086425781, 180.04534912109375, 3.757049560546875, -24.60223388671875, -74.18971252441406, 93.53739166259766, 136.55511474609375, 265.290771484375, 77.93391418457031, 106.54866027832031, 220.12332153320312, -79.296630859375, 317.72772216796875, 21.755414962768555, 28.338890075683594, 102.14537048339844, 29.173377990722656, 126.6738052368164, 29.42950439453125, 79.95863342285156, 280.2047119140625, 164.1368408203125, 33.59297180175781, 0.9484043121337891, -10.928449630737305, 210.171875, -204.1812744140625, -2.5623855590820312, 55.152587890625, -5.772176742553711, 17.878732681274414, 96.31263732910156, 313.1098327636719], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000227.npy"}
{"epoch": 0.3333333333333333, "step": 228, "batch_size": 64, "mean": 65.56757354736328, "std": 82.34881591796875, "min": -92.20559692382812, "p10": -37.74826049804687, "median": 60.822998046875, "p90": 183.146257019043, "max": 299.9844665527344, "pos_frac": 0.8125, "sample": [-67.25585174560547, -74.614990234375, -92.20559692382812, 68.00250244140625, 76.20720672607422, 198.9767608642578, -81.9033432006836, 81.9365234375, 115.93489837646484, -48.904232025146484, 8.380859375, 117.9771728515625, 191.3699951171875, 124.4902114868164, 225.71824645996094, 221.4734649658203, 227.5918731689453, 107.34996032714844, 73.54946899414062, -22.914283752441406, -24.083885192871094, 185.740234375, 60.095909118652344, 156.4262237548828, 101.61121368408203, 78.09187316894531, 97.29829406738281, 16.836341857910156, -39.90306854248047, 91.616455078125, 57.68658447265625, 118.25592041015625, 12.488019943237305, 152.7352294921875, 141.3421630859375, 52.52256774902344, 44.61851501464844, 38.44140625, 50.107696533203125, 31.878662109375, 75.55108642578125, 109.03036499023438, 0.517822265625, 1.353363037109375, 61.550086975097656, 62.237693786621094, 299.9844665527344, 5.725791931152344, 59.8265380859375, 23.31110382080078, 177.09364318847656, 3.1258201599121094, -24.849899291992188, 89.25113677978516, 109.91056823730469, 110.90087127685547, 100.79454040527344, 45.355438232421875, -32.720375061035156, 38.3050537109375, -4.643699645996094, 54.28581237792969, -64.75157165527344, 20.212020874023438], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000228.npy"}
{"epoch": 0.33480176211453744, "step": 229, "batch_size": 64, "mean": 76.55232238769531, "std": 88.97025299072266, "min": -174.81434631347656, "p10": -17.906453895568838, "median": 67.80654907226562, "p90": 192.01339416503907, "max": 318.54547119140625, "pos_frac": 0.8125, "sample": [-0.9246978759765625, 9.897258758544922, -5.69488525390625, 225.38894653320312, 31.60617446899414, -71.20133972167969, 74.3415756225586, 124.87774658203125, 58.4564208984375, 129.08529663085938, 192.30313110351562, 191.33734130859375, 61.924346923828125, 262.432373046875, 102.18417358398438, 74.6203842163086, 129.94607543945312, 67.21710205078125, 30.136226654052734, 224.42459106445312, -22.341285705566406, 68.39599609375, 9.65289306640625, 136.05833435058594, 73.73407745361328, 35.780731201171875, 318.54547119140625, -24.87908935546875, 21.608779907226562, 121.50164794921875, 149.3641357421875, 158.8172607421875, 147.48719787597656, 188.74122619628906, 221.427001953125, 53.036216735839844, 130.8231658935547, 27.513038635253906, 59.3111572265625, -6.5775299072265625, 77.7569808959961, 43.474586486816406, 3.5701370239257812, 107.22377014160156, -21.937673568725586, 53.54185485839844, 129.38241577148438, 113.42387390136719, -8.500274658203125, -68.13272094726562, 85.8987808227539, 262.07879638671875, 23.016950607299805, 64.98574829101562, 91.89228820800781, 95.74613952636719, 8.784660339355469, 118.14583587646484, -1.13623046875, 145.99496459960938, -174.81434631347656, 0.04714775085449219, -31.941940307617188, 0.48841094970703125], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000229.npy"}
{"epoch": 0.33627019089574156, "step": 230, "batch_size": 64, "mean": 98.563720703125, "std": 95.31008911132812, "min": -159.510009765625, "p10": -16.26131725311279, "median": 93.69509506225586, "p90": 246.5184509277344, "max": 299.0873107910156, "pos_frac": 0.859375, "sample": [77.44527435302734, 236.07937622070312, 3.817079544067383, 173.978759765625, 110.0390396118164, 50.922691345214844, -11.619722366333008, 158.29759216308594, 72.51568603515625, 95.47671508789062, 115.14259338378906, 94.08251190185547, -18.250572204589844, 121.7890625, 47.08186340332031, 46.742462158203125, -30.93140411376953, 73.97952270507812, 25.9013671875, 77.74671936035156, 61.324798583984375, -63.65953063964844, -21.05449676513672, 94.10693359375, 87.70220947265625, 101.64212036132812, 116.49659729003906, 190.18511962890625, 156.27139282226562, 117.28235626220703, 62.37633514404297, 28.308921813964844, 98.92059326171875, 299.0873107910156, 117.27606201171875, -34.099830627441406, 93.8090591430664, 117.60076141357422, 147.94155883789062, 296.04486083984375, -21.232927322387695, 8.340103149414062, -159.510009765625, -3.4911575317382812, 2.7860260009765625, 23.662132263183594, 247.2138671875, 81.99752807617188, 269.6832275390625, 267.0380859375, 119.63050842285156, 52.60293197631836, 93.58113098144531, 30.41337776184082, 240.4185791015625, 177.0045623779297, 279.32830810546875, 73.94587707519531, 244.89581298828125, 126.75828552246094, 158.59165954589844, 55.30072021484375, 65.68211364746094, 285.6379089355469], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000230.npy"}
{"epoch": 0.3377386196769457, "step": 231, "batch_size": 64, "mean": 72.66645812988281, "std": 88.12664794921875, "min": -57.34916687011719, "p10": -16.842718887329095, "median": 58.30626678466797, "p90": 172.0138809204102, "max": 389.0895080566406, "pos_frac": 0.828125, "sample": [311.8926696777344, 86.97087097167969, -11.984046936035156, 139.52740478515625, 17.762840270996094, 63.207481384277344, 200.25408935546875, 14.448501586914062, 39.43048858642578, 118.96991729736328, 21.243972778320312, 141.14239501953125, -18.925006866455078, 69.94864654541016, 278.218505859375, 76.22982025146484, 133.03932189941406, 99.9710693359375, 95.2443618774414, 145.373046875, 220.41494750976562, 84.78382873535156, 58.52617645263672, 82.52815246582031, 44.07617950439453, -38.17224884033203, 176.90994262695312, -35.57041549682617, 128.11502075195312, 389.0895080566406, 289.0713195800781, 142.31170654296875, 68.6136703491211, 18.9710750579834, -57.34916687011719, 35.41318130493164, 45.859432220458984, 132.11611938476562, 19.063507080078125, 1.900848388671875, -2.6589202880859375, 31.50493049621582, 81.0716552734375, 22.660810470581055, -8.621604919433594, 58.08635711669922, -21.85455322265625, -43.92585754394531, 24.817630767822266, 86.79843139648438, 160.58973693847656, 77.59303283691406, 38.97801208496094, 21.39419174194336, -7.469646453857422, 84.1119384765625, -47.46173095703125, 56.937381744384766, 12.756591796875, 11.057098388671875, 16.000442504882812, 77.35174560546875, 69.158447265625, 23.138137817382812], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000231.npy"}
{"epoch": 0.3392070484581498, "step": 232, "batch_size": 64, "mean": 60.88869094848633, "std": 92.43216705322266, "min": -99.46353149414062, "p10": -27.0786262512207, "median": 35.917022705078125, "p90": 190.66481628417978, "max": 347.97998046875, "pos_frac": 0.796875, "sample": [89.67295837402344, 100.94579315185547, 3.914875030517578, 36.069908142089844, 6.388755798339844, -23.545562744140625, 218.78598022460938, 111.84640502929688, -10.703067779541016, 166.23797607421875, 3.002473831176758, 215.3640594482422, 54.95695495605469, -28.592796325683594, -12.962783813476562, 29.558006286621094, -8.937713623046875, 128.89129638671875, -73.46253204345703, 58.01471710205078, 85.87060546875, 321.70330810546875, 5.949737548828125, 57.17976379394531, 72.73705291748047, 1.7073211669921875, 327.9600830078125, 4.210235595703125, 7.7708740234375, 170.61968994140625, 88.06262969970703, 100.5261459350586, -29.12298583984375, 37.27740478515625, -37.998199462890625, 59.73396301269531, -72.6837158203125, 18.04730224609375, 19.44902801513672, 347.97998046875, 10.15228271484375, -16.000511169433594, 29.011566162109375, -99.46353149414062, 17.01431655883789, 41.12702941894531, 48.98431396484375, -75.74239349365234, 118.75651550292969, 99.78404235839844, 200.5352325439453, 18.309749603271484, 29.118545532226562, 199.25558471679688, 161.39071655273438, 28.308637619018555, 117.24251556396484, 120.92170715332031, -1.069314956665039, 35.764137268066406, 14.90328598022461, 54.74614334106445, 60.09502410888672, 31.30480194091797], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000232.npy"}
{"epoch": 0.3406754772393539, "step": 233, "batch_size": 64, "mean": 89.08324432373047, "std": 110.37028503417969, "min": -95.07673645019531, "p10": -37.72989444732665, "median": 63.241764068603516, "p90": 256.9729034423829, "max": 411.249267578125, "pos_frac": 0.796875, "sample": [16.404176712036133, 22.60065460205078, 273.52618408203125, 9.292343139648438, 87.92092895507812, 133.47665405273438, -73.35627746582031, -41.41901397705078, 5.3776397705078125, -95.07673645019531, 51.80986022949219, 57.94694519042969, 68.90184020996094, 159.35635375976562, 164.89556884765625, 346.7626953125, 95.77640533447266, 38.266937255859375, 271.0187072753906, 181.89195251464844, 19.918426513671875, 109.23780822753906, 43.33598327636719, 272.8226318359375, 14.918876647949219, 137.02638244628906, 22.989646911621094, -24.057098388671875, -64.0500717163086, -7.350513458251953, 119.30406188964844, 204.81942749023438, 118.83673095703125, 109.82913208007812, 186.24513244628906, 237.26803588867188, -29.224443435668945, 225.82908630371094, 3.09808349609375, 40.074737548828125, 95.95292663574219, 280.44091796875, 20.242443084716797, 137.77011108398438, 38.782867431640625, 198.484619140625, -12.930561065673828, -66.42899322509766, -60.05168151855469, 19.970314025878906, 26.653162002563477, 37.792724609375, 192.75852966308594, 68.53658294677734, -11.200042724609375, 111.8072509765625, -8.631732940673828, 265.4178466796875, -41.37508773803711, 411.249267578125, 216.7169189453125, 31.43798828125, 123.0953369140625, 108.5899429321289], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000233.npy"}
{"epoch": 0.342143906020558, "step": 234, "batch_size": 64, "mean": 89.16376495361328, "std": 87.84344482421875, "min": -98.10919189453125, "p10": -3.837342834472651, "median": 81.33277130126953, "p90": 205.78646850585946, "max": 353.1755065917969, "pos_frac": 0.890625, "sample": [-37.54338836669922, 167.81605529785156, 130.66793823242188, 72.7210693359375, 145.34901428222656, 153.62506103515625, 101.1702880859375, 124.98928833007812, 252.60018920898438, 28.75596809387207, 146.8144989013672, 77.3714599609375, -98.10919189453125, 24.01340103149414, 29.242637634277344, 117.28203582763672, 26.943450927734375, -31.878257751464844, 37.78529357910156, 126.08973693847656, 353.1755065917969, 124.31360626220703, 29.2110652923584, 2.9313278198242188, 42.67540740966797, 9.0989990234375, 125.72229766845703, 134.27890014648438, 10.09988784790039, 213.806396484375, 5.300241470336914, 102.46604919433594, 176.44671630859375, 92.73027801513672, 319.5155029296875, 57.851043701171875, 42.139801025390625, 138.07315063476562, 78.70524597167969, 102.47117614746094, 138.05804443359375, 93.42704772949219, 187.07330322265625, 33.08708190917969, 26.338539123535156, 56.296993255615234, 1.42742919921875, -27.67572784423828, 130.93063354492188, 83.96029663085938, 50.1365966796875, 227.3280487060547, 271.8219299316406, -39.497596740722656, 111.6463394165039, 68.716796875, -6.0936737060546875, 30.871877670288086, 91.228759765625, 32.25074768066406, 75.30459594726562, -40.246238708496094, 242.43930053710938, 110.93087768554688], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000234.npy"}
{"epoch": 0.3436123348017621, "step": 235, "batch_size": 64, "mean": 59.925636291503906, "std": 82.09176635742188, "min": -142.16543579101562, "p10": -20.752742958068843, "median": 52.980995178222656, "p90": 154.18634490966804, "max": 280.57806396484375, "pos_frac": 0.765625, "sample": [19.449806213378906, 133.18084716796875, -142.16543579101562, 43.572479248046875, 126.74394226074219, 279.64007568359375, 81.9924087524414, 201.38510131835938, -26.975162506103516, 94.30899047851562, -70.9658203125, -26.35638427734375, 62.372283935546875, 106.7265625, 36.38812255859375, -16.852935791015625, -5.752960205078125, 24.91857147216797, 85.75579833984375, -68.04791259765625, 1.3211555480957031, 17.586456298828125, -10.035842895507812, 4.236625671386719, 182.6810302734375, 123.86399841308594, 1.74664306640625, -10.337501525878906, 112.6146011352539, 160.12669372558594, 91.6002197265625, 63.01141357421875, 128.1685791015625, -12.058364868164062, 77.38992309570312, 102.81401062011719, 32.267433166503906, 90.1689224243164, 51.6326904296875, 19.00482177734375, 20.062946319580078, 31.99749755859375, 265.6047058105469, 54.32929992675781, 140.32553100585938, 111.85478210449219, 112.93041229248047, -17.105552673339844, -67.93282318115234, 36.76530456542969, 280.57806396484375, 134.99215698242188, -10.786109924316406, 17.615768432617188, 80.27235412597656, -22.315824508666992, -7.659538269042969, 72.49246215820312, 83.06759643554688, 10.790071487426758, 96.47029876708984, 58.19887161254883, 183.66036987304688, 1.9098358154296875], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000235.npy"}
{"epoch": 0.34508076358296624, "step": 236, "batch_size": 64, "mean": 87.5325927734375, "std": 76.85736083984375, "min": -43.046173095703125, "p10": 3.7768196105957044, "median": 69.88385391235352, "p90": 200.3757415771485, "max": 289.7439270019531, "pos_frac": 0.921875, "sample": [210.462646484375, 40.970420837402344, -14.513565063476562, 84.51817321777344, 123.00225830078125, 67.67436218261719, 206.69277954101562, 25.446083068847656, 65.74214172363281, 70.7432861328125, 73.37210083007812, 208.54440307617188, 73.97286987304688, 108.3892593383789, 109.94890594482422, 161.92721557617188, 42.16796875, 145.82469177246094, 9.462944030761719, 220.15321350097656, 63.31956481933594, -43.046173095703125, 41.46345520019531, 134.945556640625, 171.19229125976562, 24.141021728515625, 52.843223571777344, 185.635986328125, 164.73944091796875, 40.28955841064453, 289.7439270019531, 44.272438049316406, 9.245590209960938, 128.4142303466797, 58.17942428588867, 141.46554565429688, -8.958473205566406, 246.6328125, 69.02442169189453, 20.795177459716797, 67.23217010498047, 29.91895294189453, 16.79640007019043, 5.927604675292969, 3.2077903747558594, 168.08802795410156, 59.62283706665039, 24.2691650390625, -34.018009185791016, 3.3191909790039062, 100.79988098144531, 251.01504516601562, 181.35899353027344, 28.665206909179688, 136.83201599121094, 80.43020629882812, 167.017333984375, 75.61901092529297, 4.8446197509765625, 123.58360290527344, -21.571319580078125, 153.67332458496094, 14.977306365966797, 91.63502502441406], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000236.npy"}
{"epoch": 0.3465491923641703, "step": 237, "batch_size": 64, "mean": 75.69317626953125, "std": 88.49095153808594, "min": -145.927490234375, "p10": -20.37361068725586, "median": 69.79029846191406, "p90": 190.9308868408203, "max": 337.3763427734375, "pos_frac": 0.828125, "sample": [79.73147583007812, 120.52880859375, 191.751220703125, 149.50241088867188, -1.4454498291015625, 101.3619155883789, 66.73064422607422, 97.77937316894531, 0.3526191711425781, 64.10052490234375, 49.42875671386719, 34.62530517578125, 254.2876434326172, 49.37408447265625, 132.07504272460938, 31.468582153320312, -28.412738800048828, 86.17103576660156, -19.75267791748047, 98.08617401123047, 107.10565185546875, -20.639724731445312, 190.94131469726562, 337.3763427734375, -33.42634582519531, 248.69635009765625, 122.11172485351562, 10.948837280273438, 29.027236938476562, 3.7284507751464844, 259.81573486328125, 82.23847961425781, 137.84681701660156, 159.2753143310547, 29.299400329589844, 106.45774841308594, 2.7586746215820312, 41.078529357910156, 133.9694061279297, 8.213264465332031, 147.38882446289062, 10.577980041503906, 155.14425659179688, 121.2486572265625, 223.94483947753906, 15.683380126953125, -8.80810546875, 72.8499526977539, -145.927490234375, 190.90655517578125, -6.202901840209961, 34.428794860839844, -65.72257995605469, -24.894065856933594, 117.72286224365234, 164.2130889892578, 3.5838851928710938, 149.3280029296875, 13.341468811035156, -38.40559387207031, 7.7159423828125, 88.44986724853516, 18.76104736328125, 84.4666748046875], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000237.npy"}
{"epoch": 0.34801762114537443, "step": 238, "batch_size": 64, "mean": 69.70079040527344, "std": 84.61013793945312, "min": -77.50553894042969, "p10": -10.664713668823241, "median": 54.80696105957031, "p90": 157.01440887451173, "max": 351.78521728515625, "pos_frac": 0.828125, "sample": [3.8325347900390625, 2.9866943359375, 7.152191162109375, 94.02165222167969, 23.790149688720703, 9.648139953613281, 70.15098571777344, 126.77777099609375, -16.31142234802246, 202.03021240234375, 58.145912170410156, 108.7054672241211, 58.435096740722656, 27.95834732055664, 19.840795516967773, 107.29364013671875, 156.78082275390625, -33.35670471191406, 26.562177658081055, 43.595176696777344, -4.399869918823242, 351.78521728515625, 24.400131225585938, 153.5183868408203, -21.97917938232422, 299.4349060058594, 45.35523986816406, 30.483055114746094, 153.77484130859375, -33.71116638183594, 85.0301513671875, 36.26863098144531, 45.61187744140625, 99.96773529052734, 159.52023315429688, 14.194019317626953, 13.824337005615234, 69.11270141601562, 333.14642333984375, 80.5370864868164, -8.97439193725586, 4.1871337890625, 26.495792388916016, -49.892845153808594, 10.484817504882812, 51.46800994873047, 81.70909118652344, 113.65402221679688, 58.32640838623047, 217.2860107421875, 127.88949584960938, -8.94781494140625, 95.88433837890625, 148.91497802734375, -77.50553894042969, 157.11451721191406, 127.48595428466797, 59.92073059082031, -11.389137268066406, -8.2366943359375, 87.25012969970703, 101.0738525390625, 21.8026123046875, 100.9346923828125], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000238.npy"}
{"epoch": 0.34948604992657856, "step": 239, "batch_size": 64, "mean": 80.02545166015625, "std": 85.3837661743164, "min": -71.46484375, "p10": -9.9886173248291, "median": 64.71949005126953, "p90": 225.1063308715821, "max": 254.66311645507812, "pos_frac": 0.828125, "sample": [254.66311645507812, 155.53302001953125, -3.657806396484375, 206.21128845214844, -9.220069885253906, 52.0908203125, 82.26396179199219, 102.97857666015625, -10.221187591552734, -49.500579833984375, 30.051441192626953, 31.53875732421875, 25.10364532470703, 135.92474365234375, 41.55043029785156, 0.7964096069335938, 246.68795776367188, 68.10689544677734, 110.78795623779297, 101.21378326416016, 143.74766540527344, 79.42906188964844, 128.1273651123047, -40.00648498535156, 1.5847091674804688, 24.45033073425293, 111.71701049804688, 233.60552978515625, 97.38206481933594, 61.03153991699219, 0.9419040679931641, 29.476318359375, 231.0548553466797, 249.5428924560547, -49.21501159667969, 61.33208465576172, -3.7342453002929688, 211.2264404296875, -31.182357788085938, 164.3995361328125, 40.54788589477539, 192.89561462402344, 102.71764373779297, 116.82556915283203, 14.205093383789062, 40.86016082763672, -71.46484375, 20.679908752441406, -35.32650375366211, 97.3369140625, 79.40916442871094, 38.899078369140625, 180.0356903076172, 42.481754302978516, 250.68003845214844, 130.2012176513672, 248.52674865722656, 38.03714370727539, 84.78530883789062, 76.7981948852539, 30.053756713867188, -9.445953369140625, 88.44511413574219, 45.62962341308594], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000239.npy"}
{"epoch": 0.3509544787077827, "step": 240, "batch_size": 64, "mean": 74.52180480957031, "std": 86.9332275390625, "min": -85.19563293457031, "p10": -47.182191848754876, "median": 74.227294921875, "p90": 203.5898590087891, "max": 288.8628234863281, "pos_frac": 0.78125, "sample": [90.85713958740234, 130.1270294189453, 111.12824249267578, 40.5008544921875, 101.49699401855469, 221.97572326660156, 39.006439208984375, 151.92538452148438, -30.46875, -33.183937072753906, 64.9080810546875, 57.94996643066406, -60.91326904296875, 135.46981811523438, 205.6141357421875, 60.91419219970703, -51.09762191772461, 21.8111572265625, 12.49053955078125, 152.1361083984375, 76.4841079711914, 208.1575469970703, 81.00492858886719, 88.96036529541016, 38.41419982910156, 23.816173553466797, -76.23702239990234, 43.64342498779297, 69.89862060546875, 240.6573944091797, 169.97389221191406, 94.57894897460938, 70.25962829589844, 40.075172424316406, -67.39813232421875, 198.86654663085938, 106.793701171875, 143.55091857910156, -60.44007873535156, 71.9704818725586, -11.60104751586914, 154.26133728027344, -66.17578125, 42.84708786010742, 288.8628234863281, -34.24815368652344, 146.340087890625, -20.464113235473633, 174.83123779296875, 93.04747009277344, 66.94667053222656, 232.0267333984375, 24.148406982421875, 78.92061614990234, -38.04618835449219, 106.36946868896484, 97.12434387207031, -6.26081657409668, 116.22219848632812, 104.29685974121094, 28.24755096435547, 81.79641723632812, 209.4190673828125, -85.19563293457031], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000240.npy"}
{"epoch": 0.3524229074889868, "step": 241, "batch_size": 64, "mean": 103.05683898925781, "std": 107.9900131225586, "min": -120.38542175292969, "p10": -7.248532009124752, "median": 89.00009536743164, "p90": 228.1500411987305, "max": 459.0991516113281, "pos_frac": 0.859375, "sample": [63.06163787841797, 84.56045532226562, -62.72016143798828, -22.399484634399414, -14.498367309570312, 129.20372009277344, 95.27716827392578, 229.54490661621094, 78.76573181152344, 94.35199737548828, 105.1484375, 12.587562561035156, -0.5904273986816406, 306.03204345703125, -2.55178165435791, 100.96629333496094, 15.65585708618164, 171.86837768554688, 143.2725067138672, 122.56221008300781, 167.45643615722656, 99.2900390625, 79.70165252685547, 53.43992614746094, 459.0991516113281, 193.68304443359375, 90.8653564453125, 98.5759048461914, 370.8206481933594, 11.661819458007812, 417.15521240234375, 180.54872131347656, 224.89535522460938, 61.08409118652344, 60.4755859375, 44.074180603027344, 101.42586517333984, 85.6699447631836, 15.334686279296875, 83.80844116210938, 87.13483428955078, 39.89690399169922, 52.19091796875, 12.413131713867188, 166.917236328125, 17.93841552734375, 124.2051010131836, -9.261425018310547, 148.1637420654297, 115.6976318359375, 40.69171142578125, 60.18012237548828, 151.65585327148438, 247.95263671875, 130.91497802734375, 81.85041046142578, -102.052001953125, 177.0019073486328, 181.32400512695312, 174.33999633789062, -120.38542175292969, 260.3718566894531, 34.43781661987305, -27.10772705078125], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000241.npy"}
{"epoch": 0.35389133627019087, "step": 242, "batch_size": 64, "mean": 71.43954467773438, "std": 87.3676528930664, "min": -82.52740478515625, "p10": -14.944469451904286, "median": 55.07327651977539, "p90": 176.28818969726566, "max": 384.26324462890625, "pos_frac": 0.84375, "sample": [-37.59490203857422, 65.5892333984375, 54.59740447998047, 122.49531555175781, 16.821483612060547, -82.52740478515625, 32.096160888671875, 3.452556610107422, 150.3131561279297, 36.454498291015625, 12.331262588500977, 96.96871948242188, 291.4627990722656, 23.965482711791992, -22.083824157714844, 34.19615173339844, 49.70488739013672, 62.95074462890625, 76.31431579589844, 13.568513870239258, 115.38983154296875, 13.228584289550781, 133.0157012939453, 121.20781707763672, 125.70431518554688, -3.8502731323242188, 112.99066925048828, 51.23255157470703, 71.95673370361328, -19.2796630859375, 106.78929901123047, 115.62539672851562, 43.065032958984375, -68.74542236328125, 165.02444458007812, 67.64875030517578, 181.11550903320312, 254.42877197265625, 45.272865295410156, -1.3346099853515625, 55.54914855957031, 93.53410339355469, 23.354812622070312, 384.26324462890625, 18.851585388183594, 42.92284393310547, 266.88848876953125, 63.550079345703125, 16.644699096679688, 34.83526611328125, -36.322227478027344, 268.2391357421875, 68.68861389160156, 7.986351013183594, 83.17607116699219, 24.1995849609375, 75.94331359863281, -29.106159210205078, 81.70196533203125, 76.65350341796875, 2.6786441802978516, 229.52847290039062, 91.63528442382812, -4.829017639160156], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000242.npy"}
{"epoch": 0.355359765051395, "step": 243, "batch_size": 64, "mean": 95.98948669433594, "std": 91.78363800048828, "min": -83.7388916015625, "p10": -1.2512569427490228, "median": 81.02103805541992, "p90": 209.48662261962892, "max": 403.6221008300781, "pos_frac": 0.875, "sample": [206.64930725097656, -1.5264015197753906, 46.499488830566406, 251.9597625732422, 136.09808349609375, 92.30242156982422, 79.52252197265625, 123.8870620727539, -47.272647857666016, -38.414344787597656, -46.967926025390625, 47.525115966796875, 181.50978088378906, 36.56105041503906, 19.233139038085938, 71.55128479003906, 4.5250701904296875, 196.8685302734375, 74.99711608886719, 84.578125, -83.7388916015625, 26.813385009765625, 92.89285278320312, 34.048484802246094, 18.71916389465332, 196.1477813720703, 32.3759765625, 80.49906921386719, 86.50069427490234, 165.92340087890625, 142.69482421875, 158.92752075195312, 9.039255142211914, 100.78652954101562, -7.249542236328125, 193.96310424804688, 102.40850830078125, 245.46484375, 145.7682342529297, 146.92514038085938, 208.8757781982422, 52.194053649902344, 272.23748779296875, 403.6221008300781, 268.84649658203125, 62.98040771484375, 60.256736755371094, 2.641979217529297, 209.7484130859375, -10.874519348144531, 81.54300689697266, 95.86872100830078, 43.52117919921875, 87.19631958007812, 273.1330261230469, 52.075599670410156, 111.13905334472656, 60.41583251953125, -0.6092529296875, 145.44784545898438, 39.48780822753906, 43.54200744628906, 62.954139709472656, 108.08635711669922], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000243.npy"}
{"epoch": 0.3568281938325991, "step": 244, "batch_size": 64, "mean": 92.55368041992188, "std": 82.52848815917969, "min": -69.19011688232422, "p10": -6.060930633544919, "median": 80.80603790283203, "p90": 220.624398803711, "max": 254.83392333984375, "pos_frac": 0.875, "sample": [121.07957458496094, 29.76789093017578, 53.47725296020508, 72.8084945678711, 12.953838348388672, 31.659236907958984, 249.23355102539062, 95.82574462890625, -24.969757080078125, 8.31787109375, 104.78802490234375, 114.45219421386719, -17.934982299804688, 185.91456604003906, 30.499603271484375, 121.28939819335938, -2.9208106994628906, 251.04586791992188, 20.701801300048828, 81.34980010986328, -69.19011688232422, 71.70742797851562, 164.56947326660156, -45.05963897705078, 175.78826904296875, 22.087488174438477, 57.23084259033203, 145.89825439453125, 52.452239990234375, 2.9876785278320312, 145.0691375732422, 60.351234436035156, 184.9365234375, 224.80569458007812, 238.7938232421875, 203.69187927246094, 113.40248107910156, 254.83392333984375, 37.5421142578125, 251.4877471923828, 126.52008056640625, -9.180356979370117, 210.8680419921875, 106.04315948486328, 119.53364562988281, -19.140396118164062, 74.7502212524414, 6.515958786010742, 5.496246337890625, 89.26657104492188, 49.3629150390625, 32.778587341308594, 80.26227569580078, 63.556732177734375, 47.84375, 64.22001647949219, 175.82244873046875, 149.4853515625, -7.406696319580078, 104.14996337890625, 85.244384765625, 226.727783203125, 111.11180114746094, 196.87725830078125], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000244.npy"}
{"epoch": 0.35829662261380324, "step": 245, "batch_size": 64, "mean": 81.5028076171875, "std": 110.91514587402344, "min": -182.70291137695312, "p10": -17.812093734741207, "median": 58.13947296142578, "p90": 205.60277404785163, "max": 414.2124328613281, "pos_frac": 0.8125, "sample": [46.20904541015625, 259.82086181640625, 123.01516723632812, 86.28248596191406, 91.43447875976562, 85.48130798339844, 44.60966491699219, -5.4409332275390625, 144.17684936523438, 76.46957397460938, 37.870635986328125, 74.09913635253906, 131.89454650878906, 33.5860481262207, 25.32840919494629, 124.59918975830078, 151.0701904296875, 173.2587432861328, 49.378692626953125, 7.400241851806641, 341.2584228515625, 84.25743103027344, -15.391651153564453, 252.1581268310547, 5.629587173461914, 11.334915161132812, 80.76226043701172, -26.96389389038086, -101.66258239746094, 5.644752502441406, 174.50010681152344, 117.03941345214844, 399.33062744140625, -43.28034973144531, 123.6295394897461, 0.04175567626953125, -0.7234268188476562, 42.883079528808594, 176.46435546875, 100.36949157714844, -1.2718925476074219, 78.3968505859375, 118.90760803222656, -18.84942626953125, 43.04961395263672, -182.70291137695312, 184.2849884033203, 191.75827026367188, 38.367225646972656, 123.38893127441406, 211.5361328125, 3.3134765625, 365.95013427734375, -19.604827880859375, -2.850841522216797, 414.2124328613281, 66.90025329589844, 81.35028839111328, 13.946739196777344, -51.603240966796875, 31.444805145263672, 4.1195526123046875, 33.142173767089844, 1.1673011779785156], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000245.npy"}
{"epoch": 0.35976505139500736, "step": 246, "batch_size": 64, "mean": 79.59729766845703, "std": 100.67549896240234, "min": -104.14811706542969, "p10": -15.350130462646481, "median": 43.967994689941406, "p90": 229.41325683593752, "max": 406.0634460449219, "pos_frac": 0.859375, "sample": [96.7999267578125, 226.07437133789062, 64.11715698242188, 21.67095184326172, 214.28347778320312, 144.5067596435547, 406.0634460449219, -79.46701049804688, 12.910579681396484, -104.14811706542969, 39.62567901611328, 88.8723373413086, 34.556583404541016, 9.51323127746582, 118.54519653320312, 238.5714569091797, 234.70196533203125, 17.306419372558594, 37.87397766113281, 52.5428466796875, 36.568878173828125, 239.77459716796875, 120.96026611328125, 13.308723449707031, 89.52696228027344, -0.49273681640625, 155.29530334472656, -16.8311767578125, 190.93902587890625, -41.140724182128906, 4.5556488037109375, 29.901939392089844, 68.95097351074219, 42.62480163574219, 158.49456787109375, 11.465965270996094, 36.48442840576172, 31.908069610595703, 33.06220245361328, 109.57389068603516, 230.84420776367188, 59.41614532470703, 20.943519592285156, 123.36659240722656, 48.56812286376953, 81.12486267089844, 39.85542678833008, 16.902816772460938, 5.705913543701172, 112.08103942871094, -90.83297729492188, 42.985267639160156, 201.18826293945312, 61.516815185546875, 33.46684265136719, 44.950721740722656, 11.390420913696289, 370.8511962890625, 179.01937866210938, 251.9425506591797, -11.894355773925781, 133.2138671875, -31.686172485351562, -30.546215057373047], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000246.npy"}
{"epoch": 0.36123348017621143, "step": 247, "batch_size": 64, "mean": 112.23686218261719, "std": 115.15831756591797, "min": -149.99661254882812, "p10": -4.8081438064575135, "median": 90.52703475952148, "p90": 253.62225189208985, "max": 441.7412414550781, "pos_frac": 0.890625, "sample": [-117.62267303466797, 41.9307861328125, 38.17381286621094, 254.1448211669922, 66.26190185546875, 325.3497009277344, 128.4072265625, 205.7509002685547, 1.163278579711914, 187.22518920898438, 233.65353393554688, 252.40292358398438, 112.624755859375, 247.49179077148438, 37.629249572753906, 202.99234008789062, 322.40228271484375, 21.24585723876953, 241.422119140625, 326.5101318359375, 73.22474670410156, 16.000635147094727, 55.512786865234375, 34.159690856933594, 11.288185119628906, 136.2158660888672, 11.423095703125, 80.02067565917969, 92.55712127685547, 67.55725860595703, 138.13009643554688, 187.2008514404297, 50.77411651611328, 66.58964538574219, 66.75912475585938, 228.31317138671875, 223.11862182617188, 160.85467529296875, 112.21376037597656, -82.39385223388672, 5.352947235107422, 98.13627624511719, -21.09637451171875, 88.4969482421875, 43.71929931640625, 21.197174072265625, 102.29055786132812, 180.58447265625, 341.148681640625, 83.10992431640625, 26.002967834472656, -34.799102783203125, 66.38069152832031, 441.7412414550781, 109.34292602539062, -19.181114196777344, 113.30120086669922, 173.6711883544922, 164.31805419921875, -149.99661254882812, 278.658203125, -7.3673248291015625, 41.283355712890625, 178.18353271484375], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000247.npy"}
{"epoch": 0.36270190895741555, "step": 248, "batch_size": 64, "mean": 88.36830139160156, "std": 75.74627685546875, "min": -28.524600982666016, "p10": 2.234283256530765, "median": 75.56774139404297, "p90": 198.03738555908205, "max": 274.51727294921875, "pos_frac": 0.90625, "sample": [185.463134765625, 163.97634887695312, 56.550201416015625, 122.05162811279297, 104.11614227294922, 0.7550754547119141, 40.63999938964844, 209.67147827148438, 49.77560043334961, 12.26800537109375, 16.204429626464844, 189.86648559570312, 91.02337646484375, 14.864347457885742, 120.70558166503906, 5.685768127441406, 19.596040725708008, -8.037538528442383, 79.78250885009766, 52.31711959838867, 73.71871185302734, 77.78722381591797, 153.98983764648438, 238.52017211914062, 25.909820556640625, -6.94781494140625, 96.17533111572266, 153.79864501953125, 122.53718566894531, 134.4070281982422, 81.64826965332031, -10.783950805664062, 63.56220245361328, 172.8685302734375, 32.95616912841797, 137.6786651611328, 23.465286254882812, 18.778778076171875, 37.58940887451172, 34.42094039916992, 95.38282775878906, 201.53919982910156, 119.5314712524414, 252.89569091796875, 64.73319244384766, 41.01530456542969, 274.51727294921875, 189.79312133789062, 26.854583740234375, 204.05604553222656, -16.700706481933594, 43.246788024902344, 205.42501831054688, 118.07881164550781, 165.9180908203125, 186.089599609375, -3.4382781982421875, 25.582305908203125, 77.4167709350586, 130.48529052734375, 32.309837341308594, 53.773014068603516, 6.234775543212891, -28.524600982666016], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000248.npy"}
{"epoch": 0.3641703377386197, "step": 249, "batch_size": 64, "mean": 63.81077194213867, "std": 96.50648498535156, "min": -151.77320861816406, "p10": -43.78258819580078, "median": 59.58405685424805, "p90": 186.21284942626954, "max": 379.75811767578125, "pos_frac": 0.6875, "sample": [-41.10609436035156, 178.0739288330078, -4.3171539306640625, -26.693117141723633, 43.460411071777344, -35.26293182373047, 13.38262939453125, 72.19891357421875, 65.45272827148438, 204.09622192382812, 19.627410888671875, 28.708404541015625, 30.435441970825195, 55.551597595214844, 77.362548828125, -151.77320861816406, -18.851558685302734, 188.42572021484375, 120.07844543457031, -32.900413513183594, 102.02078247070312, 118.15242004394531, -26.509658813476562, 98.36492156982422, -5.6207733154296875, 185.6666259765625, -51.334564208984375, 195.2268524169922, 157.19589233398438, 120.84077453613281, -17.337032318115234, 90.05264282226562, 107.79383087158203, 24.608901977539062, 59.344078063964844, 18.70977020263672, 108.97595977783203, -111.24388122558594, 110.11729431152344, 216.4532470703125, 144.02517700195312, -5.6328125, 379.75811767578125, -33.16655731201172, -97.6558609008789, -48.425262451171875, 77.5250015258789, -19.224369049072266, 104.50204467773438, 108.22927856445312, 153.88034057617188, 149.0711669921875, -55.07842254638672, 170.8533935546875, 197.65780639648438, 166.67726135253906, 26.63495635986328, -44.929656982421875, -13.762245178222656, 59.82403564453125, 6.087013244628906, 179.28048706054688, 186.4469451904297, 3.8836212158203125], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000249.npy"}
{"epoch": 0.3656387665198238, "step": 250, "batch_size": 64, "mean": 91.13978576660156, "std": 74.37265014648438, "min": -88.32813262939453, "p10": -9.29341011047363, "median": 90.42203903198242, "p90": 197.93516998291017, "max": 236.00201416015625, "pos_frac": 0.859375, "sample": [192.327392578125, 60.94414520263672, 31.891387939453125, -25.31680679321289, -54.686492919921875, 123.70024108886719, 70.87438201904297, 207.41024780273438, 100.8670654296875, -3.60797119140625, 51.316741943359375, 194.3662872314453, 117.87338256835938, 153.93446350097656, 68.79072570800781, 89.72407531738281, 20.62847137451172, 85.68582153320312, 222.46246337890625, 156.64190673828125, 109.5804443359375, -29.523712158203125, -10.934226989746094, 96.65080261230469, 60.731300354003906, 169.96847534179688, 69.07305908203125, 2.4414310455322266, 35.433197021484375, 80.10122680664062, 10.582662582397461, 39.17735290527344, 53.149757385253906, 199.46469116210938, -7.147029876708984, 229.91358947753906, 116.92938232421875, 100.73241424560547, -13.587684631347656, 26.783226013183594, 79.33430480957031, 225.0490264892578, 170.5232696533203, 99.58267211914062, 94.2642822265625, 55.36644744873047, 135.8143310546875, 77.96463775634766, 122.37742614746094, 66.33938598632812, 90.61685180664062, -10.213287353515625, 90.22722625732422, 152.8436279296875, -88.32813262939453, 116.26766204833984, 199.91256713867188, 179.8639373779297, 104.24564361572266, 236.00201416015625, 120.47267150878906, 36.01628112792969, 136.270263671875, 136.78448486328125], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000250.npy"}
{"epoch": 0.3671071953010279, "step": 251, "batch_size": 64, "mean": 76.52996826171875, "std": 94.09739685058594, "min": -158.09820556640625, "p10": -27.288331985473626, "median": 62.51838302612305, "p90": 207.41206970214847, "max": 313.6056213378906, "pos_frac": 0.8125, "sample": [238.42724609375, 5.555551528930664, 249.4114532470703, 45.70240020751953, 132.03106689453125, -37.32737731933594, -116.02256774902344, 313.6056213378906, 47.076148986816406, 119.24630737304688, 179.107421875, -49.684471130371094, 155.79776000976562, 29.374231338500977, 51.86372756958008, -19.223297119140625, -158.09820556640625, 259.130615234375, 4.06968879699707, 124.81535339355469, 19.15957260131836, 161.78379821777344, 5.199848175048828, 34.885719299316406, -5.0393218994140625, 105.62962341308594, -29.871726989746094, -42.85643005371094, 71.58885192871094, 209.65118408203125, 47.7991943359375, -34.503753662109375, 154.69908142089844, 22.737808227539062, 58.22345733642578, 142.57949829101562, 35.996315002441406, 59.47820281982422, 81.59539794921875, 119.17585754394531, 13.398605346679688, 123.40696716308594, 6.981023788452148, 186.3370819091797, 2.4001235961914062, 202.18746948242188, 72.14668273925781, 150.97686767578125, 71.64893341064453, 84.25284576416016, -16.08284568786621, 25.597885131835938, 284.30322265625, 105.39439392089844, 222.7132568359375, 147.531005859375, -1.2964115142822266, 134.65219116210938, 77.49186706542969, -21.26041030883789, 45.743675231933594, 81.18856811523438, 39.87565612792969, 65.55856323242188], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000251.npy"}
{"epoch": 0.368575624082232, "step": 252, "batch_size": 64, "mean": 92.27542114257812, "std": 77.71553802490234, "min": -235.7372589111328, "p10": 10.488803100585939, "median": 94.39253997802734, "p90": 188.0739685058594, "max": 239.95726013183594, "pos_frac": 0.921875, "sample": [190.35948181152344, 154.5028076171875, 73.02998352050781, 149.94912719726562, 171.0669708251953, 27.5711669921875, 98.40030670166016, 141.80601501464844, 54.450592041015625, 124.5822982788086, 111.98777770996094, 12.83681869506836, 118.41808319091797, 20.187667846679688, 90.38477325439453, 116.05574035644531, 76.22950744628906, 62.24249267578125, 129.56939697265625, 10.783378601074219, 180.1183319091797, 113.96711730957031, 42.35795211791992, 182.74110412597656, 139.87939453125, -33.78505325317383, -9.811990737915039, 49.46636962890625, 10.385787963867188, 68.189697265625, -16.89535140991211, 63.4683837890625, 104.2579345703125, 140.94464111328125, 83.20669555664062, 73.14926147460938, 74.81324768066406, 229.82199096679688, 201.67919921875, 193.31939697265625, 59.1846923828125, 50.339874267578125, 113.62654113769531, 33.32261657714844, 115.38983154296875, 221.9481658935547, 195.109619140625, 139.2810821533203, 98.40716552734375, 152.8797149658203, 10.729171752929688, -235.7372589111328, 180.5850372314453, 181.82281494140625, 73.25003051757812, 66.45712280273438, -7.722133636474609, 79.60649871826172, 239.95726013183594, 11.23971176147461, 74.63481140136719, 9.54937744140625, 114.150146484375, 101.92684173583984], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000252.npy"}
{"epoch": 0.3700440528634361, "step": 253, "batch_size": 64, "mean": 83.24456024169922, "std": 95.91144561767578, "min": -209.05767822265625, "p10": -13.286424064636229, "median": 77.91287231445312, "p90": 219.45238800048833, "max": 318.10552978515625, "pos_frac": 0.828125, "sample": [-14.039926528930664, 10.064537048339844, 157.39501953125, 23.510391235351562, -6.6081695556640625, 61.88787078857422, 156.492919921875, 54.91942596435547, 164.6990203857422, 47.35243225097656, 75.88786315917969, 318.10552978515625, 136.61251831054688, 16.679222106933594, 82.53311157226562, 94.50851440429688, 226.7381591796875, 49.76339340209961, 1.0097827911376953, 68.5084457397461, -11.305524826049805, -209.05767822265625, 87.59188842773438, 227.31094360351562, 224.21087646484375, 7.890308380126953, 291.8160400390625, -11.528251647949219, -3.759918212890625, 23.912334442138672, 12.011161804199219, 15.556673049926758, 79.93788146972656, 131.876708984375, 198.03170776367188, 173.41322326660156, 83.73876953125, -60.322608947753906, 222.94297790527344, 82.63021087646484, 121.7659912109375, 190.8838348388672, 117.88776397705078, 184.45819091796875, 211.30767822265625, 50.47088623046875, 109.37283325195312, 176.72299194335938, 38.86042022705078, -105.3316650390625, 126.03935241699219, 113.26447296142578, -48.2220458984375, 23.630887985229492, 3.4126434326171875, 66.77027893066406, 81.65021514892578, 117.47946166992188, 160.84246826171875, -20.2244873046875, -14.345462799072266, 28.992034912109375, 235.73403930664062, 63.3111572265625], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000253.npy"}
{"epoch": 0.37151248164464024, "step": 254, "batch_size": 64, "mean": 77.70970153808594, "std": 112.8464584350586, "min": -105.9693603515625, "p10": -59.446203994750974, "median": 58.2801628112793, "p90": 222.8512924194336, "max": 378.7037353515625, "pos_frac": 0.765625, "sample": [14.13916015625, 3.18505859375, 116.45272064208984, 59.72742462158203, 79.50640869140625, 112.70603942871094, -12.816116333007812, -43.52338409423828, 54.39056396484375, 23.202850341796875, 87.53498077392578, 143.6342315673828, 135.86099243164062, 88.20884704589844, -61.898983001708984, -74.52484130859375, 138.7003173828125, 4.430484771728516, 99.28706359863281, 107.03973388671875, -21.63259506225586, 165.63308715820312, 298.3971252441406, 211.85723876953125, 212.3388671875, -88.81192016601562, 202.36634826660156, 224.72482299804688, -60.46543884277344, 23.58146095275879, 22.756006240844727, 142.07594299316406, 69.13426208496094, -101.8470458984375, 19.64148712158203, 130.89208984375, -10.477218627929688, 275.11419677734375, -57.067989349365234, 42.01742935180664, -6.458534240722656, -105.9693603515625, 31.709814071655273, 265.281494140625, -10.158994674682617, 134.26504516601562, 3.6806983947753906, 218.47972106933594, 15.162286758422852, 112.78942108154297, 365.0616455078125, 26.1319580078125, 43.644134521484375, 5.835479736328125, -87.76398468017578, 378.7037353515625, 76.63388061523438, 138.32113647460938, 97.42710876464844, 56.83290100097656, -18.252487182617188, 340.0243835449219, 85.19281005859375, 31.375091552734375], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000254.npy"}
{"epoch": 0.37298091042584436, "step": 255, "batch_size": 64, "mean": 85.83883666992188, "std": 113.79019927978516, "min": -235.89837646484375, "p10": -21.461704254150384, "median": 67.40051651000977, "p90": 245.55048980712897, "max": 356.76336669921875, "pos_frac": 0.796875, "sample": [355.6654052734375, -14.23529052734375, 34.79225158691406, 120.59092712402344, 113.5357666015625, -10.114501953125, -4.705526351928711, 18.918106079101562, -235.89837646484375, -4.988559722900391, 190.22605895996094, 80.10639953613281, 6.368682861328125, 19.459461212158203, 162.33328247070312, 119.53274536132812, -119.5986099243164, -118.06340026855469, 93.7620849609375, 195.87271118164062, 173.32952880859375, 7.600603103637695, -14.813087463378906, 17.273700714111328, 85.56494140625, 17.80414581298828, -47.09063720703125, 257.74029541015625, -8.886638641357422, 171.2752685546875, 38.847442626953125, 141.1808319091797, 71.75302124023438, 270.1406555175781, 253.2316131591797, 168.86370849609375, 221.34884643554688, 220.75601196289062, 219.61143493652344, 356.76336669921875, 26.24321746826172, -27.336217880249023, 189.78439331054688, 98.93804168701172, 23.284893035888672, 85.73815155029297, 40.23797607421875, 138.0260009765625, -65.8751220703125, 250.75360107421875, 268.447265625, 65.83935546875, 52.088768005371094, 22.31671905517578, 0.8478775024414062, 37.745880126953125, 68.96167755126953, 61.80207061767578, 15.877006530761719, 79.53001403808594, -24.311111450195312, 48.595821380615234, 176.88491821289062, 233.40989685058594], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000255.npy"}
{"epoch": 0.3744493392070485, "step": 256, "batch_size": 64, "mean": 92.20353698730469, "std": 104.55828094482422, "min": -90.98136901855469, "p10": -33.814755249023435, "median": 85.40142059326172, "p90": 236.38202819824224, "max": 407.0679931640625, "pos_frac": 0.796875, "sample": [91.20889282226562, 407.0679931640625, 89.56754302978516, 150.1824493408203, 143.2633056640625, 171.29800415039062, 10.964944839477539, 122.34940338134766, -35.563899993896484, 29.4874267578125, 159.1566925048828, 203.925048828125, 94.08352661132812, 131.72674560546875, 190.20089721679688, -49.493919372558594, 62.651485443115234, 158.1031036376953, -63.753173828125, 35.364139556884766, 54.94343566894531, 53.989601135253906, -85.04457092285156, 164.87879943847656, 50.89372634887695, 173.35365295410156, -1.1345291137695312, -33.01116943359375, -49.29486083984375, 85.37567138671875, 242.86569213867188, 2.9015731811523438, 134.615966796875, 85.42716979980469, 9.169315338134766, 108.33683013916016, 158.2083740234375, 21.970504760742188, 45.78239440917969, 35.03587341308594, -90.98136901855469, -16.775238037109375, 11.536758422851562, 48.51641845703125, 44.78785705566406, -29.89017105102539, 78.73211669921875, 117.05541229248047, -14.707626342773438, 144.38937377929688, -34.159149169921875, 117.90927124023438, 38.16779327392578, 156.28485107421875, 252.5420379638672, 294.37933349609375, 149.9545440673828, 52.82985305786133, 329.47796630859375, 293.8324279785156, 129.0740966796875, 255.0615692138672, -9.299459457397461, 221.25347900390625], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000256.npy"}
{"epoch": 0.37591776798825255, "step": 257, "batch_size": 64, "mean": 96.6123275756836, "std": 112.2756118774414, "min": -91.73538208007812, "p10": -17.37272148132324, "median": 80.00253295898438, "p90": 260.45836181640635, "max": 411.9151916503906, "pos_frac": 0.796875, "sample": [233.3985595703125, 9.62918472290039, 64.96934509277344, 113.1658935546875, 229.81991577148438, 17.169570922851562, 157.07806396484375, 208.77566528320312, 144.99449157714844, 100.8879165649414, -2.0705013275146484, -18.526390075683594, 60.39271545410156, 192.3109588623047, 140.44287109375, 5.407806396484375, 136.49868774414062, 153.63636779785156, 28.29574203491211, 272.055419921875, 71.91741943359375, 301.8061828613281, 192.53684997558594, -91.73538208007812, 80.72050476074219, 90.7858657836914, -60.10529708862305, -3.02972412109375, -14.680828094482422, 324.2948913574219, 145.8628692626953, 128.33920288085938, 186.96774291992188, 411.9151916503906, 64.45452117919922, -12.3853759765625, 79.28456115722656, 2.3558082580566406, 109.62345886230469, -79.69515228271484, 66.25238800048828, 285.3192443847656, -9.568004608154297, 183.86790466308594, 16.557220458984375, 106.1545181274414, 46.869041442871094, 22.96442413330078, 348.92791748046875, 154.61846923828125, 111.32781982421875, 165.0084228515625, 25.63013458251953, 97.5842056274414, 41.21238708496094, 318.9094543457031, -43.523441314697266, -2.4048004150390625, 69.45803833007812, -91.4189682006836, 3.0332183837890625, 1.52435302734375, -38.832977294921875, 126.15254211425781], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000257.npy"}
{"epoch": 0.37738619676945667, "step": 258, "batch_size": 64, "mean": 87.23058319091797, "std": 109.9209213256836, "min": -240.10540771484375, "p10": -28.09805374145507, "median": 77.95215606689453, "p90": 216.2250411987305, "max": 370.4894714355469, "pos_frac": 0.8125, "sample": [176.25796508789062, 30.971654891967773, -33.8706169128418, 182.92913818359375, 44.97850036621094, 107.1475601196289, 179.569580078125, 8.824981689453125, 118.89503479003906, 9.479354858398438, -16.899341583251953, 5.157203674316406, 10.139520645141602, 55.734649658203125, 61.61549377441406, -1.84552001953125, -20.995765686035156, 62.76673889160156, 175.2867889404297, 14.279991149902344, -1.0428619384765625, 174.56515502929688, 304.5262451171875, 129.6639862060547, 237.70614624023438, 180.28817749023438, 50.416168212890625, 23.683349609375, 27.3822021484375, 208.89019775390625, 116.23275756835938, 115.291015625, 219.36854553222656, 77.54324340820312, -94.33366394042969, 66.01177978515625, 177.04312133789062, -240.10540771484375, 63.35321807861328, 78.36106872558594, 250.58917236328125, -41.854061126708984, -13.566009521484375, 196.84576416015625, 36.13685607910156, -225.26309204101562, 370.4894714355469, 120.60904693603516, -58.094993591308594, 83.29131317138672, 103.736328125, -31.141891479492188, 152.3758087158203, 166.68096923828125, 255.9573974609375, 51.49799346923828, 180.6469268798828, 239.4917755126953, 148.79855346679688, 155.15000915527344, 51.25962829589844, 61.396446228027344, 137.84963989257812, 104.60672760009766], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000258.npy"}
{"epoch": 0.3788546255506608, "step": 259, "batch_size": 64, "mean": 78.04417419433594, "std": 98.88311004638672, "min": -132.63076782226562, "p10": -33.93754348754883, "median": 55.55257797241211, "p90": 212.4588760375977, "max": 317.67578125, "pos_frac": 0.78125, "sample": [163.55174255371094, -12.597040176391602, 156.98681640625, 127.70930480957031, 26.100200653076172, -132.63076782226562, 163.14022827148438, 282.2478942871094, 258.0736999511719, 106.4898681640625, 14.060569763183594, -1.5919418334960938, 124.03365325927734, 13.939750671386719, 20.419281005859375, -66.42781066894531, 153.90316772460938, -15.2989501953125, 257.23876953125, 119.57550811767578, 99.07986450195312, 37.96539306640625, 28.693756103515625, 32.45927810668945, 149.88255310058594, 13.124063491821289, 58.72422790527344, 67.02623748779297, 317.67578125, -87.83219909667969, 15.547348022460938, 39.053314208984375, 243.20404052734375, 129.78594970703125, 42.43102264404297, -0.7160797119140625, 107.59612274169922, 202.4796142578125, -33.71467590332031, 258.8189697265625, -36.26791000366211, 15.31512451171875, 177.5720977783203, -2.457927703857422, 62.50590515136719, 18.0804443359375, 199.58099365234375, 164.94210815429688, -61.97917175292969, 83.65824890136719, -18.36445426940918, 216.73570251464844, 179.6278076171875, 10.283599853515625, 165.72628784179688, -34.033058166503906, 44.1801872253418, 146.26541137695312, 81.28398132324219, -69.3675765991211, 18.679073333740234, 25.130855560302734, 75.14029693603516, 52.38092803955078], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000259.npy"}
{"epoch": 0.3803230543318649, "step": 260, "batch_size": 64, "mean": 88.66500091552734, "std": 96.79959869384766, "min": -76.06608581542969, "p10": -11.701538467407216, "median": 71.97487258911133, "p90": 230.897171020508, "max": 399.12591552734375, "pos_frac": 0.875, "sample": [143.85012817382812, -61.360145568847656, -31.970970153808594, -21.68741226196289, 46.666893005371094, 100.10894775390625, 87.09591674804688, 96.09696960449219, 143.34002685546875, 34.09393310546875, 31.494319915771484, 255.4654083251953, 69.58743286132812, 43.157012939453125, 84.83662414550781, 399.12591552734375, 115.63834381103516, 113.93030548095703, 13.269989013671875, 282.75006103515625, -76.06608581542969, 56.081817626953125, 38.33636474609375, 141.99859619140625, -61.24689483642578, -28.968482971191406, 107.19241333007812, 0.444122314453125, 143.53619384765625, 87.51727294921875, 56.965728759765625, 183.35482788085938, 14.548149108886719, 145.7478790283203, 74.17877197265625, 176.43026733398438, 276.7158203125, 74.76738739013672, 54.581668853759766, 38.96406555175781, -1.8858985900878906, 88.74734497070312, 183.2661590576172, 19.65533447265625, 101.66875457763672, 43.133975982666016, 2.4434051513671875, 53.38893127441406, 111.5696792602539, 129.3258819580078, 90.8201904296875, 69.7709732055664, 17.019325256347656, 253.56021118164062, 46.135528564453125, -15.908241271972656, 20.186737060546875, 163.06394958496094, 9.973758697509766, 371.7103271484375, 29.055931091308594, 251.2724609375, 18.88037109375, 167.13534545898438], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000260.npy"}
{"epoch": 0.38179148311306904, "step": 261, "batch_size": 64, "mean": 94.40481567382812, "std": 99.44379425048828, "min": -97.04562377929688, "p10": -9.342733764648436, "median": 83.06594848632812, "p90": 216.18211059570314, "max": 504.54327392578125, "pos_frac": 0.859375, "sample": [230.65994262695312, 143.82449340820312, 33.16981506347656, -60.23957824707031, 69.764892578125, 504.54327392578125, 47.77064514160156, 283.20672607421875, -9.745849609375, 109.76725006103516, 24.802122116088867, 39.27454376220703, 0.667633056640625, 211.82159423828125, 27.740402221679688, 159.68572998046875, 114.06817626953125, 185.095947265625, 155.26553344726562, -21.70410919189453, 65.72691345214844, 55.172607421875, 91.37174224853516, 126.15862274169922, 125.75119018554688, 222.1562042236328, 188.98513793945312, 45.33750915527344, 221.98583984375, 102.98323059082031, -16.040977478027344, 54.79509735107422, 13.852294921875, 80.87407684326172, 94.13250732421875, 18.438920974731445, 197.61837768554688, 189.56785583496094, 143.0068817138672, 53.60858917236328, 184.54019165039062, 44.052371978759766, 177.77383422851562, 218.0509033203125, 43.61791229248047, 175.65350341796875, -54.94491958618164, 117.46176147460938, 34.787254333496094, 86.03479766845703, 110.49198150634766, 261.27532958984375, 7.1823883056640625, -97.04562377929688, -79.27183532714844, 79.04692840576172, 9.2635498046875, -4.654712677001953, 85.25782012939453, 101.08119201660156, 43.26732635498047, 63.73633575439453, 118.73004150390625, -8.402130126953125], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000261.npy"}
{"epoch": 0.3832599118942731, "step": 262, "batch_size": 64, "mean": 76.14309692382812, "std": 78.95851135253906, "min": -131.89273071289062, "p10": -20.096146011352527, "median": 82.93036270141602, "p90": 161.00389709472657, "max": 283.7706298828125, "pos_frac": 0.859375, "sample": [37.847076416015625, 175.46920776367188, 55.485328674316406, 117.056884765625, -63.09916687011719, 128.14285278320312, 283.7706298828125, 82.16676330566406, 78.77093505859375, 107.66765594482422, -53.67591857910156, -131.89273071289062, 57.356712341308594, 264.3935546875, 38.80810546875, -89.8728256225586, 124.33132934570312, 119.25411987304688, 26.49566078186035, 94.39227294921875, 91.05760192871094, 43.63214874267578, -33.156333923339844, 134.77731323242188, 99.37025451660156, 2.0953521728515625, 5.003833770751953, 225.53890991210938, 116.96977233886719, 157.22174072265625, 29.981096267700195, 42.94544219970703, 51.7530517578125, -25.199504852294922, 99.75822448730469, 85.98632049560547, 47.863037109375, 39.07331085205078, 131.22323608398438, 155.87916564941406, 118.52947998046875, 89.7790298461914, 83.69396209716797, 204.2337646484375, 130.6229248046875, 120.04808807373047, 127.00457763671875, 33.275856018066406, 26.133934020996094, 22.56606101989746, 20.698989868164062, 62.73615264892578, -8.188308715820312, 160.94384765625, -75.93229675292969, 3.1905288696289062, 114.6370849609375, 161.37527465820312, 111.63288879394531, 161.02963256835938, 98.42430114746094, -1.1845855712890625, 79.70832824707031, 73.55611419677734], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000262.npy"}
{"epoch": 0.38472834067547723, "step": 263, "batch_size": 64, "mean": 104.40455627441406, "std": 97.02840423583984, "min": -98.40977478027344, "p10": -9.017329978942861, "median": 103.43277740478516, "p90": 245.8958923339844, "max": 313.0006103515625, "pos_frac": 0.875, "sample": [188.45285034179688, -98.40977478027344, 88.53170013427734, 90.67649841308594, 178.6839141845703, 24.23052215576172, 170.817138671875, 53.80070495605469, 19.487518310546875, 131.418212890625, -91.2486572265625, 140.11521911621094, 49.698402404785156, 160.8114013671875, 150.60000610351562, 99.36848449707031, 14.459905624389648, 198.1986846923828, 75.695068359375, 76.59465026855469, -12.805551528930664, -43.90357208251953, 61.62458038330078, 259.0371398925781, -13.43264389038086, 30.502731323242188, 159.17807006835938, 62.728355407714844, 1.6691207885742188, 25.92862319946289, 20.836837768554688, 7.93321418762207, 244.97222900390625, 47.14769744873047, 41.37678527832031, 228.5356903076172, 136.85287475585938, 51.940765380859375, 285.2893981933594, 107.4970703125, 28.080209732055664, 189.8603515625, 134.53060913085938, 148.7518310546875, 252.1348876953125, 110.13555908203125, 263.4382019042969, 20.927215576171875, 139.5337371826172, 170.90341186523438, 152.95620727539062, 246.291748046875, 313.0006103515625, -69.88761901855469, -0.1781463623046875, 233.314697265625, 57.754615783691406, -18.63946533203125, 152.93687438964844, 145.2958984375, 108.54743957519531, 302.6650695800781, 120.35112762451172, 54.29475402832031], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000263.npy"}
{"epoch": 0.38619676945668135, "step": 264, "batch_size": 64, "mean": 84.01654052734375, "std": 90.68926239013672, "min": -126.6380615234375, "p10": -6.374010086059567, "median": 65.80777359008789, "p90": 238.5603881835938, "max": 322.29083251953125, "pos_frac": 0.875, "sample": [137.07345581054688, 153.51910400390625, 29.178897857666016, 243.305419921875, 93.70008850097656, -63.4676513671875, -13.271629333496094, 227.4886474609375, 6.991306304931641, 263.64239501953125, -2.38568115234375, 35.480865478515625, 57.53833770751953, -42.97856903076172, 74.05439758300781, 94.47417449951172, 157.97772216796875, 100.959228515625, 134.91970825195312, 165.69602966308594, 66.59652709960938, 48.379119873046875, 44.572509765625, 114.57270812988281, 42.21666717529297, 56.10166931152344, 35.49341583251953, 37.872314453125, 19.446182250976562, 73.12577819824219, 1.5436458587646484, 38.122840881347656, 57.88848114013672, 257.4959716796875, 3.47900390625, 273.0892028808594, 78.91594696044922, 79.21700286865234, 97.58892822265625, 151.19166564941406, 65.0190200805664, 79.81585693359375, 63.820220947265625, 18.544189453125, 28.626197814941406, 322.29083251953125, 168.9036865234375, -126.6380615234375, 22.405715942382812, -12.956459045410156, 115.72010803222656, 34.230106353759766, 77.167236328125, 246.29287719726562, 3.6529197692871094, 52.55148696899414, 75.83159637451172, 107.87799072265625, -66.38211059570312, 211.76956176757812, 250.4158935546875, 186.64398193359375, -8.083293914794922, 28.733062744140625], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000264.npy"}
{"epoch": 0.3876651982378855, "step": 265, "batch_size": 64, "mean": 85.43595886230469, "std": 108.01421356201172, "min": -164.15835571289062, "p10": -16.034740066528318, "median": 58.09703063964844, "p90": 251.7293273925782, "max": 393.13250732421875, "pos_frac": 0.828125, "sample": [44.251976013183594, 270.8767395019531, 232.34521484375, 118.67268371582031, 55.769004821777344, 25.6069393157959, 141.4124755859375, 4.402320861816406, 363.1439514160156, 39.31292724609375, 171.72000122070312, -78.1243667602539, 21.12955093383789, 35.402828216552734, -5.7494354248046875, 49.87015914916992, 79.9754638671875, -68.01638793945312, 85.02983093261719, 19.557388305664062, 161.03453063964844, 326.2065734863281, -164.15835571289062, 270.1778869628906, -50.587890625, 104.09115600585938, 59.64739990234375, 283.5100402832031, -3.214794158935547, 393.13250732421875, -74.76487731933594, 78.21222686767578, 30.666528701782227, 15.71365737915039, 15.377593994140625, -21.798919677734375, 16.561180114746094, 24.426055908203125, 159.46670532226562, 8.365280151367188, 192.03292846679688, 53.52931213378906, 44.8385009765625, 181.53836059570312, -14.145225524902344, 41.02532196044922, -0.4591712951660156, 94.3167724609375, 6.5468292236328125, 70.60794067382812, 56.546661376953125, 30.8951416015625, 163.42710876464844, 199.44076538085938, 83.58258056640625, 100.32333374023438, 99.65367126464844, 81.91363525390625, 104.43623352050781, -16.844532012939453, 260.03680419921875, 78.41161346435547, 173.0780029296875, 144.5154571533203], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000265.npy"}
{"epoch": 0.3891336270190896, "step": 266, "batch_size": 64, "mean": 89.72528839111328, "std": 96.957275390625, "min": -133.67706298828125, "p10": -17.093426513671872, "median": 65.15545272827148, "p90": 206.30631256103516, "max": 328.40203857421875, "pos_frac": 0.859375, "sample": [49.91594696044922, 310.7716369628906, 76.5985107421875, 65.43183898925781, 158.38719177246094, 113.35426330566406, 55.54689025878906, 191.32901000976562, 85.82450866699219, 242.55909729003906, 154.8494415283203, 174.8810272216797, -13.386703491210938, 168.87355041503906, -18.682022094726562, 45.38789367675781, 164.0812225341797, 53.2281494140625, 158.91885375976562, 15.339218139648438, 58.44564437866211, -133.67706298828125, -52.453468322753906, 312.51837158203125, 39.29460144042969, 202.11721801757812, -7.6563873291015625, 25.679580688476562, 15.181678771972656, 206.46051025390625, 285.89154052734375, 111.36864471435547, 328.40203857421875, 205.94651794433594, 139.84056091308594, 9.58984375, 109.48295593261719, 194.5264892578125, 58.95978546142578, 43.366050720214844, 98.552001953125, 55.031524658203125, 80.13150787353516, 4.856328964233398, 138.63916015625, 24.531665802001953, -51.764434814453125, -51.04327392578125, 205.6367950439453, 116.1977767944336, -23.11255645751953, 24.26604461669922, 46.56977844238281, 77.35714721679688, 12.250259399414062, 16.722915649414062, 15.723012924194336, 64.87906646728516, -26.65723419189453, 226.64971923828125, 131.083740234375, 100.88387298583984, 0.2319660186767578, 48.306976318359375], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000266.npy"}
{"epoch": 0.39060205580029367, "step": 267, "batch_size": 64, "mean": 94.54733276367188, "std": 91.87327575683594, "min": -147.40264892578125, "p10": 5.290312385559083, "median": 83.35901641845703, "p90": 194.50230255126954, "max": 389.54327392578125, "pos_frac": 0.921875, "sample": [77.44173431396484, 101.4381103515625, 113.07328796386719, -7.360172271728516, 125.33328247070312, -17.194013595581055, 183.27944946289062, 133.91331481933594, -13.278884887695312, 40.80058670043945, 21.041976928710938, 12.94207763671875, 5.71392822265625, 203.26919555664062, 70.1275405883789, 95.65111541748047, 24.1778564453125, 191.06033325195312, 251.2442626953125, 389.54327392578125, 61.50478744506836, 82.90985870361328, 117.48943328857422, 2.53125, 18.460342407226562, 90.93583679199219, 74.81971740722656, 37.915565490722656, 158.95875549316406, 162.481689453125, 178.48492431640625, 224.04910278320312, 334.232177734375, 49.84388732910156, 27.271404266357422, 66.57403564453125, 9.292675018310547, 17.045310974121094, 19.67422866821289, 170.07672119140625, 177.58865356445312, 8.10516357421875, 5.108762741088867, 301.4419250488281, -147.40264892578125, 8.469947814941406, -46.29768371582031, 120.18683624267578, 79.11658477783203, 92.77792358398438, 59.489192962646484, 129.59329223632812, 22.29928207397461, 195.97743225097656, 83.80817413330078, 95.65208435058594, 51.990631103515625, 173.91204833984375, 59.691139221191406, 111.59381103515625, 150.73361206054688, 113.03437042236328, 188.00833129882812, 109.38062286376953], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000267.npy"}
{"epoch": 0.3920704845814978, "step": 268, "batch_size": 64, "mean": 86.81520080566406, "std": 94.74246978759766, "min": -141.834228515625, "p10": -36.33221893310546, "median": 86.66937637329102, "p90": 174.1482345581055, "max": 349.17523193359375, "pos_frac": 0.8125, "sample": [9.040122985839844, 219.68284606933594, 105.46619415283203, -54.573585510253906, 100.09410095214844, 56.84149932861328, 99.62480163574219, 67.872314453125, 155.64068603515625, 137.1150360107422, 136.00607299804688, -41.391944885253906, 128.806396484375, 47.72589874267578, 205.8551025390625, -52.418487548828125, 45.169288635253906, 10.86296272277832, -57.835113525390625, 126.81426239013672, 84.86699676513672, 329.4881591796875, 30.421977996826172, 106.15147399902344, 76.86578369140625, 153.67799377441406, 174.9501953125, 349.17523193359375, -8.654010772705078, 148.64215087890625, -53.742225646972656, 20.997817993164062, 77.65071105957031, 36.557552337646484, 271.73529052734375, 50.19734191894531, 123.67804718017578, 96.95509338378906, 105.3197021484375, 172.27699279785156, 124.31526947021484, 168.5672607421875, 34.15052032470703, -0.5867176055908203, 80.34280395507812, 38.83441162109375, 168.24290466308594, 163.816650390625, -5.70849609375, 111.51342010498047, 89.19207763671875, 88.47175598144531, 330.00885009765625, 124.76512145996094, 71.430419921875, -44.29193115234375, 62.80450439453125, -24.52619171142578, 162.5465087890625, 22.90513801574707, 57.60479736328125, -141.834228515625, -19.301624298095703, 99.29916381835938], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000268.npy"}
{"epoch": 0.3935389133627019, "step": 269, "batch_size": 64, "mean": 114.76429748535156, "std": 115.61670684814453, "min": -123.06128692626953, "p10": -14.34717674255371, "median": 108.45899963378906, "p90": 247.21625061035158, "max": 396.71331787109375, "pos_frac": 0.8125, "sample": [91.18785095214844, 232.706787109375, 118.70709228515625, 88.77180480957031, -14.522407531738281, 168.4436492919922, 222.294189453125, 143.3785400390625, 4.0874786376953125, -17.473426818847656, 187.74891662597656, -5.111602783203125, 234.15036010742188, 100.622802734375, 193.02276611328125, 133.8955078125, 10.957237243652344, 76.1544189453125, 54.68165588378906, -123.06128692626953, 197.80154418945312, 34.21016311645508, 145.20541381835938, -53.49964141845703, -7.123218536376953, 202.88462829589844, 19.293296813964844, 326.0838928222656, 396.71331787109375, 81.30889892578125, 151.61962890625, -1.34674072265625, 241.05368041992188, 180.54531860351562, 52.02967834472656, 182.1153106689453, 171.64865112304688, 103.97218322753906, 219.97305297851562, 13.695953369140625, -18.622108459472656, 131.37930297851562, -58.3638916015625, 3.158050537109375, 112.94581604003906, 225.0943603515625, 179.7958984375, 248.8846435546875, 243.32333374023438, 42.510986328125, 145.76593017578125, 63.90155029296875, 57.00164794921875, 371.0255432128906, -7.281499862670898, 319.3438720703125, 28.29351806640625, 9.772819519042969, -13.938304901123047, 70.57659912109375, 300.1846923828125, 278.2933044433594, -111.39828491210938, 164.44027709960938], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000269.npy"}
{"epoch": 0.39500734214390604, "step": 270, "batch_size": 64, "mean": 84.35902404785156, "std": 109.70121765136719, "min": -132.94386291503906, "p10": -27.679444122314443, "median": 70.68701171875, "p90": 244.70953521728518, "max": 383.1695861816406, "pos_frac": 0.734375, "sample": [179.54702758789062, 213.1536865234375, 224.55511474609375, -132.94386291503906, -123.01448822021484, 34.650543212890625, 34.170677185058594, 166.98629760742188, 85.7015609741211, 185.07913208007812, 280.72216796875, -7.354034423828125, 54.5067138671875, 85.75938415527344, -0.75048828125, 7.679145812988281, 88.95178985595703, 99.54518127441406, -32.14008331298828, 87.06204223632812, 239.842529296875, 86.56111145019531, -5.762138366699219, 150.99984741210938, 246.79539489746094, 127.24362182617188, -16.678617477416992, 11.093452453613281, 95.34219360351562, 51.28217315673828, -59.072540283203125, 293.618408203125, 382.0946044921875, -58.28695297241211, -17.271286010742188, 136.985595703125, 149.29248046875, 28.9001522064209, 68.41252136230469, 50.061580657958984, 383.1695861816406, -10.5689697265625, 45.935791015625, -0.027912139892578125, 65.60282897949219, 74.1109619140625, -5.3815460205078125, 110.77543640136719, -9.303359985351562, 100.18702697753906, 122.12686157226562, 143.41909790039062, 8.548309326171875, -9.140230178833008, 21.024795532226562, 54.770477294921875, -49.157958984375, 3.9817657470703125, -43.19310760498047, 262.1449890136719, 129.08258056640625, 186.48983764648438, 72.96150207519531, 248.0970458984375], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000270.npy"}
{"epoch": 0.3964757709251101, "step": 271, "batch_size": 64, "mean": 106.77681732177734, "std": 110.8238296508789, "min": -86.1483154296875, "p10": -12.457397079467768, "median": 77.31730651855469, "p90": 265.27092590332046, "max": 398.6195373535156, "pos_frac": 0.875, "sample": [398.6195373535156, 210.47637939453125, 83.15509796142578, 25.640628814697266, 122.54853820800781, 27.117595672607422, 219.40493774414062, 232.31735229492188, 98.67011260986328, 71.4795150756836, 120.366455078125, -86.1483154296875, 331.8638916015625, -48.06732940673828, 87.57334899902344, 8.405067443847656, 114.82186889648438, 59.503868103027344, 39.48805236816406, -21.15880584716797, -40.01601791381836, 29.328018188476562, 224.9974365234375, 234.66705322265625, 322.53253173828125, 234.11167907714844, 55.05431365966797, 5.528474807739258, 37.45905303955078, 146.20501708984375, -7.011379241943359, 52.39888000488281, 124.03852844238281, -14.791404724121094, -23.454681396484375, 117.87873840332031, 36.96128845214844, 15.337867736816406, 169.33966064453125, 43.85374069213867, 11.919334411621094, 323.8974609375, 61.86286163330078, 20.945280075073242, 95.93972778320312, 49.814388275146484, 60.62657165527344, 25.94989013671875, 209.04122924804688, 112.2808837890625, 359.4324035644531, 191.86993408203125, 42.57054901123047, 25.391189575195312, 329.503662109375, -17.30791473388672, 124.256103515625, 278.3868713378906, 110.05606079101562, 194.15480041503906, 10.396278381347656, 197.39288330078125, 58.090728759765625, 96.74842834472656], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000271.npy"}
{"epoch": 0.39794419970631423, "step": 272, "batch_size": 64, "mean": 127.61502075195312, "std": 118.9935073852539, "min": -133.249267578125, "p10": -16.208446311950677, "median": 124.70915222167969, "p90": 292.95364074707044, "max": 443.4105224609375, "pos_frac": 0.84375, "sample": [171.42588806152344, 127.21906280517578, -8.387056350708008, -35.82987976074219, -2.9091339111328125, 251.4812469482422, 121.32939147949219, 304.1755065917969, 266.769287109375, 134.3434295654297, 226.04052734375, 218.7437744140625, -47.71059036254883, 67.60647583007812, -122.43858337402344, 62.22773742675781, 28.82085418701172, 123.08624267578125, 126.23355102539062, 108.50745391845703, -25.432968139648438, 244.55291748046875, 159.52789306640625, 123.18475341796875, 180.41041564941406, 28.16773223876953, 16.571704864501953, 212.25921630859375, 113.28592681884766, 443.4105224609375, 113.533935546875, 121.29351806640625, 139.25570678710938, 61.825958251953125, 32.56791687011719, 209.5355224609375, 321.5383605957031, -133.249267578125, 129.7928009033203, 139.72024536132812, 412.8025207519531, 64.83053588867188, 308.9043273925781, 101.14065551757812, 126.27056884765625, 2.9496688842773438, 187.63165283203125, 50.36226272583008, 169.73751831054688, 259.3843994140625, -19.560470581054688, 172.80905151367188, 3.784332275390625, -20.6375732421875, -8.189363479614258, 228.5057830810547, 158.64332580566406, 59.158538818359375, 307.75592041015625, 49.519775390625, 147.24066162109375, 234.04718017578125, 328.1456604003906, 89.6363296508789], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000272.npy"}
{"epoch": 0.39941262848751835, "step": 273, "batch_size": 64, "mean": 122.21438598632812, "std": 119.00743865966797, "min": -125.82196044921875, "p10": -8.290301513671864, "median": 87.15401840209961, "p90": 303.7344848632814, "max": 404.970458984375, "pos_frac": 0.890625, "sample": [174.52886962890625, 198.5013427734375, 26.257598876953125, 86.15260314941406, 389.86798095703125, 352.09552001953125, 131.65341186523438, -23.748544692993164, 348.6954345703125, -22.632644653320312, 192.8463134765625, 67.29987335205078, 35.268524169921875, 52.74694061279297, 9.169038772583008, 127.20294189453125, 23.410335540771484, 97.25433349609375, 122.33778381347656, 205.89756774902344, 142.31597900390625, 18.874074935913086, 38.22100830078125, 71.97685241699219, 42.8048095703125, 86.80109405517578, 222.63136291503906, 75.63052368164062, 120.45172119140625, 125.12053680419922, 134.4324188232422, 13.875923156738281, 242.43942260742188, 322.77313232421875, 12.84882926940918, 346.8937072753906, 367.26702880859375, 156.65444946289062, 79.13589477539062, 239.0176239013672, -77.92066192626953, -13.598419189453125, 172.46791076660156, -35.88117980957031, 149.30517578125, 259.31097412109375, 222.60447692871094, 35.257286071777344, 52.24021911621094, -125.82196044921875, 404.970458984375, 87.20555877685547, 70.21208190917969, 4.095306396484375, 56.7042236328125, 216.39263916015625, 74.77711486816406, 25.814571380615234, 87.10247802734375, -13.961807250976562, 228.0552978515625, 34.926063537597656, 225.14077758789062, 227.35067749023438], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000273.npy"}
{"epoch": 0.4008810572687225, "step": 274, "batch_size": 64, "mean": 96.30369567871094, "std": 107.41636657714844, "min": -118.10441589355469, "p10": -31.18189010620117, "median": 89.4465217590332, "p90": 251.50978546142585, "max": 337.90911865234375, "pos_frac": 0.796875, "sample": [51.21717071533203, 46.51454162597656, 19.323326110839844, -64.35136413574219, 121.83163452148438, 93.33316040039062, 144.04116821289062, -6.369880676269531, 43.543426513671875, 60.28460693359375, -50.034088134765625, 263.6807861328125, 190.11021423339844, 166.65655517578125, 203.99639892578125, 91.45143127441406, 79.29248809814453, 307.00250244140625, -118.10441589355469, 180.67416381835938, 38.382598876953125, 139.08140563964844, -4.966949462890625, 71.63099670410156, 15.258750915527344, 214.00442504882812, 290.4455871582031, -28.1834716796875, 44.57006072998047, 337.90911865234375, -23.328227996826172, 7.212799072265625, 109.86416625976562, 102.8295669555664, 41.952613830566406, 151.6914825439453, 270.6565856933594, -8.309173583984375, 162.13951110839844, 50.41722106933594, 198.0458526611328, 63.30503845214844, 122.38641357421875, 116.04872131347656, 191.7528076171875, -32.46692657470703, 96.14396667480469, 4.862752914428711, 87.44161224365234, 304.43731689453125, 194.47804260253906, 44.39778137207031, 2.811309814453125, 159.92279052734375, -96.07994842529297, 185.09251403808594, -76.82635498046875, -79.71971130371094, 257.95916748046875, 109.63314819335938, 79.66986083984375, 236.4612274169922, -1.417327880859375, 187.74371337890625], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000274.npy"}
{"epoch": 0.4023494860499266, "step": 275, "batch_size": 64, "mean": 101.77294921875, "std": 118.76139831542969, "min": -122.78144836425781, "p10": -50.24382553100586, "median": 72.6323013305664, "p90": 273.73945922851567, "max": 387.95452880859375, "pos_frac": 0.78125, "sample": [143.83482360839844, 50.47059631347656, 172.95179748535156, 12.04128646850586, 11.365468978881836, -21.933433532714844, 234.76632690429688, 52.05235290527344, 283.0377502441406, -48.333709716796875, 195.55291748046875, 244.22267150878906, 387.95452880859375, 64.14095306396484, 125.7220687866211, 144.67042541503906, -56.7234992980957, 38.17951965332031, 68.15082550048828, 98.23542785644531, -20.170066833496094, 133.73715209960938, 308.5039978027344, -51.06244659423828, 42.973304748535156, -97.56511688232422, 131.163818359375, 135.39649963378906, 191.87094116210938, 67.39415740966797, -67.71792602539062, 58.03509521484375, 195.09580993652344, 144.8726806640625, 40.119415283203125, 3.7128753662109375, -122.78144836425781, 70.22294616699219, -22.78480339050293, 28.308441162109375, 67.76742553710938, 199.29873657226562, -78.77520751953125, 41.11995315551758, 135.57473754882812, 256.8419189453125, 132.34963989257812, 377.048095703125, 171.69952392578125, 294.7749328613281, 298.65814208984375, -66.77468872070312, 183.80767822265625, 280.98126220703125, -5.023139953613281, 181.5120849609375, 75.04165649414062, 211.1782989501953, -7.849922180175781, 126.07679748535156, 41.13300323486328, 219.60113525390625, -15.9559326171875, 23.6978759765625], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000275.npy"}
{"epoch": 0.40381791483113066, "step": 276, "batch_size": 64, "mean": 105.10834503173828, "std": 123.49974060058594, "min": -116.3984603881836, "p10": -24.079404830932614, "median": 85.44400787353516, "p90": 288.28392944335945, "max": 486.82672119140625, "pos_frac": 0.828125, "sample": [70.26912689208984, 486.82672119140625, 86.05035400390625, 296.4930419921875, 297.1143493652344, 90.222900390625, -0.340545654296875, 361.3273010253906, 42.715248107910156, 148.7509307861328, 92.22366333007812, 43.580223083496094, -8.456939697265625, 167.20811462402344, 189.95999145507812, 157.9302978515625, 52.36933898925781, -116.3984603881836, 37.58984375, 35.08224868774414, 27.031513214111328, 156.7357940673828, 113.03089904785156, 237.57550048828125, 30.222122192382812, -101.62884521484375, 7.041790008544922, 92.11048889160156, 99.6968002319336, 223.63140869140625, -25.001724243164062, 122.56661987304688, 67.33959197998047, 61.69825744628906, 217.5323028564453, 159.83230590820312, -68.71929931640625, -116.20857238769531, 4.050992965698242, 55.49433517456055, 9.992311477661133, -26.510238647460938, -21.927326202392578, 324.9388732910156, 84.4706802368164, 61.72191619873047, 110.41258239746094, 126.120849609375, 186.3489532470703, 32.42584991455078, 269.12933349609375, -9.311981201171875, 150.13780212402344, 167.53402709960938, 21.276336669921875, 185.86570739746094, 117.58645629882812, 84.83766174316406, 268.09576416015625, 29.21331787109375, -67.64340209960938, 353.343994140625, 27.2390079498291, 347.08575439453125], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000276.npy"}
{"epoch": 0.4052863436123348, "step": 277, "batch_size": 64, "mean": 104.77349090576172, "std": 107.94400024414062, "min": -78.55622863769531, "p10": -4.07501697540283, "median": 67.89827728271484, "p90": 268.9250427246094, "max": 359.00616455078125, "pos_frac": 0.875, "sample": [17.09377670288086, 202.73622131347656, -40.92176818847656, 16.978729248046875, 36.372798919677734, 130.357177734375, 261.4927978515625, 107.14823913574219, -4.708404541015625, 147.77435302734375, 105.934326171875, 29.32793617248535, 197.1519317626953, -2.5971126556396484, 233.50074768066406, 163.59959411621094, 12.741659164428711, 67.33341979980469, 135.87777709960938, 153.8142852783203, 8.995338439941406, 183.06280517578125, 52.21299743652344, 44.40637969970703, -49.623809814453125, -20.051124572753906, 331.1824951171875, 86.09437561035156, 35.47344970703125, 106.12637329101562, -78.55622863769531, 305.3275146484375, 30.826148986816406, 207.34173583984375, 37.495994567871094, -69.3220443725586, 359.00616455078125, 189.87387084960938, 68.463134765625, 26.41204833984375, 125.32406616210938, 52.83509826660156, 317.358642578125, 53.31715774536133, 6.462272644042969, 60.870521545410156, 56.357261657714844, 283.6042785644531, 256.1085510253906, 302.4185485839844, -44.80291748046875, 190.25057983398438, 33.762550354003906, 36.12779998779297, 17.678939819335938, 9.34024429321289, 134.08087158203125, 45.61188507080078, 242.07247924804688, 272.11029052734375, 78.29812622070312, 144.931640625, 44.93955993652344, 160.68850708007812], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000277.npy"}
{"epoch": 0.4067547723935389, "step": 278, "batch_size": 64, "mean": 115.85311126708984, "std": 123.02217864990234, "min": -85.02090454101562, "p10": -21.003531074523924, "median": 93.3395004272461, "p90": 322.87380065917966, "max": 371.9335021972656, "pos_frac": 0.828125, "sample": [3.4481754302978516, 220.34463500976562, -28.52214813232422, 42.6266975402832, 142.42990112304688, 168.3330841064453, 107.41696166992188, 39.23447799682617, 133.3920135498047, 76.04423522949219, 65.63874816894531, 121.45263671875, 61.650291442871094, 6.364776611328125, 132.5826416015625, 160.6790313720703, -31.478744506835938, 209.81338500976562, 327.6325988769531, -18.98137664794922, 22.350156784057617, 363.3426513671875, 232.34840393066406, 97.39254760742188, 168.46640014648438, 202.0229949951172, 71.69620513916016, 3.0873184204101562, 74.8055648803711, 371.9335021972656, 289.16162109375, -37.08349609375, 89.47305297851562, -76.98440551757812, 360.148193359375, 341.54718017578125, 3.8304519653320312, 92.7222900390625, 34.76447296142578, 168.85504150390625, 55.337501525878906, 321.36932373046875, 93.95671081542969, 15.12725830078125, 28.909576416015625, 322.8030700683594, 74.9318618774414, 141.38096618652344, -1.4382972717285156, 322.90411376953125, 104.15995788574219, 149.631591796875, 8.13629150390625, 25.066490173339844, 175.71475219726562, 252.9409637451172, -4.968568801879883, -21.870168685913086, 355.22979736328125, -53.22611999511719, 213.6326446533203, 112.75747680664062, -85.02090454101562, -6.849479675292969], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000278.npy"}
{"epoch": 0.40822320117474303, "step": 279, "batch_size": 64, "mean": 127.84455871582031, "std": 140.56895446777344, "min": -127.16712188720703, "p10": -44.181443405151356, "median": 106.4525032043457, "p90": 358.99927673339846, "max": 481.7618408203125, "pos_frac": 0.8125, "sample": [59.51757049560547, 4.111513137817383, 30.347171783447266, 99.27508544921875, 20.423492431640625, -23.968711853027344, 361.551025390625, 261.11016845703125, 54.71843719482422, 21.19272232055664, -8.640663146972656, 198.51954650878906, 94.97300720214844, 38.78831481933594, 381.1079406738281, 366.58892822265625, 45.9992790222168, 153.64971923828125, 204.18675231933594, 137.69786071777344, 373.86993408203125, 105.52237701416016, -12.821741104125977, -127.16712188720703, 3.0729312896728516, 64.41670989990234, 107.38262939453125, 183.78164672851562, 14.032623291015625, 353.0451965332031, 114.58065795898438, 100.23308563232422, 187.4674072265625, 40.062095642089844, 172.97396850585938, 372.14739990234375, 161.779052734375, 481.7618408203125, 362.23004150390625, 246.0432891845703, -49.84361267089844, -24.6260986328125, -125.90902709960938, -120.7998046875, 71.05322265625, 300.8348693847656, 99.49461364746094, 191.36097717285156, 213.2328643798828, 65.62098693847656, 183.4919891357422, 233.052978515625, 184.5472412109375, -74.89900207519531, 235.50778198242188, 96.42147064208984, 242.94085693359375, 137.19984436035156, -69.97193908691406, -52.63917541503906, 264.528564453125, 198.30532836914062, -30.969715118408203, 208.5535430908203], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000279.npy"}
{"epoch": 0.40969162995594716, "step": 280, "batch_size": 64, "mean": 127.20117950439453, "std": 109.88967895507812, "min": -91.11930847167969, "p10": -19.960846328735347, "median": 126.53274154663086, "p90": 277.03081665039065, "max": 374.6228332519531, "pos_frac": 0.859375, "sample": [122.8720703125, 203.21607971191406, 104.98114776611328, 189.3229522705078, -61.23167419433594, -72.15253448486328, 209.51174926757812, 282.33819580078125, 277.80914306640625, 14.617523193359375, 188.6186981201172, 30.449806213378906, -15.643115997314453, 172.979248046875, 264.0263977050781, 309.04559326171875, 68.31468200683594, 186.09266662597656, 120.37635803222656, 159.6319580078125, 39.7310676574707, 366.95660400390625, 374.6228332519531, 171.3178253173828, 202.03750610351562, 125.86433410644531, 208.50111389160156, 177.7108612060547, 7.54656982421875, 175.1075439453125, 131.36538696289062, 106.06537628173828, 311.61956787109375, 117.73516845703125, -13.962066650390625, 140.847900390625, -34.5147705078125, 275.2147216796875, 130.28282165527344, 131.7448272705078, 129.0411834716797, 61.63219451904297, -91.11930847167969, 327.29022216796875, -71.4747314453125, -71.30133056640625, 66.31214904785156, 211.597900390625, 121.475341796875, 120.33245086669922, 100.70201110839844, 248.90386962890625, 69.74226379394531, 102.38477325439453, 31.3475341796875, 5.523262023925781, 127.2011489868164, 95.93357849121094, 89.49755859375, 221.07997131347656, 162.55613708496094, -21.811302185058594, 68.70381164550781, 134.35227966308594], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000280.npy"}
{"epoch": 0.4111600587371512, "step": 281, "batch_size": 64, "mean": 120.0322265625, "std": 113.36522674560547, "min": -310.3327941894531, "p10": -3.4414091110229457, "median": 118.33752822875977, "p90": 248.91187438964846, "max": 398.8612976074219, "pos_frac": 0.875, "sample": [80.1225357055664, 252.59100341796875, 110.83360290527344, 197.4452667236328, 87.82766723632812, 40.952423095703125, 87.7125244140625, 130.06045532226562, 140.9224853515625, 54.38141632080078, 116.56403350830078, 240.32723999023438, 199.37469482421875, 42.835670471191406, 171.1348419189453, -38.57762145996094, 163.80105590820312, 333.11529541015625, 65.5301513671875, 136.52886962890625, 64.2730712890625, 155.56736755371094, 201.19569396972656, 282.83465576171875, 101.19564056396484, 119.21343231201172, 152.57089233398438, 172.58665466308594, -21.877338409423828, 390.5426025390625, 223.972412109375, 67.43348693847656, 173.83531188964844, 269.78460693359375, 175.6695556640625, -310.3327941894531, 115.74229431152344, 117.46162414550781, 140.69012451171875, 48.4266357421875, 398.8612976074219, 56.11393737792969, 94.04972839355469, 120.67857360839844, 138.4723358154297, 212.73809814453125, 61.90351867675781, 156.6663055419922, 235.2210693359375, 99.50615692138672, -4.901741027832031, 77.9511947631836, -72.00929260253906, 40.13677978515625, -69.57418823242188, 14.495704650878906, -30.33647346496582, 210.48846435546875, 139.2073211669922, 19.404550552368164, 61.80933380126953, 166.88636779785156, -0.03396797180175781, 300.0585021972656], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000281.npy"}
{"epoch": 0.41262848751835535, "step": 282, "batch_size": 64, "mean": 96.7197036743164, "std": 117.89969635009766, "min": -97.75103759765625, "p10": -26.754999542236323, "median": 76.797607421875, "p90": 241.2982192993164, "max": 574.3090209960938, "pos_frac": 0.765625, "sample": [294.4642028808594, 79.28779602050781, 51.73257064819336, -42.63774871826172, 150.4886474609375, 200.71214294433594, 81.91822814941406, 41.64872741699219, 95.33120727539062, 241.3706512451172, 241.12921142578125, 102.5633544921875, 259.0116271972656, 70.05318450927734, -6.9562225341796875, 131.7984619140625, 574.3090209960938, 109.1202163696289, 61.6942138671875, 63.55255126953125, 45.79913330078125, 193.22177124023438, 13.201351165771484, -60.41729736328125, 305.47705078125, 283.0959777832031, 101.25859832763672, 110.33018493652344, 146.18675231933594, 332.5557861328125, 232.14833068847656, 43.18313980102539, 195.80813598632812, -8.528182983398438, 40.2453727722168, 101.10906982421875, 11.039474487304688, -17.09571075439453, 37.795257568359375, -37.30389404296875, -18.4971923828125, 86.79049682617188, 70.03447723388672, 150.54949951171875, 191.09385681152344, 52.46735382080078, -97.75103759765625, 197.43896484375, 120.9690933227539, 89.81465148925781, -22.65167999267578, -28.513565063476562, 46.01564025878906, 43.1718864440918, 110.61111450195312, -14.5595703125, 168.68563842773438, -11.459550857543945, -16.592355728149414, 232.24905395507812, -77.3057861328125, 40.30379867553711, -66.81350708007812, 74.30741882324219], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000282.npy"}
{"epoch": 0.41409691629955947, "step": 283, "batch_size": 64, "mean": 137.80856323242188, "std": 130.78060913085938, "min": -231.12533569335938, "p10": -17.023999214172342, "median": 104.1812629699707, "p90": 303.5004089355469, "max": 445.5201721191406, "pos_frac": 0.890625, "sample": [76.1204833984375, 2.2230300903320312, 109.32203674316406, 2.946714401245117, 201.77426147460938, 291.8928527832031, 69.41754150390625, 159.6465301513672, 276.3091125488281, 39.76995086669922, -30.62970733642578, 184.5224151611328, -51.121578216552734, 85.54536437988281, 43.51084899902344, 28.234878540039062, -231.12533569335938, 67.58773803710938, 214.6741943359375, 7.841123580932617, 385.63116455078125, 321.1600341796875, 106.99067687988281, 198.7471160888672, 89.14759826660156, 67.25570678710938, -34.126068115234375, 292.95611572265625, 227.26507568359375, 413.4481201171875, 32.22233581542969, 185.41806030273438, 3.5719451904296875, 101.3718490600586, 237.8599853515625, 75.43940734863281, 65.93093872070312, -48.23625564575195, 76.8175277709961, -53.078826904296875, 258.7352294921875, 210.51612854003906, 259.3562316894531, 89.5810546875, 290.4220886230469, 163.37493896484375, 96.39400482177734, 132.82769775390625, 222.57937622070312, 94.13353729248047, 371.271240234375, 132.16604614257812, 300.592529296875, 100.97651672363281, 31.53986358642578, 215.1283416748047, 329.5568542480469, 170.33529663085938, 87.31826782226562, 445.5201721191406, 304.74664306640625, 70.86553192138672, 172.82839965820312, -25.27272605895996], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000283.npy"}
{"epoch": 0.4155653450807636, "step": 284, "batch_size": 64, "mean": 135.40318298339844, "std": 128.82261657714844, "min": -84.82613372802734, "p10": 4.110821151733399, "median": 128.9069595336914, "p90": 318.4787139892578, "max": 532.7193603515625, "pos_frac": 0.921875, "sample": [-29.774627685546875, 75.57989501953125, 190.3397979736328, 29.597763061523438, 129.86019897460938, 12.297475814819336, 213.1687774658203, 26.920196533203125, 194.97518920898438, 256.3096008300781, 318.76806640625, 3.898754119873047, 317.8035583496094, 41.086097717285156, 159.89166259765625, 81.39186096191406, 35.674705505371094, 184.3553466796875, 182.03857421875, 338.2189636230469, 3.5919189453125, 461.9248352050781, 148.48817443847656, 213.06088256835938, 119.4229736328125, 59.018157958984375, 130.59385681152344, 80.3892822265625, 170.22792053222656, -27.249420166015625, 52.114105224609375, 526.9630737304688, 139.3912353515625, -23.567983627319336, 112.74190521240234, 97.76840209960938, 4.605644226074219, 74.82491302490234, 174.50596618652344, 532.7193603515625, 127.95372009277344, 146.51974487304688, 93.06668090820312, 20.517061233520508, -45.78614807128906, -84.82613372802734, 193.7547607421875, 245.08779907226562, 38.70909118652344, 165.33250427246094, 344.0604248046875, 156.12496948242188, 147.52040100097656, 141.65982055664062, 41.327880859375, 5.661556243896484, 16.783363342285156, 39.734107971191406, 194.57179260253906, 211.97691345214844, 51.465736389160156, 323.6488952636719, 61.512672424316406, 215.48837280273438], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000284.npy"}
{"epoch": 0.4170337738619677, "step": 285, "batch_size": 64, "mean": 116.0887680053711, "std": 115.90702819824219, "min": -249.9479217529297, "p10": -26.33936958312986, "median": 121.0826187133789, "p90": 249.66661529541017, "max": 417.56976318359375, "pos_frac": 0.84375, "sample": [223.94776916503906, 175.04812622070312, 32.91966247558594, 251.50851440429688, 63.07154083251953, 70.21221923828125, 141.18853759765625, 37.557701110839844, 155.3955078125, 343.841064453125, 177.9993438720703, 156.92835998535156, 191.05850219726562, 95.88336944580078, 120.25752258300781, 417.56976318359375, 55.109161376953125, -0.3472442626953125, 131.87857055664062, 200.16842651367188, 264.1805725097656, -59.99713897705078, 55.13249969482422, 124.69483947753906, 239.57443237304688, 34.90619659423828, 11.30691146850586, 96.2076416015625, 49.769256591796875, 23.03620147705078, 183.74017333984375, 66.56169128417969, -45.253089904785156, 12.904170989990234, 187.20977783203125, 343.4945373535156, 101.0967788696289, -249.9479217529297, 192.95635986328125, 62.285614013671875, 137.1964874267578, -1.3973388671875, 338.23614501953125, 132.0511474609375, -35.77642059326172, 195.40469360351562, 159.787353515625, 147.7251434326172, 153.21182250976562, 281.8415832519531, 24.5189208984375, -82.00485229492188, 169.32647705078125, 121.90771484375, -46.63505554199219, -4.319583892822266, 243.93223571777344, 54.49525451660156, 93.3958969116211, 117.05448913574219, -36.19396209716797, 245.3688507080078, 177.17245483398438, 108.32613372802734], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000285.npy"}
{"epoch": 0.4185022026431718, "step": 286, "batch_size": 64, "mean": 128.3986053466797, "std": 134.10101318359375, "min": -176.76010131835938, "p10": -33.43417892456054, "median": 120.4224624633789, "p90": 341.00945129394535, "max": 405.6056213378906, "pos_frac": 0.796875, "sample": [30.18592071533203, 23.10564422607422, 398.5090637207031, 114.60855102539062, 98.95150756835938, 141.06048583984375, 77.82959747314453, 102.56639099121094, 186.0858917236328, -113.73446655273438, 191.83401489257812, 251.13137817382812, -19.723251342773438, -35.869667053222656, 152.74468994140625, 126.23637390136719, 348.2806091308594, -24.356857299804688, 134.14508056640625, 144.75543212890625, 394.05950927734375, 100.4119873046875, -176.76010131835938, 354.7441711425781, 186.6708526611328, -72.61492919921875, -7.381801605224609, 241.83799743652344, 205.97311401367188, 337.3011474609375, 371.281005859375, -8.689727783203125, 171.54864501953125, 53.922393798828125, 26.037277221679688, 93.29083251953125, 186.5416717529297, 128.1212615966797, 79.720947265625, -41.04869079589844, 183.74530029296875, 7.604988098144531, 109.16600799560547, -38.12805938720703, 0.7803192138671875, 242.6627960205078, 326.4096984863281, 158.36907958984375, -72.76638793945312, 342.5987243652344, -18.642797470092773, 107.7789535522461, 78.89518737792969, 244.51934814453125, 155.27886962890625, 268.5088806152344, 96.15440368652344, 143.212158203125, 127.11688232421875, -27.751373291015625, 405.6056213378906, 242.54513549804688, 96.47215270996094, 84.06073760986328], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000286.npy"}
{"epoch": 0.4199706314243759, "step": 287, "batch_size": 64, "mean": 114.15290069580078, "std": 134.55264282226562, "min": -198.12802124023438, "p10": -23.88597717285156, "median": 105.94004821777344, "p90": 301.77603759765634, "max": 557.0880126953125, "pos_frac": 0.84375, "sample": [106.65888977050781, 108.82720947265625, -9.195474624633789, 81.77657318115234, 108.02666473388672, -25.058685302734375, 203.8699951171875, 418.461181640625, 163.51199340820312, 115.64988708496094, 356.06524658203125, 117.62261962890625, 127.37612915039062, 315.895263671875, -21.149658203125, -70.47804260253906, 84.16158294677734, 274.1854248046875, 209.33682250976562, 250.1198272705078, 92.79869079589844, 4.764047622680664, 131.78167724609375, -198.12802124023438, 557.0880126953125, 50.475440979003906, 45.347328186035156, 47.292083740234375, 34.628875732421875, -26.23126983642578, 110.9312744140625, 39.52800750732422, -10.085418701171875, 107.43241882324219, 50.73866653442383, 248.98341369628906, 30.787200927734375, 168.48907470703125, 34.066680908203125, 63.49543762207031, 24.668058395385742, 269.2210693359375, 69.53443145751953, 97.42372131347656, 18.961380004882812, 105.22120666503906, 25.5269775390625, 35.94703674316406, 272.9625244140625, 122.4832763671875, -100.3046875, 41.614219665527344, 153.54473876953125, 150.54896545410156, 332.3221435546875, -109.0427474975586, -74.96226501464844, 0.31307220458984375, 120.2949447631836, 192.77505493164062, 371.9627685546875, 313.6005859375, 171.47293090820312, 199.8489990234375], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000287.npy"}
{"epoch": 0.42143906020558003, "step": 288, "batch_size": 64, "mean": 131.31356811523438, "std": 120.7635726928711, "min": -213.38882446289062, "p10": -2.732868385314933, "median": 115.41716384887695, "p90": 288.71848754882814, "max": 449.66729736328125, "pos_frac": 0.890625, "sample": [182.51187133789062, 89.93878173828125, -50.86306381225586, 210.61705017089844, 207.3179168701172, 237.10707092285156, 22.566328048706055, 449.66729736328125, 66.61155700683594, 292.442138671875, 144.354736328125, 81.95956420898438, -36.970863342285156, 20.24639892578125, 37.57279586791992, 61.86061096191406, 97.05427551269531, 220.97433471679688, 67.01641845703125, 268.4034729003906, 118.48463439941406, 191.827392578125, 239.77288818359375, 37.76873779296875, 142.53919982910156, 136.75811767578125, 212.79281616210938, 5.480127334594727, 112.34969329833984, 94.30570983886719, 70.6899642944336, 98.54794311523438, 335.0346984863281, 170.71246337890625, 313.9384765625, 197.181640625, 36.62156677246094, 162.48165893554688, 192.006103515625, 74.1191635131836, -213.38882446289062, 359.1934509277344, 54.29939651489258, 156.28482055664062, 67.81697082519531, 134.20159912109375, 122.67642974853516, 51.86769104003906, 362.52703857421875, 87.47264862060547, 230.4590301513672, 280.02996826171875, 98.58397674560547, -103.90589904785156, 73.6483154296875, 296.57672119140625, 182.78305053710938, 34.44541931152344, 43.897674560546875, 271.6744384765625, -6.252723693847656, 263.0087890625, -20.358993530273438, -37.27400207519531], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000288.npy"}
{"epoch": 0.42290748898678415, "step": 289, "batch_size": 64, "mean": 149.75587463378906, "std": 150.3675994873047, "min": -113.35198211669922, "p10": -15.43266735076904, "median": 116.44817352294922, "p90": 386.48204650878904, "max": 487.10198974609375, "pos_frac": 0.8125, "sample": [118.86849975585938, 6.920501708984375, 461.584716796875, 432.3222351074219, -113.35198211669922, 341.5133056640625, 244.99937438964844, 197.38885498046875, 66.862060546875, 425.34197998046875, 343.4439697265625, 263.47894287109375, 252.71495056152344, 386.9567565917969, 257.962646484375, 252.125244140625, 96.98677062988281, 487.10198974609375, 0.08403778076171875, 221.41033935546875, 115.53813171386719, -4.633485794067383, 89.5408935546875, -29.057781219482422, -16.402650833129883, 117.35821533203125, 234.21475219726562, 246.020263671875, 31.38909912109375, 89.99293518066406, 241.22561645507812, 465.10906982421875, 239.46536254882812, -37.8911247253418, 79.09107208251953, -1.6420745849609375, 385.3743896484375, 101.63850402832031, -1.0049667358398438, 21.45314598083496, 79.2298583984375, 176.56134033203125, -13.16937255859375, 185.34750366210938, 18.210412979125977, 258.3707275390625, 317.2337951660156, 14.776201248168945, 84.54444885253906, -43.32827377319336, -11.151844024658203, 179.89007568359375, 88.72219848632812, 12.172971725463867, 399.84112548828125, 142.62619018554688, 43.639976501464844, 92.86917877197266, -25.357757568359375, 157.57266235351562, 72.8558349609375, -71.99591827392578, 193.65757751464844, 119.76232147216797], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000289.npy"}
{"epoch": 0.4243759177679883, "step": 290, "batch_size": 64, "mean": 146.91989135742188, "std": 160.98411560058594, "min": -165.7156982421875, "p10": -13.39089546203613, "median": 124.79716110229492, "p90": 372.265707397461, "max": 597.2838134765625, "pos_frac": 0.875, "sample": [340.07659912109375, 115.09255981445312, 126.03966522216797, 138.93008422851562, -165.7156982421875, 196.1114501953125, 278.95135498046875, -37.510589599609375, 5.2337493896484375, 82.22233581542969, -11.198249816894531, 246.66397094726562, 135.958984375, 7.404449462890625, 171.48785400390625, 176.9131317138672, 344.2283935546875, 89.90011596679688, 123.55465698242188, 35.802860260009766, 28.608016967773438, 21.362594604492188, -44.11578369140625, 129.90679931640625, 103.78253173828125, -164.14385986328125, 95.08161926269531, 148.6226806640625, 557.974609375, 544.6920166015625, 30.489118576049805, 196.33131408691406, 207.1110076904297, 484.5740661621094, 148.30120849609375, 7.31736946105957, 197.51528930664062, 122.2694091796875, 97.5505142211914, 86.5464096069336, 31.498870849609375, 138.26161193847656, 152.0417938232422, 221.1531219482422, 30.189361572265625, 263.0921630859375, 38.145484924316406, -42.98426818847656, 204.9586181640625, -50.741397857666016, 381.48822021484375, 290.36285400390625, 82.0517349243164, 375.2120666503906, 16.06026840209961, 365.390869140625, 417.48638916015625, 181.5328369140625, 29.960094451904297, 36.33428955078125, 50.31680679321289, 597.2838134765625, 208.183837890625, -14.33060073852539], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000290.npy"}
{"epoch": 0.42584434654919234, "step": 291, "batch_size": 64, "mean": 135.67874145507812, "std": 129.27268981933594, "min": -213.41986083984375, "p10": -6.151799011230464, "median": 139.1481475830078, "p90": 291.9641632080078, "max": 418.291015625, "pos_frac": 0.875, "sample": [79.68791198730469, 156.2340850830078, 35.0653076171875, 36.565391540527344, 4.16021728515625, 244.6005859375, 418.291015625, 151.70498657226562, 151.9230499267578, 234.5887451171875, 27.575836181640625, 292.338623046875, 197.65310668945312, 88.52420806884766, 408.9815368652344, 0.518280029296875, 296.7991027832031, 159.9058074951172, 42.82733917236328, 166.0515594482422, 277.30462646484375, 66.67103576660156, 210.9553985595703, 44.935333251953125, 200.9922332763672, 247.42263793945312, 124.74510955810547, 234.9456329345703, 273.30804443359375, 17.386898040771484, 172.90130615234375, 120.37589263916016, 232.5512237548828, -28.822280883789062, 66.7169189453125, -86.5030517578125, -8.267143249511719, 94.8243637084961, -213.41986083984375, 249.97909545898438, 199.1698760986328, 278.4954528808594, 231.29409790039062, 172.13360595703125, -73.62548828125, 53.0977783203125, -32.592529296875, 232.5358123779297, 13.09321403503418, 109.28380584716797, 16.882843017578125, 46.35595703125, 316.71661376953125, 129.92587280273438, -12.89605712890625, 148.37042236328125, 291.0904235839844, 72.76325988769531, 378.030517578125, 27.69341278076172, -1.2159957885742188, 237.00025939941406, 351.9237060546875, 4.912883758544922], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000291.npy"}
{"epoch": 0.42731277533039647, "step": 292, "batch_size": 64, "mean": 133.37327575683594, "std": 156.3836212158203, "min": -393.1285400390625, "p10": -24.27136135101318, "median": 141.5446014404297, "p90": 346.8363372802735, "max": 466.02886962890625, "pos_frac": 0.828125, "sample": [146.98464965820312, 124.32225036621094, 26.794754028320312, 136.10455322265625, 134.1385498046875, 161.54885864257812, 105.46014404296875, 103.0027084350586, 163.5269775390625, -393.1285400390625, 412.5706481933594, 50.41020965576172, 359.11029052734375, 66.36445617675781, 177.12750244140625, -26.013017654418945, -176.28646850585938, 351.277587890625, 5.663856506347656, 240.1925048828125, -20.207496643066406, 446.63214111328125, 248.30450439453125, 170.63986206054688, 227.45639038085938, 372.6460266113281, 207.73358154296875, 315.87200927734375, 63.40093231201172, 39.709293365478516, -113.56954956054688, 191.5263671875, 7.083898544311523, 241.5377197265625, -14.275856018066406, 110.6113510131836, -70.43211364746094, 165.43238830566406, 102.40174865722656, 74.45079040527344, 44.10546875, 257.24346923828125, 198.02920532226562, 195.87295532226562, 158.04241943359375, 196.2623748779297, 251.26171875, 227.434326171875, 466.02886962890625, 41.54248046875, 93.62223052978516, 154.70333862304688, 233.39939880371094, -144.32423400878906, 182.8966827392578, -97.5687484741211, 436.9747314453125, -3.749034881591797, 13.680240631103516, 336.4734191894531, 194.58802795410156, 95.5722885131836, 76.86509704589844, -9.193628311157227], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000292.npy"}
{"epoch": 0.4287812041116006, "step": 293, "batch_size": 64, "mean": 108.85354614257812, "std": 146.88436889648438, "min": -180.10289001464844, "p10": -58.66225967407226, "median": 105.29367065429688, "p90": 300.4778198242188, "max": 473.8836364746094, "pos_frac": 0.734375, "sample": [25.729995727539062, 62.20855712890625, 43.32115936279297, 69.54024505615234, 191.4110565185547, 191.1263885498047, 332.98101806640625, 106.55934143066406, 305.31451416015625, 25.454248428344727, 263.4533386230469, -29.962295532226562, 169.555908203125, -27.871532440185547, 473.8836364746094, -98.23281860351562, 109.60348510742188, 204.43679809570312, 48.42218017578125, 124.9816665649414, 210.86001586914062, -116.83307647705078, 172.212158203125, -34.208106994628906, 20.200990676879883, 38.201698303222656, -43.94364929199219, 137.80027770996094, 64.90745544433594, 380.46563720703125, 344.1361083984375, 233.75131225585938, 222.86021423339844, 207.20928955078125, 193.21514892578125, -52.44444274902344, 20.400634765625, 134.21575927734375, 134.97894287109375, -51.580894470214844, -33.128562927246094, 38.50934600830078, -12.584930419921875, 109.15957641601562, 247.60438537597656, 268.3649597167969, 147.6375732421875, 223.4458465576172, 104.02799987792969, -137.08596801757812, 22.316741943359375, -180.10289001464844, -61.13728332519531, 289.19219970703125, 442.95550537109375, 34.65203094482422, -55.8353271484375, 178.6005401611328, 53.153324127197266, -36.44834899902344, -59.873802185058594, 271.9017639160156, 372.21337890625, -69.23358154296875], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000293.npy"}
{"epoch": 0.4302496328928047, "step": 294, "batch_size": 64, "mean": 121.20598602294922, "std": 149.3057403564453, "min": -136.22802734375, "p10": -36.62670097351074, "median": 77.37353134155273, "p90": 330.6365844726563, "max": 581.7984008789062, "pos_frac": 0.859375, "sample": [41.876197814941406, 74.52273559570312, 231.4520263671875, 337.603271484375, 84.40274047851562, 56.0935173034668, -40.424110412597656, 73.7933120727539, 6.6710662841796875, 132.68777465820312, 6.304473876953125, 10.600654602050781, -27.76607894897461, 383.9021301269531, -19.31688690185547, 188.02716064453125, 160.92124938964844, 264.55706787109375, 59.91531753540039, 127.21528625488281, 417.35784912109375, -91.22784423828125, 175.31698608398438, 69.79005432128906, 461.713623046875, 234.6827392578125, 229.3134765625, 71.69766998291016, 121.2875747680664, -70.73589324951172, 30.84636688232422, 231.5143280029297, 190.10787963867188, 581.7984008789062, -128.54127502441406, -122.67034912109375, 429.55810546875, 106.86077880859375, 189.38436889648438, 55.147560119628906, -136.22802734375, -81.52239227294922, 221.19876098632812, 236.18446350097656, 363.1304931640625, 110.78904724121094, 45.00668716430664, 73.42103576660156, 314.3809814453125, 43.03875732421875, 235.28298950195312, 213.11386108398438, 7.641059875488281, 105.80763244628906, 5.300212860107422, 13.15045166015625, 57.29470443725586, 80.22432708740234, 28.474414825439453, 54.34031677246094, 284.23968505859375, 15.228057861328125, 51.067291259765625, 80.37715911865234], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000294.npy"}
{"epoch": 0.43171806167400884, "step": 295, "batch_size": 64, "mean": 121.80484771728516, "std": 138.51943969726562, "min": -130.1283416748047, "p10": -31.467968368530272, "median": 114.99645233154297, "p90": 292.7342437744141, "max": 520.666748046875, "pos_frac": 0.796875, "sample": [-92.85108947753906, 58.2391357421875, 284.6373596191406, -0.1899738311767578, -122.97038269042969, 189.23519897460938, 34.30751419067383, 207.71029663085938, 163.08224487304688, -130.1283416748047, 201.61599731445312, 26.133148193359375, 117.73200225830078, 254.4977264404297, 290.3049011230469, 200.11776733398438, 218.2965850830078, 111.23007202148438, 7.147087097167969, 54.12255096435547, 255.5369873046875, -3.829366683959961, 35.789825439453125, 136.45738220214844, -42.80200958251953, 35.201507568359375, 351.5958557128906, -69.57160949707031, 333.61151123046875, 130.6449737548828, 174.54013061523438, 11.02703857421875, -31.19430923461914, -114.89608764648438, 277.1016845703125, 410.50390625, -4.840156555175781, 25.487876892089844, 14.418807983398438, 79.12403869628906, 133.19442749023438, 212.25555419921875, 286.17901611328125, -31.585250854492188, -23.31615447998047, 11.676742553710938, 135.26768493652344, 57.50598907470703, 164.80404663085938, 377.1499328613281, 131.7252197265625, 166.5091552734375, 62.878936767578125, 520.666748046875, 293.775390625, 258.048828125, 114.61862182617188, 157.17379760742188, 115.37428283691406, 83.04360961914062, 67.1457748413086, 85.138916015625, 340.4755554199219, -0.37477874755859375], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000295.npy"}
{"epoch": 0.4331864904552129, "step": 296, "batch_size": 64, "mean": 92.9300308227539, "std": 145.4501953125, "min": -271.1091613769531, "p10": -58.265440368652335, "median": 81.88672637939453, "p90": 268.44691162109376, "max": 448.89959716796875, "pos_frac": 0.734375, "sample": [43.839073181152344, 63.14429473876953, -27.08624267578125, 448.274658203125, -38.94122314453125, 177.94583129882812, -33.174644470214844, -13.436904907226562, -20.61548614501953, -267.0910949707031, 155.9351806640625, 62.378562927246094, -172.5712432861328, 207.52261352539062, 92.60220336914062, -120.07254028320312, 71.2301025390625, 188.40257263183594, 396.1098327636719, 271.9337158203125, 131.03683471679688, 100.37653350830078, -62.021766662597656, 9.397186279296875, 43.923583984375, 200.93191528320312, 8.083404541015625, -44.84754943847656, -34.205596923828125, -19.880603790283203, 288.34796142578125, 102.28358459472656, 155.5445098876953, 125.22347259521484, 74.44229125976562, 368.9324951171875, 178.36630249023438, 37.41550827026367, -271.1091613769531, 203.12985229492188, -82.46334838867188, 212.41864013671875, -49.50067901611328, -22.385957717895508, 273.43536376953125, 106.5482177734375, 84.4161605834961, 185.9119873046875, 72.00965881347656, 79.35729217529297, 74.83106231689453, 11.194007873535156, 9.400520324707031, 448.89959716796875, 196.4971160888672, 216.76641845703125, 126.85724639892578, 144.96292114257812, 260.31103515625, -63.56431579589844, 162.6614532470703, 174.24276733398438, 35.46488952636719, 207.55023193359375], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000296.npy"}
{"epoch": 0.434654919236417, "step": 297, "batch_size": 64, "mean": 112.05933380126953, "std": 109.78848266601562, "min": -158.5224609375, "p10": -7.211843872070309, "median": 105.62314987182617, "p90": 273.35340881347656, "max": 378.5842590332031, "pos_frac": 0.875, "sample": [61.275054931640625, 124.38420104980469, 266.0996398925781, 333.17529296875, 104.02696228027344, -3.9983577728271484, 286.622314453125, 123.1583251953125, -25.64750099182129, -61.91582107543945, 97.44503784179688, 111.28871154785156, 197.3721923828125, 142.14662170410156, 326.3634033203125, 46.71656036376953, 114.63369750976562, 65.63481140136719, 69.21231079101562, -8.589052200317383, -32.92420959472656, -59.53114318847656, 29.70716667175293, 33.12620544433594, 342.26593017578125, 14.276824951171875, 272.77459716796875, 39.63807678222656, 168.70254516601562, 125.68734741210938, 107.2193374633789, 86.21893310546875, 229.53883361816406, 120.75880432128906, 209.06739807128906, 45.8601188659668, 128.78318786621094, 88.03836822509766, 348.31842041015625, 170.95004272460938, 70.40470886230469, 168.7876434326172, -72.08350372314453, 127.44975280761719, 11.268789291381836, 50.14665222167969, 134.87440490722656, 42.2022705078125, 378.5842590332031, 9.944046020507812, 96.65261840820312, 273.6014709472656, 125.85055541992188, 8.317413330078125, -158.5224609375, 114.5215072631836, 67.2388687133789, 48.17207336425781, 177.27413940429688, 213.07333374023438, 84.00367736816406, 171.0703582763672, 76.33499908447266, 114.74853515625], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000297.npy"}
{"epoch": 0.43612334801762115, "step": 298, "batch_size": 64, "mean": 134.819091796875, "std": 153.6062774658203, "min": -106.12713623046875, "p10": -25.99762420654296, "median": 95.19966888427734, "p90": 348.46621093750014, "max": 688.6326293945312, "pos_frac": 0.8125, "sample": [91.5461654663086, -30.858474731445312, 37.248165130615234, -63.94020080566406, 67.40005493164062, -58.9356689453125, -7.761564254760742, 455.9858703613281, 55.02991485595703, 68.68359375, 68.47290802001953, 302.12298583984375, 58.54508972167969, 12.707534790039062, -2.939117431640625, 24.03447723388672, 104.64971160888672, 137.34417724609375, 104.97415924072266, 181.86302185058594, 430.99981689453125, -9.871688842773438, 195.0713653564453, -106.12713623046875, 311.59906005859375, 210.3804168701172, 3.3253326416015625, 214.69073486328125, 381.4975891113281, 364.26641845703125, -49.82769012451172, -77.90034484863281, -43.020164489746094, 2.63629150390625, 21.40892791748047, 377.53033447265625, 406.64300537109375, 169.41505432128906, 245.52996826171875, 247.68307495117188, 29.346250534057617, 10.146041870117188, 688.6326293945312, 158.89422607421875, 292.2542419433594, 89.71920776367188, 150.2539520263672, 280.3927917480469, 59.651554107666016, 137.78916931152344, -14.6556396484375, 287.30767822265625, 213.02703857421875, 56.694297790527344, 277.993408203125, 166.6975860595703, 96.1767578125, 229.04098510742188, -2.4833221435546875, 106.03414916992188, 51.394256591796875, 94.22257995605469, 43.31927490234375, 224.46932983398438], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000298.npy"}
{"epoch": 0.43759177679882527, "step": 299, "batch_size": 64, "mean": 129.19944763183594, "std": 150.3936004638672, "min": -162.3499755859375, "p10": -56.04174995422363, "median": 118.78539276123047, "p90": 365.7326751708985, "max": 512.0806274414062, "pos_frac": 0.8125, "sample": [-56.278465270996094, -161.06301879882812, -59.919158935546875, 174.06570434570312, -83.14959716796875, -66.24801635742188, 252.668212890625, 64.7083740234375, 54.17046356201172, 52.83606719970703, 398.2008972167969, 376.948974609375, 460.74127197265625, 64.6363296508789, 207.0614013671875, 473.333740234375, -162.3499755859375, 244.05801391601562, 125.22920227050781, 70.15592956542969, 32.81999969482422, 207.21542358398438, 45.90270233154297, 112.22737121582031, 141.55734252929688, 55.96178436279297, 512.0806274414062, 46.7075080871582, 66.45063018798828, -0.4793739318847656, -21.337915420532227, 65.34739685058594, 110.55209350585938, 174.02171325683594, 191.8165283203125, 380.3032531738281, 57.61424255371094, 234.83180236816406, 46.7357177734375, 123.7184829711914, 124.5072250366211, 112.35061645507812, 126.79209899902344, -55.48941421508789, 194.03872680664062, 126.82775115966797, -136.6603240966797, 339.5613098144531, 186.380859375, 331.1622619628906, -27.43267059326172, 24.305213928222656, 113.85230255126953, 150.59466552734375, 125.57637786865234, 194.76431274414062, 128.33334350585938, 418.56524658203125, 142.79774475097656, 100.50222778320312, -22.3695068359375, 270.9309387207031, 55.38520812988281, 229.63455200195312], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000299.npy"}
{"epoch": 0.4390602055800294, "step": 300, "batch_size": 64, "mean": 141.98764038085938, "std": 167.3097381591797, "min": -203.8716583251953, "p10": -109.21709289550779, "median": 154.3271026611328, "p90": 325.6793334960938, "max": 556.8153076171875, "pos_frac": 0.796875, "sample": [126.09320068359375, 206.6080322265625, 158.82568359375, 240.89646911621094, 182.6708526611328, -165.46124267578125, 49.95581817626953, 339.325927734375, 292.74542236328125, -3.1373329162597656, -157.68711853027344, 94.4669418334961, 206.3296356201172, 229.97445678710938, 92.2146224975586, 262.0778503417969, 116.42568969726562, 298.0701904296875, 19.04783821105957, 87.68582916259766, -25.322021484375, 213.91143798828125, 150.78665161132812, -12.852958679199219, 212.32589721679688, 311.4332275390625, -117.89492797851562, 32.84886169433594, 186.5263214111328, 125.59562683105469, 92.93415832519531, 386.600830078125, 158.34588623046875, -147.080078125, 286.05120849609375, -192.62075805664062, 342.45770263671875, 151.8321533203125, 314.76165771484375, 185.2056884765625, 109.39640045166016, 241.7043914794922, 280.66876220703125, 174.73248291015625, 28.34524917602539, 262.8583984375, 79.42198181152344, -88.96881103515625, 529.987548828125, 304.0759582519531, 330.35833740234375, -203.8716583251953, 156.82205200195312, -124.91809844970703, 9.59408950805664, 118.66516876220703, 277.0384216308594, 45.63262939453125, 460.7527160644531, 556.8153076171875, -25.2142333984375, 81.16572570800781, 181.48947143554688, -32.31800842285156], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000300.npy"}
{"epoch": 0.44052863436123346, "step": 301, "batch_size": 64, "mean": 141.79217529296875, "std": 138.87759399414062, "min": -144.49600219726562, "p10": -1.2794456481933565, "median": 110.15982055664062, "p90": 319.3412689208985, "max": 510.52215576171875, "pos_frac": 0.890625, "sample": [169.9767303466797, 225.98538208007812, -2.5284194946289062, 309.6186218261719, 209.06259155273438, 208.79098510742188, 421.1717224121094, 63.49153137207031, 282.5812072753906, -48.06111145019531, 366.536865234375, 298.342529296875, 50.72252655029297, 191.6952362060547, -36.35905456542969, 143.158935546875, 65.6729965209961, 323.50811767578125, 6.4506683349609375, 226.3153839111328, 475.1101379394531, 17.429691314697266, -16.966073989868164, 8.977951049804688, 217.409912109375, 104.61492919921875, 38.726158142089844, 43.287200927734375, 234.8633575439453, 169.41845703125, 262.59716796875, 106.2430419921875, 90.66703796386719, 196.16281127929688, 62.072227478027344, 89.07316589355469, 139.03334045410156, 35.76499938964844, 8.264644622802734, 2.1590576171875, 77.90228271484375, 39.210914611816406, 77.0394058227539, 74.13806915283203, 2.1402740478515625, 510.52215576171875, 259.63128662109375, 122.25564575195312, 279.94866943359375, 114.07659912109375, 381.3746032714844, 139.29736328125, 243.29981994628906, 261.1682434082031, 232.112060546875, 73.18952941894531, -144.49600219726562, -7.2807464599609375, 386.2815246582031, -49.6268310546875, 33.68986511230469, 1.63482666015625, 6.917209625244141, 199.22988891601562], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000301.npy"}
{"epoch": 0.4419970631424376, "step": 302, "batch_size": 64, "mean": 117.74930572509766, "std": 132.63661193847656, "min": -119.46663665771484, "p10": -32.065709686279284, "median": 96.2752799987793, "p90": 303.6031463623047, "max": 528.9874877929688, "pos_frac": 0.8125, "sample": [108.825439453125, 240.66015625, 318.2532958984375, 12.88888931274414, 106.7474365234375, 63.72746276855469, 223.5601043701172, -0.9208984375, 82.11900329589844, 50.579681396484375, 222.12026977539062, 51.48692321777344, 92.2020034790039, 30.10254669189453, 51.55364990234375, -20.05725860595703, 226.40538024902344, -1.2095146179199219, 220.1675567626953, 193.15347290039062, 125.65768432617188, 136.74679565429688, 187.10867309570312, 139.97821044921875, -71.09269714355469, -88.15709686279297, 178.18630981445312, 368.1305236816406, 82.46146392822266, -104.72042846679688, 143.75396728515625, 30.54479217529297, 181.8346710205078, -110.35054016113281, -19.677753448486328, 396.2342834472656, 528.9874877929688, 245.7966766357422, 110.3011245727539, -37.212188720703125, -3.280548095703125, 296.39813232421875, -119.46663665771484, 196.09527587890625, 210.57115173339844, 89.37703704833984, 47.507049560546875, 330.49530029296875, 100.34855651855469, 150.81085205078125, 21.00358009338379, 153.99452209472656, -95.84256744384766, 133.8068084716797, 50.1124382019043, 139.81201171875, 78.67652893066406, 306.6910095214844, 25.589675903320312, 403.576171875, 61.69642639160156, 90.35731506347656, 85.3603286743164, 85.38789367675781], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000302.npy"}
{"epoch": 0.4434654919236417, "step": 303, "batch_size": 64, "mean": 118.02484130859375, "std": 148.70689392089844, "min": -156.34683227539062, "p10": -29.783688926696776, "median": 88.39288711547852, "p90": 339.04544982910164, "max": 539.1705322265625, "pos_frac": 0.765625, "sample": [135.63058471679688, 5.264049530029297, 18.070350646972656, 403.77032470703125, -97.83899688720703, 233.5352020263672, 261.0433044433594, 455.5716552734375, 8.548803329467773, -14.2281494140625, 240.32693481445312, 134.68844604492188, 280.70147705078125, -24.995132446289062, 29.982826232910156, 86.00796508789062, 50.08439636230469, 68.97676086425781, -22.580373764038086, 242.9185791015625, -156.34683227539062, -6.352657318115234, -23.538169860839844, 188.12490844726562, 285.2496643066406, 359.65753173828125, -3.357675552368164, 183.6377716064453, -84.0258560180664, 122.55500793457031, 8.61834716796875, 187.13980102539062, 407.72210693359375, 131.00827026367188, 298.15203857421875, 49.358123779296875, -101.86666107177734, 23.44219970703125, 90.7778091430664, 539.1705322265625, 325.5528259277344, 9.827667236328125, 132.5003662109375, 354.8473815917969, 38.077972412109375, 69.23876953125, 150.34437561035156, 277.9507751464844, 81.06267547607422, -29.58388328552246, 138.26058959960938, 48.72644805908203, 130.41519165039062, 39.8390998840332, 104.006591796875, 344.8280029296875, -16.99032211303711, -59.85499572753906, 108.65768432617188, -29.869319915771484, 139.2619171142578, 51.080474853515625, -36.434715270996094, 187.23947143554688], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000303.npy"}
{"epoch": 0.44493392070484583, "step": 304, "batch_size": 64, "mean": 125.19536590576172, "std": 102.76283264160156, "min": -108.15293884277344, "p10": -5.356690788269039, "median": 122.89799499511719, "p90": 256.5961517333984, "max": 373.9634094238281, "pos_frac": 0.859375, "sample": [0.271392822265625, 111.20323181152344, 202.0582733154297, 89.75457000732422, -0.85858154296875, -46.44145202636719, -7.063995361328125, 10.499191284179688, 117.22509765625, 147.13885498046875, 158.2725830078125, 191.4563446044922, 84.56597900390625, 211.17140197753906, 131.78134155273438, 198.5538330078125, 168.88909912109375, 168.01687622070312, 99.74431610107422, 122.22746276855469, 167.40472412109375, 256.67352294921875, 56.56114196777344, 41.979583740234375, 204.837158203125, 108.35603332519531, 209.0955047607422, 39.70111083984375, 256.4156188964844, 72.96014404296875, 85.60920715332031, 99.20236206054688, 318.8334045410156, 107.8436050415039, -30.718711853027344, -108.15293884277344, 118.74374389648438, 52.610626220703125, 105.96733093261719, 128.1747283935547, -64.90228271484375, 280.24163818359375, -55.73069763183594, -1.3729801177978516, 283.66558837890625, 54.592262268066406, 7.0128326416015625, 47.715660095214844, 214.56671142578125, 204.81600952148438, 226.67855834960938, -56.574031829833984, 123.56852722167969, 373.9634094238281, 288.3487548828125, 87.42205810546875, 207.7593994140625, 134.85552978515625, 272.42620849609375, 200.10592651367188, 181.03826904296875, 216.74871826171875, 168.86358642578125, 166.1298370361328], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000304.npy"}
{"epoch": 0.44640234948604995, "step": 305, "batch_size": 64, "mean": 119.18153381347656, "std": 143.3424530029297, "min": -129.84698486328125, "p10": -28.368702697753903, "median": 82.53142929077148, "p90": 302.9187713623047, "max": 684.713623046875, "pos_frac": 0.796875, "sample": [216.19046020507812, 152.33157348632812, 76.3358154296875, 136.91578674316406, 110.56462097167969, 404.306640625, 146.03335571289062, 70.74974060058594, 181.32667541503906, 346.6926574707031, 117.43412780761719, 50.01368713378906, 87.6785659790039, -13.97601318359375, -28.956146240234375, 146.5025634765625, 180.576171875, 298.1413269042969, 77.3717269897461, -12.140998840332031, 684.713623046875, 212.5196533203125, 60.437137603759766, 142.83815002441406, 241.78277587890625, 44.149322509765625, 64.76947784423828, 163.8203582763672, 210.3230743408203, -37.990142822265625, -4.994346618652344, 237.4048309326172, -26.998001098632812, 166.6280517578125, 173.22970581054688, 71.97352600097656, 433.8507995605469, 126.879150390625, 23.392946243286133, 60.267066955566406, 67.8174819946289, -7.811302185058594, 77.38429260253906, 62.93492889404297, 304.96624755859375, 339.9747314453125, 159.2372283935547, -43.122039794921875, -98.61880493164062, 135.15493774414062, -5.72393798828125, 406.55548095703125, 260.4969787597656, 39.62615203857422, -82.42413330078125, 13.627471923828125, -67.76415252685547, 12.813491821289062, 163.53567504882812, -129.84698486328125, 25.78937530517578, 163.392333984375, 25.244949340820312, 11.288091659545898], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000305.npy"}
{"epoch": 0.447870778267254, "step": 306, "batch_size": 64, "mean": 122.83321380615234, "std": 125.07752227783203, "min": -191.93215942382812, "p10": -3.686279678344726, "median": 104.00725555419922, "p90": 274.31601867675784, "max": 514.294921875, "pos_frac": 0.859375, "sample": [42.788516998291016, -3.220012664794922, 144.19163513183594, 18.43603515625, -44.891357421875, 25.718063354492188, 252.63668823242188, 101.43053436279297, 48.633941650390625, -12.066596984863281, 137.22377014160156, 296.0585021972656, 25.22308349609375, 176.78553771972656, 79.7977294921875, 106.58397674560547, 20.779056549072266, 271.2447814941406, 100.30644989013672, 44.04505920410156, 13.507011413574219, 53.94609832763672, -3.8861083984375, 267.6371154785156, 167.64556884765625, 205.7602996826172, 32.788326263427734, -1.4299869537353516, 242.6792755126953, 150.66029357910156, 22.623565673828125, 48.97303009033203, 220.483642578125, 227.15646362304688, 67.60334777832031, 210.34381103515625, 79.31239318847656, -49.503074645996094, 65.15879821777344, 144.35350036621094, -191.93215942382812, 275.63226318359375, 129.61160278320312, -51.73524475097656, 7.570772171020508, 210.09521484375, -8.241493225097656, 165.90280151367188, 435.3731994628906, 361.9246826171875, 214.91148376464844, 244.1757049560547, 151.3234405517578, 34.77124786376953, 307.57391357421875, 30.52895736694336, 64.52468872070312, 36.606285095214844, 153.6769561767578, 129.7841033935547, 514.294921875, 199.63638305664062, 159.71607971191406, 288.0810241699219], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000306.npy"}
{"epoch": 0.44933920704845814, "step": 307, "batch_size": 64, "mean": 117.76042175292969, "std": 132.6990203857422, "min": -257.5232849121094, "p10": -32.57609519958496, "median": 101.95780181884766, "p90": 305.30108032226565, "max": 450.01116943359375, "pos_frac": 0.84375, "sample": [221.79745483398438, 97.5506820678711, 42.31548309326172, 95.27218627929688, 114.0057144165039, 133.2624053955078, -44.04351806640625, 39.44514465332031, -22.9012451171875, 380.3569030761719, 112.03607940673828, 17.766822814941406, 114.36988830566406, -56.3897705078125, 289.68402099609375, -257.5232849121094, 360.0788879394531, -114.27043151855469, 165.59872436523438, -39.874717712402344, 139.23123168945312, 141.93844604492188, 78.33733367919922, 141.333984375, 381.79644775390625, 143.97750854492188, 91.55943298339844, 60.79652404785156, 103.19148254394531, 347.7286682128906, 64.17884063720703, 297.96575927734375, 54.58307647705078, 100.72412109375, 89.67927551269531, 12.614242553710938, 159.16629028320312, 80.34678649902344, 252.4057159423828, 321.44580078125, 55.43170928955078, 227.24346923828125, -32.80219268798828, 160.42874145507812, 208.6123809814453, 35.04991149902344, 118.33576965332031, 216.6114501953125, 36.744483947753906, 92.55754089355469, 168.03077697753906, 4.701379776000977, 305.84393310546875, 304.034423828125, 209.97119140625, 450.01116943359375, 34.37499237060547, -32.04853439331055, 178.45285034179688, 48.03265380859375, 41.81117248535156, 125.21493530273438, -14.958206176757812, -116.55785369873047], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000307.npy"}
{"epoch": 0.45080763582966227, "step": 308, "batch_size": 64, "mean": 131.8763885498047, "std": 126.04998779296875, "min": -117.68441009521484, "p10": -24.066464042663572, "median": 122.70753860473633, "p90": 310.6393249511719, "max": 445.2265930175781, "pos_frac": 0.859375, "sample": [162.5225830078125, 315.65863037109375, -108.3797607421875, 220.65298461914062, 17.619140625, 326.37506103515625, -117.68441009521484, 75.51718139648438, 57.0751953125, 86.49449920654297, 285.47210693359375, 34.69839859008789, 126.23191833496094, 77.56222534179688, 193.75704956054688, 352.9541015625, -94.15962219238281, 175.3194122314453, 101.09508514404297, -0.5999984741210938, 176.23072814941406, 353.3316345214844, 269.46759033203125, 313.18231201171875, 19.486541748046875, 146.26358032226562, 115.18359375, 40.645729064941406, 147.42550659179688, 26.670793533325195, 158.14340209960938, 119.18315887451172, 104.63729095458984, 193.22584533691406, 114.68089294433594, 280.5031433105469, 304.7056884765625, 78.46774291992188, -37.48789978027344, 49.53124237060547, 41.51580810546875, 193.58714294433594, 149.49847412109375, 128.86187744140625, 211.86778259277344, 445.2265930175781, 144.87667846679688, 96.96058654785156, 55.280662536621094, 214.67041015625, 412.8833923339844, 188.943603515625, 89.08890533447266, 16.610803604125977, -21.157346725463867, 27.302457809448242, 225.80841064453125, 246.590087890625, 204.4800262451172, -62.684776306152344, 214.65853881835938, 11.023765563964844, -25.313228607177734, -32.15222930908203], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000308.npy"}
{"epoch": 0.4522760646108664, "step": 309, "batch_size": 64, "mean": 131.0915069580078, "std": 129.8876953125, "min": -144.00494384765625, "p10": -11.324162292480468, "median": 115.55319595336914, "p90": 294.4834289550782, "max": 455.5001525878906, "pos_frac": 0.84375, "sample": [71.64936828613281, 46.92030334472656, 58.32056427001953, -60.736968994140625, 138.97134399414062, 164.22341918945312, 84.7696304321289, -7.478277206420898, 353.49481201171875, 54.483642578125, 235.64129638671875, 261.65374755859375, 123.8755111694336, 398.7281799316406, 80.91471099853516, 49.035804748535156, 12.666728973388672, 246.81179809570312, 261.1958312988281, 217.12103271484375, 199.1920623779297, -142.9420166015625, 197.34628295898438, -19.82012939453125, 98.65023803710938, 71.09009552001953, 226.7769012451172, 160.2384490966797, -10.830169677734375, 55.075103759765625, -28.12580108642578, 396.4580078125, 6.1513671875, 19.314559936523438, 121.30174255371094, 314.854736328125, 455.5001525878906, 282.8307189941406, 72.40196228027344, -11.535873413085938, -6.940319061279297, 70.48970794677734, 74.44291687011719, 285.7295837402344, 117.47833251953125, 152.899658203125, 321.797119140625, 164.79502868652344, 181.35369873046875, 35.50449752807617, 298.2350769042969, -144.00494384765625, 143.15765380859375, 197.25723266601562, -59.02850341796875, 61.4296760559082, 279.9811096191406, 279.87408447265625, 10.916839599609375, 113.62805938720703, 127.55387878417969, 53.20486068725586, 261.37884521484375, 112.53153991699219], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000309.npy"}
{"epoch": 0.45374449339207046, "step": 310, "batch_size": 64, "mean": 122.96668243408203, "std": 105.62281036376953, "min": -69.10797119140625, "p10": -2.9498939514160147, "median": 110.72494125366211, "p90": 274.76922912597655, "max": 360.54736328125, "pos_frac": 0.875, "sample": [-69.10797119140625, 76.65393829345703, 232.79312133789062, -2.1089401245117188, 95.89091491699219, 256.00311279296875, 102.29798889160156, 29.20420265197754, 37.07218933105469, 127.43853759765625, -47.56922149658203, 142.3673095703125, 205.34115600585938, -8.143924713134766, 39.61235809326172, 110.81993865966797, 130.95863342285156, 135.4908905029297, 110.62994384765625, 225.2840576171875, 102.27134704589844, 57.32527160644531, 207.01055908203125, 27.420166015625, 51.82666015625, 275.577880859375, 187.72084045410156, 56.11380386352539, 21.380815505981445, 107.77108001708984, -3.310302734375, 344.6578674316406, 24.483394622802734, 103.63534545898438, 360.54736328125, 118.92291259765625, 199.50390625, 27.55528450012207, 34.762351989746094, 40.982749938964844, 19.148651123046875, 147.53237915039062, 256.08489990234375, 168.02308654785156, 101.61988067626953, 203.5586700439453, 351.94195556640625, -52.52826690673828, 308.35076904296875, 159.17327880859375, 140.8746337890625, -40.24231719970703, 149.91978454589844, 99.98524475097656, -39.15013122558594, 33.200050354003906, 111.96417236328125, 301.37017822265625, 275.0960998535156, 183.19052124023438, 79.16799926757812, 274.00653076171875, 209.3610076904297, 151.13096618652344], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000310.npy"}
{"epoch": 0.4552129221732746, "step": 311, "batch_size": 64, "mean": 144.33042907714844, "std": 153.36610412597656, "min": -93.70094299316406, "p10": -36.00371246337889, "median": 132.52642059326172, "p90": 342.8493316650391, "max": 559.3551025390625, "pos_frac": 0.765625, "sample": [314.11041259765625, 274.5947570800781, 257.1612548828125, 14.700325012207031, -82.86225128173828, 220.4523162841797, -8.18588638305664, 163.3069610595703, 214.79095458984375, 310.8367614746094, 143.76278686523438, 35.81890869140625, 253.30419921875, 518.90185546875, 136.08824157714844, 155.038818359375, 271.3432312011719, 103.52959442138672, 45.20555877685547, 357.5663757324219, 164.4734649658203, 196.99267578125, 344.4925842285156, 129.2805938720703, 67.11376190185547, 168.16065979003906, 24.981765747070312, -59.59185791015625, 92.29940032958984, 109.02629852294922, -50.808868408203125, 161.53863525390625, -4.341209411621094, 203.2014923095703, 279.08941650390625, -43.42002868652344, 3.4284515380859375, 559.3551025390625, -18.698974609375, 185.71185302734375, 89.50707244873047, 30.517860412597656, 197.09327697753906, 86.61579895019531, 416.79022216796875, 93.21505737304688, -1.1248283386230469, -7.82276725769043, 71.71469116210938, 82.40133666992188, -2.1517333984375, 468.68878173828125, 198.24014282226562, -46.22657012939453, -93.70094299316406, 309.04931640625, 152.224853515625, 135.77224731445312, -16.80127716064453, 113.60935974121094, 339.01507568359375, -50.68098068237305, -6.893592834472656, 466.3446044921875], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000311.npy"}
{"epoch": 0.4566813509544787, "step": 312, "batch_size": 64, "mean": 132.19378662109375, "std": 144.12118530273438, "min": -210.8433837890625, "p10": -23.134867095947257, "median": 121.89448547363281, "p90": 322.23646850585953, "max": 575.3331909179688, "pos_frac": 0.875, "sample": [64.25759887695312, 575.3331909179688, 342.6370849609375, 119.87020874023438, 78.9554443359375, -12.637382507324219, 58.32988739013672, 50.974945068359375, 103.7544937133789, 284.63970947265625, -210.8433837890625, 338.349365234375, 459.31939697265625, 218.2399139404297, 179.84622192382812, 183.11294555664062, -169.36990356445312, 93.66387939453125, 241.9851837158203, 174.34146118164062, 255.76080322265625, 20.659954071044922, 88.62991333007812, 195.54647827148438, 113.11212158203125, 5.78923225402832, 65.95650482177734, 363.5913391113281, 232.45578002929688, 193.37704467773438, 144.40357971191406, 150.09402465820312, 163.56478881835938, 172.92698669433594, -75.92855834960938, 100.73310852050781, 168.2207794189453, 338.9470520019531, 67.08079528808594, 110.48062133789062, 219.81805419921875, 106.2874526977539, 0.8295269012451172, 218.5006866455078, 184.12220764160156, 83.88481140136719, -27.6337890625, 497.6100158691406, 50.88108825683594, 72.9911880493164, 142.64337158203125, 154.5189971923828, -98.15230560302734, 38.84623718261719, 15.647499084472656, 162.1516876220703, 176.92820739746094, -33.92045593261719, -148.53787231445312, 178.97071838378906, 212.83055114746094, 10.357147216796875, 66.74580383300781, 123.91876220703125], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000312.npy"}
{"epoch": 0.4581497797356828, "step": 313, "batch_size": 64, "mean": 119.13378143310547, "std": 151.51220703125, "min": -243.2392578125, "p10": -51.86152877807616, "median": 102.44617462158203, "p90": 294.26816711425784, "max": 549.7338256835938, "pos_frac": 0.8125, "sample": [65.7947998046875, 157.85812377929688, 41.471893310546875, 267.0374755859375, -73.1759033203125, 103.99497985839844, 318.9814758300781, 137.46902465820312, 210.30194091796875, 32.72372817993164, 23.113204956054688, -88.78834533691406, 96.74786376953125, 92.55015563964844, 13.538118362426758, 52.23439025878906, 196.30419921875, 155.71522521972656, 96.90711212158203, 134.45748901367188, 59.13513946533203, 295.0897521972656, -243.2392578125, -55.31114196777344, 73.69268798828125, 344.1938781738281, 226.66070556640625, 52.22232437133789, 156.28219604492188, 466.0047607421875, 244.6459503173828, 109.579833984375, -3.303709030151367, 55.23735046386719, -153.19537353515625, 100.89736938476562, -43.81243133544922, -206.75946044921875, 292.35113525390625, 192.8919219970703, -10.890754699707031, 163.15435791015625, 20.462162017822266, 109.16253662109375, 263.1437683105469, 126.96538543701172, 43.63208770751953, 219.80343627929688, 239.5572967529297, -97.44529724121094, 260.97052001953125, 77.62432861328125, 143.139892578125, -19.105857849121094, 52.93119812011719, 392.52728271484375, 416.1999206542969, 6.6701202392578125, -28.784469604492188, 46.15342712402344, 128.17222595214844, 242.86376953125, 279.420166015625, 549.7338256835938], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000313.npy"}
{"epoch": 0.45961820851688695, "step": 314, "batch_size": 64, "mean": 119.68209075927734, "std": 153.48411560058594, "min": -242.67637634277344, "p10": -29.85929679870605, "median": 100.17032623291016, "p90": 288.9424987792969, "max": 556.24755859375, "pos_frac": 0.75, "sample": [430.1725769042969, -204.25567626953125, 103.20360565185547, 18.10771942138672, 11.978363037109375, 254.41702270507812, 84.16450500488281, 180.938720703125, 105.31375122070312, -30.7398681640625, 205.48080444335938, -10.235000610351562, 154.71347045898438, 135.1043243408203, 99.68782043457031, -242.67637634277344, -91.44737243652344, 95.90254211425781, 442.1319580078125, 407.3294677734375, 18.69903564453125, -26.488733291625977, -11.580841064453125, 62.627628326416016, 26.45050048828125, -12.183712005615234, -18.720169067382812, -52.716522216796875, -0.31722259521484375, 280.1596984863281, 291.1831970214844, 35.67816162109375, 196.96331787109375, 260.66363525390625, 209.61404418945312, 0.7951545715332031, -20.69145965576172, 156.4968719482422, 100.65283203125, 369.7506103515625, 283.456787109375, 283.7142028808594, 556.24755859375, 258.9664001464844, 15.608211517333984, 178.0498046875, -46.35157012939453, 279.09527587890625, -3.8787689208984375, 147.5197296142578, 221.9134063720703, 52.122802734375, 217.1695556640625, 229.5480499267578, 296.1866760253906, 216.47776794433594, 96.87271118164062, 42.64263916015625, 119.29774475097656, 43.224143981933594, 5.2262115478515625, -27.804630279541016, 230.60256958007812, -52.58207702636719], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000314.npy"}
{"epoch": 0.461086637298091, "step": 315, "batch_size": 64, "mean": 113.18681335449219, "std": 131.47451782226562, "min": -160.4247589111328, "p10": -21.794832229614254, "median": 95.56494522094727, "p90": 273.2305389404297, "max": 563.2267456054688, "pos_frac": 0.796875, "sample": [266.78533935546875, 50.83130645751953, 146.05625915527344, 100.27157592773438, -23.39590072631836, 79.08738708496094, 42.36564636230469, 62.26936340332031, 118.59181213378906, -4.0899200439453125, 207.6733856201172, -5.628660202026367, -63.45561599731445, 157.77603149414062, 96.76924896240234, 326.980712890625, 119.64241027832031, 112.61686706542969, -90.62213897705078, 275.9927673339844, 59.1600341796875, 291.1898193359375, 177.57435607910156, 36.599395751953125, -33.811553955078125, 126.6468734741211, 243.44744873046875, 122.99481964111328, 308.33355712890625, 81.15415954589844, 346.9602355957031, 94.36064147949219, 60.640350341796875, 77.46453857421875, 172.6208953857422, 238.6011962890625, 34.58746337890625, 61.697021484375, 194.66822814941406, 21.196258544921875, 18.127460479736328, 98.19625854492188, 563.2267456054688, 70.47089385986328, 184.41015625, 61.462486267089844, -120.26216888427734, -15.439359664916992, -18.059005737304688, 251.82913208007812, 67.79965209960938, 130.9920196533203, 475.5931091308594, 104.8541488647461, 192.3671417236328, -31.577089309692383, 255.69326782226562, -10.008907318115234, 243.19094848632812, 105.26101684570312, -160.4247589111328, -4.486143112182617, 62.00386428833008, 26.132102966308594], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000315.npy"}
{"epoch": 0.46255506607929514, "step": 316, "batch_size": 64, "mean": 154.8727569580078, "std": 150.6689910888672, "min": -79.81370544433594, "p10": 0.100404357910159, "median": 124.06172180175781, "p90": 391.16799011230484, "max": 572.75146484375, "pos_frac": 0.890625, "sample": [180.21424865722656, 74.86822509765625, 337.33837890625, 70.78419494628906, 99.62303924560547, 493.04339599609375, 124.11161041259766, 421.3162841796875, 285.17236328125, 473.28131103515625, 292.2040710449219, 178.26148986816406, 4.417755126953125, -34.42037582397461, -1.1069564819335938, 150.40663146972656, 122.27738952636719, 110.11451721191406, -30.12900161743164, 143.71229553222656, 28.452781677246094, 63.565311431884766, 479.33837890625, 223.35386657714844, 178.91944885253906, 75.96749114990234, 79.15359497070312, 406.01593017578125, 105.86750793457031, 356.5227966308594, 22.18843650817871, 82.0724105834961, 21.065948486328125, 173.36317443847656, 276.9872741699219, 303.61260986328125, 247.3816375732422, 5.788383483886719, 139.8154296875, 2.9175796508789062, 412.29443359375, 226.67007446289062, 19.534027099609375, 294.8994445800781, 124.01183319091797, 85.1710433959961, -79.81370544433594, 48.39005661010742, 572.75146484375, 171.7404022216797, 324.5256042480469, -77.18984985351562, 181.40176391601562, 139.3044891357422, 144.80050659179688, -6.643272399902344, -52.82103729248047, 26.778823852539062, 14.862449645996094, 90.10430145263672, 191.99786376953125, 35.96344757080078, 71.64813232421875, 183.63369750976562], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000316.npy"}
{"epoch": 0.46402349486049926, "step": 317, "batch_size": 64, "mean": 173.1220703125, "std": 145.41259765625, "min": -135.23922729492188, "p10": -4.839561653137202, "median": 165.63854217529297, "p90": 353.97946472167973, "max": 526.7654418945312, "pos_frac": 0.890625, "sample": [220.61907958984375, 10.885108947753906, 30.43506622314453, 360.2911376953125, 414.7696838378906, 39.733795166015625, 224.21896362304688, 262.3709716796875, -76.60624694824219, 335.68377685546875, 526.7654418945312, 126.51731872558594, 312.47686767578125, 58.8077392578125, 151.21969604492188, 261.9620056152344, 127.83338928222656, 58.05469512939453, 127.31714630126953, 285.214111328125, 19.58087158203125, 219.3851776123047, 247.6253662109375, 371.64959716796875, 44.29545211791992, 394.4599914550781, -135.23922729492188, 66.59282684326172, 487.7103271484375, 178.685302734375, 336.3845520019531, 250.1489715576172, 79.07487487792969, 339.2522277832031, 274.258056640625, 19.517898559570312, 168.2006072998047, 110.96748352050781, 268.5379943847656, 163.07647705078125, 85.0250244140625, 303.3899841308594, -6.945470809936523, -37.48222351074219, 268.7877197265625, 222.6056365966797, 281.5420227050781, 100.2179946899414, 199.50274658203125, -49.050682067871094, 126.20338439941406, 148.55722045898438, 208.46246337890625, 78.45020294189453, -43.56598663330078, 148.28216552734375, 104.2330322265625, 0.07422637939453125, 492.8143005371094, 203.62132263183594, 55.389503479003906, 184.93603515625, -20.360855102539062, 262.3879699707031], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000317.npy"}
{"epoch": 0.4654919236417034, "step": 318, "batch_size": 64, "mean": 151.012939453125, "std": 138.28282165527344, "min": -207.1664581298828, "p10": -8.15765800476073, "median": 137.36656951904297, "p90": 328.6601715087891, "max": 513.4267578125, "pos_frac": 0.890625, "sample": [67.69770050048828, 90.68190002441406, 513.4267578125, 108.8560791015625, 212.40093994140625, 120.29508972167969, 39.55449676513672, 169.93943786621094, 140.78890991210938, 469.30084228515625, 17.450380325317383, 99.64547729492188, 187.11065673828125, 199.42918395996094, 447.50274658203125, 229.26268005371094, 36.29566955566406, 99.47969055175781, -13.491477966308594, 28.501699447631836, 289.9916076660156, 37.755516052246094, -47.11729049682617, 321.2135925292969, 407.3190002441406, 245.86988830566406, -50.21285629272461, 4.287921905517578, 223.22523498535156, 251.177490234375, 65.70651245117188, 290.7997741699219, 58.22570037841797, 302.1425476074219, 134.0736846923828, 94.34514617919922, 355.4114990234375, -16.480377197265625, -42.904510498046875, 332.4517517089844, 331.8515625, 174.79733276367188, 78.28907775878906, 160.15725708007812, 225.0892791748047, 162.37442016601562, 127.11078643798828, 231.07791137695312, 90.86265563964844, -55.10939025878906, -207.1664581298828, 87.65766143798828, 224.4506378173828, 179.18576049804688, 262.14874267578125, 278.0828857421875, 159.40878295898438, 63.21994400024414, 33.31422424316406, 140.65945434570312, 68.00955963134766, 254.85357666015625, 16.89551544189453, 56.19658660888672], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000318.npy"}
{"epoch": 0.4669603524229075, "step": 319, "batch_size": 64, "mean": 143.07208251953125, "std": 139.1573028564453, "min": -124.0673828125, "p10": -49.337984848022444, "median": 139.00181579589844, "p90": 305.3605072021485, "max": 490.67706298828125, "pos_frac": 0.796875, "sample": [193.62094116210938, 122.58158874511719, 181.3525390625, 172.295166015625, 137.8763885498047, 175.45809936523438, 211.81777954101562, 257.17242431640625, 221.77978515625, 52.361167907714844, -24.084150314331055, -120.15412139892578, 226.114501953125, 140.1272430419922, 125.86104583740234, 84.17994689941406, 57.34040069580078, 199.16033935546875, 258.689453125, 113.66394805908203, 82.14906311035156, 196.72515869140625, 170.302734375, 224.3685302734375, 121.13197326660156, 25.342208862304688, 229.88339233398438, 488.9652404785156, 288.489990234375, 262.62060546875, -57.70306396484375, 114.29812622070312, 67.03347778320312, 256.2119140625, -124.0673828125, 219.85464477539062, -74.11737060546875, -54.88296890258789, -26.086456298828125, 261.9146728515625, 400.8678894042969, 312.5907287597656, 369.1059265136719, -36.399688720703125, 152.92747497558594, -8.105087280273438, 327.51617431640625, 287.5823059082031, 236.19398498535156, 333.990478515625, 490.67706298828125, -20.92384147644043, 72.05152893066406, 230.79019165039062, -76.85743713378906, 22.58916473388672, -87.10480499267578, -19.29807472229004, 108.83499145507812, 119.02922058105469, 82.02490234375, 135.94345092773438, 156.38589477539062, 106.55216979980469], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000319.npy"}
{"epoch": 0.4684287812041116, "step": 320, "batch_size": 64, "mean": 160.73431396484375, "std": 153.9678497314453, "min": -98.51158905029297, "p10": -15.540300941467283, "median": 154.62417602539062, "p90": 365.4262878417969, "max": 663.53955078125, "pos_frac": 0.859375, "sample": [52.97898864746094, 154.51629638671875, 460.0071716308594, 86.55613708496094, 270.4755554199219, 68.52555084228516, 226.88143920898438, -98.51158905029297, 507.20306396484375, 172.9916229248047, 285.43743896484375, 145.54776000976562, -16.853836059570312, 195.35198974609375, 154.7320556640625, 384.32861328125, 256.8436279296875, 173.71429443359375, 172.44696044921875, -77.4694595336914, 230.575927734375, 205.9119873046875, 43.831398010253906, 87.30872344970703, 61.92444610595703, 98.76351165771484, 229.97532653808594, 191.7109375, -30.27587890625, 328.0065002441406, -18.604782104492188, 209.8791046142578, 66.94243621826172, 220.01211547851562, 350.51971435546875, 113.30854797363281, 71.00640869140625, 118.27630615234375, 663.53955078125, 78.65634155273438, 324.2466125488281, 0.9176406860351562, 92.13491821289062, -0.2747344970703125, 371.8148193359375, 28.203392028808594, 180.495361328125, 129.5241241455078, -12.475385665893555, 20.469696044921875, 186.11917114257812, 182.9503936767578, 142.84165954589844, 319.2463684082031, 287.74212646484375, 439.325927734375, 455.62701416015625, 17.162626266479492, -34.40127182006836, 2.5129165649414062, 22.25677490234375, 167.91249084472656, 155.1824188232422, -89.5107421875], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000320.npy"}
{"epoch": 0.4698972099853157, "step": 321, "batch_size": 64, "mean": 139.42694091796875, "std": 161.06417846679688, "min": -181.4079132080078, "p10": -27.317045211791992, "median": 131.2984390258789, "p90": 332.3142730712891, "max": 608.5118408203125, "pos_frac": 0.796875, "sample": [369.8587646484375, 150.02880859375, 176.99876403808594, 178.57058715820312, 32.47142791748047, 226.8997344970703, 263.8829650878906, 122.64207458496094, 86.11441040039062, 327.06634521484375, 267.3800048828125, 158.9436798095703, 1.630889892578125, -27.337238311767578, -133.1992950439453, 184.4724884033203, 321.99078369140625, 224.44512939453125, 178.66510009765625, 236.8432159423828, 510.84613037109375, 87.3548355102539, -181.4079132080078, 224.40951538085938, 222.22630310058594, 227.85089111328125, 230.0451202392578, -162.4080352783203, -6.856739044189453, 90.82977294921875, 148.38182067871094, 396.68121337890625, -0.172882080078125, 249.3445281982422, 99.92603302001953, 334.5633850097656, 74.59523010253906, -91.61492919921875, 608.5118408203125, 112.13819122314453, -12.665191650390625, 19.660669326782227, 3.6222171783447266, 119.23295593261719, -15.151237487792969, 92.03068542480469, -27.269927978515625, -51.02824783325195, 121.2098388671875, 206.9583282470703, 475.6995849609375, 428.6259460449219, -13.855205535888672, -171.94012451171875, 110.99346160888672, 139.95480346679688, 153.509765625, 184.40000915527344, 26.070894241333008, 286.2233581542969, 97.4505615234375, 175.24594116210938, 35.9266357421875, 14.805694580078125], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000321.npy"}
{"epoch": 0.4713656387665198, "step": 322, "batch_size": 64, "mean": 147.5738983154297, "std": 143.60267639160156, "min": -143.6774444580078, "p10": -33.00102500915527, "median": 139.79084014892578, "p90": 345.75914306640624, "max": 448.5283508300781, "pos_frac": 0.84375, "sample": [-29.760093688964844, 368.3238220214844, 156.75390625, 448.5283508300781, 11.379447937011719, -143.6774444580078, 346.873779296875, 273.7059631347656, 259.9888916015625, -0.6330528259277344, -49.26377868652344, 21.73322296142578, 278.18023681640625, 209.10040283203125, 205.80111694335938, 245.11488342285156, 68.8302993774414, 135.46920776367188, 0.450958251953125, 30.186370849609375, 379.5328674316406, 296.571533203125, 68.40414428710938, 431.2798156738281, -66.70413208007812, 113.4007339477539, 254.99111938476562, 377.194091796875, -121.75847625732422, 343.1583251953125, 67.08833312988281, 98.90762329101562, 120.85536193847656, 287.66015625, 36.337158203125, 47.70978927612305, 176.13462829589844, 156.00917053222656, -21.53390121459961, 123.7900390625, 205.85472106933594, 286.7752685546875, 221.6676788330078, -69.88472747802734, 334.8297119140625, 141.05470275878906, 163.79798889160156, -34.38999557495117, 184.1765594482422, 37.30003356933594, 129.8111572265625, 428.5115966796875, 197.30819702148438, 202.34921264648438, 187.56011962890625, 157.5218963623047, 32.65342712402344, 105.69617462158203, 53.65784454345703, -99.57229614257812, 66.86297607421875, 81.26957702636719, 285.2757568359375, 138.5269775390625], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000322.npy"}
{"epoch": 0.47283406754772395, "step": 323, "batch_size": 64, "mean": 142.3433837890625, "std": 147.13084411621094, "min": -164.404541015625, "p10": -18.37316875457763, "median": 132.65816497802734, "p90": 330.3092437744141, "max": 574.6619873046875, "pos_frac": 0.828125, "sample": [320.4740295410156, 208.5388641357422, 376.0582275390625, -164.404541015625, 341.8557434082031, -114.02339935302734, 331.1072998046875, 574.6619873046875, -21.10787582397461, 128.29122924804688, 268.94873046875, 19.29498291015625, 102.34109497070312, 134.09776306152344, 3.4037628173828125, 195.39369201660156, 68.95503234863281, 172.46932983398438, 55.978355407714844, 461.59576416015625, -9.556495666503906, 75.94151306152344, 384.95867919921875, -10.705377578735352, -45.3975944519043, 219.46592712402344, 71.03226470947266, 135.04888916015625, 218.4260711669922, 64.88327026367188, 277.99798583984375, 174.5948486328125, 130.47825622558594, -11.992185592651367, 188.8738555908203, 10.196090698242188, 63.64888000488281, 103.26351165771484, 253.39512634277344, 212.36941528320312, 221.27597045898438, 14.269012451171875, 169.633056640625, 308.92138671875, -141.96795654296875, 120.46000671386719, 247.41049194335938, 328.4471130371094, 232.8704833984375, -0.9845504760742188, 178.63507080078125, 367.73406982421875, -52.638267517089844, 188.5508270263672, -123.16737365722656, 216.6461181640625, 89.7236328125, 201.10955810546875, 39.49047088623047, 213.2710418701172, 104.48552703857422, 27.564220428466797, 131.21856689453125, 56.1658935546875], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000323.npy"}
{"epoch": 0.47430249632892807, "step": 324, "batch_size": 64, "mean": 156.79869079589844, "std": 170.978759765625, "min": -300.5748291015625, "p10": -33.60046806335447, "median": 150.34716796875, "p90": 366.1071899414064, "max": 577.4320068359375, "pos_frac": 0.828125, "sample": [-7.746185302734375, 412.478515625, 14.807861328125, 137.72830200195312, 107.60546112060547, 161.10992431640625, 234.2357177734375, 262.1404724121094, 199.45753479003906, 190.81829833984375, 575.3271484375, 12.948867797851562, 272.66162109375, 84.77366638183594, 38.89091110229492, 460.2342224121094, -94.37821960449219, -142.81361389160156, 211.01657104492188, 222.3557586669922, 23.137100219726562, 167.61761474609375, -48.793182373046875, 227.26727294921875, 252.8926544189453, 541.556640625, 280.95538330078125, -132.51499938964844, 119.60458374023438, 242.05035400390625, 378.7931213378906, 74.76936340332031, 92.0534896850586, 129.78646850585938, 250.39932250976562, 252.86782836914062, 145.95396423339844, 273.6664733886719, 104.82184600830078, 8.687566757202148, 273.75830078125, -300.5748291015625, 185.8641357421875, 282.9933166503906, 336.5066833496094, 244.84616088867188, 50.50418472290039, -152.21971130371094, -42.007232666015625, 152.22576904296875, 148.46856689453125, 138.99346923828125, 104.18986511230469, 6.059513092041016, 138.8816375732422, 167.55555725097656, 326.6634521484375, -7.254295349121094, -13.984683990478516, 577.4320068359375, 390.9808044433594, 71.22996520996094, -6.834930419921875, 221.61282348632812], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000324.npy"}
{"epoch": 0.47577092511013214, "step": 325, "batch_size": 64, "mean": 130.07130432128906, "std": 144.76695251464844, "min": -224.27911376953125, "p10": -11.84420261383056, "median": 124.50012969970703, "p90": 313.9015411376954, "max": 553.3438720703125, "pos_frac": 0.875, "sample": [-224.27911376953125, -224.106201171875, 214.84963989257812, 10.761833190917969, 296.8804626464844, 220.53494262695312, 123.27643585205078, -43.51416015625, 155.14529418945312, 266.44219970703125, 182.99156188964844, 369.50054931640625, 232.7025146484375, 143.3837432861328, 85.72401428222656, 67.1993179321289, 84.08911895751953, 187.71583557128906, 11.698745727539062, 172.49322509765625, 12.746620178222656, 321.1962890625, 198.85540771484375, 343.2500915527344, 13.254871368408203, -33.08277130126953, 149.1120147705078, 118.63646697998047, 60.55165100097656, 62.385215759277344, 238.09422302246094, 34.51274108886719, 525.777587890625, -5.595088958740234, 352.02032470703125, 74.21807098388672, 364.69403076171875, 22.13473129272461, 152.30007934570312, 168.71487426757812, -14.522394180297852, 72.52019500732422, 230.98782348632812, -39.833526611328125, 215.06661987304688, 170.04910278320312, 54.442710876464844, 143.61553955078125, 44.505950927734375, 155.5985107421875, 212.59408569335938, -141.78826904296875, 13.821640014648438, 553.3438720703125, 29.734710693359375, 125.72382354736328, 262.6296691894531, 122.2024917602539, 60.18858337402344, 113.16033172607422, 144.26161193847656, 42.39909744262695, 29.21704864501953, 217.3771514892578], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000325.npy"}
{"epoch": 0.47723935389133626, "step": 326, "batch_size": 64, "mean": 125.01014709472656, "std": 122.31525421142578, "min": -99.8221435546875, "p10": -1.748120117187499, "median": 99.72586059570312, "p90": 284.76945800781255, "max": 491.1695861816406, "pos_frac": 0.875, "sample": [-0.75238037109375, 72.42794799804688, 42.44526672363281, 292.2050476074219, 156.36891174316406, -2.17486572265625, 32.28812026977539, 75.56550598144531, 190.79299926757812, -72.67977905273438, 50.31496810913086, -99.8221435546875, 424.46954345703125, 71.98628997802734, -28.54195213317871, 88.33578491210938, 79.44732666015625, 79.05484008789062, 176.77609252929688, 257.9271545410156, 3.914825439453125, 43.979217529296875, 246.55343627929688, 224.41934204101562, 3.27960205078125, 49.78767395019531, 491.1695861816406, 140.7978973388672, 20.736658096313477, 95.17506408691406, 167.30203247070312, 132.6312255859375, 244.11866760253906, 166.85928344726562, 109.22933197021484, -40.4830322265625, 76.30953979492188, 159.62728881835938, -10.19974136352539, 177.86007690429688, 106.43052673339844, 50.149169921875, 137.2691192626953, 70.0177001953125, 77.43482971191406, 98.40910339355469, 166.10623168945312, 208.76138305664062, 101.04261779785156, 174.5817413330078, -70.14656066894531, 354.5188903808594, 406.32611083984375, 120.96532440185547, 291.5352783203125, 108.98316192626953, 263.79278564453125, 257.63470458984375, 11.820737838745117, 55.84098815917969, 306.91534423828125, 268.9825439453125, 3.3106117248535156, 40.46489715576172], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000326.npy"}
{"epoch": 0.4787077826725404, "step": 327, "batch_size": 64, "mean": 180.06192016601562, "std": 162.75863647460938, "min": -209.10726928710938, "p10": -0.7730285644531237, "median": 195.95399475097656, "p90": 378.22868652343755, "max": 590.5398559570312, "pos_frac": 0.890625, "sample": [206.97018432617188, 118.51959991455078, 396.140380859375, 214.01431274414062, 59.27467346191406, 295.73431396484375, -1.3019332885742188, 256.78802490234375, 192.25, 102.10546875, 227.43212890625, 318.21612548828125, 51.61811828613281, 245.7919921875, 0.46108245849609375, 199.65798950195312, 73.44522094726562, 407.7934265136719, 257.6531677246094, 212.412353515625, -209.10726928710938, 370.60858154296875, 473.3924560546875, 325.1082763671875, 22.49873924255371, 497.24114990234375, 216.95587158203125, -31.364398956298828, 251.4844970703125, 353.1544494628906, 349.3492736816406, -160.23574829101562, 16.47894287109375, 168.48179626464844, 264.9818115234375, 132.54920959472656, 210.88304138183594, 177.39466857910156, 54.34918212890625, 127.59615325927734, 81.69142150878906, 27.050445556640625, 112.82650756835938, 319.2402648925781, 115.81292724609375, -44.99565124511719, 107.01335144042969, 225.14488220214844, 381.49444580078125, 14.67401123046875, 324.73944091796875, 112.98377990722656, 590.5398559570312, -40.55802917480469, 422.0727844238281, 82.707763671875, 99.53431701660156, 352.2359619140625, 71.69313049316406, 350.5169677734375, 205.73440551757812, 281.61968994140625, 51.27674102783203, -167.86404418945312], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000327.npy"}
{"epoch": 0.4801762114537445, "step": 328, "batch_size": 64, "mean": 149.48219299316406, "std": 152.16839599609375, "min": -153.7160186767578, "p10": -4.915833282470703, "median": 111.2491683959961, "p90": 351.4894134521484, "max": 516.7391967773438, "pos_frac": 0.859375, "sample": [160.2290496826172, -45.925811767578125, 102.45203399658203, 101.54644775390625, 72.87481689453125, 198.72442626953125, 225.0273895263672, 67.99598693847656, 109.07749938964844, -147.38888549804688, 72.68726348876953, 202.75599670410156, 343.2698974609375, 267.07012939453125, 43.98387145996094, 351.71759033203125, -4.983360290527344, -153.7160186767578, 24.895902633666992, 149.80523681640625, 10.39023208618164, -4.758270263671875, 249.65875244140625, 506.17791748046875, 157.896240234375, 318.04193115234375, 145.46649169921875, 113.42083740234375, 418.85626220703125, 292.2712707519531, 1.3587322235107422, 91.17428588867188, 217.45045471191406, 73.60557556152344, 16.21893882751465, 91.56454467773438, 188.24267578125, 191.16220092773438, 208.48635864257812, 84.0224609375, -1.2434253692626953, 206.67312622070312, -59.7586669921875, 136.46566772460938, -9.051902770996094, 350.9570007324219, 63.95198440551758, 203.61505126953125, 240.01705932617188, 76.96575164794922, 394.4768981933594, 509.7233581542969, 63.01789855957031, 21.426422119140625, 42.49509048461914, 516.7391967773438, 471.3338623046875, 30.778717041015625, 252.61138916015625, 61.551029205322266, 73.39002227783203, 291.39996337890625, 148.5083770751953, -31.99144744873047], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000328.npy"}
{"epoch": 0.48164464023494863, "step": 329, "batch_size": 64, "mean": 165.03314208984375, "std": 173.1510467529297, "min": -238.20156860351562, "p10": -34.68547840118408, "median": 163.25824737548828, "p90": 396.54383239746096, "max": 607.0814208984375, "pos_frac": 0.859375, "sample": [101.62478637695312, 287.27587890625, 355.90411376953125, 248.72291564941406, 178.64703369140625, 37.36316680908203, 389.6707458496094, 39.353797912597656, 367.7291259765625, 166.658447265625, 83.74102783203125, 198.31015014648438, 41.986732482910156, -37.86186218261719, 120.32878112792969, 163.29025268554688, 163.2262420654297, 95.9969253540039, 24.39683723449707, 518.868408203125, 110.3177719116211, 81.43490600585938, 183.9459686279297, 87.93972778320312, -238.20156860351562, 81.5644302368164, 51.22057342529297, 607.0814208984375, 413.0069885253906, 25.430984497070312, 241.70431518554688, -25.002574920654297, 252.68161010742188, -27.273916244506836, 200.0923309326172, 316.4642639160156, 334.7581481933594, 285.025390625, 469.9112548828125, 218.8131561279297, 129.91201782226562, -64.69868469238281, -120.46629333496094, 399.48944091796875, 374.7653503417969, -109.38662719726562, 348.9549560546875, -97.45417785644531, 204.19651794433594, 88.96754455566406, 171.84622192382812, 16.37645149230957, 195.57444763183594, 246.5902099609375, 56.095619201660156, 46.423622131347656, 180.13743591308594, 268.53662109375, 510.76568603515625, 74.36798095703125, -148.45819091796875, 147.59469604492188, 405.784423828125, 20.057411193847656], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000329.npy"}
{"epoch": 0.4831130690161527, "step": 330, "batch_size": 64, "mean": 158.81234741210938, "std": 159.24227905273438, "min": -144.16326904296875, "p10": -44.083376693725576, "median": 148.12281036376953, "p90": 339.33734436035155, "max": 587.3748779296875, "pos_frac": 0.828125, "sample": [402.631591796875, -32.70025634765625, -139.88009643554688, -48.961856842041016, 236.25759887695312, 381.221435546875, 145.4956817626953, 123.19448852539062, 5.076574325561523, 81.67615509033203, -74.07579040527344, 147.2744903564453, 81.78506469726562, 219.90057373046875, 316.32171630859375, 69.0131607055664, 194.2987060546875, 185.83212280273438, 164.42198181152344, 308.1656799316406, 17.01685905456543, 91.56515502929688, 237.19076538085938, 112.26374816894531, 149.89654541015625, 229.0051727294922, -113.9129638671875, 145.80593872070312, 205.34014892578125, 126.54228973388672, 224.55484008789062, 38.93098068237305, -118.61007690429688, 135.9835662841797, 340.9792785644531, 49.90538787841797, 125.80686950683594, 529.837646484375, -3.6635360717773438, 160.34820556640625, -18.213966369628906, 335.50616455078125, 104.68130493164062, 587.3748779296875, -128.9844512939453, 500.6203918457031, 296.3214416503906, 223.66847229003906, 245.17657470703125, 127.36871337890625, 155.3267822265625, 231.4539031982422, 86.19717407226562, 444.1639404296875, 78.39989471435547, 271.18499755859375, 127.56310272216797, 267.62860107421875, -144.16326904296875, 332.53814697265625, 148.97113037109375, 214.1809844970703, 236.99847412109375, -11.709991455078125], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000330.npy"}
{"epoch": 0.4845814977973568, "step": 331, "batch_size": 64, "mean": 166.78672790527344, "std": 121.04784393310547, "min": -30.729965209960938, "p10": 34.188666915893556, "median": 148.56494903564453, "p90": 310.0799774169922, "max": 476.762451171875, "pos_frac": 0.9375, "sample": [142.7661590576172, 403.24932861328125, 278.9873046875, 13.923934936523438, 262.1962890625, 35.345977783203125, 56.9765625, -30.729965209960938, 122.89259338378906, 47.53448486328125, 389.3012390136719, 295.6369934082031, 452.75897216796875, 210.0433349609375, 169.4883575439453, 112.45952606201172, 476.762451171875, 84.19511413574219, 208.03079223632812, 154.36373901367188, 315.6881408691406, 280.19842529296875, 347.7957763671875, 33.69267654418945, 76.68649291992188, -8.876998901367188, 87.62150573730469, 111.86844635009766, 255.91107177734375, 56.90594482421875, 95.87705993652344, 13.536266326904297, 262.70123291015625, 120.27963256835938, 183.3460693359375, 70.84859466552734, -23.24829864501953, 135.53298950195312, 249.2041015625, 444.6387634277344, 296.9942626953125, 79.80271911621094, 66.14989471435547, 225.64730834960938, 54.016868591308594, 162.37210083007812, 201.09542846679688, 204.45639038085938, 213.528564453125, 87.98432922363281, -22.6527099609375, 71.150390625, 113.08204650878906, 133.30628967285156, 251.47116088867188, 170.8717803955078, 237.6584930419922, 92.91248321533203, 42.89482116699219, 272.93756103515625, 239.11607360839844, 182.61386108398438, 216.65797424316406, 83.891357421875], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000331.npy"}
{"epoch": 0.48604992657856094, "step": 332, "batch_size": 64, "mean": 123.15026092529297, "std": 133.8206024169922, "min": -214.3594970703125, "p10": -18.141941833496087, "median": 106.71694946289062, "p90": 318.5304809570313, "max": 443.52056884765625, "pos_frac": 0.84375, "sample": [426.242919921875, 125.42522430419922, 102.94059753417969, 192.23703002929688, 69.14456176757812, -12.669570922851562, 97.97239685058594, 323.16510009765625, 307.71636962890625, 110.49330139160156, 117.53810119628906, 0.8016338348388672, 6.85906982421875, 443.52056884765625, 144.9246368408203, 137.71852111816406, 120.28677368164062, -25.978530883789062, 56.13178253173828, -7.00773811340332, 186.4510498046875, 97.75296020507812, 286.79827880859375, 74.50355529785156, 286.49566650390625, 116.80586242675781, 27.440887451171875, 59.127891540527344, 350.1256103515625, 181.26284790039062, 136.4002685546875, 160.51918029785156, 96.57441711425781, -28.39142608642578, 2.7490463256835938, 71.79715728759766, 393.8122863769531, 304.8727722167969, -73.15280151367188, 195.7501678466797, 132.73782348632812, -88.18665313720703, 176.8563232421875, 112.02687072753906, 158.65408325195312, 377.24212646484375, 66.19786071777344, 329.1285400390625, 14.3070068359375, 96.05741882324219, -86.3165283203125, 73.58273315429688, 266.0608825683594, 17.7489013671875, 235.0193328857422, 21.897449493408203, 3.6007537841796875, -8.708503723144531, -20.48724365234375, 86.73845672607422, 236.34596252441406, 75.15038299560547, 155.16392517089844, -214.3594970703125], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000332.npy"}
{"epoch": 0.48751835535976507, "step": 333, "batch_size": 64, "mean": 162.07449340820312, "std": 173.1933135986328, "min": -128.07357788085938, "p10": -54.64882507324217, "median": 148.82251739501953, "p90": 394.00570373535163, "max": 526.339111328125, "pos_frac": 0.78125, "sample": [-41.18968200683594, 143.06234741210938, 526.339111328125, -7.8238372802734375, 171.59902954101562, 133.94996643066406, 146.6927947998047, 234.1661834716797, 53.068115234375, 86.63752746582031, 514.8812866210938, -91.73856353759766, 86.82244873046875, -87.45453643798828, 380.59429931640625, 41.43487548828125, 445.50726318359375, -21.630769729614258, 455.4516296386719, 113.30644989013672, -60.08462905883789, 279.7414245605469, -60.08445739746094, 331.1306457519531, 272.43048095703125, 243.15338134765625, 43.198951721191406, 43.188232421875, 150.95223999023438, 324.39422607421875, 50.770851135253906, 8.679304122924805, -21.842376708984375, 454.2983703613281, 286.3995056152344, 241.19674682617188, -128.07357788085938, 3.466878890991211, 399.7534484863281, 234.48464965820312, 301.4632568359375, 49.98014831542969, 0.31231689453125, 220.69461059570312, 356.0462646484375, -31.62006378173828, 165.3190460205078, 193.2670440673828, 357.2513427734375, 226.94970703125, 373.46185302734375, 273.4898376464844, 496.9833984375, 91.51215362548828, 205.5836944580078, -87.60310363769531, 323.49505615234375, -22.90597152709961, 225.5848388671875, -41.96568298339844, 250.76116943359375, -83.71249389648438, 80.95039367675781, 66.63835144042969], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000333.npy"}
{"epoch": 0.4889867841409692, "step": 334, "batch_size": 64, "mean": 137.55638122558594, "std": 152.70703125, "min": -222.0321807861328, "p10": -7.964372444152824, "median": 117.18910217285156, "p90": 353.2107269287109, "max": 440.4886474609375, "pos_frac": 0.875, "sample": [374.88604736328125, 80.69247436523438, -73.72915649414062, 303.17120361328125, 188.01451110839844, 22.982513427734375, 27.978919982910156, 91.57769012451172, 176.5059814453125, 20.891937255859375, 117.91329956054688, 89.97322845458984, 289.1111755371094, 98.6199951171875, 6.7638397216796875, 126.93716430664062, 66.23411560058594, 259.8511962890625, -222.0321807861328, 349.8875427246094, 35.96266555786133, 264.6286926269531, 60.87953186035156, -219.3655242919922, 230.38137817382812, 92.79733276367188, 354.63494873046875, -117.58927917480469, -60.78461837768555, 40.24488830566406, 13.258621215820312, 15.39456558227539, 56.66477966308594, 55.733184814453125, 378.571044921875, 433.4903259277344, 279.0196228027344, 47.95703887939453, 145.49612426757812, 136.41845703125, 109.36162567138672, 180.63653564453125, 289.5035400390625, 136.7748565673828, 43.83183288574219, 116.46490478515625, 398.505126953125, 171.10250854492188, 421.1701354980469, 178.767333984375, 92.72649383544922, -145.04490661621094, 65.9976806640625, 204.5101318359375, 138.6107940673828, 335.3277893066406, 312.9678039550781, 163.4136505126953, -11.345230102539062, 330.5101318359375, -0.07570457458496094, 440.4886474609375, 150.33584594726562, 39.041221618652344], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000334.npy"}
{"epoch": 0.49045521292217326, "step": 335, "batch_size": 64, "mean": 120.4576187133789, "std": 152.62939453125, "min": -199.19232177734375, "p10": -59.87490539550781, "median": 127.37436294555664, "p90": 312.97312011718753, "max": 577.4583740234375, "pos_frac": 0.78125, "sample": [136.07080078125, 239.5488739013672, 284.9349365234375, 90.30618286132812, 137.99107360839844, -45.570899963378906, 109.61023712158203, 279.5067138671875, 51.5309944152832, 15.798959732055664, -136.81103515625, 85.90187072753906, 137.3406982421875, 162.1694793701172, 109.29468536376953, 354.00994873046875, 133.292236328125, 121.45648956298828, -171.25332641601562, 149.68954467773438, -3.73504638671875, 83.50495910644531, 147.07818603515625, 145.66268920898438, 301.9131774902344, 275.4423522949219, 281.0786437988281, 135.9650421142578, 61.75898742675781, 426.69378662109375, 577.4583740234375, -145.49594116210938, 239.83486938476562, -199.19232177734375, 315.52044677734375, 34.11296844482422, 337.2975769042969, 84.62709045410156, 46.14166259765625, 152.62860107421875, -84.44137573242188, 330.5606994628906, 85.5911636352539, -61.144989013671875, 242.722900390625, 42.96117401123047, 145.43077087402344, 8.002716064453125, -118.73599243164062, 26.6922607421875, 75.62637329101562, -23.948593139648438, 307.02935791015625, -56.911376953125, 18.275352478027344, 254.4256591796875, 143.5982208251953, 349.0982666015625, -7.5013885498046875, -27.17642593383789, 155.59718322753906, 232.88638305664062, 176.19174194335938, -48.6568717956543], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000335.npy"}
{"epoch": 0.4919236417033774, "step": 336, "batch_size": 64, "mean": 156.17337036132812, "std": 160.0751953125, "min": -274.0065002441406, "p10": -21.688503646850553, "median": 145.0219497680664, "p90": 350.24609985351583, "max": 599.1451416015625, "pos_frac": 0.890625, "sample": [118.0758056640625, 40.04936981201172, 412.1033020019531, 107.75676727294922, 255.40904235839844, 154.390869140625, 145.72824096679688, -36.635337829589844, -93.79228210449219, 501.0831604003906, 43.22288513183594, 175.1063690185547, 136.9039306640625, 188.2783203125, 577.8245849609375, 12.017911911010742, 431.0054626464844, 279.2030029296875, 159.5106658935547, 221.60096740722656, -56.1097297668457, 119.6876220703125, 227.24627685546875, 238.5665283203125, 15.503448486328125, 165.67538452148438, 105.14410400390625, 149.771728515625, 298.7401428222656, 150.5108642578125, 144.31565856933594, 82.30463409423828, 99.6685791015625, 67.90149688720703, 69.93754577636719, 124.5989990234375, -35.80461120605469, 395.56658935546875, 11.249080657958984, 21.40631866455078, 127.46128845214844, 159.76605224609375, 371.4551696777344, 60.671024322509766, 300.7582702636719, 35.091278076171875, -274.0065002441406, 54.736228942871094, 135.69985961914062, 205.25399780273438, 141.46751403808594, 128.66363525390625, 242.45086669921875, 120.4081802368164, 234.08840942382812, 298.8726501464844, 206.05975341796875, 599.1451416015625, -196.8371124267578, 199.60362243652344, -130.84576416015625, 228.95755004882812, 252.3336639404297, 269.11737060546875], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000336.npy"}
{"epoch": 0.4933920704845815, "step": 337, "batch_size": 64, "mean": 162.87271118164062, "std": 145.60569763183594, "min": -145.3797149658203, "p10": -12.55474700927734, "median": 150.5291748046875, "p90": 340.9814239501954, "max": 553.164306640625, "pos_frac": 0.84375, "sample": [553.164306640625, 350.6793212890625, -145.3797149658203, -8.559272766113281, 303.06817626953125, 279.3206481933594, -14.431129455566406, 201.5002899169922, 75.93388366699219, 265.0894775390625, 15.566703796386719, 150.41510009765625, 139.74197387695312, 261.76544189453125, 303.6175842285156, -13.831367492675781, 211.41575622558594, 209.8965301513672, 76.61705017089844, 244.68008422851562, 93.65711975097656, 165.6976318359375, 13.035985946655273, 307.07098388671875, 150.64324951171875, 84.40106201171875, 493.7728271484375, 37.14996337890625, 10.880790710449219, -36.936859130859375, 126.83101654052734, 180.76771545410156, 249.67483520507812, -27.280424118041992, 168.7406463623047, 141.4300994873047, 318.3529968261719, 5.5640106201171875, 131.88571166992188, 180.9129638671875, 8.771282196044922, 448.06500244140625, 131.25827026367188, 33.31059265136719, -103.34811401367188, 118.82479858398438, 360.04852294921875, 297.31024169921875, 298.8500671386719, 125.8709487915039, 209.24728393554688, -2.4876174926757812, 370.9000244140625, 252.2455596923828, 100.40663146972656, 390.8497314453125, 235.7649383544922, 139.57867431640625, -9.575965881347656, 230.857421875, 253.24044799804688, 136.02963256835938, -79.14341735839844, 220.45562744140625], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000337.npy"}
{"epoch": 0.4948604992657856, "step": 338, "batch_size": 64, "mean": 158.37158203125, "std": 137.0055389404297, "min": -75.60812377929688, "p10": -22.05732498168944, "median": 150.54745483398438, "p90": 346.21630859375, "max": 455.91571044921875, "pos_frac": 0.84375, "sample": [204.2280731201172, 313.7091369628906, -29.382705688476562, -28.292388916015625, 174.482177734375, 98.95114135742188, -75.60812377929688, 174.3056182861328, 422.7159423828125, 186.80325317382812, 91.22996520996094, 14.50424575805664, 8.705167770385742, 303.900634765625, 52.0943603515625, 145.37808227539062, 153.69537353515625, 210.84805297851562, 233.523681640625, 322.5827331542969, 346.2994384765625, 85.2822494506836, 208.95294189453125, 214.55494689941406, 206.01771545410156, 46.38630676269531, 233.1962890625, -34.68071746826172, 85.36483001708984, 4.547584533691406, 171.924560546875, 420.3599548339844, 346.0223388671875, 38.21317672729492, 143.9146728515625, 201.32321166992188, 118.32921600341797, -8.825054168701172, 311.25347900390625, 131.29464721679688, 318.9534606933594, -5.63780403137207, 112.7669677734375, -49.89971923828125, 334.55194091796875, 72.81198120117188, 189.50155639648438, 57.03630828857422, 210.6042022705078, 100.26069641113281, -72.93463134765625, -27.72829818725586, 414.04656982421875, 455.91571044921875, 40.49011993408203, 177.9667510986328, 349.4915771484375, 135.11471557617188, 269.2008056640625, 147.3995361328125, 354.6636962890625, -6.037471771240234, 37.495208740234375, 271.64117431640625], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000338.npy"}
{"epoch": 0.49632892804698975, "step": 339, "batch_size": 64, "mean": 151.34521484375, "std": 132.88148498535156, "min": -83.3383560180664, "p10": -9.867481231689446, "median": 141.1565170288086, "p90": 319.42337341308604, "max": 527.4901733398438, "pos_frac": 0.875, "sample": [154.1953125, 48.82750701904297, 17.11031723022461, -58.74208068847656, 411.8631591796875, -2.9025421142578125, 113.86323547363281, 106.35971069335938, 221.49191284179688, 10.843544006347656, 478.934326171875, 80.82594299316406, 101.0123291015625, 68.02493286132812, 202.80482482910156, 244.5644073486328, 6.201078414916992, 82.49950408935547, 40.230445861816406, 417.9120178222656, 150.19720458984375, -12.852455139160156, 166.43421936035156, 25.975440979003906, 98.96304321289062, 124.39503479003906, 329.7821350097656, 23.813919067382812, 224.2737579345703, 527.4901733398438, -77.49203491210938, 134.4496307373047, -19.605308532714844, 69.78085327148438, 377.9864807128906, 199.45201110839844, 121.01490020751953, 61.78144073486328, 157.498291015625, 208.29306030273438, 85.58366394042969, 147.8634033203125, 180.0833282470703, 265.7883605957031, 57.942649841308594, 295.2529296875, -83.3383560180664, 359.3160095214844, 175.2598876953125, -40.15056228637695, 214.0933074951172, 225.72894287109375, 227.9502410888672, 182.5302734375, 125.53059387207031, 93.31367492675781, 113.5782470703125, -35.035972595214844, 205.26153564453125, 193.6597900390625, 287.7536315917969, 270.52935791015625, 287.88323974609375, 212.1639404296875], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000339.npy"}
{"epoch": 0.4977973568281938, "step": 340, "batch_size": 64, "mean": 148.940185546875, "std": 149.0395965576172, "min": -206.50881958007812, "p10": -19.9591812133789, "median": 139.09185791015625, "p90": 324.53162536621096, "max": 627.899658203125, "pos_frac": 0.859375, "sample": [138.85626220703125, 275.1568908691406, 116.28960418701172, 167.53408813476562, 93.94413757324219, 299.9218444824219, 371.7055358886719, -70.42755889892578, 151.7474365234375, 81.04273223876953, 201.9524383544922, 627.899658203125, 202.55926513671875, -87.73846435546875, 326.1320495605469, 34.79639434814453, 320.79730224609375, 122.1427993774414, 18.055076599121094, 58.266841888427734, 167.87757873535156, 40.529022216796875, -206.50881958007812, 300.73895263671875, -13.772689819335938, 176.74887084960938, -34.175392150878906, 293.56134033203125, 54.18878173828125, 209.29425048828125, 278.019775390625, 179.4284210205078, 491.60162353515625, 337.5087890625, 79.02046203613281, 141.90933227539062, 42.98392105102539, 137.53753662109375, 30.404159545898438, 176.15048217773438, 159.88311767578125, 125.1021728515625, 139.32745361328125, 22.87832260131836, -124.19200134277344, 235.47744750976562, 44.48931884765625, -3.6583786010742188, 342.22149658203125, 38.79063415527344, 74.9369125366211, 374.2754211425781, 262.4649963378906, 284.3594970703125, 157.57015991210938, 226.31814575195312, 66.19525146484375, 89.9654541015625, 306.2584228515625, 133.79403686523438, -85.17704772949219, 298.41436767578125, -22.61053466796875, 51.40595245361328], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000340.npy"}
{"epoch": 0.49926578560939794, "step": 341, "batch_size": 64, "mean": 164.71856689453125, "std": 160.3800811767578, "min": -166.2598876953125, "p10": -36.179302215576165, "median": 166.89056396484375, "p90": 357.41595458984375, "max": 659.0726318359375, "pos_frac": 0.8125, "sample": [659.0726318359375, -49.569122314453125, 0.9772491455078125, -4.031343460083008, 216.55169677734375, 280.33966064453125, 314.2723083496094, 389.9682312011719, 166.3968505859375, 79.48799896240234, 44.91944885253906, 226.98477172851562, 226.31268310546875, 175.36538696289062, 83.81068420410156, -28.421234130859375, -7.8076934814453125, 304.09228515625, 53.384056091308594, 211.79367065429688, -118.9564437866211, 284.88348388671875, 361.1853332519531, 254.11001586914062, 117.1185531616211, 437.81890869140625, 167.38427734375, 132.06491088867188, -4.4062042236328125, -82.41294860839844, 112.04220581054688, 238.88265991210938, 389.4625244140625, 242.144287109375, 284.40020751953125, 263.87091064453125, 113.74055480957031, -18.876440048217773, -39.504188537597656, 221.4815216064453, 284.281494140625, 232.2500457763672, 134.8625030517578, 91.20447540283203, -76.25045776367188, 262.51348876953125, -113.54393005371094, 69.93238830566406, -166.2598876953125, 313.74365234375, 167.746337890625, 207.69906616210938, 148.79669189453125, 51.816734313964844, 487.63330078125, 55.82023620605469, 352.16082763671875, 313.3611145019531, 330.09368896484375, 63.30814743041992, 88.049560546875, 134.2482452392578, 359.66815185546875, 48.518333435058594], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000341.npy"}
{"epoch": 0.5007342143906021, "step": 342, "batch_size": 64, "mean": 159.92022705078125, "std": 158.93528747558594, "min": -168.40765380859375, "p10": -21.468765258789062, "median": 133.59242248535156, "p90": 366.0092041015626, "max": 682.9649047851562, "pos_frac": 0.8125, "sample": [431.20501708984375, -168.40765380859375, -88.00283813476562, 336.3090515136719, 24.158817291259766, 109.3378677368164, 223.159912109375, 160.1312713623047, 127.38655090332031, 258.9696350097656, 70.93855285644531, 166.34361267089844, 180.70443725585938, 223.83602905273438, 345.1421203613281, 217.18109130859375, 128.7837677001953, 274.3068542480469, -76.11673736572266, -15.863540649414062, 207.359619140625, -2.9001083374023438, 200.38284301757812, 33.43518829345703, -20.109107971191406, 185.09304809570312, 682.9649047851562, 265.467041015625, 124.97589111328125, -70.58973693847656, -28.960521697998047, 558.68701171875, 169.87899780273438, 83.412353515625, 48.09596252441406, -22.051475524902344, 405.7894592285156, 374.9522399902344, 387.9034423828125, 82.06413269042969, -8.247686386108398, 120.95582580566406, 272.9878845214844, -12.560455322265625, 217.9345703125, 324.495849609375, 83.8365478515625, 311.3013610839844, 138.4010772705078, 291.37725830078125, 383.6039123535156, 124.75501251220703, -35.60517120361328, 103.30824279785156, 235.449462890625, 87.0538101196289, 107.44273376464844, 2.6725101470947266, 54.709022521972656, 92.360595703125, 195.29623413085938, 306.6865234375, 168.9433135986328, 72.38164520263672], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000342.npy"}
{"epoch": 0.5022026431718062, "step": 343, "batch_size": 64, "mean": 170.99798583984375, "std": 138.78292846679688, "min": -122.32030487060547, "p10": 9.46857128143311, "median": 150.87835693359375, "p90": 320.1419952392578, "max": 548.0076293945312, "pos_frac": 0.921875, "sample": [264.9017028808594, 94.2938461303711, 93.64738464355469, 92.52838897705078, 209.87351989746094, 2.294891357421875, 102.607666015625, 251.94801330566406, 106.0789794921875, 426.0128173828125, 98.28227233886719, 17.26446533203125, 139.61746215820312, 106.06207275390625, 121.34396362304688, 70.68484497070312, 76.7542724609375, 201.7127227783203, 194.97854614257812, 420.0560302734375, 142.02415466308594, 268.0423278808594, 61.99977111816406, 322.6224060058594, 314.3543701171875, 282.7831726074219, 14.69540786743164, 63.43438720703125, 457.18511962890625, 221.04063415527344, -99.84404754638672, 57.66147232055664, 238.42218017578125, 30.696197509765625, 174.40869140625, 145.31927490234375, 113.56658935546875, 20.36027717590332, 7.228498458862305, 309.1324768066406, -87.1693115234375, 217.54391479492188, 206.20315551757812, 270.7121276855469, 194.26934814453125, 548.0076293945312, 252.03421020507812, 240.73638916015625, 295.1131286621094, 312.5246276855469, 145.88723754882812, 205.72964477539062, 136.8388671875, -8.50246810913086, 312.71649169921875, 302.6700134277344, 159.60923767089844, 104.37189483642578, 155.86947631835938, 52.903350830078125, -4.583168029785156, 361.9339904785156, 454.69390869140625, -122.32030487060547], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000343.npy"}
{"epoch": 0.5036710719530103, "step": 344, "batch_size": 64, "mean": 152.88720703125, "std": 163.60194396972656, "min": -177.7236785888672, "p10": 0.9245235443115298, "median": 136.85791015625, "p90": 350.4489715576172, "max": 883.2041015625, "pos_frac": 0.890625, "sample": [25.15270233154297, 13.872522354125977, -177.7236785888672, -9.275873184204102, 134.67103576660156, 168.82644653320312, 93.50749206542969, 170.49417114257812, 20.696266174316406, 136.8706512451172, 56.055877685546875, 412.838134765625, 42.95861053466797, 145.87718200683594, 78.18699645996094, 90.34884643554688, 180.4968719482422, 7.2185821533203125, -70.29256439208984, 215.89028930664062, 111.07878112792969, 46.83186340332031, 75.94126892089844, -1.7729301452636719, 122.07675170898438, 207.02401733398438, 343.33526611328125, 345.8110656738281, 66.47220611572266, 52.912841796875, 352.4366455078125, 74.86798095703125, 197.27999877929688, 18.789793014526367, 245.50924682617188, 150.53326416015625, 261.00689697265625, 25.46435546875, 170.0260009765625, 26.712263107299805, 382.5822448730469, 243.75912475585938, 170.0782012939453, 163.7886962890625, 73.4050064086914, 883.2041015625, 309.7646484375, 277.9702453613281, 47.52723693847656, 421.42059326171875, -23.41925811767578, 151.49240112304688, 201.147216796875, 384.9878234863281, 98.268310546875, 88.10032653808594, -126.02608489990234, 288.3038330078125, 407.96697998046875, -139.67169189453125, 210.71502685546875, 255.0442352294922, 136.8451690673828, 248.51870727539062], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000344.npy"}
{"epoch": 0.5051395007342144, "step": 345, "batch_size": 64, "mean": 161.3408203125, "std": 159.7306671142578, "min": -182.66946411132812, "p10": -10.272309494018554, "median": 134.14270401000977, "p90": 372.2221740722657, "max": 597.4654541015625, "pos_frac": 0.875, "sample": [80.06462860107422, 421.7622985839844, 149.86053466796875, 212.41259765625, 63.152137756347656, -138.51815795898438, 424.1287841796875, -31.879196166992188, 160.80511474609375, 234.1938018798828, -9.843130111694336, 175.9475555419922, 23.4779052734375, 284.79443359375, 344.96246337890625, 87.74600219726562, -148.70242309570312, 144.11688232421875, 90.87535095214844, 176.88368225097656, 108.69369506835938, 277.03900146484375, 56.75373840332031, 240.22232055664062, 215.12310791015625, 202.5743865966797, 87.19054412841797, 335.5660095214844, 103.885009765625, 112.38108825683594, 52.280853271484375, 506.3807067871094, 107.64715576171875, 42.01438903808594, 38.306549072265625, 263.5253601074219, 298.90899658203125, 313.48309326171875, 169.43234252929688, 303.2816162109375, 197.98878479003906, 597.4654541015625, 99.29617309570312, -28.278045654296875, 242.01292419433594, 271.2813720703125, 35.90810012817383, 515.5975952148438, 107.04452514648438, 282.8453063964844, 62.514503479003906, 8.598106384277344, 114.44366455078125, 403.92596435546875, -10.456243515014648, -103.67951202392578, 8.996953964233398, 100.847412109375, 383.9049072265625, 124.16852569580078, 233.9815673828125, 238.082275390625, -182.66946411132812, 41.06016540527344], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000345.npy"}
{"epoch": 0.5066079295154186, "step": 346, "batch_size": 64, "mean": 151.2305145263672, "std": 148.2451934814453, "min": -305.076416015625, "p10": -14.035796737670895, "median": 161.9804229736328, "p90": 364.1893554687501, "max": 511.2657775878906, "pos_frac": 0.84375, "sample": [252.20204162597656, 49.30677795410156, 1.0095138549804688, 235.90640258789062, 299.8688049316406, 223.60194396972656, 472.3653259277344, 187.69622802734375, 218.8004913330078, 511.2657775878906, -305.076416015625, 85.15357971191406, 188.72137451171875, 1.7323036193847656, 178.13470458984375, 274.07733154296875, 0.13458251953125, 195.45152282714844, 402.26385498046875, -6.344419479370117, 229.0225067138672, 88.24868774414062, -41.10028076171875, 350.0333251953125, 146.48678588867188, 190.17910766601562, 184.13265991210938, 64.23193359375, 154.3917236328125, 113.80056762695312, 2.201963424682617, -4.197959899902344, 131.3225555419922, 80.45268249511719, 384.0216979980469, 185.87266540527344, 220.9630584716797, 155.00808715820312, 226.30599975585938, 4.9482574462890625, 20.439231872558594, 370.2562255859375, 141.46607971191406, 402.44732666015625, 131.8772735595703, 174.2102813720703, 207.60342407226562, 241.51806640625, 58.69065475463867, 77.930908203125, 232.84036254882812, -9.609668731689453, -18.934112548828125, -49.16193389892578, 85.19709777832031, 348.1890869140625, 62.074066162109375, 168.9527587890625, 406.774169921875, -15.932708740234375, -22.197601318359375, -75.99549865722656, 193.18418884277344, 214.33544921875], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000346.npy"}
{"epoch": 0.5080763582966226, "step": 347, "batch_size": 64, "mean": 159.3125, "std": 168.84112548828125, "min": -238.25836181640625, "p10": -16.597657775878904, "median": 156.24932861328125, "p90": 365.81589050292973, "max": 584.1195068359375, "pos_frac": 0.875, "sample": [-172.68124389648438, 229.67315673828125, 273.29693603515625, 284.6727600097656, 65.49419403076172, 62.93120574951172, 87.09513092041016, 172.24131774902344, 204.14735412597656, 151.999267578125, 249.25730895996094, 25.030338287353516, 108.21571350097656, 44.844024658203125, 250.2552490234375, -203.83226013183594, -238.25836181640625, 25.384185791015625, 558.3309326171875, 115.9546890258789, 286.0021057128906, 28.20376968383789, 48.87799072265625, 239.31451416015625, -17.0479736328125, 391.59503173828125, 302.2886657714844, 212.61016845703125, 13.770027160644531, 370.9088134765625, 227.2003631591797, 322.9368591308594, 291.7589111328125, 65.83293914794922, 159.3846435546875, 58.52423858642578, 400.64288330078125, 238.87362670898438, 125.7620849609375, -105.91654968261719, 159.82366943359375, 183.70651245117188, 108.39043426513672, 275.90887451171875, 153.114013671875, 92.35726928710938, 52.926910400390625, 109.7496566772461, 578.0977783203125, 47.083351135253906, 210.29957580566406, 372.0843200683594, 195.77781677246094, 584.1195068359375, -50.387550354003906, 146.69180297851562, 168.28573608398438, -138.14566040039062, 353.9324035644531, 72.65254211425781, 322.7008056640625, -15.546920776367188, 38.95805358886719, 217.84437561035156], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000347.npy"}
{"epoch": 0.5095447870778267, "step": 348, "batch_size": 64, "mean": 143.79965209960938, "std": 156.2502899169922, "min": -170.8118133544922, "p10": -32.92155990600584, "median": 124.88959503173828, "p90": 324.2526550292969, "max": 643.232177734375, "pos_frac": 0.875, "sample": [225.92449951171875, 5.869682312011719, 160.47381591796875, 207.81094360351562, 236.5240936279297, 6.449668884277344, 15.436622619628906, 244.63389587402344, 260.51251220703125, -133.24937438964844, 65.52485656738281, 63.9354248046875, 388.6237487792969, 214.45565795898438, 176.8251495361328, 176.73635864257812, 11.456422805786133, 310.8101501464844, 123.5091552734375, 69.36923217773438, 194.8546142578125, 197.96551513671875, 17.100685119628906, 643.232177734375, 310.9750671386719, 235.22891235351562, 25.51203155517578, 277.3659362792969, 40.65440368652344, 140.24496459960938, 72.24967956542969, -99.66300964355469, 32.74981689453125, 477.58514404296875, 325.3815002441406, 152.42584228515625, 247.38436889648438, 114.09840393066406, 215.53759765625, 19.475812911987305, -170.8118133544922, 281.41571044921875, -57.15387725830078, 285.3692932128906, 175.06503295898438, 108.5766372680664, 115.73062896728516, 21.398351669311523, 369.5386047363281, 132.22840881347656, 61.252235412597656, 60.90327453613281, 412.76934814453125, -16.83173370361328, 321.6186828613281, 126.27003479003906, 85.98273468017578, 13.279300689697266, 17.791549682617188, 467.9797668457031, 77.34420776367188, -39.81719970703125, -43.85377502441406, -74.85992431640625], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000348.npy"}
{"epoch": 0.5110132158590308, "step": 349, "batch_size": 64, "mean": 145.6865234375, "std": 180.912109375, "min": -275.35540771484375, "p10": -38.165478515625, "median": 109.23462677001953, "p90": 395.3954528808594, "max": 692.1522827148438, "pos_frac": 0.8125, "sample": [-37.82251739501953, -275.35540771484375, 275.86651611328125, 101.37052154541016, 65.55433654785156, 49.564788818359375, 311.09136962890625, -38.312461853027344, 185.67877197265625, 294.2279968261719, 107.65127563476562, 281.6776428222656, 200.669189453125, 80.35477447509766, 15.021116256713867, 26.35127830505371, 35.70049285888672, 26.50873565673828, 174.18836975097656, -37.197975158691406, 255.9068603515625, 226.99249267578125, 180.11434936523438, 15.595548629760742, 328.0206604003906, -38.8071174621582, 250.63265991210938, -22.82494354248047, 96.98894500732422, 122.04638671875, 544.5952758789062, -41.573768615722656, -28.593822479248047, 149.26025390625, 230.48779296875, -182.16915893554688, -206.8582763671875, 90.97699737548828, 451.1512451171875, 85.90206909179688, 458.8000793457031, 49.000213623046875, -144.907958984375, 103.01325988769531, 389.756103515625, 410.59893798828125, 437.8874206542969, 110.81797790527344, 397.81231689453125, -28.556793212890625, 212.13888549804688, 369.5341796875, 66.47460174560547, 192.2469482421875, 205.5210418701172, 199.80343627929688, 0.10471343994140625, 103.91600036621094, 129.36512756347656, 257.6683654785156, 246.885009765625, 30.464265823364258, 82.80853271484375, 692.1522827148438], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000349.npy"}
{"epoch": 0.5124816446402349, "step": 350, "batch_size": 64, "mean": 154.13565063476562, "std": 160.34844970703125, "min": -234.4989776611328, "p10": -31.72759246826171, "median": 150.99317169189453, "p90": 367.4137817382813, "max": 510.37188720703125, "pos_frac": 0.84375, "sample": [504.73040771484375, -226.65408325195312, 91.3688735961914, 253.10263061523438, 219.64108276367188, 150.28787231445312, 302.351318359375, 27.931182861328125, -34.38017272949219, 120.25825500488281, 105.80172729492188, 261.1788330078125, -119.60430908203125, 169.83558654785156, 175.43777465820312, 14.930877685546875, 510.37188720703125, 258.02911376953125, 403.80963134765625, 296.131103515625, 201.30810546875, 143.4012908935547, -25.031179428100586, 270.5944519042969, 32.022857666015625, 417.1063232421875, 27.295654296875, -96.89456176757812, 95.54358673095703, 126.2665786743164, 317.2679443359375, 80.1951904296875, 169.10934448242188, 210.96820068359375, 263.0317077636719, 188.91856384277344, 65.61272430419922, 236.32908630371094, -78.82913208007812, 382.482177734375, 151.69847106933594, 366.8001708984375, 212.81552124023438, -25.538238525390625, 77.76807403564453, 197.47259521484375, 367.6767578125, 248.4239501953125, 98.00448608398438, 149.00137329101562, -234.4989776611328, 403.958984375, 30.857772827148438, 221.75958251953125, 64.41010284423828, 131.45602416992188, -61.41648864746094, 11.6397705078125, 21.44207000732422, 185.25347900390625, 308.22869873046875, 66.60016632080078, -0.2256011962890625, 359.83355712890625], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000350.npy"}
{"epoch": 0.5139500734214391, "step": 351, "batch_size": 64, "mean": 178.76748657226562, "std": 180.74327087402344, "min": -301.0643615722656, "p10": -28.330630493164048, "median": 180.587890625, "p90": 407.6152496337891, "max": 633.0709228515625, "pos_frac": 0.828125, "sample": [57.30035400390625, 163.36538696289062, 348.39898681640625, 467.5252685546875, 633.0709228515625, 103.45149230957031, 198.06813049316406, 75.41667175292969, 309.1751708984375, 237.25218200683594, 146.12713623046875, 212.74713134765625, 497.0626220703125, 488.69134521484375, 260.3355712890625, 333.76123046875, 232.47109985351562, 334.0506591796875, 379.03070068359375, 80.1434555053711, -301.0643615722656, 131.73611450195312, 159.601806640625, 173.6778564453125, 127.67703247070312, 250.39715576171875, 252.8548583984375, 271.2487487792969, -183.6968536376953, -12.081527709960938, -73.05599975585938, 197.12139892578125, -94.34738159179688, 232.15521240234375, -13.8896484375, 591.7532958984375, 24.156143188476562, 144.35540771484375, 118.46173095703125, 76.18486785888672, -52.133819580078125, 310.4371643066406, 42.44063186645508, 485.8586730957031, -34.519622802734375, 42.46399688720703, 170.30311584472656, -1.5296630859375, 105.35467529296875, -131.55763244628906, 59.649864196777344, 330.22540283203125, 86.79011535644531, 187.4979248046875, 414.13568115234375, -9.225112915039062, 249.9490509033203, 392.4009094238281, 259.6573791503906, 189.52127075195312, 199.1389617919922, 263.3587646484375, 207.1114044189453, 43.098426818847656], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000351.npy"}
{"epoch": 0.5154185022026432, "step": 352, "batch_size": 64, "mean": 120.65502166748047, "std": 171.1182861328125, "min": -242.4648895263672, "p10": -71.4205940246582, "median": 94.37676239013672, "p90": 323.9131286621094, "max": 650.3346557617188, "pos_frac": 0.78125, "sample": [297.5793762207031, 295.33837890625, 113.419189453125, 284.15673828125, -138.79843139648438, 70.49569702148438, 95.99884796142578, 253.7697296142578, 90.93559265136719, 184.85072326660156, 460.28656005859375, 327.36248779296875, 69.13999938964844, 81.71660614013672, 196.74435424804688, -45.81065368652344, -82.6126708984375, 650.3346557617188, 154.9087371826172, -70.46712493896484, 177.55694580078125, 495.6448059082031, 67.90794372558594, 306.0570068359375, 19.265777587890625, 14.828575134277344, -71.8292236328125, -213.3427734375, 92.75467681884766, 55.31100845336914, 63.223297119140625, 391.4341125488281, 16.476703643798828, 10.744255065917969, 354.5584716796875, 185.326416015625, -242.4648895263672, 60.29857635498047, 180.34347534179688, 265.7144775390625, 124.59397888183594, 28.09869384765625, 221.7506103515625, -168.20118713378906, -57.1326789855957, -63.259246826171875, -82.7279052734375, 186.12078857421875, 293.14410400390625, -35.70463562011719, 159.84205627441406, -54.404624938964844, -64.04749298095703, 315.8646240234375, 210.5419921875, 151.84591674804688, 211.14031982421875, 35.16651153564453, 343.5487060546875, 89.44634246826172, 101.79681396484375, 52.13008117675781, 169.86337280273438, 33.34561538696289], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000352.npy"}
{"epoch": 0.5168869309838473, "step": 353, "batch_size": 64, "mean": 144.52413940429688, "std": 169.7705535888672, "min": -248.51437377929688, "p10": -46.38029708862304, "median": 119.5998420715332, "p90": 388.113119506836, "max": 495.00408935546875, "pos_frac": 0.8125, "sample": [196.2755126953125, 218.57272338867188, 48.45757293701172, 373.959228515625, 140.84219360351562, 277.2808532714844, 69.01490783691406, -48.903717041015625, 365.26348876953125, 53.520565032958984, 136.47732543945312, -40.49231719970703, 71.35743713378906, 346.30987548828125, 89.66659545898438, 243.28720092773438, -71.70112609863281, -4.869110107421875, 467.9786682128906, 414.2474365234375, 85.34654998779297, 272.136474609375, 57.37362289428711, 43.957366943359375, -248.51437377929688, 347.46820068359375, -195.59890747070312, 84.23204803466797, 33.711326599121094, 243.74554443359375, -28.99342155456543, 10.350717544555664, 270.861572265625, 107.57579040527344, 59.082733154296875, -61.562679290771484, 126.81228637695312, -56.422454833984375, 390.4090881347656, 1.4356422424316406, 84.49234008789062, 495.00408935546875, 64.9598617553711, 123.44757843017578, 122.96959686279297, 183.6433563232422, 229.6875, 122.46922302246094, 1.7335205078125, 448.7491760253906, 116.73046112060547, -33.101837158203125, 382.755859375, 297.3235168457031, 493.5238037109375, 391.32373046875, -34.39781188964844, 181.5411376953125, 342.5303955078125, -68.02230834960938, 29.27281951904297, 129.9954376220703, 19.5589599609375, 233.40191650390625], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000353.npy"}
{"epoch": 0.5183553597650514, "step": 354, "batch_size": 64, "mean": 172.76959228515625, "std": 136.4973602294922, "min": -98.20536804199219, "p10": 3.8853263854980473, "median": 158.20091247558594, "p90": 340.00444946289065, "max": 494.99261474609375, "pos_frac": 0.90625, "sample": [257.67401123046875, 334.63140869140625, 259.77508544921875, 5.9337158203125, 340.369140625, 182.14144897460938, 468.5103454589844, 9.666851043701172, 222.50942993164062, -98.20536804199219, 54.772743225097656, -75.9568099975586, 122.57858276367188, 440.24249267578125, 282.2149963378906, 137.93917846679688, 4.120338439941406, 167.70782470703125, 263.46307373046875, 266.8622131347656, 36.544532775878906, 328.4305114746094, 104.78604888916016, 110.82794189453125, -19.816925048828125, 102.20803833007812, 198.74392700195312, 219.77386474609375, 132.544189453125, 223.80906677246094, 329.8215637207031, -2.4079456329345703, 288.5192565917969, 258.8859558105469, 176.7126007080078, 126.82063293457031, -4.562599182128906, 304.9474792480469, 114.31079864501953, 53.66859436035156, 225.8291015625, 64.2902603149414, 228.97230529785156, 142.2073974609375, 145.76510620117188, 202.9139862060547, 69.7238998413086, 494.99261474609375, 236.87351989746094, 3.78460693359375, 78.84215545654297, 396.1788330078125, 149.2785186767578, 75.96502685546875, -23.16638946533203, 401.451416015625, 33.785972595214844, 339.15350341796875, 263.6035461425781, 59.99226379394531, 378.2269592285156, 167.12330627441406, 130.69589233398438, 59.25234603881836], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000354.npy"}
{"epoch": 0.5198237885462555, "step": 355, "batch_size": 64, "mean": 175.61483764648438, "std": 147.84146118164062, "min": -106.63618469238281, "p10": 15.504193115234385, "median": 165.3508529663086, "p90": 404.54533081054694, "max": 592.3492431640625, "pos_frac": 0.90625, "sample": [164.67355346679688, 423.010009765625, 253.27932739257812, 422.383544921875, 168.77114868164062, 200.43638610839844, 120.92047882080078, 592.3492431640625, -96.75806427001953, 202.17213439941406, 256.1048583984375, 458.04620361328125, 58.62664031982422, -2.982210159301758, 336.3672180175781, 258.7657470703125, 65.28375244140625, 413.914306640625, 63.08599090576172, 146.496337890625, 311.5181884765625, 168.99331665039062, -61.88752746582031, 57.83674240112305, 173.50930786132812, 382.68438720703125, 253.85711669921875, 153.77481079101562, 130.85958862304688, 11.550872802734375, 72.55673217773438, 103.04696655273438, 166.0281524658203, -70.60076904296875, 159.55833435058594, -106.63618469238281, 232.35768127441406, 435.39703369140625, 128.10324096679688, 475.08258056640625, 83.2305679321289, 212.47702026367188, 30.92251968383789, 45.1270751953125, 197.20916748046875, 253.43975830078125, 163.80758666992188, 337.1369323730469, 46.940311431884766, 128.76058959960938, 359.45208740234375, 188.44915771484375, 222.5557098388672, 217.50384521484375, 62.32914733886719, 24.728607177734375, 63.10444641113281, -44.337276458740234, 178.021728515625, 30.618938446044922, 70.33554077148438, 236.0575714111328, 296.9728088378906, 151.96832275390625], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000355.npy"}
{"epoch": 0.5212922173274597, "step": 356, "batch_size": 64, "mean": 109.56278228759766, "std": 195.1230010986328, "min": -314.5636291503906, "p10": -111.48277435302732, "median": 98.76798629760742, "p90": 344.3852844238282, "max": 568.0358276367188, "pos_frac": 0.65625, "sample": [193.14959716796875, -13.453651428222656, 317.54656982421875, 154.79501342773438, -0.9680671691894531, -139.6417694091797, -92.35009765625, 256.0302734375, -10.031524658203125, 181.04562377929688, 334.8465881347656, 8.154857635498047, 568.0358276367188, 423.5321350097656, -47.33154296875, 348.4732971191406, -77.90119934082031, -92.44505310058594, -314.5636291503906, -118.49618530273438, -19.96826934814453, 487.6747131347656, -43.11265563964844, 50.4586181640625, 292.2508239746094, 334.120361328125, 80.97879791259766, 310.4488220214844, -214.09039306640625, 63.956321716308594, -95.11814880371094, 486.23040771484375, 164.84408569335938, -64.71659088134766, 235.7038116455078, 110.74678039550781, 271.5472717285156, 307.65570068359375, -203.0666046142578, 45.55000305175781, 31.431232452392578, -71.3938217163086, 127.81512451171875, 187.48345947265625, 6.804903030395508, 183.45599365234375, 304.69000244140625, -128.25259399414062, 421.9814453125, 149.34298706054688, 137.82601928710938, 21.448272705078125, -81.28141784667969, -41.39954376220703, 130.8531494140625, 86.78919219970703, 47.19883728027344, 200.04473876953125, -265.6449279785156, 435.9427185058594, 184.86878967285156, 278.391845703125, -0.9646072387695312, 184.06521606445312], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000356.npy"}
{"epoch": 0.5227606461086637, "step": 357, "batch_size": 64, "mean": 164.54934692382812, "std": 148.47372436523438, "min": -207.79324340820312, "p10": 16.71076545715332, "median": 134.63858032226562, "p90": 350.8632354736328, "max": 585.15966796875, "pos_frac": 0.90625, "sample": [298.8029479980469, 296.65283203125, 110.60148620605469, 394.8826904296875, 368.2339172363281, 76.33329772949219, 123.37513732910156, 299.59393310546875, 77.38607788085938, 283.5677795410156, 257.50396728515625, 93.6694564819336, -30.99292755126953, 352.9766845703125, -14.743988037109375, 130.30319213867188, 210.98150634765625, 84.2288818359375, 260.20501708984375, -115.28511047363281, 501.59515380859375, 85.38582611083984, 112.81405639648438, 41.50224304199219, -207.79324340820312, 16.249759674072266, 191.1445770263672, 133.09750366210938, 34.786170959472656, 384.08917236328125, 342.6199645996094, 48.98509979248047, 345.9318542480469, 21.20556640625, 222.93870544433594, 25.903148651123047, 173.51937866210938, 585.15966796875, 205.1939697265625, 313.8714294433594, 58.867218017578125, -34.39696502685547, 277.8164367675781, 225.7932586669922, 17.78644561767578, 123.80780029296875, 191.91612243652344, 205.56796264648438, 18.16756820678711, 144.97293090820312, -47.43463134765625, 233.1226043701172, 130.7864990234375, 32.72154235839844, 283.6651611328125, 146.3973388671875, 80.02978515625, 304.9201965332031, 136.17965698242188, 170.99415588378906, 87.49467468261719, 445.6086730957031, 88.0152587890625, 71.881591796875], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000357.npy"}
{"epoch": 0.5242290748898678, "step": 358, "batch_size": 64, "mean": 138.49087524414062, "std": 159.7650146484375, "min": -203.18392944335938, "p10": -13.980882263183592, "median": 118.94603729248047, "p90": 340.0028289794922, "max": 607.2719116210938, "pos_frac": 0.875, "sample": [86.728271484375, 388.92596435546875, 98.53544616699219, 244.0856170654297, 144.5257110595703, 205.82089233398438, 34.96319580078125, 129.6094970703125, 81.37841796875, 113.54376220703125, -76.57506561279297, 94.80636596679688, -50.930328369140625, -14.542068481445312, 5.498716354370117, 594.6903076171875, 1.5307998657226562, 180.6238555908203, 56.172393798828125, 37.736576080322266, 25.858484268188477, -27.444564819335938, 7.1242828369140625, 70.29692840576172, 99.94741821289062, -203.18392944335938, 212.392578125, -127.28012084960938, 483.1038513183594, 331.7229919433594, 299.9730224609375, 203.0279541015625, 280.75592041015625, 19.348899841308594, 500.705810546875, 141.483154296875, 130.70468139648438, 113.56753540039062, -57.55980682373047, 146.66232299804688, 207.9166717529297, 230.71224975585938, 14.892608642578125, 180.55908203125, 237.6607208251953, -12.67144775390625, 244.3582000732422, 444.9866638183594, 78.32414245605469, 20.756633758544922, 173.87954711914062, 68.9114761352539, 4.3953857421875, 124.32453918457031, 343.55133056640625, 607.2719116210938, 10.551721572875977, 173.25802612304688, 164.0520477294922, 28.57342529296875, 199.18052673339844, 142.36978149414062, 138.66493225097656, 28.599672317504883], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000358.npy"}
{"epoch": 0.5256975036710719, "step": 359, "batch_size": 64, "mean": 158.2200927734375, "std": 181.29734802246094, "min": -169.97439575195312, "p10": -46.30404586791991, "median": 111.33000946044922, "p90": 378.34659729003914, "max": 691.8771362304688, "pos_frac": 0.828125, "sample": [-168.62120056152344, 189.79922485351562, 16.65460205078125, 270.1285705566406, -169.97439575195312, 491.91802978515625, -82.7684326171875, 691.8771362304688, 99.53816986083984, -94.37062072753906, -105.69456481933594, -49.638160705566406, 262.6182861328125, 288.9021301269531, -91.31906127929688, 112.31306457519531, 192.41897583007812, -38.524444580078125, 335.015869140625, 43.517051696777344, 94.84900665283203, 132.6494140625, 357.5967712402344, 39.98479461669922, 80.70033264160156, 191.7557830810547, 185.2386016845703, 239.4624786376953, 89.3721923828125, 71.2578353881836, 387.2393798828125, 314.3216552734375, 653.6664428710938, 218.14817810058594, -11.965286254882812, 129.5041961669922, 217.018798828125, 82.95677947998047, -37.73279571533203, 523.8571166992188, 110.34695434570312, 225.60061645507812, 121.89512634277344, 194.6148681640625, -28.341217041015625, 104.84040832519531, 337.8045349121094, 158.11053466796875, 71.64957427978516, 425.5360107421875, 24.292396545410156, 47.03767395019531, 38.06974792480469, 247.2872314453125, 88.51580810546875, 336.5184326171875, 86.50259399414062, 69.70506286621094, 491.0149230957031, 238.76611328125, 84.57381439208984, 325.43505859375, 65.10148620605469, 107.53629302978516], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000359.npy"}
{"epoch": 0.527165932452276, "step": 360, "batch_size": 64, "mean": 137.50169372558594, "std": 133.86912536621094, "min": -137.8060302734375, "p10": -11.02812709808348, "median": 106.39649963378906, "p90": 315.54033508300785, "max": 468.5168151855469, "pos_frac": 0.890625, "sample": [27.586578369140625, 253.0614013671875, 27.772911071777344, 109.81826782226562, -41.28392791748047, 319.22369384765625, 122.62198638916016, 192.13352966308594, 346.66204833984375, 34.62513732910156, 108.83832550048828, 252.82797241210938, 5.10455322265625, 206.4135284423828, 367.7994079589844, 57.7325439453125, 20.113910675048828, 26.922622680664062, 303.13433837890625, 154.31454467773438, 254.3722686767578, 186.93701171875, -17.9421329498291, 208.7710418701172, 96.29571533203125, 222.2708282470703, 306.9458312988281, 72.22416687011719, 72.59001159667969, 358.9818420410156, 468.5168151855469, 165.37655639648438, 247.56275939941406, -88.85670471191406, 27.45896339416504, 94.74064636230469, 62.343265533447266, 108.04754638671875, 342.7822265625, -20.579833984375, 26.944843292236328, 464.885498046875, -137.8060302734375, 54.687313079833984, 6.958892822265625, 100.76470947265625, -60.240135192871094, 45.77606201171875, 152.96484375, -25.284343719482422, 17.322662353515625, 198.67117309570312, 46.83386993408203, 278.1961364746094, 204.583251953125, 250.9791259765625, 267.17431640625, 70.7190170288086, 104.74545288085938, 267.256591796875, 45.11198425292969, 35.84966278076172, 65.54232025146484, 253.20916748046875], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000360.npy"}
{"epoch": 0.5286343612334802, "step": 361, "batch_size": 64, "mean": 131.26763916015625, "std": 155.48776245117188, "min": -209.40121459960938, "p10": -50.0920425415039, "median": 117.58654022216797, "p90": 361.78109436035163, "max": 513.037841796875, "pos_frac": 0.796875, "sample": [-42.484413146972656, 132.00750732421875, 33.16218185424805, 140.1959228515625, -20.580665588378906, -53.352455139160156, 74.07946014404297, 142.03346252441406, 441.5791320800781, 132.8384552001953, 182.564453125, 201.82589721679688, -173.81613159179688, 370.34710693359375, -20.674331665039062, -57.213539123535156, 150.69296264648438, 137.63931274414062, -113.28471374511719, 80.31869506835938, 202.19461059570312, -15.274650573730469, 136.38934326171875, 20.840213775634766, 200.4459686279297, 389.5987854003906, 85.81291961669922, 311.6343994140625, 89.11956787109375, 93.5787582397461, 37.72987365722656, 94.60971069335938, 189.81068420410156, -2.74566650390625, 398.438232421875, 329.22467041015625, -125.09248352050781, 427.409423828125, 513.037841796875, 170.5685272216797, 194.97286987304688, 313.3849792480469, 441.8379821777344, 254.12020874023438, 92.08958435058594, 143.25030517578125, 86.42272186279297, 105.55618286132812, 18.765655517578125, -78.80010223388672, 94.64054870605469, 41.582698822021484, 341.7937316894531, 90.93897247314453, 94.25111389160156, 245.15736389160156, 202.53189086914062, -36.91735076904297, 207.36260986328125, 36.50440979003906, 19.404052734375, -209.40121459960938, 129.6168975830078, 286.8532409667969], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000361.npy"}
{"epoch": 0.5301027900146843, "step": 362, "batch_size": 64, "mean": 185.47732543945312, "std": 154.01309204101562, "min": -70.31558990478516, "p10": 7.937777900695813, "median": 171.72943878173828, "p90": 383.27600097656267, "max": 644.28271484375, "pos_frac": 0.90625, "sample": [58.786746978759766, 220.5667724609375, 413.96038818359375, 271.1322937011719, 158.31021118164062, 52.816505432128906, 644.28271484375, 239.86917114257812, 148.76541137695312, 327.8674621582031, 328.3797912597656, 43.42707061767578, 101.97000885009766, 226.58480834960938, 209.2565460205078, 329.4039001464844, 62.14391326904297, -70.31558990478516, 105.070068359375, 222.73753356933594, 176.93089294433594, 34.147247314453125, 35.579017639160156, 119.30850219726562, -14.961898803710938, -0.9700794219970703, 92.46255493164062, 241.0923309326172, -62.09069061279297, 546.4537963867188, 407.8620910644531, 106.23838806152344, 73.77496337890625, 260.7767639160156, 41.4205207824707, 254.1230926513672, 398.7558288574219, -23.060447692871094, 219.9304656982422, 272.61944580078125, 169.2017364501953, 2.7736949920654297, 464.93719482421875, 347.1564025878906, 90.01834869384766, 120.68795013427734, 219.11178588867188, 173.8345947265625, 94.84376525878906, 582.701416015625, 165.9453125, 335.291015625, 332.709716796875, 169.62428283691406, 103.80901336669922, 237.84658813476562, 298.37945556640625, -60.326812744140625, 104.85851287841797, 186.87051391601562, 201.93856811523438, 200.4976806640625, 19.9873046875, 32.442840576171875], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000362.npy"}
{"epoch": 0.5315712187958884, "step": 363, "batch_size": 64, "mean": 199.26763916015625, "std": 164.79278564453125, "min": -168.39541625976562, "p10": 12.854437255859382, "median": 197.87865447998047, "p90": 401.13666687011727, "max": 617.467041015625, "pos_frac": 0.90625, "sample": [-167.251708984375, 244.19744873046875, 421.0657043457031, 215.3873748779297, -72.4241943359375, 184.45355224609375, 213.86083984375, 229.43093872070312, 99.66505432128906, 148.92575073242188, 196.41297912597656, 90.35902404785156, 299.846923828125, 97.00357818603516, 194.60182189941406, 617.467041015625, -168.39541625976562, 335.1313171386719, 102.21034240722656, 507.2210998535156, 289.0148010253906, 174.62353515625, 212.38668823242188, 270.0323486328125, 113.50760650634766, 246.23707580566406, 612.8455810546875, 217.43373107910156, 77.389892578125, 410.079345703125, 105.82533264160156, 206.3394012451172, 51.57843017578125, 132.29888916015625, 380.2704162597656, 99.02106475830078, 178.82260131835938, 37.79974365234375, 359.72186279296875, 364.945068359375, 358.5655517578125, -11.406669616699219, 19.818687438964844, 380.1683044433594, 29.155298233032227, 297.653076171875, 323.90716552734375, -13.5760498046875, 298.3453674316406, 450.9072570800781, -6.375923156738281, 111.22708129882812, 9.869758605957031, 257.26226806640625, 299.3900451660156, 67.08417510986328, 32.53722381591797, 199.34432983398438, 78.02700805664062, 118.6409683227539, 487.95263671875, 326.02142333984375, 79.58113861083984, 229.68572998046875], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000363.npy"}
{"epoch": 0.5330396475770925, "step": 364, "batch_size": 64, "mean": 144.00555419921875, "std": 159.55563354492188, "min": -219.6351318359375, "p10": -43.95796432495114, "median": 137.4359359741211, "p90": 333.37071533203135, "max": 614.1900634765625, "pos_frac": 0.84375, "sample": [227.13519287109375, 277.3901062011719, -196.71775817871094, 39.889183044433594, -219.6351318359375, 289.2796630859375, -123.60617065429688, 169.5189971923828, 286.345458984375, 614.1900634765625, 126.49995422363281, 341.6695861816406, 292.05828857421875, 156.1071319580078, -78.90006256103516, 32.23036193847656, 21.24127197265625, -10.176605224609375, 210.86709594726562, -57.273406982421875, 97.22747802734375, 216.77322387695312, 148.37191772460938, 255.8166046142578, 166.28802490234375, 39.42393493652344, 35.55996322631836, 377.22088623046875, 200.82650756835938, 208.53375244140625, 69.66923522949219, 408.27227783203125, 98.00434112548828, 117.98959350585938, 124.04718780517578, 47.042999267578125, 173.6451416015625, 94.24307250976562, 297.3097839355469, -55.526702880859375, -7.300239562988281, 371.7479248046875, 417.1129150390625, 57.16667175292969, 158.7213592529297, -16.96424102783203, 83.31627655029297, 150.04150390625, 314.0066833496094, 118.225341796875, 47.27888107299805, 179.42437744140625, 4.226346969604492, 155.96902465820312, 556.970703125, 122.93472290039062, 158.83718872070312, 244.54315185546875, 29.91695213317871, 270.41839599609375, 61.9971923828125, 99.66387939453125, 238.29454040527344, -119.04653930664062], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000364.npy"}
{"epoch": 0.5345080763582967, "step": 365, "batch_size": 64, "mean": 147.65203857421875, "std": 161.03025817871094, "min": -93.134521484375, "p10": -23.00267181396484, "median": 98.20791244506836, "p90": 359.5519470214845, "max": 711.767333984375, "pos_frac": 0.859375, "sample": [147.11013793945312, -21.386314392089844, 12.01605224609375, 152.67552185058594, 65.19197082519531, 17.846281051635742, 244.8614501953125, -23.695396423339844, 113.13648223876953, 36.640655517578125, 290.57745361328125, 279.71197509765625, -19.594377517700195, 267.7283935546875, -34.76362609863281, 39.38470458984375, 59.93125915527344, 652.4359741210938, 389.64581298828125, 269.92388916015625, 40.02558135986328, 69.75629425048828, 44.056785583496094, 5.830108642578125, 78.09303283691406, -93.134521484375, 59.255889892578125, 84.18023681640625, 138.47999572753906, 75.62200927734375, -27.139907836914062, -63.408504486083984, 106.49195098876953, 28.45553970336914, 227.85203552246094, 394.2481384277344, 70.66522979736328, 89.92387390136719, 60.4388427734375, 136.72708129882812, 176.9173126220703, 10.018836975097656, 226.52334594726562, 35.76184844970703, 79.85186767578125, 324.13482666015625, 253.18670654296875, 188.94580078125, 180.98703002929688, -31.460784912109375, 121.98914337158203, 263.76959228515625, 17.276636123657227, 218.77737426757812, 163.0657501220703, 187.7935791015625, 711.767333984375, 399.148193359375, 287.2371826171875, -52.2547492980957, 450.66571044921875, 374.730712890625, 87.94947814941406, 307.1494140625], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000365.npy"}
{"epoch": 0.5359765051395007, "step": 366, "batch_size": 64, "mean": 155.00631713867188, "std": 154.9882354736328, "min": -132.93833923339844, "p10": -17.736734771728507, "median": 135.87843322753906, "p90": 344.7187896728516, "max": 662.2771606445312, "pos_frac": 0.84375, "sample": [-10.430992126464844, 77.64236450195312, 197.689208984375, 70.75318908691406, -49.06146240234375, 22.34998321533203, 31.676809310913086, 85.85067749023438, 99.29742431640625, 331.5439453125, 115.69569396972656, -96.16055297851562, 367.5452880859375, 390.50457763671875, 181.4130096435547, 108.30509948730469, 180.87017822265625, 18.951183319091797, 19.92873191833496, 5.288990020751953, 286.36456298828125, 268.9388427734375, 230.39535522460938, 44.06025695800781, 344.7763671875, 662.2771606445312, 302.5215148925781, 104.41986846923828, 268.5618591308594, 152.9364013671875, 136.50982666015625, -23.048507690429688, -6.833305358886719, 209.1568603515625, 254.89601135253906, 194.4454803466797, 567.6044921875, 225.60348510742188, 138.44305419921875, 99.91804504394531, 196.0102996826172, 140.97174072265625, 92.259521484375, -101.29132080078125, 344.5844421386719, -10.171487808227539, 223.4566650390625, 25.588367462158203, 278.72540283203125, 371.3301086425781, 251.52044677734375, 88.9966812133789, 244.28530883789062, 135.24703979492188, 457.50537109375, 240.29620361328125, -58.98779296875, 103.33815002441406, -20.867767333984375, 79.59722900390625, 74.9703598022461, 72.2455825805664, 212.13067626953125, -132.93833923339844], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000366.npy"}
{"epoch": 0.5374449339207048, "step": 367, "batch_size": 64, "mean": 121.64521789550781, "std": 129.97561645507812, "min": -172.07504272460938, "p10": -15.0339859008789, "median": 94.05333709716797, "p90": 317.54808654785165, "max": 393.2070617675781, "pos_frac": 0.84375, "sample": [51.96336364746094, 51.511077880859375, 339.3673095703125, 60.25033187866211, 372.5242614746094, 81.74618530273438, 90.88336181640625, 96.28680419921875, 272.180419921875, 6.377815246582031, 33.742034912109375, 5.563570022583008, 22.88531494140625, 277.5950012207031, 204.3445281982422, 160.4315185546875, 241.03790283203125, 51.46730041503906, 70.39503479003906, -30.098466873168945, 161.58168029785156, 340.315185546875, -5.862586975097656, 8.318748474121094, 130.69461059570312, 217.82968139648438, 345.566162109375, 91.81986999511719, 393.2070617675781, 56.434112548828125, 329.18408203125, 35.66075134277344, 6.759586334228516, 187.45167541503906, 51.52845764160156, 172.2431640625, 223.3297882080078, 248.91981506347656, 290.3974304199219, 98.10719299316406, 221.7313690185547, -101.1011962890625, -92.04209899902344, -1.8421306610107422, 21.650115966796875, -65.42768859863281, -26.520538330078125, 170.19387817382812, 215.21229553222656, 37.851287841796875, -18.040184020996094, -172.07504272460938, 157.6279296875, 24.654922485351562, 201.43734741210938, 211.7340545654297, 202.5536346435547, 98.76411437988281, 287.42523193359375, 18.0091552734375, 98.36494445800781, -8.019523620605469, 369.709228515625, 89.50177001953125], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000367.npy"}
{"epoch": 0.5389133627019089, "step": 368, "batch_size": 64, "mean": 173.193359375, "std": 120.78719329833984, "min": -88.04280090332031, "p10": 28.142940139770513, "median": 150.88330078125, "p90": 301.3900390625, "max": 579.60498046875, "pos_frac": 0.921875, "sample": [206.702392578125, 203.76678466796875, 207.74075317382812, 186.267333984375, 174.75503540039062, 303.03253173828125, 96.24307250976562, -47.13563537597656, 346.9339904785156, 136.8836669921875, 96.595458984375, 278.2386474609375, 268.918701171875, 81.28680419921875, 255.82864379882812, 272.14239501953125, 103.96666717529297, 269.8145751953125, -36.84918975830078, 202.81784057617188, 137.55657958984375, 130.69935607910156, 320.0670166015625, 193.09286499023438, 236.93869018554688, 56.781341552734375, 132.43028259277344, -1.2626495361328125, 25.500988006591797, -88.04280090332031, 89.47560119628906, 278.65032958984375, 205.33517456054688, 124.9693603515625, 126.6314697265625, 215.73565673828125, 146.57440185546875, 34.3074951171875, 143.7310333251953, 56.12757110595703, 126.58834838867188, 316.603515625, 258.6576843261719, 273.0157165527344, 155.18820190429688, 263.8998718261719, 495.11956787109375, 21.408676147460938, -5.596738815307617, 85.48348999023438, 72.4810562133789, 335.0957946777344, 141.9367218017578, 297.55755615234375, 287.5951232910156, 102.75485229492188, 122.3497085571289, 579.60498046875, 278.2606506347656, 217.57846069335938, 146.57839965820312, 93.19956970214844, 40.12309646606445, 205.6412811279297], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000368.npy"}
{"epoch": 0.540381791483113, "step": 369, "batch_size": 64, "mean": 128.01370239257812, "std": 147.3737030029297, "min": -164.69239807128906, "p10": -42.776758575439445, "median": 103.79164505004883, "p90": 316.6399108886719, "max": 523.94873046875, "pos_frac": 0.796875, "sample": [43.88983154296875, -2.086965560913086, 393.150146484375, -34.189491271972656, -153.51063537597656, 99.82896423339844, 32.65387725830078, 284.7891540527344, 14.716060638427734, 328.8982238769531, 200.7081756591797, -7.148895263671875, 205.58489990234375, 192.7996826171875, 177.48500061035156, 523.94873046875, 221.02487182617188, 210.44912719726562, -4.827083587646484, 203.65310668945312, 267.81890869140625, 105.24826049804688, 161.57772827148438, 120.0615005493164, 195.70083618164062, 98.01446533203125, -124.99040985107422, 233.0081787109375, 193.70291137695312, 65.49712371826172, 171.6414031982422, 352.68853759765625, 237.26071166992188, -46.45701599121094, 65.84429931640625, 75.44225311279297, 254.8696746826172, 119.15392303466797, 93.53382873535156, 321.456298828125, 289.5488586425781, -70.57693481445312, 399.3441467285156, 88.59626770019531, 280.2554016113281, 218.21051025390625, 228.78407287597656, -156.28421020507812, 16.891803741455078, 305.40167236328125, 99.9034423828125, 70.71394348144531, -156.5426025390625, 53.764381408691406, 65.97167205810547, 27.836456298828125, -164.69239807128906, 3.4297714233398438, 102.33502960205078, 345.053955078125, 194.42349243164062, -7.687995910644531, 73.31051635742188, -8.004875183105469], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000369.npy"}
{"epoch": 0.5418502202643172, "step": 370, "batch_size": 64, "mean": 108.72503662109375, "std": 135.2613525390625, "min": -194.6722869873047, "p10": -31.784219360351564, "median": 111.6788444519043, "p90": 299.933984375, "max": 390.0079345703125, "pos_frac": 0.765625, "sample": [109.95880889892578, 205.6507568359375, 77.62428283691406, -18.016769409179688, 390.0079345703125, 51.81939697265625, 156.659912109375, 113.39888000488281, 30.64143180847168, 64.66548919677734, 210.04246520996094, 193.82479858398438, 19.6751708984375, 126.15876770019531, -31.766082763671875, 305.01861572265625, 283.5794372558594, 0.4130821228027344, 166.31541442871094, 329.64453125, 347.13897705078125, 14.058868408203125, 317.6528015136719, 20.83399200439453, 159.49090576171875, -2.9844131469726562, 78.333740234375, 109.48918151855469, -88.52376556396484, 114.07604217529297, 290.0946044921875, -31.7919921875, -0.4354400634765625, 170.88246154785156, -15.061710357666016, 191.65859985351562, 199.08367919921875, -15.553497314453125, -98.08525085449219, 15.911344528198242, 138.94580078125, 135.62100219726562, -194.6722869873047, 127.02032470703125, 290.8365478515625, 155.0816650390625, 19.209228515625, 303.8328857421875, 209.4943389892578, -147.98623657226562, -53.62504577636719, -23.435272216796875, 75.63616943359375, 210.34844970703125, -28.919044494628906, 3.5361709594726562, 19.236854553222656, -124.17200469970703, 6.498142242431641, 268.33648681640625, 365.80645751953125, 179.43338012695312, 225.94284057617188, 234.80963134765625], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000370.npy"}
{"epoch": 0.5433186490455213, "step": 371, "batch_size": 64, "mean": 137.0546417236328, "std": 152.3195343017578, "min": -246.0979461669922, "p10": -42.58666992187499, "median": 147.91001892089844, "p90": 329.61358032226565, "max": 517.4133911132812, "pos_frac": 0.8125, "sample": [259.4638366699219, 254.57086181640625, 285.4007263183594, 40.78404998779297, -246.0979461669922, 157.92306518554688, 397.5682373046875, 153.8209228515625, 322.83966064453125, 238.4066619873047, 146.80059814453125, 8.313436508178711, 220.27407836914062, 169.51028442382812, -73.76998901367188, 65.78327941894531, 75.7225341796875, 104.83514404296875, 351.46783447265625, 136.06207275390625, 0.1773242950439453, -163.55091857910156, 331.492919921875, -92.89515686035156, -59.29547119140625, 184.81565856933594, 388.90386962890625, 193.04611206054688, 27.313552856445312, 135.14764404296875, 517.4133911132812, -26.383010864257812, 196.6348876953125, 161.4668426513672, -8.070732116699219, -38.174713134765625, 149.01943969726562, -13.000801086425781, 62.44281005859375, 353.36981201171875, -44.477508544921875, 85.1611099243164, 308.2133483886719, 165.5576629638672, 23.34310531616211, 186.52432250976562, -11.58428955078125, 228.91253662109375, 325.22845458984375, 16.694564819335938, 224.08523559570312, 77.35457611083984, 141.82630920410156, 158.18154907226562, 191.1049346923828, 64.10186767578125, 159.98304748535156, 65.68177795410156, -117.59761047363281, 314.494873046875, 284.74005126953125, 447.79248046875, 87.82899475097656, 18.79229736328125], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000371.npy"}
{"epoch": 0.5447870778267254, "step": 372, "batch_size": 64, "mean": 187.5018310546875, "std": 180.899169921875, "min": -147.77969360351562, "p10": -31.901624298095697, "median": 164.71788787841797, "p90": 476.18970031738286, "max": 603.0062255859375, "pos_frac": 0.84375, "sample": [284.85198974609375, 192.9205780029297, 353.51690673828125, -66.44572448730469, 45.20549011230469, 493.5417175292969, 419.1297912597656, 226.5526123046875, 163.8284912109375, 3.9148101806640625, 603.0062255859375, 259.5406799316406, 128.81634521484375, 249.64154052734375, -27.146209716796875, 142.26654052734375, 39.105228424072266, 130.22726440429688, 516.3192749023438, -108.65528106689453, 82.51701354980469, 207.94021606445312, 398.1605529785156, -86.4083023071289, 506.33648681640625, 246.0478973388672, 24.68108367919922, 124.38504028320312, 79.75936889648438, 99.48226165771484, 49.7838134765625, 154.17941284179688, 336.96807861328125, -33.939659118652344, 152.46847534179688, -89.74724578857422, 309.501708984375, 459.77227783203125, 125.2626724243164, 483.2257385253906, 316.8048400878906, 238.00213623046875, 202.09141540527344, -147.77969360351562, 302.23004150390625, 438.4178161621094, 242.5537567138672, 186.6089324951172, -22.067214965820312, 225.05636596679688, 78.09146118164062, 235.45262145996094, 165.60728454589844, -3.0439090728759766, 514.3487548828125, 171.6741180419922, 43.667877197265625, 137.88150024414062, 35.71739196777344, 564.7346801757812, 39.3851203918457, -42.70709991455078, 301.4500732421875, 95.4248275756836], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000372.npy"}
{"epoch": 0.5462555066079295, "step": 373, "batch_size": 64, "mean": 174.2254180908203, "std": 157.95152282714844, "min": -181.6821746826172, "p10": 0.25573577880860077, "median": 150.0140380859375, "p90": 385.0130493164063, "max": 477.6051330566406, "pos_frac": 0.890625, "sample": [-3.0908279418945312, -113.25962829589844, 20.303466796875, 119.31388092041016, 109.76521301269531, 457.323486328125, 356.05047607421875, 7.1403045654296875, 135.48434448242188, 125.85128021240234, 477.6051330566406, 140.96946716308594, -2.694793701171875, 59.256126403808594, 261.26531982421875, 136.04139709472656, 20.658153533935547, 405.38592529296875, 165.79421997070312, 73.51311492919922, -172.15957641601562, 31.030487060546875, 222.11282348632812, -140.6790771484375, 103.10675048828125, 32.016929626464844, 182.41680908203125, 213.70899963378906, 301.8775939941406, 147.0960693359375, 69.66888427734375, 225.85052490234375, 186.44400024414062, 263.509033203125, 325.12518310546875, -3.6256980895996094, 11.891502380371094, 391.3846435546875, 273.6807861328125, 118.87690734863281, 459.96087646484375, 68.1971664428711, 261.05426025390625, 310.86932373046875, 336.0679016113281, 209.8220977783203, 319.8526916503906, 446.4903259277344, 370.14599609375, 319.9920654296875, 49.5694580078125, 135.73941040039062, 101.94427490234375, 152.9320068359375, 16.824569702148438, 308.6566162109375, 217.7113800048828, 106.14810180664062, 413.99005126953125, 239.73475646972656, 306.1357727050781, 364.1991882324219, -181.6821746826172, 80.06035614013672], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000373.npy"}
{"epoch": 0.5477239353891337, "step": 374, "batch_size": 64, "mean": 156.51187133789062, "std": 177.40768432617188, "min": -118.61407470703125, "p10": -39.42747344970701, "median": 113.19159698486328, "p90": 432.3464569091797, "max": 668.9567260742188, "pos_frac": 0.84375, "sample": [-16.46703338623047, 144.58929443359375, 1.9668426513671875, 598.4530029296875, 69.29353332519531, 668.9567260742188, 94.44657135009766, 378.90283203125, 153.86317443847656, 25.901573181152344, 52.774139404296875, 167.33697509765625, 22.496658325195312, 0.6495285034179688, 44.961822509765625, -1.9208984375, 348.46258544921875, 224.51693725585938, 449.2601318359375, 147.32858276367188, 64.98745727539062, 76.668212890625, 362.42779541015625, 36.25871276855469, -62.431095123291016, 407.3340759277344, 26.380807876586914, 49.05900955200195, -96.55387878417969, 241.46560668945312, 244.8309326171875, 211.278076171875, 45.60931396484375, 471.4385986328125, -118.61407470703125, 184.82615661621094, -52.35774230957031, 383.1030578613281, 116.70551300048828, -12.9794921875, 101.79960632324219, 137.34603881835938, 134.90631103515625, 112.57781982421875, 113.80537414550781, 61.035255432128906, 44.37562561035156, 427.5597839355469, 182.94480895996094, 182.457763671875, 39.02314758300781, 434.39788818359375, -85.44688415527344, 281.328857421875, 166.96109008789062, 63.49884796142578, -49.267662048339844, 288.82635498046875, 105.04292297363281, 81.89143371582031, 488.7818603515625, -50.42462158203125, 125.40538787841797, 472.7226867675781], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000374.npy"}
{"epoch": 0.5491923641703378, "step": 375, "batch_size": 64, "mean": 150.79466247558594, "std": 181.8096923828125, "min": -257.81158447265625, "p10": -72.83865203857421, "median": 149.70600128173828, "p90": 416.80663452148445, "max": 564.27587890625, "pos_frac": 0.84375, "sample": [481.6900939941406, 221.71128845214844, 48.71429443359375, 151.1961669921875, -102.59117126464844, -84.6451416015625, 309.05029296875, 143.36639404296875, 23.280162811279297, 126.71806335449219, 40.619285583496094, 59.14237976074219, 152.52655029296875, -217.42913818359375, 2.5355300903320312, 487.2937316894531, 280.76416015625, 28.792802810668945, 335.43475341796875, 196.48233032226562, 20.58709716796875, 264.910400390625, -182.4127197265625, 122.48743438720703, 165.6289825439453, 2.3744049072265625, 403.9652404785156, 16.525114059448242, -73.63125610351562, 163.086669921875, 60.007530212402344, -30.414352416992188, 422.3100891113281, 427.3553466796875, -70.98924255371094, 236.81716918945312, 341.09625244140625, 270.6513671875, 168.85400390625, 485.3113708496094, 139.71176147460938, 162.30479431152344, 95.59249114990234, -257.81158447265625, 247.3081817626953, 231.80718994140625, 223.31741333007812, -180.3900604248047, 564.27587890625, 394.24853515625, 249.4520721435547, 38.20559310913086, 2.57098388671875, 209.78858947753906, 67.49444580078125, 431.3513488769531, 148.21583557128906, 330.2169494628906, 41.184852600097656, 54.66429901123047, 129.2447967529297, 284.3105773925781, 154.21603393554688, -9.595897674560547], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000375.npy"}
{"epoch": 0.5506607929515418, "step": 376, "batch_size": 64, "mean": 179.1793670654297, "std": 180.32127380371094, "min": -312.81048583984375, "p10": -9.153025627136223, "median": 137.07403564453125, "p90": 424.51289672851567, "max": 609.4039916992188, "pos_frac": 0.84375, "sample": [121.29708099365234, 476.17138671875, 192.4734344482422, 101.297119140625, 609.4039916992188, 70.98905944824219, 275.75762939453125, -11.747941970825195, 77.06071472167969, -3.0982208251953125, 236.92315673828125, 2.6002445220947266, 426.91778564453125, 83.44239807128906, -1.2927322387695312, 301.2337646484375, 222.7764892578125, -52.837677001953125, 27.131561279296875, 194.3942108154297, 46.27531433105469, -2.5071887969970703, -130.87158203125, 477.7734375, 443.03741455078125, 92.09712982177734, 174.25440979003906, 241.51075744628906, 387.4195251464844, 44.95426559448242, 360.552490234375, 412.5137939453125, 82.65895080566406, 285.49127197265625, 140.09829711914062, 245.7333984375, 356.19757080078125, 124.84146118164062, 260.50848388671875, 134.04977416992188, 222.19004821777344, 279.8238830566406, -15.832748413085938, 235.7255859375, 376.4210510253906, 82.49592590332031, 575.075927734375, 133.49942016601562, -46.62211608886719, -40.92168426513672, 10.556015014648438, 57.90953826904297, 474.3305358886719, 370.2413330078125, 230.11151123046875, 372.142578125, 78.87358093261719, 160.85675048828125, 109.67949676513672, 90.08802032470703, 10.605230331420898, 66.65635681152344, 418.9014892578125, -312.81048583984375], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000376.npy"}
{"epoch": 0.5521292217327459, "step": 377, "batch_size": 64, "mean": 171.89007568359375, "std": 169.15159606933594, "min": -233.76904296875, "p10": -34.56828155517577, "median": 169.52245330810547, "p90": 405.5896881103516, "max": 562.63232421875, "pos_frac": 0.828125, "sample": [344.28253173828125, 438.12982177734375, 181.20071411132812, 74.77365112304688, 103.01459503173828, -50.853515625, 233.08514404296875, 110.65057373046875, 302.383056640625, 7.071235656738281, 368.3870544433594, 562.63232421875, 10.433815002441406, 472.6134033203125, -21.61344337463379, 163.89944458007812, 184.6056365966797, 316.4412536621094, 172.12701416015625, -28.777311325073242, 191.4954833984375, 398.202880859375, 177.13223266601562, -6.179298400878906, 269.9972229003906, 414.688720703125, 292.6777648925781, -29.748336791992188, 321.3603515625, 117.24027252197266, 312.0284423828125, 165.12887573242188, 71.12108612060547, 121.43948364257812, 12.191974639892578, 382.93597412109375, 199.7448272705078, 187.25567626953125, 178.60032653808594, 460.3143310546875, 151.7404327392578, -36.63397216796875, -233.76904296875, 220.72607421875, 148.78890991210938, 166.9178924560547, 283.76361083984375, 90.23313903808594, -107.78044128417969, 267.1180114746094, 78.65415954589844, 353.46337890625, -37.79252624511719, -69.53643798828125, 16.49468231201172, 408.7554626464844, 52.905799865722656, 462.3844909667969, 277.3173828125, 5.818473815917969, 193.57655334472656, 127.73486328125, 153.499267578125, -155.5306396484375], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000377.npy"}
{"epoch": 0.55359765051395, "step": 378, "batch_size": 64, "mean": 181.31509399414062, "std": 159.90281677246094, "min": -215.70892333984375, "p10": -20.89767208099365, "median": 180.6227798461914, "p90": 380.5224212646485, "max": 675.75146484375, "pos_frac": 0.828125, "sample": [420.8233947753906, 277.24560546875, 400.8810119628906, 370.63677978515625, 311.6475830078125, 114.73292541503906, 165.49737548828125, 220.51513671875, 233.72454833984375, 127.86483764648438, 256.89666748046875, 196.2235107421875, 299.5313415527344, 179.4059295654297, 234.01190185546875, -10.612152099609375, 229.42047119140625, 433.2119140625, 160.0702362060547, 237.075439453125, 131.23745727539062, -30.338476181030273, 38.552677154541016, 109.53949737548828, 145.16015625, 165.63705444335938, -16.773969650268555, 350.8323669433594, 159.583740234375, 191.23883056640625, 301.3563232421875, 384.7591247558594, 144.54421997070312, -20.682405471801758, -73.14266204833984, 286.7844543457031, 52.02918243408203, 675.75146484375, 39.91056823730469, -2.0914306640625, 156.0850372314453, 64.69635772705078, 215.6284942626953, 424.4075622558594, -20.98992919921875, -35.36997985839844, 325.5586853027344, 181.83963012695312, 444.9441833496094, 285.32470703125, 136.01661682128906, 230.72059631347656, 326.67010498046875, 189.50454711914062, -215.70892333984375, 204.44241333007812, 156.29031372070312, 135.58538818359375, -27.25041961669922, 351.8420715332031, 65.43638610839844, -171.4938507080078, 14.904684066772461, 272.388671875], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000378.npy"}
{"epoch": 0.5550660792951542, "step": 379, "batch_size": 64, "mean": 181.93988037109375, "std": 189.7003173828125, "min": -175.86126708984375, "p10": -56.181329345703105, "median": 179.87857055664062, "p90": 391.44591979980476, "max": 716.4874877929688, "pos_frac": 0.8125, "sample": [257.40887451171875, 403.75439453125, 249.07217407226562, 311.3729553222656, -72.42675018310547, 63.48950958251953, 137.50738525390625, 377.93731689453125, 164.31546020507812, 96.18099975585938, 310.62274169921875, 622.4971313476562, 78.55644226074219, 348.9903869628906, 325.2969055175781, 189.32040405273438, -152.91607666015625, 593.5448608398438, 147.4836883544922, 266.59197998046875, 274.18701171875, 336.84576416015625, 231.64344787597656, 41.8965950012207, 38.1505126953125, -6.116086959838867, -99.61178588867188, 43.745361328125, 27.392799377441406, 216.5377960205078, 368.75347900390625, 268.07049560546875, -175.86126708984375, 30.580684661865234, 302.7679443359375, 471.9625244140625, 211.1787567138672, 293.8299560546875, 397.2353210449219, 42.43806457519531, -11.913106918334961, 477.5263671875, 111.19680786132812, -17.752086639404297, 252.1294403076172, -109.50735473632812, 76.50796508789062, 310.783935546875, 289.63671875, 99.99046325683594, -64.52845764160156, 170.43673706054688, 343.9405517578125, 104.02583312988281, 716.4874877929688, 100.39017486572266, 205.5025634765625, 263.99725341796875, 32.086448669433594, -9.72665023803711, 19.673446655273438, 366.7132873535156, -80.96963500976562, -36.70469665527344], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000379.npy"}
{"epoch": 0.5565345080763583, "step": 380, "batch_size": 64, "mean": 199.48626708984375, "std": 192.84442138671875, "min": -306.9314270019531, "p10": -60.0080787658691, "median": 205.83898162841797, "p90": 434.7661437988282, "max": 639.5811767578125, "pos_frac": 0.84375, "sample": [97.01507568359375, 190.72657775878906, -12.039230346679688, 535.686279296875, 219.39083862304688, 77.9540023803711, 165.26980590820312, 337.6624755859375, 322.08050537109375, 215.58494567871094, 70.19567108154297, 402.4179992675781, -16.370407104492188, 475.86724853515625, 564.3597412109375, 342.3253479003906, 184.09518432617188, 251.87408447265625, 353.3060302734375, 57.47222900390625, 210.07040405273438, -206.31076049804688, 185.16827392578125, 197.1968994140625, 136.6314697265625, -88.02660369873047, 443.245849609375, 273.8866271972656, 162.61221313476562, 414.98016357421875, 200.71636962890625, 376.044921875, 260.4236145019531, 305.103759765625, 288.6394348144531, 311.68548583984375, 62.72828674316406, 33.74658203125, 201.60755920410156, 55.933326721191406, 639.5811767578125, -10.996868133544922, -103.79784393310547, 128.79629516601562, -127.75885772705078, 215.6925048828125, 362.215087890625, 292.0161437988281, 31.35765838623047, -166.93637084960938, 365.2181396484375, 57.7110595703125, 493.08758544921875, 248.15513610839844, 150.77044677734375, -78.7099380493164, 402.20318603515625, 241.41436767578125, 126.22351837158203, 47.1348876953125, -306.9314270019531, 341.74053955078125, 461.7967834472656, 296.1798095703125], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000380.npy"}
{"epoch": 0.5580029368575624, "step": 381, "batch_size": 64, "mean": 167.54666137695312, "std": 204.3968963623047, "min": -295.8416748046875, "p10": -87.1848030090332, "median": 138.34000396728516, "p90": 451.57907409667973, "max": 593.1339721679688, "pos_frac": 0.828125, "sample": [164.31008911132812, 14.500396728515625, 121.41510009765625, 454.029052734375, 263.42169189453125, 476.3165588378906, 315.8582763671875, 43.40184020996094, 92.54276275634766, 445.8624572753906, 396.3301696777344, 114.57621765136719, 268.6803894042969, 238.9156494140625, 585.7369384765625, -105.49639129638672, 366.2576904296875, 288.4068603515625, 477.91632080078125, 244.30850219726562, 588.2097778320312, 587.005126953125, -152.06219482421875, -162.5519561767578, -38.03032684326172, 194.2017822265625, 445.32989501953125, 83.52461242675781, -88.33936309814453, 33.02941131591797, 120.52293395996094, 250.5228271484375, 46.40007019042969, 4.395145416259766, 81.68782043457031, -64.27381896972656, 171.58200073242188, 39.63605499267578, 211.67996215820312, 47.382568359375, 103.3965072631836, 125.35357666015625, 593.1339721679688, 252.5759735107422, 256.64971923828125, 182.98184204101562, 73.77926635742188, -9.58828353881836, 402.2430419921875, -84.49082946777344, 81.54741668701172, 324.8951721191406, 13.630767822265625, 153.55137634277344, 75.6815414428711, -117.59930419921875, 109.99516296386719, 324.216064453125, 291.75787353515625, 74.61861419677734, 151.32643127441406, -295.8416748046875, -244.26165771484375, 216.291259765625], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000381.npy"}
{"epoch": 0.5594713656387665, "step": 382, "batch_size": 64, "mean": 159.9391632080078, "std": 157.12107849121094, "min": -193.85768127441406, "p10": -17.54198532104492, "median": 140.38558959960938, "p90": 398.9613464355471, "max": 506.000732421875, "pos_frac": 0.84375, "sample": [478.6768798828125, 336.64703369140625, 154.97845458984375, 88.44506072998047, 56.74646759033203, 506.000732421875, -35.44841766357422, 36.049278259277344, 149.85009765625, 43.131141662597656, 141.03648376464844, 251.23843383789062, 293.00433349609375, 141.3326873779297, 279.18463134765625, 226.09881591796875, 126.25984191894531, -47.81828308105469, 324.3033142089844, 74.21543884277344, 242.18295288085938, 61.70887756347656, -12.131393432617188, 266.9646301269531, 425.66748046875, -13.993370056152344, 290.6925048828125, -76.54641723632812, 276.86224365234375, 138.72747802734375, 481.8827819824219, -74.89070892333984, 25.97610855102539, 199.11322021484375, 189.55813598632812, -193.85768127441406, -64.34689331054688, 245.18255615234375, -19.062820434570312, 313.7469482421875, 48.62841033935547, 59.326446533203125, 247.6619110107422, 34.557090759277344, 110.59236145019531, 58.72575378417969, 201.14015197753906, 187.24868774414062, 126.32670593261719, 124.43727111816406, 485.80584716796875, 456.06549072265625, 164.14419555664062, 48.824974060058594, 192.07476806640625, 447.78436279296875, 46.59267807006836, 27.02572250366211, 255.7532501220703, 303.4635009765625, 139.7346954345703, 85.1957015991211, -9.973770141601562, 67.601806640625], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000382.npy"}
{"epoch": 0.5609397944199707, "step": 383, "batch_size": 64, "mean": 174.2191162109375, "std": 195.34152221679688, "min": -422.30096435546875, "p10": -47.63334579467773, "median": 160.56776428222656, "p90": 426.1621215820313, "max": 646.8382568359375, "pos_frac": 0.828125, "sample": [45.440834045410156, 73.117919921875, 76.99295043945312, -118.7445297241211, 300.89825439453125, -33.47904968261719, -44.45726776123047, 462.6593322753906, 243.6191864013672, -5.353404998779297, 234.56039428710938, 284.1190185546875, 18.872127532958984, 22.505722045898438, -132.9130096435547, 169.259521484375, 646.8382568359375, 45.31092071533203, 155.2967071533203, 181.9431610107422, 499.3602600097656, -166.31539916992188, 64.13162231445312, 414.94622802734375, 200.04660034179688, -58.793678283691406, 541.5777587890625, 145.80393981933594, -64.417724609375, 125.80342102050781, 596.1126098632812, 159.14617919921875, 385.3559265136719, 176.93121337890625, 290.3160400390625, 148.1374053955078, 368.9623107910156, 118.26675415039062, 250.6024627685547, 262.88641357421875, 86.56243896484375, 30.768768310546875, 148.843505859375, 128.67893981933594, 226.25820922851562, 352.66632080078125, 15.923431396484375, 370.0029296875, 41.71604919433594, 197.12332153320312, 430.96893310546875, 143.85775756835938, -48.99452209472656, 161.98934936523438, 284.3050537109375, 172.27731323242188, 458.9081115722656, 382.6619873046875, -19.442806243896484, 285.1432800292969, -422.30096435546875, 40.17586898803711, 239.83309936523438, 356.7494812011719], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000383.npy"}
{"epoch": 0.5624082232011748, "step": 384, "batch_size": 64, "mean": 149.1619873046875, "std": 137.2507781982422, "min": -198.55215454101562, "p10": -13.745876693725574, "median": 163.38735961914062, "p90": 306.22474365234376, "max": 576.906982421875, "pos_frac": 0.875, "sample": [151.8074951171875, 100.05685424804688, 359.72564697265625, -18.925556182861328, 15.826339721679688, 19.431259155273438, -134.8229217529297, -1.6599578857421875, 576.906982421875, 28.254600524902344, 406.4600830078125, 188.04556274414062, 128.2733612060547, 59.22705841064453, 258.9672546386719, 89.74742126464844, -66.18690490722656, 95.31449890136719, 198.45692443847656, 224.41481018066406, 186.89871215820312, 82.112548828125, 106.43479919433594, 161.1310577392578, 183.06930541992188, 105.77427673339844, 40.753868103027344, -67.57083892822266, -70.97000885009766, 203.69677734375, 212.00308227539062, 219.9330291748047, 20.552536010742188, 189.77407836914062, 153.8006591796875, -198.55215454101562, 283.9828186035156, 201.65011596679688, 306.6602783203125, 22.90008544921875, 116.38710021972656, 192.09149169921875, 197.50662231445312, 324.89483642578125, 193.70269775390625, 305.20849609375, 343.2276611328125, 166.30029296875, 172.88565063476562, 102.32469940185547, 258.55865478515625, 12.649024963378906, 171.32894897460938, 470.9211120605469, 85.98001098632812, -19.929351806640625, 125.46966552734375, 214.17657470703125, 165.64366149902344, 156.17601013183594, 263.3060302734375, 225.44540405273438, 38.241424560546875, 240.51486206054688], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000384.npy"}
{"epoch": 0.5638766519823789, "step": 385, "batch_size": 64, "mean": 176.4716796875, "std": 139.77601623535156, "min": -146.60153198242188, "p10": 22.513036537170418, "median": 163.5130157470703, "p90": 353.65568847656255, "max": 495.4305114746094, "pos_frac": 0.90625, "sample": [73.86334228515625, -46.19884490966797, 83.9944076538086, 234.65682983398438, 357.8659973144531, 192.815185546875, 332.5147705078125, 169.94149780273438, 39.21196746826172, 300.914794921875, 146.9870147705078, 30.277143478393555, 75.2516860961914, 131.9019012451172, 343.8316345214844, 39.607261657714844, 329.2703552246094, 269.5688171386719, 109.95879364013672, 113.94629669189453, 314.9259033203125, 151.57522583007812, -76.90541076660156, 281.181396484375, -85.62924194335938, 214.36514282226562, 241.42247009277344, 417.10882568359375, 53.75672912597656, 125.59820556640625, 193.9679718017578, 146.1493377685547, 87.06494140625, 480.8809814453125, 422.3835754394531, 184.9215850830078, 136.02584838867188, -4.4272003173828125, 424.7320861816406, 203.1498565673828, 19.185562133789062, 74.99089050292969, 233.93179321289062, 153.72776794433594, 179.91485595703125, 457.8544921875, 282.6826477050781, 106.17495727539062, 295.8460388183594, 150.35801696777344, -78.4656753540039, 90.76356506347656, 102.7651138305664, -146.60153198242188, 164.61203002929688, 58.225860595703125, 240.5251922607422, 177.79568481445312, 225.01055908203125, 495.4305114746094, 162.41400146484375, 257.60699462890625, 158.77301025390625, 188.2353973388672], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000385.npy"}
{"epoch": 0.5653450807635829, "step": 386, "batch_size": 64, "mean": 184.04327392578125, "std": 176.86744689941406, "min": -247.4900665283203, "p10": -41.589656448364245, "median": 176.28579711914062, "p90": 410.45739135742195, "max": 633.3884887695312, "pos_frac": 0.84375, "sample": [-31.590076446533203, 265.0199890136719, -64.50505828857422, 276.07293701171875, 381.4409484863281, 111.34169006347656, 56.96997833251953, 317.25555419921875, 131.44393920898438, 281.92926025390625, 352.1825866699219, 579.7824096679688, -68.61470031738281, 19.829374313354492, 396.01446533203125, 190.067138671875, 215.12457275390625, 356.2601623535156, 14.85496711730957, -19.480430603027344, 160.9251708984375, 176.9057159423828, 82.68804168701172, 146.47161865234375, 184.20272827148438, 294.2959899902344, 128.6904296875, 93.41775512695312, 633.3884887695312, 473.3260803222656, -21.022613525390625, 219.06056213378906, 179.87335205078125, -45.87519073486328, 227.44651794433594, 493.3402404785156, 569.5516967773438, 273.9873046875, 463.6825256347656, 142.5553436279297, 183.54696655273438, 341.58050537109375, -77.48158264160156, 156.2965850830078, 26.8411865234375, 150.67471313476562, 105.16627502441406, 216.9535369873047, -103.03034973144531, 210.88046264648438, -247.4900665283203, 216.87261962890625, 94.14862823486328, 135.27459716796875, 87.565185546875, 132.62872314453125, 347.85064697265625, 313.1269836425781, -94.25831604003906, 95.249267578125, 175.66587829589844, 182.8119659423828, 416.647216796875, 72.93665313720703], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000386.npy"}
{"epoch": 0.566813509544787, "step": 387, "batch_size": 64, "mean": 137.70285034179688, "std": 133.54632568359375, "min": -138.48007202148438, "p10": -24.3665843963623, "median": 133.51874542236328, "p90": 309.8662322998047, "max": 513.3859252929688, "pos_frac": 0.875, "sample": [168.96636962890625, -17.266990661621094, 92.01078796386719, 181.00889587402344, -27.40926742553711, 513.3859252929688, 129.56210327148438, 38.72074890136719, 293.21533203125, 349.2113342285156, -104.56810760498047, 243.6611328125, 85.95050048828125, 180.72105407714844, -138.48007202148438, 183.56167602539062, 41.264404296875, 330.12847900390625, 123.38335418701172, 54.488616943359375, 310.61395263671875, 125.02436828613281, 161.60040283203125, 465.0027160644531, 220.95529174804688, 85.71511840820312, 110.53751373291016, 231.14389038085938, 43.688941955566406, 57.04643630981445, 78.92768096923828, 173.70314025878906, 197.63223266601562, 48.37678146362305, 308.1215515136719, 95.24381256103516, 256.4715576171875, 141.60972595214844, 114.90444946289062, 244.28854370117188, 36.92369842529297, 8.40245532989502, 400.63812255859375, 207.99740600585938, 242.27694702148438, 146.05252075195312, 119.68311309814453, -112.65644836425781, 334.96563720703125, -70.79441833496094, 179.51390075683594, 84.99148559570312, 36.74079132080078, 192.48533630371094, -55.43026351928711, 203.8231658935547, 3.6993255615234375, 164.37457275390625, 61.805694580078125, 1.46173095703125, 228.4289093017578, -75.88203430175781, 143.88037109375, 137.4753875732422], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000387.npy"}
{"epoch": 0.5682819383259912, "step": 388, "batch_size": 64, "mean": 187.8830108642578, "std": 182.38543701171875, "min": -258.6502685546875, "p10": -13.7997013092041, "median": 173.94174194335938, "p90": 405.9671142578126, "max": 758.0053100585938, "pos_frac": 0.875, "sample": [758.0053100585938, 273.2665100097656, -10.828052520751953, 362.3624267578125, 371.543212890625, 621.7105102539062, 224.45840454101562, -106.16141510009766, 102.23424530029297, 248.47003173828125, 267.12200927734375, 504.5410461425781, 146.9401397705078, 250.12557983398438, -258.6502685546875, -15.073265075683594, 134.77789306640625, 202.19952392578125, 176.91806030273438, 123.8733901977539, 224.19140625, 445.20416259765625, 103.5728530883789, -76.28186798095703, 170.96542358398438, 249.450439453125, 12.976646423339844, 48.91291809082031, 70.47265625, 75.42120361328125, 10.539604187011719, 79.90007019042969, 225.0979461669922, 75.33889770507812, 366.35247802734375, 188.35467529296875, 100.94865417480469, 78.90253448486328, 341.24786376953125, 33.657569885253906, 170.38824462890625, 79.77922058105469, 246.22216796875, 108.56171417236328, 140.32550048828125, 148.86154174804688, 301.9039306640625, 256.840576171875, -90.08724975585938, 305.2134704589844, 296.6212158203125, 656.4385375976562, -83.97492980957031, 283.05096435546875, 137.30055236816406, 23.026153564453125, -36.7115478515625, 228.1372528076172, 423.50732421875, 420.72021484375, 227.48178100585938, 278.7447204589844, 228.6494903564453, 70.45057678222656], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000388.npy"}
{"epoch": 0.5697503671071953, "step": 389, "batch_size": 64, "mean": 184.94285583496094, "std": 181.81591796875, "min": -167.8259735107422, "p10": -28.534362030029296, "median": 163.79714965820312, "p90": 465.34418945312507, "max": 596.3509521484375, "pos_frac": 0.828125, "sample": [122.73782348632812, 441.6142578125, 228.2135772705078, -167.8259735107422, -111.14227294921875, 241.5328826904297, 202.70809936523438, 312.64459228515625, 163.12672424316406, 106.79150390625, 145.88211059570312, -29.03986358642578, 203.92498779296875, -83.80777740478516, 35.50142288208008, 152.9002685546875, 121.3907241821289, 426.04791259765625, 410.1892395019531, -31.956146240234375, -27.3548583984375, 12.27536392211914, 482.4352722167969, 38.735042572021484, 178.48394775390625, 332.7540588378906, 2.58184814453125, 88.30268859863281, 152.30328369140625, 164.4675750732422, 291.5584411621094, 101.84524536132812, 56.26936340332031, 289.5024719238281, -30.332115173339844, 239.5564422607422, 103.56614685058594, 192.69833374023438, 265.0224609375, 579.23046875, 301.7734069824219, -25.402204513549805, 258.79974365234375, -3.1150665283203125, 250.92889404296875, 314.9125671386719, 135.17861938476562, 185.17420959472656, 391.9547119140625, 197.8104248046875, -7.328819274902344, 520.7626953125, 30.489730834960938, -113.88109588623047, 246.03651428222656, 475.51416015625, 78.80572509765625, 596.3509521484375, 493.87994384765625, 135.59010314941406, 153.59429931640625, 12.245098114013672, 214.3317413330078, 586.6007690429688], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000389.npy"}
{"epoch": 0.5712187958883994, "step": 390, "batch_size": 64, "mean": 163.55551147460938, "std": 201.81756591796875, "min": -339.5333557128906, "p10": -74.4440689086914, "median": 171.43630981445312, "p90": 377.61525268554686, "max": 1035.6331787109375, "pos_frac": 0.828125, "sample": [176.4417266845703, 204.6445770263672, 449.1522216796875, 378.0931396484375, -235.9473114013672, -57.693443298339844, -50.966102600097656, 90.61158752441406, 176.80801391601562, 259.891845703125, 403.74951171875, 62.99024963378906, 225.86798095703125, 217.5542755126953, 31.09668731689453, 138.358154296875, -68.43944549560547, 1035.6331787109375, -81.1626205444336, 340.4832763671875, 202.5719757080078, 188.10150146484375, 326.0569152832031, 177.0425262451172, 129.3562469482422, 116.31039428710938, 122.93704986572266, 68.57040405273438, 202.96519470214844, 81.81903839111328, 106.60108947753906, 81.57144165039062, -149.472900390625, -8.738380432128906, 163.18409729003906, 432.85235595703125, 446.45184326171875, 376.50018310546875, -284.4961853027344, 248.78164672851562, 390.1431579589844, 284.60797119140625, 136.60543823242188, 161.6817626953125, 269.2763366699219, 166.43089294433594, 122.58341217041016, 70.33879089355469, 338.5378723144531, 330.8504638671875, 239.27023315429688, 183.78489685058594, 93.74179077148438, 185.93258666992188, 313.97259521484375, -77.0174789428711, 36.907257080078125, -139.2451629638672, -339.5333557128906, 217.44737243652344, 129.6708984375, 156.73049926757812, 258.9159240722656, 209.78497314453125], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000390.npy"}
{"epoch": 0.5726872246696035, "step": 391, "batch_size": 64, "mean": 118.00923156738281, "std": 168.53323364257812, "min": -217.923583984375, "p10": -58.37660903930662, "median": 81.13113021850586, "p90": 383.8146392822266, "max": 651.9232177734375, "pos_frac": 0.765625, "sample": [26.667461395263672, 11.403678894042969, 3.4481124877929688, -18.562576293945312, 57.96269989013672, 229.298828125, 80.71183013916016, 26.172279357910156, 9.558155059814453, 179.21908569335938, -21.80254364013672, 97.73249816894531, -170.91668701171875, 25.841835021972656, 131.0620574951172, 49.93547821044922, 213.84747314453125, 131.40277099609375, 17.833688735961914, -37.9310302734375, 160.22410583496094, 411.308837890625, 241.87918090820312, 426.3084716796875, -18.324167251586914, 62.07652282714844, 651.9232177734375, -6.5193634033203125, -9.810272216796875, 152.16143798828125, 109.99234771728516, 72.19022369384766, 64.69479370117188, 118.06393432617188, -67.13899993896484, 81.55043029785156, 151.53732299804688, 60.291358947753906, 451.61309814453125, 136.17327880859375, -84.59915924072266, -20.109275817871094, 188.576904296875, 111.86117553710938, 488.7806091308594, -117.25033569335938, 388.9613037109375, -8.918996810913086, -69.68400573730469, 43.63579559326172, 33.9954833984375, 63.598724365234375, -118.66604614257812, 364.5517578125, 371.8057556152344, 156.38624572753906, 457.2028503417969, 290.4559326171875, 218.6490936279297, -217.923583984375, 180.57931518554688, 240.8332061767578, 135.95620727539062, 160.83071899414062], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000391.npy"}
{"epoch": 0.5741556534508077, "step": 392, "batch_size": 64, "mean": 186.098388671875, "std": 155.43162536621094, "min": -106.74681091308594, "p10": 12.534795379638679, "median": 168.29686737060547, "p90": 422.7372955322266, "max": 598.233642578125, "pos_frac": 0.921875, "sample": [87.86869049072266, 199.45599365234375, -57.91827392578125, 61.58272933959961, 196.11973571777344, -51.33234405517578, 343.0058288574219, 101.89546203613281, 264.2177429199219, 257.724609375, 347.5355224609375, -106.74681091308594, 125.55731964111328, 6.507535934448242, 85.78205871582031, 27.242643356323242, 480.7294921875, 467.6280517578125, 246.39974975585938, 146.42967224121094, 106.48975372314453, 212.56236267089844, 338.0536193847656, 153.26382446289062, 202.1525115966797, 228.5956573486328, 262.28076171875, 561.073974609375, 146.99942016601562, 194.4361572265625, 197.60142517089844, 9.49993896484375, 374.9273986816406, 106.62035369873047, 244.6365966796875, 150.76475524902344, 248.8350067138672, 197.9556427001953, 119.049560546875, 112.17626190185547, 198.3372344970703, 271.3059387207031, 24.089736938476562, -8.593376159667969, 419.8567810058594, 474.6051330566406, 165.8197479248047, 423.9718017578125, 295.97503662109375, -100.21543884277344, 62.52782440185547, 64.71049499511719, 93.0174560546875, 137.14483642578125, 116.52317810058594, 170.77398681640625, 277.16082763671875, 47.9006462097168, 598.233642578125, 58.63407897949219, 19.616127014160156, 45.33030700683594, 213.08779907226562, 442.8228759765625], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000392.npy"}
{"epoch": 0.5756240822320118, "step": 393, "batch_size": 64, "mean": 145.4260711669922, "std": 197.29656982421875, "min": -177.20945739746094, "p10": -71.49655075073242, "median": 125.90135955810547, "p90": 425.6488830566408, "max": 789.5107421875, "pos_frac": 0.75, "sample": [164.0766143798828, 443.94183349609375, 26.104854583740234, -104.75232696533203, 174.8573760986328, 326.64190673828125, -68.9549331665039, 291.79937744140625, 173.06979370117188, 178.02047729492188, 150.65277099609375, 117.46267700195312, 103.5632095336914, 333.0945129394531, 77.67295837402344, 38.804893493652344, 582.418212890625, 607.041259765625, 263.8842468261719, -2.268646240234375, 7.136289596557617, 119.21739196777344, -173.874267578125, -15.329757690429688, -1.4198150634765625, 42.21574783325195, -177.20945739746094, 158.08860778808594, 158.91485595703125, 150.4281005859375, 62.46006393432617, 294.06536865234375, 448.8670654296875, 259.4158935546875, 329.29656982421875, 205.0731964111328, 132.5853271484375, 80.55826568603516, 373.7740478515625, 26.033126831054688, -29.187301635742188, 177.0337371826172, 445.65887451171875, 12.903526306152344, -25.004878997802734, 382.96533203125, 277.7245788574219, -72.5858154296875, 41.79712677001953, 43.950584411621094, 57.793006896972656, 165.3720245361328, -66.79309844970703, -100.7166976928711, -5.737941741943359, 262.4162292480469, 211.73373413085938, -117.96936798095703, 147.73443603515625, 52.16204071044922, -175.2713623046875, 789.5107421875, 486.1534118652344, -11.802459716796875], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000393.npy"}
{"epoch": 0.5770925110132159, "step": 394, "batch_size": 64, "mean": 166.911865234375, "std": 187.26780700683594, "min": -193.5924530029297, "p10": -46.33468933105468, "median": 154.8108139038086, "p90": 426.55444641113286, "max": 702.103759765625, "pos_frac": 0.78125, "sample": [157.6648712158203, 178.80873107910156, -119.398193359375, 273.9529724121094, 224.2708740234375, -41.21435546875, 211.47605895996094, 96.682861328125, 410.23895263671875, 93.54974365234375, 442.5965881347656, 38.741310119628906, -159.78390502929688, 34.06785583496094, -47.50508117675781, -11.269798278808594, -162.88128662109375, 447.26239013671875, 339.0517272949219, -193.5924530029297, 399.72833251953125, -12.77813720703125, 247.95909118652344, 256.4416809082031, 147.8032684326172, 203.6265106201172, 255.6237335205078, 98.36894226074219, -153.03741455078125, 429.1968688964844, 343.63812255859375, 442.338623046875, -43.60377502441406, 263.9134826660156, 151.95675659179688, 38.02217102050781, 95.4089126586914, 77.27672576904297, 56.588623046875, 459.1391906738281, 42.49567413330078, 361.5076904296875, -3.192607879638672, 207.83604431152344, -14.657241821289062, -4.30169677734375, 408.1540832519531, 420.3887939453125, 136.6981964111328, 102.8621826171875, 316.1797180175781, 186.0314483642578, 702.103759765625, 460.50469970703125, 234.88717651367188, 66.47753143310547, 70.56550598144531, 24.956974029541016, 103.2269058227539, 357.8114318847656, 222.16561889648438, -71.86909484863281, 163.42376708984375, 217.77108764648438], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000394.npy"}
{"epoch": 0.57856093979442, "step": 395, "batch_size": 64, "mean": 182.48716735839844, "std": 206.36509704589844, "min": -229.46148681640625, "p10": -40.91683464050293, "median": 159.90724182128906, "p90": 466.31722717285174, "max": 721.8021240234375, "pos_frac": 0.796875, "sample": [381.6903991699219, 97.12060546875, 163.67498779296875, 48.85620880126953, 86.39042663574219, 315.943359375, 304.4112243652344, 176.91061401367188, -159.55230712890625, 253.57969665527344, 89.37162780761719, 258.55023193359375, 369.7339782714844, 37.78711700439453, 156.13949584960938, 71.63902282714844, 360.3674621582031, 313.9980163574219, 519.575439453125, -40.656158447265625, 379.41839599609375, -34.95800018310547, 400.2530517578125, 97.05656433105469, 30.426002502441406, 57.53387451171875, 189.65145874023438, 286.3861083984375, -19.730552673339844, -15.657722473144531, 569.6307373046875, 625.517822265625, 152.5940704345703, 116.94375610351562, 217.22048950195312, 0.447021484375, 116.1266098022461, 138.88453674316406, 483.1855163574219, 188.9461669921875, 313.0084228515625, 362.06646728515625, -229.46148681640625, -65.11640930175781, -143.81021118164062, 221.590087890625, -26.033554077148438, 590.2252197265625, -34.803062438964844, -66.61882019042969, 194.46633911132812, -211.86734008789062, 60.309471130371094, 721.8021240234375, 506.3160705566406, 267.9531555175781, 99.15178680419922, 274.770751953125, 229.0175323486328, 295.1989440917969, 426.9578857421875, 33.25414276123047, 116.42244720458984, -41.0285530090332], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000395.npy"}
{"epoch": 0.580029368575624, "step": 396, "batch_size": 64, "mean": 166.55291748046875, "std": 160.2821502685547, "min": -176.6760711669922, "p10": -25.740715789794912, "median": 157.17092895507812, "p90": 356.78516235351566, "max": 685.1844482421875, "pos_frac": 0.84375, "sample": [183.9122772216797, 334.6517333984375, -29.34503936767578, 146.89614868164062, 162.4442138671875, 390.671630859375, 83.1917953491211, 110.04132080078125, 69.30620574951172, 204.55435180664062, 297.2933349609375, 151.89764404296875, 133.362060546875, 266.93145751953125, -118.59011840820312, 163.64439392089844, 138.56332397460938, 148.7155303955078, 68.43797302246094, 165.6949462890625, 174.63699340820312, 237.777099609375, 605.69091796875, 268.75628662109375, 248.46307373046875, 345.7840881347656, 209.24122619628906, 183.6927490234375, 130.10284423828125, 425.316162109375, -32.33525466918945, 273.7189025878906, -125.92923736572266, 231.12039184570312, 49.769290924072266, 194.7164764404297, 163.24334716796875, -4.110572814941406, 337.515380859375, 393.07623291015625, 89.98946380615234, 19.752471923828125, -17.33062744140625, 361.4999084472656, 78.40408325195312, -11.848880767822266, 327.833251953125, 91.10697174072266, 145.63510131835938, 171.0819091796875, 480.1522216796875, 53.362342834472656, 87.96356201171875, -81.43829345703125, 126.12901306152344, 195.66334533691406, 224.0841064453125, 77.9881820678711, -48.009910583496094, -176.6760711669922, 139.42002868652344, 685.1844482421875, 164.5103759765625, 92.40878295898438], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000396.npy"}
{"epoch": 0.5814977973568282, "step": 397, "batch_size": 64, "mean": 189.552734375, "std": 169.38613891601562, "min": -195.04156494140625, "p10": -12.773446655273434, "median": 188.29950714111328, "p90": 398.52579956054694, "max": 607.8899536132812, "pos_frac": 0.84375, "sample": [14.85513687133789, 346.9112548828125, 21.01251220703125, 240.01206970214844, 77.53028106689453, 59.80192947387695, 121.70449829101562, 175.73373413085938, 312.95172119140625, 92.77143859863281, 315.6803283691406, 117.61082458496094, 300.88433837890625, 11.951263427734375, -9.21856689453125, 297.0118103027344, 127.87750244140625, -14.296966552734375, 318.157470703125, 320.051025390625, 212.30490112304688, 461.8433837890625, 215.5460662841797, 385.6830749511719, 607.592529296875, -35.26292419433594, 449.9333190917969, 386.60986328125, -33.900455474853516, 196.6046905517578, 346.5578308105469, 112.51951599121094, 58.707763671875, 464.308837890625, 278.83880615234375, 235.80856323242188, 113.13581848144531, 83.53632354736328, 127.61190032958984, 200.80848693847656, -195.04156494140625, -90.43123626708984, -6.312692642211914, -31.536643981933594, 257.16217041015625, 153.98963928222656, -84.76964569091797, 200.22067260742188, -3.6442909240722656, 326.5118408203125, 6.245037078857422, 403.63262939453125, 287.8945617675781, 607.8899536132812, 179.99432373046875, 434.4219055175781, 100.38056945800781, 92.45498657226562, 333.2738037109375, 122.16914367675781, 134.03924560546875, 343.053955078125, 220.35504150390625, 221.6392059326172], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000397.npy"}
{"epoch": 0.5829662261380323, "step": 398, "batch_size": 64, "mean": 177.1311798095703, "std": 186.32733154296875, "min": -192.14447021484375, "p10": -7.651324462890621, "median": 152.59844970703125, "p90": 409.20597534179694, "max": 727.0708618164062, "pos_frac": 0.875, "sample": [89.40762329101562, 17.114791870117188, 318.8899230957031, 379.1510009765625, -9.313766479492188, 119.21399688720703, 189.90809631347656, -105.40410614013672, 79.30064392089844, 95.92163848876953, 337.54461669921875, -185.70408630371094, 397.7933349609375, 390.2245178222656, 288.15313720703125, 727.0708618164062, 155.5445098876953, 130.02801513671875, 462.4567565917969, 259.9290771484375, 250.36151123046875, 99.47479248046875, 510.3934326171875, 267.0633850097656, 512.9957275390625, 18.555755615234375, 104.52191162109375, 70.29042053222656, 262.93817138671875, 170.33493041992188, 113.48165893554688, 194.5157470703125, -88.6317138671875, -30.397018432617188, 114.13601684570312, 157.6283416748047, 52.793724060058594, 244.93634033203125, 70.19605255126953, 169.14341735839844, 149.6523895263672, 355.5132751464844, 170.26718139648438, 60.986610412597656, 90.48837280273438, 41.45714569091797, 178.97457885742188, 21.198333740234375, 20.429641723632812, 190.22100830078125, 21.29877281188965, 566.5504150390625, 414.09710693359375, 577.6707153320312, -81.41162872314453, 370.7677307128906, 325.38720703125, 338.2799987792969, 191.50048828125, 36.914215087890625, -3.7722930908203125, 45.250579833984375, 44.854454040527344, -192.14447021484375], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000398.npy"}
{"epoch": 0.5844346549192364, "step": 399, "batch_size": 64, "mean": 177.97613525390625, "std": 185.05747985839844, "min": -183.0909881591797, "p10": -58.9971366882324, "median": 167.3278045654297, "p90": 429.8758483886719, "max": 631.1529541015625, "pos_frac": 0.765625, "sample": [211.18099975585938, 120.56785583496094, 59.37339782714844, 27.8636474609375, 325.52215576171875, -12.802606582641602, 423.3448791503906, 139.88169860839844, 195.96429443359375, 107.7906494140625, 162.14724731445312, -26.750579833984375, 451.32672119140625, -28.34674072265625, 314.8763732910156, 100.49708557128906, 270.97100830078125, -117.3236312866211, 0.7163028717041016, 89.78744506835938, 251.7166748046875, -8.342849731445312, 323.42437744140625, 161.70985412597656, 379.1513671875, 481.9033508300781, 57.23320770263672, 285.91912841796875, -183.0909881591797, 150.73727416992188, 89.82054138183594, 326.1520690917969, 252.96572875976562, 168.87425231933594, 312.5323181152344, -137.69961547851562, 631.1529541015625, 321.8684997558594, 432.6748352050781, -16.241165161132812, 323.6663513183594, 337.2211608886719, 118.12484741210938, -173.285400390625, -8.65011215209961, 323.3226318359375, -6.283597946166992, 444.95623779296875, 160.02590942382812, -74.44216918945312, 310.1200256347656, 548.15380859375, 235.89309692382812, 165.78135681152344, 230.36888122558594, 267.2774658203125, 96.62120056152344, 268.683349609375, -118.23992919921875, 299.472412109375, 462.84149169921875, -36.37861633300781, 184.86337280273438, -68.69078826904297], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000399.npy"}
{"epoch": 0.5859030837004405, "step": 400, "batch_size": 64, "mean": 243.1272735595703, "std": 204.13461303710938, "min": -269.61376953125, "p10": -6.901844215393065, "median": 236.67406463623047, "p90": 482.04123840332034, "max": 768.775634765625, "pos_frac": 0.859375, "sample": [195.88482666015625, 288.7572021484375, 276.64324951171875, 181.03945922851562, 486.7598571777344, 39.312103271484375, 343.3245544433594, 264.7123718261719, -25.925376892089844, 333.2077941894531, 204.0988311767578, 143.61862182617188, 289.4494934082031, 240.88905334472656, -269.61376953125, 392.3138427734375, 547.4590454101562, 69.08195495605469, 89.60087585449219, 328.67626953125, 581.4564208984375, 388.21820068359375, 133.21755981445312, 408.20904541015625, -64.3583755493164, 212.7999725341797, 74.12345886230469, 432.4209289550781, 176.62245178222656, 326.3528747558594, 240.5502166748047, 89.5902099609375, -46.15015411376953, 140.84263610839844, 91.58377838134766, -72.07481384277344, 45.23876190185547, 506.97332763671875, 299.05810546875, 354.5923767089844, 4.3787689208984375, 139.85763549804688, 222.31105041503906, 751.466064453125, -5.169157028198242, 456.5028076171875, 203.04742431640625, 213.93109130859375, -32.52192306518555, 768.775634765625, 470.6033935546875, 0.868072509765625, 253.30426025390625, 392.64508056640625, 232.79791259765625, 471.0311279296875, -7.6444244384765625, 537.0369873046875, 446.24676513671875, 267.644287109375, 170.17840576171875, 424.8431091308594, 442.0275573730469, -2.5743408203125], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000400.npy"}
{"epoch": 0.5873715124816447, "step": 401, "batch_size": 64, "mean": 143.82565307617188, "std": 223.73484802246094, "min": -347.1209716796875, "p10": -125.84073257446288, "median": 120.49708938598633, "p90": 538.5240173339844, "max": 623.2547607421875, "pos_frac": 0.734375, "sample": [55.30533981323242, 337.021728515625, -30.942028045654297, 213.67471313476562, 265.26483154296875, -12.572549819946289, -132.3511505126953, -271.5716247558594, -347.1209716796875, -90.23466491699219, 77.23107147216797, 77.47308349609375, -110.6497573852539, 85.25835418701172, -0.146270751953125, -1.8544063568115234, 364.65777587890625, 131.53335571289062, 108.38632202148438, 91.70317077636719, -70.9713134765625, -137.2673797607422, 63.68596649169922, 565.930419921875, 33.02033233642578, 391.6424255371094, 544.0454711914062, 221.85333251953125, 561.5496826171875, 152.3397979736328, 197.4188232421875, 164.0299072265625, 31.895633697509766, 128.79818725585938, -90.31803131103516, -255.48886108398438, 580.7801513671875, 20.704334259033203, 247.35052490234375, 260.0954284667969, 141.98545837402344, 405.6121520996094, 198.91064453125, 24.91489028930664, 238.53175354003906, 112.19599151611328, 158.15089416503906, 4.5105133056640625, 525.640625, -138.96849060058594, 410.7906799316406, 316.02557373046875, 105.08692932128906, 67.8679428100586, 555.6185302734375, 623.2547607421875, 193.87286376953125, 139.06796264648438, 199.3976593017578, -46.783695220947266, 185.5816650390625, 551.5633544921875, -165.94876098632812, -23.199445724487305], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000401.npy"}
{"epoch": 0.5888399412628488, "step": 402, "batch_size": 64, "mean": 251.9342041015625, "std": 183.1197052001953, "min": -81.62411499023438, "p10": 31.100657844543466, "median": 217.15894317626953, "p90": 494.72964172363294, "max": 782.1368408203125, "pos_frac": 0.9375, "sample": [259.2418212890625, 27.69997215270996, 71.08138275146484, 357.00885009765625, 179.31295776367188, 330.0080261230469, 13.474197387695312, 143.71939086914062, 39.03559112548828, 359.859375, 343.004638671875, 466.5613708496094, 506.8017578125, 139.1393585205078, -60.26123046875, 438.5660095214844, 13.058517456054688, 209.82923889160156, 245.465576171875, 375.9012756347656, 426.2812805175781, 199.58529663085938, 413.0691833496094, -51.905250549316406, 170.501953125, 366.5340881347656, 57.854522705078125, 636.9397583007812, 93.19566345214844, 562.5521240234375, 173.10415649414062, 189.8208770751953, 143.8672332763672, 108.55096435546875, 128.40753173828125, 209.557861328125, 639.1690673828125, 75.61212158203125, 192.46224975585938, 317.266845703125, 92.83024597167969, -56.86765670776367, 523.8475341796875, 543.7293701171875, 224.4886474609375, 297.4079284667969, 372.9750671386719, 209.49920654296875, 241.87753295898438, 782.1368408203125, 347.03076171875, 408.727294921875, 133.95977783203125, 466.1622314453125, 91.81405639648438, 329.63946533203125, 305.99688720703125, 166.93829345703125, 191.296142578125, -81.62411499023438, 344.1850280761719, 268.1009216308594, 150.47128295898438, 258.2296447753906], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000402.npy"}
{"epoch": 0.5903083700440529, "step": 403, "batch_size": 64, "mean": 189.0760498046875, "std": 180.69215393066406, "min": -215.1708526611328, "p10": -25.73819599151611, "median": 186.26304626464844, "p90": 440.9688873291016, "max": 590.94287109375, "pos_frac": 0.828125, "sample": [130.17486572265625, 168.72384643554688, 65.36731719970703, 101.58373260498047, 296.7882385253906, -13.753093719482422, 210.75704956054688, 362.1748046875, -23.9490909576416, -129.3867645263672, 118.82485961914062, 70.00448608398438, -215.1708526611328, 444.9843444824219, 11.174854278564453, 497.1966857910156, 178.75315856933594, 201.3135986328125, 389.68865966796875, 245.0440216064453, 296.2701416015625, 134.12799072265625, 465.2325134277344, 15.540367126464844, -38.18212127685547, 517.0969848632812, 331.234619140625, 268.7706298828125, 183.58607482910156, 590.94287109375, -26.504955291748047, 209.98818969726562, 379.2537841796875, 168.29708862304688, 32.43425750732422, 289.9295959472656, -84.76873016357422, 295.72314453125, 391.34759521484375, 431.5994873046875, -105.24394226074219, 152.09054565429688, 472.73651123046875, 25.945119857788086, 336.2909240722656, 73.70576477050781, 410.9650573730469, 344.340087890625, 91.9643325805664, 536.6204833984375, 276.9481506347656, 191.18624877929688, 214.52239990234375, 118.39891052246094, 21.849607467651367, -11.410263061523438, -51.173805236816406, 227.53085327148438, 62.24049377441406, 189.02394104003906, 267.2353515625, 114.67304229736328, -10.72698974609375, 188.9400177001953], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000403.npy"}
{"epoch": 0.591776798825257, "step": 404, "batch_size": 64, "mean": 196.38226318359375, "std": 210.74522399902344, "min": -247.24114990234375, "p10": -96.39292907714842, "median": 202.14349365234375, "p90": 457.3342864990235, "max": 686.2058715820312, "pos_frac": 0.8125, "sample": [-102.11424255371094, 188.58010864257812, 15.168970108032227, 334.4612121582031, 422.832763671875, 112.81942749023438, 165.7935333251953, 558.6236572265625, 262.521484375, 545.7545776367188, 606.808837890625, 167.97442626953125, 271.34869384765625, 410.1937255859375, -13.521125793457031, -144.39035034179688, 425.0076904296875, 172.40716552734375, -200.88743591308594, 37.001182556152344, 73.31588745117188, 115.29389190673828, 118.31752014160156, 366.89508056640625, 351.03497314453125, 280.6269226074219, -246.96441650390625, 307.7143859863281, 277.428466796875, 233.14495849609375, 352.2456359863281, -108.71660614013672, 365.7454833984375, 168.44491577148438, 328.9249572753906, 109.81987762451172, -68.54209899902344, 271.54302978515625, 257.7803649902344, 205.51409912109375, -247.24114990234375, 260.506103515625, 150.64434814453125, 493.3687744140625, 118.42379760742188, -83.04319763183594, -64.65390014648438, 222.43106079101562, 379.77178955078125, -151.93751525878906, 163.26004028320312, 170.44375610351562, 220.09939575195312, 201.82550048828125, 445.505615234375, 539.5820922851562, 107.89474487304688, 53.02656555175781, -25.613794326782227, 3.7638988494873047, 686.2058715820312, 202.46148681640625, 462.4037170410156, 263.384033203125], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000404.npy"}
{"epoch": 0.593245227606461, "step": 405, "batch_size": 64, "mean": 160.87057495117188, "std": 205.2707977294922, "min": -203.97854614257812, "p10": -75.91244354248046, "median": 148.51178741455078, "p90": 450.7653717041016, "max": 671.3587646484375, "pos_frac": 0.71875, "sample": [391.94708251953125, 448.3091125488281, 161.19515991210938, -126.88429260253906, -55.68696594238281, -0.803436279296875, -201.9401092529297, 177.48361206054688, 363.0228576660156, -199.207275390625, 48.03539276123047, 544.872314453125, 374.86065673828125, 65.71112060546875, -0.03618621826171875, 591.0847778320312, -23.610383987426758, 6.353538513183594, 200.11117553710938, -73.94650268554688, 115.5322036743164, 164.87342834472656, 135.8284149169922, 389.44915771484375, 91.22854614257812, 167.3822479248047, 101.19645690917969, 246.69223022460938, -10.64581298828125, -96.6446762084961, 256.1188659667969, -8.719247817993164, 211.6682586669922, 96.64856719970703, 309.282958984375, 48.279075622558594, 190.31240844726562, -95.93492126464844, 133.65509033203125, -22.68465805053711, 348.4134521484375, 190.78707885742188, 471.2362365722656, 293.04119873046875, -39.90848159790039, 162.29327392578125, 343.8663635253906, -21.607269287109375, 84.20761108398438, 183.0665283203125, 218.1781768798828, -76.75498962402344, 97.00865173339844, 540.04248046875, 62.528480529785156, -203.97854614257812, 208.4998779296875, 284.3786926269531, 126.21347045898438, 671.3587646484375, 586.0825805664062, -56.24627685546875, 256.80029296875, 451.81805419921875], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000405.npy"}
{"epoch": 0.5947136563876652, "step": 406, "batch_size": 64, "mean": 187.85006713867188, "std": 208.31898498535156, "min": -206.3266143798828, "p10": -23.528099441528315, "median": 160.19821166992188, "p90": 468.1025665283203, "max": 739.8875732421875, "pos_frac": 0.828125, "sample": [199.9384765625, 96.47073364257812, 148.74826049804688, 115.92042541503906, 189.9056396484375, 171.64816284179688, 73.70034790039062, -177.02667236328125, 612.7750854492188, 143.46670532226562, 338.2065734863281, 134.21803283691406, 721.2665405273438, 413.0592041015625, 221.8290252685547, 32.7989616394043, 243.44970703125, 259.2367248535156, 306.62078857421875, -71.68128204345703, 470.4447326660156, 316.23944091796875, 236.49737548828125, -206.3266143798828, 64.89066314697266, 90.12785339355469, 9.073577880859375, -6.973117828369141, 244.18569946289062, -2.886627197265625, 24.243026733398438, 196.7870330810547, 215.64822387695312, -182.9095916748047, -25.143367767333984, 345.87567138671875, 559.126220703125, 87.71912384033203, 114.44881439208984, -12.57172966003418, 146.24362182617188, 247.84860229492188, 360.3000183105469, -19.759140014648438, 4.620208740234375, 5.070960998535156, 548.08642578125, 294.9747009277344, 20.330596923828125, 120.1168212890625, 435.7646179199219, 462.63751220703125, 290.8019104003906, 34.28019714355469, -157.1803436279297, -39.31586837768555, 503.7923889160156, 273.0982666015625, 739.8875732421875, 308.98236083984375, 301.3486328125, 278.9697265625, 129.96910095214844, 18.487709045410156], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000406.npy"}
{"epoch": 0.5961820851688693, "step": 407, "batch_size": 64, "mean": 150.65615844726562, "std": 197.66969299316406, "min": -440.1927490234375, "p10": -61.503392028808584, "median": 142.39136505126953, "p90": 410.95054321289064, "max": 598.3394165039062, "pos_frac": 0.828125, "sample": [-154.6392364501953, 414.4421081542969, 238.83447265625, -157.2403106689453, -195.73338317871094, 55.741920471191406, 298.4456787109375, -155.77383422851562, -51.75653076171875, 67.04820251464844, 404.30810546875, 200.7268829345703, 189.6107940673828, 598.3394165039062, 355.58551025390625, 25.824800491333008, 38.53165054321289, 362.576416015625, 128.4375, 34.28782653808594, 197.06900024414062, 0.17572021484375, 132.38685607910156, 163.10440063476562, 209.1692657470703, 325.053466796875, 275.3207092285156, 81.47433471679688, 89.50709533691406, 162.9488983154297, -39.324066162109375, 147.59768676757812, 25.893592834472656, 22.20397186279297, 15.363212585449219, 154.18092346191406, 301.72802734375, 593.3575439453125, 133.69017028808594, -440.1927490234375, 262.84332275390625, 263.3427429199219, 144.80955505371094, 42.482521057128906, 413.79730224609375, 100.04591369628906, 85.7523193359375, 388.65570068359375, 309.8197937011719, 163.97625732421875, 501.5498046875, -32.49309539794922, 4.434192657470703, 492.5044250488281, -50.00080108642578, 139.97317504882812, 106.35279846191406, 372.9234313964844, 28.499557495117188, -134.32261657714844, 420.6549377441406, 228.5448760986328, -65.68061828613281, 205.22344970703125], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000407.npy"}
{"epoch": 0.5976505139500734, "step": 408, "batch_size": 64, "mean": 148.69833374023438, "std": 172.78497314453125, "min": -269.1595458984375, "p10": -50.36142196655273, "median": 126.03681945800781, "p90": 364.76730041503913, "max": 627.5027465820312, "pos_frac": 0.859375, "sample": [192.17982482910156, 370.398681640625, 627.5027465820312, 111.00299072265625, 262.0009765625, 250.08847045898438, -130.36798095703125, 69.13761138916016, 62.81756591796875, -16.667526245117188, 5.94688606262207, -107.38490295410156, 1.3664016723632812, 47.261962890625, 0.15018463134765625, -125.53138732910156, 200.17027282714844, 261.91314697265625, 73.65225982666016, 103.60374450683594, 75.53648376464844, 333.9973449707031, 62.27813720703125, 341.1694641113281, -144.19224548339844, 197.3955078125, -50.58240509033203, 110.70682525634766, 62.410125732421875, 388.216552734375, 87.67607879638672, 157.90243530273438, 351.6274108886719, 13.927398681640625, -269.1595458984375, -49.845794677734375, 247.50454711914062, 270.0892333984375, 1.8444061279296875, 171.42495727539062, 239.66268920898438, 227.11410522460938, 194.34629821777344, 247.240966796875, 249.10255432128906, -160.455322265625, 434.7149353027344, 18.504384994506836, 105.30415344238281, 38.736907958984375, 106.90399169921875, 198.54214477539062, 76.6966323852539, 66.80499267578125, 175.95870971679688, 297.9671936035156, 101.2723159790039, 409.3567810058594, 182.0283660888672, 141.07064819335938, 533.124267578125, 480.65045166015625, 319.7308349609375, 213.14547729492188], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000408.npy"}
{"epoch": 0.5991189427312775, "step": 409, "batch_size": 64, "mean": 157.61297607421875, "std": 178.486572265625, "min": -168.60165405273438, "p10": -23.422235870361323, "median": 144.09752655029297, "p90": 395.82060546875, "max": 625.48486328125, "pos_frac": 0.828125, "sample": [-7.668792724609375, -168.60165405273438, 205.60910034179688, 183.28912353515625, 236.70899963378906, 169.35357666015625, 27.63576316833496, 94.44403076171875, 100.01516723632812, 134.11021423339844, 393.0604248046875, 367.0510559082031, -46.957191467285156, -4.244293212890625, 347.4194030761719, 150.09188842773438, 156.73260498046875, 253.99729919433594, 193.55227661132812, 625.48486328125, 173.33261108398438, 5.530738830566406, 208.526611328125, 73.85960388183594, 397.0035400390625, -124.5197982788086, -18.657325744628906, -14.84991455078125, 47.290985107421875, 139.32118225097656, 154.0275115966797, 545.2330932617188, 280.730712890625, -48.688087463378906, 70.51599884033203, 496.9749450683594, -105.5577621459961, 70.72251892089844, -25.464340209960938, 1.8578872680664062, 326.1283874511719, 30.377357482910156, 219.92735290527344, 33.6311149597168, 200.7698974609375, 83.6693115234375, 61.426902770996094, 89.56897735595703, 555.3901977539062, 376.5397033691406, 2.942373275756836, 106.54530334472656, -115.77455139160156, 170.63768005371094, 273.5477294921875, 455.42529296875, 264.00238037109375, 160.01171875, 207.19076538085938, 29.451370239257812, 577.735595703125, 83.31440734863281, 7.6220703125, 148.87387084960938], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000409.npy"}
{"epoch": 0.6005873715124816, "step": 410, "batch_size": 64, "mean": 161.7826690673828, "std": 179.25839233398438, "min": -290.5767822265625, "p10": -38.173379516601564, "median": 187.0708236694336, "p90": 403.9064666748047, "max": 613.645751953125, "pos_frac": 0.84375, "sample": [263.6351013183594, 135.1129608154297, -38.303558349609375, 70.9462661743164, 316.245849609375, 130.23098754882812, 294.6523132324219, 414.7039489746094, 226.8349151611328, 11.701566696166992, 116.72746276855469, 125.28990936279297, 13.24462890625, 228.15231323242188, -60.401893615722656, -175.81011962890625, 479.9336853027344, -259.57952880859375, -14.78696060180664, 192.23362731933594, 316.5906066894531, 405.2500305175781, 8.324111938476562, 115.31361389160156, 389.13330078125, 400.771484375, 228.80499267578125, -2.5806312561035156, 214.0503692626953, 613.645751953125, 245.9434814453125, 24.35879135131836, 267.20086669921875, 11.71441650390625, 244.9162139892578, 59.966400146484375, -55.40961456298828, 21.938547134399414, 50.97210693359375, -37.86962890625, 243.94569396972656, 190.46810913085938, 104.13924407958984, 318.32501220703125, 124.65000915527344, -290.5767822265625, 74.71652221679688, 183.6735382080078, 413.03076171875, 220.05882263183594, 80.09698486328125, 86.30365753173828, 245.0172882080078, 278.2975769042969, 203.7389678955078, 502.8583984375, 191.6658477783203, 320.6995849609375, -180.24317932128906, 37.04576110839844, 214.62823486328125, 124.3773422241211, 428.7860412597656, 244.5884246826172], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000410.npy"}
{"epoch": 0.6020558002936858, "step": 411, "batch_size": 64, "mean": 177.94451904296875, "std": 199.2484130859375, "min": -192.89089965820312, "p10": -30.887628936767562, "median": 133.1286849975586, "p90": 438.9575012207032, "max": 914.8614501953125, "pos_frac": 0.78125, "sample": [215.6698455810547, 38.81908416748047, 86.73310089111328, 59.488975524902344, 336.915771484375, -3.8337783813476562, 118.22702026367188, 195.188232421875, 305.30572509765625, 267.2446594238281, 207.1505126953125, 2.680196762084961, 173.47467041015625, -39.348873138427734, -2.700326919555664, -2.205598831176758, 379.1993103027344, 348.7887878417969, 88.87680053710938, 264.3119812011719, 551.25830078125, 278.0005798339844, -37.164459228515625, -13.010141372680664, 112.5796890258789, 73.09263610839844, 39.08729553222656, 125.15898132324219, 181.0574951171875, 633.3937377929688, 63.887542724609375, 217.420166015625, 446.38751220703125, -16.24169158935547, 342.6075439453125, 56.68341064453125, 187.17018127441406, 327.03045654296875, 453.537109375, 85.75849151611328, 87.9326400756836, 324.0973205566406, 235.4542236328125, 247.49725341796875, 113.41043853759766, 264.47772216796875, 68.00810241699219, -175.8875732421875, 914.8614501953125, 325.2613220214844, 69.63240051269531, 268.22283935546875, 423.7239990234375, 381.4591979980469, 445.48614501953125, -192.89089965820312, 40.62688446044922, -41.29759216308594, 141.098388671875, -0.9911518096923828, 467.3703918457031, -62.66136932373047, -15.671920776367188, -88.45268249511719], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000411.npy"}
{"epoch": 0.6035242290748899, "step": 412, "batch_size": 64, "mean": 162.5821990966797, "std": 181.7827606201172, "min": -272.04656982421875, "p10": -41.06028251647949, "median": 151.02254486083984, "p90": 385.94630432128906, "max": 610.1728515625, "pos_frac": 0.796875, "sample": [-115.63423919677734, 215.3199920654297, 256.43353271484375, 201.4361114501953, 333.237548828125, -4.295860290527344, 452.63653564453125, 6.427299499511719, -27.739013671875, 233.72714233398438, 57.923675537109375, 101.90274047851562, 68.76095581054688, 279.6203308105469, -43.16401672363281, 308.22821044921875, 323.03631591796875, 73.3023452758789, -272.04656982421875, 174.25234985351562, -4.489191055297852, 74.43375396728516, 309.1842346191406, 129.75685119628906, 28.278392791748047, 195.3168487548828, 54.072784423828125, 253.04998779296875, 3.0896987915039062, 129.12997436523438, -51.498985290527344, 60.152259826660156, 180.05087280273438, 148.03707885742188, 154.0080108642578, 37.53364562988281, -188.8390655517578, -36.15156936645508, 159.482177734375, 513.3680419921875, 139.43731689453125, -85.12889099121094, 398.4342041015625, 364.8251953125, 388.58984375, 37.01194381713867, 367.70404052734375, 501.091064453125, 99.31182861328125, 286.5015563964844, 335.120849609375, -34.609588623046875, 69.09318542480469, 297.1564636230469, 370.713623046875, 610.1728515625, -29.636051177978516, 177.62939453125, 315.45306396484375, -141.81265258789062, 406.84478759765625, 120.55877685546875, 259.6889343261719, 379.7780456542969], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000412.npy"}
{"epoch": 0.604992657856094, "step": 413, "batch_size": 64, "mean": 156.8417510986328, "std": 161.25604248046875, "min": -281.01422119140625, "p10": -12.125163269042964, "median": 168.60279846191406, "p90": 331.71141967773445, "max": 572.5004272460938, "pos_frac": 0.859375, "sample": [39.4049072265625, 51.15510940551758, 47.06451416015625, 193.12094116210938, 178.83389282226562, -13.844619750976562, 167.82931518554688, 173.2449951171875, 62.521156311035156, 118.89093780517578, 308.2735900878906, 498.8365783691406, 20.67803192138672, 312.57354736328125, 16.722389221191406, -38.04927062988281, 223.056396484375, 97.21917724609375, -0.7692718505859375, 410.5445556640625, 59.99229431152344, 416.01153564453125, 169.37628173828125, -281.01422119140625, 27.21356201171875, 257.77630615234375, 227.22267150878906, 117.97313690185547, 303.631591796875, 572.5004272460938, 108.34574890136719, 160.50074768066406, 376.1626281738281, 301.33673095703125, -44.384124755859375, 311.544677734375, 264.2729187011719, 23.06402587890625, 245.5562744140625, 314.0045471191406, 32.62577819824219, 62.29841613769531, 424.3662414550781, -8.11309814453125, 202.06979370117188, 339.3000793457031, 223.32412719726562, -106.56607055664062, -116.04804992675781, -179.25115966796875, 183.1905975341797, 248.9224853515625, 228.9217529296875, 279.65802001953125, 20.780406951904297, 289.18060302734375, 85.75997924804688, 309.267578125, 127.42633056640625, 59.024169921875, 200.49310302734375, 92.59156799316406, 223.7751922607422, 16.48040771484375], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000413.npy"}
{"epoch": 0.6064610866372981, "step": 414, "batch_size": 64, "mean": 195.24468994140625, "std": 169.85525512695312, "min": -202.05673217773438, "p10": -7.233338928222656, "median": 180.31841278076172, "p90": 418.26853332519534, "max": 588.8699951171875, "pos_frac": 0.859375, "sample": [345.0049743652344, 285.51953125, 261.432373046875, 34.14711380004883, 109.35079956054688, 413.81536865234375, 188.02352905273438, -202.05673217773438, 61.16166687011719, 258.53656005859375, 259.8803405761719, 278.94287109375, 344.173828125, 561.6205444335938, 84.92760467529297, -112.69410705566406, 178.75059509277344, 119.95657348632812, -16.613576889038086, 35.261512756347656, 180.49026489257812, 204.11099243164062, -63.5745735168457, 161.32907104492188, 347.917236328125, 139.5095672607422, 147.21047973632812, -10.60075569152832, 249.42933654785156, 115.14942932128906, 350.6029357910156, 280.38604736328125, 36.31325149536133, 168.28761291503906, 562.6583251953125, 58.88283920288086, 232.1045684814453, -6.5498199462890625, -5.9727325439453125, 315.8297119140625, 180.1465606689453, 145.5579833984375, 430.2529602050781, -87.81942749023438, 25.768465042114258, 420.1770324707031, 31.95361328125, 477.9694519042969, 42.544898986816406, 104.09242248535156, 283.491943359375, 252.0492401123047, 152.9251708984375, 205.4744110107422, 164.29898071289062, 290.3555908203125, 468.5126037597656, -7.526275634765625, 325.2062072753906, 393.77593994140625, 588.8699951171875, 267.53082275390625, 67.64877319335938, 319.7474365234375], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000414.npy"}
{"epoch": 0.6079295154185022, "step": 415, "batch_size": 64, "mean": 149.7544403076172, "std": 164.05702209472656, "min": -230.6063690185547, "p10": -49.23192825317381, "median": 146.65408325195312, "p90": 374.71867065429694, "max": 542.56005859375, "pos_frac": 0.84375, "sample": [200.19931030273438, 94.52685546875, 244.43365478515625, 539.3465576171875, -56.34056091308594, 464.8050231933594, 398.98699951171875, 187.15231323242188, -59.783653259277344, 197.7426300048828, 239.08230590820312, 159.39666748046875, 285.4359130859375, 297.68011474609375, 149.02841186523438, 8.329242706298828, 202.23736572265625, 73.95407104492188, 194.4674835205078, 51.48460006713867, 6.440879821777344, 144.27975463867188, 14.77337646484375, 308.0635070800781, 35.369361877441406, 76.1972427368164, 352.4996032714844, -11.797584533691406, 207.55032348632812, 201.62554931640625, 49.5306396484375, 214.81246948242188, 142.95980834960938, 92.31698608398438, 542.56005859375, -230.6063690185547, -119.36384582519531, 180.3747100830078, 247.34893798828125, 391.8619689941406, 62.73357009887695, 261.6326904296875, 84.91412353515625, 90.2091064453125, 62.99782943725586, 113.12806701660156, -32.645118713378906, 125.34718322753906, 421.8803405761719, -92.3055191040039, 84.434326171875, 356.0789489746094, 265.6045837402344, 37.13868713378906, 170.96258544921875, -27.575803756713867, 131.72857666015625, 20.560699462890625, 264.2568664550781, -125.1842269897461, 382.7071228027344, 156.23789978027344, -202.19821166992188, 252.67764282226562], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000415.npy"}
{"epoch": 0.6093979441997063, "step": 416, "batch_size": 64, "mean": 203.90164184570312, "std": 162.5150604248047, "min": -92.14492797851562, "p10": -35.460186004638665, "median": 210.4156951904297, "p90": 386.8416931152344, "max": 660.339111328125, "pos_frac": 0.859375, "sample": [289.3641357421875, 190.42193603515625, 81.5284423828125, 280.56561279296875, -87.7785873413086, 473.6383056640625, 169.5936279296875, 148.27334594726562, 277.65948486328125, 451.56182861328125, -38.172767639160156, 113.877685546875, 78.14225006103516, 284.39630126953125, 221.70022583007812, 341.20294189453125, 176.85098266601562, 27.744224548339844, 345.62237548828125, 110.62895202636719, 285.60797119140625, 223.19317626953125, 123.49333190917969, 180.91995239257812, 371.5992431640625, 140.02554321289062, 205.598876953125, 379.79473876953125, 7.877777099609375, 412.1253662109375, 189.47901916503906, 325.97979736328125, 660.339111328125, -78.22553253173828, -92.14492797851562, 66.3257064819336, 346.93304443359375, 311.17034912109375, -87.01605987548828, 453.40338134765625, 145.077880859375, 360.603759765625, 153.41696166992188, 245.00929260253906, 261.60443115234375, 227.41246032714844, 241.40736389160156, 389.86181640625, 138.6587677001953, 170.25177001953125, -78.61494445800781, 198.8618927001953, -47.74957275390625, 377.20263671875, 215.23251342773438, -29.130828857421875, 266.121826171875, -15.796802520751953, 270.7036437988281, 25.370193481445312, 100.17388153076172, 237.04733276367188, 314.90350341796875, 518.7745971679688], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000416.npy"}
{"epoch": 0.6108663729809104, "step": 417, "batch_size": 64, "mean": 191.4312744140625, "std": 147.13314819335938, "min": -151.4388427734375, "p10": 8.43077297210694, "median": 188.3863296508789, "p90": 366.28894348144536, "max": 529.9442749023438, "pos_frac": 0.921875, "sample": [177.68955993652344, 86.19551086425781, 5.621530532836914, 104.56632995605469, 1.6516380310058594, 359.22735595703125, 14.985671997070312, 42.5404052734375, -20.060571670532227, 523.7091674804688, -40.67544174194336, 347.4134216308594, 254.42343139648438, 124.86961364746094, 109.92060852050781, 80.57951354980469, -34.29862976074219, 401.981689453125, 475.88873291015625, 17.521453857421875, 465.9338073730469, 147.0, 349.1798095703125, 110.3707275390625, 369.3153381347656, 529.9442749023438, 335.9398193359375, 109.15579986572266, 267.4302673339844, 292.8767395019531, 130.79364013671875, 192.6630096435547, 21.853479385375977, 123.07584381103516, 227.05706787109375, 336.34832763671875, 179.0137939453125, 121.6658706665039, 357.0740051269531, 197.77027893066406, 40.22709274291992, 234.46067810058594, 235.62074279785156, 260.5216369628906, -151.4388427734375, 273.7823486328125, 279.11175537109375, 79.4655990600586, 132.0342559814453, 253.0283660888672, 307.5784606933594, -20.439918518066406, 259.6941223144531, 148.12985229492188, 194.2489013671875, 27.500755310058594, 184.10964965820312, 249.21286010742188, 406.8934020996094, 240.59063720703125, 250.86598205566406, 259.5185546875, 160.23646545410156, 48.436180114746094], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000417.npy"}
{"epoch": 0.6123348017621145, "step": 418, "batch_size": 64, "mean": 179.6948699951172, "std": 196.7645263671875, "min": -196.661865234375, "p10": -37.5784194946289, "median": 167.6519317626953, "p90": 424.34837036132814, "max": 766.2838134765625, "pos_frac": 0.8125, "sample": [164.46511840820312, -75.35586547851562, 265.80255126953125, -22.444995880126953, 138.6958465576172, 605.136962890625, 27.37378692626953, 57.755924224853516, 418.84326171875, 342.5509948730469, 70.96722412109375, -117.49502563476562, 256.8013916015625, 170.8387451171875, 526.6511840820312, 244.1416473388672, 367.76947021484375, -35.71595764160156, 156.06500244140625, 71.134765625, -38.376617431640625, 346.0787658691406, -14.31719970703125, 206.22821044921875, 181.6033477783203, 100.26033020019531, 159.84356689453125, 355.4273986816406, 256.57666015625, 146.98138427734375, 323.1981506347656, 36.98158264160156, 10.399177551269531, 426.70770263671875, -196.661865234375, 16.37738800048828, -97.2215347290039, 192.869384765625, 200.78623962402344, 318.6746520996094, 766.2838134765625, -24.78533172607422, 113.07073974609375, 111.51571655273438, 200.09426879882812, 249.44923400878906, 124.19579315185547, 232.75753784179688, -181.34474182128906, -162.4578094482422, 244.7072296142578, 183.2133331298828, 368.8196105957031, -13.524124145507812, 515.4799194335938, 156.34420776367188, 247.2715301513672, 559.97998046875, 100.05545806884766, 18.257946014404297, 239.4796142578125, 555.63671875, 235.91348266601562, 93.65899658203125], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000418.npy"}
{"epoch": 0.6138032305433186, "step": 419, "batch_size": 64, "mean": 190.42242431640625, "std": 175.9047088623047, "min": -199.73687744140625, "p10": 3.7866470336914078, "median": 177.85823822021484, "p90": 407.8551696777344, "max": 598.4485473632812, "pos_frac": 0.90625, "sample": [508.1480712890625, -199.73687744140625, 155.51327514648438, 317.0865478515625, 195.7703094482422, 246.8140106201172, -75.98037719726562, -25.385059356689453, 5.130645751953125, 406.1798095703125, 359.9151306152344, 66.03732299804688, 239.57388305664062, 266.25970458984375, 212.34571838378906, 355.5642395019531, 29.624481201171875, 114.02508544921875, -177.5453643798828, 135.9114227294922, 126.31241607666016, 422.9354553222656, 13.893726348876953, 211.6273193359375, 226.043701171875, -156.88858032226562, 81.71475219726562, 77.04932403564453, 359.16412353515625, 581.9136352539062, 122.46041870117188, 85.8619384765625, 280.1897888183594, 340.69366455078125, 107.16780090332031, 127.97313690185547, 160.9204864501953, 123.10543823242188, 397.8790588378906, 243.8401641845703, 330.1697692871094, 129.10003662109375, 270.8332824707031, 117.34065246582031, 72.80378723144531, 243.9250946044922, 76.34637451171875, 30.195037841796875, 543.230224609375, 419.6263122558594, 355.3103332519531, 194.86813354492188, -99.88143920898438, 598.4485473632812, 335.3601379394531, 307.0476989746094, 298.203369140625, 69.82506561279297, 93.84735107421875, 408.57318115234375, 3.2106475830078125, 44.280242919921875, 80.46485137939453, 194.79598999023438], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000419.npy"}
{"epoch": 0.6152716593245228, "step": 420, "batch_size": 64, "mean": 150.2655029296875, "std": 187.79788208007812, "min": -328.0347595214844, "p10": -35.28028106689453, "median": 138.02867889404297, "p90": 376.9049957275391, "max": 671.6902465820312, "pos_frac": 0.8125, "sample": [-26.89844512939453, 153.66372680664062, 168.4935302734375, 104.312744140625, 263.6335144042969, -106.9137954711914, 8.591804504394531, 4.9334869384765625, 230.93441772460938, 368.9971618652344, -48.69401550292969, 146.56820678710938, 27.644054412841797, 33.11759567260742, -7.60015869140625, 57.04047393798828, 294.7369079589844, 247.42645263671875, 54.00559616088867, 145.54537963867188, -34.875518798828125, 279.2379455566406, -90.00848388671875, 323.6390380859375, 66.41830444335938, 153.499755859375, 136.8907928466797, 209.5230712890625, 24.760215759277344, 67.363037109375, 380.2940673828125, 272.97991943359375, 306.7437744140625, -18.424055099487305, 178.46434020996094, 259.3613586425781, -35.45375061035156, 68.1740493774414, 323.9468688964844, 502.0807189941406, -4.595205307006836, 16.31085205078125, 410.19659423828125, 139.16656494140625, 292.48382568359375, 99.25205993652344, 395.3313903808594, 297.5168762207031, 294.56561279296875, 198.79315185546875, 144.12161254882812, -328.0347595214844, 47.60731506347656, 100.04827880859375, 34.25226974487305, 671.6902465820312, -227.06788635253906, 87.65669250488281, 124.42649841308594, 166.1673583984375, 570.2997436523438, 62.260597229003906, -111.6236801147461, 642.0123291015625], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000420.npy"}
{"epoch": 0.6167400881057269, "step": 421, "batch_size": 64, "mean": 138.48049926757812, "std": 162.12623596191406, "min": -191.70205688476562, "p10": -95.91509017944335, "median": 156.6637191772461, "p90": 369.723602294922, "max": 434.1565246582031, "pos_frac": 0.765625, "sample": [278.9948425292969, 70.81924438476562, 166.83168029785156, -100.4777603149414, 218.86219787597656, 263.6891174316406, 434.1565246582031, -2.94573974609375, 23.542402267456055, 39.39726638793945, 121.48956298828125, 4.537014007568359, 347.1030578613281, 387.9798278808594, 276.85125732421875, 277.3796691894531, 203.4038543701172, -102.29280090332031, 192.6910400390625, 145.00970458984375, 405.7139587402344, 291.70599365234375, 202.47552490234375, -111.98367309570312, 182.69374084472656, -44.58949279785156, 161.41636657714844, 197.8734130859375, 94.84082794189453, 218.40965270996094, 379.4181213378906, 287.65667724609375, 413.53875732421875, 258.4209899902344, 142.71165466308594, -85.26885986328125, 256.327392578125, 151.91107177734375, -175.34075927734375, 411.8744201660156, -48.850791931152344, -15.375885009765625, 203.51341247558594, 424.02728271484375, 247.64462280273438, 238.0128631591797, 138.76455688476562, 41.536865234375, 4.463174819946289, 202.5523223876953, 77.78945922851562, -37.03173065185547, 139.82179260253906, -191.70205688476562, -142.0692596435547, 70.45509338378906, 198.1312255859375, -36.08809280395508, -120.2825927734375, 163.41989135742188, 53.22346496582031, -68.88233947753906, 133.21627807617188, 299.63494873046875], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000421.npy"}
{"epoch": 0.618208516886931, "step": 422, "batch_size": 64, "mean": 154.63314819335938, "std": 169.6029052734375, "min": -240.77886962890625, "p10": -59.90521774291991, "median": 163.14847564697266, "p90": 367.7298187255859, "max": 579.4008178710938, "pos_frac": 0.828125, "sample": [124.99217224121094, 367.32391357421875, 418.9991455078125, 264.8294372558594, 189.9537811279297, -109.31190490722656, -227.10105895996094, 334.98370361328125, 191.95950317382812, 579.4008178710938, -65.33308410644531, 217.2506561279297, 83.53276062011719, 31.169448852539062, 263.7044982910156, 78.53750610351562, 115.81710052490234, -47.240196228027344, 54.194644927978516, 208.5568084716797, 135.00143432617188, 480.33026123046875, 196.45606994628906, 14.888734817504883, 374.2652893066406, 265.8090515136719, 171.75601196289062, 367.9037780761719, 94.62017059326172, 295.8924255371094, 473.153076171875, 220.5897979736328, 96.45867919921875, 147.5243682861328, 61.0408821105957, 185.85670471191406, 335.13824462890625, -240.77886962890625, 99.41817474365234, -11.87738037109375, -31.856128692626953, 195.46063232421875, 74.10155487060547, -201.31613159179688, -9.632747650146484, 392.6054382324219, 294.489990234375, 219.51051330566406, 250.12644958496094, 110.32691955566406, 296.3712158203125, 154.5409393310547, 13.68408203125, 40.97737503051758, 311.205078125, 242.03829956054688, 140.85955810546875, -146.5119171142578, 188.69619750976562, 221.47866821289062, -96.04541015625, 102.31724548339844, 194.69070434570312, 98.73731994628906], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000422.npy"}
{"epoch": 0.6196769456681351, "step": 423, "batch_size": 64, "mean": 158.27920532226562, "std": 166.36349487304688, "min": -279.5654296875, "p10": -41.52825241088867, "median": 147.87747955322266, "p90": 387.9369354248047, "max": 542.4803466796875, "pos_frac": 0.8125, "sample": [18.804420471191406, 152.60873413085938, -41.82218933105469, 101.26742553710938, -19.344818115234375, -101.08753204345703, 481.4888000488281, 38.042877197265625, 372.5518798828125, 233.48577880859375, 327.013916015625, -40.84239959716797, 249.04232788085938, 217.96383666992188, 282.199462890625, -107.056884765625, 439.21636962890625, 497.09765625, 408.394287109375, 414.8144226074219, 18.33978271484375, 10.972808837890625, 282.0289306640625, 119.89827728271484, 278.268310546875, 195.55413818359375, 79.26696014404297, 147.5047149658203, -279.5654296875, -75.03256225585938, 215.7840576171875, 88.75576782226562, 135.05752563476562, 269.1983947753906, 383.7098693847656, 188.46292114257812, 43.13063049316406, 245.11962890625, -26.13377571105957, 49.013275146484375, 148.250244140625, 380.21099853515625, 130.26986694335938, 141.1019287109375, 542.4803466796875, 138.0459442138672, 193.9905242919922, -37.81427764892578, 197.2053680419922, 126.51225280761719, -57.56425857543945, 169.7580108642578, 168.3087615966797, 214.96566772460938, 177.02178955078125, 208.44178771972656, -105.32583618164062, 115.78495025634766, 114.62789916992188, -32.636810302734375, 307.25823974609375, 87.14054107666016, 118.91387939453125, 389.74853515625], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000423.npy"}
{"epoch": 0.6211453744493393, "step": 424, "batch_size": 64, "mean": 156.38648986816406, "std": 178.88987731933594, "min": -215.44442749023438, "p10": -61.00262603759764, "median": 138.9018898010254, "p90": 340.1459350585938, "max": 701.347900390625, "pos_frac": 0.84375, "sample": [-141.21495056152344, 328.18902587890625, 11.221633911132812, -27.62258529663086, 23.0513916015625, 246.64453125, 235.14468383789062, 227.56292724609375, 63.051856994628906, -45.71295166015625, 254.65863037109375, 74.05756378173828, 46.86824035644531, -81.72945404052734, 203.11155700683594, 526.4808959960938, 196.03369140625, 400.181640625, 74.79866027832031, 300.1805114746094, 312.9169006347656, -74.65199279785156, 307.0342712402344, 345.27032470703125, 260.4153747558594, -67.55534362792969, 76.81375885009766, 238.05123901367188, -78.56085205078125, 60.97555160522461, -33.895729064941406, 701.347900390625, 23.89272689819336, 201.50099182128906, 232.9597625732422, 36.4891357421875, 288.7753601074219, -116.79304504394531, 45.404075622558594, 203.11953735351562, 0.9412593841552734, 99.47110748291016, -215.44442749023438, 50.23880386352539, 123.12671661376953, 98.74671173095703, 23.93767547607422, 84.55866241455078, 59.44242858886719, 619.1728515625, 160.38623046875, 306.6461181640625, 257.52825927734375, 201.29771423339844, 225.23228454589844, 48.49631881713867, 392.3974304199219, 70.61483001708984, 154.67706298828125, 547.3038330078125, 296.6946716308594, 219.64295959472656, 191.75120544433594, 113.406982421875], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000424.npy"}
{"epoch": 0.6226138032305433, "step": 425, "batch_size": 64, "mean": 141.27867126464844, "std": 157.88706970214844, "min": -211.52146911621094, "p10": -56.40944366455077, "median": 152.22576904296875, "p90": 351.3668518066406, "max": 456.673828125, "pos_frac": 0.796875, "sample": [-42.57964324951172, -17.195199966430664, 267.4162902832031, -156.78640747070312, -211.52146911621094, 231.31668090820312, 24.302322387695312, 235.26986694335938, 51.4656982421875, 239.21913146972656, 148.17755126953125, 322.95709228515625, -175.4795684814453, 17.856220245361328, 353.0486755371094, 180.33139038085938, 33.51679229736328, -61.69775390625, 260.41326904296875, -40.17573547363281, 252.5628662109375, 150.0330810546875, 110.16665649414062, 274.85247802734375, 399.01824951171875, -30.60552215576172, 213.09840393066406, 44.16392517089844, 166.4167938232422, 418.76385498046875, 56.16240692138672, 332.8248596191406, 187.66909790039062, 19.812667846679688, 233.01724243164062, 70.26345825195312, 214.4812469482422, 117.22041320800781, 70.59492492675781, 456.673828125, -44.07005310058594, 200.2996063232422, 80.15168762207031, -94.04641723632812, 308.00701904296875, 233.7954559326172, 173.01707458496094, 118.3530502319336, 21.877334594726562, 347.4425964355469, 182.72821044921875, 223.23910522460938, -88.85099792480469, -21.9921875, 126.45470428466797, 155.5308074951172, 321.9161376953125, 398.93646240234375, 372.35888671875, 361.70294189453125, 150.43182373046875, -130.92483520507812, 154.01971435546875, 74.41097259521484], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000425.npy"}
{"epoch": 0.6240822320117474, "step": 426, "batch_size": 64, "mean": 177.3383331298828, "std": 173.1068572998047, "min": -335.1842041015625, "p10": -27.873868751525876, "median": 154.68035125732422, "p90": 417.0310913085938, "max": 560.4019775390625, "pos_frac": 0.875, "sample": [237.57781982421875, 10.991920471191406, 33.04853057861328, 172.42535400390625, 152.1201171875, 95.81959533691406, 79.38055419921875, 202.18927001953125, 466.0115661621094, 157.24058532714844, 139.79393005371094, 86.03903198242188, 180.25540161132812, -25.53424644470215, 346.8992004394531, 50.24441146850586, 172.6044464111328, 223.45489501953125, 346.9654235839844, 141.04153442382812, -56.542457580566406, 511.33966064453125, -28.876564025878906, 139.14059448242188, 106.15879821777344, 46.18389892578125, 141.9240264892578, -40.03844451904297, 543.4910888671875, 299.0092468261719, 165.8292694091797, 432.712158203125, 61.478721618652344, 109.1796875, 135.1908721923828, 414.07489013671875, 95.977294921875, -199.6228790283203, 141.33697509765625, 307.1358642578125, 15.19140625, 290.361083984375, 560.4019775390625, 269.0728759765625, 276.80914306640625, -44.518836975097656, 198.66799926757812, 418.29803466796875, -36.131256103515625, 148.3090362548828, 8.582527160644531, 305.8882751464844, 1.1173515319824219, 182.5775146484375, 269.6294860839844, 305.14813232421875, 68.44200897216797, 291.1315612792969, 357.2054748535156, -335.1842041015625, 421.27227783203125, 349.146484375, 100.773681640625, 333.80889892578125], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000426.npy"}
{"epoch": 0.6255506607929515, "step": 427, "batch_size": 64, "mean": 116.69023895263672, "std": 143.9087371826172, "min": -228.63873291015625, "p10": -44.26773300170898, "median": 104.09674453735352, "p90": 293.63652343750005, "max": 558.3900146484375, "pos_frac": 0.828125, "sample": [125.4111328125, 53.47269821166992, 224.90396118164062, 347.0618896484375, 429.5885314941406, 88.2259292602539, 112.22008514404297, -92.29190826416016, 240.27920532226562, 246.70333862304688, -8.592498779296875, 171.52578735351562, -76.28083801269531, 27.329410552978516, 127.91220092773438, 107.03614807128906, 315.7148132324219, 424.91357421875, 103.12958526611328, -228.63873291015625, 117.9498519897461, 95.8204345703125, 173.24972534179688, 75.80209350585938, 41.409263610839844, 220.2679443359375, 117.2145004272461, 558.3900146484375, 61.38386535644531, 140.4184112548828, 55.729835510253906, 251.79873657226562, 225.01512145996094, -40.70130157470703, 18.462841033935547, 217.46875, 77.09168243408203, 68.36117553710938, 279.6824951171875, 47.815330505371094, 272.2607116699219, 123.62849426269531, 20.01494598388672, 127.26263427734375, -199.76202392578125, 24.28888702392578, 232.92369079589844, 206.97857666015625, -45.79620361328125, -63.29711151123047, 334.848388671875, 20.072998046875, 105.06390380859375, 9.987220764160156, 299.6168212890625, -29.481658935546875, 21.539031982421875, 154.0968017578125, 225.90826416015625, 82.5030746459961, 6.053779602050781, -12.810102462768555, 60.854774475097656, -50.83526611328125], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000427.npy"}
{"epoch": 0.6270190895741556, "step": 428, "batch_size": 64, "mean": 162.33013916015625, "std": 180.40896606445312, "min": -321.53070068359375, "p10": -58.559735870361315, "median": 157.1686782836914, "p90": 399.5862518310547, "max": 561.460693359375, "pos_frac": 0.8125, "sample": [-48.61491012573242, -2.5112876892089844, 174.50204467773438, 279.5694274902344, 561.460693359375, 156.06336975097656, 98.33576965332031, 235.68011474609375, 295.896728515625, 454.1768798828125, -23.656158447265625, 49.86572265625, 311.510986328125, 45.229949951171875, 403.4176330566406, -78.34259033203125, -321.53070068359375, 507.9256286621094, -41.62828063964844, 317.2572326660156, 246.94329833984375, 131.9798583984375, 73.9681396484375, 89.67752075195312, 267.98785400390625, 478.27008056640625, 259.50848388671875, 158.27398681640625, 15.92104721069336, 191.0647430419922, 87.90312957763672, 155.8106689453125, 263.0430908203125, -98.0186538696289, 290.0113220214844, 37.12376403808594, 239.77813720703125, 250.6587677001953, 23.921253204345703, -63.46346664428711, 73.28024291992188, 84.62300109863281, 152.60214233398438, 4.987693786621094, -62.82180404663086, 224.47344970703125, -15.052322387695312, 160.51412963867188, 476.61773681640625, 138.87173461914062, 319.80181884765625, 315.63055419921875, 313.33984375, -249.1710968017578, 240.95306396484375, 186.8584442138672, 299.2708435058594, 505.51409912109375, 205.00831604003906, 104.33824157714844, 390.6463623046875, -76.17897033691406, 80.99359130859375, 39.055870056152344], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000428.npy"}
{"epoch": 0.6284875183553598, "step": 429, "batch_size": 64, "mean": 136.06239318847656, "std": 175.8988037109375, "min": -294.7154846191406, "p10": -65.51715698242187, "median": 130.25286102294922, "p90": 383.66815185546875, "max": 488.2121276855469, "pos_frac": 0.75, "sample": [106.8033676147461, 146.97349548339844, -44.11127853393555, 289.1408996582031, -73.8960952758789, 312.88116455078125, -32.69585418701172, 75.19551849365234, 217.00030517578125, 156.84788513183594, -52.31437683105469, 221.28823852539062, -10.753730773925781, 40.65856170654297, -71.17549133300781, 220.85922241210938, 75.06283569335938, -21.493606567382812, 416.8398742675781, -20.600936889648438, 437.980712890625, 450.251220703125, -12.0216064453125, 278.8875427246094, 98.53768920898438, 55.53030776977539, 296.44915771484375, 17.175243377685547, 438.09429931640625, 215.18397521972656, -15.233695983886719, 117.46665954589844, 386.3373718261719, -178.83477783203125, 7.132804870605469, -78.72499084472656, -118.91414642333984, 151.29429626464844, 233.4311065673828, 10.325479507446289, 202.6845703125, 295.929931640625, 144.01416015625, 44.144683837890625, 488.2121276855469, 33.02332305908203, 296.1339416503906, -20.991806030273438, 99.14036560058594, -217.64865112304688, 289.2171325683594, 109.00367736816406, 353.31219482421875, 143.0390625, 44.78997039794922, -294.7154846191406, 3.40545654296875, 147.0225830078125, 206.91156005859375, 432.0486755371094, 207.98574829101562, 377.4399719238281, 311.08685302734375, 269.9439697265625], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000429.npy"}
{"epoch": 0.6299559471365639, "step": 430, "batch_size": 64, "mean": 154.27493286132812, "std": 158.36412048339844, "min": -91.04891967773438, "p10": -11.322657394409173, "median": 131.5498275756836, "p90": 340.3810272216797, "max": 640.6307983398438, "pos_frac": 0.875, "sample": [251.50787353515625, 74.15773010253906, 27.23011016845703, -91.04891967773438, 500.5943908691406, 236.43521118164062, 11.787628173828125, 153.48834228515625, 147.55328369140625, 275.29010009765625, 25.66919708251953, 7.291648864746094, 19.96373748779297, 149.92922973632812, 104.63273620605469, 232.53512573242188, -19.578594207763672, 224.5645751953125, 55.12803649902344, 110.23773193359375, 95.79594421386719, 18.53058624267578, 201.5621337890625, 144.90667724609375, 30.910879135131836, 69.88395690917969, 305.6634521484375, 221.33123779296875, -19.035125732421875, 170.57530212402344, 118.87574768066406, -65.9175033569336, 640.6307983398438, 131.5829620361328, 161.24935913085938, 237.82301330566406, 100.56371307373047, 5.818870544433594, 12.998495101928711, 169.9673309326172, 316.53753662109375, 433.7908020019531, 337.5211486816406, 91.71094512939453, 131.51669311523438, 423.05279541015625, 144.2748565673828, 42.55029296875, 193.276123046875, -14.297470092773438, 201.1011505126953, 10.342670440673828, 341.606689453125, 154.56617736816406, -13.789421081542969, 130.4581298828125, 251.86207580566406, -88.87281036376953, 274.6151123046875, 519.9249267578125, 598.1270141601562, 74.85671997070312, 73.34305572509766, -5.566875457763672], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000430.npy"}
{"epoch": 0.631424375917768, "step": 431, "batch_size": 64, "mean": 196.02615356445312, "std": 167.5950927734375, "min": -154.84371948242188, "p10": -4.585929870605467, "median": 209.33966827392578, "p90": 396.4901336669922, "max": 649.1151123046875, "pos_frac": 0.875, "sample": [-17.547012329101562, 94.39814758300781, 4.290752410888672, 324.56341552734375, 122.77330780029297, 132.552734375, 511.7266540527344, 98.92570495605469, 196.67575073242188, 313.5461120605469, 196.88021850585938, 257.1206359863281, 53.966644287109375, 0.365509033203125, -136.11361694335938, 38.46888732910156, -2.502197265625, -54.40752410888672, 246.11053466796875, -154.84371948242188, 261.2413330078125, 473.208251953125, 558.76318359375, 449.03887939453125, 103.09675598144531, 386.82452392578125, 368.2095947265625, 265.3016357421875, 211.3423309326172, 257.1544189453125, 401.76666259765625, 207.33700561523438, 273.7018737792969, 179.7156982421875, 271.32989501953125, -55.80040740966797, 649.1151123046875, -5.4789581298828125, 310.8691101074219, 44.381614685058594, 357.08074951171875, 245.29637145996094, 375.4246826171875, -68.43852233886719, 136.44020080566406, 121.28378295898438, 303.1631774902344, 15.551828384399414, 144.7662353515625, 263.770263671875, 400.6325378417969, 241.61598205566406, 276.6377868652344, 271.26397705078125, 227.32369995117188, 249.83843994140625, 112.91614532470703, 13.461708068847656, 63.6846923828125, 122.76361083984375, 383.8160400390625, 127.48681640625, 75.62016296386719, 246.2047882080078], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000431.npy"}
{"epoch": 0.6328928046989721, "step": 432, "batch_size": 64, "mean": 180.48574829101562, "std": 161.9417724609375, "min": -232.8278350830078, "p10": 8.86722984313965, "median": 151.47222137451172, "p90": 363.1036346435547, "max": 779.1103515625, "pos_frac": 0.90625, "sample": [80.81422424316406, 65.31600952148438, 122.96885681152344, 219.07012939453125, 70.35475158691406, 84.15052795410156, 169.77822875976562, 40.58078384399414, 216.289306640625, 54.46687316894531, 120.77032470703125, -1.4067878723144531, 386.5349426269531, 293.8013916015625, 107.89067077636719, 779.1103515625, 234.1094970703125, 228.29177856445312, 43.06562805175781, 146.49575805664062, 113.95327758789062, 48.494903564453125, 437.7150573730469, 15.568939208984375, 251.706787109375, 310.7541809082031, 244.60777282714844, 11.265300750732422, 175.18310546875, 114.54959106445312, 224.37014770507812, 7.839485168457031, 54.60054397583008, 93.41059875488281, -32.39508056640625, 479.4437255859375, 194.50465393066406, -232.8278350830078, 262.8727111816406, 120.40623474121094, 141.41571044921875, -73.359375, 317.00408935546875, 238.0497283935547, 228.5743408203125, 266.8751220703125, 192.57086181640625, 342.0012512207031, 312.26519775390625, -17.686660766601562, 78.62608337402344, 340.3800964355469, 92.40724182128906, 156.4486846923828, 142.14920043945312, 65.83944702148438, 503.6433410644531, 240.61073303222656, 280.21697998046875, 366.2583312988281, -14.214302062988281, 355.74267578125, 135.21524047851562, 501.5772399902344], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000432.npy"}
{"epoch": 0.6343612334801763, "step": 433, "batch_size": 64, "mean": 138.47442626953125, "std": 178.112548828125, "min": -315.50927734375, "p10": -106.62468719482422, "median": 145.86064910888672, "p90": 349.7043426513672, "max": 426.4100341796875, "pos_frac": 0.765625, "sample": [23.24432373046875, 320.5831298828125, 350.2821960449219, 288.04278564453125, 331.15411376953125, -105.17359924316406, 113.1933364868164, -116.92393493652344, 341.7347106933594, 240.5476837158203, -315.50927734375, 17.03619384765625, 363.3544616699219, 212.82415771484375, 70.25289916992188, 152.7748565673828, 271.455810546875, 362.02001953125, 426.4100341796875, -248.14892578125, -12.488090515136719, 348.35601806640625, 346.299560546875, 217.182861328125, 225.66470336914062, 196.3114013671875, 65.66191101074219, 106.65113067626953, 49.66984558105469, 272.7148742675781, -7.037750244140625, 296.40386962890625, -159.51052856445312, 53.8458251953125, 34.49287414550781, 337.8328857421875, 379.87060546875, -107.24658203125, 275.63909912109375, 94.9537124633789, 422.82708740234375, 319.02496337890625, 124.72150421142578, -48.852561950683594, -152.334228515625, 110.03319549560547, 205.83792114257812, 53.506752014160156, -51.58591842651367, -129.40228271484375, 414.4949951171875, 103.98849487304688, 170.6114501953125, -74.98228454589844, 179.05532836914062, -77.21516418457031, 113.8908462524414, 267.8595886230469, -91.73687744140625, 152.17733764648438, 259.1374206542969, 259.602294921875, 139.54396057128906, 77.73725891113281], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000433.npy"}
{"epoch": 0.6358296622613803, "step": 434, "batch_size": 64, "mean": 153.3128204345703, "std": 172.4844207763672, "min": -197.97325134277344, "p10": -22.155556297302237, "median": 139.5896453857422, "p90": 344.76752014160155, "max": 869.9866333007812, "pos_frac": 0.859375, "sample": [345.5126953125, 235.12413024902344, 869.9866333007812, 4.145580291748047, -70.7833251953125, 19.604888916015625, 92.2190933227539, 229.71160888671875, -39.76445770263672, -26.511611938476562, 288.5092468261719, 28.859085083007812, 141.69754028320312, 343.0287780761719, -38.23981857299805, 137.48175048828125, 28.05274200439453, -117.01663208007812, 201.9591064453125, 452.06304931640625, 220.70108032226562, 52.47185516357422, 191.33338928222656, 74.7405014038086, 260.1419372558594, 224.26016235351562, 53.429420471191406, 5.972867965698242, 108.29527282714844, 144.72946166992188, 90.7315673828125, 243.76510620117188, 303.0325012207031, -197.97325134277344, 86.06819152832031, 361.3897399902344, 21.885738372802734, 52.431121826171875, 146.58819580078125, 258.7347717285156, 12.164871215820312, 339.67413330078125, 416.0931701660156, 64.96107482910156, 148.53994750976562, 277.4365539550781, 242.48257446289062, 60.99943161010742, 194.37783813476562, 326.16259765625, 169.2207489013672, -11.991426467895508, 444.17205810546875, 113.59248352050781, 99.86653137207031, 5.534454345703125, 219.5775604248047, 177.07904052734375, 103.034423828125, 16.140586853027344, 482.6418762207031, 211.25132751464844, -7.4297637939453125, -121.93096923828125], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000434.npy"}
{"epoch": 0.6372980910425844, "step": 435, "batch_size": 64, "mean": 177.13829040527344, "std": 181.31756591796875, "min": -311.94036865234375, "p10": -23.69732761383056, "median": 192.3525390625, "p90": 403.5310150146485, "max": 561.3436279296875, "pos_frac": 0.875, "sample": [504.8984680175781, 167.93295288085938, -204.12039184570312, 250.73086547851562, 442.7037353515625, 252.06744384765625, 247.24322509765625, 369.4632568359375, 311.8531799316406, 188.66744995117188, 346.699951171875, 88.38371276855469, 19.867019653320312, 351.7836608886719, 43.79295349121094, 425.7161865234375, 412.9775085449219, 378.2235107421875, 229.63760375976562, 40.97373962402344, -108.25057983398438, 16.4705810546875, -25.90545654296875, 75.79154968261719, -92.39157104492188, 37.96540832519531, 51.77543258666992, 242.63131713867188, -179.69772338867188, 125.0853271484375, 196.03762817382812, -18.545026779174805, 176.548828125, 128.98458862304688, 307.86077880859375, 289.2842712402344, 285.4737548828125, -311.94036865234375, 205.4757080078125, -172.63507080078125, 83.45364379882812, 187.44189453125, 443.8611755371094, 35.86534118652344, 213.4629669189453, 230.0377655029297, 381.48919677734375, 284.40264892578125, 44.129642486572266, 3.807218551635742, 561.3436279296875, 34.476409912109375, 273.33856201171875, 73.37692260742188, 306.68389892578125, 507.7169189453125, 307.6986389160156, 141.85073852539062, 94.75413513183594, 310.0632019042969, 160.6811981201172, 97.55354309082031, 252.95945739746094, 206.88662719726562], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000435.npy"}
{"epoch": 0.6387665198237885, "step": 436, "batch_size": 64, "mean": 177.87025451660156, "std": 161.83802795410156, "min": -163.9276123046875, "p10": -15.896312332153295, "median": 163.79043579101562, "p90": 394.2567993164063, "max": 604.8577880859375, "pos_frac": 0.890625, "sample": [31.108476638793945, 43.26551818847656, -50.316375732421875, 159.90896606445312, 372.9186096191406, 192.94947814941406, 227.74618530273438, 510.58941650390625, 339.710205078125, 26.258705139160156, 147.68406677246094, 599.6217041015625, 79.50045776367188, 18.46091651916504, 242.03981018066406, 183.24493408203125, -26.791393280029297, 52.638702392578125, 457.0516052246094, -103.82797241210938, 236.8939666748047, 171.25894165039062, 245.0402374267578, 186.03570556640625, 445.5575866699219, 85.55349731445312, 324.12237548828125, 384.0948791503906, 106.71824645996094, 178.8092803955078, 604.8577880859375, 149.3037109375, 398.6119079589844, 249.05384826660156, 277.662353515625, 167.67190551757812, 159.6870574951172, -65.0981674194336, 220.4178466796875, 71.75885009765625, 289.8717956542969, 42.533084869384766, 85.44236755371094, 96.04351043701172, 147.56837463378906, 251.77227783203125, 100.24497985839844, -55.880699157714844, 186.85726928710938, 107.61398315429688, 214.7742919921875, 200.97213745117188, 140.19326782226562, 314.29144287109375, 475.1066589355469, 29.232040405273438, 118.9391860961914, 149.58763122558594, -163.9276123046875, 106.22862243652344, -61.53681182861328, 219.29287719726562, 277.17584228515625, 9.525543212890625], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000436.npy"}
{"epoch": 0.6402349486049926, "step": 437, "batch_size": 64, "mean": 162.37954711914062, "std": 171.57337951660156, "min": -242.5375518798828, "p10": -61.68466186523436, "median": 156.0784149169922, "p90": 387.1226043701173, "max": 545.6763305664062, "pos_frac": 0.84375, "sample": [163.33636474609375, -110.349365234375, 174.3195343017578, 272.5856018066406, 142.88827514648438, -242.5375518798828, 358.28045654296875, 148.82046508789062, 144.22743225097656, 33.758148193359375, -67.57827758789062, 119.2344970703125, -14.485145568847656, 361.4494934082031, 70.84359741210938, 200.1403350830078, 237.58041381835938, -240.13763427734375, 41.66180419921875, 411.0819396972656, -14.181922912597656, 318.228759765625, 130.8488006591797, 172.16064453125, -132.60267639160156, 463.9603576660156, 103.00438690185547, 74.70955657958984, 324.0840759277344, 222.4527130126953, 524.3516235351562, 259.00482177734375, 496.8457946777344, -94.21653747558594, 545.6763305664062, -47.932891845703125, 17.815622329711914, -84.29962158203125, 191.60494995117188, 323.4617004394531, 136.95257568359375, 45.90936279296875, 254.8255157470703, 165.9317626953125, 305.0701599121094, 124.89105224609375, 398.1253662109375, 278.2119445800781, 411.708984375, 169.59268188476562, 41.54570007324219, 145.35598754882812, 229.06175231933594, 65.7166748046875, 57.277931213378906, 353.62738037109375, 275.3802185058594, 168.33082580566406, 72.02040100097656, 134.7814178466797, 122.43682861328125, 180.94427490234375, 242.46807861328125, 12.027503967285156], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000437.npy"}
{"epoch": 0.6417033773861968, "step": 438, "batch_size": 64, "mean": 196.6783905029297, "std": 164.91867065429688, "min": -152.48129272460938, "p10": -6.510884857177706, "median": 187.97462463378906, "p90": 464.6456939697266, "max": 517.20849609375, "pos_frac": 0.890625, "sample": [21.921175003051758, 172.65318298339844, 316.6253356933594, 42.05137634277344, 83.47725677490234, 40.9576416015625, 484.48046875, -45.17323303222656, 93.2624282836914, 105.16949462890625, 259.3562316894531, 125.50336456298828, 47.163570404052734, 131.90512084960938, 249.21827697753906, 314.02777099609375, 254.03504943847656, 458.23138427734375, 163.94007873535156, 186.19378662109375, 60.076148986816406, 500.5741882324219, 284.87786865234375, 498.62884521484375, 80.80290222167969, 26.302371978759766, 105.02826690673828, 331.0818176269531, -152.48129272460938, 35.287322998046875, -22.085542678833008, 77.27766418457031, 214.34022521972656, 408.48968505859375, 517.20849609375, 176.03591918945312, 192.81448364257812, 49.36475372314453, 362.1598205566406, 289.0523986816406, 467.3946838378906, 227.59542846679688, -64.99357604980469, 516.91015625, -45.27545166015625, 189.75546264648438, 365.80657958984375, 333.56109619140625, 242.8491668701172, 247.92172241210938, 77.5203857421875, 277.2561950683594, 399.2362060546875, 126.79283142089844, 68.36441040039062, 190.78378295898438, 194.322998046875, -43.18739318847656, -18.69605255126953, 334.93389892578125, 120.43492126464844, 468.8913269042969, 108.46144104003906, 260.9400329589844], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000438.npy"}
{"epoch": 0.6431718061674009, "step": 439, "batch_size": 64, "mean": 153.88719177246094, "std": 182.93299865722656, "min": -154.40371704101562, "p10": -69.26409606933593, "median": 120.786865234375, "p90": 413.01636962890626, "max": 612.867919921875, "pos_frac": 0.8125, "sample": [165.21981811523438, 500.44683837890625, 17.716718673706055, 271.8717041015625, -54.38471984863281, -65.42898559570312, 218.6376953125, 143.93980407714844, 123.89823913574219, 104.3752670288086, -98.3834228515625, 458.22509765625, 14.369125366210938, 15.460365295410156, 79.05927276611328, 277.77313232421875, 237.2704315185547, 274.1985168457031, 283.9945373535156, 125.46823120117188, 117.15498352050781, 612.867919921875, 156.31927490234375, -154.40371704101562, 87.95439910888672, -70.90771484375, 105.93258666992188, 83.52781677246094, -2.3845157623291016, 55.327110290527344, 334.4962158203125, 12.279899597167969, 257.830322265625, 244.87571716308594, 569.3505859375, 233.44003295898438, 411.568603515625, 269.18218994140625, 165.40965270996094, 72.06781768798828, 85.75737762451172, -93.48213958740234, -105.14576721191406, 4.990264892578125, 234.86270141601562, 137.72149658203125, 169.1014404296875, 218.15464782714844, -108.39382934570312, 316.19866943359375, 595.1946411132812, 117.67549133300781, 55.25917053222656, 413.6368408203125, 103.85379791259766, 33.73628234863281, 384.876708984375, 204.25137329101562, -44.06930923461914, 33.75532531738281, 492.2802429199219, 106.98583984375, -133.1583251953125, -30.880138397216797], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000439.npy"}
{"epoch": 0.644640234948605, "step": 440, "batch_size": 64, "mean": 144.91168212890625, "std": 183.9510955810547, "min": -213.46853637695312, "p10": -45.32242202758788, "median": 107.04914474487305, "p90": 424.9971893310547, "max": 597.9746704101562, "pos_frac": 0.8125, "sample": [342.1815490722656, 101.06586456298828, 434.4183349609375, 155.2835235595703, 57.27581787109375, 93.05089569091797, 304.99969482421875, 16.2565975189209, 95.13517761230469, 472.5927734375, 28.53328514099121, 3.60565185546875, 402.7786865234375, 141.50473022460938, 301.7589111328125, -38.823394775390625, 142.86737060546875, -193.06996154785156, 113.03242492675781, 13.474235534667969, 77.49163818359375, 464.6819152832031, 58.67994689941406, 32.25580596923828, 86.59010314941406, 156.84347534179688, -48.10771942138672, 297.87420654296875, 133.4790802001953, 88.70292663574219, 150.41915893554688, 427.5135803222656, -33.556732177734375, 132.29173278808594, -88.68014526367188, -3.29595947265625, 391.66400146484375, -213.46853637695312, 119.26457977294922, -65.07271575927734, 419.1256103515625, 179.87274169921875, 594.015869140625, 300.39508056640625, 88.3126220703125, 166.13906860351562, 66.69082641601562, -119.72077941894531, 259.0652160644531, 232.6455078125, 151.89120483398438, 597.9746704101562, 13.598594665527344, 147.99278259277344, 17.773677825927734, 393.6715087890625, 5.938251495361328, -29.78767967224121, 29.421592712402344, -2.315032958984375, 456.30218505859375, 231.43350219726562, -109.37889862060547, 29.797088623046875], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000440.npy"}
{"epoch": 0.6461086637298091, "step": 441, "batch_size": 64, "mean": 172.47109985351562, "std": 211.3975067138672, "min": -368.46868896484375, "p10": -31.246041107177728, "median": 119.18840789794922, "p90": 446.9529510498047, "max": 863.843017578125, "pos_frac": 0.84375, "sample": [139.5365753173828, 91.27317810058594, 104.58905029296875, 351.00030517578125, 259.65887451171875, 467.54931640625, 11.96734619140625, 37.314552307128906, 276.9997863769531, -271.89697265625, -51.99564743041992, 15.447031021118164, 421.0447082519531, 519.64501953125, 283.35430908203125, 88.4577865600586, 52.64973449707031, 343.92413330078125, 89.45269775390625, 71.22189331054688, 16.865320205688477, 57.37364196777344, 126.2996826171875, 448.7082824707031, 508.76507568359375, 34.916229248046875, 95.51591491699219, 325.2175598144531, 257.59033203125, 863.843017578125, -17.1877384185791, -244.85987854003906, 112.07713317871094, 399.9396667480469, -143.79978942871094, 98.69509887695312, 101.88611602783203, 306.6278991699219, 503.3138427734375, -9.82867431640625, 18.760711669921875, 186.67401123046875, -24.013870239257812, -368.46868896484375, 283.405029296875, 229.2239532470703, 221.65176391601562, 442.857177734375, 293.7034606933594, 61.925052642822266, 30.067459106445312, -57.25843048095703, 50.72039794921875, 269.54779052734375, 244.0574493408203, 104.90765380859375, 436.03765869140625, 177.8053436279297, 275.32916259765625, 478.04412841796875, 58.26583480834961, -34.345542907714844, 193.9088134765625, 322.1914978027344], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000441.npy"}
{"epoch": 0.6475770925110133, "step": 442, "batch_size": 64, "mean": 200.40408325195312, "std": 193.8613739013672, "min": -200.60023498535156, "p10": -10.357443428039543, "median": 157.86739349365234, "p90": 491.110580444336, "max": 708.6703491210938, "pos_frac": 0.875, "sample": [-3.6018295288085938, 110.50309753417969, 469.34844970703125, 72.0306396484375, 246.24285888671875, 80.42562866210938, 45.86194610595703, 156.68373107910156, 43.833282470703125, 120.25320434570312, 146.89707946777344, 290.8354187011719, 27.574790954589844, 219.36680603027344, 238.78457641601562, 415.66546630859375, 456.8260803222656, 47.41913986206055, 46.34136962890625, 63.76670455932617, 117.63001251220703, 377.60797119140625, 55.82679748535156, 265.8544616699219, -13.252706527709961, 159.05105590820312, 510.8675842285156, 206.36436462402344, 497.61187744140625, 261.0909423828125, 106.13922119140625, -200.60023498535156, 203.456787109375, 102.32051086425781, 25.11821746826172, 332.96124267578125, -18.930709838867188, 426.2481384277344, 178.56320190429688, 523.9782104492188, -90.55307006835938, 275.2605895996094, 436.5780029296875, 12.926473617553711, 34.291961669921875, 624.1312255859375, 499.3492126464844, 253.465087890625, 475.9408874511719, 151.69650268554688, 41.803253173828125, -29.429931640625, -28.81915283203125, 120.06410217285156, 27.320541381835938, -77.80375671386719, 396.8375244140625, 254.79986572265625, 708.6703491210938, 243.67471313476562, 549.835205078125, 191.5654296875, 90.33680725097656, 250.9546661376953], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000442.npy"}
{"epoch": 0.6490455212922174, "step": 443, "batch_size": 64, "mean": 142.5586700439453, "std": 195.6590118408203, "min": -293.7953186035156, "p10": -87.37556457519531, "median": 123.68791580200195, "p90": 339.54473571777345, "max": 751.4804077148438, "pos_frac": 0.828125, "sample": [751.4804077148438, 48.44084930419922, 480.77130126953125, 12.793746948242188, 206.96597290039062, 50.216278076171875, 26.16914939880371, -82.80296325683594, 46.69099044799805, 122.83687591552734, -74.66078186035156, 79.17509460449219, 197.3583221435547, 282.01617431640625, 42.13818359375, -293.7953186035156, 245.1037139892578, -89.33525085449219, 213.00039672851562, 73.9879150390625, 664.1730346679688, 341.938720703125, 304.5775451660156, 295.4166259765625, -146.75692749023438, -269.31842041015625, 186.04006958007812, 170.97669982910156, 328.0836486816406, -270.7876892089844, 294.63360595703125, 290.79071044921875, 38.67192077636719, 72.94868469238281, 96.78105163574219, -44.43225860595703, 120.11121368408203, 295.13909912109375, 184.35687255859375, 91.73588562011719, -128.7906951904297, -62.24717712402344, 268.2149963378906, 43.13937759399414, 68.87652587890625, 85.97221374511719, 229.33697509765625, 198.61538696289062, 147.1127471923828, 179.08370971679688, 333.9587707519531, 90.86871337890625, 124.53895568847656, 168.45826721191406, -118.03666687011719, 223.44149780273438, 51.97315979003906, 351.88275146484375, 161.88641357421875, 102.2862548828125, 459.8835754394531, 450.44244384765625, 33.409950256347656, 275.8157043457031], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000443.npy"}
{"epoch": 0.6505139500734214, "step": 444, "batch_size": 64, "mean": 196.6611785888672, "std": 205.12924194335938, "min": -265.3648986816406, "p10": -25.085510826110838, "median": 171.9711151123047, "p90": 414.7305358886719, "max": 864.2075805664062, "pos_frac": 0.84375, "sample": [220.8024444580078, -35.16241455078125, 36.896095275878906, 369.4243469238281, -11.390512466430664, 213.9114990234375, 75.73081970214844, 762.984375, 170.12750244140625, -233.90806579589844, 97.79452514648438, 223.51498413085938, 326.95355224609375, 411.7059326171875, 416.02679443359375, 222.4794921875, 237.2275390625, 186.1465301513672, -93.22946166992188, 23.636672973632812, 28.465246200561523, -54.13567352294922, 390.4870300292969, 485.2210693359375, -23.379318237304688, 272.38201904296875, 191.49630737304688, -13.75201416015625, 168.16795349121094, -83.268798828125, 86.57901763916016, 864.2075805664062, 272.75579833984375, 163.20315551757812, 106.21791076660156, 367.8504333496094, 215.0831298828125, 398.8107604980469, 295.49658203125, 158.24676513671875, 90.22537231445312, 592.0879516601562, 418.5126953125, 65.44625091552734, 334.38177490234375, 409.26837158203125, 205.30101013183594, 118.3748550415039, 95.12855529785156, 92.169189453125, 402.50274658203125, 131.6346893310547, 170.37881469726562, 82.2463607788086, 359.46380615234375, 173.56341552734375, 52.739288330078125, 89.52885437011719, -25.816736221313477, 44.310943603515625, 479.332763671875, -265.3648986816406, 206.55215454101562, 352.53961181640625], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000444.npy"}
{"epoch": 0.6519823788546255, "step": 445, "batch_size": 64, "mean": 201.54562377929688, "std": 187.32301330566406, "min": -99.10179138183594, "p10": -33.030329895019506, "median": 164.27447509765625, "p90": 450.1152801513672, "max": 667.6602783203125, "pos_frac": 0.84375, "sample": [188.7311553955078, 25.345197677612305, 128.56072998046875, 537.448974609375, 94.45368957519531, 367.2467346191406, 156.2761688232422, 58.90776824951172, 152.4835205078125, 188.5615692138672, 31.071102142333984, 390.72052001953125, 83.35279846191406, 208.4660186767578, 104.00809478759766, -99.10179138183594, 176.01226806640625, 397.4714660644531, 65.44104766845703, 105.10862731933594, 164.04171752929688, 269.20697021484375, 361.0221862792969, 486.0831298828125, 59.25130081176758, 79.66246795654297, 655.8016967773438, 392.3292236328125, 206.1595458984375, 514.8612670898438, 413.00396728515625, 452.103759765625, 189.04945373535156, 445.4754943847656, 163.9648895263672, -5.1126708984375, 164.50723266601562, 292.1551818847656, -46.14794921875, 383.27197265625, -45.836402893066406, 596.798828125, 355.84735107421875, -47.75370788574219, 313.2374267578125, 342.7218017578125, 150.43215942382812, -78.2940444946289, 52.23865509033203, 79.46754455566406, 230.91452026367188, 83.93779754638672, 50.39669418334961, 322.2250061035156, 317.88427734375, 154.96372985839844, -1.837158203125, -1.8034725189208984, 667.6602783203125, -44.99504089355469, 197.18832397460938, 214.04641723632812, 79.26272583007812, -91.04013061523438], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000445.npy"}
{"epoch": 0.6534508076358296, "step": 446, "batch_size": 64, "mean": 199.14756774902344, "std": 167.48011779785156, "min": -73.69969177246094, "p10": -2.2074674606323237, "median": 182.97057342529297, "p90": 446.7985748291016, "max": 624.3599243164062, "pos_frac": 0.875, "sample": [33.53974151611328, -1.4565277099609375, 45.29115295410156, 350.6446228027344, 601.0300903320312, 312.84100341796875, 110.98262023925781, -73.69969177246094, 84.74403381347656, 399.4439697265625, 624.3599243164062, 161.97625732421875, 82.45075225830078, 356.57550048828125, -31.70447540283203, 451.4096374511719, 343.59442138671875, 237.9522705078125, 144.25, 321.2630615234375, 74.62261962890625, 75.96630859375, 188.44886779785156, 129.93080139160156, 94.67518615722656, -55.93409729003906, 167.1248016357422, 276.8346252441406, 245.74923706054688, 229.26193237304688, 311.6780700683594, -50.53681945800781, 151.46304321289062, 144.6725311279297, -2.529298782348633, 243.396728515625, -12.965530395507812, 319.896240234375, 89.67364501953125, 464.9471740722656, 306.50213623046875, 63.39854431152344, 251.64187622070312, 551.0432739257812, 259.95770263671875, 8.369850158691406, 470.6006164550781, 312.7184143066406, 143.7303924560547, 314.68133544921875, 211.36758422851562, 11.324447631835938, -25.060302734375, 89.13687133789062, 49.544395446777344, 177.49227905273438, 61.209598541259766, 235.3085174560547, 465.05731201171875, 189.27671813964844, 27.132957458496094, 436.0394287109375, 197.61761474609375, 295.4891662597656], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000446.npy"}
{"epoch": 0.6549192364170338, "step": 447, "batch_size": 64, "mean": 197.22711181640625, "std": 205.8803253173828, "min": -154.50845336914062, "p10": -26.588602638244627, "median": 170.1280517578125, "p90": 400.5472045898438, "max": 1125.697021484375, "pos_frac": 0.84375, "sample": [114.19384765625, 185.5797882080078, 251.05435180664062, 179.0392608642578, -15.440948486328125, -24.952558517456055, 333.9956970214844, 132.74111938476562, 234.7545623779297, -27.289764404296875, -111.96209716796875, 354.63916015625, 222.801513671875, 165.9696807861328, 393.4644470214844, 386.33355712890625, 565.4152221679688, 102.72360229492188, 1125.697021484375, 507.3004150390625, 133.63827514648438, 459.5537414550781, 303.12554931640625, 268.24200439453125, -31.808303833007812, 94.98426055908203, 210.88638305664062, -140.46682739257812, 196.8568115234375, 33.47486877441406, -15.732940673828125, 116.20429992675781, 76.94889068603516, -49.54962158203125, 314.6176452636719, 92.14912414550781, 125.7218017578125, 392.4143371582031, 21.37371826171875, 15.446975708007812, 313.04437255859375, -38.294776916503906, 174.2864227294922, -154.50845336914062, 511.9069519042969, 164.8079833984375, 279.0764465332031, 26.370149612426758, 129.63934326171875, 27.6476993560791, 299.1358642578125, 182.83859252929688, 221.37989807128906, 154.85003662109375, 50.03948974609375, 379.67889404296875, 274.3406982421875, 141.5034637451172, 554.265380859375, 403.5826721191406, 114.40904998779297, 290.9911804199219, 381.4051208496094, 45.999786376953125], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000447.npy"}
{"epoch": 0.6563876651982379, "step": 448, "batch_size": 64, "mean": 213.82147216796875, "std": 193.5828857421875, "min": -156.2112579345703, "p10": -18.43331356048582, "median": 194.52651977539062, "p90": 505.27878723144534, "max": 668.0330810546875, "pos_frac": 0.875, "sample": [27.01219940185547, 72.53213500976562, 30.568572998046875, 106.05047607421875, 584.9732666015625, 23.03228759765625, 190.6334991455078, -61.44458770751953, 271.54901123046875, -25.92767333984375, 277.08270263671875, 85.46573638916016, 516.8902587890625, 71.6678466796875, 241.54345703125, 279.20001220703125, 232.73365783691406, 179.99998474121094, 453.754150390625, -39.438629150390625, -156.2112579345703, 90.70263671875, 440.8965148925781, 270.8858642578125, 198.41954040527344, 165.34945678710938, 127.5318603515625, 103.04041290283203, 349.8997802734375, 573.62060546875, 296.7371826171875, 494.05291748046875, 77.02230072021484, 290.6363830566406, 654.90673828125, 2.4414196014404297, 152.76759338378906, 88.07600402832031, 78.7273941040039, 232.8873291015625, 274.82562255859375, 399.8793029785156, 133.8317108154297, -32.51593780517578, 657.2548828125, 235.1416473388672, 95.85293579101562, -0.9464740753173828, 176.08340454101562, 254.56863403320312, 246.9303436279297, -25.982406616210938, 668.0330810546875, 5.499702453613281, 423.56756591796875, 510.0898742675781, -56.66047668457031, 230.78250122070312, 107.46961212158203, 140.54550170898438, 464.6664123535156, 254.43338012695312, 223.266845703125, 247.687744140625], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000448.npy"}
{"epoch": 0.657856093979442, "step": 449, "batch_size": 64, "mean": 191.867919921875, "std": 235.49368286132812, "min": -504.4924011230469, "p10": -27.034552574157708, "median": 177.50045776367188, "p90": 494.99631042480473, "max": 938.8231201171875, "pos_frac": 0.828125, "sample": [171.09397888183594, 63.349647521972656, 215.03541564941406, 114.29759216308594, 540.184326171875, 421.6227722167969, 78.0614013671875, -29.966257095336914, 211.77191162109375, 170.072509765625, 500.2266540527344, -80.50044250488281, 19.65520477294922, 34.646644592285156, 434.9446105957031, 52.02230453491211, 306.46197509765625, 185.80477905273438, 61.59941101074219, -10.758440017700195, 381.7724304199219, 187.36634826660156, 415.76287841796875, 441.5222473144531, 177.01466369628906, 503.6182861328125, 11.695159912109375, 32.30232238769531, 209.80078125, 181.8369140625, 177.9862518310547, 77.33889770507812, 9.6136474609375, 320.86444091796875, 329.29730224609375, 482.79217529296875, -36.58135223388672, 26.485942840576172, -20.19390869140625, -132.1345672607422, 938.8231201171875, 443.7835998535156, -165.1220245361328, 259.2074279785156, 204.06674194335938, 115.68230438232422, 16.5679931640625, -504.4924011230469, -35.25015640258789, 664.345703125, 213.33004760742188, 588.9583129882812, 181.66197204589844, -17.990068435668945, 165.91246032714844, 186.64633178710938, 328.25604248046875, 788.8665161132812, 234.0386962890625, 94.17704772949219, 12.468978881835938, -8.2010498046875, 134.8897247314453, 201.13241577148438], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000449.npy"}
{"epoch": 0.6593245227606461, "step": 450, "batch_size": 64, "mean": 221.74276733398438, "std": 194.49790954589844, "min": -385.319580078125, "p10": 28.07355957031251, "median": 216.7552032470703, "p90": 428.4082885742188, "max": 796.1747436523438, "pos_frac": 0.921875, "sample": [143.25897216796875, 288.94775390625, 413.7350158691406, 431.8940124511719, 23.00054931640625, 477.70025634765625, 796.1747436523438, -68.92817687988281, 187.878173828125, -19.076438903808594, 400.24847412109375, 502.205078125, 374.91131591796875, 100.46700286865234, 291.600341796875, 397.187255859375, 117.61334228515625, 116.78564453125, 246.0025634765625, 116.43942260742188, 147.0797576904297, 263.1120300292969, 235.72894287109375, 420.2749328613281, 261.60565185546875, 70.19180297851562, 141.9175262451172, 568.2625732421875, 194.0186004638672, 372.29669189453125, 178.2657012939453, 365.50616455078125, 232.7218780517578, 121.02415466308594, 283.5042419433594, 41.31576919555664, 48.1716423034668, 352.4017639160156, 239.49090576171875, 86.48123931884766, 568.4440307617188, -177.11061096191406, 211.31326293945312, 327.0548400878906, 187.91334533691406, 254.96536254882812, 7.638397216796875, 325.4709777832031, 248.98040771484375, 66.28929138183594, 324.1074523925781, 39.91058349609375, 222.1971435546875, 152.72886657714844, 71.30426788330078, -385.319580078125, 257.242919921875, 263.80413818359375, -10.803337097167969, 158.09364318847656, 136.74432373046875, 735.8904418945312, 50.695350646972656, 192.56362915039062], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000450.npy"}
{"epoch": 0.6607929515418502, "step": 451, "batch_size": 64, "mean": 181.61514282226562, "std": 210.0352325439453, "min": -650.5352172851562, "p10": -36.87354583740231, "median": 176.3696060180664, "p90": 413.7963287353516, "max": 634.4208374023438, "pos_frac": 0.859375, "sample": [93.08956909179688, 114.35153198242188, 121.09228515625, 146.57662963867188, 321.41632080078125, 169.85372924804688, 125.80558776855469, 408.8857116699219, 65.2951889038086, 286.95294189453125, 322.8840026855469, 516.7334594726562, 396.6285400390625, 527.5023193359375, 179.55429077148438, 189.88644409179688, 42.26915740966797, -650.5352172851562, -5.5111083984375, -84.62591552734375, 93.39220428466797, 177.63015747070312, 343.9852294921875, 385.96783447265625, 281.4585876464844, 175.1090545654297, 236.075439453125, 154.88966369628906, 223.23495483398438, 484.7497863769531, 68.54354858398438, 634.4208374023438, 544.2823486328125, 111.91531372070312, 326.4910583496094, 284.247802734375, 263.61505126953125, 41.00532531738281, 212.887939453125, 97.27992248535156, -50.31459045410156, -1.8661231994628906, 404.6654357910156, -345.27935791015625, 140.04991149902344, 161.21832275390625, 179.6896209716797, 128.48355102539062, 238.42665100097656, 49.22510528564453, 248.96597290039062, -87.22862243652344, -74.67703247070312, 415.90087890625, 384.5028076171875, 138.31304931640625, 349.6080322265625, -169.27828979492188, 123.44549560546875, 7.360055923461914, 495.3731994628906, 35.8441276550293, 211.61358642578125, 210.03970336914062], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000451.npy"}
{"epoch": 0.6622613803230544, "step": 452, "batch_size": 64, "mean": 204.20883178710938, "std": 184.55909729003906, "min": -149.5360107421875, "p10": -8.789492607116697, "median": 207.3053970336914, "p90": 482.2239318847657, "max": 763.9049682617188, "pos_frac": 0.875, "sample": [277.4918212890625, -45.706947326660156, 302.3843688964844, 184.44015502929688, 39.43244171142578, 335.4124755859375, 322.27056884765625, 321.6728210449219, 235.60919189453125, -20.528221130371094, 585.0309448242188, 220.50576782226562, 50.99212646484375, 151.50784301757812, 391.4100341796875, 159.49636840820312, 259.311767578125, 260.5096130371094, 370.02886962890625, -9.23655891418457, 251.0843505859375, 40.291053771972656, 232.0306396484375, 205.847412109375, 208.7633819580078, 601.0032348632812, 45.13705825805664, -149.5360107421875, 371.1816711425781, 70.62051391601562, -10.53826904296875, 339.820556640625, 76.15408325195312, 220.90664672851562, 547.3345336914062, 119.95843505859375, 247.3540802001953, 369.3364562988281, 134.3294677734375, 96.72542572021484, 36.56296157836914, -48.49267578125, 111.09207153320312, 524.9189453125, 105.22241973876953, 272.9837341308594, 64.58026123046875, 491.8518371582031, 763.9049682617188, 88.18949890136719, 141.5954132080078, 229.29605102539062, 239.96775817871094, 51.998653411865234, -107.01065063476562, 131.7581787109375, -7.746337890625, 459.7588195800781, 242.14288330078125, 491.89544677734375, 86.40848541259766, 246.04241943359375, 4.4304656982421875, 38.17357635498047], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000452.npy"}
{"epoch": 0.6637298091042585, "step": 453, "batch_size": 64, "mean": 173.53720092773438, "std": 200.44998168945312, "min": -254.277587890625, "p10": -34.75994148254393, "median": 130.2735366821289, "p90": 421.00586547851566, "max": 993.2929077148438, "pos_frac": 0.828125, "sample": [297.90216064453125, 385.1800537109375, 26.65271759033203, 95.33269500732422, 264.5686340332031, 118.14710998535156, 143.97650146484375, 203.5635986328125, 130.86378479003906, 217.0260009765625, 156.50946044921875, 284.88885498046875, 304.12017822265625, 423.1102600097656, 151.60113525390625, 2.0968170166015625, 302.78802490234375, 222.24020385742188, -9.809553146362305, 83.48207092285156, 281.7055358886719, -112.03443908691406, 129.68328857421875, 78.59362030029297, 486.4740905761719, -112.42935943603516, 51.09626770019531, 352.0196838378906, -254.277587890625, 237.87506103515625, -8.107223510742188, 68.66629028320312, 250.7215576171875, 547.26806640625, 452.5357666015625, 42.82927322387695, -64.29576110839844, 93.63754272460938, 993.2929077148438, 386.5428466796875, 18.745574951171875, -72.88265228271484, 470.9034423828125, 37.504295349121094, 108.08224487304688, 39.244171142578125, 240.64601135253906, 13.504646301269531, 328.97265625, 295.73687744140625, -9.969100952148438, 266.47845458984375, -80.53768920898438, 327.8878479003906, 125.78656005859375, -41.010536193847656, 16.409011840820312, 122.95867156982422, 484.6380615234375, 416.0956115722656, -20.175220489501953, 144.15447998046875, 54.605804443359375, 112.56446075439453], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000453.npy"}
{"epoch": 0.6651982378854625, "step": 454, "batch_size": 64, "mean": 215.69744873046875, "std": 222.2145233154297, "min": -340.22918701171875, "p10": 14.677969169616704, "median": 176.65042877197266, "p90": 460.5650451660157, "max": 874.4517822265625, "pos_frac": 0.90625, "sample": [146.0849151611328, -9.9251708984375, 593.6273193359375, 190.83753967285156, 330.946044921875, 101.07667541503906, 392.4825134277344, 157.2517547607422, 87.80111694335938, 180.7985076904297, -340.22918701171875, 288.2950439453125, 240.88992309570312, 247.145263671875, 120.73760986328125, 63.90362548828125, 166.9241485595703, 202.1106719970703, -48.75945281982422, -231.19232177734375, 57.54747009277344, 358.78369140625, 19.260986328125, 639.0051879882812, -44.447303771972656, 62.53826904296875, 441.20220947265625, 18.919958114624023, 228.44667053222656, 260.17730712890625, 391.1304626464844, 93.22441101074219, 227.23040771484375, 172.50234985351562, 413.19354248046875, 114.18389129638672, 156.25949096679688, 136.12081909179688, 391.00189208984375, 186.9840545654297, 328.55841064453125, 396.8288269042969, 63.41841125488281, 318.1192321777344, 25.25604248046875, 468.8634033203125, 874.4517822265625, 181.78384399414062, 134.0488739013672, 135.28219604492188, 736.9338989257812, 832.2574462890625, 66.3104019165039, 319.0057373046875, 88.61449432373047, 344.91766357421875, 302.9770202636719, -114.19790649414062, 257.0858154296875, 27.661697387695312, 140.78257751464844, 99.19819641113281, 559.5457153320312, 12.859973907470703], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000454.npy"}
{"epoch": 0.6666666666666666, "step": 455, "batch_size": 64, "mean": 234.59808349609375, "std": 214.92477416992188, "min": -160.77012634277344, "p10": -40.01848564147947, "median": 218.00196075439453, "p90": 568.3942199707033, "max": 707.8005981445312, "pos_frac": 0.875, "sample": [63.64518356323242, 117.20361328125, 344.78753662109375, 483.410400390625, -23.578357696533203, 286.5394592285156, 432.6307373046875, -88.42869567871094, 602.3593139648438, -88.01129913330078, 652.9131469726562, 163.8840789794922, 504.4979248046875, -72.0631103515625, 627.9583740234375, -72.29660034179688, 260.0606689453125, 581.403076171875, 70.13207244873047, -47.06425476074219, 18.374053955078125, 328.7218933105469, 195.26744079589844, 59.542762756347656, 310.9867248535156, 707.8005981445312, 251.07228088378906, 94.90082550048828, 221.5978240966797, 97.15586853027344, 291.9261474609375, 377.61627197265625, 207.41912841796875, 214.40609741210938, 85.87992858886719, 153.70501708984375, 168.4908447265625, 279.08099365234375, 56.380157470703125, -160.77012634277344, 338.7174987792969, 44.79214096069336, 315.0515441894531, 422.9571533203125, 17.633819580078125, 107.19629669189453, -110.58807373046875, 289.32806396484375, 509.64080810546875, 311.0396423339844, 461.1203918457031, 92.86367797851562, 259.67364501953125, 343.9523620605469, 117.03668975830078, 73.08213806152344, 3.718128204345703, 538.0402221679688, 311.9123229980469, 229.44342041015625, 601.30126953125, 114.6761703491211, 204.74440002441406, 657.4051513671875], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000455.npy"}
{"epoch": 0.6681350954478708, "step": 456, "batch_size": 64, "mean": 227.55477905273438, "std": 182.5292205810547, "min": -56.63412094116211, "p10": 29.041788101196296, "median": 186.49154663085938, "p90": 501.67820129394534, "max": 724.3048706054688, "pos_frac": 0.953125, "sample": [25.52389144897461, 120.25401306152344, 475.93670654296875, 199.75421142578125, 25.2752685546875, 216.04458618164062, 101.07894897460938, 117.13009643554688, 158.79840087890625, 406.38458251953125, 308.8360595703125, 37.76310729980469, 54.76802062988281, 526.4140625, 142.82916259765625, 128.00177001953125, 51.90180969238281, 258.6122131347656, 226.2619171142578, 190.73464965820312, 96.41639709472656, 37.250213623046875, 75.77700805664062, 572.3511962890625, -4.942539215087891, 314.70782470703125, 566.2288208007812, 220.40536499023438, 389.992431640625, 115.22674560546875, 200.43746948242188, 329.1692810058594, 416.6836242675781, 175.98731994628906, 120.02425384521484, 89.62677764892578, 39.60069274902344, 325.974365234375, 616.2820434570312, 37.27769470214844, 478.70452880859375, 202.83828735351562, 223.0214385986328, 118.35962677001953, -56.63412094116211, 140.05096435546875, 46.38703155517578, 19.659759521484375, 514.6861572265625, 165.47439575195312, 724.3048706054688, 504.961181640625, 206.25192260742188, 3.74224853515625, 182.24844360351562, 151.4198455810547, 375.7401123046875, 127.98129272460938, 262.5108642578125, -3.9449996948242188, 433.4354553222656, 486.3816223144531, 494.0179138183594, 255.1267852783203], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000456.npy"}
{"epoch": 0.6696035242290749, "step": 457, "batch_size": 64, "mean": 210.951171875, "std": 191.09718322753906, "min": -238.21885681152344, "p10": -38.07606182098387, "median": 234.89306640625, "p90": 465.8900817871094, "max": 687.4066162109375, "pos_frac": 0.859375, "sample": [105.61063385009766, 322.0769348144531, 253.8909149169922, 148.8614501953125, 188.8733367919922, 338.70477294921875, 271.31964111328125, 493.81170654296875, 174.86795043945312, 41.49147033691406, 687.4066162109375, 385.1102294921875, 19.37851333618164, -139.20115661621094, 202.1178741455078, 237.44017028808594, 332.9839782714844, 251.34483337402344, 479.6673278808594, 268.5645751953125, 519.73583984375, -3.7364234924316406, 198.9193878173828, 302.14031982421875, 301.8787841796875, 469.64892578125, -178.1579132080078, 24.059459686279297, 410.5921936035156, 393.4105224609375, 120.74250793457031, 227.81887817382812, 47.87821578979492, 254.47384643554688, 232.34596252441406, -172.3135528564453, 336.7217102050781, 303.8155212402344, 99.59213256835938, 63.441925048828125, 281.76605224609375, 488.51275634765625, 249.5970916748047, -78.14698791503906, 23.873615264892578, -47.17918395996094, 369.8116149902344, 48.95276641845703, 347.98931884765625, 300.6167297363281, -16.8354434967041, 311.68536376953125, 226.59182739257812, 512.750732421875, -238.21885681152344, 457.11944580078125, 115.88542175292969, -75.50215911865234, 168.41400146484375, 88.18785095214844, 86.75324249267578, 296.20989990234375, 146.36331176757812, 418.34686279296875], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000457.npy"}
{"epoch": 0.671071953010279, "step": 458, "batch_size": 64, "mean": 220.4754638671875, "std": 213.5567169189453, "min": -171.14120483398438, "p10": -53.83767166137693, "median": 235.5184555053711, "p90": 420.60771789550796, "max": 1032.5650634765625, "pos_frac": 0.8125, "sample": [-89.78306579589844, 183.97911071777344, 328.07672119140625, 222.0937042236328, 234.46543884277344, -26.261581420898438, 1032.5650634765625, -130.1858673095703, 273.45489501953125, 325.1827697753906, 64.27303314208984, 231.00839233398438, 457.83734130859375, 265.58831787109375, 540.8322143554688, -70.17605590820312, -21.192218780517578, -9.506311416625977, 166.2770538330078, 298.961669921875, 349.9811706542969, 314.3142395019531, 193.78781127929688, 360.17767333984375, 262.16058349609375, 165.1636505126953, 267.67303466796875, -32.97418212890625, 242.51182556152344, 185.31765747070312, 236.57147216796875, 574.30615234375, 349.0482482910156, 449.3631896972656, 219.75518798828125, -13.80487060546875, 305.3509826660156, 42.75263977050781, 436.21527099609375, 340.1026916503906, 361.5167541503906, 253.15969848632812, 0.6693191528320312, -108.9098892211914, 185.6488037109375, -112.65169525146484, 273.3988037109375, -62.77916717529297, 119.29510498046875, 881.593994140625, 198.9275360107422, 248.56100463867188, 243.71005249023438, 146.3069305419922, 80.72293090820312, 384.1900939941406, 339.7099304199219, 250.38290405273438, 119.85221862792969, 335.99371337890625, 92.49190521240234, 327.4946594238281, -171.14120483398438, 197.01930236816406], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000458.npy"}
{"epoch": 0.6725403817914831, "step": 459, "batch_size": 64, "mean": 202.25750732421875, "std": 199.30384826660156, "min": -305.18524169921875, "p10": -19.045756340026852, "median": 179.25509643554688, "p90": 471.02061767578124, "max": 691.3047485351562, "pos_frac": 0.859375, "sample": [39.20074462890625, 122.23648071289062, 178.50375366210938, 218.04904174804688, -20.902626037597656, 182.64691162109375, -37.32138442993164, 51.94401550292969, 350.72210693359375, 214.8406219482422, 338.9403991699219, 173.99298095703125, 323.5099182128906, 351.97430419921875, -14.71306037902832, 266.97174072265625, 100.56072998046875, 180.00643920898438, 5.199394226074219, 517.5762329101562, -7.2926025390625, 675.7579956054688, 471.06976318359375, -41.8635139465332, 215.38119506835938, 117.13853454589844, 379.6618957519531, 8.520034790039062, 125.44673156738281, 198.49737548828125, 171.13795471191406, 177.763427734375, 490.3638000488281, -305.18524169921875, 691.3047485351562, 200.6761016845703, 665.8138427734375, 231.90945434570312, 141.55731201171875, -33.07086181640625, 110.5132827758789, 74.64546966552734, 193.244873046875, 105.5789794921875, 6.678436279296875, 206.87173461914062, -189.97613525390625, 241.71694946289062, 87.09059143066406, 421.8004150390625, 439.9949645996094, 139.19837951660156, 156.52142333984375, 470.90594482421875, 144.36978149414062, 290.64044189453125, 367.1289978027344, -107.99408721923828, 476.3929138183594, 392.7071533203125, 393.0332946777344, 87.65338134765625, 22.044498443603516, 295.1927490234375], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000459.npy"}
{"epoch": 0.6740088105726872, "step": 460, "batch_size": 64, "mean": 222.14083862304688, "std": 200.023193359375, "min": -232.55941772460938, "p10": -22.895415306091305, "median": 214.82653045654297, "p90": 502.5172149658203, "max": 728.1021118164062, "pos_frac": 0.859375, "sample": [586.7044677734375, 291.13787841796875, 160.07711791992188, 72.295654296875, 57.43183898925781, 432.4306640625, 531.1488037109375, 254.03431701660156, -143.02633666992188, 177.6978302001953, 33.528968811035156, 438.1168212890625, -24.001598358154297, 380.9041442871094, 85.34782409667969, 127.28369140625, 182.81719970703125, 84.69340515136719, -47.17228698730469, 728.1021118164062, 436.30364990234375, 135.1492919921875, 407.8597106933594, 20.98193359375, 498.3666687011719, 293.6419677734375, 680.9073486328125, 245.1017303466797, 168.28567504882812, 23.001708984375, 313.81768798828125, -232.55941772460938, 517.1615600585938, -37.66639709472656, 199.73226928710938, 537.0180053710938, -20.314321517944336, 321.2099914550781, 252.189453125, 140.79605102539062, 370.4206848144531, 312.2073059082031, 210.5824432373047, 298.0631103515625, 275.44024658203125, 17.261507034301758, 184.06019592285156, 163.2021942138672, 241.3387451171875, 229.8451690673828, -53.94480514526367, -6.389503479003906, 59.91997528076172, 129.22555541992188, 24.013046264648438, 188.275146484375, 360.31658935546875, 227.57711791992188, -96.1200942993164, 369.12005615234375, 318.533935546875, 360.16143798828125, 219.07061767578125, 504.2960205078125], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000460.npy"}
{"epoch": 0.6754772393538914, "step": 461, "batch_size": 64, "mean": 168.80178833007812, "std": 206.18136596679688, "min": -268.00457763671875, "p10": -61.88219680786133, "median": 154.16116333007812, "p90": 424.36202087402353, "max": 682.759765625, "pos_frac": 0.796875, "sample": [70.23442077636719, -210.1035919189453, 283.2845153808594, 104.94064331054688, 146.6588134765625, 362.1330871582031, -268.00457763671875, 370.5793762207031, 35.846527099609375, 207.17201232910156, 55.745506286621094, -65.80420684814453, 188.92230224609375, 332.0306701660156, 621.8214111328125, 161.66351318359375, 555.7771606445312, -15.871793746948242, 33.06544494628906, 76.03077697753906, 62.99763488769531, 302.5162048339844, 329.0590515136719, 614.8631591796875, 11.677780151367188, -41.00250244140625, 638.2936401367188, 682.759765625, 165.04248046875, -40.73468017578125, 295.5913391113281, 140.17970275878906, 351.4179992675781, 18.254531860351562, 226.679931640625, 103.12190246582031, -111.09564971923828, -30.549285888671875, -62.36726379394531, 137.79257202148438, 459.5933837890625, -57.43329620361328, 288.87725830078125, 253.95632934570312, -119.08943176269531, 434.01507568359375, 193.15281677246094, 381.511962890625, 22.05322265625, 207.17877197265625, 401.8382263183594, 66.11864471435547, 71.21481323242188, 183.451416015625, 260.3829040527344, -60.75037384033203, 6.308319091796875, 182.38803100585938, 189.69708251953125, 134.92303466796875, 57.35240936279297, -105.38417053222656, 237.48646545410156, 273.85113525390625], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000461.npy"}
{"epoch": 0.6769456681350955, "step": 462, "batch_size": 64, "mean": 160.3841552734375, "std": 183.43466186523438, "min": -158.648681640625, "p10": -38.99123153686523, "median": 156.41156005859375, "p90": 372.75897216796875, "max": 718.2456665039062, "pos_frac": 0.765625, "sample": [169.43551635742188, -12.881553649902344, -5.183403015136719, 33.11824035644531, 414.2221984863281, 255.58987426757812, 38.247344970703125, 182.2586669921875, 355.48577880859375, 281.46234130859375, 213.991455078125, -140.936279296875, -50.89189529418945, -105.75857543945312, 102.28191375732422, 323.56329345703125, 373.46063232421875, 196.46737670898438, -11.55633544921875, 155.6363525390625, -9.206398010253906, 234.115478515625, 197.37721252441406, -158.648681640625, 37.05159378051758, 26.877227783203125, 325.65716552734375, 34.39591979980469, 40.42386245727539, 27.432178497314453, 697.75048828125, 173.1219024658203, 69.42314147949219, 200.1017608642578, 371.12176513671875, 718.2456665039062, -2.8644466400146484, -57.09900665283203, 228.13638305664062, 325.6473388671875, -27.29747200012207, -33.41545104980469, 236.27561950683594, 158.47674560546875, 100.03195190429688, -2.2831573486328125, 199.16058349609375, 375.11077880859375, -41.38085174560547, 152.80606079101562, 62.902740478515625, 284.10003662109375, 593.9361572265625, 259.4737548828125, 379.1123962402344, 124.15325927734375, 315.57403564453125, 265.1191101074219, 84.68499755859375, 311.2100830078125, 87.98408508300781, 157.186767578125, -111.10301208496094, 85.69346618652344], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000462.npy"}
{"epoch": 0.6784140969162996, "step": 463, "batch_size": 64, "mean": 187.98629760742188, "std": 214.18917846679688, "min": -193.17822265625, "p10": -65.77043609619139, "median": 175.9860076904297, "p90": 434.4525451660156, "max": 722.7108764648438, "pos_frac": 0.78125, "sample": [18.787033081054688, 361.7880554199219, -145.5891876220703, 120.48199462890625, 91.89228057861328, 349.271728515625, -17.97992706298828, -84.5511245727539, 272.346923828125, -33.00523376464844, 374.0517578125, -45.654205322265625, 302.709716796875, 427.9757080078125, 93.42034912109375, 216.80996704101562, 235.74185180664062, 180.8948516845703, 171.07716369628906, 134.6641387939453, 135.86070251464844, 621.4976196289062, 58.054359436035156, 139.45834350585938, 423.7327880859375, 331.8480224609375, 45.39755630493164, -33.839317321777344, -11.859161376953125, 300.7764587402344, -155.12380981445312, -26.248817443847656, -89.48703002929688, 0.5884475708007812, 593.2206420898438, -193.17822265625, 15.855575561523438, 78.04376220703125, 265.3763122558594, 16.267566680908203, 343.0409851074219, 283.6363220214844, 437.22833251953125, 212.21047973632812, 474.671142578125, 559.7816772460938, 57.607269287109375, 217.24301147460938, 399.0368957519531, -74.39167785644531, 291.4892272949219, 722.7108764648438, 182.81748962402344, 411.43756103515625, -157.17559814453125, -27.85107421875, 283.36865234375, 127.18096923828125, 37.61698913574219, 243.54562377929688, 282.3477478027344, 401.59027099609375, 684.25390625, 96.35043334960938], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000463.npy"}
{"epoch": 0.6798825256975036, "step": 464, "batch_size": 64, "mean": 263.5333557128906, "std": 235.31239318847656, "min": -237.82101440429688, "p10": 4.853813362121589, "median": 255.7038116455078, "p90": 529.3572753906251, "max": 1052.26953125, "pos_frac": 0.90625, "sample": [-139.5850067138672, 518.5177001953125, 241.54364013671875, 77.48019409179688, 123.87103271484375, 514.383544921875, 219.5832061767578, 1.722860336303711, 118.15579223632812, 268.66357421875, 429.413818359375, 406.0049133300781, 37.366851806640625, 435.2029724121094, 254.114990234375, 340.20318603515625, 486.3858642578125, 109.04450988769531, 90.521240234375, 485.10638427734375, 475.1964111328125, 283.499267578125, 544.0348510742188, -125.78410339355469, 570.556640625, 121.6185073852539, 257.2926330566406, 480.62774658203125, -64.11568450927734, 118.65637969970703, 177.37408447265625, 337.30615234375, 266.9669189453125, 35.33475112915039, 120.48641204833984, 342.3344421386719, 438.50341796875, 282.7340087890625, 41.9215087890625, 103.64764404296875, 83.50274658203125, 228.222900390625, 427.904052734375, 101.13097381591797, 87.44398498535156, 410.4551086425781, -223.60848999023438, 214.19369506835938, 12.159370422363281, 534.0028076171875, 557.4732666015625, 722.6844482421875, 1052.26953125, 403.096923828125, 173.74639892578125, 461.3891296386719, 466.72125244140625, -24.68999481201172, 557.1677856445312, 411.73419189453125, 366.2823791503906, -237.82101440429688, 237.4800567626953, 17.299720764160156], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000464.npy"}
{"epoch": 0.6813509544787077, "step": 465, "batch_size": 64, "mean": 240.72128295898438, "std": 206.53594970703125, "min": -194.4801025390625, "p10": -10.469570159912106, "median": 230.637451171875, "p90": 478.49668884277344, "max": 787.935791015625, "pos_frac": 0.875, "sample": [479.136962890625, 437.4277038574219, 208.080810546875, 134.1500701904297, -97.61011505126953, 53.751251220703125, 344.70263671875, 61.021942138671875, 111.43988800048828, 245.19253540039062, 306.7308044433594, 150.85809326171875, 134.18408203125, 88.01726531982422, 477.0027160644531, 595.8967895507812, 288.5509033203125, 344.2137756347656, 321.50408935546875, 143.34829711914062, -127.9304428100586, 473.45416259765625, 521.7304077148438, 230.86856079101562, 787.935791015625, -19.626968383789062, 353.55706787109375, 58.79425048828125, 86.62383270263672, 218.07025146484375, 376.3116455078125, 362.87921142578125, 109.76480865478516, 170.59909057617188, 145.76361083984375, 450.56378173828125, -12.152183532714844, 214.38916015625, 44.71713638305664, 295.2049255371094, 409.0682067871094, 209.72329711914062, 198.62173461914062, -194.4801025390625, 230.40634155273438, 95.99211120605469, 108.82664489746094, 394.18878173828125, 60.64308547973633, 279.0141906738281, 585.416748046875, -62.35797119140625, 672.7554931640625, -170.66152954101562, 624.310302734375, 301.89288330078125, -6.5434722900390625, 241.57208251953125, 291.41961669921875, 358.8319091796875, 36.69267272949219, 402.9124755859375, 465.50799560546875, 303.29052734375], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000465.npy"}
{"epoch": 0.6828193832599119, "step": 466, "batch_size": 64, "mean": 203.26609802246094, "std": 217.36859130859375, "min": -404.0133361816406, "p10": -46.50006408691405, "median": 204.53009033203125, "p90": 497.3258758544923, "max": 749.9016723632812, "pos_frac": 0.8125, "sample": [313.54693603515625, -50.81739807128906, 172.2559814453125, -71.7455062866211, 310.68756103515625, 350.7706298828125, 329.34765625, 290.3299560546875, 327.0149841308594, -17.905410766601562, 77.78748321533203, 233.27005004882812, -89.83312225341797, 565.7760009765625, -7.167507171630859, 230.16873168945312, 205.9651641845703, 52.31178665161133, 267.3342590332031, 252.692138671875, 235.19415283203125, 161.64102172851562, 207.44728088378906, 251.43504333496094, 568.4342041015625, 166.00033569335938, 134.70407104492188, 467.3789367675781, 203.0950164794922, 361.3966979980469, 155.9803466796875, 197.02188110351562, 289.90545654296875, 589.2841186523438, -286.4628601074219, 749.9016723632812, 245.5800323486328, 538.9793701171875, 309.17425537109375, 5.426704406738281, 325.4107666015625, 5.365467071533203, 159.98727416992188, 184.52496337890625, 403.2969970703125, 687.9944458007812, -52.69632339477539, -159.68472290039062, -2.7162551879882812, 96.39297485351562, 423.1986083984375, 201.54940795898438, 143.24908447265625, 74.8368148803711, 388.4483947753906, 328.46771240234375, -404.0133361816406, 75.69216918945312, 510.1602783203125, -36.42628479003906, 264.2877502441406, -25.728233337402344, 117.6379623413086, 6.486358642578125], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000466.npy"}
{"epoch": 0.684287812041116, "step": 467, "batch_size": 64, "mean": 228.0748291015625, "std": 268.163330078125, "min": -394.5601806640625, "p10": -104.59171142578124, "median": 218.75282287597656, "p90": 595.5617675781252, "max": 798.409423828125, "pos_frac": 0.8125, "sample": [461.75421142578125, 620.611328125, 468.39617919921875, 537.11279296875, -236.41542053222656, -394.5601806640625, 258.5097351074219, 311.4171447753906, 102.01174926757812, 163.45816040039062, 241.02035522460938, 345.9854431152344, 627.9539794921875, 23.206464767456055, 331.37872314453125, 420.4091796875, 114.4703140258789, 177.21434020996094, 685.2494506835938, 469.336181640625, 426.6065673828125, 65.88816833496094, 183.96551513671875, 328.6872863769531, -121.49165344238281, -71.2955322265625, -3.3406219482421875, 294.3731689453125, 724.9779052734375, 267.4493408203125, 481.2670593261719, 130.2007598876953, 334.4231262207031, -70.6032485961914, 528.5116577148438, 58.210296630859375, 687.6951904296875, 39.129940032958984, 117.468505859375, 416.4024658203125, 184.44296264648438, -280.57421875, 89.77716827392578, 4.13836669921875, 189.84486389160156, 274.9715881347656, -106.9002685546875, -99.205078125, -154.711669921875, 494.1256408691406, 360.27166748046875, 399.499755859375, 347.6969299316406, 12.88076400756836, 798.409423828125, 196.48529052734375, 388.9078369140625, 170.08370971679688, -93.06199645996094, 742.804443359375, -229.92196655273438, 58.334529876708984, 34.70336151123047, 266.73919677734375], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000467.npy"}
{"epoch": 0.6857562408223201, "step": 468, "batch_size": 64, "mean": 228.93344116210938, "std": 232.2508087158203, "min": -239.50148010253906, "p10": -10.626161193847652, "median": 212.16583251953125, "p90": 528.9257202148438, "max": 786.0404052734375, "pos_frac": 0.84375, "sample": [387.1529846191406, -90.75968170166016, 749.4207153320312, 259.9560546875, 277.4405822753906, 520.7097778320312, 237.2566375732422, 459.3160705566406, 381.89520263671875, 442.5445861816406, 322.7673034667969, 182.6526641845703, 77.58375549316406, 101.37178802490234, -6.864957809448242, 125.17974853515625, 94.32963562011719, 81.09595489501953, 320.86431884765625, 151.62802124023438, 141.72756958007812, 23.425212860107422, -7.061347961425781, 201.28085327148438, 291.9858093261719, 345.5881652832031, 213.212890625, 670.8328857421875, 598.25830078125, 87.72750854492188, -239.50148010253906, -61.74546813964844, 443.8597106933594, 228.9530029296875, 367.8489074707031, 235.1271514892578, 211.1187744140625, 175.20098876953125, 482.1907958984375, -231.25732421875, 721.9864501953125, 56.79309844970703, 123.01179504394531, 239.62863159179688, 284.92144775390625, 341.8958435058594, 80.01275634765625, 48.687835693359375, 3.4029998779296875, 786.0404052734375, 139.26016235351562, 582.2437744140625, 169.33016967773438, 295.4319763183594, -3.2981491088867188, 453.93341064453125, 397.1161804199219, 532.4468383789062, 399.4701843261719, 18.358154296875, 64.99671936035156, -183.39596557617188, -142.69480895996094, -12.153938293457031], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000468.npy"}
{"epoch": 0.6872246696035242, "step": 469, "batch_size": 64, "mean": 246.43438720703125, "std": 223.6728973388672, "min": -214.2534942626953, "p10": -4.342338562011714, "median": 191.56819915771484, "p90": 596.0020935058594, "max": 774.7147216796875, "pos_frac": 0.875, "sample": [448.288818359375, 264.3959655761719, 141.94467163085938, 139.89341735839844, 502.231689453125, 163.54713439941406, 442.5530700683594, 168.439208984375, 88.64128112792969, 164.7032470703125, 311.0194396972656, 89.08280944824219, 305.1333923339844, 97.89639282226562, 774.7147216796875, 152.02386474609375, 449.8008117675781, 355.3346252441406, 336.87823486328125, 381.9275207519531, 659.33203125, 288.0030212402344, 130.0485076904297, 264.68743896484375, 770.668701171875, -6.11834716796875, 351.9185791015625, 146.60794067382812, 617.5868530273438, -184.82037353515625, 177.99234008789062, 131.90061950683594, 233.58587646484375, 16.55330467224121, 303.8128662109375, 33.29167938232422, 114.9459457397461, -32.734039306640625, 0.9730224609375, -56.77751159667969, 244.06137084960938, 87.1873550415039, 587.52685546875, -23.325000762939453, -0.1983184814453125, 200.0267333984375, 374.2356262207031, 454.01654052734375, 194.21543884277344, -72.23119354248047, 415.5406494140625, 599.6343383789062, 254.38595581054688, 757.9015502929688, 605.579833984375, 454.6275634765625, 82.74275970458984, 301.1011047363281, 164.89317321777344, 176.16415405273438, 188.92095947265625, 137.7816619873047, 61.35630798339844, -214.2534942626953], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000469.npy"}
{"epoch": 0.6886930983847284, "step": 470, "batch_size": 64, "mean": 242.15528869628906, "std": 223.19879150390625, "min": -534.05712890625, "p10": -7.781984710693354, "median": 202.063232421875, "p90": 534.6554290771486, "max": 744.80615234375, "pos_frac": 0.875, "sample": [137.35418701171875, 281.65533447265625, 148.98583984375, 108.64374542236328, 201.24566650390625, 81.12592315673828, 207.76400756835938, 80.9655532836914, 70.77426147460938, -96.62944030761719, 306.00848388671875, 127.10630798339844, 417.7395935058594, 147.0537872314453, 281.2768859863281, 545.7005615234375, 27.458662033081055, 639.4384155273438, -534.05712890625, 392.89044189453125, 194.97393798828125, 307.99530029296875, 128.88510131835938, -50.399986267089844, 186.72857666015625, 103.48379516601562, 24.48685073852539, 291.4011535644531, 214.590576171875, 108.8252182006836, 420.45635986328125, -85.47709655761719, 562.0477294921875, 136.70176696777344, 35.128807067871094, 457.8739013671875, 202.88079833984375, 503.995361328125, 638.8028564453125, 368.220458984375, 416.60235595703125, 154.98507690429688, 424.0013732910156, 413.6363830566406, -25.254974365234375, 671.1398315429688, 298.9798278808594, 165.77711486816406, 508.8834533691406, 476.968994140625, 198.79559326171875, -2.5278244018554688, 186.08966064453125, 246.8831787109375, 407.8731689453125, 583.18115234375, -33.68931579589844, 744.80615234375, 350.6176452636719, -10.033767700195312, 477.7618103027344, 262.46624755859375, 87.67403411865234, 168.21755981445312], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000470.npy"}
{"epoch": 0.6901615271659325, "step": 471, "batch_size": 64, "mean": 199.71197509765625, "std": 203.51272583007812, "min": -238.65721130371094, "p10": -33.59572067260742, "median": 207.72998809814453, "p90": 413.4270446777344, "max": 822.1947021484375, "pos_frac": 0.796875, "sample": [516.9970092773438, -11.085708618164062, 192.996337890625, 438.7658386230469, 9.06982421875, 349.2183837890625, 348.0377197265625, 796.8418579101562, 373.1168212890625, 96.11669921875, 185.3126678466797, 214.0438232421875, 252.2432098388672, 221.93746948242188, 320.7587890625, 11.76827621459961, -66.37911987304688, -88.80680847167969, 215.23196411132812, 385.0680847167969, 822.1947021484375, 99.76643371582031, 131.74301147460938, 363.55712890625, 334.4205627441406, -9.975652694702148, -14.054168701171875, -3.431333541870117, 26.89856719970703, 356.1451721191406, -41.749717712402344, 504.1318359375, 163.75466918945312, 403.1671447753906, 165.5768585205078, 424.823486328125, 226.84292602539062, 145.0137481689453, 340.98236083984375, 414.0516357421875, 120.35599517822266, 7.893760681152344, 206.0733184814453, 70.32865905761719, 343.3104248046875, 411.96966552734375, 190.4495849609375, -178.99948120117188, 302.14886474609375, 29.11163330078125, -238.65721130371094, 209.38665771484375, -32.85637664794922, 308.6336669921875, 38.37586975097656, 232.55877685546875, -41.12010955810547, -33.91258239746094, 282.5555419921875, 227.24111938476562, 200.3408660888672, 313.14447021484375, -29.941560745239258, 228.06150817871094], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000471.npy"}
{"epoch": 0.6916299559471366, "step": 472, "batch_size": 64, "mean": 248.52740478515625, "std": 253.4379119873047, "min": -141.44631958007812, "p10": -12.540178680419896, "median": 186.11627197265625, "p90": 610.172821044922, "max": 951.1907958984375, "pos_frac": 0.890625, "sample": [-94.92918395996094, -59.37898254394531, 394.7295837402344, -92.06393432617188, 38.91508483886719, 221.49246215820312, 218.80026245117188, -67.05108642578125, 121.55355834960938, -141.44631958007812, 190.1590118408203, 203.479248046875, 84.66133117675781, 277.2272033691406, 73.49398803710938, 636.1749267578125, 84.27973937988281, 262.45513916015625, 22.2835693359375, 180.23797607421875, 587.2239379882812, 111.52189636230469, 244.5684814453125, 37.239295959472656, 130.498046875, 12.930252075195312, 133.17996215820312, 103.08817291259766, 167.6639404296875, 513.0885620117188, 27.727874755859375, 447.8097839355469, 241.58059692382812, 475.8642272949219, 951.1907958984375, 529.568359375, 182.0735321044922, 844.8284912109375, 267.5535888671875, 55.744110107421875, 478.34039306640625, 470.25299072265625, 71.34486389160156, 192.53257751464844, -60.12959289550781, 742.87548828125, 884.1214599609375, 205.0083770751953, 620.008056640625, 95.49476623535156, 793.142578125, 16.147659301757812, 101.88368225097656, -23.456077575683594, 314.4010314941406, 344.66510009765625, 488.02886962890625, 465.6413269042969, 86.13510131835938, 288.2166748046875, 153.7849884033203, 79.73295593261719, 302.0994567871094, 175.4626922607422], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000472.npy"}
{"epoch": 0.6930983847283406, "step": 473, "batch_size": 64, "mean": 172.9485321044922, "std": 256.728515625, "min": -522.6921997070312, "p10": -138.87391738891594, "median": 192.85015869140625, "p90": 407.2226165771484, "max": 1153.03466796875, "pos_frac": 0.765625, "sample": [-49.129188537597656, 150.8636474609375, 321.31976318359375, 365.61151123046875, -59.64437484741211, 295.848876953125, 83.04754638671875, 139.34780883789062, 322.861083984375, 140.0811004638672, 440.59423828125, 406.8276062011719, 158.65611267089844, 51.76031494140625, -5.04205322265625, -63.464210510253906, 312.01226806640625, 52.89973449707031, 285.1246643066406, 308.61138916015625, -169.49627685546875, 225.5880584716797, 368.0737609863281, 207.2418670654297, 222.09332275390625, 50.776100158691406, 299.00341796875, 449.86065673828125, 200.54647827148438, 185.15383911132812, -342.1123962402344, 171.07095336914062, -244.70004272460938, 85.63703918457031, -522.6921997070312, -194.6442413330078, 1153.03466796875, 282.69891357421875, 311.5926818847656, 661.22216796875, 270.8470458984375, 256.4735412597656, 367.0755615234375, 310.205078125, 180.30470275878906, 180.4512939453125, -25.526084899902344, -354.1584167480469, 14.084854125976562, 345.6153869628906, 407.39190673828125, 337.9281005859375, -67.42174530029297, 30.007789611816406, 212.39584350585938, 4.450767517089844, 521.6773071289062, -236.6808624267578, 296.3839111328125, 150.36441040039062, -38.43855285644531, 383.58319091796875, -2.8230953216552734, 466.3775634765625], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000473.npy"}
{"epoch": 0.6945668135095447, "step": 474, "batch_size": 64, "mean": 179.43878173828125, "std": 227.61537170410156, "min": -446.4132080078125, "p10": -83.49398345947266, "median": 179.07029724121094, "p90": 461.3204803466798, "max": 820.6637573242188, "pos_frac": 0.796875, "sample": [-181.92384338378906, 177.81710815429688, 57.281925201416016, 375.66796875, 326.1634216308594, 75.39163970947266, 520.5223388671875, -33.9925537109375, 222.46713256835938, 431.89593505859375, 379.2856750488281, 497.53173828125, 15.71414566040039, 47.282508850097656, 54.091270446777344, 337.0226745605469, 114.66189575195312, -3.4179306030273438, -153.19085693359375, 185.97576904296875, 36.80995178222656, 326.8529357910156, -80.16834259033203, 18.36019515991211, -62.774452209472656, 312.6075134277344, 168.11825561523438, 180.323486328125, 341.39349365234375, 241.9566192626953, 195.97161865234375, -52.165767669677734, 169.5601806640625, 30.58391571044922, 316.7269287109375, 166.0637969970703, 373.8728942871094, 101.1015625, 34.60615539550781, -137.2452392578125, 249.4737091064453, 156.3721923828125, 363.41766357421875, 69.73638916015625, 170.01678466796875, 125.34054565429688, 356.5837097167969, 663.3179321289062, -104.64424133300781, -362.5549621582031, 326.1953125, 500.77593994140625, 201.93411254882812, 363.40380859375, 473.9309997558594, 250.53602600097656, 264.72369384765625, -84.91925811767578, 336.8789367675781, -42.9859619140625, 217.07330322265625, 486.42138671875, -446.4132080078125, 820.6637573242188], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000474.npy"}
{"epoch": 0.6960352422907489, "step": 475, "batch_size": 64, "mean": 239.89425659179688, "std": 210.96414184570312, "min": -170.1549835205078, "p10": -35.12461738586425, "median": 233.9521026611328, "p90": 517.5631774902344, "max": 803.55859375, "pos_frac": 0.859375, "sample": [81.21234130859375, -25.623729705810547, 334.6571350097656, 189.53651428222656, 45.22308349609375, 345.1653747558594, 713.022216796875, 116.24694061279297, 571.7323608398438, 515.2503662109375, 226.66250610351562, 223.06967163085938, 313.16412353515625, 60.32330322265625, 231.46412658691406, 11.884122848510742, 484.1245422363281, 158.6810760498047, 255.3619842529297, 13.236913681030273, 207.11363220214844, 236.44007873535156, 307.5732116699219, 76.25006103515625, -46.27585220336914, 268.5415954589844, 260.4166259765625, 296.3381042480469, 73.03627014160156, 316.91070556640625, 115.57929229736328, 374.1028747558594, -170.1549835205078, 426.1178283691406, 278.61444091796875, 518.5543823242188, 273.3443908691406, 328.4339599609375, 149.74330139160156, 685.588134765625, 184.1544189453125, -3.5591773986816406, 215.78524780273438, -119.42216491699219, 402.20989990234375, 170.6819305419922, 65.64842987060547, -47.76033020019531, 297.39715576171875, 399.6177978515625, -44.083831787109375, 264.2898254394531, 437.26922607421875, 8.744232177734375, 537.8585205078125, -39.19642639160156, 198.9426727294922, 803.55859375, 382.4416198730469, 609.3134765625, 124.18484497070312, 438.7550048828125, -120.16810607910156, 345.9067077636719], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000475.npy"}
{"epoch": 0.697503671071953, "step": 476, "batch_size": 64, "mean": 208.7927703857422, "std": 198.04208374023438, "min": -208.00048828125, "p10": 0.01492500305176847, "median": 176.34130096435547, "p90": 514.2920227050782, "max": 729.890869140625, "pos_frac": 0.890625, "sample": [452.9228210449219, -43.328575134277344, 17.480932235717773, 159.7571563720703, 282.1449279785156, 249.47903442382812, 531.4468994140625, 308.3525695800781, 119.27029418945312, -163.04022216796875, 604.2669067382812, 45.50763702392578, 563.7997436523438, 396.0496826171875, 413.33184814453125, 373.37896728515625, 348.57659912109375, 44.403831481933594, 214.70437622070312, 201.21188354492188, 219.03915405273438, 264.521728515625, 33.47520446777344, 411.07293701171875, 130.2312774658203, 219.52561950683594, -208.00048828125, 12.232963562011719, 78.42547607421875, 48.059242248535156, 12.3275146484375, 114.45928955078125, -4.448116302490234, 10.428688049316406, 346.70306396484375, 92.74589538574219, 100.71875762939453, -24.808382034301758, 248.8905029296875, 126.6358871459961, 55.934757232666016, 328.50286865234375, 203.88427734375, -85.92622375488281, 69.54125213623047, 114.49846649169922, 474.26397705078125, 557.8746948242188, 729.890869140625, 532.1844482421875, 158.70936584472656, 143.13856506347656, 192.92544555664062, 297.5194396972656, 128.30287170410156, 76.89425659179688, 275.01348876953125, 470.938720703125, 108.26921081542969, 304.8313903808594, 543.137451171875, 262.5540466308594, 125.47352600097656, -57.57328414916992], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000476.npy"}
{"epoch": 0.6989720998531571, "step": 477, "batch_size": 64, "mean": 194.69967651367188, "std": 213.29856872558594, "min": -450.8719787597656, "p10": -47.80071792602538, "median": 191.1541290283203, "p90": 464.19966125488287, "max": 691.58447265625, "pos_frac": 0.859375, "sample": [-54.06640625, 299.31304931640625, 305.052001953125, -33.18077850341797, 440.8638000488281, 301.0841064453125, 227.78292846679688, 255.13027954101562, 193.9344482421875, 329.1766052246094, 107.47284698486328, 103.6591567993164, 214.86839294433594, 434.19403076171875, 513.0614013671875, 16.757455825805664, 224.11952209472656, 240.0361328125, 660.784423828125, 141.50848388671875, 226.20169067382812, 135.46728515625, 17.044418334960938, 202.41421508789062, -72.60592651367188, 262.58270263671875, 165.93966674804688, 188.37380981445312, -67.86154174804688, 48.99041748046875, -300.878173828125, 237.13980102539062, -450.8719787597656, -146.88955688476562, 612.8211669921875, 130.48043823242188, 153.50613403320312, 275.3118591308594, 371.24365234375, 150.38937377929688, 402.95404052734375, 120.307861328125, 345.3076171875, 520.2861938476562, 17.379608154296875, 83.03719329833984, 152.29293823242188, 481.452392578125, 288.0930480957031, 166.07064819335938, 471.7514343261719, -124.8165512084961, 40.401161193847656, 187.61004638671875, 3.1851959228515625, 396.7348937988281, 226.24099731445312, -30.009674072265625, 60.989471435546875, 46.107967376708984, 47.134178161621094, 359.75384521484375, 691.58447265625, 446.578857421875], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000477.npy"}
{"epoch": 0.7004405286343612, "step": 478, "batch_size": 64, "mean": 219.52688598632812, "std": 205.8341064453125, "min": -109.665771484375, "p10": -7.1115127563476275, "median": 186.97076416015625, "p90": 443.0913452148438, "max": 891.286376953125, "pos_frac": 0.890625, "sample": [33.14210510253906, 119.99971008300781, 41.74475860595703, 132.77719116210938, 123.69481658935547, 150.34022521972656, 226.70123291015625, -19.404495239257812, -32.53626251220703, 50.90167236328125, 46.06498718261719, 225.53546142578125, -47.557472229003906, 208.13807678222656, 49.764259338378906, 428.00909423828125, 275.87860107421875, 702.4718017578125, 148.88172912597656, 201.79583740234375, -41.72529220581055, 53.172149658203125, -109.665771484375, 283.9607849121094, 304.78460693359375, 688.319091796875, 393.5955810546875, 105.88960266113281, -33.063751220703125, 163.53318786621094, 891.286376953125, 448.6160583496094, 291.92791748046875, 192.57821655273438, 279.1011962890625, 394.68304443359375, 430.2003479003906, 255.43331909179688, 275.4203796386719, 95.63191223144531, 330.3582458496094, 110.56800079345703, 40.935546875, -48.01384735107422, 514.714599609375, 227.2681884765625, 186.75946044921875, 685.0325317382812, 119.66349792480469, 223.89027404785156, 378.4946594238281, 129.11386108398438, 255.91412353515625, 304.996337890625, 47.04563903808594, 149.80221557617188, 370.3775939941406, 22.809219360351562, 716.2247924804688, 160.14346313476562, 187.18206787109375, 134.89122009277344, 349.9540710449219, 21.572113037109375], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000478.npy"}
{"epoch": 0.7019089574155654, "step": 479, "batch_size": 64, "mean": 181.81784057617188, "std": 191.73577880859375, "min": -521.2464599609375, "p10": -4.604800033569318, "median": 166.83212280273438, "p90": 435.14086303710945, "max": 640.9283447265625, "pos_frac": 0.890625, "sample": [115.4698257446289, 179.45358276367188, 493.6672058105469, 128.78549194335938, 19.073867797851562, 392.302001953125, 13.405040740966797, -74.4431381225586, 400.9644775390625, 15.207426071166992, 303.6971435546875, 61.483211517333984, 40.25440216064453, 280.08062744140625, 461.7635192871094, 173.62847900390625, 255.55116271972656, -12.32330322265625, 52.493919372558594, 47.86197280883789, 27.24697494506836, 402.83270263671875, 42.84748840332031, 129.59005737304688, 490.0914306640625, 506.5102233886719, 379.84527587890625, 309.0216979980469, 369.407958984375, 52.76263427734375, 236.67617797851562, 245.3759765625, 273.9324951171875, 320.3858642578125, -89.68619537353516, 449.4166259765625, -67.89529418945312, 82.31342315673828, 177.99191284179688, 134.4712371826172, 275.89776611328125, 421.97113037109375, 327.48199462890625, 235.71035766601562, 63.89643859863281, -521.2464599609375, 178.85272216796875, 110.1295166015625, 19.02857208251953, 98.75534057617188, 123.53460693359375, 19.368125915527344, -76.8551025390625, -83.91101837158203, 208.9539337158203, 105.54553985595703, 358.6136474609375, 129.87112426757812, 160.0357666015625, 640.9283447265625, 193.9668426513672, 440.7850341796875, 267.68365478515625, 115.82862091064453], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000479.npy"}
{"epoch": 0.7033773861967695, "step": 480, "batch_size": 64, "mean": 168.90713500976562, "std": 217.46334838867188, "min": -326.1108093261719, "p10": -111.43830871582031, "median": 153.2289581298828, "p90": 459.09882812500007, "max": 695.1116943359375, "pos_frac": 0.78125, "sample": [-41.30116653442383, 172.8257598876953, -105.77847290039062, 97.57940673828125, 89.08889770507812, 274.0440673828125, 240.21697998046875, 438.2085876464844, 107.12564849853516, 386.33795166015625, 141.22979736328125, 573.6968994140625, -27.355606079101562, -91.89274597167969, 108.7686767578125, -159.35562133789062, 118.53300476074219, 317.3971862792969, 191.831298828125, 516.612060546875, 107.3170394897461, -135.92649841308594, 13.30523681640625, 241.02667236328125, 152.9493408203125, 168.2677459716797, 315.69940185546875, -36.39567565917969, 380.7303771972656, -178.0480499267578, 468.0517883300781, 535.5532836914062, 407.4074401855469, 120.96022033691406, -326.1108093261719, 243.31719970703125, 335.94781494140625, -101.14317321777344, 291.07916259765625, 120.89727783203125, -165.87327575683594, -320.414794921875, 400.897705078125, 270.88128662109375, 695.1116943359375, 121.66837310791016, 257.48370361328125, 419.629150390625, 18.73031997680664, 25.488130569458008, -1.3654937744140625, -113.86395263671875, 128.42529296875, 195.54678344726562, 153.50857543945312, 125.76670837402344, 210.72900390625, 58.35549545288086, 303.0223388671875, 505.26611328125, 360.2230224609375, 20.866262435913086, 191.9808807373047, 475.2948913574219], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000480.npy"}
{"epoch": 0.7048458149779736, "step": 481, "batch_size": 64, "mean": 210.886962890625, "std": 205.94589233398438, "min": -218.53060913085938, "p10": -56.79589080810543, "median": 204.08621215820312, "p90": 445.100131225586, "max": 709.4593505859375, "pos_frac": 0.859375, "sample": [447.9263000488281, -217.6525421142578, 438.5057373046875, 134.4515380859375, 142.76829528808594, 87.28028869628906, -23.929397583007812, 229.05734252929688, 293.7966613769531, 678.8499145507812, 49.51708221435547, -172.84014892578125, 59.20069885253906, -106.16717529296875, 249.73292541503906, 189.52334594726562, 56.37239074707031, 342.9365234375, 569.49658203125, -19.014450073242188, 151.8673095703125, 338.6399230957031, 226.35968017578125, 322.2520446777344, 171.08087158203125, 117.89713287353516, 176.5477294921875, 375.00970458984375, 65.41305541992188, -128.33633422851562, -70.88153076171875, 362.76678466796875, 145.38958740234375, -218.53060913085938, 423.2950134277344, 255.666015625, 17.715980529785156, 203.0736846923828, 416.2739562988281, 245.07566833496094, 118.15863037109375, 205.09873962402344, 289.5628662109375, 493.50518798828125, 31.346473693847656, 113.63152313232422, 428.1150207519531, 157.32821655273438, 709.4593505859375, 542.8553466796875, 275.736572265625, 16.493392944335938, 115.73776245117188, 326.9914855957031, 183.52496337890625, 314.056640625, -102.609619140625, 249.79226684570312, 424.9282531738281, 307.2614440917969, 617.9168701171875, 103.42774963378906, 215.1830596923828, 332.87603759765625], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000481.npy"}
{"epoch": 0.7063142437591777, "step": 482, "batch_size": 64, "mean": 154.00546264648438, "std": 211.45213317871094, "min": -566.7672119140625, "p10": -91.68539428710932, "median": 150.9571304321289, "p90": 436.73386230468753, "max": 764.6431884765625, "pos_frac": 0.84375, "sample": [279.0702209472656, -120.66746520996094, 263.1888122558594, -2.684906005859375, 410.490234375, 447.1216125488281, -116.98782348632812, 451.58721923828125, 189.1248321533203, 71.89654541015625, 154.54275512695312, 147.53207397460938, 5.3613433837890625, 66.26873779296875, 524.519775390625, 385.24713134765625, 216.00030517578125, 222.38902282714844, 144.8399200439453, 33.09138107299805, -25.724353790283203, 26.294647216796875, 154.38218688964844, 392.89453125, -202.3191680908203, 92.8616943359375, 54.80657196044922, 370.79437255859375, 289.92779541015625, 3.947551727294922, 242.26718139648438, 281.89532470703125, 446.6014099121094, 175.40292358398438, -132.65164184570312, 115.86038208007812, 240.83563232421875, 27.270416259765625, 529.6985473632812, -205.03237915039062, 335.8450622558594, 159.2446746826172, 29.454259872436523, 57.650856018066406, 227.52552795410156, 45.285369873046875, -566.7672119140625, 304.31732177734375, 764.6431884765625, 78.75142669677734, -32.646392822265625, 64.18852233886719, 48.09027099609375, 164.5574188232422, -156.53497314453125, 225.7366943359375, 115.52455139160156, 427.91046142578125, 1.485107421875, 165.22071838378906, 70.51058197021484, 74.9155502319336, 440.51531982421875, 158.97894287109375], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000482.npy"}
{"epoch": 0.7077826725403817, "step": 483, "batch_size": 64, "mean": 245.37890625, "std": 221.8968048095703, "min": -111.13751220703125, "p10": -32.12652149200439, "median": 229.82573699951172, "p90": 520.986199951172, "max": 1049.081298828125, "pos_frac": 0.859375, "sample": [-32.46882629394531, 82.38925170898438, 440.91571044921875, 445.99176025390625, -55.64512634277344, 397.9434814453125, 256.13446044921875, 169.8421630859375, 495.35369873046875, 442.57623291015625, 400.51727294921875, 378.43603515625, 291.39117431640625, 21.373306274414062, 22.986412048339844, 192.9708251953125, 591.8733520507812, 230.9099884033203, -44.065208435058594, 270.98065185546875, 36.786766052246094, -31.327810287475586, 531.9715576171875, 636.7129516601562, 214.91009521484375, 135.86691284179688, 658.735107421875, 277.6331787109375, 89.87165832519531, 480.2408447265625, -64.48966979980469, 613.1970825195312, 28.042442321777344, 582.6911010742188, 94.11119079589844, 443.26654052734375, 282.5726318359375, 243.6514892578125, 82.40560913085938, 420.5584716796875, 269.8416748046875, 339.3314208984375, 209.53900146484375, 28.584945678710938, 207.59478759765625, 90.27628326416016, 154.58840942382812, 134.1790313720703, 309.2542419433594, -45.81074523925781, 262.82293701171875, -111.13751220703125, 69.57818603515625, 1049.081298828125, 63.62434387207031, 255.7133026123047, 150.80665588378906, 314.747314453125, 356.30517578125, -24.871366500854492, 228.74148559570312, 464.2839050292969, 217.7396240234375, -48.378360748291016], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000483.npy"}
{"epoch": 0.7092511013215859, "step": 484, "batch_size": 64, "mean": 215.4441680908203, "std": 191.19508361816406, "min": -61.85877227783203, "p10": 3.099153137207034, "median": 157.1880874633789, "p90": 479.872900390625, "max": 879.2503662109375, "pos_frac": 0.921875, "sample": [253.11219787597656, 146.5723419189453, 344.2677001953125, 257.57733154296875, -41.49507141113281, 58.63206481933594, 136.6759490966797, 139.0484619140625, 480.03289794921875, 197.96435546875, 37.78007125854492, 135.60214233398438, 263.9123229980469, 292.14910888671875, 263.99346923828125, 211.4333953857422, 188.01068115234375, 113.60455322265625, -61.85877227783203, -57.92094421386719, 154.08828735351562, 117.27958679199219, 228.96392822265625, 268.5318298339844, 879.2503662109375, 1.3835906982421875, 225.06903076171875, 402.4852294921875, 1.9295005798339844, -24.219985961914062, 5.828342437744141, 102.0931396484375, 455.7557373046875, 579.2274169921875, 18.027687072753906, 237.99468994140625, 116.627685546875, 123.2129135131836, 123.65516662597656, 80.12052154541016, 123.73758697509766, 62.103355407714844, 387.952392578125, 260.8063049316406, 492.38568115234375, 133.83685302734375, 373.3154602050781, 377.2185363769531, 77.2659683227539, -19.10649871826172, 19.530595779418945, 151.09423828125, 535.6510009765625, 229.5172119140625, 160.2878875732422, 430.7530517578125, 572.3455810546875, 203.06536865234375, 479.49957275390625, 33.221534729003906, 647.9190063476562, 139.1386260986328, 46.226768493652344, 414.2621765136719], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000484.npy"}
{"epoch": 0.71071953010279, "step": 485, "batch_size": 64, "mean": 223.0164031982422, "std": 213.76528930664062, "min": -273.987548828125, "p10": -26.670518493652335, "median": 216.4036636352539, "p90": 490.1617187500001, "max": 664.498046875, "pos_frac": 0.875, "sample": [283.2544860839844, 333.6732482910156, 214.23451232910156, -33.51068115234375, 377.817138671875, 32.940895080566406, 433.80426025390625, -89.51895141601562, 234.2908172607422, 320.3008728027344, 160.41085815429688, 64.42742156982422, 469.9613952636719, 375.10406494140625, 185.022216796875, 644.2277221679688, -273.987548828125, 86.95552825927734, 457.0895080566406, 416.5887756347656, 420.4285583496094, 458.5521545410156, 22.12652587890625, 271.35260009765625, 643.1075439453125, 664.498046875, -16.335670471191406, 99.87206268310547, 281.5138854980469, -115.18352508544922, 23.43254852294922, 124.35491943359375, 107.94373321533203, -31.09973907470703, 218.57281494140625, 338.92767333984375, 538.6817626953125, 390.79217529296875, 130.10289001464844, 391.19146728515625, 184.31103515625, 88.1092529296875, 498.8190002441406, 255.51617431640625, -43.247703552246094, 290.6836853027344, 31.08308982849121, 139.72198486328125, 232.360107421875, -247.795654296875, 537.294677734375, 402.2752380371094, 46.385623931884766, 19.990264892578125, 70.94435119628906, 16.517990112304688, 562.411865234375, 37.39476013183594, 45.289634704589844, 163.78463745117188, 249.47427368164062, 184.7239227294922, 447.40203857421875, 403.6806945800781], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000485.npy"}
{"epoch": 0.7121879588839941, "step": 486, "batch_size": 64, "mean": 228.66432189941406, "std": 242.95069885253906, "min": -254.27880859375, "p10": -91.58001480102538, "median": 245.9873504638672, "p90": 527.4652587890625, "max": 841.235595703125, "pos_frac": 0.828125, "sample": [578.843994140625, 241.32763671875, 37.99977111816406, 48.80771255493164, 250.64706420898438, 134.9698028564453, 154.79586791992188, -82.25215911865234, 528.5366821289062, 14.884918212890625, 108.45028686523438, 452.70001220703125, 421.808349609375, 110.56072235107422, -254.27880859375, 743.1363525390625, 524.9652709960938, 274.3755187988281, -110.17457580566406, 541.9974365234375, 571.4335327148438, -126.89408874511719, 393.14581298828125, -95.57766723632812, 36.70856857299805, 203.19769287109375, 370.8084716796875, 376.4023132324219, 113.00283813476562, 435.1195068359375, 354.90911865234375, 188.6024169921875, -216.39794921875, -230.65936279296875, 140.8804473876953, 354.37811279296875, 85.2991943359375, 499.0772705078125, 371.3590087890625, 349.1640625, 154.97662353515625, 366.30230712890625, 324.61553955078125, -65.38629150390625, 352.2176818847656, 703.574462890625, 461.4847717285156, 290.1810302734375, 337.7184143066406, 422.20782470703125, -76.17606353759766, 266.3257751464844, 145.77481079101562, 213.64202880859375, 310.7859191894531, 82.59156799316406, 68.45496368408203, 316.3289794921875, -239.58865356445312, 191.95411682128906, 267.20928955078125, 6.580657958984375, -4.555305480957031, 841.235595703125], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000486.npy"}
{"epoch": 0.7136563876651982, "step": 487, "batch_size": 64, "mean": 196.9682159423828, "std": 178.58924865722656, "min": -300.4004211425781, "p10": -29.010778617858886, "median": 215.77432250976562, "p90": 401.9945739746094, "max": 545.5452880859375, "pos_frac": 0.828125, "sample": [260.84442138671875, 332.2613525390625, 106.12852478027344, 134.41348266601562, 545.5452880859375, 368.5550842285156, 406.0950927734375, 235.53482055664062, 485.1165771484375, -7.933647155761719, 86.0230712890625, -32.90508270263672, 387.3837585449219, -292.39105224609375, 259.35760498046875, 392.42669677734375, 478.15362548828125, -54.282649993896484, 318.4524841308594, 499.04364013671875, 312.47308349609375, 296.59954833984375, 3.8611984252929688, 160.96450805664062, 84.58112335205078, 245.6005859375, 60.454917907714844, 226.54150390625, 363.5465087890625, 70.07799530029297, 256.61456298828125, -11.026430130004883, 206.46234130859375, 322.348876953125, 165.34002685546875, 18.579757690429688, 225.0863037109375, -300.4004211425781, 331.2342224121094, 265.833984375, 202.02491760253906, 195.048583984375, 198.0214385986328, -27.819772720336914, 235.27069091796875, 263.74285888671875, -20.725479125976562, 162.56082153320312, 277.3924560546875, 145.85305786132812, 138.5184783935547, 133.31552124023438, 317.2801513671875, 315.74871826171875, 141.46339416503906, -42.201416015625, 95.76089477539062, 521.8408813476562, 437.0682373046875, 370.1412658691406, -84.31051635742188, 255.05770874023438, -29.521209716796875, 191.83697509765625], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000487.npy"}
{"epoch": 0.7151248164464024, "step": 488, "batch_size": 64, "mean": 205.8707275390625, "std": 163.98211669921875, "min": -115.22282409667969, "p10": 9.446101379394532, "median": 190.65399932861328, "p90": 416.5237060546875, "max": 615.8372802734375, "pos_frac": 0.90625, "sample": [-1.2761459350585938, 105.38894653320312, 521.2955932617188, 127.18614196777344, 602.853271484375, 60.35325622558594, 368.9195556640625, 26.772926330566406, -9.668304443359375, 249.11288452148438, 172.52969360351562, 241.4618377685547, 173.21633911132812, 183.3892822265625, 329.9615478515625, 444.6832580566406, 56.221282958984375, 11.697301864624023, -115.22282409667969, 81.504638671875, 130.7406768798828, 137.53128051757812, 309.39501953125, 163.20745849609375, 43.47007751464844, 120.36880493164062, 196.82925415039062, 311.83465576171875, 615.8372802734375, 111.32801818847656, 192.89501953125, 274.8778076171875, 9.032859802246094, 277.05401611328125, 382.31298828125, 269.7912902832031, 414.4123840332031, 419.8360290527344, 261.5574035644531, 255.95361328125, 143.86318969726562, 368.18109130859375, 327.51483154296875, 57.09187316894531, 234.8992919921875, 93.79750061035156, 208.8228759765625, 395.40380859375, -24.90880584716797, 12.826927185058594, 188.41297912597656, 262.7477722167969, 304.38006591796875, 143.60943603515625, 271.864013671875, 490.6461486816406, -9.243253707885742, 270.9930725097656, 28.957012176513672, 10.410331726074219, -72.56001281738281, 417.4285583496094, 119.8150634765625, 402.12762451171875], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000488.npy"}
{"epoch": 0.7165932452276065, "step": 489, "batch_size": 64, "mean": 174.794677734375, "std": 185.6405792236328, "min": -177.65164184570312, "p10": -17.26628704071045, "median": 115.22842025756836, "p90": 417.0620819091797, "max": 702.0123291015625, "pos_frac": 0.875, "sample": [306.6558532714844, -56.67604064941406, 44.67979431152344, 80.20524597167969, -16.438974380493164, 107.42694091796875, 102.24006652832031, 415.1119384765625, 52.06443786621094, 208.66250610351562, 65.70484924316406, 37.40523910522461, 210.47018432617188, 117.36908721923828, 266.4901123046875, 7.0645599365234375, 111.87428283691406, 301.35479736328125, 144.56591796875, 458.0221862792969, 132.9135284423828, 79.177978515625, 44.88664245605469, 153.93484497070312, 61.273929595947266, 39.90531921386719, 54.05778503417969, 59.93515396118164, 147.8656463623047, 51.66938781738281, 141.19580078125, 57.875640869140625, 57.257080078125, 36.651519775390625, 532.3051147460938, 328.68988037109375, 113.08775329589844, 52.39851379394531, 205.35226440429688, 14.577400207519531, 357.14825439453125, -37.58367156982422, 639.6522827148438, -41.77438735961914, 413.6265563964844, 322.4539489746094, 266.639892578125, 458.458984375, 144.8324737548828, -25.512420654296875, 702.0123291015625, 191.06753540039062, 223.2869110107422, 654.2571411132812, -26.25887107849121, 83.70317840576172, 417.8978576660156, 227.63174438476562, 411.6760559082031, -17.620849609375, 12.4586181640625, 381.90093994140625, 275.2925109863281, -177.65164184570312], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000489.npy"}
{"epoch": 0.7180616740088106, "step": 490, "batch_size": 64, "mean": 204.73190307617188, "std": 232.7493438720703, "min": -320.66668701171875, "p10": -64.41148910522458, "median": 159.0927276611328, "p90": 515.8624908447267, "max": 820.4263916015625, "pos_frac": 0.859375, "sample": [42.09234619140625, -320.66668701171875, 478.67974853515625, 17.22224998474121, 73.6690673828125, 49.06195831298828, 179.90966796875, 249.05667114257812, 394.5966491699219, -80.5697021484375, 53.76666259765625, 45.79949951171875, 420.8142395019531, 160.33987426757812, 286.1900329589844, 40.048683166503906, -125.72750854492188, 177.6702880859375, 352.675048828125, -110.28085327148438, -13.746387481689453, 67.20367431640625, -26.70899200439453, 40.19148254394531, -185.1353302001953, 542.572998046875, 140.19302368164062, 107.64430236816406, 113.7877426147461, 353.6788024902344, 114.4227066040039, 484.63897705078125, 147.95452880859375, 481.0548400878906, 157.8455810546875, 113.40526580810547, 527.4237060546875, 253.9951171875, 668.7314453125, 75.90122985839844, 279.92657470703125, 241.187744140625, 570.1697387695312, 604.068603515625, 214.8906707763672, 668.388671875, -164.25579833984375, 427.6000671386719, 21.71746826171875, 322.15509033203125, 253.27052307128906, 52.013511657714844, 820.4263916015625, 168.02691650390625, 369.5261535644531, 468.297119140625, 11.92047119140625, -140.01870727539062, 188.43905639648438, 488.8863220214844, 37.596153259277344, 383.6696472167969, 149.54315185546875, 115.9927978515625], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000490.npy"}
{"epoch": 0.7195301027900147, "step": 491, "batch_size": 64, "mean": 155.4693603515625, "std": 204.31492614746094, "min": -242.1868896484375, "p10": -89.67288589477538, "median": 162.82540130615234, "p90": 388.5605682373047, "max": 848.4334716796875, "pos_frac": 0.75, "sample": [272.415771484375, -15.480913162231445, 391.295166015625, 271.836669921875, 345.5506591796875, 164.89955139160156, 122.51275634765625, -28.13410758972168, 95.35923767089844, 24.15985107421875, 125.64006042480469, 179.58265686035156, -22.448387145996094, 85.09458923339844, 183.78488159179688, 14.505390167236328, 222.7311248779297, 212.74298095703125, 24.322769165039062, 242.42300415039062, 164.21661376953125, -161.51976013183594, 431.2192077636719, -105.34506225585938, 199.7482147216797, 133.5088348388672, 444.18231201171875, -77.97347259521484, 277.4664306640625, 54.068931579589844, 547.2257080078125, 235.53497314453125, -242.1868896484375, 74.48970794677734, 120.72935485839844, 194.67556762695312, -11.634193420410156, 503.30596923828125, 174.4550018310547, 378.9117431640625, 161.43418884277344, -135.5968017578125, 692.6553344726562, -94.68692016601562, -35.64775085449219, 195.22369384765625, -4.2655029296875, 251.2712860107422, -13.587677001953125, 216.63064575195312, 80.24140930175781, -143.4371795654297, 336.29241943359375, 165.7190704345703, 96.09130096435547, 229.3533477783203, 382.1798400878906, 327.53778076171875, 61.83982849121094, -8.696563720703125, 848.4334716796875, 19.80394744873047, 280.05731201171875, -206.67977905273438], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000491.npy"}
{"epoch": 0.7209985315712188, "step": 492, "batch_size": 64, "mean": 131.85690307617188, "std": 238.54322814941406, "min": -448.2883605957031, "p10": -176.29180603027342, "median": 122.22612762451172, "p90": 438.4896850585938, "max": 673.7073364257812, "pos_frac": 0.75, "sample": [54.781578063964844, 139.67906188964844, 329.2869567871094, -73.64917755126953, 189.75601196289062, -48.22441101074219, 0.379425048828125, 442.37921142578125, 13.444232940673828, -123.25456237792969, 4.132591247558594, 235.52439880371094, -280.22412109375, 21.66266632080078, -357.3062744140625, 527.1347045898438, 128.25430297851562, 350.01446533203125, 45.55120849609375, 497.6712951660156, -207.11697387695312, 120.23538970947266, 186.58901977539062, 18.386932373046875, 393.6766357421875, 230.4156494140625, 295.02423095703125, -3.3017120361328125, 178.8913116455078, 121.42671203613281, -106.657470703125, 35.623497009277344, 52.019798278808594, -232.4145965576172, -20.88650894165039, 332.6911315917969, 585.3890380859375, 83.66525268554688, -448.2883605957031, 37.50035858154297, 673.7073364257812, -168.58883666992188, 351.4310302734375, -13.839439392089844, -116.37336730957031, 429.41412353515625, 159.33192443847656, 123.02554321289062, 636.9620361328125, 141.70196533203125, 201.39688110351562, 69.16462707519531, 290.87158203125, -179.59307861328125, 400.2420654296875, 241.70021057128906, 269.78057861328125, 148.39564514160156, 55.86061477661133, 271.0869445800781, -243.57131958007812, 537.5499267578125, 108.38253021240234, 300.93963623046875], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000492.npy"}
{"epoch": 0.7224669603524229, "step": 493, "batch_size": 64, "mean": 171.43350219726562, "std": 177.9192657470703, "min": -297.85479736328125, "p10": -27.90030136108398, "median": 180.38404083251953, "p90": 388.92429809570314, "max": 623.9824829101562, "pos_frac": 0.796875, "sample": [140.87574768066406, 108.79903411865234, 302.6824645996094, 205.3798828125, -30.52850341796875, 623.9824829101562, 395.013671875, 99.01931762695312, 254.00668334960938, -297.85479736328125, 310.90789794921875, 64.47462463378906, 313.87921142578125, 201.83636474609375, 199.7926483154297, 234.41085815429688, 121.53451538085938, 255.20144653320312, 151.656982421875, 30.687788009643555, 139.2267608642578, 213.00750732421875, 287.7043151855469, 291.8682556152344, 225.65789794921875, 188.423583984375, 330.39404296875, -60.031593322753906, 11.952154159545898, -21.76782989501953, 57.6292724609375, -175.4445343017578, 267.70269775390625, 39.92344665527344, 365.0054626464844, 54.030418395996094, 390.6499938964844, 418.0316162109375, 153.8670654296875, -53.53019714355469, 100.41036987304688, 230.79620361328125, 363.44964599609375, 384.8976745605469, 264.08154296875, 541.3673706054688, -3.2543087005615234, 197.9512939453125, 566.2349243164062, 472.98663330078125, 195.08609008789062, 159.43121337890625, -125.03028869628906, 237.65835571289062, -6.37811279296875, -84.75687408447266, -10.783651351928711, -2.147075653076172, 269.4066467285156, 150.94093322753906, 69.30353546142578, 22.970458984375, -5.28106689453125, 172.34449768066406], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000493.npy"}
{"epoch": 0.723935389133627, "step": 494, "batch_size": 64, "mean": 229.67922973632812, "std": 194.30264282226562, "min": -320.21759033203125, "p10": 1.8758697509765632, "median": 238.07880401611328, "p90": 464.0044799804688, "max": 724.4761352539062, "pos_frac": 0.90625, "sample": [-1.4394607543945312, 262.90240478515625, 228.3153533935547, 329.0780334472656, 724.4761352539062, 398.12353515625, -286.89556884765625, 465.32525634765625, 126.34793090820312, 63.042236328125, 357.0888977050781, 239.6503143310547, 550.093994140625, 285.7230224609375, 91.70882415771484, 617.2454223632812, 305.5505065917969, -35.9375, 356.54534912109375, 436.7824401855469, 59.409950256347656, -320.21759033203125, 363.7482604980469, 396.7244873046875, 433.05511474609375, 83.67166900634766, 374.6421203613281, 513.802734375, 191.88710021972656, 124.42898559570312, -34.17810821533203, 1.5651397705078125, 170.4932403564453, 43.46959686279297, 329.4047546386719, 236.50729370117188, 289.6583251953125, 77.8747787475586, 175.43104553222656, 534.1343994140625, 460.92266845703125, 188.05392456054688, -10.968887329101562, 105.30895233154297, 130.94781494140625, 305.3875732421875, 470.02301025390625, 265.4667053222656, 115.9330062866211, 248.73744201660156, 82.76301574707031, 2.6009063720703125, 321.74444580078125, 178.52041625976562, 245.9034423828125, 89.5228271484375, 415.77435302734375, 355.07696533203125, 196.16860961914062, 366.131591796875, 251.39468383789062, 182.0115203857422, 158.43685913085938, 14.369552612304688], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000494.npy"}
{"epoch": 0.7254038179148311, "step": 495, "batch_size": 64, "mean": 209.8826446533203, "std": 213.63108825683594, "min": -225.37533569335938, "p10": -18.253870201110825, "median": 173.34152221679688, "p90": 476.07450866699224, "max": 839.2472534179688, "pos_frac": 0.875, "sample": [284.36090087890625, -3.2319202423095703, 333.8124084472656, 77.84820556640625, 30.00347900390625, 172.4769287109375, -104.85893249511719, 90.42315673828125, 207.58282470703125, 24.282058715820312, 194.49562072753906, 398.549072265625, 442.44146728515625, -174.2538299560547, 272.09112548828125, 50.010887145996094, -77.322998046875, 467.4399719238281, 63.572235107421875, 382.5149841308594, 81.27117919921875, -225.37533569335938, 277.3900451660156, 126.62176513671875, 186.20803833007812, 519.1436767578125, 407.2057189941406, 223.42001342773438, 101.41735076904297, 265.7952880859375, 159.6242218017578, 281.41607666015625, 213.1591796875, 144.89569091796875, 198.20619201660156, 566.612060546875, 479.7750244140625, -114.02876281738281, 609.1207275390625, 457.6341552734375, 270.87921142578125, 91.02455139160156, 71.30445098876953, 125.65203857421875, 101.99037170410156, 75.25372314453125, -24.691848754882812, 155.18902587890625, 157.90719604492188, -120.79350280761719, 174.20611572265625, 0.09259033203125, 358.182861328125, 311.3610534667969, 466.5283203125, 45.05998229980469, 107.38772583007812, 327.07940673828125, 71.42851257324219, 734.189208984375, 595.8515014648438, 253.52272033691406, 839.2472534179688, 152.88748168945312], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000495.npy"}
{"epoch": 0.7268722466960352, "step": 496, "batch_size": 64, "mean": 159.43557739257812, "std": 175.25250244140625, "min": -231.2533721923828, "p10": -20.18380737304687, "median": 147.06317138671875, "p90": 416.6927062988283, "max": 603.3375244140625, "pos_frac": 0.796875, "sample": [66.89572143554688, -81.51885223388672, 473.8877868652344, 491.3934631347656, -16.131591796875, 482.70294189453125, 165.39132690429688, 205.00726318359375, 351.9884948730469, 157.46865844726562, 168.9642791748047, 74.27774810791016, 168.02371215820312, 102.18624877929688, 151.0675048828125, 75.224853515625, -9.7313232421875, 174.9794921875, -110.91161346435547, 205.5591278076172, 376.0625, 277.1549377441406, 603.3375244140625, 215.97174072265625, 452.23565673828125, 143.058837890625, 25.517288208007812, 313.05657958984375, 53.469696044921875, 115.70431518554688, 92.37310028076172, -15.63306999206543, 220.82778930664062, 280.0960388183594, 30.48137664794922, 3.1681365966796875, 137.87158203125, -29.20122528076172, -231.2533721923828, 227.70741271972656, -134.20440673828125, 57.4844970703125, 351.4624938964844, -13.375236511230469, 18.310562133789062, 479.9649963378906, -16.040023803710938, 333.2526550292969, 434.10565185546875, 338.0322265625, 58.801361083984375, -61.83323669433594, 346.43597412109375, 158.87583923339844, 26.205047607421875, 78.59291076660156, 195.0930938720703, 276.6783447265625, 34.042991638183594, -3.0382003784179688, 310.4530334472656, 276.642578125, 91.12236785888672, -21.92047119140625], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000496.npy"}
{"epoch": 0.7283406754772394, "step": 497, "batch_size": 64, "mean": 153.30006408691406, "std": 187.25265502929688, "min": -238.04563903808594, "p10": -64.21643295288084, "median": 150.66756439208984, "p90": 385.91956176757816, "max": 571.783203125, "pos_frac": 0.796875, "sample": [12.164226531982422, 379.9345703125, 335.94549560546875, 498.5429382324219, 7.810520172119141, 149.8408203125, 388.48455810546875, -7.9448394775390625, 65.26750183105469, -27.79597282409668, 287.067138671875, 323.10919189453125, 537.1729736328125, 161.93727111816406, 26.179733276367188, -26.803131103515625, 283.7439270019531, 156.4907989501953, 336.55963134765625, 145.53573608398438, 126.50544738769531, 70.87683868408203, 321.8408508300781, 245.27670288085938, -132.34764099121094, 72.70184326171875, 266.4550476074219, -71.10279846191406, 77.20750427246094, -29.579299926757812, 248.61915588378906, 185.08934020996094, -23.630413055419922, 303.205322265625, 82.84768676757812, 571.783203125, 131.57354736328125, 509.6435241699219, 151.4943084716797, 21.655303955078125, 126.48348236083984, 189.73390197753906, 25.29957389831543, 342.47802734375, -165.2297821044922, 154.95547485351562, 213.76181030273438, 81.92172241210938, 289.06976318359375, 249.10971069335938, 210.87258911132812, 80.35441589355469, 482.7138366699219, 136.8965606689453, -48.14824676513672, 166.40573120117188, -135.39060974121094, 243.034912109375, -235.2782745361328, 94.00582885742188, 405.4718322753906, -238.04563903808594, 204.4324493408203, -227.06358337402344], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000497.npy"}
{"epoch": 0.7298091042584435, "step": 498, "batch_size": 64, "mean": 174.86439514160156, "std": 226.5410919189453, "min": -406.1424255371094, "p10": -86.8869186401367, "median": 165.8636245727539, "p90": 448.68358154296874, "max": 806.366943359375, "pos_frac": 0.78125, "sample": [-73.69139099121094, 228.47879028320312, 170.52120971679688, 101.08273315429688, 117.71755981445312, 238.94271850585938, 456.58111572265625, 305.12762451171875, 241.72979736328125, 170.5625, 167.38131713867188, 471.44683837890625, -38.230560302734375, 335.48944091796875, -281.556884765625, 349.46051025390625, 231.6785125732422, 447.87335205078125, 92.73126983642578, 352.24334716796875, 38.293357849121094, 227.78298950195312, -22.267303466796875, 278.0453186035156, -69.135009765625, 147.99014282226562, 130.17787170410156, 28.49158477783203, -53.67035675048828, 164.62991333007812, 165.4293975830078, 282.0406494140625, -406.1424255371094, 806.366943359375, 229.66973876953125, 13.70806884765625, -46.51941680908203, 166.2978515625, 420.727783203125, 466.8404541015625, 792.97900390625, -120.88884735107422, -27.4027099609375, 145.80845642089844, 391.8773193359375, 150.341796875, -92.54214477539062, 317.6435546875, 373.6217346191406, 259.4228515625, 165.31578063964844, -152.64508056640625, 162.61135864257812, 140.34463500976562, -182.78880310058594, 21.063079833984375, 393.0118408203125, 80.67792510986328, 449.03082275390625, 234.4934539794922, 5.741556167602539, 305.65667724609375, 532.2957153320312, -208.67562866210938], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000498.npy"}
{"epoch": 0.7312775330396476, "step": 499, "batch_size": 64, "mean": 219.04827880859375, "std": 220.767822265625, "min": -337.4114074707031, "p10": -60.59201965332029, "median": 223.3811492919922, "p90": 488.1385864257813, "max": 632.166748046875, "pos_frac": 0.828125, "sample": [425.731689453125, 167.8207550048828, 181.71841430664062, -337.4114074707031, 124.27603912353516, 231.90304565429688, 258.8995056152344, 214.8592529296875, -7.899238586425781, -37.120635986328125, 184.6476287841797, 327.135009765625, 42.752220153808594, 493.5679931640625, 254.377685546875, 397.2839050292969, 329.37945556640625, -31.779098510742188, 632.166748046875, 166.5417022705078, 236.69171142578125, 41.0580940246582, 461.7296447753906, 133.02001953125, -216.41529846191406, 385.6119689941406, 249.38140869140625, 208.33262634277344, -133.3572540283203, 419.2210998535156, 198.50360107421875, -70.65118408203125, 296.9242248535156, 208.67825317382812, 168.11009216308594, 162.6437530517578, 362.4765625, 569.7756958007812, 277.850830078125, -200.4399871826172, 21.262617111206055, 352.9772644042969, 471.27777099609375, 234.26480102539062, 551.2374267578125, 248.10928344726562, 461.5074768066406, 608.541748046875, 138.55352783203125, 61.807777404785156, 211.52719116210938, 424.3825378417969, 199.43609619140625, 324.26898193359375, 21.678897857666016, 380.94189453125, 519.4910278320312, 263.9788513183594, -244.524658203125, 18.943525314331055, -116.17750549316406, 475.469970703125, -10.764200210571289, 622.9020385742188], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000499.npy"}
{"epoch": 0.7327459618208517, "step": 500, "batch_size": 64, "mean": 189.812744140625, "std": 210.73500061035156, "min": -263.0548095703125, "p10": -66.80461578369137, "median": 159.21466064453125, "p90": 475.2341247558594, "max": 833.46630859375, "pos_frac": 0.84375, "sample": [33.497894287109375, -13.741985321044922, 56.88306427001953, 833.46630859375, -123.1408462524414, 319.8352966308594, 238.68246459960938, 300.3162841796875, 199.62767028808594, 193.5787353515625, 479.64910888671875, 366.00799560546875, 291.7479248046875, 146.06155395507812, -80.45301818847656, 194.00100708007812, 33.32037353515625, -23.84588623046875, 113.05461120605469, 403.29705810546875, 526.7266845703125, -90.30436706542969, 262.63818359375, 621.9818115234375, 119.31389617919922, 577.4407958984375, -263.0548095703125, 264.8125, 464.9324951171875, -238.22689819335938, 112.40129089355469, 453.07269287109375, 289.81683349609375, 15.304738998413086, 159.637939453125, 333.4109191894531, -143.2937469482422, 73.6917495727539, 224.36358642578125, -151.6333465576172, 126.79692840576172, 220.7999725341797, 135.7213134765625, 334.2549133300781, 86.29996490478516, 89.90831756591797, 318.2241516113281, 47.26850891113281, 158.7913818359375, 147.13473510742188, 560.754638671875, 109.80928802490234, 369.1838073730469, 105.69292449951172, 234.7518310546875, 260.0320739746094, 306.69732666015625, 101.65252685546875, 103.69561004638672, 202.49215698242188, 56.75086975097656, 506.9560852050781, -34.958343505859375, 24.425979614257812], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000500.npy"}
{"epoch": 0.7342143906020558, "step": 501, "batch_size": 64, "mean": 210.9066162109375, "std": 207.08566284179688, "min": -222.9598388671875, "p10": -12.207765960693358, "median": 191.9647216796875, "p90": 496.2607391357422, "max": 754.4176635742188, "pos_frac": 0.859375, "sample": [-11.608985900878906, 183.0539093017578, 297.970947265625, -72.01261138916016, 52.957149505615234, 491.1122741699219, 73.10194396972656, 406.9651794433594, 170.1372528076172, 47.949371337890625, 237.87744140625, 19.482133865356445, 305.16156005859375, 147.9661407470703, 676.8497924804688, 397.36529541015625, 509.2060546875, 176.54132080078125, 233.7948760986328, 408.560302734375, 319.8975524902344, 171.66969299316406, 83.83641052246094, 122.60529327392578, 410.5721740722656, -12.464385986328125, -48.07176208496094, 198.43609619140625, 498.46722412109375, 235.08346557617188, -179.2998504638672, 185.49334716796875, 230.1612091064453, 152.99913024902344, 314.80950927734375, 754.4176635742188, -4.5104217529296875, 290.6255187988281, 167.67733764648438, 355.526611328125, 344.004150390625, 48.13719940185547, 129.8377227783203, 44.589874267578125, 458.47320556640625, 263.77545166015625, -124.93685913085938, 46.05165100097656, 94.59024047851562, -222.9598388671875, 215.16094970703125, 257.32470703125, -182.23277282714844, 4.915130615234375, 71.64959716796875, 7.557807922363281, 342.4275207519531, 498.97235107421875, 513.8007202148438, 277.7774353027344, 658.082275390625, 346.977294921875, 274.06292724609375, 129.62120056152344], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000501.npy"}
{"epoch": 0.73568281938326, "step": 502, "batch_size": 64, "mean": 224.31927490234375, "std": 156.40475463867188, "min": -95.83180236816406, "p10": 30.35423030853272, "median": 231.96668243408203, "p90": 428.3896453857422, "max": 563.30029296875, "pos_frac": 0.921875, "sample": [371.8565673828125, -78.85762023925781, 102.44608306884766, 361.752197265625, 47.0850830078125, 229.33642578125, -54.69512176513672, 39.61474609375, 420.3778076171875, 287.89666748046875, 267.71600341796875, 264.5204772949219, 249.6240997314453, 367.8199462890625, 357.3390197753906, 52.76336669921875, 72.56439208984375, 86.59934997558594, -54.9716796875, 198.42062377929688, 469.9842224121094, 266.58734130859375, 210.4605712890625, 228.25918579101562, 16.536712646484375, 231.34996032714844, 149.33534240722656, 245.9373016357422, 347.71112060546875, 288.5403137207031, 314.26385498046875, 428.935546875, -12.359016418457031, 175.86898803710938, 388.4457702636719, 440.3204650878906, 136.46937561035156, 487.9300231933594, -95.83180236816406, 43.74482727050781, 473.48974609375, 255.1331024169922, 165.0609130859375, 103.43304443359375, 251.747802734375, 391.08709716796875, 563.30029296875, 27.524484634399414, 393.12347412109375, 36.95697021484375, 304.4579162597656, 158.31614685058594, 232.58340454101562, 53.066009521484375, 496.4242248535156, 128.2730712890625, 291.30145263671875, 295.7647705078125, 427.1158752441406, 183.9135284423828, 173.34848022460938, 240.32518005371094, 228.99769592285156, 129.99037170410156], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000502.npy"}
{"epoch": 0.737151248164464, "step": 503, "batch_size": 64, "mean": 197.1927947998047, "std": 182.25326538085938, "min": -127.97386932373047, "p10": -28.144193458557126, "median": 168.84284210205078, "p90": 401.90710449218756, "max": 747.9297485351562, "pos_frac": 0.84375, "sample": [-8.466632843017578, -26.160573959350586, 574.8365478515625, 108.89341735839844, -102.10047912597656, -77.53216552734375, 394.18121337890625, 162.7928009033203, -28.99431610107422, 26.680458068847656, -87.93312072753906, 219.0774688720703, 103.2550277709961, 111.67997741699219, 531.9814453125, 290.00958251953125, 293.37518310546875, 109.43435668945312, 216.04962158203125, 329.21746826171875, 69.85746765136719, 91.96553039550781, 113.14146423339844, 109.28546905517578, 405.21820068359375, 240.51893615722656, -127.97386932373047, 386.2117919921875, 62.99565887451172, 224.13525390625, 174.89288330078125, -13.97760009765625, 50.40260314941406, 321.7386779785156, -42.162437438964844, 134.12530517578125, 465.3977355957031, 747.9297485351562, 218.21548461914062, 315.0293884277344, 132.78271484375, 297.85333251953125, 159.11524963378906, 149.2008514404297, -91.56607055664062, 326.08868408203125, 161.06898498535156, 477.273681640625, 240.59689331054688, 289.5443115234375, 61.801124572753906, 352.1998596191406, 328.0291748046875, 204.34413146972656, 630.4869384765625, 343.135498046875, 70.65071868896484, 238.887451171875, 91.65499114990234, 99.755615234375, 103.64929962158203, 219.5759735107422, 386.358154296875, 260.6260681152344], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000503.npy"}
{"epoch": 0.7386196769456681, "step": 504, "batch_size": 64, "mean": 236.57835388183594, "std": 201.64376831054688, "min": -225.28164672851562, "p10": -11.898183441162105, "median": 204.6672134399414, "p90": 472.4954711914063, "max": 727.1781005859375, "pos_frac": 0.859375, "sample": [128.41098022460938, 145.4271240234375, 437.1296081542969, 169.4941864013672, 46.99641418457031, 198.22799682617188, 259.7831726074219, -96.38574981689453, 727.1781005859375, 644.1318359375, 720.2974243164062, -13.407661437988281, 378.3158874511719, 305.2935791015625, -8.376068115234375, 131.71400451660156, 414.4754943847656, 103.68838500976562, 109.86669158935547, 163.01031494140625, 366.630126953125, 295.657470703125, 217.2930145263672, 102.99559020996094, 247.8983612060547, 178.78114318847656, -91.08252716064453, 307.04791259765625, 125.34063720703125, 198.36866760253906, -225.28164672851562, 479.42718505859375, 115.50251770019531, 562.920654296875, 456.32147216796875, 346.1748046875, 379.1349182128906, 332.079345703125, 128.47012329101562, 210.96575927734375, 354.82464599609375, 691.0535888671875, 404.3179016113281, 167.5493927001953, 294.940673828125, -52.680450439453125, -56.930641174316406, 70.02680969238281, 110.22029113769531, 329.284423828125, 197.71914672851562, 349.5162048339844, 396.3875427246094, 359.58319091796875, 132.01246643066406, 119.52337646484375, -65.99845886230469, 268.9248046875, 489.80291748046875, 186.21524047851562, -4.112701416015625, 262.2733154296875, 415.8445739746094, 20.7989501953125], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000504.npy"}
{"epoch": 0.7400881057268722, "step": 505, "batch_size": 64, "mean": 146.14662170410156, "std": 232.57928466796875, "min": -366.75396728515625, "p10": -111.90105895996093, "median": 103.00530242919922, "p90": 500.3648468017579, "max": 729.8859252929688, "pos_frac": 0.6875, "sample": [-4.199317932128906, 460.764892578125, -113.1458740234375, 385.9005126953125, -20.916030883789062, 524.5076904296875, -97.52342224121094, 73.45118713378906, 155.26348876953125, 95.99784851074219, 538.1107788085938, 410.7284240722656, 729.8859252929688, -5.74517822265625, 305.7825927734375, 330.51593017578125, 63.65498352050781, 33.4600830078125, 210.6180419921875, -31.879016876220703, 142.41883850097656, 511.627197265625, 110.01275634765625, 51.750732421875, -220.0833740234375, -108.99649047851562, 63.84492492675781, 232.037109375, 372.464111328125, 317.154296875, -159.6122589111328, 87.15216064453125, 309.2232666015625, -98.52847290039062, 74.43318939208984, 474.0860290527344, 127.29837799072266, 117.30125427246094, -13.332298278808594, 130.89405822753906, -116.873046875, -366.75396728515625, 61.97233200073242, 223.3018341064453, 20.07098388671875, 280.8747253417969, 625.7232055664062, 303.6495666503906, 7.313159942626953, 611.482421875, -82.72297668457031, -128.96112060546875, -82.53636932373047, 569.1897583007812, 235.26878356933594, -12.044601440429688, 187.25881958007812, -223.95619201660156, 62.904052734375, 171.70851135253906, 346.3922119140625, -61.60777282714844, 159.23873901367188, -3.887676239013672], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000505.npy"}
{"epoch": 0.7415565345080763, "step": 506, "batch_size": 64, "mean": 207.72500610351562, "std": 215.21420288085938, "min": -204.32994079589844, "p10": -60.68112640380859, "median": 177.4163589477539, "p90": 490.7414489746094, "max": 809.8201293945312, "pos_frac": 0.8125, "sample": [-204.32994079589844, 32.41180419921875, 302.6087646484375, -59.99946594238281, 96.58869934082031, 238.2868194580078, -26.849231719970703, 262.4090576171875, 422.6375427246094, 73.29754638671875, 331.8200988769531, 92.41716003417969, 137.22669982910156, 258.78839111328125, -39.61536407470703, 627.36083984375, 288.137451171875, 36.53474426269531, 154.47998046875, 395.8795471191406, 187.73779296875, 420.907958984375, 359.3379821777344, 191.1424560546875, 809.8201293945312, -201.16644287109375, -68.37065887451172, 348.1983337402344, 352.3997802734375, 131.50926208496094, 167.0949249267578, 485.57275390625, 309.9201354980469, 80.61534118652344, 441.53082275390625, -143.77548217773438, 374.5802917480469, 246.63787841796875, 154.03494262695312, 575.022705078125, 456.2624816894531, 309.0937194824219, 18.969497680664062, 232.08168029785156, 35.901100158691406, -100.54048919677734, 139.19635009765625, 133.87991333007812, 515.294189453125, 340.3751525878906, 323.1934814453125, 0.7620811462402344, 127.4422607421875, 492.95660400390625, 495.38128662109375, 584.39111328125, -50.43266296386719, 161.35877990722656, 81.96989440917969, -25.638267517089844, -60.9732666015625, 166.9981231689453, 337.9524230957031, -64.31775665283203], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000506.npy"}
{"epoch": 0.7430249632892805, "step": 507, "batch_size": 64, "mean": 183.55238342285156, "std": 200.6233673095703, "min": -185.84153747558594, "p10": -11.944218063354493, "median": 146.14669036865234, "p90": 410.42297363281256, "max": 889.0589599609375, "pos_frac": 0.875, "sample": [26.48103904724121, 167.69615173339844, -11.965896606445312, -185.84153747558594, 642.02001953125, 236.51681518554688, 138.52720642089844, 182.87490844726562, 65.61820220947266, 169.54299926757812, 61.55348205566406, 293.11083984375, -111.73149108886719, 209.80833435058594, 391.1142272949219, 103.1507568359375, -125.17762756347656, 140.50604248046875, 82.62684631347656, 889.0589599609375, 149.8991241455078, 231.4161376953125, -11.893634796142578, 268.4714050292969, 249.69818115234375, 305.8511962890625, 219.56002807617188, 303.50823974609375, 15.505977630615234, 414.695556640625, 248.99644470214844, 79.3709716796875, -36.87737274169922, 36.41127014160156, 53.38953399658203, 178.22891235351562, 135.59588623046875, 343.0504455566406, 214.77374267578125, 31.394065856933594, 245.9477996826172, 15.7752685546875, 62.05706787109375, 50.502586364746094, 715.158203125, 450.79791259765625, 142.39425659179688, 340.65057373046875, 17.02545166015625, 318.79656982421875, 310.39544677734375, 8.318624496459961, -24.45184326171875, 394.373779296875, 458.818359375, 400.45361328125, 514.626708984375, 371.8659973144531, 111.0340576171875, 20.406469345092773, 17.80683135986328, 10.283111572265625, -61.18064880371094, 58.960411071777344], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000507.npy"}
{"epoch": 0.7444933920704846, "step": 508, "batch_size": 64, "mean": 172.5631561279297, "std": 210.88613891601562, "min": -365.8638000488281, "p10": -59.359803009033186, "median": 142.43885803222656, "p90": 440.0316894531251, "max": 718.17578125, "pos_frac": 0.796875, "sample": [135.78985595703125, 54.75048828125, -136.73045349121094, 194.14178466796875, 251.46522521972656, -81.714111328125, 29.08181381225586, 718.17578125, 413.3791809082031, 393.269775390625, 291.7076110839844, 410.6134033203125, 140.14218139648438, 656.153076171875, 362.388916015625, 201.2533721923828, 211.90664672851562, 17.58123779296875, 451.4541931152344, 461.931396484375, 5.651493072509766, 180.67591857910156, 107.73609924316406, 55.559326171875, 547.7808227539062, 227.37966918945312, 171.07479858398438, 70.89225006103516, 20.36662483215332, 158.9583740234375, -19.773391723632812, 235.90652465820312, 64.26765441894531, -15.877918243408203, 641.8506469726562, -365.8638000488281, -65.7109146118164, 111.22740936279297, 94.46133422851562, 304.4956359863281, 0.382415771484375, 261.67791748046875, 42.04266357421875, 147.42916870117188, 353.05914306640625, 144.73553466796875, 407.7832336425781, 83.63005065917969, 355.2718505859375, 404.1888732910156, 281.8606872558594, -0.076934814453125, -101.80244445800781, 498.123046875, 357.30511474609375, -84.88874053955078, -25.372650146484375, 22.7772216796875, -41.77339172363281, 194.98153686523438, -44.54054260253906, 15.500419616699219, -72.7036361694336, 136.65191650390625], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000508.npy"}
{"epoch": 0.7459618208516887, "step": 509, "batch_size": 64, "mean": 177.45721435546875, "std": 197.6454315185547, "min": -171.88360595703125, "p10": -23.425650787353504, "median": 148.78773498535156, "p90": 445.0095489501954, "max": 860.634765625, "pos_frac": 0.828125, "sample": [78.50890350341797, -6.809234619140625, -28.754676818847656, 409.7895202636719, 32.136566162109375, 280.7940673828125, 85.20917510986328, 59.46570587158203, 279.09088134765625, 234.0101318359375, 315.1207275390625, 265.7872009277344, 138.87973022460938, 29.706666946411133, 40.64527893066406, -3.9264984130859375, 640.9073486328125, -163.61814880371094, -171.88360595703125, 249.53555297851562, 449.5942077636719, 273.4154052734375, 96.07423400878906, 123.88053131103516, 310.904296875, 115.58793640136719, 141.63153076171875, 187.61721801757812, 257.7371520996094, 220.52728271484375, 164.47434997558594, -121.4482650756836, -0.4734611511230469, -103.06306457519531, 73.29287719726562, 613.997802734375, 196.79600524902344, 93.74653625488281, 194.74765014648438, -10.991256713867188, 82.2098388671875, 222.0341033935547, 2.8371429443359375, 153.1627197265625, 28.28076934814453, -99.09327697753906, 476.63623046875, 467.5548095703125, 164.8566131591797, 860.634765625, -65.52596282958984, 151.44393920898438, 78.71598815917969, 78.00558471679688, 335.1554260253906, 94.24290466308594, 146.13153076171875, 132.84091186523438, 184.16061401367188, 311.47796630859375, 434.31201171875, 596.8568725585938, 222.90313720703125, 254.7825164794922], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000509.npy"}
{"epoch": 0.7474302496328928, "step": 510, "batch_size": 64, "mean": 216.81671142578125, "std": 185.53541564941406, "min": -184.41366577148438, "p10": -24.02781925201415, "median": 215.74918365478516, "p90": 451.7875183105469, "max": 589.583251953125, "pos_frac": 0.84375, "sample": [-40.62813949584961, 193.6044921875, 560.2711181640625, 346.7474670410156, 48.544097900390625, 189.88401794433594, 492.78155517578125, 422.9820556640625, 131.0537567138672, 274.47235107421875, 304.7560729980469, 317.4719543457031, -28.13597297668457, 138.308837890625, 444.4183349609375, 336.8752746582031, 312.1235046386719, -184.41366577148438, -73.67424774169922, 428.85882568359375, 213.91323852539062, 149.2683868408203, 273.523193359375, -9.991180419921875, 130.94473266601562, 325.09912109375, 356.8836975097656, 589.583251953125, 469.2747497558594, -132.4248046875, 441.4858093261719, 387.92816162109375, 0.048610687255859375, 78.39116668701172, 365.472900390625, 249.42984008789062, 162.18453979492188, 169.8859405517578, -14.442127227783203, 220.68011474609375, 83.32759094238281, 280.5372009277344, 320.3388366699219, 208.63157653808594, 53.80369567871094, 37.30571746826172, 217.5851287841797, 249.90740966796875, 110.34657287597656, 280.8262939453125, 544.0509643554688, -87.48577117919922, 212.25653076171875, 454.94573974609375, 134.5635223388672, 407.8291320800781, 243.80589294433594, 413.7392272949219, -69.38799285888672, 73.18895721435547, 109.79065704345703, 529.0694580078125, -2.591217041015625, 26.44259262084961], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000510.npy"}
{"epoch": 0.748898678414097, "step": 511, "batch_size": 64, "mean": 205.57594299316406, "std": 234.52728271484375, "min": -410.21099853515625, "p10": -46.100232696533205, "median": 191.35296630859375, "p90": 494.2055572509766, "max": 761.707275390625, "pos_frac": 0.78125, "sample": [73.01615142822266, 639.8096923828125, 135.06137084960938, 146.97972106933594, 218.40203857421875, 183.3300018310547, 335.3896484375, 640.784912109375, 197.64418029785156, 195.75892639160156, 385.775390625, 294.781982421875, 483.42333984375, 498.8265075683594, -40.20387268066406, 664.916748046875, -410.21099853515625, 465.98675537109375, -0.7857666015625, 371.4873046875, 355.8904724121094, 394.962158203125, 68.60920715332031, 321.2265625, 167.17092895507812, 90.54276275634766, -321.8583984375, 359.98138427734375, -5.407745361328125, 425.0874328613281, 463.144775390625, 55.88423156738281, -25.17862892150879, 170.34364318847656, 250.57003784179688, 253.98704528808594, 365.6573486328125, 144.62503051757812, -44.44342041015625, 127.70366668701172, 360.5195617675781, 230.566650390625, 114.99214172363281, -29.779991149902344, -46.81029510498047, 585.3904418945312, 225.28329467773438, -47.81648254394531, -110.36209869384766, 106.56983947753906, -105.92691802978516, 145.9456787109375, 363.6878356933594, -23.49039649963379, 63.08416748046875, 186.94700622558594, 214.7310791015625, 12.602714538574219, 761.707275390625, 78.2646484375, 222.069580078125, -138.95855712890625, 225.5972900390625, 663.3732299804688], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000511.npy"}
{"epoch": 0.750367107195301, "step": 512, "batch_size": 64, "mean": 222.03216552734375, "std": 183.8060760498047, "min": -103.2437744140625, "p10": -23.01773052215576, "median": 246.0050506591797, "p90": 487.04219055175787, "max": 660.988525390625, "pos_frac": 0.859375, "sample": [572.4779052734375, 512.1219482421875, 288.99407958984375, -58.70458984375, -23.721603393554688, -59.756011962890625, 508.58544921875, 33.30011749267578, -103.2437744140625, 82.41146087646484, 117.92863464355469, 336.3582458496094, 1.9968185424804688, 246.63951110839844, -50.234466552734375, 133.25624084472656, 313.35260009765625, 260.3805847167969, 134.7567138671875, -21.3753604888916, -24.60114288330078, 359.1644592285156, 8.533723831176758, 289.6444396972656, 660.988525390625, 283.95867919921875, 157.3367462158203, 176.32086181640625, 76.14702606201172, 406.0696716308594, 178.3258056640625, 275.636474609375, 224.1648406982422, 328.39715576171875, 157.9783477783203, 235.00743103027344, 245.37059020996094, 302.4192199707031, 43.06237030029297, 432.76922607421875, 146.67599487304688, 285.60955810546875, 362.98345947265625, 518.482666015625, 8.492645263671875, -40.29658508300781, 334.49066162109375, 15.476322174072266, 96.14390563964844, 283.7955017089844, 601.2254638671875, 18.43091583251953, 261.9714660644531, 405.28662109375, 249.6119384765625, 234.48873901367188, 282.2027282714844, 178.78488159179688, 289.80194091796875, 264.86614990234375, -8.539661407470703, 476.990478515625, 491.3500671386719, 409.513916015625], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000512.npy"}
{"epoch": 0.7518355359765051, "step": 513, "batch_size": 64, "mean": 167.71078491210938, "std": 206.7906494140625, "min": -320.22857666015625, "p10": -78.4668098449707, "median": 156.43409729003906, "p90": 439.64476623535154, "max": 641.3388671875, "pos_frac": 0.78125, "sample": [439.3439636230469, 225.4139404296875, 187.70729064941406, 32.94043731689453, 77.29763793945312, 262.9375915527344, 56.869224548339844, 129.2205047607422, -199.71678161621094, 484.3063049316406, 312.2839050292969, 150.6593017578125, 439.773681640625, 152.27490234375, 3.2550907135009766, 384.2076416015625, 39.89044189453125, -78.47610473632812, 138.36997985839844, 294.7370300292969, 155.0233154296875, 487.27142333984375, 494.15618896484375, 295.90716552734375, 141.62677001953125, 157.64114379882812, 215.7134552001953, -13.05255126953125, 307.7989501953125, 12.004512786865234, 177.69775390625, 38.917320251464844, 337.49188232421875, 364.31439208984375, 225.6075439453125, -9.02880859375, 129.5409698486328, -206.40150451660156, 209.06097412109375, 641.3388671875, -89.09428405761719, 27.491294860839844, 270.41162109375, -320.22857666015625, 549.1156616210938, 155.22705078125, -0.5915451049804688, -5.361398696899414, 375.18450927734375, 414.7669982910156, -151.90306091308594, -34.945884704589844, 508.6567077636719, 281.29803466796875, -78.44512176513672, 220.86257934570312, -2.940317153930664, 100.88927459716797, 188.83035278320312, 207.1875762939453, -319.0726623535156, 400.37744140625, 81.22579956054688, 258.6217041015625], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000513.npy"}
{"epoch": 0.7533039647577092, "step": 514, "batch_size": 64, "mean": 174.20184326171875, "std": 221.8417510986328, "min": -239.40150451660156, "p10": -106.17189178466795, "median": 136.51172637939453, "p90": 465.66338806152345, "max": 858.7373046875, "pos_frac": 0.796875, "sample": [-233.79763793945312, -112.6333999633789, 156.8701934814453, -239.40150451660156, 398.44818115234375, 178.72311401367188, 392.5505065917969, 244.2906951904297, 858.7373046875, 580.9469604492188, 58.48103332519531, 200.0722198486328, -122.01953887939453, 90.76971435546875, 338.4891357421875, -17.531570434570312, 12.897296905517578, 186.31082153320312, 527.5242309570312, 575.4404907226562, -165.7571258544922, 103.72985076904297, 353.21954345703125, -37.53508758544922, 21.094078063964844, 37.424652099609375, 110.63700866699219, -38.66455078125, 207.51995849609375, 28.06926727294922, 276.642578125, 238.31556701660156, 51.01757049560547, 159.26397705078125, 110.74945831298828, 100.60621643066406, 375.7201232910156, -157.06698608398438, 421.5945129394531, -28.119083404541016, 122.75297546386719, -115.38719177246094, -91.09503936767578, 246.2042236328125, 222.70785522460938, 57.62946701049805, -46.47637939453125, 29.111557006835938, 120.17491912841797, 468.0354919433594, 176.40072631835938, 364.1112060546875, 63.35087585449219, 459.5088806152344, 544.249267578125, 155.78125, 201.60618591308594, 128.31234741210938, 460.12847900390625, 144.7111053466797, 417.8642578125, 107.48448181152344, 110.06619262695312, 558.0553588867188], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000514.npy"}
{"epoch": 0.7547723935389133, "step": 515, "batch_size": 64, "mean": 155.23159790039062, "std": 189.80120849609375, "min": -342.8394775390625, "p10": -43.718778610229485, "median": 140.90076446533203, "p90": 396.9640350341797, "max": 568.0260009765625, "pos_frac": 0.828125, "sample": [40.45512771606445, 109.39671325683594, 348.4062805175781, 480.8321228027344, 96.43022918701172, -342.8394775390625, 345.1728210449219, 238.99668884277344, 8.06646728515625, -191.48219299316406, -15.564102172851562, 140.47950744628906, 5.374237060546875, 428.7276611328125, 333.086669921875, 173.827392578125, 94.66105651855469, 327.5538024902344, 179.4638214111328, -51.814178466796875, 204.72879028320312, -190.8963165283203, 172.81700134277344, 363.8214111328125, 122.13529968261719, 59.76725769042969, 399.6446533203125, -47.98086166381836, 175.39666748046875, 174.12742614746094, 519.8783569335938, 16.272829055786133, -33.77391815185547, 149.56637573242188, 356.89678955078125, 383.2747802734375, 291.43536376953125, 7.374137878417969, 473.603271484375, 26.981475830078125, 106.56886291503906, 390.7092590332031, -214.0883026123047, -12.136741638183594, -105.214599609375, 147.86062622070312, 41.61920928955078, 568.0260009765625, 43.4965705871582, 63.663909912109375, 103.6126937866211, 432.74267578125, 21.001625061035156, -4.830284118652344, 357.8268127441406, 345.8905944824219, 27.541046142578125, 55.459197998046875, 285.70758056640625, 241.65969848632812, 115.57819366455078, 148.998046875, 141.322021484375, 257.50634765625], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000515.npy"}
{"epoch": 0.7562408223201175, "step": 516, "batch_size": 64, "mean": 217.89169311523438, "std": 186.21255493164062, "min": -128.47225952148438, "p10": -29.772582817077637, "median": 211.7223892211914, "p90": 470.0875701904298, "max": 693.8282470703125, "pos_frac": 0.875, "sample": [161.15640258789062, 200.3284912109375, 208.04409790039062, 55.32372283935547, 447.4021911621094, 267.068115234375, 236.09014892578125, -50.57063293457031, 241.4073028564453, 397.8221435546875, 550.1857299804688, 253.29444885253906, 259.31463623046875, 308.74822998046875, 75.60600280761719, 0.297149658203125, 325.4654846191406, 84.57695007324219, 166.08984375, 508.517578125, -29.445877075195312, 86.91146850585938, -29.912599563598633, 75.97175598144531, 324.2142639160156, 149.7836151123047, 12.090904235839844, 495.56903076171875, 693.8282470703125, 180.70509338378906, 13.836883544921875, 506.5174560546875, 651.3410034179688, 305.00872802734375, 442.9731750488281, 218.41586303710938, 295.36517333984375, 23.00588607788086, 133.5037078857422, 162.33218383789062, 222.77125549316406, -54.870147705078125, 305.6007080078125, 206.50967407226562, -32.577693939208984, 444.123291015625, 189.5083770751953, 114.4019775390625, 15.135459899902344, 207.70787048339844, 292.71832275390625, 337.7802429199219, 38.3642578125, 215.4006805419922, 289.238037109375, -128.47225952148438, -79.58796691894531, 32.82135009765625, 346.1063232421875, 372.9295959472656, 334.42236328125, 479.80987548828125, 419.14471435546875, -32.1017951965332], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000516.npy"}
{"epoch": 0.7577092511013216, "step": 517, "batch_size": 64, "mean": 215.03375244140625, "std": 211.66285705566406, "min": -197.79759216308594, "p10": -26.8998212814331, "median": 217.8975601196289, "p90": 466.8817932128907, "max": 855.3994140625, "pos_frac": 0.84375, "sample": [194.88027954101562, 446.94140625, 302.2136535644531, 40.154754638671875, 405.11785888671875, 96.93795776367188, 163.19813537597656, 254.07879638671875, 352.22747802734375, -197.79759216308594, 247.97412109375, 650.46435546875, 208.150634765625, 253.191162109375, 279.53656005859375, 65.56403350830078, 263.87451171875, 148.01126098632812, 310.48895263671875, 475.42767333984375, 529.9541015625, 343.69696044921875, 158.9612274169922, 86.88980102539062, 79.15560150146484, -29.168163299560547, 75.23352813720703, 245.77117919921875, 35.576324462890625, 855.3994140625, 91.52790832519531, 93.04261779785156, -119.10188293457031, 441.94622802734375, -5.217529296875, 324.9315185546875, -133.93060302734375, 176.60073852539062, -21.607023239135742, 63.38671112060547, 308.4149169921875, 47.882789611816406, 213.0922088623047, 261.6862487792969, 246.98004150390625, 305.90472412109375, 272.16131591796875, 238.98489379882812, -1.4407272338867188, 674.234375, 91.15286254882812, 321.91448974609375, 331.6571044921875, 685.5623168945312, -64.6353759765625, 625.650146484375, 166.1543426513672, 222.70291137695312, -75.09583282470703, 253.35545349121094, -118.0904541015625, 385.47796630859375, 73.57083129882812, 41.196495056152344], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000517.npy"}
{"epoch": 0.7591776798825257, "step": 518, "batch_size": 64, "mean": 180.52365112304688, "std": 208.629150390625, "min": -300.4300537109375, "p10": -81.40617485046384, "median": 179.51942443847656, "p90": 502.05253295898456, "max": 626.0545654296875, "pos_frac": 0.8125, "sample": [0.8412437438964844, 526.0546875, 278.3201599121094, 227.03782653808594, 189.5590057373047, 230.0487823486328, 69.58258819580078, 226.66879272460938, 56.9573860168457, 100.04315185546875, 215.8677215576172, 370.5435791015625, 93.26019287109375, 15.017654418945312, 269.7586975097656, 162.3553009033203, 230.62542724609375, 116.16278839111328, 71.61398315429688, 382.56689453125, 454.6016845703125, 95.29552459716797, -148.45523071289062, 159.22360229492188, -89.769287109375, 269.7188415527344, 437.822509765625, 176.63140869140625, 224.3231964111328, 551.7708740234375, 608.499267578125, -102.32820129394531, 553.506103515625, 552.976318359375, 46.027259826660156, 191.77926635742188, 129.50714111328125, -10.150909423828125, 293.71307373046875, 357.8116455078125, 79.32868957519531, -23.43133544921875, 259.5336608886719, 131.19754028320312, 243.78228759765625, 45.22692108154297, -147.24317932128906, -300.4300537109375, 310.9964904785156, 127.69143676757812, -17.339736938476562, -24.293025970458984, -175.07444763183594, 182.40744018554688, -278.224365234375, 626.0545654296875, 325.5008850097656, 113.95388793945312, -61.89224624633789, 522.3886108398438, 285.5110778808594, 91.02198791503906, 363.95953369140625, 287.4967041015625], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000518.npy"}
{"epoch": 0.7606461086637298, "step": 519, "batch_size": 64, "mean": 163.3507080078125, "std": 194.70159912109375, "min": -231.46798706054688, "p10": -52.95311355590819, "median": 149.54071807861328, "p90": 426.47125244140625, "max": 755.1761474609375, "pos_frac": 0.859375, "sample": [42.364593505859375, 220.81394958496094, 153.73936462402344, -216.92922973632812, 466.136474609375, -106.30135345458984, 230.00392150878906, 423.581787109375, 421.7315673828125, 271.25030517578125, 131.67100524902344, 428.6229553222656, -38.59994125366211, 484.03863525390625, 236.11373901367188, -174.74227905273438, 543.7180786132812, 314.897705078125, 159.6307373046875, 217.91517639160156, 174.39244079589844, 48.54161071777344, 50.166343688964844, 66.09715270996094, 109.7681884765625, 30.933258056640625, 204.2855224609375, 80.56913757324219, 236.16424560546875, 56.992820739746094, 4.763824462890625, 427.7095947265625, 145.34207153320312, -73.77276611328125, 241.82550048828125, 174.80230712890625, 142.11875915527344, 72.49321746826172, -30.17406463623047, 133.7688446044922, 168.26870727539062, 26.634002685546875, 326.09320068359375, 80.59393310546875, -231.46798706054688, 409.5518798828125, 755.1761474609375, 37.337684631347656, 96.72918701171875, 94.30589294433594, 154.4171142578125, 619.7162475585938, 31.020793914794922, 324.968017578125, 42.91064453125, 188.12078857421875, 314.9410705566406, 8.385665893554688, 242.2464599609375, 210.728759765625, 131.5925750732422, -59.10447311401367, 185.45486450195312, -210.62091064453125], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000519.npy"}
{"epoch": 0.762114537444934, "step": 520, "batch_size": 64, "mean": 178.88897705078125, "std": 180.76693725585938, "min": -169.10671997070312, "p10": -16.308749389648433, "median": 115.99922561645508, "p90": 439.2230072021485, "max": 755.41748046875, "pos_frac": 0.859375, "sample": [103.99388885498047, -37.30596923828125, 606.7296142578125, 66.57994079589844, 285.4018249511719, 116.223876953125, 356.21453857421875, 8.3746337890625, -28.67011260986328, 152.01382446289062, 140.41456604003906, -18.024658203125, 389.0807800292969, 24.795211791992188, 55.20995330810547, 130.22817993164062, 397.69903564453125, 755.41748046875, 25.998016357421875, 115.54617309570312, 349.8480224609375, 518.9391479492188, -169.10671997070312, 184.2568359375, -12.304962158203125, 445.732177734375, 84.47919464111328, 470.42071533203125, 169.66558837890625, 317.84527587890625, 25.429317474365234, 89.36045837402344, 446.1685791015625, 85.67018127441406, 160.84979248046875, 239.65382385253906, 362.13360595703125, 317.1284484863281, 325.9598083496094, 36.140960693359375, 275.60546875, 115.77457427978516, 105.51197814941406, 105.79351806640625, -30.80572509765625, 374.1615905761719, -63.42292022705078, 115.31433868408203, 469.681396484375, -7.8919677734375, 119.41444396972656, 114.69754028320312, 76.29025268554688, 220.79537963867188, 86.62863159179688, 424.0349426269531, 19.85491943359375, 190.6304931640625, -25.987289428710938, 290.1714172363281, 61.85272216796875, 162.39073181152344, 103.10726928710938, 51.100341796875], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000520.npy"}
{"epoch": 0.7635829662261381, "step": 521, "batch_size": 64, "mean": 177.82164001464844, "std": 195.54122924804688, "min": -171.74948120117188, "p10": -42.62384605407713, "median": 153.37843322753906, "p90": 459.1825622558594, "max": 758.781005859375, "pos_frac": 0.8125, "sample": [230.63319396972656, 463.0436706542969, 265.8829040527344, 17.334251403808594, -124.44788360595703, -171.74948120117188, 34.0802001953125, 2.8094482421875, -10.160797119140625, 282.1695556640625, 44.65016174316406, -7.067955017089844, -21.88693618774414, 65.99445343017578, 450.1733093261719, 558.5366821289062, 46.012115478515625, 26.07144546508789, -51.51109313964844, 478.20361328125, -73.98143005371094, -20.913040161132812, 112.03974151611328, 109.45075988769531, 421.5709228515625, 498.3977355957031, 338.7198181152344, 157.01275634765625, 21.33362579345703, 149.74411010742188, 237.222900390625, 306.07525634765625, 46.715354919433594, 254.679931640625, -60.885902404785156, 241.46450805664062, 58.993377685546875, 197.09115600585938, 389.1288146972656, -55.64714813232422, 211.49868774414062, 353.73675537109375, 117.82299041748047, 21.220294952392578, 59.4263916015625, 265.6451416015625, -18.03207015991211, 291.17138671875, 59.459747314453125, -95.63774871826172, 252.68515014648438, 219.6131134033203, 501.13531494140625, 65.61357116699219, 39.21905517578125, 363.5826110839844, 299.2554931640625, 428.1377868652344, 23.71361541748047, 758.781005859375, 256.9635925292969, 199.67991638183594, 288.7080383300781, 510.2013244628906], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000521.npy"}
{"epoch": 0.7650513950073421, "step": 522, "batch_size": 64, "mean": 163.5058135986328, "std": 173.3970489501953, "min": -262.37652587890625, "p10": -35.5775592803955, "median": 159.16161346435547, "p90": 372.7402282714844, "max": 561.4367065429688, "pos_frac": 0.875, "sample": [258.64068603515625, 214.06272888183594, 346.00482177734375, -40.28984832763672, 516.7376098632812, -24.582218170166016, 359.5299377441406, 127.99749755859375, 44.1004524230957, 23.244365692138672, 74.4755859375, 32.731346130371094, 442.5709228515625, 16.790794372558594, 377.88543701171875, 198.821044921875, 286.5068664550781, -44.5115852355957, 241.2089080810547, 220.98565673828125, 306.47979736328125, 320.02752685546875, 163.91586303710938, 175.23316955566406, -174.68438720703125, 561.4367065429688, 126.38031768798828, 209.30970764160156, 360.7347412109375, 139.31277465820312, 150.67495727539062, 113.12632751464844, 120.80610656738281, 38.15472412109375, 267.6949462890625, -262.37652587890625, 232.84259033203125, 43.788780212402344, 219.41326904296875, -59.1632080078125, 15.607681274414062, 232.64239501953125, 397.7086181640625, 48.726966857910156, 7.296985626220703, 257.300537109375, 253.6856689453125, 71.54388427734375, -227.92599487304688, 356.4149169921875, 231.86270141601562, 95.7266845703125, 315.5664978027344, 132.0404815673828, 91.87779235839844, 16.06598663330078, -170.04904174804688, 5.528236389160156, 391.90130615234375, 183.78497314453125, 486.6998291015625, 74.277099609375, 315.66107177734375, 154.40736389160156], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000522.npy"}
{"epoch": 0.7665198237885462, "step": 523, "batch_size": 64, "mean": 172.74790954589844, "std": 181.0349578857422, "min": -381.0469970703125, "p10": -41.205155181884756, "median": 189.08367919921875, "p90": 374.0787811279297, "max": 601.4450073242188, "pos_frac": 0.8125, "sample": [-8.661666870117188, 371.45025634765625, 81.85145568847656, 263.474853515625, 296.67608642578125, -44.66304016113281, 283.28289794921875, 408.5224304199219, 256.67462158203125, 220.95281982421875, -238.02842712402344, 268.6903991699219, 6.796232223510742, 215.33810424804688, 32.54270553588867, 51.04167175292969, 166.89642333984375, 363.66473388671875, 249.6367950439453, 202.12152099609375, 107.01368713378906, 375.2052917480469, 162.0791015625, -50.02351379394531, 125.89164733886719, 101.83625793457031, 218.82679748535156, 100.08224487304688, -8.289543151855469, 601.4450073242188, 241.2864227294922, 115.61952209472656, 149.71209716796875, -381.0469970703125, 214.98483276367188, 359.00726318359375, -111.0093765258789, -49.49238586425781, 322.32745361328125, 36.92424011230469, 251.74362182617188, 37.887725830078125, 391.6393737792969, 176.04583740234375, 231.8789825439453, -19.941619873046875, 352.0118103027344, 142.91650390625, 466.40521240234375, 150.511474609375, 388.0501403808594, 170.737060546875, 262.7799072265625, 55.498924255371094, 367.236572265625, 210.0154266357422, 272.8980712890625, -33.136756896972656, 247.06076049804688, 547.4414672851562, 364.7535705566406, 125.54963684082031, -184.5943603515625, -0.16396713256835938], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000523.npy"}
{"epoch": 0.7679882525697503, "step": 524, "batch_size": 64, "mean": 173.58346557617188, "std": 180.7813262939453, "min": -303.9987487792969, "p10": -59.56441154479979, "median": 182.2446746826172, "p90": 390.12961730957034, "max": 584.9140625, "pos_frac": 0.828125, "sample": [120.99665832519531, 584.9140625, 96.31094360351562, 220.48348999023438, 180.27008056640625, 178.19244384765625, 298.00958251953125, 45.90827178955078, 274.1288757324219, 145.43515014648438, 276.91790771484375, 103.39413452148438, 125.03225708007812, 401.12774658203125, 290.2395935058594, 190.74325561523438, 14.358306884765625, 275.90966796875, 106.79948425292969, 209.90325927734375, -303.9987487792969, 74.223388671875, 220.6583251953125, 479.3656005859375, 83.83592224121094, -66.42636108398438, 238.63323974609375, 260.7338562011719, 465.52996826171875, -43.55319595336914, 31.169185638427734, 88.74458312988281, 104.30916595458984, -37.45726013183594, 378.8424377441406, -102.99172973632812, 303.073486328125, 517.4478149414062, 281.7774353027344, 232.4905242919922, 184.21926879882812, -173.67044067382812, 28.736892700195312, -99.30022430419922, 358.18548583984375, -8.445831298828125, 310.416748046875, 11.602531433105469, 293.0025634765625, 347.613525390625, 408.59429931640625, 368.3070373535156, 295.426513671875, -197.49575805664062, -6.338096618652344, 72.72431945800781, 394.96697998046875, 131.5054168701172, 76.5932388305664, 298.8808288574219, 283.1587219238281, 154.68785095214844, 336.8117370605469, -106.32489013671875], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000524.npy"}
{"epoch": 0.7694566813509545, "step": 525, "batch_size": 64, "mean": 220.95278930664062, "std": 179.00204467773438, "min": -154.89183044433594, "p10": -3.6514127731323227, "median": 222.77310180664062, "p90": 463.5680694580078, "max": 622.7523803710938, "pos_frac": 0.875, "sample": [274.41522216796875, 417.33941650390625, 189.60194396972656, 162.35623168945312, 7.912012100219727, 427.08514404296875, 336.1448669433594, 202.07315063476562, 400.7232971191406, 274.06707763671875, 117.2687759399414, -91.13458251953125, 321.298828125, 119.87620544433594, 115.8358154296875, -154.89183044433594, 185.96791076660156, 274.6269226074219, 501.65338134765625, 222.04904174804688, 360.79840087890625, 112.14816284179688, -2.08099365234375, 234.76162719726562, 71.36090087890625, 230.39199829101562, 200.47596740722656, 329.95654296875, 223.49716186523438, 359.653564453125, 365.2242126464844, 31.837608337402344, 568.6541137695312, 89.63695526123047, 461.9260559082031, 186.77294921875, 27.591087341308594, 234.31643676757812, 317.48101806640625, 31.193206787109375, 622.7523803710938, -46.3201904296875, -51.70819091796875, -4.32444953918457, 5.096099853515625, 205.06439208984375, 185.78695678710938, 171.61032104492188, 333.2232666015625, 258.18670654296875, 359.56353759765625, 515.0740356445312, -38.640045166015625, 225.39666748046875, 464.27178955078125, 237.0481719970703, 127.64785766601562, 49.16393280029297, -106.94722747802734, 363.1595153808594, 240.741455078125, 552.635986328125, 195.05711364746094, 537.5716552734375], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000525.npy"}
{"epoch": 0.7709251101321586, "step": 526, "batch_size": 64, "mean": 215.81175231933594, "std": 189.93629455566406, "min": -154.8828125, "p10": -36.636400794982904, "median": 219.33541870117188, "p90": 439.8996215820313, "max": 765.9007568359375, "pos_frac": 0.84375, "sample": [565.6262817382812, 388.3387451171875, 95.15898132324219, 175.82501220703125, 231.43215942382812, 150.1019287109375, -30.631418228149414, -80.96143341064453, 279.84930419921875, -154.8828125, 428.3412780761719, 311.71868896484375, 208.71859741210938, 115.64419555664062, 435.5721130371094, 592.023681640625, 118.58340454101562, 309.95062255859375, 150.63357543945312, 327.9078369140625, 127.93698120117188, 307.8205261230469, -54.75230407714844, 441.3590087890625, 302.84735107421875, 311.5511779785156, 765.9007568359375, 219.78611755371094, 260.1652526855469, 23.573495864868164, -91.49620056152344, 409.054931640625, -63.137046813964844, 75.31047821044922, -53.1072998046875, 54.446075439453125, 366.0849914550781, 227.4336395263672, 71.08145141601562, -39.209964752197266, 218.8847198486328, 277.6705017089844, 301.48980712890625, 216.585693359375, -23.797164916992188, 207.08804321289062, 436.494384765625, 462.2109375, 38.176490783691406, 239.7957000732422, 51.654754638671875, 181.16798400878906, 111.38320922851562, 519.6785278320312, 292.2525939941406, 363.8010559082031, 402.63653564453125, 247.18968200683594, 258.158203125, 26.190750122070312, 31.07276153564453, 159.50587463378906, -7.0519561767578125, 518.113037109375], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000526.npy"}
{"epoch": 0.7723935389133627, "step": 527, "batch_size": 64, "mean": 183.49468994140625, "std": 176.53504943847656, "min": -139.4689178466797, "p10": -32.247566223144524, "median": 157.1336441040039, "p90": 440.59020690917976, "max": 634.7939453125, "pos_frac": 0.84375, "sample": [86.20024871826172, 445.9291687011719, 362.9194641113281, 268.4895935058594, 334.93218994140625, 157.46731567382812, 634.7939453125, -58.37217712402344, -131.03482055664062, 120.3612060546875, 82.67231750488281, -27.194366455078125, 459.0830993652344, 142.92117309570312, 72.55828094482422, -92.67601776123047, 378.3196716308594, 27.881492614746094, 206.5059814453125, 105.69671630859375, -139.4689178466797, 428.13262939453125, 125.61709594726562, 227.42236328125, 88.6572265625, 334.56292724609375, -34.41322326660156, 84.1867446899414, 169.27334594726562, 246.96913146972656, 270.1744079589844, 274.4039306640625, 213.6351318359375, 292.9788818359375, 562.7074584960938, 499.565673828125, 537.4710693359375, -20.80646324157715, -14.845169067382812, 192.25881958007812, 520.515380859375, 128.0357666015625, 412.6956787109375, 114.33485412597656, -80.4383544921875, 165.77691650390625, 269.24859619140625, 139.96762084960938, 183.64059448242188, 107.00389099121094, 169.8342742919922, 81.11868286132812, 156.7999725341797, 252.72515869140625, 35.830718994140625, 65.00248718261719, 407.5158996582031, 193.89439392089844, 180.43014526367188, 145.7239532470703, 67.86376953125, -67.05719757080078, 86.97441864013672, 90.28474426269531], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000527.npy"}
{"epoch": 0.7738619676945668, "step": 528, "batch_size": 64, "mean": 182.74923706054688, "std": 208.3048553466797, "min": -160.34051513671875, "p10": -63.65932235717772, "median": 174.04199981689453, "p90": 467.89918823242186, "max": 786.9425659179688, "pos_frac": 0.828125, "sample": [5.354331970214844, 276.28875732421875, 309.3101501464844, 204.44708251953125, 67.68707275390625, -72.33837890625, -153.05096435546875, 466.83837890625, 266.41485595703125, 160.04852294921875, 306.16595458984375, 86.07801818847656, 152.42572021484375, -54.06988525390625, 219.2050018310547, 54.73908233642578, 310.0734558105469, -17.016433715820312, 571.6469116210938, 786.9425659179688, -67.76908111572266, 382.28826904296875, 171.697509765625, 204.32061767578125, 206.43637084960938, 514.406982421875, 16.341819763183594, 674.038330078125, 4.634620666503906, 265.1131286621094, -160.34051513671875, 151.77606201171875, 218.17572021484375, 27.89395523071289, 312.08184814453125, 179.5105438232422, 59.071876525878906, 230.4900360107422, 336.30364990234375, 175.16305541992188, 76.23882293701172, 453.7424011230469, 37.758819580078125, -26.175872802734375, 504.2852783203125, 15.595695495605469, 360.9647216796875, 529.038818359375, 207.4179229736328, 318.2300720214844, -121.86970520019531, -92.61515808105469, 54.60099792480469, 386.7918701171875, -26.006437301635742, 468.35382080078125, 178.66322326660156, 17.311275482177734, -149.21800231933594, 7.0999298095703125, 50.20428466796875, 172.9209442138672, 377.91778564453125, 45.87462615966797], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000528.npy"}
{"epoch": 0.775330396475771, "step": 529, "batch_size": 64, "mean": 146.76773071289062, "std": 197.5183868408203, "min": -266.93853759765625, "p10": -113.50779190063476, "median": 150.0086898803711, "p90": 393.2931793212891, "max": 687.3065795898438, "pos_frac": 0.75, "sample": [48.03150939941406, -8.438674926757812, 134.5556640625, -116.35345458984375, 190.6978759765625, 152.29127502441406, 227.7801513671875, 244.4603271484375, 86.21713256835938, 687.3065795898438, 208.7659912109375, 340.0047912597656, 160.12057495117188, 18.45561981201172, 296.8396301269531, -10.124368667602539, 101.39582061767578, 126.17635345458984, 226.40139770507812, -28.11609649658203, -41.944541931152344, 144.12347412109375, 385.303955078125, 381.1527099609375, 469.48931884765625, -266.93853759765625, 53.223655700683594, 328.8468933105469, -146.92897033691406, 453.58013916015625, 217.66064453125, 147.72610473632812, 25.475366592407227, 396.7171325683594, 32.17791748046875, -59.84608459472656, 164.87994384765625, 252.60716247558594, 340.230712890625, 275.49530029296875, -176.34730529785156, 284.62078857421875, 259.4590759277344, -224.1046905517578, 400.1242980957031, -183.5819854736328, 185.7750244140625, -34.32269287109375, 442.51495361328125, 32.589942932128906, 80.16554260253906, 302.253173828125, 16.125940322875977, 256.0795593261719, -84.24281311035156, 58.827850341796875, -48.239845275878906, 345.89569091796875, 432.0997619628906, 322.463134765625, -106.86791229248047, 8.523208618164062, -139.17324829101562, 322.9972229003906], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000529.npy"}
{"epoch": 0.7767988252569751, "step": 530, "batch_size": 64, "mean": 223.07403564453125, "std": 210.052490234375, "min": -428.63922119140625, "p10": -5.00976715087889, "median": 197.3610382080078, "p90": 495.1701049804688, "max": 677.1543579101562, "pos_frac": 0.890625, "sample": [381.9906005859375, 226.28433227539062, 13.093841552734375, 223.91690063476562, 132.58648681640625, -70.0516586303711, 534.358642578125, 468.7634582519531, 161.67312622070312, -82.2244873046875, 183.8505401611328, -105.91869354248047, 188.13787841796875, 63.139190673828125, 423.4642333984375, 305.3428955078125, 63.59663391113281, 217.48236083984375, 93.45960235595703, 328.959716796875, 379.0938720703125, 186.26394653320312, 564.2098999023438, 206.58419799804688, 10.749343872070312, 308.3790283203125, -11.763671875, 355.582763671875, 103.68174743652344, 345.5198974609375, 11.99778938293457, 481.5356140136719, 501.0134582519531, 254.96153259277344, 323.4234924316406, 18.278270721435547, 480.3862609863281, 175.60543823242188, 20.3880615234375, -107.52546691894531, 108.31959533691406, 445.2413024902344, 133.79635620117188, 651.9367065429688, -12.611265182495117, 159.08126831054688, 386.8990783691406, 55.41179656982422, 677.1543579101562, 38.68012237548828, 307.0427551269531, 74.60047149658203, 503.2987060546875, -428.63922119140625, 613.3111572265625, 276.90338134765625, 141.17581176757812, 382.4852294921875, 321.2084655761719, 352.38189697265625, 62.3716926574707, 362.12481689453125, 180.17022705078125, 124.1222915649414], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000530.npy"}
{"epoch": 0.7782672540381792, "step": 531, "batch_size": 64, "mean": 205.79470825195312, "std": 199.82810974121094, "min": -234.418212890625, "p10": 3.7268716812133835, "median": 161.45042419433594, "p90": 468.0561096191406, "max": 672.5530395507812, "pos_frac": 0.921875, "sample": [214.21481323242188, 16.132009506225586, 343.6250305175781, 672.5530395507812, 421.48468017578125, 10.442550659179688, 42.737831115722656, 67.15042877197266, 639.365966796875, 1.904327392578125, 33.00154113769531, 434.150390625, 7.979475021362305, 383.8898010253906, 154.16534423828125, 237.9880828857422, 294.61920166015625, 8.473098754882812, 9.848907470703125, 142.97457885742188, 599.082275390625, 107.93898010253906, 335.76031494140625, 57.362213134765625, 282.2099304199219, 225.61306762695312, 107.99565124511719, 210.91017150878906, 268.5252380371094, 24.026161193847656, 435.81768798828125, 314.70465087890625, 306.95391845703125, 123.51791381835938, 8.053693771362305, 210.51278686523438, 0.3791961669921875, -26.167816162109375, -7.684991836547852, 465.8308410644531, 465.3145751953125, 496.040771484375, 168.73550415039062, 96.75355529785156, 518.66259765625, -234.418212890625, 186.9342041015625, 230.0546875, 469.0097961425781, 451.82818603515625, 344.05084228515625, 96.97188568115234, 625.4840087890625, 55.09180450439453, 129.49069213867188, 24.117294311523438, 243.37237548828125, 52.65284729003906, 135.01101684570312, 26.641921997070312, 140.03305053710938, 370.7023620605469, -2.1020736694335938, -107.61456298828125], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000531.npy"}
{"epoch": 0.7797356828193832, "step": 532, "batch_size": 64, "mean": 213.4281005859375, "std": 239.6318817138672, "min": -379.35369873046875, "p10": -31.114125823974607, "median": 188.32184600830078, "p90": 496.8111389160157, "max": 918.910400390625, "pos_frac": 0.84375, "sample": [629.9027709960938, 325.7356872558594, 561.406005859375, 83.6661605834961, 578.9548950195312, 138.72219848632812, 419.99432373046875, 307.8595275878906, -367.8048400878906, 187.60108947753906, -196.6649932861328, 307.59881591796875, 251.42884826660156, 407.158203125, 17.061691284179688, 150.47738647460938, 354.6803894042969, -379.35369873046875, 339.273681640625, 191.97695922851562, 380.2300720214844, 674.6854858398438, 620.643798828125, 501.6968688964844, 399.609619140625, 185.35549926757812, 52.89476013183594, 426.9132080078125, 448.21197509765625, 11.6015625, 66.30431365966797, -0.02402496337890625, -32.02569580078125, 485.4111022949219, 196.1259307861328, 78.77520751953125, 179.40670776367188, 466.24676513671875, 235.51907348632812, 252.81961059570312, -25.751625061035156, 423.9681396484375, 141.1640625, 307.8454284667969, 255.97145080566406, 189.0426025390625, 117.76331329345703, 398.9344787597656, 151.5526123046875, -28.98712921142578, 44.29077911376953, -140.71844482421875, 41.26710510253906, 918.910400390625, 89.48866271972656, 90.591796875, 164.56678771972656, -167.39292907714844, 167.25653076171875, 240.5154571533203, -52.27677917480469, 223.07568359375, 29.1224365234375, 139.12091064453125], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000532.npy"}
{"epoch": 0.7812041116005873, "step": 533, "batch_size": 64, "mean": 170.9156036376953, "std": 204.98936462402344, "min": -399.933349609375, "p10": -77.06825027465818, "median": 154.92637634277344, "p90": 456.2356872558594, "max": 662.5997314453125, "pos_frac": 0.8125, "sample": [-88.86190795898438, -134.75534057617188, 223.90609741210938, 108.99002838134766, 69.18412780761719, 144.14999389648438, 514.5784301757812, 456.82684326171875, 161.15545654296875, 128.6458740234375, 572.677978515625, 309.38665771484375, 416.39208984375, 341.85394287109375, -133.20559692382812, 270.0691223144531, 424.26580810546875, 182.37783813476562, 82.74688720703125, -92.98275756835938, 226.15126037597656, 74.393310546875, 200.07110595703125, 453.7485656738281, 177.93605041503906, 292.9088134765625, 127.3772201538086, -399.933349609375, 112.14385986328125, 662.5997314453125, 80.52310180664062, 354.2899475097656, 73.01486206054688, 97.7448501586914, 42.78292465209961, -105.80680084228516, 3.7256240844726562, 173.63621520996094, 59.581939697265625, 183.35147094726562, 16.753650665283203, 216.1116485595703, 543.0020751953125, 454.8563232421875, -13.395782470703125, 208.5535125732422, 13.152099609375, -5.4659881591796875, 191.79852294921875, 77.40267944335938, 177.9761962890625, 49.20579528808594, 366.8695373535156, 72.91484832763672, -85.59595489501953, -17.67237091064453, -57.17027282714844, 152.51519775390625, 157.33755493164062, 584.363525390625, -45.13298797607422, 214.23196411132812, 528.0003051757812, 290.3446960449219], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000533.npy"}
{"epoch": 0.7826725403817915, "step": 534, "batch_size": 64, "mean": 180.44354248046875, "std": 180.93798828125, "min": -424.0306091308594, "p10": -13.481142425537106, "median": 166.91775512695312, "p90": 428.3309539794922, "max": 619.328125, "pos_frac": 0.875, "sample": [175.42022705078125, 337.5354309082031, -10.5443115234375, 515.1582641601562, 142.37896728515625, 74.447998046875, -122.61712646484375, 322.0263366699219, 286.80517578125, 47.607582092285156, 76.52141571044922, 451.2734680175781, -14.739784240722656, 146.33724975585938, 408.08758544921875, 468.34698486328125, 271.580810546875, 127.35819244384766, 179.76998901367188, 297.35931396484375, 196.5212860107422, -19.69039535522461, 164.59304809570312, 288.64080810546875, 263.81024169921875, 266.9826354980469, 619.328125, 35.709354400634766, 76.29780578613281, 169.24246215820312, 194.62643432617188, 433.8678894042969, 129.3224639892578, -71.60807037353516, 330.9443359375, -424.0306091308594, 39.915618896484375, 453.0818786621094, 388.78021240234375, 322.677001953125, 89.01708984375, 59.24476623535156, 9.590513229370117, 108.46985626220703, 415.41143798828125, 208.0926971435547, 215.6710662841797, 75.49861145019531, 225.30133056640625, 483.24798583984375, 303.83563232421875, 95.19605255126953, 129.23300170898438, 117.57658386230469, 200.90896606445312, 45.57330322265625, 129.80682373046875, 103.2385482788086, 328.1617126464844, -24.659391403198242, 243.84420776367188, -198.4262237548828, 10.164421081542969, 135.261962890625], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000534.npy"}
{"epoch": 0.7841409691629956, "step": 535, "batch_size": 64, "mean": 216.51309204101562, "std": 196.22496032714844, "min": -249.34820556640625, "p10": -9.6130043029785, "median": 189.0677261352539, "p90": 473.00726623535166, "max": 644.1187744140625, "pos_frac": 0.890625, "sample": [-59.27784729003906, 644.1187744140625, 73.31281280517578, 152.3311767578125, 35.39073181152344, 105.13751983642578, 64.87847900390625, 513.36328125, 49.111572265625, 419.39501953125, 158.5915985107422, 244.30014038085938, 445.10980224609375, 594.6988525390625, 326.11688232421875, 182.25685119628906, -16.75304412841797, 483.06610107421875, 152.44192504882812, 165.4616241455078, 449.5366516113281, 260.5186767578125, 204.51190185546875, 421.5729064941406, 259.15966796875, 436.35357666015625, 218.4945831298828, 241.85202026367188, 35.240570068359375, 195.87860107421875, 494.98638916015625, 383.0456848144531, 387.6014404296875, 241.9668426513672, 63.396278381347656, -185.60769653320312, 396.1640930175781, 338.4239501953125, 169.577392578125, 202.678466796875, 277.721435546875, 574.319580078125, 89.87584686279297, 153.074462890625, 165.95254516601562, 155.8803253173828, 131.18524169921875, 412.96966552734375, -41.72352600097656, -249.34820556640625, 428.7774658203125, 95.9930648803711, 202.27598571777344, 53.63671112060547, -59.19325256347656, -162.51885986328125, 393.74951171875, 7.047088623046875, 553.8488159179688, 155.99447631835938, 14.875680923461914, 318.35150146484375, 93.53839111328125, 142.14999389648438], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000535.npy"}
{"epoch": 0.7856093979441997, "step": 536, "batch_size": 64, "mean": 248.81976318359375, "std": 231.21502685546875, "min": -118.45283508300781, "p10": -9.187372589111327, "median": 212.78499603271484, "p90": 522.0369415283204, "max": 1226.4571533203125, "pos_frac": 0.875, "sample": [111.28136444091797, 180.575927734375, 291.3858642578125, 393.9595031738281, 810.0221557617188, 106.62858581542969, 210.19064331054688, -71.47320556640625, -7.879123687744141, 14.528213500976562, 177.4610595703125, 258.1369323730469, 495.9179992675781, 470.317138671875, 132.28489685058594, 402.2072448730469, 653.4505004882812, 60.17802429199219, 541.8032836914062, 374.4292297363281, 451.71405029296875, 272.0526123046875, -19.311119079589844, 196.30311584472656, 342.81146240234375, 563.5986328125, 20.631973266601562, 366.335205078125, 363.9158630371094, 13.45794677734375, 1226.4571533203125, 384.01416015625, 204.41969299316406, 109.4811019897461, 298.74163818359375, 249.80096435546875, 155.58103942871094, 634.2901000976562, -9.748050689697266, -61.59964370727539, 215.3793487548828, -118.45283508300781, 533.2307739257812, 251.92733764648438, 226.05934143066406, -36.47776794433594, 145.62356567382812, 172.71832275390625, 241.0032501220703, 119.55810546875, 256.821533203125, 83.0002670288086, 130.31124877929688, 200.50836181640625, 149.04254150390625, 453.4373779296875, 132.23683166503906, 11.627178192138672, -113.64784240722656, 160.10223388671875, 393.9439697265625, 323.4710693359375, 232.30203247070312, 392.3846435546875], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000536.npy"}
{"epoch": 0.7870778267254038, "step": 537, "batch_size": 64, "mean": 209.2523193359375, "std": 214.3148651123047, "min": -216.64236450195312, "p10": -9.230555152893062, "median": 193.0299835205078, "p90": 502.3863342285158, "max": 750.4298095703125, "pos_frac": 0.875, "sample": [-216.64236450195312, -47.644012451171875, 17.646751403808594, 126.38092803955078, 20.097396850585938, 218.24928283691406, 29.449201583862305, 520.2301635742188, 27.310714721679688, 81.24911499023438, 274.9809265136719, 8.315261840820312, 330.3244323730469, 84.80949401855469, -180.3992156982422, 404.3822021484375, 614.73193359375, -42.010162353515625, 336.88861083984375, 264.72418212890625, -5.361936569213867, 441.9892272949219, 66.58795928955078, -207.232666015625, 155.6654815673828, -10.888534545898438, 577.214599609375, 340.58050537109375, 235.34005737304688, 296.9909362792969, 661.46240234375, 144.23663330078125, 560.0433959960938, 258.70068359375, 222.40158081054688, 294.5007629394531, 159.99037170410156, 403.9816589355469, 281.34173583984375, 73.85670471191406, 65.87973022460938, 397.947509765625, 89.74537658691406, -193.18411254882812, 167.81068420410156, 384.9200439453125, 22.16921615600586, 228.8472442626953, 158.48495483398438, 460.40631103515625, 120.77352905273438, 281.7955017089844, 0.7612838745117188, 252.62928771972656, 460.750732421875, 119.56917572021484, 750.4298095703125, 142.9156494140625, 431.1633605957031, 254.6815185546875, 92.96722412109375, 537.642333984375, 24.801603317260742, 313.76324462890625], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000537.npy"}
{"epoch": 0.788546255506608, "step": 538, "batch_size": 64, "mean": 210.346435546875, "std": 191.25143432617188, "min": -232.97019958496094, "p10": 11.500987243652345, "median": 199.06981658935547, "p90": 439.8353546142578, "max": 719.7328491210938, "pos_frac": 0.921875, "sample": [46.99275207519531, 310.68682861328125, 256.56640625, 572.4630737304688, 455.41943359375, -90.1524658203125, 345.3814392089844, 140.19692993164062, 203.31150817871094, 14.623023986816406, -13.806999206542969, 267.578125, 410.91912841796875, 240.9954376220703, 126.24177551269531, 369.35601806640625, 231.2560272216797, 80.97301483154297, 102.21253967285156, 570.2337646484375, 248.8539276123047, 86.23922729492188, 46.942657470703125, 376.1448974609375, 237.10833740234375, 167.3009796142578, 127.03680419921875, 719.7328491210938, 3.3997039794921875, 12.024139404296875, -232.97019958496094, 439.3171691894531, 127.83518981933594, 52.99232482910156, 692.8612060546875, 256.5107116699219, -44.68735122680664, 251.69908142089844, 197.58497619628906, 299.6622619628906, 157.99273681640625, -128.79461669921875, 17.498775482177734, 394.23388671875, 387.1062316894531, 412.83746337890625, 51.61734390258789, 200.55465698242188, 46.105255126953125, 119.16802978515625, 350.45416259765625, 269.66839599609375, 192.5146026611328, 333.8481140136719, 63.883445739746094, 49.68274688720703, 182.1446533203125, 217.18666076660156, 542.265869140625, 440.05743408203125, 11.276779174804688, 40.66510772705078, 304.2911682128906, 98.87682342529297], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000538.npy"}
{"epoch": 0.7900146842878121, "step": 539, "batch_size": 64, "mean": 187.47557067871094, "std": 210.95481872558594, "min": -257.0817565917969, "p10": -73.27134704589842, "median": 191.52127075195312, "p90": 476.22296142578125, "max": 610.3334350585938, "pos_frac": 0.78125, "sample": [227.71627807617188, 281.544677734375, 58.45934295654297, 54.168212890625, 206.09967041015625, 309.7626953125, -140.4698028564453, 28.37580108642578, -11.08514404296875, 415.33935546875, -37.08637237548828, -8.298906326293945, 341.62274169921875, 271.05999755859375, 476.2338562011719, 321.6884765625, 501.4566650390625, 195.5913543701172, 166.24554443359375, 476.1975402832031, 610.3334350585938, 185.01153564453125, 536.2723999023438, 419.72698974609375, 118.12866973876953, 4.496250152587891, -51.14387512207031, 443.77423095703125, 189.60157775878906, -76.67618560791016, -163.30709838867188, 165.88299560546875, -48.17767333984375, 420.1280517578125, 169.24655151367188, 7.94073486328125, 21.63263702392578, 363.63336181640625, 58.191593170166016, -148.84884643554688, -99.22004699707031, 561.6314697265625, -76.61123657226562, 214.67311096191406, 253.76866149902344, -65.478271484375, 419.89349365234375, 1.8930816650390625, 224.83816528320312, -257.0817565917969, 275.9908752441406, 12.902809143066406, 538.4860229492188, 328.5682067871094, 323.33001708984375, 76.37663269042969, 455.13885498046875, 221.34376525878906, 176.11383056640625, 325.9759521484375, -32.03211975097656, 508.06231689453125, 193.4409637451172, 55.96299743652344], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000539.npy"}
{"epoch": 0.7914831130690162, "step": 540, "batch_size": 64, "mean": 245.46176147460938, "std": 225.0100860595703, "min": -241.685302734375, "p10": -45.88072509765621, "median": 219.1549301147461, "p90": 531.3916870117188, "max": 707.83544921875, "pos_frac": 0.859375, "sample": [283.5581970214844, 354.41754150390625, 499.0252380371094, 73.94438171386719, 700.8921508789062, 33.419097900390625, 173.18594360351562, 124.830810546875, -176.94815063476562, -241.685302734375, -206.8694610595703, 266.4156799316406, 107.75584411621094, 339.8337707519531, 707.104736328125, 135.55963134765625, 139.0412139892578, 174.99478149414062, 359.7449645996094, 484.389404296875, -101.71267700195312, 130.19921875, 171.28152465820312, 385.31463623046875, 382.59130859375, 464.74237060546875, 291.3644714355469, 645.76025390625, 458.0569152832031, 252.73275756835938, 221.268310546875, 322.0658874511719, 194.94041442871094, 75.45636749267578, 203.7489013671875, 359.8240661621094, -7.454681396484375, 7.081390380859375, 515.6099853515625, 183.57794189453125, 414.0216979980469, 707.83544921875, 670.2991333007812, 538.1552734375, -85.38919067382812, 507.7914123535156, 85.91598510742188, 288.3758544921875, 256.84503173828125, 217.0415496826172, -0.4847259521484375, 111.79782104492188, 444.92730712890625, 251.17941284179688, 135.53106689453125, 161.3676300048828, 407.55035400390625, 110.76447296142578, 548.6915283203125, 134.25759887695312, -79.3321762084961, 315.4120788574219, 140.2873992919922, -62.349029541015625], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000540.npy"}
{"epoch": 0.7929515418502202, "step": 541, "batch_size": 64, "mean": 169.43572998046875, "std": 208.6818389892578, "min": -262.43695068359375, "p10": -88.36911010742186, "median": 158.22760772705078, "p90": 386.0684844970705, "max": 966.2686767578125, "pos_frac": 0.828125, "sample": [2.878875732421875, 130.6085205078125, 221.122802734375, 211.87362670898438, 172.063232421875, 77.88604736328125, 134.07675170898438, 105.73274230957031, 47.56878662109375, -17.566650390625, 13.751068115234375, 627.5079345703125, -147.2971954345703, 167.28819274902344, 301.4478454589844, 55.59607696533203, 228.11709594726562, 679.146240234375, 91.7082748413086, 149.16702270507812, 966.2686767578125, 295.9019470214844, 97.33926391601562, -123.91777038574219, 209.32635498046875, -124.30133819580078, 85.47903442382812, 147.11422729492188, -262.43695068359375, 353.2070007324219, 214.15872192382812, 43.090110778808594, 287.0459899902344, 79.04983520507812, -61.795684814453125, 104.03910064697266, 246.05615234375, 131.0406951904297, 351.0115966796875, 211.81422424316406, -10.645248413085938, 284.5224914550781, 245.3116455078125, 400.1519775390625, 287.7994689941406, 322.56280517578125, 59.41029739379883, -92.02684783935547, 487.37579345703125, 230.17840576171875, -79.83438873291016, 246.15533447265625, 237.79995727539062, 563.451171875, 430.98193359375, 8.875492095947266, -107.12565612792969, 258.8855895996094, -107.22988891601562, 28.434486389160156, 22.10364532470703, 233.24465942382812, 202.14129638671875, 189.1936798095703], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000541.npy"}
{"epoch": 0.7944199706314243, "step": 542, "batch_size": 64, "mean": 212.4905548095703, "std": 216.59788513183594, "min": -262.66009521484375, "p10": -67.8485206604004, "median": 211.5732879638672, "p90": 488.6954528808594, "max": 677.402099609375, "pos_frac": 0.78125, "sample": [282.4169006347656, 185.5006103515625, 261.3099365234375, 2.4975357055664062, -85.60253143310547, 538.9019775390625, -68.37359619140625, 677.402099609375, 565.8667602539062, 213.6092529296875, 157.0829315185547, 474.70147705078125, 379.74267578125, -75.53976440429688, 123.92789459228516, 400.43017578125, 553.5186767578125, 291.04608154296875, -45.98558807373047, 560.1190795898438, 99.82046508789062, 269.86724853515625, 256.98345947265625, 124.85769653320312, -135.87210083007812, 155.1376190185547, 7.90997314453125, -1.8309555053710938, -19.953750610351562, 442.46636962890625, -66.62334442138672, 382.7650451660156, 228.9059600830078, -31.532806396484375, 594.5986938476562, -33.96310806274414, 402.42724609375, 129.462646484375, -161.92666625976562, 395.3848876953125, 253.9166259765625, 330.7626647949219, 140.42616271972656, -3.8106441497802734, 494.69287109375, 178.74026489257812, 37.44256591796875, -162.91461181640625, 416.9903564453125, 51.73802947998047, 272.9293518066406, 346.89093017578125, 41.23521423339844, 353.8735656738281, 357.3931884765625, 348.50579833984375, 183.3997344970703, -262.66009521484375, 459.9403076171875, 209.53732299804688, 129.88853454589844, 200.34951782226562, 345.197265625, 443.4735107421875], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000542.npy"}
{"epoch": 0.7958883994126285, "step": 543, "batch_size": 64, "mean": 146.6068878173828, "std": 233.30311584472656, "min": -285.82269287109375, "p10": -116.50882034301758, "median": 112.19184875488281, "p90": 437.07021789550805, "max": 783.30126953125, "pos_frac": 0.734375, "sample": [15.605857849121094, -26.42169189453125, 123.62834930419922, -65.24237060546875, 120.35240936279297, 654.5731811523438, 4.1808319091796875, -5.3589630126953125, -285.82269287109375, 308.67108154296875, -119.47852325439453, -151.6966552734375, -64.88360595703125, 263.9114074707031, 683.98095703125, 275.5376281738281, 44.802337646484375, 783.30126953125, -152.93418884277344, 18.27587127685547, 332.4990234375, -17.62505340576172, 43.048152923583984, 11.268842697143555, -266.02838134765625, 56.510704040527344, -126.330810546875, 378.87188720703125, 68.35063934326172, 122.57710266113281, 339.8408203125, 83.25895690917969, -217.9940185546875, 462.0123596191406, -68.00363159179688, 7.869983673095703, 735.403564453125, 338.9081115722656, 528.3052978515625, 18.371360778808594, 342.93084716796875, -11.772994995117188, -65.71549987792969, 180.9388885498047, 234.61865234375, 169.84527587890625, 312.3779296875, 517.19287109375, -109.57951354980469, 235.5660400390625, 276.40087890625, 116.08893585205078, 181.8334503173828, 24.122779846191406, 285.46673583984375, 222.77490234375, 129.10748291015625, -11.377662658691406, 65.26332092285156, 256.3982849121094, 24.058448791503906, 108.29476165771484, 274.8738098144531, 367.03472900390625], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000543.npy"}
{"epoch": 0.7973568281938326, "step": 544, "batch_size": 64, "mean": 222.5892333984375, "std": 208.7542266845703, "min": -226.97805786132812, "p10": -14.586787796020488, "median": 203.86065673828125, "p90": 441.0405853271485, "max": 1047.563720703125, "pos_frac": 0.890625, "sample": [362.2869873046875, 214.874755859375, 201.74114990234375, 426.3747863769531, 1047.563720703125, -226.97805786132812, 321.46942138671875, 89.21656036376953, 269.1575012207031, -38.69842529296875, 755.058349609375, 65.56687927246094, 118.69172668457031, 187.28395080566406, 4.852394104003906, 308.5005187988281, 201.07398986816406, 172.28390502929688, 214.991455078125, 302.69830322265625, 297.5089416503906, 240.14837646484375, 121.1229476928711, 231.87124633789062, 179.93405151367188, 146.1124725341797, 592.150634765625, 209.03506469726562, 57.9057502746582, 56.358604431152344, 447.325927734375, 216.94070434570312, 57.357765197753906, 385.2257080078125, -23.684249877929688, 207.3601837158203, 369.37872314453125, 172.59132385253906, 295.9813537597656, -37.561676025390625, 161.21755981445312, 205.98016357421875, 167.02520751953125, 353.05316162109375, 347.8582458496094, 191.2522430419922, 260.94940185546875, 569.10595703125, 241.7313995361328, 194.939697265625, -30.99014663696289, 94.45256805419922, 151.22216796875, 576.9259033203125, 386.8472900390625, -22.917865753173828, 272.3686218261719, 18.340774536132812, 409.9289855957031, 43.66233825683594, 110.68336486816406, 495.0699768066406, -214.69371032714844, 40.623863220214844], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000544.npy"}
{"epoch": 0.7988252569750367, "step": 545, "batch_size": 64, "mean": 214.2252197265625, "std": 204.00485229492188, "min": -197.3782958984375, "p10": -40.36938781738281, "median": 210.8182601928711, "p90": 477.9622314453125, "max": 691.7553100585938, "pos_frac": 0.796875, "sample": [-46.549530029296875, -66.76272583007812, -197.3782958984375, -3.011810302734375, 239.63807678222656, 383.800048828125, 258.9077453613281, -8.64082145690918, 260.7734680175781, 87.0752944946289, 377.0025939941406, 478.0472106933594, -101.56653594970703, 318.87701416015625, 477.7639465332031, 63.74732208251953, 421.41937255859375, 484.9939270019531, 606.3223266601562, 259.245361328125, 293.7208251953125, 253.55838012695312, 407.3173828125, 92.58509826660156, 76.79146575927734, 134.17160034179688, -42.08282470703125, 152.85784912109375, 360.5582275390625, -7.077112197875977, 499.0623779296875, 106.60459899902344, 136.5117950439453, -95.75654602050781, 506.90234375, 270.5263671875, 139.0639190673828, 401.7724914550781, -16.162799835205078, 311.40673828125, 22.351295471191406, 409.2371520996094, 168.98382568359375, 414.9010925292969, 102.55490112304688, 404.51593017578125, 173.7037353515625, 691.7553100585938, 180.23931884765625, 303.763427734375, -13.601846694946289, 42.775054931640625, 2.1565933227539062, 301.9653015136719, 181.99844360351562, 396.2794189453125, 573.290283203125, -36.371368408203125, 303.1319885253906, 80.8676986694336, 304.3059997558594, -184.60916137695312, 157.39195251464844, 452.79119873046875], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000545.npy"}
{"epoch": 0.8002936857562408, "step": 546, "batch_size": 64, "mean": 195.83468627929688, "std": 177.2772216796875, "min": -267.1151123046875, "p10": -20.760725402832026, "median": 211.3966293334961, "p90": 389.3069519042969, "max": 582.0617065429688, "pos_frac": 0.859375, "sample": [-4.006195068359375, 40.59467315673828, 345.85418701171875, 114.87857055664062, 265.43475341796875, 391.7325134277344, 179.41000366210938, 259.9166259765625, 273.4434814453125, 89.30902099609375, -37.362403869628906, -98.13583374023438, 20.940933227539062, 113.03338623046875, 222.75274658203125, 128.44374084472656, 75.36307525634766, 285.1363525390625, 381.00433349609375, -23.135223388671875, 57.01569366455078, 549.18212890625, -138.38247680664062, 383.6473083496094, 266.64117431640625, 213.6228485107422, 249.44895935058594, 211.2663116455078, 484.9122619628906, 47.683815002441406, 181.08709716796875, -15.220230102539062, 90.32390594482422, 209.27493286132812, 415.29656982421875, -193.0023956298828, 383.22674560546875, 195.32814025878906, 297.128662109375, 582.0617065429688, -267.1151123046875, 204.44256591796875, 330.13287353515625, 157.38934326171875, 232.73342895507812, 577.3809204101562, -67.24491882324219, 414.5367431640625, 213.30923461914062, 273.62939453125, 102.90428924560547, 275.953125, 130.14688110351562, 320.36968994140625, 118.48404693603516, 375.0011291503906, 89.84304809570312, 247.3386993408203, 24.154319763183594, 211.52694702148438, 43.97715759277344, 321.0010681152344, 341.70672607421875, 366.66656494140625], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000546.npy"}
{"epoch": 0.801762114537445, "step": 547, "batch_size": 64, "mean": 180.2740478515625, "std": 216.85964965820312, "min": -430.81597900390625, "p10": -60.892991638183595, "median": 171.5524673461914, "p90": 441.9154449462891, "max": 767.4332275390625, "pos_frac": 0.765625, "sample": [-61.405517578125, 203.54302978515625, -430.81597900390625, 511.24658203125, -24.689952850341797, -99.49929809570312, 161.52418518066406, 314.10321044921875, -121.627685546875, 396.2480163574219, 103.69189453125, 613.4140014648438, 210.46432495117188, 767.4332275390625, 22.969161987304688, 32.91008758544922, 515.8121337890625, 60.259132385253906, -162.9309539794922, 363.28668212890625, 68.3201675415039, 144.70571899414062, 338.68701171875, 132.038818359375, 294.62445068359375, 39.002017974853516, 338.74896240234375, 292.828125, 143.04039001464844, 201.80584716796875, 514.2633056640625, 193.65748596191406, -95.4801254272461, 334.15765380859375, 277.6105651855469, 181.58074951171875, -25.063465118408203, 109.07904052734375, 218.56185913085938, 12.87823486328125, 417.0024108886719, 579.8305053710938, 365.9550476074219, -24.401657104492188, -39.9859619140625, -58.057350158691406, -76.17762756347656, 444.8650817871094, 66.80616760253906, 208.63919067382812, 248.48382568359375, -59.69709777832031, 199.46176147460938, 314.9210510253906, 91.50099182128906, 105.02366638183594, 388.2396240234375, -20.4447021484375, 392.55108642578125, -24.646804809570312, 435.032958984375, 320.24560546875, 117.74746704101562, 53.66124725341797], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000547.npy"}
{"epoch": 0.8032305433186491, "step": 548, "batch_size": 64, "mean": 200.43960571289062, "std": 225.20913696289062, "min": -378.3435363769531, "p10": -43.89332809448241, "median": 198.91350555419922, "p90": 448.7067626953126, "max": 947.2275390625, "pos_frac": 0.796875, "sample": [90.85511779785156, 390.2745361328125, 85.24475860595703, 232.89784240722656, 70.99358367919922, -28.0263614654541, -378.3435363769531, 755.4239501953125, 321.480712890625, 254.32394409179688, 292.9338073730469, 354.3377685546875, 393.83990478515625, 311.76446533203125, 271.853759765625, -188.9697265625, -5.603899002075195, 282.1921081542969, 513.2581787109375, 13.621404647827148, 426.48406982421875, 352.7850036621094, 138.6690673828125, 167.10845947265625, 458.23077392578125, -47.085975646972656, 67.10414123535156, 202.4448699951172, 335.0107727050781, 135.82005310058594, -72.70108032226562, 947.2275390625, 386.6116943359375, 116.28793334960938, 195.38214111328125, -126.09191131591797, -28.10662841796875, 425.25433349609375, 0.566192626953125, 602.3048095703125, 207.25634765625, -76.60165405273438, -74.20252990722656, 250.52362060546875, 48.483917236328125, -5.782501220703125, 156.3292694091797, 75.05596923828125, 329.22418212890625, -36.443817138671875, 170.4061279296875, 174.21426391601562, -13.803743362426758, 669.1173706054688, 232.6722412109375, 50.491886138916016, 86.85862731933594, 250.66799926757812, 210.93577575683594, 251.59085083007812, 158.58309936523438, 289.7334899902344, 238.59381103515625, 466.5716857910156], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000548.npy"}
{"epoch": 0.8046989720998532, "step": 549, "batch_size": 64, "mean": 188.9108428955078, "std": 195.4046630859375, "min": -309.61083984375, "p10": -25.750785064697254, "median": 153.75144958496094, "p90": 405.712728881836, "max": 696.698974609375, "pos_frac": 0.859375, "sample": [89.41867065429688, 501.00482177734375, 125.41166687011719, 150.21316528320312, 249.1973114013672, 393.2511901855469, 31.931686401367188, 314.6330261230469, 28.31957244873047, 301.3377685546875, -30.950206756591797, 386.04205322265625, 110.23136138916016, 180.94869995117188, 102.97491455078125, -12.089370727539062, 404.02081298828125, -125.82820892333984, -91.02840423583984, 120.4962387084961, 249.0010986328125, 294.5379943847656, 25.04912567138672, 368.19805908203125, 268.5294189453125, 254.83233642578125, 157.28973388671875, 124.32569122314453, 251.83583068847656, 389.68841552734375, 183.5728759765625, 91.46235656738281, 223.2467041015625, 121.98343658447266, 266.5231628417969, -47.593238830566406, 51.28833770751953, 451.6206359863281, 82.9829330444336, -124.01048278808594, -66.06314086914062, 11.746192932128906, 348.09503173828125, 338.9648132324219, 659.3153686523438, 31.282028198242188, 109.37788391113281, 406.4378356933594, 145.6324462890625, 652.8243408203125, 425.228759765625, -309.61083984375, 11.903852462768555, 332.4580078125, 251.12672424316406, 32.012611389160156, 112.00196075439453, -13.61880111694336, 255.3194122314453, 696.698974609375, 21.956100463867188, 352.4130859375, 87.63459777832031, 283.2555847167969], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000549.npy"}
{"epoch": 0.8061674008810573, "step": 550, "batch_size": 64, "mean": 168.54379272460938, "std": 176.7224884033203, "min": -251.82879638671875, "p10": -38.08460273742675, "median": 167.79762268066406, "p90": 393.80028076171874, "max": 560.4058837890625, "pos_frac": 0.84375, "sample": [173.51028442382812, 97.98703002929688, 458.85955810546875, 133.60183715820312, 233.05526733398438, 416.77606201171875, 176.76634216308594, 7.5053863525390625, 43.65385055541992, 19.004623413085938, 223.771484375, 134.9857177734375, 481.0558776855469, 258.5462646484375, 153.89112854003906, 560.4058837890625, 133.23800659179688, -27.422748565673828, 51.501739501953125, 108.86175537109375, -251.82879638671875, -26.49755859375, 327.7604675292969, 253.0064239501953, -149.3655548095703, 381.9727783203125, 195.00393676757812, 164.3939208984375, 203.08441162109375, 63.30644226074219, 14.843063354492188, -42.653968811035156, 256.92449951171875, -22.720420837402344, 167.50128173828125, 514.9861450195312, 345.41522216796875, 215.05230712890625, 395.4986572265625, 214.10536193847656, 344.9358825683594, 104.11717224121094, 24.204883575439453, -43.644256591796875, 200.04624938964844, 289.43829345703125, 224.1737518310547, 339.32904052734375, 137.34140014648438, 239.88568115234375, 41.459129333496094, 495.98394775390625, 226.9187774658203, 389.83740234375, -218.28720092773438, 144.9617919921875, 166.4260711669922, 369.8738708496094, 168.09396362304688, -98.7491455078125, 50.20729064941406, -175.21490478515625, 206.6934051513672, 99.42741394042969], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000550.npy"}
{"epoch": 0.8076358296622613, "step": 551, "batch_size": 64, "mean": 247.79818725585938, "std": 202.7222137451172, "min": -241.17449951171875, "p10": -4.52090835571289, "median": 282.82054138183594, "p90": 503.6076599121094, "max": 686.6412963867188, "pos_frac": 0.875, "sample": [35.78219985961914, 6.4714202880859375, 288.6522216796875, 153.32032775878906, 327.82171630859375, 481.1391906738281, -60.674530029296875, 93.12196350097656, 524.4124755859375, 266.05108642578125, 119.78087615966797, 94.68165588378906, -35.43522644042969, 436.59521484375, 391.72564697265625, 584.264892578125, 426.7850036621094, 134.47052001953125, 341.15679931640625, 554.7337646484375, 403.8547058105469, 282.61279296875, 468.122802734375, 469.9099426269531, 361.84405517578125, 87.33494567871094, 491.08734130859375, 224.4740753173828, -73.76757049560547, 329.4125671386719, 155.39263916015625, 23.58053207397461, 315.57305908203125, 79.141845703125, 203.3107452392578, 396.0379943847656, 336.53076171875, 297.6773681640625, 325.4463195800781, -67.59071350097656, -184.43899536132812, 524.7319946289062, 323.91790771484375, 242.57577514648438, 65.07600402832031, 686.6412963867188, 12.164031982421875, -241.17449951171875, 593.37841796875, 125.99185943603516, -5.047737121582031, 43.164215087890625, 283.0282897949219, 396.99267578125, 338.42047119140625, -3.2916412353515625, 424.2390441894531, 341.2253723144531, 268.75115966796875, 210.41925048828125, 182.93910217285156, 508.9735107421875, 318.6507568359375, 126.91263580322266], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000551.npy"}
{"epoch": 0.8091042584434655, "step": 552, "batch_size": 64, "mean": 285.0164794921875, "std": 240.10772705078125, "min": -194.000244140625, "p10": 18.47551956176758, "median": 277.66595458984375, "p90": 630.8847412109376, "max": 874.8870239257812, "pos_frac": 0.921875, "sample": [285.9618225097656, 211.9126434326172, 159.8785400390625, 656.7825927734375, 231.70223999023438, 612.966552734375, 437.2172546386719, 368.4011535644531, 111.96954345703125, 235.6190185546875, 355.18572998046875, 289.6006164550781, 21.057876586914062, 287.6956787109375, 204.05296325683594, 567.8310546875, 318.8726501464844, 638.56396484375, 815.7245483398438, 65.38995361328125, 157.28732299804688, 339.90142822265625, 447.21295166015625, 285.01239013671875, -168.7288818359375, 29.53453826904297, 29.494285583496094, 592.55322265625, 87.64524841308594, -194.000244140625, 547.4237060546875, -43.48600769042969, 383.16717529296875, 48.01879119873047, 374.574951171875, 225.51072692871094, 414.89276123046875, 653.3338012695312, 874.8870239257812, 336.19427490234375, 355.6155700683594, 17.577430725097656, 260.6043701171875, 418.7604675292969, 452.0618591308594, 773.4771728515625, 270.31951904296875, 20.571060180664062, 770.1185913085938, 475.5977783203125, 119.43846130371094, 80.7027587890625, 127.28661346435547, -99.67086791992188, 6.282318115234375, 168.14901733398438, 57.31092834472656, 146.0371856689453, 231.32135009765625, 267.8778381347656, 365.70245361328125, 350.409912109375, 357.03643798828125, -48.34925079345703], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000552.npy"}
{"epoch": 0.8105726872246696, "step": 553, "batch_size": 64, "mean": 185.01707458496094, "std": 221.51966857910156, "min": -192.27218627929688, "p10": -95.36666946411133, "median": 177.81871032714844, "p90": 475.24397277832037, "max": 966.8309936523438, "pos_frac": 0.78125, "sample": [966.8309936523438, 621.2072143554688, 1.0545654296875, 164.99624633789062, 211.223876953125, -87.6252212524414, 33.53643798828125, -71.44867706298828, 208.66920471191406, 179.61831665039062, 121.26126098632812, 146.64381408691406, 204.84561157226562, 283.574462890625, 47.50524139404297, 268.2886047363281, -87.35372924804688, 16.1241455078125, 295.85455322265625, -151.81736755371094, 203.29116821289062, 226.43165588378906, 346.6313781738281, -192.27218627929688, 362.1947937011719, -40.86616516113281, 459.2855224609375, -39.25347900390625, 88.41565704345703, 302.40716552734375, -167.44825744628906, 276.6228332519531, 264.3909912109375, -135.0819854736328, 199.50149536132812, 177.52001953125, 184.72500610351562, 116.436767578125, 440.63116455078125, -140.68917846679688, 183.50100708007812, -13.947494506835938, 256.9794616699219, 178.11740112304688, 111.1246566772461, 146.40106201171875, -98.68443298339844, 158.10284423828125, 524.7354736328125, 125.96224212646484, 556.0103759765625, 166.396240234375, 481.3131408691406, 141.2584991455078, 510.9010009765625, 461.08258056640625, 121.1193618774414, 449.11553955078125, -16.838714599609375, 533.7412719726562, -162.491455078125, 348.7223205566406, 134.03240966796875, 238.57363891601562], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000553.npy"}
{"epoch": 0.8120411160058737, "step": 554, "batch_size": 64, "mean": 243.39793395996094, "std": 240.9853057861328, "min": -453.45306396484375, "p10": -32.64850635528563, "median": 263.0814514160156, "p90": 525.494174194336, "max": 812.0572509765625, "pos_frac": 0.875, "sample": [116.14686584472656, -37.89540100097656, 283.6062927246094, 141.29666137695312, 351.8752136230469, 314.1854553222656, 441.0032043457031, 384.2107849121094, 223.43356323242188, 300.65570068359375, 507.133544921875, 246.4420166015625, 357.7235107421875, 304.1993103027344, -65.99116516113281, 361.3738708496094, 464.3474426269531, 149.44088745117188, 212.86538696289062, 310.8570251464844, -20.405752182006836, -254.14816284179688, 66.29183959960938, 36.051475524902344, 812.0572509765625, 1.2502975463867188, 184.051513671875, 270.8711242675781, 248.47943115234375, 290.3644104003906, 355.8560485839844, 532.8037719726562, 691.7794799804688, 92.37655639648438, 570.3409423828125, 255.29177856445312, 636.153076171875, 33.36092758178711, 508.4384460449219, -85.89042663574219, 176.29159545898438, 382.6746826171875, 325.85418701171875, 370.4197692871094, 279.4432678222656, 63.34611892700195, 345.70751953125, 20.434661865234375, 307.87060546875, 214.64764404296875, -261.3717041015625, 200.58035278320312, 141.01101684570312, -453.45306396484375, 444.205322265625, 25.28082275390625, -112.60501098632812, 119.71976470947266, 489.8837890625, 365.3094482421875, 661.68798828125, 736.8287963867188, 118.22777557373047, 23.189022064208984], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000554.npy"}
{"epoch": 0.8135095447870778, "step": 555, "batch_size": 64, "mean": 243.01727294921875, "std": 253.855712890625, "min": -337.7533874511719, "p10": -34.7589256286621, "median": 192.27184295654297, "p90": 625.4058227539062, "max": 856.4163818359375, "pos_frac": 0.875, "sample": [308.74053955078125, 192.9051055908203, 287.3912048339844, 301.2342529296875, 385.52117919921875, 597.3058471679688, 241.5648651123047, 163.89846801757812, -39.53057861328125, 100.27994537353516, -46.39994812011719, 534.9183349609375, 515.9224853515625, 347.5965576171875, 30.036590576171875, 6.148988723754883, 102.3450698852539, 125.48373413085938, 832.6961669921875, 615.79833984375, 165.55926513671875, 666.76220703125, 176.161376953125, 494.52130126953125, 384.1209716796875, 85.34414672851562, 490.535400390625, 41.35160827636719, 151.45266723632812, 15.871139526367188, 146.59715270996094, 274.98114013671875, 212.69757080078125, 315.3808288574219, 20.151687622070312, 313.1784973144531, 210.9644317626953, 29.51165008544922, 72.40521240234375, 121.2956771850586, -337.7533874511719, 273.4591369628906, 629.5233154296875, 667.482177734375, 21.167451858520508, -74.74064636230469, 688.503173828125, 37.73161315917969, 245.8612060546875, 222.36676025390625, 765.89111328125, -90.75759887695312, 559.6522216796875, -112.17491149902344, -23.62506866455078, 191.63858032226562, 856.4163818359375, 109.71670532226562, 74.57470703125, 168.24591064453125, 456.21893310546875, 247.50814819335938, -104.27122497558594, 87.76971435546875], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000555.npy"}
{"epoch": 0.8149779735682819, "step": 556, "batch_size": 64, "mean": 183.7998046875, "std": 222.36105346679688, "min": -266.00689697265625, "p10": -43.765234375, "median": 148.19972229003906, "p90": 499.9400238037111, "max": 813.6898803710938, "pos_frac": 0.8125, "sample": [444.4614562988281, 96.47821807861328, 116.45785522460938, 98.994140625, 251.06594848632812, 64.0641860961914, 421.51531982421875, 285.6168518066406, -104.1615982055664, 344.5505065917969, -43.67810821533203, 386.61566162109375, 199.38174438476562, 148.98089599609375, 384.69793701171875, 122.3675537109375, 673.2618408203125, 89.82073974609375, 331.5901184082031, 72.54625701904297, -220.70579528808594, 73.03491973876953, -266.00689697265625, -20.038747787475586, 150.98052978515625, 334.3119201660156, 162.94955444335938, 29.570236206054688, 468.6529846191406, -85.91015625, 193.51556396484375, 15.80613899230957, 178.72581481933594, -24.12114715576172, 581.4578857421875, -43.802574157714844, 116.40111541748047, 99.62904357910156, 135.03713989257812, 49.950645446777344, -15.906036376953125, 301.1059875488281, 274.5151672363281, 219.9197998046875, -157.58230590820312, 173.00254821777344, 642.814453125, 167.38926696777344, 580.3771362304688, 24.323707580566406, 181.93035888671875, -71.88593292236328, 383.1502685546875, 94.21466064453125, 813.6898803710938, 41.35889434814453, 513.3487548828125, 665.123779296875, 147.41854858398438, 56.10035705566406, 184.94679260253906, 47.8420524597168, -29.843460083007812, 211.76666259765625], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000556.npy"}
{"epoch": 0.8164464023494861, "step": 557, "batch_size": 64, "mean": 208.33111572265625, "std": 220.65171813964844, "min": -235.97422790527344, "p10": -33.54068145751953, "median": 185.2990493774414, "p90": 542.5776611328128, "max": 768.16552734375, "pos_frac": 0.84375, "sample": [-213.36785888671875, 90.35844421386719, -235.97422790527344, 255.17422485351562, 85.89485168457031, 319.3687744140625, -13.033187866210938, 399.1747741699219, 259.96246337890625, 93.85675048828125, 351.71044921875, 365.41973876953125, 444.7511901855469, 77.51422119140625, 41.726131439208984, 208.88160705566406, 247.22308349609375, 303.1705322265625, 33.268985748291016, 55.35016632080078, 43.38642883300781, 768.16552734375, 570.9549560546875, 488.91802978515625, 141.3896484375, 393.1353759765625, 139.81512451171875, 47.51933288574219, -65.6153335571289, 474.79840087890625, 157.7952880859375, 216.30178833007812, 592.27197265625, 178.3001708984375, 109.99424743652344, 192.2979278564453, 213.70692443847656, 9.125663757324219, 384.747314453125, 164.76821899414062, 163.74075317382812, -71.00885772705078, 199.391357421875, 80.83746337890625, 23.646257400512695, -176.6334228515625, 443.32489013671875, 601.5458984375, -34.039337158203125, 565.5746459960938, 49.25251388549805, 594.7969970703125, 689.9490966796875, 239.88531494140625, 232.21194458007812, -32.37715148925781, -26.548648834228516, 427.3471374511719, 19.461563110351562, 390.9396057128906, -35.69335174560547, 267.4135437011719, 217.6866455078125, 110.27909851074219], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000557.npy"}
{"epoch": 0.8179148311306902, "step": 558, "batch_size": 64, "mean": 291.4281005859375, "std": 235.5359344482422, "min": -233.67691040039062, "p10": -30.30914688110348, "median": 283.8795928955078, "p90": 598.5317504882813, "max": 785.6746215820312, "pos_frac": 0.890625, "sample": [352.50616455078125, 128.7525177001953, 114.01380920410156, 391.9583435058594, 89.91293334960938, 291.4715270996094, 325.5667724609375, 144.7659149169922, 153.43331909179688, -54.978729248046875, -83.84506225585938, 481.3016357421875, 785.6746215820312, 589.7227783203125, 190.3570098876953, 309.0033874511719, 351.9158935546875, 322.18768310546875, 601.5732421875, 565.0587158203125, 181.1294403076172, 253.08883666992188, 169.1558837890625, 385.8231506347656, 29.48345184326172, 282.9672546386719, 591.4349365234375, 66.44868469238281, 494.4764404296875, 218.56910705566406, 149.43612670898438, 270.91644287109375, 367.47747802734375, 708.7296752929688, 608.5919189453125, 382.10882568359375, -233.67691040039062, 98.86172485351562, 159.22262573242188, 284.79193115234375, 179.3048095703125, 85.15243530273438, 247.8574676513672, 228.33251953125, 204.1184844970703, -45.99324035644531, -209.48390197753906, 746.6365356445312, 540.0662231445312, 6.287069320678711, 476.9505310058594, 584.6826782226562, 456.85272216796875, 332.8387145996094, 670.5113525390625, -60.39726257324219, 443.7881164550781, 153.93536376953125, 205.68035888671875, -83.76437377929688, 305.0131530761719, 728.6890258789062, 504.7831726074219, 430.1678771972656], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000558.npy"}
{"epoch": 0.8193832599118943, "step": 559, "batch_size": 64, "mean": 222.0172576904297, "std": 219.6704559326172, "min": -189.61607360839844, "p10": -44.17446746826171, "median": 219.95377349853516, "p90": 549.132000732422, "max": 702.4609375, "pos_frac": 0.859375, "sample": [329.8421325683594, 505.6773681640625, 326.5328369140625, 385.8875732421875, 50.03357696533203, 28.37774658203125, 97.55116271972656, 70.15650939941406, 193.46336364746094, 702.4609375, 323.7730712890625, 256.1015625, 232.60707092285156, 289.2719421386719, 110.22018432617188, 497.2989807128906, 544.7919921875, 518.6853637695312, -166.171630859375, -122.37933349609375, 639.500732421875, 96.11067199707031, 43.582855224609375, 264.76702880859375, 226.98475646972656, 145.3919677734375, -46.274742126464844, 79.8270263671875, 360.501220703125, 212.92279052734375, 88.89667510986328, -7.677486419677734, 13.469169616699219, 73.94437408447266, 115.09751892089844, 279.63232421875, -130.16650390625, 33.96021270751953, 260.55828857421875, 172.11883544921875, 550.9920043945312, 143.5923309326172, 80.36190795898438, 24.510353088378906, -189.61607360839844, 238.21380615234375, 123.68045043945312, 241.22109985351562, 581.1851806640625, -39.273826599121094, 318.9349670410156, 356.8339538574219, 256.0126953125, 325.77825927734375, 585.4364624023438, 690.7307739257812, -84.97642517089844, 572.2896728515625, 431.7065124511719, 287.95196533203125, 181.75526428222656, 11.990768432617188, -60.0008430480957, 482.4629821777344], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000559.npy"}
{"epoch": 0.8208516886930984, "step": 560, "batch_size": 64, "mean": 168.96829223632812, "std": 172.91769409179688, "min": -213.93875122070312, "p10": -48.509369659423825, "median": 152.39350128173828, "p90": 404.09784545898447, "max": 569.783447265625, "pos_frac": 0.8125, "sample": [498.87542724609375, 164.79144287109375, 249.38473510742188, 281.0158996582031, -74.1852035522461, 350.0845642089844, -50.231666564941406, -119.63214874267578, 204.51675415039062, 279.5977783203125, 78.0810546875, 75.09225463867188, 260.22003173828125, 273.2562255859375, 517.371826171875, 44.734413146972656, 465.9551696777344, 458.69891357421875, 99.66555786132812, 172.9688720703125, 332.4132080078125, 65.89837646484375, 139.22775268554688, 450.3230895996094, 227.1269073486328, -10.370903015136719, 58.380069732666016, 415.468017578125, 234.28941345214844, -44.49067687988281, 374.1842346191406, -36.24739074707031, 21.95848274230957, 227.23294067382812, 220.68028259277344, -108.43871307373047, 59.58893585205078, 569.783447265625, 109.98672485351562, 377.56744384765625, 204.1786651611328, -51.9176025390625, 186.08944702148438, 49.437957763671875, 236.22418212890625, 139.9955596923828, 113.80024719238281, 124.46988677978516, -9.367876052856445, 334.25384521484375, 198.5313720703125, 78.27285766601562, 249.04708862304688, 95.37174224853516, 54.33567810058594, 128.79818725585938, -14.137630462646484, 367.73468017578125, 330.48138427734375, 283.5116882324219, -213.93875122070312, 31.55132293701172, -53.30908966064453, 35.73175048828125], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000560.npy"}
{"epoch": 0.8223201174743024, "step": 561, "batch_size": 64, "mean": 178.99203491210938, "std": 195.6482696533203, "min": -360.1795349121094, "p10": -59.588558197021456, "median": 168.30453491210938, "p90": 443.7065368652346, "max": 680.078369140625, "pos_frac": 0.84375, "sample": [315.7468566894531, 203.0888671875, 90.72982025146484, -96.3967056274414, -75.7549819946289, 349.3358459472656, 380.8786315917969, 85.74375915527344, 226.44912719726562, 502.3988952636719, -10.049869537353516, 361.1395568847656, 546.93603515625, 178.0135955810547, 165.61288452148438, 202.09780883789062, 28.567440032958984, 106.9048843383789, 278.7569274902344, 198.19236755371094, 666.0606079101562, 117.02047729492188, 301.64349365234375, 371.82781982421875, 8.164581298828125, 219.87107849121094, 470.6327819824219, 157.28622436523438, 145.1479034423828, 490.4896545410156, 254.32260131835938, 680.078369140625, 5.704132080078125, 229.66702270507812, 288.03350830078125, 132.43951416015625, -108.81556701660156, 163.0696563720703, 17.318077087402344, 292.053466796875, 58.56000518798828, -28.450605392456055, -33.78260040283203, 315.98028564453125, 78.32424926757812, -108.7295150756836, 202.2843017578125, 527.4708862304688, 330.3927001953125, 320.5403137207031, -360.1795349121094, 11.95892333984375, 122.59284973144531, 69.81708526611328, 265.97393798828125, -70.64825439453125, 280.04388427734375, 126.50413513183594, 178.9431610107422, 17.28271484375, 96.875244140625, -78.40341186523438, 170.99618530273438, 20.735801696777344], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000561.npy"}
{"epoch": 0.8237885462555066, "step": 562, "batch_size": 64, "mean": 233.12530517578125, "std": 217.35195922851562, "min": -174.52291870117188, "p10": -8.758727645874, "median": 223.75894927978516, "p90": 514.0123718261718, "max": 929.3291015625, "pos_frac": 0.890625, "sample": [345.5886535644531, 929.3291015625, 306.2441101074219, 108.24153900146484, 170.60678100585938, 177.78457641601562, 256.12957763671875, 223.92588806152344, 306.8747863769531, 266.44342041015625, 230.36407470703125, 551.3253173828125, 142.51956176757812, 353.6131286621094, 455.8231201171875, -23.651046752929688, 109.10693359375, 328.78070068359375, 180.9424285888672, 240.4835205078125, 129.1833953857422, 308.1544189453125, 356.37744140625, -18.664026260375977, 15.711936950683594, 492.32904052734375, 550.73291015625, 14.3536376953125, 657.6552734375, 43.74627685546875, -73.01849365234375, -56.99713897705078, 101.67890930175781, 390.4134826660156, 21.53520965576172, 872.0267944335938, -38.73272705078125, 223.59201049804688, 230.81185913085938, 383.50103759765625, 65.8723373413086, 104.32450866699219, 425.8267517089844, 248.44961547851562, -174.52291870117188, 271.9911193847656, 40.03300476074219, 514.6695556640625, 338.928466796875, 637.0913696289062, 62.625614166259766, 201.01710510253906, 237.37255859375, 169.38816833496094, 170.68576049804688, 237.13356018066406, 129.40391540527344, 62.530067443847656, -72.70331573486328, 18.31319808959961, 88.76268005371094, 87.10111999511719, 512.4789428710938, 278.3798522949219], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000562.npy"}
{"epoch": 0.8252569750367107, "step": 563, "batch_size": 64, "mean": 181.70925903320312, "std": 219.11135864257812, "min": -315.7660827636719, "p10": -93.67608261108397, "median": 209.18033599853516, "p90": 447.4870635986328, "max": 679.5828857421875, "pos_frac": 0.765625, "sample": [184.95086669921875, -264.2983093261719, -98.95795440673828, 176.86737060546875, 319.67376708984375, -228.38287353515625, 524.443603515625, 151.07107543945312, 297.8192138671875, 305.2234802246094, 213.08547973632812, 243.80389404296875, -65.41155242919922, 335.5320129394531, 223.79547119140625, 259.92974853515625, 51.29078674316406, 659.1241455078125, 380.1380615234375, 69.86183166503906, 375.2778625488281, -3.3608627319335938, -60.86767578125, -141.88661193847656, 212.81654357910156, 57.98677444458008, 308.91748046875, 299.7457275390625, 284.6322021484375, 256.72271728515625, 56.89605712890625, 263.1346740722656, -267.2841796875, 352.10760498046875, 74.09233093261719, 246.77587890625, -27.50997543334961, 10.742942810058594, 437.8207702636719, 306.24444580078125, 407.8665466308594, 446.07666015625, 523.4710693359375, 205.54412841796875, 449.4147033691406, 184.25201416015625, -315.7660827636719, 679.5828857421875, -17.42475128173828, 448.0915222167969, -81.35171508789062, 493.8238525390625, 130.10824584960938, -31.672624588012695, 164.23983764648438, -129.99368286132812, 387.3033142089844, -1.3263206481933594, 125.13853454589844, 74.2171859741211, 215.31336975097656, 52.06937026977539, 273.9715881347656, 163.8784942626953], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000563.npy"}
{"epoch": 0.8267254038179148, "step": 564, "batch_size": 64, "mean": 220.50332641601562, "std": 170.33216857910156, "min": -202.1011962890625, "p10": 7.427394866943359, "median": 197.2127914428711, "p90": 474.9322235107422, "max": 608.7244873046875, "pos_frac": 0.921875, "sample": [278.349609375, 193.38796997070312, 472.5917053222656, 7.3450775146484375, 432.8291931152344, 503.7801513671875, 386.3065185546875, -35.737709045410156, 186.31411743164062, 28.424896240234375, 167.82266235351562, 385.93023681640625, 205.24639892578125, 279.3609313964844, 341.87738037109375, 21.401046752929688, 71.91468811035156, 252.19290161132812, 475.935302734375, 164.47984313964844, 206.4950714111328, 82.61372375488281, 196.8994598388672, -202.1011962890625, 62.83866882324219, 195.20291137695312, -37.79815673828125, 7.619468688964844, 6.000152587890625, 22.772216796875, 173.94232177734375, 337.5701904296875, 299.2579345703125, -8.657981872558594, 151.86215209960938, 208.128662109375, 165.4376983642578, 165.40090942382812, 106.53607177734375, -18.362194061279297, 482.556884765625, 402.0228576660156, 119.188720703125, 197.526123046875, 564.1637573242188, 279.9350280761719, 510.72991943359375, 360.9615478515625, 497.5789489746094, 326.75799560546875, 217.8050537109375, 327.9385681152344, 87.77201080322266, 58.0975341796875, 193.9099884033203, 110.07249450683594, 323.2404479980469, 416.8351745605469, 284.5831298828125, 157.68125915527344, 608.7244873046875, 82.957275390625, 316.06005859375, 243.70260620117188], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000564.npy"}
{"epoch": 0.8281938325991189, "step": 565, "batch_size": 64, "mean": 228.48284912109375, "std": 269.636962890625, "min": -154.74569702148438, "p10": -67.06398315429686, "median": 187.1207275390625, "p90": 605.8722290039062, "max": 969.1689453125, "pos_frac": 0.78125, "sample": [-101.39228820800781, 879.6665649414062, 603.4703369140625, -125.41035461425781, 94.7957763671875, 378.3876037597656, 689.8641357421875, 113.21098327636719, 19.952682495117188, -26.618789672851562, 209.34249877929688, 330.217529296875, -69.6827392578125, 99.99240112304688, -10.152652740478516, -153.9501190185547, -100.5450668334961, 115.36798095703125, -23.736663818359375, 365.7802734375, 357.6275634765625, 251.98941040039062, 313.84716796875, 764.9932250976562, 81.08711242675781, 108.07971954345703, 214.81689453125, 174.54522705078125, -144.17068481445312, 365.79241943359375, 267.84100341796875, 280.5826110839844, 277.2091369628906, 362.46978759765625, -52.53871154785156, 0.945587158203125, 490.5170593261719, 352.28289794921875, -60.95355224609375, 199.69622802734375, -0.6064605712890625, 295.34857177734375, 245.54150390625, 933.6650390625, 304.58038330078125, 969.1689453125, 115.27594757080078, 590.7880859375, 389.0047607421875, -154.74569702148438, 98.01434326171875, 18.449676513671875, 475.601806640625, 352.79095458984375, 606.901611328125, 154.01962280273438, 426.77764892578125, 55.3305549621582, 124.41546630859375, 33.767677307128906, 73.85687255859375, 31.211055755615234, -38.55104064941406, 627.0745849609375], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000565.npy"}
{"epoch": 0.8296622613803231, "step": 566, "batch_size": 64, "mean": 222.2305145263672, "std": 258.8445739746094, "min": -271.74761962890625, "p10": -110.19740600585938, "median": 254.32302856445312, "p90": 572.8157836914063, "max": 767.9718017578125, "pos_frac": 0.75, "sample": [-61.24700927734375, -27.01905059814453, 294.8602600097656, 504.6903076171875, 44.562835693359375, -66.34646606445312, 471.2654724121094, 136.07144165039062, -219.85858154296875, 381.5989990234375, 268.43621826171875, -60.86515808105469, -4.39715576171875, 207.082763671875, 486.72021484375, 240.2098388671875, 366.1361083984375, -15.309093475341797, 20.561466217041016, -271.74761962890625, 397.8858947753906, 474.26318359375, 281.3520812988281, -114.69232177734375, 294.8075866699219, -83.13973236083984, 229.7939910888672, 501.6103820800781, 76.78956604003906, 507.8824157714844, 275.49285888671875, 767.9718017578125, 427.0402526855469, 36.90483093261719, 592.734375, 24.912879943847656, -110.96697998046875, 627.4732666015625, 271.61279296875, -56.47468566894531, -108.4017333984375, 382.0364685058594, 64.27681732177734, 575.8907470703125, 376.55755615234375, 565.640869140625, 582.2188720703125, 403.6697692871094, -186.25418090820312, 16.51165008544922, 63.72787857055664, 670.9920043945312, 228.2376708984375, 589.0354614257812, 31.843027114868164, -138.8463134765625, 447.39422607421875, 337.898193359375, 559.4115600585938, -163.49151611328125, 116.4782485961914, 100.4508056640625, 271.0963134765625, 317.71832275390625], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000566.npy"}
{"epoch": 0.8311306901615272, "step": 567, "batch_size": 64, "mean": 205.8673553466797, "std": 223.17198181152344, "min": -198.3679962158203, "p10": -72.86477584838866, "median": 202.84484100341797, "p90": 491.68225402832036, "max": 694.2222900390625, "pos_frac": 0.765625, "sample": [-18.25849151611328, 185.2489776611328, 633.9716186523438, 337.8785095214844, 377.3253173828125, 283.9217529296875, 56.49298858642578, 140.06729125976562, 478.3843688964844, 286.2010498046875, 517.341796875, -75.30181121826172, 667.5907592773438, 295.77740478515625, 159.77536010742188, 146.38946533203125, 440.20849609375, 431.17498779296875, 407.4456787109375, 118.40617370605469, 355.10333251953125, 243.82366943359375, -177.8311004638672, 322.26171875, 237.78939819335938, 256.500244140625, 212.53990173339844, 497.38134765625, -64.78465270996094, 385.3282775878906, -32.77851104736328, 458.03912353515625, 112.64567565917969, -178.95526123046875, 26.228206634521484, -55.23805618286133, 138.44058227539062, 379.7587585449219, 694.2222900390625, 473.751220703125, 119.87228393554688, -198.3679962158203, -36.254302978515625, -60.491355895996094, 105.73573303222656, 528.24072265625, 186.84814453125, 147.7967529296875, 40.92821502685547, -84.12980651855469, -176.12367248535156, 469.3992919921875, 73.68114471435547, 503.71124267578125, -67.17835998535156, 228.71759033203125, 83.58722686767578, 327.80743408203125, -22.774635314941406, 193.1497802734375, 264.70477294921875, 217.64059448242188, -130.6926727294922, 305.4354248046875], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000567.npy"}
{"epoch": 0.8325991189427313, "step": 568, "batch_size": 64, "mean": 247.7938232421875, "std": 219.0425262451172, "min": -409.515380859375, "p10": 10.009260940551773, "median": 236.3960189819336, "p90": 539.3748474121094, "max": 833.5411376953125, "pos_frac": 0.90625, "sample": [283.34515380859375, 58.37424850463867, 507.39630126953125, 219.0462646484375, 302.59967041015625, 290.9159240722656, 322.55731201171875, 346.5497131347656, 317.7805480957031, -409.515380859375, 78.43046569824219, -118.98365020751953, 313.8878479003906, 691.8033447265625, 47.80280303955078, 273.8954772949219, 128.25955200195312, 23.580570220947266, 172.37278747558594, 191.28977966308594, 134.42538452148438, 227.87100219726562, 559.8089599609375, 341.1233215332031, 222.1143798828125, 240.25906372070312, 182.24813842773438, 833.5411376953125, 31.964881896972656, -19.311412811279297, 374.1845703125, 336.00128173828125, 379.35113525390625, 461.26513671875, 486.68572998046875, 379.6211853027344, 215.91490173339844, 232.53297424316406, 245.3524932861328, 387.50909423828125, 661.3519897460938, 63.69375991821289, 182.52503967285156, 194.17710876464844, 243.7434539794922, 121.714111328125, -90.08004760742188, -178.21095275878906, 201.82342529296875, 527.4010620117188, 141.669189453125, 615.8038940429688, 266.91644287109375, -152.14614868164062, 131.02731323242188, 209.36691284179688, 94.84930419921875, 558.0575561523438, 261.14385986328125, 544.5064697265625, 288.2665710449219, 497.0176696777344, 4.192985534667969, 176.1414337158203], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000568.npy"}
{"epoch": 0.8340675477239354, "step": 569, "batch_size": 64, "mean": 212.70703125, "std": 243.44577026367188, "min": -521.9176635742188, "p10": -11.078035545349117, "median": 178.79151916503906, "p90": 505.4576873779297, "max": 1042.3270263671875, "pos_frac": 0.84375, "sample": [-113.29366302490234, 257.9342956542969, 1042.3270263671875, 122.96524047851562, 409.2388916015625, -13.043319702148438, 183.11810302734375, 101.05844116210938, -114.3606185913086, -28.997730255126953, 509.64105224609375, 534.8145751953125, 193.85501098632812, 320.53411865234375, 367.2413330078125, -0.410064697265625, 111.01890563964844, 493.25457763671875, 83.49798583984375, 91.95642852783203, -521.9176635742188, 407.99676513671875, 174.46493530273438, 74.78271484375, 22.976045608520508, 426.0318603515625, 561.5399169921875, 39.15196228027344, 59.177276611328125, 24.673641204833984, 199.43365478515625, 593.287109375, 484.81829833984375, 82.47354125976562, 434.7442932128906, 358.79046630859375, 52.2548828125, 312.51947021484375, 389.4346923828125, -87.300537109375, 502.4563903808594, 11.220470428466797, -6.492372512817383, 525.3590087890625, 293.63861083984375, 100.77823638916016, 390.0184020996094, 266.005615234375, 30.725868225097656, 148.6171112060547, -4.2030181884765625, 467.6923828125, 130.08306884765625, 120.87801361083984, 218.8915252685547, -328.63433837890625, 157.81683349609375, 136.6369171142578, 75.84339904785156, 312.58355712890625, 204.63363647460938, 240.19674682617188, 506.74395751953125, 470.0762023925781], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000569.npy"}
{"epoch": 0.8355359765051396, "step": 570, "batch_size": 64, "mean": 221.46524047851562, "std": 200.1894989013672, "min": -110.93313598632812, "p10": -40.62596721649169, "median": 187.8895492553711, "p90": 518.916748046875, "max": 672.60693359375, "pos_frac": 0.84375, "sample": [172.6006317138672, 531.2304077148438, 185.68873596191406, 481.4430236816406, 555.0548706054688, 138.35743713378906, 421.8909606933594, 546.5353393554688, 520.440185546875, 173.11595153808594, 190.09036254882812, 459.5082702636719, 179.74374389648438, 274.5140380859375, 25.40913963317871, 291.62286376953125, 614.3233642578125, 222.90399169921875, 433.2030944824219, -0.5798492431640625, 121.37470245361328, -77.59432220458984, -30.437532424926758, 12.386146545410156, 61.14807891845703, 672.60693359375, 115.84074401855469, 221.73379516601562, 145.27569580078125, 214.59010314941406, 124.0842056274414, 215.6781463623047, 53.44165802001953, -23.419126510620117, 117.3482666015625, 515.362060546875, 111.81254577636719, 372.33624267578125, 56.277488708496094, 476.4482727050781, 230.56436157226562, -44.99243927001953, -70.69168090820312, 262.29241943359375, 321.33074951171875, 302.20965576171875, 181.6330108642578, 355.8257141113281, 347.7147216796875, 437.30609130859375, -60.31367874145508, 82.13510131835938, 484.975341796875, 205.24867248535156, 199.37161254882812, -110.93313598632812, 90.40795135498047, 131.16883850097656, 198.75379943847656, -48.009525299072266, -52.15727996826172, 634.3828125, 145.20883178710938, 56.95286178588867], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000570.npy"}
{"epoch": 0.8370044052863436, "step": 571, "batch_size": 64, "mean": 236.76950073242188, "std": 252.96066284179688, "min": -417.282470703125, "p10": -63.25759391784666, "median": 208.42473602294922, "p90": 523.370037841797, "max": 852.684326171875, "pos_frac": 0.875, "sample": [519.3261108398438, 362.2261962890625, 24.030223846435547, 387.5816345214844, 112.36063385009766, 193.24813842773438, 24.349651336669922, 335.0879211425781, 263.7430725097656, 412.3853759765625, 852.684326171875, 210.26161193847656, 187.56649780273438, 445.9090881347656, 27.142559051513672, 248.5917205810547, 409.71405029296875, 525.1031494140625, 108.82070922851562, 510.6292724609375, 779.5531005859375, 750.953369140625, 528.7001342773438, 189.14385986328125, -154.15914916992188, 141.05126953125, 302.6116027832031, 233.51405334472656, 206.58786010742188, 483.82489013671875, 60.452415466308594, 72.51678466796875, 383.3771667480469, 731.3394775390625, 589.0684814453125, 152.01307678222656, -83.7166748046875, 78.56497192382812, 510.92962646484375, 407.63916015625, 245.7235107421875, 41.03963088989258, 491.5901794433594, 150.9527130126953, -43.654605865478516, 261.4798278808594, 426.19976806640625, 190.8501739501953, 90.98124694824219, 168.55677795410156, -85.47917175292969, -417.282470703125, 242.13690185546875, 186.18246459960938, -351.2120361328125, 260.73089599609375, 154.0760040283203, 298.3228454589844, -253.90255737304688, 147.13658142089844, 88.2777099609375, 400.9952392578125, 6.479156494140625, -71.65887451171875], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000571.npy"}
{"epoch": 0.8384728340675477, "step": 572, "batch_size": 64, "mean": 226.94200134277344, "std": 279.7761535644531, "min": -320.6815490722656, "p10": -99.46218948364255, "median": 166.708740234375, "p90": 641.7111328125001, "max": 1053.202880859375, "pos_frac": 0.828125, "sample": [-121.44137573242188, -162.7759552001953, 38.72113037109375, 394.90411376953125, 853.0402221679688, 178.42083740234375, 24.046295166015625, 91.6808853149414, 42.40549087524414, -7.523292541503906, 228.45143127441406, 137.98162841796875, 424.05712890625, 96.34022521972656, 327.4163818359375, 345.9672546386719, 455.5801696777344, 675.780517578125, 215.94754028320312, 322.07379150390625, 6.6009368896484375, 160.70204162597656, 179.17384338378906, 379.5259704589844, -7.620338439941406, 106.60628509521484, 39.038875579833984, -63.50129699707031, -132.75576782226562, 488.8414306640625, 250.44467163085938, -114.8740005493164, 229.9013671875, 70.05844116210938, 54.67593002319336, 944.3320922851562, 129.66604614257812, 273.6811218261719, 10.430521011352539, 162.1444854736328, 481.55853271484375, 76.06181335449219, -160.55999755859375, 343.7888488769531, 6.742984771728516, -320.6815490722656, 1053.202880859375, 171.2729949951172, 156.22572326660156, -5.65460205078125, 593.4388427734375, 287.11492919921875, 233.99526977539062, 682.8549194335938, 131.75299072265625, 237.73971557617188, 453.0945129394531, 780.6514282226562, 21.592041015625, -119.371826171875, 278.7183837890625, 650.7694702148438, 620.5750122070312, 141.25819396972656], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000572.npy"}
{"epoch": 0.8399412628487518, "step": 573, "batch_size": 64, "mean": 220.71739196777344, "std": 225.2751922607422, "min": -217.56153869628906, "p10": -30.804135322570794, "median": 216.67613983154297, "p90": 549.8953338623048, "max": 706.396728515625, "pos_frac": 0.84375, "sample": [354.3860168457031, -23.397336959838867, 8.186330795288086, 44.2198486328125, -3.0606842041015625, 706.396728515625, 136.93798828125, -41.94261932373047, 475.4326171875, 408.0975341796875, 232.123046875, 219.22596740722656, 6.351728439331055, 682.9664306640625, 274.4991455078125, 238.6607208251953, 694.3818359375, -217.56153869628906, 168.71429443359375, -33.978477478027344, 25.56952667236328, 569.8988647460938, 52.792015075683594, 254.29385375976562, 11.814704895019531, 193.324951171875, 496.51373291015625, 658.875732421875, 237.31887817382812, 275.36383056640625, -207.4121551513672, 217.3815155029297, 406.6689147949219, 43.65101623535156, 573.1298828125, -155.70559692382812, 296.58489990234375, -82.31561279296875, 181.8941650390625, -85.84270477294922, 619.4263916015625, 2.6888809204101562, 331.527099609375, 79.0178451538086, 203.206787109375, 71.40046691894531, 256.51416015625, 353.776611328125, 400.294921875, -3.60614013671875, 215.97076416015625, 85.43315887451172, 56.814453125, 247.2082061767578, 147.10484313964844, 154.42898559570312, 279.9573059082031, 503.2204284667969, 425.8171691894531, 146.5696563720703, 301.0684814453125, 480.4674377441406, 391.3863830566406, 81.77955627441406], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000573.npy"}
{"epoch": 0.8414096916299559, "step": 574, "batch_size": 64, "mean": 238.36622619628906, "std": 230.5947265625, "min": -217.5253143310547, "p10": -10.07241630554199, "median": 179.4601058959961, "p90": 581.0729431152344, "max": 731.3169555664062, "pos_frac": 0.875, "sample": [127.6889419555664, 278.6636657714844, -51.963775634765625, 159.1168212890625, 366.6100158691406, 73.12557220458984, 33.1947021484375, 720.82666015625, 582.8357543945312, 433.1514892578125, 127.30535125732422, 621.8005981445312, 616.2158203125, 298.11700439453125, 192.89657592773438, 203.1720733642578, 153.90264892578125, 177.63462829589844, -179.96469116210938, 717.9822387695312, 213.8992919921875, 172.14170837402344, 66.24562072753906, 552.4610595703125, 181.28558349609375, 526.4434204101562, 168.57586669921875, -99.33373260498047, -11.144184112548828, 4.015865325927734, 92.92245483398438, 10.279949188232422, 207.86566162109375, 455.93414306640625, 32.31658172607422, 431.7987365722656, -109.7567367553711, 598.1385498046875, 107.7606201171875, 399.3302307128906, 516.9134521484375, -38.83448791503906, 494.08251953125, 292.9432678222656, -7.571624755859375, 119.07577514648438, 219.8504638671875, 115.05247497558594, -217.5253143310547, 123.59664916992188, 346.0339050292969, 22.109642028808594, 112.32701110839844, 568.8650512695312, 289.8838806152344, 205.55892944335938, 140.10336303710938, 576.959716796875, 109.10426330566406, 374.88861083984375, 158.11390686035156, 731.3169555664062, 243.80616760253906, 105.29096221923828], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000574.npy"}
{"epoch": 0.8428781204111601, "step": 575, "batch_size": 64, "mean": 215.68133544921875, "std": 266.07440185546875, "min": -488.8778381347656, "p10": -140.07604446411125, "median": 233.84683990478516, "p90": 577.987469482422, "max": 732.1891479492188, "pos_frac": 0.8125, "sample": [343.27655029296875, 299.6296081542969, 329.4262390136719, 254.62469482421875, 343.3945007324219, -194.96963500976562, 319.2108154296875, 233.05569458007812, 226.95297241210938, 132.14817810058594, 118.60929107666016, 115.21358489990234, -11.492584228515625, 306.161865234375, 333.9542236328125, 158.5238800048828, -11.821624755859375, 588.1181640625, 123.92491149902344, 234.6379852294922, 316.9009704589844, 252.7064208984375, 582.1198120117188, 92.70527648925781, 248.080810546875, 66.35677337646484, -488.8778381347656, 206.85147094726562, -170.91766357421875, 365.7486877441406, -52.861412048339844, 719.0502319335938, 353.3653259277344, 568.3453369140625, 35.11957550048828, -8.593673706054688, 178.92767333984375, -68.11226654052734, 678.1417236328125, 462.42340087890625, 150.4193115234375, 38.30921173095703, -185.31454467773438, 429.4814453125, 732.1891479492188, -443.0732421875, 443.22418212890625, 352.80126953125, 300.11395263671875, 607.707275390625, 107.7607650756836, 532.7755126953125, 336.9924621582031, 43.3533935546875, 405.4462890625, 277.80029296875, 225.8560791015625, -378.99017333984375, 685.9861450195312, 59.1062126159668, -259.50384521484375, 166.72396850585938, 193.98924255371094, 400.3916320800781], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000575.npy"}
{"epoch": 0.8443465491923642, "step": 576, "batch_size": 64, "mean": 207.4990234375, "std": 257.4231872558594, "min": -369.36358642578125, "p10": -74.68465270996093, "median": 182.61661529541016, "p90": 573.0528625488282, "max": 716.715087890625, "pos_frac": 0.796875, "sample": [453.0484619140625, 682.8692626953125, -366.03338623046875, 59.268741607666016, -81.81727600097656, 97.31419372558594, -75.87297058105469, 673.3172607421875, 242.88067626953125, 93.47119903564453, 218.7083282470703, 527.6431884765625, 150.74725341796875, 35.67469787597656, 399.082763671875, 302.04071044921875, 479.4999694824219, -272.9656982421875, 353.6144714355469, 335.97589111328125, 306.3765563964844, 347.99822998046875, 577.5992431640625, 716.715087890625, 99.73204040527344, 669.71240234375, -63.31437301635742, -86.52186584472656, 238.86721801757812, 437.82305908203125, -27.637680053710938, 184.8774871826172, 110.580078125, 18.935623168945312, 268.9461975097656, 446.447265625, 715.9552001953125, 169.9337158203125, -53.88935089111328, 380.28875732421875, 43.623390197753906, 349.363525390625, 212.34750366210938, 369.95269775390625, 76.5997543334961, -369.36358642578125, -8.936370849609375, 249.34750366210938, -230.17706298828125, 154.1030731201172, 212.0169677734375, 656.4356079101562, 562.4446411132812, 389.0057373046875, -71.91191101074219, 53.09062194824219, 59.53498077392578, 146.14926147460938, 180.35574340820312, 67.22882080078125, -45.02855682373047, 97.12319946289062, 45.432373046875, 313.30767822265625], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000576.npy"}
{"epoch": 0.8458149779735683, "step": 577, "batch_size": 64, "mean": 221.11129760742188, "std": 217.76283264160156, "min": -311.0860595703125, "p10": 7.495079040527347, "median": 227.27674865722656, "p90": 458.5221099853517, "max": 771.5146484375, "pos_frac": 0.90625, "sample": [-310.55474853515625, 717.49658203125, 385.29254150390625, 215.6461181640625, 74.66547393798828, 65.22319030761719, 236.84959411621094, 207.5140838623047, 88.42394256591797, 10.65557861328125, 471.0237731933594, 125.26944732666016, 217.7039031982422, 429.3515625, 71.71416473388672, 338.2462463378906, 593.1982421875, 403.65673828125, 6.1405792236328125, 382.7623291015625, 179.59950256347656, 211.21298217773438, -267.8796081542969, 39.4415283203125, 372.70477294921875, 241.75064086914062, 148.29910278320312, 376.8208312988281, 276.371826171875, 297.3360595703125, 316.8323059082031, 171.6702117919922, 191.25765991210938, 252.60391235351562, 381.83245849609375, 53.56196594238281, 397.6656188964844, 16.446537017822266, 149.6531524658203, -311.0860595703125, 75.40475463867188, 138.75140380859375, 290.60137939453125, 290.12677001953125, -73.61109161376953, 80.51026916503906, 85.59107971191406, 480.69573974609375, 298.4868469238281, 583.1568603515625, 185.0135955810547, 160.39431762695312, 612.5177001953125, 97.34071350097656, 771.5146484375, 310.4858093261719, 299.9742126464844, -26.87060546875, -266.9884033203125, 368.2239990234375, 242.82713317871094, 378.2376403808594, 301.2655334472656, 241.09825134277344], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000577.npy"}
{"epoch": 0.8472834067547724, "step": 578, "batch_size": 64, "mean": 219.00830078125, "std": 202.31219482421875, "min": -157.76901245117188, "p10": -8.830353164672847, "median": 178.5649871826172, "p90": 534.4856689453127, "max": 754.9722290039062, "pos_frac": 0.875, "sample": [754.9722290039062, 33.579227447509766, 321.7007751464844, 171.4542236328125, 322.9073181152344, 292.0743713378906, 625.1360473632812, 182.40756225585938, 66.31680297851562, -12.828601837158203, 27.613615036010742, 593.4169311523438, 218.92881774902344, 257.0850524902344, 139.86483764648438, -157.76901245117188, 245.05433654785156, 45.63872528076172, 560.21728515625, 437.1343994140625, 356.46295166015625, 263.41912841796875, 385.37274169921875, -4.2035369873046875, -150.82192993164062, 126.72645568847656, 400.058837890625, 74.24767303466797, -10.813274383544922, 161.039794921875, 4.6118927001953125, 32.83526611328125, 244.5379180908203, 163.43417358398438, 82.37513732910156, 344.337646484375, 174.722412109375, 76.57891082763672, -104.0475845336914, 79.20368194580078, 125.98252868652344, 315.93682861328125, 210.66864013671875, 171.17539978027344, 555.974853515625, 151.77981567382812, 53.11376190185547, 11.511690139770508, 328.8865966796875, 131.6774139404297, 284.0629577636719, 500.6761474609375, 227.51930236816406, 364.46588134765625, 341.3065490722656, 468.8860168457031, 17.230438232421875, 326.1712646484375, 582.3204956054688, 548.9754638671875, 119.83900451660156, -21.07019805908203, -49.84199523925781, 424.29998779296875], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000578.npy"}
{"epoch": 0.8487518355359766, "step": 579, "batch_size": 64, "mean": 175.51071166992188, "std": 194.19358825683594, "min": -283.6739501953125, "p10": -70.7018615722656, "median": 167.4369125366211, "p90": 421.228109741211, "max": 631.7361450195312, "pos_frac": 0.828125, "sample": [40.48577117919922, -24.96463966369629, -202.434814453125, 336.68426513671875, -6.0506591796875, 123.22213745117188, 306.2033996582031, 240.0921630859375, 341.0931701660156, 418.6505126953125, -51.39972686767578, 66.13209533691406, -283.6739501953125, 143.36070251464844, 247.31912231445312, 366.4136047363281, 258.58551025390625, 319.43304443359375, 631.7361450195312, 30.421737670898438, 24.419593811035156, 82.25590515136719, 133.61618041992188, 176.32171630859375, 148.7637176513672, 311.51361083984375, 273.61810302734375, -78.97420501708984, 257.0675354003906, 276.982666015625, 171.32763671875, 34.92002868652344, 383.941162109375, 424.22802734375, 533.4450073242188, 427.15240478515625, 178.0738067626953, -151.6656951904297, -33.97769546508789, 349.4064025878906, 422.3327941894531, 128.19842529296875, 314.7330627441406, -133.29205322265625, 559.1263427734375, 47.92820739746094, 127.83158874511719, 141.7520294189453, -173.98545837402344, 163.5461883544922, 365.9353942871094, 174.3597412109375, 211.02256774902344, 313.89825439453125, 242.22314453125, -196.59494018554688, 103.4472427368164, 134.90264892578125, 86.44126892089844, 39.32206726074219, 40.64183044433594, 484.07220458984375, 284.1372375488281, 126.9598388671875], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000579.npy"}
{"epoch": 0.8502202643171806, "step": 580, "batch_size": 64, "mean": 252.62350463867188, "std": 245.86431884765625, "min": -328.13214111328125, "p10": 19.190765380859382, "median": 246.291015625, "p90": 612.3597961425783, "max": 958.2576904296875, "pos_frac": 0.921875, "sample": [116.74763488769531, 651.4395141601562, 105.64189147949219, 210.96290588378906, 13.916324615478516, 262.29290771484375, 161.39781188964844, 564.5331420898438, 252.38638305664062, 271.2789611816406, 300.302734375, 51.0174560546875, 882.4721069335938, 25.545196533203125, 35.53987503051758, 474.454833984375, 307.53106689453125, 121.16361999511719, 429.019287109375, 273.016357421875, 221.246337890625, 351.3334655761719, 305.7950134277344, 434.1550598144531, 255.9022674560547, 193.3558349609375, 277.5914001464844, 352.9989013671875, 461.0351867675781, 57.136932373046875, 216.4254150390625, 101.40812683105469, 196.49417114257812, 63.436676025390625, 645.885498046875, 513.9489135742188, 105.86941528320312, -300.0603332519531, 447.9823303222656, 322.87091064453125, 632.85693359375, 129.89817810058594, -304.5741882324219, 401.83392333984375, 145.50213623046875, 228.5243682861328, -328.13214111328125, -147.1917266845703, 390.148193359375, 958.2576904296875, 264.3654479980469, 16.467437744140625, 639.169921875, 240.19564819335938, 263.378173828125, 139.09295654296875, -18.84589385986328, 159.41448974609375, 53.74639129638672, 59.36094284057617, 216.063232421875, 635.6262817382812, 403.5496520996094, 253.724853515625], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000580.npy"}
{"epoch": 0.8516886930983847, "step": 581, "batch_size": 64, "mean": 183.37216186523438, "std": 216.7357940673828, "min": -216.3789825439453, "p10": -66.08808631896972, "median": 167.33020782470703, "p90": 425.5243286132813, "max": 846.4804077148438, "pos_frac": 0.796875, "sample": [135.21505737304688, -0.7877044677734375, 68.405029296875, 25.770645141601562, 182.619873046875, 146.9972686767578, -216.3789825439453, 212.13897705078125, 41.434165954589844, -4.659000396728516, 19.09619140625, 289.9891052246094, 352.20721435546875, 213.88616943359375, 427.544677734375, -195.27938842773438, 649.3683471679688, 195.61671447753906, 367.3031005859375, 668.4838256835938, -71.54045104980469, 396.9385681152344, 50.302215576171875, 226.3426055908203, 214.53514099121094, 362.7663879394531, 228.15357971191406, 80.07788848876953, 170.84490966796875, 399.4458312988281, -161.07701110839844, 846.4804077148438, 19.59725570678711, 420.8101806640625, 142.27679443359375, 223.18768310546875, 411.3071594238281, 163.8155059814453, 125.50446319580078, 313.56573486328125, 160.07611083984375, 33.29150390625, 88.64447021484375, 203.2316131591797, 287.3026123046875, 119.00535583496094, 388.30902099609375, 433.30511474609375, 560.0643310546875, 199.0330352783203, -214.37026977539062, -92.33206939697266, 22.016902923583984, -108.67019653320312, 375.9114990234375, 113.86918640136719, -53.365901947021484, -10.562271118164062, 102.01666259765625, -38.01947784423828, 185.37075805664062, 552.253662109375, -9.925132751464844, 297.0551452636719], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000581.npy"}
{"epoch": 0.8531571218795888, "step": 582, "batch_size": 64, "mean": 229.39111328125, "std": 234.77362060546875, "min": -465.64520263671875, "p10": -25.35559616088867, "median": 208.4855194091797, "p90": 548.4721252441407, "max": 653.713623046875, "pos_frac": 0.84375, "sample": [-465.64520263671875, 153.95175170898438, 653.398681640625, 445.1739807128906, 407.7155456542969, -25.7442626953125, 302.3884582519531, 307.69207763671875, 255.9441680908203, 89.21856689453125, 145.6352996826172, 249.1981201171875, 9.626602172851562, 507.3839111328125, 207.32687377929688, 438.2026062011719, 46.96421813964844, -47.76054382324219, 223.2306671142578, 581.586669921875, 283.72467041015625, 167.04957580566406, 424.0083923339844, -67.46682739257812, 172.19146728515625, 653.713623046875, 120.8082046508789, 220.25650024414062, 545.3314819335938, 514.0487060546875, 245.74801635742188, 286.6071472167969, 513.3753051757812, 97.4909896850586, -86.036865234375, -9.396156311035156, 32.429786682128906, -28.770980834960938, 209.6441650390625, 311.3045654296875, 15.5460205078125, 73.64729309082031, 632.0195922851562, 629.3944091796875, 106.46290588378906, 555.3472290039062, -24.448707580566406, 143.96693420410156, 47.54811096191406, 130.46363830566406, 10.637256622314453, 523.40771484375, 549.818115234375, 282.76141357421875, -18.314250946044922, -288.8747863769531, 185.90444946289062, 38.933631896972656, 418.40155029296875, 381.21539306640625, 527.9998168945312, 180.79103088378906, 299.65240478515625, 187.2003936767578], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000582.npy"}
{"epoch": 0.8546255506607929, "step": 583, "batch_size": 64, "mean": 171.66009521484375, "std": 221.84744262695312, "min": -245.52432250976562, "p10": -98.61258621215819, "median": 138.78388214111328, "p90": 440.91600952148445, "max": 757.1753540039062, "pos_frac": 0.796875, "sample": [205.23196411132812, 91.21185302734375, 203.13674926757812, 21.130050659179688, 45.3922119140625, 142.7497100830078, 451.2759704589844, 262.5542907714844, 757.1753540039062, 134.81805419921875, -36.22125244140625, 77.66966247558594, -191.7722625732422, 274.1453857421875, 132.07095336914062, -105.28972625732422, 335.87908935546875, 262.6628112792969, 127.1697769165039, 262.2529296875, 206.2447509765625, 50.555335998535156, 129.66012573242188, -171.97804260253906, 108.07006072998047, -120.8509521484375, 263.72174072265625, 64.2847900390625, -245.52432250976562, -38.489742279052734, 165.34446716308594, 44.828033447265625, 176.55416870117188, 347.57177734375, 324.20086669921875, 705.66552734375, 271.42291259765625, 7.741371154785156, 416.7427673339844, 95.70377349853516, -8.089691162109375, 168.60621643066406, 207.84814453125, -47.46221923828125, -26.575927734375, -239.9986114501953, 753.9600219726562, 273.76007080078125, -83.0325927734375, 285.6624755859375, 505.5909423828125, 511.46624755859375, 31.89411163330078, 339.3162536621094, 313.260009765625, -141.87744140625, 130.78590393066406, 355.8002014160156, 229.43792724609375, 193.98904418945312, 684.8702392578125, 122.61328125, 77.46735382080078, 92.24081420898438], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000583.npy"}
{"epoch": 0.856093979441997, "step": 584, "batch_size": 64, "mean": 272.740966796875, "std": 216.8332061767578, "min": -170.92529296875, "p10": 35.707132339477546, "median": 241.33313751220703, "p90": 595.9375244140625, "max": 695.5050659179688, "pos_frac": 0.921875, "sample": [308.18438720703125, 161.0522918701172, 88.80989074707031, 290.7742614746094, 33.8055419921875, -122.45884704589844, 471.77899169921875, 387.7884826660156, 490.87847900390625, 40.1441764831543, -22.60595703125, 102.10167694091797, 51.05230712890625, 336.1552429199219, 306.182373046875, 548.3084716796875, 307.8770751953125, 640.0645751953125, 517.548583984375, 123.1375961303711, 238.20123291015625, 194.78033447265625, 98.17759704589844, 192.8682098388672, 190.9124755859375, -116.29470825195312, 642.5454711914062, 194.79788208007812, 89.31513977050781, 230.52127075195312, 440.3387145996094, 263.5893249511719, 162.06515502929688, -170.92529296875, 358.1026611328125, 317.70648193359375, 120.82597351074219, 265.0239562988281, 3.780487060546875, 234.60580444335938, 379.28857421875, 127.150390625, 122.15936279296875, 670.5123901367188, 206.93211364746094, 178.2755126953125, 465.9964904785156, 671.0078125, -142.33999633789062, 525.7901611328125, 607.9351196289062, 71.80522155761719, 695.5050659179688, 590.4174194335938, 421.3499450683594, 510.4483947753906, 244.4650421142578, 411.34234619140625, 51.603492736816406, 330.9560546875, 411.5129699707031, 197.43292236328125, 598.3032836914062, 96.05465698242188], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000584.npy"}
{"epoch": 0.8575624082232012, "step": 585, "batch_size": 64, "mean": 232.48895263671875, "std": 241.14529418945312, "min": -284.5524597167969, "p10": -52.88008766174313, "median": 198.35042572021484, "p90": 560.7842346191408, "max": 918.2786865234375, "pos_frac": 0.859375, "sample": [240.01885986328125, 26.226150512695312, 260.3065490722656, 211.07872009277344, -21.165660858154297, 918.2786865234375, 362.5030212402344, -121.5574951171875, 160.24514770507812, 108.76531982421875, 634.6719360351562, 89.10163879394531, 158.99234008789062, 352.30120849609375, 162.57827758789062, 436.2081298828125, 445.1394348144531, 434.3848876953125, 804.6478881835938, 314.06072998046875, 156.5444793701172, 622.3461303710938, 70.51177215576172, 267.7785949707031, 404.1001892089844, 9.202377319335938, 266.237548828125, 385.44232177734375, 440.1311340332031, 80.93096923828125, 97.11970520019531, 232.47686767578125, 185.62213134765625, -66.47198486328125, 499.8095397949219, 115.41724395751953, 577.591796875, -69.36965942382812, 53.44300842285156, 416.2171630859375, 128.99838256835938, 43.151241302490234, 521.5665893554688, -84.4529800415039, 416.17364501953125, 246.81053161621094, 227.0513916015625, -68.77377319335938, 417.59686279296875, 9.822717666625977, 5.898256301879883, -241.39730834960938, 15.930461883544922, -284.5524597167969, 44.74440002441406, 654.7811279296875, 376.07025146484375, 182.7077178955078, 384.6590881347656, -14.186325073242188, 176.31578063964844, 237.16531372070312, 170.16920471191406, 591.1747436523438], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000585.npy"}
{"epoch": 0.8590308370044053, "step": 586, "batch_size": 64, "mean": 188.37155151367188, "std": 239.69996643066406, "min": -402.1540832519531, "p10": -91.20123291015625, "median": 185.2344207763672, "p90": 466.55909423828126, "max": 780.7747802734375, "pos_frac": 0.78125, "sample": [658.4169921875, 185.14346313476562, -5.213863372802734, 58.60285186767578, 372.362548828125, 185.34442138671875, -92.39405059814453, 214.9021453857422, 48.350181579589844, 437.2055969238281, 151.46424865722656, 216.98269653320312, 200.68312072753906, 494.92840576171875, -81.53539276123047, 174.0531463623047, 368.27294921875, -402.1540832519531, 123.12464141845703, 176.48960876464844, 154.54710388183594, 22.357568740844727, 139.69717407226562, 466.814453125, 397.21014404296875, -94.00083923339844, 399.96881103515625, -163.4254150390625, 244.36126708984375, 8.904205322265625, 373.226806640625, -88.4179916381836, 382.15338134765625, -144.40701293945312, 54.767486572265625, 24.506195068359375, 411.48828125, -8.560035705566406, 191.5030517578125, 13.701360702514648, 624.6552124023438, -43.3914909362793, 465.9632568359375, 362.901123046875, -297.7582702636719, -289.7235412597656, 254.25265502929688, 374.50787353515625, 314.2993469238281, 156.9537353515625, -82.35365295410156, 19.13622283935547, 346.49188232421875, 780.7747802734375, 425.7789306640625, 175.19471740722656, 455.55975341796875, 575.18896484375, 121.24267578125, -86.81229400634766, 266.1061706542969, 469.341552734375, 210.71844482421875, 185.32537841796875], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000586.npy"}
{"epoch": 0.8604992657856094, "step": 587, "batch_size": 64, "mean": 217.67236328125, "std": 226.69644165039062, "min": -334.1235046386719, "p10": -58.529287719726554, "median": 203.61421966552734, "p90": 527.0043640136719, "max": 744.724853515625, "pos_frac": 0.84375, "sample": [221.46971130371094, 415.2916259765625, 351.1982421875, 40.588348388671875, 382.62457275390625, 6.247491836547852, 217.70782470703125, -21.588146209716797, 0.366455078125, 293.19952392578125, -53.03498840332031, 204.43316650390625, 627.677490234375, 141.68038940429688, 288.775390625, 331.3629150390625, -22.593902587890625, 182.21424865722656, 176.33560180664062, 297.700439453125, -89.83790588378906, 191.66030883789062, 531.1153564453125, 15.061120986938477, 214.2727508544922, 646.2935180664062, -128.9164581298828, 499.9510192871094, 200.17672729492188, -60.88398742675781, 202.79527282714844, 164.51339721679688, 30.361557006835938, 26.00726318359375, 348.52667236328125, 210.8745574951172, 193.26992797851562, -334.1235046386719, 250.7312469482422, 73.16207885742188, -84.9093246459961, 583.7109375, 71.70179748535156, 507.3039245605469, 80.41828918457031, 673.178466796875, 150.1351776123047, -150.73130798339844, 211.67544555664062, 259.6128234863281, 439.52899169921875, 744.724853515625, 349.64715576171875, 377.01324462890625, 370.7292175292969, 352.2786560058594, 576.0079345703125, -121.30611419677734, 191.68365478515625, 50.51578903198242, 45.200958251953125, 49.1319580078125, 419.6996765136719, 517.4120483398438], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000587.npy"}
{"epoch": 0.8619676945668135, "step": 588, "batch_size": 64, "mean": 199.75949096679688, "std": 219.96322631835938, "min": -358.7777099609375, "p10": -45.54450378417969, "median": 159.1717758178711, "p90": 469.54407348632816, "max": 740.3087158203125, "pos_frac": 0.84375, "sample": [212.78358459472656, 228.5266571044922, -44.605384826660156, 52.466426849365234, 442.42706298828125, 337.8095397949219, 376.1998291015625, 47.547698974609375, 316.89910888671875, 467.3938293457031, 24.2554931640625, 27.69123077392578, 37.25299072265625, 218.893310546875, -129.0362548828125, -64.80667114257812, 189.91128540039062, 26.026702880859375, 103.14777374267578, 126.34927368164062, 428.612548828125, 70.90170288085938, -141.69920349121094, 470.4656066894531, 126.22157287597656, 101.06871032714844, 653.5162353515625, 124.451416015625, -111.88563537597656, 29.86302947998047, 207.87652587890625, 12.531904220581055, 105.30216979980469, 523.427734375, 388.2073974609375, 150.2122802734375, 151.76942443847656, -45.946983337402344, 409.43939208984375, 256.296630859375, 531.429931640625, 390.0751037597656, 142.11019897460938, 354.37335205078125, 227.38270568847656, 68.54622650146484, 110.75868225097656, 642.3449096679688, -358.7777099609375, 346.46624755859375, -19.27077865600586, 740.3087158203125, 712.6828002929688, 274.87969970703125, 417.90032958984375, 221.06698608398438, 196.19314575195312, 103.295166015625, -117.98004913330078, 166.57412719726562, 232.15097045898438, 133.00326538085938, 374.09857177734375, -12.770797729492188], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000588.npy"}
{"epoch": 0.8634361233480177, "step": 589, "batch_size": 64, "mean": 273.01702880859375, "std": 224.37210083007812, "min": -147.7084503173828, "p10": 9.827672386169443, "median": 251.7000732421875, "p90": 560.8394958496095, "max": 765.784423828125, "pos_frac": 0.90625, "sample": [581.1400756835938, 376.82611083984375, 273.5982666015625, 143.1882781982422, 398.69610595703125, 6.321849822998047, 20.149368286132812, 513.3057861328125, 227.36749267578125, -53.31993865966797, 106.72187805175781, 216.11489868164062, 269.2032165527344, 217.9330596923828, 250.98309326171875, 765.784423828125, 713.539794921875, -39.518638610839844, 248.0933074951172, 110.64682006835938, 503.34210205078125, 35.3466796875, 74.79945373535156, 690.06396484375, 166.3463134765625, 258.28369140625, 206.66909790039062, 202.61618041992188, 484.02435302734375, 52.26019287109375, 521.0825805664062, 511.69451904296875, 294.28125, 457.6129455566406, 765.3336791992188, 33.824615478515625, -9.7359619140625, 117.50489044189453, 512.19873046875, 30.237964630126953, 391.6385803222656, 40.6373291015625, 258.5174255371094, 318.4575500488281, 377.359130859375, 186.03045654296875, 577.878173828125, 90.06108093261719, 347.845947265625, 695.9283447265625, 502.9523620605469, 253.0028076171875, 328.86688232421875, -69.60455322265625, 183.3221893310547, -104.19932556152344, 497.7213134765625, 456.7466735839844, 18.007925033569336, 152.05227661132812, 372.8815002441406, 237.7154083251953, -147.7084503173828, 252.41705322265625], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000589.npy"}
{"epoch": 0.8649045521292217, "step": 590, "batch_size": 64, "mean": 234.80557250976562, "std": 209.49620056152344, "min": -327.2060546875, "p10": -24.80993595123291, "median": 221.57357788085938, "p90": 454.8843139648437, "max": 773.6412353515625, "pos_frac": 0.859375, "sample": [157.4082489013672, 636.2752075195312, 455.3948669433594, 128.8348388671875, 186.61273193359375, -92.91796875, 336.57867431640625, 100.18701171875, 451.26361083984375, -102.17654418945312, 129.97523498535156, -24.227291107177734, 263.7745361328125, 511.20733642578125, 62.009525299072266, 158.05064392089844, -70.60279083251953, 3.5151138305664062, 408.6317138671875, 93.0635986328125, 159.58633422851562, -10.300407409667969, 137.54281616210938, 273.324951171875, -48.74493408203125, 147.02740478515625, 432.4056396484375, 119.99095916748047, 773.6412353515625, -25.059640884399414, 125.31744384765625, 332.11572265625, 203.10104370117188, 628.51318359375, 254.09457397460938, 354.66387939453125, 236.8365936279297, 361.05859375, 326.86126708984375, 306.19036865234375, 221.63372802734375, 160.768310546875, -152.92202758789062, -327.2060546875, 214.4965057373047, 39.85919189453125, 353.34942626953125, 326.9817199707031, 181.0804901123047, 254.213623046875, 389.1120300292969, 398.2643127441406, 119.99734497070312, 221.513427734375, 195.23617553710938, 525.9763793945312, 421.4557800292969, 706.7173461914062, 347.47198486328125, 318.2665100097656, 453.6930236816406, 344.338623046875, 23.333263397216797, 408.89996337890625], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000590.npy"}
{"epoch": 0.8663729809104258, "step": 591, "batch_size": 64, "mean": 230.46311950683594, "std": 205.17918395996094, "min": -251.585205078125, "p10": -54.71964263916013, "median": 245.82120513916016, "p90": 479.6464874267579, "max": 748.5775146484375, "pos_frac": 0.859375, "sample": [-68.53545379638672, 748.5775146484375, 504.9415588378906, 393.61248779296875, 492.3712463378906, -87.01475524902344, 230.29432678222656, -213.24191284179688, 333.63427734375, 607.9229736328125, 334.0404052734375, 235.20770263671875, 449.95538330078125, 272.81805419921875, 167.651611328125, 675.74609375, 410.3714294433594, 254.38436889648438, 98.71240234375, 235.75732421875, 124.30473327636719, 424.343505859375, -17.33336067199707, 110.4129867553711, 272.5732116699219, 127.12577056884766, -82.23609924316406, 355.3281555175781, 214.42431640625, 250.59735107421875, -191.1840057373047, 103.43634033203125, 282.2139587402344, -89.36341857910156, 398.85260009765625, 100.57939147949219, 51.943511962890625, 82.43792724609375, 418.12432861328125, 76.63422393798828, -251.585205078125, 379.7386779785156, 225.00880432128906, 292.2092590332031, 16.216773986816406, 507.0176696777344, 380.1520690917969, 39.340450286865234, 336.6416320800781, 297.1749267578125, 290.7403564453125, 295.4234313964844, 316.3432922363281, 131.8848876953125, 327.3450927734375, 179.02883911132812, 551.59716796875, 256.7222900390625, -22.482749938964844, 367.8639221191406, 127.69755554199219, 198.89437866210938, 241.04505920410156, 175.19857788085938], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000591.npy"}
{"epoch": 0.8678414096916299, "step": 592, "batch_size": 64, "mean": 212.3756561279297, "std": 281.7695007324219, "min": -539.4414672851562, "p10": -145.24387054443358, "median": 244.26470947265625, "p90": 524.6317932128907, "max": 884.2802734375, "pos_frac": 0.8125, "sample": [76.70167541503906, 489.881103515625, -11.336519241333008, 267.5621337890625, -311.91876220703125, 27.809829711914062, 219.59030151367188, 425.68524169921875, 884.2802734375, 277.8108825683594, -138.7499542236328, 688.7207641601562, -43.595542907714844, 500.63812255859375, -512.4324951171875, 118.99624633789062, 215.78277587890625, 246.38656616210938, 71.8143310546875, 797.217529296875, 89.8343276977539, 343.5998229980469, -55.344444274902344, 542.3818359375, 5.677263259887695, 95.09420776367188, 135.09976196289062, 471.8731689453125, 544.1155395507812, 300.0705871582031, 108.12643432617188, -7.7187042236328125, 54.847412109375, 353.4696350097656, 534.914794921875, 78.68122100830078, 145.61517333984375, 248.77264404296875, -148.0269775390625, 411.48114013671875, 285.8939514160156, 310.9297180175781, 412.4774169921875, 453.18951416015625, 150.4803466796875, -168.83270263671875, 242.14285278320312, -445.762451171875, 400.6558532714844, 341.4794006347656, 409.72021484375, 100.09779357910156, 592.8480224609375, 281.8687438964844, -200.9688720703125, 452.6101379394531, 386.201416015625, 226.197265625, 441.37042236328125, 132.19003295898438, -539.4414672851562, 261.094970703125, 373.4541015625, 148.73638916015625], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000592.npy"}
{"epoch": 0.869309838472834, "step": 593, "batch_size": 64, "mean": 172.6243896484375, "std": 223.8629150390625, "min": -347.2721862792969, "p10": -87.22115707397461, "median": 150.77093505859375, "p90": 441.5494384765625, "max": 846.3115234375, "pos_frac": 0.734375, "sample": [-7.100494384765625, 118.05538940429688, -69.6502456665039, -1.9351806640625, -2.8186264038085938, -90.262939453125, 385.482177734375, -28.559032440185547, 434.2445068359375, 493.4527282714844, 254.031005859375, 453.1456298828125, 451.09832763671875, 396.9283752441406, 56.315025329589844, -1.9540843963623047, 441.64520263671875, 524.3051147460938, 362.1976013183594, 146.9512176513672, 24.276927947998047, 62.6781005859375, 149.179931640625, 52.37590789794922, 292.3284606933594, 846.3115234375, 193.28521728515625, 204.383544921875, 241.88609313964844, -244.4518280029297, -7.076562881469727, 356.5452880859375, -80.12366485595703, -6.265625, 257.7529296875, 441.32598876953125, -25.45721435546875, 548.986328125, 25.26824188232422, 16.749210357666016, -347.2721862792969, 384.27178955078125, -141.19427490234375, -133.2268829345703, 139.95077514648438, 43.3118896484375, 73.9341049194336, 219.18600463867188, 373.21160888671875, 418.15130615234375, 220.5151824951172, 280.9393615722656, 375.7993469238281, 348.2216491699219, 265.79248046875, 334.4169921875, 152.3619384765625, 21.758852005004883, -201.979248046875, 268.86187744140625, 14.366455078125, -107.25403594970703, 56.53487777709961, 321.7698974609375], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000593.npy"}
{"epoch": 0.8707782672540382, "step": 594, "batch_size": 64, "mean": 229.670654296875, "std": 272.8073425292969, "min": -399.9638977050781, "p10": -105.82352828979487, "median": 236.87081146240234, "p90": 608.7555419921875, "max": 866.0177612304688, "pos_frac": 0.78125, "sample": [443.7436828613281, 5.462982177734375, -7.17967414855957, 23.760591506958008, -26.420448303222656, -55.19243621826172, 496.19720458984375, 222.60394287109375, 267.1699523925781, 191.66189575195312, 125.60926818847656, 290.6449279785156, 758.3031616210938, 508.65814208984375, 112.90579223632812, 519.617431640625, 436.77777099609375, 285.0190734863281, 272.91534423828125, -25.77022933959961, -127.52256774902344, 306.1185302734375, -172.24981689453125, 736.0679321289062, -28.177806854248047, 372.75592041015625, -192.00357055664062, 348.1032409667969, 404.7595520019531, 143.57437133789062, 132.67576599121094, 356.31048583984375, 315.1564636230469, 403.38555908203125, 313.5240478515625, 456.39410400390625, 234.76304626464844, 4.758453369140625, 111.14974212646484, -399.9638977050781, 7.909023284912109, 668.355224609375, 257.8135070800781, 174.1502685546875, 238.97857666015625, 739.3641357421875, 615.7938232421875, 250.12408447265625, -30.965179443359375, 330.19464111328125, -136.64895629882812, 16.437850952148438, 41.733646392822266, -238.93841552734375, 416.72265625, -22.824939727783203, 525.650634765625, 592.3328857421875, 694.0006713867188, 163.5121612548828, 866.0177612304688, -149.94444274902344, 46.923126220703125, 66.1615982055664], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000594.npy"}
{"epoch": 0.8722466960352423, "step": 595, "batch_size": 64, "mean": 209.3623504638672, "std": 224.0408935546875, "min": -166.59841918945312, "p10": -31.04471015930175, "median": 195.6471710205078, "p90": 492.1535736083985, "max": 1014.1435546875, "pos_frac": 0.828125, "sample": [-45.30665588378906, 272.03411865234375, 213.90786743164062, 116.28470611572266, 6.892814636230469, 644.496826171875, 110.84765625, 315.6902770996094, -140.88543701171875, 223.08917236328125, 442.18963623046875, 190.5106964111328, 103.41220092773438, 361.98443603515625, -125.13356018066406, 326.8961486816406, 165.31619262695312, 773.5250244140625, 288.7840576171875, 57.27810287475586, -33.73569107055664, -166.59841918945312, 196.645263671875, 300.9064636230469, -13.953559875488281, 14.18033218383789, 265.4472351074219, 72.9371109008789, 252.10523986816406, 168.39797973632812, 79.69390106201172, 282.01519775390625, 112.04745483398438, 206.3262939453125, 475.6947937011719, 139.11961364746094, 123.25875854492188, 237.42002868652344, 602.2799072265625, 387.4693603515625, 59.54508972167969, -48.24197006225586, 272.6163024902344, -142.72708129882812, 197.90438842773438, 194.84202575683594, 214.8595428466797, -8.484481811523438, 296.2958984375, 374.1866455078125, -24.76575469970703, 499.20733642578125, -2.1089019775390625, 119.54039001464844, 123.12872314453125, 196.4523162841797, 305.86944580078125, 42.45672607421875, 45.5487060546875, 75.36375427246094, 659.0162353515625, 392.2462158203125, 538.8236083984375, 1014.1435546875], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000595.npy"}
{"epoch": 0.8737151248164464, "step": 596, "batch_size": 64, "mean": 191.25033569335938, "std": 235.4832763671875, "min": -455.28521728515625, "p10": -56.76974258422851, "median": 168.57615661621094, "p90": 496.429638671875, "max": 873.6820678710938, "pos_frac": 0.78125, "sample": [-119.66796875, -52.570068359375, 227.64190673828125, 19.779861450195312, 249.3848419189453, 357.0940246582031, 8.142112731933594, 410.8741455078125, 383.0132751464844, 122.61170959472656, 543.9895629882812, 166.01300048828125, -36.3350830078125, 406.8226318359375, 530.953857421875, 683.8964233398438, 113.78266906738281, 102.31672668457031, 438.02764892578125, 363.08282470703125, -76.06166076660156, 456.22149658203125, 157.13255310058594, 229.55445861816406, -58.569602966308594, 542.2490844726562, -9.722616195678711, 191.76504516601562, 199.7130584716797, 38.08320617675781, -9.304763793945312, 175.67068481445312, -172.45315551757812, 532.9517822265625, 45.369049072265625, 435.1542663574219, 498.1733703613281, 364.9494934082031, 318.4088134765625, -44.74028778076172, 86.12078857421875, 100.46904754638672, 111.02157592773438, 111.85967254638672, 95.20638275146484, 334.08306884765625, 356.2254638671875, 106.00798034667969, 130.61451721191406, 192.0432586669922, 492.3609313964844, 20.85455894470215, -455.28521728515625, 171.13931274414062, 228.86627197265625, -186.80691528320312, 8.787979125976562, -7.874706268310547, 300.15582275390625, 359.3307189941406, 873.6820678710938, -14.683626174926758, 287.42144775390625, -194.97816467285156], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000596.npy"}
{"epoch": 0.8751835535976505, "step": 597, "batch_size": 64, "mean": 238.2307586669922, "std": 260.2657470703125, "min": -359.4007568359375, "p10": -28.412306404113767, "median": 196.44623565673828, "p90": 529.2373596191406, "max": 1023.667236328125, "pos_frac": 0.875, "sample": [70.32699584960938, 204.1175537109375, 306.41448974609375, 218.25718688964844, 145.79736328125, 720.5745849609375, 2.710193634033203, 523.3713989257812, 263.0416259765625, 77.18901824951172, -54.719688415527344, 135.54193115234375, 11.041648864746094, 61.928829193115234, 391.8961181640625, 121.14832305908203, 385.96783447265625, 139.8533477783203, 155.12216186523438, 711.363525390625, 132.1248779296875, 262.5242919921875, 39.348724365234375, 453.3758850097656, 149.7933807373047, 186.35916137695312, 11.71329116821289, 56.78297424316406, 94.09330749511719, -359.4007568359375, -335.594970703125, -29.977170944213867, 441.30865478515625, 126.57756042480469, 297.2283630371094, 885.28173828125, 368.63507080078125, -24.760955810546875, -86.74937438964844, 187.40606689453125, 379.3082580566406, 395.04620361328125, 105.07010650634766, 337.86376953125, 192.69223022460938, 245.98019409179688, 353.80047607421875, 426.3414306640625, 166.4472198486328, 569.3831176757812, 214.67913818359375, -167.040283203125, -129.5084228515625, 1023.667236328125, 886.510986328125, 216.42979431152344, 394.3301086425781, 156.73948669433594, 408.3221740722656, 531.7513427734375, 200.2002410888672, 353.6283874511719, 131.99069213867188, 406.11981201171875], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000597.npy"}
{"epoch": 0.8766519823788547, "step": 598, "batch_size": 64, "mean": 200.17642211914062, "std": 248.95030212402344, "min": -566.7926635742188, "p10": -90.49000701904296, "median": 203.3633270263672, "p90": 496.72184753417974, "max": 616.1846923828125, "pos_frac": 0.828125, "sample": [255.73951721191406, 421.4111633300781, 104.25440979003906, 515.6941528320312, -256.7221984863281, 359.4364318847656, 424.9161376953125, 437.98846435546875, 140.8997802734375, 509.17889404296875, -87.65487670898438, -372.9440002441406, -566.7926635742188, 95.64118957519531, 463.06195068359375, 105.2099838256836, 545.7032470703125, 311.4165344238281, 503.2670593261719, 324.8976745605469, 119.67181396484375, 421.96246337890625, 486.4066162109375, 165.46127319335938, 160.34889221191406, 336.8639221191406, 335.66925048828125, 81.933349609375, -347.40301513671875, 39.63782501220703, 175.7421875, 222.93612670898438, 181.53111267089844, -24.264503479003906, -170.4148712158203, 352.943115234375, -67.52635192871094, 464.1224670410156, 76.00453186035156, 270.20306396484375, 85.99422454833984, 107.1392593383789, -334.9672546386719, 330.7655334472656, -86.29084777832031, 535.15478515625, 187.3185272216797, 219.4081268310547, 156.57801818847656, 99.75821685791016, 240.42111206054688, 443.05584716796875, 616.1846923828125, 354.63739013671875, 338.29718017578125, 142.71949768066406, 327.05328369140625, 487.7388916015625, 350.1572265625, 147.36294555664062, 50.73490905761719, 86.77071380615234, 500.5716857910156, -91.70506286621094], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000598.npy"}
{"epoch": 0.8781204111600588, "step": 599, "batch_size": 64, "mean": 207.14028930664062, "std": 180.92636108398438, "min": -224.0894775390625, "p10": -34.2985542297363, "median": 209.30389404296875, "p90": 436.21970825195314, "max": 582.4466552734375, "pos_frac": 0.875, "sample": [196.1398468017578, 204.91905212402344, 213.68873596191406, 120.20374298095703, 24.12726593017578, 375.7935791015625, 399.687255859375, 169.21255493164062, 243.52484130859375, 318.4588623046875, 495.35986328125, 80.52606964111328, 7.613800048828125, 218.75192260742188, -3.6660099029541016, 167.17979431152344, 332.1741638183594, 262.862060546875, 197.10995483398438, 482.33184814453125, 72.11212158203125, 381.3865661621094, 345.2375183105469, 222.96121215820312, 277.3179931640625, -51.10016632080078, 186.7918701171875, 324.60089111328125, 215.3542022705078, -224.0894775390625, 485.5716552734375, -47.426788330078125, 402.9749755859375, 397.78387451171875, 433.96380615234375, 437.1865234375, 419.07598876953125, 8.183624267578125, 582.4466552734375, 102.95230865478516, 223.3953857421875, 30.203704833984375, 485.08563232421875, 149.69789123535156, 232.92991638183594, 128.13604736328125, 363.0931396484375, -79.51509857177734, 194.57069396972656, 164.6654052734375, 274.343994140625, 41.94493103027344, -99.89415740966797, 5.866584777832031, -120.25094604492188, 150.91116333007812, 320.3272705078125, 146.3355712890625, 222.61488342285156, 528.88427734375, 9.4833984375, -133.414306640625, 203.3332061767578, 334.9452209472656], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000599.npy"}
{"epoch": 0.8795888399412628, "step": 600, "batch_size": 64, "mean": 208.917236328125, "std": 231.66775512695312, "min": -303.1612548828125, "p10": -66.55485610961907, "median": 175.49181365966797, "p90": 480.3594512939453, "max": 713.9129028320312, "pos_frac": 0.890625, "sample": [54.442176818847656, 114.14439392089844, 348.3253173828125, -213.99063110351562, 463.6573791503906, 292.91168212890625, 110.52813720703125, 210.5770721435547, 31.14652442932129, 432.9449157714844, 94.25780487060547, 241.32968139648438, 482.55682373046875, 59.484046936035156, 230.2099609375, 567.5791015625, -95.89289093017578, 713.9129028320312, 62.247276306152344, 300.6357421875, 433.9385070800781, 479.6552429199219, 79.32652282714844, 167.05886840820312, 64.36097717285156, 46.83283996582031, 354.10321044921875, 692.0631103515625, 619.4593505859375, 223.926513671875, 77.29991149902344, 233.20742797851562, 477.6319580078125, 214.82383728027344, 480.6612548828125, 268.1202697753906, -182.3705596923828, 156.29945373535156, 29.05170249938965, 445.46002197265625, 477.4237976074219, 166.2805633544922, 8.610061645507812, 134.62249755859375, 700.0077514648438, 89.99474334716797, 475.95477294921875, 149.11915588378906, -165.39195251464844, 374.2217102050781, 183.9247589111328, 304.0289611816406, 317.12200927734375, 76.98094940185547, 139.56871032714844, 431.4040832519531, -205.43252563476562, 37.539703369140625, 22.123769760131836, -132.515869140625, -303.1612548828125, 1.9005584716796875, 38.81999969482422, 185.6379852294922], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000600.npy"}
{"epoch": 0.8810572687224669, "step": 601, "batch_size": 64, "mean": 223.1067352294922, "std": 235.035888671875, "min": -307.1407165527344, "p10": -43.82358703613281, "median": 219.93791961669922, "p90": 472.4852172851564, "max": 925.8175048828125, "pos_frac": 0.8125, "sample": [190.18624877929688, 269.212890625, 925.8175048828125, 487.1843566894531, -36.88496780395508, 540.4234619140625, 235.4539794921875, 230.9676513671875, -5.898765563964844, 372.937744140625, -21.132530212402344, 1.8066558837890625, 493.1044921875, 260.76165771484375, 192.18565368652344, 295.96820068359375, 106.22980499267578, 272.6763916015625, -45.839866638183594, 777.7119750976562, 351.71038818359375, 405.75201416015625, -10.82891845703125, 172.9799346923828, 157.12478637695312, 208.90818786621094, 25.160383224487305, 186.7003173828125, 383.3691101074219, 127.68413543701172, -39.118934631347656, 136.15196228027344, -198.53408813476562, 352.4093017578125, -71.69223022460938, 395.15069580078125, 57.12854766845703, 335.55950927734375, 28.146881103515625, 331.91754150390625, 121.50220489501953, 385.2684020996094, 93.94579315185547, -307.1407165527344, 208.0912322998047, -260.790771484375, 17.418960571289062, 651.8557739257812, 424.674560546875, -98.20906829833984, 278.7032775878906, 377.09393310546875, 99.81004333496094, 232.70870971679688, -103.00114440917969, 638.878173828125, 85.64070892333984, 438.1872253417969, 410.3708801269531, 379.11566162109375, 425.7698059082031, 154.47470092773438, 385.45263671875, 360.45855712890625], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000601.npy"}
{"epoch": 0.882525697503671, "step": 602, "batch_size": 64, "mean": 261.26611328125, "std": 233.43678283691406, "min": -257.7354736328125, "p10": -5.284760856628415, "median": 208.60073852539062, "p90": 551.0365051269532, "max": 880.5977172851562, "pos_frac": 0.875, "sample": [127.91548919677734, 121.94475555419922, -46.11085891723633, 55.35356140136719, 168.67950439453125, 156.1075439453125, 606.60546875, 354.96173095703125, 174.6211700439453, 153.9704132080078, 300.66717529296875, 340.5567626953125, 189.82286071777344, 285.2049255371094, 644.585205078125, 81.37346649169922, 559.8600463867188, 289.97802734375, 322.52044677734375, 471.90032958984375, 30.217300415039062, 146.99859619140625, -18.19073486328125, -212.18850708007812, 145.3362579345703, 247.245361328125, 422.18084716796875, 186.73548889160156, -257.7354736328125, -6.49504280090332, 19.328758239746094, 409.6723327636719, 436.8020935058594, 70.21278381347656, 407.8396301269531, 427.896484375, 172.71994018554688, 297.6241455078125, 95.50469970703125, -116.88614654541016, 610.4998779296875, 382.1750793457031, 444.33642578125, 203.483642578125, 480.20343017578125, 434.50860595703125, 880.5977172851562, -2.4607696533203125, 530.4482421875, -92.45729064941406, 525.1146240234375, 462.2221984863281, 200.3406982421875, 213.71783447265625, 776.4061279296875, 117.75701904296875, 59.40918731689453, 317.406494140625, 138.61444091796875, 146.64324951171875, 494.27618408203125, 747.067138671875, 257.7204284667969, 127.66465759277344], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000602.npy"}
{"epoch": 0.8839941262848752, "step": 603, "batch_size": 64, "mean": 227.3087158203125, "std": 261.8252868652344, "min": -242.31150817871094, "p10": -50.280530548095676, "median": 181.29385375976562, "p90": 550.6516235351563, "max": 1202.8131103515625, "pos_frac": 0.84375, "sample": [32.7265625, -109.94541931152344, 295.5345458984375, 21.70015525817871, 646.1568603515625, 11.849212646484375, 441.0555725097656, 175.6377716064453, 159.9485321044922, 503.0255126953125, 131.92388916015625, 239.2578125, -242.31150817871094, 394.896240234375, 38.891448974609375, 184.83694458007812, 27.4056396484375, 586.08447265625, 369.70343017578125, 247.878662109375, 519.1049194335938, 376.9937744140625, 231.11111450195312, 44.50520706176758, 35.70820617675781, 201.37966918945312, 338.34844970703125, 177.75076293945312, 209.64407348632812, 159.27069091796875, 673.6353149414062, -217.03285217285156, 459.30926513671875, 43.395599365234375, 26.16899871826172, 350.0604248046875, 482.99163818359375, -61.69695281982422, 123.33074951171875, 209.06491088867188, 371.442138671875, 120.32965850830078, -96.23323059082031, 13.87447738647461, -21.82097625732422, 215.33212280273438, 129.5285186767578, 113.12419128417969, -23.6422119140625, 165.23162841796875, 27.669260025024414, 545.2479858398438, 469.18865966796875, -88.9617919921875, 278.3177185058594, -142.9337158203125, 1202.8131103515625, 233.11236572265625, 711.0503540039062, -12.61871337890625, 699.7227783203125, 502.7572326660156, 552.9674682617188, 42.958683013916016], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000603.npy"}
{"epoch": 0.8854625550660793, "step": 604, "batch_size": 64, "mean": 253.4334259033203, "std": 240.3216552734375, "min": -519.9703369140625, "p10": -24.338928222656236, "median": 259.52906799316406, "p90": 560.3465209960939, "max": 872.4570922851562, "pos_frac": 0.875, "sample": [-38.41423797607422, 194.567138671875, 12.301887512207031, -9.766098022460938, 415.41455078125, 345.7173767089844, 480.4569091796875, 427.3544616699219, 193.4519805908203, 308.66265869140625, -230.51242065429688, 581.6875610351562, 344.698974609375, 1.3109970092773438, 528.2984619140625, 318.87286376953125, 236.4008026123047, 11.686038970947266, 203.28765869140625, 462.71148681640625, 193.47744750976562, -214.73260498046875, 126.64900207519531, 374.39154052734375, 48.376182556152344, -519.9703369140625, 613.0933227539062, -30.584426879882812, 389.6373596191406, 3.22515869140625, -37.361473083496094, 205.1343231201172, 229.0088348388672, 401.8482971191406, 192.82595825195312, 569.380615234375, 281.87237548828125, 40.312110900878906, 212.35865783691406, 261.1997375488281, 433.9276428222656, 374.5396728515625, 303.9179992675781, 59.42418670654297, 477.6368713378906, 659.0974731445312, 222.2646942138672, 872.4570922851562, 306.0359191894531, 213.80921936035156, 378.73858642578125, 539.2669677734375, 365.4306640625, 191.1316680908203, 589.9613037109375, 186.69203186035156, 67.53673553466797, 35.3271484375, 259.7593078613281, 306.85369873046875, -44.93505859375, 319.6968994140625, 713.539306640625, 259.298828125], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000604.npy"}
{"epoch": 0.8869309838472834, "step": 605, "batch_size": 64, "mean": 221.38226318359375, "std": 243.63247680664062, "min": -321.0923767089844, "p10": -44.11511306762692, "median": 195.45178985595703, "p90": 471.49413146972677, "max": 1279.77978515625, "pos_frac": 0.875, "sample": [101.42555236816406, 12.218269348144531, 7.916748046875, 8.344629287719727, 387.490234375, 110.64826965332031, 172.5480194091797, 383.72674560546875, 419.192138671875, 121.50213623046875, 390.24749755859375, 14.753082275390625, 25.394441604614258, -16.83409881591797, 305.6689758300781, 362.8345947265625, 156.46461486816406, 572.9667358398438, 681.642333984375, 25.353225708007812, 1279.77978515625, 254.90853881835938, 137.35333251953125, 272.50714111328125, 210.1001434326172, 129.722412109375, 43.68511962890625, 200.78311157226562, -111.81645202636719, 171.95114135742188, 131.055419921875, -115.05912780761719, 344.9735412597656, -55.806976318359375, -100.4437255859375, 230.76010131835938, 608.4066162109375, 71.61124420166016, 220.16964721679688, 493.9092712402344, 127.90077209472656, 787.490966796875, -123.93058776855469, -321.0923767089844, 390.9319763183594, 236.65553283691406, -69.11328125, 393.8724365234375, 167.6644287109375, 190.12046813964844, 129.42465209960938, 228.59698486328125, 310.2740783691406, 299.2193908691406, 80.33811950683594, 312.29595947265625, 281.5472106933594, 322.10528564453125, 299.8125915527344, 101.99214172363281, 361.4373779296875, 298.3329772949219, 152.04913330078125, 548.4835205078125], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000605.npy"}
{"epoch": 0.8883994126284875, "step": 606, "batch_size": 64, "mean": 242.02098083496094, "std": 229.6163787841797, "min": -178.92041015625, "p10": -20.348928260803223, "median": 222.68450927734375, "p90": 537.2292938232423, "max": 1082.462890625, "pos_frac": 0.875, "sample": [-80.06922912597656, -26.659515380859375, 368.4141540527344, -101.97561645507812, 270.2193908691406, 405.9296875, 10.768314361572266, 66.3712158203125, 479.51904296875, 554.411376953125, 100.04551696777344, 1082.462890625, 362.16461181640625, 3.392578125, 36.310699462890625, 95.53790283203125, 151.74368286132812, 92.63166046142578, 326.4110412597656, 584.025146484375, 334.8533020019531, -145.8800811767578, 550.5355224609375, 7.413604736328125, 596.9420166015625, 408.5032958984375, 154.95538330078125, 179.0563507080078, -20.658409118652344, -74.28968811035156, 206.6615753173828, 314.6686706542969, 59.19482421875, 142.93991088867188, -178.92041015625, 68.29179382324219, 237.41848754882812, 215.730224609375, -19.626806259155273, 28.062103271484375, 180.38450622558594, 246.82415771484375, 656.2012329101562, 416.3614501953125, 228.91513061523438, 371.9149169921875, 176.63568115234375, 390.1832275390625, 432.0421142578125, 196.352783203125, 93.58998107910156, 506.1814270019531, 8.092697143554688, 249.1366424560547, 294.6561279296875, 701.7581787109375, 262.35247802734375, 347.82855224609375, 127.95299530029297, 329.00701904296875, 464.1231689453125, 340.2942810058594, 404.59368896484375, 216.45388793945312], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000606.npy"}
{"epoch": 0.8898678414096917, "step": 607, "batch_size": 64, "mean": 191.96990966796875, "std": 194.28125, "min": -236.27529907226562, "p10": -75.668009185791, "median": 182.54280853271484, "p90": 438.82196044921875, "max": 671.0318603515625, "pos_frac": 0.828125, "sample": [240.63841247558594, 271.9890441894531, 238.9331817626953, 115.09251403808594, 105.17241668701172, 90.51741027832031, 439.9215087890625, 436.25634765625, 6.738861083984375, 543.0191040039062, -167.37554931640625, 226.11599731445312, 288.7591857910156, -17.742286682128906, 131.85006713867188, 671.0318603515625, -2.092540740966797, 119.65459442138672, 176.55413818359375, 95.06574249267578, 82.96755981445312, 148.1306915283203, 139.66326904296875, 224.4436798095703, -66.717529296875, 341.44000244140625, -115.29315948486328, 383.014892578125, 273.4197082519531, 250.84034729003906, 290.1991882324219, 20.772682189941406, 541.2417602539062, 133.395263671875, 344.89227294921875, 318.2440185546875, 312.58612060546875, 26.9290771484375, 488.0205078125, 174.99514770507812, -236.27529907226562, 399.4006652832031, -141.08889770507812, 501.49993896484375, 285.35986328125, 64.35400390625, -79.5039291381836, 187.3037109375, -118.2203598022461, 239.29412841796875, -7.014007568359375, 222.73019409179688, 48.397674560546875, 238.06930541992188, 120.70027923583984, 32.98921203613281, 356.1978759765625, 557.0653686523438, 317.15740966796875, 417.34844970703125, 155.46981811523438, -96.2472915649414, 320.01806640625, 177.7819061279297], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000607.npy"}
{"epoch": 0.8913362701908958, "step": 608, "batch_size": 64, "mean": 217.48269653320312, "std": 190.767822265625, "min": -165.04437255859375, "p10": -15.723357582092282, "median": 198.81853485107422, "p90": 450.10286254882817, "max": 822.4307861328125, "pos_frac": 0.84375, "sample": [452.1696472167969, 300.686767578125, 101.26016998291016, 191.95632934570312, 53.582115173339844, 510.40582275390625, 147.75833129882812, 75.92167663574219, 211.4756317138672, 308.6005554199219, 398.31768798828125, -72.36372375488281, 190.65231323242188, 324.76953125, -41.72389221191406, 116.37680053710938, 26.580108642578125, 319.228271484375, -12.119428634643555, 293.5796813964844, 419.9463195800781, 174.34681701660156, 53.97621154785156, 249.82638549804688, -165.04437255859375, 188.12306213378906, 175.05294799804688, 263.61492919921875, 415.2204895019531, -55.32771301269531, 91.536376953125, -1.436004638671875, 123.12257385253906, 217.24891662597656, 205.6807403564453, 259.43914794921875, 164.55210876464844, -17.267898559570312, 414.08489990234375, 8.749885559082031, 822.4307861328125, 295.938720703125, 498.501220703125, 123.65423583984375, 367.0521240234375, 229.05487060546875, 14.669540405273438, 347.8396301269531, 344.7152404785156, 183.3705291748047, 501.2475280761719, -52.47936248779297, 252.2886962890625, 445.2803649902344, -124.43125915527344, 551.5952758789062, -1.9743461608886719, 109.56655883789062, 499.72564697265625, 366.06036376953125, 412.1321716308594, 404.41949462890625, 113.38922119140625, 132.28582763671875], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000608.npy"}
{"epoch": 0.8928046989720999, "step": 609, "batch_size": 64, "mean": 245.67294311523438, "std": 212.7964630126953, "min": -443.32891845703125, "p10": 28.535101699829102, "median": 244.8712158203125, "p90": 485.5887207031251, "max": 878.2406616210938, "pos_frac": 0.921875, "sample": [167.93902587890625, 160.112548828125, 57.9608154296875, -53.90403747558594, 265.44793701171875, 403.16119384765625, 498.2792663574219, 315.29132080078125, 299.7615966796875, 27.71160888671875, 411.78265380859375, 423.6111145019531, 40.70732116699219, 306.3147277832031, 388.5010070800781, -443.32891845703125, -28.618499755859375, 262.0967102050781, 223.62356567382812, 146.30703735351562, 225.69534301757812, 266.9219970703125, 128.55349731445312, 455.9774475097656, 291.85235595703125, 348.5361022949219, 534.0733032226562, 106.71560668945312, 74.04513549804688, 336.1444091796875, 214.97760009765625, 300.55047607421875, 621.5354614257812, 354.120361328125, 865.0776977539062, 578.32861328125, -70.98944854736328, 303.05010986328125, 133.755859375, 317.7325134277344, 231.84130859375, 262.35888671875, 257.901123046875, 878.2406616210938, 280.9750671386719, 206.57257080078125, 185.72128295898438, 30.456584930419922, -63.10063171386719, 88.376708984375, 353.59796142578125, 119.36251831054688, 279.840087890625, 176.88624572753906, 23.599021911621094, 175.541015625, 147.05740356445312, 113.79345703125, 396.9045104980469, 667.906494140625, 113.65740203857422, 282.92181396484375, 192.126220703125, 61.11845779418945], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000609.npy"}
{"epoch": 0.8942731277533039, "step": 610, "batch_size": 64, "mean": 205.66021728515625, "std": 210.1658477783203, "min": -335.8026428222656, "p10": -23.096039581298818, "median": 218.9088592529297, "p90": 494.91554260253906, "max": 634.3192749023438, "pos_frac": 0.859375, "sample": [267.6317138671875, 224.96054077148438, 250.7841339111328, 236.4841766357422, -59.820037841796875, 297.457275390625, 34.6622314453125, 209.00987243652344, 116.76105499267578, 98.68597412109375, -27.53026580810547, 219.11700439453125, 0.5323257446289062, 494.77508544921875, 494.9757385253906, 331.7218017578125, 161.85348510742188, 113.76049041748047, -139.73141479492188, -50.11576843261719, 286.027587890625, 501.1017150878906, 215.36318969726562, 542.3207397460938, -12.74951171875, 249.22763061523438, 56.972042083740234, 110.67942810058594, 634.3192749023438, 524.13916015625, 626.8531494140625, 8.930801391601562, 47.028594970703125, 445.886962890625, 442.92144775390625, 283.71759033203125, 20.298864364624023, 347.16107177734375, 170.086181640625, 238.00572204589844, 345.806396484375, 191.6970672607422, 7.824268341064453, 75.43412780761719, 253.74484252929688, 97.29373931884766, 477.21466064453125, -0.9243202209472656, -162.86874389648438, 285.13580322265625, 24.220291137695312, 426.02764892578125, 112.48222351074219, 59.61052322387695, 332.53375244140625, 284.8615417480469, 291.36700439453125, 218.70071411132812, 122.09615325927734, 223.4271697998047, -250.84463500976562, -335.8026428222656, 460.05279541015625, 608.8969116210938], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000610.npy"}
{"epoch": 0.895741556534508, "step": 611, "batch_size": 64, "mean": 226.42507934570312, "std": 236.96571350097656, "min": -224.52532958984375, "p10": -67.55078811645508, "median": 204.2535858154297, "p90": 537.3956420898439, "max": 741.9296264648438, "pos_frac": 0.84375, "sample": [313.1368103027344, 191.2720489501953, 552.39794921875, 468.1878356933594, 393.48980712890625, 296.9200439453125, 19.73204803466797, 354.9599609375, 104.00740814208984, 0.26511383056640625, 264.78753662109375, 570.6270141601562, -68.84016418457031, 41.39955520629883, -181.27734375, -64.54224395751953, -75.87196350097656, -172.28773498535156, 205.44699096679688, 473.7174987792969, -224.52532958984375, 458.1535949707031, 503.2752685546875, -213.03585815429688, 464.001708984375, 341.616455078125, 192.74008178710938, 614.9790649414062, 290.8566589355469, 262.6732177734375, 109.89501953125, 470.1289367675781, 266.06787109375, 19.879602432250977, 143.76239013671875, 246.561767578125, 64.43876647949219, 111.17703247070312, -18.250030517578125, 188.72271728515625, 456.072265625, 184.02655029296875, 83.85847473144531, 741.9296264648438, 230.01097106933594, 247.1343994140625, 35.310585021972656, 187.84732055664062, 506.18634033203125, 39.701438903808594, 301.9001159667969, -153.08885192871094, 203.0601806640625, 267.3157958984375, 127.01580810546875, 35.14813232421875, 409.8710632324219, 550.7710571289062, 161.27244567871094, 493.27813720703125, 724.1764526367188, 6.081062316894531, 691.7406616210938, -20.06204605102539], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000611.npy"}
{"epoch": 0.8972099853157122, "step": 612, "batch_size": 64, "mean": 247.3007049560547, "std": 186.9214324951172, "min": -102.26809692382812, "p10": 4.413669204711917, "median": 259.97393798828125, "p90": 485.3924499511719, "max": 654.2874755859375, "pos_frac": 0.921875, "sample": [454.29205322265625, 323.3149108886719, 317.8625183105469, -102.26809692382812, 3.2243614196777344, 177.72579956054688, 496.8555908203125, 318.9584045410156, 123.01127624511719, -17.039350509643555, -96.30715942382812, 520.899658203125, 99.52926635742188, 654.2874755859375, 325.36016845703125, 463.67352294921875, 418.2840576171875, 219.62355041503906, 278.729248046875, 402.3434753417969, 220.54544067382812, 75.00846099853516, -39.56867218017578, 390.13616943359375, 562.4125366210938, 54.0518798828125, 316.2278137207031, 16.72119140625, 262.71905517578125, 479.1060791015625, 388.8643493652344, 122.89193725585938, 107.01576232910156, 1.7803726196289062, 410.30712890625, 135.1050262451172, 351.0276184082031, 339.3462219238281, 135.33538818359375, 71.338623046875, 216.44163513183594, 257.22882080078125, 36.5472412109375, 236.03292846679688, 474.14447021484375, 105.82965087890625, 468.8977355957031, 309.8704833984375, 623.1648559570312, 143.6248779296875, 76.75453186035156, 376.97442626953125, 559.5145874023438, 125.5365219116211, -40.73509216308594, 149.71864318847656, 267.79583740234375, 74.12452697753906, 303.34332275390625, 120.19746398925781, 488.08660888671875, 291.31219482421875, 372.9171447753906, 7.188720703125], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000612.npy"}
{"epoch": 0.8986784140969163, "step": 613, "batch_size": 64, "mean": 233.4252471923828, "std": 189.06695556640625, "min": -108.36707305908203, "p10": 23.106005096435556, "median": 204.65906524658203, "p90": 492.9423431396486, "max": 665.6791381835938, "pos_frac": 0.9375, "sample": [344.48443603515625, 252.63238525390625, 283.899658203125, -11.494243621826172, 306.51202392578125, 396.203857421875, 72.79145812988281, 406.0812072753906, 376.8926086425781, 98.91504669189453, 386.0486145019531, 66.4817123413086, 351.04901123046875, 33.148155212402344, 665.6791381835938, 229.43455505371094, 37.85417175292969, 443.55560302734375, 245.5683135986328, 50.02659606933594, 346.42572021484375, 133.79861450195312, 229.96958923339844, 462.78302001953125, 451.4813232421875, 132.9976043701172, 572.7899780273438, 563.3553466796875, 31.68890380859375, 261.18170166015625, 204.69410705566406, 157.33172607421875, 69.55313873291016, 408.10821533203125, 87.84501647949219, 589.5242919921875, 150.80018615722656, 257.21484375, 635.1290283203125, 0.09441375732421875, 93.89347076416016, 505.8677673339844, 449.708984375, 116.58348846435547, -55.62469482421875, 108.68020629882812, 89.17915344238281, 647.5762939453125, -4.688053131103516, 72.3165283203125, 113.66966247558594, 256.8980712890625, -108.36707305908203, 204.6240234375, 351.41595458984375, 297.9334716796875, 176.58651733398438, 104.59933471679688, 11.371620178222656, 167.00900268554688, 61.34532165527344, 146.74810791015625, 329.9304504394531, 19.42761993408203], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000613.npy"}
{"epoch": 0.9001468428781204, "step": 614, "batch_size": 64, "mean": 247.637939453125, "std": 225.8634490966797, "min": -231.9263916015625, "p10": -38.31702499389648, "median": 263.68170166015625, "p90": 554.659100341797, "max": 725.2648315429688, "pos_frac": 0.859375, "sample": [170.41799926757812, 497.29571533203125, -88.96475219726562, 338.6952819824219, 79.74659729003906, 421.6809387207031, 49.524600982666016, 252.44186401367188, 102.28843688964844, 45.64297866821289, 349.13470458984375, 29.083141326904297, 264.7208557128906, 288.1590576171875, 276.19293212890625, -97.08124542236328, 481.521240234375, 479.5084228515625, 240.7635040283203, -39.58095169067383, 538.5775146484375, 6.272552490234375, 215.73353576660156, 349.1610107421875, -6.3221435546875, 3.9804458618164062, 316.999755859375, 137.91513061523438, 347.7821960449219, -39.73701477050781, 283.76190185546875, 561.5512084960938, -35.367862701416016, 450.78265380859375, -118.99618530273438, 85.87230682373047, 466.4615478515625, 18.605297088623047, 203.66090393066406, 321.91326904296875, 503.5085144042969, 40.87465286254883, 423.32476806640625, 278.4251403808594, 698.7843627929688, 591.4827880859375, 262.6425476074219, 166.626953125, 303.30621337890625, 369.1177673339844, 63.556419372558594, 96.0595932006836, 519.6236572265625, 86.78237915039062, 605.899169921875, 725.2648315429688, 585.6542358398438, 98.24835205078125, -68.47936248779297, -231.9263916015625, 603.7400512695312, 81.00482177734375, 367.81732177734375, 427.6909484863281], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000614.npy"}
{"epoch": 0.9016152716593245, "step": 615, "batch_size": 64, "mean": 194.0587158203125, "std": 200.88223266601562, "min": -284.09344482421875, "p10": 6.739392852783204, "median": 170.21511840820312, "p90": 424.8209411621094, "max": 871.9874877929688, "pos_frac": 0.90625, "sample": [28.064544677734375, 547.6878051757812, 179.50872802734375, 358.8836364746094, 99.37317657470703, 349.0518493652344, 35.24571228027344, 157.95257568359375, 128.79327392578125, 52.59515380859375, 321.9565734863281, 6.6216583251953125, 234.13270568847656, 69.49359130859375, 448.5484619140625, -23.657440185546875, 177.9864501953125, 7.014106750488281, 245.18846130371094, 208.39566040039062, 204.915771484375, 33.321258544921875, 871.9874877929688, 61.23296356201172, 46.058074951171875, 300.3904724121094, 61.793739318847656, 417.44757080078125, 287.3327941894531, 94.154296875, 198.60018920898438, 194.71580505371094, 210.373291015625, 148.66001892089844, 126.45001220703125, -17.448753356933594, 275.6927490234375, 165.49978637695312, 11.268569946289062, 92.23180389404297, -284.09344482421875, 427.98095703125, 251.69830322265625, 9.273422241210938, 376.4540100097656, 325.47857666015625, 232.29110717773438, 556.7521362304688, 69.8170166015625, 322.20745849609375, 347.8437805175781, -89.41917419433594, 135.3097686767578, -6.175750732421875, 129.91384887695312, 33.261024475097656, 111.73344421386719, 224.479248046875, 174.93045043945312, 618.9353637695312, -59.71919250488281, 815.3489379882812, 227.9159393310547, 50.02592468261719], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000615.npy"}
{"epoch": 0.9030837004405287, "step": 616, "batch_size": 64, "mean": 161.2052764892578, "std": 214.3150177001953, "min": -218.40899658203125, "p10": -139.23441162109376, "median": 138.19581604003906, "p90": 432.3230072021485, "max": 900.3212890625, "pos_frac": 0.796875, "sample": [446.8104248046875, -12.634002685546875, 158.67807006835938, 394.6714782714844, 279.1189270019531, 54.96735382080078, 64.177978515625, -31.913604736328125, 157.44879150390625, 521.04248046875, 540.2125854492188, 338.1056823730469, 139.30764770507812, 130.97727966308594, 157.7248992919922, 38.41541290283203, 130.83523559570312, 98.99526977539062, 137.083984375, -40.05187225341797, 524.44189453125, 409.5495300292969, 13.213470458984375, 75.89505767822266, -209.17355346679688, -218.40899658203125, 225.22540283203125, 125.78411865234375, 136.57498168945312, 398.7384338378906, 900.3212890625, 78.0816421508789, -188.49346923828125, 174.45388793945312, 50.12778854370117, -142.4347381591797, -4.698022842407227, 184.8524169921875, 185.922607421875, 262.325439453125, 166.4049530029297, 305.05010986328125, 101.67787170410156, 230.220703125, -195.66146850585938, 350.5204772949219, 148.2654571533203, -76.87910461425781, 34.86298370361328, 248.0332489013672, 103.66371154785156, -144.96609497070312, 442.08306884765625, 14.835739135742188, 288.33148193359375, 136.62826538085938, 409.2744140625, 220.80458068847656, 47.456298828125, 258.297119140625, 403.2759704589844, -131.76698303222656, 473.6411437988281, -203.18930053710938], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000616.npy"}
{"epoch": 0.9045521292217328, "step": 617, "batch_size": 64, "mean": 172.3850860595703, "std": 193.36524963378906, "min": -250.9232635498047, "p10": -22.727098846435542, "median": 157.52295684814453, "p90": 399.6804809570313, "max": 701.260009765625, "pos_frac": 0.8125, "sample": [95.82124328613281, 138.47662353515625, -4.5418701171875, 328.2730407714844, -78.32279968261719, -42.29003143310547, 120.01998138427734, -186.3593292236328, 384.9988098144531, 172.4376983642578, 30.931312561035156, 189.38967895507812, 276.8404846191406, 381.3851013183594, 131.58331298828125, 162.97869873046875, 46.68646240234375, 351.1357421875, 54.71208190917969, 596.0188598632812, 319.8221435546875, 158.13555908203125, 491.192138671875, 96.36595916748047, 187.46910095214844, 216.41485595703125, 98.48567962646484, 106.72734069824219, 299.2970886230469, 248.96726989746094, 9.148653030395508, 271.8033142089844, -18.601455688476562, 83.4928970336914, 332.63250732421875, 315.1297607421875, 344.1951599121094, 306.93548583984375, 305.4598083496094, 202.33016967773438, 203.3206024169922, -24.49523162841797, 435.2365417480469, 231.51681518554688, 692.1533203125, 439.242919921875, 98.89881896972656, -250.9232635498047, 150.23974609375, 156.9103546142578, -183.88694763183594, 47.15702819824219, -15.472602844238281, 180.10321044921875, 35.943138122558594, -7.360294342041016, 405.9726257324219, 24.25726318359375, 3.156564712524414, 53.13671875, 701.260009765625, 228.13800048828125, -11.563888549804688, -85.87442016601562], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000617.npy"}
{"epoch": 0.9060205580029369, "step": 618, "batch_size": 64, "mean": 239.53053283691406, "std": 237.4259490966797, "min": -250.33055114746094, "p10": -77.98831481933591, "median": 224.83568572998047, "p90": 521.1746154785156, "max": 755.9353637695312, "pos_frac": 0.84375, "sample": [-228.58010864257812, 332.2350158691406, 160.626220703125, 28.703285217285156, 196.79336547851562, 146.6611785888672, 237.92678833007812, 45.98166275024414, 648.0052490234375, 222.74118041992188, 291.8544921875, -89.44568634033203, 277.3305358886719, 226.93019104003906, 378.6141662597656, 206.53646850585938, -197.27386474609375, 596.8763427734375, 87.04955291748047, -131.39541625976562, -49.175010681152344, 104.7393798828125, 24.49901580810547, 52.28971862792969, 24.38897705078125, -99.36534118652344, 190.85348510742188, 442.8832092285156, 496.83642578125, 186.16082763671875, 46.41493225097656, 28.9608097076416, 391.594970703125, 496.46612548828125, 368.73858642578125, -250.33055114746094, -138.75259399414062, 419.71014404296875, 229.32615661621094, 277.6329345703125, 236.44044494628906, 414.64178466796875, 521.435302734375, 387.23797607421875, 520.5663452148438, -8.508613586425781, 325.47381591796875, 347.70086669921875, 208.0222930908203, 629.613525390625, 474.0754089355469, -51.25444793701172, 736.8314819335938, 474.12060546875, 481.91510009765625, 200.6240997314453, 159.09510803222656, 388.3678894042969, 755.9353637695312, 77.68283081054688, 164.32550048828125, 525.8577270507812, 519.0875854492188, 158.6235809326172], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000618.npy"}
{"epoch": 0.9074889867841409, "step": 619, "batch_size": 64, "mean": 192.30813598632812, "std": 266.01763916015625, "min": -360.1136779785156, "p10": -102.31364669799804, "median": 169.85107421875, "p90": 563.4929382324219, "max": 857.7314453125, "pos_frac": 0.734375, "sample": [-256.06640625, -106.89192199707031, 230.694580078125, 253.524169921875, 306.45721435546875, -98.7979507446289, 144.8050994873047, -87.7546157836914, 392.3511962890625, 27.604772567749023, 308.52496337890625, 445.6156005859375, 77.73124694824219, 857.7314453125, 28.766006469726562, 235.7705078125, 531.9654541015625, 179.0548095703125, 585.7510375976562, 550.327392578125, -207.98565673828125, -9.21867561340332, 545.2322387695312, 529.0393676757812, -92.79994201660156, -103.82037353515625, -113.2088851928711, 517.5568237304688, 254.54434204101562, -76.3641357421875, -95.04959106445312, 198.74551391601562, 74.69329071044922, -32.49725341796875, 354.7035217285156, -19.096282958984375, 100.14801025390625, -78.14018249511719, 309.89300537109375, 163.01705932617188, 169.455810546875, 627.120361328125, 314.0504150390625, 562.99609375, 267.80206298828125, 59.27613830566406, -153.30419921875, 644.0796508789062, 590.1951904296875, 170.246337890625, 100.46564483642578, 21.57595443725586, 248.44308471679688, 749.9087524414062, -360.1136779785156, 34.5823974609375, 367.761474609375, -62.729122161865234, 563.7058715820312, 81.10226440429688, 212.09890747070312, 61.67742919921875, 1.530477523803711, 209.2366943359375], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000619.npy"}
{"epoch": 0.908957415565345, "step": 620, "batch_size": 64, "mean": 206.92474365234375, "std": 225.23251342773438, "min": -278.9809265136719, "p10": -26.267803955078122, "median": 200.53948211669922, "p90": 515.4207885742187, "max": 856.6732177734375, "pos_frac": 0.8125, "sample": [257.3870544433594, -249.74884033203125, 297.2980041503906, 260.0834045410156, 135.9801025390625, 552.505859375, 297.634033203125, 316.59576416015625, -11.494636535644531, 563.6600952148438, 21.592018127441406, -24.20464324951172, -52.246726989746094, 520.7278442382812, 357.98291015625, 116.26087951660156, 509.2170715332031, 140.2155303955078, 251.57174682617188, 485.210205078125, 408.61651611328125, 123.86788940429688, 130.80618286132812, 356.72991943359375, -141.236572265625, 207.32147216796875, 274.96563720703125, 93.28179931640625, 259.2865905761719, 21.330368041992188, -278.9809265136719, 513.0621337890625, 279.271240234375, 744.7492065429688, 36.311279296875, 219.7157440185547, -27.152015686035156, 193.7574920654297, 233.51702880859375, 217.56207275390625, -14.640359878540039, 336.60394287109375, -4.227909088134766, 3.3339996337890625, 742.315185546875, 184.11395263671875, 516.431640625, -102.85404968261719, 139.23550415039062, 261.3789978027344, -43.448665618896484, 36.97544860839844, 856.6732177734375, 155.01657104492188, 97.79771423339844, 64.30490112304688, 263.4292907714844, 120.60746002197266, 63.76008605957031, 303.13812255859375, 150.73011779785156, 287.9438781738281, -9.115936279296875, 220.66954040527344], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000620.npy"}
{"epoch": 0.9104258443465492, "step": 621, "batch_size": 64, "mean": 251.25936889648438, "std": 225.021484375, "min": -332.58184814453125, "p10": -2.86771965026853, "median": 218.01748657226562, "p90": 547.3716552734376, "max": 754.1367797851562, "pos_frac": 0.890625, "sample": [365.118896484375, -226.09408569335938, -332.58184814453125, -34.82478332519531, 261.62774658203125, 75.11417388916016, 240.64321899414062, 242.17579650878906, 151.31011962890625, 471.59771728515625, 383.8714599609375, 105.55570983886719, 188.88710021972656, 384.829345703125, 145.751953125, 39.840003967285156, -13.380285263061523, 293.7019958496094, -43.928794860839844, 193.06219482421875, 742.9176025390625, 313.1837158203125, 536.2061767578125, 259.3974609375, 527.343017578125, 739.8204345703125, 132.09780883789062, 438.5188903808594, 451.57049560546875, 98.14898681640625, 21.661598205566406, 401.46795654296875, 466.87115478515625, 160.39849853515625, 224.3138427734375, 575.849853515625, 80.62899017333984, 420.68701171875, 200.10211181640625, 169.54843139648438, 38.10359191894531, 552.1568603515625, -99.9271240234375, 311.0223388671875, 277.9319763183594, 199.17681884765625, 404.37176513671875, 238.64353942871094, 176.09722900390625, 572.7692260742188, -150.6813201904297, 71.7557601928711, 207.36599731445312, 294.27886962890625, 197.43553161621094, 141.4315948486328, 458.64605712890625, 754.1367797851562, 86.95797729492188, 211.72113037109375, 701.09716796875, 236.20272827148438, 166.46420288085938, 180.42881774902344], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000621.npy"}
{"epoch": 0.9118942731277533, "step": 622, "batch_size": 64, "mean": 223.7984161376953, "std": 221.4250030517578, "min": -135.66156005859375, "p10": -17.022106170654293, "median": 205.7194366455078, "p90": 496.8566833496094, "max": 974.2365112304688, "pos_frac": 0.84375, "sample": [299.2989501953125, 260.3623962402344, 120.63126373291016, 298.1368408203125, 226.7456817626953, 331.4273376464844, -135.31715393066406, 252.49325561523438, 790.748779296875, 426.734130859375, 132.48919677734375, 59.61493682861328, 133.41160583496094, 4.9034881591796875, 78.27194213867188, 437.72991943359375, 106.98280334472656, 242.01654052734375, 154.64804077148438, 224.0360870361328, 378.3899230957031, 25.0345458984375, 142.46852111816406, 974.2365112304688, 354.30548095703125, 26.869598388671875, 198.22335815429688, 362.7123107910156, 316.17218017578125, -130.77468872070312, 200.15277099609375, -5.45050048828125, 505.733642578125, 123.76567077636719, 150.24871826171875, 620.1679077148438, -19.110641479492188, -12.148857116699219, 455.5947265625, 220.60244750976562, 392.4179992675781, 300.50494384765625, -1.9726181030273438, -49.72974395751953, 199.73538208007812, 34.9757080078125, 236.48519897460938, -29.023895263671875, 13.438430786132812, 170.21026611328125, 528.4165649414062, 497.85491943359375, 494.5274658203125, 359.459716796875, 211.28610229492188, 244.8284912109375, -135.66156005859375, -70.24060821533203, 117.00053405761719, 725.5276489257812, 333.64971923828125, 279.5229187011719, 127.73863220214844, 9.586311340332031], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000622.npy"}
{"epoch": 0.9133627019089574, "step": 623, "batch_size": 64, "mean": 179.51905822753906, "std": 242.64602661132812, "min": -454.15264892578125, "p10": -45.24456138610837, "median": 131.36183166503906, "p90": 510.11824645996114, "max": 970.2434692382812, "pos_frac": 0.828125, "sample": [166.77032470703125, 153.86453247070312, -235.94989013671875, 579.8592529296875, 133.3753662109375, 56.20354080200195, 408.0626220703125, 350.3476867675781, 390.1116943359375, 86.64340209960938, 578.724853515625, 102.36587524414062, 196.97683715820312, -23.288040161132812, 262.2372741699219, -54.65449905395508, 533.34326171875, 75.25897216796875, 92.57421875, 50.4547119140625, 70.37562561035156, 179.831787109375, 208.74160766601562, 82.9271240234375, -454.15264892578125, 142.35784912109375, 336.1165771484375, 92.0662612915039, 246.04502868652344, 229.05465698242188, 19.083160400390625, 129.34829711914062, 18.543922424316406, -6.821416854858398, -84.16287231445312, 24.811748504638672, 682.9948120117188, 244.77880859375, 186.4519805908203, 119.02758026123047, -16.921249389648438, 90.24819946289062, 279.983642578125, 363.1390686035156, -174.16897583007812, 6.836437225341797, 970.2434692382812, 376.1466369628906, 42.55653762817383, -150.5623779296875, 764.0263061523438, 455.9265441894531, 109.27619934082031, -2.8817691802978516, 68.48138427734375, 627.8177490234375, -215.48046875, 190.0404052734375, 206.37518310546875, 148.53785705566406, 395.1142578125, 117.64886474609375, 346.63507080078125, 119.4984130859375], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000623.npy"}
{"epoch": 0.9148311306901615, "step": 624, "batch_size": 64, "mean": 243.0377655029297, "std": 227.729736328125, "min": -396.1963195800781, "p10": -49.27644729614257, "median": 238.87315368652344, "p90": 508.5893157958985, "max": 659.367919921875, "pos_frac": 0.859375, "sample": [296.5389404296875, -92.61430358886719, 482.5677490234375, 113.34168243408203, 279.6475830078125, 436.040283203125, 380.699462890625, 659.367919921875, 628.074462890625, 41.80738067626953, 74.5194091796875, 40.40338134765625, 226.74502563476562, 467.5714416503906, 312.2447204589844, 124.8453369140625, 452.67706298828125, 468.506591796875, 541.0952758789062, -396.1963195800781, 171.56520080566406, 430.166748046875, -17.49333953857422, 160.1795196533203, 210.72891235351562, 184.15029907226562, 473.6716003417969, 595.7694091796875, 239.79977416992188, 111.56423950195312, 187.96885681152344, 438.76068115234375, -99.61102294921875, 267.9281921386719, 79.09979248046875, 375.20867919921875, 416.53680419921875, 54.995025634765625, 266.6806335449219, 285.1048583984375, -51.682029724121094, 64.052734375, 237.946533203125, -52.48817443847656, 511.8498840332031, 317.5738830566406, 212.51734924316406, 457.9781188964844, 585.8529052734375, 370.4233093261719, 145.85340881347656, -184.3256072998047, 55.96315002441406, 651.959716796875, 18.008996963500977, 101.29502868652344, -43.663421630859375, 67.8020248413086, -179.9512481689453, 278.2401123046875, 500.9813232421875, 188.76531982421875, 498.2613525390625, 430.544921875], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000624.npy"}
{"epoch": 0.9162995594713657, "step": 625, "batch_size": 64, "mean": 225.71258544921875, "std": 215.17103576660156, "min": -160.88839721679688, "p10": -5.391849327087396, "median": 187.87950134277344, "p90": 561.8659729003908, "max": 930.1172485351562, "pos_frac": 0.890625, "sample": [-111.14408111572266, 150.27142333984375, 330.7565002441406, 456.6308288574219, 202.78485107421875, 176.07647705078125, 471.3947448730469, 708.2274169921875, 343.92974853515625, 419.99163818359375, 58.406341552734375, 61.044830322265625, -36.82513427734375, 126.77983093261719, 579.9085083007812, 297.44677734375, 14.376142501831055, -15.382915496826172, 930.1172485351562, 293.99169921875, 591.983154296875, 139.9525909423828, 86.4144058227539, 233.50157165527344, 87.5112533569336, 245.02749633789062, 580.5535888671875, -8.928207397460938, 40.99195861816406, 141.35000610351562, 168.8792724609375, 193.66754150390625, 391.2170104980469, -71.4110107421875, 413.320556640625, 100.40390014648438, 38.433494567871094, 399.1951599121094, 215.10931396484375, 194.70030212402344, 111.95024108886719, 91.60308074951172, 519.7667236328125, 225.19944763183594, 227.3123321533203, 303.7196350097656, 0.8901195526123047, 294.7005615234375, 269.915283203125, 170.14051818847656, 3.9416561126708984, -8.084121704101562, 340.55230712890625, 112.66905212402344, 57.54143142700195, 72.0355453491211, 93.15965270996094, 182.09146118164062, 673.3538208007812, -160.88839721679688, 339.6856994628906, 53.764801025390625, 608.4443359375, 221.48440551757812], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000625.npy"}
{"epoch": 0.9177679882525698, "step": 626, "batch_size": 64, "mean": 166.82199096679688, "std": 219.90499877929688, "min": -283.2041931152344, "p10": -81.88828811645507, "median": 126.25683212280273, "p90": 424.7406188964844, "max": 732.1300659179688, "pos_frac": 0.734375, "sample": [396.2290344238281, 168.730712890625, -53.69500732421875, 161.27342224121094, 118.4718017578125, 203.00889587402344, 430.6175537109375, 97.93557739257812, 320.36651611328125, -65.69525909423828, 309.1531066894531, 153.07115173339844, 663.75537109375, 732.1300659179688, -94.07647705078125, -3.3550643920898438, 403.71612548828125, -134.01051330566406, 14.7061767578125, -12.259857177734375, 57.1953125, -46.75520324707031, 222.5845184326172, 20.327524185180664, -80.61375427246094, 102.99803161621094, 84.97268676757812, -135.19911193847656, -4.297206878662109, 307.311279296875, -49.51902770996094, 332.7096252441406, 33.03786087036133, -139.98663330078125, 194.76565551757812, 379.2032470703125, 411.02777099609375, 176.6048583984375, 155.7436981201172, 292.6269836425781, 402.68963623046875, -8.873382568359375, 683.8353881835938, 410.37066650390625, 580.5835571289062, 157.6728515625, -82.43451690673828, 119.06063079833984, 7.092559814453125, 309.39569091796875, 107.45062255859375, 443.86175537109375, 382.8170471191406, 509.4740295410156, 2.640045166015625, 92.54130554199219, 133.45303344726562, 314.79412841796875, -283.2041931152344, -124.15141296386719, 87.69465637207031, 52.499488830566406, 277.18243408203125, -24.650310516357422], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000626.npy"}
{"epoch": 0.9192364170337739, "step": 627, "batch_size": 64, "mean": 243.1033477783203, "std": 279.6839294433594, "min": -312.8719482421875, "p10": -35.476016998291016, "median": 178.49464416503906, "p90": 585.0323059082034, "max": 1158.0469970703125, "pos_frac": 0.84375, "sample": [321.6128845214844, 209.7242431640625, 724.89697265625, 385.9425048828125, 39.463905334472656, -36.35475158691406, 140.7977294921875, 287.5967712402344, 527.9629516601562, 241.64443969726562, 504.86724853515625, -192.32192993164062, 116.64352416992188, 0.2050457000732422, -43.152549743652344, 510.99920654296875, 16.01927947998047, 147.42864990234375, 519.1342163085938, 444.9773254394531, -183.03565979003906, 412.1025695800781, 496.1672668457031, -8.26130485534668, 782.9859619140625, 122.24923706054688, 353.82537841796875, 505.99566650390625, 156.47979736328125, 10.153570175170898, 102.1476821899414, 31.73453140258789, 166.54788208007812, 788.76513671875, 449.54974365234375, 326.87225341796875, 464.35284423828125, 154.78195190429688, 320.557373046875, -48.228485107421875, 86.18769073486328, 2.303342819213867, 1158.0469970703125, 202.36245727539062, 190.44140625, -33.425636291503906, 852.7135009765625, 1.2386550903320312, 126.03726196289062, 76.64065551757812, 609.4906005859375, 31.428977966308594, 356.9591979980469, -312.8719482421875, 45.4821891784668, 324.1322021484375, 199.42030334472656, 132.1442413330078, -152.54849243164062, 750.354736328125, 264.7784118652344, -6.7313995361328125, 164.25057983398438, 215.94735717773438], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000627.npy"}
{"epoch": 0.920704845814978, "step": 628, "batch_size": 64, "mean": 250.96429443359375, "std": 233.82528686523438, "min": -289.5111083984375, "p10": -42.42193565368652, "median": 228.94232940673828, "p90": 584.7814147949221, "max": 780.373779296875, "pos_frac": 0.84375, "sample": [267.9044494628906, -3.65533447265625, 367.57354736328125, 780.373779296875, 477.05078125, 108.10757446289062, 287.1497497558594, 265.69940185546875, 17.378128051757812, -79.44830322265625, -10.103126525878906, 110.09546661376953, -289.5111083984375, 641.2437744140625, 327.5657043457031, 420.8696594238281, 102.25837707519531, -104.38683319091797, 676.4678955078125, 465.29364013671875, 648.8574829101562, 479.64459228515625, 275.44219970703125, 192.23699951171875, 111.13565826416016, -45.671791076660156, 231.06568908691406, 255.801513671875, 8.213935852050781, 15.899982452392578, 520.7216796875, 352.9696044921875, 373.93756103515625, 320.2815856933594, 468.4266357421875, 509.03717041015625, 320.2413635253906, 353.0589599609375, 70.77379608154297, 87.22796630859375, -84.13484954833984, 226.8189697265625, 124.24541473388672, 205.5057373046875, 500.22320556640625, 298.01519775390625, 531.318359375, 212.33145141601562, 17.71491241455078, -79.30764770507812, 169.08279418945312, 125.28385162353516, 492.57550048828125, 607.6941528320312, -73.2191162109375, 104.63677215576172, 620.326171875, 143.562744140625, 467.2598571777344, 165.70465087890625, 643.569091796875, 169.6834716796875, 130.4341278076172, -34.83893966674805], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000628.npy"}
{"epoch": 0.922173274596182, "step": 629, "batch_size": 64, "mean": 214.87026977539062, "std": 192.63412475585938, "min": -151.3263397216797, "p10": 8.059150695800785, "median": 201.08121490478516, "p90": 492.80982666015626, "max": 768.4046020507812, "pos_frac": 0.90625, "sample": [218.386962890625, 226.28878784179688, -6.353464126586914, -128.293701171875, 503.89453125, 195.13233947753906, 53.69953918457031, 331.11383056640625, 378.5667419433594, 305.4166259765625, 221.577880859375, 259.16937255859375, 309.8196105957031, 61.140010833740234, 461.47442626953125, -16.14722442626953, 175.18670654296875, 158.9251708984375, 14.796609878540039, 309.77374267578125, 25.10401153564453, 484.96734619140625, 327.3078918457031, -71.04603576660156, 25.016498565673828, 207.03009033203125, 27.788284301757812, 20.39519500732422, 421.4213562011719, 182.3231201171875, 106.93225860595703, 129.5560760498047, 768.4046020507812, 493.56195068359375, 345.1720275878906, 76.02246856689453, -151.3263397216797, 491.05487060546875, 367.1851806640625, 36.61140441894531, 40.39082336425781, 11.39300537109375, 214.05128479003906, 6.6303558349609375, 451.2894287109375, 575.94189453125, 213.2174835205078, 148.5035858154297, 176.4114227294922, 28.57112693786621, 215.7433319091797, 128.8708038330078, 191.97381591796875, 518.4169921875, 92.8966064453125, 512.4956665039062, 92.13361358642578, 499.1842346191406, 378.4720153808594, 253.2445526123047, 324.48162841796875, 146.62982177734375, 278.66015625, -94.95748901367188], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000629.npy"}
{"epoch": 0.9236417033773862, "step": 630, "batch_size": 64, "mean": 161.44677734375, "std": 226.35643005371094, "min": -303.6939697265625, "p10": -53.24661445617676, "median": 112.99626541137695, "p90": 402.0527587890625, "max": 909.9158935546875, "pos_frac": 0.8125, "sample": [511.9451599121094, -303.6939697265625, 85.09593963623047, 169.2216796875, -51.98358154296875, 388.1412048339844, 146.57611083984375, 182.99285888671875, 909.9158935546875, 109.2629623413086, 282.27685546875, 146.21226501464844, 264.4228210449219, 295.3152160644531, 388.2951965332031, -243.58316040039062, -6.224641799926758, 35.55743408203125, 91.49207305908203, 211.13412475585938, 89.50150299072266, 55.26185989379883, 62.83802795410156, 697.0067138671875, -158.27845764160156, 746.1506958007812, 121.67192077636719, 397.04022216796875, 63.49382781982422, 52.98942565917969, 21.289583206176758, 54.09185028076172, 218.2196044921875, -280.4035339355469, 214.2504425048828, 148.00230407714844, -96.95686340332031, 380.5316467285156, 61.35414123535156, 291.39501953125, -4.14697265625, 110.736328125, 6.981895446777344, -14.35296630859375, 105.61489868164062, 117.72906494140625, 206.7696533203125, 158.15103149414062, 243.90359497070312, 27.563682556152344, 229.0877227783203, 479.11737060546875, 326.4183044433594, 107.90400695800781, -53.78791427612305, 76.95405578613281, 115.2562026977539, -5.2968292236328125, 331.68243408203125, -113.95417022705078, 636.5757446289062, 63.90235137939453, 404.20098876953125, 23.760099411010742], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000630.npy"}
{"epoch": 0.9251101321585903, "step": 631, "batch_size": 64, "mean": 190.66612243652344, "std": 247.36061096191406, "min": -374.0304870605469, "p10": -170.7517852783203, "median": 185.06029510498047, "p90": 549.6014587402345, "max": 738.3516845703125, "pos_frac": 0.8125, "sample": [607.361083984375, 258.610595703125, -76.29306030273438, 202.26792907714844, 73.04523468017578, 324.7235412597656, 157.14175415039062, 119.17745208740234, -173.1248779296875, -182.21702575683594, 243.51644897460938, 738.3516845703125, -165.21456909179688, 394.8438415527344, -68.84468078613281, 7.18235969543457, 136.03497314453125, 618.318603515625, 36.05945587158203, 192.17701721191406, 39.284767150878906, 10.893999099731445, 274.7529296875, -299.01434326171875, 326.2946472167969, 177.94357299804688, 295.1155700683594, 26.647720336914062, 376.6934509277344, 165.38522338867188, -374.0304870605469, 375.2263488769531, 700.703369140625, 293.81280517578125, 330.16046142578125, 140.14337158203125, 208.15924072265625, -46.42759704589844, 201.25137329101562, 313.06793212890625, 253.90231323242188, 66.9037094116211, 678.1134033203125, 431.1355895996094, 294.7550048828125, -194.57452392578125, 568.27783203125, 176.00640869140625, 506.02325439453125, 250.46533203125, 153.1424560546875, 133.9905242919922, -18.259769439697266, 57.572349548339844, -243.265380859375, 15.44482421875, 357.26458740234375, 229.01010131835938, 652.2483520507812, -195.72055053710938, 171.17282104492188, 332.0190734863281, 144.74398803710938, 403.07794189453125], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000631.npy"}
{"epoch": 0.9265785609397944, "step": 632, "batch_size": 64, "mean": 212.34161376953125, "std": 261.7093811035156, "min": -576.7701416015625, "p10": -67.49688110351562, "median": 228.23207092285156, "p90": 550.2428100585938, "max": 750.5757446289062, "pos_frac": 0.828125, "sample": [252.26699829101562, -403.81024169921875, 221.27218627929688, 258.31903076171875, 460.6292724609375, -576.7701416015625, 582.6900634765625, 473.098388671875, 512.3790893554688, 510.68524169921875, 415.99755859375, 211.6441192626953, 280.8997802734375, 613.0604858398438, 256.5016784667969, 111.08065032958984, 432.9936828613281, 235.19195556640625, 28.056148529052734, -71.67037200927734, 540.9384155273438, 285.9089050292969, 353.951171875, 328.29669189453125, -164.2873077392578, 511.5185852050781, 327.2796325683594, -263.1164855957031, 254.55947875976562, 74.49463653564453, -321.16229248046875, 61.740196228027344, 554.2304077148438, 282.60614013671875, 177.05929565429688, 202.30726623535156, 597.7244873046875, 355.0610656738281, 132.39328002929688, 27.870567321777344, 376.8335266113281, 79.39617156982422, -57.75873565673828, 68.50875854492188, 294.91888427734375, 210.92593383789062, 402.0105285644531, 102.24053192138672, 604.6731567382812, 750.5757446289062, 595.21533203125, -8.642486572265625, -209.8329620361328, 75.42987823486328, 110.71159362792969, 78.3456802368164, -51.03484344482422, 287.967529296875, -28.451942443847656, 203.98150634765625, 154.91407775878906, 23.344892501831055, 58.329933166503906, 343.3707580566406], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000632.npy"}
{"epoch": 0.9280469897209985, "step": 633, "batch_size": 64, "mean": 154.8126678466797, "std": 223.13526916503906, "min": -393.0101318359375, "p10": -127.92549743652343, "median": 176.74063110351562, "p90": 384.56266784667974, "max": 719.6673583984375, "pos_frac": 0.75, "sample": [-129.779541015625, 316.6234130859375, 239.6053924560547, -23.842308044433594, 255.4063720703125, 307.888916015625, -193.1460723876953, -123.59939575195312, 525.5789184570312, 168.77362060546875, 289.5714111328125, 207.6944580078125, 282.0526123046875, -236.45596313476562, -42.70001220703125, 57.59113311767578, 236.319091796875, 554.4290771484375, 181.21054077148438, 94.08053588867188, 30.85082244873047, 514.1151123046875, 206.6788787841797, -12.880699157714844, 367.517578125, 120.45632934570312, 44.52943420410156, 186.978271484375, -386.39410400390625, 52.326622009277344, 256.4424133300781, 155.14239501953125, 328.84295654296875, 157.77853393554688, 86.01542663574219, -7.902179718017578, -62.312469482421875, -393.0101318359375, -185.49180603027344, 187.93927001953125, 719.6673583984375, 324.7424621582031, -101.3077392578125, -200.57864379882812, 359.6083984375, 230.871826171875, 318.564208984375, 572.1038818359375, 391.8677062988281, 114.25485229492188, -72.25985717773438, 172.27072143554688, -42.82072448730469, 344.02093505859375, 48.71342468261719, 72.67274475097656, 216.76747131347656, 481.4793701171875, 263.5819091796875, 58.475677490234375, 360.46405029296875, 150.285400390625, 226.02883911132812, 283.611572265625], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000633.npy"}
{"epoch": 0.9295154185022027, "step": 634, "batch_size": 64, "mean": 170.43814086914062, "std": 226.48654174804688, "min": -216.8528594970703, "p10": -65.86859893798828, "median": 114.46742248535156, "p90": 478.52721862792976, "max": 747.5936279296875, "pos_frac": 0.765625, "sample": [29.835281372070312, 417.8519287109375, 93.6990737915039, 660.7981567382812, 96.46742248535156, 117.14044189453125, 10.778543472290039, 220.84292602539062, 271.8629455566406, 402.7332763671875, 74.169677734375, 156.26416015625, 59.97753143310547, 259.73406982421875, 454.97296142578125, 293.0624694824219, -43.519412994384766, 8.23089599609375, 25.668487548828125, 20.960723876953125, 721.686279296875, -201.3990936279297, -89.16780090332031, 61.19584655761719, -133.41986083984375, 216.2640380859375, 33.241851806640625, 484.32275390625, 292.0774230957031, 84.77662658691406, 747.5936279296875, 121.7806167602539, 126.18195343017578, -204.2139892578125, 638.940673828125, 465.0043029785156, 569.14013671875, -216.8528594970703, -27.230587005615234, 62.33206558227539, 182.32704162597656, -66.55860900878906, -44.89075469970703, -64.25857543945312, 109.46165466308594, -61.35364532470703, 293.4441223144531, -8.545852661132812, 486.51373291015625, 403.02569580078125, 79.29510498046875, 285.96875, 283.7616271972656, 367.331298828125, 396.3410949707031, 143.71665954589844, 199.16659545898438, 288.9777526855469, -34.44776153564453, 111.79440307617188, -66.8616943359375, -13.459213256835938, 28.699838638305664, 224.80612182617188], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000634.npy"}
{"epoch": 0.9309838472834068, "step": 635, "batch_size": 64, "mean": 181.37939453125, "std": 200.1664276123047, "min": -219.5679931640625, "p10": -36.77436523437499, "median": 132.68299102783203, "p90": 485.8878540039063, "max": 671.8131103515625, "pos_frac": 0.828125, "sample": [151.10934448242188, 237.9406280517578, 671.8131103515625, -40.16748046875, 270.9689025878906, -46.845481872558594, 106.64141845703125, 88.10487365722656, 4.552894592285156, 116.76201629638672, 109.95410919189453, 663.238525390625, 124.83961486816406, 109.50883483886719, -88.10040283203125, 192.48797607421875, 77.98696899414062, 67.93356323242188, 597.23046875, 40.50103759765625, 328.0736083984375, 25.618942260742188, 452.26495361328125, 490.52496337890625, 228.90200805664062, 53.901466369628906, 6.443145751953125, 328.7216491699219, 150.62327575683594, 594.8656005859375, 55.88304901123047, 287.0400390625, 80.78237915039062, 475.06793212890625, 314.852294921875, 89.42194366455078, 92.66144561767578, 66.13322448730469, -39.712608337402344, 403.90179443359375, 28.122119903564453, 325.2389831542969, 623.1946411132812, 239.79104614257812, -29.91846466064453, 235.39810180664062, 289.063232421875, -9.644676208496094, -219.5679931640625, 140.5263671875, -23.99188232421875, -52.74360656738281, 209.57730102539062, 256.54290771484375, 273.91607666015625, 153.57290649414062, 61.88778305053711, 153.72915649414062, -23.970458984375, 564.535888671875, -75.34769439697266, 85.7242660522461, 199.18341064453125, 261.0308837890625], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000635.npy"}
{"epoch": 0.9324522760646109, "step": 636, "batch_size": 64, "mean": 213.02748107910156, "std": 209.4912109375, "min": -188.09153747558594, "p10": -64.99933319091797, "median": 224.1427993774414, "p90": 454.1252655029297, "max": 1023.873046875, "pos_frac": 0.828125, "sample": [43.4298095703125, -55.88490295410156, 51.708587646484375, 25.52734375, 186.71170043945312, -97.15847778320312, 306.70050048828125, 120.12323760986328, 361.8482666015625, 73.6617660522461, 120.30174255371094, 109.03528594970703, 172.41229248046875, 266.4173278808594, 198.49673461914062, 277.67340087890625, -73.0897216796875, 566.5420532226562, 407.4686279296875, 167.14007568359375, 393.33038330078125, 469.38690185546875, 162.69601440429688, -92.64790344238281, 304.7735290527344, 223.93051147460938, 389.4903259277344, 316.27935791015625, 245.08912658691406, 1023.873046875, -125.35667419433594, -39.25294494628906, 329.0302429199219, 251.94654846191406, -53.03580093383789, 579.218017578125, 451.5940246582031, 287.16082763671875, 324.2001037597656, -68.905517578125, 389.910400390625, 130.3103485107422, 177.57693481445312, 366.9468688964844, 259.0506286621094, 324.91033935546875, 457.7081298828125, 282.2601013183594, 21.30443572998047, -188.09153747558594, 127.2073974609375, -70.648193359375, -44.9324951171875, 132.6635284423828, 284.9642028808594, 224.35508728027344, 455.2100830078125, 533.555908203125, 196.0955352783203, 290.28125, 7.29254150390625, 64.23532104492188, 254.35873413085938, 355.3675842285156], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000636.npy"}
{"epoch": 0.933920704845815, "step": 637, "batch_size": 64, "mean": 222.14358520507812, "std": 222.72239685058594, "min": -284.7808837890625, "p10": 1.2386236190795978, "median": 182.79017639160156, "p90": 552.0646423339844, "max": 806.1564331054688, "pos_frac": 0.890625, "sample": [142.49456787109375, 134.08480834960938, 307.6260070800781, 212.5863037109375, 55.80447769165039, 456.5855712890625, 382.10369873046875, -284.7808837890625, 550.104736328125, 356.0455322265625, 409.32373046875, -64.7235107421875, 152.99404907226562, 223.20693969726562, -74.88981628417969, 483.1099853515625, 260.3363037109375, 126.8187026977539, 552.9046020507812, 272.4257507324219, 539.835205078125, 22.89928436279297, 137.41131591796875, 806.1564331054688, 108.17666625976562, 241.02516174316406, 77.12405395507812, 89.65512084960938, 320.8746643066406, 104.32180786132812, 221.13656616210938, -145.22280883789062, 11.098701477050781, 279.62762451171875, 293.31964111328125, -28.83051872253418, 115.40248107910156, 312.93994140625, 297.50189208984375, 115.64849853515625, 280.35614013671875, 519.7879028320312, 37.156822204589844, 225.86068725585938, 402.82318115234375, 301.89617919921875, 552.9478149414062, 15.906867980957031, 574.2115478515625, 60.215354919433594, 657.9417724609375, -180.87196350097656, 131.44146728515625, 355.36090087890625, 111.94186401367188, 80.29920959472656, 122.34959411621094, 687.3175048828125, 45.06608581542969, 562.524658203125, 9.308414459228516, 34.386749267578125, 56.9178466796875, -2.219858169555664], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000637.npy"}
{"epoch": 0.9353891336270191, "step": 638, "batch_size": 64, "mean": 218.700439453125, "std": 191.96728515625, "min": -128.92626953125, "p10": -25.498257255554186, "median": 194.65525817871094, "p90": 491.4314270019531, "max": 706.8804931640625, "pos_frac": 0.859375, "sample": [644.90087890625, -60.46429443359375, -92.28692626953125, 186.65945434570312, 251.7445831298828, 28.500808715820312, 147.76260375976562, 77.9901123046875, 326.84674072265625, 168.80140686035156, 302.8450012207031, 706.8804931640625, 224.2399139404297, 538.9652709960938, 160.9885711669922, 76.92256164550781, 237.60289001464844, 112.4653091430664, 250.01248168945312, 143.59959411621094, 130.12054443359375, 31.951576232910156, 169.967041015625, 211.45791625976562, 235.24452209472656, 464.7561950683594, 165.50921630859375, 211.8401641845703, 537.7762451171875, 390.4504699707031, 351.06817626953125, 391.7857360839844, 219.8043670654297, 211.15359497070312, 316.576904296875, 183.06344604492188, -11.825942993164062, -10.875293731689453, -42.706207275390625, 13.350238800048828, -102.29203796386719, 181.69412231445312, 322.5531921386719, 660.1725463867188, -57.35478973388672, -31.357820510864258, 486.0430908203125, 187.3603057861328, 210.5797119140625, 123.93377685546875, 126.59454345703125, 79.9993896484375, 353.5008544921875, 201.95021057128906, 292.3531799316406, 120.0399169921875, 142.03762817382812, 114.57825469970703, 492.9205322265625, 580.6663208007812, -128.92626953125, 216.48199462890625, 487.95684814453125, 329.8968505859375], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000638.npy"}
{"epoch": 0.9368575624082232, "step": 639, "batch_size": 64, "mean": 164.38970947265625, "std": 221.82846069335938, "min": -304.3739318847656, "p10": -133.38333129882812, "median": 151.0177230834961, "p90": 469.59736328125007, "max": 687.4694213867188, "pos_frac": 0.75, "sample": [615.0848999023438, -123.76548767089844, 70.24403381347656, 382.29803466796875, 456.6517639160156, 41.52629089355469, 310.4598388671875, 216.02932739257812, 29.249855041503906, 240.1683349609375, 379.7038879394531, -97.63906860351562, 310.2167053222656, -8.72900390625, -34.29756164550781, 137.43502807617188, -137.50526428222656, -170.25982666015625, 238.94998168945312, -8.953529357910156, 475.1454772949219, -71.4334945678711, 530.1832885742188, 285.3814392089844, 264.94964599609375, -151.962646484375, -244.8688507080078, 118.54139709472656, 100.1049575805664, 185.95252990722656, 360.9454650878906, 37.152076721191406, -29.92426300048828, -157.2873077392578, 530.7280883789062, 148.00770568847656, 147.65943908691406, 414.601318359375, 129.1212921142578, 202.21865844726562, 144.75369262695312, 230.43124389648438, 455.80035400390625, 286.3489685058594, 687.4694213867188, -253.81312561035156, 154.02774047851562, 163.66287231445312, -101.67745971679688, 131.807861328125, 115.58283996582031, 97.65342712402344, 278.01580810546875, 169.89813232421875, -304.3739318847656, -79.9889907836914, 31.675689697265625, 283.1690673828125, 245.51991271972656, 514.992919921875, 146.64035034179688, 504.18621826171875, 339.55877685546875, 157.51516723632812], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000639.npy"}
{"epoch": 0.9383259911894273, "step": 640, "batch_size": 64, "mean": 257.691162109375, "std": 258.782958984375, "min": -179.43869018554688, "p10": -41.134159851074216, "median": 230.81985473632812, "p90": 562.3011413574219, "max": 1144.76416015625, "pos_frac": 0.828125, "sample": [558.8196411132812, 300.5202941894531, -60.13385009765625, 163.36573791503906, 182.69223022460938, -26.785980224609375, 459.15692138671875, 599.0368041992188, 282.6168212890625, 23.368614196777344, 393.7140808105469, 766.2981567382812, 435.34967041015625, 297.1381530761719, 66.32637023925781, 253.74325561523438, 230.1217498779297, -125.68701171875, -35.63764953613281, 353.1531982421875, 11.633544921875, 403.28009033203125, 254.05697631835938, 77.69084167480469, 103.57064819335938, -127.18864440917969, 121.55584716796875, 333.25677490234375, 1047.9658203125, 359.6331787109375, 405.40533447265625, 326.3480224609375, -43.48980712890625, 275.62835693359375, 1144.76416015625, 71.09465789794922, -179.43869018554688, 193.79396057128906, -18.408447265625, 239.17120361328125, 134.8189239501953, 231.51795959472656, 139.4473114013672, 563.793212890625, 378.1305847167969, 96.95413208007812, 308.53118896484375, 321.04345703125, -70.69760131835938, 508.1684875488281, 740.1317138671875, 211.68145751953125, -29.816497802734375, 477.1599426269531, -56.81745910644531, 160.92787170410156, 160.63307189941406, 150.52285766601562, 185.6417999267578, 293.7773132324219, 369.8966369628906, 680.3515625, 216.07017517089844, 202.8663330078125], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000640.npy"}
{"epoch": 0.9397944199706314, "step": 641, "batch_size": 64, "mean": 176.06033325195312, "std": 242.85853576660156, "min": -366.83026123046875, "p10": -111.79916534423828, "median": 155.45558166503906, "p90": 452.87800598144537, "max": 1183.345458984375, "pos_frac": 0.8125, "sample": [48.941612243652344, 135.4754180908203, 276.2591247558594, 664.849609375, -175.91342163085938, 168.75930786132812, 2.058837890625, 405.5718994140625, 185.96484375, 17.789688110351562, -57.63193893432617, 192.41604614257812, -113.15013122558594, -182.75604248046875, 1183.345458984375, 153.8687744140625, -44.306678771972656, -143.89892578125, 24.696884155273438, 160.24929809570312, 284.23638916015625, 100.20133209228516, 104.249755859375, 344.73681640625, 226.57357788085938, 112.95065307617188, -366.83026123046875, 82.62294006347656, 214.078125, 99.19682312011719, 88.26066589355469, 242.15052795410156, 157.04238891601562, -108.64691162109375, 108.47730255126953, 202.08187866210938, 570.774658203125, 83.15730285644531, 102.18818664550781, 288.9478759765625, 144.53717041015625, -78.57281494140625, 62.16930389404297, -177.0462646484375, 119.38239288330078, 7.1484222412109375, -236.03651428222656, 302.71087646484375, 461.58343505859375, 514.3309326171875, 498.35906982421875, 432.5653381347656, -43.113525390625, 295.36151123046875, 246.34622192382812, 329.7074890136719, 426.9586181640625, 307.8485107421875, 169.7043914794922, 111.49652099609375, 297.0457458496094, 345.402587890625, 347.6571044921875, 543.275146484375], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000641.npy"}
{"epoch": 0.9412628487518355, "step": 642, "batch_size": 64, "mean": 226.55743408203125, "std": 275.80291748046875, "min": -228.49114990234375, "p10": -68.55736694335936, "median": 201.73023986816406, "p90": 562.5784179687503, "max": 1292.693359375, "pos_frac": 0.796875, "sample": [191.2466583251953, -26.869400024414062, 255.3560791015625, 305.2729797363281, 195.5154266357422, -228.49114990234375, 472.26763916015625, 1.065521240234375, -33.854736328125, 108.49859619140625, -44.82312774658203, 24.544891357421875, 383.0577697753906, -188.7667236328125, 287.71051025390625, 610.800537109375, 207.94505310058594, 499.51214599609375, -148.75860595703125, 122.71725463867188, -164.92153930664062, 18.11962890625, 152.119873046875, 58.36601257324219, 292.5536193847656, 291.91815185546875, 120.5522689819336, 221.753662109375, 62.46195983886719, 268.0514831542969, 392.97625732421875, 266.9039306640625, 494.3622741699219, 775.0509033203125, -19.953277587890625, 293.5982971191406, 133.94219970703125, 353.08038330078125, -216.869384765625, 290.2843933105469, 480.08624267578125, 181.5318145751953, 251.74703979492188, 585.6268920898438, -30.50124740600586, 681.1588134765625, 453.24957275390625, 593.9339599609375, -46.58647155761719, 218.306396484375, 1292.693359375, 393.014892578125, -77.97346496582031, 4.8778533935546875, 26.22389030456543, -98.83622741699219, 161.33351135253906, 832.5513916015625, 96.89334106445312, 53.497947692871094, 190.34619140625, 508.79864501953125, 213.27227783203125, 456.13006591796875], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000642.npy"}
{"epoch": 0.9427312775330396, "step": 643, "batch_size": 64, "mean": 202.1302490234375, "std": 196.88941955566406, "min": -208.08689880371094, "p10": -39.405057144165006, "median": 178.9950714111328, "p90": 462.2480499267578, "max": 631.8778076171875, "pos_frac": 0.875, "sample": [116.96993255615234, 250.10289001464844, 140.4700164794922, 361.6346435546875, 251.01348876953125, 459.14361572265625, 414.55780029296875, -60.81340789794922, -94.41609191894531, 174.2561798095703, 7.502931594848633, 58.57969665527344, 300.6023864746094, 61.95987319946289, -200.18690490722656, 392.2110900878906, 446.35992431640625, 393.47381591796875, 328.2134094238281, 315.86749267578125, 222.0916290283203, 33.07609939575195, 30.483694076538086, 100.80228424072266, 35.62971878051758, 122.7581787109375, -110.59336853027344, 294.7671813964844, 300.6900939941406, 308.130126953125, -208.08689880371094, 5.80988883972168, -79.83203125, 423.7108154296875, 582.871337890625, 563.5469360351562, 182.10397338867188, 32.9146728515625, 190.53863525390625, 315.6415710449219, -6.770214080810547, 631.8778076171875, 488.20086669921875, 291.3995666503906, 205.191162109375, 337.2503662109375, 3.5896968841552734, 143.74769592285156, 621.4714965820312, 121.82295227050781, 463.5785217285156, 467.1032409667969, 87.24327087402344, -53.39141845703125, 320.284912109375, 32.877716064453125, 139.33482360839844, 130.32656860351562, 169.5322265625, 273.7713928222656, 116.23846435546875, 273.6334228515625, 37.577491760253906, 175.88616943359375], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000643.npy"}
{"epoch": 0.9441997063142438, "step": 644, "batch_size": 64, "mean": 196.55601501464844, "std": 243.90109252929688, "min": -512.12548828125, "p10": -87.00711364746093, "median": 159.22156524658203, "p90": 542.7058959960938, "max": 744.85595703125, "pos_frac": 0.8125, "sample": [669.084228515625, -56.96875762939453, 435.246826171875, -512.12548828125, 55.29539489746094, 243.61505126953125, -144.47036743164062, 27.855056762695312, 227.216552734375, 134.098388671875, 26.815444946289062, 277.51763916015625, -99.89067840576172, 468.1418151855469, 136.42942810058594, 612.4857788085938, 183.17169189453125, 175.21881103515625, 197.05624389648438, 38.470298767089844, 114.1558837890625, 536.6500244140625, 152.3484649658203, -81.0248031616211, 436.5137939453125, 497.4356994628906, -4.519775390625, 88.29003143310547, 166.09466552734375, 618.53857421875, 40.645606994628906, -36.64175796508789, 371.1409606933594, -0.0812835693359375, 185.60707092285156, 77.0349349975586, 688.5242919921875, 108.24600219726562, 134.60450744628906, 234.3062744140625, 75.8402328491211, -89.57096099853516, 425.5154113769531, 3.5792312622070312, 379.4289855957031, 322.5838623046875, 57.18931579589844, -136.4512939453125, 59.57920837402344, 507.1512145996094, 615.1060180664062, 379.9709777832031, -130.5258026123047, -109.13511657714844, 271.18524169921875, 545.30126953125, 293.484619140625, 241.7213592529297, 34.7177734375, 47.401344299316406, 258.1509094238281, 744.85595703125, 112.63367462158203, 247.73904418945312], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000644.npy"}
{"epoch": 0.9456681350954479, "step": 645, "batch_size": 64, "mean": 155.01222229003906, "std": 240.40817260742188, "min": -308.22503662109375, "p10": -119.92087173461913, "median": 109.9955825805664, "p90": 533.2064086914063, "max": 878.0641479492188, "pos_frac": 0.734375, "sample": [67.10308837890625, 360.1492614746094, -125.82857513427734, 381.11224365234375, -159.90354919433594, -12.117752075195312, -128.6585693359375, 161.09011840820312, 307.8042907714844, 0.4449348449707031, 141.04173278808594, 10.3179931640625, 54.398704528808594, -6.145847320556641, -6.603733062744141, 79.76012420654297, 173.79603576660156, 99.39059448242188, 612.6928100585938, 161.4971923828125, 280.13983154296875, -15.210639953613281, 90.84686279296875, 120.60057067871094, 94.26942443847656, 436.2353210449219, 397.01824951171875, 52.779869079589844, -255.76841735839844, -308.22503662109375, 60.582069396972656, 213.65634155273438, -218.29995727539062, 543.5811157226562, 540.122314453125, -17.944854736328125, 533.8988037109375, 408.0516052246094, -106.13623046875, -76.1255111694336, 878.0641479492188, 135.29852294921875, 19.92498779296875, 10.853717803955078, 23.743789672851562, 299.8209228515625, 163.8401641845703, 248.2367401123047, 215.83645629882812, 251.7140655517578, 225.49798583984375, 24.431472778320312, 587.857666015625, 29.59111785888672, 531.5908203125, 693.4930419921875, 126.29322052001953, 351.52960205078125, -200.75547790527344, 131.18643188476562, 305.51739501953125, -10.746618270874023, -28.134136199951172, -39.316707611083984], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000645.npy"}
{"epoch": 0.947136563876652, "step": 646, "batch_size": 64, "mean": 195.88906860351562, "std": 206.01620483398438, "min": -287.1565246582031, "p10": -36.7201431274414, "median": 180.8451385498047, "p90": 520.1481567382813, "max": 760.9860229492188, "pos_frac": 0.84375, "sample": [589.5614013671875, 235.73684692382812, 251.42739868164062, 327.0244140625, 366.0224304199219, 156.90260314941406, 527.7556762695312, 361.57806396484375, 532.574951171875, 10.490982055664062, 206.36203002929688, 228.44178771972656, -287.1565246582031, 241.97354125976562, 129.30477905273438, 100.80256652832031, 163.5658721923828, 50.584449768066406, 431.37823486328125, 154.1768798828125, 280.334228515625, 128.66226196289062, -2.8217849731445312, 269.0608825683594, 69.35079956054688, 190.30032348632812, 288.02227783203125, 257.35723876953125, -45.3958740234375, 313.09466552734375, 171.38995361328125, 256.7272644042969, -22.396392822265625, 229.6563262939453, -60.13252258300781, 293.65362548828125, -35.49737548828125, 676.5455322265625, 116.70927429199219, -273.7095947265625, 198.6570281982422, 614.7451171875, 158.00514221191406, 58.46965408325195, -37.24418640136719, -54.14137268066406, 105.39421081542969, 35.82512664794922, 2.7703399658203125, 760.9860229492188, 502.39727783203125, 163.57403564453125, 275.9010925292969, 190.6276092529297, 357.8804016113281, -54.86225891113281, 32.86161804199219, 108.7891616821289, 213.0299530029297, 108.98395538330078, 108.2446060180664, 557.7579956054688, 42.36937713623047, 206.459228515625], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000646.npy"}
{"epoch": 0.9486049926578561, "step": 647, "batch_size": 64, "mean": 193.6148223876953, "std": 224.6533203125, "min": -427.8514404296875, "p10": -45.07410049438476, "median": 176.43408203125, "p90": 487.3085235595703, "max": 794.3444213867188, "pos_frac": 0.8125, "sample": [229.20252990722656, 352.03619384765625, 251.62054443359375, 133.45191955566406, 171.43502807617188, 456.2876281738281, 395.37261962890625, 245.58404541015625, 287.48193359375, 18.66095733642578, 682.4034423828125, 181.43313598632812, 17.819580078125, 487.5599365234375, 486.7218933105469, 289.13726806640625, 80.41838073730469, 484.6015319824219, 322.35980224609375, 55.256103515625, 78.19488525390625, -38.4647216796875, -6.88215446472168, 9.71441650390625, 358.2756652832031, 355.5471496582031, -31.006973266601562, 794.3444213867188, -38.82405090332031, 528.3340454101562, 71.58422088623047, -5.448616027832031, 66.58212280273438, 331.2916259765625, 513.193115234375, 47.08711242675781, 167.20957946777344, 72.06529235839844, -202.3734588623047, -90.32875061035156, 193.70863342285156, -105.17700958251953, 584.3489990234375, -136.388427734375, -181.96780395507812, 127.32621002197266, 327.7583923339844, 218.55108642578125, 110.8761978149414, 432.1573181152344, 96.67045593261719, 186.79566955566406, 505.3026123046875, 220.93377685546875, 60.75114440917969, 265.0257568359375, -427.8514404296875, 252.95440673828125, -47.75269317626953, 411.07147216796875, 162.60635375976562, 274.77020263671875, 106.55625915527344, 143.38211059570312], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000647.npy"}
{"epoch": 0.9500734214390602, "step": 648, "batch_size": 64, "mean": 257.642578125, "std": 296.62103271484375, "min": -292.18048095703125, "p10": -56.44808731079101, "median": 213.14421844482422, "p90": 648.7281372070313, "max": 1299.66064453125, "pos_frac": 0.828125, "sample": [160.08062744140625, 56.37782287597656, 101.83806610107422, 308.5373229980469, 119.55963134765625, 361.9268798828125, 612.5446166992188, 950.6229248046875, 172.6697540283203, 677.25634765625, 109.61080932617188, -52.655418395996094, 120.31838989257812, 527.0877685546875, -51.80975341796875, -188.58631896972656, 514.7218017578125, 159.38197326660156, 72.79942321777344, -292.18048095703125, 6.83915901184082, 312.1398010253906, 172.34783935546875, 389.0602111816406, 45.50660705566406, 691.6026000976562, 252.16053771972656, 219.11692810058594, 152.9373321533203, -8.548698425292969, 361.18505859375, 46.10839080810547, -39.447959899902344, 1299.66064453125, 417.43023681640625, 91.74063873291016, -107.16755676269531, 175.42625427246094, 127.72076416015625, -223.51730346679688, 9.14288330078125, 245.69117736816406, 804.1713256835938, 331.11834716796875, 117.5947265625, 433.5685119628906, 658.4122924804688, -113.5584716796875, 226.5077362060547, 348.5860595703125, 527.5380249023438, 207.1715087890625, -58.073516845703125, 266.0747375488281, 479.2409973144531, 148.78952026367188, 379.5393371582031, 431.2301940917969, 246.60751342773438, 626.1317749023438, 529.1781616210938, 374.6720886230469, -280.8316955566406, 728.2180786132812], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000648.npy"}
{"epoch": 0.9515418502202643, "step": 649, "batch_size": 64, "mean": 164.33978271484375, "std": 234.66664123535156, "min": -289.9760437011719, "p10": -63.70719833374023, "median": 117.60332107543945, "p90": 457.92542419433596, "max": 1069.8782958984375, "pos_frac": 0.78125, "sample": [8.173084259033203, 28.41541290283203, 156.7982940673828, -65.60134887695312, 345.3220520019531, 40.50431442260742, 73.77597045898438, 240.94000244140625, 293.4319763183594, 86.66461181640625, 392.0285949707031, -53.451934814453125, 102.72052764892578, 112.36546325683594, -11.235816955566406, -96.29067993164062, -28.956283569335938, 472.17974853515625, -143.63882446289062, 272.0802307128906, 130.71371459960938, 19.08762550354004, -289.9760437011719, -29.324384689331055, 272.3016052246094, -62.60411834716797, 262.1854248046875, 144.318603515625, -64.17994689941406, 369.04345703125, 367.5392761230469, -43.047569274902344, 43.052490234375, 56.025978088378906, 335.872802734375, 296.120361328125, 363.1221008300781, 221.30706787109375, 493.8677978515625, 139.92230224609375, 722.12353515625, -10.622690200805664, 0.06846237182617188, 538.8990478515625, 279.10064697265625, 149.76119995117188, 355.1490478515625, 1069.8782958984375, 180.5513916015625, 218.2957763671875, 26.26357078552246, 89.30125427246094, 28.780349731445312, 537.8170776367188, 1.91259765625, 453.527587890625, 459.8102111816406, -227.05307006835938, 19.413005828857422, 95.17278289794922, -253.86915588378906, 68.39700317382812, 340.6529846191406, 122.84117889404297], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000649.npy"}
{"epoch": 0.9530102790014684, "step": 650, "batch_size": 64, "mean": 235.9899139404297, "std": 249.67779541015625, "min": -293.51715087890625, "p10": -78.53180732727049, "median": 222.30994415283203, "p90": 568.4150939941408, "max": 777.2825317382812, "pos_frac": 0.859375, "sample": [8.737007141113281, -187.54754638671875, 458.20330810546875, 523.0639038085938, 245.82373046875, 466.1105651855469, 310.68115234375, -44.3991813659668, 701.0838012695312, 223.52320861816406, 503.92791748046875, -107.73289489746094, 741.8448486328125, 341.8734130859375, 447.4422607421875, 438.1035461425781, 276.0601806640625, 612.321044921875, 224.12950134277344, 777.2825317382812, 415.16357421875, 95.7267837524414, 697.3308715820312, 342.0474548339844, 220.2672882080078, 128.45021057128906, 48.60186767578125, 91.1358413696289, 57.278106689453125, 264.1459045410156, 426.80438232421875, 176.56080627441406, -143.1016845703125, -87.77107238769531, 91.05353546142578, 587.851318359375, 32.47346878051758, 716.6820068359375, -56.9735221862793, 141.30438232421875, 129.7542724609375, 493.9908752441406, -207.86538696289062, 439.02777099609375, 101.3396224975586, 270.8015441894531, 230.1703338623047, 387.6462707519531, 164.896240234375, 473.54937744140625, 326.8564147949219, 56.088905334472656, 413.0029296875, 291.7109375, 209.88209533691406, 70.83757781982422, 13.790603637695312, -293.51715087890625, -126.42391967773438, 120.88395690917969, 221.0966796875, 55.35614776611328, 27.10335922241211, 27.810462951660156], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000650.npy"}
{"epoch": 0.9544787077826725, "step": 651, "batch_size": 64, "mean": 245.66983032226562, "std": 213.5235595703125, "min": -145.14891052246094, "p10": -18.386139678955075, "median": 249.21092224121094, "p90": 545.6082336425782, "max": 667.70751953125, "pos_frac": 0.84375, "sample": [469.39825439453125, 302.4754638671875, 546.7954711914062, 577.8369750976562, 60.89683532714844, 142.7938232421875, 328.57525634765625, 460.0593566894531, 101.77909851074219, 64.35612487792969, 494.5279541015625, 541.8690795898438, 194.91334533691406, 98.47174835205078, 191.46875, 292.0575256347656, 30.08572769165039, 30.836318969726562, -2.002164840698242, 185.0408477783203, 550.7728881835938, -145.14891052246094, 632.508544921875, 236.19573974609375, 391.572509765625, 36.65943908691406, -31.713163375854492, -70.677490234375, 175.81365966796875, 346.8861389160156, 155.99717712402344, -19.90111541748047, 327.45648193359375, -5.646970748901367, 667.70751953125, 355.2130126953125, 366.8250732421875, 216.54788208007812, 322.90643310546875, 4.908117294311523, 267.3265686035156, 506.8952331542969, -114.79722595214844, 108.72445678710938, 29.61102294921875, 380.3995056152344, -14.8511962890625, 583.17333984375, -40.07893753051758, 116.20207214355469, 109.96858215332031, 290.19122314453125, 626.1846923828125, 459.64801025390625, 262.2261047363281, -70.18673706054688, 356.8755798339844, 56.1132698059082, 505.6115417480469, 303.9863586425781, 348.0865478515625, 323.2858581542969, 542.8380126953125, 158.31671142578125], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000651.npy"}
{"epoch": 0.9559471365638766, "step": 652, "batch_size": 64, "mean": 219.70716857910156, "std": 233.5047149658203, "min": -177.67950439453125, "p10": -20.286660575866698, "median": 180.88687896728516, "p90": 516.0010772705078, "max": 874.695068359375, "pos_frac": 0.828125, "sample": [145.65008544921875, 145.48626708984375, 521.6008911132812, -177.67950439453125, 108.95252990722656, 270.5954895019531, 312.3988037109375, -26.420021057128906, 443.275146484375, 126.36563873291016, 475.85699462890625, 469.75848388671875, 590.6773681640625, -21.397903442382812, 24.657386779785156, 35.629852294921875, 13.33056640625, 462.31134033203125, 617.2073974609375, -20.149606704711914, 313.888916015625, 91.58289337158203, 52.74132537841797, 212.98211669921875, 180.21153259277344, -0.609375, 181.56222534179688, 158.98915100097656, 162.330322265625, 199.1357421875, 198.4703826904297, 825.226806640625, -148.51480102539062, 874.695068359375, 264.51983642578125, 406.9072570800781, 232.06724548339844, -20.34539794921875, 5.523225784301758, 19.801055908203125, 230.1812286376953, 13.455314636230469, -2.8456077575683594, -50.310462951660156, 443.9942626953125, 278.9729309082031, 534.0118408203125, 27.554344177246094, 443.21649169921875, 156.36569213867188, 806.23193359375, 32.7717399597168, 427.29425048828125, 102.63704681396484, 35.27365493774414, 145.2494659423828, 502.9348449707031, -114.80757141113281, 229.4874267578125, 328.4014587402344, 185.88999938964844, 295.18621826171875, 285.5335998535156, -4.693855285644531], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000652.npy"}
{"epoch": 0.9574155653450808, "step": 653, "batch_size": 64, "mean": 218.42001342773438, "std": 219.9928436279297, "min": -214.81573486328125, "p10": -15.406936454772941, "median": 181.19309997558594, "p90": 527.48037109375, "max": 809.3201293945312, "pos_frac": 0.875, "sample": [298.417236328125, 7.071136474609375, 31.256147384643555, 809.3201293945312, 305.6812744140625, -18.581886291503906, 52.91267395019531, 26.8167724609375, 48.208251953125, 278.66680908203125, 527.764404296875, 192.45762634277344, -214.81573486328125, 550.4287719726562, 262.7539367675781, 358.9754638671875, 355.8183898925781, 286.140380859375, 620.9173583984375, 158.01812744140625, 82.13766479492188, 224.96571350097656, 472.4391784667969, 197.97659301757812, 274.63873291015625, 244.09715270996094, 509.56182861328125, 170.8883056640625, 167.39437866210938, -120.8042984008789, 485.49212646484375, 254.0005645751953, 581.8131713867188, -113.04953002929688, 231.07095336914062, 117.87272644042969, 90.97174072265625, 365.6673889160156, -114.74532318115234, 547.951904296875, 759.3963012695312, 342.77313232421875, -7.998720169067383, 6.085594177246094, 526.817626953125, 397.39697265625, 16.442153930664062, 133.9419708251953, 135.71908569335938, 113.17645263671875, 34.463714599609375, 138.931396484375, 522.4825439453125, 153.88754272460938, 231.30848693847656, 23.775806427001953, 63.830894470214844, 61.25384521484375, 381.62945556640625, 159.32118225097656, 191.49789428710938, -91.50044250488281, 105.37579345703125, -29.695419311523438], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000653.npy"}
{"epoch": 0.9588839941262849, "step": 654, "batch_size": 64, "mean": 189.29266357421875, "std": 267.07196044921875, "min": -376.8203125, "p10": -116.7412857055664, "median": 164.1509780883789, "p90": 500.3836975097656, "max": 1096.3756103515625, "pos_frac": 0.765625, "sample": [-40.01111602783203, 529.7330932617188, -118.0198974609375, 287.3885498046875, 135.10011291503906, 24.676589965820312, -270.8812255859375, 205.9549560546875, -95.65373229980469, 369.8487243652344, 233.32156372070312, 85.6103744506836, 136.1420440673828, 64.51837158203125, 63.574459075927734, -110.98645782470703, 73.03199005126953, 490.06878662109375, 313.2028503417969, 414.0685729980469, 281.0207214355469, 1096.3756103515625, 252.3428192138672, 56.14404296875, 289.10919189453125, 331.41802978515625, -113.75785827636719, 296.65020751953125, 57.015716552734375, -39.30406188964844, -23.739639282226562, 455.67059326171875, 371.5381774902344, 245.10958862304688, 29.2181396484375, 40.99036407470703, 623.0083618164062, -126.15261840820312, 13.990531921386719, 356.5999450683594, 499.60504150390625, 447.88116455078125, 400.24176025390625, 103.4885025024414, -376.8203125, 412.44207763671875, 116.723876953125, 504.3359069824219, 500.7174072265625, 114.90696716308594, 192.73248291015625, 546.4298095703125, 473.622802734375, 95.32547760009766, 192.159912109375, -165.46292114257812, 312.4393310546875, -160.79409790039062, 763.650390625, 270.9456481933594, 68.73898315429688, -105.07720947265625, -311.4397888183594, -65.99897003173828], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000654.npy"}
{"epoch": 0.960352422907489, "step": 655, "batch_size": 64, "mean": 198.965087890625, "std": 255.20118713378906, "min": -292.77801513671875, "p10": -67.45275726318359, "median": 124.44734954833984, "p90": 656.7660400390625, "max": 869.0722045898438, "pos_frac": 0.796875, "sample": [-54.75140380859375, -17.603168487548828, 746.7793579101562, 109.83163452148438, 71.0820083618164, 684.0065307617188, 109.9168701171875, 445.232421875, 31.544462203979492, 1.4560184478759766, 247.21823120117188, 14.846738815307617, 44.585205078125, -292.77801513671875, 63.564781188964844, 239.362548828125, 40.453285217285156, 654.1297607421875, 27.449661254882812, 462.2264099121094, 124.50469970703125, 302.6842041015625, 584.8626708984375, 128.94351196289062, -17.939584732055664, 75.37805938720703, 310.4424743652344, 287.8963317871094, -157.6100311279297, 194.8289794921875, -76.26436614990234, 678.4189453125, 124.38999938964844, 88.09738159179688, 367.37725830078125, 239.38787841796875, -215.90374755859375, 369.8973388671875, 320.96533203125, 363.33038330078125, 60.497161865234375, 56.968177795410156, 387.2316589355469, 869.0722045898438, 667.8901977539062, 88.97795104980469, 171.32891845703125, -115.04637145996094, 337.673828125, 269.581298828125, -34.75016784667969, -79.4976806640625, 150.73992919921875, -34.54243850708008, 657.8958740234375, 20.220294952392578, 708.428466796875, -72.89619445800781, -39.43812561035156, 313.7535400390625, 110.46472930908203, 200.95997619628906, 101.2427978515625, 214.76959228515625], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000655.npy"}
{"epoch": 0.9618208516886931, "step": 656, "batch_size": 64, "mean": 298.8016357421875, "std": 209.62393188476562, "min": -49.156646728515625, "p10": 44.69600906372071, "median": 278.16668701171875, "p90": 631.8154907226564, "max": 738.307373046875, "pos_frac": 0.921875, "sample": [738.307373046875, 360.0824279785156, -32.73334503173828, 30.312511444091797, 114.40576934814453, 278.1624755859375, 116.08338928222656, 340.7245178222656, 169.8667449951172, 366.4991149902344, 214.60984802246094, 292.68817138671875, 224.80299377441406, 260.3307800292969, 213.2227783203125, 525.058837890625, 66.98858642578125, 276.2563171386719, 505.26702880859375, 227.31784057617188, 452.0128173828125, -38.85302734375, 695.5792846679688, -48.5355224609375, -49.156646728515625, 47.715309143066406, 393.405517578125, 566.3479614257812, 457.19830322265625, 651.6226196289062, 188.15135192871094, 667.6881713867188, 444.3002014160156, -30.977405548095703, 617.2113647460938, 363.86553955078125, 375.3146667480469, 334.6515197753906, 568.3289794921875, 87.47164916992188, 671.2332763671875, 142.42738342285156, 493.72955322265625, 305.7860107421875, 278.4007263183594, 421.4393310546875, 278.1708984375, 725.4320068359375, 163.19476318359375, 381.9104919433594, 383.8968505859375, 240.2341766357422, 93.19799041748047, 82.44572448730469, 638.0744018554688, 155.51065063476562, 43.40202331542969, 86.32025146484375, 227.7445068359375, 79.66169738769531, 332.94207763671875, 277.130615234375, 244.594482421875, 344.8254089355469], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000656.npy"}
{"epoch": 0.9632892804698973, "step": 657, "batch_size": 64, "mean": 271.0969543457031, "std": 234.41346740722656, "min": -206.9505615234375, "p10": 20.45958900451661, "median": 269.33119201660156, "p90": 592.1700256347657, "max": 916.6304931640625, "pos_frac": 0.921875, "sample": [255.96182250976562, 8.679737091064453, 112.20135498046875, 258.988525390625, 612.8248901367188, 890.22802734375, 371.0221862792969, 82.33155822753906, 481.3907470703125, -14.277099609375, -206.9505615234375, 190.94386291503906, 435.47674560546875, 65.37312316894531, 296.40130615234375, 342.56158447265625, 25.895187377929688, 294.63165283203125, 94.44567108154297, 567.6332397460938, 554.10009765625, 404.5521545410156, 92.15892791748047, 828.4293212890625, 415.0810852050781, 640.8104858398438, 100.37559509277344, 154.6868896484375, 243.812744140625, 229.05674743652344, 193.73699951171875, 339.64501953125, 18.130046844482422, -72.42921447753906, 407.7641906738281, 436.03741455078125, 303.03436279296875, 126.67169952392578, 114.75946044921875, 153.99203491210938, 332.0177307128906, 28.116363525390625, 31.279470443725586, 292.3361511230469, 668.673828125, 116.8340072631836, 281.3594970703125, 364.91021728515625, 383.17791748046875, 93.67584991455078, 602.685791015625, 383.68048095703125, 332.51092529296875, 366.30706787109375, 252.01220703125, 142.17434692382812, 389.3586120605469, 916.6304931640625, 279.6738586425781, -197.49887084960938, 84.50130462646484, -53.56477355957031, 127.51274871826172, 281.6693115234375], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000657.npy"}
{"epoch": 0.9647577092511013, "step": 658, "batch_size": 64, "mean": 223.59996032714844, "std": 197.50279235839844, "min": -239.38888549804688, "p10": -51.53681678771969, "median": 245.5157699584961, "p90": 436.37620239257814, "max": 699.1863403320312, "pos_frac": 0.875, "sample": [426.17083740234375, 433.38330078125, 133.12637329101562, -101.56288146972656, -11.647441864013672, 146.29364013671875, 480.2625732421875, 202.9258270263672, 25.981338500976562, 359.9342956542969, 234.4004669189453, 662.5440673828125, 105.31742858886719, 345.3667907714844, 198.30914306640625, 21.263992309570312, 37.35508346557617, 125.92623901367188, 147.98605346679688, -162.3108673095703, 322.5684509277344, 41.99217987060547, 254.86924743652344, 112.36515808105469, -211.1095428466797, 447.949951171875, 140.70272827148438, 320.17803955078125, 387.0133972167969, -75.16815948486328, 83.60212707519531, 91.48236083984375, 82.18829345703125, 402.33807373046875, 284.20196533203125, 699.1863403320312, 31.263174057006836, 339.9537353515625, 332.6506652832031, 402.5085144042969, 169.5655059814453, 279.71075439453125, 410.7381591796875, 368.5256042480469, 323.52069091796875, 361.56793212890625, 349.9468688964844, 309.6303405761719, 326.94403076171875, 389.8840026855469, 183.2574005126953, 437.65887451171875, -68.63226318359375, 445.4773864746094, 279.5020751953125, 236.16229248046875, 365.1943359375, 509.4789123535156, -239.38888549804688, 288.2261047363281, 122.80860900878906, 145.08401489257812, 149.31988525390625, -135.54815673828125], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000658.npy"}
{"epoch": 0.9662261380323054, "step": 659, "batch_size": 64, "mean": 238.97952270507812, "std": 228.1735076904297, "min": -215.20030212402344, "p10": -56.16428413391112, "median": 223.78759002685547, "p90": 547.6370971679688, "max": 812.7567138671875, "pos_frac": 0.828125, "sample": [338.3213806152344, -105.87222290039062, 182.2977294921875, 353.123046875, 214.68540954589844, 134.5124969482422, 812.7567138671875, 152.52581787109375, -142.2229461669922, 50.72282409667969, 689.236083984375, 134.495849609375, 279.8111572265625, -95.58793640136719, 334.43133544921875, 76.41326141357422, 254.77870178222656, 271.1786193847656, 274.32513427734375, 268.8307800292969, 150.02133178710938, 201.69134521484375, -215.20030212402344, 770.4862670898438, 112.9483871459961, 689.1090087890625, 279.2807922363281, 129.39547729492188, 556.516357421875, 304.85198974609375, 225.98001098632812, 246.24822998046875, -0.5992355346679688, 145.78941345214844, 335.26922607421875, 221.5951690673828, 390.5888671875, 147.02886962890625, 27.21807861328125, 496.9283447265625, 86.02314758300781, -29.51671600341797, 397.648681640625, -63.76515579223633, 526.9188232421875, -34.527469635009766, -38.428916931152344, 279.22796630859375, 451.6616516113281, 658.7691650390625, 517.3457641601562, 237.13124084472656, 160.3062744140625, 288.5680847167969, 388.6395263671875, 156.90513610839844, 434.19805908203125, 312.1407775878906, 591.27197265625, -138.58558654785156, 218.41114807128906, 120.67762756347656, -115.9869384765625, 195.7452392578125], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000659.npy"}
{"epoch": 0.9676945668135095, "step": 660, "batch_size": 64, "mean": 246.0765838623047, "std": 209.53846740722656, "min": -357.40570068359375, "p10": 16.67661590576172, "median": 248.03688049316406, "p90": 479.4640686035156, "max": 901.918212890625, "pos_frac": 0.90625, "sample": [109.55033874511719, 223.770263671875, 253.61256408691406, 382.39813232421875, 41.82484436035156, 299.23248291015625, 411.50457763671875, 318.17578125, 221.47216796875, 379.2935791015625, 115.43612670898438, 21.967960357666016, 362.7575378417969, 268.425537109375, 273.0152587890625, 644.6047973632812, 154.38417053222656, -47.01734924316406, 242.46119689941406, 115.26461791992188, 334.5084228515625, 366.9732666015625, 220.08383178710938, 361.7452087402344, 480.36712646484375, 675.9049072265625, 87.57588958740234, -77.89067840576172, 407.8008728027344, 299.1435546875, 133.65127563476562, 504.25830078125, -22.62200927734375, 699.440673828125, 15.49755859375, 327.46636962890625, 389.49603271484375, 57.29426574707031, 213.321044921875, 267.23626708984375, 100.04960632324219, 273.104736328125, 332.5118713378906, 331.8531799316406, 117.58590698242188, 111.43753051757812, 115.09391021728516, 901.918212890625, 576.2432861328125, 338.29571533203125, 19.427749633789062, 233.05728149414062, -102.78141784667969, 439.0618591308594, 477.35693359375, 100.85536193847656, 161.27212524414062, 183.08767700195312, 265.22613525390625, 351.79412841796875, 155.73919677734375, -357.40570068359375, -48.52367401123047, 139.25387573242188], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000660.npy"}
{"epoch": 0.9691629955947136, "step": 661, "batch_size": 64, "mean": 234.89878845214844, "std": 231.2547607421875, "min": -351.50469970703125, "p10": -39.64905490875242, "median": 215.47815704345703, "p90": 504.03325500488285, "max": 877.5492553710938, "pos_frac": 0.875, "sample": [180.5660858154297, 81.98519134521484, 361.3681335449219, -18.282609939575195, 216.78594970703125, -48.80610275268555, 277.9062194824219, -123.0350112915039, 344.0764465332031, 363.376953125, 538.5134887695312, 281.7630310058594, 151.09840393066406, 607.55615234375, 877.5492553710938, 392.1466064453125, 238.77243041992188, 45.96466064453125, -76.97720336914062, 104.90554809570312, 214.1703643798828, 390.72003173828125, 175.81185913085938, 134.6106719970703, 50.429649353027344, 380.95562744140625, 163.45887756347656, 451.3456726074219, 691.4541625976562, 26.295272827148438, -88.21015930175781, 447.5960388183594, 196.66232299804688, 510.01898193359375, 196.5366973876953, 154.69277954101562, 283.93109130859375, 154.56967163085938, 230.17257690429688, 183.3594970703125, -113.48589324951172, 249.4409942626953, 33.22914123535156, 395.4993896484375, 213.77236938476562, -220.7403564453125, 246.66299438476562, 312.2845764160156, 35.16124725341797, 472.43487548828125, 689.1776123046875, -351.50469970703125, 210.8667755126953, 22.614545822143555, 391.9839172363281, 329.3939514160156, 56.87700653076172, 490.0665588378906, 29.023073196411133, 735.1121826171875, 415.1679992675781, 365.2358093261719, 240.52638244628906, 38.90727996826172], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000661.npy"}
{"epoch": 0.9706314243759178, "step": 662, "batch_size": 64, "mean": 197.9848175048828, "std": 217.50990295410156, "min": -281.42437744140625, "p10": -89.3887329101562, "median": 207.95653533935547, "p90": 497.2833557128907, "max": 656.175537109375, "pos_frac": 0.8125, "sample": [99.1784439086914, 280.37164306640625, 622.5335083007812, -281.42437744140625, 473.926513671875, 35.91328430175781, 242.2234649658203, 184.80279541015625, 656.175537109375, 242.60757446289062, 439.58349609375, 3.951690673828125, -178.17945861816406, 162.63804626464844, 320.8170471191406, 112.76824188232422, 209.08396911621094, 119.01106262207031, 598.5994873046875, 244.06719970703125, 516.457275390625, 32.51556396484375, 252.37124633789062, 99.1710205078125, 119.36260986328125, 291.4549560546875, 137.2205352783203, 469.7301025390625, 9.282257080078125, 39.80474853515625, 220.78851318359375, -158.39154052734375, 240.4747772216797, -153.2598876953125, 210.2956085205078, 375.42535400390625, 139.53359985351562, 349.1513977050781, 424.0774230957031, 287.51300048828125, 30.43975067138672, 210.78517150878906, 501.4076232910156, 533.2576293945312, 294.32110595703125, 500.82379150390625, -264.11383056640625, 206.8291015625, 190.2326202392578, 194.5129852294922, 189.98834228515625, -19.934757232666016, -9.971725463867188, 196.64117431640625, -155.52667236328125, -26.955846786499023, 487.34552001953125, 489.0223388671875, 247.38693237304688, 264.87408447265625, 276.96722412109375, -116.14568328857422, -15.87725830078125, -26.909378051757812], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000662.npy"}
{"epoch": 0.9720998531571219, "step": 663, "batch_size": 64, "mean": 236.65377807617188, "std": 256.0628356933594, "min": -323.76263427734375, "p10": -30.685945129394515, "median": 200.3309555053711, "p90": 540.5314147949218, "max": 1279.944580078125, "pos_frac": 0.875, "sample": [441.73992919921875, 193.4203643798828, 402.75616455078125, 95.65359497070312, 342.3690185546875, 675.3529663085938, 133.0539093017578, 16.621307373046875, -15.4725341796875, 244.32882690429688, 123.24102020263672, 66.86773681640625, 558.6995849609375, 374.04736328125, 44.81941223144531, 326.7164001464844, 275.9998474121094, 261.9688720703125, -266.5782165527344, 75.08734130859375, 152.47882080078125, 333.9278564453125, -52.453304290771484, 333.5107727050781, 286.28070068359375, 210.22509765625, 328.7375183105469, 104.08651733398438, 412.330322265625, 1279.944580078125, 47.02949523925781, 530.9835205078125, 734.9581909179688, -58.22190856933594, 130.52523803710938, 641.96826171875, 524.3038330078125, 135.58912658691406, 51.359954833984375, 540.8251342773438, 539.8460693359375, 275.8830261230469, 221.35107421875, 75.35930633544922, 204.0194091796875, 262.60333251953125, 196.6425018310547, 107.39836120605469, 160.88870239257812, 694.8958740234375, 341.4861145019531, 209.04681396484375, 535.9451293945312, -323.76263427734375, 255.0087890625, 32.64506149291992, -74.77310180664062, 54.40630340576172, 157.70590209960938, 11.925178527832031, -37.20597839355469, 120.34420013427734, -37.859832763671875, 122.96019744873047], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000663.npy"}
{"epoch": 0.973568281938326, "step": 664, "batch_size": 64, "mean": 241.583984375, "std": 250.89678955078125, "min": -668.148681640625, "p10": -21.10830287933349, "median": 227.3170166015625, "p90": 581.1265808105469, "max": 786.5048828125, "pos_frac": 0.875, "sample": [634.2179565429688, 184.22520446777344, 172.57638549804688, 562.5795288085938, 359.3956604003906, 427.9164123535156, 145.66238403320312, 589.0753173828125, 49.408607482910156, 601.8206787109375, -179.15155029296875, 525.5427856445312, 278.99114990234375, 91.4239273071289, 673.3197021484375, 128.36753845214844, -117.88565063476562, 315.379150390625, 117.84793853759766, 303.54058837890625, 231.17645263671875, 290.0275573730469, -16.0174560546875, 611.933349609375, -62.86776351928711, 8.431732177734375, 681.6476440429688, -43.37807846069336, 116.17570495605469, -148.17947387695312, 523.2971801757812, 247.89779663085938, 0.31732177734375, 188.32855224609375, 175.24002075195312, 362.7586669921875, -668.148681640625, 163.03720092773438, 518.9757690429688, 333.61676025390625, 42.13380432128906, 258.5246887207031, 223.45758056640625, 180.1463623046875, 498.9815368652344, 786.5048828125, 290.23846435546875, 168.84747314453125, 532.4771728515625, 122.37158203125, 84.15054321289062, 500.7796630859375, 393.0804443359375, 69.63307189941406, 305.1591491699219, 284.1406555175781, 296.3524475097656, 66.74493408203125, 105.53160095214844, 33.260162353515625, 266.3666687011719, 108.34829711914062, 488.9098815917969, -23.29009437561035], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000664.npy"}
{"epoch": 0.9750367107195301, "step": 665, "batch_size": 64, "mean": 233.77598571777344, "std": 216.07554626464844, "min": -201.9996795654297, "p10": -5.855768966674801, "median": 201.84063720703125, "p90": 500.68367004394537, "max": 961.4640502929688, "pos_frac": 0.875, "sample": [5.506202697753906, 116.25814819335938, 208.68649291992188, 46.969573974609375, 152.69097900390625, 372.8244323730469, 212.67047119140625, 139.696044921875, 581.6300048828125, 196.4719696044922, 277.0888671875, 406.81414794921875, -11.686248779296875, 288.76519775390625, 177.49835205078125, -56.929779052734375, -78.99630737304688, 164.04379272460938, 113.69886779785156, 333.9835205078125, 269.5312805175781, 292.09619140625, 372.0782165527344, 300.1623229980469, -74.26945495605469, 88.74385833740234, -201.9996795654297, 505.256591796875, 472.58282470703125, -2.069965362548828, 146.979248046875, 148.51312255859375, 535.9570922851562, 325.8794250488281, 490.0135192871094, 111.07440948486328, 407.3167419433594, 458.02935791015625, 79.1029281616211, 204.11471557617188, 140.76174926757812, 638.792236328125, 418.0696105957031, -88.04913330078125, 228.28387451171875, 649.0386962890625, 7.5127410888671875, 158.02960205078125, 265.31610107421875, 322.2679138183594, 158.34303283691406, 293.44342041015625, 363.6221923828125, 106.84652709960938, 714.482666015625, 1.2569427490234375, 56.07072448730469, 38.892295837402344, 199.56655883789062, 444.2857360839844, 961.4640502929688, -7.4782562255859375, 219.5404052734375, 94.52609252929688], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000665.npy"}
{"epoch": 0.9765051395007343, "step": 666, "batch_size": 64, "mean": 220.32644653320312, "std": 207.8836669921875, "min": -269.97503662109375, "p10": -10.442356872558591, "median": 229.1574935913086, "p90": 482.40513916015624, "max": 799.8011474609375, "pos_frac": 0.859375, "sample": [287.3393859863281, 70.89585876464844, 391.498291015625, 225.0139923095703, -134.93197631835938, 242.26292419433594, 375.82415771484375, 331.8259582519531, 79.73828887939453, 545.876708984375, -11.559158325195312, 108.18759155273438, 233.30099487304688, 9.984672546386719, -33.41175079345703, 133.41380310058594, 479.7137451171875, 32.70146942138672, -33.757568359375, 250.73776245117188, 511.21685791015625, 138.97332763671875, 374.09625244140625, 330.44659423828125, 799.8011474609375, -3.3182525634765625, 452.77081298828125, 104.35115051269531, 25.492210388183594, -269.97503662109375, 300.3750305175781, 322.03863525390625, 337.25787353515625, 483.55859375, 430.4754943847656, 204.86865234375, 429.3518371582031, -7.83648681640625, 140.49794006347656, 258.4209289550781, 59.44845199584961, 35.019874572753906, -109.19774627685547, 584.398681640625, 165.5932159423828, 363.87603759765625, -199.30007934570312, 557.4345703125, 170.78330993652344, 82.50173950195312, 356.71435546875, 356.6358337402344, 146.01824951171875, 84.51153564453125, 30.78076934814453, 58.768707275390625, 153.9039306640625, 297.39447021484375, 574.03515625, 290.40936279296875, 78.42488861083984, 360.83990478515625, 302.7508850097656, 351.6282043457031], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000666.npy"}
{"epoch": 0.9779735682819384, "step": 667, "batch_size": 64, "mean": 228.5996856689453, "std": 200.1288299560547, "min": -259.1575927734375, "p10": 16.675593566894538, "median": 219.87749481201172, "p90": 553.0620849609376, "max": 703.150146484375, "pos_frac": 0.921875, "sample": [188.93820190429688, 23.29271697998047, 215.6828155517578, -259.1575927734375, 48.373504638671875, 420.6454772949219, 284.5927734375, 224.07217407226562, 289.24932861328125, 558.8399047851562, 137.89271545410156, 40.129180908203125, 78.83749389648438, 247.67710876464844, 91.64484405517578, 154.06356811523438, 348.88861083984375, 176.67904663085938, 601.37841796875, 177.6962432861328, 248.80685424804688, 56.290103912353516, 560.751220703125, 240.88409423828125, 279.4726257324219, 135.86944580078125, 417.72222900390625, 169.4677734375, 278.5561218261719, 463.9597473144531, 164.51931762695312, 243.46142578125, 515.4938354492188, 314.155029296875, 289.0700378417969, -71.21432495117188, 87.32196044921875, 118.86732482910156, 295.8894348144531, 41.20049285888672, -104.91178894042969, 643.2133178710938, 703.150146484375, 6.615653991699219, 539.5805053710938, 596.4248046875, 282.6205749511719, 50.476768493652344, 230.4410400390625, 319.2070007324219, 13.839683532714844, 315.782470703125, -130.6260223388672, 380.92987060546875, 588.3298950195312, 318.5988464355469, 66.78792572021484, 211.84194946289062, 41.967979431152344, 81.65786743164062, -31.459793090820312, 132.66159057617188, 199.17501831054688, 274.1129150390625], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000667.npy"}
{"epoch": 0.9794419970631424, "step": 668, "batch_size": 64, "mean": 196.06912231445312, "std": 252.5818634033203, "min": -386.4540710449219, "p10": -96.28820953369137, "median": 211.70726776123047, "p90": 500.90134887695314, "max": 1007.6311645507812, "pos_frac": 0.796875, "sample": [392.4668884277344, 152.66102600097656, 752.7757568359375, -356.0936279296875, 41.4298210144043, 237.49929809570312, 391.7452087402344, 161.1322479248047, 355.951171875, 200.22003173828125, 83.98814392089844, -189.9311065673828, 81.06267547607422, 139.55563354492188, 256.23748779296875, 142.9528350830078, 222.4848175048828, 21.727432250976562, -202.330322265625, -48.1083869934082, 100.91478729248047, 391.80902099609375, -163.86056518554688, 343.55718994140625, 245.35858154296875, 248.47613525390625, 62.567169189453125, 43.833221435546875, 8.103179931640625, 327.4615173339844, 243.591552734375, 299.09307861328125, 270.5238037109375, -57.737274169921875, 576.4586791992188, 210.6095733642578, -386.4540710449219, 525.0242919921875, 212.80496215820312, -6.785680770874023, 366.46502685546875, 410.9732666015625, 501.748046875, 124.20911407470703, -191.06201171875, 515.9730834960938, 124.5855712890625, -107.59425354003906, 242.73312377929688, -26.3668270111084, 281.249755859375, 29.08507537841797, 131.25967407226562, 285.806884765625, -69.90744018554688, 498.92572021484375, 343.69647216796875, 296.99298095703125, 1007.6311645507812, 225.16941833496094, -0.12979793548583984, 469.91827392578125, 35.336395263671875, 718.9490966796875], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000668.npy"}
{"epoch": 0.9809104258443465, "step": 669, "batch_size": 64, "mean": 256.8287048339844, "std": 252.32884216308594, "min": -411.86749267578125, "p10": 18.37370357513428, "median": 261.7657165527344, "p90": 552.002508544922, "max": 999.2459106445312, "pos_frac": 0.90625, "sample": [241.5078887939453, 34.05845642089844, 29.22308921813965, 602.7064208984375, -311.43060302734375, 619.5853271484375, 364.40350341796875, 384.84881591796875, 184.50205993652344, -31.516748428344727, 999.2459106445312, -58.87143325805664, 19.347198486328125, 248.67971801757812, -62.826194763183594, 187.98684692382812, 526.6087036132812, 17.956491470336914, 129.2625732421875, 248.10166931152344, 277.80303955078125, 240.85472106933594, 513.927978515625, 434.056884765625, 292.1234436035156, 475.40008544921875, 54.604736328125, 532.0338134765625, 662.2080688476562, 82.6898193359375, 279.42095947265625, 808.7175903320312, 278.6546325683594, 465.1917419433594, 588.211181640625, -345.20489501953125, 53.301517486572266, 283.9080810546875, 559.784912109375, 327.8475646972656, 163.8040771484375, 233.00564575195312, 53.550445556640625, 279.55609130859375, -411.86749267578125, 274.8517150878906, 194.57130432128906, 533.8435668945312, 220.09646606445312, 138.02655029296875, 53.035179138183594, 352.8276062011719, 118.03228759765625, 86.48086547851562, 318.7816467285156, 438.60906982421875, 42.588897705078125, 294.1624755859375, 377.1567077636719, 498.1677551269531, 104.58578491210938, 388.3595886230469, 151.36782836914062, 294.5262756347656], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000669.npy"}
{"epoch": 0.9823788546255506, "step": 670, "batch_size": 64, "mean": 269.3789367675781, "std": 246.0531463623047, "min": -264.8541564941406, "p10": 7.588070678710944, "median": 234.60785675048828, "p90": 629.8450500488282, "max": 781.4740600585938, "pos_frac": 0.90625, "sample": [249.90853881835938, 383.9257507324219, 177.48281860351562, 700.9568481445312, 215.31288146972656, -20.9271183013916, -120.21488189697266, 18.094816207885742, 436.3976745605469, 561.6434936523438, 262.893798828125, 102.4998779296875, -264.8541564941406, 672.0309448242188, 498.197998046875, 781.4740600585938, 358.92352294921875, 185.34317016601562, 584.0185546875, 117.39248657226562, 147.88954162597656, -165.59548950195312, 678.650146484375, 4.677238464355469, 183.11097717285156, 283.09637451171875, 395.4560852050781, 132.93997192382812, 328.5115051269531, 31.350372314453125, 537.693115234375, 552.8046264648438, 104.72764587402344, 335.8874206542969, 286.51531982421875, 775.79541015625, 518.3914184570312, -10.512523651123047, 230.51351928710938, 550.8446655273438, 24.676441192626953, 632.9885864257812, 206.51817321777344, 113.30364990234375, 622.5101318359375, 368.53106689453125, 255.96739196777344, 233.29286193847656, -204.59825134277344, 14.380012512207031, 72.40145874023438, 695.0634765625, 173.7120361328125, 26.79425048828125, 368.627197265625, 29.4570255279541, 195.40115356445312, 303.1466064453125, 151.693115234375, 387.38909912109375, 148.56692504882812, 355.419677734375, 25.83905029296875, 235.9228515625], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000670.npy"}
{"epoch": 0.9838472834067548, "step": 671, "batch_size": 64, "mean": 218.90988159179688, "std": 218.3164520263672, "min": -276.28521728515625, "p10": -52.232144165039045, "median": 214.52664947509766, "p90": 500.28289184570315, "max": 697.3895263671875, "pos_frac": 0.84375, "sample": [46.07725524902344, -77.11753845214844, 125.21389770507812, 224.14108276367188, -59.52081298828125, 485.5302734375, 109.34921264648438, 137.3979034423828, 142.60638427734375, 385.7820739746094, 229.639892578125, 131.57620239257812, 287.70257568359375, 354.7438049316406, 279.9428405761719, 15.822250366210938, 99.55366516113281, 140.55621337890625, 455.9139404296875, 298.12713623046875, -19.895227432250977, 623.0731811523438, -276.28521728515625, 594.557861328125, 471.07928466796875, 204.91221618652344, 606.1784057617188, 344.0862731933594, 355.6339111328125, 308.7825012207031, 78.91094970703125, 29.913414001464844, -75.61538696289062, 284.983642578125, 464.2426452636719, -90.44730377197266, 226.4025115966797, 100.34164428710938, 400.405029296875, 349.5770263671875, 606.7971801757812, 93.20199584960938, 118.66670227050781, 127.6937484741211, 398.6798400878906, 130.09239196777344, 202.0556640625, 548.798095703125, -118.69583892822266, 19.434471130371094, 436.61328125, 246.94552612304688, -10.894144058227539, 63.52294921875, 255.00064086914062, 246.37120056152344, -212.63967895507812, -35.225250244140625, 19.51258659362793, 504.781005859375, 45.38426971435547, 697.3895263671875, 489.78729248046875, 343.0827941894531], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000671.npy"}
{"epoch": 0.9853157121879589, "step": 672, "batch_size": 64, "mean": 202.13330078125, "std": 231.7322998046875, "min": -248.20138549804688, "p10": -64.52402191162109, "median": 189.1356430053711, "p90": 531.6580200195315, "max": 811.8507690429688, "pos_frac": 0.796875, "sample": [202.1356964111328, -9.932037353515625, -30.57799530029297, 99.50806427001953, 1.2439994812011719, -66.58628845214844, 423.685546875, 277.55609130859375, 561.0045166015625, -0.8283920288085938, 87.59105682373047, 94.25078582763672, -247.06546020507812, 30.05367088317871, 231.15908813476562, 270.5706481933594, 79.53933715820312, -80.05641174316406, 367.37451171875, 427.2613830566406, 62.89346694946289, 461.1945495605469, 31.39403533935547, -59.712066650390625, 321.55291748046875, 109.18932342529297, 621.5103759765625, -154.0714111328125, 177.3243408203125, 811.8507690429688, -53.48626708984375, 613.987060546875, 616.72216796875, 404.21783447265625, 454.2493896484375, 334.8266296386719, 270.297607421875, 101.56707763671875, 188.17965698242188, 590.8533325195312, -209.79815673828125, 102.6208267211914, 284.64569091796875, 262.0548400878906, 214.29476928710938, 349.2313232421875, 170.69406127929688, 108.91259765625, 253.3748779296875, 146.72604370117188, 303.6960144042969, 201.02395629882812, -20.027069091796875, 313.3652648925781, 84.43572998046875, 188.71820068359375, 189.55308532714844, 273.21868896484375, 158.17430114746094, -244.16192626953125, 364.3393859863281, 463.182861328125, 604.0277099609375, -248.20138549804688], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000672.npy"}
{"epoch": 0.986784140969163, "step": 673, "batch_size": 64, "mean": 250.54046630859375, "std": 197.2724609375, "min": -254.60000610351562, "p10": -43.81803359985348, "median": 255.8961639404297, "p90": 466.8782012939453, "max": 706.0402221679688, "pos_frac": 0.875, "sample": [-59.41460037231445, 177.65185546875, 156.03736877441406, 408.89300537109375, 57.42896270751953, 403.26837158203125, 375.6285705566406, 215.2435760498047, 706.0402221679688, 318.2633361816406, -107.03788757324219, 590.6778564453125, 260.9455871582031, 458.2384948730469, 198.7921600341797, 379.36285400390625, -59.583160400390625, 306.7113342285156, 660.9315795898438, 200.15585327148438, 100.81056213378906, 189.16949462890625, 303.2794189453125, -64.95982360839844, 369.86627197265625, -116.2970962524414, 216.65399169921875, 283.20660400390625, -128.17440795898438, 92.26454162597656, 281.9475402832031, 418.143310546875, 188.97816467285156, 199.9989471435547, 271.23974609375, 607.6417846679688, 343.47760009765625, 135.4750213623047, 443.371337890625, 337.8405456542969, 395.5368957519531, -7.426046371459961, 442.5389099121094, 364.6242370605469, 250.84674072265625, 51.595008850097656, 58.25956726074219, 79.54579162597656, 330.5250549316406, 569.753662109375, 286.912353515625, 197.4967803955078, 361.1459655761719, 402.2457580566406, -254.60000610351562, 359.01104736328125, 185.5778045654297, 470.5809326171875, 248.65985107421875, 138.85995483398438, 167.13235473632812, 132.0927734375, 500.4801940917969, 181.02584838867188], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000673.npy"}
{"epoch": 0.9882525697503671, "step": 674, "batch_size": 64, "mean": 257.46026611328125, "std": 240.81076049804688, "min": -170.24044799804688, "p10": -24.369216156005848, "median": 247.0787582397461, "p90": 560.4673461914065, "max": 1210.9384765625, "pos_frac": 0.84375, "sample": [192.8819580078125, 43.00875473022461, 324.5091552734375, 199.6273651123047, 176.27554321289062, 413.96966552734375, 76.57591247558594, 123.62593078613281, -170.24044799804688, 594.9948120117188, 368.2162780761719, 502.2876281738281, -56.90116500854492, 493.34295654296875, 51.23262023925781, 131.21963500976562, -14.528352737426758, 731.8823852539062, 351.9812927246094, 1210.9384765625, 291.3817138671875, 230.527587890625, -31.738666534423828, 437.1312561035156, 172.22879028320312, 116.484130859375, 363.2417297363281, 297.29656982421875, 340.81011962890625, 104.11273193359375, 360.96881103515625, 800.9835815429688, -12.87704849243164, 261.0743408203125, 260.443115234375, 585.27685546875, -28.586729049682617, -32.69915008544922, 101.88987731933594, 409.23834228515625, 245.04864501953125, 128.51632690429688, 351.7445068359375, 442.141845703125, 128.30902099609375, 580.3919677734375, 245.10025024414062, 296.7581787109375, 112.28358459472656, 249.05726623535156, -101.4831314086914, -11.438037872314453, 647.320556640625, -153.47698974609375, 225.36935424804688, 343.021240234375, 268.8916015625, 513.9765625, 287.6767883300781, 303.61993408203125, 275.3304748535156, 9.004472732543945, 165.6126708984375, 152.59092712402344], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000674.npy"}
{"epoch": 0.9897209985315712, "step": 675, "batch_size": 64, "mean": 284.1156005859375, "std": 229.24940490722656, "min": -101.23188781738281, "p10": -6.188272094726559, "median": 266.2618713378906, "p90": 584.1043884277344, "max": 958.3593139648438, "pos_frac": 0.875, "sample": [208.97691345214844, 428.5107727050781, 312.8394775390625, 174.42286682128906, 202.294677734375, 189.1980743408203, 486.77276611328125, 40.35725402832031, 230.27122497558594, 335.4913635253906, 449.7032470703125, -31.89588165283203, 452.6702575683594, 521.3554077148438, 84.72919464111328, -7.7484130859375, -2.547943115234375, 505.4782409667969, 312.4806823730469, 168.04653930664062, 362.6754455566406, 115.71089935302734, 958.3593139648438, 386.37933349609375, 52.97423553466797, 105.7200698852539, 35.450862884521484, 534.2304077148438, 296.2795104980469, 71.49322509765625, 339.22113037109375, 436.1297607421875, 277.85919189453125, 452.2944641113281, 140.55136108398438, 434.56573486328125, 709.055908203125, -101.23188781738281, 592.9415893554688, 668.977783203125, 158.2061309814453, 107.00387573242188, 29.844154357910156, 249.38986206054688, 325.76611328125, -50.889556884765625, -11.362533569335938, 258.63787841796875, 405.6396789550781, 352.4249572753906, 146.64418029785156, 563.4842529296875, 190.39700317382812, -71.66204833984375, -27.262229919433594, 276.9808349609375, 184.32293701171875, 360.05926513671875, 273.8858642578125, 640.108154296875, 116.44583129882812, 738.72119140625, 786.9744873046875, 248.59158325195312], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000675.npy"}
{"epoch": 0.9911894273127754, "step": 676, "batch_size": 64, "mean": 236.58206176757812, "std": 196.50050354003906, "min": -204.1931915283203, "p10": 18.04765090942383, "median": 216.7310791015625, "p90": 484.66416015625003, "max": 710.653564453125, "pos_frac": 0.921875, "sample": [130.50634765625, 487.5899658203125, 425.43511962890625, 340.02227783203125, 398.52471923828125, 48.271385192871094, 395.7348937988281, 221.47598266601562, -12.917346954345703, -204.1931915283203, 449.73480224609375, 39.753021240234375, 85.24205017089844, 211.98617553710938, 401.321533203125, 661.3606567382812, 207.83612060546875, 258.1798095703125, 64.05226135253906, -66.15542602539062, 57.089271545410156, 40.44926452636719, 477.8372802734375, -17.137115478515625, 72.40438842773438, 210.65573120117188, 64.44657897949219, 184.2105712890625, 286.2690124511719, 619.2581176757812, 117.20925903320312, 335.2360534667969, 235.62066650390625, 419.6759948730469, -5.680595397949219, 64.42355346679688, 434.836669921875, 250.2055206298828, 257.7659606933594, 710.653564453125, 271.070068359375, 185.34805297851562, 175.09922790527344, 79.21736145019531, 432.6144714355469, 21.822967529296875, 13.90057373046875, 413.9923095703125, 164.04949951171875, 65.27843475341797, 538.877197265625, 343.9404296875, 418.9534912109375, 289.4334716796875, 17.61005401611328, 283.314208984375, 516.6873779296875, 49.47673797607422, 549.993408203125, 25.185699462890625, 197.20745849609375, 330.5203857421875, 19.068710327148438, 379.39898681640625], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000676.npy"}
{"epoch": 0.9926578560939795, "step": 677, "batch_size": 64, "mean": 247.37344360351562, "std": 201.66339111328125, "min": -178.9285125732422, "p10": -22.219221496582023, "median": 248.7402801513672, "p90": 509.5891387939453, "max": 663.643310546875, "pos_frac": 0.84375, "sample": [-4.562004089355469, 663.643310546875, 293.4495849609375, 184.26644897460938, 162.52432250976562, 468.279296875, -55.853981018066406, -63.988380432128906, 384.0478210449219, -13.175537109375, 311.6116027832031, 277.3660583496094, -25.567703247070312, 443.4494323730469, 91.60932922363281, 376.9034118652344, 461.04296875, -178.9285125732422, 155.7002410888672, 268.25970458984375, 524.272216796875, 406.3193054199219, 482.59515380859375, 55.84300231933594, 243.10531616210938, 363.5445556640625, 373.3941650390625, 254.375244140625, 260.91259765625, 492.4303894042969, 146.7007598876953, 110.03695678710938, 26.49540138244629, 193.468994140625, 387.2967224121094, -41.37890625, 510.9467468261719, 188.43113708496094, 323.8038330078125, 198.1185302734375, 352.10357666015625, 207.16009521484375, 329.56842041015625, 506.42138671875, -14.406097412109375, 543.8479614257812, 147.3370361328125, -30.356904983520508, 626.8245849609375, 114.44905090332031, 637.5499877929688, -53.4265022277832, 182.68551635742188, 83.2686996459961, 443.130126953125, 191.68643188476562, 93.58389282226562, 601.5853271484375, 331.87939453125, 312.91522216796875, 329.62786865234375, 8.41973876953125, 40.252803802490234, 145.00344848632812], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000677.npy"}
{"epoch": 0.9941262848751835, "step": 678, "batch_size": 64, "mean": 250.60003662109375, "std": 251.02166748046875, "min": -318.0972595214844, "p10": -33.48906021118164, "median": 218.35991668701172, "p90": 568.7609375000001, "max": 969.1727294921875, "pos_frac": 0.84375, "sample": [45.713462829589844, 661.169677734375, 343.4491271972656, 367.4354553222656, 196.92832946777344, -67.05911254882812, 388.8349914550781, -318.0972595214844, 75.6439208984375, 104.24185180664062, -30.216232299804688, 419.20660400390625, 173.0898895263672, 506.1498718261719, 65.9418716430664, 190.15464782714844, 400.9371337890625, -181.9305419921875, 295.06207275390625, 544.2149658203125, 467.9747619628906, 40.45027160644531, 2.113424301147461, 596.0977783203125, 439.5251159667969, 969.1727294921875, 276.0662841796875, 56.41844177246094, -10.249675750732422, 579.2806396484375, -56.710655212402344, -80.32071685791016, 62.12646484375, 462.95184326171875, 56.211883544921875, 523.4524536132812, 100.37039947509766, 660.901123046875, 109.53933715820312, 399.84771728515625, -66.54743957519531, 239.79150390625, 393.10101318359375, 455.02630615234375, 310.76287841796875, 61.52272415161133, 586.2449340820312, 734.7747802734375, 81.32098388671875, 421.31256103515625, 482.6334228515625, -34.891700744628906, 184.96412658691406, 2.8039703369140625, -5.852849960327148, 21.378862380981445, 144.4446258544922, 386.92132568359375, 353.0990905761719, 272.21282958984375, 461.9835510253906, 84.58798217773438, 503.2710876464844, 127.4452896118164], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000678.npy"}
{"epoch": 0.9955947136563876, "step": 679, "batch_size": 64, "mean": 207.6616668701172, "std": 209.1145782470703, "min": -223.33599853515625, "p10": -45.47180252075194, "median": 179.40367126464844, "p90": 435.16133422851567, "max": 714.3689575195312, "pos_frac": 0.828125, "sample": [342.221923828125, 87.74469757080078, 345.032470703125, 544.3944702148438, 178.82583618164062, 327.63555908203125, 367.24560546875, 500.8323669433594, 385.96990966796875, 179.98150634765625, 306.9515380859375, 340.0447082519531, 176.0872344970703, -71.96644592285156, 632.1927490234375, -50.465415954589844, 78.62606048583984, -8.125885009765625, 238.43841552734375, 391.447998046875, 330.04669189453125, 710.0079345703125, 32.70878601074219, 78.26779174804688, 58.8980598449707, -135.21890258789062, 190.44586181640625, 371.6148376464844, 89.48814392089844, 171.1534881591797, 174.94216918945312, 36.33551025390625, -33.820037841796875, 322.54217529296875, 190.98980712890625, 16.781494140625, 532.9456176757812, 103.68925476074219, -1.6155414581298828, 16.858924865722656, 6.364837646484375, 36.91032409667969, 398.0716857910156, 163.21157836914062, 436.9503479003906, 371.11358642578125, 28.10941505432129, -58.395904541015625, -160.52325439453125, 142.75802612304688, 430.9869689941406, -223.33599853515625, 135.55392456054688, 70.53144836425781, 351.92828369140625, 323.78631591796875, 417.4248352050781, 250.892822265625, 364.7353210449219, 714.3689575195312, 259.1710205078125, 347.734130859375, -51.7606315612793, -16.418386459350586], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000679.npy"}
{"epoch": 0.9970631424375918, "step": 680, "batch_size": 64, "mean": 245.17251586914062, "std": 256.94708251953125, "min": -517.7100830078125, "p10": -3.3935260772704936, "median": 226.7044219970703, "p90": 583.1503662109376, "max": 816.15673828125, "pos_frac": 0.890625, "sample": [591.7367553710938, -134.1970977783203, 251.46803283691406, 327.57037353515625, 143.2783966064453, 452.01141357421875, 683.4898681640625, 150.49803161621094, 482.6625671386719, 144.0322265625, 250.13720703125, 140.3897247314453, 155.3193817138672, 47.762603759765625, 399.20953369140625, 769.072265625, 152.7264862060547, -297.657470703125, 11.386764526367188, 230.94613647460938, 763.1345825195312, 132.98052978515625, 10.739307403564453, 16.90715789794922, 102.08353424072266, 479.5521240234375, 171.4682159423828, 89.93767547607422, 387.72601318359375, 510.26385498046875, 224.16461181640625, 310.0989990234375, 98.71835327148438, 462.1834411621094, 461.533447265625, 68.31455993652344, -9.450454711914062, 305.7745361328125, 570.7171630859375, -224.72662353515625, 245.69732666015625, 588.4788818359375, 394.38165283203125, 203.255126953125, 110.8756103515625, -517.7100830078125, 360.1568908691406, -115.75299835205078, 84.04464721679688, 521.7244262695312, 331.2845458984375, 816.15673828125, 11.1129150390625, 93.47117614746094, 279.2406311035156, 257.18963623046875, 407.72747802734375, 680.856689453125, 46.15377426147461, 229.24423217773438, -35.303680419921875, 184.57846069335938, 172.47433471679688, 457.73858642578125], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000680.npy"}
{"epoch": 0.9985315712187959, "step": 681, "batch_size": 64, "mean": 223.562255859375, "std": 230.7840118408203, "min": -282.750244140625, "p10": -57.70569763183593, "median": 188.0319366455078, "p90": 567.1147705078125, "max": 850.987548828125, "pos_frac": 0.84375, "sample": [145.43104553222656, -282.750244140625, 275.370849609375, 121.43707275390625, 486.1189270019531, 45.171531677246094, 365.5764465332031, 334.4242858886719, -12.416709899902344, 177.2991943359375, 23.4747314453125, 216.80892944335938, -124.21467590332031, -75.54753112792969, 251.5468292236328, 850.987548828125, 399.7669677734375, -30.89794158935547, 46.15257263183594, 443.1798095703125, 188.38775634765625, 3.1786117553710938, 43.54175567626953, 116.42086791992188, 424.42779541015625, 268.91790771484375, 164.373779296875, 593.7431640625, 255.94631958007812, 410.2887268066406, 156.02655029296875, 304.700927734375, 318.33221435546875, 140.22763061523438, 98.45777130126953, 265.81707763671875, 52.099945068359375, -118.00540161132812, 289.739501953125, -101.19725036621094, -50.27900695800781, 611.8591918945312, 79.19255828857422, 324.4147033691406, -197.90689086914062, 139.7637939453125, -60.88856506347656, 87.81513977050781, 171.3695831298828, 33.53424835205078, 500.05950927734375, 187.67611694335938, 562.8050537109375, 322.2738952636719, 249.5206298828125, 219.01951599121094, 546.2764892578125, 577.8057861328125, 568.9617919921875, 302.9238586425781, 145.74110412597656, 592.69775390625, 697.538818359375, 163.4633026123047], "npy": "/scratch/qu.yang1/dynamic-dpo-v4/outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85/margin_logs/step_0000681.npy"}

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:7d8b58a15bd3a2ace66b5979c521622fe4142822e4279bd05a72e8b1ee87f56c
size 4886466168

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:090706f84ae399d0714a616ed4fe77fbeaca7998c74f747a40b453d2a7540f44
size 4832007448

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6185d8f18be441f18932d21f272d1105c98acd65d5a7be1163b12e0d4a6a25f9
size 4999813112

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c2c2461e82c6d58ae200c2b9463dc73b69b6eff05c25d03cd98ac88fd88822e5
size 4999813128

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:cbadf56e84bae84c00a573b5cec53fd3e0339ad9d38a4a7ee024f4d64d126e5f
size 4832007496

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2b69ae940090a3990a0011a161d620db3eb8455eb30ed3e73cba705f61485717
size 4999813120

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0253b1d1be2700f5f2aac08894de2e7c80949f438e0508686b23acc29b1ef35f
size 2571158184

@@ -0,0 +1,298 @@
{
"metadata": {
"total_size": 32121044992
},
"weight_map": {
"lm_head.weight": "model-00007-of-00007.safetensors",
"model.embed_tokens.weight": "model-00001-of-00007.safetensors",
"model.layers.0.input_layernorm.weight": "model-00001-of-00007.safetensors",
"model.layers.0.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.0.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.0.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.0.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
"model.layers.0.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.0.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.0.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.0.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.1.input_layernorm.weight": "model-00001-of-00007.safetensors",
"model.layers.1.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.1.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.1.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.1.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
"model.layers.1.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.1.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.1.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.1.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.10.input_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.10.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.10.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.10.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.10.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.10.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.10.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.10.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.10.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.11.input_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.11.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.11.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.11.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.11.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.11.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.11.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.11.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.11.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.12.input_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.12.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.12.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.12.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.12.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.12.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.12.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.12.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.12.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.13.input_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.13.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.13.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.13.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.13.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.13.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.13.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.13.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.13.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.14.input_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.14.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.14.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.14.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.14.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.14.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.14.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.14.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.14.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.15.input_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.15.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.15.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.15.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.15.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.15.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.15.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.15.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.15.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.16.input_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.16.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.16.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.16.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.16.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.16.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.16.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.16.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.16.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.17.input_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.17.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.17.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.17.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.17.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.17.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.17.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.17.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.17.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.18.input_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.18.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.18.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.18.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.18.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.18.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.18.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.18.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.18.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.19.input_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.19.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.19.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.19.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.19.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.19.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.19.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.19.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.19.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.2.input_layernorm.weight": "model-00001-of-00007.safetensors",
"model.layers.2.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.2.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.2.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.2.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
"model.layers.2.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.2.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.2.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.2.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.20.input_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.20.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.20.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.20.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.20.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.20.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.20.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.20.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.20.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.21.input_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.21.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.21.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.21.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.21.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.21.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.21.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.21.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.21.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.22.input_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.22.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.22.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.22.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.22.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.22.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.22.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.22.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.22.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.23.input_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.23.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.23.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.23.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.23.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.23.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.23.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.23.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.23.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.24.input_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.24.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.24.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.24.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.24.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.24.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.24.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.24.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.24.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.25.input_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.25.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.25.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.25.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.25.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.25.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.25.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.25.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.25.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.26.input_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.26.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.26.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.26.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.26.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.26.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.26.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.26.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.26.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.27.input_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.27.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.27.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.27.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.27.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.27.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.27.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.27.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.27.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.28.input_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.28.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.28.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.28.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.28.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.28.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.28.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.28.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.28.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.29.input_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.29.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.29.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.29.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.29.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.29.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.29.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.29.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.29.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.3.input_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.3.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.3.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.3.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.3.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.3.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.3.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.3.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.3.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.30.input_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.30.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.30.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.30.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.30.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.30.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.30.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.30.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.30.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.31.input_layernorm.weight": "model-00007-of-00007.safetensors",
"model.layers.31.mlp.down_proj.weight": "model-00007-of-00007.safetensors",
"model.layers.31.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.31.mlp.up_proj.weight": "model-00007-of-00007.safetensors",
"model.layers.31.post_attention_layernorm.weight": "model-00007-of-00007.safetensors",
"model.layers.31.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.31.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.31.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.31.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.4.input_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.4.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.4.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.4.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.4.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.4.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.4.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.4.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.4.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.5.input_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.5.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.5.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.5.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.5.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.5.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.5.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.5.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.5.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.6.input_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.6.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.6.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.6.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.6.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.6.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.6.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.6.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.6.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.7.input_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.7.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.7.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.7.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.7.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.7.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.7.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.7.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.7.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.8.input_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.8.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.8.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.8.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.8.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.8.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.8.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.8.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.8.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.9.input_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.9.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.9.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.9.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.9.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.9.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.9.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.9.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.9.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
"model.norm.weight": "model-00007-of-00007.safetensors"
}
}
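The map above assigns each tensor name to the shard file that stores it, and a layer's tensors can straddle two shards (layer 31 is split between shards 6 and 7). A minimal sketch of grouping tensor names by shard follows. It assumes the entries sit under a top-level `weight_map` key, as in the usual Hugging Face sharded-checkpoint index; that key is not visible in this excerpt. The inlined `SAMPLE_INDEX` and the helper `tensors_by_shard` are illustrative names, not part of the repository.

```python
import json
from collections import defaultdict

# A few entries copied from the map above, inlined so the sketch runs
# without the real index file. The "weight_map" key is an assumption.
SAMPLE_INDEX = """
{
  "weight_map": {
    "model.layers.31.mlp.down_proj.weight": "model-00007-of-00007.safetensors",
    "model.layers.31.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
    "model.layers.31.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
    "model.norm.weight": "model-00007-of-00007.safetensors"
  }
}
"""


def tensors_by_shard(index: dict) -> dict:
    """Group tensor names by the shard file that stores them (hypothetical helper)."""
    shards = defaultdict(list)
    for tensor_name, shard_file in index["weight_map"].items():
        shards[shard_file].append(tensor_name)
    # Sort names so the output is stable regardless of file order.
    return {shard: sorted(names) for shard, names in shards.items()}


grouped = tensors_by_shard(json.loads(SAMPLE_INDEX))
for shard, names in sorted(grouped.items()):
    print(shard, len(names))
```

Run against the full index file instead of the sample, the same grouping shows which shards must be fetched to load a given layer.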

23
special_tokens_map.json Normal file

@@ -0,0 +1,23 @@
{
"bos_token": {
"content": "<|begin_of_text|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"eos_token": {
"content": "<|end_of_text|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"pad_token": {
"content": "<|end_of_text|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
}
}
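In the map above, `pad_token` and `eos_token` share the content `<|end_of_text|>`. A minimal sketch that reads the map and confirms this; the JSON is inlined from the file above, and the variable names are illustrative.

```python
import json

# Contents of special_tokens_map.json as shown above, inlined for a runnable sketch.
SPECIAL_TOKENS_MAP = """
{
  "bos_token": {"content": "<|begin_of_text|>", "lstrip": false,
                "normalized": false, "rstrip": false, "single_word": false},
  "eos_token": {"content": "<|end_of_text|>", "lstrip": false,
                "normalized": false, "rstrip": false, "single_word": false},
  "pad_token": {"content": "<|end_of_text|>", "lstrip": false,
                "normalized": false, "rstrip": false, "single_word": false}
}
"""

tokens = json.loads(SPECIAL_TOKENS_MAP)

# Reduce each entry to its literal token string.
contents = {name: entry["content"] for name, entry in tokens.items()}

# True when padding reuses the end-of-text token instead of a dedicated one.
pad_reuses_eos = contents["pad_token"] == contents["eos_token"]

print(contents)
print("pad reuses eos:", pad_reuses_eos)
```

Because the two tokens are identical, padding positions cannot be told apart from a genuine end-of-text token by token string alone; code that needs the distinction would have to rely on something else, such as an attention mask.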

3
tokenizer.json Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:3c5cf44023714fb39b05e71e425f8d7b92805ff73f7988b083b8c87f0bf87393
size 17209961

2064
tokenizer_config.json Normal file

File diff suppressed because it is too large

9
train_results.json Normal file

@@ -0,0 +1,9 @@
{
"epoch": 1.0,
"total_flos": 0.0,
"train_loss": 0.8810622134572784,
"train_runtime": 1867.9067,
"train_samples": 43598,
"train_samples_per_second": 23.341,
"train_steps_per_second": 0.365
}
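The throughput figures above can be cross-checked against the sample count and runtime. The sketch below recomputes them. The effective batch size of 64 is an assumption read off the `batch-64` part of the model name, not a value stored in this file, and rounding the final partial batch up to a whole step is also an assumption.

```python
import math

# Values copied from train_results.json above.
train_results = {
    "epoch": 1.0,
    "train_runtime": 1867.9067,          # seconds
    "train_samples": 43598,
    "train_samples_per_second": 23.341,
    "train_steps_per_second": 0.365,
}

# Assumption: effective (global) batch size, taken from "batch-64" in the model name.
EFFECTIVE_BATCH_SIZE = 64

# Samples per second over the single epoch.
samples_per_second = train_results["train_samples"] / train_results["train_runtime"]

# Optimizer steps in one epoch, counting the final partial batch as a step.
estimated_steps = math.ceil(train_results["train_samples"] / EFFECTIVE_BATCH_SIZE)
steps_per_second = estimated_steps / train_results["train_runtime"]

print(round(samples_per_second, 3), estimated_steps, round(steps_per_second, 3))
```

Under these assumptions the recomputed rates round to the reported 23.341 samples per second and 0.365 steps per second, with an estimated 682 steps for the epoch.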

13033
trainer_state.json Normal file

File diff suppressed because it is too large