Initialize project; model provided by the ModelHub XC community
Model: jackf857/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85 Source: Original Platform
36
.gitattributes
vendored
Normal file
@@ -0,0 +1,36 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
tokenizer.json filter=lfs diff=lfs merge=lfs -text
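The rules above route matching files through Git LFS. As a rough illustration only, the sketch below checks a path against a handful of the slash-free patterns by matching its basename with Python's `fnmatch`. The helper name `is_lfs_tracked` and the shortened pattern list are assumptions made for this example, and Git's real matching rules are richer (for instance `saved_model/**/*` is not handled here), so `git check-attr filter -- <path>` remains the authoritative check.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# A few slash-free patterns copied from the .gitattributes rules above.
LFS_PATTERNS = ["*.bin", "*.safetensors", "*.tar.*", "*tfevents*", "tokenizer.json"]


def is_lfs_tracked(path: str, patterns=LFS_PATTERNS) -> bool:
    """Approximate check: match only the file's basename against each pattern.

    Hypothetical helper for illustration; it ignores directory-scoped rules.
    """
    name = PurePosixPath(path).name
    return any(fnmatch(name, pattern) for pattern in patterns)


for candidate in ["model-00001-of-00004.safetensors", "config.json", "tokenizer.json"]:
    print(candidate, is_lfs_tracked(candidate))
```

Under this approximation the weight shards and `tokenizer.json` are LFS-tracked, while small text files such as `config.json` stay in regular Git storage.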
82
README.md
Normal file
@@ -0,0 +1,82 @@
---
library_name: transformers
license: apache-2.0
base_model: jackf857/qwen3-8b-base-sft-hh-harmless-4xh200-batch-64-20260417-214452
tags:
- alignment-handbook
- new-dpo
- generated_from_trainer
datasets:
- Anthropic/hh-rlhf
model-index:
- name: qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85

This model is a fine-tuned version of [jackf857/qwen3-8b-base-sft-hh-harmless-4xh200-batch-64-20260417-214452](https://huggingface.co/jackf857/qwen3-8b-base-sft-hh-harmless-4xh200-batch-64-20260417-214452) on the Anthropic/hh-rlhf dataset.
It achieves the following results on the evaluation set:
- Loss: 0.5468
- Fcm Dpo/beta: 0.5703
- Margin Dpo/margin Mean: 1.5468
- Margin Dpo/margin Std: 2.5975
- Logps/chosen: -82.0159
- Logps/rejected: -93.3573
- Logps/ref Chosen: -86.9018
- Logps/ref Rejected: -96.6964
- Logits/chosen: 1.7134
- Logits/rejected: 1.6065
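
The card does not define the margin metric. The reported numbers are, however, consistent with the usual DPO log-ratio margin, that is, the policy-versus-reference log-probability gain on the chosen response minus the same gain on the rejected response, without any beta scaling. The sketch below checks that reading against the values listed above; the function name `implied_margin` is an assumption introduced for this example, not something from the training code.

```python
def implied_margin(logp_chosen, logp_rejected, ref_chosen, ref_rejected):
    """Unscaled DPO-style margin: (chosen log-ratio) - (rejected log-ratio).

    Assumed definition; the model card itself does not spell it out.
    """
    return (logp_chosen - ref_chosen) - (logp_rejected - ref_rejected)


# Values copied from the evaluation results listed above.
margin = implied_margin(
    logp_chosen=-82.0159,
    logp_rejected=-93.3573,
    ref_chosen=-86.9018,
    ref_rejected=-96.6964,
)
print(round(margin, 4))  # compare with "Margin Dpo/margin Mean"
```

Because a mean of differences equals the difference of means, the per-metric averages are enough to reproduce the reported mean margin under this assumption.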

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 2
- total_train_batch_size: 64
- total_eval_batch_size: 32
- optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
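
The totals in the list above follow from the per-device settings. The short sketch below reproduces them, and additionally estimates the number of optimizer steps from the `train_samples` value of 42336 recorded in `all_results.json`. The step estimate assumes the trainer drops the incomplete final accumulation group, which is an assumption about the training loop rather than something the card states.

```python
# Per-device settings copied from the hyperparameter list above.
train_batch_size = 8
eval_batch_size = 8
num_devices = 4
gradient_accumulation_steps = 2

# Effective batch sizes across all devices.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices

# train_samples comes from all_results.json in this commit.
train_samples = 42336
# Assumption: the last partial accumulation group is dropped (floor division).
optimizer_steps = train_samples // total_train_batch_size
epoch_fraction = optimizer_steps * total_train_batch_size / train_samples

print(total_train_batch_size, total_eval_batch_size, optimizer_steps, epoch_fraction)
```

Under that assumption the run takes 661 optimizer steps, which lines up with the 661 lines of `margin_logs/margins.jsonl` and with the final `epoch` value of roughly 0.9992 in `all_results.json`.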

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Fcm Dpo/beta | Margin Dpo/margin Mean | Margin Dpo/margin Std | Logps/chosen | Logps/rejected | Logps/ref Chosen | Logps/ref Rejected | Logits/chosen | Logits/rejected |
|:-------------:|:------:|:----:|:---------------:|:------------:|:----------------------:|:---------------------:|:------------:|:--------------:|:----------------:|:------------------:|:-------------:|:---------------:|
| 1.325         | 0.1512 | 100  | 0.6541          | 0.1000       | 0.9036                 | 1.9803                | -86.1995     | -96.8977       | -86.9018         | -96.6964           | 1.6835        | 1.5698          |
| 1.1881        | 0.3023 | 200  | 0.5625          | 0.6970       | 1.2537                 | 2.2271                | -78.6161     | -89.6644       | -86.9018         | -96.6964           | 1.8627        | 1.7409          |
| 1.1394        | 0.4535 | 300  | 0.5554          | 0.6672       | 1.1965                 | 2.1012                | -79.8050     | -90.7961       | -86.9018         | -96.6964           | 1.8760        | 1.7587          |
| 1.1945        | 0.6047 | 400  | 0.5504          | 0.6382       | 1.3377                 | 2.2885                | -81.0971     | -92.2295       | -86.9018         | -96.6964           | 1.9273        | 1.8112          |
| 1.0818        | 0.7559 | 500  | 0.5445          | 0.5119       | 1.5736                 | 2.6447                | -81.9627     | -93.3309       | -86.9018         | -96.6964           | 1.8607        | 1.7468          |
| 1.1017        | 0.9070 | 600  | 0.5468          | 0.5703       | 1.5468                 | 2.5975                | -82.0159     | -93.3573       | -86.9018         | -96.6964           | 1.7134        | 1.6065          |


### Framework versions

- Transformers 4.51.0
- Pytorch 2.3.1+cu121
- Datasets 2.21.0
- Tokenizers 0.21.4
28
added_tokens.json
Normal file
@@ -0,0 +1,28 @@
{
  "</think>": 151668,
  "</tool_call>": 151658,
  "</tool_response>": 151666,
  "<think>": 151667,
  "<tool_call>": 151657,
  "<tool_response>": 151665,
  "<|box_end|>": 151649,
  "<|box_start|>": 151648,
  "<|endoftext|>": 151643,
  "<|file_sep|>": 151664,
  "<|fim_middle|>": 151660,
  "<|fim_pad|>": 151662,
  "<|fim_prefix|>": 151659,
  "<|fim_suffix|>": 151661,
  "<|im_end|>": 151645,
  "<|im_start|>": 151644,
  "<|image_pad|>": 151655,
  "<|object_ref_end|>": 151647,
  "<|object_ref_start|>": 151646,
  "<|quad_end|>": 151651,
  "<|quad_start|>": 151650,
  "<|repo_name|>": 151663,
  "<|video_pad|>": 151656,
  "<|vision_end|>": 151653,
  "<|vision_pad|>": 151654,
  "<|vision_start|>": 151652
}
23
all_results.json
Normal file
@@ -0,0 +1,23 @@
{
  "epoch": 0.999244142101285,
  "eval_fcm_dpo/beta": 0.5098052024841309,
  "eval_logits/chosen": 1.7105576992034912,
  "eval_logits/rejected": 1.603922724723816,
  "eval_logps/chosen": -82.0059585571289,
  "eval_logps/ref_chosen": -86.90177917480469,
  "eval_logps/ref_rejected": -96.69639587402344,
  "eval_logps/rejected": -93.3553695678711,
  "eval_loss": 0.54054856300354,
  "eval_margin_dpo/margin_mean": 1.554817795753479,
  "eval_margin_dpo/margin_std": 2.5823159217834473,
  "eval_runtime": 42.2442,
  "eval_samples": 2303,
  "eval_samples_per_second": 54.516,
  "eval_steps_per_second": 1.704,
  "total_flos": 0.0,
  "train_loss": 1.1363916052632181,
  "train_runtime": 2119.1053,
  "train_samples": 42336,
  "train_samples_per_second": 19.978,
  "train_steps_per_second": 0.312
}
30
config.json
Normal file
@@ -0,0 +1,30 @@
{
  "architectures": [
    "Qwen3ForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151643,
  "head_dim": 128,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 12288,
  "max_position_embeddings": 32768,
  "max_window_layers": 36,
  "model_type": "qwen3",
  "num_attention_heads": 32,
  "num_hidden_layers": 36,
  "num_key_value_heads": 8,
  "rms_norm_eps": 1e-06,
  "rope_scaling": null,
  "rope_theta": 1000000,
  "sliding_window": null,
  "tie_word_embeddings": false,
  "torch_dtype": "float32",
  "transformers_version": "4.51.0",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 151936
}
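As a sanity check on the configuration above, the sketch below estimates the parameter count implied by these dimensions. It assumes the usual Qwen3 decoder layout: bias-free q/k/v/o projections with grouped-query attention, a gated MLP with three weight matrices, two RMSNorm weights per layer plus per-head query and key norms, and a final norm. That layout is an assumption about the architecture, not something `config.json` states, and `estimate_parameters` is a hypothetical helper written for this example.

```python
# Dimensions copied from config.json above.
config = {
    "hidden_size": 4096,
    "intermediate_size": 12288,
    "num_hidden_layers": 36,
    "num_attention_heads": 32,
    "num_key_value_heads": 8,
    "head_dim": 128,
    "vocab_size": 151936,
    "tie_word_embeddings": False,
}


def estimate_parameters(cfg: dict) -> int:
    """Rough parameter count under the assumed Qwen3 decoder layout."""
    hidden = cfg["hidden_size"]
    q_dim = cfg["num_attention_heads"] * cfg["head_dim"]
    kv_dim = cfg["num_key_value_heads"] * cfg["head_dim"]

    # q_proj and o_proj use q_dim; k_proj and v_proj use the smaller kv_dim (GQA).
    attention = 2 * hidden * q_dim + 2 * hidden * kv_dim
    # Gated MLP: gate, up, and down projections.
    mlp = 3 * hidden * cfg["intermediate_size"]
    # Two RMSNorm weights per layer plus assumed per-head q/k norms.
    norms = 2 * hidden + 2 * cfg["head_dim"]
    per_layer = attention + mlp + norms

    embeddings = cfg["vocab_size"] * hidden
    if not cfg["tie_word_embeddings"]:
        embeddings *= 2  # separate input embedding and output head

    return cfg["num_hidden_layers"] * per_layer + embeddings + hidden  # + final norm


total = estimate_parameters(config)
print(f"{total / 1e9:.2f}B parameters, about {total * 4 / 1e9:.1f} GB in float32")
```

The estimate lands at roughly 8.2 billion parameters, in line with the "8b" in the model name, and the float32 `torch_dtype` implies about four bytes per parameter on disk.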
17
eval_results.json
Normal file
@@ -0,0 +1,17 @@
{
  "epoch": 0.999244142101285,
  "eval_fcm_dpo/beta": 0.5098052024841309,
  "eval_logits/chosen": 1.7105576992034912,
  "eval_logits/rejected": 1.603922724723816,
  "eval_logps/chosen": -82.0059585571289,
  "eval_logps/ref_chosen": -86.90177917480469,
  "eval_logps/ref_rejected": -96.69639587402344,
  "eval_logps/rejected": -93.3553695678711,
  "eval_loss": 0.54054856300354,
  "eval_margin_dpo/margin_mean": 1.554817795753479,
  "eval_margin_dpo/margin_std": 2.5823159217834473,
  "eval_runtime": 42.2442,
  "eval_samples": 2303,
  "eval_samples_per_second": 54.516,
  "eval_steps_per_second": 1.704
}
6
generation_config.json
Normal file
@@ -0,0 +1,6 @@
{
  "bos_token_id": 151643,
  "eos_token_id": 151643,
  "max_new_tokens": 2048,
  "transformers_version": "4.51.0"
}
661
margin_logs/margins.jsonl
Normal file
@@ -0,0 +1,661 @@
{"epoch": 0.0, "step": 1, "batch_size": 64, "mean": -0.0029816031455993652, "std": 0.38981664180755615, "min": -0.7835464477539062, "p10": -0.5016929626464843, "median": 0.02667522430419922, "p90": 0.4355194091796875, "max": 1.2425384521484375, "pos_frac": 0.53125, "sample": [-0.2990684509277344, 0.05040740966796875, 0.4813804626464844, -0.7835464477539062, 0.16756057739257812, -0.21320724487304688, 0.066741943359375, 0.169891357421875, -0.06363677978515625, -0.33983612060546875, 0.20204925537109375, -0.003765106201171875, -0.7424850463867188, -0.039760589599609375, 0.008941650390625, 0.2320232391357422, 0.3860015869140625, 0.11869239807128906, -0.36592864990234375, -0.047290802001953125, -0.28316688537597656, 0.0283660888671875, -0.351715087890625, 0.11574554443359375, 0.86297607421875, -0.7426376342773438, 0.1338043212890625, -0.21837997436523438, 0.426910400390625, -0.12430953979492188, 0.2183837890625, -0.4932708740234375, 0.13604736328125, 0.1666259765625, 0.024984359741210938, -0.42929840087890625, -0.6993560791015625, -0.413604736328125, 0.22283935546875, -0.0557861328125, 1.2425384521484375, -0.2928791046142578, -0.14715576171875, 0.3737640380859375, -0.14208221435546875, 0.19033432006835938, 0.3464927673339844, 0.20479965209960938, 0.04190826416015625, -0.00957489013671875, -0.5053024291992188, 0.4848480224609375, 0.2988262176513672, 0.045352935791015625, 0.427978515625, -0.5745201110839844, 0.5770988464355469, 0.1401214599609375, -0.027454376220703125, -0.6424560546875, -0.2728919982910156, -0.428192138671875, 0.5285491943359375, 0.438751220703125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000001.npy"}
{"epoch": 0.0015117157974300832, "step": 2, "batch_size": 64, "mean": 0.029325813055038452, "std": 0.47058698534965515, "min": -1.2616119384765625, "p10": -0.39437751770019525, "median": -0.11953926086425781, "p90": 0.6386299133300786, "max": 1.48486328125, "pos_frac": 0.4375, "sample": [-0.43146514892578125, 0.07180404663085938, -0.20481109619140625, -0.00714111328125, 0.5232467651367188, 0.06253433227539062, -0.07450485229492188, -0.35506439208984375, -0.14567184448242188, -0.2234630584716797, -0.31732177734375, 1.456878662109375, 0.14324188232421875, -0.41083526611328125, -0.4837646484375, -0.12252044677734375, -0.1322479248046875, 0.45180511474609375, -0.6440353393554688, -1.2616119384765625, 0.7379837036132812, 0.0069866180419921875, 0.14553451538085938, 0.2057647705078125, -0.11970138549804688, 0.1814441680908203, -0.2711448669433594, -0.22872161865234375, 0.23077011108398438, 0.2108001708984375, 0.348419189453125, -0.10046005249023438, 0.4903106689453125, -0.209228515625, 0.3726234436035156, -0.2670707702636719, 0.056774139404296875, 0.1702728271484375, -0.3437042236328125, -0.5232925415039062, 0.1266021728515625, -0.31758880615234375, -0.4544639587402344, -0.13794708251953125, 0.5147171020507812, 0.03656768798828125, 1.48486328125, -0.2191619873046875, -0.22581100463867188, -0.11937713623046875, -0.1849536895751953, 0.9678802490234375, 0.3454742431640625, -0.16698455810546875, -0.2411823272705078, -0.1938018798828125, 0.999603271484375, -0.17424774169921875, 0.908782958984375, -0.3559761047363281, -0.17584609985351562, 0.688079833984375, 0.04034423828125, -0.2581329345703125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000002.npy"}
{"epoch": 0.0030234315948601664, "step": 3, "batch_size": 64, "mean": -0.0035227537155151367, "std": 0.4402654767036438, "min": -1.4191741943359375, "p10": -0.4824172973632812, "median": -0.015285491943359375, "p90": 0.4136981964111328, "max": 1.359405517578125, "pos_frac": 0.484375, "sample": [-0.37566375732421875, 0.23485183715820312, 0.1451263427734375, 0.24562835693359375, -1.4191741943359375, -0.01930999755859375, -0.011260986328125, -0.28614044189453125, -0.104217529296875, 0.2634124755859375, -0.10874176025390625, 0.44366455078125, 0.1188507080078125, -0.107452392578125, -0.5330810546875, -0.24988555908203125, -0.200103759765625, -0.19748687744140625, 0.05163764953613281, 0.414642333984375, -0.3824920654296875, -0.10361099243164062, 0.6924972534179688, 0.02095794677734375, -0.2192840576171875, 0.248046875, 0.2889251708984375, 0.00958251953125, 0.14304542541503906, 0.2736968994140625, -0.5632228851318359, 0.12537384033203125, 0.26377105712890625, -0.5014877319335938, -0.038074493408203125, -0.2542743682861328, -0.82489013671875, -0.087860107421875, 0.0877532958984375, -0.827972412109375, -0.02677154541015625, 0.10428237915039062, 0.16277313232421875, 0.561798095703125, 0.18677520751953125, -0.1341705322265625, -0.27362060546875, 0.013427734375, -0.43447113037109375, -0.06104278564453125, 0.9460296630859375, -0.43791961669921875, -0.08476638793945312, -0.0221099853515625, -0.3976593017578125, 0.22234344482421875, -0.5493927001953125, 1.2887115478515625, 0.012676239013671875, 1.359405517578125, 0.4114952087402344, -0.1605377197265625, 0.12903594970703125, 0.3024749755859375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000003.npy"}
{"epoch": 0.0045351473922902496, "step": 4, "batch_size": 64, "mean": -0.00699692964553833, "std": 0.44075119495391846, "min": -1.403076171875, "p10": -0.49400482177734373, "median": -0.012075424194335938, "p90": 0.6013916015625002, "max": 0.85546875, "pos_frac": 0.484375, "sample": [0.8552093505859375, -0.23148345947265625, -0.6077880859375, -0.5013427734375, 0.49182891845703125, 0.048885345458984375, -0.04729270935058594, -0.4654197692871094, 0.20339202880859375, -0.02768707275390625, 0.3274345397949219, 0.19652557373046875, -0.46630859375, 0.6613311767578125, -0.4389190673828125, 0.05322265625, 0.647613525390625, 0.32275390625, 0.23883056640625, 0.85546875, -0.4354095458984375, -0.23685073852539062, -0.3187217712402344, -0.6540069580078125, -0.16240310668945312, 0.8334121704101562, -0.43822479248046875, -0.06579208374023438, 0.4517707824707031, -0.1947174072265625, 0.348602294921875, 0.10468292236328125, -0.00037384033203125, -0.2371673583984375, 0.03238677978515625, 0.5751800537109375, 0.6126251220703125, -0.159698486328125, -0.81671142578125, 0.11058807373046875, 0.057708740234375, -0.1553802490234375, -0.3577766418457031, 0.2629890441894531, 0.7093353271484375, -0.5452423095703125, 0.3180198669433594, -0.023777008056640625, -0.1620941162109375, 0.19111251831054688, -0.47293853759765625, -0.4768829345703125, 0.05726432800292969, 0.009067535400390625, -0.0815277099609375, 0.4174346923828125, -1.403076171875, -0.20482254028320312, 0.20786285400390625, -0.34830474853515625, -0.2369537353515625, -0.5186004638671875, 0.4046173095703125, 0.4387359619140625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000004.npy"}
{"epoch": 0.006046863189720333, "step": 5, "batch_size": 64, "mean": -0.03886544704437256, "std": 0.41241422295570374, "min": -1.214752197265625, "p10": -0.49346923828125, "median": 0.027625083923339844, "p90": 0.4514663696289063, "max": 1.091888427734375, "pos_frac": 0.515625, "sample": [0.2814598083496094, -0.03350067138671875, 0.0969390869140625, 1.091888427734375, 0.6651153564453125, 0.4740142822265625, 0.1881561279296875, -0.9782485961914062, 0.136138916015625, 0.44191741943359375, 0.07343292236328125, -0.4897308349609375, -0.2979011535644531, 0.5786666870117188, -0.21886444091796875, 0.1593780517578125, 0.063995361328125, -0.06456756591796875, 0.0324859619140625, 0.07446670532226562, -0.27081298828125, -0.1616363525390625, 0.06691169738769531, -0.2631187438964844, 0.227294921875, -0.2653350830078125, 0.1608428955078125, -0.19689178466796875, -0.4950714111328125, -0.07939910888671875, 0.07306671142578125, -0.37641143798828125, -0.763580322265625, -0.11489677429199219, -0.6203155517578125, -0.0720672607421875, 0.31412696838378906, 0.2967681884765625, 0.2532463073730469, -0.42790985107421875, 0.045192718505859375, -1.214752197265625, 0.6796722412109375, -0.218505859375, -0.133575439453125, -0.01914215087890625, -0.006317138671875, -0.30883026123046875, 0.15655517578125, -0.5640335083007812, 0.031057357788085938, 0.53533935546875, 0.45555877685546875, 0.06640625, -1.088348388671875, 0.17324447631835938, 0.02419281005859375, -0.09533309936523438, 0.2217731475830078, 0.2920684814453125, 0.20275306701660156, -0.439056396484375, -0.42798614501953125, -0.415374755859375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000005.npy"}
{"epoch": 0.007558578987150416, "step": 6, "batch_size": 64, "mean": 0.07824432849884033, "std": 0.42383238673210144, "min": -1.383270263671875, "p10": -0.4009033203125, "median": 0.07421398162841797, "p90": 0.5665000915527344, "max": 1.098968505859375, "pos_frac": 0.59375, "sample": [0.20518112182617188, 0.5266952514648438, 0.8028678894042969, 0.46540069580078125, 0.138641357421875, -0.16722488403320312, -0.13768768310546875, -0.23637771606445312, -0.34818267822265625, -0.17496871948242188, -0.14899063110351562, -0.361724853515625, -0.5723400115966797, 0.04558563232421875, -0.6273345947265625, 0.40628814697265625, -0.0572052001953125, -0.5749549865722656, 0.056545257568359375, 0.1538238525390625, 0.650177001953125, 0.0142822265625, 0.0412139892578125, 0.27321434020996094, -0.179901123046875, 0.2064361572265625, -0.6291580200195312, -1.383270263671875, -0.2640838623046875, -0.19683837890625, -0.025760650634765625, 0.4104804992675781, 0.16131591796875, 0.5096015930175781, 0.6027679443359375, -0.238006591796875, 0.4221687316894531, 0.5738449096679688, -0.018585205078125, 0.24926376342773438, -0.006744384765625, 0.12279891967773438, 0.0660552978515625, 0.18530654907226562, -0.08648681640625, 0.23828125, 0.341217041015625, 0.5493621826171875, 0.08237266540527344, -0.10776138305664062, 0.10478973388671875, 0.15761566162109375, 0.3285846710205078, 0.2075023651123047, -0.417694091796875, -0.6611976623535156, -0.20963478088378906, -0.18613433837890625, 1.098968505859375, 1.04681396484375, 0.4591217041015625, 0.6802024841308594, 0.4312934875488281, 0.00980377197265625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000006.npy"}
{"epoch": 0.009070294784580499, "step": 7, "batch_size": 64, "mean": -0.0730276107788086, "std": 0.36698758602142334, "min": -1.073211669921875, "p10": -0.5626575469970703, "median": -0.07654762268066406, "p90": 0.38171043395996107, "max": 0.7727813720703125, "pos_frac": 0.390625, "sample": [-0.03829193115234375, -0.44600677490234375, 0.549652099609375, -0.6348953247070312, -0.33423805236816406, 0.6157112121582031, 0.22642898559570312, -0.23383331298828125, -0.206787109375, 0.7727813720703125, -0.06432342529296875, 0.23588943481445312, -0.019559860229492188, -0.5016098022460938, 0.07234001159667969, -0.08877182006835938, -0.20372390747070312, 0.3491172790527344, -0.18212127685546875, -0.5156097412109375, -0.5473594665527344, 0.2463531494140625, 0.306365966796875, 0.110595703125, -0.637054443359375, 0.13251113891601562, -0.054912567138671875, -0.11733245849609375, -0.2190399169921875, -0.046024322509765625, -0.6302070617675781, -0.0906219482421875, 0.21987533569335938, -0.30702972412109375, 0.3939018249511719, -0.277313232421875, -0.15401458740234375, -0.2682380676269531, -0.09350204467773438, 0.06280136108398438, 0.045654296875, 0.12705230712890625, 0.05518341064453125, 0.4448394775390625, 0.33441162109375, 0.21523284912109375, -0.501861572265625, -0.09344863891601562, -0.221832275390625, 0.4798583984375, -0.01648712158203125, 0.7278289794921875, -0.03906059265136719, -0.21671295166015625, 0.04737091064453125, -0.6307373046875, -0.40366363525390625, -0.6268692016601562, -1.073211669921875, -0.1916961669921875, 0.35326385498046875, 0.0096588134765625, -0.5692138671875, -0.3112297058105469], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000007.npy"}
{"epoch": 0.010582010582010581, "step": 8, "batch_size": 64, "mean": -0.06397378444671631, "std": 0.3843631446361542, "min": -1.168426513671875, "p10": -0.5303932189941406, "median": -0.1025390625, "p90": 0.4457473754882813, "max": 0.8257522583007812, "pos_frac": 0.390625, "sample": [-0.014558792114257812, -0.47821044921875, 0.17340087890625, -0.25772857666015625, 0.0296630859375, 0.4386444091796875, -0.328094482421875, -1.168426513671875, 0.21674728393554688, -0.7011260986328125, 0.03900146484375, 0.3704071044921875, -0.1566181182861328, 0.17998123168945312, -0.32917022705078125, 0.37615966796875, 0.555511474609375, -0.09635543823242188, -0.14910125732421875, 0.4563751220703125, -0.14434814453125, -0.6357650756835938, -0.5527572631835938, 0.637664794921875, -0.435638427734375, -0.33502769470214844, 0.490234375, -0.29705810546875, -0.12992095947265625, -0.09646987915039062, 0.11972808837890625, -0.66192626953125, 0.4309844970703125, -0.2696685791015625, -0.1412811279296875, -0.4488639831542969, -0.0073394775390625, -0.3804302215576172, -0.2509307861328125, 0.8073272705078125, 0.10280227661132812, -0.06258392333984375, -0.0650177001953125, -0.40995025634765625, 0.8257522583007812, -0.2749786376953125, -0.1260833740234375, 0.292694091796875, -0.16329002380371094, 0.09714508056640625, 0.2180500030517578, -0.40894317626953125, 0.17818069458007812, -0.7088890075683594, -0.137359619140625, 0.110931396484375, -0.6034889221191406, -0.2445068359375, 0.114410400390625, -0.1927642822265625, 0.23139572143554688, -0.06302642822265625, 0.44879150390625, -0.10860824584960938], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000008.npy"}
{"epoch": 0.012093726379440665, "step": 9, "batch_size": 64, "mean": 0.03132587671279907, "std": 0.39495009183883667, "min": -1.336029052734375, "p10": -0.3993389129638672, "median": 0.009489059448242188, "p90": 0.4720161437988283, "max": 1.099029541015625, "pos_frac": 0.53125, "sample": [-0.0648193359375, -0.680419921875, 0.6266326904296875, 0.0152740478515625, -0.09488868713378906, 0.39568328857421875, -0.47052001953125, -0.10033416748046875, -0.211334228515625, 0.9856414794921875, -0.0257415771484375, -0.2979755401611328, 0.07312774658203125, 0.3000602722167969, -1.336029052734375, 0.10200119018554688, 1.099029541015625, -0.4493255615234375, -0.40185546875, 0.12263298034667969, -0.38946533203125, 0.11742591857910156, 0.272857666015625, 0.31444549560546875, 0.3787117004394531, 0.3369598388671875, -0.21088409423828125, -0.11577606201171875, -0.11555099487304688, 0.089752197265625, 0.7793350219726562, 0.0650634765625, 0.1616363525390625, 0.4335479736328125, -0.4043083190917969, 0.39156532287597656, 0.16551589965820312, -0.14519500732421875, 0.551483154296875, 0.0098724365234375, 0.01265716552734375, 0.10944938659667969, -0.1062469482421875, 0.4206695556640625, -0.09533882141113281, -0.3934669494628906, 0.00756072998046875, -0.25873374938964844, -0.0297393798828125, -0.1080780029296875, 0.23322677612304688, 0.6419677734375, 0.009105682373046875, -0.0861663818359375, 0.3574371337890625, -0.3802032470703125, 0.48850250244140625, -0.00940704345703125, 0.1224822998046875, -0.22419357299804688, -0.27032470703125, 0.1121978759765625, -0.16834259033203125, -0.65399169921875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000009.npy"}
{"epoch": 0.013605442176870748, "step": 10, "batch_size": 64, "mean": 0.031807392835617065, "std": 0.4238128960132599, "min": -1.1130523681640625, "p10": -0.4383071899414062, "median": 0.022180557250976562, "p90": 0.6247940063476562, "max": 0.9956436157226562, "pos_frac": 0.53125, "sample": [0.8528900146484375, -0.04000091552734375, 0.060436248779296875, 0.051280975341796875, 0.24383544921875, -0.441864013671875, -0.09720039367675781, -0.9914321899414062, 0.753082275390625, -0.039478302001953125, -0.16326141357421875, -0.538543701171875, 0.710113525390625, -0.394989013671875, -1.1130523681640625, -0.09869766235351562, -0.3166637420654297, -0.356903076171875, -0.4300079345703125, -0.01569366455078125, 0.457366943359375, -0.3026885986328125, -0.1402740478515625, -0.2154693603515625, -0.4630012512207031, 0.10758209228515625, -0.42891693115234375, 0.6265106201171875, 0.341949462890625, 0.13147735595703125, 0.21155929565429688, 0.9956436157226562, 0.056438446044921875, 0.00873565673828125, 0.030879974365234375, 0.5085296630859375, 0.26294708251953125, -0.007457733154296875, -0.37944793701171875, -0.21042633056640625, -0.16895675659179688, -0.14890289306640625, 0.171142578125, 0.7682723999023438, 0.10629463195800781, 0.29970550537109375, 0.01348114013671875, -0.0133819580078125, 0.25579833984375, 0.62078857421875, 0.3396148681640625, 0.25467681884765625, 0.08685302734375, -0.32013702392578125, -0.36319732666015625, -0.10401153564453125, 0.1243896484375, 0.4541778564453125, -0.7478713989257812, 0.4420318603515625, 0.3040657043457031, 0.720184326171875, 0.19445037841796875, -0.479583740234375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000010.npy"}
{"epoch": 0.015117157974300832, "step": 11, "batch_size": 64, "mean": 0.005669832229614258, "std": 0.42462772130966187, "min": -0.7732048034667969, "p10": -0.482568359375, "median": -0.07788848876953125, "p90": 0.5096313476562501, "max": 1.389068603515625, "pos_frac": 0.40625, "sample": [0.6960372924804688, -0.16030502319335938, 0.45697021484375, 0.520782470703125, 0.4205284118652344, -0.034770965576171875, -0.10908126831054688, -0.15201187133789062, -0.1860198974609375, -0.40106201171875, -0.07940673828125, -0.2900543212890625, -0.2421112060546875, -0.07361602783203125, -0.3046150207519531, -0.5513153076171875, 0.0018463134765625, 0.3794403076171875, -0.1641082763671875, -0.1059112548828125, 0.5527153015136719, 0.2235107421875, 0.483612060546875, 0.10548782348632812, -0.0763702392578125, -0.165802001953125, -0.1640777587890625, -0.16038131713867188, -0.0182342529296875, -0.7732048034667969, 0.003276824951171875, 0.055561065673828125, -0.15342140197753906, 0.759796142578125, 0.2993488311767578, 0.14264678955078125, 0.46820068359375, -0.3388519287109375, 0.28456878662109375, -0.202484130859375, -0.1121673583984375, 0.3516044616699219, -0.41526031494140625, 0.2320404052734375, 0.28043365478515625, -0.43892669677734375, -0.47153472900390625, -0.3016815185546875, 0.7045440673828125, -0.5455322265625, -0.48729705810546875, -0.1382598876953125, 0.15771865844726562, 0.15858078002929688, 0.236907958984375, -0.6167068481445312, -0.0016632080078125, -0.2519340515136719, -0.6259841918945312, -0.046173095703125, 1.2805099487304688, 1.389068603515625, -0.558380126953125, -0.3641510009765625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000011.npy"}
{"epoch": 0.016628873771730914, "step": 12, "batch_size": 64, "mean": 0.045232415199279785, "std": 0.333871454000473, "min": -0.549652099609375, "p10": -0.29887905120849606, "median": -0.02332019805908203, "p90": 0.4421287536621094, "max": 1.27996826171875, "pos_frac": 0.484375, "sample": [-0.19663238525390625, -0.20662689208984375, -0.09961318969726562, 0.3017749786376953, 0.16765213012695312, -0.094818115234375, -0.2769947052001953, 0.3185577392578125, -0.266265869140625, -0.14908599853515625, 0.6669158935546875, -0.228729248046875, 0.2726287841796875, 1.27996826171875, -0.11256790161132812, -0.13671875, 0.44391632080078125, 0.4571685791015625, -0.308258056640625, 0.07402610778808594, 0.2648735046386719, -0.244903564453125, 0.2744598388671875, 0.437957763671875, 0.15991592407226562, 0.633758544921875, -0.25506591796875, 0.13104248046875, 0.042583465576171875, 0.0313262939453125, -0.044315338134765625, 0.3811492919921875, -0.426483154296875, 0.0108642578125, -0.158721923828125, -0.0487518310546875, 0.2548255920410156, -0.08053970336914062, 0.047760009765625, -0.2086334228515625, 0.6240234375, -0.05764961242675781, -0.3675079345703125, 0.1561908721923828, 0.23120498657226562, -0.22546768188476562, 0.19989013671875, -0.13016128540039062, -0.05283355712890625, 0.09999847412109375, -0.38518524169921875, -0.016355514526367188, -0.030284881591796875, 0.86431884765625, -0.3938751220703125, -0.549652099609375, -0.2526512145996094, 0.180145263671875, -0.4430580139160156, 0.34789276123046875, -0.10999298095703125, 0.06325531005859375, -0.24100112915039062, 0.27423095703125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000012.npy"}
{"epoch": 0.018140589569160998, "step": 13, "batch_size": 64, "mean": 0.0010965168476104736, "std": 0.3132418990135193, "min": -0.5965576171875, "p10": -0.3644874572753906, "median": -0.007769584655761719, "p90": 0.4489009857177738, "max": 0.838775634765625, "pos_frac": 0.5, "sample": [0.14691162109375, -0.04711151123046875, -0.4779815673828125, -0.23145675659179688, -0.52728271484375, 0.02057647705078125, 0.2855224609375, -0.21324920654296875, 0.215728759765625, -0.0324249267578125, -0.12524795532226562, -0.3690948486328125, -0.2471466064453125, 0.838775634765625, 0.015193939208984375, -0.5730209350585938, 0.2777099609375, -0.10743331909179688, 0.3287506103515625, 0.06093597412109375, -0.2322540283203125, -0.225982666015625, 0.543121337890625, 0.4920501708984375, -0.19017410278320312, -0.1712799072265625, -0.0916290283203125, 0.13334274291992188, 0.3003997802734375, 0.20156097412109375, 0.14012908935546875, -0.03476715087890625, -0.5231399536132812, 0.038822174072265625, -0.33960723876953125, 0.36125946044921875, 0.4864616394042969, 0.14033889770507812, -0.353118896484375, 0.0392608642578125, -0.0285797119140625, 0.07721710205078125, 0.487457275390625, 0.07033538818359375, -0.3885498046875, -0.126861572265625, 0.33759307861328125, 0.03118896484375, -0.06810379028320312, 0.5640411376953125, -0.22292327880859375, -0.18872833251953125, -0.11205291748046875, 0.3048896789550781, -0.35373687744140625, 0.052490234375, -0.09405517578125, -0.2955665588378906, -0.5965576171875, 0.201171875, 0.06280517578125, 0.7150039672851562, 0.013040542602539062, -0.32479095458984375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000013.npy"}
|
||||
{"epoch": 0.019652305366591082, "step": 14, "batch_size": 64, "mean": -0.025872111320495605, "std": 0.2995727062225342, "min": -0.91900634765625, "p10": -0.38350296020507807, "median": -0.061279296875, "p90": 0.3220787048339845, "max": 0.7608299255371094, "pos_frac": 0.390625, "sample": [-0.022624969482421875, -0.093963623046875, -0.0706024169921875, -0.5220870971679688, -0.15561866760253906, -0.51470947265625, -0.00035858154296875, -0.15167236328125, 0.20850753784179688, -0.4025287628173828, -0.1104736328125, -0.43268585205078125, -0.019330978393554688, -0.318450927734375, 0.6155586242675781, -0.24400901794433594, 0.178314208984375, 0.12465667724609375, 0.1727752685546875, -0.5070877075195312, -0.08934402465820312, 0.07239532470703125, -0.09374237060546875, -0.09068679809570312, 0.0462646484375, -0.491119384765625, 0.533172607421875, 0.3325042724609375, 0.03179740905761719, -0.2562255859375, -0.056858062744140625, -0.099151611328125, -0.22613906860351562, 0.50494384765625, -0.2046051025390625, 0.29775238037109375, -0.07787704467773438, 0.07451629638671875, 0.2513580322265625, 0.59564208984375, -0.91900634765625, 0.18639755249023438, -0.012542724609375, -0.32421875, -0.09212493896484375, -0.06570053100585938, -0.053501129150390625, 0.4489593505859375, 0.2882423400878906, 0.012271881103515625, -0.1452178955078125, 0.20821380615234375, -0.13906288146972656, -0.115447998046875, 0.11406707763671875, 0.7608299255371094, -0.246795654296875, -0.27208900451660156, 0.08742332458496094, -0.3391094207763672, 0.10994148254394531, -0.04242706298828125, -0.09206771850585938, 0.198944091796875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000014.npy"}
|
||||
{"epoch": 0.021164021164021163, "step": 15, "batch_size": 64, "mean": -0.07544746994972229, "std": 0.3244469463825226, "min": -0.9354782104492188, "p10": -0.47270355224609373, "median": -0.058689117431640625, "p90": 0.33097076416015625, "max": 0.5786209106445312, "pos_frac": 0.40625, "sample": [0.053211212158203125, -0.001556396484375, 0.0650787353515625, 0.013914108276367188, 0.18130111694335938, -0.4931793212890625, -0.0963897705078125, -0.0059661865234375, -0.62322998046875, 0.136962890625, -0.01910400390625, -0.021764755249023438, -0.698150634765625, -0.23173141479492188, -0.4476470947265625, -0.15703392028808594, -0.6940078735351562, 0.3016204833984375, -0.22039031982421875, -0.2470550537109375, -0.373565673828125, 0.33000946044921875, 0.17884063720703125, -0.7471542358398438, -0.1379680633544922, -0.0778656005859375, -0.13141632080078125, -0.4833831787109375, -0.9354782104492188, -0.20599365234375, 0.07767105102539062, -0.07024383544921875, 0.4037017822265625, 0.26410675048828125, 0.191986083984375, -0.23099517822265625, -0.4140777587890625, 0.039337158203125, 0.22191619873046875, 0.05188751220703125, -0.447784423828125, -0.12066650390625, -0.0264892578125, -0.294189453125, 0.46874046325683594, -0.08245849609375, 0.08628082275390625, -0.1405792236328125, 0.2772979736328125, -0.44281005859375, 0.5049514770507812, -0.15954208374023438, -0.12945938110351562, 0.5139617919921875, 0.0925445556640625, 0.33138275146484375, -0.0471343994140625, -0.43227386474609375, -0.34508514404296875, -0.24341964721679688, 0.5786209106445312, 0.37796783447265625, 0.08216094970703125, 0.0231475830078125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000015.npy"}
|
||||
{"epoch": 0.022675736961451247, "step": 16, "batch_size": 64, "mean": 0.01433536410331726, "std": 0.33667638897895813, "min": -0.7254638671875, "p10": -0.4290565490722656, "median": 0.012447357177734375, "p90": 0.41629371643066443, "max": 0.9407958984375, "pos_frac": 0.5625, "sample": [-0.3402519226074219, 0.10654067993164062, 0.4546623229980469, 0.1381053924560547, 0.17522811889648438, -0.1920604705810547, 0.17052268981933594, -0.4903717041015625, -0.7254638671875, -0.2634010314941406, 0.1818389892578125, -0.41629791259765625, 0.3171234130859375, -0.3131675720214844, 0.00899505615234375, 0.048336029052734375, 0.3267669677734375, -0.1568317413330078, 0.079315185546875, 0.0128173828125, 0.59210205078125, -0.374359130859375, 0.9407958984375, -0.32570648193359375, -0.11360549926757812, 0.8368301391601562, -0.2969703674316406, 0.01207733154296875, 0.013614654541015625, -0.4345245361328125, 0.1881866455078125, 0.20774078369140625, -0.511474609375, 0.06473922729492188, -0.03679656982421875, -0.1003265380859375, 0.025390625, -0.4916229248046875, 0.23108291625976562, -0.3614501953125, -0.4450569152832031, -0.06368255615234375, -0.080780029296875, 0.16608428955078125, -0.4546356201171875, 0.3137092590332031, 0.0605010986328125, -0.24774169921875, -0.15006256103515625, -0.088226318359375, -0.039897918701171875, 0.007844924926757812, 0.0038890838623046875, 0.558258056640625, 0.039630889892578125, -0.09048080444335938, 0.18109893798828125, 0.670257568359375, 0.772186279296875, 0.199188232421875, 0.0320892333984375, 0.30960655212402344, -0.00983428955078125, 0.08538818359375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000016.npy"}
|
||||
{"epoch": 0.02418745275888133, "step": 17, "batch_size": 64, "mean": -0.04374605417251587, "std": 0.37054377794265747, "min": -1.181121826171875, "p10": -0.46564331054687497, "median": -0.03991413116455078, "p90": 0.3744922637939453, "max": 0.9732322692871094, "pos_frac": 0.40625, "sample": [-0.4113807678222656, -0.1294097900390625, -0.13523101806640625, 0.09425544738769531, -0.09649658203125, -0.40708160400390625, -0.6394805908203125, -0.10233116149902344, -0.32892608642578125, 0.0791168212890625, 0.1030731201171875, -0.20488739013671875, -0.03479957580566406, 0.393402099609375, 0.16934967041015625, -0.43500518798828125, 0.16631317138671875, -0.28537750244140625, -0.00725555419921875, 0.02167510986328125, -0.0196380615234375, 0.46433448791503906, -0.327789306640625, 0.972198486328125, -0.3023529052734375, -0.058734893798828125, -0.4757232666015625, 0.04521751403808594, -0.013916015625, 0.354461669921875, 0.68450927734375, -0.0450286865234375, 0.02508544921875, 0.23901748657226562, 0.143402099609375, -0.4421234130859375, -0.8111228942871094, 0.15189361572265625, 0.2642669677734375, -1.181121826171875, 0.3745574951171875, -0.013187408447265625, -0.16748428344726562, 0.29630279541015625, 0.056438446044921875, 0.0865936279296875, 0.4692840576171875, -0.544921875, -0.16468048095703125, -0.114715576171875, -0.06604766845703125, -0.49871826171875, 0.3743400573730469, 0.2217082977294922, -0.1742401123046875, -0.624267578125, -0.1375732421875, -0.2542152404785156, -0.116668701171875, -0.28235626220703125, -0.025989532470703125, 0.9732322692871094, -0.0953521728515625, 0.15185546875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000017.npy"}
|
||||
{"epoch": 0.025699168556311415, "step": 18, "batch_size": 64, "mean": 0.13690713047981262, "std": 0.36349257826805115, "min": -0.5137710571289062, "p10": -0.30201759338378903, "median": 0.14180755615234375, "p90": 0.5842178344726563, "max": 1.3004608154296875, "pos_frac": 0.59375, "sample": [0.0620880126953125, 0.4741668701171875, -0.1698455810546875, 0.13656997680664062, 0.26203155517578125, 0.1126556396484375, -0.23778152465820312, -0.04205322265625, 0.703521728515625, -0.170745849609375, -0.2605400085449219, 0.495849609375, 0.08931541442871094, -0.46543121337890625, 0.4470977783203125, 0.5873489379882812, -0.00884246826171875, 0.36847686767578125, 0.18131637573242188, 0.9794921875, -0.35204315185546875, 0.2685394287109375, 0.5769119262695312, 0.21901702880859375, 0.20849227905273438, 0.0185394287109375, 0.24665069580078125, 0.26288604736328125, -0.209808349609375, -0.1668853759765625, -0.2004241943359375, -0.22194671630859375, 0.5652618408203125, 0.722503662109375, 0.35511016845703125, 0.32985687255859375, 0.3179817199707031, -0.19322967529296875, 1.3004608154296875, -0.3885040283203125, 0.350372314453125, -0.015533447265625, 0.161773681640625, -0.319793701171875, -0.02161407470703125, 0.14704513549804688, -0.3633708953857422, 0.372894287109375, 0.1604461669921875, -0.42305755615234375, -0.5137710571289062, 0.17069244384765625, 0.087738037109375, -0.1188812255859375, -0.11761474609375, -0.1589508056640625, -0.058414459228515625, -0.17909812927246094, 0.6892890930175781, 0.21001052856445312, -0.040802001953125, 0.564300537109375, 0.283905029296875, 0.6904296875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000018.npy"}
|
||||
{"epoch": 0.027210884353741496, "step": 19, "batch_size": 64, "mean": -0.03989291191101074, "std": 0.39939799904823303, "min": -0.9825439453125, "p10": -0.46179733276367185, "median": -0.06824111938476562, "p90": 0.4427379608154297, "max": 1.332611083984375, "pos_frac": 0.40625, "sample": [0.0771484375, -0.077606201171875, -0.02619171142578125, 0.1298828125, -0.09413909912109375, -0.11724853515625, 0.846160888671875, 0.30445098876953125, 0.03752899169921875, -0.2974090576171875, 0.5328140258789062, 0.04798126220703125, 0.4322090148925781, 0.08194923400878906, 0.4472503662109375, 0.011119842529296875, -0.3240928649902344, -0.35692596435546875, -0.073272705078125, -0.03025054931640625, -0.35015106201171875, -0.61187744140625, -0.19304275512695312, 0.388214111328125, -0.9825439453125, 0.02877044677734375, 0.125946044921875, -0.06320953369140625, -0.0189666748046875, 0.4572257995605469, -0.11652374267578125, 0.6505355834960938, 0.188690185546875, -0.28472137451171875, 1.332611083984375, -0.7694854736328125, 0.029693603515625, 0.31261444091796875, -0.4619598388671875, -0.0368499755859375, -0.12489128112792969, -0.43115234375, -0.733001708984375, 0.885040283203125, 0.3460235595703125, 0.000843048095703125, -0.37265777587890625, -0.3005218505859375, -0.017168045043945312, -0.2129364013671875, -0.120147705078125, 0.19915771484375, -0.5625076293945312, -0.08100128173828125, 0.08014678955078125, -0.09989166259765625, -0.28235626220703125, -0.208282470703125, 0.2164134979248047, -0.4373779296875, -0.0991363525390625, -0.36692047119140625, -0.5457305908203125, -0.46141815185546875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000019.npy"}
|
||||
{"epoch": 0.02872260015117158, "step": 20, "batch_size": 64, "mean": -0.046687573194503784, "std": 0.3625895380973816, "min": -1.3373031616210938, "p10": -0.4299465179443359, "median": -0.03325843811035156, "p90": 0.3668777465820313, "max": 0.7930030822753906, "pos_frac": 0.453125, "sample": [0.0260162353515625, -0.1114959716796875, 0.7930030822753906, 0.008274078369140625, -0.4270439147949219, 0.10980987548828125, 0.1990966796875, 0.39013671875, 0.3509521484375, -0.2924079895019531, -0.05231475830078125, -0.23386001586914062, -0.4361114501953125, -0.015460968017578125, 0.0091400146484375, 0.2570381164550781, -0.226348876953125, -0.2502784729003906, 0.194488525390625, -0.5371322631835938, -0.23828697204589844, 0.0101776123046875, 0.20233917236328125, -0.25849151611328125, -0.13759231567382812, -1.3373031616210938, 0.054836273193359375, 0.05928802490234375, -0.07968902587890625, -0.11620521545410156, -0.2003173828125, -0.2943611145019531, 0.56982421875, -0.017345428466796875, -0.0190887451171875, -0.43119049072265625, 0.0043182373046875, -0.74560546875, 0.24678802490234375, -0.17496871948242188, 0.09711837768554688, -0.0496063232421875, 0.3737030029296875, 0.594482421875, 0.13492584228515625, 0.23770904541015625, 0.27475738525390625, -0.1478271484375, 0.2404327392578125, -1.1600494384765625, 0.0072174072265625, -0.2890625, -0.18292236328125, 0.1614532470703125, -0.05323028564453125, -0.2822093963623047, -0.47821044921875, -0.1251373291015625, 0.45374298095703125, -0.047428131103515625, -0.3473529815673828, 0.5801467895507812, 0.23965835571289062, -0.07294273376464844], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000020.npy"}
|
||||
{"epoch": 0.030234315948601664, "step": 21, "batch_size": 64, "mean": 0.08522748947143555, "std": 0.4610610008239746, "min": -1.54876708984375, "p10": -0.40401325225830076, "median": 0.047141075134277344, "p90": 0.6058532714843751, "max": 1.1579132080078125, "pos_frac": 0.5625, "sample": [-0.14656448364257812, -0.108367919921875, 0.008335113525390625, 1.1579132080078125, -1.0511322021484375, 0.599273681640625, 0.00634002685546875, 0.24430465698242188, -0.0164031982421875, 0.31308555603027344, 0.7585525512695312, 0.21840667724609375, 1.052734375, -0.6805191040039062, 0.8232536315917969, 0.0246734619140625, -0.0866546630859375, 0.1780548095703125, -0.0673980712890625, 0.40122032165527344, 0.5657272338867188, 0.1104583740234375, -0.18385696411132812, -0.30649566650390625, -0.0361328125, -0.12493133544921875, -0.04168510437011719, -0.365447998046875, 0.5465087890625, 0.06077384948730469, -0.234649658203125, -0.07507514953613281, -0.4088134765625, -0.10018157958984375, 0.1996612548828125, 0.0866241455078125, -0.5915069580078125, -0.17484283447265625, 0.29535675048828125, -0.03146171569824219, 0.11748504638671875, 0.608673095703125, -1.54876708984375, 0.5223236083984375, 0.83990478515625, 0.25920867919921875, -0.4210700988769531, 0.2504463195800781, -0.092681884765625, 0.20513916015625, -0.6520347595214844, 0.03350830078125, -0.39281272888183594, 0.6745033264160156, 0.37339210510253906, -0.2871551513671875, -0.04416656494140625, 0.1600818634033203, 0.5331249237060547, 0.464080810546875, -0.008453369140625, 0.5362091064453125, 0.18443679809570312, 0.32004547119140625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000021.npy"}
|
||||
{"epoch": 0.031746031746031744, "step": 22, "batch_size": 64, "mean": 0.024462968111038208, "std": 0.46661561727523804, "min": -0.93963623046875, "p10": -0.46696777343749996, "median": -0.023090362548828125, "p90": 0.5563173294067386, "max": 1.8746490478515625, "pos_frac": 0.484375, "sample": [-0.30633544921875, 0.11455535888671875, -0.0026092529296875, 0.045177459716796875, -0.32825660705566406, 0.11175537109375, -0.05076789855957031, -0.8762283325195312, -0.04977226257324219, 0.46637535095214844, -0.07120895385742188, -0.6308059692382812, 0.11197280883789062, 0.09459686279296875, 0.6510391235351562, -0.1865692138671875, -0.20137405395507812, -0.06296539306640625, 0.33956146240234375, 0.10852813720703125, -0.2799835205078125, -0.0445404052734375, 0.21074676513671875, 0.1616363525390625, -0.09306526184082031, 0.935455322265625, 0.10242843627929688, 0.03217315673828125, 0.1492767333984375, 0.13659286499023438, 1.8746490478515625, -0.29736328125, 0.3906364440917969, 0.07538986206054688, -0.13400650024414062, 0.00830078125, -0.15069580078125, 0.6380996704101562, -0.09015655517578125, 0.01389312744140625, 0.351470947265625, 1.164306640625, -0.4842987060546875, -0.4265289306640625, -0.1174774169921875, -0.93963623046875, 0.2929267883300781, -0.1330718994140625, 0.5948638916015625, -0.663482666015625, -0.092987060546875, -0.8822021484375, 0.04901123046875, -0.048858642578125, -0.9284820556640625, -0.1078338623046875, 0.16800308227539062, -0.13899612426757812, -0.202972412109375, 0.9301910400390625, -0.05999755859375, 0.26000213623046875, -0.04357147216796875, 0.1091156005859375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000022.npy"}
|
||||
{"epoch": 0.03325774754346183, "step": 23, "batch_size": 64, "mean": -0.0017971396446228027, "std": 0.4014335870742798, "min": -0.9146575927734375, "p10": -0.4821434020996093, "median": 0.009494781494140625, "p90": 0.5233558654785156, "max": 1.124542236328125, "pos_frac": 0.53125, "sample": [0.47292327880859375, 1.124542236328125, 0.1308135986328125, -0.7882080078125, 0.18122291564941406, 0.06004142761230469, -0.10003662109375, 0.064727783203125, -0.8987655639648438, 0.066650390625, -0.15943145751953125, 0.2468433380126953, -0.2510337829589844, -0.20766448974609375, 0.33103179931640625, 0.528778076171875, -0.099090576171875, -0.12262725830078125, -0.1822052001953125, 0.23921966552734375, 0.6528968811035156, 0.05303955078125, -0.15000534057617188, 0.5335006713867188, 0.00675201416015625, 0.3755645751953125, -0.40763092041015625, 0.38457489013671875, -0.9146575927734375, 0.5107040405273438, -0.447509765625, 0.2850074768066406, -0.08232879638671875, 0.09888076782226562, 0.01120758056640625, -0.31198883056640625, -0.17205810546875, -0.1479949951171875, 0.026706695556640625, -0.016567230224609375, -0.8156890869140625, 0.6503524780273438, -0.0471038818359375, -0.83837890625, -0.5590972900390625, 0.3486480712890625, 0.13945770263671875, -0.09306716918945312, -0.49698638916015625, 0.253082275390625, 0.575836181640625, -0.12500381469726562, 0.06280899047851562, 0.12465667724609375, 0.686248779296875, -0.09017181396484375, -0.33386993408203125, 0.007781982421875, -0.259613037109375, 0.09966278076171875, -0.2682037353515625, 0.0504302978515625, 0.2664356231689453, -0.379058837890625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000023.npy"}
|
||||
{"epoch": 0.03476946334089191, "step": 24, "batch_size": 64, "mean": 0.053469330072402954, "std": 0.38753730058670044, "min": -1.0517578125, "p10": -0.3152122497558594, "median": 0.08011245727539062, "p90": 0.4617012023925783, "max": 1.1743087768554688, "pos_frac": 0.59375, "sample": [0.201629638671875, 0.34515380859375, 0.33625030517578125, 0.0002422332763671875, 0.25201416015625, 0.2924537658691406, 0.09327316284179688, 0.5609283447265625, -0.410736083984375, 1.1743087768554688, -0.05619049072265625, -0.0067596435546875, 0.12666702270507812, -0.2866058349609375, 0.8069229125976562, 0.004241943359375, 0.5672340393066406, -0.15196990966796875, 0.47846221923828125, -0.21047210693359375, 0.22186279296875, 0.4225921630859375, -0.16425323486328125, -0.16051101684570312, 0.07494354248046875, 0.13085556030273438, 0.22900009155273438, 0.3127899169921875, -0.3154296875, -0.2287139892578125, -0.16477584838867188, 0.0011272430419921875, -0.20660400390625, 0.4134063720703125, 0.4005775451660156, -1.0517578125, 0.3074798583984375, 0.4961280822753906, -0.01145172119140625, -0.037384033203125, -0.2572479248046875, 0.05355072021484375, 0.23474502563476562, 0.10756492614746094, -0.6334228515625, 0.253387451171875, -0.31470489501953125, 0.0193023681640625, 0.15288543701171875, 0.4145660400390625, 0.13851165771484375, -0.28726959228515625, -0.09210205078125, -0.16071319580078125, -0.017486572265625, 0.32512664794921875, 0.16044235229492188, -0.52886962890625, 0.0852813720703125, -0.2888603210449219, 0.7144851684570312, -0.897064208984375, -0.8555145263671875, 0.3085136413574219], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000024.npy"}
|
||||
{"epoch": 0.036281179138321996, "step": 25, "batch_size": 64, "mean": 0.009904235601425171, "std": 0.4154755473136902, "min": -1.001312255859375, "p10": -0.5857414245605469, "median": 0.03257942199707031, "p90": 0.4416587829589844, "max": 1.033355712890625, "pos_frac": 0.546875, "sample": [0.3818626403808594, 0.29331207275390625, -0.04505348205566406, 0.44814300537109375, 0.2813873291015625, 0.26889801025390625, 1.033355712890625, 0.025342941284179688, 0.39056396484375, -0.18355178833007812, 0.18062591552734375, -0.2967529296875, -0.2546234130859375, 0.254730224609375, -0.086212158203125, 0.2818336486816406, 0.33193206787109375, 0.4988212585449219, 0.10617828369140625, -0.5638504028320312, -0.191558837890625, 0.287750244140625, -0.7384033203125, 0.7716827392578125, -0.0269317626953125, 0.2391490936279297, 0.3171100616455078, -1.001312255859375, -0.6830215454101562, 0.034061431884765625, -0.3065528869628906, 0.36389923095703125, 0.4489288330078125, -0.01976776123046875, 0.115692138671875, 0.6243019104003906, -0.90057373046875, 0.09725570678710938, -0.4800262451171875, -0.595123291015625, -0.1056671142578125, -0.21146011352539062, -0.11226844787597656, -0.09202957153320312, -0.21302413940429688, -0.21834945678710938, 0.269683837890625, -0.2520561218261719, 0.00466156005859375, -0.6245803833007812, -0.17827224731445312, 0.20507431030273438, -0.34336090087890625, 0.031097412109375, -0.820526123046875, 0.21249008178710938, 0.0789031982421875, -0.44509124755859375, -0.12158203125, 0.4265289306640625, 0.30173492431640625, 0.972747802734375, 0.11792373657226562, 0.04779052734375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000025.npy"}
|
||||
{"epoch": 0.03779289493575208, "step": 26, "batch_size": 64, "mean": 0.07625627517700195, "std": 0.4260343611240387, "min": -1.2526321411132812, "p10": -0.3796140670776367, "median": 0.08757400512695312, "p90": 0.5346542358398441, "max": 1.557708740234375, "pos_frac": 0.59375, "sample": [-0.073394775390625, -0.160308837890625, -0.05109405517578125, -0.05675506591796875, 0.3349266052246094, -0.3164863586425781, -1.2526321411132812, 0.11299514770507812, 0.208953857421875, 0.11749267578125, 0.9510421752929688, 0.3681640625, 0.0533905029296875, -0.345245361328125, -0.24953460693359375, -0.42028045654296875, 0.09659576416015625, -0.950927734375, 0.10622406005859375, 0.15071487426757812, 0.45440673828125, -0.23108291625976562, 0.5690460205078125, -0.4262351989746094, 0.270050048828125, 0.24341583251953125, 0.08217620849609375, 0.08877182006835938, -0.12066650390625, 0.8220710754394531, 0.09528732299804688, 0.3174629211425781, -0.0751953125, 0.274017333984375, 0.239593505859375, -0.2129669189453125, -0.5132217407226562, -0.14316558837890625, -0.10091781616210938, 0.27484130859375, 0.9083938598632812, 0.1693572998046875, 0.3545646667480469, 0.4092559814453125, 0.4504241943359375, -0.1382598876953125, -0.3497161865234375, 0.6470947265625, 1.557708740234375, -0.3924274444580078, -0.4204254150390625, 0.035980224609375, -0.21000289916992188, 0.17242431640625, 0.177581787109375, 0.7905197143554688, 0.1498565673828125, 0.11745452880859375, 0.075653076171875, -0.00797271728515625, -0.15641021728515625, -0.10957908630371094, 0.08637619018554688, 0.0310211181640625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000026.npy"}
|
||||
{"epoch": 0.039304610733182165, "step": 27, "batch_size": 64, "mean": 0.09230378270149231, "std": 0.34252890944480896, "min": -0.663482666015625, "p10": -0.3000701904296875, "median": 0.04839324951171875, "p90": 0.4701511383056641, "max": 1.40887451171875, "pos_frac": 0.546875, "sample": [-0.02561187744140625, 0.25731658935546875, 0.03533935546875, -0.15120315551757812, -0.0270538330078125, -0.4237823486328125, -0.14736557006835938, 0.35437774658203125, -0.2723388671875, 0.4595184326171875, 0.5537872314453125, 0.8047332763671875, 0.3194293975830078, 0.16779136657714844, 0.195556640625, -0.12733078002929688, 0.2762451171875, -0.02541351318359375, -0.22187042236328125, -0.34810638427734375, -0.29924774169921875, -0.005950927734375, 0.752899169921875, -0.30641937255859375, 0.2536163330078125, 1.40887451171875, 0.0213470458984375, -0.17387008666992188, 0.0029296875, 0.19349288940429688, -0.4906158447265625, 0.26871299743652344, -0.21687889099121094, 0.2227630615234375, 0.3064079284667969, -0.08347320556640625, -0.06644439697265625, 0.14115142822265625, 0.18404388427734375, 0.148834228515625, 0.5544586181640625, 0.5255584716796875, 0.0749359130859375, -0.009613037109375, 0.4049530029296875, 0.161956787109375, -0.663482666015625, 0.462677001953125, -0.03409576416015625, 0.3238563537597656, 0.4733543395996094, -0.18199920654296875, -0.03372764587402344, -0.197540283203125, -0.02730560302734375, -0.12383270263671875, -0.30042266845703125, 0.3986244201660156, 0.4001007080078125, -0.41576385498046875, 0.0614471435546875, 0.1912078857421875, -0.14583969116210938, 0.09174346923828125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000027.npy"}
|
||||
{"epoch": 0.04081632653061224, "step": 28, "batch_size": 64, "mean": -0.06432461738586426, "std": 0.37858933210372925, "min": -1.2261199951171875, "p10": -0.5290882110595703, "median": -0.050533294677734375, "p90": 0.41375579833984405, "max": 0.6648101806640625, "pos_frac": 0.453125, "sample": [-0.9043426513671875, -1.1146392822265625, 0.01654052734375, 0.16545867919921875, 0.03925323486328125, -0.16312789916992188, -0.650909423828125, 0.3436126708984375, -0.0929718017578125, -0.3335723876953125, -0.4383811950683594, 0.5836257934570312, 0.5138931274414062, -0.07970428466796875, -0.0518035888671875, -0.21936416625976562, -0.23176956176757812, -0.07500839233398438, 0.24176788330078125, 0.6648101806640625, -0.1896514892578125, 0.00762939453125, -0.09540748596191406, -0.27431488037109375, -0.051624298095703125, 0.10532951354980469, 0.08056068420410156, 0.1610870361328125, -0.08601760864257812, -0.12915420532226562, -0.11578369140625, -0.03803253173828125, -0.049442291259765625, 0.5913848876953125, -0.98211669921875, 0.443817138671875, 0.06455802917480469, -0.226806640625, -1.2261199951171875, -0.184112548828125, -0.03337669372558594, -0.2391510009765625, 0.3024482727050781, 0.05633544921875, 0.16770172119140625, 0.1464099884033203, 0.14795684814453125, -0.7347564697265625, 0.17205429077148438, -0.12412643432617188, -0.1460418701171875, -0.3947906494140625, 0.4557456970214844, -0.567962646484375, 0.01983642578125, 0.49163818359375, -0.0672149658203125, 0.0272979736328125, 0.1927337646484375, -0.197998046875, -0.19204330444335938, 0.08417129516601562, 0.20196151733398438, 0.095245361328125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000028.npy"}
|
||||
{"epoch": 0.042328042328042326, "step": 29, "batch_size": 64, "mean": 0.002994030714035034, "std": 0.41855064034461975, "min": -1.340850830078125, "p10": -0.47092666625976565, "median": 0.012045860290527344, "p90": 0.5154361724853516, "max": 0.9193878173828125, "pos_frac": 0.515625, "sample": [-0.30587005615234375, -0.2834587097167969, -0.01544189453125, 0.5803375244140625, -0.447357177734375, 0.2416248321533203, -0.13936614990234375, 0.1822528839111328, -0.38645172119140625, 0.4752655029296875, -0.026979446411132812, -0.2904205322265625, -0.2517242431640625, 0.4439735412597656, -0.14017486572265625, 0.5234222412109375, 0.27924346923828125, 0.23916244506835938, -1.340850830078125, 0.17962646484375, -0.1298980712890625, 0.5102691650390625, 0.06462860107421875, 0.20838165283203125, 0.1735382080078125, 0.05985260009765625, -0.15900421142578125, 0.7074508666992188, 0.455413818359375, 0.05761528015136719, -0.5655975341796875, -0.4728240966796875, 0.9193878173828125, -0.3572273254394531, -0.25311279296875, -0.041229248046875, 0.11731529235839844, 0.3317413330078125, 0.3064727783203125, 0.448638916015625, 0.2776317596435547, -0.0667877197265625, -0.30638885498046875, 0.47621917724609375, 0.3439826965332031, 0.16658782958984375, -0.57318115234375, -0.6515350341796875, 0.5176506042480469, 0.02675628662109375, 0.707427978515625, 0.2947998046875, -0.07894134521484375, 0.0058879852294921875, -0.32802581787109375, 0.5578460693359375, -0.260009765625, -0.46649932861328125, -0.144317626953125, -0.3813018798828125, -0.8100738525390625, 0.0182037353515625, -0.3037071228027344, -0.7292327880859375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000029.npy"}
|
||||
{"epoch": 0.04383975812547241, "step": 30, "batch_size": 64, "mean": 0.029387563467025757, "std": 0.44299933314323425, "min": -1.285888671875, "p10": -0.49040679931640624, "median": 0.06070137023925781, "p90": 0.5310226440429688, "max": 1.387542724609375, "pos_frac": 0.53125, "sample": [0.819915771484375, -0.4961357116699219, -0.23003387451171875, 0.05355072021484375, -0.024837493896484375, -0.24321746826171875, 0.1747264862060547, -0.263946533203125, -0.6547470092773438, 0.2193450927734375, 0.2794952392578125, -0.054779052734375, 0.1213226318359375, 0.18475341796875, 0.2144012451171875, -0.0535736083984375, 0.19466781616210938, -0.832183837890625, 0.06785202026367188, -0.30249786376953125, 0.5980148315429688, -0.06847000122070312, -0.29811859130859375, -0.230621337890625, 0.5029144287109375, -0.17008209228515625, 0.32865142822265625, 0.2508392333984375, 0.4112129211425781, 0.12551498413085938, 0.37884521484375, 0.32283782958984375, 0.7142829895019531, 0.07244873046875, 1.387542724609375, -0.4932403564453125, 0.168548583984375, -0.07862281799316406, 0.6139678955078125, -0.408660888671875, 0.00226593017578125, 0.5411529541015625, -1.285888671875, 0.21430206298828125, -0.21720123291015625, -0.7440719604492188, 0.4836273193359375, -0.16539764404296875, -0.3122272491455078, -0.483795166015625, 0.4482078552246094, 0.8493499755859375, 0.16833114624023438, -0.3964042663574219, -0.4137611389160156, 0.2310352325439453, 0.2722625732421875, -0.5967330932617188, 0.50738525390625, -0.11486053466796875, -0.17601776123046875, -0.26944732666015625, -0.08929443359375, 0.1261005401611328], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000030.npy"}
|
||||
{"epoch": 0.045351473922902494, "step": 31, "batch_size": 64, "mean": -0.04079073667526245, "std": 0.4297138452529907, "min": -1.0094146728515625, "p10": -0.6632720947265625, "median": 0.0024623870849609375, "p90": 0.448497772216797, "max": 1.191009521484375, "pos_frac": 0.515625, "sample": [0.6035690307617188, -0.0200347900390625, -0.679595947265625, -0.33527374267578125, 0.15433692932128906, 0.2749481201171875, -0.42950439453125, 0.179473876953125, -0.07064056396484375, 0.13874053955078125, 0.002788543701171875, 0.5448150634765625, -0.16447067260742188, -0.62518310546875, -0.688079833984375, 0.01116943359375, 0.07276153564453125, -0.084259033203125, 0.47975921630859375, 0.41765594482421875, -0.5761566162109375, 0.18703842163085938, -0.13973236083984375, 0.13049697875976562, -0.7728424072265625, -0.8386383056640625, -0.13491058349609375, 0.00213623046875, -0.4420928955078125, 0.0538177490234375, -0.48797607421875, 0.16064071655273438, 0.09769439697265625, 0.25969696044921875, 0.028369903564453125, -0.17692184448242188, -0.8868408203125, 0.24303436279296875, -0.08066940307617188, 0.2590503692626953, 0.34581756591796875, 0.05437469482421875, 0.4617156982421875, -0.8863601684570312, 1.191009521484375, -1.0094146728515625, -0.35120391845703125, 0.5252418518066406, -0.13965988159179688, -0.0655364990234375, 0.188507080078125, -0.057952880859375, -0.067840576171875, 0.3349266052246094, -0.3123779296875, 0.16268157958984375, -0.02239990234375, 1.038848876953125, 0.16071510314941406, 0.13086318969726562, 0.15703582763671875, -0.34718894958496094, -0.4029045104980469, -0.36767578125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000031.npy"}
|
||||
{"epoch": 0.04686318972033258, "step": 32, "batch_size": 64, "mean": 0.04694738984107971, "std": 0.4311392903327942, "min": -1.0433120727539062, "p10": -0.5236331939697265, "median": 0.05698966979980469, "p90": 0.5405925750732423, "max": 1.079315185546875, "pos_frac": 0.5625, "sample": [0.44623565673828125, 0.14295196533203125, -0.611328125, 0.2982177734375, 0.442291259765625, 0.2969551086425781, -0.485931396484375, 0.6096343994140625, -0.445587158203125, -0.35604095458984375, -0.16486358642578125, -0.2771759033203125, 0.027126312255859375, -0.4043731689453125, 0.251495361328125, -0.89068603515625, 0.210601806640625, -0.006072998046875, 0.038055419921875, 0.030956268310546875, -0.2715415954589844, 0.25989532470703125, -0.041843414306640625, 0.1013946533203125, -0.3370170593261719, -0.0912017822265625, 0.7054023742675781, 0.1259765625, -0.3206787109375, 1.050628662109375, 0.1322917938232422, -0.222625732421875, -0.037685394287109375, -1.0433120727539062, 0.4637908935546875, 0.24802398681640625, -0.7641677856445312, 0.41115570068359375, 0.4592437744140625, -0.04636383056640625, 0.2869110107421875, 0.7251167297363281, -0.5538711547851562, -0.58343505859375, 0.10750198364257812, 0.07592391967773438, -0.052276611328125, 1.079315185546875, 0.7495498657226562, -0.0491180419921875, -0.1509552001953125, 0.18068695068359375, -0.5397911071777344, 0.19394683837890625, 0.49585723876953125, 0.034442901611328125, 0.2571868896484375, 0.5597648620605469, -0.1443023681640625, 0.08307266235351562, -0.2135772705078125, 0.4069976806640625, 0.45452880859375, -0.332672119140625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000032.npy"}
|
||||
{"epoch": 0.04837490551776266, "step": 33, "batch_size": 64, "mean": 0.006792932748794556, "std": 0.4158684015274048, "min": -1.4757766723632812, "p10": -0.4864458084106445, "median": 0.0018157958984375, "p90": 0.4644996643066408, "max": 1.3910064697265625, "pos_frac": 0.515625, "sample": [0.481475830078125, 0.271148681640625, 0.330841064453125, -0.7314605712890625, -0.022855758666992188, -0.0541534423828125, -1.4757766723632812, -0.3939323425292969, 0.30619049072265625, 0.5443115234375, -0.096771240234375, -0.5037841796875, -0.44598960876464844, -0.387237548828125, -0.04326629638671875, -0.081207275390625, -0.55841064453125, -0.15304183959960938, 0.21364974975585938, 0.8828887939453125, -0.2928924560546875, 0.0033416748046875, 0.09720802307128906, -0.11634445190429688, -0.0679473876953125, 0.0318603515625, -0.0755462646484375, 0.3054656982421875, -0.06699371337890625, 0.3893890380859375, -0.038158416748046875, 0.1000823974609375, 0.06689453125, -0.16284561157226562, 0.5041999816894531, -0.13146209716796875, -0.2935142517089844, 0.42488861083984375, 0.4883384704589844, 0.4166259765625, 0.004364013671875, 0.09792327880859375, 0.035350799560546875, -0.007781982421875, 1.3910064697265625, 0.04461669921875, 0.09928131103515625, -0.05755615234375, 0.24089431762695312, 0.13972854614257812, 0.024127960205078125, 0.23514556884765625, 0.19352340698242188, -0.26795196533203125, -0.5745468139648438, -0.2789154052734375, 0.22259521484375, 0.282806396484375, 0.66973876953125, -0.2987518310546875, -0.5777053833007812, 0.0002899169921875, -0.050640106201171875, -0.798004150390625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000033.npy"}
|
||||
{"epoch": 0.049886621315192746, "step": 34, "batch_size": 64, "mean": -0.016543224453926086, "std": 0.3535204231739044, "min": -0.7591934204101562, "p10": -0.5333681106567382, "median": -0.005767822265625, "p90": 0.4112590789794922, "max": 0.7493896484375, "pos_frac": 0.484375, "sample": [-0.42118072509765625, -0.2305908203125, 0.46435546875, 0.22909164428710938, -0.02285003662109375, 0.00275421142578125, 0.6880874633789062, 0.7493896484375, -0.3056182861328125, 0.3441963195800781, -0.7017440795898438, -0.208099365234375, -0.5614776611328125, -0.31146240234375, -0.5819091796875, -0.08935546875, 0.3801727294921875, 0.09142303466796875, 0.0279693603515625, 0.14999008178710938, 0.039764404296875, 0.1433868408203125, -0.07668304443359375, -0.6180267333984375, -0.08673858642578125, 0.0231781005859375, -0.01568603515625, 0.655548095703125, 0.13455963134765625, 0.4015045166015625, 0.3200721740722656, 0.3768310546875, -0.253662109375, -0.46777915954589844, 0.1629638671875, -0.697509765625, -0.18421173095703125, -0.2773017883300781, 0.4154396057128906, -0.167510986328125, 0.00146484375, -0.19379425048828125, -0.05804634094238281, -0.3436927795410156, -0.186126708984375, -0.0071258544921875, -0.4119873046875, -0.6144027709960938, -0.0185394287109375, 0.030109405517578125, -0.33968257904052734, 0.39714813232421875, 0.0289459228515625, -0.7591934204101562, 0.08463287353515625, 0.3239898681640625, 0.33597564697265625, -0.09480857849121094, 0.44791603088378906, -0.022729873657226562, 0.18579864501953125, -0.0044097900390625, 0.5963134765625, 0.04219818115234375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000034.npy"}
|
||||
{"epoch": 0.05139833711262283, "step": 35, "batch_size": 64, "mean": 0.14254716038703918, "std": 0.4256536066532135, "min": -0.73077392578125, "p10": -0.32736549377441404, "median": 0.034160614013671875, "p90": 0.7161605834960938, "max": 1.2032470703125, "pos_frac": 0.5625, "sample": [1.2032470703125, -0.13120651245117188, 0.4147224426269531, 0.4890899658203125, 0.02317047119140625, 0.0868377685546875, 0.3600921630859375, -0.0300140380859375, -0.4461212158203125, 0.06531524658203125, 0.2563934326171875, -0.037944793701171875, 0.7239036560058594, -0.3054237365722656, 1.0948829650878906, -0.6766014099121094, 0.7581024169921875, 0.2271728515625, 0.4961090087890625, -0.09112548828125, 0.014028549194335938, 0.03997802734375, -0.2605094909667969, -0.3057441711425781, 0.3275432586669922, 0.20920944213867188, 0.9879302978515625, 0.18645095825195312, 0.6830520629882812, 0.1678466796875, 0.6778411865234375, -0.2812042236328125, 0.706878662109375, -0.10413360595703125, -0.16968536376953125, -0.05806732177734375, 0.2620391845703125, -0.0221405029296875, -0.17627716064453125, -0.37981414794921875, -0.02095794677734375, -0.09239387512207031, -0.73077392578125, 0.43746185302734375, -0.4501953125, 0.3758392333984375, 0.2709808349609375, 0.44952392578125, 0.4339179992675781, -0.33663177490234375, 0.4494781494140625, -0.040771484375, 0.02021026611328125, -0.17330169677734375, -0.6114425659179688, 0.5069198608398438, 1.102813720703125, -0.08159637451171875, -0.11595535278320312, 0.7201385498046875, -0.10528945922851562, 0.02834320068359375, -0.04547882080078125, 0.14635467529296875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000035.npy"}
|
||||
{"epoch": 0.05291005291005291, "step": 36, "batch_size": 64, "mean": -0.004450619220733643, "std": 0.3476037383079529, "min": -0.8163795471191406, "p10": -0.380340576171875, "median": -0.004772186279296875, "p90": 0.3409488677978516, "max": 1.47979736328125, "pos_frac": 0.5, "sample": [-0.270355224609375, -0.1193084716796875, 1.47979736328125, -0.2166595458984375, -0.23724746704101562, -0.2528228759765625, -0.10645294189453125, 0.09011077880859375, 0.10251617431640625, 0.1375274658203125, 0.2336273193359375, 0.1321544647216797, -0.38100433349609375, 0.05987548828125, 0.10822486877441406, 0.21793365478515625, -0.26486968994140625, -0.03302001953125, 0.0112762451171875, -0.5077705383300781, 0.3828239440917969, -0.09259796142578125, -0.37291717529296875, 0.14295005798339844, -0.23119735717773438, 0.04047393798828125, -0.14792251586914062, 0.0573883056640625, 0.19240188598632812, -0.3697509765625, 0.099151611328125, -0.07634162902832031, -0.08817672729492188, -0.02082061767578125, -0.0399932861328125, -0.3980255126953125, -0.34004974365234375, 0.3449554443359375, 0.3112525939941406, 0.0180206298828125, 0.3887939453125, 0.2613677978515625, 0.0297698974609375, -0.062252044677734375, 0.06346893310546875, 0.430389404296875, 0.17516326904296875, 0.13245391845703125, -0.1985626220703125, -0.8078842163085938, -0.8163795471191406, -0.0332183837890625, -0.37879180908203125, -0.1081085205078125, -0.06407546997070312, 0.80096435546875, 0.1919708251953125, 0.3316001892089844, 0.4494781494140625, 0.0381317138671875, 0.2997703552246094, -0.44573974609375, -0.02356719970703125, -0.5347404479980469], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000036.npy"}
|
||||
{"epoch": 0.05442176870748299, "step": 37, "batch_size": 64, "mean": -0.03487536311149597, "std": 0.45037293434143066, "min": -1.8724212646484375, "p10": -0.4753219604492187, "median": -0.024148941040039062, "p90": 0.42571563720703126, "max": 1.1612548828125, "pos_frac": 0.484375, "sample": [0.4264984130859375, -0.5374298095703125, -0.3277587890625, 0.0552520751953125, -0.42425537109375, 0.3581047058105469, -0.021205902099609375, 0.06936836242675781, 0.029958724975585938, -0.8451385498046875, -0.184051513671875, -0.4260444641113281, 0.4874076843261719, -1.8724212646484375, 0.5772743225097656, 0.0692596435546875, -0.049938201904296875, 0.6079292297363281, 0.2786216735839844, -0.908294677734375, 0.006923675537109375, -0.06587982177734375, 0.174163818359375, 0.24445343017578125, 0.10408401489257812, -0.05280303955078125, 0.42388916015625, -0.0421295166015625, -0.32854461669921875, -0.1708831787109375, -0.3890380859375, 0.233734130859375, -0.48931121826171875, 0.345245361328125, -0.12579345703125, -0.0327606201171875, 0.1929473876953125, 0.19495391845703125, 0.1053314208984375, -0.0396728515625, 1.1612548828125, 0.045867919921875, -0.02709197998046875, -0.20782470703125, -0.2692718505859375, -0.27052879333496094, -0.31591796875, -0.7328643798828125, -0.11779022216796875, -0.8101654052734375, -0.4104423522949219, 0.181365966796875, -0.17818641662597656, 0.1369152069091797, 0.000152587890625, 0.4573478698730469, -0.02783966064453125, 0.2974128723144531, 0.948028564453125, -0.44268035888671875, 0.1138153076171875, 0.40761566162109375, -0.129150390625, 0.305908203125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000037.npy"}
|
||||
{"epoch": 0.055933484504913075, "step": 38, "batch_size": 64, "mean": -0.008413195610046387, "std": 0.4723937213420868, "min": -1.160247802734375, "p10": -0.5756980895996093, "median": 0.03134441375732422, "p90": 0.47759017944335935, "max": 1.3248748779296875, "pos_frac": 0.5625, "sample": [0.09694671630859375, -0.3704566955566406, 0.1407470703125, -0.1846160888671875, -0.31098175048828125, -0.181243896484375, 0.147796630859375, -0.15176773071289062, 0.18883514404296875, -1.160247802734375, 0.0713653564453125, 0.2881889343261719, 0.026678085327148438, -0.16666793823242188, -0.7780380249023438, 0.715789794921875, 0.3852386474609375, -0.1556682586669922, 0.3729095458984375, 0.5424652099609375, 0.06912994384765625, 0.47721099853515625, -0.506103515625, 0.299407958984375, 0.015972137451171875, -0.0957794189453125, 0.07082366943359375, 0.40760231018066406, 1.3248748779296875, 0.01952362060546875, -0.21750259399414062, -0.438323974609375, 0.01985931396484375, 0.053619384765625, -0.5578765869140625, -0.9396934509277344, 0.477752685546875, 0.06906509399414062, -0.09844207763671875, 0.12823104858398438, 1.0070877075195312, -0.5388946533203125, 0.13067626953125, 0.5601348876953125, -0.15550994873046875, -0.085845947265625, 0.0572509765625, -0.5833358764648438, -0.074493408203125, 0.291595458984375, -1.04833984375, -1.0258636474609375, 1.2062454223632812, -0.24841690063476562, 0.0360107421875, -0.07046699523925781, -0.10716629028320312, 0.17234420776367188, -0.6678085327148438, 0.188201904296875, -0.20803070068359375, 0.33953857421875, 0.050384521484375, 0.1396331787109375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000038.npy"}
|
||||
{"epoch": 0.05744520030234316, "step": 39, "batch_size": 64, "mean": 0.06950537860393524, "std": 0.35353681445121765, "min": -0.927642822265625, "p10": -0.34070587158203125, "median": 0.048614501953125, "p90": 0.44356231689453135, "max": 1.2324066162109375, "pos_frac": 0.59375, "sample": [0.31363677978515625, 0.006099700927734375, 0.036865234375, 0.02942657470703125, 0.31626129150390625, -0.2110595703125, 0.559051513671875, 0.3897514343261719, -0.06319808959960938, 0.40081787109375, -0.091827392578125, -0.2915363311767578, -0.03057098388671875, -0.1306476593017578, -0.005096435546875, -0.3052253723144531, 0.076751708984375, 0.07650375366210938, 0.08978271484375, 0.0552215576171875, 0.23756027221679688, -0.42718505859375, 0.2221221923828125, -0.2424468994140625, 0.251007080078125, 0.4557685852050781, -0.3070220947265625, -0.23728370666503906, -0.6309661865234375, -6.4849853515625e-05, 0.09688568115234375, 0.385009765625, 0.6788330078125, -0.26602935791015625, -0.4350776672363281, 0.4150810241699219, 0.3242607116699219, -0.927642822265625, 0.19920730590820312, -0.332427978515625, 1.2324066162109375, -0.3977508544921875, 0.25676918029785156, 0.0400848388671875, 0.15655517578125, 0.4093017578125, 0.803924560546875, -0.0414886474609375, -0.16083145141601562, 0.16421127319335938, 0.5001602172851562, 0.5355758666992188, 0.13279342651367188, 0.3079986572265625, -0.10328483581542969, -0.3550567626953125, -0.06830215454101562, -0.3442535400390625, 0.096954345703125, 0.3345832824707031, 0.0420074462890625, 0.035531044006347656, 0.39849853515625, -0.20864105224609375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000039.npy"}
|
||||
{"epoch": 0.05895691609977324, "step": 40, "batch_size": 64, "mean": -0.00729215145111084, "std": 0.3637956976890564, "min": -0.88934326171875, "p10": -0.4531951904296875, "median": -0.0001621246337890625, "p90": 0.4287033081054688, "max": 0.909271240234375, "pos_frac": 0.5, "sample": [0.0127410888671875, 0.14167022705078125, -0.1025543212890625, 0.07713699340820312, 0.2798347473144531, 0.20589447021484375, 0.5329971313476562, -0.01018524169921875, -0.1116180419921875, -0.3898963928222656, 0.21030044555664062, -0.2550697326660156, 0.909271240234375, -0.054683685302734375, -0.011188507080078125, -0.09074211120605469, 0.7823333740234375, 0.4536590576171875, -0.3213615417480469, -0.25983428955078125, 0.15784835815429688, 0.2717437744140625, -0.329193115234375, -0.40435028076171875, -0.01883697509765625, -0.2352294921875, 0.21059799194335938, -0.17474365234375, 0.1783905029296875, 0.08063316345214844, 0.07903099060058594, 0.43643951416015625, 0.205810546875, 0.10589599609375, -0.15681076049804688, -0.21986770629882812, 0.032962799072265625, 0.41065216064453125, 0.3191337585449219, 0.6951751708984375, 0.4865226745605469, -0.738861083984375, 0.3642406463623047, 0.132049560546875, -0.801361083984375, -0.2652435302734375, 0.009860992431640625, -0.47412872314453125, -0.12268447875976562, 0.3539581298828125, 0.2919464111328125, -0.3798675537109375, -0.5332221984863281, 0.3760223388671875, -0.5298995971679688, -0.2833900451660156, -0.1880340576171875, -0.88934326171875, -0.4967498779296875, -0.258087158203125, -0.07687759399414062, 0.16819381713867188, 0.07277679443359375, -0.3285064697265625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000040.npy"}
|
||||
{"epoch": 0.06046863189720333, "step": 41, "batch_size": 64, "mean": 0.09080681204795837, "std": 0.4133578836917877, "min": -1.4043731689453125, "p10": -0.31424331665039057, "median": 0.0410614013671875, "p90": 0.57718505859375, "max": 1.37396240234375, "pos_frac": 0.578125, "sample": [0.0087127685546875, 0.37590789794921875, -0.33034515380859375, -0.44564056396484375, 0.1705322265625, 1.37396240234375, 0.051715850830078125, 0.14801406860351562, 0.5782012939453125, 0.419647216796875, -0.165496826171875, 0.037715911865234375, -0.5136680603027344, -0.3552093505859375, 0.47315216064453125, 0.10162353515625, 0.0479736328125, 0.6355743408203125, -0.15357589721679688, 0.27715492248535156, 0.27529144287109375, -0.1025390625, -0.03830718994140625, -0.012939453125, 0.28666114807128906, 0.12064361572265625, 0.32442474365234375, 0.07005882263183594, -0.23031997680664062, 0.595733642578125, -0.10747909545898438, -1.4043731689453125, 0.3350982666015625, 0.5383453369140625, -0.012859344482421875, -0.046905517578125, 0.4090919494628906, -0.27667236328125, 0.2911529541015625, -0.11376762390136719, 0.4397850036621094, -0.11469268798828125, 0.3666572570800781, 0.023916244506835938, -0.0126495361328125, 0.04257965087890625, -0.085968017578125, 0.06896209716796875, 0.5748138427734375, -0.0168304443359375, -0.1254425048828125, -0.20037841796875, -0.19045257568359375, -0.7711563110351562, -0.45278167724609375, -0.2742767333984375, 0.192840576171875, 0.03954315185546875, 0.029632568359375, -0.2421722412109375, 0.81683349609375, 0.403533935546875, 0.8819122314453125, 0.7811355590820312], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000041.npy"}
|
||||
{"epoch": 0.06198034769463341, "step": 42, "batch_size": 64, "mean": 0.03817671537399292, "std": 0.42509573698043823, "min": -1.0586471557617188, "p10": -0.4515064239501953, "median": -0.049224853515625, "p90": 0.5934417724609375, "max": 1.1683502197265625, "pos_frac": 0.40625, "sample": [0.457305908203125, 0.84698486328125, 0.19378662109375, 0.20602035522460938, -0.02188873291015625, -0.08129119873046875, -0.14246368408203125, -0.22290611267089844, -0.295928955078125, -0.011135101318359375, -0.1922454833984375, 0.496368408203125, -0.10604476928710938, -0.13426589965820312, -0.431182861328125, 0.24030303955078125, 0.9657440185546875, -0.13031005859375, -0.4602165222167969, -0.626190185546875, -0.17052459716796875, 0.59515380859375, 0.4632682800292969, -0.4637908935546875, -0.1613006591796875, 0.9162368774414062, -0.15021896362304688, -0.0143280029296875, -0.0821075439453125, -0.12649154663085938, 0.589447021484375, 0.12678909301757812, -0.2281818389892578, 0.002838134765625, -0.41790008544921875, -0.08133697509765625, -0.10748291015625, -0.3056182861328125, -0.1185302734375, 1.1683502197265625, 0.5046539306640625, 0.029022216796875, 0.3991851806640625, -0.0676422119140625, -0.571502685546875, 0.7733612060546875, 0.11521148681640625, 0.5392303466796875, -0.13155364990234375, 0.13654327392578125, -0.7103691101074219, -0.1949462890625, 0.09401702880859375, 0.03867340087890625, -1.0586471557617188, 0.1768646240234375, -0.4820098876953125, -0.09503364562988281, -0.0308074951171875, -0.000148773193359375, -0.021608352661132812, 0.21698379516601562, 0.93536376953125, -0.1362457275390625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000042.npy"}
|
||||
{"epoch": 0.06349206349206349, "step": 43, "batch_size": 64, "mean": -0.0492367148399353, "std": 0.47821515798568726, "min": -1.400146484375, "p10": -0.5103828430175781, "median": -0.11904144287109375, "p90": 0.5777069091796876, "max": 1.255645751953125, "pos_frac": 0.390625, "sample": [0.327728271484375, 0.3462066650390625, -1.196197509765625, 0.058704376220703125, 0.13277053833007812, 0.29425048828125, -0.11965560913085938, -0.21480178833007812, -0.78326416015625, -0.15679931640625, -0.11946868896484375, -0.4604034423828125, -0.2002716064453125, -0.01277923583984375, 0.85137939453125, -0.485595703125, -0.3357391357421875, -0.26970672607421875, -0.15396499633789062, -0.07276535034179688, -0.06754302978515625, 0.5874481201171875, -0.2303466796875, -0.151702880859375, 0.033138275146484375, 0.4070701599121094, -0.12816810607910156, -0.55621337890625, 0.11054229736328125, -0.020172119140625, 1.255645751953125, -0.5129852294921875, -0.2644500732421875, 0.9531478881835938, -0.5043106079101562, 0.5352783203125, 0.0498809814453125, -0.2515678405761719, 0.7654342651367188, 0.595428466796875, 0.5549774169921875, -0.049976348876953125, -0.4740486145019531, -0.4927635192871094, 0.12566757202148438, -0.31298828125, -0.285430908203125, -0.3643169403076172, -0.29662322998046875, 0.15777587890625, 0.4422416687011719, 0.18582916259765625, -0.2288055419921875, -0.11861419677734375, 0.26966094970703125, 0.23574066162109375, 0.8149337768554688, -0.08535003662109375, -0.826202392578125, -1.400146484375, -0.3580322265625, -0.6144447326660156, -0.280792236328125, 0.2153778076171875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000043.npy"}
|
||||
{"epoch": 0.06500377928949358, "step": 44, "batch_size": 64, "mean": 0.14597000181674957, "std": 0.5021471977233887, "min": -0.8046646118164062, "p10": -0.4483451843261719, "median": 0.07531166076660156, "p90": 0.7963890075683595, "max": 2.1583251953125, "pos_frac": 0.609375, "sample": [0.01883697509765625, 0.6256217956542969, 0.22942352294921875, 0.6364936828613281, -0.12963104248046875, -0.4522705078125, 0.00536346435546875, -0.39728546142578125, 0.011688232421875, -0.2923736572265625, -0.5503196716308594, -0.1053314208984375, 1.10955810546875, 0.12825584411621094, 0.3103485107421875, 0.27088165283203125, 0.2524871826171875, -0.38216400146484375, -0.044464111328125, -0.09421539306640625, 0.37876129150390625, 0.0444793701171875, 0.07163238525390625, 0.28902435302734375, 0.421600341796875, 0.1703033447265625, 1.0757827758789062, -0.103240966796875, 0.8400421142578125, 0.9195785522460938, 2.1583251953125, -0.43918609619140625, -0.48256683349609375, 0.16409683227539062, -0.006316184997558594, 0.39605712890625, 0.2519989013671875, -0.16658401489257812, 0.05452728271484375, 0.7694778442382812, -0.3097953796386719, -0.03479766845703125, 0.405181884765625, -0.15131378173828125, 0.47029876708984375, 0.07284927368164062, -0.0826568603515625, 0.19739532470703125, -0.8046646118164062, 0.23649024963378906, -0.2613525390625, 0.6583786010742188, 0.1957244873046875, -0.5166778564453125, -0.5785903930664062, 0.29622650146484375, 0.9236297607421875, 0.6301727294921875, 0.3432884216308594, -0.6773452758789062, -0.3384552001953125, -0.176300048828125, 0.80792236328125, 0.0777740478515625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000044.npy"}
|
||||
{"epoch": 0.06651549508692366, "step": 45, "batch_size": 64, "mean": 0.08005967736244202, "std": 0.43848279118537903, "min": -0.7265968322753906, "p10": -0.4498603820800781, "median": 0.09753990173339844, "p90": 0.5828022003173828, "max": 1.4684677124023438, "pos_frac": 0.578125, "sample": [-0.3606109619140625, 0.09099578857421875, -0.5718841552734375, -0.2853240966796875, -0.11461067199707031, 0.17078399658203125, 0.03365325927734375, 1.4684677124023438, 0.4404296875, -0.55755615234375, -0.29651641845703125, 1.3089828491210938, 0.13953399658203125, 0.24407958984375, -0.4406852722167969, -0.30934906005859375, -0.4537925720214844, -0.3101654052734375, -0.1064605712890625, 0.39905548095703125, 0.5715789794921875, -0.07511138916015625, 0.15496444702148438, -0.24997711181640625, -0.3533172607421875, -0.164642333984375, 0.3514251708984375, -0.62359619140625, -0.19835662841796875, -0.11827850341796875, -0.20043563842773438, -0.56512451171875, 0.1880340576171875, 0.2739219665527344, 0.652557373046875, -0.1868133544921875, 0.03199005126953125, 0.9139404296875, 0.567535400390625, 0.18402099609375, 0.5876121520996094, 0.4432373046875, -0.0175323486328125, 0.8981895446777344, 0.14456939697265625, 0.21848678588867188, 0.25537109375, 0.12318038940429688, -0.369873046875, 0.428497314453125, 0.33343505859375, 0.23268890380859375, 0.5930709838867188, 0.0873870849609375, 0.13423538208007812, 0.08044815063476562, 0.402984619140625, 0.10408401489257812, -0.5261173248291016, 0.27550697326660156, -0.7265968322753906, -0.04693603515625, 0.1597747802734375, -0.33522796630859375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000045.npy"}
|
||||
{"epoch": 0.06802721088435375, "step": 46, "batch_size": 64, "mean": 0.09679737687110901, "std": 0.3287685215473175, "min": -0.5725250244140625, "p10": -0.3063690185546875, "median": 0.09257698059082031, "p90": 0.5172149658203126, "max": 0.92950439453125, "pos_frac": 0.578125, "sample": [0.47850799560546875, -0.2953300476074219, -0.22843170166015625, -0.5725250244140625, 0.4723052978515625, 0.92950439453125, -0.3111000061035156, 0.14812469482421875, 0.21874237060546875, 0.3060150146484375, 0.053989410400390625, 0.4874725341796875, -0.1787567138671875, 0.00072479248046875, 0.1219482421875, 0.4033012390136719, -0.32807159423828125, -0.08698272705078125, 0.2946014404296875, -0.04146575927734375, 0.27979087829589844, 0.78472900390625, -0.083343505859375, -0.1950836181640625, -0.0185699462890625, -0.28922271728515625, 0.2257080078125, 0.0668792724609375, 0.0680694580078125, -0.31597137451171875, 0.1654510498046875, 0.32767486572265625, 0.08980178833007812, 0.5185699462890625, -0.040378570556640625, 0.1702117919921875, 0.1326160430908203, -0.06104278564453125, 0.17934799194335938, 0.690399169921875, -0.20206451416015625, 0.5832138061523438, 0.45215606689453125, 0.27576446533203125, 0.0953521728515625, 0.2606544494628906, 0.5583038330078125, -0.5703125, -0.048797607421875, 0.4392204284667969, -0.046573638916015625, -0.06543731689453125, -0.3220977783203125, 0.0994873046875, 0.5140533447265625, 0.5359878540039062, -0.25453948974609375, -0.19927597045898438, -0.1385650634765625, -0.4699859619140625, 0.1540985107421875, -0.26216697692871094, -0.16093063354492188, 0.3992767333984375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000046.npy"}
|
||||
{"epoch": 0.06953892668178382, "step": 47, "batch_size": 64, "mean": 0.010644763708114624, "std": 0.37189942598342896, "min": -0.8807754516601562, "p10": -0.46533813476562497, "median": -0.01624298095703125, "p90": 0.4166770935058595, "max": 1.3470001220703125, "pos_frac": 0.46875, "sample": [0.19207191467285156, 0.19915771484375, -0.4510040283203125, -0.5400924682617188, 0.0240631103515625, 0.24290847778320312, -0.07297134399414062, 0.5749893188476562, -0.2198162078857422, 0.29189300537109375, -0.4145946502685547, -0.18130111694335938, -0.8807754516601562, -0.05899810791015625, -0.08385467529296875, 0.041351318359375, 0.11995315551757812, 0.574432373046875, -0.498291015625, -0.07777786254882812, 0.1940155029296875, -0.16973876953125, -0.06139373779296875, 0.2811241149902344, 0.24606895446777344, -0.497222900390625, 0.30765724182128906, 0.04356193542480469, 0.0445404052734375, -0.2802619934082031, 0.2763671875, 0.11686325073242188, 0.07286834716796875, 0.318603515625, 0.6285324096679688, -0.44060516357421875, -0.17998504638671875, -0.10829544067382812, 0.18999862670898438, -0.7025375366210938, 0.219635009765625, 0.35489654541015625, -0.15383148193359375, -0.14743804931640625, -0.015117645263671875, -0.18379592895507812, 0.100830078125, -0.4714813232421875, 0.376617431640625, 0.0709075927734375, -0.0264892578125, -0.6317901611328125, 1.3470001220703125, 0.43384552001953125, -0.017368316650390625, -0.057220458984375, -0.09569740295410156, -0.06878280639648438, -0.44602203369140625, -0.10933303833007812, -0.0038299560546875, 0.5399932861328125, 0.76849365234375, -0.1642608642578125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000047.npy"}
|
||||
{"epoch": 0.0710506424792139, "step": 48, "batch_size": 64, "mean": 0.08006066083908081, "std": 0.3624253273010254, "min": -1.1922950744628906, "p10": -0.26204986572265626, "median": 0.1089010238647461, "p90": 0.5007057189941406, "max": 1.0135726928710938, "pos_frac": 0.578125, "sample": [0.11207962036132812, -0.0872955322265625, -0.23994064331054688, -0.26214599609375, 0.40769195556640625, 0.426788330078125, 0.1824493408203125, 0.37383270263671875, -0.06766510009765625, 0.17621231079101562, 0.0410308837890625, 0.16611099243164062, 0.5523529052734375, -0.16632843017578125, -0.3244781494140625, 0.5789947509765625, -0.0326690673828125, 0.33304405212402344, -0.12005615234375, 0.6944351196289062, -0.15916824340820312, -0.0964508056640625, 0.4388885498046875, 0.16854095458984375, -0.06862640380859375, 0.04007720947265625, 0.12891387939453125, 0.5030059814453125, -1.1922950744628906, -0.08587646484375, 0.12532806396484375, 0.579986572265625, 0.2341461181640625, -0.6077003479003906, 0.49533843994140625, -0.2618255615234375, 0.3262519836425781, 0.3399200439453125, 0.1638946533203125, -0.2318115234375, 0.10870170593261719, 1.0135726928710938, -0.08066558837890625, 0.14922714233398438, 0.9408035278320312, 0.12501907348632812, 0.4032859802246094, -0.17661285400390625, -0.1363677978515625, -0.19496917724609375, 0.2976531982421875, -0.1884307861328125, -0.4971733093261719, -0.058048248291015625, 0.023462295532226562, 0.3515586853027344, 0.109100341796875, -0.1611042022705078, -0.5001983642578125, -0.4865875244140625, -0.033657073974609375, 0.25333404541015625, 0.10394668579101562, 0.1730499267578125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000048.npy"}
|
||||
{"epoch": 0.07256235827664399, "step": 49, "batch_size": 64, "mean": 0.04423774778842926, "std": 0.37420058250427246, "min": -1.0092010498046875, "p10": -0.3595712661743164, "median": 0.04879188537597656, "p90": 0.4572055816650392, "max": 0.995452880859375, "pos_frac": 0.5625, "sample": [-0.197021484375, 0.3303031921386719, 0.051136016845703125, -0.31908416748046875, 0.04180145263671875, 0.15851783752441406, 0.07129669189453125, -0.40616607666015625, -0.15311050415039062, 0.32666778564453125, 0.2359447479248047, -0.134857177734375, -0.333404541015625, 0.3372650146484375, -0.16613388061523438, -0.3598499298095703, 0.13703536987304688, -0.7509994506835938, -0.3165092468261719, -0.7252197265625, 0.019632339477539062, -0.05034637451171875, 0.17616939544677734, 0.995452880859375, -0.11677360534667969, 0.7237167358398438, 0.3565521240234375, -0.3589210510253906, 0.025835037231445312, -0.17247772216796875, 0.22861862182617188, -0.12263107299804688, 0.23279953002929688, 0.07403564453125, 0.2657470703125, 0.081268310546875, -0.0039215087890625, -0.3560791015625, 0.144012451171875, -0.3789215087890625, -1.0092010498046875, 0.7696075439453125, 0.4147987365722656, -0.0965576171875, 0.2667999267578125, 0.11092376708984375, 0.13871002197265625, -0.19263839721679688, 0.1792144775390625, -0.04747200012207031, 0.374237060546875, -0.0918121337890625, 0.41217041015625, -0.31513214111328125, 0.2133941650390625, -0.39510345458984375, -0.22574615478515625, 0.846832275390625, 0.48656463623046875, 0.47537994384765625, -0.009429931640625, 0.8219757080078125, 0.04644775390625, 0.0658721923828125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000049.npy"}
|
||||
{"epoch": 0.07407407407407407, "step": 50, "batch_size": 64, "mean": -0.03308817744255066, "std": 0.34344232082366943, "min": -0.8459396362304688, "p10": -0.5445999145507813, "median": -0.030821800231933594, "p90": 0.3828163146972657, "max": 0.8040771484375, "pos_frac": 0.453125, "sample": [-0.06656646728515625, 0.2416229248046875, 0.2301483154296875, -0.008075714111328125, -0.059787750244140625, -0.30588531494140625, 0.343597412109375, -0.5390777587890625, -0.378875732421875, 0.38761138916015625, 0.053058624267578125, 0.04517364501953125, -0.26070404052734375, -0.2526206970214844, 0.0856475830078125, 0.07061958312988281, -0.8122634887695312, 0.2964973449707031, 0.19600296020507812, 0.8040771484375, 0.45525360107421875, 0.4906578063964844, -0.11124420166015625, -0.484588623046875, 0.095916748046875, -0.546966552734375, -0.7138137817382812, 0.15215682983398438, -0.22364425659179688, -0.191009521484375, 0.4923286437988281, 0.2617645263671875, 0.2619819641113281, -0.0181884765625, -0.0290985107421875, 0.39875030517578125, -0.1057891845703125, -0.08773040771484375, -0.23778724670410156, 0.39588165283203125, -0.2071380615234375, -0.8459396362304688, 0.283355712890625, -0.123199462890625, -0.03254508972167969, -0.05008888244628906, 0.3716278076171875, -0.18050003051757812, -0.09930419921875, -0.6340560913085938, 0.1601581573486328, -0.10254669189453125, 0.1293811798095703, 0.3466358184814453, 0.09209060668945312, -0.728271484375, -0.6932029724121094, 0.12320709228515625, -0.05267333984375, 0.12969589233398438, -0.13194656372070312, -0.19727325439453125, 0.042789459228515625, -0.04293060302734375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000050.npy"}
|
||||
{"epoch": 0.07558578987150416, "step": 51, "batch_size": 64, "mean": -0.07528868317604065, "std": 0.3452436029911041, "min": -0.8863067626953125, "p10": -0.5449935913085937, "median": -0.020145416259765625, "p90": 0.338177490234375, "max": 0.7300758361816406, "pos_frac": 0.453125, "sample": [0.3388824462890625, 0.24782562255859375, 0.513580322265625, -0.485382080078125, -0.29369354248046875, 0.3365325927734375, 0.5148200988769531, -0.023830413818359375, 0.16606903076171875, 0.0769195556640625, 0.0980224609375, -0.746917724609375, 0.071044921875, 0.1134185791015625, -0.2101898193359375, -0.3637237548828125, 0.10040283203125, -0.0747833251953125, 0.7300758361816406, -0.47613525390625, -0.11557769775390625, 0.176666259765625, -0.1279144287109375, -0.1597442626953125, -0.0460662841796875, -0.5206298828125, -0.40721893310546875, -0.5687980651855469, -0.034885406494140625, -0.8132553100585938, -0.10544586181640625, 0.012664794921875, -0.208099365234375, -0.855010986328125, -0.8863067626953125, -0.088714599609375, 0.4197845458984375, -0.20672607421875, -0.2788848876953125, 0.10840225219726562, -0.15028762817382812, -0.396026611328125, 0.015472412109375, 0.00640869140625, 0.105316162109375, -0.016460418701171875, 0.18665313720703125, -0.0041656494140625, 0.107818603515625, 0.12347412109375, -0.0158843994140625, 0.15557098388671875, 0.01982879638671875, -0.29369354248046875, 0.01445770263671875, 0.06585121154785156, -0.5554351806640625, 0.519195556640625, 0.4621696472167969, -0.40908050537109375, -0.10630035400390625, -0.6743621826171875, -0.15631103515625, 0.2501373291015625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000051.npy"}
|
||||
{"epoch": 0.07709750566893424, "step": 52, "batch_size": 64, "mean": 0.03665390610694885, "std": 0.44483718276023865, "min": -1.3104019165039062, "p10": -0.3339447021484375, "median": -0.05175590515136719, "p90": 0.5775825500488282, "max": 1.553253173828125, "pos_frac": 0.453125, "sample": [-0.0511322021484375, -0.426666259765625, 0.618133544921875, 0.1837615966796875, -0.29170989990234375, -0.38935089111328125, 0.10914993286132812, -0.061920166015625, 0.13785934448242188, 0.244140625, -0.32965087890625, -0.25252532958984375, -0.335784912109375, 0.47527313232421875, 0.5513763427734375, -0.02495574951171875, 0.3027191162109375, -0.3082427978515625, 0.4024314880371094, 0.013423919677734375, -0.302459716796875, 0.05457305908203125, 0.5888137817382812, -0.17860794067382812, 1.553253173828125, -0.14141082763671875, 0.00400543212890625, 0.5163898468017578, 0.0652618408203125, -0.32373809814453125, 0.4079551696777344, -0.2764129638671875, 0.8695220947265625, 0.6126365661621094, -0.16185569763183594, -0.5235595703125, -0.17525482177734375, -0.029825210571289062, -0.3779296875, -0.074249267578125, 0.22342300415039062, -0.21858596801757812, -0.23741531372070312, 0.6307640075683594, -0.19543075561523438, 0.38146209716796875, -0.14889144897460938, -0.319183349609375, -0.1666107177734375, -1.3104019165039062, 0.5272979736328125, -0.144744873046875, -0.22819900512695312, 0.501129150390625, 0.22515869140625, 0.5278472900390625, 0.22415924072265625, 0.411041259765625, 0.7095947265625, -0.16067123413085938, -1.0581436157226562, -0.15161895751953125, -0.052379608154296875, -0.29718780517578125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000052.npy"}
|
||||
{"epoch": 0.07860922146636433, "step": 53, "batch_size": 64, "mean": 0.07224002480506897, "std": 0.43451300263404846, "min": -1.493255615234375, "p10": -0.32471561431884766, "median": 0.048702239990234375, "p90": 0.5653266906738281, "max": 0.9433135986328125, "pos_frac": 0.609375, "sample": [-0.18739700317382812, 0.6624679565429688, 0.6999473571777344, -0.12815093994140625, -0.3010368347167969, -0.023052215576171875, -0.20214462280273438, -0.172027587890625, -0.7378196716308594, 0.0245819091796875, -0.7390899658203125, 0.5014190673828125, 0.47853851318359375, 0.27557373046875, -1.0401077270507812, 0.02394866943359375, 0.179290771484375, 0.14385223388671875, -0.1053619384765625, 0.3421173095703125, 0.06996726989746094, 0.03386688232421875, 0.9433135986328125, 0.5712203979492188, 0.30101776123046875, 0.3586578369140625, -0.33486366271972656, 0.484588623046875, 0.029611587524414062, 0.265228271484375, -0.2519989013671875, -0.1742095947265625, 0.0560455322265625, -1.493255615234375, -0.0038127899169921875, 0.55157470703125, 0.4257698059082031, -0.077362060546875, 0.10468292236328125, 0.4294586181640625, -0.17768478393554688, 0.46331024169921875, 0.16320228576660156, 0.676788330078125, 0.9420852661132812, 0.516448974609375, -0.22435760498046875, -0.06180572509765625, 0.1381378173828125, -0.59503173828125, 0.26572418212890625, -0.059741973876953125, 0.023162841796875, -0.1949005126953125, 0.1182098388671875, -0.4430885314941406, 0.1735382080078125, 0.17404937744140625, 0.8408432006835938, 0.16764259338378906, -0.1579132080078125, 0.04135894775390625, -0.1805267333984375, 0.028860092163085938], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000053.npy"}
|
||||
{"epoch": 0.0801209372637944, "step": 54, "batch_size": 64, "mean": 0.03317078948020935, "std": 0.42640092968940735, "min": -1.1577987670898438, "p10": -0.5218818664550782, "median": 0.010457992553710938, "p90": 0.5816978454589844, "max": 0.9703521728515625, "pos_frac": 0.515625, "sample": [-0.45844268798828125, 0.31133270263671875, 0.01470184326171875, -0.17525291442871094, 0.30947113037109375, 0.031742095947265625, 0.3674812316894531, 0.930145263671875, 0.41504669189453125, -0.15027999877929688, 0.19430160522460938, -0.11562347412109375, 0.2930755615234375, -0.17586517333984375, 0.06358718872070312, -0.46075439453125, 0.78802490234375, 0.23571205139160156, -0.6692314147949219, 0.9703521728515625, 0.5771102905273438, -0.3943328857421875, -0.2008838653564453, 0.3287620544433594, -0.0925140380859375, 0.6614646911621094, 0.4406585693359375, 0.10305404663085938, 0.30612945556640625, 0.5533370971679688, -0.5692596435546875, 0.5090751647949219, 0.3428192138671875, -0.07001495361328125, 0.19805335998535156, 0.08443450927734375, -0.1019287109375, 0.5836639404296875, -0.5896224975585938, -0.08513641357421875, -0.15624237060546875, -0.4037933349609375, -0.404998779296875, 0.15326690673828125, -0.12363815307617188, 0.006214141845703125, 0.5341262817382812, -0.14772796630859375, -0.1352405548095703, 0.6486282348632812, -0.36199951171875, -0.58306884765625, 0.2900962829589844, -0.1038970947265625, 0.05677986145019531, -0.19746017456054688, -0.06209754943847656, 0.2571144104003906, 0.6002235412597656, -0.51898193359375, -0.5231246948242188, -1.1577987670898438, -0.6015701293945312, -0.24627304077148438], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000054.npy"}
|
||||
{"epoch": 0.08163265306122448, "step": 55, "batch_size": 64, "mean": 0.1959269940853119, "std": 0.43085023760795593, "min": -0.6092910766601562, "p10": -0.28119773864746095, "median": 0.1463165283203125, "p90": 0.6073856353759767, "max": 1.5115966796875, "pos_frac": 0.6875, "sample": [0.549774169921875, 0.207427978515625, 0.5627670288085938, -0.2792510986328125, -0.08692550659179688, -0.39398193359375, 0.21624755859375, 0.08319854736328125, -0.5664901733398438, 0.14125823974609375, 0.196136474609375, 0.57733154296875, 0.28708648681640625, 0.5773582458496094, 0.18964195251464844, 0.471832275390625, 0.5756378173828125, 0.65582275390625, 0.4414825439453125, -0.03176116943359375, -0.0110931396484375, 1.332366943359375, 0.49787139892578125, -0.08624267578125, 1.5115966796875, 0.2591876983642578, -0.3171882629394531, 1.2206878662109375, 0.44570159912109375, -0.0209197998046875, -0.17650985717773438, 0.3838653564453125, 0.00135040283203125, 0.324432373046875, -0.2423858642578125, 0.10460662841796875, 0.018453598022460938, 0.13776016235351562, 0.9009246826171875, -0.12364578247070312, 0.42691802978515625, 0.6202545166015625, 0.10810089111328125, 1.098388671875, 0.15137481689453125, 0.05938529968261719, 0.3780517578125, -0.556610107421875, -0.5175399780273438, -0.6092910766601562, 0.16003799438476562, 0.1593914031982422, -0.079925537109375, -0.2820320129394531, 0.06865692138671875, 0.354522705078125, 0.5223236083984375, -0.2397918701171875, 0.11286544799804688, 0.22100830078125, 0.09084320068359375, 0.11663818359375, -0.2664794921875, -0.09317779541015625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000055.npy"}
|
||||
{"epoch": 0.08314436885865457, "step": 56, "batch_size": 64, "mean": 0.09722745418548584, "std": 0.46636876463890076, "min": -0.97760009765625, "p10": -0.46477375030517576, "median": 0.05678749084472656, "p90": 0.6773319244384768, "max": 1.922760009765625, "pos_frac": 0.5625, "sample": [-0.1023406982421875, 0.5452499389648438, 0.11921119689941406, 0.0093841552734375, 0.15285491943359375, 0.26134490966796875, 0.7326202392578125, -0.415557861328125, -0.2869300842285156, -0.07517814636230469, 0.08668136596679688, 0.0915069580078125, 1.922760009765625, -0.6488494873046875, -0.282470703125, 0.6192131042480469, 0.7989120483398438, 0.0526885986328125, 0.18607330322265625, 0.4299354553222656, -0.03647613525390625, 0.3018951416015625, -0.08582115173339844, 1.1013679504394531, 0.17465782165527344, 0.08180999755859375, -0.11750030517578125, 0.2860984802246094, 0.2161712646484375, -0.4881706237792969, 0.01300811767578125, 0.051666259765625, 0.550079345703125, 0.3697471618652344, -0.5435638427734375, 0.550262451171875, -0.97760009765625, 0.3056526184082031, -0.45560264587402344, -0.5982704162597656, 0.9584083557128906, -0.11933135986328125, 0.0651397705078125, -0.4687042236328125, 0.08677291870117188, 0.7450485229492188, -0.2122039794921875, 0.702239990234375, -0.04092979431152344, -0.1624908447265625, -0.07268524169921875, 0.2807159423828125, 0.2715911865234375, -0.01149749755859375, -0.2575531005859375, -0.13397216796875, -0.6586456298828125, -0.00824737548828125, -0.0903778076171875, 0.46739959716796875, -0.2899971008300781, 0.39990234375, 0.060886383056640625, -0.18543243408203125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000056.npy"}
|
||||
{"epoch": 0.08465608465608465, "step": 57, "batch_size": 64, "mean": 0.18711179494857788, "std": 0.5337367653846741, "min": -1.1641693115234375, "p10": -0.36009750366210935, "median": 0.2600135803222656, "p90": 0.8016868591308595, "max": 1.5452880859375, "pos_frac": 0.609375, "sample": [0.41042327880859375, 0.04395294189453125, 0.4351654052734375, -0.29656982421875, -0.3038673400878906, -0.3806304931640625, 0.33416748046875, 0.4889068603515625, 0.1842498779296875, 0.6480789184570312, -0.529998779296875, 0.4048004150390625, 0.21126937866210938, 0.4453086853027344, 0.34307861328125, 0.2857017517089844, -1.1380615234375, 0.5094337463378906, 1.5452880859375, -0.023508071899414062, -0.31218719482421875, -0.2078704833984375, 0.5517692565917969, 0.39276123046875, -0.06565475463867188, -0.23731613159179688, 0.4222831726074219, 1.3178329467773438, -0.03708648681640625, 0.5099411010742188, 0.133209228515625, 0.8545036315917969, -0.2944297790527344, -0.7381439208984375, 0.9411773681640625, 0.7508392333984375, -0.0816650390625, 0.19629478454589844, -1.0369415283203125, 0.975860595703125, -0.07466506958007812, 0.5498924255371094, -1.1641693115234375, 0.523895263671875, 0.327301025390625, 0.5326766967773438, -0.29859161376953125, 0.7764739990234375, -0.0853118896484375, -0.3018989562988281, 0.4380455017089844, 0.7784500122070312, 0.8116455078125, 0.5924072265625, 0.26557159423828125, 0.3405609130859375, 0.956268310546875, 0.25445556640625, -0.0823516845703125, -0.30617523193359375, -0.04296875, -0.06364822387695312, 0.05617523193359375, -0.46125030517578125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000057.npy"}
|
||||
{"epoch": 0.08616780045351474, "step": 58, "batch_size": 64, "mean": 0.040792256593704224, "std": 0.5537757873535156, "min": -0.8504180908203125, "p10": -0.6873226165771484, "median": -0.0107574462890625, "p90": 0.7684783935546875, "max": 1.9820556640625, "pos_frac": 0.5, "sample": [-0.0216217041015625, 0.2848091125488281, 0.17049598693847656, -0.27866363525390625, -0.2300872802734375, -0.8130722045898438, 0.34967803955078125, 1.0475311279296875, -0.5654754638671875, 1.168182373046875, 0.48357391357421875, -0.3237724304199219, 0.77410888671875, -0.473419189453125, -0.6922492980957031, 0.10900115966796875, 0.755340576171875, -0.53045654296875, -0.24896240234375, 0.4947509765625, -0.05857086181640625, 0.1609344482421875, -0.7890663146972656, -0.1934814453125, -0.8504180908203125, 0.18691253662109375, 0.06084442138671875, 0.45098876953125, -0.287384033203125, -0.06830215454101562, -0.7349586486816406, 0.42333984375, -0.14315032958984375, 0.473602294921875, 0.06231689453125, 0.3221893310546875, 1.9820556640625, 1.15228271484375, 0.29984283447265625, 0.19964981079101562, 0.41888427734375, -0.13219642639160156, -0.071685791015625, -0.6758270263671875, -0.8096923828125, 1.0718536376953125, -0.11322784423828125, -0.2855682373046875, 0.00135040283203125, -0.3231163024902344, -0.1703338623046875, 0.5126724243164062, 0.0001068115234375, 0.18964385986328125, 0.09078598022460938, 0.8715591430664062, -0.07369041442871094, -0.38211822509765625, -0.7539215087890625, -0.316436767578125, -0.541656494140625, 0.1505889892578125, -0.15956687927246094, 0.0029773712158203125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000058.npy"}
|
||||
{"epoch": 0.08767951625094482, "step": 59, "batch_size": 64, "mean": 0.12307757139205933, "std": 0.5102850794792175, "min": -1.2604522705078125, "p10": -0.4510772705078125, "median": 0.10943984985351562, "p90": 0.7461730957031254, "max": 1.8640289306640625, "pos_frac": 0.59375, "sample": [-0.8335609436035156, -0.02249908447265625, 0.49381256103515625, -1.2604522705078125, 0.26873016357421875, 0.016809463500976562, 0.7945404052734375, -0.45244598388671875, -0.029521942138671875, 0.34539794921875, 1.26318359375, 0.22655487060546875, 0.79449462890625, 0.5031929016113281, 0.6116161346435547, 0.22642135620117188, 0.306640625, -0.03816986083984375, 0.80670166015625, 0.033355712890625, 0.09049224853515625, 0.10953521728515625, -0.46253204345703125, 0.32697296142578125, -0.14862823486328125, 0.8548469543457031, 0.109344482421875, -0.44788360595703125, -0.029296875, -0.106658935546875, -0.45392608642578125, 0.605316162109375, 0.1383514404296875, -0.19869232177734375, -1.19476318359375, 0.33161163330078125, -0.5960578918457031, -0.15670204162597656, -0.05735015869140625, 0.293121337890625, 1.8640289306640625, -0.09711456298828125, 0.19876861572265625, 0.2866249084472656, -0.35456085205078125, 0.1313304901123047, 0.3226165771484375, 0.49356651306152344, -0.41787147521972656, 0.9139556884765625, -0.2681121826171875, 0.6334228515625, 0.41077423095703125, 0.250946044921875, 0.44121551513671875, -0.17285919189453125, -0.33113861083984375, 0.1636810302734375, 0.03594970703125, -0.0148162841796875, -0.0890045166015625, 0.45086669921875, -0.14525222778320312, 0.10804367065429688], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000059.npy"}
|
||||
{"epoch": 0.08919123204837491, "step": 60, "batch_size": 64, "mean": 0.02664312720298767, "std": 0.5925369262695312, "min": -1.9620819091796875, "p10": -0.5518890380859375, "median": 0.04274749755859375, "p90": 0.6328681945800783, "max": 2.14044189453125, "pos_frac": 0.515625, "sample": [0.4918212890625, -0.28281402587890625, 0.65557861328125, 0.39215087890625, -0.5649566650390625, -0.2639274597167969, -1.9620819091796875, -0.978912353515625, -0.2821197509765625, -0.8923187255859375, 0.19672393798828125, 0.0611114501953125, -0.5524749755859375, 0.6643867492675781, 0.20954132080078125, 0.610870361328125, 0.2893486022949219, 0.5505104064941406, -0.30501747131347656, 0.16925048828125, -0.5505218505859375, 0.5652542114257812, -0.01595306396484375, 0.04931640625, -0.2774162292480469, 0.33080291748046875, 0.19791412353515625, -0.24840545654296875, 0.6422958374023438, -0.40532684326171875, -0.3392467498779297, 0.4044647216796875, 0.44873809814453125, -0.228240966796875, 0.0361785888671875, -0.0534515380859375, 0.20794677734375, -0.322021484375, 0.5896072387695312, -0.38616943359375, -1.1670379638671875, 0.06116676330566406, 0.17226028442382812, -0.1471405029296875, -0.12389755249023438, -0.03265380859375, -0.5214195251464844, 2.14044189453125, 0.9783554077148438, 0.19343185424804688, -0.2682952880859375, -0.3821868896484375, 0.30780982971191406, -0.6597976684570312, 0.9228668212890625, -0.3721027374267578, -0.1085968017578125, -0.27617645263671875, 1.35302734375, -0.11075210571289062, 0.256591796875, 0.22364234924316406, 0.23642730712890625, 0.1767597198486328], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000060.npy"}
|
||||
{"epoch": 0.09070294784580499, "step": 61, "batch_size": 64, "mean": 0.05751065909862518, "std": 0.5425071716308594, "min": -1.4740829467773438, "p10": -0.5364669799804688, "median": 0.06979942321777344, "p90": 0.6668899536132813, "max": 1.477874755859375, "pos_frac": 0.578125, "sample": [-0.533935546875, 0.6757659912109375, 0.1681671142578125, 1.201141357421875, 0.23781585693359375, -0.08095932006835938, 0.245452880859375, 0.6775283813476562, -0.2317962646484375, -0.047924041748046875, -0.569793701171875, 0.161285400390625, 0.21496200561523438, 0.2373046875, 0.2688941955566406, 0.07817459106445312, 0.3934745788574219, -0.15987396240234375, -0.8784942626953125, -0.21041107177734375, -0.21875381469726562, 0.541259765625, 1.4657058715820312, 0.006855010986328125, 0.033614158630371094, 0.11118507385253906, 0.1106109619140625, -0.5375518798828125, -0.1544036865234375, -0.13260269165039062, 0.0366973876953125, 0.3278675079345703, -0.021770477294921875, -0.3032073974609375, -0.197723388671875, 0.25342559814453125, -0.3016357421875, -0.5914020538330078, -0.4282035827636719, 0.5373764038085938, 0.5409393310546875, -1.4740829467773438, 0.8746185302734375, -0.5258026123046875, 0.3554534912109375, 0.64617919921875, 0.200103759765625, 0.21593284606933594, -0.31095314025878906, 0.3142967224121094, 0.016071319580078125, -1.083099365234375, 0.19067955017089844, 0.06142425537109375, 0.391021728515625, -0.0286865234375, 1.477874755859375, 0.18781661987304688, 0.384613037109375, -0.30908966064453125, 0.8214569091796875, -0.01847076416015625, -1.2829666137695312, -0.3487701416015625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000061.npy"}
|
||||
{"epoch": 0.09221466364323508, "step": 62, "batch_size": 64, "mean": 0.08958582580089569, "std": 0.5235627293586731, "min": -1.68878173828125, "p10": -0.592432403564453, "median": 0.16548538208007812, "p90": 0.6474472045898438, "max": 1.24517822265625, "pos_frac": 0.59375, "sample": [0.14206695556640625, -0.41431427001953125, 0.245513916015625, -0.27435302734375, 0.294647216796875, 0.4516029357910156, -0.49304962158203125, -0.13598251342773438, -0.23477935791015625, 0.040496826171875, 0.972381591796875, 0.352996826171875, 0.49353790283203125, -0.7998886108398438, 0.041412353515625, 0.761322021484375, -0.04937744140625, 0.5241775512695312, -0.10321426391601562, 0.15457916259765625, -0.720550537109375, 0.24560928344726562, -1.196533203125, 0.1763916015625, -0.29302406311035156, 0.864044189453125, 0.3994789123535156, 0.47347259521484375, 0.6785507202148438, 0.638336181640625, -0.1026763916015625, -0.34174251556396484, 0.17708969116210938, -0.7420196533203125, 0.5925140380859375, -1.68878173828125, 0.4540252685546875, 0.6829071044921875, -0.17179489135742188, -0.8473052978515625, 0.1996307373046875, -0.491363525390625, -0.6350250244140625, 0.42022705078125, 0.571319580078125, 0.4454765319824219, -0.02738189697265625, 0.30718231201171875, -0.027210235595703125, 0.6513519287109375, -0.06669998168945312, 0.21224594116210938, 0.4390392303466797, 0.24029159545898438, -0.2667236328125, -0.04286956787109375, 0.53216552734375, 1.24517822265625, 0.5332736968994141, 0.0650482177734375, 0.5753021240234375, -0.1649017333984375, -0.2362518310546875, 0.006420135498046875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000062.npy"}
|
||||
{"epoch": 0.09372637944066516, "step": 63, "batch_size": 64, "mean": 0.16455818712711334, "std": 0.5195087790489197, "min": -1.249755859375, "p10": -0.48145713806152346, "median": 0.18449878692626953, "p90": 0.6724212646484377, "max": 2.1300201416015625, "pos_frac": 0.640625, "sample": [0.4842681884765625, -0.5062484741210938, -0.36315155029296875, 0.045307159423828125, 0.63360595703125, -0.5771083831787109, 0.08673477172851562, -0.29736328125, 0.5289688110351562, -0.20341873168945312, -1.249755859375, -0.4800605773925781, 0.3359336853027344, -0.69378662109375, 0.352447509765625, 0.5553054809570312, 2.1300201416015625, 0.53228759765625, -0.0016021728515625, -0.0035552978515625, 1.0661163330078125, 0.18818092346191406, 0.3746604919433594, 0.689056396484375, 0.30052947998046875, 0.47202301025390625, -0.4820556640625, 0.3420066833496094, 0.5047502517700195, -0.6347198486328125, 0.5303955078125, 0.10270309448242188, -0.10085296630859375, -0.26737213134765625, 0.180816650390625, 0.3969573974609375, -0.5717315673828125, 0.056488037109375, 0.46467018127441406, 0.16682052612304688, 0.2398090362548828, -0.2684326171875, -0.1892719268798828, 0.351654052734375, -0.469329833984375, 0.7616348266601562, -0.33306884765625, 0.1791839599609375, 0.3584442138671875, 0.31824493408203125, -0.07461929321289062, -0.07830429077148438, -0.17724609375, 0.2723388671875, 0.24698638916015625, 0.7155914306640625, 0.445037841796875, 0.3841094970703125, -0.12902259826660156, 0.13087844848632812, 0.7354202270507812, 0.5499725341796875, 1.4017333984375, 0.07170867919921875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000063.npy"}
|
||||
{"epoch": 0.09523809523809523, "step": 64, "batch_size": 64, "mean": 0.019572317600250244, "std": 0.46429747343063354, "min": -1.788848876953125, "p10": -0.5483642578125, "median": 0.059749603271484375, "p90": 0.5648910522460939, "max": 1.096038818359375, "pos_frac": 0.5625, "sample": [0.8480453491210938, 1.096038818359375, -0.025905609130859375, -0.247650146484375, -0.5423240661621094, 0.4468231201171875, 0.6635169982910156, 0.73291015625, 0.07394599914550781, 0.58233642578125, -0.013683319091796875, 0.057933807373046875, 0.3868904113769531, 0.24599456787109375, 0.35294342041015625, 0.12597274780273438, 0.40196990966796875, -0.5673370361328125, -0.3312644958496094, -0.02161407470703125, 0.0766754150390625, -0.5831222534179688, 0.5965042114257812, 0.11380767822265625, -0.747039794921875, -0.3074760437011719, 0.27313995361328125, -0.397735595703125, 0.09947586059570312, 0.061565399169921875, -0.31502342224121094, 0.5105133056640625, -0.009250640869140625, -0.5509529113769531, 0.29583740234375, 0.06585693359375, 0.3651409149169922, -1.788848876953125, -0.04894256591796875, -0.08192825317382812, 0.5241851806640625, -0.15749359130859375, 0.2829132080078125, -0.7411079406738281, -0.41663360595703125, -0.3085479736328125, 0.1899566650390625, 0.02789306640625, -0.699127197265625, 0.29416656494140625, -0.3010978698730469, -0.12446212768554688, 0.01148223876953125, 0.36411285400390625, 0.3129081726074219, 0.06758689880371094, 0.27840423583984375, 0.13529396057128906, -0.3483390808105469, -0.21414947509765625, 0.8402175903320312, -0.43817138671875, -0.2266063690185547, 0.005504608154296875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000064.npy"}
|
||||
{"epoch": 0.09674981103552532, "step": 65, "batch_size": 64, "mean": 0.09427036345005035, "std": 0.5627335906028748, "min": -0.9626312255859375, "p10": -0.6383598327636718, "median": 0.08043766021728516, "p90": 0.8315404891967775, "max": 1.529266357421875, "pos_frac": 0.515625, "sample": [0.25432777404785156, -0.112823486328125, 1.0718994140625, -0.4376945495605469, -0.07575607299804688, 0.16595077514648438, -0.1543731689453125, -0.649444580078125, -0.10607147216796875, -0.1653900146484375, -0.930694580078125, 0.2364349365234375, 0.5871753692626953, -0.470001220703125, 0.20038604736328125, -0.6124954223632812, 0.8119792938232422, 0.7034454345703125, -0.64996337890625, 0.344879150390625, -0.52728271484375, -0.5059051513671875, -0.9626312255859375, 1.529266357421875, 0.946502685546875, 0.1131134033203125, -0.86004638671875, -0.02947235107421875, 0.17343902587890625, 0.11978912353515625, -0.041007041931152344, -0.12853240966796875, 0.43198394775390625, -0.7363815307617188, 0.6428375244140625, 0.7064132690429688, 0.2564544677734375, 0.499237060546875, 0.7144775390625, 0.7163734436035156, 0.8399238586425781, 0.06699180603027344, 0.338775634765625, 0.09388351440429688, -0.26599884033203125, -0.045440673828125, 0.227264404296875, 0.3535919189453125, -0.4676399230957031, -0.7500114440917969, 1.25042724609375, -0.053638458251953125, -0.12340545654296875, -0.14630508422851562, -0.0623626708984375, -0.35240936279296875, -0.240081787109375, 0.34099578857421875, 0.67291259765625, -0.3016510009765625, 1.1829605102539062, -0.595245361328125, 0.8492431640625, 0.15012359619140625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000065.npy"}
|
||||
{"epoch": 0.0982615268329554, "step": 66, "batch_size": 64, "mean": 0.17929738759994507, "std": 0.5900011658668518, "min": -1.4774322509765625, "p10": -0.36766357421875, "median": 0.22057342529296875, "p90": 0.8418651580810549, "max": 2.498565673828125, "pos_frac": 0.640625, "sample": [-0.0052947998046875, 2.498565673828125, -0.2433929443359375, 0.2088165283203125, 0.5468273162841797, -0.10785675048828125, -0.14081192016601562, 0.76904296875, -0.4385547637939453, 0.221710205078125, 0.017864227294921875, -0.35280609130859375, 0.1170806884765625, -0.2878684997558594, 0.413116455078125, -0.465484619140625, 0.25069427490234375, 0.5901107788085938, 0.912353515625, 1.1257667541503906, 0.5552139282226562, 0.6445846557617188, -0.17980194091796875, 0.2662010192871094, -0.2909049987792969, 0.5155811309814453, 0.33567237854003906, 0.8943061828613281, 0.5903167724609375, 0.9039230346679688, 0.7820320129394531, 0.4257240295410156, -0.2043609619140625, 0.02176666259765625, 0.21782684326171875, 0.3706474304199219, -0.2799224853515625, 0.32363128662109375, 0.36696624755859375, 0.3337440490722656, -1.4774322509765625, -1.0105552673339844, 0.34543609619140625, -0.317291259765625, -0.15625762939453125, -0.37403106689453125, 0.314697265625, 0.47786712646484375, 0.075103759765625, 0.3628425598144531, 0.0122222900390625, 0.5372848510742188, 0.16741943359375, -0.0340423583984375, 0.7569808959960938, -0.351593017578125, 0.2194366455078125, -0.217041015625, -0.8338165283203125, 0.8675079345703125, 0.9442825317382812, -0.35254669189453125, 0.398773193359375, -1.103240966796875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000066.npy"}
|
||||
{"epoch": 0.09977324263038549, "step": 67, "batch_size": 64, "mean": 0.16508588194847107, "std": 0.4774582087993622, "min": -0.792205810546875, "p10": -0.32641830444335934, "median": 0.10698699951171875, "p90": 0.7020753860473634, "max": 2.042510986328125, "pos_frac": 0.625, "sample": [-0.194793701171875, -0.056610107421875, -0.4004936218261719, -0.44244956970214844, -0.20786285400390625, 0.17584609985351562, 0.7269859313964844, 0.34069061279296875, 0.322052001953125, -0.15350341796875, 2.042510986328125, -0.05588722229003906, 0.07552337646484375, 0.1031951904296875, -0.29456329345703125, 0.15121841430664062, 0.0568389892578125, 0.5709342956542969, -0.24752426147460938, -0.386444091796875, -0.2885284423828125, 0.3874053955078125, -0.4647674560546875, 1.2928466796875, -0.06714248657226562, 0.30584716796875, 0.9104766845703125, 0.25170135498046875, 0.25946044921875, 0.040256500244140625, -0.3129119873046875, -0.03165626525878906, -0.21709442138671875, 0.6110305786132812, 0.6389999389648438, 0.16903305053710938, 0.5611038208007812, 0.6658382415771484, -0.295166015625, 0.3423042297363281, 0.1723175048828125, 0.11077880859375, 0.49395751953125, 1.1028060913085938, 0.47061920166015625, 0.6223983764648438, -0.195770263671875, -0.2652397155761719, 0.284820556640625, 0.06624603271484375, 0.32904815673828125, -0.792205810546875, -0.2225494384765625, 0.2303619384765625, 0.02277374267578125, -0.5712432861328125, -0.293121337890625, 0.46250152587890625, 0.0007534027099609375, -0.33220672607421875, 0.7419929504394531, 0.07826995849609375, 0.7176055908203125, 0.4458808898925781], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000067.npy"}
|
||||
{"epoch": 0.10128495842781557, "step": 68, "batch_size": 64, "mean": 0.09334835410118103, "std": 0.6103469729423523, "min": -1.324615478515625, "p10": -0.6994049072265625, "median": 0.06563186645507812, "p90": 0.9908905029296883, "max": 1.472686767578125, "pos_frac": 0.53125, "sample": [0.5993194580078125, 1.1546783447265625, 0.19047164916992188, -0.25927734375, -0.1968994140625, 1.472686767578125, -0.0251312255859375, 0.0212249755859375, 1.1251678466796875, -0.8591384887695312, -0.29810333251953125, -1.01727294921875, 0.3477783203125, -0.2050628662109375, 1.28424072265625, 0.7758216857910156, 0.25203704833984375, -0.6182403564453125, 0.6319427490234375, -0.34767913818359375, 1.0825271606445312, -0.43183135986328125, -0.3749237060546875, 1.0763320922851562, -0.49263763427734375, -0.188720703125, -0.24664688110351562, -0.14676666259765625, 1.3263702392578125, 0.6419677734375, 0.5607643127441406, 0.23195838928222656, 0.3071022033691406, -0.04328155517578125, 0.2823600769042969, 0.7915267944335938, 0.086822509765625, 0.04444122314453125, 0.41033935546875, 0.09720611572265625, -1.324615478515625, -0.944122314453125, -0.1132354736328125, 0.7698135375976562, -0.1828765869140625, 0.223846435546875, 0.3784637451171875, 0.15445327758789062, -0.8526077270507812, -0.7258224487304688, -0.9785003662109375, -0.02947998046875, 0.093902587890625, 0.6697845458984375, 0.5303421020507812, -0.095611572265625, -0.36480712890625, -0.6377639770507812, 0.1596832275390625, -0.04154205322265625, 0.43354034423828125, 0.1790771484375, -0.08264923095703125, -0.2884521484375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000068.npy"}
|
||||
{"epoch": 0.10279667422524566, "step": 69, "batch_size": 64, "mean": 0.2251199185848236, "std": 0.5568836331367493, "min": -1.1074409484863281, "p10": -0.42327690124511713, "median": 0.1574859619140625, "p90": 1.0228172302246097, "max": 1.5230445861816406, "pos_frac": 0.734375, "sample": [0.28701210021972656, 0.5844573974609375, 1.5230445861816406, -0.772552490234375, -0.10587310791015625, 0.3137702941894531, 0.08926773071289062, 0.04422187805175781, 0.156005859375, 0.9554672241210938, 0.38349151611328125, 0.09149932861328125, -0.3538627624511719, 1.1605453491210938, 0.35190582275390625, 0.23654937744140625, 0.1352100372314453, 1.1318511962890625, 1.405364990234375, -0.7827529907226562, 0.248382568359375, -0.8938751220703125, 0.0430145263671875, 0.5521278381347656, -0.054779052734375, -0.002414703369140625, 0.322357177734375, 0.158966064453125, -0.1671142578125, 0.0383758544921875, 0.44460296630859375, -0.086669921875, -0.11240768432617188, 0.5275421142578125, 0.3060111999511719, -0.7240142822265625, 1.105377197265625, 0.09798431396484375, -0.315673828125, -0.25799560546875, 0.2678680419921875, 0.06746292114257812, -0.3529205322265625, 0.0844879150390625, 0.8351821899414062, 0.10486602783203125, 0.8026657104492188, 0.07760810852050781, 0.05231475830078125, 0.4371795654296875, 0.0650787353515625, 0.43218994140625, -1.1074409484863281, -0.72442626953125, 1.0516815185546875, 0.4777374267578125, 0.468536376953125, 0.9385795593261719, 0.422637939453125, -0.45302581787109375, 1.465240478515625, 0.02846527099609375, 0.654754638671875, 0.24653053283691406], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000069.npy"}
|
||||
{"epoch": 0.10430839002267574, "step": 70, "batch_size": 64, "mean": 0.1886359453201294, "std": 0.6891180276870728, "min": -1.2880783081054688, "p10": -0.6053680419921875, "median": 0.0973968505859375, "p90": 1.0819057464599613, "max": 1.8768157958984375, "pos_frac": 0.578125, "sample": [0.33032989501953125, 0.2652931213378906, 0.8145294189453125, -0.997344970703125, 0.06459999084472656, -0.6430873870849609, -0.521759033203125, 0.9496231079101562, 0.8172683715820312, -0.5766410827636719, 0.9599380493164062, -0.10664749145507812, -0.256500244140625, 0.9987983703613281, 0.032863616943359375, 0.0500946044921875, 0.3955841064453125, -0.812774658203125, -0.0895843505859375, -0.4183788299560547, 0.6500701904296875, 0.6678390502929688, 0.7724533081054688, 0.1133270263671875, -1.2880783081054688, 0.39823150634765625, -0.43938446044921875, 1.8594894409179688, -0.18068695068359375, 1.117523193359375, -0.3340911865234375, -0.4806175231933594, -0.5848922729492188, -0.6141433715820312, -0.3171539306640625, 1.8768157958984375, 0.6787300109863281, -0.0619049072265625, 0.13336563110351562, 0.4941368103027344, 0.15862655639648438, 0.01602935791015625, -0.3911247253417969, 0.8059463500976562, 0.4258537292480469, 1.2003326416015625, -0.2635002136230469, -0.419952392578125, -0.2962379455566406, 0.23994827270507812, 0.0814666748046875, 1.728851318359375, -0.22297286987304688, -0.041534423828125, -0.1467437744140625, 0.39260101318359375, 0.9301071166992188, -0.61474609375, 0.685333251953125, 1.271127700805664, -0.8608551025390625, 0.23543167114257812, 0.2890129089355469, 1.1524658203125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000070.npy"}
|
||||
{"epoch": 0.10582010582010581, "step": 71, "batch_size": 64, "mean": 0.16066959500312805, "std": 0.6106449365615845, "min": -1.54376220703125, "p10": -0.6216560363769531, "median": 0.1732034683227539, "p90": 0.8730663299560547, "max": 1.2391815185546875, "pos_frac": 0.65625, "sample": [0.2140216827392578, 0.38217926025390625, 0.5175666809082031, 0.3798637390136719, 0.9635848999023438, -0.1695709228515625, 0.6392498016357422, -0.8512954711914062, 1.005035400390625, 0.799407958984375, 0.0336761474609375, -0.27165985107421875, 0.06674575805664062, 0.0736083984375, -1.214202880859375, 0.09632682800292969, -0.9901275634765625, 0.4632453918457031, 0.7165603637695312, 0.100799560546875, -0.126251220703125, 0.837249755859375, -0.16070556640625, -0.06548309326171875, 0.5218048095703125, 1.2391815185546875, 0.26139068603515625, 0.018947601318359375, 0.49471282958984375, -0.2469482421875, -1.54376220703125, -0.38623046875, -0.9652938842773438, -0.11147689819335938, 0.48509979248046875, 0.067718505859375, 0.31858062744140625, -0.3123626708984375, 0.6957511901855469, -0.5543327331542969, 0.10968780517578125, -1.4099884033203125, 0.507659912109375, 1.061431884765625, -0.6505088806152344, -0.12225151062011719, 0.9335479736328125, 0.2869720458984375, 0.5740776062011719, 0.13238525390625, 0.868865966796875, 0.9255180358886719, -0.49735260009765625, 0.6899490356445312, 0.4224853515625, 0.7910919189453125, 0.6040802001953125, 0.02574920654296875, -0.047023773193359375, 0.343475341796875, 0.8748664855957031, -0.10559272766113281, 0.8083953857421875, -0.26727294921875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000071.npy"}
|
||||
{"epoch": 0.1073318216175359, "step": 72, "batch_size": 64, "mean": 0.23458027839660645, "std": 0.6244333982467651, "min": -1.91815185546875, "p10": -0.42640914916992184, "median": 0.25225067138671875, "p90": 0.8619476318359376, "max": 2.1494674682617188, "pos_frac": 0.625, "sample": [-0.38636016845703125, 0.957000732421875, 0.27503204345703125, 0.3261528015136719, -0.6390533447265625, 0.05615997314453125, 0.1537799835205078, 1.7501029968261719, -0.2261028289794922, 0.5914535522460938, 0.1171112060546875, -0.04593658447265625, 0.5244598388671875, 0.13008499145507812, -0.10284423828125, -0.3066425323486328, 0.5374431610107422, -0.22846221923828125, -0.24945831298828125, 0.4344024658203125, -0.3224029541015625, 0.7027587890625, 0.561614990234375, 0.29366302490234375, 0.8661041259765625, 0.8246917724609375, 0.5259666442871094, 0.10319900512695312, -0.443572998046875, 0.19174957275390625, 1.087158203125, 0.5467700958251953, -0.8571548461914062, -0.02133941650390625, -0.2729339599609375, -0.050350189208984375, 0.31426429748535156, 0.2366943359375, -0.19918441772460938, 0.530914306640625, 0.9026603698730469, 0.7523002624511719, 0.72552490234375, 0.08288955688476562, 0.2678070068359375, 0.4550323486328125, 0.3545989990234375, -0.48504638671875, 0.443328857421875, -0.215576171875, -1.91815185546875, -0.4647216796875, 2.1494674682617188, -0.01566314697265625, 0.827972412109375, -0.08269500732421875, 0.7840347290039062, 0.800537109375, -0.7459564208984375, -0.09119796752929688, 1.3753662109375, -0.2992401123046875, 0.27068328857421875, 0.8522491455078125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000072.npy"}
|
||||
{"epoch": 0.10884353741496598, "step": 73, "batch_size": 64, "mean": 0.018752455711364746, "std": 0.739147424697876, "min": -1.9133071899414062, "p10": -0.8364223480224608, "median": -0.038784027099609375, "p90": 0.8212265014648438, "max": 2.678924560546875, "pos_frac": 0.46875, "sample": [0.5357284545898438, -0.4107208251953125, 0.7905960083007812, -0.8713874816894531, -0.22211456298828125, -1.9133071899414062, -0.9863357543945312, 0.2967414855957031, 0.2472991943359375, 0.8738365173339844, -0.6219253540039062, 0.3051300048828125, 0.6911392211914062, -0.6451759338378906, -0.10103416442871094, 0.7819442749023438, -0.5677566528320312, -0.3278045654296875, -0.11600494384765625, 0.9510269165039062, 0.348602294921875, 0.37854766845703125, 0.30925559997558594, -0.3363037109375, 0.84893798828125, 0.7697601318359375, -0.4750518798828125, -0.75213623046875, 0.31528282165527344, 0.6101112365722656, 0.1538543701171875, -0.013446807861328125, 0.35889434814453125, -0.41808319091796875, -0.95556640625, 0.3327217102050781, 0.6487350463867188, 0.262420654296875, -0.7454071044921875, -0.4078521728515625, -1.4779510498046875, 2.678924560546875, -1.355987548828125, -0.30450439453125, -0.385528564453125, 0.8140487670898438, -0.04261016845703125, -0.195220947265625, 0.2789802551269531, -0.7548370361328125, 0.8243026733398438, -0.04671478271484375, -0.11589241027832031, -0.441680908203125, -0.0349578857421875, 0.7219314575195312, -0.1798248291015625, 0.4256591796875, -0.35994720458984375, -1.0200042724609375, 0.5320816040039062, 0.8364334106445312, 1.34539794921875, -0.4650917053222656], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000073.npy"}
|
||||
{"epoch": 0.11035525321239607, "step": 74, "batch_size": 64, "mean": 0.27362462878227234, "std": 0.6930979490280151, "min": -1.2269515991210938, "p10": -0.5740776062011718, "median": 0.300811767578125, "p90": 1.100482177734375, "max": 2.1368560791015625, "pos_frac": 0.59375, "sample": [-0.0234222412109375, 1.222412109375, -1.19012451171875, 0.8239517211914062, -0.3568878173828125, 0.5391159057617188, 0.9049453735351562, 1.303802490234375, 0.255859375, 1.216278076171875, 0.6722145080566406, 0.5152664184570312, -0.03277587890625, 0.45761871337890625, -0.4438591003417969, 0.6800498962402344, -0.094451904296875, 0.42249298095703125, -0.3500385284423828, -1.11822509765625, 0.768585205078125, -0.2111663818359375, 0.6613883972167969, 0.7622222900390625, 0.5017166137695312, -0.059345245361328125, -0.1903839111328125, 0.4095573425292969, 2.1368560791015625, -0.769134521484375, -0.18870162963867188, 0.5574054718017578, 0.5046234130859375, -0.6316909790039062, -0.47348785400390625, 0.8169136047363281, -0.22212600708007812, 0.27396392822265625, -1.2269515991210938, -0.3336677551269531, 0.5312614440917969, -0.13944053649902344, 0.402099609375, 0.6913833618164062, 0.17943191528320312, 1.87664794921875, 1.5478973388671875, 0.2501869201660156, 1.1060714721679688, -0.15052032470703125, -0.9433441162109375, 0.1695423126220703, -0.3679962158203125, -0.003246307373046875, 0.32765960693359375, -0.04634666442871094, 1.050689697265625, -0.6171875, 0.113861083984375, 0.7011642456054688, 0.7332305908203125, 1.0874404907226562, -0.059600830078125, 0.580291748046875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000074.npy"}
|
||||
{"epoch": 0.11186696900982615, "step": 75, "batch_size": 64, "mean": 0.26318252086639404, "std": 0.7592613101005554, "min": -1.7247161865234375, "p10": -0.4872266769409179, "median": 0.2165679931640625, "p90": 1.1742351531982422, "max": 2.8092041015625, "pos_frac": 0.65625, "sample": [-0.10187530517578125, 0.9475860595703125, 0.498687744140625, 0.4739646911621094, 1.6688728332519531, 0.3660240173339844, -0.15232467651367188, 0.045063018798828125, 0.9259185791015625, 1.346771240234375, 0.36267852783203125, -0.891845703125, -0.21291160583496094, 0.2251739501953125, 0.0135040283203125, 1.4687881469726562, 0.3703460693359375, -0.5008411407470703, 0.04760551452636719, 0.19859695434570312, 2.8092041015625, 0.6474418640136719, 1.1511077880859375, 0.39769744873046875, 0.1384124755859375, -0.2644805908203125, 0.2711029052734375, -1.0543975830078125, 0.6357803344726562, -0.02225494384765625, 0.8626670837402344, 1.1841468811035156, 0.2079620361328125, 0.4672584533691406, -1.2684555053710938, -0.33057403564453125, 2.495574951171875, 0.34868621826171875, 0.22643280029296875, -0.036365509033203125, -0.24295806884765625, 0.46028900146484375, -1.7247161865234375, -0.07302093505859375, 0.3977851867675781, -0.6964874267578125, -0.4554595947265625, 0.19498443603515625, -0.31520843505859375, -0.29341697692871094, 0.6305408477783203, 0.1731719970703125, -0.06446075439453125, 0.4311065673828125, 0.5948066711425781, -0.3763885498046875, 0.49310302734375, 0.107940673828125, -0.5699729919433594, 0.110626220703125, 0.22603416442871094, -0.36350250244140625, 1.20068359375, 1.0314712524414062], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000075.npy"}
|
||||
{"epoch": 0.11337868480725624, "step": 76, "batch_size": 64, "mean": 0.21271675825119019, "std": 0.7995783090591431, "min": -1.7878189086914062, "p10": -0.7066888809204102, "median": 0.21212005615234375, "p90": 1.3970466613769532, "max": 1.8045806884765625, "pos_frac": 0.59375, "sample": [-0.83428955078125, 0.42269134521484375, -1.119384765625, 0.2023773193359375, 0.64239501953125, -0.5366249084472656, -0.4450416564941406, -0.3648834228515625, 0.3556652069091797, 1.023284912109375, 1.4114189147949219, 1.0481109619140625, -0.2959098815917969, -0.8064041137695312, 1.4067230224609375, -1.674713134765625, -0.441162109375, 0.452239990234375, -0.186187744140625, -0.16416168212890625, -0.2446746826171875, 1.3976211547851562, -0.8114547729492188, 1.3957061767578125, 1.0066070556640625, 0.5726699829101562, -0.17635726928710938, 1.4724655151367188, -0.4797172546386719, 0.27368927001953125, -0.3864021301269531, -0.2939167022705078, -0.6080284118652344, -0.729522705078125, -0.3179130554199219, 0.2492523193359375, 1.4627456665039062, 1.38323974609375, -0.5541839599609375, 0.24298095703125, 0.08368682861328125, 0.4704933166503906, -0.18805694580078125, 0.6798934936523438, 0.5220413208007812, -0.29443931579589844, 0.1833953857421875, -1.7878189086914062, 1.0079994201660156, 1.0270614624023438, 0.6397552490234375, 0.234344482421875, 0.22186279296875, 1.8045806884765625, 1.254669189453125, 0.07615280151367188, 0.06676101684570312, 0.5005340576171875, -0.1892242431640625, 0.91436767578125, 1.6736602783203125, 0.251068115234375, 0.163543701171875, -0.6534099578857422], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000076.npy"}
|
||||
{"epoch": 0.11489040060468632, "step": 77, "batch_size": 64, "mean": 0.40263840556144714, "std": 0.7796000242233276, "min": -1.1180648803710938, "p10": -0.4103740692138672, "median": 0.2059011459350586, "p90": 1.4180557250976562, "max": 2.5147552490234375, "pos_frac": 0.65625, "sample": [-0.06540298461914062, 0.14482498168945312, 0.3926963806152344, -0.6667861938476562, 0.8876819610595703, 0.43544769287109375, -0.3226470947265625, 0.019786834716796875, -1.1180648803710938, -0.725189208984375, 0.7088623046875, -0.08609390258789062, -0.41074371337890625, 0.5321884155273438, 0.38436126708984375, 2.415599822998047, 1.5331573486328125, 1.4181060791015625, 0.6280670166015625, 2.5147552490234375, -0.32253265380859375, 0.7472381591796875, 1.3310394287109375, 1.5483665466308594, 0.7426071166992188, -0.3294029235839844, 0.20624351501464844, 1.02734375, -0.1120452880859375, 0.0072479248046875, -0.4095115661621094, 0.0291748046875, 0.19113922119140625, 2.195026397705078, -0.3513336181640625, 1.023284912109375, 1.2168121337890625, -0.3064460754394531, 0.05859184265136719, 1.900543212890625, 1.0268020629882812, 0.20555877685546875, 0.7300682067871094, -0.6572456359863281, -0.20941162109375, 0.1031951904296875, 0.0767364501953125, -0.2913169860839844, 0.4942779541015625, 0.6367912292480469, 1.0201034545898438, -0.1624755859375, 0.4791221618652344, 0.9146137237548828, 1.417938232421875, 0.5666770935058594, -0.43344879150390625, -0.5215835571289062, 1.1746292114257812, 0.04693031311035156, -0.1360931396484375, 0.70086669921875, -0.1246490478515625, -0.30322265625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000077.npy"}
|
||||
{"epoch": 0.1164021164021164, "step": 78, "batch_size": 64, "mean": 0.37810176610946655, "std": 0.8298017978668213, "min": -1.2269401550292969, "p10": -0.5778900146484375, "median": 0.40847206115722656, "p90": 1.2927787780761721, "max": 2.751983642578125, "pos_frac": 0.640625, "sample": [0.4934501647949219, 2.751983642578125, 0.45946502685546875, 0.9150772094726562, -0.13233184814453125, 0.7647705078125, 1.9430313110351562, -0.415863037109375, -0.09955978393554688, 1.17352294921875, 0.403350830078125, 0.768951416015625, 0.08831596374511719, 1.660736083984375, 0.07379150390625, -0.9448928833007812, -0.42392730712890625, 1.0772018432617188, 0.691986083984375, -0.07708740234375, -0.47891998291015625, -0.7383193969726562, -0.1915130615234375, 0.06258392333984375, -0.5846099853515625, 0.7318267822265625, 0.12266159057617188, -0.8926849365234375, 1.016357421875, -0.1904296875, 0.9105720520019531, 0.5534210205078125, -0.3293113708496094, 1.208648681640625, 2.203857421875, -0.438323974609375, -0.5622100830078125, 0.4408111572265625, 0.7463455200195312, 0.8859634399414062, -0.14393234252929688, 0.4135932922363281, 0.935028076171875, 1.3288345336914062, 0.45367431640625, 0.4421272277832031, -1.2269401550292969, -0.06389999389648438, 0.5211563110351562, 0.7999801635742188, 0.05538368225097656, 1.0837669372558594, 0.1508941650390625, -0.970123291015625, 2.4959564208984375, -0.154541015625, -0.22705459594726562, -0.678497314453125, 0.5207538604736328, 0.833770751953125, 0.2133026123046875, 0.2087574005126953, 1.8186302185058594, -0.2608070373535156], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000078.npy"}
|
||||
{"epoch": 0.11791383219954649, "step": 79, "batch_size": 64, "mean": 0.39458605647087097, "std": 0.9934011101722717, "min": -1.6033897399902344, "p10": -0.6759490966796875, "median": 0.2877464294433594, "p90": 1.6033020019531257, "max": 3.597412109375, "pos_frac": 0.625, "sample": [-0.06609153747558594, -0.1686248779296875, -1.6033897399902344, 0.6464996337890625, -0.36382293701171875, 0.46157073974609375, -0.9329872131347656, -0.4434471130371094, -0.6773223876953125, 0.285858154296875, -0.4033546447753906, -0.38033294677734375, 0.3094825744628906, 3.597412109375, -0.407318115234375, -0.12236785888671875, 1.2341842651367188, 0.4378509521484375, -0.6938591003417969, 0.104248046875, 0.9063491821289062, 2.7715911865234375, -0.6727447509765625, 0.883453369140625, 0.5660591125488281, -0.17477035522460938, 2.5049896240234375, 1.2589263916015625, 0.8251838684082031, 1.6781005859375, -1.3431472778320312, 0.44415283203125, 0.9737319946289062, 0.970489501953125, 0.4912528991699219, 0.15439796447753906, 0.27109527587890625, 0.583709716796875, -0.1754302978515625, 0.41986083984375, -1.06640625, 1.42877197265625, 0.889129638671875, -0.2575531005859375, -0.041858673095703125, 0.28963470458984375, 0.8116302490234375, 0.05214500427246094, 0.5518684387207031, 0.07593917846679688, 0.5424842834472656, -0.7291412353515625, 2.3181915283203125, 0.7129974365234375, 1.1820220947265625, 2.616424560546875, -0.46417236328125, 0.2340240478515625, -0.41123199462890625, 0.19626235961914062, -0.2991180419921875, 0.315704345703125, 2.334564208984375, -0.18024444580078125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000079.npy"}
|
||||
{"epoch": 0.11942554799697656, "step": 80, "batch_size": 64, "mean": 0.3102267384529114, "std": 0.921295702457428, "min": -1.4430465698242188, "p10": -0.856104278564453, "median": 0.3603477478027344, "p90": 1.31734619140625, "max": 3.173858642578125, "pos_frac": 0.65625, "sample": [0.5324554443359375, 1.7503204345703125, -0.75592041015625, 0.6607208251953125, 0.9183177947998047, 0.5885467529296875, -0.6618499755859375, -0.3546142578125, -0.37407684326171875, -1.4430465698242188, 0.4606513977050781, 2.520233154296875, 0.37908172607421875, -0.3008003234863281, 0.5445232391357422, 1.0770263671875, -0.02942657470703125, -0.8990402221679688, 0.20325088500976562, 1.01123046875, 0.34299468994140625, -0.1327362060546875, -0.3692779541015625, 1.9372329711914062, 1.0324859619140625, 1.2264060974121094, 0.5736770629882812, 0.9150238037109375, 1.287567138671875, 1.036895751953125, -0.7212371826171875, 0.08453369140625, 1.330108642578125, -0.46228981018066406, 0.5949554443359375, 1.3323822021484375, 0.26483917236328125, 1.1110763549804688, 0.29261016845703125, -0.4389495849609375, 0.748626708984375, 0.4436798095703125, -1.3775482177734375, 1.0173606872558594, -0.3395881652832031, 0.669281005859375, -1.29876708984375, 0.2952556610107422, -1.1727523803710938, -0.6512451171875, 3.173858642578125, 0.21863555908203125, -0.471038818359375, 0.0126800537109375, 0.4677581787109375, -0.9327583312988281, 0.8314743041992188, 0.2607555389404297, 0.39051246643066406, -0.5139541625976562, 0.3777008056640625, -1.3605194091796875, 1.662994384765625, 0.3362274169921875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000080.npy"}
|
||||
{"epoch": 0.12093726379440665, "step": 81, "batch_size": 64, "mean": 0.4144783914089203, "std": 0.9152052402496338, "min": -1.3712730407714844, "p10": -0.6151832580566406, "median": 0.2896852493286133, "p90": 1.894617080688477, "max": 2.8498382568359375, "pos_frac": 0.65625, "sample": [0.28017234802246094, 0.8563652038574219, 0.9791603088378906, -0.5947341918945312, 0.8328399658203125, 0.19441986083984375, 0.30443572998046875, 0.07130813598632812, -0.12600135803222656, -0.436492919921875, 2.4102783203125, -1.2531814575195312, -1.2151260375976562, 0.39629364013671875, 0.2991981506347656, 1.2526168823242188, 0.024987220764160156, -0.219482421875, 0.5831985473632812, 0.9049835205078125, 0.44217681884765625, -0.21071434020996094, 0.9588623046875, -0.410400390625, 0.9134521484375, 0.18975830078125, -0.4698028564453125, 1.9448738098144531, 0.13858795166015625, -0.3027801513671875, 0.8651123046875, -0.7766799926757812, 2.0287322998046875, 0.07110977172851562, -0.073028564453125, 1.041015625, 0.6572265625, 2.043701171875, -0.6239471435546875, 2.8498382568359375, -0.9632720947265625, 0.9010257720947266, 0.1175994873046875, 0.1722259521484375, -0.32476806640625, -0.16124343872070312, 0.7040557861328125, 0.7119369506835938, 2.2703094482421875, 2.2223663330078125, 1.7773513793945312, 0.49334716796875, 1.3439483642578125, -0.2225341796875, -0.04541778564453125, -0.1071624755859375, -0.7777519226074219, 0.3848590850830078, 0.2021484375, 0.9689674377441406, -0.28185272216796875, 0.5564994812011719, 1.1329193115234375, -1.3712730407714844], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000081.npy"}
|
||||
{"epoch": 0.12244897959183673, "step": 82, "batch_size": 64, "mean": 0.531429648399353, "std": 0.9060264229774475, "min": -1.4958572387695312, "p10": -0.6912425994873045, "median": 0.4622650146484375, "p90": 1.510788917541504, "max": 3.3404769897460938, "pos_frac": 0.765625, "sample": [0.09422492980957031, 0.16603851318359375, -0.3105926513671875, 0.11385154724121094, 0.738006591796875, 0.1925373077392578, 1.5174694061279297, 1.2453956604003906, 1.903656005859375, 1.4196758270263672, 0.6414299011230469, 0.2722816467285156, 2.4329071044921875, 1.152313232421875, 0.46683502197265625, -0.27521514892578125, 1.2733917236328125, -0.7688713073730469, 1.0128326416015625, 1.4867935180664062, 0.4367866516113281, 0.46149444580078125, 0.5759010314941406, -0.48435211181640625, 0.1279296875, 1.87188720703125, 0.36658477783203125, 0.46303558349609375, 0.9040412902832031, 0.2833118438720703, -0.7914047241210938, 0.4542274475097656, 2.58782958984375, -0.5101089477539062, 0.9504852294921875, -0.986328125, 1.4952011108398438, 1.3914871215820312, 0.6000213623046875, 1.6595878601074219, 0.23636627197265625, 0.04550933837890625, 0.524658203125, -0.12779998779296875, 0.3872184753417969, 0.016803741455078125, 0.5815963745117188, -0.29135894775390625, 1.3264846801757812, -0.0742645263671875, -1.4958572387695312, 0.4249305725097656, 1.0157928466796875, 3.3404769897460938, 0.6404495239257812, -0.8059120178222656, 1.3423233032226562, 0.5457229614257812, 1.228363037109375, 0.5949516296386719, -0.9060516357421875, -0.4405975341796875, -0.9246978759765625, 0.19380950927734375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000082.npy"}
|
||||
{"epoch": 0.12396069538926682, "step": 83, "batch_size": 64, "mean": 0.44769296050071716, "std": 0.920208752155304, "min": -1.5237274169921875, "p10": -0.6776319503784178, "median": 0.3965301513671875, "p90": 1.7273002624511722, "max": 3.7000274658203125, "pos_frac": 0.734375, "sample": [2.1124420166015625, 0.3910064697265625, -1.5237274169921875, 0.3341064453125, 0.4764251708984375, 1.5092315673828125, 1.324951171875, 0.8302536010742188, 0.00537109375, 2.04144287109375, 0.4261016845703125, 0.31674957275390625, -1.1415481567382812, 0.2764892578125, 0.7102928161621094, 0.5294532775878906, 0.245849609375, 1.8234100341796875, -0.0268402099609375, -0.7462501525878906, -0.16608428955078125, 0.32666778564453125, 1.7562332153320312, 0.5410518646240234, -0.3788299560546875, 0.4492340087890625, 1.9165802001953125, 0.5653533935546875, -0.12177085876464844, -0.14670944213867188, -1.3180389404296875, -0.071502685546875, 0.11864089965820312, -1.4970512390136719, 0.3032417297363281, 0.8678054809570312, 0.5868854522705078, 1.2969894409179688, -0.27475738525390625, -0.825042724609375, 0.4210338592529297, -0.24199676513671875, 1.2532424926757812, 0.4020538330078125, 0.19933128356933594, -0.5175228118896484, 0.14416885375976562, 0.5110321044921875, 1.0167083740234375, 1.0250320434570312, 3.7000274658203125, 0.5847129821777344, 0.9145240783691406, 1.6597900390625, 0.0870208740234375, 1.0527801513671875, 0.3407421112060547, 1.0157661437988281, 1.7769012451171875, 0.156005859375, 0.486358642578125, -0.43914031982421875, 0.14993667602539062, -0.8902664184570312], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000083.npy"}
|
||||
{"epoch": 0.1254724111866969, "step": 84, "batch_size": 64, "mean": 0.4554682672023773, "std": 1.0550191402435303, "min": -2.582489013671875, "p10": -0.7761028289794921, "median": 0.5652694702148438, "p90": 1.6051757812500003, "max": 2.7860107421875, "pos_frac": 0.703125, "sample": [0.5863037109375, 0.9640960693359375, 0.31578826904296875, 1.0441474914550781, 2.5028457641601562, 1.0587577819824219, 1.2168426513671875, 0.20649337768554688, 0.20910263061523438, -1.6302337646484375, 0.2148284912109375, 0.9582614898681641, 1.0716056823730469, -0.6848373413085938, 1.4261093139648438, 1.0895729064941406, 0.01392364501953125, -1.7957382202148438, 1.7910308837890625, 0.46569061279296875, 0.5442352294921875, 1.3534622192382812, -0.4432868957519531, 0.9122200012207031, 1.118124008178711, -1.626983642578125, -0.7550582885742188, 0.7699432373046875, -0.09334087371826172, 2.1561279296875, 0.20410919189453125, 0.449462890625, 0.334747314453125, 2.00927734375, -0.45931243896484375, 1.3835067749023438, 0.8162345886230469, 0.9736347198486328, 0.4152069091796875, -0.903472900390625, -2.582489013671875, 1.6218109130859375, -0.2317047119140625, -0.1782073974609375, 0.649871826171875, -0.27756500244140625, 0.003910064697265625, 0.5380477905273438, 0.710784912109375, -1.3988876342773438, -0.36217498779296875, 1.029693603515625, 0.6408538818359375, 0.8661651611328125, 0.8795928955078125, 1.5663604736328125, 2.6919097900390625, -0.25504493713378906, -0.244476318359375, -0.5476226806640625, 1.0360031127929688, 0.8088188171386719, -0.7851219177246094, 2.7860107421875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000084.npy"}
|
||||
{"epoch": 0.12698412698412698, "step": 85, "batch_size": 64, "mean": 0.4861195683479309, "std": 0.9624977707862854, "min": -1.9362030029296875, "p10": -0.8102066040039062, "median": 0.5593471527099609, "p90": 1.6163040161132813, "max": 2.4768829345703125, "pos_frac": 0.765625, "sample": [0.8972015380859375, -0.85076904296875, 0.5414237976074219, 0.5772705078125, -0.331207275390625, 0.830230712890625, 1.155059814453125, 0.27852630615234375, 0.0857696533203125, -0.1373443603515625, 1.4603652954101562, -0.64154052734375, 1.6971454620361328, -1.9362030029296875, 0.4838428497314453, 0.9520111083984375, 1.4469184875488281, 1.6036148071289062, 1.4984283447265625, 1.5635986328125, -0.501220703125, 0.31615447998046875, -1.49591064453125, 0.6961898803710938, 1.2680892944335938, 2.4768829345703125, 0.1236419677734375, 0.8888816833496094, 2.130565643310547, 0.27204322814941406, -0.6696739196777344, 0.7586250305175781, 0.05499267578125, 0.81524658203125, 0.775970458984375, -0.447357177734375, 2.1502914428710938, 1.0997352600097656, 0.0821533203125, 0.16574478149414062, -0.9774055480957031, 1.0022659301757812, 1.2266159057617188, 1.6217422485351562, 2.18157958984375, 0.8764896392822266, 1.014678955078125, 0.4344673156738281, 0.5283317565917969, 0.9843711853027344, 0.6969184875488281, -0.7155609130859375, 0.09667205810546875, -0.59100341796875, 0.4116859436035156, 0.4265594482421875, -1.2060470581054688, -1.2706298828125, 0.9643402099609375, 0.12387847900390625, 1.6828536987304688, -1.5815200805664062, 0.34073638916015625, 0.7042427062988281], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000085.npy"}
|
||||
{"epoch": 0.12849584278155707, "step": 86, "batch_size": 64, "mean": 0.392984002828598, "std": 0.8419634699821472, "min": -1.8879241943359375, "p10": -0.5585704803466797, "median": 0.31497859954833984, "p90": 1.336309814453125, "max": 3.16339111328125, "pos_frac": 0.6875, "sample": [1.6093902587890625, 0.02541351318359375, -0.41190338134765625, -0.170745849609375, 1.3374176025390625, 0.05487823486328125, -0.20955657958984375, -0.09459686279296875, -0.616912841796875, -0.3154563903808594, 0.1458110809326172, -0.23009109497070312, 1.389312744140625, 1.3337249755859375, -1.217254638671875, 1.667083740234375, 0.8326416015625, -0.5590133666992188, -0.389007568359375, 0.6971664428710938, 3.16339111328125, 0.6433639526367188, 0.474273681640625, 0.5627593994140625, 1.125732421875, 0.4736595153808594, 0.954864501953125, 0.4712066650390625, 0.00396728515625, 0.2987346649169922, 1.1620635986328125, 2.5558319091796875, -0.0181121826171875, -1.8879241943359375, 0.2286834716796875, 0.651214599609375, 0.04032135009765625, -0.1021881103515625, 0.37541961669921875, 0.4561614990234375, -0.66864013671875, 1.1909141540527344, 1.0476875305175781, -0.583465576171875, 0.3312225341796875, -0.24799346923828125, 1.25494384765625, 0.4368743896484375, 0.9102706909179688, -0.5575370788574219, 2.050140380859375, 0.6060638427734375, 0.19757080078125, 0.004146575927734375, 0.8763427734375, 0.0821380615234375, -0.6114006042480469, 1.2594680786132812, 0.005542755126953125, -0.047557830810546875, 0.34619140625, -0.23534202575683594, 0.7159309387207031, 0.2757377624511719], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000086.npy"}
|
||||
{"epoch": 0.13000755857898716, "step": 87, "batch_size": 64, "mean": 0.7411632537841797, "std": 1.2005404233932495, "min": -2.0834274291992188, "p10": -0.7399238586425781, "median": 0.7187156677246094, "p90": 2.1837493896484377, "max": 3.634368896484375, "pos_frac": 0.6875, "sample": [-0.398162841796875, 2.34405517578125, 2.12799072265625, -0.4103546142578125, 1.9057388305664062, 0.33028411865234375, 1.915435791015625, -0.7683181762695312, 2.1743927001953125, 2.1078262329101562, 0.7687416076660156, -0.7015933990478516, 1.5345535278320312, -0.9771080017089844, -0.282196044921875, 2.80316162109375, -0.9793548583984375, 1.6248931884765625, 0.9604835510253906, 0.7288742065429688, 0.1950836181640625, 2.2986373901367188, 0.4048309326171875, -1.0997695922851562, 1.6429214477539062, -2.0834274291992188, -0.22695541381835938, -0.30464744567871094, 1.086395263671875, -0.6826667785644531, 0.0648040771484375, -0.4356956481933594, 0.9116592407226562, -0.735443115234375, -0.159881591796875, -0.9190597534179688, 1.6704788208007812, 0.7006301879882812, 0.9185218811035156, 0.155548095703125, 0.5255050659179688, 0.70855712890625, 2.0014076232910156, -0.7418441772460938, 2.089874267578125, -0.1061248779296875, 0.01392364501953125, 1.3986358642578125, -0.6782417297363281, 2.2432708740234375, 2.1877593994140625, 0.861236572265625, 3.4976654052734375, 1.0649185180664062, 0.6413421630859375, 0.5937576293945312, 0.6427364349365234, -0.16805648803710938, 3.634368896484375, 1.6608734130859375, 1.83441162109375, 0.9767112731933594, 0.9504241943359375, 1.3900299072265625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000087.npy"}
|
||||
{"epoch": 0.13151927437641722, "step": 88, "batch_size": 64, "mean": 0.3668217658996582, "std": 1.1189162731170654, "min": -2.5623626708984375, "p10": -0.9962039947509764, "median": 0.4506969451904297, "p90": 1.6028518676757815, "max": 3.790557861328125, "pos_frac": 0.671875, "sample": [1.9821395874023438, -0.11890411376953125, -0.5195293426513672, 0.6260223388671875, 2.110553741455078, 1.49542236328125, 1.3711395263671875, -0.5723419189453125, -1.8318023681640625, 0.2276763916015625, 0.842620849609375, 0.48744964599609375, 1.0768051147460938, 0.04025840759277344, -0.4682121276855469, 0.7396697998046875, 0.27967071533203125, -1.0360374450683594, -0.515380859375, 0.7618255615234375, 0.9812984466552734, 0.9863128662109375, 0.5960922241210938, 0.40132904052734375, -1.1337966918945312, 1.6970748901367188, 0.7688941955566406, 0.504180908203125, -0.7939987182617188, -0.5766525268554688, -0.90325927734375, -1.2010116577148438, 3.034332275390625, -0.2597808837890625, 0.08502197265625, 0.46298980712890625, -0.6671142578125, 0.8157997131347656, 1.6293563842773438, 0.4384040832519531, 0.3718376159667969, 1.5410079956054688, 0.46762847900390625, 0.9083976745605469, -1.085845947265625, 1.2383499145507812, 0.32550811767578125, -0.35858726501464844, 0.8737335205078125, 0.7119979858398438, 1.0870704650878906, 0.9349212646484375, 2.1517486572265625, 3.790557861328125, 1.5214385986328125, -0.7917289733886719, 0.3099212646484375, 0.3070259094238281, 0.070068359375, 0.6213912963867188, -0.6555900573730469, -2.5623626708984375, -1.8368301391601562, -0.3095855712890625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000088.npy"}
|
||||
{"epoch": 0.1330309901738473, "step": 89, "batch_size": 64, "mean": 0.7424756288528442, "std": 1.301889419555664, "min": -2.0255470275878906, "p10": -0.7128683090209961, "median": 0.5326671600341797, "p90": 2.3271255493164062, "max": 4.772216796875, "pos_frac": 0.703125, "sample": [0.2885398864746094, 0.44626617431640625, 0.24197006225585938, -1.55682373046875, 0.4156646728515625, 2.8580055236816406, -1.13348388671875, 0.2802162170410156, -0.003665924072265625, 1.2303390502929688, 0.9282035827636719, 1.552093505859375, 1.74017333984375, 4.772216796875, 0.19908905029296875, -0.020608901977539062, 1.1776084899902344, -0.2069091796875, -0.5804634094238281, -0.26264190673828125, 0.22728729248046875, -0.3033485412597656, 1.01715087890625, 1.6818695068359375, -0.49614715576171875, 3.996795654296875, -0.18773651123046875, 1.7448883056640625, -0.7235736846923828, 0.0493316650390625, 0.8530120849609375, 0.1780242919921875, 0.2738304138183594, -0.1430816650390625, 1.71783447265625, 0.5342502593994141, 1.3600616455078125, 1.644256591796875, 2.3392333984375, 1.7719573974609375, 1.1769065856933594, 1.3694305419921875, 1.1568756103515625, 3.3522911071777344, 0.5310840606689453, -0.01313018798828125, 1.0157470703125, 2.475738525390625, 2.029937744140625, -1.3765792846679688, 1.186279296875, 1.6446723937988281, 0.4955863952636719, -0.6878890991210938, 1.1847305297851562, -1.5366287231445312, 0.76922607421875, -0.85552978515625, -0.32171630859375, -2.0255470275878906, 2.895599365234375, 0.8012733459472656, 2.2988739013671875, 0.04952239990234375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000089.npy"}
{"epoch": 0.1345427059712774, "step": 90, "batch_size": 64, "mean": 0.588042676448822, "std": 1.1984238624572754, "min": -3.296173095703125, "p10": -0.6354211807250977, "median": 0.5314483642578125, "p90": 1.9744186401367196, "max": 3.9381866455078125, "pos_frac": 0.765625, "sample": [-0.35636138916015625, 1.2640533447265625, 0.04955291748046875, 0.5154495239257812, 0.9155197143554688, 0.0500335693359375, 1.5746955871582031, 3.8207778930664062, 0.198272705078125, -3.296173095703125, 1.7445602416992188, 0.812164306640625, 1.3391532897949219, 0.4172096252441406, 1.5172958374023438, -0.4539794921875, 0.38535308837890625, 1.309906005859375, 0.052646636962890625, 0.5474472045898438, 1.3993606567382812, -2.7747955322265625, 2.5015869140625, 0.182281494140625, 0.769775390625, 0.9836273193359375, 0.469635009765625, 0.432220458984375, 0.46454429626464844, 2.6819992065429688, -0.64691162109375, -1.0743484497070312, 0.877227783203125, 2.3547210693359375, 0.3922538757324219, 0.5804977416992188, -0.8582038879394531, -0.772735595703125, 3.9381866455078125, 1.3340873718261719, -0.3917083740234375, 2.0729293823242188, 1.21929931640625, 0.04925537109375, 0.09228515625, 0.6917362213134766, 1.0286140441894531, 0.7542610168457031, 0.40624237060546875, 0.9415130615234375, -0.445953369140625, 0.9268226623535156, 0.8592758178710938, -0.049945831298828125, 0.19303131103515625, 0.10297775268554688, 1.0932159423828125, -0.2920703887939453, -0.3631134033203125, 0.8513565063476562, -0.695648193359375, 2.162139892578125, -0.6086101531982422, 1.3942375183105469], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000090.npy"}
{"epoch": 0.1360544217687075, "step": 91, "batch_size": 64, "mean": 0.5270801782608032, "std": 1.342108130455017, "min": -2.2296600341796875, "p10": -1.0832962036132812, "median": 0.36145782470703125, "p90": 2.1503074645996096, "max": 4.596412658691406, "pos_frac": 0.578125, "sample": [1.691070556640625, 1.3645706176757812, 0.7430419921875, -0.04927635192871094, 0.7658157348632812, 2.371124267578125, -0.42476654052734375, -0.37296295166015625, 1.2923736572265625, 3.3709259033203125, 2.04150390625, -1.28521728515625, 0.7331047058105469, -0.572509765625, 0.8332052230834961, -0.7535743713378906, 0.2102813720703125, -0.8603668212890625, 0.7832183837890625, -2.2296600341796875, 1.1985206604003906, 3.4268569946289062, 1.5828323364257812, -0.12545013427734375, 0.4226837158203125, 0.4350013732910156, 0.9423027038574219, -0.6822643280029297, -1.4223098754882812, 0.775238037109375, 0.14670181274414062, 2.4778900146484375, -1.1414642333984375, -0.04425621032714844, 2.4352798461914062, -0.1511688232421875, -0.7468414306640625, 1.8654403686523438, -0.9426536560058594, -1.2732963562011719, 1.280029296875, -1.18994140625, 0.2544403076171875, 0.25997161865234375, -0.15451431274414062, 2.1969375610351562, 0.30023193359375, 1.6310768127441406, 1.3078346252441406, 1.7793045043945312, -0.3312568664550781, -0.37749481201171875, -0.4698657989501953, -1.9742164611816406, 1.6715850830078125, 1.9415245056152344, 1.720062255859375, -0.2572021484375, 4.596412658691406, -0.15918350219726562, -0.94757080078125, 0.48772430419921875, 1.405975341796875, -0.06967926025390625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000091.npy"}
{"epoch": 0.13756613756613756, "step": 92, "batch_size": 64, "mean": 0.6017957925796509, "std": 1.2052431106567383, "min": -1.521820068359375, "p10": -0.8089210510253906, "median": 0.5086164474487305, "p90": 2.385936737060547, "max": 3.7336578369140625, "pos_frac": 0.65625, "sample": [0.7512435913085938, 0.5010299682617188, -0.748748779296875, -0.45689964294433594, -1.2415618896484375, -0.04052734375, -0.44692230224609375, 1.3815345764160156, -1.0674972534179688, -0.36215972900390625, -0.08905792236328125, 0.35105323791503906, 1.2897567749023438, 1.6485519409179688, 1.3015937805175781, 1.3446998596191406, 0.5537872314453125, 0.1329822540283203, 2.3050308227539062, -0.12439346313476562, 0.548004150390625, 0.098175048828125, 0.9402122497558594, 0.5657806396484375, 2.3384857177734375, -0.5683212280273438, -1.1500167846679688, -0.15114593505859375, -1.521820068359375, 1.5930023193359375, -0.3916893005371094, 1.2425880432128906, 2.4062728881835938, -0.61065673828125, 2.6932296752929688, 0.24887847900390625, 0.43951416015625, 0.678070068359375, -0.42711639404296875, -0.45992279052734375, -0.6978912353515625, 0.07659912109375, 3.6517257690429688, 2.4688568115234375, 0.2941474914550781, -1.0705184936523438, 0.7396011352539062, 1.9881820678710938, 0.6559982299804688, -0.9498405456542969, 0.5364151000976562, 2.6865997314453125, 1.7264022827148438, -0.01583099365234375, 1.6823577880859375, -0.8347091674804688, 3.7336578369140625, 1.0802993774414062, 0.5731792449951172, 0.868072509765625, 2.941316604614258, 0.5162029266357422, 0.00244903564453125, 0.36663818359375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000092.npy"}
{"epoch": 0.13907785336356765, "step": 93, "batch_size": 64, "mean": 0.4494805335998535, "std": 1.5553650856018066, "min": -3.2638092041015625, "p10": -1.3384227752685547, "median": 0.3353118896484375, "p90": 2.0387763977050786, "max": 6.316650390625, "pos_frac": 0.609375, "sample": [-0.1549549102783203, 1.1872634887695312, -1.586395263671875, 0.6622772216796875, -2.1919097900390625, 0.8095245361328125, -0.5286636352539062, -0.9141998291015625, 0.73297119140625, 0.7346687316894531, 4.033424377441406, 2.8663387298583984, 6.316650390625, -0.5846672058105469, -0.358306884765625, 0.03214263916015625, 0.7373428344726562, 1.4240226745605469, -0.10474395751953125, -1.3419876098632812, 1.9110565185546875, 3.4829559326171875, -0.6090545654296875, -0.2188568115234375, 0.1812744140625, -0.04624176025390625, 1.6868820190429688, 0.045894622802734375, -0.8327789306640625, 0.16336822509765625, 0.3341827392578125, 0.3364410400390625, 0.40152740478515625, 2.0935134887695312, -3.2638092041015625, 0.7560958862304688, -0.3426513671875, 0.7170677185058594, -1.30828857421875, -0.30750274658203125, 1.6682319641113281, 0.300750732421875, 1.1498603820800781, 1.0322113037109375, -1.3301048278808594, -1.2429580688476562, 0.9010505676269531, 1.1788215637207031, 3.3568572998046875, 0.8916511535644531, 1.318939208984375, 0.9171829223632812, 2.4843292236328125, 0.28701019287109375, -1.9377365112304688, -0.21351242065429688, 1.8107681274414062, -1.854888916015625, -1.3440055847167969, 1.6469039916992188, 1.0218353271484375, -0.20655059814453125, -0.4797821044921875, 0.45801544189453125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000093.npy"}
{"epoch": 0.14058956916099774, "step": 94, "batch_size": 64, "mean": 1.032451868057251, "std": 1.7490391731262207, "min": -2.515228271484375, "p10": -0.8067457199096679, "median": 0.844268798828125, "p90": 3.2705524444580085, "max": 7.16839599609375, "pos_frac": 0.765625, "sample": [3.8483963012695312, 3.343395233154297, -0.090179443359375, 0.8226222991943359, 0.45288848876953125, -2.286113739013672, 0.9482307434082031, 0.2400798797607422, 0.04756927490234375, 3.57940673828125, 0.38786888122558594, 1.6114959716796875, 2.31182861328125, -2.3998184204101562, 1.2118663787841797, 1.8387947082519531, 0.8304824829101562, 0.6286773681640625, -0.4052734375, 0.04473114013671875, 2.381755828857422, 1.1862411499023438, 1.055389404296875, 0.39081573486328125, 1.9672279357910156, -0.014667510986328125, 0.5235824584960938, 1.1428451538085938, -0.8165550231933594, 0.9087982177734375, 1.1240882873535156, 0.6617050170898438, 0.42992401123046875, -2.270660400390625, 0.8290061950683594, 1.5907363891601562, 2.414764404296875, 3.1005859375, 0.49310302734375, 2.397247314453125, 1.7711906433105469, 1.3873443603515625, -0.4430274963378906, 7.16839599609375, 2.3837966918945312, 1.1300811767578125, 2.5565872192382812, 4.54351806640625, 1.189849853515625, 2.2775325775146484, 5.233978271484375, -0.08201217651367188, -0.86151123046875, -0.7838573455810547, 0.8203792572021484, 0.03073883056640625, -2.515228271484375, 0.8580551147460938, 3.748443603515625, 1.0513114929199219, -0.2865333557128906, -0.7052993774414062, -1.2070732116699219, 0.347381591796875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000094.npy"}
{"epoch": 0.1421012849584278, "step": 95, "batch_size": 64, "mean": 0.6408175230026245, "std": 1.4619412422180176, "min": -2.177215576171875, "p10": -1.4399295806884764, "median": 0.5950412750244141, "p90": 2.5422332763671887, "max": 4.3519287109375, "pos_frac": 0.640625, "sample": [2.6415863037109375, -1.547271728515625, -0.7122230529785156, -0.3426361083984375, -1.5237045288085938, -0.3688201904296875, 4.3519287109375, 0.6610450744628906, 2.90045166015625, 1.6186294555664062, 0.4750251770019531, 0.494842529296875, 0.24859619140625, -0.064666748046875, -0.301605224609375, 1.3629837036132812, -2.177215576171875, -0.8993453979492188, -1.9107017517089844, 0.6410446166992188, 2.304168701171875, 2.9293212890625, 0.5490379333496094, 1.3073654174804688, -1.9169769287109375, -0.6959991455078125, 1.947113037109375, 1.6784210205078125, -1.5021591186523438, -1.6479644775390625, 0.4022979736328125, 2.0341033935546875, 1.2183914184570312, 0.7364120483398438, -0.030338287353515625, 0.47461700439453125, 2.3104095458984375, 1.580810546875, -0.9947586059570312, 0.6944503784179688, 0.4127197265625, -0.009960174560546875, 2.13507080078125, 0.7978782653808594, -1.2293701171875, 2.6673431396484375, 1.7097625732421875, 0.3603668212890625, 0.7282791137695312, 2.8960800170898438, 0.0256805419921875, 3.7885971069335938, -1.2947273254394531, 0.98944091796875, 1.3776969909667969, 2.1584625244140625, -0.36885833740234375, 1.4059677124023438, -0.10328102111816406, 1.1398773193359375, -0.5298233032226562, 1.8385047912597656, 1.87664794921875, -0.6867008209228516], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000095.npy"}
{"epoch": 0.1436130007558579, "step": 96, "batch_size": 64, "mean": 0.8126805424690247, "std": 1.5463447570800781, "min": -2.2148284912109375, "p10": -0.9352802276611328, "median": 0.5052719116210938, "p90": 2.849337768554688, "max": 4.664459228515625, "pos_frac": 0.671875, "sample": [1.3645095825195312, -1.3644561767578125, 1.4336700439453125, -0.9390029907226562, 2.535308837890625, -0.238861083984375, 1.2533035278320312, -0.4930877685546875, 1.190704345703125, -0.01084136962890625, 3.8115234375, 0.05722999572753906, 0.21379852294921875, 2.734771728515625, 2.8984375, 2.2832794189453125, -0.9502792358398438, 0.5391845703125, -0.7912158966064453, -0.9265937805175781, 0.3334465026855469, 0.12496185302734375, 2.1083831787109375, -2.1781005859375, 0.90570068359375, 0.5310764312744141, 1.3942756652832031, 0.31919097900390625, 2.7232513427734375, 0.11525726318359375, 2.9380722045898438, 1.78436279296875, 3.2677001953125, 2.6687088012695312, -0.03972625732421875, 2.7042922973632812, -2.2148284912109375, 0.05834388732910156, -0.3717498779296875, 0.969512939453125, 2.494293212890625, 0.4924736022949219, 0.7305526733398438, -0.4370384216308594, 4.664459228515625, -0.27993011474609375, 2.524831771850586, -1.17071533203125, -0.0050506591796875, -0.9935951232910156, 0.7277107238769531, -0.5428009033203125, 0.49457550048828125, 3.962921142578125, 3.8490829467773438, 0.7467117309570312, -0.40117645263671875, 0.5159683227539062, 0.37040138244628906, 0.6151275634765625, 2.2822113037109375, 0.3916206359863281, -0.9143447875976562, -0.850250244140625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000096.npy"}
{"epoch": 0.14512471655328799, "step": 97, "batch_size": 64, "mean": 0.3321412205696106, "std": 1.690442442893982, "min": -4.569671630859375, "p10": -1.7556949615478514, "median": 0.3896293640136719, "p90": 2.257339477539063, "max": 4.4382171630859375, "pos_frac": 0.5625, "sample": [-2.093719482421875, -0.8477020263671875, -0.49722862243652344, 2.7263565063476562, 0.7847309112548828, 4.4382171630859375, -1.496246337890625, -0.61907958984375, -1.0473175048828125, -3.83203125, -1.8309555053710938, 2.9684677124023438, -0.0176544189453125, 1.6208457946777344, -1.8339309692382812, -2.930500030517578, 0.8306388854980469, 0.3104820251464844, 1.3306808471679688, 0.6418685913085938, 0.408447265625, 2.026824951171875, -0.008747100830078125, -0.98040771484375, 1.564483642578125, 1.33294677734375, -1.8054847717285156, 0.185302734375, 0.37081146240234375, 1.1856842041015625, -0.003143310546875, -1.1465911865234375, 1.2804031372070312, 1.3019027709960938, -0.7176704406738281, 1.7104263305664062, 2.7661972045898438, 1.8428878784179688, -0.38182830810546875, -0.8397369384765625, 1.7567672729492188, -0.8935089111328125, 2.671142578125, 1.2494430541992188, 0.661529541015625, -0.15528488159179688, 2.1505279541015625, 1.8901138305664062, 0.70458984375, -1.6395187377929688, -0.5504074096679688, -0.04955291748046875, -4.569671630859375, -1.2996063232421875, 3.6374893188476562, 2.3031158447265625, 1.4936447143554688, 0.16267776489257812, 1.5983314514160156, 1.3573379516601562, -1.08062744140625, 0.7756080627441406, 0.4316558837890625, -0.047389984130859375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000097.npy"}
{"epoch": 0.14663643235071808, "step": 98, "batch_size": 64, "mean": 0.42409175634384155, "std": 1.4054625034332275, "min": -5.197113037109375, "p10": -1.0018320083618164, "median": 0.5676383972167969, "p90": 1.9675052642822268, "max": 3.9620819091796875, "pos_frac": 0.671875, "sample": [-0.8602981567382812, 0.0015764236450195312, 1.697967529296875, 2.0076980590820312, 1.0946807861328125, 0.9984817504882812, 1.386260986328125, 2.11566162109375, -0.22118759155273438, 3.9620819091796875, -2.0514068603515625, -0.7979316711425781, 1.1130523681640625, 0.15754318237304688, -2.6436309814453125, 0.9983367919921875, 0.8878288269042969, 1.8737220764160156, -0.000957489013671875, -0.6889877319335938, 1.2022018432617188, 2.7231369018554688, 0.28668212890625, 2.13385009765625, 0.8911161422729492, -0.8925838470458984, 3.3177337646484375, 1.5717430114746094, 0.7433204650878906, 0.100433349609375, 1.068450927734375, 1.154937744140625, -0.1153411865234375, 0.6179733276367188, -0.33465576171875, 0.3143768310546875, 1.3612213134765625, 0.517303466796875, 0.7276382446289062, -1.0486526489257812, 1.0694961547851562, -0.0054168701171875, -1.9353408813476562, 0.3836822509765625, 1.2538986206054688, -1.7936820983886719, 0.2660102844238281, 0.8553619384765625, -0.1424560546875, 0.96685791015625, 0.18036651611328125, -0.05030059814453125, -0.35565948486328125, -0.3343639373779297, -5.197113037109375, 0.7948074340820312, 1.212249755859375, 0.4267997741699219, 0.1389923095703125, 0.756805419921875, 2.44012451171875, 0.9237327575683594, -0.6013946533203125, -1.4829635620117188], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000098.npy"}
{"epoch": 0.14814814814814814, "step": 99, "batch_size": 64, "mean": 0.6838279962539673, "std": 2.0547125339508057, "min": -3.6000213623046875, "p10": -1.605662155151367, "median": 0.3770923614501953, "p90": 3.1066028594970705, "max": 6.9620208740234375, "pos_frac": 0.65625, "sample": [5.412078857421875, 3.0560035705566406, -0.403411865234375, 5.5183868408203125, 0.45388031005859375, -0.2559814453125, -0.683807373046875, -2.4657726287841797, -0.2340240478515625, 2.5052642822265625, 0.355010986328125, 1.8998794555664062, 0.6865272521972656, 1.9556541442871094, -0.025129318237304688, 2.5265960693359375, -0.374053955078125, 0.63433837890625, -0.034423828125, 0.9201126098632812, 3.9948272705078125, -0.6116180419921875, -0.7483596801757812, -1.3355598449707031, -2.3524932861328125, -2.234447479248047, 5.393096923828125, -1.0142059326171875, 0.11905288696289062, 4.4645843505859375, 0.6601943969726562, 2.290985107421875, -1.0903968811035156, 1.5733718872070312, 0.29625701904296875, 1.1042327880859375, 0.1877593994140625, 0.3283576965332031, -1.0441608428955078, 2.0440673828125, 0.024066925048828125, -3.469757080078125, 3.1282882690429688, 6.9620208740234375, -0.4307212829589844, 1.4525146484375, 2.1819000244140625, 1.3191967010498047, -1.7214202880859375, 0.8864097595214844, 1.0205001831054688, 1.3939781188964844, 0.6559715270996094, 0.3991737365722656, 0.8607025146484375, 0.020244598388671875, 0.12385940551757812, 0.2967681884765625, 0.4005241394042969, -3.6000213623046875, 0.30682373046875, -0.9071617126464844, 1.0407485961914062, -2.0522918701171875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000099.npy"}
{"epoch": 0.14965986394557823, "step": 100, "batch_size": 64, "mean": 0.7622038125991821, "std": 2.3407483100891113, "min": -4.636260986328125, "p10": -1.8606117248535154, "median": 0.6827449798583984, "p90": 3.6611831665039096, "max": 8.92230224609375, "pos_frac": 0.703125, "sample": [1.03839111328125, -0.6922531127929688, -2.012481689453125, 1.244598388671875, 0.943878173828125, 0.14411163330078125, 0.4970111846923828, 1.052328109741211, 4.551025390625, 0.0611114501953125, 5.428955078125, 1.2648468017578125, 0.15479278564453125, 5.196708679199219, -0.18819427490234375, 4.4818878173828125, -1.0224151611328125, 0.7568283081054688, 1.1670379638671875, 1.1629486083984375, 0.7490196228027344, -1.6194229125976562, -1.3282585144042969, 0.2757568359375, -1.6798171997070312, 1.7253646850585938, 1.5667381286621094, 2.2106704711914062, 0.7756462097167969, -4.636260986328125, 0.0283660888671875, 0.4392242431640625, -2.109619140625, 0.24246978759765625, 0.6164703369140625, -1.1938896179199219, 2.8617095947265625, 1.0183181762695312, -3.9230422973632812, 2.1821212768554688, -1.45819091796875, 7.1308135986328125, -1.9380950927734375, 0.3177833557128906, -1.0684814453125, 2.3680038452148438, 1.2997550964355469, -0.218414306640625, -0.31418418884277344, 0.8598518371582031, 1.65911865234375, 2.566802978515625, 1.3293685913085938, 1.3242073059082031, 1.730560302734375, 0.4984397888183594, -2.5061416625976562, 4.003814697265625, 1.9599189758300781, 0.38318634033203125, 8.92230224609375, 0.371612548828125, -3.199859619140625, -0.6738128662109375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000100.npy"}
{"epoch": 0.15117157974300832, "step": 101, "batch_size": 64, "mean": 0.9653822183609009, "std": 1.823426604270935, "min": -4.431915283203125, "p10": -0.9720998764038083, "median": 0.7853164672851562, "p90": 3.5312480926513676, "max": 4.908836364746094, "pos_frac": 0.75, "sample": [0.2662506103515625, 0.1805267333984375, 0.6625595092773438, 0.0511932373046875, 3.3889617919921875, 1.917449951171875, 0.581573486328125, 0.2654247283935547, -0.6002349853515625, -0.6359405517578125, 4.237903594970703, 1.7766571044921875, -4.431915283203125, 0.38149261474609375, 3.0046348571777344, -0.29584503173828125, 1.2671623229980469, 0.8453197479248047, 0.5474166870117188, -1.3596992492675781, 2.3713150024414062, 2.61578369140625, 4.169090270996094, 2.423187255859375, 0.19524383544921875, -0.6667289733886719, 0.787689208984375, 0.7829437255859375, 0.631561279296875, 2.505748748779297, -0.08327102661132812, 4.253997802734375, 1.3535499572753906, 0.17600250244140625, -1.0793914794921875, -1.056619644165039, 0.6075973510742188, -2.5011444091796875, -1.19549560546875, 1.4719314575195312, -0.2624053955078125, 1.1694107055664062, 0.6765842437744141, 0.8769569396972656, 0.21844482421875, 1.8617706298828125, 1.348724365234375, -3.812255859375, -0.580047607421875, 0.7570152282714844, 3.3365249633789062, 1.4046440124511719, 3.5922279357910156, 1.4068832397460938, 1.765167236328125, -0.413177490234375, 4.17327880859375, 2.2982406616210938, -0.7748870849609375, 4.908836364746094, 0.860992431640625, 1.9744949340820312, 4.0590667724609375, 1.12408447265625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000101.npy"}
{"epoch": 0.15268329554043839, "step": 102, "batch_size": 64, "mean": 0.651196300983429, "std": 1.9640098810195923, "min": -4.3965911865234375, "p10": -1.2651712417602539, "median": 0.5153541564941406, "p90": 3.3616950988769543, "max": 7.132415771484375, "pos_frac": 0.65625, "sample": [2.083648681640625, -1.180816650390625, -0.7181930541992188, 0.043697357177734375, -3.3050765991210938, -0.5570640563964844, -0.7445907592773438, 1.4149284362792969, -2.905487060546875, 0.4030113220214844, 0.170135498046875, 0.9865493774414062, -0.2704620361328125, 1.2022323608398438, 1.7156219482421875, -1.1105422973632812, -1.233591079711914, 3.4881134033203125, 0.6791229248046875, 4.5538177490234375, -1.0574092864990234, 0.4522590637207031, 1.3174285888671875, 0.20831298828125, 0.6687784194946289, 0.9877052307128906, 1.0541534423828125, 2.1454925537109375, -4.3965911865234375, 0.08875656127929688, 4.568199157714844, 1.9970283508300781, -1.2787055969238281, 0.8473129272460938, -1.1215896606445312, 3.0690689086914062, 2.3517837524414062, 4.142692565917969, 1.738739013671875, -1.0736503601074219, 1.4944686889648438, 3.4871063232421875, -0.8044357299804688, 0.42124176025390625, -0.3579902648925781, 0.821380615234375, -0.291168212890625, 2.341259002685547, 0.547210693359375, -0.1955718994140625, 1.4261589050292969, 2.3187408447265625, 0.5340652465820312, -2.2904701232910156, -0.6271858215332031, -1.8147697448730469, 0.49664306640625, 7.132415771484375, -1.3820266723632812, 0.0480804443359375, 1.902008056640625, 3.6005935668945312, 1.0654335021972656, 0.3785533905029297], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000102.npy"}
{"epoch": 0.15419501133786848, "step": 103, "batch_size": 64, "mean": 1.224229097366333, "std": 2.101861000061035, "min": -3.1896209716796875, "p10": -1.1316356658935547, "median": 0.9740314483642578, "p90": 3.746938705444336, "max": 7.11846923828125, "pos_frac": 0.734375, "sample": [-2.7672042846679688, 2.7149505615234375, 3.073974609375, -2.39581298828125, 0.24721145629882812, 1.421539306640625, 0.1396636962890625, 0.145355224609375, 1.9023590087890625, 1.1803970336914062, -0.7760543823242188, -0.6795272827148438, -0.45000457763671875, 0.93603515625, 3.0851974487304688, 3.716350555419922, 0.9985542297363281, 1.2922210693359375, 5.3783111572265625, 3.3394088745117188, -0.122161865234375, 3.1475143432617188, 0.4612541198730469, 2.224437713623047, 1.8657608032226562, -0.6031723022460938, 0.6989212036132812, 5.919036865234375, -0.2896537780761719, 2.1494407653808594, 2.1280670166015625, 7.11846923828125, 2.1669921875, 0.6245155334472656, 0.7154731750488281, 0.617889404296875, -1.125, 3.7032470703125, 0.9495086669921875, 0.2937202453613281, 3.109954833984375, 0.7101249694824219, 1.0188846588134766, -1.4825439453125, 1.2378520965576172, -1.6826171875, 3.1883392333984375, -0.3733329772949219, 3.7783432006835938, -0.035671234130859375, 4.01483154296875, 2.1311111450195312, -2.672016143798828, -0.6286582946777344, 0.3719329833984375, 5.759193420410156, -3.1896209716796875, 0.22251129150390625, -1.1344795227050781, 0.5530929565429688, 1.3089141845703125, 1.5197830200195312, 3.7600479125976562, 1.717498779296875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000103.npy"}
{"epoch": 0.15570672713529857, "step": 104, "batch_size": 64, "mean": 1.18046236038208, "std": 2.3310935497283936, "min": -4.2628326416015625, "p10": -1.6215904235839844, "median": 0.9341745376586914, "p90": 3.8973434448242203, "max": 8.476470947265625, "pos_frac": 0.703125, "sample": [2.036834716796875, 2.682220458984375, 1.2036972045898438, 1.5193252563476562, -1.6148452758789062, 1.7750473022460938, -0.5075893402099609, 0.7503547668457031, 0.6977081298828125, -0.89178466796875, 0.08185577392578125, -4.2628326416015625, 8.476470947265625, 3.1260986328125, -1.299102783203125, -0.27918243408203125, 5.6959228515625, -0.048702239990234375, 1.950439453125, 2.9957199096679688, -2.152740478515625, -1.624481201171875, 2.41986083984375, 1.5871734619140625, 0.209716796875, 0.35979461669921875, 0.07960128784179688, -2.8219223022460938, 3.4760894775390625, -0.7089996337890625, 0.5681419372558594, 0.4640655517578125, -1.1646175384521484, 1.2013263702392578, 2.124225616455078, 1.4839859008789062, 0.9622955322265625, -0.7519073486328125, 3.5423126220703125, 1.9594459533691406, -0.18852996826171875, -1.708251953125, 5.712646484375, 0.7585830688476562, 3.2330703735351562, 6.18035888671875, 2.4624404907226562, 0.4858283996582031, -0.3323822021484375, -2.2954559326171875, 2.1105422973632812, 1.2602558135986328, 2.3758544921875, 4.04949951171875, 0.3057823181152344, 2.5748138427734375, -0.9239425659179688, 0.6542816162109375, 0.9060535430908203, 1.3543243408203125, -1.9540061950683594, 2.5985488891601562, 5.866233825683594, 4.762016296386719], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000104.npy"}
{"epoch": 0.15721844293272866, "step": 105, "batch_size": 64, "mean": 0.9783838987350464, "std": 2.852323532104492, "min": -6.5483245849609375, "p10": -2.930237579345703, "median": 0.9201288223266602, "p90": 4.985850524902344, "max": 7.487518310546875, "pos_frac": 0.65625, "sample": [-0.030002593994140625, 2.3712539672851562, 2.4698486328125, 2.019765853881836, 0.7132530212402344, 2.367645263671875, -6.5483245849609375, 1.18243408203125, -2.9456710815429688, 1.2141761779785156, 4.686653137207031, 0.8427066802978516, -0.6511459350585938, 2.7519302368164062, -0.077667236328125, 6.8567962646484375, -0.5299148559570312, -1.1905593872070312, 0.30612945556640625, 4.924957275390625, 1.4208145141601562, 1.8213882446289062, 0.57501220703125, -4.172233581542969, 1.2484359741210938, 1.972900390625, 5.55224609375, -1.6954803466796875, 1.0375823974609375, -2.89422607421875, -3.2520065307617188, 1.3493881225585938, -0.8151702880859375, 2.3430633544921875, -3.6602630615234375, 7.487518310546875, 0.29309654235839844, 0.6962890625, 0.9975509643554688, 0.3311729431152344, 0.6652450561523438, -0.44532012939453125, -0.3535041809082031, 1.3291397094726562, -3.676300048828125, 2.026256561279297, 6.4134674072265625, -0.5858154296875, 0.7083587646484375, -1.9310531616210938, -3.1153717041015625, 1.5374641418457031, -0.7130355834960938, 5.6202239990234375, 7.24896240234375, 1.5716094970703125, 3.077251434326172, 1.6521377563476562, -0.20795249938964844, 3.221651077270508, -2.75628662109375, 4.458404541015625, 0.4877471923828125, 5.0119476318359375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000105.npy"}
{"epoch": 0.15873015873015872, "step": 106, "batch_size": 64, "mean": 0.9594534635543823, "std": 2.4814388751983643, "min": -6.8409576416015625, "p10": -1.9951515197753906, "median": 0.6247234344482422, "p90": 3.778194427490235, "max": 8.434463500976562, "pos_frac": 0.734375, "sample": [-0.419830322265625, 1.4873237609863281, 0.3659400939941406, 3.38720703125, -1.8844680786132812, -0.7661972045898438, 4.5992584228515625, 0.9137840270996094, 0.41902923583984375, 6.317962646484375, 0.8110885620117188, 4.559883117675781, 5.425178527832031, 2.1826248168945312, 0.5306434631347656, -2.0425872802734375, -0.3648529052734375, 2.789886474609375, 0.3316192626953125, 0.77545166015625, 0.9775352478027344, 0.0094451904296875, -4.107177734375, 0.574462890625, 0.08504104614257812, 0.21118927001953125, -0.37464141845703125, -2.1356964111328125, 0.034832000732421875, 3.8358154296875, 0.4029388427734375, -0.7296237945556641, 0.5792083740234375, 2.7628822326660156, 3.2133560180664062, 1.3898468017578125, 0.031650543212890625, -0.3515510559082031, 1.7031974792480469, 8.434463500976562, 2.485240936279297, 1.3444137573242188, 1.7097511291503906, -0.35895538330078125, -3.5120162963867188, 0.0306549072265625, 2.3430328369140625, 2.3560867309570312, -6.8409576416015625, 0.6702384948730469, -2.9150848388671875, 0.4925422668457031, 2.6704177856445312, 3.327747344970703, 2.8593368530273438, 3.6437454223632812, 1.0165939331054688, 0.059894561767578125, -0.11808586120605469, 4.281890869140625, -3.745147705078125, -0.07595443725585938, 2.555004119873047, 1.15850830078125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000106.npy"}
{"epoch": 0.1602418745275888, "step": 107, "batch_size": 64, "mean": 0.6200859546661377, "std": 2.2787256240844727, "min": -6.292083740234375, "p10": -2.126166534423828, "median": 0.7171993255615234, "p90": 3.426335144042969, "max": 5.450294494628906, "pos_frac": 0.640625, "sample": [0.6865196228027344, -5.0789794921875, 1.3404693603515625, 0.75665283203125, 0.055816650390625, 1.8169708251953125, -4.468353271484375, 2.6142807006835938, 0.49542236328125, 1.8018875122070312, -0.8832244873046875, 5.450294494628906, 0.9657211303710938, 1.1826324462890625, 1.0240974426269531, -0.9756393432617188, -3.0878219604492188, 0.2278594970703125, 2.82684326171875, 1.5362091064453125, -0.1263275146484375, 3.4359893798828125, 4.687751770019531, -1.6003952026367188, 1.2057266235351562, -0.5027847290039062, 2.4361495971679688, -0.2520866394042969, -1.0088348388671875, 0.597991943359375, 0.6938133239746094, -2.225208282470703, 0.7405853271484375, 2.7439117431640625, -0.06064605712890625, 3.78411865234375, 3.715414047241211, -0.9781684875488281, 0.7711639404296875, 3.40380859375, 1.13330078125, 2.2724227905273438, -1.2247238159179688, -0.14147186279296875, 1.4869613647460938, 3.2672271728515625, 0.8612403869628906, 2.943389892578125, 0.5327835083007812, -2.5361404418945312, 4.1469268798828125, 3.3396873474121094, -0.8011589050292969, -1.8950691223144531, -0.12583541870117188, 0.5163497924804688, 1.771240234375, 3.71466064453125, -0.1241912841796875, 0.21962738037109375, -1.7888717651367188, -6.292083740234375, 1.4747085571289062, -2.8151092529296875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000107.npy"}
{"epoch": 0.1617535903250189, "step": 108, "batch_size": 64, "mean": 0.8618567585945129, "std": 2.3067331314086914, "min": -5.6855926513671875, "p10": -1.9271442413330073, "median": 0.6387434005737305, "p90": 3.586892700195314, "max": 6.787567138671875, "pos_frac": 0.65625, "sample": [-0.381622314453125, -0.37709808349609375, 1.908050537109375, 2.2355804443359375, 0.3701648712158203, 2.5999679565429688, 1.2952804565429688, 2.1016311645507812, 6.787567138671875, -0.781280517578125, 3.1357803344726562, 2.2742996215820312, 1.5805511474609375, -2.1211090087890625, 1.4073028564453125, 1.8985595703125, -1.4050216674804688, 0.5136852264404297, 0.4517631530761719, -0.098358154296875, 4.582183837890625, -0.9213638305664062, -1.1482887268066406, 0.44110107421875, 0.22818756103515625, -2.952880859375, 1.9107742309570312, -5.6855926513671875, 0.9345550537109375, -0.49619293212890625, -2.758514404296875, -0.7604560852050781, 4.3358154296875, -0.08184814453125, 2.7620315551757812, 5.410800933837891, 0.6653003692626953, -0.29808616638183594, -3.3786659240722656, 2.8832473754882812, -1.2934188842773438, 3.75030517578125, 2.122356414794922, 0.6121864318847656, -1.4745597839355469, 2.928447723388672, -0.4141883850097656, 1.2486419677734375, 0.4849090576171875, -2.7007904052734375, -3.5007247924804688, 1.7924995422363281, 0.71380615234375, 1.9545135498046875, 4.225761413574219, -0.19159317016601562, 1.4600677490234375, 5.9889984130859375, 0.4214591979980469, 1.4365234375, 0.4258270263671875, 2.6417083740234375, 0.2526969909667969, 3.205596923828125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000108.npy"}
{"epoch": 0.16326530612244897, "step": 109, "batch_size": 64, "mean": 1.3147889375686646, "std": 2.784717321395874, "min": -3.4493179321289062, "p10": -1.7151116371154784, "median": 1.1287546157836914, "p90": 4.816180419921875, "max": 9.734390258789062, "pos_frac": 0.59375, "sample": [-1.0916824340820312, 5.722404479980469, 5.084991455078125, -0.13360977172851562, 3.36029052734375, 1.885162353515625, 4.048000335693359, -1.45849609375, -0.04866218566894531, 2.4500045776367188, -0.07356834411621094, 1.0735626220703125, 4.698516845703125, 2.3971176147460938, -1.33868408203125, 0.35009765625, -0.2846660614013672, 1.3008403778076172, -1.0723724365234375, 1.1753616333007812, 1.0370407104492188, 1.9277229309082031, -3.387481689453125, -1.704939842224121, -0.525970458984375, 9.734390258789062, -0.38896942138671875, -0.5935249328613281, 1.1913185119628906, 6.294525146484375, 1.0821475982666016, -1.23577880859375, 0.9521026611328125, -1.0185394287109375, -1.7194709777832031, 2.2001380920410156, -2.2505645751953125, 4.1671295166015625, 0.18979644775390625, 2.326099395751953, -0.3722381591796875, 2.269796371459961, 1.1784954071044922, 6.4001922607421875, -1.4199142456054688, -1.9540023803710938, -1.0607376098632812, 1.4758129119873047, -3.4493179321289062, -2.1565933227539062, 1.1875991821289062, 1.3592376708984375, 2.9293670654296875, 4.242279052734375, 2.2488937377929688, 9.485389709472656, 4.866607666015625, 3.9989013671875, 3.267364501953125, 3.7421340942382812, -0.6361007690429688, -0.16163253784179688, -2.4002685546875, 2.783447265625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000109.npy"}
{"epoch": 0.16477702191987906, "step": 110, "batch_size": 64, "mean": 0.2178059071302414, "std": 2.5705294609069824, "min": -7.38458251953125, "p10": -3.1570194244384764, "median": 0.4216117858886719, "p90": 3.3148933410644537, "max": 5.748504638671875, "pos_frac": 0.578125, "sample": [1.4608497619628906, 0.29555511474609375, -2.681365966796875, -1.2508468627929688, -3.2251014709472656, 4.650775909423828, 1.6746902465820312, 3.8016891479492188, -5.0062255859375, 1.7180252075195312, 2.296985626220703, 0.54766845703125, 1.6751327514648438, 3.3572616577148438, 0.9083137512207031, -0.9012069702148438, 1.5201950073242188, 5.748504638671875, -1.5058650970458984, -3.4931564331054688, 2.0927658081054688, -2.9981613159179688, 2.166025161743164, 2.160888671875, 0.2602367401123047, -0.27643585205078125, 0.14456939697265625, 3.1659469604492188, -0.22608184814453125, -0.5800399780273438, -1.79168701171875, 1.9290924072265625, 0.7555580139160156, 0.1417083740234375, -1.9050559997558594, 1.504791259765625, -1.77545166015625, 3.39605712890625, -1.8351211547851562, 0.22715282440185547, 4.7967681884765625, -0.48172760009765625, -1.1373062133789062, 1.2562713623046875, -3.465198516845703, 1.7434844970703125, 2.862060546875, -2.1214370727539062, -7.38458251953125, -5.863037109375, -0.5027542114257812, 1.5770797729492188, -1.0398178100585938, 0.8282699584960938, 0.7313079833984375, 3.6771240234375, 2.9736328125, 1.1745071411132812, 0.7740306854248047, -1.4003753662109375, -2.38037109375, -0.7145767211914062, 3.216033935546875, -3.3284454345703125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000110.npy"}
{"epoch": 0.16628873771730915, "step": 111, "batch_size": 64, "mean": 1.0503621101379395, "std": 2.5394296646118164, "min": -4.402717590332031, "p10": -1.7522008895874022, "median": 0.6597023010253906, "p90": 4.5998882293701175, "max": 7.22613525390625, "pos_frac": 0.625, "sample": [-1.680734634399414, -1.0942230224609375, 3.1041946411132812, 2.339080810546875, 5.062446594238281, -1.2077560424804688, 0.7912216186523438, 2.8340606689453125, 1.7437057495117188, 0.2503929138183594, 2.914989471435547, -1.3081893920898438, -1.841461181640625, 3.7316818237304688, -4.402717590332031, -0.47247314453125, -0.8902053833007812, 0.30239105224609375, -3.317394256591797, -0.29630279541015625, 4.993385314941406, 2.1381378173828125, 4.857913970947266, 6.142261505126953, 4.0517120361328125, 2.552276611328125, 1.1690216064453125, 2.726226806640625, 1.5269718170166016, -1.7828292846679688, -0.02050018310546875, -1.3204727172851562, -0.595977783203125, -3.6678123474121094, 0.31689453125, 0.1555023193359375, -0.286590576171875, 2.2318649291992188, -0.41085052490234375, 3.0592575073242188, 2.2174148559570312, 3.6150970458984375, 5.298271179199219, -0.2758674621582031, 4.471920013427734, -1.2013111114501953, 2.076000213623047, 4.654731750488281, -1.24847412109375, -0.09961318969726562, 3.7880859375, 0.2501220703125, 2.3936500549316406, -1.2453384399414062, -3.3997421264648438, 1.4200820922851562, -3.686370849609375, 2.6435089111328125, 0.4454765319824219, 0.5281829833984375, 7.22613525390625, 0.14467239379882812, 1.7768440246582031, 1.0305938720703125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000111.npy"}
{"epoch": 0.16780045351473924, "step": 112, "batch_size": 64, "mean": 1.3121283054351807, "std": 2.5920469760894775, "min": -5.639568328857422, "p10": -1.59490966796875, "median": 1.0638160705566406, "p90": 5.152059936523438, "max": 6.303253173828125, "pos_frac": 0.75, "sample": [3.5228271484375, 0.42547607421875, 5.2281036376953125, 0.976776123046875, 0.3352985382080078, 0.9798126220703125, 4.9746246337890625, 2.4434242248535156, 1.1686248779296875, 1.22491455078125, 6.191734313964844, 2.7020263671875, -3.7477760314941406, 0.8832492828369141, 0.7491140365600586, -2.7587432861328125, 4.398246765136719, 1.1491165161132812, 0.13272476196289062, 0.048309326171875, 2.36968994140625, -1.07781982421875, 2.5620880126953125, 6.117279052734375, 0.8596267700195312, 5.781730651855469, 1.3404312133789062, -1.4981918334960938, 4.77703857421875, 1.6223716735839844, 6.303253173828125, 1.06170654296875, -0.9415073394775391, 2.5211524963378906, 2.6893310546875, 5.7713623046875, 0.36348724365234375, -1.1249542236328125, 3.0696334838867188, 2.1147003173828125, -2.5363845825195312, 0.47259521484375, 3.5289649963378906, 1.0659255981445312, 1.5464019775390625, 5.3275299072265625, 4.5688629150390625, 4.1222686767578125, 0.07720184326171875, -1.3133735656738281, -2.9034347534179688, 0.23775482177734375, -1.0892410278320312, 2.6893234252929688, 0.29693603515625, -0.1304931640625, 1.1471099853515625, -5.639568328857422, -1.6363601684570312, -0.05169677734375, 1.211517333984375, -3.359832763671875, -0.22344970703125, 0.85736083984375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000112.npy"}
{"epoch": 0.1693121693121693, "step": 113, "batch_size": 64, "mean": 0.8528228998184204, "std": 2.569075345993042, "min": -7.07415771484375, "p10": -1.6163051605224608, "median": 0.7918758392333984, "p90": 3.738314056396485, "max": 8.496246337890625, "pos_frac": 0.578125, "sample": [-0.5026779174804688, -3.612255096435547, -1.3694076538085938, 1.2677230834960938, -0.03397369384765625, 2.6071090698242188, 1.2228012084960938, -1.8079071044921875, -3.3831939697265625, 2.787841796875, -1.1425094604492188, -1.0080757141113281, -1.7221183776855469, -0.1749114990234375, -2.6759490966796875, -0.5341033935546875, 0.5564002990722656, 2.7697181701660156, 1.9735794067382812, -5.4214019775390625, -0.6986541748046875, -0.34418487548828125, 5.403602600097656, -0.616485595703125, 1.4948196411132812, 1.5576324462890625, 2.6422195434570312, 1.2261772155761719, 4.781307220458984, -0.39425086975097656, 1.6937942504882812, 8.496246337890625, 0.8015613555908203, 0.7399940490722656, 0.49684906005859375, 1.807607650756836, -7.07415771484375, -0.121490478515625, 1.788360595703125, 1.71844482421875, -0.09510612487792969, 1.586761474609375, -0.608489990234375, 1.7489547729492188, 2.4313430786132812, 6.455596923828125, 0.001495361328125, 1.7603206634521484, 0.7821903228759766, 3.31689453125, -0.84906005859375, -0.3047027587890625, -0.3355674743652344, 4.94970703125, 3.5571136474609375, -0.6925735473632812, 2.3705291748046875, 1.32098388671875, 3.8159713745117188, 3.3669967651367188, 4.593330383300781, -0.9752311706542969, -0.45890045166015625, 1.646026611328125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000113.npy"}
{"epoch": 0.1708238851095994, "step": 114, "batch_size": 64, "mean": 1.3328726291656494, "std": 3.167510986328125, "min": -5.243072509765625, "p10": -2.2987030029296873, "median": 1.0494918823242188, "p90": 5.323525238037111, "max": 9.995864868164062, "pos_frac": 0.671875, "sample": [-2.298126220703125, -2.0457382202148438, 1.7155914306640625, 2.716278076171875, 8.307044982910156, 2.0733261108398438, 4.937889099121094, 4.44793701171875, 1.8913764953613281, -3.7537498474121094, 4.8864288330078125, -1.7300186157226562, 1.9519844055175781, -2.2989501953125, 0.19243621826171875, 1.3161849975585938, 1.3331756591796875, 0.227874755859375, -0.4883613586425781, 0.033794403076171875, 0.9517402648925781, -0.539031982421875, 9.995864868164062, -1.7504081726074219, 1.1603507995605469, 4.011131286621094, 4.398841857910156, 1.9120674133300781, -1.6693191528320312, 3.4431915283203125, -2.8032665252685547, 0.49199390411376953, 5.036720275878906, -5.243072509765625, 2.5575637817382812, -0.012187957763671875, 5.596809387207031, 0.4178466796875, 2.4481468200683594, -2.376689910888672, 8.334686279296875, 2.2136611938476562, 4.3787841796875, 0.4875049591064453, 1.2007904052734375, -0.8708610534667969, 7.963775634765625, 0.36959075927734375, -1.2684593200683594, -2.0189361572265625, 1.1179962158203125, 0.9599609375, 1.5687370300292969, 0.55133056640625, 4.619926452636719, 5.446441650390625, 1.6930046081542969, -1.362091064453125, -2.30108642578125, -4.255889892578125, 6.132621765136719, -0.2840118408203125, 0.980987548828125, -1.7992897033691406], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000114.npy"}
{"epoch": 0.17233560090702948, "step": 115, "batch_size": 64, "mean": 1.199564814567566, "std": 2.633521556854248, "min": -3.6089859008789062, "p10": -2.025254821777344, "median": 1.11029052734375, "p90": 5.032366180419926, "max": 7.3292694091796875, "pos_frac": 0.671875, "sample": [-1.9508438110351562, -1.025787353515625, 3.444427490234375, 0.8072052001953125, 7.120269775390625, -0.6508331298828125, 1.2538299560546875, 4.007259368896484, -0.644805908203125, 7.106842041015625, 0.115570068359375, -3.0106277465820312, 7.3292694091796875, -1.4136428833007812, -2.0424118041992188, -3.4715118408203125, 0.9145851135253906, 2.3922271728515625, 0.5647125244140625, 3.340900421142578, 1.2274818420410156, 3.285736083984375, 1.7913436889648438, 0.7906951904296875, -2.173370361328125, 1.0933380126953125, 2.0624847412109375, 1.5551643371582031, -3.6089859008789062, 5.758197784423828, -2.0427398681640625, 3.845895767211914, 2.217782974243164, 5.5769500732421875, 0.2641143798828125, 3.7950439453125, 0.5085067749023438, 1.273752212524414, -1.1716079711914062, 1.2122573852539062, 1.1272430419921875, -1.9852218627929688, 1.4939041137695312, 2.297271728515625, 3.681884765625, 1.3539962768554688, 4.118812561035156, -0.2713966369628906, -3.3142318725585938, 6.940338134765625, 1.1899852752685547, -0.6044635772705078, 2.5162525177001953, -0.1066741943359375, 0.23685836791992188, -0.41638946533203125, -1.6471633911132812, -0.5138664245605469, 1.3218307495117188, -0.37735748291015625, 1.7092247009277344, 0.22479248046875, 0.9239521026611328, 5.42388916015625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000115.npy"}
{"epoch": 0.17384731670445955, "step": 116, "batch_size": 64, "mean": 1.6490864753723145, "std": 2.658559799194336, "min": -4.560028076171875, "p10": -1.4658241271972654, "median": 1.188568115234375, "p90": 5.509822082519532, "max": 8.11517333984375, "pos_frac": 0.734375, "sample": [7.079246520996094, -2.2686614990234375, 0.4637641906738281, 5.1697235107421875, 6.1920013427734375, 2.2286758422851562, 4.3224334716796875, 0.22320175170898438, 3.3568191528320312, 3.7701644897460938, 1.4580230712890625, -0.8904266357421875, -1.6185073852539062, 5.029388427734375, 1.4472465515136719, 0.5243072509765625, -0.31742095947265625, 0.09125137329101562, 2.4073219299316406, 4.082874298095703, -1.9041671752929688, -0.34487152099609375, 2.0592422485351562, 0.15924072265625, -1.0742416381835938, 6.173526763916016, 0.5987625122070312, 3.2602615356445312, 8.11517333984375, 0.8285713195800781, 1.277923583984375, -1.0386810302734375, -2.482769012451172, 0.2548713684082031, 4.79669189453125, 0.9708023071289062, 3.181438446044922, 2.6681976318359375, -0.9254302978515625, -1.739471435546875, 2.3299484252929688, 2.2371139526367188, 1.3779029846191406, -1.2246170043945312, 6.2982940673828125, 2.4593772888183594, 5.65557861328125, -1.5691986083984375, -0.45888519287109375, 0.9969253540039062, -0.8365898132324219, 1.099212646484375, 0.9595870971679688, -4.560028076171875, 0.1365203857421875, 2.8907928466796875, 1.0839958190917969, 1.4453659057617188, 0.5587749481201172, 6.689208984375, 3.6054153442382812, 2.7555809020996094, 5.000410079956055, -0.9756507873535156], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000116.npy"}
{"epoch": 0.17535903250188964, "step": 117, "batch_size": 64, "mean": 1.6154546737670898, "std": 2.7905819416046143, "min": -5.013908386230469, "p10": -1.886998176574707, "median": 1.5159034729003906, "p90": 5.591407012939454, "max": 7.191074371337891, "pos_frac": 0.734375, "sample": [-1.8577880859375, 5.875576019287109, 2.9902572631835938, 0.265045166015625, 5.8069915771484375, 0.17117691040039062, 3.6850929260253906, 1.2830772399902344, 3.2184066772460938, 2.5079879760742188, 0.9200363159179688, -0.2392425537109375, 5.704925537109375, -5.013908386230469, 1.6020889282226562, 2.8426666259765625, -1.7183685302734375, -0.46916961669921875, 3.79681396484375, 3.0273818969726562, 0.374786376953125, 3.952312469482422, 0.9279899597167969, 0.48583221435546875, 3.7249412536621094, -1.89727783203125, 1.77557373046875, 3.077239990234375, 2.8662261962890625, 5.929931640625, 1.7112045288085938, 5.326530456542969, 6.8548431396484375, 0.6186332702636719, 3.3989715576171875, -0.4193878173828125, 1.1354560852050781, 5.286125183105469, -0.2133636474609375, 3.9332656860351562, 0.832000732421875, 0.9757614135742188, 2.7912445068359375, 7.191074371337891, -2.045684814453125, -4.4514923095703125, 1.429718017578125, 0.6537017822265625, 4.521247863769531, 1.9539108276367188, 0.3817901611328125, 1.9885482788085938, 3.596923828125, -2.4109344482421875, 7.0330810546875, -3.5079689025878906, -1.8630123138427734, 3.1903915405273438, -3.333709716796875, -1.2400970458984375, 2.401336669921875, 1.3101425170898438, -0.9840469360351562, -0.2737083435058594], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000117.npy"}
{"epoch": 0.17687074829931973, "step": 118, "batch_size": 64, "mean": 1.1792272329330444, "std": 2.9425244331359863, "min": -6.216682434082031, "p10": -2.7577804565429687, "median": 0.8443679809570312, "p90": 5.091226959228516, "max": 7.972038269042969, "pos_frac": 0.65625, "sample": [0.7091751098632812, 0.30101776123046875, -0.0088653564453125, 0.604888916015625, -6.216682434082031, -2.7566909790039062, -0.77508544921875, 2.95343017578125, 4.8883056640625, -1.0403556823730469, 3.825000762939453, 1.034332275390625, -0.8674583435058594, 3.9144935607910156, 0.49807167053222656, -2.3156890869140625, 2.2998504638671875, 4.006500244140625, -0.1585540771484375, -0.11969375610351562, -1.0485687255859375, 1.2625885009765625, 3.415712356567383, 2.545623779296875, 2.7996597290039062, -2.7610626220703125, -2.7582473754882812, -2.2295150756835938, 2.5659828186035156, 0.05058479309082031, 2.9008560180664062, 2.0326099395751953, 5.897621154785156, -2.063507080078125, 0.37808990478515625, 1.7226104736328125, 0.7056999206542969, 1.846038818359375, 4.5360260009765625, -3.795074462890625, 0.37419891357421875, 4.182350158691406, 1.6858139038085938, 7.972038269042969, 0.9795608520507812, -2.791229248046875, -3.8857498168945312, 5.6067352294921875, 6.25982666015625, 5.974586486816406, 5.051177978515625, -4.050804138183594, 3.0983123779296875, 2.3240890502929688, 0.0082550048828125, -0.2925224304199219, -2.0402374267578125, 2.50506591796875, 3.244447708129883, 5.108390808105469, 5.891994476318359, -0.29883575439453125, 0.49254608154296875, -0.7091903686523438], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000118.npy"}
{"epoch": 0.17838246409674982, "step": 119, "batch_size": 64, "mean": 1.3570809364318848, "std": 3.632678270339966, "min": -8.001983642578125, "p10": -2.615535354614258, "median": 1.2096233367919922, "p90": 6.337269592285157, "max": 11.4488525390625, "pos_frac": 0.578125, "sample": [-2.548870086669922, 8.264602661132812, -1.3127861022949219, -2.6087608337402344, -4.6459503173828125, -1.0588111877441406, 2.316892623901367, 1.4666900634765625, -1.2454681396484375, -0.6472320556640625, -2.8866119384765625, -0.6456298828125, -0.18619537353515625, 7.7657928466796875, 4.798007965087891, -8.001983642578125, -2.1532554626464844, -2.8114166259765625, 1.9607925415039062, 4.9765472412109375, -2.0404815673828125, 2.533498764038086, 0.1721649169921875, 1.2601814270019531, 6.798347473144531, 1.9116363525390625, -3.5478897094726562, 4.77288818359375, 6.1583709716796875, 7.042388916015625, 4.832977294921875, 2.758037567138672, 1.6336936950683594, 3.2671966552734375, 5.515411376953125, 11.4488525390625, 1.3344459533691406, -1.3626861572265625, 0.8498764038085938, 5.303947448730469, 7.986705780029297, 2.2040481567382812, -2.5291175842285156, 2.601043701171875, -1.6562118530273438, -2.618438720703125, 5.4893798828125, 3.5075416564941406, 3.7281341552734375, -1.342926025390625, -0.4286651611328125, -3.40826416015625, -0.1072540283203125, -1.8321304321289062, -0.09103775024414062, 1.1590652465820312, 3.251434326171875, 6.4139404296875, 0.5517196655273438, 2.2431793212890625, -1.0606861114501953, 1.4271011352539062, 0.16376113891601562, -0.23834991455078125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000119.npy"}
{"epoch": 0.17989417989417988, "step": 120, "batch_size": 64, "mean": 2.052732467651367, "std": 2.9510326385498047, "min": -4.655906677246094, "p10": -0.8081531524658201, "median": 1.7070140838623047, "p90": 5.90809555053711, "max": 10.251541137695312, "pos_frac": 0.71875, "sample": [3.449462890625, 0.29974365234375, -0.4228172302246094, 10.251541137695312, 1.6536102294921875, -0.14917373657226562, 4.439899444580078, -0.20185089111328125, 1.8945808410644531, -4.655906677246094, 6.964874267578125, 0.8574371337890625, 3.7118911743164062, -0.177642822265625, 3.0572052001953125, 6.01971435546875, 5.012229919433594, -0.08508110046386719, 3.0359268188476562, 0.8198699951171875, 2.0937042236328125, 0.4107666015625, 3.0638084411621094, 3.557149887084961, -1.08837890625, 1.5014839172363281, 6.692741394042969, -1.710968017578125, 3.001190185546875, 4.451637268066406, 4.6913909912109375, -4.373146057128906, 9.057388305664062, -0.8972053527832031, 4.0943450927734375, -0.40008544921875, 3.1140975952148438, -0.37332916259765625, 4.833892822265625, 6.722282409667969, 0.6849231719970703, 1.6318550109863281, -0.251007080078125, 2.654296875, 1.5194282531738281, -2.7982101440429688, -0.23526763916015625, 2.24530029296875, 4.0756378173828125, 1.6000900268554688, 0.5210361480712891, 0.732879638671875, 4.7859344482421875, 0.02606964111328125, 2.560211181640625, -0.6003646850585938, 6.622596740722656, 5.647651672363281, -0.24471282958984375, -3.6092071533203125, 2.2240753173828125, 1.2839698791503906, 4.318992614746094, 1.7604179382324219], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000120.npy"}
{"epoch": 0.18140589569160998, "step": 121, "batch_size": 64, "mean": 1.4499316215515137, "std": 3.2169365882873535, "min": -4.044841766357422, "p10": -2.342810821533203, "median": 1.0743904113769531, "p90": 5.677814483642582, "max": 10.518447875976562, "pos_frac": 0.65625, "sample": [3.7843093872070312, -3.728546142578125, 0.23239898681640625, -2.924285888671875, 3.5122737884521484, 7.983184814453125, 2.8721351623535156, -3.463125228881836, 1.4183425903320312, 4.647125244140625, 0.07761383056640625, -1.4080810546875, 10.518447875976562, 2.8400115966796875, -1.432281494140625, 0.3507194519042969, 3.8701400756835938, 1.1143035888671875, 4.3324127197265625, -2.4362335205078125, 8.2508544921875, -0.19036865234375, 4.0947418212890625, 4.721931457519531, 3.7353439331054688, -1.8733558654785156, 2.030536651611328, -3.106771469116211, 1.4463348388671875, -0.2542533874511719, -0.6061210632324219, 7.515380859375, -0.6294021606445312, 1.2624130249023438, 2.6281280517578125, 1.3244705200195312, -1.533660888671875, 6.0874786376953125, 2.8000411987304688, -0.538421630859375, -0.05460357666015625, 0.19384002685546875, -1.2670822143554688, 3.5081405639648438, -2.1248245239257812, 2.107330322265625, -4.044841766357422, 8.395530700683594, 0.14208984375, 0.9018878936767578, 1.6166324615478516, 0.7623748779296875, 3.6026954650878906, 1.4773826599121094, -1.9854354858398438, 7.10076904296875, -1.86370849609375, 1.0344772338867188, 0.44019317626953125, 0.9784889221191406, -0.3658180236816406, 4.613212585449219, 1.254486083984375, -2.9537601470947266], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000121.npy"}
{"epoch": 0.18291761148904007, "step": 122, "batch_size": 64, "mean": 1.4434537887573242, "std": 3.5414295196533203, "min": -4.368560791015625, "p10": -2.6616340637207028, "median": 0.9230232238769531, "p90": 5.70469970703125, "max": 11.467529296875, "pos_frac": 0.65625, "sample": [9.0865478515625, -4.2747955322265625, -3.5243377685546875, 2.1290283203125, 0.1244659423828125, -1.4582138061523438, 9.985107421875, 4.0614471435546875, -3.627826690673828, 4.27313232421875, 2.297130584716797, -1.5242843627929688, -2.7458877563476562, 0.9706268310546875, 0.2946624755859375, 11.467529296875, -1.195587158203125, 0.5702667236328125, -3.1352767944335938, 3.8129959106445312, 3.9073219299316406, 4.88593864440918, 0.6577262878417969, 2.0666961669921875, 5.7411651611328125, 7.52117919921875, -2.3644332885742188, -1.38861083984375, 3.11383056640625, 5.6196136474609375, -2.3825531005859375, 5.037406921386719, 1.5099868774414062, 1.34759521484375, -4.368560791015625, 1.1075210571289062, 1.1193771362304688, 0.39056396484375, 2.088987350463867, -0.4949455261230469, 0.4778709411621094, -0.19591522216796875, -2.4650421142578125, 7.242866516113281, -3.355255126953125, 3.957977294921875, 0.5273780822753906, -0.12702560424804688, -1.0155181884765625, 0.8754196166992188, -0.8059616088867188, -1.8192672729492188, 2.1185531616210938, 1.1934890747070312, 0.6475486755371094, 0.20874786376953125, 1.8010711669921875, -0.39569091796875, -0.24225616455078125, 3.948577880859375, 2.2967910766601562, 2.381612777709961, 1.4301910400390625, 10.992340087890625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000122.npy"}
{"epoch": 0.18442932728647016, "step": 123, "batch_size": 64, "mean": 1.661919355392456, "std": 3.109231472015381, "min": -5.1934661865234375, "p10": -2.3038192749023434, "median": 1.5280447006225586, "p90": 5.215471649169923, "max": 12.404190063476562, "pos_frac": 0.71875, "sample": [3.546375274658203, 7.096172332763672, 0.9087677001953125, 12.404190063476562, -1.2990951538085938, -2.5908584594726562, 3.8236770629882812, 2.1600189208984375, 4.3106689453125, 1.8916244506835938, 0.5641555786132812, 4.33624267578125, 2.4421157836914062, -0.661865234375, 2.6052093505859375, 1.9803657531738281, 2.958597183227539, 4.047580718994141, 0.4300575256347656, 4.8139190673828125, 0.8786773681640625, 2.5840911865234375, 2.181102752685547, 3.4654369354248047, 3.682769775390625, 5.387565612792969, 1.5801582336425781, -1.3891544342041016, 2.5340576171875, 3.610515594482422, 1.475931167602539, 3.6087646484375, -5.1934661865234375, 5.768898010253906, -0.8959884643554688, -5.0701141357421875, -3.090240478515625, -2.5069427490234375, -0.15229415893554688, 1.0716781616210938, 3.8577117919921875, -0.201873779296875, 3.296619415283203, -2.6416473388671875, 2.780059814453125, 7.223518371582031, -0.8120632171630859, -2.698772430419922, 0.2071685791015625, 0.18608856201171875, -0.855224609375, 1.1306533813476562, 0.40987396240234375, 1.0360565185546875, -1.829864501953125, 7.8199462890625, -0.9528141021728516, 0.9999656677246094, -1.1064300537109375, 1.3852920532226562, 1.7439823150634766, 2.5641021728515625, 7.09393310546875, 0.42719268798828125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000123.npy"}
{"epoch": 0.18594104308390022, "step": 124, "batch_size": 64, "mean": 1.883009433746338, "std": 3.8033649921417236, "min": -6.1281585693359375, "p10": -2.893416213989257, "median": 1.5549125671386719, "p90": 6.914575195312501, "max": 12.810989379882812, "pos_frac": 0.734375, "sample": [0.031040191650390625, 9.760009765625, 1.3557891845703125, -5.255802154541016, 2.0773372650146484, 2.5463714599609375, 7.1007843017578125, 2.9531402587890625, 9.814224243164062, 12.810989379882812, 2.4498748779296875, 3.7684173583984375, -0.23892974853515625, 9.09759521484375, 1.2084903717041016, 2.5620689392089844, -1.6282196044921875, 2.3623313903808594, -0.5729522705078125, -1.9651718139648438, -3.587890625, 3.4245758056640625, -1.9572372436523438, 3.8674392700195312, 0.8462371826171875, 2.8907432556152344, 0.49139976501464844, 2.9477767944335938, -1.7983245849609375, -3.539905548095703, 6.196414947509766, 5.860260009765625, 1.7540359497070312, 1.2242431640625, 5.740570068359375, 7.707794189453125, 6.4800872802734375, -6.1281585693359375, 2.069549560546875, 4.5313720703125, -3.081787109375, 2.9465484619140625, -0.000152587890625, 2.6036911010742188, 1.1137428283691406, 3.3928909301757812, 1.3390655517578125, 1.1044769287109375, -2.4538841247558594, 2.1102218627929688, 8.814186096191406, 1.0963211059570312, 0.027860641479492188, 0.426361083984375, -3.1805267333984375, 0.49314117431640625, 3.1567153930664062, -1.0942001342773438, -5.626773834228516, 5.03607177734375, -1.5763626098632812, 3.2701416015625, 0.4898567199707031, 0.84661865234375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000124.npy"}
{"epoch": 0.1874527588813303, "step": 125, "batch_size": 64, "mean": 2.072727680206299, "std": 3.5151495933532715, "min": -6.46600341796875, "p10": -2.1775833129882813, "median": 2.0362815856933594, "p90": 5.981137084960939, "max": 11.601348876953125, "pos_frac": 0.78125, "sample": [-1.59674072265625, 3.661163330078125, 4.655433654785156, 0.715850830078125, 2.647705078125, -0.9578876495361328, 4.329856872558594, 9.436887741088867, 7.5244140625, 1.1561012268066406, 9.538848876953125, -2.2169418334960938, 0.29734039306640625, 1.241912841796875, 0.034976959228515625, 6.105861663818359, 4.633857727050781, 1.3786163330078125, 0.7850761413574219, -2.08135986328125, 5.690113067626953, 5.673149108886719, 2.191802978515625, 2.677642822265625, -0.893402099609375, 2.760425567626953, 0.8316879272460938, -1.9920520782470703, -3.5232467651367188, 0.3108673095703125, 4.335105895996094, 2.154754638671875, 0.5380325317382812, 4.683013916015625, 5.587741851806641, -4.367317199707031, 1.7886133193969727, 2.0253829956054688, 6.768562316894531, 11.601348876953125, 3.610015869140625, 2.624847412109375, 6.5767822265625, 4.478363037109375, 5.008796691894531, -2.0857467651367188, 1.9257659912109375, 3.7719650268554688, 0.10510826110839844, 1.3430252075195312, 3.7229461669921875, 1.2547969818115234, 3.6036605834960938, 1.2394866943359375, -1.9389495849609375, 5.67340087890625, 2.04718017578125, -3.1601638793945312, 2.4173431396484375, 0.6934185028076172, 3.5330886840820312, -2.299732208251953, -5.15802001953125, -6.46600341796875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000125.npy"}
{"epoch": 0.1889644746787604, "step": 126, "batch_size": 64, "mean": 1.6221377849578857, "std": 4.228144645690918, "min": -6.387779235839844, "p10": -3.251180648803711, "median": 0.838749885559082, "p90": 7.726145935058595, "max": 11.288932800292969, "pos_frac": 0.59375, "sample": [9.660806655883789, -1.2500534057617188, 4.304100036621094, 0.9918327331542969, 1.1586952209472656, 7.3917083740234375, 6.0783538818359375, 0.38179969787597656, 1.6109695434570312, -1.3551158905029297, 8.540653228759766, 6.229522705078125, -6.387779235839844, -0.03717041015625, 11.274810791015625, 2.818286895751953, -2.343505859375, -1.4529647827148438, 0.39495849609375, -2.71112060546875, 6.0810546875, 5.8386688232421875, -1.1826820373535156, 2.3986053466796875, -3.2766380310058594, 6.760993957519531, 3.0471343994140625, -3.37158203125, 10.75701904296875, -1.0423774719238281, -3.5624465942382812, -4.4658203125, -2.3521652221679688, 0.5504932403564453, -2.1925926208496094, 1.5865249633789062, 0.1487274169921875, 1.7783050537109375, 0.8091068267822266, -5.5711669921875, 5.539703369140625, -1.439239501953125, 2.7298660278320312, -0.37014007568359375, 9.649604797363281, -1.1853561401367188, 2.214630126953125, 11.288932800292969, 1.4878196716308594, 3.1633148193359375, -0.8873214721679688, 2.3478546142578125, 3.1925506591796875, 0.0067138671875, -0.11376953125, -0.872528076171875, 0.8683929443359375, 7.194515228271484, -2.2208404541015625, -3.1917800903320312, -0.1240692138671875, 7.869476318359375, -3.676563262939453, 2.307098388671875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000126.npy"}
{"epoch": 0.19047619047619047, "step": 127, "batch_size": 64, "mean": 2.221470832824707, "std": 3.1275041103363037, "min": -4.827606201171875, "p10": -1.0832817077636718, "median": 2.044055938720703, "p90": 6.169398498535156, "max": 10.907928466796875, "pos_frac": 0.734375, "sample": [6.063323974609375, 8.764190673828125, 2.7190322875976562, 4.835597991943359, -0.609466552734375, 6.966461181640625, -1.0897750854492188, 3.4300880432128906, 0.94696044921875, 4.7834625244140625, 7.106815338134766, 1.0301132202148438, 1.0810089111328125, 5.9615631103515625, 0.13903045654296875, -0.19134140014648438, 4.0749359130859375, 2.2676925659179688, 0.59344482421875, 1.6794967651367188, 5.7822265625, -4.827606201171875, 2.5142822265625, 2.5309829711914062, 2.461151123046875, -3.9944992065429688, -0.5129241943359375, -0.3640594482421875, 2.644561767578125, 0.6325035095214844, -1.22235107421875, 3.840038299560547, 4.646892547607422, 2.2917404174804688, -0.16709518432617188, 4.735675811767578, 1.3463401794433594, 6.610729217529297, 4.1311798095703125, 10.907928466796875, -4.326671600341797, 4.145458221435547, 3.580394744873047, -0.9846725463867188, -2.3373489379882812, 6.2148590087890625, 1.3424148559570312, -0.7582588195800781, 5.840339660644531, -1.0681304931640625, -0.4757232666015625, 4.109954833984375, 2.9789047241210938, 0.09358596801757812, 2.596363067626953, -1.4513893127441406, 4.370460510253906, 0.2850494384765625, 1.8204193115234375, 1.6785888671875, 1.8111305236816406, 0.8758640289306641, 7.520275115966797, -0.2280597686767578], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000127.npy"}
{"epoch": 0.19198790627362056, "step": 128, "batch_size": 64, "mean": 1.4183712005615234, "std": 4.668089389801025, "min": -10.289077758789062, "p10": -5.14190559387207, "median": 1.3746337890625, "p90": 7.4506881713867195, "max": 11.850715637207031, "pos_frac": 0.65625, "sample": [2.28753662109375, -3.566009521484375, 1.6463661193847656, 1.2625579833984375, 1.7892684936523438, 1.6501693725585938, -1.2109527587890625, 9.0592041015625, -0.4457244873046875, 7.9090423583984375, 0.8737907409667969, -10.289077758789062, 3.4615554809570312, 0.5576057434082031, 6.990716934204102, -6.812782287597656, 1.1302509307861328, 7.365303039550781, -5.358085632324219, 9.602157592773438, 11.672119140625, -1.5140953063964844, 2.073627471923828, 7.179443359375, 11.0347900390625, 0.419342041015625, -1.5451507568359375, 7.487281799316406, -5.420917510986328, 3.121471405029297, 4.3099365234375, 0.0623931884765625, -0.7591361999511719, 1.4867095947265625, 3.7144088745117188, 0.8175582885742188, -1.4417533874511719, 4.694940567016602, -8.054866790771484, 4.752017974853516, -1.8861312866210938, 1.9507274627685547, 1.6029472351074219, 0.18273162841796875, -4.637485504150391, 4.1024932861328125, 2.065082550048828, -0.18999862670898438, 2.81427001953125, -0.18990516662597656, -7.3717803955078125, 5.0760650634765625, 4.4539337158203125, 0.1167755126953125, 2.217020034790039, -0.8057346343994141, 0.965850830078125, 2.983154296875, 11.850715637207031, -2.080272674560547, -7.695404052734375, -0.18866729736328125, 4.077735900878906, -0.631378173828125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000128.npy"}
{"epoch": 0.19349962207105065, "step": 129, "batch_size": 64, "mean": 1.5880063772201538, "std": 4.429119110107422, "min": -8.549957275390625, "p10": -3.486255073547363, "median": 1.505539894104004, "p90": 7.382881164550782, "max": 13.21783447265625, "pos_frac": 0.640625, "sample": [-2.0778274536132812, -2.7726821899414062, 4.812171936035156, 1.07098388671875, -0.37635231018066406, 8.165546417236328, 3.8638458251953125, -3.521200180053711, 4.6482696533203125, 6.969593048095703, 6.241504669189453, 4.535327911376953, 1.4327926635742188, -0.7342758178710938, 0.31261444091796875, 1.9580154418945312, 7.225975036621094, -1.4495811462402344, 0.06893157958984375, 4.695426940917969, 5.3482208251953125, -8.549957275390625, 0.917327880859375, 4.4304351806640625, -1.7827491760253906, 13.21783447265625, 0.57550048828125, -2.1556015014648438, 2.8372154235839844, 8.217605590820312, 1.9384078979492188, -0.962432861328125, -8.198074340820312, 4.197729110717773, 3.3872909545898438, 5.418979644775391, -5.936920166015625, -3.8534927368164062, -2.2850189208984375, 7.555599212646484, 2.4505462646484375, -2.0252609252929688, -2.0755882263183594, 1.2124271392822266, 7.5623321533203125, 2.9532012939453125, -3.4047164916992188, 6.830524444580078, 3.233642578125, 10.324699401855469, 1.0627288818359375, 4.91107177734375, 7.450126647949219, -0.2200145721435547, 1.4420337677001953, -5.025949478149414, -1.7036514282226562, -7.307411193847656, 1.63623046875, -1.3031539916992188, 1.7948055267333984, 1.5690460205078125, -2.6000823974609375, 3.47784423828125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000129.npy"}
{"epoch": 0.19501133786848074, "step": 130, "batch_size": 64, "mean": 1.8546559810638428, "std": 3.7875218391418457, "min": -8.169219970703125, "p10": -2.5392185211181637, "median": 1.3553485870361328, "p90": 6.610271453857423, "max": 10.562759399414062, "pos_frac": 0.6875, "sample": [2.26385498046875, 2.652435302734375, 5.1838531494140625, 9.800018310546875, 1.1607742309570312, -2.1574554443359375, 1.21319580078125, 4.310075759887695, -1.4925518035888672, 1.0163135528564453, -1.990325927734375, 5.26416015625, 5.405769348144531, 1.452484130859375, -3.26422119140625, 7.4502410888671875, -1.356048583984375, -3.0000228881835938, -1.8886566162109375, 9.443084716796875, 2.2175216674804688, -1.0559463500976562, -1.0404052734375, 6.496971130371094, -0.5581207275390625, -3.632274627685547, 1.1901626586914062, 0.6851348876953125, 1.1147384643554688, -1.9163398742675781, 4.081268310546875, 2.9954452514648438, 9.498085021972656, 5.21922492980957, 2.4792251586914062, 7.003467559814453, 6.6588287353515625, -1.1316299438476562, 10.562759399414062, 1.8307647705078125, -1.1898345947265625, 3.0828323364257812, 1.3424911499023438, -2.702831268310547, -0.9130458831787109, 1.8114776611328125, 2.7153472900390625, 0.9525337219238281, 0.6674728393554688, -4.334320068359375, -4.7612152099609375, 0.7212047576904297, 2.772186279296875, 5.865602493286133, -8.169219970703125, 5.69580078125, 5.2028350830078125, 0.07988166809082031, 4.475929260253906, 0.65692138671875, 3.452014923095703, -0.19169998168945312, 1.3682060241699219, 5.931549072265625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000130.npy"}
{"epoch": 0.1965230536659108, "step": 131, "batch_size": 64, "mean": 2.399470090866089, "std": 4.357375621795654, "min": -8.290603637695312, "p10": -2.578485107421875, "median": 1.6558704376220703, "p90": 8.515171813964846, "max": 13.557464599609375, "pos_frac": 0.71875, "sample": [0.6492424011230469, 4.127277374267578, -2.91156005859375, 0.6336383819580078, -3.029754638671875, 13.557464599609375, 1.9214210510253906, 2.712129592895508, -0.37005615234375, 11.449386596679688, 3.5706939697265625, 0.7523097991943359, -1.1531028747558594, 0.9921798706054688, 5.246734619140625, 3.2654991149902344, 6.1181793212890625, -1.4947738647460938, 1.8044586181640625, -8.290603637695312, -2.304168701171875, 1.5240478515625, 13.072311401367188, 0.00872802734375, 1.7126312255859375, -2.761871337890625, -1.418008804321289, -0.4710235595703125, -4.601848602294922, -2.1674041748046875, 2.05694580078125, 1.2502899169921875, 6.8815765380859375, 1.5991096496582031, 7.391487121582031, 7.5800933837890625, 1.0097503662109375, 7.8429107666015625, 1.2799243927001953, 5.142158508300781, 3.080533981323242, 9.584854125976562, -1.9969100952148438, 6.449089050292969, 5.130817413330078, -2.4964752197265625, 1.1727180480957031, -3.875030517578125, 0.05950927734375, 8.80328369140625, 0.9812421798706055, -0.68487548828125, 6.757312774658203, -1.9540443420410156, -2.6136322021484375, 9.154258728027344, 2.1879844665527344, 1.3884544372558594, 4.2663116455078125, 4.868324279785156, 2.135406494140625, 3.5594654083251953, 8.952156066894531, 4.476932525634766], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000131.npy"}
{"epoch": 0.1980347694633409, "step": 132, "batch_size": 64, "mean": 1.716357707977295, "std": 3.7030599117279053, "min": -7.5594635009765625, "p10": -2.0929283142089843, "median": 1.5985088348388672, "p90": 5.60199317932129, "max": 11.90155029296875, "pos_frac": 0.75, "sample": [3.86541748046875, 0.8910112380981445, 4.335105895996094, 3.1876983642578125, -1.4737319946289062, -1.74615478515625, 5.756233215332031, 1.2353839874267578, -5.869270324707031, 6.722480773925781, 2.992809295654297, 2.219432830810547, 2.309467315673828, 2.0785446166992188, -1.406402587890625, 2.3128738403320312, -0.2909126281738281, 7.747528076171875, 3.9107284545898438, 3.0796127319335938, -7.312904357910156, -7.5594635009765625, 5.6264801025390625, -1.1613998413085938, -0.4867134094238281, -2.0333328247070312, 4.682914733886719, 1.1928253173828125, 5.544857025146484, -1.8349990844726562, 10.078620910644531, 0.47997474670410156, -7.2193145751953125, 4.242820739746094, 3.1605072021484375, 3.2008056640625, 4.653129577636719, 3.213520050048828, 4.984111785888672, 1.599212646484375, 1.1899032592773438, 0.78863525390625, 0.5564727783203125, 10.3115234375, 1.8194427490234375, 0.27388763427734375, 3.9991912841796875, 4.042716979980469, 0.7287254333496094, 1.5269699096679688, 2.9075584411621094, -0.43829345703125, 1.0878067016601562, 1.5978050231933594, 1.8502302169799805, 1.704254150390625, 11.90155029296875, -2.416534423828125, 1.1014404296875, 0.8018798828125, 0.8362579345703125, 1.449066162109375, -2.56463623046875, -2.11846923828125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000132.npy"}
{"epoch": 0.19954648526077098, "step": 133, "batch_size": 64, "mean": 1.8439884185791016, "std": 3.969789981842041, "min": -6.379661560058594, "p10": -3.4016950607299803, "median": 1.6780719757080078, "p90": 5.906253051757814, "max": 12.637847900390625, "pos_frac": 0.640625, "sample": [7.106346130371094, -6.379661560058594, 0.7393989562988281, 4.232448577880859, 0.044467926025390625, -5.5811920166015625, 1.6049423217773438, 10.761951446533203, -0.20778274536132812, 2.947010040283203, -0.08108901977539062, 4.467498779296875, 5.050270080566406, 2.4557266235351562, -2.5999069213867188, 1.9698333740234375, 7.8943939208984375, -5.896759033203125, 2.7498550415039062, 4.6757049560546875, -4.04351806640625, 4.724090576171875, -0.8423385620117188, 4.879457473754883, 1.6576118469238281, -2.795063018798828, 2.54827880859375, -2.4700927734375, 5.329559326171875, -0.3025836944580078, 5.186248779296875, 0.5352935791015625, 3.5468597412109375, 7.331996917724609, -0.26480865478515625, -0.185302734375, -4.2999725341796875, 5.9994354248046875, 5.605596542358398, 5.6355438232421875, -3.2462615966796875, -0.43011474609375, 1.3043899536132812, -1.2470970153808594, 1.1552848815917969, 9.171295166015625, 3.9618606567382812, 3.9886856079101562, -2.181121826171875, 12.637847900390625, -0.09526824951171875, 1.6985321044921875, -3.4683094024658203, 1.787872314453125, 5.103706359863281, 0.3455619812011719, 3.4050369262695312, 5.6888275146484375, 5.38470458984375, -0.22466659545898438, 3.8037872314453125, -1.0216140747070312, -3.8480300903320312, 0.610595703125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000133.npy"}
{"epoch": 0.20105820105820105, "step": 134, "batch_size": 64, "mean": 2.5576934814453125, "std": 4.296508312225342, "min": -5.3048095703125, "p10": -3.619581604003906, "median": 2.377714157104492, "p90": 6.784547424316407, "max": 15.8502197265625, "pos_frac": 0.75, "sample": [-4.030372619628906, 1.05364990234375, -1.4790229797363281, 4.263347625732422, 1.3194847106933594, 1.6017036437988281, 2.301280975341797, -2.1839523315429688, 4.060546875, 5.0027923583984375, 3.5702133178710938, 4.477771759033203, 3.7946395874023438, 2.042083740234375, -0.15365219116210938, 2.2775726318359375, 6.847618103027344, 4.665931701660156, -3.7371673583984375, 5.688076019287109, 7.806053161621094, 9.567146301269531, 5.268709182739258, -4.069149017333984, 1.5766029357910156, 5.11968994140625, -0.8942489624023438, 2.849945068359375, -0.49277305603027344, 2.592538833618164, -4.125499725341797, 0.09603691101074219, 5.0645904541015625, -3.8196868896484375, 1.3260459899902344, -2.0365772247314453, 6.637382507324219, 2.4749927520751953, 2.4541473388671875, 4.1301116943359375, 15.8502197265625, 4.300264358520508, -3.34521484375, 4.2935638427734375, -0.3924827575683594, 2.6411170959472656, 6.35931396484375, -5.3048095703125, 9.943206787109375, 1.611846923828125, 2.0475845336914062, 1.7115020751953125, 0.056377410888671875, 1.04193115234375, 12.377716064453125, -4.484554290771484, 3.4590377807617188, 2.6979751586914062, 1.4196586608886719, 4.262214660644531, 6.5827178955078125, 14.664283752441406, 0.122039794921875, -1.1317310333251953], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000134.npy"}
{"epoch": 0.20256991685563114, "step": 135, "batch_size": 64, "mean": 1.81989324092865, "std": 3.729581117630005, "min": -10.431285858154297, "p10": -1.5818082809448242, "median": 1.7189321517944336, "p90": 6.708214569091798, "max": 13.104263305664062, "pos_frac": 0.71875, "sample": [2.3659629821777344, -0.6394729614257812, 0.9161834716796875, 3.5454654693603516, 2.4572086334228516, 0.10185718536376953, -3.7017288208007812, -0.6600723266601562, 2.4599552154541016, 8.061027526855469, 0.58258056640625, -4.153388977050781, 2.7958526611328125, 0.1815471649169922, 3.3538284301757812, -5.5774993896484375, 7.1540069580078125, -0.022735595703125, -1.628753662109375, 3.6233673095703125, 2.198367118835449, -1.3311309814453125, -10.431285858154297, 6.2145843505859375, 1.7566680908203125, 0.9136199951171875, 3.532024383544922, 2.6550216674804688, 1.3885765075683594, 4.1740264892578125, 1.0367279052734375, -2.2850265502929688, 5.531578063964844, 1.81036376953125, 5.787193298339844, -0.18760108947753906, 0.7570648193359375, 0.4145488739013672, 3.8991966247558594, 0.23035812377929688, -0.6007709503173828, 6.8221588134765625, -0.5848922729492188, 3.2199859619140625, 2.9826889038085938, 6.442344665527344, 1.3545989990234375, 0.11794662475585938, 2.6416397094726562, 1.6441192626953125, 1.9259109497070312, -5.611175537109375, -0.532806396484375, 3.007413864135742, 7.130340576171875, 9.019317626953125, -1.3674488067626953, 1.6811962127685547, -1.472269058227539, 6.99664306640625, 3.7090911865234375, 5.717628479003906, 13.104263305664062, -0.15482330322265625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000135.npy"}
{"epoch": 0.20408163265306123, "step": 136, "batch_size": 64, "mean": 2.2877230644226074, "std": 5.063263416290283, "min": -7.546699523925781, "p10": -3.7830146789550776, "median": 0.8422260284423828, "p90": 8.167175292968752, "max": 15.33544921875, "pos_frac": 0.625, "sample": [5.0655364990234375, 6.284383773803711, -2.6108665466308594, -0.5986785888671875, -4.96875, 4.936756134033203, -7.449729919433594, -1.929962158203125, -6.027412414550781, 6.78569221496582, 0.08995819091796875, -3.9405975341796875, -0.104461669921875, -0.06414413452148438, -1.7813491821289062, 0.0848846435546875, 7.144779205322266, -0.041156768798828125, 0.47826385498046875, -1.3404350280761719, 7.7667388916015625, 1.942352294921875, 8.338790893554688, 8.344451904296875, -1.2311325073242188, 0.7987442016601562, 9.791511535644531, -1.0429458618164062, 1.3730621337890625, 13.666351318359375, 6.9492034912109375, -0.046234130859375, 5.4996185302734375, 6.37567138671875, 6.5467071533203125, -2.729522705078125, -0.245574951171875, 4.9119110107421875, 7.3565673828125, -3.4153213500976562, -5.95417594909668, 1.101348876953125, 3.2673187255859375, 6.7909698486328125, 2.1202392578125, 0.8149871826171875, -4.658531188964844, 3.7713699340820312, 2.4014434814453125, 15.122451782226562, 0.8694648742675781, -0.7008514404296875, 4.53009033203125, 7.57366943359375, 6.666107177734375, 15.33544921875, 0.493438720703125, 0.8096542358398438, 5.000019073486328, 8.851455688476562, -0.0113983154296875, -1.2110977172851562, 0.013885498046875, -7.546699523925781], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000136.npy"}
{"epoch": 0.20559334845049132, "step": 137, "batch_size": 64, "mean": 2.2006025314331055, "std": 3.8137078285217285, "min": -5.5411529541015625, "p10": -2.570978546142578, "median": 2.4445724487304688, "p90": 7.012001037597656, "max": 13.491409301757812, "pos_frac": 0.6875, "sample": [-3.5217514038085938, -5.5411529541015625, 0.5887451171875, 7.0162811279296875, 3.3258438110351562, 4.3005218505859375, 7.647617340087891, 10.875396728515625, 1.8870620727539062, 13.491409301757812, 4.110843658447266, 3.1266708374023438, 6.3975677490234375, -0.5984344482421875, 7.2401885986328125, 2.8004608154296875, -2.2654266357421875, 3.3119049072265625, -2.5940704345703125, -2.1207523345947266, 1.587106704711914, 1.7827186584472656, 5.2671356201171875, -2.771434783935547, 2.827951431274414, -0.10044097900390625, 9.778343200683594, 8.007232666015625, 3.468486785888672, 4.131965637207031, 0.197296142578125, 3.5552139282226562, -3.73321533203125, 2.9178009033203125, 7.00201416015625, 2.5409393310546875, -1.012221336364746, -3.3358421325683594, 0.562103271484375, 0.4691143035888672, 0.9797897338867188, 1.1483230590820312, 0.8273811340332031, 4.1958160400390625, 5.591484069824219, -0.5045680999755859, -0.8636665344238281, 4.559471130371094, 3.67852783203125, 5.0261993408203125, 0.2470417022705078, -0.8689613342285156, -0.38973140716552734, -3.942089080810547, 3.0776538848876953, -1.279012680053711, -0.8927860260009766, -2.2701034545898438, -2.5170974731445312, 2.34820556640625, 4.935588836669922, 4.992671966552734, 5.7293701171875, 4.4078521728515625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000137.npy"}
{"epoch": 0.20710506424792138, "step": 138, "batch_size": 64, "mean": 1.5543079376220703, "std": 4.125727653503418, "min": -6.785186767578125, "p10": -2.5733469009399412, "median": 0.9553794860839844, "p90": 6.561145019531251, "max": 15.03363037109375, "pos_frac": 0.625, "sample": [0.6973876953125, 3.5078048706054688, 0.30718231201171875, 2.2374420166015625, 0.9431838989257812, -0.08807754516601562, 1.3181915283203125, -0.2901763916015625, -5.92572021484375, -2.414989471435547, 2.1179637908935547, -4.599456787109375, 6.711883544921875, -0.7623977661132812, 11.613037109375, -4.8102264404296875, -0.256622314453125, 3.2481231689453125, 6.666481018066406, 2.104534149169922, 5.251182556152344, -1.5719146728515625, 15.03363037109375, 2.7387638092041016, 0.3894195556640625, 4.666082382202148, 4.69383430480957, -1.627410888671875, 2.0820236206054688, -0.6005516052246094, 6.027107238769531, -2.641214370727539, 4.251152038574219, 0.0886383056640625, -0.6892833709716797, -1.5278472900390625, 1.5033416748046875, -6.3096771240234375, -0.09075355529785156, 6.315361022949219, 3.3593673706054688, 8.397369384765625, -2.001129150390625, 4.0161285400390625, -0.2576026916503906, -6.785186767578125, 3.6863479614257812, 0.159637451171875, -1.110015869140625, 0.105682373046875, -1.7075653076171875, 3.3083648681640625, 4.8105316162109375, 0.9675750732421875, 0.19573974609375, -4.293613433837891, -0.279052734375, 1.3868942260742188, -0.6816978454589844, 2.3975067138671875, 7.274894714355469, 3.389484405517578, 11.820213317871094, 1.0083999633789062], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000138.npy"}
{"epoch": 0.20861678004535147, "step": 139, "batch_size": 64, "mean": 1.588025689125061, "std": 3.8658926486968994, "min": -11.8897705078125, "p10": -3.2574958801269527, "median": 2.070199966430664, "p90": 6.174415969848633, "max": 8.581230163574219, "pos_frac": 0.71875, "sample": [-2.9779434204101562, 6.236583709716797, 5.822990417480469, 0.44260406494140625, -3.645191192626953, 3.0623130798339844, 2.511444091796875, 4.137722015380859, 6.983222961425781, -1.384857177734375, -1.0618209838867188, 5.299457550048828, 2.8672027587890625, -4.016138076782227, -1.1308746337890625, 0.5635299682617188, 6.7693023681640625, 2.3650341033935547, 1.4836196899414062, 1.9631614685058594, 0.6533203125, -11.8897705078125, 0.2684173583984375, 2.2966346740722656, 4.62384033203125, 4.176582336425781, 6.9160308837890625, 1.352132797241211, 3.2710723876953125, -3.3773040771484375, 2.831958770751953, 2.9668655395507812, 3.705596923828125, -6.643951416015625, 2.1772384643554688, 1.0733795166015625, 1.2692451477050781, 0.7463874816894531, 8.465774536132812, -0.09285736083984375, 4.524066925048828, -1.8423347473144531, 3.7946510314941406, 0.1914224624633789, 1.018655776977539, -4.464866638183594, 3.194549560546875, 3.4737701416015625, 6.02935791015625, 6.388042449951172, -2.30548095703125, 3.7349319458007812, 0.771392822265625, -6.9760589599609375, -1.0865554809570312, 3.3676185607910156, -1.9066047668457031, 5.5398406982421875, -2.2961912155151367, 5.0089111328125, 5.4837188720703125, 8.581230163574219, 1.9314889907836914, -1.6038665771484375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000139.npy"}
{"epoch": 0.21012849584278157, "step": 140, "batch_size": 64, "mean": 2.417325973510742, "std": 4.213418483734131, "min": -8.423721313476562, "p10": -2.7959526062011717, "median": 2.195831298828125, "p90": 8.021353912353517, "max": 11.549652099609375, "pos_frac": 0.65625, "sample": [8.579910278320312, 6.5609130859375, -1.1478691101074219, -0.8691558837890625, -0.19500732421875, 5.368202209472656, -4.240669250488281, -2.97314453125, 2.20086669921875, 0.5134429931640625, -8.423721313476562, 4.566673278808594, 3.4102401733398438, 2.1907958984375, 5.3425445556640625, -1.3427963256835938, -1.050628662109375, 3.256591796875, 2.9792327880859375, 6.205726623535156, -0.2719841003417969, 5.54144287109375, 4.747936248779297, 7.52667236328125, 0.6660385131835938, 8.908737182617188, -0.6067466735839844, 8.553192138671875, 0.281494140625, -2.15631103515625, -3.1705780029296875, 0.21270751953125, 3.8987998962402344, -4.2070465087890625, -0.4410686492919922, 3.8984298706054688, 2.153026580810547, 7.611457824707031, 7.6628570556640625, 8.430465698242188, 5.971832275390625, 6.001918792724609, 7.184104919433594, 7.822166442871094, 0.017391204833984375, -0.8066539764404297, 11.18115234375, -1.798980712890625, 11.549652099609375, -0.8490982055664062, -3.68194580078125, 5.073622703552246, -0.09936141967773438, 1.7594318389892578, 8.106719970703125, 1.045318603515625, 2.7629013061523438, -1.9640884399414062, 4.050590515136719, 1.561187744140625, 2.5293045043945312, 2.4840469360351562, -2.39715576171875, -2.9668655395507812], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000140.npy"}
{"epoch": 0.21164021164021163, "step": 141, "batch_size": 64, "mean": 2.3140902519226074, "std": 3.8912394046783447, "min": -7.4109649658203125, "p10": -2.429311180114746, "median": 2.478224754333496, "p90": 7.271677494049073, "max": 10.589561462402344, "pos_frac": 0.671875, "sample": [5.341278076171875, -3.382610321044922, 6.5745697021484375, -1.1484756469726562, 6.162025451660156, -1.1426239013671875, 2.99847412109375, 7.7132568359375, 2.4783153533935547, -1.6431102752685547, 0.4300384521484375, -1.6895828247070312, 3.2419357299804688, -3.2808265686035156, 10.589561462402344, -0.4285163879394531, 1.855966567993164, 0.2418670654296875, -2.2291736602783203, -7.4109649658203125, 0.9459762573242188, -3.2600936889648438, -0.4615364074707031, 5.243686676025391, -1.0620269775390625, 1.3939285278320312, 3.61578369140625, 6.1758880615234375, 1.83575439453125, -2.7274169921875, -0.0477294921875, -2.9107666015625, 2.8862876892089844, 7.023811340332031, 4.967872619628906, -0.8774642944335938, 6.33154296875, -0.9651947021484375, 7.37790584564209, 9.465850830078125, 5.9227294921875, 3.1859359741210938, 3.8952102661132812, 0.829986572265625, 9.302692413330078, 3.8765487670898438, -1.5828828811645508, 3.662353515625, 2.4879531860351562, 6.994316101074219, -2.490966796875, 2.0715255737304688, 4.72844123840332, 8.31732177734375, 2.4781341552734375, 8.839324951171875, -2.2854480743408203, 5.87066650390625, 0.29083251953125, 6.518646240234375, 4.039802551269531, 0.11512947082519531, -1.9134063720703125, 2.7234649658203125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000141.npy"}
{"epoch": 0.21315192743764172, "step": 142, "batch_size": 64, "mean": 1.4749343395233154, "std": 3.851325750350952, "min": -7.535266876220703, "p10": -2.3228147506713865, "median": 1.3768768310546875, "p90": 5.3693382263183596, "max": 13.71478271484375, "pos_frac": 0.71875, "sample": [2.600505828857422, -6.5051422119140625, 0.07189559936523438, 4.000087738037109, 2.4538002014160156, 8.569381713867188, 5.3546142578125, -6.721900939941406, 2.244617462158203, 1.777008056640625, 1.4090805053710938, 0.3345527648925781, 5.187168121337891, -4.5189208984375, 0.7776756286621094, 1.7156600952148438, -1.2171478271484375, -7.535266876220703, -1.7202339172363281, 1.2006263732910156, 1.5097198486328125, 2.124237060546875, 1.3446731567382812, -0.8076896667480469, 3.3679237365722656, 8.2740478515625, -1.3515625, 0.296539306640625, 4.762962341308594, 0.32711029052734375, 1.44024658203125, 0.53875732421875, 13.71478271484375, 2.818939208984375, 0.1787261962890625, -4.2650146484375, -0.9980278015136719, -2.0333728790283203, 1.6254005432128906, 3.6248626708984375, 9.539688110351562, 0.2126922607421875, 0.492034912109375, -0.9180831909179688, -2.4468612670898438, 0.3320579528808594, -0.7670421600341797, 5.375648498535156, -1.2876205444335938, 4.491035461425781, 2.26177978515625, -1.082876205444336, -1.9747161865234375, 8.743988037109375, 1.2609825134277344, 4.7050323486328125, 3.9085922241210938, 4.507808685302734, 8.4595947265625, 1.8672294616699219, 2.191802978515625, -4.013355255126953, 0.9500274658203125, 1.6150360107421875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000142.npy"}
{"epoch": 0.2146636432350718, "step": 143, "batch_size": 64, "mean": 1.2943671941757202, "std": 4.059176921844482, "min": -9.318023681640625, "p10": -3.1098121643066405, "median": 1.0059099197387695, "p90": 6.7868782043457045, "max": 13.604137420654297, "pos_frac": 0.59375, "sample": [0.27828216552734375, 3.974395751953125, 7.028797149658203, 4.6495361328125, 1.4838600158691406, 3.42279052734375, -0.640625, -0.36966609954833984, 6.988534927368164, -2.3706893920898438, 1.4269065856933594, -4.1768341064453125, 9.211105346679688, -0.28089141845703125, 4.9277191162109375, 1.5434074401855469, -6.288448333740234, -1.7383575439453125, 4.985015869140625, 0.8365821838378906, -3.140960693359375, -3.0371322631835938, 6.503501892089844, 7.988063812255859, 1.7228469848632812, 1.1752376556396484, -1.1149444580078125, -4.7135009765625, 0.2466297149658203, -0.8473052978515625, -1.7470817565917969, 2.082961082458496, -1.2460098266601562, -0.06471633911132812, 7.013145446777344, 0.2658500671386719, -3.8344955444335938, -0.2899055480957031, 2.3077774047851562, 1.3330001831054688, 1.9084548950195312, -9.318023681640625, 4.128364562988281, 3.0792160034179688, 1.3310623168945312, 2.286865234375, 6.9083251953125, 5.9285430908203125, -1.682342529296875, -0.1934051513671875, -1.4035110473632812, -2.7043685913085938, 2.6705703735351562, 13.604137420654297, -7.164299011230469, 6.041927337646484, 5.446266174316406, 0.5186309814453125, 4.85078239440918, 3.9752063751220703, 0.5774955749511719, -0.9053535461425781, -1.5617332458496094, -0.977691650390625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000143.npy"}
{"epoch": 0.2161753590325019, "step": 144, "batch_size": 64, "mean": 0.8273676633834839, "std": 4.36815881729126, "min": -7.8934173583984375, "p10": -4.052571868896484, "median": -0.13679885864257812, "p90": 6.022147178649902, "max": 13.429252624511719, "pos_frac": 0.484375, "sample": [5.468723297119141, 3.6715316772460938, -6.842681884765625, -0.40120697021484375, -0.0034942626953125, 2.6925601959228516, -0.8135986328125, 4.96820068359375, 2.8588104248046875, 0.724090576171875, -4.0120849609375, 9.272193908691406, -5.157703399658203, -0.5069732666015625, -1.4623031616210938, -7.8934173583984375, 5.928253173828125, 0.16771697998046875, -3.7350234985351562, 4.485435485839844, 0.05633544921875, -0.5036849975585938, -1.4199905395507812, 2.17608642578125, -3.8830299377441406, -1.7417373657226562, -3.1434478759765625, 4.909862518310547, -1.6285896301269531, -1.6324462890625, 8.937049865722656, -4.2847442626953125, -3.474956512451172, -0.27010345458984375, 2.9469757080078125, -0.9738998413085938, -0.4611339569091797, 7.441856384277344, 9.12152099609375, 5.201435089111328, 4.48468017578125, 0.54473876953125, -6.392436981201172, 3.797393798828125, -4.069923400878906, -1.0341796875, 5.813026428222656, -3.5165672302246094, 4.5883331298828125, -3.2640380859375, -0.3304271697998047, -1.2637710571289062, 0.4613533020019531, 6.062387466430664, -2.73175048828125, 8.264690399169922, 13.429252624511719, 4.16546630859375, -0.5546302795410156, -5.128837585449219, 2.4915847778320312, -1.5332622528076172, 1.2084712982177734, 0.6775875091552734], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000144.npy"}
{"epoch": 0.21768707482993196, "step": 145, "batch_size": 64, "mean": 1.7080283164978027, "std": 4.12888765335083, "min": -10.943557739257812, "p10": -3.192409896850586, "median": 1.7656002044677734, "p90": 7.411745071411133, "max": 11.249580383300781, "pos_frac": 0.71875, "sample": [3.2704429626464844, 2.943572998046875, 1.4389972686767578, 3.0089874267578125, 0.8763580322265625, -10.943557739257812, -1.3991241455078125, -0.10021209716796875, 7.451381683349609, -5.1345367431640625, 1.538522720336914, 1.0110740661621094, 2.040541648864746, 3.4031753540039062, 1.8616561889648438, 0.42169189453125, 1.6318283081054688, 7.3192596435546875, -4.5986175537109375, 4.039863586425781, 1.2496795654296875, 1.7246208190917969, 0.36904144287109375, 8.0450439453125, 5.008827209472656, 1.27838134765625, -0.638580322265625, 1.80657958984375, -1.014007568359375, -3.316730499267578, 7.821952819824219, 2.6155471801757812, 2.2803115844726562, 9.459037780761719, -7.20123291015625, 2.5018606185913086, 4.652004241943359, 1.6467437744140625, 3.5945510864257812, 2.6438751220703125, -3.4364547729492188, 3.6801795959472656, 9.871925354003906, 3.2214412689208984, -2.488523483276367, 1.490081787109375, 1.9588088989257812, 3.2079944610595703, -1.1887741088867188, 9.389007568359375, 3.0264625549316406, 0.25850486755371094, 4.532482147216797, -0.13690948486328125, -0.3886566162109375, 6.9609375, 11.249580383300781, -1.6118698120117188, -7.4497833251953125, 2.5869369506835938, 4.3560333251953125, -2.9023284912109375, -2.47198486328125, 0.9899063110351562], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000145.npy"}
{"epoch": 0.21919879062736206, "step": 146, "batch_size": 64, "mean": 1.20717453956604, "std": 3.2328357696533203, "min": -5.8269500732421875, "p10": -2.819527816772461, "median": 1.065805435180664, "p90": 4.705867195129396, "max": 15.034927368164062, "pos_frac": 0.703125, "sample": [3.6397857666015625, 1.3841438293457031, 0.827239990234375, 3.2668304443359375, 1.0814056396484375, -2.9707183837890625, -0.09044647216796875, 0.6527595520019531, 2.8618030548095703, -0.49369049072265625, -5.379547119140625, -5.8269500732421875, 3.196136474609375, 3.9925765991210938, -2.4962615966796875, -2.8040695190429688, 5.06251335144043, -1.5199203491210938, 5.827392578125, -3.2536392211914062, 2.9339218139648438, 1.0250473022460938, -1.516876220703125, -0.7574539184570312, -2.826152801513672, 1.4846420288085938, 0.4400291442871094, 2.012420654296875, -1.3446502685546875, 0.05263519287109375, 2.5300750732421875, -3.2983779907226562, 6.3597412109375, 2.041839599609375, 2.0905075073242188, 0.7461662292480469, 1.5972137451171875, -0.2891044616699219, -1.9796295166015625, 0.32596588134765625, 4.44403076171875, 0.06258773803710938, 0.030364990234375, 1.86907958984375, 3.1415786743164062, 5.855396270751953, 3.068887710571289, 1.8205432891845703, 3.519245147705078, 0.8937759399414062, 1.7398834228515625, -4.596954345703125, 4.04161262512207, -1.8575592041015625, 0.4279909133911133, -0.7524261474609375, 15.034927368164062, 2.7183837890625, 1.0502052307128906, 1.7804641723632812, 5.229869842529297, 0.9038238525390625, 3.4300765991210938, 4.818082809448242], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000146.npy"}
{"epoch": 0.22071050642479215, "step": 147, "batch_size": 64, "mean": 1.298788070678711, "std": 3.583099842071533, "min": -9.56317138671875, "p10": -2.471157455444336, "median": 1.0730915069580078, "p90": 6.592851638793945, "max": 8.791492462158203, "pos_frac": 0.640625, "sample": [1.7167205810546875, 2.5976181030273438, 6.617027282714844, 0.3137016296386719, -0.8495712280273438, 2.3244667053222656, 0.03845977783203125, -3.692352294921875, -1.5581817626953125, -1.8425788879394531, 0.6271438598632812, 3.589141845703125, 5.5816650390625, 4.5146484375, 0.09014892578125, 2.89495849609375, 8.791492462158203, -9.56317138671875, 5.984596252441406, -0.08750534057617188, 3.662342071533203, 2.512493133544922, -0.3085670471191406, -1.80181884765625, -1.527740478515625, 2.411235809326172, -0.05727386474609375, 1.0437126159667969, -0.22148513793945312, 3.0992355346679688, -1.4244613647460938, 1.3403167724609375, 2.03851318359375, -4.093799591064453, 2.414306640625, 3.5995025634765625, 2.541288375854492, 7.1718597412109375, 8.526565551757812, -1.8621025085449219, 8.427444458007812, 2.3707809448242188, -2.494140625, 2.0013809204101562, 1.2995567321777344, -2.417530059814453, 6.536441802978516, -0.7058639526367188, 8.540679931640625, -2.1425552368164062, -2.5183029174804688, 1.7238807678222656, 0.1385345458984375, 1.1024703979492188, -5.088958740234375, 0.41277313232421875, 0.8468799591064453, -0.8618125915527344, 2.7001953125, -0.111541748046875, -4.002649307250977, 1.2096405029296875, 8.660720825195312, 0.3418617248535156], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000147.npy"}
{"epoch": 0.2222222222222222, "step": 148, "batch_size": 64, "mean": 1.7757010459899902, "std": 3.752013921737671, "min": -5.9293060302734375, "p10": -2.530506992340088, "median": 1.6031761169433594, "p90": 5.825396728515625, "max": 15.546737670898438, "pos_frac": 0.75, "sample": [3.7670135498046875, -2.5822935104370117, 0.36464500427246094, -4.285463333129883, 0.63006591796875, 4.4970245361328125, 3.991119384765625, -3.4515228271484375, 2.84747314453125, 5.807708740234375, -2.3013248443603516, 2.8031463623046875, 3.1046791076660156, 3.1768951416015625, 2.3535003662109375, -4.085182189941406, -2.4096717834472656, 0.830657958984375, 2.7900009155273438, -2.1042709350585938, 1.75445556640625, -2.9431610107421875, 15.546737670898438, 0.44626617431640625, 0.753173828125, 5.832977294921875, 4.233421325683594, 0.31031036376953125, 11.920822143554688, 2.2099533081054688, 1.5751075744628906, 3.167510986328125, 8.62905502319336, -2.3715744018554688, 2.9332809448242188, 6.887229919433594, 0.3169689178466797, 1.4491233825683594, -5.9293060302734375, -1.4140071868896484, 4.4280548095703125, -2.1600189208984375, 3.7060928344726562, 6.373130798339844, -4.372642517089844, 1.812398910522461, 3.7283477783203125, 1.6312446594238281, 5.056098937988281, -0.42751312255859375, 3.0137691497802734, 0.4076080322265625, 1.1648120880126953, 0.09084320068359375, 0.6051006317138672, 3.9098663330078125, 3.82000732421875, 0.7133865356445312, 1.0647163391113281, 3.1582717895507812, 0.5817317962646484, 7.47357177734375, -2.2896041870117188, -0.8969497680664062], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000148.npy"}
|
||||
{"epoch": 0.2237339380196523, "step": 149, "batch_size": 64, "mean": 1.1741925477981567, "std": 2.8650052547454834, "min": -5.271270751953125, "p10": -2.486617851257324, "median": 1.1275243759155273, "p90": 5.161753082275391, "max": 7.64051628112793, "pos_frac": 0.6875, "sample": [3.2101898193359375, 5.083518981933594, 2.526294708251953, -1.0123367309570312, -2.493408203125, 0.6910324096679688, -1.4266738891601562, 1.1331005096435547, -3.05224609375, 5.786731719970703, -5.031333923339844, 0.4618492126464844, 2.035614013671875, 0.9732398986816406, -0.9039764404296875, -1.3578071594238281, 0.3585472106933594, 1.1219482421875, 1.3491649627685547, 2.5211181640625, 2.8587703704833984, 1.92584228515625, 5.321563720703125, -2.2617568969726562, 7.64051628112793, -0.9097633361816406, 0.852142333984375, 0.7212486267089844, 1.6498641967773438, -5.097759246826172, 1.4199962615966797, 1.9605865478515625, 5.195281982421875, 7.5032806396484375, 2.764741897583008, 6.2844696044921875, 4.946002960205078, -3.1237411499023438, 1.5595855712890625, -3.4205398559570312, 2.0455474853515625, 5.43109130859375, -0.9004859924316406, 2.024017333984375, 4.3837738037109375, 2.3417129516601562, 0.8538665771484375, -2.470773696899414, -0.34778690338134766, 0.7671585083007812, -0.35977935791015625, -0.6907119750976562, -0.193939208984375, -5.271270751953125, -0.521240234375, 4.0697479248046875, 1.5500335693359375, 3.68011474609375, 1.276275634765625, 3.2917041778564453, 0.41872406005859375, 3.17962646484375, 0.189453125, 0.636566162109375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000149.npy"}
|
||||
{"epoch": 0.2252456538170824, "step": 150, "batch_size": 64, "mean": 1.7189325094223022, "std": 3.523047685623169, "min": -4.6743927001953125, "p10": -2.7517925262451173, "median": 1.69561767578125, "p90": 5.731843566894532, "max": 12.65997314453125, "pos_frac": 0.65625, "sample": [1.0301589965820312, 0.4150543212890625, 7.102264404296875, 1.2187957763671875, -4.286956787109375, 1.6872100830078125, 0.6945085525512695, 1.7040252685546875, -2.45965576171875, 2.049468994140625, -1.2475204467773438, 4.1009063720703125, -4.6743927001953125, -1.7580184936523438, -0.6253337860107422, 2.9881362915039062, -0.37172698974609375, 7.4008331298828125, -3.6815643310546875, -1.9728164672851562, 5.7061767578125, -0.8838272094726562, 7.3589019775390625, -4.444793701171875, 0.046905517578125, 3.003887176513672, 2.205768585205078, 3.2127227783203125, 0.9481048583984375, 6.178535461425781, 1.3055686950683594, 2.55694580078125, 3.904693603515625, 3.417694091796875, 3.2939224243164062, -3.7126617431640625, -0.30692291259765625, 4.652929306030273, 5.311408996582031, 3.8003692626953125, 1.530059814453125, 4.279170989990234, -2.6726531982421875, 2.2232208251953125, 8.992450714111328, -1.97467041015625, 2.637972831726074, 3.5444908142089844, -2.7857093811035156, 12.65997314453125, 4.03887939453125, -0.6663360595703125, 5.7428436279296875, -3.5521087646484375, 1.2100067138671875, 4.707069396972656, -1.851470947265625, -0.2883453369140625, 4.1705322265625, -0.054351806640625, 2.047100067138672, 5.2525787353515625, -1.50164794921875, 5.452919006347656], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000150.npy"}
|
||||
{"epoch": 0.22675736961451248, "step": 151, "batch_size": 64, "mean": 2.0277981758117676, "std": 3.1713638305664062, "min": -4.4266815185546875, "p10": -1.5546545028686523, "median": 1.9021854400634766, "p90": 5.199068450927735, "max": 12.18438720703125, "pos_frac": 0.75, "sample": [6.129119873046875, 4.491889953613281, 3.6885948181152344, 0.11001205444335938, 2.056436538696289, 1.0213508605957031, 12.18438720703125, -0.4537696838378906, 1.0846099853515625, 2.212249755859375, 4.574344635009766, -0.17919921875, 2.695220947265625, -3.7850494384765625, 3.574922561645508, 1.0414276123046875, 4.207424163818359, -0.9578208923339844, 2.2194290161132812, 0.8938522338867188, -1.2398681640625, 0.3761749267578125, 2.58148193359375, 1.0144577026367188, -0.5457916259765625, -3.333587646484375, 1.747934341430664, 2.3136749267578125, -2.0097808837890625, -2.983673095703125, 2.684366226196289, -1.4393234252929688, -0.35610198974609375, 3.4211273193359375, 3.3797225952148438, -1.243438720703125, 0.9274864196777344, 0.40927886962890625, 1.57098388671875, 11.581756591796875, 1.4820899963378906, 3.4119415283203125, 3.3136520385742188, 4.023714065551758, 9.026702880859375, 6.247840881347656, -0.19818115234375, 2.465087890625, 5.246833801269531, 2.561981201171875, 1.2967147827148438, 1.2060775756835938, -1.7531871795654297, 2.9396209716796875, -4.4266815185546875, 3.2436141967773438, 3.3554763793945312, 5.087615966796875, -1.6040821075439453, 4.62579345703125, 0.2591705322265625, 7.4431304931640625, 0.7262687683105469, 4.131568908691406], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000151.npy"}
|
||||
{"epoch": 0.22826908541194255, "step": 152, "batch_size": 64, "mean": 1.5727684497833252, "std": 3.167281150817871, "min": -4.01373291015625, "p10": -1.653987121582031, "median": 1.4621410369873047, "p90": 4.0686996459960945, "max": 14.392974853515625, "pos_frac": 0.671875, "sample": [1.1649971008300781, -0.5577239990234375, 2.1951656341552734, 3.99639892578125, -3.7853240966796875, 1.6922454833984375, 3.5463714599609375, 2.7670822143554688, 3.568756103515625, -1.266021728515625, 3.4054031372070312, 0.170166015625, 1.667266845703125, -4.01373291015625, 5.3269805908203125, 2.8105316162109375, 0.060832977294921875, 1.4515151977539062, -1.7251739501953125, -1.1952743530273438, -2.9338912963867188, 4.409515380859375, 2.0321807861328125, -1.3182754516601562, -0.8325653076171875, 1.0696334838867188, 2.706787109375, 2.2682876586914062, 1.4269332885742188, 2.42169189453125, 1.057607650756836, 3.9468612670898438, 1.103811264038086, 0.4766197204589844, -0.4198760986328125, -2.0357666015625, 0.4043769836425781, -0.30764007568359375, 2.0176925659179688, -1.487884521484375, -0.5174484252929688, 3.974273681640625, 2.1817474365234375, -0.0303497314453125, 9.855758666992188, 3.930126190185547, 2.015941619873047, 1.4727668762207031, 0.5133514404296875, 1.885833740234375, -2.4603195190429688, 2.0043563842773438, -0.3456230163574219, 11.34271240234375, 3.1787548065185547, 3.4745025634765625, 4.3834686279296875, -0.4136810302734375, 3.160552978515625, 14.392974853515625, -0.9847412109375, -2.839488983154297, -0.9045333862304688, 4.0996856689453125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000152.npy"}
|
||||
{"epoch": 0.22978080120937264, "step": 153, "batch_size": 64, "mean": 1.7176457643508911, "std": 2.977780342102051, "min": -4.8744354248046875, "p10": -1.4358123779296874, "median": 0.8396515846252441, "p90": 6.078665161132814, "max": 10.4739990234375, "pos_frac": 0.734375, "sample": [2.3109054565429688, 0.15070343017578125, -1.5200424194335938, 6.7370452880859375, 5.048088073730469, 5.235553741455078, -1.3240966796875, -0.17492294311523438, 0.6568260192871094, -0.0833740234375, 0.3101825714111328, 3.8378124237060547, 0.7388153076171875, 9.491241455078125, 1.2969894409179688, 5.8394927978515625, 0.37410736083984375, -2.400634765625, 2.0582504272460938, 5.5263671875, 3.2713699340820312, 1.8019561767578125, -2.1359100341796875, -1.9703521728515625, 7.221534729003906, -1.47381591796875, 3.0947608947753906, 0.14434242248535156, 0.7936019897460938, 1.2231407165527344, 0.31626129150390625, 2.8518943786621094, -1.9179763793945312, -0.5828399658203125, 3.3711776733398438, 1.8973770141601562, -4.8744354248046875, 4.3827667236328125, 10.4739990234375, 1.6988296508789062, -1.219696044921875, -0.8491172790527344, 4.659034729003906, -1.347137451171875, 7.012031555175781, 0.185882568359375, 6.4057464599609375, 0.435943603515625, 6.1811676025390625, -0.11128520965576172, 0.0079803466796875, 0.9461326599121094, 0.4613990783691406, -0.6516571044921875, 3.0495262145996094, 0.5493278503417969, -1.2040328979492188, 2.7658004760742188, 1.066162109375, 0.3854522705078125, 2.9324073791503906, 0.8857011795043945, 0.63067626953125, 3.0548934936523438], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000153.npy"}
|
||||
{"epoch": 0.23129251700680273, "step": 154, "batch_size": 64, "mean": 0.9782245755195618, "std": 2.802844285964966, "min": -7.2576751708984375, "p10": -2.3475009918212892, "median": 0.9542036056518555, "p90": 4.525572586059571, "max": 7.693046569824219, "pos_frac": 0.65625, "sample": [-4.669563293457031, -2.3604202270507812, -2.1043548583984375, -2.073974609375, 4.9127349853515625, 1.2463455200195312, -2.5406494140625, -0.5315380096435547, -2.87225341796875, -7.2576751708984375, 0.802886962890625, -0.09604740142822266, 2.217283248901367, 4.588920593261719, -0.2630157470703125, -2.3173561096191406, -1.560150146484375, 1.2886486053466797, 1.0152034759521484, -1.6815185546875, 0.3452606201171875, 7.693046569824219, 4.585420608520508, 1.7702560424804688, -3.1807022094726562, 7.170856475830078, 2.0350418090820312, -3.07440185546875, 1.40093994140625, -0.9405364990234375, 3.060199737548828, 1.0250530242919922, 3.2148590087890625, 1.5790176391601562, 2.5464534759521484, -1.8810348510742188, 0.6455535888671875, 3.3383350372314453, 1.5063371658325195, -0.2414226531982422, 1.0437164306640625, -1.0404586791992188, 1.7042675018310547, -1.6427230834960938, 0.33715057373046875, 4.385927200317383, 5.442897796630859, 0.8447151184082031, -0.6473197937011719, -0.7001266479492188, 4.078857421875, 0.18546295166015625, 0.25765228271484375, 2.5977096557617188, 3.856353759765625, 4.1735382080078125, 3.421100616455078, 0.16825103759765625, 3.9917984008789062, 5.585113525390625, 0.8932037353515625, 1.973423957824707, 0.5311450958251953, 2.8226776123046875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000154.npy"}
|
||||
{"epoch": 0.2328042328042328, "step": 155, "batch_size": 64, "mean": 1.4344534873962402, "std": 3.0010762214660645, "min": -5.469089508056641, "p10": -1.5328741073608396, "median": 1.1882572174072266, "p90": 5.06810646057129, "max": 9.231437683105469, "pos_frac": 0.6875, "sample": [-0.2414398193359375, 0.71588134765625, 3.436279296875, 0.6590576171875, -0.28014373779296875, 1.56475830078125, 0.7295036315917969, 4.025917053222656, -4.169122695922852, 5.10626220703125, 4.8192291259765625, 3.9077138900756836, -1.3086700439453125, 2.6484756469726562, -0.8189964294433594, 0.01917266845703125, 0.1962127685546875, 2.5945892333984375, 2.5291900634765625, 5.176034927368164, 6.290824890136719, 0.5255126953125, 2.389373779296875, 0.32201194763183594, 2.981536865234375, 6.175510406494141, -3.8298797607421875, 0.6070709228515625, -0.9614372253417969, 4.979076385498047, -0.8296051025390625, 0.8014717102050781, 8.819705963134766, 4.230430603027344, 5.1669769287109375, 2.5284576416015625, -0.8202972412109375, 3.482330322265625, -5.265380859375, 3.1977081298828125, -1.206787109375, 2.6422500610351562, 1.642303466796875, -0.17882919311523438, 9.231437683105469, 1.8342208862304688, -0.86688232421875, 1.1595077514648438, -4.7888641357421875, 2.7254791259765625, 3.820709228515625, -2.369781494140625, 0.6623516082763672, 0.185577392578125, 1.2170066833496094, -0.7433929443359375, 1.6340599060058594, -1.6289615631103516, -5.469089508056641, -0.23964309692382812, 4.48870849609375, 3.704833984375, 2.501932144165039, -0.25443267822265625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000155.npy"}
|
||||
{"epoch": 0.23431594860166288, "step": 156, "batch_size": 64, "mean": 1.6623612642288208, "std": 2.7121798992156982, "min": -3.7356109619140625, "p10": -1.9242755889892575, "median": 1.2754707336425781, "p90": 4.975857925415039, "max": 9.899200439453125, "pos_frac": 0.734375, "sample": [0.829071044921875, -1.5901985168457031, -0.230072021484375, -2.4062767028808594, 9.899200439453125, 2.4548492431640625, 6.181514739990234, -0.28598785400390625, 0.20267486572265625, 5.079006195068359, 0.9647445678710938, 4.831157684326172, -2.9035415649414062, 0.3349609375, -2.404693603515625, 3.2493553161621094, 2.6124114990234375, 3.9597244262695312, 0.39995574951171875, 3.0981483459472656, -0.00351715087890625, 0.10937881469726562, 2.0029525756835938, 3.2371444702148438, 3.8989410400390625, 1.1829986572265625, 0.2833137512207031, 3.8770904541015625, 1.516265869140625, 1.74652099609375, -1.0817108154296875, 2.94464111328125, 2.592041015625, -1.2815093994140625, 5.037872314453125, 7.644996643066406, 0.9613800048828125, 1.3679428100585938, -1.0154495239257812, 4.0962371826171875, 2.6246185302734375, 1.6778106689453125, -0.8191757202148438, -0.4390716552734375, 2.652759552001953, 0.5809478759765625, -0.46860504150390625, -2.1570053100585938, 0.7467498779296875, 3.539031982421875, 0.9666080474853516, 6.843048095703125, 2.2647228240966797, -3.7356109619140625, 4.415016174316406, 0.7394332885742188, -2.9361114501953125, 3.80218505859375, 4.634490966796875, 0.8476219177246094, 2.40802001953125, 5.991607666015625, 0.8859405517578125, -2.0674514770507812], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000156.npy"}
|
||||
{"epoch": 0.23582766439909297, "step": 157, "batch_size": 64, "mean": 1.0453565120697021, "std": 2.6896650791168213, "min": -4.7398834228515625, "p10": -1.58603515625, "median": 0.5053176879882812, "p90": 4.4459888458251955, "max": 9.623191833496094, "pos_frac": 0.59375, "sample": [6.351848602294922, 5.85584831237793, 1.7119407653808594, 1.3611183166503906, 1.0360946655273438, -3.3381805419921875, -0.19854736328125, -0.3953704833984375, -1.3848724365234375, -0.8002510070800781, -1.1443367004394531, -0.7403488159179688, 2.613555908203125, 0.1383838653564453, -3.5482330322265625, -0.1927356719970703, -0.79364013671875, 0.12593841552734375, 4.004524230957031, 2.9772281646728516, 0.7771072387695312, 1.5400619506835938, 9.623191833496094, 1.9912834167480469, 0.40154266357421875, 2.2375564575195312, -2.655498504638672, -1.3274116516113281, -0.42938232421875, -0.9784355163574219, 2.7843494415283203, -0.9280471801757812, 1.0752649307250977, 2.005718231201172, 2.9608535766601562, 6.365089416503906, -0.173126220703125, 4.360301971435547, 2.8856048583984375, -0.247406005859375, 3.495861053466797, 4.4827117919921875, -1.7364692687988281, 7.6418609619140625, -1.6418914794921875, -2.043985366821289, -0.7224502563476562, 2.459238052368164, 0.23273468017578125, 0.5667877197265625, 5.058990478515625, -0.7960853576660156, -0.6826858520507812, 1.7390403747558594, 1.9270715713500977, 3.9950942993164062, 0.24523544311523438, 0.972564697265625, -4.7398834228515625, 1.3370246887207031, 1.09442138671875, -0.8790969848632812, -1.4557037353515625, 0.44384765625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000157.npy"}
|
||||
{"epoch": 0.23733938019652306, "step": 158, "batch_size": 64, "mean": 1.24860417842865, "std": 2.8289546966552734, "min": -5.3876953125, "p10": -1.7892168045043944, "median": 1.47705078125, "p90": 4.801404762268066, "max": 8.92425537109375, "pos_frac": 0.640625, "sample": [1.0715789794921875, -0.5960845947265625, 2.383331298828125, 2.244701385498047, -2.030853271484375, 2.6132965087890625, 6.8475341796875, 3.801177978515625, 1.9749298095703125, -0.74981689453125, 2.502452850341797, 1.8877639770507812, 3.3718414306640625, 1.8987159729003906, -2.6052780151367188, 1.5267257690429688, 8.92425537109375, 0.6149635314941406, -0.5154294967651367, 6.421909332275391, -1.6692161560058594, -1.06842041015625, 1.4273757934570312, 0.7265052795410156, 4.754705429077148, 0.7219314575195312, 0.713409423828125, 2.4352264404296875, 2.275726318359375, -1.6427993774414062, -4.449615478515625, 0.11714935302734375, 1.6189346313476562, -1.3923797607421875, 3.704191207885742, 5.93968391418457, 4.821418762207031, -0.75543212890625, -0.7863216400146484, -3.205047607421875, 2.10125732421875, 3.0413055419921875, 3.6241455078125, 6.94482421875, -1.829864501953125, 2.3719558715820312, -1.6943721771240234, -0.5521087646484375, 6.64093017578125, 0.7518539428710938, 1.5872840881347656, 2.4738616943359375, -1.3925399780273438, -1.1776924133300781, -0.22227096557617188, -1.4225902557373047, -0.08231163024902344, 4.533241271972656, 2.280029296875, 0.014461517333984375, -2.5723838806152344, 1.6694812774658203, 2.3351268768310547, -5.3876953125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000158.npy"}
|
||||
{"epoch": 0.23885109599395313, "step": 159, "batch_size": 64, "mean": 1.5027427673339844, "std": 2.8406262397766113, "min": -8.175796508789062, "p10": -1.007172393798828, "median": 1.4000587463378906, "p90": 3.8013930320739746, "max": 16.40728759765625, "pos_frac": 0.765625, "sample": [3.3912734985351562, 1.5973262786865234, 1.275848388671875, 4.794639587402344, 1.5645370483398438, -1.7254714965820312, 16.40728759765625, 3.7968568801879883, 0.3228797912597656, 1.59442138671875, 2.322998046875, -0.670166015625, 3.334667205810547, -0.460906982421875, -0.27820587158203125, 1.3499526977539062, 0.8756866455078125, 7.05963134765625, 1.1999053955078125, 1.5611991882324219, 1.6391067504882812, -1.193817138671875, 1.0632200241088867, 1.6064605712890625, 3.2153778076171875, 0.0774383544921875, 0.81640625, -0.13623046875, 2.7765655517578125, 3.94232177734375, -1.3938560485839844, 4.051361083984375, 2.9125137329101562, -1.156982421875, 0.8108749389648438, 1.7276458740234375, 1.5495948791503906, 1.80596923828125, 0.8550872802734375, -0.07354736328125, 1.8530731201171875, 0.97735595703125, 5.076133728027344, 0.24730300903320312, 1.450164794921875, 1.568166732788086, -0.5306282043457031, 3.130401611328125, -1.0847702026367188, 0.9253311157226562, 0.15460968017578125, 3.5811595916748047, 1.4676895141601562, 3.4313125610351562, 0.03675270080566406, -8.175796508789062, 2.6316394805908203, -1.656524658203125, -0.31842041015625, 0.719390869140625, 0.23154449462890625, -0.82611083984375, 3.272552490234375, 3.8033370971679688], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000159.npy"}
|
||||
{"epoch": 0.24036281179138322, "step": 160, "batch_size": 64, "mean": 0.9330782890319824, "std": 2.3317532539367676, "min": -6.390737533569336, "p10": -1.625818634033203, "median": 0.6492972373962402, "p90": 3.3252342224121096, "max": 7.2457122802734375, "pos_frac": 0.71875, "sample": [0.30808067321777344, -0.15831756591796875, 3.270904541015625, 2.882904052734375, -0.8162574768066406, 2.558746337890625, 0.6367950439453125, 0.5425491333007812, 2.736053466796875, -0.203826904296875, 4.8914337158203125, -1.7420692443847656, 3.9189891815185547, 1.6819305419921875, 1.415140151977539, 0.22922515869140625, 1.0828742980957031, -4.1386871337890625, 1.9240169525146484, -6.390737533569336, 0.240203857421875, -1.0521278381347656, 0.956207275390625, -0.8532676696777344, 2.5329513549804688, 1.3549232482910156, 1.5795097351074219, 0.296142578125, 0.7330398559570312, 0.6549625396728516, 2.2840919494628906, 1.5489540100097656, 7.2457122802734375, 6.321937561035156, 0.6238994598388672, 0.6436319351196289, 1.89288330078125, 0.3906135559082031, 1.5517730712890625, 1.9958992004394531, -0.13513946533203125, 1.8689346313476562, -1.4599494934082031, -1.64190673828125, 0.0025234222412109375, 1.379180908203125, -0.6194076538085938, -2.2107620239257812, 0.5623970031738281, 0.13676071166992188, 5.92523193359375, -0.028321266174316406, 0.6309814453125, 2.878061294555664, 2.2362060546875, -0.8914642333984375, -1.5882797241210938, 2.8166542053222656, -3.7362442016601562, 3.3485183715820312, 2.3332862854003906, 4.641635894775391, 0.020660400390625, -2.3242340087890625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000160.npy"}
|
||||
{"epoch": 0.2418745275888133, "step": 161, "batch_size": 64, "mean": 0.9137061834335327, "std": 2.4368038177490234, "min": -7.666648864746094, "p10": -1.3862470626831054, "median": 0.6746883392333984, "p90": 4.168015480041504, "max": 7.011322021484375, "pos_frac": 0.640625, "sample": [3.6001739501953125, -5.36663818359375, -1.9118881225585938, 0.883941650390625, -1.4028472900390625, 0.467193603515625, -1.09771728515625, 4.07012939453125, -0.4445075988769531, 0.005535125732421875, -1.732635498046875, 0.681121826171875, 1.7486858367919922, 0.8694915771484375, 1.6292343139648438, -0.1628570556640625, 1.9896183013916016, -7.666648864746094, 1.3355712890625, -2.0787811279296875, -0.6774368286132812, 4.4560699462890625, 3.4534530639648438, 0.1082916259765625, 2.8292999267578125, -0.5354881286621094, 0.19141006469726562, 0.4388694763183594, -0.5002365112304688, 2.7769088745117188, 2.4314041137695312, 0.9494743347167969, 1.4062747955322266, -0.283203125, 4.741981506347656, -2.5849037170410156, -0.2646007537841797, 2.44049072265625, 1.6051483154296875, 4.295928955078125, 0.4644203186035156, 2.6086044311523438, -0.2827129364013672, -0.8340187072753906, 1.8210830688476562, 0.6682548522949219, -0.3919677734375, 4.209966659545898, 2.7446365356445312, 1.7839126586914062, -0.23326492309570312, 1.1524543762207031, 4.8666534423828125, 1.4411754608154297, -0.35260772705078125, -0.8429937362670898, 0.0208892822265625, 0.2938690185546875, 4.069759368896484, 5.52606201171875, 7.011322021484375, -1.2088394165039062, -1.347513198852539, 2.592742919921875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000161.npy"}
|
||||
{"epoch": 0.24338624338624337, "step": 162, "batch_size": 64, "mean": 1.5212857723236084, "std": 2.2487497329711914, "min": -3.880218505859375, "p10": -0.8202709197998046, "median": 1.4573516845703125, "p90": 3.687236785888672, "max": 11.517440795898438, "pos_frac": 0.765625, "sample": [3.7238006591796875, 3.0934295654296875, -3.880218505859375, 1.7257156372070312, 1.1080780029296875, 1.2724151611328125, 1.1904563903808594, 1.409149169921875, 0.958953857421875, 3.314859390258789, 1.8024482727050781, -0.5052471160888672, -0.7582588195800781, 11.517440795898438, 1.9384078979492188, 2.696758270263672, 1.50555419921875, 2.5153884887695312, 0.662261962890625, 3.1846389770507812, 2.619476318359375, 0.7494277954101562, 1.5374984741210938, -1.493377685546875, 3.0754013061523438, 1.6466598510742188, 1.040740966796875, 1.0768508911132812, -0.8468475341796875, 3.526214599609375, 1.551645278930664, 0.13604736328125, -2.3084869384765625, 3.422391891479492, 0.4605712890625, 2.4363174438476562, 5.365837097167969, 1.7099571228027344, 0.03607368469238281, -1.6248664855957031, -0.48218536376953125, 0.5146102905273438, 5.649505615234375, 0.40663909912109375, 2.3876113891601562, -0.6737442016601562, -1.8811912536621094, -0.0187835693359375, -0.42987060546875, -1.44915771484375, -0.4578819274902344, 2.110015869140625, 3.2826080322265625, 1.9614276885986328, 3.1786117553710938, 0.7974014282226562, 2.1329727172851562, -0.41207122802734375, 3.663604736328125, 1.239248275756836, 0.9932117462158203, 4.060184478759766, 3.6973648071289062, 4.49859619140625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000162.npy"}
|
||||
{"epoch": 0.24489795918367346, "step": 163, "batch_size": 64, "mean": 1.3464242219924927, "std": 2.167583703994751, "min": -3.4660606384277344, "p10": -1.2337783813476562, "median": 1.036562442779541, "p90": 4.022397613525392, "max": 7.2899017333984375, "pos_frac": 0.71875, "sample": [-1.6602191925048828, 0.6545639038085938, 0.8052520751953125, 4.136543273925781, 0.8062362670898438, 1.32757568359375, -3.4660606384277344, 0.724456787109375, 7.2899017333984375, -0.84771728515625, 1.0194778442382812, 4.6091461181640625, 3.5836334228515625, 1.2329330444335938, 4.48443603515625, -0.6331634521484375, 6.461544036865234, 1.6553115844726562, 0.6530609130859375, 5.337066650390625, 1.7131576538085938, 1.144989013671875, 2.959697723388672, -0.46526336669921875, -1.6948432922363281, 3.5975494384765625, 0.8020057678222656, 1.492889404296875, 0.5994796752929688, -0.5376052856445312, 0.9019193649291992, -0.6651153564453125, 2.342803955078125, 1.8023319244384766, 3.4453277587890625, 2.8744583129882812, -1.3207969665527344, 3.080108642578125, 0.050930023193359375, 0.7339982986450195, 0.044219970703125, 3.5264968872070312, -1.45831298828125, 2.1572341918945312, -0.29010009765625, -1.0307350158691406, -0.0974884033203125, 0.09056854248046875, 1.0536470413208008, -0.438262939453125, -1.863250732421875, 2.619792938232422, 2.86212158203125, 0.4572935104370117, -0.718780517578125, -2.4259796142578125, 5.450111389160156, 1.6591110229492188, 3.7560577392578125, 3.2607812881469727, 2.9525985717773438, -0.556182861328125, 1.6273136138916016, 2.5008926391601562], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000163.npy"}
|
||||
{"epoch": 0.24640967498110355, "step": 164, "batch_size": 64, "mean": 1.6347506046295166, "std": 2.8143579959869385, "min": -6.6300201416015625, "p10": -2.245519256591797, "median": 1.6666646003723145, "p90": 5.439218902587891, "max": 7.35760498046875, "pos_frac": 0.71875, "sample": [-2.5236663818359375, -0.6894588470458984, -0.31048583984375, -0.2632026672363281, 4.529899597167969, 2.3885421752929688, 6.2637481689453125, 2.6915969848632812, 1.48944091796875, -0.3291778564453125, 0.075347900390625, -2.2403335571289062, 1.9847946166992188, -0.8789653778076172, 2.4845237731933594, 2.271697998046875, 2.6099281311035156, 0.9612102508544922, 7.35760498046875, -0.4293785095214844, 0.4036092758178711, -0.8303604125976562, 2.7186927795410156, 2.558746337890625, 0.04441261291503906, -0.5639476776123047, 2.4921836853027344, 3.695079803466797, 1.4097137451171875, 0.6042594909667969, 2.2494630813598633, 3.2770118713378906, 0.38069915771484375, 1.8128585815429688, 4.3175201416015625, -3.2375946044921875, 2.898529052734375, 6.180671691894531, 5.41796875, -6.6300201416015625, 0.38347816467285156, 2.2323150634765625, 0.9305038452148438, 4.217674255371094, 0.6851654052734375, 1.164541244506836, 3.4287338256835938, -0.267608642578125, 6.522247314453125, 4.3782501220703125, 6.37347412109375, -3.1794166564941406, 5.448326110839844, -3.1907005310058594, -0.21279144287109375, -2.24774169921875, 1.9899444580078125, 6.176582336425781, 5.142822265625, -2.6125144958496094, 5.048175811767578, 2.72320556640625, 1.5204706192016602, 1.3257369995117188], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000164.npy"}
|
||||
{"epoch": 0.24792139077853365, "step": 165, "batch_size": 64, "mean": 1.5601141452789307, "std": 2.4910218715667725, "min": -3.0314559936523438, "p10": -1.0419200897216796, "median": 1.2044563293457031, "p90": 4.363200759887696, "max": 10.953102111816406, "pos_frac": 0.703125, "sample": [6.3607635498046875, 2.3119277954101562, 3.3140697479248047, 2.00909423828125, 0.6603584289550781, 1.133331298828125, -0.15464401245117188, -0.6834869384765625, 2.2636871337890625, 1.512786865234375, 1.5484619140625, 1.0367202758789062, 2.6198577880859375, 0.4671211242675781, -0.150482177734375, 1.7605133056640625, -0.7404403686523438, 2.673381805419922, -1.6023139953613281, -0.471343994140625, -1.3234710693359375, -1.1475982666015625, 4.576271057128906, -0.8824596405029297, 3.5508575439453125, 4.465564727783203, 0.6635055541992188, 0.3202171325683594, 2.769174575805664, 3.6273117065429688, 0.543365478515625, -0.3745460510253906, 7.7710723876953125, 0.7113418579101562, 2.5510520935058594, -3.0314559936523438, -0.225128173828125, 1.9646110534667969, 4.831996917724609, 1.5432052612304688, 0.49303436279296875, -0.9961967468261719, -0.025299072265625, 1.3721084594726562, 3.6681995391845703, -1.31304931640625, 1.2621269226074219, 1.7952232360839844, 0.5386848449707031, -0.5311050415039062, 8.959922790527344, 1.8008193969726562, -1.0615158081054688, 1.1746559143066406, 1.2342567443847656, -1.1339035034179688, 2.211181640625, 0.8888015747070312, 3.3254222869873047, 0.7520980834960938, 4.124351501464844, 2.3490982055664062, 10.953102111816406, -0.7689666748046875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000165.npy"}
|
||||
{"epoch": 0.2494331065759637, "step": 166, "batch_size": 64, "mean": 0.6258450746536255, "std": 2.206613779067993, "min": -6.329799652099609, "p10": -1.3487861633300782, "median": 0.41466426849365234, "p90": 3.4618309020996096, "max": 5.8294677734375, "pos_frac": 0.546875, "sample": [5.8294677734375, -0.06634902954101562, 1.7440185546875, 3.4739456176757812, 2.3402175903320312, 0.0305633544921875, -1.0458984375, 5.768100738525391, -0.73870849609375, -0.479400634765625, 0.7417526245117188, 1.4342594146728516, 1.2903404235839844, -3.0980224609375, 2.797576904296875, -0.6534576416015625, -1.3650894165039062, 2.0037879943847656, -1.1299591064453125, 0.48291778564453125, -0.8855228424072266, 1.8782157897949219, -6.329799652099609, 1.2869186401367188, 3.7813034057617188, -0.036067962646484375, -0.5165252685546875, 1.123931884765625, 2.1112518310546875, 1.9111480712890625, -0.463409423828125, -1.8003005981445312, -0.2651538848876953, -2.520751953125, 3.2849273681640625, 0.6827316284179688, 4.2789764404296875, -2.5829620361328125, 0.9330940246582031, 1.0906295776367188, -0.7950897216796875, 3.433563232421875, -0.9225540161132812, -0.658233642578125, 1.111358642578125, -2.749908447265625, -1.3107452392578125, -0.5813217163085938, 0.34641075134277344, 5.392467498779297, 2.395397186279297, -0.6068649291992188, -0.6305084228515625, 0.7750568389892578, -1.0194721221923828, 1.3162803649902344, 2.1639328002929688, -0.5845088958740234, 4.9596405029296875, -0.8750076293945312, 1.9343338012695312, 1.082366943359375, -0.59613037109375, 0.1509246826171875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000166.npy"}
|
||||
{"epoch": 0.2509448223733938, "step": 167, "batch_size": 64, "mean": 1.1649138927459717, "std": 1.9014073610305786, "min": -3.2288665771484375, "p10": -1.0225677490234373, "median": 1.0000343322753906, "p90": 3.100819396972657, "max": 6.942033767700195, "pos_frac": 0.734375, "sample": [1.0334548950195312, 1.4858474731445312, 2.8695602416992188, 2.0860824584960938, -0.77166748046875, 2.455596923828125, 3.2972030639648438, -1.130096435546875, -0.09918212890625, 1.928985595703125, 1.0462150573730469, 0.8223342895507812, 1.6072731018066406, -0.3101806640625, -0.4102916717529297, -0.450347900390625, 1.82122802734375, 3.1802520751953125, 1.089111328125, -3.2288665771484375, -2.022930145263672, 0.29108428955078125, 0.3289756774902344, 1.6666336059570312, 0.36248207092285156, -2.9760894775390625, 0.9380111694335938, -0.43846988677978516, -0.1329193115234375, 3.1624755859375, 1.8981399536132812, 2.9175167083740234, -1.1955928802490234, 0.8682174682617188, 0.57391357421875, 2.3703765869140625, 1.6696853637695312, 1.6862030029296875, 6.076934814453125, 2.9569549560546875, -0.02531147003173828, 3.572887420654297, -1.9196853637695312, 1.4304542541503906, 6.677436828613281, -0.023162841796875, -0.13840484619140625, 2.7247772216796875, 0.09593963623046875, 0.75372314453125, 2.021190643310547, 0.3636665344238281, 1.4985313415527344, 6.942033767700195, 2.5305538177490234, 2.1036605834960938, 0.6582717895507812, 2.0634002685546875, 0.9526481628417969, 2.5780792236328125, 0.96661376953125, 0.3227386474609375, -1.1708831787109375, 0.25121498107910156], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000167.npy"}
|
||||
{"epoch": 0.25245653817082386, "step": 168, "batch_size": 64, "mean": 0.9844681024551392, "std": 1.8434053659439087, "min": -3.4509429931640625, "p10": -1.1984531402587888, "median": 0.8970355987548828, "p90": 3.1070892333984377, "max": 7.342132568359375, "pos_frac": 0.734375, "sample": [1.6108551025390625, -0.2577362060546875, 2.6458473205566406, 0.48419952392578125, -0.08665847778320312, 3.0967578887939453, 0.36480712890625, -1.0283050537109375, -2.1201019287109375, -1.4810333251953125, 1.3186416625976562, -0.5548191070556641, -1.2756195068359375, 3.1115169525146484, 4.708251953125, 0.6173934936523438, 2.0584335327148438, 2.9000473022460938, 0.0725250244140625, 2.0796356201171875, 0.463653564453125, 0.9758987426757812, 0.7995681762695312, 0.9792098999023438, -0.40986061096191406, 0.9476890563964844, 1.5256690979003906, 2.5370559692382812, -3.4509429931640625, 0.7129898071289062, 1.4658203125, 0.8463821411132812, -0.9689178466796875, 0.26630401611328125, 1.475229263305664, -2.5016021728515625, 7.342132568359375, 1.8775672912597656, -1.0313758850097656, -1.2700576782226562, 1.4434947967529297, 4.7135467529296875, 0.25675201416015625, 3.460979461669922, 2.9288291931152344, -2.6124954223632812, 1.5572509765625, 1.3063507080078125, 1.2035903930664062, -0.19794464111328125, 3.871124267578125, 1.1534957885742188, 0.6577987670898438, -0.31446075439453125, 0.36202239990234375, 3.1159591674804688, 0.2520751953125, -0.003177642822265625, 1.1119613647460938, 0.7858619689941406, 1.4293785095214844, 2.2483444213867188, 2.913055419921875, 0.5151138305664062], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000168.npy"}
{"epoch": 0.25396825396825395, "step": 169, "batch_size": 64, "mean": 1.2143940925598145, "std": 2.2369728088378906, "min": -3.7971343994140625, "p10": -1.1596174240112305, "median": 0.9108619689941406, "p90": 4.932144927978516, "max": 6.356422424316406, "pos_frac": 0.65625, "sample": [-0.22145843505859375, 1.4373855590820312, 3.3866043090820312, -0.8423995971679688, 1.013458251953125, 0.5517959594726562, 2.4688644409179688, -0.6016273498535156, 0.4587669372558594, -0.16725540161132812, -1.9273300170898438, -0.4976921081542969, 6.1956939697265625, -1.1003570556640625, 2.29827880859375, -1.6314735412597656, 3.000499725341797, 1.0573654174804688, 0.21736907958984375, 0.8080329895019531, -0.20569610595703125, 2.402618408203125, 2.936126708984375, 0.5013656616210938, -1.0172691345214844, 5.9160614013671875, 1.4740219116210938, 2.4401931762695312, 1.0974273681640625, 2.6672286987304688, -1.1850147247314453, 3.1402130126953125, 1.0908203125, -1.0818862915039062, -0.20581626892089844, -0.3481159210205078, 5.3354339599609375, -0.8258285522460938, 0.9399375915527344, 0.331390380859375, -0.6859817504882812, 0.3001861572265625, 0.8817863464355469, 0.9960250854492188, 0.6221122741699219, 1.8988494873046875, 3.2159805297851562, -0.381744384765625, 4.73175048828125, 2.60382080078125, -1.4572906494140625, 1.9046268463134766, 1.5011444091796875, -3.7971343994140625, -1.93487548828125, 5.933502197265625, 5.018028259277344, 5.404487609863281, 0.2804718017578125, 6.356422424316406, -0.6222763061523438, 3.2857799530029297, 1.825958251953125, -1.4681396484375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000169.npy"}
{"epoch": 0.25547996976568405, "step": 170, "batch_size": 64, "mean": 1.0023449659347534, "std": 2.2977848052978516, "min": -4.923179626464844, "p10": -2.031603240966797, "median": 1.1842021942138672, "p90": 4.1860176086425795, "max": 5.8151397705078125, "pos_frac": 0.6875, "sample": [4.455619812011719, -1.7147693634033203, 0.12595367431640625, 0.35860443115234375, 2.2776660919189453, 3.6187362670898438, 2.1505184173583984, 1.97625732421875, 1.268341064453125, 2.701871871948242, -1.0291290283203125, 0.30841064453125, 1.2980804443359375, 4.9554595947265625, 2.4071121215820312, 0.058685302734375, 1.5023384094238281, 2.860799789428711, 0.43683624267578125, -2.1088714599609375, -1.2280311584472656, 5.4387359619140625, 0.8593902587890625, -0.8142261505126953, -0.6940536499023438, -0.29027557373046875, -1.8187026977539062, 1.816192626953125, -4.923179626464844, 1.477020263671875, 3.6997909545898438, 1.7212066650390625, 4.3824005126953125, 1.7108516693115234, -2.1315841674804688, -0.41411590576171875, -3.0948333740234375, 1.0016555786132812, 1.993927001953125, -2.2564239501953125, 0.158966064453125, -0.3896484375, 1.4629364013671875, 1.750701904296875, 3.2717361450195312, 5.438507080078125, 1.3021392822265625, 5.8151397705078125, 1.1000633239746094, 0.51922607421875, -1.009002685546875, -0.6582756042480469, 0.7058029174804688, 3.7277908325195312, -2.6460342407226562, 1.7497177124023438, -1.8513107299804688, 5.655517578125, -0.78790283203125, 1.5648536682128906, 2.8884658813476562, 1.8987541198730469, 0.6997528076171875, -2.5620880126953125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000170.npy"}
{"epoch": 0.25699168556311414, "step": 171, "batch_size": 64, "mean": 1.5381628274917603, "std": 2.2715299129486084, "min": -5.3319854736328125, "p10": -0.9255683898925781, "median": 1.4617042541503906, "p90": 4.600902938842774, "max": 8.368392944335938, "pos_frac": 0.734375, "sample": [-0.31561279296875, 0.001190185546875, 2.9770660400390625, 3.062713623046875, 1.7359466552734375, 4.691265106201172, 3.0004043579101562, 3.8613128662109375, 1.2551565170288086, 2.6175403594970703, -0.8091278076171875, 3.3487701416015625, 4.507648468017578, 3.7443008422851562, 3.087310791015625, -1.1987228393554688, 1.9974136352539062, 0.5917854309082031, 1.2860069274902344, -0.42169189453125, -0.06339263916015625, 3.4253768920898438, 1.8051681518554688, 4.640869140625, 2.5009231567382812, -0.1391582489013672, 0.202972412109375, 0.00710296630859375, 4.921844482421875, -1.1964435577392578, 2.6186752319335938, 0.900482177734375, 0.817779541015625, -1.4307193756103516, 1.0278129577636719, 2.3338661193847656, -0.9754714965820312, -4.050987243652344, 5.04510498046875, 5.584747314453125, -5.3319854736328125, 0.32462596893310547, 1.9527206420898438, 3.3318328857421875, 1.01702880859375, 8.368392944335938, 2.6248626708984375, -0.130523681640625, 1.1227760314941406, 1.70013427734375, 1.6374015808105469, 4.761726379394531, 3.1141910552978516, -1.1419601440429688, -0.5446853637695312, 1.1336669921875, 1.931182861328125, 0.9265232086181641, -0.2841339111328125, -0.2177581787109375, 3.10302734375, 2.1834030151367188, 0.46538543701171875, -0.6026458740234375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000171.npy"}
{"epoch": 0.2585034013605442, "step": 172, "batch_size": 64, "mean": 1.1720627546310425, "std": 2.2849538326263428, "min": -4.84050178527832, "p10": -1.6397537231445312, "median": 1.0696101188659668, "p90": 3.884053039550783, "max": 7.506011962890625, "pos_frac": 0.65625, "sample": [-1.599945068359375, 0.6631088256835938, 1.8237380981445312, 2.9861907958984375, -2.1716766357421875, 0.3179464340209961, -0.1628408432006836, 1.8110389709472656, 3.1027679443359375, 4.039390563964844, 5.08831787109375, -2.08740234375, 1.5211944580078125, -0.3589897155761719, -0.21028900146484375, 7.424278259277344, 3.2784042358398438, -0.9859428405761719, 0.290313720703125, 2.3136253356933594, 3.5215988159179688, -0.34160614013671875, -2.4932689666748047, 0.296600341796875, 1.546478271484375, 0.7368793487548828, 0.8092269897460938, 1.6268157958984375, 1.0993423461914062, -0.5637054443359375, 1.4943885803222656, -0.22298431396484375, 2.0469512939453125, 7.506011962890625, -0.36441802978515625, -1.881338119506836, 3.4776992797851562, -0.2959747314453125, 4.685508728027344, 0.7945175170898438, -0.1365222930908203, 0.9114990234375, 1.3077735900878906, 3.38690185546875, 1.6973819732666016, -1.6568145751953125, 1.2650146484375, 4.4672088623046875, -1.4461669921875, 1.7589111328125, -2.4405059814453125, 3.1662445068359375, 4.284278869628906, -4.84050178527832, 1.8147964477539062, 3.385894775390625, -0.6338577270507812, -0.1826934814453125, 1.0398778915405273, 3.264148712158203, -0.218475341796875, 1.4309768676757812, 2.5794830322265625, 0.245208740234375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000172.npy"}
{"epoch": 0.2600151171579743, "step": 173, "batch_size": 64, "mean": 1.2176411151885986, "std": 2.3717143535614014, "min": -3.4558753967285156, "p10": -1.4454343795776365, "median": 0.9284124374389648, "p90": 4.366582489013672, "max": 8.228378295898438, "pos_frac": 0.640625, "sample": [-2.538576126098633, 2.51239013671875, -1.1613502502441406, 2.0893783569335938, 1.4039306640625, 3.8249740600585938, 2.9151992797851562, -1.5041694641113281, -0.1589641571044922, 0.9825859069824219, -1.0096664428710938, 2.203044891357422, 3.432464599609375, -2.914936065673828, -2.009428024291992, 3.7533187866210938, 1.86785888671875, -0.4472808837890625, 0.6251506805419922, 0.8990154266357422, 3.867645263671875, 3.7334365844726562, 0.05641937255859375, -0.25830078125, -3.4558753967285156, 0.8021697998046875, 4.7933807373046875, 0.7203445434570312, -0.061862945556640625, 2.201274871826172, -2.1210708618164062, 3.369171142578125, -1.3083858489990234, 4.378532409667969, 1.6284561157226562, -0.23102092742919922, 2.7417373657226562, -2.79949951171875, 2.2019577026367188, 4.3386993408203125, 0.4307098388671875, 0.26587677001953125, 2.2543983459472656, 3.2455825805664062, 0.5803298950195312, -0.8751869201660156, 0.9578094482421875, -1.18548583984375, 8.228378295898438, 4.662296295166016, 4.148777008056641, 1.6335372924804688, 1.8280258178710938, 4.962982177734375, 5.0172576904296875, -0.6431503295898438, -0.6171875, -1.1408939361572266, 0.7528934478759766, 4.651336669921875, -1.0506439208984375, 1.6298999786376953, -0.4690895080566406, -0.701568603515625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000173.npy"}
{"epoch": 0.2615268329554044, "step": 174, "batch_size": 64, "mean": 1.2171550989151, "std": 2.5494680404663086, "min": -4.563392639160156, "p10": -1.495697021484375, "median": 0.8208351135253906, "p90": 4.650798797607423, "max": 7.3578643798828125, "pos_frac": 0.640625, "sample": [-0.45697021484375, 3.5239791870117188, 1.9682044982910156, 1.3983783721923828, 1.1304988861083984, 2.8899383544921875, 3.4466552734375, 4.42144775390625, -2.5239715576171875, 0.9875278472900391, 5.7159271240234375, 7.2834320068359375, 1.8832778930664062, 0.3412513732910156, 1.76617431640625, -0.3689842224121094, 2.5791263580322266, 2.8147106170654297, 1.6342086791992188, -3.2870216369628906, 5.851783752441406, 2.638153076171875, -0.7858180999755859, 2.0614013671875, 7.3578643798828125, -0.7057037353515625, 2.55670166015625, -0.5630722045898438, 4.365531921386719, 3.532958984375, -0.68951416015625, 3.8938217163085938, 4.0323638916015625, -0.2977294921875, 2.769634246826172, -0.4670562744140625, 0.729705810546875, -0.811553955078125, -0.23907947540283203, 3.89605712890625, -2.564910888671875, -1.524810791015625, 0.10776519775390625, 0.3726158142089844, -0.6145820617675781, 4.749092102050781, 1.6285781860351562, -1.427764892578125, 4.9986572265625, 0.41640663146972656, -4.563392639160156, 1.12213134765625, 0.23476409912109375, 0.5702838897705078, -2.1057586669921875, 0.7140274047851562, 4.756935119628906, -0.4544639587402344, -0.2046051025390625, -3.4730758666992188, 0.9119644165039062, -1.3031082153320312, -1.0225982666015625, 0.29953765869140625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000174.npy"}
{"epoch": 0.26303854875283444, "step": 175, "batch_size": 64, "mean": 1.3147969245910645, "std": 2.6126813888549805, "min": -5.359161376953125, "p10": -1.5698841094970704, "median": 1.0321998596191406, "p90": 4.785214996337891, "max": 8.0855712890625, "pos_frac": 0.71875, "sample": [-1.1892566680908203, 1.7931232452392578, -3.3414306640625, 1.2349319458007812, 0.256500244140625, 4.318027496337891, -1.2663145065307617, 0.451324462890625, 0.619720458984375, 0.10011100769042969, 3.180023193359375, 5.1464385986328125, 0.5135765075683594, 0.47669219970703125, -0.18729782104492188, 2.7952404022216797, 0.770172119140625, 6.095954895019531, 0.9734878540039062, -1.5816154479980469, -1.4197540283203125, 1.7038555145263672, 0.2500762939453125, 3.227386474609375, 1.090911865234375, 0.8848724365234375, 1.3797683715820312, -1.542510986328125, 4.676048278808594, -5.359161376953125, 2.6411895751953125, 1.7312850952148438, 2.59759521484375, -0.9348297119140625, -1.147125244140625, 0.7083625793457031, -0.5205535888671875, 8.0855712890625, -0.012645721435546875, -0.6611404418945312, 3.869476318359375, -3.0902862548828125, 1.2064743041992188, 0.9300651550292969, -2.3003997802734375, 1.745391845703125, 2.7020950317382812, -1.6041755676269531, 7.8140106201171875, 4.832000732421875, 2.1146106719970703, 3.5239028930664062, 0.08264923095703125, -1.381683349609375, 1.860382080078125, 2.2425537109375, -1.8952350616455078, 6.534309387207031, 0.31470489501953125, 5.73089599609375, 2.9731178283691406, 1.862274169921875, 1.8360595703125, 3.7052001953125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000175.npy"}
{"epoch": 0.26455026455026454, "step": 176, "batch_size": 64, "mean": 0.9671777486801147, "std": 2.563364267349243, "min": -4.157798767089844, "p10": -1.8169853210449218, "median": 1.149308681488037, "p90": 4.0539398193359375, "max": 8.949310302734375, "pos_frac": 0.578125, "sample": [7.196685791015625, -3.85284423828125, 1.3510894775390625, 5.314666748046875, -0.06984710693359375, 6.526123046875, -1.848358154296875, 1.5849227905273438, 1.2147216796875, 1.7693557739257812, -2.8197784423828125, -1.5521011352539062, -2.4210891723632812, 3.6004714965820312, -4.157798767089844, 1.1492490768432617, 1.7521400451660156, 1.6345634460449219, 1.45562744140625, -0.7412223815917969, -1.0205307006835938, -1.2948837280273438, 0.9987697601318359, 4.071197509765625, 3.7118453979492188, -0.06594085693359375, -0.30539703369140625, 1.59124755859375, -1.3534812927246094, 2.3729515075683594, -0.77899169921875, -0.6824741363525391, -0.25345611572265625, 0.7096786499023438, 2.8765392303466797, 4.013671875, 1.1493682861328125, 4.7054443359375, 1.3640785217285156, -1.0102767944335938, -1.2003250122070312, -0.0904083251953125, 2.1518917083740234, 3.646350860595703, 1.3651924133300781, 2.460113525390625, -1.7437820434570312, -1.9963569641113281, 1.4525909423828125, 1.3278045654296875, -0.3052215576171875, 3.371753692626953, 0.5441360473632812, -0.7080917358398438, 1.6293869018554688, 1.7044334411621094, 4.6877899169921875, 2.1210479736328125, -1.1534805297851562, 8.949310302734375, -0.0038604736328125, -1.7383270263671875, 0.756011962890625, -3.2145233154296875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000176.npy"}
{"epoch": 0.2660619803476946, "step": 177, "batch_size": 64, "mean": 2.19460391998291, "std": 2.5808112621307373, "min": -4.7167510986328125, "p10": -1.025034141540527, "median": 2.423625946044922, "p90": 5.0316312789917, "max": 8.923408508300781, "pos_frac": 0.796875, "sample": [4.086189270019531, -1.8067855834960938, 1.3727302551269531, 8.923408508300781, 2.3736114501953125, 1.0328302383422852, -0.7321243286132812, 2.225738525390625, 4.087615966796875, 2.8414154052734375, 6.563907623291016, 1.9886512756347656, 4.5888824462890625, -0.3135223388671875, 1.1667251586914062, -0.8391246795654297, 3.0817337036132812, 0.4872283935546875, 1.2836589813232422, 2.9380531311035156, 4.271728515625, 1.2394332885742188, -0.5344715118408203, 4.490213394165039, 3.829935073852539, 5.228179931640625, 0.43857574462890625, 1.010019302368164, -0.2525749206542969, 0.33826446533203125, -1.6144866943359375, 0.9708213806152344, 5.123199462890625, 0.663848876953125, -1.1047096252441406, 4.688701629638672, 4.817972183227539, 0.629180908203125, -0.46392822265625, 2.933368682861328, 1.5368270874023438, 2.4736404418945312, 4.314701080322266, 4.395721435546875, 3.1146697998046875, -2.2070159912109375, -2.7031097412109375, -2.45513916015625, 4.441606521606445, 1.603424072265625, 6.23406982421875, 0.4739036560058594, -4.7167510986328125, 4.647830963134766, 3.317291259765625, 3.7778472900390625, 4.191753387451172, 4.378692626953125, 5.458709716796875, 2.528369903564453, 2.9633026123046875, 5.719078063964844, 0.8773651123046875, 4.033771514892578], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000177.npy"}
{"epoch": 0.2675736961451247, "step": 178, "batch_size": 64, "mean": 1.458240270614624, "std": 2.0920114517211914, "min": -3.7500267028808594, "p10": -1.1999343872070312, "median": 1.3559951782226562, "p90": 3.8582626342773443, "max": 6.78277587890625, "pos_frac": 0.78125, "sample": [1.8877754211425781, 1.0332489013671875, 0.8359870910644531, 0.4654388427734375, 0.39400482177734375, 1.3267440795898438, 1.8871116638183594, 3.7322235107421875, 0.8651542663574219, 1.8872566223144531, -1.2278671264648438, 6.5547637939453125, 3.630767822265625, -1.6511955261230469, 2.2037506103515625, -0.1259002685546875, 3.545098304748535, 0.4914512634277344, 1.2050704956054688, 1.5204010009765625, 5.510532379150391, 1.0860004425048828, -1.1347579956054688, 6.78277587890625, 1.3852462768554688, 0.67242431640625, 2.179149627685547, 6.1760406494140625, -2.3152389526367188, 0.5239391326904297, 1.0471343994140625, 0.28450965881347656, 0.831817626953125, 4.222896575927734, -0.0912933349609375, 2.3030624389648438, 2.3962249755859375, -0.3391265869140625, 3.7557296752929688, -1.0150794982910156, 0.7371063232421875, 2.6257705688476562, 1.730560302734375, 1.9451141357421875, -3.7500267028808594, 3.578723907470703, 0.46460914611816406, 0.8111801147460938, 3.8960418701171875, 0.11727619171142578, 1.4361343383789062, 3.770111083984375, 1.4478416442871094, 4.153434753417969, -0.37819576263427734, 1.4034042358398438, 3.0146942138671875, -0.7682342529296875, -1.5756301879882812, -1.2889022827148438, 2.2924232482910156, 2.415069580078125, 2.4328536987304688, -1.9072589874267578], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000178.npy"}
{"epoch": 0.2690854119425548, "step": 179, "batch_size": 64, "mean": 1.2006173133850098, "std": 2.558832883834839, "min": -3.7667694091796875, "p10": -1.6223297119140625, "median": 0.9172868728637695, "p90": 4.61997222900391, "max": 10.245895385742188, "pos_frac": 0.734375, "sample": [0.4278564453125, -1.228668212890625, 2.274017333984375, 1.672149658203125, 0.9494400024414062, 1.0540008544921875, 0.5390510559082031, 5.0308837890625, 5.7767181396484375, 1.9538955688476562, 1.3794269561767578, 1.3267383575439453, 3.606983184814453, -3.6359405517578125, 2.768734931945801, -0.8431472778320312, -3.7190322875976562, -1.6678314208984375, 0.09004974365234375, 2.0709457397460938, -1.221745491027832, 2.289234161376953, 1.636993408203125, -1.2709197998046875, 2.7699661254882812, -2.4235076904296875, -1.5161590576171875, 3.5890541076660156, 1.081766128540039, 0.08732414245605469, 0.8851337432861328, 1.9763565063476562, 2.648448944091797, -1.0947418212890625, 0.8847217559814453, -3.7667694091796875, 0.45001220703125, 5.993255615234375, 3.3061676025390625, 0.114654541015625, 1.878814697265625, 0.26766204833984375, 0.5034580230712891, 1.256805419921875, -0.480560302734375, 3.4565582275390625, 5.677726745605469, 1.9120254516601562, 5.121238708496094, 10.245895385742188, 2.5701866149902344, 0.44108009338378906, 3.6611785888671875, 0.3729095458984375, -1.7636737823486328, 0.18167877197265625, -0.07524490356445312, 0.503143310546875, 5.27490234375, 0.5636825561523438, -3.5451927185058594, -0.2780189514160156, 3.4842910766601562, -0.6365528106689453], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000179.npy"}
{"epoch": 0.2705971277399849, "step": 180, "batch_size": 64, "mean": 1.0763027667999268, "std": 2.0887451171875, "min": -3.1079025268554688, "p10": -1.2356452941894531, "median": 0.7592487335205078, "p90": 3.871296119689942, "max": 7.2989349365234375, "pos_frac": 0.6875, "sample": [6.076446533203125, 0.18795394897460938, 2.4266605377197266, 0.5066757202148438, 2.2657623291015625, 0.13102340698242188, -2.580169677734375, 2.1904144287109375, -0.3339080810546875, -1.502685546875, 3.9402427673339844, 0.3599891662597656, -0.267333984375, -1.244720458984375, 1.9940872192382812, 0.7879619598388672, -1.1877899169921875, 1.4710693359375, -0.5192375183105469, -1.2144699096679688, 1.3470230102539062, 1.4078102111816406, 1.0802631378173828, 1.7898712158203125, 1.9925193786621094, 3.190117835998535, -1.1011734008789062, 4.7413330078125, 1.5010852813720703, 1.06011962890625, 1.8535614013671875, 3.710420608520508, 0.111480712890625, -0.23917388916015625, -0.05055999755859375, 7.2989349365234375, 0.3820457458496094, 2.9928054809570312, 0.4714803695678711, 0.026676177978515625, 0.4908771514892578, 0.9298133850097656, 3.6017189025878906, -0.4346027374267578, -0.4813041687011719, 0.9547767639160156, 2.66485595703125, -1.6878662109375, 4.364646911621094, 0.7305355072021484, 3.3351058959960938, -3.1079025268554688, -0.81671142578125, 0.9697380065917969, -1.7003936767578125, 0.27748870849609375, 5.5360107421875, -0.741241455078125, -0.250244140625, -1.29180908203125, 0.972747802734375, 1.5210208892822266, 5.2751922607421875, 0.7163066864013672], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000180.npy"}
{"epoch": 0.272108843537415, "step": 181, "batch_size": 64, "mean": 1.229837417602539, "std": 1.942196249961853, "min": -1.8590126037597656, "p10": -0.898919677734375, "median": 0.8875837326049805, "p90": 4.159638404846194, "max": 6.443122863769531, "pos_frac": 0.71875, "sample": [0.1547393798828125, 1.4891700744628906, -0.7192840576171875, -0.09963512420654297, -0.2332763671875, 0.06713104248046875, 1.0443878173828125, -1.06280517578125, 1.3014602661132812, -0.38063812255859375, 2.3387985229492188, 0.1309356689453125, -1.2660980224609375, 2.74468994140625, 1.6196117401123047, 2.4916534423828125, 2.4069480895996094, 2.8306350708007812, 1.3456039428710938, -1.84326171875, 0.521942138671875, 1.185028076171875, 4.916873931884766, 0.6075057983398438, 6.116249084472656, 1.22906494140625, 0.7173919677734375, 6.443122863769531, -0.88409423828125, 0.06833648681640625, -0.0963592529296875, 0.4778327941894531, 4.568939208984375, -1.8590126037597656, 4.864471435546875, 0.3668861389160156, 0.9375820159912109, 6.02215576171875, -1.5241317749023438, -0.9052734375, 2.3775711059570312, 1.9150981903076172, 4.417625427246094, 1.6041336059570312, 1.7364692687988281, -0.00653839111328125, -0.6160507202148438, 1.768768310546875, -0.1950225830078125, 0.05264854431152344, 0.24252700805664062, 2.324676513671875, 3.021881103515625, 0.83758544921875, 0.373870849609375, 3.557668685913086, 0.31871795654296875, -0.022216796875, 2.834549903869629, -1.439117431640625, 1.3501052856445312, -0.6628932952880859, 3.4586410522460938, 1.3236236572265625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000181.npy"}
{"epoch": 0.273620559334845, "step": 182, "batch_size": 64, "mean": 1.7906737327575684, "std": 2.7411208152770996, "min": -2.5140533447265625, "p10": -1.4073532104492186, "median": 1.7530632019042969, "p90": 4.993519592285158, "max": 13.056442260742188, "pos_frac": 0.734375, "sample": [1.872100830078125, -1.3350791931152344, -0.16845703125, -0.5323143005371094, 2.1785888671875, 5.797828674316406, 2.5073928833007812, -1.25506591796875, -0.46346282958984375, 3.4281158447265625, 2.7362403869628906, 1.7576217651367188, 4.6928558349609375, 0.8856658935546875, 1.7597541809082031, 5.12237548828125, -0.540679931640625, -1.4383277893066406, 4.183147430419922, 2.0112380981445312, 1.0778770446777344, 0.4601478576660156, 0.397857666015625, -0.6681976318359375, 1.0734672546386719, -1.6982040405273438, 5.7137908935546875, 3.0860652923583984, 0.748443603515625, 3.077770233154297, 3.604034423828125, 2.631084442138672, 2.146484375, 1.748504638671875, 6.791099548339844, -2.5140533447265625, 1.8511962890625, 0.5845489501953125, 3.027191162109375, -0.121063232421875, 0.2053985595703125, -2.3797073364257812, 4.290107727050781, 2.614288330078125, 2.0423660278320312, 13.056442260742188, 0.5302181243896484, 0.2059326171875, 6.240997314453125, 1.0228958129882812, 2.1668701171875, 8.828567504882812, 3.2529983520507812, 3.848287582397461, 2.3368988037109375, 0.804718017578125, 1.537994384765625, 0.5064506530761719, 3.5161819458007812, -0.7929477691650391, -0.421356201171875, -1.4791793823242188, -1.4749755859375, -2.07391357421875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000182.npy"}
{"epoch": 0.2751322751322751, "step": 183, "batch_size": 64, "mean": 0.4961353540420532, "std": 2.0586602687835693, "min": -3.5812911987304688, "p10": -2.274199676513672, "median": 0.038906097412109375, "p90": 3.3523071289062507, "max": 4.879241943359375, "pos_frac": 0.515625, "sample": [1.214813232421875, 0.9538192749023438, 1.903778076171875, -0.0107574462890625, -0.1362743377685547, -1.1629180908203125, -2.81982421875, 4.320526123046875, -3.5812911987304688, -2.2562713623046875, -1.254058837890625, 0.0662078857421875, 1.3574981689453125, 0.8717498779296875, 0.5991840362548828, 1.7384414672851562, -0.1897563934326172, 0.6194534301757812, 3.67572021484375, 4.879241943359375, 4.4739227294921875, -0.605224609375, 3.0272064208984375, -1.0910415649414062, 1.8771400451660156, -0.8293685913085938, 0.1175079345703125, -0.510650634765625, -2.7648468017578125, -0.092498779296875, 1.1744232177734375, -2.29986572265625, -0.5856285095214844, 0.2534599304199219, -0.49909210205078125, 2.7402420043945312, 3.41064453125, 2.6321449279785156, -3.50506591796875, 3.550548553466797, -1.6881828308105469, -2.2818832397460938, 3.0819091796875, 0.9177112579345703, -0.7690868377685547, -0.7395763397216797, 1.8586654663085938, -0.12443161010742188, -3.4780731201171875, -0.81280517578125, -0.5760040283203125, -0.06970977783203125, 0.9673576354980469, -0.006404876708984375, 1.7108612060546875, 0.01160430908203125, -0.03592681884765625, -1.1431503295898438, -0.43288516998291016, 1.014373779296875, 2.6529464721679688, 3.2161865234375, 2.4439258575439453, 4.772003173828125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000183.npy"}
{"epoch": 0.2766439909297052, "step": 184, "batch_size": 64, "mean": 1.0936038494110107, "std": 1.750092625617981, "min": -3.3662185668945312, "p10": -0.7157941818237304, "median": 1.0917940139770508, "p90": 3.391127014160157, "max": 5.391456604003906, "pos_frac": 0.734375, "sample": [-0.7166748046875, 1.2012062072753906, 3.2330970764160156, 0.2773551940917969, 2.2772216796875, -0.251800537109375, 1.081502914428711, -1.5240478515625, 0.01253509521484375, 1.3921051025390625, -0.5785694122314453, 2.1478347778320312, 1.3287429809570312, 0.5319929122924805, 1.6209869384765625, -0.6734046936035156, 2.2793312072753906, 2.0347442626953125, 0.15517807006835938, 0.5263214111328125, -0.042205810546875, 4.6876373291015625, 1.8333816528320312, 5.391456604003906, -0.7137393951416016, 1.130218505859375, 0.5467643737792969, 2.0143890380859375, 4.082157135009766, -0.4695587158203125, 2.5523529052734375, 0.0392608642578125, -1.1700325012207031, -1.5205116271972656, -0.6034717559814453, 0.14706802368164062, 4.548627853393555, 0.636474609375, -0.7010993957519531, 2.6437835693359375, -0.3734855651855469, 0.569366455078125, 3.2243804931640625, 0.09155750274658203, 1.7664031982421875, 1.5253963470458984, 2.445159912109375, 4.379154205322266, -0.36107635498046875, 3.4587249755859375, 0.44582557678222656, -1.5536041259765625, 1.8316650390625, 1.4224777221679688, 4.19927978515625, 3.2333984375, -1.2340068817138672, 1.1020851135253906, 1.7959403991699219, 1.49090576171875, -3.3662185668945312, 1.9105339050292969, 0.5230331420898438, 0.07513427734375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000184.npy"}
{"epoch": 0.2781557067271353, "step": 185, "batch_size": 64, "mean": 1.3761909008026123, "std": 2.3841326236724854, "min": -4.078651428222656, "p10": -2.0215358734130855, "median": 1.4561424255371094, "p90": 4.425805282592774, "max": 6.4273529052734375, "pos_frac": 0.734375, "sample": [1.6312074661254883, 6.4273529052734375, 0.8034210205078125, 2.093423843383789, 4.0672454833984375, -2.8487930297851562, 0.49560546875, 2.542896270751953, 2.02996826171875, -0.08294677734375, 4.156543731689453, 4.463523864746094, -2.1716690063476562, -0.7547760009765625, 4.20184326171875, 1.243947982788086, 4.826900482177734, 0.5923881530761719, 1.9340572357177734, 3.793868064880371, 0.7202072143554688, 0.8929176330566406, 0.9483642578125, 1.9381790161132812, 0.5776252746582031, 2.7955093383789062, 0.24665069580078125, 1.7392616271972656, 0.715667724609375, 3.5555419921875, -0.9307098388671875, -1.5672454833984375, 4.337795257568359, 3.1318359375, -1.7701148986816406, 1.58453369140625, 6.1580352783203125, 1.9542884826660156, 0.9374618530273438, -0.1512908935546875, -2.8648681640625, 2.3712387084960938, 0.42931365966796875, 3.7496719360351562, 2.2127532958984375, -2.8151702880859375, 3.6796951293945312, -0.5268516540527344, 2.0915451049804688, -1.1016674041748047, 1.69329833984375, -4.078651428222656, 2.921661376953125, 0.42258453369140625, 5.796165466308594, 0.17377185821533203, -2.1292877197265625, -2.9369277954101562, 1.9502677917480469, -0.29137420654296875, -0.7366867065429688, 1.3277511596679688, 4.644809722900391, 4.83265495300293], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000185.npy"}
{"epoch": 0.2796674225245654, "step": 186, "batch_size": 64, "mean": 1.3545129299163818, "std": 2.504746198654175, "min": -4.074127197265625, "p10": -1.7970062255859374, "median": 1.1972942352294922, "p90": 4.4397018432617195, "max": 9.130157470703125, "pos_frac": 0.703125, "sample": [3.2639694213867188, 2.2684173583984375, 0.4013824462890625, 0.12330245971679688, -1.2530441284179688, 3.1961669921875, 2.8130645751953125, -2.535228729248047, -0.6520767211914062, 2.0106658935546875, 1.1067276000976562, -0.5995025634765625, 0.8759918212890625, 3.919933319091797, 4.600067138671875, 4.717569351196289, -0.2669086456298828, 0.6646003723144531, 2.9358062744140625, 0.75103759765625, -2.229856491088867, 2.703655242919922, 6.30659294128418, 4.552833557128906, -2.3143234252929688, -0.7096138000488281, -1.8805618286132812, -1.1682243347167969, -4.074127197265625, 0.5858345031738281, -0.3387184143066406, 4.175727844238281, 9.130157470703125, -3.575042724609375, 0.36020660400390625, -0.7575807571411133, 2.0236434936523438, 4.774871826171875, 3.0013275146484375, 0.16203689575195312, 1.1888999938964844, -0.5867538452148438, 2.11212158203125, 1.5388031005859375, 2.6374053955078125, 1.23065185546875, 2.5808944702148438, 2.445526123046875, -1.6020431518554688, 1.2056884765625, 3.6665802001953125, 2.9604244232177734, 3.766693115234375, 4.0240020751953125, 1.0365142822265625, -0.015106201171875, 6.9098358154296875, 1.9400157928466797, 0.0430755615234375, 0.542083740234375, 1.326690673828125, -0.556976318359375, -2.5299758911132812, 1.7529983520507812], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000186.npy"}
{"epoch": 0.2811791383219955, "step": 187, "batch_size": 64, "mean": 1.2872933149337769, "std": 2.2888224124908447, "min": -3.38092041015625, "p10": -1.7088478088378904, "median": 1.0761198997497559, "p90": 4.128417205810547, "max": 8.1456298828125, "pos_frac": 0.71875, "sample": [1.6192169189453125, -2.0433502197265625, 6.9036102294921875, 0.6057968139648438, 8.1456298828125, 2.7676849365234375, 4.085121154785156, 0.1577606201171875, 0.7778778076171875, 1.5063095092773438, -1.7954177856445312, 0.7534637451171875, 0.027164459228515625, 0.888763427734375, 3.8511962890625, 1.7589797973632812, 1.3956966400146484, 0.8739242553710938, 2.3274383544921875, 2.3130435943603516, -1.5068511962890625, -3.38092041015625, 2.5453948974609375, 0.0556640625, -0.04555511474609375, 0.0899810791015625, 0.27233123779296875, 1.918701171875, -0.5869293212890625, 4.9888763427734375, 1.1943626403808594, 4.4104461669921875, -1.9958763122558594, -2.6364059448242188, -0.6566925048828125, 5.34521484375, 0.6470413208007812, -0.6518936157226562, -0.531890869140625, 1.0306806564331055, -0.7411327362060547, -0.8276481628417969, 3.9985504150390625, 1.5314903259277344, 2.2208938598632812, -0.5217819213867188, 3.2759170532226562, 1.8377151489257812, 1.6632080078125, -0.2863006591796875, 3.976959228515625, -2.0168914794921875, 2.1833343505859375, 4.14697265625, 0.8035812377929688, 3.2305374145507812, 5.0692901611328125, 1.7081146240234375, 1.1215591430664062, 3.209014892578125, 0.6001739501953125, -2.3601818084716797, -0.72113037109375, 1.8589324951171875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000187.npy"}
{"epoch": 0.28269085411942557, "step": 188, "batch_size": 64, "mean": 1.5262320041656494, "std": 2.666943073272705, "min": -2.7999725341796875, "p10": -0.6641822814941406, "median": 1.1963448524475098, "p90": 4.086311340332031, "max": 16.659423828125, "pos_frac": 0.71875, "sample": [-0.13897705078125, -0.3868522644042969, 2.790435791015625, -0.39002037048339844, -0.7326431274414062, 2.1776046752929688, 2.1691360473632812, 1.3346900939941406, 3.191680908203125, 2.8435516357421875, 0.8025665283203125, 4.056243896484375, 4.163555145263672, 0.30707550048828125, 0.2233562469482422, -1.3411178588867188, 1.7859573364257812, 5.187629699707031, 1.4709033966064453, -0.4070281982421875, 0.5396499633789062, -0.3701934814453125, 2.3494873046875, -0.3965606689453125, 0.9284019470214844, -0.6403045654296875, -0.010708808898925781, 1.6283645629882812, 2.0509414672851562, 4.0991973876953125, 3.4199256896972656, 0.9248046875, 2.353240966796875, 4.135612487792969, 0.4444580078125, 0.4170799255371094, 4.726509094238281, 1.7400894165039062, -0.850311279296875, 0.12845611572265625, 3.9674835205078125, -2.7999725341796875, 1.4384851455688477, -0.13616943359375, 1.8552703857421875, 0.1844635009765625, 1.9777660369873047, 6.379425048828125, 1.1780281066894531, 0.9033966064453125, 3.616424560546875, -2.649749755859375, -0.3911285400390625, 1.6286735534667969, 0.42022705078125, 0.6821517944335938, 1.2146615982055664, -2.2046985626220703, 2.537181854248047, 2.561330795288086, 2.7861595153808594, -0.6744155883789062, -0.1814594268798828, 16.659423828125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000188.npy"}
{"epoch": 0.2842025699168556, "step": 189, "batch_size": 64, "mean": 1.142228364944458, "std": 2.383281707763672, "min": -5.601287841796875, "p10": -1.947764205932617, "median": 0.8524198532104492, "p90": 4.2325397491455075, "max": 6.4534149169921875, "pos_frac": 0.671875, "sample": [-1.0404891967773438, 0.5743789672851562, 2.7901649475097656, -1.7207069396972656, -0.81561279296875, 2.119232177734375, 5.327503204345703, 3.519773483276367, 0.6851997375488281, 0.1407470703125, -2.1150054931640625, 3.031036376953125, -5.601287841796875, 0.5660247802734375, -2.117034912109375, -1.0287761688232422, 4.743503570556641, 3.8897857666015625, -1.0864410400390625, -0.203033447265625, 0.39208221435546875, 4.236827850341797, 1.8227806091308594, 1.0414581298828125, 1.401519775390625, -0.21399688720703125, -0.795989990234375, 0.23657798767089844, 0.8846797943115234, 1.8335494995117188, 4.130531311035156, -1.12774658203125, 0.8912582397460938, 5.549098968505859, 4.294246673583984, 3.1408348083496094, 0.052272796630859375, 2.120635986328125, 1.9657058715820312, 2.785186767578125, 3.8537254333496094, -2.3758544921875, 3.6226768493652344, -0.4997901916503906, -0.5950279235839844, 2.1960220336914062, -1.0098838806152344, -1.2306232452392578, 0.820159912109375, 0.10289764404296875, 0.7938308715820312, 3.896097183227539, 2.3671875, 2.0248794555664062, 4.2225341796875, 2.771190643310547, -2.156402587890625, -2.5962600708007812, 0.5267829895019531, -2.045074462890625, 1.172719955444336, -0.28351593017578125, 6.4534149169921875, 4.770458221435547], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000189.npy"}
{"epoch": 0.2857142857142857, "step": 190, "batch_size": 64, "mean": 1.4207061529159546, "std": 1.8172961473464966, "min": -3.150045394897461, "p10": -1.0584379196166989, "median": 1.6776695251464844, "p90": 3.6330535888671878, "max": 5.318977355957031, "pos_frac": 0.765625, "sample": [2.909128189086914, 1.3229827880859375, 3.2733306884765625, -0.78765869140625, 1.164459228515625, 2.3035507202148438, -0.39934539794921875, 2.8176422119140625, -3.150045394897461, 5.318977355957031, -1.3604812622070312, 1.6981887817382812, 0.9450035095214844, 1.0337142944335938, 2.653921127319336, 0.14443206787109375, 1.8588180541992188, 2.32183837890625, 2.1920127868652344, 1.6571502685546875, 1.9888038635253906, 2.3519668579101562, 0.264495849609375, 0.628814697265625, -1.7706832885742188, 1.5232772827148438, -2.3047332763671875, -1.9572639465332031, 0.7302436828613281, 2.2218189239501953, 2.6043701171875, -0.3899803161621094, 3.4991378784179688, 4.004058837890625, 2.7761001586914062, -0.24994659423828125, 2.466156005859375, -0.6648693084716797, -1.1744861602783203, 3.247509002685547, 0.15630340576171875, 0.5559539794921875, 3.210906982421875, 3.6189117431640625, 1.5338058471679688, 0.7978439331054688, -1.9696769714355469, 3.90911865234375, -0.188690185546875, 0.5677890777587891, -0.07178306579589844, 1.8169841766357422, 0.02184295654296875, 3.6391143798828125, 2.43951416015625, 4.133026123046875, 2.1909103393554688, 0.9244003295898438, 3.8018531799316406, 2.3366241455078125, -0.07309341430664062, 3.74755859375, 3.42620849609375, 2.687356948852539], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000190.npy"}
{"epoch": 0.2872260015117158, "step": 191, "batch_size": 64, "mean": 1.2701534032821655, "std": 2.1507251262664795, "min": -4.352176666259766, "p10": -0.8277898788452148, "median": 1.0385942459106445, "p90": 3.754342651367188, "max": 8.36541748046875, "pos_frac": 0.75, "sample": [2.5825271606445312, -0.04606056213378906, 2.2918167114257812, 5.35247802734375, 6.220241546630859, -1.719512939453125, -0.316314697265625, 2.3676223754882812, 1.044412612915039, 1.68292236328125, -0.7068328857421875, 3.195056915283203, 0.2674407958984375, 1.7733268737792969, 2.2764892578125, 2.235292434692383, -1.472900390625, 0.7313232421875, 1.740447998046875, 1.4773101806640625, 2.5840530395507812, 0.435455322265625, 3.455362319946289, 0.7597274780273438, -4.352176666259766, 1.4755706787109375, 1.03277587890625, 2.599090576171875, 0.35601806640625, -0.8341236114501953, 1.5699596405029297, 0.222808837890625, 0.3995532989501953, 4.4891815185546875, 1.594207763671875, -2.9436187744140625, 1.2513046264648438, -0.1883544921875, 4.0597991943359375, 3.0158004760742188, 3.5795669555664062, 2.95794677734375, 0.70794677734375, 2.1156768798828125, 3.8292465209960938, 0.7552547454833984, 1.0619735717773438, 2.8330001831054688, 0.43987178802490234, -0.097503662109375, -0.4522438049316406, 0.5148773193359375, -0.157623291015625, -2.2427940368652344, -0.8130111694335938, 8.36541748046875, 4.586601257324219, -2.849761962890625, 0.2435169219970703, 0.5476913452148438, 0.641021728515625, 0.60699462890625, 2.6821823120117188, -0.5255126953125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000191.npy"}
{"epoch": 0.2887377173091459, "step": 192, "batch_size": 64, "mean": 0.8153265714645386, "std": 1.921419382095337, "min": -6.749214172363281, "p10": -0.9943500518798828, "median": 0.5796527862548828, "p90": 3.0820324897766116, "max": 6.283967971801758, "pos_frac": 0.671875, "sample": [-6.749214172363281, -0.5315055847167969, 0.70233154296875, 2.3746395111083984, 2.380084991455078, -1.0324382781982422, -0.9188518524169922, 1.783172607421875, 1.8266525268554688, 3.1194229125976562, -0.6650199890136719, 0.1504364013671875, -0.3314781188964844, 0.5816650390625, -1.0778121948242188, -0.7931060791015625, 3.6720123291015625, -0.5915069580078125, -0.6787834167480469, 0.2959442138671875, 2.94439697265625, 5.881828308105469, 0.3296070098876953, -0.9942245483398438, 1.247039794921875, 1.3133487701416016, 3.1521148681640625, 1.54443359375, 1.1167831420898438, 0.8344879150390625, 0.30451202392578125, 2.9203338623046875, -1.0634841918945312, 2.8228816986083984, 0.0178375244140625, 0.48887062072753906, 0.722869873046875, -0.8589973449707031, -0.7227134704589844, 2.368927001953125, 0.7919960021972656, -0.3631019592285156, -0.5994377136230469, 6.283967971801758, 1.6981048583984375, 0.8514747619628906, -1.2172050476074219, 2.99478816986084, 1.5572128295898438, -0.35652923583984375, -0.127899169921875, 0.16329002380371094, 0.12830352783203125, 3.828033447265625, 0.753875732421875, 1.234039306640625, 2.229400634765625, -1.1842422485351562, 3.7146453857421875, 1.7268524169921875, 0.24054527282714844, -0.9944038391113281, 0.3620491027832031, 0.5776405334472656], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000192.npy"}
{"epoch": 0.29024943310657597, "step": 193, "batch_size": 64, "mean": 1.3285223245620728, "std": 1.8914172649383545, "min": -2.6576766967773438, "p10": -0.7002193450927734, "median": 1.365182876586914, "p90": 3.9623725891113284, "max": 5.768104553222656, "pos_frac": 0.734375, "sample": [0.09261322021484375, 0.7056999206542969, 2.019336700439453, -0.507049560546875, 1.3874130249023438, -2.6576766967773438, 0.9536666870117188, 0.1352691650390625, -0.47149658203125, -1.7789802551269531, 0.0008392333984375, 0.2877473831176758, 2.7913246154785156, -0.38065338134765625, 3.5463485717773438, -0.7122116088867188, 3.1292800903320312, 0.7299232482910156, 2.6181468963623047, -0.2643280029296875, -0.1161346435546875, 2.7819976806640625, 4.999076843261719, 1.3429527282714844, 2.952301025390625, 2.9332275390625, 0.2512092590332031, 2.3931350708007812, 0.8609542846679688, 1.476043701171875, 5.768104553222656, 1.6871871948242188, 0.08597183227539062, 3.7711334228515625, 1.4538421630859375, 3.592071533203125, -0.6722373962402344, -1.7705917358398438, 0.8213272094726562, 1.7876129150390625, 3.9927520751953125, -0.12506866455078125, 1.4088287353515625, 4.4427490234375, -0.6566925048828125, 1.5799560546875, 0.4649505615234375, 0.1043233871459961, 0.5386123657226562, 2.5131988525390625, -0.7775077819824219, -0.15523910522460938, 3.8914871215820312, 4.314630508422852, 2.7907180786132812, 2.89453125, 4.7392578125, -1.9353446960449219, 1.4579544067382812, 4.2127227783203125, -0.47035980224609375, 1.395599365234375, -1.499481201171875, 1.8784542083740234], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000193.npy"}
{"epoch": 0.29176114890400606, "step": 194, "batch_size": 64, "mean": 1.119341254234314, "std": 2.1185781955718994, "min": -3.5394821166992188, "p10": -1.3382328033447264, "median": 0.9801216125488281, "p90": 3.6384104728698734, "max": 8.19451904296875, "pos_frac": 0.671875, "sample": [0.714599609375, 1.2971343994140625, 1.5690689086914062, -1.4195499420166016, 3.320648193359375, -0.8282356262207031, -1.1484928131103516, 1.2187652587890625, -1.0734176635742188, 1.9979705810546875, 0.034740447998046875, 1.7899932861328125, -0.23492431640625, 2.7104644775390625, -3.5394821166992188, 0.48253631591796875, 3.5859146118164062, -0.341552734375, -0.15409088134765625, 0.05115509033203125, 2.062255859375, 0.1420269012451172, 0.5215835571289062, 4.398902893066406, -0.0304718017578125, -0.1626739501953125, -0.2649650573730469, 1.598052978515625, 0.6668014526367188, 2.0552825927734375, 1.6963081359863281, 1.4139957427978516, -2.26641845703125, -1.9757728576660156, -2.4464683532714844, -0.3749847412109375, 5.5291595458984375, 3.6942901611328125, 3.863262176513672, 3.4085655212402344, -2.1900672912597656, 0.7819595336914062, 1.16571044921875, 0.4522724151611328, 2.2911300659179688, 4.900461196899414, 2.7883243560791016, -2.2448348999023438, 8.19451904296875, 3.472900390625, 2.0180625915527344, -0.8940200805664062, 0.7945327758789062, 3.6609086990356445, 1.5268020629882812, 3.320281982421875, 2.3526954650878906, 1.4295539855957031, -0.3701343536376953, 2.9641342163085938, -0.7034187316894531, 0.6067733764648438, -0.4040699005126953, 2.1613826751708984], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000194.npy"}
{"epoch": 0.29327286470143615, "step": 195, "batch_size": 64, "mean": 1.126772403717041, "std": 2.02590012550354, "min": -7.3668060302734375, "p10": -0.8622375488281249, "median": 0.9421768188476562, "p90": 3.706404113769531, "max": 5.284271240234375, "pos_frac": 0.703125, "sample": [-0.08380508422851562, 2.490030288696289, -1.2991943359375, 2.2260589599609375, 4.068210601806641, 1.4180030822753906, 3.5096168518066406, 2.5262451171875, 1.536712646484375, 4.689208984375, 2.0663528442382812, 0.8421478271484375, -0.4172325134277344, -1.0413894653320312, -0.4757843017578125, -7.3668060302734375, -0.2529735565185547, 2.87457275390625, 1.7673568725585938, 2.368419647216797, -1.2096099853515625, 1.8954086303710938, 1.7350273132324219, 1.6151885986328125, -0.1529693603515625, 0.2972259521484375, 3.0332794189453125, 1.77471923828125, 0.4095611572265625, 2.8078689575195312, 0.22559738159179688, 0.842193603515625, 4.770362854003906, 1.322662353515625, 3.5007705688476562, -1.3893890380859375, 1.0421600341796875, 1.9159393310546875, 0.11382293701171875, 5.284271240234375, -0.2652397155761719, -0.0952606201171875, 0.838104248046875, 0.664031982421875, 0.5330390930175781, 3.718475341796875, -0.912078857421875, 2.1088790893554688, -0.3672943115234375, -0.5151100158691406, 0.1835174560546875, -2.090423583984375, -0.15070343017578125, 4.408914566040039, 0.330078125, 1.9316596984863281, -0.6752471923828125, 1.7743072509765625, -0.745941162109375, 1.5872268676757812, 0.36138153076171875, 0.14436721801757812, 4.388671875, 3.6782379150390625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000195.npy"}
{"epoch": 0.2947845804988662, "step": 196, "batch_size": 64, "mean": 0.6964335441589355, "std": 2.5103509426116943, "min": -9.658111572265625, "p10": -1.1249202728271483, "median": 0.3300209045410156, "p90": 3.340200805664063, "max": 9.2344970703125, "pos_frac": 0.609375, "sample": [-1.07666015625, -3.757183074951172, -0.40380096435546875, -1.03387451171875, 0.0017642974853515625, -1.1456031799316406, 2.16937255859375, 0.5959701538085938, 1.0154495239257812, -0.7404022216796875, -0.97467041015625, 4.643619537353516, 2.357532501220703, -0.4616508483886719, -0.5892486572265625, 0.2140960693359375, -2.3638648986816406, 4.124820709228516, -4.3760986328125, 2.9574432373046875, -1.011922836303711, 0.7876968383789062, 4.012092590332031, 1.6715583801269531, 0.05117034912109375, -0.4104766845703125, 5.786834716796875, 1.2567596435546875, 2.297607421875, 9.2344970703125, -0.0157012939453125, 1.4592170715332031, 0.31513214111328125, -0.8348274230957031, 0.32070159912109375, 3.3700103759765625, -0.0424652099609375, 2.1097450256347656, -1.1738014221191406, 0.71099853515625, 2.5273056030273438, -0.0457763671875, 1.135650634765625, -0.2949562072753906, 0.3393402099609375, -1.8145065307617188, 0.2111358642578125, 0.284637451171875, -0.5382156372070312, 1.4103546142578125, 3.2706451416015625, 1.6426525115966797, -0.45068359375, -0.032756805419921875, 0.8316860198974609, 2.8764801025390625, 3.7421035766601562, 3.2662277221679688, 1.1575164794921875, -9.658111572265625, 1.4907302856445312, -0.9089889526367188, 1.0614166259765625, 2.016021728515625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000196.npy"}
{"epoch": 0.2962962962962963, "step": 197, "batch_size": 64, "mean": 1.2729198932647705, "std": 2.837847948074341, "min": -7.511871337890625, "p10": -1.8230527877807616, "median": 0.9325981140136719, "p90": 4.7753698348999025, "max": 9.323772430419922, "pos_frac": 0.71875, "sample": [-1.4610424041748047, 6.03472900390625, 0.0067462921142578125, -0.2324371337890625, 2.314788818359375, 0.7113037109375, -1.08807373046875, -7.511871337890625, 1.639495849609375, -1.8633880615234375, -0.4632568359375, 2.8641510009765625, -1.9364166259765625, -0.3906135559082031, 0.4212932586669922, 4.775218963623047, 0.631988525390625, 0.44919586181640625, 9.323772430419922, 4.4645233154296875, -0.29294586181640625, 0.32384490966796875, 2.727764129638672, 0.6611709594726562, 3.2898597717285156, 0.6157150268554688, 1.4455795288085938, -0.3907299041748047, 2.3003196716308594, 1.53094482421875, 4.409416198730469, 6.099761962890625, -6.0715484619140625, 4.775434494018555, -2.143341064453125, -2.0448989868164062, 2.366067886352539, 0.6034698486328125, 4.093902587890625, 0.49156951904296875, 0.10356521606445312, 2.6111602783203125, 3.3155593872070312, 2.709442138671875, 1.1929931640625, 5.352333068847656, -1.4267425537109375, 0.23531723022460938, 1.563507080078125, 3.6071720123291016, -1.7289371490478516, -0.8151397705078125, 4.916469573974609, 1.1538925170898438, -0.30535888671875, -3.132068634033203, 7.246917724609375, 2.3320999145507812, 0.4280529022216797, 1.1564712524414062, 1.5745429992675781, 0.4097137451171875, 1.768423080444336, 3.7160167694091797], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000197.npy"}
{"epoch": 0.29780801209372637, "step": 198, "batch_size": 64, "mean": 1.462914228439331, "std": 1.9142017364501953, "min": -3.0252609252929688, "p10": -0.8874179840087889, "median": 1.3487768173217773, "p90": 3.9224191665649415, "max": 6.8836669921875, "pos_frac": 0.796875, "sample": [-0.9885406494140625, 2.800994873046875, 0.5547084808349609, -0.44083404541015625, -0.25228118896484375, 1.995208740234375, 0.12653350830078125, 0.0990142822265625, 1.3665313720703125, 1.6209487915039062, 2.2538528442382812, 2.4221649169921875, 0.050769805908203125, -0.7356147766113281, 4.5779571533203125, 4.366424560546875, 0.3251495361328125, 2.702392578125, -0.2097930908203125, 1.0009422302246094, 2.0506057739257812, 1.2991752624511719, 3.741809844970703, -3.0252609252929688, 5.525657653808594, 0.2169189453125, -1.3146286010742188, -0.9524765014648438, 2.7142791748046875, 2.5316696166992188, 2.5181808471679688, 5.8794097900390625, 1.8497314453125, -1.9314422607421875, 1.2660446166992188, 0.58526611328125, 0.11852264404296875, -0.3842945098876953, 2.98468017578125, 3.4943313598632812, 2.5934906005859375, 6.8836669921875, 4.2493438720703125, 1.3879165649414062, 0.9149627685546875, 2.2443695068359375, -0.3432884216308594, 3.925342559814453, 3.915597915649414, 0.7823333740234375, 0.42884063720703125, 0.8792953491210938, 2.9080429077148438, 2.201883316040039, 0.17824554443359375, 0.07892608642578125, 1.742889404296875, -1.4114112854003906, -0.9686794281005859, 1.3310222625732422, 1.7939529418945312, 1.2124710083007812, 2.4698562622070312, 1.4227275848388672], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000198.npy"}
{"epoch": 0.29931972789115646, "step": 199, "batch_size": 64, "mean": 1.0273126363754272, "std": 2.4388198852539062, "min": -4.5362701416015625, "p10": -1.9289230346679684, "median": 0.8832950592041016, "p90": 3.876443481445313, "max": 10.039169311523438, "pos_frac": 0.65625, "sample": [3.3712081909179688, 2.655588150024414, 1.6546821594238281, 1.7983818054199219, 0.9186286926269531, -1.6581878662109375, 10.039169311523438, 0.5316162109375, 1.9656562805175781, 0.84796142578125, 1.07257080078125, 0.21660995483398438, 2.0990982055664062, 2.7305450439453125, 3.2179107666015625, -2.044952392578125, 4.567451477050781, 2.3455581665039062, 3.9482574462890625, 2.2300071716308594, -0.4965057373046875, -3.262897491455078, 1.6857528686523438, 2.3240127563476562, -1.1186389923095703, -0.3689422607421875, 0.4636993408203125, -1.0165786743164062, 1.9074440002441406, 3.3691635131835938, -0.45484161376953125, 0.4929218292236328, 2.8108062744140625, -0.7938385009765625, -0.173492431640625, -0.9678955078125, 5.38067626953125, -0.5108718872070312, 0.42136383056640625, 1.4845046997070312, -3.820465087890625, 1.156646728515625, -1.257415771484375, 3.7088775634765625, -2.5675582885742188, 2.93280029296875, 2.612192153930664, 0.1599884033203125, -0.6005439758300781, -2.3264312744140625, -0.3474597930908203, -2.2833175659179688, -0.5781478881835938, 3.9571456909179688, -4.5362701416015625, 0.6722335815429688, 3.50933837890625, -1.3468093872070312, 1.6201553344726562, 0.4748344421386719, 4.116142272949219, 0.4096832275390625, 4.893867492675781, 1.5049209594726562], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000199.npy"}
{"epoch": 0.30083144368858655, "step": 200, "batch_size": 64, "mean": 1.3263182640075684, "std": 2.544546365737915, "min": -6.4040374755859375, "p10": -1.409500503540039, "median": 1.1808414459228516, "p90": 4.500806427001954, "max": 7.7596435546875, "pos_frac": 0.71875, "sample": [1.7059688568115234, 2.717662811279297, 4.61572265625, 2.5547332763671875, 1.125152587890625, -0.07201004028320312, 5.374641418457031, 0.6538619995117188, 2.628093719482422, 1.2962417602539062, 0.3091583251953125, 0.5143280029296875, -1.3576622009277344, 4.2480316162109375, 0.8694725036621094, -0.3028297424316406, -0.9390792846679688, -0.06340980529785156, 4.569976806640625, -6.4040374755859375, 0.080535888671875, 1.2034912109375, 4.766670227050781, 6.185756683349609, 1.1184272766113281, 2.5463638305664062, 0.34400367736816406, 0.252685546875, 2.037630081176758, -2.382801055908203, 3.1325912475585938, 2.8570632934570312, -0.15297698974609375, 7.4326324462890625, 3.814891815185547, -0.6912384033203125, -4.3842010498046875, 1.1943740844726562, 1.825286865234375, 3.858783721923828, 3.8565139770507812, 1.220672607421875, 7.7596435546875, -0.0430755615234375, 0.74322509765625, -2.7791748046875, 0.5304489135742188, -1.027069091796875, 1.7404022216796875, 4.339408874511719, 0.00978851318359375, 2.354156494140625, 0.6899890899658203, -1.526437759399414, 2.3686370849609375, 2.8869781494140625, -1.4317169189453125, 1.4027347564697266, -0.4989356994628906, 1.1673088073730469, -0.6182432174682617, -2.743551254272461, 2.7358474731445312, 2.6628265380859375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000200.npy"}
{"epoch": 0.30234315948601664, "step": 201, "batch_size": 64, "mean": 1.3092677593231201, "std": 2.067314386367798, "min": -3.516204833984375, "p10": -1.1826400756835938, "median": 1.1619911193847656, "p90": 4.0439998626708995, "max": 6.319175720214844, "pos_frac": 0.703125, "sample": [-1.7328643798828125, 1.4203643798828125, 0.6143989562988281, 2.6166648864746094, -0.75323486328125, 0.9918212890625, 5.0521087646484375, -0.15062522888183594, 0.5260162353515625, -0.40238189697265625, 1.7073516845703125, 1.380258560180664, 2.6849136352539062, 2.9114151000976562, -0.17109298706054688, -1.45068359375, 2.489471435546875, 2.1715164184570312, -0.7667999267578125, 5.862138748168945, -1.6241798400878906, 1.2010574340820312, 0.10102081298828125, 1.1584243774414062, 0.0062656402587890625, 5.369518280029297, 0.6317882537841797, 3.113658905029297, -1.4509315490722656, 4.1441802978515625, 0.10967636108398438, 2.654144287109375, 2.0319137573242188, 0.493682861328125, 4.589176177978516, 6.319175720214844, -0.09476089477539062, -3.516204833984375, 2.11944580078125, 1.2279129028320312, 3.8102455139160156, 1.1797714233398438, -1.3890275955200195, 3.6511459350585938, 0.01401519775390625, 2.3601531982421875, 1.0565719604492188, -0.593414306640625, 0.6625957489013672, 2.5855178833007812, -0.08715534210205078, -0.8497314453125, -0.18848419189453125, 2.127880096435547, 3.080598831176758, 3.6298980712890625, 2.298797607421875, -1.1245880126953125, 1.4017410278320312, 1.165557861328125, 5.9241485595703125, 0.873870849609375, -0.1751708984375, -1.20751953125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000201.npy"}
{"epoch": 0.30385487528344673, "step": 202, "batch_size": 64, "mean": 1.1626203060150146, "std": 2.227665901184082, "min": -4.734931945800781, "p10": -0.9524820327758788, "median": 0.9276695251464844, "p90": 3.709580421447755, "max": 8.350227355957031, "pos_frac": 0.71875, "sample": [-0.8673839569091797, 2.25244140625, 5.768276214599609, 4.74371337890625, 2.4090795516967773, -0.35135650634765625, 1.0574188232421875, 1.8415451049804688, 0.7967891693115234, 2.507719039916992, 0.9245796203613281, 1.4878158569335938, 1.7438125610351562, 1.5831851959228516, -1.407867431640625, 5.752754211425781, 2.3779296875, 2.4429779052734375, -0.98895263671875, -0.6040763854980469, -2.408111572265625, 0.5073013305664062, 0.1638946533203125, 0.724151611328125, 0.18079757690429688, 1.312774658203125, 0.9307594299316406, 8.350227355957031, 1.431060791015625, 1.8179950714111328, -0.4085502624511719, 1.870767593383789, 0.6890506744384766, 2.0913238525390625, 0.761138916015625, -3.5185089111328125, 2.9230880737304688, 2.6056594848632812, 1.3504753112792969, -1.7495155334472656, 0.17429351806640625, 0.9064178466796875, 0.7435932159423828, 4.7283477783203125, 1.7688217163085938, -0.574066162109375, 3.471771240234375, 0.8436737060546875, 1.7035675048828125, -4.734931945800781, -0.2449493408203125, -0.7623786926269531, -0.18645286560058594, -0.5029716491699219, -0.6199493408203125, 0.91107177734375, 5.8587646484375, 1.4561500549316406, -0.05974388122558594, 3.8114986419677734, -3.2602691650390625, 2.0937118530273438, 0.9230499267578125, 2.8624954223632812], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000202.npy"}
{"epoch": 0.30536659108087677, "step": 203, "batch_size": 64, "mean": 1.2816410064697266, "std": 2.5195531845092773, "min": -3.4768905639648438, "p10": -0.8210708618164062, "median": 0.8168277740478516, "p90": 3.6991333007812504, "max": 12.14996337890625, "pos_frac": 0.640625, "sample": [1.7862701416015625, -0.7380142211914062, -2.5052261352539062, -0.143798828125, 0.006866455078125, 1.2706928253173828, -0.25243377685546875, 12.14996337890625, 1.2380218505859375, 0.3672370910644531, 3.727020263671875, 0.3865242004394531, -0.5623855590820312, 7.5877532958984375, -0.11105155944824219, 3.634063720703125, 3.259368896484375, -0.7297592163085938, 0.8880996704101562, 4.2878265380859375, -3.4768905639648438, 0.22332000732421875, 2.052276611328125, -0.4724273681640625, -2.5793609619140625, -1.17669677734375, 0.8211936950683594, 2.9150543212890625, 2.9521102905273438, 2.3110733032226562, 0.8124618530273438, 2.8286590576171875, -0.236663818359375, -0.342132568359375, 2.6194915771484375, -0.0291900634765625, 3.0197372436523438, 2.5834732055664062, 6.393135070800781, 0.8512039184570312, 2.9768600463867188, 2.59173583984375, 0.7530670166015625, 0.13527679443359375, 2.3731040954589844, -0.8566665649414062, 3.2307205200195312, 1.3931427001953125, -0.49127960205078125, 6.835704803466797, 0.8444328308105469, 0.49015045166015625, -0.3256072998046875, 1.7073211669921875, -0.6843414306640625, -1.6806907653808594, -0.25661468505859375, 0.18683242797851562, 4.5300750732421875, 0.9076805114746094, -0.0126800537109375, -0.4215545654296875, -1.177825927734375, 1.3593177795410156], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000203.npy"}
{"epoch": 0.30687830687830686, "step": 204, "batch_size": 64, "mean": 1.302083134651184, "std": 2.0468814373016357, "min": -3.8728599548339844, "p10": -0.8461115837097167, "median": 1.1690540313720703, "p90": 3.8984344482421873, "max": 6.696113586425781, "pos_frac": 0.734375, "sample": [6.696113586425781, 2.0986785888671875, -2.0225181579589844, 0.8783721923828125, 0.962677001953125, 5.866493225097656, 1.3233184814453125, 2.0364131927490234, 0.137939453125, -0.05523681640625, 2.3671951293945312, -1.6490707397460938, -0.7696437835693359, 4.346853256225586, -0.7882843017578125, -0.372711181640625, 2.3794403076171875, 1.0561637878417969, -2.6637344360351562, -0.48638153076171875, 0.21340179443359375, 3.0320587158203125, 1.1137466430664062, 3.0590896606445312, 0.2102031707763672, -0.692352294921875, 2.505462646484375, 3.9001693725585938, 1.0876951217651367, 1.1247634887695312, 3.1299095153808594, -0.85150146484375, 3.4252281188964844, 3.8387451171875, 1.2627315521240234, -0.8335351943969727, 1.9017257690429688, 4.065673828125, 0.8825416564941406, 3.8943862915039062, 0.21689605712890625, 1.2161216735839844, 2.0568084716796875, 4.333290100097656, -2.6519546508789062, 1.5247764587402344, 1.0543975830078125, 0.6888818740844727, 2.373382568359375, -0.15079498291015625, 1.812530517578125, 5.538215637207031, 1.2133445739746094, 1.7679214477539062, 1.087493896484375, 2.782806396484375, -0.63079833984375, 0.6153755187988281, -0.20953369140625, 1.9592971801757812, -3.8728599548339844, -1.2377204895019531, 1.8264961242675781, 2.4067230224609375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000204.npy"}
{"epoch": 0.30839002267573695, "step": 205, "batch_size": 64, "mean": 1.0774143934249878, "std": 2.237278461456299, "min": -3.6440563201904297, "p10": -1.6479705810546874, "median": 0.7544069290161133, "p90": 3.917917633056642, "max": 6.680820465087891, "pos_frac": 0.6875, "sample": [1.890838623046875, 0.3291778564453125, 0.14675521850585938, -1.2300224304199219, 0.6490936279296875, 6.680820465087891, 0.7555408477783203, 2.8017120361328125, 2.1248779296875, 3.318695068359375, 1.2374382019042969, 0.6810340881347656, 6.468162536621094, 1.9363155364990234, 3.541666030883789, 0.5304336547851562, 1.4007644653320312, -0.537750244140625, 1.02728271484375, 1.7462196350097656, -0.16228389739990234, 0.8637847900390625, 0.7499713897705078, 4.318061828613281, 1.3574676513671875, -0.5454330444335938, -1.0360946655273438, -1.1282882690429688, -2.4392013549804688, 4.0539398193359375, 1.0258407592773438, -1.2887248992919922, 0.56414794921875, -0.34877777099609375, -1.8139801025390625, 0.353515625, 1.8990478515625, 1.3126907348632812, -0.4200859069824219, 4.9380035400390625, -1.5260772705078125, 2.4990997314453125, 2.3304405212402344, 3.1554412841796875, 5.608604431152344, 2.4974365234375, 0.7532730102539062, -0.19073486328125, 5.5258026123046875, -2.5153274536132812, -0.9620170593261719, 3.3483963012695312, -0.4443397521972656, 1.644805908203125, -1.7002105712890625, 2.468292236328125, 2.7288360595703125, -3.6440563201904297, 3.6005325317382812, -2.677459716796875, 0.09032440185546875, -2.4108810424804688, 0.5053024291992188, 0.5163745880126953], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000205.npy"}
{"epoch": 0.30990173847316704, "step": 206, "batch_size": 64, "mean": 0.9063504934310913, "std": 2.8757071495056152, "min": -4.063323974609375, "p10": -1.8214618682861328, "median": 0.4391908645629883, "p90": 3.4159599304199233, "max": 14.580169677734375, "pos_frac": 0.578125, "sample": [-0.5282859802246094, -1.9751434326171875, 0.22161102294921875, -1.0095672607421875, -2.1191139221191406, -0.7604293823242188, 0.558624267578125, -0.170074462890625, 2.6735076904296875, 1.7456798553466797, -0.0318603515625, 4.5048980712890625, 2.0718002319335938, -1.2629756927490234, 0.85546875, 14.580169677734375, -1.3572540283203125, 3.114532470703125, -3.067413330078125, 0.07032012939453125, 7.234733581542969, -0.17203521728515625, -0.61639404296875, -2.9527740478515625, -1.4348411560058594, 2.7395172119140625, 1.9587154388427734, -2.2554969787597656, 0.5045337677001953, 1.9075469970703125, 2.333282470703125, 5.651878356933594, 0.6051979064941406, 3.901876449584961, -0.53271484375, -1.6765899658203125, -4.063323974609375, -0.422882080078125, -0.6890830993652344, 0.6913299560546875, 2.1878204345703125, 0.28279876708984375, 7.6527557373046875, -1.4227066040039062, -1.8235435485839844, 0.6819305419921875, 2.9121170043945312, 2.741180419921875, 1.2539291381835938, -1.800537109375, 3.0631561279296875, 0.37384796142578125, 1.9551467895507812, 1.4221744537353516, 3.5451431274414062, 2.903167724609375, -0.17748260498046875, 0.3220367431640625, -1.0395126342773438, 2.3540496826171875, -1.8166046142578125, -1.391754150390625, 1.3450775146484375, 1.6552734375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000206.npy"}
{"epoch": 0.31141345427059713, "step": 207, "batch_size": 64, "mean": 1.1258769035339355, "std": 2.4084811210632324, "min": -7.199073791503906, "p10": -1.049330902099609, "median": 0.6988925933837891, "p90": 4.04200134277344, "max": 12.503204345703125, "pos_frac": 0.734375, "sample": [12.503204345703125, -0.2812166213989258, 0.39904022216796875, 0.51043701171875, 0.8575096130371094, 1.0146713256835938, 0.665008544921875, 1.0309524536132812, -0.17436981201171875, 2.1767120361328125, 0.5111427307128906, 0.7327766418457031, 0.4398460388183594, 1.5125350952148438, -1.2816619873046875, -1.5379447937011719, -1.31219482421875, 4.304107666015625, 2.0811166763305664, 0.47930908203125, 4.90130615234375, 2.0743331909179688, 0.2980461120605469, 0.27044677734375, 0.1625213623046875, 0.8838577270507812, -0.7159652709960938, 3.430419921875, -0.772705078125, 0.9047927856445312, 2.3680877685546875, 1.20281982421875, 5.801849365234375, 0.559967041015625, 0.4568023681640625, -0.16766738891601562, -1.3744659423828125, 1.8531494140625, 0.2552490234375, -0.011216163635253906, 1.18927001953125, 1.4835968017578125, 1.9023475646972656, 0.8794174194335938, 0.60980224609375, 3.1741714477539062, 1.0608596801757812, -0.0538330078125, 1.9608268737792969, 2.1486663818359375, -7.199073791503906, 0.6165084838867188, 1.35675048828125, 2.321765899658203, 0.2617950439453125, 4.601875305175781, 4.944633483886719, -0.06414794921875, 2.3973159790039062, -0.0890045166015625, -1.5837516784667969, -0.4760894775390625, -1.1678848266601562, 4.767692565917969], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000207.npy"}
{"epoch": 0.3129251700680272, "step": 208, "batch_size": 64, "mean": 0.6764594316482544, "std": 2.1121580600738525, "min": -3.019317626953125, "p10": -2.043019676208496, "median": 0.46144914627075195, "p90": 3.3601455688476567, "max": 6.162315368652344, "pos_frac": 0.5625, "sample": [0.24171829223632812, -0.09650230407714844, 1.716604232788086, -2.4036331176757812, 5.8891448974609375, 3.2519989013671875, 0.5976572036743164, -1.2579345703125, -0.769622802734375, -2.095428466796875, -0.10929107666015625, -1.177215576171875, 2.0397300720214844, 1.6506195068359375, 1.0993499755859375, 3.532686233520508, 0.050914764404296875, 3.406494140625, 0.08372688293457031, -0.5693283081054688, 2.1157073974609375, -1.4660491943359375, -0.3521537780761719, 0.7050857543945312, 2.0746002197265625, 1.7696685791015625, 1.6847686767578125, -0.3917675018310547, -0.031406402587890625, 4.353233337402344, -1.9207324981689453, 1.3328704833984375, -1.5892333984375, 0.3252410888671875, -1.7369003295898438, 2.7559890747070312, 3.5246200561523438, -0.47359466552734375, -2.2119903564453125, -2.137310028076172, 6.162315368652344, 1.1416549682617188, -0.04134368896484375, -2.9444122314453125, 1.7812004089355469, 2.9463043212890625, 2.42022705078125, -0.3370475769042969, 1.8424720764160156, 2.300830841064453, 0.7089157104492188, 2.6943511962890625, -0.00754547119140625, 2.3823394775390625, -1.4210014343261719, 0.7011604309082031, -0.2392120361328125, -1.1220321655273438, -2.861743927001953, -1.0553817749023438, -3.019317626953125, 1.0474624633789062, 1.3868026733398438, 5.414070129394531], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000208.npy"}
{"epoch": 0.3144368858654573, "step": 209, "batch_size": 64, "mean": 1.4544110298156738, "std": 2.3918509483337402, "min": -2.7262725830078125, "p10": -1.3085895538330077, "median": 1.3984642028808594, "p90": 4.453829002380371, "max": 8.341629028320312, "pos_frac": 0.65625, "sample": [1.6177597045898438, 2.561227798461914, -2.1090621948242188, -1.0533828735351562, -0.6796607971191406, 4.55084228515625, 1.9781341552734375, 4.245391845703125, 0.7805042266845703, 8.341629028320312, 1.01458740234375, -0.426177978515625, -2.239501953125, 5.902442932128906, -2.08880615234375, -0.4162750244140625, 1.1601638793945312, -0.22621536254882812, -0.8534469604492188, 7.204132080078125, 0.8280982971191406, -2.3682518005371094, -0.6614913940429688, 2.0259246826171875, 3.4844512939453125, 1.7189712524414062, -0.2857208251953125, 3.317138671875, 2.74053955078125, 2.0780029296875, 1.9604339599609375, 2.5423583984375, 2.4634132385253906, 4.510307312011719, 2.6898956298828125, 0.23198318481445312, 3.0051422119140625, -0.134185791015625, -2.7262725830078125, 2.2688751220703125, -0.16400146484375, -0.4844827651977539, 1.4765548706054688, -1.177459716796875, 1.283172607421875, 2.1786460876464844, 4.181316375732422, 4.811603546142578, 0.8940963745117188, 2.509838104248047, 0.48905372619628906, -2.6181678771972656, -1.0328788757324219, 1.805908203125, -0.0923004150390625, 0.8770866394042969, 6.385871887207031, 1.32037353515625, 2.9188995361328125, 2.6263036727905273, 4.322046279907227, 3.264669418334961, -1.3647880554199219, -0.28295135498046875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000209.npy"}
{"epoch": 0.31594860166288735, "step": 210, "batch_size": 64, "mean": 1.7446017265319824, "std": 2.5713162422180176, "min": -2.836597442626953, "p10": -1.3415172576904297, "median": 1.5745773315429688, "p90": 5.169536590576172, "max": 10.00311279296875, "pos_frac": 0.671875, "sample": [5.181816101074219, -0.184112548828125, 0.861724853515625, 0.8384475708007812, -0.344940185546875, 1.1214065551757812, 4.653255462646484, 2.8133468627929688, 3.8683395385742188, 0.94378662109375, 5.248527526855469, 2.732147216796875, 3.1503067016601562, 10.00311279296875, 5.8131103515625, -1.3592109680175781, -2.5301952362060547, -2.3169307708740234, 5.1408843994140625, -0.6440696716308594, 1.3336410522460938, 1.5743560791015625, 1.7685470581054688, -0.0097808837890625, 4.3031768798828125, -2.2548370361328125, -0.8668537139892578, 2.8597488403320312, 3.043914794921875, -1.375335693359375, 4.0504913330078125, -0.47926902770996094, 1.574798583984375, -1.30023193359375, 1.6454048156738281, -0.28311920166015625, 0.7339935302734375, 1.2539443969726562, -0.922760009765625, 5.3792724609375, -1.1696395874023438, 2.280303955078125, 2.7059783935546875, 0.7839736938476562, 5.191829681396484, -0.3946371078491211, 4.919300079345703, 0.4638710021972656, 4.222892761230469, 3.6798667907714844, -0.48815155029296875, 2.916362762451172, 4.424663543701172, -2.836597442626953, -0.16707611083984375, 6.406379699707031, 3.7271194458007812, 1.9950523376464844, 2.845808982849121, -0.2654685974121094, 1.6863250732421875, -2.0788421630859375, 2.8358917236328125, 0.9494476318359375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000210.npy"}
{"epoch": 0.31746031746031744, "step": 211, "batch_size": 64, "mean": 1.0514049530029297, "std": 2.0608527660369873, "min": -4.824432373046875, "p10": -1.0917461395263672, "median": 1.0486526489257812, "p90": 3.155172348022462, "max": 9.423126220703125, "pos_frac": 0.6875, "sample": [1.6706695556640625, -1.1726875305175781, 2.5526275634765625, 0.22811508178710938, -0.6948165893554688, 0.9171829223632812, -4.824432373046875, 1.6043548583984375, 1.7413330078125, 0.8753395080566406, -0.5959930419921875, 1.4498577117919922, -1.0833587646484375, 3.8003311157226562, -0.6599884033203125, 3.564302444458008, 1.1258354187011719, -0.5066490173339844, 1.5852279663085938, 0.14331817626953125, 1.9902076721191406, 0.1766357421875, 2.1986236572265625, -0.8949127197265625, -0.9278450012207031, -1.2599620819091797, 0.5217361450195312, 1.652435302734375, 1.4947052001953125, 1.6229820251464844, 0.13145065307617188, -0.1931915283203125, 0.88543701171875, -0.6964645385742188, 2.5096206665039062, 4.356224060058594, 2.2553787231445312, -0.1461944580078125, -1.0953407287597656, 0.9714698791503906, 2.7781219482421875, 9.423126220703125, 4.223419189453125, 0.8313179016113281, 2.127166748046875, 2.8920021057128906, 2.717334747314453, 1.928009033203125, -1.5596542358398438, 3.2679595947265625, -3.51654052734375, 2.0019912719726562, 1.9314117431640625, 1.827239990234375, 3.8435592651367188, 2.1289234161376953, 2.3445663452148438, 2.7931690216064453, -0.9669380187988281, -0.509368896484375, -1.3269233703613281, 0.10472869873046875, -0.062206268310546875, 0.7939376831054688], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000211.npy"}
{"epoch": 0.31897203325774753, "step": 212, "batch_size": 64, "mean": 1.5043418407440186, "std": 2.1869001388549805, "min": -3.2071380615234375, "p10": -1.433939743041992, "median": 1.4746980667114258, "p90": 4.225732803344728, "max": 7.220085144042969, "pos_frac": 0.765625, "sample": [6.466011047363281, 3.5581626892089844, 3.474334716796875, 1.9516563415527344, -0.13825035095214844, 1.6903343200683594, 4.5903167724609375, 1.4539966583251953, -2.149383544921875, 3.4632720947265625, -0.6843986511230469, -2.333221435546875, 3.5037307739257812, 4.424041748046875, 0.8744964599609375, 0.830780029296875, 1.9504470825195312, 1.4953994750976562, -3.2071380615234375, 0.7071304321289062, 3.3806686401367188, 0.4124603271484375, 0.9006881713867188, 0.8255081176757812, -1.4870071411132812, 0.908355712890625, 2.1065673828125, -1.1298980712890625, 2.441242218017578, 6.2187042236328125, 2.3551101684570312, 2.6120376586914062, 1.7383499145507812, 0.31354522705078125, 4.825630187988281, 1.3823089599609375, -0.6070671081542969, 1.5066986083984375, 0.4182777404785156, 3.4650497436523438, 2.132232666015625, -0.01190948486328125, 2.16351318359375, -2.1733932495117188, 0.6634960174560547, 2.643047332763672, 0.13601303100585938, 7.220085144042969, -0.8189315795898438, -1.6727142333984375, 1.4199142456054688, 0.43128204345703125, 1.7443923950195312, -1.3101158142089844, 2.7055816650390625, 4.502513885498047, 3.1783447265625, 3.644989013671875, 3.763011932373047, 1.7478866577148438, -2.015106201171875, 1.0721588134765625, -0.7769908905029297, 1.3796310424804688], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000212.npy"}
{"epoch": 0.3204837490551776, "step": 213, "batch_size": 64, "mean": 1.2062104940414429, "std": 2.395775556564331, "min": -5.184131622314453, "p10": -1.8675949096679687, "median": 1.2959861755371094, "p90": 4.290966796875, "max": 7.412109375, "pos_frac": 0.703125, "sample": [0.9964218139648438, -1.7957000732421875, 1.9061737060546875, 4.2354888916015625, 2.0047340393066406, 1.052591323852539, -0.1185455322265625, -0.35038185119628906, 2.488128662109375, 5.016880035400391, 1.9569854736328125, 0.960601806640625, -0.74884033203125, -0.571864128112793, -5.184131622314453, -2.97442626953125, 2.786794662475586, 3.57244873046875, -0.8805389404296875, 1.958831787109375, 2.483234405517578, -2.5146141052246094, 4.199676513671875, 5.115020751953125, 5.785614013671875, 1.4488143920898438, -0.7483577728271484, -1.3247146606445312, -0.09508514404296875, 0.3636360168457031, -1.9079608917236328, -1.90185546875, 0.8271408081054688, 2.3886890411376953, -0.0194091796875, -3.960693359375, 0.8112859725952148, 2.4477615356445312, 2.635690689086914, 1.8902816772460938, 1.6256637573242188, 1.6790618896484375, 4.447723388671875, 0.7525463104248047, 4.3147430419921875, 1.3248138427734375, 2.74456787109375, 1.6489448547363281, 6.865394592285156, 0.10752105712890625, -0.07152557373046875, 0.6544151306152344, -1.898406982421875, 0.42473602294921875, -1.284393310546875, 1.2671585083007812, 0.06915283203125, 7.412109375, 1.6590118408203125, 0.1755828857421875, 3.1180419921875, 1.4173259735107422, 2.097412109375, 2.4100608825683594], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000213.npy"}
{"epoch": 0.3219954648526077, "step": 214, "batch_size": 64, "mean": 1.1592485904693604, "std": 2.039320230484009, "min": -2.1801528930664062, "p10": -1.3707973480224607, "median": 0.8710803985595703, "p90": 3.4330230712890635, "max": 8.236679077148438, "pos_frac": 0.78125, "sample": [1.9529666900634766, 0.17667388916015625, 1.06439208984375, -0.409027099609375, -0.12366104125976562, 2.825143814086914, 2.037506103515625, 1.2184391021728516, 1.8008003234863281, 1.5540924072265625, 3.921764373779297, 0.8743324279785156, 0.5750961303710938, -1.4803619384765625, 1.3450241088867188, 2.421722412109375, 0.8867416381835938, -1.6920013427734375, 1.1168785095214844, 2.5523319244384766, 0.9514122009277344, 2.5278244018554688, 0.7266845703125, 0.526214599609375, 0.0661764144897461, 3.1566200256347656, 0.22341156005859375, 0.08997535705566406, 4.083332061767578, 0.86376953125, 1.5552444458007812, 0.867828369140625, -2.0618667602539062, -2.1801528930664062, 0.1351776123046875, 2.941936492919922, -1.880157470703125, 0.57672119140625, 4.581302642822266, 8.236679077148438, 1.8336944580078125, -1.8368587493896484, 2.0311737060546875, -0.4089622497558594, 2.3768043518066406, 7.9476776123046875, 0.4511985778808594, -0.844818115234375, 1.4324836730957031, 0.12188148498535156, 0.444854736328125, 3.181182861328125, 0.32289886474609375, -0.07364654541015625, 1.0228958129882812, 0.6972465515136719, 5.0780029296875, 0.8611068725585938, -0.7840652465820312, 0.24121856689453125, 3.54095458984375, 1.0611648559570312, -1.9980239868164062, -1.1151466369628906], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000214.npy"}
{"epoch": 0.3235071806500378, "step": 215, "batch_size": 64, "mean": 1.53515625, "std": 2.1010854244232178, "min": -2.6838531494140625, "p10": -1.0735557556152342, "median": 1.4552345275878906, "p90": 4.304962158203127, "max": 7.6873626708984375, "pos_frac": 0.734375, "sample": [-0.2209930419921875, -0.7285308837890625, -0.1519927978515625, 2.5619430541992188, 2.52545166015625, 0.6956329345703125, 1.7480926513671875, -1.8712615966796875, 5.09971809387207, 5.078865051269531, -0.1151275634765625, -1.1416893005371094, -0.64453125, 0.94091796875, 1.4707412719726562, 3.371673583984375, 0.2205352783203125, 1.6351089477539062, 2.0819549560546875, 2.5193939208984375, 1.913726806640625, 2.24395751953125, -1.2446212768554688, 2.6469955444335938, 3.684103012084961, 0.7449169158935547, -2.3245086669921875, 2.463054656982422, 2.6373977661132812, -1.5986356735229492, 0.9048614501953125, 2.5889205932617188, 0.9325180053710938, -1.1986846923828125, -0.36539459228515625, -2.6838531494140625, 2.102701187133789, 2.0248184204101562, 1.439727783203125, 2.819549560546875, 0.7033443450927734, 0.31999969482421875, -0.363128662109375, -0.44402313232421875, 3.5832061767578125, 7.6873626708984375, 3.0557708740234375, 5.211709976196289, 1.2854738235473633, 4.4784088134765625, 3.828887939453125, 3.9002532958984375, 2.6378860473632812, 0.4197883605957031, 0.1312274932861328, 0.6093101501464844, 3.10614013671875, 4.482513427734375, 5.737545013427734, -0.44788074493408203, 3.1836395263671875, 0.9931678771972656, 0.2565155029296875, -0.9145774841308594], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000215.npy"}
{"epoch": 0.3250188964474679, "step": 216, "batch_size": 64, "mean": 1.5922796726226807, "std": 2.601928472518921, "min": -2.7611541748046875, "p10": -1.3100069046020508, "median": 1.4581298828125, "p90": 4.657392883300782, "max": 12.996150970458984, "pos_frac": 0.75, "sample": [-1.1534652709960938, 3.8101577758789062, 1.7130126953125, 1.1746902465820312, 4.2018890380859375, 1.7454757690429688, 1.3517284393310547, 0.18694305419921875, 2.9856491088867188, 4.267208099365234, 0.3120880126953125, 2.4517898559570312, 4.730804443359375, 1.9932403564453125, -0.7585601806640625, 0.0841522216796875, 1.8172340393066406, 1.6591949462890625, -1.3149261474609375, 3.3900299072265625, 3.3353042602539062, 3.965087890625, 0.25629425048828125, 1.4882354736328125, 2.5779495239257812, -1.361602783203125, 4.4860992431640625, 0.8545913696289062, 1.25897216796875, 3.7265472412109375, 1.619384765625, -2.7611541748046875, -0.04268646240234375, 1.7319564819335938, 1.586212158203125, 3.054656982421875, -0.5128631591796875, -0.7470560073852539, 2.6481246948242188, -1.8327064514160156, 0.18581008911132812, 12.996150970458984, 5.666114807128906, 0.39931488037109375, -1.8749885559082031, -0.4271240234375, 3.31585693359375, 0.05245780944824219, -2.1319618225097656, 0.238433837890625, 5.556056976318359, -0.418701171875, -2.512176513671875, 0.18337631225585938, 4.84832763671875, -0.7119131088256836, 1.8809928894042969, 1.4280242919921875, 0.3090972900390625, 4.74139404296875, -1.2985286712646484, 2.3780288696289062, 0.369293212890625, 6.752876281738281], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000216.npy"}
{"epoch": 0.32653061224489793, "step": 217, "batch_size": 64, "mean": 1.460935115814209, "std": 2.5594539642333984, "min": -2.9733963012695312, "p10": -1.4678291320800778, "median": 0.8692626953125, "p90": 5.119570159912111, "max": 9.309799194335938, "pos_frac": 0.703125, "sample": [1.0966720581054688, 6.384971618652344, -0.8931198120117188, 0.5209121704101562, -0.057369232177734375, -0.2870521545410156, -0.06499862670898438, 9.309799194335938, -0.33966064453125, -2.3284912109375, 2.4697723388671875, 5.251701354980469, 1.5597305297851562, 0.7221832275390625, 2.8774337768554688, 0.703765869140625, 0.156463623046875, 0.8385353088378906, 0.6879425048828125, 0.8989410400390625, 0.12612152099609375, 5.255714416503906, 3.1135406494140625, 2.935901641845703, 2.841339111328125, 4.21928596496582, -2.649871826171875, 0.5661697387695312, -0.30240631103515625, 4.710136413574219, 1.75506591796875, 1.8151702880859375, 3.1058807373046875, -2.9733963012695312, 1.1334495544433594, -2.154447555541992, 0.4417266845703125, 2.8500213623046875, -1.234344482421875, -1.5678939819335938, -2.56298828125, 5.581768035888672, -0.8051071166992188, -1.0909423828125, 2.2515792846679688, -0.9010086059570312, -0.8196258544921875, 1.98956298828125, 0.2963409423828125, 6.708892822265625, 1.259866714477539, 4.109867095947266, 1.6923465728759766, 0.321197509765625, 4.8112640380859375, -0.206634521484375, 7.4752044677734375, 1.9869308471679688, 0.8395843505859375, 3.3125228881835938, 2.4797210693359375, 2.89129638671875, 0.03609466552734375, -1.6531829833984375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000217.npy"}
{"epoch": 0.328042328042328, "step": 218, "batch_size": 64, "mean": 1.0281034708023071, "std": 1.928337812423706, "min": -2.5627212524414062, "p10": -1.5475799560546872, "median": 0.9959068298339844, "p90": 3.1119285583496095, "max": 6.9720306396484375, "pos_frac": 0.734375, "sample": [-0.6586761474609375, 0.24007606506347656, 2.315643310546875, 0.9163703918457031, 1.621337890625, 0.254913330078125, 1.964752197265625, 1.6048583984375, 1.081512451171875, 3.1370162963867188, -0.0335540771484375, 0.9974708557128906, 1.4683456420898438, -0.7791061401367188, -1.0858421325683594, -1.3844146728515625, 1.8101615905761719, 5.797523498535156, -1.9942703247070312, 0.94586181640625, 0.9113082885742188, 2.5715713500976562, 1.0516128540039062, 0.1143798828125, 2.5194530487060547, 5.0152587890625, 2.0167694091796875, 3.72930908203125, 2.1274490356445312, 1.190673828125, 0.04123687744140625, 0.02423095703125, -1.97784423828125, -1.9923648834228516, -0.7279434204101562, 0.9943428039550781, 0.9656219482421875, 0.8761940002441406, 1.201019287109375, 1.28326416015625, 4.235256195068359, 2.07720947265625, 0.3638496398925781, -1.8180007934570312, 0.2023143768310547, -1.6175079345703125, 2.275054931640625, -0.8436622619628906, -0.3914642333984375, -2.5627212524414062, 1.546945571899414, -0.5703392028808594, 3.0359039306640625, 0.0458831787109375, 2.6905059814453125, 1.4723358154296875, 3.0533905029296875, 6.9720306396484375, -2.555614471435547, -0.4752922058105469, 1.3161849975585938, 0.46854400634765625, 3.8481903076171875, 2.8741016387939453], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000218.npy"}
{"epoch": 0.3295540438397581, "step": 219, "batch_size": 64, "mean": 1.1794754266738892, "std": 2.1709413528442383, "min": -3.8846282958984375, "p10": -1.2443107604980468, "median": 1.2916173934936523, "p90": 3.799070358276367, "max": 7.2877197265625, "pos_frac": 0.671875, "sample": [4.333137512207031, 3.771526336669922, 1.6345901489257812, 2.0566654205322266, 4.3261260986328125, -1.0659942626953125, -1.3207321166992188, 1.1111583709716797, 0.9422569274902344, -2.7775497436523438, 0.6637248992919922, 5.7337799072265625, 4.58782958984375, -0.19845199584960938, -0.8065032958984375, 1.0426559448242188, -0.1381683349609375, 0.7551727294921875, -0.2520027160644531, -0.5275650024414062, -0.25019359588623047, -3.3365020751953125, -9.1552734375e-05, 0.3902168273925781, -1.33160400390625, -0.04244232177734375, 3.8108749389648438, -1.0474567413330078, 3.415313720703125, 1.8688774108886719, 1.6082496643066406, 1.9488983154296875, 1.7155532836914062, 3.4234771728515625, 3.1420555114746094, 4.631988525390625, 0.799468994140625, 1.550994873046875, -2.8094253540039062, 0.00284576416015625, 0.36694908142089844, 1.6238861083984375, 1.6394271850585938, -0.39730072021484375, 1.9185028076171875, -1.6292953491210938, 3.160188674926758, -0.6210784912109375, 1.4921798706054688, 2.7687606811523438, -1.0499267578125, 1.7356948852539062, 3.5932159423828125, 0.7216567993164062, -3.8846282958984375, 3.1562271118164062, 1.7912750244140625, 0.12740325927734375, 2.46539306640625, -0.79095458984375, 3.4892539978027344, 7.2877197265625, 1.6870460510253906, 1.472076416015625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000219.npy"}
{"epoch": 0.3310657596371882, "step": 220, "batch_size": 64, "mean": 0.9770974516868591, "std": 2.180670738220215, "min": -4.146526336669922, "p10": -1.6006484985351561, "median": 0.609464168548584, "p90": 3.4328796386718756, "max": 7.35992431640625, "pos_frac": 0.71875, "sample": [-1.2212581634521484, 1.7379150390625, 0.5287971496582031, 0.17911529541015625, 0.37384700775146484, 0.149871826171875, 3.2315673828125, -0.851104736328125, 0.09719657897949219, 0.5874481201171875, 0.2060394287109375, 5.526939392089844, 1.7823677062988281, -3.053760528564453, 0.5880622863769531, 2.273313522338867, 2.342742919921875, -2.2026214599609375, -1.0840835571289062, 3.8193893432617188, 3.3254165649414062, 1.2362289428710938, -1.272247314453125, -1.001800537109375, 1.6384201049804688, -2.5331878662109375, 0.3636436462402344, -1.622406005859375, 7.35992431640625, 2.9292526245117188, 1.6017684936523438, 0.46482086181640625, 1.320831298828125, 2.6312294006347656, 2.375396728515625, -0.6726646423339844, 2.3432769775390625, 0.729888916015625, 4.520463943481445, -1.6752471923828125, 0.5701751708984375, 0.6276350021362305, 1.329620361328125, 2.175905227661133, 6.60565185546875, -0.6244125366210938, 3.4789352416992188, -0.8175220489501953, -0.5083084106445312, 1.5312767028808594, 0.6582870483398438, -4.146526336669922, 2.536134719848633, -0.22596359252929688, 0.5912933349609375, 1.4705657958984375, -1.7893180847167969, 2.9269180297851562, 0.6876907348632812, 0.44973182678222656, 5.03485107421875, -1.5498809814453125, 1.9855422973632812, 0.46115684509277344], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000220.npy"}
{"epoch": 0.3325774754346183, "step": 221, "batch_size": 64, "mean": 1.2063356637954712, "std": 2.2205123901367188, "min": -3.716583251953125, "p10": -1.541301345825195, "median": 1.0706520080566406, "p90": 4.692399215698243, "max": 7.0185089111328125, "pos_frac": 0.703125, "sample": [2.994274139404297, 0.07660293579101562, 1.2209701538085938, 1.3431472778320312, -0.6856193542480469, 1.77294921875, 0.05967140197753906, 1.01141357421875, 0.7730712890625, -2.3104705810546875, -1.6648483276367188, -0.5484466552734375, 1.34417724609375, 1.5989532470703125, -0.03265380859375, 0.7618598937988281, -0.28267669677734375, 0.44362640380859375, 1.0490455627441406, -0.02103424072265625, 3.0384864807128906, 5.1598358154296875, 2.1605606079101562, 3.0067138671875, 4.47698974609375, 0.344696044921875, 1.0922584533691406, 4.784717559814453, -0.650054931640625, -0.5702047348022461, 5.469757080078125, 2.643280029296875, -1.7727508544921875, 2.0085525512695312, 3.876312255859375, 1.0110588073730469, 4.09979248046875, -1.845306396484375, 2.250255584716797, 2.5458984375, 1.7736358642578125, 0.4463081359863281, -1.0328254699707031, 1.3055458068847656, 1.0961875915527344, -0.5729827880859375, -0.3638725280761719, 5.1507415771484375, -2.3343048095703125, -1.2530250549316406, -3.2415924072265625, -3.716583251953125, 1.7720375061035156, 5.349029541015625, 1.767852783203125, -0.42061614990234375, 1.0282974243164062, 1.727020263671875, 7.0185089111328125, 2.094329833984375, 0.10007476806640625, 1.6866226196289062, 5.3450927734375, 0.44513702392578125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000221.npy"}
{"epoch": 0.3340891912320484, "step": 222, "batch_size": 64, "mean": 1.2545099258422852, "std": 1.9389935731887817, "min": -2.2598514556884766, "p10": -1.0143455505371093, "median": 1.0978412628173828, "p90": 3.7100227355957043, "max": 6.800691604614258, "pos_frac": 0.71875, "sample": [-0.42978858947753906, -1.01910400390625, 1.0072669982910156, 1.3092041015625, 1.7633743286132812, 2.3920326232910156, 0.608184814453125, -0.6187629699707031, 2.1605987548828125, 3.4482650756835938, 4.12042236328125, 0.9486656188964844, 1.4477996826171875, 0.5629806518554688, 1.7126331329345703, -0.494384765625, -1.3097457885742188, 0.9829673767089844, -0.5506925582885742, 1.1015663146972656, 1.0382461547851562, 0.1526346206665039, 3.276519775390625, -0.8966598510742188, -0.517669677734375, 4.1702728271484375, 1.49755859375, 4.83465576171875, -2.2598514556884766, 2.1474990844726562, 1.312295913696289, 3.412261962890625, 2.2793960571289062, 3.065013885498047, 3.2295684814453125, 2.6984634399414062, 2.101238250732422, -1.8035049438476562, -1.8796043395996094, -0.6531524658203125, -0.7288894653320312, 3.253570556640625, 1.3276519775390625, 1.5258255004882812, 0.8156833648681641, 2.243011474609375, -0.5207595825195312, -1.0032424926757812, 1.0184860229492188, 0.8487091064453125, 1.0941162109375, 0.3994712829589844, 5.261009216308594, 6.800691604614258, -0.8965606689453125, 2.4758758544921875, 4.546966552734375, -1.6803512573242188, 0.84893798828125, 3.82220458984375, 0.1136932373046875, 1.4390316009521484, 3.0311279296875, -2.096282958984375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000222.npy"}
{"epoch": 0.3356009070294785, "step": 223, "batch_size": 64, "mean": 1.5568515062332153, "std": 1.8905344009399414, "min": -2.019134521484375, "p10": -0.6282981872558593, "median": 1.2212886810302734, "p90": 4.262620544433594, "max": 7.0699005126953125, "pos_frac": 0.765625, "sample": [0.8908882141113281, 4.434597015380859, 2.2018661499023438, -0.41461181640625, -0.373291015625, 2.93914794921875, -1.51336669921875, 1.0338287353515625, -0.6796798706054688, 3.458467483520508, 2.729106903076172, 6.1746368408203125, 1.7957096099853516, 1.3624114990234375, -0.298736572265625, -0.9270477294921875, -0.1285400390625, 1.19903564453125, 2.7385406494140625, -1.280487060546875, 2.50970458984375, 0.7262954711914062, -0.23749542236328125, 1.3851318359375, 1.620269775390625, 0.7724647521972656, -0.1589069366455078, 2.1762542724609375, 0.09575843811035156, -2.019134521484375, 2.3394622802734375, 1.2322883605957031, 0.048919677734375, 1.1383438110351562, 1.0447616577148438, 4.340660095214844, 3.5137863159179688, 3.5870819091796875, -0.7376518249511719, 0.6466445922851562, 0.1592559814453125, 2.0783653259277344, 1.2102890014648438, 2.31060791015625, 3.376953125, 1.6691055297851562, 4.20758056640625, 6.056880950927734, -0.5084075927734375, 2.5614585876464844, 7.0699005126953125, -0.2188720703125, 4.2862091064453125, 4.50146484375, 1.0957489013671875, 1.2739715576171875, 0.8870773315429688, 2.574462890625, 1.1140556335449219, -0.9656181335449219, 0.8355712890625, 1.5396270751953125, 2.163850784301758, 0.9918441772460938], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000223.npy"}
{"epoch": 0.3371126228269085, "step": 224, "batch_size": 64, "mean": 1.128204584121704, "std": 1.7750452756881714, "min": -2.647674560546875, "p10": -1.0183700561523437, "median": 1.0171728134155273, "p90": 3.5483745574951175, "max": 6.959911346435547, "pos_frac": 0.78125, "sample": [0.326263427734375, 4.925689697265625, 0.3257589340209961, 1.8676300048828125, 0.188934326171875, 1.1948661804199219, 1.5601921081542969, 0.6095123291015625, -1.4907169342041016, -0.6147994995117188, 0.46231842041015625, 2.9893932342529297, 3.2955894470214844, 4.029056549072266, 1.8808727264404297, 0.3323192596435547, 0.1925973892211914, 2.248347282409668, 2.2128829956054688, -0.8472900390625, 0.298309326171875, 3.4631576538085938, 0.21728515625, -0.15213584899902344, -1.0916900634765625, -1.7004776000976562, 0.9437713623046875, 1.5981979370117188, 2.203216552734375, -1.46728515625, 2.6471824645996094, 0.5570774078369141, 0.2282733917236328, 0.1833038330078125, -0.03481292724609375, 0.024667739868164062, -0.3730754852294922, -1.8414840698242188, 0.9397354125976562, 0.967681884765625, 1.6197738647460938, 0.032512664794921875, 1.0666637420654297, 1.0961265563964844, -2.647674560546875, 0.49710845947265625, 4.363346099853516, 2.259693145751953, 2.2457962036132812, 1.8474311828613281, -1.1966724395751953, 1.0829505920410156, 3.5848960876464844, 1.3009490966796875, 6.959911346435547, 1.741668701171875, 1.1259231567382812, 3.032806396484375, -0.7037067413330078, 3.711597442626953, -0.74053955078125, 4.00555419921875, 1.1800689697265625, 1.4385948181152344], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000224.npy"}
{"epoch": 0.3386243386243386, "step": 225, "batch_size": 64, "mean": 1.2687162160873413, "std": 2.4120655059814453, "min": -4.66729736328125, "p10": -1.6665958404541013, "median": 1.1958084106445312, "p90": 3.9951929092407226, "max": 6.8519287109375, "pos_frac": 0.6875, "sample": [-0.9917755126953125, -0.0105438232421875, -1.2612037658691406, 0.0414581298828125, 2.50390625, 5.005706787109375, 1.6869544982910156, -0.6557178497314453, -0.7617015838623047, 3.9103317260742188, 0.1924896240234375, 3.266876220703125, -0.28802490234375, 3.1679840087890625, 1.11956787109375, -0.01918792724609375, -0.039703369140625, 0.6726150512695312, 2.1221084594726562, 3.6455764770507812, 3.8289737701416016, 2.443666458129883, 6.8519287109375, 0.7330093383789062, 6.51776123046875, 0.9544258117675781, 2.9468841552734375, 1.4330062866210938, 1.5116500854492188, -0.8441505432128906, 1.4066429138183594, -4.66729736328125, 1.72113037109375, -2.3502540588378906, 4.007301330566406, 1.881866455078125, 1.1056594848632812, 0.937835693359375, 3.5347900390625, -0.26645660400390625, 2.3919677734375, 1.9227828979492188, 5.4150390625, -1.3097038269042969, 3.6222152709960938, -3.956939697265625, -3.2337646484375, 1.0452995300292969, 4.2324676513671875, -1.2862472534179688, 1.2720489501953125, 0.40334320068359375, 3.2720565795898438, 2.852264404296875, 5.14813232421875, -1.946054458618164, 3.228973388671875, 0.7403640747070312, 3.966939926147461, 0.13094139099121094, -1.30718994140625, -1.819549560546875, 1.5853958129882812, -2.169036865234375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000225.npy"}
{"epoch": 0.3401360544217687, "step": 226, "batch_size": 64, "mean": 1.2005561590194702, "std": 2.1530709266662598, "min": -3.8697128295898438, "p10": -1.3165512084960935, "median": 0.7018337249755859, "p90": 3.9307937622070317, "max": 7.461784362792969, "pos_frac": 0.703125, "sample": [-0.17437744140625, 0.102569580078125, 0.25373077392578125, 2.4481773376464844, 2.7145462036132812, 1.369110107421875, 3.5003814697265625, 3.594940185546875, 0.2841949462890625, -2.129701614379883, 2.240917205810547, -0.39328765869140625, 2.54583740234375, 3.101015090942383, -0.08634185791015625, -0.25417327880859375, 1.6187381744384766, -0.82745361328125, 4.201086044311523, 2.4345474243164062, 4.135627746582031, 0.5624160766601562, -0.76495361328125, 2.8022842407226562, 1.9364242553710938, 0.49006175994873047, 0.8898773193359375, 7.461784362792969, 1.8160400390625, 3.3518829345703125, -1.7541160583496094, 2.9292449951171875, -0.34970855712890625, -0.17122650146484375, 3.621227264404297, 0.145111083984375, 0.5753173828125, -1.4279022216796875, -1.056732177734375, 0.6185398101806641, 0.04822540283203125, -0.3807830810546875, -1.64080810546875, -1.9750404357910156, 3.118743896484375, 3.9738006591796875, -3.8697128295898438, 4.8274688720703125, -3.5986080169677734, 0.15228271484375, 2.2005081176757812, 0.46798133850097656, 0.6660308837890625, 2.8367462158203125, 1.3744659423828125, 4.997505187988281, 3.8304443359375, 1.6288223266601562, 0.7376365661621094, -0.6939678192138672, 4.5278778076171875, -0.5191974639892578, 0.08117485046386719, 1.6883392333984375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000226.npy"}
{"epoch": 0.3416477702191988, "step": 227, "batch_size": 64, "mean": 1.265345811843872, "std": 2.2562756538391113, "min": -3.1639633178710938, "p10": -1.501213836669922, "median": 1.2319698333740234, "p90": 4.043375396728516, "max": 6.625988006591797, "pos_frac": 0.671875, "sample": [4.6466217041015625, 1.8957481384277344, 0.33824920654296875, 3.8267822265625, 1.0062255859375, -3.1639633178710938, 2.12060546875, 2.6540565490722656, -3.0520095825195312, -1.6222457885742188, 3.36688232421875, -2.7834701538085938, -0.9412651062011719, 5.250740051269531, 4.967674255371094, -0.8434066772460938, -0.1896533966064453, -0.6348190307617188, 3.5582008361816406, 0.8210906982421875, 1.327056884765625, 2.7246322631835938, 2.1698837280273438, 1.2817649841308594, 5.719612121582031, 3.267658233642578, -0.8160972595214844, 3.9518966674804688, -1.51898193359375, -0.8647747039794922, -0.9578475952148438, 6.625988006591797, 4.08258056640625, 1.1688003540039062, 0.7499847412109375, 2.118077278137207, 5.5822906494140625, 0.10524749755859375, 0.9099884033203125, 2.488739013671875, -0.03668212890625, 2.0219573974609375, -0.36072540283203125, 1.736846923828125, -0.29840087890625, 1.83355712890625, 2.052398681640625, 2.009265899658203, -1.0353240966796875, 0.009519577026367188, -2.4840469360351562, 1.1821746826171875, 3.32958984375, 3.7055511474609375, 1.7833614349365234, -0.5060577392578125, 0.9306602478027344, 1.1763763427734375, 2.4133682250976562, -0.2145538330078125, -1.4597549438476562, -2.5307769775390625, 2.7655410766601562, 1.6197433471679688], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000227.npy"}
{"epoch": 0.3431594860166289, "step": 228, "batch_size": 64, "mean": 1.1625350713729858, "std": 2.190053701400757, "min": -4.726318359375, "p10": -1.2326940536499023, "median": 1.1490535736083984, "p90": 4.136653518676758, "max": 6.40216064453125, "pos_frac": 0.71875, "sample": [0.038463592529296875, 5.02447509765625, 1.9007415771484375, 5.319986343383789, 1.1483879089355469, 2.1694793701171875, 0.9219560623168945, 4.379997253417969, 1.4279327392578125, -0.8295974731445312, 1.9378089904785156, 1.5544281005859375, 1.02459716796875, -3.1553726196289062, 0.09473419189453125, 1.8176250457763672, 1.3244285583496094, 0.29921722412109375, 0.2783203125, 1.7410621643066406, 1.48785400390625, -1.104644775390625, -0.49001312255859375, 0.53314208984375, -1.41046142578125, 1.9784736633300781, 4.180194854736328, 6.40216064453125, 1.7647171020507812, 3.2647476196289062, -1.1570453643798828, -2.9863662719726562, 1.2055282592773438, -0.24605941772460938, 2.4697208404541016, -3.026704788208008, 1.14971923828125, -0.03649330139160156, -4.726318359375, -1.2634334564208984, 1.59796142578125, 0.910247802734375, 4.93804931640625, -1.5294647216796875, 1.4327850341796875, 0.3048839569091797, 4.035057067871094, 1.1255035400390625, -1.1609687805175781, 2.3323822021484375, -0.4361991882324219, 0.46087646484375, 5.4235076904296875, 2.9892234802246094, -0.49384307861328125, 3.6228866577148438, 3.98004150390625, 3.6052398681640625, 2.5152359008789062, -0.468536376953125, 0.7502899169921875, -0.31116485595703125, 0.9410934448242188, 1.4297637939453125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000228.npy"}
{"epoch": 0.34467120181405897, "step": 229, "batch_size": 64, "mean": 1.4018831253051758, "std": 2.137538194656372, "min": -3.1750335693359375, "p10": -1.3971723556518554, "median": 1.6153030395507812, "p90": 3.766895294189454, "max": 7.93328857421875, "pos_frac": 0.734375, "sample": [2.449493408203125, 1.6977729797363281, 1.8995208740234375, 2.521759033203125, 1.8345661163330078, 1.4668960571289062, 3.352649688720703, 2.792837142944336, 1.6248016357421875, -0.4496307373046875, 0.5921401977539062, -0.30743408203125, 2.5659713745117188, -2.3442535400390625, 2.1072158813476562, 2.1361923217773438, 7.93328857421875, -2.3740234375, 1.9499664306640625, 1.1003875732421875, 2.0149669647216797, 0.6028823852539062, 1.1766128540039062, 2.4999313354492188, -1.455291748046875, 2.9409961700439453, -1.2162322998046875, -1.2615604400634766, 0.3214149475097656, 2.7216644287109375, 4.579254150390625, 6.669769287109375, -3.1750335693359375, 1.8767757415771484, -0.578338623046875, 0.44989013671875, 3.8451385498046875, -1.8956680297851562, -1.5555343627929688, 1.4838104248046875, 5.8306121826171875, 2.356698989868164, -0.226715087890625, -0.8047542572021484, 1.14056396484375, 2.0295333862304688, 2.205005645751953, 2.740297317504883, 1.819173812866211, 1.3664779663085938, 1.605804443359375, -0.5777797698974609, 1.9855155944824219, -0.2151947021484375, 1.547698974609375, 3.5843276977539062, -0.8182907104492188, 0.3291893005371094, 4.332891464233398, 4.232696533203125, 3.2538375854492188, 1.3501052856445312, 0.389892578125, -2.332630157470703], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000229.npy"}
{"epoch": 0.34618291761148906, "step": 230, "batch_size": 64, "mean": 1.0521972179412842, "std": 2.103426456451416, "min": -8.2608642578125, "p10": -0.6839225769042967, "median": 0.7518806457519531, "p90": 3.7267333984375, "max": 4.76422119140625, "pos_frac": 0.828125, "sample": [0.016880035400390625, 0.7611083984375, 1.5451889038085938, 0.2630882263183594, 2.4770545959472656, -8.2608642578125, 0.42772674560546875, 0.53887939453125, 0.2019195556640625, 2.8948974609375, 2.1814117431640625, 3.4036865234375, 3.1874122619628906, 0.9627227783203125, 0.27274322509765625, 1.3565826416015625, 3.0128049850463867, 3.986682891845703, -1.2261581420898438, 0.09566497802734375, -2.56939697265625, 1.3952865600585938, 0.399200439453125, 1.455352783203125, 2.7933731079101562, -0.768890380859375, 2.6473388671875, 0.3458404541015625, 2.975292205810547, 4.0466461181640625, 1.1095943450927734, 0.042041778564453125, -0.4139232635498047, 0.49135780334472656, 1.43402099609375, -1.620452880859375, 0.4368095397949219, 0.5778598785400391, 4.093353271484375, 1.7098236083984375, 1.19647216796875, 2.4029617309570312, 0.08921051025390625, 0.4964447021484375, 4.219512939453125, 2.303211212158203, -2.5847511291503906, 3.63299560546875, 0.7134552001953125, -0.023023605346679688, 1.3862457275390625, -3.5966796875, 0.7426528930664062, 3.253683090209961, 0.49455833435058594, 4.238883972167969, 0.22642898559570312, 0.580535888671875, 0.9851264953613281, -0.48566436767578125, 0.1674032211303711, 4.76422119140625, -0.31012725830078125, 3.76690673828125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000230.npy"}
{"epoch": 0.3476946334089191, "step": 231, "batch_size": 64, "mean": 1.3137177228927612, "std": 2.1578423976898193, "min": -7.076324462890625, "p10": -0.9567935943603515, "median": 1.4448890686035156, "p90": 3.9475227355957037, "max": 5.892585754394531, "pos_frac": 0.75, "sample": [1.1872482299804688, -0.80255126953125, 2.914093017578125, -0.6229248046875, 0.20218658447265625, 1.5772628784179688, -0.34612274169921875, 1.75323486328125, 0.249359130859375, 0.027374267578125, 1.4272842407226562, 0.577484130859375, 2.2389144897460938, 0.9350128173828125, -1.7048797607421875, -0.2916984558105469, 1.1746826171875, 3.776165008544922, 2.956787109375, -0.686553955078125, 2.2085189819335938, -3.0685043334960938, 1.6820106506347656, 2.4087486267089844, 0.4083900451660156, 2.9307861328125, -0.9646224975585938, -2.1136512756347656, 5.892585754394531, 2.7900562286376953, 3.2169265747070312, -0.9385261535644531, 1.4795379638671875, 4.32427978515625, 1.462493896484375, -7.076324462890625, 0.3875408172607422, 2.043048858642578, 3.175628662109375, -0.15680694580078125, 4.020961761474609, 2.0910415649414062, 0.5134735107421875, -1.0310821533203125, 3.0779953002929688, -0.36345672607421875, 1.9270706176757812, -0.1860370635986328, 0.785308837890625, 0.3937187194824219, 4.1168212890625, 4.501708984375, 0.6434326171875, 2.3651809692382812, 3.7617149353027344, 5.591560363769531, 1.0248565673828125, 1.9078960418701172, 1.948638916015625, 3.7691802978515625, -1.6603546142578125, 1.2981758117675781, 2.6372146606445312, 4.308437347412109], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000231.npy"}
{"epoch": 0.3492063492063492, "step": 232, "batch_size": 64, "mean": 0.854690432548523, "std": 2.5776095390319824, "min": -7.24786376953125, "p10": -1.5393832206726072, "median": 0.5858502388000488, "p90": 3.7299263000488287, "max": 7.455108642578125, "pos_frac": 0.625, "sample": [-0.31163787841796875, 4.207546234130859, 1.8376846313476562, 6.4073944091796875, 1.6479320526123047, -7.24786376953125, -1.2556991577148438, 3.7858810424804688, -2.7415847778320312, 0.19911956787109375, -2.53851318359375, 0.4392547607421875, -1.205718994140625, 0.19342422485351562, -0.44501495361328125, -3.876628875732422, -0.41199493408203125, -0.2987937927246094, 0.40674591064453125, 2.0074501037597656, 0.9963455200195312, 3.32281494140625, -0.790435791015625, 3.17474365234375, 0.8714523315429688, 1.0673418045043945, -1.15521240234375, 3.2515487670898438, -0.14305496215820312, 2.9238739013671875, 1.2212142944335938, 2.7498779296875, -1.6609621047973633, -0.1415576934814453, 0.75927734375, -0.08303070068359375, -0.4049568176269531, -0.9615211486816406, 5.928680419921875, -0.2415771484375, 0.6058006286621094, 2.86322021484375, -0.18270111083984375, 7.455108642578125, 5.239654541015625, 6.054290771484375, -3.3535537719726562, 0.7777786254882812, -1.0126953125, 0.7860965728759766, 2.1201820373535156, 0.5658998489379883, 3.599365234375, 2.52996826171875, 2.636058807373047, 0.451568603515625, -0.4957122802734375, 0.2860374450683594, 0.6225318908691406, 1.2589035034179688, 0.2990264892578125, -4.673736572265625, 2.9468307495117188, 1.8364219665527344], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000232.npy"}
{"epoch": 0.3507180650037793, "step": 233, "batch_size": 64, "mean": 1.099263072013855, "std": 2.1607892513275146, "min": -2.824798583984375, "p10": -1.9208236694335934, "median": 0.9430522918701172, "p90": 3.988562011718751, "max": 6.0720977783203125, "pos_frac": 0.65625, "sample": [6.0720977783203125, 2.565216064453125, 2.9198074340820312, -0.19367122650146484, 2.6772232055664062, 3.77288818359375, 0.1892852783203125, 4.701305389404297, 0.10388946533203125, 0.7859039306640625, 2.2253684997558594, 4.5321807861328125, -1.5878067016601562, -2.3346176147460938, -1.2420196533203125, 4.566246032714844, 0.8971519470214844, 0.38370513916015625, -2.7153244018554688, -0.5785789489746094, 4.08099365234375, 1.448486328125, 2.7633018493652344, 1.1834716796875, -2.0738525390625, 1.5119400024414062, -2.397705078125, 0.6670150756835938, -1.4556198120117188, -0.1478271484375, 2.7240867614746094, -0.4777946472167969, 3.1777572631835938, 2.6261978149414062, 0.6650390625, -2.252105712890625, 2.4157867431640625, 0.8134384155273438, 1.6100730895996094, 0.2958660125732422, 3.1403045654296875, -2.824798583984375, 2.5659332275390625, 1.3065948486328125, 1.6349639892578125, -0.010311126708984375, 0.3461265563964844, -1.4311370849609375, 3.754199981689453, 3.014680862426758, -0.5507659912109375, -0.01136016845703125, 0.98895263671875, 3.3777923583984375, 4.1199188232421875, 4.5296630859375, 1.0903511047363281, -2.0635452270507812, 2.3683929443359375, -1.58135986328125, -1.0793037414550781, -0.8232421875, -0.12310791015625, 3.6950912475585938], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000233.npy"}
{"epoch": 0.35222978080120937, "step": 234, "batch_size": 64, "mean": 1.099867820739746, "std": 1.9197750091552734, "min": -2.538890838623047, "p10": -1.7047843933105469, "median": 1.0035648345947266, "p90": 3.919736862182617, "max": 4.4522552490234375, "pos_frac": 0.671875, "sample": [-0.033172607421875, -1.7758026123046875, 1.9686965942382812, 1.3522186279296875, 0.9649848937988281, -0.5137557983398438, 1.53973388671875, -0.8849029541015625, 2.182464599609375, -0.571319580078125, 0.8934898376464844, 1.5110301971435547, 1.1432723999023438, 4.370220184326172, 3.128082275390625, 0.3528633117675781, -0.8611221313476562, -1.6381607055664062, 4.104248046875, -1.1619129180908203, 0.5173053741455078, 2.3008460998535156, 2.9623165130615234, 4.4522552490234375, 0.4601402282714844, -0.4196891784667969, 4.210685729980469, 0.6974582672119141, -1.73333740234375, -0.17459487915039062, 2.698394775390625, 2.1882476806640625, 3.930957794189453, 4.269824981689453, 1.2472667694091797, -2.0260543823242188, 2.434906005859375, 0.81561279296875, 0.8657951354980469, -1.1741561889648438, 0.5476608276367188, 3.8935546875, 1.7276611328125, 3.3588027954101562, 3.2101898193359375, 0.4034709930419922, 1.042144775390625, 1.3570632934570312, 4.0611114501953125, -1.8782081604003906, -0.0832977294921875, 0.7343063354492188, -2.538890838623047, 2.7159576416015625, -0.04788398742675781, -2.205728530883789, 2.39996337890625, 2.39825439453125, -0.52923583984375, 1.8764266967773438, -2.347747802734375, 3.692577362060547, 2.6153335571289062, -0.6072845458984375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000234.npy"}
{"epoch": 0.35374149659863946, "step": 235, "batch_size": 64, "mean": 1.4406075477600098, "std": 1.7847799062728882, "min": -2.99298095703125, "p10": -0.6486145019531248, "median": 1.2609272003173828, "p90": 3.9615200042724616, "max": 5.648567199707031, "pos_frac": 0.8125, "sample": [-0.10492706298828125, 2.2015380859375, 0.9020614624023438, 3.2467517852783203, 4.824859619140625, -0.749267578125, 0.5112075805664062, 1.817596435546875, 4.617523193359375, -0.41375732421875, 0.9998931884765625, 2.8008995056152344, 0.3099079132080078, 2.026540756225586, 0.8233642578125, 1.9869270324707031, 0.06647491455078125, 1.5889358520507812, 2.2432422637939453, 1.2271156311035156, 0.4961700439453125, 1.8349609375, -2.99298095703125, 2.4986915588378906, 4.015743255615234, 2.0912704467773438, -0.038272857666015625, -1.6142921447753906, 5.648567199707031, 0.4094409942626953, 2.8631134033203125, 0.9169321060180664, 3.8349990844726562, 1.29473876953125, 2.5287609100341797, 0.7222795486450195, 2.5879859924316406, 0.8694572448730469, 3.08111572265625, -1.5620193481445312, 1.1033954620361328, 4.917842864990234, -1.8017578125, -0.1049957275390625, 4.303169250488281, 0.5103874206542969, 0.5825347900390625, 1.612274169921875, 2.733978271484375, 1.350006103515625, 0.8563385009765625, 0.6814117431640625, 4.0661163330078125, 2.8232574462890625, 1.8608016967773438, -0.3472747802734375, 2.8302459716796875, 0.954315185546875, 2.5496063232421875, 0.18019866943359375, 3.0901031494140625, -1.729339599609375, 0.21824264526367188, -1.4555244445800781], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000235.npy"}
{"epoch": 0.35525321239606955, "step": 236, "batch_size": 64, "mean": 1.1332937479019165, "std": 2.1723785400390625, "min": -4.562904357910156, "p10": -1.2564273834228517, "median": 0.9485759735107422, "p90": 4.1478614807128915, "max": 6.785470962524414, "pos_frac": 0.6875, "sample": [2.0454254150390625, -0.3782501220703125, 0.01563262939453125, 3.6296920776367188, 2.5180397033691406, -0.1002044677734375, 1.5469970703125, 1.8372344970703125, 0.10750198364257812, 1.1861495971679688, 2.184032440185547, 2.255462646484375, -1.2383308410644531, -0.5048294067382812, -1.3383979797363281, 0.9092483520507812, 0.5346145629882812, 2.8099136352539062, 2.4833831787109375, 6.785470962524414, 0.4217700958251953, -0.6465606689453125, -0.7884674072265625, -1.0030288696289062, 4.280464172363281, 1.7870712280273438, 6.011074066162109, 1.75091552734375, 5.424110412597656, 1.5866241455078125, 4.361034393310547, 0.6422195434570312, -1.7386322021484375, 2.1197967529296875, 3.60986328125, 0.7391357421875, 0.6636810302734375, 4.615509033203125, -2.1744155883789062, 0.7240486145019531, -1.1474170684814453, 0.9879035949707031, 1.6934776306152344, 2.1529617309570312, -0.330291748046875, 0.24141311645507812, -1.2641830444335938, -4.562904357910156, 1.3964691162109375, 0.8689861297607422, -0.7645111083984375, -3.5708770751953125, -1.306304931640625, 2.762378692626953, 4.7410736083984375, 3.8384552001953125, 1.53851318359375, -0.28925228118896484, -0.6892375946044922, -0.68341064453125, 3.256824493408203, 1.5478973388671875, 0.2702980041503906, 2.16754150390625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000236.npy"}
{"epoch": 0.35676492819349964, "step": 237, "batch_size": 64, "mean": 0.6659514904022217, "std": 1.9123902320861816, "min": -3.2811355590820312, "p10": -1.5220407485961913, "median": 0.7331428527832031, "p90": 2.817704582214357, "max": 6.898345947265625, "pos_frac": 0.671875, "sample": [2.1254653930664062, 3.2761077880859375, -3.1598052978515625, 4.743274688720703, -0.5144271850585938, 1.6836700439453125, 0.9087142944335938, 1.2824440002441406, 0.32080078125, 1.4252738952636719, 0.390228271484375, 1.008270263671875, -2.43023681640625, 1.5188064575195312, -0.28745269775390625, 0.8423233032226562, -0.6457672119140625, 1.860626220703125, -3.13055419921875, 0.48009490966796875, -3.2811355590820312, 2.9831180572509766, -0.8927001953125, 1.9112548828125, 4.0105743408203125, -0.6596546173095703, 0.6407928466796875, 1.2555770874023438, -1.31024169921875, 0.14385986328125, 2.4317398071289062, 1.0469703674316406, 1.6384353637695312, -2.937286376953125, 0.955902099609375, -1.4485664367675781, 0.410430908203125, -0.8711929321289062, 1.6601409912109375, -1.5535297393798828, -0.282501220703125, 0.7416458129882812, 1.4300994873046875, 0.164031982421875, -1.3542938232421875, -0.28204345703125, 3.263765335083008, 0.58709716796875, 0.724639892578125, 5.146781921386719, 1.4583206176757812, 0.398773193359375, 1.6473808288574219, 6.898345947265625, 0.02685546875, 1.2883491516113281, 0.8174152374267578, 1.8067588806152344, -0.07685470581054688, 1.2674484252929688, -0.2985992431640625, -1.1277694702148438, -1.626983642578125, 2.1698837280273438], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000237.npy"}
{"epoch": 0.35827664399092973, "step": 238, "batch_size": 64, "mean": 1.2750656604766846, "std": 2.0043928623199463, "min": -3.69989013671875, "p10": -1.3115421295166014, "median": 1.3858318328857422, "p90": 3.761930465698243, "max": 6.74951171875, "pos_frac": 0.734375, "sample": [-0.1389007568359375, 1.5140228271484375, 2.4360294342041016, -2.1626739501953125, 2.158824920654297, 6.74951171875, 0.33652496337890625, 0.07303237915039062, 4.6554107666015625, -1.4101676940917969, 0.34259033203125, -1.5536346435546875, 1.8542022705078125, 1.5396041870117188, 2.7777862548828125, 3.5913352966308594, -0.022991180419921875, -1.0533027648925781, 2.9202499389648438, 2.0445594787597656, 2.6664276123046875, 5.7928009033203125, 3.913665771484375, 1.2725772857666016, 0.43254852294921875, 4.640863418579102, 1.741455078125, 0.628997802734375, -0.074462890625, 1.7803192138671875, 3.873077392578125, 2.2558422088623047, 1.7635993957519531, -0.1122589111328125, 0.3599205017089844, 0.7944450378417969, 0.89593505859375, -0.4197540283203125, 2.6419143676757812, 3.8319168090820312, 1.4612388610839844, -1.3369674682617188, 0.9705162048339844, -0.59527587890625, 2.8363800048828125, -1.0594863891601562, 2.4618377685546875, 2.261829376220703, 0.8883399963378906, 1.259735107421875, 1.91583251953125, 2.2883453369140625, 2.691555976867676, -3.3807716369628906, -1.2522163391113281, 3.5986289978027344, -3.69989013671875, 1.493133544921875, 2.551738739013672, 1.2392120361328125, 1.3104248046875, -0.30218505859375, -1.7238616943359375, 0.3942680358886719], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000238.npy"}
{"epoch": 0.35978835978835977, "step": 239, "batch_size": 64, "mean": 1.1811952590942383, "std": 1.9794493913650513, "min": -3.0844764709472656, "p10": -1.2220054626464842, "median": 0.8933315277099609, "p90": 3.7786937713623048, "max": 6.8515625, "pos_frac": 0.734375, "sample": [2.8548126220703125, 1.12640380859375, 0.16017913818359375, 0.8455123901367188, 4.0279998779296875, 3.7096328735351562, -1.1218299865722656, 1.5624923706054688, -0.97412109375, 1.8316307067871094, -0.453521728515625, 0.9319305419921875, -0.6044235229492188, 4.5298919677734375, 0.231842041015625, -0.42427825927734375, -1.7694435119628906, -3.0844764709472656, 0.5531005859375, 3.4034423828125, 0.8629646301269531, 0.8391876220703125, 3.807392120361328, 2.291900634765625, 3.71173095703125, -0.08609580993652344, 1.39874267578125, 2.5047988891601562, 0.8878517150878906, 0.5548171997070312, 3.5229110717773438, 0.4175090789794922, -1.1650161743164062, -0.8283424377441406, 0.207122802734375, 4.0416717529296875, 0.18576622009277344, 0.9440193176269531, -1.246429443359375, 3.4272308349609375, -1.8909969329833984, 2.4395103454589844, -0.32724761962890625, 4.51763916015625, -1.9645881652832031, 2.921884536743164, 2.2434616088867188, 6.8515625, 4.0406646728515625, -2.1648635864257812, -0.018461227416992188, 0.4256744384765625, -2.2689285278320312, 1.7851333618164062, 3.43902587890625, 1.370513916015625, 2.2108993530273438, 0.8728656768798828, 0.709136962890625, 1.8873138427734375, 0.8988113403320312, 1.9912338256835938, 1.8660812377929688, 0.14365386962890625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000239.npy"}
{"epoch": 0.36130007558578986, "step": 240, "batch_size": 64, "mean": 1.4823133945465088, "std": 2.1183245182037354, "min": -2.9353256225585938, "p10": -1.2543018341064451, "median": 1.5212440490722656, "p90": 3.6515632629394537, "max": 8.34674072265625, "pos_frac": 0.75, "sample": [3.0395126342773438, -1.3465499877929688, 2.820526123046875, 1.4685821533203125, 2.0710067749023438, 3.0247650146484375, -1.9497222900390625, 1.7067718505859375, -2.8462085723876953, -0.6658210754394531, 3.6953964233398438, -0.6158866882324219, 1.8420867919921875, 4.850349426269531, 3.427276611328125, 4.462303161621094, 3.082590103149414, -1.8877744674682617, 3.247100830078125, 3.3866653442382812, -0.835235595703125, 2.1693649291992188, 1.0986404418945312, -1.8603363037109375, 1.4244232177734375, 8.34674072265625, 1.7709732055664062, 1.4323883056640625, 4.115074157714844, 1.5739059448242188, 3.20172119140625, 2.369781494140625, 1.2448043823242188, -0.52020263671875, -0.936309814453125, 1.389251708984375, 0.6737861633300781, 0.051605224609375, 3.485431671142578, 0.21033859252929688, 1.232452392578125, 2.466899871826172, -0.104522705078125, 4.52874755859375, 5.056190490722656, -2.9353256225585938, -0.7429389953613281, 1.9759559631347656, 0.43467140197753906, 1.6799392700195312, 0.023383140563964844, 3.243478775024414, -1.0985145568847656, 2.17144775390625, 0.5194015502929688, -0.15294647216796875, 3.0124778747558594, 3.341888427734375, 0.432342529296875, 0.9825458526611328, 3.549285888671875, 0.4769439697265625, -1.3210678100585938, 2.8762054443359375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000240.npy"}
{"epoch": 0.36281179138321995, "step": 241, "batch_size": 64, "mean": 0.7337080240249634, "std": 1.842544436454773, "min": -2.3117828369140625, "p10": -1.5736785888671876, "median": 0.39312171936035156, "p90": 3.4282918930053725, "max": 5.3602142333984375, "pos_frac": 0.59375, "sample": [0.8691787719726562, -0.292022705078125, 1.7801036834716797, -1.3502464294433594, 3.0691680908203125, -0.31170654296875, -0.466339111328125, 1.7941665649414062, 0.2876701354980469, 1.890533447265625, -1.427520751953125, 4.9447174072265625, -1.690999984741211, -0.4947967529296875, -0.40869903564453125, 2.1991729736328125, 0.3835029602050781, 1.0972976684570312, 0.1995849609375, 3.6060943603515625, -0.052921295166015625, 1.0027008056640625, -1.7671356201171875, 1.064056396484375, -0.008514404296875, 0.2286090850830078, 1.7329177856445312, 0.40924072265625, 0.1435089111328125, 0.7461013793945312, 2.340618133544922, 1.434600830078125, 5.3602142333984375, -1.5819091796875, -0.7566490173339844, -1.8759307861328125, -0.8628921508789062, 2.186798095703125, 0.04509735107421875, 1.0379638671875, -1.9642410278320312, -0.766387939453125, -0.5561008453369141, 4.539344787597656, 1.6475086212158203, -0.20164108276367188, 0.402740478515625, 4.38079833984375, 1.56597900390625, -2.3117828369140625, -0.10479736328125, 2.772491455078125, 0.9319610595703125, 2.0881195068359375, 3.581888198852539, -1.554473876953125, -2.17205810546875, 1.2888126373291016, 3.0699005126953125, 1.640615463256836, -1.2714824676513672, 4.209224700927734, -0.7030315399169922, -0.06140899658203125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000241.npy"}
{"epoch": 0.36432350718065004, "step": 242, "batch_size": 64, "mean": 1.6368017196655273, "std": 2.101229190826416, "min": -2.51763916015625, "p10": -0.8611610412597656, "median": 1.5092849731445312, "p90": 3.99380111694336, "max": 7.852745056152344, "pos_frac": 0.796875, "sample": [-1.6475868225097656, 0.5646591186523438, 1.9321517944335938, 1.1857032775878906, 2.84857177734375, -2.51763916015625, 4.680301666259766, 1.6371688842773438, 2.8912620544433594, 2.8942947387695312, 1.35040283203125, 0.338409423828125, -0.9215049743652344, 3.50445556640625, 7.852745056152344, 1.4641571044921875, 3.3589553833007812, 1.8771514892578125, 2.3867263793945312, 1.2867469787597656, -0.8794097900390625, 0.100128173828125, 3.9357757568359375, 2.3609695434570312, -0.16403961181640625, -0.0205230712890625, 1.554412841796875, 1.3861503601074219, 2.6911048889160156, 3.369049072265625, 1.6453819274902344, 0.1436767578125, 7.337249755859375, -0.1985607147216797, 1.5604820251464844, 0.1416168212890625, 0.5579776763916016, 0.7425765991210938, 6.7806396484375, 0.3302726745605469, -0.8185806274414062, 0.881195068359375, 0.6991043090820312, 0.22751998901367188, 4.018669128417969, 0.6378250122070312, 1.3802423477172852, 2.5799484252929688, 1.8714752197265625, 1.7373828887939453, 5.923057556152344, 2.4376258850097656, 4.410377502441406, -0.26375770568847656, 0.0264892578125, 2.064556121826172, 3.0175628662109375, -0.15816879272460938, 2.9750823974609375, -2.16302490234375, 2.057403564453125, -1.1007270812988281, -1.2224807739257812, 3.1944751739501953], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000242.npy"}
{"epoch": 0.36583522297808013, "step": 243, "batch_size": 64, "mean": 1.1538474559783936, "std": 1.8657734394073486, "min": -2.2683639526367188, "p10": -1.3720516204833983, "median": 0.8005294799804688, "p90": 3.9112236022949225, "max": 5.03961181640625, "pos_frac": 0.6875, "sample": [2.0643386840820312, 3.989837646484375, 1.2592620849609375, -1.8066253662109375, 4.947284698486328, 2.9324817657470703, -0.679107666015625, 0.7920494079589844, 2.312896728515625, -1.4837417602539062, 3.6550140380859375, 0.7523574829101562, 1.138885498046875, 1.2860794067382812, 0.49599266052246094, 1.2411651611328125, -0.06785964965820312, -0.32623291015625, 2.7460765838623047, 1.6142578125, -0.1387939453125, 4.4121856689453125, 0.779052734375, -0.3807239532470703, -0.4127349853515625, 3.21230411529541, -0.5279617309570312, 3.1651782989501953, 2.3912734985351562, 2.1165924072265625, 1.1188201904296875, -1.5190887451171875, -0.0775299072265625, -1.3564491271972656, 1.5437517166137695, 3.7277908325195312, 3.3883514404296875, 5.03961181640625, -1.057464599609375, 0.6294937133789062, 4.046516418457031, -0.4237823486328125, 1.673553466796875, 4.139255523681641, -1.3787384033203125, 0.21565628051757812, -1.6638526916503906, 0.8090095520019531, 3.7265663146972656, 1.5630950927734375, 0.3953704833984375, 0.3333282470703125, 0.00191497802734375, 2.1984939575195312, -1.4084815979003906, 4.3793487548828125, 0.6872520446777344, 0.4101982116699219, 0.5612564086914062, -0.9407768249511719, -2.2683639526367188, 1.943990707397461, -0.4429779052734375, 2.370330810546875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000243.npy"}
{"epoch": 0.3673469387755102, "step": 244, "batch_size": 64, "mean": 1.7183579206466675, "std": 1.5405775308609009, "min": -1.2828903198242188, "p10": 0.046762084960937554, "median": 1.559621810913086, "p90": 3.2816238403320317, "max": 6.6365966796875, "pos_frac": 0.90625, "sample": [2.120361328125, 3.3206939697265625, 2.2538528442382812, 0.3874359130859375, 1.0720672607421875, 1.0600461959838867, 0.023487091064453125, 2.5595550537109375, 0.32179832458496094, 1.0535316467285156, 0.10107040405273438, 2.8860549926757812, 1.9727630615234375, 2.794464111328125, 1.2246589660644531, 0.4360198974609375, 2.5954036712646484, 5.03851318359375, -0.7967300415039062, 1.5088539123535156, 3.190460205078125, 2.1558914184570312, -0.14028549194335938, 2.5369033813476562, -1.2828903198242188, 4.438140869140625, 1.0989818572998047, 2.8478775024414062, 3.9145736694335938, 1.282257080078125, 3.1463775634765625, 1.35205078125, 0.6653213500976562, -0.7977066040039062, -0.7791748046875, 0.4731769561767578, -0.20278549194335938, 2.6763229370117188, 1.762786865234375, 5.77923583984375, 0.44819068908691406, 6.6365966796875, 0.5000991821289062, 1.0824737548828125, 0.4946327209472656, 2.1357421875, 1.5571556091308594, 3.1702232360839844, 1.3557395935058594, 2.2045516967773438, 3.11724853515625, 1.713623046875, 0.2125701904296875, 4.007472991943359, 1.8796310424804688, 2.6415367126464844, 1.5620880126953125, 2.0117340087890625, 1.1261787414550781, 0.7547454833984375, 1.8294677734375, 0.8075485229492188, 2.3956298828125, 0.2786102294921875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000244.npy"}
{"epoch": 0.3688586545729403, "step": 245, "batch_size": 64, "mean": 1.3452973365783691, "std": 2.0908446311950684, "min": -3.8049774169921875, "p10": -0.9415140151977539, "median": 1.2381439208984375, "p90": 4.168598365783692, "max": 6.2086029052734375, "pos_frac": 0.734375, "sample": [0.36359405517578125, 1.9900360107421875, 1.7338066101074219, 5.7435150146484375, 2.0450382232666016, -0.9246768951416016, 1.9871139526367188, 2.324687957763672, 0.7131538391113281, -2.5466766357421875, -0.4786720275878906, 1.228912353515625, 2.7837600708007812, 0.8775405883789062, 3.9931411743164062, 3.478466033935547, -0.9364337921142578, 4.237274169921875, -3.8049774169921875, 6.2086029052734375, 2.9315872192382812, 1.3423538208007812, 0.9745025634765625, 2.7964344024658203, 1.5522270202636719, 1.2599411010742188, 2.367828369140625, -0.42531394958496094, 1.2202606201171875, -0.5887527465820312, 2.2310333251953125, -1.7842693328857422, 1.1910476684570312, 2.593231201171875, -0.9436912536621094, 1.24737548828125, -0.135528564453125, -0.9586982727050781, 2.8678131103515625, 4.734413146972656, 1.1985206604003906, -0.3398284912109375, 4.894248962402344, 0.8310394287109375, -2.2882041931152344, 0.47551441192626953, 4.087257385253906, 4.364749908447266, 1.971435546875, -0.4594879150390625, 0.8340911865234375, 1.2567710876464844, -3.63177490234375, 1.9272308349609375, 3.2812957763671875, 0.530853271484375, -0.9124603271484375, 4.203458786010742, -0.08702468872070312, 0.5260467529296875, 3.827136993408203, 1.006927490234375, 0.5945472717285156, 2.5156784057617188], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000245.npy"}
{"epoch": 0.37037037037037035, "step": 246, "batch_size": 64, "mean": 1.4147107601165771, "std": 2.4298853874206543, "min": -3.1645736694335938, "p10": -1.0194791793823241, "median": 1.0039663314819336, "p90": 4.711739349365235, "max": 9.961273193359375, "pos_frac": 0.71875, "sample": [1.6364097595214844, 1.8654823303222656, 4.959871292114258, 3.5982437133789062, 2.10736083984375, 0.10756683349609375, -1.0062198638916016, 0.58892822265625, 0.054004669189453125, -1.5497722625732422, 1.9881668090820312, 3.049530029296875, 0.139984130859375, -0.7159500122070312, 6.3674163818359375, 0.14128875732421875, 1.9255828857421875, -3.1645736694335938, -0.9593582153320312, 0.208648681640625, 9.961273193359375, 4.824104309082031, 2.2289810180664062, 2.3965625762939453, 4.4021453857421875, 1.80682373046875, 2.149810791015625, 4.449554443359375, -0.528167724609375, 2.0993728637695312, 0.874755859375, 3.4777984619140625, 3.54119873046875, 0.1716766357421875, 1.049468994140625, 1.5232925415039062, -1.8601150512695312, 2.720256805419922, 6.025485992431641, -1.4655532836914062, 3.5428314208984375, -0.8699493408203125, -2.4653396606445312, 0.40154266357421875, 2.3387680053710938, -2.4323272705078125, 0.3141937255859375, 0.7652435302734375, 1.7370452880859375, -1.0251617431640625, -0.9566192626953125, 0.9584636688232422, 1.6177482604980469, 6.465431213378906, -0.08227157592773438, -0.1733856201171875, 0.58624267578125, 3.2756271362304688, -0.30633544921875, 5.1817626953125, 1.1058082580566406, 0.32901763916015625, -0.25494384765625, -0.7032470703125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000246.npy"}
{"epoch": 0.37188208616780044, "step": 247, "batch_size": 64, "mean": 0.6013696789741516, "std": 2.0445351600646973, "min": -7.88525390625, "p10": -1.3645835876464845, "median": 0.6114425659179688, "p90": 2.6773338317871094, "max": 5.2627105712890625, "pos_frac": 0.609375, "sample": [-1.201507568359375, 1.823272705078125, -0.112030029296875, -0.2075347900390625, 2.628936767578125, -1.3197097778320312, -1.1912307739257812, 1.1004638671875, 5.2627105712890625, -0.358245849609375, 3.9964447021484375, 1.1822662353515625, 2.6092529296875, 2.239185333251953, -0.4709625244140625, 3.4443359375, 0.3597869873046875, 2.66534423828125, -0.6226959228515625, -0.0399169921875, 4.864044189453125, 4.619239807128906, -2.524890899658203, -0.2596912384033203, 1.6196060180664062, 0.6634979248046875, 1.2092647552490234, 0.4900703430175781, -0.9817657470703125, 1.4098968505859375, 1.7259750366210938, -0.4803314208984375, 1.3699188232421875, 2.44989013671875, 0.03839874267578125, 0.3744010925292969, 0.8473777770996094, -3.4992599487304688, -0.274139404296875, -0.5550880432128906, -1.3626556396484375, 0.060169219970703125, -1.0468368530273438, 2.6824722290039062, -7.88525390625, 0.850921630859375, -1.3654098510742188, -0.3883514404296875, 0.9070701599121094, 0.9148941040039062, 0.55938720703125, 2.515819549560547, 0.2659912109375, 1.7852563858032227, -1.8888015747070312, 1.5062789916992188, -0.3072662353515625, 1.5432395935058594, 1.37860107421875, -1.6215972900390625, -1.4392242431640625, 1.656768798828125, 2.727264404296875, 1.544342041015625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000247.npy"}
{"epoch": 0.37339380196523053, "step": 248, "batch_size": 64, "mean": 1.0105094909667969, "std": 2.2121663093566895, "min": -5.792633056640625, "p10": -1.540850830078125, "median": 1.2023286819458008, "p90": 3.4075654983520525, "max": 7.136131286621094, "pos_frac": 0.671875, "sample": [3.015859603881836, 0.08824920654296875, -0.0077667236328125, -1.53704833984375, 2.988067626953125, 1.27471923828125, 2.323638916015625, 1.7551536560058594, -3.0233726501464844, -0.04456043243408203, -0.638087272644043, 7.0806732177734375, 1.8487472534179688, -0.2147674560546875, 1.2534332275390625, 0.7839431762695312, 2.8241424560546875, -1.7229461669921875, -2.284036636352539, -0.8611297607421875, -1.54248046875, 0.08416366577148438, 1.4604034423828125, 1.942215919494629, 1.6877365112304688, 2.097217559814453, 0.6640472412109375, -1.0377578735351562, -0.19147491455078125, 2.7363967895507812, 1.4250030517578125, 4.084877014160156, 0.4665822982788086, -1.5469799041748047, 4.560371398925781, 2.5301952362060547, -2.1829071044921875, 1.2875442504882812, 0.9398555755615234, -0.381439208984375, 1.5555496215820312, 3.575439453125, 1.5114917755126953, 6.549598693847656, 2.2659835815429688, 1.2229042053222656, 0.07393836975097656, 1.1957626342773438, 1.2088947296142578, 2.0165252685546875, 7.136131286621094, 0.9044036865234375, 3.6651763916015625, 0.17896270751953125, -1.0981597900390625, 0.12710189819335938, -5.792633056640625, 2.5946273803710938, 1.476654052734375, -0.2456207275390625, -0.4781761169433594, 1.419525146484375, -0.203521728515625, -0.1744384765625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000248.npy"}
|
||||
{"epoch": 0.3749055177626606, "step": 249, "batch_size": 64, "mean": 0.7847132682800293, "std": 2.304892063140869, "min": -5.3125, "p10": -2.2133548736572264, "median": 1.0453453063964844, "p90": 3.1190200805664063, "max": 9.273162841796875, "pos_frac": 0.640625, "sample": [1.5302658081054688, -1.6437339782714844, 3.40966796875, 1.8030242919921875, 1.389047622680664, -2.8969573974609375, -2.444366455078125, 3.807647705078125, 9.273162841796875, 2.4431533813476562, 1.7939491271972656, -0.6027069091796875, 0.052555084228515625, 1.0366287231445312, -1.9355621337890625, -2.3360137939453125, -0.5170822143554688, -3.7376480102539062, 1.4664535522460938, -0.9087581634521484, 0.9476051330566406, -0.8537979125976562, -1.2102813720703125, 1.6049766540527344, 1.94403076171875, 1.0540618896484375, 0.07371807098388672, -0.08859634399414062, 3.1328353881835938, -0.307861328125, -1.929168701171875, 2.4131698608398438, 0.374725341796875, -0.1386260986328125, 1.935028076171875, 0.332061767578125, 2.4401473999023438, -0.3771820068359375, 2.47409725189209, -3.5328216552734375, 1.7737045288085938, 3.8042211532592773, 0.7804794311523438, 3.3241195678710938, 4.3663787841796875, 1.6315536499023438, -2.332408905029297, -1.0131683349609375, 2.0211868286132812, 1.6437835693359375, 0.4152965545654297, 1.972625732421875, 1.9840240478515625, -0.8961391448974609, 3.0867843627929688, 3.0864105224609375, -1.2850418090820312, 0.6675357818603516, 2.619953155517578, -5.3125, 1.1381416320800781, -0.22051620483398438, 2.6670074462890625, 3.0273666381835938], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000249.npy"}
|
||||
{"epoch": 0.3764172335600907, "step": 250, "batch_size": 64, "mean": 1.6611299514770508, "std": 2.287900447845459, "min": -5.0528717041015625, "p10": -0.9954319000244137, "median": 1.5624809265136719, "p90": 4.372201919555664, "max": 8.4913330078125, "pos_frac": 0.796875, "sample": [3.3736190795898438, -1.8514633178710938, -0.6683750152587891, 0.970306396484375, -0.0381927490234375, 3.980804443359375, 5.4802398681640625, 1.4071578979492188, 3.5908432006835938, 1.010284423828125, 4.318382263183594, -2.214996337890625, 1.5952377319335938, 0.43579864501953125, 1.7244453430175781, 0.3438835144042969, 0.984375, -5.0528717041015625, 1.7234268188476562, 0.3796234130859375, 1.0739612579345703, -2.3019561767578125, 1.8075027465820312, -1.2028923034667969, 0.9220943450927734, 1.226776123046875, 4.7906494140625, 1.455862045288086, 1.6689910888671875, 2.5073318481445312, -1.1243782043457031, 3.02362060546875, 3.939502716064453, -0.40059566497802734, 0.8169174194335938, 1.2170181274414062, 1.8057212829589844, 2.19171142578125, 6.726768493652344, 1.537689208984375, 0.398040771484375, -1.7732963562011719, 4.175312042236328, 0.8374252319335938, 3.764068603515625, 1.765878677368164, 8.4913330078125, 3.3122634887695312, 2.7263031005859375, 4.395267486572266, 1.6020050048828125, 0.5655517578125, 1.7493438720703125, 2.340646743774414, -0.673980712890625, 3.5860519409179688, 5.900604248046875, 4.415504455566406, 1.139007568359375, -0.06663322448730469, -0.6945571899414062, 3.32916259765625, 0.2649116516113281, 1.5872726440429688], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000250.npy"}
|
||||
{"epoch": 0.3779289493575208, "step": 251, "batch_size": 64, "mean": 0.8148387670516968, "std": 2.211162805557251, "min": -5.583351135253906, "p10": -1.7629966735839842, "median": 0.9348716735839844, "p90": 3.3925167083740235, "max": 7.026283264160156, "pos_frac": 0.640625, "sample": [1.011754035949707, 3.3254165649414062, 6.651725769042969, -0.8523616790771484, 1.564208984375, 0.37361907958984375, -1.2348480224609375, 0.11007308959960938, -0.13266754150390625, 5.054595947265625, -0.016876220703125, -2.233123779296875, 4.2837982177734375, -0.6231842041015625, 0.6838150024414062, -1.8346710205078125, 0.29622936248779297, 2.65936279296875, -0.16574859619140625, -3.2347869873046875, 0.7513046264648438, 7.026283264160156, 1.1150779724121094, -2.4763031005859375, 1.7119216918945312, -1.4017868041992188, 3.6374435424804688, 1.218017578125, -1.5020675659179688, 1.02325439453125, 1.435638427734375, 1.0191192626953125, -0.1841583251953125, 1.4798784255981445, 0.8957786560058594, 2.0781288146972656, 2.737598419189453, 1.9049301147460938, -1.9431638717651367, -0.6984167098999023, 0.9739646911621094, 1.4837799072265625, 2.4865798950195312, 3.397296905517578, 0.6363983154296875, -0.22919464111328125, 1.6817474365234375, 1.211669921875, 1.65155029296875, 3.7097320556640625, 3.3813629150390625, 0.8505706787109375, 2.5618057250976562, -0.9226608276367188, -5.583351135253906, -1.0980682373046875, 1.5470561981201172, -3.1426467895507812, 0.5700645446777344, -1.5957565307617188, -0.5323829650878906, -0.344879150390625, 2.0717620849609375, 1.8684711456298828], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000251.npy"}
|
||||
{"epoch": 0.3794406651549509, "step": 252, "batch_size": 64, "mean": 0.8971811532974243, "std": 1.9387104511260986, "min": -4.098670959472656, "p10": -1.0839996337890625, "median": 0.672278881072998, "p90": 3.1416603088378907, "max": 6.624603271484375, "pos_frac": 0.75, "sample": [1.2657089233398438, -4.047637939453125, -1.1224899291992188, -2.19036865234375, 3.699432373046875, 1.2882461547851562, 0.3565826416015625, 2.76336669921875, 1.1040210723876953, 2.1253128051757812, 3.7525787353515625, 1.5510663986206055, -0.02635955810546875, 0.3013763427734375, 3.823711395263672, 0.33807373046875, 3.1049652099609375, -0.2787799835205078, 2.1281280517578125, 0.7025861740112305, -0.15221405029296875, 1.2328681945800781, 2.63201904296875, 2.3749523162841797, -0.44915008544921875, -0.5370597839355469, -0.6000823974609375, -2.9275360107421875, 1.1068105697631836, 0.6419715881347656, 1.6464004516601562, 2.1582107543945312, 3.1573867797851562, 2.027862548828125, 1.643280029296875, 0.032527923583984375, -1.055694580078125, 0.03693389892578125, 0.6364974975585938, 0.1149139404296875, 0.8453903198242188, 0.2226409912109375, -1.2346534729003906, 0.7436027526855469, 0.2588958740234375, 6.624603271484375, 0.348175048828125, 0.2558135986328125, -4.098670959472656, -1.09613037109375, -0.7549514770507812, 2.25469970703125, 0.8399238586425781, 0.2400798797607422, 1.3526992797851562, 0.5736160278320312, 1.53411865234375, 4.468971252441406, 0.37760162353515625, 1.8595962524414062, 6.3321533203125, 1.3617324829101562, -0.7594375610351562, 0.5087013244628906], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000252.npy"}
|
||||
{"epoch": 0.38095238095238093, "step": 253, "batch_size": 64, "mean": 0.8155180215835571, "std": 1.6339986324310303, "min": -3.6762924194335938, "p10": -1.0920562744140625, "median": 0.524017333984375, "p90": 3.031653213500978, "max": 4.6744232177734375, "pos_frac": 0.6875, "sample": [1.8565254211425781, -1.0941848754882812, 4.393833160400391, 1.183746337890625, -0.6565923690795898, 0.43569183349609375, -0.5170822143554688, 2.4756011962890625, 2.671802520751953, -1.1426944732666016, 1.0804080963134766, -0.335205078125, 3.8070640563964844, -0.07384300231933594, 0.16960716247558594, 0.08785247802734375, -0.5118751525878906, 3.1858749389648438, 0.6999588012695312, 0.40174102783203125, 0.484100341796875, -0.2606544494628906, 0.7640228271484375, 1.8166885375976562, 1.082667350769043, -0.221527099609375, 3.5419044494628906, 1.7519073486328125, 0.16497039794921875, 1.71893310546875, 1.4397964477539062, 0.40839385986328125, 2.3648805618286133, -0.32871246337890625, -1.240875244140625, 4.258087158203125, -0.1677093505859375, -1.617156982421875, 2.0945587158203125, 1.9017105102539062, 0.12073516845703125, -1.262420654296875, 0.0672760009765625, -3.6762924194335938, 1.60003662109375, -0.5546760559082031, -1.8686141967773438, 2.1994552612304688, 4.604616165161133, 0.5463790893554688, 0.8079071044921875, 2.1414337158203125, 4.6744232177734375, 1.00469970703125, 1.5324897766113281, 0.8918304443359375, -1.0870895385742188, -0.6502418518066406, 1.3985061645507812, -0.06565093994140625, 0.15207672119140625, 0.6608314514160156, 0.37957191467285156, 0.5016555786132812], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000253.npy"}
|
||||
{"epoch": 0.382464096749811, "step": 254, "batch_size": 64, "mean": 1.0706019401550293, "std": 2.035973310470581, "min": -3.2617950439453125, "p10": -1.0775436401367187, "median": 1.0367584228515625, "p90": 3.460383224487305, "max": 6.702121734619141, "pos_frac": 0.671875, "sample": [-2.6662063598632812, 1.7795867919921875, 1.6421661376953125, -0.5431976318359375, 1.854278564453125, 2.2159194946289062, 2.204730987548828, 1.5889434814453125, 0.26141929626464844, 0.11340141296386719, -0.35417938232421875, 3.3842506408691406, 1.148284912109375, 2.308378219604492, 1.3552322387695312, 2.419957160949707, 4.0562591552734375, 2.6130218505859375, -0.9909515380859375, 0.7466964721679688, 1.3616180419921875, 6.702121734619141, -0.48069000244140625, 1.918722152709961, 0.468475341796875, -0.43021392822265625, 5.6795806884765625, 1.5778656005859375, 3.744251251220703, -3.2617950439453125, 4.577022552490234, 0.37856292724609375, -0.05617523193359375, -0.17507362365722656, -1.6702499389648438, 2.453857421875, 0.6958084106445312, 2.0763092041015625, -0.3728790283203125, 1.485992431640625, 1.6949081420898438, -0.2148895263671875, 2.3794593811035156, -1.114654541015625, 0.8136329650878906, 3.331085205078125, -0.84661865234375, 2.3646926879882812, 3.493011474609375, -0.3125038146972656, 0.92523193359375, -0.9531402587890625, -0.39792633056640625, 1.6907958984375, 5.302490234375, -0.715362548828125, 2.3393325805664062, 0.7776422500610352, 2.9459457397460938, 0.6962661743164062, 0.6283702850341797, -2.767578125, -2.5515213012695312, -2.8012466430664062], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000254.npy"}
|
||||
{"epoch": 0.3839758125472411, "step": 255, "batch_size": 64, "mean": 1.0823832750320435, "std": 1.8229079246520996, "min": -2.9534225463867188, "p10": -1.3961418151855467, "median": 1.0096101760864258, "p90": 3.279178619384766, "max": 5.441986083984375, "pos_frac": 0.71875, "sample": [3.1232070922851562, -2.2854843139648438, 2.0716781616210938, 4.635711669921875, 2.350921630859375, 1.309844970703125, 0.573486328125, -0.5608901977539062, -0.22334861755371094, 0.6375808715820312, -0.04880523681640625, -0.9850139617919922, 0.3619117736816406, 1.7225494384765625, -1.0546073913574219, 1.7345504760742188, 3.8120803833007812, 0.85504150390625, -0.11077499389648438, -1.8534355163574219, 2.2045555114746094, 0.5109329223632812, 3.3460235595703125, 2.4284400939941406, 1.7159881591796875, 0.6619415283203125, -1.8509292602539062, -1.2623443603515625, 3.0124282836914062, 0.24393463134765625, 2.081146240234375, 2.4791946411132812, -1.4534835815429688, 1.4127655029296875, 2.015625, 1.4678802490234375, 2.92608642578125, 0.9810256958007812, 4.319881439208984, 2.825275421142578, 0.9837436676025391, 0.285614013671875, 0.484771728515625, -1.2360382080078125, 0.5665473937988281, -1.5051517486572266, 2.0470352172851562, 5.441986083984375, -0.35048675537109375, 1.8287353515625, 1.0354766845703125, 1.7242584228515625, 0.591461181640625, 2.95355224609375, 1.4454078674316406, 1.8313522338867188, 0.8572158813476562, 5.248317718505859, 3.46673583984375, -1.6543922424316406, -0.34395599365234375, -0.7369537353515625, 1.1281471252441406, -2.9534225463867188], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000255.npy"}
|
||||
{"epoch": 0.3854875283446712, "step": 256, "batch_size": 64, "mean": 1.0808756351470947, "std": 1.959763526916504, "min": -2.7500152587890625, "p10": -0.9858810424804686, "median": 0.8094482421875, "p90": 3.5270507812500003, "max": 6.26483154296875, "pos_frac": 0.703125, "sample": [1.7706756591796875, 2.293670654296875, 0.818120002746582, 2.4139060974121094, 3.545562744140625, -2.7500152587890625, -0.031375885009765625, -0.04535675048828125, -0.2714042663574219, 0.08324432373046875, -2.3425827026367188, 0.6600322723388672, -0.841064453125, 3.402252197265625, 0.0714111328125, 5.7135009765625, -0.26035308837890625, 1.0799407958984375, -2.5108108520507812, 0.817291259765625, -1.0612716674804688, 2.8368682861328125, -0.3876533508300781, 1.37298583984375, -0.2215118408203125, 0.6412582397460938, 0.9434585571289062, -1.04351806640625, 0.6264801025390625, 0.801605224609375, -0.49976348876953125, 1.7127189636230469, 1.2908554077148438, 2.7652816772460938, 1.1592216491699219, 1.415924072265625, 2.0621795654296875, 3.6584701538085938, 0.7123260498046875, 3.8150272369384766, -0.22920989990234375, 0.7145805358886719, 3.2550048828125, -2.5913238525390625, 2.9195289611816406, 1.4653701782226562, 3.483856201171875, 1.4914703369140625, 0.1374225616455078, 5.334930419921875, 0.5037155151367188, 2.3886184692382812, 2.047119140625, 0.08964157104492188, 2.706634521484375, -2.4434890747070312, 6.26483154296875, -0.43804931640625, 0.24298381805419922, 5.096466064453125, -0.5196399688720703, 0.7428092956542969, 1.146575927734375, -0.8513946533203125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000256.npy"}
|
||||
{"epoch": 0.3869992441421013, "step": 257, "batch_size": 64, "mean": 1.360719919204712, "std": 2.345146417617798, "min": -3.6463584899902344, "p10": -1.3896255493164062, "median": 1.2273364067077637, "p90": 4.2456604003906255, "max": 7.450592041015625, "pos_frac": 0.671875, "sample": [-0.12695884704589844, 0.05432891845703125, 0.31182861328125, 6.288116455078125, 4.1728668212890625, 2.5068893432617188, -0.2929496765136719, 3.18658447265625, 0.3449974060058594, 4.062485694885254, 2.924114227294922, 1.2378768920898438, 1.2167959213256836, 0.3317718505859375, 3.8255157470703125, 4.320068359375, 2.352447509765625, -0.00684356689453125, -3.1936798095703125, -0.07767486572265625, 0.2461090087890625, 0.7516250610351562, 4.26336669921875, -0.1496143341064453, 1.9311065673828125, -0.4871978759765625, -2.3442344665527344, 4.196746826171875, 1.2771987915039062, -1.862457275390625, 2.485870361328125, 3.9478530883789062, -1.121368408203125, -1.8228759765625, 2.9668121337890625, 2.4574661254882812, -1.17510986328125, 2.0755386352539062, 4.503692626953125, -0.0489654541015625, -3.6463584899902344, 4.204345703125, -0.0971527099609375, 4.310722351074219, -1.3980255126953125, 0.7244014739990234, -0.05779266357421875, -1.370025634765625, 2.5095901489257812, 1.919921875, 1.6358261108398438, 0.2720947265625, 2.5572357177734375, -0.5771522521972656, 7.450592041015625, 6.678504943847656, 0.14894676208496094, -2.4490509033203125, 3.217315673828125, 1.9563446044921875, 1.5093631744384766, 0.8628616333007812, -0.78466796875, 1.97808837890625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000257.npy"}
|
||||
{"epoch": 0.3885109599395314, "step": 258, "batch_size": 64, "mean": 1.1515395641326904, "std": 1.5226596593856812, "min": -2.0713653564453125, "p10": -0.6859199523925781, "median": 1.2621593475341797, "p90": 3.1000528335571294, "max": 5.424518585205078, "pos_frac": 0.734375, "sample": [-0.5013046264648438, 1.4214859008789062, 0.49814605712890625, -0.6958961486816406, 1.428436279296875, 4.839601516723633, 0.7621612548828125, -0.671142578125, 1.9072341918945312, -0.31517791748046875, 1.710052490234375, 1.5150375366210938, -0.928375244140625, 2.0720348358154297, 0.9328384399414062, 1.7468414306640625, -0.6922531127929688, -1.1553840637207031, -0.2627105712890625, -1.6196746826171875, 2.4789047241210938, -0.458831787109375, -1.111175537109375, 2.571075439453125, 1.2581634521484375, 1.4550552368164062, 2.3177490234375, -0.3729381561279297, 0.3613090515136719, 1.2661552429199219, 0.8350067138671875, 1.4800033569335938, 1.8125152587890625, 2.973447799682617, 1.6560726165771484, 1.1844482421875, 0.37046241760253906, 0.5039215087890625, -0.22735595703125, 3.8269500732421875, 0.9620590209960938, 2.1399688720703125, 3.7583847045898438, -0.3337860107421875, 3.402923583984375, 1.4974250793457031, 3.1543121337890625, 1.796651840209961, 2.1646499633789062, 1.992889404296875, 0.5349044799804688, 0.4083251953125, 0.11700439453125, -0.24060821533203125, 1.8624534606933594, 1.76861572265625, 2.789276123046875, -2.0713653564453125, 1.4300804138183594, -0.5073928833007812, 3.7221078872680664, 1.1115570068359375, 5.424518585205078, 0.6406898498535156], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000258.npy"}
|
||||
{"epoch": 0.3900226757369615, "step": 259, "batch_size": 64, "mean": 1.5702614784240723, "std": 2.1773834228515625, "min": -5.6485137939453125, "p10": -1.3084598541259762, "median": 1.7954769134521484, "p90": 4.315065002441407, "max": 6.6044921875, "pos_frac": 0.765625, "sample": [-0.06889724731445312, -0.673858642578125, -5.6485137939453125, 1.7365303039550781, 1.222503662109375, 2.206584930419922, 1.4443435668945312, 5.0032501220703125, 2.4661788940429688, -1.8021163940429688, 3.1985092163085938, -0.4996337890625, 1.8544235229492188, 1.5009307861328125, 2.2843856811523438, -0.9585952758789062, -2.2405948638916016, 0.6064224243164062, 6.6044921875, 2.2526702880859375, 0.21192169189453125, 3.257232666015625, 5.201564788818359, 5.210750579833984, 1.1789703369140625, -1.4766921997070312, 1.99481201171875, -1.4108695983886719, 0.7929916381835938, 1.2137298583984375, 1.9673004150390625, 0.9137115478515625, -2.1575660705566406, 2.494415283203125, 0.9448165893554688, -0.1456756591796875, 2.5864715576171875, 2.1618194580078125, 3.13165283203125, 2.267303466796875, 3.2050514221191406, 1.3009414672851562, -0.7734146118164062, 4.74835205078125, 1.1992645263671875, 2.194549560546875, -1.6470985412597656, -0.6141395568847656, -1.0695037841796875, 0.4584388732910156, 4.549079895019531, 3.8800277709960938, 2.608673095703125, 3.781158447265625, 4.100555419921875, 3.719676971435547, 2.9536590576171875, 0.2124176025390625, 1.1764602661132812, 2.0492935180664062, 4.4069976806640625, 3.7751312255859375, 1.859832763671875, 1.5936527252197266], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000259.npy"}
|
||||
{"epoch": 0.3915343915343915, "step": 260, "batch_size": 64, "mean": 1.241944432258606, "std": 2.1053764820098877, "min": -3.1635208129882812, "p10": -1.452364730834961, "median": 1.2562332153320312, "p90": 3.5750576019287115, "max": 7.637626647949219, "pos_frac": 0.71875, "sample": [3.4030494689941406, -3.1635208129882812, 3.35296630859375, -0.47917938232421875, -0.3474617004394531, 5.4626312255859375, 0.8271331787109375, 1.516876220703125, 5.34259033203125, -1.4793853759765625, 4.1371307373046875, 1.6356639862060547, 3.3903961181640625, 1.7075653076171875, 1.2670516967773438, 0.9568634033203125, 3.364715576171875, 0.9429740905761719, 1.5801830291748047, 7.637626647949219, 2.5894775390625, 3.4838485717773438, 3.245777130126953, 2.674884796142578, 0.4838600158691406, 1.2231025695800781, 2.7579269409179688, 3.614147186279297, -1.2170639038085938, -0.346893310546875, -0.012401580810546875, 1.5870361328125, 2.5439605712890625, -1.1113433837890625, 0.2704315185546875, 1.6382980346679688, -0.13301849365234375, -1.3614921569824219, -2.452878952026367, 1.80865478515625, 0.09078788757324219, 0.6992607116699219, 0.3232154846191406, -0.38039398193359375, 1.7871170043945312, 1.5260200500488281, 5.031562805175781, -1.80364990234375, 4.207286834716797, 2.8437652587890625, 0.23830795288085938, 0.9607677459716797, -0.10760498046875, 1.837066650390625, 1.3557891845703125, 1.6663055419921875, -2.6829833984375, 1.2454147338867188, 1.690399169921875, -1.3893165588378906, 1.0366554260253906, -1.920562744140625, -1.7179222106933594, 0.6049690246582031], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000260.npy"}
|
||||
{"epoch": 0.3930461073318216, "step": 261, "batch_size": 64, "mean": 0.9704260230064392, "std": 1.79734206199646, "min": -2.730010986328125, "p10": -1.1416862487792965, "median": 0.827728271484375, "p90": 3.138766098022461, "max": 6.037506103515625, "pos_frac": 0.71875, "sample": [-0.6017494201660156, 3.052032470703125, 0.7198257446289062, -0.7180938720703125, -0.4716796875, 1.7805614471435547, -0.017000198364257812, 2.624553680419922, 0.26192474365234375, -0.37751197814941406, -2.730010986328125, -0.763824462890625, 0.28658294677734375, 1.4937896728515625, 1.1465721130371094, 2.544769287109375, 0.6510353088378906, 4.7282867431640625, 0.4767284393310547, 4.1401824951171875, 1.17999267578125, 1.9809341430664062, 0.8262710571289062, -0.5363922119140625, 1.116607666015625, -0.319183349609375, 5.829246520996094, 1.3411235809326172, 1.1766510009765625, 1.0988998413085938, 0.9507026672363281, 0.9000205993652344, 1.7679443359375, 1.485107421875, 0.06076622009277344, 0.7810516357421875, 6.037506103515625, -1.982330322265625, -2.3639373779296875, -0.14487075805664062, 0.5313186645507812, 0.019588470458984375, 1.5248374938964844, 1.0767631530761719, 1.84246826171875, -1.4821243286132812, -0.23315811157226562, -1.795013427734375, -1.348846435546875, 1.5077667236328125, 1.0439453125, -1.3036270141601562, 4.6225433349609375, 4.036247253417969, 0.8291854858398438, 0.40091705322265625, 3.1759376525878906, 3.0080642700195312, -0.3849334716796875, 2.172332763671875, 0.28404998779296875, 0.21699905395507812, 0.647491455078125, 2.3014259338378906], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000261.npy"}
|
||||
{"epoch": 0.3945578231292517, "step": 262, "batch_size": 64, "mean": 1.277741551399231, "std": 1.8990713357925415, "min": -3.496826171875, "p10": -1.0492515563964842, "median": 1.1326961517333984, "p90": 3.887598419189454, "max": 7.180084228515625, "pos_frac": 0.78125, "sample": [-0.5539169311523438, 1.959716796875, 2.5215377807617188, 0.14568710327148438, 0.814117431640625, 0.6361103057861328, 2.2282867431640625, -0.89776611328125, 1.6273307800292969, 2.7530364990234375, 0.13250732421875, 4.256599426269531, 0.6016311645507812, 0.5769462585449219, 4.74310302734375, -1.7726287841796875, -3.496826171875, 3.41741943359375, 1.538330078125, 1.825653076171875, -1.59234619140625, 1.7251815795898438, -1.1141738891601562, -0.474700927734375, 2.6237258911132812, 5.073268890380859, 2.567657470703125, 1.9388999938964844, -1.2783203125, -0.08386802673339844, 3.597515106201172, 0.8328704833984375, 0.45795440673828125, 3.102327346801758, -1.3094482421875, 1.172149658203125, 0.2386932373046875, 2.5887603759765625, 3.717987060546875, 1.14898681640625, -0.009906768798828125, 1.4161376953125, 0.9266223907470703, 3.9850997924804688, 7.180084228515625, 2.3616905212402344, -0.28238677978515625, 1.1164054870605469, -0.2205963134765625, 0.033641815185546875, 1.271331787109375, 0.5263252258300781, 4.283134460449219, 0.7593059539794922, 1.4488067626953125, 0.6640472412109375, 1.4546051025390625, 3.9602890014648438, 1.4919357299804688, 1.0199127197265625, 0.6584396362304688, 2.231689453125, 0.0560150146484375, -2.5471649169921875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000262.npy"}
|
||||
{"epoch": 0.3960695389266818, "step": 263, "batch_size": 64, "mean": 1.0216337442398071, "std": 1.9475810527801514, "min": -3.302602767944336, "p10": -1.479960823059082, "median": 0.9587364196777344, "p90": 3.7922065734863284, "max": 5.986602783203125, "pos_frac": 0.671875, "sample": [0.8896026611328125, 3.1230506896972656, -1.8918228149414062, 1.0278701782226562, -0.2308197021484375, 0.046878814697265625, 1.6672210693359375, -0.5300369262695312, 1.6947441101074219, 3.969348907470703, 3.80999755859375, 0.8159408569335938, -0.02333831787109375, 1.0352249145507812, -0.21793365478515625, -2.3478546142578125, 1.79583740234375, 1.9121856689453125, -1.5545425415039062, -1.6921463012695312, 1.35943603515625, 2.3959503173828125, 1.150238037109375, 1.0913505554199219, -0.4364814758300781, 0.4674510955810547, 3.954071044921875, 0.7383804321289062, 1.6104698181152344, 1.2130355834960938, 5.366754531860352, 2.1444168090820312, 1.3891334533691406, 3.09283447265625, -1.3059368133544922, -0.4273681640625, -0.22179794311523438, 3.72967529296875, 0.5706520080566406, 3.9956588745117188, -1.220703125, 0.046539306640625, 1.469085693359375, 3.7506942749023438, 0.06598663330078125, -0.6726455688476562, -0.96612548828125, 3.2548913955688477, 1.2241287231445312, 1.0630683898925781, -1.95111083984375, 4.873817443847656, 0.806854248046875, -0.09841156005859375, 5.986602783203125, 0.20649337768554688, 3.720846176147461, 0.5603923797607422, -0.08788681030273438, -1.8471221923828125, -0.07360076904296875, -3.302602767944336, 1.4114761352539062, 1.9865570068359375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000263.npy"}
|
||||
{"epoch": 0.3975812547241119, "step": 264, "batch_size": 64, "mean": 1.2877914905548096, "std": 1.840446949005127, "min": -2.1405487060546875, "p10": -0.5310935974121094, "median": 0.8942718505859375, "p90": 4.009604644775391, "max": 6.5540618896484375, "pos_frac": 0.78125, "sample": [5.036834716796875, -0.5331878662109375, 0.3308258056640625, 0.33071422576904297, 0.5299386978149414, -1.0845565795898438, 1.0753555297851562, 0.8858489990234375, 1.3751411437988281, 0.766326904296875, -1.702890396118164, 0.7993011474609375, -1.9964752197265625, 4.218555450439453, -0.14554786682128906, -0.4534149169921875, 0.16361618041992188, 2.45953369140625, 0.126861572265625, 2.6394729614257812, -0.23593902587890625, 1.3175373077392578, 1.1319808959960938, 2.5943603515625, 1.5643386840820312, 5.470924377441406, 0.49857330322265625, 3.4376220703125, -2.1405487060546875, 0.06719779968261719, 4.0429229736328125, 1.3870620727539062, 3.9318618774414062, 0.235870361328125, 0.7996826171875, 0.4892444610595703, 2.2973403930664062, -0.5262069702148438, 0.27921295166015625, 5.1010894775390625, 0.5689620971679688, -0.49304962158203125, -1.0460739135742188, 0.33685302734375, 3.6385269165039062, 1.4663162231445312, 0.9026947021484375, 2.2553253173828125, -0.5700836181640625, 1.0148506164550781, 1.1751556396484375, 0.3977508544921875, 4.914337158203125, 2.286235809326172, 0.01081085205078125, -0.31349945068359375, 1.7340087890625, 3.149658203125, 1.7035369873046875, -0.09716987609863281, 1.8227691650390625, 6.5540618896484375, 2.1708221435546875, 2.2694778442382812], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000264.npy"}
|
||||
{"epoch": 0.39909297052154197, "step": 265, "batch_size": 64, "mean": 1.6635124683380127, "std": 2.369995594024658, "min": -2.9914989471435547, "p10": -1.1508316040039062, "median": 1.417959213256836, "p90": 4.427030372619629, "max": 7.037811279296875, "pos_frac": 0.75, "sample": [-0.6952590942382812, 1.4458885192871094, -0.7210578918457031, 0.491851806640625, 1.4190330505371094, -0.297119140625, 0.883026123046875, 1.4168853759765625, -1.6715621948242188, 6.930873870849609, 0.4640216827392578, 3.0352096557617188, 1.6121788024902344, 2.7294960021972656, -1.185699462890625, 0.41260528564453125, 1.538839340209961, 2.2062816619873047, 3.5730552673339844, -2.221832275390625, -0.9049959182739258, 3.125579833984375, 4.776191711425781, 3.4680213928222656, 0.00325775146484375, 1.8118896484375, 2.580352783203125, 0.30614471435546875, 3.0074844360351562, 0.0138397216796875, 3.352039337158203, 4.451896667480469, -1.0134735107421875, 6.814018249511719, 1.1592559814453125, 5.754081726074219, 0.9614162445068359, -1.60693359375, -2.6101341247558594, 0.39711761474609375, 2.588306427001953, 4.369009017944336, 0.9674873352050781, 3.2626800537109375, 7.037811279296875, 3.239105224609375, 5.1994476318359375, 3.8515777587890625, 1.2089500427246094, -2.9914989471435547, 1.043182373046875, 0.8103713989257812, -0.01625823974609375, 3.931732177734375, 4.216560363769531, 3.289370536804199, 4.160491943359375, -1.0694732666015625, 0.9505748748779297, -0.6756668090820312, 4.227874755859375, -0.5081825256347656, 2.3318939208984375, -2.17431640625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000265.npy"}
|
||||
{"epoch": 0.40060468631897206, "step": 266, "batch_size": 64, "mean": 1.3903379440307617, "std": 2.3842670917510986, "min": -4.673801422119141, "p10": -1.087937545776367, "median": 1.3360595703125, "p90": 4.373542785644531, "max": 8.473434448242188, "pos_frac": 0.734375, "sample": [-1.1716957092285156, 4.400157928466797, 5.3852081298828125, 2.054811477661133, 3.856109619140625, -0.13610458374023438, -0.74365234375, -0.8925018310546875, 1.8881511688232422, -1.7440872192382812, 1.6739425659179688, 0.862548828125, -0.0650787353515625, 1.0418338775634766, 0.4921226501464844, 1.023651123046875, -0.6710128784179688, 6.771766662597656, 0.7909927368164062, -0.3680839538574219, 4.788043975830078, 2.17633056640625, 1.038665771484375, 3.0687637329101562, 2.0527496337890625, 4.39453125, -2.5937881469726562, 4.1737823486328125, 3.701335906982422, 2.0627365112304688, 2.076244354248047, 0.6171417236328125, 1.4448394775390625, -0.24073028564453125, 0.39813995361328125, 0.7298126220703125, 0.99542236328125, 5.706939697265625, 2.5946578979492188, -0.43514251708984375, 1.3658599853515625, -2.9454498291015625, 1.50238037109375, 1.5541572570800781, -2.68060302734375, 1.7780914306640625, 4.3245697021484375, 1.0005035400390625, 1.3911819458007812, 0.8638763427734375, -4.673801422119141, 0.6796188354492188, -3.66668701171875, 3.45904541015625, 2.5867691040039062, 0.4643211364746094, 8.473434448242188, -0.7109165191650391, 2.9924468994140625, 1.8093643188476562, 3.851778030395508, 1.3062591552734375, 1.4616508483886719, -0.4057769775390625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000266.npy"}
|
||||
{"epoch": 0.4021164021164021, "step": 267, "batch_size": 64, "mean": 1.2320003509521484, "std": 1.7974178791046143, "min": -3.688495635986328, "p10": -0.9936218261718748, "median": 1.2667865753173828, "p90": 3.7048023223876965, "max": 4.747528076171875, "pos_frac": 0.734375, "sample": [1.4112014770507812, 0.15288734436035156, -0.7413330078125, -1.4261474609375, 3.0230789184570312, 0.6983375549316406, 2.4831809997558594, 2.576995849609375, 3.0703887939453125, 1.9680557250976562, 0.8079547882080078, 0.5302963256835938, 0.2820320129394531, 4.547386169433594, 0.5672035217285156, -0.6569252014160156, 0.05141448974609375, -0.6491851806640625, 0.073455810546875, -1.4912185668945312, -1.1286392211914062, -0.6009750366210938, 0.6850128173828125, 1.6976051330566406, 4.6478118896484375, 4.318132400512695, 0.31369781494140625, -1.378885269165039, 1.5568084716796875, -0.28975677490234375, 2.3275833129882812, 1.8581504821777344, 3.4036712646484375, 1.5904312133789062, -0.280303955078125, -0.5383377075195312, 4.199432373046875, -0.4560394287109375, -0.4729766845703125, 4.1076812744140625, 1.3116073608398438, -0.2539329528808594, 4.747528076171875, 1.350900650024414, 1.2219657897949219, 0.6959037780761719, 2.5339813232421875, 2.547027587890625, -3.688495635986328, 2.2378921508789062, 2.3759918212890625, -1.10174560546875, -1.3579778671264648, 1.010763168334961, 1.8975028991699219, 0.20184707641601562, 2.8495330810546875, 2.878032684326172, 2.2838096618652344, 0.7730865478515625, 2.984410285949707, 3.8338584899902344, 2.918548583984375, 1.7568168640136719], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000267.npy"}
|
||||
{"epoch": 0.4036281179138322, "step": 268, "batch_size": 64, "mean": 0.8605864644050598, "std": 2.1289849281311035, "min": -3.0129470825195312, "p10": -1.5056022644042968, "median": 0.5308208465576172, "p90": 2.989954376220703, "max": 9.50225830078125, "pos_frac": 0.65625, "sample": [-0.2015533447265625, 4.987216949462891, 2.1737594604492188, -0.7206573486328125, -0.7421722412109375, 1.0896682739257812, 0.06023406982421875, -2.2651214599609375, -1.3279991149902344, 4.500312805175781, 2.3002395629882812, 0.8029098510742188, -0.15850830078125, 0.581787109375, 2.279266357421875, -0.0099029541015625, 0.16397476196289062, -2.05145263671875, -1.1198139190673828, 2.587615966796875, 2.885723114013672, 2.0383148193359375, -1.3598175048828125, -0.6998291015625, -1.4106521606445312, 0.81976318359375, 0.01003265380859375, 0.466156005859375, 2.1437301635742188, 0.9137382507324219, 1.4882583618164062, 0.4171600341796875, 0.5698165893554688, -0.3000755310058594, 0.31095123291015625, 2.9922409057617188, 0.7069473266601562, -3.0129470825195312, -1.97552490234375, 0.4918251037597656, 1.91937255859375, -0.3405780792236328, 1.9533157348632812, 9.50225830078125, 6.2765960693359375, 0.8030319213867188, 2.9826507568359375, -0.8115406036376953, 0.057830810546875, -1.546295166015625, -2.072723388671875, 1.8648834228515625, 2.6580467224121094, -0.6194925308227539, 3.9656982421875, 1.2172050476074219, -0.16541290283203125, 2.984619140625, 1.2809028625488281, 0.2975597381591797, -1.7416152954101562, 3.0523223876953125, 0.38660430908203125, 0.7466773986816406], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000268.npy"}
{"epoch": 0.4051398337112623, "step": 269, "batch_size": 64, "mean": 1.121701955795288, "std": 1.882659673690796, "min": -3.402557373046875, "p10": -1.298862838745117, "median": 0.8151321411132812, "p90": 3.7829101562500007, "max": 5.158657073974609, "pos_frac": 0.734375, "sample": [1.4308547973632812, -3.402557373046875, 3.466796875, 2.272735595703125, -0.3422088623046875, -1.9886360168457031, 0.3449249267578125, 5.158657073974609, 3.6251068115234375, -1.2894821166992188, 0.775665283203125, 2.4373779296875, 1.2967844009399414, 0.4961090087890625, 3.0419845581054688, 4.462493896484375, -0.5634841918945312, 0.46016693115234375, 1.76629638671875, -1.8451461791992188, -1.3028831481933594, 0.33819580078125, 2.866896629333496, 0.664642333984375, -1.88250732421875, 2.93328857421875, 0.6524848937988281, 4.9805145263671875, 1.8366546630859375, 2.2768726348876953, 1.295846939086914, 1.4845008850097656, 0.3339071273803711, 2.1581344604492188, 0.4188995361328125, 0.3736133575439453, -0.6291542053222656, -0.631866455078125, 0.41497039794921875, 1.720855712890625, 3.8554039001464844, 2.5889358520507812, 0.1755218505859375, 2.707500457763672, 2.9436168670654297, -0.12452316284179688, 2.083578109741211, 0.14490890502929688, -1.4479179382324219, 0.7908554077148438, -1.0814208984375, 1.58062744140625, 3.9267425537109375, 0.4357147216796875, -0.35718536376953125, -2.8306350708007812, -0.0335540771484375, 3.8505401611328125, -0.2211456298828125, 2.4695053100585938, 2.0472640991210938, 4.1187286376953125, 0.8394088745117188, 1.4181442260742188], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000269.npy"}
{"epoch": 0.40665154950869237, "step": 270, "batch_size": 64, "mean": 0.9228445291519165, "std": 1.872947335243225, "min": -2.80511474609375, "p10": -1.0321796417236326, "median": 0.5806713104248047, "p90": 3.0946319580078128, "max": 7.0340118408203125, "pos_frac": 0.703125, "sample": [1.6361160278320312, 2.3026046752929688, -0.6793594360351562, 4.467437744140625, 2.533823013305664, 4.570365905761719, 2.616575241088867, 2.041179656982422, 1.248992919921875, 0.5875244140625, 7.0340118408203125, 0.4346733093261719, -0.3002586364746094, -0.5501976013183594, 1.3254165649414062, 0.3952789306640625, 0.0096435546875, 0.10223770141601562, 0.11541748046875, 2.9927978515625, 0.6974639892578125, -1.71026611328125, 0.0829620361328125, 0.45829010009765625, 3.138275146484375, 3.417346954345703, -0.41236114501953125, -2.57861328125, 4.2438201904296875, 0.5738182067871094, -0.45733642578125, -0.6561279296875, -0.31107330322265625, -0.10552978515625, 0.09933090209960938, 2.2882766723632812, 0.9828643798828125, -1.0840339660644531, 0.45584869384765625, -2.3062095642089844, 2.445770263671875, 2.6058502197265625, 2.663360595703125, -0.9111862182617188, 1.8218822479248047, 2.2609100341796875, -2.80511474609375, -0.7012176513671875, 0.547607421875, 0.7077407836914062, 3.195568084716797, -0.0759735107421875, -2.736724853515625, 2.3688201904296875, 0.36579132080078125, 1.0436859130859375, 0.4166717529296875, -2.3731689453125, 1.4532928466796875, 0.7226181030273438, 1.8244552612304688, -0.13046646118164062, 2.2038726806640625, 2.4469757080078125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000270.npy"}
{"epoch": 0.40816326530612246, "step": 271, "batch_size": 64, "mean": 0.8335303068161011, "std": 2.2783432006835938, "min": -2.7954397201538086, "p10": -1.6324790954589843, "median": 0.5418243408203125, "p90": 3.034123992919922, "max": 9.318145751953125, "pos_frac": 0.609375, "sample": [1.30609130859375, 1.4240570068359375, 0.46666717529296875, -1.5294647216796875, 1.5688629150390625, 1.9782485961914062, 0.7651405334472656, 2.2404632568359375, 3.903461456298828, -1.2604827880859375, 2.87652587890625, -2.2940216064453125, 3.06280517578125, -0.6819772720336914, 2.9672012329101562, 3.3100128173828125, 2.896493911743164, 1.201995849609375, 0.27002525329589844, 8.103729248046875, -1.6766281127929688, -1.1114921569824219, 1.4849929809570312, -2.7954397201538086, 1.2857513427734375, -0.2502250671386719, -1.7882766723632812, 0.9854698181152344, -1.2215576171875, 0.04727363586425781, 1.0403318405151367, 2.536073684692383, -2.1492347717285156, -1.7713165283203125, 0.50091552734375, 2.8810501098632812, 0.582733154296875, -0.5452346801757812, 1.640899658203125, 3.5352792739868164, -0.0519866943359375, -1.2018280029296875, 9.318145751953125, -1.3055801391601562, 1.2827415466308594, 1.52532958984375, 0.42053985595703125, -0.6679801940917969, -0.8380317687988281, -0.7036209106445312, -0.01316070556640625, -0.7076568603515625, 2.03021240234375, 2.1270923614501953, 0.7372417449951172, 0.26416015625, -2.3293609619140625, 0.6156158447265625, -0.6045761108398438, 0.411773681640625, -0.32276344299316406, -1.299957275390625, 6.526725769042969, 2.3456649780273438], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000271.npy"}
{"epoch": 0.40967498110355255, "step": 272, "batch_size": 64, "mean": 1.4435718059539795, "std": 2.277405261993408, "min": -8.5623779296875, "p10": -0.3984194755554199, "median": 1.2427053451538086, "p90": 3.955188751220703, "max": 10.50677490234375, "pos_frac": 0.875, "sample": [0.1807098388671875, 0.9338302612304688, 0.01220703125, 2.5758819580078125, 0.7062225341796875, 5.007102966308594, 2.81982421875, 2.8139114379882812, 3.959869384765625, 2.398162841796875, 1.6773452758789062, 1.6660957336425781, -0.46018028259277344, 0.36585235595703125, 1.3374481201171875, 0.9870414733886719, 0.2868804931640625, 0.390045166015625, 5.275367736816406, 4.299125671386719, 1.1690635681152344, 1.3163471221923828, 1.4502220153808594, 1.99420166015625, 2.6460647583007812, 0.7999725341796875, 2.8237152099609375, -1.226409912109375, 0.6919479370117188, 4.440792083740234, 0.668487548828125, 5.2606353759765625, -0.7980728149414062, -0.4176149368286133, 3.1396923065185547, 0.08571243286132812, 0.09054851531982422, 0.15608978271484375, 1.7872695922851562, -1.134552001953125, 1.139312744140625, 3.4386978149414062, 1.90264892578125, 1.8495254516601562, 0.09383392333984375, 0.42984962463378906, -8.5623779296875, 1.0137176513671875, 1.7934989929199219, 2.7369003295898438, 0.5683975219726562, 1.8084068298339844, 2.1610031127929688, 1.53045654296875, 0.9435501098632812, 10.50677490234375, 0.14192962646484375, 1.5414505004882812, -0.9964828491210938, 0.7981281280517578, -0.35363006591796875, 0.09887504577636719, 3.9442672729492188, 1.6830062866210938], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000272.npy"}
{"epoch": 0.41118669690098264, "step": 273, "batch_size": 64, "mean": 1.5170398950576782, "std": 2.183688163757324, "min": -3.0321578979492188, "p10": -1.2594863891601562, "median": 1.7625770568847656, "p90": 3.5153770446777344, "max": 8.28564453125, "pos_frac": 0.765625, "sample": [2.447521209716797, 0.8092880249023438, 2.22052001953125, -0.38921356201171875, 3.102294921875, 4.453880310058594, 8.28564453125, 3.5312957763671875, -3.0321578979492188, 0.8130722045898438, -0.3779754638671875, -2.4476165771484375, 2.1818695068359375, -1.9130630493164062, 1.7681198120117188, 2.0257949829101562, 1.7649459838867188, 0.876312255859375, 3.6295509338378906, -1.1514892578125, 2.7491531372070312, 3.098346710205078, -0.26148223876953125, -1.0175857543945312, -2.4304046630859375, 1.3481216430664062, 2.6204605102539062, 0.6423873901367188, -0.3805961608886719, 1.4155426025390625, 0.680328369140625, 1.838714599609375, 1.7298049926757812, -2.3803176879882812, 1.8095550537109375, 0.713043212890625, 2.3359031677246094, 3.4782333374023438, 0.7564849853515625, 2.6882858276367188, -1.3057708740234375, 1.022216796875, 1.7602081298828125, -0.43604278564453125, 0.22141647338867188, 7.4310302734375, 6.505563735961914, 2.627553939819336, -0.178558349609375, 3.4598770141601562, 1.221160888671875, 0.7574577331542969, 2.3556175231933594, 3.3124542236328125, 1.3408699035644531, 2.5198822021484375, 1.985076904296875, 2.632080078125, 0.21699905395507812, 2.0666122436523438, 2.605377197265625, 3.9065017700195312, -2.317859649658203, 3.3482589721679688], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000273.npy"}
{"epoch": 0.4126984126984127, "step": 274, "batch_size": 64, "mean": 1.1310944557189941, "std": 1.844063639640808, "min": -2.4648208618164062, "p10": -1.001198387145996, "median": 1.1436309814453125, "p90": 3.693414306640625, "max": 4.818267822265625, "pos_frac": 0.640625, "sample": [3.4320030212402344, -0.8194389343261719, -0.3315544128417969, -0.5554962158203125, -1.391754150390625, -1.6703033447265625, 4.1521759033203125, 1.1511001586914062, 1.8536720275878906, -0.5359783172607422, 1.3790283203125, 3.07275390625, -2.4648208618164062, -2.10980224609375, 1.248382568359375, 3.8461456298828125, 3.7109375, -0.7608909606933594, 3.0942726135253906, 0.010396957397460938, 1.738525390625, 2.962810516357422, 1.1361618041992188, 3.149026870727539, 0.625457763671875, 0.6919155120849609, 3.51251220703125, 3.9762420654296875, 0.32813072204589844, 0.5239944458007812, -0.2965831756591797, 0.5041618347167969, 1.2674179077148438, -0.15160369873046875, -0.10753250122070312, 2.7551536560058594, -1.0740280151367188, 0.21457290649414062, 2.168262481689453, 1.7680435180664062, -0.8312625885009766, 3.65252685546875, 0.01885986328125, 2.299774169921875, -0.8053646087646484, 4.5239715576171875, -0.5915355682373047, -1.1709747314453125, 2.611663818359375, -1.9044532775878906, 3.7322616577148438, 3.3852767944335938, 2.7863998413085938, -0.1385345458984375, -0.35064697265625, -0.4089698791503906, -0.11373138427734375, 1.7235279083251953, 4.818267822265625, 2.537494659423828, -0.04316902160644531, 1.2948074340820312, 2.1523513793945312, 1.208038330078125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000274.npy"}
{"epoch": 0.41421012849584277, "step": 275, "batch_size": 64, "mean": 1.4796141386032104, "std": 1.9817357063293457, "min": -2.1995315551757812, "p10": -0.9434890747070311, "median": 1.1282272338867188, "p90": 4.275287628173829, "max": 6.5401611328125, "pos_frac": 0.796875, "sample": [0.6307373046875, 1.7136993408203125, 0.026363372802734375, 2.5340805053710938, 4.047538757324219, 0.1494426727294922, 1.5721893310546875, -0.9994964599609375, 6.1632537841796875, 3.3158111572265625, 0.4275360107421875, 1.7079620361328125, 1.1185150146484375, 5.433940887451172, -0.4681587219238281, -0.16933631896972656, 1.775543212890625, 0.5251007080078125, 3.9586181640625, -0.7481536865234375, -1.379302978515625, 6.5401611328125, 0.5537643432617188, 3.2863693237304688, 2.6182937622070312, 1.137939453125, -2.1995315551757812, 0.8383255004882812, 0.45999908447265625, 1.27618408203125, 3.3176116943359375, 4.5491790771484375, -0.66302490234375, 1.916534423828125, 4.732280731201172, 5.3950347900390625, 0.5130729675292969, 2.031862258911133, 0.25328826904296875, -1.0864753723144531, 3.1502456665039062, 0.9733963012695312, 0.397125244140625, 0.5459461212158203, 1.2427215576171875, 2.1337051391601562, 2.450164794921875, 0.5177383422851562, -1.5253562927246094, 1.2337112426757812, 0.2425689697265625, -1.6757736206054688, -0.8505630493164062, 2.8134536743164062, 3.7125415802001953, 0.942901611328125, 0.6557235717773438, 0.6122589111328125, 1.1488018035888672, 2.8287429809570312, 4.372894287109375, -0.3294525146484375, -0.9833145141601562, 3.278369903564453], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000275.npy"}
{"epoch": 0.41572184429327286, "step": 276, "batch_size": 64, "mean": 1.277951717376709, "std": 2.075593948364258, "min": -3.2035369873046875, "p10": -1.2349864959716794, "median": 1.1717605590820312, "p90": 4.124482345581056, "max": 5.466814041137695, "pos_frac": 0.6875, "sample": [0.82958984375, 0.8873138427734375, -0.04434013366699219, 1.90045166015625, 0.5840988159179688, 0.4803009033203125, 5.268115997314453, -2.7463226318359375, 1.0464839935302734, 2.898283004760742, 2.3189620971679688, -1.3095436096191406, 2.382038116455078, 2.1893386840820312, 2.7643508911132812, 3.0012588500976562, -0.12778091430664062, 1.3157806396484375, 1.5206298828125, 0.7915220260620117, -0.27931976318359375, 4.2680511474609375, 1.7634201049804688, 3.801607131958008, -1.341094970703125, -2.680248260498047, 2.325714111328125, -0.247589111328125, 2.684436798095703, 2.642923355102539, -0.22506332397460938, -2.159393310546875, -2.5372543334960938, 0.09897422790527344, 4.195285797119141, -0.209075927734375, 5.466814041137695, -0.21901893615722656, 3.3338623046875, 2.845001220703125, -3.2035369873046875, 0.4651031494140625, 4.692356109619141, 0.5987262725830078, 1.1150054931640625, -1.0187606811523438, 3.0451087951660156, 2.3512725830078125, 4.933380126953125, 1.4210433959960938, 1.228515625, -0.96527099609375, 3.9592742919921875, -0.2944488525390625, 0.15753936767578125, 4.3760223388671875, 2.36663818359375, -0.7476806640625, -1.0610198974609375, 3.666015625, 1.84564208984375, 3.0470657348632812, 0.9724216461181641, -0.6400680541992188], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000276.npy"}
{"epoch": 0.41723356009070295, "step": 277, "batch_size": 64, "mean": 0.6534090638160706, "std": 2.0773067474365234, "min": -3.2462615966796875, "p10": -2.122990608215332, "median": 1.1023406982421875, "p90": 2.9585517883300785, "max": 7.2148590087890625, "pos_frac": 0.578125, "sample": [1.3655471801757812, -3.2462615966796875, 2.1080856323242188, 1.0761566162109375, 1.8554515838623047, -0.905548095703125, 1.7964134216308594, 1.1285247802734375, 2.523427963256836, 3.0050125122070312, -2.2291412353515625, -2.4584808349609375, 2.14453125, -0.51007080078125, -2.315479278564453, 1.4113655090332031, -0.6248798370361328, 0.7817459106445312, -1.4433364868164062, 1.8568878173828125, 1.6938648223876953, 1.7085800170898438, 1.4757728576660156, -0.17047119140625, -0.3586769104003906, 2.31744384765625, 1.7642822265625, -1.758026123046875, -1.194000244140625, 1.1531944274902344, -0.26389408111572266, 0.9493904113769531, -2.6315860748291016, 3.403118133544922, -0.3301811218261719, -0.05155181884765625, -1.2419204711914062, -1.926177978515625, -0.5357666015625, 0.08061599731445312, 1.59686279296875, -2.0288009643554688, 2.5314407348632812, 3.262176513671875, 1.71917724609375, 1.295431137084961, 2.6875534057617188, 0.9594345092773438, -2.13739013671875, 7.2148590087890625, 3.0345916748046875, -0.5213699340820312, 2.124420166015625, 2.8501434326171875, -1.4018783569335938, -0.4481964111328125, 3.1972131729125977, -3.1251678466796875, 1.63232421875, 5.601310729980469, -1.1853866577148438, 1.5543479919433594, -2.0893917083740234, 2.090513229370117], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000277.npy"}
{"epoch": 0.41874527588813304, "step": 278, "batch_size": 64, "mean": 0.9032933712005615, "std": 1.9488365650177002, "min": -3.0795211791992188, "p10": -1.5877590179443357, "median": 0.9519863128662109, "p90": 3.7417205810546887, "max": 5.358066558837891, "pos_frac": 0.65625, "sample": [-0.961334228515625, -0.38043212890625, 1.6863327026367188, 1.3603057861328125, 1.6702423095703125, -0.5401535034179688, -3.0795211791992188, 1.2048873901367188, 3.8607025146484375, 4.1035614013671875, 2.2983474731445312, 0.638702392578125, 3.4640960693359375, 5.358066558837891, 4.353279113769531, 0.9691925048828125, 1.1195716857910156, 0.2779560089111328, 0.5964012145996094, 2.3425025939941406, 0.9279594421386719, 2.940044403076172, 1.0138740539550781, -2.3472747802734375, -2.8492889404296875, 3.43841552734375, -1.0915374755859375, 2.556049346923828, -0.2576904296875, 1.6779403686523438, 1.5194358825683594, -0.10988616943359375, -1.0864715576171875, -1.4659004211425781, 1.576202392578125, -0.2216644287109375, 1.2603511810302734, -0.832061767578125, 2.6141529083251953, 0.2086944580078125, 3.9098052978515625, 3.9676971435546875, -0.8753662109375, 1.621673583984375, 2.145052909851074, 2.922565460205078, 2.634063720703125, -0.5349788665771484, -1.67791748046875, 0.9347801208496094, 0.7712249755859375, 1.3907623291015625, -2.24884033203125, 0.2076416015625, 0.010101318359375, 4.5111083984375, -2.742156982421875, -0.9556884765625, 1.8925819396972656, -1.639984130859375, -0.26213836669921875, -0.6124725341796875, 0.5043067932128906, 2.1229019165039062], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000278.npy"}
{"epoch": 0.42025699168556313, "step": 279, "batch_size": 64, "mean": 1.3381754159927368, "std": 1.6682668924331665, "min": -2.9616565704345703, "p10": -0.5910274505615233, "median": 1.358266830444336, "p90": 3.5986364364624026, "max": 5.075590133666992, "pos_frac": 0.8125, "sample": [-2.3192996978759766, 1.2463951110839844, 4.1171722412109375, 1.3375892639160156, -2.9616565704345703, 3.5629119873046875, 0.3036022186279297, -2.3268356323242188, -0.258087158203125, 2.6727867126464844, 1.9800262451171875, 2.0732154846191406, 0.5637855529785156, 3.781719207763672, 1.3789443969726562, 1.43798828125, 0.301666259765625, 4.153858184814453, -2.004047393798828, -1.0624008178710938, 0.2697286605834961, 2.679473876953125, -0.4455528259277344, 1.8919868469238281, -0.4052886962890625, 2.51513671875, 2.320169448852539, 1.1122856140136719, 0.817596435546875, 2.6517257690429688, 1.4307403564453125, 3.6139469146728516, 2.0044479370117188, -0.35645294189453125, 0.9841995239257812, 2.4818859100341797, 2.5301971435546875, 2.0450592041015625, 1.91326904296875, 5.075590133666992, 2.9057350158691406, 2.0068817138671875, 2.3427276611328125, 0.7869949340820312, -0.6533737182617188, 2.3265380859375, -0.022716522216796875, 1.1860809326171875, 1.044952392578125, 0.801513671875, 2.0849609375, 0.054595947265625, 0.9521598815917969, 3.7237014770507812, 0.47867393493652344, 0.8416900634765625, 1.397491455078125, -1.6414642333984375, 2.4909210205078125, 1.3295001983642578, 1.1755733489990234, 1.5033645629882812, 4.550952911376953, 0.8662910461425781], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000279.npy"}
{"epoch": 0.4217687074829932, "step": 280, "batch_size": 64, "mean": 1.5437543392181396, "std": 1.5981924533843994, "min": -2.1620101928710938, "p10": -0.03790912628173828, "median": 1.3552932739257812, "p90": 3.872137451171875, "max": 6.476673126220703, "pos_frac": 0.84375, "sample": [0.5369949340820312, 6.476673126220703, 1.3877754211425781, 3.7929916381835938, -0.03975677490234375, 2.705892562866211, -0.03359794616699219, -0.26804351806640625, 1.6525154113769531, 0.6619949340820312, 2.0321502685546875, 1.9796981811523438, 1.8532943725585938, 1.0061187744140625, 1.77716064453125, 3.8518753051757812, 0.7403450012207031, 1.3228111267089844, 3.67059326171875, -1.0638885498046875, 0.11803436279296875, 0.3647880554199219, 0.9471969604492188, -0.014097213745117188, 3.8808212280273438, 1.8734588623046875, 1.9487323760986328, 2.2852554321289062, 4.316009521484375, 4.116546630859375, 0.8525028228759766, -0.00699615478515625, 0.6778030395507812, 0.767364501953125, 0.8881683349609375, 0.8932037353515625, 4.929168701171875, -0.52642822265625, 4.070320129394531, 0.4505157470703125, -0.7943267822265625, -2.1620101928710938, 0.1917266845703125, 0.7895612716674805, 0.5547294616699219, 1.4336891174316406, 0.14776611328125, -0.38317108154296875, 0.29077911376953125, 2.84918212890625, 3.001340866088867, 1.7713642120361328, 1.6135635375976562, 2.138141632080078, 1.57763671875, 1.7941741943359375, 2.0723304748535156, 0.410888671875, 1.0004997253417969, 0.90325927734375, 2.3123779296875, 4.252899169921875, 2.7895965576171875, 3.368316650390625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000280.npy"}
{"epoch": 0.42328042328042326, "step": 281, "batch_size": 64, "mean": 1.2547452449798584, "std": 1.7762106657028198, "min": -2.259592056274414, "p10": -1.0287353515625, "median": 1.308422565460205, "p90": 3.5121351242065435, "max": 5.922981262207031, "pos_frac": 0.734375, "sample": [-1.0514564514160156, 2.899799346923828, 1.3272485733032227, 3.140167236328125, 1.8451080322265625, 2.744842529296875, -0.5991744995117188, 2.9168472290039062, 2.2982025146484375, 4.3751068115234375, 1.4656257629394531, 1.8002891540527344, 1.5288333892822266, 4.948036193847656, 2.220855712890625, 0.9998750686645508, 2.75738525390625, 2.1223373413085938, -0.7330112457275391, 1.5518112182617188, 1.0905990600585938, 0.9434165954589844, 0.603912353515625, 1.2895965576171875, 1.1805992126464844, 0.2698631286621094, 3.405458450317383, -0.6006698608398438, 0.21654510498046875, -1.4696388244628906, 0.27490997314453125, 0.7133522033691406, -0.6784286499023438, 1.5746402740478516, 3.5578536987304688, -1.6675033569335938, 0.80462646484375, 5.922981262207031, -0.5630989074707031, -0.9757194519042969, 3.6679039001464844, 0.28708648681640625, 3.6106491088867188, 1.8648662567138672, 1.9294853210449219, -0.086090087890625, 1.8439521789550781, 2.8708152770996094, 3.867826461791992, 0.39693450927734375, -2.04876708984375, 1.353118896484375, 2.4060096740722656, -2.259592056274414, 1.1021289825439453, 3.125537872314453, 3.3780746459960938, -0.8236083984375, 2.2063674926757812, 0.051116943359375, -0.5211181640625, -1.1683464050292969, -0.031673431396484375, -1.1710090637207031], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000281.npy"}
{"epoch": 0.42479213907785335, "step": 282, "batch_size": 64, "mean": 1.2650127410888672, "std": 2.0832884311676025, "min": -4.25933837890625, "p10": -0.8617431640625, "median": 0.8299593925476074, "p90": 4.32667465209961, "max": 7.655029296875, "pos_frac": 0.734375, "sample": [0.8966293334960938, 0.56549072265625, -1.5734176635742188, 0.7110824584960938, 0.6429824829101562, 0.5396957397460938, 0.4949493408203125, 0.3251018524169922, -1.121856689453125, 4.121879577636719, 4.5303497314453125, 4.87139892578125, 0.8871259689331055, 1.39288330078125, 4.10772705078125, -0.898895263671875, 4.47053337097168, 0.7727928161621094, 0.5778656005859375, 2.393230438232422, 0.1854705810546875, 0.127716064453125, 1.3946647644042969, 4.4144439697265625, -0.20673751831054688, -0.5112037658691406, 0.13455963134765625, 0.0932464599609375, 5.747032165527344, 1.1173934936523438, -1.8296432495117188, -0.5588855743408203, 0.29563140869140625, 1.9927825927734375, -0.4666728973388672, -0.15697479248046875, 2.0159225463867188, 0.38094139099121094, -0.775054931640625, -0.30267333984375, 0.9694328308105469, 3.545267105102539, -0.3076171875, 3.148487091064453, 3.3862533569335938, 3.5844573974609375, 1.5162353515625, 1.485879898071289, -0.6393356323242188, 2.4778289794921875, -0.1812896728515625, 1.0420455932617188, -1.0413055419921875, 0.33247947692871094, -4.25933837890625, 7.655029296875, 1.7232894897460938, 1.7054882049560547, 0.9316978454589844, -1.20233154296875, 2.3358612060546875, 3.2033538818359375, 5.3969573974609375, 2.3524742126464844], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000282.npy"}
{"epoch": 0.42630385487528344, "step": 283, "batch_size": 64, "mean": 1.3505456447601318, "std": 2.203073740005493, "min": -2.8743133544921875, "p10": -1.4728498458862305, "median": 1.1096916198730469, "p90": 3.9776832580566412, "max": 8.28125, "pos_frac": 0.734375, "sample": [5.168426513671875, -1.4270877838134766, -1.9123611450195312, 1.1532745361328125, 3.3235321044921875, 0.9100379943847656, 1.4277687072753906, 1.0661087036132812, 4.2468719482421875, -1.8921737670898438, -1.7114334106445312, 0.6967239379882812, 2.617786407470703, 0.7045326232910156, 1.54449462890625, 1.63507080078125, 1.4999847412109375, 3.598876953125, -1.7321014404296875, 8.28125, 1.1563453674316406, 1.9113693237304688, 2.3152122497558594, -0.47344017028808594, 1.9957733154296875, 3.8094711303710938, 4.753032684326172, 2.7497100830078125, 3.6955718994140625, 1.0540084838867188, 2.7725296020507812, 3.519176483154297, -0.8769989013671875, 0.19141769409179688, -0.1744232177734375, -0.4809417724609375, 0.9976654052734375, 0.3592529296875, 1.2103195190429688, 1.4888153076171875, 1.7457313537597656, 0.37583160400390625, 4.049774169921875, 3.218423843383789, 0.2261810302734375, -1.393096923828125, 0.7865524291992188, -2.8743133544921875, -1.1292448043823242, 6.385772705078125, 2.3547935485839844, -0.23993301391601562, 6.406982421875, 0.4948596954345703, 3.215576171875, 0.7652511596679688, -1.492462158203125, 1.2207717895507812, -0.3325214385986328, 0.3341827392578125, -1.8476104736328125, 2.455646514892578, -0.04705810546875, 0.5813808441162109], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000283.npy"}
{"epoch": 0.42781557067271353, "step": 284, "batch_size": 64, "mean": 1.205304503440857, "std": 2.025188446044922, "min": -5.58935546875, "p10": -0.828992462158203, "median": 1.396803855895996, "p90": 3.8200370788574225, "max": 5.493293762207031, "pos_frac": 0.71875, "sample": [0.618011474609375, -0.48824310302734375, 1.4768829345703125, 0.0522003173828125, 5.040191650390625, -0.3536834716796875, -0.5974349975585938, -2.607818603515625, 4.12872314453125, -0.21390914916992188, 1.0700302124023438, 3.8872528076171875, 0.7893447875976562, 1.2980804443359375, 3.050154685974121, -0.18606185913085938, 2.1408615112304688, 2.3837108612060547, 0.2665557861328125, -5.58935546875, 1.6719818115234375, -2.6488723754882812, 0.47320556640625, -1.8544120788574219, 0.1262054443359375, 2.8281097412109375, 1.5898513793945312, -0.9928054809570312, 1.464040756225586, 1.7997856140136719, 2.1917686462402344, 2.3554534912109375, 3.8951873779296875, 1.3295669555664062, 2.467731475830078, 1.6916923522949219, 1.7370796203613281, 4.959529876708984, 2.597747802734375, -0.9083251953125, 1.5114898681640625, 2.7570266723632812, 0.48598480224609375, 3.1463546752929688, -0.2743721008300781, 0.8554458618164062, -0.319976806640625, 5.493293762207031, 2.2151641845703125, 3.6632003784179688, -0.36647701263427734, 2.537139892578125, 2.2332305908203125, -0.1250762939453125, 2.4477615356445312, 0.845611572265625, 5.15301513671875, 0.33941650390625, 2.360210418701172, -0.6438827514648438, 0.09883880615234375, -2.198911666870117, 2.4764175415039062, -0.49143123626708984], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000284.npy"}
{"epoch": 0.4293272864701436, "step": 285, "batch_size": 64, "mean": 0.8669461011886597, "std": 1.8684848546981812, "min": -3.9874191284179688, "p10": -1.1477081298828125, "median": 0.4641742706298828, "p90": 3.31767578125, "max": 5.534267425537109, "pos_frac": 0.671875, "sample": [0.29549407958984375, 0.134796142578125, -0.870941162109375, 2.4418983459472656, -2.2613601684570312, -1.1395263671875, -0.9832496643066406, 0.12756729125976562, -1.5074653625488281, -3.9874191284179688, 3.3236541748046875, 1.6024894714355469, 3.8867340087890625, 3.0663833618164062, 2.5023422241210938, -0.7615432739257812, 0.8567428588867188, 5.534267425537109, -0.06931304931640625, 1.2207717895507812, 0.5157546997070312, 0.3834228515625, -1.064565658569336, 0.3103485107421875, -1.1986885070800781, -0.8319625854492188, 2.7974090576171875, 0.32680511474609375, -0.61968994140625, 0.29900360107421875, -0.458526611328125, 1.985260009765625, -2.109130859375, 3.3037261962890625, 3.8378448486328125, 0.8996505737304688, 2.3995742797851562, 3.277984619140625, 1.0810585021972656, 1.745849609375, 2.756633758544922, 2.34710693359375, -0.25333404541015625, 3.365436553955078, 2.5071868896484375, 2.1305999755859375, 0.9024219512939453, 3.3643798828125, -1.1118621826171875, -1.1283531188964844, 0.14493370056152344, 2.0524559020996094, 0.4125938415527344, 2.8749771118164062, 0.1845855712890625, 0.13524627685546875, 4.1578369140625, 2.510223388671875, -1.2638130187988281, -0.844635009765625, -0.9935073852539062, -1.151214599609375, 0.66656494140625, 1.4246368408203125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000285.npy"}
{"epoch": 0.4308390022675737, "step": 286, "batch_size": 64, "mean": 1.4889705181121826, "std": 1.9036020040512085, "min": -4.4073944091796875, "p10": -0.24592628479003903, "median": 1.4543981552124023, "p90": 3.7095855712890633, "max": 5.887001037597656, "pos_frac": 0.84375, "sample": [2.6102218627929688, 0.8775711059570312, 0.4531726837158203, 2.1974945068359375, 2.1950759887695312, 3.1001739501953125, 0.55169677734375, 0.04143524169921875, 1.0789527893066406, -0.8707656860351562, 1.6337203979492188, 1.3696403503417969, 0.30364227294921875, 2.3514862060546875, 1.007049560546875, 4.1771087646484375, 2.498615264892578, 0.8114833831787109, -2.2656021118164062, 1.8613433837890625, 1.8682174682617188, -3.0482025146484375, -0.09503173828125, 4.433784484863281, -0.04172515869140625, -2.6675033569335938, 2.6174516677856445, 1.25616455078125, -0.662689208984375, 2.338836669921875, 1.5391559600830078, 2.4545745849609375, 4.263275146484375, 3.5487060546875, 1.2574691772460938, 3.778533935546875, -0.2591819763183594, 1.8163509368896484, -0.214996337890625, 5.8777618408203125, 5.887001037597656, 4.74456787109375, 3.2873153686523438, 0.6139984130859375, 2.135631561279297, 0.8455734252929688, 0.7208366394042969, 1.606201171875, 3.3418655395507812, 3.547739028930664, 0.749664306640625, 0.23038101196289062, 0.23397064208984375, 1.7342987060546875, 0.610107421875, 0.57757568359375, 2.6002349853515625, 0.4077033996582031, -4.4073944091796875, 3.1627655029296875, 2.4110355377197266, 1.0785255432128906, 2.0239791870117188, 1.1060714721679688], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000286.npy"}
{"epoch": 0.4323507180650038, "step": 287, "batch_size": 64, "mean": 1.4589459896087646, "std": 2.287581443786621, "min": -5.9212188720703125, "p10": -0.8134765625, "median": 1.1214838027954102, "p90": 4.457099151611328, "max": 6.612251281738281, "pos_frac": 0.734375, "sample": [1.9951171875, 0.146484375, 0.6124458312988281, 2.7806434631347656, 5.840583801269531, 3.69342041015625, 3.6333770751953125, 1.6790847778320312, 4.9907379150390625, -0.08885955810546875, 2.474925994873047, 1.9474239349365234, 0.8408946990966797, 4.521270751953125, 3.8137359619140625, 4.7389373779296875, 3.5894622802734375, -0.8041839599609375, 1.6921024322509766, 1.0523452758789062, 1.6561737060546875, -5.9212188720703125, 2.4835968017578125, 1.0139989852905273, 1.0999526977539062, 2.4357986450195312, -0.4286937713623047, -0.6106491088867188, 2.5575027465820312, 1.0045509338378906, 2.3453826904296875, -1.3431625366210938, -2.0925064086914062, -0.34186553955078125, -0.8174591064453125, 2.89990234375, 1.1049213409423828, -0.513824462890625, 0.3724098205566406, 1.8725452423095703, 2.0599746704101562, 0.3156242370605469, 4.478458404541016, 4.402378082275391, 1.08905029296875, 1.1380462646484375, 6.612251281738281, -2.0281028747558594, 1.756195068359375, 3.3626976013183594, -0.013413429260253906, -0.4477691650390625, 1.0187015533447266, 0.49565887451171875, -2.876972198486328, -0.6415328979492188, 4.407260894775391, -0.43155670166015625, 0.8485794067382812, 1.2053203582763672, -1.7462005615234375, 6.3005218505859375, 3.9073867797851562, 0.2326812744140625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000287.npy"}
{"epoch": 0.43386243386243384, "step": 288, "batch_size": 64, "mean": 1.1478712558746338, "std": 2.2680630683898926, "min": -4.329742431640625, "p10": -1.8680475234985352, "median": 1.0299110412597656, "p90": 3.986563873291016, "max": 7.149147033691406, "pos_frac": 0.71875, "sample": [3.2302207946777344, 2.1172409057617188, -4.0625, -2.1892471313476562, 5.630805969238281, 0.14988327026367188, 0.2601051330566406, -1.276214599609375, 2.260040283203125, -0.2606697082519531, 1.9142837524414062, -0.19359302520751953, -0.4445152282714844, 2.222665786743164, 2.1590728759765625, 3.8037338256835938, -2.6634063720703125, 2.9348526000976562, 0.122711181640625, 1.6821670532226562, -0.01131439208984375, 2.8806304931640625, 4.482627868652344, 5.503517150878906, 1.1188430786132812, 2.4624786376953125, 0.5565776824951172, 3.4800949096679688, -2.437896728515625, 0.3886299133300781, 4.969352722167969, -0.185272216796875, 0.94097900390625, 1.2938995361328125, 3.1303558349609375, 2.7213516235351562, 1.302215576171875, -1.8929595947265625, 2.2458457946777344, 2.0252304077148438, -2.0057716369628906, -4.329742431640625, 7.149147033691406, 0.539337158203125, 4.002372741699219, -1.3376655578613281, -0.6985816955566406, 0.69061279296875, 1.13421630859375, 2.332763671875, 2.2725296020507812, 0.33693695068359375, 4.23651123046875, 0.3629150390625, -0.5829925537109375, -1.8099193572998047, 0.12963104248046875, 1.5287933349609375, 0.6910114288330078, 1.8035812377929688, 0.57330322265625, 0.30438232421875, 3.949676513671875, -0.1821136474609375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000288.npy"}
|
||||
{"epoch": 0.43537414965986393, "step": 289, "batch_size": 64, "mean": 1.0400168895721436, "std": 1.8364416360855103, "min": -2.5085906982421875, "p10": -0.9974252700805661, "median": 0.8192024230957031, "p90": 3.626511764526368, "max": 6.126106262207031, "pos_frac": 0.703125, "sample": [-0.7396907806396484, 0.18328475952148438, -2.5085906982421875, -0.3513641357421875, 0.8094940185546875, -1.1463623046875, -0.0039520263671875, -0.19733047485351562, 0.6805877685546875, 0.1895904541015625, -1.2252006530761719, 0.9660301208496094, 0.6255455017089844, 1.4603843688964844, 1.524200439453125, 0.6040229797363281, 2.438385009765625, -1.7563552856445312, 0.7634391784667969, 4.8437347412109375, 0.8482513427734375, 5.041557312011719, 1.1098251342773438, 0.7864990234375, 1.8258895874023438, 0.41277313232421875, 2.319122314453125, 1.1777839660644531, 1.2121810913085938, 2.986612319946289, -0.32708740234375, 2.608732223510742, 0.5505590438842773, -1.3795623779296875, 6.126106262207031, -0.6498565673828125, -0.5346298217773438, 4.813079833984375, 0.9733123779296875, 1.1546478271484375, 3.853759765625, 2.2779617309570312, 1.0171432495117188, -0.7775897979736328, 2.716350555419922, -2.3722152709960938, 1.0023479461669922, 0.49233245849609375, -0.5085525512695312, -0.7105484008789062, 3.4262161254882812, 1.2515869140625, 0.8289108276367188, -0.5872802734375, 4.926971435546875, 3.712352752685547, 3.3824462890625, 3.00994873046875, 1.245758056640625, -0.15471267700195312, 0.8614025115966797, -1.0916404724121094, 0.5388545989990234, 0.00362396240234375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000289.npy"}
|
||||
{"epoch": 0.436885865457294, "step": 290, "batch_size": 64, "mean": 1.3873467445373535, "std": 1.7249592542648315, "min": -2.70263671875, "p10": -0.7769222259521484, "median": 1.505579948425293, "p90": 3.7490478515625014, "max": 6.019989013671875, "pos_frac": 0.75, "sample": [2.767608642578125, 2.816516876220703, 1.501333236694336, 4.573272705078125, 2.196056365966797, 0.9332370758056641, -0.2681140899658203, -0.5444679260253906, -0.3533782958984375, 1.98663330078125, 1.2807426452636719, 1.6944618225097656, 0.20053863525390625, -0.8849029541015625, 3.4064178466796875, -0.12876129150390625, 1.5812149047851562, -1.5718841552734375, -0.7862319946289062, 1.731658935546875, -0.13994979858398438, 4.344108581542969, 0.9983024597167969, 1.7283859252929688, 1.7081985473632812, -0.8317604064941406, 2.3966712951660156, 0.41571044921875, 0.07147216796875, -0.8029251098632812, 3.0576629638671875, 4.169162750244141, 3.1006622314453125, 1.6580123901367188, -0.4228363037109375, 4.4637908935546875, -0.7275772094726562, 0.340179443359375, 1.682647705078125, 6.019989013671875, 2.9399871826171875, 3.8958892822265625, 0.9566268920898438, -2.70263671875, 1.6900558471679688, 0.5272445678710938, 1.064072608947754, 4.496223449707031, 3.266399383544922, 2.7347869873046875, 1.7642364501953125, 1.50982666015625, -0.7551994323730469, 0.8617439270019531, -0.8731575012207031, 2.891754150390625, 0.29443359375, 2.411937713623047, 2.1768798828125, -0.2330169677734375, 0.7724075317382812, 1.953765869140625, 0.94537353515625, 0.8386993408203125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000290.npy"}
|
||||
{"epoch": 0.4383975812547241, "step": 291, "batch_size": 64, "mean": 1.4405841827392578, "std": 2.216442108154297, "min": -3.981996536254883, "p10": -1.2309913635253906, "median": 1.4478645324707031, "p90": 4.089361572265625, "max": 6.456199645996094, "pos_frac": 0.734375, "sample": [4.00140380859375, 1.4106674194335938, 0.0928955078125, 0.14788818359375, 2.62158203125, 4.624107360839844, -1.2304306030273438, 1.0534515380859375, 4.0607147216796875, -0.6099510192871094, 2.9021663665771484, 3.2747802734375, 0.10683822631835938, -2.255054473876953, 6.126348495483398, 3.0104293823242188, 0.5165252685546875, -1.2590484619140625, 5.492828369140625, 0.2134552001953125, -1.231231689453125, -1.0740966796875, 3.0000534057617188, 1.8105926513671875, 4.131538391113281, 2.8033218383789062, 4.1016387939453125, 0.3444061279296875, 4.971553802490234, 0.5477638244628906, 0.46738433837890625, 3.187225341796875, 0.14495182037353516, 0.9130363464355469, -1.3191909790039062, 6.456199645996094, 3.0036468505859375, -0.490142822265625, 2.6793441772460938, 0.8019447326660156, -0.0826568603515625, 3.4789199829101562, -3.981996536254883, -0.15314865112304688, 2.760547637939453, 2.8614044189453125, 1.8133010864257812, 1.7192001342773438, 2.448995590209961, 1.80328369140625, -2.5855064392089844, 0.90899658203125, -2.9609603881835938, 3.6850357055664062, -0.0403289794921875, 1.5546112060546875, -0.5204458236694336, 3.8755722045898438, 1.1595840454101562, 3.3208389282226562, 1.5459518432617188, -0.5068740844726562, -0.9435348510742188, 1.4850616455078125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000291.npy"}
|
||||
{"epoch": 0.4399092970521542, "step": 292, "batch_size": 64, "mean": 1.1498451232910156, "std": 2.118457317352295, "min": -3.1600494384765625, "p10": -1.4939846038818358, "median": 0.8853969573974609, "p90": 4.1394184112548835, "max": 6.492698669433594, "pos_frac": 0.71875, "sample": [-1.6844635009765625, -0.46787261962890625, 2.0333776473999023, -0.040515899658203125, 2.1865158081054688, 1.4911937713623047, 1.2944450378417969, 5.357421875, 0.43556880950927734, 0.0455474853515625, 6.182735443115234, 0.9261016845703125, 2.4909400939941406, 0.6024017333984375, -2.3144378662109375, 3.9483299255371094, 2.72735595703125, 2.5282516479492188, 1.5026626586914062, -0.9163055419921875, 0.13440704345703125, 4.2213134765625, 2.4544296264648438, -0.5598773956298828, -1.208404541015625, 0.0981903076171875, -0.804473876953125, 1.342010498046875, -3.1600494384765625, 0.6115760803222656, 0.5343475341796875, 1.528573989868164, 2.283660888671875, 1.0203075408935547, -0.44478607177734375, -1.3946151733398438, 2.591686248779297, -2.0772552490234375, -0.4155006408691406, 0.7020492553710938, -0.658477783203125, 0.9153900146484375, 2.175840377807617, -1.5365715026855469, 4.864410400390625, 4.289003372192383, 1.462677001953125, 0.7130165100097656, 0.18990230560302734, 1.2637405395507812, 3.205291748046875, 1.4281463623046875, 0.48493385314941406, 3.692657470703125, 0.6626911163330078, 0.5229721069335938, 0.8554039001464844, -0.615234375, 6.492698669433594, 3.2685775756835938, 1.947265625, -2.1322479248046875, -1.6649894714355469, 5.9761505126953125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000292.npy"}
|
||||
{"epoch": 0.4414210128495843, "step": 293, "batch_size": 64, "mean": 1.5288606882095337, "std": 2.323331594467163, "min": -5.732929229736328, "p10": -1.082382583618164, "median": 1.4836502075195312, "p90": 3.984207534790039, "max": 6.6654205322265625, "pos_frac": 0.796875, "sample": [-0.2904815673828125, 2.9043312072753906, 3.7207489013671875, 3.2402191162109375, 0.21674346923828125, 2.475475311279297, 0.06137275695800781, 1.5016307830810547, 0.9697418212890625, 0.914215087890625, 0.8711013793945312, 0.5106048583984375, 6.180225372314453, 1.972442626953125, 3.0281982421875, 0.22784423828125, 3.0067672729492188, 2.6183013916015625, 3.7430782318115234, 1.3555831909179688, 2.30133056640625, -1.5878372192382812, 0.5554962158203125, -1.660125732421875, 3.0922679901123047, 1.7402076721191406, 1.4656696319580078, -0.11966133117675781, 2.0130538940429688, 6.334625244140625, 0.4164924621582031, 2.5413894653320312, 3.9825439453125, -0.8746986389160156, 3.53204345703125, -1.1497001647949219, 3.9849205017089844, -5.732929229736328, 1.5144271850585938, 1.3301239013671875, -0.36808013916015625, 6.6654205322265625, -2.4676132202148438, -0.3749847412109375, 0.365203857421875, 0.029109954833984375, 2.009857177734375, 2.247831344604492, 6.461395263671875, 5.376152038574219, 3.676748275756836, -3.384033203125, 3.0700416564941406, 1.3241767883300781, 0.6128425598144531, -1.8420257568359375, 3.317169189453125, 0.43084716796875, 0.06632232666015625, 2.04937744140625, 2.330718994140625, -0.9253082275390625, 0.28261566162109375, 3.9855194091796875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000293.npy"}
|
||||
{"epoch": 0.4429327286470144, "step": 294, "batch_size": 64, "mean": 1.206186056137085, "std": 2.0249528884887695, "min": -3.66705322265625, "p10": -0.9341461181640625, "median": 1.0993585586547852, "p90": 4.026845169067383, "max": 6.35693359375, "pos_frac": 0.734375, "sample": [5.094245910644531, 1.216522216796875, 5.509391784667969, -1.6949005126953125, 2.792795181274414, 2.7802505493164062, 3.2400474548339844, -0.835235595703125, -1.7116165161132812, 1.6609153747558594, 3.21490478515625, 0.5964202880859375, -0.9548358917236328, 4.522125244140625, 1.1652536392211914, -0.37505149841308594, 4.181459426879883, 0.8434219360351562, 2.14556884765625, 4.033939361572266, -0.4079742431640625, 0.08279037475585938, 1.470510482788086, 1.5489883422851562, -0.6268806457519531, 6.030891418457031, -2.2026290893554688, 1.3126964569091797, 0.4315948486328125, 6.35693359375, -3.66705322265625, -0.14474105834960938, 0.41423797607421875, 1.80645751953125, 0.020307540893554688, -1.827667236328125, 0.041534423828125, -1.1374664306640625, 1.659454345703125, 3.8433914184570312, 0.2608489990234375, 1.2524261474609375, 3.134256362915039, -0.8858699798583984, 2.3646888732910156, 1.6697006225585938, 1.2330093383789062, 2.2845535278320312, -0.22040367126464844, 4.010292053222656, -0.2980213165283203, 2.1416778564453125, 0.67041015625, 1.1188201904296875, 1.9200763702392578, 0.0862579345703125, 0.32098388671875, -0.7117271423339844, 1.0798969268798828, -0.4199638366699219, 0.7698135375976562, 0.6097946166992188, 2.3722686767578125, 0.0011138916015625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000294.npy"}
|
||||
{"epoch": 0.4444444444444444, "step": 295, "batch_size": 64, "mean": 1.2245147228240967, "std": 2.227468729019165, "min": -3.1199378967285156, "p10": -0.9791103363037109, "median": 0.9218635559082031, "p90": 2.9312835693359376, "max": 10.641143798828125, "pos_frac": 0.71875, "sample": [2.9451446533203125, -0.07883071899414062, -0.2727317810058594, 0.9591217041015625, 1.8307266235351562, 1.9407196044921875, 2.52362060546875, 0.3635902404785156, 2.115062713623047, 1.456939697265625, 7.252494812011719, 1.3067817687988281, 4.075836181640625, 1.6383628845214844, 0.898590087890625, 1.9608612060546875, -0.0053558349609375, 1.6758346557617188, -2.6575927734375, 1.3884849548339844, 2.8989410400390625, 0.7110443115234375, 0.8171100616455078, 2.1683120727539062, 3.90509033203125, -2.912353515625, 0.37892913818359375, -1.2366523742675781, 1.601776123046875, -1.1093215942382812, -0.0629730224609375, -0.3255119323730469, 5.693943023681641, 0.8812713623046875, 0.4222869873046875, 0.17218399047851562, 1.7131481170654297, 0.2698516845703125, 2.7844085693359375, 2.296985626220703, 1.9388275146484375, -3.1199378967285156, 0.5972785949707031, 10.641143798828125, 0.8814239501953125, 1.791778564453125, -2.1392784118652344, 0.3264312744140625, -0.9911460876464844, 0.7500438690185547, -0.5581932067871094, -0.1192779541015625, -0.6916427612304688, 0.9451370239257812, -0.9510269165039062, 1.9900789260864258, 2.653697967529297, 2.0786285400390625, 0.0992889404296875, -0.2787628173828125, 2.3047828674316406, 2.722076416015625, 5.672996520996094, -0.5615367889404297], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000295.npy"}
|
||||
{"epoch": 0.4459561602418745, "step": 296, "batch_size": 64, "mean": 1.4355229139328003, "std": 2.1202003955841064, "min": -2.2796478271484375, "p10": -1.229240036010742, "median": 1.2163658142089844, "p90": 4.4285118103027346, "max": 5.966716766357422, "pos_frac": 0.703125, "sample": [1.9431991577148438, 5.09991455078125, -0.12929534912109375, 1.1490135192871094, -2.2796478271484375, 3.6608123779296875, -0.27140235900878906, 4.464691162109375, -0.3619575500488281, 1.8370208740234375, 1.5507774353027344, 5.5794677734375, -0.9634304046630859, 5.147502899169922, 0.7880096435546875, 1.9060516357421875, 2.954193115234375, 3.0883827209472656, 0.24434471130371094, 4.533531188964844, 0.023767471313476562, -0.771240234375, 4.2322235107421875, -1.46112060546875, -2.0994415283203125, -1.9408416748046875, 1.1487960815429688, 1.8881397247314453, 1.589111328125, 1.2837181091308594, 0.0499420166015625, -1.3128700256347656, -1.639129638671875, 2.9705810546875, 3.6725387573242188, -0.399078369140625, -0.7681961059570312, 0.1421051025390625, 2.2965192794799805, -1.3387908935546875, 1.1146087646484375, -0.3976287841796875, 5.1915283203125, 0.57733154296875, 2.5268402099609375, 0.9015350341796875, 1.8969573974609375, 2.358491897583008, 2.009227752685547, 5.966716766357422, 2.2664718627929688, 0.17604827880859375, 3.5787315368652344, 0.4966163635253906, 4.344093322753906, -0.3105926513671875, -1.0341033935546875, 3.1789321899414062, 0.923919677734375, -0.775054931640625, -0.7922134399414062, 3.4883270263671875, 3.6716766357421875, 3.007091522216797], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000296.npy"}
|
||||
{"epoch": 0.4474678760393046, "step": 297, "batch_size": 64, "mean": 1.7435266971588135, "std": 1.754814624786377, "min": -2.0933380126953125, "p10": -0.6719844818115233, "median": 1.7404708862304688, "p90": 4.178802871704102, "max": 5.249530792236328, "pos_frac": 0.8125, "sample": [1.7624130249023438, -0.7933120727539062, 0.13616943359375, 4.194160461425781, 2.4306640625, 5.249530792236328, 1.512664794921875, 3.6209850311279297, -2.0890045166015625, -0.4640064239501953, 2.5173912048339844, 1.7185287475585938, 1.0194129943847656, 4.345848083496094, 0.23638153076171875, 4.143348693847656, 1.4071884155273438, 2.8499908447265625, 1.3498401641845703, 1.8793487548828125, 1.2525634765625, 0.6612091064453125, -0.81500244140625, -2.0933380126953125, -0.4422578811645508, 4.3776092529296875, 3.1863861083984375, 0.2622528076171875, 4.6041412353515625, 1.6664237976074219, 2.4533119201660156, 2.8240299224853516, -0.07037353515625, 0.19410324096679688, -0.13953781127929688, 2.087984085083008, -0.20810317993164062, 0.707427978515625, 2.846996307373047, 0.5149078369140625, 4.1610107421875, 1.1500396728515625, 1.983673095703125, 4.073752403259277, 1.2071304321289062, 4.4573974609375, 0.8283309936523438, -1.2283639907836914, 2.7598419189453125, 2.2507801055908203, 1.086456298828125, 1.613698959350586, 2.33642578125, 3.9857635498046875, 2.0248146057128906, -0.7611179351806641, 3.7604141235351562, 2.8388214111328125, -0.8803558349609375, 3.2789535522460938, 2.204620361328125, 1.8294429779052734, 4.186428070068359, 1.5394744873046875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000297.npy"}
|
||||
{"epoch": 0.4489795918367347, "step": 298, "batch_size": 64, "mean": 1.2851074934005737, "std": 2.052426338195801, "min": -3.554147720336914, "p10": -1.312020492553711, "median": 1.2441768646240234, "p90": 3.860277557373047, "max": 6.9304351806640625, "pos_frac": 0.75, "sample": [0.7486114501953125, -3.554147720336914, 4.879646301269531, 0.9368438720703125, 2.9089279174804688, -1.36431884765625, 0.24024581909179688, 0.5131301879882812, 1.8402976989746094, 0.6080150604248047, 3.4969406127929688, 3.8277053833007812, -0.2449493408203125, 0.489410400390625, -2.9225234985351562, 3.2151947021484375, 2.446136474609375, -1.7610931396484375, 3.9597129821777344, 1.5108871459960938, 1.5037803649902344, -0.12806129455566406, 2.4227752685546875, 2.16015625, 0.4793701171875, 0.2368316650390625, -1.426025390625, 2.713245391845703, 2.257192611694336, 5.184720039367676, 0.5855903625488281, 2.0805320739746094, 2.3928298950195312, 3.874237060546875, 1.5813560485839844, 2.974761962890625, 0.8091554641723633, 1.8185787200927734, 0.39091968536376953, 0.2391204833984375, 1.0833206176757812, 0.38359642028808594, -1.1899909973144531, 3.3410797119140625, 1.806161880493164, 6.9304351806640625, 4.401432037353516, -0.779510498046875, 0.3964118957519531, 2.0369873046875, -0.11972808837890625, -0.751708984375, -0.31322479248046875, -1.9925155639648438, -0.5525932312011719, 2.1973915100097656, -0.828125, 1.9930572509765625, 0.72979736328125, -2.6647682189941406, 2.88800048828125, 3.1631088256835938, 1.4050331115722656, 4.7574920654296875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000298.npy"}
|
||||
{"epoch": 0.4504913076341648, "step": 299, "batch_size": 64, "mean": 0.9038129448890686, "std": 1.7353503704071045, "min": -3.018726348876953, "p10": -0.8734390258789062, "median": 0.732731819152832, "p90": 2.668848419189455, "max": 7.66082763671875, "pos_frac": 0.6875, "sample": [1.23846435546875, 1.0002288818359375, -0.17496490478515625, -1.50714111328125, -0.3902778625488281, 2.141672134399414, 0.109466552734375, 1.5869560241699219, 3.8869781494140625, 0.11741065979003906, -0.7961006164550781, 0.152374267578125, -2.82037353515625, 2.288097381591797, -0.8012809753417969, 0.91424560546875, 1.6193389892578125, 3.201068878173828, 2.2635421752929688, 0.9658050537109375, 4.798625946044922, -0.03237152099609375, -0.15850067138671875, 1.8827743530273438, -0.646392822265625, -0.5944366455078125, -0.5704154968261719, 0.30049896240234375, 7.66082763671875, -0.882049560546875, 0.9466705322265625, 1.8561019897460938, 0.9470176696777344, -3.018726348876953, 1.282806396484375, 0.42022705078125, 1.691802978515625, 1.9839248657226562, 0.491058349609375, -0.0438232421875, 0.7061080932617188, -0.0947723388671875, 2.0008468627929688, 1.6945877075195312, 4.1648101806640625, -0.8835334777832031, 0.7593555450439453, 2.0691757202148438, -0.6163330078125, 2.2141456604003906, 0.6924467086791992, -1.1007766723632812, 1.9488649368286133, -0.8533477783203125, 0.22967529296875, 2.0194320678710938, 1.9378662109375, 2.8320274353027344, 0.5069961547851562, 1.6966094970703125, 0.3622446060180664, 0.069488525390625, -1.1085700988769531, 3.285552978515625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000299.npy"}
|
||||
{"epoch": 0.4520030234315949, "step": 300, "batch_size": 64, "mean": 1.1859978437423706, "std": 2.2692294120788574, "min": -4.1451568603515625, "p10": -1.1053264617919922, "median": 1.2705974578857422, "p90": 3.367863845825196, "max": 8.8446044921875, "pos_frac": 0.71875, "sample": [0.27629947662353516, 2.0717391967773438, 2.2932281494140625, 4.847358703613281, 0.7156867980957031, -0.63604736328125, 1.7637557983398438, 1.4896926879882812, -3.2235031127929688, -2.1002941131591797, 7.508598327636719, 1.6914596557617188, 3.4318809509277344, 2.891632080078125, 5.6336212158203125, 1.5060882568359375, 0.8342437744140625, -0.4107398986816406, 1.0683441162109375, 1.2221832275390625, 1.5981330871582031, 3.2184906005859375, 1.869659423828125, 1.4621715545654297, -0.8876495361328125, 0.6188201904296875, 0.5890045166015625, -0.8356132507324219, 1.7073211669921875, 2.164215087890625, 0.279296875, -4.1451568603515625, 1.6945953369140625, -0.7293777465820312, 8.8446044921875, 0.0306396484375, 0.1867828369140625, 4.1049041748046875, 5.2871551513671875, 2.076171875, -0.0547027587890625, -0.3161277770996094, -1.63720703125, 2.161834716796875, 0.8858184814453125, -2.1246185302734375, 0.7753105163574219, 1.3039970397949219, -1.120697021484375, 2.844843864440918, 2.869354248046875, -0.5053138732910156, -1.0529022216796875, 1.6122245788574219, 1.1362533569335938, -1.0694618225097656, 1.7720718383789062, 2.4725723266601562, 2.7533111572265625, -2.8735275268554688, 1.7168121337890625, 1.2371978759765625, 1.75830078125, -0.65087890625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000300.npy"}
|
||||
{"epoch": 0.45351473922902497, "step": 301, "batch_size": 64, "mean": 1.060452938079834, "std": 2.0807723999023438, "min": -2.4336681365966797, "p10": -1.6534198760986327, "median": 1.0240001678466797, "p90": 3.8239166259765627, "max": 5.7635040283203125, "pos_frac": 0.609375, "sample": [1.6217155456542969, 2.3179397583007812, 1.6496200561523438, 0.46957969665527344, 1.542724609375, -1.5006027221679688, 3.7491836547851562, 1.4986610412597656, 2.5124969482421875, -0.12488174438476562, 0.17920684814453125, -0.05242919921875, 3.75848388671875, -1.4206695556640625, -1.6781425476074219, -0.0258636474609375, -2.2073516845703125, 1.0564727783203125, 1.904388427734375, -0.10409927368164062, -0.8023815155029297, 1.9167938232421875, 4.456745147705078, -1.3528289794921875, 2.5678329467773438, 0.61749267578125, -0.4376220703125, 3.710315704345703, -0.13014984130859375, -2.4077606201171875, -1.8796348571777344, -2.266986846923828, 3.86822509765625, 5.37237548828125, -1.595733642578125, -0.29618072509765625, -0.50653076171875, 4.946197509765625, -0.7215385437011719, 1.4108924865722656, 0.09567642211914062, 0.7596817016601562, 2.1637420654296875, -2.4336681365966797, 0.9915275573730469, 2.922389030456543, 0.8436622619628906, 2.2551422119140625, 1.4749298095703125, 3.851959228515625, -0.16901016235351562, 5.5101470947265625, 1.9802932739257812, -0.23057937622070312, -1.7717971801757812, 2.0564308166503906, 3.1477203369140625, 5.7635040283203125, 1.3256301879882812, 1.40155029296875, -0.17385101318359375, -0.2902030944824219, 3.708160400390625, 1.0699920654296875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000301.npy"}
|
||||
{"epoch": 0.455026455026455, "step": 302, "batch_size": 64, "mean": 0.7300394773483276, "std": 1.6112494468688965, "min": -2.6235504150390625, "p10": -1.2804035186767575, "median": 0.7068958282470703, "p90": 2.662318229675293, "max": 5.34423828125, "pos_frac": 0.671875, "sample": [0.9075202941894531, 1.8342170715332031, -0.5694198608398438, 5.34423828125, 0.7645912170410156, 2.1462783813476562, 3.3256797790527344, -1.8360977172851562, 3.7827072143554688, 0.3758201599121094, 0.5477981567382812, 1.403656005859375, 0.9437437057495117, 0.779083251953125, 0.8331222534179688, 1.4553565979003906, 0.3423881530761719, 1.1696891784667969, -0.8208389282226562, -0.87542724609375, 0.23978424072265625, -2.07196044921875, -2.6235504150390625, -0.0521697998046875, 2.424041748046875, -1.6644535064697266, 0.6286430358886719, 1.7550430297851562, -0.9233665466308594, 2.6654052734375, -1.0448760986328125, 1.3398284912109375, 1.283233642578125, -0.1186227798461914, -1.3813438415527344, 3.2942047119140625, 3.2714309692382812, -2.0932769775390625, 1.572998046875, -0.2608146667480469, 1.3732852935791016, 2.6551151275634766, -0.12199783325195312, 1.3488693237304688, -0.2452545166015625, 0.3430023193359375, -0.8438262939453125, 2.0882949829101562, 0.2193145751953125, 1.3161163330078125, 2.5137767791748047, 1.1619491577148438, 0.649200439453125, -0.5270004272460938, 0.040191650390625, -1.4840717315673828, 1.995361328125, 4.634559631347656, -0.30194091796875, 0.9126052856445312, -0.9727249145507812, 0.508880615234375, 1.16534423828125, 0.19919204711914062], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000302.npy"}
|
||||
{"epoch": 0.4565381708238851, "step": 303, "batch_size": 64, "mean": 1.6581302881240845, "std": 2.1615493297576904, "min": -3.0704498291015625, "p10": -0.6910736083984375, "median": 1.4232044219970703, "p90": 4.306397247314455, "max": 8.253292083740234, "pos_frac": 0.78125, "sample": [0.142486572265625, 2.2990942001342773, 1.8773956298828125, 2.149595260620117, 6.46051025390625, 4.953956604003906, 3.3037033081054688, -0.4239645004272461, 1.7643394470214844, -1.2430267333984375, 1.2890243530273438, -0.21866226196289062, 0.880218505859375, -0.60064697265625, -0.95098876953125, -1.2897300720214844, 0.36163330078125, 3.070098876953125, -0.729827880859375, -0.206268310546875, -0.37894248962402344, 1.9628829956054688, 6.245147705078125, 3.2388534545898438, -3.0704498291015625, 1.8060798645019531, 5.154682159423828, -1.4068679809570312, -0.4197883605957031, 1.3045654296875, 1.3789520263671875, 2.6588821411132812, 1.3931808471679688, 0.2702140808105469, 1.0001373291015625, 0.0090484619140625, 3.9773216247558594, 0.4103546142578125, 2.3316497802734375, 2.490509033203125, 0.46779632568359375, 1.4532279968261719, 2.1265869140625, 4.4461822509765625, 0.4120635986328125, 3.9762344360351562, 8.253292083740234, 0.8542327880859375, 1.992095947265625, 3.3112945556640625, 1.777435302734375, 0.9204483032226562, -0.05084991455078125, 0.199066162109375, 3.4791946411132812, 2.8808441162109375, 0.7334938049316406, 1.90399169921875, 3.9802322387695312, 1.3454208374023438, 6.360263824462891, -2.1876754760742188, 2.7773056030273438, 1.4628067016601562], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000303.npy"}
|
||||
{"epoch": 0.4580498866213152, "step": 304, "batch_size": 64, "mean": 1.1859947443008423, "std": 1.629875659942627, "min": -3.4935760498046875, "p10": -0.5935569763183592, "median": 0.9953422546386719, "p90": 3.2661407470703128, "max": 5.87371826171875, "pos_frac": 0.796875, "sample": [2.6241531372070312, 0.5005149841308594, -0.651031494140625, 3.3157958984375, 0.5834579467773438, 1.6810016632080078, 0.7202301025390625, 2.056692123413086, 0.09231758117675781, -0.1286296844482422, 0.4235076904296875, 2.797027587890625, 1.554229736328125, -0.7464866638183594, 3.3229732513427734, 0.9174613952636719, -2.167572021484375, 1.6076545715332031, 5.87371826171875, -3.4935760498046875, 0.712799072265625, 0.33452606201171875, 0.523162841796875, 0.8285026550292969, 1.82568359375, -0.9004058837890625, 1.1174964904785156, 2.8046035766601562, 1.9789886474609375, 3.3112945556640625, 1.0059738159179688, 1.2209892272949219, 2.857147216796875, 0.89697265625, -0.8471145629882812, -1.952117919921875, 1.2989616394042969, 0.17664337158203125, 1.1818294525146484, 0.3710174560546875, 0.09181404113769531, -0.1627521514892578, 3.3563995361328125, 2.15728759765625, 1.800766944885254, 1.0454025268554688, -0.2800025939941406, 2.9942665100097656, 1.4101181030273438, 2.1377334594726562, -0.45944976806640625, 0.42432403564453125, 4.141395568847656, 0.48436737060546875, 2.986968994140625, 0.90594482421875, 1.3505496978759766, 2.5792770385742188, 4.95733642578125, -0.10208892822265625, 0.5009307861328125, -0.19281005859375, 0.984710693359375, 3.1607818603515625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000304.npy"}
|
||||
{"epoch": 0.4595616024187453, "step": 305, "batch_size": 64, "mean": 1.213738203048706, "std": 2.2062575817108154, "min": -4.2286224365234375, "p10": -1.1813568115234372, "median": 1.0518217086791992, "p90": 3.8273441314697267, "max": 6.876762390136719, "pos_frac": 0.703125, "sample": [4.695438385009766, 0.074127197265625, -2.3638763427734375, -0.21604156494140625, 0.34461212158203125, -0.2897911071777344, 4.278049468994141, -3.5936203002929688, 2.5920257568359375, 3.8227806091308594, 0.9563484191894531, 2.914306640625, 0.93536376953125, 3.0264625549316406, -2.4060821533203125, -0.9484329223632812, -0.4622459411621094, 0.45307159423828125, -0.7233734130859375, 5.085113525390625, 0.1299591064453125, 3.447233200073242, 1.2479286193847656, 1.5111236572265625, -0.4893608093261719, 1.0868988037109375, 1.54351806640625, -1.2811813354492188, 1.154327392578125, -4.2286224365234375, 3.4285011291503906, 1.800424575805664, 3.1116371154785156, 2.6984710693359375, -1.37298583984375, 1.0424556732177734, 6.876762390136719, -0.142364501953125, 0.2655525207519531, 1.4026947021484375, 0.5447731018066406, 3.1603927612304688, 3.8292999267578125, 4.004054069519043, 2.359344482421875, -0.5663414001464844, 0.5878524780273438, -1.921966552734375, 1.7744216918945312, 3.4492225646972656, -0.6178817749023438, 2.0498733520507812, 1.410614013671875, 6.2122955322265625, 3.7013473510742188, 3.29217529296875, -0.6488037109375, 1.061187744140625, -0.6569328308105469, 3.0579605102539062, -0.6371688842773438, 0.05420875549316406, 0.5634918212890625, 0.20861053466796875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000305.npy"}
|
||||
{"epoch": 0.46107331821617537, "step": 306, "batch_size": 64, "mean": 1.0709459781646729, "std": 2.035379409790039, "min": -3.185272216796875, "p10": -0.8959758758544921, "median": 0.6202507019042969, "p90": 3.372557067871094, "max": 9.87689208984375, "pos_frac": 0.703125, "sample": [3.6117172241210938, 0.5112533569335938, -0.28429508209228516, 3.276683807373047, 0.6322021484375, -0.045169830322265625, 0.86663818359375, 0.7987518310546875, 1.4012861251831055, 2.5560989379882812, 2.726154327392578, 1.8719406127929688, -3.185272216796875, -0.2644081115722656, 0.6082992553710938, -1.4497451782226562, 3.57818603515625, 1.827056884765625, -0.22826004028320312, 1.3862075805664062, 0.0290069580078125, 3.1225967407226562, 2.179515838623047, -0.9147148132324219, 1.8797616958618164, 0.2956085205078125, -0.9041881561279297, 0.06719970703125, 0.94171142578125, -0.7934417724609375, 0.450775146484375, 3.4073333740234375, -0.7163162231445312, 0.36353588104248047, 1.9004783630371094, -0.47956085205078125, 3.6540985107421875, 6.5673828125, 0.21417236328125, -0.5148811340332031, -1.045186996459961, 0.41810035705566406, 1.1195907592773438, -0.539215087890625, -0.42437171936035156, 0.27236175537109375, 0.25310325622558594, 2.4898223876953125, 0.13299560546875, 4.175670623779297, 1.1826553344726562, 0.7184104919433594, -0.67047119140625, 0.49605560302734375, 2.1312103271484375, 1.6283016204833984, 2.41888427734375, -1.7286911010742188, 3.0382614135742188, 9.87689208984375, -1.7772903442382812, 3.291412353515625, 1.013458251953125, -0.8768138885498047], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000306.npy"}
|
||||
{"epoch": 0.46258503401360546, "step": 307, "batch_size": 64, "mean": 0.9886834621429443, "std": 2.2550089359283447, "min": -5.51568603515625, "p10": -1.4997772216796874, "median": 0.7471542358398438, "p90": 4.076442718505862, "max": 6.783149719238281, "pos_frac": 0.671875, "sample": [1.7963485717773438, -0.15373992919921875, 0.8160667419433594, 0.2753105163574219, -1.6497421264648438, -0.088531494140625, 6.783149719238281, 1.231475830078125, 3.1128082275390625, 3.2191162109375, 0.11419677734375, 2.1410484313964844, 1.5803756713867188, 2.9383392333984375, -1.41748046875, 1.467803955078125, 2.501953125, 0.4651031494140625, 2.6272506713867188, 0.40224456787109375, 0.5446319580078125, -0.43326568603515625, -1.9925994873046875, -0.47321319580078125, 0.08870315551757812, -1.2422523498535156, 1.19879150390625, -2.3632736206054688, 5.333282470703125, -0.39203453063964844, -0.7106170654296875, 0.7549591064453125, 0.739349365234375, -1.09423828125, -1.4519309997558594, 2.5330810546875, -0.3296356201171875, 2.1474456787109375, 2.5973854064941406, -2.4535064697265625, 0.13880538940429688, 0.6378974914550781, 0.7159194946289062, 4.440826416015625, 4.493579864501953, -3.2916336059570312, 1.1152544021606445, 0.8028945922851562, 5.643795013427734, 3.2200469970703125, -5.51568603515625, 4.723644256591797, 1.2275199890136719, -1.0135364532470703, 2.8661270141601562, 2.6835899353027344, -0.21612548828125, 3.355224609375, 1.9131546020507812, 0.0965576171875, -1.1074504852294922, 4.385536193847656, 2.3159217834472656, -1.5202827453613281], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000307.npy"}
|
||||
{"epoch": 0.46409674981103555, "step": 308, "batch_size": 64, "mean": 1.3352723121643066, "std": 2.199805974960327, "min": -3.985004425048828, "p10": -1.2461608886718747, "median": 1.2807388305664062, "p90": 4.019871330261231, "max": 7.074462890625, "pos_frac": 0.71875, "sample": [3.4012718200683594, 4.037254333496094, 0.519927978515625, -0.9163856506347656, 3.1246795654296875, 2.269378662109375, -0.09672164916992188, 1.404296875, 4.216033935546875, -0.91815185546875, 6.508403778076172, 0.89581298828125, 6.270668029785156, 2.5570573806762695, 0.56402587890625, 1.8782730102539062, 1.2579574584960938, -0.4272499084472656, 3.4863357543945312, 2.4274520874023438, 1.5413665771484375, -3.985004425048828, 2.9369430541992188, 1.3330421447753906, 2.5713043212890625, 4.061515808105469, -1.4096097946166992, 1.2625885009765625, -1.3583221435546875, -1.8133316040039062, 2.753692626953125, -0.423980712890625, 1.29888916015625, -0.38179969787597656, 2.163299560546875, 0.07041549682617188, 3.4615402221679688, -0.18421554565429688, 0.3480224609375, -2.8579559326171875, -0.9844512939453125, 0.501678466796875, 2.929534912109375, 4.145866394042969, 0.4221649169921875, 0.6631851196289062, 1.6527633666992188, -0.906494140625, 2.2173309326171875, 0.8068695068359375, 3.979310989379883, -0.9258270263671875, 0.177734375, 3.124408721923828, 7.074462890625, -0.3340301513671875, 2.5473175048828125, -3.033519744873047, 2.8027400970458984, 1.8672256469726562, 0.8591384887695312, -1.63616943359375, 2.5585365295410156, 1.0989265441894531], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000308.npy"}
{"epoch": 0.4656084656084656, "step": 309, "batch_size": 64, "mean": 1.3067526817321777, "std": 2.2235004901885986, "min": -4.1781463623046875, "p10": -0.6819347381591796, "median": 1.0039033889770508, "p90": 4.251791381835938, "max": 9.578033447265625, "pos_frac": 0.671875, "sample": [1.8142204284667969, 7.094825744628906, 1.0813541412353516, 4.816823959350586, 2.049297332763672, -1.322540283203125, 2.3761367797851562, 1.2510299682617188, -1.4220733642578125, -0.182373046875, -0.24259185791015625, -1.0275516510009766, -0.7116584777832031, -0.08044052124023438, 1.56268310546875, -0.54632568359375, 0.7360649108886719, 1.962860107421875, -0.612579345703125, 1.640066146850586, 1.9086589813232422, 4.76214599609375, 4.2266693115234375, -0.2663230895996094, 3.0178985595703125, 1.7058181762695312, -1.6636123657226562, 1.1291618347167969, -0.0543975830078125, 1.2757415771484375, 0.3486137390136719, -4.1781463623046875, 0.1121978759765625, 2.9404296875, 0.8041534423828125, -0.3694114685058594, 0.34200286865234375, 1.750732421875, -0.4127349853515625, 4.2625579833984375, 0.34405517578125, 0.92645263671875, 2.0591869354248047, 2.2465667724609375, 1.6245346069335938, -1.018280029296875, 2.497772216796875, 1.641357421875, 0.6488189697265625, 4.177711486816406, -0.5669784545898438, -0.29212188720703125, -0.29665374755859375, 5.2732696533203125, 5.7881622314453125, 0.7119102478027344, 0.8286094665527344, -0.4898185729980469, 0.12567138671875, -0.4537353515625, 9.578033447265625, 1.6508026123046875, 2.036184310913086, 2.711273193359375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000309.npy"}
{"epoch": 0.4671201814058957, "step": 310, "batch_size": 64, "mean": 0.489948570728302, "std": 2.035552501678467, "min": -4.5517578125, "p10": -2.162038040161133, "median": 0.6050834655761719, "p90": 3.122731018066407, "max": 4.954631805419922, "pos_frac": 0.546875, "sample": [1.9447174072265625, 0.041942596435546875, 3.2099876403808594, 1.5114631652832031, -0.5596771240234375, -3.2394027709960938, -0.6660385131835938, -0.16265106201171875, 3.499134063720703, 0.6593856811523438, -0.6671104431152344, -2.1305580139160156, 0.7013626098632812, 2.1731109619140625, -2.006500244140625, 0.8769721984863281, -0.09593772888183594, 1.795196533203125, 3.65130615234375, -2.266132354736328, 1.754241943359375, 0.7298240661621094, -0.33876800537109375, 2.3837127685546875, -2.7838973999023438, -1.0905590057373047, 1.8470611572265625, -1.1485481262207031, -1.4887847900390625, 2.9191322326660156, -1.086639404296875, -0.4757537841796875, 1.15411376953125, 2.455657958984375, -0.3440399169921875, 0.33099365234375, 0.692047119140625, 0.7113494873046875, 2.8904037475585938, 1.0976104736328125, -0.769287109375, 1.1727142333984375, -0.0635986328125, -2.8905715942382812, -0.6691513061523438, 0.55078125, -2.4650497436523438, 0.9958419799804688, 0.7923126220703125, -4.5517578125, 2.6582794189453125, 4.954631805419922, -0.21866798400878906, -2.1755294799804688, -0.20299243927001953, -0.4919319152832031, 4.61199951171875, 2.635723114013672, 3.9056167602539062, -0.8405838012695312, -2.0492477416992188, 4.351654052734375, 1.6261787414550781, 2.0096168518066406], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000310.npy"}
{"epoch": 0.46863189720332576, "step": 311, "batch_size": 64, "mean": 1.281097650527954, "std": 2.2436652183532715, "min": -3.2323379516601562, "p10": -1.2531217575073241, "median": 1.2099533081054688, "p90": 3.8918502807617195, "max": 9.218170166015625, "pos_frac": 0.71875, "sample": [1.9785919189453125, -0.43828582763671875, 2.2082443237304688, -3.2323379516601562, -0.13410186767578125, -0.24763107299804688, 1.590789794921875, 2.3382110595703125, -1.0582962036132812, 2.8405380249023438, 1.8788528442382812, -2.699493408203125, 1.46728515625, 2.0232887268066406, -0.3527679443359375, -0.44411563873291016, 0.8152694702148438, 0.478057861328125, 3.384552001953125, 0.14313507080078125, -0.6112136840820312, 2.8400726318359375, 0.4762439727783203, 0.45757293701171875, 1.542327880859375, 0.4205970764160156, -0.7044677734375, 0.17137908935546875, 1.2542266845703125, 0.293731689453125, -2.4134979248046875, 3.7211761474609375, 6.2855682373046875, 0.19532012939453125, 2.8150291442871094, 3.3679962158203125, 4.039577484130859, 1.2879867553710938, 0.39678955078125, 3.964996337890625, 9.218170166015625, 2.98468017578125, -0.39945220947265625, 4.542510986328125, -1.336618423461914, 0.7263336181640625, 0.9095611572265625, -3.186614990234375, 2.3314247131347656, 2.3438682556152344, 3.719879150390625, 2.187103271484375, -0.0803985595703125, -1.7064342498779297, 3.0876617431640625, 1.5929899215698242, 0.10045814514160156, -0.2654266357421875, -1.7075996398925781, 1.165679931640625, 1.403961181640625, 5.852775573730469, 4.2464447021484375, 1.9180908203125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000311.npy"}
{"epoch": 0.47014361300075586, "step": 312, "batch_size": 64, "mean": 1.546562910079956, "std": 2.1503806114196777, "min": -2.4667205810546875, "p10": -0.7212310791015624, "median": 1.2273712158203125, "p90": 4.5287616729736335, "max": 7.1619720458984375, "pos_frac": 0.734375, "sample": [2.1583480834960938, 1.6996383666992188, -1.5806732177734375, 0.8296890258789062, 2.365711212158203, 2.955108642578125, -0.35433387756347656, 0.4645843505859375, 2.1553192138671875, 0.5056915283203125, 6.242893218994141, 2.237701416015625, 1.4366836547851562, 1.1179733276367188, 0.5370445251464844, 2.250244140625, -1.4710769653320312, 0.6864852905273438, -0.11639785766601562, 1.929534912109375, 6.313972473144531, 1.0239219665527344, -2.2703285217285156, 2.7347640991210938, 0.18782806396484375, 3.262500762939453, -0.39362335205078125, -0.0420989990234375, 1.70806884765625, 1.610687255859375, 6.603065490722656, -0.25823211669921875, 4.617088317871094, 4.322666168212891, -0.0541839599609375, 0.8361625671386719, 3.0624237060546875, 1.6571197509765625, 1.6945953369140625, 5.148345947265625, 7.1619720458984375, 1.024017333984375, -2.4667205810546875, 1.0128173828125, 3.0679855346679688, -0.4802093505859375, 0.30649566650390625, 2.4822845458984375, 4.977294921875, -0.29416656494140625, -1.1598129272460938, 0.7783241271972656, 1.2927017211914062, 1.1620407104492188, -2.1889724731445312, 1.7716598510742188, 3.3944664001464844, 1.9732856750488281, -0.6329193115234375, 3.9962921142578125, 4.262653350830078, -0.045520782470703125, -0.7590789794921875, 0.5282211303710938], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000312.npy"}
{"epoch": 0.47165532879818595, "step": 313, "batch_size": 64, "mean": 0.9621329307556152, "std": 2.2864491939544678, "min": -3.7031211853027344, "p10": -1.6343311309814452, "median": 0.7406654357910156, "p90": 3.6809696197509765, "max": 8.090896606445312, "pos_frac": 0.640625, "sample": [1.5886821746826172, 2.88360595703125, 2.212770462036133, -0.10898971557617188, 1.2725906372070312, 2.72113037109375, 1.8209228515625, 1.080265998840332, 0.2421875, -1.6251106262207031, 0.5020656585693359, 1.251455307006836, 2.9450721740722656, 1.2250595092773438, 0.48314666748046875, -0.7820510864257812, 0.9968414306640625, -1.0679798126220703, -1.717864990234375, 0.10308837890625, 1.29681396484375, 1.8955039978027344, -0.6284694671630859, 5.686820983886719, -3.3771820068359375, -2.0028629302978516, 2.3799896240234375, 1.73712158203125, 2.6682281494140625, -0.1701812744140625, 6.194313049316406, 3.5401611328125, -1.8570785522460938, -0.9261016845703125, 3.6552200317382812, 4.522491455078125, 1.9674606323242188, -3.7031211853027344, -0.711456298828125, 5.0111236572265625, 0.8059768676757812, -0.845367431640625, 0.67535400390625, -2.802154541015625, 8.090896606445312, 1.5593032836914062, -0.03508758544921875, 0.5055389404296875, 1.063323974609375, 0.3053932189941406, 4.482452392578125, 0.1513671875, -0.5344657897949219, -0.8490676879882812, -1.1446723937988281, -1.442413330078125, 2.6743927001953125, 0.5283260345458984, 2.9512367248535156, -1.2104644775390625, -1.6382827758789062, -0.7234344482421875, 2.1106643676757812, 3.692005157470703], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000313.npy"}
{"epoch": 0.47316704459561604, "step": 314, "batch_size": 64, "mean": 1.181715965270996, "std": 2.126180410385132, "min": -3.6644134521484375, "p10": -1.1548671722412107, "median": 0.9905300140380859, "p90": 3.9835853576660156, "max": 7.4902191162109375, "pos_frac": 0.6875, "sample": [1.8655166625976562, -0.2575340270996094, 0.44513702392578125, 5.280403137207031, -1.5013694763183594, 2.5379257202148438, 0.25341033935546875, 4.3717041015625, -0.4259490966796875, 1.3044776916503906, -0.5855026245117188, -0.662109375, 5.716888427734375, -0.2201690673828125, 1.0187110900878906, 3.331268310546875, 3.4551124572753906, 2.0368871688842773, 4.265987396240234, 0.9623489379882812, 0.30303955078125, -0.2391204833984375, 2.014158248901367, 7.4902191162109375, 2.7440414428710938, -2.2721328735351562, 0.11090850830078125, 3.9961166381835938, 2.98590087890625, -1.267974853515625, 1.795938491821289, 1.852081298828125, 0.06775856018066406, 0.18302536010742188, 0.31166744232177734, 1.2996559143066406, 0.27557373046875, 3.54974365234375, -0.7636260986328125, 2.044271469116211, 0.015850067138671875, -3.6644134521484375, 4.703468322753906, 1.8873825073242188, 2.2901268005371094, -2.07208251953125, 2.0002479553222656, 3.954345703125, -0.7826309204101562, 2.7903099060058594, 1.0613632202148438, 3.2170753479003906, -0.7919921875, 0.9076766967773438, -0.7031021118164062, -0.85919189453125, -1.7059249877929688, 1.2540779113769531, 0.4045906066894531, -1.2061309814453125, 1.0900249481201172, -0.230682373046875, -1.0352516174316406, 3.4302978515625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000314.npy"}
{"epoch": 0.47467876039304613, "step": 315, "batch_size": 64, "mean": 1.2836734056472778, "std": 1.8729585409164429, "min": -2.6841506958007812, "p10": -1.1041046142578124, "median": 1.1655254364013672, "p90": 3.756867027282716, "max": 6.347614288330078, "pos_frac": 0.71875, "sample": [-1.1765823364257812, 0.3951263427734375, 1.7146186828613281, 0.8040084838867188, -0.230010986328125, 1.1662330627441406, -1.5494766235351562, -0.7476463317871094, 0.15742111206054688, 4.116119384765625, -1.6059799194335938, 2.1578216552734375, 1.2872428894042969, 2.884979248046875, -0.14386367797851562, 1.1453857421875, -0.9349899291992188, -1.2678146362304688, -0.8069076538085938, 0.9762001037597656, 1.0086669921875, 3.460174560546875, 1.9786319732666016, 0.8736801147460938, 0.645050048828125, 0.6651210784912109, 4.7109375, 0.30590057373046875, -0.4066581726074219, 1.3763351440429688, -0.75146484375, 0.08196640014648438, 3.478363037109375, 1.610076904296875, 1.1648178100585938, 2.5525894165039062, 2.0748138427734375, 1.0677947998046875, -2.6841506958007812, 2.5663909912109375, 4.241432189941406, 0.7073516845703125, 2.575347900390625, 3.8436527252197266, 1.9209671020507812, 1.2309112548828125, 2.2552719116210938, 6.347614288330078, -1.6976470947265625, 1.1690673828125, 3.8686370849609375, -0.31371307373046875, 2.70379638671875, -0.22669029235839844, 3.288005828857422, 1.71893310546875, 2.517671585083008, 2.49432373046875, 3.4103927612304688, -1.719940185546875, 4.951568603515625, -0.6009578704833984, 3.5543670654296875, -0.2061920166015625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000315.npy"}
{"epoch": 0.47619047619047616, "step": 316, "batch_size": 64, "mean": 1.391247034072876, "std": 1.8342046737670898, "min": -2.1450347900390625, "p10": -0.9976215362548827, "median": 1.5300827026367188, "p90": 3.5315887451171877, "max": 5.699745178222656, "pos_frac": 0.734375, "sample": [-1.4252281188964844, 0.0900115966796875, 0.5345993041992188, -0.7137565612792969, -0.49350738525390625, -0.0894927978515625, 3.094085693359375, 1.3710403442382812, -0.4012413024902344, 2.1034164428710938, 2.967296600341797, 4.61614990234375, 1.5163078308105469, 4.043968200683594, 3.3017120361328125, -2.1450347900390625, -1.6780929565429688, 3.1630172729492188, -1.3325023651123047, 3.557159423828125, 1.6454906463623047, 3.471923828125, 2.3722457885742188, 1.4669189453125, -1.1291465759277344, 1.9994964599609375, 1.64874267578125, 1.70501708984375, -0.3106803894042969, 3.7567481994628906, 2.75494384765625, 3.444061279296875, -0.0463714599609375, 2.665008544921875, 0.02776336669921875, 2.0571212768554688, 0.341522216796875, 1.5561332702636719, 0.2427825927734375, -0.054790496826171875, 2.4302978515625, 1.0372543334960938, 2.5806217193603516, -0.7210006713867188, 3.217041015625, 0.28374481201171875, -0.5200710296630859, 2.174896240234375, 0.048248291015625, 5.426025390625, -1.3997344970703125, -0.9020519256591797, 2.30230712890625, -1.0385799407958984, 0.4691581726074219, 0.59051513671875, 0.705657958984375, 4.759521484375, 0.98138427734375, 2.4425048828125, 5.699745178222656, 2.3847885131835938, 2.8488426208496094, 1.5438575744628906], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000316.npy"}
{"epoch": 0.47770219198790626, "step": 317, "batch_size": 64, "mean": 1.6568281650543213, "std": 2.3666293621063232, "min": -5.1094970703125, "p10": -1.1439025878906248, "median": 1.3978824615478516, "p90": 4.88889675140381, "max": 7.919254302978516, "pos_frac": 0.765625, "sample": [0.7998809814453125, 1.4774398803710938, 0.6735115051269531, 0.2738189697265625, 0.7178497314453125, 7.919254302978516, 1.1379966735839844, -0.076202392578125, 0.04888916015625, 0.17644119262695312, -1.7205543518066406, -0.2883796691894531, -0.769317626953125, 0.6093959808349609, 2.8574676513671875, 0.44339752197265625, 2.515777587890625, 1.30255126953125, 2.4205551147460938, 4.530174255371094, -2.782918930053711, -1.0429229736328125, 2.120983123779297, -0.351287841796875, 2.1976051330566406, 1.6481552124023438, 1.3183250427246094, 2.6194334030151367, 2.8061141967773438, -5.1094970703125, 6.1485595703125, -0.1248626708984375, 2.479795455932617, 5.464391708374023, 6.39893913269043, -1.2176284790039062, 2.0355072021484375, -1.3625984191894531, 2.658039093017578, 3.2936058044433594, -0.4558067321777344, 2.977752685546875, 6.685295104980469, -1.2054290771484375, 0.38234710693359375, 3.613125801086426, 2.35321044921875, 3.8640594482421875, 3.0662078857421875, 0.442352294921875, 2.6673583984375, 5.26934814453125, 1.2432727813720703, 3.886932373046875, -0.7212409973144531, 0.7471923828125, -1.1871795654296875, 2.5771331787109375, 3.8638877868652344, 5.042634963989258, 0.5265274047851562, 2.4713821411132812, 0.606719970703125, 3.072235107421875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000317.npy"}
{"epoch": 0.47921390778533635, "step": 318, "batch_size": 64, "mean": 1.055985450744629, "std": 2.3689310550689697, "min": -6.795867919921875, "p10": -1.8291875839233396, "median": 1.0354576110839844, "p90": 4.0711366653442385, "max": 5.590576171875, "pos_frac": 0.671875, "sample": [-0.1626739501953125, 2.8371505737304688, 0.022857666015625, 0.9665985107421875, 5.1906890869140625, 2.553722381591797, -0.479339599609375, -3.238250732421875, 1.2655601501464844, 3.916210174560547, 0.539947509765625, 0.453399658203125, 0.9136962890625, 1.0968399047851562, 4.500814437866211, 0.2989082336425781, 4.974494934082031, -0.5129852294921875, 1.5583248138427734, 0.9740753173828125, -0.12054634094238281, 1.6082382202148438, 3.833221435546875, -0.233673095703125, 0.5392723083496094, 2.150146484375, 4.134361267089844, -6.795867919921875, 2.0051498413085938, 1.160003662109375, -2.2346267700195312, 2.0478668212890625, -2.8444747924804688, -0.31311798095703125, -0.06279754638671875, 5.590576171875, 4.8173065185546875, 1.128509521484375, 3.4420623779296875, 1.3138275146484375, 0.5970001220703125, 1.1912689208984375, 2.5959625244140625, -0.6476898193359375, -1.25421142578125, -0.214141845703125, 0.8603363037109375, 3.7906570434570312, 0.945587158203125, 2.6466903686523438, -0.3905620574951172, 5.17730712890625, 1.307098388671875, 3.656341552734375, -2.4006919860839844, -0.08632850646972656, 2.7906570434570312, -1.662811279296875, 1.92279052734375, -1.5814895629882812, 3.923612594604492, -3.629704475402832, -1.900491714477539, 1.1104011535644531], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000318.npy"}
{"epoch": 0.48072562358276644, "step": 319, "batch_size": 64, "mean": 1.1651681661605835, "std": 2.0885746479034424, "min": -4.241199493408203, "p10": -1.1677497863769528, "median": 1.1211280822753906, "p90": 3.5216641426086426, "max": 8.728164672851562, "pos_frac": 0.71875, "sample": [2.7494659423828125, 0.7294235229492188, -0.630279541015625, 0.286529541015625, 1.1675033569335938, 1.9419708251953125, 2.8580589294433594, 0.25018310546875, 0.7645263671875, 0.7671337127685547, 1.6541938781738281, 3.741943359375, -0.00940704345703125, 1.7889633178710938, 2.9418869018554688, 3.501373291015625, -1.6962509155273438, 8.728164672851562, -0.00122833251953125, 4.127040863037109, 1.917886734008789, -0.3099365234375, 0.4232940673828125, 3.0799827575683594, 2.6683578491210938, -0.3291130065917969, 2.8177928924560547, 2.0869598388671875, 3.0524024963378906, 0.4925384521484375, -0.4857635498046875, -0.8814468383789062, 1.6323814392089844, 3.98907470703125, -4.241199493408203, 1.3079414367675781, -0.552947998046875, 2.8738441467285156, 1.7065887451171875, -1.5073661804199219, 1.6654052734375, 1.0747528076171875, 2.0511093139648438, 0.6073989868164062, 1.182403564453125, -0.8328399658203125, 0.1387176513671875, -1.471710205078125, -2.9460372924804688, 3.9458465576171875, -0.21824073791503906, -1.2904510498046875, -0.8493270874023438, 0.650787353515625, 0.18373870849609375, 3.1689910888671875, 1.6175918579101562, 1.2427616119384766, 3.530360221862793, 2.7734832763671875, 0.46832275390625, -3.3846893310546875, 4.843601226806641, 1.0163192749023438], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000319.npy"}
{"epoch": 0.48223733938019653, "step": 320, "batch_size": 64, "mean": 0.8032557964324951, "std": 2.401175022125244, "min": -3.8923511505126953, "p10": -1.9283607482910154, "median": 0.5355148315429688, "p90": 4.2210414886474625, "max": 8.412506103515625, "pos_frac": 0.625, "sample": [1.7461585998535156, -1.6276168823242188, 1.1247692108154297, -0.058658599853515625, 1.6928634643554688, -0.5984554290771484, 3.0480728149414062, 0.15742874145507812, 0.7730026245117188, 0.5208282470703125, 2.2985610961914062, -2.2028465270996094, -0.5955314636230469, -0.8230514526367188, -3.7250137329101562, 0.550201416015625, -0.6964111328125, -2.3231582641601562, -0.6408843994140625, 1.2327957153320312, -1.0361309051513672, 1.9561996459960938, -1.71343994140625, 2.2009658813476562, 0.586639404296875, 8.412506103515625, 7.4188385009765625, 1.21844482421875, 0.12004470825195312, -1.224853515625, -0.121978759765625, -3.8923511505126953, 2.025562286376953, 1.440582275390625, 0.6750240325927734, 4.837406158447266, 0.31455230712890625, -1.6301651000976562, -2.4678497314453125, 0.605682373046875, 1.8098373413085938, 3.922687530517578, 2.374401092529297, -0.9509735107421875, 0.17969512939453125, 0.42781829833984375, 5.4632110595703125, 1.3325576782226562, 0.31259918212890625, -0.041332244873046875, 0.6038990020751953, -2.0204696655273438, 5.4851226806640625, -2.0383758544921875, -0.7089920043945312, 5.046421051025391, -0.4159393310546875, 1.8046445846557617, 4.348907470703125, 1.1394424438476562, 1.8352012634277344, -1.5831527709960938, 2.9893569946289062, 0.5130729675292969], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000320.npy"}
{"epoch": 0.4837490551776266, "step": 321, "batch_size": 64, "mean": 1.2074360847473145, "std": 1.9280537366867065, "min": -3.259510040283203, "p10": -1.0684482574462888, "median": 1.0463695526123047, "p90": 3.5071060180664064, "max": 5.579860687255859, "pos_frac": 0.734375, "sample": [0.5886306762695312, 2.6881942749023438, 0.8193092346191406, 1.5396652221679688, 0.3210906982421875, -0.5999221801757812, 3.3253555297851562, 2.0262718200683594, 5.579860687255859, 0.6579666137695312, 4.799732208251953, 0.02820587158203125, 0.7083015441894531, 2.5951194763183594, -2.2243175506591797, 0.49472808837890625, 0.9161300659179688, 1.4959754943847656, 1.942291259765625, 2.4117431640625, 1.5398941040039062, 1.8000869750976562, -3.259510040283203, -2.7155075073242188, 1.607879638671875, -2.13665771484375, -0.35207366943359375, -1.2013359069824219, 1.5039310455322266, 0.9763641357421875, 1.723724365234375, 3.1976585388183594, 3.6046981811523438, 1.1104240417480469, 0.93939208984375, 3.5376434326171875, 3.077972412109375, 0.03640937805175781, 1.0019607543945312, 2.1541366577148438, -0.740692138671875, -0.33223724365234375, -0.017475128173828125, 2.131359100341797, -0.26194000244140625, -2.2496337890625, 3.0120315551757812, 3.43585205078125, 3.261882781982422, 4.040916442871094, 1.0815849304199219, 1.90631103515625, 0.4456634521484375, 4.961427688598633, 3.2904052734375, 1.0111541748046875, -0.31160736083984375, 2.2956504821777344, 4.91539192199707, -1.9188499450683594, -0.37566375732421875, -0.7583770751953125, 0.41473388671875, -0.2234039306640625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000321.npy"}
{"epoch": 0.4852607709750567, "step": 322, "batch_size": 64, "mean": 1.4601197242736816, "std": 1.852345585823059, "min": -2.8745040893554688, "p10": -0.5496841430664062, "median": 1.3133316040039062, "p90": 3.87130928039551, "max": 5.9452362060546875, "pos_frac": 0.828125, "sample": [-0.8275146484375, 2.4368057250976562, 2.160064697265625, -0.33742523193359375, 2.6745567321777344, 1.4716796875, 2.9052276611328125, 1.2474861145019531, 2.7095718383789062, 0.5477104187011719, -1.6013336181640625, 1.2867431640625, 4.38673210144043, 5.050920486450195, -0.525299072265625, 1.7842864990234375, 1.3399200439453125, 1.73199462890625, 0.3261604309082031, -0.5601348876953125, 0.8562202453613281, 1.0565872192382812, -1.9065666198730469, -0.020254135131835938, 3.05255126953125, 3.3245811462402344, 1.0940914154052734, 1.9069137573242188, 0.69586181640625, -2.461334228515625, 0.3906097412109375, 2.8906097412109375, 1.4212646484375, 3.269561767578125, 5.422386169433594, 1.6819477081298828, 1.2603759765625, 5.8430023193359375, 0.0846710205078125, 0.8785171508789062, 2.1590194702148438, 4.733102798461914, 1.5529136657714844, 2.3898963928222656, 1.39422607421875, 3.1387062072753906, 2.1109580993652344, 0.9611072540283203, -1.02972412109375, -2.8745040893554688, 0.93182373046875, 0.0843048095703125, 0.23855972290039062, 0.8546676635742188, 4.105621337890625, 0.2426605224609375, 1.4172592163085938, 0.3439178466796875, 3.19818115234375, 0.16692352294921875, 5.9452362060546875, -0.0291748046875, 2.184347152709961, 0.2778816223144531], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000322.npy"}
{"epoch": 0.48677248677248675, "step": 323, "batch_size": 64, "mean": 1.3351565599441528, "std": 2.3437187671661377, "min": -3.23004150390625, "p10": -1.3060592651367187, "median": 1.2023391723632812, "p90": 4.144680786132813, "max": 9.104660034179688, "pos_frac": 0.71875, "sample": [1.4574508666992188, 0.33123779296875, -0.1274261474609375, 0.384063720703125, 0.38283538818359375, 2.8399505615234375, 3.5238189697265625, 2.3650074005126953, 0.657562255859375, 4.838001251220703, 1.85797119140625, -0.566162109375, 4.23382568359375, 1.1555442810058594, 6.7784576416015625, -2.3551673889160156, -3.1709213256835938, 4.56597900390625, 1.4553451538085938, -0.91943359375, 1.5838623046875, 1.3237228393554688, 1.5678024291992188, 1.1925735473632812, 6.205757141113281, -0.4504680633544922, -2.3638763427734375, 0.8241710662841797, 1.3176803588867188, -1.18988037109375, 9.104660034179688, 2.0351409912109375, -1.3558502197265625, -0.23604965209960938, 3.936676025390625, 1.15380859375, -0.377685546875, 1.5100936889648438, 1.7542190551757812, 0.48397254943847656, 1.3075942993164062, 2.3046112060546875, 0.293670654296875, -0.37457275390625, 3.5555191040039062, 6.4299163818359375, 1.7179336547851562, -1.9263458251953125, 3.6218109130859375, -0.6096763610839844, -3.23004150390625, -1.81402587890625, 1.7372512817382812, 1.0255050659179688, 0.14532470703125, 2.4038925170898438, 3.137054443359375, 3.1214981079101562, 1.0236358642578125, 0.11757469177246094, -0.16351699829101562, -0.4141654968261719, 3.1191940307617188, 1.2121047973632812], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000323.npy"}
{"epoch": 0.48828420256991684, "step": 324, "batch_size": 64, "mean": 1.392437219619751, "std": 1.9192825555801392, "min": -4.419151306152344, "p10": -0.6233505249023437, "median": 1.2478065490722656, "p90": 3.6124649047851567, "max": 6.24566650390625, "pos_frac": 0.8125, "sample": [2.778545379638672, 3.0831069946289062, 0.5857696533203125, 0.099273681640625, -1.7005844116210938, 2.915813446044922, 1.4556198120117188, 3.0484161376953125, -0.6536178588867188, -0.0222930908203125, 3.4573211669921875, 4.508237838745117, 0.30796051025390625, 1.71832275390625, 0.5548744201660156, 0.8576850891113281, 2.105775833129883, -1.5815544128417969, 0.2603302001953125, -2.7147369384765625, 0.83935546875, 6.24566650390625, 1.651885986328125, 3.2876663208007812, 2.166168212890625, -0.51397705078125, -4.419151306152344, 0.9793128967285156, 5.1307373046875, 2.4951133728027344, 1.3359107971191406, -0.7257461547851562, 2.1235198974609375, 0.8198928833007812, 0.0090789794921875, 1.9228515625, 2.5479469299316406, 0.5447120666503906, 0.5129318237304688, 1.8446807861328125, 2.6011505126953125, 1.7732391357421875, 5.8051910400390625, -0.15146255493164062, 4.180023193359375, 0.6824684143066406, 1.2509002685546875, 1.4647369384765625, 2.5917510986328125, 2.4462203979492188, 5.1580963134765625, -1.082977294921875, 1.2447128295898438, -0.5527267456054688, 3.678955078125, 1.895599365234375, 0.3117523193359375, 0.8853607177734375, 0.517303466796875, 0.6606178283691406, 2.6795730590820312, 0.7335433959960938, 1.0254745483398438, -0.5463409423828125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000324.npy"}
{"epoch": 0.4897959183673469, "step": 325, "batch_size": 64, "mean": 1.0951833724975586, "std": 1.9259073734283447, "min": -4.56744384765625, "p10": -1.1113895416259765, "median": 0.8080615997314453, "p90": 3.55181350708008, "max": 6.034414291381836, "pos_frac": 0.71875, "sample": [1.1872138977050781, 5.396690368652344, 1.9610652923583984, 2.9912948608398438, -0.013202667236328125, 1.7226409912109375, 3.7795753479003906, -0.07762908935546875, 1.20025634765625, 0.19993972778320312, 0.5822372436523438, 4.603616714477539, -1.4215164184570312, -1.230337142944336, 0.7985076904296875, 0.9753761291503906, 0.7905044555664062, -0.05072021484375, 0.6729621887207031, -0.1679363250732422, 4.292510986328125, -2.1476211547851562, 1.6527290344238281, 0.8176155090332031, 0.5124664306640625, 5.455757141113281, 0.241668701171875, 2.7088088989257812, -1.9035606384277344, 1.5754108428955078, -1.10491943359375, -0.49078369140625, 3.0894317626953125, 0.188507080078125, 0.0760345458984375, 2.9038238525390625, -0.12918853759765625, 1.6308746337890625, 2.772624969482422, 3.074920654296875, 2.1682510375976562, -0.5735740661621094, -0.9671859741210938, 2.72796630859375, -4.56744384765625, 1.3184814453125, 0.15859222412109375, 1.9685440063476562, 3.7499771118164062, 2.3191394805908203, 6.034414291381836, -1.1141624450683594, -0.28308868408203125, -1.6389312744140625, 1.1904563903808594, 0.2662544250488281, 2.4286270141601562, 0.2755756378173828, 0.5727386474609375, 1.120758056640625, 1.5616378784179688, 1.8544082641601562, 0.5717239379882812, -0.16907501220703125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000325.npy"}
{"epoch": 0.491307634164777, "step": 326, "batch_size": 64, "mean": 0.787421464920044, "std": 1.842560052871704, "min": -3.3652114868164062, "p10": -1.492340850830078, "median": 0.7791213989257812, "p90": 2.7642627716064454, "max": 5.165149688720703, "pos_frac": 0.6875, "sample": [-1.5361099243164062, -0.1640625, 1.3538169860839844, 0.9904365539550781, 0.1487884521484375, -3.042205810546875, 3.961761474609375, 0.7338409423828125, 0.314666748046875, 2.7720184326171875, 0.18658447265625, 3.5448684692382812, 1.4462471008300781, -1.2254180908203125, -0.5026798248291016, 2.5940017700195312, 0.8335762023925781, 0.5993423461914062, -0.024158477783203125, 1.4911270141601562, 4.111970901489258, -0.56182861328125, -0.710723876953125, 1.8021926879882812, -2.1884307861328125, 0.5952606201171875, -1.1065902709960938, -1.6851959228515625, 2.564117431640625, 1.7170486450195312, 2.570272445678711, -1.341522216796875, 0.9213294982910156, 0.6637229919433594, 0.646820068359375, 0.176025390625, 2.3372650146484375, 1.5644378662109375, -1.3902130126953125, 0.832763671875, 1.4883651733398438, 3.3272552490234375, -0.7875556945800781, 0.0428924560546875, 2.6580543518066406, 0.9029312133789062, 4.661750793457031, -1.3009376525878906, 2.746166229248047, 5.165149688720703, 1.50048828125, -2.160390853881836, 0.4378814697265625, 1.938232421875, -2.3308639526367188, 2.727874755859375, -3.3652114868164062, 0.12462615966796875, 2.4971351623535156, 0.82440185546875, 2.3838424682617188, -0.9293785095214844, -0.06232452392578125, 1.909423828125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000326.npy"}
{"epoch": 0.4928193499622071, "step": 327, "batch_size": 64, "mean": 0.7277547717094421, "std": 1.936223030090332, "min": -4.3651275634765625, "p10": -1.7590789794921875, "median": 0.5851078033447266, "p90": 2.873703765869141, "max": 6.128868103027344, "pos_frac": 0.640625, "sample": [2.8512039184570312, -1.7755813598632812, 0.3087425231933594, 3.398468017578125, -0.175994873046875, -0.707366943359375, -1.1112899780273438, 1.4689788818359375, -0.4468574523925781, 2.661937713623047, 4.512920379638672, -0.0937347412109375, -2.5480117797851562, 3.40386962890625, 1.2882652282714844, 2.3832473754882812, 2.8833465576171875, 0.13753128051757812, -0.2921600341796875, 0.06386566162109375, -0.7748508453369141, -0.32552337646484375, 0.9568233489990234, 1.3614501953125, 1.880615234375, -2.01824951171875, -0.00037384033203125, -0.3788337707519531, 2.4015884399414062, -0.5523757934570312, 2.4843482971191406, 0.7178230285644531, -2.32208251953125, 1.856170654296875, -0.35988616943359375, -0.47054290771484375, 1.27337646484375, 0.11102294921875, -1.7205734252929688, -0.9211196899414062, 2.838104248046875, 1.4227313995361328, -4.3651275634765625, 1.0948944091796875, 2.281818389892578, 6.0342254638671875, 0.7364654541015625, 2.0004501342773438, 6.128868103027344, 1.029937744140625, 3.1944923400878906, 0.24515724182128906, 0.9483489990234375, 0.12905120849609375, -2.27685546875, 0.30826568603515625, 2.6009521484375, 1.0551376342773438, 1.4275360107421875, 0.774261474609375, 0.01767730712890625, -1.8539886474609375, 0.452392578125, -1.0586776733398438], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000327.npy"}
{"epoch": 0.4943310657596372, "step": 328, "batch_size": 64, "mean": 0.9038243293762207, "std": 1.892870306968689, "min": -4.113550186157227, "p10": -1.0722976684570313, "median": 0.5489578247070312, "p90": 3.227865600585938, "max": 5.6572418212890625, "pos_frac": 0.671875, "sample": [3.3145294189453125, -0.30484771728515625, 1.3485279083251953, 1.1747283935546875, 1.0881004333496094, 0.49129295349121094, -0.30505943298339844, 3.0451583862304688, 2.0976486206054688, 1.6726837158203125, 1.5350189208984375, 1.2558441162109375, 3.1095657348632812, -0.8034286499023438, 0.0003662109375, -0.239166259765625, 2.8036766052246094, 2.872915267944336, 0.980255126953125, -0.01387786865234375, 1.182830810546875, -0.5993118286132812, 0.4380474090576172, 3.1551589965820312, -1.0726776123046875, -1.1454086303710938, 5.6572418212890625, -0.30550384521484375, -1.1295318603515625, 0.55938720703125, -3.8722305297851562, 0.4789562225341797, 3.2590255737304688, 0.5385284423828125, 3.700469970703125, -0.13074493408203125, 0.20208740234375, 0.03276634216308594, 4.5372161865234375, 0.803009033203125, 5.025543212890625, 1.6546783447265625, -1.2775564193725586, -0.6759452819824219, 0.3183135986328125, 0.4221343994140625, -1.3256912231445312, -1.056884765625, -0.48101043701171875, 0.04875946044921875, 0.7964458465576172, -0.0810546875, 2.5608787536621094, 2.169921875, -0.5966911315917969, 0.2355632781982422, 2.0924911499023438, 1.2563743591308594, 5.265247344970703, 1.1800003051757812, -4.113550186157227, -1.0714111328125, 2.9072036743164062, 1.1777458190917969], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000328.npy"}
{"epoch": 0.4958427815570673, "step": 329, "batch_size": 64, "mean": 1.0524829626083374, "std": 2.183882713317871, "min": -2.1881484985351562, "p10": -1.3428661346435546, "median": 0.622955322265625, "p90": 3.933323669433594, "max": 7.544609069824219, "pos_frac": 0.671875, "sample": [1.0531940460205078, 0.25040435791015625, 0.1429901123046875, 1.2031326293945312, -1.7296218872070312, 1.2029342651367188, 2.632080078125, 0.10277175903320312, 0.82684326171875, -0.42308807373046875, 2.4263916015625, -0.2257080078125, -1.3704338073730469, 2.2539405822753906, -1.5104217529296875, -0.8675689697265625, -1.2785415649414062, 3.6343536376953125, 0.0157470703125, 1.043914794921875, 0.6009140014648438, 0.054622650146484375, 5.347480773925781, 3.97021484375, -2.1881484985351562, 2.8876266479492188, 7.544609069824219, 0.9455413818359375, -0.6205291748046875, 0.3914299011230469, 2.737163543701172, -0.9173812866210938, -1.3706550598144531, -1.156097412109375, 2.3884429931640625, -0.5514068603515625, 6.3830718994140625, -1.2288322448730469, 3.5352840423583984, 3.8472442626953125, 3.354034423828125, 3.23846435546875, 1.473663330078125, -0.6772003173828125, 1.4507293701171875, 1.5525856018066406, 0.7575588226318359, 0.23667526245117188, 0.2633018493652344, 2.446239471435547, -0.9856834411621094, 1.010009765625, 0.6449966430664062, -2.0140647888183594, 5.871635437011719, 4.085723876953125, -0.5153961181640625, 5.048030853271484, 0.037891387939453125, -2.0228271484375, 0.5502395629882812, 1.1411514282226562, -0.47570037841796875, -1.0970611572265625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000329.npy"}
{"epoch": 0.4973544973544973, "step": 330, "batch_size": 64, "mean": 1.4522165060043335, "std": 1.916846752166748, "min": -4.275508880615234, "p10": -0.7107362747192382, "median": 1.5895328521728516, "p90": 3.6779380798339845, "max": 5.46575927734375, "pos_frac": 0.765625, "sample": [-0.32762908935546875, -0.4882164001464844, -0.03437614440917969, -1.4876747131347656, 0.5330886840820312, -4.275508880615234, 1.4525833129882812, 2.3698883056640625, 0.9131011962890625, 2.7522964477539062, 2.1546096801757812, 0.8259773254394531, 0.14553451538085938, 0.7376708984375, 1.8140411376953125, 2.0724029541015625, 5.2244720458984375, -1.4150924682617188, 5.46575927734375, 1.2692947387695312, 2.4547500610351562, -2.588449478149414, 2.358551025390625, -0.7630672454833984, 3.476043701171875, 4.692779541015625, 0.7231903076171875, -0.5732192993164062, -0.5886306762695312, 1.0932388305664062, 1.4298553466796875, 2.7081451416015625, 0.14427947998046875, 1.7033576965332031, -1.1743488311767578, 2.83935546875, 2.8474388122558594, 2.16033935546875, 5.0807037353515625, -0.10351181030273438, -0.021129608154296875, 2.005889892578125, 0.17235565185546875, 2.0378646850585938, 2.71258544921875, 3.0791759490966797, 2.0230655670166016, 1.5267829895019531, 0.446868896484375, 3.306396484375, 1.2932586669921875, 3.681121826171875, 3.6705093383789062, 2.4315223693847656, 1.105743408203125, 0.8700332641601562, 1.8700103759765625, -0.38083648681640625, 4.0157012939453125, -1.7477645874023438, 2.347532272338867, 1.7957878112792969, 1.65228271484375, 5.424074172973633], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000330.npy"}
{"epoch": 0.4988662131519274, "step": 331, "batch_size": 64, "mean": 1.4265344142913818, "std": 2.3712878227233887, "min": -4.943603515625, "p10": -1.4479705810546875, "median": 1.496917724609375, "p90": 4.137857818603516, "max": 8.204349517822266, "pos_frac": 0.765625, "sample": [5.329437255859375, 0.03270721435546875, 1.0239524841308594, 1.346017837524414, 2.025714874267578, 4.117408752441406, 2.41229248046875, 2.560882568359375, 0.4912872314453125, 1.7486648559570312, 0.6865024566650391, 4.5337371826171875, 4.563102722167969, 2.22271728515625, 1.8288726806640625, -0.3628997802734375, 8.204349517822266, 3.0114288330078125, 1.6933212280273438, 6.717964172363281, 2.2086944580078125, 0.007794380187988281, 1.3823165893554688, 0.3278350830078125, -0.2960700988769531, 3.120025634765625, 2.409292221069336, 6.0681610107421875, -1.7616119384765625, 0.93426513671875, 4.1466217041015625, 1.3061246871948242, 1.6259307861328125, 2.1850509643554688, -0.22992706298828125, 0.03942108154296875, 1.9826698303222656, -2.2595596313476562, -1.0364665985107422, 1.2303237915039062, -1.465667724609375, 1.9711990356445312, 0.3170738220214844, -0.14251708984375, 2.194610595703125, -4.943603515625, 1.379730224609375, -0.38507080078125, -3.4631690979003906, -0.38915252685546875, -1.40667724609375, 3.087799072265625, 3.820343017578125, 1.4573745727539062, 2.7922515869140625, 0.09560394287109375, 0.6468753814697266, 2.838886260986328, 1.5364608764648438, -1.892923355102539, 3.7568893432617188, 1.7009811401367188, -3.705230712890625, 3.917783737182617], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000331.npy"}
{"epoch": 0.5003779289493575, "step": 332, "batch_size": 64, "mean": 1.2150681018829346, "std": 1.9749033451080322, "min": -2.694385528564453, "p10": -1.1453243255615233, "median": 1.1199874877929688, "p90": 3.577466583251954, "max": 7.244384765625, "pos_frac": 0.71875, "sample": [0.3234596252441406, 1.8756866455078125, -0.6237106323242188, -1.8333816528320312, 1.019430160522461, 2.0951156616210938, 1.4534912109375, -0.3684654235839844, -1.0423126220703125, 1.4909095764160156, 1.4339828491210938, 4.839302062988281, 2.7322540283203125, -0.26966094970703125, 2.3323593139648438, 1.7125473022460938, 0.42061805725097656, 0.9997072219848633, 2.4852447509765625, -0.5701751708984375, 2.8309593200683594, 1.120452880859375, -1.1894721984863281, 3.9514999389648438, 2.3400421142578125, 3.3619155883789062, -0.7326545715332031, -0.8226470947265625, 2.565826416015625, 0.8812217712402344, -0.05240631103515625, 5.584661483764648, -1.327392578125, 0.43257904052734375, -2.1510467529296875, 1.7068099975585938, 1.3099422454833984, 3.6698455810546875, -0.6759109497070312, 0.23995208740234375, 0.5845184326171875, 1.3658294677734375, 1.58795166015625, 2.3276023864746094, -2.6864051818847656, 0.9290771484375, 2.8191375732421875, 1.4560661315917969, 0.637359619140625, 1.1195220947265625, 7.244384765625, 3.3483963012695312, 0.8619346618652344, -0.9911384582519531, -1.5613861083984375, 1.090728759765625, 5.5936737060546875, 0.7393150329589844, 2.3987197875976562, 2.5952606201171875, 3.9247894287109375, 1.628692626953125, -2.694385528564453, -0.105865478515625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000332.npy"}
{"epoch": 0.5018896447467877, "step": 333, "batch_size": 64, "mean": 1.7389962673187256, "std": 1.9165157079696655, "min": -1.566619873046875, "p10": -0.6212043762207031, "median": 1.5859375, "p90": 4.279238891601564, "max": 8.25799560546875, "pos_frac": 0.796875, "sample": [2.7757568359375, 0.07325172424316406, 3.137420654296875, 6.493896484375, 1.4235916137695312, -0.32059478759765625, 1.3883018493652344, 1.1988067626953125, 2.6588134765625, 2.16668701171875, 2.3817520141601562, 1.7772674560546875, 2.7955703735351562, 1.6188392639160156, 2.2549381256103516, 1.378448486328125, 1.3934402465820312, 1.9261474609375, 4.096527099609375, 2.4851531982421875, 0.8185882568359375, 0.34949493408203125, -0.001537322998046875, -0.30750274658203125, 0.8702812194824219, -0.5755691528320312, -1.051717758178711, 2.6563453674316406, 2.8132247924804688, 0.2188568115234375, 1.595245361328125, 5.311504364013672, -0.6407623291015625, 4.690765380859375, 2.5607433319091797, 4.3575439453125, 8.25799560546875, -0.6503143310546875, 0.8662147521972656, 2.418121337890625, -0.8345870971679688, -1.566619873046875, -0.8448753356933594, 1.375436782836914, 1.576629638671875, -0.05828094482421875, 3.068206787109375, 2.1862716674804688, -0.8923873901367188, 1.0734176635742188, 0.34009552001953125, -0.1365814208984375, 1.8327312469482422, 5.521711349487305, 1.7116127014160156, 0.37082672119140625, 1.183074951171875, 0.3637237548828125, 3.501152992248535, 1.2459640502929688, 5.463470458984375, 3.0446434020996094, 2.5030288696289062, 1.605560302734375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000333.npy"}
{"epoch": 0.5034013605442177, "step": 334, "batch_size": 64, "mean": 1.1811105012893677, "std": 2.2291994094848633, "min": -3.600461006164551, "p10": -1.4725200653076171, "median": 1.206324577331543, "p90": 4.214190673828127, "max": 7.357048034667969, "pos_frac": 0.75, "sample": [-1.288421630859375, -2.8080062866210938, 2.0662269592285156, 2.101898193359375, 1.339691162109375, -1.843679428100586, -0.9577789306640625, 5.7032928466796875, 1.3728179931640625, -3.1630172729492188, 0.6589813232421875, 6.1679534912109375, -3.600461006164551, 0.8001136779785156, 1.2407875061035156, 2.0343360900878906, 2.12481689453125, -0.1835174560546875, 0.44305419921875, 7.357048034667969, 5.524391174316406, 4.407005310058594, -1.4961738586425781, 1.0366973876953125, 3.7642898559570312, -2.4226150512695312, 0.7543869018554688, -0.5252265930175781, 2.267822265625, 1.2782878875732422, 1.8082084655761719, 0.24236297607421875, 3.3189163208007812, -1.417327880859375, 2.115091323852539, 1.763763427734375, 5.20794677734375, 1.48565673828125, 1.01031494140625, 0.2886505126953125, 1.0868453979492188, -1.0496025085449219, -1.0628509521484375, 0.32227325439453125, 2.4088287353515625, 0.49517822265625, 1.2563323974609375, 0.33072662353515625, 0.875152587890625, -1.2934417724609375, 1.2194175720214844, 1.7595062255859375, 0.855560302734375, 1.1932315826416016, -0.9803590774536133, 2.943878173828125, -1.6591033935546875, 4.954437255859375, 2.909210205078125, 0.18815898895263672, 1.847259521484375, 2.2169723510742188, 3.0919036865234375, 1.702972412109375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000334.npy"}
{"epoch": 0.5049130763416477, "step": 335, "batch_size": 64, "mean": 1.241187572479248, "std": 2.350783348083496, "min": -3.316997528076172, "p10": -1.3293540954589844, "median": 0.8431663513183594, "p90": 4.951462173461914, "max": 6.7311553955078125, "pos_frac": 0.65625, "sample": [3.0459136962890625, 2.7573776245117188, -0.2546234130859375, 6.7311553955078125, -0.06366729736328125, 5.328857421875, 1.9148731231689453, -0.04405403137207031, 3.434173583984375, 2.825878143310547, 5.524120330810547, 4.906379699707031, 0.076202392578125, 1.79583740234375, 5.175529479980469, 2.0161590576171875, 0.8136367797851562, 1.5927352905273438, -1.1654949188232422, 5.091156005859375, -0.3176002502441406, -1.10546875, 1.0539798736572266, 6.370780944824219, -1.5714035034179688, -0.2755889892578125, 1.3329849243164062, -1.2821807861328125, 0.9536590576171875, 2.2245864868164062, 4.338474273681641, 0.5175552368164062, 0.2628364562988281, 0.5805702209472656, 1.9389419555664062, -2.406370162963867, 3.110210418701172, -0.216552734375, -0.9999465942382812, 4.970783233642578, -0.42827606201171875, -2.108793258666992, 0.029388427734375, 1.6858139038085938, 0.20879745483398438, 4.781150817871094, 1.8546676635742188, -1.2177810668945312, 2.0811004638671875, -0.444061279296875, 0.8088912963867188, -0.47870635986328125, 0.35137176513671875, -3.316997528076172, -3.24468994140625, -2.14251708984375, 3.5287933349609375, 3.0513458251953125, 1.460458755493164, 2.3125076293945312, -0.5941085815429688, 0.7521286010742188, -1.3495712280273438, 0.8726959228515625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000335.npy"}
{"epoch": 0.5064247921390779, "step": 336, "batch_size": 64, "mean": 1.4134178161621094, "std": 2.1116275787353516, "min": -3.711261749267578, "p10": -0.8947463989257811, "median": 1.1910324096679688, "p90": 4.240617370605469, "max": 7.117870330810547, "pos_frac": 0.765625, "sample": [4.136871337890625, 4.2850799560546875, 1.0474853515625, 0.03011322021484375, 4.6138916015625, 3.1126022338867188, 2.5488624572753906, 0.8349151611328125, 7.117870330810547, 0.787811279296875, 1.699920654296875, 0.4264488220214844, 5.621025085449219, -1.2955551147460938, 1.37713623046875, -0.21233367919921875, -0.45397186279296875, 1.6584663391113281, 4.797607421875, 0.038974761962890625, 1.8648605346679688, 0.5347137451171875, 1.6484451293945312, 1.25787353515625, -1.5632705688476562, 2.0938186645507812, 2.09637451171875, -3.711261749267578, 1.1241912841796875, -2.0934295654296875, 6.055885314941406, -0.38219451904296875, 0.4952278137207031, 0.7757720947265625, -0.7645111083984375, 0.2640495300292969, 3.066638946533203, -2.0971450805664062, 3.9280643463134766, 0.834136962890625, 1.274627685546875, 1.4931182861328125, 0.9265975952148438, 1.5306205749511719, 0.905853271484375, 1.8332672119140625, 2.6965789794921875, -1.7240753173828125, 0.15280914306640625, 3.522125244140625, -0.3696918487548828, -0.9505615234375, 0.5451087951660156, 2.9828948974609375, 6.35565185546875, 2.6893157958984375, -0.3459320068359375, 2.8838043212890625, 1.9998245239257812, -0.1199951171875, 0.6913604736328125, 2.9451065063476562, 1.51129150390625, -0.5724258422851562], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000336.npy"}
{"epoch": 0.5079365079365079, "step": 337, "batch_size": 64, "mean": 0.98023521900177, "std": 2.073397636413574, "min": -3.9049758911132812, "p10": -1.5576541900634766, "median": 0.9174652099609375, "p90": 4.129087257385255, "max": 5.510955810546875, "pos_frac": 0.65625, "sample": [1.370574951171875, -1.6919784545898438, -1.8188667297363281, -0.1605224609375, 0.5791168212890625, 3.529327392578125, -1.536529541015625, 2.72723388671875, -1.3469085693359375, 0.2146453857421875, 3.8360443115234375, 0.9304428100585938, 0.985076904296875, -3.173175811767578, -0.74114990234375, -0.3461761474609375, 4.201864242553711, 5.463958740234375, 1.0166015625, 0.891693115234375, -0.17822265625, -0.193267822265625, -2.3948192596435547, -2.0629501342773438, 2.6367034912109375, 0.7859878540039062, -1.0751914978027344, 1.3758430480957031, 0.6786441802978516, 4.6570587158203125, 1.9030914306640625, -3.9049758911132812, 3.9592742919921875, -1.3475265502929688, 0.50439453125, 1.2325820922851562, 4.355712890625, 2.35009765625, 1.4522895812988281, 2.3277225494384766, 1.7278366088867188, 1.6531791687011719, 1.0977935791015625, 1.072052001953125, -0.25586700439453125, -0.178466796875, 1.7709312438964844, 0.300537109375, 2.2590866088867188, -0.09940719604492188, 0.550994873046875, 5.510955810546875, -0.13690185546875, 1.0876007080078125, 0.9044876098632812, 5.084869384765625, -0.513336181640625, -1.5667076110839844, 2.142446517944336, 3.0727291107177734, -1.3219757080078125, 1.5291900634765625, 4.234241485595703, 0.8150634765625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000337.npy"}
{"epoch": 0.509448223733938, "step": 338, "batch_size": 64, "mean": 1.0984106063842773, "std": 1.7687383890151978, "min": -2.0927810668945312, "p10": -1.0948631286621093, "median": 0.8005237579345703, "p90": 3.2498430252075203, "max": 6.215110778808594, "pos_frac": 0.71875, "sample": [0.124481201171875, 0.5761947631835938, 1.44329833984375, -0.3581504821777344, 1.95989990234375, 0.9407463073730469, -1.0997772216796875, 2.13330078125, 0.64971923828125, 2.4370574951171875, 6.215110778808594, 0.28095245361328125, -0.029621124267578125, 1.8111572265625, 1.0414657592773438, -1.6896476745605469, -2.0927810668945312, 0.16538238525390625, -1.0833969116210938, 0.43532562255859375, 2.6212005615234375, 0.73858642578125, -0.18084716796875, 0.7482490539550781, 3.3301467895507812, -0.04973602294921875, -1.851470947265625, 2.078876495361328, 2.620147705078125, 2.5024795532226562, 2.124664306640625, 3.350677490234375, 0.016366958618164062, 0.0887451171875, 0.8325233459472656, 0.9000015258789062, 1.2966194152832031, -0.2596282958984375, 1.2550048828125, -0.099365234375, 0.053497314453125, 3.062467575073242, -1.9174461364746094, 2.191783905029297, 0.7574043273925781, 2.5290584564208984, -0.5453948974609375, -0.25153160095214844, 1.1350746154785156, -1.4744453430175781, 1.4540786743164062, 4.6337432861328125, 4.119663238525391, -0.02869415283203125, 1.4816360473632812, 2.2557945251464844, 1.1083526611328125, 4.573780059814453, 3.025848388671875, -1.1399078369140625, 0.7082176208496094, -0.09095382690429688, 5.963768005371094, 0.768524169921875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000338.npy"}
{"epoch": 0.5109599395313681, "step": 339, "batch_size": 64, "mean": 1.348482608795166, "std": 1.907066822052002, "min": -5.971305847167969, "p10": -0.8593202590942381, "median": 1.4543800354003906, "p90": 3.291536712646485, "max": 6.246826171875, "pos_frac": 0.75, "sample": [-0.2104949951171875, 1.2460784912109375, 0.5410995483398438, -0.6290359497070312, -1.4246177673339844, 2.7901744842529297, 1.5236968994140625, 0.5981388092041016, 1.2953300476074219, 2.1364517211914062, 2.2733421325683594, 2.410064697265625, 1.9771652221679688, -0.14836883544921875, 0.7407913208007812, 2.2323837280273438, -0.8944530487060547, 3.3870010375976562, 4.720268249511719, -0.19457626342773438, 3.06878662109375, 1.6165084838867188, 3.460193634033203, 1.8232574462890625, -0.77734375, 1.4361953735351562, 1.0184307098388672, 0.95550537109375, -5.971305847167969, 2.0813770294189453, 2.9568252563476562, 2.949817657470703, 2.475128173828125, 4.748043060302734, 2.7302322387695312, 1.0104713439941406, 2.7978172302246094, -0.20006561279296875, 0.8920745849609375, 0.54742431640625, 4.962493896484375, 2.397064208984375, -1.2619972229003906, -0.6565933227539062, 1.1909637451171875, -1.2403068542480469, 2.025787353515625, 1.472564697265625, -1.2661590576171875, -1.8261260986328125, -0.11325454711914062, 1.0831003189086914, 2.3106155395507812, 3.694732666015625, 0.10089111328125, 6.246826171875, -0.5162353515625, 0.7622451782226562, 2.6120147705078125, 2.281085968017578, 2.814080238342285, 1.8175201416015625, 0.7416286468505859, 2.6801376342773438], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000339.npy"}
{"epoch": 0.5124716553287982, "step": 340, "batch_size": 64, "mean": 1.1560699939727783, "std": 2.1806845664978027, "min": -4.098941802978516, "p10": -0.8952186584472657, "median": 0.8044805526733398, "p90": 3.829228973388673, "max": 5.972679138183594, "pos_frac": 0.625, "sample": [1.6522293090820312, 4.050331115722656, 0.7873001098632812, -0.29383087158203125, 0.17082977294921875, 3.4764060974121094, -0.4948101043701172, -0.885711669921875, -0.5226116180419922, -0.7657470703125, -0.42127227783203125, 2.916851043701172, 2.4575347900390625, 2.6606311798095703, 5.681083679199219, 5.341991424560547, 0.5658340454101562, -4.098941802978516, 2.7599945068359375, 1.478240966796875, -0.3442840576171875, -0.3810253143310547, -0.50390625, 0.8216609954833984, -0.8663330078125, 3.9427337646484375, 0.7369766235351562, 3.5643844604492188, 5.972679138183594, 3.1681060791015625, -0.8978462219238281, 3.4772720336914062, -0.14035797119140625, 2.4578094482421875, 2.818084716796875, 0.6116533279418945, 3.3846893310546875, -0.15390968322753906, -0.0690765380859375, 2.9697513580322266, 1.8125038146972656, -0.8890876770019531, -0.433349609375, 4.373069763183594, 1.2965850830078125, -1.4564781188964844, -2.13909912109375, -0.7095794677734375, 1.0126609802246094, 3.483551025390625, 1.4284210205078125, 0.5388107299804688, 3.1027679443359375, -2.245311737060547, -3.7080745697021484, -1.760467529296875, 4.6427459716796875, 0.7488670349121094, 0.18598175048828125, 3.5563812255859375, 1.5815200805664062, 0.9134674072265625, -0.6006431579589844, 2.1678390502929688], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000340.npy"}
{"epoch": 0.5139833711262283, "step": 341, "batch_size": 64, "mean": 1.2626094818115234, "std": 1.928413987159729, "min": -3.217334747314453, "p10": -1.075708770751953, "median": 0.8433723449707031, "p90": 3.6500823974609378, "max": 4.603046417236328, "pos_frac": 0.734375, "sample": [-1.1085281372070312, 3.0622482299804688, 3.1805191040039062, 1.100667953491211, 4.068901062011719, 4.068458557128906, 0.8742103576660156, 4.1966705322265625, 2.8900909423828125, 0.0479736328125, 4.603046417236328, -0.06839942932128906, -1.806671142578125, -0.8791790008544922, -0.015951156616210938, 0.2654743194580078, 3.4852142333984375, 2.1944046020507812, 3.2353439331054688, 2.5582847595214844, 3.7616653442382812, 3.8028335571289062, 2.444732666015625, 0.7187442779541016, -0.02569580078125, 3.6732177734375, -0.2274932861328125, -2.315746307373047, 0.9860687255859375, -0.9991302490234375, 0.4731178283691406, 3.353363037109375, -1.33990478515625, 0.30804443359375, 2.2025146484375, 3.4385948181152344, 0.7626495361328125, 2.0754928588867188, 0.111480712890625, -3.217334747314453, -2.308349609375, -0.0457611083984375, 0.1155853271484375, 3.5886993408203125, 0.7966289520263672, 3.596099853515625, 3.186370849609375, 3.4548797607421875, 2.5657501220703125, 2.5037612915039062, 2.524810791015625, -0.1717529296875, 3.294473648071289, -0.9825439453125, 0.3829498291015625, 2.3296127319335938, 0.3736457824707031, 0.9156036376953125, 0.08281135559082031, -0.9480056762695312, 0.1943511962890625, 0.8125343322753906, 0.64959716796875, -2.04473876953125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000341.npy"}
{"epoch": 0.5154950869236583, "step": 342, "batch_size": 64, "mean": 1.4702682495117188, "std": 2.013063907623291, "min": -2.4579734802246094, "p10": -0.6337059020996093, "median": 0.9288496971130371, "p90": 4.0611309051513675, "max": 7.757438659667969, "pos_frac": 0.75, "sample": [0.21527099609375, 0.5280303955078125, 0.49449920654296875, -2.034717559814453, 0.9848470687866211, 5.9029541015625, -0.9188041687011719, 2.9978408813476562, 0.3019866943359375, 1.7095413208007812, -0.048786163330078125, 1.9059524536132812, 1.1234149932861328, 2.3785171508789062, 2.2989959716796875, 0.8252716064453125, 0.22150421142578125, -2.4579734802246094, -0.38609886169433594, 0.06721115112304688, 2.388843536376953, 3.5319442749023438, 3.1629638671875, 2.8695945739746094, -0.7945289611816406, 3.3568649291992188, -0.321136474609375, 0.8233718872070312, 3.7015914916992188, 3.4113388061523438, 0.5624961853027344, 0.595489501953125, 0.715606689453125, -0.1566314697265625, 2.4221420288085938, -2.049309730529785, 2.6636123657226562, -0.543914794921875, 1.6629638671875, 3.4163360595703125, 0.14791107177734375, 5.358634948730469, 4.0833587646484375, 1.774017333984375, -0.8092193603515625, 4.949005126953125, 1.5235786437988281, 0.20447921752929688, 4.6474609375, 4.7280426025390625, 0.53570556640625, -0.6721878051757812, 0.6874237060546875, 7.757438659667969, -0.039783477783203125, 4.009265899658203, 1.1422691345214844, -0.07653045654296875, 1.9978256225585938, 0.8728523254394531, 2.694734573364258, 1.7105712890625, -0.1397705078125, -0.519012451171875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000342.npy"}
{"epoch": 0.5170068027210885, "step": 343, "batch_size": 64, "mean": 0.7514776587486267, "std": 1.9317357540130615, "min": -3.9940872192382812, "p10": -1.7897285461425776, "median": 0.9158754348754883, "p90": 3.380669403076173, "max": 4.3686981201171875, "pos_frac": 0.65625, "sample": [1.8629989624023438, 0.7734527587890625, 1.1014595031738281, -1.1929168701171875, -0.056976318359375, 3.005831718444824, 1.696197509765625, -0.6711254119873047, 1.0245819091796875, -3.177215576171875, -3.8296432495117188, -0.24090576171875, 4.3686981201171875, 1.9716110229492188, 2.85809326171875, -0.396759033203125, 2.491119384765625, 1.171844482421875, 3.744007110595703, 2.642047882080078, 0.2417449951171875, 1.9170055389404297, 2.3597373962402344, 1.4323654174804688, 2.3746566772460938, 3.0665206909179688, 1.206228256225586, -1.3674545288085938, 0.5445938110351562, 0.8919105529785156, -1.1299667358398438, 1.3027191162109375, 3.5153045654296875, 0.9398403167724609, -0.5458602905273438, -0.7868461608886719, 2.5164146423339844, -0.09933853149414062, -3.9940872192382812, 4.135528564453125, -1.970703125, -0.9258499145507812, 0.8075714111328125, 3.57916259765625, 1.72247314453125, -0.6535873413085938, 3.875152587890625, 1.007476806640625, 0.2781105041503906, 1.2908897399902344, 4.311332702636719, 0.331573486328125, -0.17511749267578125, 0.335723876953125, 1.2594375610351562, 1.2746429443359375, -2.107379913330078, 0.40903472900390625, 2.0153961181640625, -1.0021743774414062, -2.3899383544921875, -2.2547836303710938, -0.9026355743408203, 0.3113441467285156], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000343.npy"}
{"epoch": 0.5185185185185185, "step": 344, "batch_size": 64, "mean": 1.6081275939941406, "std": 2.285642147064209, "min": -2.838848114013672, "p10": -0.6147903442382813, "median": 1.2700700759887695, "p90": 3.8978849411010748, "max": 12.39862060546875, "pos_frac": 0.78125, "sample": [-0.8169631958007812, 4.2330474853515625, 0.26740264892578125, 1.2714767456054688, 2.9629478454589844, -0.36122894287109375, 2.171894073486328, 1.4985733032226562, -0.23894882202148438, -0.2886505126953125, -2.112213134765625, 0.9067535400390625, 3.025463104248047, 3.5459251403808594, -0.28482818603515625, 3.2755165100097656, 1.8033981323242188, 5.15789794921875, 1.0925216674804688, 2.5385894775390625, -1.6338214874267578, 1.9719085693359375, 0.1685791015625, 3.0663375854492188, 0.6782131195068359, 0.21887969970703125, -1.85748291015625, 0.4371299743652344, 2.18560791015625, 3.0841064453125, -0.5846481323242188, 5.8365325927734375, 1.1072845458984375, 0.18096160888671875, 2.7805442810058594, 3.9396209716796875, 1.355560302734375, 0.7487297058105469, 0.27349853515625, -1.8510704040527344, 0.9154167175292969, 2.019317626953125, -0.01239013671875, 0.8869209289550781, 0.1977386474609375, 1.1123847961425781, 3.7863540649414062, 3.0995025634765625, 3.8158950805664062, -0.6277084350585938, 3.933023452758789, 2.1166458129882812, 4.540022850036621, 2.359468460083008, 1.2686634063720703, -0.031154632568359375, 12.39862060546875, 2.333881378173828, 0.3898162841796875, 1.907745361328125, -2.838848114013672, 3.762969970703125, 0.7169208526611328, 3.1139068603515625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000344.npy"}
{"epoch": 0.5200302343159486, "step": 345, "batch_size": 64, "mean": 1.2283005714416504, "std": 1.9786046743392944, "min": -2.6889266967773438, "p10": -1.42834415435791, "median": 1.0578298568725586, "p90": 4.100054931640626, "max": 4.846199035644531, "pos_frac": 0.75, "sample": [1.8328628540039062, 0.9341011047363281, 4.7327728271484375, -2.412801742553711, 0.11548614501953125, -0.50091552734375, 0.09653854370117188, -2.0712738037109375, -0.4774131774902344, 1.0551319122314453, 1.0899734497070312, 4.846199035644531, 2.336557388305664, 0.2796516418457031, -1.4358291625976562, 1.0932769775390625, -1.7399444580078125, 0.2064971923828125, 0.8376350402832031, 4.1826629638671875, 4.7646636962890625, 1.1411075592041016, 3.9073028564453125, 1.5097999572753906, 1.180044174194336, 0.553436279296875, 1.0205841064453125, 1.0605278015136719, -0.7882118225097656, -1.6667022705078125, 1.8593978881835938, 4.230415344238281, 2.1922645568847656, 2.2612953186035156, 1.3533935546875, -0.8341331481933594, 3.865650177001953, 4.4304962158203125, 3.8442230224609375, -2.6889266967773438, 3.2653541564941406, -0.47733306884765625, -2.332469940185547, 3.4362945556640625, 2.2404327392578125, -1.0107612609863281, 0.33852386474609375, 1.01416015625, 3.2627830505371094, 1.7939376831054688, 4.448463439941406, 1.5755729675292969, -1.3556556701660156, 3.1598968505859375, 1.0451583862304688, 2.7621231079101562, 0.297393798828125, -1.410879135131836, 0.6720867156982422, 0.7940750122070312, 2.201763153076172, -0.02850341796875, 1.0068588256835938, 3.7141647338867188], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000345.npy"}
{"epoch": 0.5215419501133787, "step": 346, "batch_size": 64, "mean": 1.499464750289917, "std": 2.271347761154175, "min": -4.690269470214844, "p10": -1.0435643196105957, "median": 1.4151058197021484, "p90": 4.272537422180176, "max": 6.9499664306640625, "pos_frac": 0.6875, "sample": [3.6416587829589844, 3.784452438354492, 4.357940673828125, 2.2777481079101562, 4.0778961181640625, 1.0091552734375, 2.000518798828125, 1.2898283004760742, 4.27937126159668, 3.838146209716797, 6.9499664306640625, -0.021099090576171875, 1.406402587890625, 3.7699756622314453, -1.0892486572265625, 4.144100189208984, -1.0286407470703125, 2.116546630859375, -0.9862632751464844, 1.324462890625, 1.0832290649414062, 3.0909881591796875, 1.1797943115234375, -0.7504787445068359, 1.9506187438964844, 5.247802734375, -4.690269470214844, 1.5107498168945312, -0.12306976318359375, -1.9545669555664062, 1.47576904296875, -1.386383056640625, 1.9481010437011719, -1.0499601364135742, 0.3943328857421875, 2.2196197509765625, 2.2117271423339844, -0.5681838989257812, 3.737628936767578, 4.256591796875, 5.103515625, 4.579570770263672, -0.7329330444335938, 1.3448333740234375, 1.2571945190429688, 0.1895751953125, 3.9055023193359375, 2.9187698364257812, -0.4346809387207031, -0.6419830322265625, 3.4910354614257812, -0.5751953125, 0.29401588439941406, -0.001491546630859375, -2.560649871826172, 2.5085296630859375, 1.4238090515136719, 2.3137130737304688, -2.7364768981933594, 4.803230285644531, -0.6078109741210938, -0.9494009017944336, 3.233917236328125, 0.9121932983398438], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000346.npy"}
{"epoch": 0.5230536659108088, "step": 347, "batch_size": 64, "mean": 1.200080394744873, "std": 2.4158105850219727, "min": -5.008430480957031, "p10": -1.6710494995117187, "median": 0.9161815643310547, "p90": 3.663223266601563, "max": 7.9685516357421875, "pos_frac": 0.703125, "sample": [-0.022111892700195312, 3.1305809020996094, 0.45111846923828125, 0.299591064453125, -5.008430480957031, 0.6397476196289062, 2.047332763671875, 1.9711761474609375, 2.586149215698242, 2.4900131225585938, -0.250091552734375, 2.4800033569335938, 0.2721710205078125, -1.5714263916015625, -2.643218994140625, 7.9685516357421875, 2.0975341796875, 1.2918643951416016, -2.0264129638671875, 1.1582412719726562, 0.3181915283203125, 1.3546829223632812, 7.128669738769531, 0.581756591796875, -1.1300163269042969, 0.26171112060546875, 1.388580322265625, 6.134071350097656, 3.5970001220703125, -0.6828460693359375, -0.23917388916015625, -0.9157676696777344, -2.646869659423828, 2.3589820861816406, -1.188828468322754, 0.13011550903320312, -1.9545974731445312, 2.0097084045410156, 7.18426513671875, 4.379070281982422, 2.590108871459961, 0.2649688720703125, -0.4007148742675781, 2.4649658203125, 2.803213119506836, 3.24725341796875, -0.6176185607910156, -2.6164093017578125, 3.0807418823242188, 1.8917083740234375, 3.6916046142578125, 1.1268806457519531, 3.4173583984375, -1.7137451171875, -0.7978858947753906, 4.037574768066406, 0.7054824829101562, 1.4109153747558594, 0.4660797119140625, -0.13136768341064453, 0.7048873901367188, 0.30991363525390625, 1.9569015502929688, 3.4812355041503906], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000347.npy"}
{"epoch": 0.5245653817082389, "step": 348, "batch_size": 64, "mean": 1.5311079025268555, "std": 2.483609199523926, "min": -2.9140777587890625, "p10": -0.8809837341308593, "median": 0.7280254364013672, "p90": 5.168224334716797, "max": 8.965522766113281, "pos_frac": 0.6875, "sample": [4.0322113037109375, 2.7686195373535156, 5.1772308349609375, 0.06596755981445312, 1.7218551635742188, 2.1374664306640625, 0.446868896484375, -0.02114105224609375, -0.6626358032226562, -0.3962974548339844, 8.965522766113281, -1.080230712890625, 2.0586090087890625, 2.3688735961914062, 5.7298431396484375, 4.81715202331543, 0.1595458984375, 7.9220123291015625, -0.5393524169921875, 4.803806304931641, 0.7130165100097656, 0.46735382080078125, 3.0140380859375, 0.9132843017578125, 1.7345046997070312, 4.1181182861328125, 0.45407867431640625, 3.2492008209228516, 0.5961265563964844, 0.3127555847167969, -1.03521728515625, 0.01082611083984375, 0.40185546875, 2.6309967041015625, -2.9140777587890625, 0.9904289245605469, 2.406381607055664, 0.10848236083984375, -0.18636703491210938, 1.9835166931152344, 3.6755447387695312, 7.094760894775391, -0.240936279296875, -0.8008003234863281, 5.533206939697266, -0.9153480529785156, 0.27843666076660156, -0.5410041809082031, -0.022491455078125, 2.2087364196777344, -0.6007308959960938, 5.147209167480469, -1.4656314849853516, 1.8719348907470703, 5.859027862548828, -0.5429229736328125, 2.2433242797851562, 0.7430343627929688, -0.6935043334960938, 1.1652488708496094, -2.223907470703125, 0.9732666015625, -0.09226226806640625, -1.1085166931152344], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000348.npy"}
|
||||
{"epoch": 0.5260770975056689, "step": 349, "batch_size": 64, "mean": 1.3855433464050293, "std": 2.2437822818756104, "min": -4.483768463134766, "p10": -1.289053344726562, "median": 1.2173919677734375, "p90": 3.899976348876953, "max": 10.2357177734375, "pos_frac": 0.75, "sample": [1.21881103515625, 1.03814697265625, -0.6895828247070312, -0.73681640625, 0.3965625762939453, 2.0721054077148438, 1.1914215087890625, 1.918060302734375, 0.19063186645507812, 2.9626903533935547, -0.40659141540527344, 1.2525482177734375, 3.3484344482421875, 0.2919788360595703, 1.2954254150390625, 4.0420732498168945, 2.4903106689453125, -1.5814666748046875, -4.483768463134766, 3.138334274291992, -0.3877105712890625, 2.0544509887695312, 1.46856689453125, 2.2411270141601562, 1.8194046020507812, 1.5239944458007812, -0.02142333984375, 3.83551025390625, -0.45023345947265625, 10.2357177734375, 1.7805099487304688, 1.1481361389160156, -2.1737060546875, 1.215972900390625, -0.02112579345703125, 1.9156723022460938, 1.4938344955444336, 5.861083984375, -0.16851806640625, 1.908090591430664, 3.7619171142578125, 0.6652755737304688, 0.5868873596191406, 1.0976104736328125, 0.6296119689941406, 3.9276046752929688, 2.78851318359375, 0.4787788391113281, -1.7161026000976562, -0.15049076080322266, 1.1736221313476562, 0.218109130859375, -1.7797698974609375, 1.3081588745117188, 1.3704986572265625, 4.868095397949219, 0.32842254638671875, -1.525726318359375, -1.5486259460449219, 3.3539657592773438, 0.21242332458496094, 5.1616058349609375, 3.694854736328125, 5.540870666503906], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000349.npy"}
|
||||
{"epoch": 0.527588813303099, "step": 350, "batch_size": 64, "mean": 1.7634345293045044, "std": 2.1817235946655273, "min": -1.5846138000488281, "p10": -0.8522571563720702, "median": 1.3907852172851562, "p90": 5.281642150878906, "max": 7.030555725097656, "pos_frac": 0.78125, "sample": [5.26422119140625, 0.681488037109375, 1.5838394165039062, -0.8914718627929688, 1.7497711181640625, 1.5818519592285156, 6.893424987792969, 4.3952178955078125, 1.0506401062011719, 2.8521194458007812, 1.8564529418945312, 0.2505035400390625, -1.223114013671875, 2.0167884826660156, 1.2531356811523438, 1.1389045715332031, 1.4606781005859375, -0.40008544921875, 0.488739013671875, -0.8728828430175781, 0.8047332763671875, 2.2552833557128906, -1.0711860656738281, 1.320892333984375, 1.007730484008789, -0.8041305541992188, -1.2477455139160156, 3.90185546875, 0.7326602935791016, -0.9456100463867188, 2.0636367797851562, 0.18188095092773438, 2.96099853515625, -0.6749649047851562, 2.1177291870117188, 5.309883117675781, 2.492889404296875, 4.5475616455078125, 5.6671295166015625, 7.030555725097656, -1.5846138000488281, 5.2891082763671875, 4.048152923583984, 0.04400634765625, 0.027240753173828125, 0.7097511291503906, 3.648387908935547, 0.0113983154296875, 0.18482589721679688, 4.973335266113281, 1.2985992431640625, 3.068225860595703, -0.3909931182861328, 0.5768966674804688, 1.5323410034179688, -0.0650177001953125, 1.716949462890625, -0.43816375732421875, -0.4525413513183594, 2.6832122802734375, 5.640789031982422, 2.6578445434570312, 3.0788021087646484, 5.819267272949219], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000350.npy"}
|
||||
{"epoch": 0.5291005291005291, "step": 351, "batch_size": 64, "mean": 1.3316857814788818, "std": 2.1784541606903076, "min": -6.869989395141602, "p10": -1.0454341888427734, "median": 1.5027885437011719, "p90": 3.9069696426391602, "max": 5.570037841796875, "pos_frac": 0.71875, "sample": [1.8528766632080078, 3.4032211303710938, 3.9241065979003906, 1.2579193115234375, 0.42386627197265625, 3.169706344604492, -1.1306724548339844, 1.845306396484375, 2.8907546997070312, 4.580841064453125, 1.90826416015625, 4.702079772949219, -0.003101348876953125, -0.81561279296875, 1.127288818359375, -1.1217193603515625, -0.8901290893554688, -0.6673583984375, 4.29248046875, 1.2147483825683594, 3.0769424438476562, -1.59918212890625, 2.8885650634765625, 2.834197998046875, 2.25018310546875, 3.260009765625, 2.716259002685547, 2.957855224609375, -1.6275672912597656, 1.9919204711914062, 2.0501670837402344, -1.0781288146972656, 4.6919097900390625, 1.2240524291992188, 0.9574737548828125, -0.7872962951660156, -0.969146728515625, 5.570037841796875, 0.7474021911621094, 1.4070663452148438, 1.508575439453125, 3.2675323486328125, 1.4970016479492188, -0.6315841674804688, 2.7010955810546875, 2.01519775390625, 2.4891281127929688, -0.6743927001953125, 0.5730209350585938, 1.1311874389648438, -0.20601654052734375, 0.5044212341308594, 1.7405357360839844, 2.5562610626220703, 2.9080581665039062, -0.254180908203125, 0.779083251953125, 4.2455291748046875, -6.869989395141602, 0.8222026824951172, -4.632881164550781, 1.6019668579101562, 3.866983413696289, -0.238433837890625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000351.npy"}
|
||||
{"epoch": 0.5306122448979592, "step": 352, "batch_size": 64, "mean": 0.9371156692504883, "std": 1.9957709312438965, "min": -6.894590377807617, "p10": -1.0295917510986325, "median": 0.9330425262451172, "p90": 3.4775630950927736, "max": 5.603309631347656, "pos_frac": 0.71875, "sample": [3.49072265625, 4.782402038574219, 5.603309631347656, 0.8121566772460938, 1.9882125854492188, -2.5999298095703125, 0.003021240234375, 1.5392684936523438, -0.7129669189453125, 1.7487926483154297, -0.42992401123046875, 1.1825408935546875, -0.676239013671875, -6.894590377807617, -0.571319580078125, 1.1135101318359375, 2.156341552734375, -2.4441986083984375, -0.8551712036132812, 1.3736801147460938, 1.5801315307617188, 0.43308258056640625, -1.1541900634765625, 3.0411300659179688, 3.546698570251465, 1.29827880859375, 0.6199493408203125, 1.3288841247558594, 0.3729515075683594, 0.02869415283203125, 0.1853313446044922, 1.8066558837890625, 0.916107177734375, 0.9499778747558594, -1.1043434143066406, 1.7145156860351562, 0.45067596435546875, 2.78948974609375, 3.6461257934570312, 2.6862640380859375, 2.2240562438964844, -0.32997894287109375, 5.014434814453125, -0.730224609375, -0.15943527221679688, 3.446857452392578, -0.0971221923828125, 0.5653839111328125, 0.14536380767822266, 1.7489662170410156, -0.5819168090820312, 1.03668212890625, 3.206268310546875, 2.65093994140625, 1.9428863525390625, 0.9939422607421875, 0.46561431884765625, 3.8770751953125, -0.7803916931152344, -1.910400390625, 0.42505836486816406, 2.1711063385009766, 0.13346099853515625, -1.2292556762695312], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000352.npy"}
|
||||
{"epoch": 0.5321239606953893, "step": 353, "batch_size": 64, "mean": 1.2774256467819214, "std": 2.1970431804656982, "min": -7.0406341552734375, "p10": -1.1797903060913084, "median": 1.3012523651123047, "p90": 4.12871551513672, "max": 6.125335693359375, "pos_frac": 0.703125, "sample": [3.1748390197753906, 4.4788055419921875, -1.1097793579101562, 1.4688949584960938, -3.069751739501953, 0.8955936431884766, 0.7144851684570312, -1.8244552612304688, 2.5745697021484375, 2.7373886108398438, -0.6724510192871094, 1.838958740234375, 1.527099609375, -1.2359619140625, 4.692859649658203, 1.3416557312011719, 1.0352020263671875, 4.2670135498046875, 4.6780242919921875, 3.0226669311523438, 6.125335693359375, 4.6160125732421875, 0.4990119934082031, 1.9525070190429688, -0.0438232421875, -0.369293212890625, 3.828624725341797, 2.6458587646484375, 4.1853179931640625, -0.4031085968017578, 0.8220291137695312, 0.7931404113769531, 1.2049446105957031, 1.3425483703613281, -7.0406341552734375, 3.9787864685058594, -1.2097949981689453, -0.3568878173828125, 3.5491676330566406, 3.0235557556152344, -0.3188018798828125, 2.2959442138671875, 3.315704345703125, 2.2024593353271484, -0.074432373046875, 1.69036865234375, 1.6898040771484375, 0.20070648193359375, 0.60650634765625, -0.1063232421875, 2.0289382934570312, -0.054851531982421875, 0.07637786865234375, 0.32927703857421875, 0.5327072143554688, 1.3261489868164062, 1.2763557434082031, -0.44089508056640625, -0.240753173828125, -1.4729728698730469, 2.53570556640625, -2.291656494140625, 3.99664306640625, 2.9733238220214844], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000353.npy"}
|
||||
{"epoch": 0.5336356764928194, "step": 354, "batch_size": 64, "mean": 1.401000738143921, "std": 2.1185660362243652, "min": -2.903350830078125, "p10": -1.490605354309082, "median": 1.1485557556152344, "p90": 3.999123764038086, "max": 6.507759094238281, "pos_frac": 0.75, "sample": [6.507759094238281, 3.7423152923583984, -0.8859100341796875, 1.1703147888183594, -1.5989227294921875, -0.023088455200195312, 3.3008956909179688, 0.8086128234863281, 2.035552978515625, -0.6560745239257812, 1.04229736328125, 3.564512252807617, 2.0561294555664062, 1.0237579345703125, 1.4995613098144531, 0.5868263244628906, 1.086578369140625, -1.303802490234375, 3.7006683349609375, 4.67913818359375, 4.824527740478516, 1.4263954162597656, -1.80963134765625, 1.5925750732421875, -2.903350830078125, -1.4134540557861328, -2.3378334045410156, 5.203084945678711, 2.553478240966797, 1.4637527465820312, 1.7183151245117188, 4.437530517578125, 1.7181129455566406, 1.737274169921875, 3.62567138671875, 2.3718013763427734, -0.5782890319824219, -2.338644027709961, 3.0182228088378906, -0.46556854248046875, 2.8179779052734375, 3.7874412536621094, 0.1564178466796875, -2.6254119873046875, 3.98297119140625, -1.5236701965332031, 1.6729583740234375, 5.559455871582031, 3.066652297973633, -0.62542724609375, 1.0583724975585938, 1.082733154296875, 4.006046295166016, 2.44537353515625, 0.7951431274414062, 0.11214065551757812, 0.20359039306640625, 1.1267967224121094, 0.63397216796875, 1.11334228515625, 0.9933700561523438, 3.1647911071777344, -0.18524932861328125, 0.6631622314453125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000354.npy"}
|
||||
{"epoch": 0.5351473922902494, "step": 355, "batch_size": 64, "mean": 1.4075103998184204, "std": 2.2379672527313232, "min": -2.6589431762695312, "p10": -0.981415557861328, "median": 1.0875663757324219, "p90": 4.63235206604004, "max": 6.668964385986328, "pos_frac": 0.703125, "sample": [0.5434722900390625, 4.028266906738281, -0.105224609375, 2.4884567260742188, 4.34716796875, 4.331676483154297, 1.602783203125, -1.046356201171875, -0.5224227905273438, 1.0804061889648438, 1.6681137084960938, -1.5350112915039062, 1.5543556213378906, 0.480865478515625, 1.6355857849121094, 1.0947265625, 0.12255096435546875, 1.9855499267578125, -2.6589431762695312, 0.6579360961914062, -1.5075531005859375, 2.01171875, 3.2643966674804688, 2.208160400390625, 2.617725372314453, -0.3206329345703125, -0.11061859130859375, 0.8666629791259766, 3.582855224609375, 5.3900299072265625, 1.3329238891601562, 1.8663578033447266, -0.21847152709960938, 6.658050537109375, 0.083038330078125, -0.8298873901367188, 0.71405029296875, -0.44085693359375, -0.13866424560546875, 2.185199737548828, -0.4513397216796875, -2.5958404541015625, 6.537651062011719, 1.1510772705078125, 1.1007270812988281, 2.904937744140625, -0.205230712890625, -0.3656463623046875, 5.771141052246094, -2.266000747680664, 4.754573822021484, 3.23779296875, 1.9881515502929688, 0.432281494140625, 6.668964385986328, -1.8416290283203125, 0.8667373657226562, 0.020599365234375, 5.3599090576171875, -0.3465156555175781, 2.8005104064941406, 3.063793182373047, 0.4775848388671875, 0.04799652099609375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000355.npy"}
|
||||
{"epoch": 0.5366591080876795, "step": 356, "batch_size": 64, "mean": 1.0617367029190063, "std": 2.5172781944274902, "min": -4.845184326171875, "p10": -2.1056808471679687, "median": 1.133575439453125, "p90": 3.8396804809570315, "max": 6.6679229736328125, "pos_frac": 0.671875, "sample": [1.9978523254394531, 3.5661392211914062, -1.6388778686523438, -2.8192214965820312, 3.2156124114990234, 3.7707977294921875, 0.9649505615234375, 3.317657470703125, 2.201753616333008, -0.39432334899902344, -4.636131286621094, -4.845184326171875, 1.5345611572265625, -1.3554153442382812, 2.9602127075195312, 6.140098571777344, 2.3229827880859375, 0.8867340087890625, 3.2765731811523438, -0.11178016662597656, 1.6782989501953125, -4.167299270629883, 3.2732162475585938, 3.029937744140625, 2.4067153930664062, 3.00421142578125, 0.12340927124023438, -1.6459808349609375, 0.050617218017578125, 2.2316150665283203, -1.0307235717773438, 0.4650402069091797, 0.03330230712890625, 0.13765335083007812, -1.8146514892578125, 1.1903610229492188, 1.0767898559570312, -0.248626708984375, 2.4024429321289062, 3.1025772094726562, 0.4041595458984375, 1.8734703063964844, 2.3890380859375, -2.812896728515625, -1.0693511962890625, -0.5827789306640625, 1.7924690246582031, 0.6764984130859375, -0.19552993774414062, 2.4263458251953125, -0.7792205810546875, 0.8892822265625, 6.294885635375977, 5.0885772705078125, -1.3853378295898438, 3.86920166015625, 1.246988296508789, -2.23040771484375, 4.1600189208984375, -0.808197021484375, -2.4577903747558594, 6.6679229736328125, 2.918333053588867, 3.92156982421875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000356.npy"}
|
||||
{"epoch": 0.5381708238851096, "step": 357, "batch_size": 64, "mean": 1.3021240234375, "std": 2.146275758743286, "min": -3.218921661376953, "p10": -1.3972137451171873, "median": 1.1335372924804688, "p90": 4.289042663574221, "max": 7.8864593505859375, "pos_frac": 0.71875, "sample": [2.5909194946289062, -0.9620113372802734, 4.538227081298828, 1.6003341674804688, -1.7987480163574219, 7.8864593505859375, 1.2445831298828125, -0.5642242431640625, 2.8953628540039062, -0.1775970458984375, -1.5085601806640625, 4.4975128173828125, -1.1374053955078125, 1.4276275634765625, 5.564750671386719, 0.774871826171875, 3.1244029998779297, 1.4682693481445312, 2.698711395263672, 3.1181983947753906, 1.9300384521484375, 0.597442626953125, 2.503143310546875, 0.08294486999511719, 1.8736495971679688, 1.6057929992675781, -0.6916122436523438, -0.22867965698242188, -2.672454833984375, 1.5874748229980469, 0.04831695556640625, 1.7549705505371094, -1.5580902099609375, 2.0636215209960938, 0.8377532958984375, 3.8026123046875, 2.734954833984375, 0.826690673828125, 1.96990966796875, 4.643669128417969, -0.26031494140625, 4.558952331542969, -0.7426033020019531, 2.2104034423828125, 1.022491455078125, 3.5014801025390625, -0.13848876953125, -0.07857513427734375, -2.0409469604492188, 0.36786651611328125, 0.3411083221435547, 3.3906097412109375, 5.315746307373047, 0.8458480834960938, 3.415496826171875, -0.2617645263671875, 0.32973480224609375, 2.569469451904297, 2.170602798461914, -2.4914093017578125, 0.073699951171875, -3.218921661376953, 0.6418991088867188, 0.8197154998779297], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000357.npy"}
|
||||
{"epoch": 0.5396825396825397, "step": 358, "batch_size": 64, "mean": 1.5356526374816895, "std": 1.984094262123108, "min": -1.9952354431152344, "p10": -0.6918802261352538, "median": 1.2965335845947266, "p90": 4.28789176940918, "max": 7.4432830810546875, "pos_frac": 0.75, "sample": [1.2852058410644531, 2.979705810546875, 2.63006591796875, 4.255012512207031, -0.0045623779296875, 3.4844970703125, 1.8087081909179688, -1.5221023559570312, -0.31675148010253906, 0.23467636108398438, -0.5832443237304688, 2.282841682434082, 1.196014404296875, 2.4480209350585938, 2.4892578125, 1.307861328125, -0.20171356201171875, 0.566497802734375, 1.213165283203125, 2.9258251190185547, 1.2168350219726562, 0.8575725555419922, 0.07078170776367188, 4.301982879638672, 3.0741729736328125, 0.6056709289550781, 1.3281021118164062, -1.6018524169921875, -1.9642791748046875, 0.7783012390136719, -0.4393157958984375, 2.2864303588867188, -0.17950439453125, 1.3923225402832031, -0.44654083251953125, 0.65216064453125, 4.771263122558594, 4.7855224609375, 0.6359634399414062, -0.16235733032226562, 5.299186706542969, 2.443359375, 3.6834716796875, 1.3460578918457031, 7.4432830810546875, 5.544157028198242, 2.0462608337402344, -0.730987548828125, 1.2056541442871094, 3.0122528076171875, -0.6006298065185547, -1.6341915130615234, -1.9952354431152344, 2.324798583984375, 0.1450042724609375, 0.9845161437988281, -0.85784912109375, 0.8406944274902344, 1.4122085571289062, 3.1633529663085938, 2.2555618286132812, 3.1432266235351562, 2.110870361328125, 5.2545318603515625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000358.npy"}
|
||||
{"epoch": 0.5411942554799698, "step": 359, "batch_size": 64, "mean": 0.7404047250747681, "std": 2.582148790359497, "min": -10.6478271484375, "p10": -1.826624298095703, "median": 0.7100296020507812, "p90": 3.709569549560548, "max": 8.261566162109375, "pos_frac": 0.671875, "sample": [0.229705810546875, 4.78515625, -0.7079429626464844, 1.4928436279296875, 3.0395584106445312, -0.18804931640625, -1.8282012939453125, 0.9905929565429688, 0.2335205078125, -10.6478271484375, 4.312812805175781, 0.28008270263671875, 1.3568954467773438, 3.5212020874023438, -0.6146125793457031, 0.53558349609375, 2.828977584838867, -0.7502956390380859, 2.658611297607422, -3.4475536346435547, -0.6495857238769531, 1.2465629577636719, 0.6355133056640625, -1.9267501831054688, 2.3615188598632812, 0.702606201171875, 0.9548759460449219, 0.9353733062744141, 1.76934814453125, -2.8099327087402344, 4.525787353515625, 8.261566162109375, -0.82879638671875, 2.1138153076171875, 0.43305206298828125, -0.8718643188476562, 2.9821434020996094, 1.5256690979003906, 0.018100738525390625, 1.69012451171875, -1.1344375610351562, 1.6582679748535156, 4.431640625, 1.3375301361083984, 0.7174530029296875, -2.60089111328125, -1.0234527587890625, -1.678985595703125, 0.2668113708496094, 1.6517181396484375, 0.3284721374511719, -2.0082035064697266, -0.9749069213867188, 0.22849655151367188, 1.9360275268554688, -1.5042076110839844, 2.7976322174072266, 1.9716224670410156, 1.860595703125, 3.7902984619140625, 2.6103057861328125, 4.705650329589844, -1.8229446411132812, -1.30877685546875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000359.npy"}
|
||||
{"epoch": 0.5427059712773998, "step": 360, "batch_size": 64, "mean": 1.2405047416687012, "std": 2.066798448562622, "min": -1.9058761596679688, "p10": -1.0163040161132812, "median": 0.7579231262207031, "p90": 4.064439392089844, "max": 7.182952880859375, "pos_frac": 0.65625, "sample": [0.5370025634765625, 7.182952880859375, 0.32470703125, -1.8032455444335938, 1.707672119140625, 2.3092422485351562, -0.4996185302734375, 2.4250526428222656, -0.41500091552734375, 4.161956787109375, -0.6547918319702148, -0.7339019775390625, -1.9058761596679688, -0.4324989318847656, 3.053497314453125, 2.0054779052734375, 3.748504638671875, -1.5611763000488281, 0.9344940185546875, 5.2184600830078125, -0.3635406494140625, -0.5998916625976562, 0.45684814453125, 0.12285614013671875, -1.3946304321289062, 3.2711257934570312, 1.2170562744140625, -0.191436767578125, 2.546417236328125, -1.28924560546875, 1.8061141967773438, 0.16073036193847656, 1.3911018371582031, 1.2181549072265625, 0.3667411804199219, 2.370452880859375, -0.6771316528320312, 2.3126354217529297, 0.42364501953125, -0.2928466796875, 6.243293762207031, -0.40184783935546875, 1.8666343688964844, -1.5066795349121094, 3.8368988037109375, 1.8470993041992188, 5.540767669677734, 2.1613845825195312, -0.18333816528320312, 3.1973114013671875, 4.903099060058594, 0.87005615234375, 4.8709869384765625, 0.3175048828125, -0.12830352783203125, 2.2819557189941406, -0.35602378845214844, -1.053131103515625, 1.214822769165039, 2.1857833862304688, -0.9303741455078125, 0.523284912109375, 0.6457901000976562, 2.9872589111328125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000360.npy"}
|
||||
{"epoch": 0.54421768707483, "step": 361, "batch_size": 64, "mean": 1.4734525680541992, "std": 2.1912295818328857, "min": -3.327423095703125, "p10": -0.6151321411132812, "median": 1.2405920028686523, "p90": 3.9888168334960943, "max": 9.22332763671875, "pos_frac": 0.765625, "sample": [-0.560882568359375, 0.8396224975585938, -0.49810791015625, -0.8532505035400391, 3.60723876953125, 0.0868377685546875, -1.3568916320800781, -1.9186935424804688, 0.4045867919921875, 1.9957160949707031, 0.67694091796875, 0.25782012939453125, -0.10331344604492188, 2.2537307739257812, 3.8751983642578125, 1.2367916107177734, 0.04193878173828125, 0.196075439453125, -0.6383819580078125, -0.4088897705078125, 5.440093994140625, 4.0368499755859375, 0.003875732421875, 4.478363037109375, 1.7060356140136719, 1.2262077331542969, 2.4406051635742188, -3.327423095703125, 3.876739501953125, 1.2595443725585938, 1.9473953247070312, 1.4212493896484375, 0.7003669738769531, 6.645408630371094, 3.0269317626953125, 6.2655029296875, 2.7651519775390625, 1.9645366668701172, -0.9133148193359375, -0.15013885498046875, 0.47264862060546875, -0.1924285888671875, 4.243289947509766, 2.4059295654296875, 3.094097137451172, 3.7650413513183594, 1.3063240051269531, -0.07131195068359375, 0.494720458984375, 2.5245513916015625, 2.3290252685546875, 1.2443923950195312, 1.2583580017089844, -0.082916259765625, 9.22332763671875, 1.7211227416992188, 3.6325950622558594, 0.03305816650390625, -2.8523712158203125, 1.8539619445800781, 0.9484100341796875, 0.8334665298461914, 2.106220245361328, 0.0613861083984375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000361.npy"}
|
||||
{"epoch": 0.54572940287226, "step": 362, "batch_size": 64, "mean": 1.3376474380493164, "std": 2.44155216217041, "min": -3.3073196411132812, "p10": -1.0498647689819336, "median": 0.9022178649902344, "p90": 4.267675876617432, "max": 10.66259765625, "pos_frac": 0.6875, "sample": [-0.8621749877929688, 2.435649871826172, 1.33636474609375, -0.1045684814453125, 0.4392814636230469, -3.3073196411132812, 1.93670654296875, 2.288665771484375, 10.66259765625, -0.62335205078125, -0.43505859375, -0.024932861328125, 0.9111480712890625, 0.16780853271484375, -0.2620391845703125, 1.9058418273925781, 0.7296829223632812, 1.5664291381835938, -0.3180122375488281, 0.2656097412109375, -0.3391380310058594, 0.2662200927734375, 0.774993896484375, 0.197113037109375, 2.9428558349609375, 2.8782386779785156, 4.853343963623047, -3.1357421875, -1.0604305267333984, 0.3973388671875, 7.2666015625, 5.215320587158203, -1.1472854614257812, 3.1867828369140625, -1.44964599609375, 2.913116455078125, 0.9884872436523438, -0.9429168701171875, -1.1478614807128906, -0.7508354187011719, 0.9213600158691406, 3.487548828125, 3.2622756958007812, 4.34210205078125, 1.288543701171875, 0.786834716796875, 4.289595603942871, 3.1044921875, 3.521585464477539, 0.56024169921875, -1.0252113342285156, 3.852325439453125, 1.3278732299804688, -2.3797149658203125, 1.6036529541015625, 1.0972824096679688, 4.216529846191406, -0.79351806640625, -0.86505126953125, 1.6959991455078125, 6.280120849609375, 0.8932876586914062, 3.50628662109375, 0.020111083984375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000362.npy"}
|
||||
{"epoch": 0.54724111866969, "step": 363, "batch_size": 64, "mean": 1.2788193225860596, "std": 1.850953221321106, "min": -2.6285629272460938, "p10": -0.6148422241210937, "median": 0.9463710784912109, "p90": 4.208780670166016, "max": 5.994483947753906, "pos_frac": 0.765625, "sample": [-0.13726806640625, -1.1546287536621094, 3.8696556091308594, 0.6930618286132812, -1.3115997314453125, 0.40219879150390625, 0.4522743225097656, 4.816925048828125, 1.897613525390625, 0.6548328399658203, -0.44319915771484375, 4.18817138671875, 4.2857666015625, 2.7259063720703125, 1.2077140808105469, 5.994483947753906, 0.10634803771972656, 0.6051826477050781, 0.425628662109375, 1.00885009765625, 1.4572601318359375, 2.2354869842529297, 1.7197723388671875, 2.20245361328125, 2.771392822265625, 2.1547622680664062, 1.309112548828125, 0.8198699951171875, -2.6285629272460938, 0.35912132263183594, -1.2720985412597656, -0.5412540435791016, 2.5748291015625, -0.46707916259765625, 4.95037841796875, 1.0069351196289062, 1.1510200500488281, 0.87322998046875, 1.8402976989746094, 0.0064945220947265625, 1.2202606201171875, 0.46659088134765625, 2.745758056640625, 4.734245300292969, -0.6456222534179688, 2.6002731323242188, 4.217613220214844, 1.3523101806640625, 1.7666816711425781, -0.5430221557617188, 5.981353759765625, 0.2705841064453125, -1.707468032836914, 0.8858070373535156, -0.9996528625488281, -0.4410247802734375, -0.42132568359375, -0.0433349609375, 2.3545074462890625, 2.805103302001953, 0.22858810424804688, 0.0198822021484375, 0.758056640625, 1.4269285202026367], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000363.npy"}
|
||||
{"epoch": 0.5487528344671202, "step": 364, "batch_size": 64, "mean": 1.7719082832336426, "std": 2.5554866790771484, "min": -2.718423843383789, "p10": -1.1099025726318357, "median": 1.3603324890136719, "p90": 4.781442642211915, "max": 8.753410339355469, "pos_frac": 0.78125, "sample": [4.155067443847656, 0.4097900390625, 0.5017356872558594, 8.753410339355469, 4.430816650390625, -1.3492431640625, 0.5485916137695312, 1.840911865234375, -0.5652046203613281, 3.047332763671875, 5.72491455078125, 1.3136100769042969, -1.2553901672363281, -0.1609954833984375, 7.2347564697265625, 6.628820419311523, 1.4526519775390625, 1.7336997985839844, 7.3383636474609375, 0.6252288818359375, 3.1602630615234375, 0.8011856079101562, 0.9876556396484375, 3.296112060546875, -2.715850830078125, -2.718423843383789, 4.6349639892578125, 0.07678794860839844, 0.5827102661132812, 2.3687820434570312, 2.385974884033203, 3.6771469116210938, 0.6399269104003906, 2.1661529541015625, 0.30735015869140625, 0.2201080322265625, 3.566905975341797, 3.59393310546875, 0.593505859375, -2.342182159423828, 0.17630386352539062, 2.5851287841796875, -0.7704315185546875, -0.4869842529296875, 1.97174072265625, 1.1624374389648438, 0.7738876342773438, 3.159832000732422, 1.4070549011230469, 8.177734375, -0.38607025146484375, 3.7928543090820312, -2.1107654571533203, 4.0731964111328125, -1.9421234130859375, -0.434906005859375, 2.3360214233398438, 2.4296188354492188, 0.3397674560546875, 1.7373199462890625, 0.6356048583984375, -0.4374847412109375, 4.844219207763672, 2.6762962341308594], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000364.npy"}
|
||||
{"epoch": 0.5502645502645502, "step": 365, "batch_size": 64, "mean": 1.4040985107421875, "std": 1.593898892402649, "min": -1.749176025390625, "p10": -0.6196479797363279, "median": 1.4575424194335938, "p90": 3.334696960449219, "max": 6.059436798095703, "pos_frac": 0.765625, "sample": [0.2717151641845703, 1.4842567443847656, 1.61456298828125, 0.9703922271728516, 1.028329849243164, 0.8233146667480469, 2.8235015869140625, 3.0506763458251953, 2.3854217529296875, 1.14990234375, 1.4683074951171875, 1.5701522827148438, -1.749176025390625, 2.409055709838867, 2.0048141479492188, 2.6408348083496094, -0.3743877410888672, 2.9787979125976562, -0.00341796875, 0.4371070861816406, 0.635223388671875, 0.997314453125, 2.9289169311523438, -0.70343017578125, -0.07728767395019531, -1.3047447204589844, 2.0823593139648438, 3.3528289794921875, 1.2030868530273438, 2.1036224365234375, 0.3304595947265625, 0.1596050262451172, 2.1202316284179688, 0.2993888854980469, 2.5902862548828125, 3.8805084228515625, 0.5826759338378906, -0.056915283203125, -1.322174072265625, 0.6979598999023438, 2.571502685546875, 3.292388916015625, 3.4181671142578125, 1.79620361328125, 2.0115509033203125, 4.6309051513671875, 1.82012939453125, -0.28304290771484375, 2.42486572265625, 1.2959671020507812, 3.905670166015625, -0.42415618896484375, 1.3818130493164062, -1.23187255859375, 2.940135955810547, 1.6650009155273438, -0.774017333984375, -0.0702056884765625, 1.44677734375, -0.03031158447265625, 6.059436798095703, 1.4755401611328125, -1.2282752990722656, 4.2840576171875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000365.npy"}
|
||||
{"epoch": 0.5517762660619804, "step": 366, "batch_size": 64, "mean": 1.2896113395690918, "std": 2.186528205871582, "min": -3.3632240295410156, "p10": -1.409034729003906, "median": 1.1279449462890625, "p90": 4.08227767944336, "max": 7.0710601806640625, "pos_frac": 0.765625, "sample": [-0.33663368225097656, 4.585113525390625, 2.8690261840820312, 1.4635162353515625, 0.20589447021484375, 3.8118228912353516, -2.116180419921875, -1.5169181823730469, 0.507110595703125, -2.4665756225585938, -0.070770263671875, 0.4144287109375, 3.0303497314453125, 1.1243057250976562, 2.2116622924804688, 1.2434539794921875, -3.0742874145507812, 1.73016357421875, 1.9383621215820312, 4.546031951904297, 1.839874267578125, 1.7952384948730469, 0.00113677978515625, -0.8275279998779297, -0.8538932800292969, -0.058811187744140625, 0.9129829406738281, 4.067729949951172, 0.1463623046875, 0.7258024215698242, 7.0710601806640625, 0.07860946655273438, -0.9122047424316406, 4.6236724853515625, -3.3632240295410156, -0.6662063598632812, 1.0651168823242188, 0.03679656982421875, 1.0827560424804688, 0.6627922058105469, 0.5219221115112305, -1.6053447723388672, 0.6093025207519531, 1.7762298583984375, 0.1229248046875, 1.1315841674804688, 1.1878128051757812, 1.6897125244140625, 5.752033233642578, 1.586273193359375, 1.8778915405273438, 4.088512420654297, 3.7751426696777344, -2.9431190490722656, 2.5016326904296875, -1.1573066711425781, 5.263874053955078, 2.0380401611328125, 2.047119140625, 3.983226776123047, 4.063133239746094, 2.5935287475585938, 3.5920162200927734, 0.5110435485839844], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000366.npy"}
|
||||
{"epoch": 0.5532879818594104, "step": 367, "batch_size": 64, "mean": 0.9171976447105408, "std": 1.7636173963546753, "min": -4.369663238525391, "p10": -0.9002796173095702, "median": 1.1199836730957031, "p90": 2.765707397460938, "max": 5.122711181640625, "pos_frac": 0.671875, "sample": [1.60430908203125, 2.22833251953125, 1.493988037109375, 1.1912727355957031, -0.3324775695800781, -0.5304412841796875, 4.614063262939453, 4.857715606689453, -0.33512115478515625, 0.9474391937255859, -0.9359817504882812, 1.1371307373046875, 0.7690658569335938, 1.19219970703125, 5.122711181640625, 2.1590576171875, 0.7024993896484375, -2.042724609375, -0.10147857666015625, 2.0556697845458984, 1.923248291015625, 2.421142578125, -1.5147819519042969, -0.8169746398925781, 0.159576416015625, -2.217914581298828, -0.2604484558105469, 1.9217605590820312, 1.9251632690429688, 3.3453521728515625, 2.0108184814453125, 0.356170654296875, 1.2384090423583984, 0.1372833251953125, 1.9880485534667969, -0.12235260009765625, -0.37061309814453125, -0.2675018310546875, 1.15576171875, 1.1028366088867188, 0.0645599365234375, -4.369663238525391, 0.5691757202148438, 2.8083114624023438, -0.07739639282226562, -0.4220123291015625, 1.6065330505371094, 1.7087135314941406, 0.803436279296875, 0.09177398681640625, 1.7097549438476562, 1.485025405883789, -3.0633926391601562, 2.6662979125976562, 3.5410919189453125, 3.7660980224609375, -0.6368846893310547, -0.3043956756591797, 2.0100936889648438, 2.0506591796875, -0.6110916137695312, 1.8443279266357422, -1.1169967651367188, 2.6644153594970703], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000367.npy"}
|
||||
{"epoch": 0.5547996976568406, "step": 368, "batch_size": 64, "mean": 1.0979206562042236, "std": 1.815167784690857, "min": -2.92962646484375, "p10": -1.4001274108886719, "median": 1.267791748046875, "p90": 3.0451107025146484, "max": 4.902000427246094, "pos_frac": 0.703125, "sample": [-2.1800270080566406, 2.2370452880859375, -1.6849937438964844, 2.1676559448242188, 1.0155487060546875, -0.0409698486328125, 4.364192962646484, 3.046833038330078, -0.06111907958984375, 0.6290054321289062, 1.7411041259765625, -2.5724220275878906, 0.3094940185546875, 2.5665740966796875, 4.513614654541016, -1.5523757934570312, -0.47283935546875, -1.4355239868164062, -0.0071258544921875, 2.5925445556640625, 0.26611328125, 1.0331687927246094, 2.7966842651367188, -1.2934799194335938, 2.6787967681884766, 1.6564064025878906, -1.578460693359375, 2.5662002563476562, 3.8993377685546875, 3.7476234436035156, 0.9059677124023438, 0.81353759765625, 3.1547317504882812, 0.7577667236328125, 2.3904266357421875, 1.76544189453125, 1.2259445190429688, 1.1224365234375, 1.3096389770507812, 0.8847885131835938, 1.0398178100585938, -1.3093643188476562, 1.5673599243164062, 4.902000427246094, -1.1520118713378906, -1.2500457763671875, 3.0410919189453125, -0.7769393920898438, 1.7094268798828125, -1.317535400390625, 2.1279144287109375, 2.9162673950195312, 3.018585205078125, 2.6151885986328125, 1.598073959350586, 1.3672943115234375, 2.1651458740234375, -0.7452163696289062, -0.439483642578125, 0.49069976806640625, 2.3385772705078125, 2.0307884216308594, -2.92962646484375, 1.9796295166015625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000368.npy"}
{"epoch": 0.5563114134542706, "step": 369, "batch_size": 64, "mean": 1.0467851161956787, "std": 2.2085697650909424, "min": -4.854240417480469, "p10": -1.8454885482788086, "median": 0.9856338500976562, "p90": 4.1869464874267575, "max": 6.457759857177734, "pos_frac": 0.6875, "sample": [4.887931823730469, -2.8343048095703125, 0.6220970153808594, 2.10028076171875, -3.046243667602539, 3.693450927734375, 1.5659141540527344, 2.2663116455078125, 1.3454437255859375, 1.2731475830078125, 0.71441650390625, 4.690193176269531, 4.917041778564453, 1.0703048706054688, 4.180065155029297, 0.9722824096679688, 0.5218038558959961, 0.0937652587890625, 0.17353057861328125, 0.57196044921875, 2.307098388671875, -2.649808883666992, 1.5677566528320312, -1.8478031158447266, -0.13419342041015625, 2.5296249389648438, -0.5254554748535156, 2.8472423553466797, 1.238067626953125, -1.4702301025390625, 2.738037109375, -0.16162109375, 0.3291587829589844, 4.1898956298828125, 2.8844738006591797, 4.417598724365234, 6.457759857177734, 0.9989852905273438, -0.219146728515625, -1.840087890625, 2.372772216796875, 0.7604942321777344, 0.6105537414550781, 2.6561050415039062, 0.7086143493652344, 1.4587249755859375, -0.695037841796875, 2.1507186889648438, 2.4111328125, -0.158294677734375, -0.2198638916015625, -0.69561767578125, 1.3542518615722656, 2.8829574584960938, -1.0259780883789062, -1.9569377899169922, -0.509979248046875, 5.403165817260742, 0.566162109375, 1.21099853515625, -0.7573204040527344, -4.854240417480469, -2.3740901947021484, 2.2582130432128906], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000369.npy"}
{"epoch": 0.5578231292517006, "step": 370, "batch_size": 64, "mean": 1.5856380462646484, "std": 2.3953070640563965, "min": -3.2700729370117188, "p10": -1.5555389404296875, "median": 1.3577003479003906, "p90": 5.027283477783205, "max": 6.3847503662109375, "pos_frac": 0.75, "sample": [0.8302192687988281, 4.556770324707031, 2.035045623779297, 1.0472145080566406, 5.20721435546875, 2.5074024200439453, 6.1660919189453125, 0.52679443359375, 4.607444763183594, 2.0741004943847656, 1.407928466796875, -0.5738677978515625, -0.22098541259765625, 2.799285888671875, 6.240264892578125, 0.9505233764648438, 0.9808502197265625, -2.046142578125, -1.4942398071289062, 2.3814544677734375, -1.50091552734375, 2.9319305419921875, 2.9887847900390625, 3.124969482421875, 3.3418121337890625, 2.4351806640625, 2.3610572814941406, -1.8790817260742188, 1.5495109558105469, -0.14621734619140625, 0.14970016479492188, -1.111236572265625, -1.8525238037109375, -1.65521240234375, 0.170318603515625, 6.345062255859375, 6.3847503662109375, -0.4078826904296875, 6.266998291015625, 0.7101974487304688, 3.8001022338867188, -2.64044189453125, 3.6255722045898438, 1.02789306640625, 1.951772689819336, 5.857307434082031, 1.3074722290039062, 3.2565765380859375, 1.5297832489013672, 3.61175537109375, -3.2700729370117188, 1.6736621856689453, 4.462444305419922, -1.3228769302368164, 1.2582054138183594, 0.7433700561523438, 0.4759979248046875, 1.4636383056640625, -0.358154296875, 0.06912803649902344, 1.2273941040039062, 3.0433425903320312, -1.578948974609375, 0.07534027099609375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000370.npy"}
{"epoch": 0.5593348450491308, "step": 371, "batch_size": 64, "mean": 1.1544888019561768, "std": 2.550607681274414, "min": -4.539922714233398, "p10": -1.9693752288818358, "median": 0.849757194519043, "p90": 4.372203063964845, "max": 9.059989929199219, "pos_frac": 0.6875, "sample": [4.4898223876953125, -4.539922714233398, 6.761474609375, 2.1059951782226562, 4.496379852294922, -2.8664398193359375, 1.5141143798828125, -0.7314071655273438, 0.2256622314453125, 0.16680908203125, -0.1729888916015625, 2.9447479248046875, -1.3526382446289062, 6.374002456665039, 1.609954833984375, 0.21142578125, 0.19605636596679688, 2.3803977966308594, 0.8311824798583984, 4.452430725097656, -2.3839874267578125, -1.451141357421875, 0.00467681884765625, 0.65887451171875, 3.9541015625, 4.882892608642578, 3.3714599609375, 9.059989929199219, 2.9116973876953125, 1.0327033996582031, 2.2448883056640625, 2.89141845703125, -2.2659530639648438, -1.0897445678710938, 0.213348388671875, 1.0462188720703125, 0.8166427612304688, -2.0177688598632812, -2.3435211181640625, 1.6715545654296875, -1.2862129211425781, 3.5324554443359375, -0.2979278564453125, 0.42678070068359375, 1.4111175537109375, -1.6786231994628906, 1.7944564819335938, 3.302204132080078, -1.8564567565917969, 1.8301811218261719, 0.8683319091796875, -3.4730758666992188, 0.5756492614746094, 1.7194061279296875, 2.364013671875, -0.19550514221191406, 4.185005187988281, 1.2886810302734375, -0.8369140625, 0.39771270751953125, 3.9674148559570312, 4.122051239013672, -0.2444915771484375, -0.334381103515625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000371.npy"}
{"epoch": 0.5608465608465608, "step": 372, "batch_size": 64, "mean": 1.145450472831726, "std": 2.4710774421691895, "min": -3.9426422119140625, "p10": -2.0837675094604493, "median": 0.8848638534545898, "p90": 4.4877346038818375, "max": 7.5724334716796875, "pos_frac": 0.6875, "sample": [-2.126293182373047, 2.201374053955078, 2.6144866943359375, -2.289306640625, 0.7149581909179688, 2.717601776123047, 3.1633377075195312, -1.0413436889648438, 2.045989990234375, 3.4641494750976562, 3.9002761840820312, 1.0938873291015625, 1.3417167663574219, 6.3128814697265625, 2.59820556640625, 0.7503204345703125, 2.3383407592773438, -2.9811859130859375, 2.458526611328125, 5.334545135498047, 1.2584047317504883, 0.8500595092773438, -0.787994384765625, 4.006252288818359, 0.22243690490722656, 1.5929718017578125, -1.3460845947265625, 2.9715957641601562, 7.5724334716796875, 0.9362735748291016, 5.07354736328125, 4.694084167480469, 0.6200904846191406, -1.1988258361816406, -1.353790283203125, 3.2252044677734375, -3.9426422119140625, 2.046375274658203, 3.7851829528808594, 0.1725006103515625, 1.89013671875, 1.7820892333984375, -0.557525634765625, 5.694129943847656, 0.5612525939941406, 2.6398582458496094, -1.762054443359375, 0.4399528503417969, 0.8164844512939453, -0.9571533203125, 0.5899162292480469, -3.0781021118164062, 2.3031082153320312, 0.148712158203125, -0.10605621337890625, -2.051858901977539, -2.097442626953125, -1.3006744384765625, 0.9196681976318359, 5.541057586669922, -0.07851791381835938, -2.946319580078125, -0.3814697265625, 0.2890892028808594], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000372.npy"}
{"epoch": 0.562358276643991, "step": 373, "batch_size": 64, "mean": 1.4590967893600464, "std": 2.2769947052001953, "min": -3.510822296142578, "p10": -1.096114349365234, "median": 1.1449909210205078, "p90": 4.025540161132812, "max": 7.7902069091796875, "pos_frac": 0.734375, "sample": [-1.8105506896972656, 3.0704803466796875, 3.4378433227539062, 1.886383056640625, 3.5047225952148438, -3.510822296142578, 0.07344818115234375, 3.462268829345703, 0.9045257568359375, 2.6637706756591797, 3.106800079345703, -2.1941757202148438, 1.61566162109375, 0.888946533203125, -2.524486541748047, -0.11249160766601562, 1.4084815979003906, 0.9161605834960938, -0.112945556640625, 0.8253707885742188, 0.9348983764648438, -0.5406589508056641, -0.11499404907226562, -0.5699539184570312, 1.1593856811523438, 6.652915954589844, -0.1710662841796875, 2.0535240173339844, 1.4130706787109375, 0.9397754669189453, 3.4302940368652344, 0.6564483642578125, -0.7508888244628906, 4.271480560302734, -0.5221099853515625, 0.49749755859375, 0.51092529296875, -0.486785888671875, -1.8667221069335938, 3.9779129028320312, 2.5382766723632812, 4.794525146484375, -0.39186859130859375, -1.7010574340820312, 0.8066291809082031, 2.9726028442382812, -1.2440681457519531, 3.0109481811523438, 0.8787002563476562, 1.4347152709960938, 2.6960678100585938, 6.619171142578125, 1.3497486114501953, 7.7902069091796875, 1.7064552307128906, 4.045951843261719, 1.1305961608886719, 1.2958526611328125, 1.8869476318359375, 7.341850280761719, 2.607990264892578, 0.03749847412109375, 0.23590469360351562, 2.564208984375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000373.npy"}
{"epoch": 0.563869992441421, "step": 374, "batch_size": 64, "mean": 1.618525743484497, "std": 2.005291223526001, "min": -2.894378662109375, "p10": -0.8678985595703125, "median": 1.4367866516113281, "p90": 4.314146041870118, "max": 5.92010498046875, "pos_frac": 0.78125, "sample": [3.940673828125, 3.1255950927734375, 0.874237060546875, -1.3848037719726562, -0.2719411849975586, 2.5799331665039062, 2.7615928649902344, 2.965606689453125, 5.92010498046875, 1.0748367309570312, 2.6548099517822266, -2.894378662109375, 0.6767425537109375, 4.5023193359375, 0.74560546875, -1.707305908203125, 4.574398040771484, 2.3412246704101562, 0.06304168701171875, -0.7104339599609375, -0.859161376953125, 2.6226978302001953, 0.8640632629394531, -0.87164306640625, 3.7439346313476562, 4.167613983154297, 2.5476417541503906, 3.4049911499023438, -0.6362266540527344, 1.0594863891601562, 0.7436141967773438, 4.376945495605469, 2.163341522216797, 0.7452754974365234, 2.5048751831054688, 2.6380538940429688, 1.3940505981445312, -2.486602783203125, 1.812347412109375, 2.3778038024902344, 1.479522705078125, -1.3134288787841797, 0.3549337387084961, 3.83404541015625, 4.4777069091796875, 1.7098846435546875, -1.0571441650390625, 0.763397216796875, 0.6048583984375, 1.2812347412109375, 3.8445587158203125, 2.7800979614257812, 4.464569091796875, 0.24564361572265625, 2.0125198364257812, -0.22972869873046875, 5.85382080078125, 3.74810791015625, 1.2319869995117188, 3.5755348205566406, 0.163909912109375, 0.66021728515625, -0.27968597412109375, -0.7658843994140625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000374.npy"}
{"epoch": 0.5653817082388511, "step": 375, "batch_size": 64, "mean": 1.076629638671875, "std": 2.197110414505005, "min": -2.7384986877441406, "p10": -2.094066619873047, "median": 1.0154943466186523, "p90": 3.3118486404418954, "max": 9.44036865234375, "pos_frac": 0.71875, "sample": [0.7350845336914062, -0.40703582763671875, -0.097625732421875, 3.128793716430664, 1.6070661544799805, 0.9499111175537109, 4.4715728759765625, -0.5431785583496094, -0.5588531494140625, 0.8628597259521484, -0.06480789184570312, 2.1616973876953125, 0.0604705810546875, 2.267650604248047, 2.623697280883789, -0.256561279296875, 1.91650390625, 0.600494384765625, 0.45482635498046875, -2.177236557006836, 0.8374671936035156, 1.7993927001953125, 2.5798416137695312, 0.334136962890625, 1.1212196350097656, 4.112022399902344, 0.5855560302734375, -0.1990966796875, 0.8855571746826172, 2.2020111083984375, 1.2969856262207031, -2.311647415161133, -2.3280906677246094, -1.8698272705078125, 1.6134033203125, 2.1749725341796875, 1.9513702392578125, 0.5551528930664062, 6.4323577880859375, 0.038181304931640625, 1.66583251953125, 1.6517181396484375, 4.020111083984375, -2.7384986877441406, 1.0810775756835938, -1.0812835693359375, 5.8776702880859375, 1.7085456848144531, 9.44036865234375, 1.0880126953125, 2.9650192260742188, 2.1650848388671875, -2.4321861267089844, 1.4154205322265625, 0.24806785583496094, 2.3277854919433594, 1.1172332763671875, -2.5587520599365234, -0.2861213684082031, 2.074077606201172, -1.9525146484375, 3.390300750732422, 0.32576751708984375, -2.1547317504882812], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000375.npy"}
{"epoch": 0.5668934240362812, "step": 376, "batch_size": 64, "mean": 1.8556674718856812, "std": 2.4689512252807617, "min": -2.9307937622070312, "p10": -1.2002830505371094, "median": 1.3661384582519531, "p90": 5.069552993774415, "max": 9.3551025390625, "pos_frac": 0.796875, "sample": [0.26706695556640625, 3.34869384765625, 0.9878005981445312, 1.7409591674804688, 2.6677322387695312, -0.8077220916748047, -1.2100906372070312, 5.6196136474609375, 1.0025672912597656, 1.3083267211914062, 4.69097900390625, 0.6709499359130859, -1.9018669128417969, -0.6059017181396484, -1.177398681640625, -0.6921539306640625, 4.825675964355469, 2.2930831909179688, 1.6496315002441406, 4.40576171875, 0.411285400390625, 6.711296081542969, -0.7538375854492188, 0.7054595947265625, 3.9291839599609375, 0.5997161865234375, 7.343963623046875, 5.118186950683594, 1.4337081909179688, 2.0365142822265625, 3.7705535888671875, 2.0791702270507812, 4.583320617675781, 2.3577423095703125, 1.2695350646972656, 5.3784027099609375, 9.3551025390625, -1.4806175231933594, 4.956073760986328, -2.9307937622070312, 0.41182518005371094, -1.8047161102294922, 3.7050552368164062, 0.9914360046386719, 2.1378517150878906, 3.771820068359375, -1.5366439819335938, 1.0139389038085938, 0.9716949462890625, 3.4157772064208984, -0.6101226806640625, 0.8371162414550781, 2.2194976806640625, 0.8166656494140625, 5.5713043212890625, 1.2061347961425781, 1.4239501953125, 0.3693428039550781, 3.9115447998046875, 3.090923309326172, 0.64093017578125, 1.7991943359375, 0.18166542053222656, -1.73114013671875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000376.npy"}
{"epoch": 0.5684051398337112, "step": 377, "batch_size": 64, "mean": 1.187424898147583, "std": 2.1361117362976074, "min": -3.9761962890625, "p10": -1.313520812988281, "median": 1.3657989501953125, "p90": 4.035471343994141, "max": 6.025684356689453, "pos_frac": 0.6875, "sample": [4.226982116699219, -1.024993896484375, 3.2948646545410156, 3.313863754272461, 2.5788955688476562, 1.0049209594726562, -0.8959846496582031, -0.255859375, 0.35544586181640625, 4.033233642578125, 3.2483673095703125, -0.9931411743164062, 4.1775970458984375, 1.0314712524414062, 1.2435531616210938, 0.0276641845703125, -1.4094314575195312, 1.3902053833007812, -0.216156005859375, 0.5970458984375, 5.920124053955078, -0.6111116409301758, 1.5497207641601562, 2.8458023071289062, 1.989837646484375, -1.6056022644042969, 1.7174301147460938, 4.036430358886719, -3.9761962890625, 0.9566993713378906, 1.7569694519042969, 0.015848159790039062, 1.6689453125, -1.5519218444824219, 1.6215744018554688, -0.3819999694824219, 1.3997344970703125, 1.6599540710449219, 1.0630874633789062, 1.1641082763671875, -2.7202835083007812, 3.3619613647460938, 1.7415733337402344, 1.164764404296875, 1.3413925170898438, 6.025684356689453, 2.045530319213867, -0.4976348876953125, -3.8924560546875, 1.5998077392578125, 2.02349853515625, 2.4969482421875, -0.05747032165527344, 5.6332244873046875, -0.9876327514648438, -1.0897293090820312, 4.725969314575195, 1.690999984741211, -2.1108360290527344, -0.08494949340820312, 2.661113739013672, -0.3235015869140625, 2.5938339233398438, 1.6854095458984375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000377.npy"}
{"epoch": 0.5699168556311414, "step": 378, "batch_size": 64, "mean": 1.1220749616622925, "std": 2.3940863609313965, "min": -4.634395599365234, "p10": -0.8272937774658203, "median": 0.8218936920166016, "p90": 4.324565124511719, "max": 11.410247802734375, "pos_frac": 0.703125, "sample": [0.07401275634765625, 0.385498046875, 1.036233901977539, 0.36041831970214844, -0.6856231689453125, 0.6130599975585938, -1.1786117553710938, 0.8376083374023438, 1.0169219970703125, 2.3985214233398438, -4.634395599365234, 0.7468948364257812, 4.9576263427734375, 1.6598052978515625, 4.192207336425781, 1.2907791137695312, -0.5390815734863281, 1.0506839752197266, 3.0483551025390625, -0.8305931091308594, 2.154386520385742, 1.9609756469726562, 0.40407562255859375, 0.9581069946289062, 4.193443298339844, 2.0838851928710938, 0.03440093994140625, 1.454620361328125, -0.8195953369140625, -0.18925857543945312, 4.8574371337890625, 1.5394287109375, -3.2511215209960938, 3.017925262451172, 4.730438232421875, 1.7360382080078125, -0.17104339599609375, 0.8061790466308594, 0.8694915771484375, -0.6096172332763672, 11.410247802734375, -0.838409423828125, -0.656036376953125, 4.380760192871094, -0.5379180908203125, 2.642242431640625, 0.003055572509765625, 4.682458877563477, 0.16473770141601562, 1.3839607238769531, 0.5603256225585938, 1.3366851806640625, 0.719329833984375, -2.7432479858398438, -0.740234375, 4.919776916503906, -0.3011436462402344, 1.573028564453125, 0.3339252471923828, 3.0334396362304688, -0.5065193176269531, -3.2421226501464844, -0.45236778259277344, 3.1263046264648438], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000378.npy"}
{"epoch": 0.5714285714285714, "step": 379, "batch_size": 64, "mean": 1.3503767251968384, "std": 2.104811668395996, "min": -3.838592529296875, "p10": -0.7784523010253905, "median": 1.279205322265625, "p90": 4.088540649414063, "max": 7.175251007080078, "pos_frac": 0.734375, "sample": [1.0731468200683594, -0.12530517578125, 1.9460258483886719, -0.37000274658203125, 0.7425537109375, 2.1575164794921875, 2.8337974548339844, -3.168975830078125, 0.09859466552734375, 2.3536033630371094, 1.4968795776367188, -0.439788818359375, -0.7002182006835938, 3.3110580444335938, -0.28077125549316406, 6.9439239501953125, 1.219329833984375, 0.0178680419921875, 2.0191192626953125, 2.4782028198242188, 5.430908203125, 2.7764434814453125, 3.0306472778320312, 4.1506500244140625, -2.516937255859375, 1.9283447265625, 2.5936279296875, -1.3792037963867188, 1.339080810546875, -3.838592529296875, 3.9436187744140625, 0.8497695922851562, 1.5344524383544922, 1.8000526428222656, 0.5128021240234375, 1.783355712890625, 1.9056243896484375, 2.3559837341308594, 0.08942031860351562, -0.41363525390625, -0.56689453125, 1.7344741821289062, 0.42812347412109375, 4.971406936645508, -0.19237518310546875, 1.0537338256835938, 7.175251007080078, -1.323822021484375, 0.30168914794921875, 4.9103851318359375, -0.49892425537109375, -0.9540176391601562, -0.1495819091796875, -0.811981201171875, 0.93994140625, 0.25786781311035156, 2.9492835998535156, 4.168537139892578, 0.73602294921875, 1.5666999816894531, 0.8586006164550781, 3.3043212890625, 2.405120849609375, 1.677276611328125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000379.npy"}
{"epoch": 0.5729402872260015, "step": 380, "batch_size": 64, "mean": 1.7465509176254272, "std": 2.4001474380493164, "min": -6.342887878417969, "p10": -0.8267604827880858, "median": 1.7830238342285156, "p90": 4.723003005981448, "max": 7.070304870605469, "pos_frac": 0.828125, "sample": [1.3138370513916016, 1.2615127563476562, 1.7475433349609375, 0.9095306396484375, -2.6056442260742188, -2.512451171875, 5.332561492919922, 5.156829833984375, 0.21953582763671875, 3.1316452026367188, 1.0851554870605469, -0.9076690673828125, -0.6379737854003906, 4.032829284667969, 2.835437774658203, 0.096588134765625, -0.3520355224609375, 2.0535354614257812, 2.6297149658203125, 4.183773040771484, 2.2010421752929688, 1.803192138671875, 0.560455322265625, -4.252298355102539, 6.1565399169921875, 1.4847183227539062, 3.4616165161132812, 4.045509338378906, -1.3657646179199219, 1.8811492919921875, 1.7747573852539062, 1.35565185546875, 0.8943405151367188, 0.9035263061523438, 4.094512939453125, 0.5188169479370117, 6.612640380859375, 2.907886505126953, 0.751739501953125, 2.6045761108398438, 2.2731475830078125, -6.342887878417969, 3.9712905883789062, 7.070304870605469, 2.1444015502929688, 2.209888458251953, 4.9541015625, 1.257080078125, -0.1561737060546875, 2.802898406982422, 0.9323348999023438, 0.47121429443359375, -2.009735107421875, 5.418891906738281, 3.1583251953125, 2.7324142456054688, 2.3733596801757812, 0.4901123046875, 3.5817718505859375, 2.8350677490234375, 1.2574234008789062, 1.2680435180664062, -0.0741729736328125, 1.791290283203125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000380.npy"}
{"epoch": 0.5744520030234316, "step": 381, "batch_size": 64, "mean": 1.0140235424041748, "std": 2.2396748065948486, "min": -5.729248046875, "p10": -1.2562990188598633, "median": 0.9007568359375, "p90": 3.943656158447266, "max": 7.5189666748046875, "pos_frac": 0.6875, "sample": [-1.70782470703125, 1.64581298828125, -0.2901344299316406, -3.7907485961914062, -5.729248046875, 1.6151599884033203, 0.8878021240234375, 0.5208530426025391, 6.196758270263672, 1.2418136596679688, 1.9844818115234375, 0.2216625213623047, 2.1412010192871094, 2.2596664428710938, -0.9360427856445312, 7.5189666748046875, 4.019195556640625, 0.013731002807617188, -0.08454132080078125, 0.19475936889648438, 0.8519439697265625, -0.12133026123046875, 1.9791698455810547, 3.945709228515625, -0.15848541259765625, 1.85968017578125, 0.1889801025390625, -3.3291549682617188, 2.5231361389160156, -0.4031028747558594, 2.9338817596435547, 2.2373809814453125, 2.679698944091797, -2.3873519897460938, -0.839569091796875, 2.9250526428222656, 1.3518409729003906, -1.247467041015625, -1.2600841522216797, 1.004974365234375, 1.6700897216796875, 1.472747802734375, 4.9452362060546875, 0.6914691925048828, 1.7188873291015625, 5.188896179199219, 2.3469276428222656, 1.0660400390625, -0.6312046051025391, 0.021070480346679688, 0.9137115478515625, 1.9772720336914062, 2.55889892578125, 0.13387298583984375, -0.045391082763671875, 0.8186874389648438, 3.9388656616210938, -2.249053955078125, -0.23775482177734375, 1.6756134033203125, 0.8495368957519531, -0.34246826171875, -0.3364410400390625, 4.0937652587890625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000381.npy"}
{"epoch": 0.5759637188208617, "step": 382, "batch_size": 64, "mean": 1.0922926664352417, "std": 2.2440123558044434, "min": -3.8384323120117188, "p10": -0.8205886840820311, "median": 0.6475734710693359, "p90": 4.123472595214846, "max": 7.33441162109375, "pos_frac": 0.671875, "sample": [-2.660186767578125, 0.3117637634277344, 7.33441162109375, 0.665069580078125, 2.7888107299804688, -0.21771240234375, -0.097503662109375, 0.032482147216796875, 5.082489013671875, 2.928020477294922, 0.07193756103515625, 1.2472915649414062, -0.294219970703125, 0.897186279296875, 3.267780303955078, -0.721649169921875, -3.8384323120117188, 5.2629547119140625, -2.6949119567871094, 2.2175445556640625, -0.5372161865234375, 1.0865936279296875, 2.4606170654296875, -0.5527915954589844, -0.7041397094726562, 0.4525299072265625, 0.6959075927734375, 1.8506278991699219, -0.4271697998046875, 7.323200225830078, 1.67449951171875, 2.3361434936523438, -2.007434844970703, -1.9549751281738281, 3.6918792724609375, -0.4958038330078125, -0.30152130126953125, 1.1097335815429688, -0.022308349609375, 1.1917724609375, -0.4366607666015625, 6.0558624267578125, 2.573822021484375, 1.5853271484375, 0.1499481201171875, 4.308441162109375, -1.7661972045898438, -0.5468902587890625, 0.3081207275390625, 1.1995182037353516, 0.5850830078125, -0.8629913330078125, 0.2616233825683594, 1.7639808654785156, 1.3255615234375, 0.575836181640625, 2.0726776123046875, 1.742757797241211, 0.6300773620605469, 6.204750061035156, 1.5649337768554688, 1.69342041015625, 0.5406455993652344, -0.0761871337890625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000382.npy"}
{"epoch": 0.5774754346182918, "step": 383, "batch_size": 64, "mean": 1.4580726623535156, "std": 2.3839151859283447, "min": -3.41326904296875, "p10": -1.3399681091308593, "median": 1.1984405517578125, "p90": 4.952472686767578, "max": 6.533962249755859, "pos_frac": 0.6875, "sample": [0.2651405334472656, 4.941871643066406, 3.4806976318359375, 0.2945404052734375, -1.400390625, -1.14825439453125, 1.938446044921875, 0.6476364135742188, 0.8631744384765625, -0.8286590576171875, 2.573394775390625, 1.5012893676757812, 0.25934600830078125, 2.649517059326172, 3.2317657470703125, -1.3666305541992188, 2.4588050842285156, -1.1521148681640625, 5.761817932128906, 2.467418670654297, 1.1391983032226562, -1.2777557373046875, 2.996124267578125, 2.3060302734375, 4.574277877807617, 4.0385589599609375, 0.7464370727539062, -0.200408935546875, 2.179981231689453, 5.2853851318359375, 4.51763916015625, -0.6493949890136719, 1.782745361328125, 5.6749114990234375, -0.12725067138671875, 2.6133384704589844, 3.074871063232422, -3.41326904296875, 0.010589599609375, -0.08251190185546875, 1.0330123901367188, -0.15148162841796875, -0.1385650634765625, -0.5228672027587891, 1.2576828002929688, -0.6089611053466797, 0.7136192321777344, 1.7558135986328125, -3.1691818237304688, 5.406288146972656, 1.6958274841308594, 6.533962249755859, 2.9724655151367188, 0.34477996826171875, 0.9028396606445312, 2.710601806640625, 4.9570159912109375, -1.470458984375, -1.1362533569335938, -3.03131103515625, 4.394447326660156, -2.1591415405273438, 5.201072692871094, 3.197132110595703], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000383.npy"}
{"epoch": 0.5789871504157218, "step": 384, "batch_size": 64, "mean": 1.29500412940979, "std": 2.114091634750366, "min": -5.617584228515625, "p10": -0.7655075073242186, "median": 1.210740089416504, "p90": 4.205573654174805, "max": 6.00341796875, "pos_frac": 0.796875, "sample": [-1.6728973388671875, 5.839485168457031, 0.5098228454589844, 0.8839797973632812, -0.16705322265625, 2.108234405517578, 1.7962722778320312, 2.2294998168945312, 3.127716064453125, 1.2635040283203125, 1.9327468872070312, 1.2573089599609375, 0.390289306640625, -0.8211669921875, 0.3291893005371094, 2.0027694702148438, 1.514251708984375, 3.8629989624023438, -0.6356353759765625, 3.4227447509765625, 1.0965423583984375, 3.2264328002929688, 0.8881187438964844, -3.5622940063476562, -0.21169662475585938, 1.199319839477539, 4.784324645996094, -0.1473236083984375, -2.0694923400878906, 0.239166259765625, 4.868860244750977, 1.852447509765625, 0.2413311004638672, 1.434438705444336, 2.1560745239257812, 0.42974281311035156, 6.00341796875, 0.2552480697631836, -5.617584228515625, 3.393939971923828, 0.5688629150390625, 0.7139205932617188, 4.445304870605469, -2.2242164611816406, 4.637870788574219, 4.166248321533203, 2.112091064453125, 0.7110309600830078, 1.2760543823242188, 2.490081787109375, 1.4943389892578125, 1.1144065856933594, 0.5417098999023438, 3.259124755859375, 1.97412109375, -0.5558700561523438, -2.1686782836914062, 4.2224273681640625, -0.45615386962890625, 1.018890380859375, 0.0634765625, 1.0029220581054688, 1.6150627136230469, 1.2221603393554688], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000384.npy"}
{"epoch": 0.5804988662131519, "step": 385, "batch_size": 64, "mean": 1.5248796939849854, "std": 2.0320017337799072, "min": -2.5180015563964844, "p10": -0.8678287506103515, "median": 1.3852424621582031, "p90": 4.381614303588868, "max": 7.4210205078125, "pos_frac": 0.75, "sample": [-2.5180015563964844, 0.672698974609375, 2.1376571655273438, 2.1929931640625, 1.4224281311035156, 0.8216705322265625, 1.7145233154296875, 1.0941314697265625, 1.2455711364746094, 1.3608283996582031, -0.60638427734375, 2.03155517578125, 1.4848747253417969, 2.9932479858398438, 2.5138397216796875, 4.302074432373047, 2.9996337890625, 4.9654083251953125, 0.5321731567382812, -0.006931304931640625, -1.0314865112304688, -0.205902099609375, 0.204193115234375, 1.4941024780273438, 5.128120422363281, 2.526336669921875, 3.4511051177978516, 0.8211517333984375, -0.7307319641113281, -0.3328819274902344, 1.4030990600585938, -1.8239288330078125, 4.415702819824219, 1.886322021484375, 0.58819580078125, 2.67974853515625, 1.7124748229980469, 2.5782394409179688, 1.6322250366210938, 2.594916343688965, -0.116912841796875, -1.3697891235351562, 0.7396240234375, 1.909820556640625, 1.3673858642578125, 5.4664764404296875, 3.4848098754882812, -0.748565673828125, -0.38007354736328125, -1.1129913330078125, 0.00885009765625, 1.1679706573486328, -2.083465576171875, -0.151275634765625, 2.378467559814453, 3.2693634033203125, 0.48072052001953125, 4.0602569580078125, 1.106201171875, 7.4210205078125, -0.9189414978027344, 0.4160003662109375, 6.194648742675781, 4.657707214355469], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000385.npy"}
{"epoch": 0.582010582010582, "step": 386, "batch_size": 64, "mean": 1.1207683086395264, "std": 2.272717237472534, "min": -4.754638671875, "p10": -1.961445617675781, "median": 1.0541343688964844, "p90": 4.186605834960938, "max": 5.8244476318359375, "pos_frac": 0.75, "sample": [4.057861328125, -0.3495445251464844, 0.007564544677734375, 2.694795608520508, 4.0135955810546875, 0.8483123779296875, 0.7201461791992188, 0.8286972045898438, 0.7245330810546875, 3.143798828125, 1.1480026245117188, 0.8026885986328125, -3.2703990936279297, 0.4600677490234375, 1.5192642211914062, 1.1122779846191406, 0.081268310546875, 4.430103302001953, 1.3735847473144531, 2.2502822875976562, 1.749887466430664, 2.1352691650390625, 1.0189743041992188, -0.6236419677734375, -0.392730712890625, -0.07918167114257812, 4.1627655029296875, 0.34166717529296875, 1.8022346496582031, 1.5103874206542969, 3.3275222778320312, 2.9716644287109375, -1.9891357421875, -4.754638671875, -1.8968353271484375, 1.585540771484375, -3.634838104248047, 2.7284927368164062, 0.44635772705078125, 4.6489105224609375, 0.42218780517578125, -2.6076011657714844, 0.36425018310546875, 3.100555419921875, 4.1968231201171875, -0.6234626770019531, 2.667236328125, 1.4264450073242188, 0.5756635665893555, 5.7051239013671875, 1.3832778930664062, 1.08929443359375, -2.3162765502929688, 1.150665283203125, 5.353759765625, -0.6093292236328125, 0.14923095703125, -0.421142578125, 5.8244476318359375, 4.8701019287109375, 0.5486907958984375, -0.45829010009765625, 1.6943893432617188, -3.4124412536621094], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000386.npy"}
{"epoch": 0.5835222978080121, "step": 387, "batch_size": 64, "mean": 1.6778154373168945, "std": 1.9831502437591553, "min": -2.8719100952148438, "p10": -0.7265287399291991, "median": 1.7557554244995117, "p90": 4.128174591064454, "max": 6.046665191650391, "pos_frac": 0.78125, "sample": [0.3391876220703125, -1.0951652526855469, 6.046665191650391, -0.7885608673095703, 0.5093536376953125, 6.017578125, 0.48125457763671875, -1.1243400573730469, 0.7907733917236328, 5.616912841796875, 1.783926010131836, -0.0207366943359375, 1.7275848388671875, 1.9272232055664062, 2.8117942810058594, -0.255584716796875, 3.1884918212890625, 1.8420867919921875, 0.3878936767578125, 4.326995849609375, 2.872039794921875, 3.621335983276367, 2.7440567016601562, -0.581787109375, -0.8673667907714844, 4.4757232666015625, 0.02443695068359375, 0.43849945068359375, 3.9357757568359375, 1.2107772827148438, -0.46087646484375, 1.1973037719726562, 2.6703338623046875, 2.9732112884521484, -0.195556640625, 3.3980255126953125, 2.7072486877441406, 0.5923194885253906, 1.9909801483154297, -2.8719100952148438, -1.740142822265625, 3.2227325439453125, 2.2992172241210938, -1.2575302124023438, -0.37574005126953125, 3.4627113342285156, 3.274272918701172, 4.186820983886719, 0.14066123962402344, 2.833486557006836, 0.112335205078125, 1.372344970703125, 0.6862945556640625, 0.3275909423828125, -0.5210418701171875, 2.4199180603027344, 2.4780502319335938, 3.0205917358398438, 4.73333740234375, 3.9913330078125, 3.425933837890625, 0.042255401611328125, 1.370697021484375, 3.486146926879883], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000387.npy"}
{"epoch": 0.5850340136054422, "step": 388, "batch_size": 64, "mean": 1.1801083087921143, "std": 2.1390902996063232, "min": -3.3133316040039062, "p10": -1.5586727142333985, "median": 1.0955352783203125, "p90": 4.032634735107422, "max": 6.5592803955078125, "pos_frac": 0.71875, "sample": [3.4361419677734375, 4.075309753417969, 3.1783447265625, 3.313892364501953, 0.8204116821289062, 1.932769775390625, 2.2400360107421875, 2.26885986328125, 2.9747390747070312, -1.5841827392578125, 1.5848770141601562, 2.1696548461914062, -1.5042228698730469, 0.2897186279296875, -0.6479072570800781, 5.3669586181640625, 0.05779266357421875, 6.5592803955078125, 0.19956684112548828, 1.6004829406738281, 3.9330596923828125, 0.172119140625, 4.510204315185547, 1.2246627807617188, 2.9079513549804688, -1.1260662078857422, 0.5507049560546875, 1.008697509765625, 0.00028228759765625, 0.7732772827148438, -0.1625385284423828, -1.9173660278320312, 0.7126855850219727, 4.474193572998047, -0.5563716888427734, 1.5476837158203125, -0.21125030517578125, -0.9412384033203125, 1.203786849975586, -1.5820083618164062, 3.1454696655273438, 2.388824462890625, -1.39935302734375, 1.985565185546875, 2.4148311614990234, 4.147666931152344, 2.1210479736328125, 0.43221282958984375, 2.6809844970703125, 4.704172134399414, -1.8482284545898438, 3.450681686401367, -0.258544921875, 1.182373046875, -3.3133316040039062, -1.4598541259765625, 0.053924560546875, 3.7687835693359375, -2.3825454711914062, -1.4409294128417969, 0.8328170776367188, 0.07305908203125, -2.5450973510742188, 1.9374122619628906], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000388.npy"}
{"epoch": 0.5865457294028723, "step": 389, "batch_size": 64, "mean": 1.205160140991211, "std": 2.3725225925445557, "min": -7.754669189453125, "p10": -1.079986572265625, "median": 1.4078102111816406, "p90": 4.024332427978516, "max": 5.827861785888672, "pos_frac": 0.734375, "sample": [-0.321533203125, -2.1091995239257812, 1.197967529296875, 4.425811767578125, 0.9258899688720703, 2.7453460693359375, 3.5202560424804688, 3.930328369140625, 3.1558990478515625, 1.856903076171875, -0.6658935546875, -3.8912887573242188, 2.5774154663085938, 1.4745903015136719, 1.26171875, 3.308074951171875, 0.8231048583984375, 4.055046081542969, 0.705291748046875, 2.0353012084960938, 1.6323585510253906, 2.9720535278320312, -0.7198944091796875, 3.3495101928710938, 2.3658676147460938, -0.5935211181640625, 1.9106597900390625, 0.0907440185546875, 1.878662109375, 1.863739013671875, -0.6088905334472656, 0.24118804931640625, -1.098480224609375, 5.827861785888672, 4.188220977783203, 0.6716232299804688, 4.501617431640625, -0.15070343017578125, -0.27657318115234375, -2.861949920654297, 0.9973297119140625, -0.8106575012207031, 0.8255462646484375, -1.036834716796875, 1.6972389221191406, 2.5896759033203125, 1.990814208984375, 0.0118865966796875, -3.1580352783203125, 0.36767578125, 1.4457321166992188, 1.0972938537597656, -7.754669189453125, -0.9422025680541992, -2.9851951599121094, 5.033416748046875, 5.070247650146484, 3.287322998046875, 2.3728485107421875, 3.952667236328125, 1.4210166931152344, 2.9442214965820312, 1.1231803894042969, 1.3946037292480469], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000389.npy"}
{"epoch": 0.5880574452003023, "step": 390, "batch_size": 64, "mean": 1.2624856233596802, "std": 1.9185047149658203, "min": -3.9160633087158203, "p10": -0.7157829284667968, "median": 1.0429072380065918, "p90": 3.4553195953369147, "max": 6.9669342041015625, "pos_frac": 0.75, "sample": [-0.4952392578125, 0.5839672088623047, 1.2897262573242188, -0.5975112915039062, 4.391132354736328, 0.9975738525390625, 0.19728851318359375, 3.8959007263183594, 0.2724189758300781, 2.1983642578125, 0.5147018432617188, 0.76654052734375, 2.9436264038085938, 0.32672119140625, -0.950042724609375, 6.9669342041015625, 2.5634689331054688, -1.1740570068359375, 1.6513748168945312, -1.3205642700195312, 0.9196243286132812, 2.880767822265625, 2.778543472290039, 4.053001403808594, 2.241241455078125, 2.035125732421875, -0.121978759765625, -0.6879959106445312, 0.39556121826171875, 5.6599578857421875, -1.2225799560546875, 0.5508575439453125, 2.143077850341797, 1.5616912841796875, -3.0097293853759766, 1.0023651123046875, -3.9160633087158203, 0.0405426025390625, 5.727226257324219, -0.3950786590576172, 2.5332183837890625, 2.444507598876953, -0.202606201171875, 1.1738147735595703, 0.5438041687011719, 2.780597686767578, 1.7304763793945312, 0.7580680847167969, 1.341766357421875, -0.03990936279296875, 3.055980682373047, 0.7035865783691406, -0.17107009887695312, 2.2517433166503906, 1.5390453338623047, 1.9847869873046875, 1.1816329956054688, -0.4806175231933594, 1.083449363708496, 0.5210914611816406, 3.3536376953125, 2.2823867797851562, -0.727691650390625, 3.4988975524902344], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000390.npy"}
{"epoch": 0.5895691609977324, "step": 391, "batch_size": 64, "mean": 1.5577449798583984, "std": 2.846747636795044, "min": -4.318115234375, "p10": -1.6839733123779297, "median": 1.1366243362426758, "p90": 4.3442836761474615, "max": 12.48846435546875, "pos_frac": 0.703125, "sample": [-0.2402496337890625, -3.8699073791503906, 3.2987289428710938, 4.299564361572266, 0.9544448852539062, 2.4048538208007812, -0.00170135498046875, 3.8532485961914062, 3.2421951293945312, 5.620880126953125, 1.2095413208007812, -0.77325439453125, 2.51416015625, -1.6731986999511719, 2.9894752502441406, 5.997154235839844, -0.8310356140136719, 2.7982711791992188, -1.6885910034179688, -1.0774688720703125, -0.8751258850097656, 0.18503570556640625, 2.801910400390625, 1.0568161010742188, 1.3869400024414062, -1.0552978515625, -2.3248023986816406, -3.3043556213378906, 3.549816131591797, 3.252094268798828, 12.48846435546875, 0.82342529296875, 1.0637073516845703, 0.3356781005859375, 0.031768798828125, -0.24065017700195312, 0.6254978179931641, 1.0273590087890625, 6.40386962890625, 3.1434707641601562, 0.99127197265625, 3.9993515014648438, -2.654348373413086, -2.598064422607422, 0.9675483703613281, 2.8391799926757812, 5.5978240966796875, -0.5575714111328125, 2.9763259887695312, 4.3634490966796875, -4.318115234375, -1.586090087890625, 5.621669769287109, 2.7812042236328125, 4.0654754638671875, 4.273773193359375, 3.2766494750976562, 2.2936553955078125, 0.4592437744140625, 2.259511947631836, -0.8673171997070312, 3.6352310180664062, 2.1731109619140625, 0.2999725341796875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000391.npy"}
{"epoch": 0.5910808767951625, "step": 392, "batch_size": 64, "mean": 1.8912948369979858, "std": 2.5364770889282227, "min": -3.564727783203125, "p10": -0.7300968170166016, "median": 1.6056194305419922, "p90": 4.896339225769044, "max": 10.1431884765625, "pos_frac": 0.8125, "sample": [-2.84454345703125, 2.6930809020996094, 3.9606094360351562, 2.727325439453125, 2.27935791015625, 0.6701736450195312, 5.017656326293945, 0.937835693359375, 5.966949462890625, 2.384765625, 2.82208251953125, -1.4932098388671875, 3.3513565063476562, 2.5193862915039062, 6.241607666015625, 2.4101638793945312, -0.7354354858398438, 1.683380126953125, 2.2791748046875, 2.0446090698242188, 0.851470947265625, -0.9703445434570312, 3.2514572143554688, -2.052225112915039, 0.5948028564453125, 4.6132659912109375, 3.935821533203125, 1.0667457580566406, 0.68890380859375, -3.564727783203125, -0.13848876953125, 7.1617431640625, 1.9423904418945312, -0.33905029296875, -0.457611083984375, 3.6556968688964844, 1.1403827667236328, 1.5278587341308594, 2.6777572631835938, 3.0777206420898438, 8.429336547851562, 6.057514190673828, 0.1368541717529297, 4.06256103515625, 1.4833698272705078, 0.6569747924804688, 0.18700408935546875, 0.637939453125, 4.567626953125, 3.3821334838867188, -0.7176399230957031, 10.1431884765625, 0.2677898406982422, 1.3487548828125, -1.9120941162109375, 0.00789642333984375, 1.4888916015625, -0.39239501953125, 0.74041748046875, 2.3064651489257812, 0.04622650146484375, 2.1880836486816406, 0.02924346923828125, 2.31683349609375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000392.npy"}
{"epoch": 0.5925925925925926, "step": 393, "batch_size": 64, "mean": 1.4311583042144775, "std": 2.0438015460968018, "min": -1.8943023681640625, "p10": -0.9425556182861327, "median": 1.1141700744628906, "p90": 4.237736129760743, "max": 7.459556579589844, "pos_frac": 0.703125, "sample": [-0.6527481079101562, -0.3776969909667969, 4.956916809082031, 2.6915054321289062, 6.128366470336914, 0.9827117919921875, -0.18294906616210938, 1.0677261352539062, 1.769805908203125, 0.11305999755859375, -0.1857318878173828, -1.3663787841796875, 4.824378967285156, 1.5700454711914062, 2.673776626586914, -0.7559394836425781, -0.0687408447265625, 3.755828857421875, -1.8943023681640625, 0.4239959716796875, -1.0779876708984375, 0.7468090057373047, 2.981658935546875, 3.8695144653320312, -0.9992599487304688, 3.6950607299804688, -0.8102455139160156, 4.472389221191406, 1.160614013671875, 2.1129608154296875, 0.9097061157226562, -0.21797561645507812, 3.631591796875, 3.1791839599609375, 0.6428737640380859, 4.342750549316406, -0.6845664978027344, 1.2690658569335938, 0.3627204895019531, -0.15846633911132812, 1.2992382049560547, -0.579437255859375, 1.4751930236816406, 1.2061004638671875, -0.20782089233398438, 1.8560104370117188, 2.9761581420898438, 1.23944091796875, 0.8877677917480469, 0.363128662109375, 4.482847213745117, 7.459556579589844, 3.077566146850586, 0.18886184692382812, 1.8924636840820312, 3.439525604248047, 0.5510787963867188, -1.4237518310546875, -1.73883056640625, -1.4464874267578125, 2.6180410385131836, 3.9927024841308594, 0.6867828369140625, 2.395965576171875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000393.npy"}
{"epoch": 0.5941043083900227, "step": 394, "batch_size": 64, "mean": 1.3650896549224854, "std": 1.8021457195281982, "min": -2.120502471923828, "p10": -1.199999618530273, "median": 1.4264755249023438, "p90": 3.4116203308105475, "max": 6.972557067871094, "pos_frac": 0.765625, "sample": [-0.25548553466796875, 3.8218345642089844, 3.8122787475585938, 1.4254722595214844, -0.0283355712890625, 1.8590621948242188, -1.3875350952148438, 1.1685142517089844, 2.2484817504882812, 0.08823966979980469, 3.8829345703125, 2.110109329223633, 1.1456565856933594, -0.7624168395996094, -2.0679473876953125, 2.0182056427001953, 0.7643966674804688, -2.120502471923828, -1.7297515869140625, 0.7455368041992188, 3.8545608520507812, -0.237274169921875, 2.9150638580322266, 1.0787124633789062, 1.3712577819824219, 1.4274787902832031, 1.704986572265625, 2.9847869873046875, 2.5143871307373047, 1.4601325988769531, -1.7232170104980469, 1.3432769775390625, 6.972557067871094, 1.6928939819335938, 1.7705459594726562, 2.9464569091796875, 0.3640327453613281, 3.219207763671875, 1.5472259521484375, -0.4027557373046875, 0.09273529052734375, -0.16021347045898438, -2.0782470703125, 1.1086654663085938, 1.2718391418457031, 5.0206451416015625, -1.6772937774658203, 1.8414840698242188, -0.4713630676269531, 1.5631637573242188, -0.46006011962890625, 3.4635009765625, 0.8705673217773438, 3.1558685302734375, 0.2362060546875, 3.2905654907226562, 0.43653297424316406, 2.983509063720703, 2.3978958129882812, 1.4824981689453125, 2.657135009765625, 2.8778343200683594, 1.39776611328125, 2.5214385986328125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000394.npy"}
{"epoch": 0.5956160241874527, "step": 395, "batch_size": 64, "mean": 1.5636616945266724, "std": 2.214214563369751, "min": -3.328166961669922, "p10": -0.6583723068237305, "median": 1.5345306396484375, "p90": 4.570834350585938, "max": 7.819206237792969, "pos_frac": 0.78125, "sample": [2.068784713745117, 4.07562255859375, 1.75616455078125, 1.1849517822265625, 4.55218505859375, 2.6757659912109375, -0.6630420684814453, 3.3422622680664062, 2.5315322875976562, 0.217803955078125, 0.09077835083007812, 2.0813980102539062, 4.9383697509765625, 3.8930091857910156, 2.554290771484375, 3.4156532287597656, 0.08140945434570312, 1.8006553649902344, 4.578826904296875, 0.8832626342773438, -0.2702789306640625, -0.024662017822265625, 0.96673583984375, 2.1481781005859375, 3.223968505859375, -1.4993934631347656, 3.208576202392578, 1.7455978393554688, 0.1579456329345703, 0.4363861083984375, 2.399578094482422, 6.8825531005859375, 1.1353607177734375, -0.5231399536132812, -3.24945068359375, 2.864917755126953, -3.328166961669922, 1.855987548828125, 1.7273483276367188, -0.7791557312011719, 0.46555328369140625, 0.1301116943359375, 5.637542724609375, 0.15166473388671875, 5.615062713623047, 7.819206237792969, -0.3721761703491211, 4.63043212890625, -1.1706771850585938, -0.6474761962890625, 0.35683631896972656, 1.0859375, 1.583953857421875, 0.315582275390625, 1.485107421875, 3.4105758666992188, -0.26801300048828125, 0.9709129333496094, 2.185598373413086, -0.527008056640625, 2.121967315673828, -2.833709716796875, 1.1203994750976562, 1.66839599609375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000395.npy"}
{"epoch": 0.5971277399848829, "step": 396, "batch_size": 64, "mean": 1.3979142904281616, "std": 2.08103609085083, "min": -4.185007095336914, "p10": -0.9321952819824217, "median": 1.2685670852661133, "p90": 3.620307540893555, "max": 6.9819793701171875, "pos_frac": 0.8125, "sample": [2.092815399169922, 6.10089111328125, 2.30938720703125, 5.815086364746094, 2.0493240356445312, 2.1741256713867188, 2.2186355590820312, 1.306936264038086, 1.6457366943359375, 1.6433086395263672, 1.1935577392578125, 0.7235870361328125, 3.3962631225585938, -4.185007095336914, 0.7868919372558594, -0.06727886199951172, 3.71380615234375, 2.8145179748535156, 1.579061508178711, 2.206829071044922, -1.60504150390625, -0.323455810546875, 6.9819793701171875, 0.4246788024902344, 0.40563201904296875, 0.542572021484375, 0.6836013793945312, -4.0037689208984375, 1.7578277587890625, 0.7426910400390625, 2.191923141479492, -1.9161224365234375, 2.5731143951416016, 6.205177307128906, 1.169342041015625, 3.3046951293945312, 1.0856895446777344, 1.4962844848632812, 3.6684608459472656, 0.59442138671875, -0.722412109375, 4.2928924560546875, 0.4564495086669922, 2.14788818359375, -1.1077423095703125, 2.0204620361328125, 1.1211204528808594, 2.5204315185546875, 3.3686294555664062, 0.2008819580078125, -0.7761917114257812, 0.244720458984375, 1.585881233215332, 0.09138107299804688, 0.6655693054199219, 0.192840576171875, 1.2301979064941406, 3.4171924591064453, 0.9100437164306641, 1.4646034240722656, -0.055023193359375, -1.8103790283203125, -0.999053955078125, 3.5079498291015625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000396.npy"}
{"epoch": 0.5986394557823129, "step": 397, "batch_size": 64, "mean": 1.0165547132492065, "std": 2.398286819458008, "min": -6.607566833496094, "p10": -2.0751541137695306, "median": 1.4304752349853516, "p90": 3.7634956359863287, "max": 5.749412536621094, "pos_frac": 0.703125, "sample": [-0.07700729370117188, 0.2994842529296875, -2.3169403076171875, 3.575592041015625, -0.373138427734375, 2.927154541015625, 1.5864486694335938, 3.0937423706054688, 2.026580810546875, 2.509185791015625, 0.8461647033691406, 0.5493698120117188, 3.879852294921875, 3.5384521484375, 1.1424407958984375, 3.4256134033203125, -3.2624053955078125, 0.0504302978515625, 1.4365653991699219, -0.4403724670410156, 4.018527984619141, -0.7208251953125, 2.658487319946289, 1.7625961303710938, -0.13780975341796875, -2.8931427001953125, -1.510986328125, 1.9094467163085938, 4.45458984375, 3.473602294921875, 1.4512519836425781, -0.5057487487792969, 0.0464935302734375, 1.55218505859375, 4.261421203613281, -1.391632080078125, 0.34328460693359375, 3.826568603515625, 3.6163253784179688, 1.4243850708007812, 2.584148406982422, 3.1160736083984375, 2.151630401611328, 1.5060272216796875, -6.607566833496094, 0.7677154541015625, 4.68756103515625, -4.8566741943359375, -0.610931396484375, 0.177764892578125, 3.3873443603515625, 1.3400115966796875, -0.32810211181640625, 1.639974594116211, -1.1226158142089844, 0.040401458740234375, 2.0653228759765625, 0.9193515777587891, -0.734405517578125, 2.3622589111328125, -4.386112213134766, -2.391265869140625, 1.5459442138671875, 5.749412536621094], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000397.npy"}
{"epoch": 0.600151171579743, "step": 398, "batch_size": 64, "mean": 1.6283986568450928, "std": 2.213419198989868, "min": -2.53704833984375, "p10": -1.337875747680664, "median": 1.5690498352050781, "p90": 4.261033153533936, "max": 7.825107574462891, "pos_frac": 0.78125, "sample": [-1.2648353576660156, 3.8970680236816406, 1.9503974914550781, -0.626220703125, 2.9882774353027344, 2.0781707763671875, 6.16552734375, 2.27264404296875, -1.3691787719726562, 5.450981140136719, 2.619342803955078, 3.549468994140625, -1.9110488891601562, -2.53704833984375, 3.7320709228515625, -1.146392822265625, 4.114986419677734, -0.85577392578125, 2.31500244140625, 1.467071533203125, 1.6710281372070312, 2.6970252990722656, 7.825107574462891, 2.94378662109375, 0.37128448486328125, 3.740903854370117, 4.08740234375, 1.8905525207519531, 1.0593700408935547, 2.1321182250976562, -1.5622940063476562, 1.8577880859375, 0.6714019775390625, 1.1695556640625, -0.37726593017578125, -1.745147705078125, -0.38979339599609375, 4.4069976806640625, 1.90313720703125, 0.6999664306640625, -1.3823165893554688, 0.6564178466796875, 3.3421554565429688, 2.3724136352539062, 2.5326128005981445, 5.271949768066406, 0.31461334228515625, 0.331573486328125, -0.6267814636230469, 4.323624610900879, 0.3619842529296875, 6.296600341796875, 0.953948974609375, 0.5773391723632812, 2.2869949340820312, 0.6372222900390625, 0.996063232421875, -2.2212600708007812, 0.08984375, 0.8569259643554688, 1.4233627319335938, 2.290721893310547, 4.010673522949219, 0.577392578125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000398.npy"}
{"epoch": 0.6016628873771731, "step": 399, "batch_size": 64, "mean": 1.3752162456512451, "std": 2.378622531890869, "min": -3.7906951904296875, "p10": -1.2707809448242187, "median": 1.3510093688964844, "p90": 4.147061920166016, "max": 7.483856201171875, "pos_frac": 0.734375, "sample": [2.63873291015625, 1.3180160522460938, 2.8542022705078125, 1.443328857421875, -3.4222755432128906, 7.483856201171875, 3.5052413940429688, 1.5022449493408203, -1.385528564453125, -2.331329345703125, -0.42633819580078125, 0.865203857421875, 2.334033966064453, -1.9186515808105469, 1.071685791015625, 1.93048095703125, 1.3861236572265625, 0.232879638671875, 0.4643211364746094, 4.0552520751953125, -3.3834381103515625, -0.0525054931640625, -3.7906951904296875, 4.66766357421875, 3.203369140625, -0.6487045288085938, 1.705413818359375, 6.921112060546875, 0.8339462280273438, 0.3823089599609375, 0.8362884521484375, -0.6112194061279297, 2.5661773681640625, 0.9334373474121094, -1.1525344848632812, 2.1848373413085938, 0.16114044189453125, 7.135467529296875, 2.3223876953125, 0.8807907104492188, -0.4709930419921875, 0.2163715362548828, 3.0636367797851562, 0.23998260498046875, 3.0073394775390625, 1.0157241821289062, 1.5673828125, 1.7688560485839844, -1.3061065673828125, -0.8134078979492188, 2.5788402557373047, 2.6167755126953125, 3.1464157104492188, 2.159881591796875, 1.384002685546875, 3.166748046875, 5.255584716796875, 6.07293701171875, -1.1883544921875, 2.2716598510742188, 4.186408996582031, -1.0489959716796875, 1.1617050170898438, -0.7352828979492188], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000399.npy"}
{"epoch": 0.6031746031746031, "step": 400, "batch_size": 64, "mean": 0.9363600015640259, "std": 1.9905461072921753, "min": -2.78240966796875, "p10": -1.4141258239746093, "median": 0.7151832580566406, "p90": 3.377263641357423, "max": 6.3201904296875, "pos_frac": 0.671875, "sample": [0.3093757629394531, 0.7728118896484375, 0.2822456359863281, -0.969085693359375, 2.9579849243164062, 3.0371551513671875, 1.2843017578125, 2.409893035888672, 4.90234375, 3.8487586975097656, 0.2465057373046875, -0.6747970581054688, 0.25180816650390625, 0.4236736297607422, 1.5016498565673828, 0.186798095703125, 0.276702880859375, 4.5671539306640625, 0.6575546264648438, 2.6761770248413086, 0.5227813720703125, -1.4474601745605469, 0.611968994140625, -1.7433128356933594, -1.396881103515625, 2.526826858520508, -0.5004596710205078, 2.750232696533203, 0.5315170288085938, -1.1449356079101562, -1.9623565673828125, -0.5582695007324219, 2.244140625, 1.04095458984375, 5.454338073730469, -1.4215164184570312, -0.7104263305664062, -0.53424072265625, 2.1390228271484375, 1.0230865478515625, 2.9728164672851562, 3.164356231689453, -2.78240966796875, 4.23968505859375, 0.7923049926757812, 0.8397903442382812, -1.2305831909179688, -0.32292938232421875, 2.3849868774414062, 6.3201904296875, -0.606231689453125, 1.245880126953125, -1.287954330444336, 2.1070098876953125, -2.7023658752441406, 1.1699676513671875, 3.4685096740722656, 1.396942138671875, -0.8035087585449219, 1.357757568359375, -1.7989120483398438, -0.4119110107421875, 3.083465576171875, 0.9561614990234375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000400.npy"}
{"epoch": 0.6046863189720333, "step": 401, "batch_size": 64, "mean": 1.803335428237915, "std": 2.2299110889434814, "min": -3.3607940673828125, "p10": -0.8391597747802734, "median": 1.6156978607177734, "p90": 5.038623428344731, "max": 8.304168701171875, "pos_frac": 0.84375, "sample": [-1.32501220703125, 6.301910400390625, 2.9956207275390625, -0.05364990234375, 0.43817710876464844, 0.9165067672729492, 0.9200286865234375, 1.7307929992675781, 0.9989051818847656, -2.5866546630859375, 3.5616912841796875, 0.773193359375, -2.0428504943847656, -0.764923095703125, 2.542774200439453, 0.5498428344726562, 3.7608489990234375, 1.648101806640625, 3.5636558532714844, 5.669486999511719, 1.2106399536132812, -0.15739822387695312, 0.6784858703613281, 3.1198806762695312, -2.0768890380859375, -3.3607940673828125, 1.143463134765625, 1.1444072723388672, 3.1658935546875, 0.8679962158203125, 2.4286270141601562, 0.9078216552734375, 0.7131385803222656, 2.7727432250976562, 0.1184844970703125, 3.1612167358398438, 0.10055160522460938, 3.014251708984375, 0.7988748550415039, 1.841949462890625, -1.0446510314941406, 5.467689514160156, 2.6962890625, -0.8709754943847656, 1.014190673828125, 2.514995574951172, 1.51837158203125, 5.436805725097656, 1.8809661865234375, 2.9648780822753906, 1.8560295104980469, 1.9419536590576172, 1.5832939147949219, 0.29296112060546875, 0.844268798828125, 1.2575178146362305, 1.7564544677734375, 4.109531402587891, 5.785549163818359, 7.097965240478516, 1.8496513366699219, 2.129711151123047, 3.8340606689453125, 8.304168701171875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000401.npy"}
{"epoch": 0.6061980347694633, "step": 402, "batch_size": 64, "mean": 1.5745457410812378, "std": 2.2879443168640137, "min": -3.52362060546875, "p10": -0.9707954406738281, "median": 1.400369644165039, "p90": 4.224195671081543, "max": 7.146720886230469, "pos_frac": 0.734375, "sample": [4.182001113891602, 2.640411376953125, 2.8038330078125, 1.85986328125, -0.47263336181640625, 1.9372215270996094, 5.618553161621094, 2.7807464599609375, 1.4839706420898438, 6.421302795410156, 1.0615653991699219, 5.738536834716797, 3.768238067626953, -0.4040565490722656, 4.242279052734375, 0.07410812377929688, 1.5447998046875, 0.7156810760498047, -0.34568023681640625, 1.1327896118164062, 7.146720886230469, 1.1932716369628906, -0.2650299072265625, -2.753143310546875, 3.71038818359375, -0.6704940795898438, 2.546527862548828, 3.09747314453125, -0.7896461486816406, 0.9115524291992188, 6.8927001953125, 7.066143035888672, 0.1590576171875, 0.4635772705078125, 0.82666015625, 1.7738513946533203, 2.616668701171875, 3.6363296508789062, 1.5974464416503906, 2.176205635070801, -1.1157302856445312, -1.0743484497070312, -3.52362060546875, 3.477266311645508, 2.422515869140625, -0.0569915771484375, 1.7427902221679688, -0.226806640625, 0.3697967529296875, -1.3717269897460938, 2.4253158569335938, 0.942535400390625, 1.786285400390625, 3.24993896484375, -0.154754638671875, 0.4817543029785156, 0.7305908203125, -1.0022048950195312, -1.977783203125, 2.7843399047851562, 1.3167686462402344, 2.3013877868652344, 0.02132415771484375, -0.8975067138671875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000402.npy"}
{"epoch": 0.6077097505668935, "step": 403, "batch_size": 64, "mean": 2.013162851333618, "std": 2.8585870265960693, "min": -7.056001663208008, "p10": -1.3639587402343747, "median": 1.6367416381835938, "p90": 5.637994956970216, "max": 8.742523193359375, "pos_frac": 0.796875, "sample": [0.3142547607421875, 1.6923828125, 1.549896240234375, 4.806144714355469, 3.305217742919922, 2.464874267578125, 7.8621368408203125, 3.3719711303710938, 0.29418182373046875, 4.905551910400391, 6.193485260009766, -0.9683074951171875, 2.063720703125, 0.9742584228515625, 2.3385848999023438, 1.9383964538574219, 3.3795604705810547, 0.8166389465332031, 1.7954254150390625, 1.4163169860839844, -0.7534217834472656, -2.0509719848632812, 8.274932861328125, 8.742523193359375, -0.1354827880859375, 5.287071228027344, 5.37847900390625, 0.9650802612304688, 3.3969879150390625, 0.36530303955078125, 0.9353523254394531, 2.2483139038085938, 1.4330406188964844, 0.4220924377441406, 1.5811004638671875, -2.427539825439453, 0.6788864135742188, 6.927318572998047, 0.6433181762695312, 2.8846435546875, 0.7801971435546875, -0.6809310913085938, 5.749216079711914, -1.046600341796875, 2.4148693084716797, 0.3321533203125, 1.3558502197265625, 2.8262863159179688, 4.140869140625, -1.7662200927734375, 3.281646728515625, 3.266796112060547, -7.056001663208008, -2.2447280883789062, 1.538177490234375, -1.9247055053710938, 5.134243011474609, 7.4683685302734375, 1.1230316162109375, 3.5333251953125, -1.499969482421875, -0.03946685791015625, 3.890350341796875, 2.953947067260742], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000403.npy"}
{"epoch": 0.6092214663643235, "step": 404, "batch_size": 64, "mean": 1.5841295719146729, "std": 2.095047950744629, "min": -3.9481964111328125, "p10": -1.0037054061889645, "median": 1.4821624755859375, "p90": 4.218019104003908, "max": 6.145965576171875, "pos_frac": 0.734375, "sample": [1.5243682861328125, 1.9582138061523438, 3.8687667846679688, 2.5569000244140625, 0.22604751586914062, 2.9016571044921875, 6.145965576171875, 1.2227277755737305, -0.4502744674682617, -3.9481964111328125, 3.3904876708984375, 0.42681884765625, -2.317798614501953, 1.0130462646484375, -0.048404693603515625, 5.1279754638671875, -0.4703693389892578, -0.7064075469970703, 3.1343421936035156, 2.7803802490234375, 4.367698669433594, 2.04925537109375, 2.4943313598632812, 6.0034027099609375, -0.678131103515625, -0.14289474487304688, -1.4246826171875, 3.103118896484375, 2.1223602294921875, 2.320850372314453, 1.3853759765625, 1.5910720825195312, -1.527008056640625, 3.0533084869384766, -0.1747589111328125, 1.4399566650390625, -0.36281585693359375, 2.7672576904296875, 2.121002197265625, -0.12012481689453125, 1.0304489135742188, -0.32683563232421875, 3.4346771240234375, 2.91217041015625, 2.7565135955810547, 2.3970108032226562, 3.0489044189453125, 3.116607666015625, 5.443305969238281, 1.2586250305175781, 0.42913818359375, 3.582305908203125, 5.897369384765625, 0.882232666015625, 4.626457214355469, 0.41864013671875, 1.4350852966308594, -1.7023544311523438, -1.23370361328125, -1.1311187744140625, 0.7755279541015625, 0.3113861083984375, 2.1651153564453125, 1.1319580078125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000404.npy"}
{"epoch": 0.6107331821617535, "step": 405, "batch_size": 64, "mean": 1.2557039260864258, "std": 2.152374744415283, "min": -2.3972702026367188, "p10": -1.3564193725585938, "median": 0.9382705688476562, "p90": 4.471239852905274, "max": 7.317535400390625, "pos_frac": 0.703125, "sample": [1.6829986572265625, 1.3165969848632812, -1.5936450958251953, 5.3687286376953125, 3.643646240234375, -0.735260009765625, 3.4370956420898438, 0.8287811279296875, -1.372894287109375, -1.9507598876953125, -0.08984375, 1.5472431182861328, 0.7862014770507812, -0.7409324645996094, 2.3115615844726562, -1.5642814636230469, 3.931488037109375, -1.2931671142578125, 1.5845146179199219, 1.01220703125, 2.352773666381836, 0.8643341064453125, 1.99774169921875, 1.3297348022460938, -2.155862808227539, 0.3962860107421875, 4.5836181640625, 4.568103790283203, -0.7592010498046875, 2.2932586669921875, 0.7895660400390625, -0.6734848022460938, -0.5851211547851562, 2.3066787719726562, -0.763916015625, 1.3352394104003906, 1.0884742736816406, -1.016082763671875, 3.9832687377929688, 2.5620956420898438, -0.03479766845703125, 0.27311134338378906, 0.8198928833007812, 0.5744171142578125, 0.796844482421875, 0.1357269287109375, 0.18468475341796875, 3.0258865356445312, 4.900665283203125, 2.8413867950439453, -0.047149658203125, 0.6391448974609375, 4.2452239990234375, 1.1810035705566406, 5.56195068359375, 7.317535400390625, -2.3972702026367188, 2.7695655822753906, 0.0741729736328125, 1.3426475524902344, -2.1900634765625, 5.131721496582031, 1.9289398193359375, -1.3179779052734375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000405.npy"}
{"epoch": 0.6122448979591837, "step": 406, "batch_size": 64, "mean": 1.531808853149414, "std": 2.2880735397338867, "min": -3.153776168823242, "p10": -1.3846418380737304, "median": 1.5157966613769531, "p90": 4.847994422912598, "max": 8.800216674804688, "pos_frac": 0.796875, "sample": [2.0303115844726562, 0.45702362060546875, -0.6536293029785156, -1.4041213989257812, 2.201568603515625, 2.4125328063964844, 2.5569725036621094, 4.9337615966796875, -1.8330917358398438, 6.454345703125, 4.230674743652344, 1.6604461669921875, 0.3688068389892578, -1.339630126953125, 5.21612548828125, 3.2369384765625, -1.6966552734375, 0.7490272521972656, 1.3589057922363281, 5.006889343261719, 8.800216674804688, 1.5314064025878906, 1.1884307861328125, 0.767425537109375, 2.4040985107421875, 2.20098876953125, -0.24292945861816406, 0.7836227416992188, 0.06355476379394531, 0.31394386291503906, -1.4039325714111328, -0.6525821685791016, -3.153776168823242, -0.12285232543945312, 2.8496780395507812, 0.0085906982421875, 1.5001869201660156, 4.647871017456055, 0.7972145080566406, 5.511833190917969, 1.5883941650390625, -1.1928176879882812, 2.33575439453125, 2.6504974365234375, 0.7128829956054688, 2.9933700561523438, 0.11801719665527344, 1.14349365234375, 3.3817176818847656, 2.929534912109375, 1.6392364501953125, 1.6087646484375, 1.603668212890625, -3.11798095703125, 2.1953506469726562, -2.378326416015625, 0.74468994140625, 1.8911895751953125, 0.11243438720703125, 1.3801956176757812, 6.125946044921875, 0.697906494140625, 2.9256744384765625, 2.2059783935546875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000406.npy"}
{"epoch": 0.6137566137566137, "step": 407, "batch_size": 64, "mean": 1.2160767316818237, "std": 2.0153703689575195, "min": -3.01934814453125, "p10": -1.1720252990722657, "median": 1.0841903686523438, "p90": 3.477884674072266, "max": 9.362373352050781, "pos_frac": 0.78125, "sample": [1.35443115234375, -0.9420394897460938, 0.0957183837890625, 3.758495330810547, 0.21070098876953125, 3.1545257568359375, 4.037261962890625, -1.9773101806640625, -2.09185791015625, 1.9373703002929688, 1.8321952819824219, 2.8286399841308594, 0.1027984619140625, -1.1805191040039062, 0.67877197265625, -0.292877197265625, 1.6618156433105469, 1.047454833984375, 3.228179931640625, 2.627279281616211, -1.1522064208984375, -0.9956645965576172, 3.5449905395507812, 1.0503997802734375, 0.446136474609375, 2.205486297607422, 2.177520751953125, 1.8153743743896484, 0.7371368408203125, 9.362373352050781, 1.11798095703125, 0.03728485107421875, 3.4341888427734375, 1.2975234985351562, 1.923553466796875, 0.03192901611328125, -0.654876708984375, 3.243499755859375, 0.1557159423828125, 0.32231712341308594, -1.081573486328125, 1.64764404296875, 3.4423751831054688, 1.859466552734375, 3.49310302734375, 2.296722412109375, 2.9790267944335938, 0.84039306640625, 1.572052001953125, 0.29853248596191406, 2.4519500732421875, -3.01934814453125, -2.629150390625, -1.3190460205078125, -1.214498519897461, -0.2108306884765625, 1.0210494995117188, 4.776710510253906, 1.6463279724121094, 0.5308380126953125, 0.24384307861328125, 2.0114364624023438, 3.5866241455078125, 0.433563232421875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000407.npy"}
{"epoch": 0.6152683295540439, "step": 408, "batch_size": 64, "mean": 1.949136734008789, "std": 2.1006860733032227, "min": -1.9834403991699219, "p10": -0.7556440353393554, "median": 1.7415046691894531, "p90": 4.4670154571533205, "max": 7.1375885009765625, "pos_frac": 0.796875, "sample": [-0.46218109130859375, 2.3448925018310547, 4.4869842529296875, 2.7355079650878906, 0.5486822128295898, -1.9834403991699219, -1.2723388671875, 1.7156963348388672, 2.8717880249023438, 1.7563247680664062, 2.4603652954101562, 3.67047119140625, 0.4906768798828125, 0.080291748046875, 5.5648956298828125, 4.038055419921875, 0.5778388977050781, 0.9919090270996094, 2.306879997253418, 1.8887596130371094, 3.1650238037109375, -0.7744598388671875, -0.23155975341796875, 1.492828369140625, 5.6162567138671875, -1.12847900390625, 1.684234619140625, 6.82861328125, 3.5023269653320312, 4.755077362060547, 3.897674560546875, -0.03308296203613281, 1.4229660034179688, 1.0950546264648438, 3.78057861328125, 4.119258880615234, 1.5914039611816406, -1.849395751953125, 7.1375885009765625, -0.87286376953125, 1.0426406860351562, 3.2235260009765625, 4.541450500488281, 3.6048660278320312, 2.36309814453125, 1.4842987060546875, 2.8478431701660156, -1.1607437133789062, 1.0718917846679688, 3.7159576416015625, 1.7266845703125, 4.420421600341797, 4.387908935546875, -0.7117404937744141, 0.28562164306640625, 3.0242462158203125, 0.2008514404296875, 0.39470577239990234, 2.4718284606933594, -0.48470306396484375, -0.06537628173828125, 0.044475555419921875, 2.094970703125, 4.208919525146484], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000408.npy"}
{"epoch": 0.6167800453514739, "step": 409, "batch_size": 64, "mean": 1.1142562627792358, "std": 2.219360828399658, "min": -3.675689697265625, "p10": -1.2081645965576169, "median": 0.8920955657958984, "p90": 4.71642074584961, "max": 6.593097686767578, "pos_frac": 0.640625, "sample": [0.7697639465332031, 1.749898910522461, -2.825408935546875, -0.9780960083007812, -0.14835739135742188, 0.4536895751953125, 1.945770263671875, -1.0005035400390625, 0.8068618774414062, -2.2091293334960938, -0.2614765167236328, 4.8070831298828125, 1.674551010131836, 5.9216766357421875, 2.0790863037109375, 1.4058990478515625, 2.7408828735351562, 0.8881034851074219, -1.0039329528808594, 5.623512268066406, 3.8903732299804688, -0.46563720703125, 1.284210205078125, 1.4933700561523438, -0.013889312744140625, -0.9882049560546875, -0.504364013671875, 2.3197555541992188, 1.2898273468017578, 0.9765777587890625, -0.38605499267578125, 1.2102127075195312, 3.3939895629882812, -1.2956924438476562, 1.9301605224609375, 0.8335933685302734, 3.7906417846679688, 6.593097686767578, -0.10268402099609375, 1.4032020568847656, 0.3013572692871094, 0.37511444091796875, -0.5642547607421875, -1.4790725708007812, 2.9529571533203125, 1.2789764404296875, 1.2119293212890625, 0.01947021484375, -2.4196929931640625, 5.252410888671875, -1.4795036315917969, 5.1454620361328125, -0.8200130462646484, 4.504875183105469, -3.675689697265625, 1.4582443237304688, -0.446197509765625, 0.26189422607421875, -0.1702880859375, 2.960458755493164, 5.461696624755859, -0.4244842529296875, 1.6183013916015625, 0.896087646484375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000409.npy"}
{"epoch": 0.618291761148904, "step": 410, "batch_size": 64, "mean": 1.3701996803283691, "std": 2.3345634937286377, "min": -3.9222869873046875, "p10": -1.627674865722656, "median": 1.5855484008789062, "p90": 4.133371353149416, "max": 7.8110504150390625, "pos_frac": 0.734375, "sample": [1.2191848754882812, -0.8223495483398438, 2.049694061279297, 1.5318603515625, 4.282619476318359, -2.594635009765625, 2.6971817016601562, 1.7630996704101562, -2.092742919921875, -3.295328140258789, 1.0531845092773438, 3.0898284912109375, -0.8739852905273438, -2.498626708984375, 3.61456298828125, 2.5209197998046875, 5.500270843505859, -3.9222869873046875, 3.5481109619140625, 5.3342742919921875, 5.528045654296875, 2.0198936462402344, -1.54437255859375, 1.9391117095947266, 0.4423065185546875, 1.6392364501953125, -1.6633758544921875, -1.4490814208984375, 2.5630569458007812, 0.04824066162109375, 3.785125732421875, 0.44023895263671875, 1.2664661407470703, 2.6260833740234375, 0.21551513671875, 5.656303405761719, 1.3179550170898438, -2.3000450134277344, -0.06591987609863281, 3.2731781005859375, -0.33924102783203125, -1.441162109375, 0.6842193603515625, -0.56939697265625, 3.39642333984375, 1.4905853271484375, 4.357397079467773, 2.2462730407714844, 2.009572982788086, 1.4525146484375, 0.2812652587890625, 2.0790557861328125, 7.8110504150390625, 1.756439208984375, -0.06937408447265625, 0.652008056640625, 2.6516265869140625, 0.43235015869140625, 2.1029796600341797, 1.662149429321289, 3.22882080078125, 2.2481155395507812, -1.4457473754882812, 3.1720504760742188], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000410.npy"}
{"epoch": 0.6198034769463341, "step": 411, "batch_size": 64, "mean": 1.5908393859863281, "std": 2.544201612472534, "min": -6.1982574462890625, "p10": -1.2060527801513672, "median": 1.3761024475097656, "p90": 4.881285858154298, "max": 7.4478912353515625, "pos_frac": 0.75, "sample": [-2.657825469970703, 2.3424148559570312, 3.278076171875, 4.1146240234375, 1.2573318481445312, 1.4015274047851562, -3.5470428466796875, 0.2208709716796875, 2.5027923583984375, 0.7362823486328125, 4.6365203857421875, 5.764503479003906, 2.659893035888672, 2.5629043579101562, 3.9347381591796875, 1.0322036743164062, 2.7484703063964844, -1.642791748046875, 4.675849914550781, -0.041080474853515625, -0.83978271484375, 1.7683792114257812, 3.7558135986328125, 0.4739646911621094, 1.350677490234375, 2.7997894287109375, -0.5078659057617188, 5.044158935546875, 0.34250640869140625, -6.1982574462890625, -2.7576141357421875, -1.2255592346191406, -0.84881591796875, 0.92547607421875, 3.0013961791992188, 0.45046234130859375, -0.21994781494140625, 5.194084167480469, -0.2568092346191406, 7.334869384765625, 2.375244140625, 1.519622802734375, 0.8621368408203125, 2.8572463989257812, 3.3598556518554688, 7.4478912353515625, -1.1605377197265625, 4.969329833984375, 0.5173721313476562, 0.24501800537109375, 1.3261375427246094, 1.703887939453125, 4.42181396484375, 4.553398132324219, -0.01346588134765625, 1.2079124450683594, 1.8966217041015625, 2.2332000732421875, 5.458827972412109, 0.0625457763671875, -0.14609527587890625, 0.04455280303955078, -1.7743072509765625, 2.278318405151367], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000411.npy"}
{"epoch": 0.6213151927437641, "step": 412, "batch_size": 64, "mean": 1.3141303062438965, "std": 2.1419084072113037, "min": -2.8825225830078125, "p10": -1.2857173919677733, "median": 1.1752548217773438, "p90": 4.382482910156251, "max": 7.381416320800781, "pos_frac": 0.6875, "sample": [-1.2164421081542969, 4.529869079589844, 1.2513923645019531, 4.917575836181641, 1.2434196472167969, 2.450634002685547, 1.8117523193359375, 1.6314811706542969, -1.5444374084472656, 1.871999740600586, 0.07580947875976562, 1.6144981384277344, 3.1989898681640625, 7.154693603515625, 1.0724658966064453, -0.13388824462890625, 3.2887229919433594, -0.22472000122070312, -0.7609176635742188, 0.8498649597167969, 4.705589294433594, 3.182525634765625, 1.0626907348632812, 1.418426513671875, -0.070587158203125, -0.324737548828125, 2.8310394287109375, 1.3451652526855469, 1.5696945190429688, -0.8713226318359375, 0.7430095672607422, 4.0318603515625, 2.0894851684570312, 7.381416320800781, -1.5581474304199219, 0.9191703796386719, -1.7035675048828125, -0.9564208984375, 3.1018505096435547, -1.634765625, -1.0461196899414062, 0.4160003662109375, 1.007904052734375, -1.048919677734375, 1.1113128662109375, 0.7938385009765625, 4.894561767578125, 4.038581848144531, 5.1374969482421875, -1.3154067993164062, 1.23919677734375, 1.6011810302734375, 2.4975967407226562, 0.2640838623046875, 2.7081222534179688, 1.4196701049804688, -0.08118057250976562, 0.8552875518798828, -2.8825225830078125, 1.263214111328125, -0.06351852416992188, -0.12604808807373047, 2.9235763549804688, -1.8487091064453125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000412.npy"}
{"epoch": 0.6228269085411943, "step": 413, "batch_size": 64, "mean": 1.4179540872573853, "std": 2.3391308784484863, "min": -4.140167236328125, "p10": -0.8198232650756836, "median": 1.3228683471679688, "p90": 4.144765090942384, "max": 7.9715576171875, "pos_frac": 0.75, "sample": [-0.49837684631347656, -4.059272766113281, 1.9947052001953125, 3.342836380004883, 0.0694427490234375, 0.16481399536132812, 1.841796875, 7.9715576171875, -0.7905902862548828, 2.548553466796875, -4.140167236328125, 1.4553985595703125, 1.45166015625, -0.470367431640625, -0.28759002685546875, 1.6942901611328125, 1.0, 5.892803192138672, 0.07210159301757812, -0.6386260986328125, 4.2459869384765625, 2.8114700317382812, 1.2885589599609375, 1.0347785949707031, 2.7296142578125, -0.8829193115234375, 3.534576416015625, 6.6898345947265625, 3.0147018432617188, 0.025264739990234375, 2.93096923828125, 0.8565196990966797, 0.4226799011230469, 1.357177734375, 5.985874176025391, -0.8323516845703125, 1.511932373046875, 0.8011474609375, 2.529144287109375, 2.2652664184570312, -2.69500732421875, 3.908580780029297, 1.5971221923828125, 0.41448974609375, 3.2623062133789062, -0.7368602752685547, -2.48675537109375, -1.4056625366210938, 0.8574199676513672, 2.8202285766601562, -0.7000198364257812, 0.1758575439453125, 0.579742431640625, -0.15192604064941406, -0.509735107421875, 2.8208770751953125, 0.3417072296142578, 5.665870666503906, 2.970458984375, 2.5171585083007812, 0.33092498779296875, 2.3987560272216797, 3.53399658203125, 4.3043365478515625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000413.npy"}
{"epoch": 0.6243386243386243, "step": 414, "batch_size": 64, "mean": 1.6653553247451782, "std": 2.229182004928589, "min": -3.0358047485351562, "p10": -0.9072181701660155, "median": 1.6098251342773438, "p90": 4.419113540649414, "max": 7.330924987792969, "pos_frac": 0.734375, "sample": [3.0259475708007812, 0.08582305908203125, 3.0434799194335938, 2.470489501953125, -0.2997283935546875, 3.3532371520996094, 0.8232269287109375, 4.360118865966797, 2.2369155883789062, 1.73638916015625, 2.6189193725585938, 3.3345489501953125, 3.0570068359375, -2.845470428466797, 4.493598937988281, 3.27874755859375, -1.8715744018554688, -1.6456680297851562, 2.6250381469726562, -0.04864501953125, 2.5492897033691406, -0.9612503051757812, 7.330924987792969, 5.8768157958984375, 4.44439697265625, 0.863128662109375, 1.3182373046875, 1.6714859008789062, -0.7811431884765625, 3.379474639892578, 3.0586280822753906, -0.07061767578125, 4.041608810424805, 1.5481643676757812, 4.691986083984375, 2.5337371826171875, 2.196430206298828, 1.1483116149902344, 1.7496185302734375, 3.7995948791503906, -0.18524169921875, -1.7347030639648438, 1.0874433517456055, -0.4352684020996094, 4.0625457763671875, -0.7159423828125, 0.391693115234375, 1.2190284729003906, -3.0358047485351562, 0.23041152954101562, 3.21527099609375, 1.3335380554199219, -2.3812503814697266, 0.8601589202880859, 3.667743682861328, 6.7972412109375, 0.2291259765625, -0.5196533203125, 0.6046371459960938, 2.4952335357666016, 0.89996337890625, -0.12232398986816406, -0.212738037109375, 4.610408782958984], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000414.npy"}
{"epoch": 0.6258503401360545, "step": 415, "batch_size": 64, "mean": 1.4645925760269165, "std": 2.504441976547241, "min": -5.876777648925781, "p10": -1.5803581237792967, "median": 1.2335834503173828, "p90": 4.741426086425782, "max": 7.104316711425781, "pos_frac": 0.6875, "sample": [0.8625764846801758, 1.9332304000854492, 0.4933319091796875, 3.671236038208008, -0.29468536376953125, 1.150421142578125, 3.1392822265625, 6.495115280151367, -1.6340103149414062, 5.8849639892578125, -0.9371681213378906, -1.9339447021484375, 5.1592864990234375, 5.426958084106445, -0.9569625854492188, 3.7157821655273438, 7.104316711425781, 1.1704139709472656, 4.525154113769531, -1.1448955535888672, 1.4980621337890625, 1.124298095703125, 0.8540420532226562, -0.3150758743286133, 3.2610321044921875, 4.834114074707031, -2.0375633239746094, 1.7967109680175781, 2.4474868774414062, 3.9688720703125, -1.6995391845703125, 1.7556304931640625, 3.0858154296875, -0.8389701843261719, -3.4999771118164062, 4.20477294921875, 2.5492401123046875, 0.5581464767456055, 3.7113571166992188, -1.455169677734375, -0.14797210693359375, 3.0006103515625, -1.9050331115722656, -5.876777648925781, 2.9616928100585938, 2.2497100830078125, -0.09691619873046875, -0.26214599609375, 1.6914520263671875, -0.4411964416503906, 0.1036529541015625, 0.89398193359375, 0.40113067626953125, 1.4561843872070312, -0.45680999755859375, 2.6259536743164062, 0.3952045440673828, 1.2967529296875, 4.247039794921875, 0.7846832275390625, -0.6273345947265625, 3.10968017578125, 5.268192291259766, 3.4285011291503906], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000415.npy"}
{"epoch": 0.6273620559334845, "step": 416, "batch_size": 64, "mean": 1.0549399852752686, "std": 2.2809388637542725, "min": -3.4187698364257812, "p10": -1.8931365966796874, "median": 0.7799301147460938, "p90": 4.253446197509766, "max": 6.743194580078125, "pos_frac": 0.625, "sample": [0.5185909271240234, -2.4327392578125, -0.712738037109375, 0.704833984375, 0.23268890380859375, 5.426303863525391, -3.029083251953125, -0.0659332275390625, 2.063190460205078, 4.125373840332031, 0.8550262451171875, 3.3339691162109375, 6.166656494140625, 0.2217845916748047, -0.27160072326660156, 0.21906280517578125, -0.1397705078125, 1.4065055847167969, -3.4187698364257812, -0.6769866943359375, 1.523233413696289, 1.5593338012695312, 1.321014404296875, 1.5707359313964844, -0.562403678894043, -0.17993927001953125, 6.743194580078125, 4.71038818359375, 3.820770263671875, 0.23226165771484375, 2.099050521850586, 2.1187667846679688, -0.13958740234375, -2.9453468322753906, 2.6121597290039062, 2.107982635498047, 0.3270301818847656, -1.9283523559570312, -0.6527137756347656, 0.9833831787109375, -0.20080184936523438, 2.994476318359375, -1.8109664916992188, -1.0401668548583984, 0.4779052734375, -0.05169677734375, 5.666904449462891, -2.713165283203125, -0.5945758819580078, 3.88519287109375, -0.5291213989257812, 2.3490447998046875, 1.0441055297851562, 4.443138122558594, -2.1674957275390625, 2.2748947143554688, -0.0547027587890625, 1.6141204833984375, -0.443206787109375, 2.1536216735839844, 1.41766357421875, 3.613697052001953, 4.3083343505859375, 1.0316352844238281], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000416.npy"}
{"epoch": 0.6288737717309146, "step": 417, "batch_size": 64, "mean": 1.9535232782363892, "std": 2.4374024868011475, "min": -4.847450256347656, "p10": -0.7720294952392577, "median": 1.7648277282714844, "p90": 5.334207916259768, "max": 8.9349365234375, "pos_frac": 0.765625, "sample": [1.244070053100586, -0.6518211364746094, 2.5551071166992188, -0.82354736328125, 4.473541259765625, 2.1031570434570312, -1.266357421875, 5.555511474609375, 2.0011062622070312, 1.3452644348144531, 8.9349365234375, 0.585357666015625, 2.5644874572753906, 1.1574954986572266, 1.7655258178710938, -2.445394515991211, 6.339466094970703, -1.0595550537109375, -0.1171722412109375, 0.8571643829345703, 1.775909423828125, 4.6767120361328125, 1.9110164642333984, 1.2612838745117188, 1.3156929016113281, -0.6058807373046875, 5.924943923950195, 1.90838623046875, 4.817832946777344, 4.51763916015625, 2.3238906860351562, 3.3608551025390625, 6.986761093139648, 1.20501708984375, 1.764129638671875, -0.4682464599609375, -0.351348876953125, 1.5641365051269531, 3.599853515625, 1.5458831787109375, 2.9731826782226562, 1.2081413269042969, 2.5528221130371094, 2.8872833251953125, 2.459869384765625, 1.6468734741210938, 1.8188762664794922, 2.2303390502929688, -1.3417205810546875, 1.121673583984375, -0.040130615234375, 1.1919536590576172, -0.2523651123046875, -4.847450256347656, 2.9820098876953125, -1.8650016784667969, 3.2143611907958984, 5.946922302246094, 1.2533683776855469, -0.2886018753051758, 4.70332145690918, 1.4960460662841797, 2.9331932067871094, 6.8877105712890625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000417.npy"}
{"epoch": 0.6303854875283447, "step": 418, "batch_size": 64, "mean": 0.893629252910614, "std": 2.1550304889678955, "min": -2.8440303802490234, "p10": -1.855158233642578, "median": 0.5687904357910156, "p90": 3.7080703735351572, "max": 6.7878570556640625, "pos_frac": 0.640625, "sample": [1.2395172119140625, 1.4328250885009766, 2.645355224609375, 1.364105224609375, -1.6259326934814453, 2.7100143432617188, -0.38365936279296875, 2.264568328857422, 6.53021240234375, -0.814849853515625, 4.225948333740234, 5.27764892578125, -2.25225830078125, 2.26263427734375, 0.29781341552734375, -2.2304344177246094, 2.1778335571289062, 0.05594635009765625, 3.50787353515625, -2.2874374389648438, 0.11386871337890625, 6.7878570556640625, -0.1378173828125, 0.16272735595703125, 2.2394485473632812, 1.2804756164550781, -1.674896240234375, 2.44732666015625, 1.2673263549804688, 0.33977508544921875, -1.019927978515625, -1.8197021484375, -0.045154571533203125, 2.316189765930176, 5.998905181884766, -0.26523590087890625, 1.7177581787109375, -1.3344650268554688, 1.1498031616210938, -0.44222259521484375, -2.0702896118164062, -1.8703536987304688, -2.8440303802490234, -1.9092769622802734, 0.46979522705078125, 0.5142440795898438, 1.9010772705078125, 1.7477035522460938, -0.01166534423828125, -0.2227935791015625, 1.4485321044921875, 0.131134033203125, 3.7938690185546875, 1.24774169921875, 1.7253551483154297, -0.88824462890625, -0.7453231811523438, 4.2842864990234375, 0.9501285552978516, 2.955230712890625, 0.5753097534179688, -0.7187156677246094, 0.6865234375, 0.5622711181640625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000418.npy"}
{"epoch": 0.6318972033257747, "step": 419, "batch_size": 64, "mean": 1.717864751815796, "std": 2.0683624744415283, "min": -2.485736846923828, "p10": -0.8747863769531247, "median": 1.7450637817382812, "p90": 4.470871734619141, "max": 7.5589752197265625, "pos_frac": 0.796875, "sample": [0.5870437622070312, 4.243255615234375, 1.919504165649414, 1.7095489501953125, -1.3086681365966797, 1.4191093444824219, 3.1253662109375, 2.169973373413086, -0.5064849853515625, 2.164614677429199, -1.6108932495117188, 3.8281784057617188, 3.604888916015625, 3.350341796875, 1.2262153625488281, 0.4342498779296875, 2.3985137939453125, 0.8300628662109375, 4.305938720703125, -0.5375823974609375, -0.217529296875, 2.1163330078125, -2.485736846923828, 2.3550796508789062, 0.8094940185546875, -0.476470947265625, 3.628803253173828, -2.454925537109375, 0.09295654296875, 2.9602127075195312, 0.24009323120117188, -1.2676239013671875, -0.3775043487548828, 4.541557312011719, 4.5877532958984375, 1.1300773620605469, 0.5730934143066406, 0.6523475646972656, 0.16894912719726562, 3.3205642700195312, 3.0217552185058594, 3.944995880126953, 1.78057861328125, 0.4504241943359375, 2.8032455444335938, -1.2480087280273438, 5.074928283691406, 1.4678611755371094, 0.7244796752929688, 5.690040588378906, 4.9001007080078125, 1.9948101043701172, 1.8396453857421875, 1.953399658203125, 1.8748855590820312, 1.4418792724609375, 2.5809326171875, 1.2652587890625, 5.67596435546875, -1.0193023681640625, 0.8096656799316406, 2.4418563842773438, -0.33573150634765625, 7.5589752197265625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000419.npy"}
{"epoch": 0.6334089191232048, "step": 420, "batch_size": 64, "mean": 1.4142491817474365, "std": 2.7041070461273193, "min": -3.8566741943359375, "p10": -1.163650131225586, "median": 0.834650993347168, "p90": 4.996538543701172, "max": 9.968242645263672, "pos_frac": 0.6875, "sample": [0.287750244140625, -0.07190704345703125, 3.4685440063476562, -1.118011474609375, 5.067329406738281, -1.6942062377929688, -0.6176910400390625, 0.8341236114501953, -1.887176513671875, -2.215930938720703, 0.01177215576171875, 0.9872055053710938, -0.7790451049804688, 0.766571044921875, -0.8666229248046875, 4.305408477783203, 5.985076904296875, 0.11669158935546875, 6.823246002197266, 4.7947235107421875, 1.4882793426513672, -2.186443328857422, 2.3134536743164062, 2.7319765090942383, 1.8381404876708984, 3.892087936401367, -1.0751991271972656, -3.731121063232422, 0.8351783752441406, 2.2212352752685547, -1.0836334228515625, 3.1435775756835938, -0.0612030029296875, 1.8131256103515625, -0.4889984130859375, -0.62945556640625, 0.5431976318359375, 0.052417755126953125, 1.2673263549804688, 1.6050643920898438, 0.6995277404785156, -3.8566741943359375, -1.1676750183105469, -0.293609619140625, 3.8489456176757812, 1.723867416381836, 1.7460403442382812, 5.771780014038086, 0.6449127197265625, 0.5031280517578125, 4.83135986328125, 0.383148193359375, 1.0235404968261719, -1.1542587280273438, 0.353790283203125, 3.883676528930664, 4.346649169921875, 9.968242645263672, 2.0959129333496094, 7.657806396484375, 2.2296485900878906, 6.257438659667969, -0.5401077270507812, 0.8680038452148438], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000420.npy"}
{"epoch": 0.6349206349206349, "step": 421, "batch_size": 64, "mean": 1.0698715448379517, "std": 2.6369385719299316, "min": -5.133575439453125, "p10": -1.8242046356201171, "median": 0.9812936782836914, "p90": 4.405870056152343, "max": 8.668426513671875, "pos_frac": 0.65625, "sample": [-3.0260696411132812, 6.435150146484375, -0.7862148284912109, 1.535867691040039, 2.632354736328125, 2.631885528564453, 3.7445907592773438, -1.2161445617675781, 0.0504913330078125, 8.668426513671875, 4.402740478515625, -0.3650360107421875, 4.8734130859375, 2.3615875244140625, -1.4171867370605469, 1.490652084350586, 2.1079254150390625, -0.906219482421875, 3.5779342651367188, 2.7338943481445312, 4.736034393310547, 0.8794937133789062, -1.1818695068359375, -1.0204124450683594, 1.0081195831298828, 0.04135894775390625, 1.5878753662109375, -4.7285308837890625, -1.8735275268554688, -1.4888801574707031, -0.10826492309570312, 5.36351203918457, 4.4072113037109375, -0.5043163299560547, 0.940216064453125, 1.5441513061523438, -0.908203125, 2.6348648071289062, 0.17418289184570312, 0.6667270660400391, 1.8603363037109375, -0.019140243530273438, 1.5961685180664062, 6.1873626708984375, 0.08172607421875, 3.350910186767578, 1.3773117065429688, 0.19935989379882812, -3.9857177734375, -0.567901611328125, 4.0833740234375, 3.562530517578125, -2.2199630737304688, 1.9071578979492188, 0.03997802734375, -5.133575439453125, 0.9544677734375, 1.6192245483398438, 2.5589256286621094, -2.329721450805664, 1.4116973876953125, 2.4057655334472656, -0.4591636657714844, -1.7091178894042969], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000421.npy"}
{"epoch": 0.636432350718065, "step": 422, "batch_size": 64, "mean": 1.4293758869171143, "std": 2.6097023487091064, "min": -3.3241119384765625, "p10": -1.7788642883300776, "median": 1.2610206604003906, "p90": 4.616132354736329, "max": 8.75311279296875, "pos_frac": 0.71875, "sample": [3.0502166748046875, 7.6167144775390625, 1.3740653991699219, 0.9867782592773438, 2.500152587890625, -1.0026130676269531, 1.4033660888671875, -2.703033447265625, 1.6044502258300781, 8.75311279296875, 2.4804840087890625, 1.6157684326171875, -1.337188720703125, 4.74896240234375, 4.3787384033203125, 0.517913818359375, 1.1952629089355469, 5.383628845214844, -0.248870849609375, 0.6853256225585938, 2.4608535766601562, -0.2274169921875, -0.6107521057128906, -3.3241119384765625, 0.934295654296875, -2.3554229736328125, 2.1975326538085938, 1.2549972534179688, 6.9242401123046875, 1.9477615356445312, 3.0888214111328125, 0.801788330078125, 4.365020751953125, -3.0956153869628906, 2.038837432861328, -1.9514846801757812, 2.5299835205078125, -0.25070953369140625, 4.717872619628906, 0.49352264404296875, 0.80621337890625, -1.2882843017578125, 2.4052734375, 3.3972854614257812, 0.04345703125, -1.3195877075195312, -1.3640594482421875, 1.7843475341796875, 1.1167678833007812, 1.861572265625, -0.5427017211914062, 7.984161376953125, 0.8042240142822266, -2.5412826538085938, 2.4252548217773438, 2.7219924926757812, 1.0422382354736328, -2.8743667602539062, 3.603912353515625, 3.66595458984375, 1.0669631958007812, 1.2670440673828125, -1.3760833740234375, 1.8465118408203125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000422.npy"}
{"epoch": 0.6379440665154951, "step": 423, "batch_size": 64, "mean": 1.6832462549209595, "std": 2.202481269836426, "min": -3.0238685607910156, "p10": -1.0720153808593749, "median": 1.4423294067382812, "p90": 4.911860466003419, "max": 7.863433837890625, "pos_frac": 0.796875, "sample": [1.7723770141601562, -1.0993576049804688, 3.9693832397460938, -0.43888092041015625, 7.863433837890625, 0.19244384765625, -0.7865753173828125, -1.0082168579101562, 5.515869140625, 0.9048385620117188, -1.128814697265625, 4.757364273071289, 1.487060546875, 0.2374267578125, -0.573883056640625, 1.6660232543945312, 0.7565040588378906, -1.4262847900390625, 5.71929931640625, -1.8601760864257812, 1.7491626739501953, 2.170074462890625, -1.3935470581054688, 2.7495155334472656, 0.6290493011474609, 3.4516220092773438, 5.934822082519531, 0.06280899047851562, -3.0238685607910156, 1.3975982666015625, 4.1202392578125, 4.279869079589844, 3.4281005859375, 2.5142364501953125, 1.899871826171875, 2.2719154357910156, 0.9242401123046875, 1.2816925048828125, 4.734947204589844, 5.1765289306640625, 1.0512580871582031, 1.6338701248168945, -0.2218780517578125, 1.556610107421875, 0.9886398315429688, 2.2878456115722656, 2.1284751892089844, 0.23279762268066406, 1.0235481262207031, 2.3547229766845703, 4.727996826171875, 2.2076873779296875, -0.09859848022460938, 0.9962062835693359, 0.55816650390625, 5.3152008056640625, 0.850738525390625, 1.6390838623046875, 0.4632682800292969, 1.0592803955078125, 4.9780731201171875, -1.9155807495117188, 2.112070083618164, 0.9195623397827148], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000423.npy"}
{"epoch": 0.6394557823129252, "step": 424, "batch_size": 64, "mean": 1.6546807289123535, "std": 2.1958467960357666, "min": -3.2435836791992188, "p10": -1.2522422790527343, "median": 1.6976318359375, "p90": 4.683242797851563, "max": 7.343173980712891, "pos_frac": 0.796875, "sample": [-1.9719924926757812, 3.8843231201171875, 1.6200408935546875, 7.343173980712891, 0.566558837890625, 1.9258251190185547, 2.082550048828125, 3.3502960205078125, -1.9143295288085938, -3.1596221923828125, 0.844879150390625, 2.4875946044921875, -1.0423107147216797, -1.7099075317382812, 5.648929595947266, 1.4589576721191406, 1.4299545288085938, -1.1731643676757812, 1.0526504516601562, 2.1339759826660156, 0.6086578369140625, 2.42388916015625, 0.5767898559570312, -1.2861328125, -0.45957183837890625, 1.7752227783203125, 2.92144775390625, 0.427520751953125, 5.531166076660156, 2.4066829681396484, 2.6222381591796875, 0.5232009887695312, -0.20098876953125, 0.23735618591308594, 4.877613067626953, 0.6021041870117188, 1.974945068359375, 4.47998046875, 3.610565185546875, 0.85101318359375, 1.372201919555664, 3.1650848388671875, 1.7791805267333984, 4.775550842285156, 0.7483863830566406, 5.27452278137207, 3.7470932006835938, 2.3126564025878906, -1.6702880859375, -0.5681076049804688, 2.881805419921875, 1.4363555908203125, 1.780303955078125, 3.2537574768066406, -0.15914154052734375, 0.8295974731445312, 0.7800140380859375, -3.2435836791992188, 4.414543151855469, 4.391330718994141, 1.887054443359375, 4.770355224609375, 0.2747955322265625, 2.3040122985839844], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000424.npy"}
{"epoch": 0.6409674981103552, "step": 425, "batch_size": 64, "mean": 1.7144721746444702, "std": 2.0635359287261963, "min": -3.9404983520507812, "p10": -0.8421211242675781, "median": 1.800323486328125, "p90": 4.272206115722656, "max": 6.812538146972656, "pos_frac": 0.78125, "sample": [1.000518798828125, 1.1183242797851562, 0.0181427001953125, 0.976654052734375, 4.3822021484375, -0.1357250213623047, 6.812538146972656, 0.19931411743164062, 5.002132415771484, 2.2796630859375, 2.5833740234375, -0.9069499969482422, 2.4852676391601562, 1.492340087890625, -1.9951705932617188, 1.5784759521484375, -2.0672073364257812, 2.2325439453125, 0.0728759765625, 2.4097328186035156, -0.09434127807617188, 2.8531417846679688, -0.7294235229492188, -0.3260030746459961, 3.6043472290039062, 0.7656097412109375, 2.5758438110351562, 3.113739013671875, 0.2757549285888672, 5.915779113769531, 0.9495391845703125, -0.025432586669921875, 2.2468338012695312, 1.7310676574707031, 2.8737335205078125, 0.47377777099609375, 0.32338714599609375, 4.8578643798828125, 1.8695793151855469, 1.4233322143554688, 3.8570480346679688, -0.8544921875, 1.0836219787597656, 3.9082489013671875, 3.0259017944335938, 3.7134628295898438, 2.2005538940429688, 2.378368377685547, 2.2654266357421875, 2.3346023559570312, 4.3915252685546875, 3.114765167236328, -1.0050296783447266, 3.6509323120117188, -1.84735107421875, -0.8132553100585938, 4.168834686279297, 2.6570663452148438, -0.09170150756835938, 4.285980224609375, 1.40301513671875, 1.3819503784179688, 4.2400665283203125, -3.9404983520507812], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000425.npy"}
{"epoch": 0.6424792139077853, "step": 426, "batch_size": 64, "mean": 0.787992000579834, "std": 2.5983150005340576, "min": -6.8232269287109375, "p10": -2.3328208923339844, "median": 0.5246105194091797, "p90": 4.134884643554687, "max": 6.81524658203125, "pos_frac": 0.609375, "sample": [-2.523681640625, -0.6476287841796875, 3.140583038330078, 2.6179046630859375, 3.57183837890625, 6.0416717529296875, 0.1807708740234375, 5.664875030517578, 0.94134521484375, 2.4692001342773438, -4.408164978027344, -1.3457107543945312, 0.3651580810546875, -2.9131088256835938, 0.44744873046875, -0.6182746887207031, 4.071689605712891, 0.9720687866210938, 0.15386962890625, -0.2837104797363281, -2.1194114685058594, -1.3073348999023438, 0.2413177490234375, -0.00862884521484375, -1.5110130310058594, 1.3901214599609375, -0.5172882080078125, -6.8232269287109375, 2.1504993438720703, 1.0161285400390625, 2.342010498046875, 4.161968231201172, 3.0259323120117188, 1.2978096008300781, -3.0678253173828125, 0.8161697387695312, -2.5481033325195312, 1.6569499969482422, 0.455902099609375, 6.81524658203125, 5.164203643798828, 2.41033935546875, -2.4242820739746094, -0.5958251953125, -0.8278961181640625, 5.760978698730469, -1.1448974609375, 0.8390426635742188, -0.3595314025878906, -1.18951416015625, -1.7566184997558594, 3.5298614501953125, 0.6600494384765625, 3.9348907470703125, 1.5969619750976562, -0.05823516845703125, 0.5933189392089844, 0.153717041015625, 2.1998291015625, -1.2412261962890625, 2.629974365234375, 0.9224815368652344, 4.918487548828125, -0.6499919891357422], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000426.npy"}
{"epoch": 0.6439909297052154, "step": 427, "batch_size": 64, "mean": 1.4609463214874268, "std": 2.3065099716186523, "min": -4.063539505004883, "p10": -1.172187805175781, "median": 1.2085294723510742, "p90": 4.4259035110473635, "max": 9.063690185546875, "pos_frac": 0.71875, "sample": [3.8117752075195312, 1.347513198852539, -0.8884410858154297, 0.0536956787109375, 1.4946746826171875, 1.5269508361816406, 4.591663360595703, -0.9394989013671875, 2.45440673828125, -1.9080429077148438, 3.351593017578125, -0.01165771484375, -1.9652061462402344, -1.899383544921875, 3.1352005004882812, 2.42181396484375, -0.2737884521484375, 2.2138595581054688, 0.5087432861328125, 1.5244979858398438, -0.1179962158203125, -0.507568359375, 7.17333984375, 1.4933738708496094, 1.529754638671875, 1.7374191284179688, 1.0695457458496094, 4.503959655761719, 1.0330162048339844, 1.4430122375488281, 4.435512542724609, 1.0212364196777344, -1.377044677734375, 0.3562202453613281, 1.377166748046875, 4.373424530029297, -0.3657207489013672, -4.063539505004883, 3.7401885986328125, 3.8441123962402344, 0.9559516906738281, 0.5839424133300781, 1.356842041015625, 0.9970130920410156, 0.6711997985839844, -1.360076904296875, 2.2373275756835938, 5.2362213134765625, 3.0138092041015625, -0.7828254699707031, 4.403482437133789, 3.2503662109375, 0.27968597412109375, -0.0758819580078125, 2.2866287231445312, -0.052886962890625, 1.0605697631835938, 2.0826416015625, 0.666656494140625, -1.27191162109375, 0.3600883483886719, -0.607635498046875, 9.063690185546875, 5.895881652832031], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000427.npy"}
{"epoch": 0.6455026455026455, "step": 428, "batch_size": 64, "mean": 1.5586960315704346, "std": 2.5606346130371094, "min": -3.264678955078125, "p10": -1.615732955932617, "median": 1.1777839660644531, "p90": 5.180727005004883, "max": 6.98486328125, "pos_frac": 0.671875, "sample": [-1.90771484375, -0.5316696166992188, 5.208221435546875, -0.21122360229492188, -0.10543632507324219, 0.19476699829101562, 0.46077728271484375, -2.807098388671875, -1.5332984924316406, -1.37481689453125, 3.1935348510742188, -3.264678955078125, 1.7462921142578125, 0.6149749755859375, 3.475879669189453, -0.32421875, 5.094764709472656, 1.683258056640625, -0.07648468017578125, -0.0863800048828125, 3.4897918701171875, 2.5877857208251953, -0.8670883178710938, 2.3068771362304688, 3.8762474060058594, 1.2850875854492188, 0.249053955078125, -1.815399169921875, 6.155792236328125, -3.04534912109375, 2.123504638671875, -0.35790252685546875, 1.058624267578125, 2.0222015380859375, 3.6227264404296875, 4.8970794677734375, 2.863494873046875, 3.8934783935546875, 0.40538787841796875, -0.9603710174560547, 1.90740966796875, 6.515533447265625, 3.5200042724609375, 5.726966857910156, -1.65106201171875, -0.48860931396484375, 2.11468505859375, 1.0704803466796875, 0.5589370727539062, -0.604278564453125, 0.570281982421875, 1.7091751098632812, 3.1271190643310547, -0.43633270263671875, 6.568471908569336, 4.965625762939453, 5.854534149169922, 5.116573333740234, 1.428985595703125, 0.7237205505371094, 6.98486328125, 2.2262468338012695, 0.9200973510742188, -1.913360595703125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000428.npy"}
|
||||
{"epoch": 0.6470143613000756, "step": 429, "batch_size": 64, "mean": 1.6621168851852417, "std": 2.65059494972229, "min": -7.240001678466797, "p10": -1.4904104232788082, "median": 1.6148052215576172, "p90": 4.495606040954591, "max": 8.081863403320312, "pos_frac": 0.765625, "sample": [1.4443893432617188, -3.0068817138671875, -0.5129222869873047, 1.779449462890625, 2.627899169921875, 3.6268539428710938, 2.5987701416015625, 3.430816650390625, -0.3023872375488281, -0.3929595947265625, 0.4163551330566406, -2.675506591796875, 0.316925048828125, 1.5732917785644531, 1.5647048950195312, 3.4378433227539062, -1.7312850952148438, 1.6563186645507812, 0.5867156982421875, 1.931783676147461, 2.0211639404296875, 1.6568527221679688, 2.532073974609375, 3.9912357330322266, 5.941314697265625, 4.0208892822265625, 5.5654449462890625, 1.3983726501464844, 2.0678939819335938, -1.6386737823486328, -2.2891464233398438, -7.240001678466797, -0.6521530151367188, 0.544677734375, 4.229045867919922, 0.6650848388671875, -1.1444625854492188, 2.4092063903808594, 1.068817138671875, 1.3086090087890625, 4.609846115112305, 8.041038513183594, 0.29792022705078125, 3.965412139892578, 5.392799377441406, -2.4980430603027344, -0.16552734375, 1.4922332763671875, 1.1549205780029297, 1.7140045166015625, 1.1837444305419922, 7.9069976806640625, 2.7174835205078125, -0.3913688659667969, 2.070178985595703, 1.4282608032226562, 3.5699996948242188, 2.7934417724609375, 8.081863403320312, 2.849945068359375, 2.702556610107422, 0.1081857681274414, -0.47411346435546875, 2.9972763061523438], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000429.npy"}
|
||||
{"epoch": 0.6485260770975056, "step": 430, "batch_size": 64, "mean": 1.7297089099884033, "std": 2.2351622581481934, "min": -3.5413246154785156, "p10": -0.6493251800537108, "median": 1.7782516479492188, "p90": 4.039882659912109, "max": 9.256820678710938, "pos_frac": 0.8125, "sample": [2.5153045654296875, 0.35286712646484375, -0.01024627685546875, 3.2030487060546875, -1.3393630981445312, 2.6081085205078125, 0.2190532684326172, 6.386199951171875, 1.8460922241210938, 2.7401962280273438, -2.306783676147461, 2.407501220703125, 4.1868896484375, 1.3660011291503906, 6.791145324707031, 3.5432891845703125, 5.398216247558594, 3.7894821166992188, 3.6773910522460938, 0.5238361358642578, 1.9527664184570312, 2.7004623413085938, 1.7104110717773438, -0.5041389465332031, -0.058956146240234375, 4.05291748046875, 1.1600303649902344, 2.638427734375, 3.6336441040039062, 1.4739608764648438, 1.5666046142578125, 0.8884124755859375, 2.1391143798828125, 2.308013916015625, 0.521575927734375, 9.256820678710938, 4.009468078613281, 1.2373809814453125, -3.5413246154785156, 3.243471145629883, 0.8102493286132812, 1.3532257080078125, 0.9727020263671875, -0.7115478515625, 0.3678245544433594, 2.270648956298828, 0.3524017333984375, 0.5804786682128906, -2.672698974609375, -2.235626220703125, 2.5653839111328125, 3.0999221801757812, -0.07244491577148438, 4.623291015625, 0.11133575439453125, 2.9474029541015625, 2.264892578125, 1.9471473693847656, 0.7702789306640625, 2.0799102783203125, -0.43795013427734375, -2.016572952270508, 0.8929595947265625, 2.5508651733398438], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000430.npy"}
|
||||
{"epoch": 0.6500377928949358, "step": 431, "batch_size": 64, "mean": 1.483382225036621, "std": 2.438079833984375, "min": -4.9790496826171875, "p10": -1.227689170837402, "median": 1.230020523071289, "p90": 4.617028045654298, "max": 7.736701965332031, "pos_frac": 0.75, "sample": [-4.9790496826171875, 1.2549247741699219, 4.040252685546875, 0.6242351531982422, -0.33824920654296875, 2.6566619873046875, 4.897552490234375, -0.08832550048828125, 3.1844482421875, -2.082132339477539, 0.48422813415527344, 5.1215667724609375, 1.7794132232666016, 0.90380859375, 0.3060131072998047, 0.11603927612304688, 4.0579681396484375, -1.3485221862792969, 5.787567138671875, -0.212493896484375, 2.3818893432617188, -3.1234664916992188, -0.42848968505859375, 2.9383468627929688, -1.6644439697265625, 3.6353721618652344, 6.687915802001953, -0.19584083557128906, 2.12445068359375, -0.9457454681396484, -0.16269874572753906, 0.9478797912597656, 0.9659881591796875, 4.3275146484375, -0.3201904296875, -0.8780364990234375, 4.7024993896484375, 1.5982284545898438, -3.5892333984375, 1.1458816528320312, 0.5343170166015625, 1.59967041015625, -2.9762039184570312, 4.012367248535156, 1.9976119995117188, 0.3041572570800781, 5.244770050048828, 2.5680999755859375, 2.090951919555664, 7.736701965332031, 1.2051162719726562, 3.8409690856933594, 0.24859619140625, 1.39923095703125, 2.2159271240234375, 2.7089920043945312, 0.45503997802734375, 0.4811515808105469, 2.065216064453125, 4.417594909667969, 3.1412582397460938, 0.5492706298828125, 1.8796844482421875, 0.9022445678710938], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000431.npy"}
|
||||
{"epoch": 0.6515495086923658, "step": 432, "batch_size": 64, "mean": 1.7330987453460693, "std": 2.4722163677215576, "min": -3.8801612854003906, "p10": -0.9355014801025389, "median": 1.5206170082092285, "p90": 5.183551406860352, "max": 7.7063140869140625, "pos_frac": 0.75, "sample": [1.1674728393554688, 0.9086456298828125, -0.1818695068359375, -2.9333953857421875, 1.485600471496582, 1.8756523132324219, -0.3797645568847656, 2.168621063232422, 0.132965087890625, 2.2958831787109375, 0.22329330444335938, -3.8801612854003906, -1.00146484375, -0.2891578674316406, 2.649993896484375, -1.70916748046875, 1.555633544921875, 1.6535186767578125, 7.7063140869140625, 5.618408203125, -2.7072677612304688, 2.734790802001953, 4.5076904296875, -0.1568603515625, 0.5441036224365234, 3.790576934814453, 2.3902130126953125, 5.0516204833984375, 2.3901519775390625, -0.5502548217773438, -1.0011138916015625, 0.07114410400390625, 0.4944286346435547, 1.142547607421875, 2.0218505859375, -0.7824058532714844, 7.153694152832031, 3.5504302978515625, 3.184661865234375, 4.098522186279297, 0.84625244140625, 2.64752197265625, 2.5018463134765625, 0.7954368591308594, 4.4380950927734375, 2.349212646484375, -0.6769962310791016, 5.769645690917969, 0.4396514892578125, 7.21539306640625, 1.4633255004882812, 3.2290802001953125, 1.1311874389648438, -0.5807342529296875, 1.9945831298828125, 6.74725341796875, -1.6092987060546875, 3.41778564453125, 3.7558746337890625, -0.18857192993164062, 0.32843017578125, 5.240093231201172, 1.75531005859375, 0.9123954772949219], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000432.npy"}
|
||||
{"epoch": 0.6530612244897959, "step": 433, "batch_size": 64, "mean": 1.8397008180618286, "std": 2.897702693939209, "min": -4.070308685302734, "p10": -2.3076988220214845, "median": 1.7368545532226562, "p90": 5.448294830322266, "max": 12.587848663330078, "pos_frac": 0.796875, "sample": [0.32403564453125, 1.7837944030761719, -2.333221435546875, 1.3424415588378906, 0.7813873291015625, 1.4537582397460938, -0.2981414794921875, 6.484893798828125, 0.7676429748535156, -0.41412353515625, 0.45123291015625, 1.8725662231445312, 0.3845367431640625, -2.8453750610351562, 6.7355804443359375, -2.862262725830078, 1.4041213989257812, 3.0047035217285156, 2.925436019897461, -0.4683876037597656, 2.9515609741210938, 1.91534423828125, -0.28104400634765625, 2.5195693969726562, -4.070308685302734, 1.197164535522461, 3.2831878662109375, 0.2358531951904297, 4.849029541015625, 3.765228271484375, 0.7697391510009766, 2.5540237426757812, -2.2481460571289062, 1.3950653076171875, 2.8004837036132812, -2.8593711853027344, -2.7397499084472656, 0.86663818359375, 2.195587158203125, 12.587848663330078, 5.408416748046875, 6.091987609863281, 6.055351257324219, 0.6445732116699219, 2.66412353515625, -2.6853179931640625, 0.3418998718261719, 2.3600387573242188, 1.1056365966796875, 5.397178649902344, 0.38037109375, 4.133674621582031, 1.761688232421875, -1.3078994750976562, 1.7120208740234375, 3.8148345947265625, 3.2831459045410156, 1.0147552490234375, 4.8460235595703125, 1.8007278442382812, 5.465385437011719, 1.7806167602539062, 7.539327621459961, 1.9499664306640625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000433.npy"}
|
||||
{"epoch": 0.654572940287226, "step": 434, "batch_size": 64, "mean": 1.6875743865966797, "std": 2.5521912574768066, "min": -4.334266662597656, "p10": -1.5592735290527342, "median": 1.7284374237060547, "p90": 4.884815979003907, "max": 8.307373046875, "pos_frac": 0.765625, "sample": [-3.2857589721679688, 3.7196884155273438, 1.7858257293701172, -0.37921142578125, 1.2130203247070312, 5.886627197265625, -0.9670991897583008, 3.828624725341797, -3.22344970703125, 2.0024032592773438, -0.564453125, 0.13748550415039062, -4.334266662597656, 1.8042449951171875, 3.6212539672851562, 0.6052093505859375, 4.624519348144531, 1.6377410888671875, 8.307373046875, 3.42132568359375, 3.3757667541503906, -1.1126651763916016, 5.744224548339844, -0.580322265625, -2.8870372772216797, 2.666259765625, 5.606578826904297, 0.9706878662109375, 1.1514205932617188, 3.0792465209960938, -1.3463287353515625, 0.8089675903320312, 2.033050537109375, 6.43243408203125, 6.154521942138672, 3.2195281982421875, -2.5433502197265625, 3.121826171875, 2.5304412841796875, 0.20579147338867188, 4.957756042480469, -1.6505355834960938, 1.322509765625, 2.1212825775146484, 1.356292724609375, 0.43346405029296875, 0.016172409057617188, 2.748504638671875, 1.5341949462890625, 1.6138420104980469, -0.4005565643310547, 4.382232666015625, -0.4290771484375, 2.0062713623046875, 2.5037879943847656, 0.6503677368164062, 1.6710491180419922, 4.714622497558594, 2.317495346069336, -2.1723403930664062, 3.612751007080078, 2.1227951049804688, 2.9186363220214844, 1.1810970306396484], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000434.npy"}
|
||||
{"epoch": 0.656084656084656, "step": 435, "batch_size": 64, "mean": 1.271209955215454, "std": 2.036027431488037, "min": -3.13238525390625, "p10": -1.0096916198730468, "median": 1.283233642578125, "p90": 3.8872642517089844, "max": 6.846965789794922, "pos_frac": 0.6875, "sample": [4.581729888916016, 2.5758543014526367, -0.1519927978515625, 0.9488162994384766, -1.1138191223144531, 2.0709304809570312, 1.18670654296875, 0.24541091918945312, 2.4074935913085938, -0.6428031921386719, 3.1572265625, 2.6234092712402344, -0.5720596313476562, 3.8531417846679688, 4.989952087402344, 1.7579269409179688, 0.1163482666015625, -3.13238525390625, -3.10406494140625, 1.653228759765625, 2.1385459899902344, -1.0478324890136719, 2.6814613342285156, 3.8819503784179688, 1.764495849609375, -0.12075042724609375, -0.54656982421875, 0.7479476928710938, 4.613304138183594, 5.377899169921875, 1.405059814453125, 4.0770263671875, -2.8058996200561523, 2.050344467163086, 1.4692039489746094, 3.0989913940429688, -0.3772010803222656, 2.123321533203125, 1.4677925109863281, 1.6145706176757812, 0.78167724609375, 3.234405517578125, 6.846965789794922, 3.4710311889648438, 1.6737613677978516, 2.2473907470703125, -0.19904327392578125, -0.8269720077514648, -1.2766189575195312, 3.8895416259765625, -0.9206962585449219, 0.9683380126953125, -0.10657882690429688, 0.5804443359375, 1.712677001953125, 1.255859375, 1.0628204345703125, -0.5544052124023438, -1.998199462890625, 1.31060791015625, 0.49056243896484375, -0.1711273193359375, 1.1381988525390625, -0.3179168701171875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000435.npy"}
|
||||
{"epoch": 0.6575963718820862, "step": 436, "batch_size": 64, "mean": 1.3377701044082642, "std": 2.280937433242798, "min": -5.294761657714844, "p10": -1.5903808593749997, "median": 1.2607712745666504, "p90": 4.260557556152345, "max": 6.4059906005859375, "pos_frac": 0.734375, "sample": [-0.00756072998046875, 0.77008056640625, 2.4981155395507812, 1.1522979736328125, 0.16567230224609375, 1.646148681640625, 0.06741523742675781, 2.561288833618164, -1.6585006713867188, 1.1137123107910156, 3.360309600830078, 4.3811187744140625, 2.029022216796875, 3.979248046875, 1.131307601928711, 6.065502166748047, 6.4059906005859375, 5.349639892578125, -5.294761657714844, 2.2271385192871094, 2.7186279296875, 0.7623023986816406, -1.10955810546875, 1.8943939208984375, 0.6992588043212891, 1.4897918701171875, 3.4683074951171875, 1.0372734069824219, 1.4664764404296875, 3.3820724487304688, 1.7258377075195312, -2.139129638671875, -3.4009952545166016, 1.27679443359375, -0.27750396728515625, 0.43836212158203125, 0.7239227294921875, -1.4314346313476562, -0.6971054077148438, -1.1680068969726562, 3.377716064453125, 3.4110946655273438, 1.6908721923828125, 0.5983009338378906, -0.14676666259765625, 1.2447481155395508, -2.0527572631835938, 4.8511505126953125, 1.5732345581054688, 3.142852783203125, 1.9131126403808594, -0.5287303924560547, 0.9171295166015625, 2.2460708618164062, -0.15069961547851562, 3.1127777099609375, 5.361001968383789, -2.327972412109375, 2.1225814819335938, 1.0143489837646484, -1.094085693359375, 5.077201843261719, -1.7240371704101562, 3.185272216796875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000436.npy"}
|
||||
{"epoch": 0.6591080876795162, "step": 437, "batch_size": 64, "mean": 1.5704649686813354, "std": 2.822012424468994, "min": -4.3666534423828125, "p10": -1.473401641845703, "median": 1.4053773880004883, "p90": 5.307758140563967, "max": 10.870227813720703, "pos_frac": 0.6875, "sample": [5.8929443359375, 4.397102355957031, 2.4083404541015625, -0.5824127197265625, 2.5661239624023438, -0.18592071533203125, 3.7419891357421875, 0.525054931640625, 8.242500305175781, 0.7181510925292969, 4.015892028808594, 2.6503143310546875, -4.3666534423828125, 0.431060791015625, -1.4981918334960938, -2.0892333984375, 2.8108291625976562, 2.7916793823242188, 2.5033111572265625, -1.2810440063476562, 5.5329132080078125, 6.495750427246094, 5.807182312011719, -0.677276611328125, -0.14725494384765625, 3.6840896606445312, 4.2181396484375, -3.1977195739746094, -3.2204856872558594, 0.115509033203125, 0.11800003051757812, -1.0542240142822266, -1.8062591552734375, -1.415557861328125, 1.9048957824707031, 6.4478912353515625, 2.5021820068359375, -1.1024932861328125, 2.609893798828125, 0.016466140747070312, 2.1334075927734375, 0.5966644287109375, -0.5627994537353516, 3.5385665893554688, -1.7598114013671875, 4.78239631652832, -0.13259124755859375, 2.1037540435791016, 3.25250244140625, 0.2831878662109375, 1.1919193267822266, 2.131366729736328, 0.072540283203125, 10.870227813720703, 1.61883544921875, 0.5125370025634766, 1.8264389038085938, 4.018218994140625, 0.0229339599609375, 1.8498764038085938, -0.30901336669921875, -0.31070709228515625, -0.99530029296875, 3.251129150390625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000437.npy"}
|
||||
{"epoch": 0.6606198034769464, "step": 438, "batch_size": 64, "mean": 1.8056901693344116, "std": 2.388091802597046, "min": -2.393157958984375, "p10": -1.4904602050781248, "median": 1.4566230773925781, "p90": 5.0161685943603525, "max": 6.699337005615234, "pos_frac": 0.765625, "sample": [3.715057373046875, -2.042163848876953, 6.4873046875, -1.6500244140625, 0.7297706604003906, -1.147216796875, -0.3676414489746094, 2.4889984130859375, 1.3440170288085938, 4.67877197265625, 1.5449714660644531, 5.96649169921875, -0.9412002563476562, 0.4054756164550781, 2.7904624938964844, 1.3297195434570312, 2.8549537658691406, -1.5184364318847656, 2.6261024475097656, 1.2424049377441406, 4.605052947998047, 4.82452392578125, -1.4251823425292969, 0.5563201904296875, 3.2778091430664062, 1.0220279693603516, 4.328277587890625, 1.3682746887207031, 4.239675521850586, 5.247444152832031, 2.220428466796875, -0.1431884765625, 3.734222412109375, 1.1951560974121094, 0.0750885009765625, 5.3814697265625, 1.2484512329101562, -1.6010360717773438, 0.8573417663574219, 3.6095104217529297, 5.093021392822266, 0.7056884765625, 1.9988317489624023, 4.750091552734375, 3.393585205078125, 3.248687744140625, 2.54217529296875, 0.5658340454101562, -2.393157958984375, 1.3197593688964844, -1.2287445068359375, -2.0638675689697266, 1.5783977508544922, 5.75323486328125, 1.6564750671386719, -1.5339813232421875, 6.699337005615234, 2.8911819458007812, -1.3798904418945312, 4.836845397949219, 0.8409538269042969, -1.2303848266601562, 0.7960128784179688, 1.5645942687988281], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000438.npy"}
|
||||
{"epoch": 0.6621315192743764, "step": 439, "batch_size": 64, "mean": 1.7630020380020142, "std": 2.2886180877685547, "min": -3.3570728302001953, "p10": -0.8284729003906248, "median": 1.5422210693359375, "p90": 5.0088851928710945, "max": 6.951515197753906, "pos_frac": 0.78125, "sample": [4.0817718505859375, 0.8258705139160156, -0.5940189361572266, -0.5344429016113281, 2.286945343017578, 3.4311676025390625, 0.9237823486328125, 0.123199462890625, 1.5891799926757812, 2.732452392578125, 5.696022033691406, 0.45917701721191406, 2.4540672302246094, 1.373260498046875, 2.611663818359375, 4.7212371826171875, 5.0953521728515625, 2.0630035400390625, 2.3351898193359375, 1.4952621459960938, 0.763397216796875, 5.722858428955078, 2.7005996704101562, -3.3570728302001953, 2.2853317260742188, 1.44989013671875, 6.951515197753906, 0.1551513671875, 0.2518501281738281, 4.80712890625, -1.6652297973632812, 5.979949951171875, -2.3886871337890625, -0.6418304443359375, 4.390010833740234, 1.4277801513671875, -0.06760025024414062, 0.5385360717773438, -2.9638214111328125, -0.6184120178222656, -0.382537841796875, 1.0308837890625, 3.93133544921875, 2.5425872802734375, 3.255207061767578, 3.2640228271484375, 4.5639495849609375, 0.8973922729492188, 1.7429580688476562, 0.4660148620605469, 0.5885238647460938, -0.9084625244140625, 1.4874038696289062, 5.408973693847656, 1.7241401672363281, 6.53759765625, 2.09918212890625, -1.2617645263671875, -0.918792724609375, -0.49457550048828125, 1.9749755859375, 3.2371978759765625, 2.06903076171875, 1.0853958129882812], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000439.npy"}
|
||||
{"epoch": 0.6636432350718064, "step": 440, "batch_size": 64, "mean": 2.1909286975860596, "std": 2.28584361076355, "min": -1.9465255737304688, "p10": -0.6693614959716796, "median": 1.988210678100586, "p90": 5.307752227783204, "max": 6.89642333984375, "pos_frac": 0.765625, "sample": [2.310514450073242, 1.0178680419921875, 5.826793670654297, 3.8790435791015625, 6.529598236083984, 5.3353118896484375, 6.0543060302734375, -0.29681396484375, 4.838645935058594, 2.4551544189453125, 1.5650405883789062, 5.208339691162109, -0.14654159545898438, 5.1756439208984375, 0.498809814453125, 0.6759567260742188, 5.243446350097656, -0.13584136962890625, -1.7302322387695312, 1.3031072616577148, 2.7414703369140625, -0.4480161666870117, -0.9426803588867188, 4.2713470458984375, 2.70477294921875, 2.810474395751953, 2.6877212524414062, 2.7432098388671875, 0.14052391052246094, 1.8459396362304688, 0.24599838256835938, 2.4238815307617188, -0.06904983520507812, 1.2464675903320312, -0.5426883697509766, 0.7858009338378906, 1.8841133117675781, -0.6884536743164062, 2.050090789794922, 4.739521026611328, 1.92633056640625, 3.256000518798828, 1.0157470703125, 4.276914596557617, -1.9465255737304688, -0.23542022705078125, 6.89642333984375, 2.9444541931152344, 5.077461242675781, -1.1191253662109375, -1.6461334228515625, -0.6248130798339844, 5.501106262207031, 4.574211120605469, 3.17071533203125, 1.7842578887939453, 1.109832763671875, -0.7087001800537109, 2.34490966796875, 3.1595916748046875, 1.8388214111328125, 4.413539886474609, 5.445964813232422, 1.5252761840820312], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000440.npy"}
|
||||
{"epoch": 0.6651549508692366, "step": 441, "batch_size": 64, "mean": 1.4984140396118164, "std": 2.1906914710998535, "min": -3.119220733642578, "p10": -1.3577167510986328, "median": 1.380035400390625, "p90": 3.996173858642578, "max": 7.11981201171875, "pos_frac": 0.796875, "sample": [3.262958526611328, 0.19715499877929688, 3.8927955627441406, 2.1294708251953125, 2.34918212890625, 0.16889572143554688, 3.2358474731445312, 1.7508316040039062, 3.6458892822265625, -1.7806777954101562, -3.119220733642578, 2.2392730712890625, 0.4468269348144531, 5.448211669921875, 0.27648162841796875, 0.06707763671875, 5.157135009765625, 2.1671142578125, -2.1388931274414062, 5.263031005859375, 2.0328826904296875, 0.22049713134765625, 1.40008544921875, 3.8654937744140625, -0.7924766540527344, 6.507953643798828, 0.8513870239257812, 2.3234291076660156, 7.11981201171875, -2.7806396484375, 0.8034820556640625, 1.075286865234375, -1.4730682373046875, 3.9581222534179688, 3.0876617431640625, -1.0900077819824219, -1.3649139404296875, 3.3209095001220703, 2.095905303955078, 1.3599853515625, 0.282623291015625, -0.9394302368164062, 1.0665740966796875, 1.2102928161621094, 1.15496826171875, -0.3328399658203125, 1.3330612182617188, 1.5477714538574219, 0.424163818359375, -1.0701255798339844, 4.012481689453125, -1.3409233093261719, 3.396514892578125, 4.8921051025390625, -1.737823486328125, 2.3574676513671875, 2.3364715576171875, 2.6901092529296875, 0.28205108642578125, 1.58245849609375, 0.9683971405029297, 2.3449172973632812, 2.083524703979492, 0.17251014709472656], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000441.npy"}
|
||||
{"epoch": 0.6666666666666666, "step": 442, "batch_size": 64, "mean": 1.4496479034423828, "std": 2.326754093170166, "min": -4.7397003173828125, "p10": -0.9575839996337888, "median": 1.4297218322753906, "p90": 4.441453170776369, "max": 8.100296020507812, "pos_frac": 0.765625, "sample": [0.075347900390625, 1.0920562744140625, 0.3675537109375, -0.0907745361328125, 0.7437171936035156, 6.658599853515625, 8.100296020507812, 5.410789489746094, 1.1796417236328125, 1.270538330078125, 2.3257522583007812, 2.2837371826171875, 3.8336563110351562, 4.82208251953125, 1.6267890930175781, 3.667417526245117, -0.33194732666015625, 1.4511260986328125, 0.050537109375, -2.985492706298828, -0.45047760009765625, 1.9088802337646484, -0.5678939819335938, 1.1493148803710938, -4.7397003173828125, 2.566314697265625, -1.5610733032226562, 2.0322532653808594, 3.0325775146484375, 2.408843994140625, 2.107349395751953, 2.1695327758789062, 0.37371063232421875, 2.1014251708984375, 5.558135986328125, 0.09714889526367188, 0.5004730224609375, -1.0794944763183594, 5.79669189453125, 2.7847671508789062, 0.9047203063964844, 0.48516845703125, 0.0773162841796875, 0.033458709716796875, -0.24868011474609375, -3.0170745849609375, -0.673126220703125, 3.901336669921875, -0.31085205078125, 1.485748291015625, 3.16070556640625, 1.4808731079101562, -2.4179916381835938, 3.9792823791503906, 1.8129425048828125, -1.4359874725341797, 1.781087875366211, 1.4083175659179688, -0.306976318359375, 2.914318084716797, 4.6395263671875, 3.3279972076416016, 0.10868453979492188, 1.9464683532714844], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000442.npy"}
|
||||
{"epoch": 0.6681783824640968, "step": 443, "batch_size": 64, "mean": 1.9641380310058594, "std": 2.7938246726989746, "min": -3.7716197967529297, "p10": -0.8197809219360351, "median": 1.40899658203125, "p90": 5.925270843505859, "max": 12.419769287109375, "pos_frac": 0.78125, "sample": [0.3288764953613281, 2.6952476501464844, 1.9238739013671875, 5.8544464111328125, 1.55255126953125, 2.666412353515625, 1.3624725341796875, -1.36517333984375, 0.17713165283203125, -0.33130645751953125, -0.7711029052734375, -1.2666893005371094, 0.021938323974609375, 2.620840072631836, -0.6986198425292969, 3.95684814453125, 0.6450214385986328, 2.3957138061523438, 3.4489097595214844, 3.4362411499023438, 12.419769287109375, -3.7716197967529297, -0.20371627807617188, 3.590930938720703, 0.2930641174316406, 8.43585205078125, 6.3756561279296875, 0.728546142578125, 7.1332244873046875, 5.929843902587891, 1.4555206298828125, 0.6304779052734375, 1.0904369354248047, 1.2848291397094727, 1.8709869384765625, 4.138774871826172, 1.6417903900146484, 2.3734283447265625, 6.663776397705078, 5.914600372314453, 0.7642288208007812, 0.2393207550048828, -0.25350379943847656, 3.6354713439941406, 4.393337249755859, 4.120889663696289, -1.351470947265625, 2.1432418823242188, 0.16682052612304688, 1.5882625579833984, 1.5985565185546875, -1.0802383422851562, 0.183807373046875, -0.2004547119140625, 0.629791259765625, 1.738616943359375, -0.5575942993164062, 0.7711944580078125, 7.06170654296875, 0.4735736846923828, -0.8406429290771484, -1.1923484802246094, 0.64984130859375, 4.37261962890625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000443.npy"}
|
||||
{"epoch": 0.6696900982615268, "step": 444, "batch_size": 64, "mean": 0.9787817001342773, "std": 2.3700039386749268, "min": -6.1897125244140625, "p10": -1.4722692489624023, "median": 0.6551094055175781, "p90": 3.928490447998047, "max": 6.78826904296875, "pos_frac": 0.6875, "sample": [1.3732185363769531, 2.64312744140625, -0.4727783203125, 0.9096107482910156, 0.29345703125, 3.0020790100097656, 3.3246021270751953, -2.325347900390625, -0.5632896423339844, 1.6857738494873047, 2.1365890502929688, 4.0630035400390625, 0.2111968994140625, -2.4334030151367188, 0.406219482421875, -0.3973388671875, 1.3935527801513672, 1.0500068664550781, 1.4091777801513672, -1.5081024169921875, 6.2078094482421875, 3.0717315673828125, -0.9612655639648438, 0.8431663513183594, 0.7009735107421875, -0.42969512939453125, -0.7373428344726562, -1.3886585235595703, -0.0839996337890625, -0.3544158935546875, 4.8734588623046875, 1.1090869903564453, 3.9755325317382812, 0.6014862060546875, 0.521881103515625, 5.6030426025390625, 0.060245513916015625, 3.714019775390625, -2.0929603576660156, -3.8609771728515625, 2.1215057373046875, 6.40704345703125, -6.1897125244140625, -0.7547760009765625, 0.01352691650390625, 2.649322509765625, 0.91424560546875, 6.78826904296875, 3.0263710021972656, 0.2424774169921875, 1.836080551147461, 1.501220703125, 0.2436237335205078, 2.4215469360351562, 2.2437057495117188, -0.22858524322509766, -0.7352561950683594, 0.18963241577148438, 0.9343414306640625, -0.7293624877929688, -2.7090988159179688, 3.8187255859375, 0.6092453002929688, 0.4534626007080078], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000444.npy"}
|
||||
{"epoch": 0.671201814058957, "step": 445, "batch_size": 64, "mean": 1.7450923919677734, "std": 2.6051151752471924, "min": -3.9463577270507812, "p10": -1.4853179931640623, "median": 1.7934818267822266, "p90": 5.217966461181641, "max": 7.041778564453125, "pos_frac": 0.75, "sample": [4.070621490478516, 6.2393798828125, 0.9160537719726562, 0.267364501953125, -2.5065460205078125, 6.256568908691406, 3.6691741943359375, 2.1365966796875, 4.4354400634765625, 1.4138031005859375, -3.9463577270507812, -1.0776443481445312, -0.16998291015625, -1.1318283081054688, -1.4210281372070312, 2.957225799560547, -3.5720672607421875, 0.006103515625, 1.00128173828125, 1.9318885803222656, -1.0254573822021484, 2.4362869262695312, -2.6454544067382812, 2.4015960693359375, -3.283843994140625, 3.9780654907226562, 1.3346900939941406, 2.9638328552246094, 2.1117095947265625, -2.6760711669921875, -1.5128707885742188, -0.5972499847412109, 1.4134502410888672, 3.2579193115234375, 5.998802185058594, 0.2254791259765625, 7.041778564453125, 5.185874938964844, 1.4238357543945312, 3.308563232421875, 2.5864334106445312, 3.0202980041503906, 2.2616119384765625, 5.846458435058594, 2.452281951904297, 4.237707138061523, 4.5732879638671875, 1.2711524963378906, 3.850719451904297, -0.8061599731445312, -0.2909698486328125, -0.11850738525390625, 1.6550750732421875, 1.4498291015625, 3.4592208862304688, 3.1909942626953125, 5.231719970703125, 2.4138565063476562, 0.6782989501953125, 0.41478729248046875, 1.5875701904296875, 2.7825660705566406, 6.600635528564453, 0.5200576782226562], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000445.npy"}
|
||||
{"epoch": 0.672713529856387, "step": 446, "batch_size": 64, "mean": 1.9515937566757202, "std": 2.1036765575408936, "min": -2.213306427001953, "p10": -0.6781578063964844, "median": 2.0053939819335938, "p90": 4.618243026733398, "max": 6.5730133056640625, "pos_frac": 0.8125, "sample": [-0.23037338256835938, 0.049770355224609375, -1.6917877197265625, 1.812225341796875, 2.0797348022460938, 3.116943359375, 0.2784271240234375, 0.864593505859375, 5.4248809814453125, 4.498207092285156, 2.5456390380859375, 2.3460311889648438, 3.0388641357421875, 2.7480525970458984, 2.5308303833007812, 4.3065185546875, 0.07192230224609375, 2.6115636825561523, 5.725914001464844, 2.3080596923828125, 0.8651580810546875, 3.037109375, 1.4013824462890625, 1.7403564453125, 4.4042510986328125, 2.648395538330078, 0.7625503540039062, 4.618106842041016, 6.289276123046875, -0.684967041015625, 3.1265220642089844, 1.239593505859375, -0.004070281982421875, 0.5429611206054688, -1.3864383697509766, -2.213306427001953, 0.8802032470703125, -1.5933647155761719, 1.3610076904296875, 4.111351013183594, 1.3846054077148438, 5.5713653564453125, 2.3882904052734375, -0.7806472778320312, 6.5730133056640625, -0.6622695922851562, 3.70941162109375, 3.6094589233398438, -0.3553581237792969, -2.057464599609375, 2.0606689453125, -0.2729377746582031, 2.9121475219726562, 1.9501190185546875, 4.275051116943359, 4.6183013916015625, 1.2240524291992188, 1.7390518188476562, 0.23538589477539062, 5.3127288818359375, 0.44017791748046875, 1.1611528396606445, 2.1396484375, 2.143949508666992], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000446.npy"}
|
||||
{"epoch": 0.674225245653817, "step": 447, "batch_size": 64, "mean": 1.8610403537750244, "std": 2.3358492851257324, "min": -4.993167877197266, "p10": -1.058698272705078, "median": 1.8416500091552734, "p90": 4.968217468261719, "max": 5.973602294921875, "pos_frac": 0.78125, "sample": [5.1516876220703125, 3.4473114013671875, 2.9431915283203125, 1.097381591796875, -0.9280014038085938, -0.4186859130859375, 0.7073822021484375, 1.4825515747070312, -2.3844680786132812, -0.566375732421875, 0.31356048583984375, 3.6708755493164062, 0.6102218627929688, 5.393402099609375, 1.79864501953125, 4.669727325439453, -0.3137245178222656, 2.1127796173095703, 1.3129310607910156, 3.290191650390625, 2.7533531188964844, -0.9858474731445312, 2.50799560546875, -1.3325462341308594, 0.155242919921875, 3.651702880859375, 2.8517837524414062, 3.6702117919921875, 3.7670745849609375, 1.28997802734375, 2.94110107421875, 4.081085205078125, 0.39029884338378906, 4.7112579345703125, 1.3527107238769531, 1.8846549987792969, 5.282768249511719, 3.9776153564453125, 5.602142333984375, -1.1924667358398438, -0.31531524658203125, 0.8045806884765625, 4.416229248046875, -1.0899200439453125, 0.16388702392578125, -0.910064697265625, 1.1484909057617188, 5.65447998046875, 1.4613037109375, 4.99725341796875, 1.0538101196289062, 2.2993316650390625, -1.6374778747558594, 2.3031692504882812, 2.5838623046875, 0.749847412109375, 4.7991180419921875, -4.993167877197266, 2.0416793823242188, 4.9004669189453125, 5.973602294921875, -2.2431182861328125, 2.672271728515625, 1.5215625762939453], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000447.npy"}
|
||||
{"epoch": 0.6757369614512472, "step": 448, "batch_size": 64, "mean": 1.501330852508545, "std": 2.5701494216918945, "min": -5.201728820800781, "p10": -1.2977091789245605, "median": 1.619781494140625, "p90": 4.901711273193361, "max": 8.3524169921875, "pos_frac": 0.671875, "sample": [4.6058807373046875, -2.1735458374023438, 3.9266061782836914, 0.7457733154296875, 0.3897552490234375, 1.5548171997070312, -1.2936897277832031, 2.135955810546875, 3.4043655395507812, 0.1791229248046875, 8.028915405273438, 2.601806640625, 1.8956680297851562, 0.28229522705078125, 2.7255096435546875, 5.152641296386719, -1.2319107055664062, 5.064109802246094, 5.028495788574219, -0.2885169982910156, -0.4839019775390625, 7.257213592529297, 0.18994140625, 1.3923492431640625, -0.013135910034179688, -0.6161041259765625, 3.4045944213867188, -0.1377239227294922, 2.9842987060546875, -1.924896240234375, -1.2994318008422852, -0.1706390380859375, 1.5013389587402344, 1.6859893798828125, -0.7098236083984375, 1.8509445190429688, -1.7697219848632812, 2.8627281188964844, 4.3887176513671875, -0.6746883392333984, 2.3430862426757812, -5.201728820800781, 1.7050895690917969, 2.5200958251953125, -0.71282958984375, 3.43353271484375, 1.8054962158203125, 5.197174072265625, 2.7414016723632812, -1.0375595092773438, -0.8001308441162109, -0.5141143798828125, 1.7221145629882812, 1.2955474853515625, -2.6125669479370117, 3.6707763671875, 1.6847457885742188, -2.5277557373046875, 1.495269775390625, 2.958709716796875, 2.710174560546875, 1.5236091613769531, 1.8805160522460938, 8.3524169921875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000448.npy"}
{"epoch": 0.6772486772486772, "step": 449, "batch_size": 64, "mean": 0.9342143535614014, "std": 2.3083713054656982, "min": -4.89478874206543, "p10": -1.9445201873779294, "median": 0.9218406677246094, "p90": 3.2371124267578137, "max": 7.100765228271484, "pos_frac": 0.59375, "sample": [-0.04515838623046875, 2.958099365234375, -0.33145904541015625, 2.0999603271484375, -0.2793731689453125, -2.4618568420410156, 1.4246368408203125, 4.288177490234375, -0.1288299560546875, 4.810825347900391, 2.6650466918945312, 2.5314712524414062, 1.0402297973632812, -1.6225624084472656, -0.49907684326171875, 1.796661376953125, 7.100765228271484, 1.2480430603027344, -0.08229827880859375, 0.01407623291015625, 2.4863739013671875, 1.9117469787597656, 3.6541748046875, 6.8644561767578125, -2.300891876220703, 2.1019248962402344, 2.823406219482422, -2.037811279296875, 1.5714035034179688, 2.815349578857422, 1.7320556640625, -1.4036026000976562, 1.103729248046875, -4.89478874206543, 2.4866943359375, 1.929351806640625, 5.976898193359375, -1.7268409729003906, 0.6555747985839844, 1.3872489929199219, -0.3115692138671875, -0.380615234375, -2.1475486755371094, 1.7954864501953125, -0.9678401947021484, 0.625946044921875, -0.8601207733154297, -0.007659912109375, 0.22414207458496094, 2.2523765563964844, 0.8034515380859375, -0.01479339599609375, 1.4361495971679688, 0.787506103515625, -0.0586395263671875, 3.356689453125, 2.874835968017578, -3.7398223876953125, 2.1353225708007812, -0.05754852294921875, 2.799652099609375, -3.5572509765625, -0.3771820068359375, -0.4850807189941406], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000449.npy"}
{"epoch": 0.6787603930461074, "step": 450, "batch_size": 64, "mean": 1.348769187927246, "std": 2.1148295402526855, "min": -3.719745635986328, "p10": -0.9802928924560547, "median": 1.0432472229003906, "p90": 4.0100353240966795, "max": 7.3056182861328125, "pos_frac": 0.734375, "sample": [-0.2723827362060547, -0.42626953125, 0.10055160522460938, 0.6351737976074219, 1.973358154296875, 4.036552429199219, 2.6425552368164062, 1.0052032470703125, 2.7259445190429688, 7.3056182861328125, 3.6659011840820312, 2.8314666748046875, 2.312349319458008, 1.0812911987304688, -0.9489650726318359, 5.20458984375, -0.4705810546875, 0.8206367492675781, 2.24420166015625, 5.6431427001953125, -2.4419593811035156, 0.6457138061523438, 0.9320068359375, 0.5090141296386719, -0.36071014404296875, -1.6872291564941406, 0.74798583984375, 2.92486572265625, 3.5116119384765625, -0.9557342529296875, 1.7192001342773438, 5.09912109375, 0.8742465972900391, 1.8375816345214844, -3.719745635986328, 3.473236083984375, 2.4984092712402344, -0.5305328369140625, 1.111114501953125, 1.8119220733642578, 1.6569328308105469, -0.4016571044921875, 1.4559669494628906, -2.0619888305664062, 4.148517608642578, 0.38478851318359375, 2.045581817626953, 0.8913192749023438, 2.2176971435546875, -0.5184707641601562, 0.1803436279296875, -0.03189849853515625, 0.3737678527832031, 3.948162078857422, 5.74468994140625, 2.1249771118164062, -1.1299591064453125, 2.1577224731445312, 0.9448585510253906, 3.1977577209472656, 0.7115554809570312, 1.5069541931152344, -2.346027374267578, -0.9908180236816406], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000450.npy"}
{"epoch": 0.6802721088435374, "step": 451, "batch_size": 64, "mean": 1.2475539445877075, "std": 2.2574784755706787, "min": -3.3298492431640625, "p10": -1.6345382690429686, "median": 1.226797103881836, "p90": 4.016922760009766, "max": 6.8541107177734375, "pos_frac": 0.703125, "sample": [1.1775169372558594, -1.9723358154296875, 0.7953414916992188, 1.3973236083984375, 3.8416290283203125, 0.840576171875, -3.070220947265625, -0.72930908203125, 2.3955230712890625, 3.7606582641601562, 2.1745147705078125, 1.2826652526855469, 4.259674072265625, 2.901031494140625, 4.092048645019531, -0.7801666259765625, 1.2760772705078125, 0.5737991333007812, 1.4309253692626953, 2.499052047729492, 2.8887767791748047, -1.671539306640625, -0.0512542724609375, 5.5761260986328125, 3.8046398162841797, -3.3298492431640625, 1.1607666015625, -3.02081298828125, -1.95550537109375, 0.6783542633056641, 1.575286865234375, 3.1266021728515625, -2.1675682067871094, -0.23934173583984375, 2.4367599487304688, 1.7361602783203125, 4.782619476318359, 4.845451354980469, 1.1288328170776367, 0.18265533447265625, -0.3979339599609375, 0.7347183227539062, 0.799626350402832, 0.29840087890625, 2.264181137084961, 1.930511474609375, 1.9038887023925781, 6.8541107177734375, 3.7899322509765625, -0.3600921630859375, 2.2799224853515625, 1.5438003540039062, 2.4222488403320312, 0.4835662841796875, -1.1031455993652344, -1.208587646484375, 2.754344940185547, -1.0659027099609375, -1.3243541717529297, 1.8780364990234375, 6.782970428466797, 1.1469879150390625, -1.5482025146484375, -0.6490631103515625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000451.npy"}
{"epoch": 0.6817838246409675, "step": 452, "batch_size": 64, "mean": 1.4708220958709717, "std": 2.535510301589966, "min": -4.4084320068359375, "p10": -1.2666738510131832, "median": 1.4029788970947266, "p90": 4.774370574951172, "max": 7.7037200927734375, "pos_frac": 0.6875, "sample": [3.48712158203125, 0.1686859130859375, 2.801555633544922, 3.2834320068359375, -0.8617286682128906, 4.941617965698242, 1.746307373046875, 7.7037200927734375, 1.0552291870117188, 4.160980224609375, 1.875457763671875, 2.28936767578125, -4.4084320068359375, 4.215118408203125, 1.5988998413085938, 1.2718048095703125, 4.8058624267578125, 1.1547088623046875, 3.880260467529297, 2.204925537109375, 1.2792892456054688, -0.4027290344238281, 0.1462249755859375, -0.15901565551757812, -0.4114856719970703, -1.4402217864990234, 1.24652099609375, 1.9486236572265625, 5.1168212890625, -0.24799728393554688, -2.0091400146484375, 7.192924499511719, -3.3597679138183594, 4.700889587402344, -0.6841659545898438, 6.4609832763671875, 3.2766876220703125, 0.03832244873046875, -0.35614013671875, 1.5989646911621094, 1.7532234191894531, -0.0856475830078125, 3.8027267456054688, 0.842132568359375, 2.308563232421875, -2.572298049926758, 6.49188232421875, 1.9252700805664062, 2.2378883361816406, -3.8483200073242188, 2.065114974975586, -0.6621494293212891, 1.5266685485839844, 0.056304931640625, -0.7220497131347656, 0.19756698608398438, -0.4932746887207031, 3.202068328857422, 2.43316650390625, -0.3310661315917969, -0.07281494140625, 3.7078857421875, -1.8748703002929688, 0.9341583251953125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000452.npy"}
{"epoch": 0.6832955404383976, "step": 453, "batch_size": 64, "mean": 1.2101881504058838, "std": 2.383695363998413, "min": -3.566375732421875, "p10": -1.9759178161621094, "median": 0.94342041015625, "p90": 3.96429443359375, "max": 9.324722290039062, "pos_frac": 0.6875, "sample": [3.9811859130859375, 3.0429859161376953, 0.1566925048828125, -2.0674591064453125, 4.111869812011719, 1.7386703491210938, -2.488597869873047, 2.5376625061035156, 3.530487060546875, 1.4962539672851562, -2.934049606323242, 2.2277488708496094, 2.1793670654296875, -0.8911819458007812, 2.7695083618164062, 3.9248809814453125, 2.072418212890625, -0.229400634765625, 2.2126007080078125, 2.0982513427734375, 0.6746788024902344, -2.007904052734375, 0.881988525390625, 0.5058364868164062, 1.004852294921875, -3.566375732421875, -2.84326171875, 3.3311500549316406, 0.5234012603759766, 2.3341712951660156, -0.48187255859375, 0.834874153137207, 1.8973312377929688, -1.9012832641601562, -1.281707763671875, 0.47003173828125, 4.592258453369141, -0.17245864868164062, -2.4859237670898438, 1.9141693115234375, 0.8011016845703125, 4.8918914794921875, -1.0355224609375, -0.2712554931640625, 0.632110595703125, 0.6417694091796875, -0.4499931335449219, -0.5503883361816406, 2.3470382690429688, -0.20438385009765625, 3.5203704833984375, 0.05219268798828125, 0.8386001586914062, 1.7166194915771484, 1.7882843017578125, -1.568878173828125, -0.3161201477050781, 9.324722290039062, 4.739620208740234, 2.0523223876953125, 3.3390045166015625, 3.07757568359375, 7.015342712402344, 1.3761672973632812], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000453.npy"}
{"epoch": 0.6848072562358276, "step": 454, "batch_size": 64, "mean": 1.5949238538742065, "std": 2.637998342514038, "min": -6.050437927246094, "p10": -1.3640424728393552, "median": 1.366450309753418, "p90": 5.018602752685547, "max": 7.705604553222656, "pos_frac": 0.6875, "sample": [4.3693389892578125, -1.4393692016601562, -0.4886016845703125, 2.9358062744140625, -6.050437927246094, -0.49275875091552734, 3.2219161987304688, 0.5356903076171875, 2.5661888122558594, 0.45568275451660156, 4.045478820800781, 2.9781494140625, 2.012603759765625, -0.0351715087890625, 4.406951904296875, 1.6959877014160156, 3.861114501953125, 3.5215492248535156, -0.0043487548828125, 0.9733428955078125, -0.3442535400390625, -3.7567977905273438, -0.3426971435546875, 3.5938949584960938, 0.904144287109375, 4.529449462890625, -3.2411842346191406, 4.290618896484375, 1.8742828369140625, 1.2281265258789062, -1.5651931762695312, 0.12830352783203125, 5.285369873046875, -0.6172294616699219, 3.392181396484375, 1.4358768463134766, 7.705604553222656, -0.7253437042236328, 6.194171905517578, 2.5102157592773438, 1.0006027221679688, -1.9254684448242188, 0.15016555786132812, 0.3297557830810547, -0.9106178283691406, 0.5638656616210938, 3.0335540771484375, 2.889312744140625, -0.13341903686523438, 2.0236148834228516, 6.176910400390625, 1.9529647827148438, 5.998386383056641, 4.9339599609375, 1.2970237731933594, -1.9258575439453125, 0.9299030303955078, 5.054878234863281, 1.9100418090820312, 6.1996002197265625, 2.6632041931152344, -0.040744781494140625, -1.1882801055908203, -0.45688629150390625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000454.npy"}
{"epoch": 0.6863189720332578, "step": 455, "batch_size": 64, "mean": 1.9055445194244385, "std": 2.5365729331970215, "min": -5.450439453125, "p10": -0.5434719085693359, "median": 1.558156967163086, "p90": 4.561583709716797, "max": 9.795539855957031, "pos_frac": 0.84375, "sample": [3.31494140625, 2.5705108642578125, 2.979949951171875, 6.458160400390625, 1.5254287719726562, 0.5329265594482422, 5.6498565673828125, 8.86260986328125, 3.8706283569335938, 0.24463653564453125, 0.5513496398925781, -0.56103515625, 1.4424896240234375, 4.452171325683594, 1.284820556640625, 6.0956573486328125, 2.0633544921875, 0.17668914794921875, 0.525115966796875, 0.39284515380859375, 4.1417388916015625, -1.0449295043945312, 1.8557968139648438, -0.5284461975097656, -0.3634662628173828, 4.6084747314453125, 1.9852142333984375, 3.2678070068359375, 3.1602401733398438, 0.568267822265625, 3.055511474609375, 0.5231056213378906, 4.428016662597656, 0.1351776123046875, 1.5908851623535156, 0.8882064819335938, -3.222148895263672, 3.8049774169921875, 3.1038589477539062, 2.2417449951171875, -1.3415794372558594, 0.8061370849609375, 3.997650146484375, -5.450439453125, 2.1520233154296875, 0.5204315185546875, 2.7980728149414062, 0.12265777587890625, 1.6778793334960938, 1.124237060546875, 9.795539855957031, 0.404266357421875, -0.47039031982421875, 2.8403263092041016, -0.5499114990234375, 1.0232162475585938, 3.7256202697753906, 0.2579345703125, 0.44904327392578125, 0.4432048797607422, -0.6208744049072266, 2.6758499145507812, 2.3600921630859375, 6.580718994140625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000455.npy"}
{"epoch": 0.6878306878306878, "step": 456, "batch_size": 64, "mean": 1.5422389507293701, "std": 2.058835506439209, "min": -3.6595687866210938, "p10": -1.1449153900146483, "median": 1.4376068115234375, "p90": 3.5025344848632813, "max": 6.435169219970703, "pos_frac": 0.78125, "sample": [4.8191375732421875, 1.1259307861328125, 0.077117919921875, 3.139312744140625, 6.35760498046875, -0.7959671020507812, -2.138331413269043, 2.284942626953125, 0.27050209045410156, 6.435169219970703, 1.336944580078125, 2.120208740234375, 4.0015716552734375, 1.6277999877929688, 1.4470367431640625, 2.047576904296875, 0.22967529296875, 3.3095932006835938, 0.34156227111816406, 1.4281768798828125, 0.380279541015625, 3.2755355834960938, 3.2024688720703125, -1.3124580383300781, 2.0795364379882812, 0.7862071990966797, -1.5223960876464844, -1.581695556640625, 1.1756782531738281, -1.931365966796875, -0.9504623413085938, 0.389495849609375, 1.5969161987304688, 2.5694808959960938, 3.3350372314453125, 3.3476877212524414, 0.5127754211425781, -0.3186492919921875, 3.1827163696289062, 0.6206436157226562, -1.2282524108886719, 2.6901779174804688, -0.32411766052246094, 3.5130767822265625, 2.0602970123291016, -3.6595687866210938, 3.408905029296875, 1.3220672607421875, 0.7838134765625, 0.8385848999023438, 4.751411437988281, 3.146503448486328, 0.3597221374511719, 3.477935791015625, 1.9351806640625, 2.772510528564453, 6.1129913330078125, 0.20589447021484375, -0.137359619140625, 3.0631027221679688, 2.491283416748047, -0.042388916015625, -0.075927734375, 2.9344482421875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000456.npy"}
{"epoch": 0.6893424036281179, "step": 457, "batch_size": 64, "mean": 1.4279569387435913, "std": 2.002819299697876, "min": -2.4616270065307617, "p10": -0.8212623596191405, "median": 1.4855823516845703, "p90": 3.946908569335938, "max": 6.2656097412109375, "pos_frac": 0.71875, "sample": [1.5173110961914062, -0.712554931640625, -0.33380699157714844, 1.65264892578125, 2.857370376586914, -0.9697799682617188, 2.5879287719726562, 2.176666259765625, 0.121978759765625, -0.06289291381835938, 0.392059326171875, 3.813629150390625, 1.4690570831298828, 1.1062850952148438, -2.4616270065307617, -1.9965744018554688, 3.7472305297851562, -0.15959930419921875, 0.5468692779541016, 1.5988388061523438, 4.599143981933594, 3.1178627014160156, 6.2656097412109375, 1.86767578125, 1.9093036651611328, -0.5557098388671875, -0.5532455444335938, 2.7015228271484375, 4.326416015625, -0.5448112487792969, 0.1433563232421875, -0.751983642578125, -0.7601394653320312, 1.8922920227050781, 3.6972503662109375, -0.5145339965820312, -1.5980987548828125, 3.6804351806640625, 1.4850692749023438, 0.60565185546875, 3.4757766723632812, -1.4977188110351562, 5.102970123291016, 4.0040283203125, -1.6904029846191406, 0.20947265625, 2.864013671875, 0.6357364654541016, 1.6732673645019531, 4.013236999511719, 3.469554901123047, 0.5698060989379883, 6.081298828125, 0.39052581787109375, 1.0357742309570312, 3.425140380859375, -0.8474578857421875, 2.2012977600097656, 1.4860954284667969, 1.5760269165039062, 1.767822265625, -0.01453399658203125, 3.4517879486083984, 0.10161781311035156], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000457.npy"}
{"epoch": 0.690854119425548, "step": 458, "batch_size": 64, "mean": 1.7781078815460205, "std": 2.2121455669403076, "min": -3.1315765380859375, "p10": -1.0966964721679686, "median": 1.7537422180175781, "p90": 4.289564514160157, "max": 6.673160552978516, "pos_frac": 0.765625, "sample": [-0.72027587890625, 4.453388214111328, 2.389514923095703, 0.758697509765625, 3.1952953338623047, 3.3261871337890625, 2.4092788696289062, -2.070892333984375, -0.6912965774536133, 0.8955497741699219, -0.15570449829101562, -1.5075454711914062, -0.9317474365234375, 4.3982086181640625, 3.9458694458007812, 0.17256736755371094, 3.9218368530273438, 2.84454345703125, 1.4572792053222656, 4.3680419921875, 1.8660736083984375, 1.6823348999023438, 3.383960723876953, -0.5566787719726562, 3.6876373291015625, 1.2045364379882812, 4.299713134765625, 1.4889850616455078, -0.248565673828125, 1.9354629516601562, 4.2658843994140625, 1.5505828857421875, 0.022922515869140625, 0.1735687255859375, 2.865386962890625, -0.018527984619140625, 1.4945144653320312, 1.9937362670898438, 3.9718780517578125, 3.4418716430664062, 1.8251495361328125, 0.3460540771484375, -3.1315765380859375, 0.2199687957763672, 4.24615478515625, 0.8814926147460938, 3.1282577514648438, -0.5982646942138672, 6.202564239501953, 2.867645263671875, -1.9415283203125, 3.864013671875, 2.88104248046875, 1.3344879150390625, -2.6283798217773438, 6.673160552978516, 3.4522323608398438, 6.494476318359375, -1.167388916015625, -1.338897705078125, 1.1889381408691406, 3.3133926391601562, 1.0972213745117188, 3.6246185302734375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000458.npy"}
{"epoch": 0.6923658352229781, "step": 459, "batch_size": 64, "mean": 1.7036855220794678, "std": 2.407243490219116, "min": -3.3572349548339844, "p10": -1.506823348999023, "median": 1.3371124267578125, "p90": 5.20076370239258, "max": 7.007957458496094, "pos_frac": 0.78125, "sample": [1.4307479858398438, 2.279693603515625, -2.3648338317871094, 2.4108428955078125, -1.1974143981933594, 0.20068359375, 3.2983169555664062, 1.7330322265625, 2.4546966552734375, 1.9281654357910156, 4.436981201171875, 1.0317306518554688, 4.130645751953125, -0.4288215637207031, 3.540802001953125, 5.8868408203125, 0.00612640380859375, 1.091583251953125, 3.2996749877929688, 2.4132938385009766, 1.0932426452636719, 0.4986114501953125, -0.7009391784667969, 0.5720443725585938, 2.04296875, 2.6680259704589844, 0.9312362670898438, 1.8684425354003906, 4.483860015869141, 6.379673004150391, 5.5648956298828125, 1.2833099365234375, -2.3713455200195312, 0.74847412109375, -1.6394271850585938, 0.9900016784667969, 0.028118133544921875, -1.7825927734375, 4.2747344970703125, -0.38487815856933594, 1.0490779876708984, 2.2175140380859375, 1.1816444396972656, 0.8176860809326172, 3.19195556640625, -0.15579986572265625, 4.1882171630859375, 2.4873085021972656, -2.7433090209960938, 5.3463897705078125, 0.5926284790039062, 0.5077896118164062, -3.3572349548339844, 3.3854141235351562, 1.3909149169921875, -2.0911712646484375, 1.0682449340820312, -0.69708251953125, 4.860969543457031, 6.794582366943359, 5.6215362548828125, 2.6331329345703125, 7.007957458496094, -0.39373779296875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000459.npy"}
{"epoch": 0.6938775510204082, "step": 460, "batch_size": 64, "mean": 1.6595821380615234, "std": 2.5911343097686768, "min": -2.2467117309570312, "p10": -1.3594017028808592, "median": 1.0155258178710938, "p90": 5.030017471313476, "max": 10.073974609375, "pos_frac": 0.71875, "sample": [3.8193206787109375, 3.337860107421875, 10.073974609375, -0.6402435302734375, -0.020214080810546875, 3.963794708251953, 4.79638671875, -1.0153961181640625, 4.30438232421875, 0.2490386962890625, 0.12575149536132812, 0.6925201416015625, -2.167755126953125, 2.2106399536132812, -2.2467117309570312, 0.6264266967773438, 2.8885536193847656, 1.21282958984375, -0.6114273071289062, 4.638763427734375, 0.8329315185546875, 6.6103973388671875, 1.5021438598632812, 2.2849884033203125, 5.8541107177734375, 2.723705291748047, 0.6509437561035156, 0.0433502197265625, 3.1610336303710938, 0.5753135681152344, 0.5884323120117188, 5.0386505126953125, 1.3874530792236328, 0.526824951171875, -0.3730192184448242, 1.9720306396484375, 1.714071273803711, -1.506072998046875, 5.0386199951171875, -0.8115978240966797, 4.641452789306641, 3.6390113830566406, -1.9325408935546875, -0.647003173828125, 0.7150421142578125, 5.292934417724609, 0.41512298583984375, -1.1023330688476562, 2.8380126953125, -1.469573974609375, -0.8849887847900391, -0.4320068359375, 0.7860870361328125, 1.7579002380371094, -2.197784423828125, 5.009944915771484, 1.5462875366210938, 7.94317626953125, -1.6522445678710938, -0.3128204345703125, 0.4294586181640625, 3.29071044921875, 1.1981201171875, 3.2884864807128906], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000460.npy"}
{"epoch": 0.6953892668178382, "step": 461, "batch_size": 64, "mean": 1.6554611921310425, "std": 2.557593822479248, "min": -4.603605270385742, "p10": -1.4285621643066404, "median": 1.5842981338500977, "p90": 4.890764617919922, "max": 8.926193237304688, "pos_frac": 0.765625, "sample": [2.755035400390625, 2.163990020751953, 2.8104934692382812, 1.5926246643066406, 4.7417144775390625, 3.2941970825195312, 4.070091247558594, 1.4233627319335938, 1.2979736328125, 0.07609367370605469, -0.2990875244140625, 0.9494094848632812, 3.7043609619140625, -4.603605270385742, 2.332082748413086, -0.09922027587890625, 0.381439208984375, 3.686767578125, 2.085442543029785, 2.176349639892578, 0.550145149230957, 3.229299545288086, 1.8389892578125, 1.1012115478515625, 0.41795921325683594, 1.7703857421875, -3.4688873291015625, -2.2542190551757812, -1.291229248046875, 2.077573776245117, 0.8448524475097656, -0.9496803283691406, -1.634775161743164, 1.4106807708740234, 3.1422882080078125, 1.26141357421875, 5.0819091796875, 2.6158828735351562, 1.69000244140625, 8.926193237304688, 1.879608154296875, 4.954643249511719, 5.0072021484375, 0.8422889709472656, 1.3768501281738281, 0.3152275085449219, 1.1758460998535156, -2.695882797241211, 2.204988479614258, 2.210845947265625, 3.11285400390625, 1.5759716033935547, -0.017009735107421875, 0.2406005859375, 6.581085205078125, -1.4874191284179688, 8.094375610351562, -2.4882469177246094, -0.026742935180664062, -0.35784149169921875, 3.4874343872070312, -0.3814697265625, 7.730197906494141, 1.7146034240722656], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000461.npy"}
{"epoch": 0.6969009826152683, "step": 462, "batch_size": 64, "mean": 1.7776134014129639, "std": 2.6348133087158203, "min": -3.730224609375, "p10": -1.3152816772460936, "median": 1.656881332397461, "p90": 4.6230720520019535, "max": 11.0947265625, "pos_frac": 0.75, "sample": [5.520618438720703, 5.7238922119140625, 3.737577438354492, 1.8546295166015625, -1.0128021240234375, 0.18782806396484375, 3.978607177734375, -0.24300384521484375, -0.7027740478515625, 0.7156333923339844, 3.849090576171875, 5.946296691894531, 3.8447418212890625, 1.1505203247070312, 4.600006103515625, 4.407077789306641, 0.1465606689453125, 6.0446319580078125, 3.6766128540039062, 3.91009521484375, 4.632957458496094, 3.3943710327148438, 4.506401062011719, 2.207000732421875, 0.639007568359375, 1.1284561157226562, 2.354555130004883, -1.0729904174804688, 0.26984405517578125, 2.4186172485351562, 0.9094390869140625, -0.3049468994140625, 2.8458251953125, 1.6992950439453125, 3.570068359375, -1.07806396484375, -0.5211219787597656, 2.6497116088867188, 2.5328197479248047, -2.0493392944335938, 0.6160049438476562, 0.5323562622070312, 3.121917724609375, 0.24826812744140625, -1.4169464111328125, 1.6144676208496094, -1.4932174682617188, 4.117084503173828, 1.8711929321289062, -3.14532470703125, 1.3985214233398438, -3.730224609375, 4.905128479003906, 4.475147247314453, 2.2957763671875, -1.6082687377929688, -0.4851799011230469, 11.0947265625, 2.0153579711914062, 1.5509262084960938, -3.5540237426757812, -0.17837905883789062, 1.2417640686035156, 0.21242904663085938], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000462.npy"}
{"epoch": 0.6984126984126984, "step": 463, "batch_size": 64, "mean": 1.594813585281372, "std": 2.441650152206421, "min": -3.056671142578125, "p10": -1.012728500366211, "median": 1.2551841735839844, "p90": 5.053944396972657, "max": 7.905586242675781, "pos_frac": 0.640625, "sample": [-0.9014816284179688, 1.843414306640625, 2.179828643798828, -0.76483154296875, 5.322351455688477, 2.8550376892089844, -0.7037200927734375, 7.244516372680664, 4.890769958496094, 2.289173126220703, -0.98773193359375, 1.3104705810546875, 1.136911392211914, 3.6849288940429688, 3.5149078369140625, 1.5227203369140625, -0.1573944091796875, 0.0528564453125, -0.023294448852539062, 0.7487869262695312, -1.1336898803710938, 1.885345458984375, -0.38739013671875, 4.667808532714844, 1.1830005645751953, -3.056671142578125, 2.4682159423828125, 3.4512786865234375, -0.7748603820800781, -1.0234413146972656, -0.39720916748046875, 0.9008941650390625, 5.2130279541015625, 4.428932189941406, -1.1686553955078125, 4.9115753173828125, 1.1998977661132812, 2.1381759643554688, 0.6120452880859375, 4.4929962158203125, 0.3628387451171875, 0.36118316650390625, 2.941497802734375, -2.369781494140625, -0.4213142395019531, -1.9377784729003906, 7.905586242675781, -0.5386428833007812, 5.114959716796875, 6.072540283203125, -0.15936279296875, -0.2404632568359375, 1.8545608520507812, -0.5085325241088867, 3.996185302734375, 2.9683990478515625, 5.330699920654297, 2.1977996826171875, 1.50665283203125, -0.95538330078125, -1.4492454528808594, 2.3719558715820312, -0.05998516082763672, 3.0542068481445312], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000463.npy"}
{"epoch": 0.6999244142101285, "step": 464, "batch_size": 64, "mean": 1.6464771032333374, "std": 2.86022686958313, "min": -4.52532958984375, "p10": -1.8616289138793944, "median": 1.5180048942565918, "p90": 5.525647544860841, "max": 8.363636016845703, "pos_frac": 0.671875, "sample": [0.0287628173828125, 2.5677719116210938, 1.3874406814575195, -1.919891357421875, 6.557981491088867, -2.7574386596679688, -0.452545166015625, 4.0347442626953125, 2.775501251220703, 1.7945556640625, 6.781364440917969, -1.9746131896972656, 6.132270812988281, 2.4006786346435547, 3.3893356323242188, 0.453277587890625, 4.423576354980469, -2.0533981323242188, 0.2954254150390625, 1.0171546936035156, 2.368785858154297, 4.4173583984375, -0.246246337890625, 1.648569107055664, 2.2112655639648438, 2.446685791015625, -0.3175163269042969, 8.361248016357422, 3.2068605422973633, 1.1144065856933594, -1.324462890625, -0.24573135375976562, 2.4634056091308594, 5.6920623779296875, -4.52532958984375, 2.574798583984375, 5.137346267700195, 3.9512481689453125, -1.6233978271484375, -0.7159500122070312, 3.2541732788085938, 0.7806396484375, 1.3446102142333984, 6.3502044677734375, -2.3785552978515625, -0.5690231323242188, -0.03087615966796875, -1.1754570007324219, -1.1541366577148438, 4.4071197509765625, -0.15188980102539062, 4.562934875488281, -3.6549530029296875, -0.8458404541015625, 1.8658447265625, -1.7256832122802734, 0.07631683349609375, 4.986789703369141, 2.829836845397949, 8.363636016845703, 0.30988121032714844, 3.112506866455078, 2.9184417724609375, 0.42064666748046875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000464.npy"}
{"epoch": 0.7014361300075586, "step": 465, "batch_size": 64, "mean": 1.4612858295440674, "std": 2.501507043838501, "min": -2.7377891540527344, "p10": -1.5919036865234375, "median": 1.1845130920410156, "p90": 5.406383132934572, "max": 7.1680450439453125, "pos_frac": 0.6875, "sample": [-2.4037132263183594, 4.120361328125, 1.0870590209960938, -0.74127197265625, 3.37908935546875, 2.94061279296875, -2.645275115966797, 6.6670074462890625, 1.158538818359375, 4.470481872558594, 0.9544754028320312, 3.063121795654297, 2.394817352294922, 6.08294677734375, -0.5241489410400391, 1.908447265625, 0.714324951171875, 5.144329071044922, 0.3026123046875, -0.6502456665039062, 1.3327560424804688, -1.5263748168945312, 1.7883529663085938, 5.650634765625, 5.9576568603515625, 5.5186920166015625, -1.431722640991211, -1.5067214965820312, -1.6199874877929688, -0.057216644287109375, 1.8472442626953125, -1.0826263427734375, -0.763702392578125, 2.2612380981445312, 0.8133087158203125, 0.393280029296875, -0.48139190673828125, -0.16427230834960938, -0.05795860290527344, 2.2885570526123047, -1.71649169921875, 1.2788314819335938, 2.1141490936279297, -0.373382568359375, 0.2122039794921875, -2.4410552978515625, 0.9940376281738281, 1.0531768798828125, 4.917976379394531, 5.762367248535156, 3.1163330078125, 3.14984130859375, 1.2104873657226562, 3.706460952758789, -2.5840911865234375, -2.7377891540527344, 1.4325981140136719, 0.24864959716796875, 0.1068115234375, 1.2814865112304688, 7.1680450439453125, 2.827423095703125, 3.6210975646972656, 2.589813232421875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000465.npy"}
{"epoch": 0.7029478458049887, "step": 466, "batch_size": 64, "mean": 0.9951979517936707, "std": 2.0919344425201416, "min": -3.1045379638671875, "p10": -1.560578155517578, "median": 0.7555770874023438, "p90": 3.604651641845705, "max": 5.5517425537109375, "pos_frac": 0.625, "sample": [-0.4505882263183594, 2.314544677734375, -0.7346343994140625, -2.8650054931640625, 1.4887657165527344, -0.8054122924804688, 1.7302932739257812, 0.885101318359375, -2.3692779541015625, -0.383514404296875, 0.0931396484375, 2.478179931640625, 4.4628143310546875, -1.6711463928222656, 2.06781005859375, -0.4898681640625, 2.5347442626953125, 5.111480712890625, 2.918792724609375, 4.362953186035156, 4.7285003662109375, 2.4609718322753906, 2.933964729309082, -1.3025856018066406, 0.5010223388671875, 1.7675552368164062, 2.911792755126953, -3.1045379638671875, 5.128448486328125, 0.6260528564453125, 0.5779571533203125, 1.7624359130859375, -1.8352813720703125, -0.7670516967773438, -0.1426849365234375, 1.244222640991211, 0.09009552001953125, 1.8746719360351562, -2.8803329467773438, 1.3212852478027344, 0.29168701171875, 0.4744873046875, -1.9409980773925781, -1.0569076538085938, 2.9480819702148438, -0.9419708251953125, -0.4478302001953125, 2.5560264587402344, 5.5517425537109375, -0.55975341796875, -0.13857650756835938, 2.69482421875, -1.2236175537109375, 1.8563079833984375, -0.6632461547851562, 1.9963207244873047, -0.6600112915039062, -0.5545387268066406, 2.8183727264404297, 1.7192802429199219, 3.81658935546875, 2.8840065002441406, 3.1101303100585938, 0.5865859985351562], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000466.npy"}
{"epoch": 0.7044595616024187, "step": 467, "batch_size": 64, "mean": 1.741027593612671, "std": 2.2347702980041504, "min": -1.9868927001953125, "p10": -0.8548603057861327, "median": 1.6507396697998047, "p90": 4.956592559814455, "max": 7.98974609375, "pos_frac": 0.75, "sample": [0.728607177734375, 5.21258544921875, 4.368003845214844, 2.6361846923828125, -0.02191162109375, -0.9755973815917969, -0.6167335510253906, -1.7023887634277344, 2.3326416015625, -1.2779998779296875, 1.6167221069335938, -1.17022705078125, 2.881702423095703, 0.33077430725097656, 3.4076995849609375, 0.7993812561035156, 1.9676170349121094, 1.6222801208496094, 0.5318756103515625, 0.5469303131103516, -0.1497650146484375, -0.7470016479492188, -0.8880195617675781, 5.240619659423828, 4.540107727050781, 3.0166473388671875, 1.9378547668457031, 1.1912002563476562, -1.9868927001953125, 2.570343017578125, -0.5024604797363281, 4.421150207519531, 1.0866384506225586, -0.6264801025390625, 2.165130615234375, 1.6988067626953125, 2.6567459106445312, 5.1350860595703125, 2.826416015625, 1.67919921875, -0.3289375305175781, 0.8006324768066406, 0.6652412414550781, 0.0929107666015625, 0.6566982269287109, 1.6941680908203125, -0.7582969665527344, -0.7774887084960938, 3.0843029022216797, 5.7226409912109375, 7.98974609375, 1.4018020629882812, 0.23395156860351562, 0.303375244140625, 4.2464752197265625, 2.104766845703125, -1.9672622680664062, 5.918346405029297, 3.5600357055664062, 6.8009185791015625, 1.8668670654296875, 3.8713836669921875, 3.1626434326171875, 2.5973739624023438], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000467.npy"}
{"epoch": 0.7059712773998488, "step": 468, "batch_size": 64, "mean": 1.9023983478546143, "std": 2.3582699298858643, "min": -3.16119384765625, "p10": -0.9903289794921875, "median": 1.6579523086547852, "p90": 4.606486892700196, "max": 9.178703308105469, "pos_frac": 0.765625, "sample": [2.9294776916503906, 0.2580146789550781, 2.4077110290527344, 4.4156951904296875, 2.417888641357422, -1.0528736114501953, -0.28253746032714844, 1.0312881469726562, -0.632659912109375, -0.43235015869140625, 3.7571258544921875, 3.4864654541015625, 3.7421875, -0.11365699768066406, 1.3382492065429688, 3.8347396850585938, -0.8357505798339844, -1.1334686279296875, -0.918487548828125, 3.6675491333007812, 4.5193634033203125, -0.721099853515625, 3.0709686279296875, 6.0227508544921875, 3.0782299041748047, 0.9450950622558594, 5.108028411865234, 1.7703628540039062, -1.4456329345703125, -3.16119384765625, 5.460609436035156, -0.678802490234375, 0.2748565673828125, -1.6938552856445312, 0.41861915588378906, -2.5106048583984375, 5.3830718994140625, 1.1834259033203125, 0.723358154296875, 3.0884780883789062, 3.682281494140625, 0.706939697265625, 1.4844207763671875, 4.3898162841796875, 2.5853652954101562, 1.2435131072998047, 1.161712646484375, 4.643825531005859, 1.545541763305664, 1.9693794250488281, 4.44598388671875, 3.5942001342773438, 1.9777145385742188, -1.0211181640625, 5.566993713378906, 0.2365570068359375, 0.73175048828125, 9.178703308105469, 3.4514617919921875, 3.3878822326660156, 4.022306442260742, 2.241912841796875, 1.5168304443359375, 0.2888832092285156], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000468.npy"}
|
||||
{"epoch": 0.7074829931972789, "step": 469, "batch_size": 64, "mean": 1.404966950416565, "std": 2.2256224155426025, "min": -3.042510986328125, "p10": -1.015858459472656, "median": 1.2411346435546875, "p90": 4.095675659179688, "max": 7.424285888671875, "pos_frac": 0.703125, "sample": [-1.755096435546875, 3.4272708892822266, 1.6764373779296875, 1.9929351806640625, 1.5114822387695312, 0.6673669815063477, 1.199188232421875, 2.069211959838867, 1.4674835205078125, -0.13320541381835938, 3.576873779296875, 1.8264312744140625, 1.7244873046875, -0.94708251953125, 0.22995758056640625, 4.606342315673828, -0.38565826416015625, 0.066619873046875, 1.9617996215820312, 3.141071319580078, 4.133583068847656, 1.1471443176269531, 2.9710464477539062, -1.8274078369140625, 2.2310028076171875, -1.7125377655029297, 1.9968528747558594, 1.2453765869140625, -1.0453338623046875, 1.1926116943359375, -0.3804473876953125, 0.12448692321777344, 2.74700927734375, 7.424285888671875, 3.288787841796875, -0.8890609741210938, -0.8409786224365234, -0.7464599609375, 3.4033126831054688, 6.404529571533203, 2.4808349609375, 2.6304092407226562, 1.4205818176269531, 0.81060791015625, -0.7238082885742188, 0.24794769287109375, -3.042510986328125, -1.1146278381347656, 6.7260894775390625, 0.8688812255859375, 2.4216842651367188, 0.40091705322265625, 4.395015716552734, -0.03477764129638672, 7.4088134765625, -0.3386268615722656, 1.2368927001953125, 1.1376991271972656, 1.7916717529296875, -1.5381689071655273, 1.5052604675292969, -0.7055130004882812, -0.8663330078125, 4.007225036621094], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000469.npy"}
|
||||
{"epoch": 0.708994708994709, "step": 470, "batch_size": 64, "mean": 1.4254283905029297, "std": 2.284726858139038, "min": -4.352333068847656, "p10": -1.0781147003173828, "median": 1.6264419555664062, "p90": 4.368192672729492, "max": 6.59576416015625, "pos_frac": 0.71875, "sample": [5.4331817626953125, 0.5974960327148438, 5.4446563720703125, -4.352333068847656, 2.44891357421875, 1.7617950439453125, 3.8351364135742188, 2.883403778076172, -1.9198799133300781, 2.366424560546875, 1.7935104370117188, -0.6377677917480469, 3.549713134765625, 4.4152679443359375, 4.672584533691406, -0.426239013671875, 5.439384460449219, -0.9468193054199219, -0.043182373046875, 1.7918052673339844, 0.4866905212402344, 1.8515396118164062, 2.25439453125, 0.7369537353515625, 3.2842636108398438, -1.1690521240234375, 4.258350372314453, 1.153839111328125, 1.058089256286621, -0.5851764678955078, 0.4938220977783203, -0.8239555358886719, 3.1929969787597656, 0.05450439453125, -0.7319869995117188, 0.19309616088867188, 2.7093658447265625, -0.00540924072265625, 1.8286972045898438, -2.5710525512695312, 6.59576416015625, 1.3606338500976562, 5.2235260009765625, -0.69097900390625, -1.1343841552734375, -1.6828994750976562, 1.9857444763183594, 3.2499923706054688, 3.3056640625, 3.8818283081054688, -4.2902069091796875, 3.0102310180664062, 1.9966087341308594, -0.3941497802734375, 0.7804412841796875, 3.1413116455078125, 0.1309356689453125, 2.3151702880859375, 0.3318023681640625, 1.4910888671875, 3.1280975341796875, 0.28284454345703125, 1.9466171264648438, -0.5152854919433594], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000470.npy"}
|
||||
{"epoch": 0.7105064247921391, "step": 471, "batch_size": 64, "mean": 1.8478376865386963, "std": 2.4970600605010986, "min": -5.1924285888671875, "p10": -1.0488586425781248, "median": 1.7731819152832031, "p90": 5.78923683166504, "max": 6.3618621826171875, "pos_frac": 0.8125, "sample": [-0.06918525695800781, 4.726371765136719, 2.2840309143066406, 0.6564083099365234, -1.2118377685546875, 0.4993133544921875, 2.4117813110351562, 3.6896209716796875, 4.879951477050781, 6.1126251220703125, 1.4899139404296875, -0.8532867431640625, 6.0986785888671875, 0.7185287475585938, -1.6064224243164062, -0.24974918365478516, 1.1832923889160156, 5.891456604003906, 1.8058624267578125, -0.61749267578125, 0.6670951843261719, 2.9782180786132812, 2.097503662109375, 1.4572906494140625, 1.8889541625976562, -5.1924285888671875, 2.151599884033203, 2.248931884765625, 6.3618621826171875, 0.31415557861328125, 3.39154052734375, 2.4192981719970703, 0.6504859924316406, 0.6096763610839844, 3.3320159912109375, 1.9224929809570312, 4.040805816650391, -2.97454833984375, 2.553760528564453, 5.550724029541016, 1.247894287109375, 2.327392578125, 5.389509201049805, 6.0975189208984375, -0.6026477813720703, 0.6847610473632812, 6.295234680175781, 1.7405014038085938, -1.1326751708984375, 2.436553955078125, -1.4632797241210938, 4.260223388671875, 0.21759796142578125, 0.10137176513671875, 2.42626953125, 0.15510177612304688, 2.561279296875, 0.3024711608886719, 0.72021484375, 5.4486846923828125, 0.5810470581054688, 0.9952621459960938, -2.822265625, 5.9842987060546875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000471.npy"}
|
||||
{"epoch": 0.7120181405895691, "step": 472, "batch_size": 64, "mean": 1.9560319185256958, "std": 3.101038932800293, "min": -3.85052490234375, "p10": -1.764700317382812, "median": 1.7355232238769531, "p90": 6.393148422241215, "max": 8.964126586914062, "pos_frac": 0.71875, "sample": [0.3433685302734375, -0.34470367431640625, 1.1309127807617188, 2.344268798828125, 3.4118080139160156, 5.3717498779296875, 0.202545166015625, 4.2177734375, 1.5931625366210938, 4.87286376953125, 2.2373809814453125, 8.945663452148438, 2.6738433837890625, -1.1583099365234375, 0.7176170349121094, 4.753536224365234, -0.1327686309814453, 1.7609329223632812, 8.964126586914062, 2.378936767578125, 0.30249786376953125, -0.5440940856933594, -0.8449363708496094, 8.588150024414062, -2.0245819091796875, 2.1446380615234375, -0.02094268798828125, 8.925256729125977, 8.097450256347656, -0.6208820343017578, 3.489360809326172, 0.8310165405273438, 0.16164779663085938, -2.0308837890625, -2.2553176879882812, 1.6797637939453125, 3.2223129272460938, 8.409332275390625, 0.4100685119628906, -0.18292999267578125, 2.0182952880859375, 4.495513916015625, -3.2491455078125, 5.290138244628906, -3.85052490234375, -3.3624534606933594, 3.9867115020751953, 0.7309474945068359, -2.7269668579101562, 3.2355880737304688, 2.3804378509521484, 1.6135540008544922, 3.6038055419921875, 0.22359466552734375, -1.0174713134765625, -0.6319732666015625, 1.710113525390625, 1.8929519653320312, 2.5514297485351562, 2.4534034729003906, 3.3339080810546875, 2.115083694458008, -0.4634227752685547, 6.830890655517578], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000472.npy"}
|
||||
{"epoch": 0.7135298563869993, "step": 473, "batch_size": 64, "mean": 1.7292219400405884, "std": 2.809297561645508, "min": -6.347064971923828, "p10": -1.629898071289062, "median": 1.6392574310302734, "p90": 5.368657684326172, "max": 11.5638427734375, "pos_frac": 0.734375, "sample": [5.450325012207031, 3.902374267578125, 0.17268753051757812, -1.9602365493774414, 0.5507011413574219, 4.72918701171875, -0.3231391906738281, 3.4998855590820312, 0.6110572814941406, 4.168693542480469, 5.466560363769531, 0.9763736724853516, 0.06611061096191406, 1.93890380859375, 0.25252532958984375, -2.822601318359375, 6.724945068359375, 5.79351806640625, 2.460723876953125, 3.09893798828125, -6.347064971923828, -2.9097213745117188, 2.199554443359375, 0.596282958984375, -1.143798828125, 4.363044738769531, 2.3201217651367188, 3.0772705078125, 1.8475303649902344, 2.7769088745117188, 2.7561187744140625, -1.838226318359375, -0.06124114990234375, 0.9958572387695312, -2.7860794067382812, 3.960916519165039, 5.1781005859375, 4.603179931640625, 1.0800056457519531, 1.7273197174072266, 5.697845458984375, 1.2929000854492188, 11.5638427734375, 3.3517684936523438, -0.269073486328125, -0.1458740234375, 0.6311187744140625, -0.0708770751953125, 2.2672042846679688, 1.81591796875, 1.9949188232421875, 1.3522567749023438, 4.843620300292969, 1.7259979248046875, -0.36314964294433594, 2.2258644104003906, 5.7891845703125, -0.10053253173828125, -0.1601409912109375, 1.218292236328125, -0.31862640380859375, 0.26556396484375, 1.5525169372558594, -2.6439437866210938], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000473.npy"}
|
||||
{"epoch": 0.7150415721844293, "step": 474, "batch_size": 64, "mean": 1.544090986251831, "std": 2.4890706539154053, "min": -4.261837005615234, "p10": -1.4455169677734374, "median": 1.3257875442504883, "p90": 5.219339752197268, "max": 6.3079376220703125, "pos_frac": 0.6875, "sample": [6.19329833984375, 1.1357078552246094, 2.3895645141601562, 1.2190933227539062, -1.3452796936035156, -1.7221832275390625, 3.9996185302734375, 5.88726806640625, 3.03912353515625, -0.263824462890625, 0.662261962890625, 1.1711959838867188, -4.13035774230957, 3.5483245849609375, 2.066741943359375, 5.669742584228516, -0.13433074951171875, 1.2664737701416016, -0.3598823547363281, 4.492057800292969, -1.762685775756836, -1.4884757995605469, 3.7054977416992188, 2.5670909881591797, 3.7361297607421875, 1.2012767791748047, -0.1903209686279297, -0.65185546875, -0.5867691040039062, -0.07222747802734375, 6.3079376220703125, 1.6815872192382812, -2.0203857421875, 0.347412109375, -0.184906005859375, 2.599395751953125, -0.05730438232421875, 5.94244384765625, 4.3175811767578125, 0.5733795166015625, 0.3964824676513672, 2.3636093139648438, 4.6372222900390625, 0.0417022705078125, -0.7863731384277344, 4.4441375732421875, 1.871551513671875, 1.650909423828125, 1.8207244873046875, 1.6684494018554688, 5.468818664550781, 5.986244201660156, 1.8306884765625, 1.385101318359375, 1.7898674011230469, -4.261837005615234, 3.5339126586914062, -0.845916748046875, -1.208822250366211, 1.9632415771484375, 4.634330749511719, -2.0163803100585938, 0.6109886169433594, 1.0937614440917969], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000474.npy"}
|
||||
{"epoch": 0.7165532879818595, "step": 475, "batch_size": 64, "mean": 2.021491050720215, "std": 2.6066701412200928, "min": -4.08856201171875, "p10": -0.8608665466308593, "median": 1.6744461059570312, "p90": 5.121326446533203, "max": 10.526119232177734, "pos_frac": 0.8125, "sample": [0.5816726684570312, 0.6069793701171875, 5.6984405517578125, 1.6310882568359375, 1.5757827758789062, 1.1941604614257812, 4.613895416259766, 3.8713150024414062, 0.6388092041015625, -2.7887535095214844, 0.6776123046875, -2.60968017578125, 4.4164276123046875, 4.634124755859375, 1.7645950317382812, 2.5244369506835938, 3.345958709716797, 2.220897674560547, 1.4362716674804688, 4.4637603759765625, 2.8527450561523438, 0.288238525390625, 0.7096405029296875, -0.7902908325195312, 1.4647789001464844, -1.6740150451660156, 0.5670700073242188, 7.30499267578125, 1.3851852416992188, -0.5572052001953125, 10.526119232177734, 5.567413330078125, 2.952056884765625, 6.740745544433594, 1.332524299621582, -4.08856201171875, 3.1155471801757812, -0.4155006408691406, 3.0182037353515625, -0.7040977478027344, 4.42333984375, 0.98974609375, 0.7655830383300781, 2.3819732666015625, 3.699695587158203, -1.9261913299560547, -0.6178226470947266, 0.2926979064941406, 5.017425537109375, 1.717803955078125, 3.051410675048828, 5.9261627197265625, 2.593231201171875, 5.165855407714844, 3.509784698486328, -1.411041259765625, 0.6250228881835938, 3.351470947265625, 4.1062164306640625, 3.751729965209961, 1.7920589447021484, 0.15326690673828125, -0.89111328125, 0.8137264251708984], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000475.npy"}
|
||||
{"epoch": 0.7180650037792895, "step": 476, "batch_size": 64, "mean": 1.4242854118347168, "std": 2.376504421234131, "min": -4.16734504699707, "p10": -1.589063262939453, "median": 1.2473831176757812, "p90": 4.106849670410156, "max": 8.087982177734375, "pos_frac": 0.6875, "sample": [5.308311462402344, 3.1421432495117188, -0.0045166015625, 2.317302703857422, -1.4418182373046875, 1.6621780395507812, -1.68841552734375, 2.8550243377685547, 4.129669189453125, -0.08193206787109375, 5.729034423828125, 0.35532188415527344, 5.5435791015625, 3.1222782135009766, 0.5020828247070312, -0.5149459838867188, 1.8415069580078125, 0.5912933349609375, 2.0962791442871094, -0.3619232177734375, -0.2935600280761719, 1.3484811782836914, 4.0536041259765625, 5.658908843994141, 3.2588043212890625, -4.16734504699707, -1.6521682739257812, 2.7758865356445312, 0.56768798828125, 0.1832437515258789, 5.436279296875, 1.6569366455078125, 1.1630401611328125, 0.35430145263671875, 0.08515548706054688, 1.4415359497070312, -3.222137451171875, 1.33172607421875, 3.1023330688476562, -0.5456581115722656, -1.2478713989257812, 2.8908233642578125, 1.0287322998046875, 0.3466339111328125, 2.9165802001953125, -0.1905670166015625, -1.6911811828613281, 3.5072784423828125, -0.114349365234375, 3.8813743591308594, 2.6401214599609375, -0.773468017578125, 3.6611785888671875, -0.838348388671875, 3.5864105224609375, 8.087982177734375, 0.5582199096679688, 0.4369926452636719, 2.839994430541992, -2.2813568115234375, -1.7948265075683594, 2.6639938354492188, 3.428333282470703, -0.027923583984375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000476.npy"}
|
||||
{"epoch": 0.7195767195767195, "step": 477, "batch_size": 64, "mean": 1.240182638168335, "std": 2.574627161026001, "min": -4.984077453613281, "p10": -1.8679203033447263, "median": 1.3211860656738281, "p90": 4.412811756134034, "max": 6.346950531005859, "pos_frac": 0.6875, "sample": [-3.229949951171875, -1.1577529907226562, 4.5146026611328125, 0.8646087646484375, 2.626983642578125, 1.911773681640625, -1.527130126953125, 0.11236763000488281, 2.5336151123046875, -1.5709495544433594, 2.2501373291015625, 2.833293914794922, -0.9897022247314453, -0.30777740478515625, 0.7964725494384766, 2.2779674530029297, -0.5059013366699219, 3.0335464477539062, 1.7500762939453125, -0.4494953155517578, 3.1981201171875, 5.015415191650391, -0.7442512512207031, 0.420440673828125, 3.4491119384765625, -0.7557106018066406, -3.3937149047851562, 3.0011215209960938, 4.157857894897461, 0.57501220703125, -2.9637908935546875, -0.2441864013671875, 1.7754554748535156, 1.4654998779296875, -4.984077453613281, -1.2813491821289062, 0.2838287353515625, 3.6061363220214844, 1.6003341674804688, 0.3389396667480469, 2.4532546997070312, -1.9951934814453125, 2.2056961059570312, 5.52838134765625, 1.4254226684570312, 6.346950531005859, -2.0551071166992188, 1.4967041015625, 4.843254089355469, -1.521270751953125, 1.216949462890625, 1.2013740539550781, 3.9221954345703125, 6.309425354003906, 2.657867431640625, 4.1385498046875, -4.40472412109375, 0.31005859375, 0.7483673095703125, 6.3044586181640625, 1.1666183471679688, 4.175299644470215, -0.678558349609375, 3.2887325286865234], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000477.npy"}
|
||||
{"epoch": 0.7210884353741497, "step": 478, "batch_size": 64, "mean": 1.4475772380828857, "std": 3.061415910720825, "min": -5.9774169921875, "p10": -1.6082979202270506, "median": 0.9611873626708984, "p90": 5.424322509765627, "max": 9.509040832519531, "pos_frac": 0.65625, "sample": [-0.8120613098144531, 8.089431762695312, 4.010498046875, 4.05401611328125, 2.713634490966797, -5.960174560546875, 3.7343292236328125, 7.2545623779296875, -0.1403350830078125, 3.7163543701171875, -1.0983390808105469, 0.8093032836914062, 2.28131103515625, 4.900550842285156, -2.01953125, 1.7882575988769531, -0.8610076904296875, -0.7586517333984375, -0.5469856262207031, 0.9812774658203125, -0.6106109619140625, -3.233330726623535, 2.3223037719726562, 2.8882217407226562, 4.0466461181640625, -1.1394805908203125, 0.673370361328125, 1.4096603393554688, 4.570426940917969, -0.598663330078125, 2.9656829833984375, -0.7464752197265625, -3.3935546875, -1.6883888244628906, 2.1921157836914062, 4.138458251953125, 5.648796081542969, 0.082305908203125, 0.7672195434570312, 1.4917221069335938, 5.859657287597656, -1.4091663360595703, -5.9774169921875, 0.4431877136230469, 9.509040832519531, -0.18679428100585938, 0.8136100769042969, 3.55426025390625, 3.0136947631835938, 0.5635643005371094, 1.8359527587890625, 3.8605270385742188, -0.610443115234375, 2.5132179260253906, 1.7914352416992188, 7.898399353027344, 6.117393493652344, -2.389373779296875, 0.9410972595214844, 1.3005828857421875, -0.5559730529785156, -1.4214191436767578, 0.6167793273925781, 0.6402606964111328], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000478.npy"}
|
||||
{"epoch": 0.7226001511715797, "step": 479, "batch_size": 64, "mean": 1.281598448753357, "std": 2.6864476203918457, "min": -7.120887756347656, "p10": -1.761419677734375, "median": 0.9723987579345703, "p90": 4.679888153076172, "max": 7.260612487792969, "pos_frac": 0.671875, "sample": [2.493122100830078, 7.260612487792969, -0.2098388671875, -1.721343994140625, -0.5188674926757812, 4.120838165283203, 6.131336212158203, 1.6602935791015625, 4.683502197265625, -0.37177276611328125, 1.514577865600586, -3.038982391357422, -2.0096359252929688, 0.5957107543945312, 3.0106277465820312, 2.190826416015625, 6.832427978515625, 2.2138671875, -0.13166046142578125, 0.7597522735595703, -0.29268550872802734, -0.5129623413085938, -0.07135009765625, 4.7895050048828125, 0.9292068481445312, -2.8584976196289062, 3.8641929626464844, 1.686126708984375, -1.5637397766113281, 0.08123779296875, 3.6650238037109375, 2.672191619873047, -2.75030517578125, 3.2939987182617188, 3.1550750732421875, 0.7161922454833984, -2.687885284423828, 0.9615020751953125, 0.5420570373535156, 5.637382507324219, 4.671455383300781, -7.120887756347656, -1.1172103881835938, 0.7980117797851562, 3.9623947143554688, 5.268207550048828, 0.6543731689453125, -1.778594970703125, -1.7126617431640625, 0.9625473022460938, 1.24383544921875, 4.3309326171875, -0.3229560852050781, 3.979461669921875, 0.05104637145996094, 1.1866188049316406, 3.2574462890625, 0.9822502136230469, 1.5618629455566406, 1.3457164764404297, -1.4396162033081055, -1.4712753295898438, 2.7665023803710938, 3.2411842346191406], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000479.npy"}
|
||||
{"epoch": 0.7241118669690099, "step": 480, "batch_size": 64, "mean": 1.7271829843521118, "std": 2.805346965789795, "min": -6.8271636962890625, "p10": -1.7016632080078125, "median": 1.9102802276611328, "p90": 4.627373123168946, "max": 8.433364868164062, "pos_frac": 0.75, "sample": [1.9732437133789062, -0.802642822265625, 2.87322998046875, 0.21894073486328125, 2.067403793334961, 1.0868396759033203, 1.8933982849121094, -5.430938720703125, 0.0622406005859375, 3.3579254150390625, -1.090580940246582, -0.9044151306152344, 3.5065231323242188, -1.7646255493164062, 3.6426239013671875, -2.140960693359375, -1.7108917236328125, 1.743408203125, -0.372222900390625, 2.0242996215820312, -1.6801300048828125, 5.5692291259765625, 0.5768165588378906, 0.1865692138671875, 2.132598876953125, 5.801319122314453, 1.0821685791015625, 3.1236419677734375, 6.490234375, 0.847930908203125, 3.0664405822753906, -1.1076202392578125, -1.234375, 1.2586822509765625, 4.169340133666992, 4.039739608764648, 2.6272354125976562, 4.310302734375, -6.8271636962890625, 1.7409629821777344, 3.301685333251953, 8.433364868164062, 4.09423828125, 1.7557182312011719, 7.163909912109375, 4.627925872802734, -0.30462646484375, 2.9063796997070312, 1.3764419555664062, 1.8746490478515625, 1.9271621704101562, -0.370361328125, 5.825202941894531, 4.4437408447265625, 0.8991737365722656, 3.8343944549560547, 4.146141052246094, 3.1037960052490234, 2.45831298828125, -2.0135955810546875, 0.5625114440917969, 4.6260833740234375, 2.6753692626953125, -3.214630126953125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000480.npy"}
|
||||
{"epoch": 0.7256235827664399, "step": 481, "batch_size": 64, "mean": 1.5297071933746338, "std": 2.9068846702575684, "min": -5.0117340087890625, "p10": -1.4598464965820308, "median": 0.9788017272949219, "p90": 5.5949031829833995, "max": 11.4088134765625, "pos_frac": 0.703125, "sample": [4.193264007568359, -2.5819625854492188, 6.648841857910156, -1.0453109741210938, -1.6949234008789062, 3.6759185791015625, 5.393299102783203, 0.6039237976074219, -0.9042510986328125, 0.13162612915039062, 3.967803955078125, -0.43154144287109375, 0.6535797119140625, 0.7237510681152344, -0.78521728515625, 0.79742431640625, 2.1196632385253906, 2.9862022399902344, 0.078765869140625, 0.387298583984375, -5.0117340087890625, -0.03560638427734375, 2.5409698486328125, 1.8282661437988281, -0.046398162841796875, -0.9745311737060547, -0.7453765869140625, 3.4515953063964844, 0.9702987670898438, 4.546287536621094, -0.0021915435791015625, 1.4387931823730469, -1.6375045776367188, 1.000640869140625, 1.6222763061523438, 0.9735565185546875, 2.970062255859375, 1.3203086853027344, 0.6866607666015625, -0.32291412353515625, 2.43359375, 0.9840469360351562, 3.0195350646972656, 8.956417083740234, 1.0508594512939453, 6.195899963378906, -1.0074081420898438, 1.2979507446289062, 5.8802032470703125, 4.372642517089844, 4.053245544433594, 0.056301116943359375, 0.8673095703125, -2.428070068359375, -2.6884498596191406, 5.681304931640625, -3.8834667205810547, 1.6436309814453125, 6.136585235595703, 11.4088134765625, 3.249492645263672, 0.18758010864257812, 1.132476806640625, -0.19084930419921875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000481.npy"}
|
||||
{"epoch": 0.72713529856387, "step": 482, "batch_size": 64, "mean": 1.7152750492095947, "std": 2.7879064083099365, "min": -3.8515167236328125, "p10": -2.3578987121582027, "median": 1.6375398635864258, "p90": 5.579280090332031, "max": 7.521820068359375, "pos_frac": 0.734375, "sample": [-0.643890380859375, -3.093475341796875, 0.9748687744140625, 2.131256103515625, 2.852325439453125, -3.0631484985351562, 1.4036712646484375, -3.06890869140625, 1.1229591369628906, 6.72576904296875, 1.554281234741211, 4.506561279296875, 4.697368621826172, 1.2766876220703125, 2.0541763305664062, 3.148174285888672, 4.124786376953125, 1.2166252136230469, -2.5216751098632812, 0.7437286376953125, 5.6221771240234375, 0.9531707763671875, 7.521820068359375, -2.8866729736328125, 0.3506889343261719, 3.1459693908691406, 7.17333984375, 1.3044357299804688, -1.8081703186035156, -0.4397087097167969, -1.9757537841796875, 1.9135284423828125, 0.11962890625, 2.705493927001953, 6.198619842529297, 3.4510116577148438, 3.2049789428710938, 1.167327880859375, 1.9262466430664062, -0.9989013671875, 6.652008056640625, 0.22119140625, 2.1685638427734375, 1.9483184814453125, 1.1982498168945312, 3.76165771484375, 4.5029296875, 1.7207984924316406, 5.47918701171875, -0.21945953369140625, 4.036102294921875, 3.2981643676757812, 3.4093017578125, -0.31070709228515625, 3.1158981323242188, 1.0551643371582031, -3.8515167236328125, 5.921142578125, -1.1618270874023438, -3.217578887939453, 4.017845153808594, -1.0664863586425781, 3.42449951171875, -1.1172122955322266], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000482.npy"}
|
||||
{"epoch": 0.7286470143613001, "step": 483, "batch_size": 64, "mean": 1.9437729120254517, "std": 2.404308319091797, "min": -5.123149871826172, "p10": -0.5355865478515623, "median": 1.6034393310546875, "p90": 4.998641967773438, "max": 7.948143005371094, "pos_frac": 0.828125, "sample": [0.39226531982421875, 4.144691467285156, 0.743560791015625, 1.2351226806640625, 2.6567840576171875, 5.7988128662109375, 3.9218692779541016, 0.7366943359375, 0.6656494140625, -0.5988311767578125, 0.7246589660644531, 1.89794921875, 4.03106689453125, 6.216361999511719, 1.4747238159179688, 3.187122344970703, 0.6726799011230469, -0.2825469970703125, 0.2930717468261719, 7.688209533691406, 4.5185546875, 6.929656982421875, 4.752754211425781, -0.3348541259765625, 5.181404113769531, 2.1717376708984375, 2.7821426391601562, 1.033905029296875, 0.234832763671875, 3.7494659423828125, 1.6454544067382812, -2.5839767456054688, 4.822105407714844, 1.3973712921142578, 2.76165771484375, 3.654083251953125, 1.7573013305664062, -5.123149871826172, 0.9115791320800781, 2.2087936401367188, 0.7847747802734375, 1.8539657592773438, 1.8502426147460938, 2.1728591918945312, -0.9126358032226562, 2.251209259033203, 7.948143005371094, 5.046844482421875, 0.782012939453125, 2.985828399658203, 1.08758544921875, -1.0772705078125, 2.6696243286132812, 4.88616943359375, -1.7759532928466797, -1.5282516479492188, 0.508056640625, 1.166717529296875, 2.536510467529297, 1.1031494140625, -0.00225830078125, 0.8199996948242188, -0.3880157470703125, 1.5614242553710938], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000483.npy"}
|
||||
{"epoch": 0.7301587301587301, "step": 484, "batch_size": 64, "mean": 1.299433946609497, "std": 2.0726420879364014, "min": -3.5960350036621094, "p10": -1.0815063476562499, "median": 1.177633285522461, "p90": 3.7596138000488284, "max": 6.598823547363281, "pos_frac": 0.765625, "sample": [1.2077369689941406, -1.9809799194335938, -2.5796051025390625, 0.2933921813964844, 6.442386627197266, 0.1443195343017578, 1.7776622772216797, -3.5960350036621094, 4.4654541015625, -1.1276397705078125, 0.9511566162109375, 0.8269195556640625, -2.147401809692383, 2.1190719604492188, 0.7192878723144531, 2.9313888549804688, -0.478546142578125, 1.1475296020507812, 0.20728302001953125, 1.3700790405273438, -1.2551803588867188, 2.65008544921875, 6.598823547363281, 3.0228500366210938, 1.7756900787353516, 3.6685943603515625, 2.2684326171875, 1.957000732421875, 0.9698410034179688, 1.4070358276367188, -0.7237739562988281, 3.471752166748047, -0.2593841552734375, 1.4778594970703125, 5.3188934326171875, 0.5553016662597656, 0.261138916015625, 0.3932685852050781, 0.8624343872070312, 0.7749710083007812, 1.5349788665771484, 0.2783203125, 6.0303955078125, -0.951019287109375, 2.7636537551879883, 0.6759071350097656, 2.739288330078125, 1.7232933044433594, 1.5261077880859375, 2.620758056640625, 3.7986221313476562, 1.5260190963745117, 2.8085250854492188, -0.821990966796875, -1.41522216796875, -0.671051025390625, 4.440391540527344, -0.6162872314453125, 1.527191162109375, 0.9242401123046875, 0.5124435424804688, 2.29547119140625, 2.9985008239746094, -0.9738616943359375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000484.npy"}
|
||||
{"epoch": 0.7316704459561603, "step": 485, "batch_size": 64, "mean": 1.605086326599121, "std": 2.437469005584717, "min": -2.7571640014648438, "p10": -0.9167615890502929, "median": 1.2250442504882812, "p90": 5.349943542480469, "max": 7.221282958984375, "pos_frac": 0.734375, "sample": [1.5474166870117188, -0.7089786529541016, 4.875148773193359, 2.876556396484375, 0.22176361083984375, 0.5270004272460938, 1.1616439819335938, -0.4964027404785156, 5.202735900878906, -0.8665027618408203, -2.7571640014648438, 1.8617115020751953, 6.091072082519531, 5.413032531738281, 1.7191925048828125, 0.2281494140625, 3.204193115234375, 4.1542816162109375, 3.9717864990234375, 1.320556640625, 0.8360214233398438, 0.629974365234375, -2.5175552368164062, 5.022186279296875, 0.42949485778808594, 1.3778228759765625, -1.8316802978515625, 1.8002090454101562, -0.7073822021484375, 2.037090301513672, -1.43731689453125, 2.124908447265625, 7.221282958984375, 0.415374755859375, -0.9383010864257812, 0.451995849609375, 6.579620361328125, 6.670417785644531, -1.4446563720703125, 3.6570968627929688, 2.1428985595703125, -0.8361091613769531, 1.5067214965820312, -0.7805252075195312, 0.1495990753173828, -1.5941390991210938, -0.11586761474609375, 2.3165855407714844, 4.059291839599609, 0.2075653076171875, 0.30565643310546875, -0.10863399505615234, 3.8378829956054688, 5.872018814086914, 1.69378662109375, 3.8572769165039062, 1.2884445190429688, 6.309064865112305, 0.5754432678222656, -0.6419219970703125, 1.5935211181640625, -0.1063079833984375, 0.1646728515625, 1.10479736328125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000485.npy"}
|
||||
{"epoch": 0.7331821617535903, "step": 486, "batch_size": 64, "mean": 1.917954921722412, "std": 3.0805699825286865, "min": -5.117246627807617, "p10": -1.523685836791992, "median": 1.3793182373046875, "p90": 5.55851287841797, "max": 11.353424072265625, "pos_frac": 0.734375, "sample": [-0.8294448852539062, 0.33502197265625, 7.879058837890625, 0.4986076354980469, -0.04840087890625, -1.480712890625, 1.1585159301757812, 5.228424072265625, 4.240516662597656, 0.09758758544921875, -1.7669181823730469, 3.78509521484375, 4.871143341064453, 1.731729507446289, 4.591407775878906, 1.7463951110839844, -0.5189743041992188, 6.498683929443359, -0.02160930633544922, 5.015968322753906, 1.1170921325683594, 2.5657882690429688, 0.7124137878417969, -0.30989837646484375, 8.835220336914062, 1.56463623046875, 7.785224914550781, -2.484781265258789, 1.4977455139160156, -4.1587066650390625, 5.6692962646484375, 2.5590896606445312, 2.0114898681640625, -1.6779708862304688, 0.7571477890014648, 1.1200485229492188, 0.1840057373046875, 5.3093719482421875, -1.4428539276123047, -0.40569305419921875, 0.748809814453125, 4.121288299560547, 1.8206558227539062, 5.3519287109375, 4.151004791259766, 1.3038177490234375, -0.2843475341796875, 1.022430419921875, 4.334907531738281, 5.6470489501953125, 3.460693359375, 0.8037033081054688, 1.2922515869140625, 1.740447998046875, -0.49204063415527344, 0.73126220703125, 2.2473526000976562, 1.4548187255859375, 11.353424072265625, -1.5421028137207031, 5.048866271972656, -5.117246627807617, 2.2158546447753906, -2.886474609375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000486.npy"}
|
||||
{"epoch": 0.7346938775510204, "step": 487, "batch_size": 64, "mean": 1.4492161273956299, "std": 2.373466730117798, "min": -5.098320007324219, "p10": -1.4391620635986326, "median": 1.5020008087158203, "p90": 4.014293479919434, "max": 9.229217529296875, "pos_frac": 0.71875, "sample": [0.1917877197265625, -1.8950424194335938, -0.2330322265625, 3.0887908935546875, 2.3123130798339844, 1.9897651672363281, 1.325286865234375, 1.986968994140625, 4.3264312744140625, 1.89013671875, 3.248931884765625, -0.7080764770507812, -0.145233154296875, -1.4975013732910156, -1.3383026123046875, 3.931732177734375, 4.987827301025391, -0.4388542175292969, 3.64886474609375, 1.0333251953125, -5.098320007324219, 2.794158935546875, 0.18109703063964844, 5.285640716552734, 1.7867813110351562, -1.4823875427246094, -0.048274993896484375, -1.625579833984375, 2.0568885803222656, 2.4864959716796875, 3.558929443359375, 1.355926513671875, 2.5137195587158203, -2.946653366088867, 3.5226211547851562, 1.6342849731445312, 6.047233581542969, 0.9634017944335938, 1.6070785522460938, 2.248685836791992, 3.640899658203125, 0.3151702880859375, 0.4772796630859375, 2.2343177795410156, -1.2630081176757812, 1.25799560546875, 2.641977310180664, -0.30815887451171875, 0.6384429931640625, -2.7436351776123047, -0.4041023254394531, 4.049676895141602, 0.7770538330078125, -0.5896453857421875, 1.5754318237304688, 3.7349014282226562, 1.3111724853515625, 1.4285697937011719, -0.219451904296875, 5.924461364746094, 2.040843963623047, 2.3425331115722656, 9.229217529296875, 0.11003684997558594], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000487.npy"}
|
||||
{"epoch": 0.7362055933484505, "step": 488, "batch_size": 64, "mean": 1.9046547412872314, "std": 3.111302614212036, "min": -5.186614990234375, "p10": -1.425651741027832, "median": 0.8615493774414062, "p90": 6.316864395141602, "max": 10.59576416015625, "pos_frac": 0.71875, "sample": [-1.3861808776855469, 0.7609596252441406, -0.15460968017578125, -2.0958824157714844, 4.903114318847656, -1.1699447631835938, -5.186614990234375, 1.9712944030761719, 5.648033142089844, 6.895366668701172, 0.1717071533203125, -2.4648590087890625, 7.9060516357421875, 0.6842842102050781, 0.4394378662109375, 10.59576416015625, 1.6900253295898438, 4.3887176513671875, 0.09799575805664062, -1.1595230102539062, 0.8357696533203125, 2.8410491943359375, 1.6862106323242188, 0.7917022705078125, 0.60675048828125, 2.4383392333984375, -0.47713470458984375, 4.728298187255859, -0.9918060302734375, 6.7044830322265625, -1.0877761840820312, -0.952606201171875, -0.7166633605957031, 6.1878204345703125, 0.78753662109375, 0.4987335205078125, 4.876502990722656, -1.4425678253173828, 5.174079895019531, 4.4156036376953125, 7.379356384277344, 0.39681243896484375, 3.2784194946289062, -0.57672119140625, 6.895835876464844, 2.6955032348632812, 0.853485107421875, 0.6663360595703125, -2.1439208984375, 3.8303985595703125, 0.9961700439453125, -0.2771940231323242, -1.9326133728027344, 0.00936126708984375, 0.8696136474609375, 6.368068695068359, 1.85186767578125, 6.1973876953125, 3.7984771728515625, 3.573833465576172, 2.371185302734375, 1.2343673706054688, -1.9025650024414062, 6.024971008300781], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000488.npy"}
{"epoch": 0.7377173091458806, "step": 489, "batch_size": 64, "mean": 0.8081076741218567, "std": 2.0039243698120117, "min": -3.0113372802734375, "p10": -1.8641448974609371, "median": 0.6870841979980469, "p90": 3.3119255065917983, "max": 7.26470947265625, "pos_frac": 0.640625, "sample": [-2.177570343017578, 0.3249626159667969, 1.0399322509765625, 2.3407020568847656, 1.5874710083007812, 4.282558441162109, 2.1646499633789062, 1.5079803466796875, 0.6915969848632812, 7.26470947265625, 2.424591064453125, 0.6434555053710938, 0.035552978515625, -1.1848907470703125, -0.3900947570800781, -0.310150146484375, -2.3081893920898438, -0.9748611450195312, 1.7592315673828125, 1.8425617218017578, 2.844371795654297, 1.0345840454101562, 1.9013481140136719, 0.4378776550292969, 4.514671325683594, 4.496723175048828, -1.2260665893554688, -0.9010162353515625, 1.583953857421875, 0.6825714111328125, -1.9858779907226562, -3.0113372802734375, 4.5318756103515625, 0.4423370361328125, -0.5660781860351562, -0.7018337249755859, -0.34772491455078125, 1.6713619232177734, 3.851318359375, 1.3102798461914062, 0.6393508911132812, 1.382354736328125, 1.0220718383789062, 1.127349853515625, 2.0001754760742188, -2.2591476440429688, 0.9032745361328125, -0.48590850830078125, 2.9356231689453125, 0.65692138671875, -2.775177001953125, 0.2880859375, 1.7804641723632812, -0.5540828704833984, -0.16802406311035156, -2.354217529296875, 2.813507080078125, -0.477783203125, -1.0243797302246094, -1.5801010131835938, -0.5158615112304688, 3.4731979370117188, 1.1942005157470703, 2.5694580078125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000489.npy"}
{"epoch": 0.7392290249433107, "step": 490, "batch_size": 64, "mean": 1.6473113298416138, "std": 2.9756124019622803, "min": -5.779052734375, "p10": -2.421446800231933, "median": 1.0274124145507812, "p90": 5.994564628601075, "max": 7.734199523925781, "pos_frac": 0.71875, "sample": [0.18143081665039062, -3.0354156494140625, -5.779052734375, 7.645545959472656, -0.48720836639404297, 1.8198165893554688, -0.0875091552734375, 0.8555908203125, 6.047882080078125, 0.913787841796875, -2.110992431640625, 3.850860595703125, -2.7714157104492188, 0.5865001678466797, 5.870157241821289, 2.471202850341797, 6.899261474609375, 4.5489959716796875, 3.2165069580078125, 0.5190505981445312, -0.7562103271484375, 2.8464813232421875, 6.617923736572266, 5.213081359863281, -2.6041183471679688, 0.82623291015625, 6.470340728759766, -0.4824180603027344, 0.5745353698730469, 0.12708282470703125, -1.1241302490234375, 2.0249481201171875, 1.208953857421875, 3.542675018310547, -0.4243278503417969, -0.04474639892578125, 2.9214210510253906, 6.733280181884766, -2.5544986724853516, 0.09588241577148438, 4.569145202636719, -0.9581985473632812, 0.060222625732421875, 5.140859603881836, 0.18315505981445312, 2.0385494232177734, 1.3084945678710938, 0.18507003784179688, -3.12347412109375, -0.608184814453125, 4.350799560546875, 1.4288654327392578, -0.14085006713867188, 4.323760986328125, 7.734199523925781, 2.857250213623047, 0.7078933715820312, 2.7655868530273438, 1.1410369873046875, 4.824106216430664, -3.23321533203125, 0.8043785095214844, 4.5771026611328125, 2.1239891052246094], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000490.npy"}
{"epoch": 0.7407407407407407, "step": 491, "batch_size": 64, "mean": 1.7601425647735596, "std": 2.3532557487487793, "min": -2.404186248779297, "p10": -0.9700611114501952, "median": 1.5214996337890625, "p90": 4.919665527343751, "max": 8.31011962890625, "pos_frac": 0.71875, "sample": [7.407144546508789, -0.9758987426757812, 4.60076904296875, 1.5506668090820312, 0.9893035888671875, 2.011157989501953, 2.4718589782714844, 1.829803466796875, 2.6362457275390625, 0.7745437622070312, 1.3487548828125, 0.32392120361328125, 3.235382080078125, -0.9564399719238281, 4.238460540771484, -1.0440673828125, 1.1491584777832031, 4.062774658203125, 0.8882675170898438, 2.306884765625, -0.7863922119140625, -1.763885498046875, 2.6335411071777344, 1.4923324584960938, -0.25211524963378906, 3.7985916137695312, 0.9123687744140625, 2.64459228515625, -0.9447708129882812, 0.5959930419921875, 5.242839813232422, 1.046142578125, -0.123687744140625, 2.0150909423828125, 3.8955078125, 0.745880126953125, -0.6593093872070312, 6.539215087890625, 1.6034469604492188, -0.3347129821777344, 5.422950744628906, 0.1986083984375, -0.4400615692138672, -0.37713050842285156, 3.4095077514648438, 2.0643882751464844, -1.44598388671875, -0.4453544616699219, 8.31011962890625, 3.4031753540039062, 2.1215896606445312, 0.6730117797851562, 5.05633544921875, 5.576313018798828, 2.7754077911376953, 1.958221435546875, 4.239421844482422, -0.3476371765136719, 4.220859527587891, -2.404186248779297, -1.623260498046875, 0.7175407409667969, -1.492462158203125, 3.9283905029296875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000491.npy"}
{"epoch": 0.7422524565381708, "step": 492, "batch_size": 64, "mean": 1.177229642868042, "std": 2.7216882705688477, "min": -5.5697784423828125, "p10": -2.5322509765624996, "median": 0.9480552673339844, "p90": 4.814429473876953, "max": 8.271476745605469, "pos_frac": 0.671875, "sample": [5.725830078125, -0.807373046875, -0.5681667327880859, 2.295896530151367, 1.530029296875, -2.6048660278320312, 3.9984130859375, -0.15113449096679688, 0.8147087097167969, 0.2579803466796875, 3.17852783203125, 2.055938720703125, 0.0352020263671875, 0.010711669921875, 3.9529190063476562, 2.03851318359375, -0.86993408203125, 0.21244049072265625, 0.5419139862060547, 0.8986358642578125, 3.1574974060058594, -3.3736495971679688, -1.3253631591796875, -0.96710205078125, 2.581226348876953, 2.1209564208984375, 0.5053863525390625, 4.690704345703125, 0.8692550659179688, -5.5697784423828125, 4.9339447021484375, 2.5146865844726562, 2.2074661254882812, 2.0014801025390625, -1.4631595611572266, 0.4388580322265625, -2.64935302734375, 8.271476745605469, -2.9739990234375, -1.9773406982421875, 6.17340087890625, 2.4163951873779297, 2.6743927001953125, -0.4871559143066406, -1.0673065185546875, 2.0520477294921875, 6.300930023193359, 1.6597137451171875, 1.0149002075195312, 4.867454528808594, -2.7169952392578125, -2.3628158569335938, -1.0509567260742188, 4.180511474609375, 3.683013916015625, 0.9974746704101562, 6.447052001953125, -0.426788330078125, 1.5668182373046875, 0.119598388671875, 2.904684066772461, -0.9766464233398438, -2.6351470947265625, 3.468738555908203], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000492.npy"}
{"epoch": 0.7437641723356009, "step": 493, "batch_size": 64, "mean": 1.7245078086853027, "std": 2.752708673477173, "min": -3.441162109375, "p10": -1.4526626586914062, "median": 1.8011293411254883, "p90": 5.305386352539065, "max": 8.647869110107422, "pos_frac": 0.71875, "sample": [2.6899948120117188, 0.4023590087890625, 2.0631103515625, 1.7652225494384766, 3.3132095336914062, -0.7784576416015625, -3.441162109375, 1.8865890502929688, 0.4968681335449219, -0.5022773742675781, 1.2120437622070312, 3.9415740966796875, 5.54632568359375, 7.178215026855469, -0.4559211730957031, -1.5989532470703125, 6.720726013183594, 0.04265022277832031, 4.532951354980469, 2.888591766357422, -0.6193771362304688, 0.21099472045898438, -3.3238143920898438, 1.8370361328125, -0.2528839111328125, -1.1727294921875, 4.0403289794921875, 1.6965179443359375, 8.40639877319336, 2.367098808288574, -1.6272735595703125, 1.9863395690917969, -1.5456085205078125, 2.9753036499023438, -1.5097503662109375, 1.40509033203125, 4.042015075683594, -1.1210708618164062, 3.003021240234375, 2.2870712280273438, 2.9243698120117188, 2.7065048217773438, 1.3107872009277344, -1.3194580078125, 2.1229934692382812, 7.164424896240234, -1.29547119140625, 2.00396728515625, 2.1710128784179688, -0.6374664306640625, 0.01226043701171875, 4.732854843139648, 2.3513336181640625, 1.2029266357421875, 6.6989898681640625, 4.072731018066406, -0.49481964111328125, 0.5046882629394531, 0.1320648193359375, 8.647869110107422, 2.4801101684570312, 0.35570526123046875, 4.743194580078125, -3.211437225341797], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000493.npy"}
{"epoch": 0.745275888133031, "step": 494, "batch_size": 64, "mean": 1.459385633468628, "std": 2.7680537700653076, "min": -5.076881408691406, "p10": -1.578192901611328, "median": 1.314565658569336, "p90": 5.036106109619143, "max": 8.147125244140625, "pos_frac": 0.734375, "sample": [3.1653518676757812, -0.047576904296875, 0.10931396484375, 0.46469879150390625, 1.9813804626464844, 0.509124755859375, 1.0119667053222656, 4.375450134277344, -0.8993301391601562, 2.732666015625, -0.5363655090332031, -1.8469619750976562, 1.3377838134765625, 3.1348724365234375, 1.3316383361816406, -0.19797515869140625, 4.157602310180664, 0.49273681640625, -0.1284942626953125, -1.4920654296875, 0.5307235717773438, 8.147125244140625, 2.3950958251953125, 0.894378662109375, 2.099578857421875, 7.926628112792969, 2.1331863403320312, -1.7767524719238281, 2.4045257568359375, 6.376976013183594, 3.572540283203125, -4.243904113769531, -5.076881408691406, 2.6589717864990234, -0.2758960723876953, 0.0004119873046875, -1.6151046752929688, 4.215667724609375, 2.5150146484375, 0.5207786560058594, 4.0049285888671875, 0.07700347900390625, -3.343353271484375, 0.7584800720214844, 3.1242141723632812, 0.8659019470214844, 7.47686767578125, 2.748035430908203, 0.33147430419921875, 2.39727783203125, 5.319244384765625, 1.375640869140625, 1.39141845703125, 5.435760498046875, 1.6294708251953125, 1.2974929809570312, -4.487823486328125, -1.3823509216308594, 2.1903305053710938, -1.04791259765625, 2.727569580078125, -0.565887451171875, 0.833892822265625, 7.184120178222656], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000494.npy"}
{"epoch": 0.7467876039304611, "step": 495, "batch_size": 64, "mean": 1.3804597854614258, "std": 2.6315178871154785, "min": -5.362403869628906, "p10": -1.489923095703125, "median": 1.0801658630371094, "p90": 4.314740753173829, "max": 8.858673095703125, "pos_frac": 0.765625, "sample": [-3.5517654418945312, 0.44786834716796875, -0.4169464111328125, -5.362403869628906, 5.878345489501953, -2.179443359375, -1.492706298828125, 1.0580215454101562, -0.22480010986328125, 3.3560867309570312, -1.6971397399902344, -4.39801025390625, 0.5116195678710938, -0.30530548095703125, 2.4978485107421875, 4.3309478759765625, -1.2023544311523438, 2.5019073486328125, 3.138286590576172, -1.1322784423828125, 0.6436958312988281, 3.183805465698242, 0.3565216064453125, 1.646728515625, 1.4508857727050781, 4.933696746826172, 0.4005603790283203, 0.29351806640625, 1.6931953430175781, 0.7156143188476562, 8.858673095703125, 3.956371307373047, 1.3313179016113281, -1.573476791381836, 3.6817474365234375, -0.7736129760742188, 0.09848785400390625, 1.1023101806640625, 7.298583984375, 2.4288864135742188, 3.2574539184570312, 2.4783287048339844, 3.0897216796875, 0.3488006591796875, 4.276924133300781, 2.15545654296875, 1.1503715515136719, 0.748626708984375, -1.2719039916992188, 8.102907180786133, 0.5052967071533203, 3.123004913330078, 1.4349594116210938, 0.6537094116210938, 1.281005859375, 4.832427978515625, 1.6507720947265625, -1.483428955078125, 0.09569931030273438, 1.049560546875, 0.15149688720703125, 2.523773193359375, 4.2420196533203125, 0.4671516418457031], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000495.npy"}
{"epoch": 0.7482993197278912, "step": 496, "batch_size": 64, "mean": 1.800965666770935, "std": 2.9846367835998535, "min": -4.040355682373047, "p10": -1.3917213439941403, "median": 1.3544025421142578, "p90": 5.206805419921877, "max": 12.722503662109375, "pos_frac": 0.734375, "sample": [1.0945816040039062, 4.647247314453125, 9.39678955078125, 4.086578369140625, -0.3738555908203125, 0.8436203002929688, -1.1525115966796875, 2.758026123046875, 2.46002197265625, 4.2215576171875, 0.2955169677734375, 1.1457862854003906, 4.211692810058594, 1.2517166137695312, 1.3734169006347656, -0.7109603881835938, -0.5663166046142578, 2.6341304779052734, 3.318117141723633, 5.9066314697265625, -2.0275650024414062, 12.722503662109375, -0.47705078125, 2.7537994384765625, 1.33538818359375, -1.6141357421875, 5.382568359375, 4.79669189453125, 0.6569747924804688, 2.5834808349609375, -1.14678955078125, -0.4681282043457031, 8.294803619384766, 6.998504638671875, 3.6506271362304688, 0.4224662780761719, -4.040355682373047, 0.8996505737304688, -0.2163543701171875, 0.3302803039550781, 1.8470458984375, 1.8262863159179688, 2.1740875244140625, 2.3320789337158203, 4.176994323730469, 0.8985137939453125, 2.552642822265625, 1.4482460021972656, -0.7900505065917969, 0.39575958251953125, 6.654388427734375, -2.9417572021484375, -1.6547698974609375, 0.6971578598022461, 3.5462493896484375, 1.8008880615234375, 1.7718505859375, 3.5359649658203125, -0.12398338317871094, 0.015167236328125, -1.4942398071289062, 2.387502670288086, -3.5332412719726562, 0.05986785888671875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000496.npy"}
{"epoch": 0.7498110355253212, "step": 497, "batch_size": 64, "mean": 2.0381202697753906, "std": 2.93377947807312, "min": -4.4284820556640625, "p10": -1.04886360168457, "median": 1.8798370361328125, "p90": 5.1979217529296875, "max": 10.01895523071289, "pos_frac": 0.765625, "sample": [10.01895523071289, -0.5679397583007812, 9.726028442382812, 3.8428268432617188, 3.9510498046875, 0.6981277465820312, 2.415191650390625, 1.7044410705566406, 0.9501113891601562, -0.5692138671875, 2.6790924072265625, 3.0018081665039062, 3.068828582763672, 1.2729873657226562, 1.2828369140625, 0.42173004150390625, 3.2503433227539062, -4.4284820556640625, 2.8912792205810547, 0.47381591796875, -0.5014991760253906, 3.67779541015625, 1.3581466674804688, 0.22781753540039062, 1.1162986755371094, -0.7741165161132812, -1.66644287109375, 3.3943634033203125, 3.2555389404296875, 4.9014892578125, -0.5750331878662109, 3.2838821411132812, 0.7867431640625, 2.9846115112304688, -2.1742630004882812, 2.3057708740234375, 4.7716827392578125, 2.361957550048828, 2.6574935913085938, -2.350341796875, 5.692596435546875, 0.6495895385742188, -0.6192131042480469, -0.2791748046875, 5.1992034912109375, -0.7878990173339844, 8.556327819824219, 9.228599548339844, -1.3554344177246094, 0.22541046142578125, 1.0858154296875, 2.501373291015625, 3.29888916015625, 3.1541671752929688, 2.203826904296875, 8.691768646240234, -2.37615966796875, 2.0552330017089844, 0.11438369750976562, 0.7863578796386719, 2.2167205810546875, 5.1949310302734375, -1.16070556640625, 1.037384033203125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000497.npy"}
{"epoch": 0.7513227513227513, "step": 498, "batch_size": 64, "mean": 1.5286893844604492, "std": 2.622896194458008, "min": -3.0637779235839844, "p10": -1.4195083618164062, "median": 1.0708913803100586, "p90": 5.139940643310548, "max": 9.52798843383789, "pos_frac": 0.6875, "sample": [-1.5033721923828125, 2.3022193908691406, 5.46697998046875, -0.076934814453125, -0.982208251953125, -3.0637779235839844, 0.5382804870605469, 1.416412353515625, -0.6553916931152344, -1.2238616943359375, 4.804008483886719, 1.0089645385742188, -0.5472640991210938, 1.8355789184570312, 3.7150535583496094, 0.9775161743164062, 0.9340476989746094, 0.9284210205078125, 0.11796188354492188, 0.7949104309082031, 7.887657165527344, 1.9858245849609375, 3.3112564086914062, -1.6435203552246094, 1.99468994140625, -0.7868804931640625, 1.133209228515625, 3.7374725341796875, 3.7753524780273438, -0.8018989562988281, 1.9536972045898438, 4.113796234130859, 0.6530551910400391, 0.9476699829101562, 1.960723876953125, -1.08026123046875, -1.5041122436523438, 2.6008338928222656, 0.28704071044921875, 3.684093475341797, 6.231422424316406, 3.623767852783203, -1.853515625, 1.9033470153808594, 1.5286521911621094, 1.2153663635253906, 1.0122833251953125, -0.8733444213867188, 1.7718696594238281, 1.1294994354248047, 5.333915710449219, 9.52798843383789, 0.624237060546875, -1.0709686279296875, 4.924934387207031, -1.1048240661621094, 5.232086181640625, 1.568328857421875, -0.4874267578125, -2.279998779296875, -0.5663604736328125, -1.50335693359375, 2.691814422607422, 8.259162902832031], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000498.npy"}
{"epoch": 0.7528344671201814, "step": 499, "batch_size": 64, "mean": 1.8459932804107666, "std": 2.9726712703704834, "min": -3.7705841064453125, "p10": -1.979182434082031, "median": 1.9476661682128906, "p90": 5.905646514892578, "max": 8.562904357910156, "pos_frac": 0.671875, "sample": [4.551700592041016, 8.562904357910156, -0.6886520385742188, 2.364898681640625, 2.8866195678710938, -1.2606887817382812, -0.028455734252929688, -2.8422164916992188, 0.09187126159667969, 2.9702224731445312, 2.0856246948242188, 4.458244323730469, 6.010337829589844, 0.1336040496826172, 4.6759033203125, -1.5793075561523438, 8.074615478515625, -1.4412899017333984, 5.993927001953125, 2.5204925537109375, 3.108917236328125, -2.06781005859375, -2.3344497680664062, -3.216106414794922, -1.2488670349121094, 2.7418670654296875, 4.2067108154296875, 7.362335205078125, -1.344970703125, -1.0360260009765625, -0.5390987396240234, -2.07427978515625, 3.1254615783691406, 1.259063720703125, 2.4603271484375, 2.144500732421875, 1.507293701171875, 2.631134033203125, 0.7579994201660156, -1.0389785766601562, 5.890815734863281, -0.08008575439453125, 5.257221221923828, -0.6870994567871094, -0.0108489990234375, 1.8097076416015625, 1.6951217651367188, 4.417356491088867, 4.804443359375, 6.719818115234375, 1.5314674377441406, 4.2667083740234375, 2.5678672790527344, -2.9622955322265625, -3.7705841064453125, 3.840911865234375, 3.9613800048828125, 3.7785987854003906, -1.7723846435546875, 4.349029541015625, 0.5534286499023438, 5.9120025634765625, 1.4016799926757812, 0.7239227294921875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000499.npy"}
{"epoch": 0.7543461829176115, "step": 500, "batch_size": 64, "mean": 1.5600130558013916, "std": 2.598702907562256, "min": -5.896333694458008, "p10": -1.2486000061035154, "median": 1.2061424255371094, "p90": 4.749977111816406, "max": 7.744331359863281, "pos_frac": 0.734375, "sample": [0.02892303466796875, -1.1216278076171875, 0.95794677734375, -0.8031387329101562, 1.0136871337890625, 2.677520751953125, 4.219085693359375, 4.7288818359375, 0.49086761474609375, 6.651592254638672, 2.2529144287109375, 6.0313873291015625, 4.221523284912109, 1.3289871215820312, 4.394508361816406, 1.1758651733398438, 7.744331359863281, -3.419097900390625, 0.9663124084472656, 1.7463150024414062, 4.327919006347656, -1.3030166625976562, -1.0147743225097656, 3.833587646484375, -0.16765594482421875, 4.811553955078125, 3.2320022583007812, 0.8538627624511719, -0.5161380767822266, -4.066612243652344, -1.8112983703613281, -0.41704559326171875, 1.0022544860839844, 3.73516845703125, -5.896333694458008, -0.53070068359375, 1.5020751953125, 2.62451171875, 4.243263244628906, 1.0447998046875, 1.7674407958984375, -0.7518081665039062, 2.618988037109375, 1.236419677734375, -1.9576644897460938, 2.870800018310547, 0.04044342041015625, -2.27166748046875, 5.477245330810547, -0.5139999389648438, 4.7590179443359375, 1.9665679931640625, 0.8674297332763672, 1.0986328125, 3.970783233642578, 0.355560302734375, 0.6463823318481445, 3.0863037109375, 3.012073516845703, 0.049877166748046875, -0.0521392822265625, 2.8897361755371094, 5.249172210693359, 2.651031494140625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000500.npy"}
{"epoch": 0.7558578987150416, "step": 501, "batch_size": 64, "mean": 1.6241438388824463, "std": 2.4156267642974854, "min": -3.281402587890625, "p10": -0.38134384155273426, "median": 0.8882160186767578, "p90": 5.265370178222659, "max": 8.480968475341797, "pos_frac": 0.765625, "sample": [-0.4162712097167969, 3.936016082763672, 1.791656494140625, 0.47979736328125, 6.667688369750977, 3.3601112365722656, 0.4219970703125, 2.82659912109375, 4.589973449707031, 8.480968475341797, 6.594108581542969, -0.6686038970947266, 1.89849853515625, 2.8061447143554688, -0.2998466491699219, 0.021894454956054688, 2.432952880859375, 3.0703201293945312, -3.281402587890625, -1.3464241027832031, 1.0233154296875, -0.12242507934570312, -0.11377906799316406, 2.19464111328125, 7.418426513671875, 4.7049407958984375, 3.1885910034179688, 5.50555419921875, -0.22417831420898438, 0.2234344482421875, -0.28285980224609375, 2.2675628662109375, 0.26728057861328125, 2.588623046875, 3.85906982421875, 1.371978759765625, 1.3099746704101562, -0.9866161346435547, 0.28560829162597656, 0.22826385498046875, 0.5571861267089844, 5.69415283203125, 0.6249294281005859, 0.5631179809570312, 0.8695220947265625, -0.1073760986328125, 0.2723541259765625, 0.992584228515625, -2.0873336791992188, -1.8038177490234375, -0.100341796875, 7.855262756347656, 0.3502655029296875, 0.2704010009765625, 0.19490814208984375, 0.39548397064208984, -0.194122314453125, 0.4269256591796875, 2.5351791381835938, 1.650421142578125, 2.190174102783203, 1.4870262145996094, 0.9069099426269531, 2.3278045654296875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000501.npy"}
{"epoch": 0.7573696145124716, "step": 502, "batch_size": 64, "mean": 1.2222027778625488, "std": 1.9419337511062622, "min": -2.3759765625, "p10": -0.9255836486816406, "median": 0.7795906066894531, "p90": 3.5227684020996093, "max": 6.9483642578125, "pos_frac": 0.6875, "sample": [-1.4855117797851562, 0.5393733978271484, 0.16679954528808594, -0.7444915771484375, 0.883544921875, 2.081573486328125, 0.06577682495117188, -0.13977813720703125, -1.763946533203125, 0.8363037109375, 1.2517852783203125, 3.5046157836914062, -1.871490478515625, -2.3759765625, 0.306121826171875, 6.9483642578125, 2.36114501953125, 0.0098876953125, 2.6210708618164062, 1.897796630859375, 3.3411331176757812, -0.7599945068359375, 2.1161956787109375, 1.5126266479492188, 0.7228775024414062, 0.5085201263427734, 1.4775314331054688, -1.0051536560058594, -0.084228515625, 2.8979034423828125, 0.44652557373046875, -0.23022842407226562, 2.1834335327148438, 0.14704132080078125, -0.4509315490722656, 0.41129302978515625, 6.187511444091797, 5.6162261962890625, -0.9391632080078125, -0.8938980102539062, -0.720245361328125, 0.5022850036621094, 1.485504150390625, -1.3035964965820312, -0.04499626159667969, -0.09326171875, 2.5542678833007812, 1.984130859375, 2.6086769104003906, -0.555206298828125, 3.8736495971679688, 3.43310546875, 2.445140838623047, 3.0380401611328125, 3.08770751953125, -0.1413726806640625, 2.277740478515625, 3.1154937744140625, 0.646881103515625, 3.530548095703125, 3.7352027893066406, -0.27901458740234375, 3.560882568359375, 1.1812286376953125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000502.npy"}
{"epoch": 0.7588813303099018, "step": 503, "batch_size": 64, "mean": 1.6076714992523193, "std": 2.526444435119629, "min": -5.9087677001953125, "p10": -1.4456108093261715, "median": 1.514892578125, "p90": 5.081388854980469, "max": 6.3128204345703125, "pos_frac": 0.765625, "sample": [1.4010772705078125, -2.459674835205078, 4.162750244140625, 2.9490890502929688, 1.7248687744140625, 3.431854248046875, 2.697662353515625, -5.9087677001953125, -0.04620361328125, 5.914249420166016, 1.272705078125, -0.7242355346679688, 1.0156288146972656, 5.998016357421875, -0.2975177764892578, 0.9636001586914062, 1.1875686645507812, 1.0969085693359375, 6.3128204345703125, -1.5738143920898438, 0.9239654541015625, 1.7406768798828125, 4.204141616821289, 1.6287078857421875, 1.9798622131347656, 0.2780303955078125, 0.5262527465820312, 4.988094329833984, 0.5860748291015625, 5.121372222900391, -2.280414581298828, 1.334482192993164, 5.462371826171875, 4.121479034423828, 4.489955902099609, -1.0029830932617188, 2.4688072204589844, -1.1257400512695312, -1.1464691162109375, 0.36066436767578125, -2.57086181640625, -0.25705718994140625, 0.22842025756835938, 2.6731491088867188, 4.884990692138672, 1.7271270751953125, 1.6796646118164062, 6.071296691894531, 3.1728057861328125, 1.290761947631836, 5.978176116943359, 2.0028438568115234, -1.1409835815429688, 0.9633388519287109, 4.723976135253906, 2.1676483154296875, 0.7974720001220703, -1.9910411834716797, 1.7264785766601562, 3.6302947998046875, 1.6651935577392578, 0.5456924438476562, 2.1991958618164062, -3.055522918701172], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000503.npy"}
{"epoch": 0.7603930461073318, "step": 504, "batch_size": 64, "mean": 2.3534345626831055, "std": 2.3317084312438965, "min": -1.9503097534179688, "p10": -0.09088211059570311, "median": 1.9581403732299805, "p90": 5.637505149841311, "max": 9.02935791015625, "pos_frac": 0.859375, "sample": [3.955047607421875, 3.0770225524902344, 0.515655517578125, 1.457632064819336, -0.7362289428710938, 0.031757354736328125, 4.43939208984375, 1.6335868835449219, 6.296022415161133, 1.0584793090820312, -1.535369873046875, 2.1055221557617188, 1.4775314331054688, 7.0140533447265625, 1.1902618408203125, 5.884256362915039, 4.296012878417969, -1.9503097534179688, 3.8808364868164062, 6.0733795166015625, 1.5313606262207031, 1.1280288696289062, 1.1828289031982422, 6.557319641113281, 3.5282974243164062, -0.083282470703125, 0.5698337554931641, 3.7265090942382812, 2.2944869995117188, 4.649375915527344, 4.733245849609375, 1.9102935791015625, 2.0059871673583984, 4.535545349121094, 4.58197021484375, 0.7295761108398438, 2.1151695251464844, -0.09413909912109375, 9.02935791015625, -0.5986557006835938, 5.0617523193359375, 1.2049560546875, 0.44547271728515625, 1.76434326171875, 2.00738525390625, 1.5014190673828125, 0.6140537261962891, -0.04699897766113281, 3.681182861328125, 0.9684181213378906, 2.92694091796875, -1.8256092071533203, -0.7882080078125, 2.258056640625, 0.7345981597900391, 1.3543853759765625, 0.1856069564819336, 3.965972900390625, 2.5164794921875, 3.354619026184082, 2.853374481201172, 7.890094757080078, 2.252368927001953, 1.5414848327636719], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000504.npy"}
{"epoch": 0.7619047619047619, "step": 505, "batch_size": 64, "mean": 1.2635676860809326, "std": 2.832475423812866, "min": -4.672573089599609, "p10": -2.1962272644042966, "median": 1.063838005065918, "p90": 4.480353164672852, "max": 8.609710693359375, "pos_frac": 0.671875, "sample": [4.303791046142578, 0.945953369140625, -1.9283599853515625, -0.44720458984375, -0.7933578491210938, 2.491241455078125, -3.2856292724609375, 8.609710693359375, 2.8749513626098633, 1.777435302734375, 2.733245849609375, -0.9262275695800781, -0.14019012451171875, 1.3444137573242188, 3.701038360595703, -0.96783447265625, -1.72430419921875, -0.7319488525390625, 2.1745681762695312, 0.2862091064453125, 0.543365478515625, 3.556243896484375, 2.4076499938964844, 2.9816436767578125, 2.3057937622070312, 7.111396789550781, 3.8600921630859375, -1.582855224609375, 0.4573936462402344, 1.536111831665039, 0.000396728515625, -2.3986358642578125, 4.276145935058594, -2.9373092651367188, 6.2765655517578125, 4.556022644042969, 2.957305908203125, 3.9726943969726562, -2.3110275268554688, -1.7080841064453125, 3.997650146484375, 0.2951507568359375, 0.22748565673828125, -1.2766265869140625, 1.5730133056640625, 0.10128402709960938, 2.0826492309570312, -1.5013961791992188, 5.4642181396484375, -4.672573089599609, 0.7146110534667969, 2.169239044189453, 7.93168830871582, 1.9744338989257812, 2.7540130615234375, -3.533355712890625, 5.232292175292969, -0.6523284912109375, 0.7359771728515625, 3.990114212036133, 0.20419692993164062, 1.181722640991211, -1.2953033447265625, -2.9882354736328125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000505.npy"}
{"epoch": 0.763416477702192, "step": 506, "batch_size": 64, "mean": 1.347464919090271, "std": 3.271763563156128, "min": -5.613311767578125, "p10": -2.8514358520507805, "median": 1.33526611328125, "p90": 5.24504623413086, "max": 11.672721862792969, "pos_frac": 0.703125, "sample": [2.127849578857422, 5.32098388671875, -0.5466423034667969, 0.9545478820800781, -1.6773529052734375, 0.61590576171875, 1.5615959167480469, -2.1368579864501953, 2.9649887084960938, 1.6417236328125, 0.5901107788085938, 3.4357681274414062, 4.12896728515625, -0.5643081665039062, 1.3956527709960938, 0.8672256469726562, -5.613311767578125, 9.539398193359375, 11.672721862792969, -1.35736083984375, 2.1606216430664062, -3.1409759521484375, 1.2748794555664062, -2.17584228515625, -4.9628448486328125, -3.855731964111328, -1.062347412109375, 5.313575744628906, 6.349029541015625, 1.8151187896728516, 3.2788619995117188, -4.763702392578125, 7.6222076416015625, -1.1753692626953125, 4.792549133300781, 0.474029541015625, 3.080921173095703, 1.6019363403320312, 2.7870941162109375, 1.7825927734375, -3.189544677734375, -0.27988433837890625, 0.757415771484375, 6.027996063232422, 0.0917205810546875, 0.4008445739746094, -0.3752861022949219, 0.626922607421875, 0.7022895812988281, 0.8690261840820312, 5.08514404296875, 2.986419677734375, -0.571868896484375, 3.1473522186279297, 2.9972190856933594, 2.49163818359375, 0.0227508544921875, 3.5355072021484375, -4.649452209472656, -1.4226608276367188, 1.6903038024902344, 2.0550384521484375, 2.4738235473632812, 4.646827697753906], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000506.npy"}
{"epoch": 0.764928193499622, "step": 507, "batch_size": 64, "mean": 1.4136903285980225, "std": 2.577890634536743, "min": -4.9087371826171875, "p10": -1.1828819274902345, "median": 1.3357048034667969, "p90": 4.491201782226564, "max": 9.162368774414062, "pos_frac": 0.65625, "sample": [4.10845947265625, 2.2400360107421875, -1.1205215454101562, -0.164581298828125, 4.9920806884765625, 0.3599090576171875, 3.085205078125, -1.143157958984375, -0.5877037048339844, 2.5915298461914062, -0.7847023010253906, 1.8209991455078125, -0.2846221923828125, -4.9087371826171875, -1.18927001953125, -1.1219482421875, 1.3456573486328125, -1.4966583251953125, 4.051273345947266, 1.104318618774414, 0.6624603271484375, 1.6232070922851562, -0.5638389587402344, -0.6958827972412109, 5.493438720703125, 0.36959362030029297, 0.07098865509033203, 4.5943450927734375, 2.3068418502807617, 2.9504547119140625, 1.0904617309570312, -0.090484619140625, 9.162368774414062, -3.4361114501953125, 2.5258865356445312, 2.713134765625, -0.5591583251953125, 1.8218002319335938, -2.5095043182373047, -0.2550182342529297, 1.3257522583007812, 3.0927200317382812, 0.7822151184082031, 5.325727462768555, 3.3673839569091797, 0.893280029296875, 2.5701217651367188, 3.20001220703125, 4.598541259765625, 7.6089630126953125, -2.0816879272460938, 4.2505340576171875, 0.5631332397460938, -0.19387054443359375, 1.7373809814453125, -1.1679763793945312, 2.8123435974121094, -1.1238327026367188, 3.5739097595214844, -2.821056365966797, 4.174102783203125, 2.32232666015625, 1.9260635375976562, 3.567546844482422], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000507.npy"}
{"epoch": 0.7664399092970522, "step": 508, "batch_size": 64, "mean": 1.8011486530303955, "std": 2.787646532058716, "min": -2.8176116943359375, "p10": -1.4129524230957031, "median": 1.3567581176757812, "p90": 5.117611694335938, "max": 10.954872131347656, "pos_frac": 0.75, "sample": [1.4531402587890625, -0.89129638671875, 4.72283935546875, 1.907318115234375, 1.503448486328125, -1.5458908081054688, 10.954872131347656, 0.25409698486328125, 1.6945610046386719, 6.438896179199219, 10.070205688476562, 4.959175109863281, -1.2084465026855469, 2.9798316955566406, 4.682037353515625, 1.2074966430664062, -2.1525611877441406, 1.3287506103515625, 1.7514228820800781, -0.5235691070556641, 3.3018035888671875, -2.021747589111328, 1.3056907653808594, -0.2235260009765625, 1.1758956909179688, -2.8176116943359375, 0.5322341918945312, 1.9664249420166016, 1.707000732421875, 0.6984481811523438, 0.7830047607421875, 1.8545684814453125, 0.6364288330078125, 1.3544921875, 5.230274200439453, 5.0364990234375, 1.0172843933105469, 3.672515869140625, 2.8888702392578125, -1.4366531372070312, 1.1908988952636719, -1.3576507568359375, 5.152374267578125, 2.577655792236328, 0.03511810302734375, -1.6237335205078125, 1.3590240478515625, 3.50860595703125, 6.687164306640625, 1.9287185668945312, 1.0931396484375, -0.41187477111816406, -0.5745620727539062, -0.8393592834472656, 0.329345703125, -2.724620819091797, -1.2949142456054688, 6.846244812011719, 2.6425933837890625, 4.68988037109375, 1.8036537170410156, 2.1988525390625, 1.1817398071289062, 4.626991271972656], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000508.npy"}
{"epoch": 0.7679516250944822, "step": 509, "batch_size": 64, "mean": 1.7866381406784058, "std": 2.5148022174835205, "min": -2.7092208862304688, "p10": -1.2441450119018553, "median": 1.3668947219848633, "p90": 5.059567642211914, "max": 8.673973083496094, "pos_frac": 0.75, "sample": [1.8330764770507812, -1.2742938995361328, -0.4676666259765625, 2.633769989013672, 1.2488632202148438, 0.3568391799926758, 1.9666824340820312, -2.7092208862304688, -1.1074371337890625, 2.356891632080078, 4.909152984619141, 5.87579345703125, 6.904541015625, 1.3473758697509766, 3.1771316528320312, -1.7327117919921875, 8.484405517578125, 5.653739929199219, 1.301513671875, -0.3156890869140625, 4.564292907714844, -2.0825462341308594, 0.529144287109375, -1.4327201843261719, 1.7233428955078125, 0.372650146484375, -0.390869140625, -0.294036865234375, 3.3681716918945312, 0.3803558349609375, 4.0872039794921875, 3.343578338623047, 1.3243026733398438, 0.810699462890625, -0.6857986450195312, 0.384368896484375, 1.6754341125488281, 0.0255584716796875, 4.8724365234375, 0.3296012878417969, 4.641761779785156, 1.013031005859375, 1.9110260009765625, -1.3399276733398438, 2.4145660400390625, 3.415679931640625, 0.4739227294921875, 3.6818389892578125, 8.673973083496094, 2.8472747802734375, 2.6836776733398438, -0.13184356689453125, 2.2332916259765625, -1.173797607421875, 0.2828216552734375, 3.8056716918945312, 3.4035186767578125, -1.7385177612304688, -0.9581451416015625, 1.38641357421875, 2.0508880615234375, 5.581817626953125, 5.124031066894531, 0.7139396667480469], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000509.npy"}
{"epoch": 0.7694633408919124, "step": 510, "batch_size": 64, "mean": 1.567130208015442, "std": 2.7065324783325195, "min": -3.9423370361328125, "p10": -2.46945858001709, "median": 1.5139713287353516, "p90": 4.891679191589356, "max": 8.218330383300781, "pos_frac": 0.75, "sample": [-1.872589111328125, 4.104240417480469, 4.189323425292969, 1.6020584106445312, -3.1643829345703125, 0.8943939208984375, 2.877685546875, 7.483123779296875, 1.8990554809570312, 2.0036773681640625, 0.2794342041015625, 1.151519775390625, -0.176361083984375, 0.7331504821777344, -2.5106258392333984, -1.1825027465820312, 1.4337692260742188, 5.231758117675781, -2.760894775390625, 1.5328330993652344, 1.4442291259765625, 3.4180870056152344, 4.206932067871094, -0.5899505615234375, 0.9676437377929688, 4.10687255859375, 2.4689407348632812, 0.5100250244140625, 0.9551143646240234, 2.6916580200195312, 2.094493865966797, 0.9022750854492188, 5.077911376953125, 2.4973220825195312, 0.6393852233886719, 1.9346160888671875, 1.576324462890625, 8.218330383300781, -0.5042648315429688, -3.9423370361328125, 0.5761260986328125, -2.373401641845703, 2.125347137451172, 2.9711132049560547, 3.7044677734375, 1.0045547485351562, 1.4951095581054688, 6.525978088378906, 2.13092041015625, -0.13958740234375, 2.751676559448242, 2.8029823303222656, 4.476390838623047, 1.8297100067138672, -2.8861961364746094, 0.8375988006591797, 0.943389892578125, 4.73968505859375, -0.8219337463378906, 4.956819534301758, 7.554901123046875, -0.242706298828125, -3.1689453125, -3.9199447631835938], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000510.npy"}
{"epoch": 0.7709750566893424, "step": 511, "batch_size": 64, "mean": 2.209611177444458, "std": 2.61672306060791, "min": -3.214122772216797, "p10": -0.9944641113281248, "median": 1.7502822875976562, "p90": 5.75477523803711, "max": 10.005905151367188, "pos_frac": 0.78125, "sample": [3.437835693359375, 6.0396270751953125, 1.0537605285644531, 6.460259437561035, 1.7307815551757812, -0.4277496337890625, 0.7237663269042969, -0.5815048217773438, 4.357109069824219, 4.24462890625, 2.3393783569335938, -1.0453720092773438, 1.2710342407226562, -0.00457763671875, 3.4692611694335938, 4.371185302734375, 3.3814468383789062, -0.4180946350097656, 5.222969055175781, 4.210716247558594, 6.068584442138672, 4.574737548828125, 0.2505340576171875, -3.214122772216797, -0.8756790161132812, -1.500152587890625, 4.176788330078125, 5.080192565917969, 3.54736328125, 0.08414459228515625, 0.7745933532714844, 0.2134552001953125, 0.2201690673828125, 2.2691497802734375, 3.996795654296875, 0.91326904296875, 3.3191299438476562, -1.118865966796875, 0.5683479309082031, 1.3594856262207031, 1.178192138671875, 3.5158729553222656, 5.853828430175781, 4.9404296875, 5.2244415283203125, 3.086700439453125, -1.0693435668945312, 0.667938232421875, 0.4293975830078125, 6.525154113769531, 1.5942840576171875, 1.2654762268066406, 1.7697830200195312, 1.7738494873046875, 0.5054769515991211, 2.570547103881836, -1.6175765991210938, 10.005905151367188, 2.5820388793945312, 5.523651123046875, 6.734554290771484, -0.17023658752441406, -1.3624114990234375, -0.6572189331054688], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000511.npy"}
{"epoch": 0.7724867724867724, "step": 512, "batch_size": 64, "mean": 1.543086051940918, "std": 2.4564404487609863, "min": -2.5403518676757812, "p10": -1.2704578399658204, "median": 1.0372028350830078, "p90": 4.779275512695314, "max": 7.686531066894531, "pos_frac": 0.71875, "sample": [3.4832992553710938, 1.9528961181640625, 2.491016387939453, -1.9873161315917969, -0.4310951232910156, 6.736522674560547, -0.209075927734375, 1.4085750579833984, -0.2699241638183594, -1.0225601196289062, 1.0227241516113281, 4.526954650878906, 0.015594482421875, -0.6834983825683594, 1.9029693603515625, 3.7125701904296875, 0.20767974853515625, -2.5403518676757812, 2.8876190185546875, 0.5586967468261719, 3.648517608642578, 2.388172149658203, 0.369354248046875, 2.693523406982422, 0.314453125, 7.00238037109375, -2.2745361328125, 2.756622314453125, -1.2753753662109375, 5.438507080078125, 0.365753173828125, 3.1160888671875, 7.5140380859375, 0.09800148010253906, 1.851654052734375, 1.0516815185546875, -0.7519302368164062, 4.353141784667969, 0.28340911865234375, 4.887413024902344, -0.4255828857421875, -1.3722572326660156, -1.488077163696289, 0.0980987548828125, 2.747833251953125, 7.0401153564453125, 0.45361328125, 0.17309951782226562, 1.9168548583984375, -0.7747573852539062, 2.6545143127441406, 1.5483131408691406, 0.6961822509765625, 0.7148361206054688, 2.5756683349609375, 1.9214401245117188, -0.38629913330078125, 2.004119873046875, 7.686531066894531, 4.118766784667969, -1.2589836120605469, -1.3532028198242188, 2.219755172729492, -0.3472442626953125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000512.npy"}
{"epoch": 0.7739984882842026, "step": 513, "batch_size": 64, "mean": 1.9538071155548096, "std": 2.309340476989746, "min": -1.3132553100585938, "p10": -0.6679458618164061, "median": 1.4092049598693848, "p90": 5.046362304687501, "max": 10.747234344482422, "pos_frac": 0.84375, "sample": [5.165641784667969, 4.0482940673828125, 7.2439422607421875, 0.14870071411132812, 1.598663330078125, 1.2994728088378906, 1.7857093811035156, 1.9085578918457031, 1.0103225708007812, 0.7040328979492188, 1.1417083740234375, -0.7517585754394531, 5.7295684814453125, 0.6727828979492188, 0.23362159729003906, 2.340341567993164, 1.2040271759033203, 0.5502090454101562, 10.747234344482422, 3.7914886474609375, 1.0837421417236328, 5.689159393310547, 0.9536972045898438, 1.9931068420410156, 3.7257156372070312, -0.599639892578125, 1.621368408203125, 0.9561195373535156, -0.32718658447265625, 0.26204872131347656, 2.3308258056640625, 1.0878238677978516, 1.6126861572265625, -1.3132553100585938, 3.436920166015625, 2.5848007202148438, 0.3870697021484375, 2.84100341796875, 5.594757080078125, 1.1149978637695312, -0.9108409881591797, -1.0364799499511719, 4.768043518066406, 0.2575531005859375, 2.9572830200195312, 4.398582458496094, 7.4645233154296875, -1.0046234130859375, 4.21380615234375, 2.9704055786132812, 2.5608444213867188, 0.25917816162109375, 0.9728813171386719, 3.9813385009765625, 2.54266357421875, -0.96697998046875, -0.37363433837890625, 1.2107925415039062, 0.4309349060058594, 1.518937110900879, 1.8667144775390625, 0.16793060302734375, -0.6972198486328125, 1.8826980590820312], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000513.npy"}
{"epoch": 0.7755102040816326, "step": 514, "batch_size": 64, "mean": 1.7902262210845947, "std": 2.6018528938293457, "min": -4.0066375732421875, "p10": -1.2015686035156248, "median": 1.4798774719238281, "p90": 4.734860229492187, "max": 8.996963500976562, "pos_frac": 0.765625, "sample": [1.5730018615722656, 1.7496986389160156, 1.1525154113769531, 3.8378524780273438, 1.7103118896484375, -0.852935791015625, 2.422210693359375, 0.1161956787109375, 1.3448638916015625, -0.059673309326171875, 3.4215126037597656, 4.5002593994140625, 1.3867530822753906, 4.996124267578125, -0.7142791748046875, -1.3119468688964844, -2.2352294921875, 0.7213172912597656, -0.09569931030273438, -0.5397720336914062, -4.0066375732421875, -0.5629043579101562, 0.3704719543457031, 1.8005409240722656, 8.996963500976562, 0.6727752685546875, 1.671966552734375, 2.383617401123047, 3.4021072387695312, 3.1359176635742188, 0.8501663208007812, -2.3487987518310547, 4.738243103027344, 1.3124008178710938, 2.473907470703125, 3.0289154052734375, 1.00140380859375, 4.726966857910156, 2.427825927734375, 0.6246261596679688, 0.32529449462890625, -3.0779037475585938, -1.5729255676269531, 2.249286651611328, 3.085723876953125, 4.106880187988281, 1.0676956176757812, 0.36341094970703125, 4.319873809814453, 4.3558807373046875, 0.86578369140625, 0.9434127807617188, 3.2930641174316406, 7.7143096923828125, -0.3528289794921875, 5.523902893066406, 2.9477157592773438, 1.03369140625, 2.16741943359375, -0.9440193176269531, 8.8345947265625, 3.7507457733154297, -2.1644134521484375, 5.9143218994140625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000514.npy"}
{"epoch": 0.7770219198790628, "step": 515, "batch_size": 64, "mean": 1.514425277709961, "std": 2.4809651374816895, "min": -3.7281570434570312, "p10": -1.6362686157226562, "median": 1.5662689208984375, "p90": 4.618651580810547, "max": 8.218231201171875, "pos_frac": 0.671875, "sample": [-0.12179946899414062, 0.051692962646484375, -0.24431610107421875, -0.08787155151367188, 5.496040344238281, 4.627651214599609, 0.09720611572265625, 2.7832412719726562, 2.5616531372070312, 4.0626373291015625, 3.1758956909179688, 5.8938446044921875, 2.450216293334961, -0.125946044921875, 2.8411636352539062, -0.38824462890625, -1.7722549438476562, 3.4062347412109375, -2.8363418579101562, -1.9636383056640625, -0.45645904541015625, -0.4991912841796875, 0.4087371826171875, 2.0047454833984375, -2.042572021484375, 2.222701072692871, 5.993757247924805, 2.2110748291015625, 2.0486373901367188, 1.1577873229980469, 0.4828338623046875, 1.6571235656738281, -0.4605712890625, 0.14045333862304688, 3.9362640380859375, 5.384120941162109, -2.9466629028320312, 3.044219970703125, -0.6487846374511719, 2.4667510986328125, -1.5782432556152344, 1.4754142761230469, 3.7559051513671875, -0.574249267578125, 4.4679718017578125, 8.218231201171875, 0.8754177093505859, 2.1100502014160156, 2.665069580078125, 4.597652435302734, 2.0721206665039062, 0.8867950439453125, 0.445098876953125, 3.440948486328125, 6.366851806640625, -1.6611366271972656, 2.4274463653564453, -0.03049468994140625, 3.2691879272460938, -0.7472038269042969, 0.9074668884277344, 2.508056640625, -1.2590112686157227, -3.7281570434570312], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000515.npy"}
{"epoch": 0.7785336356764928, "step": 516, "batch_size": 64, "mean": 1.3661943674087524, "std": 2.450043201446533, "min": -4.36920166015625, "p10": -1.1483360290527342, "median": 1.533071517944336, "p90": 3.940576553344727, "max": 8.068832397460938, "pos_frac": 0.71875, "sample": [4.174644470214844, 8.068832397460938, 1.5902214050292969, -0.8321685791015625, -1.0039596557617188, 6.014488220214844, -0.439910888671875, 1.0964832305908203, 3.781414031982422, 1.475921630859375, -0.391326904296875, 2.738697052001953, 0.079437255859375, 5.731697082519531, 0.8149452209472656, -0.08778858184814453, 2.7996978759765625, 1.9375228881835938, 2.6984710693359375, 1.613800048828125, 0.39516448974609375, -4.36920166015625, 1.3511428833007812, 2.0316619873046875, 7.967933654785156, 3.0193824768066406, 1.4366989135742188, -0.4777488708496094, 2.11639404296875, 2.6390762329101562, -3.04571533203125, -2.219320297241211, 0.1884632110595703, -1.1792373657226562, -0.45485687255859375, 0.4796905517578125, 2.51739501953125, 0.6004142761230469, 1.7519588470458984, -2.641521453857422, 2.6606903076171875, -3.2959976196289062, 2.4429092407226562, 3.0763626098632812, 2.0388336181640625, -1.07623291015625, 1.6498603820800781, 3.0734100341796875, 1.71734619140625, 0.52557373046875, -0.9864120483398438, 2.528656005859375, 1.9295501708984375, 1.461456298828125, 0.9772186279296875, 2.26214599609375, 0.5422134399414062, 1.6013755798339844, -3.027862548828125, -0.4896259307861328, 3.669281005859375, -0.19281387329101562, 4.0087890625, 6.370819091796875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000516.npy"}
{"epoch": 0.780045351473923, "step": 517, "batch_size": 64, "mean": 1.7076544761657715, "std": 3.138303518295288, "min": -3.7544174194335938, "p10": -1.5927936553955078, "median": 0.7283859252929688, "p90": 6.171920776367188, "max": 11.25430679321289, "pos_frac": 0.640625, "sample": [3.6356430053710938, -2.4997177124023438, 0.7592697143554688, 0.65997314453125, -0.6084442138671875, 1.2608051300048828, 0.6619796752929688, -0.13184738159179688, 4.861778259277344, 5.474433898925781, 1.8815231323242188, -2.0795440673828125, -0.03389739990234375, 0.42272186279296875, -0.46538543701171875, -0.37706756591796875, 0.45984649658203125, 3.2250289916992188, 3.661754608154297, -1.4996299743652344, 2.5623397827148438, -0.7969284057617188, 3.31671142578125, 6.223087310791016, -1.632720947265625, -0.010234832763671875, 1.9768409729003906, 1.7197647094726562, 0.105377197265625, 3.2884483337402344, 2.7189674377441406, 6.779182434082031, 2.8108062744140625, 1.0334892272949219, -0.1717681884765625, -3.0454788208007812, -0.519927978515625, 9.49591064453125, -0.8393478393554688, -0.21962738037109375, 1.778167724609375, 4.223028182983398, 10.314979553222656, 0.03333473205566406, 0.6975021362304688, 5.939828872680664, 1.5081787109375, 1.9263076782226562, -1.641256332397461, -1.401885986328125, 0.1471424102783203, 2.0656795501708984, -0.2816925048828125, 11.25430679321289, 6.8132476806640625, 6.540740966796875, 1.2035598754882812, -2.0609130859375, 0.09978866577148438, -0.004241943359375, 6.052532196044922, -3.7544174194335938, 3.898456573486328, -0.1266021728515625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000517.npy"}
{"epoch": 0.781557067271353, "step": 518, "batch_size": 64, "mean": 1.3683172464370728, "std": 2.809713840484619, "min": -3.4791641235351562, "p10": -2.2806293487548825, "median": 1.005615234375, "p90": 5.571705627441407, "max": 8.157989501953125, "pos_frac": 0.671875, "sample": [1.6327362060546875, -2.405242919921875, 0.00970458984375, 0.95050048828125, 0.15839385986328125, 0.8638153076171875, 0.30916595458984375, 2.580108642578125, -3.1574440002441406, 2.8111000061035156, 1.5838165283203125, -0.4764213562011719, 2.5022811889648438, 5.3257293701171875, 2.5692596435546875, -2.9723358154296875, -2.3003921508789062, 4.291145324707031, -0.5880317687988281, 2.344158172607422, 6.44451904296875, -2.404327392578125, -0.7295455932617188, 1.0193023681640625, 5.1223907470703125, 1.6439056396484375, -2.234516143798828, 2.3020782470703125, -3.4791641235351562, 5.24456787109375, 0.7647371292114258, -0.35546302795410156, -0.05257987976074219, 1.3252334594726562, 5.906333923339844, -2.1678848266601562, 2.546092987060547, 0.8924789428710938, 3.9333648681640625, 7.0019989013671875, -0.3834991455078125, 0.6695098876953125, 0.4504585266113281, 8.157989501953125, 3.27899169921875, 5.6771240234375, -0.9039306640625, 7.979095458984375, -0.3938751220703125, -1.7562141418457031, -2.6663551330566406, 0.9919281005859375, 1.205169677734375, -0.0956888198852539, -1.0181140899658203, 4.9788665771484375, 1.37188720703125, 1.3397712707519531, 1.0313034057617188, -0.5210914611816406, 1.1946296691894531, 0.004390716552734375, 6.6345977783203125, 1.5897903442382812], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000518.npy"}
{"epoch": 0.783068783068783, "step": 519, "batch_size": 64, "mean": 0.9821274876594543, "std": 3.2005038261413574, "min": -6.357818603515625, "p10": -3.1508346557617184, "median": 1.0853099822998047, "p90": 5.01870803833008, "max": 7.3472137451171875, "pos_frac": 0.625, "sample": [-2.7265472412109375, 2.1252822875976562, -3.332672119140625, -3.918376922607422, -1.3559494018554688, 2.6695098876953125, 0.9279136657714844, 7.3472137451171875, 0.10645294189453125, 3.1886444091796875, 1.48858642578125, 2.6746063232421875, 4.171855926513672, 0.46758270263671875, -0.9944915771484375, 1.4198112487792969, -0.9105796813964844, -0.9766426086425781, -2.1493568420410156, -5.6767120361328125, 3.516796112060547, 7.256660461425781, -0.74609375, 3.51092529296875, 2.80035400390625, 7.34014892578125, -0.7059135437011719, 2.226184844970703, 2.252227783203125, 2.0191650390625, -1.3325319290161133, 3.657867431640625, 1.0026931762695312, 0.5384674072265625, 1.7986907958984375, -0.278594970703125, -2.3164138793945312, 0.6905899047851562, 6.133399963378906, -3.62615966796875, 1.336944580078125, 2.1208839416503906, 5.88470458984375, 1.1679267883300781, 3.8312454223632812, -6.357818603515625, 0.8928909301757812, -3.5193328857421875, 2.9113540649414062, -4.649141311645508, 4.422473907470703, -2.368419647216797, -1.144683837890625, 6.6199188232421875, 0.8905410766601562, 2.2519912719726562, 4.057586669921875, 5.222442626953125, -1.4910354614257812, -2.3877735137939453, 2.6370487213134766, -2.5629348754882812, 4.543327331542969, -1.7385749816894531], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000519.npy"}
{"epoch": 0.7845804988662132, "step": 520, "batch_size": 64, "mean": 1.6092244386672974, "std": 2.274054527282715, "min": -3.5932083129882812, "p10": -0.860181427001953, "median": 1.1111221313476562, "p90": 5.034717559814454, "max": 7.2087860107421875, "pos_frac": 0.78125, "sample": [2.3507652282714844, -1.1913738250732422, 0.2730083465576172, 2.33636474609375, 3.09014892578125, 5.6574554443359375, 1.3199653625488281, -0.48079681396484375, -2.8374481201171875, 5.6408233642578125, 2.1513442993164062, 0.9750804901123047, 1.0944671630859375, 0.8567047119140625, 0.7718963623046875, 1.6353225708007812, -0.33065032958984375, -3.5932083129882812, 1.4754810333251953, 5.151424407958984, 3.05120849609375, 4.762401580810547, 0.4749889373779297, 6.391044616699219, -0.921112060546875, 1.97845458984375, 7.2087860107421875, -0.0285491943359375, 0.513916015625, 2.9961929321289062, 0.7975311279296875, 0.5450477600097656, 1.0843925476074219, -0.7180099487304688, 1.5194854736328125, 0.3054332733154297, 4.630189895629883, 3.0777816772460938, 1.1243820190429688, 0.2183380126953125, -0.9521160125732422, -0.2186279296875, 1.0978622436523438, -0.53302001953125, 0.33290863037109375, 0.2658424377441406, 1.2041015625, 1.6146965026855469, 2.8661422729492188, 2.938507080078125, -1.84271240234375, 5.609493255615234, 4.4075927734375, 2.724994659423828, 6.3052520751953125, 2.2461318969726562, 0.836883544921875, 2.14398193359375, 0.19942474365234375, -0.5697307586669922, 0.4874382019042969, 3.874941825866699, 4.270959854125977, -1.6792678833007812], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000520.npy"}
{"epoch": 0.7860922146636432, "step": 521, "batch_size": 64, "mean": 1.5242516994476318, "std": 2.3770923614501953, "min": -3.1896209716796875, "p10": -0.7463600158691406, "median": 1.486236572265625, "p90": 4.660481262207032, "max": 8.937042236328125, "pos_frac": 0.671875, "sample": [4.014404296875, 1.0167694091796875, 1.83514404296875, 2.227325439453125, 2.503509521484375, -0.7232513427734375, 3.1118316650390625, -0.4629974365234375, -0.5127105712890625, 3.1296005249023438, 1.3755340576171875, -0.13791656494140625, 2.216644287109375, 1.487945556640625, 4.501129150390625, 4.223846435546875, 1.8505897521972656, 2.614837646484375, -1.9353713989257812, 6.401063919067383, 0.1284332275390625, 4.833869934082031, 0.0162353515625, -0.3015594482421875, 1.7102680206298828, 2.38299560546875, -0.6715011596679688, 2.169384002685547, 1.484527587890625, 1.6535110473632812, 8.937042236328125, 6.318756103515625, -3.1896209716796875, 2.202526092529297, -1.3120498657226562, -0.6420440673828125, -0.6335372924804688, 0.052906036376953125, 0.9232330322265625, 4.7287750244140625, 3.277099609375, 0.9987220764160156, -0.6180877685546875, -0.204620361328125, -0.22678756713867188, -0.3107414245605469, 3.347320556640625, 3.1241836547851562, 0.8163948059082031, 4.447628021240234, -0.7562637329101562, 1.9464035034179688, -2.4975433349609375, 5.3730316162109375, -2.561553955078125, 0.4054450988769531, 1.5254440307617188, -1.0853424072265625, 3.213531494140625, -0.25940704345703125, 5.8932037353515625, 0.9853744506835938, -0.634033203125, 1.8226318359375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000521.npy"}
{"epoch": 0.7876039304610734, "step": 522, "batch_size": 64, "mean": 1.2288823127746582, "std": 3.046536445617676, "min": -6.6455841064453125, "p10": -2.182279586791992, "median": 0.7224082946777344, "p90": 5.36991767883301, "max": 9.2510986328125, "pos_frac": 0.625, "sample": [0.23404693603515625, 2.4425201416015625, 1.5960731506347656, 2.238229751586914, 1.4029121398925781, 4.205741882324219, 2.6889114379882812, 4.816184997558594, 5.716758728027344, 1.0821762084960938, -1.9128036499023438, -3.829771041870117, 1.021697998046875, 8.32012939453125, 0.589202880859375, 2.5688552856445312, -2.9793548583984375, 2.427640914916992, 1.117401123046875, -0.6436767578125, -0.8149070739746094, -0.39870452880859375, 0.8928680419921875, -6.6455841064453125, 9.2510986328125, -2.512451171875, 0.6388397216796875, -0.3406829833984375, -3.0039215087890625, -0.418243408203125, -0.9453659057617188, 4.854042053222656, -1.405242919921875, 4.144611358642578, 2.2074966430664062, -2.0182037353515625, -0.5264739990234375, 1.2200164794921875, -0.6834487915039062, 0.251495361328125, 3.179119110107422, 0.17575454711914062, 0.17736434936523438, -0.31833648681640625, 2.6043968200683594, -0.6503658294677734, 3.0002822875976562, -2.2525978088378906, 4.345298767089844, 0.18416213989257812, 3.99359130859375, -1.5582122802734375, -0.08277130126953125, 5.6392974853515625, 0.8059768676757812, -0.81573486328125, -3.4928054809570312, 6.6829833984375, 2.7295188903808594, 8.462112426757812, -0.16284561157226562, 0.47562408447265625, 5.591007232666016, 3.0855331420898438], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000522.npy"}
{"epoch": 0.7891156462585034, "step": 523, "batch_size": 64, "mean": 1.6179664134979248, "std": 2.476724147796631, "min": -3.59716796875, "p10": -1.504683685302734, "median": 1.6968250274658203, "p90": 4.646418762207031, "max": 6.979400634765625, "pos_frac": 0.765625, "sample": [0.34078216552734375, 4.707038879394531, 1.9127349853515625, 1.6329193115234375, 5.956733703613281, 3.005094528198242, -0.15877532958984375, 2.9630126953125, 3.5149154663085938, 1.1997604370117188, 1.3332147598266602, 4.2321014404296875, 5.7592926025390625, 1.9563751220703125, 2.8821945190429688, 4.6656036376953125, 1.7369384765625, -0.6292285919189453, 3.32666015625, -1.0127677917480469, 1.6837844848632812, 3.0213775634765625, -0.3355865478515625, 0.645477294921875, -3.0623931884765625, 0.30048370361328125, -0.8750152587890625, -1.1466331481933594, 1.2089385986328125, 2.219127655029297, 4.00201416015625, 1.9128646850585938, 4.315582275390625, 2.2717761993408203, 2.552825927734375, 6.553474426269531, 5.023460388183594, 0.846710205078125, 4.601654052734375, -3.4288787841796875, 0.280548095703125, -1.5898895263671875, 6.979400634765625, 1.0481185913085938, 3.5207138061523438, 0.49275970458984375, 1.7098655700683594, 1.5498809814453125, -0.5325164794921875, -3.59716796875, 3.1569442749023438, 1.6460552215576172, 2.5284500122070312, -1.3058700561523438, 4.477287292480469, 1.7498931884765625, 1.3382549285888672, 1.047515869140625, 2.229755401611328, -3.4969711303710938, -2.9196319580078125, 0.6769828796386719, -2.924041748046875, 3.847869873046875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000523.npy"}
{"epoch": 0.7906273620559335, "step": 524, "batch_size": 64, "mean": 1.2957689762115479, "std": 2.327787399291992, "min": -5.963035583496094, "p10": -1.735968780517578, "median": 1.3821887969970703, "p90": 4.0069427490234375, "max": 7.345863342285156, "pos_frac": 0.703125, "sample": [1.9745635986328125, 1.5567855834960938, 4.7044525146484375, -0.22418594360351562, -1.7534027099609375, 0.9063873291015625, -1.6952896118164062, 1.0508575439453125, 5.090568542480469, 0.14268875122070312, 3.6529159545898438, -0.7887802124023438, -2.821287155151367, 0.9416046142578125, -1.6783218383789062, 4.3629302978515625, -1.9753265380859375, -0.8363494873046875, 1.7151565551757812, 4.6522369384765625, 3.0544776916503906, -5.963035583496094, 1.5569992065429688, 0.6739654541015625, 0.34650421142578125, -0.0141143798828125, 5.139884948730469, 4.0087738037109375, 3.3695430755615234, 0.6371612548828125, -0.20690536499023438, 2.956817626953125, -0.7072372436523438, 2.1133193969726562, 1.415771484375, 2.7019271850585938, 4.0026702880859375, 2.454814910888672, -0.35968017578125, -0.7716445922851562, 1.2669677734375, 0.0034637451171875, -0.0941009521484375, 2.040313720703125, 3.6703319549560547, 1.573333740234375, 3.6946067810058594, -2.6428680419921875, 0.8438606262207031, 0.490631103515625, 2.0389785766601562, 2.43975830078125, 1.3557319641113281, 2.8452911376953125, 1.4086456298828125, 2.739572525024414, 3.840534210205078, 2.8206748962402344, -1.77838134765625, 0.41014862060546875, 7.345863342285156, -2.1160659790039062, 3.5703582763671875, -0.2266559600830078], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000524.npy"}
{"epoch": 0.7921390778533636, "step": 525, "batch_size": 64, "mean": 1.085106372833252, "std": 2.84372615814209, "min": -6.0632781982421875, "p10": -2.2281627655029292, "median": 0.7249860763549805, "p90": 5.131448364257814, "max": 9.122726440429688, "pos_frac": 0.671875, "sample": [0.4373931884765625, -1.5971145629882812, -0.31760406494140625, -0.453857421875, -3.9720535278320312, 1.508453369140625, -0.04726600646972656, 1.0360183715820312, 2.837249755859375, 0.8516387939453125, 1.6944828033447266, -3.4451370239257812, 0.7127208709716797, 1.1558380126953125, 0.9042549133300781, 5.2928466796875, 3.314647674560547, -6.0632781982421875, 8.145416259765625, -0.00107574462890625, 0.512481689453125, -1.3644256591796875, 4.4035186767578125, 4.754852294921875, -0.08650970458984375, 1.7932167053222656, 1.0960006713867188, 0.2145233154296875, 0.7372512817382812, -0.3356895446777344, -2.378520965576172, 2.93878173828125, 0.6652145385742188, 3.612607955932617, 1.1265144348144531, 1.0641632080078125, 1.1650009155273438, 1.6955108642578125, 7.8605499267578125, -2.8299427032470703, -1.8773269653320312, 0.7000770568847656, 3.9658985137939453, -1.5722732543945312, -0.1204071044921875, 9.122726440429688, -0.203033447265625, 0.2611083984375, 5.903717041015625, 6.29669189453125, 0.39038848876953125, 1.9124603271484375, 1.8133621215820312, -0.5352745056152344, 1.1519546508789062, 0.46685791015625, -1.357147216796875, 1.5451011657714844, 0.111663818359375, 2.6646347045898438, -2.450489044189453, 5.677162170410156, 0.16184425354003906, -3.2215652465820312], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000525.npy"}
{"epoch": 0.7936507936507936, "step": 526, "batch_size": 64, "mean": 1.325132131576538, "std": 2.347794532775879, "min": -4.191707611083984, "p10": -1.3074954986572263, "median": 1.0265579223632812, "p90": 4.070661163330079, "max": 6.96142578125, "pos_frac": 0.6875, "sample": [6.96142578125, 1.4168014526367188, 1.5072517395019531, 0.8857002258300781, -1.0252647399902344, 3.8683547973632812, 1.3217315673828125, 6.861663818359375, -1.4284515380859375, -0.4552764892578125, 3.15960693359375, 1.7299346923828125, -1.5685539245605469, -0.1200103759765625, 2.4735946655273438, 0.9319992065429688, 1.8028316497802734, 6.827674865722656, 1.3253402709960938, 3.5767173767089844, 1.7354888916015625, 3.3198204040527344, 1.0539703369140625, 4.6228485107421875, 1.6797637939453125, 0.5063018798828125, 4.1573638916015625, 0.1071319580078125, -1.6502037048339844, 1.777618408203125, 1.4938697814941406, -0.9899940490722656, -0.9209365844726562, -0.6976051330566406, 2.731884002685547, 0.12049674987792969, 0.7164154052734375, -0.300201416015625, 2.8568496704101562, -2.0024490356445312, 3.2901763916015625, 0.5343704223632812, -0.5082778930664062, -0.012725830078125, 1.3803176879882812, -2.5838241577148438, -1.8827686309814453, -0.5100021362304688, 5.403415679931641, -0.980194091796875, 3.029266357421875, -0.25719451904296875, 3.4906158447265625, 0.1097412109375, 6.916038513183594, 0.9991455078125, -0.5863857269287109, 1.3537025451660156, 0.8944473266601562, 3.1861648559570312, 0.9046630859375, 0.7320938110351562, 3.7258758544921875, -4.191707611083984], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000526.npy"}
{"epoch": 0.7951625094482238, "step": 527, "batch_size": 64, "mean": 0.9579578638076782, "std": 2.486586332321167, "min": -4.903472900390625, "p10": -2.2136169433593746, "median": 0.7185478210449219, "p90": 4.066133880615235, "max": 6.938083648681641, "pos_frac": 0.671875, "sample": [0.045775413513183594, 0.8411979675292969, 2.7820892333984375, 1.8543853759765625, 0.6826553344726562, 0.6832313537597656, 4.687713623046875, 1.5147590637207031, 3.483182907104492, 3.895294189453125, 4.187347412109375, 0.3416900634765625, 2.41217041015625, -1.0489959716796875, 2.6628952026367188, 2.112396240234375, 1.76416015625, 0.5285634994506836, 0.9000892639160156, -1.2525215148925781, 2.91717529296875, -2.8415069580078125, -4.620841979980469, -0.39609527587890625, 0.087982177734375, 3.3854904174804688, 6.2034912109375, -2.382049560546875, 1.4891395568847656, 1.9498214721679688, -1.1837158203125, 0.365325927734375, -2.2738037109375, 1.2631874084472656, 2.9064979553222656, 6.938083648681641, 3.8816070556640625, -0.2076263427734375, -0.044445037841796875, 1.9472770690917969, 2.2234439849853516, 0.4691352844238281, 5.24713134765625, -0.0612640380859375, -1.7447662353515625, -0.7415142059326172, -1.8681354522705078, 4.731590270996094, -3.9235610961914062, 0.7538642883300781, -2.07318115234375, 2.760143280029297, 2.73406982421875, 0.4301185607910156, -0.8759765625, 0.06782150268554688, -0.203704833984375, 4.139350891113281, 0.1019439697265625, 2.5711212158203125, -4.903472900390625, -3.275360107421875, 2.7192859649658203, -0.431854248046875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000527.npy"}
{"epoch": 0.7966742252456538, "step": 528, "batch_size": 64, "mean": 1.1760194301605225, "std": 2.8770761489868164, "min": -5.934722900390625, "p10": -2.4679005622863768, "median": 1.0283679962158203, "p90": 4.441545104980469, "max": 8.138336181640625, "pos_frac": 0.734375, "sample": [4.4669647216796875, -5.934722900390625, 6.089332580566406, -0.4313373565673828, 3.6988601684570312, 0.9200057983398438, 0.5233039855957031, -2.4882965087890625, 0.7357120513916016, 2.4819602966308594, -0.33701515197753906, -5.751350402832031, 2.6869049072265625, -3.778656005859375, 0.41643524169921875, -0.40169525146484375, 4.963214874267578, 1.0362987518310547, 0.3511924743652344, 1.13946533203125, 1.4655227661132812, 5.6732330322265625, 0.3652667999267578, -1.942840576171875, 0.2853431701660156, 2.6730880737304688, 0.6071853637695312, -4.128814697265625, 1.2837104797363281, 3.99114990234375, 1.106119155883789, 1.7773056030273438, 3.387847900390625, 6.236717224121094, 3.4571304321289062, -3.263031005859375, 6.7999725341796875, 0.9364776611328125, 4.382232666015625, 0.3759193420410156, 2.230804443359375, 3.1295852661132812, 0.16088104248046875, 2.84991455078125, -2.4203100204467773, 1.023284912109375, 0.5373611450195312, 1.8533897399902344, 4.030481338500977, -1.1268930435180664, 0.7716064453125, -3.6871871948242188, -2.186931610107422, 2.935699462890625, 8.138336181640625, 2.4266014099121094, -1.6372528076171875, -0.4656639099121094, 1.1135330200195312, -0.3388862609863281, 1.0334510803222656, 4.1377105712890625, 4.2513885498046875, 0.648223876953125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000528.npy"}
{"epoch": 0.7981859410430839, "step": 529, "batch_size": 64, "mean": 2.284201145172119, "std": 2.540642261505127, "min": -1.9512710571289062, "p10": -0.5326507568359374, "median": 1.748621940612793, "p90": 5.34389705657959, "max": 9.521736145019531, "pos_frac": 0.796875, "sample": [3.257579803466797, -0.7694015502929688, 4.616630554199219, 1.9516372680664062, 2.9847640991210938, 0.26204681396484375, 6.2205352783203125, 5.319246292114258, 3.0036277770996094, 0.7341995239257812, -1.3901176452636719, 0.6377677917480469, 0.165771484375, 0.7910079956054688, 1.780731201171875, 0.98907470703125, -1.9512710571289062, 6.548772811889648, -0.035106658935546875, 5.0016632080078125, 3.869556427001953, 1.4180221557617188, 0.1052703857421875, -0.31871795654296875, 7.898292541503906, 5.354461669921875, -0.5437240600585938, 2.90087890625, -0.20135879516601562, 0.5231170654296875, 1.75799560546875, 2.1531143188476562, 3.7491683959960938, 8.44314956665039, 4.246051788330078, 4.5272674560546875, 3.9125194549560547, 1.739248275756836, 1.341827392578125, 3.1807022094726562, 1.782257080078125, 5.269184112548828, 4.224235534667969, -0.5068130493164062, 1.378021240234375, 3.0833396911621094, 0.19698333740234375, -0.035846710205078125, 0.12816619873046875, 3.4681396484375, 1.467498779296875, -0.863555908203125, 1.3559989929199219, 4.734226226806641, 0.3949241638183594, 9.521736145019531, 4.809791564941406, -0.1192626953125, 1.1965065002441406, 4.426727294921875, -1.2244949340820312, 1.0149154663085938, -1.0707626342773438, 5.380950927734375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000529.npy"}
{"epoch": 0.799697656840514, "step": 530, "batch_size": 64, "mean": 1.3998620510101318, "std": 2.7634620666503906, "min": -6.562744140625, "p10": -1.1306732177734375, "median": 0.6621208190917969, "p90": 4.83996810913086, "max": 10.656318664550781, "pos_frac": 0.671875, "sample": [3.274688720703125, 2.8812522888183594, 3.4902076721191406, -1.1351547241210938, 7.2980804443359375, -0.79388427734375, 4.9116973876953125, 0.46648406982421875, 6.265045166015625, 2.841522216796875, 0.3224296569824219, 2.4418487548828125, -1.7299461364746094, 0.1262645721435547, 4.023670196533203, 2.337627410888672, 2.4776992797851562, 0.2265472412109375, 10.656318664550781, 1.41522216796875, -0.43486976623535156, -3.2125205993652344, -6.562744140625, 1.916595458984375, 4.672599792480469, 0.04384613037109375, 3.1462020874023438, 2.4726219177246094, -0.7470932006835938, 6.4290313720703125, 0.045196533203125, -0.41802215576171875, 2.3572845458984375, 5.753143310546875, 0.489288330078125, -0.028484344482421875, 3.2243309020996094, 3.1642608642578125, -1.0715560913085938, 0.6947250366210938, 2.6518630981445312, 2.252105712890625, -0.0527191162109375, 0.6295166015625, -0.386962890625, -2.6499862670898438, 3.339872360229492, -0.9519844055175781, -1.7772865295410156, 0.3516845703125, -2.983671188354492, -1.1202163696289062, 3.6191329956054688, 1.2427024841308594, 3.1897125244140625, -0.146820068359375, 2.5128402709960938, -0.5610885620117188, -0.6520805358886719, -0.19809722900390625, 4.93878173828125, 0.460174560546875, 0.20998382568359375, 1.9422531127929688], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000530.npy"}
{"epoch": 0.8012093726379441, "step": 531, "batch_size": 64, "mean": 1.2052104473114014, "std": 2.17195200920105, "min": -4.221710205078125, "p10": -1.3146413803100585, "median": 0.9897880554199219, "p90": 3.8487457275390646, "max": 8.077859878540039, "pos_frac": 0.734375, "sample": [-0.32183837890625, 0.5153732299804688, 2.784881591796875, -0.33917999267578125, 1.1075019836425781, 0.45339202880859375, -0.7654647827148438, 1.0365447998046875, 2.74761962890625, 1.5891151428222656, 2.4014244079589844, 0.24020957946777344, 3.1987457275390625, -2.4435348510742188, 8.077859878540039, 3.2186660766601562, -0.7430229187011719, 0.562591552734375, -1.451934814453125, 4.090167999267578, 1.888885498046875, 0.77154541015625, -1.4718780517578125, 2.1553802490234375, -0.427398681640625, 1.1391010284423828, 1.4938201904296875, -0.8450241088867188, 1.1904144287109375, 0.575653076171875, 0.2709197998046875, 0.5114212036132812, -1.3240604400634766, 0.38472747802734375, -4.221710205078125, -0.9607772827148438, 0.3475189208984375, 1.3869609832763672, 3.2854270935058594, -3.5959701538085938, 0.7886962890625, 2.0328598022460938, 2.4696311950683594, 2.9368858337402344, -1.390716552734375, -1.29266357421875, 5.072513580322266, 5.8557538986206055, 2.4855804443359375, 0.3437652587890625, 0.79803466796875, 1.7881240844726562, 0.9430313110351562, 3.1258392333984375, 2.350433349609375, -0.03519439697265625, 4.6895904541015625, 0.7147674560546875, 1.3534164428710938, 4.23046875, 2.0539588928222656, 2.0462818145751953, 5.6571502685546875, -0.39881134033203125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000531.npy"}
{"epoch": 0.8027210884353742, "step": 532, "batch_size": 64, "mean": 1.4446359872817993, "std": 2.4637537002563477, "min": -2.897796630859375, "p10": -1.6821622848510742, "median": 1.0772476196289062, "p90": 4.417564582824707, "max": 9.049293518066406, "pos_frac": 0.703125, "sample": [1.0829696655273438, 3.079914093017578, 5.490020751953125, 5.83203125, 1.3694610595703125, 0.4457550048828125, 2.5795516967773438, 9.049293518066406, 0.9124069213867188, 3.46026611328125, -0.505218505859375, 2.9389877319335938, 3.5114593505859375, 2.016204833984375, 2.598224639892578, 1.1935806274414062, -0.0647430419921875, 3.50079345703125, 1.2295150756835938, -2.7836227416992188, -1.0610036849975586, 0.23309707641601562, 2.2645912170410156, 0.9094429016113281, -1.591552734375, 0.4971923828125, 0.700927734375, 2.949371337890625, 4.2769775390625, 4.434532165527344, 3.44805908203125, -0.4097099304199219, -2.662353515625, 5.976860046386719, 1.0715255737304688, 0.306243896484375, 4.151302337646484, -0.4222412109375, 0.769927978515625, 0.8300151824951172, -1.0922775268554688, -2.897796630859375, -1.7209949493408203, -0.5491905212402344, 3.800079345703125, 1.4930419921875, 5.0149688720703125, 1.7153511047363281, 2.3054885864257812, 1.7084197998046875, 0.7393646240234375, -0.189727783203125, -0.044322967529296875, 7.181774139404297, -1.7675552368164062, 1.6174430847167969, -0.8622188568115234, -1.7645416259765625, 4.377973556518555, -2.173248291015625, 1.0353775024414062, -1.3307933807373047, 0.91058349609375, 1.339447021484375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000532.npy"}
{"epoch": 0.8042328042328042, "step": 533, "batch_size": 64, "mean": 1.5031521320343018, "std": 2.5891876220703125, "min": -3.32830810546875, "p10": -1.689472198486328, "median": 1.3250808715820312, "p90": 4.960917854309082, "max": 7.266632080078125, "pos_frac": 0.703125, "sample": [2.7879104614257812, 4.416841506958008, -1.4536056518554688, 2.7177085876464844, 0.609771728515625, 1.860921859741211, -0.8729705810546875, -0.6141700744628906, 1.2277755737304688, 4.060903549194336, -2.7372894287109375, -0.612396240234375, 1.2885093688964844, 0.09918022155761719, 4.197711944580078, 5.3756103515625, 1.6791114807128906, 2.6416778564453125, 0.335723876953125, -0.9150848388671875, -0.057109832763671875, 1.3616523742675781, 2.251251220703125, 1.8197479248046875, 0.0978546142578125, -1.4226856231689453, 2.6931819915771484, -1.790557861328125, 1.7088642120361328, -2.2411060333251953, -1.452667236328125, 2.2717552185058594, 4.978879928588867, 7.266632080078125, -0.92822265625, -0.03118896484375, -3.32830810546875, 5.2269744873046875, 0.6710643768310547, 0.056324005126953125, 0.5119705200195312, 3.91888427734375, 2.0446395874023438, 4.0432281494140625, 4.5285186767578125, 7.203762054443359, 0.18927001953125, -2.3043594360351562, 1.1304168701171875, -1.8239898681640625, 0.03899955749511719, 4.788150787353516, 2.841045379638672, -0.30135345458984375, 0.44966697692871094, -1.41796875, 2.4119644165039062, 4.039360046386719, 1.6960334777832031, 5.272214889526367, 4.91900634765625, 5.7712860107421875, -3.320383071899414, 4.3251953125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000533.npy"}
{"epoch": 0.8057445200302343, "step": 534, "batch_size": 64, "mean": 1.4704830646514893, "std": 2.473984956741333, "min": -5.944854736328125, "p10": -1.4277854919433592, "median": 1.8862228393554688, "p90": 4.2818078994750985, "max": 6.619293212890625, "pos_frac": 0.734375, "sample": [1.622955322265625, -2.2682571411132812, 5.8909759521484375, 2.4990005493164062, 1.1671276092529297, 3.2222900390625, 0.5228424072265625, -1.2139968872070312, 2.783092498779297, -1.2863082885742188, -1.1273250579833984, 0.0996856689453125, 2.1363143920898438, 3.0770950317382812, -0.113433837890625, 4.047393798828125, 1.8789329528808594, 2.707141876220703, 2.8132095336914062, -2.4580001831054688, 6.1021881103515625, 2.790191650390625, 2.8063507080078125, 2.4546737670898438, -1.0750160217285156, -5.944854736328125, -2.6731796264648438, 0.4697723388671875, 2.7681808471679688, 2.1584243774414062, -0.12400054931640625, -3.1869659423828125, 5.5408782958984375, 6.5866546630859375, 3.9742469787597656, 1.3322296142578125, 0.3415641784667969, 0.283599853515625, 6.619293212890625, 3.2832870483398438, 4.139915466308594, 2.6397552490234375, -1.2320098876953125, 1.063232421875, 2.4582042694091797, 2.5093650817871094, -0.7181587219238281, 0.752166748046875, 0.4857521057128906, 0.6713714599609375, 2.275897979736328, 3.0394744873046875, 1.4471397399902344, -0.4384441375732422, 2.3768692016601562, 2.3187255859375, 3.1603775024414062, -1.4884185791015625, 1.8935127258300781, -1.957763671875, 4.606414794921875, 0.5415802001953125, 4.342618942260742, -1.2849159240722656], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000534.npy"}
{"epoch": 0.8072562358276644, "step": 535, "batch_size": 64, "mean": 1.5089242458343506, "std": 2.7326180934906006, "min": -5.910797119140625, "p10": -1.6685461044311523, "median": 1.4884819984436035, "p90": 5.341217041015627, "max": 8.104866027832031, "pos_frac": 0.6875, "sample": [-0.6216354370117188, -2.5215530395507812, 8.104866027832031, 4.896240234375, 0.47367095947265625, 3.151304244995117, -1.5482463836669922, 0.7565116882324219, -1.0602493286132812, 7.4034423828125, -2.269378662109375, 0.5405731201171875, 2.316852569580078, 1.7277374267578125, -0.9070663452148438, 0.8346824645996094, -2.5665359497070312, 2.9773330688476562, 0.3844118118286133, -0.4958953857421875, 1.1668205261230469, 0.1287384033203125, 2.599180221557617, -0.1193695068359375, 4.484550476074219, 0.6989669799804688, 4.347419738769531, -1.7619705200195312, -1.9482574462890625, 1.440333366394043, 5.53192138671875, 6.767112731933594, 1.6294708251953125, 1.0665664672851562, 1.536630630493164, 1.625284194946289, 1.7591094970703125, 5.8762054443359375, -1.6749038696289062, 2.662639617919922, 2.5199127197265625, -1.6537113189697266, 7.9895477294921875, 1.1781196594238281, 2.4231185913085938, -5.910797119140625, 2.572906494140625, 3.3293228149414062, -0.22600746154785156, 3.349273681640625, 2.3078765869140625, -1.2909011840820312, -0.2771949768066406, 6.004341125488281, 3.292572021484375, -0.1829681396484375, 1.9428482055664062, 1.666778564453125, 2.948680877685547, -0.1175689697265625, -1.2918319702148438, 0.21599578857421875, 4.242576599121094, 2.144744873046875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000535.npy"}
{"epoch": 0.8087679516250945, "step": 536, "batch_size": 64, "mean": 1.7017552852630615, "std": 2.127514362335205, "min": -2.1703643798828125, "p10": -0.9778064727783202, "median": 1.3754959106445312, "p90": 4.275735473632813, "max": 6.688282012939453, "pos_frac": 0.796875, "sample": [1.0462455749511719, -1.9650115966796875, -0.8020172119140625, 3.6296749114990234, 3.2757415771484375, -1.5588531494140625, 0.20644378662109375, -2.1703643798828125, 0.15172576904296875, 0.9843482971191406, -1.6282577514648438, 2.6065750122070312, 4.201202392578125, 3.7165069580078125, 2.3897552490234375, 1.6957244873046875, 3.7632293701171875, 6.688282012939453, 5.9581451416015625, 1.1887950897216797, 2.948025703430176, 0.3297538757324219, 3.5775985717773438, 3.756683349609375, -0.46692657470703125, 6.348823547363281, 2.97222900390625, 0.6416130065917969, 0.5278778076171875, 2.660083770751953, 4.3624114990234375, -1.3843002319335938, 0.45252227783203125, 1.7970466613769531, 0.9304122924804688, 3.5098876953125, 5.648418426513672, -0.9216690063476562, 3.700347900390625, 0.9543523788452148, 3.7435073852539062, 2.88299560546875, 2.0897254943847656, -1.0018653869628906, 0.8203163146972656, -1.871612548828125, 1.1021080017089844, 1.625518798828125, 0.6677722930908203, 4.30767822265625, 0.3931903839111328, 0.1979217529296875, -0.11243438720703125, 2.717620849609375, 4.7003326416015625, 0.1312885284423828, 0.8410263061523438, 3.906707763671875, -0.7301959991455078, 1.3964462280273438, 1.7225799560546875, 2.5558509826660156, -0.25176239013671875, 1.3545455932617188], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000536.npy"}
{"epoch": 0.8102796674225246, "step": 537, "batch_size": 64, "mean": 1.5759867429733276, "std": 2.4414329528808594, "min": -3.185504913330078, "p10": -1.4517108917236325, "median": 1.3848991394042969, "p90": 4.831315612792969, "max": 7.93475341796875, "pos_frac": 0.734375, "sample": [2.4529380798339844, 3.5566558837890625, -0.2936553955078125, 0.083343505859375, 5.17352294921875, 4.7093048095703125, 5.288761138916016, 0.05980491638183594, 0.5380096435546875, 3.003437042236328, 0.8262672424316406, 3.363109588623047, 3.7878570556640625, 4.88360595703125, 4.446949005126953, 1.55938720703125, 5.193271636962891, 1.2909774780273438, 1.2862491607666016, -0.8517351150512695, -1.5643024444580078, 0.18316650390625, 2.0474395751953125, -0.6566896438598633, -1.9525260925292969, 4.1351470947265625, 1.73565673828125, 2.1177825927734375, -3.1490707397460938, 0.6170730590820312, 3.257659912109375, 1.1381378173828125, 1.8381156921386719, -0.5796356201171875, -0.9349746704101562, 0.3946266174316406, 3.7819976806640625, 1.816192626953125, 0.2951812744140625, -0.5276527404785156, 1.47882080078125, 4.5015869140625, 5.154449462890625, 1.177734375, 0.4002227783203125, -1.1889972686767578, 5.4660491943359375, -2.1647262573242188, -0.8545150756835938, 4.4643707275390625, 1.1387100219726562, 2.1203155517578125, -3.185504913330078, 3.5711517333984375, 7.93475341796875, 2.5564498901367188, -2.1886940002441406, 3.6500244140625, -2.5607528686523438, -0.46216583251953125, 2.3743057250976562, 3.7960052490234375, -1.0609016418457031, 0.39307403564453125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000537.npy"}
{"epoch": 0.8117913832199547, "step": 538, "batch_size": 64, "mean": 1.79640793800354, "std": 2.4069509506225586, "min": -3.090728759765625, "p10": -0.867823028564453, "median": 1.7779922485351562, "p90": 4.7380773544311525, "max": 9.506893157958984, "pos_frac": 0.75, "sample": [-1.9122085571289062, 1.577958106994629, -0.043430328369140625, 2.1001052856445312, 4.600433349609375, -2.25531005859375, 2.455291748046875, -0.8094558715820312, -0.037006378173828125, 1.8037643432617188, 3.0162792205810547, 4.797067642211914, 1.756317138671875, 2.3194656372070312, 3.1388778686523438, 1.7996673583984375, 5.445442199707031, 4.015098571777344, 1.9282875061035156, 2.1765499114990234, 1.6491127014160156, -0.18944549560546875, 2.2003097534179688, 4.461273193359375, 1.0127487182617188, -0.7264118194580078, 3.3022384643554688, 0.62274169921875, 7.119842529296875, 0.7156219482421875, 0.4198760986328125, 0.7620086669921875, -0.8928375244140625, 1.308349609375, -0.39257049560546875, 2.9383773803710938, -3.090728759765625, 2.0207672119140625, -0.42113494873046875, 1.179901123046875, 3.281494140625, 1.381744384765625, 4.81964111328125, 0.185028076171875, -0.421783447265625, 2.765106201171875, -2.171356201171875, 3.9929275512695312, 1.9680709838867188, 7.1575927734375, 1.5348663330078125, 1.8999099731445312, 4.3328857421875, -1.805816650390625, 2.2596588134765625, -0.710052490234375, 9.506893157958984, 2.204212188720703, 2.855072021484375, -2.2534942626953125, 1.3549728393554688, 6.003055572509766, 1.4082565307617188, 1.5479888916015625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000538.npy"}
{"epoch": 0.8133030990173847, "step": 539, "batch_size": 64, "mean": 1.446435809135437, "std": 2.1546640396118164, "min": -2.6106414794921875, "p10": -0.8093223571777342, "median": 0.9932956695556641, "p90": 4.3210939407348645, "max": 7.7920379638671875, "pos_frac": 0.734375, "sample": [1.8715343475341797, 2.6833114624023438, -0.39542198181152344, 4.413917541503906, 3.3364181518554688, 0.8968505859375, 2.8080902099609375, 1.59100341796875, 4.7055511474609375, -0.5992507934570312, 6.6596221923828125, 0.81231689453125, -2.0677032470703125, 0.9828643798828125, -2.6106414794921875, 0.6925086975097656, 0.5659866333007812, 1.3638763427734375, 3.9316139221191406, 1.4315185546875, 0.07904624938964844, -0.5299701690673828, 2.1955490112304688, -0.60205078125, 0.034481048583984375, -0.6182098388671875, 0.19312477111816406, 0.5806427001953125, 5.492767333984375, 2.504932403564453, -1.2697982788085938, 0.6166229248046875, -0.1530303955078125, -0.3941459655761719, 3.39813232421875, 0.5852203369140625, 1.1895561218261719, 0.4076995849609375, 5.612071990966797, 0.4547405242919922, 0.6650390625, -0.8912277221679688, 2.7094345092773438, -1.42669677734375, 3.5035324096679688, 3.128143310546875, -1.2603015899658203, 7.7920379638671875, 0.5727996826171875, -0.5459060668945312, 2.4576797485351562, 3.2184829711914062, 1.0037269592285156, -0.1067047119140625, 1.018280029296875, 3.8600311279296875, -0.2448577880859375, 1.2371559143066406, -1.8171768188476562, 4.10450553894043, 1.3670387268066406, 4.4853973388671875, 3.8477325439453125, 1.0423965454101562], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000539.npy"}
{"epoch": 0.8148148148148148, "step": 540, "batch_size": 64, "mean": 1.813124656677246, "std": 2.7645750045776367, "min": -7.674224853515625, "p10": -1.817926788330078, "median": 1.9527740478515625, "p90": 5.280086898803711, "max": 6.8831939697265625, "pos_frac": 0.765625, "sample": [-1.8522109985351562, 1.6055145263671875, 5.092437744140625, 4.83917236328125, 1.3214550018310547, -1.979104995727539, 1.7700271606445312, 2.6844940185546875, 0.468475341796875, 2.338787078857422, 1.2373008728027344, 3.259979248046875, 3.2751922607421875, 2.2574234008789062, -1.9981689453125, -0.2829780578613281, 5.416435241699219, 1.54180908203125, -0.5164680480957031, 2.1707916259765625, 4.4997406005859375, -1.9644622802734375, 3.345916748046875, 2.3038711547851562, 4.600776672363281, 4.236793518066406, 0.009761810302734375, 0.9724655151367188, -3.8309478759765625, 0.5829315185546875, -7.674224853515625, 6.8831939697265625, 4.8585205078125, 2.47467041015625, 6.729034423828125, 1.1410064697265625, -1.6231765747070312, 3.9782047271728516, 1.389739990234375, 1.2699642181396484, 1.2268486022949219, 4.383033752441406, 2.4915809631347656, 5.145282745361328, 5.504596710205078, -0.9458694458007812, 5.9250946044921875, 1.59259033203125, -1.7379302978515625, -2.7946014404296875, 2.1467514038085938, 4.8602294921875, -1.527130126953125, -0.0475616455078125, 0.6865978240966797, 2.974456787109375, -0.3543548583984375, 5.616996765136719, 0.279632568359375, 2.1355209350585938, 0.95391845703125, 2.1905136108398438, 3.1617698669433594, 5.337860107421875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000540.npy"}
{"epoch": 0.8163265306122449, "step": 541, "batch_size": 64, "mean": 1.673627495765686, "std": 2.423875331878662, "min": -3.8809051513671875, "p10": -2.0365386962890626, "median": 2.172761917114258, "p90": 4.795137405395509, "max": 7.056552886962891, "pos_frac": 0.6875, "sample": [3.7304840087890625, 4.093017578125, 2.2927513122558594, 5.205413818359375, -0.8027267456054688, 4.247894287109375, -0.3275604248046875, 3.180255889892578, 3.5575618743896484, -1.308258056640625, 2.3430328369140625, -2.7586669921875, 1.5749740600585938, 1.8253002166748047, 1.889892578125, -2.1622772216796875, 2.0312881469726562, 2.266845703125, 0.1087188720703125, 5.1054229736328125, -0.14095306396484375, -0.7257022857666016, 4.891536712646484, 3.783966064453125, 2.150238037109375, 2.247570037841797, 2.4398193359375, -2.0438308715820312, -2.0546493530273438, -0.09383392333984375, 3.107196807861328, 2.5093555450439453, 7.056552886962891, 3.9904937744140625, -2.0195236206054688, 4.5702056884765625, 0.8877239227294922, -3.8809051513671875, 2.6139144897460938, 1.862945556640625, -0.008684158325195312, 0.1153106689453125, 2.1044921875, 3.398651123046875, 2.303070068359375, 0.467437744140625, 3.2841110229492188, 2.2289276123046875, -2.515655517578125, 2.6885452270507812, -0.09502410888671875, 1.8128738403320312, 2.79833984375, -0.9050445556640625, -0.11034393310546875, -0.3633880615234375, 2.4221725463867188, 2.1952857971191406, -0.12141990661621094, 5.37969970703125, 5.612470626831055, -2.531494140625, 6.135108947753906, 3.57122802734375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000541.npy"}
{"epoch": 0.817838246409675, "step": 542, "batch_size": 64, "mean": 1.841409683227539, "std": 2.6618714332580566, "min": -4.044914245605469, "p10": -1.223506546020508, "median": 1.5591192245483398, "p90": 5.3710369110107425, "max": 7.6106109619140625, "pos_frac": 0.75, "sample": [4.359561920166016, 2.237274169921875, 1.3857803344726562, 3.7274627685546875, 0.42715930938720703, 1.7281036376953125, 0.4820556640625, 4.103504180908203, 0.04302978515625, -0.23351287841796875, 2.1881256103515625, -1.2268104553222656, 0.826995849609375, 3.8832244873046875, 0.8292236328125, 2.447967529296875, -0.5925369262695312, 1.3358612060546875, -1.6678466796875, 7.0287981033325195, 6.670543670654297, 4.891502380371094, 1.536062240600586, 1.5821762084960938, -2.2569580078125, -3.3869247436523438, -0.7160224914550781, -0.80950927734375, 1.7909355163574219, -2.1704025268554688, 3.919820785522461, 5.07025146484375, 1.9221878051757812, 3.0733604431152344, -1.2157974243164062, -0.5794525146484375, 0.0543670654296875, 2.467498779296875, -4.044914245605469, 5.817329406738281, 1.0667343139648438, 0.4517402648925781, 4.977313995361328, -0.5735931396484375, 5.445003509521484, 0.6699371337890625, 6.3170928955078125, 3.2215843200683594, -0.0040283203125, 5.198448181152344, 4.068603515625, 7.6106109619140625, -2.065084457397461, -0.25213623046875, 3.1433238983154297, 2.4350509643554688, 0.06817245483398438, 6.419036865234375, 1.3341789245605469, 1.9056472778320312, 0.11379241943359375, 4.32745361328125, 0.33965301513671875, 4.702213287353516], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000542.npy"}
{"epoch": 0.8193499622071051, "step": 543, "batch_size": 64, "mean": 1.6092875003814697, "std": 2.4941797256469727, "min": -4.875392913818359, "p10": -1.0321563720703124, "median": 1.3388900756835938, "p90": 5.280598449707032, "max": 7.090065002441406, "pos_frac": 0.71875, "sample": [-0.0619964599609375, 0.5779514312744141, -1.1154327392578125, 0.13384628295898438, 1.4634933471679688, -0.6311378479003906, 2.5985565185546875, 5.937568664550781, -2.4646377563476562, 4.292755126953125, -0.3665924072265625, 2.81390380859375, 3.5829620361328125, 3.6459083557128906, 5.193824768066406, 0.616241455078125, 6.630481719970703, -2.411724090576172, -0.4132957458496094, 3.171833038330078, -0.0070552825927734375, 0.3768768310546875, 1.2489700317382812, 2.2027053833007812, 1.9949760437011719, 6.337291717529297, -4.875392913818359, -0.21870803833007812, 0.9360504150390625, -0.7715110778808594, 1.8421897888183594, 2.8893280029296875, 7.090065002441406, 2.6294479370117188, 3.3927249908447266, -2.691375732421875, 4.418561935424805, 0.012243270874023438, -1.4493560791015625, -0.12884998321533203, 1.490325927734375, 0.2777595520019531, 6.5452880859375, -0.687408447265625, 0.6361160278320312, 3.4010238647460938, 5.317787170410156, 3.7767181396484375, 0.6483612060546875, 1.1714324951171875, 1.30230712890625, 0.319091796875, 1.3754730224609375, 1.8712844848632812, 2.85479736328125, -0.8378448486328125, 2.0842819213867188, 1.5225486755371094, -1.206390380859375, -0.5490150451660156, 3.6616363525390625, 6.164306640625, 0.5508880615234375, 2.8799381256103516], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000543.npy"}
{"epoch": 0.8208616780045351, "step": 544, "batch_size": 64, "mean": 2.2667789459228516, "std": 3.0253965854644775, "min": -2.681243896484375, "p10": -1.3612911224365234, "median": 1.9736499786376953, "p90": 6.65169792175293, "max": 8.840583801269531, "pos_frac": 0.734375, "sample": [3.635772705078125, -1.774200439453125, 2.518951416015625, 0.3575935363769531, 2.9897613525390625, 2.1745223999023438, -2.4708709716796875, -0.2500762939453125, 6.245330810546875, -0.450042724609375, 0.0253143310546875, 0.5643463134765625, 1.966949462890625, 7.728889465332031, 2.0600204467773438, 0.621673583984375, -2.438129425048828, 2.980499267578125, 3.116241455078125, 0.634521484375, 1.1966018676757812, 1.2764129638671875, 1.951934814453125, 5.025539398193359, 5.319091796875, -0.25121116638183594, 1.5937118530273438, -0.6736297607421875, -0.7017087936401367, 0.4989166259765625, 6.2493438720703125, -1.42388916015625, 3.5937976837158203, -0.9249038696289062, 7.261699676513672, 3.3404388427734375, 6.05938720703125, -2.681243896484375, 2.236330032348633, 2.6880645751953125, 0.402008056640625, 2.89312744140625, 6.7041015625, 1.9803504943847656, -0.2863616943359375, 1.135528564453125, 2.4252891540527344, 1.3616523742675781, 8.840583801269531, -1.8828506469726562, -0.7323780059814453, 8.330230712890625, 6.8875732421875, 6.0699005126953125, -1.2152290344238281, 0.5731964111328125, 6.0435943603515625, -1.5771026611328125, 6.529422760009766, 7.7681732177734375, -0.8793792724609375, 5.91326904296875, 2.3621253967285156, 3.555267333984375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000544.npy"}
{"epoch": 0.8223733938019653, "step": 545, "batch_size": 64, "mean": 1.4665460586547852, "std": 2.6108853816986084, "min": -3.27099609375, "p10": -1.0668304443359375, "median": 1.0361595153808594, "p90": 4.368205833435058, "max": 8.886016845703125, "pos_frac": 0.671875, "sample": [4.353349685668945, 2.6189422607421875, -0.790313720703125, 5.182895660400391, -0.028995513916015625, 2.0974960327148438, 1.0324172973632812, -0.8358917236328125, -1.9544601440429688, 2.938323974609375, -0.061431884765625, -3.27099609375, 3.1683273315429688, -0.19337081909179688, 0.0868377685546875, 3.0362625122070312, 0.0229339599609375, 1.848236083984375, -0.7886257171630859, 4.37457275390625, 0.8602828979492188, 8.886016845703125, 1.1041908264160156, 1.507171630859375, 5.7880859375, 3.7258529663085938, 1.35400390625, 1.4287071228027344, 4.04925537109375, 0.6556892395019531, -0.7044410705566406, -3.1326427459716797, -2.3834991455078125, 0.5891990661621094, 0.6110763549804688, -1.0897369384765625, 3.0133819580078125, 7.7936553955078125, 1.316986083984375, -0.3972320556640625, -0.4215278625488281, 4.4739837646484375, 0.9958724975585938, 3.3338851928710938, 0.4166679382324219, 2.0125083923339844, -0.9934368133544922, 0.11488151550292969, 2.224334716796875, -2.9087867736816406, -0.096954345703125, 3.7052764892578125, 1.0399017333984375, 8.749160766601562, 2.2312698364257812, -1.0133819580078125, -0.19130325317382812, 3.464773178100586, -2.6322021484375, 0.612518310546875, 3.2155609130859375, 3.8661422729492188, 3.8721160888671875, -0.0248260498046875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000545.npy"}
{"epoch": 0.8238851095993953, "step": 546, "batch_size": 64, "mean": 1.7044583559036255, "std": 2.2758708000183105, "min": -4.477970123291016, "p10": -1.1945599555969237, "median": 1.5165119171142578, "p90": 4.636557006835939, "max": 6.929409027099609, "pos_frac": 0.796875, "sample": [5.536163330078125, -1.35589599609375, 2.942523956298828, 1.833282470703125, 0.9746208190917969, 3.133739471435547, 4.383819580078125, 1.888702392578125, 0.629608154296875, -1.1834402084350586, 0.4521484375, 2.181781768798828, 3.0030517578125, 2.35943603515625, -1.3832817077636719, 2.8158111572265625, 0.17819976806640625, 1.0850715637207031, -1.1993255615234375, 0.6224594116210938, -1.167022705078125, 3.9136905670166016, 1.1304054260253906, 1.1149368286132812, 2.0140609741210938, 2.3730697631835938, 2.96002197265625, 0.12286376953125, 6.201751708984375, 6.4346771240234375, 0.7846946716308594, 1.4459991455078125, 4.744392395019531, 3.1195068359375, 1.837493896484375, -2.241546630859375, 1.5660552978515625, -0.00262451171875, 0.8166465759277344, 0.7293624877929688, 1.109395980834961, -0.6603050231933594, -1.7664203643798828, 5.505695343017578, 0.7235488891601562, 1.6354827880859375, -0.8579254150390625, 2.000274658203125, 1.5804290771484375, 1.1140556335449219, 4.384941101074219, 1.0856666564941406, 6.6206207275390625, -1.3685302734375, -0.4718170166015625, 1.3179912567138672, 3.011199951171875, 3.582439422607422, 6.929409027099609, 1.4669685363769531, 2.5040740966796875, -4.477970123291016, 4.200176239013672, 3.119016647338867], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000546.npy"}
{"epoch": 0.8253968253968254, "step": 547, "batch_size": 64, "mean": 1.8326648473739624, "std": 2.3036117553710938, "min": -3.5333480834960938, "p10": -0.7554794311523432, "median": 1.5279197692871094, "p90": 4.73761444091797, "max": 8.73526382446289, "pos_frac": 0.859375, "sample": [2.333118438720703, 0.179962158203125, 0.478271484375, 0.9237213134765625, 6.001434326171875, 2.7151193618774414, -1.5391464233398438, 8.73526382446289, 3.5997772216796875, 4.066429138183594, 1.2950592041015625, 1.4098358154296875, 0.6553173065185547, 1.1529388427734375, 1.4536705017089844, 0.6752243041992188, 3.5734405517578125, 3.9339637756347656, -1.5356063842773438, 0.5511245727539062, 2.5839805603027344, 0.6851959228515625, 5.965778350830078, 1.6885986328125, 3.72900390625, 0.5856094360351562, -0.16397857666015625, 3.1296157836914062, 1.592376708984375, 1.3044281005859375, 0.9971847534179688, 0.12978744506835938, -2.75531005859375, 1.4634628295898438, 0.00908660888671875, 1.8143291473388672, 4.22174072265625, 1.230194091796875, 1.6934356689453125, 3.0990447998046875, 2.0747909545898438, 6.4691009521484375, 4.384239196777344, 3.8891067504882812, -3.5333480834960938, 5.858390808105469, 0.35223388671875, -2.3961181640625, -1.0089797973632812, 2.4983978271484375, 4.889060974121094, 1.24102783203125, 5.778656005859375, -0.13315582275390625, 0.6306247711181641, -2.144775390625, 0.6190986633300781, 2.3161239624023438, 2.9286422729492188, 1.8334884643554688, 1.7365303039550781, 1.915252685546875, 2.997211456298828, 0.4324665069580078], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000547.npy"}
{"epoch": 0.8269085411942555, "step": 548, "batch_size": 64, "mean": 1.306201457977295, "std": 2.353482961654663, "min": -2.964611053466797, "p10": -1.5650913238525386, "median": 1.0889244079589844, "p90": 4.563726806640625, "max": 7.168964385986328, "pos_frac": 0.6875, "sample": [-0.3231201171875, 2.6355361938476562, 2.2931442260742188, -0.6837615966796875, 0.214324951171875, 5.00537109375, -0.12408447265625, 5.541473388671875, 2.6558380126953125, 4.628486633300781, -0.031158447265625, 3.3637657165527344, 0.819575309753418, -0.7044296264648438, -0.95257568359375, 2.1030654907226562, -1.818817138671875, 0.36199283599853516, 3.2277984619140625, 1.442657470703125, 1.0711822509765625, 2.245515823364258, -2.3411712646484375, 1.5852737426757812, -0.7127532958984375, 1.1080093383789062, 4.143829345703125, 7.168964385986328, 2.5975494384765625, 3.9028778076171875, 0.5760421752929688, 0.9256553649902344, 1.0876693725585938, 3.3772506713867188, -0.5867919921875, 0.5175323486328125, 4.412620544433594, 1.2421817779541016, -2.5718536376953125, 0.7979354858398438, 2.2554855346679688, 0.5642013549804688, 5.106819152832031, -0.7026824951171875, 6.929534912109375, 1.3086509704589844, -0.8458480834960938, 1.090179443359375, -1.1738700866699219, 1.6986236572265625, 1.175811767578125, -2.964611053466797, 0.11968040466308594, 2.0008087158203125, -0.4658966064453125, 1.287506103515625, 3.30487060546875, -1.732757568359375, 0.40761566162109375, -2.3916473388671875, 5.856937408447266, 4.1684417724609375, -2.753082275390625, -0.8504791259765625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000548.npy"}
{"epoch": 0.8284202569916855, "step": 549, "batch_size": 64, "mean": 1.3089196681976318, "std": 2.9003164768218994, "min": -6.03727912902832, "p10": -1.9849430084228514, "median": 1.1928215026855469, "p90": 4.715216541290284, "max": 11.22686767578125, "pos_frac": 0.703125, "sample": [-0.6562652587890625, 3.6411590576171875, -0.7889633178710938, 0.4158172607421875, -1.5860137939453125, -3.399993896484375, 3.2789993286132812, 2.3676795959472656, -0.32819366455078125, 3.7329978942871094, 5.180267333984375, 2.4942245483398438, 1.6802520751953125, 4.786312103271484, 0.9364776611328125, 0.204345703125, -2.7820281982421875, 0.5251426696777344, 5.179339408874512, 2.666107177734375, 2.6950149536132812, 0.9346351623535156, 11.22686767578125, -2.6632843017578125, 0.1666412353515625, 0.5768890380859375, 2.1474990844726562, 3.2680587768554688, -0.815399169921875, -1.6883926391601562, 4.5493268966674805, 3.7111129760742188, 2.91741943359375, -1.2791099548339844, 1.2276535034179688, 2.71282958984375, 1.4255847930908203, -0.612030029296875, 0.12998199462890625, 1.604217529296875, -2.514404296875, 2.2405319213867188, 0.9186763763427734, 1.5878849029541016, -0.5201339721679688, 1.1206645965576172, -0.2478504180908203, 1.230133056640625, 5.595367431640625, 0.4170684814453125, -2.001401901245117, 3.8778457641601562, 3.858612060546875, 0.5829315185546875, 1.5746307373046875, 2.1021575927734375, -0.4360160827636719, 1.7381935119628906, -1.9465389251708984, 6.563079833984375, 8.091936111450195, -4.966400146484375, -6.03727912902832, 1.157989501953125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000549.npy"}
{"epoch": 0.8299319727891157, "step": 550, "batch_size": 64, "mean": 1.30253267288208, "std": 2.462827205657959, "min": -4.661075592041016, "p10": -1.7920783996582026, "median": 1.0072746276855469, "p90": 4.114367675781251, "max": 8.124710083007812, "pos_frac": 0.734375, "sample": [1.869415283203125, 4.778289794921875, 7.259498596191406, -0.132965087890625, -3.0757293701171875, -0.32703399658203125, 0.8980369567871094, -2.32861328125, -0.7523727416992188, 1.3456039428710938, 3.9022293090820312, 2.356475830078125, 4.495170593261719, 2.2022762298583984, 0.281829833984375, 4.205284118652344, 6.036830902099609, 3.5567550659179688, 3.89886474609375, 0.5226554870605469, 3.0878677368164062, -0.4046134948730469, 3.5557098388671875, 3.773834228515625, 2.8001556396484375, -2.7778854370117188, 0.9720916748046875, -1.1727104187011719, 0.12993621826171875, -0.8245086669921875, -3.0218582153320312, 0.19374847412109375, 0.6590118408203125, 1.0424575805664062, 0.23559188842773438, 1.9747467041015625, 0.1455230712890625, 0.7100448608398438, 1.5113677978515625, 2.5965652465820312, 0.21971511840820312, 1.933074951171875, 1.460693359375, 0.39833641052246094, 1.9019088745117188, 2.615386962890625, 2.4286041259765625, 3.1020984649658203, 0.60888671875, 4.329803466796875, -0.8065719604492188, 3.465394973754883, 3.1351776123046875, 0.8304958343505859, 3.669170379638672, 0.08870983123779297, -3.1238784790039062, -0.38987159729003906, 8.124710083007812, -4.661075592041016, 1.1993789672851562, -0.47309112548828125, -0.8170242309570312, -2.0575218200683594], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000550.npy"}
{"epoch": 0.8314436885865457, "step": 551, "batch_size": 64, "mean": 1.3620030879974365, "std": 2.3036537170410156, "min": -4.681610107421875, "p10": -1.0244777679443358, "median": 1.3347797393798828, "p90": 4.0110824584960945, "max": 8.984695434570312, "pos_frac": 0.765625, "sample": [2.1274986267089844, 0.259613037109375, 2.3865203857421875, 1.58306884765625, 2.250621795654297, 2.7907466888427734, 6.899810791015625, -0.8287887573242188, 0.11734580993652344, 2.1054916381835938, 0.012752532958984375, 1.3184394836425781, 5.487152099609375, -0.3896446228027344, 3.8730926513671875, 2.3621177673339844, 1.081268310546875, 2.6220321655273438, -4.681610107421875, -0.29757118225097656, 0.7312393188476562, 0.8027191162109375, 2.7258071899414062, 0.020294189453125, 1.352325439453125, 8.984695434570312, 1.2338638305664062, 1.458404541015625, 2.93487548828125, 2.1692276000976562, 1.6582164764404297, 1.6698379516601562, 0.5391807556152344, -3.8004302978515625, -0.076446533203125, 6.28302001953125, -0.9286117553710938, 4.308746337890625, -1.0655632019042969, 1.243560791015625, -1.1807594299316406, 1.8152847290039062, -0.5117568969726562, -2.536710739135742, 4.070220947265625, 1.0393447875976562, 1.3511199951171875, 4.189399719238281, -0.4217700958251953, 0.032764434814453125, 1.1850967407226562, 2.0675430297851562, 1.1312408447265625, -2.6364917755126953, 0.294830322265625, 1.8811187744140625, -1.6650447845458984, 2.5700607299804688, 1.9203338623046875, 2.809600830078125, 0.8407745361328125, -0.5989894866943359, 3.1113433837890625, 3.084716796875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000551.npy"}
{"epoch": 0.8329554043839759, "step": 552, "batch_size": 64, "mean": 1.4437094926834106, "std": 2.3651058673858643, "min": -3.69390869140625, "p10": -1.4482584953308104, "median": 1.6665735244750977, "p90": 3.742265224456787, "max": 7.438812255859375, "pos_frac": 0.734375, "sample": [-2.779804229736328, 0.77520751953125, -3.69390869140625, -0.06554412841796875, 3.691499710083008, 0.17836570739746094, -0.02613067626953125, 2.1455535888671875, 1.3370437622070312, 6.218402862548828, 3.361480712890625, 5.5722808837890625, 1.6452350616455078, 4.5767822265625, -0.5254783630371094, -1.3258771896362305, -0.9154167175292969, 1.917572021484375, 1.7720565795898438, -0.9021072387695312, 1.459381103515625, 3.0445632934570312, 0.421356201171875, 1.8158226013183594, 2.734943389892578, -0.36698150634765625, 1.1993789672851562, 1.9639396667480469, 2.6103591918945312, -0.162139892578125, 3.764021873474121, 0.6150074005126953, 2.928924560546875, -3.632925033569336, 3.230663299560547, -1.5007076263427734, 1.8605804443359375, 3.2122421264648438, 1.0651130676269531, 0.80035400390625, 2.9388351440429688, 2.8692855834960938, -0.9755783081054688, 7.438812255859375, 1.4443359375, 2.1089019775390625, 2.7257766723632812, 1.4204902648925781, -3.05584716796875, -0.90997314453125, 4.924812316894531, 1.57672119140625, 2.9905147552490234, -3.3739242553710938, 3.153848648071289, 2.70025634765625, 1.6879119873046875, 3.077228546142578, 3.6372451782226562, 3.6253128051757812, 4.676807403564453, 0.5924186706542969, 0.32244873046875, -3.2203445434570312], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000552.npy"}
{"epoch": 0.8344671201814059, "step": 553, "batch_size": 64, "mean": 1.6402829885482788, "std": 2.8570048809051514, "min": -4.8029022216796875, "p10": -1.9962711334228516, "median": 1.0793724060058594, "p90": 5.915645980834961, "max": 8.698173522949219, "pos_frac": 0.71875, "sample": [8.698173522949219, -1.493011474609375, -0.9903030395507812, 4.891847610473633, 0.48184967041015625, 0.16516876220703125, 4.853450775146484, 5.274627685546875, 0.06954193115234375, -0.6765327453613281, 2.306873321533203, -2.2354736328125, 3.8775062561035156, 1.4993133544921875, 3.638916015625, 3.070068359375, -0.9596405029296875, 2.872509002685547, -0.1678314208984375, -4.8029022216796875, -0.6486034393310547, 3.999420166015625, 5.0491790771484375, 2.400524139404297, 6.065101623535156, 2.8900146484375, 2.0391769409179688, -2.1688461303710938, -2.1892166137695312, -1.3711128234863281, 6.7903900146484375, 3.1008262634277344, 0.8425216674804688, 6.953338623046875, 2.58917236328125, 1.59100341796875, 5.989749908447266, 0.0316314697265625, 1.0768280029296875, 6.10882568359375, 0.8188018798828125, 1.2026252746582031, 0.21357154846191406, 2.179872512817383, -2.0236053466796875, 0.18923187255859375, -0.5720787048339844, 6.588920593261719, -1.9324913024902344, -2.55914306640625, 3.206451416015625, 0.687591552734375, 4.039985656738281, 0.6692962646484375, -3.1176681518554688, 4.206611633300781, 1.0739326477050781, 0.5345611572265625, 0.8341407775878906, 1.0819168090820312, -0.38277626037597656, -0.9680891036987305, 5.74273681640625, 1.7496414184570312], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000553.npy"}
{"epoch": 0.8359788359788359, "step": 554, "batch_size": 64, "mean": 1.55374276638031, "std": 2.5860395431518555, "min": -6.2642059326171875, "p10": -1.415561294555664, "median": 1.1740732192993164, "p90": 4.876981735229492, "max": 9.49715805053711, "pos_frac": 0.78125, "sample": [1.1727142333984375, 2.2545547485351562, 6.215679168701172, 3.5428619384765625, 3.20709228515625, 1.8734378814697266, 1.1754322052001953, 3.0920562744140625, -3.2766799926757812, 2.3856124877929688, 0.5996322631835938, -1.4619293212890625, -0.9839401245117188, 2.526153564453125, -3.2158355712890625, 0.33106231689453125, 1.417933464050293, 0.37174224853515625, 2.771087646484375, 0.9826364517211914, 0.9324951171875, 3.428813934326172, 0.4003105163574219, 0.74810791015625, 2.3922882080078125, 0.9314041137695312, 6.049751281738281, 2.215545654296875, -0.9460639953613281, 1.8309249877929688, 9.49715805053711, 0.465667724609375, 1.6619911193847656, -1.680450439453125, 1.8164710998535156, 0.658721923828125, 2.201753616333008, 0.33742523193359375, 3.6516494750976562, -0.0837860107421875, -6.2642059326171875, 5.795745849609375, 2.9326248168945312, 0.9443588256835938, 0.714111328125, -2.027210235595703, 0.2277851104736328, -1.6624183654785156, -0.00064849853515625, 0.374114990234375, 4.858917236328125, -1.064605712890625, -1.3073692321777344, 4.884723663330078, 4.7437286376953125, 0.62969970703125, 0.6874542236328125, 3.5260848999023438, 3.8638954162597656, 5.097389221191406, 4.421131134033203, -0.3810081481933594, 5.487270355224609, 1.4664840698242188], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000554.npy"}
{"epoch": 0.8374905517762661, "step": 555, "batch_size": 64, "mean": 1.5303804874420166, "std": 2.4789299964904785, "min": -5.7325592041015625, "p10": -1.5071640014648438, "median": 1.9328193664550781, "p90": 4.116275596618652, "max": 7.57257080078125, "pos_frac": 0.75, "sample": [0.4609642028808594, -3.1065673828125, 0.8460617065429688, 3.8209304809570312, 3.631988525390625, -0.22223663330078125, 0.11383056640625, -0.8420028686523438, -1.3383865356445312, -1.0663604736328125, 3.404644012451172, -1.47528076171875, 3.040771484375, 1.0542526245117188, 4.289356231689453, 3.1113815307617188, -0.06321334838867188, 1.4383392333984375, 0.7008380889892578, 3.109222412109375, 2.1992416381835938, 0.057987213134765625, 5.1345062255859375, 0.662689208984375, 1.79046630859375, 3.4170913696289062, 4.08221435546875, 3.0789566040039062, 3.2650299072265625, 2.9104537963867188, 4.129789352416992, 2.0688858032226562, 4.084743499755859, 4.916259765625, -3.2559280395507812, 3.388446807861328, -5.7325592041015625, 2.582735061645508, -0.00919342041015625, -0.20137977600097656, 0.613311767578125, 2.213531494140625, 2.7649612426757812, 1.12945556640625, -4.0958404541015625, 1.0812511444091797, 1.9450454711914062, 1.92059326171875, 4.407257080078125, -1.6810855865478516, 4.978199005126953, -1.5208282470703125, -3.409393310546875, 1.2829551696777344, 4.0566864013671875, 2.76654052734375, 2.2395057678222656, 1.5932807922363281, 3.1539077758789062, -0.9032039642333984, 7.57257080078125, 3.0007667541503906, 3.1595382690429688, 0.196380615234375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000555.npy"}
{"epoch": 0.8390022675736961, "step": 556, "batch_size": 64, "mean": 1.2706454992294312, "std": 2.5819501876831055, "min": -3.2835769653320312, "p10": -2.1806062698364257, "median": 1.111501693725586, "p90": 4.720812225341798, "max": 7.342437744140625, "pos_frac": 0.671875, "sample": [1.1496963500976562, 1.7923049926757812, 0.6732406616210938, -0.338287353515625, -3.1336517333984375, 3.7331886291503906, 3.7120361328125, 0.6947479248046875, -0.83868408203125, -3.1592864990234375, -1.2822608947753906, -0.13675308227539062, 1.1728286743164062, 3.9616928100585938, 1.8620529174804688, 0.5030670166015625, -0.2176666259765625, 1.0578994750976562, 3.0222091674804688, 2.8658714294433594, -1.2168197631835938, 3.0616607666015625, 4.479530334472656, 0.12664031982421875, -0.7385368347167969, 3.1684722900390625, -1.5835838317871094, 4.037982940673828, 1.9260749816894531, 0.279510498046875, 0.4105854034423828, -2.9045372009277344, 6.068462371826172, -0.00536346435546875, -1.923248291015625, 1.4475421905517578, 2.6111679077148438, 4.253826141357422, 1.8438568115234375, 0.5523567199707031, 3.999828338623047, 5.439453125, 1.7732353210449219, 1.0733070373535156, 3.662933349609375, 1.6509552001953125, 0.6682853698730469, -1.109405517578125, -2.6754112243652344, -0.79364013671875, 5.774646759033203, -2.9752731323242188, -1.4950027465820312, 4.82421875, 0.5404396057128906, -1.94830322265625, 1.9614639282226562, -3.2835769653320312, 2.6914749145507812, 5.218318939208984, 5.013069152832031, 7.342437744140625, 3.2581958770751953, -2.2801647186279297], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000556.npy"}
{"epoch": 0.8405139833711263, "step": 557, "batch_size": 64, "mean": 1.531087875366211, "std": 2.843456268310547, "min": -4.9756622314453125, "p10": -2.0036315917968746, "median": 1.357208251953125, "p90": 4.716104888916016, "max": 9.455978393554688, "pos_frac": 0.71875, "sample": [5.878227233886719, 2.169322967529297, 2.3465919494628906, 0.777984619140625, 0.8864784240722656, -3.1961402893066406, 7.359521865844727, -4.9756622314453125, 1.7027244567871094, 2.6326141357421875, 2.532745361328125, -2.8688278198242188, -2.236207962036133, -0.07086944580078125, 0.2717742919921875, 0.8141880035400391, 2.988658905029297, 0.4718742370605469, 9.455978393554688, 1.3310317993164062, 2.2779693603515625, -1.7328109741210938, 4.507965087890625, -4.281982421875, -1.6493167877197266, -1.1639251708984375, 5.460411071777344, 6.1731719970703125, -0.6017265319824219, -0.4782257080078125, 2.41497802734375, 0.5306625366210938, 1.2844085693359375, -1.4892234802246094, -0.021514892578125, 4.760053634643555, 3.5657882690429688, 2.7814979553222656, 1.0897979736328125, 8.364501953125, -0.4577484130859375, 2.1222877502441406, -1.5957717895507812, 0.5224113464355469, 2.000385284423828, -0.318145751953125, 1.309234619140625, 3.2829437255859375, 3.339620590209961, 4.2836761474609375, 1.1534385681152344, 2.3872337341308594, 1.3833847045898438, 1.136190414428711, 2.289579391479492, 1.6622772216796875, 4.3299102783203125, 1.1676006317138672, 4.523712158203125, 1.55804443359375, -3.24774169921875, -2.1196975708007812, 2.5987472534179688, 4.613557815551758], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000557.npy"}
{"epoch": 0.8420256991685563, "step": 558, "batch_size": 64, "mean": 1.6232068538665771, "std": 2.5240612030029297, "min": -4.644756317138672, "p10": -1.405227279663086, "median": 1.8129730224609375, "p90": 4.408276748657227, "max": 9.044486999511719, "pos_frac": 0.75, "sample": [2.7297439575195312, 4.321830749511719, 3.9757537841796875, -1.8787879943847656, 3.122283935546875, 0.3408832550048828, 4.522918701171875, 1.2938480377197266, 1.6353836059570312, 3.35980224609375, 2.0902481079101562, 1.7276992797851562, -0.3354644775390625, 0.204620361328125, 1.2614898681640625, 3.2712879180908203, 1.6703224182128906, 0.5947265625, 2.3726539611816406, 4.343952178955078, 2.410308837890625, 2.7737274169921875, -1.0732841491699219, 3.8057861328125, 2.876190185546875, -1.2188720703125, 5.1419525146484375, 2.3630142211914062, 1.9020118713378906, 1.148773193359375, 1.8982467651367188, 3.0888442993164062, 1.4502716064453125, 9.044486999511719, -3.018951416015625, -0.44705963134765625, 2.350719451904297, 1.5681304931640625, 0.7092742919921875, 7.317657470703125, 2.4872055053710938, -2.7193832397460938, 4.435844421386719, 1.022613525390625, 5.180759429931641, 4.130851745605469, -1.3965034484863281, 0.2433624267578125, 6.1533660888671875, 2.2500686645507812, 2.258331298828125, -4.644756317138672, 2.13800048828125, -0.7758255004882812, 0.6841888427734375, -1.7093315124511719, 1.1458587646484375, -1.408966064453125, -4.302427291870117, -0.1137237548828125, 2.9038009643554688, -0.2766437530517578, 2.050048828125, -0.56793212890625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000558.npy"}
{"epoch": 0.8435374149659864, "step": 559, "batch_size": 64, "mean": 1.371781349182129, "std": 3.0034127235412598, "min": -5.451904296875, "p10": -2.674163246154785, "median": 1.3712921142578125, "p90": 4.776487350463868, "max": 9.0743408203125, "pos_frac": 0.640625, "sample": [-1.1027069091796875, -1.8695144653320312, 2.4142379760742188, -0.7109832763671875, 1.2987289428710938, 9.0743408203125, 5.38848876953125, 3.06512451171875, 1.4185447692871094, 0.2569694519042969, -3.3419647216796875, 4.8726806640625, -0.735504150390625, 3.8563690185546875, 1.9232635498046875, 3.857706069946289, 2.4125518798828125, -0.30194091796875, 3.1213607788085938, 0.3124542236328125, 4.552036285400391, 2.1699752807617188, -0.0211639404296875, 4.459480285644531, -0.27722930908203125, -2.7678279876708984, -1.197509765625, 3.1939010620117188, -1.3139114379882812, 6.803550720214844, 1.3240394592285156, -1.6941909790039062, 3.1187286376953125, 5.797119140625, 2.336944580078125, -0.014133453369140625, 1.591665267944336, -2.7713775634765625, 0.088897705078125, -0.042694091796875, 0.6728057861328125, -1.2988815307617188, -0.47868919372558594, 3.083454132080078, 3.168670654296875, 4.262866973876953, 4.182912826538086, -4.0411834716796875, 4.363746643066406, -2.4556121826171875, 1.7552871704101562, 7.9309844970703125, -5.451904296875, 0.7603530883789062, 1.5301475524902344, 6.1407012939453125, 0.272003173828125, 3.9312057495117188, 3.086761474609375, 1.0165214538574219, 2.6588134765625, -0.00946044921875, -2.9193572998046875, -4.914646148681641], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000559.npy"}
{"epoch": 0.8450491307634165, "step": 560, "batch_size": 64, "mean": 1.7297306060791016, "std": 2.7205471992492676, "min": -4.113700866699219, "p10": -1.5907302856445311, "median": 1.2778129577636719, "p90": 5.412749099731446, "max": 7.446586608886719, "pos_frac": 0.703125, "sample": [1.6089401245117188, 3.8873291015625, 4.557365417480469, 4.431724548339844, -1.447357177734375, -1.3905601501464844, 5.490970611572266, 2.5850830078125, 6.225616455078125, -0.8602533340454102, 5.689727783203125, -0.61083984375, -0.7325935363769531, 1.2575225830078125, -0.904510498046875, -1.1608123779296875, -1.6521759033203125, -0.014850616455078125, 2.4569358825683594, 2.0150909423828125, -2.2183380126953125, 2.91326904296875, 1.1114959716796875, 1.346395492553711, 5.230232238769531, 3.0630435943603516, 0.9315872192382812, 4.936101913452148, 4.390586853027344, 1.5942020416259766, 0.9245681762695312, -2.6577835083007812, -4.113700866699219, 6.243221282958984, 4.1431884765625, 1.0369644165039062, -0.320159912109375, -0.260650634765625, 7.27105712890625, -2.2391433715820312, 7.446586608886719, 3.210235595703125, 0.904205322265625, 3.8188228607177734, 0.9420928955078125, 1.8585281372070312, -1.2894515991210938, 5.552160263061523, 1.1778984069824219, 0.8748397827148438, 1.1021881103515625, 2.972747802734375, -2.8313522338867188, -0.8675918579101562, 0.591644287109375, 4.568626403808594, -2.2663497924804688, 0.6715888977050781, 3.7819137573242188, 1.4754295349121094, 5.160148620605469, 1.2981033325195312, 0.7561702728271484, 5.035091400146484], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000560.npy"}
{"epoch": 0.8465608465608465, "step": 561, "batch_size": 64, "mean": 1.8150770664215088, "std": 2.967372179031372, "min": -3.9318084716796875, "p10": -0.9077478408813476, "median": 1.0244941711425781, "p90": 6.799957656860354, "max": 9.209304809570312, "pos_frac": 0.71875, "sample": [0.10831451416015625, 0.7595558166503906, 0.4194488525390625, 1.0002555847167969, 0.8961391448974609, 4.895660400390625, 0.8964595794677734, -0.7732315063476562, 7.174800872802734, 1.7122802734375, -0.073455810546875, 4.40869140625, 1.3537254333496094, 3.6668319702148438, 0.5365486145019531, -0.30826377868652344, 1.33221435546875, 4.764713287353516, 8.972114562988281, 0.28076934814453125, 1.1450538635253906, 4.17437744140625, 3.0735740661621094, -0.1016845703125, 3.3861236572265625, -0.39960670471191406, 0.2816619873046875, -0.8166313171386719, -3.4161300659179688, 9.209304809570312, -3.6373748779296875, 1.3263397216796875, 1.4390106201171875, 4.7869720458984375, -0.09204864501953125, 3.2772445678710938, 1.2220840454101562, 1.8678169250488281, 0.028781890869140625, 7.0175933837890625, 8.313766479492188, 7.3645172119140625, 4.212249755859375, 7.423126220703125, 1.1012687683105469, -0.93597412109375, -3.9318084716796875, 3.03228759765625, 4.8333587646484375, 1.0487327575683594, 0.843841552734375, -0.0535888671875, -0.15508270263671875, -2.080432891845703, 0.7588043212890625, 2.9101638793945312, -1.2936325073242188, 0.4200935363769531, 2.2534942626953125, -0.22383880615234375, -0.8418865203857422, 0.11364555358886719, 6.292140960693359, -1.036346435546875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000561.npy"}
{"epoch": 0.8480725623582767, "step": 562, "batch_size": 64, "mean": 1.7036182880401611, "std": 2.4515411853790283, "min": -2.845691680908203, "p10": -1.5679218292236323, "median": 1.5431060791015625, "p90": 4.6767631530761715, "max": 6.14349365234375, "pos_frac": 0.71875, "sample": [-2.845691680908203, 4.1441497802734375, -0.8911781311035156, 2.7244873046875, 0.594512939453125, -0.2880096435546875, 0.32933807373046875, 5.606353759765625, 5.760612487792969, 1.4030990600585938, 3.4960174560546875, 1.8739547729492188, -2.795074462890625, -1.8496322631835938, 6.14349365234375, -0.424957275390625, 0.524383544921875, 1.57489013671875, 4.679786682128906, -0.7355499267578125, 1.941619873046875, 4.627849578857422, 0.0230712890625, 0.8458709716796875, 0.3196563720703125, 3.8474044799804688, 0.45589447021484375, -1.05084228515625, 3.8061981201171875, 2.7667007446289062, 3.2228240966796875, -2.294952392578125, 5.422142028808594, 3.4314117431640625, 1.511322021484375, 4.669708251953125, -0.49854278564453125, -2.2128982543945312, 0.5987968444824219, 5.844810485839844, -0.1844024658203125, 0.6728553771972656, 4.278814315795898, 4.5925750732421875, 3.02490234375, 5.434902191162109, 3.992919921875, 2.29791259765625, 4.402275085449219, 0.7749252319335938, 3.9094467163085938, 1.7672119140625, -0.7856960296630859, 0.3429718017578125, 2.3986587524414062, 2.6311702728271484, 3.8983001708984375, 4.356842041015625, 0.8448944091796875, -1.7660636901855469, -1.1055908203125, -0.668060302734375, -0.5903511047363281, -1.792877197265625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000562.npy"}
{"epoch": 0.8495842781557067, "step": 563, "batch_size": 64, "mean": 1.7775871753692627, "std": 2.878091812133789, "min": -4.559394836425781, "p10": -2.067833518981933, "median": 1.8211002349853516, "p90": 5.552758407592774, "max": 8.12518310546875, "pos_frac": 0.734375, "sample": [0.25022125244140625, 0.6675472259521484, 2.726959228515625, 0.5468721389770508, 5.29876708984375, 5.437965393066406, -2.415864944458008, 7.761627197265625, 3.89508056640625, 2.2836456298828125, 1.2037162780761719, 3.3361968994140625, 0.6488265991210938, -0.5746688842773438, 4.797966003417969, 2.444448471069336, -3.214996337890625, 6.6605987548828125, -4.559394836425781, 8.12518310546875, -0.2758026123046875, 7.7304534912109375, 0.06719398498535156, 1.7729721069335938, 2.9574966430664062, -0.11515045166015625, 3.3115272521972656, 1.8720550537109375, 1.3454551696777344, -2.4363861083984375, -0.2077484130859375, -0.9649925231933594, 0.4375572204589844, -0.6963233947753906, -1.2557601928710938, 6.142627716064453, 1.54681396484375, -4.345611572265625, 1.8692283630371094, 2.8881797790527344, 5.8843231201171875, 4.341796875, 2.2393035888671875, -0.8712158203125, -3.2344131469726562, 0.173736572265625, 2.9371261596679688, 1.0435943603515625, 2.234222412109375, 2.9961891174316406, -2.967041015625, 1.223175048828125, 2.75860595703125, 4.177089691162109, 3.9840927124023438, -0.3604545593261719, 5.601955413818359, 3.0176010131835938, 0.9496612548828125, 5.151397705078125, 1.1163177490234375, 2.181732177734375, 2.975860595703125, -0.7535648345947266], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000563.npy"}
{"epoch": 0.8510959939531368, "step": 564, "batch_size": 64, "mean": 1.5733850002288818, "std": 2.7949023246765137, "min": -5.880897521972656, "p10": -1.8175693511962887, "median": 1.4194717407226562, "p90": 5.473822784423829, "max": 7.3925628662109375, "pos_frac": 0.6875, "sample": [7.3925628662109375, -0.0254364013671875, 6.250579833984375, 0.41013336181640625, 2.9040145874023438, 4.5088043212890625, 4.63720703125, -3.2027206420898438, 0.4479331970214844, -2.2682647705078125, 6.156425476074219, 5.314764022827148, 1.6170425415039062, -2.322744369506836, 1.4368362426757812, 0.19804763793945312, -0.10599040985107422, 2.110757827758789, 3.3258514404296875, 1.4199295043945312, 0.014095306396484375, 1.3994274139404297, 1.894852638244629, 1.33294677734375, 1.1410751342773438, -0.0567474365234375, 5.5416107177734375, 0.6414642333984375, 1.6378593444824219, 2.2730255126953125, 2.9052047729492188, 3.613525390625, 5.315650939941406, -0.12213516235351562, 0.04128265380859375, -3.545196533203125, 1.6599235534667969, 0.5472793579101562, -1.501708984375, -0.01850128173828125, 3.7625179290771484, 4.739166259765625, 1.4190139770507812, -1.9529380798339844, -0.49085044860839844, -0.3890838623046875, 4.703125, -0.25836181640625, -0.7388839721679688, -5.880897521972656, 4.324260711669922, 2.1340980529785156, 2.380096435546875, -0.7061252593994141, 6.761444091796875, 1.5515670776367188, 2.117767333984375, 7.279335021972656, -3.0880584716796875, 5.625988006591797, -1.1483917236328125, 3.430309295654297, 0.31549835205078125, -0.1146240234375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000564.npy"}
{"epoch": 0.8526077097505669, "step": 565, "batch_size": 64, "mean": 1.5936262607574463, "std": 2.4630653858184814, "min": -4.213134765625, "p10": -1.4714504241943356, "median": 1.4278907775878906, "p90": 4.648657989501953, "max": 7.771156311035156, "pos_frac": 0.765625, "sample": [0.2827320098876953, -2.1354751586914062, -4.213134765625, 4.63665771484375, 0.2253265380859375, 0.312835693359375, 2.15771484375, 5.3256378173828125, -0.3395881652832031, 2.1953659057617188, 1.6316146850585938, 1.6365814208984375, 5.48065185546875, -3.4817562103271484, 1.9754867553710938, 1.3637237548828125, 1.014383316040039, 1.4920578002929688, 4.0194854736328125, 5.739040374755859, 3.1312637329101562, -1.588623046875, -1.5917015075683594, -1.1671066284179688, 3.0917625427246094, 1.3066024780273438, 0.8852386474609375, 5.78253173828125, 3.7511444091796875, 0.510986328125, 2.3328399658203125, 1.0202217102050781, -0.985595703125, 0.9536590576171875, 3.9698486328125, 0.22936248779296875, 4.048484802246094, 3.4444656372070312, -1.1980476379394531, 4.653800964355469, 0.350189208984375, 0.9556922912597656, -1.9850311279296875, 3.4543991088867188, -1.04925537109375, 3.415325164794922, -0.4106407165527344, 3.4729080200195312, 6.060020446777344, 0.8449172973632812, 2.6432342529296875, 0.2041473388671875, -1.76788330078125, 0.10718917846679688, 2.2827911376953125, 0.2828521728515625, 1.6083831787109375, 3.1894607543945312, -1.1862335205078125, -0.7812042236328125, 7.771156311035156, 4.461292266845703, 1.9124603271484375, 4.255424499511719], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000565.npy"}
{"epoch": 0.854119425547997, "step": 566, "batch_size": 64, "mean": 1.3781930208206177, "std": 3.004122734069824, "min": -5.4230804443359375, "p10": -2.070448303222656, "median": 1.1163787841796875, "p90": 4.448843002319336, "max": 11.318496704101562, "pos_frac": 0.703125, "sample": [0.6930084228515625, -0.50885009765625, 0.020153045654296875, -1.8786468505859375, 2.3436126708984375, 4.354328155517578, 4.3015594482421875, 3.2677154541015625, 2.659454345703125, 5.202808380126953, 2.155935287475586, 4.489349365234375, 1.8032150268554688, -0.727020263671875, 4.123725891113281, 5.564727783203125, -5.4230804443359375, 1.9663925170898438, 1.04217529296875, 0.462890625, 3.2243003845214844, 0.6728744506835938, -1.4066925048828125, 1.9688568115234375, 0.6999740600585938, 0.2671051025390625, 4.313636779785156, 2.4139976501464844, 1.9245376586914062, -3.0477828979492188, 3.8206748962402344, 3.305328369140625, 6.463874816894531, -2.4563446044921875, -1.2501220703125, -4.268120765686035, 0.20882415771484375, 0.3682422637939453, 4.278228759765625, -0.219146728515625, 1.2512283325195312, -1.6765365600585938, 11.318496704101562, -4.378040313720703, 5.108373641967773, 1.6222496032714844, -0.08819580078125, -3.0929794311523438, 1.1483612060546875, -2.15264892578125, -0.10549736022949219, 0.8814315795898438, 0.864715576171875, 2.683802604675293, -0.6455612182617188, 1.1333847045898438, 1.2207794189453125, 2.6162490844726562, 1.0993728637695312, -1.6443138122558594, 4.0018310546875, 0.6974945068359375, -0.5620269775390625, 9.706680297851562], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000566.npy"}
{"epoch": 0.8556311413454271, "step": 567, "batch_size": 64, "mean": 1.6774554252624512, "std": 2.2325408458709717, "min": -3.3442916870117188, "p10": -1.0853115081787108, "median": 1.9429407119750977, "p90": 4.263851928710939, "max": 7.407833099365234, "pos_frac": 0.765625, "sample": [1.2976837158203125, 0.8733139038085938, -1.8253707885742188, 3.2103195190429688, 1.9174957275390625, 2.0897903442382812, 5.023139953613281, 0.8985748291015625, 7.407833099365234, 1.208587646484375, 2.90216064453125, 2.7113571166992188, 2.0178680419921875, -3.3442916870117188, 1.9838333129882812, 0.8893280029296875, -2.0537872314453125, 1.3269195556640625, -0.6032638549804688, 2.091205596923828, 0.7031097412109375, -1.9675102233886719, -1.0587234497070312, 3.5167388916015625, -1.139007568359375, 1.3071174621582031, -1.1100234985351562, -1.0967063903808594, 0.5272293090820312, -0.225494384765625, 2.697662353515625, 4.4120025634765625, 5.204689025878906, 5.84027099609375, 2.5992088317871094, 2.3073081970214844, 1.9683856964111328, 0.0410308837890625, 2.8041534423828125, 6.48980712890625, -0.6859664916992188, 0.3479461669921875, 3.6437149047851562, 3.9181671142578125, 3.618976593017578, 3.7863616943359375, -0.9502716064453125, 1.1425132751464844, 2.120880126953125, 2.61151123046875, 2.963563919067383, 1.6608924865722656, 3.129119873046875, 3.3010711669921875, -0.8892364501953125, 0.05300140380859375, -0.9543266296386719, 3.0154380798339844, 6.116035461425781, -0.967559814453125, 3.2396583557128906, 0.86944580078125, 2.4081649780273438, 0.014095306396484375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000567.npy"}
{"epoch": 0.8571428571428571, "step": 568, "batch_size": 64, "mean": 2.045576572418213, "std": 2.3629496097564697, "min": -3.0106163024902344, "p10": -0.3447650909423827, "median": 1.6941909790039062, "p90": 5.436822509765626, "max": 7.84454345703125, "pos_frac": 0.828125, "sample": [2.4268112182617188, 1.4178657531738281, 2.4276123046875, 2.595081329345703, 0.2164459228515625, 1.5977783203125, 1.2711639404296875, -3.0106163024902344, 1.6746139526367188, -0.3736419677734375, -1.2416572570800781, 5.940961837768555, 2.375225067138672, 3.769775390625, 0.5405197143554688, -0.7872467041015625, 0.33599853515625, 0.33989715576171875, 0.6523208618164062, -0.2773857116699219, 0.1910858154296875, 0.247100830078125, 1.062490463256836, 1.2861099243164062, 3.6827392578125, 6.925960540771484, 3.7051095962524414, 7.84454345703125, 1.411346435546875, 3.317138671875, 3.7824554443359375, 1.7312145233154297, 1.6901626586914062, -0.1334381103515625, 1.0700244903564453, 3.950054168701172, -2.0189857482910156, 2.2077484130859375, -0.17883872985839844, 2.093912124633789, 1.9359283447265625, 5.9409332275390625, 0.5596237182617188, 2.9906234741210938, 3.965686798095703, 1.6775360107421875, 4.42364501953125, 1.2665252685546875, 1.77783203125, 0.02496337890625, 5.1630859375, 3.9922637939453125, 1.6982192993164062, 1.7523784637451172, 7.6930084228515625, 0.42998504638671875, -0.13785171508789062, 3.47381591796875, 5.55413818359375, 3.7178802490234375, -2.69097900390625, -0.9554328918457031, 4.1009979248046875, 6.802642822265625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000568.npy"}
{"epoch": 0.8586545729402872, "step": 569, "batch_size": 64, "mean": 1.5074411630630493, "std": 2.4778125286102295, "min": -5.743621826171875, "p10": -1.7089021682739258, "median": 1.5520095825195312, "p90": 4.7748970031738285, "max": 6.826904296875, "pos_frac": 0.75, "sample": [2.6865234375, 0.6192626953125, 0.9473171234130859, 4.8608245849609375, 4.685661315917969, 4.493446350097656, -2.904804229736328, 6.826904296875, 3.0512313842773438, 5.744132995605469, 5.100044250488281, 5.049705505371094, 0.61114501953125, 2.2302284240722656, -2.8679122924804688, 0.9717941284179688, 1.6303634643554688, 4.279205322265625, -0.11651611328125, 2.202434539794922, 4.813140869140625, 3.676544189453125, 0.8854751586914062, 4.552284240722656, 0.5377731323242188, 0.6122398376464844, 2.012969970703125, 0.9344749450683594, -0.0991363525390625, -2.049409866333008, -3.0313491821289062, 0.06473350524902344, -1.7256603240966797, 2.3160247802734375, -0.3944244384765625, 1.77667236328125, 5.264472961425781, 2.8911209106445312, 1.7736663818359375, 0.7725391387939453, 2.082355499267578, -1.1341533660888672, -1.6697998046875, -2.4537124633789062, 0.6305274963378906, -5.743621826171875, 1.6068115234375, 2.092681884765625, 1.8546981811523438, 0.278533935546875, 1.4729232788085938, 1.4972076416015625, 4.112895965576172, -0.4552574157714844, -0.7280693054199219, 1.2346839904785156, 4.0855255126953125, 1.997222900390625, 4.430000305175781, -0.7017593383789062, 0.09085845947265625, 2.7702064514160156, 3.949993133544922, -0.5096588134765625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000569.npy"}
{"epoch": 0.8601662887377173, "step": 570, "batch_size": 64, "mean": 2.1350512504577637, "std": 2.48482346534729, "min": -2.9536285400390625, "p10": -1.143752288818359, "median": 2.110321521759033, "p90": 5.243082809448243, "max": 8.53656005859375, "pos_frac": 0.78125, "sample": [4.214015960693359, -1.2937088012695312, 0.4021282196044922, -0.44092559814453125, 1.9948539733886719, 3.39996337890625, -1.5366897583007812, -0.0888519287109375, 0.7452239990234375, -1.8320770263671875, 0.25321197509765625, 1.4126949310302734, 1.0942134857177734, 2.7318382263183594, 2.2223434448242188, 3.2128524780273438, 0.09092330932617188, -2.9536285400390625, 0.1544952392578125, 4.6135101318359375, 4.35003662109375, -0.04636383056640625, 0.40714263916015625, 3.3151206970214844, 6.671287536621094, -0.4190502166748047, 5.561790466308594, 3.5981521606445312, 4.867523193359375, -0.389068603515625, 4.083026885986328, 5.456764221191406, 2.686237335205078, -0.793853759765625, -0.5302658081054688, 2.257843017578125, 3.8138198852539062, 2.514101028442383, 5.358623504638672, 1.9982995986938477, 4.1004486083984375, 4.5444488525390625, 4.778594970703125, -1.8976554870605469, 0.35346221923828125, 0.8776931762695312, 6.10101318359375, 3.0654983520507812, 2.6113433837890625, -2.09344482421875, 4.343681335449219, -1.60943603515625, 1.3565444946289062, 1.629425048828125, 6.7789154052734375, 1.2232437133789062, 3.1271190643310547, 1.0770492553710938, 3.3053054809570312, 3.728015899658203, 8.53656005859375, 0.6431045532226562, 4.973487854003906, 1.931304931640625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000570.npy"}
{"epoch": 0.8616780045351474, "step": 571, "batch_size": 64, "mean": 1.2679531574249268, "std": 2.5025744438171387, "min": -4.293556213378906, "p10": -2.1688360214233398, "median": 1.5695915222167969, "p90": 3.761361312866212, "max": 7.843742370605469, "pos_frac": 0.703125, "sample": [1.1672210693359375, 1.489349365234375, 2.4079437255859375, 1.1905994415283203, -1.1116943359375, 7.843742370605469, 3.1878204345703125, 2.574596405029297, 2.5197410583496094, 0.98809814453125, 0.8471317291259766, 2.3049163818359375, 0.3784217834472656, 1.6821422576904297, -2.336994171142578, -0.568511962890625, 2.67547607421875, 1.7009220123291016, 1.2413253784179688, 2.1602706909179688, 7.181234359741211, -0.452423095703125, 2.89202880859375, 1.2688751220703125, 2.1942367553710938, 1.881378173828125, 3.0523414611816406, 3.3295211791992188, -4.293556213378906, 1.516998291015625, 0.4827594757080078, -1.9604034423828125, 2.748584747314453, -0.6995697021484375, 2.327362060546875, 1.8637847900390625, -2.8645477294921875, -1.0883350372314453, -1.1671295166015625, -2.0736846923828125, -0.7133522033691406, 0.3426055908203125, -0.7041206359863281, 0.3746910095214844, 3.4909706115722656, 3.0877151489257812, 3.9495162963867188, -2.706319808959961, 1.6221847534179688, 0.3785552978515625, -2.1541748046875, -2.825775146484375, 2.5356903076171875, 5.5692291259765625, 3.185333251953125, -0.791351318359375, 3.8772430419921875, 3.4896202087402344, 1.9344940185546875, 4.493743896484375, -2.595949172973633, 1.7587623596191406, -2.175119400024414, 7.242835998535156], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000571.npy"}
{"epoch": 0.8631897203325775, "step": 572, "batch_size": 64, "mean": 1.3459784984588623, "std": 2.3199691772460938, "min": -3.125152587890625, "p10": -1.4482875823974608, "median": 1.3040714263916016, "p90": 4.229905700683594, "max": 8.366371154785156, "pos_frac": 0.703125, "sample": [1.5542526245117188, -2.5608444213867188, 1.5564079284667969, 3.3860015869140625, 2.2496299743652344, 1.46282958984375, 4.201942443847656, 3.691864013671875, -1.392867088317871, 8.366371154785156, 0.789886474609375, 1.3292732238769531, 0.971405029296875, 4.69757080078125, -0.183135986328125, 0.4282722473144531, -1.4231948852539062, 0.9102916717529297, -1.4590415954589844, 3.4294509887695312, 0.9622020721435547, -1.709075927734375, 4.0600738525390625, 5.237236022949219, 1.4845504760742188, 1.7610015869140625, 1.5518245697021484, 3.3753433227539062, 1.27886962890625, -0.0124664306640625, -0.8554229736328125, 5.8613739013671875, 7.582374572753906, -3.125152587890625, 2.1355438232421875, 0.578765869140625, -0.30678558349609375, -0.1749267578125, 2.263225555419922, 0.37763023376464844, 0.191741943359375, 2.337963104248047, 0.228424072265625, 2.61618709564209, 2.361042022705078, 3.4949874877929688, -2.1936683654785156, -1.929412841796875, 4.588050842285156, 1.6799697875976562, 0.0625457763671875, 4.241889953613281, -0.192596435546875, 2.8292388916015625, -1.5986785888671875, -0.89990234375, 0.6641731262207031, 0.32105255126953125, -0.738128662109375, 1.4649429321289062, 1.4417266845703125, -0.40280914306640625, -1.0199737548828125, 2.2613048553466797], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000572.npy"}
{"epoch": 0.8647014361300076, "step": 573, "batch_size": 64, "mean": 1.8728383779525757, "std": 3.2444591522216797, "min": -4.857078552246094, "p10": -1.4911907196044922, "median": 1.289189338684082, "p90": 6.635499572753908, "max": 10.2236328125, "pos_frac": 0.734375, "sample": [3.0103225708007812, 2.1569480895996094, -3.713787078857422, 3.0555877685546875, -1.4301185607910156, 1.4607391357421875, -3.248594284057617, 4.5886077880859375, 1.78460693359375, -2.5435333251953125, -4.376653671264648, -2.1642227172851562, 1.4470405578613281, 0.09814453125, 2.5651397705078125, 1.94122314453125, -0.866180419921875, 2.4958953857421875, 4.785621643066406, -4.857078552246094, 8.952392578125, 3.2061119079589844, 3.9290313720703125, 1.2423439025878906, 3.8745040893554688, 0.2545661926269531, 8.690582275390625, 1.1156463623046875, 5.763591766357422, -0.23288726806640625, 3.739856719970703, -0.616546630859375, 1.269439697265625, 0.39862060546875, 8.206634521484375, 0.8418502807617188, 1.2362804412841797, 1.1788558959960938, 1.3068618774414062, 10.2236328125, 3.3351898193359375, 3.9563522338867188, 1.2715167999267578, -0.41463661193847656, 1.0548629760742188, 0.9132843017578125, 8.480194091796875, -1.070556640625, 1.5637016296386719, 3.9285507202148438, 6.3621368408203125, 4.839263916015625, -0.41729736328125, 0.691864013671875, -0.46662139892578125, -1.517364501953125, 7.55560302734375, -1.0351104736328125, 1.8873748779296875, 0.6737937927246094, 1.3439483642578125, 0.6967926025390625, -1.294921875, 6.752655029296875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000573.npy"}
{"epoch": 0.8662131519274376, "step": 574, "batch_size": 64, "mean": 2.0703463554382324, "std": 2.533546209335327, "min": -2.7295913696289062, "p10": -1.1870471954345703, "median": 1.8808441162109375, "p90": 5.925254058837891, "max": 7.864990234375, "pos_frac": 0.796875, "sample": [0.7672462463378906, 0.17768096923828125, -1.1429862976074219, -1.352630615234375, -1.9532318115234375, 4.512611389160156, 0.9767227172851562, -1.896575927734375, -1.1961784362792969, -2.0738296508789062, 2.313854217529297, 1.655059814453125, 0.0099334716796875, -1.4918861389160156, -0.6354866027832031, 1.9967002868652344, -0.57098388671875, 3.8635406494140625, 2.0227584838867188, 2.5675506591796875, 2.8757095336914062, -0.9321632385253906, -1.165740966796875, 0.7162628173828125, 1.0931930541992188, 7.864990234375, 4.0713348388671875, 7.382659912109375, 0.6119060516357422, 3.9264068603515625, 1.9176788330078125, 1.0150127410888672, 4.577735900878906, 3.34454345703125, 5.962646484375, 4.544395446777344, -2.7295913696289062, 1.4491958618164062, 6.3033599853515625, 0.8698453903198242, 4.192695617675781, 0.971588134765625, 6.421627044677734, 1.8440093994140625, 2.918039321899414, 6.2354583740234375, 2.1447372436523438, 1.8044891357421875, 2.4410171508789062, 3.323322296142578, -0.44831085205078125, 5.838005065917969, 3.3739242553710938, 3.307544708251953, 4.539306640625, 0.8716888427734375, 0.26639842987060547, 1.2596092224121094, 3.5438919067382812, 6.378990173339844, 0.1720733642578125, 2.5399322509765625, 5.0977783203125, 1.2151031494140625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000574.npy"}
{"epoch": 0.8677248677248677, "step": 575, "batch_size": 64, "mean": 1.2661919593811035, "std": 2.88936710357666, "min": -5.110553741455078, "p10": -1.8868179321289062, "median": 0.9967336654663086, "p90": 5.035051727294922, "max": 10.240142822265625, "pos_frac": 0.640625, "sample": [2.3264007568359375, -2.384265899658203, -4.848354339599609, 0.9177398681640625, 3.885498046875, -1.08734130859375, -0.17731475830078125, 0.3639678955078125, -0.3609466552734375, -2.85455322265625, 1.0764617919921875, -0.8857879638671875, 6.6189117431640625, 0.21268463134765625, -2.4993515014648438, 3.680877685546875, -1.5358772277832031, 1.7449111938476562, 1.0550975799560547, 1.2338638305664062, 1.106475830078125, 4.30609130859375, 1.5500984191894531, 3.4075355529785156, 0.8705596923828125, -0.4590911865234375, -2.33734130859375, 1.1025924682617188, 5.08892822265625, -1.51416015625, 0.2092132568359375, -0.2625274658203125, 4.909339904785156, 7.920806884765625, 4.200243949890137, -1.2978801727294922, 6.6931304931640625, -0.11504745483398438, 6.29583740234375, 4.327060699462891, -0.02587890625, 0.1512298583984375, 0.35866546630859375, 1.2411041259765625, 2.199462890625, 1.9903411865234375, -0.29648590087890625, -5.110553741455078, 1.2461395263671875, 10.240142822265625, 2.8353424072265625, -0.1177825927734375, 1.635009765625, 2.1226272583007812, -1.8692474365234375, 0.9383697509765625, 1.324350357055664, -1.89434814453125, 0.12143707275390625, 1.905303955078125, -0.46242523193359375, 5.854835510253906, 4.223167419433594, -0.0590057373046875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000575.npy"}
{"epoch": 0.8692365835222978, "step": 576, "batch_size": 64, "mean": 2.089946746826172, "std": 2.763490676879883, "min": -3.6014862060546875, "p10": -1.413132858276367, "median": 2.2965011596679688, "p90": 5.4028263092041025, "max": 9.67718505859375, "pos_frac": 0.734375, "sample": [4.6053466796875, 5.030149459838867, -0.4643211364746094, 3.1055259704589844, 9.67718505859375, 3.0372848510742188, 6.240467071533203, 1.5169544219970703, 3.0514678955078125, 0.1103057861328125, -0.4887409210205078, -1.8362817764282227, 3.180713653564453, -1.6598587036132812, 1.3012237548828125, -0.15477752685546875, -3.3949203491210938, 2.65132999420166, 3.9450531005859375, 7.8056640625, 2.413288116455078, 5.237361907958984, 2.1377029418945312, 0.18239784240722656, 5.4737396240234375, 5.0791473388671875, 3.9280052185058594, -0.6455459594726562, 3.9398117065429688, -0.5382862091064453, 2.8159637451171875, -0.5570831298828125, 0.960723876953125, 4.373012542724609, 5.6302490234375, 0.2315826416015625, 3.0965042114257812, 0.97088623046875, 2.2893218994140625, 1.5181159973144531, 0.8162612915039062, -3.6014862060546875, 4.7003173828125, -1.5580940246582031, 2.110260009765625, 0.8564376831054688, 5.08355712890625, 6.052452087402344, -1.07489013671875, 4.412433624267578, 2.3285903930664062, -2.8909759521484375, 6.4443359375, -0.564544677734375, 2.303680419921875, 3.188446044921875, -0.15767669677734375, -2.1670398712158203, 3.7346878051757812, 3.6064014434814453, 0.47844696044921875, -0.6061973571777344, 1.2144851684570312, 3.2500343322753906], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000576.npy"}
{"epoch": 0.8707482993197279, "step": 577, "batch_size": 64, "mean": 1.2452495098114014, "std": 2.5637362003326416, "min": -4.4104461669921875, "p10": -1.8344657897949217, "median": 1.0803680419921875, "p90": 5.031393051147462, "max": 7.915689468383789, "pos_frac": 0.65625, "sample": [5.088762283325195, 0.3081531524658203, -1.9686737060546875, 1.6029987335205078, -0.4911994934082031, 1.2352142333984375, -0.52581787109375, -0.0008411407470703125, 2.4953689575195312, 1.6262054443359375, -2.0774383544921875, 4.897531509399414, 2.3489036560058594, 0.14794158935546875, -0.7302017211914062, 6.6222991943359375, 6.0438232421875, 7.915689468383789, -3.844419479370117, 2.5013046264648438, 2.304737091064453, 2.598602294921875, -0.19495582580566406, 4.166755676269531, 0.2537345886230469, -1.1683883666992188, -1.0925750732421875, 1.2760810852050781, 0.7703113555908203, 3.423675537109375, -0.6735877990722656, 1.3436698913574219, -1.9892730712890625, 0.25393009185791016, -0.314056396484375, 1.0619277954101562, 0.4955101013183594, 1.1972198486328125, -0.2538719177246094, -0.0513153076171875, -1.4062156677246094, 0.4365997314453125, 1.0988082885742188, 2.4475059509277344, 6.854669570922852, 3.4946441650390625, -1.9566802978515625, 1.89801025390625, -0.9805679321289062, 5.56024169921875, -1.6458206176757812, 1.9665985107421875, 5.3730621337890625, 4.365695953369141, 0.9380149841308594, 1.2187347412109375, -4.4104461669921875, -1.3176641464233398, 0.1273965835571289, 3.017444610595703, 1.9696884155273438, 4.08836555480957, 1.8694610595703125, -1.915313720703125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000577.npy"}
{"epoch": 0.872260015117158, "step": 578, "batch_size": 64, "mean": 1.2464478015899658, "std": 2.2258782386779785, "min": -3.218658447265625, "p10": -1.125067901611328, "median": 1.0750923156738281, "p90": 4.110296630859375, "max": 8.326065063476562, "pos_frac": 0.6875, "sample": [0.17542648315429688, -0.7280387878417969, -1.1427536010742188, -0.6723403930664062, 2.1580352783203125, -1.8904361724853516, 2.5990447998046875, 2.0037689208984375, 3.0824432373046875, 0.28092193603515625, -0.5648345947265625, 4.143310546875, -0.0575408935546875, 2.5560302734375, 2.761005401611328, -1.08380126953125, 4.312042236328125, 1.8898353576660156, 2.4547386169433594, 0.229400634765625, 3.6857681274414062, 1.0655059814453125, -0.29215049743652344, 2.7422752380371094, 0.1985492706298828, 1.0846786499023438, 3.151031494140625, 0.3347892761230469, -0.77093505859375, -1.4783897399902344, 4.542152404785156, 3.053802490234375, 5.613864898681641, 1.713958740234375, -0.4585418701171875, 1.4055023193359375, 1.5330543518066406, 1.2017364501953125, 1.7409515380859375, 3.5680198669433594, 6.1345977783203125, 0.310821533203125, 2.59661865234375, 0.4548835754394531, -0.843658447265625, -2.3583335876464844, -0.5396270751953125, 4.389743804931641, 2.236064910888672, 4.030948638916016, -0.3634185791015625, -3.218658447265625, 8.326065063476562, 0.3012504577636719, 1.600738525390625, -2.255396842956543, 0.6210098266601562, -0.44364166259765625, 0.03350830078125, -0.6070690155029297, -2.0684890747070312, 1.0963478088378906, 0.1632080078125, 4.03326416015625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000578.npy"}
{"epoch": 0.873771730914588, "step": 579, "batch_size": 64, "mean": 1.7472577095031738, "std": 2.8086814880371094, "min": -6.2556915283203125, "p10": -1.1725463867187496, "median": 1.7610645294189453, "p90": 5.554252624511719, "max": 8.494186401367188, "pos_frac": 0.8125, "sample": [2.013031005859375, 2.0966110229492188, 0.6727867126464844, 2.2465744018554688, 1.7831649780273438, -2.663299560546875, 5.5813751220703125, 1.74761962890625, 0.5933837890625, 7.3013153076171875, 4.507820129394531, 1.7745094299316406, 2.2739639282226562, 8.494186401367188, 0.06782722473144531, 1.2755775451660156, 3.006134033203125, 2.6013641357421875, -2.41839599609375, 5.036376953125, 6.137931823730469, 0.5284423828125, 4.592262268066406, 0.5247230529785156, -0.655731201171875, -0.0958099365234375, 5.75016975402832, 2.4976882934570312, 5.190467834472656, 0.3661460876464844, -0.1726837158203125, 1.2362136840820312, 1.2919998168945312, 2.4176998138427734, 1.2571487426757812, 4.006126403808594, 3.01263427734375, 3.1527252197265625, -1.6144561767578125, 3.125612258911133, 0.6808509826660156, 0.9633941650390625, 0.0051136016845703125, -4.578849792480469, 1.9931640625, 1.6856422424316406, -0.8035125732421875, -6.2556915283203125, 6.411474227905273, 0.7396011352539062, 2.3798065185546875, 1.5058441162109375, -1.3307037353515625, 3.0790023803710938, 1.2438774108886719, -0.4420604705810547, 6.448760986328125, 5.490966796875, 0.4717445373535156, 0.8075752258300781, -5.915733337402344, 2.0829544067382812, 2.7324066162109375, 1.88763427734375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000579.npy"}
{"epoch": 0.8752834467120182, "step": 580, "batch_size": 64, "mean": 1.7644741535186768, "std": 2.590799570083618, "min": -4.484748840332031, "p10": -1.344710922241211, "median": 1.7077999114990234, "p90": 5.366637802124024, "max": 8.39105224609375, "pos_frac": 0.75, "sample": [-1.34466552734375, 0.092987060546875, -1.4518280029296875, 1.2994194030761719, -0.36187744140625, 5.877285003662109, 5.390506744384766, 1.86651611328125, 1.33660888671875, 0.6963653564453125, 3.824981689453125, 5.396900177001953, 4.280029296875, 1.1620712280273438, -1.895050048828125, 0.5311946868896484, 2.5761337280273438, 1.483642578125, -1.3447303771972656, 1.7059288024902344, 1.2850189208984375, 2.785447120666504, 5.678703308105469, 0.00038909912109375, 3.1236648559570312, 0.00605010986328125, 4.1517791748046875, 2.0803756713867188, -0.9369773864746094, 5.8826141357421875, 3.6940155029296875, 2.206775665283203, 1.1017379760742188, -0.5375156402587891, -0.6268386840820312, 3.4712295532226562, -3.9376220703125, -3.2064132690429688, 1.7096710205078125, 8.39105224609375, -4.484748840332031, 3.284912109375, -0.2847442626953125, 2.74737548828125, 5.310943603515625, 0.7513313293457031, -0.4483489990234375, 1.7138309478759766, 4.212127685546875, 0.43991851806640625, 6.01373291015625, 2.302417755126953, 3.9100494384765625, -2.1777381896972656, -0.372100830078125, -0.13599014282226562, 4.2543792724609375, 2.1470260620117188, 2.63458251953125, 5.305908203125, 2.2125396728515625, 1.616546630859375, 0.35915374755859375, 4.167665481567383], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000580.npy"}
{"epoch": 0.8767951625094482, "step": 581, "batch_size": 64, "mean": 1.6590216159820557, "std": 2.594231367111206, "min": -3.7781143188476562, "p10": -1.6451393127441407, "median": 1.6134109497070312, "p90": 4.992564392089845, "max": 8.334014892578125, "pos_frac": 0.765625, "sample": [3.288402557373047, 1.8478317260742188, 1.8172607421875, 1.8363189697265625, -1.1374359130859375, 1.606658935546875, 1.9891204833984375, -0.13055419921875, 6.22125244140625, 2.4200515747070312, -2.409942626953125, -0.0255584716796875, 0.3474235534667969, 1.4707603454589844, 1.2798995971679688, -0.8763847351074219, 3.8685302734375, 1.4058189392089844, 2.5374698638916016, 1.2069835662841797, 6.505134582519531, 1.6952323913574219, 6.9284210205078125, -2.110137939453125, 4.300510406494141, 4.6891021728515625, 5.12261962890625, 0.40316009521484375, 3.9728240966796875, 0.16856002807617188, -2.0573883056640625, 0.8561172485351562, 0.5872535705566406, 3.858318328857422, 0.767364501953125, 4.081455230712891, 0.5605392456054688, -3.6851730346679688, 8.334014892578125, 4.685020446777344, 6.064720153808594, -1.6310882568359375, 1.7577362060546875, -0.24676513671875, 1.6798286437988281, -2.3841934204101562, 4.4888916015625, 0.6029205322265625, -1.5444259643554688, -1.6511611938476562, 3.0928573608398438, 3.5795974731445312, 1.6201629638671875, 3.493499755859375, 0.27996253967285156, 2.773162841796875, 0.14568328857421875, -3.7781143188476562, 0.37322235107421875, 5.196529388427734, 1.802947998046875, 2.7754898071289062, 0.39926910400390625, -0.9402084350585938], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000581.npy"}
{"epoch": 0.8783068783068783, "step": 582, "batch_size": 64, "mean": 1.5874507427215576, "std": 2.74868106842041, "min": -4.70477294921875, "p10": -1.948934173583984, "median": 1.3700752258300781, "p90": 4.76950912475586, "max": 9.760078430175781, "pos_frac": 0.734375, "sample": [2.2464447021484375, 9.760078430175781, 1.1129417419433594, -0.25672149658203125, 0.9195327758789062, -1.655517578125, 4.607513427734375, 0.8873710632324219, 2.856109619140625, 5.3816986083984375, 1.3543319702148438, 3.8094863891601562, 3.1195526123046875, 7.9592742919921875, 0.1751861572265625, 3.3572254180908203, 5.5068817138671875, -1.0282249450683594, 1.2169265747070312, 5.101898193359375, 0.6477479934692383, 1.3858184814453125, 1.7826156616210938, 3.88189697265625, 2.8986892700195312, 2.8343276977539062, 0.26988983154296875, -0.4701862335205078, -2.0746841430664062, -2.584463119506836, 0.14570236206054688, 3.305938720703125, 4.091011047363281, -1.5955886840820312, -0.31002044677734375, 1.8515338897705078, 7.1906585693359375, 2.9046859741210938, 4.838935852050781, -0.07525253295898438, -2.662689208984375, 4.373054504394531, -0.5866317749023438, 2.4163780212402344, -2.340496063232422, 1.5578079223632812, 0.7818145751953125, 2.8705978393554688, 1.0894851684570312, -3.9238357543945312, 0.1719512939453125, 2.1864776611328125, -1.071176528930664, 0.019308090209960938, 0.5572242736816406, 0.42919921875, 3.1636199951171875, -4.70477294921875, 3.2307186126708984, -2.6655960083007812, 2.37200927734375, -0.4274749755859375, 3.4696121215820312, 3.9390201568603516], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000582.npy"}
{"epoch": 0.8798185941043084, "step": 583, "batch_size": 64, "mean": 1.2722513675689697, "std": 2.8422062397003174, "min": -4.656606674194336, "p10": -1.9928203582763673, "median": 1.117177963256836, "p90": 5.238593292236328, "max": 8.767280578613281, "pos_frac": 0.609375, "sample": [-0.7627105712890625, -0.6008033752441406, -3.1897048950195312, -1.8431358337402344, 8.767280578613281, 2.438770294189453, -2.0004959106445312, -0.9886503219604492, -0.7568168640136719, -1.3776321411132812, 1.7896881103515625, -3.525665283203125, 1.2335548400878906, 2.9403533935546875, 1.8899612426757812, -0.45368194580078125, 5.233856201171875, -1.3607749938964844, 7.519386291503906, 4.991739273071289, 3.6962661743164062, -2.8611373901367188, 3.3129425048828125, -1.5922889709472656, 3.0115814208984375, 1.4260978698730469, 2.490875244140625, -2.083892822265625, 1.3194923400878906, 5.240623474121094, 0.18039703369140625, -0.600311279296875, 3.9209136962890625, -0.4855194091796875, 0.305694580078125, 5.463836669921875, -0.6168441772460938, -2.1004638671875, 3.3352203369140625, 6.883831024169922, -0.044921875, 2.66412353515625, -4.656606674194336, 1.989349365234375, -1.9749107360839844, 4.96588134765625, 3.0239620208740234, 2.82281494140625, 0.6468887329101562, -1.3250732421875, -1.770263671875, -0.212371826171875, 0.5562248229980469, 0.561737060546875, 1.7589130401611328, 2.632965087890625, 1.1349830627441406, 2.127288818359375, -0.11087226867675781, 3.1023826599121094, 6.3796844482421875, 5.2877044677734375, 0.572998046875, 1.0993728637695312], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000583.npy"}
{"epoch": 0.8813303099017384, "step": 584, "batch_size": 64, "mean": 1.4972236156463623, "std": 2.881338119506836, "min": -3.1568222045898438, "p10": -2.161876678466797, "median": 1.2810440063476562, "p90": 5.070187377929688, "max": 10.624786376953125, "pos_frac": 0.671875, "sample": [6.025665283203125, 0.7957763671875, 1.2206649780273438, -0.09913063049316406, 1.8204498291015625, -1.091531753540039, 4.38934326171875, 3.6014747619628906, 2.8965530395507812, 3.5709686279296875, 2.236591339111328, -0.48375892639160156, -1.9583206176757812, -2.324674606323242, -2.53936767578125, 2.1515579223632812, 1.7122955322265625, 5.16473388671875, 3.3162841796875, -2.64093017578125, 4.460536956787109, 4.507835388183594, -1.2979106903076172, 5.407257080078125, 3.771392822265625, 0.7472000122070312, 0.4270591735839844, 0.48381805419921875, 1.7684783935546875, 0.30655670166015625, 2.6601953506469727, -1.8540191650390625, 0.1693572998046875, 1.40673828125, -2.249114990234375, 6.927116394042969, -3.11614990234375, 2.1081390380859375, -0.6086997985839844, -0.8240089416503906, 4.849578857421875, 8.284698486328125, -1.1969528198242188, 2.872650146484375, 1.4600334167480469, 1.6109161376953125, -0.18594837188720703, 1.6783943176269531, 0.6197738647460938, -0.9777984619140625, -0.34018707275390625, 0.41529083251953125, -0.22899246215820312, 4.417510986328125, 0.8651657104492188, 10.624786376953125, 6.9853973388671875, 1.1027069091796875, 3.1608505249023438, 1.3414230346679688, -1.2666854858398438, 2.909015655517578, -3.1568222045898438, -2.988920211791992], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000584.npy"}
{"epoch": 0.8828420256991686, "step": 585, "batch_size": 64, "mean": 1.1473884582519531, "std": 2.236647605895996, "min": -5.164527893066406, "p10": -1.0607963562011717, "median": 1.1200408935546875, "p90": 3.883444213867188, "max": 5.613887786865234, "pos_frac": 0.75, "sample": [0.705078125, 3.38995361328125, 1.102020263671875, 0.7090377807617188, 1.300201416015625, 1.8347930908203125, 1.0879745483398438, 4.351837158203125, -1.0365982055664062, 0.36226654052734375, -5.164527893066406, 0.24717330932617188, 3.8475341796875, 0.5477523803710938, 1.4332923889160156, 2.482757568359375, -1.9237136840820312, 2.482177734375, 3.898834228515625, 0.0908203125, 3.5126991271972656, 1.5128135681152344, 0.2301959991455078, -0.6783370971679688, 1.6583824157714844, 0.47592926025390625, -0.6070098876953125, 1.1964035034179688, 1.69268798828125, 1.4884796142578125, -0.7298049926757812, 3.3615646362304688, 3.541637420654297, -0.4990043640136719, -2.6352310180664062, 0.25218963623046875, 1.6316299438476562, 1.6495704650878906, 5.1329498291015625, 2.7861480712890625, -0.48125457763671875, 2.054424285888672, 0.6359710693359375, -3.1439552307128906, -5.154212951660156, 5.613887786865234, 5.2632904052734375, 4.275291442871094, -0.6855545043945312, -0.35543060302734375, -1.3781814575195312, 0.6390914916992188, 3.4509429931640625, 3.0449419021606445, 1.7301387786865234, -1.0711669921875, 0.7446441650390625, 2.7940025329589844, 5.473867416381836, 0.5143661499023438, 0.2005615234375, -0.8995437622070312, 1.1380615234375, 2.306121826171875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000585.npy"}
{"epoch": 0.8843537414965986, "step": 586, "batch_size": 64, "mean": 1.7582968473434448, "std": 2.439262628555298, "min": -3.2257308959960938, "p10": -1.449590492248535, "median": 1.6867790222167969, "p90": 4.805313491821289, "max": 8.9127197265625, "pos_frac": 0.75, "sample": [4.438720703125, -1.6624908447265625, 1.26263427734375, 2.128631591796875, 1.7480621337890625, 4.783836364746094, 1.8326606750488281, -1.6965084075927734, 3.0712203979492188, 1.8154258728027344, 4.519016265869141, -2.8076705932617188, 1.3456611633300781, 3.916839599609375, 5.4688873291015625, -0.00139617919921875, 1.2148456573486328, 0.6151123046875, 2.0255050659179688, 4.096019744873047, -1.0731964111328125, 4.814517974853516, 1.7968025207519531, 1.3313713073730469, 3.022735595703125, -0.3922309875488281, 1.5262222290039062, -0.6444168090820312, 2.367889404296875, 2.369091033935547, 4.532539367675781, 2.8600845336914062, 1.0173110961914062, 4.117340087890625, 2.77740478515625, -0.181060791015625, -2.387554168701172, 8.9127197265625, 1.6254959106445312, 0.3436088562011719, -1.298055648803711, -1.5145339965820312, 2.96240234375, 2.004974365234375, 4.111656188964844, -1.6004104614257812, -0.49466896057128906, 1.2678337097167969, 0.2307300567626953, 1.1644706726074219, 4.9070281982421875, 0.700103759765625, 3.1419754028320312, 0.14534759521484375, 5.037097930908203, -0.5345916748046875, -3.2257308959960938, 4.935962677001953, -1.282562255859375, 0.6141357421875, 7.614356994628906, 1.2634201049804688, 3.4171142578125, 2.1112518310546875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000586.npy"}
{"epoch": 0.8858654572940288, "step": 587, "batch_size": 64, "mean": 1.8054628372192383, "std": 2.644744396209717, "min": -3.75177001953125, "p10": -1.4471115112304687, "median": 1.6921520233154297, "p90": 5.123370361328125, "max": 7.914985656738281, "pos_frac": 0.71875, "sample": [-0.2898597717285156, -1.8141860961914062, -0.046146392822265625, 2.0456161499023438, 4.563423156738281, -1.4481163024902344, 1.5392494201660156, -3.19818115234375, 2.5195770263671875, 4.205047607421875, 2.978574752807617, 4.631168365478516, -2.321744918823242, 1.664306640625, -0.04357147216796875, 0.7496833801269531, 4.790824890136719, 0.7582149505615234, -1.3230743408203125, 6.341484069824219, 2.7590179443359375, 0.9756011962890625, -0.5096931457519531, 4.977485656738281, 3.6681976318359375, 1.9942626953125, 7.914985656738281, 0.3164863586425781, -3.75177001953125, 0.7285194396972656, -0.5231285095214844, 1.7199974060058594, 3.4570159912109375, 4.096244812011719, 0.3184356689453125, 2.963359832763672, 2.6181106567382812, -2.6055755615234375, -3.1703033447265625, 6.225250244140625, 5.943939208984375, -1.2707748413085938, 5.149341583251953, 1.4452362060546875, 0.1592864990234375, 3.8112640380859375, 3.280651092529297, 5.062770843505859, 4.110759735107422, 1.5396499633789062, 5.78594970703125, -0.7970199584960938, 2.4940080642700195, 0.2598419189453125, 1.2870063781738281, 1.2105331420898438, 3.0930137634277344, -1.4447669982910156, -0.3303260803222656, 2.0261077880859375, 2.3200302124023438, 6.1302642822265625, 3.8515090942382812, -0.04344940185546875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000587.npy"}
{"epoch": 0.8873771730914588, "step": 588, "batch_size": 64, "mean": 1.699598789215088, "std": 2.132050037384033, "min": -3.758941650390625, "p10": -0.7470252990722653, "median": 2.0645360946655273, "p90": 3.5503051757812503, "max": 8.236572265625, "pos_frac": 0.78125, "sample": [-0.849639892578125, -1.3173446655273438, 4.4131011962890625, 3.3037643432617188, -0.01804351806640625, 2.1840972900390625, 3.536376953125, 2.7670440673828125, 3.5562744140625, 3.1654052734375, 8.236572265625, 2.5612335205078125, 1.7564926147460938, 2.061857223510742, 1.4991455078125, 2.2192955017089844, 7.4415283203125, 3.8589630126953125, 2.3526611328125, 3.3301239013671875, 1.6215591430664062, 2.0672149658203125, -0.08019828796386719, 2.670696258544922, 2.3136024475097656, 4.5364837646484375, 0.7974510192871094, 0.1471710205078125, 0.2887401580810547, -3.758941650390625, 0.36548519134521484, 1.5761127471923828, -2.874086380004883, 0.9522857666015625, 2.6832313537597656, 4.839256286621094, 0.8428955078125, 2.5877685546875, 2.955791473388672, -0.333526611328125, 0.7390594482421875, -2.9365615844726562, -0.5075912475585938, 2.9957351684570312, 3.025209426879883, 1.7857894897460938, 2.738138198852539, -1.0519981384277344, 2.121612548828125, 3.065235137939453, 2.161834716796875, 1.6110382080078125, 0.09477615356445312, 0.875518798828125, -0.3067626953125, 3.39453125, 1.68634033203125, 3.102855682373047, 1.5890655517578125, -2.1772613525390625, -0.06480789184570312, 2.9449462890625, 2.1173629760742188, -0.487640380859375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000588.npy"}
|
||||
{"epoch": 0.8888888888888888, "step": 589, "batch_size": 64, "mean": 1.799081563949585, "std": 2.542647361755371, "min": -6.112548828125, "p10": -0.9210653305053709, "median": 1.7440643310546875, "p90": 4.630165100097657, "max": 8.528980255126953, "pos_frac": 0.796875, "sample": [1.0869331359863281, 1.1830902099609375, -1.9162445068359375, 4.453498840332031, 3.4420318603515625, 1.2154922485351562, 3.777812957763672, -0.5511474609375, 0.62542724609375, 0.1337909698486328, -0.12941741943359375, -1.0377578735351562, 7.82708740234375, 3.1034278869628906, 2.202484130859375, -0.4464454650878906, 2.3157691955566406, 2.1714515686035156, 4.680519104003906, 4.9133453369140625, 0.9256973266601562, 1.974700927734375, 5.135900497436523, -6.112548828125, 2.8359375, 1.410400390625, 0.7010116577148438, 1.6649284362792969, 3.4609413146972656, 2.6227073669433594, 2.1304168701171875, 1.8232002258300781, 3.775299072265625, 3.8154754638671875, 0.57867431640625, -1.7716140747070312, 2.8311080932617188, 2.609161376953125, 1.6386871337890625, 4.271087646484375, 4.090293884277344, 5.785491943359375, 4.787574768066406, 0.41564178466796875, 3.2027626037597656, -0.08001708984375, 1.3920669555664062, -2.885601043701172, -0.6487827301025391, 1.97100830078125, -4.437744140625, 2.8211135864257812, 8.528980255126953, -2.6547775268554688, 1.5518341064453125, 2.5136966705322266, 4.008522033691406, 0.4273529052734375, 1.6442718505859375, 1.2257003784179688, 1.1014556884765625, -0.16861915588378906, 4.512672424316406, 0.66400146484375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000589.npy"}
|
||||
{"epoch": 0.890400604686319, "step": 590, "batch_size": 64, "mean": 1.7649750709533691, "std": 2.467432975769043, "min": -4.909263610839844, "p10": -1.0051498413085938, "median": 1.4649524688720703, "p90": 4.964459228515626, "max": 8.026275634765625, "pos_frac": 0.75, "sample": [3.094615936279297, 0.8918685913085938, 3.354400634765625, -0.07704925537109375, 1.842254638671875, 7.871589660644531, 0.58209228515625, -4.909263610839844, -1.0719232559204102, -0.4752960205078125, 1.029510498046875, -0.2832298278808594, 1.595062255859375, 0.650482177734375, -3.060821533203125, -0.29157257080078125, 2.602550506591797, 1.86468505859375, 1.1038589477539062, 0.3988800048828125, -1.012908935546875, 2.3384857177734375, -0.20822525024414062, 4.524257659912109, 6.972509384155273, -0.9870452880859375, 1.3348426818847656, 0.9397811889648438, 3.4012298583984375, 2.7628173828125, 0.5492401123046875, 4.9996795654296875, 1.702169418334961, -1.089874267578125, 2.7942657470703125, 5.08172607421875, 2.8554210662841797, 1.2468948364257812, -1.2895050048828125, 1.0484504699707031, 4.305641174316406, 0.9533061981201172, 8.026275634765625, 4.004478454589844, 5.926292419433594, -0.6361503601074219, 1.1073150634765625, 2.524993896484375, 1.9717559814453125, 2.997772216796875, -1.6480484008789062, 4.0724945068359375, 2.9460391998291016, -0.86175537109375, 0.622833251953125, 5.353794097900391, 2.678020477294922, 3.0681304931640625, 4.8822784423828125, -0.7088050842285156, 2.5287094116210938, 0.0157470703125, 4.007102966308594, 0.14327239990234375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000590.npy"}
|
||||
{"epoch": 0.891912320483749, "step": 591, "batch_size": 64, "mean": 1.8817024230957031, "std": 2.451798677444458, "min": -4.309318542480469, "p10": -1.182793426513672, "median": 1.9813270568847656, "p90": 4.800227355957032, "max": 7.7378997802734375, "pos_frac": 0.8125, "sample": [0.13753128051757812, 4.308971405029297, 1.7631607055664062, 4.155708312988281, 3.3957061767578125, 4.4983673095703125, 4.3883819580078125, 2.3641281127929688, 4.007837295532227, 5.348579406738281, 1.994964599609375, 0.0039520263671875, 2.594024658203125, -1.4634780883789062, 0.7267303466796875, -0.5412025451660156, 2.124114990234375, 1.4216995239257812, 0.8520050048828125, 2.389352798461914, 4.64801025390625, 3.406278610229492, 3.6824951171875, -1.142242431640625, 4.258674621582031, -2.8935928344726562, 2.629669189453125, 3.130207061767578, 0.4301414489746094, 0.771575927734375, -3.6966094970703125, -0.9534912109375, 5.699615478515625, 7.7378997802734375, 0.4068717956542969, 5.3818511962890625, -4.309318542480469, 1.4379959106445312, 3.3948287963867188, 0.429290771484375, 1.1089324951171875, -1.2001724243164062, 1.9281768798828125, 3.1759185791015625, 0.5253877639770508, 2.46533203125, -1.036672592163086, -1.4539413452148438, 0.9449615478515625, 2.83172607421875, -1.631561279296875, -0.5732688903808594, 6.613861083984375, 2.2439422607421875, 0.46419525146484375, 5.365453720092773, 4.8654632568359375, 1.7426280975341797, 1.9676895141601562, 2.728740692138672, 4.093086242675781, 0.5850830078125, 0.13119888305664062, 3.6221084594726562], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000591.npy"}
|
||||
{"epoch": 0.8934240362811792, "step": 592, "batch_size": 64, "mean": 1.7437461614608765, "std": 2.696925640106201, "min": -4.480926513671875, "p10": -1.2142820358276365, "median": 1.5696277618408203, "p90": 5.108388519287111, "max": 11.49346923828125, "pos_frac": 0.71875, "sample": [-0.20861053466796875, 5.97601318359375, 3.7800827026367188, 2.4680442810058594, 2.380155563354492, 5.846841812133789, 4.611522674560547, -1.648773193359375, 1.9863739013671875, 0.28989410400390625, -0.7499618530273438, 4.1916046142578125, 1.8007354736328125, 0.33274078369140625, 1.6566009521484375, 4.5157012939453125, 6.445194244384766, 2.5459518432617188, -0.6907768249511719, -1.6801280975341797, -4.480926513671875, 4.420684814453125, 1.6097755432128906, 4.754035949707031, -0.40032196044921875, 3.1316680908203125, 1.6706390380859375, 5.26025390625, 2.2548828125, 4.583831787109375, 0.6114082336425781, 1.8870124816894531, -0.33406829833984375, 0.9128036499023438, 1.0134963989257812, 0.6651840209960938, -2.3169326782226562, 2.4670867919921875, -2.009756088256836, -1.2732105255126953, -0.2153911590576172, 2.8470687866210938, 0.5334129333496094, -1.3439712524414062, 0.7284774780273438, 3.305145263671875, 2.617919921875, 7.695220947265625, 6.26031494140625, 0.5559921264648438, 11.49346923828125, 0.45122337341308594, -0.02425384521484375, 1.6224174499511719, -0.3912620544433594, 1.9851083755493164, 1.52947998046875, 0.07794570922851562, -0.272308349609375, 0.5671463012695312, 1.2520866394042969, -1.0767822265625, -0.5239105224609375, 3.648448944091797], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000592.npy"}
|
||||
{"epoch": 0.8949357520786092, "step": 593, "batch_size": 64, "mean": 1.515468955039978, "std": 2.8329198360443115, "min": -5.126777648925781, "p10": -1.9806480407714844, "median": 1.490224838256836, "p90": 5.437152481079103, "max": 8.309700012207031, "pos_frac": 0.703125, "sample": [0.5503273010253906, 0.5861053466796875, 2.3861846923828125, -0.6778564453125, 4.589042663574219, -2.1555557250976562, 0.1813201904296875, -1.324310302734375, 6.377296447753906, 3.2809715270996094, 3.3872642517089844, 0.9644050598144531, 1.941558837890625, 5.547332763671875, 2.4768447875976562, 0.005859375, 3.949329376220703, 2.837738037109375, 1.432373046875, 2.6836624145507812, 1.0187835693359375, 5.680971145629883, 4.049896240234375, 4.1035614013671875, -0.5321617126464844, 1.0693397521972656, -0.848785400390625, 4.085163116455078, 0.427978515625, -4.125675201416016, 3.864898681640625, -2.3504180908203125, -1.974212646484375, 5.180065155029297, 1.8933944702148438, 2.77105712890625, 1.5480766296386719, -1.7636756896972656, 6.435951232910156, -0.2650184631347656, -2.7367324829101562, -1.8309173583984375, 5.8932647705078125, -1.9834060668945312, 8.309700012207031, 3.1906661987304688, 4.991905212402344, 2.6675548553466797, -1.3849334716796875, 7.00830078125, -2.0807647705078125, -0.637359619140625, 2.450653076171875, 1.7658157348632812, 3.5278167724609375, -5.126777648925781, 0.5628681182861328, 0.14438629150390625, -1.4818410873413086, -1.1151657104492188, 1.62762451171875, 0.7389602661132812, 1.1706466674804688, 2.0286636352539062], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000593.npy"}
|
||||
{"epoch": 0.8964474678760394, "step": 594, "batch_size": 64, "mean": 1.4055747985839844, "std": 2.361210584640503, "min": -3.1829299926757812, "p10": -1.302604866027832, "median": 1.1079883575439453, "p90": 4.404175186157227, "max": 7.5059661865234375, "pos_frac": 0.671875, "sample": [1.4239044189453125, 3.2477264404296875, 1.0945320129394531, 3.4313583374023438, 4.006473541259766, 4.2608642578125, -0.10176467895507812, 1.1214447021484375, 2.4505767822265625, -1.222137451171875, 0.6046600341796875, -2.0616531372070312, -1.3119792938232422, 1.206878662109375, 0.22745132446289062, -1.1687908172607422, -1.280731201171875, -1.3167572021484375, 5.7909698486328125, 2.197917938232422, 3.4144973754882812, 0.494659423828125, 0.5367288589477539, 0.6624374389648438, 7.5059661865234375, 2.7094268798828125, -0.7968673706054688, -1.414520263671875, 2.9636459350585938, -0.8775482177734375, 6.0493621826171875, 2.8286895751953125, 5.4722747802734375, 2.77679443359375, -1.7497100830078125, -0.8698387145996094, 0.5838241577148438, 2.5483665466308594, 3.154735565185547, -0.30587005615234375, 2.5007400512695312, -0.8635215759277344, 3.942596435546875, 4.369880676269531, 0.7094573974609375, 1.1461200714111328, 0.2688446044921875, 3.8021621704101562, -0.21257781982421875, 4.418872833251953, 6.3779144287109375, -3.1829299926757812, 2.3045501708984375, 4.537101745605469, 0.950439453125, 2.268383026123047, -0.9337739944458008, 1.2073745727539062, 1.51177978515625, -0.5871658325195312, 1.0227127075195312, -2.0157546997070312, -0.621063232421875, -1.25335693359375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000594.npy"}
|
||||
{"epoch": 0.8979591836734694, "step": 595, "batch_size": 64, "mean": 1.7909046411514282, "std": 3.0678844451904297, "min": -6.430950164794922, "p10": -2.124821090698242, "median": 1.829580307006836, "p90": 5.179945373535157, "max": 10.108154296875, "pos_frac": 0.78125, "sample": [-2.2604751586914062, 2.0095062255859375, 5.610076904296875, 1.6083145141601562, 2.4342193603515625, 1.1933670043945312, 2.9210052490234375, 2.047151565551758, 0.7164764404296875, 5.319736480712891, 3.5723342895507812, 1.4603443145751953, -0.18439865112304688, -5.176654815673828, 4.9842071533203125, 0.5482597351074219, 0.29221153259277344, -0.7487678527832031, 3.4478492736816406, 0.6673316955566406, 2.9072418212890625, 2.803974151611328, 3.4632644653320312, 2.3563499450683594, 6.03497314453125, 1.1042709350585938, 1.1618576049804688, 1.0927543640136719, 2.21270751953125, 4.409736633300781, 1.2408294677734375, -2.650115966796875, 1.1822166442871094, -3.9709625244140625, 4.767793655395508, 0.3515968322753906, 4.561767578125, -1.07232666015625, -6.430950164794922, 9.309707641601562, 10.108154296875, 6.286251068115234, 4.47265625, 3.9484024047851562, -1.9957313537597656, 5.058372497558594, 3.20361328125, 1.9536018371582031, 0.5635910034179688, 2.7789764404296875, -0.5743265151977539, 2.6298294067382812, 1.3542938232421875, 1.7055587768554688, 1.0799942016601562, 2.182842254638672, 4.1601715087890625, 0.6244335174560547, -1.1337547302246094, -2.180145263671875, -1.1278753280639648, -4.918525695800781, 3.906688690185547, 5.232048034667969], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000595.npy"}
|
||||
{"epoch": 0.8994708994708994, "step": 596, "batch_size": 64, "mean": 0.9653345346450806, "std": 2.546626329421997, "min": -4.340250015258789, "p10": -1.5158447265625, "median": 0.27652645111083984, "p90": 5.490946197509766, "max": 8.3907470703125, "pos_frac": 0.578125, "sample": [-0.11133956909179688, 1.5390167236328125, 6.042732238769531, -1.3820419311523438, 1.6377677917480469, -1.5731887817382812, -0.6578330993652344, 1.0324592590332031, 0.0052490234375, 2.0409278869628906, -0.8537788391113281, 0.80596923828125, -1.01507568359375, -0.5404891967773438, 5.494606018066406, 6.4331512451171875, 1.4608726501464844, -0.7348365783691406, -1.258575439453125, 1.2488822937011719, -0.74005126953125, 0.44170379638671875, -0.45755958557128906, 1.2340717315673828, 7.12701416015625, 0.43829345703125, -1.7162017822265625, -1.3167610168457031, 3.0227317810058594, 3.14447021484375, -0.6429328918457031, -0.6731529235839844, 2.2449951171875, -0.3596038818359375, 2.36212158203125, -0.30406761169433594, -1.86956787109375, 6.0854644775390625, 0.2046661376953125, -1.8165740966796875, -1.0179100036621094, 1.3196487426757812, -0.3821563720703125, -4.340250015258789, -0.6852493286132812, 2.311798095703125, -0.5132102966308594, -2.7107486724853516, 5.4824066162109375, 2.1279067993164062, 2.2244224548339844, 0.3482818603515625, -1.6954193115234375, 2.1942596435546875, 0.2980804443359375, 8.3907470703125, -0.5112838745117188, 2.1287193298339844, 0.2549724578857422, 0.24378204345703125, 2.457366943359375, 6.744049072265625, 0.243255615234375, 0.8444023132324219], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000596.npy"}
|
||||
{"epoch": 0.9009826152683296, "step": 597, "batch_size": 64, "mean": 1.6821010112762451, "std": 2.464052200317383, "min": -4.0950164794921875, "p10": -1.2063636779785154, "median": 1.4577579498291016, "p90": 4.816541290283204, "max": 8.645523071289062, "pos_frac": 0.796875, "sample": [0.6660346984863281, -3.69378662109375, 0.4003009796142578, -0.8077545166015625, 4.8299560546875, 1.5841217041015625, 0.8917837142944336, 0.6676025390625, -1.273162841796875, 6.124153137207031, -2.0969772338867188, 3.165191650390625, 1.2044715881347656, 0.5929183959960938, -0.9053573608398438, 1.7026081085205078, 5.925048828125, 6.577831268310547, 2.4676742553710938, -0.6273097991943359, 2.8416004180908203, 2.97637939453125, -4.0950164794921875, -2.2720184326171875, 6.557426452636719, 3.1436195373535156, 1.0347871780395508, -1.4623851776123047, 3.2635345458984375, 0.25137901306152344, 1.1103267669677734, 3.5163345336914062, 4.785240173339844, 2.3216094970703125, 0.8520050048828125, 2.9217147827148438, 0.2395782470703125, 1.2859649658203125, 2.9422988891601562, 5.437919616699219, 1.9058513641357422, 0.11467742919921875, 0.4603729248046875, 0.7603225708007812, 1.1753501892089844, 0.2764549255371094, -1.4992389678955078, -0.826568603515625, 1.9355850219726562, 1.4726104736328125, 8.645523071289062, 1.5440635681152344, 3.9978103637695312, -1.0094146728515625, -1.0504989624023438, 1.4297256469726562, 2.581829071044922, 2.298919677734375, 3.9417648315429688, 4.4728240966796875, 1.4429054260253906, 3.501729965209961, 2.0268707275390625, 3.0073394775390625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000597.npy"}
|
||||
{"epoch": 0.9024943310657596, "step": 598, "batch_size": 64, "mean": 1.3566750288009644, "std": 2.5691046714782715, "min": -5.13653564453125, "p10": -1.2078247070312498, "median": 1.097900390625, "p90": 4.7198326110839846, "max": 8.225227355957031, "pos_frac": 0.6875, "sample": [0.7556362152099609, -1.326934814453125, -0.2342681884765625, 1.293548583984375, 3.9187469482421875, 0.7715377807617188, 0.6588211059570312, 3.3000946044921875, 2.8024940490722656, -0.35514068603515625, 1.8681449890136719, 1.0825271606445312, 1.0731430053710938, 1.1362571716308594, 6.439910888671875, -2.304473876953125, 7.7573699951171875, -0.707244873046875, 2.6010913848876953, 2.649494171142578, -1.6721954345703125, -5.13653564453125, -0.4761810302734375, 2.596607208251953, 0.9174423217773438, 0.16019439697265625, 2.1418228149414062, 2.8900833129882812, -3.1117324829101562, -0.7198486328125, 0.25562477111816406, -0.8759613037109375, 2.390960693359375, 1.6089706420898438, 4.7116241455078125, 0.17056655883789062, 4.0233917236328125, -0.60247802734375, 4.384613037109375, -0.929901123046875, 4.723350524902344, -0.5177078247070312, 0.42417430877685547, 8.225227355957031, 1.1132736206054688, 3.067960739135742, 0.3330230712890625, 0.8534507751464844, 2.624847412109375, 3.0142974853515625, -0.0970916748046875, -3.4494705200195312, 1.488739013671875, 1.35223388671875, 1.1959037780761719, -2.9405174255371094, 5.340541839599609, 2.773468017578125, -0.9100494384765625, -0.42603302001953125, 3.04132080078125, 4.942390441894531, -0.6529121398925781, 5.398956298828125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000598.npy"}
|
||||
{"epoch": 0.9040060468631897, "step": 599, "batch_size": 64, "mean": 1.5190777778625488, "std": 2.5186750888824463, "min": -3.48590087890625, "p10": -1.6246765136718744, "median": 1.1098413467407227, "p90": 5.158179664611818, "max": 9.384552001953125, "pos_frac": 0.78125, "sample": [-3.38446044921875, 2.581615447998047, 0.04244232177734375, 1.202178955078125, 0.6697235107421875, 1.9014263153076172, -0.35961151123046875, 6.154815673828125, -0.539093017578125, -3.48590087890625, -3.246196746826172, 0.8490486145019531, 1.0735206604003906, 2.5958480834960938, 3.6153106689453125, 0.565032958984375, -0.47186279296875, 0.7417182922363281, 1.6150226593017578, 0.917022705078125, 5.250448226928711, 1.7975349426269531, 0.5550079345703125, -1.888275146484375, 0.2726707458496094, -2.425537109375, 0.92486572265625, 4.9428863525390625, -2.319934844970703, 2.4789810180664062, 5.517852783203125, 2.6250572204589844, 0.4345512390136719, -1.009613037109375, 0.14226531982421875, 2.468395233154297, 0.27489471435546875, 1.9157638549804688, -0.24358749389648438, 0.9390411376953125, 4.523540496826172, 2.159696578979492, 1.1461620330810547, 0.7571182250976562, 3.9922447204589844, 3.621013641357422, 3.2107086181640625, -2.5385360717773438, 0.1005859375, 1.6901969909667969, 9.384552001953125, 2.65966796875, 1.8857955932617188, 1.4765949249267578, 5.329620361328125, 3.281646728515625, 7.233489990234375, 0.8297195434570312, 0.6263809204101562, -0.5394248962402344, -0.5160369873046875, 3.6122970581054688, 5.2845458984375, 2.3185272216796875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000599.npy"}
|
||||
{"epoch": 0.9055177626606198, "step": 600, "batch_size": 64, "mean": 1.2634763717651367, "std": 2.2255618572235107, "min": -4.7443389892578125, "p10": -1.3090900421142577, "median": 1.1127848625183105, "p90": 4.244242095947267, "max": 7.4513397216796875, "pos_frac": 0.734375, "sample": [0.8089599609375, 3.293903350830078, -0.6106033325195312, 1.5012321472167969, 7.4513397216796875, 0.6619873046875, 3.4689788818359375, -0.24280357360839844, 4.769550323486328, 3.9148941040039062, -1.6247520446777344, 2.811370849609375, 2.892364501953125, 2.5014266967773438, -0.27475738525390625, -0.45343780517578125, 1.620025634765625, -1.3604545593261719, 0.279144287109375, 1.85919189453125, 1.3418350219726562, 1.0016860961914062, 3.8818359375, 2.514446258544922, -1.189239501953125, 2.24542236328125, 0.67034912109375, 5.957632064819336, 0.4672203063964844, 4.3853912353515625, 2.2196502685546875, 0.3605823516845703, -1.7969894409179688, -1.3875885009765625, 0.06037139892578125, -3.8191757202148438, 0.563446044921875, -0.9823074340820312, 1.0059881210327148, 4.78924560546875, -1.650146484375, 3.11456298828125, 0.9404067993164062, 0.582977294921875, 0.39055824279785156, 2.2666263580322266, -0.503753662109375, -0.9633388519287109, -4.7443389892578125, 1.8016204833984375, 2.6568031311035156, 1.683929443359375, 0.3531532287597656, 1.8911361694335938, 1.2737350463867188, 1.589324951171875, 1.2195816040039062, -0.8103675842285156, 1.9180145263671875, 5.051013946533203, 0.364349365234375, -1.0899066925048828, 3.16400146484375, 4.805183410644531], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000600.npy"}
|
||||
{"epoch": 0.9070294784580499, "step": 601, "batch_size": 64, "mean": 1.9149435758590698, "std": 2.3409571647644043, "min": -4.255715370178223, "p10": -0.8763603210449218, "median": 2.2741165161132812, "p90": 4.432280731201172, "max": 7.1793670654296875, "pos_frac": 0.75, "sample": [4.438453674316406, 4.3768157958984375, 3.609771728515625, -4.255715370178223, 3.1914596557617188, 5.378578186035156, 7.1793670654296875, -0.66790771484375, -0.15680694580078125, 0.6209640502929688, 6.728828430175781, -2.271575927734375, 2.681365966796875, 2.240631103515625, 0.5546188354492188, 1.6256027221679688, 3.1001815795898438, 4.17828369140625, 0.3258476257324219, 3.935791015625, 4.095344543457031, 1.1736106872558594, -0.41553497314453125, 4.417877197265625, 6.297344207763672, 0.8032341003417969, -0.2011566162109375, 0.3846302032470703, 2.7272720336914062, -0.7515029907226562, 0.343414306640625, 2.6564483642578125, 3.65008544921875, -1.0249176025390625, 1.7954082489013672, 3.9023284912109375, 3.7027206420898438, 3.31005859375, 1.1546096801757812, 2.5266036987304688, -0.92987060546875, 0.578094482421875, 1.0608673095703125, 2.7954177856445312, 2.3076019287109375, 1.9498634338378906, 2.7507247924804688, -0.4761161804199219, -1.086294174194336, 0.9313583374023438, -0.4172096252441406, 5.17303466796875, 0.3764381408691406, -0.30755615234375, 2.437328338623047, 2.7116851806640625, 2.94696044921875, 5.424125671386719, 4.394683837890625, -2.5465431213378906, -0.01996612548828125, 2.6273345947265625, 3.8434295654296875, -1.3314361572265625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000601.npy"}
|
||||
{"epoch": 0.90854119425548, "step": 602, "batch_size": 64, "mean": 1.8609247207641602, "std": 2.3915958404541016, "min": -2.906818389892578, "p10": -1.112567901611328, "median": 1.778106689453125, "p90": 5.0486328125, "max": 7.727531433105469, "pos_frac": 0.75, "sample": [-0.1998138427734375, 3.35052490234375, 3.4396934509277344, 3.148101806640625, 2.2350616455078125, 1.908966064453125, 2.057220458984375, 0.7348232269287109, 3.2949676513671875, 0.6988067626953125, 7.1507110595703125, -1.1881332397460938, 5.853057861328125, 3.9163665771484375, -2.5771751403808594, 6.7988433837890625, -0.936248779296875, 4.967071533203125, 2.6967010498046875, 0.7967529296875, 1.0240020751953125, 0.08189678192138672, 4.457313537597656, -2.906818389892578, 1.3541946411132812, 3.457874298095703, -0.15394210815429688, 4.429779052734375, 0.9584312438964844, 2.7216949462890625, -0.35800933837890625, 0.7886829376220703, 2.6541290283203125, 0.30289649963378906, 2.8813514709472656, 1.6567802429199219, -0.45013427734375, 3.2826080322265625, 2.5210113525390625, -0.34165191650390625, -1.801483154296875, 2.3370513916015625, 1.8187141418457031, 7.727531433105469, -2.34991455078125, -0.725921630859375, 2.4884414672851562, 0.7914237976074219, 1.435403823852539, 1.718048095703125, -0.6653289794921875, 1.6765899658203125, 4.2667236328125, -0.36273956298828125, 2.6482086181640625, 2.825956344604492, 5.963020324707031, 0.7787742614746094, 5.083587646484375, 5.59063720703125, -1.3468246459960938, 2.622516632080078, 1.7374992370605469, -1.6671218872070312], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000602.npy"}
|
||||
{"epoch": 0.91005291005291, "step": 603, "batch_size": 64, "mean": 1.5686898231506348, "std": 2.3919479846954346, "min": -4.215812683105469, "p10": -1.6492481231689449, "median": 1.4474811553955078, "p90": 4.213203430175781, "max": 8.19659423828125, "pos_frac": 0.796875, "sample": [-1.1726455688476562, 4.892127990722656, 3.21917724609375, 0.7746353149414062, -0.7738685607910156, 0.3056640625, 1.7473640441894531, 3.0421371459960938, 0.15331649780273438, 0.43231201171875, 2.8743133544921875, 3.003265380859375, 7.106025695800781, 0.09184455871582031, -2.0139999389648438, 2.6933746337890625, 0.3179969787597656, 1.1048660278320312, 0.7341766357421875, 8.19659423828125, 2.6780052185058594, 1.285247802734375, 2.3293209075927734, 1.1079444885253906, -1.7886314392089844, 1.1788253784179688, 1.6825942993164062, -0.2163543701171875, 2.5998382568359375, -2.9844436645507812, 1.9565811157226562, 0.22528076171875, 1.791769027709961, 0.5374183654785156, -1.9585762023925781, 3.7861480712890625, 1.6097145080566406, 0.7849502563476562, -0.2161693572998047, 7.593343734741211, -1.9965715408325195, 1.8735733032226562, 3.1768569946289062, 2.317760467529297, 4.24188232421875, 3.5638351440429688, 0.8575820922851562, 0.3715782165527344, 1.7394561767578125, 0.6390266418457031, -1.8747329711914062, -1.0038070678710938, 4.1462860107421875, 4.010894775390625, 2.3001861572265625, 0.5436820983886719, 1.1841011047363281, -1.3240203857421875, 2.8232498168945312, 4.073822021484375, 5.159244537353516, 2.4283218383789062, 4.6482696533203125, -4.215812683105469], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000603.npy"}
|
||||
{"epoch": 0.9115646258503401, "step": 604, "batch_size": 64, "mean": 1.4987506866455078, "std": 2.71269154548645, "min": -5.171657562255859, "p10": -1.6154190063476561, "median": 1.2319622039794922, "p90": 4.574891662597657, "max": 9.199687957763672, "pos_frac": 0.765625, "sample": [3.33782958984375, 3.1915435791015625, -0.7559242248535156, 0.31817626953125, 1.9305267333984375, -0.22194862365722656, -0.00583648681640625, 3.1042251586914062, 5.234447479248047, -2.8426742553710938, 1.0304412841796875, 0.10377883911132812, 0.4298677444458008, 3.7650222778320312, 2.217620849609375, 0.5167465209960938, 2.6790695190429688, 0.03502655029296875, 1.4204483032226562, 2.88836669921875, -3.8291091918945312, -0.654541015625, 0.8721733093261719, -2.6277236938476562, 0.2487335205078125, 1.6489677429199219, 0.36351585388183594, 1.5479278564453125, 1.6462554931640625, -4.563575744628906, 1.3590126037597656, -3.117462158203125, -0.6464691162109375, -1.692840576171875, 5.687099456787109, -1.4347686767578125, 2.887603759765625, 6.13128662109375, 4.5947265625, 3.2572784423828125, 0.5502262115478516, 7.434439659118652, 5.3546142578125, 1.0971832275390625, 0.6375579833984375, 4.5286102294921875, 0.60760498046875, 3.0720176696777344, 9.199687957763672, -0.606536865234375, 4.1340484619140625, 0.7717666625976562, 1.1049118041992188, 1.7429428100585938, 1.5824050903320312, 4.315380096435547, 4.283664703369141, -0.27075958251953125, 0.5841989517211914, 3.7748184204101562, 3.8581666946411133, 0.43682861328125, -5.171657562255859, 2.84307861328125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000604.npy"}
|
||||
{"epoch": 0.9130763416477702, "step": 605, "batch_size": 64, "mean": 1.295022964477539, "std": 2.268841505050659, "min": -3.774993896484375, "p10": -1.6687171936035154, "median": 1.3032093048095703, "p90": 4.09710693359375, "max": 6.932647705078125, "pos_frac": 0.71875, "sample": [1.6881256103515625, 0.5321121215820312, -2.865424156188965, 6.932647705078125, 2.9364395141601562, 1.3001861572265625, 1.582794189453125, 3.3740310668945312, 1.2571372985839844, -1.4605789184570312, 0.797454833984375, -0.1394195556640625, -0.306732177734375, -0.8169708251953125, -1.7916908264160156, -0.3056488037109375, -1.19891357421875, 3.0244522094726562, 2.235626220703125, 0.6368293762207031, 1.2799072265625, -0.42479705810546875, 1.9879493713378906, -2.1100692749023438, 1.3463783264160156, 0.5082283020019531, 5.122249603271484, 2.4392948150634766, 0.5814037322998047, 2.6089324951171875, 1.1902923583984375, 4.0009765625, -3.774993896484375, 4.4875335693359375, -0.358734130859375, 3.3443832397460938, 5.3687896728515625, 2.8949737548828125, 0.6517486572265625, 0.12020111083984375, 1.4723968505859375, 1.3880462646484375, 5.554756164550781, 4.13116455078125, 3.422821044921875, 0.76373291015625, 2.0320892333984375, 2.43505859375, 2.1409072875976562, -2.8099365234375, 6.31927490234375, 1.8546943664550781, 1.679473876953125, -1.7579193115234375, -0.6478195190429688, 0.5049457550048828, 1.6963653564453125, -2.5581893920898438, 1.3062324523925781, -1.4053535461425781, 0.05413627624511719, 4.01763916015625, -0.5511474609375, 3.1609935760498047], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000605.npy"}
|
||||
{"epoch": 0.9145880574452003, "step": 606, "batch_size": 64, "mean": 1.1763060092926025, "std": 2.288933277130127, "min": -3.2452316284179688, "p10": -1.569981002807617, "median": 0.8606929779052734, "p90": 4.937276840209962, "max": 5.711170196533203, "pos_frac": 0.65625, "sample": [-0.6727523803710938, 3.3018798828125, 2.119495391845703, 5.2272491455078125, 0.6356887817382812, -0.8506393432617188, 1.023345947265625, 0.4873046875, -0.41559600830078125, 4.621875762939453, 5.086484909057617, -1.5221481323242188, 3.313934326171875, 1.431549072265625, 1.4564590454101562, -3.2452316284179688, 1.3879470825195312, 0.24015045166015625, 1.3913192749023438, 2.936870574951172, 4.2025146484375, -0.9207000732421875, -1.7076530456542969, 2.3630599975585938, -1.5904808044433594, -1.457916259765625, 2.51043701171875, 0.5124111175537109, 2.9226760864257812, 3.5069236755371094, -0.364013671875, 0.2462024688720703, -0.9687767028808594, 4.241430282592773, -1.2932205200195312, 0.42645931243896484, -1.6372222900390625, -2.8986568450927734, 0.13055419921875, 1.2637176513671875, 5.257331848144531, 5.189140319824219, -0.8165435791015625, 5.711170196533203, 1.4835662841796875, 5.249237060546875, 0.6884918212890625, 2.0279388427734375, -0.2702140808105469, -0.7213592529296875, 5.07244873046875, -1.6194686889648438, -1.3774642944335938, 1.6756744384765625, -0.6536865234375, 3.852889060974121, 0.7567520141601562, -1.8289718627929688, 0.7616157531738281, 2.9878997802734375, 0.9597702026367188, 3.2459335327148438, 0.9894332885742188, -0.7809333801269531], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000606.npy"}
|
||||
{"epoch": 0.9160997732426304, "step": 607, "batch_size": 64, "mean": 1.5237513780593872, "std": 2.285933494567871, "min": -3.630584716796875, "p10": -1.5548339843749996, "median": 1.4353466033935547, "p90": 4.189348030090332, "max": 8.242324829101562, "pos_frac": 0.78125, "sample": [0.1633167266845703, -0.630401611328125, 0.5242290496826172, 0.1088104248046875, 3.3548812866210938, 4.161914825439453, 3.030139923095703, 2.779876708984375, -3.630584716796875, 0.4351310729980469, -0.034084320068359375, 3.58367919921875, 8.242324829101562, 4.452966690063477, 0.9952239990234375, 0.571685791015625, 1.0468368530273438, 1.4192008972167969, 2.0831069946289062, 6.7845611572265625, 0.3104248046875, 2.2334213256835938, 3.0179061889648438, 5.383514404296875, 3.8111839294433594, 2.3131332397460938, -2.49609375, 1.4514923095703125, -1.677419662475586, -2.8162879943847656, 1.8203887939453125, 3.398998260498047, 2.2800750732421875, 4.203910827636719, 0.7833309173583984, 3.039520263671875, 2.745025634765625, -0.4818458557128906, 2.7711334228515625, -1.69036865234375, -0.8521461486816406, 1.0381174087524414, 0.3412017822265625, 1.7018013000488281, 1.7815399169921875, -0.7559585571289062, -1.2688007354736328, 3.951984405517578, 3.528350830078125, 4.201105117797852, -0.17576217651367188, -3.39569091796875, 1.7593612670898438, 4.478912353515625, 1.2228164672851562, 2.7590103149414062, 1.8281326293945312, 1.2209243774414062, 0.7699813842773438, 3.186185836791992, 0.6397476196289062, -1.7469520568847656, 1.0524749755859375, 0.40949249267578125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000607.npy"}
|
||||
{"epoch": 0.9176114890400605, "step": 608, "batch_size": 64, "mean": 1.3297594785690308, "std": 2.629338026046753, "min": -3.86572265625, "p10": -1.46503677368164, "median": 0.8176689147949219, "p90": 4.949955368041993, "max": 10.482208251953125, "pos_frac": 0.703125, "sample": [1.6482505798339844, -0.8156890869140625, 0.3325538635253906, 2.4952239990234375, 0.04149627685546875, -0.3043670654296875, 3.6956634521484375, -2.1097335815429688, 1.251220703125, -0.990478515625, 2.0264739990234375, -0.2781829833984375, 2.9380950927734375, 2.2915802001953125, 1.1286582946777344, 1.980987548828125, 5.414512634277344, -3.86572265625, 0.5906600952148438, 0.2586212158203125, -0.7109298706054688, -0.6477317810058594, -0.9460783004760742, 8.301090240478516, -0.874542236328125, -1.7468185424804688, -0.9783859252929688, 0.1265697479248047, 2.141538619995117, 4.631488800048828, 6.3323974609375, -2.4650726318359375, 6.13055419921875, 2.9880447387695312, 0.23089599609375, 2.47711181640625, -0.8139190673828125, -0.8166580200195312, 0.5121383666992188, 0.7841720581054688, 0.4701042175292969, 2.7988128662109375, 1.4460067749023438, -0.37956809997558594, 0.5489654541015625, 4.0157470703125, -1.8574676513671875, 3.225341796875, 2.468048095703125, 1.772979736328125, 0.5235595703125, 0.5943450927734375, 2.7534561157226562, 1.1220512390136719, 1.0465402603149414, 5.0864410400390625, -3.2474517822265625, 5.788564682006836, 0.851165771484375, -1.6684188842773438, 0.22628211975097656, 2.1992111206054688, 2.4519882202148438, 10.482208251953125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000608.npy"}
{"epoch": 0.9191232048374905, "step": 609, "batch_size": 64, "mean": 1.6727293729782104, "std": 2.311190605163574, "min": -5.0792694091796875, "p10": -1.16151762008667, "median": 1.5533771514892578, "p90": 4.757057952880861, "max": 6.7811126708984375, "pos_frac": 0.78125, "sample": [4.2759857177734375, 4.94232177734375, 4.324775695800781, 0.36834716796875, 1.5224151611328125, 0.14065074920654297, 2.3121185302734375, 2.5951385498046875, -0.3500337600708008, 1.2629966735839844, -0.8911933898925781, -0.17073440551757812, -1.6542930603027344, 2.2589149475097656, 6.7811126708984375, 3.3651123046875, 5.15582275390625, 2.157073974609375, 1.9529132843017578, 5.779457092285156, -2.1925811767578125, 0.868438720703125, 2.6659698486328125, 4.2740631103515625, 0.27812957763671875, 3.5839691162109375, 3.114805221557617, 0.9278984069824219, -1.3703460693359375, 1.5394210815429688, 6.4014434814453125, -1.1884469985961914, 5.653623580932617, 0.7920989990234375, 2.436798095703125, 3.250223159790039, 2.232280731201172, 0.12615203857421875, 5.336219787597656, -2.0608291625976562, -0.5715656280517578, 0.3339805603027344, 0.3179893493652344, 1.526458740234375, 3.5243453979492188, -5.0792694091796875, 2.175434112548828, 1.36041259765625, 1.5673332214355469, 0.9312591552734375, 1.2554302215576172, -1.0986824035644531, 1.95751953125, 3.584075927734375, 2.9243736267089844, -1.0320091247558594, 1.1585235595703125, 3.3504161834716797, 2.4810657501220703, -2.589447021484375, 2.1733245849609375, 3.094776153564453, 1.06195068359375, -0.15125274658203125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000609.npy"}
{"epoch": 0.9206349206349206, "step": 610, "batch_size": 64, "mean": 1.656959056854248, "std": 2.8360435962677, "min": -5.744171142578125, "p10": -1.6485366821289062, "median": 1.6735191345214844, "p90": 6.049436187744143, "max": 8.809913635253906, "pos_frac": 0.734375, "sample": [2.7600784301757812, 2.51727294921875, 1.7474746704101562, 7.179534912109375, 1.847259521484375, 2.30413818359375, 0.3173828125, 2.4294815063476562, -5.744171142578125, 4.647438049316406, -1.000579833984375, -2.336688995361328, 1.9382743835449219, 3.7608184814453125, 0.8340072631835938, 1.61895751953125, 3.8839244842529297, -1.0572166442871094, 0.3594207763671875, -0.01586151123046875, -2.6434326171875, -1.5448760986328125, 1.83624267578125, 1.7280807495117188, -3.1058692932128906, -2.455413818359375, -1.692962646484375, 8.809913635253906, 4.263256072998047, 6.5137176513671875, 0.553802490234375, 6.82843017578125, 4.011573791503906, 6.416862487792969, -0.6581802368164062, 2.1738853454589844, 6.3165435791015625, 0.16912460327148438, 2.1542205810546875, 0.55682373046875, 3.771697998046875, 0.5615882873535156, -1.2223968505859375, -1.40911865234375, 3.4273681640625, -0.3774604797363281, 1.8362274169921875, 1.48748779296875, 4.561302185058594, 4.360980987548828, 1.3010330200195312, 0.9378776550292969, 0.1791839599609375, 0.8798236846923828, -2.160785675048828, -0.32935333251953125, -1.3203125, 2.0240631103515625, 5.426185607910156, 2.2274627685546875, 3.0612869262695312, 0.345001220703125, 6.827491760253906, 1.426055908203125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000610.npy"}
{"epoch": 0.9221466364323507, "step": 611, "batch_size": 64, "mean": 1.3548383712768555, "std": 2.3835511207580566, "min": -4.6868896484375, "p10": -1.4706838607788084, "median": 1.1402931213378906, "p90": 4.621778106689454, "max": 7.716033935546875, "pos_frac": 0.75, "sample": [1.7295913696289062, -0.7266883850097656, -4.6868896484375, 0.0941314697265625, 5.088916778564453, 2.0617218017578125, 1.6951904296875, 2.3914947509765625, -2.127889633178711, 0.2509918212890625, -3.395692825317383, 3.4644851684570312, 2.0715999603271484, 0.28260040283203125, -1.3634757995605469, 1.3742027282714844, -2.0827369689941406, 4.5526123046875, -2.080190658569336, 7.716033935546875, 2.789569854736328, 1.3833065032958984, -0.7133941650390625, -1.5166301727294922, 0.25879669189453125, 0.05254077911376953, 3.2876968383789062, 0.9739456176757812, -0.02059173583984375, 3.58074951171875, 1.01385498046875, -2.4961166381835938, -0.171630859375, 5.340160369873047, 1.115386962890625, 1.7877578735351562, -1.2602119445800781, 3.4490203857421875, 3.1170578002929688, 1.466888427734375, 3.121826171875, 0.5947628021240234, 1.9446449279785156, 5.9474639892578125, -0.1489105224609375, 2.2912826538085938, 0.07099151611328125, 4.651420593261719, 2.491334915161133, 5.171875, 5.612449645996094, 4.170467376708984, 0.6637344360351562, 1.2048969268798828, 0.9691476821899414, 1.1651992797851562, 4.179988861083984, -0.7444076538085938, 0.2919769287109375, 0.855743408203125, 0.2738914489746094, -0.6891288757324219, 0.16945457458496094, 2.701385498046875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000611.npy"}
{"epoch": 0.9236583522297808, "step": 612, "batch_size": 64, "mean": 1.7321220636367798, "std": 2.5469086170196533, "min": -5.79754638671875, "p10": -0.9890174865722656, "median": 1.474639892578125, "p90": 4.578974914550782, "max": 7.768463134765625, "pos_frac": 0.75, "sample": [0.41443634033203125, -0.645416259765625, 3.16455078125, -1.0079345703125, -0.5626983642578125, 1.6941680908203125, 3.963693618774414, 2.98162841796875, 1.227569580078125, 1.4761581420898438, 4.156791687011719, 4.29827880859375, 2.5880584716796875, -0.29132080078125, 2.4116058349609375, 3.4383163452148438, 2.54034423828125, -1.12518310546875, -1.01336669921875, 1.4731216430664062, 1.3910751342773438, -2.296781539916992, 1.2273712158203125, 1.4604454040527344, -0.9448776245117188, 1.75518798828125, 1.82037353515625, -0.5279502868652344, -0.0511932373046875, 0.771728515625, 1.0322914123535156, 5.3306427001953125, -0.7245712280273438, -0.8854217529296875, 0.37020111083984375, 6.753971099853516, 0.5901260375976562, 4.178886413574219, 4.164695739746094, -4.330570220947266, 6.433319091796875, 0.47127532958984375, 4.687835693359375, -1.5724411010742188, 5.235748291015625, 4.3249664306640625, 3.61590576171875, -0.7924022674560547, 1.6460800170898438, 7.1966552734375, 0.4556083679199219, 1.0039520263671875, 7.768463134765625, 2.350982666015625, 2.508312225341797, 2.212738037109375, 3.952861785888672, 1.3683700561523438, 1.2915191650390625, -5.79754638671875, 3.3512535095214844, 4.1176300048828125, 1.1602249145507812, 1.5960655212402344], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000612.npy"}
{"epoch": 0.9251700680272109, "step": 613, "batch_size": 64, "mean": 1.9520105123519897, "std": 2.7309796810150146, "min": -6.0879974365234375, "p10": -0.9644775390624997, "median": 1.5789175033569336, "p90": 5.9525232315063485, "max": 7.169151306152344, "pos_frac": 0.796875, "sample": [-1.0880050659179688, 1.9597549438476562, -0.044910430908203125, -1.3013076782226562, 3.1750240325927734, 2.49114990234375, 4.412086486816406, 1.32568359375, -1.1533889770507812, 0.40547943115234375, 0.8892955780029297, 1.6140270233154297, -6.0879974365234375, 3.2208404541015625, 1.6877593994140625, 5.491973876953125, 0.9921035766601562, 0.8206310272216797, 5.1776123046875, 0.94134521484375, -0.3240814208984375, 1.0397453308105469, -1.66650390625, -0.2324810028076172, 5.831632614135742, 1.5319385528564453, 6.2653350830078125, 2.12353515625, 1.5438079833984375, 3.5082664489746094, 3.9708480834960938, 2.078765869140625, 2.7618560791015625, -0.17539405822753906, 1.81671142578125, 1.36968994140625, -3.54541015625, 7.108795166015625, 1.221353530883789, 7.169151306152344, 3.7743759155273438, 4.804901123046875, 0.18349838256835938, 0.17462158203125, 7.107734680175781, -0.6762466430664062, 4.708808898925781, 3.040843963623047, 0.7099761962890625, 2.6620864868164062, 0.6650619506835938, 0.232940673828125, 0.0736846923828125, 6.2095947265625, 4.7640228271484375, 1.974675178527832, 0.9433021545410156, 4.179773330688477, 1.9198074340820312, 6.799465179443359, 0.605621337890625, 6.00433349609375, -0.167205810546875, -4.093727111816406], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000613.npy"}
{"epoch": 0.926681783824641, "step": 614, "batch_size": 64, "mean": 1.4754180908203125, "std": 2.172788619995117, "min": -4.602020263671875, "p10": -0.7968954086303711, "median": 1.188079833984375, "p90": 4.371517944335939, "max": 7.0693206787109375, "pos_frac": 0.765625, "sample": [3.4802703857421875, 2.0283432006835938, -0.33385467529296875, 1.4633255004882812, 3.6794052124023438, 1.2930450439453125, -0.18076133728027344, 0.1650257110595703, 0.3737335205078125, 0.3791923522949219, 4.5330810546875, 0.9839134216308594, 3.994537353515625, 2.678131103515625, 5.377685546875, 1.438161849975586, 2.73651123046875, 2.802196502685547, 5.54388427734375, 4.612102508544922, -1.4203147888183594, 0.7165794372558594, -0.32634735107421875, 1.0831146240234375, 5.439178466796875, 0.02862548828125, 1.6870803833007812, -4.602020263671875, 3.2433624267578125, -0.404541015625, 0.49285888671875, 0.6574783325195312, 0.7853145599365234, 3.07354736328125, -1.0022563934326172, 0.9246559143066406, 3.1081695556640625, -0.018890380859375, -3.304370880126953, 2.1980133056640625, -0.382659912109375, 7.0693206787109375, 0.440582275390625, 2.1637802124023438, 0.03166961669921875, 3.0631332397460938, 0.7779998779296875, 1.3423309326171875, 1.3063812255859375, -0.77886962890625, 0.2493267059326172, 2.5888214111328125, -0.9418563842773438, -0.8046207427978516, -1.8868026733398438, 2.71575927734375, 5.308052062988281, 1.837738037109375, 3.383734703063965, 0.2691535949707031, 0.48520469665527344, 3.391298294067383, -0.4779624938964844, 3.8680686950683594], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000614.npy"}
{"epoch": 0.9281934996220711, "step": 615, "batch_size": 64, "mean": 1.785017967224121, "std": 2.757899284362793, "min": -4.380058288574219, "p10": -1.1260711669921875, "median": 1.7227249145507812, "p90": 5.226640701293945, "max": 8.9813232421875, "pos_frac": 0.6875, "sample": [2.4018173217773438, 1.9153594970703125, 5.251581192016602, -0.5886116027832031, -1.107208251953125, 3.776369094848633, 1.0895919799804688, -3.8267822265625, 4.2419891357421875, 3.0218582153320312, -1.0240402221679688, 2.1331329345703125, 0.57818603515625, 0.48876953125, 2.3468074798583984, -2.9861068725585938, -0.23764801025390625, 7.067924499511719, 3.823444366455078, 0.9633941650390625, 1.726165771484375, 0.1778717041015625, 4.836944580078125, -1.1341552734375, 3.518268585205078, -0.278533935546875, -0.5598945617675781, 0.45027923583984375, 1.1795806884765625, 6.840496063232422, 4.546112060546875, -0.11453628540039062, -0.1542205810546875, -1.0376243591308594, 0.49267578125, 2.723419189453125, 5.309173583984375, 2.8007659912109375, 0.345703125, 4.04595947265625, 5.248710632324219, 7.295318603515625, 5.175144195556641, 2.4220733642578125, 4.1067962646484375, -0.9805164337158203, 1.1078262329101562, -0.7949066162109375, -0.87298583984375, 4.565399169921875, 2.7989959716796875, 8.9813232421875, -1.3286819458007812, 4.496452331542969, -0.27146148681640625, 4.238067626953125, 2.350433349609375, 1.7192840576171875, 4.040191650390625, 0.15204238891601562, 2.3087081909179688, -1.2007293701171875, -4.380058288574219, -1.980560302734375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000615.npy"}
{"epoch": 0.9297052154195011, "step": 616, "batch_size": 64, "mean": 1.3388640880584717, "std": 2.3718228340148926, "min": -3.902050018310547, "p10": -1.511823272705078, "median": 1.0252857208251953, "p90": 4.829827117919923, "max": 6.92475700378418, "pos_frac": 0.71875, "sample": [4.690208435058594, 2.169342041015625, 2.027191162109375, -0.4637908935546875, 2.5156021118164062, -1.0022163391113281, 4.8896636962890625, 2.5663909912109375, 0.26840972900390625, 1.9948949813842773, -0.30030059814453125, 0.46053314208984375, 0.07211685180664062, -3.902050018310547, 0.26750946044921875, -1.9487190246582031, 6.4379425048828125, -0.45983314514160156, -0.5926895141601562, 5.11920166015625, 2.2365036010742188, 6.92475700378418, -3.563945770263672, 0.8358478546142578, 6.000823974609375, -1.866668701171875, -0.22469329833984375, -0.0602874755859375, -0.49925994873046875, 2.279865264892578, 1.337188720703125, 3.3665008544921875, 1.4534225463867188, 1.8967132568359375, 4.5152130126953125, 5.4280853271484375, -1.5901603698730469, 1.0362586975097656, 0.268157958984375, 2.4933338165283203, 2.909210205078125, -0.061428070068359375, 0.7794227600097656, 1.3453826904296875, 2.214874267578125, 0.3732757568359375, 0.9694557189941406, 1.014312744140625, -2.4300689697265625, 0.822265625, -1.3290367126464844, -1.8782272338867188, 0.3564453125, 3.4189605712890625, 1.4832954406738281, 4.530538558959961, 1.1389389038085938, 2.1428909301757812, 1.9647064208984375, 0.2597465515136719, 2.614368438720703, 6.4160919189453125, 0.7959136962890625, -1.2410964965820312], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000616.npy"}
{"epoch": 0.9312169312169312, "step": 617, "batch_size": 64, "mean": 1.1955888271331787, "std": 2.387681245803833, "min": -4.1921844482421875, "p10": -1.493449020385742, "median": 0.9455604553222656, "p90": 4.862755584716798, "max": 8.680122375488281, "pos_frac": 0.734375, "sample": [1.913177490234375, 0.8857269287109375, -2.5728988647460938, 2.2016372680664062, 0.6721038818359375, 1.9940967559814453, -0.0602264404296875, 2.6325836181640625, 1.9504776000976562, -3.551513671875, 1.0053939819335938, -0.32434844970703125, 2.0324249267578125, 1.5867280960083008, 1.1974906921386719, 0.7662582397460938, 2.1484909057617188, 2.0342178344726562, 0.1729583740234375, -0.3051300048828125, 5.790069580078125, 0.6012458801269531, -1.5338020324707031, 2.994396209716797, -1.33892822265625, -1.9044723510742188, -0.5250625610351562, 4.945320129394531, 0.7000274658203125, 2.7994232177734375, 1.3854427337646484, 0.231658935546875, 1.9748382568359375, 5.043510437011719, 0.70404052734375, 5.0914306640625, -0.34485626220703125, -0.6096572875976562, 1.9148902893066406, 0.0319976806640625, 3.4832687377929688, 0.33978271484375, -0.16300010681152344, 0.11780929565429688, 5.606998443603516, 0.7537918090820312, -3.452117919921875, -4.1921844482421875, -2.6503067016601562, 0.4474639892578125, 1.619140625, 4.002082824707031, 1.4250030517578125, 1.8019466400146484, 8.680122375488281, 5.511346817016602, 3.1755142211914062, -0.4799613952636719, 0.08391571044921875, 4.67010498046875, 0.09378814697265625, -1.3992919921875, 1.3134841918945312, 1.3978195190429688], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000617.npy"}
{"epoch": 0.9327286470143613, "step": 618, "batch_size": 64, "mean": 1.6353681087493896, "std": 2.638617753982544, "min": -4.3764495849609375, "p10": -1.203344917297363, "median": 1.013427734375, "p90": 4.556976318359376, "max": 9.751415252685547, "pos_frac": 0.734375, "sample": [6.3952789306640625, 3.384002685546875, 4.104351043701172, 0.856781005859375, -0.6680984497070312, -1.3791942596435547, 0.14649200439453125, -0.6254730224609375, 2.89678955078125, -0.5884532928466797, 1.58526611328125, 4.233489990234375, 4.4336090087890625, 4.30810546875, 9.751415252685547, 1.0473670959472656, 2.6200180053710938, 3.223787307739258, 4.6098480224609375, 3.336273193359375, 1.8048934936523438, 9.4039306640625, 1.3978910446166992, 3.7647323608398438, 0.3898468017578125, 3.765350341796875, 0.8211174011230469, 0.7063217163085938, 2.219135284423828, 0.7620353698730469, -1.5924835205078125, -0.2870635986328125, 3.0535507202148438, 0.7318344116210938, 2.726043701171875, -0.7920379638671875, 5.550136566162109, -0.79302978515625, 0.3172035217285156, 0.9794883728027344, -0.7461776733398438, -1.9734859466552734, 0.13893890380859375, -1.7964515686035156, 0.6543350219726562, -3.1667404174804688, -0.6736984252929688, -4.3764495849609375, 1.6945266723632812, 3.3332366943359375, -2.3036727905273438, 4.370555877685547, 1.0791549682617188, 1.6010169982910156, 0.7540817260742188, 0.7011528015136719, -0.044178009033203125, 2.1350784301757812, 0.89874267578125, 2.995410919189453, 4.8379669189453125, 5.4165802001953125, -0.3730316162109375, 0.9061126708984375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000618.npy"}
{"epoch": 0.9342403628117913, "step": 619, "batch_size": 64, "mean": 1.6235718727111816, "std": 2.3035988807678223, "min": -4.200752258300781, "p10": -0.6898834228515625, "median": 1.3230705261230469, "p90": 4.523469161987306, "max": 7.343967437744141, "pos_frac": 0.765625, "sample": [2.8165321350097656, 0.5554351806640625, 1.0964632034301758, 0.3611564636230469, 4.655479431152344, 4.282825469970703, 4.111907958984375, -0.706390380859375, -2.2118072509765625, 4.6266021728515625, 5.032421112060547, -1.5053749084472656, 3.86279296875, 0.3693103790283203, -0.176513671875, -0.4525165557861328, 2.5859222412109375, 2.3557586669921875, 1.2061500549316406, 0.207550048828125, 0.6075668334960938, 4.1780548095703125, 0.6377162933349609, -1.0753936767578125, -0.4644947052001953, 2.5852508544921875, 3.680065155029297, 1.391510009765625, 0.20074081420898438, -0.31523895263671875, 1.120025634765625, -4.148284912109375, -0.3083763122558594, 5.090972900390625, 2.2721214294433594, 5.201396942138672, 1.0398712158203125, 6.753757476806641, 7.343967437744141, -4.200752258300781, 1.1730804443359375, 2.091796875, 2.3208999633789062, 1.7035560607910156, 2.5994873046875, 1.2597427368164062, 0.2992897033691406, -0.29018402099609375, 1.3863983154296875, 2.571460723876953, 0.035858154296875, 0.49269866943359375, 4.206016540527344, 2.0900115966796875, 4.227325439453125, -0.6513671875, 1.7143020629882812, 3.2283668518066406, -0.4530296325683594, 0.42832374572753906, -0.7885665893554688, 4.056552886962891, 1.418670654296875, 4.1237335205078125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000619.npy"}
{"epoch": 0.9357520786092215, "step": 620, "batch_size": 64, "mean": 1.1954689025878906, "std": 2.6597912311553955, "min": -6.861358642578125, "p10": -1.5797437667846679, "median": 1.274465560913086, "p90": 4.143193054199219, "max": 7.943321228027344, "pos_frac": 0.703125, "sample": [1.4355278015136719, -1.5177001953125, -2.36212158203125, 3.542327880859375, -3.7595443725585938, 0.6468963623046875, 1.962158203125, 6.5983428955078125, -0.3092041015625, 3.3194580078125, 6.113533020019531, 4.914745330810547, -1.13970947265625, 0.868133544921875, 0.32393646240234375, 3.36126708984375, 2.1517372131347656, 0.48301029205322266, 0.4911346435546875, -0.3347930908203125, -0.366119384765625, -0.3128509521484375, -4.38751220703125, 4.131187438964844, -3.8507308959960938, 4.148338317871094, 2.2754364013671875, -0.7002105712890625, 7.943321228027344, 0.09912109375, 3.595560073852539, 1.7233047485351562, -1.1421966552734375, 0.6761932373046875, 2.6617374420166016, 1.8339157104492188, -1.6970062255859375, 2.73760986328125, 1.423776626586914, 0.14342117309570312, 2.187957763671875, 2.5953598022460938, 1.7802562713623047, 0.4970684051513672, -1.6034107208251953, -0.6564979553222656, 2.5314788818359375, 6.543609619140625, -0.9074020385742188, 2.143779754638672, 1.2085933685302734, 2.8967437744140625, -6.861358642578125, -0.11997222900390625, -1.5245208740234375, 0.22098731994628906, 1.3403377532958984, 0.22339248657226562, 3.2347612380981445, 5.3595733642578125, 2.6041440963745117, 2.6432418823242188, 1.8912620544433594, 0.5551967620849609], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000620.npy"}
{"epoch": 0.9372637944066515, "step": 621, "batch_size": 64, "mean": 1.616602897644043, "std": 2.972754955291748, "min": -5.380649566650391, "p10": -1.7292160034179684, "median": 1.1379032135009766, "p90": 5.612154388427735, "max": 10.483657836914062, "pos_frac": 0.671875, "sample": [-0.43466758728027344, 0.6320724487304688, 4.6804351806640625, -1.9331130981445312, -1.345367431640625, -4.223041534423828, 1.90142822265625, 7.508026123046875, -0.1368083953857422, 0.9569511413574219, 1.1429939270019531, 4.3506317138671875, 5.517631530761719, -0.34569740295410156, 6.471355438232422, 0.9484100341796875, 2.2527923583984375, 2.6504364013671875, 0.5908660888671875, 4.625274658203125, 0.6919403076171875, -0.4123992919921875, 2.324920654296875, -1.0500640869140625, 1.833587646484375, 3.3053321838378906, 1.6412887573242188, 0.282440185546875, 10.483657836914062, -0.033069610595703125, 8.8638916015625, 2.082916259765625, -0.667694091796875, 5.879150390625, 2.4011268615722656, -0.622589111328125, 2.11077880859375, 1.423990249633789, 3.1238861083984375, -1.8937225341796875, -0.7856540679931641, -5.380649566650391, 2.9429550170898438, 0.1781463623046875, 2.463907241821289, -0.8350143432617188, 2.8691635131835938, -1.0042381286621094, -1.1317825317382812, -0.3531074523925781, -1.9499320983886719, 1.1328125, 3.0448989868164062, 0.1101369857788086, 3.7691421508789062, -2.075794219970703, 0.09957313537597656, 5.411468505859375, -2.3031673431396484, 5.6526641845703125, 4.796783447265625, 0.5206336975097656, 2.8912410736083984, 5.81842041015625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000621.npy"}
{"epoch": 0.9387755102040817, "step": 622, "batch_size": 64, "mean": 1.6664619445800781, "std": 2.272792100906372, "min": -3.6668014526367188, "p10": -1.0357131958007812, "median": 1.3917369842529297, "p90": 4.762337875366211, "max": 7.110321044921875, "pos_frac": 0.71875, "sample": [2.2039337158203125, 2.085590362548828, 0.602264404296875, -0.3360595703125, 2.411945343017578, 1.222503662109375, -0.3565559387207031, 3.668010711669922, -0.22785568237304688, 3.236328125, 6.642780303955078, 2.1511611938476562, -0.20972156524658203, -0.9623336791992188, 4.764728546142578, -0.3009014129638672, 2.083629608154297, 1.2537765502929688, 7.110321044921875, 6.6888427734375, 0.8779067993164062, 1.6510791778564453, -1.2030830383300781, 1.4304847717285156, 4.3997650146484375, 1.6852340698242188, 1.3529891967773438, 1.0775985717773438, 1.0735282897949219, 3.630290985107422, 4.7567596435546875, -2.826568603515625, 4.391998291015625, -1.4414520263671875, 0.2210845947265625, 0.693817138671875, 2.7747344970703125, 2.0866317749023438, 2.5304946899414062, -1.7350692749023438, 2.5512466430664062, 5.2168121337890625, -1.0671615600585938, 0.48841094970703125, -0.2657432556152344, 1.60845947265625, 4.088470458984375, -3.6668014526367188, -0.08889007568359375, 2.1608238220214844, 4.976280212402344, 0.580810546875, 1.339925765991211, -0.4944000244140625, 1.0954742431640625, -0.8601913452148438, 4.844970703125, 4.230945587158203, -1.2259902954101562, -0.02671051025390625, 3.7608585357666016, 2.605010986328125, 1.1506271362304688, 2.489715576171875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000622.npy"}
{"epoch": 0.9402872260015117, "step": 623, "batch_size": 64, "mean": 1.6356725692749023, "std": 2.52010178565979, "min": -3.939422607421875, "p10": -1.0613845825195312, "median": 1.423959732055664, "p90": 4.66892547607422, "max": 10.111419677734375, "pos_frac": 0.765625, "sample": [2.3385696411132812, 0.114776611328125, -0.5938949584960938, -3.939422607421875, 0.25951385498046875, 1.08160400390625, 2.1786270141601562, 1.6884841918945312, 5.882049560546875, 2.761629104614258, 0.40254974365234375, 1.06671142578125, 1.0864791870117188, 1.22283935546875, -0.20688629150390625, 5.032142639160156, -1.1064605712890625, 2.0522994995117188, 4.12567138671875, 3.4256057739257812, 1.6160888671875, 1.7470741271972656, 3.29339599609375, -0.29181861877441406, -1.2823486328125, 2.4271678924560547, 1.3467254638671875, 2.1104736328125, -2.951446533203125, 2.2996673583984375, 8.400054931640625, 10.111419677734375, 2.6282958984375, 3.38238525390625, 0.0618133544921875, -1.2212915420532227, -0.39186859130859375, 0.8152427673339844, 4.351310729980469, 2.3663291931152344, 3.852569580078125, 6.022819519042969, 0.020870208740234375, 0.800628662109375, -0.956207275390625, 0.2834892272949219, -0.46280670166015625, 2.261617660522461, 1.9147281646728516, 1.7587966918945312, 3.3507232666015625, 6.491424560546875, 1.06719970703125, 3.4542083740234375, -2.040180206298828, 0.5610504150390625, 0.574249267578125, 1.5011940002441406, 2.2854843139648438, -0.2004680633544922, 4.805046081542969, -0.16492080688476562, 1.1663589477539062, -3.3563919067382812], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000623.npy"}
{"epoch": 0.9417989417989417, "step": 624, "batch_size": 64, "mean": 1.6761407852172852, "std": 2.6140546798706055, "min": -6.345466613769531, "p10": -1.0862918853759767, "median": 1.4536495208740234, "p90": 5.417341804504395, "max": 7.0191650390625, "pos_frac": 0.75, "sample": [-1.7981719970703125, -2.5620803833007812, 4.792083740234375, 2.6632308959960938, 6.6626739501953125, 3.6089935302734375, -0.3312263488769531, 0.0279541015625, -1.093475341796875, 0.3521080017089844, 2.1016998291015625, 1.4827079772949219, -0.7038230895996094, 0.29936981201171875, 2.8420639038085938, 0.5739898681640625, 0.35951995849609375, 4.996185302734375, 1.6123504638671875, -0.7991714477539062, 0.8280715942382812, 0.2519950866699219, -2.4291915893554688, 4.194034576416016, 0.6593742370605469, 0.11635589599609375, 1.1967697143554688, -1.6050186157226562, 1.2642555236816406, 1.8563499450683594, 1.202728271484375, 4.983978271484375, 6.702644348144531, 1.424591064453125, 2.7961959838867188, -0.8610725402832031, 5.48225212097168, 5.980049133300781, 7.0191650390625, 3.157306671142578, 4.52386474609375, 2.69866943359375, 1.8736801147460938, 2.563793182373047, -0.33956146240234375, 5.2658843994140625, 5.913520812988281, 0.9715118408203125, -0.9238681793212891, 2.6688003540039062, 3.0306320190429688, 1.2913665771484375, -1.0695304870605469, 0.9651145935058594, -6.345466613769531, 2.7171249389648438, 2.2677078247070312, -0.8790359497070312, 1.5903701782226562, -0.28479766845703125, -1.5227203369140625, 1.5674800872802734, 6.45440673828125, 2.96624755859375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000624.npy"}
{"epoch": 0.9433106575963719, "step": 625, "batch_size": 64, "mean": 1.0710105895996094, "std": 2.315742254257202, "min": -4.92279052734375, "p10": -1.6268051147460938, "median": 1.0015678405761719, "p90": 3.9776729583740242, "max": 7.8354644775390625, "pos_frac": 0.671875, "sample": [1.0547866821289062, 2.6733932495117188, -0.00311279296875, 1.93572998046875, -1.0320167541503906, 4.052028656005859, -0.38970947265625, -0.32151031494140625, 2.522632598876953, 0.8367919921875, 1.359130859375, 0.09798622131347656, 0.506378173828125, -1.3678627014160156, 4.264556884765625, 0.6528167724609375, 7.8354644775390625, 1.4175262451171875, -1.9585304260253906, 0.6983356475830078, 0.9183578491210938, 3.186981201171875, 1.0660858154296875, 2.8769073486328125, 0.28040313720703125, 2.1745681762695312, 4.402580261230469, 2.7671966552734375, -0.6721000671386719, 2.5717391967773438, -2.6468658447265625, 2.38751220703125, 2.18115234375, -0.5230560302734375, 0.0045928955078125, -1.0821151733398438, -3.1680126190185547, 5.39044189453125, -1.2294921875, 0.9483489990234375, 2.020538330078125, 4.4549560546875, 3.8041763305664062, 0.7641639709472656, 2.2871780395507812, 1.9653358459472656, -1.0924453735351562, -1.6016082763671875, -1.709991455078125, 3.5053634643554688, -1.637603759765625, -4.92279052734375, -1.5204315185546875, 5.760292053222656, 2.9790592193603516, -0.817957878112793, 1.3560791015625, 0.8344039916992188, -0.18756866455078125, -2.9627418518066406, 1.9569969177246094, 1.6797027587890625, 2.005035400390625, 2.9544906616210938], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000625.npy"}
{"epoch": 0.9448223733938019, "step": 626, "batch_size": 64, "mean": 1.774298071861267, "std": 2.84564471244812, "min": -4.4332275390625, "p10": -1.8120079040527344, "median": 1.5166816711425781, "p90": 5.667972373962403, "max": 8.3519287109375, "pos_frac": 0.671875, "sample": [2.779022216796875, 1.43438720703125, 1.13262939453125, -1.764556884765625, 7.867340087890625, 5.539695739746094, 6.712860107421875, -2.3421630859375, 1.4426956176757812, 5.72294807434082, -0.6804981231689453, -0.4329948425292969, 0.15193939208984375, -0.11942291259765625, -0.5283584594726562, 8.346111297607422, 3.7614097595214844, -1.7191753387451172, 2.5983047485351562, -0.09125518798828125, -1.8979721069335938, -1.22698974609375, 1.3229732513427734, 4.732002258300781, 2.1088600158691406, -1.9971694946289062, 2.5806503295898438, 3.8170318603515625, -0.6732177734375, 8.3519287109375, 2.841381072998047, -0.13087081909179688, 4.3487396240234375, 0.8147811889648438, 1.5067024230957031, -4.4332275390625, 0.48952484130859375, 3.3060989379882812, 1.7758102416992188, 2.578857421875, 0.7904243469238281, -0.8305625915527344, 6.769157409667969, 1.4964370727539062, -2.4883575439453125, 3.817047119140625, 2.2151336669921875, 3.1201019287109375, 1.5782089233398438, -0.0849609375, 4.386383056640625, 2.019641876220703, -2.418712615966797, 2.71148681640625, -0.8367080688476562, 4.124298095703125, 1.5266609191894531, 6.420257568359375, 3.7759456634521484, 3.302093505859375, 4.3000335693359375, -0.9705371856689453, 0.6371383666992188, -1.8323440551757812], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000626.npy"}
{"epoch": 0.9463340891912321, "step": 627, "batch_size": 64, "mean": 1.9054151773452759, "std": 2.5904042720794678, "min": -1.99053955078125, "p10": -1.1754909515380856, "median": 1.0554656982421875, "p90": 5.572116088867189, "max": 9.09515380859375, "pos_frac": 0.75, "sample": [0.983856201171875, 2.4014434814453125, 2.0374298095703125, 0.6419677734375, 2.6324081420898438, 3.412342071533203, 3.830413818359375, 6.0557861328125, 3.444976806640625, 1.3294677734375, -0.056056976318359375, 5.651611328125, 8.583503723144531, 2.4014205932617188, 1.0607948303222656, -0.2257843017578125, -0.47313690185546875, 1.0217666625976562, 7.247035980224609, 2.0972137451171875, 4.441993713378906, 0.8192253112792969, -0.4326210021972656, -0.92401123046875, 0.02152252197265625, -1.5621376037597656, 4.539621353149414, -1.620269775390625, 4.455394744873047, -0.7988510131835938, -1.6370391845703125, 6.006866455078125, 3.881622314453125, 1.994598388671875, 0.8444671630859375, 0.3720703125, 0.46197509765625, -0.19316864013671875, 1.2845001220703125, 7.241630554199219, 4.932952880859375, 1.7528305053710938, -1.2832679748535156, -1.3011302947998047, 1.0067214965820312, 0.29632568359375, 1.1845932006835938, 1.0501365661621094, -0.10516357421875, 5.386627197265625, 0.6588077545166016, -1.3631820678710938, 0.8067512512207031, 2.9743080139160156, 2.9862403869628906, 0.75, 0.7286033630371094, -1.99053955078125, 0.4358177185058594, -0.42563629150390625, 2.4546966552734375, 4.209041595458984, 4.430030822753906, 9.09515380859375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000627.npy"}
{"epoch": 0.9478458049886621, "step": 628, "batch_size": 64, "mean": 1.9088804721832275, "std": 2.999025583267212, "min": -6.855552673339844, "p10": -1.6042449951171873, "median": 2.077601909637451, "p90": 5.195109939575196, "max": 9.421783447265625, "pos_frac": 0.71875, "sample": [9.421783447265625, -1.6808013916015625, 5.240024566650391, 2.5659446716308594, -1.972320556640625, -1.323760986328125, 5.245513916015625, -0.24615478515625, 4.732166290283203, 3.579029083251953, 9.039066314697266, 5.380401611328125, 4.7666015625, 1.7043304443359375, 2.2836036682128906, 0.5744400024414062, -6.228729248046875, -6.855552673339844, 2.53387451171875, -0.32088470458984375, 3.9512062072753906, 0.8562774658203125, 1.9406347274780273, -1.802490234375, 1.1858291625976562, 1.6069564819335938, 2.4766693115234375, 1.25244140625, -0.7695770263671875, 4.367424011230469, 4.914239883422852, 2.58392333984375, 3.0996246337890625, 5.090309143066406, 1.6391830444335938, 3.4056396484375, -0.5061988830566406, 1.0667095184326172, 4.489372253417969, -0.556427001953125, 2.580760955810547, 0.6868858337402344, 1.3955669403076172, 4.975124359130859, 3.203369140625, 3.8080596923828125, 6.39811897277832, 3.4958724975585938, 6.527503967285156, 3.3841686248779297, 1.2030677795410156, -1.7205352783203125, 1.3374900817871094, -2.5358409881591797, 3.341327667236328, 3.8643951416015625, 0.5308380126953125, 2.9989852905273438, -1.0320587158203125, -0.07657623291015625, -1.111058235168457, 2.214569091796875, -0.6063957214355469, -1.4256134033203125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000628.npy"}
{"epoch": 0.9493575207860923, "step": 629, "batch_size": 64, "mean": 1.6929101943969727, "std": 2.4245822429656982, "min": -4.030342102050781, "p10": -0.9669601440429687, "median": 1.5189037322998047, "p90": 5.131418228149415, "max": 7.647880554199219, "pos_frac": 0.71875, "sample": [-1.4121551513671875, 0.64892578125, -0.7883319854736328, 1.7658958435058594, 1.4012813568115234, 2.396137237548828, -0.5008392333984375, 1.9262733459472656, 2.904388427734375, -0.9769668579101562, 2.3782882690429688, 0.78887939453125, 2.5639820098876953, 5.41558837890625, 0.2779350280761719, -1.9394607543945312, 3.5691471099853516, 1.5795211791992188, 1.4198760986328125, 3.7729873657226562, -4.030342102050781, 2.6834335327148438, -1.7562179565429688, -0.23560523986816406, -0.42858123779296875, -0.23711395263671875, 6.5975189208984375, 6.405792236328125, 0.3322334289550781, 4.176887512207031, 2.50238037109375, -0.4892730712890625, 1.2768173217773438, 2.8981475830078125, 4.5726776123046875, 1.9862060546875, 2.0131454467773438, -0.9436111450195312, -0.18517684936523438, -0.19539642333984375, -0.0835113525390625, 0.4452476501464844, 4.893989562988281, 2.0778770446777344, 0.7905845642089844, 1.9713821411132812, 1.4233856201171875, 3.1897125244140625, 1.4582862854003906, 5.233173370361328, 7.249015808105469, 0.3635978698730469, 4.441551208496094, 0.35027313232421875, 1.935821533203125, 1.0616607666015625, -1.8958740234375, 3.8126983642578125, 5.357891082763672, 7.647880554199219, -0.13983154296875, 2.9351959228515625, -2.7503662109375, 2.4413299560546875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000629.npy"}
{"epoch": 0.9508692365835223, "step": 630, "batch_size": 64, "mean": 1.9530054330825806, "std": 2.488100528717041, "min": -1.9485416412353516, "p10": -1.0397283554077148, "median": 1.4955253601074219, "p90": 5.367864990234375, "max": 9.078567504882812, "pos_frac": 0.75, "sample": [-1.044839859008789, 1.206268310546875, 1.8603439331054688, 0.8386459350585938, 6.834562301635742, 5.764678955078125, 2.5930404663085938, 0.5297050476074219, 7.113304138183594, 3.620269775390625, -0.6786117553710938, 2.541492462158203, -1.8719940185546875, 4.2347412109375, -0.2925376892089844, 3.4501609802246094, -1.9485416412353516, 0.8907814025878906, 2.484546661376953, 3.557220458984375, -0.313262939453125, 0.800140380859375, 5.189666748046875, 1.994232177734375, 5.339752197265625, 4.86309814453125, -0.6353569030761719, 3.456022262573242, 1.6079902648925781, 1.1838607788085938, 0.674407958984375, 0.2425994873046875, -1.2087287902832031, 4.03985595703125, 2.8070831298828125, 2.95062255859375, 2.6839218139648438, -0.5974502563476562, 1.0689353942871094, 0.6443939208984375, 9.078567504882812, 4.508100509643555, 2.5249061584472656, -1.1783828735351562, 6.107963562011719, 5.59576416015625, 0.0912628173828125, 0.843536376953125, 3.6164283752441406, -1.444915771484375, 2.683135986328125, 0.55126953125, -0.49697113037109375, 4.401039123535156, 2.3282699584960938, 0.3294830322265625, -1.027801513671875, 0.6252517700195312, -1.4711380004882812, 3.4519119262695312, -0.8625831604003906, 5.379913330078125, 1.3830604553222656, -0.5007495880126953], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000630.npy"}
{"epoch": 0.9523809523809523, "step": 631, "batch_size": 64, "mean": 2.160503387451172, "std": 2.8813834190368652, "min": -4.2795867919921875, "p10": -1.1148780822753905, "median": 1.8405656814575195, "p90": 5.995543670654298, "max": 9.16796875, "pos_frac": 0.8125, "sample": [0.958221435546875, -1.6452178955078125, -1.2646484375, 3.6251068115234375, 1.9540462493896484, 4.04266357421875, 4.2972564697265625, 4.974639892578125, 0.0462646484375, 1.07781982421875, 3.503173828125, -2.948305130004883, 1.4602317810058594, 4.330535888671875, -1.0051822662353516, 0.7959308624267578, 1.698495864868164, 2.1982994079589844, 0.9786529541015625, 8.8343505859375, 6.121101379394531, -0.5410385131835938, 9.16796875, 3.7818222045898438, 2.88140869140625, 3.65155029296875, 1.0613327026367188, 5.70257568359375, -0.86688232421875, 1.7408113479614258, 3.1492156982421875, 0.66357421875, -4.2795867919921875, 0.1393117904663086, 3.1184349060058594, 3.7578887939453125, 6.901691436767578, 0.5557098388671875, 9.113006591796875, 7.5373077392578125, 0.6594696044921875, -1.1210098266601562, 0.536773681640625, 2.4567127227783203, -1.1005706787109375, -1.96014404296875, 0.9923553466796875, 1.9403200149536133, -2.907512664794922, 4.854583740234375, 1.0358657836914062, 3.5418853759765625, -0.33026123046875, 6.381317138671875, 0.0110321044921875, 0.2314300537109375, 0.5101203918457031, 5.04376220703125, 2.4371070861816406, 4.011474609375, 3.196025848388672, 3.6208267211914062, 0.56634521484375, 2.3947601318359375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000631.npy"}
{"epoch": 0.9538926681783825, "step": 632, "batch_size": 64, "mean": 1.8749642372131348, "std": 2.473240375518799, "min": -4.263568878173828, "p10": -0.8978256225585932, "median": 1.7920808792114258, "p90": 4.8755233764648445, "max": 8.098142623901367, "pos_frac": 0.828125, "sample": [1.3561553955078125, 4.917083740234375, -1.117034912109375, -0.09777641296386719, 4.036796569824219, 8.098142623901367, -0.3863372802734375, 5.498004913330078, 1.75665283203125, 2.791606903076172, 0.2637786865234375, -3.2808837890625, 1.1870880126953125, 1.8275089263916016, 0.12474536895751953, 3.9446487426757812, 1.9528427124023438, 1.1832294464111328, 2.554330825805664, -4.263568878173828, 1.5922317504882812, 1.1951179504394531, -0.32201576232910156, 0.89813232421875, 1.2362823486328125, 3.9352874755859375, 0.24920654296875, 0.7400588989257812, 0.6520576477050781, 4.001262664794922, 1.36767578125, 3.850006103515625, 3.1285171508789062, -1.18096923828125, 2.6341094970703125, 4.158454895019531, 3.49591064453125, 0.1476306915283203, 5.1825408935546875, 4.062797546386719, 3.5405349731445312, 3.8817825317382812, 0.41365814208984375, 1.0276374816894531, 0.32521820068359375, 4.927463531494141, 2.0013885498046875, 1.2147598266601562, 3.7932357788085938, 4.117321014404297, 5.839813232421875, -4.1966552734375, 3.4788970947265625, 4.7785491943359375, -2.5843048095703125, 1.1301193237304688, 6.372894287109375, 2.3325729370117188, 2.9897994995117188, 1.9509353637695312, 2.0140228271484375, 0.6405010223388672, -3.2268600463867188, -0.13687896728515625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000632.npy"}
{"epoch": 0.9554043839758125, "step": 633, "batch_size": 64, "mean": 1.9704585075378418, "std": 2.9942333698272705, "min": -4.324821472167969, "p10": -1.3990752220153808, "median": 1.6279964447021484, "p90": 5.712250137329102, "max": 11.182815551757812, "pos_frac": 0.71875, "sample": [-1.4431695938110352, -0.010814666748046875, 1.4069862365722656, 1.5774803161621094, 3.5813980102539062, 5.73126220703125, 3.4456329345703125, 3.64715576171875, 2.635293960571289, 0.3798065185546875, 0.49321746826171875, 0.6096782684326172, 8.43878173828125, 0.6588001251220703, -0.9727840423583984, 1.7952728271484375, -0.39166259765625, -2.02630615234375, 3.0644607543945312, -0.8812141418457031, 0.037601470947265625, -1.6099319458007812, 3.7604293823242188, 3.8509292602539062, 2.2190074920654297, 3.9992713928222656, -1.214223861694336, 1.4273300170898438, 1.4379806518554688, 1.68328857421875, 2.6693496704101562, 2.5883045196533203, 4.425365447998047, 1.1634521484375, -2.061126708984375, 1.6785125732421875, -0.6790695190429688, -1.2961883544921875, 0.5187606811523438, -4.324821472167969, 1.0897064208984375, 0.2494964599609375, 2.204792022705078, 11.182815551757812, 0.4658355712890625, -0.020554542541503906, 3.261850357055664, -1.9831085205078125, 5.157260894775391, 3.9954986572265625, 8.011146545410156, 2.9148006439208984, 3.1113357543945312, 5.667888641357422, -0.6934528350830078, -1.6709671020507812, 2.028522491455078, -0.5804176330566406, 6.01849365234375, 7.719020843505859, 9.026641845703125, -1.1812152862548828, 4.661834716796875, 3.4586181640625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000633.npy"}
{"epoch": 0.9569160997732427, "step": 634, "batch_size": 64, "mean": 0.8736119270324707, "std": 2.48030161857605, "min": -3.1972808837890625, "p10": -1.8839881896972654, "median": 0.3092384338378906, "p90": 3.8004234313964846, "max": 10.649139404296875, "pos_frac": 0.59375, "sample": [0.874603271484375, -2.418304443359375, -0.8520545959472656, 4.210601806640625, 3.8317413330078125, -0.4228076934814453, 0.224395751953125, 0.17264938354492188, -1.9253768920898438, -0.30120849609375, -0.2112903594970703, -0.9014358520507812, -2.575054168701172, 0.39408111572265625, -1.1933441162109375, 0.06261825561523438, 1.8780136108398438, 0.4015464782714844, -1.78741455078125, 0.7710132598876953, -3.1972808837890625, 0.08443069458007812, 1.0129241943359375, -3.1347579956054688, 6.6478271484375, 0.0006771087646484375, 3.905414581298828, 3.4160690307617188, 5.0229644775390625, -2.0997276306152344, -0.30035400390625, 0.47344970703125, -0.5460662841796875, -1.6040420532226562, 3.0977001190185547, -0.93109130859375, 1.7842559814453125, -0.051151275634765625, 10.649139404296875, 3.7273483276367188, 1.9946975708007812, -0.622222900390625, -1.1468124389648438, 2.0790176391601562, -0.137786865234375, -2.102874755859375, -1.7353687286376953, 3.291738510131836, 6.248222351074219, -0.07398223876953125, -0.436004638671875, -0.5618133544921875, 2.3254051208496094, 1.3090476989746094, 1.970916748046875, 0.15963363647460938, 3.2965431213378906, 2.6508941650390625, 0.5897293090820312, 2.6143798828125, 0.5359573364257812, 0.9537925720214844, 1.591705322265625, 2.9256439208984375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000634.npy"}
{"epoch": 0.9584278155706727, "step": 635, "batch_size": 64, "mean": 1.2240850925445557, "std": 2.6812760829925537, "min": -3.9102001190185547, "p10": -1.5326980590820312, "median": 0.6031427383422852, "p90": 4.333687591552736, "max": 11.092826843261719, "pos_frac": 0.6875, "sample": [-1.3018035888671875, 2.660978317260742, -3.4190597534179688, -0.1526622772216797, 5.841400146484375, 0.74615478515625, -0.13362884521484375, -3.9102001190185547, -1.5778656005859375, 7.009670257568359, 1.3747520446777344, 3.7796859741210938, -0.2631721496582031, -2.18414306640625, 4.467796325683594, 5.4954833984375, 3.995220184326172, 1.3733062744140625, 2.5137481689453125, 1.73370361328125, 0.4505157470703125, 0.23361968994140625, -0.44815826416015625, 4.653770446777344, -1.42730712890625, -0.14524078369140625, 2.6542434692382812, 0.05169677734375, 0.7159824371337891, 1.1447525024414062, 1.66412353515625, -2.0081634521484375, 3.6552505493164062, -0.297576904296875, 0.88800048828125, 0.0846710205078125, 2.06817626953125, -0.13729095458984375, 1.4883499145507812, 0.0928955078125, 0.49030303955078125, 3.9703750610351562, -0.8005104064941406, 8.576461791992188, 0.06874847412109375, -2.3127479553222656, 3.3458404541015625, 4.0207672119140625, 0.351531982421875, 2.387969970703125, 0.8310623168945312, 11.092826843261719, 0.08330535888671875, 0.222991943359375, -3.0585479736328125, 1.0218276977539062, 1.418182373046875, -0.5186996459960938, 0.3731536865234375, 0.23024749755859375, -0.5121917724609375, 2.613668441772461, 1.48846435546875, -0.47525978088378906], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000635.npy"}
{"epoch": 0.9599395313681028, "step": 636, "batch_size": 64, "mean": 1.9450714588165283, "std": 2.307241678237915, "min": -2.1137466430664062, "p10": -0.9215667724609374, "median": 1.5488452911376953, "p90": 4.466769790649415, "max": 9.519683837890625, "pos_frac": 0.796875, "sample": [5.634971618652344, -1.8325271606445312, 0.3997478485107422, 1.3100051879882812, 4.384288787841797, 1.9024276733398438, 4.441837310791016, 1.5657615661621094, 3.620758056640625, 4.746864318847656, 0.28621673583984375, -0.9626007080078125, 4.477455139160156, 3.273944854736328, 2.110931396484375, 0.3137359619140625, 1.5319290161132812, 1.3926849365234375, 1.4849853515625, 2.243022918701172, 1.7660064697265625, 3.6445846557617188, 0.4213066101074219, 2.858790397644043, 2.3319358825683594, 3.4907608032226562, 9.519683837890625, -1.205963134765625, 0.8652935028076172, 3.1792984008789062, 0.31606292724609375, 2.587453842163086, 8.02520751953125, 3.395660400390625, -0.5755233764648438, 2.8363037109375, 0.37142181396484375, 1.4093437194824219, -0.1665191650390625, -0.5891647338867188, 0.3193206787109375, 4.038093566894531, 0.24155044555664062, 5.5738525390625, -0.0972137451171875, -2.1137466430664062, 6.3272552490234375, 3.1332931518554688, 1.502248764038086, 3.0327682495117188, 1.772247314453125, -0.8258209228515625, 3.9050140380859375, 3.9219284057617188, 0.638427734375, -1.0010147094726562, -1.67962646484375, -0.014032363891601562, -1.1120414733886719, 2.956878662109375, 1.1886444091796875, 1.2500762939453125, 1.2820816040039062, 3.436004638671875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000636.npy"}
{"epoch": 0.9614512471655329, "step": 637, "batch_size": 64, "mean": 1.748998761177063, "std": 2.5614171028137207, "min": -2.940277099609375, "p10": -0.9691677093505858, "median": 1.2151165008544922, "p90": 6.034015655517579, "max": 7.657976150512695, "pos_frac": 0.8125, "sample": [2.66278076171875, 4.215156555175781, 2.0646095275878906, 6.57708740234375, -0.8705673217773438, 5.9022369384765625, -0.710479736328125, 1.1958732604980469, 2.30029296875, 1.5340423583984375, 1.6212120056152344, 1.7217025756835938, 0.47225189208984375, 1.0013504028320312, 0.740692138671875, 0.05530548095703125, -1.0114250183105469, 3.218475341796875, 6.65875244140625, 0.054882049560546875, 2.781158447265625, -2.2925872802734375, 1.5268173217773438, 0.643218994140625, -1.345071792602539, 1.3230743408203125, 1.0123519897460938, 6.090492248535156, 6.389617919921875, 5.730674743652344, 7.657976150512695, 0.39754486083984375, -2.87310791015625, 0.8900146484375, 0.4922332763671875, 2.1391983032226562, 2.5224151611328125, 1.8516349792480469, 2.754364013671875, 5.7465362548828125, 0.7262725830078125, 5.286651611328125, -0.39655303955078125, 1.8647270202636719, 1.72802734375, 0.3505687713623047, -2.8750991821289062, 1.2343597412109375, 0.3937530517578125, 0.8152580261230469, -0.12487030029296875, 7.213401794433594, 0.1939697265625, 1.1935958862304688, 1.4334869384765625, 0.2429656982421875, 5.276420593261719, -0.1533489227294922, -1.4612197875976562, -2.940277099609375, 0.5515899658203125, 6.117183685302734, 0.9686317443847656, 1.4536361694335938], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000637.npy"}
{"epoch": 0.9629629629629629, "step": 638, "batch_size": 64, "mean": 1.9968384504318237, "std": 2.4510855674743652, "min": -3.423004150390625, "p10": -0.8388732910156248, "median": 2.113748550415039, "p90": 5.326119232177735, "max": 7.345832824707031, "pos_frac": 0.765625, "sample": [-0.935791015625, 2.6679458618164062, 4.927978515625, 0.6124820709228516, -3.423004150390625, -1.8349838256835938, 5.354988098144531, 3.148101806640625, 0.07196044921875, -1.6879959106445312, 0.022922515869140625, -0.61273193359375, 0.498199462890625, 0.299072265625, 1.4677581787109375, 2.8191375732421875, 1.1158599853515625, -1.5925979614257812, 4.293285369873047, -0.3100128173828125, 6.279937744140625, 0.5180511474609375, 2.5898971557617188, 2.6933135986328125, -2.87359619140625, 4.363494873046875, 3.8779983520507812, 4.568180084228516, -0.4019966125488281, 2.6867713928222656, 2.2512168884277344, 3.8824195861816406, 3.356292724609375, 5.790077209472656, 6.013671875, -0.00366973876953125, 1.140167236328125, 3.791614532470703, 3.1690826416015625, 3.6256484985351562, 3.5495834350585938, 3.5790786743164062, 1.5280303955078125, 0.9241943359375, 3.5315780639648438, -0.097503662109375, 1.7451324462890625, 6.757904052734375, -0.1642913818359375, 1.9788627624511719, 0.020170211791992188, 2.2486343383789062, 5.258758544921875, 2.6998443603515625, -0.2776145935058594, 3.4423704147338867, -1.8007125854492188, 7.345832824707031, 6.483131408691406, -0.142852783203125, 1.0473175048828125, 0.639862060546875, 0.81134033203125, 2.4678611755371094], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000638.npy"}
{"epoch": 0.9644746787603931, "step": 639, "batch_size": 64, "mean": 1.4056884050369263, "std": 2.2519547939300537, "min": -4.4272613525390625, "p10": -0.9347475051879881, "median": 1.3454132080078125, "p90": 4.306348419189454, "max": 7.626838684082031, "pos_frac": 0.703125, "sample": [2.977691650390625, 3.7189178466796875, 1.5712966918945312, 0.5966796875, 4.185272216796875, 2.0602264404296875, 1.9387168884277344, -0.6285247802734375, 1.1195297241210938, 2.0402793884277344, 0.21869659423828125, 2.9612960815429688, 5.194091796875, -0.6703948974609375, 4.358238220214844, 0.1542510986328125, 1.772115707397461, -0.03643035888671875, -2.0834808349609375, 3.7750778198242188, 1.7057609558105469, 4.941642761230469, 1.696554183959961, -1.170684814453125, -2.9228744506835938, 3.680072784423828, -0.2937431335449219, 3.7426071166992188, -1.0009613037109375, 2.3957366943359375, 4.8480987548828125, -0.7802486419677734, 0.2988567352294922, 2.416492462158203, -0.20374679565429688, 0.039520263671875, 0.2881965637207031, 1.84771728515625, 7.626838684082031, 2.7954139709472656, 3.4624767303466797, -0.369140625, 5.269744873046875, 2.7692947387695312, 0.43109130859375, 0.3697929382324219, -1.39788818359375, -0.0864706039428711, 1.0087127685546875, -0.6007041931152344, 1.85430908203125, 3.787576675415039, 1.1030960083007812, 0.058864593505859375, 0.068878173828125, -1.92047119140625, -0.4523429870605469, -4.4272613525390625, -0.33138275146484375, 5.547050476074219, -0.6478977203369141, 2.4239730834960938, 2.010955810546875, 2.8570022583007812], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000639.npy"}
{"epoch": 0.9659863945578231, "step": 640, "batch_size": 64, "mean": 1.1371290683746338, "std": 2.447139263153076, "min": -4.626640319824219, "p10": -2.1341352462768555, "median": 1.2594900131225586, "p90": 3.5479690551757814, "max": 6.933197021484375, "pos_frac": 0.703125, "sample": [1.5349769592285156, -0.15428924560546875, 2.8725128173828125, -3.3149337768554688, 1.2920513153076172, 0.66143798828125, 2.9152145385742188, 1.2269287109375, 6.933197021484375, 2.5873336791992188, -1.1719589233398438, 0.9261245727539062, 5.3160247802734375, 0.947601318359375, -1.78985595703125, -1.5797271728515625, 1.0481300354003906, 3.3353729248046875, -0.08005332946777344, 3.7470779418945312, -3.3608360290527344, 1.2923469543457031, 2.3905715942382812, 0.365631103515625, -2.003124237060547, -2.1902828216552734, 1.640869140625, 1.4834747314453125, -4.62647819519043, 1.663339614868164, 3.5411758422851562, -2.4144439697265625, 0.77410888671875, 3.511760711669922, 3.2776565551757812, 1.1188545227050781, -4.626640319824219, 1.1952056884765625, 1.7794342041015625, 0.148712158203125, -0.47643280029296875, 2.9660491943359375, -0.3839988708496094, 3.4280853271484375, 3.5508804321289062, 3.0367813110351562, 1.875244140625, -3.2294578552246094, 1.1834259033203125, 3.2561187744140625, 1.3392715454101562, 4.3868255615234375, -1.4477386474609375, 2.89599609375, 0.9532241821289062, -1.1114530563354492, 5.301364898681641, -1.4284820556640625, 3.25408935546875, 2.066160202026367, 0.6235198974609375, 5.305583953857422, -0.27275848388671875, 3.4894561767578125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000640.npy"}
{"epoch": 0.9674981103552532, "step": 641, "batch_size": 64, "mean": 1.3670417070388794, "std": 2.29040265083313, "min": -3.8548202514648438, "p10": -1.157412338256836, "median": 1.3364982604980469, "p90": 4.393866729736328, "max": 7.1380767822265625, "pos_frac": 0.6875, "sample": [-0.5432701110839844, -2.23687744140625, 3.0391292572021484, 0.04598236083984375, -0.327545166015625, -0.32016754150390625, 4.8792877197265625, -0.0413818359375, 0.6040611267089844, -0.15091705322265625, -1.462982177734375, 0.1263113021850586, 2.100048065185547, 6.9383544921875, -1.1359405517578125, 1.2834854125976562, 1.9390945434570312, 6.437843322753906, 5.676094055175781, 3.6704673767089844, 2.573497772216797, 0.5199851989746094, -0.8078994750976562, -0.40509796142578125, 2.3877220153808594, 3.6154022216796875, 0.386505126953125, 1.1171722412109375, 1.9195518493652344, 4.3475799560546875, -3.196929931640625, 1.43408203125, 0.132110595703125, -0.050445556640625, 1.3895111083984375, -0.6182937622070312, 2.1215782165527344, -0.284423828125, -0.8323516845703125, 2.521209716796875, -3.8548202514648438, 3.9722557067871094, 1.8288040161132812, -1.756134033203125, 2.424633026123047, 3.9639739990234375, -1.6981620788574219, 0.7756748199462891, 1.440718650817871, 1.8414459228515625, 7.1380767822265625, 2.0090694427490234, 4.413703918457031, 1.1296062469482422, 1.97991943359375, -1.1666145324707031, 0.8419952392578125, 0.8287124633789062, 1.748687744140625, 2.3291549682617188, -0.6367378234863281, 1.7964553833007812, 2.6740550994873047, 4.674652099609375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000641.npy"}
{"epoch": 0.9690098261526833, "step": 642, "batch_size": 64, "mean": 1.5288479328155518, "std": 2.551440477371216, "min": -3.94793701171875, "p10": -0.8567031860351563, "median": 1.4312973022460938, "p90": 5.122866821289064, "max": 8.032379150390625, "pos_frac": 0.6875, "sample": [5.484703063964844, 1.64459228515625, -0.45330047607421875, -0.6376399993896484, 0.6889801025390625, -0.5040359497070312, 2.572845458984375, 8.032379150390625, 5.479400634765625, -3.5206222534179688, 6.66595458984375, -0.7075996398925781, -0.5161361694335938, -0.8673095703125, 2.2727432250976562, 1.8783493041992188, 2.4234619140625, 5.799095153808594, -0.5014457702636719, -2.0285797119140625, 2.0780868530273438, -0.03448486328125, 3.8927993774414062, -0.7779045104980469, 2.363048553466797, -0.09928131103515625, 5.20025634765625, 1.1261444091796875, 1.9605712890625, 3.0641403198242188, 2.1783485412597656, 0.4303741455078125, 3.45245361328125, 1.1001319885253906, -0.42012786865234375, 3.9910888671875, -0.7036285400390625, 1.5135955810546875, 2.3781890869140625, -3.94793701171875, 1.3462982177734375, 0.42209625244140625, 0.6221103668212891, -3.29827880859375, 0.7881393432617188, 2.2687950134277344, -3.58758544921875, 3.0773353576660156, 4.942291259765625, 6.140373229980469, 1.3489990234375, 0.40326690673828125, 4.451568603515625, 3.637157440185547, 4.870115280151367, 2.9759521484375, 0.16151046752929688, 1.9422454833984375, -0.8319549560546875, -1.863983154296875, 2.327545166015625, -0.05680084228515625, 0.7039031982421875, 3.1034698486328125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000642.npy"}
{"epoch": 0.9705215419501134, "step": 643, "batch_size": 64, "mean": 1.4491500854492188, "std": 2.593461513519287, "min": -4.483123779296875, "p10": -1.7675260543823241, "median": 1.2225112915039062, "p90": 4.912811279296876, "max": 9.02554702758789, "pos_frac": 0.765625, "sample": [-0.849151611328125, 2.2179832458496094, 1.1869125366210938, -4.483123779296875, 1.1774272918701172, 3.6826248168945312, 2.185880661010742, 0.3907928466796875, 1.737152099609375, 1.1220703125, -2.6805877685546875, 0.718994140625, 0.4766693115234375, 2.5767364501953125, 1.9833984375, 1.285858154296875, 0.997222900390625, -0.9037742614746094, 6.933540344238281, 7.1745758056640625, 3.190357208251953, 2.023326873779297, 3.8897151947021484, 3.401397705078125, -0.2526397705078125, 4.7135009765625, 2.3020172119140625, 0.6397705078125, 5.608787536621094, 0.53314208984375, 1.9661445617675781, 1.8630313873291016, -1.4214935302734375, 2.239776611328125, 0.353668212890625, 1.2581100463867188, 0.19525146484375, 0.09360504150390625, 9.02554702758789, 6.331817626953125, 3.2610931396484375, -1.8206100463867188, 2.332916259765625, -1.6436634063720703, 0.5582275390625, -2.625885009765625, -2.1551284790039062, -4.114524841308594, -0.2609710693359375, 2.3650970458984375, 0.6627197265625, 2.4561920166015625, 4.99822998046875, 1.1164264678955078, 1.498291015625, -1.5817184448242188, 0.338165283203125, 5.209510803222656, -0.2048797607421875, 2.4570980072021484, 2.3357925415039062, 0.40423583984375, 4.148601531982422, -1.875640869140625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000643.npy"}
{"epoch": 0.9720332577475435, "step": 644, "batch_size": 64, "mean": 1.4658468961715698, "std": 2.3298091888427734, "min": -3.7558860778808594, "p10": -1.5105007171630858, "median": 1.2619991302490234, "p90": 4.75535125732422, "max": 6.386440277099609, "pos_frac": 0.703125, "sample": [3.924030303955078, 0.9191131591796875, -3.1858444213867188, 3.006175994873047, -1.3609352111816406, -1.80810546875, 1.6325111389160156, 1.7342109680175781, 4.926492691040039, 3.3033599853515625, 1.0098114013671875, -1.2540435791015625, 0.0846710205078125, 3.324066162109375, 3.7588043212890625, 0.9213409423828125, -0.5985794067382812, 1.9390411376953125, -0.5115509033203125, 4.897653579711914, -0.09586906433105469, 2.6827392578125, -3.7558860778808594, 0.046382904052734375, 0.9369697570800781, 3.862752914428711, -0.5914802551269531, 3.0456008911132812, 0.5203666687011719, -0.110198974609375, 3.0954742431640625, 3.201171875, -0.3329906463623047, 2.084545135498047, 5.082801818847656, -0.9205093383789062, 0.1994781494140625, 4.544921875, 3.660907745361328, 0.9208984375, 1.69110107421875, 1.3155593872070312, 2.986724853515625, 1.2084388732910156, -1.8870353698730469, -3.0481491088867188, -0.5679702758789062, 5.895946502685547, 2.892030715942383, 3.7566604614257812, 1.9486541748046875, 1.1734390258789062, -0.46062755584716797, 0.9854812622070312, 1.5180587768554688, 2.7548828125, -1.5746002197265625, -0.4389610290527344, 2.8771820068359375, 1.0735511779785156, -1.9087600708007812, 5.6503143310546875, 4.8455352783203125, 6.386440277099609], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000644.npy"}
{"epoch": 0.9735449735449735, "step": 645, "batch_size": 64, "mean": 2.0425095558166504, "std": 2.4226725101470947, "min": -2.9500293731689453, "p10": -0.6465248107910155, "median": 1.5384273529052734, "p90": 4.866390991210937, "max": 8.748275756835938, "pos_frac": 0.796875, "sample": [2.6986846923828125, 3.633331298828125, 8.276180267333984, 0.32293701171875, 4.8688812255859375, 8.159782409667969, 8.748275756835938, -2.9500293731689453, 1.5234336853027344, -0.7249679565429688, -0.09389495849609375, -1.6024169921875, 0.6804084777832031, 1.2146110534667969, 4.8605804443359375, 0.7708511352539062, 3.845212936401367, 1.9786510467529297, 0.46263885498046875, -0.3817005157470703, 4.983795166015625, -0.5369720458984375, -0.5845260620117188, 2.5256195068359375, 0.746490478515625, 1.5534210205078125, 2.0607986450195312, 3.5290908813476562, 2.9635047912597656, 4.192394256591797, 2.601165771484375, 2.067990303039551, 3.4205150604248047, 1.2491836547851562, 0.182159423828125, 3.9731292724609375, -1.53155517578125, 0.04158782958984375, 1.1470947265625, 3.5419464111328125, 4.521186828613281, 3.0884323120117188, 6.131561279296875, 0.31047821044921875, 0.5841941833496094, 4.241607666015625, 1.9069747924804688, 1.070169448852539, 1.8402252197265625, 5.623096466064453, 0.9486122131347656, 4.021511077880859, 1.196441650390625, 4.436851501464844, -0.673095703125, 0.6209297180175781, 2.740386962890625, -0.6731643676757812, -0.24114990234375, -1.04229736328125, -0.268768310546875, 4.144914627075195, 0.423858642578125, 1.3493804931640625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000645.npy"}
{"epoch": 0.9750566893424036, "step": 646, "batch_size": 64, "mean": 1.691102147102356, "std": 2.7036261558532715, "min": -7.004791259765625, "p10": -0.7271522521972655, "median": 1.5547904968261719, "p90": 5.737337112426759, "max": 7.849853515625, "pos_frac": 0.734375, "sample": [-0.5207366943359375, 0.14445877075195312, 2.6312103271484375, 1.9751205444335938, 4.876392364501953, 1.3870162963867188, 2.155364990234375, 5.8260955810546875, 1.6258268356323242, 2.338970184326172, -0.7803421020507812, 1.93389892578125, 5.588481903076172, 5.099403381347656, 2.268218994140625, -0.3306598663330078, 1.5390777587890625, -7.004791259765625, 6.165283203125, 2.534585952758789, 5.2305908203125, 1.7193450927734375, -2.6339073181152344, 2.779308319091797, -1.1438941955566406, 5.8011322021484375, 1.8592796325683594, 0.5415420532226562, 0.55780029296875, 0.8393020629882812, 2.1451568603515625, 2.0426559448242188, 1.7301406860351562, 6.503349304199219, -0.17183685302734375, 1.0605506896972656, 0.6583700180053711, -0.058563232421875, 1.5705032348632812, 3.3039932250976562, 0.38669776916503906, -2.3703842163085938, 1.3116683959960938, -0.314361572265625, 1.6744232177734375, 7.849853515625, -0.21399688720703125, 4.96441650390625, -0.10205745697021484, 6.1490936279296875, 3.3584556579589844, -0.2485504150390625, 1.4321136474609375, 3.6684494018554688, -4.671545028686523, 0.3771247863769531, 2.053802490234375, -0.9007949829101562, 0.18783187866210938, 7.611152648925781, -0.040386199951171875, 1.5340995788574219, -0.6030426025390625, 1.3487777709960938], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000646.npy"}
{"epoch": 0.9765684051398337, "step": 647, "batch_size": 64, "mean": 2.2074756622314453, "std": 2.6861789226531982, "min": -3.492767333984375, "p10": -0.6750946044921875, "median": 1.7012653350830078, "p90": 5.918302917480475, "max": 10.57830810546875, "pos_frac": 0.828125, "sample": [-0.6152839660644531, 0.45394134521484375, 2.9541778564453125, 2.7840499877929688, 2.9762725830078125, 1.6316986083984375, -0.3828620910644531, 7.36798095703125, 1.076080322265625, 3.9555816650390625, 1.1602630615234375, 2.865966796875, 1.5573272705078125, 0.7678337097167969, -0.6914138793945312, 2.373779296875, 9.438018798828125, -0.49607086181640625, 0.0042266845703125, 0.010406494140625, 3.4576339721679688, 2.5548324584960938, 3.6083602905273438, 1.5912246704101562, 8.306930541992188, 4.197479248046875, 1.137786865234375, 1.7008285522460938, 2.64630126953125, 4.107460021972656, 2.1061019897460938, 1.4478912353515625, 1.1997528076171875, 0.3968162536621094, 1.3168792724609375, -0.6370162963867188, 1.1060638427734375, 3.3712196350097656, -1.5442962646484375, -2.17327880859375, 1.7932052612304688, 2.3090667724609375, 1.7017021179199219, 3.312347412109375, -3.492767333984375, 1.2770538330078125, 1.8935394287109375, 1.4652099609375, -1.0086135864257812, 2.243274688720703, -0.8246688842773438, 8.206695556640625, 3.3658447265625, 2.63995361328125, 0.980194091796875, 6.967475891113281, 0.6584529876708984, 10.57830810546875, 3.851715087890625, 4.2391510009765625, 6.637939453125, -1.0926895141601562, 3.4322757720947266, 1.0528335571289062], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000647.npy"}
{"epoch": 0.9780801209372638, "step": 648, "batch_size": 64, "mean": 1.7139304876327515, "std": 2.196227788925171, "min": -4.514202117919922, "p10": -0.6676998138427733, "median": 1.4025611877441406, "p90": 4.657361602783203, "max": 8.249397277832031, "pos_frac": 0.828125, "sample": [0.5307235717773438, 2.67095947265625, 1.3031692504882812, -4.514202117919922, 4.664756774902344, 2.8300399780273438, -2.515411376953125, 4.433677673339844, 2.0613555908203125, 2.9366188049316406, 0.7659492492675781, 1.8095283508300781, 1.501953125, 1.2772445678710938, -0.8647003173828125, 0.07583999633789062, 2.6369247436523438, 0.5601043701171875, 4.640106201171875, 3.8980712890625, 4.7863616943359375, 1.0146408081054688, 1.035247802734375, 6.147655487060547, 2.7159576416015625, 3.4755287170410156, 0.04461669921875, 4.127206802368164, 0.64520263671875, 8.249397277832031, 2.812530517578125, -0.3130378723144531, 2.1776123046875, -0.3419036865234375, 1.2311944961547852, -0.08983993530273438, 1.0108184814453125, 2.286266326904297, 1.5228385925292969, 2.8156356811523438, 0.6261520385742188, 1.0710487365722656, 6.576416015625, 0.42429351806640625, -0.55035400390625, 1.756256103515625, 5.02587890625, 1.1769580841064453, 2.0064544677734375, 1.2667503356933594, 2.4813003540039062, 2.3217124938964844, 2.0319671630859375, -1.1243896484375, -2.672882080078125, 2.593902587890625, 5.192375183105469, -0.7179908752441406, 1.0726242065429688, -1.165985107421875, 0.30789947509765625, 0.7726421356201172, 1.182769775390625, 1.979116439819336], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000648.npy"}
{"epoch": 0.9795918367346939, "step": 649, "batch_size": 64, "mean": 1.6918833255767822, "std": 2.781395673751831, "min": -4.778839111328125, "p10": -1.7156345367431638, "median": 1.9220585823059082, "p90": 5.152149581909181, "max": 8.740013122558594, "pos_frac": 0.71875, "sample": [6.344047546386719, 2.5510215759277344, -0.04979515075683594, 1.904510498046875, 3.8532180786132812, 2.381317138671875, 1.5366363525390625, 4.4613189697265625, 1.6187572479248047, 5.238704681396484, 0.6166954040527344, 3.649261474609375, 1.3985519409179688, -1.8794784545898438, 3.510772705078125, -0.8055343627929688, -0.4336700439453125, 1.9396066665649414, 3.953219413757324, 1.9598159790039062, 2.1138954162597656, -4.3341569900512695, -4.017887115478516, -0.65557861328125, -3.5246734619140625, 3.908702850341797, 4.505760192871094, -1.1260643005371094, -2.137643814086914, -1.0164642333984375, 3.9238052368164062, 2.3032684326171875, 3.662647247314453, 2.199465751647949, 5.790470123291016, 1.1032257080078125, 2.6661148071289062, 3.998931884765625, 2.3955917358398438, 0.9137973785400391, 0.758392333984375, 8.740013122558594, 5.390594482421875, -1.3333320617675781, -0.7841644287109375, 5.337560653686523, -0.012960433959960938, 0.4966888427734375, 1.4337692260742188, 0.121002197265625, -0.0538482666015625, 4.950187683105469, -2.102508544921875, 0.077880859375, 6.8120269775390625, -4.778839111328125, -1.2975654602050781, 0.3893280029296875, 2.74847412109375, 2.889049530029297, 3.44805908203125, 3.6082305908203125, 0.4142913818359375, 4.60601806640625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000649.npy"}
{"epoch": 0.981103552532124, "step": 650, "batch_size": 64, "mean": 1.8146789073944092, "std": 2.2441818714141846, "min": -4.63836669921875, "p10": -0.6136255264282224, "median": 2.0214881896972656, "p90": 4.626901245117188, "max": 7.226295471191406, "pos_frac": 0.765625, "sample": [-0.40108489990234375, 2.1491241455078125, 4.5087890625, 5.28533935546875, 0.9141998291015625, 4.0988311767578125, -0.24611663818359375, 2.6669769287109375, 5.172412872314453, 0.4458732604980469, 0.14696502685546875, 3.709197998046875, -1.5529403686523438, 2.1904830932617188, 0.3801383972167969, 1.143218994140625, 3.5977821350097656, 3.7458229064941406, 2.0206680297851562, 0.5654449462890625, 3.0335922241210938, 2.589641571044922, -0.17959976196289062, -1.4587783813476562, 4.189613342285156, 4.677520751953125, 7.226295471191406, 4.253997802734375, 2.0556602478027344, -0.30626678466796875, 1.0567855834960938, 0.44145965576171875, -0.1294403076171875, 3.1683692932128906, -4.63836669921875, 1.4346351623535156, 4.286277770996094, 4.036769866943359, 2.035358428955078, -1.3416595458984375, 2.4493865966796875, 4.893669128417969, -2.9842529296875, 2.738292694091797, 1.450927734375, -0.449249267578125, 2.022308349609375, 2.7907943725585938, 2.4012451171875, -1.2547378540039062, 2.252470016479492, 1.0384178161621094, 1.523885726928711, 0.13483428955078125, -0.6840724945068359, 2.3720703125, -0.2624969482421875, -0.0221405029296875, 1.7005290985107422, 4.932098388671875, 6.413421630859375, 3.5212936401367188, 1.8338966369628906, 0.3538665771484375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000650.npy"}
{"epoch": 0.982615268329554, "step": 651, "batch_size": 64, "mean": 1.5195376873016357, "std": 2.391110897064209, "min": -3.867584228515625, "p10": -1.2937965393066406, "median": 1.3998680114746094, "p90": 4.75067481994629, "max": 7.437990188598633, "pos_frac": 0.71875, "sample": [1.4939231872558594, 0.29909515380859375, 1.0398712158203125, 0.468719482421875, 1.5569534301757812, -3.6618728637695312, 1.2949447631835938, 0.5859375, -1.3041534423828125, 5.336708068847656, -0.5366630554199219, -1.477325439453125, -2.5024948120117188, 1.3992691040039062, 5.099700927734375, -2.1988601684570312, -0.29126930236816406, 6.825592041015625, 2.754241943359375, 3.0509986877441406, 1.9593124389648438, -0.1688079833984375, 1.0792007446289062, 0.3111572265625, -3.867584228515625, 1.4855461120605469, 3.5267791748046875, 1.7579574584960938, -0.29933929443359375, 5.6788330078125, 4.423803329467773, 5.06378173828125, 3.9095077514648438, 0.1178741455078125, 1.0334243774414062, -0.8472995758056641, -0.31097412109375, 3.9244766235351562, -0.26324462890625, 1.8104705810546875, 4.248542785644531, 1.2107162475585938, 4.274192810058594, 1.55767822265625, -0.2697105407714844, 7.437990188598633, 4.8008880615234375, 4.633510589599609, 0.135894775390625, 2.8648509979248047, 1.8187580108642578, -1.1846160888671875, 0.9271717071533203, 2.6678619384765625, 1.9648895263671875, 2.936798095703125, 0.9390068054199219, -1.2696304321289062, 1.4004669189453125, 1.785186767578125, -0.31266021728515625, 2.481761932373047, 4.0336761474609375, -1.3909988403320312], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000651.npy"}
{"epoch": 0.9841269841269841, "step": 652, "batch_size": 64, "mean": 1.5042537450790405, "std": 2.727827548980713, "min": -7.5779571533203125, "p10": -1.5348033905029297, "median": 1.088057518005371, "p90": 5.110802650451661, "max": 7.742225646972656, "pos_frac": 0.765625, "sample": [0.7252197265625, 4.022823333740234, 6.062507629394531, 1.3668060302734375, 2.321563720703125, 5.319316864013672, 0.06397056579589844, 4.012432098388672, 7.742225646972656, 0.8321037292480469, 5.5234527587890625, 0.97857666015625, -0.17499351501464844, 1.3815689086914062, 4.935117721557617, 3.6465530395507812, 1.3188552856445312, 3.443155288696289, 3.6995468139648438, 0.5095367431640625, -1.1585235595703125, 3.1107177734375, 2.4726715087890625, 0.15439414978027344, 1.6081924438476562, 1.0277633666992188, -7.5779571533203125, -2.7809371948242188, 4.788917541503906, 1.5702362060546875, 1.3091354370117188, -2.7192420959472656, 0.4759635925292969, 0.2359771728515625, 0.443145751953125, 0.30926513671875, 4.665245056152344, -1.4976463317871094, -2.382354736328125, 0.722381591796875, 1.0682506561279297, 7.20587158203125, -0.4741058349609375, 1.7952346801757812, -0.5980377197265625, 0.4643669128417969, 3.6657867431640625, 4.703277587890625, 0.2979574203491211, 5.18609619140625, 4.301483154296875, -0.6428756713867188, 3.150634765625, 0.44522857666015625, -2.8577880859375, -0.4734039306640625, -1.487701416015625, 0.7154369354248047, -1.5507278442382812, 5.738304138183594, 1.888824462890625, 1.1078643798828125, -1.60479736328125, 1.7193756103515625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000652.npy"}
{"epoch": 0.9856386999244142, "step": 653, "batch_size": 64, "mean": 1.6202449798583984, "std": 2.226583480834961, "min": -3.58056640625, "p10": -1.1669490814208985, "median": 2.099588394165039, "p90": 4.339478492736817, "max": 5.665496826171875, "pos_frac": 0.734375, "sample": [0.8978652954101562, 5.018074035644531, 0.1342010498046875, -1.0657310485839844, 2.089466094970703, 2.9660797119140625, 2.396270751953125, 3.7158660888671875, 4.791664123535156, -0.687103271484375, -1.1369857788085938, -0.8079833984375, 3.085561752319336, 1.50311279296875, 4.122161865234375, 3.1453933715820312, 1.6002578735351562, 4.028724670410156, 2.0242652893066406, 4.630592346191406, 2.4412765502929688, 3.8308868408203125, -1.1797904968261719, -0.12393569946289062, 1.3134536743164062, -0.33237266540527344, 2.5572128295898438, 1.3255081176757812, -0.59844970703125, 2.342498779296875, 5.665496826171875, -1.89959716796875, 0.92059326171875, 2.164379119873047, 4.210416793823242, -2.0007171630859375, 2.109710693359375, 1.7244453430175781, 2.6319732666015625, -0.7220840454101562, 0.6582183837890625, -3.58056640625, 2.1630096435546875, 3.8411407470703125, 4.053531646728516, -3.135549545288086, 4.443902969360352, 0.7315521240234375, 3.8406982421875, 3.692108154296875, -1.7455673217773438, 5.528408050537109, -2.910064697265625, 2.1632080078125, -0.87982177734375, -0.013591766357421875, 2.1299667358398438, 1.65399169921875, 2.3256988525390625, 4.3947906494140625, 2.3589859008789062, 0.17348861694335938, 0.06528091430664062, 2.9102020263671875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000653.npy"}
{"epoch": 0.9871504157218443, "step": 654, "batch_size": 64, "mean": 1.5880032777786255, "std": 2.5976412296295166, "min": -2.446014404296875, "p10": -1.6685543060302734, "median": 1.3709983825683594, "p90": 5.44521942138672, "max": 8.388023376464844, "pos_frac": 0.703125, "sample": [1.4558868408203125, 6.873435974121094, -0.080047607421875, 3.0457592010498047, 0.6952056884765625, -1.4748420715332031, 3.7629547119140625, 1.8001327514648438, 1.390350341796875, 0.6824111938476562, -0.836395263671875, -2.446014404296875, 5.011138916015625, 2.1766738891601562, -0.765655517578125, 2.0014591217041016, 1.2929420471191406, -1.6778297424316406, 7.707157135009766, 0.1458892822265625, -0.580047607421875, 2.0166397094726562, -0.22430419921875, 6.447685241699219, 1.2984981536865234, 2.561309814453125, -2.0684242248535156, -2.0213394165039062, -1.8068084716796875, 2.532573699951172, -0.7538337707519531, -0.9055633544921875, 6.80419921875, 1.098459243774414, 5.0475921630859375, 3.616912841796875, 1.6374969482421875, 0.5835609436035156, 1.3133010864257812, 1.992767333984375, 4.692208290100098, 2.935546875, -1.83953857421875, -1.037567138671875, 2.260486602783203, -1.515533447265625, 5.615631103515625, 4.3961029052734375, 1.8066539764404297, 0.687744140625, 1.3104019165039062, 5.741485595703125, 1.686594009399414, 1.2443084716796875, 8.388023376464844, -2.1781082153320312, 1.8742904663085938, 2.4811477661132812, -1.64691162109375, 1.7446441650390625, 1.3516464233398438, -1.0400543212890625, 2.9832763671875, 0.33843994140625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000654.npy"}
{"epoch": 0.9886621315192744, "step": 655, "batch_size": 64, "mean": 1.6889780759811401, "std": 2.50116229057312, "min": -2.299224853515625, "p10": -1.0575851440429687, "median": 1.4232654571533203, "p90": 4.821379470825196, "max": 11.7369384765625, "pos_frac": 0.71875, "sample": [4.0888214111328125, -0.5467567443847656, -1.1335601806640625, 1.2487964630126953, 1.249481201171875, 5.2067413330078125, 2.7010650634765625, 1.7468414306640625, -1.8853530883789062, 3.7041778564453125, 0.38037109375, 3.2349624633789062, 1.8707733154296875, 2.6251449584960938, -1.6390457153320312, -0.4896430969238281, 1.3499221801757812, 0.7294197082519531, 7.1007843017578125, 1.2303657531738281, 6.675159454345703, -0.08441162109375, 3.057025909423828, 2.4049072265625, 3.108922004699707, -1.6101036071777344, 2.12738037109375, -0.2787933349609375, 4.0552215576171875, -0.4850730895996094, 2.2183761596679688, -0.7498245239257812, 1.6616020202636719, 0.7270889282226562, 4.8909149169921875, -2.299224853515625, 0.40805816650390625, 3.468170166015625, 2.5646514892578125, 11.7369384765625, 0.704833984375, -0.6949310302734375, 0.645111083984375, 5.4185028076171875, 1.4966087341308594, -2.2034378051757812, 0.9206256866455078, 1.7134017944335938, 2.0499496459960938, 3.0001068115234375, -1.5703811645507812, 1.940032958984375, 4.081512451171875, 0.8300704956054688, 0.3232154846191406, -0.8011703491210938, 4.659130096435547, 2.0493392944335938, 2.3976287841796875, -0.88031005859375, -0.23133277893066406, 5.3431854248046875, -0.2279052734375, 0.7605133056640625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000655.npy"}
{"epoch": 0.9901738473167044, "step": 656, "batch_size": 64, "mean": 1.7811647653579712, "std": 2.835796356201172, "min": -4.824859619140625, "p10": -1.4449756622314451, "median": 1.4517478942871094, "p90": 5.33953437805176, "max": 10.165596008300781, "pos_frac": 0.796875, "sample": [4.32403564453125, 1.461395263671875, 0.5431671142578125, 1.9825477600097656, -1.5144996643066406, 1.4421005249023438, 1.5155181884765625, 3.302154541015625, 0.7945709228515625, 0.8939628601074219, 1.0710716247558594, 5.6713104248046875, 0.4180450439453125, -0.2582855224609375, -0.06975746154785156, 0.830322265625, -4.824859619140625, -1.1305007934570312, 3.3309783935546875, 3.783489227294922, 10.165596008300781, 2.3925743103027344, -4.7542877197265625, 4.2605743408203125, 3.7955169677734375, -1.2827529907226562, 3.0381317138671875, 3.50836181640625, -2.2538604736328125, 0.7564544677734375, 7.910713195800781, 0.2853507995605469, 0.39315223693847656, 2.0647735595703125, 2.642467498779297, 8.631256103515625, -0.1115264892578125, 0.3429679870605469, 5.551372528076172, 1.3070831298828125, -1.5592193603515625, 0.198394775390625, 2.642547607421875, 5.942264556884766, 2.7749404907226562, 4.845245361328125, 1.0374069213867188, 2.7969398498535156, 0.05695343017578125, 2.1181716918945312, 3.767669677734375, -0.7574653625488281, -1.599853515625, 8.405426025390625, 1.9979324340820312, 2.94476318359375, 1.5411109924316406, -3.0726356506347656, 1.352508544921875, 0.6470718383789062, 0.31642913818359375, 2.1306838989257812, 0.8604068756103516, 2.3961639404296875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000656.npy"}
{"epoch": 0.9916855631141346, "step": 657, "batch_size": 64, "mean": 2.058232545852661, "std": 2.4482226371765137, "min": -4.182762145996094, "p10": -0.7059562683105468, "median": 1.9156532287597656, "p90": 5.433784484863281, "max": 7.856437683105469, "pos_frac": 0.78125, "sample": [4.236785888671875, 1.036956787109375, 0.758392333984375, 3.200897216796875, 2.093585968017578, 4.334251403808594, 4.594085693359375, 6.391441345214844, 2.8162078857421875, -2.1305618286132812, 0.9616622924804688, -0.7550010681152344, 1.972625732421875, -0.3606758117675781, 3.2669296264648438, 2.549449920654297, 2.76898193359375, 5.8415985107421875, -0.5915184020996094, 5.3937835693359375, -0.8022613525390625, 2.1929683685302734, -1.431060791015625, 0.8582859039306641, 1.969818115234375, 3.058866500854492, 0.020355224609375, 1.8710441589355469, -0.48333740234375, 1.0613861083984375, 3.1252593994140625, 2.5915088653564453, 2.449310302734375, 5.128822326660156, 5.794189453125, 0.97576904296875, 1.3531723022460938, 7.856437683105469, 5.0589599609375, 1.237152099609375, 5.450927734375, 1.5365142822265625, 0.4966907501220703, 4.1239776611328125, -1.011077880859375, -1.89642333984375, 7.4296875, 1.3538665771484375, -0.1171112060546875, 0.3520355224609375, -0.14614105224609375, -4.182762145996094, 1.8982772827148438, -0.3538074493408203, 1.1378631591796875, 0.096649169921875, 0.388427734375, 3.395427703857422, 4.554521560668945, 6.028083801269531, -0.12042617797851562, 2.670970916748047, 1.9330291748046875, 4.441153526306152], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000657.npy"}
{"epoch": 0.9931972789115646, "step": 658, "batch_size": 64, "mean": 1.5623366832733154, "std": 2.81974720954895, "min": -4.518096923828125, "p10": -1.6855499267578125, "median": 1.3028545379638672, "p90": 5.256708526611329, "max": 9.449172973632812, "pos_frac": 0.71875, "sample": [1.7475433349609375, 2.1306610107421875, 4.971954345703125, -2.316192626953125, -0.0240936279296875, 1.7923126220703125, -4.283031463623047, 0.7599105834960938, 2.692028045654297, 2.7761688232421875, 1.8930511474609375, 1.4601287841796875, -1.3586502075195312, 3.2196311950683594, 3.2093124389648438, 1.4899425506591797, -0.164459228515625, -0.42740631103515625, 6.50738525390625, 0.8492889404296875, -1.4990406036376953, -0.5298004150390625, -2.3749237060546875, 3.801525115966797, 1.115234375, 4.336181640625, -3.3024673461914062, 2.4497146606445312, 2.7689781188964844, 1.0384750366210938, 1.4609222412109375, 0.575531005859375, 0.4193248748779297, 3.8742141723632812, 1.0977373123168945, 3.5328636169433594, 1.960601806640625, -1.8919506072998047, 1.1229705810546875, 2.9602203369140625, 6.926918029785156, 0.5669784545898438, -0.8052520751953125, 5.378746032714844, -0.44602203369140625, 6.1309661865234375, -0.04767608642578125, 9.449172973632812, 4.354820251464844, 0.49322509765625, 0.00269317626953125, 1.1455802917480469, -1.7467460632324219, -1.5427589416503906, 3.3657379150390625, 2.5046768188476562, 0.24111175537109375, -4.518096923828125, 7.926605224609375, -1.2701797485351562, 7.498321533203125, 2.0814571380615234, 2.342071533203125, 0.1154022216796875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000658.npy"}
{"epoch": 0.9947089947089947, "step": 659, "batch_size": 64, "mean": 1.222205400466919, "std": 2.61568546295166, "min": -4.990379333496094, "p10": -2.118950462341308, "median": 1.030731201171875, "p90": 4.613463592529299, "max": 6.951663970947266, "pos_frac": 0.671875, "sample": [2.0548324584960938, 0.8516616821289062, 5.293453216552734, 2.0886993408203125, -0.24993896484375, 5.686248779296875, -0.09616470336914062, 3.1349964141845703, 0.13348388671875, 2.9517555236816406, 1.77203369140625, -0.4427490234375, 0.19489288330078125, -0.7104721069335938, 1.0322132110595703, 2.2751998901367188, 3.1427764892578125, -3.0552597045898438, 1.0292491912841797, -3.9783096313476562, 3.0276107788085938, -1.3962650299072266, -0.34738922119140625, 3.67974853515625, -0.4696502685546875, 0.129791259765625, 0.08728790283203125, 1.970097541809082, 0.894618034362793, 0.8494186401367188, 4.137046813964844, -1.3090362548828125, 6.748832702636719, -1.0787734985351562, 1.052093505859375, 3.6711273193359375, 6.951663970947266, 4.0422821044921875, 5.908061981201172, 0.998931884765625, 2.345001220703125, -3.424591064453125, -0.6099929809570312, 3.9173927307128906, -2.4286727905273438, 1.8487052917480469, 0.6015739440917969, 5.294219970703125, 3.2379150390625, -1.3365936279296875, 1.1586875915527344, 2.748636245727539, -4.990379333496094, 0.7336158752441406, 3.9771270751953125, 2.475536346435547, -0.6184654235839844, -3.6686477661132812, 4.8176422119140625, 1.329559326171875, -0.1309347152709961, 1.6452369689941406, -0.8244171142578125, -2.533111572265625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000659.npy"}
{"epoch": 0.9962207105064248, "step": 660, "batch_size": 64, "mean": 2.0135550498962402, "std": 2.175987720489502, "min": -2.975860595703125, "p10": -0.8877933502197262, "median": 2.081594467163086, "p90": 4.756292915344239, "max": 6.637664794921875, "pos_frac": 0.828125, "sample": [6.149261474609375, 3.5326080322265625, 1.5959091186523438, -1.514801025390625, 1.5593204498291016, -1.5838356018066406, 0.12689208984375, 2.4682998657226562, -2.0529632568359375, -1.2646026611328125, 3.8120765686035156, 6.637664794921875, 2.4451904296875, 2.905853271484375, 2.4199752807617188, 3.7433624267578125, 4.433555603027344, 2.0991592407226562, 1.4461288452148438, -2.975860595703125, 2.2527999877929688, -0.106231689453125, 5.7298583984375, 0.05939292907714844, 3.45416259765625, 4.904369354248047, 0.8440093994140625, 3.078857421875, 1.6288375854492188, -1.03082275390625, 0.7720260620117188, 2.6539154052734375, 5.66986083984375, 1.4678497314453125, 0.41495704650878906, 0.8924102783203125, 0.5001220703125, 3.1482696533203125, 4.789846420288086, 2.049468994140625, 0.270599365234375, 2.1477794647216797, 2.7129135131835938, 0.6813507080078125, 4.520820617675781, 1.7418899536132812, 1.2854385375976562, 1.8599624633789062, 2.3567657470703125, 4.247802734375, 2.267475128173828, 6.6373291015625, -0.5533370971679688, 0.3672142028808594, 3.5520172119140625, -0.5540580749511719, 1.7147083282470703, 4.422760009765625, 4.678001403808594, -0.19191455841064453, 2.250091552734375, 2.0640296936035156, -1.918975830078125, 3.1497039794921875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000660.npy"}
{"epoch": 0.9977324263038548, "step": 661, "batch_size": 64, "mean": 1.5202577114105225, "std": 2.6632518768310547, "min": -6.5048675537109375, "p10": -1.5327171325683593, "median": 1.3507614135742188, "p90": 5.7080850601196325, "max": 7.09344482421875, "pos_frac": 0.703125, "sample": [-0.11643218994140625, -0.5087089538574219, -2.5430755615234375, -1.6041278839111328, 1.6305122375488281, 2.076263427734375, 4.860395431518555, -2.2538299560546875, -6.5048675537109375, 2.5431137084960938, -0.00408172607421875, -0.28321075439453125, 0.9351730346679688, 6.086030960083008, 4.15948486328125, 7.09344482421875, 6.071678161621094, 1.5787124633789062, 6.626190185546875, -0.164276123046875, -3.2446517944335938, -0.5863418579101562, 6.8022308349609375, 0.9951152801513672, 0.5802459716796875, -0.347686767578125, -0.2808837890625, -1.5541610717773438, 1.3341598510742188, 2.1147918701171875, 3.740814208984375, 1.2153244018554688, -3.6771106719970703, 4.6772308349609375, 2.20458984375, 0.3769998550415039, 3.026763916015625, 2.704010009765625, 1.3292617797851562, 2.8450469970703125, 2.3598556518554688, 0.8035430908203125, 1.35968017578125, 3.398225784301758, 6.071380615234375, 3.4658660888671875, 3.3116607666015625, 2.151214599609375, 1.1253929138183594, 2.319355010986328, -1.2605819702148438, 2.4095993041992188, -1.4826812744140625, -0.9863510131835938, 1.8073692321777344, 2.5886917114257812, 1.013824462890625, 1.71844482421875, -0.351318359375, 2.2370452880859375, 0.929962158203125, 0.205810546875, 1.3418426513671875, 6.82452392578125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85/margin_logs/step_0000661.npy"}
151388
merges.txt
Normal file
File diff suppressed because it is too large
3
model-00001-of-00007.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1128363baa23dd0063fdbb887dbfefb06b2074dcd6454f0b9d7729e32ac3ecd1
size 4972454376
3
model-00002-of-00007.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:34517c43a578e621b3554f584220962d7181cd7cf004a5ee06c7a11c8d51e5bb
size 4832048608
3
model-00003-of-00007.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:daec4f56a8610b588518ec8b00f71b3c2cba30d24ee43c41f1dd58e0944652f6
size 4832048656
3
model-00004-of-00007.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:527962a3f924c5fa9aabb288cedd84b0e540566867018a33c526f9c28ba983c3
size 4999855528
3
model-00005-of-00007.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:66199b1fc27be9a7e32ad308c8806a434787f4f0f6b4a52470def6d2d2c5c173
size 4832048672
3
model-00006-of-00007.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d705745a73e51b0b8efa35d34c254ef42e64295888fc46dbdcc8e42589da0564
size 4832048672
3
model-00007-of-00007.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5ed8cd4b8d6d369f12162626a2e2ab718a892a082d223fd47dd15ddc290c0790
size 3462482728
406
model.safetensors.index.json
Normal file
@@ -0,0 +1,406 @@
{
"metadata": {
"total_size": 32762941440
},
"weight_map": {
"lm_head.weight": "model-00007-of-00007.safetensors",
"model.embed_tokens.weight": "model-00001-of-00007.safetensors",
"model.layers.0.input_layernorm.weight": "model-00001-of-00007.safetensors",
"model.layers.0.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.0.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.0.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.0.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
"model.layers.0.self_attn.k_norm.weight": "model-00001-of-00007.safetensors",
"model.layers.0.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.0.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.0.self_attn.q_norm.weight": "model-00001-of-00007.safetensors",
"model.layers.0.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.0.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.1.input_layernorm.weight": "model-00001-of-00007.safetensors",
"model.layers.1.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.1.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.1.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.1.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
"model.layers.1.self_attn.k_norm.weight": "model-00001-of-00007.safetensors",
"model.layers.1.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.1.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.1.self_attn.q_norm.weight": "model-00001-of-00007.safetensors",
"model.layers.1.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.1.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.10.input_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.10.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.10.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.10.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.10.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.10.self_attn.k_norm.weight": "model-00003-of-00007.safetensors",
"model.layers.10.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.10.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.10.self_attn.q_norm.weight": "model-00003-of-00007.safetensors",
"model.layers.10.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.10.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.11.input_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.11.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.11.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.11.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.11.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.11.self_attn.k_norm.weight": "model-00003-of-00007.safetensors",
"model.layers.11.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.11.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.11.self_attn.q_norm.weight": "model-00003-of-00007.safetensors",
"model.layers.11.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.11.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.12.input_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.12.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.12.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.12.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.12.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.12.self_attn.k_norm.weight": "model-00003-of-00007.safetensors",
"model.layers.12.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.12.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.12.self_attn.q_norm.weight": "model-00003-of-00007.safetensors",
"model.layers.12.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.12.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.13.input_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.13.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.13.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.13.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.13.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.13.self_attn.k_norm.weight": "model-00003-of-00007.safetensors",
"model.layers.13.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.13.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.13.self_attn.q_norm.weight": "model-00003-of-00007.safetensors",
"model.layers.13.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.13.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.14.input_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.14.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.14.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.14.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.14.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.14.self_attn.k_norm.weight": "model-00003-of-00007.safetensors",
"model.layers.14.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.14.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.14.self_attn.q_norm.weight": "model-00003-of-00007.safetensors",
"model.layers.14.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.14.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.15.input_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.15.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.15.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.15.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.15.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.15.self_attn.k_norm.weight": "model-00003-of-00007.safetensors",
"model.layers.15.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.15.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.15.self_attn.q_norm.weight": "model-00003-of-00007.safetensors",
"model.layers.15.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.15.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.16.input_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.16.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.16.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.16.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.16.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.16.self_attn.k_norm.weight": "model-00004-of-00007.safetensors",
"model.layers.16.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.16.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.16.self_attn.q_norm.weight": "model-00004-of-00007.safetensors",
"model.layers.16.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.16.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.17.input_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.17.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.17.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.17.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.17.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.17.self_attn.k_norm.weight": "model-00004-of-00007.safetensors",
"model.layers.17.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.17.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.17.self_attn.q_norm.weight": "model-00004-of-00007.safetensors",
"model.layers.17.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.17.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.18.input_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.18.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.18.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.18.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.18.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.18.self_attn.k_norm.weight": "model-00004-of-00007.safetensors",
"model.layers.18.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
|
||||
"model.layers.18.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
|
||||
"model.layers.18.self_attn.q_norm.weight": "model-00004-of-00007.safetensors",
|
||||
"model.layers.18.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
|
||||
"model.layers.18.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
|
||||
"model.layers.19.input_layernorm.weight": "model-00004-of-00007.safetensors",
|
||||
"model.layers.19.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
|
||||
"model.layers.19.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
|
||||
"model.layers.19.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
|
||||
"model.layers.19.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
|
||||
"model.layers.19.self_attn.k_norm.weight": "model-00004-of-00007.safetensors",
|
||||
"model.layers.19.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
|
||||
"model.layers.19.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
|
||||
"model.layers.19.self_attn.q_norm.weight": "model-00004-of-00007.safetensors",
|
||||
"model.layers.19.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
|
||||
"model.layers.19.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
|
||||
"model.layers.2.input_layernorm.weight": "model-00001-of-00007.safetensors",
|
||||
"model.layers.2.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
|
||||
"model.layers.2.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
|
||||
"model.layers.2.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
|
||||
"model.layers.2.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
|
||||
"model.layers.2.self_attn.k_norm.weight": "model-00001-of-00007.safetensors",
|
||||
"model.layers.2.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
|
||||
"model.layers.2.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
|
||||
"model.layers.2.self_attn.q_norm.weight": "model-00001-of-00007.safetensors",
|
||||
"model.layers.2.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
|
||||
"model.layers.2.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
|
||||
"model.layers.20.input_layernorm.weight": "model-00004-of-00007.safetensors",
|
||||
"model.layers.20.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
|
||||
"model.layers.20.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
|
||||
"model.layers.20.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
|
||||
"model.layers.20.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
|
||||
"model.layers.20.self_attn.k_norm.weight": "model-00004-of-00007.safetensors",
|
||||
"model.layers.20.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
|
||||
"model.layers.20.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
|
||||
"model.layers.20.self_attn.q_norm.weight": "model-00004-of-00007.safetensors",
|
||||
"model.layers.20.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
|
||||
"model.layers.20.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
|
||||
"model.layers.21.input_layernorm.weight": "model-00004-of-00007.safetensors",
|
||||
"model.layers.21.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
|
||||
"model.layers.21.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
|
||||
"model.layers.21.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
|
||||
"model.layers.21.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
|
||||
"model.layers.21.self_attn.k_norm.weight": "model-00004-of-00007.safetensors",
|
||||
"model.layers.21.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
|
||||
"model.layers.21.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
|
||||
"model.layers.21.self_attn.q_norm.weight": "model-00004-of-00007.safetensors",
|
||||
"model.layers.21.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
|
||||
"model.layers.21.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
|
||||
"model.layers.22.input_layernorm.weight": "model-00005-of-00007.safetensors",
|
||||
"model.layers.22.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
|
||||
"model.layers.22.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
|
||||
"model.layers.22.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
|
||||
"model.layers.22.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
|
||||
"model.layers.22.self_attn.k_norm.weight": "model-00004-of-00007.safetensors",
|
||||
"model.layers.22.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
|
||||
"model.layers.22.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
|
||||
"model.layers.22.self_attn.q_norm.weight": "model-00004-of-00007.safetensors",
|
||||
"model.layers.22.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
|
||||
"model.layers.22.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
|
||||
"model.layers.23.input_layernorm.weight": "model-00005-of-00007.safetensors",
|
||||
"model.layers.23.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
|
||||
"model.layers.23.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
|
||||
"model.layers.23.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
|
||||
"model.layers.23.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
|
||||
"model.layers.23.self_attn.k_norm.weight": "model-00005-of-00007.safetensors",
|
||||
"model.layers.23.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
|
||||
"model.layers.23.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
|
||||
"model.layers.23.self_attn.q_norm.weight": "model-00005-of-00007.safetensors",
|
||||
"model.layers.23.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
|
||||
"model.layers.23.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
|
||||
"model.layers.24.input_layernorm.weight": "model-00005-of-00007.safetensors",
|
||||
"model.layers.24.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
|
||||
"model.layers.24.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
|
||||
"model.layers.24.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
|
||||
"model.layers.24.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
|
||||
"model.layers.24.self_attn.k_norm.weight": "model-00005-of-00007.safetensors",
|
||||
"model.layers.24.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
|
||||
"model.layers.24.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
|
||||
"model.layers.24.self_attn.q_norm.weight": "model-00005-of-00007.safetensors",
|
||||
"model.layers.24.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
|
||||
"model.layers.24.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
|
||||
"model.layers.25.input_layernorm.weight": "model-00005-of-00007.safetensors",
|
||||
"model.layers.25.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
|
||||
"model.layers.25.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
|
||||
"model.layers.25.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
|
||||
"model.layers.25.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
|
||||
"model.layers.25.self_attn.k_norm.weight": "model-00005-of-00007.safetensors",
|
||||
"model.layers.25.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
|
||||
"model.layers.25.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
|
||||
"model.layers.25.self_attn.q_norm.weight": "model-00005-of-00007.safetensors",
|
||||
"model.layers.25.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
|
||||
"model.layers.25.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
|
||||
"model.layers.26.input_layernorm.weight": "model-00005-of-00007.safetensors",
|
||||
"model.layers.26.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
|
||||
"model.layers.26.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
|
||||
"model.layers.26.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
|
||||
"model.layers.26.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
|
||||
"model.layers.26.self_attn.k_norm.weight": "model-00005-of-00007.safetensors",
|
||||
"model.layers.26.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
|
||||
"model.layers.26.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
|
||||
"model.layers.26.self_attn.q_norm.weight": "model-00005-of-00007.safetensors",
|
||||
"model.layers.26.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
|
||||
"model.layers.26.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
|
||||
"model.layers.27.input_layernorm.weight": "model-00005-of-00007.safetensors",
|
||||
"model.layers.27.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
|
||||
"model.layers.27.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
|
||||
"model.layers.27.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
|
||||
"model.layers.27.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
|
||||
"model.layers.27.self_attn.k_norm.weight": "model-00005-of-00007.safetensors",
|
||||
"model.layers.27.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
|
||||
"model.layers.27.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
|
||||
"model.layers.27.self_attn.q_norm.weight": "model-00005-of-00007.safetensors",
|
||||
"model.layers.27.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
|
||||
"model.layers.27.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
|
||||
"model.layers.28.input_layernorm.weight": "model-00006-of-00007.safetensors",
|
||||
"model.layers.28.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
|
||||
"model.layers.28.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
|
||||
"model.layers.28.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
|
||||
"model.layers.28.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
|
||||
"model.layers.28.self_attn.k_norm.weight": "model-00005-of-00007.safetensors",
|
||||
"model.layers.28.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
|
||||
"model.layers.28.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
|
||||
"model.layers.28.self_attn.q_norm.weight": "model-00005-of-00007.safetensors",
|
||||
"model.layers.28.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
|
||||
"model.layers.28.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
|
||||
"model.layers.29.input_layernorm.weight": "model-00006-of-00007.safetensors",
|
||||
"model.layers.29.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
|
||||
"model.layers.29.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
|
||||
"model.layers.29.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
|
||||
"model.layers.29.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
|
||||
"model.layers.29.self_attn.k_norm.weight": "model-00006-of-00007.safetensors",
|
||||
"model.layers.29.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
|
||||
"model.layers.29.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
|
||||
"model.layers.29.self_attn.q_norm.weight": "model-00006-of-00007.safetensors",
|
||||
"model.layers.29.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
|
||||
"model.layers.29.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
|
||||
"model.layers.3.input_layernorm.weight": "model-00002-of-00007.safetensors",
|
||||
"model.layers.3.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
|
||||
"model.layers.3.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
|
||||
"model.layers.3.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
|
||||
"model.layers.3.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
|
||||
"model.layers.3.self_attn.k_norm.weight": "model-00001-of-00007.safetensors",
|
||||
"model.layers.3.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
|
||||
"model.layers.3.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
|
||||
"model.layers.3.self_attn.q_norm.weight": "model-00001-of-00007.safetensors",
|
||||
"model.layers.3.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
|
||||
"model.layers.3.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
|
||||
"model.layers.30.input_layernorm.weight": "model-00006-of-00007.safetensors",
|
||||
"model.layers.30.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
|
||||
"model.layers.30.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
|
||||
"model.layers.30.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
|
||||
"model.layers.30.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
|
||||
"model.layers.30.self_attn.k_norm.weight": "model-00006-of-00007.safetensors",
|
||||
"model.layers.30.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
|
||||
"model.layers.30.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
|
||||
"model.layers.30.self_attn.q_norm.weight": "model-00006-of-00007.safetensors",
|
||||
"model.layers.30.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
|
||||
"model.layers.30.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
|
||||
"model.layers.31.input_layernorm.weight": "model-00006-of-00007.safetensors",
|
||||
"model.layers.31.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
|
||||
"model.layers.31.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
|
||||
"model.layers.31.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
|
||||
"model.layers.31.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
|
||||
"model.layers.31.self_attn.k_norm.weight": "model-00006-of-00007.safetensors",
|
||||
"model.layers.31.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
|
||||
"model.layers.31.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
|
||||
"model.layers.31.self_attn.q_norm.weight": "model-00006-of-00007.safetensors",
|
||||
"model.layers.31.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
|
||||
"model.layers.31.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
|
||||
"model.layers.32.input_layernorm.weight": "model-00006-of-00007.safetensors",
|
||||
"model.layers.32.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
|
||||
"model.layers.32.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
|
||||
"model.layers.32.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
|
||||
"model.layers.32.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
|
||||
"model.layers.32.self_attn.k_norm.weight": "model-00006-of-00007.safetensors",
|
||||
"model.layers.32.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
|
||||
"model.layers.32.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
|
||||
"model.layers.32.self_attn.q_norm.weight": "model-00006-of-00007.safetensors",
|
||||
"model.layers.32.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
|
||||
"model.layers.32.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
|
||||
"model.layers.33.input_layernorm.weight": "model-00006-of-00007.safetensors",
|
||||
"model.layers.33.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
|
||||
"model.layers.33.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
|
||||
"model.layers.33.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
|
||||
"model.layers.33.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
|
||||
"model.layers.33.self_attn.k_norm.weight": "model-00006-of-00007.safetensors",
|
||||
"model.layers.33.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
|
||||
"model.layers.33.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
|
||||
"model.layers.33.self_attn.q_norm.weight": "model-00006-of-00007.safetensors",
|
||||
"model.layers.33.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
|
||||
"model.layers.33.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
|
||||
"model.layers.34.input_layernorm.weight": "model-00007-of-00007.safetensors",
|
||||
"model.layers.34.mlp.down_proj.weight": "model-00007-of-00007.safetensors",
|
||||
"model.layers.34.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
|
||||
"model.layers.34.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
|
||||
"model.layers.34.post_attention_layernorm.weight": "model-00007-of-00007.safetensors",
|
||||
"model.layers.34.self_attn.k_norm.weight": "model-00006-of-00007.safetensors",
|
||||
"model.layers.34.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
|
||||
"model.layers.34.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
|
||||
"model.layers.34.self_attn.q_norm.weight": "model-00006-of-00007.safetensors",
|
||||
"model.layers.34.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
|
||||
"model.layers.34.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
|
||||
"model.layers.35.input_layernorm.weight": "model-00007-of-00007.safetensors",
|
||||
"model.layers.35.mlp.down_proj.weight": "model-00007-of-00007.safetensors",
|
||||
"model.layers.35.mlp.gate_proj.weight": "model-00007-of-00007.safetensors",
|
||||
"model.layers.35.mlp.up_proj.weight": "model-00007-of-00007.safetensors",
|
||||
"model.layers.35.post_attention_layernorm.weight": "model-00007-of-00007.safetensors",
|
||||
"model.layers.35.self_attn.k_norm.weight": "model-00007-of-00007.safetensors",
|
||||
"model.layers.35.self_attn.k_proj.weight": "model-00007-of-00007.safetensors",
|
||||
"model.layers.35.self_attn.o_proj.weight": "model-00007-of-00007.safetensors",
|
||||
"model.layers.35.self_attn.q_norm.weight": "model-00007-of-00007.safetensors",
|
||||
"model.layers.35.self_attn.q_proj.weight": "model-00007-of-00007.safetensors",
|
||||
"model.layers.35.self_attn.v_proj.weight": "model-00007-of-00007.safetensors",
|
||||
"model.layers.4.input_layernorm.weight": "model-00002-of-00007.safetensors",
|
||||
"model.layers.4.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
|
||||
"model.layers.4.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
|
||||
"model.layers.4.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
|
||||
"model.layers.4.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
|
||||
"model.layers.4.self_attn.k_norm.weight": "model-00002-of-00007.safetensors",
|
||||
"model.layers.4.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
|
||||
"model.layers.4.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
|
||||
"model.layers.4.self_attn.q_norm.weight": "model-00002-of-00007.safetensors",
|
||||
"model.layers.4.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
|
||||
"model.layers.4.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
|
||||
"model.layers.5.input_layernorm.weight": "model-00002-of-00007.safetensors",
|
||||
"model.layers.5.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
|
||||
"model.layers.5.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
|
||||
"model.layers.5.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
|
||||
"model.layers.5.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
|
||||
"model.layers.5.self_attn.k_norm.weight": "model-00002-of-00007.safetensors",
|
||||
"model.layers.5.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
|
||||
"model.layers.5.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
|
||||
"model.layers.5.self_attn.q_norm.weight": "model-00002-of-00007.safetensors",
|
||||
"model.layers.5.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
|
||||
"model.layers.5.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
|
||||
"model.layers.6.input_layernorm.weight": "model-00002-of-00007.safetensors",
|
||||
"model.layers.6.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
|
||||
"model.layers.6.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
|
||||
"model.layers.6.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
|
||||
"model.layers.6.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
|
||||
"model.layers.6.self_attn.k_norm.weight": "model-00002-of-00007.safetensors",
|
||||
"model.layers.6.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
|
||||
"model.layers.6.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
|
||||
"model.layers.6.self_attn.q_norm.weight": "model-00002-of-00007.safetensors",
|
||||
"model.layers.6.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
|
||||
"model.layers.6.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
|
||||
"model.layers.7.input_layernorm.weight": "model-00002-of-00007.safetensors",
|
||||
"model.layers.7.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
|
||||
"model.layers.7.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
|
||||
"model.layers.7.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
|
||||
"model.layers.7.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
|
||||
"model.layers.7.self_attn.k_norm.weight": "model-00002-of-00007.safetensors",
|
||||
"model.layers.7.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
|
||||
"model.layers.7.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
|
||||
"model.layers.7.self_attn.q_norm.weight": "model-00002-of-00007.safetensors",
|
||||
"model.layers.7.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
|
||||
"model.layers.7.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
|
||||
"model.layers.8.input_layernorm.weight": "model-00002-of-00007.safetensors",
|
||||
"model.layers.8.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
|
||||
"model.layers.8.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
|
||||
"model.layers.8.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
|
||||
"model.layers.8.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
|
||||
"model.layers.8.self_attn.k_norm.weight": "model-00002-of-00007.safetensors",
|
||||
"model.layers.8.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
|
||||
"model.layers.8.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
|
||||
"model.layers.8.self_attn.q_norm.weight": "model-00002-of-00007.safetensors",
|
||||
"model.layers.8.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
|
||||
"model.layers.8.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
|
||||
"model.layers.9.input_layernorm.weight": "model-00003-of-00007.safetensors",
|
||||
"model.layers.9.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
|
||||
"model.layers.9.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
|
||||
"model.layers.9.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
|
||||
"model.layers.9.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
|
||||
"model.layers.9.self_attn.k_norm.weight": "model-00002-of-00007.safetensors",
|
||||
"model.layers.9.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
|
||||
"model.layers.9.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
|
||||
"model.layers.9.self_attn.q_norm.weight": "model-00002-of-00007.safetensors",
|
||||
"model.layers.9.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
|
||||
"model.layers.9.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
|
||||
"model.norm.weight": "model-00007-of-00007.safetensors"
|
||||
}
|
||||
}
|
||||
31
special_tokens_map.json
Normal file
@@ -0,0 +1,31 @@
{
  "additional_special_tokens": [
    "<|im_start|>",
    "<|im_end|>",
    "<|object_ref_start|>",
    "<|object_ref_end|>",
    "<|box_start|>",
    "<|box_end|>",
    "<|quad_start|>",
    "<|quad_end|>",
    "<|vision_start|>",
    "<|vision_end|>",
    "<|vision_pad|>",
    "<|image_pad|>",
    "<|video_pad|>"
  ],
  "eos_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
3
tokenizer.json
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:aeb13307a71acd8fe81861d94ad54ab689df773318809eed3cbe794b4492dae4
size 11422654
240
tokenizer_config.json
Normal file
@@ -0,0 +1,240 @@
|
||||
{
|
||||
"add_bos_token": false,
|
||||
"add_prefix_space": false,
|
||||
"added_tokens_decoder": {
|
||||
"151643": {
|
||||
"content": "<|endoftext|>",
|
||||
"lstrip": false,
|
||||
"normalized": false,
|
||||
"rstrip": false,
|
||||
"single_word": false,
|
||||
"special": true
|
||||
},
|
||||
"151644": {
|
||||
"content": "<|im_start|>",
|
||||
"lstrip": false,
|
||||
"normalized": false,
|
||||
"rstrip": false,
|
||||
"single_word": false,
|
||||
"special": true
|
||||
},
|
||||
"151645": {
|
||||
"content": "<|im_end|>",
|
||||
"lstrip": false,
|
||||
"normalized": false,
|
||||
"rstrip": false,
|
||||
"single_word": false,
|
||||
"special": true
|
||||
},
|
||||
"151646": {
|
||||
"content": "<|object_ref_start|>",
|
||||
"lstrip": false,
|
||||
"normalized": false,
|
||||
"rstrip": false,
|
||||
"single_word": false,
|
||||
"special": true
|
||||
},
|
||||
"151647": {
|
||||
"content": "<|object_ref_end|>",
|
||||
"lstrip": false,
|
||||
"normalized": false,
|
||||
"rstrip": false,
|
||||
"single_word": false,
|
||||
"special": true
|
||||
},
|
||||
"151648": {
|
||||
"content": "<|box_start|>",
|
||||
"lstrip": false,
|
||||
"normalized": false,
|
||||
"rstrip": false,
|
||||
"single_word": false,
|
||||
"special": true
|
||||
},
|
||||
"151649": {
|
||||
"content": "<|box_end|>",
|
||||
"lstrip": false,
|
||||
"normalized": false,
|
||||
"rstrip": false,
|
||||
"single_word": false,
|
||||
"special": true
|
||||
},
|
||||
"151650": {
|
||||
"content": "<|quad_start|>",
|
||||
"lstrip": false,
|
||||
"normalized": false,
|
||||
"rstrip": false,
|
||||
"single_word": false,
|
||||
"special": true
|
||||
},
|
||||
"151651": {
|
||||
"content": "<|quad_end|>",
|
||||
"lstrip": false,
|
||||
"normalized": false,
|
||||
"rstrip": false,
|
||||
"single_word": false,
|
||||
"special": true
|
||||
},
|
||||
"151652": {
|
||||
"content": "<|vision_start|>",
|
||||
"lstrip": false,
|
||||
"normalized": false,
|
||||
"rstrip": false,
|
||||
"single_word": false,
|
||||
"special": true
|
||||
},
|
||||
"151653": {
|
||||
"content": "<|vision_end|>",
|
||||
"lstrip": false,
|
||||
"normalized": false,
|
||||
"rstrip": false,
|
||||
"single_word": false,
|
||||
"special": true
|
||||
},
|
||||
"151654": {
|
||||
"content": "<|vision_pad|>",
|
||||
"lstrip": false,
|
||||
"normalized": false,
|
||||
"rstrip": false,
|
||||
"single_word": false,
|
||||
"special": true
|
||||
},
|
||||
"151655": {
|
||||
"content": "<|image_pad|>",
|
||||
"lstrip": false,
|
||||
"normalized": false,
|
||||
"rstrip": false,
|
||||
"single_word": false,
|
||||
"special": true
|
||||
},
|
||||
"151656": {
|
||||
"content": "<|video_pad|>",
|
||||
"lstrip": false,
|
||||
"normalized": false,
|
||||
"rstrip": false,
|
||||
"single_word": false,
|
||||
"special": true
|
||||
},
|
||||
"151657": {
|
||||
"content": "<tool_call>",
|
||||
"lstrip": false,
|
||||
"normalized": false,
|
||||
"rstrip": false,
|
||||
"single_word": false,
|
||||
"special": false
|
||||
},
|
||||
"151658": {
|
||||
"content": "</tool_call>",
|
||||
"lstrip": false,
|
||||
"normalized": false,
|
||||
"rstrip": false,
|
||||
"single_word": false,
|
||||
"special": false
|
||||
},
|
||||
"151659": {
|
||||
"content": "<|fim_prefix|>",
|
||||
"lstrip": false,
|
||||
"normalized": false,
|
||||
"rstrip": false,
|
||||
"single_word": false,
|
||||
"special": false
|
||||
},
|
||||
"151660": {
|
||||
"content": "<|fim_middle|>",
|
||||
"lstrip": false,
|
||||
"normalized": false,
|
||||
"rstrip": false,
|
||||
"single_word": false,
|
||||
"special": false
|
||||
},
|
||||
"151661": {
|
||||
"content": "<|fim_suffix|>",
|
||||
"lstrip": false,
|
||||
"normalized": false,
|
||||
"rstrip": false,
|
||||
"single_word": false,
|
||||
"special": false
|
||||
},
|
||||
"151662": {
|
||||
"content": "<|fim_pad|>",
|
||||
"lstrip": false,
|
||||
"normalized": false,
|
||||
"rstrip": false,
|
||||
"single_word": false,
|
||||
"special": false
|
||||
},
|
||||
"151663": {
|
||||
"content": "<|repo_name|>",
|
||||
"lstrip": false,
|
||||
"normalized": false,
|
||||
"rstrip": false,
|
||||
"single_word": false,
|
||||
"special": false
|
||||
},
|
||||
"151664": {
|
||||
"content": "<|file_sep|>",
|
||||
"lstrip": false,
|
||||
"normalized": false,
|
||||
"rstrip": false,
|
||||
"single_word": false,
|
||||
"special": false
|
||||
},
|
||||
"151665": {
|
||||
"content": "<tool_response>",
|
||||
"lstrip": false,
|
||||
"normalized": false,
|
||||
"rstrip": false,
|
||||
"single_word": false,
|
||||
"special": false
|
||||
},
|
||||
"151666": {
|
||||
"content": "</tool_response>",
|
||||
"lstrip": false,
|
||||
"normalized": false,
|
||||
"rstrip": false,
|
||||
"single_word": false,
|
||||
"special": false
|
||||
},
|
||||
"151667": {
"content": "<think>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151668": {
"content": "</think>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
}
},
"additional_special_tokens": [
"<|im_start|>",
"<|im_end|>",
"<|object_ref_start|>",
"<|object_ref_end|>",
"<|box_start|>",
"<|box_end|>",
"<|quad_start|>",
"<|quad_end|>",
"<|vision_start|>",
"<|vision_end|>",
"<|vision_pad|>",
"<|image_pad|>",
"<|video_pad|>"
],
"bos_token": null,
"chat_template": "{%- if tools %}\n {{- '<|im_start|>system\\n' }}\n {%- if messages[0].role == 'system' %}\n {{- messages[0].content + '\\n\\n' }}\n {%- endif %}\n {{- \"# Tools\\n\\nYou may call one or more functions to assist with the user query.\\n\\nYou are provided with function signatures within <tools></tools> XML tags:\\n<tools>\" }}\n {%- for tool in tools %}\n {{- \"\\n\" }}\n {{- tool | tojson }}\n {%- endfor %}\n {{- \"\\n</tools>\\n\\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\\n<tool_call>\\n{\\\"name\\\": <function-name>, \\\"arguments\\\": <args-json-object>}\\n</tool_call><|im_end|>\\n\" }}\n{%- else %}\n {%- if messages[0].role == 'system' %}\n {{- '<|im_start|>system\\n' + messages[0].content + '<|im_end|>\\n' }}\n {%- endif %}\n{%- endif %}\n{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}\n{%- for message in messages[::-1] %}\n {%- set index = (messages|length - 1) - loop.index0 %}\n {%- if ns.multi_step_tool and message.role == \"user\" and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}\n {%- set ns.multi_step_tool = false %}\n {%- set ns.last_query_index = index %}\n {%- endif %}\n{%- endfor %}\n{%- for message in messages %}\n {%- if (message.role == \"user\") or (message.role == \"system\" and not loop.first) %}\n {{- '<|im_start|>' + message.role + '\\n' + message.content + '<|im_end|>' + '\\n' }}\n {%- elif message.role == \"assistant\" %}\n {%- set content = message.content %}\n {%- set reasoning_content = '' %}\n {%- if message.reasoning_content is defined and message.reasoning_content is not none %}\n {%- set reasoning_content = message.reasoning_content %}\n {%- else %}\n {%- if '</think>' in message.content %}\n {%- set content = message.content.split('</think>')[-1].lstrip('\\n') %}\n {%- set reasoning_content = message.content.split('</think>')[0].rstrip('\\n').split('<think>')[-1].lstrip('\\n') %}\n {%- endif %}\n {%- endif %}\n {%- if loop.index0 > ns.last_query_index %}\n {%- if loop.last or (not loop.last and reasoning_content) %}\n {{- '<|im_start|>' + message.role + '\\n<think>\\n' + reasoning_content.strip('\\n') + '\\n</think>\\n\\n' + content.lstrip('\\n') }}\n {%- else %}\n {{- '<|im_start|>' + message.role + '\\n' + content }}\n {%- endif %}\n {%- else %}\n {{- '<|im_start|>' + message.role + '\\n' + content }}\n {%- endif %}\n {%- if message.tool_calls %}\n {%- for tool_call in message.tool_calls %}\n {%- if (loop.first and content) or (not loop.first) %}\n {{- '\\n' }}\n {%- endif %}\n {%- if tool_call.function %}\n {%- set tool_call = tool_call.function %}\n {%- endif %}\n {{- '<tool_call>\\n{\"name\": \"' }}\n {{- tool_call.name }}\n {{- '\", \"arguments\": ' }}\n {%- if tool_call.arguments is string %}\n {{- tool_call.arguments }}\n {%- else %}\n {{- tool_call.arguments | tojson }}\n {%- endif %}\n {{- '}\\n</tool_call>' }}\n {%- endfor %}\n {%- endif %}\n {{- '<|im_end|>\\n' }}\n {%- elif message.role == \"tool\" %}\n {%- if loop.first or (messages[loop.index0 - 1].role != \"tool\") %}\n {{- '<|im_start|>user' }}\n {%- endif %}\n {{- '\\n<tool_response>\\n' }}\n {{- message.content }}\n {{- '\\n</tool_response>' }}\n {%- if loop.last or (messages[loop.index0 + 1].role != \"tool\") %}\n {{- '<|im_end|>\\n' }}\n {%- endif %}\n {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n {{- '<|im_start|>assistant\\n' }}\n {%- if enable_thinking is defined and enable_thinking is false %}\n {{- '<think>\\n\\n</think>\\n\\n' }}\n {%- endif %}\n{%- endif %}",
"clean_up_tokenization_spaces": false,
"eos_token": "<|endoftext|>",
"errors": "replace",
"extra_special_tokens": {},
"model_max_length": 2048,
"pad_token": "<|endoftext|>",
"split_special_tokens": false,
"tokenizer_class": "Qwen2Tokenizer",
"unk_token": null
}
9
train_results.json
Normal file
@@ -0,0 +1,9 @@
{
"epoch": 0.999244142101285,
"total_flos": 0.0,
"train_loss": 1.1363916052632181,
"train_runtime": 2119.1053,
"train_samples": 42336,
"train_samples_per_second": 19.978,
"train_steps_per_second": 0.312
}
12704
trainer_state.json
Normal file
File diff suppressed because it is too large
1
vocab.json
Normal file
File diff suppressed because one or more lines are too long