Initialize project; model provided by the ModelHub XC community
Model: jackf857/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6 Source: Original Platform
36
.gitattributes
vendored
Normal file
@@ -0,0 +1,36 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
tokenizer.json filter=lfs diff=lfs merge=lfs -text
82
README.md
Normal file
@@ -0,0 +1,82 @@
---
library_name: transformers
license: apache-2.0
base_model: jackf857/qwen3-8b-base-sft-hh-harmless-4xh200-batch-64-20260417-214452
tags:
- alignment-handbook
- new-dpo
- generated_from_trainer
datasets:
- Anthropic/hh-rlhf
model-index:
- name: qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6

This model is a fine-tuned version of [jackf857/qwen3-8b-base-sft-hh-harmless-4xh200-batch-64-20260417-214452](https://huggingface.co/jackf857/qwen3-8b-base-sft-hh-harmless-4xh200-batch-64-20260417-214452) on the Anthropic/hh-rlhf dataset.
It achieves the following results on the evaluation set:
- Loss: 0.5368
- Fcm Dpo/beta: 0.0530
- Margin Dpo/margin Mean: 11.7010
- Margin Dpo/margin Std: 18.7863
- Logps/chosen: -92.3002
- Logps/rejected: -113.7958
- Logps/ref Chosen: -86.9018
- Logps/ref Rejected: -96.6964
- Logits/chosen: 1.6311
- Logits/rejected: 1.4933

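The reported margin mean appears to equal the gap between how much the policy moved the chosen and the rejected log-probabilities relative to the reference model. The sketch below recomputes it from the evaluation metrics listed above. The formula is an assumption inferred from the reported numbers, not taken from the training code, and the helper name `implied_margin` is hypothetical.

```python
def implied_margin(logps_chosen, logps_rejected, ref_chosen, ref_rejected):
    """Log-ratio gap: (chosen - ref_chosen) - (rejected - ref_rejected)."""
    return (logps_chosen - ref_chosen) - (logps_rejected - ref_rejected)


# Values copied from the evaluation metrics above.
margin = implied_margin(
    logps_chosen=-92.3002,
    logps_rejected=-113.7958,
    ref_chosen=-86.9018,
    ref_rejected=-96.6964,
)
print(f"implied margin: {margin:.4f}  (reported Margin Dpo/margin Mean: 11.7010)")
```

Because the mean of a difference is the difference of the means, the same check can be applied to any row of the training results table.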
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 2
- total_train_batch_size: 64
- total_eval_batch_size: 32
- optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Fcm Dpo/beta | Margin Dpo/margin Mean | Margin Dpo/margin Std | Logps/chosen | Logps/rejected | Logps/ref Chosen | Logps/ref Rejected | Logits/chosen | Logits/rejected |
|:-------------:|:------:|:----:|:---------------:|:------------:|:----------------------:|:---------------------:|:------------:|:--------------:|:----------------:|:------------------:|:-------------:|:---------------:|
| 1.3231        | 0.1512 | 100  | 0.6549          | 0.1000       | 0.8834                 | 1.9512                | -86.2078     | -96.8858       | -86.9018         | -96.6964           | 1.6085        | 1.4983          |
| 1.1388        | 0.3023 | 200  | 0.5426          | 0.2115       | 2.9544                 | 4.9177                | -81.7111     | -94.4601       | -86.9018         | -96.6964           | 1.6653        | 1.5454          |
| 1.1386        | 0.4535 | 300  | 0.5411          | 0.1198       | 5.2175                 | 8.5703                | -85.9098     | -100.9220      | -86.9018         | -96.6964           | 1.5470        | 1.4237          |
| 1.209         | 0.6047 | 400  | 0.5380          | 0.0890       | 6.8529                 | 11.1007               | -85.3711     | -102.0186      | -86.9018         | -96.6964           | 1.6431        | 1.5118          |
| 1.0608        | 0.7559 | 500  | 0.5388          | 0.0570       | 10.7654                | 17.4069               | -90.8454     | -111.4054      | -86.9018         | -96.6964           | 1.2932        | 1.1717          |
| 1.1399        | 0.9070 | 600  | 0.5368          | 0.0530       | 11.7010                | 18.7863               | -92.3002     | -113.7958      | -86.9018         | -96.6964           | 1.6311        | 1.4933          |


### Framework versions

- Transformers 4.51.0
- Pytorch 2.3.1+cu121
- Datasets 2.21.0
- Tokenizers 0.21.4
28
added_tokens.json
Normal file
@@ -0,0 +1,28 @@
{
  "</think>": 151668,
  "</tool_call>": 151658,
  "</tool_response>": 151666,
  "<think>": 151667,
  "<tool_call>": 151657,
  "<tool_response>": 151665,
  "<|box_end|>": 151649,
  "<|box_start|>": 151648,
  "<|endoftext|>": 151643,
  "<|file_sep|>": 151664,
  "<|fim_middle|>": 151660,
  "<|fim_pad|>": 151662,
  "<|fim_prefix|>": 151659,
  "<|fim_suffix|>": 151661,
  "<|im_end|>": 151645,
  "<|im_start|>": 151644,
  "<|image_pad|>": 151655,
  "<|object_ref_end|>": 151647,
  "<|object_ref_start|>": 151646,
  "<|quad_end|>": 151651,
  "<|quad_start|>": 151650,
  "<|repo_name|>": 151663,
  "<|video_pad|>": 151656,
  "<|vision_end|>": 151653,
  "<|vision_pad|>": 151654,
  "<|vision_start|>": 151652
}
23
all_results.json
Normal file
@@ -0,0 +1,23 @@
{
  "epoch": 0.999244142101285,
  "eval_fcm_dpo/beta": 0.047326020896434784,
  "eval_logits/chosen": 1.3826223611831665,
  "eval_logits/rejected": 1.2558915615081787,
  "eval_logps/chosen": -92.67096710205078,
  "eval_logps/ref_chosen": -86.90177917480469,
  "eval_logps/ref_rejected": -96.69639587402344,
  "eval_logps/rejected": -114.26859283447266,
  "eval_loss": 0.5416048169136047,
  "eval_margin_dpo/margin_mean": 11.803030967712402,
  "eval_margin_dpo/margin_std": 18.962507247924805,
  "eval_runtime": 42.3076,
  "eval_samples": 2303,
  "eval_samples_per_second": 54.435,
  "eval_steps_per_second": 1.702,
  "total_flos": 0.0,
  "train_loss": 1.1374638212250148,
  "train_runtime": 2122.2138,
  "train_samples": 42336,
  "train_samples_per_second": 19.949,
  "train_steps_per_second": 0.311
}
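The throughput and epoch fields in `all_results.json` can be cross-checked against each other. The sketch below is a consistency check only; it assumes the total batch sizes of 64 (train) and 32 (eval) from the README, that the incomplete final training batch is dropped, and that the final partial evaluation batch is kept.

```python
import math

# Values copied from all_results.json.
train_samples, train_runtime = 42336, 2122.2138
eval_samples, eval_runtime = 2303, 42.3076

# Assumed from the README hyperparameters.
total_train_batch_size, total_eval_batch_size = 64, 32

train_samples_per_second = train_samples / train_runtime
eval_samples_per_second = eval_samples / eval_runtime

# Assumption: the last incomplete training batch is dropped.
train_steps = train_samples // total_train_batch_size
train_steps_per_second = train_steps / train_runtime
epoch_fraction = train_steps / (train_samples / total_train_batch_size)

# Assumption: the last partial evaluation batch is still evaluated.
eval_steps = math.ceil(eval_samples / total_eval_batch_size)
eval_steps_per_second = eval_steps / eval_runtime

print(f"train: {train_steps} steps, {train_samples_per_second:.3f} samples/s, "
      f"{train_steps_per_second:.3f} steps/s, epoch {epoch_fraction:.6f}")
print(f"eval:  {eval_steps} steps, {eval_samples_per_second:.3f} samples/s, "
      f"{eval_steps_per_second:.3f} steps/s")
```

Under these assumptions the derived step count agrees with the 661 lines recorded for `margin_logs/margins.jsonl` in this commit.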
30
config.json
Normal file
@@ -0,0 +1,30 @@
{
  "architectures": [
    "Qwen3ForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151643,
  "head_dim": 128,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 12288,
  "max_position_embeddings": 32768,
  "max_window_layers": 36,
  "model_type": "qwen3",
  "num_attention_heads": 32,
  "num_hidden_layers": 36,
  "num_key_value_heads": 8,
  "rms_norm_eps": 1e-06,
  "rope_scaling": null,
  "rope_theta": 1000000,
  "sliding_window": null,
  "tie_word_embeddings": false,
  "torch_dtype": "float32",
  "transformers_version": "4.51.0",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 151936
}
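The values in `config.json` are enough for a back-of-the-envelope parameter count. The sketch below assumes a typical dense decoder layout for this architecture (bias-free q/k/v/o projections with grouped-query attention, per-head RMSNorm on queries and keys, a gated three-matrix MLP, two RMSNorms per layer, and one final norm); the layout and the helper name `estimate_params` are assumptions, not read from the checkpoint.

```python
# Subset of config.json needed for the estimate.
cfg = {
    "hidden_size": 4096,
    "head_dim": 128,
    "intermediate_size": 12288,
    "num_attention_heads": 32,
    "num_hidden_layers": 36,
    "num_key_value_heads": 8,
    "tie_word_embeddings": False,
    "vocab_size": 151936,
}


def estimate_params(cfg):
    """Estimate the parameter count under the assumed layer layout."""
    h = cfg["hidden_size"]
    d = cfg["head_dim"]
    q_dim = cfg["num_attention_heads"] * d
    kv_dim = cfg["num_key_value_heads"] * d

    attention = h * q_dim + 2 * h * kv_dim + q_dim * h  # q, k, v, o (no bias)
    qk_norm = 2 * d                                      # assumed q/k RMSNorm weights
    mlp = 3 * h * cfg["intermediate_size"]               # gate, up, down projections
    layer_norms = 2 * h                                  # pre-attention and pre-MLP norms
    per_layer = attention + qk_norm + mlp + layer_norms

    embed = cfg["vocab_size"] * h
    lm_head = 0 if cfg["tie_word_embeddings"] else embed
    final_norm = h
    return embed + lm_head + cfg["num_hidden_layers"] * per_layer + final_norm


total = estimate_params(cfg)
print(f"estimated parameters: {total:,}")
print(f"approximate float32 size: {total * 4 / 1e9:.1f} GB")
```

The estimate should land near the 8B scale in the model name; with `torch_dtype` set to `float32`, each parameter takes four bytes.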
17
eval_results.json
Normal file
@@ -0,0 +1,17 @@
{
  "epoch": 0.999244142101285,
  "eval_fcm_dpo/beta": 0.047326020896434784,
  "eval_logits/chosen": 1.3826223611831665,
  "eval_logits/rejected": 1.2558915615081787,
  "eval_logps/chosen": -92.67096710205078,
  "eval_logps/ref_chosen": -86.90177917480469,
  "eval_logps/ref_rejected": -96.69639587402344,
  "eval_logps/rejected": -114.26859283447266,
  "eval_loss": 0.5416048169136047,
  "eval_margin_dpo/margin_mean": 11.803030967712402,
  "eval_margin_dpo/margin_std": 18.962507247924805,
  "eval_runtime": 42.3076,
  "eval_samples": 2303,
  "eval_samples_per_second": 54.435,
  "eval_steps_per_second": 1.702
}
6
generation_config.json
Normal file
@@ -0,0 +1,6 @@
{
  "bos_token_id": 151643,
  "eos_token_id": 151643,
  "max_new_tokens": 2048,
  "transformers_version": "4.51.0"
}
661
margin_logs/margins.jsonl
Normal file
@@ -0,0 +1,661 @@
{"epoch": 0.0, "step": 1, "batch_size": 64, "mean": -0.0029816031455993652, "std": 0.38981664180755615, "min": -0.7835464477539062, "p10": -0.5016929626464843, "median": 0.02667522430419922, "p90": 0.4355194091796875, "max": 1.2425384521484375, "pos_frac": 0.53125, "sample": [-0.2990684509277344, 0.05040740966796875, 0.4813804626464844, -0.7835464477539062, 0.16756057739257812, -0.21320724487304688, 0.066741943359375, 0.169891357421875, -0.06363677978515625, -0.33983612060546875, 0.20204925537109375, -0.003765106201171875, -0.7424850463867188, -0.039760589599609375, 0.008941650390625, 0.2320232391357422, 0.3860015869140625, 0.11869239807128906, -0.36592864990234375, -0.047290802001953125, -0.28316688537597656, 0.0283660888671875, -0.351715087890625, 0.11574554443359375, 0.86297607421875, -0.7426376342773438, 0.1338043212890625, -0.21837997436523438, 0.426910400390625, -0.12430953979492188, 0.2183837890625, -0.4932708740234375, 0.13604736328125, 0.1666259765625, 0.024984359741210938, -0.42929840087890625, -0.6993560791015625, -0.413604736328125, 0.22283935546875, -0.0557861328125, 1.2425384521484375, -0.2928791046142578, -0.14715576171875, 0.3737640380859375, -0.14208221435546875, 0.19033432006835938, 0.3464927673339844, 0.20479965209960938, 0.04190826416015625, -0.00957489013671875, -0.5053024291992188, 0.4848480224609375, 0.2988262176513672, 0.045352935791015625, 0.427978515625, -0.5745201110839844, 0.5770988464355469, 0.1401214599609375, -0.027454376220703125, -0.6424560546875, -0.2728919982910156, -0.428192138671875, 0.5285491943359375, 0.438751220703125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000001.npy"}
{"epoch": 0.0015117157974300832, "step": 2, "batch_size": 64, "mean": 0.029325813055038452, "std": 0.47058698534965515, "min": -1.2616119384765625, "p10": -0.39437751770019525, "median": -0.11953926086425781, "p90": 0.6386299133300786, "max": 1.48486328125, "pos_frac": 0.4375, "sample": [-0.43146514892578125, 0.07180404663085938, -0.20481109619140625, -0.00714111328125, 0.5232467651367188, 0.06253433227539062, -0.07450485229492188, -0.35506439208984375, -0.14567184448242188, -0.2234630584716797, -0.31732177734375, 1.456878662109375, 0.14324188232421875, -0.41083526611328125, -0.4837646484375, -0.12252044677734375, -0.1322479248046875, 0.45180511474609375, -0.6440353393554688, -1.2616119384765625, 0.7379837036132812, 0.0069866180419921875, 0.14553451538085938, 0.2057647705078125, -0.11970138549804688, 0.1814441680908203, -0.2711448669433594, -0.22872161865234375, 0.23077011108398438, 0.2108001708984375, 0.348419189453125, -0.10046005249023438, 0.4903106689453125, -0.209228515625, 0.3726234436035156, -0.2670707702636719, 0.056774139404296875, 0.1702728271484375, -0.3437042236328125, -0.5232925415039062, 0.1266021728515625, -0.31758880615234375, -0.4544639587402344, -0.13794708251953125, 0.5147171020507812, 0.03656768798828125, 1.48486328125, -0.2191619873046875, -0.22581100463867188, -0.11937713623046875, -0.1849536895751953, 0.9678802490234375, 0.3454742431640625, -0.16698455810546875, -0.2411823272705078, -0.1938018798828125, 0.999603271484375, -0.17424774169921875, 0.908782958984375, -0.3559761047363281, -0.17584609985351562, 0.688079833984375, 0.04034423828125, -0.2581329345703125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000002.npy"}
{"epoch": 0.0030234315948601664, "step": 3, "batch_size": 64, "mean": 0.030951082706451416, "std": 0.4528982937335968, "min": -1.91986083984375, "p10": -0.3970367431640625, "median": 0.011129379272460938, "p90": 0.44815979003906264, "max": 1.359405517578125, "pos_frac": 0.515625, "sample": [-0.37566375732421875, 0.23485183715820312, 0.1451263427734375, 0.24562835693359375, -1.91986083984375, -0.01930999755859375, -0.011260986328125, -0.28614044189453125, -0.104217529296875, 0.2634124755859375, -0.03998565673828125, 0.4625244140625, 0.1188507080078125, -0.107452392578125, -0.5330810546875, -0.3955841064453125, -0.1912078857421875, -0.1479644775390625, 0.05163764953613281, 0.414642333984375, -0.3824920654296875, -0.10361099243164062, 0.6924972534179688, 0.48990631103515625, -0.11035919189453125, 0.248046875, 0.2030487060546875, 0.00958251953125, 0.14304542541503906, 0.2736968994140625, -0.5632228851318359, 0.12537384033203125, 0.26377105712890625, -0.40206146240234375, 0.3296852111816406, -0.2542743682861328, -0.03374481201171875, -0.21380615234375, 0.0877532958984375, -0.2646484375, -0.02677154541015625, 0.10428237915039062, 0.1354217529296875, 0.561798095703125, 0.18677520751953125, -0.1341705322265625, -0.27362060546875, 0.013427734375, -0.43447113037109375, -0.06104278564453125, 0.9460296630859375, -0.43791961669921875, 0.32418060302734375, -0.0221099853515625, -0.3976593017578125, 0.22234344482421875, -0.22405242919921875, 1.358184814453125, 0.012676239013671875, 1.359405517578125, 0.4114952087402344, 0.064300537109375, -0.353240966796875, 0.3024749755859375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000003.npy"}
{"epoch": 0.0045351473922902496, "step": 4, "batch_size": 64, "mean": 0.0423721969127655, "std": 0.48991692066192627, "min": -1.493499755859375, "p10": -0.5835880279541015, "median": 0.06982421875, "p90": 0.5966194152832032, "max": 1.2887496948242188, "pos_frac": 0.625, "sample": [0.025146484375, 0.1272430419921875, 0.08489990234375, 0.5260772705078125, 0.6910629272460938, -0.2654457092285156, 0.04327201843261719, -0.6066856384277344, 0.0147857666015625, 0.5992889404296875, 0.11400985717773438, -0.12407302856445312, 0.38140869140625, 0.000213623046875, 0.13824081420898438, -0.37841796875, 0.5734634399414062, 0.3143768310546875, -0.529693603515625, 1.2887496948242188, -0.4646759033203125, -0.6251907348632812, -0.096832275390625, -0.9678497314453125, 0.2933616638183594, 0.4747772216796875, 0.5924911499023438, 0.4549713134765625, 0.1373291015625, -0.20986175537109375, 0.83642578125, -0.03604888916015625, -0.2368316650390625, -0.4648895263671875, -0.29869842529296875, -0.48626708984375, 0.598388671875, 0.051021575927734375, -1.493499755859375, 0.40782928466796875, 0.17249298095703125, -0.11710166931152344, 0.1427459716796875, 0.2705650329589844, 0.93707275390625, 0.05474853515625, 0.3408660888671875, 0.025054931640625, -0.628631591796875, 0.3208961486816406, -0.45667266845703125, -0.11517333984375, 0.04971122741699219, 0.08774566650390625, -0.69073486328125, 0.6763153076171875, -0.915557861328125, 0.4278450012207031, 0.4277496337890625, -0.20733261108398438, 0.2289276123046875, -0.4741668701171875, 0.5611686706542969, 0.10941314697265625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000004.npy"}
{"epoch": 0.006046863189720333, "step": 5, "batch_size": 64, "mean": -0.0048364996910095215, "std": 0.36643707752227783, "min": -0.8973503112792969, "p10": -0.44866867065429683, "median": -0.0054073333740234375, "p90": 0.4126289367675783, "max": 1.2999725341796875, "pos_frac": 0.5, "sample": [-0.051052093505859375, -0.016021728515625, -0.05875205993652344, 0.220733642578125, 0.559417724609375, -0.576446533203125, -0.11010360717773438, -0.8973503112792969, 0.433807373046875, 0.27606201171875, 0.26422882080078125, -0.117767333984375, -0.1903076171875, 0.7533912658691406, 0.370635986328125, 0.43062591552734375, 0.31864166259765625, -0.21053314208984375, 0.22637176513671875, -0.2347412109375, -0.041217803955078125, 0.26398277282714844, 0.005207061767578125, -0.0724945068359375, 1.2999725341796875, 0.0217132568359375, -0.4090118408203125, -0.08934402465820312, -0.3058929443359375, 0.1570892333984375, -0.1110992431640625, -0.677215576171875, -0.4783935546875, 0.02044677734375, -0.46308135986328125, -0.54595947265625, 0.3677082061767578, 0.1319580078125, 0.1459503173828125, -0.3109130859375, 0.03985595703125, -0.2834625244140625, 0.35693359375, -0.075042724609375, 0.17453765869140625, -0.08862113952636719, 0.02740478515625, -0.3507080078125, -0.0308074951171875, -0.6814346313476562, 0.08824920654296875, 0.504241943359375, -0.1667022705078125, 0.09171295166015625, -0.4150390625, 0.5021438598632812, -0.12725830078125, 0.021648406982421875, 0.16533279418945312, 0.24132156372070312, 0.147369384765625, -0.3936767578125, 0.04331207275390625, -0.401092529296875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000005.npy"}
{"epoch": 0.007558578987150416, "step": 6, "batch_size": 64, "mean": 0.0959392786026001, "std": 0.2986941337585449, "min": -0.7420654296875, "p10": -0.3121551513671874, "median": 0.10642242431640625, "p90": 0.48988418579101567, "max": 0.6975860595703125, "pos_frac": 0.65625, "sample": [0.5765495300292969, 0.3026580810546875, 0.493072509765625, -0.065399169921875, -0.2391815185546875, 0.3361854553222656, 0.102325439453125, 0.0446624755859375, -0.13515472412109375, 0.045257568359375, 0.026203155517578125, -0.7420654296875, -0.16152381896972656, -0.1046905517578125, -0.3434295654296875, -0.1165924072265625, 0.177459716796875, -0.5727195739746094, 0.048213958740234375, 0.11746597290039062, 0.6975860595703125, 0.3177642822265625, -0.025180816650390625, 0.22309303283691406, 0.1856842041015625, 0.3597869873046875, -0.02886199951171875, -0.3458251953125, 0.11190986633300781, -0.18605804443359375, 0.08732986450195312, 0.5520820617675781, 0.014251708984375, 0.5956001281738281, 0.48244476318359375, -0.00722503662109375, 0.16069412231445312, 0.29998016357421875, -0.1506500244140625, 0.3064422607421875, -0.4488372802734375, 0.1630382537841797, 0.24603271484375, -0.012187957763671875, -0.1787109375, 0.13082122802734375, 0.371795654296875, 0.3663787841796875, 0.4229736328125, -0.4001045227050781, 0.211700439453125, 0.3621368408203125, 0.012477874755859375, -0.029912948608398438, -0.465179443359375, 0.049072265625, -0.15233230590820312, 0.1105194091796875, 0.587158203125, 0.51849365234375, 0.1656951904296875, 0.288787841796875, 0.3479175567626953, 0.03223419189453125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000006.npy"}
{"epoch": 0.009070294784580499, "step": 7, "batch_size": 64, "mean": -0.0647326409816742, "std": 0.35469186305999756, "min": -0.9180450439453125, "p10": -0.5370292663574219, "median": -0.040790557861328125, "p90": 0.34089050292968753, "max": 0.840240478515625, "pos_frac": 0.421875, "sample": [0.11324310302734375, 0.14594268798828125, 0.33776092529296875, -0.15819549560546875, 0.04092597961425781, 0.39318084716796875, 0.37870025634765625, -0.05432891845703125, -0.287322998046875, 0.2886619567871094, -0.491455078125, -0.000957489013671875, 0.12837600708007812, -0.0111083984375, -0.13420867919921875, 0.14905548095703125, -0.34967803955078125, 0.34223175048828125, 0.2434539794921875, -0.9180450439453125, -0.14616012573242188, 0.828033447265625, 0.5829238891601562, 0.17224884033203125, 0.066925048828125, -0.19294357299804688, -0.2501792907714844, -0.014644622802734375, -0.09377288818359375, -0.015293121337890625, -0.26398468017578125, -0.1796722412109375, -0.4953651428222656, 0.2287445068359375, -0.2742576599121094, -0.5653839111328125, 0.12782669067382812, -0.3197517395019531, 0.5086631774902344, 0.19167327880859375, -0.512969970703125, 0.155853271484375, -0.027252197265625, -0.8337554931640625, 0.0315399169921875, -0.05912017822265625, 0.12164688110351562, -0.280792236328125, -0.1367034912109375, 0.840240478515625, -0.19860076904296875, 0.1196136474609375, -0.14076995849609375, -0.5845146179199219, -0.221435546875, -0.7536468505859375, 0.15059661865234375, -0.12146759033203125, -0.68499755859375, 0.0014801025390625, -0.39395904541015625, 0.068756103515625, -0.5473403930664062, -0.18715286254882812], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000007.npy"}
{"epoch": 0.010582010582010581, "step": 8, "batch_size": 64, "mean": -0.038746029138565063, "std": 0.3885754346847534, "min": -1.109100341796875, "p10": -0.4563856124877929, "median": -0.024261474609375, "p90": 0.3625667572021485, "max": 0.966552734375, "pos_frac": 0.453125, "sample": [-0.27643775939941406, -1.109100341796875, 0.966552734375, -0.00658416748046875, 0.9294357299804688, 0.15035247802734375, -0.077484130859375, -0.5103759765625, 0.19591522216796875, -0.228790283203125, -0.077880859375, -0.5385971069335938, 0.04675483703613281, 0.3565559387207031, -0.36515045166015625, -0.10236358642578125, 0.365142822265625, -0.041927337646484375, 0.104949951171875, -0.9347076416015625, 0.12047576904296875, -0.5302505493164062, 0.31563568115234375, -0.02911376953125, 0.06695556640625, 0.057224273681640625, 0.5259323120117188, 0.080169677734375, -0.37738800048828125, 0.14895248413085938, 0.2757911682128906, -0.3261260986328125, 0.063232421875, 0.09253692626953125, -0.28275299072265625, -1.0008087158203125, -0.054744720458984375, 0.05910301208496094, 0.6072235107421875, 0.030681610107421875, -0.248016357421875, -0.3424263000488281, -0.002048492431640625, -0.1602325439453125, 0.20092391967773438, -0.46954345703125, -0.3150444030761719, -0.07669830322265625, -0.42568397521972656, -0.0194091796875, 0.04506492614746094, -0.3336067199707031, -0.18363571166992188, -0.1229705810546875, 0.2016143798828125, 0.175994873046875, -0.17935562133789062, -0.36466217041015625, 0.8638687133789062, 0.3193817138671875, 0.030914306640625, 0.40743255615234375, -0.107574462890625, -0.06302261352539062], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000008.npy"}
{"epoch": 0.012093726379440665, "step": 9, "batch_size": 64, "mean": 0.068980872631073, "std": 0.46562495827674866, "min": -1.19049072265625, "p10": -0.43131828308105463, "median": 0.05945587158203125, "p90": 0.5901531219482424, "max": 1.5499114990234375, "pos_frac": 0.59375, "sample": [0.360198974609375, 0.1355743408203125, 0.25939178466796875, -0.486297607421875, -0.4580230712890625, -0.27394866943359375, -0.34392547607421875, 0.06322479248046875, -0.61279296875, -0.1684417724609375, 0.37796783447265625, 0.07911109924316406, -0.27193450927734375, 0.3524169921875, -1.19049072265625, 0.20420455932617188, 0.834716796875, -0.3665924072265625, -0.21979522705078125, -0.16162490844726562, 1.5499114990234375, 0.20203399658203125, -0.216156005859375, 0.038539886474609375, 0.3702392578125, 0.34002685546875, -0.3690071105957031, -0.00421905517578125, 0.11221885681152344, 0.11029052734375, 0.7643661499023438, 0.3313636779785156, -0.524017333984375, 1.139984130859375, 0.027248382568359375, 0.6931819915771484, 0.008548736572265625, 0.05568695068359375, 0.508087158203125, 0.158233642578125, 0.61279296875, -0.06816291809082031, -0.96820068359375, -0.168701171875, -0.24884605407714844, 0.3136024475097656, 0.25893402099609375, -0.10223579406738281, 0.009525299072265625, -0.1298675537109375, -0.1074676513671875, 0.5013008117675781, -0.20260238647460938, 0.510894775390625, -0.049633026123046875, 0.148651123046875, 0.5373268127441406, 0.7738265991210938, 0.30413818359375, -0.23054122924804688, 0.1605224609375, 0.1362762451171875, 0.0121612548828125, -0.9984207153320312], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000009.npy"}
{"epoch": 0.013605442176870748, "step": 10, "batch_size": 64, "mean": -0.008043557405471802, "std": 0.38969525694847107, "min": -1.1265716552734375, "p10": -0.4953758239746094, "median": -0.000911712646484375, "p90": 0.401752090454102, "max": 1.1631317138671875, "pos_frac": 0.5, "sample": [0.7399749755859375, 0.12038421630859375, -0.23964691162109375, -0.19251632690429688, 0.3014640808105469, -0.4996490478515625, -0.12115669250488281, -0.1078948974609375, -0.17047119140625, -0.48540496826171875, -0.3417243957519531, -1.1265716552734375, -0.5266036987304688, -0.4727020263671875, -0.726318359375, 0.17331695556640625, -0.04945564270019531, 0.076751708984375, -0.7680511474609375, 0.03305816650390625, -0.18379783630371094, -0.47853851318359375, -0.26153564453125, 0.21871185302734375, -0.25994873046875, 0.2576637268066406, -0.22318267822265625, 0.2233123779296875, 0.043033599853515625, 0.20020294189453125, 0.29514312744140625, 1.1631317138671875, -0.19778060913085938, -0.44039154052734375, 0.0396270751953125, 0.05037689208984375, 0.5015716552734375, -0.09081649780273438, 0.069122314453125, 0.2520904541015625, 0.0330047607421875, -0.11493301391601562, 0.660064697265625, 0.5274505615234375, -0.13829421997070312, 0.14971923828125, 0.2686767578125, 0.26943206787109375, 0.444732666015625, 0.167755126953125, 0.8185882568359375, 0.26677703857421875, 0.16642379760742188, -0.09723281860351562, -0.5694618225097656, -0.03482818603515625, -0.0644989013671875, -0.12376022338867188, -0.513671875, 0.2955474853515625, 0.21813583374023438, 0.23236083984375, -0.0710906982421875, -0.1004638671875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000010.npy"}
{"epoch": 0.015117157974300832, "step": 11, "batch_size": 64, "mean": -0.010087013244628906, "std": 0.41387802362442017, "min": -1.2548370361328125, "p10": -0.4908599853515625, "median": -0.023217201232910156, "p90": 0.4491683959960938, "max": 1.3539886474609375, "pos_frac": 0.4375, "sample": [-0.18883514404296875, -0.027227401733398438, 0.26006317138671875, 0.277740478515625, 0.016859054565429688, -0.059600830078125, -0.6670303344726562, -0.02513885498046875, 1.1289520263671875, -0.4902191162109375, 0.2913970947265625, -0.4911346435546875, 0.24561309814453125, -0.0094757080078125, -0.06960296630859375, -0.376220703125, 0.2502937316894531, -0.0691375732421875, 0.02703857421875, 0.4643096923828125, -0.048908233642578125, 0.0289459228515625, 1.3539886474609375, -0.11888885498046875, -0.2913818359375, -0.1222381591796875, -0.11751937866210938, 0.2631950378417969, -0.21541595458984375, -0.17646408081054688, -0.00296783447265625, 0.061920166015625, -0.021295547485351562, 0.427581787109375, 0.03230476379394531, -0.0861053466796875, 0.02231597900390625, -0.6151123046875, 0.05565643310546875, -0.44683837890625, 0.6440582275390625, 0.20257186889648438, -0.000545501708984375, -0.1502685546875, 0.401092529296875, -1.2548370361328125, -0.3154296875, -0.28563690185546875, 0.5377731323242188, 0.2501678466796875, -0.4614982604980469, 0.4584197998046875, 0.09952163696289062, 0.12666893005371094, 0.26491546630859375, -0.689208984375, -0.1375579833984375, -0.10905647277832031, -0.0825653076171875, 0.3620758056640625, 0.5891876220703125, -0.5849609375, -0.62847900390625, -0.3533935546875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000011.npy"}
{"epoch": 0.016628873771730914, "step": 12, "batch_size": 64, "mean": 0.043317049741744995, "std": 0.4446461498737335, "min": -1.135528564453125, "p10": -0.36493301391601557, "median": 0.0037250518798828125, "p90": 0.5415588378906252, "max": 1.5821533203125, "pos_frac": 0.53125, "sample": [-0.33190155029296875, 0.5625381469726562, 0.001354217529296875, 0.3055438995361328, 0.1677875518798828, 0.2714691162109375, -0.16358566284179688, 0.23012542724609375, -0.0651092529296875, 0.16664505004882812, -1.0426025390625, -0.37908935546875, -0.28276824951171875, 1.5821533203125, -0.2084808349609375, 0.001811981201171875, 0.7509765625, 0.46019744873046875, -0.3086700439453125, 0.05789756774902344, 0.63983154296875, -0.1151123046875, -0.289520263671875, 1.241180419921875, 0.04837989807128906, 0.9179000854492188, -0.42183685302734375, 0.3255767822265625, 0.3093433380126953, 0.17132186889648438, -0.32077789306640625, 0.1349029541015625, -0.19403839111328125, -1.135528564453125, -0.11649322509765625, 0.15653610229492188, 0.6099166870117188, -0.10495758056640625, -0.0492095947265625, -0.5728492736816406, 0.4081878662109375, 0.44549560546875, -0.1934814453125, -0.20685195922851562, 0.00563812255859375, -0.23180770874023438, -0.2937812805175781, 0.22447967529296875, 0.05828094482421875, -0.18540191650390625, -0.4389381408691406, -0.18572616577148438, -0.2915077209472656, -0.15216064453125, 0.18597412109375, 0.092529296875, 0.025539398193359375, -0.0856475830078125, -0.5052452087402344, 0.10076904296875, 0.49260711669921875, 0.324188232421875, -0.16104888916015625, 0.3293418884277344], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000012.npy"}
{"epoch": 0.018140589569160998, "step": 13, "batch_size": 64, "mean": 0.026911497116088867, "std": 0.30951541662216187, "min": -0.7813186645507812, "p10": -0.3818321228027344, "median": 0.042194366455078125, "p90": 0.42522010803222654, "max": 0.6808624267578125, "pos_frac": 0.578125, "sample": [0.3734130859375, 0.19499969482421875, 0.19854736328125, 0.02519989013671875, -0.099456787109375, 0.17715072631835938, 0.35327911376953125, -0.2822532653808594, 0.6118316650390625, -0.5088958740234375, -0.25785064697265625, -0.17583465576171875, 0.118133544921875, 0.6808624267578125, -0.045398712158203125, -0.17454147338867188, 0.194000244140625, -0.19436264038085938, 0.154541015625, -0.3603324890136719, 0.007232666015625, 0.30576324462890625, 0.42572021484375, 0.5726318359375, -0.295501708984375, -0.39243316650390625, 0.33837127685546875, 0.18176841735839844, 0.43438720703125, 0.1822967529296875, -0.1240234375, 0.0210418701171875, -0.298858642578125, 0.24498367309570312, -0.253692626953125, 0.2239532470703125, -0.03798675537109375, 0.4240531921386719, -0.000591278076171875, -0.28284454345703125, 0.1757030487060547, 0.4561004638671875, 0.128814697265625, 0.03646087646484375, -0.7813186645507812, -0.038455963134765625, 0.5028152465820312, 0.0479278564453125, -0.5249176025390625, 0.15276336669921875, -0.14205169677734375, 0.07265472412109375, -0.3874168395996094, 0.08501243591308594, -0.4371337890625, -0.101898193359375, 0.2259674072265625, 0.08460617065429688, 0.0166015625, 0.3400764465332031, -0.5944290161132812, 0.137451171875, -0.023500442504882812, -0.3688011169433594], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000013.npy"}
|
||||
{"epoch": 0.019652305366591082, "step": 14, "batch_size": 64, "mean": -0.017433375120162964, "std": 0.32638928294181824, "min": -0.9124603271484375, "p10": -0.5626335144042969, "median": 0.03086376190185547, "p90": 0.30262603759765627, "max": 0.6654129028320312, "pos_frac": 0.5625, "sample": [-0.6311416625976562, 0.268798828125, 0.4605560302734375, -0.5418319702148438, -0.07916831970214844, -0.05083465576171875, -0.0531158447265625, 0.49560546875, -0.07445526123046875, -0.3556194305419922, 0.2621917724609375, -0.44244384765625, 0.03275489807128906, -0.0193634033203125, 0.3068580627441406, -0.10636520385742188, 0.26836395263671875, -0.6000518798828125, -0.5715484619140625, -0.21088027954101562, -0.06476211547851562, 0.5736846923828125, 0.2531013488769531, 0.0873565673828125, 0.3725433349609375, -0.16223907470703125, -0.605682373046875, 0.6654129028320312, 0.05897331237792969, 0.523193359375, 0.0101165771484375, 0.19504547119140625, -0.26422119140625, -0.24346923828125, -0.23904037475585938, 0.23374176025390625, 0.05384063720703125, -0.06986236572265625, 0.27954864501953125, 0.1024932861328125, 0.14031982421875, -0.2433319091796875, 0.0006866455078125, -0.571746826171875, 0.15865325927734375, 0.04472541809082031, 0.17462158203125, 0.088043212890625, 0.2702178955078125, 0.1669921875, -0.2050323486328125, -0.9124603271484375, 0.025568008422851562, -0.741424560546875, -0.048496246337890625, 0.2927513122558594, 0.0592498779296875, -0.20605087280273438, 0.028972625732421875, -0.24151229858398438, 0.18752479553222656, 0.03907012939453125, 0.08324813842773438, 0.17559051513671875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000014.npy"}
|
||||
{"epoch": 0.021164021164021163, "step": 15, "batch_size": 64, "mean": -0.08046802878379822, "std": 0.34274861216545105, "min": -1.046417236328125, "p10": -0.580267333984375, "median": -0.058236122131347656, "p90": 0.36758155822753913, "max": 0.6185760498046875, "pos_frac": 0.40625, "sample": [0.01792144775390625, -0.0032501220703125, -0.052532196044921875, 0.07774162292480469, 0.42220306396484375, -0.36525726318359375, 0.1205902099609375, -0.484222412109375, -0.5858154296875, -0.3369140625, -0.32280731201171875, -0.06229972839355469, -0.11932373046875, -0.011554718017578125, -0.202880859375, -0.2312641143798828, -0.34133148193359375, 0.07635116577148438, -0.0630035400390625, -0.009571075439453125, 0.06076812744140625, 0.01763916015625, -0.08370208740234375, -0.6716842651367188, 0.24993324279785156, 0.125732421875, -0.6479034423828125, -0.3610076904296875, 0.47492218017578125, -0.550872802734375, -0.054172515869140625, 0.5751609802246094, 0.18807220458984375, -0.0777740478515625, -0.118072509765625, 0.13446044921875, -0.7241287231445312, 0.3751220703125, -0.18613052368164062, -0.13854217529296875, -0.01715087890625, -0.06325149536132812, 0.248870849609375, -0.56732177734375, 0.06998062133789062, -1.046417236328125, 0.06697845458984375, -0.6336822509765625, 0.3499870300292969, 0.144866943359375, -0.2481842041015625, -0.22203445434570312, -0.27623748779296875, 0.5687179565429688, 0.0025196075439453125, 0.286895751953125, 0.6185760498046875, -0.6296615600585938, -0.3807106018066406, -0.1056671142578125, 0.009490966796875, 0.46396636962890625, 0.17461395263671875, -0.0756988525390625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000015.npy"}
|
||||
{"epoch": 0.022675736961451247, "step": 16, "batch_size": 64, "mean": -0.025534451007843018, "std": 0.41622477769851685, "min": -1.336151123046875, "p10": -0.4653091430664062, "median": -0.01859283447265625, "p90": 0.4470222473144532, "max": 0.8187942504882812, "pos_frac": 0.46875, "sample": [0.18605422973632812, 0.2042713165283203, -0.08721923828125, 0.10651779174804688, -0.4099540710449219, -0.022693634033203125, -0.2710285186767578, 0.5063018798828125, -1.1672515869140625, 0.10988616943359375, 0.079620361328125, 0.18421173095703125, 0.4581756591796875, -0.5955734252929688, 0.12205886840820312, -0.16225051879882812, -0.3690910339355469, -0.2650146484375, -0.03803253173828125, -0.000518798828125, 0.259124755859375, -1.336151123046875, -0.3317413330078125, -0.27495574951171875, 0.0237579345703125, 0.39479827880859375, -0.4890327453613281, -0.1361541748046875, -0.3861656188964844, -1.1230010986328125, -0.022674560546875, 0.17489242553710938, -0.5399169921875, -0.33953857421875, 0.49371337890625, 0.7856903076171875, 0.77154541015625, -0.6278076171875, 0.15129852294921875, -0.0596160888671875, -0.2931785583496094, 0.3669624328613281, -0.3180999755859375, 0.16399383544921875, 0.10160064697265625, 0.36568641662597656, 0.42099761962890625, -0.15494537353515625, -0.11334037780761719, -0.041385650634765625, 0.14023208618164062, 0.27033424377441406, -0.05882453918457031, 0.8187942504882812, -0.040515899658203125, -0.2816047668457031, 0.1435546875, 0.269927978515625, 0.6273651123046875, 0.23154449462890625, -0.15542221069335938, 0.04235076904296875, -0.0145111083984375, -0.08225631713867188], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000016.npy"}
|
||||
{"epoch": 0.02418745275888133, "step": 17, "batch_size": 64, "mean": 0.011744409799575806, "std": 0.4006301462650299, "min": -0.749420166015625, "p10": -0.5433609008789062, "median": 0.008551597595214844, "p90": 0.4675998687744142, "max": 0.957061767578125, "pos_frac": 0.515625, "sample": [-0.20539093017578125, -0.346343994140625, -0.12298583984375, -0.09708786010742188, 0.957061767578125, -0.3957939147949219, -0.49779510498046875, -0.020959854125976562, -0.2292327880859375, -0.377349853515625, 0.59991455078125, 0.28142547607421875, -0.08675003051757812, 0.4137001037597656, 0.28414154052734375, -0.6829757690429688, -0.653472900390625, -0.611480712890625, 0.011884689331054688, -0.323883056640625, 0.8349761962890625, 0.43802642822265625, 0.25909423828125, 0.8170166015625, -0.6478500366210938, -0.09400558471679688, -0.26949310302734375, -0.07827377319335938, 0.1439361572265625, 0.2877197265625, 0.41089630126953125, 0.3611602783203125, -0.2010040283203125, 0.36651611328125, 0.8007583618164062, -0.3983917236328125, 0.037261962890625, -0.3978424072265625, 0.0431976318359375, -0.749420166015625, 0.3743743896484375, 0.14625930786132812, -0.2160816192626953, -0.18837738037109375, 0.08319473266601562, 0.2721710205078125, -0.1045074462890625, 0.139678955078125, -0.1823883056640625, 0.005218505859375, -0.677276611328125, 0.0442047119140625, 0.24187088012695312, 0.12269210815429688, -0.33535003662109375, -0.5628890991210938, -0.00514984130859375, -0.14095306396484375, 0.400848388671875, 0.12850570678710938, 0.07138442993164062, 0.4802742004394531, 0.09917831420898438, 0.6938552856445312], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000017.npy"}
|
||||
{"epoch": 0.025699168556311415, "step": 18, "batch_size": 64, "mean": 0.04118022322654724, "std": 0.337997168302536, "min": -0.636962890625, "p10": -0.38222312927246094, "median": 0.028928756713867188, "p90": 0.5381484985351564, "max": 0.6997528076171875, "pos_frac": 0.53125, "sample": [-0.28159332275390625, -0.3567047119140625, -0.3435478210449219, -0.2203807830810547, 0.267730712890625, -0.43548583984375, -0.2778644561767578, -0.496826171875, 0.6997528076171875, -0.36780548095703125, 0.10285186767578125, -0.636962890625, -0.056690216064453125, -0.0279693603515625, 0.556976318359375, 0.626007080078125, -0.08321762084960938, 0.39102935791015625, -0.3801002502441406, 0.4942169189453125, -0.218292236328125, 0.424896240234375, 0.3167572021484375, -0.3033599853515625, 0.45804595947265625, 0.17844390869140625, -0.43410491943359375, -0.00432586669921875, -0.089691162109375, -0.02484130859375, -0.5057373046875, 0.32828521728515625, 0.2872772216796875, 0.3336639404296875, 0.29742431640625, 0.64208984375, 0.23872756958007812, -0.3245887756347656, 0.33634185791015625, -0.3831329345703125, 0.124298095703125, 0.12609100341796875, 0.1214141845703125, 0.00177001953125, -0.359405517578125, 0.02288818359375, -0.13628578186035156, 0.2776298522949219, -0.08892059326171875, 0.5853958129882812, -0.03083038330078125, 0.24036216735839844, 0.059661865234375, -0.133392333984375, -0.4303741455078125, 0.034969329833984375, -0.25806236267089844, -0.07744979858398438, 0.61773681640625, 0.24748992919921875, 0.0545501708984375, 0.5936660766601562, 0.06571578979492188, 0.24932098388671875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000018.npy"}
|
||||
{"epoch": 0.027210884353741496, "step": 19, "batch_size": 64, "mean": -0.04288366436958313, "std": 0.42035603523254395, "min": -1.4632949829101562, "p10": -0.491619873046875, "median": -0.015210151672363281, "p90": 0.4490226745605469, "max": 1.183990478515625, "pos_frac": 0.484375, "sample": [0.4685821533203125, -0.02970123291015625, 0.11102294921875, -0.769622802734375, -0.3913002014160156, 0.12760543823242188, -0.7121963500976562, 0.4609527587890625, -0.33270263671875, -0.5366668701171875, 0.6794204711914062, 0.0672607421875, 0.3071861267089844, -0.12537002563476562, 0.3262481689453125, -0.07946205139160156, -0.3371734619140625, -0.33808135986328125, -0.07791900634765625, 0.086883544921875, -1.4632949829101562, -0.4918670654296875, -0.10214996337890625, -0.0490264892578125, 0.0158538818359375, 0.0269927978515625, 0.2214202880859375, -0.3029212951660156, 0.5281906127929688, -0.08336830139160156, 0.3435211181640625, 0.10161972045898438, 0.29079437255859375, -1.054107666015625, 1.183990478515625, -0.168914794921875, -0.3182411193847656, 0.2904243469238281, 0.0150299072265625, 0.21480941772460938, 0.3210182189941406, -0.34581756591796875, -0.8226318359375, 0.063446044921875, 0.44165802001953125, 0.12396240234375, -0.333160400390625, 0.37188720703125, -0.0007190704345703125, 0.5442581176757812, 0.09002685546875, 0.2454833984375, -0.23441314697265625, 0.008100509643554688, 0.452178955078125, -0.32508087158203125, -0.21551513671875, -0.19882965087890625, -0.05351066589355469, -0.4910430908203125, -0.16035079956054688, -0.313629150390625, 0.1058807373046875, -0.1214752197265625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000019.npy"}
|
||||
{"epoch": 0.02872260015117158, "step": 20, "batch_size": 64, "mean": -0.07760673761367798, "std": 0.3660027086734772, "min": -0.8436050415039062, "p10": -0.6014747619628906, "median": -0.057399749755859375, "p90": 0.3865962982177737, "max": 0.7476158142089844, "pos_frac": 0.453125, "sample": [0.19680023193359375, -0.7855987548828125, 0.7476158142089844, 0.3211555480957031, -0.5382728576660156, -0.6982192993164062, 0.414642333984375, 0.177093505859375, 0.5650100708007812, -0.6039581298828125, -0.555511474609375, -0.1637420654296875, -0.1267852783203125, -0.11008071899414062, -0.0263214111328125, 0.29369354248046875, 0.09075927734375, -0.14082717895507812, -0.5956802368164062, -0.8436050415039062, -0.031261444091796875, 0.01947021484375, 0.30739593505859375, -0.5379562377929688, 0.045902252197265625, -0.43447113037109375, 0.43951416015625, 0.08877182006835938, -0.10002899169921875, -0.13650131225585938, -0.674774169921875, -0.3313560485839844, -0.1558990478515625, -0.3869476318359375, 0.17571258544921875, -0.5782661437988281, 0.21829605102539062, -0.7462615966796875, -0.0468292236328125, -0.0881500244140625, 0.12769126892089844, 0.0481719970703125, -0.06797027587890625, 0.509857177734375, 0.1834564208984375, 0.301177978515625, 0.46231842041015625, -0.5329132080078125, 0.12535858154296875, -0.10021209716796875, -0.1490936279296875, 0.13348388671875, 0.09304428100585938, 0.12274169921875, -0.233062744140625, 0.037258148193359375, -0.650787353515625, 0.00328826904296875, 0.005035400390625, -0.11647415161132812, -0.1278228759765625, 0.5725860595703125, -0.15106964111328125, -0.22742271423339844], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000020.npy"}
|
||||
{"epoch": 0.030234315948601664, "step": 21, "batch_size": 64, "mean": 0.03705599904060364, "std": 0.44235506653785706, "min": -1.02069091796875, "p10": -0.46773948669433585, "median": 0.018907546997070312, "p90": 0.5392869949340824, "max": 1.230682373046875, "pos_frac": 0.546875, "sample": [-0.386962890625, -0.04913330078125, 0.20643234252929688, 0.601165771484375, -0.763031005859375, 0.2824249267578125, 0.017704010009765625, -0.1606292724609375, 0.17457199096679688, 0.22688865661621094, 0.2810821533203125, 0.1797637939453125, 1.230682373046875, -0.3322906494140625, 0.025104522705078125, 0.44979095458984375, -0.06755828857421875, 0.01209259033203125, 0.1341552734375, 0.37125205993652344, -0.6508827209472656, -0.2647132873535156, 0.045867919921875, -0.6925506591796875, 0.020111083984375, -0.2886505126953125, -0.7353649139404297, -0.0516204833984375, 0.093231201171875, -0.16063308715820312, -0.49346923828125, -0.4077033996582031, -1.02069091796875, -0.16900634765625, 0.404296875, 0.0774688720703125, 0.1505584716796875, -0.323089599609375, 0.6032333374023438, -0.06989860534667969, 0.3089141845703125, -0.093780517578125, -0.66229248046875, -0.076904296875, 1.206146240234375, 0.0496063232421875, 0.3532295227050781, -0.0738677978515625, 0.20360946655273438, -0.2863655090332031, 0.3481941223144531, -0.2945556640625, -0.22304725646972656, 0.003841400146484375, 0.44960594177246094, -0.14596939086914062, 0.27127838134765625, -0.1687774658203125, 0.5776424407958984, 1.0462646484375, 0.12146759033203125, 0.429290771484375, 0.8936920166015625, -0.36563873291015625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000021.npy"}
|
||||
{"epoch": 0.031746031746031744, "step": 22, "batch_size": 64, "mean": -0.028704792261123657, "std": 0.4497559070587158, "min": -1.1220245361328125, "p10": -0.49561347961425783, "median": -0.017971038818359375, "p90": 0.48090114593505895, "max": 1.565032958984375, "pos_frac": 0.46875, "sample": [-0.2440338134765625, 0.36995697021484375, 0.23345947265625, -0.37552642822265625, -0.2235240936279297, -0.443145751953125, 0.23920631408691406, -0.8149948120117188, 0.06459808349609375, 0.5803012847900391, -0.217437744140625, -0.07259368896484375, 0.020648956298828125, 0.0648193359375, 0.5160293579101562, -1.1220245361328125, -0.3626213073730469, 0.07788848876953125, -0.01744842529296875, -0.028961181640625, -0.36931610107421875, -0.3289947509765625, 0.3980255126953125, 0.39893531799316406, 0.10050201416015625, 0.9724273681640625, -0.01978302001953125, -0.001514434814453125, 0.13365554809570312, -0.0220794677734375, 1.565032958984375, -0.8934326171875, -0.4902381896972656, 0.29857635498046875, 0.07203292846679688, -0.4849090576171875, -0.031833648681640625, 0.10271453857421875, 0.0258026123046875, 0.33539581298828125, 0.18535232543945312, 0.846832275390625, -0.527069091796875, -0.0259552001953125, -0.19725418090820312, -0.2028961181640625, 0.5995712280273438, -0.92401123046875, 0.2333984375, 0.101226806640625, -0.039546966552734375, -0.755828857421875, 0.24822616577148438, 0.0337677001953125, -0.38669586181640625, -0.3502922058105469, -0.19357681274414062, -0.01849365234375, -0.34885406494140625, 0.08446502685546875, -0.49791717529296875, 0.561370849609375, 0.0009326934814453125, -0.2694549560546875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000022.npy"}
|
||||
{"epoch": 0.03325774754346183, "step": 23, "batch_size": 64, "mean": 0.0760551393032074, "std": 0.42486998438835144, "min": -0.980712890625, "p10": -0.49279022216796875, "median": 0.10068511962890625, "p90": 0.5125381469726563, "max": 0.91070556640625, "pos_frac": 0.609375, "sample": [-0.086090087890625, 0.583099365234375, 0.1584014892578125, -0.474456787109375, 0.05923652648925781, -0.04346275329589844, -0.12459564208984375, 0.87835693359375, -0.11588287353515625, -0.1113128662109375, -0.47655487060546875, 0.3551464080810547, -0.3086891174316406, -0.11941146850585938, 0.5886459350585938, 0.73907470703125, 0.28168487548828125, 0.12815093994140625, 0.35572052001953125, -0.204864501953125, 0.1912994384765625, -0.06207084655761719, -0.1505584716796875, 0.49521636962890625, 0.50244140625, 0.239501953125, 0.1220550537109375, 0.5308265686035156, -0.689239501953125, 0.44451141357421875, -0.888214111328125, 0.45255279541015625, 0.20301055908203125, -0.055477142333984375, 0.4749755859375, -0.22696304321289062, 0.079315185546875, 0.24969482421875, -0.2828559875488281, 0.014537811279296875, -0.9071502685546875, 0.34885406494140625, 0.06336212158203125, -0.1065216064453125, 0.049346923828125, -0.77972412109375, -0.006336212158203125, 0.42812347412109375, 0.05512237548828125, -0.980712890625, 0.91070556640625, 0.2813224792480469, 0.2834320068359375, 0.5029582977294922, 0.5044097900390625, 0.43318939208984375, 0.5033721923828125, 0.30770111083984375, -0.5672988891601562, 0.516021728515625, -0.4412994384765625, 0.03427886962890625, 0.22736358642578125, -0.49974822998046875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000023.npy"}
|
||||
{"epoch": 0.03476946334089191, "step": 24, "batch_size": 64, "mean": -0.038009583950042725, "std": 0.3605614900588989, "min": -1.0170440673828125, "p10": -0.4759063720703125, "median": -0.009674072265625, "p90": 0.44581546783447273, "max": 0.8951530456542969, "pos_frac": 0.484375, "sample": [0.11382293701171875, -0.6698684692382812, 0.2472991943359375, 0.1941967010498047, -0.35630035400390625, 0.10555267333984375, 0.24608612060546875, -0.04293060302734375, -1.0170440673828125, 0.01242828369140625, -0.211700439453125, -0.04724884033203125, 0.036865234375, -0.320587158203125, -0.027587890625, 0.463287353515625, 0.3372535705566406, -0.194488525390625, 0.29975128173828125, -0.017822265625, 0.20439910888671875, 0.1138916015625, 0.36388397216796875, -0.3414802551269531, -0.74774169921875, 0.09810256958007812, -0.16194915771484375, -0.11100006103515625, 0.575103759765625, 0.04718780517578125, -0.2332000732421875, 0.004184722900390625, -0.3158378601074219, 0.144989013671875, 0.8951530456542969, -0.451873779296875, -0.20514678955078125, 0.42514610290527344, 0.19435882568359375, -0.30631256103515625, -0.34644317626953125, -0.16916275024414062, 0.03650665283203125, 0.18436813354492188, -0.4862060546875, -0.00152587890625, 0.051494598388671875, 0.45467376708984375, -0.24533462524414062, 0.0843658447265625, 0.01773834228515625, -0.17273330688476562, -0.25943756103515625, -0.1198577880859375, -0.732696533203125, -0.02492523193359375, 0.1947174072265625, -0.5234375, 0.462310791015625, -0.12129974365234375, 0.588531494140625, -0.7980499267578125, -0.341888427734375, 0.4928550720214844], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000024.npy"}
|
||||
{"epoch": 0.036281179138321996, "step": 25, "batch_size": 64, "mean": 0.04470303654670715, "std": 0.3919016122817993, "min": -0.947052001953125, "p10": -0.4113300323486328, "median": 0.09044647216796875, "p90": 0.49734458923339847, "max": 1.0600738525390625, "pos_frac": 0.59375, "sample": [0.5024681091308594, 0.14948272705078125, 0.09974288940429688, 0.1532745361328125, 0.1625213623046875, 0.23046493530273438, 0.15887451171875, 0.09652328491210938, 0.1871337890625, -0.1505126953125, -0.39046478271484375, -0.1454925537109375, 0.21016693115234375, 0.200042724609375, 0.2951507568359375, 0.5336990356445312, 0.670135498046875, -0.09394073486328125, -0.26371002197265625, 0.43589019775390625, -0.15639114379882812, -0.055980682373046875, -0.70257568359375, 0.621337890625, -0.6364898681640625, -0.0474700927734375, 0.4553337097167969, -0.947052001953125, 0.39768218994140625, -0.32958984375, -0.16628074645996094, 0.40552711486816406, -0.03215789794921875, 0.12432861328125, 0.199737548828125, 0.3777809143066406, -0.807952880859375, -0.11819076538085938, -0.4200859069824219, -0.390899658203125, -0.081634521484375, -0.21631240844726562, 0.07839393615722656, 0.10947036743164062, -0.07715606689453125, -0.06558990478515625, 0.6348495483398438, 0.3139686584472656, -0.22571563720703125, 0.0795440673828125, 0.05875396728515625, 0.48538970947265625, 0.43598175048828125, 0.08436965942382812, -0.82275390625, 0.11244964599609375, -0.735687255859375, 0.17119598388671875, -0.3555450439453125, 0.041240692138671875, 0.31743621826171875, 0.604888916015625, 0.04132080078125, 1.0600738525390625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000025.npy"}
|
||||
{"epoch": 0.03779289493575208, "step": 26, "batch_size": 64, "mean": 0.08338648080825806, "std": 0.37791183590888977, "min": -0.6041641235351562, "p10": -0.39337234497070306, "median": 0.07788467407226562, "p90": 0.5881595611572267, "max": 1.130096435546875, "pos_frac": 0.578125, "sample": [-0.16703033447265625, -0.472259521484375, 0.17046356201171875, 0.00827789306640625, 0.08922958374023438, 0.0113525390625, 0.59576416015625, -0.2746162414550781, 0.037395477294921875, 0.4293174743652344, 0.6600570678710938, 0.3311767578125, -0.4136810302734375, -0.23095703125, 0.5371246337890625, 0.45973968505859375, -0.14528274536132812, -0.5283088684082031, 0.35764312744140625, 0.2696342468261719, 0.1645660400390625, 0.0582122802734375, 0.0719451904296875, -0.20589065551757812, -0.16860580444335938, 0.10886383056640625, 0.8778877258300781, -0.16145896911621094, -0.5262374877929688, 0.3465728759765625, 0.09747314453125, 0.14984130859375, -0.20905303955078125, -0.31557464599609375, 0.34870147705078125, 1.130096435546875, -0.6041641235351562, -0.011669158935546875, 0.5704154968261719, 0.7698516845703125, 0.8591499328613281, 0.23114013671875, 0.08382415771484375, 0.27063751220703125, 0.49913787841796875, 0.1827545166015625, -0.156646728515625, 0.2132110595703125, 0.5958251953125, -0.18604660034179688, 0.16466522216796875, -0.34598541259765625, -0.15975379943847656, -0.4347076416015625, -0.22142410278320312, 0.5315093994140625, -0.1425933837890625, -0.18503952026367188, 0.1429595947265625, -0.49289703369140625, -0.12648773193359375, 0.1726665496826172, -0.19298553466796875, -0.18299293518066406], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000026.npy"}
|
||||
{"epoch": 0.039304610733182165, "step": 27, "batch_size": 64, "mean": 0.0542508065700531, "std": 0.3701930344104767, "min": -0.75457763671875, "p10": -0.4561717987060547, "median": 0.07767581939697266, "p90": 0.46059188842773446, "max": 1.203643798828125, "pos_frac": 0.625, "sample": [0.190399169921875, 0.19190216064453125, -0.47000885009765625, 0.018917083740234375, 0.41690826416015625, 0.0402069091796875, 0.12508010864257812, 0.6930999755859375, 0.1134185791015625, 0.293182373046875, 0.0726165771484375, 0.1326446533203125, 0.32457542419433594, 0.08273506164550781, 0.14955520629882812, -0.19711685180664062, 0.23046493530273438, 0.194122314453125, 0.2766265869140625, 0.03717803955078125, -0.20697784423828125, -0.5462646484375, 0.15911865234375, -0.5829086303710938, -0.6795806884765625, 1.203643798828125, 0.052703857421875, -0.13081741333007812, 0.00795745849609375, -0.2010822296142578, 0.0343475341796875, 0.3134899139404297, -0.2712593078613281, 0.46796417236328125, 0.3808403015136719, -0.75457763671875, -0.48264312744140625, 0.33884429931640625, 0.5466690063476562, -0.082794189453125, 0.4066925048828125, 0.197113037109375, -0.26689910888671875, -0.1922607421875, 0.6122360229492188, 0.23621368408203125, -0.41161346435546875, 0.26705169677734375, 0.5347900390625, 0.443389892578125, 0.002986907958984375, 0.15038681030273438, -0.04557228088378906, -0.41363525390625, -0.24007034301757812, -0.11537933349609375, -0.417388916015625, 0.3346443176269531, 0.24603271484375, -0.4238853454589844, 0.7531585693359375, -0.08270263671875, -0.5013580322265625, -0.08506011962890625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000027.npy"}
|
||||
{"epoch": 0.04081632653061224, "step": 28, "batch_size": 64, "mean": -0.007928639650344849, "std": 0.39496785402297974, "min": -1.1921157836914062, "p10": -0.4946025848388671, "median": 0.0083160400390625, "p90": 0.44070281982421894, "max": 1.053466796875, "pos_frac": 0.515625, "sample": [-0.530303955078125, -0.1733856201171875, -0.420013427734375, -0.16796112060546875, 0.24532318115234375, -0.16629981994628906, -0.6594390869140625, 0.7478103637695312, -0.16442298889160156, 0.015350341796875, 0.23950958251953125, 0.5880050659179688, 0.264068603515625, 0.39678192138671875, 1.053466796875, 0.016490936279296875, -0.057590484619140625, 0.04444122314453125, -0.25449371337890625, 0.33818817138671875, -0.13018035888671875, 0.4745635986328125, 0.07978630065917969, -0.05374908447265625, -0.10680389404296875, 0.14048194885253906, -0.0068874359130859375, -0.260101318359375, -0.19428634643554688, -0.21187210083007812, 0.0375518798828125, 0.18321609497070312, 0.136688232421875, 0.1541748046875, -1.1921157836914062, 0.45952606201171875, 0.07899665832519531, -0.03372955322265625, -0.1821441650390625, 0.7747802734375, -0.21564102172851562, -0.647674560546875, -0.10518264770507812, 0.5151939392089844, 0.13090133666992188, -0.05545616149902344, 0.3877677917480469, -0.034820556640625, 0.00128173828125, -0.5265693664550781, 0.03588104248046875, 0.1175994873046875, -0.11132431030273438, -0.8311233520507812, 0.25445556640625, 0.3360137939453125, 0.14198684692382812, -0.3695068359375, -0.100616455078125, -0.3129730224609375, -1.0963096618652344, 0.06861495971679688, 0.28437042236328125, 0.12227630615234375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000028.npy"}
|
||||
{"epoch": 0.042328042328042326, "step": 29, "batch_size": 64, "mean": 0.05503681302070618, "std": 0.5025972127914429, "min": -0.998077392578125, "p10": -0.5514785766601562, "median": 0.023042678833007812, "p90": 0.6327720642089845, "max": 1.961944580078125, "pos_frac": 0.515625, "sample": [-0.3285865783691406, 0.5019035339355469, -0.13736724853515625, 0.93798828125, -0.1703033447265625, 0.39932823181152344, -0.5921783447265625, 0.2816181182861328, -0.4403839111328125, 0.4156951904296875, -0.16958999633789062, -0.8369293212890625, -0.153900146484375, 0.5808677673339844, 0.1806793212890625, 0.6044082641601562, -0.152587890625, 0.18023300170898438, -0.998077392578125, 0.63739013671875, -0.1343231201171875, 0.9155426025390625, 0.005893707275390625, 0.36618804931640625, 0.6221389770507812, 0.246490478515625, 0.1130218505859375, -0.46692657470703125, 1.961944580078125, -0.0005207061767578125, -0.158203125, 0.040191650390625, -0.781768798828125, -0.28006744384765625, -0.4578857421875, 0.06145286560058594, -0.3788013458251953, 0.26857757568359375, 0.5363998413085938, 0.6373291015625, 0.3988227844238281, -0.090423583984375, 0.13689041137695312, 0.21198272705078125, -0.10559844970703125, -0.5986480712890625, -0.4863128662109375, -0.57940673828125, 0.16886520385742188, 0.086456298828125, 0.655303955078125, 0.3101921081542969, 0.20418548583984375, -0.405548095703125, 0.26690673828125, 0.4042472839355469, -0.3435211181640625, -0.27712249755859375, 0.9549407958984375, -0.643280029296875, -0.1825714111328125, -0.22747802734375, -0.014698028564453125, -0.1787109375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000029.npy"}
|
||||
{"epoch": 0.04383975812547241, "step": 30, "batch_size": 64, "mean": 0.06876346468925476, "std": 0.4496661126613617, "min": -1.91143798828125, "p10": -0.39254913330078123, "median": 0.07380104064941406, "p90": 0.514767074584961, "max": 1.1277313232421875, "pos_frac": 0.546875, "sample": [0.6074371337890625, -0.4481773376464844, 0.065032958984375, 0.2991485595703125, 0.04688262939453125, -0.06204986572265625, -0.07842445373535156, -0.2664337158203125, -0.08226776123046875, -0.5840835571289062, 0.4515228271484375, 0.3038330078125, 0.20875930786132812, 0.419891357421875, -0.207366943359375, 0.111541748046875, 0.3058662414550781, -0.20098876953125, -0.045619964599609375, 0.08359527587890625, 0.52264404296875, 0.4854011535644531, 0.251068115234375, 0.12599945068359375, -0.04451751708984375, -0.13999176025390625, 0.46044158935546875, -0.01396942138671875, -0.3607940673828125, 0.4054603576660156, 0.436126708984375, 0.11483383178710938, 0.3570899963378906, -0.22884368896484375, 1.1277313232421875, -0.6616668701171875, 0.410064697265625, -0.10425949096679688, 0.18353271484375, 0.08256912231445312, -0.04662513732910156, 0.24120330810546875, -1.91143798828125, 1.0628585815429688, -0.005420684814453125, -0.6988906860351562, 0.60284423828125, -0.24233245849609375, -0.16608810424804688, 0.02447509765625, 0.3917121887207031, 0.712493896484375, 0.4963874816894531, -0.4696197509765625, -0.25035858154296875, 0.7141532897949219, -0.17042922973632812, -0.1844635009765625, -0.406158447265625, 0.1872539520263672, 0.3100318908691406, -0.27972412109375, -0.11585235595703125, 0.26782989501953125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000030.npy"}
|
||||
{"epoch": 0.045351473922902494, "step": 31, "batch_size": 64, "mean": 0.0010884404182434082, "std": 0.3672262728214264, "min": -0.8467864990234375, "p10": -0.4319618225097656, "median": -0.004734992980957031, "p90": 0.46820755004882814, "max": 1.1467437744140625, "pos_frac": 0.5, "sample": [0.03891754150390625, -0.45752716064453125, -0.8467864990234375, -0.4977397918701172, 0.09950447082519531, 0.170989990234375, -0.2321014404296875, -0.2084197998046875, 0.31038665771484375, 0.05715179443359375, -0.0321044921875, 1.1467437744140625, -0.5692596435546875, -0.220550537109375, -0.2598114013671875, 0.333740234375, -0.10448265075683594, 0.617462158203125, 0.39559173583984375, 0.0711669921875, -0.1862945556640625, -0.13830184936523438, -0.29253387451171875, 0.057476043701171875, -0.43682098388671875, -0.24444580078125, 0.8714752197265625, -0.420623779296875, 0.0050811767578125, -0.1368408203125, -0.2855224609375, 0.4707984924316406, 0.6108016967773438, 0.20183181762695312, -0.014551162719726562, 0.07891082763671875, -0.7698135375976562, 0.4621620178222656, -0.26324462890625, 0.07392692565917969, 0.13817596435546875, -0.21669769287109375, -0.22335433959960938, 0.0347442626953125, -0.16741943359375, -0.31784820556640625, 0.06892776489257812, -0.07912826538085938, 0.19398880004882812, 0.12357330322265625, 0.284423828125, 0.07233810424804688, 0.2593994140625, -0.049457550048828125, -0.634918212890625, 0.23369216918945312, 0.6045074462890625, 0.5823211669921875, -0.0467987060546875, 0.24699783325195312, -0.1448211669921875, -0.33458518981933594, -0.10850143432617188, 0.09375762939453125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000031.npy"}
|
||||
{"epoch": 0.04686318972033258, "step": 32, "batch_size": 64, "mean": 0.07468008995056152, "std": 0.4342382848262787, "min": -1.1485595703125, "p10": -0.36600418090820314, "median": 0.07296371459960938, "p90": 0.590557861328125, "max": 1.1757965087890625, "pos_frac": 0.59375, "sample": [0.57342529296875, 0.40139007568359375, -0.15088653564453125, 0.6908416748046875, 0.339996337890625, 0.3924102783203125, -0.5384521484375, 0.6322479248046875, -1.09368896484375, -0.03656768798828125, -0.10941314697265625, 0.4380645751953125, 0.4296073913574219, -0.30054473876953125, 0.0095977783203125, 0.5212249755859375, -0.544342041015625, 0.09138870239257812, -0.2509613037109375, 0.10060882568359375, -0.2903327941894531, 0.43096160888671875, -0.025119781494140625, 0.0653839111328125, -0.0905303955078125, 0.06493568420410156, 0.22349929809570312, 0.1463775634765625, 0.127044677734375, 0.08054351806640625, 0.18306350708007812, -0.3028717041015625, 0.40139007568359375, -1.1485595703125, 0.46075439453125, 0.32623291015625, 0.04726409912109375, 0.597900390625, -0.1251220703125, -0.35320281982421875, 0.34442138671875, 0.8384284973144531, 0.06257247924804688, -0.9830474853515625, 0.2979412078857422, 0.6420669555664062, 0.23601913452148438, 0.7421722412109375, 1.1757965087890625, -0.09067535400390625, -0.3490753173828125, -0.42119598388671875, -0.16054916381835938, -0.001758575439453125, 0.2272186279296875, 0.005466461181640625, 0.20534515380859375, 0.5705490112304688, 0.08055877685546875, -0.0446319580078125, -0.10516357421875, -0.2054443359375, -0.371490478515625, -0.3315582275390625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000032.npy"}
{"epoch": 0.04837490551776266, "step": 33, "batch_size": 64, "mean": -0.02306431531906128, "std": 0.38692688941955566, "min": -1.0522537231445312, "p10": -0.43959655761718747, "median": -0.06341552734375, "p90": 0.43071975708007815, "max": 1.024444580078125, "pos_frac": 0.40625, "sample": [0.3400726318359375, 0.5917510986328125, 0.2276153564453125, -0.4621734619140625, -0.013519287109375, 0.0694580078125, -1.0522537231445312, -0.3495140075683594, 0.1807403564453125, 0.43536376953125, 0.23577880859375, -0.15013885498046875, -0.3417034149169922, -0.29953765869140625, 1.024444580078125, -0.13512420654296875, -0.2649078369140625, 0.05841064453125, 0.13118362426757812, 0.8556671142578125, -0.330535888671875, -0.21258544921875, -0.2120189666748047, -0.08069610595703125, -0.052459716796875, 0.3376007080078125, 1.0061187744140625, -0.011707305908203125, -0.3330535888671875, -0.7138214111328125, -0.04738426208496094, 0.0469818115234375, -0.04135894775390625, -0.1279754638671875, 0.6417503356933594, -0.12756729125976562, -0.32099342346191406, 0.41988372802734375, 0.16613006591796875, -0.18308639526367188, -0.219024658203125, -0.09813690185546875, -0.31463623046875, -0.26104736328125, 0.1926116943359375, -0.074371337890625, 0.166412353515625, -0.3869171142578125, 0.4419517517089844, 0.18277740478515625, -0.12322998046875, 0.248046875, -0.014026641845703125, 0.14235305786132812, 0.2030181884765625, -0.1745452880859375, 0.2682342529296875, -0.715240478515625, -0.6671142578125, -0.4825897216796875, -0.21433639526367188, -0.223785400390625, -0.4980621337890625, 0.2407073974609375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000033.npy"}
{"epoch": 0.049886621315192746, "step": 34, "batch_size": 64, "mean": 0.058904558420181274, "std": 0.35846978425979614, "min": -0.6466255187988281, "p10": -0.3864332199096679, "median": 0.013479232788085938, "p90": 0.5778144836425781, "max": 0.8130874633789062, "pos_frac": 0.515625, "sample": [-0.4186363220214844, -0.5765914916992188, 0.14633941650390625, 0.4119071960449219, -0.294647216796875, 0.78857421875, 0.5948867797851562, 0.57525634765625, 0.03889274597167969, 0.529144287109375, -0.03374481201171875, -0.0941314697265625, -0.0026702880859375, -0.294830322265625, 0.2589263916015625, -0.12683868408203125, 0.6923065185546875, -0.12347030639648438, -0.062896728515625, -0.008258819580078125, 0.0114593505859375, 0.13365936279296875, -0.22306442260742188, -0.1069793701171875, -0.49786376953125, -0.00433349609375, -0.46190643310546875, 0.24300384521484375, 0.15448760986328125, 0.015773773193359375, 0.015499114990234375, 0.37277984619140625, -0.15000343322753906, 0.46361351013183594, -0.02190399169921875, -0.645538330078125, -0.125946044921875, 0.038883209228515625, 0.14303207397460938, 0.6706390380859375, -0.26873016357421875, -0.1132659912109375, -0.1849193572998047, -0.212127685546875, -0.538787841796875, 0.5497283935546875, -0.08805084228515625, -0.6466255187988281, -0.31089019775390625, 0.22960662841796875, -0.3112926483154297, 0.6536483764648438, 0.16861724853515625, 0.16114044189453125, 0.49353790283203125, 0.5789108276367188, 0.253448486328125, 0.06401443481445312, 0.37178993225097656, -0.1690998077392578, 0.31097412109375, -0.0780487060546875, 0.8130874633789062, 0.0184173583984375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000034.npy"}
{"epoch": 0.05139833711262283, "step": 35, "batch_size": 64, "mean": 0.0740436315536499, "std": 0.4060877561569214, "min": -0.8867645263671875, "p10": -0.34484500885009767, "median": 0.01981067657470703, "p90": 0.5251434326171877, "max": 1.5816192626953125, "pos_frac": 0.546875, "sample": [0.2278289794921875, -0.2672119140625, -0.1830291748046875, 0.3044586181640625, -0.3083477020263672, 1.5816192626953125, -0.060546875, -0.15423583984375, -0.7410125732421875, -0.10042572021484375, 0.4913482666015625, 0.11390113830566406, 0.7349929809570312, -0.3448009490966797, 0.29889678955078125, -0.4445228576660156, 0.82513427734375, -0.18849945068359375, 0.1610870361328125, 0.006988525390625, 0.02191925048828125, -0.26226043701171875, -0.4168548583984375, 0.01229095458984375, 0.030208587646484375, -0.2418365478515625, 0.8549652099609375, 0.3032493591308594, 0.5529022216796875, 0.0253448486328125, 0.390960693359375, -0.8867645263671875, 0.2183380126953125, -0.005130767822265625, -0.3768310546875, 1.049224853515625, 0.030931472778320312, -0.3448638916015625, -0.15150070190429688, 0.15000152587890625, 0.12158203125, -0.055553436279296875, -0.19646072387695312, 0.4658470153808594, -0.06765365600585938, 0.12238311767578125, 0.21470260620117188, 0.35931396484375, -0.17058181762695312, 0.017702102661132812, -0.056110382080078125, -0.14618301391601562, -0.10511016845703125, -0.205108642578125, 0.1244659423828125, 0.4474906921386719, 0.334259033203125, -0.42357635498046875, 0.06510162353515625, 0.5396270751953125, -0.1326751708984375, -0.15248870849609375, 0.4587249755859375, 0.2711772918701172], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000035.npy"}
{"epoch": 0.05291005291005291, "step": 36, "batch_size": 64, "mean": 0.09882298111915588, "std": 0.4983120560646057, "min": -0.8367424011230469, "p10": -0.3817268371582031, "median": 0.06438827514648438, "p90": 0.5570125579833985, "max": 2.0362701416015625, "pos_frac": 0.53125, "sample": [0.123260498046875, 0.5094146728515625, 2.0362701416015625, 0.07161712646484375, -0.2193603515625, -0.595367431640625, -0.1591949462890625, 0.4762420654296875, -0.16603469848632812, -0.1544952392578125, -0.21765899658203125, 0.33998680114746094, -0.38893890380859375, -0.199981689453125, -0.07272911071777344, 0.3228492736816406, -0.546630859375, -0.06859970092773438, 0.24146270751953125, -0.5219802856445312, 0.7335357666015625, 0.38939666748046875, 0.5646743774414062, 0.09665107727050781, -0.35242462158203125, -0.18019866943359375, -0.2641105651855469, 0.3111419677734375, 0.12129974365234375, 1.94622802734375, 0.3856964111328125, 0.0881805419921875, -0.4164619445800781, -0.14344024658203125, 0.4443359375, 0.057159423828125, -0.322052001953125, 0.8802032470703125, 0.11324310302734375, -0.8027114868164062, 0.15996170043945312, 0.5648040771484375, -0.08113861083984375, -0.05840492248535156, 0.00186920166015625, -0.10234642028808594, -0.031558990478515625, 0.9172477722167969, -0.364898681640625, -0.0063934326171875, -0.8367424011230469, -0.21404266357421875, 0.27197265625, -0.1867828369140625, 0.5391349792480469, -0.28960418701171875, 0.38614654541015625, 0.2854728698730469, 0.137481689453125, 0.3791351318359375, 0.1571502685546875, 0.238494873046875, 0.07420730590820312, -0.07697296142578125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000036.npy"}
{"epoch": 0.05442176870748299, "step": 37, "batch_size": 64, "mean": -0.00628247857093811, "std": 0.41717907786369324, "min": -1.0715179443359375, "p10": -0.49281730651855465, "median": -0.03659629821777344, "p90": 0.5378742218017584, "max": 1.16522216796875, "pos_frac": 0.453125, "sample": [-0.1060638427734375, -0.2145538330078125, -0.409027099609375, -0.122314453125, -0.24993896484375, 0.135498046875, -0.0350189208984375, 0.1784687042236328, 0.027322769165039062, -0.4716339111328125, 0.185791015625, 0.17370223999023438, -0.12276458740234375, -0.197052001953125, 0.3669700622558594, -0.318939208984375, -0.24718093872070312, 0.7841453552246094, 0.2732353210449219, 0.66461181640625, 0.5994377136230469, 0.0457305908203125, 1.16522216796875, 0.6999969482421875, -0.2934913635253906, -0.3793792724609375, 0.1116180419921875, -0.4614105224609375, -0.7510833740234375, 0.15108489990234375, -0.23284912109375, -0.30625152587890625, -0.144195556640625, -0.7474288940429688, -0.005401611328125, 0.3042755126953125, -0.04347991943359375, -0.038173675537109375, 0.13053131103515625, -0.17783355712890625, 0.74017333984375, 0.2806510925292969, 0.34796905517578125, 0.39422607421875, -0.2676544189453125, -0.5068454742431641, -0.5483245849609375, -1.0715179443359375, -0.12347412109375, 0.207244873046875, -0.5018959045410156, 0.059810638427734375, -0.14865493774414062, -0.2379608154296875, 0.16236114501953125, 0.2642402648925781, -0.014392852783203125, 0.027273178100585938, 1.00146484375, -0.5227203369140625, -0.470947265625, -0.0530242919921875, 0.2726116180419922, 0.3851318359375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000037.npy"}
{"epoch": 0.055933484504913075, "step": 38, "batch_size": 64, "mean": 0.005452901124954224, "std": 0.42673128843307495, "min": -0.8671875, "p10": -0.48159217834472656, "median": -0.02039813995361328, "p90": 0.5309333801269532, "max": 1.5721588134765625, "pos_frac": 0.484375, "sample": [0.31873321533203125, -0.11399459838867188, 0.5514068603515625, -0.12813186645507812, 0.1050567626953125, 0.25177001953125, 0.16907119750976562, 0.06436538696289062, 0.9974517822265625, -0.57196044921875, -0.18968963623046875, -0.483428955078125, 0.2375621795654297, -0.3354606628417969, -0.31273651123046875, -0.099761962890625, 0.38348388671875, 0.10787200927734375, -0.44699859619140625, 0.996337890625, 0.02862548828125, 0.746368408203125, -0.5497665405273438, 0.471282958984375, -0.17776870727539062, 0.207855224609375, 0.18780517578125, -0.11923599243164062, 0.5382232666015625, -0.12276458740234375, -0.4773063659667969, -0.8671875, 0.045780181884765625, 0.5139236450195312, 0.19621658325195312, -0.3827667236328125, 0.214141845703125, -0.07271766662597656, -0.22391128540039062, 0.03618049621582031, 0.187591552734375, -0.7402801513671875, -0.04736137390136719, -0.45870208740234375, -0.3398590087890625, -0.180633544921875, 0.2201385498046875, -0.11684417724609375, 0.11180877685546875, -0.027069091796875, 0.5550384521484375, 0.0875244140625, 1.5721588134765625, -0.3523063659667969, -0.043460845947265625, -0.16071319580078125, -0.38660430908203125, -0.21202850341796875, -0.6426162719726562, 0.01702880859375, 0.09172821044921875, -0.5211181640625, -0.013727188110351562, 0.05536651611328125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000038.npy"}
{"epoch": 0.05744520030234316, "step": 39, "batch_size": 64, "mean": 0.11931854486465454, "std": 0.3461139500141144, "min": -0.897125244140625, "p10": -0.2260400772094726, "median": 0.07345199584960938, "p90": 0.5943695068359376, "max": 1.1458740234375, "pos_frac": 0.640625, "sample": [0.45473480224609375, 0.1974334716796875, 0.5121612548828125, 0.19064903259277344, 0.5833358764648438, 0.023162841796875, 0.3491668701171875, -0.03749847412109375, -0.1718292236328125, 0.29949188232421875, 0.182403564453125, -0.06372451782226562, 0.2906074523925781, -0.10904502868652344, -0.1042633056640625, -0.09025192260742188, -0.13997840881347656, 0.08612823486328125, 0.07988166809082031, 0.4420013427734375, 0.01287841796875, 0.6111297607421875, 0.224853515625, 0.5472526550292969, 1.1458740234375, 0.6518898010253906, -0.1128387451171875, -0.1254138946533203, -0.4751739501953125, 0.19380569458007812, 0.0658721923828125, 0.61322021484375, 0.5990982055664062, -0.07417678833007812, -0.4171714782714844, 0.06034088134765625, 0.0056610107421875, -0.4391632080078125, -0.14404296875, 0.6514892578125, -0.0239410400390625, 0.23284912109375, -0.24927330017089844, 0.769744873046875, -0.41510009765625, 0.04434967041015625, 0.452789306640625, 0.16900634765625, 0.5026206970214844, 0.048343658447265625, 0.07442474365234375, 0.26461029052734375, 0.04217529296875, -0.05390167236328125, -0.10400962829589844, -0.1639404296875, -0.137359619140625, -0.897125244140625, 0.33922576904296875, 0.18247222900390625, 0.123992919921875, 0.24971961975097656, 0.072479248046875, -0.4577178955078125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000039.npy"}
{"epoch": 0.05895691609977324, "step": 40, "batch_size": 64, "mean": -0.05691874027252197, "std": 0.4310424327850342, "min": -1.65277099609375, "p10": -0.6307830810546874, "median": -0.051568031311035156, "p90": 0.42664833068847674, "max": 0.889495849609375, "pos_frac": 0.421875, "sample": [0.20284271240234375, 0.35984039306640625, 0.845062255859375, 0.22306442260742188, -0.038715362548828125, -0.3626594543457031, -0.19232177734375, 0.2241973876953125, 0.6179046630859375, -0.7346458435058594, 0.3852348327636719, -0.04012298583984375, 0.7347030639648438, -0.14652633666992188, 0.065948486328125, -0.1655101776123047, 0.6326141357421875, 0.1592864990234375, -0.018632888793945312, -0.10219383239746094, 0.0527801513671875, 0.44439697265625, -0.532196044921875, 0.06691360473632812, -0.44049835205078125, -0.22552490234375, 0.31826019287109375, -0.5081405639648438, 0.0953369140625, 0.1960773468017578, -0.0030422210693359375, 0.889495849609375, -0.12237548828125, 0.19220733642578125, -0.21276283264160156, -0.2510662078857422, -0.2945747375488281, 0.0131683349609375, -0.06301307678222656, -0.1719818115234375, 0.08131217956542969, -1.65277099609375, 0.04936790466308594, -0.67303466796875, -0.26708984375, -0.07807159423828125, 0.18005752563476562, -0.17928695678710938, -0.2165069580078125, -0.183074951171875, 0.616363525390625, -0.699951171875, -0.6861724853515625, -0.3084716796875, -0.15316009521484375, -0.17235565185546875, 0.18292236328125, 0.34769439697265625, -0.9013595581054688, -0.7447509765625, 0.24590301513671875, -0.1789703369140625, -0.33429718017578125, -0.00992584228515625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000040.npy"}
{"epoch": 0.06046863189720333, "step": 41, "batch_size": 64, "mean": 0.08366268873214722, "std": 0.4139462113380432, "min": -1.2115936279296875, "p10": -0.4390071868896484, "median": 0.11165237426757812, "p90": 0.6184082031250001, "max": 0.9426116943359375, "pos_frac": 0.609375, "sample": [0.360748291015625, 0.33354949951171875, -0.06351470947265625, -0.5934524536132812, 0.9277191162109375, 0.074127197265625, -0.21319961547851562, 0.3304901123046875, 0.7626266479492188, -0.48560333251953125, -0.35688018798828125, 0.3340015411376953, -0.3906059265136719, -0.126617431640625, 0.25496673583984375, 0.9426116943359375, 0.13471221923828125, 0.3788604736328125, -0.4310417175292969, -0.10223197937011719, -0.6021728515625, -0.173004150390625, 0.16622161865234375, 0.05322265625, 0.2277050018310547, -0.22259521484375, 0.10630035400390625, 0.0967864990234375, 0.08832931518554688, 0.587860107421875, -0.07396316528320312, -1.2115936279296875, 0.886077880859375, 0.434814453125, 0.12339591979980469, -0.02547454833984375, 0.16985321044921875, -0.3424224853515625, 0.0976104736328125, -0.2502422332763672, 0.3197517395019531, -0.10453033447265625, 0.25434112548828125, -0.15732383728027344, 0.20168304443359375, -0.4104881286621094, 0.29848480224609375, -0.09561920166015625, 0.3191070556640625, 0.7156906127929688, -0.6182632446289062, 0.631500244140625, -0.44242095947265625, -0.44899749755859375, 0.12909698486328125, 0.2476959228515625, 0.0616912841796875, 0.1908416748046875, 0.13447189331054688, -0.0897674560546875, 0.5583953857421875, 0.5743408203125, 0.7597503662109375, 0.11700439453125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000041.npy"}
{"epoch": 0.06198034769463341, "step": 42, "batch_size": 64, "mean": 0.018820196390151978, "std": 0.4139035940170288, "min": -1.0120849609375, "p10": -0.44615707397460935, "median": 0.018230438232421875, "p90": 0.4938674926757814, "max": 0.9763336181640625, "pos_frac": 0.515625, "sample": [0.91387939453125, 0.45001220703125, 0.0204925537109375, 0.2764434814453125, 0.38785552978515625, -0.22486114501953125, -0.3484840393066406, -0.09750175476074219, 0.01596832275390625, -0.11434173583984375, -0.219940185546875, 0.1688079833984375, -0.053951263427734375, -0.8500862121582031, -1.0120849609375, 0.38105010986328125, 0.3343353271484375, 0.1818084716796875, 0.11806488037109375, -0.20850753784179688, 0.2348175048828125, 0.43445587158203125, 0.5403404235839844, -0.8408088684082031, 0.16259002685546875, 0.6825790405273438, 0.2852058410644531, -0.37940216064453125, 0.027315139770507812, -0.2894744873046875, 0.455169677734375, 0.10175704956054688, -0.27492332458496094, -0.0585479736328125, -0.03009033203125, -0.27263641357421875, -0.059478759765625, 0.2581024169921875, -0.15192413330078125, 0.9763336181640625, 0.20763397216796875, 0.22202301025390625, 0.1087646484375, -0.59857177734375, -1.0028839111328125, 0.5104522705078125, 0.446685791015625, -0.4541473388671875, -0.36078643798828125, -0.11978912353515625, -0.08370018005371094, -0.20055770874023438, 0.24333953857421875, 0.266387939453125, -0.16493606567382812, 0.5425033569335938, -0.42751312255859375, -0.09570884704589844, -0.09600067138671875, 0.694854736328125, 0.06243133544921875, 0.3469047546386719, -0.4594879150390625, -0.3037452697753906], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000042.npy"}
{"epoch": 0.06349206349206349, "step": 43, "batch_size": 64, "mean": -0.02868404984474182, "std": 0.48636728525161743, "min": -1.200408935546875, "p10": -0.627657699584961, "median": -0.014181137084960938, "p90": 0.5100791931152344, "max": 1.550262451171875, "pos_frac": 0.5, "sample": [-0.044097900390625, 0.9279594421386719, 0.49920654296875, 0.21611404418945312, 0.1095428466796875, 0.3795433044433594, -0.19405746459960938, 0.015735626220703125, -0.6544189453125, 0.052814483642578125, -0.23252487182617188, -0.5557785034179688, -0.545684814453125, -0.6323623657226562, 1.550262451171875, -0.8320159912109375, 0.521728515625, -0.18477630615234375, -0.08145332336425781, -0.36554718017578125, -0.2662181854248047, 0.40622711181640625, -0.440032958984375, -0.18318939208984375, -0.17342758178710938, 0.06547927856445312, 0.03640174865722656, 0.0314178466796875, 0.1951904296875, -0.1604766845703125, 0.8737335205078125, 0.157318115234375, 0.0211639404296875, 0.9609527587890625, -0.15427398681640625, 0.05963134765625, -0.5005645751953125, 0.07838821411132812, 0.5107269287109375, 0.5085678100585938, -0.4830474853515625, -0.6166801452636719, -0.07362747192382812, -0.8811569213867188, -0.08594322204589844, -1.200408935546875, -0.7005233764648438, 0.03493690490722656, -0.5595321655273438, 0.1395263671875, 0.470184326171875, 0.11446380615234375, -0.1950531005859375, -0.1911773681640625, 0.05908966064453125, 0.21669769287109375, 0.4339599609375, -0.19986724853515625, 0.6239013671875, -0.956268310546875, -0.065765380859375, 0.34908294677734375, -0.16312408447265625, 0.11734771728515625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000043.npy"}
{"epoch": 0.06500377928949358, "step": 44, "batch_size": 64, "mean": 0.005178704857826233, "std": 0.45065608620643616, "min": -1.2264404296875, "p10": -0.5901016235351562, "median": -0.021246910095214844, "p90": 0.7230461120605469, "max": 1.1218414306640625, "pos_frac": 0.46875, "sample": [-0.007350921630859375, 0.24568557739257812, -0.278594970703125, 0.19784164428710938, -0.45749664306640625, -1.2264404296875, -0.14408111572265625, -0.48259735107421875, -0.18768310546875, 0.19190216064453125, -0.25777435302734375, -0.3635711669921875, 0.2703704833984375, -0.027410507202148438, 0.08119964599609375, -0.07688140869140625, -0.0534210205078125, 0.08722305297851562, -0.6483001708984375, 0.0318145751953125, -0.05677032470703125, -0.08675003051757812, 0.0679473876953125, 0.7067947387695312, 0.09428787231445312, 0.730010986328125, -0.11150360107421875, 0.18949127197265625, -0.616546630859375, 0.9218292236328125, 0.842742919921875, -0.24556732177734375, -0.641204833984375, -0.01508331298828125, -0.0781850814819336, -0.08300018310546875, -0.0937347412109375, 0.012775421142578125, 0.49359130859375, 0.7883529663085938, -0.3450431823730469, 0.20804595947265625, 0.874237060546875, 0.04276275634765625, 0.22214508056640625, -0.14950180053710938, -0.21218109130859375, -0.7017288208007812, -0.7830810546875, 0.2876605987548828, -0.6709518432617188, 0.38901519775390625, 0.09703826904296875, -0.5283966064453125, -0.2326812744140625, 0.010852813720703125, 0.765594482421875, 1.1218414306640625, 0.5714149475097656, -0.24202728271484375, 0.25200653076171875, -0.18395233154296875, -0.3542633056640625, 0.17871856689453125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000044.npy"}
{"epoch": 0.06651549508692366, "step": 45, "batch_size": 64, "mean": 0.0750853419303894, "std": 0.4261990487575531, "min": -1.114166259765625, "p10": -0.381005859375, "median": 0.08188819885253906, "p90": 0.5330253601074219, "max": 1.605072021484375, "pos_frac": 0.59375, "sample": [-0.27576446533203125, 0.29827880859375, -0.1584930419921875, 0.21010589599609375, -0.14932823181152344, 0.876800537109375, 0.12975692749023438, 1.605072021484375, -0.396697998046875, -0.11639404296875, -0.12752532958984375, -0.055145263671875, 0.2604331970214844, 0.20068359375, -0.205841064453125, 0.002834320068359375, -0.4329986572265625, 0.5276260375976562, -0.3241119384765625, -0.2517852783203125, 0.81256103515625, 0.28240966796875, 0.3873748779296875, 0.07550811767578125, -0.147979736328125, 0.0758056640625, 0.412353515625, -0.2897148132324219, 0.6286849975585938, -0.32862091064453125, 0.017520904541015625, -0.40753173828125, 0.20214080810546875, 0.10113906860351562, 0.488800048828125, -0.3170623779296875, -0.4587249755859375, 0.5178375244140625, -0.266571044921875, 0.002094268798828125, 0.3514404296875, 0.628173828125, 0.1351318359375, 0.42724609375, 0.13074493408203125, 0.10386276245117188, -0.881591796875, 0.29381561279296875, -1.114166259765625, -0.344390869140625, 0.39604949951171875, 0.08938980102539062, 0.15869903564453125, 0.025054931640625, 0.20056915283203125, 0.08797073364257812, 0.77886962890625, -0.19968032836914062, -0.5342311859130859, -0.04620361328125, -0.15441513061523438, 0.53533935546875, 0.386199951171875, -0.05394744873046875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000045.npy"}
{"epoch": 0.06802721088435375, "step": 46, "batch_size": 64, "mean": 0.1051889955997467, "std": 0.4198872447013855, "min": -0.941680908203125, "p10": -0.39826087951660155, "median": 0.11277198791503906, "p90": 0.6179183959960938, "max": 1.387359619140625, "pos_frac": 0.578125, "sample": [0.8248443603515625, -0.02036285400390625, 0.1441497802734375, -0.941680908203125, 0.1427154541015625, 1.387359619140625, -0.02553558349609375, -0.02725982666015625, 0.61932373046875, 0.0656585693359375, 0.14841079711914062, 0.1752777099609375, 0.144195556640625, 0.00470733642578125, 0.4802093505859375, 0.7214202880859375, 0.05551910400390625, 0.43003082275390625, -0.13750076293945312, 0.10940933227539062, -0.15908050537109375, 1.332275390625, -0.490509033203125, -0.0765228271484375, -0.003421783447265625, -0.1524505615234375, 0.2315216064453125, -0.8198394775390625, 0.3095550537109375, 0.278289794921875, 0.6146392822265625, 0.39473724365234375, 0.3257942199707031, 0.385406494140625, -0.4841346740722656, 0.1643218994140625, 0.20653915405273438, -0.17657089233398438, 0.1161346435546875, 0.16967010498046875, -0.3832244873046875, 0.30957794189453125, 0.6776885986328125, 0.38372039794921875, -0.20287322998046875, 0.026012420654296875, 0.43071746826171875, -0.31636810302734375, -0.4047050476074219, 0.25647735595703125, -0.04730796813964844, 0.27087974548339844, -0.4134941101074219, -0.1183319091796875, 0.7894439697265625, 0.1406707763671875, -0.5761184692382812, -0.15152359008789062, -0.12127685546875, 0.30767822265625, -0.10410308837890625, -0.20305442810058594, -0.25185394287109375, -0.033782958984375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000046.npy"}
{"epoch": 0.06953892668178382, "step": 47, "batch_size": 64, "mean": 0.02325788140296936, "std": 0.3212243914604187, "min": -0.8171768188476562, "p10": -0.31013717651367184, "median": -0.02198505401611328, "p90": 0.40599594116210946, "max": 0.8045806884765625, "pos_frac": 0.46875, "sample": [0.12004661560058594, -0.22943878173828125, -0.47252655029296875, -0.1971282958984375, -0.277130126953125, -0.017370223999023438, -0.028802871704101562, -0.04793548583984375, 0.02568817138671875, 0.32415771484375, -0.05653953552246094, -0.0036163330078125, -0.32192230224609375, 0.38895416259765625, 0.06134033203125, -0.18314361572265625, -0.05451202392578125, 0.2693328857421875, 0.1323699951171875, -0.086395263671875, 0.057384490966796875, -0.42218017578125, 0.35092926025390625, -0.2535743713378906, 0.11462020874023438, -0.0740966796875, 0.457672119140625, -0.12146377563476562, -0.08423614501953125, -0.13823699951171875, -0.0886077880859375, 0.5562858581542969, 0.30686187744140625, -0.4622039794921875, 0.3379974365234375, -0.22898101806640625, 0.19626617431640625, 0.800628662109375, 0.413299560546875, -0.6573486328125, -0.2826385498046875, 0.06537628173828125, 0.06945037841796875, -0.09778594970703125, 0.22304153442382812, -0.2726287841796875, -0.04986572265625, -0.49849700927734375, 0.741729736328125, 0.06632614135742188, -0.172027587890625, -0.8171768188476562, 0.8045806884765625, -0.03478240966796875, 0.11318588256835938, 0.24932098388671875, -0.2031726837158203, 0.19283294677734375, -0.1961822509765625, 0.37462615966796875, 0.0856475830078125, 0.4772796630859375, 0.27001953125, -0.026599884033203125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000047.npy"}
{"epoch": 0.0710506424792139, "step": 48, "batch_size": 64, "mean": 0.015231996774673462, "std": 0.3976157009601593, "min": -0.8504676818847656, "p10": -0.5057441711425781, "median": -0.029542922973632812, "p90": 0.5270362854003907, "max": 0.894805908203125, "pos_frac": 0.46875, "sample": [-0.09121513366699219, -0.3536834716796875, 0.2638969421386719, 0.348724365234375, 0.0586395263671875, 0.38727569580078125, 0.0435943603515625, -0.375244140625, 0.25547027587890625, 0.5651168823242188, -0.6654586791992188, -0.07457733154296875, 0.49552154541015625, 0.1119232177734375, 0.7913665771484375, 0.585906982421875, -0.32230377197265625, -0.08859062194824219, 0.255950927734375, 0.4170684814453125, -0.6813125610351562, -0.548187255859375, 0.41159820556640625, 0.176971435546875, -0.4010200500488281, -0.39792633056640625, 0.5405426025390625, -0.052295684814453125, -0.8504676818847656, -0.02850341796875, 0.05164337158203125, 0.32635498046875, -0.030582427978515625, -0.35309600830078125, -0.43355560302734375, 0.615570068359375, 0.2748680114746094, 0.2629852294921875, -0.16924667358398438, 0.1172027587890625, -0.039276123046875, 0.894805908203125, -0.53668212890625, 0.21793556213378906, 0.6357421875, -0.07216262817382812, -0.16262435913085938, 0.09814453125, 0.3209075927734375, 0.4437828063964844, -0.1177520751953125, -0.19088363647460938, -0.3315315246582031, 0.4500617980957031, -0.22327804565429688, 0.42624664306640625, -0.8124237060546875, -0.18626785278320312, -0.168609619140625, -0.737884521484375, -0.016693115234375, -0.03173065185546875, -0.228790283203125, -0.09711456298828125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000048.npy"}
{"epoch": 0.07256235827664399, "step": 49, "batch_size": 64, "mean": -0.0010673105716705322, "std": 0.388465940952301, "min": -1.3009796142578125, "p10": -0.46550140380859373, "median": 0.019835472106933594, "p90": 0.42645263671875, "max": 0.7587203979492188, "pos_frac": 0.515625, "sample": [-0.18126678466796875, -0.4235992431640625, 0.2102680206298828, 0.13169097900390625, -0.6166534423828125, -0.067718505859375, 0.020038604736328125, -0.2274322509765625, 0.019632339477539062, 0.426025390625, -0.08589363098144531, -0.0651397705078125, 0.0842437744140625, -0.6336669921875, 0.1319713592529297, -0.22603416442871094, -0.1049346923828125, -1.3009796142578125, -0.1171875, -0.2728424072265625, 0.1387786865234375, -0.043025970458984375, -0.04442596435546875, -0.12819671630859375, 0.10149955749511719, -0.48345947265625, 0.5151901245117188, -0.035236358642578125, 0.16996002197265625, 0.2650146484375, 0.060237884521484375, -0.040008544921875, 0.5835494995117188, 0.18648147583007812, 0.7587203979492188, 0.0567169189453125, -0.64300537109375, -1.195220947265625, 0.4266357421875, 0.3656005859375, -0.400054931640625, 0.3138275146484375, 0.3351707458496094, 0.322174072265625, 0.4217987060546875, 0.13059234619140625, 0.14344406127929688, -0.17830657958984375, 0.226593017578125, 0.1149139404296875, 0.5571365356445312, -0.330810546875, -0.02978515625, -0.02687835693359375, 0.09764862060546875, -0.170745849609375, -0.01230621337890625, 0.36175537109375, -0.13423919677734375, 0.2523822784423828, 0.47113037109375, 0.624237060546875, -0.07305908203125, -0.8012542724609375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000049.npy"}
{"epoch": 0.07407407407407407, "step": 50, "batch_size": 64, "mean": 0.014360368251800537, "std": 0.3726046681404114, "min": -1.238861083984375, "p10": -0.42445220947265616, "median": 0.020796775817871094, "p90": 0.5042282104492187, "max": 0.7253189086914062, "pos_frac": 0.53125, "sample": [0.07907867431640625, -0.23333740234375, 0.4251861572265625, -0.06811141967773438, -0.5617141723632812, -0.26673126220703125, 0.504425048828125, -1.238861083984375, -0.4783935546875, 0.2644004821777344, -0.045635223388671875, 0.27436065673828125, -0.29647064208984375, 0.4228324890136719, -0.18965530395507812, -0.0072460174560546875, 0.28705596923828125, 0.08300399780273438, 0.115020751953125, 0.21367645263671875, 0.4106292724609375, 0.12040328979492188, -0.000148773193359375, 0.627105712890625, -0.05171966552734375, -0.66357421875, 0.4192962646484375, 0.6052093505859375, -0.23868179321289062, -0.20846939086914062, 0.6147041320800781, 0.378814697265625, 0.017049789428710938, 0.25836944580078125, 0.607208251953125, 0.7253189086914062, -0.619598388671875, -0.09210968017578125, 0.0808563232421875, 0.04853057861328125, -0.7381591796875, -0.44950103759765625, -0.1600799560546875, 0.14031982421875, -0.24411773681640625, 0.08039665222167969, 0.53607177734375, 0.11077880859375, -0.1217803955078125, -0.3444671630859375, 0.18379592895507812, -0.23035812377929688, 0.036590576171875, 0.2788276672363281, 0.021627426147460938, -0.36600494384765625, -0.22276687622070312, 0.5037689208984375, 0.01996612548828125, -0.03325653076171875, -0.23392486572265625, 0.250518798828125, -0.22249221801757812, -0.19876861572265625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000050.npy"}
{"epoch": 0.07558578987150416, "step": 51, "batch_size": 64, "mean": -0.03938555717468262, "std": 0.44527432322502136, "min": -1.010009765625, "p10": -0.601678466796875, "median": -0.0337982177734375, "p90": 0.650851058959961, "max": 1.02392578125, "pos_frac": 0.453125, "sample": [0.022430419921875, 0.6869277954101562, 0.2877082824707031, -0.3614692687988281, -0.5458641052246094, 0.6776657104492188, 0.13263702392578125, -0.19134521484375, 0.2627582550048828, 0.38189697265625, 0.1562938690185547, -1.010009765625, -0.00543212890625, -0.20928955078125, 0.15436553955078125, -0.5975341796875, 0.7000350952148438, -0.219879150390625, 0.18822860717773438, 0.761810302734375, -0.27773284912109375, 0.0623626708984375, 0.05196380615234375, -0.60345458984375, 0.6508636474609375, 0.089813232421875, -0.0494842529296875, -0.22866439819335938, -0.030948638916015625, -0.9306106567382812, -0.201324462890625, -0.15869140625, -0.07379531860351562, -0.9745330810546875, -0.8513259887695312, -0.10809707641601562, -0.01340484619140625, 0.14731597900390625, -0.3794517517089844, 0.006916046142578125, 0.6508216857910156, -0.5875320434570312, -0.04573822021484375, 1.02392578125, 0.27021026611328125, 0.41455078125, -0.14947509765625, -0.632476806640625, 0.8949737548828125, 0.22225189208984375, 0.08277511596679688, -0.1107177734375, -0.14525222778320312, -0.05872344970703125, 0.05648040771484375, -0.036647796630859375, -0.5091438293457031, 0.01842498779296875, 0.12689590454101562, -0.17479705810546875, -0.8739395141601562, -0.4424285888671875, -0.14972686767578125, 0.23496246337890625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000051.npy"}
{"epoch": 0.07709750566893424, "step": 52, "batch_size": 64, "mean": 0.06755334138870239, "std": 0.4286964535713196, "min": -0.83233642578125, "p10": -0.44507369995117185, "median": 0.0846099853515625, "p90": 0.7033370971679689, "max": 0.9568634033203125, "pos_frac": 0.578125, "sample": [0.2378082275390625, -0.551910400390625, 0.890380859375, 0.8704071044921875, -0.4359893798828125, -0.1932373046875, 0.13165283203125, 0.09632492065429688, -0.19763565063476562, 0.833160400390625, 0.4743804931640625, 0.25574493408203125, -0.51751708984375, 0.7355422973632812, 0.1243438720703125, -0.35941314697265625, -0.0916748046875, -0.44896697998046875, 0.7655487060546875, 0.06251144409179688, -0.3253746032714844, -0.1607513427734375, 0.7146987915039062, 0.0631866455078125, 0.9568634033203125, -0.699615478515625, -0.11266326904296875, 0.15942955017089844, 0.1947021484375, 0.07289505004882812, 0.24730682373046875, 0.45062255859375, 0.49530792236328125, 0.21559524536132812, 0.058712005615234375, -0.83233642578125, -0.26163482666015625, -0.029436111450195312, -0.82452392578125, -0.315399169921875, 0.11248588562011719, -0.361175537109375, -0.16077423095703125, 0.3646259307861328, -0.334747314453125, 0.27215576171875, 0.050479888916015625, -0.40653228759765625, 0.1279449462890625, -0.33690643310546875, 0.6768264770507812, 0.2433929443359375, 0.29752349853515625, 0.160675048828125, 0.6318435668945312, -0.0995330810546875, 0.6030731201171875, 0.37551116943359375, 0.149993896484375, 0.2530689239501953, -0.62945556640625, -0.16557693481445312, -0.1741771697998047, -0.07635498046875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000052.npy"}
{"epoch": 0.07860922146636433, "step": 53, "batch_size": 64, "mean": 0.08846625685691833, "std": 0.4421474039554596, "min": -1.0424346923828125, "p10": -0.4405147552490234, "median": 0.11089897155761719, "p90": 0.5214828491210938, "max": 0.9703140258789062, "pos_frac": 0.625, "sample": [-0.25170135498046875, 0.8037261962890625, 0.4687843322753906, -0.35144805908203125, -0.20434951782226562, 0.04850006103515625, -0.001800537109375, -0.0326690673828125, -0.0771484375, -0.22101593017578125, -0.886138916015625, 0.1227264404296875, 0.20873260498046875, 0.2761268615722656, -0.4982452392578125, 0.0566253662109375, 0.10734939575195312, 0.0102386474609375, -0.29041481018066406, 0.31032562255859375, 0.04975128173828125, -0.0634307861328125, 0.9703140258789062, 0.94415283203125, -0.588592529296875, 0.5105133056640625, -0.17074203491210938, 0.4309577941894531, 0.4327526092529297, 0.09850311279296875, 0.0464630126953125, -1.013671875, -0.26869964599609375, -0.8117599487304688, 0.15538597106933594, 0.5020751953125, 0.15315628051757812, 0.23609161376953125, -0.4008903503417969, 0.25360870361328125, 0.32685089111328125, 0.17082977294921875, 0.4229316711425781, 0.9571990966796875, 0.302886962890625, 0.8702392578125, 0.2650604248046875, 0.13767242431640625, 0.006305694580078125, 0.46221923828125, 0.39402008056640625, -0.25464630126953125, -0.17374038696289062, 0.52618408203125, -1.0424346923828125, -0.45749664306640625, 0.89886474609375, -0.0038604736328125, 0.40325164794921875, -0.028383255004882812, -0.12494659423828125, 0.11444854736328125, 0.220916748046875, 0.2033252716064453], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000053.npy"}
{"epoch": 0.0801209372637944, "step": 54, "batch_size": 64, "mean": 0.028407543897628784, "std": 0.47546470165252686, "min": -1.3892364501953125, "p10": -0.5526222229003906, "median": 0.052384376525878906, "p90": 0.6017150878906251, "max": 0.9582672119140625, "pos_frac": 0.5625, "sample": [-0.55511474609375, 0.0350341796875, -0.04395294189453125, -0.04034233093261719, 0.5852928161621094, 0.0899505615234375, 0.6121978759765625, 0.420318603515625, 0.020534515380859375, 0.19490814208984375, 0.1677398681640625, 0.03917694091796875, 0.74444580078125, 0.2195281982421875, 0.23864364624023438, -0.3433704376220703, 0.9582672119140625, 0.06559181213378906, -0.5264091491699219, 0.324615478515625, 0.39337921142578125, -0.1887969970703125, -0.035839080810546875, 0.394744873046875, -0.0435333251953125, 0.8133583068847656, 0.27288055419921875, 0.4282798767089844, -0.17462921142578125, -0.2445526123046875, -0.5468063354492188, 0.6892356872558594, -0.244964599609375, -0.120574951171875, 0.4060325622558594, -0.4402618408203125, -0.6662216186523438, 0.195220947265625, -0.20388412475585938, 0.6087532043457031, -0.24708938598632812, 0.183990478515625, -0.2529754638671875, 0.48189544677734375, 0.3704795837402344, -0.02265167236328125, 0.5324630737304688, -0.11301803588867188, -0.1536388397216797, 0.2657432556152344, 0.0904388427734375, -0.865753173828125, 0.3622264862060547, 0.52435302734375, 0.11803245544433594, -0.08570098876953125, -0.4963245391845703, 0.23566818237304688, 0.6477546691894531, -0.7434539794921875, -1.3892364501953125, -1.292724609375, -0.8449325561523438, 0.013660430908203125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000054.npy"}
{"epoch": 0.08163265306122448, "step": 55, "batch_size": 64, "mean": 0.24573048949241638, "std": 0.4259244203567505, "min": -0.629974365234375, "p10": -0.2855083465576172, "median": 0.28250980377197266, "p90": 0.73294677734375, "max": 1.846893310546875, "pos_frac": 0.75, "sample": [0.308349609375, -0.5157470703125, 0.3229560852050781, -0.301971435546875, 0.3285484313964844, 0.9145355224609375, 0.39385986328125, -0.1024627685546875, -0.2826271057128906, -0.22652435302734375, 0.0331268310546875, 0.640350341796875, -0.629974365234375, 0.2910575866699219, 0.32563018798828125, 0.7284393310546875, 0.0750579833984375, 0.304443359375, 1.164276123046875, 0.17050933837890625, 0.34700775146484375, 0.51837158203125, 0.7398910522460938, 0.054443359375, 1.846893310546875, 0.17389488220214844, -0.47711181640625, 0.193359375, 0.7348785400390625, 0.3212890625, 0.14019012451171875, 0.360504150390625, -0.1354522705078125, -0.2288055419921875, -0.236328125, 0.4184112548828125, 0.07558441162109375, 0.2846393585205078, 0.541290283203125, 0.23885726928710938, 0.16705322265625, 0.6392364501953125, 0.06710433959960938, 0.615966796875, 0.524658203125, 0.5908260345458984, 0.5655364990234375, -0.37994384765625, 0.09951019287109375, -0.288818359375, 0.2639122009277344, 0.018131256103515625, 0.16532135009765625, -0.2867431640625, 0.39415740966796875, 0.9188308715820312, 0.2803802490234375, -0.17140579223632812, 0.7872200012207031, 0.4216461181640625, -0.1872711181640625, 0.40877532958984375, 0.4168548583984375, -0.15782928466796875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000055.npy"}
{"epoch": 0.08314436885865457, "step": 56, "batch_size": 64, "mean": 0.05905655026435852, "std": 0.5378000140190125, "min": -1.796905517578125, "p10": -0.5120559692382812, "median": 0.0839385986328125, "p90": 0.6888198852539062, "max": 1.4630584716796875, "pos_frac": 0.578125, "sample": [-0.5182952880859375, 0.4101104736328125, 0.09516525268554688, 0.331298828125, 0.5919570922851562, 0.07598114013671875, 0.29398345947265625, -1.0368499755859375, -0.3382987976074219, -0.10605430603027344, -0.2527046203613281, 0.10894775390625, 1.4630584716796875, -0.4477081298828125, -0.25908660888671875, 0.8798980712890625, 0.6910858154296875, 0.18828582763671875, 0.30007171630859375, 0.2976837158203125, -0.4831829071044922, 1.0626449584960938, -0.0699005126953125, 0.8600349426269531, -0.2035198211669922, 0.12607192993164062, 0.09173583984375, 0.4369964599609375, 0.161895751953125, -0.49749755859375, -0.04389190673828125, 0.02808380126953125, 0.11743927001953125, 0.058868408203125, -0.316375732421875, 0.45401763916015625, -0.8606719970703125, 0.3529624938964844, -0.6026897430419922, -0.26572418212890625, 0.6147994995117188, 0.352386474609375, 0.2572784423828125, -0.16946029663085938, -0.2263641357421875, 0.9488258361816406, -0.481231689453125, 0.560760498046875, -0.0059413909912109375, 0.37358856201171875, 0.68353271484375, 0.49021148681640625, 0.07122802734375, -0.8292236328125, -1.796905517578125, -0.3646812438964844, -0.57940673828125, 0.076141357421875, 0.1948394775390625, 0.41953277587890625, -0.4098014831542969, 0.73321533203125, -0.062957763671875, -0.24657440185546875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000056.npy"}
{"epoch": 0.08465608465608465, "step": 57, "batch_size": 64, "mean": 0.174201101064682, "std": 0.4820391833782196, "min": -1.0324859619140625, "p10": -0.3774578094482422, "median": 0.1286001205444336, "p90": 0.8001129150390632, "max": 1.44635009765625, "pos_frac": 0.609375, "sample": [0.5266189575195312, -0.14984893798828125, 0.20401382446289062, 0.21893310546875, -0.12176132202148438, 0.0251922607421875, -0.013641357421875, 0.499359130859375, 0.2899055480957031, 1.2033309936523438, -0.3809814453125, 0.4125328063964844, 0.2353668212890625, 0.3640556335449219, 0.05002593994140625, -0.002094268798828125, 0.07025146484375, 1.0218849182128906, 1.1073150634765625, 0.3311595916748047, -0.23024749755859375, -0.3586578369140625, 0.1889972686767578, 0.544647216796875, -0.057880401611328125, -0.3810310363769531, 0.39569854736328125, 1.1740341186523438, -0.3258323669433594, -0.01558685302734375, 0.4994964599609375, 0.19864654541015625, -0.4542732238769531, -0.4229736328125, -0.04566192626953125, 0.509185791015625, -0.9769287109375, 0.4624137878417969, -1.0324859619140625, 0.876953125, 0.9248542785644531, 0.18030548095703125, -0.4913787841796875, 0.4864082336425781, -0.0599517822265625, -0.01378631591796875, 0.06468963623046875, 0.4101409912109375, -0.32482147216796875, 0.4046173095703125, 0.545867919921875, 0.254119873046875, 0.441741943359375, 1.44635009765625, 0.001544952392578125, 0.620819091796875, 0.6068572998046875, -0.0122528076171875, 0.07689476013183594, -0.3442840576171875, -0.0542449951171875, -0.11786651611328125, 0.03134918212890625, -0.3692359924316406], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000057.npy"}
{"epoch": 0.08616780045351474, "step": 58, "batch_size": 64, "mean": 0.017644047737121582, "std": 0.5139992237091064, "min": -1.335235595703125, "p10": -0.48358840942382814, "median": 0.00730133056640625, "p90": 0.6418350219726565, "max": 1.822174072265625, "pos_frac": 0.546875, "sample": [-0.18012237548828125, 0.5833663940429688, 0.14740371704101562, 0.009246826171875, 0.19731903076171875, -0.1220550537109375, 0.6668930053710938, 0.9344711303710938, -0.0063934326171875, 0.8182907104492188, 0.17791748046875, -0.47890472412109375, 0.20721435546875, 0.3334197998046875, -0.8071403503417969, 0.282379150390625, 0.711334228515625, -0.453948974609375, -0.263824462890625, 0.4608001708984375, -0.001354217529296875, -0.12453842163085938, -0.3109169006347656, -0.472076416015625, -0.43350982666015625, -0.3994293212890625, -0.7890243530273438, 0.5526351928710938, -0.36520957946777344, 0.0732269287109375, -0.485595703125, 0.2723960876464844, -0.38614654541015625, 0.2837982177734375, 0.27020263671875, 0.22884368896484375, 1.822174072265625, 0.692596435546875, 0.27211761474609375, -0.15132904052734375, -0.82403564453125, -0.28384971618652344, 0.27872467041015625, -1.335235595703125, -0.8447647094726562, 0.1035614013671875, -0.49471282958984375, 0.00267791748046875, 0.5028762817382812, 0.3448066711425781, 0.0053558349609375, 0.1526031494140625, -0.3599853515625, 0.00514984130859375, 0.27153778076171875, 0.2803077697753906, -0.2327251434326172, -0.4387016296386719, -0.45032501220703125, 0.9682159423828125, -0.413848876953125, -0.328765869140625, 0.17494964599609375, 0.27887535095214844], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000058.npy"}
{"epoch": 0.08767951625094482, "step": 59, "batch_size": 64, "mean": 0.19744998216629028, "std": 0.5810407996177673, "min": -1.0447998046875, "p10": -0.36658134460449215, "median": 0.094207763671875, "p90": 0.8327560424804689, "max": 2.909393310546875, "pos_frac": 0.625, "sample": [-0.3846778869628906, -0.07697296142578125, 0.80450439453125, -0.8510475158691406, 0.744903564453125, 0.24646759033203125, 0.23187255859375, -0.504608154296875, -0.21541213989257812, 0.614227294921875, 2.909393310546875, 1.0958251953125, 0.90692138671875, 0.24296188354492188, 1.2532730102539062, -0.19615554809570312, 0.0776824951171875, 0.007354736328125, 0.093536376953125, -0.0810546875, 0.21794509887695312, -0.18468475341796875, -0.725830078125, -0.09304046630859375, -0.26416015625, 0.6897621154785156, 0.16848373413085938, 0.48705291748046875, -0.154876708984375, 0.25859832763671875, -0.12152481079101562, 0.9630889892578125, 0.23797988891601562, -0.07856369018554688, -1.0447998046875, 0.004791259765625, -0.12725067138671875, 0.08533096313476562, -0.3870201110839844, 0.232574462890625, 1.4877166748046875, 0.34576416015625, 0.33678436279296875, 0.23992538452148438, -0.16559600830078125, 0.033016204833984375, 0.5564346313476562, 0.42389678955078125, -0.12993812561035156, 0.73370361328125, -0.03223419189453125, 0.3458671569824219, -0.3243560791015625, -0.03650665283203125, 0.8448638916015625, 0.07694244384765625, -0.475830078125, 0.094879150390625, 0.3394966125488281, 0.0678253173828125, 0.10349082946777344, -0.0868682861328125, 0.4610443115234375, 0.3136253356933594], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000059.npy"}
{"epoch": 0.08919123204837491, "step": 60, "batch_size": 64, "mean": 0.1341674029827118, "std": 0.5390803217887878, "min": -1.194366455078125, "p10": -0.3749824523925781, "median": 0.08062171936035156, "p90": 0.7766647338867192, "max": 2.49871826171875, "pos_frac": 0.640625, "sample": [0.02002716064453125, -0.1702117919921875, 0.34825897216796875, -0.1519775390625, -0.07886505126953125, -0.035736083984375, -1.194366455078125, -0.91455078125, -0.7335205078125, -0.9696502685546875, 0.15012741088867188, 0.8227615356445312, -0.17844009399414062, 0.26657867431640625, 0.38242340087890625, 0.19481658935546875, 0.18770980834960938, 0.4066810607910156, -0.4392547607421875, 0.44542694091796875, -0.0234222412109375, 0.24230575561523438, 0.14838409423828125, 0.0437164306640625, -0.31043243408203125, 0.02309417724609375, -0.43643951416015625, 0.45536231994628906, 0.6691055297851562, -0.08311843872070312, -0.33038330078125, 0.07387161254882812, 0.9177932739257812, 0.06501007080078125, -0.16881561279296875, -0.0697174072265625, 0.01189422607421875, -0.01732635498046875, 0.6544418334960938, -0.19837188720703125, 0.0634307861328125, 0.38333702087402344, 0.18768692016601562, 0.10443115234375, 0.124114990234375, 0.087371826171875, 0.12813949584960938, 2.49871826171875, 0.959136962890625, 0.000396728515625, 0.08925628662109375, -0.39409637451171875, 0.15656661987304688, -0.05084228515625, 1.3344268798828125, 0.07381248474121094, -0.038730621337890625, 0.138214111328125, 0.9451141357421875, 0.14252090454101562, 1.112518310546875, -0.05333900451660156, 0.42853546142578125, 0.14080238342285156], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000060.npy"}
{"epoch": 0.09070294784580499, "step": 61, "batch_size": 64, "mean": 0.0344984233379364, "std": 0.48746806383132935, "min": -1.2678298950195312, "p10": -0.46129150390624996, "median": 0.05486106872558594, "p90": 0.6402877807617188, "max": 1.535247802734375, "pos_frac": 0.59375, "sample": [-0.1030731201171875, 0.04229736328125, 0.4127235412597656, 0.8304290771484375, 0.5929908752441406, 0.17177581787109375, 0.02703094482421875, 0.33725738525390625, -0.24057769775390625, 0.2196044921875, -0.35849761962890625, -0.23074722290039062, 0.055118560791015625, 0.020904541015625, -0.2393646240234375, 0.05430793762207031, 0.7654914855957031, -0.27669715881347656, -0.3623313903808594, 0.05460357666015625, -0.0705413818359375, 0.43013763427734375, 0.6968765258789062, -0.1989288330078125, 0.057445526123046875, 0.16744613647460938, -0.52679443359375, 0.2609100341796875, -0.4131927490234375, -0.6917266845703125, 0.14437103271484375, 0.26593589782714844, 0.021022796630859375, 0.083831787109375, -0.122802734375, 0.3914146423339844, -0.422027587890625, -0.33963584899902344, -0.13499832153320312, 0.06237030029296875, 0.860687255859375, -1.2678298950195312, 0.664154052734375, -0.6612167358398438, 0.1441650390625, 0.6448745727539062, 0.16215896606445312, -0.13712120056152344, -0.32775115966796875, 0.5503997802734375, 0.23733901977539062, -1.016326904296875, -0.3585948944091797, -0.23705291748046875, 0.6295852661132812, -0.478118896484375, 1.535247802734375, 0.31491661071777344, 0.1212615966796875, 0.5607223510742188, 0.05986785888671875, 0.31256866455078125, -1.1903152465820312, -0.3500823974609375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000061.npy"}
{"epoch": 0.09221466364323508, "step": 62, "batch_size": 64, "mean": 0.16568706929683685, "std": 0.5494100451469421, "min": -1.2124176025390625, "p10": -0.6515792846679687, "median": 0.2496929168701172, "p90": 0.7591438293457031, "max": 1.3362655639648438, "pos_frac": 0.609375, "sample": [0.8071365356445312, -0.8097152709960938, -0.21063232421875, -0.20583343505859375, 0.16363525390625, 0.8565826416015625, -0.33754730224609375, 0.2522392272949219, -0.23359298706054688, -0.12879180908203125, 0.43079376220703125, -0.070556640625, 0.2471466064453125, -0.8570175170898438, 0.7632293701171875, 0.700927734375, 0.6931304931640625, 0.109527587890625, 0.14540481567382812, 0.2140960693359375, -0.7297821044921875, -0.16976547241210938, -1.09820556640625, 0.38091087341308594, -0.30068397521972656, 0.7496109008789062, 0.5273666381835938, 0.6155624389648438, 0.8445205688476562, 0.7313652038574219, 0.0546417236328125, -0.06193256378173828, 0.35650634765625, -0.586456298828125, 0.4422760009765625, -0.223846435546875, 0.23638153076171875, 1.2138595581054688, -0.023525238037109375, -1.2124176025390625, 0.5903549194335938, -0.213592529296875, -0.8519210815429688, 0.32808685302734375, 0.2852325439453125, 0.411956787109375, 0.6774215698242188, 0.283966064453125, 0.3133583068847656, 1.3362655639648438, -0.27060699462890625, 0.6634845733642578, 0.8854827880859375, 0.418548583984375, -0.45037841796875, -0.6794891357421875, 0.73248291015625, 0.734954833984375, 0.5798816680908203, -0.01984405517578125, 0.3021240234375, 0.5236663818359375, -0.05889892578125, -0.1951141357421875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000062.npy"}
{"epoch": 0.09372637944066516, "step": 63, "batch_size": 64, "mean": 0.19854141771793365, "std": 0.5009901523590088, "min": -1.4850921630859375, "p10": -0.2958812713623047, "median": 0.2326831817626953, "p90": 0.7225250244140625, "max": 1.7432861328125, "pos_frac": 0.65625, "sample": [0.5304336547851562, 0.217315673828125, -0.5056228637695312, 0.2764015197753906, 1.22015380859375, -0.2913818359375, -0.1255970001220703, 0.36273193359375, 0.2249755859375, -0.18433380126953125, -1.4850921630859375, -0.5070991516113281, 0.24039077758789062, -0.2469482421875, 0.7276153564453125, -0.0965576171875, 1.7432861328125, 0.47210693359375, -0.14033126831054688, -0.054683685302734375, 1.3364105224609375, 0.1630725860595703, 0.3992156982421875, 0.5887908935546875, 0.4598846435546875, 0.3598823547363281, -0.4874725341796875, 0.4434928894042969, 0.34517765045166016, -0.4840240478515625, 0.29804229736328125, 0.033344268798828125, 0.10549163818359375, -0.23178863525390625, -0.1671600341796875, 0.4412841796875, -0.5301742553710938, -0.22504425048828125, 0.3101806640625, -0.0157470703125, 0.4290294647216797, 1.191375732421875, -0.2978096008300781, 0.3838691711425781, -0.2745704650878906, 0.272613525390625, 0.73565673828125, 0.4269905090332031, -0.0862579345703125, 0.0029144287109375, 0.1275482177734375, -0.03224945068359375, 0.02698516845703125, 0.521759033203125, 0.326812744140625, 0.4053802490234375, 0.0701904296875, 0.6034812927246094, -0.22760963439941406, 0.2068634033203125, 0.3503265380859375, 0.7106475830078125, 0.993927001953125, 0.31815338134765625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000063.npy"}
{"epoch": 0.09523809523809523, "step": 64, "batch_size": 64, "mean": 0.017070025205612183, "std": 0.4999678432941437, "min": -1.67999267578125, "p10": -0.6391263961791991, "median": 0.03285694122314453, "p90": 0.59140625, "max": 1.187042236328125, "pos_frac": 0.53125, "sample": [0.099884033203125, 1.187042236328125, 0.041690826416015625, -0.189666748046875, -0.3974456787109375, 0.2151641845703125, 0.7309513092041016, 0.2782707214355469, 0.44405364990234375, 0.3387908935546875, 0.04067420959472656, 0.5762519836425781, 0.5722427368164062, 0.28112030029296875, 0.7970962524414062, 0.011383056640625, 0.8777694702148438, -0.40900421142578125, -0.44805145263671875, 0.1626739501953125, -0.0789794921875, -0.42232513427734375, 0.0250396728515625, 0.15697479248046875, -0.80963134765625, -0.007282257080078125, 0.12948226928710938, -0.104095458984375, 0.3476600646972656, 0.5232295989990234, -0.089202880859375, 0.5871124267578125, 0.11667251586914062, -0.690673828125, 0.5932464599609375, -0.553955078125, 0.6156272888183594, -1.67999267578125, -0.08031082153320312, 0.16382598876953125, 0.9185791015625, -0.197479248046875, 0.4910736083984375, -0.03827667236328125, -0.1267852783203125, -0.7981414794921875, 0.24932861328125, -0.674041748046875, -0.5238761901855469, 0.05322265625, -0.2035980224609375, -0.09412956237792969, -0.1438751220703125, -0.3080482482910156, -0.06612777709960938, 0.3891773223876953, 0.4492073059082031, 0.2509937286376953, -0.6943588256835938, -0.3494110107421875, 0.17144775390625, -0.848114013671875, -0.5576572418212891, -0.20994186401367188], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000064.npy"}
{"epoch": 0.09674981103552532, "step": 65, "batch_size": 64, "mean": 0.11321339011192322, "std": 0.5929509997367859, "min": -1.636138916015625, "p10": -0.4950096130371094, "median": 0.14485740661621094, "p90": 0.746954345703125, "max": 1.3017196655273438, "pos_frac": 0.625, "sample": [0.6532955169677734, 0.45030975341796875, 0.605560302734375, -0.3878631591796875, 0.08860397338867188, 0.1955413818359375, -0.7970123291015625, -1.636138916015625, -0.49304962158203125, 0.74554443359375, -0.3265380859375, 0.13519287109375, 0.5674381256103516, 0.3834075927734375, 0.6843948364257812, -0.4128875732421875, 0.6592502593994141, 0.4157752990722656, -0.6296310424804688, 0.74755859375, 0.4579010009765625, -0.21799468994140625, -0.495849609375, 1.3017196655273438, 1.2816925048828125, 0.15452194213867188, -1.3958206176757812, 0.13224029541015625, 0.17992401123046875, -0.19989013671875, -0.227294921875, -0.36573028564453125, 0.4558868408203125, -0.903045654296875, 0.2366485595703125, 0.7181015014648438, -0.131591796875, 0.413604736328125, 0.64990234375, 0.07381057739257812, 0.9129714965820312, -0.11378097534179688, 0.6531524658203125, 0.31671142578125, -0.4258613586425781, -0.07360076904296875, 0.1984405517578125, 0.33966064453125, -0.212371826171875, -0.4405364990234375, 0.42650604248046875, 0.10744285583496094, -0.3103141784667969, -0.3689689636230469, 0.08283233642578125, 0.37430572509765625, -0.3453521728515625, 0.06098175048828125, 1.1583099365234375, 0.2802276611328125, 1.0699691772460938, -1.045562744140625, 0.78302001953125, 0.04998588562011719], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000065.npy"}
{"epoch": 0.0982615268329554, "step": 66, "batch_size": 64, "mean": 0.18812981247901917, "std": 0.6807229518890381, "min": -2.75762939453125, "p10": -0.5396366119384766, "median": 0.18848419189453125, "p90": 0.8810298919677735, "max": 2.540283203125, "pos_frac": 0.671875, "sample": [0.23867225646972656, 2.540283203125, 0.18991851806640625, 0.18704986572265625, 0.16944313049316406, 0.36065673828125, -0.15758895874023438, 0.681121826171875, -0.4073829650878906, 0.5626983642578125, -0.5540275573730469, -0.5060577392578125, 0.17230224609375, -0.1912841796875, 0.7192230224609375, 0.86163330078125, 0.88360595703125, 0.9382171630859375, 0.39125823974609375, 1.0422630310058594, 0.26064300537109375, 0.8890914916992188, -0.6860733032226562, 0.20545578002929688, -0.20496749877929688, 0.3083686828613281, 0.11414337158203125, 0.3484687805175781, 0.986907958984375, -0.15877532958984375, 0.7333297729492188, 0.15665817260742188, -0.07944488525390625, 0.1709136962890625, 0.4644012451171875, 0.133392333984375, -0.4845848083496094, -0.22237777709960938, 0.6917037963867188, 0.8750190734863281, -2.75762939453125, -0.9723243713378906, 0.27620697021484375, -0.36223602294921875, 0.1829833984375, -0.027069091796875, 0.7242660522460938, 0.3875465393066406, 0.747467041015625, 0.3077583312988281, 0.014749526977539062, 0.13712310791015625, 0.13983154296875, -0.7409515380859375, 1.0345001220703125, -0.31195831298828125, 0.6796875, 0.20318603515625, -0.864837646484375, 0.68194580078125, 0.6917495727539062, -0.17324066162109375, -0.01766204833984375, -0.5650634765625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000066.npy"}
{"epoch": 0.09977324263038549, "step": 67, "batch_size": 64, "mean": 0.22150954604148865, "std": 0.47579821944236755, "min": -0.7580108642578125, "p10": -0.3316322326660156, "median": 0.1479473114013672, "p90": 0.8130767822265627, "max": 1.804168701171875, "pos_frac": 0.65625, "sample": [-0.47681427001953125, 0.4562835693359375, -0.18412399291992188, 0.1976470947265625, 0.5092620849609375, 0.3060340881347656, 0.4753761291503906, 0.6532211303710938, 0.2120208740234375, -0.3780059814453125, 1.804168701171875, 0.027706146240234375, 0.1089324951171875, -0.3778228759765625, 0.1302032470703125, 0.16552352905273438, 0.09317970275878906, 0.6000862121582031, -0.13923263549804688, -0.11757659912109375, -0.350677490234375, 0.5578155517578125, -0.25286865234375, 1.0797958374023438, 0.05277442932128906, 0.081695556640625, 0.5506706237792969, 0.6503448486328125, 0.540802001953125, 0.33267974853515625, -0.25914764404296875, -0.28719329833984375, 0.052490234375, 0.8237724304199219, 0.5111885070800781, 0.2379436492919922, 0.2575645446777344, 0.4432506561279297, -0.5035171508789062, 0.4375419616699219, 0.5487823486328125, -0.020660400390625, -0.201263427734375, 1.1743698120117188, 0.13037109375, 0.651458740234375, 0.254608154296875, -0.22394943237304688, 0.8370437622070312, -0.07929229736328125, 0.6401214599609375, 1.106903076171875, 0.011737823486328125, -0.21142578125, -0.2608222961425781, -0.7580108642578125, -0.1615142822265625, 0.4152641296386719, -0.14864158630371094, -0.03200531005859375, 0.7881202697753906, -0.390045166015625, 1.0524749755859375, 0.02999114990234375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000067.npy"}
{"epoch": 0.10128495842781557, "step": 68, "batch_size": 64, "mean": 0.1363646388053894, "std": 0.671923816204071, "min": -1.581939697265625, "p10": -0.660882568359375, "median": 0.13678359985351562, "p90": 1.004094696044922, "max": 1.6885452270507812, "pos_frac": 0.609375, "sample": [-0.210662841796875, 0.5970001220703125, -0.017673492431640625, -0.0585479736328125, -1.1885604858398438, 1.6885452270507812, -0.19081878662109375, 0.08386611938476562, 1.3438758850097656, -0.8428802490234375, 0.6822433471679688, -1.25848388671875, 0.193817138671875, 0.01055145263671875, 1.096649169921875, 0.8470726013183594, -0.15088653564453125, -0.22721099853515625, -0.01905059814453125, 0.13889312744140625, 0.9852371215820312, -0.7002792358398438, -1.1598358154296875, 0.9117774963378906, 0.047916412353515625, 0.16829299926757812, -0.659942626953125, 0.03633880615234375, 1.52679443359375, 0.2268829345703125, 0.5111923217773438, 0.35350799560546875, 0.2725982666015625, 0.134674072265625, 0.3187980651855469, 1.012176513671875, 1.452423095703125, 0.00882720947265625, -0.2425537109375, 0.15185546875, -1.581939697265625, -0.3450202941894531, -0.3870849609375, 0.6928558349609375, 0.17779541015625, 0.39959716796875, -0.2809906005859375, 0.0791778564453125, -0.3121070861816406, 0.14904022216796875, 0.4000396728515625, -0.13335418701171875, 0.5005645751953125, 1.2792510986328125, 0.6167373657226562, 0.312042236328125, -0.604217529296875, -0.661285400390625, 0.7933349609375, 0.24543380737304688, 0.58477783203125, -0.2552490234375, -0.251983642578125, -0.5644989013671875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000068.npy"}
{"epoch": 0.10279667422524566, "step": 69, "batch_size": 64, "mean": 0.22328829765319824, "std": 0.5511104464530945, "min": -1.245513916015625, "p10": -0.5190391540527344, "median": 0.17493057250976562, "p90": 0.9288921356201177, "max": 1.3630752563476562, "pos_frac": 0.671875, "sample": [0.4738311767578125, 0.5540618896484375, 1.3365249633789062, -0.084716796875, 0.5370101928710938, -0.0753936767578125, 0.3123626708984375, -0.07055282592773438, 0.7161865234375, 0.131317138671875, -0.15674591064453125, 0.16518020629882812, -0.17479324340820312, 0.5681076049804688, 0.12400436401367188, 0.6029510498046875, 0.2019023895263672, 1.140869140625, 0.6551513671875, -0.5459060668945312, 0.4544677734375, -0.5348663330078125, 0.9918975830078125, 0.6827926635742188, 0.14312744140625, 0.434356689453125, 0.5954437255859375, 0.1527252197265625, 0.170166015625, -0.2173004150390625, 0.490478515625, 0.08990478515625, 0.055023193359375, 0.41567230224609375, 0.2941627502441406, -1.245513916015625, 1.3630752563476562, -0.21289443969726562, -0.289459228515625, -0.3722991943359375, -0.03253173828125, 0.08096694946289062, -0.5512542724609375, 0.0492401123046875, 0.7818794250488281, -0.1355133056640625, 0.6888923645019531, 0.431182861328125, 0.4615058898925781, -0.48210906982421875, 0.17969512939453125, 0.5498428344726562, -1.0751380920410156, -0.5410346984863281, 1.318267822265625, -0.0804290771484375, 0.382904052734375, 1.08978271484375, 0.53125, -0.6021385192871094, 1.329010009765625, 0.00762176513671875, 0.469085693359375, -0.43284034729003906], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000069.npy"}
{"epoch": 0.10430839002267574, "step": 70, "batch_size": 64, "mean": 0.20928049087524414, "std": 0.6051356792449951, "min": -0.996978759765625, "p10": -0.6030693054199218, "median": 0.10254573822021484, "p90": 1.046074676513672, "max": 1.4813995361328125, "pos_frac": 0.609375, "sample": [0.26671600341796875, -0.12608718872070312, 1.430450439453125, -0.996978759765625, 0.033603668212890625, -0.3540172576904297, -0.2257537841796875, 0.7411422729492188, 0.7984695434570312, -0.3652763366699219, 0.9898681640625, -0.2920722961425781, 0.5968475341796875, 1.4415283203125, 0.2245616912841797, 0.080718994140625, 0.45111083984375, -0.1150665283203125, -0.005496978759765625, -0.2718944549560547, 1.0592193603515625, 0.4658203125, 0.7016525268554688, 0.2954139709472656, -0.8309173583984375, 0.19879913330078125, 0.01448822021484375, 0.9257354736328125, 0.17414093017578125, 0.460784912109375, -0.18581390380859375, -0.6553115844726562, -0.2690887451171875, -0.6702041625976562, -0.3690185546875, 1.3316802978515625, 0.7247810363769531, -0.481170654296875, 0.11671638488769531, 0.6552391052246094, 0.44144439697265625, 0.02215576171875, -0.3243389129638672, 1.2637710571289062, 0.6696510314941406, 1.4813995361328125, -0.07245635986328125, -0.662811279296875, -0.038341522216796875, -0.053745269775390625, 1.0154037475585938, 0.9844894409179688, 0.2874259948730469, 0.3465728759765625, 0.08837509155273438, 0.16551971435546875, 0.08147430419921875, -0.7217979431152344, 0.5746612548828125, 1.1062450408935547, -0.295440673828125, 0.02568817138671875, -0.2924995422363281, -0.664215087890625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000070.npy"}
{"epoch": 0.10582010582010581, "step": 71, "batch_size": 64, "mean": 0.10643890500068665, "std": 0.6426126956939697, "min": -1.4141693115234375, "p10": -0.750579833984375, "median": 0.2753028869628906, "p90": 0.8595985412597658, "max": 1.249298095703125, "pos_frac": 0.59375, "sample": [0.1551227569580078, 0.377532958984375, 0.6285381317138672, 0.24513626098632812, 0.6496086120605469, 0.08795166015625, 0.41168975830078125, -0.563812255859375, 0.43072509765625, 0.912445068359375, -0.049335479736328125, -1.2096443176269531, -0.07880020141601562, -0.7550125122070312, -0.7397613525390625, -0.005626678466796875, -0.7052536010742188, 0.3474273681640625, 0.6192340850830078, 0.48398590087890625, -0.378326416015625, 1.028289794921875, -0.07016754150390625, 0.4488372802734375, 0.48906707763671875, 0.7257537841796875, -0.04708099365234375, 0.3922882080078125, 0.26952362060546875, -1.304473876953125, -0.742462158203125, -1.4141693115234375, -0.5872650146484375, -0.1591644287109375, -0.23883056640625, -0.57000732421875, 0.5582427978515625, 0.3560333251953125, 0.8721389770507812, -0.6535568237304688, -0.03558349609375, -1.0489349365234375, 1.249298095703125, 0.7284698486328125, -0.754058837890625, -0.04747772216796875, -0.9728240966796875, 0.4895172119140625, 0.32782745361328125, 0.0067310333251953125, 0.8303375244140625, 1.0484657287597656, -0.6511688232421875, 0.2810821533203125, 0.7335128784179688, 0.6648941040039062, 0.38162994384765625, 0.549407958984375, 0.2948112487792969, 0.65972900390625, 0.9367294311523438, 0.1600933074951172, 1.1708526611328125, -0.40807342529296875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000071.npy"}
{"epoch": 0.1073318216175359, "step": 72, "batch_size": 64, "mean": 0.32459497451782227, "std": 0.6059262156486511, "min": -1.69671630859375, "p10": -0.3165512084960937, "median": 0.3142261505126953, "p90": 0.9926719665527345, "max": 2.2249679565429688, "pos_frac": 0.703125, "sample": [-0.36020660400390625, 1.5736618041992188, 0.5110549926757812, 0.3837890625, -0.5464630126953125, 0.1458892822265625, 0.2947998046875, 1.6637039184570312, -0.11421966552734375, 0.9063796997070312, -0.2500801086425781, 0.00220489501953125, 0.5546417236328125, 0.3667449951171875, 0.2965850830078125, -0.11476898193359375, 0.43825531005859375, -0.4505043029785156, 0.7257080078125, 0.48886871337890625, 1.0154876708984375, 0.5018577575683594, 0.35155487060546875, 0.3194847106933594, 0.8750152587890625, 0.4427947998046875, 0.4651947021484375, -0.07982254028320312, -0.33875274658203125, 0.30896759033203125, 1.269439697265625, 0.20641136169433594, -0.46692657470703125, 0.13280487060546875, 0.3013153076171875, -0.08821868896484375, 0.5484180450439453, 0.4542350769042969, -0.18698883056640625, 0.3986320495605469, 0.9690399169921875, 0.9653282165527344, 0.8785400390625, -0.26474761962890625, 0.255401611328125, 0.546722412109375, 0.3549041748046875, -0.00506591796875, 0.1454010009765625, -0.19097900390625, -1.69671630859375, 0.564056396484375, 2.2249679565429688, -0.19588470458984375, 1.06414794921875, -0.23811721801757812, 0.891815185546875, 0.8627395629882812, -0.7600860595703125, -0.03484344482421875, 1.0027999877929688, 0.0525054931640625, 0.22832107543945312, 0.206878662109375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000072.npy"}
{"epoch": 0.10884353741496598, "step": 73, "batch_size": 64, "mean": 0.05147099494934082, "std": 0.7699373960494995, "min": -2.2735595703125, "p10": -0.7880016326904297, "median": -0.0012035369873046875, "p90": 1.0676612854003906, "max": 1.7545166015625, "pos_frac": 0.5, "sample": [0.5567703247070312, -0.05493927001953125, 1.0792083740234375, -0.9850959777832031, -0.2958793640136719, -1.2820892333984375, -0.6417083740234375, -0.038120269775390625, -1.1082611083984375, 1.0407180786132812, -0.623687744140625, -0.03173828125, 0.6880035400390625, -0.2201995849609375, -0.7701988220214844, 0.5816268920898438, -0.588897705078125, -0.3578987121582031, -0.19647216796875, 0.8232345581054688, 0.4712677001953125, 0.245880126953125, 0.029331207275390625, 0.069488525390625, 1.7545166015625, 0.9354248046875, -0.7252159118652344, -0.5882949829101562, 0.1257457733154297, 0.4704170227050781, 0.2823982238769531, -0.3096504211425781, 0.5759544372558594, -0.29717254638671875, -0.331268310546875, 0.242767333984375, 0.6071548461914062, 1.098114013671875, -0.9637947082519531, -0.18770599365234375, -2.2735595703125, 1.7196044921875, -0.4456787109375, -0.6666030883789062, -1.073944091796875, 1.3136749267578125, 0.38999176025390625, -0.25991058349609375, 0.2704010009765625, 0.196319580078125, 0.1959686279296875, 0.6165275573730469, -0.06258201599121094, -0.13210678100585938, 0.0794677734375, 0.9751510620117188, -0.6309356689453125, 0.578460693359375, -0.3631744384765625, -0.7956314086914062, 0.45143890380859375, 1.1134490966796875, 1.666778564453125, -0.6486968994140625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000073.npy"}
{"epoch": 0.11035525321239607, "step": 74, "batch_size": 64, "mean": 0.2865017056465149, "std": 0.7163569927215576, "min": -1.831024169921875, "p10": -0.44472084045410154, "median": 0.3331642150878906, "p90": 0.9762107849121096, "max": 2.6523895263671875, "pos_frac": 0.640625, "sample": [-0.3698883056640625, 0.6935348510742188, -1.831024169921875, 0.5375137329101562, -0.3556976318359375, 0.5573959350585938, 0.7463531494140625, 1.6620025634765625, 0.6053199768066406, 0.82659912109375, 0.3501014709472656, 0.7493743896484375, 0.5323638916015625, 0.36525726318359375, -0.44812774658203125, 0.8373260498046875, -0.448883056640625, 0.992706298828125, -0.024639129638671875, -0.3227081298828125, 0.8886795043945312, -0.05101776123046875, 0.024188995361328125, 0.15196990966796875, 1.0484848022460938, -0.20238113403320312, 0.09775924682617188, 0.66278076171875, 1.4730148315429688, -0.383148193359375, 0.1152801513671875, 0.5971183776855469, 0.3404254913330078, -0.6486186981201172, -0.4367713928222656, 0.9356231689453125, -0.060031890869140625, -0.09491729736328125, -0.9964218139648438, -0.34384918212890625, 0.6781597137451172, -0.18710708618164062, 0.5927772521972656, 0.8151283264160156, 0.2212371826171875, 2.6523895263671875, 0.8784332275390625, 0.2963409423828125, 1.4690933227539062, -0.6045455932617188, -1.3645172119140625, 0.2521648406982422, -0.38922882080078125, 0.057735443115234375, 0.32891082763671875, -0.11057090759277344, 0.3374176025390625, -0.181488037109375, 0.43128204345703125, 0.7084884643554688, 1.0648365020751953, 0.9377212524414062, -0.23570632934570312, 0.9141082763671875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000074.npy"}
{"epoch": 0.11186696900982615, "step": 75, "batch_size": 64, "mean": 0.3048042356967926, "std": 0.7862442135810852, "min": -1.8072967529296875, "p10": -0.4859722137451171, "median": 0.17502975463867188, "p90": 1.4428691864013683, "max": 2.427215576171875, "pos_frac": 0.671875, "sample": [-0.2218017578125, -0.1282501220703125, 0.33214569091796875, 0.6632537841796875, 1.9742355346679688, 0.48276519775390625, -0.08284378051757812, 0.3876762390136719, 0.12763214111328125, 1.5493049621582031, 0.18544769287109375, -0.6023979187011719, 0.055332183837890625, -0.519561767578125, 0.1959552764892578, 1.0726585388183594, 1.751007080078125, -0.4075965881347656, 0.09143638610839844, -0.060558319091796875, 2.427215576171875, 1.0194435119628906, 0.75909423828125, 0.44345855712890625, 0.09502029418945312, -0.34198760986328125, 0.489471435546875, -1.4402313232421875, 0.7707138061523438, -0.30083465576171875, 0.778076171875, 1.19451904296875, -0.8703460693359375, 0.42769622802734375, -0.3507537841796875, -0.29343414306640625, 2.319122314453125, 0.12352752685546875, 1.1036605834960938, 0.5276565551757812, 0.400390625, 0.6275520324707031, -1.8072967529296875, -0.09770011901855469, 0.5416221618652344, -0.212188720703125, -0.763458251953125, 0.41131591796875, -0.1755847930908203, 0.11469459533691406, 0.25393104553222656, 0.06162261962890625, 0.061878204345703125, -0.13530731201171875, 0.5924835205078125, 0.0198516845703125, 1.589111328125, 0.26044654846191406, -0.6578903198242188, -0.172882080078125, 0.16461181640625, 0.09600830078125, 1.6834259033203125, 0.9239044189453125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000075.npy"}
{"epoch": 0.11337868480725624, "step": 76, "batch_size": 64, "mean": 0.274379700422287, "std": 0.7380563020706177, "min": -1.8524398803710938, "p10": -0.6326374053955077, "median": 0.22264766693115234, "p90": 1.2352298736572267, "max": 1.85235595703125, "pos_frac": 0.625, "sample": [-1.0276031494140625, 0.4271697998046875, -0.1415863037109375, -0.227569580078125, 0.9800872802734375, -0.179901123046875, -0.015409469604492188, -0.08170318603515625, 0.29451751708984375, 1.13043212890625, 1.1912956237792969, 1.0015792846679688, 0.11346817016601562, -0.31545257568359375, 1.062164306640625, -1.2767562866210938, 0.17072296142578125, 0.5617103576660156, 0.3018951416015625, 0.0705108642578125, -0.0929107666015625, 1.7062149047851562, -0.5439071655273438, 1.2567062377929688, 0.7611541748046875, 0.3579216003417969, -0.15329551696777344, 0.927825927734375, -0.4274330139160156, 0.23487472534179688, 0.048610687255859375, 0.07929229736328125, -0.6481590270996094, -0.8990859985351562, -0.31356239318847656, 0.2653045654296875, 1.6332473754882812, 1.85235595703125, -0.20032501220703125, 0.2290802001953125, -0.0025386810302734375, 0.9268074035644531, -0.5964202880859375, 0.5655899047851562, 0.43280792236328125, -0.1847362518310547, 0.2924995422363281, -1.8524398803710938, 1.1681671142578125, 0.839935302734375, 0.6650238037109375, 0.802337646484375, 0.5811996459960938, 1.3878097534179688, 0.9021453857421875, -0.02391815185546875, 0.16920852661132812, 0.2162151336669922, -0.7317352294921875, 1.254058837890625, 1.2775039672851562, 0.21471786499023438, -0.03265380859375, -0.8247661590576172], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000076.npy"}
{"epoch": 0.11489040060468632, "step": 77, "batch_size": 64, "mean": 0.3787541389465332, "std": 0.8686701059341431, "min": -1.1483306884765625, "p10": -0.5305450439453124, "median": 0.1198129653930664, "p90": 1.3846145629882816, "max": 2.989604949951172, "pos_frac": 0.640625, "sample": [0.01221466064453125, 0.5130500793457031, 0.6918373107910156, -0.9249496459960938, 0.7850170135498047, -0.045440673828125, -1.0361557006835938, -0.4696846008300781, -1.1483306884765625, -0.2772674560546875, 1.194061279296875, 0.07508087158203125, -0.424713134765625, 0.6057929992675781, -0.59368896484375, 2.989604949951172, 1.3278656005859375, 1.1486053466796875, 0.0262451171875, 2.4172439575195312, -0.282501220703125, 0.1412944793701172, 1.6059951782226562, 1.2321548461914062, 0.5694236755371094, -0.23546218872070312, 0.19689369201660156, 1.408935546875, 0.096527099609375, 0.009281158447265625, -0.5470657348632812, -0.26169586181640625, 0.7030868530273438, 2.2253799438476562, 0.01552581787109375, 0.85919189453125, 1.2927703857421875, -0.23086166381835938, -0.16644287109375, 2.3088531494140625, 0.8331680297851562, -0.20726776123046875, 1.2350349426269531, -0.8694725036621094, -0.2490081787109375, 0.9248199462890625, 0.023342132568359375, -0.17724990844726562, 0.1706390380859375, 0.5331077575683594, 0.7961044311523438, -0.49199676513671875, 0.39795494079589844, 0.7412986755371094, 1.2921905517578125, 0.4518280029296875, -0.40006256103515625, 0.004486083984375, 1.7614669799804688, 0.09833145141601562, -0.9017333984375, 0.5460891723632812, -0.07212448120117188, -0.00835418701171875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000077.npy"}
{"epoch": 0.1164021164021164, "step": 78, "batch_size": 64, "mean": 0.3058291971683502, "std": 0.8169967532157898, "min": -1.2975387573242188, "p10": -0.7934501647949219, "median": 0.2827568054199219, "p90": 1.2883674621582035, "max": 2.432281494140625, "pos_frac": 0.640625, "sample": [0.8360481262207031, 2.432281494140625, 0.8470230102539062, 1.0945587158203125, 0.876739501953125, 0.8763275146484375, 1.6329498291015625, -0.24762725830078125, -0.06652069091796875, 0.688720703125, 0.5926055908203125, 0.2970123291015625, 0.05106544494628906, 1.69024658203125, 0.26739501953125, -0.9417495727539062, -1.0778732299804688, 1.7441558837890625, -0.0585174560546875, 0.26850128173828125, -0.7923126220703125, -0.7822952270507812, 0.22088623046875, -0.19262313842773438, -0.309814453125, 0.107940673828125, 0.18501663208007812, -0.8817291259765625, 1.006561279296875, -0.616058349609375, 1.1452560424804688, 0.45135498046875, -0.23849868774414062, 1.1884613037109375, 1.8355484008789062, -0.6859893798828125, -1.1586990356445312, 0.347686767578125, 0.5316085815429688, 0.6753654479980469, -0.36887359619140625, 0.4696311950683594, 0.6221923828125, 0.7406997680664062, 0.13162994384765625, 0.5733718872070312, -1.2975387573242188, -0.13256454467773438, 0.3785591125488281, 0.9816436767578125, -0.26069068908691406, 0.9421043395996094, -0.074310302734375, -0.7939376831054688, 2.1143341064453125, -0.0955963134765625, -0.0927886962890625, -1.0512466430664062, 0.4490814208984375, 0.7187576293945312, 0.03591156005859375, 0.09779167175292969, 1.3311843872070312, 0.312713623046875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000078.npy"}
{"epoch": 0.11791383219954649, "step": 79, "batch_size": 64, "mean": 0.35301104187965393, "std": 1.0151009559631348, "min": -1.7266616821289062, "p10": -0.6213588714599609, "median": 0.2821159362792969, "p90": 1.527854156494141, "max": 3.5705947875976562, "pos_frac": 0.578125, "sample": [0.28908538818359375, -0.02558135986328125, -1.6379623413085938, 1.1188278198242188, -0.2724761962890625, 0.1673736572265625, -0.7402915954589844, -0.11852645874023438, -0.49207305908203125, -0.9318313598632812, -0.24448013305664062, -0.44466400146484375, 0.5947265625, 3.5705947875976562, -0.4345703125, -0.1299285888671875, 0.8602981567382812, 0.5518169403076172, -0.6440925598144531, -0.16321563720703125, 0.6647911071777344, 2.808349609375, -0.424163818359375, -0.329620361328125, 1.575775146484375, -0.14373397827148438, 3.0171661376953125, 1.4020233154296875, 0.60638427734375, 1.4160385131835938, -1.7266616821289062, 0.3350791931152344, 0.6483383178710938, 1.1709480285644531, 0.19233322143554688, 0.44795989990234375, 0.3889808654785156, 0.4850635528564453, -0.28530120849609375, 1.198486328125, -1.187408447265625, 1.6374130249023438, 1.0597038269042969, -0.3794822692871094, -0.0747222900390625, -0.024585723876953125, 0.9504470825195312, -0.043155670166015625, 0.6616916656494141, 0.06972885131835938, 0.46277618408203125, -1.575225830078125, 0.46612548828125, 0.275146484375, 0.7326889038085938, 2.46331787109375, -0.0005645751953125, 0.471771240234375, -0.5683135986328125, 0.6478233337402344, -0.3987274169921875, 0.5986480712890625, 1.9203872680664062, 0.10595703125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000079.npy"}
{"epoch": 0.11942554799697656, "step": 80, "batch_size": 64, "mean": 0.29609763622283936, "std": 0.8148956298828125, "min": -1.2605819702148438, "p10": -0.8514266967773437, "median": 0.30239200592041016, "p90": 1.29389762878418, "max": 2.3047027587890625, "pos_frac": 0.671875, "sample": [0.5107307434082031, 1.2275276184082031, -0.33697509765625, 0.975341796875, 1.1483993530273438, 0.3318328857421875, -0.2069091796875, -0.29734039306640625, -0.37409210205078125, -1.2605819702148438, 0.7078514099121094, 1.8446197509765625, 0.5426864624023438, -0.25344085693359375, 0.8683929443359375, 1.3223419189453125, -0.2351837158203125, -0.70849609375, 0.07295608520507812, 0.8298149108886719, 0.43778228759765625, 0.2720947265625, -0.43735313415527344, 2.1161270141601562, 1.1909637451171875, 0.98486328125, 0.2418212890625, 0.3726959228515625, 1.33905029296875, 0.578369140625, -1.101593017578125, 0.5674591064453125, 0.6045341491699219, -0.25340843200683594, 0.5810546875, 1.407470703125, 0.17697715759277344, 1.1494140625, 0.9517593383789062, -1.0128631591796875, 0.8957672119140625, 0.5250091552734375, -0.8686065673828125, 1.1151237487792969, -0.3992137908935547, -0.26396942138671875, -0.81134033203125, 0.1384563446044922, -1.0109100341796875, -1.0079574584960938, 2.3047027587890625, 0.14179229736328125, 0.0657196044921875, 0.1001739501953125, 0.2729511260986328, -0.7181739807128906, 0.35333251953125, 0.09353256225585938, 0.4503631591796875, -1.227447509765625, 0.3537139892578125, -0.114471435546875, 1.67877197265625, 0.00623321533203125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000080.npy"}
{"epoch": 0.12093726379440665, "step": 81, "batch_size": 64, "mean": 0.3478164076805115, "std": 0.8847886323928833, "min": -1.1653976440429688, "p10": -0.8593963623046873, "median": 0.2647247314453125, "p90": 1.6461265563964849, "max": 2.52642822265625, "pos_frac": 0.625, "sample": [0.02907562255859375, 1.060028076171875, 0.6457252502441406, -0.4400444030761719, 0.5202178955078125, -0.5136566162109375, 0.12674713134765625, 0.5207862854003906, -0.42165184020996094, -0.5484809875488281, 2.0546875, -0.9940109252929688, -1.0446701049804688, 0.03435516357421875, 0.3831977844238281, 0.7394638061523438, -0.10345649719238281, -0.2009429931640625, 0.25078582763671875, 1.032806396484375, 0.27866363525390625, -0.3238525390625, 0.586181640625, -0.63165283203125, 1.5277099609375, 0.3022308349609375, -0.416534423828125, 2.0588455200195312, 0.4860877990722656, 0.7584991455078125, 0.114898681640625, -0.9714279174804688, 1.6968765258789062, -0.19370651245117188, -0.3004302978515625, 2.268463134765625, 0.957916259765625, 1.1556243896484375, -0.3854827880859375, 2.52642822265625, -0.96917724609375, 1.0328865051269531, 0.4421844482421875, 0.1482086181640625, -0.957000732421875, 0.004085540771484375, 1.2107868194580078, 0.9569931030273438, 1.7784576416015625, 1.939453125, 0.9060897827148438, -0.2480621337890625, 1.0641098022460938, -0.3970947265625, 0.1671619415283203, -0.00687408447265625, -1.1653976440429688, 0.3177909851074219, -0.1387176513671875, 1.1241111755371094, -0.3005409240722656, 0.7416877746582031, 1.0246429443359375, -1.0418357849121094], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000081.npy"}
{"epoch": 0.12244897959183673, "step": 82, "batch_size": 64, "mean": 0.6286218166351318, "std": 0.9359553456306458, "min": -1.1224517822265625, "p10": -0.5944925308227539, "median": 0.5042724609375, "p90": 1.5264495849609376, "max": 4.0511932373046875, "pos_frac": 0.796875, "sample": [0.3832740783691406, 0.3794288635253906, -0.0615692138671875, -0.15230560302734375, 1.1436920166015625, 0.2673969268798828, 1.1418018341064453, 1.0210380554199219, 2.1047515869140625, 1.119607925415039, 0.7150154113769531, 0.5443000793457031, 2.799072265625, 1.0497360229492188, 0.776611328125, 0.24170303344726562, 1.3871231079101562, -0.8708114624023438, 0.7796726226806641, 1.379791259765625, 0.44428253173828125, 0.7079505920410156, 0.2833442687988281, -0.3266258239746094, -0.146240234375, 1.4696578979492188, 0.478485107421875, 0.944091796875, 1.1078758239746094, 0.22594642639160156, -0.6043682098388672, 0.3829154968261719, 4.0511932373046875, -0.8855438232421875, 1.5384750366210938, -0.7595367431640625, 1.5770492553710938, 1.4983901977539062, 0.5535736083984375, 1.8692703247070312, 0.4642791748046875, 0.10437774658203125, 0.530059814453125, 0.056964874267578125, 0.42527008056640625, 0.8027801513671875, 0.4128570556640625, 0.01415252685546875, 0.895477294921875, 0.18804931640625, -0.8997344970703125, 0.349945068359375, 1.096954345703125, 3.1149444580078125, 1.0395965576171875, -0.6375274658203125, 1.4332122802734375, 0.8832664489746094, 0.7690544128417969, 0.4660625457763672, -0.5561447143554688, -1.1224517822265625, -0.5714492797851562, 0.41228485107421875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000082.npy"}
{"epoch": 0.12396069538926682, "step": 83, "batch_size": 64, "mean": 0.44747063517570496, "std": 0.928534984588623, "min": -1.5392036437988281, "p10": -0.7851455688476561, "median": 0.4809226989746094, "p90": 1.4327278137207033, "max": 3.0479202270507812, "pos_frac": 0.765625, "sample": [2.1547088623046875, 1.17462158203125, -1.4279556274414062, 0.3803901672363281, 0.10630035400390625, 1.39398193359375, 1.6678466796875, 0.7019195556640625, 0.18119049072265625, 0.995758056640625, 0.804931640625, 0.42345428466796875, -0.5658721923828125, 0.39942169189453125, 0.5369110107421875, 0.2545967102050781, 0.6748046875, 1.4493331909179688, 0.14678192138671875, -0.56317138671875, 0.057769775390625, 0.06688308715820312, 1.533660888671875, 0.9788551330566406, -1.444122314453125, 0.7293853759765625, 2.7096710205078125, 0.4588470458984375, 0.02220916748046875, -0.15648269653320312, -1.2960662841796875, -0.060718536376953125, -0.10593414306640625, -1.5392036437988281, 0.27612876892089844, 0.6899986267089844, 0.6819591522216797, 1.1942138671875, -0.31360626220703125, -1.411590576171875, 0.7082061767578125, 0.03610992431640625, 1.2200508117675781, 0.5029983520507812, 0.22916603088378906, -0.5330257415771484, 0.6815071105957031, 0.8973464965820312, 0.8141021728515625, 1.1522331237792969, 3.0479202270507812, 0.6087913513183594, 0.6910820007324219, 1.2242050170898438, 0.21556854248046875, 0.9436244964599609, 0.5338287353515625, 0.9721755981445312, 2.3105545043945312, -1.01995849609375, -0.0508880615234375, 0.16980743408203125, 0.20002365112304688, -0.879119873046875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000083.npy"}
{"epoch": 0.1254724111866969, "step": 84, "batch_size": 64, "mean": 0.5169292688369751, "std": 0.9749613404273987, "min": -2.572174072265625, "p10": -0.6351104736328125, "median": 0.5517959594726562, "p90": 1.664530944824219, "max": 2.7579345703125, "pos_frac": 0.703125, "sample": [-0.076904296875, 0.6990127563476562, 0.2934703826904297, 0.7756538391113281, 2.4456405639648438, 0.9981765747070312, 1.6455078125, -0.3798942565917969, 0.5272178649902344, -1.6921844482421875, 1.80926513671875, 0.7903537750244141, 1.0475044250488281, -0.60009765625, 1.629058837890625, 1.0906219482421875, -0.16882705688476562, -0.7437210083007812, 1.4287567138671875, 0.5586128234863281, 1.5328826904296875, 0.8060531616210938, -0.01313018798828125, 0.8069229125976562, 0.9765243530273438, -0.03875732421875, -1.0845413208007812, 0.8853912353515625, 0.03516578674316406, 2.1757431030273438, 0.2498321533203125, 0.533935546875, 0.08747482299804688, 1.6806793212890625, 0.02486419677734375, 1.4001693725585938, 1.4308395385742188, 1.1410636901855469, 0.29140472412109375, -0.9772109985351562, -2.572174072265625, 1.8411102294921875, 0.8366165161132812, 0.015743255615234375, -0.03556060791015625, -0.294830322265625, 0.21152114868164062, 0.5449790954589844, 0.39574432373046875, -1.42730712890625, -0.19351959228515625, 1.0825653076171875, 0.7348175048828125, 0.8696060180664062, 0.469940185546875, 1.3806381225585938, 2.7579345703125, -0.07241630554199219, -0.0602264404296875, -0.650115966796875, 1.1258888244628906, 0.7889480590820312, -0.3616466522216797, 1.6726837158203125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000084.npy"}
{"epoch": 0.12698412698412698, "step": 85, "batch_size": 64, "mean": 0.4998033940792084, "std": 0.88062584400177, "min": -1.6185760498046875, "p10": -0.6904266357421874, "median": 0.4847698211669922, "p90": 1.6940862655639655, "max": 2.287200927734375, "pos_frac": 0.765625, "sample": [0.6994285583496094, -1.1582870483398438, 0.7783355712890625, 0.3046531677246094, 0.44039154052734375, 0.8257369995117188, 0.5221328735351562, 0.28802490234375, 0.16254425048828125, 0.29241180419921875, 1.4565658569335938, -1.10589599609375, 1.7503414154052734, -0.7147064208984375, 0.8447513580322266, 0.4724006652832031, 1.5628242492675781, 1.446624755859375, 2.0946807861328125, 1.1940345764160156, -0.5386581420898438, 0.2380523681640625, -1.6185760498046875, 0.5856208801269531, 1.2077064514160156, 1.9138870239257812, 0.269866943359375, 0.6698417663574219, 2.2499465942382812, 0.28311729431152344, -0.23032379150390625, 0.5941314697265625, 1.28399658203125, 0.621917724609375, 0.49151611328125, 0.36676025390625, 1.4035110473632812, 0.7175731658935547, 0.1279754638671875, -0.06494522094726562, -0.3893470764160156, 1.0418930053710938, 1.3852386474609375, 1.7941360473632812, 2.287200927734375, 0.8586158752441406, 0.2620086669921875, -0.10378456115722656, 0.2068023681640625, 1.1117095947265625, 0.4780235290527344, -0.6337738037109375, -0.18138504028320312, -0.5510711669921875, 0.5823097229003906, 0.08428955078125, -1.4546470642089844, -1.0316696166992188, 0.9008064270019531, 0.35063934326171875, 1.8365631103515625, -0.83953857421875, 0.876708984375, 0.3857765197753906], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000085.npy"}
{"epoch": 0.12849584278155707, "step": 86, "batch_size": 64, "mean": 0.2945028245449066, "std": 0.8391074538230896, "min": -1.4727020263671875, "p10": -0.7327873229980468, "median": 0.21729087829589844, "p90": 1.474149513244629, "max": 2.0484619140625, "pos_frac": 0.578125, "sample": [0.8518218994140625, -0.27771759033203125, -0.27577972412109375, -0.0872955322265625, 1.4827766418457031, 0.23987770080566406, -0.7980194091796875, -0.14068603515625, -1.3712158203125, -0.09097671508789062, 0.14268112182617188, -0.06750106811523438, 0.92413330078125, 1.367095947265625, -0.44842529296875, 1.3350067138671875, 0.6246566772460938, -0.760406494140625, -1.3234710693359375, 0.45330047607421875, 2.0484619140625, 0.6559295654296875, 0.68145751953125, 0.5690803527832031, 1.7944183349609375, 0.2120189666748047, -0.4600372314453125, 0.3156166076660156, 0.38430023193359375, 0.2225627899169922, 1.7834320068359375, 1.6832122802734375, -0.1103363037109375, -1.4727020263671875, 0.1424102783203125, 1.1630325317382812, -0.1918182373046875, 0.1646881103515625, 0.8676834106445312, -0.012603759765625, -1.2684173583984375, 1.078195571899414, 1.06817626953125, -0.6627731323242188, -0.21041107177734375, -0.19048309326171875, 1.454019546508789, 0.3571319580078125, 0.6944656372070312, -0.6683425903320312, 1.8819732666015625, 0.7583465576171875, 0.6301727294921875, -0.0234832763671875, 1.0259857177734375, -0.7833099365234375, -0.5620689392089844, 1.760986328125, -0.296051025390625, -0.14400482177734375, 0.07619476318359375, -0.18044662475585938, 0.4538536071777344, 0.3778076171875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000086.npy"}
{"epoch": 0.13000755857898716, "step": 87, "batch_size": 64, "mean": 0.7170728445053101, "std": 1.1162075996398926, "min": -1.897674560546875, "p10": -0.7817283630371094, "median": 0.6669979095458984, "p90": 2.232331848144532, "max": 2.985443115234375, "pos_frac": 0.71875, "sample": [-0.168212890625, 2.054168701171875, 2.2720794677734375, -0.2691001892089844, 2.3709640502929688, -0.2069091796875, 2.04302978515625, 0.00684356689453125, 2.4430999755859375, 2.1056861877441406, 0.82720947265625, -0.7940502166748047, 1.6322059631347656, -0.9611282348632812, -0.5234375, 2.7296142578125, -1.088043212890625, 1.4272727966308594, 0.5044326782226562, 0.6937217712402344, 0.6402740478515625, 1.8641738891601562, 0.40505027770996094, -1.1971969604492188, 1.3751678466796875, -1.897674560546875, -0.5867652893066406, -0.3262157440185547, 0.958221435546875, -0.7529773712158203, 0.5741539001464844, -0.4569664001464844, 0.4231758117675781, -0.42420196533203125, 0.033935546875, -0.9046096801757812, 1.3556747436523438, 0.9560737609863281, 1.3044891357421875, 0.16595458984375, -0.05533599853515625, 1.2319564819335938, 0.9344596862792969, -0.144805908203125, 1.4560699462890625, 0.095703125, 0.10571670532226562, 1.5505218505859375, -0.8092002868652344, 1.83331298828125, 2.5769805908203125, 1.1547698974609375, 2.3990936279296875, 1.0489425659179688, 0.564239501953125, 0.2847900390625, 0.45172691345214844, 0.3835029602050781, 2.985443115234375, 0.8640594482421875, 2.13958740234375, 0.9379043579101562, 1.9041748046875, 1.389862060546875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000087.npy"}
{"epoch": 0.13151927437641722, "step": 88, "batch_size": 64, "mean": 0.33253180980682373, "std": 1.1517679691314697, "min": -3.3101959228515625, "p10": -0.9589355468749999, "median": 0.4446563720703125, "p90": 1.5403190612792972, "max": 3.9008026123046875, "pos_frac": 0.640625, "sample": [1.131744384765625, -0.12947654724121094, -0.17080116271972656, 0.6814041137695312, 1.2637367248535156, 1.5708084106445312, 1.4413299560546875, -0.674041748046875, -1.6740188598632812, 0.10459518432617188, 1.1591758728027344, 1.0105438232421875, 1.1472129821777344, 0.021091461181640625, -0.5804405212402344, 0.414031982421875, 0.6968231201171875, -0.9642295837402344, -0.9289398193359375, 0.3516998291015625, 0.592071533203125, 1.0433063507080078, 0.44776153564453125, 0.6179580688476562, -0.7777519226074219, 2.0717926025390625, 0.8478126525878906, 0.1519622802734375, -0.7400970458984375, -0.9465827941894531, -1.399078369140625, -1.5871353149414062, 3.9008026123046875, -0.4383087158203125, 0.39240264892578125, 0.1588592529296875, -0.3546600341796875, 0.6603965759277344, 1.6376876831054688, 0.2058563232421875, 0.6501502990722656, 2.0358123779296875, 0.6380043029785156, 1.093841552734375, -0.5859375, 1.4394912719726562, 0.44155120849609375, -0.20956039428710938, 0.5681228637695312, 0.6723747253417969, 1.2751235961914062, 1.019805908203125, 1.655029296875, 2.9987335205078125, 1.46917724609375, -1.019205093383789, -0.156585693359375, 0.5892295837402344, 0.589508056640625, -0.10558700561523438, -0.85211181640625, -3.3101959228515625, -1.2839202880859375, -0.6881217956542969], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000088.npy"}
{"epoch": 0.1330309901738473, "step": 89, "batch_size": 64, "mean": 0.8061116933822632, "std": 1.3140805959701538, "min": -1.777923583984375, "p10": -0.714434814453125, "median": 0.7892913818359375, "p90": 2.2718482971191407, "max": 5.363067626953125, "pos_frac": 0.6875, "sample": [0.9282875061035156, 1.2513046264648438, 0.80517578125, -0.4205322265625, 0.9967803955078125, 2.4977149963378906, -0.65936279296875, 0.4437751770019531, 0.44549560546875, 1.2132797241210938, 0.8289794921875, 1.770294189453125, 1.558013916015625, 5.363067626953125, 0.5345306396484375, 0.2714576721191406, 1.0128135681152344, -0.64788818359375, -0.12355804443359375, -0.06594085693359375, -0.9283065795898438, -0.7794170379638672, 0.7696380615234375, 2.20465087890625, -1.0239276885986328, 4.515167236328125, -0.29473114013671875, 1.103485107421875, -0.31746482849121094, -0.2532501220703125, 0.773406982421875, 0.5277252197265625, 0.5276527404785156, -0.738037109375, 2.126007080078125, 0.2040843963623047, 1.4074783325195312, 1.2347946166992188, 2.101165771484375, 2.2767486572265625, 1.2363510131835938, 1.4133987426757812, 1.791595458984375, 3.1746082305908203, 0.2618827819824219, 0.18659019470214844, 0.9433670043945312, 2.4839019775390625, 1.6297264099121094, -0.8642578125, 1.228729248046875, 1.6416149139404297, -0.0261077880859375, 0.04152679443359375, 1.1215400695800781, -1.3989334106445312, 0.8854331970214844, -0.6069450378417969, -0.2905082702636719, -1.777923583984375, 2.993865966796875, -0.10223770141601562, 2.2604141235351562, -0.0770416259765625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000089.npy"}
|
||||
{"epoch": 0.1345427059712774, "step": 90, "batch_size": 64, "mean": 0.6056344509124756, "std": 1.1355503797531128, "min": -2.7921142578125, "p10": -0.6316921234130859, "median": 0.5961360931396484, "p90": 1.8908203125000003, "max": 4.0867156982421875, "pos_frac": 0.71875, "sample": [-0.02382659912109375, 0.8454456329345703, 0.27890777587890625, 0.7049446105957031, 0.5889549255371094, 0.3488349914550781, 1.378082275390625, 3.1957321166992188, -0.63885498046875, -2.7921142578125, 1.8528594970703125, 1.2416534423828125, 1.2109947204589844, 0.5506706237792969, 1.8491897583007812, -0.5737686157226562, 0.57342529296875, 1.8269195556640625, -0.08191680908203125, 0.7925338745117188, 1.5233421325683594, -1.8229522705078125, 2.221282958984375, -0.14788818359375, 0.28643798828125, 1.3712043762207031, 1.0093536376953125, 1.52801513671875, 0.4071483612060547, 2.4482192993164062, -0.3233757019042969, -0.5760650634765625, 0.7418899536132812, 1.9070892333984375, 0.5349044799804688, 0.6033172607421875, -1.4554176330566406, -0.84234619140625, 4.0867156982421875, 1.0744743347167969, -0.39905548095703125, 1.2695159912109375, 1.9933319091796875, 0.065338134765625, 0.27477264404296875, 0.6351871490478516, 1.2281074523925781, 0.7957000732421875, 0.45209503173828125, 0.9180564880371094, -0.818878173828125, 1.00164794921875, -0.083984375, 0.13481903076171875, 0.2281322479248047, 0.2620677947998047, 1.0071868896484375, -0.5923175811767578, -0.6465072631835938, 0.8796424865722656, -0.202911376953125, 2.5128860473632812, -0.6149787902832031, 0.7567329406738281], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000090.npy"}
|
||||
{"epoch": 0.1360544217687075, "step": 91, "batch_size": 64, "mean": 0.5181469917297363, "std": 1.3949393033981323, "min": -2.2237548828125, "p10": -1.1167276382446287, "median": 0.5297679901123047, "p90": 2.072382354736328, "max": 5.205223083496094, "pos_frac": 0.59375, "sample": [1.6376800537109375, 1.134765625, 0.5680828094482422, 0.012598037719726562, 0.8365859985351562, 1.9744415283203125, -0.0754852294921875, -0.2241058349609375, 0.9369964599609375, 3.5887374877929688, 2.745849609375, -1.2892303466796875, 0.7858924865722656, -0.06914520263671875, 0.7010812759399414, -0.2606658935546875, 0.5617523193359375, -0.6010704040527344, 0.4014320373535156, -2.0117645263671875, 1.1052417755126953, 3.1449737548828125, 1.3765640258789062, -0.05644416809082031, 0.5887908935546875, 0.7262554168701172, 0.806182861328125, -0.6005687713623047, -1.646209716796875, 0.6524124145507812, -0.085968017578125, 2.5818252563476562, -2.2237548828125, 0.25186729431152344, 2.078044891357422, -0.737030029296875, -1.6837310791015625, 1.536529541015625, -0.9015598297119141, -1.2089424133300781, 0.99371337890625, -0.877471923828125, 0.10192108154296875, 0.48095703125, -0.2914543151855469, 2.0591697692871094, 0.4977836608886719, 1.9115180969238281, 1.4015121459960938, 1.8328475952148438, -0.7522506713867188, -0.4369010925292969, -0.5517787933349609, -1.9790592193603516, 1.777099609375, 1.900665283203125, 2.22735595703125, -0.40379905700683594, 5.205223083496094, -0.157440185546875, -0.5062713623046875, 0.5966911315917969, 1.2331771850585938, -0.16070938110351562], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000091.npy"}
|
||||
{"epoch": 0.13756613756613756, "step": 92, "batch_size": 64, "mean": 0.6811752319335938, "std": 1.2852835655212402, "min": -1.1378860473632812, "p10": -0.79122314453125, "median": 0.4061737060546875, "p90": 2.767935180664063, "max": 4.235725402832031, "pos_frac": 0.65625, "sample": [0.6558837890625, 0.4041595458984375, -0.9032363891601562, -0.718170166015625, -0.9768829345703125, 0.07097244262695312, -0.4602813720703125, 1.0843429565429688, -0.74755859375, -0.638824462890625, 0.3990631103515625, 0.4081878662109375, 1.6416168212890625, 1.7169418334960938, 1.3701820373535156, 1.4951915740966797, 0.34696197509765625, 0.21540069580078125, 2.4635696411132812, -0.041545867919921875, 1.002471923828125, -0.45969390869140625, 0.49909210205078125, 0.6702651977539062, 2.7965545654296875, -0.038158416748046875, -1.1335716247558594, -0.00492095947265625, -0.8117752075195312, 1.4873504638671875, -0.23482131958007812, 1.7752456665039062, 2.7011566162109375, -0.5999832153320312, 3.0757064819335938, 0.4173431396484375, 0.8082427978515625, 1.9488983154296875, -0.41606903076171875, -0.3764076232910156, -0.8099365234375, -0.21228790283203125, 4.235725402832031, 2.8884124755859375, 0.2580528259277344, -1.1260108947753906, 0.44702911376953125, 1.3287429809570312, 0.11084747314453125, -0.3887214660644531, 0.9821586608886719, 3.22509765625, 1.3159561157226562, 0.1368579864501953, 1.5932464599609375, -1.1378860473632812, 3.914276123046875, 1.4052047729492188, 0.5930042266845703, 1.045684814453125, 2.9107666015625, 0.12882232666015625, -0.25510406494140625, 0.11237716674804688], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000092.npy"}
|
||||
{"epoch": 0.13907785336356765, "step": 93, "batch_size": 64, "mean": 0.4652804434299469, "std": 1.5134670734405518, "min": -3.0784149169921875, "p10": -0.9996192932128907, "median": 0.4200611114501953, "p90": 2.2329803466796876, "max": 6.0991058349609375, "pos_frac": 0.609375, "sample": [-0.4980754852294922, 1.2925872802734375, -1.81341552734375, 0.7625656127929688, -2.3311309814453125, 1.0292167663574219, -0.9156951904296875, -1.0542526245117188, 0.4632682800292969, 0.37685394287109375, 3.7346115112304688, 2.532186508178711, 6.0991058349609375, -0.9017410278320312, -0.5399932861328125, 0.12349319458007812, 0.742584228515625, 1.5276107788085938, -0.2748374938964844, -0.9918594360351562, 2.2370452880859375, 3.404296875, -0.9436416625976562, -0.4299774169921875, -0.0851593017578125, 0.13430404663085938, 1.7795524597167969, 0.00778961181640625, -0.8716964721679688, 0.2539329528808594, 0.552459716796875, 0.6837997436523438, -0.11727523803710938, 2.5650405883789062, -3.0784149169921875, 1.3852615356445312, 0.6064338684082031, 0.7600555419921875, -1.0029449462890625, -0.11711883544921875, 1.9610328674316406, 0.24387359619140625, 1.2645721435546875, 0.8618087768554688, -0.5177650451660156, -0.9709091186523438, 0.7570648193359375, 1.0738391876220703, 3.0052947998046875, 1.0310859680175781, 1.6691474914550781, 0.6360893249511719, 1.904815673828125, 0.8145980834960938, -1.4286041259765625, -0.4054527282714844, 2.2234954833984375, -1.9281768798828125, -0.48259735107421875, 1.2399673461914062, 0.7302169799804688, -0.4366912841796875, -0.8028793334960938, 0.24729537963867188], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000093.npy"}
|
||||
{"epoch": 0.14058956916099774, "step": 94, "batch_size": 64, "mean": 0.8975897431373596, "std": 1.6500110626220703, "min": -3.00799560546875, "p10": -0.7492403030395505, "median": 0.80865478515625, "p90": 2.9152797698974626, "max": 6.284271240234375, "pos_frac": 0.71875, "sample": [3.24945068359375, 3.094806671142578, -0.2454071044921875, 1.1492900848388672, 0.33991241455078125, -2.3530731201171875, 0.998565673828125, -0.03133392333984375, -0.213653564453125, 3.8645248413085938, -0.04524993896484375, 2.0659027099609375, 0.846466064453125, -2.3557891845703125, 1.0041866302490234, 1.640899658203125, 0.9611587524414062, -0.2568778991699219, 0.2929534912109375, 0.059967041015625, 2.231525421142578, 0.48653411865234375, 0.9459266662597656, 0.6005363464355469, 2.1997146606445312, 0.0376434326171875, 0.7256507873535156, 0.9950408935546875, -0.8774490356445312, 0.834808349609375, 1.0399818420410156, 0.5264434814453125, 0.12984466552734375, -2.3259124755859375, 0.782501220703125, 1.1858406066894531, 1.85943603515625, 3.2069549560546875, 0.7023086547851562, 2.274383544921875, 2.3647537231445312, 1.3310928344726562, -0.4424591064453125, 6.284271240234375, 2.4963836669921875, 0.9216079711914062, 1.6536788940429688, 4.5714569091796875, 0.64837646484375, 2.280294418334961, 4.5975341796875, -0.8665313720703125, -0.38161468505859375, -0.47556114196777344, 1.10064697265625, -0.08351516723632812, -3.00799560546875, 0.7599258422851562, 2.405029296875, 0.9126548767089844, -0.10605049133300781, -0.20859527587890625, -1.2534542083740234, 0.315399169921875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000094.npy"}
|
||||
{"epoch": 0.1421012849584278, "step": 95, "batch_size": 64, "mean": 0.6297565698623657, "std": 1.4632779359817505, "min": -2.24468994140625, "p10": -1.1491058349609375, "median": 0.4066505432128906, "p90": 2.5162578582763686, "max": 4.4146575927734375, "pos_frac": 0.65625, "sample": [2.031829833984375, -1.4808731079101562, 0.08244705200195312, -0.7965621948242188, -2.0201492309570312, -0.38974761962890625, 4.4146575927734375, 1.19561767578125, 2.6650390625, 2.159912109375, -0.4454002380371094, 0.8675079345703125, 0.29254150390625, -0.1762237548828125, -1.0757827758789062, 1.6282272338867188, -1.150726318359375, -0.7162399291992188, -2.087535858154297, 0.43196868896484375, 1.9112701416015625, 3.076416015625, 0.18172836303710938, 1.1706390380859375, -2.24468994140625, -0.36837005615234375, 1.66943359375, 2.0451812744140625, -1.2032470703125, -2.1384124755859375, 0.3813323974609375, 1.8066635131835938, 1.6026649475097656, 0.672454833984375, -0.25192832946777344, 0.0363616943359375, 1.6800804138183594, 2.0394821166992188, -0.714813232421875, 0.2470245361328125, -0.29010009765625, 0.32079315185546875, 2.888885498046875, 0.8056182861328125, -0.9551544189453125, 2.9236679077148438, 1.6475677490234375, 0.3710289001464844, 0.9817657470703125, 2.8158111572265625, 0.22620391845703125, 3.203643798828125, -1.14532470703125, 1.049163818359375, 1.2263870239257812, 1.739501953125, 0.027864456176757812, 1.6338653564453125, -0.016284942626953125, 1.3857269287109375, -0.5303497314453125, 2.1691017150878906, 1.623748779296875, -0.8284912109375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000095.npy"}
|
||||
{"epoch": 0.1436130007558579, "step": 96, "batch_size": 64, "mean": 0.7316542863845825, "std": 1.6130950450897217, "min": -2.6875, "p10": -0.9073781967163086, "median": 0.5758657455444336, "p90": 3.313922119140625, "max": 4.642974853515625, "pos_frac": 0.625, "sample": [0.7924156188964844, -0.788360595703125, 1.6461029052734375, -1.8863067626953125, 1.337127685546875, -0.6178131103515625, 0.8668975830078125, -0.29297447204589844, 0.8598480224609375, 0.363800048828125, 3.8951187133789062, 0.1460742950439453, -0.09049224853515625, 3.3592681884765625, 2.32135009765625, 2.3629302978515625, -2.0658493041992188, -0.24526596069335938, -0.8910694122314453, -0.7228126525878906, 0.5514945983886719, -0.062896728515625, 1.63397216796875, -2.6875, -0.4730682373046875, 0.6002368927001953, 1.3690109252929688, 0.741851806640625, 2.2151336669921875, -0.370269775390625, 3.3152999877929688, 2.1322784423828125, 3.3787841796875, 3.3107070922851562, -0.23600006103515625, 2.585113525390625, -2.1716384887695312, 0.20014381408691406, -0.49602508544921875, 1.2009925842285156, 1.8609619140625, 0.6639595031738281, 0.6506233215332031, -0.37200164794921875, 4.312225341796875, -0.3819122314453125, 2.307750701904297, -1.03411865234375, 0.3855133056640625, -0.7386665344238281, 0.5393791198730469, -0.91436767578125, 0.6342124938964844, 4.642974853515625, 3.638763427734375, 1.48809814453125, -0.6010551452636719, 0.8467216491699219, 0.4314136505126953, 0.26763916015625, 2.0906600952148438, 0.6496810913085938, -0.2457122802734375, -1.3844795227050781], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000096.npy"}
|
||||
{"epoch": 0.14512471655328799, "step": 97, "batch_size": 64, "mean": 0.2733200490474701, "std": 1.5987409353256226, "min": -3.519073486328125, "p10": -1.509088897705078, "median": 0.2641716003417969, "p90": 2.276596832275391, "max": 3.6649932861328125, "pos_frac": 0.59375, "sample": [-1.2753753662109375, -0.2837677001953125, -0.7389888763427734, 2.7331180572509766, 0.5398483276367188, 3.6649932861328125, -1.0072898864746094, -0.82440185546875, -1.2770919799804688, -3.519073486328125, -2.1263885498046875, 2.3013687133789062, 0.15950775146484375, 1.4783821105957031, -1.4731521606445312, -3.491901397705078, 0.7376289367675781, 0.16449356079101562, 1.1707000732421875, 0.6104011535644531, 0.018718719482421875, 1.2672958374023438, 0.12401962280273438, -1.3655509948730469, 1.0778427124023438, 1.3261795043945312, -2.6415176391601562, 0.34410858154296875, 0.009769439697265625, 0.7908401489257812, -0.098724365234375, -1.896759033203125, 0.7645263671875, 1.1000938415527344, -0.474884033203125, 1.2367095947265625, 2.7044677734375, 2.1322059631347656, -0.25693511962890625, -0.2714691162109375, 2.0153427124023438, -0.875701904296875, 3.201690673828125, 1.103302001953125, 0.7533721923828125, -0.9271049499511719, 2.2187957763671875, 2.4804916381835938, 1.28466796875, -1.5244903564453125, -0.6499710083007812, 0.184234619140625, -3.344085693359375, -1.3059768676757812, 2.6701202392578125, 1.9049835205078125, 1.3679771423339844, -0.0379791259765625, 1.7805118560791016, 1.4450836181640625, -1.2763748168945312, 1.0617294311523438, 0.657958984375, -0.13004302978515625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000097.npy"}
|
||||
{"epoch": 0.14663643235071808, "step": 98, "batch_size": 64, "mean": 0.3865186870098114, "std": 1.3341240882873535, "min": -4.14935302734375, "p10": -1.0512012481689452, "median": 0.5403332710266113, "p90": 1.5532188415527344, "max": 3.8991546630859375, "pos_frac": 0.6875, "sample": [-0.8891983032226562, 0.3106355667114258, 1.1479568481445312, 2.110637664794922, 1.252716064453125, 0.77215576171875, 1.1297149658203125, 1.2549629211425781, -0.08303451538085938, 3.8991546630859375, -2.142730712890625, -1.0824241638183594, 0.8458404541015625, -0.10227584838867188, -2.94427490234375, 1.66107177734375, 1.1509628295898438, 1.5640335083007812, 0.07404708862304688, -0.5151252746582031, 1.1981239318847656, 2.6733932495117188, 0.4346160888671875, 1.4705810546875, 0.5318078994750977, -0.7681884765625, 3.0948448181152344, 1.0365066528320312, 0.9336605072021484, 0.206298828125, 1.527984619140625, 0.7046661376953125, -0.5393142700195312, 0.7066650390625, -0.7581939697265625, 0.8324508666992188, 1.185699462890625, 0.5015029907226562, 0.8643684387207031, -0.4652252197265625, 0.548858642578125, 0.027740478515625, -1.9699020385742188, 0.36376953125, 1.1424713134765625, -1.37158203125, 0.3605518341064453, 0.8635025024414062, -0.685760498046875, 0.9958343505859375, 0.0751495361328125, 1.1719131469726562, -0.9783477783203125, -0.9011802673339844, -4.14935302734375, 1.3591842651367188, 1.1155662536621094, 0.3566474914550781, -0.17455291748046875, 0.5114822387695312, 2.5500640869140625, 0.7427101135253906, -0.30747222900390625, -1.697174072265625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000098.npy"}
|
||||
{"epoch": 0.14814814814814814, "step": 99, "batch_size": 64, "mean": 0.7572157979011536, "std": 2.060706615447998, "min": -3.6048355102539062, "p10": -1.5198112487792967, "median": 0.4756622314453125, "p90": 3.1021453857421877, "max": 6.6320953369140625, "pos_frac": 0.703125, "sample": [5.142513275146484, 3.110137939453125, 0.05291748046875, 5.5637054443359375, 0.37421417236328125, 0.1534576416015625, -0.0452880859375, -2.443788528442383, 0.76043701171875, 1.8354949951171875, 1.749755859375, 1.5410308837890625, 1.1137466430664062, 1.9301338195800781, -0.05214118957519531, 2.3752365112304688, -0.6146469116210938, 0.7480201721191406, 0.2569084167480469, 0.8947029113769531, 3.77197265625, -0.3540382385253906, -0.5555381774902344, -1.1924514770507812, -1.875030517578125, -2.235504150390625, 6.087371826171875, -1.1237335205078125, 0.6156387329101562, 5.558265686035156, 0.8075447082519531, 1.808013916015625, -1.1748809814453125, 1.517364501953125, 0.24158859252929688, 1.9731521606445312, 0.22628402709960938, 0.46800994873046875, -1.2353782653808594, 1.731466293334961, 0.15845680236816406, -3.0141448974609375, 3.08349609375, 6.6320953369140625, -0.4204673767089844, 1.70977783203125, 2.2207374572753906, 1.0858097076416016, -2.7968902587890625, 1.046640396118164, 0.8396568298339844, 1.3261604309082031, 0.4589729309082031, 0.8125114440917969, 0.8373870849609375, -0.32068634033203125, 0.2996673583984375, 0.3997344970703125, 0.4284515380859375, -3.6048355102539062, 0.2238311767578125, -1.3359031677246094, 0.48331451416015625, -1.5986289978027344], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000099.npy"}
|
||||
{"epoch": 0.14965986394557823, "step": 100, "batch_size": 64, "mean": 0.7813898324966431, "std": 2.338491916656494, "min": -4.307403564453125, "p10": -1.7277168273925778, "median": 0.7021331787109375, "p90": 3.5997848510742205, "max": 9.538238525390625, "pos_frac": 0.671875, "sample": [1.748291015625, -0.191070556640625, -2.13092041015625, 1.4119644165039062, 3.15838623046875, -0.7462921142578125, 0.41851806640625, 0.9616127014160156, 4.143096923828125, 0.1981964111328125, 5.4327392578125, 1.2401809692382812, 0.2901611328125, 4.149444580078125, 0.32857513427734375, 4.74639892578125, -1.124664306640625, 0.1811065673828125, 0.7335853576660156, 0.7161865234375, 0.8881702423095703, -1.0108566284179688, -1.3263359069824219, 0.19176483154296875, -1.3371658325195312, 2.7494354248046875, 1.4313278198242188, 2.0767745971679688, 1.0021514892578125, -4.307403564453125, -0.027194976806640625, -0.1814422607421875, -1.8950958251953125, -0.46704864501953125, 0.37178802490234375, -1.3305702209472656, 2.4235992431640625, 1.0037727355957031, -3.9599609375, 1.9878196716308594, 0.395050048828125, 6.75244140625, -2.225677490234375, 0.057521820068359375, -1.1093063354492188, 1.76165771484375, 0.7939929962158203, 0.17871475219726562, -0.2769947052001953, 1.3860549926757812, 1.949676513671875, 2.2135009765625, 1.0094070434570312, 1.7972412109375, 1.7603302001953125, 0.7436676025390625, -2.8953018188476562, 3.7889556884765625, 1.7811775207519531, 0.688079833984375, 9.538238525390625, -0.145751953125, -3.797882080078125, -0.084869384765625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000100.npy"}
|
||||
{"epoch": 0.15117157974300832, "step": 101, "batch_size": 64, "mean": 0.966697096824646, "std": 1.780356764793396, "min": -4.530792236328125, "p10": -1.1007194519042969, "median": 0.9832983016967773, "p90": 3.4214843750000004, "max": 4.712715148925781, "pos_frac": 0.71875, "sample": [0.41381072998046875, -0.152130126953125, 1.3780975341796875, -0.11727142333984375, 3.0525054931640625, 2.01519775390625, 0.4709968566894531, 0.3010272979736328, -0.4284820556640625, -0.10138702392578125, 3.3762283325195312, 1.282928466796875, -4.530792236328125, 0.32317352294921875, 3.2202301025390625, -0.37491607666015625, 1.3329849243164062, 1.0028629302978516, 0.9637336730957031, -1.4642143249511719, 2.0877532958984375, 2.226348876953125, 4.216499328613281, 2.4524192810058594, 0.3090667724609375, -0.8442153930664062, 0.6473045349121094, 1.2915802001953125, 0.6982421875, 2.3494033813476562, -0.5818061828613281, 3.7973480224609375, 1.4430351257324219, 0.17306900024414062, -1.1269149780273438, -0.9558181762695312, 0.6210556030273438, -2.033233642578125, -1.2290019989013672, 1.3431472778320312, -0.3414154052734375, 1.299285888671875, 0.8410377502441406, 0.8690681457519531, 0.773284912109375, 1.7069587707519531, 1.399749755859375, -3.77490234375, -0.405487060546875, 0.8812961578369141, 3.9799957275390625, 1.5518455505371094, 3.4408798217773438, 1.76885986328125, 1.9490737915039062, -1.0395965576171875, 3.84356689453125, 2.3970985412597656, -1.1829967498779297, 4.712715148925781, 1.23919677734375, 2.39434814453125, 3.621185302734375, 1.0936965942382812], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000101.npy"}
|
||||
{"epoch": 0.15268329554043839, "step": 102, "batch_size": 64, "mean": 0.6540460586547852, "std": 1.948905110359192, "min": -3.8852996826171875, "p10": -1.2873451232910156, "median": 0.5863742828369141, "p90": 3.318462371826172, "max": 7.030120849609375, "pos_frac": 0.625, "sample": [2.1714859008789062, -1.08831787109375, -0.7322654724121094, 0.4573249816894531, -3.1237564086914062, -0.6908531188964844, -1.2777481079101562, 0.8998374938964844, -2.0420684814453125, 0.6178016662597656, 0.13512420654296875, 1.6870803833007812, -1.1820068359375, 0.8010025024414062, 1.8105621337890625, -1.1047744750976562, -0.8337631225585938, 3.5741119384765625, 0.024444580078125, 4.5060882568359375, -1.2355728149414062, 0.870391845703125, 0.9522018432617188, 0.6051788330078125, 0.7350025177001953, 0.7021026611328125, 1.004547119140625, 2.639556884765625, -3.8852996826171875, -0.14400100708007812, 4.1036224365234375, 2.0017127990722656, -1.8708686828613281, 0.9104156494140625, -1.2914581298828125, 3.6865615844726562, 2.1358642578125, 4.478267669677734, 2.55078125, -1.2116050720214844, 1.7349853515625, 3.323944091796875, -0.7513504028320312, 0.30631256103515625, -0.6389045715332031, 0.99444580078125, -0.26728057861328125, 2.0699844360351562, 0.4718475341796875, -0.22000885009765625, 1.1548347473144531, 2.5460052490234375, 0.5675697326660156, -2.2122802734375, -0.29935455322265625, -1.7868766784667969, -0.03571319580078125, 7.030120849609375, -0.9939689636230469, 0.1747283935546875, 1.426971435546875, 3.3056716918945312, 1.4929046630859375, 0.11764907836914062], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000102.npy"}
|
||||
{"epoch": 0.15419501133786848, "step": 103, "batch_size": 64, "mean": 1.2188481092453003, "std": 2.1123046875, "min": -2.9563560485839844, "p10": -0.8906854629516601, "median": 1.0827569961547852, "p90": 3.564203262329102, "max": 6.906494140625, "pos_frac": 0.75, "sample": [-2.3231124877929688, 2.331817626953125, 2.9345550537109375, -2.7569580078125, 0.7227287292480469, 1.8476181030273438, 0.26679229736328125, 0.6297645568847656, 1.5054969787597656, 1.1513748168945312, -0.32080078125, -0.6596832275390625, -0.1605682373046875, 0.8583717346191406, 2.5833282470703125, 3.593608856201172, 1.0914878845214844, 1.1623764038085938, 5.42901611328125, 3.3036270141601562, 0.013702392578125, 3.4147567749023438, 0.4643287658691406, 1.5607452392578125, 1.7264480590820312, -0.8249282836914062, 0.925018310546875, 5.5793609619140625, -0.6036262512207031, 2.1531143188476562, 1.5883026123046875, 6.906494140625, 1.9022216796875, 0.3726348876953125, -0.017169952392578125, 0.32750701904296875, -1.0030364990234375, 4.59051513671875, 1.196533203125, 0.16437911987304688, 3.3937759399414062, 0.7135658264160156, 1.277435302734375, -0.34246826171875, 1.074026107788086, -2.8125, 3.3299522399902344, -0.5963516235351562, 3.9926624298095703, 0.06799697875976562, 2.4798583984375, 2.1869659423828125, -2.9563560485839844, -0.6390037536621094, 0.21124267578125, 6.875007629394531, -2.593353271484375, 0.0510711669921875, -0.9188671112060547, 0.3788909912109375, 1.3294525146484375, 1.8182029724121094, 3.4955902099609375, 2.56134033203125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000103.npy"}
|
||||
{"epoch": 0.15570672713529857, "step": 104, "batch_size": 64, "mean": 1.1677416563034058, "std": 2.400346040725708, "min": -5.041175842285156, "p10": -2.0107473373413085, "median": 0.9165363311767578, "p90": 3.7102561950683595, "max": 7.94415283203125, "pos_frac": 0.671875, "sample": [2.90789794921875, 2.2960357666015625, 1.1282196044921875, 1.0758743286132812, -2.1303176879882812, 1.4608688354492188, -0.4405021667480469, 0.6879310607910156, 0.98602294921875, -0.862152099609375, -0.36263275146484375, -5.041175842285156, 7.94415283203125, 3.3712310791015625, -0.08571624755859375, -0.019222259521484375, 5.75396728515625, -0.11566162109375, 2.219390869140625, 3.1052169799804688, -2.7164764404296875, -2.240203857421875, 3.738616943359375, 0.7033843994140625, -0.219207763671875, -0.08676910400390625, 0.08622360229492188, -2.858856201171875, 2.9946823120117188, -0.5752944946289062, 0.6248092651367188, 0.8470497131347656, -1.0761966705322266, 1.1926040649414062, 2.2538719177246094, 2.0464324951171875, 0.5002517700195312, -1.0999412536621094, 2.9148330688476562, 1.6812248229980469, 0.2252349853515625, -1.818258285522461, 5.717010498046875, 0.7009010314941406, 3.6069297790527344, 6.7696533203125, 2.670726776123047, 0.5677337646484375, -0.70703125, -2.0941009521484375, 1.8753318786621094, 1.4711322784423828, 2.8499755859375, 3.6440811157226562, 0.09369659423828125, 2.55010986328125, -0.7427520751953125, 1.33648681640625, 0.8449954986572266, 1.489288330078125, -2.093242645263672, 2.8729705810546875, 5.677772521972656, 4.6363525390625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000104.npy"}
|
||||
{"epoch": 0.15721844293272866, "step": 105, "batch_size": 64, "mean": 0.9023858904838562, "std": 2.6764845848083496, "min": -5.4870147705078125, "p10": -2.789051818847656, "median": 1.08514404296875, "p90": 5.2674606323242195, "max": 6.9237060546875, "pos_frac": 0.640625, "sample": [-0.07876396179199219, 2.485260009765625, 1.863861083984375, 1.8616924285888672, 0.4471435546875, 2.5740108489990234, -5.4870147705078125, 0.678619384765625, -2.8714599609375, 1.3771934509277344, 3.8282241821289062, 1.213735580444336, -0.5390434265136719, 2.5805511474609375, -0.04395294189453125, 6.9237060546875, -0.3016090393066406, -1.6360549926757812, 0.14423370361328125, 5.7491455078125, 1.67193603515625, 1.2338790893554688, 0.5192489624023438, -3.5036392211914062, 1.2108917236328125, 1.115966796875, 5.3377685546875, -0.3512420654296875, 1.1069793701171875, -2.638885498046875, -2.8534088134765625, 1.3541946411132812, -0.0288543701171875, 2.4349517822265625, -3.989105224609375, 6.727508544921875, 0.10428237915039062, -0.366851806640625, 0.8270187377929688, 0.0987548828125, 1.2257080078125, -0.28667449951171875, -1.048828125, 1.2718391418457031, -4.456390380859375, 1.6004905700683594, 5.4886932373046875, -0.6409645080566406, 0.7919921875, -1.543243408203125, -3.619781494140625, 1.5777511596679688, -1.11651611328125, 5.3211517333984375, 6.03411865234375, 1.2802581787109375, 2.6307716369628906, 1.4083633422851562, -0.2315845489501953, 2.7062549591064453, -1.8655242919921875, 5.142181396484375, 1.0633087158203125, 4.2384490966796875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000105.npy"}
|
||||
{"epoch": 0.15873015873015872, "step": 106, "batch_size": 64, "mean": 0.8738718032836914, "std": 2.4340054988861084, "min": -7.3708038330078125, "p10": -2.2881362915039056, "median": 0.6355628967285156, "p90": 3.710467338562012, "max": 7.1677398681640625, "pos_frac": 0.6875, "sample": [-0.932891845703125, 1.4810638427734375, 0.5270824432373047, 3.2454681396484375, -1.5679702758789062, -1.0255126953125, 4.547809600830078, 0.7711734771728516, 0.38707733154296875, 6.173614501953125, 0.10678863525390625, 4.282928466796875, 5.239326477050781, 2.162384033203125, 0.730926513671875, -2.5967788696289062, -0.1934356689453125, 3.1453857421875, -0.2906646728515625, 0.98663330078125, 1.0339508056640625, 0.3625640869140625, -4.369720458984375, 0.39642333984375, -0.2449054718017578, 0.371795654296875, -0.7767105102539062, -2.8852386474609375, -0.1020660400390625, 3.7709808349609375, 0.3576240539550781, -0.80462646484375, 0.8544082641601562, 2.471099853515625, 2.9368743896484375, 1.1511077880859375, 0.25856971740722656, -0.40725135803222656, 1.3848648071289062, 7.1677398681640625, 2.36871337890625, 1.4253349304199219, 1.9854011535644531, -0.3846302032470703, -3.2222366333007812, -0.33911895751953125, 2.2843246459960938, 2.0710067749023438, -7.3708038330078125, 0.5401992797851562, -2.706817626953125, 0.5332527160644531, 3.3745880126953125, 3.5692691802978516, 2.7894668579101562, 3.0199127197265625, 0.8705062866210938, 0.3534698486328125, -0.076629638671875, 3.8768157958984375, -2.929168701171875, 0.26833343505859375, 2.68255615234375, 0.8361587524414062], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000106.npy"}
|
||||
{"epoch": 0.1602418745275888, "step": 107, "batch_size": 64, "mean": 0.5380067825317383, "std": 2.234407901763916, "min": -6.752532958984375, "p10": -2.1264678955078122, "median": 0.6338253021240234, "p90": 3.437558746337891, "max": 5.343395233154297, "pos_frac": 0.671875, "sample": [0.8296546936035156, -5.15386962890625, 1.210906982421875, 0.9806556701660156, -0.5985107421875, 1.8872451782226562, -4.807281494140625, 2.2413597106933594, 0.134796142578125, 1.639892578125, -0.2782745361328125, 5.343395233154297, 0.9565219879150391, 1.44439697265625, 0.6209220886230469, -1.0286235809326172, -3.23431396484375, 0.34661865234375, 2.959564208984375, 1.2262420654296875, 0.05367279052734375, 3.702728271484375, 3.7416763305664062, -1.5873165130615234, 0.646728515625, -0.76544189453125, 1.9813156127929688, -0.508026123046875, -0.7495574951171875, 0.620574951171875, 0.7004623413085938, -2.1850662231445312, 0.5008697509765625, 2.8380126953125, 0.452056884765625, 3.27557373046875, 3.6668338775634766, -0.771026611328125, 0.9779281616210938, 3.5396499633789062, 0.748138427734375, 2.8873672485351562, -0.9389724731445312, -0.45952606201171875, 1.5237579345703125, 2.5244369506835938, 0.8226127624511719, 1.9452667236328125, 0.3834667205810547, -2.5697555541992188, 4.3753509521484375, 3.5069808959960938, -0.9006576538085938, -1.495758056640625, -0.09528350830078125, 0.62042236328125, 2.0496826171875, 2.3404541015625, 0.10253524780273438, 0.6184768676757812, -1.9897384643554688, -6.752532958984375, 0.824249267578125, -2.491485595703125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000107.npy"}
|
||||
{"epoch": 0.1617535903250189, "step": 108, "batch_size": 64, "mean": 0.9229696989059448, "std": 2.3213441371917725, "min": -4.7740325927734375, "p10": -1.9208165168762206, "median": 1.0241813659667969, "p90": 4.061999511718752, "max": 6.7431488037109375, "pos_frac": 0.671875, "sample": [-0.9885711669921875, -0.7697677612304688, 1.9386749267578125, 2.3472747802734375, 0.23406982421875, 2.5428543090820312, 2.205535888671875, 2.174610137939453, 6.7431488037109375, -0.8547210693359375, 2.1345367431640625, 2.238117218017578, 1.4948577880859375, -1.8840246200561523, 1.329132080078125, 1.6856498718261719, -1.1729507446289062, 0.08445549011230469, 0.072113037109375, 0.19292831420898438, 4.551605224609375, -0.7021408081054688, -1.035308837890625, 0.2314453125, 1.0550460815429688, -3.1207427978515625, 2.0163650512695312, -4.7740325927734375, 1.92578125, -0.06967926025390625, -2.7445220947265625, -0.22943115234375, 4.2635345458984375, 0.22605133056640625, 2.7195892333984375, 5.5169830322265625, 1.2366828918457031, -0.16492462158203125, -3.6117286682128906, 3.2364349365234375, -1.93658447265625, 3.5917510986328125, 2.016613006591797, 1.0467987060546875, -1.1864013671875, 2.272380828857422, -0.4268150329589844, 1.3888778686523438, 0.34894561767578125, -2.74066162109375, -3.419384002685547, 1.8039321899414062, -0.0428466796875, 1.9376754760742188, 4.309608459472656, -0.18387222290039062, 1.4474258422851562, 6.6322021484375, 0.8997001647949219, 1.0015640258789062, 0.5249443054199219, 2.5919227600097656, 0.22707366943359375, 4.690277099609375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000108.npy"}
|
||||
{"epoch": 0.16326530612244897, "step": 109, "batch_size": 64, "mean": 1.389096975326538, "std": 2.7740590572357178, "min": -3.6218338012695312, "p10": -1.791645812988281, "median": 1.1500816345214844, "p90": 4.78812370300293, "max": 9.340713500976562, "pos_frac": 0.625, "sample": [-0.6550979614257812, 5.702323913574219, 6.42803955078125, -0.373931884765625, 3.483602523803711, 2.3277740478515625, 3.774547576904297, -1.5706634521484375, 0.40594482421875, 2.3217391967773438, 0.07906913757324219, 2.26495361328125, 4.258026123046875, 1.900421142578125, -1.3538970947265625, 1.5335693359375, -0.18539810180664062, 1.1742897033691406, -0.863037109375, 0.9616012573242188, 0.7960548400878906, 1.8938941955566406, -3.3751068115234375, -1.4179983139038086, -0.2751617431640625, 9.340713500976562, -0.3527641296386719, -0.5520133972167969, 1.5466861724853516, 6.946868896484375, 1.1258735656738281, -1.8863525390625, 1.0608901977539062, -0.4974212646484375, -1.0320091247558594, 1.6475448608398438, -1.918680191040039, 4.66925048828125, 0.34128570556640625, 1.8863449096679688, -0.23569488525390625, 2.384450912475586, 1.1788291931152344, 6.770660400390625, -1.3531875610351562, -2.074626922607422, -0.9508514404296875, 1.114999771118164, -3.6218338012695312, -2.883819580078125, 1.3454055786132812, 2.1353302001953125, 1.786529541015625, 4.162139892578125, 1.9648208618164062, 8.560699462890625, 4.839069366455078, 4.341278076171875, 2.802490234375, 4.340278625488281, -0.3531494140625, -0.02808380126953125, -2.27447509765625, 3.3891754150390625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000109.npy"}
{"epoch": 0.16477702191987906, "step": 110, "batch_size": 64, "mean": 0.20230792462825775, "std": 2.4725048542022705, "min": -7.27728271484375, "p10": -2.868067932128906, "median": 0.5199766159057617, "p90": 3.036015319824219, "max": 5.1476593017578125, "pos_frac": 0.5625, "sample": [1.27099609375, 0.03884124755859375, -2.05535888671875, -1.1697311401367188, -3.0871124267578125, 4.662700653076172, 1.8612594604492188, 3.6071243286132812, -4.9962158203125, 0.6322555541992188, 2.5579872131347656, 0.9822540283203125, 1.4199905395507812, 3.1650009155273438, 1.1309356689453125, -0.552337646484375, 1.6207351684570312, 4.93865966796875, -1.4359130859375, -3.3551101684570312, 2.1453475952148438, -2.4100494384765625, 2.097381591796875, 1.8294219970703125, 0.31318092346191406, -0.00655364990234375, -0.10011100769042969, 3.041412353515625, -0.589752197265625, -0.394805908203125, -1.5830802917480469, 2.1567420959472656, 0.9179630279541016, 0.14866256713867188, -2.157806396484375, 1.762054443359375, -1.6439971923828125, 3.683197021484375, -1.6969223022460938, 0.4076976776123047, 5.1476593017578125, -0.26300048828125, -1.3675613403320312, 1.7699966430664062, -3.093830108642578, 1.3726425170898438, 2.449329376220703, -2.5207061767578125, -7.27728271484375, -5.4929046630859375, -0.24185943603515625, 1.6682891845703125, -1.69744873046875, 0.804168701171875, 0.7020034790039062, 2.9185791015625, 3.0234222412109375, 1.023468017578125, 0.7777185440063477, -0.68310546875, -2.255990982055664, -2.17218017578125, 2.2162933349609375, -3.016937255859375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000110.npy"}
{"epoch": 0.16628873771730915, "step": 111, "batch_size": 64, "mean": 1.024204969406128, "std": 2.5112526416778564, "min": -4.873832702636719, "p10": -1.7241825103759765, "median": 0.6802024841308594, "p90": 4.403728103637695, "max": 7.56396484375, "pos_frac": 0.578125, "sample": [-1.7302970886230469, -1.0145187377929688, 2.6421279907226562, 2.893768310546875, 4.398262023925781, -1.2702102661132812, 0.3359565734863281, 2.5824813842773438, 2.1400604248046875, -0.1613311767578125, 3.400829315185547, -0.5987014770507812, -1.59857177734375, 3.591522216796875, -4.873832702636719, 0.5331649780273438, -0.9865989685058594, -0.19281768798828125, -3.0156326293945312, -0.3770866394042969, 5.021148681640625, 2.44708251953125, 4.406070709228516, 5.2174835205078125, 4.35272216796875, 2.259561538696289, 0.827239990234375, 2.897125244140625, 1.2662067413330078, -1.7099151611328125, -0.07736778259277344, -0.8396797180175781, -1.0452880859375, -3.483959197998047, 1.31671142578125, -0.275543212890625, -0.2803955078125, 2.1657867431640625, -0.6557426452636719, 2.840465545654297, 2.5954971313476562, 3.894989013671875, 5.144218444824219, -0.22936630249023438, 4.957355499267578, -0.9913978576660156, 1.6542510986328125, 4.606193542480469, -1.82293701171875, -0.1405181884765625, 3.2524490356445312, -0.0726165771484375, 2.3302574157714844, -1.1040420532226562, -3.7293853759765625, 1.5180206298828125, -3.048492431640625, 2.798431396484375, 0.0712127685546875, 0.3555946350097656, 7.56396484375, 0.017696380615234375, 1.6933326721191406, 0.8861236572265625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000111.npy"}
{"epoch": 0.16780045351473924, "step": 112, "batch_size": 64, "mean": 1.2949196100234985, "std": 2.608910083770752, "min": -5.856897354125977, "p10": -1.9404251098632812, "median": 1.1343574523925781, "p90": 5.415811157226564, "max": 6.818756103515625, "pos_frac": 0.734375, "sample": [3.70806884765625, 0.7562103271484375, 4.1749267578125, 1.1494064331054688, 0.6603870391845703, 0.13498687744140625, 4.529441833496094, 2.0955276489257812, 1.7516937255859375, 1.0649681091308594, 5.638771057128906, 2.2694244384765625, -3.607379913330078, 0.7131519317626953, 0.47311973571777344, -2.906158447265625, 5.759315490722656, 0.7713775634765625, -0.4677162170410156, 0.26071929931640625, 2.7349319458007812, -0.712799072265625, 2.2456207275390625, 5.663909912109375, 0.5820236206054688, 5.645896911621094, 1.49932861328125, -1.9142990112304688, 5.140899658203125, 1.3563098907470703, 6.818756103515625, 1.3242416381835938, -0.9605388641357422, 2.381145477294922, 2.2726516723632812, 5.53363037109375, 0.5945339202880859, -0.5202255249023438, 2.6937332153320312, 2.31494140625, -1.9516220092773438, 1.26104736328125, 3.4533767700195312, 1.2666149139404297, 1.8009414672851562, 5.7651214599609375, 4.5418853759765625, 3.9994659423828125, -0.034027099609375, -0.6856842041015625, -3.188262939453125, 0.5924911499023438, -1.4697265625, 2.4791259765625, 0.4143524169921875, -0.29503631591796875, 1.1193084716796875, -5.856897354125977, -1.961517333984375, -0.338958740234375, 1.1141357421875, -4.18963623046875, 0.06251907348632812, 1.3509063720703125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000112.npy"}
{"epoch": 0.1693121693121693, "step": 113, "batch_size": 64, "mean": 0.8377047777175903, "std": 2.510047674179077, "min": -7.5020599365234375, "p10": -1.3950336456298829, "median": 0.6464614868164062, "p90": 3.8143783569335956, "max": 8.369140625, "pos_frac": 0.609375, "sample": [-0.6734237670898438, -4.088985443115234, -1.5137939453125, 1.3805599212646484, -0.1585693359375, 2.5470046997070312, 0.9340476989746094, -1.3992919921875, -1.7406158447265625, 2.6912994384765625, -1.1276168823242188, -0.8801498413085938, -1.3850975036621094, -0.04431915283203125, -1.9298210144042969, -0.5615997314453125, 0.6190395355224609, 2.57794189453125, 1.77996826171875, -5.420976638793945, -1.1317367553710938, -0.6049156188964844, 4.7472076416015625, -0.3184356689453125, 1.4647941589355469, 1.8800201416015625, 2.2889022827148438, 1.3182449340820312, 4.254032135009766, -0.5026016235351562, 1.7118453979492188, 8.369140625, 0.6550922393798828, 0.6291694641113281, 0.5877685546875, 2.0115833282470703, -7.5020599365234375, 0.4403572082519531, 1.19390869140625, 1.8404693603515625, 0.35245513916015625, 1.0686264038085938, -0.64837646484375, 1.523406982421875, 2.4352569580078125, 6.9394378662109375, 0.20207977294921875, 1.6723651885986328, 0.6378307342529297, 3.106212615966797, -0.039398193359375, -1.0142288208007812, -0.16950607299804688, 4.9040374755859375, 3.992645263671875, -0.9578361511230469, 2.4942169189453125, 1.3176288604736328, 3.0543365478515625, 3.3984222412109375, 4.347991943359375, -1.1532745361328125, -0.40271759033203125, 1.6131019592285156], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000113.npy"}
{"epoch": 0.1708238851095994, "step": 114, "batch_size": 64, "mean": 1.317431926727295, "std": 3.0444493293762207, "min": -4.5789794921875, "p10": -2.2569110870361326, "median": 1.0328617095947266, "p90": 5.07215518951416, "max": 10.413543701171875, "pos_frac": 0.65625, "sample": [-1.192047119140625, -1.8406219482421875, 1.9902114868164062, 2.2447509765625, 8.172292709350586, 1.8145370483398438, 4.675926208496094, 4.720428466796875, 1.3651580810546875, -3.65399169921875, 4.960813522338867, -1.07537841796875, 1.9754409790039062, -2.928863525390625, 0.5064468383789062, 0.9766769409179688, 1.4841079711914062, 0.482208251953125, -0.4395256042480469, 0.058147430419921875, 1.0890464782714844, -0.02585601806640625, 10.413543701171875, -1.2528343200683594, 0.9195270538330078, 3.5960769653320312, 4.286243438720703, 2.247457504272461, -1.1671142578125, 3.387845993041992, -2.792102813720703, 0.47037315368652344, 5.119873046875, -4.5789794921875, 2.1220321655273438, -0.09441757202148438, 5.6071319580078125, 0.6546401977539062, 2.7965621948242188, -2.4064407348632812, 8.417266845703125, 2.0042152404785156, 4.89971923828125, 0.17882347106933594, 1.33953857421875, -0.06818008422851562, 6.5535125732421875, 0.0792388916015625, -1.5568504333496094, -1.9080085754394531, 1.305755615234375, -0.2838134765625, 1.2869644165039062, 0.64508056640625, 3.6243362426757812, 4.595367431640625, 1.5543651580810547, -1.078939437866211, -2.5480499267578125, -4.262828826904297, 5.711723327636719, -0.5362091064453125, 1.1527671813964844, -1.4794769287109375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000114.npy"}
{"epoch": 0.17233560090702948, "step": 115, "batch_size": 64, "mean": 1.1944315433502197, "std": 2.611178398132324, "min": -3.4467239379882812, "p10": -1.8578369140625, "median": 0.8413314819335938, "p90": 4.369239807128908, "max": 8.126487731933594, "pos_frac": 0.65625, "sample": [-1.8192138671875, -1.006744384765625, 3.73736572265625, 0.32037353515625, 7.291290283203125, -0.48696136474609375, 2.0295791625976562, 3.8002281188964844, 0.14838409423828125, 7.041595458984375, 0.3710041046142578, -2.3163375854492188, 8.126487731933594, -1.6488876342773438, -2.2670516967773438, -3.41094970703125, 0.43129730224609375, 1.944000244140625, 0.653900146484375, 2.9874420166015625, 1.2987785339355469, 3.276885986328125, 0.6780242919921875, 0.8117828369140625, -2.4198532104492188, 0.870880126953125, 1.6417236328125, 1.6546478271484375, -3.4467239379882812, 5.561862945556641, -1.1154937744140625, 3.9625911712646484, 1.8398170471191406, 4.5279693603515625, -0.02780914306640625, 3.998870849609375, 0.28343963623046875, 1.0676918029785156, -1.3809623718261719, -0.0276031494140625, 1.4917526245117188, -1.4148330688476562, 2.0363540649414062, 2.5599822998046875, 3.5046005249023438, 1.2006988525390625, 3.7322158813476562, -0.00717926025390625, -2.899433135986328, 7.540863037109375, 0.9847545623779297, -0.6720752716064453, 2.219930648803711, -0.3242149353027344, 0.19823455810546875, -0.24045181274414062, -1.8743896484375, -0.5619716644287109, 1.644927978515625, -0.48169708251953125, 1.615976333618164, 0.2630615234375, 1.5945262908935547, 5.348667144775391], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000115.npy"}
{"epoch": 0.17384731670445955, "step": 116, "batch_size": 64, "mean": 1.6823740005493164, "std": 2.5587990283966064, "min": -4.826118469238281, "p10": -1.1562065124511718, "median": 1.3810234069824219, "p90": 5.522842407226563, "max": 7.8072357177734375, "pos_frac": 0.703125, "sample": [6.340415954589844, -1.05328369140625, 0.24160003662109375, 5.252555847167969, 5.638679504394531, 1.70465087890625, 3.9896392822265625, 0.365478515625, 3.211559295654297, 4.286338806152344, 2.3409423828125, -0.7174148559570312, -0.9078903198242188, 4.2041168212890625, 1.7154521942138672, 0.6774387359619141, -0.12757492065429688, 0.186676025390625, 2.679828643798828, 3.7541961669921875, -1.7729606628417969, 0.47670745849609375, 2.3431777954101562, -0.580108642578125, -1.2857017517089844, 5.785697937011719, 1.1548233032226562, 3.2373428344726562, 7.8072357177734375, 1.1890182495117188, 2.2899627685546875, -0.6734848022460938, -2.4787158966064453, 0.3758983612060547, 4.85821533203125, 0.4710235595703125, 3.196197509765625, 2.8337936401367188, -1.176116943359375, -1.1097488403320312, 1.9654312133789062, 2.2478256225585938, 1.6675567626953125, -0.8258590698242188, 6.065673828125, 2.1179580688476562, 6.0359039306640625, -1.545623779296875, -0.144195556640625, 1.1135406494140625, -1.2556495666503906, -0.02622222900390625, 1.0544929504394531, -4.826118469238281, -0.29560089111328125, 2.3947219848632812, 1.1870803833007812, 1.573028564453125, 0.739410400390625, 7.09942626953125, 3.9428939819335938, 2.341564178466797, 5.045707702636719, -0.7266693115234375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000116.npy"}
{"epoch": 0.17535903250188964, "step": 117, "batch_size": 64, "mean": 1.5916296243667603, "std": 2.7763307094573975, "min": -4.518394470214844, "p10": -1.9176734924316405, "median": 1.6185283660888672, "p90": 5.535617828369141, "max": 6.8250885009765625, "pos_frac": 0.71875, "sample": [-1.7209930419921875, 5.883935928344727, 2.7175674438476562, 0.8043975830078125, 5.5504608154296875, 0.3123512268066406, 3.3290634155273438, 0.8055896759033203, 3.3536148071289062, 2.5686912536621094, 0.96124267578125, -0.641204833984375, 5.459442138671875, -4.518394470214844, 1.8144607543945312, 2.1859817504882812, -2.89190673828125, -0.8383712768554688, 3.8292312622070312, 3.3609771728515625, -0.08737945556640625, 3.967803955078125, 1.2059249877929688, 0.03168487548828125, 3.722593307495117, -1.8507614135742188, 1.8394775390625, 2.9449539184570312, 2.6501598358154297, 5.919090270996094, 2.1563873291015625, 5.500984191894531, 6.731193542480469, 0.5329418182373047, 3.64398193359375, -0.434967041015625, 1.4225959777832031, 6.127510070800781, -0.6103668212890625, 4.357002258300781, 1.0223731994628906, 1.0175971984863281, 2.9473724365234375, 6.695411682128906, -1.7798004150390625, -3.984130859375, 1.261871337890625, 0.6182098388671875, 4.353492736816406, 2.231792449951172, 0.25347900390625, 1.83685302734375, 3.427173614501953, -2.0471038818359375, 6.8250885009765625, -3.642993927001953, -1.94635009765625, 2.7258453369140625, -3.597076416015625, -0.8356399536132812, 2.3566207885742188, 1.4179458618164062, -0.8202171325683594, -0.5704689025878906], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000117.npy"}
{"epoch": 0.17687074829931973, "step": 118, "batch_size": 64, "mean": 1.3567461967468262, "std": 2.878509759902954, "min": -6.311367034912109, "p10": -2.6082221984863283, "median": 1.6718368530273438, "p90": 5.105885314941407, "max": 8.1173095703125, "pos_frac": 0.640625, "sample": [0.5057754516601562, 0.8127365112304688, -0.22335052490234375, 1.906768798828125, -6.311367034912109, -2.600921630859375, -0.15187835693359375, 3.2771949768066406, 5.146003723144531, -1.2750587463378906, 4.125017166137695, 1.9845352172851562, -0.5577545166015625, 3.4766464233398438, 0.832855224609375, -1.7519454956054688, 2.4015274047851562, 4.353126525878906, -0.23480987548828125, -0.08905410766601562, -0.4743919372558594, 1.637603759765625, 3.088216781616211, 3.396942138671875, 2.6880645751953125, -3.0994415283203125, -2.8802261352539062, -1.60870361328125, 2.6246681213378906, -0.24018287658691406, 2.3980941772460938, 1.9446392059326172, 5.936553955078125, -2.009187698364258, 1.7060699462890625, 3.0555572509765625, 1.047760009765625, 2.173095703125, 4.404998779296875, -3.0256423950195312, 0.043590545654296875, 5.012275695800781, 2.11920166015625, 8.1173095703125, 0.7477874755859375, -2.6113510131835938, -3.3936843872070312, 5.570579528808594, 5.5077972412109375, 5.7883758544921875, 4.796028137207031, -3.848236083984375, 3.4633026123046875, 2.7272911071777344, 0.4131622314453125, -0.4202919006347656, -1.6276397705078125, 2.849163055419922, 3.261018753051758, 4.691230773925781, 5.576240539550781, -0.3005695343017578, 0.5235519409179688, -0.564910888671875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000118.npy"}
{"epoch": 0.17838246409674982, "step": 119, "batch_size": 64, "mean": 1.2937450408935547, "std": 3.450192928314209, "min": -7.5081787109375, "p10": -2.3134231567382812, "median": 0.7710380554199219, "p90": 6.0257507324218755, "max": 10.61785888671875, "pos_frac": 0.59375, "sample": [-1.8245429992675781, 8.253917694091797, -1.6772232055664062, -2.4152488708496094, -4.8242340087890625, -0.2748451232910156, 2.473560333251953, 1.2074127197265625, -1.2286758422851562, -0.31768798828125, -2.9427032470703125, -0.9576339721679688, -0.1462860107421875, 7.321586608886719, 4.719196319580078, -7.5081787109375, -2.2042198181152344, -2.30364990234375, 1.7235050201416016, 3.945587158203125, -1.882049560546875, 2.464509963989258, 0.350799560546875, 0.9432830810546875, 6.050140380859375, 2.5105972290039062, -3.7025375366210938, 4.913631439208984, 5.968841552734375, 6.7876434326171875, 4.630340576171875, 2.6568832397460938, 1.4593658447265625, 3.524078369140625, 4.770050048828125, 10.61785888671875, 1.13336181640625, -1.2435760498046875, 0.574615478515625, 5.475860595703125, 7.878742218017578, 2.0049972534179688, -2.0771217346191406, 1.54925537109375, -1.61041259765625, -2.3176116943359375, 4.58984375, 3.4770240783691406, 3.06964111328125, -1.875213623046875, 0.423431396484375, -2.650970458984375, -0.9238739013671875, -1.50372314453125, -0.218170166015625, 0.5987930297851562, 3.25994873046875, 6.760009765625, 0.47542572021484375, 1.6813583374023438, -0.8133640289306641, 1.75848388671875, 0.4704093933105469, -0.230560302734375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000119.npy"}
{"epoch": 0.17989417989417988, "step": 120, "batch_size": 64, "mean": 1.9414762258529663, "std": 2.9695560932159424, "min": -4.72711181640625, "p10": -0.8423812866210937, "median": 1.5902175903320312, "p90": 6.047448730468752, "max": 9.936897277832031, "pos_frac": 0.71875, "sample": [2.930389404296875, 0.080474853515625, -0.13498306274414062, 9.936897277832031, 1.761688232421875, -0.05629730224609375, 4.259433746337891, -0.3227691650390625, 1.47235107421875, -4.6685791015625, 7.0665740966796875, 1.1679153442382812, 3.6783676147460938, 0.320709228515625, 2.257110595703125, 6.5363616943359375, 5.033119201660156, -0.16707611083984375, 2.4768142700195312, 0.9859466552734375, 1.746734619140625, -0.28326416015625, 2.9068069458007812, 3.300365447998047, -0.7035369873046875, 1.5604095458984375, 6.683380126953125, -1.4837112426757812, 2.4501419067382812, 3.9020919799804688, 5.322624206542969, -4.72711181640625, 9.466583251953125, -0.7768707275390625, 3.8348846435546875, -0.8287734985351562, 3.5218238830566406, -0.35138702392578125, 4.8480072021484375, 6.9614715576171875, 0.4377326965332031, 1.2373428344726562, -0.9369964599609375, 2.1100311279296875, 1.617462158203125, -3.0773162841796875, -0.19279861450195312, 1.636993408203125, 3.7278213500976562, 1.228607177734375, 0.09992218017578125, 0.8484363555908203, 4.9057464599609375, -0.49430084228515625, 2.2947998046875, -0.8482131958007812, 6.3069915771484375, 5.4418487548828125, 0.19758224487304688, -3.06512451171875, 1.5629730224609375, 0.9469699859619141, 4.399816513061523, 1.9030303955078125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000120.npy"}
{"epoch": 0.18140589569160998, "step": 121, "batch_size": 64, "mean": 1.368406057357788, "std": 3.1538987159729004, "min": -5.14288330078125, "p10": -2.512358474731445, "median": 0.8851528167724609, "p90": 5.4983329772949245, "max": 9.64654541015625, "pos_frac": 0.671875, "sample": [3.372589111328125, -5.14288330078125, 0.142333984375, -2.578929901123047, 3.2575607299804688, 7.6973876953125, 2.3821640014648438, -3.2624340057373047, 1.6553058624267578, 4.539398193359375, 0.21730804443359375, -1.460235595703125, 9.64654541015625, 3.063648223876953, -1.7781524658203125, 0.5967674255371094, 3.4084739685058594, 1.0326614379882812, 3.8628616333007812, -2.6728591918945312, 7.5116424560546875, 0.19586181640625, 4.197547912597656, 4.140968322753906, 4.119285583496094, -2.357025146484375, 2.1371803283691406, -3.0360260009765625, 2.0525054931640625, -0.39141845703125, -0.40993499755859375, 6.74176025390625, -0.8445320129394531, 0.31995391845703125, 1.7390823364257812, 0.707244873046875, -2.17974853515625, 5.767578125, 2.611572265625, -0.08806991577148438, -0.36224937438964844, 0.29179954528808594, -1.4242477416992188, 3.6160202026367188, -1.8368148803710938, 2.105499267578125, -3.8240432739257812, 8.489280700683594, 0.89068603515625, 0.8796195983886719, 1.9477176666259766, 0.7200279235839844, 3.2727813720703125, 1.5318450927734375, -2.0022964477539062, 7.865325927734375, -1.5968017578125, 0.8756446838378906, 0.7942619323730469, 1.002471923828125, -0.3105583190917969, 4.870094299316406, 1.531097412109375, -2.664113998413086], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000121.npy"}
{"epoch": 0.18291761148904007, "step": 122, "batch_size": 64, "mean": 1.4465798139572144, "std": 3.5467207431793213, "min": -4.5025482177734375, "p10": -3.0414273262023923, "median": 0.8186302185058594, "p90": 5.9083892822265645, "max": 11.707275390625, "pos_frac": 0.671875, "sample": [10.021942138671875, -3.828826904296875, -4.2365264892578125, 2.6366653442382812, 0.88909912109375, -1.6417007446289062, 9.320037841796875, 4.3042449951171875, -3.5704193115234375, 5.016548156738281, 2.3129119873046875, -1.6750869750976562, -2.7557058334350586, 0.8224563598632812, 0.5507965087890625, 10.69000244140625, -1.14935302734375, 0.7103958129882812, -3.4037704467773438, 3.7495193481445312, 3.9427127838134766, 5.312126159667969, 0.8654022216796875, 2.0216846466064453, 6.151340484619141, 6.551971435546875, -2.1474647521972656, -1.263916015625, 2.807342529296875, 5.341503143310547, -3.16387939453125, 4.939903259277344, 0.8833293914794922, 0.4906005859375, -4.5025482177734375, 1.4080734252929688, 1.2453670501708984, 0.428253173828125, 2.3864059448242188, -0.6079521179199219, 0.3678131103515625, 0.3232536315917969, -2.10601806640625, 6.5221710205078125, -3.2333145141601562, 4.172733306884766, 0.7634773254394531, 0.29718780517578125, -1.093475341796875, 0.8148040771484375, -0.40914154052734375, -1.3325920104980469, 2.4010963439941406, 1.0699081420898438, -0.09042739868164062, -0.23992919921875, 1.6194000244140625, -0.9394607543945312, 0.16663360595703125, 3.5509262084960938, 2.879486083984375, 2.770751953125, 0.74505615234375, 11.707275390625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000122.npy"}
{"epoch": 0.18442932728647016, "step": 123, "batch_size": 64, "mean": 1.6298733949661255, "std": 3.0458309650421143, "min": -5.241069793701172, "p10": -2.208189392089843, "median": 1.5800952911376953, "p90": 5.41510009765625, "max": 12.994735717773438, "pos_frac": 0.734375, "sample": [3.738311767578125, 6.521503448486328, 1.2651214599609375, 12.994735717773438, -1.3706779479980469, -2.556842803955078, 3.6390609741210938, 2.0491409301757812, 4.468719482421875, 1.8393783569335938, 0.5191497802734375, 3.75665283203125, 2.0236282348632812, -1.453277587890625, 2.8355484008789062, 2.208629608154297, 2.859018325805664, 4.326498031616211, 0.4789237976074219, 5.31512451171875, 1.033843994140625, 2.5100173950195312, 2.505390167236328, 2.941019058227539, 3.199066162109375, 6.266716003417969, 1.5125999450683594, -0.9708690643310547, 2.0100860595703125, 3.9081573486328125, 1.6475906372070312, 3.8422012329101562, -5.241069793701172, 5.460731506347656, -0.8382835388183594, -4.533836364746094, -2.3539276123046875, -2.658618927001953, -0.23595809936523438, 0.7188644409179688, 3.213470458984375, -0.02805328369140625, 2.9844188690185547, -2.7098617553710938, 2.4287567138671875, 7.4196929931640625, -0.5124454498291016, -2.6868553161621094, 0.27022552490234375, 0.5669822692871094, -1.3989410400390625, 1.2451553344726562, 0.5933303833007812, 0.5660324096679688, -1.868133544921875, 6.773284912109375, -1.5073108673095703, 0.5194320678710938, 0.02814483642578125, 1.0192184448242188, 1.941727638244629, 2.8485260009765625, 5.45794677734375, 0.965087890625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000123.npy"}
{"epoch": 0.18594104308390022, "step": 124, "batch_size": 64, "mean": 1.872976541519165, "std": 3.72943377494812, "min": -5.9827117919921875, "p10": -3.0603647232055664, "median": 1.691889762878418, "p90": 6.748194122314454, "max": 13.256378173828125, "pos_frac": 0.75, "sample": [0.09186553955078125, 9.465042114257812, 0.607696533203125, -4.786624908447266, 1.723714828491211, 2.49359130859375, 6.883575439453125, 2.791046142578125, 9.354888916015625, 13.256378173828125, 2.771026611328125, 3.66668701171875, -0.3200187683105469, 8.603515625, 1.3592653274536133, 2.732349395751953, -1.60748291015625, 2.178934097290039, -0.4133033752441406, -1.9736175537109375, -3.167022705078125, 2.8215103149414062, -1.4232254028320312, 3.80914306640625, 0.5491180419921875, 2.9660682678222656, 0.2317790985107422, 3.4158859252929688, -1.2335968017578125, -3.0966796875, 5.721355438232422, 5.260040283203125, 2.41455078125, 1.0829620361328125, 5.742774963378906, 7.356292724609375, 6.432304382324219, -5.8944549560546875, 1.734527587890625, 4.495819091796875, -3.2781295776367188, 2.0943527221679688, 0.35230255126953125, 2.564697265625, 1.0958251953125, 3.636749267578125, 1.2620849609375, 1.2166976928710938, -2.9756298065185547, 2.1694107055664062, 8.896965026855469, 0.8851776123046875, 0.07507896423339844, 0.8181076049804688, -3.315948486328125, 1.1190109252929688, 3.4930877685546875, -0.9522018432617188, -5.9827117919921875, 4.9327850341796875, -1.92059326171875, 3.5103416442871094, 0.4152870178222656, 1.660064697265625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000124.npy"}
{"epoch": 0.1874527588813303, "step": 125, "batch_size": 64, "mean": 2.0888171195983887, "std": 3.522920608520508, "min": -6.5896453857421875, "p10": -2.4146949768066404, "median": 2.030025005340576, "p90": 6.063298416137696, "max": 11.585845947265625, "pos_frac": 0.78125, "sample": [-1.4372406005859375, 3.8651199340820312, 5.361675262451172, 1.250213623046875, 2.296844482421875, -0.8183975219726562, 4.589759826660156, 9.291690826416016, 6.669525146484375, 1.0018291473388672, 9.695648193359375, -2.572296142578125, 0.5397720336914062, 1.5431365966796875, 0.22942733764648438, 6.151691436767578, 5.0713958740234375, 0.446868896484375, 1.1646728515625, -1.0774421691894531, 5.549659729003906, 5.857048034667969, 2.1735458374023438, 2.8633270263671875, -1.2217025756835938, 2.5645675659179688, 0.3797149658203125, -1.8234062194824219, -4.586570739746094, 0.4321746826171875, 3.8398361206054688, 1.6123046875, 1.0263137817382812, 5.206756591796875, 5.646308898925781, -3.8910675048828125, 1.954817771911621, 1.7391815185546875, 6.560096740722656, 11.585845947265625, 3.2125091552734375, 2.6834182739257812, 7.30902099609375, 3.9983749389648438, 4.803409576416016, -2.0085525512695312, 2.1052322387695312, 3.6884498596191406, 0.35770416259765625, 1.578512191772461, 3.6470794677734375, 1.3952560424804688, 3.5068092346191406, 1.45037841796875, -2.0469589233398438, 5.2908935546875, 2.497283935546875, -3.0506668090820312, 2.4962921142578125, 0.6681003570556641, 3.833049774169922, -2.8249549865722656, -5.049346923828125, -6.5896453857421875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000125.npy"}
{"epoch": 0.1889644746787604, "step": 126, "batch_size": 64, "mean": 1.603518009185791, "std": 4.327088356018066, "min": -7.431922912597656, "p10": -3.1038505554199216, "median": 0.8009519577026367, "p90": 7.889907836914063, "max": 11.460006713867188, "pos_frac": 0.578125, "sample": [9.64541244506836, -1.0708274841308594, 4.255607604980469, 1.2237358093261719, 1.1155853271484375, 7.6500091552734375, 5.979400634765625, 0.6253604888916016, 1.14813232421875, -1.4684028625488281, 8.589186668395996, 5.81689453125, -7.431922912597656, 0.4724273681640625, 11.145637512207031, 3.097564697265625, -2.7901687622070312, -1.6201324462890625, 0.4488716125488281, -3.026214599609375, 6.365386962890625, 6.198486328125, -1.2702617645263672, 2.6951560974121094, -2.893709182739258, 6.796649932861328, 3.6314697265625, -3.52392578125, 11.024642944335938, -0.3468971252441406, -3.1807708740234375, -5.1920013427734375, -2.5698471069335938, 0.18583297729492188, -2.10302734375, 1.15484619140625, -0.003253936767578125, 1.7692413330078125, 0.5709934234619141, -5.9346771240234375, 5.05615234375, -1.7727432250976562, 2.9395217895507812, -0.1202850341796875, 10.138313293457031, -0.7594146728515625, 1.8616180419921875, 11.460006713867188, 0.9765434265136719, 3.6345367431640625, -1.0960540771484375, 1.6636505126953125, 3.0335731506347656, -0.17568206787109375, -0.1905517578125, -1.0410614013671875, 1.25433349609375, 7.262004852294922, -1.4610519409179688, -3.1371231079101562, -0.5494537353515625, 7.9927215576171875, -3.8933334350585938, 2.368438720703125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000126.npy"}
{"epoch": 0.19047619047619047, "step": 127, "batch_size": 64, "mean": 2.1889748573303223, "std": 3.2664709091186523, "min": -4.429462432861328, "p10": -1.47366943359375, "median": 1.9305152893066406, "p90": 6.159912109375, "max": 11.433822631835938, "pos_frac": 0.71875, "sample": [5.7497100830078125, 9.734283447265625, 3.2502212524414062, 5.053562164306641, -1.489288330078125, 7.5142059326171875, -0.7515869140625, 3.518878936767578, 0.9866561889648438, 4.627357482910156, 7.2183685302734375, 1.10546875, 0.6658477783203125, 5.906696319580078, -0.01329803466796875, -0.11233139038085938, 4.6492767333984375, 2.1530685424804688, 0.7125015258789062, 1.6296844482421875, 5.3270721435546875, -4.350067138671875, 1.6423187255859375, 2.6474342346191406, 1.9742927551269531, -3.81097412109375, -0.6133575439453125, 0.24489212036132812, 1.99725341796875, 0.20371246337890625, -1.64056396484375, 3.6600799560546875, 4.216739654541016, 1.2893600463867188, -0.02471160888671875, 4.942798614501953, 1.1369705200195312, 6.510902404785156, 4.38275146484375, 11.433822631835938, -4.429462432861328, 5.188755035400391, 3.4873428344726562, -1.2994232177734375, -3.2106857299804688, 6.070991516113281, 0.8919296264648438, -1.2590904235839844, 6.198020935058594, -1.437225341796875, -0.855438232421875, 4.5426483154296875, 2.7333450317382812, -0.054859161376953125, 2.8779754638671875, -1.7099838256835938, 3.8572845458984375, 0.01532745361328125, 2.4667892456054688, 1.8867378234863281, 2.3126163482666016, 1.0827693939208984, 7.647243499755859, -0.1892261505126953], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000127.npy"}
{"epoch": 0.19198790627362056, "step": 128, "batch_size": 64, "mean": 1.4382350444793701, "std": 4.831887245178223, "min": -10.666885375976562, "p10": -5.453157424926757, "median": 1.45904541015625, "p90": 7.475734710693359, "max": 12.351699829101562, "pos_frac": 0.640625, "sample": [2.198394775390625, -2.55511474609375, 1.851552963256836, 1.3120269775390625, 1.3728408813476562, 1.5452499389648438, -0.5343170166015625, 9.539520263671875, 0.11786651611328125, 8.035568237304688, 0.9673728942871094, -10.666885375976562, 3.436004638671875, 0.7094955444335938, 7.146236419677734, -7.478832244873047, 1.1450786590576172, 7.489250183105469, -6.1561126708984375, 10.3460693359375, 11.913345336914062, -1.3283367156982422, 2.132396697998047, 6.71319580078125, 12.351699829101562, 0.1314697265625, -1.15631103515625, 7.4441986083984375, -5.705894470214844, 3.2976455688476562, 3.300201416015625, 0.089996337890625, -0.9927864074707031, 1.36468505859375, 3.810516357421875, 1.6157684326171875, -2.2355995178222656, 4.819854736328125, -8.054096221923828, 4.928089141845703, -2.1333770751953125, 1.802011489868164, 1.6363563537597656, -0.13169479370117188, -4.863437652587891, 3.4991226196289062, 2.0232810974121094, -0.2574501037597656, 2.2481842041015625, -0.39873313903808594, -7.6013946533203125, 5.0872955322265625, 4.1384429931640625, -0.2329559326171875, 1.9441757202148438, -0.8092098236083984, 1.7966384887695312, 3.1232147216796875, 12.099403381347656, -1.3784160614013672, -8.380180358886719, -0.13576507568359375, 5.146614074707031, -0.4363861083984375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000128.npy"}
{"epoch": 0.19349962207105065, "step": 129, "batch_size": 64, "mean": 1.6657402515411377, "std": 4.51900577545166, "min": -8.918182373046875, "p10": -3.746550750732422, "median": 1.6606807708740234, "p90": 7.686300277709962, "max": 12.773193359375, "pos_frac": 0.640625, "sample": [-2.1089859008789062, -2.0396995544433594, 5.126823425292969, 0.97607421875, -0.7772178649902344, 8.173290252685547, 4.107391357421875, -4.072912216186523, 4.7199249267578125, 7.781978607177734, 6.381763458251953, 4.923637390136719, 1.3994140625, -1.2819747924804688, 0.726226806640625, 2.0741348266601562, 7.9811248779296875, -2.0747909545898438, -0.116546630859375, 5.0634307861328125, 4.934112548828125, -8.03875732421875, 1.41436767578125, 4.763404846191406, -1.8039817810058594, 12.773193359375, 0.32637786865234375, -2.4050140380859375, 2.908843994140625, 8.473861694335938, 2.1717147827148438, -0.3398590087890625, -8.918182373046875, 4.640804290771484, 3.3432083129882812, 5.334934234619141, -5.922435760498047, -3.749542236328125, -2.0678863525390625, 7.312854766845703, 3.2210617065429688, -1.6869468688964844, -2.4063720703125, 0.8687820434570312, 8.034858703613281, 3.542875289916992, -3.7395706176757812, 7.390720367431641, 2.5460205078125, 10.36492919921875, 1.1671142578125, 4.78570556640625, 7.463050842285156, 0.03078937530517578, 1.9069938659667969, -5.435276031494141, -2.117919921875, -7.033103942871094, 2.461517333984375, -1.1623992919921875, 2.035327911376953, 1.3040008544921875, -2.615032196044922, 3.5651397705078125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000129.npy"}
{"epoch": 0.19501133786848074, "step": 130, "batch_size": 64, "mean": 1.90164315700531, "std": 4.047874450683594, "min": -8.592041015625, "p10": -2.9703125, "median": 1.5265960693359375, "p90": 7.2023262023925785, "max": 11.411056518554688, "pos_frac": 0.671875, "sample": [2.5689544677734375, 2.2745361328125, 5.13726806640625, 10.2471923828125, 0.9050674438476562, -2.641448974609375, 1.563934326171875, 3.976390838623047, -1.3161048889160156, 1.0400199890136719, -1.8934173583984375, 5.931480407714844, 5.391654968261719, 1.4892578125, -2.9609451293945312, 8.196563720703125, -1.5583648681640625, -4.288902282714844, -2.9743270874023438, 10.06658935546875, 1.8763542175292969, -0.8739967346191406, -0.6623458862304688, 7.080192565917969, -0.5594406127929688, -3.9670753479003906, 0.805145263671875, 0.2742767333984375, 1.6033210754394531, -1.4326667785644531, 4.908782958984375, 3.147552490234375, 9.960041046142578, 5.383546829223633, 2.3624496459960938, 7.254669189453125, 7.3527374267578125, -0.81494140625, 11.411056518554688, 2.6059036254882812, -1.1402130126953125, 2.4550323486328125, 1.3335723876953125, -3.1228790283203125, -1.0152130126953125, 2.0575637817382812, 2.991138458251953, 1.0017433166503906, -0.7220687866210938, -4.497100830078125, -4.993402481079102, 0.9783554077148438, 2.468597412109375, 6.222314834594727, -8.592041015625, 5.668174743652344, 5.9967041015625, 0.21426963806152344, 4.651679992675781, 0.26416015625, 3.375537872314453, -0.36295509338378906, 1.2444267272949219, 6.3567962646484375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000130.npy"}
{"epoch": 0.1965230536659108, "step": 131, "batch_size": 64, "mean": 2.496222496032715, "std": 4.583681106567383, "min": -9.265396118164062, "p10": -2.762771224975586, "median": 1.9661073684692383, "p90": 9.28216018676758, "max": 13.762741088867188, "pos_frac": 0.71875, "sample": [0.554443359375, 3.969402313232422, -2.649425506591797, 0.49758148193359375, -3.2952804565429688, 13.762741088867188, 2.2876510620117188, 3.1472835540771484, -0.4973926544189453, 11.631195068359375, 4.27001953125, 0.5666599273681641, -0.7325172424316406, 0.8987655639648438, 5.017749786376953, 3.9676132202148438, 4.96282958984375, -0.8400955200195312, 2.81072998046875, -9.265396118164062, -2.3450469970703125, 0.8903656005859375, 13.397933959960938, 0.4669189453125, 2.0995635986328125, -3.403045654296875, -2.010629653930664, -0.5746917724609375, -4.835762023925781, -2.8519363403320312, 2.352764129638672, 1.241241455078125, 6.2850341796875, 1.832651138305664, 8.201370239257812, 7.9605712890625, 0.6695022583007812, 8.965042114257812, 1.2607688903808594, 6.092308044433594, 2.902751922607422, 9.7037353515625, -2.175943374633789, 6.573951721191406, 5.2829742431640625, -2.8113479614257812, 1.2746543884277344, -4.3721923828125, 0.15560150146484375, 10.044815063476562, 0.9451522827148438, -0.7146759033203125, 7.015974044799805, -1.7802238464355469, -2.418609619140625, 9.418067932128906, 2.1653003692626953, 1.3913612365722656, 4.5349884033203125, 4.345241546630859, 2.6557464599609375, 3.7626075744628906, 10.161186218261719, 4.937648773193359], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000131.npy"}
{"epoch": 0.1980347694633409, "step": 132, "batch_size": 64, "mean": 1.749915599822998, "std": 3.84273099899292, "min": -7.6809234619140625, "p10": -2.235397338867187, "median": 1.8525753021240234, "p90": 5.734896850585938, "max": 12.573806762695312, "pos_frac": 0.75, "sample": [2.91986083984375, 0.7785625457763672, 3.9776458740234375, 3.0228309631347656, -1.1811256408691406, -1.8451919555664062, 5.717529296875, 1.7118339538574219, -6.2854156494140625, 7.7296905517578125, 2.5753822326660156, 2.123992919921875, 2.5179443359375, 2.5277442932128906, -1.4679412841796875, 2.2518386840820312, -0.22609329223632812, 7.701446533203125, 4.1033782958984375, 2.703563690185547, -7.6809234619140625, -7.5756378173828125, 6.266317367553711, -1.0634841918945312, -0.9806365966796875, -1.853546142578125, 4.630054473876953, 1.8195915222167969, 5.742340087890625, -1.8099441528320312, 10.254196166992188, 0.5224494934082031, -7.656280517578125, 4.567375183105469, 3.2978057861328125, 3.0334625244140625, 4.730278015136719, 3.52069091796875, 4.7846832275390625, 1.7908782958984375, 1.318878173828125, 0.116729736328125, 0.7998504638671875, 10.515762329101562, 1.88555908203125, 0.3462677001953125, 3.7603759765625, 4.250831604003906, 0.6180934906005859, 1.507781982421875, 3.2266921997070312, -0.2292938232421875, 1.084442138671875, 1.8184051513671875, 2.1846466064453125, 1.8957366943359375, 12.573806762695312, -2.958232879638672, 1.3689727783203125, 0.8514404296875, 0.7711334228515625, 2.092041015625, -3.103424072265625, -2.3990478515625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000132.npy"}
{"epoch": 0.19954648526077098, "step": 133, "batch_size": 64, "mean": 1.961458444595337, "std": 4.274144649505615, "min": -6.93414306640625, "p10": -3.833778381347656, "median": 1.9481658935546875, "p90": 6.897673034667969, "max": 13.160202026367188, "pos_frac": 0.671875, "sample": [8.164779663085938, -6.558174133300781, 1.3575439453125, 4.448284149169922, -0.1480426788330078, -6.119865417480469, 1.992828369140625, 11.171001434326172, -0.5849399566650391, 3.8983840942382812, -0.0297393798828125, 5.2409210205078125, 5.471549987792969, 2.0849151611328125, -2.5470352172851562, 2.88031005859375, 7.8004608154296875, -6.93414306640625, 3.65728759765625, 4.56744384765625, -4.054801940917969, 5.383575439453125, -1.1179428100585938, 5.348379135131836, 2.0298995971679688, -3.5986328125, 2.1664581298828125, -2.69781494140625, 5.8994903564453125, -0.13479137420654297, 4.7567901611328125, 0.5540809631347656, 3.0683135986328125, 7.921892166137695, -0.3257408142089844, 0.287078857421875, -5.0817413330078125, 6.7489166259765625, 5.2183380126953125, 6.96142578125, -4.3531646728515625, 0.2281951904296875, 0.482452392578125, -1.761871337890625, 1.5679512023925781, 9.899238586425781, 4.384941101074219, 4.768836975097656, -2.1676177978515625, 13.160202026367188, 0.1215972900390625, 1.1409835815429688, -3.3949317932128906, 1.90350341796875, 5.0281524658203125, 0.7291488647460938, 3.1318893432617188, 6.378299713134766, 5.5500030517578125, -0.7451305389404297, 4.042289733886719, -1.0025177001953125, -3.9345550537109375, 1.2285003662109375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000133.npy"}
{"epoch": 0.20105820105820105, "step": 134, "batch_size": 64, "mean": 2.831118106842041, "std": 4.710463047027588, "min": -5.43145751953125, "p10": -3.661109924316406, "median": 2.5286693572998047, "p90": 7.408860778808594, "max": 17.9578857421875, "pos_frac": 0.75, "sample": [-4.328636169433594, 1.5813446044921875, -0.8053321838378906, 4.565586090087891, 1.4319114685058594, 1.5349655151367188, 3.1025161743164062, -1.6205368041992188, 3.980335235595703, 5.8163604736328125, 3.8129806518554688, 4.273223876953125, 3.1186370849609375, 1.7075881958007812, -0.10708999633789062, 2.4347801208496094, 7.291229248046875, 4.765766143798828, -3.9232025146484375, 6.540271759033203, 8.209430694580078, 9.928085327148438, 5.961307525634766, -4.645408630371094, 1.3576850891113281, 6.3503265380859375, -1.746490478515625, 3.5506057739257812, -0.5464773178100586, 2.790904998779297, -3.9319305419921875, 0.048351287841796875, 5.57672119140625, -4.4627838134765625, 1.5189552307128906, -2.349760055541992, 7.216270446777344, 2.6827049255371094, 2.381866455078125, 4.529254913330078, 17.9578857421875, 5.006143569946289, -3.049560546875, 4.3385162353515625, -0.6086883544921875, 3.3271560668945312, 6.8379974365234375, -5.43145751953125, 11.822555541992188, 1.4520645141601562, 2.5186424255371094, 1.9929885864257812, 0.0727691650390625, 1.3746185302734375, 13.817535400390625, -5.003871917724609, 3.5276412963867188, 2.5386962890625, 1.8732414245605469, 4.5807037353515625, 7.4592742919921875, 16.007789611816406, 0.2125072479248047, -1.025909423828125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000134.npy"}
{"epoch": 0.20256991685563114, "step": 135, "batch_size": 64, "mean": 2.025102376937866, "std": 4.194132328033447, "min": -11.5477294921875, "p10": -1.7173208236694335, "median": 1.8955554962158203, "p90": 7.53409423828125, "max": 14.605026245117188, "pos_frac": 0.6875, "sample": [2.66326904296875, -0.8280448913574219, 0.91046142578125, 4.17228889465332, 2.635040283203125, -0.06302547454833984, -3.9220504760742188, -1.0019989013671875, 2.3470840454101562, 8.704338073730469, 0.6146240234375, -4.783935546875, 2.9950408935546875, 0.3427867889404297, 3.5875320434570312, -6.25555419921875, 7.974845886230469, -0.3464202880859375, -1.6138992309570312, 3.540353775024414, 2.0086822509765625, -1.1452484130859375, -11.5477294921875, 7.415618896484375, 1.7193603515625, 0.8438072204589844, 4.598079681396484, 2.5237884521484375, 1.9294242858886719, 4.662567138671875, 1.3301506042480469, -2.79852294921875, 6.725799560546875, 2.2710399627685547, 7.1223907470703125, -0.030193328857421875, 1.2818984985351562, 0.026885986328125, 4.635417938232422, 0.38974761962890625, -0.354156494140625, 7.584869384765625, -0.7693824768066406, 3.8800735473632812, 3.3608837127685547, 6.786888122558594, 1.8616867065429688, -0.15431594848632812, 2.68121337890625, 1.6344451904296875, 1.7419815063476562, -6.85186767578125, -0.171295166015625, 3.4674415588378906, 8.074722290039062, 9.807559967041016, -0.8641948699951172, 2.0327091217041016, -1.7616443634033203, 8.198875427246094, 3.6463623046875, 6.3410186767578125, 14.605026245117188, -0.8080520629882812], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000135.npy"}
{"epoch": 0.20408163265306123, "step": 136, "batch_size": 64, "mean": 2.4477314949035645, "std": 5.864989280700684, "min": -8.998832702636719, "p10": -5.1921752929687495, "median": 0.8410301208496094, "p90": 9.546031951904297, "max": 16.86871337890625, "pos_frac": 0.609375, "sample": [5.1348419189453125, 7.099449157714844, -1.7947731018066406, -0.4681549072265625, -5.4229736328125, 5.249763488769531, -8.94244384765625, -2.3773880004882812, -8.132965087890625, 7.39396858215332, -0.27887725830078125, -5.369819641113281, -0.9585418701171875, -0.19622802734375, -2.0606155395507812, 0.246673583984375, 7.633953094482422, -0.42952728271484375, 0.6356277465820312, -1.3245735168457031, 9.647048950195312, 2.3857040405273438, 9.579872131347656, 9.467071533203125, -1.9711227416992188, 0.41094970703125, 11.228614807128906, -1.9542999267578125, 0.611572265625, 15.448638916015625, 8.44366455078125, -0.48665618896484375, 6.887115478515625, 7.3797760009765625, 7.8406829833984375, -3.520050048828125, -0.69598388671875, 5.774986267089844, 8.493804931640625, -4.777671813964844, -6.457220077514648, 0.6532669067382812, 3.15692138671875, 8.025077819824219, 2.1777877807617188, 1.0287933349609375, -5.681068420410156, 4.05047607421875, 1.6877593994140625, 16.6639404296875, 1.0577812194824219, -0.8038330078125, 4.9612274169921875, 9.2314453125, 7.472160339355469, 16.86871337890625, -0.50982666015625, 1.227142333984375, 5.807493209838867, 9.833930969238281, 0.1300048828125, -0.764373779296875, 0.00493621826171875, -8.998832702636719], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000136.npy"}
{"epoch": 0.20559334845049132, "step": 137, "batch_size": 64, "mean": 2.481208086013794, "std": 4.488886833190918, "min": -5.4901275634765625, "p10": -3.334172439575195, "median": 2.9947986602783203, "p90": 8.129741287231449, "max": 17.014236450195312, "pos_frac": 0.671875, "sample": [-4.1708526611328125, -5.4901275634765625, 1.0776748657226562, 8.640106201171875, 3.4012622833251953, 5.280250549316406, 8.482227325439453, 12.9464111328125, 2.0941238403320312, 17.014236450195312, 4.433597564697266, 4.3099212646484375, 6.699943542480469, -0.4055328369140625, 8.519386291503906, 3.2611846923828125, -3.3856124877929688, 3.922046661376953, -3.554229736328125, -2.0546188354492188, 2.1010665893554688, 2.2598228454589844, 5.2267303466796875, -3.2141456604003906, 3.1784133911132812, -0.0361175537109375, 10.033531188964844, 10.695419311523438, 4.717533111572266, 4.102752685546875, 0.75701904296875, 4.0782318115234375, -4.9326629638671875, 2.88519287109375, 7.307273864746094, 3.1890525817871094, -1.3337774276733398, -4.283168792724609, -0.24843597412109375, 0.14164352416992188, 1.0992050170898438, 0.9232940673828125, 0.6321067810058594, 4.6855621337890625, 5.889190673828125, -1.171173095703125, -1.5861339569091797, 5.490566253662109, 4.761749267578125, 4.5736083984375, 0.1604175567626953, -0.9295387268066406, -0.6686468124389648, -4.7188262939453125, 3.1044044494628906, -1.6225318908691406, -0.7455215454101562, -2.6352386474609375, -2.5510177612304688, 3.2975845336914062, 6.381626129150391, 5.6522064208984375, 5.38299560546875, 5.744659423828125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000137.npy"}
{"epoch": 0.20710506424792138, "step": 138, "batch_size": 64, "mean": 1.801964521408081, "std": 4.917527198791504, "min": -7.8772125244140625, "p10": -3.090493297576904, "median": 1.14459228515625, "p90": 7.926758003234864, "max": 18.316162109375, "pos_frac": 0.609375, "sample": [0.5507316589355469, 3.7694320678710938, 0.5385360717773438, 2.5349349975585938, 1.8387870788574219, 0.6475448608398438, 2.1257171630859375, -0.6357650756835938, -7.068756103515625, -3.169200897216797, 1.8372936248779297, -5.7754974365234375, 8.074691772460938, -0.5316390991210938, 13.082069396972656, -6.88092041015625, -0.5845718383789062, 4.05584716796875, 8.96710205078125, 1.8072013854980469, 5.821315765380859, -2.785858154296875, 18.316162109375, 3.4837493896484375, 0.6827621459960938, 5.1359405517578125, 5.060932159423828, -1.196441650390625, 2.4445228576660156, -0.5657711029052734, 7.581579208374023, -2.9068422317504883, 5.047951698303223, -0.16323089599609375, -0.5217628479003906, -1.9239006042480469, 1.355133056640625, -7.8772125244140625, -0.21680068969726562, 7.093315124511719, 4.438209533691406, 9.90130615234375, -2.1088714599609375, 4.423980712890625, -0.3523712158203125, -7.808658599853516, 4.316841125488281, 0.4865531921386719, -1.0383758544921875, 0.3154296875, -1.4215927124023438, 4.1786346435546875, 5.541290283203125, 1.2373046875, 1.0939788818359375, -5.263167381286621, -0.12699508666992188, 1.1952056884765625, -0.7054367065429688, 2.3781890869140625, 8.949966430664062, 3.34307861328125, 13.519615173339844, -0.21746826171875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000138.npy"}
{"epoch": 0.20861678004535147, "step": 139, "batch_size": 64, "mean": 1.7565573453903198, "std": 4.721503734588623, "min": -15.2366943359375, "p10": -4.070540618896484, "median": 2.412726402282715, "p90": 6.927703094482422, "max": 10.105621337890625, "pos_frac": 0.703125, "sample": [-4.232261657714844, 7.55279541015625, 6.3468780517578125, 0.4260711669921875, -5.446491241455078, 3.5595130920410156, 3.5392837524414062, 4.918575286865234, 8.551681518554688, -2.2558860778808594, -1.0839805603027344, 6.391063690185547, 3.1657886505126953, -4.9622802734375, -0.9238128662109375, 1.2118301391601562, 8.937126159667969, 2.411754608154297, 2.0823974609375, 2.413698196411133, 0.601898193359375, -15.2366943359375, 0.20562744140625, 2.502908706665039, 4.809333801269531, 5.0679779052734375, 8.0040283203125, 1.5342369079589844, 4.25970458984375, -3.6931915283203125, 2.8683509826660156, 2.6202392578125, 3.1694869995117188, -7.9989013671875, 3.0633392333984375, 0.720245361328125, 1.4482421875, 1.2685432434082031, 10.105621337890625, -0.697967529296875, 5.255626678466797, -2.905254364013672, 4.60926628112793, -0.2478199005126953, 1.6471595764160156, -5.885368347167969, 3.56915283203125, 4.516944885253906, 6.7889404296875, 6.698600769042969, -2.6064453125, 5.1756744384765625, 0.45902252197265625, -8.52001953125, -1.780029296875, 3.9824676513671875, -2.959728240966797, 6.926597595214844, -2.3722333908081055, 6.9281768798828125, 5.2041015625, 9.866455078125, 2.1667327880859375, -1.3251190185546875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000139.npy"}
{"epoch": 0.21012849584278157, "step": 140, "batch_size": 64, "mean": 2.898951292037964, "std": 5.226285457611084, "min": -11.585601806640625, "p10": -3.1576034545898435, "median": 2.4323348999023438, "p90": 9.779207611083985, "max": 14.138526916503906, "pos_frac": 0.671875, "sample": [10.122177124023438, 8.322021484375, -1.2565040588378906, -1.8129730224609375, -1.8945465087890625, 7.099952697753906, -4.9262542724609375, -4.124298095703125, 2.8227005004882812, 1.0434188842773438, -11.585601806640625, 6.109169006347656, 3.9267044067382812, 2.533721923828125, 6.9173583984375, -2.2555618286132812, -0.9322052001953125, 3.665191650390625, 4.1510009765625, 7.359550476074219, -0.08941650390625, 7.480316162109375, 6.777931213378906, 8.219207763671875, 1.8150482177734375, 9.849288940429688, -0.29802703857421875, 10.025146484375, 0.8920516967773438, -2.5313873291015625, -4.425628662109375, 0.45758056640625, 4.956518173217773, -4.286445617675781, -1.4894294738769531, 4.009246826171875, 2.055248260498047, 9.51580810546875, 10.067787170410156, 10.309249877929688, 7.813331604003906, 7.864696502685547, 9.615684509277344, 8.974468231201172, 0.05352020263671875, -1.1859760284423828, 13.897796630859375, -2.2817153930664062, 14.138526916503906, -2.0352020263671875, -3.2443313598632812, 6.214488983154297, 0.08823394775390625, 1.6313629150390625, 8.87567138671875, 1.0312366485595703, 3.7880401611328125, -2.9552383422851562, 4.074943542480469, 2.3309478759765625, 2.686920166015625, 2.220226287841797, -2.5541343688964844, -4.105735778808594], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000140.npy"}
{"epoch": 0.21164021164021163, "step": 141, "batch_size": 64, "mean": 2.8418612480163574, "std": 5.002410411834717, "min": -10.33673095703125, "p10": -3.593222618103027, "median": 3.0623531341552734, "p90": 9.366069030761718, "max": 12.881546020507812, "pos_frac": 0.671875, "sample": [4.5241241455078125, -4.418697357177734, 7.4210052490234375, -1.07818603515625, 7.1542510986328125, 0.2821807861328125, 3.4808578491210938, 9.055107116699219, 3.196420669555664, -1.9337034225463867, 0.6674957275390625, -2.9615249633789062, 3.496143341064453, -4.633247375488281, 12.881546020507812, -0.22314834594726562, 2.5717029571533203, 0.2881584167480469, -3.0163135528564453, -10.33673095703125, 0.7842559814453125, -4.081455230712891, -1.6281967163085938, 4.838600158691406, -0.46581268310546875, 2.0865402221679688, 5.569465637207031, 7.773406982421875, 2.41632080078125, -3.8404693603515625, -0.1455078125, -4.097587585449219, 4.324085235595703, 9.375114440917969, 6.809722900390625, -2.370391845703125, 8.721641540527344, -0.9173660278320312, 9.695991516113281, 12.35614013671875, 6.0743560791015625, 4.558504104614258, 3.7694244384765625, 2.1499481201171875, 11.03360366821289, 3.9623260498046875, -1.5399818420410156, 6.1317901611328125, 3.0148162841796875, 9.344963073730469, -2.3298873901367188, 1.9576568603515625, 6.333076477050781, 11.132720947265625, 3.9490509033203125, 11.247314453125, -2.7081966400146484, 7.8462066650390625, 1.3663406372070312, 7.7701873779296875, 6.0447235107421875, -0.5787925720214844, -5.38287353515625, 3.1098899841308594], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000141.npy"}
{"epoch": 0.21315192743764172, "step": 142, "batch_size": 64, "mean": 2.004155158996582, "std": 4.720561504364014, "min": -9.63555908203125, "p10": -2.414890098571777, "median": 1.548079490661621, "p90": 6.993575286865236, "max": 16.609527587890625, "pos_frac": 0.703125, "sample": [4.206844329833984, -8.151229858398438, 0.3198261260986328, 5.419971466064453, 3.0148468017578125, 11.616249084472656, 6.555610656738281, -6.885673522949219, 2.1007652282714844, 1.4305267333984375, 2.5576705932617188, 0.7698516845703125, 6.248271942138672, -5.996238708496094, 0.4652976989746094, 1.2283401489257812, -0.9083328247070312, -9.63555908203125, -2.160236358642578, 1.4776134490966797, 1.699493408203125, 2.9442138671875, 1.8752517700195312, -1.0606536865234375, 4.201377868652344, 9.988601684570312, -1.5162353515625, 1.3192138671875, 5.898307800292969, 1.1780891418457031, 1.104461669921875, 2.030609130859375, 16.609527587890625, 3.11431884765625, -0.0740966796875, -3.9929885864257812, -0.3879661560058594, -2.1690006256103516, 1.6185455322265625, 4.4432525634765625, 12.399932861328125, 0.45311737060546875, 0.5582733154296875, -2.5202713012695312, -1.3362274169921875, 0.4625568389892578, -0.2609748840332031, 7.1812744140625, -1.6759910583496094, 5.213050842285156, 2.8686561584472656, -2.0698795318603516, -1.363128662109375, 12.514480590820312, 3.826801300048828, 6.166717529296875, 4.629528045654297, 5.479042053222656, 9.0621337890625, 2.831003189086914, 2.722320556640625, -3.9108047485351562, 0.5559616088867188, 1.9796104431152344], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000142.npy"}
{"epoch": 0.2146636432350718, "step": 143, "batch_size": 64, "mean": 1.748218059539795, "std": 5.324660301208496, "min": -11.25555419921875, "p10": -3.825942230224609, "median": 1.5220613479614258, "p90": 8.611022186279298, "max": 15.417613983154297, "pos_frac": 0.578125, "sample": [0.23790740966796875, 5.1680908203125, 8.091232299804688, 6.913581848144531, 1.627349853515625, 3.788787841796875, -1.0851936340332031, -0.6592111587524414, 10.046066284179688, -2.3792076110839844, 2.545330047607422, -4.8843841552734375, 12.567245483398438, -1.2991600036621094, 6.21832275390625, 2.196338653564453, -8.183769226074219, -3.305572509765625, 6.90283203125, 0.9869461059570312, -3.4488067626953125, -4.998527526855469, 8.643562316894531, 9.991294860839844, 2.5597267150878906, 1.726470947265625, -0.14859771728515625, -6.47186279296875, 0.23717308044433594, -0.9733505249023438, -2.5740585327148438, 2.3786840438842773, -2.6682510375976562, -0.09244537353515625, 9.642105102539062, -0.43363189697265625, -3.9875717163085938, -0.49578857421875, 1.7234878540039062, 2.50885009765625, 2.1824188232421875, -11.25555419921875, 6.475772857666016, 3.4000244140625, 1.4167728424072266, 5.1732330322265625, 10.339004516601562, 8.53509521484375, -2.602203369140625, -0.9723930358886719, -0.9743728637695312, -2.4809131622314453, 4.4163055419921875, 15.417613983154297, -10.684127807617188, 8.026908874511719, 6.2574920654296875, 0.06095123291015625, 7.160039901733398, 5.862404823303223, 2.049114227294922, -0.9001607894897461, -2.9601402282714844, -0.6693305969238281], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000143.npy"}
{"epoch": 0.2161753590325019, "step": 144, "batch_size": 64, "mean": 1.2398254871368408, "std": 6.099079132080078, "min": -12.2825927734375, "p10": -5.838732147216796, "median": -0.07151222229003906, "p90": 8.938286781311039, "max": 18.689285278320312, "pos_frac": 0.484375, "sample": [7.696632385253906, 5.499641418457031, -8.984222412109375, -0.5427007675170898, -0.044216156005859375, 3.821443557739258, -0.9390754699707031, 6.800025939941406, 4.435417175292969, 0.7841415405273438, -4.162391662597656, 12.801074981689453, -6.012626647949219, -0.4482383728027344, -1.9708786010742188, -12.2825927734375, 9.316459655761719, 0.08696746826171875, -5.959751129150391, 5.837028503417969, -0.2833251953125, -1.8710556030273438, -2.1348724365234375, 3.9167938232421875, -5.097339630126953, -2.87237548828125, -5.0107269287109375, 6.476619720458984, -2.499217987060547, -0.7392768859863281, 11.1041259765625, -4.206756591796875, -4.982688903808594, -0.09880828857421875, 4.7252349853515625, -0.9244003295898438, 0.8204708099365234, 11.161945343017578, 13.619781494140625, 7.553676605224609, 5.3765869140625, 0.8764114379882812, -7.520084381103516, 5.889129638671875, -5.556354522705078, -1.8966445922851562, 6.70306396484375, -4.765571594238281, 6.810150146484375, -3.4168243408203125, -0.2984895706176758, -3.0316696166992188, 1.9788970947265625, 8.055883407592773, -7.0435638427734375, 11.13449478149414, 18.689285278320312, 7.24359130859375, -1.0023841857910156, -7.227409362792969, 3.8273544311523438, -2.0307788848876953, 1.6466941833496094, 0.5171146392822266], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000144.npy"}
{"epoch": 0.21768707482993196, "step": 145, "batch_size": 64, "mean": 2.424762010574341, "std": 5.756568908691406, "min": -15.56927490234375, "p10": -4.949123382568359, "median": 2.808474540710449, "p90": 9.452053833007817, "max": 14.219131469726562, "pos_frac": 0.75, "sample": [3.9130325317382812, 4.468349456787109, 3.6255264282226562, 4.2528533935546875, 1.0094070434570312, -15.56927490234375, -1.7091064453125, -0.3099212646484375, 11.119979858398438, -7.3756103515625, 3.089071273803711, 3.2029457092285156, 2.031728744506836, 6.784812927246094, 2.2572708129882812, 0.81158447265625, 2.1456527709960938, 12.671279907226562, -7.685760498046875, 4.720111846923828, 1.9117164611816406, 2.5001487731933594, 1.38323974609375, 8.300048828125, 6.683176040649414, 0.5170135498046875, 1.126373291015625, 3.536712646484375, -2.483489990234375, -5.188938140869141, 9.945770263671875, 4.0704803466796875, 3.3418426513671875, 14.219131469726562, -9.799957275390625, 3.4132251739501953, 5.935337066650391, 1.7078628540039062, 5.7863006591796875, 3.011075973510742, -4.4934234619140625, 6.1859130859375, 13.778366088867188, 4.723476409912109, -3.775768280029297, 3.026397705078125, 2.4067859649658203, 4.848437309265137, -0.809356689453125, 13.673126220703125, 5.20147705078125, 0.46204376220703125, 6.112926483154297, -0.072906494140625, 0.14675140380859375, 7.45648193359375, 13.454307556152344, -2.49560546875, -9.11688232421875, 2.6058731079101562, 5.6807861328125, -4.607673645019531, -5.095458984375, 2.517688751220703], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000145.npy"}
{"epoch": 0.21919879062736206, "step": 146, "batch_size": 64, "mean": 1.7500635385513306, "std": 4.657474040985107, "min": -9.712631225585938, "p10": -3.876949119567871, "median": 1.7031745910644531, "p90": 6.4900506973266605, "max": 20.22119140625, "pos_frac": 0.65625, "sample": [3.5811843872070312, 0.7647914886474609, 1.0479354858398438, 5.6315765380859375, 2.09368896484375, -4.699527740478516, 0.74169921875, -1.1010513305664062, 4.5182952880859375, -2.2343673706054688, -6.9468994140625, -9.712631225585938, 5.0233001708984375, 5.591121673583984, -4.200187683105469, -3.7667694091796875, 8.213783264160156, -1.6770858764648438, 10.201568603515625, -4.5251007080078125, 4.051555633544922, 1.8719863891601562, -0.1976318359375, -2.6881103515625, -3.906381607055664, 3.6874923706054688, -0.07503700256347656, 1.53436279296875, -2.541290283203125, -0.279815673828125, 4.152984619140625, -3.8082733154296875, 9.969558715820312, 2.6615676879882812, 4.297340393066406, 0.45224761962890625, 3.077789306640625, -0.3219451904296875, -2.803924560546875, 0.7295188903808594, 7.043487548828125, 0.16623306274414062, -0.8368854522705078, 2.34344482421875, 3.753662109375, 7.4845428466796875, 5.204463958740234, 2.0243701934814453, 6.115818023681641, 2.054698944091797, 3.500886917114258, -5.1011962890625, 5.7652740478515625, -2.9518356323242188, 1.0426197052001953, -0.23152923583984375, 20.22119140625, 4.6514892578125, 0.8308753967285156, 2.956249237060547, 6.488620758056641, 0.8566970825195312, 3.7209014892578125, 6.490663528442383], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000146.npy"}
{"epoch": 0.22071050642479215, "step": 147, "batch_size": 64, "mean": 1.8817964792251587, "std": 5.224145889282227, "min": -10.494659423828125, "p10": -3.5610675811767574, "median": 1.4355335235595703, "p90": 10.199006652832031, "max": 13.689041137695312, "pos_frac": 0.625, "sample": [1.2005386352539062, 3.961576461791992, 9.813545227050781, 0.5695075988769531, -1.1119613647460938, 2.433441162109375, -0.42317962646484375, -3.251007080078125, -0.418975830078125, -1.6234016418457031, 0.697235107421875, 5.1013336181640625, 10.20849609375, 4.375274658203125, -0.6673660278320312, 4.6407012939453125, 11.688667297363281, -10.494659423828125, 7.435039520263672, -0.28966617584228516, 5.206747055053711, 3.3607139587402344, -0.8555335998535156, -2.7812957763671875, -2.355255126953125, 3.3795814514160156, -0.18077850341796875, 2.15960693359375, 0.26442718505859375, 4.2863311767578125, -3.0000991821289062, 0.961822509765625, 2.34503173828125, -7.464992523193359, 4.877662658691406, 6.139839172363281, 4.450506210327148, 12.562225341796875, 13.565093994140625, -3.034189224243164, 10.176864624023438, 3.005706787109375, -4.7173309326171875, 2.0880680084228516, 1.6820964813232422, -3.693950653076172, 10.97061538696289, -1.7610740661621094, 13.689041137695312, -2.785388946533203, -3.0933074951171875, 3.426494598388672, 0.117828369140625, 1.6705284118652344, -5.8663177490234375, 1.816314697265625, 0.1157073974609375, -3.0013294219970703, 4.731529235839844, -4.48675537109375, -5.882390975952148, 2.2231674194335938, 12.131114959716797, 0.1451578140258789], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000147.npy"}
{"epoch": 0.2222222222222222, "step": 148, "batch_size": 64, "mean": 2.6436638832092285, "std": 5.700589179992676, "min": -8.659408569335938, "p10": -4.100946044921875, "median": 2.17452335357666, "p90": 9.661161804199219, "max": 22.449111938476562, "pos_frac": 0.734375, "sample": [4.3985595703125, -3.5362491607666016, 0.5503673553466797, -6.883701324462891, 1.2396011352539062, 7.251502990722656, 6.343650817871094, -5.009681701660156, 5.669677734375, 10.378395080566406, -3.3072710037231445, 2.6123504638671875, 6.810066223144531, 5.398826599121094, 4.7137603759765625, -5.895904541015625, -3.279237747192383, 1.085723876953125, 3.1215744018554688, -4.123207092285156, 1.75677490234375, -7.0929107666015625, 22.449111938476562, 2.226634979248047, 1.143218994140625, 9.072723388671875, 6.5481719970703125, 0.8274726867675781, 17.62469482421875, 3.941272735595703, 3.0975608825683594, 4.92730712890625, 12.80385971069336, -4.049003601074219, 3.703887939453125, 10.812232971191406, 0.602447509765625, 1.7246265411376953, -8.659408569335938, -1.2238006591796875, 5.239898681640625, -2.43109130859375, 3.6662979125976562, 11.355888366699219, -7.0429534912109375, 1.6530094146728516, 4.904605865478516, 2.1224117279052734, 9.712394714355469, -2.0751953125, 4.014368057250977, 0.5346527099609375, 1.5138282775878906, -0.40343475341796875, 1.3927631378173828, 2.8820953369140625, 7.1807098388671875, 2.458160400390625, 1.8474979400634766, 3.2837905883789062, 1.90252685546875, 9.541618347167969, -2.5301742553710938, -1.30487060546875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000148.npy"}
{"epoch": 0.2237339380196523, "step": 149, "batch_size": 64, "mean": 1.7455116510391235, "std": 4.357076644897461, "min": -9.242828369140625, "p10": -3.5014423370361323, "median": 1.8083515167236328, "p90": 7.241767120361328, "max": 11.591510772705078, "pos_frac": 0.671875, "sample": [7.597206115722656, 5.226448059082031, 2.086925506591797, -1.7145919799804688, -0.894927978515625, 0.23171234130859375, -2.7725257873535156, 1.5758953094482422, -4.451938629150391, 8.643653869628906, -8.016410827636719, 1.7493629455566406, 3.2687110900878906, 1.867340087890625, -1.3550567626953125, -2.3442039489746094, 0.5822792053222656, 2.5340118408203125, 4.007268905639648, 5.413139343261719, 3.360065460205078, 3.6896820068359375, 7.254726409912109, -1.9591064453125, 11.591510772705078, -1.576019287109375, 1.5264949798583984, 0.1801929473876953, 4.196468353271484, -7.264305114746094, 2.234477996826172, 4.247215270996094, 5.675628662109375, 10.636323928833008, 3.5558414459228516, 10.423168182373047, 7.211528778076172, -4.7003326416015625, 0.391204833984375, -3.8138351440429688, 2.95245361328125, 9.645401000976562, -1.148019790649414, 6.232452392578125, 5.689849853515625, 3.5607872009277344, 1.4243316650390625, -4.50364875793457, -0.9065818786621094, 2.471099853515625, -2.0343055725097656, 0.8829116821289062, -1.684722900390625, -9.242828369140625, -1.4591140747070312, 2.7939910888671875, 1.9984283447265625, 6.190177917480469, 1.5140190124511719, 4.292638778686523, -0.785125732421875, 4.2516632080078125, -0.7780227661132812, 0.25968170166015625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000149.npy"}
{"epoch": 0.2252456538170824, "step": 150, "batch_size": 64, "mean": 3.065699338912964, "std": 5.638877868652344, "min": -8.345245361328125, "p10": -4.594990539550781, "median": 3.315380096435547, "p90": 9.710398864746095, "max": 17.602127075195312, "pos_frac": 0.671875, "sample": [3.0545730590820312, 0.9477519989013672, 11.96917724609375, 1.7105178833007812, -5.68927001953125, 2.079010009765625, 2.2361021041870117, 2.6612625122070312, -2.16448974609375, 4.024509429931641, -0.8029403686523438, 7.9188690185546875, -6.971923828125, -3.0156707763671875, -1.5873069763183594, 4.640632629394531, 0.41864013671875, 13.085113525390625, -6.762298583984375, -3.3748092651367188, 9.092880249023438, -0.6695327758789062, 11.45947265625, -8.345245361328125, -0.392822265625, 6.013389587402344, 3.8939666748046875, 4.7689971923828125, 3.5761871337890625, 10.30007553100586, 1.4875373840332031, 4.420867919921875, 6.4501190185546875, 7.139923095703125, 4.89337158203125, -4.740291595458984, 0.7940673828125, 6.972406387329102, 9.186874389648438, 9.81005859375, 4.63189697265625, 6.846916198730469, -4.930030822753906, 2.3872909545898438, 14.251148223876953, -2.4112091064453125, 4.024513244628906, 4.643547058105469, -6.000232696533203, 17.602127075195312, 7.2462921142578125, -0.6293182373046875, 9.44097900390625, -4.255954742431641, 1.110809326171875, 7.312660217285156, -2.557384490966797, -1.45941162109375, 7.598731994628906, -2.1003265380859375, 4.814727783203125, 8.910140991210938, -0.2407684326171875, 9.477859497070312], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000150.npy"}
{"epoch": 0.22675736961451248, "step": 151, "batch_size": 64, "mean": 3.1410341262817383, "std": 5.001905918121338, "min": -7.1598663330078125, "p10": -1.7317008972167967, "median": 3.002028465270996, "p90": 8.613737106323242, "max": 18.60009765625, "pos_frac": 0.734375, "sample": [7.660865783691406, 8.41476058959961, 5.2885589599609375, -0.23150634765625, 3.2240543365478516, 0.467987060546875, 18.36431884765625, -1.1904067993164062, 0.2271270751953125, 1.1714324951171875, 5.398349761962891, -0.2912712097167969, 3.7944793701171875, -7.094818115234375, 5.586633682250977, 3.24053955078125, 6.648365020751953, -1.5497493743896484, 2.4491920471191406, 0.20435714721679688, -2.823150634765625, -1.1591262817382812, 6.095001220703125, 0.36669921875, 1.078887939453125, -4.47894287109375, 2.7800025939941406, 4.3098602294921875, -1.9168567657470703, -3.9571762084960938, 4.00355339050293, -1.5533218383789062, -0.42362213134765625, 5.6411590576171875, 4.934898376464844, -0.6533164978027344, 1.5169639587402344, -1.5556259155273438, 1.0937881469726562, 18.60009765625, 1.439035415649414, 5.875511169433594, 4.206062316894531, 6.516960144042969, 14.563163757324219, 11.418190002441406, 4.865505218505859, 2.6223602294921875, 10.684181213378906, 3.231922149658203, 1.6301765441894531, 3.6182403564453125, -1.7811107635498047, 4.70013427734375, -7.1598663330078125, 4.569711685180664, 4.377042770385742, 6.949617385864258, -1.6164112091064453, 3.383056640625, 1.0524063110351562, 11.144149780273438, 2.3540897369384766, 8.699012756347656], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000151.npy"}
{"epoch": 0.22826908541194255, "step": 152, "batch_size": 64, "mean": 2.4968485832214355, "std": 5.2658467292785645, "min": -8.411048889160156, "p10": -4.108351135253906, "median": 2.3871264457702637, "p90": 7.406730651855469, "max": 19.798690795898438, "pos_frac": 0.65625, "sample": [0.8990097045898438, -0.350982666015625, 3.7160301208496094, 4.140838623046875, -7.6198577880859375, 5.034027099609375, 7.2397308349609375, 5.536834716796875, 9.298965454101562, -2.2785797119140625, 5.843297004699707, -0.8494338989257812, 1.8490104675292969, -8.411048889160156, 7.034698486328125, 5.5897979736328125, 0.20069503784179688, 3.5285797119140625, -1.2309436798095703, -4.58447265625, -4.290504455566406, 8.52105712890625, 4.127223968505859, -3.6833267211914062, -3.051471710205078, 1.0402107238769531, 4.696800231933594, 2.9310646057128906, 2.4524526596069336, 7.3446197509765625, 0.9853916168212891, 8.20086669921875, 2.020374298095703, -0.33895111083984375, 1.0205230712890625, -7.16217041015625, 0.6917457580566406, 0.07978057861328125, 6.831550598144531, -2.7106475830078125, -1.2741622924804688, 5.861965179443359, 4.229827880859375, -1.5460128784179688, 12.461860656738281, 6.738014221191406, 2.3218002319335938, 4.008029937744141, -0.142059326171875, 3.499847412109375, -4.577301025390625, 2.9953155517578125, -0.15449142456054688, 18.237083435058594, 5.037569046020508, 7.0753021240234375, 7.433349609375, -0.8567352294921875, 3.4018096923828125, 19.798690795898438, -0.5838470458984375, -4.725734710693359, -0.5037384033203125, 6.769142150878906], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000152.npy"}
{"epoch": 0.22978080120937264, "step": 153, "batch_size": 64, "mean": 2.571483612060547, "std": 4.705831527709961, "min": -10.038970947265625, "p10": -2.528551483154297, "median": 1.8503437042236328, "p90": 8.740608215332031, "max": 14.298500061035156, "pos_frac": 0.71875, "sample": [6.175235748291016, 0.9034347534179688, -2.428955078125, 8.781417846679688, 8.6453857421875, 6.6729888916015625, -2.1666030883789062, -0.06327056884765625, 1.8381004333496094, 0.04864501953125, 0.5626010894775391, 5.255657196044922, 2.2627716064453125, 14.298500061035156, 2.8311233520507812, 7.8448333740234375, -0.5550079345703125, -2.3740768432617188, 2.2209243774414062, 10.302345275878906, 6.92877197265625, 4.083152770996094, -6.211219787597656, -2.5712356567382812, 11.981124877929688, -2.3369312286376953, 3.2353973388671875, 0.43132591247558594, -0.39658355712890625, 1.7459335327148438, 0.8624420166015625, 3.8347816467285156, -4.148332595825195, -2.1200942993164062, 4.606513977050781, 3.3676319122314453, -10.038970947265625, 6.722320556640625, 9.64581298828125, 1.3919563293457031, -3.97857666015625, 0.23096084594726562, 8.631294250488281, -4.41094970703125, 11.646110534667969, 0.2717475891113281, 7.486320495605469, 4.709991455078125, 9.453529357910156, -0.5379848480224609, -1.285491943359375, 1.8625869750976562, 0.6435298919677734, -1.69244384765625, 4.3361358642578125, 0.7187271118164062, -2.6834487915039062, 4.7883453369140625, 2.64190673828125, 5.20025634765625, 5.506919860839844, 1.599771499633789, 0.6527442932128906, 6.713104248046875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000153.npy"}
{"epoch": 0.23129251700680273, "step": 154, "batch_size": 64, "mean": 1.5183594226837158, "std": 4.860706806182861, "min": -11.354522705078125, "p10": -4.577381324768066, "median": 1.3728218078613281, "p90": 7.279844856262208, "max": 12.447669982910156, "pos_frac": 0.625, "sample": [-9.872276306152344, -6.044639587402344, -5.4539031982421875, -3.7606468200683594, 9.278076171875, 2.1711044311523438, -4.083761215209961, -0.4650993347167969, -5.85650634765625, -11.354522705078125, 1.8555889129638672, -0.1554393768310547, 2.4981613159179688, 7.2385101318359375, -0.1632537841796875, -3.7751998901367188, -0.48889923095703125, 2.405719757080078, 1.42974853515625, -2.8892822265625, -0.6034622192382812, 12.447669982910156, 8.867473602294922, 3.9169540405273438, -4.788932800292969, 10.98983383178711, 4.302848815917969, -3.7535476684570312, 2.699737548828125, -5.3416748046875, 5.396890640258789, 0.5040206909179688, 3.6641616821289062, 4.124065399169922, 6.350654602050781, -3.105144500732422, 4.189430236816406, 10.041580200195312, 1.6968450546264648, -1.8015251159667969, 0.8536415100097656, -0.6678237915039062, 4.054615020751953, -2.771108627319336, 0.8147125244140625, 7.29755973815918, 10.820281982421875, 1.1056365966796875, 0.63153076171875, -1.320831298828125, 2.435272216796875, 0.824951171875, -0.38874053955078125, 5.1674346923828125, 5.203155517578125, 6.701759338378906, 3.2132530212402344, -0.12491226196289062, 5.745746612548828, 6.8174896240234375, 0.30615234375, 2.354018211364746, 1.3158950805664062, 4.4739532470703125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000154.npy"}
{"epoch": 0.2328042328042328, "step": 155, "batch_size": 64, "mean": 2.8094191551208496, "std": 5.482178688049316, "min": -11.580963134765625, "p10": -3.4526401519775383, "median": 2.8496150970458984, "p90": 9.789048767089845, "max": 14.378868103027344, "pos_frac": 0.71875, "sample": [0.23380279541015625, -0.5073699951171875, 6.86737060546875, 1.6773490905761719, -0.6667938232421875, 4.321140289306641, 3.3068466186523438, 7.008625030517578, -4.735065460205078, 9.478195190429688, 7.5912017822265625, 4.455898284912109, -1.4650421142578125, 4.74188232421875, -1.93975830078125, 1.3679122924804688, 2.444305419921875, 9.922271728515625, 3.50616455078125, 10.119146347045898, 10.588775634765625, 0.3551483154296875, 3.436370849609375, 0.5064411163330078, 4.7568206787109375, 11.212379455566406, -4.8762969970703125, 1.7207489013671875, -0.9075813293457031, 9.092945098876953, -2.634613037109375, 3.000896453857422, 13.738971710205078, 7.6625213623046875, 11.445587158203125, 5.1933441162109375, -1.1385536193847656, 8.68280029296875, -11.580963134765625, 5.489654541015625, -4.12554931640625, 6.888256072998047, 1.635284423828125, -0.7022781372070312, 14.378868103027344, 3.8473167419433594, 0.28348541259765625, 3.0882949829101562, -10.387931823730469, 3.365337371826172, 9.00341796875, -2.65826416015625, 1.2402667999267578, -0.745391845703125, 1.2484283447265625, 0.22707366943359375, 2.698333740234375, -3.7930870056152344, -10.11480712890625, 0.024837493896484375, 8.947601318359375, 7.6194610595703125, 6.262105941772461, -1.901702880859375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000155.npy"}
{"epoch": 0.23431594860166288, "step": 156, "batch_size": 64, "mean": 2.822887659072876, "std": 4.789482116699219, "min": -11.175697326660156, "p10": -2.970945739746093, "median": 2.6191635131835938, "p90": 9.576890945434576, "max": 13.472854614257812, "pos_frac": 0.75, "sample": [-0.089508056640625, -1.0855026245117188, -0.6734523773193359, -1.7773704528808594, 13.472854614257812, 5.6471099853515625, 10.108213424682617, -3.3901824951171875, 3.1753692626953125, 8.455944061279297, 1.5099258422851562, 13.083534240722656, -5.7371063232421875, 2.5233001708984375, -2.60205078125, 4.824619293212891, 5.6803741455078125, 10.057296752929688, 0.7157516479492188, 4.381275177001953, 0.78924560546875, 0.6401596069335938, 5.231925964355469, 4.353240966796875, 6.656219482421875, 2.71502685546875, 0.06558799743652344, 6.3964996337890625, 3.1614227294921875, 1.6776580810546875, -3.1290435791015625, 3.6214447021484375, 0.9961090087890625, -1.543670654296875, 11.1737060546875, 12.701240539550781, 0.3581085205078125, 2.2504196166992188, -2.539520263671875, 4.7431182861328125, 5.6492767333984375, 6.158905029296875, -1.1266021728515625, 0.7244033813476562, 4.018672943115234, 1.6703262329101562, -2.38116455078125, -3.506378173828125, 2.1821441650390625, 5.240043640136719, 1.0398845672607422, 8.30816650390625, 4.038385391235352, -11.175697326660156, 7.411872863769531, 1.03509521484375, -4.410911560058594, 5.141143798828125, 7.250640869140625, 1.2774391174316406, 4.912261962890625, 10.392105102539062, 2.8145065307617188, -4.5989990234375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000156.npy"}
{"epoch": 0.23582766439909297, "step": 157, "batch_size": 64, "mean": 2.047905206680298, "std": 4.723270416259766, "min": -9.656631469726562, "p10": -2.6146381378173826, "median": 1.9407501220703125, "p90": 7.231800842285158, "max": 17.105690002441406, "pos_frac": 0.65625, "sample": [12.94399642944336, 9.125598907470703, 1.9454727172851562, 3.803607940673828, 1.4542617797851562, -7.9571075439453125, 1.1544647216796875, 4.2030029296875, 0.14055633544921875, -1.875823974609375, 1.5267562866210938, -0.8131179809570312, 3.3701629638671875, -1.0750312805175781, -6.5601043701171875, 0.41916465759277344, -1.3494338989257812, 1.3285140991210938, 6.051235198974609, 4.098352432250977, 4.343315124511719, 4.461326599121094, 17.105690002441406, 2.2950353622436523, 2.8016815185546875, 2.0920028686523438, -4.767494201660156, -2.3741207122802734, 1.9360275268554688, -1.0394821166992188, 4.400640487670898, -2.6806640625, 2.124492645263672, 6.027854919433594, 4.981353759765625, 9.938667297363281, 0.29721832275390625, 5.420803070068359, 7.3895416259765625, -1.3449630737304688, 4.077648162841797, 6.863739013671875, -1.6834182739257812, 12.214485168457031, -2.62091064453125, -4.077249526977539, -1.33502197265625, 4.028617858886719, -1.6510295867919922, -0.012332916259765625, 8.83837890625, -2.6000022888183594, -0.9076728820800781, 2.218303680419922, 2.5968027114868164, 6.5866241455078125, 0.13006210327148438, 5.7188720703125, -9.656631469726562, 2.0416107177734375, 5.161918640136719, -0.26848602294921875, -1.8284759521484375, 1.8866500854492188], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000157.npy"}
{"epoch": 0.23733938019652306, "step": 158, "batch_size": 64, "mean": 2.284956693649292, "std": 4.727978706359863, "min": -8.864006042480469, "p10": -3.5257320404052734, "median": 2.567089080810547, "p90": 8.541793823242188, "max": 13.807830810546875, "pos_frac": 0.671875, "sample": [0.49306488037109375, -1.7236976623535156, 2.1939544677734375, 3.4132461547851562, -2.29559326171875, 3.5733985900878906, 13.571487426757812, 8.30755615234375, 1.452728271484375, -0.01776885986328125, 3.8978805541992188, 3.8565673828125, 8.3797607421875, 4.177509307861328, -4.9496612548828125, 3.525909423828125, 13.807830810546875, 2.2885475158691406, -0.7123003005981445, 9.947750091552734, -3.5761947631835938, -0.5095672607421875, 4.7728118896484375, 2.9475746154785156, 9.961254119873047, 6.1103363037109375, 2.152587890625, 3.3383636474609375, 7.11724853515625, -4.416526794433594, -6.275604248046875, 2.0199050903320312, 2.8546180725097656, -1.6680068969726562, 6.4676055908203125, 9.122575759887695, 8.751968383789062, -1.489776611328125, -1.54913330078125, -8.864006042480469, 5.063232421875, 6.964622497558594, 3.934814453125, 8.611236572265625, -2.6322021484375, 3.5171241760253906, -1.72900390625, 1.4047698974609375, 7.470489501953125, 2.1851348876953125, 2.673675537109375, 2.4605026245117188, -3.4079856872558594, -2.3686676025390625, 0.00279998779296875, -1.6450519561767578, -0.7935543060302734, 4.388038635253906, 4.405967712402344, 1.0199031829833984, -4.724250793457031, 2.967113494873047, 3.217327117919922, -7.20501708984375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000158.npy"}
{"epoch": 0.23885109599395313, "step": 159, "batch_size": 64, "mean": 2.7471323013305664, "std": 4.564988613128662, "min": -14.56689453125, "p10": -1.7588684082031247, "median": 2.536618232727051, "p90": 8.855379486083985, "max": 16.33935546875, "pos_frac": 0.734375, "sample": [8.906257629394531, 2.9131240844726562, 3.2234115600585938, 8.736663818359375, 4.657310485839844, -3.5744857788085938, 16.33935546875, 6.405059814453125, 0.406768798828125, 5.1985931396484375, 3.760650634765625, -1.0542831420898438, 6.785148620605469, -2.9495925903320312, 0.04009246826171875, 5.9866943359375, 4.016754150390625, 13.513458251953125, 6.443244934082031, 3.0778656005859375, 2.6920318603515625, -1.57647705078125, 1.4629640579223633, 4.522010803222656, 5.987937927246094, 0.38536834716796875, 0.8967304229736328, -0.7208633422851562, 4.0394744873046875, 9.534713745117188, -3.1573143005371094, 8.703445434570312, 4.617952346801758, -0.23369598388671875, 2.0726776123046875, 2.2397079467773438, 1.1749153137207031, 4.270111083984375, -1.4151840209960938, -0.13500213623046875, 2.7064647674560547, 0.7104644775390625, 10.373031616210938, -0.4109382629394531, 1.6409759521484375, 2.381204605102539, -1.9524497985839844, 4.457683563232422, -2.061279296875, 3.016021728515625, 0.3614845275878906, 9.038656234741211, 2.900493621826172, 2.2156906127929688, -0.6029453277587891, -14.56689453125, 3.32476806640625, -1.8370361328125, -0.22319793701171875, 0.6368389129638672, 1.6755447387695312, -0.7053909301757812, 5.1074066162109375, 9.436264038085938], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000159.npy"}
{"epoch": 0.24036281179138322, "step": 160, "batch_size": 64, "mean": 1.9154568910598755, "std": 4.364707946777344, "min": -6.272464752197266, "p10": -3.0670516967773436, "median": 1.8922882080078125, "p90": 6.730179786682129, "max": 16.030746459960938, "pos_frac": 0.65625, "sample": [1.4826984405517578, -0.650970458984375, 6.562774658203125, 5.358650207519531, -2.5823097229003906, 7.769279479980469, 0.3019256591796875, -1.9909610748291016, 5.953926086425781, -2.8696212768554688, 11.215072631835938, -0.3909721374511719, 6.760126113891602, 6.4238128662109375, 1.8907356262207031, 0.649993896484375, 2.078035354614258, -5.613094329833984, 2.468647003173828, -6.272464752197266, 1.6532669067382812, -2.3490333557128906, 2.0543365478515625, -1.2691154479980469, 5.181861877441406, 5.184566497802734, 4.328754425048828, 0.37724876403808594, 3.2677268981933594, 1.9660682678222656, 6.766485214233398, 2.2863998413085938, 16.030746459960938, 10.358768463134766, 0.31612396240234375, 0.7095947265625, 3.2242431640625, -0.497039794921875, 3.22637939453125, 3.5006790161132812, -0.000110626220703125, 4.982322692871094, -3.1516647338867188, -4.1937103271484375, -0.49675559997558594, 1.9207000732421875, -2.0330047607421875, -4.2542266845703125, 1.754159927368164, 1.8938407897949219, 5.685089111328125, -1.7675275802612305, 1.85797119140625, 6.660305023193359, 2.2408447265625, -2.1791152954101562, -1.727752685546875, 4.575672149658203, -5.5746917724609375, 3.0138168334960938, 4.116233825683594, 11.256404876708984, -1.574249267578125, -5.278656005859375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000160.npy"}
{"epoch": 0.2418745275888133, "step": 161, "batch_size": 64, "mean": 1.9825514554977417, "std": 4.392550945281982, "min": -14.351402282714844, "p10": -1.3435333251953123, "median": 1.801025390625, "p90": 8.247729492187501, "max": 12.595806121826172, "pos_frac": 0.671875, "sample": [6.4492340087890625, -8.80031967163086, -0.9841690063476562, 3.211517333984375, -1.4025115966796875, -0.6555404663085938, 1.14666748046875, 8.334159851074219, 0.21989059448242188, 1.1271514892578125, -6.224754333496094, 0.5392532348632812, 4.463001251220703, 1.8989944458007812, 1.9425735473632812, -0.6899871826171875, 2.5539093017578125, -14.351402282714844, 3.3264427185058594, -4.395698547363281, -0.328704833984375, 5.424346923828125, 3.4599456787109375, -0.18871307373046875, 5.291534423828125, -0.3299903869628906, 3.8461227416992188, 0.5107154846191406, -2.3759613037109375, 8.046058654785156, 2.9776058197021484, 2.640727996826172, 2.029613494873047, -1.2059173583984375, 9.46990966796875, -0.9265670776367188, 1.2866859436035156, 2.4698638916015625, 2.7203140258789062, 9.939285278320312, 0.7514266967773438, 5.558219909667969, -0.08202743530273438, -0.35750579833984375, 3.9600181579589844, 0.9518585205078125, 1.1729202270507812, 3.817739486694336, 4.492942810058594, 4.6285247802734375, -1.0995445251464844, 4.476909637451172, 8.77476692199707, 1.7030563354492188, 2.51593017578125, -0.6441679000854492, -1.1968994140625, 0.26846885681152344, 8.720878601074219, 10.262168884277344, 12.595806121826172, -0.20538330078125, -2.20382022857666, 5.555717468261719], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000161.npy"}
{"epoch": 0.24338624338624337, "step": 162, "batch_size": 64, "mean": 3.3071675300598145, "std": 4.060445785522461, "min": -9.680999755859375, "p10": -1.4611656188964843, "median": 3.3654870986938477, "p90": 7.307748794555664, "max": 17.091476440429688, "pos_frac": 0.78125, "sample": [6.05560302734375, 2.9792251586914062, -9.680999755859375, 1.1670799255371094, 3.22515869140625, 4.0409088134765625, 1.8585872650146484, 4.812347412109375, 3.8592529296875, 6.8319091796875, 3.7773513793945312, 1.216827392578125, -0.342926025390625, 17.091476440429688, 2.73089599609375, 7.843151092529297, 0.81475830078125, 6.405513763427734, 4.737762451171875, 7.3284912109375, 7.259349822998047, 3.518218994140625, 2.9918441772460938, -4.173614501953125, 6.135242462158203, 2.8865966796875, -1.421234130859375, 2.4452056884765625, -1.4782791137695312, 5.929107666015625, 3.5058155059814453, 0.8341522216796875, -1.642913818359375, 8.058837890625, 6.2269287109375, 7.196685791015625, 9.93552017211914, 2.7308921813964844, 0.5334854125976562, -1.3933067321777344, -1.0985794067382812, 3.6594467163085938, 10.935920715332031, 5.722969055175781, 2.7076168060302734, -2.494140625, -2.5511207580566406, -1.0857620239257812, 2.8116989135742188, -0.6936073303222656, -2.2264137268066406, 7.7759246826171875, 6.099891662597656, 4.4584503173828125, 7.218364715576172, 2.8174819946289062, 4.376220703125, -1.0081672668457031, 4.884307861328125, 1.896963119506836, 2.613861083984375, 6.64335823059082, 5.366199493408203, 5.996925354003906], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000162.npy"}
{"epoch": 0.24489795918367346, "step": 163, "batch_size": 64, "mean": 2.5833659172058105, "std": 4.363242149353027, "min": -5.665584564208984, "p10": -3.439841651916503, "median": 2.4967193603515625, "p90": 8.223311614990239, "max": 14.874496459960938, "pos_frac": 0.75, "sample": [-1.576664924621582, 2.801727294921875, 1.298614501953125, 3.7966537475585938, 1.658966064453125, 2.525012969970703, -5.665584564208984, 0.291351318359375, 14.284538269042969, -0.3783721923828125, 3.1982879638671875, 8.643600463867188, 7.242637634277344, 2.0562896728515625, 4.129127502441406, -0.8212738037109375, 14.874496459960938, 0.208221435546875, 5.411590576171875, 9.351509094238281, 0.37911224365234375, 3.9189529418945312, 6.225196838378906, -0.5633277893066406, -3.8083248138427734, 7.114570617675781, 1.6125221252441406, 2.6971206665039062, 0.9658050537109375, -0.05290985107421875, 6.113435745239258, 0.07976913452148438, 4.381874084472656, 3.847959518432617, 4.4676513671875, 3.9558944702148438, -4.274768829345703, 4.861351013183594, -1.7485809326171875, 2.2477035522460938, 2.4928207397460938, 8.674549102783203, -2.580047607421875, 4.997142791748047, -3.854217529296875, -1.4176082611083984, 0.7706298828125, 2.5006179809570312, 0.3739604949951172, -5.0318756103515625, -5.033409118652344, 5.015970230102539, 4.795738220214844, 0.17221641540527344, 0.20403289794921875, -5.133110046386719, 10.915763854980469, 4.233154296875, 6.5554351806640625, 8.900091171264648, 5.223222732543945, -1.2512397766113281, 2.3052310943603516, 5.754604339599609], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000163.npy"}
{"epoch": 0.24640967498110355, "step": 164, "batch_size": 64, "mean": 3.281121253967285, "std": 5.095789432525635, "min": -10.165084838867188, "p10": -2.549170684814453, "median": 2.997528076171875, "p90": 9.945471954345704, "max": 13.610504150390625, "pos_frac": 0.75, "sample": [-2.1091766357421875, -0.7729377746582031, 0.5320777893066406, -0.9576950073242188, 6.589569091796875, 5.2164459228515625, 11.113578796386719, 9.0552978515625, 5.6333770751953125, -1.7610359191894531, 0.1251373291015625, -2.7377395629882812, 4.031497955322266, -0.8410472869873047, 4.341484069824219, 0.5766143798828125, 8.203018188476562, 1.863480567932129, 9.040283203125, -0.73876953125, 0.8299541473388672, -4.4461669921875, 3.2064361572265625, 2.64385986328125, 0.0005950927734375, -0.2884235382080078, 5.535774230957031, 4.266448974609375, 2.7886199951171875, 4.251918792724609, 5.085905075073242, 10.84145736694336, 1.2061424255371094, 3.3004608154296875, 5.0985260009765625, -7.1833648681640625, 7.804534912109375, 13.182647705078125, 10.897598266601562, -10.165084838867188, -1.8576221466064453, 1.441741943359375, 2.7039337158203125, 9.153861999511719, 2.2269668579101562, 2.6007308959960938, 5.00384521484375, 0.17205429077148438, 13.610504150390625, 8.4041748046875, 8.9517822265625, -3.6399574279785156, 10.112983703613281, -5.690816879272461, -0.0798492431640625, -4.5082244873046875, 0.12114715576171875, 11.782241821289062, 9.545394897460938, 0.6094818115234375, 9.554611206054688, 6.633598327636719, 3.824957847595215, 4.052925109863281], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000164.npy"}
{"epoch": 0.24792139077853365, "step": 165, "batch_size": 64, "mean": 3.116410732269287, "std": 4.889015197753906, "min": -10.0494384765625, "p10": -2.3463619232177733, "median": 2.8582935333251953, "p90": 8.636319732666019, "max": 15.72125244140625, "pos_frac": 0.765625, "sample": [14.67803955078125, 4.304004669189453, 6.856842994689941, 5.354892730712891, 5.649955749511719, 1.6085357666015625, 0.9692115783691406, -2.2433929443359375, 3.778942108154297, 2.07025146484375, 4.420082092285156, 2.6214256286621094, 9.242210388183594, 0.3154106140136719, 1.3726577758789062, 2.6274642944335938, 0.40271759033203125, 7.677219390869141, -1.7369918823242188, -0.7475662231445312, -10.0494384765625, -3.166461944580078, 6.552619934082031, -1.2670516967773438, 10.858001708984375, 7.230140686035156, 0.8834056854248047, 3.151540756225586, 7.659507751464844, 7.671695709228516, 2.7070388793945312, -0.9770126342773438, 15.72125244140625, 3.06207275390625, 5.192718505859375, -7.4005126953125, -0.8863449096679688, 4.095634460449219, 9.037940979003906, 6.7477264404296875, 0.50042724609375, -2.390491485595703, -0.6448822021484375, 2.3651885986328125, 6.763946533203125, -4.279991149902344, 1.5909461975097656, 4.0081939697265625, 3.0095481872558594, -1.5608291625976562, 2.2216720581054688, 3.11651611328125, -4.6103363037109375, 2.3205718994140625, 3.534212112426758, -4.606407165527344, 7.6992034912109375, 3.6012115478515625, 5.900430679321289, 4.014579772949219, 11.490806579589844, 2.653411865234375, 14.3111572265625, 2.3948211669921875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000165.npy"}
{"epoch": 0.2494331065759637, "step": 166, "batch_size": 64, "mean": 2.025606393814087, "std": 4.392224311828613, "min": -7.585659027099609, "p10": -3.1012306213378906, "median": 1.1635046005249023, "p90": 7.933461761474611, "max": 11.922676086425781, "pos_frac": 0.671875, "sample": [11.922676086425781, 0.1126708984375, -0.7499847412109375, 4.842674255371094, 6.022857666015625, -1.5400161743164062, -1.534698486328125, 11.825813293457031, -1.1896514892578125, 0.917724609375, 1.8319664001464844, 2.0905590057373047, 6.041973114013672, -6.0852508544921875, 7.39166259765625, 1.6262359619140625, -0.4132080078125, 3.687070846557617, 0.2358551025390625, 2.7635879516601562, -1.4296379089355469, 6.977241516113281, -7.585659027099609, 1.8471946716308594, 8.087451934814453, 0.08847427368164062, 0.2879791259765625, 3.39385986328125, 7.603675842285156, 4.2726898193359375, -4.3963775634765625, -4.1144866943359375, -1.3676300048828125, -0.259674072265625, 8.074798583984375, 1.2024803161621094, 9.571548461914062, -2.9035491943359375, 4.999530792236328, 4.263019561767578, 2.3892669677734375, 10.7535400390625, -1.5599746704101562, 1.0800704956054688, 4.463958740234375, -3.1859512329101562, -1.6142730712890625, -1.9044570922851562, 2.7258853912353516, 11.675189971923828, 3.5756988525390625, 2.549236297607422, -3.312255859375, 1.1245288848876953, -2.5214195251464844, 0.39530181884765625, 7.5530242919921875, -1.2546091079711914, 7.260307312011719, -3.5699844360351562, 2.574859619140625, 0.8263931274414062, 0.5529117584228516, 0.6481170654296875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000166.npy"}
{"epoch": 0.2509448223733938, "step": 167, "batch_size": 64, "mean": 2.5901713371276855, "std": 4.083134174346924, "min": -10.434158325195312, "p10": -1.1783671379089353, "median": 2.6527099609375, "p90": 7.836079406738282, "max": 12.985353469848633, "pos_frac": 0.75, "sample": [2.2846145629882812, 3.4921302795410156, 2.8074073791503906, 3.49530029296875, -0.9397125244140625, 7.002166748046875, 7.174896240234375, -0.4084587097167969, 1.6134605407714844, 10.565330505371094, 1.895345687866211, 0.14841079711914062, 2.7207260131835938, -2.511730194091797, -0.17349815368652344, -1.4798583984375, 2.573760986328125, 10.04248046875, 6.0071868896484375, -7.26678466796875, -4.010295867919922, 0.050933837890625, 1.0735969543457031, 4.40313720703125, 3.417085647583008, -10.434158325195312, 3.71160888671875, -0.41310882568359375, 0.0898590087890625, 4.725341796875, 3.6655426025390625, 5.9820098876953125, -1.2275896072387695, 4.9341278076171875, 2.8786849975585938, 2.526763916015625, 3.344440460205078, 5.2957000732421875, 11.155517578125, 2.9092636108398438, -0.2608985900878906, 7.8897705078125, -0.9657135009765625, 2.05859375, 7.7108001708984375, -0.291473388671875, -3.2034683227539062, 7.9702301025390625, 0.39492034912109375, 2.5846939086914062, 5.95123291015625, 0.6039352416992188, 2.345417022705078, 12.985353469848633, 2.9725894927978516, 4.064579010009766, 1.7893142700195312, 3.9258575439453125, 2.7524452209472656, 9.583305358886719, 3.0466461181640625, 0.3782176971435547, -0.5735015869140625, -1.0635147094726562], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000167.npy"}
{"epoch": 0.25245653817082386, "step": 168, "batch_size": 64, "mean": 2.1044440269470215, "std": 3.639554262161255, "min": -5.543495178222656, "p10": -2.502143096923828, "median": 2.120502471923828, "p90": 6.720437240600588, "max": 13.598541259765625, "pos_frac": 0.75, "sample": [3.778301239013672, 1.3143463134765625, 5.334136962890625, -2.2682952880859375, 0.8584308624267578, 4.5475616455078125, 2.1978759765625, -2.6023635864257812, -3.952831268310547, -2.9746551513671875, 2.22100830078125, 0.05599212646484375, -1.931304931640625, 5.699867248535156, 5.89361572265625, 3.6321239471435547, 6.143390655517578, 4.94683837890625, 1.1776123046875, 8.03717041015625, 2.8351974487304688, 2.4354496002197266, 0.5480422973632812, 2.316570281982422, -0.7383251190185547, 0.3291053771972656, 7.3495635986328125, 2.1118698120117188, -5.543495178222656, 2.8293991088867188, 0.659393310546875, 2.823211669921875, 3.2671051025390625, 2.67919921875, 3.5081348419189453, -3.9627227783203125, 10.239532470703125, 0.7481422424316406, -0.6485328674316406, 0.118743896484375, 3.419506072998047, 13.598541259765625, 1.11138916015625, 5.991355895996094, 3.6259117126464844, -4.1476898193359375, 3.43829345703125, 0.9917449951171875, 2.1291351318359375, -0.0984344482421875, 10.203643798828125, 1.0734634399414062, 1.3148078918457031, -3.1416473388671875, -0.3782501220703125, 7.249584197998047, -2.002593994140625, -0.4124031066894531, 0.3910675048828125, 0.3495521545410156, 3.259716033935547, 3.8174362182617188, 6.967742919921875, -0.08185577392578125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000168.npy"}
{"epoch": 0.25396825396825395, "step": 169, "batch_size": 64, "mean": 2.772446393966675, "std": 4.691803455352783, "min": -9.447708129882812, "p10": -2.2025861740112305, "median": 2.460906982421875, "p90": 9.68982887268067, "max": 14.79864501953125, "pos_frac": 0.65625, "sample": [-5.4199981689453125, 3.766571044921875, 8.530136108398438, -2.0608749389648438, 1.7063121795654297, 0.8064498901367188, 4.58868408203125, -1.6786117553710938, 0.8934478759765625, 1.4658775329589844, -3.1480484008789062, -0.5394401550292969, 12.300384521484375, 4.4712371826171875, 6.168235778808594, -3.296937942504883, 4.350269317626953, 2.7852706909179688, 4.6838226318359375, 2.5761489868164062, -0.7646560668945312, 6.01513671875, 7.799308776855469, 2.3456649780273438, -2.9633636474609375, 14.79864501953125, 4.8982696533203125, 3.290557861328125, 0.6325912475585938, 3.76727294921875, -0.5591163635253906, 4.812469482421875, 1.4351882934570312, -2.128358840942383, 0.7257862091064453, -1.1698856353759766, 10.66644287109375, -1.8031997680664062, 5.0916595458984375, 2.196338653564453, -0.06904029846191406, 2.0515594482421875, 3.846437454223633, 2.648406982421875, -0.49329566955566406, 5.735816955566406, 5.631156921386719, -0.962921142578125, 6.07330322265625, 6.3869781494140625, -1.6380996704101562, 4.507394790649414, 6.545417785644531, -9.447708129882812, -3.260528564453125, 10.447708129882812, 10.500480651855469, 14.216407775878906, -0.38660430908203125, 10.186840057373047, -2.2343978881835938, 6.158212661743164, -0.4795989990234375, -0.56304931640625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000169.npy"}
{"epoch": 0.25547996976568405, "step": 170, "batch_size": 64, "mean": 2.0562214851379395, "std": 5.595729351043701, "min": -11.614233016967773, "p10": -4.947102355957031, "median": 1.9159936904907227, "p90": 8.88921051025391, "max": 14.985275268554688, "pos_frac": 0.671875, "sample": [9.274826049804688, -5.350421905517578, 1.914926528930664, 0.2539215087890625, 5.217794418334961, 12.442703247070312, 0.7606410980224609, 1.394430160522461, 3.8978805541992188, 6.6907958984375, -4.957672119140625, 2.2153072357177734, 5.856903076171875, 12.108612060546875, 6.976936340332031, -2.9045181274414062, 3.1780452728271484, 5.892181396484375, -0.0516510009765625, -4.6521759033203125, 1.3020095825195312, 13.1448974609375, -1.6956787109375, -0.9896354675292969, -3.8374481201171875, -1.4577255249023438, -5.535125732421875, 2.613525390625, -11.614233016967773, 5.1570281982421875, 7.5349884033203125, 3.33245849609375, 11.272323608398438, 2.8032302856445312, -6.229315757751465, 0.3907508850097656, -4.432403564453125, 2.5009231567382812, 6.0044403076171875, -2.737640380859375, -0.12023162841796875, -3.460622787475586, 1.410980224609375, 4.668510437011719, 4.934841156005859, 7.98944091796875, 1.496429443359375, 13.4312744140625, 0.9498062133789062, 2.078033447265625, -4.9224395751953125, -4.731025695800781, 1.9170608520507812, 7.325469970703125, -7.349571228027344, 2.8628578186035156, 0.9375190734863281, 14.985275268554688, -2.0685882568359375, 2.36767578125, 7.226222991943359, 7.573310852050781, 1.316741943359375, -6.907623291015625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000170.npy"}
{"epoch": 0.25699168556311414, "step": 171, "batch_size": 64, "mean": 3.0714292526245117, "std": 4.976133346557617, "min": -10.348365783691406, "p10": -2.2065093994140623, "median": 3.3284988403320312, "p90": 9.679034423828126, "max": 14.694183349609375, "pos_frac": 0.6875, "sample": [2.1115646362304688, 0.18221282958984375, 8.348953247070312, 7.019798278808594, 5.371940612792969, 10.900054931640625, 7.43670654296875, 9.25653076171875, 1.0566797256469727, 6.659326553344727, -0.7528018951416016, 3.258819580078125, 5.317314147949219, 7.9789276123046875, 6.3707427978515625, 1.4510726928710938, 4.104892730712891, 1.387603759765625, 3.8624353408813477, -0.5467605590820312, -0.14505386352539062, 5.953460693359375, 4.11492919921875, 11.9053955078125, 5.688068389892578, -0.029201507568359375, -1.7419586181640625, -2.028411865234375, 11.877838134765625, -2.7791061401367188, 5.8776397705078125, 2.744140625, 5.2222137451171875, -3.512664794921875, 5.704551696777344, 4.679054260253906, -1.2989578247070312, -10.348365783691406, 10.251903533935547, 9.231704711914062, -9.859771728515625, -0.3346214294433594, 3.7626266479492188, 3.3981781005859375, 1.272552490234375, 14.694183349609375, 4.3396148681640625, -1.192230224609375, 2.6922607421875, 5.24591064453125, 1.7718849182128906, 10.285388946533203, 7.243673324584961, -0.5198135375976562, -6.737567901611328, -2.2828369140625, 3.700958251953125, 1.2147598266601562, -0.303680419921875, -2.9774322509765625, 9.860107421875, 1.2286376953125, -0.5070953369140625, -1.5673980712890625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000171.npy"}
{"epoch": 0.2585034013605442, "step": 172, "batch_size": 64, "mean": 2.6176977157592773, "std": 4.639049053192139, "min": -9.361894607543945, "p10": -2.7363525390624996, "median": 1.8398799896240234, "p90": 8.774855232238771, "max": 13.523284912109375, "pos_frac": 0.734375, "sample": [-2.4427490234375, 0.5363502502441406, 2.306732177734375, 7.3218231201171875, -1.72650146484375, 1.9471569061279297, -0.31890869140625, 7.228099822998047, 9.7781982421875, 8.148893356323242, 13.523284912109375, -3.464630126953125, 1.0927886962890625, -0.6663551330566406, 1.1959152221679688, 12.212005615234375, 6.597343444824219, -0.36623382568359375, 1.2335853576660156, 3.1829185485839844, 9.310226440429688, -0.478515625, -4.386590957641602, 0.9223480224609375, 2.960803985595703, 0.7014350891113281, 1.2112503051757812, 2.7830657958984375, 5.300201416015625, -1.48883056640625, 1.2557144165039062, 0.7692108154296875, 4.56256103515625, 9.923660278320312, 1.1545333862304688, -2.8621826171875, 8.836772918701172, -3.1679458618164062, 8.623027801513672, 0.7053985595703125, -1.0409717559814453, 3.5320091247558594, 2.2313461303710938, 7.747398376464844, 3.6145782470703125, -7.985565185546875, 1.1287384033203125, 6.84515380859375, -1.015472412109375, 3.0374507904052734, -5.035655975341797, 7.395599365234375, 9.916019439697266, -9.361894607543945, 3.221240997314453, 7.45477294921875, 2.0194244384765625, 0.5907135009765625, 0.8383808135986328, 8.630380630493164, -1.486663818359375, 1.7326030731201172, 5.51275634765625, 4.054450988769531], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000172.npy"}
{"epoch": 0.2600151171579743, "step": 173, "batch_size": 64, "mean": 2.6660757064819336, "std": 5.35465145111084, "min": -9.25851058959961, "p10": -3.8302091598510737, "median": 1.794478416442871, "p90": 9.623696899414064, "max": 18.050216674804688, "pos_frac": 0.640625, "sample": [-7.595113754272461, 4.417106628417969, -4.908458709716797, 3.2280044555664062, 1.4128189086914062, 11.028739929199219, 6.06689453125, -3.1841049194335938, -0.8305072784423828, 3.220844268798828, -0.5833854675292969, 5.071041107177734, 7.1870269775390625, -5.07847785949707, -3.279794692993164, 9.745086669921875, 7.7879638671875, 0.9770774841308594, 1.2204837799072266, 0.8901290893554688, 9.3404541015625, 4.712135314941406, -1.5489578247070312, 1.0608978271484375, -9.25851058959961, -4.06610107421875, 7.4608306884765625, 1.5955276489257812, -0.16617584228515625, 4.148983001708984, -0.9428787231445312, 10.373367309570312, -2.2004451751708984, 8.759185791015625, 5.832679748535156, -0.5262575149536133, 5.822135925292969, -6.440319061279297, 7.4951019287109375, 8.973373413085938, 3.3311080932617188, 2.2504119873046875, 3.474367141723633, 8.632728576660156, -0.100799560546875, -4.430103302001953, 0.9603652954101562, -2.1967315673828125, 18.050216674804688, 11.152332305908203, 8.69174575805664, 2.0556983947753906, 1.5703811645507812, 8.834014892578125, 10.497947692871094, -0.15595245361328125, -1.89459228515625, -2.849903106689453, 1.993429183959961, 9.977264404296875, -0.5916595458984375, 5.465547561645508, -1.436248779296875, 0.128875732421875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000173.npy"}
{"epoch": 0.2615268329554044, "step": 174, "batch_size": 64, "mean": 2.9072351455688477, "std": 5.155368328094482, "min": -11.121837615966797, "p10": -3.814033508300781, "median": 2.5152788162231445, "p90": 9.306307983398439, "max": 14.372634887695312, "pos_frac": 0.703125, "sample": [1.822174072265625, 7.3318939208984375, 5.553951263427734, 1.6233596801757812, 3.7663955688476562, 2.376495361328125, 7.660057067871094, 9.187789916992188, -3.953857421875, 3.3122501373291016, 12.669319152832031, 14.372634887695312, 2.7251663208007812, 0.5907630920410156, 2.5600433349609375, -0.2804069519042969, 9.357101440429688, 5.877191543579102, 3.9565505981445312, -7.7045440673828125, 13.1240234375, 2.2595901489257812, -0.23520278930664062, 4.714962005615234, 14.289787292480469, -0.37613677978515625, 4.65325927734375, -0.065582275390625, 10.688179016113281, 7.034095764160156, -0.7347564697265625, 8.505020141601562, 8.977558135986328, 0.2616996765136719, 4.650852203369141, -3.4877777099609375, 2.4373779296875, -4.1954498291015625, 3.557401657104492, 6.525970458984375, -1.45062255859375, 3.6185150146484375, 0.33539581298828125, 1.7122802734375, -4.1658782958984375, 9.032066345214844, 3.3595123291015625, -0.7877273559570312, 10.796035766601562, 2.4705142974853516, -5.80322265625, 1.5982513427734375, 3.058349609375, 2.595560073852539, -2.7699127197265625, 1.7826766967773438, 5.9942779541015625, 1.9162063598632812, -0.485626220703125, -11.121837615966797, 5.387714385986328, -4.099113464355469, -1.794708251953125, -0.5048599243164062], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000174.npy"}
{"epoch": 0.26303854875283444, "step": 175, "batch_size": 64, "mean": 2.807791233062744, "std": 5.244864463806152, "min": -10.095733642578125, "p10": -3.4934974670410153, "median": 2.6104965209960938, "p90": 10.43399353027344, "max": 15.588752746582031, "pos_frac": 0.734375, "sample": [-4.4961395263671875, 5.374143600463867, -4.367156982421875, 1.9919052124023438, 0.37577056884765625, 8.165412902832031, -3.2255020141601562, 1.917633056640625, -1.00616455078125, 2.9082107543945312, 5.268104553222656, 9.75653076171875, 0.8215713500976562, 3.7148513793945312, -0.42728424072265625, 8.74321174621582, 1.5126953125, 13.262382507324219, 2.6225738525390625, -2.414276123046875, 1.4733123779296875, 3.4543533325195312, 3.8718338012695312, 4.6051025390625, 1.1278533935546875, 1.4426040649414062, 0.6776046752929688, -5.1993408203125, 10.724334716796875, -10.095733642578125, 5.574928283691406, 1.1403732299804688, 6.178680419921875, -8.518211364746094, -1.349029541015625, 1.4499454498291016, 5.659660339355469, 15.588752746582031, 0.9862251281738281, -0.6982803344726562, 3.1326141357421875, -7.739959716796875, 2.6332626342773438, 1.1093330383300781, -2.083831787109375, 2.598419189453125, 6.306793212890625, -3.6083526611328125, 10.814949035644531, 9.129222869873047, 3.885499954223633, 8.741527557373047, -0.0635528564453125, -0.94293212890625, 4.376960754394531, 6.27838134765625, -2.738922119140625, 10.988700866699219, 0.183929443359375, 11.386322021484375, 3.7533226013183594, 9.278121948242188, 2.890472412109375, 10.794914245605469], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000175.npy"}
{"epoch": 0.26455026455026454, "step": 176, "batch_size": 64, "mean": 1.805713415145874, "std": 5.201227188110352, "min": -8.900192260742188, "p10": -4.477446746826171, "median": 1.529561996459961, "p90": 9.209698486328126, "max": 15.596099853515625, "pos_frac": 0.609375, "sample": [14.826065063476562, -8.900192260742188, 1.899322509765625, 9.134754180908203, -0.9256324768066406, 10.65155029296875, -5.82244873046875, 3.3596725463867188, 1.096405029296875, 2.5321807861328125, -4.7947845458984375, -3.3649826049804688, -3.047454833984375, 9.920616149902344, -7.052101135253906, 3.3432483673095703, 1.6715316772460938, 3.750703811645508, 0.762054443359375, -0.6788444519042969, -1.90142822265625, -3.7369918823242188, 2.0008544921875, 11.092681884765625, 10.2593994140625, -0.27256011962890625, -3.100860595703125, 3.7618865966796875, -2.002422332763672, 5.936740875244141, -0.721282958984375, -1.6339893341064453, 0.887786865234375, 1.0525894165039062, 3.5936965942382812, 9.241817474365234, 0.5196456909179688, 5.2706298828125, 1.5607337951660156, -2.0310134887695312, -0.09027862548828125, -0.6923675537109375, 4.920621871948242, 7.4272918701171875, 3.3851280212402344, 6.26605224609375, -2.747406005859375, -5.449943542480469, 6.122039794921875, 3.000213623046875, -1.14752197265625, 6.55517578125, 1.4470596313476562, -0.7070999145507812, 3.3086090087890625, 2.2926578521728516, 8.118453979492188, 1.9764633178710938, -3.0758743286132812, 15.596099853515625, 1.4983901977539062, -5.1859283447265625, 3.280548095703125, -8.67230224609375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000176.npy"}
{"epoch": 0.2660619803476946, "step": 177, "batch_size": 64, "mean": 4.662107944488525, "std": 5.2881975173950195, "min": -9.682296752929688, "p10": -1.4532867431640621, "median": 4.689067840576172, "p90": 10.777004814147949, "max": 18.46904754638672, "pos_frac": 0.8125, "sample": [3.8971176147460938, -3.5191268920898438, 4.17938232421875, 18.46904754638672, 4.757514953613281, 2.604705810546875, 0.887542724609375, 2.3826141357421875, 9.653427124023438, 5.14111328125, 13.752368927001953, 8.828662872314453, 8.71082878112793, -2.7139244079589844, 0.6744384765625, -2.010683059692383, 8.392822265625, -0.3065299987792969, 0.8343105316162109, 6.789648056030273, 5.8144683837890625, 1.6420974731445312, -1.6016845703125, 7.993732452392578, 8.847513198852539, 5.8377838134765625, 3.6440353393554688, 3.1528263092041016, 1.2094802856445312, 1.5808982849121094, -0.895263671875, 5.031589508056641, 12.598587036132812, -0.27734375, 1.517333984375, 10.7933349609375, 12.856346130371094, 0.5980377197265625, -1.107025146484375, 5.9429931640625, 2.586231231689453, 2.148517608642578, 8.488712310791016, 9.105072021484375, 9.609939575195312, -5.516197204589844, -1.0418701171875, -3.95281982421875, 7.977441787719727, 2.578887939453125, 13.938491821289062, 0.4847412109375, -9.682296752929688, 10.738901138305664, 4.9134521484375, 8.031898498535156, 7.625175476074219, 12.938644409179688, 8.711654663085938, 4.6206207275390625, 7.6652679443359375, 9.770013809204102, 1.6463470458984375, 8.403053283691406], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000177.npy"}
{"epoch": 0.2675736961451247, "step": 178, "batch_size": 64, "mean": 2.9629063606262207, "std": 4.153253555297852, "min": -7.637310028076172, "p10": -1.879748153686523, "median": 2.9817981719970703, "p90": 8.63411979675293, "max": 12.745819091796875, "pos_frac": 0.796875, "sample": [3.0672760009765625, 2.9777069091796875, 3.9789199829101562, -0.14241790771484375, 3.120067596435547, 0.5955467224121094, 2.1183509826660156, 7.143764495849609, 2.985889434814453, 5.478729248046875, -1.4657974243164062, 8.705429077148438, 5.555572509765625, -4.361118316650391, 3.0831069946289062, 0.855010986328125, 8.467731475830078, 0.3792533874511719, 3.2816200256347656, 1.4719161987304688, 12.081008911132812, 1.7229652404785156, 1.299072265625, 3.390625, 3.3237762451171875, -2.5684661865234375, 3.1723403930664062, 12.745819091796875, -2.33245849609375, -0.306884765625, 2.740478515625, 0.16087913513183594, 1.08740234375, 9.90439224243164, -0.0899505615234375, 5.751068115234375, 7.4695892333984375, 0.4715919494628906, 10.131431579589844, 0.49444580078125, 0.7434844970703125, 4.695152282714844, 9.0372314453125, 3.9022140502929688, -7.637310028076172, 7.680027008056641, -1.1034660339355469, 0.9482498168945312, 11.277793884277344, 0.8210964202880859, 5.047706604003906, 6.2677459716796875, 2.3471336364746094, 7.0059967041015625, 0.017345428466796875, 4.463714599609375, 1.9705657958984375, -0.6584625244140625, -5.795860290527344, -2.0571556091308594, 4.839439392089844, 4.32977294921875, 6.8219451904296875, -3.2840309143066406], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000178.npy"}
{"epoch": 0.2690854119425548, "step": 179, "batch_size": 64, "mean": 2.5824155807495117, "std": 5.306247234344482, "min": -9.069328308105469, "p10": -3.9980121612548816, "median": 2.0536108016967773, "p90": 7.9831680297851575, "max": 20.38116455078125, "pos_frac": 0.71875, "sample": [0.9083633422851562, -2.7952423095703125, 5.3699951171875, -0.26483154296875, 1.3029327392578125, 4.360637664794922, 2.0982589721679688, 9.955429077148438, 12.124496459960938, 5.175067901611328, 6.115497589111328, 1.800069808959961, 8.11151123046875, -8.832122802734375, 7.670124053955078, -4.513484954833984, -8.02215576171875, -5.259836196899414, -1.0059890747070312, 3.0294113159179688, -0.6752662658691406, 6.709613800048828, 9.64447021484375, 1.143951416015625, 7.5048370361328125, -5.197715759277344, -2.49359130859375, 7.155338287353516, 2.008962631225586, -0.6254959106445312, 3.140918731689453, 3.9053115844726562, 3.125762939453125, 0.2289276123046875, 0.9664382934570312, -9.069328308105469, -0.3003997802734375, 11.945220947265625, 7.3336334228515625, -0.5140419006347656, 4.841129302978516, 2.8177261352539062, 1.3848075866699219, 5.710258483886719, -2.3567352294921875, 6.634368896484375, 10.733192443847656, 5.763557434082031, 6.484397888183594, 20.38116455078125, 4.930259704589844, 1.3102912902832031, 2.3289794921875, 0.5836296081542969, -1.0832786560058594, 3.3202896118164062, 0.23024749755859375, 0.2425689697265625, 7.6837005615234375, 0.8918075561523438, -7.078083038330078, 1.3132953643798828, 6.8392181396484375, -1.8978862762451172], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000179.npy"}
{"epoch": 0.2705971277399849, "step": 180, "batch_size": 64, "mean": 2.4041786193847656, "std": 4.4287428855896, "min": -6.589454650878906, "p10": -2.1721164703369142, "median": 1.3014144897460938, "p90": 8.369915008544922, "max": 12.9776611328125, "pos_frac": 0.671875, "sample": [12.9776611328125, -0.995758056640625, 4.633394241333008, -1.3682403564453125, 0.529541015625, 2.7991256713867188, -3.9593353271484375, 5.098167419433594, 1.2259368896484375, -1.7233428955078125, 8.411445617675781, 1.2407608032226562, -0.1775970458984375, -2.7406883239746094, 4.7122650146484375, -0.7222118377685547, -1.9456634521484375, 3.246063232421875, -0.8875617980957031, -3.529510498046875, 3.2964553833007812, 5.82905387878418, 4.581432342529297, 6.948974609375, 5.651386260986328, 4.612955093383789, -0.9798431396484375, 7.981414794921875, 4.263801574707031, 1.8588104248046875, 3.3156280517578125, 5.9271697998046875, -1.4490585327148438, 0.47487640380859375, -2.131744384765625, 12.01953125, -0.8272438049316406, 7.207855224609375, -1.4861335754394531, -2.1894187927246094, -0.9726810455322266, 4.482172012329102, 10.526494979858398, 0.11983680725097656, 1.077117919921875, 1.4147453308105469, 9.812423706054688, 1.01336669921875, 7.640541076660156, 0.06003570556640625, 8.27301025390625, -6.589454650878906, -1.350830078125, 1.6760063171386719, 0.6385269165039062, 0.6431884765625, 11.341564178466797, -4.66375732421875, 2.7569580078125, -2.8424911499023438, 0.838409423828125, 2.441293716430664, 12.438522338867188, 1.3620681762695312], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000180.npy"}
{"epoch": 0.272108843537415, "step": 181, "batch_size": 64, "mean": 3.0456197261810303, "std": 4.077996253967285, "min": -4.697587966918945, "p10": -1.3891105651855469, "median": 2.3598403930664062, "p90": 8.739193725585938, "max": 14.085769653320312, "pos_frac": 0.796875, "sample": [4.222835540771484, 2.9989776611328125, -0.8622207641601562, 0.7302112579345703, 1.994354248046875, 0.8349666595458984, 1.2404861450195312, -2.9138336181640625, 1.571380615234375, -0.5141181945800781, 4.3701324462890625, 2.4014205932617188, -0.19050216674804688, 0.57720947265625, 3.095663070678711, 3.5434494018554688, 5.904270172119141, 5.680656433105469, 7.185325622558594, -1.93072509765625, 0.38299560546875, 4.95233154296875, 8.808929443359375, 2.3484039306640625, 13.944023132324219, 3.6927032470703125, 1.85784912109375, 14.085769653320312, -2.5325469970703125, -1.9514617919921875, 1.391265869140625, 2.078125, 11.122955322265625, -4.697587966918945, 8.44244384765625, 2.37127685546875, 3.7090702056884766, 12.601768493652344, -0.5808639526367188, -1.410614013671875, 6.086090087890625, 2.741321563720703, 10.453903198242188, 1.315460205078125, 5.717475891113281, 0.19964599609375, 0.3378562927246094, 2.774799346923828, 2.4808578491210938, 0.29141998291015625, -1.3389358520507812, 5.334205627441406, 8.57647705078125, 0.0838470458984375, 4.210052490234375, 6.436765670776367, 1.17974853515625, 0.595489501953125, 6.193471908569336, -2.28643798828125, 1.3703765869140625, -0.8286533355712891, 9.400230407714844, 3.03741455078125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000181.npy"}
{"epoch": 0.273620559334845, "step": 182, "batch_size": 64, "mean": 3.7446250915527344, "std": 5.4357171058654785, "min": -5.9409942626953125, "p10": -3.0157258987426756, "median": 3.2197647094726562, "p90": 10.621002197265625, "max": 21.54534149169922, "pos_frac": 0.75, "sample": [4.0037078857421875, -3.0775108337402344, -0.002826690673828125, 3.063201904296875, 4.022254943847656, 11.361736297607422, 2.8023834228515625, -2.8036956787109375, -1.5947494506835938, 9.873672485351562, 7.957672119140625, 4.1242828369140625, 10.634185791015625, 1.7010040283203125, 2.4545555114746094, 6.516845703125, -2.0545883178710938, -3.419189453125, 7.682109832763672, 2.59423828125, 4.423954010009766, 0.959986686706543, 3.1654052734375, 2.407562255859375, 6.338706970214844, -3.3936386108398438, 10.590240478515625, 4.389368057250977, 6.392303466796875, 11.083858489990234, 9.077423095703125, 3.349822998046875, 3.5611419677734375, 2.5736312866210938, 14.835090637207031, -5.9409942626953125, 0.93890380859375, 4.9512939453125, 1.8071746826171875, -0.5000495910644531, 1.4212493896484375, -5.724834442138672, 9.82293701171875, 3.2741241455078125, 8.564647674560547, 21.54534149169922, -0.07917022705078125, -1.69903564453125, 12.820915222167969, 3.4498748779296875, 7.151710510253906, 15.586036682128906, 3.5739288330078125, 7.310493469238281, 5.945060729980469, 1.5765514373779297, 1.9681854248046875, 0.4261436462402344, 8.358634948730469, 0.6200466156005859, -0.7410011291503906, -4.158782958984375, -2.871561050415039, -5.335968017578125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000182.npy"}
{"epoch": 0.2751322751322751, "step": 183, "batch_size": 64, "mean": 1.4735193252563477, "std": 4.229245185852051, "min": -9.124259948730469, "p10": -3.63360710144043, "median": 1.2043333053588867, "p90": 7.188752746582032, "max": 10.399276733398438, "pos_frac": 0.640625, "sample": [3.2470970153808594, 6.000274658203125, 1.4557113647460938, -1.8663101196289062, 1.236898422241211, -3.1565017700195312, -5.604278564453125, 9.841781616210938, -6.561344146728516, -3.5984344482421875, -4.567405700683594, -1.3862762451171875, 1.9539718627929688, 1.2361679077148438, 3.3040695190429688, 4.519081115722656, -0.3540210723876953, 4.796745300292969, 6.2592010498046875, 9.939937591552734, 7.2309722900390625, -0.4202117919921875, 6.183647155761719, -0.88287353515625, 3.492786407470703, 1.9055976867675781, 1.153839111328125, -0.8125839233398438, -5.58575439453125, 0.5201873779296875, 2.7928390502929688, -3.6160125732421875, -0.6951370239257812, 3.116943359375, 1.4690322875976562, 4.24993896484375, 8.875350952148438, 6.5637664794921875, -9.124259948730469, 6.356529235839844, -3.6411476135253906, -1.464019775390625, 8.669845581054688, 0.42835235595703125, 0.5187416076660156, 0.24254608154296875, 2.098785400390625, 0.3698749542236328, -4.273590087890625, -0.9402236938476562, -1.6490707397460938, 1.3259506225585938, 1.1724987030029297, 0.6529541015625, 7.090240478515625, -1.7551803588867188, 1.8892593383789062, -1.9259490966796875, -0.6905708312988281, 1.135345458984375, 2.991455078125, 10.399276733398438, 4.910886764526367, 7.27801513671875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000183.npy"}
{"epoch": 0.2766439909297052, "step": 184, "batch_size": 64, "mean": 2.560145854949951, "std": 3.5499658584594727, "min": -5.761146545410156, "p10": -1.1750995635986325, "median": 2.3444595336914062, "p90": 7.358495330810547, "max": 10.824031829833984, "pos_frac": 0.765625, "sample": [-1.3256683349609375, 4.216850280761719, 4.662014007568359, -3.171985626220703, 6.000770568847656, -0.5263519287109375, 2.2956275939941406, -1.6778411865234375, -0.25984954833984375, 0.4421234130859375, -0.3616924285888672, 5.6469879150390625, 1.3857040405273438, 0.8950014114379883, 2.5432205200195312, -2.9153099060058594, 4.405422210693359, 5.724021911621094, 2.9593238830566406, 2.7676849365234375, -0.6744918823242188, 8.960010528564453, 2.3921432495117188, 10.022762298583984, 0.3346824645996094, 3.395648956298828, 1.0081729888916016, 3.277496337890625, 7.729343414306641, 0.31467437744140625, 7.198272705078125, 1.930999755859375, 0.48308563232421875, -4.1995697021484375, -0.10473251342773438, 0.3357048034667969, 10.824031829833984, 3.2877578735351562, -0.375244140625, 7.239662170410156, 0.8693428039550781, 3.1492156982421875, 5.676216125488281, 0.8776578903198242, 7.030429840087891, 2.686269760131836, 3.45135498046875, 5.771728515625, -0.8237724304199219, 10.269798278808594, -0.24483871459960938, -1.6417465209960938, 2.4722366333007812, 2.4300537109375, 9.435104370117188, 7.409423828125, 0.2009296417236328, 4.738590240478516, 2.2967758178710938, 3.301971435546875, -5.761146545410156, 2.1856822967529297, 0.34128570556640625, 0.6403045654296875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000184.npy"}
{"epoch": 0.2781557067271353, "step": 185, "batch_size": 64, "mean": 3.062563180923462, "std": 5.737687110900879, "min": -9.445415496826172, "p10": -4.726593017578125, "median": 2.8744845390319824, "p90": 10.7418212890625, "max": 15.777694702148438, "pos_frac": 0.671875, "sample": [2.666754722595215, 14.175323486328125, 3.1775360107421875, 6.648704528808594, 11.49395751953125, -6.0272979736328125, 4.2893524169921875, 3.2111129760742188, 1.9337921142578125, -1.689361572265625, 9.359912872314453, 13.573883056640625, -4.612998962402344, -2.3263702392578125, 10.719070434570312, 1.4821434020996094, 11.896896362304688, 2.4922943115234375, 7.459196090698242, 8.044900894165039, -0.9146881103515625, 5.769439697265625, 1.5114402770996094, 2.3683547973632812, 0.7638607025146484, 8.797836303710938, 2.5330734252929688, 6.901996612548828, 1.1196441650390625, 10.751571655273438, -2.9552078247070312, -5.848724365234375, 10.087762832641602, 5.334991455078125, -5.959407806396484, 3.08221435546875, 15.777694702148438, 3.4857521057128906, 3.427539825439453, -0.09838104248046875, -4.775276184082031, 4.834648132324219, -0.5040626525878906, 8.073455810546875, 4.681304931640625, -7.6654815673828125, 6.677890777587891, -0.5607490539550781, 5.295433044433594, -2.8130874633789062, 6.2444305419921875, -9.445415496826172, 6.4047393798828125, -0.9369010925292969, 5.841651916503906, -2.5302982330322266, -3.8837738037109375, -6.030422210693359, 2.392475128173828, -0.600128173828125, -1.1765708923339844, 1.2894363403320312, 9.134635925292969, 12.15054702758789], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000185.npy"}
{"epoch": 0.2796674225245654, "step": 186, "batch_size": 64, "mean": 3.159148693084717, "std": 5.737919807434082, "min": -9.170074462890625, "p10": -4.1159769058227536, "median": 2.514801025390625, "p90": 9.841683959960937, "max": 16.778564453125, "pos_frac": 0.6875, "sample": [9.095245361328125, 7.7141265869140625, -2.8450088500976562, 1.4973602294921875, -0.6491775512695312, 1.1682891845703125, 7.681793212890625, -5.201023101806641, -1.5761642456054688, 6.623619079589844, 9.038482666015625, -2.0085182189941406, 4.098358154296875, 7.14494514465332, 15.372528076171875, 11.795494079589844, -1.3521595001220703, 2.678752899169922, 7.17041015625, -2.2638397216796875, -4.369646072387695, 7.458831787109375, 4.643013000488281, 14.002334594726562, -8.143814086914062, -3.2456626892089844, -4.407833099365234, -1.7189712524414062, -3.5240821838378906, 1.41497802734375, 3.510162353515625, 8.527984619140625, 9.83123779296875, -9.170074462890625, 0.20946121215820312, -1.8272151947021484, 8.9068603515625, 10.657699584960938, 8.469738006591797, 1.7348747253417969, 0.12751007080078125, -3.1761398315429688, 2.78564453125, 1.0991897583007812, 8.015579223632812, 2.710369110107422, 10.122215270996094, 7.750732421875, -4.597930908203125, -0.1175384521484375, 8.90985107421875, 6.666507720947266, 7.7555999755859375, 9.846160888671875, 2.2731704711914062, 3.5547866821289062, 16.778564453125, 2.2995033264160156, -0.36053466796875, 2.350849151611328, 0.3098297119140625, 1.6409149169921875, -6.335044860839844, 3.6323394775390625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000186.npy"}
{"epoch": 0.2811791383219955, "step": 187, "batch_size": 64, "mean": 2.742424249649048, "std": 4.865922927856445, "min": -8.250911712646484, "p10": -2.7382431030273438, "median": 2.57379150390625, "p90": 8.871530151367187, "max": 13.471649169921875, "pos_frac": 0.703125, "sample": [5.1974945068359375, -6.8094024658203125, 11.243408203125, 8.514686584472656, 13.471649169921875, 4.7314605712890625, 8.459358215332031, 2.203545570373535, 2.5460357666015625, 2.9725914001464844, -6.178245544433594, 1.0340118408203125, 1.0117835998535156, 2.6015472412109375, 3.8603515625, 2.999723434448242, 1.3845977783203125, 3.5417556762695312, 7.029747009277344, 6.464704513549805, -3.776214599609375, -8.250911712646484, 7.19232177734375, -0.18065452575683594, 0.4413604736328125, 2.8454971313476562, 0.11719894409179688, -2.44500732421875, -2.701873779296875, 10.247810363769531, 6.344940185546875, 8.884017944335938, -4.4384765625, -2.5330963134765625, -1.0779953002929688, 8.722686767578125, 1.851806640625, 0.5132331848144531, -1.3985595703125, 5.6620025634765625, -1.0705499649047852, -1.40618896484375, 9.93560791015625, 2.9648170471191406, 3.920501708984375, -0.246917724609375, 5.42486572265625, 2.4509048461914062, 2.38970947265625, -0.03582763671875, 8.842391967773438, -2.7538299560546875, 4.9146881103515625, 12.123779296875, 1.7125530242919922, 3.515167236328125, 8.567276000976562, 9.667762756347656, 3.0369720458984375, 7.1413421630859375, -0.36913299560546875, -2.619710922241211, -7.229034423828125, 2.3371047973632812], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000187.npy"}
{"epoch": 0.28269085411942557, "step": 188, "batch_size": 64, "mean": 3.0909528732299805, "std": 4.81361198425293, "min": -4.72991943359375, "p10": -2.4183761596679685, "median": 2.201030731201172, "p90": 10.137841415405275, "max": 16.479034423828125, "pos_frac": 0.703125, "sample": [1.2964935302734375, -0.7308349609375, 10.070899963378906, -0.2620429992675781, -3.4337997436523438, 3.4795913696289062, 6.696983337402344, 3.6488800048828125, 7.7375640869140625, 3.6791839599609375, -0.449859619140625, 10.280487060546875, 11.511951446533203, 2.1108322143554688, 5.248800277709961, -2.4838409423828125, 3.2037200927734375, 9.126350402832031, 0.6355838775634766, -2.1679773330688477, 5.14727783203125, -0.443359375, 0.5380630493164062, -4.1881103515625, 0.3621711730957031, -1.1821060180664062, -1.5574893951416016, 1.5694961547851562, 5.3806610107421875, 10.47525405883789, 6.52400016784668, 0.34175872802734375, 2.291229248046875, 12.416908264160156, 4.9755096435546875, 0.9126663208007812, 9.430130004882812, 2.335906982421875, -2.265625, 0.3480377197265625, 9.224323272705078, -1.7979278564453125, 3.1915111541748047, 0.6501693725585938, 4.34197998046875, -1.081146240234375, 4.339046478271484, 12.56060791015625, 3.7368392944335938, 0.072967529296875, 8.792556762695312, -4.72991943359375, -2.0553512573242188, 1.3362236022949219, -2.9933319091796875, 3.7018966674804688, 1.9858160018920898, -2.6239891052246094, 10.16653060913086, 8.82188606262207, 5.489055633544922, -3.1379013061523438, -1.2612342834472656, 16.479034423828125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000188.npy"}
{"epoch": 0.2842025699168556, "step": 189, "batch_size": 64, "mean": 2.3743185997009277, "std": 5.288585662841797, "min": -15.2381591796875, "p10": -3.6410743713378904, "median": 1.7906427383422852, "p90": 9.511282348632815, "max": 12.220626831054688, "pos_frac": 0.71875, "sample": [-1.3433647155761719, 4.131664276123047, 3.292865753173828, -6.795524597167969, -1.6428375244140625, 5.8624114990234375, 9.80105209350586, 6.944103240966797, -0.011505126953125, -3.52813720703125, -5.8214263916015625, 9.682025909423828, -15.2381591796875, 1.2729644775390625, -1.9309463500976562, 0.8107376098632812, 10.207405090332031, 6.215339660644531, -3.6894760131835938, 1.1234169006347656, 2.7269935607910156, 9.703632354736328, 4.661170959472656, 0.727447509765625, 3.4520492553710938, -1.8495903015136719, 0.8197174072265625, 0.6489639282226562, 1.6595897674560547, 1.1010818481445312, 5.955848693847656, -2.9376220703125, 0.3562965393066406, 10.484535217285156, 8.986030578613281, 7.700229644775391, 0.01763153076171875, 2.6470184326171875, 7.358432769775391, 5.637081146240234, 7.286262512207031, 1.477294921875, 7.323692321777344, -1.9277076721191406, 3.4742660522460938, 5.1356048583984375, -1.3133163452148438, -3.8538455963134766, 0.5250244140625, 1.7415542602539062, 0.18365478515625, 8.499944686889648, 6.041175842285156, 6.350700378417969, 10.433685302734375, 3.9701919555664062, -2.4317779541015625, -7.929405212402344, 2.9111804962158203, -8.275604248046875, 1.839731216430664, -0.03857421875, 12.220626831054688, 9.11288070678711], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000189.npy"}
{"epoch": 0.2857142857142857, "step": 190, "batch_size": 64, "mean": 3.3963682651519775, "std": 4.615594387054443, "min": -8.022745132446289, "p10": -2.92065258026123, "median": 3.560969352722168, "p90": 8.96441879272461, "max": 13.575260162353516, "pos_frac": 0.84375, "sample": [5.662408828735352, 4.841827392578125, 13.575260162353516, -5.3859405517578125, 2.7992706298828125, 5.830574035644531, 1.6177749633789062, 4.4080047607421875, -8.022745132446289, 12.638053894042969, -3.7032394409179688, 2.5566558837890625, 3.4087257385253906, 3.2477340698242188, 3.9736328125, 1.1041526794433594, 1.6865310668945312, 6.037261962890625, 3.5872249603271484, 3.5347137451171875, 3.617969512939453, 8.690025329589844, 0.861541748046875, 0.980499267578125, -2.4976749420166016, 5.919097900390625, -6.238067626953125, 0.032321929931640625, 2.677448272705078, 3.9456100463867188, 11.588264465332031, -3.1019287109375, 6.8094482421875, 8.3759765625, 1.572540283203125, 1.4104347229003906, 6.012847900390625, -0.5620517730712891, 0.9689655303955078, 6.658538818359375, 0.7080154418945312, 1.980010986328125, 7.337196350097656, 10.15289306640625, 3.81170654296875, 0.8038558959960938, -3.95806884765625, 7.28216552734375, -7.451904296875, 0.7400169372558594, 0.1578998565673828, 4.360681533813477, -0.8969039916992188, 10.309799194335938, 6.666473388671875, 9.082015991210938, 10.869384765625, 2.8791427612304688, 6.7132110595703125, 6.193138122558594, 0.4836845397949219, 8.345245361328125, 5.0069580078125, 4.671260833740234], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000190.npy"}
{"epoch": 0.2872260015117158, "step": 191, "batch_size": 64, "mean": 2.5279035568237305, "std": 4.758782386779785, "min": -8.996185302734375, "p10": -2.6857791900634767, "median": 2.676508903503418, "p90": 8.486267471313479, "max": 12.772270202636719, "pos_frac": 0.734375, "sample": [9.324478149414062, -1.6992301940917969, 6.213348388671875, 12.293350219726562, 12.772270202636719, -1.644989013671875, 2.70550537109375, 7.143531799316406, 1.436553955078125, 4.993255615234375, -1.6481781005859375, 11.789581298828125, -1.071502685546875, 2.8130836486816406, 7.05767822265625, 2.914642333984375, -7.48358154296875, 1.1557846069335938, 3.30029296875, 4.285137176513672, 6.510734558105469, -0.2108154296875, 8.666332244873047, 2.9050216674804688, -8.352012634277344, 1.8755722045898438, 0.26085662841796875, 2.805004119873047, -1.3442535400390625, -0.8501148223876953, 2.647512435913086, 0.8543167114257812, 1.593606948852539, 6.1177978515625, 4.78900146484375, -7.501556396484375, 0.430908203125, -3.465301513671875, 7.390842437744141, 7.673301696777344, 8.066116333007812, 6.370143890380859, -0.7071075439453125, 3.0923004150390625, 5.620506286621094, 0.5013504028320312, 2.3747940063476562, 9.248710632324219, 0.34633922576904297, 0.6041717529296875, -2.729686737060547, 7.623649597167969, 2.044097900390625, -3.3006362915039062, -2.4739227294921875, 8.804962158203125, 6.651985168457031, -8.996185302734375, 2.8967018127441406, 0.3989715576171875, 3.0896682739257812, 3.5839996337890625, 1.8104629516601562, -2.5833282470703125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000191.npy"}
{"epoch": 0.2887377173091459, "step": 192, "batch_size": 64, "mean": 1.915104866027832, "std": 4.356027126312256, "min": -15.061660766601562, "p10": -2.2689201354980466, "median": 1.6560029983520508, "p90": 6.767182159423829, "max": 13.930908203125, "pos_frac": 0.671875, "sample": [-15.061660766601562, 0.8847236633300781, 2.2831878662109375, 4.046649932861328, 6.300365447998047, 2.3802127838134766, -2.4019527435302734, 4.97265625, 2.7379531860351562, 5.348480224609375, -0.6725845336914062, -2.075714111328125, -3.454265594482422, 0.1376495361328125, -0.6943740844726562, 1.4459381103515625, 8.721168518066406, -3.2426910400390625, -0.11460494995117188, 0.40399169921875, 4.717033386230469, 13.930908203125, 1.866067886352539, -2.1023902893066406, 6.4939422607421875, 4.215425491333008, 0.827789306640625, 2.992523193359375, 0.8871231079101562, 2.2605667114257812, 4.464942932128906, 7.2381591796875, -1.4656753540039062, 6.884284973144531, -0.6031990051269531, -0.4838829040527344, 1.028961181640625, 0.10625457763671875, -3.386791229248047, 5.3775787353515625, 2.365619659423828, -5.307247161865234, -1.9117431640625, 13.061058044433594, 2.859832763671875, 3.635711669921875, -2.340290069580078, 5.6866254806518555, 3.5032730102539062, -0.94293212890625, 2.176055908203125, 4.651283264160156, -2.0688209533691406, 5.311393737792969, 0.3262767791748047, 5.0908050537109375, 1.3837127685546875, -0.9954452514648438, 7.766204833984375, 8.567924499511719, 3.6526622772216797, -1.6280860900878906, -0.09228134155273438, 0.6203651428222656], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000192.npy"}
{"epoch": 0.29024943310657597, "step": 193, "batch_size": 64, "mean": 3.378182888031006, "std": 4.666043758392334, "min": -5.647308349609375, "p10": -2.077547073364258, "median": 3.00537109375, "p90": 10.4845085144043, "max": 12.830024719238281, "pos_frac": 0.6875, "sample": [1.1233673095703125, 1.9260997772216797, 6.69329833984375, -2.2720870971679688, 1.8757057189941406, -3.4846572875976562, 3.6601181030273438, 1.5848236083984375, -2.1451416015625, -0.6143074035644531, 2.4958343505859375, -0.7328681945800781, 6.0185394287109375, -1.2266693115234375, 9.638423919677734, -3.1386947631835938, 7.160423278808594, 1.9153900146484375, 6.116399765014648, 1.7009849548339844, -1.1851119995117188, 6.815650939941406, 10.839332580566406, 1.4909744262695312, 4.963722229003906, 6.123512268066406, 0.6755905151367188, 5.2099761962890625, 0.7915115356445312, 5.58647346496582, 12.089187622070312, 3.7884063720703125, -1.3771476745605469, 11.585403442382812, 4.298881530761719, 5.738456726074219, -1.9198265075683594, -4.719371795654297, 4.45526123046875, 5.154762268066406, 7.053436279296875, -1.0220718383789062, 7.973289489746094, 12.830024719238281, -1.239278793334961, 3.5149078369140625, 1.1008338928222656, -0.8596115112304688, -0.0514984130859375, 5.788276672363281, -0.6782646179199219, -0.4853858947753906, 12.728630065917969, 9.014579772949219, 4.13299560546875, 11.209793090820312, 9.656585693359375, -5.647308349609375, 5.40606689453125, 11.214424133300781, -1.091766357421875, 2.220418930053711, -3.4535255432128906, 8.18753719329834], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000193.npy"}
{"epoch": 0.29176114890400606, "step": 194, "batch_size": 64, "mean": 1.9506279230117798, "std": 5.137694835662842, "min": -9.622627258300781, "p10": -4.698231887817382, "median": 1.3751716613769531, "p90": 8.984220886230473, "max": 15.196754455566406, "pos_frac": 0.65625, "sample": [1.7657241821289062, 1.30450439453125, 4.56475830078125, -3.4227352142333984, 5.1152496337890625, -2.9318313598632812, -4.915857315063477, 3.142742156982422, -0.5427932739257812, 2.5154876708984375, -0.31307220458984375, 4.11004638671875, 0.219940185546875, 9.32098388671875, -6.7607879638671875, 1.3560256958007812, 9.732242584228516, -2.16314697265625, 0.3476219177246094, 0.1764373779296875, 7.6300048828125, -0.4085559844970703, 0.101715087890625, 7.343921661376953, -2.1833839416503906, 1.394317626953125, -2.4436721801757812, 3.4522552490234375, 1.0479660034179688, 5.267860412597656, -1.9806976318359375, 3.2108535766601562, -6.034675598144531, -6.183002471923828, -8.753929138183594, -2.1719913482666016, 11.946250915527344, 10.25823974609375, 6.938838958740234, 6.440834045410156, -5.599666595458984, 0.09075927734375, 2.2722339630126953, -0.0069179534912109375, 4.9428558349609375, 10.364227294921875, 4.942676544189453, -9.622627258300781, 15.196754455566406, 3.00494384765625, 4.43507194519043, -3.3670883178710938, -0.8829193115234375, 9.87156867980957, 5.243080139160156, 8.198440551757812, 5.007501602172852, 4.754945755004883, -0.16417694091796875, 8.007270812988281, 0.6124343872070312, 0.19102096557617188, -4.190439224243164, 4.043552398681641], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000194.npy"}
{"epoch": 0.29327286470143615, "step": 195, "batch_size": 64, "mean": 2.8940858840942383, "std": 4.331976413726807, "min": -10.78167724609375, "p10": -1.5673011779785155, "median": 2.754831314086914, "p90": 8.886844635009766, "max": 13.492233276367188, "pos_frac": 0.75, "sample": [2.525806427001953, 5.213039398193359, -3.833038330078125, 5.291587829589844, 11.889345169067383, 3.4262924194335938, 4.9814300537109375, 1.801727294921875, 4.3385162353515625, 13.2257080078125, 6.5538177490234375, 1.3099937438964844, -1.57373046875, -4.009063720703125, 0.41302490234375, -10.78167724609375, -0.5659904479980469, 4.639427185058594, 2.3060760498046875, 4.485805511474609, -2.8902587890625, 0.98748779296875, 6.140682220458984, 2.983856201171875, 0.5333404541015625, 3.055675506591797, 2.9963302612304688, 5.026344299316406, 3.195037841796875, 5.8717193603515625, 2.229564666748047, 2.3736038208007812, 11.032207489013672, 3.0039825439453125, 9.614959716796875, -1.4457435607910156, 1.890249252319336, 5.2048797607421875, 2.4312667846679688, 13.492233276367188, -0.5105247497558594, -0.460540771484375, 3.2974624633789062, -0.8875579833984375, 1.3743057250976562, 6.16009521484375, -1.52801513671875, 8.980056762695312, -1.5522994995117188, -2.77435302734375, -1.2682914733886719, -2.807342529296875, 3.7845458984375, 9.640312194824219, 1.3494720458984375, 6.686351776123047, 2.4798583984375, 2.9893569946289062, -1.0849151611328125, 8.669349670410156, 1.4790458679199219, 1.3157539367675781, 4.5611572265625, 5.962688446044922], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000195.npy"}
{"epoch": 0.2947845804988662, "step": 196, "batch_size": 64, "mean": 2.147923707962036, "std": 4.889926433563232, "min": -10.012218475341797, "p10": -2.279079437255859, "median": 1.6595497131347656, "p90": 8.072293090820313, "max": 21.100189208984375, "pos_frac": 0.703125, "sample": [-0.603912353515625, -10.012218475341797, 0.15949058532714844, -4.08026123046875, 1.8686237335205078, -0.3908119201660156, 3.2106781005859375, 2.7556686401367188, 2.8141021728515625, -2.3844337463378906, -0.714935302734375, 10.477901458740234, 6.3511810302734375, -0.3626708984375, 0.48184967041015625, 0.36388397216796875, -6.6324615478515625, 9.655986785888672, -6.3196563720703125, 7.031585693359375, -1.9800338745117188, 2.8170700073242188, 8.885181427001953, 2.148712158203125, -2.033252716064453, 1.7127761840820312, 9.285781860351562, 7.901084899902344, 3.1107635498046875, 21.100189208984375, 1.0172481536865234, 3.758739471435547, -0.36651611328125, -1.5015144348144531, 0.8526573181152344, 6.875213623046875, -1.1938180923461914, 3.5363006591796875, 1.6063232421875, 1.035675048828125, 5.227741241455078, 0.9228363037109375, 4.3117523193359375, 0.02899932861328125, 4.877532958984375, -5.659049987792969, 0.6076335906982422, -0.1444091796875, 1.7408447265625, 4.348419189453125, 5.945098876953125, 4.110456466674805, 0.08023452758789062, -0.7575607299804688, 1.1038322448730469, 5.295257568359375, 5.050422668457031, 8.145668029785156, 1.387481689453125, -8.716636657714844, 2.8874244689941406, -0.7603302001953125, 5.995643615722656, 9.199661254882812], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000196.npy"}
{"epoch": 0.2962962962962963, "step": 197, "batch_size": 64, "mean": 3.0373177528381348, "std": 5.740475177764893, "min": -12.989761352539062, "p10": -4.016250801086426, "median": 2.965585708618164, "p90": 9.603039169311524, "max": 15.963310241699219, "pos_frac": 0.703125, "sample": [-2.0086708068847656, 14.765739440917969, 1.7584304809570312, 1.2896804809570312, 4.1002197265625, 1.8540306091308594, -4.063396453857422, -10.935455322265625, 2.7687149047851562, 2.6791000366210938, -1.2592239379882812, 9.378299713134766, -5.290699005126953, -1.8088493347167969, 2.8935108184814453, 7.357227325439453, 2.8414306640625, 1.1941070556640625, 13.112598419189453, 10.274124145507812, -0.5312042236328125, 1.6647186279296875, 6.064922332763672, 2.964214324951172, 6.765892028808594, -1.9571075439453125, 4.911491394042969, -3.9062442779541016, 3.7310142517089844, 4.040580749511719, 9.071914672851562, 13.138031005859375, -12.989761352539062, 9.341506958007812, -3.1681671142578125, -6.3826904296875, 3.4515247344970703, 5.07177734375, 9.699356079101562, -0.8954696655273438, -0.6649627685546875, 6.060749053955078, 8.864425659179688, 3.09967041015625, 3.3069992065429688, 7.767669677734375, -1.7689208984375, 0.7698974609375, 5.163228988647461, 7.855754852294922, -4.370584487915039, 0.5552520751953125, 13.236740112304688, 4.523225784301758, -1.1858882904052734, -5.380748748779297, 15.963310241699219, 6.865936279296875, 2.9571313858032227, 6.181186676025391, 2.9669570922851562, -0.7489166259765625, 3.2608280181884766, 8.12216567993164], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000197.npy"}
{"epoch": 0.29780801209372637, "step": 198, "batch_size": 64, "mean": 3.1599321365356445, "std": 4.484543800354004, "min": -7.3785858154296875, "p10": -2.1544517517089843, "median": 3.1526260375976562, "p90": 9.295110511779786, "max": 13.52838134765625, "pos_frac": 0.75, "sample": [-2.1178741455078125, 5.897041320800781, 1.5577545166015625, -0.5255355834960938, 2.1863327026367188, 5.82952880859375, 0.97503662109375, 3.636871337890625, 2.0225448608398438, 4.148796081542969, 6.369377136230469, 5.8860931396484375, 1.8303337097167969, -1.9672737121582031, 10.273357391357422, 10.532341003417969, 3.9305877685546875, 10.38262939453125, 2.1687774658203125, 1.35235595703125, 3.5419158935546875, 3.0647964477539062, 8.349315643310547, -6.146156311035156, 10.891975402832031, -6.387359619140625, -2.0405502319335938, -1.1456146240234375, 8.737777709960938, 6.634204864501953, 5.496971130371094, 9.460052490234375, 3.4723968505859375, -7.3785858154296875, 2.7276229858398438, 1.6018524169921875, -1.3139324188232422, -1.7744636535644531, 4.108917236328125, 6.3471832275390625, 5.0266876220703125, 13.52838134765625, 7.2427215576171875, 3.1456832885742188, 3.1595687866210938, 2.534027099609375, -2.3841629028320312, 8.910245895385742, 11.970399856567383, -2.5273513793945312, -1.4651260375976562, 2.8154983520507812, 6.6746368408203125, 5.905971527099609, -0.8951644897460938, 0.099273681640625, 5.3705902099609375, -2.1701278686523438, -2.9155101776123047, 2.168315887451172, 4.559551239013672, 3.7739715576171875, 3.3985214233398438, 1.6916465759277344], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000198.npy"}
{"epoch": 0.29931972789115646, "step": 199, "batch_size": 64, "mean": 2.6754274368286133, "std": 5.38587760925293, "min": -9.874114990234375, "p10": -3.168033599853515, "median": 3.7367191314697266, "p90": 8.610140228271485, "max": 17.703018188476562, "pos_frac": 0.65625, "sample": [10.983905792236328, 6.920284271240234, 4.35546875, 3.5051612854003906, 0.47603416442871094, -5.5402374267578125, 17.703018188476562, 7.214111328125, 5.467517852783203, 4.476959228515625, 3.9682769775390625, -0.18560028076171875, 4.641693115234375, 6.6785736083984375, 7.551544189453125, -2.46661376953125, 8.907791137695312, 4.726360321044922, 7.021148681640625, 5.493507385253906, 0.32492828369140625, -5.845909118652344, 7.786651611328125, 7.481986999511719, -0.8058242797851562, 1.4106292724609375, -0.15561676025390625, -2.3325729370117188, 3.992279052734375, 5.738739013671875, 0.4930267333984375, 4.438173294067383, 4.951824188232422, -2.1573028564453125, -0.6688709259033203, -0.648956298828125, 10.186065673828125, -1.2930526733398438, -0.8050613403320312, 4.72747802734375, -9.404304504394531, 6.146209716796875, -3.2759170532226562, 8.374008178710938, -6.436485290527344, 6.395622253417969, 6.135625839233398, 0.09319686889648438, 0.7873573303222656, -0.8342437744140625, -1.2281150817871094, -7.3786773681640625, -2.4223861694335938, 8.615882873535156, -9.874114990234375, -1.9365310668945312, 8.59674072265625, -2.9163055419921875, 6.949241638183594, 0.430511474609375, 10.589027404785156, 0.46392059326171875, 11.789817810058594, 2.8497543334960938], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000199.npy"}
{"epoch": 0.30083144368858655, "step": 200, "batch_size": 64, "mean": 3.0649447441101074, "std": 5.765352249145508, "min": -12.05841064453125, "p10": -3.105229949951172, "median": 2.530986785888672, "p90": 10.434466552734376, "max": 15.929489135742188, "pos_frac": 0.71875, "sample": [2.377312660217285, 4.895328521728516, 8.393280029296875, 6.0152130126953125, 4.412445068359375, 0.3170967102050781, 14.110107421875, 2.9512176513671875, 5.634769439697266, 0.63189697265625, -0.14617919921875, 0.04278564453125, -3.1378250122070312, 9.433792114257812, 1.9780235290527344, -1.4132575988769531, -0.21009063720703125, -0.2787132263183594, 10.46868896484375, -9.543228149414062, 0.7696456909179688, -0.42047119140625, 10.3546142578125, 12.792041778564453, 1.780120849609375, 2.5002517700195312, 7.185039520263672, -1.2282485961914062, 6.4373931884765625, -5.782596588134766, 7.201011657714844, 6.918632507324219, -0.9414596557617188, 15.608184814453125, 10.545963287353516, -3.6304550170898438, -12.05841064453125, 3.7204208374023438, 6.5492095947265625, 8.814716339111328, 9.598838806152344, 2.5617218017578125, 15.929489135742188, 5.342506408691406, 1.041656494140625, -8.236160278320312, 0.1035919189453125, -0.0318603515625, 3.1466140747070312, 12.095924377441406, 2.0538101196289062, 3.3197097778320312, 1.2452545166015625, -1.7982101440429688, 2.9106597900390625, 8.395889282226562, -2.914104461669922, 1.6166439056396484, 0.47690582275390625, 3.5308761596679688, -3.0291748046875, -7.086109161376953, 8.298622131347656, 3.5351104736328125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000200.npy"}
{"epoch": 0.30234315948601664, "step": 201, "batch_size": 64, "mean": 3.0728702545166016, "std": 4.612461090087891, "min": -7.9757843017578125, "p10": -1.9870409965515134, "median": 1.9444999694824219, "p90": 10.164732360839844, "max": 13.43548583984375, "pos_frac": 0.734375, "sample": [0.073089599609375, 3.729034423828125, 1.0712203979492188, 4.266685485839844, -1.0296173095703125, 1.8586273193359375, 12.154472351074219, 0.11814498901367188, 1.5672836303710938, -0.5521392822265625, 4.4673614501953125, 3.4960670471191406, 9.505138397216797, 5.762626647949219, -1.2405738830566406, -2.071868896484375, 8.478347778320312, 7.501182556152344, 0.5411224365234375, 11.021171569824219, -4.1775970458984375, 6.148933410644531, 2.0303726196289062, 0.7479095458984375, -1.378011703491211, 12.038394927978516, 0.7002353668212891, 5.4591064453125, -2.5103302001953125, 10.970069885253906, 0.22003936767578125, 4.6738433837890625, 4.607807159423828, -0.8156776428222656, 9.983291625976562, 13.43548583984375, 0.300537109375, -7.9757843017578125, 4.944671630859375, 0.8843841552734375, 6.775226593017578, 4.016193389892578, -1.789109230041504, 5.705089569091797, -3.570995330810547, 5.93670654296875, 3.0363197326660156, -2.28753662109375, -1.1301250457763672, 10.24249267578125, -0.8141307830810547, -1.3970794677734375, 0.5124969482421875, 6.0089111328125, 5.288616180419922, 7.6222686767578125, 6.515613555908203, -0.8834228515625, 1.7910614013671875, 1.4289398193359375, 10.814926147460938, 3.886688232421875, 1.7512969970703125, -3.80181884765625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000201.npy"}
{"epoch": 0.30385487528344673, "step": 202, "batch_size": 64, "mean": 2.667759656906128, "std": 5.014394760131836, "min": -9.068523406982422, "p10": -3.517069625854492, "median": 2.2337589263916016, "p90": 8.566329193115235, "max": 16.88506317138672, "pos_frac": 0.734375, "sample": [-1.9016399383544922, 4.2097015380859375, 14.20266342163086, 12.5767822265625, 4.165308952331543, 0.17401885986328125, 2.6314620971679688, 4.139190673828125, 4.355596542358398, 2.114614486694336, 3.941934585571289, 3.970458984375, 5.390789031982422, 1.7058181762695312, -1.2740745544433594, 11.488471984863281, 5.54791259765625, -0.22700119018554688, 1.776580810546875, -3.8803558349609375, 0.9338455200195312, 2.7195396423339844, 2.1498260498046875, -1.0909423828125, -3.5512428283691406, 1.41693115234375, 2.296070098876953, 16.88506317138672, 0.3651275634765625, 5.572959899902344, -1.7561531066894531, 2.8699874877929688, 3.051738739013672, 2.17144775390625, 0.8125152587890625, -8.976493835449219, 8.181076049804688, 3.756749153137207, 3.274078369140625, -4.046165466308594, 1.464202880859375, 1.01446533203125, 0.08026885986328125, 10.76397705078125, 7.132049560546875, 0.24575042724609375, 7.157440185546875, 1.7806968688964844, 6.5237274169921875, -6.161918640136719, -0.036586761474609375, -3.7815399169921875, -0.48264122009277344, -1.146240234375, -3.4373321533203125, 5.0308074951171875, 13.17767333984375, 4.644609451293945, -0.8835735321044922, 8.731437683105469, -9.068523406982422, 5.912803649902344, 2.6134490966796875, 7.317420959472656], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000202.npy"}
{"epoch": 0.30536659108087677, "step": 203, "batch_size": 64, "mean": 3.069679021835327, "std": 4.754618167877197, "min": -8.725296020507812, "p10": -1.4982198715209956, "median": 2.0839614868164062, "p90": 8.342618560791019, "max": 17.588348388671875, "pos_frac": 0.75, "sample": [3.2110137939453125, -0.07350921630859375, -7.1959228515625, -0.03759765625, 0.048126220703125, 4.914125442504883, -0.6628646850585938, 14.218963623046875, 4.0439453125, -0.520172119140625, 12.237838745117188, 1.7185897827148438, -0.06456756591796875, 17.588348388671875, 0.5039710998535156, 8.589080810546875, 7.1442413330078125, 1.1371307373046875, 7.442024230957031, 6.131927490234375, -8.725296020507812, -0.8774223327636719, 5.847633361816406, -1.7480335235595703, -1.8372116088867188, -3.284332275390625, 1.2146110534667969, 5.6265869140625, 4.1280059814453125, 4.772529602050781, 3.61956787109375, 3.8204345703125, 1.0671463012695312, 3.0018234252929688, 6.3633270263671875, 0.13864898681640625, 6.7409210205078125, 6.068370819091797, 14.1502685546875, 2.2657546997070312, 9.004608154296875, 3.634521484375, 0.35442352294921875, 1.9021682739257812, 6.530921936035156, 0.8842010498046875, 6.949005126953125, 4.1119384765625, -0.69158935546875, 12.967559814453125, 4.02271842956543, 0.5671272277832031, 0.646453857421875, 1.8922691345214844, 2.773590087890625, -2.2355384826660156, 0.29938507080078125, 0.8460769653320312, 7.767539978027344, 1.4386405944824219, -0.7898063659667969, -2.353668212890625, -0.9153213500976562, 4.124202728271484], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000203.npy"}
{"epoch": 0.30687830687830686, "step": 204, "batch_size": 64, "mean": 3.1055989265441895, "std": 4.558474063873291, "min": -7.140968322753906, "p10": -1.824796867370605, "median": 2.3301925659179688, "p90": 8.712015533447266, "max": 15.76824951171875, "pos_frac": 0.78125, "sample": [15.461479187011719, 5.1661834716796875, -4.357799530029297, 3.279165267944336, 0.7652244567871094, 15.76824951171875, 1.8177261352539062, 4.925653457641602, 0.8905792236328125, 2.159912109375, 9.781448364257812, -1.3157958984375, -2.122526168823242, 7.6216888427734375, 1.136688232421875, -0.1540374755859375, 6.771518707275391, 3.2004127502441406, -5.069129943847656, -0.26523590087890625, 1.267608642578125, 5.4557952880859375, 2.5215911865234375, 6.7061614990234375, 0.1559123992919922, -1.299163818359375, 5.8564453125, 6.250652313232422, 2.64178466796875, 2.7009315490722656, 6.260875701904297, -3.3996200561523438, 5.994871139526367, 5.90008544921875, 3.3566837310791016, -2.042940139770508, 2.5004730224609375, 10.386760711669922, 1.3844184875488281, 6.3485260009765625, 1.1783027648925781, 0.421295166015625, 7.0847930908203125, 11.131196975708008, -4.959693908691406, 1.1427383422851562, 4.698822021484375, 1.467890739440918, 8.254043579101562, -0.491180419921875, 8.431694030761719, 11.13165283203125, 1.3003158569335938, 4.537513732910156, 0.0423126220703125, 8.8321533203125, -0.44588470458984375, 1.4709281921386719, 0.18659591674804688, 1.9052276611328125, -7.140968322753906, -0.6944923400878906, 3.452310562133789, 1.4115142822265625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000204.npy"}
{"epoch": 0.30839002267573695, "step": 205, "batch_size": 64, "mean": 2.719717264175415, "std": 5.394881248474121, "min": -7.893131256103516, "p10": -3.789743041992187, "median": 2.1300392150878906, "p90": 10.612990760803223, "max": 16.547996520996094, "pos_frac": 0.71875, "sample": [3.60284423828125, 1.0402717590332031, 0.3255729675292969, -2.2597084045410156, 2.388080596923828, 14.405220031738281, 0.5528163909912109, 7.2348480224609375, 0.16552734375, 11.08502197265625, 2.3534393310546875, 1.3796825408935547, 16.547996520996094, 6.295932769775391, 10.687223434448242, -0.07585525512695312, 5.086479187011719, -2.1411895751953125, 2.712493896484375, 1.5948143005371094, 0.08443069458007812, 5.318992614746094, 0.5365581512451172, 4.0979461669921875, 3.371551513671875, -1.9677581787109375, -1.72650146484375, -1.5499153137207031, -3.234771728515625, 6.807991027832031, 0.002838134765625, -2.0630645751953125, -2.241241455078125, 0.17916107177734375, -4.4763031005859375, 1.689788818359375, 3.4644126892089844, 2.1461944580078125, -1.5241317749023438, 11.32568359375, -4.027587890625, 8.618576049804688, 6.547199249267578, 6.835968017578125, 12.55621337890625, 9.132316589355469, 2.2995834350585938, 3.3211669921875, 13.0947265625, -2.3523941040039062, -4.269580841064453, 8.605148315429688, 2.1138839721679688, 7.81011962890625, -6.2288665771484375, 10.439781188964844, 4.356605529785156, -7.893131256103516, 7.9518890380859375, -5.226734161376953, 2.154632568359375, -7.167945861816406, 1.3361968994140625, 0.8307666778564453], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000205.npy"}
{"epoch": 0.30990173847316704, "step": 206, "batch_size": 64, "mean": 1.8963953256607056, "std": 5.0938262939453125, "min": -8.761054992675781, "p10": -3.2455373764038087, "median": 1.2036361694335938, "p90": 7.53996467590332, "max": 17.201751708984375, "pos_frac": 0.609375, "sample": [-0.4056892395019531, -4.369171142578125, 1.958984375, -2.3575057983398438, -4.099948883056641, -3.0404129028320312, 2.566192626953125, 1.048370361328125, 5.3202056884765625, 3.131704330444336, -0.017848968505859375, 11.458106994628906, 4.812446594238281, -3.2790050506591797, 1.8890762329101562, 17.201751708984375, -1.1452255249023438, 4.608741760253906, -6.727996826171875, -0.09191513061523438, 16.82434844970703, 0.3527679443359375, -0.7865829467773438, -2.7960052490234375, -1.129934310913086, 7.3495635986328125, 3.972684860229492, -6.052997589111328, 3.64239501953125, 5.5081024169921875, 6.746986389160156, 8.041007995605469, 2.7076377868652344, 7.540691375732422, 1.780181884765625, -7.8760223388671875, -8.761054992675781, -2.3305587768554688, -1.2448997497558594, 1.2232666015625, 3.439056396484375, -0.25955963134765625, 9.736434936523438, -1.2351493835449219, -2.3899612426757812, 1.0909690856933594, 9.975894927978516, 6.20013427734375, 0.0397491455078125, -2.6656646728515625, 4.9266204833984375, 0.3038330078125, 4.075309753417969, 0.8275146484375, 7.53826904296875, 7.4688568115234375, -0.5490150451660156, 1.8596420288085938, -2.190765380859375, 6.2070159912109375, -3.1674461364746094, 3.3080291748046875, 1.1840057373046875, 2.47308349609375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000206.npy"}
{"epoch": 0.31141345427059713, "step": 207, "batch_size": 64, "mean": 3.1027140617370605, "std": 5.436855316162109, "min": -13.140296936035156, "p10": -2.404888343811035, "median": 2.281644821166992, "p90": 8.236276245117189, "max": 24.334381103515625, "pos_frac": 0.765625, "sample": [24.334381103515625, -0.3574409484863281, 0.8360538482666016, 3.8272933959960938, 2.2785606384277344, 3.4191970825195312, 2.28472900390625, 0.9953498840332031, 0.824798583984375, 7.270805358886719, 1.7583045959472656, 5.4681854248046875, -0.2674369812011719, 6.651023864746094, -2.8925342559814453, -2.600006103515625, -3.5946426391601562, 8.468353271484375, 5.075041770935059, 3.9261398315429688, 7.6689910888671875, 2.9532394409179688, 0.8413543701171875, -1.02532958984375, 0.10471343994140625, 1.2895851135253906, -1.9496135711669922, 10.546669006347656, -1.3487167358398438, 0.9508895874023438, 7.69476318359375, 3.789844512939453, 17.493072509765625, 0.7317657470703125, 1.11077880859375, 2.3009490966796875, -3.3518409729003906, 6.907329559326172, 3.00244140625, 0.0425567626953125, 4.9793243408203125, 2.9856643676757812, 3.087902069091797, 1.1988945007324219, 5.067039489746094, 7.1518707275390625, 1.6917572021484375, -0.7873687744140625, 7.069190979003906, 5.344841003417969, -13.140296936035156, 0.5018234252929688, 2.532318115234375, 5.361217498779297, 1.57373046875, 11.696609497070312, 12.3980712890625, -0.7755851745605469, 7.379554748535156, -0.9407825469970703, -4.348239898681641, 1.854644775390625, -3.6476287841796875, 12.879547119140625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000207.npy"}
{"epoch": 0.3129251700680272, "step": 208, "batch_size": 64, "mean": 1.5979571342468262, "std": 4.858058929443359, "min": -10.9439697265625, "p10": -4.136968612670898, "median": 0.9668598175048828, "p90": 7.652040100097657, "max": 12.080924987792969, "pos_frac": 0.5625, "sample": [-2.564556121826172, -0.2853240966796875, 5.275167465209961, -4.041812896728516, 9.292556762695312, 8.572063446044922, 1.352060317993164, 2.6640777587890625, -10.9439697265625, -5.618000030517578, -0.7918548583984375, -1.94903564453125, 4.346122741699219, 6.596122741699219, 4.510749816894531, 5.739999771118164, -0.34902191162109375, 2.9669189453125, -0.18730926513671875, -3.6155548095703125, 4.890052795410156, -5.951316833496094, -0.40903377532958984, -0.271575927734375, 3.4135265350341797, 6.220407485961914, 5.1099853515625, -0.8634796142578125, -0.819122314453125, 8.724746704101562, -2.672578811645508, 2.9277496337890625, -4.1777496337890625, -0.1790924072265625, -3.7609825134277344, 5.499267578125, 11.702644348144531, -1.9900283813476562, -1.610931396484375, -2.139829635620117, 12.080924987792969, 3.0806350708007812, 0.84918212890625, -6.71923828125, 6.573780059814453, 5.0991058349609375, 7.461212158203125, 0.9361000061035156, 6.476007461547852, 3.8992767333984375, -2.3079605102539062, 7.522178649902344, 0.0574798583984375, 7.707695007324219, -3.169342041015625, 2.863555908203125, 0.99761962890625, 0.27191925048828125, -4.943824768066406, -1.0889091491699219, -6.489418029785156, 1.123504638671875, 5.50714111328125, 9.868576049804688], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000208.npy"}
{"epoch": 0.3144368858654573, "step": 209, "batch_size": 64, "mean": 3.8344333171844482, "std": 5.548288345336914, "min": -6.3703460693359375, "p10": -2.5372421264648435, "median": 3.0172548294067383, "p90": 10.837133789062502, "max": 20.55590057373047, "pos_frac": 0.734375, "sample": [10.083656311035156, 7.727827072143555, -1.5140838623046875, 5.191020965576172, -2.1136016845703125, 13.275970458984375, 3.5691986083984375, 8.995590209960938, 1.0545787811279297, 20.55590057373047, 0.3997802734375, -2.1771774291992188, -6.3703460693359375, 11.84344482421875, -3.92236328125, 1.1635589599609375, 1.0554962158203125, 0.8545265197753906, 0.501617431640625, 16.418655395507812, 1.5205497741699219, -4.707088470458984, -2.6705474853515625, 5.554740905761719, 7.7652130126953125, 1.5900726318359375, -0.00775909423828125, 6.89483642578125, 6.057861328125, 5.954498291015625, 5.269207000732422, 5.203369140625, 5.439081192016602, 7.8031158447265625, 7.900596618652344, 0.8296318054199219, 9.193305969238281, 2.1958847045898438, -6.148872375488281, 4.9147186279296875, -0.3762054443359375, -0.0035066604614257812, 1.2741928100585938, -3.0202789306640625, 9.03851318359375, 2.9243640899658203, 10.52490234375, 10.970947265625, 3.1101455688476562, 6.521427154541016, 1.1217765808105469, -5.2213897705078125, -1.5209598541259766, 3.870086669921875, -0.4998321533203125, 2.909595489501953, 12.882514953613281, -2.2261962890625, 6.327606201171875, 7.03487491607666, 12.240379333496094, 9.583251953125, 1.4402084350585938, -0.6483535766601562], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000209.npy"}
{"epoch": 0.31594860166288735, "step": 210, "batch_size": 64, "mean": 3.826395034790039, "std": 5.164408206939697, "min": -6.307437896728516, "p10": -2.9721439361572264, "median": 4.446651458740234, "p90": 10.236015701293946, "max": 14.588058471679688, "pos_frac": 0.703125, "sample": [7.250701904296875, -0.3181495666503906, 0.802703857421875, 4.0521240234375, -0.8005180358886719, 5.25408935546875, 11.816024780273438, 6.267829895019531, 9.652809143066406, 4.575927734375, 11.0775146484375, 12.42333984375, 6.789276123046875, 9.970184326171875, 9.700218200683594, -0.9278717041015625, -5.248725891113281, -6.307437896728516, 14.588058471679688, -2.722736358642578, 6.943168640136719, 7.0684967041015625, 5.513465881347656, -0.88604736328125, 10.295425415039062, -1.2801284790039062, -1.4554100036621094, 5.234794616699219, 7.258460998535156, -3.0790328979492188, 7.4332122802734375, 3.626382827758789, 4.317375183105469, 0.3226776123046875, 2.270322799682617, -0.28324127197265625, -0.4643211364746094, -0.4432106018066406, -3.82244873046875, 10.097393035888672, -4.352230072021484, 2.158294677734375, 4.120201110839844, 1.1123886108398438, 11.63077163696289, 0.9166126251220703, 7.178386688232422, -0.092193603515625, 8.997360229492188, 4.8681488037109375, 0.9474258422851562, 5.231510162353516, 6.243080139160156, -5.966835021972656, 0.22353363037109375, 14.191322326660156, 9.306087493896484, 2.8282432556152344, 5.820133209228516, -1.7774162292480469, 5.6011505126953125, -4.255958557128906, 6.1383056640625, 7.2582550048828125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000210.npy"}
{"epoch": 0.31746031746031744, "step": 211, "batch_size": 64, "mean": 2.8868651390075684, "std": 4.839840412139893, "min": -7.4125518798828125, "p10": -2.846327972412109, "median": 2.944805145263672, "p90": 8.820329284667968, "max": 20.03271484375, "pos_frac": 0.6875, "sample": [3.3744678497314453, -4.787412643432617, 5.9088592529296875, 2.3248233795166016, -2.027690887451172, 0.7534370422363281, -5.278961181640625, 5.0103912353515625, 3.873687744140625, 0.8511924743652344, -0.1204833984375, 3.554319381713867, -0.966461181640625, 9.660110473632812, 0.4552764892578125, 7.076007843017578, 0.4102134704589844, -1.1148719787597656, 4.0656585693359375, -0.814056396484375, 8.958480834960938, -1.1997528076171875, 6.215297698974609, -4.7651824951171875, -1.7876091003417969, -2.0883140563964844, 1.4730682373046875, 1.5954742431640625, 2.8945846557617188, 3.915233612060547, 2.7127647399902344, -2.5027313232421875, 1.480316162109375, -0.24779510498046875, 5.7272796630859375, 8.156848907470703, 3.6997833251953125, 2.995025634765625, -3.2671051025390625, 6.276958465576172, 4.7671051025390625, 20.03271484375, 12.33544921875, 5.778774261474609, 2.566436767578125, 7.596519470214844, 6.271034240722656, 8.8453369140625, -2.9935836791992188, 10.54119873046875, -7.4125518798828125, 3.1435317993164062, 6.771820068359375, 5.735511779785156, 9.824974060058594, 5.818910598754883, 5.5446319580078125, 4.968963623046875, -1.6863517761230469, 8.761978149414062, -4.463584899902344, -0.8730621337890625, -0.25450897216796875, 0.6869888305664062], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000211.npy"}
{"epoch": 0.31897203325774753, "step": 212, "batch_size": 64, "mean": 3.7758162021636963, "std": 5.676251411437988, "min": -9.092208862304688, "p10": -3.6665878295898438, "median": 3.5858097076416016, "p90": 9.614051055908204, "max": 16.40997314453125, "pos_frac": 0.734375, "sample": [15.521781921386719, 8.509841918945312, 7.410642623901367, 7.8570404052734375, 1.8094100952148438, 3.524311065673828, 14.35089111328125, 2.342742919921875, -4.530364990234375, 7.635345458984375, -4.087894439697266, -9.092208862304688, 8.758247375488281, 10.40576171875, 5.491920471191406, 4.915657043457031, 9.228618621826172, 3.2590675354003906, -7.3929443359375, 3.205047607421875, 6.138496398925781, -0.09522628784179688, 3.647308349609375, 1.8723812103271484, -3.49114990234375, -0.1625213623046875, 6.672698974609375, -1.1395912170410156, 6.936370849609375, 16.151443481445312, 5.147808074951172, 4.975364685058594, 2.7052536010742188, 1.6690673828125, 9.289894104003906, 5.054901123046875, -1.9460029602050781, 7.13818359375, 0.31559181213378906, 5.29412841796875, 5.0331878662109375, -0.01894378662109375, 8.073402404785156, -3.3043670654296875, 1.0067501068115234, 4.646160125732422, -0.6956043243408203, 16.40997314453125, -5.330070495605469, -6.65802001953125, 2.5464229583740234, 1.2198104858398438, 1.5149459838867188, -2.7548904418945312, 8.580764770507812, 13.819202423095703, 9.752975463867188, 9.06011962890625, 8.846321105957031, 2.9567031860351562, -3.7417755126953125, 2.4885101318359375, -2.0077438354492188, 4.911094665527344], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000212.npy"}
{"epoch": 0.3204837490551776, "step": 213, "batch_size": 64, "mean": 3.17116379737854, "std": 6.003276348114014, "min": -12.6151123046875, "p10": -4.167619323730468, "median": 3.1038818359375, "p90": 11.859301757812501, "max": 17.030868530273438, "pos_frac": 0.6875, "sample": [6.024143218994141, -0.8842926025390625, 4.755409240722656, 12.03497314453125, 3.090229034423828, 1.9855833053588867, -2.4876556396484375, -1.1074295043945312, 5.41973876953125, 11.44940185546875, 3.917236328125, 0.2596445083618164, -1.904022216796875, -1.9533662796020508, -12.6151123046875, -4.889068603515625, 5.110857009887695, 7.37310791015625, -1.5038032531738281, 3.70355224609375, 4.753419876098633, -5.526103973388672, 17.030868530273438, 15.26666259765625, 13.651962280273438, 4.529753684997559, -1.6990346908569336, -3.156238555908203, -1.3143959045410156, 0.5684413909912109, -6.059642791748047, -5.4286651611328125, 0.03047943115234375, 3.5595779418945312, 3.058563232421875, -3.7291259765625, 2.592885971069336, 3.117534637451172, 12.825759887695312, 4.185417175292969, 2.8141746520996094, 5.67242431640625, 9.337234497070312, 1.6838302612304688, 11.405220031738281, 1.899627685546875, 5.881843566894531, 6.279205322265625, 15.507362365722656, -0.6357269287109375, -4.3555450439453125, 4.533649444580078, -2.136749267578125, 3.8953628540039062, -1.4350662231445312, 1.4901046752929688, -5.6400146484375, 16.853271484375, 7.304473876953125, 2.648590087890625, 5.06451416015625, 5.634128570556641, 6.446479797363281, 6.768852233886719], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000213.npy"}
{"epoch": 0.3219954648526077, "step": 214, "batch_size": 64, "mean": 3.113189220428467, "std": 5.486883163452148, "min": -7.0165252685546875, "p10": -2.8891679763793943, "median": 2.6866512298583984, "p90": 9.276254653930664, "max": 20.39471435546875, "pos_frac": 0.71875, "sample": [5.305093765258789, 0.48494720458984375, 0.872283935546875, 0.5318012237548828, -0.9572639465332031, 7.032283782958984, 2.6164093017578125, 3.151388168334961, 3.4201431274414062, 4.9102630615234375, 8.935640335083008, 2.775848388671875, -1.1209793090820312, -6.515655517578125, 5.373443603515625, 6.1848602294921875, 5.972175598144531, -4.104217529296875, 3.6152191162109375, 7.40477180480957, 1.859344482421875, 6.947040557861328, 1.68707275390625, -0.4817657470703125, 0.8267793655395508, 8.797801971435547, 2.8875350952148438, 2.254436492919922, 11.610679626464844, 4.357666015625, 5.15118408203125, -1.0010185241699219, -4.789051055908203, -7.0165252685546875, 2.875244140625, 9.206600189208984, -1.35272216796875, -1.4920806884765625, 11.384092330932617, 20.191177368164062, 2.4667205810546875, -2.809803009033203, 6.667327880859375, 2.342479705810547, 8.030376434326172, 20.39471435546875, -1.0445785522460938, -0.02584075927734375, 0.6271820068359375, -1.75714111328125, 0.3299560546875, 9.306106567382812, 2.8857650756835938, -1.1080741882324219, 1.5792961120605469, 2.7568931579589844, 14.947357177734375, 3.2213516235351562, -5.680179595947266, 1.5007171630859375, 9.533428192138672, 3.4114990234375, -5.200225830078125, -2.9231815338134766], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000214.npy"}
{"epoch": 0.3235071806500378, "step": 215, "batch_size": 64, "mean": 4.093410968780518, "std": 5.433970928192139, "min": -6.0977783203125, "p10": -2.2985527038574216, "median": 4.048797607421875, "p90": 11.292925262451172, "max": 16.998374938964844, "pos_frac": 0.734375, "sample": [1.8738632202148438, -3.2308502197265625, -1.6703033447265625, 5.546699523925781, 2.779052734375, -2.0683517456054688, 0.7560806274414062, -0.637237548828125, 13.509891510009766, 11.50555419921875, -2.3238601684570312, -2.239501953125, -2.183502197265625, 3.3098220825195312, 6.165138244628906, 11.191169738769531, 1.507598876953125, 7.084648132324219, 5.9697113037109375, 2.1777191162109375, 4.466072082519531, 4.311042785644531, -1.2727546691894531, 5.2347259521484375, 10.916301727294922, 4.505132675170898, -5.727714538574219, 10.572402954101562, 7.327400207519531, -3.709430694580078, 1.7532196044921875, 7.927467346191406, 1.2478408813476562, -1.8740692138671875, -3.6195526123046875, -6.0977783203125, 7.515645980834961, 4.473060607910156, 7.751106262207031, 7.35980224609375, 2.2841949462890625, 1.8046875, 2.723419189453125, -0.637939453125, 13.375869750976562, 16.998374938964844, 11.278564453125, 13.045427322387695, 5.426177978515625, 5.6831207275390625, 11.266204833984375, 7.0440216064453125, 4.131744384765625, -3.6343955993652344, 0.2433452606201172, 3.120542526245117, 5.1622314453125, 14.580596923828125, 11.299079895019531, -0.7095890045166016, 11.237350463867188, 1.5441207885742188, 3.965850830078125, -1.3379745483398438], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000215.npy"}
{"epoch": 0.3250188964474679, "step": 216, "batch_size": 64, "mean": 3.350710391998291, "std": 5.3715949058532715, "min": -10.108200073242188, "p10": -3.26592903137207, "median": 3.425189971923828, "p90": 9.906108856201172, "max": 16.02260971069336, "pos_frac": 0.75, "sample": [0.6566390991210938, 9.312705993652344, 4.7694091796875, 3.6832809448242188, 7.716541290283203, 2.7497787475585938, 4.522205352783203, 3.7665328979492188, 7.3376007080078125, 7.998191833496094, 2.8977203369140625, 5.77740478515625, 8.373954772949219, 3.1105499267578125, 1.650390625, -2.2832260131835938, 4.389232635498047, 4.456703186035156, -7.160850524902344, 9.942901611328125, 6.0983428955078125, 11.357284545898438, 3.1670989990234375, 2.542724609375, 6.340122222900391, -1.9077091217041016, 5.753631591796875, 0.5739269256591797, 4.606044769287109, 11.024566650390625, 6.226715087890625, -10.108200073242188, 1.0898818969726562, 3.709125518798828, 2.8374481201171875, 1.55706787109375, -2.9037246704101562, -0.9393405914306641, 8.120903015136719, -5.457485198974609, -1.0345497131347656, 11.918182373046875, 13.562522888183594, -2.8401412963867188, -3.4083518981933594, -0.6837310791015625, 5.015815734863281, 0.8791694641113281, -5.824348449707031, 3.06744384765625, 16.02260971069336, -2.9336090087890625, -5.357337951660156, 0.3190422058105469, 14.477203369140625, -1.3164615631103516, 5.474586486816406, 5.175193786621094, 0.9923171997070312, 9.16876220703125, -3.448108673095703, 7.5485992431640625, 0.49431610107421875, 9.820259094238281], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000216.npy"}
{"epoch": 0.32653061224489793, "step": 217, "batch_size": 64, "mean": 3.5603256225585938, "std": 5.970407485961914, "min": -7.78485107421875, "p10": -3.587527465820312, "median": 3.123523712158203, "p90": 11.298881149291992, "max": 22.29949951171875, "pos_frac": 0.734375, "sample": [3.2513580322265625, 11.879032135009766, -3.85406494140625, 0.91424560546875, -1.2322216033935547, -0.2317047119140625, -0.012670516967773438, 22.29949951171875, -0.866180419921875, -7.78485107421875, 11.169113159179688, 12.529159545898438, 4.229766845703125, 3.9459304809570312, 5.703575134277344, 3.4420013427734375, 4.772186279296875, 2.9956893920898438, 3.319366455078125, 0.7044219970703125, 0.0108489990234375, 11.354496002197266, 5.50299072265625, 4.323448181152344, 4.904197692871094, 9.008628845214844, -6.034263610839844, 2.0758056640625, -0.9511985778808594, 9.072891235351562, 6.598026275634766, 1.816650390625, 8.158279418945312, -6.3414154052734375, 2.1325950622558594, -4.349700927734375, 0.5853805541992188, 5.561252593994141, -2.965606689453125, -4.284965515136719, -7.012451171875, 11.132068634033203, -0.9818649291992188, 1.1370697021484375, 0.78863525390625, -2.7725677490234375, 0.7833938598632812, 3.900970458984375, 3.676483154296875, 16.567413330078125, 2.799497604370117, 8.78167724609375, 2.69775390625, 0.7120399475097656, 14.341705322265625, 2.20904541015625, 17.92523193359375, 4.8465118408203125, 3.9333953857421875, 5.8173370361328125, 5.618743896484375, 9.947982788085938, -0.31671142578125, -2.024505615234375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000217.npy"}
{"epoch": 0.328042328042328, "step": 218, "batch_size": 64, "mean": 2.4268999099731445, "std": 5.602859973907471, "min": -9.040557861328125, "p10": -4.446923446655274, "median": 2.6011428833007812, "p90": 10.032453155517588, "max": 17.09333038330078, "pos_frac": 0.671875, "sample": [-3.3366928100585938, 0.5258216857910156, 4.974273681640625, 3.044412612915039, 2.147857666015625, 1.232574462890625, 5.023838043212891, -0.294464111328125, 5.632965087890625, 5.536376953125, -0.066375732421875, 2.2687606811523438, 6.50628662109375, -4.0802764892578125, -4.322971343994141, -2.3882789611816406, 5.9190826416015625, 11.839996337890625, -9.040557861328125, 3.534099578857422, 3.1665115356445312, 7.132301330566406, 0.32281494140625, -1.0277252197265625, 7.093503952026367, 12.8145751953125, 7.0072479248046875, 13.0738525390625, 4.345195770263672, 4.8610382080078125, -1.1693286895751953, 3.7445068359375, -4.5000457763671875, -6.420782089233398, -2.9854202270507812, 2.6955718994140625, 0.1651153564453125, 1.3045539855957031, 0.04889678955078125, 4.6901092529296875, 11.127655029296875, 2.5067138671875, 2.2272138595581055, -1.70654296875, 2.7205429077148438, -6.937431335449219, 7.476982116699219, -1.58831787109375, -3.7630233764648438, -7.659172058105469, 6.617578506469727, -7.250083923339844, 6.66119384765625, -0.967559814453125, 5.7276611328125, 4.2046661376953125, 5.25250244140625, 17.09333038330078, -7.992862701416016, -0.39720916748046875, 2.2201614379882812, 3.6920166015625, 12.8489990234375, 12.187341690063477], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000218.npy"}
{"epoch": 0.3295540438397581, "step": 219, "batch_size": 64, "mean": 3.1978893280029297, "std": 5.0279717445373535, "min": -8.2967529296875, "p10": -2.8865516662597654, "median": 3.0207176208496094, "p90": 9.594879150390625, "max": 14.531158447265625, "pos_frac": 0.734375, "sample": [9.396209716796875, 11.443885803222656, 4.709129333496094, 7.841703414916992, 10.6058349609375, -0.57598876953125, -0.7322807312011719, 4.265983581542969, 2.2903671264648438, -4.919097900390625, 0.6885700225830078, 14.531158447265625, 8.056365966796875, 0.5512542724609375, -1.5240554809570312, 2.162445068359375, 1.1330337524414062, 1.1921138763427734, -0.4391937255859375, -1.6908025741577148, -2.418701171875, -8.2967529296875, 0.6080703735351562, 2.3329734802246094, -2.9229278564453125, 0.22930526733398438, 8.818099975585938, -0.7960910797119141, 9.680023193359375, 2.7897491455078125, 8.008899688720703, 7.81085205078125, 6.1174774169921875, 9.004032135009766, 8.2886962890625, 12.209434509277344, 1.423065185546875, 4.251136779785156, -5.681755065917969, 1.8412399291992188, 4.181446075439453, 3.2516860961914062, 5.600364685058594, -4.005714416503906, 4.785797119140625, -4.096836090087891, 8.84730339050293, -2.414398193359375, 5.490818023681641, 11.03271484375, -0.7339935302734375, 3.7897415161132812, 7.322883605957031, 0.8708267211914062, -7.078300476074219, 6.3562774658203125, 5.708900451660156, 0.2761688232421875, 3.624908447265625, -2.8016738891601562, 10.840049743652344, 3.6605224609375, 5.71258544921875, 2.15936279296875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000219.npy"}
{"epoch": 0.3310657596371882, "step": 220, "batch_size": 64, "mean": 2.6936841011047363, "std": 5.354838848114014, "min": -7.9678449630737305, "p10": -3.763618469238281, "median": 2.325639247894287, "p90": 9.766057586669925, "max": 16.74195098876953, "pos_frac": 0.671875, "sample": [-1.5386199951171875, 2.8492889404296875, 2.573253631591797, 1.4123497009277344, 3.508040428161621, -0.2875213623046875, 8.875640869140625, 0.22318267822265625, 0.19224929809570312, 1.5235137939453125, 0.38979339599609375, 13.853691101074219, 4.381254196166992, -6.840106964111328, 2.4097328186035156, 5.238323211669922, 6.359443664550781, -4.7969207763671875, -0.5147705078125, 7.83099365234375, 6.530086517333984, 7.226142883300781, -1.794485092163086, -2.2240447998046875, -3.2014694213867188, -7.502960205078125, 0.31573486328125, -1.6612319946289062, 10.18170166015625, 5.828361511230469, 9.191566467285156, 3.3024368286132812, 5.88104248046875, 5.61848258972168, 8.386653900146484, -0.9469451904296875, 10.01226806640625, -0.342376708984375, 11.413572311401367, -4.004539489746094, 2.326416015625, 2.324862480163574, -0.487396240234375, 3.3845481872558594, 16.74195098876953, -1.9611358642578125, 12.287147521972656, -0.9536342620849609, -3.021953582763672, 6.200038909912109, 1.9412269592285156, -7.9678449630737305, 7.966939926147461, -0.8111534118652344, 1.679168701171875, 4.1882476806640625, -6.475437164306641, 6.519157409667969, 5.253410339355469, 0.45621490478515625, 13.259620666503906, -4.756458282470703, 3.3584861755371094, 1.0905399322509766], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000220.npy"}
{"epoch": 0.3325774754346183, "step": 221, "batch_size": 64, "mean": 3.0300214290618896, "std": 5.8037590980529785, "min": -11.8302001953125, "p10": -3.1064884185791013, "median": 2.044403076171875, "p90": 11.467881774902343, "max": 17.20538330078125, "pos_frac": 0.671875, "sample": [4.572517395019531, -0.8128242492675781, 3.93487548828125, 3.6673507690429688, -2.1281356811523438, 5.5140838623046875, -0.34120941162109375, 3.14453125, 1.6180572509765625, -3.170013427734375, -2.3667755126953125, -0.4223785400390625, 7.595542907714844, 7.3159332275390625, -2.7309188842773438, 0.9457130432128906, -0.6876220703125, 0.563262939453125, 1.0341777801513672, 0.14827728271484375, 11.468719482421875, 11.465927124023438, 1.8919754028320312, 6.0816802978515625, 12.28375244140625, 0.7412376403808594, 5.246623992919922, 12.435253143310547, -1.4243240356445312, -2.8506526947021484, 8.901931762695312, 9.405517578125, -4.723339080810547, 3.2907333374023438, 12.307334899902344, 4.283504486083984, 5.953094482421875, -4.8068695068359375, 6.498119354248047, 6.366241455078125, 6.151760101318359, 0.6128005981445312, -0.26715087890625, 2.441495895385742, 1.8357810974121094, -0.8774795532226562, -2.958263397216797, 13.604965209960938, -4.005126953125, -2.88104248046875, -11.8302001953125, -10.977020263671875, 1.8070144653320312, 10.508544921875, 8.792694091796875, -0.2884979248046875, 2.1968307495117188, 5.0340423583984375, 17.20538330078125, 8.275726318359375, -3.9419326782226562, 6.2804107666015625, 13.350883483886719, 1.63885498046875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000221.npy"}
{"epoch": 0.3340891912320484, "step": 222, "batch_size": 64, "mean": 3.889547824859619, "std": 4.97367525100708, "min": -5.946502685546875, "p10": -1.8779159545898436, "median": 3.9092350006103516, "p90": 10.308116149902345, "max": 16.982961654663086, "pos_frac": 0.71875, "sample": [-1.2362747192382812, -3.053741455078125, 2.756114959716797, 2.0342254638671875, 5.115375518798828, 7.1344146728515625, 2.3037681579589844, -1.0220870971679688, 8.931884765625, 11.361690521240234, 7.344940185546875, 1.8276100158691406, -0.5476226806640625, 3.2168045043945312, 4.91179084777832, 0.5876998901367188, -1.7449874877929688, 2.7725906372070312, -0.6240577697753906, 2.8378829956054688, 4.631324768066406, 0.010507583618164062, 10.886138916015625, -0.9580917358398438, -0.5302505493164062, 12.848026275634766, 7.0594482421875, 14.60736083984375, -3.6954574584960938, 2.6776809692382812, 3.964641571044922, 6.873443603515625, 6.068016052246094, 8.220415115356445, 8.368789672851562, 5.8077239990234375, 5.410911560058594, -4.385353088378906, -3.3660240173339844, -0.911346435546875, -1.1074676513671875, 10.37554931640625, 0.6093292236328125, 6.349647521972656, 0.7489242553710938, 9.8077392578125, -1.276763916015625, -3.2857131958007812, 3.8538284301757812, 5.075172424316406, 7.713478088378906, 3.4987564086914062, 7.486824035644531, 16.982961654663086, -1.8289642333984375, 6.612491607666016, 11.764190673828125, 7.7532501220703125, -1.898895263671875, 10.150772094726562, 4.1628265380859375, 6.221933364868164, 6.6117706298828125, -5.946502685546875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000222.npy"}
{"epoch": 0.3356009070294785, "step": 223, "batch_size": 64, "mean": 4.138675212860107, "std": 5.1979217529296875, "min": -8.047172546386719, "p10": -2.1287818908691403, "median": 3.619962692260742, "p90": 10.226885986328124, "max": 17.658905029296875, "pos_frac": 0.8125, "sample": [1.798971176147461, 10.00814437866211, 2.8836746215820312, 0.8578338623046875, 0.9015960693359375, 6.795648574829102, -2.9923095703125, 0.703643798828125, -3.8043746948242188, 7.438447952270508, 4.110782623291016, 16.612396240234375, 5.0590057373046875, 3.45263671875, 0.6741714477539062, 2.2281570434570312, 1.0637969970703125, 3.985076904296875, 5.4758453369140625, -3.9241485595703125, 8.674545288085938, 5.320762634277344, -6.576061248779297, 2.2504653930664062, 5.262493133544922, 1.9061927795410156, 0.5821456909179688, 10.19403076171875, -2.8660888671875, -8.047172546386719, 5.483560562133789, 2.725555419921875, -0.4537200927734375, 2.6015396118164062, 5.006340026855469, 9.227603912353516, 5.846138000488281, 9.457901000976562, -1.814697265625, 1.7150611877441406, -1.0466232299804688, 4.579303741455078, 1.8824615478515625, 7.199928283691406, 12.915374755859375, 7.79779052734375, 10.240966796875, 14.039058685302734, -0.198822021484375, 8.85549545288086, 17.658905029296875, -2.2633895874023438, 11.262466430664062, 13.781768798828125, 6.284629821777344, 2.7197036743164062, 6.502902984619141, 9.366695404052734, 2.3983306884765625, -1.2019805908203125, 1.9860305786132812, 3.8726959228515625, 3.7872886657714844, 2.628631591796875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000223.npy"}
{"epoch": 0.3371126228269085, "step": 224, "batch_size": 64, "mean": 2.82529878616333, "std": 4.4951605796813965, "min": -5.6129150390625, "p10": -2.5862497329711913, "median": 2.4556007385253906, "p90": 8.856816864013673, "max": 16.710540771484375, "pos_frac": 0.71875, "sample": [1.549835205078125, 16.710540771484375, -0.13643646240234375, 0.9908905029296875, 1.67315673828125, 3.804229736328125, 5.12054443359375, 1.7591972351074219, -1.5973377227783203, 1.221588134765625, 5.580291748046875, 7.118852615356445, 5.996940612792969, 10.02239990234375, 3.229372024536133, -0.30513572692871094, 3.2129249572753906, 5.175736427307129, 5.967830657958984, 3.07904052734375, -2.1937408447265625, 5.572746276855469, 1.4551239013671875, -5.132938385009766, -2.604583740234375, -5.337749481201172, 1.6860313415527344, 2.2700271606445312, 3.746368408203125, -3.8262481689453125, 9.911808013916016, -1.4105892181396484, -2.5434703826904297, 2.0717010498046875, 4.533012390136719, -0.09772300720214844, -2.1051387786865234, -5.6129150390625, 3.4180564880371094, 2.969573974609375, 0.77679443359375, 1.243194580078125, 6.856128692626953, -0.2592658996582031, -4.883270263671875, 1.670196533203125, 9.928230285644531, 7.714580535888672, 5.3222503662109375, 2.3702850341796875, -3.1725940704345703, 3.628438949584961, 4.929483413696289, 8.515602111816406, 13.289161682128906, 5.165435791015625, 2.5409164428710938, 5.6109619140625, -1.2049427032470703, 10.50687026977539, -0.75048828125, 9.0030517578125, 2.3318328857421875, 2.7424659729003906], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000224.npy"}
{"epoch": 0.3386243386243386, "step": 225, "batch_size": 64, "mean": 3.671389579772949, "std": 6.258881568908691, "min": -12.339447021484375, "p10": -2.9137392044067383, "median": 1.9170646667480469, "p90": 11.821765899658203, "max": 21.853851318359375, "pos_frac": 0.671875, "sample": [-1.7068099975585938, 1.6953582763671875, -0.2752037048339844, 1.3362579345703125, 7.2364501953125, 10.95953369140625, 1.4490280151367188, -2.312358856201172, -0.7842121124267578, 12.65045166015625, 1.00299072265625, 7.59063720703125, -1.3409862518310547, 8.870620727539062, 1.406951904296875, 1.2350540161132812, -1.2140731811523438, 1.1129379272460938, 5.612911224365234, 11.765716552734375, 11.879047393798828, 6.301990509033203, 21.853851318359375, 3.14453125, 16.9970703125, 1.7506790161132812, 7.99346923828125, 5.9595184326171875, 8.872795104980469, -2.9365673065185547, 2.6638031005859375, -12.339447021484375, 8.2945556640625, -3.4142532348632812, 11.845787048339844, 6.4697418212890625, 4.15201473236084, -2.146697998046875, 6.689422607421875, -0.12000274658203125, 3.225860595703125, 1.9818115234375, 14.751228332519531, -2.6363677978515625, 6.4911346435546875, -7.334014892578125, -3.641265869140625, -0.18215179443359375, 9.856849670410156, -0.6274337768554688, 4.5586090087890625, 1.8523178100585938, 7.0353546142578125, 10.264503479003906, 12.159858703613281, -5.099766731262207, 0.4548187255859375, 1.32171630859375, 11.170629501342773, -0.07125282287597656, -2.8604736328125, -5.208160400390625, 10.106880187988281, -2.8043289184570312], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000225.npy"}
{"epoch": 0.3401360544217687, "step": 226, "batch_size": 64, "mean": 3.361865520477295, "std": 5.588277816772461, "min": -10.407234191894531, "p10": -3.026152992248535, "median": 3.3516950607299805, "p90": 11.0755973815918, "max": 16.378814697265625, "pos_frac": 0.75, "sample": [4.2510223388671875, 0.24102783203125, -0.16809844970703125, 5.92352294921875, 4.281005859375, 2.9477615356445312, 11.867324829101562, 7.991455078125, 0.863189697265625, -6.958627700805664, 7.226938247680664, 1.452301025390625, 5.947898864746094, 10.11526107788086, -1.5009193420410156, -1.0735931396484375, 3.7556285858154297, 0.11229705810546875, 9.998647689819336, 3.9663619995117188, 11.459686279296875, 1.2643814086914062, -0.3363800048828125, 6.669227600097656, 3.909881591796875, 4.181282043457031, 1.9457969665527344, 16.378814697265625, 7.917877197265625, 9.012321472167969, -3.6309585571289062, 12.694503784179688, 1.8700408935546875, -1.0146903991699219, 12.038394927978516, 1.2558135986328125, 1.9522895812988281, -8.995254516601562, -3.0839920043945312, 1.0305500030517578, -0.903564453125, 4.3290557861328125, -2.560302734375, -5.355569839477539, 4.929595947265625, 12.835174560546875, -10.407234191894531, 6.603302001953125, -6.694616317749023, 8.163787841796875, 4.449684143066406, 2.2669315338134766, 2.452655792236328, 5.588657379150391, 0.7859954833984375, 14.1868896484375, 7.912078857421875, 6.1806488037109375, 0.45313262939453125, -2.891195297241211, 10.179389953613281, -1.2224798202514648, 0.6919345855712891, 5.425441741943359], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000226.npy"}
{"epoch": 0.3416477702191988, "step": 227, "batch_size": 64, "mean": 3.3981308937072754, "std": 6.434782981872559, "min": -11.652275085449219, "p10": -3.4853382110595703, "median": 2.4390010833740234, "p90": 12.761160278320315, "max": 16.89324951171875, "pos_frac": 0.6875, "sample": [13.074684143066406, 6.865856170654297, -0.2298126220703125, 12.204116821289062, 1.622711181640625, -10.92470932006836, 6.527462005615234, 11.832138061523438, -11.652275085449219, -1.9313182830810547, 11.146102905273438, -4.5525360107421875, -1.5472679138183594, 13.157342910766602, 10.604320526123047, -2.9368820190429688, -0.6088790893554688, -1.4117965698242188, 9.780925750732422, -0.347900390625, 0.411346435546875, 7.106513977050781, 5.5069122314453125, 5.0212249755859375, 15.628562927246094, 7.515892028808594, -3.349781036376953, 9.566749572753906, -2.2856216430664062, -1.7663135528564453, -1.1684799194335938, 14.619815826416016, 12.999893188476562, 2.1180648803710938, 6.515289306640625, 4.96360969543457, 16.89324951171875, 0.20406341552734375, 2.3205528259277344, 7.590492248535156, 0.1012725830078125, 2.5574493408203125, -1.02532958984375, 4.51507568359375, 7.849273681640625, 1.3139572143554688, 5.00531005859375, 4.564605712890625, -3.5434341430664062, 0.6278629302978516, -7.235485076904297, 4.3314666748046875, 2.2877960205078125, 14.759819030761719, 1.0978317260742188, -2.102783203125, 2.7367172241210938, 0.4326591491699219, 7.981689453125, 0.472991943359375, -5.656700134277344, -8.315078735351562, 7.190818786621094, 6.448280334472656], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000227.npy"}
{"epoch": 0.3431594860166289, "step": 228, "batch_size": 64, "mean": 3.0576257705688477, "std": 5.66430139541626, "min": -10.104255676269531, "p10": -3.901648712158203, "median": 3.2009096145629883, "p90": 10.36367473602295, "max": 16.57589340209961, "pos_frac": 0.703125, "sample": [-2.0075111389160156, 13.82859992980957, 8.555252075195312, 13.802215576171875, 5.954463958740234, 5.6079254150390625, 2.494779586791992, 10.099130630493164, 2.957305908203125, -1.3063430786132812, 5.387645721435547, 1.48870849609375, 5.242462158203125, -8.851367950439453, 1.62664794921875, 3.3388919830322266, 6.267734527587891, 0.8363189697265625, -1.483489990234375, 9.952152252197266, 3.4676589965820312, 3.06292724609375, -1.7605400085449219, 2.2758026123046875, -3.78369140625, 5.29400634765625, 9.514030456542969, 16.57589340209961, 6.74627685546875, 6.529918670654297, -3.704599380493164, -4.940071105957031, 4.394783020019531, -4.9625244140625, 6.175533294677734, -10.104255676269531, 4.386322021484375, -0.23163986206054688, -7.78582763671875, -3.9522018432617188, 3.8284759521484375, 1.7529449462890625, 9.31915283203125, -3.9554786682128906, 1.593994140625, 2.5042171478271484, 10.594009399414062, 3.6890830993652344, -3.451587677001953, 4.356292724609375, -1.8156089782714844, -1.3431777954101562, 11.593994140625, 4.206432342529297, -1.6469497680664062, 5.360591888427734, 10.47705078125, 13.828594207763672, 7.9754791259765625, -2.2914276123046875, 0.978271484375, 0.20966339111328125, 4.582300186157227, 2.3524017333984375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000228.npy"}
{"epoch": 0.34467120181405897, "step": 229, "batch_size": 64, "mean": 3.6718177795410156, "std": 5.214012622833252, "min": -7.1215667724609375, "p10": -2.9540931701660154, "median": 4.1297407150268555, "p90": 9.864844512939456, "max": 17.551666259765625, "pos_frac": 0.765625, "sample": [3.0174179077148438, 5.551776885986328, 10.106536865234375, 6.7629852294921875, 4.9542694091796875, 4.738006591796875, 11.086572647094727, 5.724225997924805, 6.15618896484375, 0.16280746459960938, 0.462005615234375, 6.12896728515625, 9.073051452636719, -5.122749328613281, 4.705169677734375, 4.313497543334961, 3.93463134765625, -3.0203018188476562, 9.156204223632812, 5.18011474609375, 3.5170440673828125, 1.8205137252807617, 1.9630928039550781, 6.8100433349609375, -6.2712249755859375, 5.858001708984375, -7.1215667724609375, -1.7945833206176758, -0.8933200836181641, 3.94598388671875, 6.0944976806640625, 17.551666259765625, -6.307781219482422, 6.573097229003906, 0.08866119384765625, 0.7663307189941406, 10.344680786132812, -2.7996063232421875, -1.4540863037109375, 3.0513458251953125, 15.076629638671875, 5.29737663269043, -0.08928680419921875, -4.711706161499023, 4.8947601318359375, 4.430820465087891, 8.918802261352539, 6.783313751220703, 3.296955108642578, 5.79107666015625, 3.8329925537109375, -0.3287773132324219, 3.590496063232422, 1.3837890625, 7.120643615722656, 5.746757507324219, -2.2809715270996094, 0.9645824432373047, 11.972803115844727, 13.59307861328125, 9.300895690917969, 3.3339881896972656, -0.6437873840332031, -7.093059539794922], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000229.npy"}
{"epoch": 0.34618291761148906, "step": 230, "batch_size": 64, "mean": 2.810325860977173, "std": 5.390380859375, "min": -10.256256103515625, "p10": -4.066889190673828, "median": 2.263029098510742, "p90": 9.65102424621582, "max": 13.687942504882812, "pos_frac": 0.71875, "sample": [-4.330360412597656, 4.439361572265625, 3.984111785888672, 1.0008125305175781, 5.823299407958984, -10.256256103515625, 1.7444953918457031, 0.46502685546875, -3.82122802734375, 9.248573303222656, 3.2232284545898438, 9.675918579101562, 11.820472717285156, 4.1659393310546875, -0.27980804443359375, 3.0011444091796875, 8.260528564453125, 10.493396759033203, -5.7249298095703125, -0.8560714721679688, -9.832206726074219, 4.92779541015625, 0.09783935546875, 2.4057159423828125, 5.040000915527344, -1.9875411987304688, 7.6499786376953125, -0.29461669921875, 8.524974822998047, 8.579475402832031, 4.384614944458008, -1.4253959655761719, -0.5378952026367188, 1.6575126647949219, 1.65814208984375, -4.172172546386719, -0.5551834106445312, 1.7052116394042969, 11.75506591796875, 6.16436767578125, 4.666294097900391, 4.0850372314453125, -0.3994560241699219, 3.3739471435546875, 13.687942504882812, 4.706981658935547, -7.611457824707031, 9.592937469482422, 1.6551971435546875, -2.813455581665039, 7.724365234375, -6.95806884765625, 5.764350891113281, 6.237916946411133, 2.120342254638672, 12.616832733154297, 2.0424747467041016, 1.2399368286132812, 1.7496223449707031, 0.9432754516601562, 0.4189453125, 13.353744506835938, -2.5962905883789062, 6.43609619140625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000230.npy"}
{"epoch": 0.3476946334089191, "step": 231, "batch_size": 64, "mean": 4.152383327484131, "std": 5.923521041870117, "min": -8.108551025390625, "p10": -2.8454086303710935, "median": 3.26068115234375, "p90": 12.288547134399415, "max": 16.481521606445312, "pos_frac": 0.796875, "sample": [10.731330871582031, -1.6536941528320312, 11.855247497558594, 1.5683441162109375, 0.072021484375, 2.275146484375, 0.300689697265625, 10.83099365234375, 3.2968597412109375, -0.2985963821411133, 3.3654937744140625, 1.0030364990234375, 4.029151916503906, 3.0245132446289062, -5.2710418701171875, 0.7751197814941406, 0.5458984375, 12.102916717529297, 9.839950561523438, -1.7509918212890625, 4.5479278564453125, -6.549797058105469, 5.648433685302734, 5.892955780029297, 0.9994010925292969, 11.643020629882812, -3.9414138793945312, -7.562290191650391, 13.577985763549805, 7.893789291381836, 13.006141662597656, -2.9360504150390625, 1.31988525390625, 15.495986938476562, 5.134864807128906, -6.14715576171875, 1.8121061325073242, 8.890522003173828, 4.6560211181640625, -1.6682357788085938, 10.771114349365234, 1.5236587524414062, 0.8278732299804688, -2.6339111328125, 7.832557678222656, 0.14931392669677734, 2.8311004638671875, -1.6251640319824219, 4.310813903808594, 4.866565704345703, 6.433540344238281, 12.36810302734375, 2.1985015869140625, 7.5713958740234375, 11.544940948486328, 16.481521606445312, 3.2245025634765625, 6.376762390136719, 2.7288818359375, 9.847564697265625, -8.108551025390625, 1.8429412841796875, 12.938236236572266, 13.093793869018555], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000231.npy"}
{"epoch": 0.3492063492063492, "step": 232, "batch_size": 64, "mean": 2.7846975326538086, "std": 6.430558681488037, "min": -10.728302001953125, "p10": -5.270431423187255, "median": 2.0389404296875, "p90": 10.774085998535156, "max": 18.281814575195312, "pos_frac": 0.65625, "sample": [-0.5842170715332031, 10.799686431884766, 5.398918151855469, 18.281814575195312, 2.6749649047851562, -8.984588623046875, -3.53326416015625, 10.714351654052734, -5.70343017578125, 2.2607803344726562, -4.968597412109375, 1.9607696533203125, -7.810821533203125, 0.2628517150878906, -2.5643043518066406, 7.2946014404296875, -0.9370536804199219, 0.0988616943359375, 1.1522750854492188, 4.185497283935547, 1.4850234985351562, 5.420989990234375, 1.0557708740234375, 7.340301513671875, 5.629188537597656, 2.615781784057617, -3.906078338623047, 11.54647445678711, -3.2486953735351562, 9.709510803222656, 1.1671295166015625, 8.324859619140625, -5.399788856506348, -0.7524681091308594, -0.181427001953125, -0.4284210205078125, -0.5705852508544922, -0.5216903686523438, 17.303131103515625, 3.3856277465820312, 0.15683746337890625, 7.899444580078125, 3.0352935791015625, 18.20770263671875, 11.120208740234375, 17.559844970703125, -8.041011810302734, 1.6424179077148438, -2.2962722778320312, 2.1171112060546875, 10.179153442382812, 2.580904006958008, 10.205978393554688, 4.088672637939453, 6.783256530761719, -5.6369171142578125, -0.4854116439819336, 3.450786590576172, 0.6573524475097656, 4.472431182861328, -0.9716339111328125, -10.728302001953125, 7.721275329589844, 4.527795791625977], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000232.npy"}
{"epoch": 0.3507180650037793, "step": 233, "batch_size": 64, "mean": 3.1586742401123047, "std": 5.92794942855835, "min": -8.99920654296875, "p10": -4.2224876403808596, "median": 2.7226028442382812, "p90": 11.488830947875977, "max": 16.67138671875, "pos_frac": 0.640625, "sample": [16.454345703125, 7.42076301574707, 8.196067810058594, -2.5953540802001953, 5.417484283447266, 7.270264625549316, -0.6434326171875, 11.49911117553711, -1.0431289672851562, 4.1677398681640625, 6.891803741455078, 14.305526733398438, -1.0533370971679688, -2.130035400390625, -1.5345001220703125, 12.353416442871094, 4.364509582519531, 0.4598960876464844, -0.9096908569335938, -2.183818817138672, 10.501800537109375, 3.2655487060546875, 7.838962554931641, 2.20269775390625, -7.543548583984375, 5.705146789550781, -3.9780654907226562, 2.563457489013672, -4.697654724121094, -3.9283599853515625, 5.109172821044922, -0.7989330291748047, 7.6078948974609375, 4.633136749267578, 1.6560821533203125, -5.521949768066406, 6.751258850097656, 1.7819290161132812, 2.8817481994628906, 0.45980072021484375, 4.8567962646484375, -8.99920654296875, 12.248779296875, 1.4443511962890625, 5.1338958740234375, -1.4979743957519531, 3.2566146850585938, -4.327239990234375, 10.537364959716797, 7.934850692749023, 0.10549163818359375, -2.62255859375, 0.9401283264160156, 11.46484375, 12.118438720703125, 16.67138671875, 2.890758514404297, -5.507568359375, 9.39361572265625, -4.73089599609375, -0.00246429443359375, -0.5875205993652344, -0.7558193206787109, 8.991317749023438], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000233.npy"}
{"epoch": 0.35222978080120937, "step": 234, "batch_size": 64, "mean": 2.900754928588867, "std": 5.713118553161621, "min": -8.195659637451172, "p10": -4.707465362548828, "median": 2.380539894104004, "p90": 11.428985595703125, "max": 13.317680358886719, "pos_frac": 0.6875, "sample": [-0.7450714111328125, -7.1858062744140625, 4.239112854003906, -2.500885009765625, 1.8025341033935547, -1.982574462890625, 9.262514114379883, -0.4180793762207031, 8.21466064453125, -2.676513671875, 1.9319076538085938, 4.876039505004883, 4.821601867675781, 11.310771942138672, 4.947654724121094, 1.2189598083496094, -0.5992965698242188, -6.988826751708984, 10.154281616210938, -3.3914966583251953, 3.098529815673828, 5.545087814331055, 7.076129913330078, 12.077239990234375, 1.6628246307373047, -1.7710075378417969, 12.152206420898438, 2.4411869049072266, -4.5367431640625, 0.8471641540527344, 8.647628784179688, 11.47964859008789, 12.157403945922852, 10.44765853881836, 3.4808530807495117, -4.940364837646484, 2.4785919189453125, 0.11162567138671875, 1.3859233856201172, -2.3816680908203125, 3.077880859375, 9.768829345703125, 1.0947647094726562, 12.861053466796875, 10.821887969970703, 3.1120071411132812, 1.0527114868164062, 3.286590576171875, 13.317680358886719, -6.141429901123047, -1.2358856201171875, 1.8289756774902344, -8.195659637451172, 11.501998901367188, 1.3775882720947266, -4.780632019042969, 5.4216766357421875, 4.201259613037109, -2.3748931884765625, 2.4847755432128906, -5.484367370605469, 11.303037643432617, 2.3198928833007812, -2.722827911376953], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000234.npy"}
{"epoch": 0.35374149659863946, "step": 235, "batch_size": 64, "mean": 3.9499449729919434, "std": 4.819541931152344, "min": -7.906211853027344, "p10": -1.1819351196289059, "median": 3.9594268798828125, "p90": 10.933071136474611, "max": 15.023014068603516, "pos_frac": 0.8125, "sample": [-0.79937744140625, 5.2947235107421875, 4.0676116943359375, 7.406217575073242, 11.213775634765625, -1.3223495483398438, -0.8543014526367188, 4.8061065673828125, 3.5430755615234375, 2.236236572265625, 2.5396270751953125, 9.171165466308594, 0.527801513671875, 5.035678863525391, 4.550628662109375, 6.282463073730469, 2.374725341796875, 2.2363357543945312, 5.877391815185547, 3.3501815795898438, 2.4315185546875, 4.024452209472656, -4.996429443359375, 5.840732574462891, 9.098865509033203, 4.459552764892578, 0.16865921020507812, -4.0995025634765625, 15.023014068603516, 0.9216632843017578, 4.7275848388671875, 1.1367053985595703, 10.517921447753906, 2.2293701171875, 9.728431701660156, 2.2546443939208984, 6.404136657714844, 2.2506637573242188, 11.110992431640625, -7.906211853027344, 2.5960254669189453, 13.55434799194336, -2.6390533447265625, 1.750885009765625, 8.237831115722656, 4.529563903808594, 0.070770263671875, 5.13873291015625, 12.232177734375, 4.365894317626953, 3.8944015502929688, 2.9454345703125, 12.960342407226562, 7.188320159912109, 3.6410255432128906, -6.360649108886719, 11.995689392089844, 4.375774383544922, 8.177749633789062, -0.102691650390625, 7.0792694091796875, -4.823360443115234, -0.061000823974609375, -0.8154716491699219], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000235.npy"}
{"epoch": 0.35525321239606955, "step": 236, "batch_size": 64, "mean": 3.3745124340057373, "std": 6.5346832275390625, "min": -13.6295166015625, "p10": -3.6638141632080075, "median": 3.0612049102783203, "p90": 11.278994750976564, "max": 16.376922607421875, "pos_frac": 0.671875, "sample": [5.03887939453125, -2.36041259765625, 2.4137420654296875, 8.499488830566406, 3.3359222412109375, 2.9788665771484375, 6.973621368408203, 10.293609619140625, -0.9226913452148438, 3.8872833251953125, 6.765083312988281, 7.624786376953125, -1.203958511352539, -0.0499114990234375, -3.796384811401367, 0.0356903076171875, 4.922538757324219, 9.150276184082031, 13.654510498046875, 15.449966430664062, 0.112060546875, -1.665924072265625, 0.36096954345703125, -2.7876358032226562, 11.327659606933594, 10.0621337890625, 11.165443420410156, 10.53546142578125, 13.405693054199219, 5.617118835449219, 15.070446014404297, 0.6437835693359375, -5.96746826171875, 6.3835906982421875, 6.072784423828125, 0.2758026123046875, -2.157470703125, 15.171630859375, -6.806648254394531, 6.266635894775391, -3.354482650756836, 3.143543243408203, 6.196216583251953, 10.701484680175781, -1.069091796875, -0.2174510955810547, -2.1121139526367188, -13.339653015136719, 6.388355255126953, 2.7158126831054688, -1.9515609741210938, -13.6295166015625, -6.0908050537109375, 8.559257507324219, 16.376922607421875, 7.1171112060546875, 5.2909698486328125, -0.9224948883056641, -1.7439918518066406, -3.8563690185546875, 9.186695098876953, 0.48602294921875, 0.1323089599609375, 2.184661865234375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000236.npy"}
{"epoch": 0.35676492819349964, "step": 237, "batch_size": 64, "mean": 2.3729097843170166, "std": 5.333409309387207, "min": -8.293746948242188, "p10": -3.9525680541992188, "median": 2.021726608276367, "p90": 9.096145629882814, "max": 18.978668212890625, "pos_frac": 0.65625, "sample": [8.008499145507812, 10.287651062011719, -4.034027099609375, 13.943763732910156, -0.5206947326660156, 3.5190582275390625, 1.7259025573730469, 1.8660163879394531, 3.894275665283203, 3.131824493408203, 2.3262977600097656, -1.291168212890625, -7.04986572265625, 5.199066162109375, -0.5124015808105469, 8.634506225585938, 0.08473396301269531, 9.293991088867188, -7.1493682861328125, 3.565155029296875, -8.293746948242188, 6.209760665893555, -6.056884765625, 5.2003936767578125, 11.206680297851562, -2.3593082427978516, 1.8204498291015625, 3.511882781982422, -3.7624969482421875, 0.9981040954589844, 6.270395278930664, 4.241374969482422, 6.826442718505859, -4.9098052978515625, 3.5279769897460938, -1.9579200744628906, 2.1774368286132812, -6.028388977050781, 10.413604736328125, -3.5680999755859375, 0.9238052368164062, 2.6212615966796875, 1.8314094543457031, -0.6469955444335938, -0.8132476806640625, -2.1608123779296875, 7.0094451904296875, 5.928253173828125, 0.530059814453125, 11.092437744140625, 6.483253479003906, 2.3732833862304688, 3.6158447265625, 18.978668212890625, -1.960052490234375, -0.01883697509765625, 1.5086231231689453, 5.866283416748047, 1.2034225463867188, 7.105262756347656, -1.3093719482421875, -2.6256484985351562, -3.2022171020507812, 7.141021728515625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000237.npy"}
{"epoch": 0.35827664399092973, "step": 238, "batch_size": 64, "mean": 3.535294771194458, "std": 5.538136005401611, "min": -11.134464263916016, "p10": -2.816132164001465, "median": 3.8275022506713867, "p90": 10.11599521636963, "max": 17.345306396484375, "pos_frac": 0.75, "sample": [-0.6739845275878906, 3.798023223876953, 10.197538375854492, -7.8308563232421875, 6.304740905761719, 12.751480102539062, -2.504302978515625, 4.426624298095703, 13.770584106445312, -2.817058563232422, 1.505035400390625, -4.5280609130859375, 3.3349151611328125, 2.21881103515625, 8.723974227905273, 8.979965209960938, 0.35950279235839844, -2.315631866455078, 7.46429443359375, 8.402481079101562, 9.925727844238281, 17.345306396484375, 8.655696868896484, 3.8589820861816406, 0.22469711303710938, 11.653112411499023, 3.914337158203125, -0.3604888916015625, -1.8141708374023438, 3.9396820068359375, 12.282394409179688, 3.8569812774658203, 2.8285980224609375, 0.34792327880859375, -0.9442787170410156, 1.8868331909179688, 8.88232421875, 0.5442733764648438, 8.820186614990234, 11.252994537353516, 1.0653343200683594, -3.173168182373047, 2.2925643920898438, -1.238311767578125, 5.716224670410156, 4.705055236816406, 8.348785400390625, 8.664966583251953, 1.3136825561523438, 5.68743896484375, 6.6108551025390625, 5.09405517578125, 8.17758560180664, -11.134464263916016, -2.8139705657958984, 3.2009658813476562, -6.204315185546875, 4.772224426269531, 4.945060729980469, 1.576263427734375, 6.697998046875, -2.5462188720703125, -5.937507629394531, 1.7685699462890625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000238.npy"}
{"epoch": 0.35978835978835977, "step": 239, "batch_size": 64, "mean": 3.336677312850952, "std": 6.237631320953369, "min": -8.352630615234375, "p10": -4.781654739379882, "median": 2.472135543823242, "p90": 14.227635955810548, "max": 18.645233154296875, "pos_frac": 0.75, "sample": [5.482551574707031, 0.00189208984375, 5.700279235839844, 0.7903976440429688, 14.333015441894531, 4.002143859863281, -5.016147613525391, 6.045524597167969, -5.22113037109375, 4.210975646972656, -8.352630615234375, -0.4817657470703125, -3.264312744140625, 15.401763916015625, 0.2532196044921875, 0.243896484375, -2.7295303344726562, -7.24078369140625, 0.996795654296875, 15.2891845703125, -0.4295005798339844, 6.7005462646484375, 10.456871032714844, 3.242034912109375, 14.434906005859375, 1.2406749725341797, 2.4643821716308594, 7.908054351806641, 1.5578269958496094, 2.479888916015625, 13.98175048828125, 1.0198192596435547, -6.057586669921875, -1.2926063537597656, 0.3703155517578125, 16.374725341796875, 0.4585247039794922, 3.6291656494140625, -2.575775146484375, 7.343048095703125, -4.114378929138184, 6.173881530761719, 2.4143295288085938, 7.8724365234375, -4.234504699707031, 7.936016082763672, 7.397270202636719, 18.645233154296875, 14.849853515625, -7.767719268798828, 0.2391643524169922, -0.39641571044921875, -5.6313018798828125, 3.7064056396484375, 2.16021728515625, 3.7462539672851562, 8.703044891357422, 1.4394207000732422, 4.1383514404296875, 7.6643218994140625, 1.6476478576660156, 4.332298278808594, 5.794769287109375, 3.078338623046875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000239.npy"}
{"epoch": 0.36130007558578986, "step": 240, "batch_size": 64, "mean": 4.82325553894043, "std": 5.891605854034424, "min": -10.267791748046875, "p10": -1.8718086242675778, "median": 5.6500396728515625, "p90": 11.940368270874023, "max": 17.566307067871094, "pos_frac": 0.75, "sample": [9.211578369140625, -3.2471160888671875, 7.258514404296875, -1.61956787109375, 5.936439514160156, 14.285858154296875, -0.9482879638671875, 10.094833374023438, -4.240964889526367, -1.5053596496582031, 5.706512451171875, 0.11508941650390625, 8.403274536132812, 11.61825180053711, 10.378704071044922, 8.868568420410156, 9.01162338256836, -4.854648590087891, 8.650566101074219, 13.429275512695312, -1.4062576293945312, 6.450309753417969, 2.537508964538574, -9.2130126953125, 6.835594177246094, 17.566307067871094, -0.6273574829101562, 5.59356689453125, 13.544479370117188, 8.708610534667969, 8.006210327148438, 8.249343872070312, 2.924163818359375, -1.51171875, -1.9799118041992188, 7.1215667724609375, 1.56707763671875, 6.209197998046875, 11.37435531616211, 1.1674995422363281, 5.16949462890625, 3.9460105895996094, -2.6186370849609375, 12.346092224121094, 16.052597045898438, -10.267791748046875, -0.32427215576171875, 2.8449478149414062, 1.9284915924072266, 5.7932281494140625, 1.8861827850341797, 11.935596466064453, -0.6953773498535156, 5.8136444091796875, 1.5989532470703125, 2.9491806030273438, 11.013916015625, 11.942413330078125, 4.868888854980469, 3.668354034423828, 10.119758605957031, -0.0563812255859375, 1.7588233947753906, 7.343559265136719], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000240.npy"}
{"epoch": 0.36281179138321995, "step": 241, "batch_size": 64, "mean": 2.46356201171875, "std": 5.315575122833252, "min": -9.960624694824219, "p10": -3.8830078124999994, "median": 1.6782684326171875, "p90": 10.327522659301758, "max": 13.16265869140625, "pos_frac": 0.65625, "sample": [-1.4222869873046875, 0.016155242919921875, 9.251705169677734, -5.324674606323242, 9.300796508789062, -2.9597930908203125, 0.3949623107910156, 5.877479553222656, 3.3361473083496094, 5.56939697265625, -2.862457275390625, 11.636543273925781, -4.094627380371094, -0.6634635925292969, -1.2704544067382812, 10.237686157226562, 2.5996665954589844, 2.5897140502929688, 0.49344635009765625, 11.139923095703125, 4.733329772949219, 5.8633270263671875, -4.2586517333984375, 1.52447509765625, 2.1817092895507812, 0.11738777160644531, 5.453411102294922, 0.9898815155029297, -1.1396331787109375, 0.5579757690429688, 4.917610168457031, 4.763702392578125, 12.474945068359375, -0.8466949462890625, -2.0299339294433594, -5.2923431396484375, -3.1722183227539062, 9.470626831054688, 1.832061767578125, 2.9210357666015625, -9.960624694824219, 0.011562347412109375, -0.5590476989746094, 12.482597351074219, 5.4755859375, 0.3055572509765625, -0.553619384765625, 12.774063110351562, 0.6599960327148438, -6.286781311035156, 4.177337646484375, 4.3105316162109375, 3.66748046875, 6.66644287109375, 10.366024017333984, -4.101806640625, -3.3892288208007812, 6.365795135498047, 13.16265869140625, 2.7432117462158203, -3.2889671325683594, 9.43416976928711, -1.6202926635742188, -0.08255958557128906], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000241.npy"}
{"epoch": 0.36432350718065004, "step": 242, "batch_size": 64, "mean": 4.44649076461792, "std": 6.153725624084473, "min": -10.439018249511719, "p10": -1.3528052330017086, "median": 3.4919795989990234, "p90": 13.350119400024418, "max": 20.615631103515625, "pos_frac": 0.828125, "sample": [-4.071844100952148, 2.1223831176757812, 5.24468994140625, 3.669635772705078, 9.017776489257812, -7.779285430908203, 13.762027740478516, 5.121894836425781, 10.724578857421875, 11.00235366821289, 4.131813049316406, 1.1387042999267578, -1.5033159255981445, 12.2259521484375, 19.380615234375, -0.575714111328125, 12.326114654541016, 0.3484039306640625, 2.517181396484375, 3.5209617614746094, -0.3592071533203125, 0.9905929565429688, 13.934183120727539, 6.540016174316406, 0.48767852783203125, -3.4234161376953125, 3.3443603515625, 4.984523773193359, 7.403781890869141, 6.9782257080078125, 4.565635681152344, 0.4071998596191406, 20.615631103515625, 0.4347057342529297, 4.567150115966797, 2.2894744873046875, 2.8358497619628906, 1.0417747497558594, 17.036239624023438, 0.5314655303955078, -3.857391357421875, 7.9609375, 1.7046966552734375, 0.31189727783203125, 12.388999938964844, 1.1658821105957031, 2.3272247314453125, 4.730712890625, 6.650848388671875, 5.230621337890625, 16.32512664794922, 5.00029182434082, 15.090469360351562, -0.21564674377441406, -1.0016136169433594, 3.4629974365234375, 5.6203765869140625, 0.2722663879394531, 3.19244384765625, -10.439018249511719, 6.9453887939453125, 1.1659622192382812, -4.9090118408203125, 7.920146942138672], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000242.npy"}
{"epoch": 0.36583522297808013, "step": 243, "batch_size": 64, "mean": 3.235015869140625, "std": 5.880239963531494, "min": -8.976837158203125, "p10": -4.731757354736327, "median": 2.4771547317504883, "p90": 11.667563629150392, "max": 13.780128479003906, "pos_frac": 0.703125, "sample": [3.7254600524902344, 11.79925537109375, 0.738067626953125, -5.0416107177734375, 13.106063842773438, 6.990867614746094, -6.5447235107421875, 2.2598228454589844, 6.7821197509765625, -4.008766174316406, 9.306396484375, 3.9019622802734375, 1.5053253173828125, 3.5492935180664062, 2.3770618438720703, 7.599365234375, -1.2469711303710938, -5.449485778808594, 9.696588516235352, 2.1134033203125, -1.7328262329101562, 13.395339965820312, 3.0488929748535156, -0.769862174987793, -2.1562881469726562, 7.028800964355469, -3.679534912109375, 10.477058410644531, 10.3271484375, 7.271656036376953, -0.89324951171875, -5.91851806640625, 1.0207138061523438, -5.341220855712891, 5.842113494873047, 6.5975494384765625, 13.355644226074219, 13.780128479003906, -8.976837158203125, 4.626884460449219, 10.016860961914062, 0.205810546875, 6.184333801269531, 12.328544616699219, -2.0892791748046875, 2.1024551391601562, -2.8749618530273438, 0.8433113098144531, 11.360282897949219, 8.548110961914062, 3.553863525390625, 1.534515380859375, 1.0840225219726562, 9.38677978515625, -3.5085411071777344, 12.592010498046875, 2.0060691833496094, 2.5772476196289062, 1.5536155700683594, -1.6468315124511719, -8.237586975097656, 6.528087615966797, -0.39849090576171875, 2.927703857421875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000243.npy"}
{"epoch": 0.3673469387755102, "step": 244, "batch_size": 64, "mean": 5.507327079772949, "std": 5.300260066986084, "min": -2.8182754516601562, "p10": -1.1407093048095698, "median": 4.958354949951172, "p90": 14.398469543457036, "max": 18.034271240234375, "pos_frac": 0.859375, "sample": [5.6105499267578125, 15.302886962890625, 4.661155700683594, 2.3217620849609375, -2.7054290771484375, 2.3703155517578125, 0.5048971176147461, 8.953163146972656, 2.249448776245117, -0.10814285278320312, -0.5170154571533203, 6.108154296875, 3.0276031494140625, 13.324981689453125, 2.829273223876953, -1.5025634765625, 8.177635192871094, 15.063766479492188, -2.2324981689453125, 7.7215118408203125, 7.39764404296875, 8.401641845703125, -1.4261970520019531, 9.435958862304688, 0.224365234375, 16.358200073242188, 5.691764831542969, 7.0341339111328125, 16.32343292236328, 2.541769027709961, 14.858535766601562, 4.258613586425781, 2.0042552947998047, -2.8182754516601562, -1.9355926513671875, 1.868417739868164, -1.4080066680908203, 10.78826904296875, 11.637588500976562, 18.034271240234375, 0.3149862289428711, 17.70611572265625, 0.4372081756591797, 2.4714508056640625, 0.3088493347167969, 9.950435638427734, 7.578460693359375, 7.016895294189453, 2.22857666015625, 6.8535614013671875, 5.25555419921875, 7.6707305908203125, 2.424713134765625, 9.008331298828125, 3.8710784912109375, 10.03786849975586, 6.820438385009766, 3.69976806640625, 6.467746734619141, 2.6057662963867188, 6.66876220703125, 3.142740249633789, 6.137050628662109, 3.3596038818359375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000244.npy"}
{"epoch": 0.3688586545729403, "step": 245, "batch_size": 64, "mean": 4.888873100280762, "std": 6.23990535736084, "min": -11.779220581054688, "p10": -2.108719062805175, "median": 4.052024841308594, "p90": 14.061558532714844, "max": 17.716888427734375, "pos_frac": 0.796875, "sample": [3.0635910034179688, 6.436309814453125, 6.24835205078125, 15.321502685546875, 7.279163360595703, -2.3870067596435547, 1.5301780700683594, 8.208637237548828, 1.5033378601074219, -5.123046875, 0.4364204406738281, -0.0975494384765625, 13.540767669677734, 3.9018898010253906, 9.657485961914062, 10.700965881347656, 0.3115863800048828, 12.31695556640625, -11.779220581054688, 17.511489868164062, 14.068649291992188, 1.9781951904296875, 4.2291412353515625, 6.46183967590332, 11.144439697265625, 7.6569061279296875, 5.56280517578125, -1.1501541137695312, 7.865058898925781, 0.18219947814941406, 7.574119567871094, -6.531515121459961, 2.1748695373535156, 8.34334945678711, 4.104927062988281, 6.28338623046875, 1.2317962646484375, -3.5206031799316406, 8.532257080078125, 17.716888427734375, 5.185077667236328, -0.2574005126953125, 16.314929962158203, 1.2452392578125, -3.174945831298828, 1.9783086776733398, 14.295511245727539, 14.295143127441406, 4.0246734619140625, -1.233978271484375, 2.623748779296875, 6.164634704589844, -1.459381103515625, 3.312244415283203, 14.045013427734375, -0.6745338439941406, -3.8005218505859375, 13.839828491210938, 0.9173049926757812, 2.1231956481933594, 11.649356842041016, 2.5801544189453125, 2.3245277404785156, 4.079376220703125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000245.npy"}
{"epoch": 0.37037037037037035, "step": 246, "batch_size": 64, "mean": 4.539945602416992, "std": 6.918664932250977, "min": -8.256134033203125, "p10": -4.432910919189453, "median": 3.7219886779785156, "p90": 14.059866714477545, "max": 20.848098754882812, "pos_frac": 0.71875, "sample": [8.386451721191406, 10.464872360229492, 17.555999755859375, 12.264886856079102, 11.062606811523438, 1.7189750671386719, -4.599176406860352, 2.2713623046875, 2.027740478515625, -4.4575958251953125, 6.0352783203125, 11.373291015625, 3.6356048583984375, -0.7177448272705078, 15.903274536132812, 0.5308074951171875, 4.344154357910156, -6.096210479736328, -2.8216323852539062, 0.175567626953125, 10.152557373046875, 14.74691390991211, 0.7905807495117188, 8.111722946166992, 19.259063720703125, 8.00653076171875, 9.47705078125, 20.46343994140625, -1.6392288208007812, 3.6607284545898438, 5.070549011230469, 8.100509643554688, 9.400413513183594, 5.522808074951172, -4.926483154296875, 6.804527282714844, -6.355010986328125, 6.186195373535156, 12.456756591796875, -3.66436767578125, 9.620651245117188, -2.187286376953125, -4.375312805175781, -1.422027587890625, 5.062965393066406, -4.7652740478515625, 5.7019500732421875, 8.388465881347656, 3.7832489013671875, -8.256134033203125, -3.16998291015625, 2.193643569946289, 9.417679786682129, 20.848098754882812, 1.0830497741699219, -0.5236053466796875, 1.5874042510986328, 15.710075378417969, -0.5678558349609375, 5.95587158203125, 1.2778739929199219, 3.1443824768066406, 2.6475906372070312, -1.2827377319335938], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000246.npy"}
{"epoch": 0.37188208616780044, "step": 247, "batch_size": 64, "mean": 2.442718505859375, "std": 5.8060994148254395, "min": -11.775810241699219, "p10": -3.5246074676513666, "median": 1.5445613861083984, "p90": 8.889458465576173, "max": 18.4146728515625, "pos_frac": 0.59375, "sample": [-4.6969757080078125, 7.169097900390625, 0.5622024536132812, 1.0457572937011719, 6.4368133544921875, -2.7778244018554688, -0.5475082397460938, 8.477020263671875, 16.340255737304688, -2.5381622314453125, 6.1878509521484375, 4.6793212890625, 10.543678283691406, 7.1869659423828125, 0.4599761962890625, 10.971038818359375, 3.399578094482422, 8.590049743652344, -0.7768106460571289, -1.1033782958984375, 18.4146728515625, 14.931190490722656, -9.409465789794922, -3.0367698669433594, 8.091941833496094, -0.65509033203125, 2.813753128051758, -0.726104736328125, -0.4705810546875, 8.194320678710938, 8.211639404296875, 0.6880340576171875, 2.2122955322265625, 2.1431045532226562, -0.14101409912109375, -0.5698566436767578, 6.908611297607422, -11.775810241699219, 0.14804840087890625, -1.0121612548828125, -5.941364288330078, -0.32282447814941406, -2.30072021484375, 9.302192687988281, -0.86175537109375, 2.5089263916015625, -2.1311721801757812, -3.7336807250976562, 3.5048828125, 2.043365478515625, 3.78167724609375, 4.733268737792969, -0.6909255981445312, 6.157649993896484, -7.2646026611328125, 3.7367515563964844, -1.0041332244873047, 4.3352508544921875, 0.455047607421875, -6.116943359375, -2.623931884765625, 9.017776489257812, 7.136627197265625, 8.042917251586914], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000247.npy"}
{"epoch": 0.37339380196523053, "step": 248, "batch_size": 64, "mean": 3.804720878601074, "std": 6.813633441925049, "min": -8.936859130859375, "p10": -5.491202354431151, "median": 2.769343376159668, "p90": 13.398046493530279, "max": 20.84311866760254, "pos_frac": 0.6875, "sample": [2.946138381958008, 2.7729339599609375, -2.50048828125, -6.432403564453125, 8.744979858398438, 3.319427490234375, 9.999053955078125, 6.811805725097656, -0.08874893188476562, -0.7316532135009766, -1.132950782775879, 20.84311866760254, 3.8904876708984375, -3.5743179321289062, 10.426864624023438, 4.599742889404297, 7.6356048583984375, -8.936859130859375, -7.401914596557617, 2.7915878295898438, -7.6259765625, -0.004486083984375, 8.519638061523438, 5.467350959777832, 7.2520599365234375, 7.1722412109375, 2.942363739013672, -0.25104522705078125, 2.327289581298828, 11.408088684082031, 0.16131591796875, 16.328826904296875, 1.5937519073486328, -6.206666946411133, 14.939338684082031, 12.29119873046875, -3.4083404541015625, -0.05469512939453125, 2.7657527923583984, 1.8551864624023438, 6.412940979003906, 11.328189849853516, 2.6532535552978516, 19.240341186523438, 9.130455017089844, 5.388889312744141, 1.8104095458984375, 1.4997787475585938, 5.264865875244141, 8.712211608886719, 17.60833740234375, 2.2146148681640625, 8.090728759765625, -1.55682373046875, -3.8217849731445312, 1.5725860595703125, -6.95361328125, 13.87240982055664, 14.928375244140625, 2.1261978149414062, -0.21036148071289062, 1.4146671295166016, -7.10931396484375, -1.5708236694335938], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000248.npy"}
{"epoch": 0.3749055177626606, "step": 249, "batch_size": 64, "mean": 1.8059359788894653, "std": 6.923348426818848, "min": -17.855789184570312, "p10": -7.970407104492187, "median": 2.708271026611328, "p90": 10.437054443359377, "max": 17.004732131958008, "pos_frac": 0.640625, "sample": [6.831302642822266, -3.3002452850341797, 8.279365539550781, 4.716583251953125, 4.829545974731445, -7.27764892578125, -10.83880615234375, 10.706100463867188, 13.2698974609375, 5.014762878417969, 6.691974639892578, 0.9206008911132812, 1.6235084533691406, 2.752410888671875, -6.62286376953125, -9.903915405273438, -2.0640945434570312, -13.595352172851562, -2.531951904296875, -5.586248397827148, 2.6641311645507812, -0.8404312133789062, -5.6441192626953125, 5.469564437866211, 1.8524398803710938, 3.8525466918945312, -0.3534259796142578, -2.3655242919921875, 10.81378173828125, 0.612701416015625, -4.4725799560546875, 7.151935577392578, 0.9826908111572266, -0.5868263244628906, 12.602035522460938, -8.267303466796875, 6.364288330078125, -4.3272857666015625, 8.232927322387695, -8.899276733398438, 7.663383483886719, 17.004732131958008, 11.936470031738281, 10.806022644042969, 9.809280395507812, 5.236045837402344, -9.075862884521484, 3.3763656616210938, 3.9487991333007812, 4.21923828125, 2.4520702362060547, 6.851676940917969, 4.85009765625, -0.23320770263671875, 7.874387741088867, 3.6513519287109375, -3.706348419189453, 2.771066665649414, 8.071701049804688, -17.855789184570312, 1.0858268737792969, -2.1392593383789062, 0.762908935546875, 7.4617462158203125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000249.npy"}
{"epoch": 0.3764172335600907, "step": 250, "batch_size": 64, "mean": 5.169164657592773, "std": 7.172699928283691, "min": -20.3748779296875, "p10": -1.3905855178833002, "median": 4.682207107543945, "p90": 13.484497261047366, "max": 22.5343017578125, "pos_frac": 0.8125, "sample": [10.468502044677734, 0.6656951904296875, -0.7948589324951172, 5.365758895874023, 0.080657958984375, 10.975845336914062, 10.561197280883789, 11.273307800292969, 15.9796142578125, 1.6756973266601562, 12.957672119140625, -8.136802673339844, 6.349800109863281, 1.6073532104492188, 5.45391845703125, 5.037534713745117, 0.240203857421875, -20.3748779296875, 6.495513916015625, 4.10589599609375, 3.848339080810547, -5.663972854614258, 2.248638153076172, -7.907615661621094, 5.335987091064453, 3.557586669921875, 15.329299926757812, 6.863780975341797, 6.017059326171875, 12.71063232421875, -0.049594879150390625, 4.822021484375, 11.564323425292969, -2.803068161010742, 3.4599952697753906, 5.456401824951172, 4.417684555053711, 8.096607208251953, 21.855567932128906, 2.5983734130859375, 5.1118011474609375, -1.6458969116210938, 12.31124496459961, -0.6734542846679688, 15.59442138671875, 3.4630508422851562, 21.55078125, 10.93556022644043, 4.542392730712891, 13.71027946472168, 1.2421875, 4.3654327392578125, 1.85345458984375, 5.74641227722168, -0.584716796875, 6.075187683105469, 22.5343017578125, 9.207237243652344, 2.4916305541992188, -0.5091762542724609, -3.0686264038085938, 8.798652648925781, 2.2754249572753906, 3.7532806396484375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000250.npy"}
{"epoch": 0.3779289493575208, "step": 251, "batch_size": 64, "mean": 2.808767080307007, "std": 6.286217212677002, "min": -7.904327392578125, "p10": -6.08548355102539, "median": 1.8161697387695312, "p90": 10.108367156982425, "max": 20.639564514160156, "pos_frac": 0.703125, "sample": [1.9537296295166016, 12.372825622558594, 20.639564514160156, 0.10002899169921875, 1.4824676513671875, 1.5796241760253906, -7.904327392578125, -0.4821968078613281, -0.022125244140625, 18.511611938476562, 0.0471954345703125, -6.399085998535156, 8.023544311523438, -1.4173126220703125, -1.7416973114013672, -7.227565765380859, 1.5118408203125, 6.71356201171875, -1.1902580261230469, -7.1240386962890625, 1.4652786254882812, 19.234893798828125, 1.2631683349609375, -7.4589996337890625, 7.111747741699219, -1.0668106079101562, 10.480033874511719, 9.265213012695312, -5.3537445068359375, 1.2907180786132812, 8.191085815429688, 3.4338302612304688, 0.3855743408203125, 3.116509437561035, 5.253089904785156, 7.594024658203125, 8.151756286621094, 6.319648742675781, -4.852375030517578, -4.026220321655273, 2.749124526977539, 3.391510009765625, 7.3862152099609375, 6.031219482421875, 0.1279449462890625, 1.7154006958007812, 3.3265838623046875, 7.235343933105469, 3.7764434814453125, 12.9742431640625, 7.6302337646484375, 0.7003326416015625, 8.670539855957031, -6.42266845703125, 1.9169387817382812, -1.908254623413086, 2.6205215454101562, -6.880279541015625, 3.453998565673828, -1.1869888305664062, -2.749134063720703, 0.4822998046875, 10.469718933105469, 5.02398681640625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000251.npy"}
{"epoch": 0.3794406651549509, "step": 252, "batch_size": 64, "mean": 3.0254111289978027, "std": 6.417983055114746, "min": -16.72125244140625, "p10": -4.541952800750731, "median": 2.7360153198242188, "p90": 12.277596664428714, "max": 16.14654541015625, "pos_frac": 0.75, "sample": [3.5672874450683594, -12.885650634765625, -1.5937919616699219, -7.994842529296875, 14.206390380859375, 2.03289794921875, 1.8061065673828125, 11.019012451171875, 3.608959197998047, 9.92718505859375, 11.610908508300781, 4.8796186447143555, 1.8925933837890625, 3.2446517944335938, 14.416023254394531, 0.817138671875, 12.893936157226562, 1.5264053344726562, 3.915557861328125, 0.9514379501342773, -0.7823028564453125, 5.877777099609375, 9.090621948242188, 7.837923049926758, 8.33905029296875, -0.6144943237304688, -0.6778030395507812, -11.924163818359375, 4.362665176391602, 3.6271286010742188, 4.276678085327148, 6.715492248535156, 9.550844192504883, 2.76092529296875, 5.299095153808594, 0.6550140380859375, -0.2890777587890625, 4.358425140380859, 2.3207778930664062, 0.7123184204101562, 5.3753509521484375, -1.142974853515625, -1.1642837524414062, 4.429378509521484, 1.7584991455078125, 16.14654541015625, 0.9538497924804688, -0.26081085205078125, -16.72125244140625, -5.703033447265625, -3.0194005966186523, 3.0807342529296875, 1.0748023986816406, -5.194475173950195, 2.7111053466796875, 4.555961608886719, 4.7175445556640625, 15.585853576660156, 2.2051467895507812, 12.56332015991211, 12.66973876953125, 2.515838623046875, -5.53950309753418, 0.6896629333496094], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000252.npy"}
{"epoch": 0.38095238095238093, "step": 253, "batch_size": 64, "mean": 2.915503978729248, "std": 6.041269302368164, "min": -10.368399620056152, "p10": -3.6987684249877923, "median": 1.5812625885009766, "p90": 11.063825035095217, "max": 18.00341796875, "pos_frac": 0.65625, "sample": [5.704246520996094, -2.306549072265625, 17.5264835357666, 3.0837326049804688, 2.1925621032714844, 1.5860252380371094, -0.26403045654296875, 10.305028915405273, 8.902767181396484, -6.092155456542969, 1.82183837890625, -0.46160888671875, 13.521533966064453, -0.8635654449462891, 1.0775184631347656, -2.7028121948242188, -4.406654357910156, 10.425638198852539, -2.201557159423828, 1.4463882446289062, 1.2009429931640625, 3.165027618408203, -2.4308700561523438, 8.752960205078125, 4.494453430175781, -1.2919769287109375, 13.572273254394531, 5.4778900146484375, 0.957672119140625, 6.9340667724609375, 1.5764999389648438, 0.6156806945800781, 5.046281814575195, 1.0891265869140625, 1.671966552734375, 18.00341796875, -0.542327880859375, -1.9716033935546875, 8.45184326171875, 4.723808288574219, 3.2102279663085938, -4.230442047119141, 1.1913299560546875, -10.368399620056152, 5.542655944824219, -4.000513076782227, -4.0960235595703125, 2.6473236083984375, 17.325950622558594, 0.8459568023681641, 8.229202270507812, 6.03125, 17.73968505859375, 2.2161483764648438, 4.643985748291016, -0.24639892578125, -4.142208099365234, -2.5278778076171875, 11.337333679199219, -2.9946975708007812, -1.3134422302246094, -2.131378173828125, 3.0621795654296875, 0.8284454345703125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000253.npy"}
{"epoch": 0.382464096749811, "step": 254, "batch_size": 64, "mean": 3.69411039352417, "std": 6.237550258636475, "min": -7.750995635986328, "p10": -3.8814636230468738, "median": 2.6811695098876953, "p90": 12.082315444946293, "max": 18.279075622558594, "pos_frac": 0.65625, "sample": [-4.410068511962891, 0.214263916015625, 9.017822265625, -1.14532470703125, 3.148345947265625, 10.97369384765625, 14.564220428466797, 6.708824157714844, 2.2139930725097656, -0.24929046630859375, -0.8704452514648438, 11.043960571289062, 5.1553192138671875, 6.963968276977539, 3.6849136352539062, 7.710025787353516, 17.1632080078125, 10.686393737792969, -1.3149261474609375, 3.3573074340820312, -0.618011474609375, 17.769821166992188, -1.7045211791992188, 10.73564338684082, -0.92218017578125, -4.741973876953125, 9.687469482421875, 10.858442306518555, 12.527324676513672, -6.49664306640625, 18.279075622558594, -1.5548248291015625, 0.01459503173828125, -0.20259475708007812, -7.750995635986328, 8.360626220703125, -0.8153495788574219, 7.603485107421875, 0.05163383483886719, 6.280811309814453, 5.379081726074219, -2.7349700927734375, 5.305118560791016, -4.564094543457031, -2.5344505310058594, 16.23351287841797, -0.15264129638671875, 5.075965881347656, 6.7633056640625, 0.9414501190185547, -2.159820556640625, -2.5967254638671875, 1.7416610717773438, 6.5099029541015625, 12.643753051757812, 3.5020904541015625, 1.7422866821289062, 1.9352149963378906, 4.74542236328125, 1.2281112670898438, 3.440776824951172, 1.20245361328125, -4.3728179931640625, -4.8295440673828125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000254.npy"}
{"epoch": 0.3839758125472411, "step": 255, "batch_size": 64, "mean": 4.0058794021606445, "std": 6.52316951751709, "min": -11.993850708007812, "p10": -4.749494171142578, "median": 4.64069938659668, "p90": 11.17694549560547, "max": 20.92182159423828, "pos_frac": 0.765625, "sample": [10.653249740600586, -11.367149353027344, 13.514564514160156, 11.265380859375, 8.399864196777344, 7.968292236328125, -0.4219970703125, -0.8398780822753906, 1.3053054809570312, 8.8350830078125, 0.9244766235351562, -1.9691963195800781, 6.2717742919921875, 9.797996520996094, -7.380155563354492, 3.4446182250976562, 6.248260498046875, 5.673530578613281, 2.00506591796875, -10.437843322753906, 10.67645263671875, 4.042762756347656, 5.224334716796875, 4.2020721435546875, 3.76690673828125, 3.2131500244140625, -5.189109802246094, 1.3349571228027344, 14.968955993652344, 0.9384994506835938, 8.762458801269531, 6.773841857910156, -2.3352203369140625, 2.913482666015625, 2.65679931640625, 5.104448318481445, 5.241233825683594, 6.2259063720703125, 14.899715423583984, 7.086292266845703, 1.9880790710449219, 6.369880676269531, 6.9175567626953125, -4.5223541259765625, 2.5175704956054688, -4.846839904785156, 7.0989227294921875, 20.92182159423828, -4.38653564453125, 9.98638916015625, 12.473514556884766, 6.147346496582031, 2.133514404296875, 10.788105010986328, 4.786937713623047, 5.526287078857422, 3.8986358642578125, 12.056461334228516, 10.970596313476562, -11.993850708007812, -4.1045989990234375, -0.04659271240234375, 4.4944610595703125, -7.198204040527344], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000255.npy"}
{"epoch": 0.3854875283446712, "step": 256, "batch_size": 64, "mean": 3.8371098041534424, "std": 6.564303874969482, "min": -9.326828002929688, "p10": -2.9352081298828123, "median": 2.508603096008301, "p90": 13.377318954467773, "max": 16.69073486328125, "pos_frac": 0.703125, "sample": [-0.21312713623046875, 2.28509521484375, 4.2160844802856445, 10.069541931152344, 12.910919189453125, -0.588134765625, -7.984188079833984, 11.082733154296875, -1.225851058959961, 0.3517303466796875, -8.640228271484375, 2.4392528533935547, -2.6985931396484375, 14.298973083496094, -2.976287841796875, 16.69073486328125, 5.601848602294922, 2.8753814697265625, -5.832069396972656, 1.4095726013183594, -3.505535125732422, 13.351852416992188, 2.577953338623047, 4.47088623046875, -1.4438629150390625, 0.9032363891601562, 5.264436721801758, -2.0924453735351562, 3.2306900024414062, 3.1127548217773438, -1.521392822265625, 8.039886474609375, 1.96856689453125, 10.352373123168945, 2.289745330810547, 10.282821655273438, 11.888263702392578, 9.1810302734375, 2.070281982421875, 13.388233184814453, -0.23236846923828125, 0.8758621215820312, 16.3214111328125, -9.326828002929688, 15.064701080322266, 3.483654022216797, 10.056182861328125, -0.166351318359375, 1.4737701416015625, 13.9466552734375, -0.341278076171875, 11.813819885253906, 5.732666015625, 3.1157989501953125, 7.183868408203125, -8.783458709716797, 11.7547607421875, -1.8179931640625, 0.7687273025512695, 14.986091613769531, -2.83935546875, 4.187431335449219, 0.088623046875, 0.3454742431640625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000256.npy"}
{"epoch": 0.3869992441421013, "step": 257, "batch_size": 64, "mean": 3.724337577819824, "std": 7.1671953201293945, "min": -14.392780303955078, "p10": -4.904858207702636, "median": 3.018352508544922, "p90": 13.56634521484375, "max": 21.205066680908203, "pos_frac": 0.6875, "sample": [-0.2484264373779297, -0.770050048828125, -0.1801605224609375, 13.37896728515625, 17.941932678222656, 6.438262939453125, -1.5122261047363281, 12.956363677978516, 3.8096961975097656, 15.379884719848633, 10.656299591064453, 2.479339599609375, 3.4822349548339844, 0.40932464599609375, 13.48602294921875, 11.972633361816406, 13.60076904296875, 2.4690399169921875, -8.565185546875, -0.3508186340332031, 0.09879684448242188, 2.1442031860351562, 1.049774169921875, -6.232084274291992, 10.163528442382812, -7.3576202392578125, -0.5341148376464844, 4.697113037109375, 4.115394592285156, -3.8981170654296875, 4.749164581298828, 15.441059112548828, -0.1280670166015625, -3.4277877807617188, 8.895835876464844, 3.7394027709960938, -4.3768310546875, 7.569297790527344, 11.340408325195312, -7.4149169921875, -14.392780303955078, 15.52471923828125, 4.926471710205078, 5.5814208984375, -4.734375, -4.977922439575195, 3.112274169921875, -1.0287399291992188, 6.596641540527344, 3.6160430908203125, 6.614727020263672, 1.760894775390625, 14.532577514648438, 0.058124542236328125, 4.708221435546875, 21.205066680908203, -0.5908718109130859, -7.680961608886719, 11.9678955078125, 2.793853759765625, 6.030885696411133, 2.9244308471679688, 1.833587646484375, 0.507080078125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000257.npy"}
{"epoch": 0.3885109599395314, "step": 258, "batch_size": 64, "mean": 4.249208927154541, "std": 5.068908214569092, "min": -6.669097900390625, "p10": -2.048514938354492, "median": 3.7800979614257812, "p90": 10.754205894470218, "max": 17.465869903564453, "pos_frac": 0.84375, "sample": [3.8635101318359375, 2.168659210205078, 3.8224639892578125, -1.7499580383300781, 2.54730224609375, 14.520835876464844, 12.512542724609375, 1.65960693359375, 7.633457183837891, -1.3974723815917969, 8.847904205322266, 6.438850402832031, -3.3467559814453125, 6.207508087158203, 6.185752868652344, 5.127899169921875, -6.409236907958984, -3.6790504455566406, 1.6432723999023438, -2.437335968017578, 5.885509490966797, 1.279571533203125, -5.211517333984375, 9.702339172363281, 3.1774139404296875, 4.491142272949219, 4.7841033935546875, 0.8009357452392578, 1.8744125366210938, 3.73773193359375, 4.5011138916015625, 0.8879051208496094, 12.707817077636719, 10.001691818237305, 3.0751380920410156, -1.2484359741210938, 0.05191612243652344, 3.5324325561523438, 2.8629150390625, 15.578353881835938, 5.424224853515625, 2.348876953125, 7.168220520019531, 1.4506378173828125, 10.043304443359375, 11.058877944946289, 8.877395629882812, 6.635650634765625, 1.7414093017578125, 9.6458740234375, 1.475860595703125, 1.1320343017578125, 5.497428894042969, 1.3838653564453125, 6.480648040771484, 2.6417083740234375, 8.669464111328125, -6.669097900390625, 3.4155807495117188, -2.1764678955078125, 11.692779541015625, 5.531803131103516, 17.465869903564453, 4.3811798095703125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000258.npy"}
{"epoch": 0.3900226757369615, "step": 259, "batch_size": 64, "mean": 5.88548469543457, "std": 7.200068473815918, "min": -14.018203735351562, "p10": -2.684495162963867, "median": 5.596620559692383, "p90": 14.64038429260254, "max": 22.025543212890625, "pos_frac": 0.796875, "sample": [-2.8795814514160156, -6.432586669921875, -14.018203735351562, 8.275688171386719, 5.6288909912109375, 10.74441909790039, 11.68994140625, 19.411773681640625, 5.480915069580078, -2.2292938232421875, 14.415260314941406, 1.102386474609375, 6.680809020996094, 5.6060638427734375, 8.229183197021484, -1.4401779174804688, -4.867109298706055, 0.15379905700683594, 22.025543212890625, 9.023712158203125, 0.5730209350585938, 14.333732604980469, 16.097484588623047, 13.146621704101562, 5.613494873046875, -0.3916778564453125, 2.3095703125, 5.046306610107422, 3.230712890625, 3.9377822875976562, 5.587177276611328, 2.8084487915039062, -3.9816856384277344, 8.573135375976562, 4.140007019042969, -1.995269775390625, 1.5993194580078125, 2.880828857421875, 13.506591796875, 8.2274169921875, 14.178302764892578, 5.346443176269531, -0.6993484497070312, 17.164718627929688, 2.0427017211914062, 7.0535888671875, -6.600032806396484, -0.882293701171875, -6.791355133056641, 2.1488685607910156, 17.671966552734375, 12.043746948242188, 11.11163330078125, 8.51092529296875, 17.358383178710938, 14.736865997314453, 8.025123596191406, 1.5198898315429688, 2.3704566955566406, 5.139030456542969, 13.1065673828125, 13.064910888671875, 10.589500427246094, 6.615974426269531], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000259.npy"}
{"epoch": 0.3915343915343915, "step": 260, "batch_size": 64, "mean": 3.911125898361206, "std": 7.7707839012146, "min": -10.857179641723633, "p10": -7.272053146362305, "median": 3.6615781784057617, "p90": 15.732169342041017, "max": 20.314289093017578, "pos_frac": 0.703125, "sample": [12.28411865234375, -10.091651916503906, 6.68400764465332, -3.691375732421875, -1.9127883911132812, 16.0347900390625, 1.9931640625, 2.2279586791992188, 16.378463745117188, -6.925506591796875, 15.533729553222656, 8.381593704223633, 15.96099853515625, 1.9195480346679688, 5.325630187988281, -1.61956787109375, 12.290794372558594, 1.7738456726074219, 3.5827159881591797, 20.314289093017578, 9.948516845703125, 3.8955230712890625, 16.490142822265625, 11.617156982421875, -0.08693313598632812, 1.9541511535644531, 9.00259780883789, 16.72555160522461, -2.516094207763672, 1.03778076171875, 8.830135345458984, 4.6167144775390625, 5.607170104980469, -7.196502685546875, -0.1306610107421875, 7.044708251953125, -1.5117416381835938, -6.815559387207031, -10.686067581176758, 3.97503662109375, -0.36382293701171875, 1.3083305358886719, 3.4276599884033203, 3.7404403686523438, 3.0376434326171875, 7.256458282470703, 7.756229400634766, -7.324501037597656, 15.817214965820312, 14.840011596679688, -7.304431915283203, 4.569387435913086, 1.684600830078125, 5.38714599609375, 8.81673812866211, 10.490242004394531, -2.64697265625, 11.063602447509766, 6.681610107421875, -10.857179641723633, 1.7243232727050781, -10.749465942382812, -8.812644958496094, 2.523040771484375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000260.npy"}
{"epoch": 0.3930461073318216, "step": 261, "batch_size": 64, "mean": 2.9902381896972656, "std": 6.922023296356201, "min": -14.245361328125, "p10": -4.770021057128906, "median": 2.1976184844970703, "p90": 12.497423934936526, "max": 20.783302307128906, "pos_frac": 0.71875, "sample": [-0.8417243957519531, 6.92633056640625, 1.1044654846191406, 0.007823944091796875, -4.2772216796875, 11.370635986328125, 5.100975036621094, 12.04910659790039, 1.7931861877441406, -0.6248207092285156, -8.031169891357422, -4.8871307373046875, 0.29061126708984375, 10.614593505859375, 7.7149810791015625, -0.176513671875, 0.3537025451660156, 17.745010375976562, 0.7763385772705078, 18.729324340820312, 6.696990966796875, 4.953529357910156, -2.2383880615234375, -5.2987518310546875, 2.7743072509765625, -1.1781387329101562, 20.783302307128906, 3.286975860595703, 0.431396484375, 0.9992446899414062, 1.4433364868164062, 3.418008804321289, 5.485931396484375, 12.689559936523438, 0.3710670471191406, 0.2729015350341797, 18.09354019165039, -14.245361328125, -9.612037658691406, 5.913963317871094, 6.248435974121094, 1.6374626159667969, 6.639129638671875, 6.612205505371094, 6.195564270019531, -0.5801315307617188, -1.3766555786132812, -11.147384643554688, -8.108516693115234, 4.469203948974609, 2.60205078125, -0.7744216918945312, 14.5045166015625, 14.111862182617188, -0.2247314453125, -4.49676513671875, 3.3104248046875, 2.7926559448242188, 0.3446197509765625, 3.4590911865234375, 3.2173843383789062, 0.35052490234375, 6.836761474609375, 3.9720706939697266], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000261.npy"}
{"epoch": 0.3945578231292517, "step": 262, "batch_size": 64, "mean": 3.981867551803589, "std": 5.660760402679443, "min": -6.5892791748046875, "p10": -1.8394191741943358, "median": 2.92193603515625, "p90": 11.63287467956543, "max": 17.072235107421875, "pos_frac": 0.734375, "sample": [-3.4854812622070312, 5.819122314453125, 7.4283294677734375, -0.1678752899169922, 2.253936767578125, 7.338834762573242, 5.87945556640625, -1.3248748779296875, 9.253087997436523, 9.458690643310547, -0.796051025390625, 14.941558837890625, -1.8846588134765625, 3.0901565551757812, 14.021980285644531, -6.020526885986328, 1.5701923370361328, 10.838508605957031, 2.6776123046875, 6.28594970703125, -4.9003143310546875, 2.4782943725585938, -1.5752716064453125, -0.754608154296875, 11.185142517089844, 10.909896850585938, 4.3555145263671875, 8.533031463623047, -6.5892791748046875, -0.23413848876953125, 10.2393798828125, 0.9636459350585938, 1.1435317993164062, 11.23062515258789, -6.202850341796875, 3.6916046142578125, 2.040781021118164, 8.733718872070312, 11.805267333984375, 1.8604965209960938, -1.2550506591796875, -1.49188232421875, -1.7338600158691406, 12.02362060546875, 17.072235107421875, 8.368690490722656, 2.7537155151367188, 3.1493873596191406, -1.3148193359375, 2.1965599060058594, 3.5136184692382812, 0.9287910461425781, 14.161632537841797, 4.061878204345703, 2.428539276123047, 0.5486602783203125, 2.2823944091796875, 15.666671752929688, 3.365875244140625, 3.7637939453125, 3.4979476928710938, 7.10302734375, 2.0977706909179688, -4.442085266113281], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000262.npy"}
{"epoch": 0.3960695389266818, "step": 263, "batch_size": 64, "mean": 4.055931091308594, "std": 6.913664817810059, "min": -14.53017807006836, "p10": -3.658570671081543, "median": 2.840097427368164, "p90": 12.841429901123051, "max": 21.6761474609375, "pos_frac": 0.734375, "sample": [0.08879852294921875, 11.501762390136719, -5.321533203125, 1.90167236328125, 2.4565086364746094, 1.5158348083496094, 5.70556640625, 2.0907669067382812, 9.581859588623047, 11.303226470947266, 9.85308837890625, 3.7640628814697266, -2.36602783203125, 4.663825988769531, 1.6788864135742188, -5.956630706787109, 9.21976089477539, 2.4266510009765625, -1.689056396484375, -3.2374114990234375, 5.2930908203125, 11.48419189453125, 9.284561157226562, 5.741676330566406, -4.506694793701172, 2.6900596618652344, 17.947738647460938, 7.1612548828125, 5.235759735107422, 5.831493377685547, 17.404630661010742, 4.233673095703125, -1.9055099487304688, 5.7175140380859375, -3.7020702362060547, 0.34845733642578125, -3.5570716857910156, 15.646736145019531, -0.048847198486328125, 9.932777404785156, -1.1786041259765625, -0.8821334838867188, 5.6590423583984375, 13.415573120117188, -2.2912673950195312, 1.1981353759765625, 1.2757568359375, 14.616329193115234, 4.958518981933594, 2.9901351928710938, -7.080131530761719, 21.16785430908203, 8.631683349609375, -2.474334716796875, 21.6761474609375, 0.3888893127441406, 5.786045074462891, 2.322643280029297, 1.4806709289550781, -4.744838714599609, 1.22430419921875, -14.53017807006836, 9.136383056640625, 7.417957305908203], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000263.npy"}
{"epoch": 0.3975812547241119, "step": 264, "batch_size": 64, "mean": 4.472408294677734, "std": 6.529417037963867, "min": -6.988136291503906, "p10": -2.161884307861328, "median": 3.547504425048828, "p90": 13.18703308105469, "max": 24.778594970703125, "pos_frac": 0.71875, "sample": [12.68048095703125, -1.9985504150390625, 1.1482086181640625, 2.913821220397949, 11.483362197875977, -2.04693603515625, 3.6042022705078125, 1.6695785522460938, 7.368721008300781, 5.543708801269531, -5.40675163269043, 0.2753105163574219, -6.988136291503906, 7.6852264404296875, -0.27437782287597656, -2.13616943359375, -0.3671836853027344, 6.799407958984375, 0.8808708190917969, 5.861236572265625, 3.4908065795898438, 8.253273010253906, 6.700645446777344, 7.1291046142578125, 0.48186492919921875, 21.15001678466797, -0.45499420166015625, 8.653274536132812, -1.43927001953125, 10.762924194335938, 9.299331665039062, 3.6330413818359375, 14.010078430175781, 1.625335693359375, 6.7020416259765625, 1.2268009185791016, 13.326797485351562, -1.9926681518554688, -2.9810104370117188, 18.781829833984375, 4.295989990234375, -2.308807373046875, -1.8993911743164062, 3.473052978515625, 13.365875244140625, 6.537712097167969, -0.186187744140625, 7.9298248291015625, -6.8170928955078125, 3.8284225463867188, 1.2456512451171875, 2.7401199340820312, 17.98105239868164, 5.23974609375, 0.6338882446289062, -2.423908233642578, 2.7877655029296875, 6.97503662109375, 4.2469024658203125, -2.1729049682617188, -1.499755859375, 24.778594970703125, 12.860916137695312, 7.5663909912109375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000264.npy"}
{"epoch": 0.39909297052154197, "step": 265, "batch_size": 64, "mean": 5.53924560546875, "std": 7.80652379989624, "min": -13.0830078125, "p10": -4.501816749572753, "median": 5.311772346496582, "p90": 15.38169174194336, "max": 21.25171661376953, "pos_frac": 0.75, "sample": [-7.385284423828125, 5.06378173828125, -3.270061492919922, 0.0118560791015625, 8.47923469543457, -1.1104202270507812, 8.840065002441406, 6.0643157958984375, -4.84295654296875, 19.878692626953125, 2.0453624725341797, 10.288215637207031, 5.046119689941406, 6.454448699951172, -1.7337265014648438, 0.9898452758789062, 8.33774185180664, 5.068361282348633, 8.2513427734375, -13.0830078125, -3.7058238983154297, 14.074859619140625, 16.55376434326172, 5.555183410644531, 0.5087413787841797, 3.5797805786132812, 8.747169494628906, 1.8590888977050781, 8.65444564819336, 1.0671882629394531, 14.227436065673828, 12.987136840820312, -3.2691116333007812, 19.086557388305664, 7.6615142822265625, 18.59880828857422, 3.784994125366211, -0.9007110595703125, -5.897686004638672, 0.44011688232421875, 6.93487548828125, 14.473636627197266, 5.776176452636719, 13.0343017578125, 21.25171661376953, 11.549407958984375, 18.03253173828125, 9.105743408203125, 4.991127014160156, -9.517662048339844, 8.20361328125, 3.9094696044921875, -0.93798828125, 13.295989990234375, 12.851394653320312, 12.473669052124023, 15.334678649902344, -6.106784820556641, 3.4569664001464844, -3.656402587890625, 15.401840209960938, -0.0264739990234375, 5.0058135986328125, -7.33331298828125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000265.npy"}
{"epoch": 0.40060468631897206, "step": 266, "batch_size": 64, "mean": 4.878218650817871, "std": 7.14096736907959, "min": -12.65289306640625, "p10": -1.2995933532714843, "median": 3.987377166748047, "p90": 13.904677009582523, "max": 24.998626708984375, "pos_frac": 0.78125, "sample": [-1.3128890991210938, 12.277339935302734, 20.245521545410156, 4.763269424438477, 10.881752014160156, 0.004779815673828125, 1.2543487548828125, -0.8156318664550781, 3.3593673706054688, -7.089729309082031, 4.883247375488281, -0.294342041015625, -0.15938186645507812, 3.2678470611572266, 0.3648948669433594, 6.7619476318359375, -0.89727783203125, 23.261627197265625, 0.712310791015625, -0.5911293029785156, 18.715450286865234, 6.513336181640625, 2.7466354370117188, 11.84682846069336, 4.126979827880859, 16.067581176757812, -5.151359558105469, 6.52508544921875, 14.235822677612305, 9.072578430175781, 8.246833801269531, 1.400604248046875, 6.756381988525391, -0.0260009765625, 3.5619049072265625, -1.2685699462890625, 4.9667510986328125, 15.148269653320312, 13.132003784179688, 0.0727691650390625, 5.9118194580078125, -3.93365478515625, 4.3451385498046875, 4.562824249267578, -12.07891845703125, 4.1544036865234375, 5.194728851318359, 2.4406509399414062, 3.1542892456054688, 8.912849426269531, -12.65289306640625, 3.316761016845703, -2.00775146484375, 8.198883056640625, 7.746147155761719, 7.706371307373047, 24.998626708984375, 0.9717140197753906, 11.467529296875, 3.3221893310546875, 11.969703674316406, 2.2874984741210938, 3.8477745056152344, 0.801544189453125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000266.npy"}
{"epoch": 0.4021164021164021, "step": 267, "batch_size": 64, "mean": 4.10806941986084, "std": 6.376469612121582, "min": -13.343559265136719, "p10": -4.094052124023437, "median": 4.647420883178711, "p90": 12.509908676147463, "max": 17.906494140625, "pos_frac": 0.6875, "sample": [6.49847412109375, -0.2585773468017578, 0.9186363220214844, -4.294731140136719, 6.354316711425781, -0.9008102416992188, 8.245708465576172, 9.199813842773438, 12.227592468261719, 4.928050994873047, 1.2426624298095703, 1.3113174438476562, 1.0963478088378906, 15.7552490234375, 1.2996883392333984, -5.1866302490234375, -0.379669189453125, -0.28259849548339844, 0.2938232421875, -5.574687957763672, -3.160938262939453, -3.2533035278320312, 1.9543380737304688, 3.2410240173339844, 14.794754028320312, 16.25528335571289, -0.034259796142578125, -3.6258010864257812, 8.142074584960938, -5.293613433837891, 7.5093231201171875, 6.914279937744141, 12.630901336669922, 4.366790771484375, 0.8195037841796875, 1.9972686767578125, 13.886184692382812, -4.435638427734375, -0.151611328125, 15.232070922851562, 8.04742431640625, -0.026363372802734375, 17.906494140625, 0.6883163452148438, 9.046104431152344, 5.820211410522461, 10.532432556152344, 6.4258880615234375, -13.343559265136719, 8.692852020263672, 6.4692840576171875, -4.915374755859375, -3.472311019897461, 7.489581108093262, 8.173542022705078, -0.4110107421875, 10.211746215820312, 8.073188781738281, 7.734214782714844, -0.47027587890625, 5.69664192199707, 11.26873779296875, 7.195075988769531, 5.800987243652344], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000267.npy"}
{"epoch": 0.4036281179138322, "step": 268, "batch_size": 64, "mean": 2.7348814010620117, "std": 6.740538120269775, "min": -10.307960510253906, "p10": -4.3416852951049805, "median": 0.896820068359375, "p90": 11.745328521728517, "max": 23.60784912109375, "pos_frac": 0.609375, "sample": [-0.5194549560546875, 16.475265502929688, 4.691152572631836, -8.479782104492188, -0.8181991577148438, 8.460594177246094, -0.8415756225585938, -10.307960510253906, -4.472801208496094, 14.000022888183594, 7.8346710205078125, -0.382354736328125, -1.1740703582763672, 0.827728271484375, 5.5780029296875, -1.0709114074707031, 0.9771957397460938, -5.9066619873046875, -4.417024612426758, 11.555839538574219, 11.513622283935547, 4.3815765380859375, -3.058481216430664, -1.249114990234375, -2.6873092651367188, 0.0796661376953125, 0.9738082885742188, 6.070182800292969, 4.0782623291015625, -0.1391143798828125, 3.47509765625, -0.2502593994140625, -0.26276397705078125, 0.5751304626464844, -0.5750732421875, 13.755409240722656, 0.8205642700195312, -8.409435272216797, -3.0445556640625, 1.2579259872436523, 5.3178863525390625, 0.004779815673828125, 6.1482086181640625, 23.60784912109375, 21.642486572265625, 4.8682098388671875, 9.262199401855469, -1.811269760131836, 0.965911865234375, -6.998954772949219, -4.1658935546875, 1.4012680053710938, 6.435005187988281, 0.29097938537597656, 11.8265380859375, 4.988666534423828, -2.71240234375, 6.834373474121094, 8.650104522705078, 0.7609024047851562, -2.2928619384765625, 13.395423889160156, 6.037570953369141, 1.2606010437011719], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000268.npy"}
{"epoch": 0.4051398337112623, "step": 269, "batch_size": 64, "mean": 4.102551460266113, "std": 6.521906852722168, "min": -11.8507080078125, "p10": -2.977211761474609, "median": 3.9033126831054688, "p90": 11.706388092041015, "max": 20.48967170715332, "pos_frac": 0.75, "sample": [2.2036361694335938, -11.749824523925781, 7.425872802734375, 3.66864013671875, 1.8085479736328125, -3.01324462890625, 4.1880645751953125, 20.48967170715332, 12.667579650878906, -3.2779693603515625, -0.6883621215820312, 4.384235382080078, 7.696847915649414, 3.913360595703125, 9.665153503417969, 15.394096374511719, 0.193267822265625, 2.112823486328125, 4.973045349121094, -3.8888282775878906, -1.9882278442382812, -1.487274169921875, 8.53680419921875, 3.8932647705078125, -10.449798583984375, 11.731063842773438, 5.695735931396484, 19.693634033203125, 7.364189147949219, 10.922250747680664, 3.349903106689453, 4.630794525146484, -0.24450302124023438, 9.693748474121094, 5.0724334716796875, 1.5122737884521484, 0.7314910888671875, 1.511505126953125, 0.16480255126953125, 11.648811340332031, 15.548370361328125, 6.488290786743164, -1.4617843627929688, 6.575796127319336, 11.250354766845703, -2.8931350708007812, 7.726005554199219, -0.8671379089355469, -4.708976745605469, 4.19580078125, 0.39984130859375, 2.5428428649902344, 9.556533813476562, 3.1756591796875, 0.18021392822265625, -11.8507080078125, -1.010284423828125, 9.796623229980469, -0.838958740234375, 9.000137329101562, 12.146209716796875, 9.289268493652344, 6.441623687744141, 1.7311782836914062], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000269.npy"}
|
||||
{"epoch": 0.40665154950869237, "step": 270, "batch_size": 64, "mean": 3.504196882247925, "std": 6.347506999969482, "min": -10.926254272460938, "p10": -3.648997879028319, "median": 2.6915016174316406, "p90": 11.503172683715821, "max": 21.80596923828125, "pos_frac": 0.734375, "sample": [8.096847534179688, 10.390129089355469, -1.2933406829833984, 16.95435333251953, 4.304409027099609, 14.923622131347656, 9.37845230102539, 6.203571319580078, 5.0786895751953125, 1.006500244140625, 21.80596923828125, 11.476360321044922, -1.2655181884765625, 5.704689025878906, 4.0639495849609375, -0.07617950439453125, 0.3434600830078125, 0.24333572387695312, 0.9270286560058594, 11.514663696289062, 2.36761474609375, -2.3288421630859375, -4.2806854248046875, 2.8449325561523438, 10.996238708496094, 8.574783325195312, -1.4106426239013672, -9.586944580078125, 15.154991149902344, 1.748016357421875, -1.1975250244140625, 2.296722412109375, -0.14395904541015625, 2.4483718872070312, 0.049297332763671875, 5.8664398193359375, 1.80517578125, -4.214778900146484, 5.7371063232421875, -10.926254272460938, 5.344696044921875, 12.095321655273438, 7.93560791015625, -1.9012107849121094, 5.128931045532227, 6.805931091308594, -6.126960754394531, -0.8016204833984375, 4.18572998046875, 1.3309745788574219, 11.52481460571289, 0.3950309753417969, -7.978302001953125, 8.239326477050781, 2.252361297607422, 2.5380706787109375, 1.4767379760742188, -10.644294738769531, 9.178436279296875, 2.8654251098632812, 4.296699523925781, -0.4880332946777344, 6.465057373046875, 4.568817138671875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000270.npy"}
|
||||
{"epoch": 0.40816326530612246, "step": 271, "batch_size": 64, "mean": 2.6820483207702637, "std": 7.401337623596191, "min": -8.027107238769531, "p10": -6.170894813537597, "median": 1.55572509765625, "p90": 11.819533157348634, "max": 22.030357360839844, "pos_frac": 0.609375, "sample": [3.3193511962890625, 6.025596618652344, 0.0963134765625, -5.0818328857421875, 11.851028442382812, 12.964241027832031, 4.44993782043457, 4.422359466552734, 7.720237731933594, -3.703277587890625, 9.320571899414062, -5.440361022949219, 10.305511474609375, -2.280623435974121, 7.56158447265625, 9.350967407226562, 3.625883102416992, 1.800323486328125, 0.8966197967529297, 21.795379638671875, -4.406303405761719, -3.2263946533203125, 10.156761169433594, -7.579559326171875, 0.8448944091796875, -0.6930503845214844, -4.746490478515625, 2.179363250732422, -1.96270751953125, -1.607757568359375, 4.928592681884766, 7.010639190673828, -8.027107238769531, -6.889505386352539, 3.42010498046875, 15.305625915527344, 3.2883834838867188, -0.5464019775390625, 8.790870666503906, 15.939462661743164, -1.2801437377929688, -5.449272155761719, 21.032012939453125, -2.8998260498046875, 2.9423370361328125, 3.73394775390625, 0.28923797607421875, -6.480161666870117, -8.014270782470703, -7.394813537597656, 1.9591140747070312, 0.298828125, 10.417442321777344, 11.746044158935547, 1.311126708984375, -0.02349853515625, -3.1047821044921875, 8.215087890625, 0.6072006225585938, -4.271343231201172, -3.454599380493164, -7.788848876953125, 22.030357360839844, 6.050689697265625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000271.npy"}
|
||||
{"epoch": 0.40967498110355255, "step": 272, "batch_size": 64, "mean": 5.177058219909668, "std": 6.186792850494385, "min": -8.833141326904297, "p10": -1.7462471008300775, "median": 4.322547912597656, "p90": 14.522228240966797, "max": 19.921680450439453, "pos_frac": 0.8125, "sample": [2.756977081298828, 2.9158782958984375, -1.9950485229492188, 2.908885955810547, 4.2747802734375, 16.018936157226562, 8.112457275390625, 15.272171020507812, 16.510360717773438, 5.7509765625, 4.9230194091796875, 4.997932434082031, 0.4327983856201172, 0.803741455078125, 9.068229675292969, 2.839263916015625, 2.100616455078125, -2.1346969604492188, 14.524642944335938, 16.260677337646484, 8.47542953491211, 4.523525238037109, 9.866584777832031, 5.783424377441406, 11.585739135742188, 4.3703155517578125, 9.700828552246094, -5.6578826904296875, 2.4271774291992188, 13.669239044189453, 3.66400146484375, 19.921680450439453, -2.0114822387695312, -0.05640888214111328, 13.311735153198242, 2.790111541748047, 0.21901321411132812, -0.9504661560058594, 9.083320617675781, -8.833141326904297, 8.438980102539062, 14.516593933105469, 8.269432067871094, 7.815025329589844, 2.452007293701172, 0.5738563537597656, -5.7823486328125, -0.658233642578125, 7.308933258056641, 8.992134094238281, 2.9591636657714844, 7.110496520996094, 1.352325439453125, 0.6087493896484375, 4.818229675292969, 13.1279296875, 0.5379714965820312, 3.5230026245117188, -4.391136169433594, 1.8448505401611328, -1.16571044921875, -0.2606210708618164, 16.615028381347656, 4.499719619750977], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000272.npy"}
|
||||
{"epoch": 0.41118669690098264, "step": 273, "batch_size": 64, "mean": 4.438119411468506, "std": 7.420067310333252, "min": -13.412452697753906, "p10": -5.650976943969726, "median": 5.200605392456055, "p90": 12.741942596435548, "max": 21.9368896484375, "pos_frac": 0.78125, "sample": [5.678974151611328, 5.995277404785156, 12.644866943359375, -5.773189544677734, 5.50555419921875, 5.7021484375, 7.11016845703125, 10.095928192138672, -11.055374145507812, 2.6575164794921875, -0.6774139404296875, -8.837326049804688, 7.3812408447265625, -4.724851608276367, 4.668735504150391, 9.422065734863281, 11.4930419921875, 2.9889183044433594, 17.440692901611328, -0.7457504272460938, 5.9895782470703125, 6.840480804443359, -3.3628387451171875, -5.365814208984375, -13.412452697753906, 5.292530059814453, 8.139225006103516, 0.9687271118164062, -1.787034034729004, 1.238555908203125, 1.7039947509765625, 13.339813232421875, 5.108680725097656, -10.382919311523438, 8.645195007324219, 3.316112518310547, 10.476207733154297, 9.505874633789062, 3.4732284545898438, 6.053657531738281, -7.137840270996094, 4.386077880859375, 8.456710815429688, -3.4708480834960938, 1.4627532958984375, 21.9368896484375, 20.274974822998047, 12.408855438232422, 1.4958267211914062, 12.783546447753906, 3.996938705444336, 2.08807373046875, 5.988857269287109, 0.19042205810546875, 3.1601943969726562, 9.960494995117188, 2.30743408203125, 10.304100036621094, 1.7993659973144531, 5.315864562988281, 13.7718505859375, 17.84933090209961, -9.703681945800781, 11.661415100097656], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000273.npy"}
|
||||
{"epoch": 0.4126984126984127, "step": 274, "batch_size": 64, "mean": 4.4560394287109375, "std": 6.36513090133667, "min": -6.37872314453125, "p10": -3.2126972198486325, "median": 3.9985923767089844, "p90": 13.764271545410159, "max": 18.160263061523438, "pos_frac": 0.703125, "sample": [5.6903533935546875, -1.2303848266601562, -2.134307861328125, 0.85968017578125, -0.5105133056640625, -6.2042388916015625, 11.402244567871094, 9.118705749511719, 5.383464813232422, -2.4847984313964844, 9.748252868652344, 12.585456848144531, -3.8774795532226562, -4.87109375, 4.916606903076172, 16.857383728027344, 14.477813720703125, -5.027040481567383, 6.347705841064453, -1.7061271667480469, 4.9799041748046875, 11.243793487548828, 5.734916687011719, 13.993515014648438, 2.0221595764160156, 0.7978496551513672, 15.016883850097656, 11.673660278320312, 1.2131156921386719, 3.3680648803710938, -3.3686790466308594, 1.5193557739257812, 9.31719970703125, 0.16728973388671875, -3.539386749267578, 7.660099029541016, -1.0173454284667969, -0.4567718505859375, 7.7660369873046875, 3.9571914672851562, -2.5812129974365234, 12.180313110351562, 1.7713127136230469, 4.0399932861328125, -1.1387863159179688, 12.854286193847656, 1.8585929870605469, -6.37872314453125, 4.388359069824219, -2.7926559448242188, 14.53948974609375, 15.750862121582031, 6.777395248413086, 3.9319000244140625, 1.28594970703125, -1.4602603912353516, -2.8487396240234375, 7.751365661621094, 18.160263061523438, 7.273895263671875, 2.3745574951171875, 6.422309875488281, 13.2293701171875, 6.406150817871094], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000274.npy"}
|
||||
{"epoch": 0.41421012849584277, "step": 275, "batch_size": 64, "mean": 5.427064895629883, "std": 6.579267501831055, "min": -7.420166015625, "p10": -2.267818832397461, "median": 4.435920715332031, "p90": 15.57246608734131, "max": 20.27452850341797, "pos_frac": 0.796875, "sample": [1.2780303955078125, 4.406848907470703, 2.6605377197265625, 9.445926666259766, 14.329376220703125, 0.010440826416015625, 1.3805122375488281, -3.3733673095703125, 19.606002807617188, 12.250518798828125, 2.139404296875, 6.2946624755859375, 2.5695877075195312, 17.179353713989258, -1.2109031677246094, 0.8943252563476562, 6.9727020263671875, 2.597322463989258, 13.362503051757812, -0.9621047973632812, -7.420166015625, 20.27452850341797, 1.0642776489257812, 4.267280578613281, 11.649871826171875, 1.9253997802734375, -2.5413360595703125, -1.733795166015625, 5.04339599609375, 4.5533599853515625, 7.3260650634765625, 16.114791870117188, -0.6397705078125, 6.7110748291015625, 16.933494567871094, 17.48711395263672, 2.9955978393554688, 4.769641876220703, -0.5411224365234375, -2.282745361328125, 12.695976257324219, 2.8215789794921875, 0.9144515991210938, 0.013641357421875, 4.464992523193359, 12.575935363769531, 12.663131713867188, 5.045631408691406, -2.232990264892578, 9.140975952148438, 1.030303955078125, -4.175868988037109, -4.001110076904297, 8.044670104980469, 15.646974563598633, 5.685455322265625, 2.50531005859375, 5.048675537109375, 8.466287612915039, 7.7393035888671875, 15.398612976074219, 1.936004638671875, -2.8998241424560547, 11.015369415283203], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000275.npy"}
|
||||
{"epoch": 0.41572184429327286, "step": 276, "batch_size": 64, "mean": 4.631248950958252, "std": 6.853778839111328, "min": -11.158424377441406, "p10": -2.500448226928711, "median": 4.463913917541504, "p90": 14.475885200500494, "max": 19.938058853149414, "pos_frac": 0.75, "sample": [4.812080383300781, 0.4590492248535156, -1.748870849609375, 2.3878631591796875, 1.3004875183105469, 4.452140808105469, 19.938058853149414, -9.794784545898438, 5.123998641967773, 7.301597595214844, 7.804603576660156, 2.044952392578125, 4.475687026977539, 9.803314208984375, 8.556495666503906, 8.319145202636719, 4.053966522216797, 1.972259521484375, 4.638519287109375, 1.1713981628417969, 0.800567626953125, 17.303634643554688, 6.846555709838867, 15.680404663085938, -1.4344406127929688, -11.158424377441406, 9.199470520019531, 3.4479217529296875, 7.178825378417969, 12.893850326538086, -0.0793914794921875, -0.8978729248046875, -5.480316162109375, -2.3108177185058594, 12.096019744873047, -2.0020599365234375, 18.737491607666016, -2.1963958740234375, 6.8507080078125, 9.652454376220703, -8.112152099609375, 0.6479339599609375, 15.924434661865234, -0.2643318176269531, 4.029441833496094, -5.307960510253906, 9.77214241027832, 9.61474609375, 19.403289794921875, 11.652450561523438, 0.8460006713867188, -0.8120880126953125, 5.125816345214844, 0.5883331298828125, 0.7973098754882812, 15.153900146484375, 11.621604919433594, 1.1453857421875, -2.5817184448242188, 7.169227600097656, 5.174770355224609, 6.4763031005859375, 8.749095916748047, -2.6141510009765625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000276.npy"}
|
||||
{"epoch": 0.41723356009070295, "step": 277, "batch_size": 64, "mean": 2.391343593597412, "std": 6.97032356262207, "min": -10.869338989257812, "p10": -6.471712493896484, "median": 2.725961685180664, "p90": 10.425631523132326, "max": 24.726463317871094, "pos_frac": 0.625, "sample": [10.130664825439453, -4.9071197509765625, 9.496540069580078, 1.6261405944824219, 3.9359970092773438, -3.546478271484375, 5.574676513671875, 0.7861862182617188, 10.552045822143555, 9.3304443359375, -6.526161193847656, -2.5549545288085938, 6.144927978515625, -0.142059326171875, -6.207447052001953, 3.553497314453125, -0.7161731719970703, 1.555419921875, -1.0292778015136719, 0.5786857604980469, 5.165304183959961, 6.759796142578125, 2.658161163330078, 3.3116683959960938, -4.695831298828125, 9.736572265625, 2.9503402709960938, -4.385627746582031, -6.34466552734375, 5.870822906494141, -0.08353519439697266, 9.695026397705078, -5.032035827636719, 9.152023315429688, 0.30411338806152344, 1.7515640258789062, -7.2103424072265625, -9.450614929199219, -4.326820373535156, -4.516868591308594, 3.14215087890625, -3.3117294311523438, 8.305908203125, 11.3155517578125, 3.5054779052734375, 3.958324432373047, 4.7466583251953125, 4.8047637939453125, -10.869338989257812, 24.726463317871094, 10.0589599609375, 0.7076034545898438, 11.775299072265625, 11.649681091308594, -1.796417236328125, -0.04245948791503906, 13.714387893676758, -7.8877410888671875, 2.877838134765625, 19.031253814697266, -6.764564514160156, 2.79376220703125, -7.550997734069824, 5.21055793762207], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000277.npy"}
|
||||
{"epoch": 0.41874527588813304, "step": 278, "batch_size": 64, "mean": 3.41428804397583, "std": 6.975959777832031, "min": -13.789955139160156, "p10": -5.772592163085937, "median": 3.29840087890625, "p90": 13.457461547851565, "max": 18.5106201171875, "pos_frac": 0.703125, "sample": [-2.3324851989746094, -1.8045196533203125, 8.373504638671875, 0.9510040283203125, 3.5645103454589844, -1.8373355865478516, -9.739990234375, 8.258056640625, 12.766754150390625, 13.869117736816406, 5.928703308105469, 0.030422210693359375, 12.310688018798828, 16.380054473876953, 17.04882049560547, 0.13933944702148438, 2.988452911376953, 2.9625492095947266, 3.500988006591797, 6.631500244140625, 3.1427841186523438, 6.593357086181641, 1.4783592224121094, -7.435661315917969, -13.789955139160156, 13.75347900390625, -5.8748931884765625, 4.539768218994141, 4.0537109375, 5.3208770751953125, 5.004390716552734, 2.1074066162109375, -6.6072998046875, -7.758754730224609, 12.039871215820312, 2.1608428955078125, 1.8831405639648438, 0.6047821044921875, 9.85445785522461, -1.1058197021484375, 16.00881576538086, 18.150802612304688, -1.3313369750976562, 4.5680694580078125, 10.406795501708984, 6.549812316894531, 7.141151428222656, -0.14495086669921875, -5.0060272216796875, 2.701183319091797, 3.4540176391601562, 5.83074951171875, -5.5338897705078125, 0.045711517333984375, -0.7812118530273438, 18.5106201171875, -8.010040283203125, 4.36480712890625, 4.539072036743164, -3.5351943969726562, -0.7357444763183594, -0.4353370666503906, 4.884082794189453, 6.917488098144531], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000278.npy"}
|
||||
{"epoch": 0.42025699168556313, "step": 279, "batch_size": 64, "mean": 5.515895843505859, "std": 6.37642240524292, "min": -12.138859748840332, "p10": -1.6979721069335938, "median": 4.511907577514648, "p90": 15.177418518066407, "max": 18.108474731445312, "pos_frac": 0.8125, "sample": [-3.772676467895508, 3.2653846740722656, 16.964988708496094, 3.3786163330078125, -12.138859748840332, 18.108474731445312, 3.276203155517578, -3.7017440795898438, -5.4605865478515625, 9.181510925292969, 4.466888427734375, 15.224014282226562, 2.7195205688476562, 15.771114349365234, 6.4649658203125, 13.314666748046875, 6.4678192138671875, 15.068695068359375, -4.507900238037109, -1.6513595581054688, -0.8583183288574219, 14.421951293945312, 0.6309356689453125, 7.086093902587891, 4.0219268798828125, 8.115928649902344, 8.306446075439453, 1.5883102416992188, -0.1336822509765625, 12.577247619628906, 8.32193374633789, 11.815139770507812, 13.334747314453125, -1.7179489135742188, 0.49375152587890625, 6.779933929443359, 5.24205207824707, 7.7300872802734375, 4.434349060058594, 15.936246871948242, 10.587905883789062, 4.556926727294922, 9.48907470703125, 3.039306640625, -3.489288330078125, 6.710472106933594, -0.04112434387207031, 2.2379417419433594, 3.2981414794921875, 5.976249694824219, 8.33570671081543, 0.13187408447265625, 2.1749534606933594, 15.24985122680664, 0.6485157012939453, -0.46192169189453125, 7.905609130859375, 1.5789413452148438, 10.151962280273438, 4.146699905395508, 7.998086929321289, 1.7586669921875, 17.803688049316406, 2.6622161865234375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000279.npy"}
|
||||
{"epoch": 0.4217687074829932, "step": 280, "batch_size": 64, "mean": 5.765023231506348, "std": 6.177852630615234, "min": -10.591796875, "p10": -0.1500030517578122, "median": 5.362392425537109, "p90": 14.569008636474612, "max": 22.104259490966797, "pos_frac": 0.890625, "sample": [1.1850357055664062, 22.104259490966797, 4.042579650878906, 10.258407592773438, 0.26447296142578125, 13.320426940917969, 1.2221622467041016, -0.27791595458984375, 4.20258903503418, 4.822990417480469, 6.274360656738281, 6.416990280151367, 7.247161865234375, 3.113311767578125, 11.127334594726562, 14.162521362304688, 6.444984436035156, 6.746387481689453, 14.743217468261719, -4.563774108886719, 0.6014328002929688, 2.4685592651367188, 1.5521812438964844, 0.6758708953857422, 17.840980529785156, 6.985298156738281, 7.686738967895508, 9.291702270507812, 17.816970825195312, 7.761322021484375, 3.358976364135742, 2.9951248168945312, 1.3195648193359375, 10.864837646484375, 1.3503341674804688, -2.134002685546875, 19.614288330078125, -1.3308792114257812, 18.03026580810547, 6.15570068359375, -4.1155548095703125, -10.591796875, 3.8855972290039062, 1.1422958374023438, 2.0911827087402344, 10.938800811767578, 1.5851669311523438, -0.5604476928710938, 3.1240234375, 7.328426361083984, 13.1385498046875, 6.078296661376953, 3.485820770263672, 8.369331359863281, 1.5853309631347656, 5.90179443359375, 6.149444580078125, 0.14846038818359375, 7.572120666503906, 0.1881103515625, 4.0428619384765625, 15.422210693359375, 6.8631134033203125, 9.425582885742188], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000280.npy"}
|
||||
{"epoch": 0.42328042328042326, "step": 281, "batch_size": 64, "mean": 4.383538246154785, "std": 6.381789207458496, "min": -8.490242004394531, "p10": -3.846516799926757, "median": 3.7903499603271484, "p90": 13.375254058837895, "max": 20.8037109375, "pos_frac": 0.75, "sample": [-4.218311309814453, 11.870494842529297, 2.4566192626953125, 5.427337646484375, 8.15102767944336, 8.010772705078125, 3.5981712341308594, 5.871559143066406, 2.936126708984375, 14.22918701171875, 3.996479034423828, 9.529735565185547, 7.694341659545898, 14.36492919921875, -0.006256103515625, 3.9825286865234375, 6.3986968994140625, 11.046630859375, -1.897216796875, 2.95611572265625, 10.637447357177734, 2.7924652099609375, 1.41015625, 10.36871337890625, 4.617042541503906, 1.1966705322265625, 5.2082977294921875, -8.490242004394531, 2.8995361328125, -5.012554168701172, -0.05229949951171875, 0.3317832946777344, -2.1705093383789062, 2.911792755126953, 16.31939697265625, -8.247528076171875, 4.6758575439453125, 20.8037109375, -0.4596824645996094, -2.7388076782226562, 8.92367172241211, 0.5135574340820312, 12.41378402709961, 7.171119689941406, 14.689895629882812, 0.22849273681640625, 6.162727355957031, 6.266460418701172, 15.025745391845703, 2.5637130737304688, -5.1666412353515625, 2.9565277099609375, 13.699623107910156, -7.542442321777344, 0.7849102020263672, 7.678199768066406, 10.826908111572266, -1.455902099609375, 12.618392944335938, 5.6503448486328125, -0.145660400390625, -2.9789962768554688, 1.0229110717773438, -4.761104583740234], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000281.npy"}
|
||||
{"epoch": 0.42479213907785335, "step": 282, "batch_size": 64, "mean": 4.891935348510742, "std": 7.4011335372924805, "min": -9.654144287109375, "p10": -3.9859169006347654, "median": 2.8698081970214844, "p90": 17.063391113281256, "max": 24.233810424804688, "pos_frac": 0.765625, "sample": [2.1242752075195312, 7.0906524658203125, -2.1082763671875, 2.6164321899414062, 2.7944107055664062, 2.5990238189697266, 5.6874542236328125, 2.5881099700927734, -4.324329376220703, 15.902542114257812, 15.035804748535156, 10.973251342773438, 3.5134992599487305, 2.466888427734375, 18.2763671875, 0.27227783203125, 18.95644187927246, 4.102027893066406, -2.3311309814453125, 5.852119445800781, 1.593353271484375, 5.4368133544921875, 0.9540863037109375, 18.238876342773438, 0.5768966674804688, -2.486339569091797, 1.7989883422851562, 1.4073677062988281, 19.056425094604492, 7.378181457519531, -9.085052490234375, -4.020118713378906, -2.491790771484375, 11.209953308105469, -1.423126220703125, -3.9061126708984375, 8.246131896972656, 2.1455440521240234, -0.4022979736328125, 2.9452056884765625, 2.673389434814453, 10.934036254882812, -1.642364501953125, 10.337966918945312, 12.485679626464844, 17.560897827148438, 6.852108001708984, 8.326545715332031, 1.3272781372070312, 12.542007446289062, -4.32061767578125, 1.3881912231445312, -4.529815673828125, 4.063084602355957, -4.292510986328125, 24.233810424804688, 10.738113403320312, 1.084197998046875, 3.995380401611328, -9.654144287109375, 4.981201171875, 9.530830383300781, 19.974075317382812, 5.233707427978516], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000282.npy"}
|
||||
{"epoch": 0.42630385487528344, "step": 283, "batch_size": 64, "mean": 4.46937370300293, "std": 7.089620113372803, "min": -16.347824096679688, "p10": -4.434799957275389, "median": 4.4675445556640625, "p90": 14.232851028442386, "max": 18.933349609375, "pos_frac": 0.703125, "sample": [18.933349609375, -5.2530364990234375, -10.150344848632812, -1.9464302062988281, 9.530570983886719, 4.392627716064453, 4.059288024902344, -0.029693603515625, 14.56270980834961, -5.536670684814453, -2.5255813598632812, -2.2466354370117188, 5.650966644287109, 2.6051864624023438, 4.542461395263672, 4.589607238769531, 4.8509979248046875, 13.214508056640625, -6.2053680419921875, 9.828216552734375, 5.990180969238281, 4.689300537109375, 9.168750762939453, -1.1977252960205078, 5.3096160888671875, 15.055717468261719, 17.792434692382812, 8.757736206054688, 13.463180541992188, 2.6623077392578125, 12.852005004882812, 12.337337493896484, -1.6091537475585938, 4.928707122802734, -1.6940994262695312, 3.88201904296875, 2.139007568359375, -2.3951416015625, 6.69842529296875, 7.02142333984375, 14.956243515014648, -0.1426563262939453, 15.220077514648438, 13.093332290649414, 1.4540023803710938, -0.34503936767578125, 6.657295227050781, -16.347824096679688, -0.775996208190918, 2.153533935546875, 9.92221450805664, 1.2089881896972656, 15.9503173828125, 4.266838073730469, 9.886871337890625, 10.823875427246094, -5.938682556152344, 4.821483612060547, -0.5541610717773438, 2.4190673828125, -6.0848846435546875, 10.663154602050781, 0.48272705078125, 3.530364990234375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000283.npy"}
|
||||
{"epoch": 0.42781557067271353, "step": 284, "batch_size": 64, "mean": 5.374138832092285, "std": 6.845883846282959, "min": -9.754703521728516, "p10": -1.1364673614501952, "median": 4.867851257324219, "p90": 15.012505340576173, "max": 22.40241241455078, "pos_frac": 0.796875, "sample": [1.2692108154296875, 3.1497344970703125, 6.767353057861328, -0.2792205810546875, 20.272232055664062, 3.52532958984375, -1.0176048278808594, -8.798248291015625, 17.81378173828125, 1.1215362548828125, 0.4338703155517578, 13.615894317626953, 1.1235618591308594, 4.472137451171875, 12.75100326538086, -0.4313392639160156, 7.743851661682129, 12.192956924438477, -1.187408447265625, 2.841339111328125, 5.2280426025390625, -3.505218505859375, 1.7275543212890625, -5.576606750488281, 0.96453857421875, 8.334693908691406, 5.97314453125, -2.6268692016601562, 5.276042938232422, 9.12894058227539, 6.7326507568359375, 8.990089416503906, 14.687835693359375, 5.845375061035156, 6.532505035400391, 2.197206497192383, 4.562038421630859, 20.655242919921875, 16.833221435546875, 1.6049880981445312, 7.244781494140625, 19.736289978027344, 1.9303741455078125, 7.574737548828125, 0.3501148223876953, 1.5767478942871094, 9.4796142578125, 7.322662353515625, 8.293037414550781, 15.151649475097656, 0.012807846069335938, 11.542465209960938, 7.566886901855469, 4.6567230224609375, 2.8961334228515625, 5.0789794921875, 22.40241241455078, -0.755584716796875, 9.828042984008789, -3.3796005249023438, -0.16680145263671875, -9.754703521728516, 5.22344970703125, -0.8117198944091797], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000284.npy"}
|
||||
{"epoch": 0.4293272864701436, "step": 285, "batch_size": 64, "mean": 3.0931224822998047, "std": 7.184380054473877, "min": -17.968914031982422, "p10": -5.301471710205078, "median": 1.7725725173950195, "p90": 13.557295227050783, "max": 19.03656005859375, "pos_frac": 0.609375, "sample": [-0.06725311279296875, -0.060901641845703125, -0.870574951171875, 12.250225067138672, -1.6760101318359375, -5.452934265136719, -8.485542297363281, 2.5211753845214844, -3.8682708740234375, -17.968914031982422, 12.445878982543945, 7.244056701660156, 14.792800903320312, 7.1978912353515625, 8.685966491699219, -4.359203338623047, 5.016059875488281, 19.03656005859375, -0.45076751708984375, 1.0976409912109375, -0.3860931396484375, 2.2464962005615234, -4.535060882568359, 0.4049549102783203, -2.0883026123046875, -2.249114990234375, -1.7231903076171875, 1.3997879028320312, 0.6396484375, -1.0609664916992188, -2.376270294189453, 5.0496673583984375, -5.487846374511719, 15.836280822753906, 13.762016296386719, 5.064266204833984, 9.077278137207031, 5.5838470458984375, 8.163753509521484, 6.798921585083008, 10.438217163085938, 7.230979919433594, -2.517578125, 15.84637451171875, 13.079612731933594, 5.2613677978515625, 4.707653045654297, 17.710906982421875, -4.94805908203125, -5.531711578369141, 2.145357131958008, 7.289924621582031, 4.352226257324219, 8.829689025878906, 0.91668701171875, 1.0830516815185547, 13.944564819335938, 8.212646484375, -2.0459861755371094, -0.2396240234375, -8.192703247070312, -6.223175048828125, 0.8066177368164062, 4.65484619140625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000285.npy"}
|
||||
{"epoch": 0.4308390022675737, "step": 286, "batch_size": 64, "mean": 5.364743709564209, "std": 7.467953681945801, "min": -14.959640502929688, "p10": -1.980772590637207, "median": 4.712964057922363, "p90": 15.596783447265626, "max": 22.774032592773438, "pos_frac": 0.78125, "sample": [10.401321411132812, 4.932575225830078, 1.175567626953125, 3.9577560424804688, 4.36004638671875, 11.924240112304688, 2.5227088928222656, -1.955099105834961, 3.961465835571289, -4.6058807373046875, 11.82794189453125, 4.71539306640625, 0.9626541137695312, 7.286396026611328, 8.242246627807617, 15.46771240234375, 8.574344635009766, 6.008535385131836, -10.18392562866211, 5.569190979003906, 2.2576866149902344, -11.681694030761719, -0.20819091796875, 18.44118881225586, -1.9917755126953125, -2.4055023193359375, 9.061956405639648, 7.210731506347656, 0.514373779296875, 5.8333282470703125, 11.464113235473633, 8.465496063232422, 17.702613830566406, 15.670158386230469, 5.4920654296875, 22.774032592773438, -2.119403839111328, 11.333744049072266, -0.6882171630859375, 11.353988647460938, 22.729026794433594, 15.652099609375, 2.4720840454101562, -0.7197418212890625, 6.0091552734375, 2.610919952392578, 1.5237312316894531, 0.61663818359375, 15.981163024902344, 14.702590942382812, 9.633575439453125, -0.2173919677734375, 0.38849639892578125, 3.2338294982910156, -1.096282958984375, 1.1100921630859375, 14.350486755371094, 1.6496849060058594, -14.959640502929688, 12.890380859375, 4.710535049438477, 0.6231536865234375, 6.3199005126953125, -0.4967536926269531], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000286.npy"}
|
||||
{"epoch": 0.4323507180650038, "step": 287, "batch_size": 64, "mean": 6.77927827835083, "std": 8.175735473632812, "min": -11.517963409423828, "p10": -1.9462371826171871, "median": 5.431297302246094, "p90": 18.049686431884766, "max": 24.852310180664062, "pos_frac": 0.796875, "sample": [6.9586029052734375, -0.32452392578125, 2.105428695678711, 17.784019470214844, 20.181251525878906, 17.00481414794922, 5.2737884521484375, 6.9547882080078125, 19.435699462890625, -0.530181884765625, 6.021080017089844, 10.834487915039062, 9.923040390014648, 13.4693603515625, 17.89777374267578, 22.043411254882812, 9.716400146484375, 0.15423011779785156, 5.205354690551758, 1.708770751953125, 7.199806213378906, -5.0618743896484375, 11.249725341796875, 3.870205879211426, 4.827976226806641, 12.733325958251953, -4.504459381103516, -2.06103515625, 15.637199401855469, 1.1744880676269531, 11.87127685546875, 0.31343841552734375, -11.517963409423828, -1.1877059936523438, -0.4587516784667969, 10.272357940673828, 7.382164001464844, -1.678375244140625, 2.9407424926757812, 11.947490692138672, 10.997566223144531, 0.12418365478515625, 11.367034912109375, 20.581783294677734, 4.94891357421875, 4.470802307128906, 24.852310180664062, -3.8217334747314453, 13.002517700195312, 15.941330909729004, -0.2111949920654297, 2.2009811401367188, 0.34654998779296875, 2.7547550201416016, -8.146888732910156, 1.6470794677734375, 13.11391830444336, 5.58880615234375, 5.972999572753906, 5.089570999145508, -7.641326904296875, 23.651382446289062, 18.114791870117188, 2.1600570678710938], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000287.npy"}
|
||||
{"epoch": 0.43386243386243384, "step": 288, "batch_size": 64, "mean": 3.9401965141296387, "std": 7.715809345245361, "min": -12.428909301757812, "p10": -6.335971069335936, "median": 3.0446090698242188, "p90": 13.734998321533205, "max": 22.576377868652344, "pos_frac": 0.71875, "sample": [10.843595504760742, 2.503662109375, -10.928497314453125, -7.266197204589844, 9.649818420410156, 4.172237396240234, 5.9232025146484375, -3.2920684814453125, 6.660346984863281, 0.9858741760253906, 5.042362213134766, -0.6648330688476562, 1.7313766479492188, 13.171775817871094, 2.7581329345703125, 7.88916015625, -8.691619873046875, 7.025505065917969, -0.6590423583984375, 8.355056762695312, -1.9106407165527344, 11.349700927734375, 17.298843383789062, 20.66716766357422, 2.5429458618164062, 11.834976196289062, 5.070011138916016, 17.018577575683594, -4.0399017333984375, 3.9424877166748047, 22.277313232421875, -3.8849945068359375, 3.0907745361328125, 10.105033874511719, 15.884490966796875, 6.81291389465332, 3.1107940673828125, -4.117645263671875, 6.4814605712890625, 0.70758056640625, -7.0965576171875, -12.428909301757812, 22.576377868652344, 2.998443603515625, 6.578319549560547, -6.939350128173828, -3.3148727416992188, -0.42240142822265625, 3.320476531982422, 13.97637939453125, 9.142837524414062, 2.3920211791992188, 11.59857177734375, 1.167877197265625, -6.921958923339844, -4.968666076660156, 1.4544143676757812, 1.9007186889648438, 2.6694374084472656, 8.624420166015625, 0.19994163513183594, 1.04669189453125, 8.198287963867188, -3.0316619873046875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000288.npy"}
|
||||
{"epoch": 0.43537414965986393, "step": 289, "batch_size": 64, "mean": 5.22874641418457, "std": 7.354101181030273, "min": -13.615531921386719, "p10": -2.1216495513916014, "median": 3.333087921142578, "p90": 15.952895545959478, "max": 26.640228271484375, "pos_frac": 0.75, "sample": [-4.629978179931641, -0.6071586608886719, 9.893264770507812, -1.4051475524902344, 0.1973724365234375, 7.8838348388671875, -0.16154098510742188, 2.9365806579589844, 2.3939666748046875, 1.084808349609375, -1.910552978515625, 8.294189453125, 4.502162933349609, 5.542034149169922, 8.5899658203125, 1.4881725311279297, 10.787933349609375, -4.228668212890625, 3.2684860229492188, 14.56134033203125, 2.4540252685546875, 17.409160614013672, 2.0273399353027344, 2.2416305541992188, 10.257591247558594, 2.342151641845703, 11.493087768554688, 2.8573532104492188, 13.192428588867188, 18.780494689941406, -2.160675048828125, 7.41581916809082, 0.9850711822509766, -4.780265808105469, 21.41992950439453, -2.030590057373047, -3.530834197998047, 26.640228271484375, 9.196517944335938, 6.1528778076171875, 6.696308135986328, 6.317604064941406, 4.054046630859375, -4.877098083496094, 9.724531173706055, -13.615531921386719, 2.7983760833740234, 0.07074928283691406, -0.18039703369140625, -1.5171928405761719, 16.727386474609375, 9.187042236328125, 0.6930084228515625, -0.98516845703125, 11.219062805175781, 16.54927635192871, 17.303916931152344, 12.494613647460938, 12.524139404296875, 3.3976898193359375, 4.977180480957031, -1.8278751373291016, 11.64206314086914, 0.4216194152832031], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000289.npy"}
{"epoch": 0.436885865457294, "step": 290, "batch_size": 64, "mean": 4.731853485107422, "std": 6.8077712059021, "min": -12.470664978027344, "p10": -2.905617523193359, "median": 4.281749725341797, "p90": 13.75596046447754, "max": 20.354515075683594, "pos_frac": 0.78125, "sample": [13.924938201904297, 5.138641357421875, 6.374570846557617, 18.423919677734375, 8.155803680419922, 0.819244384765625, 0.7430095672607422, -0.4381294250488281, 2.69537353515625, 7.0544891357421875, 4.294303894042969, 5.517036437988281, -1.9804916381835938, -2.650909423828125, 12.829437255859375, 1.6508941650390625, 4.269195556640625, -6.1524810791015625, -3.0147781372070312, 7.3696136474609375, -0.32399749755859375, 13.361679077148438, 11.985519409179688, 10.389846801757812, 5.3210601806640625, -1.8122329711914062, 4.934719085693359, 0.4035491943359375, 4.565517425537109, -6.655372619628906, 15.09808349609375, 6.3757781982421875, 10.317344665527344, 7.316581726074219, -3.165771484375, 18.2960205078125, 12.338485717773438, 3.2135009765625, 8.730293273925781, -6.793701171875, 6.0955352783203125, 9.022357940673828, 1.3738059997558594, -12.470664978027344, 3.076854705810547, 0.8241043090820312, 0.6811199188232422, 20.354515075683594, 15.997550964355469, 13.062408447265625, 3.5547332763671875, 8.039093017578125, -1.147003173828125, 0.22004318237304688, -8.81024169921875, 8.09454345703125, 3.19915771484375, 3.6360130310058594, 3.8955535888671875, 1.9781341552734375, 9.3800048828125, 15.298126220703125, 0.41501617431640625, -1.8527069091796875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000290.npy"}
{"epoch": 0.4383975812547241, "step": 291, "batch_size": 64, "mean": 5.412579536437988, "std": 8.550070762634277, "min": -18.96811866760254, "p10": -4.2593120574951175, "median": 5.221036911010742, "p90": 17.34386329650879, "max": 24.360824584960938, "pos_frac": 0.75, "sample": [15.603271484375, 10.900787353515625, 0.19179534912109375, 3.966644287109375, 9.962142944335938, 14.295745849609375, -10.687515258789062, 3.9444808959960938, 20.736663818359375, -4.217353820800781, 16.944705963134766, 9.967361450195312, -6.9069976806640625, -4.277294158935547, 14.535032272338867, 17.514930725097656, 1.8570442199707031, -5.1235809326171875, 12.280349731445312, 7.084102630615234, -0.9403877258300781, -1.7581253051757812, 9.224128723144531, 1.6654052734375, 10.042272567749023, 7.724388122558594, 18.709457397460938, 0.40448760986328125, 24.360824584960938, 5.859104156494141, 10.050651550292969, 20.035430908203125, -1.1475791931152344, 0.8121299743652344, -1.4967269897460938, 7.419986724853516, 8.503997802734375, 0.789215087890625, 4.582969665527344, 3.21734619140625, 3.6789321899414062, 11.140953063964844, -18.96811866760254, 1.3438796997070312, 16.56787872314453, 9.075531005859375, 7.684455871582031, 6.29954719543457, 9.66893196105957, 17.702484130859375, -11.202980041503906, 2.0938262939453125, -7.867862701416016, 6.693031311035156, -3.8090896606445312, 2.7895965576171875, -1.1362133026123047, 18.407413482666016, 10.4119873046875, 10.203731536865234, 3.2176971435546875, -1.177825927734375, -3.4160232543945312, 0.3720245361328125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000291.npy"}
{"epoch": 0.4399092970521542, "step": 292, "batch_size": 64, "mean": 4.829549789428711, "std": 7.594285011291504, "min": -16.694488525390625, "p10": -4.6540727615356445, "median": 3.95595645904541, "p90": 15.827130126953127, "max": 19.585529327392578, "pos_frac": 0.71875, "sample": [-5.13787841796875, 1.0753326416015625, 7.624663352966309, -0.2358074188232422, 8.56960678100586, 7.848878860473633, 3.4146041870117188, 16.3084716796875, 8.919384002685547, 1.562103271484375, 7.4805450439453125, -1.2207717895507812, 7.438423156738281, 4.135322570800781, -8.795928955078125, 5.198604583740234, 10.134735107421875, 10.871650695800781, 15.008056640625, 1.6813735961914062, 1.7029342651367188, 14.448997497558594, 11.384868621826172, -4.585653305053711, -5.840545654296875, 2.4224395751953125, -2.4029312133789062, 7.211582183837891, -7.8600006103515625, 2.4329833984375, 3.046142578125, 8.43661117553711, 5.556102752685547, 0.9469070434570312, -0.4024829864501953, -1.9067459106445312, 9.823955535888672, -16.694488525390625, -3.5833282470703125, -0.80389404296875, -1.228973388671875, 5.630084991455078, 10.498882293701172, -1.6931114196777344, 19.343482971191406, 19.585529327392578, 10.144264221191406, 7.353639602661133, 2.902888298034668, 5.876594543457031, 16.17816162109375, 11.821907043457031, 1.3672161102294922, 17.722213745117188, 2.343210220336914, -0.7106094360351562, 3.776590347290039, 0.72332763671875, 17.33795166015625, 14.177017211914062, 14.053558349609375, -4.6833953857421875, -5.7229766845703125, 17.07891845703125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000292.npy"}
{"epoch": 0.4414210128495843, "step": 293, "batch_size": 64, "mean": 6.393031120300293, "std": 9.798775672912598, "min": -21.86712646484375, "p10": -5.878355407714843, "median": 5.100813865661621, "p90": 19.062776947021487, "max": 28.909072875976562, "pos_frac": 0.765625, "sample": [2.0121841430664062, 8.923519134521484, 12.556396484375, 17.05990219116211, 6.173271179199219, 4.016914367675781, -0.9432029724121094, 4.135374069213867, 0.4644927978515625, 3.215972900390625, 14.196975708007812, -4.230690002441406, 22.347152709960938, 14.732662200927734, 14.299896240234375, 0.15691375732421875, 15.802263259887695, 12.159759521484375, 14.428516387939453, 13.007247924804688, 6.7196502685546875, -6.0757293701171875, -0.5133743286132812, -7.651649475097656, 13.320873260498047, 3.0505599975585938, 5.080617904663086, -0.8961219787597656, 5.121009826660156, 28.909072875976562, 6.2989654541015625, 20.06147003173828, 19.125946044921875, -6.641103744506836, 8.237640380859375, -2.871074676513672, 18.915382385253906, -21.86712646484375, 2.3986053466796875, 2.8784866333007812, -7.142845153808594, 26.27434539794922, -11.21756362915039, -3.857421875, 0.190185546875, -0.5411605834960938, 7.230072021484375, 11.404987335205078, 22.532684326171875, 10.242668151855469, 20.07450294494629, -5.417816162109375, 10.925893783569336, 2.212993621826172, 9.785575866699219, -9.302040100097656, 17.867172241210938, 1.7333984375, 1.129364013671875, 15.232162475585938, 3.745208740234375, 1.2701911926269531, 0.9300003051757812, 15.733810424804688], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000293.npy"}
{"epoch": 0.4429327286470144, "step": 294, "batch_size": 64, "mean": 4.846449851989746, "std": 7.66139554977417, "min": -15.311580657958984, "p10": -3.349626922607422, "median": 4.960630416870117, "p90": 13.438336944580078, "max": 26.99566650390625, "pos_frac": 0.671875, "sample": [20.411361694335938, 4.854312896728516, 7.5977325439453125, 1.18798828125, 13.441944122314453, 9.723060607910156, 9.06069564819336, 1.21240234375, -15.311580657958984, 4.782573699951172, 13.21270751953125, -7.431083679199219, -2.4922027587890625, 10.307815551757812, 7.298669815063477, -4.799537658691406, 19.167293548583984, 7.575447082519531, 6.807582855224609, 13.429920196533203, -3.3714599609375, 4.323738098144531, 5.170440673828125, 7.805389404296875, -3.2986831665039062, 17.315948486328125, -9.37521743774414, 8.722557067871094, 2.0099029541015625, 26.99566650390625, 9.905105590820312, -0.6314792633056641, -0.9669609069824219, 12.124549865722656, -1.6033878326416016, -1.56048583984375, -1.9155731201171875, -0.6227188110351562, 5.253837585449219, 8.2120361328125, -1.5213394165039062, 3.0813827514648438, 16.70366096496582, 3.201150894165039, 9.81827163696289, 11.045120239257812, 9.312618255615234, 6.860744476318359, -0.6074981689453125, 17.526050567626953, -2.659900665283203, 8.621994018554688, 11.156478881835938, 8.336029052734375, 6.89094352722168, -1.1446189880371094, -2.1010665893554688, -5.453510284423828, 1.3140544891357422, -0.9348678588867188, 5.066947937011719, 2.810596466064453, 3.7363739013671875, -5.417121887207031], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000294.npy"}
{"epoch": 0.4444444444444444, "step": 295, "batch_size": 64, "mean": 4.308538913726807, "std": 7.675354480743408, "min": -14.701553344726562, "p10": -4.217757415771483, "median": 4.077301025390625, "p90": 13.378915405273439, "max": 26.93236541748047, "pos_frac": 0.71875, "sample": [13.448486328125, 1.4399681091308594, -2.0206451416015625, -5.273380279541016, 16.092639923095703, 8.277999877929688, 11.971534729003906, 4.103248596191406, 15.381153106689453, 2.705352783203125, 26.93236541748047, 4.051353454589844, 14.765045166015625, 5.47026252746582, 2.4745750427246094, 2.3778762817382812, 0.2633094787597656, 5.6407012939453125, -2.03167724609375, 10.027420043945312, 7.177772521972656, 4.274799346923828, 0.6313896179199219, 4.627399444580078, 8.6607666015625, -8.350128173828125, -0.25215911865234375, -9.512075424194336, 5.8609466552734375, -0.21187591552734375, -0.9052886962890625, -2.0256385803222656, 19.224040985107422, 7.67449951171875, 6.408851623535156, -4.866710662841797, 9.461174011230469, 2.4569778442382812, 10.642810821533203, 13.216583251953125, 9.59503173828125, -14.701553344726562, 1.4701614379882812, 12.51220703125, 1.8037261962890625, 8.26385498046875, -1.3745803833007812, 1.6396026611328125, -6.654886245727539, 4.468133926391602, 0.5825157165527344, 1.5785369873046875, -11.106857299804688, 1.1492233276367188, -1.6298484802246094, 10.851598739624023, 10.191646575927734, 6.2988128662109375, -1.055267333984375, -1.1466903686523438, 5.329963684082031, 6.6400909423828125, 23.45287322998047, -2.703533172607422], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000295.npy"}
{"epoch": 0.4459561602418745, "step": 296, "batch_size": 64, "mean": 6.0777907371521, "std": 8.474268913269043, "min": -8.810279846191406, "p10": -4.4922483444213865, "median": 4.651224136352539, "p90": 19.247319221496582, "max": 23.000877380371094, "pos_frac": 0.734375, "sample": [9.49822998046875, 19.55304718017578, 4.271427154541016, 6.27995491027832, -2.9390411376953125, 5.7936859130859375, 0.7904644012451172, 20.229736328125, -2.636463165283203, 9.88132095336914, 5.961696624755859, 22.52520751953125, -4.334024429321289, 19.111127853393555, 4.239482879638672, 3.951385498046875, 13.080352783203125, 13.927597045898438, 2.968730926513672, 9.720573425292969, 0.16191864013671875, -4.0255126953125, 15.961883544921875, -4.56005859375, -3.9773406982421875, -6.0706634521484375, 3.3046035766601562, 3.7181015014648438, 2.4516754150390625, 3.4903221130371094, -6.722206115722656, -6.281734466552734, -1.79058837890625, 8.239028930664062, 15.00445556640625, 0.20404052734375, -0.15984344482421875, 1.3612251281738281, 12.872196197509766, -4.757843017578125, 5.0310211181640625, 3.7350311279296875, 23.000877380371094, -8.639999389648438, 8.535736083984375, 6.245819091796875, 10.788414001464844, 13.72958755493164, 12.253116607666016, 21.413162231445312, 7.2714080810546875, -2.6835479736328125, 15.090206146240234, 8.946163177490234, 22.283058166503906, 2.8132171630859375, -8.810279846191406, 19.305686950683594, 1.2757415771484375, -0.5050048828125, -0.5853500366210938, 14.822303771972656, 11.637496948242188, 11.726585388183594], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000296.npy"}
{"epoch": 0.4474678760393046, "step": 297, "batch_size": 64, "mean": 6.026992321014404, "std": 7.147971153259277, "min": -11.594230651855469, "p10": -2.5633956909179685, "median": 4.843099594116211, "p90": 16.387107849121094, "max": 21.271411895751953, "pos_frac": 0.828125, "sample": [8.968391418457031, -2.5941619873046875, 3.626373291015625, 11.563133239746094, 7.318023681640625, 16.364837646484375, 4.456638336181641, 16.396652221679688, -3.1194381713867188, -5.291385650634766, 6.022426605224609, 5.1071624755859375, 1.84552001953125, 6.103767395019531, 4.469175338745117, 20.126087188720703, 3.086956024169922, 9.621002197265625, 4.070583343505859, 10.420028686523438, 4.108451843261719, 1.3639984130859375, -4.53076171875, -11.594230651855469, 0.3915214538574219, 21.271411895751953, 6.8232421875, -0.7142181396484375, 17.75597381591797, 4.579036712646484, 13.337345123291016, 20.864227294921875, -0.3587799072265625, 0.39980316162109375, 4.092159271240234, 4.3489990234375, 0.2968482971191406, 0.4333038330078125, 12.260002136230469, 5.5451812744140625, 18.907684326171875, 0.15018653869628906, 8.132057189941406, 15.880830764770508, 2.4740676879882812, 7.701118469238281, -6.665855407714844, -1.540858268737793, 3.73980712890625, 6.910957336425781, 2.9742279052734375, 9.138452529907227, 8.661407470703125, 9.100173950195312, 0.3870124816894531, -3.5442676544189453, 16.023048400878906, 13.459243774414062, -2.491607666015625, 17.070751190185547, 3.17205810546875, 5.469015121459961, 11.287677764892578, 10.095024108886719], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000297.npy"}
{"epoch": 0.4489795918367347, "step": 298, "batch_size": 64, "mean": 5.29632568359375, "std": 9.460187911987305, "min": -13.27252197265625, "p10": -7.41323013305664, "median": 4.390338897705078, "p90": 18.939614868164067, "max": 27.319671630859375, "pos_frac": 0.734375, "sample": [3.0910491943359375, -12.784465789794922, 23.92633056640625, 2.5901947021484375, 19.742515563964844, -2.112457275390625, -4.538248062133789, 3.0790786743164062, 5.095706939697266, 2.144266128540039, 19.388214111328125, 20.92194366455078, -2.6206207275390625, -1.712921142578125, -4.2849273681640625, 20.14916229248047, 10.637725830078125, -7.137138366699219, 16.17087173461914, 4.81976318359375, 4.704288482666016, -4.38892936706543, 11.540359497070312, 9.868743896484375, 4.534004211425781, 0.24266815185546875, -13.27252197265625, 9.517765045166016, 8.093511581420898, 17.01481819152832, 2.208057403564453, 15.415515899658203, 5.095909118652344, 16.545028686523438, 9.033760070800781, 17.89288330078125, 4.246673583984375, 14.206169128417969, 1.9553489685058594, 0.6261215209960938, 3.9903526306152344, -2.10406494140625, -0.5708808898925781, 4.836952209472656, 5.268566131591797, 27.319671630859375, 14.70947265625, -7.6712493896484375, 2.520061492919922, 4.899749755859375, 0.31880950927734375, 1.1677932739257812, -2.7883262634277344, -10.780323028564453, -7.53155517578125, 9.806861877441406, -8.667259216308594, 13.629379272460938, 3.30987548828125, -10.811264038085938, 15.916999816894531, 5.691459655761719, 3.7287445068359375, 21.128795623779297], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000298.npy"}
{"epoch": 0.4504913076341648, "step": 299, "batch_size": 64, "mean": 4.0797014236450195, "std": 8.189878463745117, "min": -15.969619750976562, "p10": -5.4279125213623045, "median": 3.6616945266723633, "p90": 14.792028427124029, "max": 29.625534057617188, "pos_frac": 0.703125, "sample": [7.464420318603516, 5.605190277099609, -1.5400352478027344, -2.0822372436523438, 1.7521591186523438, 10.34480094909668, 1.2945785522460938, 5.470478057861328, 16.099716186523438, -0.14089584350585938, -1.3614673614501953, 5.581298828125, -15.969619750976562, 4.4432373046875, -9.947290420532227, -0.204833984375, 3.524993896484375, 16.48727035522461, 8.60367202758789, 3.0443191528320312, 20.969806671142578, -3.070831298828125, -2.030027389526367, 10.052230834960938, -6.3259124755859375, -5.2675323486328125, -5.810367584228516, 2.363861083984375, 29.625534057617188, -2.7134246826171875, 10.8060302734375, 8.337959289550781, 7.530109405517578, -12.916358947753906, 0.46627044677734375, -5.054931640625, 8.78857421875, 11.401844024658203, 1.1148834228515625, 3.5372543334960938, 5.519905090332031, 0.13372802734375, 10.43973159790039, 7.723194122314453, 23.255584716796875, -7.579078674316406, 3.786134719848633, 17.25115203857422, 5.699951171875, 13.64114761352539, 2.863412857055664, 2.8768043518066406, 9.30583381652832, -1.596221923828125, -1.1974029541015625, 9.31280517578125, 6.812286376953125, 4.792366027832031, 0.0035266876220703125, 4.048675537109375, 3.808804512023926, 0.13520050048828125, -5.496646881103516, 15.285263061523438], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000299.npy"}
{"epoch": 0.4520030234315949, "step": 300, "batch_size": 64, "mean": 4.628297805786133, "std": 8.644111633300781, "min": -16.059280395507812, "p10": -7.348061370849609, "median": 5.659997940063477, "p90": 13.194029235839844, "max": 24.026649475097656, "pos_frac": 0.75, "sample": [7.097507476806641, 8.8419189453125, 3.2978668212890625, 8.624526977539062, 0.21689605712890625, -5.5458526611328125, 7.8756256103515625, 4.460655212402344, -12.049179077148438, -11.645227432250977, 24.026649475097656, 12.8839111328125, 12.069561004638672, 11.664886474609375, 23.09954071044922, 5.589221954345703, 10.896759033203125, -6.981842041015625, 2.003009796142578, 12.06252670288086, 3.2702674865722656, 13.284286499023438, 6.37384033203125, 0.9148540496826172, -7.505012512207031, 0.9865798950195312, 7.581825256347656, 1.4961013793945312, 7.905784606933594, 7.1439666748046875, -0.7153816223144531, -16.059280395507812, 11.003921508789062, -1.6651229858398438, 0.9010009765625, 5.73077392578125, 1.0097007751464844, 20.868789672851562, 12.1519775390625, 12.055183410644531, 7.19268798828125, 1.4413719177246094, -3.752777099609375, 18.201995849609375, 6.385414123535156, -13.878044128417969, 0.5925140380859375, 11.094905853271484, -1.5251007080078125, 10.764142990112305, 17.934722900390625, 0.20391082763671875, -1.9448089599609375, 7.277523040771484, -1.3031082153320312, -10.292795181274414, 3.4615097045898438, 8.711189270019531, 12.983428955078125, -7.702873229980469, 14.881462097167969, 9.446434020996094, 5.26300048828125, -4.44868278503418], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000300.npy"}
{"epoch": 0.45351473922902497, "step": 301, "batch_size": 64, "mean": 4.801657676696777, "std": 8.69263744354248, "min": -10.963518142700195, "p10": -5.305593109130859, "median": 2.633944511413574, "p90": 17.014816665649416, "max": 26.068260192871094, "pos_frac": 0.703125, "sample": [15.548128128051758, 7.160621643066406, 1.6553955078125, 1.3821525573730469, 2.8527374267578125, -5.991279602050781, 18.958961486816406, 6.245174407958984, 5.296039581298828, 10.530509948730469, 0.7278900146484375, 3.3827476501464844, 12.344703674316406, -5.992671966552734, -10.082073211669922, 1.8831939697265625, -4.566688537597656, 2.56744384765625, 8.959243774414062, 0.3528099060058594, -1.4291343688964844, 7.198173522949219, 16.528404235839844, -3.549274444580078, 12.744857788085938, 8.839302062988281, 0.9830570220947266, 16.54684066772461, 2.0000686645507812, -1.0299224853515625, -7.008293151855469, -4.739112854003906, 11.40506362915039, 20.76708984375, -0.9723892211914062, -0.23374176025390625, -5.548370361328125, 25.620895385742188, -4.423255920410156, 4.950008392333984, 1.0629234313964844, 7.842876434326172, 5.498924255371094, -10.963518142700195, -0.2041473388671875, 11.581953048706055, 7.420169830322266, 7.56494140625, 0.04486083984375, 15.229301452636719, 2.7004451751708984, 18.746673583984375, 14.805187225341797, -1.7110366821289062, -3.3312530517578125, 7.221492767333984, 19.504776000976562, 26.068260192871094, 0.307891845703125, -4.674324035644531, 1.5212783813476562, -8.781684875488281, 17.215377807617188, 0.7694091796875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000301.npy"}
{"epoch": 0.455026455026455, "step": 302, "batch_size": 64, "mean": 2.836970806121826, "std": 7.87547492980957, "min": -15.530044555664062, "p10": -6.535047149658203, "median": 2.5457725524902344, "p90": 12.24847106933594, "max": 24.643157958984375, "pos_frac": 0.6875, "sample": [4.802070617675781, 10.189155578613281, -0.9429779052734375, 24.643157958984375, 0.14968490600585938, 9.103633880615234, 20.783241271972656, -6.628593444824219, 11.589508056640625, 2.4497756958007812, 3.612579345703125, 2.7305908203125, 5.210268020629883, 2.6579933166503906, -0.16637420654296875, 15.997428894042969, 3.0741119384765625, 2.1884078979492188, -2.415863037109375, 2.9144287109375, 0.2637443542480469, -10.010848999023438, -15.530044555664062, 1.9658050537109375, 1.34112548828125, -3.3148117065429688, 2.967937469482422, 15.240169525146484, -1.5385932922363281, 2.1316680908203125, -6.3167724609375, 10.2493896484375, 3.8226394653320312, -1.5460338592529297, -4.370372772216797, 12.5308837890625, 20.227821350097656, -7.5416107177734375, 1.4774932861328125, -0.9320640563964844, 7.945878982543945, 8.366710662841797, -0.3690662384033203, 5.229804992675781, 3.3123321533203125, 2.6417694091796875, -11.358211517333984, 7.2904205322265625, 0.56988525390625, 2.1715469360351562, 6.781841278076172, 3.4208526611328125, 8.8331298828125, -3.2820892333984375, 1.3121109008789062, -14.615976333618164, 8.313751220703125, 19.386009216308594, -2.784759521484375, 2.760101318359375, -8.779914855957031, 0.356475830078125, 4.51910400390625, -3.515338897705078], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000302.npy"}
{"epoch": 0.4565381708238851, "step": 303, "batch_size": 64, "mean": 6.679323673248291, "std": 9.329519271850586, "min": -10.725906372070312, "p10": -5.295003509521483, "median": 6.418754577636719, "p90": 18.4348388671875, "max": 27.579742431640625, "pos_frac": 0.75, "sample": [0.047515869140625, 10.938720703125, 11.84466552734375, 6.9395294189453125, 22.119140625, 18.749725341796875, 10.6146240234375, -1.7892189025878906, 10.359939575195312, -0.842041015625, 10.583694458007812, -5.862335205078125, 6.4117279052734375, -8.96786880493164, -3.7263946533203125, -6.642738342285156, -9.7379150390625, 17.700103759765625, -3.9712295532226562, 1.91461181640625, -0.4928092956542969, 12.404422760009766, 24.0350341796875, 15.871635437011719, -10.725906372070312, 3.7545547485351562, 20.925966262817383, -9.190383911132812, -2.243062973022461, 3.40496826171875, 9.254932403564453, 6.639595031738281, 3.9263267517089844, 0.038265228271484375, 2.0309066772460938, 0.0694580078125, 24.15158462524414, 1.6080703735351562, 16.07117462158203, 10.917854309082031, 6.2678070068359375, 11.353683471679688, 10.506492614746094, 12.0799560546875, -2.129425048828125, 7.354454040527344, 27.579742431640625, 6.42578125, 7.025665283203125, 13.642135620117188, -0.8359832763671875, 4.9418487548828125, -1.5662956237792969, 2.1981201171875, 16.65624237060547, 13.030136108398438, 3.039905548095703, 2.9983062744140625, 17.361129760742188, 5.232364654541016, 26.91616439819336, -9.214790344238281, 14.439239501953125, 13.037193298339844], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000303.npy"}
{"epoch": 0.4580498866213152, "step": 304, "batch_size": 64, "mean": 5.705035209655762, "std": 7.7084269523620605, "min": -14.036376953125, "p10": -1.4160324096679686, "median": 4.670886993408203, "p90": 16.787349319458013, "max": 22.44458770751953, "pos_frac": 0.8125, "sample": [17.16912078857422, 0.46185302734375, 2.616607666015625, 19.965972900390625, 0.6300201416015625, 10.638912200927734, 1.9200668334960938, 4.424829483032227, -1.336639404296875, 0.22988128662109375, 1.02239990234375, 13.025390625, 8.479217529296875, -2.02728271484375, 12.668428421020508, 3.430562973022461, -7.528415679931641, 7.980060577392578, 22.44458770751953, -12.042312622070312, 4.692756652832031, 4.649017333984375, 7.0926361083984375, 1.110219955444336, -0.38466644287109375, -8.294082641601562, 4.879447937011719, 6.652626037597656, 3.89691162109375, 19.57970428466797, 14.937332153320312, -1.4848098754882812, 11.9658203125, 6.203453063964844, 2.2085189819335938, -14.036376953125, 5.871150970458984, -0.23728179931640625, 5.110847473144531, 1.534881591796875, 4.739543914794922, -1.4500579833984375, 14.669784545898438, 6.900848388671875, 5.371519088745117, 2.3507919311523438, 1.5097179412841797, 12.0635986328125, 12.440353393554688, 15.896549224853516, 2.866485595703125, 1.172454833984375, 19.59491729736328, 1.2925949096679688, 11.897979736328125, -0.03693389892578125, 12.276275634765625, 19.08437728881836, 20.389328002929688, 0.47560882568359375, 1.9849014282226562, -0.01566314697265625, 6.8699798583984375, 12.655937194824219], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000304.npy"}
{"epoch": 0.4595616024187453, "step": 305, "batch_size": 64, "mean": 5.354496002197266, "std": 9.407899856567383, "min": -19.425994873046875, "p10": -6.070555877685546, "median": 2.6957740783691406, "p90": 17.48478240966797, "max": 28.58623504638672, "pos_frac": 0.734375, "sample": [19.55557632446289, -2.0330657958984375, 0.678955078125, -2.4209442138671875, 1.3047866821289062, 0.9646797180175781, 20.170486450195312, -12.80645751953125, 12.705093383789062, 17.855941772460938, 1.6282386779785156, 16.618743896484375, -4.939117431640625, 11.486446380615234, -4.3387451171875, -6.732967376708984, -6.555458068847656, 1.2511940002441406, 1.13580322265625, 20.100494384765625, 2.003864288330078, 12.343469619750977, 9.216949462890625, 11.851715087890625, -2.0654449462890625, 5.612617492675781, 12.052379608154297, -0.8756484985351562, 2.340789794921875, -19.425994873046875, 4.2077789306640625, 7.9239044189453125, 16.313220977783203, 13.179550170898438, -8.293930053710938, 1.4748382568359375, 28.58623504638672, 2.006561279296875, 0.9128036499023438, 12.483436584472656, 0.16196823120117188, 9.290985107421875, 16.185684204101562, 20.60759162902832, 7.577484130859375, -7.5768280029296875, 1.39453125, -10.395187377929688, 13.254684448242188, 14.636051177978516, 2.1236343383789062, 13.222084045410156, 4.85589599609375, 22.876487731933594, 8.761112213134766, 7.4942626953125, -0.3643798828125, 3.0507583618164062, -4.237373352050781, 15.198535919189453, -1.0840492248535156, 0.9715957641601562, 8.833038330078125, -1.6295852661132812], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000305.npy"}
{"epoch": 0.46107331821617537, "step": 306, "batch_size": 64, "mean": 5.096833229064941, "std": 8.207242012023926, "min": -13.841278076171875, "p10": -3.6258762359619134, "median": 4.115653991699219, "p90": 15.14006042480469, "max": 30.892059326171875, "pos_frac": 0.75, "sample": [18.053016662597656, 1.0252761840820312, -0.9670429229736328, 13.33559799194336, 2.3433990478515625, 6.126720428466797, 3.477252960205078, 2.6523895263671875, 6.85792350769043, 8.003780364990234, 14.505718231201172, 9.583816528320312, -13.841278076171875, 0.3616485595703125, 0.9358901977539062, -4.810848236083984, 12.723175048828125, 4.38531494140625, 9.524345397949219, 4.8481903076171875, 6.8481903076171875, 16.52377700805664, 9.857574462890625, -1.2578887939453125, 9.111364364624023, -1.7643585205078125, -8.822294235229492, 6.4066314697265625, 2.67803955078125, 1.1399688720703125, -1.7297515869140625, 16.54595947265625, -1.550313949584961, 2.8421859741210938, 8.849693298339844, 3.292236328125, 18.520492553710938, 27.186309814453125, 0.48227691650390625, -2.9616661071777344, -1.5602073669433594, 1.6511917114257812, 4.575641632080078, 1.0492401123046875, -8.990697860717773, 3.8459930419921875, 0.9170455932617188, 10.107170104980469, 1.6125030517578125, 12.798999786376953, 11.763381958007812, 9.525426864624023, -3.9105377197265625, -1.5735015869140625, 15.253677368164062, 7.382257461547852, 6.9499359130859375, -6.682071685791016, 14.874954223632812, 30.892059326171875, -7.876667022705078, 9.895950317382812, 4.647026062011719, -2.2741613388061523], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000306.npy"}
{"epoch": 0.46258503401360546, "step": 307, "batch_size": 64, "mean": 4.226436614990234, "std": 8.378073692321777, "min": -8.765289306640625, "p10": -5.056075286865234, "median": 2.775350570678711, "p90": 17.379757690429688, "max": 26.86505126953125, "pos_frac": 0.609375, "sample": [7.609832763671875, -2.835559844970703, 6.5284881591796875, 2.7387428283691406, -3.9793930053710938, 0.5466842651367188, 26.86505126953125, 1.81219482421875, 16.988784790039062, 17.547317504882812, -2.1451187133789062, 4.309734344482422, 7.24859619140625, 13.678863525390625, -8.397796630859375, 6.88226318359375, 4.83831787109375, -1.0740585327148438, 11.259698867797852, -1.1286888122558594, 5.31817626953125, -1.8410491943359375, -4.3506011962890625, -8.679306030273438, 1.8021965026855469, -2.0404815673828125, -1.2850799560546875, 2.8119583129882812, 13.05194091796875, -1.1635608673095703, -3.1144180297851562, 0.02918243408203125, 1.8009605407714844, -5.933067321777344, -5.358421325683594, 6.491943359375, -4.2816162109375, 4.5223541259765625, 10.77792739868164, -8.127212524414062, 3.698131561279297, -0.1194305419921875, 1.9080734252929688, 19.11925506591797, 19.85814666748047, -6.8701629638671875, 3.2185964584350586, 6.838146209716797, 23.246402740478516, 17.657684326171875, -8.765289306640625, 17.625450134277344, 9.289375305175781, -1.0546092987060547, 6.543525695800781, 8.08609390258789, -0.495391845703125, 12.715694427490234, 9.008445739746094, -2.9107513427734375, -3.1024646759033203, 13.601638793945312, 14.854644775390625, -3.185070037841797], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000307.npy"}
{"epoch": 0.46409674981103555, "step": 308, "batch_size": 64, "mean": 5.425999641418457, "std": 8.604817390441895, "min": -13.241044998168945, "p10": -6.090161514282226, "median": 5.142978668212891, "p90": 16.541499328613284, "max": 26.43256378173828, "pos_frac": 0.71875, "sample": [17.678020477294922, 6.17498779296875, 1.7086181640625, -1.5592041015625, 11.305885314941406, 10.053253173828125, 0.7098217010498047, 3.999176025390625, 12.7313232421875, -7.058052062988281, 26.43256378173828, 9.74298095703125, 19.961402893066406, 6.089372634887695, -1.73681640625, -5.427852630615234, 5.643951416015625, -0.7299308776855469, 15.590139389038086, 6.9387054443359375, 15.838821411132812, -6.3740081787109375, 14.663627624511719, 8.378637313842773, 6.980560302734375, 14.926383972167969, -5.33738899230957, 13.708541870117188, -7.09722900390625, -8.985977172851562, 16.689865112304688, 3.5873489379882812, 6.956523895263672, -0.27701759338378906, 8.980865478515625, -1.8314590454101562, 9.071327209472656, 0.0713043212890625, 3.304901123046875, -11.79339599609375, -0.8250579833984375, 2.0597305297851562, 8.792816162109375, 17.379684448242188, 2.5185203552246094, 5.3542327880859375, 6.100776672363281, -2.9167633056640625, 4.140876770019531, 4.931724548339844, 18.43976593017578, 2.5864410400390625, -3.04071044921875, 13.90262222290039, 24.987060546875, -0.5124893188476562, 16.1953125, -13.241044998168945, 11.95579719543457, 1.5995101928710938, 4.105836868286133, -6.6549224853515625, 4.164325714111328, 5.5293426513671875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000308.npy"}
{"epoch": 0.4656084656084656, "step": 309, "batch_size": 64, "mean": 5.341372489929199, "std": 7.132696628570557, "min": -7.22137451171875, "p10": -2.676649856567382, "median": 3.420103073120117, "p90": 13.96443557739258, "max": 24.941810607910156, "pos_frac": 0.78125, "sample": [7.515342712402344, 24.941810607910156, 6.011762619018555, 18.147525787353516, 12.134174346923828, -4.0510101318359375, 13.581729888916016, 0.2214202880859375, 0.941925048828125, -2.0472793579101562, 7.93475341796875, -1.48223876953125, -2.9150657653808594, 1.4200592041015625, 2.8272476196289062, -6.741085052490234, 2.2148284912109375, 8.989484786987305, -4.4940643310546875, 9.594335556030273, 3.8356285095214844, 14.8330078125, 19.22053337097168, 0.3638572692871094, 13.489044189453125, 9.937458038330078, -3.314342498779297, 1.3474082946777344, 2.011260986328125, 1.2272453308105469, -0.2728729248046875, -1.4576568603515625, 1.503387451171875, 6.998992919921875, 5.650199890136719, 1.0914077758789062, 1.8043136596679688, 7.534900665283203, -4.565374374389648, 17.494903564453125, 0.7628021240234375, 7.8069610595703125, 12.621232986450195, 10.238029479980469, 14.080833435058594, -7.22137451171875, 12.02984619140625, 7.1888427734375, 1.4962310791015625, 13.692771911621094, -1.6910476684570312, 0.9591560363769531, -0.1579742431640625, 11.205005645751953, 22.156845092773438, 3.465373992919922, 7.361270904541016, 1.7166366577148438, -2.1203460693359375, 0.321563720703125, 13.692840576171875, 3.3748321533203125, 5.7913360595703125, 9.597213745117188], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000309.npy"}
{"epoch": 0.4671201814058957, "step": 310, "batch_size": 64, "mean": 2.3886003494262695, "std": 8.340556144714355, "min": -13.856964111328125, "p10": -9.529121017456054, "median": 1.8328132629394531, "p90": 14.68148880004883, "max": 20.16525650024414, "pos_frac": 0.609375, "sample": [17.338592529296875, -0.0052032470703125, 8.92596435546875, 9.011507034301758, 7.499114990234375, -8.92694091796875, -5.0321502685546875, 1.7596206665039062, 13.308219909667969, -0.9148826599121094, -1.3689804077148438, -10.108879089355469, 0.3782920837402344, 14.07275390625, -2.7982177734375, 1.0522499084472656, -0.4635143280029297, 6.67950439453125, 16.62457275390625, -10.019615173339844, 4.662601470947266, -0.7712059020996094, -9.434226989746094, 12.4793701171875, -13.856964111328125, -3.2711009979248047, 1.906005859375, -4.137218475341797, -4.746063232421875, 8.101875305175781, -5.589324951171875, 0.78826904296875, 1.2480621337890625, 15.816680908203125, 2.1855926513671875, 5.9287109375, 5.999000549316406, 8.091178894042969, 5.4026336669921875, 3.2797317504882812, -4.344940185546875, 4.555137634277344, -3.88653564453125, -12.739974975585938, 3.6025772094726562, -0.05078887939453125, -5.468421936035156, 6.398307800292969, 2.20281982421875, -13.7388916015625, 8.687393188476562, 20.16525650024414, 11.957378387451172, -9.56978988647461, 1.5468330383300781, 0.5212020874023438, 2.76251220703125, 15.017162322998047, 14.942375183105469, -2.1614303588867188, -11.5914306640625, 17.18997573852539, 5.905597686767578, 9.872478485107422], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000310.npy"}
{"epoch": 0.46863189720332576, "step": 311, "batch_size": 64, "mean": 4.793861389160156, "std": 8.604948997497559, "min": -10.933662414550781, "p10": -3.9434188842773437, "median": 4.477245330810547, "p90": 15.245850181579604, "max": 35.264312744140625, "pos_frac": 0.703125, "sample": [10.41982650756836, 0.3960151672363281, 11.117683410644531, -10.933662414550781, -3.3124618530273438, -1.8938121795654297, 7.057163238525391, 8.78436279296875, -1.8168907165527344, 8.397880554199219, 6.532386779785156, -8.85162353515625, 3.642932891845703, 4.520671844482422, 1.2994232177734375, 1.0705909729003906, 6.8133544921875, 2.280029296875, 9.14990234375, 4.745170593261719, -3.0022239685058594, 11.914947509765625, 1.6059951782226562, 2.5205726623535156, 6.0789642333984375, -0.033924102783203125, -10.53360366821289, -2.3177757263183594, -2.276397705078125, 4.114227294921875, -3.46917724609375, 18.625167846679688, 28.798187255859375, 0.040496826171875, 5.9884033203125, 16.67337989807129, 10.71029281616211, 5.5330047607421875, 5.9745941162109375, 10.499008178710938, 35.264312744140625, 5.2389068603515625, -3.8073196411132812, 20.917686462402344, -4.972352981567383, -2.1820755004882812, 4.03057861328125, -1.8383331298828125, 6.092899322509766, 7.371654510498047, 21.149757385253906, 6.831512451171875, -1.2729949951171875, -6.462100982666016, 4.433818817138672, 9.994754791259766, 0.6270389556884766, -4.001747131347656, -6.149951934814453, 2.598175048828125, 6.991779327392578, 18.932968139648438, 11.859642028808594, 8.295425415039062], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000311.npy"}
{"epoch": 0.47014361300075586, "step": 312, "batch_size": 64, "mean": 5.799221515655518, "std": 8.430824279785156, "min": -8.021873474121094, "p10": -3.353553009033203, "median": 3.324857711791992, "p90": 17.567654418945313, "max": 27.040573120117188, "pos_frac": 0.734375, "sample": [8.88632583618164, 5.8095855712890625, -7.2259368896484375, 0.04168701171875, 6.810520172119141, 6.530670166015625, -1.0288238525390625, 2.428478240966797, 2.1374130249023438, 1.540679931640625, 24.023712158203125, 7.0016937255859375, 5.354469299316406, 3.082141876220703, 2.8267669677734375, 12.504997253417969, -2.89276123046875, 2.109783172607422, -2.4256629943847656, 11.003021240234375, 24.682762145996094, 2.5842514038085938, -5.796512603759766, 3.5675735473632812, 1.0390625, 17.88697052001953, -4.833793640136719, -3.084869384765625, 7.350334167480469, 10.70901870727539, 23.156383514404297, -0.6343154907226562, 15.270095825195312, 20.936660766601562, -8.021873474121094, 2.5811729431152344, 16.932693481445312, 2.387683868408203, 12.875396728515625, 17.062713623046875, 27.040573120117188, 3.594837188720703, -5.3391571044921875, 1.7173309326171875, 12.92962646484375, -3.1326828002929688, -0.923095703125, 14.157379150390625, 17.7840576171875, -0.8448944091796875, -3.448211669921875, 2.920389175415039, 14.595451354980469, 7.86895751953125, -2.3929901123046875, 9.094627380371094, 9.322547912597656, 5.96375846862793, -4.479616165161133, 14.866050720214844, 5.524669647216797, 0.49196624755859375, -1.6458663940429688, 0.3142852783203125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000312.npy"}
{"epoch": 0.47165532879818595, "step": 313, "batch_size": 64, "mean": 3.7469730377197266, "std": 9.257925033569336, "min": -13.232528686523438, "p10": -6.8860839843749995, "median": 2.5884323120117188, "p90": 17.013397979736332, "max": 29.939849853515625, "pos_frac": 0.671875, "sample": [4.120424270629883, 7.9717559814453125, 7.607551574707031, 1.2304725646972656, 3.5275344848632812, 15.809616088867188, 8.460712432861328, 6.540674209594727, 5.666229248046875, -11.457199096679688, 1.3180103302001953, -0.07286834716796875, 6.371150970458984, 3.2586746215820312, 2.392852783203125, -2.6728591918945312, 0.972991943359375, -2.762226104736328, -7.07720947265625, -1.0322227478027344, 3.0072708129882812, 6.85662841796875, 0.18956756591796875, 22.85997772216797, -13.232528686523438, -9.67557144165039, 5.589515686035156, 8.7908935546875, 6.6803131103515625, 2.1862411499023438, 26.2386474609375, 17.725608825683594, -5.313774108886719, -3.989788055419922, 20.9273681640625, 13.0948486328125, 3.6922779083251953, -8.583202362060547, -3.5715713500976562, 19.312530517578125, 4.157073974609375, 0.16015052795410156, 9.261871337890625, -6.44012451171875, 29.939849853515625, 2.7840118408203125, 1.3665695190429688, -5.2837677001953125, 3.4803619384765625, 2.2308425903320312, 17.52930450439453, -1.3099441528320312, 0.30381011962890625, -2.439617156982422, -1.8720779418945312, -12.236312866210938, 15.602043151855469, 1.5229377746582031, 14.991926193237305, -3.4699249267578125, -12.276611328125, -1.9540863037109375, 6.757755279541016, 14.040924072265625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000313.npy"}
{"epoch": 0.47316704459561604, "step": 314, "batch_size": 64, "mean": 5.628398418426514, "std": 8.376510620117188, "min": -9.638389587402344, "p10": -3.421357727050781, "median": 4.379250526428223, "p90": 18.157106781005858, "max": 24.217533111572266, "pos_frac": 0.671875, "sample": [6.201866149902344, 0.41233062744140625, 2.8152084350585938, 20.428878784179688, -3.7447471618652344, 13.042434692382812, -0.5595932006835938, 15.977333068847656, 0.5816497802734375, 7.988746643066406, 0.7259445190429688, -1.7226505279541016, 22.777435302734375, 4.69537353515625, 2.7719573974609375, 17.136077880859375, 6.864826202392578, 14.827449798583984, 20.243511199951172, 6.476112365722656, 2.68359375, -1.5291290283203125, 13.568939208984375, 24.217533111572266, 18.119300842285156, -9.03790283203125, -0.556610107421875, 18.173309326171875, 4.9562835693359375, -4.7555694580078125, 7.499814987182617, 7.36199951171875, 2.350625991821289, -0.6983795166015625, 1.627030372619629, 4.902915954589844, 9.7867431640625, 6.4428863525390625, -3.5506439208984375, 8.057342529296875, -1.0760860443115234, -9.638389587402344, 19.016387939453125, 13.249549865722656, 9.176944732666016, -4.462127685546875, 7.276325225830078, 16.195701599121094, -1.9609909057617188, 12.641849517822266, 3.7959365844726562, 19.90325164794922, -1.131134033203125, -0.7757225036621094, 1.04168701171875, -3.045135498046875, -6.549919128417969, 7.21051025390625, -1.6805648803710938, -3.11968994140625, 4.063127517700195, -2.1914138793945312, -2.2830543518066406, 17.000228881835938], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000314.npy"}
{"epoch": 0.47467876039304613, "step": 315, "batch_size": 64, "mean": 4.652489185333252, "std": 7.351320743560791, "min": -9.110237121582031, "p10": -4.659976959228515, "median": 3.8467254638671875, "p90": 15.448014068603516, "max": 23.60804557800293, "pos_frac": 0.71875, "sample": [-5.152519226074219, 6.1025390625, 6.680568695068359, -0.5014190673828125, -0.5409946441650391, 4.025146484375, -6.440727233886719, -1.8778762817382812, 0.2541656494140625, 13.19610595703125, -7.05126953125, 6.185060501098633, 5.427665710449219, 10.972679138183594, -1.8318672180175781, 4.1302490234375, -0.28804779052734375, -2.1489639282226562, -0.30849456787109375, 3.357574462890625, 1.6868648529052734, 6.4316864013671875, 3.8978042602539062, 2.9855880737304688, 0.78240966796875, 2.6923751831054688, 15.73199462890625, -6.445823669433594, -2.6878890991210938, 3.7956466674804688, 1.0858917236328125, 0.2221832275390625, 9.595634460449219, 4.9702606201171875, 2.92510986328125, 18.253265380859375, 11.6807861328125, -3.510711669921875, -6.609169006347656, 10.963912963867188, 12.99759292602539, 1.5544357299804688, 9.916412353515625, 18.137331008911133, 3.1548614501953125, 6.614013671875, 6.059818267822266, 23.60804557800293, -9.110237121582031, 4.965644836425781, 17.464096069335938, -0.3962249755859375, 11.539276123046875, 0.7318973541259766, 7.0517120361328125, 6.0179443359375, 13.055511474609375, 6.1468505859375, 15.526092529296875, -6.124423980712891, 15.265830993652344, -1.8706111907958984, 21.496978759765625, 1.319061279296875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000315.npy"}
{"epoch": 0.47619047619047616, "step": 316, "batch_size": 64, "mean": 5.416971206665039, "std": 7.24977970123291, "min": -7.755256652832031, "p10": -3.418609046936035, "median": 4.963340759277344, "p90": 15.964347839355469, "max": 22.1309814453125, "pos_frac": 0.734375, "sample": [-4.927734375, 4.2180938720703125, 7.2991943359375, -2.7685508728027344, -2.681873321533203, -3.1327133178710938, 13.15252685546875, 3.42431640625, 2.2443771362304688, 10.141342163085938, 15.319034576416016, 22.1309814453125, 10.5491943359375, 19.206085205078125, 18.292739868164062, -1.6440887451171875, -5.252288818359375, 5.353263854980469, -7.386383056640625, 12.454511642456055, 3.6668128967285156, 11.455276489257812, 4.573417663574219, 2.402374267578125, -3.7316322326660156, 6.3954315185546875, 6.5823516845703125, -1.7199249267578125, -0.536956787109375, 6.980426788330078, 10.001091003417969, 15.006011962890625, 0.3426513671875, 15.9185791015625, 1.2296600341796875, -0.6616172790527344, -0.8292236328125, 6.108547210693359, 1.5118865966796875, 1.683837890625, 10.821578979492188, 6.94732666015625, 8.439708709716797, -1.9507865905761719, 7.4077911376953125, 1.7836723327636719, 0.2662334442138672, 8.0670166015625, -7.755256652832031, 15.983963012695312, -0.541900634765625, -4.025762557983398, 2.6568450927734375, -3.541135787963867, 7.242313385009766, 7.882415771484375, 3.1107177734375, 21.329010009765625, 5.43572998046875, 8.205703735351562, 16.959545135498047, 3.5674514770507812, 17.790023803710938, 8.232925415039062], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000316.npy"}
{"epoch": 0.47770219198790626, "step": 317, "batch_size": 64, "mean": 5.934911727905273, "std": 8.900886535644531, "min": -22.895263671875, "p10": -2.461596298217773, "median": 4.65106201171875, "p90": 17.008302307128908, "max": 26.877914428710938, "pos_frac": 0.75, "sample": [4.2436981201171875, 2.900848388671875, 2.1608009338378906, 5.0584259033203125, 6.3736114501953125, 26.877914428710938, -0.9422607421875, 3.3977508544921875, 0.6675682067871094, -0.23946571350097656, -2.736347198486328, -0.8101348876953125, -0.3892822265625, 1.7022628784179688, 7.378883361816406, 1.8152389526367188, 16.08123779296875, 3.362213134765625, 16.840843200683594, 12.566650390625, -13.523353576660156, -0.3431396484375, 8.302970886230469, -0.17092132568359375, 11.224416732788086, 3.3178024291992188, 4.170654296875, 9.649250030517578, 7.284019470214844, -22.895263671875, 20.36914825439453, -3.3119659423828125, 8.64776611328125, 21.85401725769043, 23.49412727355957, -9.310096740722656, 8.636627197265625, -0.6827888488769531, 9.267217636108398, 17.08007049560547, -0.9396514892578125, 5.191986083984375, 22.87158966064453, -4.164802551269531, 0.7271499633789062, 11.960224151611328, 7.4137725830078125, 16.03277587890625, 5.808082580566406, 0.08486175537109375, 9.675079345703125, 14.6072998046875, 3.118906021118164, 12.658233642578125, -5.6850433349609375, 4.126640319824219, -1.8205108642578125, 12.050933837890625, 11.559757232666016, 20.345029830932617, 0.36173248291015625, 14.581275939941406, 1.508636474609375, 8.389358520507812], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000317.npy"}
{"epoch": 0.47921390778533635, "step": 318, "batch_size": 64, "mean": 4.443850517272949, "std": 9.0133056640625, "min": -13.02618408203125, "p10": -5.845678806304932, "median": 2.715372085571289, "p90": 17.14003353118897, "max": 25.57598876953125, "pos_frac": 0.640625, "sample": [7.4337310791015625, 14.554481506347656, -0.635772705078125, -8.487533569335938, 20.115318298339844, 6.197971343994141, 0.811065673828125, -13.02618408203125, 11.107616424560547, 14.5850830078125, 2.227092742919922, -0.1829833984375, 4.015636444091797, 6.8824920654296875, 21.75294303894043, 1.1082077026367188, 9.265403747558594, -5.118206024169922, 6.034599304199219, 2.845905303955078, -3.631807327270508, 6.413555145263672, 16.188888549804688, 0.265777587890625, -0.632171630859375, 5.8208770751953125, 19.27947235107422, -10.867317199707031, 10.163803100585938, 10.3170166015625, -5.454822540283203, 8.895366668701172, -8.510330200195312, -1.0575332641601562, 0.7606658935546875, 19.9542236328125, 25.57598876953125, -1.312652587890625, 16.439647674560547, 4.791587829589844, -0.1322784423828125, 1.1692733764648438, 13.242080688476562, -2.8420944213867188, -2.3449630737304688, 2.4910125732421875, 2.5848388671875, 11.274566650390625, -1.401824951171875, 13.506290435791016, -0.2339344024658203, 23.244178771972656, 3.9044227600097656, 12.375289916992188, -8.27652359008789, -5.638099670410156, 5.258934020996094, -5.8665771484375, 6.4474639892578125, -2.3084716796875, 17.44019889831543, -10.880915641784668, -5.7969160079956055, 2.303356170654297], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000318.npy"}
{"epoch": 0.48072562358276644, "step": 319, "batch_size": 64, "mean": 5.092622756958008, "std": 8.162543296813965, "min": -15.42156982421875, "p10": -4.02633056640625, "median": 4.963451385498047, "p90": 15.109725189208989, "max": 28.99907684326172, "pos_frac": 0.71875, "sample": [8.064903259277344, 4.991752624511719, -3.4801177978515625, 13.411575317382812, 8.376182556152344, 9.384078979492188, 9.064594268798828, 1.3484344482421875, -0.455047607421875, 5.501941680908203, 11.12075424194336, 13.625320434570312, -0.27899169921875, 4.935150146484375, 7.9855499267578125, 7.348419189453125, -6.8209075927734375, 28.99907684326172, 1.9540481567382812, 18.141345977783203, 7.896650314331055, 0.1717681884765625, 2.1073455810546875, 10.297542572021484, 14.158103942871094, -2.2776718139648438, 16.088565826416016, 4.762687683105469, 12.139789581298828, 10.077835083007812, -5.612541198730469, -2.1717300415039062, 7.357761383056641, 22.59368896484375, -7.014308929443359, 5.607017517089844, -7.132537841796875, 5.685981750488281, 2.4988174438476562, -2.117717742919922, 5.173084259033203, 3.405628204345703, 1.8899612426757812, 2.4941635131835938, 9.401145935058594, -0.4446372985839844, -4.2604217529296875, -1.981964111328125, -10.133480072021484, 12.424610137939453, 0.5948505401611328, -0.378509521484375, 1.0220413208007812, -0.623779296875, 2.1471328735351562, 15.517562866210938, 7.123992919921875, 6.227939605712891, 19.41119956970215, 9.610721588134766, -1.012939453125, -15.42156982421875, 23.416522979736328, 1.989471435546875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000319.npy"}
{"epoch": 0.48223733938019653, "step": 320, "batch_size": 64, "mean": 3.16815447807312, "std": 8.559006690979004, "min": -13.316444396972656, "p10": -7.843670940399169, "median": 1.9688043594360352, "p90": 14.470834350585939, "max": 28.019363403320312, "pos_frac": 0.6875, "sample": [6.389595031738281, -11.951034545898438, 2.0602798461914062, 9.396121978759766, 4.0410614013671875, -1.881490707397461, 15.305252075195312, -0.4910163879394531, 3.850341796875, 0.8601970672607422, 9.769805908203125, 1.6158485412597656, -1.9187068939208984, 1.984649658203125, -12.158798217773438, 14.634979248046875, 0.25543975830078125, -13.316444396972656, -2.2347335815429688, 3.7626495361328125, -5.267070770263672, 8.161003112792969, -10.301605224609375, 3.8165836334228516, 1.6945343017578125, 14.08782958984375, 28.019363403320312, 11.108642578125, -0.11767196655273438, -12.509498596191406, 2.4833602905273438, -8.329670906066895, 6.403179168701172, 6.609039306640625, 0.9081029891967773, 13.808250427246094, 1.9321136474609375, -5.847801208496094, -6.7096710205078125, 3.8003997802734375, 5.726203918457031, 18.763851165771484, 1.4549293518066406, -1.6894874572753906, 8.276084899902344, 1.2633857727050781, 18.761825561523438, 6.9594879150390625, -0.76202392578125, 0.060550689697265625, 1.9529590606689453, -12.941947937011719, 19.643356323242188, -3.5879974365234375, -1.2632331848144531, 18.698009490966797, 0.0878448486328125, 10.113147735595703, 9.370193481445312, 0.9570999145507812, 7.713085174560547, -5.130123138427734, 10.784481048583984, 3.826784133911133], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000320.npy"}
{"epoch": 0.4837490551776266, "step": 321, "batch_size": 64, "mean": 4.793027877807617, "std": 8.38432502746582, "min": -13.958141326904297, "p10": -5.410188293457031, "median": 4.249554634094238, "p90": 17.79832916259766, "max": 22.69121742248535, "pos_frac": 0.765625, "sample": [4.7947540283203125, 9.252227783203125, 1.3035335540771484, 12.3714599609375, 2.20611572265625, -6.4392852783203125, 18.42354965209961, 11.442466735839844, 16.339481353759766, 1.080214500427246, 20.456562042236328, 1.0873489379882812, 1.4722862243652344, 11.591259002685547, -9.223785400390625, 1.2496223449707031, 0.8354873657226562, 5.076873779296875, 8.092216491699219, 14.5357666015625, 4.415679931640625, 6.8874664306640625, -13.958141326904297, -5.11834716796875, 5.015815734863281, -9.172340393066406, -4.7069091796875, -5.5352630615234375, 6.459552764892578, 0.440765380859375, 5.218330383300781, 5.4468994140625, 10.559364318847656, 1.9652214050292969, 4.367258071899414, 22.628738403320312, 7.4050750732421875, -0.5240650177001953, 5.019134521484375, 9.7161865234375, -1.6677207946777344, 0.361083984375, -1.3912220001220703, 6.203033447265625, 1.1573715209960938, -10.677558898925781, 15.269035339355469, 19.055931091308594, 13.862800598144531, 13.210762023925781, 2.7357444763183594, 3.8063735961914062, 4.7374114990234375, 21.742517471313477, 18.773117065429688, -2.6702957153320312, 0.903350830078125, 3.4376373291015625, 22.69121742248535, -7.028587341308594, 0.30223846435546875, -1.4377059936523438, 4.1318511962890625, -3.233184814453125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000321.npy"}
{"epoch": 0.4852607709750567, "step": 322, "batch_size": 64, "mean": 6.364630699157715, "std": 7.790738105773926, "min": -18.09337615966797, "p10": -1.7113759994506832, "median": 5.140501022338867, "p90": 15.7733943939209, "max": 24.935287475585938, "pos_frac": 0.828125, "sample": [-1.9008426666259766, 13.966789245605469, 14.815074920654297, -0.6449203491210938, 4.288414001464844, 3.405271530151367, 11.208961486816406, 8.976094245910645, 15.676326751708984, 3.5553131103515625, -2.3113784790039062, 7.660400390625, 15.814994812011719, 21.483678817749023, -1.269287109375, 3.8322372436523438, 4.999267578125, 11.172409057617188, 0.7111968994140625, 0.298675537109375, 1.3340187072753906, 3.8798828125, -4.8749237060546875, 5.158809661865234, 10.001762390136719, 5.481372833251953, -0.030145645141601562, 6.177558898925781, 1.3457374572753906, -8.302253723144531, -0.8932342529296875, 7.608001708984375, 7.965034484863281, 11.218650817871094, 22.85089111328125, 4.492345809936523, 3.0329322814941406, 24.935287475585938, 1.1049270629882812, 5.9637451171875, 11.184585571289062, 19.924861907958984, 2.3838653564453125, 14.303024291992188, 1.6844291687011719, 17.184432983398438, 12.224250793457031, 10.882135391235352, -6.1107177734375, -18.09337615966797, 4.784473419189453, 9.067474365234375, 4.902652740478516, 3.2419815063476562, 15.46600341796875, 5.1221923828125, 11.118247985839844, 1.351593017578125, 9.324859619140625, -2.3006668090820312, 22.219749450683594, 0.9557418823242188, 6.794034957885742, 5.5314788818359375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000322.npy"}
{"epoch": 0.48677248677248675, "step": 323, "batch_size": 64, "mean": 5.553721904754639, "std": 10.443330764770508, "min": -16.34014892578125, "p10": -5.58053741455078, "median": 4.751869201660156, "p90": 21.180640029907227, "max": 31.41473388671875, "pos_frac": 0.6875, "sample": [5.545391082763672, -0.6107177734375, 4.758781433105469, 0.7967987060546875, 4.437751770019531, 16.157150268554688, 6.4040069580078125, 7.808515548706055, 4.490104675292969, 20.893207550048828, 3.971832275390625, -2.5620651245117188, 18.57830047607422, 1.4369277954101562, 31.41473388671875, -7.903423309326172, -14.000022888183594, 6.89453125, 11.48583984375, -4.3815460205078125, 8.788421630859375, 6.064369201660156, 8.226356506347656, 5.498008728027344, 24.807464599609375, -2.5756168365478516, -6.094390869140625, 8.170022964477539, 2.627452850341797, -8.388275146484375, 31.355178833007812, 5.604827880859375, -2.4474220275878906, -3.1710739135742188, 21.30382537841797, 1.90716552734375, -8.51226806640625, 9.242362976074219, 11.71746826171875, 0.27466392517089844, 6.598091125488281, 4.744956970214844, -0.5522384643554688, -0.618255615234375, 19.49268341064453, 29.072586059570312, 10.068290710449219, -15.656661987304688, 23.863616943359375, 0.4277191162109375, -16.34014892578125, -3.96478271484375, 6.336151123046875, 3.325420379638672, -1.031646728515625, 11.739925384521484, 24.03656005859375, 7.294807434082031, 6.473236083984375, -0.49950313568115234, -0.10098648071289062, -2.037433624267578, 8.4638671875, 4.2873077392578125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000323.npy"}
{"epoch": 0.48828420256991684, "step": 324, "batch_size": 64, "mean": 5.4193010330200195, "std": 8.455832481384277, "min": -14.688621520996094, "p10": -3.4340896606445312, "median": 4.580419540405273, "p90": 17.21011085510254, "max": 25.063796997070312, "pos_frac": 0.75, "sample": [17.28899383544922, 14.171592712402344, 1.3762779235839844, -0.8357391357421875, -13.297454833984375, 17.026050567626953, 2.5578298568725586, 13.0391845703125, -3.526763916015625, -3.2178497314453125, 13.521339416503906, 19.879077911376953, 3.3379440307617188, 4.12750244140625, 7.746372222900391, 2.9782142639160156, 9.168474197387695, -8.674182891845703, 1.58392333984375, -5.6342010498046875, 10.875228881835938, -6.478363037109375, 3.2217254638671875, 17.34288787841797, 4.796962738037109, -0.3624687194824219, -14.688621520996094, 4.849552154541016, 18.566696166992188, 5.538524627685547, 8.441341400146484, -11.489334106445312, 8.854827880859375, 7.544288635253906, 0.32314300537109375, 11.29758071899414, 13.185482025146484, 2.8456192016601562, 6.461986541748047, 6.901634216308594, 6.5186920166015625, 3.8312911987304688, 23.726776123046875, -0.6198844909667969, 25.063796997070312, 1.862030029296875, 4.3638763427734375, 6.535430908203125, 10.26041030883789, 12.660903930664062, 23.318443298339844, -1.7361831665039062, 4.260284423828125, -2.1482810974121094, 14.236282348632812, 2.2569732666015625, 5.05718994140625, -0.297882080078125, -3.1792449951171875, 3.4624900817871094, 11.431869506835938, 0.33730316162109375, 5.956520080566406, -0.9691123962402344], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000324.npy"}
{"epoch": 0.4897959183673469, "step": 325, "batch_size": 64, "mean": 5.240480422973633, "std": 8.884589195251465, "min": -19.263824462890625, "p10": -5.434232330322265, "median": 4.303718566894531, "p90": 16.11701812744141, "max": 25.413414001464844, "pos_frac": 0.734375, "sample": [8.819705963134766, 25.413414001464844, 7.399665832519531, 16.204849243164062, -2.4298439025878906, 4.99847412109375, 17.298786163330078, -0.48339080810546875, 3.6574172973632812, 5.287883758544922, 8.287940979003906, 14.593986511230469, -3.3460159301757812, -9.12387466430664, 11.014511108398438, 3.6679534912109375, 12.491321563720703, 3.011505126953125, 5.2675323486328125, -5.516933441162109, 24.538970947265625, -10.020668029785156, 15.912078857421875, 2.755617141723633, 0.6485490798950195, 24.440269470214844, 2.5586471557617188, 9.207210540771484, -5.90435791015625, 4.872478485107422, -9.57806396484375, -0.2886962890625, 9.364799499511719, 4.585296630859375, 0.8436813354492188, 14.12860107421875, 2.31463623046875, 12.593818664550781, 11.454986572265625, 6.929901123046875, 8.694847106933594, 0.004669189453125, -1.20953369140625, 19.4710693359375, -19.263824462890625, 2.7813568115234375, -0.00258636474609375, 10.707237243652344, 14.793205261230469, 11.150421142578125, 23.786117553710938, -5.164752960205078, -5.509147644042969, -3.2275543212890625, 3.97760009765625, 0.10663223266601562, 7.890659332275391, -5.259429931640625, 2.774169921875, 8.857524871826172, 3.2828750610351562, 7.468681335449219, 4.0221405029296875, -2.614288330078125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000325.npy"}
{"epoch": 0.491307634164777, "step": 326, "batch_size": 64, "mean": 3.6321234703063965, "std": 8.200159072875977, "min": -13.923511505126953, "p10": -6.8912803649902346, "median": 3.6291580200195312, "p90": 14.057089233398441, "max": 23.25914764404297, "pos_frac": 0.65625, "sample": [-3.7447586059570312, -0.38704681396484375, 3.9419403076171875, 2.5317611694335938, 5.1421356201171875, -8.301544189453125, 23.25914764404297, 4.075347900390625, 6.9679412841796875, 7.80218505859375, 0.6864547729492188, 13.217529296875, 6.089988708496094, -1.8996658325195312, -0.969818115234375, 14.416900634765625, 2.5411949157714844, 3.55670166015625, -0.7164230346679688, 10.516670227050781, 14.831283569335938, -1.39422607421875, 0.05072784423828125, 9.260246276855469, -6.847129821777344, 2.8858795166015625, -13.039810180664062, -3.1442108154296875, 0.366302490234375, 3.9487686157226562, 17.047523498535156, -3.4033889770507812, 7.558109283447266, -0.6683883666992188, 5.149543762207031, 3.5541915893554688, 4.333915710449219, -3.94781494140625, -6.9102020263671875, 10.00213623046875, 4.527431488037109, 12.697616577148438, -10.073234558105469, -1.2261161804199219, 16.46240234375, 3.7016143798828125, 16.998130798339844, -8.524568557739258, 13.034801483154297, 21.701122283935547, 3.9754714965820312, -10.37843132019043, -2.6170501708984375, 13.005973815917969, -5.2841339111328125, 13.081253051757812, -13.923511505126953, 2.622650146484375, 11.04977035522461, 7.755859375, 10.732429504394531, -5.067678451538086, 2.46160888671875, 7.38238525390625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000326.npy"}
{"epoch": 0.4928193499622071, "step": 327, "batch_size": 64, "mean": 3.8523266315460205, "std": 9.735777854919434, "min": -21.51373291015625, "p10": -7.106977081298828, "median": 2.6179847717285156, "p90": 17.143067169189454, "max": 26.607200622558594, "pos_frac": 0.671875, "sample": [4.863548278808594, -7.280059814453125, -3.1389236450195312, 24.229293823242188, -4.896728515625, 0.0286865234375, -6.703117370605469, 4.9427947998046875, 0.3000202178955078, 11.282516479492188, 18.24785614013672, -4.915813446044922, -14.241401672363281, 17.19072723388672, 11.279823303222656, 13.283859252929688, 17.0318603515625, -0.10124588012695312, -2.9336395263671875, 1.7265968322753906, 0.24178504943847656, -0.5913658142089844, 7.057636260986328, 15.011734008789062, 22.6727294921875, 1.8270492553710938, -0.1341400146484375, 2.9119110107421875, 8.279163360595703, 0.0226898193359375, 5.563207626342773, 1.7145156860351562, -11.039535522460938, 5.475624084472656, -2.2768783569335938, -1.01739501953125, 5.6427154541015625, 3.3416900634765625, -7.286293029785156, -3.0422935485839844, 22.518051147460938, 5.256429672241211, -21.51373291015625, 0.3690948486328125, 13.338973999023438, 25.314987182617188, -0.7985610961914062, 14.258567810058594, 26.607200622558594, 10.317939758300781, 15.369007110595703, 5.850933074951172, 3.7988548278808594, 4.746925354003906, -9.83559799194336, 0.44110107421875, 5.2040557861328125, 2.6245803833007812, 0.3301239013671875, 2.61138916015625, -4.115165710449219, -10.657846450805664, 6.4129638671875, -6.472564697265625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000327.npy"}
{"epoch": 0.4943310657596372, "step": 328, "batch_size": 64, "mean": 4.401291847229004, "std": 8.139516830444336, "min": -18.75910186767578, "p10": -3.431736755371093, "median": 2.8712759017944336, "p90": 16.987937164306643, "max": 25.444412231445312, "pos_frac": 0.671875, "sample": [15.181503295898438, -0.4659156799316406, 5.441938400268555, 3.8479690551757812, 6.057964324951172, 1.410360336303711, 0.4542388916015625, 11.342021942138672, 17.300857543945312, 1.5219802856445312, 4.465728759765625, 6.613555908203125, 18.092517852783203, 0.31939697265625, 4.9947509765625, -2.097076416015625, 14.623065948486328, 6.887563705444336, 3.883941650390625, 1.6066207885742188, 1.8794403076171875, -3.6461257934570312, -0.076080322265625, 19.17314910888672, -2.1185379028320312, -7.9867401123046875, 25.444412231445312, 0.9213790893554688, -10.025665283203125, 2.4564743041992188, -18.75910186767578, 8.770345687866211, 11.102216720581055, 0.036285400390625, 19.713607788085938, -1.1342391967773438, -2.9314956665039062, -0.07647895812988281, 18.099014282226562, -0.8810501098632812, 11.601852416992188, 7.0835418701171875, -5.928637504577637, -6.303926467895508, -0.9532318115234375, 4.369781494140625, -5.265748977661133, -0.996063232421875, 0.31058502197265625, -0.5617866516113281, 5.181232452392578, -1.5372848510742188, 10.612751007080078, 8.8787841796875, -1.5130996704101562, 3.2860774993896484, 10.183586120605469, 4.594305038452148, 16.257789611816406, 2.2197608947753906, 9.807077407836914, -2.4752273559570312, 18.711143493652344, 12.67562484741211], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000328.npy"}
{"epoch": 0.4958427815570673, "step": 329, "batch_size": 64, "mean": 5.275609016418457, "std": 8.956205368041992, "min": -14.21917724609375, "p10": -4.896709442138672, "median": 3.841796875, "p90": 19.23192024230957, "max": 28.54094696044922, "pos_frac": 0.703125, "sample": [8.928924560546875, 8.455551147460938, 3.1801681518554688, 5.1915283203125, -4.878898620605469, 4.59783935546875, 19.365570068359375, -0.06279754638671875, 11.51239013671875, -1.064300537109375, 3.9307479858398438, 0.03395843505859375, -6.145601272583008, 10.045738220214844, -3.3642234802246094, -6.2028656005859375, -2.2372055053710938, 16.610931396484375, -2.8469085693359375, 4.162559509277344, 0.38826751708984375, 0.2305450439453125, 21.74706268310547, 15.460357666015625, -5.909324645996094, 19.798263549804688, 28.54094696044922, 9.6192626953125, 4.973907470703125, 6.1108551025390625, 12.259769439697266, 0.32889556884765625, -5.3564910888671875, 4.2200927734375, 5.8722076416015625, -1.3566665649414062, 23.840423583984375, -1.6877098083496094, 18.92007064819336, 10.194580078125, 21.592941284179688, 8.248981475830078, 3.7528457641601562, -5.246055603027344, 0.060710906982421875, 6.0062713623046875, -0.42774200439453125, 2.100128173828125, 4.178791046142578, 11.925395965576172, 0.8098793029785156, 11.250167846679688, 1.8079681396484375, -4.9043426513671875, 25.85507583618164, 10.1083984375, -2.0165977478027344, 18.85748291015625, -1.6138648986816406, -14.21917724609375, 2.599273681640625, 0.7448577880859375, 2.0346755981445312, -3.2755126953125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000329.npy"}
|
||||
{"epoch": 0.4973544973544973, "step": 330, "batch_size": 64, "mean": 6.8600029945373535, "std": 9.652576446533203, "min": -21.50141716003418, "p10": -5.418166732788086, "median": 6.586597442626953, "p90": 18.764017868041996, "max": 26.45087242126465, "pos_frac": 0.734375, "sample": [-5.32733154296875, -8.095069885253906, -0.5278511047363281, -8.345565795898438, 5.332923889160156, -21.50141716003418, 10.321357727050781, 3.8836822509765625, -0.778839111328125, 11.998138427734375, 10.502098083496094, 3.8083534240722656, -3.440399169921875, 0.8745346069335938, 2.63433837890625, 16.657363891601562, 21.292694091796875, -4.279823303222656, 22.030929565429688, 5.7511138916015625, 12.693260192871094, -11.43934440612793, 6.928375244140625, -6.393423080444336, 19.700607299804688, 19.030380249023438, 8.293785095214844, -0.22110366821289062, -3.5023155212402344, 7.8787994384765625, 4.984249114990234, 15.499198913574219, 2.673187255859375, 13.452201843261719, -5.457096099853516, 16.043060302734375, 13.555961608886719, 16.082611083984375, 26.397537231445312, 2.885568618774414, -0.4902496337890625, 13.576202392578125, 6.853218078613281, 14.92667007446289, 15.239707946777344, 14.943608283996582, 3.9468002319335938, 5.608406066894531, -0.31842041015625, 6.319976806640625, 4.862098693847656, 13.443008422851562, 18.142505645751953, 10.36027717590332, 5.2622528076171875, 4.114891052246094, 9.517631530761719, -1.1754188537597656, 24.50428009033203, -8.898300170898438, 8.784692764282227, 14.101150512695312, 7.087608337402344, 26.45087242126465], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000330.npy"}
|
||||
{"epoch": 0.4988662131519274, "step": 331, "batch_size": 64, "mean": 6.756691932678223, "std": 10.023547172546387, "min": -26.030899047851562, "p10": -2.909749603271484, "median": 4.719291687011719, "p90": 19.713195419311525, "max": 29.538715362548828, "pos_frac": 0.78125, "sample": [23.083206176757812, 0.9908828735351562, 4.034984588623047, 3.6006946563720703, 10.737567901611328, 18.2733154296875, 8.61578369140625, 10.787124633789062, 5.308837890625, 3.2395706176757812, 1.615865707397461, 24.408096313476562, 17.04332733154297, 18.533187866210938, 9.347457885742188, 0.5272293090820312, 29.538715362548828, 7.0442047119140625, 9.191150665283203, 25.804500579833984, 7.822488784790039, 2.5289230346679688, 1.0591964721679688, -0.36360931396484375, 1.2770404815673828, 19.859619140625, 19.860624313354492, 19.963973999023438, -6.1942901611328125, -0.147918701171875, 17.579544067382812, 1.2889289855957031, 15.208320617675781, 16.438278198242188, -2.7415695190429688, -0.26824951171875, 3.6490211486816406, -11.483650207519531, 0.11828422546386719, 16.293182373046875, -4.397327423095703, 6.263908386230469, 1.3279190063476562, 0.755584716796875, 10.065479278564453, -26.030899047851562, 9.033302307128906, -2.9818267822265625, -8.046894073486328, 0.1141510009765625, -2.205474853515625, 18.97100067138672, 6.697784423828125, 3.2410964965820312, 11.988204956054688, 0.998931884765625, -1.224130630493164, 14.583908081054688, -0.13779449462890625, -8.35750961303711, 19.371540069580078, 7.521762847900391, 4.1297454833984375, 17.2719669342041], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000331.npy"}
|
||||
{"epoch": 0.5003779289493575, "step": 332, "batch_size": 64, "mean": 6.262184143066406, "std": 8.572905540466309, "min": -8.243148803710938, "p10": -3.63876724243164, "median": 4.635623931884766, "p90": 18.518090057373048, "max": 30.240676879882812, "pos_frac": 0.78125, "sample": [-4.9290313720703125, 5.5159454345703125, 0.1756591796875, -2.2845382690429688, 6.259342193603516, 4.32037353515625, 6.293342590332031, -0.9930152893066406, -1.887969970703125, 1.229705810546875, 2.4795074462890625, 24.01568603515625, 18.465042114257812, 4.108116149902344, 10.376655578613281, 4.419624328613281, -0.8553066253662109, 3.641848564147949, 14.688407897949219, 3.3472137451171875, 10.990135192871094, 1.5873794555664062, -5.248985290527344, 23.96270751953125, 8.145584106445312, 10.935386657714844, 2.0988197326660156, 0.7969589233398438, 7.000701904296875, 1.961456298828125, 4.7207489013671875, 22.787569046020508, -7.477569580078125, 4.05487060546875, -8.243148803710938, 6.209224700927734, 2.645456314086914, 18.0675048828125, -4.252552032470703, 2.3861522674560547, 14.957275390625, 7.694732666015625, 12.437664031982422, 5.967569351196289, 0.6550102233886719, -3.7477493286132812, 18.54082489013672, 7.288974761962891, 5.950164794921875, 4.550498962402344, 30.240676879882812, 14.839645385742188, 5.685173034667969, -2.4173927307128906, -3.3844757080078125, 3.7266845703125, 25.545501708984375, 6.339389801025391, 6.5703125, 12.534000396728516, 22.91753387451172, 9.44989013671875, -6.929218292236328, -0.1479034423828125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000332.npy"}
|
||||
{"epoch": 0.5018896447467877, "step": 333, "batch_size": 64, "mean": 7.367546558380127, "std": 8.191169738769531, "min": -7.588924407958984, "p10": -2.210115814208984, "median": 6.472194671630859, "p90": 19.418567657470707, "max": 31.077041625976562, "pos_frac": 0.796875, "sample": [10.157005310058594, -0.1391468048095703, 9.614227294921875, 20.606201171875, 2.9328384399414062, -3.166593551635742, 11.846412658691406, 13.29840087890625, 5.812114715576172, 0.9827880859375, 12.599449157714844, 4.525962829589844, 6.1821441650390625, 11.962364196777344, 10.179374694824219, 2.5020904541015625, 12.413787841796875, 22.496109008789062, 19.821456909179688, 9.875526428222656, 3.2249908447265625, 2.5348358154296875, 2.1320877075195312, 1.7160186767578125, -1.3979225158691406, -0.23827362060546875, -3.190662384033203, 9.417900085449219, 17.395442962646484, 6.99920654296875, 7.508689880371094, 24.49062728881836, -3.465850830078125, -4.245391845703125, 10.375602722167969, 9.406730651855469, 31.077041625976562, -0.7069931030273438, 3.462017059326172, -1.6377182006835938, -2.4554290771484375, -1.5188751220703125, -7.588924407958984, 6.375644683837891, 3.9470443725585938, 2.154918670654297, 13.690536499023438, 13.092437744140625, 2.9371681213378906, 2.8177947998046875, 1.3756484985351562, 1.5008621215820312, 6.568744659423828, 24.290422439575195, 7.7569580078125, -5.053108215332031, 13.935081481933594, 0.4236297607421875, 18.478492736816406, 6.876552581787109, 20.280914306640625, 14.80514144897461, 13.739238739013672, 13.731178283691406], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000333.npy"}
|
||||
{"epoch": 0.5034013605442177, "step": 334, "batch_size": 64, "mean": 5.278097152709961, "std": 10.457725524902344, "min": -21.14618682861328, "p10": -6.759534072875977, "median": 3.88999080657959, "p90": 18.423115158081057, "max": 27.421226501464844, "pos_frac": 0.71875, "sample": [-5.315513610839844, -21.14618682861328, 3.5419921875, 12.789619445800781, 5.787506103515625, -2.064788818359375, -12.100944519042969, 20.2928466796875, 3.1539764404296875, -20.450973510742188, 3.0406417846679688, 25.105667114257812, -11.190303802490234, 0.8990516662597656, 4.199550628662109, 13.65521240234375, 11.302978515625, -0.8192977905273438, 3.4681396484375, 21.183212280273438, 17.298179626464844, 24.681564331054688, -6.7341156005859375, 8.875679016113281, 16.720317840576172, -12.293548583984375, 3.6806640625, -0.4390220642089844, 14.170570373535156, 2.4771080017089844, 0.8838310241699219, 4.09931755065918, 19.663619995117188, -5.206390380859375, 14.449283599853516, 8.001579284667969, 16.214080810546875, 2.32196044921875, 3.251220703125, 2.32513427734375, 5.411956787109375, -6.770427703857422, -4.376873016357422, -7.914955139160156, 13.979049682617188, 0.09163284301757812, 3.5882644653320312, 4.492433547973633, -2.2428131103515625, 4.242759704589844, 7.165660858154297, 8.967300415039062, -3.77606201171875, 6.643192291259766, -3.687620162963867, 17.847320556640625, -0.45987701416015625, 27.421226501464844, 8.701873779296875, 2.548419952392578, 18.5018310546875, 18.239444732666016, 13.533233642578125, 15.877838134765625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000334.npy"}
|
||||
{"epoch": 0.5049130763416477, "step": 335, "batch_size": 64, "mean": 5.131413459777832, "std": 11.541687965393066, "min": -22.13018798828125, "p10": -6.883467483520507, "median": 2.6324825286865234, "p90": 23.39036102294922, "max": 28.5625, "pos_frac": 0.65625, "sample": [11.504051208496094, 16.314056396484375, -4.072265625, 28.5625, -2.675567626953125, 1.3246002197265625, 9.73333740234375, -0.03139305114746094, 23.672027587890625, 17.71625518798828, 25.277366638183594, 20.3529052734375, 2.3109130859375, 1.76416015625, 22.733139038085938, 17.672439575195312, 5.103034973144531, 8.701988220214844, -8.02935791015625, 24.700523376464844, -5.03912353515625, -0.2303466796875, 4.28546142578125, 26.554054260253906, -5.259117126464844, 5.099853515625, 4.741310119628906, -9.163711547851562, 0.0005664825439453125, 12.762924194335938, 19.037471771240234, -0.7912139892578125, 0.7272529602050781, 2.954051971435547, 9.751544952392578, -13.191965103149414, 21.353416442871094, 0.10535430908203125, -1.75311279296875, 24.51323699951172, -2.95928955078125, -0.9729156494140625, -1.9639358520507812, 6.440956115722656, 1.210601806640625, 23.72986602783203, 0.9536819458007812, -12.771453857421875, 3.2725906372070312, -3.62799072265625, 1.9163894653320312, -5.640071868896484, 3.7938079833984375, -19.486217498779297, -22.13018798828125, -3.0910415649414062, 20.340469360351562, 10.862411499023438, 3.072641372680664, 13.852596282958984, -4.698486328125, 0.8598861694335938, -7.416351318359375, 3.7698822021484375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000335.npy"}
|
||||
{"epoch": 0.5064247921390779, "step": 336, "batch_size": 64, "mean": 6.04543399810791, "std": 9.328714370727539, "min": -20.2413330078125, "p10": -3.043387031555175, "median": 4.218849182128906, "p90": 19.510296630859383, "max": 28.709304809570312, "pos_frac": 0.75, "sample": [8.823089599609375, 12.831893920898438, 6.4093170166015625, 1.889404296875, 20.384002685546875, 17.471649169921875, 12.26047134399414, 0.0241851806640625, 24.586151123046875, 3.3200626373291016, 9.105369567871094, 2.221385955810547, 24.26264190673828, -4.9025115966796875, 8.039115905761719, -1.4597702026367188, -0.6369247436523438, 6.473594665527344, 27.481536865234375, -1.3059806823730469, 2.248249053955078, 10.310035705566406, 12.545448303222656, 4.664390563964844, -3.8271713256835938, 10.912925720214844, 12.387519836425781, -20.2413330078125, 4.036399841308594, -15.011184692382812, 23.315956115722656, -3.3622817993164062, 0.8171501159667969, 3.3033370971679688, -1.7508926391601562, 0.2374706268310547, 6.778022766113281, -3.3562393188476562, 21.926742553710938, 4.401298522949219, 12.698165893554688, 3.4097442626953125, 4.0313568115234375, 7.809635162353516, 11.497772216796875, 3.6355056762695312, 13.661087036132812, -1.9275360107421875, 0.12604904174804688, 6.201896667480469, -2.3133983612060547, -8.317657470703125, 3.1907501220703125, 13.2176513671875, 28.709304809570312, 9.536441802978516, 0.8115692138671875, 10.868865966796875, 3.3440475463867188, -1.1702957153320312, -0.7825775146484375, 14.280204772949219, 8.60858154296875, -1.83392333984375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000336.npy"}
|
||||
{"epoch": 0.5079365079365079, "step": 337, "batch_size": 64, "mean": 4.44844388961792, "std": 9.635246276855469, "min": -16.79686737060547, "p10": -7.527525901794433, "median": 3.3353843688964844, "p90": 16.2133939743042, "max": 29.439590454101562, "pos_frac": 0.671875, "sample": [6.227935791015625, -6.60333251953125, -8.739730834960938, -2.2472991943359375, 2.1690292358398438, 19.128219604492188, -6.372383117675781, 16.22130012512207, 1.253326416015625, -2.5298004150390625, 10.98175048828125, -1.6885757446289062, 9.667510986328125, -16.28295135498047, -3.99298095703125, -2.7112503051757812, 8.470996856689453, 21.181793212890625, 1.2040252685546875, 1.0511322021484375, 1.1469192504882812, -1.6419811248779297, -7.923608779907227, -9.013717651367188, 14.285564422607422, 10.83005142211914, -10.39013671875, 11.238170623779297, 1.51800537109375, 23.638931274414062, 16.1949462890625, -16.79686737060547, 12.156303405761719, -5.711952209472656, 15.411346435546875, 3.9086837768554688, 21.234634399414062, 10.124832153320312, 1.3505058288574219, 10.880781173706055, 5.2890167236328125, 3.4934310913085938, 3.687450408935547, 8.219390869140625, 1.3979644775390625, -0.797882080078125, 10.148048400878906, 3.9921417236328125, 1.9694976806640625, 13.099987030029297, 3.177337646484375, 29.439590454101562, -1.3403663635253906, 1.8580322265625, -0.3396949768066406, 12.712966918945312, -1.8191947937011719, -4.255729675292969, 11.663265228271484, 14.866395950317383, -10.7491455078125, 5.681304931640625, 20.29183578491211, 4.18463134765625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000337.npy"}
|
||||
{"epoch": 0.509448223733938, "step": 338, "batch_size": 64, "mean": 4.068912506103516, "std": 7.978236675262451, "min": -12.321952819824219, "p10": -4.7715349197387695, "median": 2.2481918334960938, "p90": 14.329296875000002, "max": 28.146209716796875, "pos_frac": 0.6875, "sample": [-5.36749267578125, 2.3607025146484375, 16.343170166015625, 1.3267536163330078, 11.315563201904297, 8.716053009033203, -8.419761657714844, 11.5985107421875, 2.3133087158203125, 19.586559295654297, 17.893638610839844, 1.6449432373046875, -1.3558578491210938, -0.4497528076171875, 2.4702110290527344, -4.848808288574219, -5.80926513671875, -4.591230392456055, -4.338813781738281, 0.19312286376953125, 3.075897216796875, 14.613883972167969, 1.478607177734375, -1.9689712524414062, 12.606498718261719, -2.9245872497558594, -12.321952819824219, 6.161838531494141, 8.009857177734375, 8.065624237060547, 2.0860519409179688, 5.698944091796875, 1.9769821166992188, 0.1365814208984375, 2.183074951171875, 9.011947631835938, 6.898263931274414, 0.2613372802734375, 13.270187377929688, -0.3123626708984375, -2.0178985595703125, 13.367591857910156, -3.059093475341797, 10.442581176757812, 3.3055496215820312, 3.4608726501464844, -1.354095458984375, -1.5463523864746094, 5.450164794921875, -10.828598022460938, -5.68951416015625, 18.042171478271484, 9.250686645507812, -1.8916664123535156, 6.249015808105469, 10.312105178833008, 0.6378173828125, 21.74622344970703, 13.665260314941406, -2.0082321166992188, 1.8895263671875, 1.5374717712402344, 28.146209716796875, 2.713348388671875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000338.npy"}
|
||||
{"epoch": 0.5109599395313681, "step": 339, "batch_size": 64, "mean": 6.471385955810547, "std": 8.43995475769043, "min": -11.066574096679688, "p10": -4.235537910461426, "median": 6.354138374328613, "p90": 16.14538154602051, "max": 27.64630126953125, "pos_frac": 0.765625, "sample": [1.6465911865234375, 5.977996826171875, -1.8326148986816406, -1.2897109985351562, -5.838226318359375, 14.208444595336914, 8.387557983398438, 6.459299087524414, 1.7088432312011719, 3.2080841064453125, 11.914192199707031, 11.263381958007812, 3.9648971557617188, 0.20331954956054688, 8.530872344970703, 14.51846694946289, -3.242891311645508, 16.756500244140625, 19.603012084960938, 1.1740245819091797, 13.287456512451172, 7.138450622558594, 12.668758392333984, 11.646392822265625, 0.21538925170898438, 14.082700729370117, 2.798797607421875, 6.558246612548828, -11.066574096679688, 9.181806564331055, 14.774116516113281, 13.861091613769531, 14.716529846191406, 23.901702880859375, 17.268280029296875, 6.59783935546875, 16.139541625976562, -6.394645690917969, 6.2489776611328125, 4.1027984619140625, 15.043952941894531, 5.1268768310546875, -4.270689010620117, -1.2693920135498047, 3.37811279296875, -7.2989501953125, 6.231689453125, -2.6506195068359375, -11.028800964355469, -7.89190673828125, -0.14832305908203125, 4.6523895263671875, 4.152904510498047, 11.623613357543945, -1.6945343017578125, 27.64630126953125, -4.1535186767578125, 7.263545989990234, 6.886131286621094, 13.902053833007812, 16.147884368896484, 16.047958374023438, 2.1709442138671875, 19.251373291015625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000339.npy"}
|
||||
{"epoch": 0.5124716553287982, "step": 340, "batch_size": 64, "mean": 5.623458385467529, "std": 10.45651626586914, "min": -18.139698028564453, "p10": -6.32955493927002, "median": 3.899007797241211, "p90": 21.201713943481447, "max": 27.813339233398438, "pos_frac": 0.671875, "sample": [4.251716613769531, 22.153297424316406, 1.6389293670654297, 0.1882476806640625, 1.4538192749023438, 21.449138641357422, -4.914695739746094, -1.8693161010742188, -1.0952625274658203, 7.0240325927734375, 1.0289688110351562, 12.581302642822266, 6.41180419921875, 10.652801513671875, 27.813339233398438, 18.327411651611328, 3.9291152954101562, -15.15240478515625, 8.4510498046875, 5.85943603515625, -2.2387638092041016, -0.8177585601806641, -2.4330902099609375, 5.881303787231445, -0.49657440185546875, 19.73523712158203, -0.5211677551269531, 10.855094909667969, 13.855972290039062, 17.45319366455078, -6.354583740234375, 18.121688842773438, 0.5889816284179688, 18.40863037109375, 22.758018493652344, 7.715854644775391, 18.018287658691406, 0.8584613800048828, 4.909515380859375, 22.26374626159668, -0.3318939208984375, -6.271154403686523, -9.437854766845703, 21.888896942138672, 12.526351928710938, -4.77238655090332, -7.98089599609375, -6.8152313232421875, 8.95589828491211, 16.188629150390625, 1.2276458740234375, 2.952861785888672, -1.9013442993164062, -4.946964263916016, -18.139698028564453, -13.2947998046875, 22.63498306274414, 2.7013893127441406, -3.9789466857910156, 20.6243896484375, 12.197517395019531, 3.8689002990722656, 9.614734649658203, 3.6455230712890625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000340.npy"}
|
||||
{"epoch": 0.5139833711262283, "step": 341, "batch_size": 64, "mean": 6.104777812957764, "std": 9.758421897888184, "min": -17.553302764892578, "p10": -5.3570835113525375, "median": 4.922920227050781, "p90": 19.49532699584961, "max": 25.497421264648438, "pos_frac": 0.75, "sample": [-10.529006958007812, 19.44194793701172, 13.732139587402344, 11.092012405395508, 23.095199584960938, 16.946636199951172, 13.297649383544922, 21.570053100585938, 14.121410369873047, -0.408111572265625, 18.868507385253906, 2.832956314086914, -8.353424072265625, -2.284757614135742, 3.9238758087158203, 1.5172901153564453, 17.927764892578125, 1.622711181640625, 11.416572570800781, 14.928665161132812, 22.519195556640625, 25.497421264648438, 19.518203735351562, 6.498302459716797, -3.535308837890625, 20.569229125976562, 5.0126953125, -8.737350463867188, -1.7683525085449219, -2.6044692993164062, 1.6449012756347656, 12.565513610839844, -2.4694595336914062, -1.8329887390136719, 4.198432922363281, 7.501888275146484, 6.973915100097656, 7.7501220703125, 0.4749908447265625, -17.553302764892578, -6.137844085693359, -2.5877838134765625, 0.43788909912109375, 13.672821044921875, 2.987701416015625, 22.427352905273438, 5.6478729248046875, 4.8331451416015625, 10.290130615234375, 11.431045532226562, 6.82867431640625, 1.89910888671875, 19.00973892211914, 0.5445556640625, 3.1688270568847656, 16.405746459960938, 0.8858108520507812, 5.980762481689453, 1.011117935180664, -6.322395324707031, -1.98443603515625, 8.017066955566406, 0.19080734252929688, -14.915603637695312], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000341.npy"}
|
||||
{"epoch": 0.5154950869236583, "step": 342, "batch_size": 64, "mean": 6.106938362121582, "std": 9.278534889221191, "min": -11.174537658691406, "p10": -4.876898956298827, "median": 4.549978256225586, "p90": 18.87736587524414, "max": 30.57680320739746, "pos_frac": 0.75, "sample": [3.7308502197265625, 0.1245880126953125, 0.08361053466796875, -5.5767669677734375, 2.5418004989624023, 21.539958953857422, -8.573617935180664, 18.955787658691406, -4.255058288574219, 11.550186157226562, -2.988788604736328, 8.845901489257812, 2.9012622833251953, 12.363517761230469, 7.2357940673828125, -5.143402099609375, 0.48417091369628906, -10.696651458740234, -1.5832901000976562, -1.2191238403320312, 5.272911071777344, 10.23931884765625, 11.00787353515625, 17.02509307861328, -2.619945526123047, 5.985374450683594, 3.5401611328125, 11.012176513671875, 16.149879455566406, 9.057502746582031, -1.9370269775390625, 0.35315704345703125, 4.797393798828125, 8.050155639648438, 5.421653747558594, -5.443605422973633, 15.642932891845703, 0.98358154296875, 3.7720184326171875, 16.484844207763672, 3.1360015869140625, 21.147274017333984, 18.694381713867188, 4.408985137939453, -8.59295654296875, 22.345458984375, 7.592247009277344, -1.2738780975341797, 24.211868286132812, 21.245468139648438, 4.690971374511719, -11.174537658691406, 1.517181396484375, 30.57680320739746, -3.458759307861328, 17.758380889892578, 11.716358184814453, 0.3607177734375, 0.362884521484375, 1.791259765625, 16.183208465576172, 10.456180572509766, -1.5272979736328125, 13.559661865234375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000342.npy"}
|
||||
{"epoch": 0.5170068027210885, "step": 343, "batch_size": 64, "mean": 3.356048583984375, "std": 8.609989166259766, "min": -19.963703155517578, "p10": -6.509807205200195, "median": 3.4615821838378906, "p90": 15.847460937500001, "max": 20.800064086914062, "pos_frac": 0.65625, "sample": [0.6571731567382812, 7.394718170166016, -1.539825439453125, -11.0186767578125, 3.042743682861328, 12.634360313415527, 3.4067153930664062, -2.119617462158203, -4.148895263671875, -5.821014404296875, -19.963703155517578, -5.7017822265625, 7.67926025390625, 9.762832641601562, 18.37359619140625, -1.6634254455566406, 15.886077880859375, 10.034835815429688, 10.80759048461914, 9.51953125, 6.018035888671875, 4.184459686279297, 16.270347595214844, 5.573631286621094, 3.6881637573242188, 12.527519226074219, 4.928371429443359, -4.368415832519531, 2.4392242431640625, 3.5081558227539062, -0.2877960205078125, 4.6363983154296875, 12.297706604003906, 4.6493988037109375, 2.3149337768554688, -0.7581825256347656, 8.893547058105469, 16.03546905517578, -15.277435302734375, 18.916732788085938, -9.8707275390625, -2.7614669799804688, 2.489715576171875, 17.45538330078125, 6.743522644042969, -1.5691986083984375, 11.603363037109375, 2.2020111083984375, -3.8997955322265625, 12.30169677734375, 20.800064086914062, 4.128753662109375, -4.0029296875, 3.664215087890625, 1.2088241577148438, 15.757354736328125, -6.805004119873047, 2.3193435668945312, 4.302879333496094, -1.5590667724609375, -12.397735595703125, -10.046226501464844, -4.1056365966796875, 3.415008544921875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000343.npy"}
|
||||
{"epoch": 0.5185185185185185, "step": 344, "batch_size": 64, "mean": 6.323397636413574, "std": 8.745719909667969, "min": -14.179302215576172, "p10": -2.5446166992187496, "median": 5.734648704528809, "p90": 18.449404907226562, "max": 27.87957763671875, "pos_frac": 0.703125, "sample": [-5.370872497558594, 13.997367858886719, 7.240325927734375, 7.589238166809082, 18.536636352539062, -1.7197895050048828, 13.463554382324219, 6.244281768798828, 3.3810882568359375, -0.8780517578125, -1.615142822265625, 1.165740966796875, 11.96834945678711, 11.720069885253906, -1.575775146484375, 20.524250030517578, -1.3075408935546875, 27.87957763671875, -1.117218017578125, 10.108016967773438, -8.283712387084961, 3.8721771240234375, -1.8695068359375, 19.356918334960938, 7.859783172607422, -1.382598876953125, -7.343902587890625, 0.009845733642578125, 18.245864868164062, 8.062248229980469, -4.701934814453125, 23.56493377685547, 1.9738922119140625, 5.161167144775391, 12.017951965332031, 19.92289924621582, 11.731315612792969, 3.418262481689453, 2.052471160888672, -8.621463775634766, -0.15828704833984375, 3.755596160888672, -2.8031158447265625, 0.9729232788085938, -0.3638153076171875, 1.4715156555175781, 10.582778930664062, 8.920719146728516, 15.610092163085938, -0.7199935913085938, 16.03473663330078, 6.479209899902344, 20.917478561401367, 12.727334976196289, 10.1849365234375, 2.3401222229003906, 13.001861572265625, 10.71595573425293, -1.9414520263671875, 17.5401611328125, -14.179302215576172, 15.789527893066406, 5.225015640258789, 7.3126983642578125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000344.npy"}
|
||||
{"epoch": 0.5200302343159486, "step": 345, "batch_size": 64, "mean": 5.438445568084717, "std": 9.492217063903809, "min": -18.304058074951172, "p10": -5.022170639038086, "median": 4.767825126647949, "p90": 20.03678779602051, "max": 26.973922729492188, "pos_frac": 0.75, "sample": [2.0222549438476562, 1.8206233978271484, 20.202682495117188, -14.594093322753906, -0.6284332275390625, -1.4368057250976562, 0.3637123107910156, -3.9093475341796875, -4.335868835449219, 4.744529724121094, 6.0222015380859375, 21.39374542236328, 9.240312576293945, 1.5393848419189453, -5.107013702392578, 4.8130340576171875, -10.697410583496094, 2.0997352600097656, 6.739826202392578, 14.118743896484375, 26.973922729492188, 4.791120529174805, 14.451446533203125, 8.04888916015625, 8.121150970458984, 0.4831390380859375, 7.708709716796875, 10.106369018554688, -4.8242034912109375, -0.9186553955078125, 8.617691040039062, 21.65996551513672, 6.439567565917969, 14.077213287353516, 8.395187377929688, -1.1124000549316406, 16.813507080078125, 22.129974365234375, 19.649700164794922, -10.107158660888672, 13.360443115234375, -1.0697250366210938, -18.304058074951172, 16.749359130859375, 6.755119323730469, -5.721340179443359, 2.3010177612304688, 0.27728271484375, 7.695976257324219, 4.598976135253906, 22.420196533203125, 7.255119323730469, -8.487865447998047, 4.646209716796875, 0.20361328125, 12.412559509277344, 0.017378807067871094, -3.584278106689453, 1.5266876220703125, 0.35369110107421875, 12.698137283325195, 1.0024261474609375, 13.522933959960938, 21.513710021972656], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000345.npy"}
|
||||
{"epoch": 0.5215419501133787, "step": 346, "batch_size": 64, "mean": 6.291138648986816, "std": 10.738842964172363, "min": -20.26844024658203, "p10": -5.06979808807373, "median": 5.941267967224121, "p90": 20.390351867675783, "max": 28.775390625, "pos_frac": 0.65625, "sample": [15.755760192871094, 20.42416000366211, 16.606658935546875, 15.546104431152344, 15.253471374511719, 4.2467041015625, 2.4975051879882812, 4.81337833404541, 18.689058303833008, 19.508224487304688, 28.775390625, -1.1118698120117188, 4.671142578125, 13.528963088989258, -4.876977920532227, 22.3983154296875, -5.579193115234375, 8.993263244628906, -2.0818214416503906, 4.985490798950195, 6.963592529296875, 13.9859619140625, -1.9776458740234375, -2.2794322967529297, 13.834205627441406, 28.167068481445312, -20.26844024658203, 6.464719772338867, 2.6487579345703125, -12.616615295410156, 3.082275390625, -3.1315460205078125, 6.879150390625, -3.658036231994629, 4.5097808837890625, 5.417816162109375, 8.509296417236328, -1.0810012817382812, 10.768531799316406, 25.63544464111328, 22.661178588867188, 21.680932998657227, -0.44623565673828125, 12.65386962890625, 4.431755065917969, -8.311271667480469, 13.218940734863281, 7.898490905761719, -4.512378692626953, -5.152435302734375, 12.632587432861328, -1.8574295043945312, -0.03757286071777344, -2.915241241455078, -15.867786407470703, 6.9223175048828125, 7.4824371337890625, 14.213142395019531, -15.251609802246094, 20.311466217041016, -2.5721664428710938, -3.0946197509765625, 11.671714782714844, 11.975196838378906], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000346.npy"}
|
||||
{"epoch": 0.5230536659108088, "step": 347, "batch_size": 64, "mean": 4.9868879318237305, "std": 9.99516487121582, "min": -18.951114654541016, "p10": -5.884169769287109, "median": 2.476490020751953, "p90": 17.7009635925293, "max": 34.688812255859375, "pos_frac": 0.671875, "sample": [-3.5607166290283203, 10.293403625488281, 14.918144226074219, -1.2779541015625, -18.951114654541016, 3.6265506744384766, 7.4248046875, 4.824241638183594, 6.503459930419922, 13.633331298828125, 0.01601409912109375, 17.29229736328125, -1.0329437255859375, -4.430206298828125, -8.203022003173828, 34.688812255859375, 8.254959106445312, 10.246988296508789, -9.565139770507812, 5.085197448730469, 0.8155593872070312, 0.13472366333007812, 28.241065979003906, -1.5565185546875, -11.882587432861328, -6.261302947998047, 4.374885559082031, 23.771591186523438, 21.545455932617188, -2.0230026245117188, 1.2876815795898438, -3.1446380615234375, -4.863513946533203, 10.027877807617188, -2.3187923431396484, -0.00470733642578125, -9.796531677246094, 7.496463775634766, 0.40924072265625, 14.40814208984375, 10.44499397277832, 1.15618896484375, 1.1354331970214844, 11.49847412109375, 6.928190231323242, 19.82281494140625, 0.018899917602539062, -7.5093994140625, 16.848731994628906, 15.7789306640625, 10.328262329101562, 7.189918518066406, 5.166351318359375, -0.419830322265625, -5.004192352294922, 17.87610626220703, 1.3264293670654297, 13.356193542480469, -0.33957672119140625, 0.596435546875, 0.8463668823242188, -0.3015403747558594, 21.07843780517578, 10.890026092529297], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000347.npy"}
|
||||
{"epoch": 0.5245653817082389, "step": 348, "batch_size": 64, "mean": 6.701659202575684, "std": 10.406414985656738, "min": -8.795120239257812, "p10": -3.7864452362060543, "median": 3.4808425903320312, "p90": 21.68138771057129, "max": 34.12488555908203, "pos_frac": 0.65625, "sample": [18.010818481445312, 19.413124084472656, -0.19256591796875, 2.2392578125, 12.909034729003906, 16.507225036621094, -7.708160400390625, 2.252094268798828, -2.7093658447265625, -1.5182456970214844, 34.12488555908203, -4.41143798828125, 9.261566162109375, 14.170516967773438, 21.197715759277344, 21.888675689697266, 5.065986633300781, 29.334808349609375, 0.14748382568359375, 27.50204849243164, 2.647411346435547, 9.445854187011719, 16.984237670898438, 2.0164031982421875, 8.774421691894531, 21.13372039794922, -1.1667938232421875, 16.161392211914062, -0.3867988586425781, -1.3630828857421875, 5.489051818847656, 0.7972183227539062, 0.576812744140625, 10.254745483398438, -2.5055923461914062, -4.066829681396484, 10.620119094848633, -2.7304439544677734, 0.9270553588867188, 11.848930358886719, 8.125480651855469, 30.849979400634766, 7.405731201171875, -2.8398447036743164, 12.17563247680664, -3.259906768798828, -6.151752471923828, -3.864044189453125, -6.950874328613281, 7.919891357421875, -3.6053810119628906, 25.242752075195312, -1.0711956024169922, 4.723642349243164, 23.158031463623047, -0.962371826171875, 12.460548400878906, 0.76690673828125, -0.9755477905273438, 4.314273834228516, -8.795120239257812, 8.495346069335938, 1.2174835205078125, -2.416778564453125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000348.npy"}
|
||||
{"epoch": 0.5260770975056689, "step": 349, "batch_size": 64, "mean": 6.5216474533081055, "std": 9.871705055236816, "min": -21.512027740478516, "p10": -4.646655464172362, "median": 4.400453567504883, "p90": 20.69307022094727, "max": 29.915939331054688, "pos_frac": 0.765625, "sample": [-2.808563232421875, 4.41461181640625, 18.2958984375, 2.7700042724609375, 0.2562713623046875, 5.782627105712891, 9.457618713378906, 11.272441864013672, 4.176067352294922, 8.326648712158203, -3.6781673431396484, 3.924266815185547, 19.391075134277344, 3.548046112060547, 7.13800048828125, 21.57447052001953, 17.004093170166016, -7.723655700683594, -21.512027740478516, 16.320632934570312, 0.5593719482421875, 12.3826904296875, 9.43332290649414, 4.386295318603516, 9.070991516113281, 3.7066879272460938, -0.5416717529296875, 16.323339462280273, -2.9157638549804688, 13.911163330078125, 12.760337829589844, 2.407367706298828, -9.228042602539062, 9.580543518066406, -0.852020263671875, 11.441349029541016, 6.557041168212891, 26.3470458984375, -9.411468505859375, 14.21986198425293, 21.9381103515625, 0.9405288696289062, -0.9513130187988281, 2.6986236572265625, 1.6054115295410156, 22.985671997070312, 21.251068115234375, 0.7988395690917969, -5.0617218017578125, -2.7034053802490234, 11.221954345703125, -0.8793792724609375, -7.514839172363281, 3.3506546020507812, 5.759189605712891, 22.480331420898438, 0.9739990234375, 0.8958587646484375, -8.132038116455078, 17.199188232421875, 2.8633785247802734, 16.945423126220703, 10.735153198242188, 29.915939331054688], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000349.npy"}
{"epoch": 0.527588813303099, "step": 350, "batch_size": 64, "mean": 7.921087265014648, "std": 9.60888385772705, "min": -7.196647644042969, "p10": -3.438066864013672, "median": 6.480337142944336, "p90": 22.217070770263675, "max": 28.488739013671875, "pos_frac": 0.734375, "sample": [22.667938232421875, -1.8963127136230469, 8.072624206542969, -4.061786651611328, 10.63787841796875, 6.244056701660156, 26.259963989257812, 13.25213623046875, 14.585552215576172, 13.265464782714844, -5.450042724609375, -1.3526611328125, 1.286773681640625, 8.749469757080078, 8.857389450073242, 6.716617584228516, 17.36487579345703, -0.111907958984375, -0.8859367370605469, -5.5323486328125, 4.909210205078125, 16.63720703125, -4.415596008300781, 9.650741577148438, 9.677312850952148, -2.7848663330078125, -3.1563568115234375, 13.077880859375, 3.934427261352539, -7.196647644042969, 16.423362731933594, -3.5587997436523438, 15.510162353515625, -2.289947509765625, 15.685028076171875, 22.949020385742188, 13.043411254882812, 21.16504669189453, 28.488739013671875, 27.41901397705078, -2.3992271423339844, 25.05371856689453, 20.83395004272461, 1.2647857666015625, 0.3164253234863281, 1.323812484741211, 8.507713317871094, -0.326080322265625, 2.4434814453125, 11.227935791015625, 4.762237548828125, 15.375808715820312, 0.10718154907226562, 3.8438987731933594, 4.662254333496094, -4.5007476806640625, 5.9202880859375, 0.43013763427734375, -2.157390594482422, 2.62725830078125, 25.67596435546875, 11.647903442382812, 16.038673400878906, 20.431503295898438], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000350.npy"}
{"epoch": 0.5291005291005291, "step": 351, "batch_size": 64, "mean": 5.767122268676758, "std": 10.116264343261719, "min": -27.931751251220703, "p10": -4.887125396728516, "median": 5.123928070068359, "p90": 18.78256759643555, "max": 22.110816955566406, "pos_frac": 0.78125, "sample": [2.1662750244140625, 17.886085510253906, 19.145523071289062, 10.383941650390625, 1.5675849914550781, 19.75506591796875, -4.8033905029296875, 5.946258544921875, 14.261444091796875, 10.572616577148438, 2.6693572998046875, 22.110816955566406, 0.4193916320800781, -3.57177734375, 1.601715087890625, -9.15838623046875, 2.827850341796875, 3.3567237854003906, 16.808799743652344, 3.46466064453125, 5.2764892578125, -4.923011779785156, 14.300697326660156, 4.971366882324219, 18.4132080078125, 19.1912841796875, 15.232624053955078, 8.991661071777344, -17.240352630615234, 3.134521484375, 12.307315826416016, -8.135993957519531, 13.13519287109375, 8.615821838378906, 12.813941955566406, -5.0508575439453125, 0.47922515869140625, 18.321563720703125, -3.3822784423828125, 2.3453521728515625, 8.90936279296875, 20.203994750976562, 2.9043655395507812, -2.9112396240234375, 13.7705078125, 2.9003219604492188, 11.831573486328125, 1.232330322265625, 1.3638153076171875, 7.209983825683594, -0.7967643737792969, -2.2618026733398438, 9.591194152832031, 15.598997116088867, 18.94086456298828, 3.44940185546875, 7.164491653442383, 19.165817260742188, -27.931751251220703, 2.815582275390625, -20.935646057128906, 5.559272766113281, 17.263492584228516, -2.150665283203125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000351.npy"}
{"epoch": 0.5306122448979592, "step": 352, "batch_size": 64, "mean": 5.478187084197998, "std": 9.443631172180176, "min": -10.516944885253906, "p10": -5.334910583496094, "median": 3.324596405029297, "p90": 20.166431427001957, "max": 29.8468017578125, "pos_frac": 0.71875, "sample": [16.286849975585938, 27.841766357421875, 26.182411193847656, 7.22662353515625, 4.347282409667969, -7.8228302001953125, 0.7411766052246094, 10.793754577636719, -0.50927734375, 10.767671585083008, 2.1894378662109375, 7.459877014160156, -6.527740478515625, -8.681074142456055, 3.2862396240234375, 4.3543548583984375, 1.736297607421875, -5.264778137207031, -4.165565490722656, 8.596176147460938, 9.769699096679688, 5.0804901123046875, -4.1228179931640625, 20.92821502685547, 20.553813934326172, 7.492950439453125, 4.394107818603516, -7.962871551513672, 1.4381866455078125, 2.296510696411133, 2.2328128814697266, 8.351850509643555, 1.8847122192382812, 6.689022064208984, -4.346900939941406, 2.783153533935547, -0.5617599487304688, 22.18463134765625, 18.109130859375, 10.952957153320312, 12.765960693359375, -10.516944885253906, 29.8468017578125, -0.7371177673339844, 1.6969795227050781, 19.26253890991211, -5.364967346191406, 3.3629531860351562, 0.3119068145751953, 11.18414306640625, -10.010231018066406, 7.8150634765625, 9.455093383789062, 16.903812408447266, 6.886940002441406, 2.557952880859375, -0.8310775756835938, 21.505966186523438, -3.1707229614257812, -3.8715667724609375, 1.1391429901123047, 13.362224578857422, 2.1601829528808594, -2.0976028442382812], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000352.npy"}
{"epoch": 0.5321239606953893, "step": 353, "batch_size": 64, "mean": 5.897891998291016, "std": 10.30025863647461, "min": -32.52635955810547, "p10": -5.34954719543457, "median": 5.373857498168945, "p90": 20.084453582763672, "max": 25.366043090820312, "pos_frac": 0.75, "sample": [10.862358093261719, 25.366043090820312, -8.738536834716797, 7.314659118652344, -10.133411407470703, 5.941566467285156, 2.445556640625, -11.98050308227539, 8.177581787109375, 18.504295349121094, 1.0455551147460938, 3.2470321655273438, 8.741863250732422, -5.48284912109375, 24.49610137939453, 5.223384857177734, 6.095123291015625, 17.466659545898438, 22.560577392578125, 18.970794677734375, 19.85595703125, 10.887603759765625, 1.368124008178711, 10.317916870117188, -1.3783950805664062, 0.370819091796875, 17.592029571533203, 8.691398620605469, 20.18238067626953, -0.3977928161621094, 2.6698455810546875, 0.07482147216796875, 6.5141448974609375, -1.6113662719726562, -32.52635955810547, 16.426807403564453, -7.434988021850586, -6.051973342895508, 21.724048614501953, 16.092641830444336, 5.089363098144531, 10.493072509765625, 24.094566345214844, 5.524330139160156, -4.518310546875, 4.805423736572266, 7.10076904296875, -0.4745597839355469, -0.24029159545898438, 2.1704025268554688, 2.2320556640625, 7.095020294189453, 2.4207763671875, 7.008697509765625, 0.4683971405029297, 6.274009704589844, 7.161430358886719, 1.1512603759765625, -2.0262069702148438, -5.038509368896484, 17.01873779296875, -0.340362548828125, 21.446128845214844, 5.057365417480469], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000353.npy"}
{"epoch": 0.5336356764928194, "step": 354, "batch_size": 64, "mean": 6.602656364440918, "std": 9.974865913391113, "min": -15.590316772460938, "p10": -5.618726348876953, "median": 7.434737205505371, "p90": 18.72680416107178, "max": 29.76666259765625, "pos_frac": 0.765625, "sample": [29.76666259765625, 14.168647766113281, -0.16785812377929688, 3.7343616485595703, -5.611564636230469, 0.029256820678710938, 7.828857421875, 12.556835174560547, 16.836776733398438, 0.47904205322265625, 3.4102439880371094, 16.15616798400879, 18.870553970336914, 3.3420677185058594, 6.088401794433594, 1.4326229095458984, 9.574172973632812, -5.621795654296875, 18.391387939453125, 20.28217315673828, 20.6679630279541, 9.566764831542969, 2.1655426025390625, 11.474807739257812, -15.590316772460938, -5.1139068603515625, -14.4154052734375, 18.074737548828125, 10.694469451904297, 9.649917602539062, 9.245166778564453, 17.2686767578125, 7.059345245361328, 15.72637939453125, 13.1712646484375, 10.73910140991211, -4.0560760498046875, -14.156503677368164, 14.302591323852539, -3.8491897583007812, 12.532249450683594, 17.753372192382812, -6.57208251953125, -8.708450317382812, 23.283660888671875, -13.712146759033203, 8.231369018554688, 20.812606811523438, 5.7082672119140625, -3.3806076049804688, 3.1881103515625, 7.810129165649414, 20.528377532958984, 10.649803161621094, 12.547683715820312, 1.3059959411621094, -1.06903076171875, 4.10743522644043, 0.5277023315429688, 1.968902587890625, 2.121335983276367, 14.943458557128906, -0.34758758544921875, 4.1671142578125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000354.npy"}
{"epoch": 0.5351473922902494, "step": 355, "batch_size": 64, "mean": 6.801861763000488, "std": 10.064830780029297, "min": -14.531497955322266, "p10": -3.3656505584716796, "median": 5.459888458251953, "p90": 22.03153247833252, "max": 28.510990142822266, "pos_frac": 0.78125, "sample": [4.790531158447266, 18.563720703125, -3.2689247131347656, 13.003883361816406, 10.5496826171875, 20.040958404541016, 5.079864501953125, -4.904682159423828, -0.846038818359375, 6.1112518310546875, 6.613471984863281, 1.4132614135742188, 7.11161994934082, 0.576568603515625, 4.231956481933594, 6.245849609375, -1.0229377746582031, 7.970439910888672, -12.26910400390625, 2.8941268920898438, -1.8346405029296875, 6.390079498291016, 13.952003479003906, 11.870040893554688, 18.57294464111328, 1.9850502014160156, 1.2680282592773438, 10.960050582885742, 21.882823944091797, 22.483779907226562, -7.1712799072265625, 14.890117645263672, 2.479276657104492, 27.01739501953125, 1.5777511596679688, -12.362030029296875, 7.2993927001953125, -1.0080757141113281, -1.4215774536132812, 7.463310241699219, 0.24709320068359375, -2.56842041015625, 27.95136260986328, 3.4186859130859375, 4.132789611816406, 22.047103881835938, 2.939361572265625, 0.68853759765625, 24.420997619628906, -14.531497955322266, 21.99519920349121, 9.963424682617188, 17.034103393554688, 5.224739074707031, 28.510990142822266, -9.772956848144531, 1.3177604675292969, 1.4202728271484375, 24.881187438964844, -3.4071044921875, 10.57046127319336, 5.700906753540039, 8.259170532226562, 5.695037841796875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000355.npy"}
{"epoch": 0.5366591080876795, "step": 356, "batch_size": 64, "mean": 6.024989128112793, "std": 11.49816608428955, "min": -23.303314208984375, "p10": -9.143605804443355, "median": 6.656103134155273, "p90": 19.97148952484131, "max": 31.486480712890625, "pos_frac": 0.6875, "sample": [13.189964294433594, 14.485145568847656, -0.5635299682617188, -10.710029602050781, 19.768068313598633, 12.692848205566406, -1.2059707641601562, 17.26397705078125, 9.858352661132812, 0.6206779479980469, -15.972785949707031, -23.303314208984375, 9.719863891601562, -4.9272613525390625, 21.455020904541016, 26.42322540283203, 6.419807434082031, 4.0797271728515625, 15.505134582519531, -1.398641586303711, 17.59869384765625, -22.094594955444336, 8.187355041503906, 11.374710083007812, 6.772972106933594, 20.058670043945312, 0.03823089599609375, -11.673202514648438, 2.9335060119628906, 6.539234161376953, 7.133613586425781, 8.165203094482422, -0.18147659301757812, 0.3521728515625, -11.9530029296875, 12.193923950195312, 9.973114013671875, 0.7134666442871094, 15.153900146484375, 16.399429321289062, -0.2722187042236328, 6.1611175537109375, 9.113807678222656, -5.1837310791015625, -0.24408721923828125, 1.0171241760253906, 17.69232940673828, 4.8487091064453125, -2.456684112548828, 8.931747436523438, -2.0264854431152344, 3.1125869750976562, 27.036745071411133, 23.025619506835938, -3.8698806762695312, 18.442638397216797, 8.588808059692383, -0.09954833984375, 22.278594970703125, -5.488616943359375, -13.461753845214844, 31.486480712890625, 10.303163528442383, 15.57666015625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000356.npy"}
{"epoch": 0.5381708238851096, "step": 357, "batch_size": 64, "mean": 5.682187557220459, "std": 10.242852210998535, "min": -18.119834899902344, "p10": -7.724613571166992, "median": 5.427333831787109, "p90": 18.829932403564456, "max": 30.21613311767578, "pos_frac": 0.71875, "sample": [18.063919067382812, -1.5787334442138672, 24.744827270507812, 7.327949523925781, -8.042911529541016, 30.21613311767578, 12.027938842773438, -0.008121490478515625, 9.300666809082031, -1.989105224609375, -10.974395751953125, 11.421241760253906, -6.9819183349609375, 4.9499969482421875, 27.1253662109375, 12.71685791015625, 18.514801025390625, 8.0958251953125, 10.490665435791016, 5.865631103515625, 7.076362609863281, 4.348320007324219, 5.346832275390625, 4.608903884887695, 8.198326110839844, 8.312610626220703, -0.7948284149169922, -4.980425834655762, -15.650390625, 2.157093048095703, 0.48650360107421875, 9.607410430908203, -8.808540344238281, 6.777809143066406, 13.823741912841797, 22.96593475341797, 12.54144287109375, 1.618316650390625, 9.1600341796875, 26.571075439453125, -1.1034927368164062, 18.964988708496094, -3.2223472595214844, 3.278289794921875, 6.1895599365234375, 19.412033081054688, -0.29450416564941406, 0.17181396484375, -3.9831809997558594, 0.35617828369140625, 2.1125144958496094, 14.754348754882812, 16.924137115478516, 0.802276611328125, 14.968048095703125, -0.14129638671875, 1.68572998046875, 6.521026611328125, 14.402446746826172, -12.16152572631836, -8.820281982421875, -18.119834899902344, 5.507835388183594, 0.8020906448364258], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000357.npy"}
{"epoch": 0.5396825396825397, "step": 358, "batch_size": 64, "mean": 7.569679260253906, "std": 9.748416900634766, "min": -14.113201141357422, "p10": -3.712426376342773, "median": 5.768688201904297, "p90": 18.873868942260742, "max": 30.00335693359375, "pos_frac": 0.734375, "sample": [4.704151153564453, 13.723472595214844, 15.328697204589844, 15.176284790039062, 0.18895530700683594, 18.392578125, 13.412105560302734, -3.78729248046875, -0.22736740112304688, -2.310840606689453, -0.3522834777832031, 15.251277923583984, 9.007904052734375, 10.472442626953125, 18.980945587158203, -0.22943878173828125, -2.5404586791992188, 4.5078887939453125, 6.444305419921875, 15.015174865722656, 8.56207275390625, 4.230382919311523, 3.6783523559570312, 18.6240234375, 16.421737670898438, 1.1417236328125, 5.70904541015625, -2.2351303100585938, -5.5399322509765625, 1.7536163330078125, -5.497749328613281, 5.828330993652344, -7.74005126953125, 4.593658447265625, -1.002471923828125, 3.01141357421875, 18.51136016845703, 23.312164306640625, 4.678108215332031, -2.112071990966797, 23.656661987304688, 13.623321533203125, 18.010528564453125, 4.350643157958984, 30.00335693359375, 27.631729125976562, 8.06136703491211, -7.135334014892578, 4.992469787597656, 17.6827392578125, -2.3443069458007812, -9.615636825561523, -14.113201141357422, 23.412534713745117, 1.8850555419921875, 15.370067596435547, -3.537738800048828, 10.006027221679688, 10.154808044433594, 7.27288818359375, 11.905050277709961, 16.32415771484375, 5.263236999511719, 24.51197052001953], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000358.npy"}
{"epoch": 0.5411942554799698, "step": 359, "batch_size": 64, "mean": 4.224142551422119, "std": 10.073375701904297, "min": -22.02288818359375, "p10": -5.850008392333984, "median": 3.601163864135742, "p90": 17.44915199279786, "max": 26.207168579101562, "pos_frac": 0.640625, "sample": [4.0918731689453125, 21.911529541015625, 0.3560371398925781, 9.592117309570312, 11.3211669921875, 1.6299972534179688, -6.280052185058594, 5.97576904296875, -6.132423400878906, -22.02288818359375, 24.62865447998047, -0.5849685668945312, 5.4853515625, 15.382530212402344, -2.2451095581054688, 5.28912353515625, 18.216175079345703, -4.174358367919922, 5.7665557861328125, -18.35577392578125, 1.9326515197753906, 11.92251968383789, 2.339935302734375, -10.831192016601562, 5.0401458740234375, 4.866336822509766, 10.808061599731445, 0.7588920593261719, 3.79449462890625, -13.962287902832031, 23.005645751953125, 26.207168579101562, -2.717498779296875, 6.091697692871094, -5.1910400390625, 2.160980224609375, 7.056797027587891, 6.840751647949219, -0.6332511901855469, 1.845550537109375, -5.0023345947265625, 14.48614501953125, 22.77572250366211, 10.113550186157227, -3.1100616455078125, -0.68731689453125, -1.754302978515625, -3.12054443359375, 0.8742561340332031, 4.977500915527344, 3.4078330993652344, -4.847503662109375, -4.641326904296875, -1.1872825622558594, 15.659431457519531, -5.084011077880859, 14.954851150512695, 12.753181457519531, 9.719528198242188, 11.487716674804688, 8.974239349365234, 26.022430419921875, -7.1659393310546875, -0.44830322265625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000359.npy"}
{"epoch": 0.5427059712773998, "step": 360, "batch_size": 64, "mean": 6.251132011413574, "std": 10.025888442993164, "min": -9.262710571289062, "p10": -4.270302963256836, "median": 3.93027400970459, "p90": 22.07120704650879, "max": 31.08557891845703, "pos_frac": 0.65625, "sample": [4.2476806640625, 31.08557891845703, 4.384521484375, -0.47582244873046875, 11.578903198242188, 13.53704833984375, -4.165416717529297, 7.856605529785156, -4.315254211425781, 28.908416748046875, -0.7373504638671875, -5.741554260253906, -4.35552978515625, -0.13233566284179688, 14.523490905761719, 2.4113616943359375, 21.746063232421875, -7.0821990966796875, 4.825531005859375, 18.1749267578125, 1.0485115051269531, -0.9603748321533203, -4.9359588623046875, 0.6154098510742188, -0.5709075927734375, 22.122161865234375, 7.4301300048828125, 1.0817337036132812, 15.581012725830078, -2.9561004638671875, 3.2645797729492188, 2.6898231506347656, 5.5182037353515625, -1.9837493896484375, -1.7013092041015625, 13.091018676757812, -1.541473388671875, 10.650566101074219, 3.62445068359375, -1.7257614135742188, 30.34172821044922, -2.9333114624023438, 6.538238525390625, -3.3952178955078125, 17.014541625976562, 6.1357269287109375, 24.12778091430664, 9.328140258789062, 2.7118091583251953, 21.952312469482422, 22.210433959960938, 4.7979278564453125, 23.485179901123047, -8.64541244506836, -1.4418373107910156, 10.634208679199219, -0.9821243286132812, -9.262710571289062, 4.23609733581543, 6.488006591796875, 2.0339508056640625, 1.9739646911621094, 5.0418853759765625, 21.06451416015625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000360.npy"}
{"epoch": 0.54421768707483, "step": 361, "batch_size": 64, "mean": 7.2958984375, "std": 10.420345306396484, "min": -18.772186279296875, "p10": -3.266259956359863, "median": 5.611774444580078, "p90": 20.932657623291014, "max": 41.63682556152344, "pos_frac": 0.75, "sample": [-3.4130859375, 10.568801879882812, -2.923666000366211, -3.635143280029297, 17.48607635498047, 1.5474796295166016, -1.3605575561523438, -5.187191009521484, 5.9118194580078125, 8.954280853271484, -1.5881195068359375, -2.65740966796875, -1.4670257568359375, 11.411613464355469, 16.94891357421875, 3.8492965698242188, -5.45245361328125, -1.8694076538085938, 5.311729431152344, 1.54425048828125, 14.375091552734375, 21.10533905029297, 0.2734375, 20.94409942626953, 15.000190734863281, 3.9085693359375, 15.059585571289062, -18.772186279296875, 22.09552001953125, 2.9986000061035156, 7.224147796630859, 4.412239074707031, 12.823360443115234, 34.705711364746094, 11.444511413574219, 14.342422485351562, 16.976612091064453, 2.9152259826660156, -11.816665649414062, -2.26715087890625, 1.8596019744873047, -0.8322715759277344, 23.793926239013672, 14.219978332519531, 14.295745849609375, 22.07585906982422, 10.20449447631836, 0.47321319580078125, 9.734382629394531, 8.85394287109375, 15.497627258300781, 5.124229431152344, 7.887638092041016, 0.8195152282714844, 41.63682556152344, 5.065744400024414, 20.905960083007812, 3.4552383422851562, -8.4718017578125, 6.714977264404297, 3.5919532775878906, 10.550137519836426, 9.974502563476562, -2.2227935791015625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000361.npy"}
{"epoch": 0.54572940287226, "step": 362, "batch_size": 64, "mean": 5.702146530151367, "std": 10.633546829223633, "min": -20.414302825927734, "p10": -5.360028839111328, "median": 4.722461700439453, "p90": 18.178494644165042, "max": 44.72381591796875, "pos_frac": 0.703125, "sample": [-2.7539596557617188, 4.7037811279296875, 0.854736328125, -1.8822689056396484, 3.3908615112304688, -20.414302825927734, 9.808662414550781, 11.500320434570312, 44.72381591796875, 7.31243896484375, 0.4863739013671875, -0.7273445129394531, 9.517333984375, 7.846729278564453, -3.1589393615722656, 9.437562942504883, 1.6395339965820312, 4.793357849121094, -1.5658912658691406, 18.394821166992188, -0.43627166748046875, 2.5137939453125, 3.4521026611328125, -0.46087646484375, 19.513076782226562, 9.582931518554688, 17.67373275756836, -16.328857421875, -11.529151916503906, 4.741142272949219, 25.58538818359375, 6.2276611328125, -1.7758255004882812, 12.23788833618164, -0.7545318603515625, 22.221435546875, 5.331638336181641, -5.713165283203125, 4.5302581787109375, -1.1785697937011719, 2.639211654663086, 16.559219360351562, 15.86962890625, 14.005950927734375, 1.4813919067382812, 16.3648681640625, 13.697423934936523, 16.192550659179688, 13.438482284545898, -5.117828369140625, -5.463829040527344, 10.337127685546875, 6.3535308837890625, -13.771659851074219, 3.9462356567382812, 7.049896240234375, 21.3233642578125, -8.076934814453125, -5.0645599365234375, 4.420135498046875, 23.8084716796875, 5.848545074462891, 6.274711608886719, 3.48004150390625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000362.npy"}
{"epoch": 0.54724111866969, "step": 363, "batch_size": 64, "mean": 7.091344833374023, "std": 8.953733444213867, "min": -16.65829849243164, "p10": -0.9410011291503901, "median": 5.310081481933594, "p90": 20.029522705078126, "max": 28.94072723388672, "pos_frac": 0.84375, "sample": [4.875629425048828, -8.962867736816406, 23.00351333618164, 4.970619201660156, 0.48944091796875, 3.8941917419433594, 5.596168518066406, 27.90618896484375, 5.306571960449219, 2.7008800506591797, 5.705009460449219, 14.3206787109375, 18.860198974609375, 19.767471313476562, 4.717185974121094, 28.94072723388672, -0.07247543334960938, 8.060258865356445, 3.2125701904296875, 15.963451385498047, 8.691986083984375, 9.51114273071289, 3.552753448486328, 14.516983032226562, 16.854637145996094, 5.060997009277344, 2.51617431640625, 7.029579162597656, -16.65829849243164, 1.520172119140625, -6.161285400390625, -8.46710205078125, 10.630905151367188, -6.84722900390625, 20.141830444335938, 6.893272399902344, 6.37255859375, 5.443489074707031, 7.100801467895508, 0.07270050048828125, 3.5106964111328125, 2.862712860107422, 15.622268676757812, 25.242759704589844, -0.49027252197265625, 21.646141052246094, 15.62060546875, 4.5235137939453125, 15.289268493652344, -0.1408843994140625, 21.297157287597656, 1.5440444946289062, 12.332260131835938, 2.2011489868164062, -2.742816925048828, -1.1341705322265625, 6.7413177490234375, 2.8930206298828125, 8.18023681640625, 10.928485870361328, 1.2608375549316406, 0.103424072265625, 4.2092132568359375, 5.313591003417969], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000363.npy"}
{"epoch": 0.5487528344671202, "step": 364, "batch_size": 64, "mean": 8.17833137512207, "std": 11.94770336151123, "min": -16.259069442749023, "p10": -4.9798126220703125, "median": 4.714214324951172, "p90": 24.38696365356446, "max": 35.274322509765625, "pos_frac": 0.75, "sample": [25.276634216308594, 0.6318893432617188, 3.4656105041503906, 34.084983825683594, 22.311065673828125, -1.356048583984375, 4.6899871826171875, 11.776065826416016, -1.197265625, 21.155868530273438, 28.096038818359375, 3.5644989013671875, -7.422321319580078, -0.5843582153320312, 17.049240112304688, 30.997379302978516, 1.2489013671875, 4.7436981201171875, 35.274322509765625, 1.614013671875, 15.791587829589844, 2.36541748046875, 4.738441467285156, 25.6363525390625, -11.500076293945312, -16.259069442749023, 22.270278930664062, -0.21066856384277344, 2.542299270629883, 11.167716979980469, 17.190502166748047, 13.770294189453125, 2.1184463500976562, 9.576858520507812, 2.298187255859375, 4.1253814697265625, 19.169628143310547, 18.31488037109375, -4.94189453125, -4.996063232421875, 4.06060791015625, 6.4730377197265625, -5.8680877685546875, -4.6334686279296875, 16.12371826171875, 1.9997482299804688, -1.393585205078125, 16.716217041015625, 8.311103820800781, 30.299560546875, -1.8583755493164062, 19.2623291015625, -13.459692001342773, 12.970779418945312, -13.389846801757812, 10.068893432617188, 6.836753845214844, 21.08185577392578, 3.778850555419922, 2.4747848510742188, 2.6668701171875, -1.6308212280273438, 22.030838012695312, 11.9024658203125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000364.npy"}
{"epoch": 0.5502645502645502, "step": 365, "batch_size": 64, "mean": 6.555495262145996, "std": 8.600565910339355, "min": -14.282264709472656, "p10": -3.6926628112792965, "median": 5.357370376586914, "p90": 18.130125045776367, "max": 25.634235382080078, "pos_frac": 0.765625, "sample": [3.2890243530273438, 5.424106597900391, 11.910293579101562, 5.946197509765625, 3.8101158142089844, 3.0481462478637695, 14.361595153808594, 18.240772247314453, 17.865623474121094, 0.883880615234375, 12.808303833007812, 3.559551239013672, -14.282264709472656, 5.350189208984375, 5.830078125, 22.122390747070312, -2.286417007446289, 10.639122009277344, -0.30536651611328125, 9.361370086669922, 3.78485107421875, 5.0178375244140625, 17.457061767578125, -3.3306198120117188, -1.0380477905273438, -5.192676544189453, 6.607353210449219, 22.750389099121094, 1.1269683837890625, 17.64113998413086, -3.8478240966796875, 4.113681793212891, 12.43429183959961, 3.240345001220703, 12.475204467773438, 18.480247497558594, 7.75750732421875, -0.8245058059692383, 2.9507904052734375, 9.307106018066406, 15.901405334472656, 9.094192504882812, 12.202140808105469, 1.89117431640625, 4.2936859130859375, 22.286231994628906, 4.082204818725586, -6.4571380615234375, 17.8719482421875, 5.364551544189453, 10.087959289550781, -2.5484161376953125, 2.381847381591797, -10.820709228515625, 8.28656005859375, 3.5668487548828125, -4.1666259765625, -6.215457916259766, 8.299591064453125, -2.8645706176757812, 25.634235382080078, 16.047386169433594, -2.2170562744140625, 19.061874389648438], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000365.npy"}
{"epoch": 0.5517762660619804, "step": 366, "batch_size": 64, "mean": 5.897998332977295, "std": 10.933196067810059, "min": -19.200336456298828, "p10": -6.378890228271484, "median": 4.513269424438477, "p90": 20.58328075408936, "max": 32.58131408691406, "pos_frac": 0.6875, "sample": [0.7363491058349609, 17.855255126953125, 17.383045196533203, 10.137611389160156, -0.599609375, 20.296220779418945, -6.6502685546875, -8.323993682861328, -11.74224853515625, -4.122840881347656, -2.3101425170898438, 10.341567993164062, 9.385326385498047, 9.981735229492188, 8.668136596679688, -1.264984130859375, -12.805313110351562, 6.2618865966796875, 5.552791595458984, 16.08203887939453, 1.198822021484375, 3.2458648681640625, 3.899505615234375, -3.522266387939453, -2.3289337158203125, 6.804742813110352, 2.1247406005859375, 16.082130432128906, -9.356857299804688, 4.8567657470703125, 32.58131408691406, -0.9664840698242188, -4.442714691162109, 25.755294799804688, -19.200336456298828, 0.8626327514648438, 4.991462707519531, -5.745674133300781, 8.836591720581055, 4.169773101806641, 1.3075180053710938, -2.2041702270507812, 2.424448013305664, 3.055755615234375, -5.581871032714844, 2.978607177734375, 0.22898101806640625, 13.683578491210938, 26.363277435302734, 9.68853759765625, 8.825347900390625, 27.185890197753906, 17.06363296508789, -12.061412811279297, 9.872962951660156, -4.957572937011719, 27.66240692138672, 12.269439697265625, 17.00855255126953, 21.703521728515625, 20.70630645751953, 12.56625747680664, 13.824167251586914, -0.8512096405029297], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000366.npy"}
{"epoch": 0.5532879818594104, "step": 367, "batch_size": 64, "mean": 5.073179244995117, "std": 9.018816947937012, "min": -21.14488983154297, "p10": -5.5518955230712885, "median": 5.022579193115234, "p90": 18.054938507080077, "max": 21.189605712890625, "pos_frac": 0.703125, "sample": [7.124095916748047, 18.061248779296875, 11.76385498046875, -4.9840240478515625, 4.370349884033203, -1.1902503967285156, 17.993289947509766, 18.907447814941406, 0.36135101318359375, 4.537866592407227, 0.37751007080078125, 6.590789794921875, 5.433082580566406, 1.69085693359375, 21.189605712890625, 9.595970153808594, 8.522293090820312, -5.870475769042969, -1.8270759582519531, 18.04021453857422, -2.301727294921875, 10.945999145507812, -6.022670745849609, -7.59271240234375, 4.637054443359375, -16.29302978515625, 3.065654754638672, 11.826454162597656, 10.305695533752441, 20.324172973632812, 9.455795288085938, 10.370092391967773, 10.77811050415039, 3.9036102294921875, 6.320821762084961, -0.26955413818359375, 1.0605545043945312, -5.765422821044922, 8.18537712097168, 11.318229675292969, -1.8493766784667969, -21.14488983154297, 0.5527458190917969, 20.73804473876953, 6.131359100341797, -2.0701141357421875, 3.8611297607421875, 6.902996063232422, 5.073387145996094, -3.7807083129882812, 4.971771240234375, 4.505298614501953, -11.96429443359375, 15.454917907714844, 16.351821899414062, 18.56769561767578, -5.0536651611328125, -2.344697952270508, 13.644004821777344, 9.043449401855469, -3.5045852661132812, 7.852092742919922, -0.288482666015625, 18.09305191040039], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000367.npy"}
{"epoch": 0.5547996976568406, "step": 368, "batch_size": 64, "mean": 5.677168369293213, "std": 10.1239595413208, "min": -18.67296600341797, "p10": -6.30573844909668, "median": 5.095206260681152, "p90": 20.14745559692383, "max": 26.499706268310547, "pos_frac": 0.6875, "sample": [-11.836746215820312, 11.912055969238281, -3.8183021545410156, 11.961250305175781, 0.633819580078125, 6.544647216796875, 23.039417266845703, 6.971595764160156, 0.4295501708984375, -1.0283050537109375, 8.038551330566406, -5.624351501464844, -3.314453125, 12.784782409667969, 26.499706268310547, -8.294143676757812, -1.1599960327148438, -10.483909606933594, -2.8661880493164062, 19.247501373291016, -1.3645401000976562, 7.578655242919922, 7.9990692138671875, -9.098487854003906, 22.291170120239258, 6.462240219116211, -5.42919921875, 19.343490600585938, 25.30144500732422, 17.95480728149414, 3.4290409088134766, 15.4678955078125, 20.49201202392578, 3.4186134338378906, 12.021289825439453, 5.284553527832031, 4.06939697265625, 10.70489501953125, 1.384002685546875, 3.181640625, 4.333709716796875, -4.367984771728516, 22.513023376464844, 10.719535827636719, -6.274990081787109, -2.7730789184570312, 11.226890563964844, -6.318916320800781, 10.464714050292969, -6.525665283203125, 14.506912231445312, 10.336212158203125, 25.146209716796875, 13.274635314941406, 5.108545303344727, 4.737396240234375, 4.356452941894531, -5.668113708496094, -3.087963104248047, 6.8118896484375, 15.067337036132812, 5.081867218017578, -18.67296600341797, 3.2146453857421875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000368.npy"}
{"epoch": 0.5563114134542706, "step": 369, "batch_size": 64, "mean": 5.029770851135254, "std": 10.085412979125977, "min": -16.46200180053711, "p10": -7.733535003662109, "median": 3.2584457397460938, "p90": 17.971583175659184, "max": 30.550823211669922, "pos_frac": 0.75, "sample": [24.663436889648438, -9.765632629394531, 1.4375114440917969, 12.635528564453125, -16.46200180053711, 12.880462646484375, 13.754127502441406, 6.249061584472656, 11.403839111328125, 3.3641586303710938, 2.7598953247070312, 11.445785522460938, 16.484256744384766, 11.389358520507812, 21.100221633911133, 4.415035247802734, 0.7123689651489258, 5.1389007568359375, -0.50262451171875, 2.2901992797851562, 14.173919677734375, -15.746152877807617, 9.686683654785156, -4.327144622802734, 2.5733413696289062, 12.36700439453125, -7.4059600830078125, 13.292814254760742, 5.7353515625, -6.879261016845703, 18.6090087890625, -0.7348480224609375, 4.138755798339844, 21.262245178222656, 12.802810668945312, 19.79375457763672, 29.527240753173828, 2.619232177734375, 1.22998046875, -11.686630249023438, -2.9945220947265625, 3.1527328491210938, 1.9319648742675781, 9.964836120605469, 0.7555809020996094, 0.874053955078125, 2.7913818359375, 9.395606994628906, 6.881591796875, -4.345649719238281, 1.5996475219726562, 0.2628631591796875, 4.239223480224609, 11.933891296386719, -2.462188720703125, -6.911205291748047, 2.042835235595703, 30.550823211669922, 1.7229537963867188, 9.8236083984375, -7.873924255371094, -11.718582153320312, -9.30173110961914, 13.163518905639648], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000369.npy"}
{"epoch": 0.5578231292517006, "step": 370, "batch_size": 64, "mean": 8.294317245483398, "std": 11.860881805419922, "min": -23.204984664916992, "p10": -4.279433631896972, "median": 7.451160430908203, "p90": 23.909255218505862, "max": 32.71173095703125, "pos_frac": 0.734375, "sample": [6.336982727050781, 24.967636108398438, 20.884204864501953, 8.055351257324219, 29.631515502929688, 3.3568687438964844, 32.71173095703125, -1.159027099609375, 22.74505615234375, 14.167926788330078, 5.450248718261719, -2.2188491821289062, 0.9115066528320312, 16.501304626464844, 26.683700561523438, 6.8469696044921875, 6.0837554931640625, -8.231307983398438, -2.9473419189453125, 9.710372924804688, -6.105133056640625, 16.40643310546875, 24.3087158203125, 10.046142578125, 19.54645538330078, 8.890579223632812, 14.233016967773438, -10.469715118408203, 6.638069152832031, -0.31339263916015625, 0.06700515747070312, -4.679840087890625, -1.390716552734375, -2.3573532104492188, -0.51397705078125, 20.834335327148438, 30.275146484375, 9.782390594482422, 27.879261016845703, 15.964832305908203, 21.511962890625, -19.2908935546875, 14.139293670654297, 0.00860595703125, 13.595539093017578, 22.97718048095703, 10.993316650390625, 18.63384246826172, 13.866704940795898, 14.73150634765625, -23.204984664916992, 6.20332145690918, 13.31136703491211, -3.345151901245117, 5.4932861328125, 0.4603271484375, 8.62109375, 0.8839645385742188, -2.4007301330566406, -1.1008930206298828, 3.3978500366210938, 18.068138122558594, -10.354202270507812, 4.105030059814453], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000370.npy"}
{"epoch": 0.5593348450491308, "step": 371, "batch_size": 64, "mean": 6.463840961456299, "std": 11.527737617492676, "min": -20.847522735595703, "p10": -9.19494857788086, "median": 6.262933731079102, "p90": 22.089481735229498, "max": 34.135826110839844, "pos_frac": 0.734375, "sample": [14.8721923828125, -20.847522735595703, 31.134506225585938, 12.24416732788086, 22.666942596435547, 0.072784423828125, 16.23438262939453, -1.4962615966796875, 6.248054504394531, -0.28969669342041016, 1.371978759765625, 14.668830871582031, -11.286247253417969, 30.139793395996094, 7.398492813110352, 11.048385620117188, 0.0043010711669921875, 7.133228302001953, 3.910623550415039, 25.545814514160156, 0.8571319580078125, -16.0408935546875, 3.1098899841308594, 6.1383819580078125, 12.030868530273438, 24.393075942993164, 10.040939331054688, 34.135826110839844, 14.148246765136719, 13.045059204101562, 5.5507049560546875, 15.684268951416016, -9.302848815917969, -4.8072509765625, 8.857505798339844, 2.9612350463867188, 6.549163818359375, -9.71023941040039, -4.8357696533203125, 4.995674133300781, -5.6103973388671875, 15.739280700683594, -7.3077392578125, 5.285484313964844, 10.975616455078125, -8.254337310791016, 13.09078598022461, 6.277812957763672, -8.972785949707031, 15.430320739746094, 5.150142669677734, -11.92071533203125, 1.7792224884033203, 13.59417724609375, 15.667495727539062, -3.405313491821289, 20.74207305908203, 7.087242126464844, -0.8027877807617188, 8.803306579589844, 14.67779541015625, 23.075420379638672, 3.2981414794921875, -9.2901611328125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000371.npy"}
{"epoch": 0.5608465608465608, "step": 372, "batch_size": 64, "mean": 5.775787353515625, "std": 12.119894981384277, "min": -20.36041259765625, "p10": -9.807946014404296, "median": 4.9630584716796875, "p90": 22.463040161132817, "max": 32.323455810546875, "pos_frac": 0.6875, "sample": [-10.153987884521484, 10.808006286621094, 13.688133239746094, -7.0876007080078125, 9.244651794433594, 5.880912780761719, 19.69580841064453, -10.538589477539062, 16.314979553222656, 13.091590881347656, 24.34515380859375, 6.736869812011719, 8.879783630371094, 29.255661010742188, 11.485755920410156, 6.159187316894531, 9.950569152832031, -20.36041259765625, 0.32601165771484375, 16.713088989257812, 6.574067115783691, 3.5991477966308594, 1.3844070434570312, 25.123077392578125, -2.043027877807617, 2.1537399291992188, -2.629150390625, 21.55034637451172, 32.323455810546875, 3.3166580200195312, 26.96771240234375, 20.591102600097656, -0.8990592956542969, -4.039447784423828, -5.634914398193359, 10.073659896850586, -20.320167541503906, 6.693523406982422, 18.688343048095703, 3.723285675048828, 20.63214111328125, 8.779827117919922, 13.61248779296875, 22.85419464111328, 2.8024215698242188, 11.246162414550781, -7.554653167724609, 1.0031776428222656, 3.7536983489990234, -10.631607055664062, 4.045204162597656, -17.992660522460938, 13.385246276855469, 0.7093935012817383, -4.5308990478515625, -9.00051498413086, -13.197799682617188, -5.132835388183594, 9.977527618408203, 25.946231842041016, -0.8389205932617188, -2.941314697265625, -0.905426025390625, 1.9969673156738281], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000372.npy"}
{"epoch": 0.562358276643991, "step": 373, "batch_size": 64, "mean": 6.802433013916016, "std": 11.772836685180664, "min": -21.059528350830078, "p10": -5.595613098144531, "median": 4.590456008911133, "p90": 21.67108993530274, "max": 35.57319641113281, "pos_frac": 0.6875, "sample": [-8.397174835205078, 13.708745956420898, 17.561500549316406, 11.262603759765625, 14.323532104492188, -16.452167510986328, 3.1784896850585938, 20.251827239990234, 11.287212371826172, 6.687150955200195, 17.723785400390625, -12.344779968261719, 3.9825439453125, -3.687896728515625, -21.059528350830078, -1.1662635803222656, 5.910575866699219, -2.7125396728515625, -5.37811279296875, 5.131496429443359, 2.7249526977539062, -2.6146183013916016, 10.41339111328125, -6.7035980224609375, 7.36578369140625, 28.86383056640625, -3.1045379638671875, 12.5538330078125, 8.58172607421875, 4.049415588378906, 23.78173065185547, -0.8234329223632812, -0.15655899047851562, 26.03246307373047, -3.775238037109375, 2.5911521911621094, 0.20574951171875, 2.614288330078125, -1.3451690673828125, 20.583839416503906, 17.62982940673828, -5.6888275146484375, -5.299293518066406, -3.9158401489257812, 0.144317626953125, 18.30196762084961, -6.1790771484375, 22.137054443359375, -0.49614524841308594, 8.747039794921875, 7.676422119140625, 34.43560791015625, 3.939289093017578, 35.57319641113281, 8.63416862487793, 12.7208251953125, 3.14312744140625, 18.684478759765625, 11.079429626464844, 31.515892028808594, 6.210369110107422, 1.6962661743164062, 2.7892684936523438, 20.226348876953125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000373.npy"}
{"epoch": 0.563869992441421, "step": 374, "batch_size": 64, "mean": 8.923542022705078, "std": 10.645493507385254, "min": -16.70855712890625, "p10": -2.4417285919189453, "median": 8.449681282043457, "p90": 24.094845581054695, "max": 30.899627685546875, "pos_frac": 0.828125, "sample": [30.899627685546875, 6.268714904785156, 2.1505584716796875, -5.0365447998046875, -0.10223388671875, 17.992084503173828, 18.65625762939453, 18.852867126464844, 25.815834045410156, 13.643577575683594, 14.88839340209961, 1.1067962646484375, 8.561111450195312, 27.8660888671875, 2.0727386474609375, -10.849700927734375, 21.927783966064453, 2.8462753295898438, 0.49004364013671875, 0.2904052734375, -1.9384765625, 15.723806381225586, 8.338251113891602, -3.445770263671875, 15.112030029296875, 11.467670440673828, 11.931446075439453, 21.10961151123047, -2.389324188232422, 7.297370910644531, 4.956169128417969, 18.843460083007812, 5.243906021118164, 0.9158649444580078, 19.336013793945312, 7.742504119873047, 0.5128631591796875, -16.70855712890625, 11.645584106445312, 13.467842102050781, 9.463912963867188, -13.565010070800781, 1.618509292602539, 25.190353393554688, 29.41748046875, 9.662620544433594, -5.8670196533203125, -1.6423263549804688, 11.726699829101562, 1.0090866088867188, 18.934478759765625, 17.046550750732422, 25.113906860351562, 9.273574829101562, 14.35589599609375, 1.1156196594238281, 25.02358627319336, 14.054656982421875, 4.696372985839844, 21.850990295410156, 1.4000282287597656, 5.8476104736328125, 0.3403778076171875, -2.4641876220703125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000374.npy"}
{"epoch": 0.5653817082388511, "step": 375, "batch_size": 64, "mean": 6.112877368927002, "std": 10.509322166442871, "min": -19.318817138671875, "p10": -7.496743965148925, "median": 5.881511688232422, "p90": 19.708288574218752, "max": 30.502426147460938, "pos_frac": 0.75, "sample": [6.945991516113281, 2.4642791748046875, 0.6675567626953125, 17.771127700805664, 10.775726318359375, 3.5070648193359375, 22.692169189453125, -1.2636127471923828, 2.385040283203125, 11.077157974243164, 1.7508087158203125, 13.856216430664062, 1.5378265380859375, 22.655670166015625, 15.38299560546875, -1.643341064453125, 15.678840637207031, -4.76751708984375, 5.627311706542969, -5.978368759155273, 7.330684661865234, 11.403984069824219, 12.018791198730469, -1.243896484375, 8.899250030517578, 21.900039672851562, 3.481903076171875, -8.73553466796875, 7.804500579833984, 16.82373046875, 2.7489471435546875, -8.369213104248047, -1.3451423645019531, -2.9576263427734375, 7.699310302734375, 12.302680969238281, 6.135711669921875, 1.7732086181640625, 30.502426147460938, 1.7066307067871094, 19.948944091796875, 5.564399719238281, 19.146759033203125, -19.046531677246094, 7.509559631347656, -3.627685546875, 28.32683563232422, 2.7625808715820312, 15.5072021484375, 7.1730804443359375, 11.341964721679688, 4.7134857177734375, -14.04498291015625, 8.399169921875, 2.601207733154297, 13.699504852294922, 12.217544555664062, -19.318817138671875, -3.2053794860839844, 15.692886352539062, -8.147476196289062, 22.673446655273438, 1.766143798828125, -11.433021545410156], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000375.npy"}
{"epoch": 0.5668934240362812, "step": 376, "batch_size": 64, "mean": 8.36793041229248, "std": 12.349750518798828, "min": -15.582828521728516, "p10": -8.66502170562744, "median": 6.704059600830078, "p90": 27.579516220092774, "max": 31.407264709472656, "pos_frac": 0.765625, "sample": [4.3612823486328125, 9.7666015625, -0.49729156494140625, 4.851982116699219, 14.363883972167969, -5.272756576538086, -3.1086502075195312, 21.99120330810547, 9.751838684082031, 1.9104537963867188, 19.45574951171875, 9.753786087036133, -15.582828521728516, -1.8310718536376953, -8.161422729492188, -4.373929977416992, 29.24677276611328, 6.693077087402344, 8.470699310302734, 26.911865234375, 2.5293197631835938, 31.407264709472656, -1.9354476928710938, 10.786880493164062, 25.069984436035156, 1.9135589599609375, 30.641773223876953, 27.15469741821289, 6.75732421875, 15.191665649414062, 16.158349990844727, 15.216880798339844, 23.78607940673828, 5.04217529296875, 4.357612609863281, 27.761581420898438, 11.0908203125, -10.744800567626953, 28.889686584472656, -13.820568084716797, 6.289100646972656, -8.880849838256836, 14.187736511230469, 5.428642272949219, 3.9429397583007812, 28.478851318359375, -9.989944458007812, 8.620006561279297, -1.1177902221679688, 17.073463439941406, -10.097915649414062, 1.664804458618164, 6.7150421142578125, 7.194568634033203, 30.13001251220703, 1.9041938781738281, 8.375091552734375, 5.783466339111328, 22.578365325927734, 18.036060333251953, 3.4420318603515625, 3.6692466735839844, 1.0056695938110352, -14.841293334960938], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000376.npy"}
{"epoch": 0.5684051398337112, "step": 377, "batch_size": 64, "mean": 5.578057765960693, "std": 9.920390129089355, "min": -10.272064208984375, "p10": -8.001760864257813, "median": 5.81463623046875, "p90": 18.20134105682374, "max": 28.96551513671875, "pos_frac": 0.671875, "sample": [23.978782653808594, -10.272064208984375, 15.888370513916016, 19.048067092895508, 11.836685180664062, 2.5222549438476562, -9.45987319946289, -10.233810424804688, 3.719024658203125, 11.692512512207031, 9.800704956054688, -3.2132492065429688, 14.165672302246094, 3.4981155395507812, 6.839393615722656, 7.5999298095703125, -8.03875732421875, 7.3664093017578125, -5.096195220947266, -0.707122802734375, 23.920337677001953, -1.4952354431152344, 4.6757659912109375, 9.221359252929688, 16.160892486572266, -9.675662994384766, 4.839454650878906, 22.502395629882812, -1.007659912109375, -0.008892059326171875, 9.964008331298828, -0.7150945663452148, 14.05029296875, -6.985713958740234, 8.977508544921875, -7.915435791015625, 7.38677978515625, 6.867641448974609, 8.817108154296875, 6.7659912109375, -9.156818389892578, 11.802406311035156, 7.541376113891602, 3.7719669342041016, 4.419807434082031, 28.817474365234375, 9.59368896484375, -7.138164520263672, -7.910133361816406, 7.435638427734375, 16.22564697265625, 4.86328125, -0.1580352783203125, 28.96551513671875, -2.2218399047851562, -2.7847862243652344, 25.816974639892578, 10.68553352355957, -9.542842864990234, 3.3395614624023438, 11.415042877197266, 1.3266143798828125, 8.958930969238281, 3.6481857299804688], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000377.npy"}
{"epoch": 0.5699168556311414, "step": 378, "batch_size": 64, "mean": 5.129953384399414, "std": 10.494629859924316, "min": -22.717166900634766, "p10": -4.32611198425293, "median": 3.4347267150878906, "p90": 17.930152130126956, "max": 41.47076416015625, "pos_frac": 0.75, "sample": [4.1025543212890625, -0.22270965576171875, 3.2639694213867188, 3.092538833618164, 2.3676910400390625, 19.044631958007812, -1.8701324462890625, 1.9789352416992188, 4.0700225830078125, 10.951751708984375, -21.359779357910156, 3.1286487579345703, 30.725967407226562, 2.9734535217285156, 20.433326721191406, 13.080768585205078, 2.5824317932128906, 5.029258728027344, 17.49907684326172, -2.7958908081054688, 16.161256790161133, 7.827972412109375, 6.1848297119140625, 3.6054840087890625, 13.847480773925781, 8.247955322265625, 3.1036834716796875, 8.672271728515625, -6.8050994873046875, 1.6004562377929688, 21.168716430664062, 11.305877685546875, -9.941444396972656, 3.1406631469726562, 18.114898681640625, 5.752410888671875, 2.5051040649414062, 5.141475677490234, 9.913726806640625, -4.483109474182129, 41.47076416015625, -2.5832672119140625, 2.2306442260742188, 9.586708068847656, -0.2867431640625, 12.019767761230469, 2.2555885314941406, 22.45720100402832, -4.257221221923828, 9.506818771362305, 2.4311676025390625, 4.110588073730469, 1.4871196746826172, -19.822547912597656, -4.3556365966796875, 9.307044982910156, -2.3500823974609375, 10.8392333984375, 1.360849380493164, 4.931617736816406, -1.2649383544921875, -22.717166900634766, -0.6320457458496094, 9.450408935546875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000378.npy"}
{"epoch": 0.5714285714285714, "step": 379, "batch_size": 64, "mean": 7.189952373504639, "std": 10.488577842712402, "min": -18.856639862060547, "p10": -4.559038925170898, "median": 5.725996017456055, "p90": 19.99607925415039, "max": 33.112335205078125, "pos_frac": 0.75, "sample": [6.8415069580078125, 4.2086181640625, 16.264381408691406, 1.021392822265625, 2.6639251708984375, 17.432205200195312, 9.55767822265625, -18.856639862060547, -0.09148025512695312, 3.5730667114257812, 1.96710205078125, 1.3585357666015625, -7.967376708984375, 20.165870666503906, -0.6737613677978516, 31.910964965820312, 4.61126708984375, -4.017047882080078, 20.848995208740234, 15.736503601074219, 33.112335205078125, 4.3945465087890625, 19.370269775390625, 17.771644592285156, -0.941680908203125, -1.270660400390625, 14.395191192626953, -14.634353637695312, 15.234748840332031, -8.1871337890625, 19.599899291992188, 9.463008880615234, 6.944337844848633, 5.897319793701172, -4.79132080078125, 20.297088623046875, 14.313377380371094, 7.532451629638672, 1.7114524841308594, -2.746795654296875, -9.370662689208984, 5.317657470703125, 7.833984375, 22.88780403137207, 3.104694366455078, 4.012474060058594, 32.9171142578125, 9.91229248046875, 2.947704315185547, 0.0034027099609375, 3.2387924194335938, -1.3160324096679688, -1.0515594482421875, -5.4085693359375, 16.818538665771484, -0.43560028076171875, 14.565921783447266, 10.38323974609375, 9.7479248046875, 6.505578994750977, 8.371585845947266, 14.020896911621094, 15.57366943359375, 5.5546722412109375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000379.npy"}
{"epoch": 0.5729402872260015, "step": 380, "batch_size": 64, "mean": 8.483596801757812, "std": 11.687772750854492, "min": -26.460426330566406, "p10": -5.853919410705561, "median": 8.980531692504883, "p90": 22.275908279418946, "max": 35.911285400390625, "pos_frac": 0.828125, "sample": [4.067731857299805, 14.631790161132812, 9.499465942382812, 3.7461700439453125, -14.878082275390625, -16.29151153564453, 25.599529266357422, 19.459251403808594, 1.0506401062011719, 18.9547119140625, 1.2623062133789062, -7.735786437988281, -1.4628963470458984, 22.429431915283203, 7.61962890625, 0.887451171875, -1.0572547912597656, 8.893329620361328, 16.468616485595703, 23.90328598022461, 6.193809509277344, 9.995956420898438, 6.300315856933594, -20.694536209106445, 26.360633850097656, 3.8248977661132812, 17.360273361206055, 10.9915771484375, -10.513626098632812, 4.297626495361328, 10.725814819335938, 6.3040618896484375, 9.067733764648438, 10.805660247802734, 21.830902099609375, 3.0403995513916016, 23.452056884765625, 19.112037658691406, -0.0052490234375, 10.785919189453125, 12.628292083740234, -26.460426330566406, 24.00713539123535, 35.911285400390625, 9.15414047241211, 16.477163314819336, 2.3039321899414062, 7.349102020263672, 1.88079833984375, 20.674531936645508, 5.514823913574219, 4.313407897949219, -9.883560180664062, 21.917686462402344, 16.15240478515625, 17.525344848632812, 6.90521240234375, 14.606002807617188, 20.197269439697266, 13.191802978515625, 1.717498779296875, 13.566089630126953, -0.05813026428222656, 7.0742950439453125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000380.npy"}
{"epoch": 0.5744520030234316, "step": 381, "batch_size": 64, "mean": 5.022771835327148, "std": 11.045283317565918, "min": -24.381668090820312, "p10": -6.4901901245117175, "median": 4.52207088470459, "p90": 18.116504669189453, "max": 33.05082702636719, "pos_frac": 0.6875, "sample": [-4.9087066650390625, 2.538787841796875, -1.2399282455444336, -12.166496276855469, -24.381668090820312, 6.07404899597168, 5.110218048095703, 4.413610458374023, 29.530323028564453, 8.139518737792969, 4.8228302001953125, 1.5798072814941406, 7.114238739013672, 8.741867065429688, 5.5742034912109375, 33.05082702636719, 21.93035888671875, 1.049936294555664, -0.9483871459960938, -1.9500389099121094, 5.070991516113281, -6.87445068359375, 6.692914962768555, 15.4521484375, -5.5935821533203125, 7.2811431884765625, 2.190399169921875, -19.348960876464844, 11.060234069824219, -1.6752548217773438, 13.240961074829102, 17.069778442382812, 16.289108276367188, -4.990936279296875, -1.7634468078613281, 14.346275329589844, 10.870952606201172, -3.0619354248046875, -7.965566635131836, 11.115432739257812, 18.29387664794922, -11.662178039550781, 17.70263671875, 0.35160255432128906, 6.775260925292969, 27.084087371826172, 10.96640396118164, 5.290214538574219, -0.679046630859375, 3.2485580444335938, 0.27867889404296875, 9.490596771240234, 15.802200317382812, 1.7089920043945312, -0.20860671997070312, 4.020835876464844, 20.27386474609375, -12.4310302734375, -3.5024185180664062, 4.630531311035156, 3.4017868041992188, -3.78497314453125, 0.4052734375, 30.518707275390625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000381.npy"}
{"epoch": 0.5759637188208617, "step": 382, "batch_size": 64, "mean": 5.73761510848999, "std": 9.855786323547363, "min": -11.70777702331543, "p10": -5.119763565063477, "median": 4.008411407470703, "p90": 20.941381454467777, "max": 33.69758987426758, "pos_frac": 0.6875, "sample": [-5.3194580078125, 1.5746688842773438, 31.2335205078125, 4.032005310058594, 19.73664093017578, -0.49474334716796875, 6.4249267578125, 0.3635139465332031, 22.00701904296875, 18.39202880859375, 4.007987976074219, 7.6056976318359375, -4.142967224121094, 4.0088348388671875, 21.77056884765625, -2.2451934814453125, -8.137191772460938, 3.5671539306640625, -9.462238311767578, 9.369819641113281, -1.990828514099121, 8.09747314453125, 3.4939422607421875, 0.9456558227539062, -0.7190322875976562, 3.3921127319335938, 3.600584030151367, 11.655403137207031, -5.19171142578125, 33.69758987426758, 10.092658996582031, 7.7774810791015625, -8.361160278320312, -11.70777702331543, 21.457698822021484, -1.8769454956054688, -4.4486846923828125, 6.713047027587891, 5.705535888671875, 5.0089874267578125, -2.298175811767578, 26.47724151611328, 5.105133056640625, 8.888343811035156, -0.27224159240722656, 18.057212829589844, -7.953174591064453, -4.490577697753906, -1.9757080078125, 3.8437652587890625, 6.287078857421875, 3.6919174194335938, -4.951885223388672, 9.934181213378906, 13.668048858642578, -0.43018531799316406, 11.6839599609375, 10.610965728759766, 9.94808578491211, 26.03002166748047, 13.062568664550781, 2.898040771484375, 2.2356948852539062, 5.522422790527344], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000382.npy"}
{"epoch": 0.5774754346182918, "step": 383, "batch_size": 64, "mean": 7.104597568511963, "std": 11.808442115783691, "min": -18.0257568359375, "p10": -6.105598831176757, "median": 4.782020568847656, "p90": 25.165585708618163, "max": 32.26171875, "pos_frac": 0.71875, "sample": [1.7025222778320312, 24.99981689453125, 21.177581787109375, 4.91400146484375, -11.130287170410156, -5.625072479248047, 8.33206558227539, 3.1152687072753906, 6.3109588623046875, -1.6609153747558594, 6.5509185791015625, 4.2020721435546875, 1.3923568725585938, 11.491378784179688, 19.352394104003906, -2.2577362060546875, 10.111663818359375, -4.283054351806641, 31.09929656982422, 15.259654998779297, 3.1679916381835938, -6.3115386962890625, 16.72612762451172, 2.1466064453125, 25.252635955810547, 4.6500396728515625, 4.278450012207031, -2.819244384765625, 10.676517486572266, 28.044891357421875, 21.646881103515625, -3.420145034790039, -5.227691650390625, 26.635650634765625, -3.139659881591797, 8.939952850341797, 15.936973571777344, -18.0257568359375, 0.88421630859375, 3.34954833984375, 5.508552551269531, 3.5645904541015625, -0.0906982421875, -3.2862091064453125, 15.131095886230469, 1.792928695678711, 10.055007934570312, 6.9935455322265625, -9.161117553710938, 27.657554626464844, 6.181169509887695, 32.26171875, 15.398963928222656, 1.0330047607421875, 0.8856124877929688, 17.556251525878906, 16.57501220703125, -6.9291839599609375, -1.9679412841796875, -15.695114135742188, 23.75398826599121, -11.732925415039062, 25.236629486083984, 15.524486541748047], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000383.npy"}
{"epoch": 0.5789871504157218, "step": 384, "batch_size": 64, "mean": 6.930984020233154, "std": 10.677724838256836, "min": -18.21734619140625, "p10": -3.158095550537109, "median": 5.026805877685547, "p90": 22.34344482421875, "max": 31.549217224121094, "pos_frac": 0.78125, "sample": [-6.8634796142578125, 28.79297637939453, 3.589824676513672, 9.814949035644531, -0.06648635864257812, 14.442733764648438, 5.057403564453125, 20.763809204101562, 18.843040466308594, -2.5873985290527344, 3.9435577392578125, 1.709228515625, 5.8881072998046875, -3.3098373413085938, 0.3978691101074219, 7.2097015380859375, 4.171073913574219, 26.070571899414062, -2.0776443481445312, 18.303646087646484, 5.9880218505859375, 10.59161376953125, 4.412017822265625, -14.7872314453125, -5.321624755859375, 11.424304962158203, 27.092697143554688, 1.4374618530273438, -15.26469612121582, 5.6978607177734375, 26.297260284423828, 14.630577087402344, 1.6296405792236328, 4.008596420288086, 14.65878677368164, 0.3012542724609375, 31.549217224121094, 0.5232124328613281, -18.21734619140625, 14.663890838623047, -2.705810546875, 4.406364440917969, 23.486053466796875, -8.63492202758789, 22.253562927246094, 22.38196563720703, 6.638904571533203, 4.30681037902832, 5.051876068115234, 19.32709503173828, 1.891693115234375, -0.22606658935546875, 0.15528106689453125, 17.401382446289062, 3.0977859497070312, -0.10160446166992188, -2.8040313720703125, 16.926132202148438, 0.1944732666015625, 5.858409881591797, 8.15081787109375, 5.001735687255859, 7.335161209106445, 8.780738830566406], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000384.npy"}
{"epoch": 0.5804988662131519, "step": 385, "batch_size": 64, "mean": 7.610535144805908, "std": 10.311495780944824, "min": -16.477710723876953, "p10": -4.180408477783203, "median": 5.8553466796875, "p90": 22.997685623168945, "max": 29.12989044189453, "pos_frac": 0.734375, "sample": [-16.477710723876953, -2.9067306518554688, 17.94697380065918, 19.4932861328125, 8.181312561035156, 1.3520355224609375, 3.1522216796875, 9.351219177246094, 4.663597106933594, 15.427249908447266, -4.3817596435546875, 12.920360565185547, 12.639719009399414, 15.726188659667969, 4.8800811767578125, 26.78626251220703, 22.946044921875, 21.25653076171875, 1.3654232025146484, 8.514461517333984, -4.313743591308594, 0.886322021484375, -2.251861572265625, 13.752944946289062, 23.019817352294922, 21.5732421875, 6.797946929931641, 3.893310546875, -6.458782196044922, -2.6266098022460938, 6.251869201660156, -4.513999938964844, 16.879989624023438, 5.600341796875, 4.807991027832031, 23.247642517089844, 7.634021759033203, 12.745235443115234, 6.211324691772461, 13.065908432006836, 0.5087051391601562, -9.621986389160156, 5.964080810546875, 5.746612548828125, -1.0511398315429688, 25.992431640625, 11.902633666992188, -1.0532073974609375, -1.0970191955566406, 2.47723388671875, -1.2252349853515625, 8.62205696105957, -7.2693328857421875, -3.869293212890625, 12.602188110351562, 5.412017822265625, 2.6750259399414062, 23.135482788085938, -0.43946075439453125, 28.45342254638672, -2.3975677490234375, 1.0601844787597656, 29.12989044189453, 22.376853942871094], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000385.npy"}
{"epoch": 0.582010582010582, "step": 386, "batch_size": 64, "mean": 4.438615798950195, "std": 10.909392356872559, "min": -25.04761505126953, "p10": -8.794833755493164, "median": 4.159341812133789, "p90": 18.992879486083986, "max": 27.244033813476562, "pos_frac": 0.734375, "sample": [23.834747314453125, -2.337799072265625, -0.5301475524902344, 14.726020812988281, 18.29235076904297, -4.682464599609375, 2.18927001953125, 1.2452850341796875, 3.7692947387695312, 15.348152160644531, 3.4508132934570312, 7.507049560546875, -20.541034698486328, 1.1335906982421875, 2.2063140869140625, 13.841583251953125, -4.766815185546875, 22.48440170288086, 8.751876831054688, 10.044708251953125, 4.225017547607422, 11.934547424316406, 5.45281982421875, -1.43048095703125, -9.115962982177734, 8.41769790649414, 15.329605102539062, 6.114189147949219, 4.551639556884766, 2.8993606567382812, 14.699928283691406, 5.9964141845703125, -3.0477218627929688, -8.0455322265625, -9.819633483886719, 8.358016967773438, -12.280147552490234, 8.655448913574219, 5.956729888916016, 27.244033813476562, 1.69879150390625, -17.430561065673828, 4.093666076660156, 2.2812652587890625, 19.293106079101562, -5.709362030029297, 5.36151123046875, 6.836189270019531, 1.9154739379882812, 26.536117553710938, 0.44124603271484375, -5.11480712890625, -17.737777709960938, 9.584365844726562, 24.109344482421875, 0.1007843017578125, 5.3358001708984375, 0.21512603759765625, 16.520553588867188, 20.981521606445312, 4.52294921875, -0.8240432739257812, 4.044624328613281, -25.04761505126953], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000386.npy"}
{"epoch": 0.5835222978080121, "step": 387, "batch_size": 64, "mean": 8.19107437133789, "std": 10.592162132263184, "min": -12.648788452148438, "p10": -4.275443649291992, "median": 6.519536018371582, "p90": 22.787513732910163, "max": 33.70013427734375, "pos_frac": 0.734375, "sample": [4.0783233642578125, -0.16994094848632812, 27.468460083007812, -5.284933090209961, 3.6488876342773438, 29.958526611328125, -5.316612243652344, -0.26789093017578125, -0.2528839111328125, 33.70013427734375, 7.494878768920898, -0.10317611694335938, 10.072586059570312, 6.5902252197265625, 17.69158172607422, 2.3447341918945312, 5.9010467529296875, 19.850845336914062, 12.734638214111328, 4.17999267578125, 12.302886962890625, 25.581501007080078, 9.853767395019531, -4.559608459472656, -2.168262481689453, 17.687355041503906, -2.8712921142578125, 6.448846817016602, 16.57672882080078, -3.6123924255371094, 0.999420166015625, 5.188423156738281, 5.8603363037109375, 17.829200744628906, -11.241973876953125, 20.362823486328125, 16.267929077148438, 0.7879753112792969, 9.313016891479492, -12.648788452148438, -2.655658721923828, 15.671463012695312, 12.706741333007812, -9.545379638671875, -1.0576629638671875, 10.920326232910156, 25.281890869140625, 26.37891387939453, 0.9085845947265625, 16.653945922851562, -6.1890106201171875, 6.634452819824219, 2.6107940673828125, 3.2712554931640625, -1.640838623046875, 13.450178146362305, 2.460113525390625, 8.08089828491211, 18.240646362304688, 23.49285888671875, 21.141708374023438, 9.546943664550781, 5.160808563232422, 20.4274845123291], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000387.npy"}
{"epoch": 0.5850340136054422, "step": 388, "batch_size": 64, "mean": 4.701900005340576, "std": 10.795165061950684, "min": -18.474472045898438, "p10": -10.873762893676757, "median": 4.295192718505859, "p90": 19.968321228027346, "max": 24.961429595947266, "pos_frac": 0.703125, "sample": [14.00799560546875, 16.62114715576172, 20.758712768554688, 16.87287139892578, 0.34381103515625, 9.85015869140625, 9.471794128417969, 4.0225677490234375, 5.374275207519531, -13.514053344726562, 7.272483825683594, 12.898101806640625, -11.587657928466797, 5.548946380615234, 1.0592880249023438, 18.392898559570312, -0.5880851745605469, 23.067245483398438, 2.3417892456054688, 11.14340591430664, 20.18798828125, 2.9580764770507812, 24.78445053100586, 4.567817687988281, 17.951980590820312, -9.2080078125, 5.6199951171875, 9.248016357421875, 7.798583984375, 2.7731857299804688, -2.2473316192626953, -12.65384292602539, 0.9549674987792969, 24.961429595947266, -8.793901443481445, 1.52117919921875, 0.0806427001953125, 0.2345733642578125, 6.732915878295898, -12.671005249023438, 13.525367736816406, 19.455764770507812, -5.2753753662109375, 11.325363159179688, 14.629232406616211, 12.256904602050781, -3.7920303344726562, 2.108154296875, -4.7394561767578125, 20.82207489013672, -3.0885086059570312, 22.896196365356445, -4.4861602783203125, 2.7758026123046875, -18.474472045898438, -15.270164489746094, 3.455169677734375, 9.550750732421875, -6.803203582763672, -7.303924560546875, 5.665107727050781, -1.3933334350585938, -13.106704711914062, 8.029647827148438], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000388.npy"}
{"epoch": 0.5865457294028723, "step": 389, "batch_size": 64, "mean": 6.829051971435547, "std": 10.263044357299805, "min": -16.160987854003906, "p10": -6.245392799377441, "median": 7.4445648193359375, "p90": 19.549539184570314, "max": 29.204910278320312, "pos_frac": 0.71875, "sample": [7.415740966796875, -15.263010025024414, 3.7463302612304688, 14.500930786132812, 9.279439926147461, 9.63555908203125, 10.811206817626953, 23.365325927734375, 19.576675415039062, 10.965179443359375, -5.991668701171875, -16.160987854003906, 10.777511596679688, 9.335166931152344, 5.995231628417969, 19.264633178710938, 3.1174468994140625, 19.094547271728516, 4.7816925048828125, 10.04620361328125, 13.511459350585938, 14.562309265136719, -0.10058975219726562, 6.9346160888671875, 16.705062866210938, -0.6522293090820312, 19.486221313476562, -2.425384521484375, 7.649078369140625, 11.310909271240234, -7.5284576416015625, -2.554157257080078, -3.0217666625976562, 29.204910278320312, 23.303470611572266, 2.6732025146484375, 22.456954956054688, 0.5022506713867188, -3.7132911682128906, -9.60108757019043, 9.0281982421875, -4.912876129150391, 3.0532264709472656, -6.354131698608398, 7.473388671875, 6.54205322265625, 15.89300537109375, -4.002204895019531, -8.838546752929688, 8.353302001953125, 1.2664566040039062, 3.673847198486328, -0.00067138671875, -2.4340314865112305, -9.18197250366211, 26.635475158691406, 27.366973876953125, 12.662185668945312, 10.23565673828125, 15.422515869140625, 2.116291046142578, 15.119754791259766, 6.938255310058594, 8.006511688232422], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000389.npy"}
{"epoch": 0.5880574452003023, "step": 390, "batch_size": 64, "mean": 6.88944149017334, "std": 10.277053833007812, "min": -19.95331573486328, "p10": -4.988536834716797, "median": 5.249539375305176, "p90": 19.265196228027346, "max": 34.43048095703125, "pos_frac": 0.765625, "sample": [-4.825675964355469, 2.60882568359375, 13.977981567382812, -3.9196395874023438, 18.015939712524414, 3.076366424560547, 5.380218505859375, 19.794601440429688, -1.5364570617675781, 11.823127746582031, 1.740020751953125, -5.0583343505859375, 15.44879150390625, 1.2365760803222656, 0.8295135498046875, 34.43048095703125, 13.682491302490234, -8.494447708129883, 14.101600646972656, -5.1294097900390625, 14.462188720703125, 12.739631652832031, 7.062788009643555, 25.8028564453125, 14.654449462890625, 15.415084838867188, -1.81787109375, -13.311843872070312, 3.4013710021972656, 28.89362335205078, -7.492431640625, 4.826080322265625, 3.8981475830078125, 5.572235107421875, -8.680082321166992, 10.449371337890625, -19.95331573486328, 15.044830322265625, 27.05927276611328, -0.4058189392089844, 18.779129028320312, 1.2637214660644531, 0.4151611328125, 4.204959869384766, 0.6238975524902344, 13.624820709228516, 5.733795166015625, 3.3669586181640625, 19.4735107421875, 5.1843109130859375, 10.695594787597656, 6.6087493896484375, -0.06560516357421875, 10.301864624023438, 5.314767837524414, 4.128623962402344, 3.354715347290039, -1.1418418884277344, 1.5515546798706055, -0.9313564300537109, 18.59326934814453, 15.550277709960938, 6.671356201171875, 22.818912506103516], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000390.npy"}
{"epoch": 0.5895691609977324, "step": 391, "batch_size": 64, "mean": 8.979471206665039, "std": 13.015582084655762, "min": -23.833663940429688, "p10": -4.840549087524414, "median": 8.162964820861816, "p90": 23.265065002441407, "max": 46.390625, "pos_frac": 0.71875, "sample": [2.1429901123046875, -23.833663940429688, 23.340808868408203, 11.768280029296875, 12.209571838378906, 19.514015197753906, 0.20324325561523438, 25.67388916015625, 18.19263458251953, 35.564208984375, 5.117973327636719, -15.571739196777344, 17.478988647460938, -2.3333168029785156, 15.4954833984375, 29.381271362304688, -4.506557464599609, 13.962821960449219, -3.2705535888671875, -0.5948257446289062, -4.9836883544921875, 10.275623321533203, 4.741729736328125, 1.529296875, 8.8565673828125, -2.004474639892578, -15.91964340209961, -7.873382568359375, 12.268241882324219, 4.044422149658203, 46.390625, 8.979080200195312, 6.921627044677734, -0.8652534484863281, -3.50714111328125, -0.5673675537109375, 2.8531665802001953, 11.346633911132812, 21.890960693359375, 19.731605529785156, 6.894954681396484, 20.107261657714844, -9.13990592956543, -8.927188873291016, 4.080028533935547, 18.415771484375, 27.562156677246094, 5.3515472412109375, 21.74121856689453, 22.83275604248047, -4.1125640869140625, -4.201316833496094, 26.79853057861328, 18.230079650878906, 21.890457153320312, 23.088329315185547, 20.54302978515625, 11.880943298339844, -1.2691497802734375, 7.469362258911133, 3.1557998657226562, 17.975021362304688, 13.537628173828125, 6.7373046875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000391.npy"}
{"epoch": 0.5910808767951625, "step": 392, "batch_size": 64, "mean": 9.90990161895752, "std": 12.02556324005127, "min": -17.316787719726562, "p10": -3.138412666320799, "median": 9.996013641357422, "p90": 25.587710189819337, "max": 39.585479736328125, "pos_frac": 0.8125, "sample": [-14.27337646484375, 5.877374649047852, 16.96855926513672, 4.747100830078125, 10.068099975585938, 1.1789512634277344, 25.83407974243164, 0.7689628601074219, 22.136611938476562, 12.168148040771484, 14.050384521484375, -17.316787719726562, 14.082542419433594, 11.487274169921875, 28.19507598876953, 13.16522216796875, -1.2794952392578125, 6.3244781494140625, 25.012847900390625, 9.923927307128906, 6.013008117675781, -3.6963424682617188, 18.321746826171875, -9.616922378540039, 12.096359252929688, 26.601272583007812, 22.851741790771484, 4.622180938720703, 11.783760070800781, -7.9023895263671875, -1.0258102416992188, 36.07713317871094, 24.251907348632812, 1.9561996459960938, -8.245857238769531, 10.530891418457031, 1.1735038757324219, 8.81991958618164, 8.334884643554688, 21.35663604736328, 36.457183837890625, 29.016098022460938, -1.3584175109863281, 13.49871826171875, 1.8564720153808594, 1.8558883666992188, 0.4764251708984375, 7.059989929199219, 23.156341552734375, 15.953216552734375, -1.0983695983886719, 39.585479736328125, 0.39040184020996094, 12.00225830078125, -5.309539794921875, -1.8365764617919922, 14.550048828125, 5.1596527099609375, 14.682281494140625, 19.384811401367188, 1.2145137786865234, 17.35970687866211, 3.5475540161132812, 13.20574951171875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000392.npy"}
{"epoch": 0.5925925925925926, "step": 393, "batch_size": 64, "mean": 6.895089149475098, "std": 9.737110137939453, "min": -11.17803955078125, "p10": -3.4431907653808596, "median": 5.366684913635254, "p90": 21.87810821533203, "max": 31.04327392578125, "pos_frac": 0.703125, "sample": [1.7863998413085938, -6.105857849121094, 21.823890686035156, 8.676116943359375, 28.925342559814453, 7.9329376220703125, -1.1081962585449219, 9.277366638183594, 3.5037460327148438, 7.793701171875, -3.4592971801757812, -0.45934295654296875, 21.951019287109375, 6.6216278076171875, 8.6522216796875, -9.38699722290039, 7.2191619873046875, 21.041160583496094, -4.317619323730469, 4.504096984863281, -1.4302825927734375, 5.281078338623047, 0.704559326171875, 24.12120819091797, -0.4930572509765625, 21.901344299316406, -0.2593536376953125, 18.16729736328125, 14.119277954101562, 18.473541259765625, 2.3069610595703125, -5.538841247558594, 12.78570556640625, 22.55284881591797, 1.495631217956543, 15.945587158203125, -8.930683135986328, 16.50872802734375, 0.41828155517578125, -1.670074462890625, 8.2484130859375, 0.5232696533203125, 4.718227386474609, 4.462558746337891, -1.2339248657226562, 5.452291488647461, 16.893051147460938, -0.37625885009765625, 4.465229034423828, 1.6931610107421875, 23.42729377746582, 31.04327392578125, 8.823326110839844, -2.3237876892089844, 8.809459686279297, 9.518848419189453, -0.39086151123046875, -2.0930938720703125, -11.17803955078125, -3.405609130859375, 6.129220962524414, 19.76848602294922, 5.8802337646484375, 11.099693298339844], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000393.npy"}
{"epoch": 0.5941043083900227, "step": 394, "batch_size": 64, "mean": 6.955907821655273, "std": 9.11688232421875, "min": -15.614181518554688, "p10": -2.678060913085937, "median": 7.076478958129883, "p90": 18.001951599121096, "max": 29.433059692382812, "pos_frac": 0.765625, "sample": [11.443328857421875, 26.692825317382812, 24.401092529296875, 1.7079925537109375, 0.14194488525390625, 2.2533798217773438, -15.614181518554688, 6.750644683837891, 13.349151611328125, 6.365024566650391, 23.164642333984375, 8.675355911254883, 4.519100189208984, -1.1769180297851562, -10.435111999511719, 14.183685302734375, -1.6737747192382812, -5.295145034790039, -2.94183349609375, 5.612281799316406, 21.19739532470703, -1.1633529663085938, 6.136577606201172, 5.584815979003906, 7.843910217285156, 14.23678207397461, 11.236764907836914, 22.40496063232422, 7.505016326904297, 7.402313232421875, -9.576702117919922, 6.337196350097656, 29.433059692382812, 10.207527160644531, 4.412250518798828, 15.184524536132812, 1.3687000274658203, 13.541828155517578, 6.5496368408203125, -0.2330322265625, 9.491565704345703, -1.9111518859863281, -10.82196044921875, 10.421554565429688, 10.225288391113281, 18.278427124023438, -6.903724670410156, 10.066204071044922, -1.4022941589355469, 7.966545104980469, -2.062591552734375, 17.356842041015625, 3.799346923828125, 8.482154846191406, -0.99212646484375, 14.515422821044922, 2.8440170288085938, 11.772134780883789, 7.828638076782227, 1.1062164306640625, 8.262046813964844, 15.629287719726562, 4.507598876953125, 14.985015869140625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000394.npy"}
{"epoch": 0.5956160241874527, "step": 395, "batch_size": 64, "mean": 6.41133975982666, "std": 10.482383728027344, "min": -14.89407730102539, "p10": -4.683198547363281, "median": 5.218574523925781, "p90": 21.294877052307132, "max": 35.3714599609375, "pos_frac": 0.734375, "sample": [14.96337890625, 20.473770141601562, 5.759521484375, -0.173492431640625, 18.830276489257812, 9.04095458984375, -0.6448459625244141, 12.905387878417969, 21.844261169433594, -3.3744449615478516, 1.4530906677246094, 1.978302001953125, 21.789474487304688, 15.118915557861328, 10.153297424316406, 21.646780014038086, -4.810676574707031, 5.268360137939453, -6.488372802734375, 1.837371826171875, -3.5797042846679688, 0.6855392456054688, 1.45111083984375, 7.411125183105469, 18.221229553222656, -7.924961090087891, 9.445552825927734, 5.356742858886719, 1.7229328155517578, 1.7785568237304688, 12.715625762939453, 35.3714599609375, 2.02880859375, 0.09906768798828125, -11.021377563476562, 14.90826416015625, -14.89407730102539, 7.8157806396484375, 12.368431091308594, -2.3832759857177734, 10.463920593261719, 0.44807910919189453, 27.29193878173828, 2.320819854736328, 22.1199951171875, 26.305484771728516, -3.2478809356689453, 18.096893310546875, -9.235702514648438, -2.0881500244140625, 1.1229667663574219, 16.355178833007812, -0.74725341796875, 0.7176971435546875, 7.3143768310546875, 16.595069885253906, -3.6393585205078125, 5.168788909912109, 10.592870712280273, -4.385749816894531, 6.052667617797852, -14.268989562988281, 2.2799205780029297, 15.544036865234375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000395.npy"}
{"epoch": 0.5971277399848829, "step": 396, "batch_size": 64, "mean": 6.846286773681641, "std": 8.950173377990723, "min": -22.94853973388672, "p10": -0.7862117767333979, "median": 5.793625831604004, "p90": 16.684591674804697, "max": 30.48755645751953, "pos_frac": 0.859375, "sample": [12.87198257446289, 28.746002197265625, 18.56446075439453, 25.096694946289062, 14.605751037597656, 9.170135498046875, 9.927299499511719, 8.428112030029297, 5.517578125, 13.331460952758789, 9.926101684570312, 3.43218994140625, 11.454566955566406, -11.575374603271484, 2.5566062927246094, 0.10948657989501953, 12.584075927734375, 12.352714538574219, 4.648983001708984, 9.021747589111328, 4.133148193359375, -0.21091842651367188, 30.48755645751953, -1.3631401062011719, 1.0586624145507812, 4.131591796875, 4.974250793457031, -22.94853973388672, 17.79930877685547, 5.26214599609375, 9.29737663269043, -14.473876953125, 1.925210952758789, 25.204872131347656, 2.61956787109375, 13.245216369628906, 7.044750213623047, 7.312828063964844, 12.849205017089844, 5.360504150390625, 2.071727752685547, 12.968339920043945, 1.9514236450195312, 13.400848388671875, 1.17327880859375, 9.906364440917969, 4.247356414794922, 11.632904052734375, 11.975532531738281, 2.4723281860351562, 1.4322586059570312, -0.29151153564453125, 6.069673538208008, -0.9982261657714844, 6.424705505371094, 7.830841064453125, 2.4389877319335938, 9.949853897094727, 4.184358596801758, 5.19190788269043, 1.4790496826171875, -6.515632629394531, -2.8898544311523438, 17.575523376464844], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000396.npy"}
{"epoch": 0.5986394557823129, "step": 397, "batch_size": 64, "mean": 4.687417507171631, "std": 10.737100601196289, "min": -28.860130310058594, "p10": -8.509320831298826, "median": 5.687675476074219, "p90": 16.37197551727295, "max": 28.08536148071289, "pos_frac": 0.71875, "sample": [-0.9259033203125, 2.356414794921875, -15.650718688964844, 20.520286560058594, -1.5441246032714844, 18.246200561523438, 5.268959045410156, 7.6495208740234375, 9.638687133789062, 7.7855224609375, 2.283749580383301, 7.339836120605469, 13.436279296875, 9.655494689941406, 8.615592956542969, 15.4473876953125, -20.52337646484375, -5.630523681640625, 7.412860870361328, -0.21685409545898438, 12.583351135253906, 6.10235595703125, 8.782808303833008, 19.082427978515625, 4.636894226074219, -18.577911376953125, -2.7057952880859375, 6.960487365722656, 13.721748352050781, 13.435028076171875, 11.92190933227539, -0.23929214477539062, 5.2729949951171875, 3.8163070678710938, 16.003557205200195, 1.8240966796875, -1.0232658386230469, 21.055938720703125, 13.660598754882812, 6.201358795166016, 11.812274932861328, 16.529869079589844, 10.957450866699219, 10.221038818359375, -28.860130310058594, -0.04419708251953125, 20.870330810546875, -18.112335205078125, -9.403472900390625, 3.2500839233398438, 7.010688781738281, 4.651893615722656, 2.964263916015625, 4.28547477722168, -3.4459762573242188, -1.3263778686523438, 6.682666778564453, 1.8710803985595703, 0.3076019287109375, 15.273345947265625, -16.076171875, -6.422966003417969, 5.232021331787109, 28.08536148071289], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000397.npy"}
{"epoch": 0.600151171579743, "step": 398, "batch_size": 64, "mean": 8.061240196228027, "std": 11.230545043945312, "min": -14.300384521484375, "p10": -4.985481262207031, "median": 6.931087493896484, "p90": 25.075835418701175, "max": 35.72306823730469, "pos_frac": 0.734375, "sample": [-4.170570373535156, 10.489395141601562, 7.826385498046875, 5.1286468505859375, 12.264636993408203, 1.7943344116210938, 35.72306823730469, 9.14306640625, -3.2809906005859375, 26.142417907714844, 11.927047729492188, 14.904808044433594, -6.180732727050781, -5.643028259277344, 22.509422302246094, -3.9943313598632812, 21.454853057861328, -4.99658203125, 10.177780151367188, -0.7681808471679688, 10.317977905273438, 7.989864349365234, 29.842418670654297, 26.7476806640625, 0.8305168151855469, 21.793609619140625, 23.544471740722656, 14.319770812988281, 2.0827503204345703, 10.847496032714844, -1.6886425018310547, 13.98870849609375, 6.035789489746094, 3.5264892578125, -4.9595794677734375, -5.94074821472168, 0.37548828125, 24.690475463867188, 5.9928436279296875, 1.98626708984375, -11.45416259765625, -5.6175079345703125, 17.798114776611328, 12.418037414550781, 9.24857234954834, 25.240989685058594, 1.3135051727294922, 1.657867431640625, -2.6701278686523438, 16.63643455505371, -0.7471237182617188, 31.38299560546875, 5.7956390380859375, 1.3011474609375, 11.903427124023438, -2.8238983154296875, 15.26776123046875, -14.300384521484375, 0.7444381713867188, -0.072357177734375, 9.732658386230469, 13.056598663330078, 25.489147186279297, 1.8425025939941406], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000398.npy"}
{"epoch": 0.6016628873771731, "step": 399, "batch_size": 64, "mean": 7.655560493469238, "std": 11.136283874511719, "min": -16.31692886352539, "p10": -5.631182861328125, "median": 6.897611618041992, "p90": 20.91497669219971, "max": 36.33660888671875, "pos_frac": 0.765625, "sample": [18.34290313720703, 0.8602447509765625, 7.693672180175781, 10.39556884765625, -16.31692886352539, 14.325225830078125, 20.40953826904297, 5.655546188354492, -5.687171936035156, -8.805397033691406, -11.483238220214844, 12.648651123046875, 7.1148529052734375, -5.500541687011719, 5.0922088623046875, 6.857574462890625, 6.7462615966796875, 8.018798828125, 11.144996643066406, 31.214309692382812, -8.383636474609375, 0.209686279296875, -11.42431640625, 19.010807037353516, 18.336822509765625, -1.5564727783203125, 11.379486083984375, 36.33660888671875, 5.811134338378906, 1.9291954040527344, 1.9458770751953125, -2.4530296325683594, 3.440643310546875, 2.972553253173828, -1.8451080322265625, 2.4514389038085938, 2.1599807739257812, 33.12158203125, 15.03839111328125, 11.52142333984375, 2.177398681640625, 2.0549373626708984, 27.49791717529297, -1.5159149169921875, 13.808677673339844, 7.044708251953125, 7.489204406738281, 6.937648773193359, -6.7327423095703125, -4.363037109375, 20.674531936645508, 16.70441436767578, 21.018024444580078, 8.021560668945312, 13.972713470458984, 24.8968505859375, 18.318572998046875, 26.726242065429688, 0.745025634765625, 6.033992767333984, 16.209487915039062, -1.5849761962890625, 7.1194610595703125, -2.0289764404296875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000399.npy"}
{"epoch": 0.6031746031746031, "step": 400, "batch_size": 64, "mean": 4.709937572479248, "std": 10.769264221191406, "min": -17.898818969726562, "p10": -8.584197616577148, "median": 3.2243995666503906, "p90": 17.101212120056157, "max": 31.962158203125, "pos_frac": 0.65625, "sample": [1.7677764892578125, 7.7197418212890625, -4.036888122558594, 4.29052734375, 11.921951293945312, 11.519672393798828, 1.9492950439453125, 12.373653411865234, 17.743682861328125, 24.14952850341797, -0.6276473999023438, -3.5223846435546875, 3.3647537231445312, 0.44664764404296875, 6.318845748901367, 2.929779052734375, -3.65826416015625, 26.728130340576172, 2.2714614868164062, 15.602113723754883, 2.298553466796875, -8.67709732055664, 11.124099731445312, -9.655279159545898, -10.3780517578125, 13.741924285888672, -2.9965896606445312, 7.0214385986328125, 1.511688232421875, -2.659820556640625, -11.4378662109375, -2.7273330688476562, 11.816314697265625, 0.39337921142578125, 31.962158203125, -12.749176025390625, -6.594398498535156, -0.558868408203125, 14.320541381835938, 5.053203582763672, 10.681961059570312, 12.86386489868164, -17.898818969726562, 18.0928955078125, -5.045074462890625, 11.864025115966797, -1.3825302124023438, -0.5408477783203125, 5.145263671875, 31.4981689453125, 3.7207794189453125, 12.523025512695312, -5.9470672607421875, 4.771903991699219, -8.367431640625, 15.577430725097656, 27.33283233642578, 10.619277954101562, 0.5129432678222656, 3.08404541015625, -14.037193298339844, -2.320892333984375, 15.068214416503906, 3.55804443359375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000400.npy"}
{"epoch": 0.6046863189720333, "step": 401, "batch_size": 64, "mean": 9.474281311035156, "std": 10.987771034240723, "min": -17.275222778320312, "p10": -2.8555683135986323, "median": 8.50930118560791, "p90": 23.145932388305667, "max": 43.886810302734375, "pos_frac": 0.859375, "sample": [-3.022186279296875, 22.088401794433594, 26.2352294921875, 6.344505310058594, 2.384920120239258, 1.7204933166503906, 6.828399658203125, 8.051254272460938, 12.977470397949219, -13.960311889648438, 18.227066040039062, 11.340377807617188, -5.917667388916016, -5.6366119384765625, 14.187728881835938, 0.8453826904296875, 11.823516845703125, 5.621551513671875, 22.817703247070312, 21.63134002685547, 8.163253784179688, -1.9109001159667969, 6.619873046875, 15.540679931640625, -6.5433502197265625, -17.275222778320312, 7.560943603515625, 6.72175407409668, 13.539527893066406, 2.991790771484375, 9.716499328613281, 8.965667724609375, 4.263650894165039, 19.977210998535156, 1.20806884765625, 23.286602020263672, 3.8402175903320312, 13.15643310546875, 0.8660602569580078, 8.925979614257812, 0.7780075073242188, 26.803485870361328, 7.462432861328125, -11.83132553100586, 16.776397705078125, 9.6275634765625, 5.120025634765625, 26.209938049316406, 13.610763549804688, 17.975181579589844, 8.600578308105469, 4.974916458129883, 10.907245635986328, 1.2240066528320312, -2.4667930603027344, 8.418024063110352, 13.576263427734375, 14.43182373046875, 29.825321197509766, 31.35870361328125, 16.91274642944336, 4.989955902099609, 12.978645324707031, 43.886810302734375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000401.npy"}
{"epoch": 0.6061980347694633, "step": 402, "batch_size": 64, "mean": 7.391202926635742, "std": 10.920918464660645, "min": -16.15277099609375, "p10": -5.080738067626953, "median": 6.366418838500977, "p90": 19.273155975341798, "max": 35.041229248046875, "pos_frac": 0.765625, "sample": [15.441946029663086, 12.986038208007812, 10.487060546875, 16.220375061035156, 5.281425476074219, 14.275436401367188, 32.77489471435547, 14.986900329589844, 7.2683563232421875, 14.662818908691406, 0.6124916076660156, 30.052101135253906, 16.74135971069336, -2.5300521850585938, 13.209152221679688, 5.9030609130859375, 5.4803009033203125, 5.496797561645508, -1.3653717041015625, 7.229156494140625, 29.39898681640625, 2.6281700134277344, -1.0568885803222656, -13.354942321777344, 19.088088989257812, -1.0527610778808594, 7.398502349853516, 16.36712646484375, 3.254802703857422, 6.484886169433594, 35.041229248046875, 32.36009216308594, -4.814857482910156, 4.870277404785156, 4.545684814453125, 14.465583801269531, 7.667320251464844, 20.534652709960938, 9.812210083007812, 7.702899932861328, -8.555850982666016, -5.1946868896484375, -16.15277099609375, 14.910846710205078, 8.149810791015625, 3.8519363403320312, 1.6364288330078125, -5.9979095458984375, 5.197502136230469, -3.6652374267578125, 19.35247039794922, 7.249176025390625, 7.204227447509766, 17.970108032226562, -10.400604248046875, 0.6148147583007812, 0.23087310791015625, -3.4015045166015625, -11.375701904296875, 14.109596252441406, 4.655982971191406, 6.247951507568359, 1.0551223754882812, -1.2108993530273438], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000402.npy"}
{"epoch": 0.6077097505668935, "step": 403, "batch_size": 64, "mean": 9.562971115112305, "std": 13.362582206726074, "min": -29.20163345336914, "p10": -6.730281066894531, "median": 8.361898422241211, "p90": 28.037760543823246, "max": 42.9075927734375, "pos_frac": 0.796875, "sample": [0.8087158203125, 1.867645263671875, 19.085968017578125, 9.896812438964844, 11.766729354858398, 11.392486572265625, 37.893089294433594, 18.510377883911133, -0.4046478271484375, 24.441970825195312, 20.763416290283203, -6.6258087158203125, 20.005157470703125, 1.95367431640625, 19.291034698486328, 9.5504150390625, 8.719648361206055, 2.381481170654297, 7.6112060546875, 5.966339111328125, -2.1827125549316406, -12.88211441040039, 42.9075927734375, 33.481937408447266, 5.9967498779296875, 16.28777313232422, 30.727767944335938, 0.43041229248046875, 28.546977996826172, 3.0024261474609375, 6.636970520019531, 6.189002990722656, 6.444309234619141, 1.1972923278808594, 10.234073638916016, -7.704524993896484, 3.632720947265625, 33.84270095825195, 3.860590934753418, 10.66619873046875, 2.2096328735351562, -2.9646053314208984, 30.900636672973633, -3.0456924438476562, 17.761720657348633, -6.861907958984375, 8.059993743896484, 13.566322326660156, 14.846851348876953, -6.775054931640625, 15.20013427734375, 21.687698364257812, -29.20163345336914, -10.085052490234375, 16.754669189453125, -11.187763214111328, 24.504528045654297, 15.475784301757812, 3.5512771606445312, 26.849586486816406, -3.5804595947265625, 2.536834716796875, 8.663803100585938, 16.971006393432617], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000403.npy"}
{"epoch": 0.6092214663643235, "step": 404, "batch_size": 64, "mean": 8.658452033996582, "std": 11.33410930633545, "min": -13.80584716796875, "p10": -3.967479133605957, "median": 6.177223205566406, "p90": 24.29929885864258, "max": 32.299102783203125, "pos_frac": 0.75, "sample": [12.031951904296875, 6.785728454589844, 23.318824768066406, 7.7330322265625, 2.1838111877441406, 11.377151489257812, 24.598907470703125, 4.43077278137207, -3.7789230346679688, 0.419281005859375, 21.148544311523438, -0.30377197265625, -8.20224380493164, 5.569393157958984, -0.2761268615722656, 31.34814453125, -4.4876556396484375, -1.8198442459106445, 23.60021209716797, 20.84119415283203, 25.51778793334961, 6.3303985595703125, 4.1971893310546875, 31.260345458984375, -7.157928466796875, 4.863162994384766, -6.784645080566406, 20.2080078125, 12.998161315917969, 16.26663589477539, 4.81732177734375, 6.0240478515625, -4.015470504760742, 16.970474243164062, -1.383331298828125, 3.943145751953125, -1.3694610595703125, 18.103172302246094, 9.185256958007812, 0.6240997314453125, 1.2866058349609375, -0.05867767333984375, 17.42473602294922, 7.394073486328125, 16.105323791503906, 7.417095184326172, 14.945724487304688, 18.30316162109375, 32.299102783203125, 5.890720367431641, 0.7061767578125, 20.8192138671875, 32.178680419921875, 2.55755615234375, 26.949424743652344, 4.0635833740234375, 21.473819732666016, -13.80584716796875, -3.855499267578125, -11.512435913085938, 1.1605377197265625, -0.8387832641601562, 9.430423736572266, 6.689430236816406], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000404.npy"}
{"epoch": 0.6107331821617535, "step": 405, "batch_size": 64, "mean": 6.38104772567749, "std": 10.806331634521484, "min": -10.453933715820312, "p10": -4.993819046020508, "median": 4.690029144287109, "p90": 23.37793350219727, "max": 34.218162536621094, "pos_frac": 0.65625, "sample": [23.769378662109375, 1.8808059692382812, -7.666738510131836, 25.246734619140625, 17.222137451171875, -2.33013916015625, 15.152652740478516, -4.45245361328125, -10.137626647949219, -10.453933715820312, -4.7299041748046875, 5.74860954284668, 2.710254669189453, -2.1657886505126953, 13.588333129882812, -3.3974647521972656, 30.219757080078125, -4.468345642089844, 4.628227233886719, 3.48809814453125, 9.64680290222168, 13.815109252929688, 7.057254791259766, 9.626304626464844, -5.108722686767578, -2.0220718383789062, 15.43060302734375, 25.53116226196289, -4.948368072509766, 5.965429306030273, 4.88714599609375, -4.163612365722656, 3.8537979125976562, 13.993118286132812, -2.5489273071289062, 11.064044952392578, 6.512702941894531, -5.982395172119141, 18.951629638671875, 17.82738494873047, 1.2264280319213867, -2.008493423461914, 4.7518310546875, 9.708511352539062, 3.756072998046875, 2.2068023681640625, -0.7409744262695312, 13.630729675292969, 26.348556518554688, 12.53219985961914, -1.5691032409667969, 0.133544921875, 14.057270050048828, 7.881069183349609, 28.907203674316406, 34.218162536621094, -9.827507019042969, 11.010902404785156, -4.683307647705078, 1.7198905944824219, -2.7020416259765625, 22.464561462402344, 7.137050628662109, -5.013298034667969], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000405.npy"}
{"epoch": 0.6122448979591837, "step": 406, "batch_size": 64, "mean": 9.296537399291992, "std": 11.695737838745117, "min": -19.891338348388672, "p10": -4.68857250213623, "median": 7.100131988525391, "p90": 27.22179927825928, "max": 33.08427429199219, "pos_frac": 0.828125, "sample": [15.170158386230469, 1.7857513427734375, -4.371824264526367, -6.081336975097656, 16.700775146484375, 16.522586822509766, 19.354442596435547, 28.6544189453125, -7.724452972412109, 33.08427429199219, 20.899539947509766, 14.962593078613281, 1.1132888793945312, -3.6103858947753906, 27.320266723632812, 19.7529296875, 23.9444580078125, 5.737468719482422, 5.965572357177734, 29.942832946777344, 26.99212646484375, 5.320629119873047, 5.502044677734375, 4.128959655761719, 6.828605651855469, 9.34405517578125, 1.9470539093017578, 10.602348327636719, 0.5945606231689453, 3.8384761810302734, -4.824321746826172, -0.7488651275634766, -19.891338348388672, 0.02259063720703125, 23.94427490234375, 12.877510070800781, 2.0968055725097656, 27.32023048400879, 2.4310150146484375, 31.597618103027344, 8.423622131347656, -7.3524627685546875, 9.369354248046875, 16.947250366210938, 2.6447601318359375, 18.272239685058594, -1.965789794921875, 3.9670677185058594, 10.726207733154297, 23.78453826904297, 7.922813415527344, 4.088653564453125, 5.669158935546875, -8.418655395507812, 8.154775619506836, -15.798725128173828, 6.6711273193359375, 11.568557739257812, 4.062835693359375, 6.358791351318359, 28.785736083984375, 18.366424560546875, 16.310653686523438, 7.3716583251953125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000406.npy"}
{"epoch": 0.6137566137566137, "step": 407, "batch_size": 64, "mean": 6.331610679626465, "std": 10.70129108428955, "min": -13.389984130859375, "p10": -5.357843780517578, "median": 4.203367233276367, "p90": 21.862817382812512, "max": 34.157676696777344, "pos_frac": 0.6875, "sample": [2.1657447814941406, -0.3271484375, -10.109710693359375, 18.506649017333984, -1.6431198120117188, 18.870502471923828, 31.904144287109375, -4.188137054443359, -12.674652099609375, 13.885662078857422, 0.7936935424804688, 9.185298919677734, -0.9090728759765625, -5.463951110839844, -4.251251220703125, -1.8482208251953125, 11.228019714355469, 4.1805419921875, 23.124977111816406, 23.390344619750977, 1.1391448974609375, -4.280607223510742, 11.951854705810547, 9.911903381347656, 3.245513916015625, 18.42858123779297, 2.3928298950195312, 17.501522064208984, 7.0312347412109375, 34.157676696777344, 9.2662353515625, -0.8680343627929688, 23.29705810546875, 14.447998046875, -0.203094482421875, 0.2595367431640625, -5.110260009765625, 10.678466796875, 2.9689292907714844, 0.8784885406494141, -9.6009521484375, 10.30181884765625, 18.91777801513672, 6.660835266113281, 25.934967041015625, 11.8466796875, 10.983863830566406, 5.433143615722656, 4.4938507080078125, 0.31375885009765625, 6.229259490966797, -7.80645751953125, -13.389984130859375, -1.650665283203125, -6.977331161499023, -1.0749778747558594, 7.444675445556641, 27.622623443603516, 4.226192474365234, 0.8289604187011719, -0.23604583740234375, 12.249671936035156, 17.33349609375, 2.2226104736328125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000407.npy"}
{"epoch": 0.6152683295540439, "step": 408, "batch_size": 64, "mean": 9.09909439086914, "std": 10.760048866271973, "min": -10.898429870605469, "p10": -3.4780487060546865, "median": 6.40997314453125, "p90": 25.45304985046387, "max": 30.830825805664062, "pos_frac": 0.796875, "sample": [-1.5077896118164062, 10.125677108764648, 17.394561767578125, 12.205642700195312, 2.101774215698242, -10.426998138427734, -10.898429870605469, 1.2491416931152344, 19.283466339111328, 5.985820770263672, 6.278076171875, 26.846176147460938, 3.4236068725585938, 4.6160888671875, 20.5299072265625, 23.819351196289062, 0.9940147399902344, -2.607513427734375, 13.053661346435547, 5.317012786865234, 7.449485778808594, -1.5844573974609375, -0.11931419372558594, 6.935688018798828, 25.933151245117188, -6.548126220703125, 5.983345031738281, 29.749176025390625, 18.007583618164062, 25.65009307861328, 24.84564971923828, 1.4965267181396484, 6.5418701171875, 7.435611724853516, 20.569091796875, 24.993282318115234, 2.632171630859375, -5.737857818603516, 29.578460693359375, 2.898956298828125, 2.269805908203125, 20.020599365234375, 23.302490234375, 26.761676788330078, -0.082183837890625, 5.734504699707031, 12.247100830078125, -4.059883117675781, 3.757720947265625, 6.7726898193359375, 14.735824584960938, 17.98967742919922, 30.830825805664062, -4.268474578857422, 3.6022567749023438, 13.402717590332031, 1.0218048095703125, -1.6826629638671875, 6.563289642333984, -3.85113525390625, 1.7980194091796875, 4.170188903808594, 8.873359680175781, 17.938152313232422], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000408.npy"}
{"epoch": 0.6167800453514739, "step": 409, "batch_size": 64, "mean": 4.776151657104492, "std": 11.771044731140137, "min": -20.478591918945312, "p10": -8.694893646240233, "median": 1.598292350769043, "p90": 22.508602142333984, "max": 32.40635299682617, "pos_frac": 0.640625, "sample": [5.3203277587890625, 3.7470531463623047, -11.32879638671875, -3.9743881225585938, -2.700702667236328, 6.5305023193359375, 6.396625518798828, -1.506195068359375, 7.50274658203125, -12.064598083496094, 0.1300334930419922, 22.430923461914062, 0.8689651489257812, 27.879837036132812, 12.003944396972656, -0.23870849609375, 6.172767639160156, 3.0575695037841797, -14.626152038574219, 27.123092651367188, 16.780014038085938, -0.7280654907226562, 9.620529174804688, 20.01165771484375, -3.4284915924072266, -9.385482788085938, 0.3123016357421875, 16.682456970214844, 5.345367431640625, 0.9790496826171875, -4.485755920410156, -3.7530670166015625, 14.484657287597656, -3.012228012084961, 8.968475341796875, -1.3237667083740234, 22.541893005371094, 27.204227447509766, -7.083518981933594, 7.781162261962891, -3.897136688232422, 2.7184219360351562, 0.298248291015625, -9.833332061767578, 10.748115539550781, 1.1467132568359375, 0.3014984130859375, 1.492574691772461, -12.512687683105469, 32.40635299682617, -4.656284332275391, 27.879791259765625, -1.802755355834961, 20.436233520507812, -20.478591918945312, 1.704010009765625, 0.750762939453125, -3.55694580078125, 2.0881271362304688, 18.991952896118164, 31.436141967773438, -2.5322189331054688, 3.84881591796875, 8.459632873535156], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000409.npy"}
{"epoch": 0.618291761148904, "step": 410, "batch_size": 64, "mean": 8.771734237670898, "std": 11.876056671142578, "min": -22.54477882385254, "p10": -6.554635238647461, "median": 7.668693542480469, "p90": 24.276403808593752, "max": 35.20182800292969, "pos_frac": 0.75, "sample": [8.496784210205078, 5.186206817626953, 7.448253631591797, 2.615875244140625, 29.125431060791016, -0.314971923828125, 7.6362457275390625, 11.233028411865234, -9.503005981445312, -22.54477882385254, 15.893478393554688, 20.311477661132812, -1.4620399475097656, -7.844062805175781, 25.240379333496094, 20.67359161376953, 26.709171295166016, -10.505111694335938, 19.26177215576172, 23.370620727539062, 28.660919189453125, 17.59746551513672, -9.457283020019531, 14.103757858276367, 5.375946044921875, 19.264087677001953, -5.849395751953125, -7.5550689697265625, 10.412300109863281, -5.693534851074219, 22.016006469726562, -0.060302734375, 4.412143707275391, 22.6728515625, 2.1727828979492188, 22.091156005859375, 6.664604187011719, -6.595134735107422, -2.802278518676758, 23.286117553710938, -1.6739273071289062, -6.460136413574219, 10.734451293945312, 3.119964599609375, 24.104095458984375, 9.739990234375, 26.1964168548584, 5.761070251464844, 10.098358154296875, 7.701141357421875, 5.4791259765625, 1.018218994140625, 35.20182800292969, 14.371994018554688, 0.27728271484375, 6.7303009033203125, 15.285888671875, 3.994121551513672, 10.686460494995117, 8.782691955566406, 24.350250244140625, 3.4919662475585938, -0.6812744140625, 11.335212707519531], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000410.npy"}
{"epoch": 0.6198034769463341, "step": 411, "batch_size": 64, "mean": 8.867551803588867, "std": 12.085742950439453, "min": -16.486236572265625, "p10": -4.3295448303222654, "median": 7.142297744750977, "p90": 27.818688774108892, "max": 35.82394790649414, "pos_frac": 0.734375, "sample": [-14.917526245117188, 18.547775268554688, 28.964988708496094, 18.03630256652832, 12.017532348632812, 7.022743225097656, -8.36309814453125, -2.4326171875, 4.3895416259765625, 2.2792739868164062, 26.603546142578125, 31.546981811523438, 11.527341842651367, 16.943992614746094, 12.910995483398438, 3.5977935791015625, 11.044506072998047, -3.4417648315429688, 18.89861297607422, -0.4730873107910156, -1.09375, 11.035125732421875, 9.941177368164062, 7.064426422119141, 3.7896881103515625, 22.286636352539062, 0.19408416748046875, 18.761798858642578, -0.9564151763916016, -16.486236572265625, -9.988067626953125, -4.391681671142578, -6.5523223876953125, 7.2201690673828125, 19.021461486816406, 3.5718002319335938, 0.4504432678222656, 30.29639434814453, -5.514781951904297, 32.979156494140625, 15.4295654296875, 10.79018783569336, -0.5934982299804688, 22.144054412841797, 15.05682373046875, 35.82394790649414, -3.9611587524414062, 13.92724609375, 1.6748237609863281, -1.02130126953125, 6.270111083984375, 10.909339904785156, 28.33946418762207, 24.374046325683594, -4.184558868408203, 11.421512603759766, 5.9161376953125, 8.118766784667969, 31.959835052490234, 4.905426025390625, 3.830596923828125, 2.8380908966064453, -1.9674453735351562, 9.188369750976562], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000411.npy"}
{"epoch": 0.6213151927437641, "step": 412, "batch_size": 64, "mean": 7.599700927734375, "std": 11.519579887390137, "min": -15.737785339355469, "p10": -5.605552673339841, "median": 6.479970932006836, "p90": 22.739941406250004, "max": 36.749366760253906, "pos_frac": 0.75, "sample": [-8.716072082519531, 29.312320709228516, 9.140640258789062, 30.25925064086914, 4.469173431396484, 15.426460266113281, 2.572906494140625, 18.266910552978516, -14.231584548950195, 5.346735000610352, 1.4615974426269531, 4.841922760009766, 18.61829376220703, 35.08355712890625, 1.825897216796875, -1.1662673950195312, 17.77892303466797, -1.8400630950927734, 1.8612823486328125, 1.3441047668457031, 13.078937530517578, 23.132949829101562, 6.921394348144531, 13.117637634277344, 10.249237060546875, 2.792766571044922, 1.2788848876953125, 2.260723114013672, 6.0760650634765625, 6.20489501953125, 1.4303932189941406, 20.377471923828125, 6.755046844482422, 36.749366760253906, -1.2866783142089844, 8.270732879638672, -6.6954193115234375, -2.4806900024414062, 21.213119506835938, -1.8474502563476562, -10.504989624023438, -3.062530517578125, 11.756309509277344, -9.221004486083984, 12.057018280029297, 13.848747253417969, 25.72607421875, 21.822921752929688, 32.244224548339844, -6.706596374511719, 10.882583618164062, 7.963111877441406, 10.40496826171875, -1.8905754089355469, 7.321987152099609, 8.455780029296875, 0.24982452392578125, 2.5200023651123047, -15.737785339355469, 6.9448089599609375, 13.975326538085938, -1.7925853729248047, 12.440120697021484, -2.572235107421875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000412.npy"}
{"epoch": 0.6228269085411943, "step": 413, "batch_size": 64, "mean": 6.8358330726623535, "std": 13.247966766357422, "min": -22.319446563720703, "p10": -9.195487976074217, "median": 5.3936309814453125, "p90": 26.87375183105469, "max": 37.96125411987305, "pos_frac": 0.734375, "sample": [-0.9657955169677734, -21.763572692871094, 9.822021484375, 16.78472328186035, -6.3677520751953125, 1.0105247497558594, 9.535697937011719, 37.96125411987305, -5.0315704345703125, 27.91912841796875, -21.989044189453125, 15.591506958007812, 6.6748809814453125, -1.6003570556640625, -2.3143463134765625, 4.242824554443359, 10.044281005859375, 30.944583892822266, 0.9359474182128906, -1.573251724243164, 26.433090209960938, 10.771659851074219, 2.5838165283203125, 1.6362266540527344, 5.4167327880859375, -4.5972747802734375, 27.062606811523438, 27.149276733398438, 11.8118896484375, 3.1971588134765625, 14.80804443359375, 4.8058624267578125, 0.45478057861328125, 5.3705291748046875, 35.55059814453125, -10.945327758789062, 9.119163513183594, 1.6679763793945312, 11.143417358398438, 7.87445068359375, -11.340415954589844, 23.977771759033203, 13.861026763916016, -10.052177429199219, 11.670845031738281, -9.774032592773438, -22.319446563720703, 2.0394973754882812, 4.322414398193359, 6.971595764160156, -7.845550537109375, -6.595947265625, 23.960357666015625, 1.075643539428711, 0.8791732788085938, 13.024032592773438, 2.1599788665771484, 13.42547607421875, 7.895851135253906, 16.059181213378906, -1.6673049926757812, 11.316189765930176, 28.68987274169922, 24.582923889160156], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000413.npy"}
{"epoch": 0.6243386243386243, "step": 414, "batch_size": 64, "mean": 8.477437019348145, "std": 12.370136260986328, "min": -23.43968963623047, "p10": -4.090967559814452, "median": 5.8697052001953125, "p90": 25.648884201049807, "max": 38.5177001953125, "pos_frac": 0.78125, "sample": [16.375141143798828, -2.5612716674804688, 19.491470336914062, 3.8083038330078125, 1.8502731323242188, 10.278980255126953, -1.0431060791015625, 15.403366088867188, 7.1545867919921875, 0.8275146484375, 4.770240783691406, 25.135299682617188, 18.94721221923828, -7.400703430175781, 25.755855560302734, 25.39928436279297, -9.1829833984375, -8.888229370117188, 8.83404541015625, -1.5850830078125, 6.558315277099609, 1.0991134643554688, 38.5177001953125, 7.649200439453125, 28.031787872314453, 3.5605926513671875, 4.962554931640625, 6.1656951904296875, 3.488067626953125, 19.02676010131836, 15.637271881103516, -1.7138290405273438, 24.324176788330078, 5.772418975830078, 29.253005981445312, 12.56121826171875, 11.214473724365234, 5.225341796875, 8.66448974609375, 21.384822845458984, 0.53582763671875, -9.455596923828125, 4.115930557250977, -1.7399730682373047, 25.331253051757812, 1.1438751220703125, 4.301780700683594, 6.403415679931641, -23.43968963623047, 1.65008544921875, 20.83734130859375, 5.966991424560547, -12.394979476928711, 1.2700519561767578, 27.021167755126953, 38.35516357421875, 1.7060203552246094, -2.006072998046875, 9.92294692993164, 16.69098663330078, -4.746551513671875, -1.9256477355957031, 1.6686630249023438, 26.58959197998047], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000414.npy"}
{"epoch": 0.6258503401360545, "step": 415, "batch_size": 64, "mean": 7.860967636108398, "std": 14.247897148132324, "min": -34.62291717529297, "p10": -8.684487152099608, "median": 7.189409255981445, "p90": 26.10129089355469, "max": 39.24278259277344, "pos_frac": 0.703125, "sample": [2.0966272354125977, 12.295875549316406, 1.6058845520019531, 16.538293838500977, -2.9139633178710938, 4.780487060546875, 23.376937866210938, 35.28087615966797, -8.221763610839844, 25.730056762695312, -4.787773132324219, -11.74456787109375, 23.810256958007812, 29.93233299255371, -8.882797241210938, 22.469070434570312, 39.24278259277344, 7.715373992919922, 21.624794006347656, -2.10345458984375, 8.810638427734375, 8.86962890625, 2.2091598510742188, -1.325514793395996, 13.974311828613281, 26.260391235351562, -13.54690933227539, 20.196495056152344, 14.500404357910156, 19.164886474609375, -6.937103271484375, 13.844955444335938, 13.270523071289062, -0.8010139465332031, -21.76275634765625, 25.56182861328125, 12.94329833984375, 1.8014774322509766, 27.130714416503906, -13.171539306640625, 6.335365295410156, -0.0898590087890625, -17.43142318725586, -34.62291717529297, 11.91839599609375, 6.663444519042969, 0.24525833129882812, 0.814208984375, 3.9318389892578125, -0.42142486572265625, -1.2972049713134766, 9.0794677734375, 5.053474426269531, 9.352218627929688, -1.071624755859375, 18.71968650817871, 0.40281200408935547, 11.652130126953125, 20.395050048828125, -0.174957275390625, 3.0427398681640625, 11.2294921875, 34.10844421386719, 26.428115844726562], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000415.npy"}
{"epoch": 0.6273620559334845, "step": 416, "batch_size": 64, "mean": 5.816370964050293, "std": 13.042719841003418, "min": -24.358291625976562, "p10": -8.41190299987793, "median": 3.734790802001953, "p90": 24.362416076660157, "max": 38.330718994140625, "pos_frac": 0.65625, "sample": [-0.14149856567382812, -24.358291625976562, -6.735374450683594, 3.3180313110351562, 1.80938720703125, 31.873321533203125, -5.176666259765625, 3.8503265380859375, 5.3751983642578125, 32.88719177246094, 5.1768951416015625, 26.146514892578125, 38.330718994140625, 0.6820850372314453, -2.0576324462890625, 1.799285888671875, -0.4377574920654297, 8.467262268066406, -21.361366271972656, 6.3269195556640625, 13.746997833251953, 8.867412567138672, 3.1307907104492188, 12.034236907958984, -1.8188114166259766, 1.864593505859375, 11.46124267578125, 32.479454040527344, 14.176811218261719, -5.109123229980469, 12.573690414428711, 11.530952453613281, 0.5320968627929688, -16.507709503173828, 3.6192550659179688, -0.38043975830078125, 1.5826148986816406, -12.874580383300781, -1.5936012268066406, 3.9600467681884766, -0.90142822265625, 21.181678771972656, -11.7969970703125, -4.199251174926758, 2.3843612670898438, -1.9174041748046875, 24.98056983947754, -7.657501220703125, -2.9365062713623047, 24.087814331054688, -0.634490966796875, 4.1757049560546875, 5.32891845703125, 24.4801025390625, -8.735218048095703, 13.000926971435547, -13.066757202148438, 10.300628662109375, 9.385780334472656, 14.700782775878906, 21.310302734375, 22.616737365722656, 22.9730224609375, 4.135490417480469], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000416.npy"}
{"epoch": 0.6288737717309146, "step": 417, "batch_size": 64, "mean": 10.360841751098633, "std": 13.600950241088867, "min": -24.795669555664062, "p10": -5.239685058593749, "median": 8.83296012878418, "p90": 27.60317611694336, "max": 43.37040710449219, "pos_frac": 0.796875, "sample": [8.726749420166016, -2.4370498657226562, 0.7572860717773438, -8.242866516113281, 26.64348602294922, 17.888229370117188, -24.795669555664062, 31.30792236328125, 12.405860900878906, 8.939170837402344, 43.37040710449219, 6.3042144775390625, 10.40999984741211, 8.53354263305664, 14.144706726074219, -17.5141658782959, 33.04020309448242, -4.190948486328125, 0.9864501953125, 7.348316192626953, 15.441986083984375, 28.967910766601562, 7.861560821533203, 6.1636199951171875, 2.6815719604492188, -5.671653747558594, 34.354034423828125, 11.805313110351562, 25.561805725097656, 25.859230041503906, 2.411510467529297, 26.518203735351562, 35.565467834472656, 2.2889251708984375, 14.436126708984375, -9.7945556640625, 6.8393707275390625, 6.502857208251953, 27.39128875732422, 16.598735809326172, 23.387985229492188, 2.547901153564453, 21.890464782714844, 9.180641174316406, -0.14524078369140625, 7.151695251464844, 4.417327880859375, 14.716712951660156, -11.181095123291016, 13.910430908203125, 3.09942626953125, 13.104255676269531, -1.955078125, -4.231758117675781, 14.244338989257812, -14.041969299316406, 17.960092544555664, 15.992027282714844, 2.6288528442382812, -0.5038223266601562, 25.74726676940918, 4.497026443481445, 17.573230743408203, 27.693984985351562], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000417.npy"}
{"epoch": 0.6303854875283447, "step": 418, "batch_size": 64, "mean": 4.893869400024414, "std": 12.250873565673828, "min": -20.523910522460938, "p10": -11.136115837097167, "median": 4.331695556640625, "p90": 21.328691864013678, "max": 35.56513214111328, "pos_frac": 0.671875, "sample": [9.548194885253906, 16.74245262145996, 9.187171936035156, 6.9602508544921875, -6.403812408447266, 20.055130004882812, 2.6654052734375, 17.707788467407227, 35.56513214111328, -9.766067504882812, 26.26360321044922, 27.514671325683594, -20.523910522460938, 8.166900634765625, 0.8770828247070312, -13.069385528564453, 12.320449829101562, 0.22885894775390625, 23.921863555908203, -6.137199401855469, 1.808145523071289, 31.80572509765625, -1.6191864013671875, -0.142913818359375, 18.873085021972656, 5.310245513916016, -10.106216430664062, 11.298980712890625, 3.954132080078125, -2.8825759887695312, -18.941650390625, -4.8556671142578125, 7.5839996337890625, 6.508477210998535, 30.730010986328125, -6.688087463378906, 7.131614685058594, -12.617298126220703, 0.7374439239501953, -3.8541221618652344, -6.956573486328125, -4.8993988037109375, -11.57750129699707, -12.056050300598145, 10.09042739868164, 3.6504669189453125, 1.82379150390625, 7.365688323974609, -4.624050140380859, 2.636077880859375, 11.260444641113281, 7.51153564453125, 16.07982635498047, 18.378196716308594, 6.022041320800781, 2.3883514404296875, 0.1772918701171875, 21.87450408935547, 12.429996490478516, 7.314319610595703, 7.156166076660156, -0.9997596740722656, 4.709259033203125, -12.406143188476562], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000418.npy"}
{"epoch": 0.6318972033257747, "step": 419, "batch_size": 64, "mean": 10.8507080078125, "std": 12.180119514465332, "min": -13.827667236328125, "p10": -4.983888244628906, "median": 10.425172805786133, "p90": 29.113904190063483, "max": 33.29540252685547, "pos_frac": 0.765625, "sample": [3.100780487060547, 30.891510009765625, 5.959531784057617, 18.191444396972656, -8.143714904785156, -0.3466606140136719, 24.952346801757812, 12.32284927368164, -1.05975341796875, 13.688163757324219, -13.827667236328125, 24.249717712402344, 21.826797485351562, 15.922515869140625, 4.701663970947266, 2.4094696044921875, 21.92273712158203, 11.420242309570312, 23.54461669921875, -0.943450927734375, 3.6768264770507812, 16.120925903320312, -6.517383575439453, 12.54522705078125, 4.8541107177734375, -6.638885498046875, 29.737239837646484, -4.360595703125, 20.54803466796875, 15.701225280761719, -4.096229553222656, -5.248043060302734, -4.367527008056641, 27.659454345703125, 20.246353149414062, 10.428043365478516, 15.041053771972656, 2.345796585083008, -0.2509422302246094, 15.550827026367188, 9.028099060058594, 22.208759307861328, 6.347991943359375, -1.5417251586914062, 25.655319213867188, -6.571708679199219, 31.46185302734375, 7.854644775390625, 2.2844772338867188, 33.29540252685547, 32.84230041503906, 3.200345993041992, 10.152164459228516, 13.981025695800781, 18.217681884765625, 8.916595458984375, 20.924774169921875, 11.933242797851562, 30.833602905273438, 1.7548980712890625, 10.42230224609375, 3.0532913208007812, -7.7818603515625, 32.213165283203125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000419.npy"}
{"epoch": 0.6334089191232048, "step": 420, "batch_size": 64, "mean": 6.4076995849609375, "std": 14.530999183654785, "min": -26.567989349365234, "p10": -11.6615966796875, "median": 4.131315231323242, "p90": 25.51414413452149, "max": 37.5637092590332, "pos_frac": 0.703125, "sample": [-2.2070388793945312, 1.6666030883789062, 26.04161834716797, -0.8346481323242188, 13.351448059082031, -20.201854705810547, 1.541534423828125, 3.8822250366210938, 0.5670089721679688, -26.567989349365234, -2.0016937255859375, 4.380405426025391, 1.4029273986816406, 19.708839416503906, -3.2835922241210938, 18.969608306884766, 37.42982482910156, 3.42022705078125, 37.5637092590332, 30.828399658203125, 14.60445785522461, -7.538280487060547, 9.73841667175293, 15.319785118103027, 15.3438720703125, 22.346668243408203, -8.72079849243164, -24.105911254882812, 9.300909042358398, 6.365991592407227, -15.323997497558594, 13.056129455566406, -4.483154296875, 7.064979553222656, -4.6971282958984375, -11.73223876953125, 2.6773681640625, -3.1945228576660156, 13.280494689941406, 8.158638000488281, 3.139636993408203, -19.693435668945312, -11.49676513671875, 2.6618804931640625, 24.283370971679688, 9.068191528320312, 11.650184631347656, 32.95686340332031, 1.2184677124023438, 4.7247161865234375, 30.923126220703125, 1.5582275390625, 6.3996734619140625, -3.2001686096191406, 0.6226882934570312, 22.343978881835938, 19.035675048828125, 20.118804931640625, 10.183589935302734, 21.53857421875, -1.0649681091308594, 32.748985290527344, -14.082862854003906, 1.335113525390625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000420.npy"}
{"epoch": 0.6349206349206349, "step": 421, "batch_size": 64, "mean": 6.876836776733398, "std": 14.940112113952637, "min": -29.801666259765625, "p10": -12.355809020996093, "median": 7.397434234619141, "p90": 23.247983551025392, "max": 45.171875, "pos_frac": 0.65625, "sample": [-25.15884780883789, 27.772857666015625, -6.12690544128418, 15.036489486694336, 22.009658813476562, 23.252655029296875, 12.672248840332031, 0.0657501220703125, 3.542652130126953, 45.171875, 29.5367431640625, -7.577583312988281, 31.12262725830078, 5.519184112548828, -7.022624969482422, 16.57819175720215, 9.364547729492188, -3.9412002563476562, 22.16704559326172, 17.80536651611328, 23.237083435058594, 15.985862731933594, -0.5213432312011719, -3.0121192932128906, 6.628509521484375, -2.170177459716797, 14.925018310546875, -29.801666259765625, -15.481864929199219, -16.07327651977539, 8.285770416259766, 30.82074737548828, 17.937179565429688, -1.3545341491699219, 1.6329345703125, 24.85063934326172, 19.345565795898438, 18.281940460205078, -2.3384780883789062, 2.7814788818359375, 15.61492919921875, -2.315948486328125, 12.65877914428711, 13.064559936523438, -9.672042846679688, 22.877809524536133, 0.9567718505859375, 9.720806121826172, -10.983932495117188, -8.637832641601562, 21.87750244140625, 22.935264587402344, -12.943756103515625, 6.350624084472656, 5.991706848144531, -19.62054443359375, 2.9342193603515625, 9.410785675048828, 14.60323715209961, -18.024646759033203, 8.166358947753906, 15.631011962890625, -1.2255058288574219, -5.002597808837891], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000421.npy"}
{"epoch": 0.636432350718065, "step": 422, "batch_size": 64, "mean": 7.448757648468018, "std": 12.63509750366211, "min": -15.436695098876953, "p10": -7.451832962036132, "median": 7.792030334472656, "p90": 23.396195983886724, "max": 40.28460693359375, "pos_frac": 0.671875, "sample": [18.130126953125, 35.18409729003906, 4.318550109863281, 3.105449676513672, 19.112396240234375, -1.4111671447753906, 2.032501220703125, -6.9522705078125, 4.195148468017578, 40.28460693359375, 8.339134216308594, 17.12087631225586, -3.1486740112304688, 10.232574462890625, 15.785018920898438, -0.9337387084960938, 11.33868408203125, 29.517120361328125, -7.1826171875, 5.8493499755859375, 21.93183135986328, -4.2802734375, -0.11231231689453125, -15.436695098876953, 8.709075927734375, -4.508758544921875, 8.22622299194336, 11.666797637939453, 37.6011962890625, 3.3101577758789062, 14.141464233398438, 12.855644226074219, 24.023780822753906, -13.387186050415039, 10.419666290283203, -11.546051025390625, 13.691947937011719, -7.567211151123047, 24.676300048828125, 3.2132492065429688, -1.7966156005859375, -7.794578552246094, 10.856582641601562, 12.816272735595703, 6.974517822265625, -4.6234588623046875, -1.399444580078125, 10.121795654296875, -0.38652801513671875, 11.584358215332031, -3.8446044921875, 38.686248779296875, 2.017610549926758, -11.900222778320312, 11.145584106445312, 11.473526000976562, 7.357837677001953, -13.081527709960938, 17.257831573486328, 14.018562316894531, 7.0888671875, 12.27914047241211, -4.6776885986328125, 10.000394821166992], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000422.npy"}
{"epoch": 0.6379440665154951, "step": 423, "batch_size": 64, "mean": 8.670099258422852, "std": 12.783875465393066, "min": -23.64881134033203, "p10": -5.715313720703124, "median": 6.097845077514648, "p90": 29.906778144836426, "max": 38.058753967285156, "pos_frac": 0.78125, "sample": [4.070777893066406, -8.944953918457031, 31.35193634033203, -1.7103137969970703, 30.379558563232422, 3.0719375610351562, 1.6209754943847656, -8.985572814941406, 30.952774047851562, 2.6591796875, -9.551506042480469, 29.54848289489746, 1.97967529296875, 3.5843772888183594, -12.474929809570312, 6.8248138427734375, 5.826007843017578, 4.5113677978515625, 33.44410705566406, -12.4169921875, 14.194585800170898, 8.709487915039062, -3.2054977416992188, 10.908912658691406, 4.253881454467773, 18.421180725097656, 26.071456909179688, -2.847846031188965, -23.64881134033203, 9.818023681640625, 10.817611694335938, 27.848533630371094, 8.135833740234375, 12.62493896484375, 14.502010345458984, 6.369682312011719, 4.783599853515625, 4.69085693359375, 24.98773956298828, 38.058753967285156, 4.415580749511719, 5.22865104675293, -1.9043426513671875, 16.514190673828125, 9.509880065917969, 18.841339111328125, 14.620010375976562, -1.7724037170410156, 2.6782073974609375, 7.000102996826172, 31.649887084960938, 13.554351806640625, -2.877452850341797, 4.296810150146484, 13.208629608154297, 30.060333251953125, 0.9185562133789062, 5.234657287597656, -4.473052978515625, 8.51068115234375, 22.663177490234375, -6.247711181640625, 9.53300666809082, 2.486602783203125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000423.npy"}
{"epoch": 0.6394557823129252, "step": 424, "batch_size": 64, "mean": 9.346273422241211, "std": 12.520689964294434, "min": -19.85809326171875, "p10": -7.4459075927734375, "median": 8.820198059082031, "p90": 29.117090225219734, "max": 35.848167419433594, "pos_frac": 0.796875, "sample": [-8.027481079101562, 30.63628387451172, 6.262359619140625, 27.620513916015625, 0.54791259765625, 19.78795623779297, 15.915740966796875, 8.370658874511719, -9.62429428100586, -19.85809326171875, 1.6823043823242188, 9.527141571044922, -0.5207748413085938, -12.444618225097656, 35.848167419433594, 4.837059020996094, 15.604507446289062, -10.88134765625, 5.641044616699219, 5.988063812255859, 3.6320724487304688, 10.715736389160156, 2.6884765625, -3.392608642578125, 2.2087135314941406, 16.497909545898438, 21.886199951171875, 5.4948272705078125, 35.046112060546875, 6.2749176025390625, 19.249298095703125, 7.572547912597656, -0.7685699462890625, 4.09868049621582, 29.758480072021484, 9.9310302734375, 14.682357788085938, 19.37688446044922, 20.39478302001953, 10.589485168457031, 7.197439193725586, 10.62900161743164, 1.4678363800048828, 33.680355072021484, 3.7442779541015625, 32.633384704589844, 9.269737243652344, 9.855405807495117, -7.2923126220703125, -0.365203857421875, 16.68187713623047, 14.413299560546875, 11.299774169921875, 19.95557403564453, 1.166015625, -1.5941162109375, 10.044021606445312, -15.795249938964844, 14.712249755859375, 30.974777221679688, 10.654380798339844, 22.388965606689453, -7.5117340087890625, 7.101341247558594], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000424.npy"}
{"epoch": 0.6409674981103552, "step": 425, "batch_size": 64, "mean": 9.353760719299316, "std": 11.58571720123291, "min": -13.209602355957031, "p10": -4.592896270751953, "median": 6.179172515869141, "p90": 25.678472900390627, "max": 38.359100341796875, "pos_frac": 0.765625, "sample": [14.194099426269531, 5.4725799560546875, -0.7061309814453125, 6.633415222167969, 11.216102600097656, 4.695873260498047, 38.359100341796875, -4.161373138427734, 14.247718811035156, 15.424259185791016, 26.4498291015625, -2.8582305908203125, 13.264141082763672, -5.4832916259765625, -8.534103393554688, 12.121337890625, 2.7167434692382812, 21.210662841796875, 0.7267837524414062, 10.325004577636719, -6.735561370849609, 19.708328247070312, -4.4978179931640625, 4.196590423583984, 22.94329833984375, 4.243721008300781, 7.192081451416016, 28.752685546875, 2.953033447265625, 25.220443725585938, 5.196563720703125, -2.5018577575683594, 5.724861145019531, 4.9331512451171875, 18.979034423828125, 4.4803466796875, 0.987762451171875, 29.087139129638672, 4.432640075683594, 5.7249298095703125, 17.506629943847656, -4.633644104003906, 12.090240478515625, 25.874771118164062, 17.956565856933594, 13.564537048339844, 15.056968688964844, 17.541675567626953, 11.3455810546875, 3.231170654296875, 32.71158981323242, 3.1339569091796875, -5.877315521240234, 23.326492309570312, -1.7644195556640625, -2.939197540283203, 30.081771850585938, 10.678171157836914, -6.518577575683594, 19.642318725585938, -4.458900451660156, 3.9096527099609375, 24.054336547851562, -13.209602355957031], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000425.npy"}
{"epoch": 0.6424792139077853, "step": 426, "batch_size": 64, "mean": 5.014549732208252, "std": 14.571732521057129, "min": -31.273941040039062, "p10": -10.901606750488279, "median": 2.2507572174072266, "p90": 25.68046951293946, "max": 47.521697998046875, "pos_frac": 0.65625, "sample": [-1.0068359375, -2.0448074340820312, 17.553665161132812, 11.8377685546875, 18.59526824951172, 28.80298614501953, -2.1215972900390625, 33.307273864746094, 2.9177780151367188, 17.650955200195312, -7.664524078369141, -5.751708984375, 0.5957527160644531, -19.37653350830078, 6.47265625, -3.467876434326172, 12.412540435791016, 1.84619140625, 1.7049751281738281, 0.032512664794921875, -19.060420989990234, 1.8525238037109375, -8.899093627929688, 8.491813659667969, -6.804542541503906, 2.0560379028320312, -1.8236122131347656, -31.273941040039062, 14.293048858642578, 6.2912139892578125, 7.022064208984375, 26.348861694335938, 16.35045623779297, 10.271903991699219, -23.526107788085938, 3.7212753295898438, -17.32342529296875, 7.09101676940918, 0.8781585693359375, 47.521697998046875, 34.48792266845703, 28.826736450195312, -11.75982666015625, 2.353759765625, -3.78985595703125, 34.011688232421875, -3.4350662231445312, 0.022165298461914062, 2.147754669189453, -3.0248260498046875, -4.3887786865234375, 19.183746337890625, 4.559837341308594, 16.09173583984375, 10.080879211425781, -0.8365020751953125, 2.8434066772460938, 6.4322509765625, 1.010894775390625, -15.548721313476562, 17.283706665039062, 7.432029724121094, 24.120887756347656, -2.95001220703125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000426.npy"}
{"epoch": 0.6439909297052154, "step": 427, "batch_size": 64, "mean": 8.416566848754883, "std": 12.746842384338379, "min": -24.693008422851562, "p10": -3.992743301391601, "median": 6.36634635925293, "p90": 27.127892494201674, "max": 42.976593017578125, "pos_frac": 0.75, "sample": [12.423469543457031, 2.990863800048828, -2.042797088623047, 2.140594482421875, 2.18170166015625, 6.006717681884766, 29.031906127929688, -1.7142486572265625, 13.162628173828125, -12.876914978027344, 22.30194091796875, -10.72369384765625, -9.588645935058594, -8.924362182617188, 18.65613555908203, 10.207977294921875, 7.215110778808594, 1.215179443359375, -0.5516281127929688, 1.7802810668945312, -0.4017295837402344, 7.7547149658203125, 30.966567993164062, 9.149471282958984, 2.9239730834960938, 22.58489227294922, 5.776592254638672, 17.68175506591797, 9.232402801513672, 11.579246520996094, 28.718835830688477, 4.948162078857422, 9.668975830078125, 6.725975036621094, 3.4298019409179688, 23.415691375732422, -2.2545223236083984, -24.693008422851562, 21.646522521972656, 31.731956481933594, 10.858997344970703, 3.3855247497558594, -3.015380859375, 0.764801025390625, -3.424663543701172, -4.2362060546875, 23.277870178222656, 16.091278076171875, 11.104110717773438, -0.0638885498046875, 33.94194412231445, 21.504669189453125, 1.2393264770507812, 7.2716522216796875, 11.392898559570312, 0.463958740234375, 10.299327850341797, 16.05925750732422, 3.1680908203125, -5.331451416015625, 5.766651153564453, -3.3902053833007812, 42.976593017578125, 35.07661437988281], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000427.npy"}
{"epoch": 0.6455026455026455, "step": 428, "batch_size": 64, "mean": 8.352245330810547, "std": 13.557892799377441, "min": -24.6727294921875, "p10": -8.999680709838866, "median": 7.921863555908203, "p90": 29.62158317565918, "max": 32.87652587890625, "pos_frac": 0.734375, "sample": [-14.886756896972656, 7.806831359863281, 30.592437744140625, -0.4854278564453125, -0.5033760070800781, 3.203521728515625, 0.11920166015625, -24.6727294921875, -9.94546127319336, -2.3368072509765625, 30.09459686279297, -12.196868896484375, 1.9332122802734375, -1.9708442687988281, 25.937519073486328, 2.13726806640625, 29.70446014404297, 9.827733993530273, 0.1676483154296875, 0.8913650512695312, 21.796737670898438, 9.567281723022461, 1.1821441650390625, 8.242576599121094, 27.020122528076172, 3.7315597534179688, 6.879371643066406, -8.754364013671875, 15.69635009765625, -11.838836669921875, 8.945465087890625, -9.309455871582031, 4.611181259155273, 10.976768493652344, 12.925750732421875, 31.75354766845703, 22.987525939941406, 9.984603881835938, -0.38967132568359375, -9.104816436767578, 11.7264404296875, 27.350830078125, 20.443618774414062, 32.87652587890625, -7.20654296875, -3.3644943237304688, 13.011825561523438, 8.710350036621094, 3.124025344848633, 2.114105224609375, 1.2092933654785156, 8.322769165039062, 15.513372421264648, -2.24334716796875, 32.442134857177734, 21.463485717773438, 29.428203582763672, 22.494712829589844, 8.036895751953125, 12.782844543457031, 32.36334228515625, 13.505741119384766, 6.5983428955078125, -8.482181549072266], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000428.npy"}
{"epoch": 0.6470143613000756, "step": 429, "batch_size": 64, "mean": 8.864797592163086, "std": 15.267073631286621, "min": -34.69590377807617, "p10": -9.28969039916992, "median": 6.653669357299805, "p90": 28.923608398437505, "max": 45.29844665527344, "pos_frac": 0.765625, "sample": [5.7750701904296875, -19.56256103515625, -0.7497539520263672, 1.69537353515625, 5.074371337890625, 23.797889709472656, 17.919811248779297, 20.864662170410156, -1.3848991394042969, 6.1382598876953125, 0.4877433776855469, -8.088546752929688, 6.184612274169922, 5.195255279541016, 11.519657135009766, 19.5948486328125, -23.805770874023438, 6.6298980712890625, -1.335693359375, 21.590538024902344, 17.097305297851562, 1.996429443359375, 19.29936981201172, 22.875539779663086, 31.49475860595703, 31.01525115966797, 35.06857681274414, 9.904441833496094, 10.576042175292969, -7.825189590454102, -15.967453002929688, -34.69590377807617, -0.42147064208984375, 3.0507125854492188, 19.59164810180664, 0.0213775634765625, -21.303070068359375, 14.792823791503906, -1.29461669921875, 4.194786071777344, 27.63745880126953, 34.657798767089844, 7.019401550292969, 27.611400604248047, 29.474815368652344, -12.153488159179688, -9.804466247558594, 16.62210464477539, 2.474649429321289, 11.659248352050781, 6.494895935058594, 31.5616455078125, 4.1148834228515625, 6.677440643310547, 11.915473937988281, 3.1893081665039062, 20.344463348388672, 16.672826766967773, 45.29844665527344, 25.903038024902344, 10.188739776611328, 4.805147171020508, -2.6859970092773438, 10.655670166015625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000429.npy"}
|
||||
{"epoch": 0.6485260770975056, "step": 430, "batch_size": 64, "mean": 10.0718994140625, "std": 14.056662559509277, "min": -29.010879516601562, "p10": -3.578601074218749, "median": 9.524951934814453, "p90": 29.243775939941415, "max": 38.598514556884766, "pos_frac": 0.78125, "sample": [30.234649658203125, 0.9368553161621094, -1.4093704223632812, 8.963218688964844, -4.0102996826171875, 18.35525894165039, 2.434633255004883, 38.598514556884766, 15.902984619140625, 13.706146240234375, -15.380668640136719, 17.32940673828125, 31.778778076171875, 12.131999969482422, 37.1409912109375, 21.052757263183594, 25.9501953125, 9.828201293945312, 27.4781494140625, 12.306358337402344, 4.879661560058594, 30.05712127685547, 12.803207397460938, 6.233722686767578, -1.1914911270141602, 30.000473022460938, 0.3768882751464844, 25.721511840820312, 12.335151672363281, 9.221702575683594, 8.450485229492188, -9.073043823242188, 7.24176025390625, 16.137786865234375, 0.8461837768554688, 38.50288391113281, 25.442543029785156, 14.837745666503906, -17.22403335571289, 23.985198974609375, -0.07303237915039062, 15.82916259765625, 1.9886093139648438, -0.0010986328125, 2.1882858276367188, 11.61648178100586, 6.939002990722656, -0.0591278076171875, -29.010879516601562, -22.36639404296875, 2.2744140625, 22.499488830566406, -2.5713043212890625, 22.075035095214844, 1.951934814453125, 18.793434143066406, 13.156524658203125, 20.594348907470703, 13.410263061523438, -1.784759521484375, 0.805999755859375, -7.301145553588867, 5.171104431152344, 5.560966491699219], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000430.npy"}
|
||||
{"epoch": 0.6500377928949358, "step": 431, "batch_size": 64, "mean": 8.699821472167969, "std": 14.770487785339355, "min": -26.226428985595703, "p10": -7.838455963134764, "median": 6.325342178344727, "p90": 29.346476364135743, "max": 40.981353759765625, "pos_frac": 0.78125, "sample": [-26.226428985595703, 7.480632781982422, 12.320182800292969, 1.8103294372558594, 0.5239524841308594, 25.927330017089844, 18.302474975585938, 0.9932174682617188, 13.121749877929688, -13.965827941894531, 3.5211868286132812, 40.617149353027344, 14.508661270141602, -1.9033432006835938, 2.3142662048339844, 1.6526145935058594, 19.120635986328125, -11.984100341796875, 26.06085205078125, 6.0358428955078125, 6.788368225097656, -23.287796020507812, -8.547904968261719, 25.068012237548828, -14.189910888671875, 29.480030059814453, 31.673255920410156, 0.4679431915283203, 19.615318298339844, -4.68881893157959, 0.0446014404296875, 7.365345001220703, 15.514167785644531, 29.03485107421875, -1.8591156005859375, -6.183074951171875, 33.75398254394531, 6.132011413574219, -23.91259765625, 6.372829437255859, -1.10498046875, 4.991508483886719, -5.09893798828125, 20.895370483398438, 4.256183624267578, 8.87838363647461, 31.05276107788086, 10.4144287109375, 11.762168884277344, 40.981353759765625, 0.5080947875976562, 24.391979217529297, -0.693756103515625, 6.7642669677734375, 6.277854919433594, 23.597755432128906, 6.143768310546875, 5.592063903808594, 2.433074951171875, 31.339561462402344, 23.88653564453125, 13.456497192382812, 11.145355224609375, 6.044437408447266], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000431.npy"}
|
||||
{"epoch": 0.6515495086923658, "step": 432, "batch_size": 64, "mean": 8.409427642822266, "std": 15.06037425994873, "min": -26.636688232421875, "p10": -6.779462814331055, "median": 4.218874454498291, "p90": 29.002900695800786, "max": 40.52019500732422, "pos_frac": 0.671875, "sample": [-1.272125244140625, -0.7381019592285156, -10.385978698730469, -26.636688232421875, 4.274142265319824, 15.888797760009766, 0.6073722839355469, 5.080562591552734, 0.9641571044921875, 10.3697509765625, 3.004678726196289, -22.567123413085938, 0.647247314453125, -1.0542106628417969, 25.275344848632812, -23.6787109375, 3.8373870849609375, 9.049530029296875, 40.02082061767578, 24.4732666015625, -6.394416809082031, 17.17989730834961, 30.215576171875, -3.01885986328125, 4.163606643676758, 29.53717041015625, 8.01080322265625, 27.227096557617188, 16.499366760253906, -6.857631683349609, -6.597068786621094, -0.40313720703125, -0.8213157653808594, 0.21825408935546875, -0.2697601318359375, -6.923824310302734, 33.99616241455078, 25.924057006835938, 13.230438232421875, 22.861366271972656, -1.3440704345703125, 20.974517822265625, 24.50263214111328, 1.295440673828125, 30.043777465820312, 14.335235595703125, -0.27012062072753906, 34.5159912109375, 1.0010147094726562, 40.52019500732422, 10.548530578613281, 17.74956512451172, 2.989574432373047, -2.0569610595703125, 6.198486328125, 27.756271362304688, -14.943771362304688, 10.075878143310547, 21.929367065429688, 0.076629638671875, -0.3645954132080078, 26.295398712158203, -1.7919921875, 13.228515625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000432.npy"}
|
||||
{"epoch": 0.6530612244897959, "step": 433, "batch_size": 64, "mean": 11.203941345214844, "std": 16.330034255981445, "min": -27.756816864013672, "p10": -8.656498718261718, "median": 7.422633171081543, "p90": 34.8538761138916, "max": 47.441383361816406, "pos_frac": 0.78125, "sample": [1.9362983703613281, 6.387233734130859, -18.825660705566406, 12.279720306396484, 8.133331298828125, 2.130584716796875, -0.5527496337890625, 41.85235595703125, -2.155630111694336, 3.67724609375, 2.9096450805664062, 21.328140258789062, 2.9239044189453125, -7.322967529296875, 35.681243896484375, -7.097320556640625, 6.711935043334961, 28.11023712158203, 21.116334915161133, -12.244338989257812, 24.013938903808594, 0.9752197265625, 2.3818206787109375, 21.678070068359375, -27.756816864013672, 8.40986442565918, 19.236862182617188, 4.608951568603516, 31.39389419555664, 14.26046371459961, 1.8655204772949219, 27.593299865722656, -15.413408279418945, 5.747314453125, 12.52191162109375, -5.9651947021484375, -15.003860473632812, 5.2851409912109375, 5.766082763671875, 47.441383361816406, 34.753753662109375, 35.39754867553711, 33.32044982910156, 4.423095703125, 17.791465759277344, -9.228012084960938, 2.2529850006103516, 18.36978530883789, -0.0737762451171875, 34.896785736083984, 1.4781875610351562, 8.618324279785156, 18.421401977539062, -11.683940887451172, 9.126510620117188, 29.094078063964844, 29.779499053955078, 5.6292724609375, 30.767074584960938, 15.986343383789062, 36.4134635925293, -0.09897232055664062, 36.84618377685547, 18.75067901611328], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000433.npy"}
|
||||
{"epoch": 0.654572940287226, "step": 434, "batch_size": 64, "mean": 10.282703399658203, "std": 14.873730659484863, "min": -25.15019989013672, "p10": -6.5357524871826165, "median": 8.74058723449707, "p90": 30.819365692138675, "max": 38.55308532714844, "pos_frac": 0.75, "sample": [-25.15019989013672, 24.891387939453125, 16.780977249145508, -6.9348297119140625, 0.28125762939453125, 32.03266906738281, -12.57314682006836, 32.211517333984375, -16.08576202392578, 13.13027572631836, -0.9174423217773438, -3.5403480529785156, -22.694664001464844, 25.682823181152344, 27.182205200195312, -0.6444854736328125, 17.75853729248047, 17.89126968383789, 9.14837646484375, 27.879474639892578, 25.71031951904297, -0.21159934997558594, 35.415771484375, -4.1527557373046875, -12.005840301513672, 5.542304992675781, 34.67258834838867, 4.072113037109375, 2.671916961669922, 21.12725257873535, -3.191448211669922, 7.136432647705078, 7.1838836669921875, 38.55308532714844, 29.805313110351562, 21.06036376953125, -5.32293701171875, 4.638023376464844, 21.500640869140625, 0.7876930236816406, 23.808176040649414, -4.420524597167969, 4.332115173339844, 20.944074630737305, 4.7263946533203125, 0.7074966430664062, 0.15833091735839844, 16.367630004882812, 5.497779846191406, 20.320335388183594, 2.288738250732422, 38.398826599121094, -5.604572296142578, 8.33279800415039, 11.06069564819336, 15.956069946289062, 14.987518310546875, 31.25395965576172, 15.418445587158203, -8.598480224609375, 10.417858123779297, 8.026908874511719, 23.188213348388672, 9.201173782348633], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000434.npy"}
|
||||
{"epoch": 0.656084656084656, "step": 435, "batch_size": 64, "mean": 7.150821685791016, "std": 12.902162551879883, "min": -24.393341064453125, "p10": -6.681325531005859, "median": 5.849323272705078, "p90": 26.201453399658202, "max": 36.030601501464844, "pos_frac": 0.671875, "sample": [26.231224060058594, 17.97636604309082, -8.326995849609375, 6.765604019165039, -3.7864418029785156, 13.252754211425781, 0.0843048095703125, 3.318828582763672, 17.948604583740234, -3.5236167907714844, 10.5699462890625, 11.910724639892578, -6.6253509521484375, 16.241905212402344, 32.75205612182617, 17.103099822998047, 5.2458953857421875, -13.455886840820312, -6.705314636230469, 26.131988525390625, 14.411895751953125, -5.382568359375, 22.246871948242188, 26.563369750976562, 5.5323333740234375, -0.2496185302734375, -3.7348251342773438, 1.053314208984375, 27.79147720336914, 30.733154296875, 1.1525421142578125, 20.989532470703125, -21.359189987182617, 13.91229248046875, 12.289398193359375, 11.919994354248047, -4.795196533203125, -0.2200927734375, 10.208829879760742, 14.251747131347656, 15.459365844726562, 27.739761352539062, 36.030601501464844, 13.54437255859375, 2.4276084899902344, 7.904611587524414, -5.187198638916016, -6.120277404785156, -7.467033386230469, 17.091217041015625, -5.571681976318359, 3.5587234497070312, -10.408500671386719, 13.174407958984375, 5.782035827636719, -0.5815887451171875, 11.087661743164062, -3.4034805297851562, -24.393341064453125, 5.9166107177734375, 1.3539581298828125, -2.2017669677734375, 19.529006958007812, 1.9625320434570312], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000435.npy"}
|
||||
{"epoch": 0.6575963718820862, "step": 436, "batch_size": 64, "mean": 8.907886505126953, "std": 13.787222862243652, "min": -22.84214210510254, "p10": -6.394589233398437, "median": 7.909324645996094, "p90": 26.16689910888672, "max": 41.076446533203125, "pos_frac": 0.78125, "sample": [2.9147186279296875, -1.8483085632324219, 23.66291046142578, 4.119052886962891, 8.003829956054688, 4.647662162780762, 3.7885475158691406, 22.743816375732422, -9.698143005371094, 8.618011474609375, 21.115489959716797, 32.15950012207031, 8.265464782714844, 18.267578125, 6.476856231689453, 37.13758087158203, 41.076446533203125, 25.873092651367188, -20.302597045898438, 18.924091339111328, 8.988628387451172, 3.5699214935302734, -4.812934875488281, 0.94183349609375, -3.2097835540771484, 3.018463134765625, 22.23016357421875, 4.242607116699219, 12.359107971191406, 15.095256805419922, 9.803611755371094, -5.9566192626953125, -22.84214210510254, 9.12689208984375, 0.0048828125, 2.7030181884765625, 14.328765869140625, -6.5822906494140625, -7.046012878417969, -3.1408920288085938, 21.893905639648438, 23.192245483398438, 6.251312255859375, 4.217948913574219, -0.7937736511230469, 11.063376426696777, -9.833059310913086, 31.374099731445312, 11.456321716308594, 26.292816162109375, 7.8148193359375, 0.4393453598022461, 18.790115356445312, 19.78264617919922, 0.04328155517578125, 12.720802307128906, 32.26250457763672, -5.77880859375, 5.897270202636719, 8.129520416259766, 0.2260284423828125, 36.36186981201172, -20.130207061767578, 19.6622314453125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000436.npy"}
|
||||
{"epoch": 0.6591080876795162, "step": 437, "batch_size": 64, "mean": 7.569982051849365, "std": 15.747607231140137, "min": -32.84710693359375, "p10": -9.327941131591796, "median": 5.121774673461914, "p90": 31.055702209472663, "max": 44.86292266845703, "pos_frac": 0.65625, "sample": [44.42486572265625, 9.606986999511719, 12.778018951416016, 1.9423599243164062, 15.671165466308594, -8.963981628417969, 14.0609130859375, 4.969825744628906, 39.67528533935547, 0.4604606628417969, 29.12091827392578, 4.028472900390625, -32.84710693359375, 5.148887634277344, -13.901832580566406, -6.467742919921875, 8.091049194335938, 16.422317504882812, 16.484710693359375, -23.031906127929688, 25.22176742553711, 32.221588134765625, 31.88489532470703, -12.935920715332031, -1.9479446411132812, 20.527236938476562, 17.337005615234375, -12.112346649169922, -15.655670166015625, -3.970855712890625, -2.5522842407226562, -3.6955699920654297, -1.8279953002929688, -9.455352783203125, 3.7186546325683594, 35.9807243347168, 20.229751586914062, -6.079978942871094, 10.453170776367188, 2.9711227416992188, 14.482826232910156, 0.6209030151367188, -4.855016708374023, 18.591957092285156, -2.75640869140625, 32.063255310058594, 3.1276702880859375, 10.110456466674805, 19.854080200195312, -9.030647277832031, 9.961341857910156, 16.281070709228516, 5.094661712646484, 44.86292266845703, 11.110786437988281, 0.3476104736328125, 9.608596801757812, 17.314559936523438, -2.8623504638671875, 6.176795959472656, -1.4831123352050781, -2.2960205078125, -5.489036560058594, 25.656280517578125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000437.npy"}
|
||||
{"epoch": 0.6606198034769464, "step": 438, "batch_size": 64, "mean": 10.254268646240234, "std": 14.599565505981445, "min": -25.064029693603516, "p10": -6.725996017456055, "median": 8.464969635009766, "p90": 30.52348556518555, "max": 39.32765197753906, "pos_frac": 0.75, "sample": [30.684730529785156, -6.772647857666016, 10.997100830078125, -8.164104461669922, 0.3285331726074219, -8.389877319335938, 3.57720947265625, 13.18731689453125, 3.7779769897460938, 20.3548583984375, 1.3934249877929688, 39.32765197753906, -6.6171417236328125, -0.7570953369140625, 21.372522354125977, 2.3828697204589844, 23.023502349853516, -16.52545928955078, 11.93021011352539, 15.169778823852539, 26.310279846191406, 26.870716094970703, -4.626308441162109, 1.1011810302734375, 13.700653076171875, 5.519172668457031, 21.887020111083984, 6.050861358642578, 31.93948745727539, 30.055747985839844, 12.857666015625, -1.837982177734375, 33.23186492919922, -1.3816032409667969, 2.6557540893554688, 33.569252014160156, 18.453155517578125, -4.004413604736328, 4.200649261474609, 21.532886505126953, 30.147247314453125, 7.0789947509765625, 11.301429748535156, 34.345550537109375, 17.0469970703125, 1.3321533203125, 17.767044067382812, 9.850944519042969, -14.794258117675781, 4.041759490966797, -5.28558349609375, -14.487039566040039, 6.771280288696289, 24.408706665039062, 24.18368911743164, -25.064029693603516, 35.61989212036133, 23.948585510253906, -3.6635971069335938, 24.323505401611328, 1.8910408020019531, -0.19626998901367188, 4.319618225097656, 13.01815414428711], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000438.npy"}
|
||||
{"epoch": 0.6621315192743764, "step": 439, "batch_size": 64, "mean": 9.969182968139648, "std": 13.763657569885254, "min": -16.968948364257812, "p10": -5.706091880798339, "median": 6.357906341552734, "p90": 30.167578506469727, "max": 39.9547119140625, "pos_frac": 0.75, "sample": [18.09601593017578, 4.803363800048828, -3.662385940551758, -9.260025024414062, 12.09892749786377, 27.533218383789062, 9.905345916748047, -0.9590606689453125, 4.55865478515625, 26.676963806152344, 39.119956970214844, 6.193595886230469, 21.230579376220703, 3.4655914306640625, 9.302291870117188, 35.49534606933594, 29.965621948242188, 5.3018035888671875, 11.870040893554688, 7.2445831298828125, 2.697002410888672, 27.553253173828125, 11.076385498046875, -13.572649002075195, 9.058502197265625, 4.4718780517578125, 12.87994384765625, -5.965080261230469, 4.321430206298828, 37.04582214355469, -3.576976776123047, 36.34943389892578, -6.0794830322265625, -8.6568603515625, 30.254131317138672, 3.8834075927734375, -1.3445549011230469, -0.04522705078125, -16.968948364257812, -5.101785659790039, -3.417083740234375, 2.47479248046875, 22.588706970214844, 6.45855712890625, 24.55632781982422, 22.318649291992188, 22.239089965820312, 24.347274780273438, 10.598014831542969, 2.637178421020508, 4.7308197021484375, 6.257255554199219, 3.9720382690429688, 31.573448181152344, 8.644691467285156, 39.9547119140625, -1.748077392578125, -6.4454498291015625, 0.10800933837890625, -4.672508239746094, 10.420036315917969, 13.394561767578125, 19.07309341430664, 0.7035446166992188], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000439.npy"}
|
||||
{"epoch": 0.6636432350718064, "step": 440, "batch_size": 64, "mean": 12.433677673339844, "std": 13.495784759521484, "min": -15.316902160644531, "p10": -2.1584552764892577, "median": 8.227142333984375, "p90": 32.3060604095459, "max": 38.091712951660156, "pos_frac": 0.828125, "sample": [5.316505432128906, 9.34710693359375, 20.044403076171875, 28.648880004882812, 33.169036865234375, 21.719390869140625, 37.264625549316406, 0.28237342834472656, 32.42396545410156, 12.838714599609375, 8.200172424316406, 32.030948638916016, -6.590213775634766, 24.668533325195312, 5.169456481933594, 7.167121887207031, 31.736183166503906, -15.316902160644531, 5.3974151611328125, 6.074758529663086, 12.33734130859375, -0.9640178680419922, -0.045257568359375, 17.975746154785156, 14.977096557617188, 21.84996795654297, 22.179908752441406, 15.61151123046875, 1.2076225280761719, 8.254112243652344, 0.2975120544433594, 10.561775207519531, 3.7589149475097656, 5.205192565917969, 2.716610908508301, 2.6923561096191406, 6.809993743896484, -4.567714691162109, 6.647693634033203, 31.459789276123047, 18.529766082763672, 26.44402313232422, 1.6361083984375, 32.68190383911133, -2.260711669921875, -1.9198570251464844, 38.091712951660156, 7.907554626464844, 28.385520935058594, -1.6309814453125, -3.9405517578125, -13.939685821533203, 29.130859375, 19.731948852539062, 19.12963104248047, 14.126956939697266, 0.5087661743164062, -4.32996940612793, 19.218399047851562, 5.648429870605469, 2.7128219604492188, 36.6038818359375, 36.98714828491211, 7.743019104003906], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000440.npy"}
|
||||
{"epoch": 0.6651549508692366, "step": 441, "batch_size": 64, "mean": 10.227784156799316, "std": 13.013042449951172, "min": -15.645980834960938, "p10": -5.938378906249999, "median": 7.430473327636719, "p90": 28.51159934997559, "max": 39.257171630859375, "pos_frac": 0.78125, "sample": [14.135726928710938, 9.743095397949219, 28.851764678955078, 15.876777648925781, 11.99432373046875, -0.8012619018554688, 20.597625732421875, 4.5433349609375, 16.243301391601562, -3.1185226440429688, -6.899097442626953, 17.51165771484375, 4.436004638671875, 31.232864379882812, 4.473602294921875, 18.253143310546875, 30.75109100341797, 3.8477249145507812, -10.444717407226562, 27.717880249023438, 8.09515380859375, 6.532787322998047, 33.703155517578125, 24.762313842773438, -1.146820068359375, 36.29521560668945, 2.0875587463378906, 22.812942504882812, 39.257171630859375, -0.93060302734375, -1.4710464477539062, 2.6185531616210938, -12.278112411499023, 25.60570526123047, 24.81494903564453, -3.5285186767578125, -10.041778564453125, 26.109153747558594, 22.146045684814453, 4.534858703613281, 1.061065673828125, 19.608299255371094, 4.160739898681641, 12.901268005371094, 2.941070556640625, -7.3049774169921875, 9.742927551269531, 2.0398941040039062, 6.7657928466796875, -6.362133026123047, 21.6142578125, -4.949619293212891, 18.535049438476562, 33.74755859375, -15.645980834960938, 10.559349060058594, 13.7772216796875, 5.8844451904296875, 2.2721176147460938, 5.915008544921875, 0.7625312805175781, 14.907493591308594, 11.584939956665039, 1.1348495483398438], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000441.npy"}
|
||||
{"epoch": 0.6666666666666666, "step": 442, "batch_size": 64, "mean": 8.582521438598633, "std": 12.881210327148438, "min": -23.917896270751953, "p10": -7.721159362792968, "median": 8.825737953186035, "p90": 25.468904113769533, "max": 39.14830017089844, "pos_frac": 0.703125, "sample": [4.934516906738281, 5.963653564453125, -1.898162841796875, -2.7049102783203125, 6.3852691650390625, 24.743515014648438, 39.14830017089844, 32.37784194946289, 6.761325836181641, 20.33757781982422, 11.937576293945312, 17.73839569091797, 27.502059936523438, -11.11236572265625, 15.56246566772461, 20.010696411132812, -1.1807861328125, 2.1805572509765625, 3.644134521484375, -12.554986953735352, -4.3609161376953125, 8.901124954223633, -7.543083190917969, -1.9518013000488281, 16.037460327148438, 12.080894470214844, -10.48638916015625, 15.371524810791016, 22.234214782714844, 14.584159851074219, 13.453594207763672, 9.582328796386719, 16.84838104248047, 29.971405029296875, 27.5791015625, -0.057491302490234375, -2.5282135009765625, -1.0220947265625, 34.508689880371094, 18.5567626953125, 11.039485931396484, 1.745025634765625, -3.5757369995117188, 1.984649658203125, 2.779388427734375, -23.917896270751953, -3.31292724609375, 25.77978515625, -7.797477722167969, 6.10504150390625, 22.227981567382812, 7.522331237792969, -10.545310974121094, 18.70969009399414, 11.941024780273438, -8.329286575317383, 10.494361877441406, 9.176422119140625, -6.400848388671875, 15.712261199951172, 8.750350952148438, 24.22083282470703, 0.22095108032226562, 13.214954376220703], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000442.npy"}
|
||||
{"epoch": 0.6681783824640968, "step": 443, "batch_size": 64, "mean": 11.554608345031738, "std": 16.78643226623535, "min": -24.692584991455078, "p10": -4.831143569946289, "median": 8.642165184020996, "p90": 32.31091041564942, "max": 61.918609619140625, "pos_frac": 0.765625, "sample": [0.042850494384765625, 22.310089111328125, 14.022056579589844, 30.622604370117188, 5.152397155761719, 13.654571533203125, 27.414505004882812, -18.570602416992188, 17.97613525390625, 0.880889892578125, 2.9526214599609375, -9.289758682250977, 1.3621864318847656, 17.647266387939453, -4.903263092041016, 19.87566375732422, -0.2718658447265625, 15.668830871582031, 20.4437255859375, 20.189956665039062, 61.918609619140625, -24.692584991455078, 0.7392120361328125, 28.679855346679688, -3.0071754455566406, 44.395111083984375, 27.13024139404297, 2.7310218811035156, 42.679046630859375, 34.14707946777344, 10.863739013671875, -9.603073120117188, 11.44443130493164, 18.072532653808594, 4.7892913818359375, 31.821582794189453, 9.469337463378906, 30.2235107421875, 28.29437255859375, 36.05228042602539, 7.4829559326171875, 8.517261505126953, -2.380655288696289, 8.671379089355469, 32.52062225341797, 28.44532012939453, -1.235931396484375, 8.326225280761719, 7.520904541015625, 3.7665767669677734, 8.612951278686523, -3.6732254028320312, 10.81585693359375, 1.5718765258789062, -2.7424468994140625, 4.227996826171875, -21.64733123779297, -3.0778350830078125, 41.00544738769531, 0.49046897888183594, -13.160512924194336, -4.662864685058594, 9.7479248046875, 27.022705078125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000443.npy"}
|
||||
{"epoch": 0.6696900982615268, "step": 444, "batch_size": 64, "mean": 4.655010223388672, "std": 14.266925811767578, "min": -35.665283203125, "p10": -8.94140625, "median": 3.5557632446289062, "p90": 21.342378425598145, "max": 44.574371337890625, "pos_frac": 0.65625, "sample": [15.014022827148438, 8.358528137207031, -5.785179138183594, 0.6241111755371094, 5.200439453125, 4.9670257568359375, 20.97516441345215, -0.7873992919921875, -9.035247802734375, 20.73676872253418, 5.6675262451171875, 7.920166015625, -9.539192199707031, -24.66692352294922, 3.59429931640625, 0.7522125244140625, 8.103525161743164, 5.210140228271484, 8.670549392700195, -3.271484375, 41.2474365234375, 16.621612548828125, -2.2371225357055664, 2.0698013305664062, -0.638824462890625, 2.0537567138671875, -6.148536682128906, -6.265815734863281, -1.2992172241210938, -2.3639678955078125, 28.563461303710938, 6.024097442626953, 20.45256805419922, -2.9104766845703125, 3.30718994140625, 18.71912384033203, 0.583892822265625, 31.523117065429688, -13.127811431884766, -29.650588989257812, 2.4357757568359375, 44.574371337890625, -35.665283203125, -3.383209228515625, -8.722442626953125, 22.144577026367188, 6.783172607421875, 21.499755859375, 27.634620666503906, 3.5172271728515625, 4.202005386352539, 3.6302719116210938, 3.885265350341797, 11.39434814453125, 15.036056518554688, -0.8359928131103516, -5.151538848876953, 1.6046333312988281, 8.205314636230469, -5.164083480834961, -15.894973754882812, 19.917022705078125, 5.198570251464844, 1.8424644470214844], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000444.npy"}
|
||||
{"epoch": 0.671201814058957, "step": 445, "batch_size": 64, "mean": 11.481884002685547, "std": 15.013211250305176, "min": -18.700653076171875, "p10": -6.921235656738281, "median": 9.53059196472168, "p90": 31.21283187866211, "max": 38.524749755859375, "pos_frac": 0.78125, "sample": [20.736614227294922, 34.35308837890625, 3.3503189086914062, 0.249755859375, -10.138626098632812, 28.438148498535156, 28.67784309387207, 10.793594360351562, 21.281524658203125, 6.358936309814453, -17.497901916503906, -6.132537841796875, 2.1689682006835938, -0.5375900268554688, -6.4150238037109375, 24.54327392578125, -18.700653076171875, 2.107940673828125, -7.13818359375, 16.06939697265625, -0.8865756988525391, 30.68689727783203, -17.64685821533203, 11.661605834960938, -16.876922607421875, 29.537216186523438, 5.114982604980469, 23.453845977783203, 9.550834655761719, -9.823890686035156, -3.2254409790039062, -3.438467025756836, 6.608171463012695, 29.680042266845703, 31.438232421875, 14.373641967773438, 38.524749755859375, 34.84757614135742, 7.513450622558594, 6.3943634033203125, 20.400501251220703, 24.643474578857422, 9.51034927368164, 32.065284729003906, 16.253082275390625, 30.685678482055664, 26.329421997070312, 3.8138256072998047, 26.370765686035156, -2.0007553100585938, 5.2494354248046875, 1.9810028076171875, 3.1342620849609375, 6.5250091552734375, 14.440071105957031, 20.806259155273438, 34.038787841796875, 14.44085693359375, 5.2899017333984375, 5.556396484375, 11.588813781738281, 25.34400177001953, 36.12915802001953, 2.1886672973632812], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000445.npy"}
|
||||
{"epoch": 0.672713529856387, "step": 446, "batch_size": 64, "mean": 12.521492958068848, "std": 12.840214729309082, "min": -12.501676559448242, "p10": -2.9162460327148434, "median": 11.250411987304688, "p90": 29.960900497436526, "max": 39.59233093261719, "pos_frac": 0.828125, "sample": [3.0143165588378906, -5.075172424316406, -7.973611831665039, 16.659881591796875, 11.211395263671875, 5.404266357421875, 7.329742431640625, 2.6250152587890625, 28.6439208984375, 19.192378997802734, 13.611827850341797, 14.192138671875, 16.071563720703125, 23.973464965820312, 18.6837158203125, 32.718910217285156, 1.5382003784179688, 15.765750885009766, 28.730072021484375, 11.2894287109375, 1.0300540924072266, 21.716760635375977, 2.174468994140625, 18.974868774414062, 27.316329956054688, 15.589315414428711, 4.455619812011719, 30.15691375732422, 33.0518798828125, -6.8904876708984375, 21.724536895751953, 3.250624656677246, -1.8764839172363281, -0.23110198974609375, -12.501676559448242, -7.660102844238281, 13.247276306152344, -3.2088279724121094, 7.322273254394531, 10.083366394042969, 2.4224014282226562, 37.73760223388672, 32.417144775390625, -3.031587600708008, 39.59233093261719, -0.04918670654296875, 29.200836181640625, 24.32251739501953, -2.647115707397461, 2.038388252258301, 11.40362548828125, 1.7781600952148438, 9.2706298828125, 9.9644775390625, 29.503536224365234, 34.48246765136719, 5.971714019775391, 10.919380187988281, 3.3121938705444336, 26.278358459472656, 1.2166976928710938, 17.944990158081055, 14.99188232421875, 27.00130844116211], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000446.npy"}
|
||||
{"epoch": 0.674225245653817, "step": 447, "batch_size": 64, "mean": 10.858244895935059, "std": 15.17583179473877, "min": -34.49857711791992, "p10": -6.150934600830078, "median": 8.626078128814697, "p90": 31.058800506591798, "max": 40.58570861816406, "pos_frac": 0.796875, "sample": [35.3033561706543, 21.709312438964844, 20.320541381835938, 0.10720062255859375, 0.197021484375, -8.20697021484375, 9.447250366210938, 1.9155731201171875, -18.028213500976562, 5.902984619140625, 8.288948059082031, 24.767837524414062, 1.3778610229492188, 8.39117431640625, 11.880836486816406, 31.267169952392578, -3.4541091918945312, 19.57491111755371, 4.311592102050781, 7.022426605224609, 13.741653442382812, -9.616264343261719, 12.274551391601562, -10.505603790283203, 0.3602294921875, 28.653778076171875, 26.079727172851562, 30.808792114257812, 19.286102294921875, -3.0853118896484375, 18.757080078125, 24.03619384765625, 3.048582077026367, 28.5838623046875, 8.643199920654297, 13.360912322998047, 36.210411071777344, 14.432342529296875, 31.16594696044922, -4.958446502685547, -1.4656562805175781, 8.411399841308594, 30.395355224609375, -3.1049575805664062, 6.3023223876953125, -16.310226440429688, 9.051116943359375, 34.54023742675781, 2.6900405883789062, 29.660560607910156, 0.8519973754882812, 14.186992645263672, -6.263668060302734, 11.620903015136719, 25.50079345703125, 2.9105682373046875, 24.382492065429688, -34.49857711791992, 7.055683135986328, 40.58570861816406, 35.37761688232422, -5.887889862060547, 6.95147705078125, 8.608956336975098], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000447.npy"}
|
||||
{"epoch": 0.6757369614512472, "step": 448, "batch_size": 64, "mean": 9.122481346130371, "std": 14.032418251037598, "min": -17.05938720703125, "p10": -7.156623649597168, "median": 5.095539093017578, "p90": 29.56924858093262, "max": 45.324440002441406, "pos_frac": 0.734375, "sample": [28.627883911132812, -17.05938720703125, 29.97269058227539, 6.783878326416016, 3.5446395874023438, 4.917472839355469, -2.0129661560058594, 15.750133514404297, 28.273216247558594, 1.7852020263671875, 31.598739624023438, 17.522300720214844, 2.5169219970703125, 4.2109527587890625, 25.089820861816406, 25.576797485351562, -7.234659194946289, 32.03668975830078, 17.396041870117188, -6.707622528076172, 4.872528076171875, 40.088050842285156, -0.21341705322265625, 4.447235107421875, 3.5248069763183594, 0.7879314422607422, 26.194923400878906, 0.37152671813964844, 7.7908477783203125, -5.322845458984375, -7.643709182739258, -0.3500213623046875, 9.44119644165039, 5.2736053466796875, -0.7177505493164062, 6.169624328613281, -9.252006530761719, 13.942680358886719, 30.895912170410156, -3.27569580078125, 7.147331237792969, -15.118160247802734, 15.163166046142578, 6.2393798828125, 1.4876251220703125, 23.5374755859375, 14.561256408691406, 26.108871459960938, 12.20583724975586, -6.974540710449219, -7.750087738037109, -2.9342269897460938, 0.5911712646484375, 15.022651672363281, -5.647759437561035, 30.46905517578125, 3.1260223388671875, -13.406402587890625, 20.12786102294922, 17.61743927001953, 20.130645751953125, 4.705375671386719, 2.4902496337890625, 45.324440002441406], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000448.npy"}
|
||||
{"epoch": 0.6772486772486772, "step": 449, "batch_size": 64, "mean": 5.106272220611572, "std": 15.091479301452637, "min": -35.32158279418945, "p10": -11.644622421264646, "median": 6.132867813110352, "p90": 22.20162811279297, "max": 39.620643615722656, "pos_frac": 0.65625, "sample": [-1.3722991943359375, 7.1490020751953125, -3.1196212768554688, 9.577239990234375, 0.526092529296875, -9.82192611694336, 3.74261474609375, 9.060379028320312, 4.7342681884765625, 30.691429138183594, 21.52880859375, 21.96929168701172, 1.57904052734375, -14.568229675292969, -5.825653076171875, 6.0517120361328125, 39.620643615722656, 3.9309768676757812, 0.138671875, 9.70172119140625, 22.30120086669922, 7.165374755859375, 29.907012939453125, 39.237815856933594, -17.331588745117188, 11.050640106201172, 22.965457916259766, -13.774627685546875, 8.418098449707031, 17.763980865478516, 4.8626708984375, -6.955768585205078, 7.2416229248046875, -35.32158279418945, 20.61358642578125, 13.892341613769531, 39.52154541015625, -10.208549499511719, -4.498371124267578, 7.306110382080078, -5.519504547119141, 8.97467041015625, -12.260082244873047, 6.733001708984375, -8.705488204956055, 5.8899688720703125, -2.550495147705078, 15.737548828125, 3.481595993041992, 13.208747863769531, 18.738906860351562, -0.8275146484375, 8.546882629394531, 6.878211975097656, -7.541290283203125, 12.89764404296875, 17.92627716064453, -25.572242736816406, 6.214023590087891, -9.111244201660156, 11.661247253417969, -31.0286865234375, -0.543304443359375, -5.878574371337891], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000449.npy"}
{"epoch": 0.6787603930461074, "step": 450, "batch_size": 64, "mean": 8.726282119750977, "std": 14.36825942993164, "min": -29.286792755126953, "p10": -8.735680007934569, "median": 9.304826736450195, "p90": 27.748597335815433, "max": 42.17632293701172, "pos_frac": 0.71875, "sample": [-1.107208251953125, -11.076915740966797, 4.915431976318359, -1.9486808776855469, 17.30072021484375, 24.34471893310547, 14.502891540527344, 5.983795166015625, 18.345813751220703, 42.17632293701172, 28.057754516601562, 9.854789733886719, 10.546794891357422, 9.582706451416016, -1.8477897644042969, 20.42298126220703, 1.7563705444335938, -0.5721855163574219, 13.333267211914062, 36.919342041015625, -7.913394927978516, 26.23668670654297, 0.29680633544921875, 5.095436096191406, -8.955947875976562, -10.930789947509766, 9.026947021484375, 19.75030517578125, 13.913581848144531, -8.221721649169922, 15.906364440917969, 24.63348388671875, 0.13814163208007812, 11.254962921142578, -29.286792755126953, 28.028762817382812, 24.255435943603516, -0.7075004577636719, 27.43096923828125, 11.261634826660156, 7.1775665283203125, 9.864654541015625, 12.837093353271484, -15.307785034179688, 32.083656311035156, 2.899608612060547, 6.899444580078125, 2.1878128051757812, 18.7225341796875, -8.194671630859375, 3.4559860229492188, 4.127601623535156, 0.011051177978515625, 17.50961685180664, 40.261474609375, 10.134567260742188, -17.242324829101562, 16.54925537109375, -0.7562484741210938, 27.884723663330078, -7.182708740234375, 13.828563690185547, -10.580894470214844, -1.3928108215332031], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000450.npy"}
{"epoch": 0.6802721088435374, "step": 451, "batch_size": 64, "mean": 7.567141532897949, "std": 13.510346412658691, "min": -22.514572143554688, "p10": -8.220994567871093, "median": 6.584856033325195, "p90": 27.304763603210453, "max": 40.51668167114258, "pos_frac": 0.734375, "sample": [11.946720123291016, -9.047561645507812, 1.2131118774414062, 9.848289489746094, 27.888839721679688, 6.6068115234375, -19.240829467773438, 2.342620849609375, 6.595577239990234, 18.727508544921875, 15.549102783203125, 14.23807144165039, 30.668663024902344, 7.409271240234375, 24.84033966064453, 3.552978515625, 5.62408447265625, 6.210391998291016, 2.490741729736328, 10.266498565673828, 19.88231086730957, -8.297935485839844, -3.4016952514648438, 35.669647216796875, 25.941919326782227, -8.041465759277344, 3.3733367919921875, -22.514572143554688, -14.619438171386719, -0.7348308563232422, 8.209571838378906, 16.388595581054688, -12.887378692626953, 0.8793411254882812, 21.521583557128906, 9.1541748046875, 33.46656799316406, 17.49664306640625, -1.1326789855957031, 1.4946136474609375, -7.844524383544922, 9.7633056640625, 3.5269031524658203, 8.518619537353516, 15.486352920532227, 7.126007080078125, 13.716209411621094, 33.09495544433594, 32.4749755859375, 14.848503112792969, 18.6224365234375, 3.3197555541992188, 6.574134826660156, -3.2465744018554688, -5.455875396728516, -10.273902893066406, 11.606071472167969, 0.5496139526367188, -2.9663162231445312, 3.4086456298828125, 40.51668167114258, 0.8122100830078125, -5.902183532714844, -3.558502197265625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000451.npy"}
{"epoch": 0.6817838246409675, "step": 452, "batch_size": 64, "mean": 9.594108581542969, "std": 15.313369750976562, "min": -22.02325439453125, "p10": -7.451029968261718, "median": 8.3705472946167, "p90": 29.103991699218756, "max": 43.74620056152344, "pos_frac": 0.6875, "sample": [8.323837280273438, -4.968658447265625, 12.925960540771484, 19.33306884765625, -3.098522186279297, 26.732187271118164, 9.11492919921875, 43.74620056152344, 30.046512603759766, 27.44036102294922, 5.8821563720703125, 19.890121459960938, -6.6873016357421875, 17.65874481201172, 23.59899139404297, 4.855865478515625, 36.56251525878906, 6.877471923828125, 23.106719970703125, 9.42767333984375, 6.507072448730469, -6.562858581542969, 7.00787353515625, 7.605365753173828, 0.4394817352294922, -2.866424560546875, 5.262062072753906, 4.317909240722656, 17.34308624267578, -0.3886756896972656, -4.255165100097656, 43.68638610839844, -17.816383361816406, 29.621063232421875, -9.500190734863281, 37.6058349609375, 24.11229705810547, 2.7009811401367188, -13.23651123046875, 8.417257308959961, 16.54294204711914, -6.996429443359375, 19.219894409179688, 11.121055603027344, 20.475292205810547, -19.172138214111328, 39.836570739746094, 14.947998046875, 14.810905456542969, -11.84896469116211, 14.275386810302734, -2.6468658447265625, 9.40243148803711, 14.004104614257812, -6.892177581787109, 0.32549285888671875, -1.9770660400390625, 20.117759704589844, 21.355567932128906, -7.6458587646484375, -1.5246429443359375, 27.897491455078125, -22.02325439453125, -0.35186004638671875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000452.npy"}
{"epoch": 0.6832955404383976, "step": 453, "batch_size": 64, "mean": 8.15820026397705, "std": 15.606464385986328, "min": -26.1113338470459, "p10": -12.771657943725586, "median": 6.443055152893066, "p90": 29.26834030151367, "max": 41.194854736328125, "pos_frac": 0.71875, "sample": [29.292755126953125, 20.332550048828125, -2.0086708068847656, -10.483760833740234, 29.21137237548828, 9.680084228515625, -7.993381500244141, 18.836158752441406, 29.008026123046875, 7.978340148925781, -26.1113338470459, 6.891529083251953, 17.940277099609375, -4.813362121582031, 24.703521728515625, 30.015869140625, 18.457244873046875, 4.584075927734375, 16.532424926757812, 10.4964599609375, 3.6615829467773438, -19.61310577392578, 5.55535888671875, 4.129188537597656, 3.3648452758789062, -15.692428588867188, -13.705863952636719, 21.18726348876953, 1.3123970031738281, 8.925361633300781, 22.82537841796875, 4.7041778564453125, 9.306175231933594, -13.041114807128906, -8.193328857421875, 5.99458122253418, 24.283096313476562, 4.536228179931641, -25.587799072265625, 21.76665496826172, 4.464378356933594, 36.676177978515625, -3.0497817993164062, -3.6688194274902344, -1.2988967895507812, 10.077156066894531, -4.118406295776367, 1.87017822265625, 22.683555603027344, 1.6654815673828125, 15.607177734375, -3.4540786743164062, 5.901269912719727, 3.433112144470215, 8.783699035644531, -19.662933349609375, -12.142925262451172, 41.194854736328125, 30.53423309326172, 8.46942138671875, 29.337448120117188, 29.016284942626953, 33.619598388671875, 17.917808532714844], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000453.npy"}
{"epoch": 0.6848072562358276, "step": 454, "batch_size": 64, "mean": 11.083334922790527, "std": 15.913220405578613, "min": -42.74976348876953, "p10": -3.4385141372680663, "median": 9.853358268737793, "p90": 33.13852424621583, "max": 41.276145935058594, "pos_frac": 0.734375, "sample": [33.88615417480469, -12.63982105255127, -2.612762451171875, 29.402793884277344, -42.74976348876953, -1.5120124816894531, 26.236709594726562, 2.1941070556640625, 23.12865447998047, 0.028219223022460938, 23.586074829101562, 16.49090576171875, 4.233695983886719, -2.2314300537109375, 28.266075134277344, 14.881134033203125, 21.597599029541016, 16.212039947509766, 2.602783203125, 7.3809814453125, -2.551982879638672, -19.67693328857422, -2.141305923461914, 26.85388946533203, 11.285751342773438, 35.200531005859375, -8.474689483642578, 21.66217041015625, 3.3440093994140625, 2.687286376953125, 6.179931640625, 14.965644836425781, 15.281959533691406, -9.371875762939453, 25.293594360351562, 15.64152717590332, 40.154579162597656, -3.511453628540039, 40.216941833496094, 16.275272369384766, 3.5668373107910156, 0.05438995361328125, 4.470909118652344, -1.9488372802734375, -3.268321990966797, 10.670669555664062, 31.394054412841797, 18.967498779296875, -6.85888671875, 9.864133834838867, 27.113662719726562, 7.501983642578125, 41.276145935058594, 30.927734375, 16.639524459838867, 9.842582702636719, 11.691896438598633, 35.145286560058594, 3.3456497192382812, 34.84638977050781, 8.570919036865234, -0.7117843627929688, -1.0637874603271484, -0.4022083282470703], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000454.npy"}
{"epoch": 0.6863189720332578, "step": 455, "batch_size": 64, "mean": 11.924175262451172, "std": 14.354608535766602, "min": -19.87657928466797, "p10": -1.5624639511108391, "median": 8.510963439941406, "p90": 27.990800094604495, "max": 50.44769287109375, "pos_frac": 0.859375, "sample": [15.454055786132812, 14.788131713867188, 13.756278991699219, 36.77489471435547, 4.9770965576171875, 0.5968456268310547, 15.842376708984375, 46.189239501953125, 17.18634796142578, 0.62322998046875, 12.663663864135742, 1.630828857421875, 9.573036193847656, 27.70700454711914, 14.600631713867188, 31.358413696289062, 13.731292724609375, -4.6266937255859375, 2.1453857421875, -14.932647705078125, 25.189178466796875, -1.8398513793945312, 27.12643051147461, 1.5160694122314453, 1.474848747253418, 28.1124267578125, 20.46031951904297, 18.239273071289062, 26.83343505859375, 3.4060516357421875, 27.005233764648438, 5.426841735839844, 35.921112060546875, 4.8076171875, 7.448890686035156, 5.251899719238281, -19.87657928466797, 21.850399017333984, 25.019725799560547, 16.24286651611328, -10.711872100830078, 7.12646484375, 23.999099731445312, -13.09063720703125, 22.01755142211914, 6.68939208984375, 22.868370056152344, -0.9425125122070312, 6.645763397216797, 5.4130706787109375, 50.44769287109375, 7.3572998046875, 3.4239120483398438, 15.56844711303711, 7.050628662109375, 0.45973968505859375, 13.012760162353516, -1.8129653930664062, 0.0567626953125, 1.2865066528320312, -0.9779605865478516, 11.236473083496094, 0.3996467590332031, 45.96795654296875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000455.npy"}
{"epoch": 0.6878306878306878, "step": 456, "batch_size": 64, "mean": 10.167566299438477, "std": 13.6965913772583, "min": -24.15976333618164, "p10": -4.7167625427246085, "median": 9.686728477478027, "p90": 31.533634567260748, "max": 40.715850830078125, "pos_frac": 0.796875, "sample": [29.55914306640625, 2.0351104736328125, 1.159841537475586, 14.68218994140625, 38.197265625, 0.1793537139892578, -2.298403739929199, 12.731475830078125, -1.4969005584716797, 40.715850830078125, 6.448310852050781, 12.397445678710938, 23.566734313964844, 17.338973999023438, 11.356803894042969, 5.972877502441406, 5.508056640625, 34.2423095703125, 0.0487823486328125, 12.468147277832031, -5.187286376953125, 14.55560302734375, 13.937759399414062, -3.275402069091797, 16.042987823486328, 10.465858459472656, -24.15976333618164, -5.4581756591796875, 12.486209869384766, -9.835884094238281, -2.5151519775390625, 5.96209716796875, 11.044059753417969, 11.245208740234375, 9.973793029785156, 25.045270919799805, 10.149101257324219, -3.6188735961914062, 20.13343048095703, 0.6768321990966797, -13.229877471923828, 19.436859130859375, 4.439901351928711, 22.635360717773438, 9.399663925170898, -11.065254211425781, 23.011856079101562, 5.5145416259765625, 0.5465621948242188, 0.690338134765625, 32.524681091308594, 32.14908981323242, 3.237548828125, 36.577919006347656, 14.979644775390625, 16.556499481201172, 37.81672668457031, 7.7123565673828125, -2.691650390625, 30.097572326660156, 9.076446533203125, -10.044219970703125, 0.4933929443359375, 8.377250671386719], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000456.npy"}
{"epoch": 0.6893424036281179, "step": 457, "batch_size": 64, "mean": 8.07726764678955, "std": 12.467975616455078, "min": -17.90808868408203, "p10": -4.77651786804199, "median": 6.29632568359375, "p90": 26.557839584350592, "max": 40.40232849121094, "pos_frac": 0.75, "sample": [5.701683044433594, -2.6361541748046875, -1.9006004333496094, 2.5394134521484375, 9.665847778320312, -12.309112548828125, 7.684484481811523, 5.208839416503906, 4.107105255126953, -2.7311935424804688, 1.605804443359375, 13.7685546875, 16.69188690185547, 1.6915283203125, -8.408487319946289, -17.90808868408203, 30.10284423828125, 0.437957763671875, -3.086231231689453, 4.284660339355469, 31.091087341308594, 12.281383514404297, 40.40232849121094, 7.642547607421875, 7.053583145141602, -2.157917022705078, -3.3621864318847656, 23.510517120361328, 8.328178405761719, 1.9649314880371094, -0.9246368408203125, -15.370918273925781, 1.5600090026855469, 12.616226196289062, 13.147735595703125, -3.152374267578125, -5.382659912109375, 7.7682647705078125, 20.610671997070312, 6.3176727294921875, 20.779193878173828, -12.49307632446289, 36.84361267089844, 28.231910705566406, -8.518726348876953, 4.511131286621094, 27.184494018554688, 5.537520408630371, 6.880641937255859, 32.48988342285156, 15.519588470458984, 4.527660369873047, 21.826141357421875, 9.471729278564453, 12.758426666259766, 24.92670440673828, 7.564216613769531, 8.786449432373047, 9.871549606323242, 5.397254943847656, 6.2749786376953125, 5.856597900390625, 25.095645904541016, -0.8335590362548828], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000457.npy"}
{"epoch": 0.690854119425548, "step": 458, "batch_size": 64, "mean": 11.217958450317383, "std": 13.617854118347168, "min": -24.315231323242188, "p10": -3.6082336425781243, "median": 10.799022674560547, "p90": 27.299240493774416, "max": 40.75590133666992, "pos_frac": 0.796875, "sample": [-1.1408004760742188, 26.639347076416016, 4.242234230041504, 9.324203491210938, 7.834878921508789, 13.537750244140625, 21.733001708984375, -24.315231323242188, -1.2889347076416016, 9.213855743408203, 5.2051849365234375, -8.66998291015625, -2.7656402587890625, 17.359481811523438, 27.552135467529297, 0.000102996826171875, 13.641281127929688, 12.329376220703125, 3.9666519165039062, 10.844268798828125, 6.889984130859375, 5.676399230957031, 22.857807159423828, -7.612174987792969, 24.59967041015625, 14.346885681152344, 35.659759521484375, 13.408222198486328, 2.5756301879882812, 15.427803039550781, 26.709152221679688, 13.049976348876953, 2.2340965270996094, -1.3756103515625, 6.923408508300781, 0.30199432373046875, 4.816810607910156, 10.187408447265625, 18.192672729492188, 28.339004516601562, 12.808820724487305, 10.753776550292969, -10.889076232910156, -1.7483901977539062, 25.7528076171875, 4.210493087768555, 25.885967254638672, -0.747528076171875, 39.24530792236328, 32.19898223876953, -12.140544891357422, 19.626373291015625, 12.460968017578125, 15.805816650390625, -13.063861846923828, 40.75590133666992, 25.66845703125, 39.30870056152344, -3.9693450927734375, 0.6241836547851562, 4.284513473510742, 15.815803527832031, 20.439987182617188, 26.409164428710938], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000458.npy"}
{"epoch": 0.6923658352229781, "step": 459, "batch_size": 64, "mean": 10.853067398071289, "std": 14.778289794921875, "min": -17.93381118774414, "p10": -8.756497001647949, "median": 9.677852630615234, "p90": 32.71335182189943, "max": 42.992034912109375, "pos_frac": 0.734375, "sample": [11.522514343261719, 16.755809783935547, -17.93381118774414, 23.14221954345703, -1.5292472839355469, -9.537109375, 8.352334976196289, 19.175048828125, 22.22503662109375, 20.705162048339844, 21.43365478515625, 0.9423141479492188, 24.137462615966797, -1.5155410766601562, 24.197763442993164, 42.992034912109375, -0.6524276733398438, 17.215362548828125, 13.147453308105469, 18.626312255859375, 3.4052352905273438, 10.87213134765625, 0.0823516845703125, 2.4165916442871094, 9.07415771484375, 16.749832153320312, -0.7179946899414062, 18.8800048828125, 24.701030731201172, 34.47983169555664, 40.401702880859375, -2.6268157958984375, -8.678873062133789, 3.6904220581054688, -8.789764404296875, 20.571510314941406, 2.86279296875, -16.058792114257812, 28.210960388183594, -1.1810340881347656, 4.178010940551758, 14.2674560546875, 10.102920532226562, 3.2696170806884766, 19.722991943359375, -0.98712158203125, 19.0506591796875, 9.252784729003906, -13.56072998046875, 29.53857421875, 0.6059799194335938, -1.9466705322265625, -11.890405654907227, 16.90204620361328, 9.080310821533203, -15.061630249023438, 11.580101013183594, 4.7268524169921875, 34.86747741699219, 38.06433868408203, 34.073970794677734, 8.848167419433594, 38.980552673339844, -0.817596435546875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000459.npy"}
{"epoch": 0.6938775510204082, "step": 460, "batch_size": 64, "mean": 11.367496490478516, "std": 14.692212104797363, "min": -13.531837463378906, "p10": -3.95926284790039, "median": 7.006614685058594, "p90": 32.23615913391114, "max": 52.41563415527344, "pos_frac": 0.78125, "sample": [11.079925537109375, 21.579788208007812, 52.41563415527344, -4.056549072265625, 3.2407684326171875, 25.654991149902344, 15.121917724609375, 6.7044677734375, 30.16314697265625, -0.15092849731445312, -3.10089111328125, 3.167266845703125, -11.616790771484375, 10.616714477539062, -3.7322616577148438, 6.76287841796875, 10.102828979492188, 20.216777801513672, -4.72314453125, 28.32979965209961, 4.4351959228515625, 44.541656494140625, 2.7769317626953125, 19.153053283691406, 34.61808395385742, 13.091632843017578, 0.9282302856445312, 5.7637176513671875, 28.027320861816406, 9.462089538574219, 2.7648696899414062, 34.87120819091797, 15.739875793457031, 10.49615478515625, -6.036716461181641, 10.754119873046875, 4.038003921508789, -5.7924041748046875, 7.2503509521484375, 4.443157196044922, 28.518291473388672, 26.589866638183594, 0.660247802734375, -2.212100028991699, 13.307846069335938, 39.19636535644531, -3.1825180053710938, -13.531837463378906, 31.2418212890625, -2.85479736328125, -0.838714599609375, 0.3770294189453125, 16.504798889160156, 13.663932800292969, 5.437957763671875, 36.12876892089844, 4.3401947021484375, 29.88671875, -11.759414672851562, 0.1416778564453125, 3.166900634765625, 19.049514770507812, 1.9220123291015625, 32.66230392456055], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000460.npy"}
{"epoch": 0.6953892668178382, "step": 461, "batch_size": 64, "mean": 11.177490234375, "std": 13.86636734008789, "min": -16.663528442382812, "p10": -5.338316345214843, "median": 9.44477891921997, "p90": 27.83257064819336, "max": 47.961578369140625, "pos_frac": 0.765625, "sample": [22.355812072753906, 9.778427124023438, 3.837127685546875, 11.54465103149414, 31.627525329589844, 14.106666564941406, 20.24864959716797, 19.755464553833008, 8.293159484863281, -1.7891006469726562, 5.026885986328125, 7.961967468261719, 25.464996337890625, -5.4923858642578125, 9.243042945861816, -4.97882080078125, -0.142730712890625, 33.007381439208984, 3.5055227279663086, 9.646514892578125, 5.903554916381836, 19.738481521606445, 14.902618408203125, 0.2454833984375, 0.059762001037597656, 19.28180694580078, -14.26959228515625, -6.983058929443359, -4.7850341796875, 8.797426223754883, 18.973941802978516, -2.340991973876953, -6.341335296630859, 13.12887954711914, 26.523731231689453, 8.525985717773438, 24.614013671875, 18.53765869140625, 10.417522430419922, 47.961578369140625, 18.39312744140625, 29.837112426757812, 26.527462005615234, 16.39759063720703, 2.9393386840820312, 6.355567932128906, 2.0048179626464844, -13.312122344970703, 19.174076080322266, 5.9893035888671875, 27.078323364257812, 12.986591339111328, 5.058021545410156, 8.061027526855469, 41.23991394042969, -16.663528442382812, 27.862899780273438, -3.1317176818847656, -1.5736370086669922, -5.9513092041015625, 27.761802673339844, -2.178943634033203, 42.03850173950195, 12.571968078613281], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000461.npy"}
{"epoch": 0.6969009826152683, "step": 462, "batch_size": 64, "mean": 12.52739143371582, "std": 16.823486328125, "min": -33.002655029296875, "p10": -5.537067413330078, "median": 14.29257583618164, "p90": 33.05282554626465, "max": 54.97898864746094, "pos_frac": 0.75, "sample": [37.63153839111328, 40.28340148925781, 34.05121612548828, 18.825462341308594, -3.799999237060547, 3.797161102294922, 32.3013916015625, -7.729701995849609, -0.6657257080078125, 1.17340087890625, 20.33362579345703, 37.63267517089844, 30.896583557128906, 7.5074462890625, 25.331314086914062, 33.20428466796875, -5.150970458984375, 30.7958984375, 18.33613395690918, 23.341827392578125, 32.69942092895508, 17.26154327392578, 29.4207763671875, 21.23407745361328, 3.3899383544921875, 1.0748519897460938, 18.43240737915039, 28.109962463378906, -1.438568115234375, 22.70209503173828, 2.2346954345703125, -12.199790954589844, 15.55389404296875, 20.146085739135742, 32.12327575683594, -0.3910675048828125, -4.891740798950195, 15.515205383300781, 17.200029373168945, -0.3243255615234375, 10.393768310546875, 8.356010437011719, 23.61626434326172, 4.310249328613281, -2.3610076904296875, 23.759376525878906, -7.988761901855469, 24.593894958496094, 14.710334777832031, -18.016727447509766, 0.6094894409179688, -33.002655029296875, 34.01164245605469, 16.299030303955078, 13.782356262207031, -5.702537536621094, 4.239738464355469, 54.97898864746094, 11.154796600341797, 13.87481689453125, -25.723613739013672, -1.5959091186523438, 1.4723281860351562, 0.03139495849609375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000462.npy"}
{"epoch": 0.6984126984126984, "step": 463, "batch_size": 64, "mean": 8.195122718811035, "std": 14.814093589782715, "min": -15.799026489257812, "p10": -6.726258087158203, "median": 5.205204010009766, "p90": 28.90041160583496, "max": 44.195587158203125, "pos_frac": 0.65625, "sample": [-15.112255096435547, 11.984184265136719, 21.29364013671875, -14.947967529296875, 28.99899673461914, 22.790000915527344, -5.325944900512695, 40.485504150390625, 31.499588012695312, 19.531047821044922, 0.2770652770996094, -0.959320068359375, 2.006052017211914, 11.013294219970703, 23.01470947265625, 4.232940673828125, 3.3630523681640625, -6.722930908203125, -1.7548189163208008, 1.0504074096679688, -8.99783706665039, 7.60260009765625, -6.727684020996094, 28.670379638671875, 17.772703170776367, -11.82183837890625, 10.289337158203125, 6.4032745361328125, -0.6222000122070312, -5.391929626464844, -0.6155242919921875, 0.29633331298828125, 26.996063232421875, 31.528526306152344, -4.910228729248047, 23.743026733398438, 6.887298583984375, 10.565109252929688, 19.279891967773438, 34.186065673828125, 11.708126068115234, -14.063514709472656, 15.948944091796875, -15.799026489257812, -2.9606361389160156, -5.477508544921875, 44.195587158203125, -6.484222412109375, 30.696563720703125, 26.992156982421875, 3.0054244995117188, -5.056999206542969, 0.8708648681640625, -6.657444000244141, 10.511184692382812, 14.176834106445312, 27.069263458251953, 0.3394775390625, 6.301658630371094, 2.4308242797851562, -6.651611328125, 6.177467346191406, -1.3282146453857422, 26.692031860351562], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000463.npy"}
{"epoch": 0.6999244142101285, "step": 464, "batch_size": 64, "mean": 11.280075073242188, "std": 18.94193458557129, "min": -36.9857177734375, "p10": -11.327738189697264, "median": 10.671579360961914, "p90": 39.10416946411133, "max": 43.964935302734375, "pos_frac": 0.671875, "sample": [-2.0034942626953125, 11.396255493164062, 15.91297721862793, -17.091018676757812, 40.03559875488281, -4.542610168457031, -0.18085861206054688, 34.606624603271484, 10.947864532470703, 13.763458251953125, 39.203277587890625, -11.660530090332031, 15.479545593261719, 11.98375129699707, 19.461700439453125, -6.082183837890625, 36.38752746582031, -10.551223754882812, 3.3691329956054688, 9.045028686523438, 22.178497314453125, 38.69512939453125, -2.545391082763672, 13.572721481323242, 17.85681915283203, 23.154022216796875, -0.027973175048828125, 43.0568962097168, 16.47553062438965, 13.160591125488281, -1.756072998046875, 2.820110321044922, 25.1207275390625, 40.672576904296875, -36.9857177734375, 8.081512451171875, 34.10021209716797, 33.67509460449219, -13.667259216308594, -5.010353088378906, 28.26433563232422, 4.754283905029297, 14.957691192626953, 40.60328674316406, -12.87744140625, -3.1491241455078125, -0.290771484375, -0.12348556518554688, -14.367256164550781, 38.10498046875, 3.8444976806640625, 38.87291717529297, -31.12164306640625, -6.953655242919922, 4.2290496826171875, -5.076181411743164, 0.8116073608398438, 40.24198913574219, 19.807363510131836, 43.964935302734375, 0.6699333190917969, 23.574665069580078, 10.395294189453125, 0.6790847778320312], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000464.npy"}
{"epoch": 0.7014361300075586, "step": 465, "batch_size": 64, "mean": 10.575400352478027, "std": 16.733123779296875, "min": -21.780258178710938, "p10": -11.317567443847656, "median": 8.735965728759766, "p90": 34.82742538452149, "max": 43.70277404785156, "pos_frac": 0.703125, "sample": [-15.124523162841797, 28.875030517578125, 25.4490966796875, -0.6844482421875, 32.83341979980469, 17.463096618652344, -8.62542724609375, 42.911170959472656, 2.950042724609375, 21.175689697265625, 13.957454681396484, 31.733478546142578, 14.039936065673828, 41.850189208984375, -11.30575942993164, 23.650924682617188, 3.97589111328125, 36.96886444091797, 13.459991455078125, 0.12346649169921875, 8.813751220703125, -10.075447082519531, 17.230819702148438, 42.542083740234375, 32.29356384277344, 39.62969970703125, -11.322628021240234, -11.808921813964844, -1.3789596557617188, -0.214111328125, 5.420166015625, -3.374298095703125, 6.3295440673828125, 9.493324279785156, 15.205413818359375, 7.7284698486328125, -14.618793487548828, -8.990158081054688, -2.469658851623535, 20.47553825378418, -12.703788757324219, 18.808761596679688, 16.892822265625, -1.8920059204101562, 1.607666015625, -21.780258178710938, 1.6115074157714844, 2.9546775817871094, 20.7796630859375, 35.68199920654297, 8.658180236816406, 27.974746704101562, 4.725578308105469, 14.831439971923828, -2.26666259765625, -21.69577980041504, 6.189140319824219, -0.15100860595703125, 10.172439575195312, 8.198562622070312, 43.70277404785156, 20.45883560180664, 23.54193115234375, 13.937423706054688], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000465.npy"}
{"epoch": 0.7029478458049887, "step": 466, "batch_size": 64, "mean": 6.714020252227783, "std": 14.116584777832031, "min": -24.791683197021484, "p10": -8.794918060302733, "median": 4.399570465087891, "p90": 26.669292449951193, "max": 43.69007873535156, "pos_frac": 0.65625, "sample": [-6.098339080810547, 21.67951202392578, -7.6091766357421875, -10.64898681640625, 19.981874465942383, -4.439220428466797, 7.059883117675781, -2.2791061401367188, -17.490509033203125, -5.86419677734375, 3.8982925415039062, 19.55340576171875, 31.594879150390625, -12.687580108642578, 13.469436645507812, -1.205535888671875, 19.013774871826172, 28.807769775390625, 13.85003662109375, 34.993934631347656, 30.65203094482422, 1.944082260131836, 17.25720977783203, -1.908487319946289, 0.1967315673828125, 4.283470153808594, 10.258098602294922, -24.791683197021484, 38.891937255859375, 3.7072219848632812, 2.5693721771240234, 5.0858001708984375, -4.496482849121094, 13.359420776367188, 11.282752990722656, 4.753044128417969, 3.3070526123046875, 21.19855499267578, -11.635162353515625, 8.700084686279297, 1.1055526733398438, -4.1041259765625, -15.512748718261719, -1.585113525390625, 15.335289001464844, -3.7893524169921875, 2.6739883422851562, 14.521488189697266, 43.69007873535156, -9.303092956542969, -6.183219909667969, 9.6942138671875, -6.798240661621094, 20.34039306640625, -3.4762725830078125, 6.5428466796875, 1.8582706451416016, -5.799797058105469, 19.41252899169922, 9.838298797607422, 32.14995193481445, 10.822175979614258, 13.553321838378906, 4.5156707763671875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000466.npy"}
{"epoch": 0.7044595616024187, "step": 467, "batch_size": 64, "mean": 10.53709602355957, "std": 14.131275177001953, "min": -14.878108978271484, "p10": -5.3236244201660154, "median": 6.616355895996094, "p90": 31.299229049682626, "max": 52.9271240234375, "pos_frac": 0.78125, "sample": [2.4880599975585938, 33.76835632324219, 21.825477600097656, 11.580352783203125, -0.434051513671875, 0.36884307861328125, -2.245391845703125, -6.421230316162109, 10.412078857421875, 3.7966384887695312, 0.8577346801757812, -6.024044036865234, 23.136775970458984, 1.7261829376220703, 8.034164428710938, 13.898574829101562, 14.758111953735352, 6.391925811767578, -4.6943511962890625, 2.544048309326172, 18.421890258789062, -14.878108978271484, -6.456485748291016, 35.931026458740234, 34.785194396972656, 18.643325805664062, 18.412212371826172, 12.711463928222656, -14.521804809570312, 19.110015869140625, -6.3654937744140625, 32.119590759277344, 9.978948593139648, -1.4449462890625, 23.146793365478516, 0.86346435546875, 20.731422424316406, 25.611648559570312, 12.169937133789062, 7.373016357421875, -0.07249259948730469, 0.0050792694091796875, 1.3141860961914062, 6.840785980224609, 6.228931427001953, -4.7888031005859375, 0.04791259765625, -2.083087921142578, 29.385051727294922, 28.986553192138672, 52.9271240234375, 4.1721038818359375, 2.4007606506347656, 0.7575836181640625, 10.685888290405273, 4.988433837890625, -5.552833557128906, 42.109886169433594, 23.9837589263916, 22.550247192382812, 6.3317718505859375, 23.023605346679688, 32.243961334228516, 5.776405334472656], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000467.npy"}
{"epoch": 0.7059712773998488, "step": 468, "batch_size": 64, "mean": 12.165056228637695, "std": 16.803123474121094, "min": -30.975120544433594, "p10": -7.0995437622070305, "median": 9.802525520324707, "p90": 33.87210159301758, "max": 45.968421936035156, "pos_frac": 0.765625, "sample": [15.534049987792969, -0.15761947631835938, 18.902864456176758, 30.688079833984375, 8.27987289428711, -6.34577751159668, -0.1964244842529297, 6.210758209228516, 10.948341369628906, 6.78594970703125, 29.540142059326172, 32.4033203125, 10.405776977539062, -9.094964981079102, 1.1075592041015625, 26.218849182128906, -1.7007980346679688, -6.83709716796875, -13.139129638671875, 12.083721160888672, 28.106658935546875, 3.5430755615234375, 18.1761474609375, 41.658172607421875, 23.159595489501953, 19.149150848388672, 38.906864166259766, 9.842233657836914, -11.045814514160156, -30.975120544433594, 33.951026916503906, -19.582275390625, 5.2547760009765625, -7.2120208740234375, 3.518423080444336, -21.603851318359375, 33.68794250488281, 5.424163818359375, -0.414947509765625, 35.22350311279297, 11.222808837890625, 1.242156982421875, 28.192768096923828, 33.263275146484375, 22.298545837402344, 2.354954719543457, 3.9513511657714844, 12.042739868164062, 6.1529541015625, 6.326202392578125, 36.76052474975586, 16.612274169921875, 7.3762359619140625, -3.30279541015625, 43.44810485839844, 9.7628173828125, 3.1842041015625, 45.968421936035156, 24.786834716796875, 32.210723876953125, 25.463523864746094, 25.921142578125, 3.4738616943359375, -0.5551910400390625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000468.npy"}
{"epoch": 0.7074829931972789, "step": 469, "batch_size": 64, "mean": 9.849139213562012, "std": 13.699382781982422, "min": -14.288444519042969, "p10": -4.794507217407226, "median": 8.269351959228516, "p90": 31.84006252288819, "max": 43.82296371459961, "pos_frac": 0.703125, "sample": [-3.7474212646484375, 14.102346420288086, 15.149765014648438, 12.841316223144531, 6.569618225097656, 8.096649169921875, -1.381439208984375, 17.386186599731445, 6.930122375488281, -8.752433776855469, 32.64057159423828, 10.410255432128906, 10.093353271484375, -1.4944915771484375, 0.08664703369140625, 33.032779693603516, -4.088539123535156, 2.593564987182617, 18.34379768371582, 28.62519073486328, 32.52013397216797, 4.871986389160156, 18.11829376220703, -6.878929138183594, 10.483566284179688, -9.79620361328125, 18.70343780517578, 7.8398284912109375, 7.445468902587891, 13.932209014892578, -5.097064971923828, 1.1906938552856445, 9.270263671875, 42.20812225341797, 25.10637664794922, -1.9363632202148438, -0.16352081298828125, -10.297313690185547, 35.34403991699219, 30.25322914123535, 24.73529052734375, 4.6834869384765625, 13.694843292236328, 8.570945739746094, -2.3196029663085938, -3.8191680908203125, -4.00701904296875, -3.578205108642578, 43.82296371459961, 12.468719482421875, 11.287223815917969, 8.442054748535156, 28.757171630859375, 2.897045135498047, 35.34796142578125, -5.344966888427734, 5.560327529907227, 2.814472198486328, 9.539588928222656, -14.288444519042969, 13.221755981445312, -3.0060691833496094, -3.0260467529296875, 23.334487915039062], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000469.npy"}
{"epoch": 0.708994708994709, "step": 470, "batch_size": 64, "mean": 10.157899856567383, "std": 15.079710960388184, "min": -32.89275360107422, "p10": -5.9479633331298825, "median": 9.16008186340332, "p90": 29.145601654052737, "max": 47.264404296875, "pos_frac": 0.703125, "sample": [27.555892944335938, 14.589447021484375, 33.74552917480469, -32.89275360107422, 32.158302307128906, 15.35009765625, 12.982681274414062, 5.735481262207031, 17.54578399658203, 20.812088012695312, 17.03588104248047, -17.113027572631836, 29.35516357421875, 25.536006927490234, 31.186447143554688, -0.13214111328125, 28.65662384033203, -2.8062286376953125, -0.00991058349609375, 14.978036880493164, 1.9350204467773438, 11.536468505859375, 4.483879089355469, -9.412094116210938, 8.877555847167969, -1.5984477996826172, 27.959285736083984, 12.29962158203125, 11.129161834716797, -0.6576709747314453, -0.20261001586914062, -6.253208160400391, 27.861347198486328, 3.40545654296875, -1.5894508361816406, 3.5555076599121094, 23.736404418945312, 1.9263687133789062, 4.595634460449219, -12.787765502929688, 47.264404296875, 11.61252212524414, 31.918426513671875, -1.6657142639160156, -2.596160888671875, -5.235725402832031, 6.992948532104492, 23.88683319091797, 25.494842529296875, 31.413597106933594, -15.308464050292969, 23.287261962890625, 15.714611053466797, 6.4528350830078125, 3.7214279174804688, 23.97015380859375, 4.56275749206543, 14.134529113769531, 0.5197525024414062, 9.442607879638672, 28.380786895751953, -3.3857421875, -13.638069152832031, -1.9047126770019531], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000470.npy"}
{"epoch": 0.7105064247921391, "step": 471, "batch_size": 64, "mean": 12.95675277709961, "std": 17.029674530029297, "min": -44.940582275390625, "p10": -3.84367218017578, "median": 11.782817840576172, "p90": 36.655724334716794, "max": 44.32585144042969, "pos_frac": 0.828125, "sample": [-1.5403213500976562, 37.713287353515625, 26.61131477355957, 3.1919803619384766, 1.5632362365722656, -1.635650634765625, 20.48503875732422, 28.9840087890625, 34.960784912109375, 36.68077087402344, 7.895843505859375, 8.115005493164062, 44.32585144042969, 4.497123718261719, -19.043380737304688, 0.20807933807373047, 5.9350433349609375, 41.10005187988281, 16.378074645996094, -5.374908447265625, 4.401208877563477, 19.838836669921875, 6.2415924072265625, 14.850059509277344, 29.972633361816406, -44.940582275390625, 18.953330993652344, 28.73272705078125, 36.59728240966797, -1.8743705749511719, 18.570755004882812, 13.150550842285156, 1.0646896362304688, 2.2951202392578125, 20.923843383789062, 22.714385986328125, 33.75913619995117, -16.400802612304688, 12.216712951660156, 29.78614044189453, 6.567348480224609, 6.814239501953125, 41.68960952758789, 32.89814758300781, 0.06146049499511719, 4.211067199707031, 22.5684814453125, 14.33182144165039, -6.556552886962891, 13.046852111816406, -4.468414306640625, 31.3175048828125, 5.994392395019531, 0.105316162109375, 15.839767456054688, 0.3342170715332031, 11.348922729492188, 5.3495941162109375, -2.3859405517578125, 38.612030029296875, 2.3894271850585938, 15.304924011230469, -8.951213836669922, 40.90467071533203], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000471.npy"}
{"epoch": 0.7120181405895691, "step": 472, "batch_size": 64, "mean": 11.134857177734375, "std": 18.653005599975586, "min": -31.212257385253906, "p10": -12.374920654296872, "median": 8.471735000610352, "p90": 36.642506027221685, "max": 56.548187255859375, "pos_frac": 0.75, "sample": [1.3778762817382812, 2.0510711669921875, 9.374664306640625, -0.0709381103515625, 31.710742950439453, 30.547218322753906, -3.0316314697265625, 23.30884552001953, 11.927772521972656, 16.977935791015625, 8.737464904785156, 41.73210144042969, 7.855461120605469, -10.184013366699219, 2.948963165283203, 34.968441009521484, -1.7049016952514648, 7.431934356689453, 48.29808044433594, 7.911651611328125, -2.0287818908691406, 10.900615692138672, 0.2858848571777344, 45.31048583984375, -3.759552001953125, 16.39623260498047, 1.0480117797851562, 46.16004943847656, 45.478553771972656, -7.88148307800293, 22.38119888305664, 14.47381591796875, 5.909893035888672, -15.852943420410156, -14.318199157714844, 9.254730224609375, 10.011749267578125, 56.548187255859375, -0.9061622619628906, -2.243621826171875, 28.53614044189453, 32.895606994628906, -31.212257385253906, 37.359962463378906, -19.2685546875, -25.039310455322266, 32.68962097167969, 2.798583984375, -20.6513671875, 22.154098510742188, 8.37098503112793, 8.572484970092773, 5.1376953125, 4.393836975097656, 0.2147216796875, 7.5535125732421875, 7.731117248535156, 11.2225341796875, 26.517780303955078, 15.958473205566406, 19.983741760253906, 11.57465934753418, -13.313880920410156, 29.113231658935547], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000472.npy"}
{"epoch": 0.7135298563869993, "step": 473, "batch_size": 64, "mean": 10.971088409423828, "std": 17.429548263549805, "min": -39.91169357299805, "p10": -8.82515869140625, "median": 8.742406845092773, "p90": 35.93290405273438, "max": 50.216552734375, "pos_frac": 0.765625, "sample": [48.01036071777344, 15.961074829101562, 1.9110908508300781, -20.51618766784668, -0.12523651123046875, 34.851715087890625, 3.6252593994140625, 16.328819274902344, 3.7625503540039062, 34.52861022949219, 36.396270751953125, 9.586496353149414, -1.4217090606689453, 22.6993408203125, 6.4340972900390625, -9.103401184082031, 50.216552734375, 39.13414001464844, 10.415245056152344, 27.71356201171875, -39.91169357299805, -12.492767333984375, 9.164306640625, -8.175926208496094, -22.065399169921875, 22.553916931152344, 9.534332275390625, 17.076385498046875, 3.6549129486083984, 3.4390792846679688, 4.0107574462890625, -2.0821800231933594, -1.4142704010009766, 11.69805908203125, -4.186622619628906, 25.875276565551758, 37.2003173828125, 28.511795043945312, 8.289203643798828, 9.700460433959961, 39.56463623046875, 33.802650451660156, 22.33001708984375, 18.587112426757812, -1.1307449340820312, 5.751186370849609, 1.0007963180541992, 1.2942581176757812, 17.027389526367188, 14.073596954345703, 5.5635986328125, 6.493621826171875, 20.37710952758789, 6.425872802734375, -0.08076667785644531, 8.216947555541992, 44.2257080078125, -18.045005798339844, 11.463623046875, 8.320507049560547, 5.742279052734375, 17.89385986328125, 16.251541137695312, -13.788677215576172], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000473.npy"}
{"epoch": 0.7150415721844293, "step": 474, "batch_size": 64, "mean": 8.970962524414062, "std": 16.61406135559082, "min": -33.086429595947266, "p10": -7.662735748291015, "median": 5.6820526123046875, "p90": 35.122387695312504, "max": 44.88385772705078, "pos_frac": 0.71875, "sample": [38.45899963378906, 7.881263732910156, 14.308731079101562, 5.097879409790039, -2.4731369018554688, -5.303062438964844, 12.318450927734375, 38.43488311767578, 18.3846435546875, 1.4382247924804688, 8.463546752929688, 3.9250946044921875, -33.086429595947266, 14.852462768554688, 2.4763870239257812, 39.81896209716797, -1.7870101928710938, 6.635013580322266, -0.6122932434082031, 8.884429931640625, -18.781347274780273, -7.957714080810547, 33.4033203125, 5.6472015380859375, 28.452682495117188, 12.036441802978516, -1.2303152084350586, -2.0438232421875, -0.6155624389648438, -1.4400405883789062, 35.859130859375, 3.4770126342773438, -19.652748107910156, 18.32868194580078, -5.0200347900390625, 16.436569213867188, 2.846729278564453, 37.718345642089844, 22.78094482421875, 2.6634674072265625, 1.604990005493164, 6.555870056152344, 20.095413208007812, 5.7169036865234375, -6.974452972412109, 24.221153259277344, 11.910064697265625, 3.3836822509765625, 20.13292694091797, 3.3511695861816406, 40.8171272277832, 44.88385772705078, 5.3323211669921875, 13.66751480102539, 13.92696762084961, -23.72905731201172, 30.119544982910156, -6.678565979003906, -8.107551574707031, 17.219223022460938, 30.51599884033203, -22.43463897705078, 5.288414001464844, 2.296783447265625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000474.npy"}
{"epoch": 0.7165532879818595, "step": 475, "batch_size": 64, "mean": 14.334149360656738, "std": 15.559943199157715, "min": -29.30138397216797, "p10": -3.2786930084228514, "median": 12.235840797424316, "p90": 33.91533508300782, "max": 54.25004959106445, "pos_frac": 0.84375, "sample": [2.5520248413085938, 1.1422691345214844, 16.24669647216797, 10.115245819091797, 26.75408172607422, 7.189476013183594, 36.88977813720703, 26.478954315185547, 2.9555978775024414, -7.880062103271484, 16.40411376953125, -0.893157958984375, 25.945755004882812, 32.96917724609375, 5.523765563964844, 19.486679077148438, 25.81572723388672, 21.504165649414062, 4.411437034606934, 20.05089569091797, 9.973037719726562, -1.11846923828125, 5.309690475463867, -3.3653030395507812, 12.007949829101562, 4.5347137451171875, 3.765899658203125, 29.1444091796875, 4.1271209716796875, -4.4435882568359375, 54.25004959106445, 39.70011520385742, 24.93553924560547, 43.917144775390625, 12.46373176574707, -29.30138397216797, 19.664703369140625, 6.419273376464844, 13.127555847167969, -3.0766029357910156, 34.320831298828125, 14.398834228515625, 11.488475799560547, 15.183914184570312, 26.13454818725586, -11.037252426147461, 11.366174697875977, 7.689401626586914, 34.579071044921875, 3.6610336303710938, 27.76766014099121, 42.13026428222656, 29.850051879882812, 27.79071807861328, 21.523658752441406, -7.0075225830078125, 0.39330291748046875, 30.351806640625, 23.013534545898438, 32.19847106933594, 11.337383270263672, 1.6943626403808594, -12.653427124023438, 5.512048721313477], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000475.npy"}
{"epoch": 0.7180650037792895, "step": 476, "batch_size": 64, "mean": 9.876466751098633, "std": 15.849506378173828, "min": -32.09950256347656, "p10": -5.9350837707519535, "median": 7.260995864868164, "p90": 32.98217430114746, "max": 48.15238952636719, "pos_frac": 0.703125, "sample": [33.05865478515625, 17.987640380859375, 0.6078453063964844, 9.58205795288086, -4.8682708740234375, 2.442138671875, -4.7055816650390625, 28.013545989990234, 32.94950866699219, 1.1964111328125, 27.800445556640625, -1.0540037155151367, 40.71489715576172, 18.594970703125, 9.989906311035156, -0.4437408447265625, 10.38043212890625, 6.17584228515625, 21.067577362060547, 0.15972900390625, -2.3124656677246094, 7.684073448181152, 35.31755065917969, 32.99617385864258, 3.883544921875, -32.09950256347656, -5.9624786376953125, 3.4130096435546875, 1.3950958251953125, -0.8834810256958008, 39.98506546020508, -0.360198974609375, 5.945709228515625, 11.32400131225586, 7.022449493408203, 7.815528869628906, -23.312942504882812, -9.974151611328125, 21.3367919921875, -3.2167625427246094, -5.871162414550781, 16.946151733398438, 4.5161590576171875, 4.8501434326171875, 18.06452178955078, 7.499542236328125, -7.871807098388672, 16.05712127685547, -1.150390625, 31.956043243408203, 16.27568817138672, -9.169158935546875, 24.658912658691406, 3.5643310546875, 19.489282608032227, 48.15238952636719, -0.42397308349609375, 8.39883804321289, 31.623775482177734, 10.897735595703125, -13.263019561767578, 25.53697967529297, 34.07896423339844, -2.370206832885742], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000476.npy"}
{"epoch": 0.7195767195767195, "step": 477, "batch_size": 64, "mean": 8.044179916381836, "std": 18.05425262451172, "min": -36.157867431640625, "p10": -10.907803344726561, "median": 5.67671537399292, "p90": 33.11940574645997, "max": 45.956085205078125, "pos_frac": 0.671875, "sample": [-9.939781188964844, -4.997200012207031, 37.36333465576172, 2.76873779296875, 5.2660064697265625, 6.696754455566406, -5.423606872558594, -7.787315368652344, 7.3144989013671875, -4.4395599365234375, 15.689544677734375, 31.415802001953125, -7.102487564086914, 1.8972358703613281, -0.4819812774658203, 1.7988357543945312, -2.0425796508789062, 26.97620391845703, 13.94384765625, -0.7641201019287109, 20.386444091796875, 19.475780487060547, -1.6313896179199219, 12.168800354003906, 27.931488037109375, -5.963371276855469, -29.487648010253906, 5.7832489013671875, 35.571502685546875, 4.23272705078125, -12.729843139648438, -4.9767913818359375, 5.570181846618652, 2.118297576904297, -35.58949279785156, -11.322669982910156, 1.194061279296875, 28.209327697753906, 15.524406433105469, 1.6436843872070312, 17.17364501953125, -15.267864227294922, 18.54627227783203, 39.35999298095703, 8.173358917236328, 37.37641143798828, -20.087120056152344, 7.47955322265625, 29.135940551757812, -9.717399597167969, 19.901611328125, 7.992153167724609, 9.876052856445312, 41.45250701904297, 31.378204345703125, 18.15948486328125, -36.157867431640625, 3.873321533203125, 15.051620483398438, 45.956085205078125, 2.214405059814453, 33.84952163696289, -3.8336181640625, 26.680299758911133], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000477.npy"}
{"epoch": 0.7210884353741497, "step": 478, "batch_size": 64, "mean": 8.795384407043457, "std": 19.869436264038086, "min": -40.71270751953125, "p10": -12.914282226562497, "median": 4.151569366455078, "p90": 37.02609176635743, "max": 49.262916564941406, "pos_frac": 0.640625, "sample": [-5.524993896484375, 45.63848114013672, 27.42974853515625, 34.13178253173828, 20.868688583374023, -14.088226318359375, 13.822174072265625, 20.975006103515625, -6.009208679199219, 28.206436157226562, -2.049571990966797, 2.1239547729492188, 25.500320434570312, 41.37886047363281, -16.46753692626953, 3.8427696228027344, -26.393280029296875, -1.6358489990234375, -1.0188255310058594, 3.6684494018554688, -1.2206687927246094, -24.58926010131836, 32.44476318359375, 26.19501495361328, 25.464115142822266, -21.180770874023438, 5.464607238769531, 7.864585876464844, 16.9361572265625, -6.891620635986328, 12.370662689208984, -1.7256965637207031, -27.91240692138672, -7.770862579345703, 5.0586090087890625, 33.259033203125, 31.43321990966797, -9.107498168945312, 2.7438125610351562, 3.2429656982421875, 41.19597244262695, -6.718585968017578, -40.71270751953125, -0.07242393493652344, 49.262916564941406, -6.740810394287109, 3.9944381713867188, 38.266510009765625, 4.3087005615234375, 4.3292999267578125, 17.259498596191406, 30.05902099609375, 0.17617034912109375, 20.195066452026367, 21.011390686035156, 47.247032165527344, 39.238548278808594, -10.175079345703125, 2.8225173950195312, 13.902671813964844, -3.0995864868164062, -7.816464424133301, 6.609535217285156, 1.8830242156982422], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000478.npy"}
{"epoch": 0.7226001511715797, "step": 479, "batch_size": 64, "mean": 8.55759048461914, "std": 17.46190643310547, "min": -47.44477081298828, "p10": -11.566828536987304, "median": 7.102934837341309, "p90": 32.23098449707032, "max": 42.28086853027344, "pos_frac": 0.734375, "sample": [8.332813262939453, 24.413597106933594, 3.9200286865234375, -18.29271697998047, -2.981029510498047, 10.746707916259766, 40.965755462646484, 16.90081787109375, 28.03314971923828, 1.6335220336914062, 6.295427322387695, -20.18618392944336, -2.5801315307617188, 0.30301666259765625, 23.66680908203125, 17.07372283935547, 42.28086853027344, 12.889350891113281, -0.5210723876953125, 2.277820587158203, -3.8567161560058594, 6.2061767578125, 13.2398681640625, 39.38331604003906, 3.7756271362304688, -27.8934326171875, 11.18191909790039, 8.858734130859375, -6.408077239990234, -0.28066253662109375, 8.817794799804688, 25.170076370239258, -5.865936279296875, 32.52948760986328, 9.430862426757812, 2.2081680297851562, -16.605243682861328, 10.413246154785156, 2.2667274475097656, 38.74836349487305, 22.653465270996094, -47.44477081298828, 0.2550201416015625, 8.799671173095703, 21.03791046142578, 38.533966064453125, 6.147869110107422, -12.05804443359375, -10.420658111572266, 2.2679519653320312, 5.8325347900390625, 19.447616577148438, -14.933891296386719, 36.8033447265625, -1.4962406158447266, 3.2324790954589844, 31.195636749267578, 7.910442352294922, 12.666267395019531, 18.389049530029297, 0.09276580810546875, -7.334381103515625, 31.53447723388672, 28.080720901489258], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000479.npy"}
{"epoch": 0.7241118669690099, "step": 480, "batch_size": 64, "mean": 11.412254333496094, "std": 18.107221603393555, "min": -44.13832092285156, "p10": -11.375802993774414, "median": 11.565838813781738, "p90": 33.41287002563477, "max": 43.0543212890625, "pos_frac": 0.734375, "sample": [23.029647827148438, -1.30780029296875, 30.17969512939453, -13.274215698242188, 5.678522109985352, 1.9547767639160156, 25.08877944946289, -44.13832092285156, -1.5749168395996094, 18.949676513671875, -5.61461067199707, -2.539794921875, 24.96466827392578, -11.670173645019531, 30.882400512695312, -17.555450439453125, -3.751190185546875, 9.432243347167969, -0.9863204956054688, 5.483516693115234, -10.68893814086914, 32.648414611816406, 18.63405990600586, 3.0760040283203125, 16.804931640625, 26.692405700683594, 34.80476379394531, 24.11676025390625, 43.0543212890625, 9.348892211914062, 20.9686279296875, -2.232407569885254, -2.8112030029296875, 10.01719856262207, 19.23764419555664, 1.5946235656738281, 31.775978088378906, 33.74049377441406, -40.551727294921875, 24.857685089111328, 29.95883560180664, 37.38468933105469, 28.430580139160156, 16.08432388305664, 1.052001953125, 34.417259216308594, 6.516448974609375, 14.891845703125, 7.167213439941406, 13.114479064941406, 3.8803138732910156, -5.5830078125, 38.11311340332031, 19.717514038085938, 23.41504669189453, 17.70131492614746, 37.69492721557617, 23.7492733001709, 9.207630157470703, -15.06558609008789, 6.451366424560547, 22.11687469482422, 7.7097930908203125, -16.061676025390625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000480.npy"}
{"epoch": 0.7256235827664399, "step": 481, "batch_size": 64, "mean": 10.39214038848877, "std": 17.564559936523438, "min": -37.528846740722656, "p10": -5.61058349609375, "median": 6.17659854888916, "p90": 35.198194885253926, "max": 56.56170654296875, "pos_frac": 0.75, "sample": [37.083126068115234, -4.177284240722656, 42.23619842529297, -3.1891326904296875, -5.543361663818359, 24.208778381347656, 25.140548706054688, -4.769966125488281, -5.461830139160156, -5.639392852783203, 25.651214599609375, 1.9615402221679688, 9.88543701171875, 12.48577880859375, -5.484428405761719, 19.6624755859375, 23.71685791015625, 11.088573455810547, 9.877525329589844, 15.350814819335938, -37.528846740722656, 5.90568733215332, 20.009536743164062, 11.227249145507812, 3.573261260986328, -5.997407913208008, 2.81201171875, 25.35567855834961, 3.0084877014160156, 18.026748657226562, 1.4709949493408203, 9.106990814208984, 0.48267364501953125, -0.552947998046875, 5.168220520019531, 2.40576171875, 16.6414794921875, 3.912322998046875, 4.411277770996094, 5.136100769042969, 6.447509765625, 15.587169647216797, 30.80002212524414, 51.58850860595703, 1.0732765197753906, 39.79353713989258, -8.082855224609375, 2.3490829467773438, 47.664306640625, 23.684242248535156, 21.234397888183594, -0.5318069458007812, 3.2935943603515625, -6.201568603515625, -21.039119720458984, 50.0302734375, -21.690715789794922, 9.541976928710938, 16.464744567871094, 56.56170654296875, 18.65235137939453, 9.625041961669922, -3.73150634765625, 3.324066162109375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000481.npy"}
{"epoch": 0.72713529856387, "step": 482, "batch_size": 64, "mean": 9.79744815826416, "std": 18.878564834594727, "min": -36.46308898925781, "p10": -10.005472183227537, "median": 6.94184684753418, "p90": 36.50613021850587, "max": 54.61553955078125, "pos_frac": 0.671875, "sample": [-8.2574462890625, -36.46308898925781, 0.3107490539550781, 4.794078826904297, 5.507228851318359, -7.158332824707031, 2.0374755859375, -13.647491455078125, 2.297149658203125, 41.58380126953125, 10.929414749145508, 28.89215087890625, 27.001358032226562, 4.821311950683594, 26.10584259033203, 17.826465606689453, 24.7755126953125, 7.279369354248047, -28.524307250976562, 7.424522399902344, 41.42165756225586, -0.28192138671875, 54.61553955078125, -7.609954833984375, -1.2268180847167969, 21.953353881835938, 41.87675476074219, 23.02444076538086, -5.471530914306641, 1.95184326171875, -12.520065307617188, 20.01423454284668, -6.002372741699219, 18.529537200927734, 43.00176239013672, 34.99834442138672, 22.945701599121094, 2.6201438903808594, -0.9954147338867188, -5.27374267578125, 33.52191162109375, 10.04412841796875, 5.3824310302734375, 9.282028198242188, 10.130819320678711, 37.03778839111328, 18.504257202148438, 10.736763000488281, 42.626708984375, -1.780426025390625, 20.359848022460938, 6.6043243408203125, 26.386215209960938, -2.3780975341796875, 22.471370697021484, 0.48837852478027344, -22.724578857421875, 35.265594482421875, -6.645957946777344, -16.687541961669922, 9.129737854003906, -6.511383056640625, -8.787353515625, -10.527523040771484], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000482.npy"}
{"epoch": 0.7286470143613001, "step": 483, "batch_size": 64, "mean": 12.537254333496094, "std": 17.927221298217773, "min": -43.697975158691406, "p10": -4.42191104888916, "median": 7.456356048583984, "p90": 35.008056640625, "max": 52.38287353515625, "pos_frac": 0.734375, "sample": [3.2882537841796875, 30.672134399414062, -0.7097244262695312, 24.564346313476562, 32.159690856933594, 37.699424743652344, 32.27430725097656, -2.1866607666015625, -6.16815185546875, -1.1491470336914062, 5.944004058837891, 11.106193542480469, 22.07080078125, 44.29069519042969, 1.867156982421875, 27.214223861694336, -4.612941741943359, -10.740341186523438, 3.4980411529541016, 44.527191162109375, 28.576431274414062, 43.24568176269531, 29.963882446289062, -1.779266357421875, 33.121917724609375, 5.435310363769531, 35.20977020263672, 13.005569458007812, -4.4815673828125, 18.383285522460938, 24.880897521972656, -28.670257568359375, 34.537391662597656, 0.9334335327148438, 3.7972412109375, 29.3935546875, 19.324913024902344, -43.697975158691406, 4.884193420410156, -1.5288619995117188, 3.48822021484375, 12.575191497802734, 8.967300415039062, 3.8015975952148438, -3.9131431579589844, 20.01148223876953, 52.38287353515625, 37.73930358886719, 21.74822235107422, 17.09783172607422, 10.670852661132812, -1.947265625, 5.028343200683594, 32.396209716796875, -4.282712936401367, -6.376731872558594, -0.10973739624023438, 5.945411682128906, 22.211753845214844, 5.560302734375, 0.35497283935546875, 4.630458831787109, -2.7716827392578125, 17.03015899658203], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000483.npy"}
{"epoch": 0.7301587301587301, "step": 484, "batch_size": 64, "mean": 8.684356689453125, "std": 14.56873607635498, "min": -31.952068328857422, "p10": -5.352019500732422, "median": 4.686601638793945, "p90": 27.020398712158208, "max": 45.11552429199219, "pos_frac": 0.734375, "sample": [17.345531463623047, -5.989967346191406, -8.160270690917969, 2.289722442626953, 43.59699249267578, 1.2992773056030273, 16.25761604309082, -31.952068328857422, 33.7396240234375, -3.0962142944335938, 4.124114990234375, 1.26611328125, -5.338096618652344, 1.8284149169921875, 13.990741729736328, 7.534698486328125, -12.801681518554688, 7.403163909912109, -0.44713592529296875, 22.116867065429688, -1.8905792236328125, 18.130088806152344, 37.17682647705078, 12.368106842041016, 3.607877731323242, 25.99799346923828, 27.458572387695312, 16.67974853515625, 21.496185302734375, 11.364509582519531, -2.981321334838867, 18.038898468017578, -0.8853912353515625, 0.6006698608398438, 34.214599609375, 1.5061187744140625, 0.25765037536621094, 2.541898727416992, 5.918968200683594, 4.553653717041016, 17.299592971801758, -4.666656494140625, 45.11552429199219, -4.1031951904296875, 7.957281112670898, 4.06390380859375, 21.40015411376953, -2.3907814025878906, 18.21136474609375, 16.824142456054688, 39.51616668701172, 3.769927978515625, 15.030464172363281, 0.28656005859375, -11.223251342773438, -15.6729736328125, 17.611404418945312, -1.1578521728515625, 1.0991706848144531, 4.819549560546875, 6.3299713134765625, 19.39470672607422, 20.479164123535156, -5.3579864501953125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000484.npy"}
{"epoch": 0.7316704459561603, "step": 485, "batch_size": 64, "mean": 10.056386947631836, "std": 16.603363037109375, "min": -26.500442504882812, "p10": -8.425116920471192, "median": 5.979226112365723, "p90": 35.41799011230469, "max": 46.32110595703125, "pos_frac": 0.734375, "sample": [12.206008911132812, 1.7290611267089844, 32.44841766357422, 18.984237670898438, 7.734504699707031, -1.6541671752929688, 8.27940559387207, -7.916852951049805, 17.13349151611328, -12.51218032836914, -26.500442504882812, 19.362106323242188, 41.99608612060547, 34.382537841796875, 4.2921905517578125, 12.6878662109375, 12.91888427734375, 14.68984603881836, 38.48626708984375, 4.045440673828125, 5.54041862487793, 3.89874267578125, -13.818489074707031, 35.86175537109375, 4.025447845458984, 1.5541439056396484, -23.14276123046875, 10.594017028808594, -3.5132293701171875, 18.022132873535156, -8.829421997070312, 7.4328155517578125, 30.46942138671875, -0.3040351867675781, 5.258750915527344, -0.771728515625, 46.32110595703125, 46.125205993652344, 2.123870849609375, 31.232322692871094, 6.734550476074219, -7.881843566894531, 2.188140869140625, -8.6429443359375, 6.418033599853516, -3.7569656372070312, -0.5905094146728516, 27.556289672851562, 11.95510482788086, -5.7457733154296875, 6.633552551269531, 5.138214111328125, 26.445594787597656, 42.581459045410156, 4.5261993408203125, 28.98566436767578, 19.123157501220703, 40.98040008544922, -1.6802215576171875, -9.104265213012695, 13.061737060546875, 2.2491798400878906, 0.4864044189453125, 5.0743865966796875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000485.npy"}
{"epoch": 0.7331821617535903, "step": 486, "batch_size": 64, "mean": 11.130483627319336, "std": 19.56717872619629, "min": -39.83440399169922, "p10": -7.907587623596191, "median": 6.923835754394531, "p90": 37.01080932617188, "max": 62.516815185546875, "pos_frac": 0.734375, "sample": [0.8552169799804688, 2.5277328491210938, 43.733001708984375, 0.7101249694824219, 0.8468246459960938, -3.563079833984375, 7.16326904296875, 33.650543212890625, 11.876541137695312, 1.9393310546875, -8.880779266357422, 18.596405029296875, 29.399795532226562, 17.24551010131836, 36.22294616699219, 4.628105163574219, -3.7920074462890625, 46.256290435791016, -1.1088943481445312, 32.45128631591797, 6.305206298828125, 8.849273681640625, -1.8237152099609375, -3.287139892578125, 48.342742919921875, -15.62310791015625, 21.567970275878906, -8.00984001159668, 16.630008697509766, -34.754791259765625, 42.2958984375, 32.33417510986328, 4.8144073486328125, -23.776973724365234, 1.7890968322753906, 18.31420135498047, -0.203643798828125, 27.241127014160156, -10.861150741577148, -3.8747177124023438, 16.534038543701172, 33.75912857055664, 6.752960205078125, 37.34846496582031, 29.123600006103516, 3.6735382080078125, 3.5935516357421875, 16.520767211914062, 35.848182678222656, 40.34894561767578, 14.3983154296875, 5.725433349609375, 5.86578369140625, 7.340789794921875, -4.936864852905273, 1.8749542236328125, 7.0947113037109375, 10.019798278808594, 62.516815185546875, -7.668998718261719, 24.960525512695312, -39.83440399169922, 8.166629791259766, -3.702880859375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000486.npy"}
{"epoch": 0.7346938775510204, "step": 487, "batch_size": 64, "mean": 10.707120895385742, "std": 16.028793334960938, "min": -31.36846160888672, "p10": -8.95465850830078, "median": 9.377073287963867, "p90": 33.85388107299805, "max": 46.766021728515625, "pos_frac": 0.75, "sample": [-1.8491973876953125, -10.77001953125, 18.20166778564453, 31.095840454101562, 24.028175354003906, 11.243600845336914, -5.04791259765625, 16.24234390258789, 25.20233154296875, 19.208641052246094, 13.79034423828125, -7.851509094238281, 2.251617431640625, -9.791767120361328, -4.284599304199219, 43.665626525878906, 16.672836303710938, 8.821632385253906, 43.963623046875, 11.9468994140625, -14.665534973144531, 4.79217529296875, 3.1102218627929688, 34.339813232421875, 19.05276107788086, -2.9679908752441406, -6.748931884765625, -9.892730712890625, 18.091163635253906, 8.672332763671875, 27.637924194335938, 4.870941162109375, 14.352806091308594, -5.617687225341797, 39.03273010253906, 11.430084228515625, 29.10523223876953, -1.3786697387695312, 21.93077850341797, 10.070976257324219, 8.3056640625, 5.029853820800781, -13.412841796875, 21.993392944335938, -1.85906982421875, 11.223369598388672, 9.932514190673828, 2.3096847534179688, 3.713146209716797, -31.36846160888672, 0.12158203125, 38.26545333862305, 4.0943450927734375, -9.427436828613281, 8.364494323730469, 19.692245483398438, 5.335975646972656, 10.4576416015625, 2.9829864501953125, 37.35191345214844, 32.72003936767578, 19.5744686126709, 46.766021728515625, 1.1302146911621094], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000487.npy"}
{"epoch": 0.7362055933484505, "step": 488, "batch_size": 64, "mean": 11.58226203918457, "std": 20.078874588012695, "min": -37.47264862060547, "p10": -9.357250213623047, "median": 7.241798400878906, "p90": 40.935362625122075, "max": 56.034027099609375, "pos_frac": 0.65625, "sample": [-8.77130126953125, -12.470142364501953, -5.396566390991211, -16.187847137451172, 15.093185424804688, -5.739567756652832, -37.47264862060547, 18.131366729736328, 39.59117126464844, 33.70595932006836, 0.7637386322021484, -9.2900390625, 43.490089416503906, 9.889904022216797, 3.678741455078125, 56.034027099609375, 2.139495849609375, 19.78136444091797, -0.6245040893554688, -8.105884552001953, 7.095184326171875, 8.350372314453125, 6.969127655029297, 2.0160980224609375, -9.386054992675781, 7.3884124755859375, -11.633567810058594, 34.52696990966797, -3.0610198974609375, 42.93095397949219, -2.7597274780273438, -9.567352294921875, -1.6439132690429688, 42.44123077392578, 12.063522338867188, 2.204967498779297, 39.9107666015625, -5.576066970825195, 46.84584045410156, 38.57780456542969, 37.52760314941406, -2.9625244140625, 34.81044006347656, -5.3865203857421875, 41.28443145751953, 10.427833557128906, -0.49509429931640625, 7.430381774902344, -14.716064453125, 30.098587036132812, 0.5529022216796875, 13.238992691040039, -8.171390533447266, 7.69549560546875, 12.896087646484375, 40.12086868286133, 18.895751953125, 39.779396057128906, 26.924285888671875, 4.158515930175781, 13.917892456054688, 6.772575378417969, -3.723846435546875, 44.254119873046875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000488.npy"}
{"epoch": 0.7377173091458806, "step": 489, "batch_size": 64, "mean": 5.1234130859375, "std": 15.078926086425781, "min": -24.552413940429688, "p10": -13.121737861633301, "median": 4.255636215209961, "p90": 24.94163284301758, "max": 46.02710723876953, "pos_frac": 0.609375, "sample": [-13.221914291381836, 4.732719421386719, 4.665092468261719, 26.797683715820312, 25.086380004882812, 38.315181732177734, -1.2543258666992188, 12.646835327148438, 3.888519287109375, 46.02710723876953, 10.524024963378906, -10.337966918945312, 6.022735595703125, -23.850196838378906, -1.292032241821289, -24.552413940429688, -7.669597625732422, -4.143257141113281, 10.31549072265625, 5.235355377197266, 14.444320678710938, 8.809524536132812, 13.402706146240234, 3.3865890502929688, 14.681556701660156, 37.18236541748047, -2.1185340881347656, 2.3935699462890625, -1.074462890625, 6.83319091796875, -12.887992858886719, -21.453033447265625, 31.672378540039062, 4.622753143310547, -1.2123336791992188, -16.778806686401367, -3.13580322265625, 17.608671188354492, 15.191566467285156, 5.376129150390625, 12.645980834960938, 11.416229248046875, 8.454452514648438, 13.49993896484375, 10.522811889648438, -5.741645812988281, 24.60388946533203, 2.0859909057617188, 23.099834442138672, 2.65283203125, -16.770431518554688, 0.1209259033203125, 1.7371444702148438, -0.8679523468017578, -1.5926170349121094, -23.130233764648438, 21.086669921875, -4.3993988037109375, -6.282630920410156, -2.2623214721679688, -4.3761749267578125, 37.01017761230469, -5.730672836303711, 5.2358551025390625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000489.npy"}
{"epoch": 0.7392290249433107, "step": 490, "batch_size": 64, "mean": 11.330093383789062, "std": 20.028274536132812, "min": -34.144287109375, "p10": -11.044090652465819, "median": 6.059272766113281, "p90": 38.710929489135744, "max": 51.99120330810547, "pos_frac": 0.71875, "sample": [1.9405231475830078, -14.353485107421875, -30.62591552734375, 16.6572265625, -3.7921247482299805, 12.166828155517578, 6.1201171875, 5.8198394775390625, 50.49822998046875, 39.68902587890625, -13.58367919921875, 42.60479736328125, -21.651870727539062, 4.8643646240234375, 23.280675888061523, 13.197908401489258, 38.951629638671875, 37.19660186767578, 26.161636352539062, -0.0460052490234375, 6.22174072265625, 19.3072509765625, 36.16500473022461, 32.88481903076172, -9.05279541015625, -1.210235595703125, 36.0579833984375, -1.2398910522460938, 8.113346099853516, 0.9093170166015625, -7.2192535400390625, 5.9984283447265625, 4.309028625488281, 27.29619598388672, 0.180694580078125, 5.5450592041015625, 0.6874923706054688, 49.54448699951172, -25.651042938232422, -0.7163314819335938, 33.683650970458984, 1.02490234375, -11.897502899169922, 37.567630767822266, 0.4471092224121094, 12.486061096191406, 3.493927001953125, -4.536769866943359, -5.950447082519531, -4.927391052246094, 39.781768798828125, 19.406925201416016, 0.07662582397460938, 24.904090881347656, 51.99120330810547, 19.337574005126953, 10.268238067626953, 25.53270721435547, 0.8976898193359375, 38.149295806884766, -34.144287109375, -2.248371124267578, 23.425628662109375, 23.128040313720703], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000490.npy"}
{"epoch": 0.7407407407407407, "step": 491, "batch_size": 64, "mean": 11.291996955871582, "std": 15.72107982635498, "min": -15.48159408569336, "p10": -6.416753578186035, "median": 6.483787536621094, "p90": 33.86292190551758, "max": 43.50574493408203, "pos_frac": 0.75, "sample": [43.50574493408203, -7.133689880371094, 14.508003234863281, 4.958671569824219, 12.410667419433594, 20.701396942138672, 12.486593246459961, 5.433837890625, 9.202205657958984, 8.54806137084961, 7.476043701171875, -2.872772216796875, 28.933574676513672, -3.321197509765625, 29.619281768798828, -6.023193359375, 5.304805755615234, 28.734039306640625, 2.6639022827148438, 17.806716918945312, 3.7444610595703125, -4.186309814453125, 30.551868438720703, -0.07436752319335938, -6.115896224975586, 20.900707244873047, 3.504058837890625, 4.2394256591796875, -6.653053283691406, 3.2721176147460938, 41.206790924072266, 14.021995544433594, -6.60504150390625, 14.281585693359375, 31.273887634277344, 16.120319366455078, 3.1779327392578125, 40.60832214355469, 7.704833984375, -13.556133270263672, 40.28449249267578, -0.7188262939453125, 0.32676124572753906, 0.31954193115234375, 31.27196502685547, 16.292285919189453, -4.3335723876953125, -6.545692443847656, 33.91663360595703, 30.7337646484375, 1.2574005126953125, 4.323020935058594, 40.878318786621094, 43.154388427734375, 20.483339309692383, 5.4915313720703125, 8.925479888916016, 2.1217575073242188, 19.637493133544922, -15.48159408569336, -12.59280776977539, 3.257396697998047, -4.413055419921875, 33.73759460449219], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000491.npy"}
{"epoch": 0.7422524565381708, "step": 492, "batch_size": 64, "mean": 7.504390716552734, "std": 17.0444278717041, "min": -26.981285095214844, "p10": -12.687799072265625, "median": 5.6387434005737305, "p90": 33.31214752197266, "max": 44.359596252441406, "pos_frac": 0.625, "sample": [42.10124206542969, 29.717926025390625, -2.237699508666992, 5.695859909057617, 17.437538146972656, -8.429519653320312, 34.05653381347656, -4.452781677246094, 2.8861007690429688, -2.4529685974121094, 13.341552734375, 10.588424682617188, 21.786209106445312, -2.5333786010742188, 13.687919616699219, 7.7642822265625, -8.036712646484375, 9.260452270507812, -0.7471351623535156, 5.581626892089844, 15.701976776123047, -20.016799926757812, -12.716400146484375, -3.5421676635742188, 10.339370727539062, 12.853813171386719, 10.562149047851562, 42.99143981933594, -6.1076202392578125, -26.981285095214844, 38.56805419921875, 6.592979431152344, 28.857894897460938, 2.2742919921875, -17.28058433532715, 4.558631896972656, -17.758651733398438, 44.359596252441406, -12.621063232421875, -9.405563354492188, 30.28936767578125, 24.55337142944336, 4.2979736328125, 0.4496917724609375, -2.851104736328125, 13.572952270507812, 31.953750610351562, 6.15765380859375, 2.03546142578125, 19.177265167236328, -14.509590148925781, -18.60077667236328, -9.730415344238281, 23.462684631347656, 22.758808135986328, 4.673572540283203, 33.894317626953125, -4.3437957763671875, 13.166351318359375, -10.418975830078125, 8.161348342895508, -0.1959075927734375, -8.591110229492188, 34.672584533691406], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000492.npy"}
{"epoch": 0.7437641723356009, "step": 493, "batch_size": 64, "mean": 11.352909088134766, "std": 18.891963958740234, "min": -34.28820037841797, "p10": -12.872665405273436, "median": 10.123872756958008, "p90": 34.59941864013672, "max": 51.45664978027344, "pos_frac": 0.765625, "sample": [12.261219024658203, 1.9799690246582031, -7.8731231689453125, 19.50722312927246, 7.9344940185546875, 2.372283935546875, -34.28820037841797, 5.889823913574219, 7.263355255126953, -1.3673515319824219, 3.9499588012695312, 29.33984375, 23.286880493164062, 46.26786804199219, 8.746204376220703, -15.944713592529297, 34.5916748046875, 1.2793769836425781, 30.529800415039062, 32.635711669921875, -4.937578201293945, 3.2671737670898438, -13.369316101074219, 11.613815307617188, -0.9656639099121094, -14.922149658203125, 34.60273742675781, 9.362674713134766, 51.45664978027344, 12.430171012878418, -13.826324462890625, 15.184822082519531, -10.235931396484375, 17.162551879882812, -11.713813781738281, 16.131179809570312, 31.055767059326172, -11.182624816894531, 26.131256103515625, 4.693981170654297, 3.786895751953125, 29.982940673828125, 11.985092163085938, 18.02197265625, 6.589164733886719, 45.281375885009766, -15.155441284179688, 26.206314086914062, 33.76630783081055, -9.826332092285156, 1.1339035034179688, 35.30989074707031, 8.025497436523438, 9.198143005371094, 45.41114044189453, 22.109207153320312, 11.895126342773438, 3.50201416015625, 10.88507080078125, 49.224796295166016, 20.876937866210938, 12.988739013671875, 19.13787078857422, -34.0521354675293], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000493.npy"}
{"epoch": 0.745275888133031, "step": 494, "batch_size": 64, "mean": 9.522610664367676, "std": 18.64417266845703, "min": -34.21214294433594, "p10": -10.89846496582031, "median": 6.371246337890625, "p90": 36.8330020904541, "max": 52.843048095703125, "pos_frac": 0.6875, "sample": [23.80066680908203, -2.7780838012695312, 0.16473388671875, -3.3857879638671875, 25.631649017333984, 0.7094879150390625, 3.5971107482910156, 24.539596557617188, 4.303443908691406, 4.7965240478515625, -0.5029201507568359, -7.098320007324219, 2.321807861328125, 12.977577209472656, 6.791526794433594, -2.4170608520507812, 33.06990051269531, -7.476715087890625, 12.846435546875, -6.428558349609375, 5.9335784912109375, 52.843048095703125, 18.419483184814453, 9.994190216064453, 12.108535766601562, 43.121978759765625, 11.671607971191406, -6.959102630615234, 28.01654052734375, 37.11783981323242, 25.41156768798828, -34.21214294433594, -25.554969787597656, 15.454809188842773, -0.47618675231933594, -12.564468383789062, -7.228889465332031, 23.922698974609375, 11.5518798828125, 5.891166687011719, 19.69818115234375, -6.598594665527344, -32.128265380859375, -4.345924377441406, 9.53005599975586, 14.26235580444336, 46.324920654296875, 5.950965881347656, -3.714447021484375, 12.69189453125, 36.16838073730469, 7.1020355224609375, 11.552398681640625, 43.88189697265625, 2.882080078125, 24.988563537597656, -13.707870483398438, -12.668743133544922, 37.73438262939453, 1.444183349609375, 32.117218017578125, -12.36492919921875, 3.0315399169921875, 45.688621520996094], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000494.npy"}
{"epoch": 0.7467876039304611, "step": 495, "batch_size": 64, "mean": 8.856720924377441, "std": 17.98714256286621, "min": -37.98383331298828, "p10": -10.591375350952148, "median": 7.18638801574707, "p90": 33.389050292968754, "max": 48.25086212158203, "pos_frac": 0.703125, "sample": [-37.98383331298828, -3.5089874267578125, -14.75750732421875, -36.06641387939453, 34.64569091796875, -6.451572418212891, -17.964963912963867, 5.0251007080078125, 11.438682556152344, 21.400726318359375, -18.949386596679688, -15.902557373046875, 0.38753509521484375, 4.05877685546875, 23.765625, 34.943687438964844, -10.723640441894531, 29.068008422851562, 30.907543182373047, -4.571636199951172, 1.794260025024414, 12.064704895019531, 2.4389114379882812, 17.187837600708008, 5.891056060791016, 37.03368377685547, 2.6310691833496094, -1.669464111328125, 10.12883472442627, 15.528223037719727, 30.284942626953125, 14.629642486572266, 19.988773345947266, -9.151376724243164, 14.705757141113281, -9.95566177368164, 12.695602416992188, 9.391670227050781, 44.54156494140625, 27.249637603759766, -10.282756805419922, 11.421470642089844, 19.85749053955078, 8.851913452148438, 32.624664306640625, 20.095439910888672, 7.781635284423828, -1.4950485229492188, -6.757720947265625, 48.25086212158203, -0.6438789367675781, 27.524662017822266, 3.6757965087890625, 4.632198333740234, 1.12567138671875, 44.101715087890625, 16.245223999023438, -8.406448364257812, -1.428253173828125, 6.5911407470703125, 13.980995178222656, 4.04791259765625, 33.716644287109375, 5.148284912109375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000495.npy"}
{"epoch": 0.7482993197278912, "step": 496, "batch_size": 64, "mean": 10.85169506072998, "std": 17.96400260925293, "min": -27.772171020507812, "p10": -10.024232482910154, "median": 8.84652328491211, "p90": 30.82422046661377, "max": 56.75669860839844, "pos_frac": 0.71875, "sample": [12.351936340332031, 15.751922607421875, 56.75669860839844, 30.91064453125, -5.6291961669921875, 7.989234924316406, -12.008853912353516, 45.569091796875, 13.232208251953125, 28.29302978515625, -1.0898399353027344, 6.443565368652344, 23.890594482421875, 16.585189819335938, 3.649303436279297, -1.9086380004882812, 0.7207717895507812, 21.605411529541016, 30.6225643157959, 19.796646118164062, -8.453178405761719, 30.513702392578125, -0.6245384216308594, 21.845924377441406, 9.055854797363281, -14.9151611328125, 42.4169921875, 23.298324584960938, 1.177175521850586, 20.261260986328125, -22.17511749267578, 0.5194320678710938, 43.676963806152344, 44.99798583984375, 4.729522705078125, 3.3793067932128906, -25.128662109375, 4.6021728515625, 2.3087921142578125, -4.910026550292969, 4.944244384765625, 21.166336059570312, 25.160598754882812, 15.083086013793945, 12.505287170410156, 7.386146545410156, 15.024223327636719, 13.02589225769043, -4.603546142578125, 8.637191772460938, 46.56968688964844, -27.772171020507812, -8.531295776367188, 1.2312068939208984, 21.838699340820312, 18.464950561523438, 16.405982971191406, 28.789894104003906, -5.596858978271484, -0.40654754638671875, -10.6640625, 23.50713348388672, -16.56513214111328, -1.2014312744140625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000496.npy"}
{"epoch": 0.7498110355253212, "step": 497, "batch_size": 64, "mean": 12.324544906616211, "std": 17.25216293334961, "min": -33.536766052246094, "p10": -5.82351608276367, "median": 10.161884307861328, "p90": 34.362490081787115, "max": 53.48771667480469, "pos_frac": 0.75, "sample": [53.2901725769043, -0.11268997192382812, 53.48771667480469, 27.091270446777344, 32.42755126953125, 0.08413314819335938, 14.634033203125, 12.526180267333984, 3.184438705444336, -0.0982818603515625, 15.532978057861328, 21.952301025390625, 30.24083709716797, -1.0484771728515625, 7.3439178466796875, 1.6667861938476562, 25.098182678222656, -33.536766052246094, 19.300537109375, 3.9938812255859375, -3.175699234008789, 33.68413543701172, 12.325927734375, -1.3904266357421875, 2.5139846801757812, -6.602226257324219, -6.802978515625, 16.080108642578125, 11.948570251464844, 29.502777099609375, -3.5914783477783203, 9.55267333984375, 3.2605438232421875, 24.468795776367188, -4.411590576171875, 28.793846130371094, 36.24597930908203, 21.76523208618164, 8.510459899902344, -22.749027252197266, 34.65321350097656, 4.495841979980469, 2.3338050842285156, 3.9856033325195312, 8.411376953125, -2.872020721435547, 48.146217346191406, 44.952301025390625, -6.428627014160156, 10.510505676269531, 7.431766510009766, 13.891693115234375, 11.314414978027344, 15.643783569335938, 14.309539794921875, 47.034751892089844, -14.382720947265625, 12.44781494140625, -0.06369781494140625, 19.74378204345703, 9.813262939453125, 25.443252563476562, -7.0077362060546875, 7.97442626953125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000497.npy"}
{"epoch": 0.7513227513227513, "step": 498, "batch_size": 64, "mean": 11.46841812133789, "std": 18.20676040649414, "min": -37.395965576171875, "p10": -6.521361541748046, "median": 7.482282638549805, "p90": 37.70654678344727, "max": 51.805213928222656, "pos_frac": 0.734375, "sample": [-9.393085479736328, 14.91729736328125, 36.22209167480469, 2.4640884399414062, 10.026641845703125, -13.103710174560547, 5.284217834472656, 23.750289916992188, 2.4039268493652344, -5.39892578125, 38.11408233642578, 9.027908325195312, -0.9505538940429688, 4.1360626220703125, 33.016357421875, 12.898521423339844, 1.018728256225586, 2.7993698120117188, -1.7136268615722656, 7.827243804931641, 51.805213928222656, 3.9910736083984375, 30.016006469726562, -6.684318542480469, 10.662452697753906, 12.778549194335938, -2.49652099609375, 30.574684143066406, 19.519393920898438, -2.381824493408203, 22.708816528320312, 34.8589973449707, 1.080392837524414, 2.0029525756835938, 0.7077178955078125, -14.091278076171875, -2.4502410888671875, 31.40361785888672, 0.6416854858398438, 16.543529510498047, 39.97494888305664, 29.95431137084961, -37.395965576171875, 22.25223731994629, 13.120365142822266, 9.625286102294922, 10.833572387695312, 7.010246276855469, 12.0330810546875, 7.137321472167969, 39.97321319580078, 50.5540657043457, 9.086601257324219, -11.513076782226562, 36.75563049316406, -7.409640312194824, 46.96052551269531, 2.139404296875, -4.061004638671875, -1.11468505859375, -6.1411285400390625, -5.892631530761719, 3.929462432861328, 51.628868103027344], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000498.npy"}
{"epoch": 0.7528344671201814, "step": 499, "batch_size": 64, "mean": 10.216790199279785, "std": 19.68571662902832, "min": -31.879638671875, "p10": -17.306961441040038, "median": 9.433708190917969, "p90": 37.21257629394531, "max": 46.01087951660156, "pos_frac": 0.6875, "sample": [38.34617233276367, 42.215736389160156, -1.2179336547851562, 20.351211547851562, 17.183998107910156, 8.6162109375, -1.439493179321289, -28.920974731445312, -14.387649536132812, 19.040992736816406, 13.124958038330078, 36.108131408691406, 17.820098876953125, -4.061429977416992, 16.343368530273438, 2.11859130859375, 36.960693359375, -18.448925018310547, 37.320526123046875, 34.198875427246094, 17.823455810546875, -6.5553436279296875, -19.48529815673828, -24.946102142333984, -17.679241180419922, 15.60125732421875, 9.641769409179688, 30.356597900390625, -6.863292694091797, -9.02669906616211, -1.367666244506836, -8.916156768798828, 29.533451080322266, 5.7048797607421875, 10.114892959594727, 1.564300537109375, 11.025856018066406, 9.22564697265625, 6.00311279296875, 0.8982582092285156, 36.299903869628906, -4.344173431396484, 43.44611358642578, 5.9271392822265625, -11.986602783203125, 4.77532958984375, 10.709938049316406, 26.007583618164062, 39.78082275390625, 46.01087951660156, 19.13848876953125, 20.996623992919922, 7.951164245605469, -16.438308715820312, -31.879638671875, 12.737579345703125, 36.55005645751953, 22.469776153564453, -19.350006103515625, 33.85557556152344, -0.10605621337890625, 40.8841552734375, 2.971099853515625, 3.5403289794921875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000499.npy"}
{"epoch": 0.7543461829176115, "step": 500, "batch_size": 64, "mean": 10.52761459350586, "std": 16.572147369384766, "min": -30.508663177490234, "p10": -8.06180839538574, "median": 5.529514312744141, "p90": 35.77362709045411, "max": 49.688270568847656, "pos_frac": 0.78125, "sample": [1.4114151000976562, 4.9542236328125, 3.7782669067382812, -1.1697883605957031, 12.480484008789062, 6.070045471191406, 17.981040954589844, 14.28192138671875, 3.2020339965820312, 43.63020706176758, 12.650650024414062, 41.422401428222656, 23.02096176147461, 2.0814971923828125, 36.84678649902344, 16.496231079101562, 49.688270568847656, -15.644264221191406, 6.5052490234375, 3.716522216796875, 36.95351791381836, -15.88625717163086, 1.3084487915039062, 8.944725036621094, -5.3578033447265625, 30.67926788330078, 32.64594268798828, 2.70074462890625, 2.8298110961914062, -11.112068176269531, -13.56619644165039, 0.6496734619140625, 3.891193389892578, 13.629135131835938, -30.508663177490234, 1.5239639282226562, 7.7862548828125, 21.773040771484375, 29.16545867919922, 1.271087646484375, 13.124069213867188, -1.9330215454101562, 13.883636474609375, 9.320072174072266, -9.00649642944336, 17.573074340820312, 4.3291473388671875, -14.290283203125, 33.269588470458984, -3.7305984497070312, 38.98622131347656, -4.142425537109375, 2.8887157440185547, 4.988983154296875, 22.67483901977539, 3.406597137451172, -0.8694114685058594, 30.676254272460938, 30.016300201416016, 2.6783828735351562, -5.857536315917969, 22.672061920166016, 37.19069290161133, 23.19305419921875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000500.npy"}
{"epoch": 0.7558578987150416, "step": 501, "batch_size": 64, "mean": 9.115702629089355, "std": 14.46723461151123, "min": -12.374526977539062, "p10": -5.339834022521972, "median": 4.853894233703613, "p90": 32.203914642333984, "max": 48.24905776977539, "pos_frac": 0.71875, "sample": [-2.0651893615722656, 30.237464904785156, -4.563934326171875, 9.43804931640625, 36.79897689819336, 18.877948760986328, 0.5215606689453125, 12.84130859375, 13.622337341308594, 48.24905776977539, 33.61451721191406, -5.609140396118164, 13.427864074707031, 13.921218872070312, -4.711452484130859, 0.07735633850097656, 10.782012939453125, 32.183433532714844, -12.374526977539062, -2.662954330444336, 7.25299072265625, -2.5595436096191406, -0.2826719284057617, 9.912895202636719, 37.5426025390625, 32.21269226074219, 14.535282135009766, 44.30503845214844, -6.467918395996094, 0.0318145751953125, -8.332893371582031, 4.831207275390625, 0.6217174530029297, 3.957000732421875, 23.982330322265625, 0.2335205078125, 9.405158996582031, -0.4340381622314453, 2.2297229766845703, -8.441047668457031, 9.612796783447266, 39.541160583496094, -1.6837654113769531, 10.555408477783203, 0.153106689453125, -11.557823181152344, 5.79412841796875, -0.113006591796875, -12.194068908691406, 10.785202026367188, 4.61651611328125, 31.42076873779297, 0.07289695739746094, 8.009178161621094, 3.9681549072265625, -1.4162111282348633, 0.6457977294921875, 3.9315109252929688, 25.787620544433594, 19.93695068359375, 17.690467834472656, 4.876581192016602, 6.866924285888672, -1.0370941162109375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000501.npy"}
{"epoch": 0.7573696145124716, "step": 502, "batch_size": 64, "mean": 8.188718795776367, "std": 13.922470092773438, "min": -22.537826538085938, "p10": -5.960739517211913, "median": 7.335247039794922, "p90": 24.594826507568364, "max": 49.55760955810547, "pos_frac": 0.734375, "sample": [-9.790721893310547, 0.6322917938232422, 1.3493080139160156, -4.7646331787109375, 10.022293090820312, 16.581878662109375, 2.437938690185547, 3.921112060546875, -8.481475830078125, 14.97479248046875, 5.781028747558594, 31.864105224609375, -3.1297454833984375, -22.537826538085938, 11.1500244140625, 49.55760955810547, 5.35186767578125, -6.44287109375, 7.281990051269531, -3.6248016357421875, 26.048057556152344, -11.699607849121094, 7.610928535461426, 11.274642944335938, 13.626121520996094, 0.05618000030517578, 22.917701721191406, -4.835765838623047, 3.938608169555664, 9.23583984375, 2.9627151489257812, -1.432037353515625, 7.4624176025390625, 2.6432876586914062, 9.257396697998047, 7.3885040283203125, 45.69099044799805, 47.318603515625, -7.2669830322265625, -4.719757080078125, -1.89373779296875, -4.665367126464844, 7.694742202758789, -2.115875244140625, 8.052640914916992, -1.841033935546875, 1.5257186889648438, 23.237022399902344, 30.41324234008789, -18.0863037109375, 12.140792846679688, 16.72650146484375, 20.808677673339844, 13.920639038085938, 16.729736328125, 15.462677001953125, 15.436676025390625, 25.176742553710938, 3.9807891845703125, 14.425956726074219, 14.667678833007812, 0.32253265380859375, 16.80718994140625, 5.538337707519531], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000502.npy"}
{"epoch": 0.7588813303099018, "step": 503, "batch_size": 64, "mean": 11.419576644897461, "std": 17.17742156982422, "min": -31.787940979003906, "p10": -6.800633239746094, "median": 9.996318817138672, "p90": 37.862806701660155, "max": 44.48817825317383, "pos_frac": 0.765625, "sample": [9.051109313964844, -3.0002517700195312, 32.333740234375, 11.119728088378906, 11.724700927734375, 10.4888916015625, 25.553741455078125, -29.352325439453125, -6.0563812255859375, 35.462913513183594, 14.635330200195312, -3.6412887573242188, 15.701805114746094, 44.48817825317383, -4.830129623413086, 9.503746032714844, 1.3693313598632812, 3.4981689453125, 36.36143493652344, -13.935028076171875, 1.45782470703125, 10.68453598022461, 38.16093444824219, -4.467750549316406, 8.000255584716797, 8.062591552734375, 8.402023315429688, 41.87739562988281, 12.786117553710938, 34.72282791137695, -5.898529052734375, 5.729228973388672, 37.430599212646484, 40.596160888671875, 38.77708435058594, -1.0667457580566406, 7.972339630126953, -6.3792724609375, -6.9812164306640625, 4.050651550292969, -8.33160400390625, 3.2821578979492188, 2.9793701171875, 14.27145767211914, 12.600135803222656, 8.451751708984375, 11.413909912109375, 40.22794723510742, 21.886829376220703, 3.4656543731689453, 38.048038482666016, 22.341388702392578, -19.97930145263672, 3.5846405029296875, 30.251380920410156, 13.632217407226562, 20.353002548217773, -7.474386215209961, 5.170051574707031, 25.175338745117188, 12.578258514404297, 17.0958251953125, 17.222373962402344, -31.787940979003906], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000503.npy"}
{"epoch": 0.7603930461073318, "step": 504, "batch_size": 64, "mean": 14.314229965209961, "std": 15.984184265136719, "min": -16.535369873046875, "p10": -1.9623847961425775, "median": 9.88129997253418, "p90": 34.993316650390625, "max": 51.810462951660156, "pos_frac": 0.859375, "sample": [-16.535369873046875, 30.708675384521484, 3.7288055419921875, 10.677909851074219, 0.37221527099609375, -1.2345733642578125, 29.722640991210938, 10.511493682861328, 46.993927001953125, 5.087310791015625, 0.2062835693359375, 6.9108123779296875, 3.0085182189941406, 49.74517059326172, 2.2570953369140625, 31.82137680053711, 29.470081329345703, -5.335609436035156, 34.365745544433594, 37.515869140625, 11.84274673461914, 9.251106262207031, 5.342597961425781, 44.226959228515625, 32.84928894042969, -10.474716186523438, 0.4232673645019531, 23.28845977783203, 24.027423858642578, 44.12236785888672, 34.46630859375, 11.986038208007812, 16.585237503051758, 21.921024322509766, 7.173179626464844, 4.343475341796875, 4.571132659912109, -3.3005218505859375, 35.21917724609375, -1.4504318237304688, 25.793724060058594, 8.502235412597656, 0.5740852355957031, -2.181793212890625, 21.68096160888672, 12.603034973144531, 7.575170516967773, 5.388307571411133, 28.55596923828125, 8.644012451171875, 11.481536865234375, -9.309577941894531, -15.871313095092773, 8.3670654296875, 9.180469512939453, 5.458404541015625, 1.8944902420043945, 17.584228515625, 22.37293243408203, 23.091928482055664, 12.259212493896484, 51.810462951660156, 26.829784393310547, 7.412906646728516], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000504.npy"}
{"epoch": 0.7619047619047619, "step": 505, "batch_size": 64, "mean": 8.596763610839844, "std": 19.138948440551758, "min": -36.861942291259766, "p10": -14.981815338134764, "median": 7.353918075561523, "p90": 31.435409927368166, "max": 51.88526916503906, "pos_frac": 0.640625, "sample": [28.679222106933594, 11.643695831298828, 0.85968017578125, -5.073020935058594, -1.4818134307861328, 24.53466796875, -15.827774047851562, 51.88526916503906, 9.037328720092773, 10.899078369140625, 28.245559692382812, -3.391193389892578, 8.996917724609375, -1.0124130249023438, 31.063404083251953, 1.209197998046875, -6.0365142822265625, -1.0695571899414062, 10.690425872802734, 6.249702453613281, -1.99859619140625, 30.441753387451172, 19.581172943115234, 13.70880126953125, 19.753677368164062, 49.94496154785156, 24.941497802734375, -5.47540283203125, -0.039073944091796875, 3.845081329345703, -3.645233154296875, -17.061630249023438, 31.59484100341797, -23.495010375976562, 42.33552169799805, 22.770370483398438, 19.7713623046875, 26.467681884765625, -8.810577392578125, -12.009811401367188, 14.139923095703125, -13.007911682128906, 15.147842407226562, -20.90655517578125, 8.125762939453125, 3.844991683959961, 14.726215362548828, -5.6060791015625, 40.497467041015625, -36.861942291259766, -6.1430206298828125, 20.484214782714844, 45.69799041748047, 15.441810607910156, 26.356353759765625, -10.458770751953125, 21.48979949951172, 1.7914009094238281, 1.4012069702148438, 32.349769592285156, 6.582073211669922, 1.4821357727050781, -16.070091247558594, -33.035003662109375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000505.npy"}
{"epoch": 0.763416477702192, "step": 506, "batch_size": 64, "mean": 9.109159469604492, "std": 20.779733657836914, "min": -33.44078826904297, "p10": -20.20713520050049, "median": 9.665374755859375, "p90": 38.544812774658205, "max": 57.511314392089844, "pos_frac": 0.703125, "sample": [14.150325775146484, 35.34278869628906, -2.749980926513672, -3.6763458251953125, 10.678665161132812, 15.8287353515625, 11.693077087402344, -19.925039291381836, 27.402210235595703, 8.870407104492188, 6.8336334228515625, 16.89919662475586, 10.460342407226562, 0.33638763427734375, 13.115943908691406, 6.5, -32.24730682373047, 51.14759826660156, 57.511314392089844, -3.0627822875976562, 19.33303451538086, -30.8370361328125, -2.6432418823242188, -33.37339782714844, -32.0919189453125, -33.44078826904297, -1.7649002075195312, 35.98170471191406, 40.580596923828125, 16.673648834228516, 22.949851989746094, -12.488502502441406, 48.215911865234375, 0.8700714111328125, 40.01418685913086, 4.756126403808594, 24.992164611816406, 8.042411804199219, 16.87957763671875, 0.146636962890625, -20.328033447265625, -0.7518043518066406, 3.8406295776367188, 18.961933135986328, 10.681037902832031, 8.764358520507812, -8.880664825439453, 10.565780639648438, 6.526142120361328, 2.6521835327148438, 39.05821990966797, 10.49639892578125, -0.703948974609375, 22.410057067871094, 24.441959381103516, 13.028244018554688, -1.0943832397460938, 39.93398666381836, -22.05280303955078, -14.861427307128906, 6.200168609619141, 25.759429931640625, 13.0865478515625, 37.34686279296875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000506.npy"}
{"epoch": 0.764928193499622, "step": 507, "batch_size": 64, "mean": 10.48335075378418, "std": 17.682910919189453, "min": -24.236064910888672, "p10": -11.050739288330076, "median": 8.410078048706055, "p90": 35.54905433654786, "max": 52.626922607421875, "pos_frac": 0.75, "sample": [29.071067810058594, 5.803863525390625, -1.7633171081542969, -0.447296142578125, 48.03923034667969, 9.186946868896484, 10.589370727539062, -21.81927490234375, 1.0518798828125, 13.02615737915039, -6.903841018676758, 11.783161163330078, -7.271064758300781, -1.8789291381835938, -9.128799438476562, 11.265151977539062, 5.115997314453125, -8.730262756347656, 29.448570251464844, 2.9705162048339844, 6.204559326171875, 6.124946594238281, -14.046993255615234, -7.092279434204102, 48.504486083984375, 1.539133071899414, 7.519983291625977, 20.62115478515625, 14.791540145874023, 7.168449401855469, 4.8678436279296875, 8.121734619140625, 52.626922607421875, -13.904182434082031, 23.338790893554688, 17.06734275817871, 12.514533996582031, 7.7691497802734375, -22.6935977935791, -0.7734146118164062, 20.183921813964844, 15.68951416015625, 10.565938949584961, 38.11595916748047, 23.963298797607422, 2.0753936767578125, 12.253936767578125, 9.66259765625, 46.6842041015625, 47.7572021484375, -15.3555908203125, 31.418563842773438, 12.36893081665039, 1.3061370849609375, 7.8520965576171875, 1.65557861328125, 8.698421478271484, -11.874427795410156, 21.18844223022461, -24.236064910888672, 25.231422424316406, 32.74974060058594, 16.551219940185547, 36.74876022338867], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000507.npy"}
{"epoch": 0.7664399092970522, "step": 508, "batch_size": 64, "mean": 11.710323333740234, "std": 18.615388870239258, "min": -21.677398681640625, "p10": -9.290486145019528, "median": 8.481025695800781, "p90": 40.384143447875985, "max": 58.51408386230469, "pos_frac": 0.703125, "sample": [19.401397705078125, -4.37713623046875, 46.902740478515625, 21.79387664794922, 5.523368835449219, -21.113731384277344, 57.732200622558594, -0.8417749404907227, 18.6568660736084, 43.56690216064453, 58.51408386230469, 37.88479995727539, -5.6993865966796875, 8.19845199584961, -12.746307373046875, 11.20843505859375, -21.677398681640625, 14.290367126464844, 22.427181243896484, -4.232780456542969, 11.659988403320312, -10.82952880859375, 16.557443618774414, -1.2187728881835938, 1.085693359375, -14.978713989257812, 6.435214996337891, 4.826723098754883, 37.717437744140625, 13.033187866210938, 5.1052703857421875, 7.9252471923828125, 15.3167724609375, 8.403976440429688, 29.699039459228516, 8.558074951171875, -4.187282562255859, 15.54034423828125, 41.167625427246094, -3.7602996826171875, -3.781383514404297, -1.1248626708984375, 35.326629638671875, 27.624267578125, 1.9466323852539062, -17.0260009765625, 9.782630920410156, 11.903656005859375, 43.84556579589844, 6.7913818359375, 23.983566284179688, -2.963944435119629, 2.574308395385742, -4.475629806518555, 1.8136367797851562, -15.333709716796875, -1.2037811279296875, 38.5560188293457, 19.979476928710938, 17.195465087890625, 6.544769287109375, 11.0203857421875, 10.235984802246094, 42.77605438232422], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000508.npy"}
{"epoch": 0.7679516250944822, "step": 509, "batch_size": 64, "mean": 11.363936424255371, "std": 14.973190307617188, "min": -25.29693603515625, "p10": -7.242094421386718, "median": 12.617561340332031, "p90": 31.115627288818363, "max": 47.89305877685547, "pos_frac": 0.8125, "sample": [12.799346923828125, -2.9056777954101562, 0.6144676208496094, 28.055870056152344, 22.475555419921875, 0.4077615737915039, 15.782676696777344, -9.709785461425781, -5.884101867675781, 10.852855682373047, 22.092029571533203, 22.951171875, 9.144775390625, 12.845905303955078, 31.784263610839844, -25.29693603515625, 31.424652099609375, 32.31970977783203, 19.338104248046875, 1.0722122192382812, 30.394569396972656, -18.882808685302734, 18.228240966796875, -5.808965682983398, 17.278648376464844, 2.1768341064453125, 4.501434326171875, -7.43658447265625, 16.72882080078125, 2.0169830322265625, 37.002952575683594, 29.790302276611328, 5.816978454589844, 7.3720245361328125, 6.827362060546875, 0.1500568389892578, 19.880645751953125, 1.0806427001953125, 32.31842041015625, 14.54421615600586, 15.581947326660156, 7.502239227294922, 23.586181640625, -20.52532196044922, 12.435775756835938, 20.71459197998047, 0.9712600708007812, 18.78369140625, 47.89305877685547, 28.970218658447266, 21.6151123046875, 4.017402648925781, -0.0621337890625, -8.3980712890625, 1.9904327392578125, 28.366870880126953, 11.371679306030273, -13.220510482788086, -6.7882843017578125, 13.334674835205078, 7.601954460144043, 18.534072875976562, 32.965248107910156, 15.904205322265625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000509.npy"}
|
||||
{"epoch": 0.7694633408919124, "step": 510, "batch_size": 64, "mean": 13.135367393493652, "std": 19.53233528137207, "min": -32.316898345947266, "p10": -6.133831787109374, "median": 12.342155456542969, "p90": 38.06646957397461, "max": 50.39746856689453, "pos_frac": 0.75, "sample": [-5.20794677734375, 22.770416259765625, 31.07904052734375, 9.813308715820312, -4.1939239501953125, 10.3194580078125, 15.149593353271484, 47.22071838378906, 6.464942932128906, 2.8505096435546875, -0.8644256591796875, 2.15972900390625, 5.627742767333984, 6.384000778198242, -19.773176193237305, 0.8793144226074219, 2.1912460327148438, 23.420623779296875, -6.5306396484375, 15.546852111816406, 20.38201141357422, 33.39356231689453, 36.33222198486328, -2.1633377075195312, 8.069385528564453, 30.59178924560547, 12.950187683105469, 4.889434814453125, 4.098947525024414, 18.29034423828125, 21.433574676513672, 20.954193115234375, 38.09498596191406, -0.9593429565429688, 5.629172325134277, 28.806167602539062, 18.670997619628906, 50.39746856689453, -4.248195648193359, -16.166717529296875, 4.2152862548828125, -30.668075561523438, 29.98403549194336, 27.652395248413086, 46.313812255859375, 11.734123229980469, 21.071304321289062, 47.66062927246094, 36.5157470703125, -4.2987518310546875, 15.865747451782227, 27.454370498657227, 36.49090576171875, 22.46234130859375, -32.316898345947266, 4.152265548706055, 13.722808837890625, 41.041908264160156, -1.9556140899658203, 37.99993133544922, 42.120849609375, -3.5447330474853516, -25.909767150878906, -21.855316162109375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000510.npy"}
|
||||
{"epoch": 0.7709750566893424, "step": 511, "batch_size": 64, "mean": 14.573792457580566, "std": 19.025238037109375, "min": -21.8980712890625, "p10": -8.110423278808591, "median": 10.459501266479492, "p90": 43.83307342529297, "max": 51.3955078125, "pos_frac": 0.796875, "sample": [43.935150146484375, 35.02012634277344, 10.046375274658203, 45.14643478393555, 8.829940795898438, -1.84173583984375, 10.872627258300781, -9.504009246826172, 26.451133728027344, 17.83696746826172, 17.09716033935547, -5.327972412109375, 4.158103942871094, 6.2679290771484375, 32.24928665161133, 38.270050048828125, 25.855533599853516, 4.049190521240234, 33.69630432128906, 32.828697204589844, 24.888107299804688, -21.8980712890625, 4.053253173828125, -17.619483947753906, -5.831855773925781, 1.048370361328125, 43.59489440917969, 29.03868865966797, 22.95654296875, 2.6969375610351562, 4.9669036865234375, 12.771087646484375, -18.321578979492188, 6.3812255859375, 34.872406005859375, 12.557937622070312, 29.470314025878906, -5.379753112792969, 0.24362945556640625, 4.047698974609375, 5.6608734130859375, 9.175895690917969, 45.30290985107422, 34.729286193847656, 48.31285095214844, 27.477706909179688, -9.086952209472656, 5.8972625732421875, 3.9531097412109375, 44.09681701660156, 3.4162139892578125, 0.3019828796386719, 15.218753814697266, 14.635940551757812, -2.270928382873535, 22.716644287109375, -14.238029479980469, 51.3955078125, 18.35851287841797, 39.97960662841797, 46.23720169067383, 0.8095359802246094, -18.559661865234375, -1.2728347778320312], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000511.npy"}
|
||||
{"epoch": 0.7724867724867724, "step": 512, "batch_size": 64, "mean": 8.617152214050293, "std": 17.73961639404297, "min": -26.119384765625, "p10": -10.939278411865233, "median": 3.9979419708251953, "p90": 38.5174446105957, "max": 50.33758544921875, "pos_frac": 0.6875, "sample": [12.791496276855469, 9.101318359375, 7.783977508544922, -21.285058975219727, -11.27206039428711, 45.87091064453125, -4.676158905029297, 3.326730728149414, 3.106386184692383, 1.296234130859375, 7.178676605224609, 27.767120361328125, -21.605499267578125, -5.059700012207031, 9.0484619140625, 21.548965454101562, -0.3557281494140625, -14.809486389160156, 24.766645431518555, 3.540943145751953, 38.65943145751953, 15.28683090209961, -0.4856719970703125, 27.39968490600586, -1.163726806640625, 34.43524169921875, -26.119384765625, 17.824264526367188, -1.319305419921875, 41.459442138671875, 1.6759414672851562, 13.549812316894531, 45.75218963623047, 0.5868091583251953, 14.77020263671875, 2.6849365234375, 0.9873390197753906, 23.711196899414062, 5.966678619384766, 40.978355407714844, -8.823135375976562, -10.665184020996094, -2.658496856689453, 1.2276687622070312, 11.397972106933594, 44.673583984375, -1.8496475219726562, 4.766468048095703, 0.8965911865234375, -11.056747436523438, 19.110809326171875, 3.9966163635253906, 3.999267578125, 0.190185546875, 4.91741943359375, 21.17578887939453, -4.375579833984375, 9.505878448486328, 50.33758544921875, 38.18614196777344, -8.016550064086914, -13.50909423828125, 13.54693603515625, -10.181182861328125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000512.npy"}
|
||||
{"epoch": 0.7739984882842026, "step": 513, "batch_size": 64, "mean": 11.417625427246094, "std": 16.188018798828125, "min": -19.009490966796875, "p10": -6.470432090759276, "median": 8.069746017456055, "p90": 39.13948860168458, "max": 55.08256912231445, "pos_frac": 0.796875, "sample": [42.4422607421875, 35.901432037353516, 47.50640869140625, 3.3703460693359375, 13.560638427734375, 14.880786895751953, 18.06237030029297, 7.811851501464844, 3.6084365844726562, 8.577888488769531, 7.623619079589844, -0.35129547119140625, 35.10320281982422, 6.930599212646484, 1.978494644165039, 25.39832878112793, 3.6637802124023438, 1.811859130859375, 55.08256912231445, 11.294715881347656, 4.311534881591797, 40.48563766479492, 5.039237976074219, 12.875408172607422, 15.78067398071289, -9.453681945800781, 0.0384979248046875, 11.773155212402344, -1.6584930419921875, 3.714385986328125, 12.590948104858398, 11.87655258178711, 4.618255615234375, -9.05810546875, 13.602279663085938, 6.040168762207031, 8.681716918945312, 11.9715576171875, 44.494537353515625, 5.513267517089844, -6.759256362915039, -8.371901512145996, 41.387779235839844, -5.7965087890625, 30.534523010253906, 35.99847412109375, 49.48261260986328, -19.009490966796875, 15.841278076171875, 9.402389526367188, 18.34429168701172, -9.407417297363281, 11.611942291259766, 14.1043701171875, 2.393402099609375, -14.969615936279297, 2.9287643432617188, 8.327640533447266, -1.4119987487792969, 6.304630279541016, 7.376350402832031, -2.4870376586914062, -2.5863494873046875, 9.993341445922852], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000513.npy"}
|
||||
{"epoch": 0.7755102040816326, "step": 514, "batch_size": 64, "mean": 12.932395935058594, "std": 17.374181747436523, "min": -35.74907684326172, "p10": -4.445309448242187, "median": 13.553970336914062, "p90": 35.22782669067383, "max": 59.13128662109375, "pos_frac": 0.734375, "sample": [7.779991149902344, 12.37533950805664, 11.610591888427734, 30.193012237548828, 10.089885711669922, -4.518890380859375, 23.276596069335938, -1.037628173828125, 11.686309814453125, -2.0048866271972656, 23.818363189697266, 27.760330200195312, 16.69753646850586, 24.20135498046875, -0.8941192626953125, -2.5445556640625, -4.27362060546875, -2.249704360961914, 2.066516876220703, 1.9487686157226562, -35.74907684326172, -2.9299697875976562, 12.910804748535156, 19.951751708984375, 50.67976760864258, -0.7869110107421875, 5.651329040527344, 23.155364990234375, 25.47235107421875, 31.681594848632812, 14.197135925292969, -19.67096519470215, 36.36680603027344, 16.256004333496094, 14.214584350585938, 17.450973510742188, 10.42047119140625, 26.709712982177734, 16.004215240478516, 2.8138580322265625, 3.0802764892578125, -20.448204040527344, -11.99736213684082, 15.67666244506836, 14.79833984375, 26.746139526367188, 14.723670959472656, -2.5318679809570312, 35.27916717529297, 24.230865478515625, 0.5131378173828125, 7.58251953125, 18.562862396240234, 38.63017272949219, -12.42327880859375, 35.1080322265625, 35.48463439941406, 3.8035812377929688, 28.07476806640625, -6.096586227416992, 59.13128662109375, 32.80434799194336, -3.0570831298828125, 39.21631622314453], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000514.npy"}
|
||||
{"epoch": 0.7770219198790628, "step": 515, "batch_size": 64, "mean": 10.993968963623047, "std": 18.268590927124023, "min": -23.898590087890625, "p10": -11.935656738281248, "median": 11.331849098205566, "p90": 36.71274337768555, "max": 49.82561492919922, "pos_frac": 0.71875, "sample": [6.141454696655273, -1.6457939147949219, 0.628662109375, 14.627010345458984, 33.86622619628906, 33.78366470336914, -1.0349578857421875, 19.127025604248047, 16.98993682861328, 31.129356384277344, 9.562995910644531, 36.5322265625, 17.03033447265625, -7.938625335693359, 17.150962829589844, -12.6492919921875, -10.2705078125, 23.430740356445312, -23.898590087890625, -15.873947143554688, 2.1723480224609375, 1.6831169128417969, -4.0081329345703125, 13.232574462890625, -10.251724243164062, 13.100702285766602, 45.28223419189453, 21.107444763183594, 15.654441833496094, 0.07733154296875, 7.74993896484375, 14.313697814941406, -1.0520248413085938, 1.3754501342773438, 24.522037506103516, 40.177833557128906, -17.19268035888672, 28.156173706054688, -3.5526580810546875, 5.9329833984375, -17.671348571777344, 13.619453430175781, 38.786041259765625, -8.107627868652344, 33.618011474609375, 49.82561492919922, 0.4085044860839844, 20.040878295898438, 18.81793975830078, 36.79010772705078, 5.052703857421875, 13.934158325195312, 7.740478515625, 41.31622314453125, 46.25171661376953, -10.087955474853516, 20.37792205810547, 3.0096969604492188, 27.85253143310547, -18.244556427001953, 14.948841094970703, 8.35028076171875, -5.750435829162598, -22.43517303466797], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000515.npy"}
|
||||
{"epoch": 0.7785336356764928, "step": 516, "batch_size": 64, "mean": 9.394851684570312, "std": 16.039335250854492, "min": -34.85710906982422, "p10": -8.874659729003906, "median": 10.28085708618164, "p90": 28.44705619812012, "max": 49.362220764160156, "pos_frac": 0.796875, "sample": [32.67002868652344, 49.362220764160156, 18.697002410888672, -9.201919555664062, -13.477241516113281, 20.459609985351562, 0.1220245361328125, 9.802330017089844, 24.913299560546875, 11.409561157226562, 2.2484817504882812, 17.79883575439453, 7.868743896484375, 39.196800231933594, 3.3135452270507812, -0.8165016174316406, 17.9462890625, 7.666957855224609, 9.389968872070312, 13.7359619140625, 3.4960708618164062, -34.85710906982422, 13.258964538574219, 16.25708770751953, 43.619049072265625, 10.761734008789062, 10.090286254882812, 0.08474349975585938, 29.200515747070312, 13.297248840332031, -29.401290893554688, -11.641119003295898, 0.4986419677734375, -16.02587127685547, -0.7319412231445312, 10.471427917480469, 28.014087677001953, 5.154672622680664, 3.4874496459960938, 2.426891326904297, 24.24927520751953, -30.42723846435547, 12.219657897949219, 27.249624252319336, 28.632614135742188, -8.111053466796875, 16.328826904296875, 14.747573852539062, 10.768760681152344, 7.4551239013671875, -1.2002029418945312, 20.79962921142578, 12.888626098632812, 10.798145294189453, 0.030670166015625, 17.408370971679688, 6.189308166503906, 14.419673919677734, -4.730010986328125, -3.738941192626953, 20.36932373046875, 1.388723373413086, 5.118843078613281, 37.847686767578125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000516.npy"}
|
||||
{"epoch": 0.780045351473923, "step": 517, "batch_size": 64, "mean": 8.81535530090332, "std": 19.84164047241211, "min": -32.901729583740234, "p10": -10.57392711639404, "median": 5.417385101318359, "p90": 41.76857070922853, "max": 53.32758331298828, "pos_frac": 0.65625, "sample": [28.58050537109375, -14.361259460449219, 3.596424102783203, 4.74713134765625, -19.865089416503906, 7.105770111083984, 5.049827575683594, -3.652416229248047, 31.4310302734375, 23.825096130371094, 12.985702514648438, -8.391231536865234, -0.31012725830078125, 8.065513610839844, -2.3351211547851562, -1.9598312377929688, 5.784942626953125, 4.941627502441406, 12.432048797607422, -6.783470153808594, 28.257118225097656, -5.380279541015625, 9.5926513671875, 46.99753952026367, -13.393218994140625, 6.897382736206055, 4.747669219970703, 16.550800323486328, -8.578765869140625, 18.870887756347656, 10.76202392578125, 50.8973388671875, 13.371391296386719, 4.478733062744141, -8.773323059082031, -32.901729583740234, 0.393310546875, 47.80033874511719, -8.8095703125, -6.402351379394531, 8.269210815429688, 32.73626708984375, 53.32758331298828, 2.2807350158691406, 3.3261642456054688, 42.64331817626953, 6.6239776611328125, 6.4076385498046875, -11.310327529907227, -1.7335968017578125, -8.855659484863281, 8.34615707397461, 0.49393463134765625, 52.34262466430664, 48.916725158691406, 39.72749328613281, 7.134315490722656, -3.254638671875, 6.099103927612305, -0.5934982299804688, 38.819854736328125, -31.997276306152344, 13.658355712890625, -15.49072265625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000517.npy"}
|
||||
{"epoch": 0.781557067271353, "step": 518, "batch_size": 64, "mean": 9.787691116333008, "std": 18.140575408935547, "min": -26.834228515625, "p10": -8.953877449035645, "median": 6.876274108886719, "p90": 38.89224853515626, "max": 50.97918701171875, "pos_frac": 0.703125, "sample": [13.278770446777344, -5.8284149169921875, -0.8047904968261719, 8.885589599609375, 0.2210540771484375, 0.8050155639648438, 2.7344894409179688, 16.078392028808594, -7.975364685058594, 17.67810821533203, 12.875164031982422, -15.357978820800781, 7.142608642578125, 25.12596893310547, 32.0760498046875, -9.152755737304688, 16.578346252441406, 23.401283264160156, -3.456439971923828, 15.656673431396484, 44.248016357421875, -26.834228515625, -1.4313430786132812, 26.585296630859375, 39.97429656982422, 14.941650390625, -22.71949005126953, 7.806968688964844, -23.647567749023438, 29.105209350585938, 1.4985122680664062, -5.509304046630859, -2.1223220825195312, 6.6099395751953125, 36.367469787597656, -7.8786163330078125, 6.432849884033203, 0.21904754638671875, 23.63323211669922, 50.22577667236328, -1.675811767578125, 8.732879638671875, 3.8303451538085938, 50.97918701171875, 18.624496459960938, 43.62445068359375, -9.576026916503906, 40.93389892578125, 3.7515869140625, -15.016468048095703, -8.489828109741211, 8.716636657714844, 22.023345947265625, 0.8513784408569336, -0.7111015319824219, 34.3270263671875, 20.663223266601562, 9.18191146850586, 10.541505813598633, -7.714317321777344, 1.4862937927246094, 0.4348154067993164, 41.894989013671875, 1.5306549072265625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000518.npy"}
|
||||
{"epoch": 0.783068783068783, "step": 519, "batch_size": 64, "mean": 6.627191066741943, "std": 21.435213088989258, "min": -46.87712097167969, "p10": -18.443859100341797, "median": 5.099641799926758, "p90": 37.86871261596681, "max": 48.10612487792969, "pos_frac": 0.59375, "sample": [-23.520610809326172, 17.268829345703125, -11.726531982421875, -35.76074981689453, -0.911956787109375, 19.50051498413086, 1.2455425262451172, 48.10612487792969, -12.812721252441406, 14.776763916015625, -0.4773101806640625, 10.042736053466797, 24.61463737487793, 4.5760955810546875, 4.031158447265625, 8.707870483398438, -2.851612091064453, -1.9354743957519531, -21.447364807128906, -11.290054321289062, 34.83065414428711, 29.597686767578125, -12.731307983398438, 26.705093383789062, 35.0784912109375, 42.26392364501953, -9.88216781616211, 15.095573425292969, 22.715713500976562, 14.29498291015625, -9.962358474731445, 29.999664306640625, 5.2914276123046875, 15.267852783203125, 4.190767288208008, -3.7646484375, -3.7374267578125, 1.8589897155761719, 39.06452178955078, -25.538833618164062, -1.8237152099609375, 5.2585296630859375, 31.859603881835938, 4.940753936767578, 18.440582275390625, -46.87712097167969, 10.604988098144531, -17.142440795898438, 16.79181671142578, -38.50210952758789, 41.10037612915039, -6.126239776611328, -15.705078125, 43.40293884277344, 5.6443328857421875, 10.516044616699219, 30.093551635742188, 41.36647033691406, -8.620315551757812, -19.001609802246094, 15.901620864868164, -11.673896789550781, 40.687835693359375, -7.771188735961914], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000519.npy"}
|
||||
{"epoch": 0.7845804988662132, "step": 520, "batch_size": 64, "mean": 10.225279808044434, "std": 14.41762638092041, "min": -19.124862670898438, "p10": -4.005202865600586, "median": 7.179189682006836, "p90": 31.466332626342783, "max": 43.75255584716797, "pos_frac": 0.8125, "sample": [22.045867919921875, -4.079992294311523, 0.1186532974243164, 7.198997497558594, 16.629653930664062, 22.318283081054688, 17.27789306640625, 1.4003372192382812, 0.171661376953125, 7.882537841796875, 22.640823364257812, 3.479949951171875, 0.4476890563964844, 6.7854156494140625, 8.869674682617188, 6.3892364501953125, 5.554901123046875, -19.124862670898438, 4.7463226318359375, 37.143253326416016, 21.004852294921875, 27.94076919555664, 5.572597503662109, 43.37142562866211, 16.177093505859375, 8.423908233642578, 43.462249755859375, 7.704132080078125, 0.4276084899902344, 28.979843139648438, 8.100776672363281, 3.573047637939453, 7.159381866455078, -0.44110107421875, 6.895866394042969, 1.0179920196533203, 43.75255584716797, 19.997413635253906, 8.119861602783203, -1.117828369140625, -13.888452529907227, -4.216552734375, 9.508636474609375, -18.995193481445312, 4.9157257080078125, 2.5283775329589844, 12.064910888671875, 5.785930633544922, 17.896800994873047, 13.422981262207031, -2.91644287109375, 41.912841796875, 25.573654174804688, 7.434173583984375, 37.03846740722656, 14.22174072265625, 1.203704833984375, 4.184047698974609, -3.3363304138183594, -3.8306941986083984, -5.198154449462891, 18.632057189941406, 32.5319709777832, -10.075035095214844], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000520.npy"}
|
||||
{"epoch": 0.7860922146636432, "step": 521, "batch_size": 64, "mean": 10.406566619873047, "std": 17.25983428955078, "min": -33.701316833496094, "p10": -6.301658630371094, "median": 6.858070373535156, "p90": 34.26030921936035, "max": 55.16266632080078, "pos_frac": 0.734375, "sample": [14.490314483642578, 6.9875335693359375, 34.34461212158203, -0.49542236328125, 22.111759185791016, -2.5966758728027344, 28.641563415527344, 0.8186798095703125, -9.076087951660156, 24.637496948242188, 14.687889099121094, -6.961479187011719, 11.631118774414062, 3.950042724609375, 29.881019592285156, 15.171379089355469, 7.6662139892578125, 8.571632385253906, -8.547256469726562, 43.33628845214844, 0.8223114013671875, 38.79438781738281, -6.28857421875, 1.0745315551757812, 19.68702507019043, 25.67224884033203, -4.955039978027344, 32.98839569091797, 5.956512451171875, 3.4434585571289062, 55.16266632080078, 46.714996337890625, -32.8665771484375, 11.254491806030273, 0.314208984375, -1.60540771484375, 2.6104660034179688, 0.15784454345703125, 0.3654289245605469, 33.98551940917969, 35.22735595703125, 1.7661724090576172, -6.3072662353515625, 4.267059326171875, -3.9785079956054688, -6.337911605834961, 18.05921173095703, 26.349609375, 3.3089027404785156, 34.063602447509766, -4.059329986572266, 6.728607177734375, -33.701316833496094, 36.59834289550781, -3.4996337890625, 7.826366424560547, 12.063589096069336, -5.7335052490234375, 20.855819702148438, -1.3545455932617188, 22.579696655273438, 12.794944763183594, 6.5232696533203125, 9.44024658203125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000521.npy"}
|
||||
{"epoch": 0.7876039304610734, "step": 522, "batch_size": 64, "mean": 8.543373107910156, "std": 19.306886672973633, "min": -44.068695068359375, "p10": -15.272196960449218, "median": 6.349987030029297, "p90": 37.146981430053714, "max": 49.72578430175781, "pos_frac": 0.65625, "sample": [6.207244873046875, 14.288341522216797, 5.342630386352539, 22.676128387451172, 6.528053283691406, 37.473594665527344, 13.482650756835938, 34.12688446044922, 41.61097717285156, 9.363121032714844, -27.271949768066406, -14.01458740234375, 15.122703552246094, 49.72578430175781, 0.6316757202148438, 11.911781311035156, -15.844284057617188, 8.238349914550781, 9.571735382080078, -4.1278228759765625, 7.620807647705078, 2.9894065856933594, -5.2071685791015625, -44.068695068359375, 12.5615234375, -4.688789367675781, -10.247562408447266, 5.791259765625, -24.387237548828125, 1.1612701416015625, -2.8157882690429688, 33.39990234375, -11.101104736328125, 36.384883880615234, 22.569183349609375, -15.811172485351562, -3.5490951538085938, 7.956214904785156, -3.3583755493164062, 6.492729187011719, 12.800796508789062, -2.3311996459960938, 2.8514404296875, -0.832275390625, 15.34014892578125, -1.279886245727539, 32.91264343261719, -17.71805191040039, 38.97041702270508, 3.517589569091797, 19.91619873046875, -13.716323852539062, 20.21929931640625, 46.24787902832031, 3.501953125, -0.2441558837890625, -16.821311950683594, 41.926422119140625, 30.767074584960938, 39.288665771484375, -0.2987098693847656, 6.172809600830078, 13.063936233520508, 35.7852783203125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000522.npy"}
|
||||
{"epoch": 0.7891156462585034, "step": 523, "batch_size": 64, "mean": 10.367927551269531, "std": 16.238357543945312, "min": -36.335533142089844, "p10": -7.644568634033203, "median": 9.524669647216797, "p90": 30.768122100830084, "max": 44.13600158691406, "pos_frac": 0.765625, "sample": [-2.65087890625, 39.428794860839844, 9.502326965332031, 4.7917327880859375, 24.47643280029297, 28.810375213623047, 3.2998504638671875, 15.194229125976562, 25.358985900878906, 8.355964660644531, 0.4889802932739258, 12.279510498046875, 44.13600158691406, 4.460235595703125, 9.547012329101562, 21.510650634765625, 12.8963623046875, -1.1822547912597656, 17.18183135986328, -6.722972869873047, 8.901962280273438, 23.72240447998047, -7.459373474121094, 6.51055908203125, -18.8768310546875, 13.604503631591797, 1.694793701171875, -5.469367980957031, 6.1309661865234375, 6.018596649169922, 16.687728881835938, 29.223419189453125, 24.46575927734375, 13.773595809936523, 31.622817993164062, 42.36451721191406, 39.058502197265625, -1.296722412109375, 18.274948120117188, -20.9317626953125, 15.054000854492188, -13.518402099609375, 41.14582061767578, 7.081001281738281, 19.262985229492188, 0.345550537109375, -3.6180953979492188, 9.472824096679688, 17.390792846679688, -19.228553771972656, 22.743118286132812, 5.73536491394043, 20.5379638671875, -5.755897521972656, 19.683265686035156, 31.430137634277344, 5.691257476806641, 11.101959228515625, 8.812541961669922, -36.335533142089844, -16.899574279785156, 12.47951889038086, -7.72393798828125, 19.475067138671875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000523.npy"}
|
||||
{"epoch": 0.7906273620559335, "step": 524, "batch_size": 64, "mean": 9.482120513916016, "std": 16.165803909301758, "min": -33.40840148925781, "p10": -10.267919921874997, "median": 9.04496955871582, "p90": 29.672276306152344, "max": 41.0968132019043, "pos_frac": 0.703125, "sample": [21.98028564453125, 8.837608337402344, 29.652206420898438, 3.226696014404297, -3.0052947998046875, 0.360687255859375, -7.239738464355469, 11.557968139648438, 33.025917053222656, 4.137714385986328, 17.689308166503906, -2.270277976989746, -15.312374114990234, 6.210960388183594, -6.3132171630859375, 32.111488342285156, -33.40840148925781, -12.1038818359375, 16.305564880371094, 22.14531707763672, 23.081375122070312, -32.174583435058594, 9.63507080078125, 6.791093826293945, -1.5477409362792969, 3.216726303100586, 35.001808166503906, 24.243324279785156, 23.576950073242188, -3.297760009765625, -2.1158485412597656, 16.69872283935547, -11.727890014648438, 8.55056381225586, 13.96368408203125, 8.148101806640625, 25.117210388183594, 12.618698120117188, 0.7281646728515625, -2.4728965759277344, 2.37091064453125, -0.8158111572265625, -0.212432861328125, 27.045578002929688, 14.062553405761719, 34.5496826171875, 27.9970703125, -3.0958633422851562, 10.370601654052734, 21.411865234375, 15.796119689941406, 9.252330780029297, 20.260395050048828, 29.680877685546875, 7.166374206542969, 13.867914199829102, 37.82683181762695, 21.276470184326172, -24.145950317382812, -2.0625076293945312, 41.0968132019043, -11.565711975097656, 27.14874267578125, 1.9496040344238281], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000524.npy"}
|
||||
{"epoch": 0.7921390778533636, "step": 525, "batch_size": 64, "mean": 7.183684825897217, "std": 16.921367645263672, "min": -27.79778289794922, "p10": -10.054089736938476, "median": 5.344412803649902, "p90": 32.954019927978514, "max": 48.747833251953125, "pos_frac": 0.671875, "sample": [5.76812744140625, -6.7237396240234375, -3.040924072265625, -8.750789642333984, -18.95550537109375, 13.547348022460938, 7.646793365478516, 17.429229736328125, 39.52961730957031, -0.546142578125, 15.092975616455078, -3.8460159301757812, 5.591775894165039, 5.097049713134766, 7.133209228515625, 23.51397705078125, 32.7313232421875, -12.510147094726562, 47.14483642578125, 0.7845993041992188, 4.3849639892578125, -5.680660247802734, 12.230010986328125, 39.371246337890625, -9.404335021972656, 11.719461441040039, 12.6978759765625, -10.29446029663086, 0.822662353515625, 6.296054840087891, -20.365779876708984, 13.280670166015625, 2.236705780029297, 23.78790283203125, 15.583148956298828, -2.8985137939453125, 7.386064529418945, 3.4251785278320312, 48.747833251953125, -22.844524383544922, -9.49322509765625, 4.22960090637207, 4.671537399291992, -27.79778289794922, -3.0315170288085938, 23.030487060546875, 1.3983497619628906, 1.9144973754882812, 20.048133850097656, 46.91735076904297, 5.754844665527344, 33.049461364746094, 13.343132019042969, -0.16870498657226562, 12.493637084960938, 13.97467041015625, -3.9800567626953125, 11.578598022460938, 0.5634841918945312, 6.993019104003906, -9.49075698852539, 39.769447326660156, -3.3594818115234375, -19.771987915039062], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000525.npy"}
|
||||
{"epoch": 0.7936507936507936, "step": 526, "batch_size": 64, "mean": 8.64376163482666, "std": 16.343589782714844, "min": -31.53247833251953, "p10": -7.561640739440915, "median": 7.246463775634766, "p90": 30.559860992431652, "max": 47.90928649902344, "pos_frac": 0.703125, "sample": [47.90928649902344, 6.8921661376953125, 2.582061767578125, 2.4286975860595703, -5.230072021484375, 13.550106048583984, 9.009824752807617, 34.157867431640625, -16.204971313476562, -4.8132476806640625, 16.20941162109375, 22.275188446044922, -5.0499114990234375, 3.7451324462890625, 23.56340789794922, -4.700508117675781, 7.146568298339844, 43.10960388183594, 10.697463989257812, 25.135726928710938, 7.3463592529296875, 31.862022399902344, 9.366325378417969, 27.521484375, 15.423477172851562, 7.674692153930664, 16.990707397460938, 3.1728286743164062, -14.80704116821289, 9.690338134765625, -0.582122802734375, -1.0706443786621094, 0.39214324951171875, -5.214508056640625, 21.8935546875, -4.903133392333984, 3.5628929138183594, -0.96380615234375, 24.345001220703125, -15.374465942382812, 26.9481201171875, 10.58709716796875, 3.08123779296875, -4.728355407714844, 18.388648986816406, -31.49776840209961, -8.560884475708008, -2.946044921875, 33.027435302734375, -2.6649169921875, 23.591537475585938, 0.9079818725585938, 36.75697326660156, 3.024688720703125, 45.71830749511719, 10.348915100097656, -12.276433944702148, 15.442424774169922, 9.345184326171875, 15.358428955078125, 5.1204681396484375, 3.4910049438476562, 17.529220581054688, -31.53247833251953], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000526.npy"}
|
||||
{"epoch": 0.7951625094482238, "step": 527, "batch_size": 64, "mean": 7.098379135131836, "std": 16.185359954833984, "min": -31.256881713867188, "p10": -10.24766387939453, "median": 4.508872985839844, "p90": 29.7912239074707, "max": 41.236698150634766, "pos_frac": 0.65625, "sample": [-0.2341175079345703, 2.6628990173339844, 11.764656066894531, 23.567642211914062, 4.469764709472656, 2.886760711669922, 29.804954528808594, 8.342105865478516, 36.034053802490234, 34.5447998046875, 20.997539520263672, 2.358367919921875, 15.64117431640625, -12.600799560546875, 14.571685791015625, 7.087120056152344, 26.721839904785156, -1.9092674255371094, 14.860015869140625, -5.519355773925781, 11.439384460449219, -29.888565063476562, -26.330482482910156, -2.4293670654296875, 4.1002349853515625, 13.798221588134766, 40.71101379394531, -4.4322509765625, 8.912612915039062, 7.808311462402344, -1.089996337890625, 6.119476318359375, -11.036331176757812, 7.474079132080078, 19.032611846923828, 41.236698150634766, 33.568809509277344, -3.4010467529296875, -1.2130393981933594, 18.993499755859375, 24.10803985595703, 1.3844261169433594, 29.759185791015625, -3.7809295654296875, -10.519699096679688, -5.795433044433594, -8.528972625732422, 37.545127868652344, -9.6129150390625, 6.922651290893555, 1.386383056640625, 8.552279472351074, 3.84912109375, -0.11223983764648438, 0.7459716796875, -0.3309001922607422, 2.0646820068359375, 16.26972198486328, 4.547981262207031, 18.375762939453125, -31.256881713867188, -23.619140625, 26.242504119873047, -3.326183319091797], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000527.npy"}
|
||||
{"epoch": 0.7966742252456538, "step": 528, "batch_size": 64, "mean": 8.665306091308594, "std": 18.200088500976562, "min": -36.565704345703125, "p10": -12.577696228027342, "median": 6.967815399169922, "p90": 33.13994293212891, "max": 49.791481018066406, "pos_frac": 0.71875, "sample": [23.123268127441406, -27.02862548828125, 31.733779907226562, -1.2467536926269531, 19.528091430664062, 15.625724792480469, 22.18381118774414, -6.463081359863281, 5.839923858642578, 8.877490997314453, -0.060077667236328125, -13.861747741699219, 29.370094299316406, -36.565704345703125, 1.7041778564453125, -4.999778747558594, 29.87842559814453, 6.163837432861328, 3.722850799560547, 1.0831451416015625, 2.8324203491210938, 39.12327575683594, 0.09563636779785156, -10.872589111328125, 1.9651679992675781, 12.291679382324219, 2.4610671997070312, -13.308456420898438, -0.2842445373535156, 37.02056884765625, 10.59661865234375, 12.082862854003906, 16.076568603515625, 45.4857177734375, 12.787734985351562, -6.830718994140625, 49.791481018066406, 8.166351318359375, 34.67875671386719, 6.33696174621582, 25.071189880371094, 26.057655334472656, 7.481880187988281, 36.421897888183594, -23.279712677001953, 13.82745361328125, 1.2830085754394531, 1.7594451904296875, 20.164804458618164, -3.370349884033203, 8.378524780273438, -34.52606964111328, -24.616737365722656, 18.335906982421875, 15.14569091796875, 11.710689544677734, -8.267860412597656, -0.172393798828125, 27.570266723632812, -4.977169036865234, 5.544624328613281, 33.742584228515625, 25.73480224609375, 6.4537506103515625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000528.npy"}
|
||||
{"epoch": 0.7981859410430839, "step": 529, "batch_size": 64, "mean": 16.796003341674805, "std": 18.225873947143555, "min": -20.277141571044922, "p10": -3.135467529296875, "median": 15.173751831054688, "p90": 41.41722793579102, "max": 51.04496765136719, "pos_frac": 0.78125, "sample": [38.4551887512207, -13.682937622070312, 18.067642211914062, 4.0740814208984375, 25.378551483154297, -5.971302032470703, 51.04496765136719, 40.1806640625, 14.363319396972656, 6.174674987792969, -3.1280574798583984, -0.9600868225097656, -0.44085693359375, 17.702655792236328, 18.699684143066406, 6.793548583984375, -17.88483428955078, 38.36830520629883, -0.3393592834472656, 37.883636474609375, 33.05657196044922, 4.7274627685546875, 3.808807373046875, -2.526336669921875, 50.60540771484375, 35.07684326171875, 1.7786674499511719, 13.380058288574219, -2.558879852294922, -3.138643264770508, 29.160675048828125, 26.910301208496094, 15.984184265136719, 46.07975769042969, 31.1630859375, 40.527183532714844, 34.23047637939453, 21.361427307128906, 16.59947967529297, 21.00188446044922, 13.866897583007812, 34.521053314208984, 43.74913024902344, 2.4011764526367188, 17.692276000976562, 29.352153778076172, 3.32196044921875, 4.746783256530762, 4.16400146484375, 32.68119812011719, 7.712860107421875, -7.4834136962890625, 8.96484375, 41.798675537109375, 4.842506408691406, 47.340003967285156, 37.65753936767578, 19.462120056152344, -0.450347900390625, 8.84429931640625, -20.277141571044922, 7.0260162353515625, -3.9442367553710938, 44.94598388671875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000529.npy"}
{"epoch": 0.799697656840514, "step": 530, "batch_size": 64, "mean": 11.230892181396484, "std": 17.00861930847168, "min": -18.0669002532959, "p10": -7.944276046752929, "median": 8.405448913574219, "p90": 35.151282501220706, "max": 48.4073600769043, "pos_frac": 0.734375, "sample": [48.14288330078125, 22.099491119384766, 13.028237342834473, -4.1922607421875, 36.51011657714844, 6.437782287597656, 43.73919677734375, 2.7615394592285156, 46.7608642578125, 23.267837524414062, 0.25662994384765625, 25.306686401367188, -18.0669002532959, 1.6360588073730469, 21.097375869750977, 13.845512390136719, 22.361148834228516, 4.558296203613281, 48.4073600769043, 8.787918090820312, -5.283315658569336, -14.210556030273438, -9.969268798828125, 14.176692962646484, 22.801559448242188, 0.5447502136230469, 21.34313201904297, 25.737537384033203, 2.2165451049804688, 45.415367126464844, 3.4853668212890625, -5.203643798828125, 24.80261993408203, 34.885498046875, 3.4917526245117188, 1.9099311828613281, 20.49853515625, 29.371620178222656, -15.635955810546875, 8.022979736328125, 1.1496658325195312, 9.108963012695312, -1.5398101806640625, 17.64380645751953, -8.185447692871094, -16.052200317382812, 27.126384735107422, -3.6969375610351562, -7.381542205810547, -1.2493896484375, -11.94784927368164, -2.513469696044922, 33.081214904785156, 2.4673538208007812, 35.26519012451172, 9.04437255859375, 20.589332580566406, -6.5319366455078125, -2.1107444763183594, 1.0971755981445312, 22.035186767578125, 11.612136840820312, 5.643974304199219, 8.9747314453125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000530.npy"}
{"epoch": 0.8012093726379441, "step": 531, "batch_size": 64, "mean": 8.977895736694336, "std": 15.413922309875488, "min": -33.07105255126953, "p10": -6.624212074279785, "median": 7.237030029296875, "p90": 30.22235107421875, "max": 45.92179489135742, "pos_frac": 0.6875, "sample": [-0.5108985900878906, 7.3524169921875, 25.66950225830078, -2.0857162475585938, 10.497486114501953, 9.038688659667969, -5.00677490234375, 14.379745483398438, 11.655303955078125, 1.7208595275878906, 13.316726684570312, 2.0987911224365234, 29.516754150390625, -7.03466796875, 45.92179489135742, 26.28717041015625, -15.414268493652344, -4.9910430908203125, -6.6635894775390625, 30.35601806640625, 7.12164306640625, 21.517730712890625, -6.452117919921875, 4.2411041259765625, -1.7550735473632812, 10.452289581298828, 16.98528289794922, -2.3190155029296875, 6.6878204345703125, -7.921112060546875, 4.235271453857422, 5.44367790222168, -6.532331466674805, -0.8850784301757812, -33.07105255126953, -0.9309206008911133, 13.49273681640625, 5.588346481323242, 18.764163970947266, -23.414142608642578, 7.355194091796875, 8.304611206054688, 30.862686157226562, 25.464431762695312, -4.439178466796875, -12.805862426757812, 37.77360153198242, 37.4046630859375, 11.858612060546875, 8.883041381835938, 4.0111541748046875, 18.955718994140625, 5.718116760253906, 29.91046142578125, 20.943328857421875, -1.467193603515625, 20.033416748046875, 6.076629638671875, 11.832775115966797, 34.7802734375, 7.385793685913086, 3.8675765991210938, 44.86573791503906, -0.34378814697265625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000531.npy"}
{"epoch": 0.8027210884353742, "step": 532, "batch_size": 64, "mean": 10.757040023803711, "std": 18.103588104248047, "min": -24.311538696289062, "p10": -10.20651226043701, "median": 7.479883193969727, "p90": 36.87492218017579, "max": 51.73710250854492, "pos_frac": 0.734375, "sample": [20.337081909179688, 20.601787567138672, 32.50831604003906, 37.26116943359375, 6.3820343017578125, -5.279270172119141, 1.42022705078125, 51.73710250854492, 4.555694580078125, 13.106536865234375, -5.309101104736328, 28.81094741821289, 34.58251190185547, 15.162384033203125, 11.794471740722656, 4.789581298828125, 6.304969787597656, 31.987388610839844, 7.426963806152344, -21.37726593017578, -9.228950500488281, -1.5449447631835938, 23.32101058959961, 12.924911499023438, -1.48760986328125, 1.1247940063476562, 10.180065155029297, 26.737266540527344, 33.14720153808594, 28.531654357910156, 20.69708251953125, -3.0410003662109375, -4.169464111328125, 47.12763977050781, 6.157051086425781, 5.279022216796875, 37.512428283691406, -9.190616607666016, 1.217071533203125, 7.532802581787109, -11.891544342041016, -14.754364013671875, -10.625467300415039, -0.6916465759277344, 41.157344818115234, 5.66424560546875, 44.17062759399414, 11.976860046386719, 7.7353973388671875, 23.527328491210938, 9.903884887695312, 9.749519348144531, 0.2992439270019531, 48.400691986083984, -24.311538696289062, 10.458354949951172, -14.047039031982422, 0.361968994140625, 35.97367858886719, -21.901084899902344, 4.322680473327637, -9.10284423828125, 3.144012451171875, 9.299293518066406], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000532.npy"}
{"epoch": 0.8042328042328042, "step": 533, "batch_size": 64, "mean": 10.75674819946289, "std": 18.904245376586914, "min": -30.608211517333984, "p10": -10.57676544189453, "median": 9.503768920898438, "p90": 36.60678405761719, "max": 47.46156311035156, "pos_frac": 0.6875, "sample": [11.440895080566406, 35.140071868896484, -4.4134674072265625, 19.34134292602539, 10.614852905273438, 17.445899963378906, -6.790435791015625, -7.1597442626953125, 21.12439727783203, 23.708011627197266, -12.40924072265625, -11.01150131225586, 10.61336898803711, -3.7265682220458984, 36.38775634765625, 21.2684326171875, 9.142040252685547, 22.473403930664062, -17.925247192382812, -0.5254440307617188, 1.3962593078613281, 4.276348114013672, 18.906139373779297, 1.8879737854003906, 1.1293182373046875, -9.518129348754883, 18.14670753479004, -6.4381866455078125, 9.638290405273438, -28.854305267333984, -3.2963943481445312, 14.732616424560547, 45.26573181152344, 35.51538848876953, -6.5157012939453125, 17.071334838867188, -6.1208953857421875, 47.46156311035156, 4.493614196777344, -6.587505340576172, 11.259552001953125, 24.649429321289062, 15.554893493652344, 31.77001190185547, 44.74443054199219, 47.193565368652344, 5.740875244140625, -22.361892700195312, 7.816032409667969, -9.562381744384766, -1.1187820434570312, 40.430519104003906, 15.60213851928711, 0.12203216552734375, 6.510782241821289, -11.078910827636719, 6.610893249511719, 24.45758819580078, 9.369247436523438, 44.08540344238281, 36.700653076171875, 29.21307373046875, -30.608211517333984, 34.00196075439453], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000533.npy"}
{"epoch": 0.8057445200302343, "step": 534, "batch_size": 64, "mean": 10.878826141357422, "std": 16.667705535888672, "min": -25.53839874267578, "p10": -12.306170272827147, "median": 11.941316604614258, "p90": 32.432766723632824, "max": 48.72467041015625, "pos_frac": 0.703125, "sample": [14.48248291015625, -9.101791381835938, 22.4290771484375, 22.05026626586914, 1.6277580261230469, 21.610870361328125, 6.3629913330078125, -12.906707763671875, 24.45623016357422, -3.6374588012695312, -2.4688053131103516, -2.8285751342773438, 21.35155487060547, 33.476806640625, -11.248653411865234, 39.020172119140625, 10.989871978759766, 16.348121643066406, 23.757667541503906, -8.113334655761719, 42.44853973388672, 33.52093505859375, 14.80181884765625, 20.09490966796875, -5.4574737548828125, -12.52398681640625, -25.53839874267578, -0.39110565185546875, 24.5281982421875, 15.551803588867188, 4.002082824707031, -17.662124633789062, 35.65911865234375, 45.633140563964844, 23.029930114746094, 13.440902709960938, -16.765899658203125, 8.347373962402344, 48.72467041015625, 28.859329223632812, 29.996673583984375, 15.411338806152344, -12.794525146484375, 4.0880889892578125, 17.268512725830078, 19.33376693725586, -0.5488548278808594, 8.626239776611328, 7.2395172119140625, 5.55023193359375, 12.325336456298828, 18.733013153076172, 4.02325439453125, -3.7709693908691406, 6.4057464599609375, 15.074691772460938, 19.730125427246094, -14.672080993652344, 11.557296752929688, -6.247596740722656, 24.14086151123047, 10.61492919921875, 27.99488639831543, -11.797931671142578], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000534.npy"}
{"epoch": 0.8072562358276644, "step": 535, "batch_size": 64, "mean": 9.698479652404785, "std": 19.27991485595703, "min": -44.92680740356445, "p10": -9.986844635009765, "median": 8.660453796386719, "p90": 38.525071716308595, "max": 55.17601013183594, "pos_frac": 0.71875, "sample": [-0.5704612731933594, -15.774307250976562, 48.668701171875, 23.797637939453125, 1.1736526489257812, 16.07025146484375, -24.024560928344727, 7.295207977294922, -10.475990295410156, 44.82366943359375, 3.048187255859375, 0.9872703552246094, 10.365921020507812, 19.316429138183594, -4.468366622924805, -0.900146484375, -7.1806640625, 16.162269592285156, 0.3221006393432617, -8.296043395996094, 10.175472259521484, 19.664695739746094, 16.753061294555664, -8.845504760742188, 31.45832061767578, 0.8848037719726562, 30.00592041015625, -32.036338806152344, -7.1011810302734375, 9.131442070007324, 41.925689697265625, 40.952064514160156, 17.2070369720459, 13.219474792480469, 8.245887756347656, 11.739990234375, 9.294532775878906, 42.12019348144531, -19.509841918945312, 25.061670303344727, 7.32354736328125, -2.321409225463867, 55.17601013183594, 3.2779083251953125, 17.932273864746094, -44.92680740356445, 12.968841552734375, 27.429534912109375, 0.047618865966796875, 20.014785766601562, 6.3665618896484375, -3.892963409423828, 2.292510986328125, 38.893287658691406, 10.652488708496094, -1.4314727783203125, 9.075019836425781, 18.264724731445312, 34.85902786254883, -0.5853729248046875, -16.792728424072266, 1.0328216552734375, 37.66590118408203, 6.692474365234375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000535.npy"}
{"epoch": 0.8087679516250945, "step": 536, "batch_size": 64, "mean": 12.92877197265625, "std": 15.266900062561035, "min": -12.301177978515625, "p10": -3.068646240234375, "median": 8.732539653778076, "p90": 35.535491943359375, "max": 48.57097625732422, "pos_frac": 0.8125, "sample": [7.383182525634766, -1.2605133056640625, -12.301177978515625, 35.77460479736328, 14.027130126953125, -10.744300842285156, 2.9597244262695312, -3.1387710571289062, 10.600509643554688, 1.5281333923339844, -3.8710784912109375, 16.565231323242188, 35.18183898925781, 25.269485473632812, 18.713010787963867, 15.570571899414062, 10.548812866210938, 43.83698654174805, 35.68705749511719, 3.957845687866211, 8.970381736755371, 1.5405807495117188, 33.16154479980469, 31.036941528320312, -0.5477294921875, 48.57097625732422, 26.31177520751953, 3.6721267700195312, 2.4390411376953125, 29.464069366455078, 36.68268585205078, -6.659145355224609, 1.0547027587890625, 6.385478973388672, 8.494697570800781, 13.128791809082031, 38.001678466796875, -7.214977264404297, 29.565597534179688, 0.9352016448974609, 19.697845458984375, 15.949264526367188, 14.096321105957031, -6.020664215087891, 1.4792594909667969, 2.7497634887695312, 10.868968963623047, 10.625259399414062, 4.768745422363281, 34.610382080078125, 2.012866973876953, 4.954612731933594, 5.8781585693359375, 32.668914794921875, 27.09112548828125, 3.1440086364746094, -0.5803031921386719, 44.71528625488281, -0.6557445526123047, 1.4188766479492188, 15.011394500732422, 31.815509796142578, -2.9050216674804688, 2.7638473510742188], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000536.npy"}
{"epoch": 0.8102796674225246, "step": 537, "batch_size": 64, "mean": 9.903671264648438, "std": 17.030982971191406, "min": -25.69171142578125, "p10": -11.991072177886963, "median": 7.138288497924805, "p90": 34.18509635925293, "max": 45.21222686767578, "pos_frac": 0.734375, "sample": [20.221290588378906, 6.261085510253906, -1.6474533081054688, 7.086944580078125, 37.9254150390625, 28.03603172302246, 40.6335563659668, 0.04970359802246094, 2.7894287109375, 19.061641693115234, 18.675037384033203, 7.189632415771484, 37.341529846191406, 28.4312744140625, 34.14993667602539, 24.20379638671875, 39.02444839477539, 7.561595916748047, 2.669830322265625, -1.5122194290161133, -10.223104476928711, 5.5360107421875, 3.73663330078125, -11.124773979187012, -10.033916473388672, 24.237838745117188, 7.488250732421875, 22.945236206054688, -12.362342834472656, 0.17179107666015625, 21.04383087158203, 9.227615356445312, 13.931306838989258, -3.1936416625976562, -9.174995422363281, 0.057750701904296875, 34.200164794921875, 14.337287902832031, 4.08221435546875, -1.9626617431640625, 12.183181762695312, 45.21222686767578, -16.113983154296875, 0.67388916015625, 2.790740966796875, -5.629901885986328, 10.455642700195312, -13.550506591796875, 3.579071044921875, 15.681739807128906, 6.931556701660156, 28.052101135253906, -22.472490310668945, 32.661163330078125, 35.675506591796875, 30.688827514648438, -25.69171142578125, 16.416046142578125, -14.206079483032227, 3.6514511108398438, 21.391258239746094, 26.486663818359375, -14.697315216064453, -7.407073974609375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000537.npy"}
{"epoch": 0.8117913832199547, "step": 538, "batch_size": 64, "mean": 12.428466796875, "std": 16.75214958190918, "min": -20.30780792236328, "p10": -6.835725402832031, "median": 9.4434814453125, "p90": 36.11418762207031, "max": 54.490299224853516, "pos_frac": 0.75, "sample": [-17.265140533447266, 16.75702476501465, -1.2494125366210938, 9.504791259765625, 32.220062255859375, -20.30780792236328, 20.914310455322266, -3.1317138671875, -3.4725189208984375, 25.468685150146484, 16.215845108032227, 40.792747497558594, -9.386825561523438, 26.51947021484375, 14.48879337310791, 10.641426086425781, 37.9833984375, 33.353309631347656, 8.129058837890625, 11.352773666381836, 15.011402130126953, -3.062379837036133, 22.679153442382812, 23.015289306640625, 7.736362457275391, -4.397775650024414, 36.98728942871094, 7.0537109375, 35.836029052734375, 7.402984619140625, 6.617332458496094, 16.229263305664062, 1.2867584228515625, 3.9580230712890625, 1.9272994995117188, 6.525302886962891, -6.3752288818359375, 9.903633117675781, -1.7335662841796875, 18.422325134277344, 32.87385559082031, 9.382171630859375, 36.2333984375, 7.69171142578125, -8.659732818603516, 26.81741714477539, -5.40869140625, 35.59554672241211, 20.283611297607422, 53.77659606933594, 5.025634765625, 14.584815979003906, -13.96221923828125, -7.0330810546875, 10.266387939453125, -1.6858444213867188, 54.490299224853516, 3.8264389038085938, 20.377349853515625, -10.03765869140625, 1.6146316528320312, 41.6558723449707, 8.746955871582031, 4.414955139160156], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000538.npy"}
{"epoch": 0.8133030990173847, "step": 539, "batch_size": 64, "mean": 9.679433822631836, "std": 16.05901527404785, "min": -16.795333862304688, "p10": -8.349483489990234, "median": 6.315377235412598, "p90": 34.25693054199219, "max": 48.152137756347656, "pos_frac": 0.671875, "sample": [16.33489418029785, 30.198108673095703, -4.862548828125, 41.80608367919922, 13.041793823242188, 4.3784027099609375, 28.149734497070312, 8.31475830078125, 30.583709716796875, -4.657405853271484, 38.95173645019531, 2.9332275390625, -1.083251953125, 5.024559020996094, -4.9897918701171875, 3.129648208618164, -8.505867004394531, 3.079822540283203, 40.868751525878906, 10.735519409179688, 5.818323135375977, -3.0662097930908203, 12.153121948242188, -16.795333862304688, 0.791748046875, -9.975868225097656, -8.994338989257812, 12.146072387695312, 45.47758483886719, 13.132667541503906, -2.8432159423828125, -0.029693603515625, -6.885555267333984, -3.0806808471679688, 26.590423583984375, 11.865327835083008, 7.456634521484375, -1.74066162109375, 41.89862823486328, 3.26416015625, -2.7580947875976562, -10.1947021484375, 14.293296813964844, -7.984588623046875, 26.213592529296875, 21.591224670410156, -10.558053970336914, 48.152137756347656, 4.440696716308594, -5.866447448730469, 25.52825927734375, 13.872589111328125, 10.965251922607422, -1.770416259765625, 15.933723449707031, 15.1029052734375, 0.22906494140625, 3.659330368041992, -16.138534545898438, 34.14183044433594, 8.720157623291016, 34.30625915527344, 10.176834106445312, 6.812431335449219], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000539.npy"}
{"epoch": 0.8148148148148148, "step": 540, "batch_size": 64, "mean": 12.906469345092773, "std": 19.022350311279297, "min": -30.59273910522461, "p10": -8.805663681030273, "median": 10.124258995056152, "p90": 40.75263671875, "max": 51.65110778808594, "pos_frac": 0.734375, "sample": [-15.331398010253906, 22.40814971923828, 26.68804931640625, 40.616249084472656, 6.421823501586914, -23.8809757232666, 7.784210205078125, 20.483291625976562, 6.9253387451171875, 8.708393096923828, 7.577400207519531, 11.771835327148438, 29.24325942993164, 18.861183166503906, -14.647285461425781, 4.4918975830078125, 37.71156311035156, 8.239776611328125, -0.5006980895996094, 14.856605529785156, 35.53252410888672, -2.842437744140625, 21.5692138671875, 28.93710708618164, 49.986351013183594, 15.207244873046875, -8.146671295166016, 18.65399932861328, -18.690536499023438, 1.250213623046875, -13.97412109375, 44.829833984375, 37.93583679199219, 9.9501953125, 51.65110778808594, 4.686878204345703, -5.895851135253906, 33.44825744628906, -4.96575927734375, 4.078337669372559, 0.4301567077636719, 36.88507080078125, 12.647201538085938, 45.604530334472656, 45.40434646606445, -1.9865341186523438, 19.37872314453125, 8.205963134765625, -4.815093994140625, -30.59273910522461, 11.210403442382812, 23.57208251953125, -3.6159515380859375, 0.5191726684570312, -0.4548835754394531, 11.30316162109375, -9.088088989257812, 42.427398681640625, -0.2045135498046875, 10.627769470214844, 9.055305480957031, 10.298322677612305, 26.760765075683594, 40.81108856201172], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000540.npy"}
{"epoch": 0.8163265306122449, "step": 541, "batch_size": 64, "mean": 13.215750694274902, "std": 17.084901809692383, "min": -24.033889770507812, "p10": -9.99070587158203, "median": 13.609036445617676, "p90": 36.10598754882814, "max": 45.87980651855469, "pos_frac": 0.765625, "sample": [30.251327514648438, 37.40594482421875, 15.072208404541016, 43.19317626953125, -2.5696487426757812, 33.07275390625, -8.103752136230469, 28.38351058959961, 27.936180114746094, -6.310508728027344, 8.379714965820312, -13.63226318359375, 8.389633178710938, 6.384054183959961, 3.9944801330566406, -18.007041931152344, 19.019607543945312, 22.979782104492188, 7.356239318847656, 43.057952880859375, -1.9656600952148438, 1.1693611145019531, 42.13540267944336, 27.062301635742188, 31.69970703125, 13.527597427368164, 26.800689697265625, -13.670814514160156, -14.924423217773438, 0.38847923278808594, 21.48766326904297, 13.690475463867188, 40.434486389160156, 19.655487060546875, -8.021171569824219, 32.137691497802734, 3.252897262573242, -24.033889770507812, 13.076194763183594, 15.47848892211914, 6.1964569091796875, -1.9247283935546875, 14.746841430664062, 28.439617156982422, 6.530719757080078, 4.963264465332031, 27.883880615234375, 18.155715942382812, -10.799400329589844, 17.743331909179688, 0.3473234176635742, 20.052824020385742, 8.18844985961914, 4.436836242675781, -2.1355438232421875, 5.9412689208984375, 28.477741241455078, 20.862552642822266, -1.1284503936767578, 32.692787170410156, 37.68659973144531, -12.196405410766602, 45.87980651855469, 19.132247924804688], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000541.npy"}
{"epoch": 0.817838246409675, "step": 542, "batch_size": 64, "mean": 14.455942153930664, "std": 19.283138275146484, "min": -31.320831298828125, "p10": -7.703810119628906, "median": 10.825935363769531, "p90": 41.37836112976074, "max": 47.71253204345703, "pos_frac": 0.796875, "sample": [37.85992431640625, 30.045013427734375, 8.633468627929688, 42.82853698730469, 0.8905715942382812, 4.846824645996094, 7.282684326171875, 28.24184799194336, 5.3894195556640625, 10.530342102050781, 17.106658935546875, -8.048309326171875, -3.4000244140625, 28.03643035888672, 2.7207870483398438, 21.73236846923828, 0.641510009765625, 22.059982299804688, -6.8999786376953125, 45.327735900878906, 46.23564529418945, 40.146934509277344, 1.944411277770996, 0.6255950927734375, -18.677875518798828, -31.320831298828125, 0.6367912292480469, -2.7123565673828125, 24.408382415771484, -11.212257385253906, 35.344932556152344, 41.57077407836914, 14.78762435913086, 25.037113189697266, -13.173761367797852, -2.8372726440429688, -2.86651611328125, 11.99761962890625, -9.781669616699219, 39.70405578613281, 4.312278747558594, 8.498165130615234, 40.92939758300781, -4.737621307373047, 25.150768280029297, 0.3556060791015625, 45.124149322509766, 14.034679412841797, 4.802215576171875, 40.84510040283203, 31.695236206054688, 47.71253204345703, -17.422754287719727, 0.13669586181640625, 12.650747299194336, 2.4902420043945312, 10.229488372802734, 46.80374526977539, 14.353023529052734, 11.121528625488281, 7.7018585205078125, 38.769805908203125, 14.559333801269531, 39.38095474243164], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000542.npy"}
{"epoch": 0.8193499622071051, "step": 543, "batch_size": 64, "mean": 12.627391815185547, "std": 18.809566497802734, "min": -42.0577278137207, "p10": -2.736664581298828, "median": 8.451250076293945, "p90": 41.45325622558594, "max": 48.01568603515625, "pos_frac": 0.78125, "sample": [15.439102172851562, 4.582416534423828, 8.627883911132812, 1.1573867797851562, 9.285606384277344, -2.8386611938476562, 27.68402099609375, 43.3475341796875, -22.85401153564453, 28.960052490234375, 6.62322998046875, 15.786563873291016, 27.86453628540039, 11.957771301269531, 41.95207977294922, 4.283210754394531, 44.76329803466797, -32.743080139160156, -0.9841957092285156, 35.572818756103516, 0.08602333068847656, -0.8300132751464844, 5.978778839111328, 12.747444152832031, 11.104656219482422, 44.98696517944336, -42.0577278137207, 1.8139114379882812, -0.12400436401367188, -8.015029907226562, 6.0032958984375, 27.04633331298828, 41.501930236816406, 15.555564880371094, 29.584354400634766, -4.296417236328125, 41.339683532714844, -2.4986724853515625, -11.457870483398438, -0.20406532287597656, 7.950660705566406, 2.467845916748047, 44.736846923828125, 25.734100341796875, 10.796592712402344, 17.983551025390625, 38.98595428466797, 37.65815734863281, 2.8893280029296875, 12.630420684814453, 3.4091339111328125, 8.107654571533203, 1.117095947265625, 2.8227157592773438, 24.661590576171875, 1.9813308715820312, 8.274616241455078, 8.978658676147461, 6.268501281738281, -1.2587261199951172, 39.10565948486328, 48.01568603515625, -2.3152084350585938, 20.418190002441406], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000543.npy"}
{"epoch": 0.8208616780045351, "step": 544, "batch_size": 64, "mean": 13.992612838745117, "std": 19.50286102294922, "min": -18.67235565185547, "p10": -10.874094009399412, "median": 13.215202331542969, "p90": 42.806439208984386, "max": 53.770477294921875, "pos_frac": 0.6875, "sample": [19.57891082763672, -17.796478271484375, 8.960113525390625, -0.3838672637939453, 24.731414794921875, 12.484062194824219, -6.570730209350586, 0.9477920532226562, 20.8612060546875, 3.6119232177734375, -4.540069580078125, 6.622562408447266, 14.437774658203125, 31.91907501220703, 14.345123291015625, 3.9926509857177734, -12.0433349609375, 20.937488555908203, 9.575836181640625, -0.1233673095703125, 11.795753479003906, 6.356407165527344, 9.510223388671875, 44.90874099731445, 39.88945007324219, -3.5855464935302734, 14.521440505981445, -14.395988464355469, -1.3273639678955078, -4.442230224609375, 45.15284729003906, -17.509689331054688, 28.879070281982422, -0.3012351989746094, 43.870174407958984, 17.288909912109375, 35.48321533203125, -11.79336929321289, 18.76344871520996, 31.658111572265625, 14.929397583007812, 26.47864532470703, 51.319252014160156, 3.630016326904297, -6.396415710449219, 13.946342468261719, 16.672080993652344, 10.290443420410156, 52.87676239013672, -18.67235565185547, -6.639209747314453, 53.770477294921875, 39.96122741699219, 36.9334716796875, -8.729118347167969, -0.5455780029296875, 15.762649536132812, -15.22802734375, 40.32439041137695, 44.663978576660156, -5.36119270324707, 25.378433227539062, 27.671592712402344, 36.219451904296875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000544.npy"}
{"epoch": 0.8223733938019653, "step": 545, "batch_size": 64, "mean": 9.951007843017578, "std": 17.9447078704834, "min": -29.226736068725586, "p10": -10.38787288665771, "median": 8.110725402832031, "p90": 31.203285217285156, "max": 56.37994384765625, "pos_frac": 0.703125, "sample": [25.155658721923828, 12.062278747558594, -5.124244689941406, 45.338077545166016, 1.2624855041503906, 10.661018371582031, 0.922698974609375, -13.513607025146484, -6.774204254150391, 4.845458984375, 15.056793212890625, -5.407196044921875, 28.194664001464844, -1.0786056518554688, 1.90533447265625, 15.863700866699219, 8.41204833984375, 20.622268676757812, -1.6637306213378906, 31.130386352539062, 3.9546966552734375, 51.443359375, -3.2974472045898438, 28.424468994140625, 20.38702392578125, 21.767608642578125, 16.462646484375, 10.281200408935547, 7.917045593261719, 4.4109039306640625, -4.2397918701171875, -29.226736068725586, -2.6678924560546875, 8.304405212402344, -13.970962524414062, -6.731327056884766, 31.234527587890625, 50.42808151245117, 3.3750152587890625, -7.1337890625, -1.1794090270996094, 33.14105224609375, 19.02227783203125, 14.966384887695312, 3.5739707946777344, 16.736209869384766, -11.782480239868164, 16.48851776123047, 24.670852661132812, -28.210376739501953, 1.3337554931640625, 23.482009887695312, 3.642913818359375, 56.37994384765625, 28.705665588378906, -16.581573486328125, -2.058269500732422, 19.582324981689453, -18.25091552734375, 3.0639991760253906, 8.420867919921875, 33.60729217529297, 27.82147979736328, 1.2937088012695312], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000545.npy"}
{"epoch": 0.8238851095993953, "step": 546, "batch_size": 64, "mean": 11.847713470458984, "std": 17.01457405090332, "min": -28.754804611206055, "p10": -9.008636856079102, "median": 9.972539901733398, "p90": 37.214846038818365, "max": 51.18040466308594, "pos_frac": 0.765625, "sample": [41.59434509277344, -8.733535766601562, 39.98234176635742, 4.6755523681640625, 6.890460968017578, 24.92115592956543, 10.732620239257812, -0.58758544921875, 1.890228271484375, -9.732601165771484, -3.547088623046875, 24.32024383544922, 15.966846466064453, 15.864665985107422, -18.11548614501953, 26.996963500976562, 0.5529670715332031, 3.3522872924804688, -9.126537322998047, 3.6831703186035156, -3.7551956176757812, 19.14108657836914, 5.963855743408203, 10.814956665039062, 12.791610717773438, 3.5893707275390625, 35.814205169677734, -9.149520874023438, 37.815120697021484, 51.18040466308594, 7.35374641418457, 14.471435546875, 34.019920349121094, 10.09222412109375, 9.852855682373047, -10.001541137695312, 7.132774353027344, 29.599151611328125, 8.923694610595703, -2.9851837158203125, 8.470630645751953, 6.126491546630859, -14.296783447265625, 40.08074188232422, 4.591220855712891, 13.170745849609375, -4.903076171875, 10.733257293701172, 12.383586883544922, 7.287681579589844, 17.55865478515625, 10.8514404296875, 43.49559783935547, -8.1422119140625, -2.7876319885253906, 0.40970611572265625, 30.15604019165039, 22.60488510131836, 49.55607223510742, 23.448165893554688, 27.99340057373047, -28.754804611206055, 32.221412658691406, 11.752462387084961], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000546.npy"}
{"epoch": 0.8253968253968254, "step": 547, "batch_size": 64, "mean": 13.926226615905762, "std": 17.287487030029297, "min": -23.610107421875, "p10": -11.857865715026852, "median": 11.560461044311523, "p90": 34.01637954711914, "max": 57.268985748291016, "pos_frac": 0.84375, "sample": [9.401683807373047, -0.19254684448242188, 0.31999969482421875, 3.8700942993164062, 38.119354248046875, 23.319795608520508, -23.610107421875, 57.268985748291016, 30.674964904785156, 34.27116394042969, 7.4594268798828125, 8.436248779296875, -12.908571243286133, 12.452903747558594, 14.790191650390625, 1.267242431640625, 32.95280456542969, 39.62538146972656, -18.85100555419922, 5.37469482421875, 33.42188262939453, 26.939300537109375, 44.23744201660156, 4.5992279052734375, 29.312850952148438, 8.537803649902344, 6.750732421875, 25.442703247070312, 14.149261474609375, 10.279685974121094, 5.388343811035156, 2.8433799743652344, -15.638130187988281, 18.78479766845703, 6.205848693847656, 10.091253280639648, 40.269859313964844, 5.889741897583008, 22.808856964111328, 20.91684341430664, 20.836708068847656, 32.356285095214844, 21.080360412597656, 33.336578369140625, -18.500255584716797, 16.26616668701172, 10.31570053100586, -13.574493408203125, -9.406219482421875, 30.68460464477539, 35.77020263671875, 19.135726928710938, 31.05615234375, 0.9775314331054688, 10.867813110351562, -20.52203369140625, 4.880577087402344, 28.05205535888672, 12.253108978271484, 2.4845046997070312, 7.555622100830078, 23.545150756835938, 30.45541000366211, -3.9031219482421875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000547.npy"}
{"epoch": 0.8269085411942555, "step": 548, "batch_size": 64, "mean": 9.027074813842773, "std": 17.859046936035156, "min": -29.182846069335938, "p10": -12.841830444335935, "median": 5.841249465942383, "p90": 34.522036743164065, "max": 45.70042419433594, "pos_frac": 0.75, "sample": [4.8167572021484375, 19.680946350097656, 2.5815277099609375, -6.3812103271484375, 0.07065582275390625, 32.544921875, 0.125885009765625, 34.89183044433594, 24.37286376953125, 36.66877746582031, -6.849998474121094, 29.02210235595703, 0.02015399932861328, 0.20360565185546875, -4.509307861328125, 9.209539413452148, -25.31140899658203, 4.450380325317383, 10.220130920410156, 11.616302490234375, 13.511627197265625, 20.651824951171875, -16.135650634765625, 19.728790283203125, 6.328704833984375, 10.13461685180664, 45.70042419433594, 33.80815124511719, 26.93885040283203, 28.66461181640625, 1.6901168823242188, 5.353794097900391, 2.7290878295898438, 21.101280212402344, 4.82330322265625, 5.226318359375, 39.82136535644531, 6.359807968139648, -19.53856658935547, 0.27652740478515625, 19.564178466796875, 2.4350738525390625, 20.321922302246094, -6.85894775390625, 43.43843078613281, 8.15152359008789, -3.62713623046875, -13.856765747070312, -9.40667724609375, 8.499641418457031, 10.26751708984375, -27.915727615356445, 2.731180191040039, 23.7625732421875, 3.951690673828125, 8.086830139160156, 33.720909118652344, -2.98419189453125, -3.4692916870117188, -14.556137084960938, 45.68519973754883, 34.82798767089844, -29.182846069335938, -10.473648071289062], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000548.npy"}
{"epoch": 0.8284202569916855, "step": 549, "batch_size": 64, "mean": 8.601249694824219, "std": 19.35883140563965, "min": -41.26979064941406, "p10": -15.031725311279295, "median": 8.34860897064209, "p90": 36.60656509399414, "max": 45.736392974853516, "pos_frac": 0.671875, "sample": [-3.1554718017578125, 12.421783447265625, -7.208869934082031, 0.29483795166015625, -13.849220275878906, -15.53851318359375, 27.09296417236328, 3.4128379821777344, -0.1630401611328125, 41.731781005859375, 35.858612060546875, 2.6030120849609375, 7.919790267944336, 39.12334442138672, -6.4834136962890625, -2.0987548828125, -23.660297393798828, 5.553379058837891, 32.076080322265625, 14.785476684570312, 27.84362030029297, 12.452756881713867, 24.13958740234375, -22.892532348632812, 4.8771209716796875, 6.621826171875, 6.290496826171875, 36.911964416503906, 13.051658630371094, -19.844207763671875, 35.89396667480469, 43.32386016845703, 18.87823486328125, -5.626865386962891, 14.068099975585938, 30.646270751953125, -0.42236328125, -6.892974853515625, 9.555992126464844, 18.29281234741211, -21.094772338867188, 8.777427673339844, -1.5614662170410156, 9.363042831420898, -0.34687042236328125, 14.489374160766602, -2.880535125732422, 13.29146957397461, 39.2515869140625, 8.868072509765625, -7.349515914916992, 34.13592529296875, 13.2899169921875, 3.773712158203125, 14.429550170898438, 12.867156982421875, 2.8839111328125, 4.582752227783203, -11.292205810546875, 41.24761199951172, 45.736392974853516, -37.80180358886719, -41.26979064941406, 9.203399658203125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000549.npy"}
{"epoch": 0.8299319727891157, "step": 550, "batch_size": 64, "mean": 8.954895973205566, "std": 19.225894927978516, "min": -34.8195686340332, "p10": -15.694669342041013, "median": 7.168069839477539, "p90": 34.001292800903336, "max": 49.7569580078125, "pos_frac": 0.703125, "sample": [21.46158218383789, 43.67500305175781, 48.90790939331055, 7.594566345214844, -31.042362213134766, -4.063438415527344, 0.2111053466796875, -17.521827697753906, -6.400199890136719, 3.869800567626953, 35.82081985473633, 16.233169555664062, 19.3470458984375, 6.253374099731445, 5.3190155029296875, 36.57048797607422, 46.12411117553711, 29.430442810058594, 8.982986450195312, 2.5046768188476562, 23.838916778564453, -2.7799606323242188, 23.308128356933594, 26.836639404296875, 25.749160766601562, -16.62567901611328, 18.430099487304688, -5.471275329589844, 8.28057861328125, -10.801177978515625, -30.23863983154297, 5.6798248291015625, -2.9458160400390625, 12.037490844726562, 3.579387664794922, 18.005882263183594, -0.228485107421875, 1.4244613647460938, 4.4890594482421875, 6.741573333740234, 13.297412872314453, 29.75572967529297, 5.147705078125, 18.13206672668457, 1.2874984741210938, 12.941577911376953, 12.63153076171875, 22.991952896118164, -1.887939453125, 40.56708526611328, -14.486732482910156, 25.30744171142578, 11.42608642578125, 20.72378921508789, 29.372154235839844, 1.7435026168823242, -30.574569702148438, -2.601442337036133, 49.7569580078125, -34.8195686340332, 10.965957641601562, -2.0724105834960938, -16.212356567382812, -12.868522644042969], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000550.npy"}
{"epoch": 0.8314436885865457, "step": 551, "batch_size": 64, "mean": 10.552169799804688, "std": 16.43597412109375, "min": -30.166946411132812, "p10": -5.796997642517089, "median": 8.511392593383789, "p90": 30.040731048583986, "max": 68.21807861328125, "pos_frac": 0.78125, "sample": [13.177772521972656, 18.947280883789062, 18.345870971679688, 7.6476593017578125, 23.264644622802734, 23.877321243286133, 68.21807861328125, -4.117618560791016, 5.327228546142578, 22.653297424316406, -1.410980224609375, 9.991195678710938, 39.39235305786133, 2.1361007690429688, 25.072608947753906, 14.808242797851562, 3.823760986328125, 30.101478576660156, -30.166946411132812, 0.8543777465820312, 4.4894561767578125, 2.4516448974609375, 29.705551147460938, 4.003692626953125, 8.721115112304688, 56.80118942260742, -8.856849670410156, 14.323944091796875, 13.130393981933594, 14.438579559326172, 13.212112426757812, 8.30167007446289, 0.8720664978027344, 5.2941436767578125, 0.5888748168945312, 30.9892578125, -6.238555908203125, 29.89898681640625, 0.18118858337402344, 0.180511474609375, -2.0666427612304688, 18.286457061767578, 5.1905059814453125, -21.014759063720703, 32.59003829956055, 23.880531311035156, 9.111152648925781, 20.842357635498047, -1.3039398193359375, -1.4282188415527344, 6.9170074462890625, 15.10797119140625, 9.520477294921875, -14.38117790222168, 1.585906982421875, -0.4158782958984375, -12.078956604003906, 30.385543823242188, 16.339691162109375, 7.8710174560546875, -13.799545288085938, -4.766695022583008, 20.766342163085938, 13.767044067382812], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000551.npy"}
{"epoch": 0.8329554043839759, "step": 552, "batch_size": 64, "mean": 9.438526153564453, "std": 17.6705379486084, "min": -37.5670166015625, "p10": -15.121199417114257, "median": 13.348844528198242, "p90": 27.39487991333008, "max": 46.3377685546875, "pos_frac": 0.765625, "sample": [-16.71352767944336, -3.2241668701171875, -22.35265350341797, -4.755302429199219, 24.700946807861328, 1.0890941619873047, 0.43420982360839844, 9.525604248046875, 20.438339233398438, 25.914363861083984, 24.86737060546875, 46.3377685546875, 13.869379043579102, 33.21803283691406, -5.302196502685547, -4.025957107543945, 0.15887451171875, 13.00140380859375, 20.366317749023438, 1.518707275390625, 14.883209228515625, 30.577781677246094, 4.3766021728515625, 14.853961944580078, 36.58409118652344, -0.6936187744140625, 6.465934753417969, 13.696285247802734, 26.015762329101562, -13.698654174804688, 11.722421646118164, 4.568614959716797, 18.758384704589844, -32.870079040527344, 30.730884552001953, -11.218280792236328, 14.978759765625, 26.453933715820312, -0.13495254516601562, 1.8587570190429688, 27.562767028808594, 6.5643310546875, -17.784622192382812, 27.003143310546875, 0.136688232421875, 8.217071533203125, 15.786975860595703, 13.948509216308594, -36.28009033203125, 16.66064453125, 44.312965393066406, 15.328033447265625, 10.23100471496582, -37.5670166015625, 25.86932945251465, 23.449417114257812, 16.155471801757812, 19.442161560058594, 25.92107391357422, 15.771080017089844, 16.01317596435547, 1.7318344116210938, 4.346168518066406, -15.73086166381836], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000552.npy"}
{"epoch": 0.8344671201814059, "step": 553, "batch_size": 64, "mean": 11.281781196594238, "std": 21.096675872802734, "min": -42.141082763671875, "p10": -13.806241798400876, "median": 13.163779258728027, "p90": 40.392889022827156, "max": 54.10734558105469, "pos_frac": 0.640625, "sample": [54.10734558105469, -4.912563323974609, -16.291046142578125, 42.33330154418945, -4.103721618652344, -3.3931655883789062, 33.355743408203125, 35.44610595703125, -0.28182220458984375, -5.591054916381836, 15.66024398803711, 10.0296630859375, 35.887481689453125, 21.430130004882812, 6.9936370849609375, 14.570388793945312, -2.3578414916992188, 30.888683319091797, -5.084812164306641, -42.141082763671875, -3.7942657470703125, 30.667221069335938, 23.134078979492188, 15.460445404052734, 31.46717071533203, 16.834510803222656, 14.644882202148438, -1.97833251953125, -3.5691375732421875, -14.838531494140625, 49.53431701660156, 29.151477813720703, 15.495887756347656, 48.3375244140625, 19.770187377929688, 17.17254638671875, 38.042728424072266, 3.0109710693359375, 1.504058837890625, 42.671119689941406, 13.096755981445312, -2.425579071044922, 5.121938705444336, 13.230802536010742, -19.06774139404297, -2.7920989990234375, -4.192778587341309, 43.84107208251953, -34.08154296875, -26.98251724243164, 28.698944091796875, 21.71752166748047, 34.30162048339844, 0.250732421875, -26.746864318847656, 22.663429260253906, 14.304618835449219, -5.9304656982421875, 16.8148193359375, 9.855323791503906, -3.5234622955322266, -11.397565841674805, 41.40010070800781, 4.612480163574219], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000553.npy"}
{"epoch": 0.8359788359788359, "step": 554, "batch_size": 64, "mean": 11.6683988571167, "std": 18.562538146972656, "min": -37.76997375488281, "p10": -9.161065292358398, "median": 9.077638626098633, "p90": 36.27335243225098, "max": 52.144527435302734, "pos_frac": 0.71875, "sample": [12.161506652832031, 32.975807189941406, 44.3492317199707, 29.19189453125, 3.5747604370117188, 12.380186080932617, 4.263561248779297, 26.01874542236328, -24.641937255859375, 30.0621337890625, 2.60205078125, -3.99169921875, -9.331260681152344, 20.47601318359375, -37.76997375488281, -0.33655548095703125, 4.649599075317383, 3.0104904174804688, 11.754440307617188, 4.717859268188477, 8.058929443359375, 21.766021728515625, 9.903865814208984, -4.8106231689453125, 10.471511840820312, 17.816497802734375, 44.11170196533203, 29.301677703857422, -5.544614791870117, 8.167991638183594, 52.144527435302734, -0.9813079833984375, 16.26995086669922, -13.669090270996094, 3.5946998596191406, -0.17243576049804688, 26.50959014892578, 6.761119842529297, 28.266357421875, -5.007194519042969, -22.9464111328125, 31.27643585205078, 16.854957580566406, 0.7770004272460938, 8.251411437988281, -7.317089080810547, 3.5240345001220703, -10.005035400390625, 3.3778419494628906, -0.537811279296875, 35.96330261230469, -11.53717041015625, -8.76394271850586, 38.40776824951172, 41.33514404296875, 10.697456359863281, 12.046234130859375, 28.9136962890625, 25.14362335205078, 36.40623092651367, 41.66861343383789, -6.844501495361328, 34.18400955200195, 26.825695037841797], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000554.npy"}
{"epoch": 0.8374905517762661, "step": 555, "batch_size": 64, "mean": 12.04253101348877, "std": 19.940162658691406, "min": -41.16645812988281, "p10": -15.97761993408203, "median": 12.898618698120117, "p90": 36.43439483642578, "max": 46.17637634277344, "pos_frac": 0.796875, "sample": [1.2468605041503906, -21.19646453857422, 14.830375671386719, 21.87169647216797, 32.93540954589844, 0.7874298095703125, -15.371688842773438, -5.393218994140625, -17.462566375732422, 4.0154876708984375, 29.756031036376953, -14.130050659179688, 20.4599609375, 8.434925079345703, 39.53582000732422, 34.43470001220703, 0.20472335815429688, 17.051593780517578, 4.789276123046875, 35.892539978027344, 16.453445434570312, 2.4545516967773438, 46.17637634277344, 11.023147583007812, 21.09256935119629, 14.807052612304688, 34.556640625, 26.589698791503906, 9.760971069335938, 33.855003356933594, 27.82190704345703, 14.358943939208984, 35.157066345214844, 36.66661834716797, -30.336273193359375, 37.62692642211914, -39.10197448730469, 23.24668312072754, -2.779266357421875, 0.8204917907714844, 23.020477294921875, 10.881088256835938, 9.285987854003906, 9.079673767089844, -16.2373046875, 9.29030990600586, 15.916007995605469, 5.1781005859375, 41.309757232666016, -8.710412979125977, 21.72967529296875, -6.4503631591796875, -41.16645812988281, 5.960102081298828, 37.36934280395508, 14.64556884765625, 11.43829345703125, 7.17474365234375, 25.553325653076172, -18.535905838012695, 44.48847961425781, 22.444904327392578, 30.131591796875, 3.9815711975097656], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000555.npy"}
{"epoch": 0.8390022675736961, "step": 556, "batch_size": 64, "mean": 10.62973403930664, "std": 18.783687591552734, "min": -29.004356384277344, "p10": -11.709347534179685, "median": 9.654802322387695, "p90": 37.75106582641602, "max": 53.39958190917969, "pos_frac": 0.671875, "sample": [16.628440856933594, 1.107513427734375, 5.019561767578125, 1.6566581726074219, -15.258865356445312, 33.86767578125, 46.179962158203125, 10.226993560791016, -3.7788467407226562, -9.7470703125, -2.0456581115722656, -3.0726890563964844, 14.29022216796875, 24.125755310058594, 13.690435409545898, 10.478736877441406, 5.4566650390625, 4.403045654296875, 27.17194366455078, 29.278278350830078, -1.3304901123046875, 17.399642944335938, 41.68587875366211, -6.310585021972656, -15.416950225830078, 4.5314178466796875, -12.550323486328125, 14.09317398071289, 18.12488555908203, -3.391937255859375, 3.5401268005371094, -16.331180572509766, 25.394859313964844, 3.3148880004882812, -1.0229339599609375, 7.42976188659668, 35.40306091308594, 43.849857330322266, 9.082611083984375, 10.938385009765625, 38.132911682128906, 51.58856201171875, 13.139385223388672, 6.939918518066406, 22.205184936523438, 22.355491638183594, -8.208145141601562, -3.372650146484375, -21.526403427124023, -1.3125457763671875, 44.98122787475586, -29.004356384277344, -6.259029388427734, 11.976905822753906, 13.830585479736328, -5.6287994384765625, 17.27989959716797, -20.341705322265625, 20.1536865234375, 36.86009216308594, 31.34326171875, 53.39958190917969, 11.087654113769531, -7.430646896362305], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000556.npy"}
{"epoch": 0.8405139833711263, "step": 557, "batch_size": 64, "mean": 13.66687297821045, "std": 19.20047950744629, "min": -21.983856201171875, "p10": -9.742466735839843, "median": 10.74421215057373, "p90": 41.24841766357422, "max": 54.5543327331543, "pos_frac": 0.734375, "sample": [48.06079864501953, 15.436656951904297, 16.57198143005371, 2.325328826904297, 5.377147674560547, -8.092403411865234, 49.412105560302734, -7.327705383300781, 18.838878631591797, 30.236282348632812, 2.954071044921875, -16.854801177978516, -5.384729385375977, 9.56707763671875, 2.610645294189453, 7.372955322265625, 24.965702056884766, 5.803611755371094, 54.5543327331543, 10.858612060546875, 11.232528686523438, -2.156951904296875, 41.65449523925781, -21.77978515625, -3.8455886840820312, -12.0816650390625, 38.41773223876953, 49.46803283691406, -8.862777709960938, 1.6798324584960938, 19.536590576171875, 4.7000579833984375, 22.56525421142578, -4.5543670654296875, -1.9827880859375, 43.792625427246094, 33.363739013671875, 23.647247314453125, 12.1236572265625, 54.182464599609375, -0.6628494262695312, 19.845829010009766, -15.068740844726562, -0.8628463745117188, 10.629812240600586, -10.119476318359375, 8.528106689453125, 36.43962097167969, 28.77611541748047, 27.004798889160156, 2.438556671142578, 20.507049560546875, 9.335559844970703, 11.358217239379883, 26.0660343170166, 5.83380126953125, 31.339675903320312, 8.592124938964844, 26.306961059570312, 20.553680419921875, -21.983856201171875, -10.254371643066406, 40.3009033203125, 31.38826560974121], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000557.npy"}
{"epoch": 0.8420256991685563, "step": 558, "batch_size": 64, "mean": 12.695113182067871, "std": 18.270944595336914, "min": -40.4989128112793, "p10": -7.480868530273438, "median": 13.250511169433594, "p90": 35.334234619140624, "max": 54.13190460205078, "pos_frac": 0.765625, "sample": [17.886756896972656, 34.92608642578125, 35.5091552734375, -16.832965850830078, 29.307472229003906, 1.6567611694335938, 40.92875671386719, 9.373102188110352, 5.409855842590332, 13.5960693359375, 5.620689392089844, 24.30158233642578, -1.7617950439453125, 9.953453063964844, 5.357337951660156, 27.31937026977539, 20.896373748779297, 13.425506591796875, 27.17129135131836, 22.067543029785156, 15.6168212890625, 17.703948974609375, -0.019245147705078125, 32.67870330810547, 27.15225601196289, -9.99932861328125, 32.64739990234375, 5.459541320800781, 22.508625030517578, 2.7375259399414062, 13.61867904663086, 15.963478088378906, 15.725173950195312, 54.13190460205078, -20.46533966064453, -5.193046569824219, 10.950618743896484, 1.742523193359375, 2.2675552368164062, 51.26026916503906, 21.534011840820312, -14.399574279785156, 43.53234100341797, 7.05731201171875, 20.04531478881836, 37.77208709716797, -2.014636993408203, 0.7135162353515625, 42.57283020019531, 19.86579132080078, 13.075515747070312, -40.4989128112793, 30.66522979736328, -7.442512512207031, 1.7286376953125, -3.6238136291503906, 23.757564544677734, -7.497306823730469, -20.03102684020996, -1.8242340087890625, 30.135643005371094, 2.9267120361328125, 8.019744873046875, -6.183441162109375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000558.npy"}
{"epoch": 0.8435374149659864, "step": 559, "batch_size": 64, "mean": 11.21513557434082, "std": 22.02442741394043, "min": -42.10427474975586, "p10": -14.809957885742183, "median": 8.536996841430664, "p90": 40.143339157104506, "max": 65.51589965820312, "pos_frac": 0.671875, "sample": [-0.0207977294921875, -21.565467834472656, 5.007892608642578, -1.7145957946777344, 20.091339111328125, 65.51589965820312, 36.38341522216797, 36.339996337890625, 2.1652145385742188, 1.0832290649414062, -18.729232788085938, 24.53277587890625, -9.888671875, 14.431411743164062, 15.344947814941406, 43.131370544433594, 16.407318115234375, 5.519264221191406, 32.68759536743164, 2.7705612182617188, 36.5824089050293, 8.449844360351562, 0.32573699951171875, 41.287445068359375, 1.0514678955078125, -25.846149444580078, -10.441986083984375, 33.638282775878906, -9.574226379394531, 51.012290954589844, 9.836177825927734, -4.253501892089844, 7.207977294921875, 43.68760681152344, 15.2628173828125, -8.736900329589844, 11.57241439819336, -3.2828636169433594, -1.3433151245117188, -3.2263946533203125, -0.6575775146484375, -4.590888977050781, -2.2524280548095703, 20.146137237548828, 13.873451232910156, 37.473758697509766, 29.69173812866211, -16.68194580078125, 32.61344528198242, -0.09912872314453125, 8.795852661132812, 52.56501770019531, -42.10427474975586, 7.250450134277344, 22.929798126220703, 49.433692932128906, 5.148292541503906, 31.310394287109375, 10.262676239013672, 8.624149322509766, 20.06664276123047, 31.176116943359375, -26.657119750976562, -33.25216293334961], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000559.npy"}
{"epoch": 0.8450491307634165, "step": 560, "batch_size": 64, "mean": 12.571533203125, "std": 18.992156982421875, "min": -29.691425323486328, "p10": -7.083128356933593, "median": 8.281719207763672, "p90": 37.08572540283203, "max": 51.06505584716797, "pos_frac": 0.6875, "sample": [7.605072021484375, 20.20594024658203, 36.090797424316406, 26.541221618652344, -1.4928512573242188, -3.1316757202148438, 34.56052780151367, 4.4742889404296875, 47.69915771484375, -2.5778961181640625, 11.05670166015625, -0.71502685546875, -6.0003814697265625, 8.54315185546875, -2.0734481811523438, -7.457244873046875, -11.415771484375, 2.6058483123779297, 25.89556884765625, 17.925628662109375, -4.400749206542969, 30.79216766357422, 8.397117614746094, 1.9439334869384766, 29.35034942626953, 20.18670082092285, -7.63153076171875, 43.02666473388672, 35.594459533691406, 19.588441848754883, 11.003829956054688, -3.9057769775390625, -16.47136688232422, 45.93428421020508, 32.563392639160156, -3.606353759765625, 1.1104660034179688, -4.4404449462890625, 51.06505584716797, -13.707908630371094, 50.80708694458008, 32.71912384033203, 7.308326721191406, 26.706209182739258, 12.698455810546875, 19.035507202148438, -3.9404983520507812, 44.236080169677734, 18.324878692626953, 3.8911972045898438, -0.8282546997070312, 3.949676513671875, -29.691425323486328, -6.2101898193359375, 8.16632080078125, 23.601715087890625, -27.616409301757812, 7.043905258178711, 31.870220184326172, 4.398216247558594, 37.512123107910156, 32.42732238769531, 4.205326080322266, 19.230894088745117], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000560.npy"}
{"epoch": 0.8465608465608465, "step": 561, "batch_size": 64, "mean": 13.123538970947266, "std": 19.33298683166504, "min": -21.693214416503906, "p10": -8.185268402099608, "median": 8.07109260559082, "p90": 42.62752914428712, "max": 56.655517578125, "pos_frac": 0.75, "sample": [-1.1136398315429688, 7.856441497802734, 1.8787994384765625, 9.155994415283203, 12.478717803955078, 39.901885986328125, 3.498626708984375, -4.565315246582031, 43.21556091308594, 9.438217163085938, -3.4789886474609375, 27.407684326171875, 3.0873985290527344, 13.582145690917969, 4.674510955810547, -1.447164535522461, 16.70630645751953, 32.058265686035156, 56.651634216308594, 2.78802490234375, 4.023807525634766, 22.75689697265625, 25.560100555419922, 3.201068878173828, 25.110595703125, -2.2993335723876953, 7.823448181152344, -8.977563858032227, -21.191207885742188, 56.655517578125, -21.693214416503906, 13.253128051757812, 14.867691040039062, 41.255455017089844, 8.285743713378906, 22.461387634277344, 6.6035308837890625, 13.053180694580078, 1.7629432678222656, 54.64552307128906, 40.1307373046875, 52.76835632324219, 23.836807250976562, 52.887611389160156, 32.76072311401367, -9.950973510742188, -13.121658325195312, 27.634048461914062, 12.682525634765625, 1.9229202270507812, 6.911224365234375, -0.036651611328125, 1.4189529418945312, -9.830184936523438, 9.100341796875, 27.826759338378906, -0.9238243103027344, 4.020923614501953, 8.579872131347656, -8.738372802734375, -6.894691467285156, 5.028402328491211, 47.454185485839844, -6.495353698730469], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000561.npy"}
{"epoch": 0.8480725623582767, "step": 562, "batch_size": 64, "mean": 12.81103515625, "std": 18.17465591430664, "min": -18.095977783203125, "p10": -6.762363052368164, "median": 7.415220260620117, "p90": 39.32121276855469, "max": 54.36846923828125, "pos_frac": 0.71875, "sample": [0.4565391540527344, 38.80317687988281, -9.399097442626953, 23.821075439453125, 5.35919189453125, 1.0020065307617188, 2.5307083129882812, 54.36846923828125, 42.46141052246094, 3.4599227905273438, 23.156295776367188, 7.99859619140625, -13.312263488769531, -15.815391540527344, 16.472991943359375, -3.0819091796875, 6.6250457763671875, 10.182830810546875, 28.677032470703125, -7.363685607910156, 12.381362915039062, 41.66788101196289, -1.3917236328125, -1.5086669921875, 4.559761047363281, 38.73529815673828, -2.0702781677246094, -6.599029541015625, 2.8828887939453125, 11.026229858398438, 16.055513381958008, -18.095977783203125, 41.812049865722656, 37.835235595703125, 22.206802368164062, 45.22137451171875, 1.1359405517578125, -2.544525146484375, 3.798900604248047, 39.54322814941406, -1.380706787109375, 1.1683692932128906, 16.012163162231445, 30.84960174560547, 14.37774658203125, 47.165069580078125, 31.802215576171875, 24.2822265625, 34.20940399169922, 1.8292236328125, 32.46885681152344, 2.312774658203125, -3.3992385864257812, 6.831844329833984, 12.667343139648438, 11.721614837646484, 34.291961669921875, 38.62530517578125, 10.457717895507812, -6.832363128662109, -3.114532470703125, -13.214462280273438, -2.663677215576172, -3.6174163818359375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000562.npy"}
{"epoch": 0.8495842781557067, "step": 563, "batch_size": 64, "mean": 12.89486026763916, "std": 21.860492706298828, "min": -28.273590087890625, "p10": -14.856143188476562, "median": 9.242733001708984, "p90": 47.159203720092776, "max": 60.17584228515625, "pos_frac": 0.75, "sample": [7.198616027832031, 3.899473190307617, 17.77374267578125, 0.1516714096069336, 47.270599365234375, 24.20850372314453, -20.04804039001465, 60.17584228515625, 34.059356689453125, 9.129951477050781, 2.435657501220703, 22.848175048828125, 1.5618743896484375, -12.968254089355469, 36.824729919433594, 10.449321746826172, -26.96862030029297, 49.873870849609375, -20.34333038330078, 52.692176818847656, -9.886276245117188, 55.41230773925781, -6.619173049926758, 18.485580444335938, 25.226104736328125, -5.852897644042969, 19.17754364013672, 16.653884887695312, 8.316780090332031, -14.932136535644531, -6.0645751953125, -6.169952392578125, 2.848468780517578, 3.3909988403320312, -5.53620719909668, 46.8992805480957, 7.696815490722656, -23.817703247070312, 8.16287612915039, 9.220710754394531, 47.796546936035156, 41.54388427734375, 21.832164764404297, -14.678825378417969, -28.273590087890625, 9.935447692871094, 11.978401184082031, 3.964569091796875, 31.726951599121094, 17.58477783203125, -22.234771728515625, 5.807731628417969, 19.032665252685547, 28.892974853515625, 31.317909240722656, 1.6608314514160156, 48.659366607666016, 33.393821716308594, 4.18267822265625, 42.28258514404297, 15.3160400390625, 9.264755249023438, 22.967679977416992, -1.5212974548339844], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000563.npy"}
{"epoch": 0.8510959939531368, "step": 564, "batch_size": 64, "mean": 12.080354690551758, "std": 21.217344284057617, "min": -39.65679931640625, "p10": -12.530681419372556, "median": 8.93661117553711, "p90": 40.87251281738281, "max": 50.834739685058594, "pos_frac": 0.78125, "sample": [37.06768798828125, 1.7007522583007812, 46.8607177734375, 8.426010131835938, 15.176706314086914, 45.291015625, 28.214309692382812, -38.36140441894531, 4.388797760009766, -8.744220733642578, 49.114105224609375, 41.05438232421875, 9.447212219238281, -23.11444664001465, 0.29187774658203125, 3.8299560546875, -6.570749282836914, 20.326845169067383, 40.448150634765625, 5.3494873046875, -1.8563766479492188, 7.118953704833984, 18.87872314453125, 13.406036376953125, 16.393310546875, 3.5453262329101562, 39.56257629394531, 3.8782119750976562, 0.51544189453125, 15.066680908203125, 26.077743530273438, 29.517623901367188, 48.74364471435547, 6.009868621826172, 2.2359657287597656, -28.532608032226562, 11.140220642089844, 3.2012901306152344, -1.796112060546875, -2.1351165771484375, 28.415565490722656, 37.93812561035156, 3.941680908203125, -10.18459701538086, -15.759807586669922, 1.5150909423828125, 22.33379364013672, 13.1630859375, -10.129631042480469, -39.65679931640625, 38.64708709716797, 28.751358032226562, 19.034454345703125, -13.53614616394043, 39.139984130859375, 4.7254638671875, 25.268417358398438, 50.834739685058594, -21.734420776367188, 43.69322204589844, 2.9731674194335938, 11.776416778564453, 5.459625244140625, 15.364166259765625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000564.npy"}
{"epoch": 0.8526077097505669, "step": 565, "batch_size": 64, "mean": 11.385034561157227, "std": 18.261417388916016, "min": -28.779624938964844, "p10": -8.58107223510742, "median": 8.176170349121094, "p90": 37.74143486022951, "max": 52.233909606933594, "pos_frac": 0.734375, "sample": [1.6155052185058594, -28.779624938964844, -19.267059326171875, 18.1322021484375, 4.786285400390625, 2.3317718505859375, 7.441009521484375, 8.911331176757812, -0.7275428771972656, 4.6969757080078125, 13.229438781738281, 16.569076538085938, 42.139862060546875, -27.225008010864258, 4.3065338134765625, 26.82122802734375, 12.012091636657715, 12.753253936767578, 43.067474365234375, 46.243778228759766, 31.843345642089844, -11.045120239257812, -8.88653564453125, -0.799346923828125, 15.583251953125, 5.798980712890625, -0.8628387451171875, 43.85622024536133, 28.736343383789062, 21.71044158935547, 27.434593200683594, 1.0045604705810547, 2.6905364990234375, 1.5355873107910156, 20.026878356933594, -2.3774795532226562, 28.17032241821289, 13.183906555175781, -17.96485137939453, 33.79362487792969, 7.2140350341796875, 5.428436279296875, -15.397720336914062, 17.253135681152344, 2.9594879150390625, 15.121955871582031, -4.669181823730469, 31.328624725341797, 49.32147216796875, 4.152069091796875, 17.249794006347656, -7.868324279785156, -7.59674072265625, -4.104564666748047, 26.53904151916504, 1.7299118041992188, 11.182968139648438, 26.434799194335938, -2.6785354614257812, -1.714599609375, 52.233909606933594, 29.32413673400879, 13.273754119873047, 39.433353424072266], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000565.npy"}
{"epoch": 0.854119425547997, "step": 566, "batch_size": 64, "mean": 11.292125701904297, "std": 20.02901268005371, "min": -41.01734924316406, "p10": -12.109428024291992, "median": 11.35430908203125, "p90": 33.456460571289064, "max": 60.34040832519531, "pos_frac": 0.671875, "sample": [3.44927978515625, -7.7047882080078125, -4.732707977294922, 2.8364639282226562, 14.62945556640625, 32.737213134765625, 37.96397018432617, 28.64727783203125, 25.280839920043945, 42.71137619018555, 25.39023780822754, 20.079849243164062, 15.5391845703125, 6.113365173339844, 11.1048583984375, 25.696334838867188, -41.01734924316406, 11.776535034179688, 11.603759765625, 16.453109741210938, 16.867820739746094, -1.3034324645996094, -14.858375549316406, 29.449127197265625, 6.072357177734375, 7.2611236572265625, 33.03916931152344, 31.00100326538086, 27.075477600097656, -14.636817932128906, 22.655452728271484, 28.6846923828125, 38.59857940673828, -9.907180786132812, -11.967910766601562, -36.16613006591797, 5.572784423828125, 5.750329971313477, 33.63529968261719, -9.510055541992188, 12.113174438476562, -5.489845275878906, 60.34040832519531, -14.566715240478516, 46.18390655517578, 11.086658477783203, -0.4336433410644531, -19.529613494873047, 16.963218688964844, -3.5779876708984375, -0.009571075439453125, -1.9330902099609375, -9.088104248046875, 30.263221740722656, -6.691532135009766, 10.841434478759766, 4.238040924072266, 30.472076416015625, 16.340347290039062, -12.17007827758789, 23.843475341796875, 14.16278076171875, -1.2668304443359375, 54.732757568359375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000566.npy"}
{"epoch": 0.8556311413454271, "step": 567, "batch_size": 64, "mean": 13.247111320495605, "std": 17.149465560913086, "min": -17.757022857666016, "p10": -9.956193161010741, "median": 10.998119354248047, "p90": 36.68774223327637, "max": 49.294517517089844, "pos_frac": 0.75, "sample": [27.04290008544922, 2.7506046295166016, -14.975128173828125, 29.077877044677734, 10.161209106445312, 12.202430725097656, 39.77470779418945, 19.386932373046875, 49.294517517089844, 9.205879211425781, 13.177513122558594, 10.626838684082031, 22.848278045654297, -10.255180358886719, 15.132850646972656, -0.15663909912109375, -17.757022857666016, 6.8057403564453125, -1.1574935913085938, 24.425914764404297, 7.227508544921875, -3.7134628295898438, -2.614288330078125, 32.69976043701172, -7.67144775390625, 2.111085891723633, -11.24312973022461, -10.887310028076172, 4.4231414794921875, -7.109443664550781, 29.170806884765625, 27.622146606445312, 44.621376037597656, 45.794952392578125, 26.405803680419922, 16.519975662231445, 13.277204513549805, 4.478267669677734, 36.73802185058594, 47.25578308105469, -9.258556365966797, 11.369400024414062, 10.454399108886719, 29.429290771484375, 36.5704231262207, 29.637428283691406, -1.862823486328125, 4.196144104003906, 12.704971313476562, 24.52729034423828, 21.139604568481445, 0.372100830078125, 26.030670166015625, 8.083541870117188, -10.416252136230469, 2.7610397338867188, -12.183883666992188, 12.57265853881836, 48.185462951660156, -1.78680419921875, 27.97393798828125, 7.0974884033203125, 21.06906509399414, 6.429042816162109], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000567.npy"}
{"epoch": 0.8571428571428571, "step": 568, "batch_size": 64, "mean": 13.719470024108887, "std": 17.931270599365234, "min": -22.886486053466797, "p10": -8.918047714233397, "median": 9.985370635986328, "p90": 40.95027008056641, "max": 53.114410400390625, "pos_frac": 0.8125, "sample": [19.01331329345703, 7.878448486328125, 9.719001770019531, 14.694210052490234, 0.7231636047363281, 18.17736053466797, 13.533576965332031, -22.886486053466797, 6.474235534667969, -11.435676574707031, -9.022029876708984, 41.49793243408203, 10.251739501953125, 29.954345703125, -4.71844482421875, -14.158340454101562, 4.573974609375, 0.990264892578125, 2.8256378173828125, 3.590909957885742, 3.96453857421875, 6.15997314453125, 6.543060302734375, 19.597328186035156, 34.54894256591797, 44.8960075378418, 33.586265563964844, 46.891998291015625, -8.675422668457031, 9.097366333007812, 26.129745483398438, 14.472145080566406, 14.165302276611328, -0.27597808837890625, 5.000925064086914, 13.19479751586914, -11.719779968261719, 7.5930023193359375, -2.6246347427368164, 19.23407554626465, 8.771591186523438, 42.01884460449219, 13.77740478515625, 20.6397705078125, 35.4709358215332, 4.510406494140625, 33.36389923095703, 0.65081787109375, -13.62078857421875, 0.7529830932617188, 24.756500244140625, 27.715362548828125, 13.801239013671875, 15.134283065795898, 51.441856384277344, 2.5134735107421875, 1.1092338562011719, 34.628326416015625, 41.365447998046875, 39.98152160644531, -12.044097900390625, -4.999946594238281, 39.73577117919922, 53.114410400390625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000568.npy"}
{"epoch": 0.8586545729402872, "step": 569, "batch_size": 64, "mean": 9.874387741088867, "std": 18.743555068969727, "min": -32.406463623046875, "p10": -14.767891311645508, "median": 9.965499877929688, "p90": 35.86920776367188, "max": 45.943992614746094, "pos_frac": 0.734375, "sample": [36.18690490722656, -4.071868896484375, 2.464082717895508, 39.21322250366211, 20.99005889892578, 34.783695220947266, -14.448944091796875, 20.089599609375, 10.46173095703125, 40.348243713378906, 35.06995391845703, 17.04987335205078, 9.471061706542969, 18.212703704833984, -17.60247039794922, 6.195064544677734, 3.813640594482422, 37.63929748535156, 4.765663146972656, 18.74610137939453, 35.12791442871094, 12.46697998046875, 10.03326416015625, 45.943992614746094, 15.514305114746094, 20.342479705810547, 25.10594940185547, 7.017658233642578, 0.4618377685546875, -16.6405029296875, -28.882396697998047, 1.0893478393554688, -14.904582977294922, 8.834114074707031, -3.2070159912109375, 7.154510498046875, 41.75843048095703, 15.124847412109375, 12.340126037597656, 13.35728645324707, 10.369632720947266, -4.925258636474609, -32.244659423828125, -10.997425079345703, 10.631301879882812, -32.406463623046875, 6.391387939453125, -22.167327880859375, 10.988502502441406, 3.771808624267578, 16.359909057617188, 5.71746826171875, 33.07001876831055, -12.75417709350586, -0.37047576904296875, 13.158733367919922, 29.137664794921875, 7.921630859375, 44.914859771728516, -6.072700500488281, -2.9925003051757812, 9.897735595703125, 32.97060012817383, -5.825592041015625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000569.npy"}
{"epoch": 0.8601662887377173, "step": 570, "batch_size": 64, "mean": 16.062524795532227, "std": 18.782867431640625, "min": -18.03485870361328, "p10": -9.043127822875974, "median": 14.674656867980957, "p90": 39.92199440002442, "max": 63.2593994140625, "pos_frac": 0.828125, "sample": [29.41241455078125, -16.01793670654297, 19.01198959350586, 0.69134521484375, 5.955188751220703, 21.34784698486328, -18.03485870361328, -4.171562194824219, 10.374736785888672, -10.023712158203125, 6.3178253173828125, 4.027271270751953, 14.81962776184082, 29.779911041259766, 11.733802795410156, 32.724449157714844, -0.4132194519042969, -9.848152160644531, -11.37582778930664, 42.55236053466797, 29.282989501953125, -7.164737701416016, 0.04907989501953125, 38.81623077392578, 51.23351287841797, 1.355886459350586, 19.041275024414062, 23.033416748046875, 36.925514221191406, 12.605186462402344, 17.441390991210938, 38.03437805175781, 34.22597885131836, 1.60198974609375, 4.236400604248047, 14.529685974121094, 28.00323486328125, 16.51074981689453, 38.423973083496094, 6.755613327026367, 37.997467041015625, 20.282058715820312, 42.1083984375, -2.7474441528320312, 0.029254913330078125, 9.324981689453125, 35.49530029296875, 19.280654907226562, 34.645111083984375, -9.972240447998047, 40.39589309692383, -17.880455017089844, 8.090301513671875, 26.345970153808594, 51.28727722167969, 3.8206100463867188, 9.599535942077637, 3.299640655517578, 16.353416442871094, 18.42621612548828, 63.2593994140625, 0.37294769287109375, 46.761070251464844, 7.6209716796875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000570.npy"}
{"epoch": 0.8616780045351474, "step": 571, "batch_size": 64, "mean": 8.448714256286621, "std": 18.396047592163086, "min": -41.101593017578125, "p10": -16.335560607910153, "median": 8.977063179016113, "p90": 33.136122131347655, "max": 43.610050201416016, "pos_frac": 0.734375, "sample": [8.672500610351562, 10.686790466308594, 16.646240234375, 7.862787246704102, -10.548065185546875, 34.091033935546875, 11.247909545898438, 25.27037811279297, 8.684274673461914, 20.921661376953125, 4.434322357177734, 14.900360107421875, -1.6360206604003906, 9.995777130126953, -4.131351470947266, 9.269851684570312, 13.9974365234375, 23.799434661865234, 5.106193542480469, 22.90325164794922, 43.610050201416016, 2.5942001342773438, 13.2620849609375, 22.224708557128906, 24.0166015625, 17.06885528564453, 22.249927520751953, 21.37684440612793, -41.101593017578125, 5.388801574707031, 0.18741130828857422, -24.70836639404297, 11.028228759765625, -2.7049827575683594, 6.197998046875, 3.6129302978515625, -30.506919860839844, -2.4563121795654297, -22.258285522460938, -14.847686767578125, 5.860759735107422, -1.5201263427734375, 2.2349395751953125, 9.861007690429688, 18.17319107055664, 36.53185272216797, 39.887290954589844, -23.090763092041016, 17.698623657226562, 2.17657470703125, -8.967498779296875, -31.291934967041016, 8.051116943359375, 33.26043701171875, 29.384220123291016, -11.319168090820312, 33.64469909667969, 32.84605407714844, 27.342445373535156, 15.053695678710938, -11.553497314453125, 5.535974502563477, -16.973220825195312, 41.481781005859375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000571.npy"}
{"epoch": 0.8631897203325775, "step": 572, "batch_size": 64, "mean": 10.840169906616211, "std": 16.88857078552246, "min": -23.275985717773438, "p10": -6.650106811523437, "median": 9.22882080078125, "p90": 34.67701377868653, "max": 55.629600524902344, "pos_frac": 0.765625, "sample": [18.994140625, -19.574913024902344, 15.881893157958984, 33.41892623901367, 21.72527313232422, 8.328643798828125, 35.21619415283203, 18.47692108154297, -11.833708763122559, 55.629600524902344, 5.163352966308594, 9.721012115478516, 16.668853759765625, 43.909942626953125, -5.38458251953125, 13.731452941894531, -15.778419494628906, 3.3136558532714844, -6.294029235839844, 31.260467529296875, 17.998050689697266, 13.65478515625, 6.9062652587890625, 39.303768157958984, 12.350578308105469, 9.448783874511719, 2.3004684448242188, 28.940185546875, 1.8180770874023438, 0.05243682861328125, -5.891899108886719, 43.883724212646484, 48.69171142578125, -10.899505615234375, 10.039838790893555, 8.467781066894531, 0.6102981567382812, -9.993827819824219, 3.048572540283203, 12.982311248779297, 13.969779968261719, 9.008857727050781, 11.091255187988281, 29.00912857055664, 11.974491119384766, 8.785087585449219, -6.802711486816406, -6.289581298828125, 50.284706115722656, 13.320823669433594, 4.723793029785156, 29.944580078125, -2.836395263671875, 16.107391357421875, -23.275985717773438, -5.08563232421875, 3.4489974975585938, 3.0164260864257812, -2.5556716918945312, 5.335289001464844, 10.64837646484375, -1.2074832916259766, 0.2735176086425781, 14.59475326538086], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000572.npy"}
{"epoch": 0.8647014361300076, "step": 573, "batch_size": 64, "mean": 12.313529014587402, "std": 22.411657333374023, "min": -37.78247833251953, "p10": -13.658140754699707, "median": 10.434280395507812, "p90": 43.420919799804686, "max": 63.970916748046875, "pos_frac": 0.734375, "sample": [13.109672546386719, 18.69497299194336, -37.78247833251953, 31.23882293701172, -13.338201522827148, 2.1317901611328125, -9.822086334228516, 21.096839904785156, 11.9425048828125, -21.521575927734375, -37.316017150878906, -21.709705352783203, 5.558296203613281, -0.0241546630859375, 13.201416015625, 23.99041748046875, -2.6178131103515625, 28.5982666015625, 43.53312683105469, -26.934768676757812, 40.11859893798828, 18.921154022216797, 38.28962707519531, 11.38601303100586, 6.19696044921875, 2.0027122497558594, 63.970916748046875, 12.247346878051758, 43.773529052734375, -4.192665100097656, 34.886932373046875, 5.04541015625, 14.779029846191406, -3.3563690185546875, 23.763671875, 4.848480224609375, 9.482547760009766, 8.371559143066406, 16.10610580444336, 58.32295227050781, 36.04095458984375, 22.411277770996094, 15.422338485717773, -4.365903854370117, 0.775360107421875, 7.0841064453125, 54.01521301269531, 0.036834716796875, 5.226188659667969, 35.98255157470703, 43.15910339355469, 42.46966552734375, -2.596038818359375, 17.268653869628906, -6.799812316894531, -16.646621704101562, 48.524871826171875, -13.795257568359375, 12.418838500976562, 3.987874984741211, 2.5885467529296875, 1.7108612060546875, -12.098030090332031, 48.25042724609375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000573.npy"}
{"epoch": 0.8662131519274376, "step": 574, "batch_size": 64, "mean": 14.719091415405273, "std": 18.985342025756836, "min": -28.750267028808594, "p10": -7.344770050048828, "median": 12.776453018188477, "p90": 42.34677276611328, "max": 51.00392150878906, "pos_frac": 0.734375, "sample": [12.700321197509766, -7.663810729980469, -0.6711578369140625, -14.125762939453125, -10.603836059570312, 40.13023376464844, 1.1369895935058594, -15.711410522460938, -4.366783142089844, -17.03215789794922, 7.118457794189453, 9.27069091796875, -1.9153594970703125, -2.7865867614746094, -8.339523315429688, 5.618400573730469, -2.551054000854492, 16.17449951171875, 22.2774658203125, 17.169925689697266, 24.886825561523438, -0.5613632202148438, -6.600341796875, 22.4134521484375, 2.1376266479492188, 49.92613220214844, 36.23065185546875, 48.57904052734375, 3.757669448852539, 42.47203063964844, 6.762237548828125, 2.0171165466308594, 26.594070434570312, 18.805160522460938, -1.694244384765625, 29.7662353515625, -28.750267028808594, 18.45501708984375, 44.60540771484375, 14.35987663269043, 33.20843505859375, 7.3708953857421875, 51.00392150878906, 7.06878662109375, 36.8146858215332, 42.05450439453125, 27.270965576171875, 12.852584838867188, 12.080062866210938, 6.922107696533203, -3.2689361572265625, 48.541778564453125, 20.852615356445312, 19.790367126464844, 36.37483215332031, 11.014518737792969, 5.644947052001953, 22.28018569946289, 39.64533996582031, 45.60954284667969, -5.837150573730469, 23.563705444335938, 28.310821533203125, 12.860488891601562], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000574.npy"}
{"epoch": 0.8677248677248677, "step": 575, "batch_size": 64, "mean": 9.744050979614258, "std": 21.3377742767334, "min": -45.88644027709961, "p10": -11.43288803100586, "median": 4.7538604736328125, "p90": 45.70837249755861, "max": 58.184539794921875, "pos_frac": 0.703125, "sample": [22.750656127929688, -19.848655700683594, -28.382762908935547, 12.149673461914062, 56.372406005859375, -2.3027496337890625, -3.1943740844726562, 16.803939819335938, 0.8105697631835938, -10.439430236816406, 19.537857055664062, -10.904281616210938, 48.20375061035156, 2.315814971923828, -21.203041076660156, 6.235565185546875, -1.002593994140625, 8.062583923339844, 11.463916778564453, 4.2266693115234375, 4.4440460205078125, 33.642738342285156, 17.216567993164062, 15.0859375, 7.506477355957031, -2.19427490234375, -11.449974060058594, 10.392799377441406, 47.24365234375, 5.0636749267578125, 6.003671646118164, -6.5304107666015625, 42.12605285644531, 55.13923645019531, 34.25489044189453, -11.393020629882812, 50.872657775878906, 0.0302734375, 50.85694885253906, 32.074039459228516, -0.6402435302734375, 8.311149597167969, 3.3707122802734375, 1.6615447998046875, 1.5913848876953125, 1.2204666137695312, -1.8564682006835938, -45.88644027709961, 7.51849365234375, 58.184539794921875, 33.880462646484375, 3.70196533203125, 1.6709136962890625, 21.69676971435547, -15.647701263427734, 9.712661743164062, 3.3490047454833984, -9.482162475585938, 1.0970840454101562, 9.641372680664062, -16.047264099121094, 27.427947998046875, 27.931175231933594, -0.8296127319335938], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000575.npy"}
{"epoch": 0.8692365835222978, "step": 576, "batch_size": 64, "mean": 14.908914566040039, "std": 19.28183364868164, "min": -26.496963500976562, "p10": -7.641343879699706, "median": 11.205322265625, "p90": 39.69010810852051, "max": 49.20119857788086, "pos_frac": 0.75, "sample": [15.630233764648438, 41.64653778076172, -7.900836944580078, 21.73003578186035, 25.712310791015625, 34.216026306152344, 49.20119857788086, 6.209445953369141, 17.529617309570312, 8.572196960449219, -12.856773376464844, -13.282598495483398, 30.873374938964844, -22.299598693847656, 24.59899139404297, -2.09344482421875, -20.799644470214844, 19.708580017089844, 32.478607177734375, 11.419189453125, 33.97271728515625, 43.3275146484375, 10.440383911132812, 0.43007659912109375, 37.26914596557617, 31.882186889648438, 33.82475280761719, -3.1343612670898438, 40.88941192626953, -0.7496490478515625, 6.8400726318359375, -0.8415145874023438, 5.8924713134765625, 36.608680725097656, 21.86231231689453, 0.28385162353515625, 42.64134216308594, -0.06793975830078125, 4.5814361572265625, 9.164264678955078, 19.072158813476562, -26.496963500976562, 28.118202209472656, -0.3429298400878906, 33.95518493652344, 3.3260726928710938, 48.70062255859375, 39.43378448486328, -2.267578125, 39.79996109008789, 32.404693603515625, -25.275985717773438, 27.942047119140625, 0.3708343505859375, 10.991455078125, 27.255584716796875, 9.432292938232422, -7.035860061645508, 38.999732971191406, 22.71387481689453, 4.203224182128906, -3.954376220703125, 8.327880859375, 9.08603286743164], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000576.npy"}
{"epoch": 0.8707482993197279, "step": 577, "batch_size": 64, "mean": 8.283341407775879, "std": 17.97980499267578, "min": -40.38987350463867, "p10": -8.931093597412106, "median": 6.3596038818359375, "p90": 34.989155578613286, "max": 52.101348876953125, "pos_frac": 0.640625, "sample": [22.766088485717773, 10.368759155273438, -21.274192810058594, 7.013782501220703, -4.456695556640625, -0.82122802734375, -1.443450927734375, -4.623044013977051, 20.823776245117188, 11.84661865234375, -3.3944854736328125, 38.63971710205078, 16.36681365966797, -0.5443038940429688, -16.92913818359375, 47.974151611328125, 37.7398681640625, 46.46294403076172, -40.38987350463867, 21.572452545166016, 22.935588836669922, 15.694839477539062, -0.41274070739746094, 29.74249267578125, -0.16051864624023438, -14.966514587402344, -10.102291107177734, 19.74277114868164, 2.981943130493164, 9.773681640625, -0.17068862915039062, 8.510616302490234, -5.8614654541015625, 7.615362167358398, -4.25146484375, 8.374465942382812, 4.3406982421875, 7.757720947265625, 5.381278991699219, 5.008968353271484, -5.139133453369141, 1.3369064331054688, 18.353317260742188, 4.458314895629883, 52.101348876953125, 17.515594482421875, -18.648521423339844, 8.675201416015625, 3.112030029296875, 35.240692138671875, -5.9020233154296875, 7.458412170410156, 41.61907958984375, 28.542377471923828, 5.008995056152344, 6.7576446533203125, -29.513412475585938, -6.198299407958984, -1.6027193069458008, 21.25632667541504, 7.486610412597656, 34.40223693847656, 5.9615631103515625, -1.7820281982421875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000577.npy"}
{"epoch": 0.872260015117158, "step": 578, "batch_size": 64, "mean": 8.049796104431152, "std": 17.1057186126709, "min": -23.527183532714844, "p10": -10.380348587036131, "median": 3.495624542236328, "p90": 34.16797714233398, "max": 47.35188293457031, "pos_frac": 0.59375, "sample": [-2.6535377502441406, -5.8799591064453125, -0.23224639892578125, 1.598480224609375, 4.772895812988281, -13.456085205078125, 9.908348083496094, 15.496646881103516, 25.213760375976562, -8.29318618774414, -0.20761871337890625, 18.70220947265625, -2.1035118103027344, 23.077884674072266, 29.850040435791016, -12.069229125976562, 34.20314025878906, 29.264728546142578, 20.709426879882812, 5.6040191650390625, 28.74608612060547, 7.490020751953125, 2.171449661254883, 6.403484344482422, -1.4576377868652344, 8.175880432128906, 26.85638427734375, 0.0075740814208984375, -1.7857131958007812, -11.446653366088867, 44.34966278076172, 9.793083190917969, 41.63486862182617, 2.949310302734375, -10.896629333496094, 2.2445220947265625, 5.621116638183594, 10.90704345703125, 10.846656799316406, 28.210540771484375, 43.28924560546875, -0.7698211669921875, 13.878707885742188, -5.77772331237793, -0.7119140625, -9.17569351196289, -7.2987518310546875, 35.28117752075195, 17.35350799560547, 34.08592987060547, -0.493621826171875, -13.207778930664062, 47.35188293457031, -2.365762710571289, -7.632904052734375, -18.590246200561523, 4.784324645996094, -8.989601135253906, 2.9412078857421875, -6.355607986450195, -23.527183532714844, 4.041938781738281, -9.082313537597656, 41.83069610595703], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000578.npy"}
{"epoch": 0.873771730914588, "step": 579, "batch_size": 64, "mean": 12.934349060058594, "std": 18.342859268188477, "min": -42.46687316894531, "p10": -7.115776443481445, "median": 10.045448303222656, "p90": 35.751687622070314, "max": 53.179832458496094, "pos_frac": 0.828125, "sample": [8.302978515625, 12.692436218261719, 19.020458221435547, 10.508949279785156, 13.425651550292969, -31.674041748046875, 35.31816101074219, 5.522216796875, 9.556777954101562, 34.60260009765625, 27.91680145263672, 21.810489654541016, 23.865463256835938, 53.179832458496094, 4.338441848754883, 8.230659484863281, 35.31243896484375, 13.795661926269531, -10.584480285644531, 31.12408447265625, 44.78815460205078, 4.647640228271484, 31.861679077148438, 0.12228965759277344, 0.5755767822265625, -7.36785888671875, 47.66871643066406, 14.585838317871094, 37.871360778808594, 8.57657241821289, -6.527584075927734, 6.097095489501953, 7.9296875, 16.563709259033203, 13.969444274902344, 30.295124053955078, 24.48217010498047, 30.32604217529297, -5.746826171875, 36.94895935058594, 1.7274513244628906, 7.965147018432617, -5.086954116821289, -17.751785278320312, 14.295639038085938, 2.977325439453125, -11.578720092773438, -42.46687316894531, 49.839508056640625, 2.9528274536132812, 5.3250274658203125, 15.49749755859375, -0.9886474609375, 27.746261596679688, 8.979728698730469, 2.503742218017578, 35.93748474121094, 25.472930908203125, 0.05233001708984375, 14.433799743652344, -11.479713439941406, 8.768707275390625, 19.160369873046875, 9.581947326660156], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000579.npy"}
{"epoch": 0.8752834467120182, "step": 580, "batch_size": 64, "mean": 10.86542797088623, "std": 19.107868194580078, "min": -40.331871032714844, "p10": -11.090708160400391, "median": 8.73919677734375, "p90": 37.311936187744145, "max": 48.07752990722656, "pos_frac": 0.75, "sample": [-11.154312133789062, -8.460227966308594, -10.942298889160156, 4.0415496826171875, -4.5057525634765625, 42.435699462890625, 34.92741775512695, 17.246742248535156, 13.148895263671875, 2.337177276611328, 30.309112548828125, 16.320560455322266, 3.022308349609375, 5.723155975341797, -12.298149108886719, 6.488971710205078, 22.262733459472656, 12.478500366210938, -2.7063865661621094, 7.316703796386719, 2.843669891357422, 11.35935115814209, 43.067535400390625, 5.097755432128906, 14.930473327636719, -4.05657958984375, 48.07752990722656, 8.111846923828125, -5.02366828918457, 45.763092041015625, 38.87474060058594, 17.111614227294922, 4.303802490234375, 13.988052368164062, -16.48467254638672, 37.660850524902344, -34.638648986816406, -36.710899353027344, -1.4455108642578125, 32.705413818359375, -40.331871032714844, 16.708740234375, -2.4130783081054688, 19.861572265625, 38.070465087890625, 8.04738998413086, -3.2472991943359375, 7.303062438964844, 36.497802734375, 12.602210998535156, 35.08927917480469, 27.415761947631836, 9.366546630859375, -14.307514190673828, 5.585124969482422, 1.6324138641357422, 27.874588012695312, 23.565614700317383, 18.754924774169922, 19.96319580078125, 25.71886444091797, 6.043731689453125, 2.3193397521972656, 19.73833656311035], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000580.npy"}
{"epoch": 0.8767951625094482, "step": 581, "batch_size": 64, "mean": 11.744109153747559, "std": 19.44923210144043, "min": -30.150333404541016, "p10": -17.366654968261717, "median": 10.286285400390625, "p90": 37.928274536132825, "max": 57.51898193359375, "pos_frac": 0.75, "sample": [26.274959564208984, 15.188240051269531, 17.819015502929688, 5.794586181640625, -18.093231201171875, 9.962692260742188, 18.23876953125, -3.4032135009765625, 41.21846008300781, 26.7799072265625, -15.671310424804688, 5.759616851806641, 6.0518798828125, 20.83187484741211, 4.9588470458984375, -6.658260345458984, 35.50902557373047, 10.536983489990234, 34.76544952392578, 5.171632766723633, 42.95265197753906, 23.485248565673828, 40.609703063964844, -22.582901000976562, 10.035587310791016, 43.90423583984375, 42.023460388183594, -5.418327331542969, 21.732757568359375, 1.856964111328125, -18.404403686523438, 8.388839721679688, 6.971931457519531, 20.9390869140625, 7.210227966308594, 22.704174041748047, 2.7146148681640625, -27.06683349609375, 57.51898193359375, 38.96509552001953, 34.87504577636719, -8.2705078125, 23.803985595703125, 12.181533813476562, 12.371200561523438, -20.542465209960938, 7.1885986328125, 3.7712249755859375, -6.189659118652344, -27.69060516357422, 32.94112777709961, 17.004592895507812, 32.70892333984375, 9.621047973632812, 1.0859699249267578, 29.987899780273438, -1.8517227172851562, -30.150333404541016, -3.244171142578125, 24.031917572021484, 23.890892028808594, 12.797828674316406, 12.18157958984375, -0.4579315185546875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000581.npy"}
{"epoch": 0.8783068783068783, "step": 582, "batch_size": 64, "mean": 11.70887279510498, "std": 20.051050186157227, "min": -34.24421691894531, "p10": -15.150984954833982, "median": 8.711437225341797, "p90": 37.158304595947264, "max": 60.41304016113281, "pos_frac": 0.765625, "sample": [23.200393676757812, 60.41304016113281, 12.465496063232422, 13.426063537597656, 3.4354171752929688, -6.798072814941406, 38.92280578613281, 1.2199172973632812, 28.205078125, 34.74869155883789, 9.124107360839844, 30.9652099609375, 37.11558532714844, 49.50152587890625, 2.449066162109375, 36.01666259765625, 12.87469482421875, -13.10745620727539, 20.29283905029297, 33.04906463623047, 3.1287841796875, 4.201728820800781, 7.267375946044922, 20.950157165527344, 20.136375427246094, 18.207473754882812, 1.6565361022949219, -4.427013397216797, -30.271926879882812, -27.59042739868164, 0.3798980712890625, 15.540130615234375, 33.863807678222656, -16.32599639892578, -7.317146301269531, 22.274311065673828, 38.00391387939453, 23.341705322265625, 38.254486083984375, -1.614044189453125, 21.040008544921875, 37.176612854003906, 4.094135284423828, 8.29876708984375, -16.026782989501953, 6.887062072753906, 3.40093994140625, 29.463829040527344, 3.3588333129882812, -34.24421691894531, -1.7822132110595703, 7.116188049316406, -4.687233924865723, 0.6076183319091797, 4.768135070800781, 13.509651184082031, 28.800189971923828, -30.71392822265625, 26.395401000976562, -18.388748168945312, 6.73150634765625, -1.860076904296875, 30.34033966064453, 37.901588439941406], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000582.npy"}
{"epoch": 0.8798185941043084, "step": 583, "batch_size": 64, "mean": 9.883415222167969, "std": 21.664203643798828, "min": -36.439125061035156, "p10": -15.057145309448241, "median": 10.414325714111328, "p90": 43.178646850585956, "max": 55.717994689941406, "pos_frac": 0.640625, "sample": [-1.702362060546875, -8.091026306152344, -26.73993682861328, -4.5634307861328125, 55.717994689941406, 24.44504165649414, -25.03588104248047, -5.552960395812988, -24.775115966796875, -9.849235534667969, 16.207839965820312, -24.078720092773438, 10.534042358398438, 29.477645874023438, 14.071582794189453, -5.856525421142578, 44.817596435546875, -14.162254333496094, 51.28078079223633, 22.16248893737793, 36.69968032836914, -26.647674560546875, 12.162246704101562, -2.1454391479492188, 14.495254516601562, 12.25728988647461, 10.453887939453125, -13.844406127929688, 9.59902572631836, 39.35443115234375, -0.10712432861328125, -3.1541271209716797, 14.825393676757812, 5.827690124511719, 4.4927978515625, 50.516448974609375, -15.440670013427734, -13.47219467163086, 10.374763488769531, 50.70365905761719, 2.1739139556884766, 32.80059814453125, -36.439125061035156, 29.975196838378906, -4.945583343505859, 34.778839111328125, 29.475173950195312, 15.362983703613281, 11.091835021972656, -8.749643325805664, 22.629714965820312, 14.197914123535156, 11.04269790649414, 0.214141845703125, 20.683603286743164, 3.4499053955078125, 4.830631256103516, 22.764263153076172, 0.26546287536621094, 18.522253036499023, 49.547080993652344, 46.04093933105469, -1.844482421875, -0.5882110595703125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000583.npy"}
{"epoch": 0.8813303099017384, "step": 584, "batch_size": 64, "mean": 11.066622734069824, "std": 20.720918655395508, "min": -30.294654846191406, "p10": -13.469311714172362, "median": 8.996295928955078, "p90": 38.535474395751955, "max": 57.501251220703125, "pos_frac": 0.6875, "sample": [49.850799560546875, 13.258697509765625, 10.249523162841797, 0.39987850189208984, 12.066513061523438, -12.029191970825195, 43.91984558105469, 36.12125015258789, 32.986572265625, 19.111480712890625, 7.680763244628906, -2.235044479370117, -8.967716217041016, -24.867706298828125, -14.086505889892578, 9.122871398925781, 5.017250061035156, 45.81251525878906, 30.519886016845703, -23.903385162353516, 36.07013702392578, 38.993927001953125, -9.34598159790039, 36.56877136230469, 18.003997802734375, 1.1315383911132812, 13.527433395385742, 4.854560852050781, 2.4044723510742188, 0.3220367431640625, 25.677471160888672, -18.350357055664062, 13.880256652832031, 0.7480621337890625, -22.246959686279297, 47.19574737548828, -4.6456756591796875, -0.197601318359375, -2.6805267333984375, -7.572490692138672, 34.39392852783203, 29.333274841308594, -9.314582824707031, 21.186809539794922, 13.941879272460938, 22.828277587890625, 1.31219482421875, 26.24057388305664, 15.47552490234375, -2.27667236328125, 13.053504943847656, 8.869720458984375, -2.6330413818359375, 37.46575164794922, 1.4951362609863281, 57.501251220703125, 51.605499267578125, 7.059429168701172, 9.668754577636719, -0.5290603637695312, -0.9364128112792969, 23.677997589111328, -25.228347778320312, -30.294654846191406], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000584.npy"}
{"epoch": 0.8828420256991686, "step": 585, "batch_size": 64, "mean": 8.310773849487305, "std": 17.731861114501953, "min": -40.579978942871094, "p10": -10.127669525146482, "median": 7.897235870361328, "p90": 29.78403091430664, "max": 48.13141632080078, "pos_frac": 0.6875, "sample": [2.72845458984375, 10.028717041015625, 5.083900451660156, 7.78656005859375, 20.88613510131836, 17.158302307128906, 11.203575134277344, 29.749237060546875, -5.211372375488281, 0.1612377166748047, -40.579978942871094, 5.151496887207031, 26.716278076171875, 5.331993103027344, -0.24416160583496094, 34.887420654296875, -28.14577865600586, 10.575347900390625, 10.0609130859375, -0.8009414672851562, 29.79894256591797, 8.007911682128906, 0.6136531829833984, -2.7847671508789062, 10.815406799316406, 5.301990509033203, -4.750663757324219, 10.394126892089844, 24.161285400390625, 12.653858184814453, -7.6662139892578125, 6.121086120605469, 23.930988311767578, -1.8805427551269531, -18.53356170654297, 6.562812805175781, 8.009498596191406, 10.45663833618164, 42.589141845703125, 11.989795684814453, -2.5284881591796875, 20.188838958740234, 11.315483093261719, -28.795848846435547, -33.227752685546875, 43.75352096557617, 44.955352783203125, 33.00467300415039, -4.788791656494141, -13.75323486328125, -11.182579040527344, -2.356903076171875, 14.942024230957031, 14.285594940185547, 22.672998428344727, -0.2875213623046875, 4.0384521484375, 24.44821548461914, 48.13141632080078, 16.61389923095703, -0.09427642822265625, -2.1200790405273438, 7.523750305175781, 26.832046508789062], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000585.npy"}
{"epoch": 0.8843537414965986, "step": 586, "batch_size": 64, "mean": 13.586462020874023, "std": 17.79898452758789, "min": -37.219703674316406, "p10": -4.208776473999023, "median": 12.891382217407227, "p90": 40.44501304626465, "max": 53.68974304199219, "pos_frac": 0.734375, "sample": [23.862030029296875, -2.588390350341797, 19.321334838867188, 17.852874755859375, 11.238555908203125, 29.055572509765625, 13.315166473388672, -6.450351715087891, 9.515941619873047, 12.94442367553711, 36.88803482055664, -5.6282196044921875, 15.127939224243164, 46.82311248779297, 40.800819396972656, -1.9559993743896484, 14.238410949707031, 14.870208740234375, 12.838340759277344, 42.766231536865234, -3.7942543029785156, 40.71541976928711, 8.773571014404297, 13.152399063110352, 19.630691528320312, -2.0610923767089844, 13.047252655029297, 2.1331939697265625, 31.371543884277344, 16.170188903808594, 44.8831787109375, 30.605911254882812, 16.723007202148438, 31.770736694335938, 28.904403686523438, -3.12762451171875, -27.60623550415039, 53.68974304199219, 2.8476829528808594, 7.046393394470215, -0.8274250030517578, -5.988456726074219, 7.068206787109375, 10.140647888183594, 20.78858184814453, -4.3864288330078125, -0.16980361938476562, 10.342472076416016, -3.226053237915039, 11.58847427368164, 19.854476928710938, -2.8103179931640625, 10.834869384765625, -1.0348777770996094, 43.625667572021484, 2.0956344604492188, -37.219703674316406, 39.814064025878906, -18.06585693359375, 15.9996337890625, 32.388938903808594, 9.352218627929688, 28.701648712158203, 10.954818725585938], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000586.npy"}
{"epoch": 0.8858654572940288, "step": 587, "batch_size": 64, "mean": 14.350539207458496, "std": 20.898136138916016, "min": -33.13385009765625, "p10": -8.79956932067871, "median": 12.761375427246094, "p90": 43.24218330383301, "max": 58.67307662963867, "pos_frac": 0.734375, "sample": [-1.5925979614257812, -24.183761596679688, 0.13102340698242188, 12.157478332519531, 29.98889923095703, -12.935623168945312, 12.545494079589844, -16.568115234375, 33.970947265625, 11.496383666992188, 17.6308650970459, 27.7767333984375, -9.2808837890625, -2.8439865112304688, 5.396934509277344, 11.170066833496094, 30.881942749023438, 0.5055618286132812, -7.1297149658203125, 51.322608947753906, 21.13934326171875, 18.591175079345703, -0.4194068908691406, 43.524436950683594, 20.964988708496094, 22.796085357666016, 58.67307662963867, -7.676502227783203, -33.13385009765625, 0.13763809204101562, -3.601226806640625, 8.039226531982422, 24.979515075683594, 35.86326599121094, 7.259111404418945, 28.66350555419922, 28.689062118530273, -25.868133544921875, -27.424407958984375, 46.118255615234375, 45.3868522644043, -5.333717346191406, 49.85327911376953, 11.587512969970703, 1.0857963562011719, 39.747291564941406, 10.621601104736328, 42.58359146118164, 34.329280853271484, 12.977256774902344, 36.47838592529297, -4.540223121643066, 29.636728286743164, 2.240131378173828, 20.18848419189453, 18.667266845703125, 26.782817840576172, -7.326129913330078, -0.9570426940917969, 17.29998016357422, 19.41905975341797, 46.107421875, 32.137962341308594, 1.7054367065429688], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000587.npy"}
{"epoch": 0.8873771730914588, "step": 588, "batch_size": 64, "mean": 12.592073440551758, "std": 16.393274307250977, "min": -25.297565460205078, "p10": -6.442626953125, "median": 11.556882858276367, "p90": 32.61987171173096, "max": 51.183563232421875, "pos_frac": 0.8125, "sample": [-5.70062255859375, -6.9097747802734375, 8.827056884765625, 24.29279327392578, -0.6508865356445312, 21.769569396972656, 24.786842346191406, 5.34515380859375, 10.097274780273438, 12.593429565429688, 43.13124084472656, 24.824615478515625, 11.973567962646484, 29.4532527923584, 14.836219787597656, 18.50056266784668, 51.183563232421875, 43.39460754394531, 20.530563354492188, 33.150238037109375, 6.945335388183594, 18.79261016845703, 0.7154998779296875, 35.491798400878906, 13.695056915283203, 31.14568328857422, 11.14019775390625, 2.6546554565429688, 2.9162864685058594, -24.640838623046875, 2.958232879638672, 17.190832138061523, -19.301599502563477, 17.19915771484375, 28.753543853759766, 37.62744140625, 0.7060508728027344, 28.414505004882812, 21.51990509033203, -0.7619476318359375, 1.7799072265625, -25.297565460205078, 0.6093826293945312, 8.692291259765625, 25.919845581054688, 9.934501647949219, 31.77311134338379, -1.8917465209960938, 9.880500793457031, 32.98276901245117, 9.319366455078125, -6.059814453125, -8.211517333984375, 6.5279388427734375, 15.221267700195312, 23.185638427734375, 1.1155242919921875, 18.410480499267578, 6.1996612548828125, -6.606689453125, 9.13851547241211, 21.766021728515625, 28.10425567626953, -25.19256591796875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000588.npy"}
{"epoch": 0.8888888888888888, "step": 589, "batch_size": 64, "mean": 13.97443675994873, "std": 17.600833892822266, "min": -31.172874450683594, "p10": -4.209466171264647, "median": 11.88897705078125, "p90": 37.68945579528809, "max": 52.198089599609375, "pos_frac": 0.828125, "sample": [7.1333465576171875, 9.097579956054688, -21.654027938842773, 40.614112854003906, 25.135147094726562, 9.632043838500977, 11.90130615234375, -1.8878402709960938, 5.446563720703125, 3.563556671142578, 8.337966918945312, -13.959842681884766, 52.198089599609375, 18.544414520263672, 9.605636596679688, -0.7661552429199219, 17.150104522705078, 20.601512908935547, 34.007774353027344, 40.844200134277344, 6.7808074951171875, 11.87664794921875, 32.628334045410156, -16.8212890625, 20.747543334960938, 12.956130981445312, 0.9832916259765625, 7.7737274169921875, 28.204395294189453, 17.98794174194336, 8.10165023803711, 14.041709899902344, 37.12922668457031, 26.309860229492188, 4.110595703125, -2.8558731079101562, 11.314376831054688, 6.2739410400390625, 16.312179565429688, 42.25724411010742, 16.74505615234375, 47.75553894042969, 37.9295539855957, 10.650482177734375, 24.10128402709961, -2.042755126953125, 3.0916213989257812, -15.455947875976562, -4.789577484130859, 19.232322692871094, -21.903823852539062, 31.149002075195312, 47.100311279296875, -31.172874450683594, 24.397003173828125, 21.48774528503418, 36.309043884277344, 15.594955444335938, 5.243080139160156, 7.8539276123046875, 16.864288330078125, 2.168672561645508, 36.13536071777344, 4.2617340087890625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000589.npy"}
{"epoch": 0.890400604686319, "step": 590, "batch_size": 64, "mean": 13.602447509765625, "std": 18.267444610595703, "min": -41.465614318847656, "p10": -4.140811538696289, "median": 9.547582626342773, "p90": 41.7145679473877, "max": 47.328773498535156, "pos_frac": 0.75, "sample": [27.200698852539062, 7.684516906738281, 29.12469482421875, 2.1599159240722656, 19.072540283203125, 47.328773498535156, 9.531631469726562, -41.465614318847656, -2.2018861770629883, -2.3881912231445312, 3.78485107421875, -4.516134262084961, 8.149337768554688, 14.514114379882812, -22.293174743652344, -0.12494659423828125, 27.774574279785156, 5.40191650390625, -1.8284683227539062, 1.2896957397460938, -8.026382446289062, 23.131729125976562, -0.43534088134765625, 34.784175872802734, 44.29835510253906, -1.266998291015625, 9.27850341796875, 14.221969604492188, 34.48761749267578, 8.366867065429688, 15.440729141235352, 22.953475952148438, 7.524633407592773, 5.43011474609375, 16.979232788085938, 37.988037109375, 34.078731536865234, 15.3441162109375, 3.67205810546875, 5.714515686035156, 40.49955749511719, 9.563533782958984, 45.64788818359375, 28.59709930419922, 47.019378662109375, -4.193256378173828, 1.907257080078125, 11.51666259765625, 7.378532409667969, 22.470001220703125, -6.500896453857422, 35.851219177246094, 13.030555725097656, -2.2078323364257812, -0.5034637451171875, 42.73259735107422, 17.777633666992188, 27.46131134033203, 45.27936553955078, -4.018440246582031, 11.386222839355469, 0.448486328125, 42.235286712646484, -14.987022399902344], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000590.npy"}
{"epoch": 0.891912320483749, "step": 591, "batch_size": 64, "mean": 13.286849975585938, "std": 19.47575569152832, "min": -27.979644775390625, "p10": -11.414406585693357, "median": 11.021591186523438, "p90": 39.15420684814454, "max": 52.750030517578125, "pos_frac": 0.765625, "sample": [-0.7443075180053711, 41.75037384033203, 15.56888198852539, 21.60478973388672, 32.61882019042969, 30.372528076171875, 44.27915573120117, 7.450531005859375, 12.059052467346191, 35.80219268798828, 6.813629150390625, -26.069107055664062, 15.026824951171875, -15.867164611816406, 4.5664520263671875, -17.602420806884766, 10.288616180419922, 6.0827178955078125, 17.061538696289062, 9.080944061279297, 36.530364990234375, 18.987300872802734, 32.964599609375, -8.0157470703125, 36.62425231933594, -21.861770629882812, 32.38165283203125, 18.701671600341797, 1.841217041015625, 0.51312255859375, -27.979644775390625, -6.0629425048828125, 46.2724609375, 52.750030517578125, -1.890645980834961, 43.03050231933594, -15.518943786621094, 24.868240356445312, 30.704086303710938, 1.4172859191894531, 3.727283477783203, -4.3744306564331055, 15.528450012207031, 21.82509994506836, 1.4434165954589844, 30.561874389648438, -2.290045738220215, -12.626319885253906, 4.7694244384765625, 29.85882568359375, -8.58660888671875, -5.002086639404297, 39.81074523925781, 11.754566192626953, 2.35986328125, 42.57158660888672, 34.296783447265625, 6.84132194519043, 7.0372467041015625, 13.964179992675781, 37.622283935546875, 2.0589752197265625, 1.1894950866699219, 29.615386962890625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000591.npy"}
{"epoch": 0.8934240362811792, "step": 592, "batch_size": 64, "mean": 12.779840469360352, "std": 19.087919235229492, "min": -24.536216735839844, "p10": -6.484937477111816, "median": 8.570568084716797, "p90": 45.042544555664065, "max": 65.74429321289062, "pos_frac": 0.75, "sample": [-3.484466552734375, 50.72113037109375, 32.69981384277344, 23.17519760131836, 23.39098358154297, 45.701934814453125, 45.87343978881836, -13.936080932617188, 11.062400817871094, 9.20656967163086, -7.752601623535156, 20.662508010864258, 9.057052612304688, 4.728096008300781, 1.1483001708984375, 32.65154266357422, 42.12591552734375, 26.54027557373047, -5.743282318115234, -0.4663829803466797, -13.258712768554688, 31.617813110351562, 18.770275115966797, 43.50396728515625, -1.7669296264648438, 23.790267944335938, 8.527313232421875, 50.713409423828125, 4.97784423828125, 13.583084106445312, 10.415504455566406, 18.077518463134766, -6.3685302734375, 7.291816711425781, 0.330841064453125, 9.756179809570312, -24.536216735839844, 8.613822937011719, -14.825740814208984, -6.299468994140625, -6.534826278686523, 13.177330017089844, 0.6646614074707031, -2.414306640625, 3.0293350219726562, 17.940399169921875, 7.8602294921875, 54.06890869140625, 51.71195983886719, 3.0826797485351562, 65.74429321289062, -0.7453422546386719, 3.2031822204589844, 9.755023956298828, -7.319828033447266, 3.42606258392334, 14.09747314453125, 6.159126281738281, -1.2476348876953125, 5.152732849121094, 7.344936370849609, 2.3597564697265625, 10.350723266601562, 26.766544342041016], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000592.npy"}
{"epoch": 0.8949357520786092, "step": 593, "batch_size": 64, "mean": 12.287393569946289, "std": 20.367719650268555, "min": -32.626922607421875, "p10": -12.680986022949218, "median": 10.38051986694336, "p90": 41.670003509521486, "max": 54.24322509765625, "pos_frac": 0.71875, "sample": [7.578502655029297, 6.135074615478516, 21.428176879882812, 9.607772827148438, 37.672271728515625, -12.720985412597656, 1.957183837890625, -8.87060546875, 49.821624755859375, 26.02254867553711, 20.248340606689453, 9.41943359375, 12.23370361328125, 42.18186950683594, 7.991909027099609, -1.6595420837402344, 27.01540756225586, 13.09375, 12.413467407226562, 9.555404663085938, 5.812732696533203, 38.10511016845703, 27.138877868652344, 38.72826385498047, 1.6508674621582031, 11.298080444335938, -10.902303695678711, 42.136566162109375, 17.189895629882812, -27.516921997070312, 33.68988800048828, -12.587654113769531, -4.922721862792969, 28.305362701416016, 17.87403106689453, 27.652278900146484, 6.914209365844727, -24.349376678466797, 41.310096740722656, -2.13311767578125, -16.711082458496094, -14.886589050292969, 41.824249267578125, -24.620201110839844, 50.9266471862793, 36.11237716674805, 48.52581024169922, 8.236906051635742, -6.298274993896484, 54.24322509765625, -1.206939697265625, -2.6729202270507812, 11.366401672363281, 12.886699676513672, 21.3651123046875, -32.626922607421875, 10.686737060546875, 6.815704345703125, -0.8052091598510742, -6.492321014404297, 10.21466064453125, 10.546379089355469, 4.3777618408203125, 18.065549850463867], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000593.npy"}
{"epoch": 0.8964474678760394, "step": 594, "batch_size": 64, "mean": 10.986978530883789, "std": 18.142179489135742, "min": -28.44963836669922, "p10": -6.634131622314453, "median": 8.401538848876953, "p90": 37.36129302978516, "max": 55.14094543457031, "pos_frac": 0.703125, "sample": [9.0286865234375, 28.350234985351562, 0.8974838256835938, 23.01577377319336, 35.18539810180664, 25.982101440429688, 9.70388412475586, -6.664520263671875, 7.977745056152344, -3.9269638061523438, -0.398956298828125, -10.511505126953125, -4.981174468994141, 5.2928009033203125, 2.7017669677734375, 0.6565570831298828, -3.55987548828125, -6.182220458984375, 18.81909942626953, 17.545169830322266, 30.375869750976562, 2.042774200439453, 1.131551742553711, 5.0017547607421875, 55.14094543457031, 14.019088745117188, 8.825332641601562, -6.159400939941406, 32.76734924316406, -0.9613113403320312, 39.525604248046875, 10.8681640625, 52.6068115234375, 18.059967041015625, -0.542083740234375, -7.769306182861328, 9.134689331054688, 21.72435760498047, 18.32326889038086, -2.8943862915039062, 37.869178771972656, -3.4982681274414062, 33.47102355957031, 43.27111053466797, 1.2531471252441406, 14.010215759277344, 7.571006774902344, 37.30878448486328, 1.8715629577636719, 37.38379669189453, 24.342041015625, -28.44963836669922, 1.2689208984375, 49.61371612548828, 10.71807861328125, 20.159881591796875, -5.85980224609375, 5.212593078613281, 11.265602111816406, -6.563224792480469, 12.216072082519531, -14.685111999511719, -11.283760070800781, -25.452789306640625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000594.npy"}
{"epoch": 0.8979591836734694, "step": 595, "batch_size": 64, "mean": 14.545381546020508, "std": 21.39678382873535, "min": -47.417354583740234, "p10": -10.399310302734374, "median": 14.445173263549805, "p90": 40.034242248535165, "max": 58.98187255859375, "pos_frac": 0.78125, "sample": [-1.338592529296875, 17.55219268798828, 37.434661865234375, 12.412498474121094, 26.208763122558594, 14.523921966552734, 24.849472045898438, 8.00851058959961, 5.4326019287109375, 41.93087387084961, 35.6416015625, 4.164360046386719, -4.430637359619141, -32.46123504638672, 44.7350959777832, 3.2795448303222656, 6.301422119140625, -8.970680236816406, 36.58425521850586, 1.8287124633789062, 5.9011688232421875, 26.130859375, 32.129974365234375, 14.163616180419922, 35.96649169921875, 13.390869140625, 7.006690979003906, 25.367431640625, 27.010299682617188, 21.767253875732422, -3.3004074096679688, -21.874282836914062, 14.345481872558594, -9.012405395507812, 36.915618896484375, 20.73733139038086, 43.94087219238281, -18.141937255859375, -47.417354583740234, 58.98187255859375, 51.2569580078125, 44.77141571044922, 41.04742431640625, 16.832366943359375, -20.033241271972656, 16.680130004882812, 30.06689453125, 21.411846160888672, 0.9253387451171875, 8.443359375, 1.6490287780761719, 29.427589416503906, 22.790924072265625, 12.138450622558594, 14.366424560546875, 6.606964111328125, 37.67015075683594, 21.785070419311523, -8.235954284667969, -10.993698120117188, -2.854076385498047, -31.026466369628906, 34.00471115112305, 34.47605895996094], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000595.npy"}
{"epoch": 0.8994708994708994, "step": 596, "batch_size": 64, "mean": 5.445644378662109, "std": 17.896554946899414, "min": -40.204017639160156, "p10": -12.067839431762694, "median": 0.43128395080566406, "p90": 36.410364151001026, "max": 49.52301788330078, "pos_frac": 0.53125, "sample": [0.8803024291992188, 5.087501525878906, 42.302734375, -8.511764526367188, 9.710498809814453, -14.889842987060547, -3.8942947387695312, 8.147220611572266, 1.819732666015625, 13.397571563720703, -12.703487396240234, 5.277534484863281, -5.7112884521484375, -2.65869140625, 42.287254333496094, 41.653411865234375, 17.545413970947266, -4.735515594482422, -23.752033233642578, 17.648998260498047, -5.2541961669921875, 15.708419799804688, 0.3132362365722656, 5.632621765136719, 47.420989990234375, 13.232414245605469, -2.0882568359375, -3.886455535888672, -0.7936058044433594, -1.11767578125, -7.427051544189453, -3.482666015625, 24.176586151123047, -1.5958442687988281, 18.637252807617188, -10.274274826049805, -23.273887634277344, 46.0596923828125, -0.4789886474609375, -12.366004943847656, -4.079952239990234, 9.146751403808594, -0.7970123291015625, -40.204017639160156, -1.8455963134765625, 14.17147445678711, -11.372119903564453, -4.097633361816406, 44.08891296386719, 7.500892639160156, 22.27700424194336, 4.958366394042969, -15.946266174316406, 4.526741027832031, -5.5030975341796875, 49.52301788330078, 0.1640777587890625, 14.403205871582031, 1.8353309631347656, -1.2071075439453125, 18.72490692138672, 18.904342651367188, 0.5493316650390625, -5.243862152099609], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000596.npy"}
{"epoch": 0.9009826152683296, "step": 597, "batch_size": 64, "mean": 11.98304271697998, "std": 18.97148895263672, "min": -32.32868194580078, "p10": -8.626324081420899, "median": 11.056074142456055, "p90": 38.82055664062501, "max": 58.82402038574219, "pos_frac": 0.734375, "sample": [-0.8292503356933594, -32.32868194580078, -8.408180236816406, -2.2097930908203125, 40.7620849609375, 4.725669860839844, 3.134221076965332, -3.6652145385742188, -8.71981430053711, 40.42887878417969, -20.67725372314453, 11.943038940429688, 0.7402658462524414, 7.08782958984375, -21.41327667236328, 11.528745651245117, 36.967742919921875, 39.253761291503906, 25.68164825439453, -4.723091125488281, 26.19308090209961, 23.443641662597656, -19.412506103515625, -11.993568420410156, 40.64398193359375, 24.73929786682129, 3.835296630859375, -7.10345458984375, 20.211334228515625, 1.0371875762939453, 14.752605438232422, 36.52050018310547, 48.69689178466797, 22.823646545410156, 0.8985595703125, 23.909339904785156, -0.8088836669921875, 8.033538818359375, 10.294145584106445, 40.921775817871094, 11.325462341308594, 4.100059509277344, 2.37921142578125, 0.26323699951171875, 17.392879486083984, -2.483203887939453, -17.209888458251953, 1.84222412109375, 10.824172973632812, 5.023406982421875, 58.82402038574219, 13.693233489990234, 25.68431854248047, -8.348182678222656, -6.8766632080078125, 18.888595581054688, 12.93044376373291, 25.738525390625, 37.80974578857422, 33.64219665527344, 11.287975311279297, 25.239852905273438, 30.61243438720703, 27.414939880371094], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000597.npy"}
{"epoch": 0.9024943310657596, "step": 598, "batch_size": 64, "mean": 8.606800079345703, "std": 18.25758934020996, "min": -32.748992919921875, "p10": -8.760185623168946, "median": 5.676647186279297, "p90": 33.57034873962404, "max": 53.282684326171875, "pos_frac": 0.671875, "sample": [4.142822265625, -7.1763458251953125, -4.2544708251953125, 7.769844055175781, 18.914886474609375, 1.9399185180664062, 5.191558837890625, 35.13758850097656, 17.35566520690918, -17.542133331298828, 9.869064331054688, 9.722530364990234, 11.297550201416016, 4.086986541748047, 51.81463623046875, -16.39895248413086, 53.282684326171875, -11.479446411132812, 21.115150451660156, 18.469642639160156, -6.955636978149414, -26.08810806274414, -6.3084716796875, 26.69803237915039, 2.3616104125976562, 2.3300342559814453, 7.236640930175781, 25.413070678710938, -5.4781494140625, -4.2325897216796875, -8.205760955810547, -2.5513153076171875, 8.788337707519531, 18.502655029296875, 29.509445190429688, 0.23839950561523438, 10.832626342773438, -1.1729965209960938, 17.983734130859375, 8.009918212890625, 35.41905975341797, 0.5764312744140625, 0.8063831329345703, 50.87443542480469, -5.507209777832031, 24.809396743774414, 9.026901245117188, 1.5862197875976562, 29.913455963134766, 27.984634399414062, -3.7395782470703125, -32.748992919921875, -1.955078125, 4.15570068359375, 6.161735534667969, -24.020645141601562, 35.854103088378906, 12.59686279296875, -8.743301391601562, -2.1445274353027344, 44.619659423828125, 23.890228271484375, -8.76742172241211, 20.01612091064453], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000598.npy"}
{"epoch": 0.9040060468631897, "step": 599, "batch_size": 64, "mean": 10.424295425415039, "std": 18.99679946899414, "min": -35.01411056518555, "p10": -11.03347587585449, "median": 8.630443572998047, "p90": 41.061732482910166, "max": 53.139068603515625, "pos_frac": 0.734375, "sample": [-4.2864990234375, 27.663841247558594, 7.723930358886719, 5.444694519042969, -3.5616455078125, 16.561017990112305, -11.736618041992188, 48.79713439941406, -7.454376220703125, -24.484481811523438, -13.468303680419922, 9.538747787475586, 3.6795616149902344, 14.530305862426758, 15.634307861328125, -13.783561706542969, -0.1995086669921875, 0.9878463745117188, 2.79510498046875, 9.536956787109375, 42.63141632080078, 14.622650146484375, 0.21234130859375, -21.052276611328125, 0.3554420471191406, 7.1897735595703125, 6.2159881591796875, 37.092769622802734, -35.01411056518555, 13.563270568847656, 44.984283447265625, 33.808528900146484, -4.271141052246094, -8.161140441894531, 14.485969543457031, 2.776592254638672, 6.280117034912109, 9.952857971191406, -9.392810821533203, 5.82769775390625, 34.12224578857422, 11.079397201538086, 2.550168991088867, -9.097906112670898, 43.244693756103516, 15.950904846191406, 19.495956420898438, -22.235084533691406, 2.158050537109375, 19.70047378540039, 53.139068603515625, 20.781646728515625, 13.291130065917969, 11.801172256469727, 38.83811950683594, 27.90662384033203, 47.887908935546875, 11.430404663085938, 20.981094360351562, -1.2309646606445312, -5.75238037109375, 16.44739532470703, 42.01470947265625, 6.6234130859375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000599.npy"}
{"epoch": 0.9055177626606198, "step": 600, "batch_size": 64, "mean": 9.208206176757812, "std": 17.324020385742188, "min": -26.744583129882812, "p10": -10.838059997558593, "median": 6.789104461669922, "p90": 34.066080474853514, "max": 46.86723327636719, "pos_frac": 0.671875, "sample": [16.787094116210938, 29.201736450195312, 4.534004211425781, -4.031627655029297, 45.33854675292969, -2.136962890625, 12.449882507324219, -12.62983512878418, 33.776519775390625, 40.69512176513672, -5.844806671142578, 15.577892303466797, 21.321739196777344, 17.161636352539062, 0.3022003173828125, -2.5770797729492188, -11.034271240234375, -17.685949325561523, -1.56158447265625, 26.642730712890625, 6.3588714599609375, 11.327232360839844, 30.298675537109375, 40.75254821777344, 0.60589599609375, 24.07830810546875, 6.525276184082031, 44.98645782470703, -1.0108146667480469, 34.19017791748047, 11.491867065429688, 4.101543426513672, -23.016571044921875, -9.73309326171875, -3.3734474182128906, -15.760498046875, 4.9884033203125, -10.380233764648438, 0.236724853515625, 30.32830810546875, -6.401401519775391, 17.56903076171875, 7.745822906494141, 12.193771362304688, 0.1451854705810547, 13.671134948730469, -16.7958984375, -7.890510559082031, -26.744583129882812, 27.76666259765625, 4.495750427246094, 15.6956787109375, 7.502235412597656, 7.899627685546875, 12.077281951904297, -0.72723388671875, 7.0529327392578125, -3.369953155517578, 20.822921752929688, 46.86723327636719, 1.7640762329101562, -1.5236644744873047, 21.893508911132812, 34.332984924316406], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000600.npy"}
{"epoch": 0.9070294784580499, "step": 601, "batch_size": 64, "mean": 13.898125648498535, "std": 19.119178771972656, "min": -35.44795608520508, "p10": -3.7866615295410155, "median": 12.139335632324219, "p90": 40.91689872741699, "max": 49.525447845458984, "pos_frac": 0.734375, "sample": [41.707313537597656, 20.855194091796875, 39.225982666015625, -35.44795608520508, 39.311920166015625, 42.25389099121094, 45.918182373046875, -2.227142333984375, 0.9128952026367188, -3.5015335083007812, 45.33392333984375, -24.13611602783203, 18.7861328125, 9.942855834960938, 2.316434860229492, 15.069000244140625, 27.58946990966797, 26.1202392578125, -1.9026565551757812, 33.772552490234375, 40.83749771118164, 6.0586395263671875, -1.5384140014648438, 34.598602294921875, 49.525447845458984, 12.139091491699219, 0.11902046203613281, -3.028646469116211, 14.362709045410156, -0.3197479248046875, 3.356090545654297, 27.759937286376953, 22.03314208984375, -14.770820617675781, 6.0473127365112305, 19.319293975830078, 6.905342102050781, 14.194206237792969, 6.279541015625, 7.409481048583984, -3.208099365234375, 6.10357666015625, 6.677806854248047, 20.858047485351562, 15.126907348632812, 17.470991134643555, 12.139579772949219, -6.442230224609375, -12.170614242553711, 5.315998077392578, 4.097454071044922, 44.04925537109375, -3.3172454833984375, -2.2616615295410156, 16.963706970214844, 25.481529235839844, 36.25469970703125, 40.73271942138672, 34.611572265625, -3.9088592529296875, -2.9828033447265625, 26.827728271484375, 40.950927734375, -23.07928466796875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000601.npy"}
{"epoch": 0.90854119425548, "step": 602, "batch_size": 64, "mean": 12.83267593383789, "std": 17.201190948486328, "min": -16.9185791015625, "p10": -7.019065093994141, "median": 10.427129745483398, "p90": 40.03988723754884, "max": 50.79359436035156, "pos_frac": 0.734375, "sample": [-2.3576812744140625, 29.501937866210938, 17.915515899658203, 34.851531982421875, 7.624603271484375, 10.355400085449219, 21.596160888671875, 5.859502792358398, 18.394851684570312, 8.204086303710938, 46.42066192626953, -3.335979461669922, 41.188720703125, 37.184967041015625, -11.941452026367188, 50.79359436035156, -6.985595703125, 42.57012939453125, 20.860214233398438, -0.5124473571777344, -0.2918853759765625, 1.0035600662231445, 6.279632568359375, -16.08319091796875, 12.2353515625, 26.769344329833984, -0.35406494140625, 30.398765563964844, 15.859764099121094, 25.86141586303711, -2.6412277221679688, -1.9280681610107422, 16.86273956298828, 2.0431594848632812, 20.264568328857422, 14.591838836669922, -1.9330177307128906, 14.604248046875, 24.26705551147461, 1.1454925537109375, -4.563861846923828, 12.915790557861328, 10.498859405517578, 45.81583023071289, -7.033409118652344, -16.9185791015625, 14.878055572509766, 14.094409942626953, 5.901926040649414, 2.0153732299804688, -14.391639709472656, 9.98773193359375, 37.359275817871094, 2.0409317016601562, 0.8159637451171875, 27.83255386352539, 12.529953002929688, 5.575897216796875, 45.1533203125, 42.899070739746094, -12.003318786621094, 30.64966583251953, 8.045089721679688, -9.951828002929688], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000602.npy"}
{"epoch": 0.91005291005291, "step": 603, "batch_size": 64, "mean": 11.69677448272705, "std": 19.272790908813477, "min": -37.53761291503906, "p10": -9.732860565185545, "median": 10.715576171875, "p90": 35.653820800781254, "max": 60.330535888671875, "pos_frac": 0.734375, "sample": [21.378334045410156, 41.763580322265625, 31.489425659179688, -0.34543609619140625, -11.775375366210938, 1.4401092529296875, 13.941078186035156, 9.320457458496094, -1.9495887756347656, 4.6762237548828125, 26.884628295898438, 5.818443298339844, 26.69445037841797, -0.9029102325439453, -22.514450073242188, 33.74131774902344, -4.905792236328125, 6.826862335205078, 3.208446502685547, 60.330535888671875, 16.502967834472656, 9.925445556640625, 27.564910888671875, 6.445899963378906, -1.7358474731445312, 14.403892517089844, 22.587474822998047, -8.127777099609375, 12.577316284179688, -29.900390625, 6.0466461181640625, 5.3524627685546875, 21.480661392211914, 18.378856658935547, -10.420753479003906, 33.864479064941406, 15.629209518432617, 1.8242301940917969, -1.8729305267333984, 51.31955337524414, -17.28179931640625, 28.407997131347656, 33.042762756347656, 18.814624786376953, 35.84605407714844, 21.616714477539062, 1.1634445190429688, -5.209247589111328, 14.139659881591797, 0.7598953247070312, 2.460968017578125, -1.3353500366210938, 30.122116088867188, 37.636810302734375, 11.505706787109375, -3.7069015502929688, 3.7009239196777344, -28.412002563476562, 22.312362670898438, 36.59977722167969, 39.637413024902344, 12.137331008911133, 35.20527648925781, -37.53761291503906], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000603.npy"}
{"epoch": 0.9115646258503401, "step": 604, "batch_size": 64, "mean": 11.098645210266113, "std": 21.0632381439209, "min": -40.556602478027344, "p10": -12.471126174926756, "median": 8.639640808105469, "p90": 42.72151870727539, "max": 56.76731872558594, "pos_frac": 0.734375, "sample": [12.73565673828125, 27.063636779785156, -1.3027114868164062, -10.209827423095703, 16.014144897460938, 3.2022171020507812, 2.4399757385253906, 28.008182525634766, 45.605281829833984, -18.162498474121094, 10.250198364257812, 0.006572723388671875, 0.6453037261962891, 27.083309173583984, 12.0321044921875, -4.3197174072265625, 37.710166931152344, 4.901542663574219, 1.888265609741211, 30.096908569335938, -37.91145324707031, -7.9947967529296875, 4.766864776611328, -13.440254211425781, 6.653057098388672, 3.0137367248535156, -1.1555156707763672, 8.562652587890625, 10.500663757324219, -40.556602478027344, 10.407344818115234, -8.382858276367188, -8.689029693603516, -2.6977386474609375, 46.29762649536133, -13.75640869140625, 1.2273712158203125, 13.52545166015625, 34.71057891845703, 24.236251831054688, 13.122570037841797, 51.126861572265625, 52.386627197265625, 7.896217346191406, 28.013351440429688, 42.15222930908203, 4.433685302734375, 33.706092834472656, 56.76731872558594, -2.050220489501953, 42.96549987792969, 2.60137939453125, 16.47838592529297, 10.722305297851562, 8.716629028320312, 13.765350341796875, 43.2404670715332, -4.315868377685547, 0.2640533447265625, 24.304489135742188, 32.62384033203125, -22.526260375976562, -25.047645568847656, 23.96025848388672], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000604.npy"}
{"epoch": 0.9130763416477702, "step": 605, "batch_size": 64, "mean": 10.744077682495117, "std": 18.326534271240234, "min": -34.62416076660156, "p10": -9.353591156005859, "median": 10.26276969909668, "p90": 34.335839462280276, "max": 63.471221923828125, "pos_frac": 0.71875, "sample": [23.270828247070312, 9.638214111328125, -12.517219543457031, 63.471221923828125, 32.24768829345703, 19.352787017822266, 7.8542022705078125, 12.96893310546875, 22.680831909179688, -8.332220077514648, 17.089523315429688, -7.4926910400390625, 1.8768310546875, -6.76776123046875, -15.795684814453125, -8.635807037353516, 1.03509521484375, 7.719940185546875, 25.30548858642578, 6.241146087646484, 4.3319091796875, -2.7403488159179688, 11.511848449707031, -19.769065856933594, 3.5036849975585938, -4.697662353515625, 38.86720657348633, 15.671571731567383, 6.507108688354492, 19.617584228515625, 18.387603759765625, 34.397865295410156, -34.62416076660156, 34.19111251831055, 1.9267425537109375, 17.641815185546875, 33.570892333984375, 14.40704345703125, -0.827545166015625, 12.963668823242188, -9.383636474609375, 5.306465148925781, 39.05903625488281, 21.222423553466797, 48.16339111328125, 11.630706787109375, 22.822280883789062, 24.10163116455078, 14.3448486328125, -6.80828857421875, 50.6109619140625, 13.795257568359375, 10.887325286865234, -15.296371459960938, -11.987396240234375, 6.0464324951171875, 20.254947662353516, -8.985366821289062, 6.681465148925781, -9.283485412597656, 0.5172357559204102, 38.90922546386719, -4.613433837890625, 23.575057983398438], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000605.npy"}
{"epoch": 0.9145880574452003, "step": 606, "batch_size": 64, "mean": 11.328773498535156, "std": 17.353822708129883, "min": -25.390853881835938, "p10": -7.472548103332519, "median": 7.590812683105469, "p90": 38.02927207946778, "max": 51.0693359375, "pos_frac": 0.734375, "sample": [-3.5862464904785156, 29.086334228515625, 20.99011993408203, 44.52983093261719, 3.88165283203125, 1.5402412414550781, 2.838348388671875, 0.3372802734375, 0.7647075653076172, 32.80997848510742, 43.162410736083984, 0.5475845336914062, 17.834938049316406, 22.765472412109375, 15.225349426269531, -25.390853881835938, 19.966171264648438, 4.38128662109375, 9.466354370117188, 28.055042266845703, 29.64703369140625, 3.21368408203125, -9.927825927734375, 19.561180114746094, -7.681488037109375, -0.2128753662109375, 24.953094482421875, -2.0056800842285156, 20.79255485534668, 34.01649475097656, 9.282844543457031, 7.774257659912109, -5.979534149169922, 35.717803955078125, -1.0579299926757812, 3.7437992095947266, 0.44525909423828125, -6.985021591186523, 10.117843627929688, 2.957183837890625, 42.560970306396484, 42.27131652832031, 0.00463104248046875, 39.019901275634766, 4.500957489013672, 51.0693359375, -4.392814636230469, 10.101303100585938, 7.407367706298828, -9.887336730957031, 45.15657043457031, -10.104324340820312, -1.9855918884277344, 10.613594055175781, -17.97100067138672, 25.537260055541992, 4.74639892578125, -14.179832458496094, -2.9469947814941406, 27.940040588378906, 10.334300994873047, 21.784820556640625, 9.472541809082031, -3.5905704498291016], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000606.npy"}
{"epoch": 0.9160997732426304, "step": 607, "batch_size": 64, "mean": 12.327167510986328, "std": 16.581298828125, "min": -32.52189636230469, "p10": -6.551708030700683, "median": 11.14704704284668, "p90": 36.02168197631836, "max": 43.94325256347656, "pos_frac": 0.75, "sample": [1.0414314270019531, -2.30548095703125, -1.1645584106445312, 4.46905517578125, 37.959442138671875, 22.122726440429688, 18.48595428466797, 14.405715942382812, -32.52189636230469, 6.3365325927734375, 10.306991577148438, 9.673660278320312, 35.66302490234375, 35.316932678222656, 3.0677642822265625, -6.03631591796875, 6.639434814453125, 15.086925506591797, 19.29596710205078, 39.024169921875, 12.943008422851562, 29.681068420410156, 19.829734802246094, 36.463661193847656, 24.274635314941406, 12.881370544433594, -6.7938232421875, 12.325180053710938, -6.706727981567383, -6.189994812011719, 30.710357666015625, 13.116180419921875, 13.190559387207031, 36.175392150878906, 6.440113067626953, 32.53942108154297, 32.7110595703125, -2.253925323486328, 43.94325256347656, -1.58746337890625, -22.09096908569336, 8.827407836914062, 5.296199798583984, 25.36691665649414, 12.633342742919922, -2.180438995361328, -9.918342590332031, 11.987102508544922, 35.07838439941406, 37.09800338745117, -3.630786895751953, -6.89459228515625, 15.730751037597656, 43.87451171875, 5.528961181640625, 23.2445068359375, 6.771251678466797, -0.473358154296875, 5.7141265869140625, 26.354368209838867, 0.8285903930664062, -17.315204620361328, 6.7376861572265625, 9.809768676757812], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000607.npy"}
{"epoch": 0.9176114890400605, "step": 608, "batch_size": 64, "mean": 8.206451416015625, "std": 17.857454299926758, "min": -32.379112243652344, "p10": -11.51399459838867, "median": 6.9935455322265625, "p90": 31.212351989746097, "max": 53.685577392578125, "pos_frac": 0.6875, "sample": [17.112409591674805, -19.623573303222656, 0.4795665740966797, 25.675609588623047, -0.2411632537841797, -9.543594360351562, 17.601295471191406, -12.358451843261719, 7.812797546386719, -23.424697875976562, 14.747146606445312, 8.660202026367188, 20.432937622070312, 23.81662368774414, 6.930961608886719, 11.716651916503906, 26.2760009765625, -32.379112243652344, 7.750690460205078, 0.27387237548828125, 2.8036880493164062, -6.719753265380859, -2.4279232025146484, 52.158729553222656, 9.209632873535156, -6.207378387451172, -0.8134613037109375, 3.989276885986328, 14.157114028930664, 38.96771240234375, 12.500701904296875, -23.40521240234375, -12.739532470703125, 14.355682373046875, -6.7000732421875, -2.9505767822265625, -2.58343505859375, -5.589317321777344, 8.280181884765625, 7.056129455566406, 2.3959693908691406, 24.46143341064453, 4.522926330566406, -3.1033878326416016, 3.272216796875, 31.612167358398438, -7.2806396484375, 44.19830322265625, 53.685577392578125, 21.645198822021484, 7.13201904296875, 2.3841781616210938, 19.049171447753906, 12.20467758178711, 2.5082263946533203, 39.41069030761719, -22.5875244140625, 43.25884246826172, 0.411956787109375, -5.501495361328125, 2.4614810943603516, 18.09307861328125, 15.640068054199219, 30.279449462890625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000608.npy"}
{"epoch": 0.9191232048374905, "step": 609, "batch_size": 64, "mean": 12.72257137298584, "std": 15.687969207763672, "min": -35.93056869506836, "p10": -4.736570358276365, "median": 13.152324676513672, "p90": 31.37258415222168, "max": 49.80101013183594, "pos_frac": 0.765625, "sample": [49.80101013183594, 29.74102783203125, 31.161415100097656, 2.568347930908203, 3.687530517578125, 2.1157121658325195, 30.85199546813965, 26.759033203125, 0.9632987976074219, 3.279796600341797, -3.196788787841797, 10.055427551269531, -5.396476745605469, 8.557060241699219, 23.454330444335938, 17.775390625, 34.86650848388672, 11.922691345214844, 17.626148223876953, 37.919498443603516, -20.665756225585938, -0.3001861572265625, 18.115097045898438, 23.536590576171875, -0.6448631286621094, 39.619163513183594, 31.463085174560547, 12.093238830566406, -12.652202606201172, 13.940467834472656, 43.12965393066406, -8.843549728393555, 21.464374542236328, -0.436676025390625, 14.127685546875, 16.658000946044922, 22.074722290039062, -1.63482666015625, 23.88104248046875, 6.7486572265625, -1.6419639587402344, 5.632167816162109, 15.800590515136719, 12.364181518554688, 18.30137825012207, -35.93056869506836, 14.55770492553711, 7.517276763916016, 9.495059967041016, 3.2357616424560547, 6.638605117797852, -0.8841094970703125, 21.4940185546875, 26.34909439086914, 9.361572265625, -2.5183944702148438, 14.22772216796875, 33.900062561035156, 28.507400512695312, -9.784423828125, 24.359081268310547, 22.309486389160156, 20.51751708984375, -5.751335144042969], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000609.npy"}
|
||||
{"epoch": 0.9206349206349206, "step": 610, "batch_size": 64, "mean": 11.673582077026367, "std": 21.41412353515625, "min": -37.4295654296875, "p10": -15.444750976562498, "median": 9.646995544433594, "p90": 44.08609542846682, "max": 59.30583953857422, "pos_frac": 0.671875, "sample": [17.563737869262695, 22.06939697265625, 7.268241882324219, 52.66645812988281, -5.7216339111328125, 17.573646545410156, 5.17254638671875, 12.078731536865234, -37.4295654296875, 35.4234619140625, -6.2693939208984375, -23.28183364868164, 17.644622802734375, 9.655059814453125, 3.6223602294921875, 10.568695068359375, 37.805519104003906, -12.832534790039062, -2.0590972900390625, -2.9673385620117188, -2.2268218994140625, -16.319175720214844, 11.479133605957031, 8.200508117675781, -17.460735321044922, -5.2878570556640625, -9.57598876953125, 54.80931091308594, 20.265213012695312, 50.599822998046875, 2.913410186767578, 51.235870361328125, 29.40003204345703, 35.91218566894531, -19.186981201171875, 29.171337127685547, 46.272308349609375, 5.494222640991211, 20.347579956054688, 14.84295654296875, 18.725326538085938, -0.8707904815673828, 9.330856323242188, -17.4326171875, 26.29632568359375, 0.43975067138671875, -8.63473892211914, 9.638931274414062, 38.98493194580078, 33.649620056152344, 6.854034423828125, 10.149612426757812, 20.942901611328125, -4.5210113525390625, -17.508346557617188, -7.673164367675781, -13.404426574707031, 24.739593505859375, 59.30583953857422, 17.590179443359375, 21.415306091308594, -7.0825653076171875, 49.73078918457031, 7.0055084228515625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000610.npy"}
|
||||
{"epoch": 0.9221466364323507, "step": 611, "batch_size": 64, "mean": 9.36767578125, "std": 16.731958389282227, "min": -22.683048248291016, "p10": -8.510734748840331, "median": 5.086943626403809, "p90": 34.47364044189453, "max": 57.76911163330078, "pos_frac": 0.6875, "sample": [22.073951721191406, -1.1712532043457031, -8.45233154296875, -11.039146423339844, 23.807621002197266, 4.723285675048828, 11.022003173828125, 11.402839660644531, -16.693248748779297, 0.15362548828125, -7.5799102783203125, 39.96228790283203, 11.680545806884766, -13.894012451171875, -1.6608619689941406, 2.221830368041992, -22.683048248291016, 41.778076171875, -12.551305770874023, 57.76911163330078, 25.29214096069336, 17.44579315185547, 3.4341278076171875, -8.535764694213867, -7.0209197998046875, -5.756141662597656, 20.7122802734375, 10.865470886230469, -0.8207969665527344, 36.374664306640625, 7.6649627685546875, -0.8743820190429688, -6.0506591796875, 48.053688049316406, 2.3173675537109375, 1.9409713745117188, -13.598367691040039, 19.391510009765625, 8.927558898925781, 19.780925750732422, 24.81878662109375, -2.3368377685546875, 4.075263977050781, 30.076522827148438, -1.7656135559082031, 17.18219757080078, 0.4445381164550781, 12.079345703125, 18.841272354125977, 23.57647705078125, 34.78251647949219, 33.7529296875, -5.1515350341796875, 12.719432830810547, 5.450601577758789, 3.521930694580078, 40.836429595947266, -4.732368469238281, 9.174213409423828, 3.76446533203125, 14.77780532836914, 2.5684871673583984, 3.090235710144043, 7.569648742675781], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000611.npy"}
|
||||
{"epoch": 0.9236583522297808, "step": 612, "batch_size": 64, "mean": 13.594877243041992, "std": 20.717575073242188, "min": -45.73341369628906, "p10": -11.011951828002928, "median": 12.476673126220703, "p90": 43.02658882141114, "max": 57.26220703125, "pos_frac": 0.765625, "sample": [-1.0632972717285156, -15.186798095703125, 15.9453125, 0.2494049072265625, 0.4358367919921875, 5.003120422363281, 38.85408401489258, 21.076797485351562, -1.60308837890625, 9.023239135742188, 42.49967956542969, 6.9107513427734375, 18.460289001464844, 10.57208251953125, 16.736404418945312, 12.695533752441406, 12.63198471069336, -1.87225341796875, 4.522674560546875, 13.528541564941406, 16.009300231933594, -18.92148208618164, 19.299026489257812, 2.331920623779297, -13.297233581542969, 4.578685760498047, 35.50567626953125, -11.999202728271484, -2.95819091796875, -0.69830322265625, 22.63180923461914, 43.25240707397461, -8.217916488647461, -5.529392242431641, 2.7058944702148438, 50.73118591308594, 3.034820556640625, 36.95490264892578, 43.65046691894531, -32.30072021484375, 52.38993835449219, 7.496429443359375, 32.15812683105469, -15.487693786621094, 45.227294921875, 22.771163940429688, 33.09027099609375, -8.708366394042969, 23.923561096191406, 54.918731689453125, 5.565402984619141, 5.005805969238281, 57.26220703125, 8.382820129394531, 24.67872428894043, 26.466339111328125, 28.384017944335938, 12.341072082519531, 10.702049255371094, -45.73341369628906, 24.96823501586914, 32.60327911376953, 12.612274169921875, 22.86988067626953], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000612.npy"}
|
||||
{"epoch": 0.9251700680272109, "step": 613, "batch_size": 64, "mean": 15.678262710571289, "std": 20.259557723999023, "min": -45.980308532714844, "p10": -7.05896224975586, "median": 12.476984024047852, "p90": 45.103172683715826, "max": 56.24071502685547, "pos_frac": 0.84375, "sample": [-7.048179626464844, 19.495521545410156, 1.0970230102539062, 0.002716064453125, 31.41678810119629, 9.909736633300781, 34.426841735839844, 15.520721435546875, -6.125938415527344, 11.7198486328125, 7.875743865966797, 19.30263900756836, -45.980308532714844, 19.543060302734375, 11.60678482055664, 24.639816284179688, 5.978979110717773, 12.19869613647461, 38.102298736572266, 7.404155731201172, 5.442527770996094, 2.5375308990478516, -12.881805419921875, 1.1375770568847656, 46.75245666503906, 14.495166778564453, 45.56135177612305, 7.227718353271484, 5.05426025390625, 22.030895233154297, 44.034088134765625, 6.581724166870117, 23.07738494873047, -9.67709732055664, 5.094825744628906, 6.652191162109375, -23.74919891357422, 48.191184997558594, 19.411766052246094, 56.24071502685547, 31.406478881835938, 43.747833251953125, 5.674320220947266, 19.68658447265625, 51.986244201660156, -7.0635833740234375, 42.608299255371094, 18.106094360351562, 4.810478210449219, 33.23332977294922, -16.036209106445312, 1.299835205078125, 15.381675720214844, 49.161170959472656, 35.00634765625, 6.727429389953613, 7.590188980102539, 34.97325134277344, 12.755271911621094, 47.824954986572266, 15.564033508300781, 39.16522216796875, -2.885772705078125, -11.616897583007812], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000613.npy"}
|
||||
{"epoch": 0.926681783824641, "step": 614, "batch_size": 64, "mean": 10.928446769714355, "std": 15.728384971618652, "min": -30.41089630126953, "p10": -9.043719291687008, "median": 10.363431930541992, "p90": 31.620432281494143, "max": 45.634063720703125, "pos_frac": 0.796875, "sample": [35.785430908203125, 16.399547576904297, 3.4210281372070312, 27.946266174316406, 31.054603576660156, 8.351669311523438, 2.604625701904297, -2.3385486602783203, 11.429889678955078, 4.990989685058594, 31.862930297851562, 3.071636199951172, 31.86810302734375, 6.048515319824219, 33.86424255371094, 15.53811264038086, 24.680519104003906, 18.33096694946289, 12.207611083984375, 22.70757293701172, -11.670036315917969, 12.057811737060547, -1.1330413818359375, -19.443267822265625, 45.634063720703125, -3.6720123291015625, 11.964202880859375, -22.083778381347656, 17.239421844482422, -10.809707641601562, 0.7195281982421875, 0.4100379943847656, 5.270381927490234, 35.94624710083008, 3.1084022521972656, 9.396987915039062, 21.260271072387695, 1.0243606567382812, -30.41089630126953, 22.109642028808594, 4.6719970703125, 39.20451354980469, 7.6908416748046875, 19.914596557617188, 5.893135070800781, 26.864227294921875, 3.571380615234375, 1.663909912109375, 14.281051635742188, -5.9275665283203125, 5.239477157592773, 19.75585174560547, -3.333404541015625, -10.379213333129883, -16.251304626464844, 27.105087280273438, 27.243247985839844, 11.627655029296875, 24.255264282226562, 3.1488800048828125, 11.329875946044922, 25.77716827392578, -1.3107757568359375, 30.640357971191406], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000614.npy"}
|
||||
{"epoch": 0.9281934996220711, "step": 615, "batch_size": 64, "mean": 13.884675979614258, "std": 20.600379943847656, "min": -30.401859283447266, "p10": -8.585255813598632, "median": 10.63858413696289, "p90": 40.67801704406738, "max": 52.717063903808594, "pos_frac": 0.75, "sample": [15.679710388183594, 17.13935089111328, 40.420143127441406, 13.713973999023438, 2.0451431274414062, 34.981544494628906, 6.162605285644531, -30.135650634765625, 40.751102447509766, 15.075801849365234, -21.703094482421875, 8.394611358642578, -9.3193359375, 4.0795745849609375, 20.192338943481445, -8.61001968383789, 10.736991882324219, 52.717063903808594, 33.0007438659668, 8.328170776367188, 19.30899429321289, 0.84783935546875, 26.61358642578125, -6.429001808166504, 30.743255615234375, -2.2360992431640625, -22.46776580810547, 10.540176391601562, 5.22381591796875, 49.86830139160156, 31.983291625976562, 1.1442146301269531, 5.78485107421875, -3.4518585205078125, 21.08135223388672, 23.994064331054688, 36.38531494140625, 7.063079833984375, 4.5801239013671875, 40.507484436035156, 43.91566467285156, 48.821533203125, 42.06269454956055, 7.286376953125, 37.85869598388672, -4.665069580078125, 2.5283737182617188, -8.03875732421875, -1.2510223388671875, 39.92487335205078, 10.310791015625, 27.14215087890625, -2.5708541870117188, 39.73948669433594, -8.527473449707031, 37.45770263671875, 22.419219970703125, 17.62042999267578, 41.82189178466797, 2.559504508972168, 15.762508392333984, -2.2157440185546875, -30.401859283447266, -25.677642822265625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000615.npy"}
|
||||
{"epoch": 0.9297052154195011, "step": 616, "batch_size": 64, "mean": 11.263028144836426, "std": 17.965566635131836, "min": -22.159486770629883, "p10": -8.585732460021973, "median": 6.319103240966797, "p90": 39.53919219970705, "max": 50.904022216796875, "pos_frac": 0.71875, "sample": [33.67657470703125, 6.567169189453125, 3.2489013671875, -3.4000473022460938, 16.49858856201172, -8.053844451904297, 49.95948791503906, 23.682891845703125, -1.9035720825195312, 13.354619026184082, -11.937088012695312, 1.0970039367675781, 11.416000366210938, -21.33896255493164, -1.5147933959960938, -9.569709777832031, 21.6824951171875, -0.4340629577636719, 1.949127197265625, 46.37495422363281, 9.257392883300781, 46.563026428222656, -22.159486770629883, -2.0006866455078125, 43.93242645263672, -7.10938835144043, -9.265995025634766, 0.49353790283203125, -2.297748565673828, 30.853729248046875, 10.648529052734375, 11.292404174804688, 16.634014129638672, 25.02094268798828, 50.904022216796875, 42.05174255371094, -3.510173797607422, 1.1471786499023438, 2.5472412109375, 25.103736877441406, 26.243064880371094, 14.55279541015625, 8.525404930114746, 20.839927673339844, 28.850341796875, 25.592681884765625, 0.9259490966796875, 4.349723815917969, -4.3080596923828125, 0.21502685546875, -15.299373626708984, -6.257080078125, 1.8990478515625, 27.55474853515625, 9.000335693359375, 29.342121124267578, 6.071037292480469, 5.063148498535156, 20.03948974609375, 2.3617477416992188, 29.964080810546875, 47.34197998046875, 5.3171539306640625, -8.813684463500977], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000616.npy"}
|
||||
{"epoch": 0.9312169312169312, "step": 617, "batch_size": 64, "mean": 10.21737289428711, "std": 17.050283432006836, "min": -38.76921081542969, "p10": -6.0579200744628885, "median": 9.90101432800293, "p90": 30.524895477294933, "max": 55.15868377685547, "pos_frac": 0.75, "sample": [10.61370849609375, -1.2166824340820312, -17.657089233398438, 22.15807342529297, 16.160369873046875, 13.361946105957031, -0.0095977783203125, 27.31305694580078, 12.275352478027344, -26.15887451171875, 14.049331665039062, -0.5685501098632812, 14.634979248046875, 8.966726303100586, -0.6250038146972656, 10.671150207519531, 5.526908874511719, 8.171844482421875, 15.416702270507812, 5.438575744628906, 38.16314697265625, 4.499149322509766, -4.402976989746094, 31.901397705078125, -6.767181396484375, -2.5330915451049805, -2.985809326171875, 23.7425537109375, 1.4507675170898438, 11.54946517944336, 8.372173309326172, 0.041961669921875, 5.6196746826171875, 43.28797149658203, 6.48455810546875, 48.58727264404297, -0.3776702880859375, -10.280563354492188, 9.754287719726562, 25.402313232421875, 21.370803833007812, 6.3911895751953125, 5.226930618286133, 10.021381378173828, 45.50173568725586, 15.796672821044922, -30.604122161865234, -38.76921081542969, -9.075969696044922, 13.801383972167969, 23.220653533935547, 19.29180908203125, 9.780647277832031, 14.948915481567383, 55.15868377685547, 37.59429931640625, 21.456031799316406, 3.125141143798828, 11.94217300415039, 17.587310791015625, 0.5699834823608398, -4.129150390625, 20.649799346923828, 13.022422790527344], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000617.npy"}
|
||||
{"epoch": 0.9327286470143613, "step": 618, "batch_size": 64, "mean": 10.265620231628418, "std": 18.837427139282227, "min": -34.98686218261719, "p10": -7.927320671081543, "median": 7.869138717651367, "p90": 37.37907180786133, "max": 56.243621826171875, "pos_frac": 0.6875, "sample": [45.35888671875, 37.675453186035156, 30.625167846679688, 7.772056579589844, 1.912872314453125, -8.023012161254883, -13.929527282714844, -0.808837890625, 24.24835205078125, -0.8609714508056641, 3.4096832275390625, 19.005950927734375, 36.68751525878906, 19.751548767089844, 53.59099578857422, 4.91973876953125, 10.993324279785156, 4.755060195922852, 53.61405944824219, 24.7657470703125, -3.84149169921875, 56.243621826171875, 5.265071868896484, 20.786598205566406, -11.40484619140625, 45.400672912597656, 0.340972900390625, 5.9623260498046875, 13.563613891601562, 8.223377227783203, -3.1736297607421875, -3.614421844482422, 12.724407196044922, 3.25482177734375, 15.811065673828125, -4.9351654052734375, 26.25080108642578, -7.70404052734375, 21.260353088378906, 7.966220855712891, -4.021110534667969, -7.064077377319336, -2.5600204467773438, -27.95333480834961, 12.709095001220703, -17.54583740234375, -2.7942428588867188, -34.98686218261719, 10.24716567993164, 28.846416473388672, -15.962432861328125, 19.690322875976562, 8.1527099609375, 5.572771072387695, 1.3754425048828125, -5.294132232666016, 0.11830520629882812, 24.4267578125, 11.4976806640625, 20.35598373413086, 10.526962280273438, 43.47673034667969, -0.9522247314453125, 15.293235778808594], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000618.npy"}
|
||||
{"epoch": 0.9342403628117913, "step": 619, "batch_size": 64, "mean": 13.123184204101562, "std": 19.338584899902344, "min": -36.78471374511719, "p10": -8.728684997558593, "median": 10.536399841308594, "p90": 42.56584205627442, "max": 49.554622650146484, "pos_frac": 0.765625, "sample": [30.412221908569336, 0.8836822509765625, 2.3152542114257812, 18.209476470947266, 49.554622650146484, 41.81342697143555, 47.6470947265625, -8.793243408203125, -18.160415649414062, 43.58186340332031, 42.8883056640625, -16.802356719970703, 18.967498779296875, -1.5677967071533203, -0.7097320556640625, 0.6973400115966797, 23.39727783203125, 13.93310546875, 2.186915397644043, 0.1404571533203125, 8.345413208007812, 38.33671569824219, 6.268434524536133, -13.861351013183594, -1.9241151809692383, 20.859786987304688, 24.87986946105957, 36.80731201171875, 2.167133331298828, -0.7617511749267578, 2.735015869140625, -36.78471374511719, -8.578048706054688, 43.824676513671875, 19.32950210571289, 40.62734603881836, 2.9732284545898438, 47.85822677612305, 48.112396240234375, -20.879257202148438, 4.972599029541016, 34.3782958984375, 23.89883041381836, 4.113201141357422, 14.605987548828125, 12.776016235351562, 2.6110382080078125, 13.949867248535156, 7.521614074707031, 21.148605346679688, -9.353485107421875, 12.727386474609375, 23.423492431640625, 6.0889129638671875, 23.449432373046875, -7.228710174560547, 18.404037475585938, 32.88310241699219, 0.06875228881835938, 2.8375110626220703, -4.294105529785156, 23.590152740478516, -0.9443283081054688, 27.324783325195312], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000619.npy"}
|
||||
{"epoch": 0.9357520786092215, "step": 620, "batch_size": 64, "mean": 9.317474365234375, "std": 18.263565063476562, "min": -36.39591979980469, "p10": -11.982608604431153, "median": 6.682374477386475, "p90": 34.69569473266602, "max": 55.7200927734375, "pos_frac": 0.71875, "sample": [7.21551513671875, -9.681243896484375, -25.585018157958984, 32.337623596191406, -36.39591979980469, 5.395544052124023, 8.893417358398438, 25.089630126953125, -3.7668914794921875, 31.554901123046875, 35.613162994384766, 34.911720275878906, 2.1012725830078125, 8.177154541015625, 11.032035827636719, 34.19163513183594, 3.307170867919922, -0.5281219482421875, 0.2296905517578125, -2.7826004028320312, -13.2281494140625, -0.33457183837890625, -30.12164306640625, 41.72077941894531, -19.830764770507812, 37.22337341308594, 4.922340393066406, -4.7537384033203125, 55.7200927734375, -4.591453552246094, 33.974369049072266, 22.624366760253906, -2.4288196563720703, 2.6973228454589844, 23.62522315979004, 13.573585510253906, -3.0081214904785156, 16.223052978515625, 23.463491439819336, 2.164203643798828, 16.28026580810547, 5.923561096191406, 15.230428695678711, 3.9426937103271484, -7.934841156005859, -11.883598327636719, 29.04816436767578, 45.96400451660156, -12.025041580200195, 16.202960968017578, 6.869423866271973, 36.827728271484375, 5.009857177734375, 3.8894577026367188, -13.998443603515625, 0.06774520874023438, 9.710996627807617, 15.857315063476562, 18.27292823791504, 15.921295166015625, 15.962270736694336, 9.521125793457031, 4.217098236083984, 6.495325088500977], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000620.npy"}
|
||||
{"epoch": 0.9372637944066515, "step": 621, "batch_size": 64, "mean": 10.20268440246582, "std": 17.921751022338867, "min": -35.34622573852539, "p10": -7.509388160705565, "median": 7.885494232177734, "p90": 35.962004852294925, "max": 49.81732177734375, "pos_frac": 0.6875, "sample": [0.8080062866210938, 5.572975158691406, 11.447837829589844, -5.397674560546875, 4.2303314208984375, -35.34622573852539, 11.142166137695312, 29.48101806640625, -2.452028274536133, 4.946163177490234, 9.493366241455078, 35.0030517578125, 42.42051696777344, -3.643993377685547, 48.35792541503906, 4.7856292724609375, 17.21823501586914, 27.374618530273438, 5.093696594238281, 39.20703125, 11.239433288574219, 1.5610198974609375, 27.466796875, -21.065704345703125, 36.367530822753906, 16.683509826660156, -5.816154479980469, -8.702957153320312, 20.113922119140625, 8.71365737915039, 35.015777587890625, 12.94290542602539, 0.13542938232421875, 37.01667785644531, 17.8023681640625, -4.1918792724609375, 12.33328628540039, 9.289445877075195, 14.042335510253906, -3.2296600341796875, -5.209665298461914, 5.011325836181641, 32.08332824707031, 0.494659423828125, 7.057331085205078, -5.665386199951172, 9.14028549194336, -8.23505973815918, -21.51366424560547, -3.695037841796875, -2.9567947387695312, 28.513551712036133, 31.15227508544922, -2.4442434310913086, 24.050201416015625, -10.366874694824219, -0.6304283142089844, 37.687713623046875, -18.111345291137695, 49.81732177734375, -5.511383056640625, 1.5147666931152344, 14.70530891418457, 28.623260498046875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000621.npy"}
|
||||
{"epoch": 0.9387755102040817, "step": 622, "batch_size": 64, "mean": 11.108508110046387, "std": 16.414100646972656, "min": -28.586135864257812, "p10": -7.907844161987303, "median": 8.906532287597656, "p90": 30.20718460083008, "max": 51.47589111328125, "pos_frac": 0.78125, "sample": [18.10364532470703, 17.397052764892578, -18.52924346923828, -8.263557434082031, 5.406360626220703, 16.088211059570312, -7.077846527099609, 22.804039001464844, -4.734580993652344, 6.5904541015625, 51.47589111328125, 12.079254150390625, 3.2177791595458984, 0.44365692138671875, 38.88078689575195, -3.5586471557617188, 7.1135101318359375, 15.242534637451172, 16.485992431640625, 45.745452880859375, 8.277053833007812, 18.293542861938477, -3.0459518432617188, 12.718147277832031, 42.04034423828125, 26.820755004882812, 10.657096862792969, 0.947174072265625, 6.089069366455078, 30.21936798095703, 36.07038879394531, -28.586135864257812, 21.80156707763672, -6.950157165527344, 2.1516952514648438, 3.736492156982422, 17.396148681640625, 29.441070556640625, 12.674362182617188, -2.4766921997070312, 23.436073303222656, 18.211441040039062, -10.943729400634766, 5.823995590209961, 10.204166412353516, 20.173118591308594, 21.779052734375, -25.212722778320312, 1.9305801391601562, 26.072280883789062, 41.54443359375, 3.283527374267578, 2.4118213653564453, -8.5272216796875, 3.1022491455078125, 3.448974609375, 29.8660888671875, 6.9347381591796875, -11.12283706665039, -0.3081626892089844, 27.65876007080078, 8.859222412109375, 30.178756713867188, 8.953842163085938], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000622.npy"}
|
||||
{"epoch": 0.9402872260015117, "step": 623, "batch_size": 64, "mean": 12.071805000305176, "std": 16.97834587097168, "min": -30.794769287109375, "p10": -6.444533729553221, "median": 11.54977798461914, "p90": 36.22614669799805, "max": 54.25634002685547, "pos_frac": 0.78125, "sample": [9.795570373535156, 5.0900421142578125, -1.1758651733398438, -30.794769287109375, 17.683242797851562, 25.6929931640625, 14.2694091796875, 27.952476501464844, 22.208160400390625, 22.227527618408203, 2.1368865966796875, 5.643037796020508, 3.6668014526367188, 6.53533935546875, -2.9257431030273438, 35.9068603515625, -9.058570861816406, 16.47240447998047, 14.21710205078125, 24.630599975585938, 17.39931869506836, 7.888458251953125, 33.09454345703125, -7.109109878540039, -3.4194488525390625, 21.649761199951172, 7.980567932128906, 26.131126403808594, -25.014205932617188, 31.985443115234375, 36.36298370361328, 54.25634002685547, 18.86639404296875, 16.47137451171875, -2.99298095703125, -12.340995788574219, 6.707427978515625, 7.746147155761719, 20.247936248779297, 38.8148193359375, 39.79925537109375, 40.97807312011719, 2.6614627838134766, 3.4290008544921875, -4.893856048583984, 0.3645668029785156, -8.863876342773438, 19.821998596191406, 14.56454849243164, 14.25555419921875, 13.332595825195312, 43.00752258300781, 12.185699462890625, 25.093032836914062, -0.7500801086425781, 10.913856506347656, 4.049652099609375, 1.3272972106933594, 17.51703643798828, -2.626983642578125, 41.667930603027344, 0.18213653564453125, 3.0869827270507812, -23.40729522705078], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000623.npy"}
|
||||
{"epoch": 0.9417989417989417, "step": 624, "batch_size": 64, "mean": 13.869135856628418, "std": 18.081342697143555, "min": -42.03200149536133, "p10": -6.407924270629882, "median": 10.553470611572266, "p90": 39.96501350402833, "max": 52.09855651855469, "pos_frac": 0.796875, "sample": [-12.115036010742188, -6.949192047119141, 33.03886413574219, 15.23055648803711, 40.699066162109375, 29.89071273803711, 2.247905731201172, 6.54168701171875, -4.095085144042969, 1.85540771484375, 9.713600158691406, 9.488105773925781, -0.5092391967773438, 2.9028854370117188, 21.42072296142578, 9.5106201171875, 1.112457275390625, 48.726593017578125, 27.294219970703125, 3.050384521484375, 1.39483642578125, 10.459648132324219, -7.917757034301758, 30.67147445678711, 0.47554779052734375, 13.321968078613281, 5.266506195068359, -8.540229797363281, 17.827350616455078, 23.28156280517578, 11.459457397460938, 43.1138916015625, 36.11115264892578, 29.625701904296875, 13.600276947021484, 1.7338676452636719, 38.25222396850586, 47.95335388183594, 52.09855651855469, 30.432674407958984, 19.062789916992188, 22.784027099609375, 13.010147094726562, 20.27880096435547, -10.051254272460938, 34.523345947265625, 47.80620574951172, 3.675830841064453, -3.08740234375, 25.634078979492188, 23.928733825683594, 8.90848159790039, -2.361297607421875, 2.729313850402832, -42.03200149536133, 24.62866973876953, 10.647293090820312, -0.4983673095703125, 9.501953125, -5.144966125488281, -7.993202209472656, 2.334345817565918, 43.443328857421875, 16.218521118164062], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000624.npy"}
|
||||
{"epoch": 0.9433106575963719, "step": 625, "batch_size": 64, "mean": 7.667244911193848, "std": 18.39533233642578, "min": -39.722137451171875, "p10": -16.417420959472658, "median": 8.18986701965332, "p90": 33.55165405273438, "max": 54.519493103027344, "pos_frac": 0.640625, "sample": [16.867408752441406, 22.933815002441406, 3.3866729736328125, 17.849136352539062, -6.82594108581543, 34.9799690246582, -3.0470542907714844, -0.886932373046875, 17.856647491455078, 8.74298095703125, 3.8620376586914062, -0.9535789489746094, 19.445472717285156, -4.8797149658203125, 43.64875793457031, 1.4962844848632812, 54.519493103027344, 13.610595703125, -5.301380157470703, 5.949827194213867, 12.558563232421875, 32.93934631347656, 5.523807525634766, -9.912811279296875, 0.9209518432617188, 17.126001358032227, 19.38452911376953, 20.917884826660156, -7.7037353515625, 5.446907043457031, -7.394767761230469, 11.0263671875, 27.60931396484375, -12.567512512207031, 14.843299865722656, -5.244604110717773, -16.193939208984375, 43.146514892578125, -5.0570526123046875, -4.9592742919921875, 6.658477783203125, 44.14472961425781, 25.700439453125, 8.129844665527344, 12.338104248046875, 13.023300170898438, -16.513198852539062, -19.850181579589844, -21.976516723632812, 33.81407165527344, -10.082138061523438, -39.722137451171875, -18.8720760345459, 38.62351989746094, 25.4283447265625, -5.482734680175781, 12.30499267578125, 8.249889373779297, -17.64936065673828, -19.195026397705078, 12.260116577148438, 10.468666076660156, 10.52264404296875, 12.715629577636719], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000625.npy"}
|
||||
{"epoch": 0.9448223733938019, "step": 626, "batch_size": 64, "mean": 13.556198120117188, "std": 22.44525146484375, "min": -26.8223876953125, "p10": -12.917420959472656, "median": 10.749711990356445, "p90": 45.6721778869629, "max": 59.745391845703125, "pos_frac": 0.6875, "sample": [33.6558837890625, 12.24481201171875, 10.33523941040039, -12.166664123535156, 59.745391845703125, 42.84946060180664, 52.71429443359375, -5.476818084716797, 8.594520568847656, 22.1304988861084, -4.960113525390625, -3.0433197021484375, -11.234230041503906, 0.23919296264648438, -2.38421630859375, 52.637935638427734, 38.5721549987793, -17.702014923095703, 40.302711486816406, -2.9550399780273438, -9.798835754394531, 1.44610595703125, 6.035099983215332, 46.347084045410156, 9.494823455810547, -13.239173889160156, 13.638444900512695, 9.717544555664062, -6.3440399169921875, 53.1944580078125, 18.52859878540039, -5.38995361328125, 44.09739685058594, 26.715003967285156, 5.06134033203125, -26.8223876953125, -6.5419921875, 22.630104064941406, 12.472036361694336, 18.97955322265625, 7.010372161865234, -4.418933868408203, 56.444801330566406, 2.2707977294921875, -23.522064208984375, 25.622419357299805, 15.493797302246094, 41.49928283691406, 14.933940887451172, 11.1641845703125, 21.131752014160156, 8.120357513427734, -23.458271026611328, 26.25445556640625, -8.944644927978516, 38.06455993652344, 10.029912948608398, 51.73347473144531, 28.475444793701172, 15.216705322265625, 42.78358459472656, -15.360719680786133, 16.366714477539062, -23.6361083984375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000626.npy"}
|
||||
{"epoch": 0.9463340891912321, "step": 627, "batch_size": 64, "mean": 15.683549880981445, "std": 18.964345932006836, "min": -24.339927673339844, "p10": -5.25344123840332, "median": 9.688838958740234, "p90": 45.12625885009766, "max": 55.95494842529297, "pos_frac": 0.796875, "sample": [7.729209899902344, 5.6306304931640625, 16.76750946044922, 8.748260498046875, 26.861186981201172, 32.64886474609375, 35.198944091796875, 42.56047058105469, 40.908477783203125, 7.9021148681640625, -9.233333587646484, 38.729766845703125, 51.52271270751953, 21.833412170410156, 4.83885383605957, -0.1176910400390625, -6.9809722900390625, 6.5530548095703125, 50.38917922973633, 6.658988952636719, 29.104148864746094, 21.13491439819336, 4.395824432373047, 2.25701904296875, 10.947834014892578, -10.702167510986328, 45.785953521728516, -7.75323486328125, 34.550113677978516, -1.7375144958496094, -4.596038818359375, 48.46380615234375, 28.73260498046875, 16.473297119140625, 5.186920166015625, 5.932563781738281, -2.4475631713867188, 5.630035400390625, 1.7479705810546875, 45.74430847167969, 43.68414306640625, 17.905776977539062, -4.647251129150391, -5.513236999511719, 6.4688262939453125, 3.91302490234375, -3.5171737670898438, 21.860912322998047, 5.3944854736328125, 20.013381958007812, 11.773292541503906, -18.691268920898438, 2.7827415466308594, 26.052223205566406, 27.235057830810547, 19.098541259765625, 9.575946807861328, -24.339927673339844, 9.80173110961914, 1.9514617919921875, 21.69202423095703, 40.5672607421875, 46.72985076904297, 55.95494842529297], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000627.npy"}
|
||||
{"epoch": 0.9478458049886621, "step": 628, "batch_size": 64, "mean": 14.678995132446289, "std": 22.27927017211914, "min": -48.998138427734375, "p10": -9.429330444335937, "median": 12.524154663085938, "p90": 42.27214126586914, "max": 56.14044189453125, "pos_frac": 0.78125, "sample": [56.14044189453125, -9.932571411132812, 42.10736846923828, 25.369319915771484, -17.539764404296875, -9.79443359375, 42.34275817871094, 0.9314422607421875, 39.79240417480469, 31.842945098876953, 55.23469543457031, 43.69866943359375, 31.016677856445312, 10.913177490234375, 5.195281982421875, 6.660575866699219, -48.998138427734375, -42.643218994140625, 10.69610595703125, -6.9909820556640625, 40.63481903076172, 6.927886962890625, 22.810123443603516, -23.8934326171875, 12.6220703125, 24.196571350097656, 12.426239013671875, 22.463348388671875, 0.07627105712890625, 32.126747131347656, 33.375972747802734, 33.872802734375, 18.571868896484375, 46.36571502685547, 6.111480712890625, 10.965179443359375, 0.002162933349609375, 5.525016784667969, 33.49162292480469, -5.940696716308594, 33.18650817871094, 20.71572494506836, 11.95199966430664, 24.737838745117188, 20.698257446289062, 21.703826904296875, 45.197200775146484, 13.009716033935547, 51.5207633972168, 29.23745346069336, 12.151397705078125, -8.577423095703125, 5.773937225341797, -28.616226196289062, 27.984575271606445, 36.27888488769531, 4.7816162109375, 30.716148376464844, -5.214599609375, 0.7633705139160156, -4.083490371704102, 9.849716186523438, -7.161663055419922, -5.924461364746094], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000628.npy"}
|
||||
{"epoch": 0.9493575207860923, "step": 629, "batch_size": 64, "mean": 12.10648250579834, "std": 16.565528869628906, "min": -23.02300262451172, "p10": -5.112130355834961, "median": 9.047324180603027, "p90": 35.096254730224615, "max": 53.4039306640625, "pos_frac": 0.75, "sample": [-9.553749084472656, 7.430427551269531, -8.60051155090332, 18.078527450561523, 8.174760818481445, 4.736064910888672, 0.3273468017578125, 18.04629898071289, -3.67041015625, 5.9272613525390625, 10.598388671875, 5.18560791015625, 4.300621032714844, 45.76854705810547, -1.5220947265625, -23.02300262451172, 9.91988754272461, 11.855789184570312, 1.0821380615234375, 33.204833984375, -13.633476257324219, 20.993026733398438, -10.787931442260742, 1.8170528411865234, -2.9080066680908203, -0.1499481201171875, 49.33929443359375, 53.4039306640625, -5.138896942138672, 26.569679260253906, 15.737457275390625, -5.049674987792969, 12.962455749511719, 32.10664367675781, 35.509178161621094, 14.692550659179688, 19.512470245361328, 15.1361083984375, 2.605255126953125, -0.9614543914794922, -3.9604949951171875, 11.796859741210938, 27.008514404296875, 4.540859222412109, 4.179389953613281, 6.084259033203125, 3.5739898681640625, 30.637619018554688, 13.748859405517578, 40.58921813964844, 36.78771209716797, 2.5219955444335938, 34.13276672363281, -2.2316970825195312, 14.673675537109375, 9.987884521484375, -1.7148818969726562, 30.661300659179688, 17.069416046142578, 47.64553451538086, 6.6216278076171875, 29.812820434570312, -8.451705932617188, 19.076889038085938], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000629.npy"}
{"epoch": 0.9508692365835223, "step": 630, "batch_size": 64, "mean": 13.485905647277832, "std": 17.55368995666504, "min": -13.27362060546875, "p10": -6.702869033813476, "median": 10.331281661987305, "p90": 38.32901992797852, "max": 58.93391418457031, "pos_frac": 0.734375, "sample": [-6.847038269042969, 5.60205078125, 2.116230010986328, 4.499927520751953, 37.576942443847656, 37.62225341796875, 29.941532135009766, -1.0075397491455078, 50.34886932373047, 13.179153442382812, -0.8866291046142578, 16.015666961669922, -9.717941284179688, 42.41313171386719, -1.5181694030761719, 34.66225051879883, -13.27362060546875, 7.373676300048828, 20.410808563232422, 3.256591796875, 4.12847900390625, 2.0908660888671875, 24.69818115234375, 23.035789489746094, 42.059783935546875, 17.976844787597656, -4.5762176513671875, 15.139118194580078, -10.730247497558594, 6.50445556640625, 2.290618896484375, 0.180511474609375, -2.2434730529785156, 46.17645263671875, 11.867019653320312, 24.307594299316406, 27.487411499023438, -1.4759674072265625, 10.662334442138672, 10.000228881835938, 58.93391418457031, 26.95446014404297, 35.808284759521484, -6.020088195800781, 45.67163848876953, 20.191024780273438, 16.174728393554688, 1.8543930053710938, 24.13331413269043, -8.427719116210938, 4.9282379150390625, 2.8223037719726562, -0.041229248046875, 38.631919860839844, 27.806068420410156, 1.8101234436035156, -6.366474151611328, 11.92965316772461, -10.452255249023438, 35.85116958618164, -8.518203735351562, 16.316726684570312, 12.474922180175781, -0.7168941497802734], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000630.npy"}
{"epoch": 0.9523809523809523, "step": 631, "batch_size": 64, "mean": 15.663002967834473, "std": 19.97930908203125, "min": -35.78748321533203, "p10": -7.152295303344726, "median": 11.99122428894043, "p90": 42.007518005371104, "max": 61.15794372558594, "pos_frac": 0.78125, "sample": [20.073984146118164, -0.36627197265625, 9.799560546875, 33.429412841796875, 17.527969360351562, 33.102638244628906, 40.229095458984375, 22.971420288085938, -0.4378395080566406, 10.260208129882812, 1.78887939453125, -19.37896728515625, 17.279409408569336, 17.777359008789062, -5.899953842163086, 10.831724166870117, 8.702323913574219, 6.029144287109375, 19.849716186523438, 56.11759948730469, 42.76969909667969, -15.531761169433594, 61.15794372558594, 37.459938049316406, 22.611373901367188, 38.169586181640625, 5.563438415527344, 46.303253173828125, -7.339153289794922, 15.569648742675781, 6.972866058349609, 12.071044921875, -35.78748321533203, 0.8920192718505859, 8.496330261230469, 21.07440185546875, 45.5921630859375, 2.88555908203125, 53.75372314453125, 36.02787780761719, 5.204246520996094, -9.962684631347656, 1.677001953125, 23.032398223876953, -6.7162933349609375, -7.999916076660156, 4.785045623779297, 6.58958625793457, -15.823909759521484, 30.264251708984375, -1.5624313354492188, 38.58329772949219, 1.824005126953125, 44.52007293701172, -1.2364978790283203, 22.685623168945312, -3.139446258544922, 16.267288208007812, 11.91140365600586, 38.2025146484375, 19.404193878173828, 38.01824188232422, 8.986503601074219, 38.51783752441406], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000631.npy"}
{"epoch": 0.9538926681783825, "step": 632, "batch_size": 64, "mean": 12.740787506103516, "std": 18.535221099853516, "min": -34.60627746582031, "p10": -10.156684494018553, "median": 13.412200927734375, "p90": 37.27622985839845, "max": 50.092124938964844, "pos_frac": 0.796875, "sample": [7.263095855712891, 24.857177734375, 13.479156494140625, 0.3434562683105469, 14.396186828613281, 34.01919937133789, -11.352210998535156, 26.33110809326172, 1.9788589477539062, 13.991214752197266, -0.825164794921875, -6.9417572021484375, 13.345245361328125, 9.2811279296875, -7.367122650146484, 38.97911071777344, 23.74321746826172, 10.173206329345703, 20.703754425048828, -34.60627746582031, 4.346931457519531, -1.0451316833496094, -15.690868377685547, 22.387283325195312, 0.35755157470703125, 23.870777130126953, 9.0089111328125, 23.200149536132812, 2.7625627517700195, 23.929134368896484, 5.687187194824219, 38.40974426269531, 20.79876708984375, -6.350067138671875, 25.748779296875, 21.928348541259766, 18.889053344726562, 2.4528274536132812, 18.82445526123047, 28.402549743652344, 33.509727478027344, 29.593101501464844, -2.4296798706054688, 3.0160465240478516, 2.3167152404785156, 41.88496780395508, 11.400382995605469, 2.1374197006225586, 45.942779541015625, 43.64952087402344, 50.092124938964844, -26.149566650390625, 8.044601440429688, 34.63136291503906, -12.529312133789062, 0.0474700927734375, 49.67021942138672, 14.020149230957031, 33.68632507324219, 16.231338500976562, 16.858200073242188, 0.2613849639892578, -26.436080932617188, -13.750297546386719], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000632.npy"}
{"epoch": 0.9554043839758125, "step": 633, "batch_size": 64, "mean": 14.493501663208008, "std": 20.020915985107422, "min": -21.91763687133789, "p10": -7.831145286560058, "median": 12.157428741455078, "p90": 47.87526626586914, "max": 65.375, "pos_frac": 0.703125, "sample": [-8.031230926513672, 5.1800689697265625, 17.730304718017578, 3.6379241943359375, 9.162040710449219, 28.648818969726562, 13.596405029296875, 33.906761169433594, 19.905210494995117, 8.253105163574219, 5.103851318359375, 1.5401325225830078, 48.921112060546875, 8.040515899658203, -7.364278793334961, 24.248695373535156, -5.41583251953125, -2.208648681640625, 26.249038696289062, -3.367107391357422, -0.15906524658203125, -21.91763687133789, 32.75285339355469, 17.318626403808594, 19.662872314453125, 18.843761444091797, -9.602014541625977, 17.694435119628906, 14.883548736572266, 21.590763092041016, 28.239891052246094, 8.233081817626953, 19.898818969726562, 14.795166015625, -13.197296142578125, 10.359294891357422, -2.0211257934570312, -6.223026275634766, -5.997810363769531, -10.050552368164062, 0.3376007080078125, -0.6242828369140625, 13.928979873657227, 65.375, 4.805763244628906, -1.4865446090698242, 23.257753372192383, -20.353744506835938, 21.165203094482422, 47.87079620361328, 49.79718780517578, 33.26945114135742, 30.88591766357422, 47.87718200683594, 2.6578826904296875, -8.910903930664062, 10.718452453613281, -1.3511924743652344, 42.40345764160156, 54.83695602416992, 57.63605499267578, -4.905570983886719, 49.02537536621094, 26.52587890625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000633.npy"}
{"epoch": 0.9569160997732427, "step": 634, "batch_size": 64, "mean": 5.743696689605713, "std": 18.25229835510254, "min": -35.843170166015625, "p10": -12.757197189331054, "median": 3.2313528060913086, "p90": 34.57514419555664, "max": 54.759605407714844, "pos_frac": 0.59375, "sample": [2.781036376953125, -24.779312133789062, 0.8635139465332031, 33.90907287597656, 19.788436889648438, -6.4747161865234375, 3.924163818359375, -0.2581005096435547, 2.628692626953125, -0.3624916076660156, 1.857046127319336, -13.712966918945312, -11.481620788574219, 6.176868438720703, -2.184112548828125, 5.920970916748047, 25.71693992614746, 0.9620246887207031, -5.947807312011719, 4.796703338623047, -12.904930114746094, 3.681669235229492, 13.10589599609375, -35.843170166015625, 34.225074768066406, 0.9429206848144531, 36.27486801147461, 32.07343673706055, 34.72517395019531, -35.357574462890625, 4.071388244628906, 7.22735595703125, -0.18729782104492188, -11.156864166259766, 34.812713623046875, -13.977994918823242, -12.412487030029297, -6.819316864013672, 54.759605407714844, 36.13018035888672, 10.090805053710938, -8.747406005859375, -9.543296813964844, 11.325447082519531, -12.2552490234375, -22.119735717773438, -5.578563690185547, 11.96194839477539, 45.93931579589844, -1.7735099792480469, -2.8433876037597656, -0.46868896484375, 12.441165924072266, 4.8338623046875, 9.511672973632812, -1.244476318359375, 15.21164321899414, 18.898117065429688, 19.381851196289062, 14.724983215332031, -0.6714096069335938, 7.1053924560546875, 5.109321594238281, 38.81181335449219], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000634.npy"}
{"epoch": 0.9584278155706727, "step": 635, "batch_size": 64, "mean": 8.830991744995117, "std": 19.612709045410156, "min": -33.61931228637695, "p10": -13.797235870361327, "median": 4.320196151733398, "p90": 34.493377685546875, "max": 62.77252197265625, "pos_frac": 0.71875, "sample": [-4.203521728515625, 18.364418029785156, -18.85955047607422, 0.14336776733398438, 34.02757263183594, -3.0853424072265625, -12.216537475585938, -33.61931228637695, -26.726242065429688, 45.85240173339844, 3.009977340698242, 22.829383850097656, 4.382312774658203, 0.32593536376953125, 32.051361083984375, 41.098785400390625, 30.23337173461914, 4.258079528808594, 16.879119873046875, 34.69300842285156, 1.895172119140625, 7.256706237792969, -2.7527313232421875, 48.918113708496094, 3.695587158203125, -3.3764724731445312, 18.56859588623047, 0.23917007446289062, 1.6495552062988281, 10.656795501708984, 15.766998291015625, -18.342697143554688, 36.887603759765625, -1.5024871826171875, 7.379615783691406, -11.824874877929688, 29.870773315429688, 3.5494461059570312, 26.43616485595703, -1.036773681640625, -6.213798522949219, 29.433433532714844, -8.630790710449219, 55.35369110107422, 5.118804931640625, -14.474678039550781, 17.21533966064453, 22.578842163085938, 5.465179443359375, 11.047119140625, 3.67852783203125, 62.77252197265625, 0.978271484375, 5.6796875, -26.221031188964844, 2.18731689453125, 14.778762817382812, 3.2792816162109375, 3.0265655517578125, 9.520679473876953, -21.364044189453125, 19.391704559326172, 12.3829345703125, -5.17371940612793], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000635.npy"}
{"epoch": 0.9599395313681028, "step": 636, "batch_size": 64, "mean": 15.674919128417969, "std": 18.015371322631836, "min": -27.245956420898438, "p10": -5.807843399047852, "median": 13.526260375976562, "p90": 40.32174110412598, "max": 63.20466613769531, "pos_frac": 0.78125, "sample": [39.970611572265625, -1.488494873046875, -0.35381317138671875, 8.046577453613281, 22.25439453125, 5.174552917480469, 40.472225189208984, 8.086318969726562, 13.474777221679688, 25.645671844482422, 15.938117980957031, -2.4252090454101562, 42.82188415527344, 25.12591552734375, 4.8685455322265625, -7.6993255615234375, 13.699325561523438, 1.9967117309570312, 6.727607727050781, 26.081619262695312, 18.258682250976562, 17.51046371459961, -0.42759132385253906, 29.128345489501953, 10.171239852905273, 39.57390213012695, 63.20466613769531, -5.833271026611328, 4.272478103637695, 37.81141662597656, 13.577743530273438, 19.934444427490234, 57.71247863769531, 29.16168212890625, -5.748512268066406, 23.284931182861328, 17.521705627441406, 10.091598510742188, -6.4727020263671875, 10.17816162109375, 22.30145263671875, 38.890621185302734, 0.3568553924560547, 42.39326477050781, 1.1351165771484375, -27.245956420898438, 48.97923278808594, 18.750259399414062, 11.57172966003418, 26.789939880371094, 26.051223754882812, -4.005107879638672, 34.79817199707031, 44.44605255126953, 8.973876953125, -8.155387878417969, -10.34283447265625, -3.118389129638672, -7.653266906738281, 17.42300796508789, 10.284271240234375, 8.013870239257812, 5.687583923339844, 25.539291381835938], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000636.npy"}
{"epoch": 0.9614512471655329, "step": 637, "batch_size": 64, "mean": 12.720073699951172, "std": 19.16849136352539, "min": -30.958297729492188, "p10": -6.188798904418944, "median": 9.356945037841797, "p90": 45.43771209716798, "max": 55.862884521484375, "pos_frac": 0.75, "sample": [14.489913940429688, 28.435028076171875, 13.569835662841797, 55.862884521484375, -3.4696731567382812, 16.813934326171875, -4.9021148681640625, 12.670909881591797, 26.3265380859375, 14.024604797363281, 5.070446014404297, 15.594200134277344, 2.016063690185547, -0.8225173950195312, 8.523040771484375, 10.671676635742188, -12.580230712890625, 31.185527801513672, 42.219459533691406, -3.7941246032714844, 11.700164794921875, -6.641716003417969, 12.600021362304688, 3.85150146484375, 1.515878677368164, 8.70123291015625, 5.750049591064453, 46.81696319580078, 52.64605712890625, 55.28706359863281, 50.10643768310547, -7.5585174560546875, -30.958297729492188, 14.545989990234375, -4.893951416015625, 15.802536010742188, 9.599113464355469, 1.3573570251464844, 16.21502685546875, 32.74205017089844, 12.106796264648438, 37.34519958496094, -0.5576000213623047, 3.8404159545898438, 28.080856323242188, 1.8025550842285156, -18.026771545410156, 9.114776611328125, 2.7301177978515625, 15.652503967285156, -0.4798736572265625, 52.657752990722656, 11.667919158935547, 5.026458740234375, 5.585296630859375, -10.924911499023438, 39.59062194824219, -1.1771469116210938, -5.131992340087891, -12.923004150390625, 18.301651000976562, 51.377685546875, 1.7827529907226562, 5.552330017089844], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000637.npy"}
{"epoch": 0.9629629629629629, "step": 638, "batch_size": 64, "mean": 12.218162536621094, "std": 17.881288528442383, "min": -25.021202087402344, "p10": -8.428273773193357, "median": 11.526617050170898, "p90": 37.87773170471193, "max": 52.89627456665039, "pos_frac": 0.734375, "sample": [-8.933441162109375, 13.461448669433594, 34.40098571777344, 7.291156768798828, -24.743019104003906, -2.1491470336914062, 48.18717575073242, 20.90234375, -0.2427978515625, -13.763298034667969, 2.136432647705078, -3.115631103515625, 3.0533370971679688, -4.49420166015625, 0.1570281982421875, 30.405166625976562, 5.7005615234375, -25.021202087402344, 39.541839599609375, -3.5717697143554688, 12.74627685546875, 15.420127868652344, 24.23046112060547, 7.951446533203125, -16.996566772460938, 19.9451904296875, 20.421890258789062, 43.627838134765625, -5.954902648925781, 11.469890594482422, 25.420730590820312, 34.437538146972656, 22.338836669921875, 10.14349365234375, 18.469879150390625, -7.249549865722656, 3.769927978515625, 39.35210037231445, 11.583343505859375, 32.63087463378906, 20.20063018798828, 26.630977630615234, 19.885498046875, 7.046379089355469, 17.9447021484375, 0.8476943969726562, 8.198310852050781, 50.18344497680664, 0.73065185546875, 18.156211853027344, -1.3183574676513672, 14.139427185058594, 18.1700439453125, 17.734786987304688, -8.937568664550781, 14.815206527709961, -15.43198013305664, 52.89627456665039, 45.41094207763672, -1.4610176086425781, 5.4169158935546875, 2.8365020751953125, -4.020843505859375, 28.92578887939453], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000638.npy"}
{"epoch": 0.9644746787603931, "step": 639, "batch_size": 64, "mean": 13.388609886169434, "std": 18.259557723999023, "min": -23.050323486328125, "p10": -6.274513244628905, "median": 9.877016067504883, "p90": 40.179193115234376, "max": 56.98162841796875, "pos_frac": 0.75, "sample": [26.373123168945312, 37.34185791015625, 14.851837158203125, -10.551994323730469, 40.21546173095703, 27.10871124267578, 9.034576416015625, -1.6670455932617188, 13.922042846679688, 22.570064544677734, 10.389923095703125, 23.719858169555664, 53.91227722167969, 6.12371826171875, 18.886795043945312, 2.2937469482421875, 1.0576648712158203, -2.5974960327148438, -16.1507568359375, 16.168739318847656, 15.556221008300781, 22.932594299316406, 0.2309246063232422, 9.762603759765625, -9.173402786254883, 40.93892288208008, 2.6991195678710938, 38.55339050292969, -17.467849731445312, 44.72795104980469, 42.22980499267578, -5.450956344604492, 1.2590522766113281, 28.80167007446289, -1.2444639205932617, 7.177898406982422, -1.6513633728027344, -1.3534355163574219, 56.98162841796875, 9.99142837524414, 33.04060745239258, 2.0121917724609375, 43.679534912109375, 10.936874389648438, 5.291259765625, 7.716056823730469, -2.2216529846191406, 1.7887306213378906, 11.758651733398438, 2.3372726440429688, 27.308929443359375, 24.489673614501953, 3.7701492309570312, 10.350318908691406, 0.782958984375, -6.627466201782227, -0.5868759155273438, -23.050323486328125, -2.8405075073242188, 40.094566345214844, -8.340133666992188, 38.643646240234375, 30.205171585083008, 27.82659912109375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000639.npy"}
{"epoch": 0.9659863945578231, "step": 640, "batch_size": 64, "mean": 9.443033218383789, "std": 19.559911727905273, "min": -43.58878707885742, "p10": -13.156110668182372, "median": 9.58945083618164, "p90": 31.420303535461432, "max": 44.14225769042969, "pos_frac": 0.734375, "sample": [26.462738037109375, 5.072456359863281, 22.631500244140625, -34.186492919921875, 2.517333984375, 13.208114624023438, 30.06241798400879, 8.571037292480469, 17.140167236328125, 28.651695251464844, -3.1165008544921875, 0.12984466552734375, 36.9752197265625, 7.230060577392578, -13.6707763671875, -4.872306823730469, 16.569683074951172, 29.686973571777344, 2.906421661376953, 41.00299835205078, -36.408592224121094, 19.412628173828125, 10.607864379882812, 23.954376220703125, -22.352924346923828, -11.95522403717041, 26.484886169433594, 5.667938232421875, -43.58878707885742, 11.003585815429688, 19.51211929321289, -7.2880859375, 5.090087890625, 32.002254486083984, 25.896408081054688, 2.274007797241211, -16.8721923828125, 14.240226745605469, 18.41558074951172, 4.88116455078125, -2.2139129638671875, 26.69418716430664, 5.663299560546875, 14.302337646484375, 28.219078063964844, 32.59340286254883, 23.250473022460938, -39.863868713378906, 6.570426940917969, 32.23485565185547, 7.904659271240234, 44.14225769042969, -4.457511901855469, 25.3927001953125, -4.973270416259766, -10.054145812988281, 42.29652404785156, 0.125244140625, -9.374420166015625, 24.125993728637695, -0.4031085968017578, 21.86501693725586, 5.414134979248047, 20.949844360351562], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000640.npy"}
{"epoch": 0.9674981103552532, "step": 641, "batch_size": 64, "mean": 10.993220329284668, "std": 18.12820816040039, "min": -40.01924133300781, "p10": -10.803692626953122, "median": 10.399696350097656, "p90": 33.58223648071289, "max": 59.97694396972656, "pos_frac": 0.734375, "sample": [0.9228706359863281, -18.03313446044922, 7.990850448608398, 21.018543243408203, 10.6715087890625, -8.745559692382812, 34.19176483154297, -0.6327285766601562, -4.0489044189453125, 1.1014213562011719, -12.227752685546875, 4.758697509765625, 23.427509307861328, 47.7550048828125, 2.45782470703125, 14.204765319824219, 16.42343521118164, 44.46929168701172, 52.428619384765625, 19.216636657714844, 27.903789520263672, 3.936046600341797, -6.126708984375, -1.1772003173828125, 5.322078704833984, 23.05255126953125, -2.160400390625, 13.861587524414062, 14.634769439697266, 35.1358642578125, -8.89300537109375, 17.381805419921875, 2.2681427001953125, 14.14974594116211, 7.5858001708984375, -18.85938262939453, 26.944454193115234, -4.5977325439453125, -11.702804565429688, 11.126701354980469, -40.01924133300781, 27.34996795654297, 10.647933959960938, -17.70404052734375, 20.442840576171875, 37.077903747558594, -4.077434539794922, 1.5871829986572266, 11.966011047363281, 29.53765869140625, 59.97694396972656, 14.256612777709961, 32.160003662109375, 21.090967178344727, 10.151458740234375, -1.4646148681640625, 7.8890380859375, 4.937492370605469, 10.025184631347656, 18.131507873535156, -11.62255859375, 6.103401184082031, 16.61023712158203, 31.374858856201172], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000641.npy"}
{"epoch": 0.9690098261526833, "step": 642, "batch_size": 64, "mean": 10.26209545135498, "std": 18.796987533569336, "min": -38.52349853515625, "p10": -12.356832122802734, "median": 10.15730094909668, "p90": 38.39084625244141, "max": 48.53235626220703, "pos_frac": 0.71875, "sample": [37.148040771484375, 6.348724365234375, 7.281005859375, -13.942339897155762, -0.8954544067382812, 0.0395355224609375, 22.232650756835938, 46.664634704589844, 10.821701049804688, -38.52349853515625, 48.25140380859375, -6.719917297363281, 3.5210418701171875, -11.591896057128906, 25.889244079589844, 4.511604309082031, 22.961944580078125, 46.16566467285156, -9.950773239135742, -12.684661865234375, 21.71636199951172, 17.514450073242188, 18.23504638671875, -9.534042358398438, 9.869678497314453, 2.5054244995117188, 43.734291076660156, 11.63885498046875, 14.13311767578125, 20.617904663085938, 18.764019012451172, 6.8061981201171875, 22.55694580078125, 7.740013122558594, -6.226764678955078, 38.92347717285156, -1.5746002197265625, 7.6871337890625, 16.844970703125, -4.87957763671875, 12.047142028808594, 7.2374725341796875, 0.9491405487060547, -22.641876220703125, 10.444923400878906, 18.223966598510742, -14.409408569335938, 22.442363739013672, 32.19038391113281, 48.53235626220703, 8.927108764648438, -2.7908935546875, 32.46240234375, 22.954856872558594, 43.08852005004883, 11.022808074951172, 2.7905235290527344, 13.033309936523438, -11.341075897216797, -26.129898071289062, 11.360137939453125, -3.534393310546875, -15.953460693359375, 11.266143798828125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000642.npy"}
{"epoch": 0.9705215419501134, "step": 643, "batch_size": 64, "mean": 11.055997848510742, "std": 20.199949264526367, "min": -42.7166748046875, "p10": -10.823884582519531, "median": 6.757171154022217, "p90": 42.53978424072266, "max": 54.417606353759766, "pos_frac": 0.75, "sample": [5.9314727783203125, 11.021087646484375, 2.3818130493164062, 2.363922119140625, 6.651751518249512, 42.84986877441406, 19.735309600830078, 1.9432449340820312, 30.30841064453125, 4.07232666015625, 2.12652587890625, 8.066368103027344, 2.3163681030273438, 6.0717315673828125, 18.50548553466797, 14.240432739257812, 0.6618461608886719, -5.011526107788086, 46.517940521240234, 54.417606353759766, 16.012752532958984, 6.862590789794922, 31.91901206970215, 27.85387420654297, -2.488842010498047, 47.18766784667969, 17.746292114257812, 1.2474365234375, 37.47620391845703, 16.797866821289062, 19.03515625, 6.593143463134766, -10.618667602539062, 22.829566955566406, 13.081489562988281, -2.9184608459472656, -10.911834716796875, 3.0687484741210938, 50.880550384521484, 41.816253662109375, 20.975662231445312, -27.90723419189453, 18.056442260742188, -17.651689529418945, 5.6630706787109375, -22.63739013671875, -24.550079345703125, -42.7166748046875, -1.290924072265625, 9.338628768920898, -9.451942443847656, 22.975997924804688, 53.479034423828125, 2.9685497283935547, 4.6664276123046875, -11.20657730102539, -1.7352447509765625, 45.220436096191406, -0.9912643432617188, 14.805402755737305, 26.7821044921875, 9.933135986328125, 32.17780685424805, -7.96258544921875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000643.npy"}
{"epoch": 0.9720332577475435, "step": 644, "batch_size": 64, "mean": 10.557316780090332, "std": 19.908267974853516, "min": -33.59613037109375, "p10": -13.408668518066401, "median": 9.231425285339355, "p90": 38.78116493225099, "max": 52.852745056152344, "pos_frac": 0.6875, "sample": [35.8397331237793, 5.09442138671875, -33.59613037109375, 30.675613403320312, -3.776561737060547, -18.434906005859375, 6.812847137451172, 5.3546600341796875, 43.95500946044922, 15.275306701660156, 9.70880126953125, -16.884981155395508, 9.069580078125, 27.387977600097656, 18.990463256835938, 7.071441650390625, -4.055408477783203, 13.270477294921875, 3.6324615478515625, 42.51155471801758, 3.3600292205810547, 28.276596069335938, -28.87104034423828, -3.7067947387695312, 3.2939453125, 27.52423858642578, -3.4802780151367188, 24.31403350830078, 4.144096374511719, -8.288124084472656, 18.702594757080078, 19.424407958984375, -2.169025421142578, 14.253570556640625, 40.041778564453125, -8.255821228027344, -0.55548095703125, 43.48065185546875, 35.12862014770508, 9.393270492553711, 29.477317810058594, 20.956642150878906, 15.641006469726562, 8.175264358520508, -4.9247894287109375, -31.96961212158203, -5.9851531982421875, 49.17203140258789, 23.031373977661133, 33.68973922729492, 1.450439453125, 1.1015701293945312, -5.604848861694336, -15.603187561035156, 13.13043212890625, 17.027374267578125, -25.512413024902344, -7.737648010253906, 21.12383270263672, 15.883739471435547, -6.910728454589844, 40.99371337890625, 22.295814514160156, 52.852745056152344], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000644.npy"}
{"epoch": 0.9735449735449735, "step": 645, "batch_size": 64, "mean": 13.443717956542969, "std": 17.51515769958496, "min": -29.099884033203125, "p10": -7.507681274414061, "median": 10.885955810546875, "p90": 41.37332305908204, "max": 57.098297119140625, "pos_frac": 0.78125, "sample": [19.106903076171875, 22.278549194335938, 43.531429290771484, -4.179264068603516, 20.06689453125, 57.098297119140625, 44.84661865234375, -3.1090927124023438, 8.468498229980469, -1.4953155517578125, -0.26407623291015625, -2.760101318359375, 3.7947616577148438, 0.9159049987792969, 46.04682922363281, 15.441215515136719, 26.388986587524414, 22.943050384521484, 2.0481948852539062, -11.791213989257812, 42.516212463378906, -5.689727783203125, -8.473579406738281, 25.585548400878906, 3.190521240234375, -8.390800476074219, 5.159355163574219, 22.433822631835938, 31.755558013916016, 23.771427154541016, 15.280776977539062, 2.112459182739258, 16.264028549194336, 15.032852172851562, 7.192138671875, 26.70697784423828, -12.340484619140625, 1.7374420166015625, 11.2999267578125, 42.53465270996094, 32.66399383544922, 20.369468688964844, 52.0908203125, 5.50592041015625, 10.47198486328125, 21.230899810791016, 24.76700210571289, 5.564128875732422, 3.74627685546875, 13.136451721191406, -13.770557403564453, 35.42703628540039, 8.085674285888672, 20.203994750976562, -8.28680419921875, 3.5651588439941406, 16.465290069580078, -1.7532119750976562, 9.917205810546875, -29.099884033203125, 2.6078720092773438, 38.706581115722656, 8.789806365966797, 12.936622619628906], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000645.npy"}
{"epoch": 0.9750566893424036, "step": 646, "batch_size": 64, "mean": 12.852224349975586, "std": 18.26241683959961, "min": -40.882137298583984, "p10": -3.474180603027343, "median": 9.576736450195312, "p90": 40.23271942138672, "max": 57.62493896484375, "pos_frac": 0.828125, "sample": [-4.064323425292969, 1.0529003143310547, 2.0701980590820312, 9.714065551757812, 44.84871292114258, 22.026931762695312, 6.4330596923828125, 32.11737060546875, 20.71926498413086, 13.397735595703125, 1.6601524353027344, 9.889297485351562, 46.307167053222656, 37.65956115722656, 22.47216796875, -0.5615768432617188, 18.521949768066406, 16.70477294921875, 42.05398178100586, 25.45472526550293, 43.92003631591797, 9.439407348632812, -33.31016540527344, 12.97314453125, -10.582294464111328, 40.45713806152344, 5.158290863037109, 7.7824859619140625, 14.876026153564453, 0.977386474609375, 4.320404052734375, 3.4596710205078125, 2.6678314208984375, 44.68402862548828, 4.020393371582031, 5.7832489013671875, 6.103513717651367, 4.794647216796875, 17.955177307128906, 30.866622924804688, -1.0561447143554688, -13.423179626464844, 24.462501525878906, 10.939361572265625, 22.90478515625, 57.62493896484375, 20.106201171875, 10.389236450195312, 4.553908348083496, 39.709075927734375, 19.84729766845703, 3.1354522705078125, -2.9948272705078125, 8.00457763671875, -40.882137298583984, 0.5346527099609375, 17.753402709960938, -11.014934539794922, -3.642669677734375, 39.64484405517578, -3.0810394287109375, 6.976825714111328, 3.9099693298339844, 23.31509780883789], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000646.npy"}
{"epoch": 0.9765684051398337, "step": 647, "batch_size": 64, "mean": 15.988598823547363, "std": 17.571643829345703, "min": -28.372467041015625, "p10": -2.2128334045410156, "median": 12.983792304992676, "p90": 39.118309783935544, "max": 64.76737976074219, "pos_frac": 0.859375, "sample": [-2.239532470703125, 12.045562744140625, 32.90875244140625, 6.036689758300781, 7.4323577880859375, 8.94256591796875, 5.6398162841796875, 43.99211120605469, 13.44769287109375, 29.153472900390625, 13.975288391113281, 20.836822509765625, 6.8973388671875, 6.9955596923828125, -2.8911056518554688, 5.554168701171875, 64.76737976074219, 13.147478103637695, -0.532470703125, 3.698150634765625, 36.99573516845703, 13.156232833862305, 10.855850219726562, 18.674530029296875, 50.8809814453125, 39.164459228515625, 9.6446533203125, 12.820106506347656, 16.250633239746094, 39.01062774658203, 11.698982238769531, 0.4433135986328125, 22.373031616210938, 6.601520538330078, 14.563419342041016, -7.6443939208984375, 12.755258560180664, 33.33131408691406, -2.1505355834960938, -26.725128173828125, 13.806640625, 15.159927368164062, 15.935523986816406, 30.564720153808594, -28.372467041015625, 9.177774429321289, 11.917121887207031, 17.9014892578125, 2.3704986572265625, 20.68895721435547, -4.265830993652344, 46.03746032714844, 27.217529296875, 1.872406005859375, 11.301361083984375, 41.831443786621094, 7.046699523925781, 61.29826354980469, 8.124465942382812, 27.06195068359375, 37.80888366699219, -3.5818023681640625, 22.79314613342285, 27.06548309326172], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000647.npy"}
{"epoch": 0.9780801209372638, "step": 648, "batch_size": 64, "mean": 11.37592887878418, "std": 17.32706069946289, "min": -42.24046325683594, "p10": -5.001614379882812, "median": 8.629557609558105, "p90": 32.18299865722656, "max": 48.82849884033203, "pos_frac": 0.828125, "sample": [20.975914001464844, 0.9557952880859375, 18.612396240234375, -42.24046325683594, 40.1173095703125, 4.589546203613281, -19.8267822265625, 15.379764556884766, 31.981765747070312, 14.598091125488281, 2.9621353149414062, 13.346630096435547, 8.240753173828125, 4.1160125732421875, -3.9315261840820312, -2.919492721557617, 16.57857894897461, 0.2432098388671875, 48.82849884033203, 30.930755615234375, 48.11808776855469, 4.310253143310547, 10.381301879882812, 46.70478820800781, 17.972328186035156, 9.018362045288086, 1.5810546875, 30.334182739257812, -0.2344818115234375, 37.108360290527344, 0.177825927734375, -4.968353271484375, 31.405059814453125, 4.068634033203125, 3.460203170776367, -20.019126892089844, 21.11096954345703, 10.92734146118164, 7.2773284912109375, 30.309555053710938, 9.728729248046875, 4.928321838378906, 43.034400939941406, 6.197757720947266, -5.015869140625, 3.4953231811523438, 32.26924133300781, 6.69694709777832, 15.056198120117188, 13.022350311279297, 12.04373550415039, 25.759841918945312, 9.847557067871094, -19.613723754882812, -14.97454833984375, 24.7421875, 30.09123992919922, 1.9879875183105469, 6.612834930419922, -6.988525390625, 4.029247283935547, 6.707118988037109, 3.9305877685546875, 21.887893676757812], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000648.npy"}
{"epoch": 0.9795918367346939, "step": 649, "batch_size": 64, "mean": 12.179876327514648, "std": 20.448209762573242, "min": -36.615882873535156, "p10": -16.27603759765625, "median": 11.72303295135498, "p90": 33.78910675048828, "max": 56.39348602294922, "pos_frac": 0.765625, "sample": [31.84766387939453, 17.256675720214844, 2.3756885528564453, 7.983203887939453, 28.584293365478516, 29.478469848632812, 10.106613159179688, 33.880027770996094, 5.109731674194336, 45.526710510253906, 1.7383842468261719, 30.965469360351562, 5.381172180175781, -14.724899291992188, 24.193336486816406, 0.04943084716796875, 1.0665283203125, 10.23444938659668, 31.20697021484375, 19.235366821289062, 29.792720794677734, -22.207279205322266, -36.615882873535156, -3.0401611328125, -18.24383544921875, 28.38583755493164, 13.211616516113281, 4.649360656738281, -19.96307373046875, -16.940811157226562, 24.571998596191406, 24.166595458984375, 33.57695770263672, 19.1695613861084, 32.24861145019531, 17.539398193359375, 16.92807388305664, 31.523090362548828, 14.812606811523438, 13.268451690673828, 4.281654357910156, 56.39348602294922, 47.26651382446289, -3.063495635986328, -2.5919876098632812, 44.621063232421875, 8.184280395507812, 1.0345077514648438, -2.860137939453125, 3.6526241302490234, 1.538330078125, 48.96617889404297, -19.142377853393555, -10.942169189453125, 44.91278076171875, -34.41037368774414, -11.291671752929688, 2.65020751953125, 21.455585479736328, 28.716896057128906, 20.44579315185547, 25.206832885742188, 2.6302108764648438, -6.4718017578125], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000649.npy"}
{"epoch": 0.981103552532124, "step": 650, "batch_size": 64, "mean": 13.017370223999023, "std": 18.099061965942383, "min": -28.226318359375, "p10": -9.11498565673828, "median": 11.000310897827148, "p90": 38.41595153808594, "max": 51.638275146484375, "pos_frac": 0.765625, "sample": [-1.27105712890625, 25.6600341796875, 11.084014892578125, 38.74931335449219, 4.505928039550781, 41.8148193359375, 6.119068145751953, 17.46905517578125, 25.202743530273438, 2.3274669647216797, 28.35980987548828, 34.909332275390625, -10.36031723022461, 10.392410278320312, 1.9503059387207031, 19.834197998046875, 26.542449951171875, 12.702301025390625, 22.69001007080078, 0.9304695129394531, 22.370864868164062, 12.369882583618164, 9.11074447631836, -12.292755126953125, 34.44878387451172, 40.68367004394531, 51.638275146484375, 34.72320556640625, 25.45551300048828, -2.0080184936523438, 6.013389587402344, -15.028465270996094, -4.719585418701172, 37.43088912963867, -28.226318359375, 4.478618621826172, 43.94304656982422, 20.195079803466797, 20.704612731933594, -5.3980560302734375, 10.007308959960938, 37.63810729980469, -25.183547973632812, 16.852020263671875, 7.5811309814453125, -9.051395416259766, 42.67820739746094, 36.44207763671875, 3.98297119140625, -9.14223861694336, 12.585655212402344, 3.564605712890625, 10.916606903076172, 4.036769866943359, -5.509914398193359, 24.755035400390625, -8.327484130859375, -11.075119018554688, 22.112163543701172, 39.3416748046875, -5.404144287109375, 6.789516448974609, 11.2342529296875, 0.7816390991210938], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000650.npy"}
{"epoch": 0.982615268329554, "step": 651, "batch_size": 64, "mean": 11.303712844848633, "std": 17.290422439575195, "min": -19.35669708251953, "p10": -5.431954956054687, "median": 6.374669075012207, "p90": 40.05482215881348, "max": 49.958740234375, "pos_frac": 0.671875, "sample": [2.09879207611084, 1.9625625610351562, 16.910842895507812, -2.0749664306640625, 5.669517517089844, -16.620948791503906, -3.9949722290039062, 6.233850479125977, -2.6384506225585938, 42.93648910522461, 0.41469573974609375, -5.435882568359375, -10.108070373535156, 9.335586547851562, 39.763816833496094, -18.99383544921875, -3.8998756408691406, 24.957015991210938, 14.490631103515625, 29.287128448486328, 24.04196548461914, 2.2478408813476562, 6.5154876708984375, 10.98797607421875, -19.35669708251953, 15.038158416748047, 10.186309814453125, 16.553237915039062, -6.0859832763671875, 32.671356201171875, 14.369998931884766, 39.69378662109375, 15.13916015625, -2.9957122802734375, 5.8841552734375, -2.520944595336914, -5.42279052734375, 16.000579833984375, -3.9397659301757812, 16.073448181152344, 42.335506439208984, 6.232013702392578, 40.73240661621094, 4.276618957519531, -0.47824859619140625, 49.958740234375, 47.065574645996094, 40.17953872680664, -3.2290802001953125, 37.30339050292969, 11.992897033691406, -2.7746047973632812, 4.549839019775391, 20.701988220214844, 21.348434448242188, 17.794921875, 15.140527725219727, -0.4802055358886719, -0.2490692138671875, -2.85943603515625, 4.248504638671875, 18.517452239990234, 46.46722412109375, -10.71274185180664], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000651.npy"}
{"epoch": 0.9841269841269841, "step": 652, "batch_size": 64, "mean": 12.089743614196777, "std": 19.43330192565918, "min": -33.16548156738281, "p10": -9.762817764282225, "median": 7.35170841217041, "p90": 44.07188949584961, "max": 51.15278625488281, "pos_frac": 0.703125, "sample": [1.9464225769042969, 20.28673553466797, 43.620872497558594, 8.9749755859375, 23.948104858398438, 49.86684799194336, -1.0023088455200195, 17.09575653076172, 44.26518249511719, -1.9742965698242188, 45.091651916503906, -4.605678558349609, 0.03557777404785156, 10.884326934814453, 26.890823364257812, 23.550811767578125, 8.1427001953125, 38.542572021484375, 18.74980926513672, -3.9773788452148438, -4.6299896240234375, 37.631591796875, 14.476455688476562, 6.026044845581055, 3.8681869506835938, 6.1848297119140625, -33.16548156738281, -9.08856201171875, 30.32799530029297, 17.311752319335938, 18.411697387695312, -10.05178451538086, 10.39242172241211, 0.8339767456054688, -2.5432662963867188, -1.1204910278320312, 44.599327087402344, -14.084037780761719, -10.216068267822266, 6.678550720214844, 3.7287559509277344, 51.15278625488281, 0.8842887878417969, 15.253997802734375, 2.1328506469726562, 4.0692596435546875, 34.19207763671875, 39.878082275390625, -2.329043388366699, 34.52363586425781, 45.265472412109375, -4.779243469238281, 28.10407257080078, 2.11505126953125, -13.614555358886719, 1.9125823974609375, -11.086029052734375, 8.024866104125977, -7.8289642333984375, 48.531578063964844, 15.601860046386719, -0.2304229736328125, -20.067947387695312, 16.131927490234375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000652.npy"}
{"epoch": 0.9856386999244142, "step": 653, "batch_size": 64, "mean": 13.56698226928711, "std": 19.04897689819336, "min": -29.67432403564453, "p10": -10.69746208190918, "median": 10.758522033691406, "p90": 38.66662673950195, "max": 56.838958740234375, "pos_frac": 0.765625, "sample": [7.908086776733398, 29.356239318847656, 2.7198944091796875, -18.33469009399414, 18.06708526611328, 24.521820068359375, 9.828826904296875, 28.209854125976562, 49.09588623046875, -0.052093505859375, -10.989727020263672, 5.247734069824219, 27.9058837890625, 5.6373291015625, 38.710899353027344, 23.907821655273438, 1.7333145141601562, 41.682151794433594, 7.137519836425781, 41.66004180908203, 25.10992431640625, 42.845855712890625, -3.167572021484375, -2.517131805419922, 26.25444221496582, 0.9216175079345703, 27.38282012939453, 10.7274169921875, -8.102458953857422, 11.426025390625, 56.838958740234375, 0.7024688720703125, 3.2130813598632812, 7.841218948364258, 17.64016342163086, -10.015510559082031, 22.906455993652344, 5.76763916015625, 36.84294128417969, -3.0164947509765625, 7.085762023925781, -11.284927368164062, 38.563323974609375, 24.947525024414062, 23.167617797851562, -29.67432403564453, 34.6851921081543, 2.5006256103515625, 34.163726806640625, 18.798980712890625, -22.310501098632812, 49.40378952026367, -23.967193603515625, -0.7916183471679688, -3.524639129638672, 4.5289154052734375, 26.62820053100586, 10.789627075195312, 15.803974151611328, 26.26739501953125, 20.764739990234375, 4.788570404052734, -13.131168365478516, 26.527549743652344], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000653.npy"}
{"epoch": 0.9871504157218443, "step": 654, "batch_size": 64, "mean": 11.323409080505371, "std": 19.687984466552734, "min": -22.644073486328125, "p10": -10.072170257568358, "median": 5.750213623046875, "p90": 43.260581970214844, "max": 53.620880126953125, "pos_frac": 0.703125, "sample": [1.9342193603515625, 48.796051025390625, -7.03173828125, 28.658729553222656, -0.888580322265625, -16.506797790527344, 35.909324645996094, 10.639047622680664, 3.0875091552734375, 15.009353637695312, -7.930732727050781, -22.644073486328125, 41.7808837890625, 10.734786987304688, 1.3366317749023438, 19.038963317871094, 5.683341979980469, -15.12908935546875, 48.58945846557617, -0.4096832275390625, 1.6820907592773438, 5.429340362548828, -7.488559722900391, 50.91236877441406, 2.569355010986328, 5.202812194824219, -5.997978210449219, -5.8854217529296875, -20.82640838623047, 22.313129425048828, -1.7050704956054688, -13.527008056640625, 46.4610595703125, 18.070594787597656, 40.773101806640625, 33.001556396484375, 18.656814575195312, 11.456623077392578, 11.137229919433594, 11.506698608398438, 38.009639739990234, 2.3969879150390625, 0.79241943359375, -4.9174346923828125, 15.218124389648438, -4.873180389404297, 45.198734283447266, 43.635902404785156, 18.58612632751465, 5.817085266113281, 14.215965270996094, 42.38483428955078, 2.421478271484375, 3.4011383056640625, 53.620880126953125, -18.215904235839844, 24.252769470214844, 3.652252197265625, -4.21160888671875, 16.79745101928711, 10.821174621582031, -6.8436126708984375, 9.126983642578125, -10.98992919921875], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000654.npy"}
{"epoch": 0.9886621315192744, "step": 655, "batch_size": 64, "mean": 12.33364486694336, "std": 16.88526153564453, "min": -17.308685302734375, "p10": -4.5156299591064455, "median": 7.214605331420898, "p90": 37.12023239135742, "max": 63.58489990234375, "pos_frac": 0.71875, "sample": [39.33348083496094, 3.3778762817382812, -2.6204757690429688, 5.982900619506836, 0.7990646362304688, 24.17767333984375, 21.514724731445312, -2.6322174072265625, -8.750701904296875, 5.62103271484375, 3.3747787475585938, 21.017662048339844, -1.1840362548828125, 8.454811096191406, -5.697639465332031, 2.3297653198242188, 7.73699951171875, 7.227802276611328, 33.135276794433594, 9.59986686706543, 48.6451416015625, -1.3196487426757812, 11.456464767456055, 33.363311767578125, 37.186546325683594, -3.9726638793945312, 22.07635498046875, -5.063541412353516, 40.24029541015625, -4.580333709716797, 23.535968780517578, -2.8889617919921875, 23.329078674316406, 7.869453430175781, 19.856842041015625, -0.3240165710449219, 4.388149261474609, 35.44929122924805, 26.038650512695312, 63.58489990234375, -1.5076828002929688, -15.741310119628906, 7.060737609863281, 23.966384887695312, 6.0915985107421875, -17.308685302734375, 12.22549819946289, 10.2811279296875, 24.349246978759766, 36.96549987792969, -4.807220458984375, 4.8729400634765625, 36.01983642578125, 7.201408386230469, -0.5690574645996094, 2.133838653564453, 41.15238952636719, 4.028007507324219, 13.617454528808594, -4.364654541015625, -3.4946212768554688, 42.234317779541016, 5.3544464111328125, 7.92181396484375], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000655.npy"}
{"epoch": 0.9901738473167044, "step": 656, "batch_size": 64, "mean": 13.146528244018555, "std": 18.88665771484375, "min": -34.277915954589844, "p10": -6.619202041625976, "median": 10.179140090942383, "p90": 43.18394508361817, "max": 55.691978454589844, "pos_frac": 0.765625, "sample": [24.426971435546875, 5.55908203125, 8.296028137207031, 23.831275939941406, -12.217292785644531, -1.9057540893554688, 10.282337188720703, 11.8975830078125, 8.598129272460938, 0.24037551879882812, 8.687786102294922, 44.14464569091797, -2.7017364501953125, -2.5037384033203125, -0.86614990234375, 18.79046630859375, -28.199203491210938, -4.141120910644531, 27.09296417236328, 29.582050323486328, 55.691978454589844, 17.717906951904297, -34.277915954589844, 29.083404541015625, 29.274490356445312, -10.285011291503906, 18.800704956054688, 27.01055908203125, -2.8915557861328125, 13.18328857421875, 47.5938720703125, 3.2511138916015625, 11.90793228149414, 25.77350616455078, 8.21713638305664, 52.58058166503906, 10.075942993164062, 4.120330810546875, 44.536842346191406, 7.68505859375, -10.94451904296875, -6.512702941894531, 18.6646728515625, 40.94231033325195, 19.935157775878906, 47.92364501953125, 4.476844787597656, 24.218631744384766, 3.9260940551757812, 19.893478393554688, 28.430801391601562, -6.664844512939453, -1.916473388671875, 51.333885192871094, 9.816535949707031, 6.658538818359375, 15.91619873046875, -21.32660675048828, 2.433258056640625, 1.7868118286132812, 1.1039657592773438, 21.658645629882812, 18.366079330444336, 23.312538146972656], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000656.npy"}
{"epoch": 0.9916855631141346, "step": 657, "batch_size": 64, "mean": 16.960391998291016, "std": 19.753740310668945, "min": -36.307342529296875, "p10": -2.670143890380859, "median": 15.42835807800293, "p90": 44.8828010559082, "max": 53.59490203857422, "pos_frac": 0.765625, "sample": [44.70391845703125, 6.9117889404296875, 0.0579681396484375, 34.67449951171875, 24.01628303527832, 39.71118927001953, 45.88018798828125, 50.596221923828125, 24.70736312866211, -18.00189971923828, -1.5730361938476562, -6.922382354736328, 32.94514465332031, 3.961071014404297, 16.172943115234375, 23.382293701171875, 17.326744079589844, 44.95946502685547, -4.0381927490234375, 38.555755615234375, -7.9322509765625, 14.55904769897461, -25.78753662109375, -0.5141506195068359, 14.613433837890625, 29.117273330688477, 5.810821533203125, 17.232986450195312, -0.15311622619628906, 3.960479736328125, 10.886093139648438, 27.001609802246094, 20.635971069335938, 41.837623596191406, 34.31611633300781, 17.308433532714844, 14.683773040771484, 49.278289794921875, 41.643341064453125, 10.533870697021484, 48.797119140625, 3.38885498046875, 5.18641471862793, 30.469215393066406, -0.125091552734375, -1.7469482421875, 53.59490203857422, 10.605033874511719, -2.7538299560546875, 10.700843811035156, -2.4748764038085938, -36.307342529296875, 23.5928955078125, -1.4213275909423828, 16.43494415283203, 12.29052734375, -2.074188232421875, 28.83087158203125, 39.043487548828125, 52.53868865966797, 0.6022796630859375, 19.43708038330078, 9.725322723388672, 30.070693969726562], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000657.npy"}
{"epoch": 0.9931972789115646, "step": 658, "batch_size": 64, "mean": 11.880049705505371, "std": 21.183244705200195, "min": -47.317230224609375, "p10": -10.688387298583985, "median": 7.089954376220703, "p90": 38.75355415344238, "max": 55.15087890625, "pos_frac": 0.71875, "sample": [24.319854736328125, 21.289840698242188, 32.21662902832031, -10.462646484375, 6.887489318847656, 8.748579978942871, -36.13829803466797, 6.405948638916016, 22.755977630615234, 19.6160888671875, 0.1883544921875, 18.505889892578125, -11.157363891601562, 31.207782745361328, 29.506202697753906, 10.360218048095703, -8.559432983398438, 7.29241943359375, 55.15087890625, 5.35418701171875, -7.306585311889648, 5.617218017578125, -8.089691162109375, 31.849720001220703, 8.607940673828125, 33.23719787597656, -16.541160583496094, 19.77368927001953, 34.55442428588867, -1.980377197265625, 5.096660614013672, 1.2088489532470703, 0.4322547912597656, 38.69969177246094, -1.0902633666992188, 19.666004180908203, 26.921142578125, -12.695648193359375, 4.547019958496094, 7.542144775390625, 52.877197265625, 2.2793502807617188, 3.62066650390625, 44.3307991027832, -1.6860904693603516, 30.796051025390625, 4.383049011230469, 50.135597229003906, 38.77663803100586, 0.8706512451171875, -10.785133361816406, -1.5259857177734375, -5.640415191650391, -17.912517547607422, 20.863784790039062, 32.91856384277344, -2.3734283447265625, -47.317230224609375, 53.09130859375, -4.436317443847656, 52.54707336425781, 14.456985473632812, 25.993438720703125, 0.5202655792236328], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000658.npy"}
{"epoch": 0.9947089947089947, "step": 659, "batch_size": 64, "mean": 8.309593200683594, "std": 20.14296531677246, "min": -37.50034713745117, "p10": -15.643746185302733, "median": 7.509878158569336, "p90": 38.38273124694825, "max": 53.17529296875, "pos_frac": 0.703125, "sample": [15.328132629394531, 14.710311889648438, 43.349151611328125, 7.909656524658203, -10.158706665039062, 53.17529296875, -0.9303359985351562, 16.49555015563965, 5.585515975952148, 15.69833755493164, 24.91265869140625, -1.6480560302734375, 2.3933334350585938, -0.9784164428710938, 4.365934371948242, 14.851181030273438, 24.560592651367188, -25.42681121826172, 2.209991455078125, -22.006187438964844, 36.883827209472656, -9.746456146240234, -2.619598388671875, 10.848167419433594, 8.028182983398438, 0.2954597473144531, 2.9466629028320312, 6.924795150756836, 2.831644058227539, 4.1580657958984375, 8.037765502929688, -10.636306762695312, 49.96660614013672, 9.313430786132812, 7.110099792480469, 9.425430297851562, 53.149497985839844, 24.231346130371094, 45.652469635009766, 8.044471740722656, 15.651214599609375, -30.152130126953125, -23.104270935058594, 38.5340690612793, -14.574760437011719, 10.177509307861328, 1.6934967041015625, 8.661712646484375, 36.30778503417969, -11.799476623535156, 12.191497802734375, 38.02960968017578, -37.50034713745117, -0.19638824462890625, 13.206024169921875, 22.423492431640625, -6.001472473144531, -30.348648071289062, 40.63975524902344, 5.491119384765625, -2.4770240783691406, 10.32122802734375, 1.4991912841796875, -16.101882934570312], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000659.npy"}
{"epoch": 0.9962207105064248, "step": 660, "batch_size": 64, "mean": 15.133729934692383, "std": 17.164487838745117, "min": -19.168197631835938, "p10": -4.729798889160155, "median": 13.61203384399414, "p90": 39.868366241455085, "max": 52.40380859375, "pos_frac": 0.78125, "sample": [40.58345031738281, 24.10516357421875, 18.512300491333008, -8.2366943359375, 21.02973175048828, -12.193832397460938, -0.826629638671875, 33.36852264404297, -19.168197631835938, -13.835502624511719, 19.194419860839844, 37.01252746582031, 8.564411163330078, 31.284446716308594, 15.80360221862793, 33.18690490722656, 44.840999603271484, 28.15662956237793, 10.085296630859375, -11.313140869140625, 15.583351135253906, 5.895843505859375, 19.193580627441406, 2.7752532958984375, -5.1768035888671875, 43.70595169067383, -1.8180465698242188, 32.78656005859375, 13.432106018066406, -2.868865966796875, 0.8473129272460938, 11.811996459960938, 52.40380859375, 13.432693481445312, -3.0280799865722656, -3.6995162963867188, 1.8716773986816406, 14.394157409667969, 44.33027267456055, 10.298835754394531, 0.7999114990234375, 4.289865493774414, 18.708526611328125, 5.6787261962890625, 36.503028869628906, 11.140304565429688, 19.815208435058594, 1.9629364013671875, 19.35903549194336, 38.43498229980469, 24.53815460205078, 41.542266845703125, -5.171348571777344, 6.910438537597656, 37.96257019042969, -2.6077041625976562, 6.911680221557617, 29.988361358642578, 40.48267364501953, -3.696321487426758, 19.652002334594727, 13.791374206542969, 2.5591278076171875, 32.67643737792969], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000660.npy"}
{"epoch": 0.9977324263038548, "step": 661, "batch_size": 64, "mean": 9.24586296081543, "std": 19.537857055664062, "min": -46.60591125488281, "p10": -13.580781555175781, "median": 7.091876983642578, "p90": 36.35461540222168, "max": 52.34063720703125, "pos_frac": 0.71875, "sample": [-8.610992431640625, -2.9129276275634766, -18.500473022460938, -17.244709014892578, 18.246505737304688, 15.031986236572266, 38.92469024658203, -2.7941131591796875, -46.60591125488281, 22.80669403076172, -5.339881896972656, 0.13727951049804688, -2.5555953979492188, 48.69138717651367, 34.77238464355469, 42.44676208496094, 49.670005798339844, 21.247840881347656, 52.34063720703125, -14.90789794921875, -29.005390167236328, 0.4312744140625, 36.27931594848633, 9.115859985351562, -3.9187850952148438, 3.4907073974609375, -1.205230712890625, -11.097908020019531, 7.2911376953125, 5.25494384765625, 13.936416625976562, 7.540996551513672, -23.710466384887695, 20.1630859375, 18.87860107421875, 0.5843029022216797, 11.064918518066406, 16.998550415039062, 2.49945068359375, 22.8416748046875, 21.298851013183594, 1.1347732543945312, -2.483978271484375, 28.19594955444336, 36.38688659667969, 32.86967468261719, 34.502532958984375, 8.941581726074219, 6.892616271972656, 14.714298248291016, -13.373321533203125, 12.442665100097656, -13.669692993164062, -4.843193054199219, 9.557113647460938, 20.998130798339844, 1.8088302612304688, 2.8109130859375, 1.721527099609375, 12.203933715820312, 0.8965911865234375, 3.55419921875, 2.07391357421875, 40.82330322265625], "npy": "outputs/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6/margin_logs/step_0000661.npy"}
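A minimal sketch, in Python, of how one of the margin-log records above could be read back and its summary fields recomputed. It assumes that `sample` holds the full batch of margins (in the records shown it has `batch_size` entries) and that `pos_frac` is the share of strictly positive margins. Neither is stated in the commit. `summarize_margins` and the synthetic record are hypothetical and used only for illustration.

```python
import json
import statistics


def summarize_margins(record_line: str) -> dict:
    """Recompute simple statistics from the 'sample' array of one JSONL record."""
    record = json.loads(record_line)
    margins = record["sample"]
    return {
        "step": record["step"],
        "n": len(margins),
        "mean": statistics.fmean(margins),
        "median": statistics.median(margins),
        "min": min(margins),
        "max": max(margins),
        # Assumption: pos_frac counts margins strictly greater than zero.
        "pos_frac": sum(m > 0 for m in margins) / len(margins),
    }


# Synthetic record with the same keys as the log lines (values are made up).
example = '{"step": 1, "batch_size": 4, "sample": [2.0, -1.0, 3.0, 4.0]}'
summary = summarize_margins(example)
print(summary)
```

Comparing the recomputed values with the logged `mean`, `median`, `min`, `max`, and `pos_frac` is one way to check that a record was not truncated. The logged `std` is left out because the log does not say whether it is a population or a sample estimate.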
151388
merges.txt
Normal file
File diff suppressed because it is too large
3
model-00001-of-00007.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c92afd3ea0a7900b08aa9f47d152af9b7b1c97b8b1ec7290ec08f6801d265cd7
size 4972454376
3
model-00002-of-00007.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e4ee43f6c4c7cd59b123f4f11779a188d2b475dd9e3643b0f7b2f12abe273386
size 4832048608
3
model-00003-of-00007.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f21c0bf598aaeea6f3834d45670d7ab61c80033e1e3aed40a55c6a5697d310f2
size 4832048656
3
model-00004-of-00007.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6622f8da6267def3f5e5819144c259e38508cd3c0126ec9c9d9ebc143fd1a909
size 4999855528
3
model-00005-of-00007.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6700f7f7dc8c2e4fa8a124b8f24a5c90e1afa9e579d4d65c5c44128e460a16e7
size 4832048672
3
model-00006-of-00007.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:668059afc86e2d9c531b7783b9cb1cccd068199ba8253eb2f6333290c033df35
size 4832048672
3
model-00007-of-00007.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:74066a11178354dbcbc3975dd68559cdcb9eccddcaeed77f8720f2d37781b97a
size 3462482728
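The three-line files above are Git LFS pointers, not the weights themselves. As a hedged illustration, the sketch below parses one pointer into its fields. `parse_lfs_pointer` is a hypothetical helper, not part of Git LFS or of this repository, and it assumes every non-empty line has the form `key value` with an `oid` of the form `algo:digest`, as in the pointers shown.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer file into version, hash algorithm, digest and size."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each pointer line is "<key> <value>"; split on the first space only.
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real object in bytes
    }


# Pointer text copied from model-00007-of-00007.safetensors above.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:74066a11178354dbcbc3975dd68559cdcb9eccddcaeed77f8720f2d37781b97a
size 3462482728
"""
info = parse_lfs_pointer(pointer)
print(info["hash_algo"], info["size"])
```

After the real shard has been downloaded, its byte length and SHA-256 digest can be compared against `size` and `digest` to detect an incomplete transfer.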
406
model.safetensors.index.json
Normal file
@@ -0,0 +1,406 @@
{
"metadata": {
"total_size": 32762941440
},
"weight_map": {
"lm_head.weight": "model-00007-of-00007.safetensors",
"model.embed_tokens.weight": "model-00001-of-00007.safetensors",
"model.layers.0.input_layernorm.weight": "model-00001-of-00007.safetensors",
"model.layers.0.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.0.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.0.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.0.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
"model.layers.0.self_attn.k_norm.weight": "model-00001-of-00007.safetensors",
"model.layers.0.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.0.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.0.self_attn.q_norm.weight": "model-00001-of-00007.safetensors",
"model.layers.0.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.0.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.1.input_layernorm.weight": "model-00001-of-00007.safetensors",
"model.layers.1.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.1.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.1.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.1.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
"model.layers.1.self_attn.k_norm.weight": "model-00001-of-00007.safetensors",
"model.layers.1.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.1.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.1.self_attn.q_norm.weight": "model-00001-of-00007.safetensors",
"model.layers.1.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.1.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.10.input_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.10.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.10.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.10.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.10.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.10.self_attn.k_norm.weight": "model-00003-of-00007.safetensors",
"model.layers.10.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.10.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.10.self_attn.q_norm.weight": "model-00003-of-00007.safetensors",
"model.layers.10.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.10.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.11.input_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.11.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.11.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.11.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.11.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.11.self_attn.k_norm.weight": "model-00003-of-00007.safetensors",
"model.layers.11.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.11.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.11.self_attn.q_norm.weight": "model-00003-of-00007.safetensors",
"model.layers.11.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.11.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.12.input_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.12.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.12.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.12.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.12.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.12.self_attn.k_norm.weight": "model-00003-of-00007.safetensors",
"model.layers.12.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.12.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.12.self_attn.q_norm.weight": "model-00003-of-00007.safetensors",
"model.layers.12.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.12.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.13.input_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.13.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.13.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.13.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.13.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.13.self_attn.k_norm.weight": "model-00003-of-00007.safetensors",
"model.layers.13.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.13.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.13.self_attn.q_norm.weight": "model-00003-of-00007.safetensors",
"model.layers.13.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.13.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.14.input_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.14.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.14.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.14.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.14.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.14.self_attn.k_norm.weight": "model-00003-of-00007.safetensors",
"model.layers.14.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.14.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.14.self_attn.q_norm.weight": "model-00003-of-00007.safetensors",
"model.layers.14.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.14.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.15.input_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.15.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.15.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.15.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.15.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.15.self_attn.k_norm.weight": "model-00003-of-00007.safetensors",
"model.layers.15.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.15.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.15.self_attn.q_norm.weight": "model-00003-of-00007.safetensors",
"model.layers.15.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.15.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.16.input_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.16.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.16.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.16.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.16.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.16.self_attn.k_norm.weight": "model-00004-of-00007.safetensors",
"model.layers.16.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.16.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.16.self_attn.q_norm.weight": "model-00004-of-00007.safetensors",
"model.layers.16.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.16.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.17.input_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.17.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.17.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.17.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.17.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.17.self_attn.k_norm.weight": "model-00004-of-00007.safetensors",
"model.layers.17.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.17.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.17.self_attn.q_norm.weight": "model-00004-of-00007.safetensors",
"model.layers.17.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.17.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.18.input_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.18.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.18.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.18.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.18.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.18.self_attn.k_norm.weight": "model-00004-of-00007.safetensors",
"model.layers.18.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.18.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.18.self_attn.q_norm.weight": "model-00004-of-00007.safetensors",
"model.layers.18.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.18.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
|
||||
"model.layers.19.input_layernorm.weight": "model-00004-of-00007.safetensors",
|
||||
"model.layers.19.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
|
||||
"model.layers.19.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
|
||||
"model.layers.19.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
|
||||
"model.layers.19.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
|
||||
"model.layers.19.self_attn.k_norm.weight": "model-00004-of-00007.safetensors",
|
||||
"model.layers.19.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
|
||||
"model.layers.19.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
|
||||
"model.layers.19.self_attn.q_norm.weight": "model-00004-of-00007.safetensors",
|
||||
"model.layers.19.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
|
||||
"model.layers.19.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.2.input_layernorm.weight": "model-00001-of-00007.safetensors",
"model.layers.2.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.2.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.2.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.2.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
"model.layers.2.self_attn.k_norm.weight": "model-00001-of-00007.safetensors",
"model.layers.2.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.2.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.2.self_attn.q_norm.weight": "model-00001-of-00007.safetensors",
"model.layers.2.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.2.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.20.input_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.20.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.20.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.20.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.20.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.20.self_attn.k_norm.weight": "model-00004-of-00007.safetensors",
"model.layers.20.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.20.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.20.self_attn.q_norm.weight": "model-00004-of-00007.safetensors",
"model.layers.20.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.20.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.21.input_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.21.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.21.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.21.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.21.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
"model.layers.21.self_attn.k_norm.weight": "model-00004-of-00007.safetensors",
"model.layers.21.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.21.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.21.self_attn.q_norm.weight": "model-00004-of-00007.safetensors",
"model.layers.21.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.21.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.22.input_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.22.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.22.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.22.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.22.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.22.self_attn.k_norm.weight": "model-00004-of-00007.safetensors",
"model.layers.22.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.22.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.22.self_attn.q_norm.weight": "model-00004-of-00007.safetensors",
"model.layers.22.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.22.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
"model.layers.23.input_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.23.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.23.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.23.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.23.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.23.self_attn.k_norm.weight": "model-00005-of-00007.safetensors",
"model.layers.23.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.23.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.23.self_attn.q_norm.weight": "model-00005-of-00007.safetensors",
"model.layers.23.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.23.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.24.input_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.24.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.24.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.24.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.24.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.24.self_attn.k_norm.weight": "model-00005-of-00007.safetensors",
"model.layers.24.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.24.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.24.self_attn.q_norm.weight": "model-00005-of-00007.safetensors",
"model.layers.24.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.24.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.25.input_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.25.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.25.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.25.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.25.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.25.self_attn.k_norm.weight": "model-00005-of-00007.safetensors",
"model.layers.25.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.25.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.25.self_attn.q_norm.weight": "model-00005-of-00007.safetensors",
"model.layers.25.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.25.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.26.input_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.26.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.26.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.26.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.26.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.26.self_attn.k_norm.weight": "model-00005-of-00007.safetensors",
"model.layers.26.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.26.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.26.self_attn.q_norm.weight": "model-00005-of-00007.safetensors",
"model.layers.26.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.26.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.27.input_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.27.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.27.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.27.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.27.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
"model.layers.27.self_attn.k_norm.weight": "model-00005-of-00007.safetensors",
"model.layers.27.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.27.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.27.self_attn.q_norm.weight": "model-00005-of-00007.safetensors",
"model.layers.27.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.27.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.28.input_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.28.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.28.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.28.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.28.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.28.self_attn.k_norm.weight": "model-00005-of-00007.safetensors",
"model.layers.28.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.28.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.28.self_attn.q_norm.weight": "model-00005-of-00007.safetensors",
"model.layers.28.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.28.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
"model.layers.29.input_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.29.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.29.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.29.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.29.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.29.self_attn.k_norm.weight": "model-00006-of-00007.safetensors",
"model.layers.29.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.29.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.29.self_attn.q_norm.weight": "model-00006-of-00007.safetensors",
"model.layers.29.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.29.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.3.input_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.3.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.3.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.3.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.3.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.3.self_attn.k_norm.weight": "model-00001-of-00007.safetensors",
"model.layers.3.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.3.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.3.self_attn.q_norm.weight": "model-00001-of-00007.safetensors",
"model.layers.3.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.3.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.30.input_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.30.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.30.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.30.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.30.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.30.self_attn.k_norm.weight": "model-00006-of-00007.safetensors",
"model.layers.30.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.30.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.30.self_attn.q_norm.weight": "model-00006-of-00007.safetensors",
"model.layers.30.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.30.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.31.input_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.31.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.31.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.31.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.31.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.31.self_attn.k_norm.weight": "model-00006-of-00007.safetensors",
"model.layers.31.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.31.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.31.self_attn.q_norm.weight": "model-00006-of-00007.safetensors",
"model.layers.31.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.31.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.32.input_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.32.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.32.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.32.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.32.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.32.self_attn.k_norm.weight": "model-00006-of-00007.safetensors",
"model.layers.32.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.32.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.32.self_attn.q_norm.weight": "model-00006-of-00007.safetensors",
"model.layers.32.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.32.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.33.input_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.33.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.33.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.33.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.33.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.33.self_attn.k_norm.weight": "model-00006-of-00007.safetensors",
"model.layers.33.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.33.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.33.self_attn.q_norm.weight": "model-00006-of-00007.safetensors",
"model.layers.33.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.33.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.34.input_layernorm.weight": "model-00007-of-00007.safetensors",
"model.layers.34.mlp.down_proj.weight": "model-00007-of-00007.safetensors",
"model.layers.34.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.34.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.34.post_attention_layernorm.weight": "model-00007-of-00007.safetensors",
"model.layers.34.self_attn.k_norm.weight": "model-00006-of-00007.safetensors",
"model.layers.34.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.34.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.34.self_attn.q_norm.weight": "model-00006-of-00007.safetensors",
"model.layers.34.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.34.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.35.input_layernorm.weight": "model-00007-of-00007.safetensors",
"model.layers.35.mlp.down_proj.weight": "model-00007-of-00007.safetensors",
"model.layers.35.mlp.gate_proj.weight": "model-00007-of-00007.safetensors",
"model.layers.35.mlp.up_proj.weight": "model-00007-of-00007.safetensors",
"model.layers.35.post_attention_layernorm.weight": "model-00007-of-00007.safetensors",
"model.layers.35.self_attn.k_norm.weight": "model-00007-of-00007.safetensors",
"model.layers.35.self_attn.k_proj.weight": "model-00007-of-00007.safetensors",
"model.layers.35.self_attn.o_proj.weight": "model-00007-of-00007.safetensors",
"model.layers.35.self_attn.q_norm.weight": "model-00007-of-00007.safetensors",
"model.layers.35.self_attn.q_proj.weight": "model-00007-of-00007.safetensors",
"model.layers.35.self_attn.v_proj.weight": "model-00007-of-00007.safetensors",
"model.layers.4.input_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.4.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.4.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.4.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.4.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.4.self_attn.k_norm.weight": "model-00002-of-00007.safetensors",
"model.layers.4.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.4.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.4.self_attn.q_norm.weight": "model-00002-of-00007.safetensors",
"model.layers.4.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.4.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.5.input_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.5.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.5.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.5.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.5.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.5.self_attn.k_norm.weight": "model-00002-of-00007.safetensors",
"model.layers.5.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.5.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.5.self_attn.q_norm.weight": "model-00002-of-00007.safetensors",
"model.layers.5.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.5.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.6.input_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.6.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.6.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.6.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.6.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.6.self_attn.k_norm.weight": "model-00002-of-00007.safetensors",
"model.layers.6.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.6.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.6.self_attn.q_norm.weight": "model-00002-of-00007.safetensors",
"model.layers.6.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.6.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.7.input_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.7.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.7.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.7.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.7.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.7.self_attn.k_norm.weight": "model-00002-of-00007.safetensors",
"model.layers.7.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.7.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.7.self_attn.q_norm.weight": "model-00002-of-00007.safetensors",
"model.layers.7.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.7.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.8.input_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.8.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.8.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.8.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.8.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.8.self_attn.k_norm.weight": "model-00002-of-00007.safetensors",
"model.layers.8.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.8.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.8.self_attn.q_norm.weight": "model-00002-of-00007.safetensors",
"model.layers.8.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.8.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.9.input_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.9.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.9.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.9.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.9.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.9.self_attn.k_norm.weight": "model-00002-of-00007.safetensors",
"model.layers.9.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.9.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.9.self_attn.q_norm.weight": "model-00002-of-00007.safetensors",
"model.layers.9.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.9.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
"model.norm.weight": "model-00007-of-00007.safetensors"
}
}
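As a reading aid for the index above: most layers sit in a single shard, but a few straddle two files (layer 22, for example, has its MLP weights in shard 5 and its attention weights in shard 4). The following is a minimal Python sketch, not part of the repository, that finds such layers from a `weight_map`. The excerpt dictionary is copied from entries listed above; the helper names `shards_per_layer` and `split_layers` are hypothetical.

```python
from collections import defaultdict

# Small excerpt of the "weight_map" listed above (tensor name -> shard file).
weight_map = {
    "model.layers.21.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.22.input_layernorm.weight": "model-00005-of-00007.safetensors",
    "model.layers.22.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.23.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
    "model.norm.weight": "model-00007-of-00007.safetensors",
}


def shards_per_layer(weight_map):
    """Map each layer index to the set of shard files holding its tensors."""
    layers = defaultdict(set)
    for name, shard in weight_map.items():
        parts = name.split(".")
        # Only names of the form model.layers.<index>.<...> belong to a layer.
        if len(parts) > 2 and parts[0] == "model" and parts[1] == "layers":
            layers[int(parts[2])].add(shard)
    return dict(layers)


def split_layers(weight_map):
    """Return the sorted indices of layers stored in more than one shard."""
    return sorted(i for i, s in shards_per_layer(weight_map).items() if len(s) > 1)


print(split_layers(weight_map))
```

Run against the full index, the same helper would list every layer that needs two shard files open at once; the excerpt here only covers layers 21 to 23.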
31
special_tokens_map.json
Normal file
@@ -0,0 +1,31 @@
{
  "additional_special_tokens": [
    "<|im_start|>",
    "<|im_end|>",
    "<|object_ref_start|>",
    "<|object_ref_end|>",
    "<|box_start|>",
    "<|box_end|>",
    "<|quad_start|>",
    "<|quad_end|>",
    "<|vision_start|>",
    "<|vision_end|>",
    "<|vision_pad|>",
    "<|image_pad|>",
    "<|video_pad|>"
  ],
  "eos_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
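Because `pad_token` and `eos_token` are both `<|endoftext|>` in this file, an attention mask derived by comparing token ids against the pad id would also hide a genuine end-of-sequence token. The sketch below illustrates the difference in plain Python. It assumes id 151643 for `<|endoftext|>` (the id given in `added_tokens_decoder` of tokenizer_config.json); the helper names and the token ids 100 and 200 are made up for illustration.

```python
# Assumption: <|endoftext|> has id 151643 and serves as both pad and EOS.
PAD_ID = EOS_ID = 151643


def naive_mask(ids):
    """Mask built by comparing against the pad id; also masks real EOS tokens."""
    return [0 if token == PAD_ID else 1 for token in ids]


def length_mask(ids, real_len):
    """Mask built from the known unpadded length; keeps the real EOS visible."""
    return [1 if i < real_len else 0 for i in range(len(ids))]


# Three real tokens (the last one is a genuine EOS) followed by two pads.
seq = [100, 200, EOS_ID, PAD_ID, PAD_ID]

print(naive_mask(seq))      # the real EOS at position 2 is masked out
print(length_mask(seq, 3))  # the real EOS at position 2 stays visible
```

Tracking sequence lengths (or using the mask produced at padding time) sidesteps the ambiguity, which is why the second helper takes `real_len` rather than inspecting ids.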
BIN
tokenizer.json
(Stored with Git LFS)
Normal file
Binary file not shown.
240
tokenizer_config.json
Normal file
@@ -0,0 +1,240 @@
{
  "add_bos_token": false,
  "add_prefix_space": false,
  "added_tokens_decoder": {
    "151643": {
      "content": "<|endoftext|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151644": {
      "content": "<|im_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151645": {
      "content": "<|im_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151646": {
      "content": "<|object_ref_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151647": {
      "content": "<|object_ref_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151648": {
      "content": "<|box_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151649": {
      "content": "<|box_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151650": {
      "content": "<|quad_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151651": {
      "content": "<|quad_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151652": {
      "content": "<|vision_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151653": {
      "content": "<|vision_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151654": {
      "content": "<|vision_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151655": {
      "content": "<|image_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151656": {
      "content": "<|video_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151657": {
      "content": "<tool_call>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151658": {
      "content": "</tool_call>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151659": {
      "content": "<|fim_prefix|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151660": {
      "content": "<|fim_middle|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151661": {
      "content": "<|fim_suffix|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151662": {
      "content": "<|fim_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151663": {
      "content": "<|repo_name|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151664": {
      "content": "<|file_sep|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151665": {
      "content": "<tool_response>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151666": {
      "content": "</tool_response>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151667": {
      "content": "<think>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151668": {
      "content": "</think>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    }
  },
  "additional_special_tokens": [
    "<|im_start|>",
    "<|im_end|>",
    "<|object_ref_start|>",
    "<|object_ref_end|>",
    "<|box_start|>",
    "<|box_end|>",
    "<|quad_start|>",
    "<|quad_end|>",
    "<|vision_start|>",
    "<|vision_end|>",
    "<|vision_pad|>",
    "<|image_pad|>",
    "<|video_pad|>"
  ],
  "bos_token": null,
"chat_template": "{%- if tools %}\n {{- '<|im_start|>system\\n' }}\n {%- if messages[0].role == 'system' %}\n {{- messages[0].content + '\\n\\n' }}\n {%- endif %}\n {{- \"# Tools\\n\\nYou may call one or more functions to assist with the user query.\\n\\nYou are provided with function signatures within <tools></tools> XML tags:\\n<tools>\" }}\n {%- for tool in tools %}\n {{- \"\\n\" }}\n {{- tool | tojson }}\n {%- endfor %}\n {{- \"\\n</tools>\\n\\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\\n<tool_call>\\n{\\\"name\\\": <function-name>, \\\"arguments\\\": <args-json-object>}\\n</tool_call><|im_end|>\\n\" }}\n{%- else %}\n {%- if messages[0].role == 'system' %}\n {{- '<|im_start|>system\\n' + messages[0].content + '<|im_end|>\\n' }}\n {%- endif %}\n{%- endif %}\n{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}\n{%- for message in messages[::-1] %}\n {%- set index = (messages|length - 1) - loop.index0 %}\n {%- if ns.multi_step_tool and message.role == \"user\" and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}\n {%- set ns.multi_step_tool = false %}\n {%- set ns.last_query_index = index %}\n {%- endif %}\n{%- endfor %}\n{%- for message in messages %}\n {%- if (message.role == \"user\") or (message.role == \"system\" and not loop.first) %}\n {{- '<|im_start|>' + message.role + '\\n' + message.content + '<|im_end|>' + '\\n' }}\n {%- elif message.role == \"assistant\" %}\n {%- set content = message.content %}\n {%- set reasoning_content = '' %}\n {%- if message.reasoning_content is defined and message.reasoning_content is not none %}\n {%- set reasoning_content = message.reasoning_content %}\n {%- else %}\n {%- if '</think>' in message.content %}\n {%- set content = message.content.split('</think>')[-1].lstrip('\\n') %}\n {%- set reasoning_content = 
message.content.split('</think>')[0].rstrip('\\n').split('<think>')[-1].lstrip('\\n') %}\n {%- endif %}\n {%- endif %}\n {%- if loop.index0 > ns.last_query_index %}\n {%- if loop.last or (not loop.last and reasoning_content) %}\n {{- '<|im_start|>' + message.role + '\\n<think>\\n' + reasoning_content.strip('\\n') + '\\n</think>\\n\\n' + content.lstrip('\\n') }}\n {%- else %}\n {{- '<|im_start|>' + message.role + '\\n' + content }}\n {%- endif %}\n {%- else %}\n {{- '<|im_start|>' + message.role + '\\n' + content }}\n {%- endif %}\n {%- if message.tool_calls %}\n {%- for tool_call in message.tool_calls %}\n {%- if (loop.first and content) or (not loop.first) %}\n {{- '\\n' }}\n {%- endif %}\n {%- if tool_call.function %}\n {%- set tool_call = tool_call.function %}\n {%- endif %}\n {{- '<tool_call>\\n{\"name\": \"' }}\n {{- tool_call.name }}\n {{- '\", \"arguments\": ' }}\n {%- if tool_call.arguments is string %}\n {{- tool_call.arguments }}\n {%- else %}\n {{- tool_call.arguments | tojson }}\n {%- endif %}\n {{- '}\\n</tool_call>' }}\n {%- endfor %}\n {%- endif %}\n {{- '<|im_end|>\\n' }}\n {%- elif message.role == \"tool\" %}\n {%- if loop.first or (messages[loop.index0 - 1].role != \"tool\") %}\n {{- '<|im_start|>user' }}\n {%- endif %}\n {{- '\\n<tool_response>\\n' }}\n {{- message.content }}\n {{- '\\n</tool_response>' }}\n {%- if loop.last or (messages[loop.index0 + 1].role != \"tool\") %}\n {{- '<|im_end|>\\n' }}\n {%- endif %}\n {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n {{- '<|im_start|>assistant\\n' }}\n {%- if enable_thinking is defined and enable_thinking is false %}\n {{- '<think>\\n\\n</think>\\n\\n' }}\n {%- endif %}\n{%- endif %}",
"clean_up_tokenization_spaces": false,
"eos_token": "<|endoftext|>",
"errors": "replace",
"extra_special_tokens": {},
"model_max_length": 2048,
"pad_token": "<|endoftext|>",
"split_special_tokens": false,
"tokenizer_class": "Qwen2Tokenizer",
"unk_token": null
}
9
train_results.json
Normal file
@@ -0,0 +1,9 @@
{
"epoch": 0.999244142101285,
"total_flos": 0.0,
"train_loss": 1.1374638212250148,
"train_runtime": 2122.2138,
"train_samples": 42336,
"train_samples_per_second": 19.949,
"train_steps_per_second": 0.311
}
12704
trainer_state.json
Normal file
File diff suppressed because it is too large
1
vocab.json
Normal file
File diff suppressed because one or more lines are too long